unit 1 report

2026-04-11 10:07:12 -05:00 · 2024-10-03 23:47:33 -04:00
parent 549fa862a4
commit ead29fd637
6 changed files with 187 additions and 9 deletions
--- a/notes.md
+++ b/notes.md
@@ -0,0 +1,9 @@
+## Assumptions and Learnings 
+ - probability density function expected value
+ - A confidence interval is not my value * confidence; the confidence level is the chance that the true value falls in my range
+ - I've made some mistakes in stat review, looking at narrow topics before covering broader parent topics.  Should reorganize structure (tree, not list?)
+
+t-test, z-test: both are hypothesis tests
+The t-test is used when the population variance is unknown, or the sample size is small (n < 30)
+The z-test is used when the population variance (σ²) is known *and* the sample size is large (n ≥ 30)
+To create a z-distribution table, mathematicians calculate the CDF for various z-scores and tabulate the results.
--- a/report/report.pdf
+++ b/report/report.pdf
--- a/report/report.tex
+++ b/report/report.tex
@@ -0,0 +1,149 @@
+\documentclass[12pt]{article}
+\usepackage{blindtext}
+\usepackage{hyperref}
+\usepackage{amsmath}
+\usepackage{amssymb}
+\usepackage[a4paper, total={6in, 10in}]{geometry}
+\hyphenpenalty 1000
+
+\begin{document}
+\begin{titlepage}
+\begin{center}
+
+\vspace*{5cm}
+\Large{\textbf{Implementations of Probability Theory}}\\
+
+\rule{14cm}{0.05cm}\\ \vspace{.25cm}
+
+\Large{Independent Study Report}\\
+\large{Andrew Simonson}
+
+\vspace*{\fill}
+\large{Compiled on: \today}\\
+ 
+\end{center}
+\end{titlepage}
+
+\newpage 
+% Table of Contents
+% \large{Table of Contents}
+\tableofcontents
+\addtocontents{toc}{~\hfill\textbf{Page}\par}
+
+\newpage
+% Begin report
+\section{Objective}
+yada yada yah I started this independent study for my own selfish gain
+
+\newpage
+\section{Units}
+\rule{14cm}{0.05cm}
+\subsection{Unit 1: Statistics Review}
+To ensure a strong statistical foundation for later work on probabilistic models, 
+the first objective was to create a document outlining and defining key topics that are 
+prerequisites for probabilities in statistics or for understanding generic analytical models.
+
+\subsubsection{Random Variables}
+\begin{enumerate}
+\item \textbf{Discrete Random Variables - }values are selected by chance from a countable (including countably infinite) list of distinct values
+\item \textbf{Continuous Random Variables - }values are selected by chance from an uncountable set of values, such as an interval of real numbers
+\end{enumerate}
+
+\subsubsection{Sample Space}
+A sample space is the set of all possible outcomes of an experiment.  For a roll of a six-sided die, 
+the die may land with 1 through 6 dots facing upwards, hence:
+\[S = \{1, 2, 3, 4, 5, 6\} \quad\text{where }S\text{ is the sample space}\]
+
+\subsubsection{Probability Axioms}
+There are three probability axioms:
+
+\begin{enumerate}
+    \item \textbf{Non-negativity}:  
+    \[
+    P(A) \geq 0 \quad \text{for any event }A, \ P(A) \in \mathbb{R}
+    \]
+    No event can be less likely to occur than an impossible event ( \(P(A) = 0\) ), and \(P(A)\) is a real number.  
+    Paired with axioms 2 and 3 we can also conclude that \(P(A) \leq 1\).
+    
+    \item \textbf{Normalization}:  
+    \[
+    P(S) = 1\quad\text{where }S\text{ is the sample space}
+    \]
+    \textbf{Unit Measure - } The probabilities of all outcomes in a sample space add up to 1.  In essence, there is a 100\% 
+    chance that one of the outcomes in the sample space will occur.
+    
+    \item \textbf{Additivity}:  
+    \[
+    P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset
+    \]
+    A union between events that are mutually exclusive (events that cannot both happen for an instance) has a 
+    probability that is the sum of the associated event probabilities.
+\end{enumerate}
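+
+As a short sketch of the bound \(P(A) \leq 1\) mentioned under axiom 1, write \(A^{c}\) for the complement of \(A\) (notation assumed here, 
+not defined elsewhere in this report).  Since \(A\) and \(A^{c}\) are mutually exclusive and \(A \cup A^{c} = S\), axioms 2 and 3 give:
+\[
+1 = P(S) = P(A \cup A^{c}) = P(A) + P(A^{c}) \geq P(A)
+\]
+where the final inequality uses \(P(A^{c}) \geq 0\) from axiom 1.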
+
+\subsubsection{Expectations and Deviation}
+\begin{enumerate}
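+\item \textbf{Expectation - }The average of the possible values of a random variable, weighted by their probabilities
+\[E = \sum_{x \in S}{x \, P(X = x)} \quad\text{where }E\text{ is the expected value}\]
+\item \textbf{Variance - }The spread of possible values for a random variable, measured as the expected squared distance from the expected value
+\item \textbf{Standard Deviation - }The square root of the variance, which expresses spread in the same units as the random variable
+\[\sigma = \sqrt{V}\quad\text{where }V\text{ is the variance}\]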
+\end{enumerate}
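+
+As a worked example (a sketch for a fair six-sided die, with \(X\) denoting the number rolled), the expected value is
+\[
+E = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{21}{6} = 3.5
+\]
+and, using the standard identity that the variance equals the expected value of \(X^2\) minus \(E^2\) (assumed here rather than derived),
+\[
+V = \sum_{x=1}^{6} x^2 \cdot \frac{1}{6} - E^2 = \frac{91}{6} - 3.5^2 = \frac{35}{12} \approx 2.92, \qquad \sqrt{V} \approx 1.71
+\]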
+
+\subsubsection{Probability Functions}
+Probability Functions map each possible value of a random variable to how likely that value is.
+
+\subsubsection*{Probability Mass Functions}
+Probability Mass Functions (PMFs) map discrete random variables.
+For example, a roll of a fair six-sided die has a uniform PMF:
+\begin{equation*}
+    P(X = x) = 
+    \begin{cases}
+        1/6 & \text{if } x=1\\
+        1/6 & \text{if } x=2\\
+        1/6 & \text{if } x=3\\
+        1/6 & \text{if } x=4\\
+        1/6 & \text{if } x=5\\
+        1/6 & \text{if } x=6
+    \end{cases}
+\end{equation*}
+
+\subsubsection*{Probability Density Functions}
+Probability Density Functions (PDFs) map continuous random variables.
+For example, the following triangular PDF peaks at \(x = 0.5\); the factor of 4 makes the total area under the curve equal to 1.
+\begin{equation*}
+    f(x) = 
+    \begin{cases}
+        4x & \text{if } 0\leq x\leq 0.5\\
+        4(1-x) & \text{if } 0.5<x\leq 1\\
+        0 & \text{otherwise}
+    \end{cases}
+\end{equation*}
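+
+A PDF does not give probabilities directly; probabilities are areas under the curve.  As a sketch, writing \(f\) for the PDF of a continuous 
+random variable \(X\), any valid \(f\) satisfies
+\[
+f(x) \geq 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad P(a \leq X \leq b) = \int_{a}^{b} f(x)\,dx
+\]
+For instance, for the uniform density \(f(x) = 1\) on \(0 \leq x \leq 1\) (an illustrative choice), \(P(0.2 \leq X \leq 0.5) = 0.5 - 0.2 = 0.3\).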
+
+\subsubsection{Limit Theorems}
+\subsubsection*{Law of Large Numbers}
+The Law of Large Numbers states that as the number of independent random samples increases, the sample mean 
+approaches the true mean \(\mu\) of the population. 
+\[\frac{1}{n} \sum_{i=1}^{n} X_{i} \rightarrow \mu \qquad\text{as }n \rightarrow \infty\]
+\subsubsection*{Central Limit Theorem}
+The Central Limit Theorem states that the sampling distribution of a sample mean approaches a normal distribution as the sample size grows, even when the 
+population distribution is not normal, provided the population has a finite variance.
+\[
+\frac{\sqrt{n} \left( \bar{X}_n - \mu \right)}{\sigma} \xrightarrow{d} N(0, 1),
+\]
+\noindent where \( \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \) is the sample mean of \( n \) independent draws \( X_i \), \( \mu \) and \( \sigma \) are the 
+population mean and standard deviation, and \( N(0, 1) \) is the standard normal distribution.
+
+This is challenging to understand solely as an equation.  As an example, take a sample of two six-sided dice rolls and average their numbers.  
+Even with two rolls the sample averages cluster around 3.5, and the more rolls averaged in each sample, the more closely the distribution of sample averages resembles a normal distribution centered on 3.5.
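+
+To put numbers on the dice example (a sketch assuming fair, independent rolls), a single roll has \(\mu = 3.5\) and \(\sigma^2 = 35/12\), so the 
+average of \(n\) rolls has mean \(3.5\) and standard deviation
+\[
+\frac{\sigma}{\sqrt{n}} = \sqrt{\frac{35}{12n}} \approx 1.21 \text{ for } n = 2, \qquad \approx 0.31 \text{ for } n = 30
+\]
+so the sample averages concentrate more tightly around \(3.5\) as \(n\) grows.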
+
+\subsubsection{Confidence}
+Confidence is described using a confidence interval, which is a range of values that the true value is expected to be in, and its associated confidence level, 
+which is the probability (expressed as a percentage) that an interval constructed this way contains the true value.
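+
+As a minimal sketch, assuming the population standard deviation \(\sigma\) is known and the sample mean \(\bar{x}\) of \(n\) observations is approximately normal, 
+a 95\% confidence interval for the population mean is
+\[
+\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}
+\]
+With hypothetical values \(\bar{x} = 10\), \(\sigma = 2\), and \(n = 100\), the margin is \(1.96 \times 0.2 = 0.392\), giving the interval \((9.608,\ 10.392)\).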
+
+% Confidence intervals can be calculated with z-tests, t-tests.  Go into parametric vs non-parametric
+
+\subsubsection{Statistical Inference}
+Statistical Inference is any data analysis that uses a sample to draw conclusions about the population.  
+Methods include estimation via averages and confidence intervals, and hypothesis testing, which attempts to invalidate (never \textit{validate}) a hypothesis.
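+
+As a sketch of one such test (the one-sample z-test, assuming a known population standard deviation \(\sigma\) and a hypothesized mean \(\mu_0\), 
+a symbol introduced here for illustration), the test statistic computed from a sample mean \(\bar{x}\) of \(n\) observations is
+\[
+z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
+\]
+and the hypothesis \(\mu = \mu_0\) is rejected at the 5\% level (two-sided) when \(|z| > 1.96\).  When \(\sigma\) is unknown, the t-test replaces it 
+with the sample standard deviation and compares the statistic against a t-distribution instead.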
+
+\end{document}
--- a/timesheet/timesheet.csv
+++ b/timesheet/timesheet.csv
@@ -1,4 +1,10 @@
 Week,Date,Type,Duration (Hours),Description
-1,08/30,Advising Meetings,1,"Stat Review Content acknowledgement, Latex overview for reports"
+1,08/30,Advising Meetings,2,"Stat Review Content acknowledgement, Latex overview for reports"
 2,09/02,Reporting,3,"First applications of Latex for final report, created Timesheet System."
 2,09/02,Research,2,"Stat Review: Sample Space through Probability Density Functions"
+2,09/06,Advising Meetings,1,"Research Review and exploration of PDF expected values and confidence intervals"
+4,09/19,Research,2,"Producing Confidence Intervals"
+4,09/20,Research,1,"Statistical Inference and t-testing"
+4,09/20,Advising Meetings,1,"Stat Review finalization, definition of reporting standard"
+5,09/23,Research,2,"Parametric and Non-parametric tests"
+6,10/03,Reporting,4,"Structuring stat review report"
--- a/timesheet/timesheet.pdf
+++ b/timesheet/timesheet.pdf
--- a/timesheet/timesheet.tex
+++ b/timesheet/timesheet.tex
@@ -1,13 +1,15 @@
-\documentclass{article}
+\documentclass[12pt]{article}
 \usepackage{blindtext}
-\usepackage[a4paper, total={6in, 8in}]{geometry}
+\usepackage[a4paper, total={6in, 10in}]{geometry}
 \nofiles
+\hyphenpenalty 1000

 \begin{document}

 \begin{titlepage}
 \begin{center}

+\vspace*{5cm}
 \Large{\textbf{Implementations of Probability Theory}}\\

 \rule{14cm}{0.05cm}\\ \vspace{.5cm}
@@ -28,18 +30,30 @@
 \hline
 Week & Date & Type & Duration (Hours) & Description \\
 \hline
-1 & 08/30 & Advising Meetings & 1 & Stat Review Content acknowledgement, Latex overview for reports \\
+1 & 08/30 & Advising Meetings & 2 & Stat Review Content acknowledgement, Latex overview for reports \\
 \hline
 2 & 09/02 & Reporting & 3 & First applications of Latex for final report, created Timesheet System. \\
 \hline
 2 & 09/02 & Research & 2 & Stat Review: Sample Space through Probability Density Functions \\
 \hline
+2 & 09/06 & Advising Meetings & 1 & Research Review and exploration of PDF expected values and confidence intervals \\
+\hline
+4 & 09/19 & Research & 2 & Producing Confidence Intervals \\
+\hline
+4 & 09/20 & Research & 1 & Statistical Inference and t-testing \\
+\hline
+4 & 09/20 & Advising Meetings & 1 & Stat Review finalization, definition of reporting standard \\
+\hline
+5 & 09/23 & Research & 2 & Parametric and Non-parametric tests \\
+\hline
+6 & 10/03 & Reporting & 4 & Structuring stat review report \\
+\hline
 \end{tabular}
 \end{table}
-\noindent Hours for Advising Meetings: 1\\
-Hours for Reporting: 3\\
-Hours for Research: 2\\
-\textbf{Total Hours: 6}\\
+\noindent Hours for Advising Meetings: 4\\
+Hours for Reporting: 7\\
+Hours for Research: 7\\
+\textbf{Total Hours: 18}\\
 % CLOSE Timesheet

 \end{document}