unit 1 report

2026-02-24 21:59:50 -06:00 · 2024-10-03 23:47:33 -04:00
parent 549fa862a4
commit ead29fd637
6 changed files with 187 additions and 9 deletions
--- a/notes.md
+++ b/notes.md
@@ -0,0 +1,9 @@
 ## Assumptions and Learnings 
 - probability density function expected value
 - A confidence interval is not my value * confidence; it's a range with a confidence-level chance of containing the true value
 - I've made some mistakes in stat review, looking at narrow topics before covering broader parent topics.  Should reorganize structure (tree, not list?)
 - t-test, z-test: both are hypothesis tests
 - The t-test is used when the population variance is unknown, or the sample size is small (n < 30)
 - The z-test is used when the population variance (σ²) is known *and* the sample size is large (n > 30)
 - To create a z-distribution table, mathematicians calculate the CDF for various z-scores and tabulate the results.
--- a/report/report.pdf
+++ b/report/report.pdf
--- a/report/report.tex
+++ b/report/report.tex
@@ -0,0 +1,149 @@
 \documentclass[12pt]{article}
 \usepackage{blindtext}
 \usepackage{hyperref}
 \usepackage{amsmath}
 \usepackage{amssymb}
 \usepackage[a4paper, total={6in, 10in}]{geometry}
 \hyphenpenalty 1000
 \begin{document}
 \begin{titlepage}
 \begin{center}
 \vspace*{5cm}
 \Large{\textbf{Implementations of Probability Theory}}\\
 \rule{14cm}{0.05cm}\\ \vspace{.25cm}
 \Large{Independent Study Report}\\
 \large{Andrew Simonson}
 \vspace*{\fill}
 \large{Compiled on: \today}\\
 \end{center}
 \end{titlepage}
 \newpage 
 % Table of Contents
 % \large{Table of Contents}
 \tableofcontents
 \addtocontents{toc}{~\hfill\textbf{Page}\par}
 \newpage
 % Begin report
 \section{Objective}
 yada yada yah I started this independent study for my own selfish gain
 \newpage
 \section{Units}
 \rule{14cm}{0.05cm}
 \subsection{Unit 1: Statistics Review}
 To ensure a strong statistical foundation for the later work on probabilistic models, 
 the first objective was to create a document outlining and defining the key topics that are 
 prerequisites for probability in statistics and for understanding general analytical models.
 \subsubsection{Random Variables}
 \begin{enumerate}
 \item \textbf{Discrete Random Variables - }take values chosen by chance from a countable (possibly countably infinite) list of distinct values
 \item \textbf{Continuous Random Variables - }take values chosen by chance from an uncountable set, such as an interval of real numbers
 \end{enumerate}
 \subsubsection{Sample Space}
 A sample space is the set of all possible outcomes of an experiment.  For a single roll of a six-sided die, 
 the die may land with 1 through 6 dots facing upwards, hence:
 \[S = \{1, 2, 3, 4, 5, 6\} \quad\text{where }S\text{ is the sample space}\]
 \subsubsection{Probability Axioms}
 There are three probability axioms:
 \begin{enumerate}
    \item \textbf{Non-negativity}:  
    \[
    P(A) \geq 0 \quad \text{for any event }A, \ P(A) \in \mathbb{R}
    \]
    No event can be less likely to occur than an impossible event ( \(P(A) = 0\) ), and \(P(A)\) is a real number.  
    Paired with axioms 2 and 3 we can also conclude that \(P(A) \leq 1\).
    \item \textbf{Normalization}:  
    \[
    P(S) = 1\quad\text{where }S\text{ is the sample space}
    \]
    \textbf{Unit Measure - } The probabilities of all outcomes in a sample space add up to 1.  In essence, there is a 100\% 
    chance that one of the outcomes in the sample space will occur.
    \item \textbf{Additivity}:  
    \[
    P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset
    \]
    A union between events that are mutually exclusive (events that cannot both happen for an instance) has a 
    probability that is the sum of the associated event probabilities.
 \end{enumerate}
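 As a short illustration of how the axioms work together, the bound \(P(A) \leq 1\) can be sketched as follows, 
 where \(A^{c}\) (a symbol introduced here for illustration) denotes the complement of \(A\), so that \(A \cup A^{c} = S\) and \(A \cap A^{c} = \emptyset\):
 \[
 1 = P(S) = P(A \cup A^{c}) = P(A) + P(A^{c}) \geq P(A)
 \]
 Applying additivity twice to a six-sided die, with each face assumed to have probability \(1/6\), gives the probability of an even roll:
 \[
 P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{2}
 \]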
 \subsubsection{Expectations and Deviation}
 \begin{enumerate}
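 \item \textbf{Expectation - }The average of the possible values of a discrete random variable \(X\), weighted by their probabilities
 \[E = \sum_{x \in S}{x \, P(X = x)} \quad\text{where }E\text{ is the expected value}\]
 \item \textbf{Variance - }The spread of possible values for a random variable, measured as the expected squared deviation from the expected value
 \[V = \sum_{x \in S}{(x - E)^{2} \, P(X = x)} \quad\text{where }V\text{ is the variance}\]
 \item \textbf{Standard Deviation - }The square root of the variance, which expresses the spread in the same units as the random variable
 \[\text{std} = \sqrt{V}\quad\text{where variance is }V\]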
 \end{enumerate}
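 As a worked sketch of these definitions, take a fair six-sided die, assuming each face \(x \in \{1, \dots, 6\}\) has probability \(1/6\) 
 (the symbols \(\mu\) and \(\sigma\) are used here for the expected value and standard deviation):
 \begin{align*}
 \mu &= \sum_{x=1}^{6} x \cdot \tfrac{1}{6} = \tfrac{21}{6} = 3.5\\
 \sigma^{2} &= \sum_{x=1}^{6} (x - 3.5)^{2} \cdot \tfrac{1}{6} = \tfrac{17.5}{6} = \tfrac{35}{12} \approx 2.92\\
 \sigma &= \sqrt{35/12} \approx 1.71
 \end{align*}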
 \subsubsection{Probability Functions}
 Probability Functions map each possible value of a random variable to how likely that value is.
 \subsubsection*{Probability Mass Functions}
 Probability Mass Functions (PMFs) give the probability of each value of a discrete random variable.
 For example, a six-sided die roll creates a uniform PMF:
 \begin{equation*}
    P(X = x) = 
    \begin{cases}
        1/6&\text{if }x=1\\
        1/6&\text{if }x=2\\
        1/6&\text{if }x=3\\
        1/6&\text{if }x=4\\
        1/6&\text{if }x=5\\
        1/6&\text{if }x=6
    \end{cases}
 \end{equation*}
 \subsubsection*{Probability Density Functions}
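 Probability Density Functions (PDFs) describe continuous random variables: the probability of landing in a range is the area under the PDF over that range.
 For example, this is a triangular PDF that rises to a peak at \(x = 0.5\); the factor of 4 makes the total area under the curve equal to 1.
 \begin{equation*}
    f(x) = 
    \begin{cases}
        4x&\text{if }0\leq x\leq 0.5\\
        4-4x&\text{if }0.5<x\leq 1\\
        0&\text{otherwise}
    \end{cases}
 \end{equation*}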
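 As a sketch of how a density is checked and used, consider the triangular density \(f(x) = 4x\) on \([0, 0.5]\) and \(f(x) = 4 - 4x\) on \((0.5, 1]\), 
 zero elsewhere (scaled here, as an assumption for illustration, so that its area is 1).  Its total area and expected value work out as:
 \begin{align*}
 \int_{0}^{1} f(x)\,dx &= \int_{0}^{0.5} 4x\,dx + \int_{0.5}^{1} (4 - 4x)\,dx = 0.5 + 0.5 = 1\\
 \int_{0}^{1} x\,f(x)\,dx &= \int_{0}^{0.5} 4x^{2}\,dx + \int_{0.5}^{1} (4x - 4x^{2})\,dx = \tfrac{1}{6} + \tfrac{1}{3} = 0.5
 \end{align*}
 For a continuous random variable, the expected value replaces the sum over outcomes with an integral against the density.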
 \subsubsection{Limit Theorems}
 \subsubsection*{Law of Large Numbers}
 The Law of Large Numbers states that as the number of independent random samples increases, the average of the samples 
 will approach the true mean of the population. 
 \[\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_{i} \rightarrow \mu \qquad\text{as }n \rightarrow \infty\text{, where }\mu\text{ is the true mean}\]
 \subsubsection*{Central Limit Theorem}
 The Central Limit Theorem states that the sampling distribution of a sample mean approaches a normal distribution as the sample size grows, even when the 
 population distribution is not normal (provided the population has a finite variance).
 \[
 \frac{\sqrt{n} \left( \bar{X}_n - \mu \right)}{\sigma} \xrightarrow{d} N(0, 1)
 \]
 where \( \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \) is the sample mean, \( \mu \) and \( \sigma \) are the population mean and standard deviation, 
 and \( N(0, 1) \) is a standard normal distribution.
 This is challenging to understand solely as an equation.  As an example, take a sample of two six-sided dice rolls and average their numbers.  
 Repeating this many times, the sample averages cluster around 3.5, and the more rolls averaged in each sample, the more closely their distribution will resemble a normal distribution.
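 To make the dice example concrete (a sketch assuming two fair, independent dice, with \(s\) introduced here for the sum of the two faces), 
 the average \(\bar{X}_2\) of two rolls has the distribution:
 \[
 P\left(\bar{X}_2 = \tfrac{s}{2}\right) = \frac{6 - |s - 7|}{36} \quad\text{for } s = 2, 3, \dots, 12
 \]
 The most likely average is therefore 3.5 with probability \(6/36\), while the extremes 1 and 6 each have probability \(1/36\).  
 The mean of \(\bar{X}_2\) is 3.5, and its variance is the single-roll variance divided by the sample size:
 \[
 \frac{\sigma^{2}}{n} = \frac{35/12}{2} = \frac{35}{24} \approx 1.46
 \]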
 \subsubsection{Confidence}
 Confidence is described using a confidence interval, which is a range of values that the true value is expected to be in, and its associated confidence level, 
 which is the percentage of intervals constructed this way, over repeated sampling, that would contain the true value.
 % Confidence intervals can be calculated with z-tests, t-tests.  Go into parametric vs non-parametric
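 As a sketch of how such an interval can be produced, assume the population standard deviation \(\sigma\) is known and the sample size \(n\) is large, 
 so that a z-based interval applies.  With hypothetical values \(\bar{x} = 3.4\), \(\sigma = 1.71\), \(n = 36\), and the 95\% critical value \(z^{*} = 1.96\):
 \[
 \bar{x} \pm z^{*}\frac{\sigma}{\sqrt{n}} = 3.4 \pm 1.96 \cdot \frac{1.71}{6} \approx 3.4 \pm 0.56 \quad\Rightarrow\quad (2.84,\ 3.96)
 \]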
 \subsubsection{Statistical Inference}
 Statistical Inference is any data analysis that uses a sample to draw conclusions about the population it came from.  
 Methods include estimation via averages and confidence intervals, and hypothesis testing, which attempts to invalidate (never \textit{validate}) a hypothesis.
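 As a sketch of a hypothesis test, consider a one-sample z-test of whether a die is fair, assuming a hypothesized mean \(\mu_0 = 3.5\), a known \(\sigma \approx 1.71\), 
 and hypothetical data of \(n = 36\) rolls averaging \(\bar{x} = 3.4\):
 \[
 z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{3.4 - 3.5}{1.71 / 6} \approx -0.35
 \]
 Since \(|z| < 1.96\), the hypothesis of a fair die is not rejected at the 5\% significance level; this does not show that the die is fair.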
 \end{document}
--- a/timesheet/timesheet.csv
+++ b/timesheet/timesheet.csv
@@ -1,4 +1,10 @@
 Week,Date,Type,Duration (Hours),Description
-1,08/30,Advising Meetings,1,"Stat Review Content acknowledgement, Latex overview for reports"
+1,08/30,Advising Meetings,2,"Stat Review Content acknowledgement, Latex overview for reports"
 2,09/02,Reporting,3,"First applications of Latex for final report, created Timesheet System."
 2,09/02,Research,2,"Stat Review: Sample Space through Probability Density Functions"
 2,09/06,Advising Meetings,1,"Research Review and exploration of PDF expected values and confidence intervals"
 4,09/19,Research,2,"Producing Confidence Intervals"
 4,09/20,Research,1,"Statistical Inference and t-testing"
 4,09/20,Advising Meetings,1,"Stat Review finalization, definition of reporting standard"
 5,09/23,Research,2,"Parametric and Non-parametric tests"
 6,10/03,Reporting,4,"Structuring stat review report"
--- a/timesheet/timesheet.pdf
+++ b/timesheet/timesheet.pdf
--- a/timesheet/timesheet.tex
+++ b/timesheet/timesheet.tex
@@ -1,13 +1,15 @@
-\documentclass{article}
+\documentclass[12pt]{article}
 \usepackage{blindtext}
-\usepackage[a4paper, total={6in, 8in}]{geometry}
+\usepackage[a4paper, total={6in, 10in}]{geometry}
 \nofiles
 \hyphenpenalty 1000
 \begin{document}
 \begin{titlepage}
 \begin{center}
 \vspace*{5cm}
 \Large{\textbf{Implementations of Probability Theory}}\\
 \rule{14cm}{0.05cm}\\ \vspace{.5cm}
@@ -28,18 +30,30 @@
 \hline
 Week & Date & Type & Duration (Hours) & Description \\
 \hline
-1 & 08/30 & Advising Meetings & 1 & Stat Review Content acknowledgement, Latex overview for reports \\
+1 & 08/30 & Advising Meetings & 2 & Stat Review Content acknowledgement, Latex overview for reports \\
 \hline
 2 & 09/02 & Reporting & 3 & First applications of Latex for final report, created Timesheet System. \\
 \hline
 2 & 09/02 & Research & 2 & Stat Review: Sample Space through Probability Density Functions \\
 \hline
 2 & 09/06 & Advising Meetings & 1 & Research Review and exploration of PDF expected values and confidence intervals \\
 \hline
 4 & 09/19 & Research & 2 & Producing Confidence Intervals \\
 \hline
 4 & 09/20 & Research & 1 & Statistical Inference and t-testing \\
 \hline
 4 & 09/20 & Advising Meetings & 1 & Stat Review finalization, definition of reporting standard \\
 \hline
 5 & 09/23 & Research & 2 & Parametric and Non-parametric tests \\
 \hline
 6 & 10/03 & Reporting & 4 & Structuring stat review report \\
 \hline
 \end{tabular}
 \end{table}
-\noindent Hours for Advising Meetings: 1\\
+\noindent Hours for Advising Meetings: 4\\
-Hours for Reporting: 3\\
+Hours for Reporting: 7\\
-Hours for Research: 2\\
+Hours for Research: 7\\
-\textbf{Total Hours: 6}\\
+\textbf{Total Hours: 18}\\
 % CLOSE Timesheet
 \end{document}