unit 1 report

2024-10-03 23:47:33 -04:00
parent 549fa862a4
commit ead29fd637
6 changed files with 187 additions and 9 deletions

9
notes.md Normal file

@@ -0,0 +1,9 @@
## Assumptions and Learnings
- Expected value of a probability density function
- A confidence interval is not my value * confidence; it is a range built so that intervals constructed this way contain the true value at the stated confidence level
- I've made some mistakes in stat review, looking at narrow topics before covering broader parent topics. Should reorganize structure (tree, not list?)
- t-test, z-test: both are hypothesis tests
- The t-test is used when the population variance is unknown, or the sample size is small (n < 30)
- The z-test is used when the population variance (σ²) is known *and* the sample size is large (n ≥ 30)
- A z-distribution table is built by evaluating the standard normal CDF at a range of z-scores and tabulating the results.

BIN
report/report.pdf Normal file

Binary file not shown.

149
report/report.tex Normal file

@@ -0,0 +1,149 @@
\documentclass[12pt]{article}
\usepackage{blindtext}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[a4paper, total={6in, 10in}]{geometry}
\hyphenpenalty 1000
\begin{document}
\begin{titlepage}
\begin{center}
\vspace*{5cm}
\Large{\textbf{Implementations of Probability Theory}}\\
\rule{14cm}{0.05cm}\\ \vspace{.25cm}
\Large{Independent Study Report}\\
\large{Andrew Simonson}
\vspace*{\fill}
\large{Compiled on: \today}\\
\end{center}
\end{titlepage}
\newpage
% Table of Contents
% \large{Table of Contents}
\tableofcontents
\addtocontents{toc}{~\hfill\textbf{Page}\par}
\newpage
% Begin report
\section{Objective}
yada yada yah I started this independent study for my own selfish gain
\newpage
\section{Units}
\rule{14cm}{0.05cm}
\subsection{Unit 1: Statistics Review}
To build a strong statistical foundation for later work on probabilistic models,
the first objective was to write a document that outlines and defines the key topics that are
prerequisites for probability in statistics and for understanding general analytical models.
\subsubsection{Random Variables}
\begin{enumerate}
\item \textbf{Discrete Random Variables - }take values by chance from a countable (possibly countably infinite) set of distinct values
\item \textbf{Continuous Random Variables - }take values by chance from an uncountable set, such as an interval of real numbers
\end{enumerate}
\subsubsection{Sample Space}
A sample space is the set of all possible outcomes of an experiment. For a roll of a six-sided die,
the die may land with 1 through 6 dots facing upwards, hence:
\[S = \{1, 2, 3, 4, 5, 6\} \quad\text{where }S\text{ is the sample space}\]
\subsubsection{Probability Axioms}
There are three probability axioms:
\begin{enumerate}
\item \textbf{Non-negativity}:
\[
P(A) \geq 0 \quad \text{for any event }A, \ P(A) \in \mathbb{R}
\]
No event can be less likely to occur than an impossible event ( \(P(A) = 0\) ). \(P(A)\) is a real number.
Paired with axioms 2 and 3 we can also conclude that \(P(A) \leq 1\).
\item \textbf{Normalization}:
\[
P(S) = 1\quad\text{where }S\text{ is the sample space}
\]
\textbf{Unit Measure - } The probabilities of all outcomes in a sample space add up to 1. In essence, there is a 100\%
chance that one of the outcomes in the sample space will occur.
\item \textbf{Additivity}:
\[
P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset
\]
The union of mutually exclusive events (events that cannot both occur in the same trial) has a
probability equal to the sum of the individual event probabilities.
\end{enumerate}
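As a worked illustration (the events below are chosen for this example and are not part of the study material), consider a fair six-sided die with each face assigned probability \(1/6\). The outcomes \(\{2\}\), \(\{4\}\), and \(\{6\}\) are mutually exclusive, so applying additivity twice gives the probability of rolling an even number:
\[
P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{2}
\]
The upper bound on probabilities can be sketched the same way. An event \(A\) and its complement \(A^c\) are mutually exclusive and together make up \(S\), so
\[
1 = P(S) = P(A \cup A^c) = P(A) + P(A^c) \geq P(A)
\]
where the last step uses non-negativity of \(P(A^c)\).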
\subsubsection{Expectations and Deviation}
\begin{enumerate}
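\item \textbf{Expectation - }For a discrete random variable \(X\), the average of its possible values weighted by their probabilities
\[E[X] = \sum_{x \in S} x \, P(X = x) \quad\text{where }E[X]\text{ is the expected value of }X\]
\item \textbf{Variance - }The spread of possible values for a random variable, measured as the expected squared distance from the expectation
\[V = E\left[(X - E[X])^2\right]\]
\item \textbf{Standard Deviation - }The square root of the variance, expressed in the same units as the random variable
\[\sigma = \sqrt{V}\quad\text{where variance is }V\]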
\end{enumerate}
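These quantities can be worked through for a fair six-sided die, assuming \(X\) is the face shown and each face has probability \(1/6\). The variance below uses the standard identity \(V = E[X^2] - (E[X])^2\), which is an equivalent form of the definition:
\begin{align*}
E[X] &= \tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \tfrac{21}{6} = 3.5\\
E[X^2] &= \tfrac{1}{6}(1 + 4 + 9 + 16 + 25 + 36) = \tfrac{91}{6}\\
V &= \tfrac{91}{6} - \left(\tfrac{7}{2}\right)^2 = \tfrac{182}{12} - \tfrac{147}{12} = \tfrac{35}{12} \approx 2.92\\
\sqrt{V} &= \sqrt{\tfrac{35}{12}} \approx 1.71
\end{align*}
Note that the expected value 3.5 is not itself a possible roll; it is the long-run average.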
\subsubsection{Probability Functions}
Probability functions describe how likely a random variable is to take each of its possible values.
\subsubsection*{Probability Mass Functions}
Probability Mass Functions (PMFs) describe discrete random variables by giving the probability of each possible value.
For example, a roll of a fair six-sided die has a uniform PMF:
\begin{equation*}
P(X = x) =
\begin{cases}
1/6 & \text{if } x \in \{1, 2, 3, 4, 5, 6\}\\
0 & \text{otherwise}
\end{cases}
\end{equation*}
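As a brief check on the die example (the event \(X \leq 2\) is chosen here purely for illustration), the PMF values sum to 1 over the sample space, and the probability of an event is the sum of the PMF over the values it contains:
\begin{align*}
\sum_{x=1}^{6} P(X = x) &= 6 \cdot \tfrac{1}{6} = 1\\
P(X \leq 2) &= P(X = 1) + P(X = 2) = \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{3}
\end{align*}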
\subsubsection*{Probability Density Functions}
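Probability Density Functions (PDFs) describe continuous random variables. A PDF gives a density rather than a probability:
the probability that \(X\) falls in a range is the area under the PDF over that range, so the total area must equal 1.
For example, this is a triangular PDF on \([0, 1]\) that peaks at \(x = 0.5\):
\begin{equation*}
f(x) =
\begin{cases}
4x & \text{if } 0\leq x\leq 0.5\\
4(1-x) & \text{if } 0.5<x\leq 1\\
0 & \text{otherwise}
\end{cases}
\end{equation*}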
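As a separate and simpler illustration (an assumed example, not the density shown above), take the uniform density \(g(x) = 1\) for \(0 \leq x \leq 1\) and \(g(x) = 0\) elsewhere. Probabilities and expectations are then integrals of the density:
\begin{align*}
\int_{0}^{1} g(x)\,dx &= 1 \qquad\text{(total probability)}\\
P(0.2 \leq X \leq 0.5) &= \int_{0.2}^{0.5} 1\,dx = 0.3\\
E[X] &= \int_{0}^{1} x \cdot 1\,dx = \tfrac{1}{2}
\end{align*}
Because probability is an area, any single point has probability zero, for example \(P(X = 0.3) = 0\).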
\subsubsection{Limit Theorems}
\subsubsection*{Law of Large Numbers}
The Law of Large Numbers states that as the number of independent random samples increases, the sample
mean approaches the true mean \(\mu\) of the population.
\[\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_{i} \rightarrow \mu \qquad\text{as }n \rightarrow \infty\]
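
One way to see why this holds is sketched below, under the added assumption (introduced for this illustration) that the samples \(X_1, \dots, X_n\) are independent and share a finite variance \(\sigma^2\). The variance of the sample average \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i\) then shrinks as \(n\) grows:
\[
\operatorname{Var}(\bar{X}_n) = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^2}{n} \rightarrow 0 \qquad\text{as }n \rightarrow \infty
\]
For rolls of a fair six-sided die, \(\sigma^2 = 35/12\), so the standard deviation of the average is roughly \(\sqrt{35/1200} \approx 0.171\) for \(n = 100\) rolls and roughly \(0.017\) for \(n = 10{,}000\) rolls. The average therefore concentrates ever more tightly around the true mean of 3.5.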
\subsubsection*{Central Limit Theorem}
The Central Limit Theorem states that the sampling distribution of a sample mean approaches a normal distribution as the sample size \(n\) grows, even when the
population distribution is not normal, provided the population has a finite variance.
\[
\frac{\sqrt{n} \left( \bar{X}_n - \mu \right)}{\sigma} \xrightarrow{d} N(0, 1),
\]
where \( \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \) is the sample mean, \( \mu \) and \( \sigma \) are the population mean and standard deviation, and \( N(0, 1) \) is a standard normal distribution.
This is challenging to understand solely as an equation. As an example, take a sample of two six-sided dice rolls and average their numbers.
With two rolls the sample averages already cluster around 3.5, the expected value of a single roll. As the number of rolls in each sample grows, the distribution of the sample averages will more closely resemble a normal distribution centered on 3.5.
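
The theorem can also be used for a rough calculation. As a sketch under assumed numbers (the sample size of 50 and the threshold of 4 are chosen only for illustration), consider the average \(\bar{X}_{50}\) of 50 rolls of a fair six-sided die, for which the population mean is \(\mu = 3.5\) and the population standard deviation is \(\sigma = \sqrt{35/12} \approx 1.708\). Standardizing the threshold gives
\[
z = \frac{\sqrt{50}\,(4 - 3.5)}{1.708} \approx 2.07
\]
so that, treating the standardized average as approximately standard normal,
\[
P(\bar{X}_{50} > 4) \approx P(Z > 2.07) \approx 0.02
\]
This is an approximation only; it ignores the discreteness of dice rolls.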
\subsubsection{Confidence}
Confidence is described using a confidence interval, which is a range of values computed from a sample that is expected to contain the true value, and its associated confidence level,
which is the proportion (expressed as a percentage) of intervals constructed this way, over repeated sampling, that would contain the true value.
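
As a minimal sketch, assuming a known population standard deviation \(\sigma\) and a sample large enough for the normal approximation to apply, a 95\% confidence interval for the population mean can be written as
\[
\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}
\]
where \(\bar{x}\) is the observed sample mean and \(n\) is the sample size. With hypothetical data (invented for this illustration) of \(n = 100\) die rolls averaging \(\bar{x} = 3.4\), and \(\sigma \approx 1.708\) for a fair die:
\[
3.4 \pm 1.96 \cdot \frac{1.708}{\sqrt{100}} \approx 3.4 \pm 0.33, \qquad\text{or roughly } (3.07,\ 3.73)
\]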
% Confidence intervals can be calculated with z-tests, t-tests. Go into parametric vs non-parametric
\subsubsection{Statistical Inference}
Statistical Inference is any data analysis that uses a sample to draw conclusions about the population it came from.
Methods include estimation via averages and confidence intervals, and hypothesis testing, which attempts to invalidate (never \textit{validate}) a hypothesis.
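
A hypothesis test can be sketched with a z-test, assuming a known population standard deviation and hypothetical data invented for this illustration. Suppose the hypothesis to be tested is that a die is fair, \(H_0: \mu = 3.5\), and that \(n = 100\) rolls give a sample mean of \(\bar{x} = 3.9\). Using \(\sigma \approx 1.708\) for a fair die, the test statistic is
\[
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{3.9 - 3.5}{1.708 / 10} \approx 2.34
\]
The two-sided probability of a result at least this extreme under \(H_0\) is about \(2 \cdot P(Z > 2.34) \approx 0.02\), which is below the conventional 5\% threshold, so \(H_0\) would be rejected. Had the probability been above the threshold, the test would only have failed to reject \(H_0\); it would not have shown the die to be fair.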
\end{document}


@@ -1,4 +1,10 @@
Week,Date,Type,Duration (Hours),Description Week,Date,Type,Duration (Hours),Description
1,08/30,Advising Meetings,1,"Stat Review Content acknowledgement, Latex overview for reports" 1,08/30,Advising Meetings,2,"Stat Review Content acknowledgement, Latex overview for reports"
2,09/02,Reporting,3,"First applications of Latex for final report, created Timesheet System." 2,09/02,Reporting,3,"First applications of Latex for final report, created Timesheet System."
2,09/02,Research,2,"Stat Review: Sample Space through Probability Density Functions" 2,09/02,Research,2,"Stat Review: Sample Space through Probability Density Functions"
2,09/06,Advising Meetings,1,"Research Review and exploration of PDF expected values and confidence intervals"
4,09/19,Research,2,"Producing Confidence Intervals"
4,09/20,Research,1,"Statistical Inference and t-testing"
4,09/20,Advising Meetings,1,"Stat Review finalization, definition of reporting standard"
5,09/23,Research,2,"Parametric and Non-parametric tests"
6,10/03,Reporting,4,"Structuring stat review report"

Binary file not shown.


@@ -1,13 +1,15 @@
\documentclass{article} \documentclass[12pt]{article}
\usepackage{blindtext} \usepackage{blindtext}
\usepackage[a4paper, total={6in, 8in}]{geometry} \usepackage[a4paper, total={6in, 10in}]{geometry}
\nofiles \nofiles
\hyphenpenalty 1000
\begin{document} \begin{document}
\begin{titlepage} \begin{titlepage}
\begin{center} \begin{center}
\vspace*{5cm}
\Large{\textbf{Implementations of Probability Theory}}\\ \Large{\textbf{Implementations of Probability Theory}}\\
\rule{14cm}{0.05cm}\\ \vspace{.5cm} \rule{14cm}{0.05cm}\\ \vspace{.5cm}
@@ -28,18 +30,30 @@
\hline \hline
Week & Date & Type & Duration (Hours) & Description \\ Week & Date & Type & Duration (Hours) & Description \\
\hline \hline
1 & 08/30 & Advising Meetings & 1 & Stat Review Content acknowledgement, Latex overview for reports \\ 1 & 08/30 & Advising Meetings & 2 & Stat Review Content acknowledgement, Latex overview for reports \\
\hline \hline
2 & 09/02 & Reporting & 3 & First applications of Latex for final report, created Timesheet System. \\ 2 & 09/02 & Reporting & 3 & First applications of Latex for final report, created Timesheet System. \\
\hline \hline
2 & 09/02 & Research & 2 & Stat Review: Sample Space through Probability Density Functions \\ 2 & 09/02 & Research & 2 & Stat Review: Sample Space through Probability Density Functions \\
\hline \hline
2 & 09/06 & Advising Meetings & 1 & Research Review and exploration of PDF expected values and confidence intervals \\
\hline
4 & 09/19 & Research & 2 & Producing Confidence Intervals \\
\hline
4 & 09/20 & Research & 1 & Statistical Inference and t-testing \\
\hline
4 & 09/20 & Advising Meetings & 1 & Stat Review finalization, definition of reporting standard \\
\hline
5 & 09/23 & Research & 2 & Parametric and Non-parametric tests \\
\hline
6 & 10/03 & Reporting & 4 & Structuring stat review report \\
\hline
\end{tabular} \end{tabular}
\end{table} \end{table}
\noindent Hours for Advising Meetings: 1\\ \noindent Hours for Advising Meetings: 4\\
Hours for Reporting: 3\\ Hours for Reporting: 7\\
Hours for Research: 2\\ Hours for Research: 7\\
\textbf{Total Hours: 6}\\ \textbf{Total Hours: 18}\\
% CLOSE Timesheet % CLOSE Timesheet
\end{document} \end{document}