diff --git a/report/report.pdf b/report/report.pdf
index 62f0cb8..da2cf05 100644
Binary files a/report/report.pdf and b/report/report.pdf differ
diff --git a/report/report.tex b/report/report.tex
index 438381d..4bd5f91 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -84,9 +84,13 @@ There are three probability axioms:
 \begin{enumerate}
-\item \textbf{Expectation - }The weighted average of the probabilities in the sample space
-\[\sum_{}^{S}{P(A) * A} = E \quad\text{where }E\text{ is the expected value}\]
-\item \textbf{Variance - }The spread of possible values for a random variable
-\item \textbf{Standard Deviation - }something
-\[std = \sqrt{V}\quad\text{where variance is }V\]
+\item \textbf{Expectation - }The average of the possible values of a random variable \(X\), weighted by their probabilities
+\[E = \sum_{x \in S} x \cdot P(X = x) \quad\text{where }E\text{ is the expected value and }S\text{ is the sample space}\]
+\item \textbf{Variance - }The spread of possible values of a random variable around its mean, calculated for a population as:
+\[\sigma^{2}=\frac{\sum(X - \mu)^{2}}{N}\]
+Where \(N\) is the population size, \(\mu\) is the population mean, and \(X\) is each value in the population.\\
+For a sample of size \(n\), variance is calculated with \textbf{Bessel's Correction}, dividing by \(n - 1\) instead of \(n\). Deviations from the sample mean \(\bar{x}\) are on average smaller than deviations from the true mean \(\mu\), so dividing by \(n\) would underestimate the population variance:
+\[s^{2}=\frac{\sum(X - \bar{x})^{2}}{n - 1}\]
+\item \textbf{Standard Deviation - }The square root of the variance, giving the typical distance of data points from the mean in the same units as the data:
+\[\sigma = \sqrt{\sigma^{2}}\quad\text{(population)}, \qquad s = \sqrt{s^{2}}\quad\text{(sample)}\]
 \end{enumerate}
 
 \subsubsection{Probability Functions}
@@ -128,11 +132,9 @@ means will approach the true mean of the population.
 
-The Central Limit Theorem states that the sampling distribution of a sample mean is a normal distribution even when the population distribution is not normal.
+The Central Limit Theorem states that the sampling distribution of a sample mean approaches a normal distribution as the sample size grows, even when the population distribution is not normal.
 \[
-\frac{\sqrt{n} \left( \bar{X}_n - \mu \right)}{\sigma} \xrightarrow{d} N(0, 1),
-\]
-\[
-\text{Where \( \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n}\), \( X_i \) is the sample mean, and \( N(0, 1) \) is a standard normal distribution.}
+\frac{\sqrt{n} \left( \bar{X}_n - \mu \right)}{\sigma} \xrightarrow{d} N(0, 1)
 \]
+Where \(X_1, \dots, X_n\) are independent observations drawn from a population with mean \(\mu\) and finite standard deviation \(\sigma\), \(\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n}X_i\) is the sample mean, \(\xrightarrow{d}\) denotes convergence in distribution as \(n \to \infty\), and \(N(0, 1)\) is a standard normal distribution.\\
-This is a challenging to understand solely as an equation.
+This is challenging to understand solely as an equation.
 As an example, take a sample of two six-sided dice rolls and average their numbers.
-The more sample averages taken, the more they will resemble a normal distribution where the majority of samples average around 3.
+Repeating this many times, the averages cluster around the true mean of 3.5, and as more dice are rolled per sample, the distribution of the averages looks increasingly like a normal distribution.
@@ -140,6 +142,13 @@ The more sample averages taken, the more they will resemble a normal distributio
 
 Confidence is described using a confidence interval, which is a range of values that the true value is expected to be in, and its associated confidence level, which is a probability (expressed as a percentage) that the true value is in the confidence interval.
 
+It is important to note that a confidence level such as 95\% does not mean that the true value is within 5\% of the point estimate. More precisely, the confidence level describes the method: if many samples were drawn
+and a confidence interval calculated from each, about 95\% of those intervals would contain the true value.
+
+At the highest level, a confidence interval is the observed statistic (generally the mean) plus or minus a margin of error, which is a critical value (for example, \(z \approx 1.96\) for 95\% confidence with a normal sampling distribution) multiplied by the standard error.
+
+The standard error of the mean is estimated by dividing the sample standard deviation by the square root of the sample size: \(SE = \frac{s}{\sqrt{n}}\).
+
 % Confidence intervals can be calculated with z-tests, t-tests. Go into parametric vs non-parametric
 
 \subsubsection{Statistical Inference}
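The variance, standard error, and dice-averaging ideas in this section can be checked numerically. The following is a minimal Python sketch, not part of the report: the helper names (`population_variance`, `sample_variance`, `standard_error`, `confidence_interval_95`, `two_dice_average_distribution`) and the example data are hypothetical, and the 95% interval assumes a normal sampling distribution with critical value 1.96 (small samples would usually use a t critical value instead). The hand-written variance formulas are cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics
from collections import Counter
from fractions import Fraction
from itertools import product


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N, dividing by the population size N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """s^2 = sum((X - x_bar)^2) / (n - 1), i.e. with Bessel's correction."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def standard_error(values):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return math.sqrt(sample_variance(values)) / math.sqrt(len(values))


def confidence_interval_95(values, critical_value=1.96):
    """Mean +/- critical value * standard error.

    Assumption: 1.96 is the normal (z) critical value for 95% confidence;
    a t critical value would be more appropriate for small samples.
    """
    mean = sum(values) / len(values)
    margin = critical_value * standard_error(values)
    return mean - margin, mean + margin


def two_dice_average_distribution():
    """Exact distribution of the average of two fair six-sided dice (36 outcomes)."""
    counts = Counter(Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2))
    return {avg: Fraction(c, 36) for avg, c in sorted(counts.items())}


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data

# Cross-check the hand-written formulas against the standard library.
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))

print(population_variance(data))  # 4.0
print(math.sqrt(population_variance(data)))  # 2.0
print(confidence_interval_95(data))

dist = two_dice_average_distribution()
expected = sum(avg * p for avg, p in dist.items())  # expectation of the average
print(expected)  # 7/2
print(max(dist, key=dist.get))  # 7/2 (the most likely average)
```

Dividing by n - 1 gives a sample variance of about 4.57 for this data, compared with 4.0 from the population formula, and the exact distribution of two-dice averages is symmetric with its peak at 3.5.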