\documentclass[12pt]{article}
\usepackage{blindtext}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz}
\usepackage[a4paper, total={6in, 10in}]{geometry}
\usepackage{setspace}
\setstretch{1.25}
\hyphenpenalty 1000

\begin{document}

\begin{titlepage}
\begin{center}
\vspace*{5cm}
\Large{\textbf{Implementations of Probability Theory}}\\
\rule{14cm}{0.05cm}\\
\vspace{.25cm}
\Large{Independent Study Report}\\
\large{Andrew Simonson}
\vspace*{\fill}

\large{Compiled on: \today}\\
\end{center}
\end{titlepage}
\newpage

% Table of Contents
% \large{Table of Contents}
\tableofcontents
\addtocontents{toc}{~\hfill\textbf{Page}\par}
\newpage

% Begin report
\section{Objective}
The educational focus of Implementations of Probability Theory is the application of data models that produce non-deterministic insights through probabilistic methodology. Through this study I hope to gain a deeper understanding of how to apply data to risk calculation in mitigation scenarios as they appear in real life, rather than under the experimental lab conditions that enable algorithmic certainty. In contrast to the path of black-box artificial intelligence and algorithms taught in \textbf{CSCI 335: Machine Learning}, this study is tailored to methods designed to express confidence levels for uncertain events in certain terms, leveraging logical, traceable, and definite calculations.

Current course offerings in the realm of data science focus largely on the storage and management of data; notably, the data science cluster was until very recently branded as data management. Implementations of Probability Theory is intended to extend the learning of previous courses, notably \textbf{CSCI 420: Principles of Data Mining}, toward more advanced algorithms used at the intersection of data and computing after the preprocessing stage. After beginning this study, the intended deliverable outline was determined to be technically infeasible and has been replaced with demonstrations of applied algorithms. Taking inspiration from the retinal mosaic as displayed in \textbf{CSCI 431: Intro to Computer Vision} and discussion in \textbf{IGME 589: Computational Creativity and Algorithmic Art} on the appearance and nature of randomness in graphics, I hope to create a program that can determine the likelihood that randomly distributed colors on a hexagonal grid appear as they do in an image.
\newpage

\section{Units}
\rule{14cm}{0.05cm}

\subsection{Unit 1: Statistics Review}
To ensure a strong statistical foundation for future work in probabilistic models, the first objective was to create a document outlining and defining key topics that are prerequisites for probability in statistics or for understanding general analytical models.

\subsubsection{Random Variables}
\begin{enumerate}
\item \textbf{Discrete Random Variables - }values are selected by chance from a countable (including countably infinite) list of distinct values
\item \textbf{Continuous Random Variables - }values are selected by chance from an uncountable number of possible values within a range
\end{enumerate}

\subsubsection{Sample Space}
A sample space is the set of all possible outcomes of an experiment. For a roll of a six-sided die, the die may land with 1 through 6 dots facing upwards, hence:
\[S = \{1, 2, 3, 4, 5, 6\} \quad\text{where }S\text{ is the sample space}\]
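To make this concrete, the following minimal Python sketch simulates repeated selections of a discrete random variable from this sample space (an illustrative aside, not a study deliverable; the helper name \texttt{roll\_die} is hypothetical):
\begin{verbatim}
# Simulate a discrete random variable over the die's sample space.
import random
from collections import Counter

SAMPLE_SPACE = [1, 2, 3, 4, 5, 6]

def roll_die():
    """Select one outcome from the sample space by chance."""
    return random.choice(SAMPLE_SPACE)

TRIALS = 100_000
counts = Counter(roll_die() for _ in range(TRIALS))
frequencies = {face: counts[face] / TRIALS for face in SAMPLE_SPACE}

# Each empirical frequency approaches 1/6, and together they sum
# to 1, mirroring the normalization axiom P(S) = 1 covered below.
print(frequencies)
print(sum(frequencies.values()))  # 1.0
\end{verbatim}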
\subsubsection{Probability Axioms}
There are three probability axioms:
\begin{enumerate}
\item \textbf{Non-negativity}:
\[ P(A) \geq 0 \quad \text{for any event }A, \ P(A) \in \mathbb{R} \]
No event can be less likely to occur than an impossible event (\(P(A) = 0\)), and \(P(A)\) is a real number. Paired with axioms 2 and 3, we can also conclude that \(P(A) \leq 1\).
\item \textbf{Normalization}:
\[ P(S) = 1\quad\text{where }S\text{ is the sample space} \]
\textbf{Unit Measure - }All event probabilities in a sample space add up to 1. In essence, there is a 100\% chance that one of the outcomes in the sample space will occur.
\item \textbf{Additivity}:
\[ P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset \]
A union of events that are mutually exclusive (events that cannot both happen in the same instance) has a probability equal to the sum of the individual event probabilities.
\end{enumerate}

\subsubsection{Expectations and Deviation}
\begin{enumerate}
\item \textbf{Expectation - }The weighted average of the values in the sample space, each weighted by its probability:
\[E[X] = \sum_{x \in S} x \, P(X = x) \quad\text{where }E[X]\text{ is the expected value of }X\]
\item \textbf{Variance - }The spread of possible values for a random variable, calculated as:
\[\sigma^{2}=\frac{\sum(X - \mu)^{2}}{N}\]
Where \(N\) is the population size, \(\mu\) is the population average, and \(X\) is each value in the population.\\
For samples, variance is calculated with \textbf{Bessel's Correction}, which divides by \(n - 1\) rather than \(n\) to correct for the bias introduced by measuring deviations from the sample mean \(\bar{x}\) rather than the population mean:
\[s^{2}=\frac{\sum(X - \bar{x})^{2}}{n - 1}\]
\item \textbf{Standard Deviation - }The square root of the variance, giving a measure of the average distance of each data point from the mean in the same units as the data.
\[\sigma = \sqrt{\sigma^{2}}\]
\end{enumerate}

\subsubsection{Probability Functions}
Probability functions map a random variable's possible values to their likelihood.

\subsubsection*{Probability Mass Functions}
Probability Mass Functions (PMFs) map discrete random variables. For example, a six-sided die roll creates a uniform PMF:
\begin{equation*}
P(X = x) =
\begin{cases}
1/6\qquad\text{if }&x=1\\
1/6&x=2\\
1/6&x=3\\
1/6&x=4\\
1/6&x=5\\
1/6&x=6\\
\end{cases}
\end{equation*}

\subsubsection*{Probability Density Functions}
Probability Density Functions (PDFs) map continuous random variables. Because a continuous random variable takes any single exact value with probability zero, a PDF describes relative likelihood, and probabilities are recovered by integrating the density over an interval. For example, this triangular PDF rises linearly to a peak at \(X = 0.5\), falls back to zero, and integrates to 1 over its support:
\begin{equation*}
f(X) =
\begin{cases}
4X\qquad\quad\text{if }&0\leq X\leq 0.5\\
-4X+4&0.5< X\leq 1\\
0&\text{otherwise}\\
\end{cases}
\end{equation*}

\newpage
\subsection{Unit 3: Bayesian Statistics}

\subsubsection{Bayes Theorem}\label{Bayes Theorem}
Bayes Theorem calculates the probability of an event \(A\) given observed evidence \(E\):
\[ P(A|E) = \frac{P(E|A)\,P(A)}{P(E)} \]
As a worked example, consider a cancer screening where 10\% of patients have the disease, the test correctly flags 95\% of sick patients, and it incorrectly flags 5\% of healthy patients. Out of every 1000 patients, \(100 \times 0.95 = 95\) are true positives and \(900 \times 0.05 = 45\) are false positives:
\vskip 2pt
\begin{center}
\begin{tikzpicture}
% 1000 patients drawn to scale; the TP and FP boxes are the positive tests
\draw[gray, thin] (0, 0) rectangle (10, 0.6);
\draw[black, thick, fill=gray!30] (0, 0) rectangle (0.95, 0.6);
\node[above] at (0.475, 0.6) {TP 95/1000};
\draw[black, thick, fill=gray!60] (0.95, 0) rectangle (1.4, 0.6);
\node[below] at (1.4, -0.05) {FP 45/1000};
\node[below] at (5.5, -0.05) {1000 patients};
\end{tikzpicture}
\end{center}
\vskip 2pt
Using this visual, where TP represents true positives and FP represents false positives, Bayes Theorem is simply expressed as:
\[ P(A|E) = \frac{TP}{TP + FP} = \frac{\frac{95}{1000}}{\frac{95}{1000} + \frac{45}{1000}} \approx 67.9\% \]
Meaning that, given a random positive test, there is a 67.9\% chance of the patient actually having cancer. This percentage visually tracks with the graphic, as the TP box is roughly twice the size of the FP box, giving about a two-thirds chance of a positive patient being a true positive.
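The same arithmetic can be written as a minimal Python sketch (the function name and parameters are illustrative, not part of the unit deliverables):
\begin{verbatim}
# Posterior probability of disease given a positive test,
# computed directly from the screening example's rates.
def posterior_given_positive(prior, sensitivity, fp_rate):
    # 0.10 * 0.95 = 95/1000 true positives
    true_positives = prior * sensitivity
    # 0.90 * 0.05 = 45/1000 false positives
    false_positives = (1 - prior) * fp_rate
    return true_positives / (true_positives + false_positives)

print(posterior_given_positive(0.10, 0.95, 0.05))  # 0.67857...
\end{verbatim}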
\subsubsection{Bayesian Updating}
Bayesian Updating is another term that has entered buzzword vocabulary to describe a process that is not directly related to Bayesian Statistics, but appears to have been rediscovered by academia through the study of applied Bayes Theorem. In essence, Bayesian Updating states that observed occurrences should not override previous evidence; instead, new observations should be added to the existing evidence in equal weight (equal weighting being a naive assumption). This evidence updating lets applications of Bayes Theorem recalculate posterior probabilities continuously as new information enters the system, rather than performing the calculation only once.

\subsubsection{Bayesian Belief Networks}
Bayesian Belief Networks are probabilistic graphical models that preserve conditional dependence between random variables. In spite of their name, Bayesian Belief Networks do not necessarily apply Bayesian models, though they are a way to utilize Bayes Theorem in domains with greater complexity than a single posterior probability. In this type of network, edges are directed and the structure is traversed in a single direction. This is in contrast to undirected Hidden Markov Models (to be covered in the next unit), which do not assume an order of acquisition of the random variables. While it may not be practical to calculate the full conditional probability of a variable, Bayesian Belief Networks allow us to identify conditionally dependent variables that are weighted on the basis of an earlier random variable.

Following the example in the Bayes Theorem section of this report (\ref{Bayes Theorem}), suppose that a patient with a positive test takes a hypothetical second test whose results are partially dependent on the first, as the two tests measure overlapping biological markers. In this case, the result of the first test is relevant to the second test:
\vskip 5pt
\begin{center}
\begin{tikzpicture}
\draw[black, thick] (-2, 4.5) rectangle (2, 5.5);
\node at (0, 5) (bio) {Biological Markers};
\draw[black, thick] (-1.5, 3) circle (0.75);
\node at (-1.5, 3) (T1) {Test 1};
\draw[black, thick] (1.5, 3) circle (0.75);
\node at (1.5, 3) (T2) {Test 2};
\draw[black, thick] (-2, 0) rectangle (2, 1);
\node at (0, 0.5) (DepRes) {Dependent Results};
% Draw arrows from the bottom of the circles to the top of the rectangle
\draw[->] (T1.south) -- (DepRes.north);
\draw[->] (T2.south) -- (DepRes.north);
\draw[->] (bio.south) -- (T1.north);
\draw[->] (bio.south) -- (T2.north);
\end{tikzpicture}
\end{center}
\vskip 5pt
\begin{center}
\vskip 5pt
\begin{tabular}{| c | c | c |}
\hline
Test 1 Result & Test 2 Result & \(P(\text{Cancer})\) \\
\hline\hline
\multicolumn{3}{| c |}{Prior beliefs of test 1} \\
\hline
Unknown & Unknown & 10\% \\
Positive & Unknown & 67.857\% \\
Negative & Unknown & 0.581\% \\
\hline
\multicolumn{3}{| c |}{Prior beliefs of test 2} \\
\hline
Unknown & Positive & 55\% \\
Unknown & Negative & 1\% \\
\hline
\multicolumn{3}{| c |}{Dependent results from both tests} \\
\hline
Positive & Positive & 75\% \\
Positive & Negative & 1.5\% \\
Negative & Positive & 0.6\% \\
Negative & Negative & 0.087\% \\
\hline
\end{tabular}
\end{center}
Note that the probability given positive results on both tests (each of which individually makes a positive more than 50\% likely to be a true positive) is only as certain as two positives from two completely independent tests whose positives are 50\% likely to be true. If the partial dependence were not included in the calculation, as would occur in a Naive Bayes model, the model's stated accuracy would be inflated, as shown in the sketch below.
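The inflation can be demonstrated with a minimal Python sketch of naive updating in odds form (the helper functions are hypothetical; only the 10\%, 67.857\%, 55\%, and 75\% figures come from the table above):
\begin{verbatim}
# Contrast naive (independence-assuming) Bayesian updating
# with the partially dependent 75% tabulated above.
def odds(p):
    return p / (1 - p)

def prob(o):
    return o / (1 + o)

PRIOR = 0.10             # prior belief of cancer
AFTER_TEST_1 = 0.67857   # posterior after a positive test 1 alone
AFTER_TEST_2 = 0.55      # posterior after a positive test 2 alone

# Likelihood ratio implied by each positive result on its own.
lr1 = odds(AFTER_TEST_1) / odds(PRIOR)   # = 19.0
lr2 = odds(AFTER_TEST_2) / odds(PRIOR)   # = 11.0

# Naive updating multiplies the ratios as if the tests were
# independent, overstating the combined certainty.
naive_posterior = prob(odds(PRIOR) * lr1 * lr2)
print(round(naive_posterior, 4))  # ~0.9587 vs the dependent 75%
\end{verbatim}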
\newpage
\subsection{Unit 4: Markov Chains}

\end{document}