\documentclass[12pt]{article}
\usepackage{blindtext}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz}
\usetikzlibrary{arrows, automata, positioning}
\usepackage[a4paper, total={6in, 10in}]{geometry}
\usepackage{setspace}
\setstretch{1.25}
\hyphenpenalty 1000

\begin{document}
\begin{titlepage}
\begin{center}
\vspace*{5cm}
\Large{\textbf{Implementations of Probability Theory}}\\
\rule{14cm}{0.05cm}\\
\vspace{.25cm}
\Large{Independent Study Report}\\
\large{Andrew Simonson}
\vspace*{\fill}
\large{Compiled on: \today}\\
\end{center}
\end{titlepage}
\newpage
% Table of Contents
% \large{Table of Contents}
\tableofcontents
\addtocontents{toc}{~\hfill\textbf{Page}\par}
\newpage
% Begin report
\section{Objective}
\rule{14cm}{0.05cm}

The educational focus of Implementations of Probability Theory is the application of data models that produce non-deterministic insights through probabilistic methods. By pursuing this study I hope to gain a deeper understanding of how to apply data in risk calculation for mitigation scenarios as they appear in real life, rather than under the experimental lab conditions that enable algorithmic certainty. In contrast to the path of black-box artificial intelligence and algorithms taught in \textbf{CSCI 335: Machine Learning}, this study is tailored to methods designed to produce confidence levels for uncertain events using certain terms, leveraging logical, traceable, and definite calculations. Current course offerings in the realm of data science focus largely on the storage and management of data; notably, the data science cluster was until very recently branded as data management. Implementations of Probability Theory is intended to extend what was learned in previous courses, notably \textbf{CSCI 420: Principles of Data Mining}, toward more advanced algorithms used at the intersection of data and computing after the preprocessing stage.
After beginning this study, the intended deliverable outline was determined to be technically implausible and has been replaced with demonstrations of applied algorithms. Taking inspiration from the retinal mosaic as displayed in \textbf{CSCI 431: Intro to Computer Vision} and discussion in \textbf{IGME 589: Computational Creativity and Algorithmic Art} on the appearance and nature of randomness in graphics, I hope to create a program that can determine the likelihood that randomly distributed colors on a hexagonal grid appear as they do in an image.
\newpage
\section{Units}
\rule{14cm}{0.05cm}
\subsection{Unit 1: Statistics Review}
To ensure a strong statistical foundation for the later work on probabilistic models, the first objective was to create a document outlining and defining key topics that are prerequisites for probabilities in statistics or for understanding generic analytical models.
\subsubsection{Random Variables}
\begin{enumerate}
\item \textbf{Discrete Random Variables - }values are selected by chance from a countable (including countably infinite) list of distinct values
\item \textbf{Continuous Random Variables - }values are selected by chance from an uncountable set of values within the variable's range
\end{enumerate}
\subsubsection{Sample Space}
A sample space is the set of all possible outcomes of an instance. For a roll of a six-sided die, the die may land with 1 through 6 dots facing upwards, hence:
\[S = \{1, 2, 3, 4, 5, 6\} \quad\text{where }S\text{ is the sample space}\]
\subsubsection{Probability Axioms}
There are three probability axioms:
\begin{enumerate}
\item \textbf{Non-negativity}: \[ P(A) \geq 0 \quad \text{for any event }A, \ P(A) \in \mathbb{R} \] No event can be less likely to occur than an impossible event ( \(P(A) = 0\) ). \(P(A)\) is a real number. Paired with axioms 2 and 3 we can also conclude that \(P(A) \leq 1\).
\item \textbf{Normalization}: \[ P(S) = 1\quad\text{where }S\text{ is the sample space} \] \textbf{Unit Measure - } The probabilities of all outcomes in a sample space add up to 1. In essence, there is a 100\% chance that one of the outcomes in the sample space will occur.
\item \textbf{Additivity}: \[ P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset \] A union of events that are mutually exclusive (events that cannot both happen for an instance) has a probability that is the sum of the associated event probabilities.
\end{enumerate}
\subsubsection{Expectations and Deviation}
\begin{enumerate}
\item \textbf{Expectation - }The average of the values in the sample space, weighted by their probabilities \[E = \sum_{x \in S}{P(X = x) \cdot x} \quad\text{where }E\text{ is the expected value}\]
\item \textbf{Variance - }The spread of possible values for a random variable, calculated as: \[\sigma^{2}=\frac{\sum(X - \mu)^{2}}{N}\] Where \(N\) is the population size, \(\mu\) is the population average, and \(X\) is each value in the population.\\ For samples, variance is calculated with \textbf{Bessel's Correction}, which divides by \(n - 1\) instead of \(n\). This increases the estimate to offset the bias introduced by measuring deviations from the sample mean \(\bar{x}\) rather than the population mean: \[s^{2}=\frac{\sum(X - \bar{x})^{2}}{n - 1}\]
\item \textbf{Standard Deviation - }The square root of the variance, giving a measure of the typical distance of each data point from the mean in the same units as the data. \[\sigma = \sqrt{V}\quad\text{where variance is }V\]
\end{enumerate}
\subsubsection{Probability Functions}
Probability functions describe how likely a random variable is to take each of its possible values.
\subsubsection*{Probability Mass Functions}
Probability Mass Functions (PMFs) map discrete random variables. For example, a six-sided die roll creates a uniform random PMF.
Each side of the die has a one-sixth chance of landing face-up, so each value of \(X\) from 1 to 6 is assigned a \(\frac{1}{6}\) portion of the total probability:
\begin{equation*}
P(A) = \begin{cases}
1/6\qquad\text{if }&X=1\\
1/6&X=2\\
1/6&X=3\\
1/6&X=4\\
1/6&X=5\\
1/6&X=6\\
\end{cases}
\end{equation*}
\subsubsection*{Probability Density Functions}
Probability Density Functions (PDFs) map continuous random variables. For example, this is a PDF representing a vehicle's risk of being stranded as it travels (in a line at a fixed speed). The density increases as the vehicle puts distance between itself and the starting point but, once the halfway point is reached, the risk decreases as the distance between the vehicle and the destination decreases.
\begin{equation*}
P(A) = \begin{cases}
X\qquad\qquad\text{if }&0\leq X\leq .5\\
-X+1&.5<X\leq 1\\
\end{cases}
\end{equation*}

\subsubsection{Bayes Theorem}\label{Bayes Theorem}
\begin{center}
\textit{[Figure: test outcomes per 1000 patients. TP: 95/1000, FP: 45/1000, FN: 5/1000, TN: 855/1000]}
\end{center}
\vskip 2pt
Bayes Theorem as applied to this problem can be simply expressed as:
\[ P(\text{has cancer given positive test}) = \frac{\colorbox{blue!5}{TP}}{\colorbox{blue!5}{TP} + \colorbox{red!5}{FP}} = \frac{\colorbox{blue!5}{\(\frac{95}{1000}\)}}{\colorbox{blue!5}{\(\frac{95}{1000}\)} + \colorbox{red!5}{\(\frac{45}{1000}\)}} = 67.9\% \]
This means that, given a random positive test, there is a 67.9\% chance of the patient actually having cancer, not far off from the two-thirds visual trick.
\subsubsection{Bayesian Updating}
Bayesian Updating is another term that has been added to the buzzword vocabulary to describe a process that isn't directly related to Bayesian Statistics but appears to have been rediscovered by academia through the study of applied Bayes Theorem.
In essence, Bayesian Updating states that newly observed occurrences should not override previous evidence but should instead be added to it with equal weight (equal weight being a naive assumption). With this updating, applications of Bayes Theorem recalculate posterior probabilities continuously as new information enters the system, in contrast to a frequentist approach where the calculation is performed only once.
\subsubsection{Bayesian Belief Networks}
Bayesian Belief Networks are probabilistic graphical models that preserve conditional dependence between random variables. In spite of their name, Bayesian Belief Networks do not necessarily apply Bayesian models, though they are a way to utilize Bayes Theorem for domains with greater complexity beyond a single posterior probability. In this type of network, edges are directed and the structure is utilized in a single direction. This is in contrast to undirected graphical models, which do not assume the order of acquisition of random variables. (Hidden Markov Models, to be covered in the next unit, are usually drawn as directed models as well.) While it may not be practical to calculate the full conditional probability of a variable, Bayesian Belief Networks allow us to identify conditionally dependent variables that are weighted on the basis of an earlier random variable. Following the example in the Bayes Theorem section of this report (\ref{Bayes Theorem}), let's suppose that a patient with a positive test takes a hypothetical second test. However, the second test's results are partially dependent on the first since they measure overlapping biological markers.
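As a reference point before the second test is introduced, the single-test posteriors can be written in conditional-probability form. This is a sketch under stated assumptions: the 95\% sensitivity and 5\% false-positive rate are not quoted directly in this report but are inferred here from the counts in the Bayes Theorem example (95 true positives among the 100 patients with cancer, 45 false positives among the 900 without), and the symbols \(C\) for cancer and \(+\), \(-\) for the test result are notation assumed for this illustration.
\[ P(C \mid +) = \frac{P(+ \mid C)\,P(C)}{P(+ \mid C)\,P(C) + P(+ \mid \neg C)\,P(\neg C)} = \frac{.95 * .1}{.95 * .1 + .05 * .9} = \frac{.095}{.14} \approx 67.857\% \]
\[ P(C \mid -) = \frac{P(- \mid C)\,P(C)}{P(- \mid C)\,P(C) + P(- \mid \neg C)\,P(\neg C)} = \frac{.05 * .1}{.05 * .1 + .95 * .9} = \frac{.005}{.86} \approx 0.581\% \]
Under Bayesian Updating, either of these posteriors would serve as the prior when evidence from a further test arrives.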
\vskip 5pt
\begin{center}
\begin{tikzpicture}
\draw[black, thick] (-2, 4.5) rectangle (2, 5.5);
\node at (0, 5) (bio) {Biological Markers};
\draw[black, thick] (-1.5, 3) circle (0.75);
\node at (-1.5, 3) (T1) {Test 1};
\draw[black, thick] (1.5, 3) circle (0.75);
\node at (1.5, 3) (T2) {Test 2};
\draw[black, thick] (-2, 0) rectangle (2, 1);
\node at (0, 0.5) (DepRes) {Dependent Results};
% Draw arrows from the bottom of the circles to the top of the rectangle
\draw[->] (T1.south) -- (DepRes.north);
\draw[->] (T2.south) -- (DepRes.north);
\draw[->] (bio.south) -- (T1.north);
\draw[->] (bio.south) -- (T2.north);
\end{tikzpicture}
\end{center}
\vskip 5pt
\begin{center}
\vskip 5pt
\begin{tabular}{| c | c | c |}
\hline
Test 1 Result & Test 2 Result & P(cancer) \\
\hline\hline
\multicolumn{3}{| c |}{Prior beliefs of test 1} \\
\hline
Unknown & Unknown & 10\% \\
Positive & Unknown & 67.857\% \\
Negative & Unknown & 0.581\% \\
\hline
\multicolumn{3}{| c |}{Prior beliefs of test 2} \\
\hline
Unknown & Positive & 55\% \\
Unknown & Negative & 1\% \\
\hline
\multicolumn{3}{| c |}{Dependent results from both tests} \\
\hline
Positive & Positive & 75\% \\
Positive & Negative & 1.5\% \\
Negative & Positive & 0.6\% \\
Negative & Negative & 0.087\% \\
\hline
\end{tabular}
\end{center}
Note that this probability of positive results in both tests (which both have greater than 50\% of positives being true positives) is only as certain as two positives from two independent tests each with 50\% of positives being true. If the dependence were not included in the calculation and we ignored the fact that the tests partially measure the same thing, as would have occurred in a Naive Bayes model, the tests' combined accuracy would be unjustly inflated.
\newpage
\subsection{Unit 4: Markov Methods}
\subsubsection{Markov Chains}
Markov Chains are a form of probabilistic automaton where the likelihood of transitioning to a new state depends solely on the current state with no memory of prior states.
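In symbols, this memoryless property can be sketched as follows, where \(X_t\) denotes the state at step \(t\) and \(i\), \(j\) are individual states (notation assumed here for illustration, not used elsewhere in this report):
\[ P(X_{t+1} = j \mid X_t = i,\ X_{t-1},\ \dots,\ X_1) = P(X_{t+1} = j \mid X_t = i) \]
Conditioning on the full history gives the same answer as conditioning on the current state alone.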
For example\footnote{example sourced from:\\\url{https://towardsdatascience.com/introduction-to-markov-chains-50da3645a50d}}, suppose a weather prediction program wants to know whether tomorrow will be a sunny or cloudy day, based on the current weather. Using the current weather as a state, the program identifies that there is a 10\% chance of a sunny day transitioning into a cloudy day and a 50\% chance that a cloudy day transitions into a sunny day:
\begin{center}
\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
\node[state] (Sunny) {Sunny};
\node[state, right=of Sunny] (Cloudy) {Cloudy};
\path[->] (Sunny) edge [loop left] node {.9} (Sunny)
edge [bend right=-15] node {.1} (Cloudy)
(Cloudy) edge [loop right] node {.5} (Cloudy)
edge [bend left=15] node {.5} (Sunny);
\end{tikzpicture}
\end{center}
Note that there is no information preserved between steps. Markov Chains are memoryless, so any information that must be available to them must be expressed as the state, such as the sunny and cloudy states in the example above. Academically, this is called the \textbf{Markov Assumption}, though it is vocabulary that can easily be explained with few additional words and won't be used for the rest of this paper. One benefit of such a straightforward structure is that it enables easy calculation of the probabilities of reaching a state \(k\) steps from the current position.
By expressing the chain as a transition matrix where each row represents the current state, each column represents the next state, and each cell contains the probability of moving from the row state to the column state, we get a 1-step transition matrix:
\[ \begin{pmatrix} .9 & .1 \\ .5 & .5 \end{pmatrix} \]
or, expressed as a table:
\begin{center}
\begin{tabular}{ | c | c | c | }
\hline
Current State & Next: Sunny & Next: Cloudy \\
\hline \hline
Sunny & 90\% & 10\% \\
\hline
Cloudy & 50\% & 50\% \\
\hline
\end{tabular}
\end{center}
To turn this into a \(k\)-step transition matrix, this 1-step matrix only needs to be raised to the \(k\)-th power:
\[ \begin{pmatrix} .9 & .1 \\ .5 & .5 \end{pmatrix}^k \]
To find the probability of the weather two days from the current state, substitute \(k = 2\):
\[ \begin{pmatrix} .9 & .1 \\ .5 & .5 \end{pmatrix}^2 = \begin{pmatrix} .86 & .14 \\ .7 & .3 \end{pmatrix} \]
From this matrix we can determine that if it is currently sunny, there is an 86\% chance that it will be sunny in two days and, if it is currently cloudy, there is a 70\% chance that it will be sunny in two days. As \(k\) approaches infinity, the model approaches its equilibrium, where the starting state becomes irrelevant. In this example, any random day would be 83.333\% likely to be sunny, representative of the long-term behavior of the system (climate), so the matrix of the equilibrium looks like this:
\[\begin{pmatrix} .9 & .1 \\ .5 & .5 \end{pmatrix}^\infty \approx \begin{pmatrix} .83333 & .16667 \\ .83333 & .16667 \end{pmatrix} \text{ OR: } \begin{pmatrix} .83333 \\ .16667 \end{pmatrix} \]
\subsubsection{Hidden Markov Models}\label{HMMs}
In contrast to the visible Markov Models above, the states of a Hidden Markov Model cannot be observed directly.
The benefit of using such a model is that, given only observations of occurrences, algorithms such as the Viterbi Algorithm can determine the probability of a sequence of observations and estimate which state is active in a given instance. Reasoning backward from results to the process that produced them is reminiscent of inverse problems and many explanatory uses of data science, such as in finance where, with the benefit of hindsight, analysts work to determine why events unfolded the way they did. In addition to states, initial state probabilities, and transition probabilities, Hidden Markov Models also utilize observations and emission probabilities, the probability of an observation given the current state. Using the earlier example where states represent either a sunny or cloudy day, an observation likelihood matrix can be created for a weather sensor that can only determine if the ground is wet. On a cloudy day there is a probability of rain and thus a higher probability of the ground being wet, whereas on a sunny day the sensor would far less often be triggered, by dew or sensor tampering:
\[
\begin{array}{c c}
& \begin{array}{ccc} % Align column labels above the matrix
\text{dry} & \text{wet}
\end{array} \\ % End the first row (labels) with double backslash
\begin{array}{c} % Row labels
\text{Sunny} \\ \text{Cloudy} \\
\end{array}
&
\begin{bmatrix} % Matrix with brackets
.95 & .05 \\ .6 & .4 \\
\end{bmatrix}
\end{array}
\]
Thus, an observation sequence may look like this:
\[ [\text{Dry, Dry, Wet}] \]
In this case, it is tempting to assume that the wet signal is representative of a rainy, cloudy day. In contrast, we can only be moderately confident that the two dry days leading up to it were sunny days. Intuitively, it is most likely that there were two sunny days followed by a rainy day. By multiplying, for each day, the probability of the transition into a candidate state by the probability of that day's observation in that state, the probability of a candidate state sequence occurring is revealed.
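Written generally, this product can be sketched as below. The symbols are notation assumed here for illustration: \(s_t\) is the state on day \(t\), \(o_t\) is the observation on day \(t\), and \(T\) is the number of days. The sketch also assumes, as in the matrix above, that an observation depends only on that day's state.
\[ P(s_1, \dots, s_T,\ o_1, \dots, o_T) = P(s_1)\,P(o_1 \mid s_1) \prod_{t=2}^{T} P(s_t \mid s_{t-1})\,P(o_t \mid s_t) \]
Each day contributes one pair of factors: the probability of arriving in that day's state and the probability of that day's observation given the state.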
Below, we assume a 50-50 chance of initialization at a sunny or cloudy day:
\begin{center}
Three consecutive sunny days: \[(.5 * .95) * (.9 * .95) * (.9 * .05) \approx 0.01828 \]
Three consecutive cloudy days: \[(.5 * .6) * (.5 * .6) * (.5 * .4) = 0.018 \]
Sunny, sunny, cloudy: \[(.5 * .95) * (.9 * .95) * (.1 * .4) \approx 0.01625 \]
\end{center}
Interestingly, the calculation reveals that it is actually more probable that there was an unusual wet third day during a sunny streak than for there to have been a cloudy day following two sunny days.\footnote{I say interesting because I forgot how low I set the probability of sunny to cloudy and wholly expected the intuitive sun-sun-cloud answer to prove accurate. Math moment.} Brief sidenote: since the initial state is not known, the probability of initialization at state \(n\) is expressed in calculations as \(\pi_n\). I will not use this notation in this report because I think it is confusing and somewhat ridiculous to have mathematical notation with as ubiquitous and universally constant a meaning as \(\pi\) be used for something that has no relation to the constant. Whatever convention made this determination is seriously damaging the accessibility of mathematics for anybody shy of a walking computational index.
\subsubsection{Viterbi Algorithm}
While it is feasible to calculate the probabilities for each possible route to a series of observations, such a process has exponential time complexity. With each additional step, the number of paths to keep track of is multiplied by the number of states, which in practical terms means countless threads on each state separated only by the history of how they got there. Enter the Viterbi Algorithm, which reduces the effect of a step (or, as in our example, a new day) from an exponential relationship ( \(O(N^T)\) ) to a linear one ( \(O(N^2 T)\) ), where \(N\) is the number of states and \(T\) is the number of steps.
This is possible because the Viterbi Algorithm builds partial solutions, keeping only the most probable branch into each state instead of recomputing each exit from a state for each entry. Once a route into a state is less probable than another route into that same state at the same step, it is discarded and never extended. More intuitively, consider that there are multiple ways to reach a given state in 1 step. Once each path's probability is computed, you only need to retain the highest probability path to that state and the next step will only require calculation from that state once.\footnote{The mathematical notation to describe this algorithm is criminally challenging to parse. I want to acknowledge this video for being the only one of its kind that did not rely on the notation: \url{https://www.youtube.com/watch?v=6JVqutwtzmo}} Consider the following graphic rendition of each possible 3-day sequence of sunny vs cloudy:
\begin{center}
\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
\node[state] (Sunny1) {Sunny};
\node[state, below=of Sunny1] (Cloudy1) {Cloudy};
\node[state, right=of Sunny1] (Sunny2) {Sunny};
\node[state, below=of Sunny2] (Cloudy2) {Cloudy};
\node[state, right=of Sunny2] (Sunny3) {Sunny};
\node[state, below=of Sunny3] (Cloudy3) {Cloudy};
\node[above=of Sunny1, yshift=-1.5cm]{Day 1};
\node[above=of Sunny2, yshift=-1.5cm]{Day 2};
\node[above=of Sunny3, yshift=-1.5cm]{Day 3};
\path[->] (Sunny1) edge node {} (Sunny2)
edge node {} (Cloudy2)
(Cloudy1) edge node {} (Sunny2)
edge node {} (Cloudy2)
([yshift=1mm] Sunny2.east) edge[->] node {} ([yshift=1mm] Sunny3.west)
([yshift=-1mm] Sunny2.east) edge[->] node {} ([yshift=-1mm] Sunny3.west)
([yshift=1mm] Sunny2.east) edge[->] node {} ([yshift=1mm] Cloudy3.west)
([yshift=-1mm] Sunny2.east) edge[->] node {} ([yshift=-1mm] Cloudy3.west)
([yshift=1mm] Cloudy2.east) edge[->] node {} ([yshift=1mm] Sunny3.west)
([yshift=-1mm] Cloudy2.east) edge[->] node {} ([yshift=-1mm] Sunny3.west)
([yshift=1mm] Cloudy2.east) edge[->] node {} ([yshift=1mm] Cloudy3.west)
([yshift=-1mm] Cloudy2.east) edge[->] node {} ([yshift=-1mm] Cloudy3.west);
\end{tikzpicture}
\end{center}
Notice that there are two arrows from each day 2 state to each day 3 state because two paths were created to reach each of the day 2 states. If there were a fourth day depicted, there would be 4 calculations from each day 3 state to each day 4 state. To prevent this, the Viterbi Algorithm only preserves the most likely path to each node. For instance, there are two paths to a sunny day on day 2. Either the first day was sunny and it stayed sunny, or the first day was cloudy but transitioned to sunny the next day. Using the same \([\text{Dry, Dry, Wet}]\) observation sequence as before, the probabilities of these paths occurring can be calculated:
\begin{center}
Two consecutive sunny days: \[(.5 * .95) * (.9 * .95) = 0.406125 \]
Cloudy, sunny: \[(.5 * .6) * (.5 * .95) = 0.1425 \]
\end{center}
Hence, we can eliminate the \([\text{Cloudy, Sunny}]\) starting sequence from the most probable sequence of steps given the observations.
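The same comparison can be written out for the other three nodes. The shorthand \(V_t(s)\), for the probability of the best surviving path that ends in state \(s\) on day \(t\), is notation assumed here for illustration, with \(S\) and \(C\) abbreviating sunny and cloudy. The starting values are \(V_1(S) = .5 * .95 = .475\) and \(V_1(C) = .5 * .6 = .3\), and \(V_2(S) = 0.406125\) from the calculation above.
\begin{align*}
V_2(C) &= \max\bigl(.475 * .1 * .6,\ \ .3 * .5 * .6\bigr) = \max(0.0285,\ 0.09) = 0.09 \\
V_3(S) &= \max\bigl(.406125 * .9 * .05,\ \ .09 * .5 * .05\bigr) \approx \max(0.01828,\ 0.00225) = 0.01828 \\
V_3(C) &= \max\bigl(.406125 * .1 * .4,\ \ .09 * .5 * .4\bigr) \approx \max(0.01625,\ 0.018) = 0.018
\end{align*}
In each line the first term is the path arriving from a sunny day and the second is the path arriving from a cloudy day. In every case the winner arrives from the same state as the one it ends in, so each node keeps exactly one incoming arrow.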
Doing the same thing for the rest of the visualization leaves fewer arrows and therefore fewer calculations:
\begin{center}
\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
\node[state] (Sunny1) {Sunny};
\node[state, below=of Sunny1] (Cloudy1) {Cloudy};
\node[state, right=of Sunny1] (Sunny2) {Sunny};
\node[state, below=of Sunny2] (Cloudy2) {Cloudy};
\node[state, right=of Sunny2] (Sunny3) {Sunny};
\node[state, below=of Sunny3] (Cloudy3) {Cloudy};
\node[above=of Sunny1, yshift=-1.5cm]{Day 1};
\node[above=of Sunny2, yshift=-1.5cm]{Day 2};
\node[above=of Sunny3, yshift=-1.5cm]{Day 3};
\path[->] (Sunny1) edge node {} (Sunny2)
(Cloudy1) edge node {} (Cloudy2)
(Sunny2) edge node {} (Sunny3)
(Cloudy2) edge node {} (Cloudy3);
\path[->, draw=red] (Sunny1) edge node[midway] {\textbf{x}} (Cloudy2)
(Cloudy1) edge node[midway] {\textbf{x}} (Sunny2)
(Sunny2) edge node[midway] {\textbf{x}} (Cloudy3)
(Cloudy2) edge node[midway] {\textbf{x}} (Sunny3);
\end{tikzpicture}
\end{center}
With only two sequences remaining, the final comparison needs only to determine whether it is more likely for there to have been three consecutive sunny days or three consecutive cloudy days, which was already done in the Hidden Markov Model section (\ref{HMMs}).
\newpage
\subsection{Unit 5: Monte Carlo Simulations}
\subsubsection{How To Make a Monte Carlo Simulation}
\subsubsection{Monte Carlo Integration}
\subsubsection{Markov Chain Monte Carlo (MCMC) methods}
\newpage
\section{Applied Projects}
\rule{14cm}{0.05cm}
\subsection{Randomness of Retinal Mosaic layout}
A hexagonal grid of marbles: are the colors randomly distributed? Topics to cover: hexagonal basis vectors, retinal mosaic, entropy.
\subsection{Bayes Server Ripoff}
I planned to create a trickle-down density belief network using probability density functions as nodes that choose the direction of rows in a relational database. I found this later, and it's sort of similar:
\url{https://www.bayesserver.com/} Even better than their jank Bayesian belief network, I may be able to make mixed Bayesian/Markov chain models. This is a big project.
\subsection{Cost-Benefit Analysis of Remote Education}
This section covers a calculation I devised to make me feel better about my life decisions. The data is based on implicit guesswork and, while I will be taking it somewhat seriously for my decision to do either the online or on-campus RIT Data Science Master's Program, it should not be taken seriously as a probabilistic model. Since there is no framework for making a subjective decision weighing the potential benefits of on-campus life against the value of entering the workforce 18 months sooner, I decided to make one. Inshallah I shall reach my true potential and fulfill destiny.
\subsubsection{Selecting and Creating Key Metrics}
Since both programs result in a Data Science M.S. degree (albeit under the school of Software Engineering for on-campus versus the school of information for online), the functional equivalence of the resulting certificate of completion is an effective isolator of potential long-term ramifications in career path that might otherwise be dictated by hiring processes that favor one degree over the other. Therefore, this analysis is justified in focusing only on events occurring during my extended education. I have selected two calculated features\footnote{features that I do not intend to calculate on the basis that it is impossible without a crystal ball and knowledge of fortune telling - a cursed art that has been forbidden by the council for centuries.} that are important to determining the utility of potential events from each master's program.
The generalized feature I've selected is serendipity\footnote{Read more about this definition of serendipity in \textit{Where Good Ideas Come From: The Natural History of Innovation} by Steven Johnson}: the potential for the spontaneous formulation of creative genius brought about by the random collision of ideas - the proverbial cafe of intellectuals where overheard conversations turn into incredible revelations. The on-campus program excels in this category because it extends my stay in the academically diverse setting of Rochester Institute of Technology's main campus, potentially enabling interdisciplinary connections and research opportunities. It also would grant me more time to get involved in the Simone Center for Innovation and Entrepreneurship which is an enticing hub for startups that I can see myself becoming a key part of. In contrast, the online program offers me few opportunities to connect within RIT while opening the door to starting a career in person sooner, which holds potential for intrapreneurship and a more directed interdisciplinary relationship. I acknowledge the magnitude of such opportunities to be lesser, but more probable, especially if I change jobs more frequently. When I was first choosing features I wanted to include a second metric to capture a level of character growth and mental health as a reflection of the impact of being online and not being face-to-face with other people. In doing so I'd be modeling real-life variables that most would overlook. Digging into it I realized I'd have to derive it from the magnitude and probabilities of social advantages of each program. The community fostered, the friends not made. I can't bring myself to even make up numbers for that in a goof napkin-math formula. Measuring covariance between these two features just feels disgusting. 
Instead, I'm going to negate the whole variable with this assumption about finding something else to do with my life outside of work:
\begin{center}
\textit{The negative social effects of online program isolation are equal to and canceled out by the personal growth derived from the extra effort to find 'the third place' \footnote{First and second places are home and work. Read more at: \url{https://en.wikipedia.org/wiki/Third_place}} seeded by the frustration towards myself for putting myself in this position.}
\end{center}
\paragraph{Creating PMFs}
Let's create probability mass functions for our feature in each program to subjectively measure potential. Let the probabilities of magnitude \(X\) serendipity in the on-campus program and the online program be \(P(X_c)\) and \(P(X_o)\) respectively. The on-campus program has advantages in serendipity, but while events may be an order of magnitude more impactful, I've already been on campus for three and a half years and it feels highly unlikely that I will make sufficient changes to my routines to grant me more than a marginal probability of a serendipitous event occurring.
\begin{equation*}
P(X_c) = \begin{cases}
.8\qquad\text{if }&X=0\\
.105&X=1\\
.045&X=2\\
.025&X=3\\
.0125&X=4\\
.009&X=5\\
.0035&X=6\\
0&\text{Otherwise}
\end{cases}
\end{equation*}
\begin{center}
\textit{[graph]}
\end{center}
The online program wields greater chances of serendipity by placing me in more unique environments by means of starting my career sooner, hopefully giving me more time to utilize what remains of my ambition before it crumbles with age and routine. There may be less of an impact for a serendipitous event when experiencing it remotely or within a corporate structure, but what does a foolish little boy still in school know about the passion imbued by one's own accidental discoveries?
\begin{equation*}
P(X_o) = \begin{cases}
.6\qquad\text{if }&X=0\\
.225&X=1\\
.115&X=2\\
.045&X=3\\
.0087&X=4\\
.0043&X=5\\
.002&X=6\\
0&\text{Otherwise}
\end{cases}
\end{equation*}
\begin{center}
\textit{[graph]}
\end{center}
With archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y= 3x^2 - 2y\) as the equation for covariance.
\end{document}