diff --git a/report/report.pdf b/report/report.pdf
index 1d1a21a..c180810 100644
Binary files a/report/report.pdf and b/report/report.pdf differ
diff --git a/report/report.tex b/report/report.tex
index a526f20..9c629ed 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -295,6 +295,18 @@ probability can only be calculated by determining how likely it is that the frie
 the odds that such a friend may also be willing to throw limbs at his car so as to maintain their ever-reliable facade. If one also considers the possibility that
 Shafer's friends mistakenly believed a limb fell on his car, this uncertainty must also be combined with the evidence for the most accurate picture.
+\subsubsection{Minority Rule through Renormalization}
+One way that details about a sample can be suppressed is through minority rule, where analysis is skewed by the influence of a small subsection of the population
+imposing attributes onto a pliable, but larger, subsection of the population. Often invoked in the social sciences and in asymmetric warfare, minority rule means
+the stubbornness of a handful of people, say, those with a demanding preference for organic foods, requires the surrounding environment to adapt. Most people do not
+eat organic food but would not object if it were all that was offered. Thus, a single family member with a dietary preference can flip the entire kitchen to fit that
+preference. This process is called renormalization, and it runs counter to the conclusions of outsiders, who might infer that the whole family prefers organic foods.
+
+Scaled upwards, the renormalization effect might then apply itself to a cookout between families who acknowledge one family's dietary preference. That might
+then renormalize the entire community, resulting in local grocery store offerings catering almost exclusively to the dietary preference of a remarkably small portion
+of the community.
If a data scientist then infers from the offerings of this grocery store the dietary preferences of the community, they would be inclined to believe that
+the actual minority is not just a majority, but a requirement amongst the population. In this sense, tolerance for intolerance begets intolerance.
+
 \subsubsection{Methodology Considerations}
 I have taken 10134023 instances of the last 40 years, during all of which Obama has been alive. Therefore I can say with a high degree of certainty that Obama is
 immortal.
@@ -485,7 +497,7 @@ that the tests partially measure the same thing, as would have occured in a Naiv
 \subsubsection{Markov Chains}
-Markov Chains are a form of probabilistic automaton where, the likelihood of transitioning to a new state depends solely on the current state, with no memory of prior
+Markov Chains are a form of probabilistic automaton where the likelihood of transitioning to a new state depends solely on the current state with no memory of prior
 states. For example\footnote{example sourced from:\\\url{https://towardsdatascience.com/introduction-to-markov-chains-50da3645a50d}}, suppose a weather prediction
 program wants to know whether tomorrow will be a sunny or cloudy day, based on the current weather. Using the current weather as a state, the program identifies that
 there is a 10\% chance of a sunny day transitioning into a cloudy day and a 50\% chance that a cloudy day transitions into a sunny day:
@@ -506,9 +518,11 @@ there is a 10\% chance of a sunny day transitioning into a cloudy day and a 50\%
 \end{center}
 Note that there is no information preserved between steps. Markov Chains are memoryless, so any information that must be available to them must be expressed as the
-state, such as the sunny and cloudy states in the example above. One benefit of such a straightforward structure is that it enables easy calculation of the
-probabilities of reaching a state k-steps from the current position.
By expressing the chain as a transition matrix where rows represent the
-column represents the next state, and each cell contains the probability of the state moving from the column state to the row state, we get a 1-step transition matrix:
+state, such as the sunny and cloudy states in the example above. Academically, this is called the \textbf{Markov Assumption}, though it is vocabulary that can easily
+be explained with few additional words and won't be used for the rest of this paper. One benefit of such a straightforward structure is that it enables easy
+calculation of the probabilities of reaching a state k steps from the current position. By expressing the chain as a transition matrix where the rows represent the
+current state, the columns represent the next state, and each cell contains the probability of moving from the row state to the column state, we get a
+1-step transition matrix:
+
 \[
 \begin{pmatrix}
@@ -569,8 +583,147 @@ looks like this:
 \end{pmatrix}
 \]
-\subsubsection{Hidden Markov Models}
-maybe add notes on mixed
+\subsubsection{Hidden Markov Models}\label{HMMs}
+In contrast to the visible Markov Models above, the states of a Hidden Markov Model cannot be directly observed. The benefit of using such a model is that
+algorithms such as the Viterbi Algorithm can take a sequence of observations, determine its probability, and estimate which state is active at a given instant. This
+process of working backwards from results to the process that produced them is reminiscent of inverse problems and many explanatory uses of data science, such as
+in finance where, with the benefit of hindsight, analysts work to determine why events unfolded the way they did.
+
+In addition to states, initial state probabilities, and transition probabilities, Hidden Markov Models also utilize observations and emission probabilities: the
+probability of an observation given the current state.
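To make these pieces concrete, here is a short Python sketch (illustrative only, not part of the report's LaTeX source) that scores one candidate hidden-state sequence against an observation sequence. The transition probabilities come from the sunny/cloudy example above; the emission and initial probabilities are the ones assumed in the worked example that follows:

```python
# Illustrative sketch: sunny/cloudy transition probabilities from the example
# above, plus the dry/wet emission probabilities used in the observation
# likelihood matrix below and an assumed 50-50 initial distribution.
TRANS = {"Sunny": {"Sunny": 0.9, "Cloudy": 0.1},
         "Cloudy": {"Sunny": 0.5, "Cloudy": 0.5}}
EMIT = {"Sunny": {"dry": 0.95, "wet": 0.05},
        "Cloudy": {"dry": 0.6, "wet": 0.4}}
INIT = {"Sunny": 0.5, "Cloudy": 0.5}

def sequence_probability(states, observations):
    """Joint probability of a hidden-state sequence and an observation sequence."""
    prob = INIT[states[0]] * EMIT[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        prob *= TRANS[prev][cur] * EMIT[cur][obs]
    return prob

# Reproduces the worked numbers later in this section:
sequence_probability(["Sunny", "Sunny", "Sunny"], ["dry", "dry", "wet"])    # ≈ 0.01828
sequence_probability(["Cloudy", "Cloudy", "Cloudy"], ["dry", "dry", "wet"])  # ≈ 0.018
```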
Using the earlier example where states represent either a sunny or cloudy day, an observation
+likelihood matrix can be created for a weather sensor that can only determine if the ground is wet. On a cloudy day there is a chance of rain and thus a high
+probability of the ground being wet, whereas on a sunny day the sensor would be triggered far less often, perhaps by dew or sensor tampering:
+
+\[
+    \begin{array}{c c}
+        & \begin{array}{cc} % Align column labels above the matrix
+        \text{dry} & \text{wet}
+        \end{array} \\ % End the first row (labels) with double backslash
+        \begin{array}{c} % Row labels
+        \text{Sunny} \\
+        \text{Cloudy} \\
+        \end{array} &
+        \begin{bmatrix} % Matrix with brackets
+        .95 & .05 \\
+        .6 & .4 \\
+        \end{bmatrix}
+    \end{array}
+\]
+
+Thus, an observation sequence may look like this:
+\[
+    [\text{Dry, Dry, Wet}]
+\]
+
+In this case, it can be confidently assumed that the wet signal is representative of a rainy, cloudy day. In contrast, we can only be moderately confident that the
+two dry days leading up to it were sunny days. Intuitively, it is most likely that there were two sunny days followed by a rainy day. By multiplying each transition
+probability by the probability of the observation given the resulting state, the probability of each candidate sequence is revealed.
Below, we assume a 50-50 chance of initialization at a sunny
+or cloudy day:
+\begin{center}
+    Three consecutive sunny days:
+    \[(.5 * .95) * (.9 * .95) * (.9 * .05) \approx 0.01828 \]
+    Three consecutive cloudy days:
+    \[(.5 * .6) * (.5 * .6) * (.5 * .4) = 0.018 \]
+    Sunny, sunny, cloudy:
+    \[(.5 * .95) * (.9 * .95) * (.1 * .4) \approx 0.01625 \]
+\end{center}
+
+Interestingly, the calculation reveals that it is actually more probable that there was an unusual wet third day during a sunny streak than for there to have been
+a cloudy day following two sunny days.\footnote{I say interesting because I forgot how low I set the probability of sunny to cloudy and wholly expected the intuitive
+sun-sun-cloud answer to prove accurate. Math moment.}
+
+Brief sidenote: since the initial state is not known, the probability of initialization at state \(n\) is expressed in calculations as \(\pi_n\). I will
+not use this notation in this report because I think it is confusing and somewhat ridiculous to have mathematical notation with as ubiquitous and universally constant
+a meaning as \(\pi\) be repurposed for something that has no relation to the constant. Whatever convention made this determination is seriously damaging the
+accessibility of mathematics for anybody shy of a walking computational index.
+
+\subsubsection{Viterbi Algorithm}
+While it is feasible to calculate the probability of each possible route to a series of observations, such a process has exponential time complexity.
+With each state change, the number of paths to keep track of grows exponentially, which in practical terms means countless paths ending at each state, separated only
+by the history of how they got there. Enter the Viterbi Algorithm, which reduces the cost of the walk from an exponential
+relationship ( \(O(N^T)\) ) to a flat multiple per step ( \(O(N^2 T)\) ).
This is possible because the Viterbi Algorithm creates partial solutions by eliminating all but the
+most probable branch to reach the next state instead of recomputing each exit from a state for each entry. Once a route into a state is found to be less probable
+than an alternative, it is discarded and never extended further.
+
+More intuitively, consider that there are multiple ways to reach a given state in 1 step. Once each path's probability is computed, you only need to retain the
+highest-probability path to that state, and each subsequent step then requires only one calculation from that state.\footnote{The mathematical notation to describe this
+algorithm is criminally challenging to parse. I want to acknowledge this video for being the only one of its kind that did not rely on the notation:
+\url{https://www.youtube.com/watch?v=6JVqutwtzmo}} Consider the following graphic rendition of each possible 3-day sequence of sunny vs cloudy:
+
+\begin{center}
+    \begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
+
+        \node[state] (Sunny1) {Sunny};
+        \node[state, below=of Sunny1] (Cloudy1) {Cloudy};
+        \node[state, right=of Sunny1] (Sunny2) {Sunny};
+        \node[state, below=of Sunny2] (Cloudy2) {Cloudy};
+        \node[state, right=of Sunny2] (Sunny3) {Sunny};
+        \node[state, below=of Sunny3] (Cloudy3) {Cloudy};
+        \node[above=of Sunny1, yshift=-1.5cm]{Day 1};
+        \node[above=of Sunny2, yshift=-1.5cm]{Day 2};
+        \node[above=of Sunny3, yshift=-1.5cm]{Day 3};
+
+        \path[->]
+        (Sunny1) edge node {} (Sunny2)
+                 edge node {} (Cloudy2)
+        (Cloudy1) edge node {} (Sunny2)
+                  edge node {} (Cloudy2)
+        ([yshift=1mm] Sunny2.east) edge[->] node {} ([yshift=1mm] Sunny3.west)
+        ([yshift=-1mm] Sunny2.east) edge[->] node {} ([yshift=-1mm] Sunny3.west)
+        ([yshift=1mm] Sunny2.east) edge[->] node {} ([yshift=1mm] Cloudy3.west)
+        ([yshift=-1mm] Sunny2.east) edge[->] node {} ([yshift=-1mm] Cloudy3.west)
+        ([yshift=1mm] Cloudy2.east) edge[->] node {} ([yshift=1mm] Sunny3.west)
+        ([yshift=-1mm]
Cloudy2.east) edge[->] node {} ([yshift=-1mm] Sunny3.west)
+        ([yshift=1mm] Cloudy2.east) edge[->] node {} ([yshift=1mm] Cloudy3.west)
+        ([yshift=-1mm] Cloudy2.east) edge[->] node {} ([yshift=-1mm] Cloudy3.west);
+
+    \end{tikzpicture}
+\end{center}
+
+Notice that there are two arrows from each day 2 state to each day 3 state because two paths were created to reach each of the day 2 states. If a fourth day were
+depicted, there would be four calculations from each day 3 state to each day 4 state. To prevent this, the Viterbi Algorithm only preserves the most likely
+path to each node. For instance, there are two paths to a sunny day on day 2. Either the first day was sunny and it stayed sunny, or the first day was cloudy but
+transitioned to sunny the next day. Using the same \([\text{Dry, Dry, Wet}]\) observation sequence as before, the probabilities of these paths occurring can be
+calculated:
+
+\begin{center}
+    Two consecutive sunny days:
+    \[(.5 * .95) * (.9 * .95) = 0.406125 \]
+    Cloudy, sunny:
+    \[(.5 * .6) * (.5 * .95) = 0.1425 \]
+\end{center}
+
+Hence, we can eliminate the \([\text{Cloudy, Sunny}]\) starting sequence from consideration as the most probable sequence of steps given the observations.
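The pruning just performed can be sketched as a short Python dynamic program (an illustration under the same assumed probabilities, not code from the report): at each new day, only the highest-probability path into each state is kept, so the work per step stays fixed:

```python
# Illustrative Viterbi sketch for the sunny/cloudy example: at every step,
# retain only the best path ending in each state.
TRANS = {"Sunny": {"Sunny": 0.9, "Cloudy": 0.1},
         "Cloudy": {"Sunny": 0.5, "Cloudy": 0.5}}
EMIT = {"Sunny": {"dry": 0.95, "wet": 0.05},
        "Cloudy": {"dry": 0.6, "wet": 0.4}}
INIT = {"Sunny": 0.5, "Cloudy": 0.5}  # assumed 50-50 start, as in the text

def viterbi(observations):
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (INIT[s] * EMIT[s][observations[0]], [s]) for s in INIT}
    for obs in observations[1:]:
        # For each state, keep only the most probable way in (the pruning step).
        best = {s: max(((p * TRANS[prev][s] * EMIT[s][obs], path + [s])
                        for prev, (p, path) in best.items()),
                       key=lambda t: t[0])
                for s in TRANS}
    return max(best.values(), key=lambda t: t[0])

prob, path = viterbi(["dry", "dry", "wet"])
```

Run on the \([\text{Dry, Dry, Wet}]\) observations, this returns the three-consecutive-sunny-days path with probability \(\approx 0.01828\), matching the comparison at the end of this section.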
Doing the same thing
+for the rest of the visualization leaves fewer arrows and therefore fewer calculations:
+
+\begin{center}
+    \begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
+
+        \node[state] (Sunny1) {Sunny};
+        \node[state, below=of Sunny1] (Cloudy1) {Cloudy};
+        \node[state, right=of Sunny1] (Sunny2) {Sunny};
+        \node[state, below=of Sunny2] (Cloudy2) {Cloudy};
+        \node[state, right=of Sunny2] (Sunny3) {Sunny};
+        \node[state, below=of Sunny3] (Cloudy3) {Cloudy};
+        \node[above=of Sunny1, yshift=-1.5cm]{Day 1};
+        \node[above=of Sunny2, yshift=-1.5cm]{Day 2};
+        \node[above=of Sunny3, yshift=-1.5cm]{Day 3};
+
+        \path[->]
+        (Sunny1) edge node {} (Sunny2)
+        (Cloudy1) edge node {} (Cloudy2)
+        (Sunny2) edge node {} (Sunny3)
+        (Cloudy2) edge node {} (Cloudy3);
+        \path[->, draw=red]
+        (Sunny1) edge node[midway] {\textbf{x}} (Cloudy2)
+        (Cloudy1) edge node[midway] {\textbf{x}} (Sunny2)
+        (Sunny2) edge node[midway] {\textbf{x}} (Cloudy3)
+        (Cloudy2) edge node[midway] {\textbf{x}} (Sunny3);
+    \end{tikzpicture}
+\end{center}
+
+With only two sequences remaining, the final comparison need only determine whether it is more likely for there to have been three consecutive sunny days or three
+consecutive cloudy days, which was already done in the Hidden Markov Model section (\ref{HMMs}).
 \newpage
 \subsection{Unit 5: Monte Carlo Simulations}
@@ -596,12 +749,85 @@ Found this later, it's sort of similar. \url{https://www.bayesserver.com/}
 Even better than their jank bayesian belief network I may be able to make mixed bayesian/markov chain models. This is a big project.
-\subsection{Cost-Benefit Analysis of Asychronous Education}
+\subsection{Cost-Benefit Analysis of Remote Education}
 This section covers a calculation I devised to make me feel better about my life decisions.
The data is based on implicit guesswork and, while I will be taking it
-seriously for my decision to do either the online or on-campus RIT Data Science Masters Program, it should not be taken seriously as a probabilistic model.
+somewhat seriously for my decision to do either the online or on-campus RIT Data Science Masters Program, it should not be taken seriously as a probabilistic model.
 Since there is no framework for making a subjective decision weighting the potential benefits of on-campus life with the value of entering the workforce 18 months
 sooner, I decided to make one. Inshallah I shall reach my true potential and fulfill destiny.
+\subsubsection{Selecting and Creating Key Metrics}
+Since both programs result in a Data Science M.S. degree (albeit under the School of Software Engineering for on-campus versus the School of Information for online),
+the functional equivalence of the resulting credential effectively isolates this analysis from long-term career ramifications that might otherwise be dictated by
+hiring processes that favor one degree over the other. Therefore, this analysis is justified in focusing only on events occurring during my extended
+education. I have selected two calculated features\footnote{features that I do not intend to calculate on the basis that it is impossible without a crystal ball and
+knowledge of fortune telling - a cursed art that has been forbidden by the council for centuries.} that are important to determining the utility of
+potential events from each master's program.
+
+The generalized feature I've selected is serendipity\footnote{Read more about this definition of serendipity in \textit{Where Good Ideas Come From: The Natural
+History of Innovation} by Steven Johnson}: the potential for the spontaneous formulation of creative genius brought about by the random collision of ideas - the
+proverbial cafe of intellectuals where overheard conversations turn into incredible revelations.
The on-campus program excels in this category because it extends
+my stay in the academically diverse setting of Rochester Institute of Technology's main campus, potentially enabling interdisciplinary connections and research
+opportunities. It would also grant me more time to get involved in the Simone Center for Innovation and Entrepreneurship, which is an enticing hub for startups that
+I can see myself becoming a key part of. In contrast, the online program offers me few opportunities to connect within RIT while opening the door to starting a
+career in person sooner, which holds potential for intrapreneurship and a more directed interdisciplinary relationship. I acknowledge that such opportunities would
+be lesser in magnitude, but more probable, especially if I change jobs more frequently.
+
+When I was first choosing features, I wanted to include a second metric to capture a level of character growth and mental health as a reflection of the impact of
+being online and not being face-to-face with other people. In doing so, I'd be modeling real-life variables that most would overlook.
+Digging into it, I realized I'd have to derive it from the magnitudes and probabilities of the social advantages of each program.
+The community fostered, the friends not made. I can't bring myself to even make up numbers for that in a goofy napkin-math formula.
+Measuring covariance between these two features just feels disgusting. Instead, I'm going to negate the whole variable with this assumption about finding something
+else to do with my life outside of work:
+\begin{center}
+    \textit{The negative social effects of online program isolation are equal to and canceled out by the personal growth derived from the extra effort to find
+    'the third place'\footnote{First and second places are home and work.
Read more at: \url{https://en.wikipedia.org/wiki/Third_place}} seeded by the frustration
+    towards myself for putting myself in this position.}
+\end{center}
+
+\paragraph{Creating PMFs}
+
+Let's create probability mass functions for our feature in each program to subjectively measure potential.
+
+Denote the probability of magnitude-\(X\) serendipity in the campus program and in the online program as \(P(X_c)\) and \(P(X_o)\), respectively.
+
+The on-campus program has advantages in serendipity, but while events may be an order of magnitude more impactful, I've already been on campus for three and a half
+years and it feels highly unlikely that I will make sufficient changes to my routines to grant me more than a marginal probability of a serendipitous event occurring:
+\begin{equation*}
+    P(X_c) =
+    \begin{cases}
+        .8\qquad\text{if }&X=0\\
+        .105&X=1\\
+        .045&X=2\\
+        .025&X=3\\
+        .0125&X=4\\
+        .009&X=5\\
+        .0035&X=6\\
+        0&\text{Otherwise}
+    \end{cases}
+\end{equation*}
+
+**graph**
+
+The online program wields greater chances of serendipity by placing me in more unique environments by means of starting my career sooner, hopefully giving me more
+time to utilize what remains of my ambition before it crumbles with age and routine. There may be less of an impact for a serendipitous event when experienced
+remotely or within a corporate structure, but what does a foolish little boy still in school know about the passion imbued by one's own accidental discoveries?
+
+\begin{equation*}
+    P(X_o) =
+    \begin{cases}
+        .6\qquad\text{if }&X=0\\
+        .225&X=1\\
+        .115&X=2\\
+        .045&X=3\\
+        .0087&X=4\\
+        .0043&X=5\\
+        .002&X=6\\
+        0&\text{Otherwise}
+    \end{cases}
+\end{equation*}
+
+**graph**
+
 with archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y= 3x^2 - 2y\) as the equation for covariance.
 \end{document}
\ No newline at end of file