Markov, but not just the timesheet

This commit is contained in:
2024-11-15 18:16:03 -05:00
parent c086910ff5
commit a7b3e46f56
2 changed files with 234 additions and 8 deletions

@@ -295,6 +295,18 @@ probability can only be calculated by determining how likely it is that the frie
the odds that such a friend may also be willing to throw limbs at his car so as to maintain their ever-reliable facade. If one also considers the possibility
that Shafer's friends mistakenly believed a limb fell on his car, this uncertainty must also be combined with the evidence for the most accurate picture.
\subsubsection{Minority Rule through Renormalization}
One way that details about a sample can be suppressed is through minority rule, where analysis is skewed by the influence of a small subsection of the population
imposing attributes onto a pliable, but larger, subsection of the population. Often used in social sciences and asymmetric warfare, the stubbornness of a handful of
people, say, those with a demanding preference for organic foods, requires the surrounding environment to adapt. Most people do not eat organic but would not
object if it were all that was offered. Thus, a family with a single member with a dietary preference can flip the entire kitchen to fit that preference. This
process is called renormalization, and it runs counter to the observations of outsiders, who might infer that the whole family prefers organic foods.
Scaled upwards, the renormalization effect might then apply itself to a cookout between families who acknowledge one family has a dietary preference. That might
then renormalize the entire community, resulting in local grocery store offerings being near-exclusive to the dietary preference of a remarkably small portion of the
community. If a data scientist then infers from the offerings of this grocery store the dietary preferences of the community, they would be inclined to believe that
the actual minority is not just a majority, but a requirement amongst the population. In this sense, tolerance for intolerance begets intolerance.
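This cascade can be made concrete with a toy calculation. The sketch below is my own illustration; the 3 percent stubborn fraction and the group size of 4 at every level are invented numbers, not figures from any study:

```python
# Toy renormalization cascade: a group "flips" to organic if any of its
# members holds (or has already renormalized to) the preference.
# The 3% stubborn fraction and group size of 4 are illustrative guesses.

stubborn = 0.03   # fraction of truly committed individuals
group_size = 4    # members per household, households per cookout, etc.

share = stubborn
levels = [share]
for _ in range(3):  # individual -> household -> cookout -> community
    # P(group flips) = 1 - P(no member has the preference)
    share = 1 - (1 - share) ** group_size
    levels.append(share)
# levels grows roughly 0.03 -> 0.11 -> 0.39 -> 0.86
```

Three grouping levels turn a 3 percent minority into an apparent majority of roughly 86 percent, which is exactly the inference trap the grocery-store data scientist falls into.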
\subsubsection{Methodology Considerations}
I have taken 10134023 instances over the last 40 years, during all of which Obama has been alive. Therefore I can say with a high degree of certainty that Obama is
immortal.
@@ -485,7 +497,7 @@ that the tests partially measure the same thing, as would have occured in a Naiv
\subsubsection{Markov Chains}
Markov Chains are a form of probabilistic automaton where the likelihood of transitioning to a new state depends solely on the current state, with no memory of prior
states. For example\footnote{example sourced from:\\\url{https://towardsdatascience.com/introduction-to-markov-chains-50da3645a50d}}, suppose a weather prediction
program wants to know whether tomorrow will be a sunny or cloudy day, based on the current weather. Using the current weather as a state, the program identifies that
there is a 10\% chance of a sunny day transitioning into a cloudy day and a 50\% chance that a cloudy day transitions into a sunny day:
@@ -506,9 +518,11 @@ there is a 10\% chance of a sunny day transitioning into a cloudy day and a 50\%
\end{center}
Note that there is no information preserved between steps. Markov Chains are memoryless, so any information that must be available to them must be expressed as the
state, such as the sunny and cloudy states in the example above. Academically, this is called the \textbf{Markov Assumption}, though it is vocabulary that can easily
be explained with few additional words and won't be used for the rest of this paper. One benefit of such a straightforward structure is that it enables easy
calculation of the probabilities of reaching a state k-steps from the current position. By expressing the chain as a transition matrix where rows represent the
current state, columns represent the next state, and each cell contains the probability of moving from the row state to the column state, we get a
1-step transition matrix:
\[
\begin{pmatrix}
@@ -569,8 +583,147 @@ looks like this:
\end{pmatrix}
\]
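As a quick sketch of the k-step calculation (my own illustration, using only the transition probabilities already given: sunny stays sunny 9 times out of 10, cloudy turns sunny half the time), the k-step matrix is just the 1-step matrix raised to the k-th power:

```python
# k-step transition probabilities for the sunny/cloudy example.
# Rows are the current state, columns the next state.

def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# 1-step transition matrix: index 0 = sunny, 1 = cloudy.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def k_step(P, k):
    """Raise the 1-step matrix to the k-th power by repeated multiplication."""
    out = P
    for _ in range(k - 1):
        out = mat_mul(out, P)
    return out

P2 = k_step(P, 2)  # probabilities of each state two days from now
```

For example, starting from a sunny day, the chance of sun two days later is \(0.9 \cdot 0.9 + 0.1 \cdot 0.5 = 0.86\).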
\subsubsection{Hidden Markov Models}\label{HMMs}
In contrast to the visible Markov Models above, the states within a Hidden Markov Model cannot be directly observed. The benefit of using such a model is that,
from a sequence of observations, algorithms such as the Viterbi Algorithm can determine the probability of that sequence and estimate which state is
active at a given instant. This process of working backwards from results to the process that produced them is reminiscent of inverse problems and many explanatory
uses of data science, such as in finance where, with the benefit of hindsight, analysts work to determine why events unfolded the way they did.
In addition to states, initial state probabilities, and transition probabilities, Hidden Markov Models also utilize observations and emission probabilities: the
probability of an observation given the current state. Using the earlier example where states represent either a sunny or cloudy day, an observation
likelihood matrix can be created for a weather sensor that can only determine whether the ground is wet. On a cloudy day there is a probability of rain and thus a high
probability of the ground being wet, whereas on a sunny day the sensor would only rarely be triggered, by dew or tampering:
\[
\begin{array}{c c}
& \begin{array}{ccc} % Align column labels above the matrix
\text{dry} & \text{wet}
\end{array} \\ % End the first row (labels) with double backslash
\begin{array}{c} % Row labels
\text{Sunny} \\
\text{Cloudy} \\
\end{array} &
\begin{bmatrix} % Matrix with brackets
.95 & .05 \\
.6 & .4 \\
\end{bmatrix}
\end{array}
\]
Thus, an observation sequence may look like this:
\[
[\text{Dry, Dry, Wet}]
\]
In this case, it can be confidently assumed that the wet signal is representative of a rainy, cloudy day. In contrast, we can only be moderately confident that the
two dry days leading up to it were sunny days. Intuitively, it is most likely that there were two sunny days followed by a rainy day. By multiplying each transition
probability by the probability of the corresponding observation, the probability of each candidate sequence is revealed. Below, we assume a 50-50 chance of
initialization at a sunny or cloudy day:
\begin{center}
Three consecutive sunny days:
\[(.5 * .95) * (.9 * .95) * (.9 * .05) \approx 0.01828 \]
Three consecutive cloudy days:
\[(.5 * .6) * (.5 * .6) * (.5 * .4) = 0.018 \]
Sunny, sunny, cloudy:
\[(.5 * .95) * (.9 * .95) * (.1 * .4) \approx 0.01625 \]
\end{center}
Interestingly, the calculation reveals that it is actually more probable that there was an unusual wet third day during a sunny streak than for there to have been
a cloudy day following two sunny days.\footnote{I say interesting because I forgot how low I set the probability of sunny to cloudy and wholly expected the intuitive
sun-sun-cloud answer to prove accurate. Math moment.}
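These hand calculations can be verified programmatically. Below is a sketch (the variable names and dictionary layout are my own) that scores any candidate state sequence against the \([\text{Dry, Dry, Wet}]\) observations:

```python
# Score a hidden-state sequence against an observation sequence using the
# transition, emission, and 50-50 initial probabilities from the example.

trans = {('S', 'S'): 0.9, ('S', 'C'): 0.1, ('C', 'S'): 0.5, ('C', 'C'): 0.5}
emit = {('S', 'dry'): 0.95, ('S', 'wet'): 0.05,
        ('C', 'dry'): 0.60, ('C', 'wet'): 0.40}
init = {'S': 0.5, 'C': 0.5}

def sequence_prob(states, obs):
    """Joint probability of a state sequence and an observation sequence."""
    p = init[states[0]] * emit[(states[0], obs[0])]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= trans[(prev, cur)] * emit[(cur, o)]
    return p

obs = ['dry', 'dry', 'wet']
p_sss = sequence_prob(['S', 'S', 'S'], obs)  # ~0.01828
p_ccc = sequence_prob(['C', 'C', 'C'], obs)  # 0.018
p_ssc = sequence_prob(['S', 'S', 'C'], obs)  # ~0.01625
```

Running all three confirms the counterintuitive ordering: sun-sun-sun edges out cloud-cloud-cloud, with sun-sun-cloud last.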
Brief sidenote: since the initial state is not known, the probability of initialization at state \(n\) is conventionally expressed in calculations as \(\pi_n\). I will
not use this notation in this report because I think it is confusing and somewhat ridiculous to have mathematical notation with as ubiquitous and universally constant
a meaning as \(\pi\) repurposed for something that has no relation to the constant. Whatever convention made this determination is seriously damaging the
accessibility of mathematics for anybody shy of a walking computational index.
\subsubsection{Viterbi Algorithm}
While it is feasible to calculate the probabilities of every possible route to a series of observations, such a process has exponential time complexity.
With each state change, the number of paths to keep track of grows exponentially, which in practical terms means countless threads arriving at each state, separated
only by the history of how they got there. Enter the Viterbi Algorithm, which reduces the cost of a step (or, as in our example, a new day) from an exponential
relationship ( \(O(N^T)\) ) to a flat multiple ( \(O(N^2 T)\) ). This is possible because the Viterbi Algorithm builds partial solutions, keeping only the
most probable path into each state instead of recomputing each exit from a state for each entry. If a route is deemed improbable, it will not be considered
again when the paths from that state are extended.
More intuitively, consider that there are multiple ways to reach a given state in 1 step. Once each path's probability is computed, you only need to retain the
highest probability path to that state and the next step will only require calculation from that state once.\footnote{The mathematical notation to describe this
algorithm is criminally challenging to parse. I want to acknowledge this video for being the only one of its kind that did not rely on the notation:
\url{https://www.youtube.com/watch?v=6JVqutwtzmo}} Consider the following graphic rendition of each possible 3-day sequence of sunny vs cloudy:
\begin{center}
\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
\node[state] (Sunny1) {Sunny};
\node[state, below=of Sunny1] (Cloudy1) {Cloudy};
\node[state, right=of Sunny1] (Sunny2) {Sunny};
\node[state, below=of Sunny2] (Cloudy2) {Cloudy};
\node[state, right=of Sunny2] (Sunny3) {Sunny};
\node[state, below=of Sunny3] (Cloudy3) {Cloudy};
\node[above=of Sunny1, yshift=-1.5cm]{Day 1};
\node[above=of Sunny2, yshift=-1.5cm]{Day 2};
\node[above=of Sunny3, yshift=-1.5cm]{Day 3};
\path[->]
(Sunny1) edge node {} (Sunny2)
edge node {} (Cloudy2)
(Cloudy1) edge node {} (Sunny2)
edge node {} (Cloudy2)
([yshift=1mm] Sunny2.east) edge[->] node {} ([yshift=1mm] Sunny3.west)
([yshift=-1mm] Sunny2.east) edge[->] node {} ([yshift=-1mm] Sunny3.west)
([yshift=1mm] Sunny2.east) edge[->] node {} ([yshift=1mm] Cloudy3.west)
([yshift=-1mm] Sunny2.east) edge[->] node {} ([yshift=-1mm] Cloudy3.west)
([yshift=1mm] Cloudy2.east) edge[->] node {} ([yshift=1mm] Sunny3.west)
([yshift=-1mm] Cloudy2.east) edge[->] node {} ([yshift=-1mm] Sunny3.west)
([yshift=1mm] Cloudy2.east) edge[->] node {} ([yshift=1mm] Cloudy3.west)
([yshift=-1mm] Cloudy2.east) edge[->] node {} ([yshift=-1mm] Cloudy3.west);
\end{tikzpicture}
\end{center}
Notice that there are two arrows from each day 2 state to each day 3 state because two paths were created to reach each of the day 2 states. If a
fourth day were depicted, there would be 4 calculations from each day 3 state to each day 4 state. To prevent this, the Viterbi Algorithm only preserves the most likely
path to each node. For instance, there are two paths to a sunny day on day 2: either the first day was sunny and it stayed sunny, or the first day was cloudy but
transitioned to sunny the next day. Using the same \([\text{Dry, Dry, Wet}]\) observation sequence as before, the probabilities of these paths occurring can be
calculated:
\begin{center}
Two consecutive sunny days:
\[(.5 * .95) * (.9 * .95) = 0.406125 \]
Cloudy, sunny:
\[(.5 * .6) * (.5 * .95) = 0.1425 \]
\end{center}
Hence, we can eliminate the \([\text{Cloudy, Sunny}]\) starting sequence from the most probable sequence of steps given the observations. Doing the same thing
for the rest of the visualization leaves fewer arrows and therefore fewer calculations:
\begin{center}
\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
\node[state] (Sunny1) {Sunny};
\node[state, below=of Sunny1] (Cloudy1) {Cloudy};
\node[state, right=of Sunny1] (Sunny2) {Sunny};
\node[state, below=of Sunny2] (Cloudy2) {Cloudy};
\node[state, right=of Sunny2] (Sunny3) {Sunny};
\node[state, below=of Sunny3] (Cloudy3) {Cloudy};
\node[above=of Sunny1, yshift=-1.5cm]{Day 1};
\node[above=of Sunny2, yshift=-1.5cm]{Day 2};
\node[above=of Sunny3, yshift=-1.5cm]{Day 3};
\path[->]
(Sunny1) edge node {} (Sunny2)
(Cloudy1) edge node {} (Cloudy2)
(Sunny2) edge node {} (Sunny3)
(Cloudy2) edge node {} (Cloudy3);
\path[->, draw=red]
(Sunny1) edge node[midway] {\textbf{x}} (Cloudy2)
(Cloudy1) edge node[midway] {\textbf{x}} (Sunny2)
(Sunny2) edge node[midway] {\textbf{x}} (Cloudy3)
(Cloudy2) edge node[midway] {\textbf{x}} (Sunny3);
\end{tikzpicture}
\end{center}
With only two sequences remaining, the final comparison needs only to determine if it is more likely for there to have been three consecutive sunny days or three
consecutive cloudy days, which was already done in the Hidden Markov Model section (\ref{HMMs}).
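The pruning described above can be written compactly. The sketch below is my own rendition of the Viterbi Algorithm over this toy model, not code from any cited source:

```python
# Viterbi over the sunny/cloudy model: keep only the most probable path
# into each state at each step, so each new day costs O(N^2) work
# instead of doubling the number of live paths.

trans = {('S', 'S'): 0.9, ('S', 'C'): 0.1, ('C', 'S'): 0.5, ('C', 'C'): 0.5}
emit = {('S', 'dry'): 0.95, ('S', 'wet'): 0.05,
        ('C', 'dry'): 0.60, ('C', 'wet'): 0.40}
init = {'S': 0.5, 'C': 0.5}
states = ['S', 'C']

def viterbi(obs):
    """Return (probability, path) of the most probable state sequence."""
    # best[s] = (probability, path) of the best path ending in state s
    best = {s: (init[s] * emit[(s, obs[0])], [s]) for s in states}
    for o in obs[1:]:
        new = {}
        for s in states:
            # Keep only the most probable way to arrive at state s.
            p, path = max((best[r][0] * trans[(r, s)], best[r][1])
                          for r in states)
            new[s] = (p * emit[(s, o)], path + [s])
        best = new
    return max(best.values())

prob, path = viterbi(['dry', 'dry', 'wet'])
```

The winner is three consecutive sunny days, matching the full enumeration done earlier, but found while discarding half the candidate paths at every step.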
\newpage
\subsection{Unit 5: Monte Carlo Simulations}
@@ -596,12 +749,85 @@ Found this later, it's sort of similar. \url{https://www.bayesserver.com/}
Even better than their jank bayesian belief network I may be able to make mixed bayesian/markov chain models. This is a big project.
\subsection{Cost-Benefit Analysis of Remote Education}
This section covers a calculation I devised to make me feel better about my life decisions. The data is based on implicit guesswork and, while I will be taking it
somewhat seriously for my decision to do either the online or on-campus RIT Data Science Masters Program, it should not be taken seriously as a probabilistic model.
Since there is no framework for making a subjective decision weighing the potential benefits of on-campus life against the value of entering the workforce 18 months
sooner, I decided to make one. Inshallah I shall reach my true potential and fulfill my destiny.
\subsubsection{Selecting and Creating Key Metrics}
Since both programs result in a Data Science M.S. degree (albeit under the School of Software Engineering for on-campus versus the School of Information for online),
the functional equivalence of the resulting certificate of completion effectively isolates this analysis from long-term career ramifications that might otherwise
be dictated by hiring processes favoring one degree over the other. Therefore, this analysis is justified in focusing only on events occurring during my extended
education. I have selected two calculated features\footnote{Features that I do not intend to calculate, on the basis that doing so is impossible without a crystal
ball and knowledge of fortune telling - a cursed art that has been forbidden by the council for centuries.} that are important to determining the utility of
potential events from each masters program.
The generalized feature I've selected is serendipity\footnote{Read more about this definition of serendipity in \textit{Where Good Ideas Come From: The Natural
History of Innovation} by Steven Johnson}: the potential for the spontaneous formulation of creative genius brought about by the random collision of ideas - the
proverbial cafe of intellectuals where overheard conversations turn into incredible revelations. The on-campus program excels in this category because it extends
my stay in the academically diverse setting of Rochester Institute of Technology's main campus, potentially enabling interdisciplinary connections and research
opportunities. It also would grant me more time to get involved in the Simone Center for Innovation and Entrepreneurship which is an enticing hub for startups that
I can see myself becoming a key part of. In contrast, the online program offers me few opportunities to connect within RIT while opening the door to starting a
career in person sooner, which holds potential for intrapreneurship and a more directed interdisciplinary relationship. I acknowledge the magnitude of such
opportunities to be lesser, but more probable, especially if I change jobs more frequently.
When I was first choosing features, I wanted to include a second metric capturing character growth and mental health as a reflection of the impact of being
online and not face-to-face with other people. In doing so I'd be modeling real-life variables that most would overlook.
Digging into it, I realized I'd have to derive it from the magnitudes and probabilities of the social advantages of each program.
The community fostered, the friends not made: I can't bring myself to even make up numbers for that in a goof napkin-math formula.
Measuring covariance between these two features just feels disgusting. Instead, I'm going to negate the whole variable with this assumption about finding something
else to do with my life outside of work:
\begin{center}
\textit{The negative social effects of online program isolation are equal to and canceled out by the personal growth derived from the extra effort to find
'the third place' \footnote{First and second places are home and work. Read more at: \url{https://en.wikipedia.org/wiki/Third_place}} seeded by the frustration
towards myself for putting myself in this position.}
\end{center}
\paragraph{Creating PMFs}
Let's create probability mass functions for our feature in each program to subjectively measure potential.
Let the probability of magnitude \(X\) serendipity in the campus program and the online program be \(P(X_c)\) and \(P(X_o)\), respectively.
The on-campus program has advantages in serendipity, but while events may be an order of magnitude more impactful, I've already been on campus for three and a half
years and it feels highly unlikely that I will make sufficient changes to my routines to grant me more than a marginal probability of a serendipitous event occurring:
\begin{equation*}
P(X_c) =
\begin{cases}
.8\qquad\text{if }&X=0\\
.105&X=1\\
.045&X=2\\
.025&X=3\\
.0125&X=4\\
.009&X=5\\
.0035&X=6\\
0&\text{Otherwise}
\end{cases}
\end{equation*}
**graph**
The online program offers greater chances of serendipity by placing me in more unique environments by means of starting my career sooner, hopefully giving me more
time to utilize what remains of my ambition before it crumbles with age and routine. There may be less of an impact for a serendipitous event when experienced
remotely or within a corporate structure, but what does a foolish little boy still in school know about the passion imbued by one's own accidental discoveries?
\begin{equation*}
P(X_o) =
\begin{cases}
.6\qquad\text{if }&X=0\\
.225&X=1\\
.115&X=2\\
.045&X=3\\
.0087&X=4\\
.0043&X=5\\
.002&X=6\\
0&\text{Otherwise}
\end{cases}
\end{equation*}
**graph**
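Since these PMFs are admittedly napkin math, a quick script can at least sanity-check that each sums to 1 and report the expected magnitude each implies. The comparison itself is my own addition, not a rigorous analysis:

```python
# Sanity checks on the two subjective serendipity PMFs:
# verify each is a valid distribution and compare expected magnitudes.

pmf_campus = {0: .8, 1: .105, 2: .045, 3: .025, 4: .0125, 5: .009, 6: .0035}
pmf_online = {0: .6, 1: .225, 2: .115, 3: .045, 4: .0087, 5: .0043, 6: .002}

def expectation(pmf):
    """Expected serendipity magnitude under a PMF."""
    return sum(x * p for x, p in pmf.items())

total_campus = sum(pmf_campus.values())  # should be 1
total_online = sum(pmf_online.values())  # should be 1
ev_campus = expectation(pmf_campus)      # ~0.39
ev_online = expectation(pmf_online)      # ~0.66
```

Both PMFs sum to 1, and the online PMF implies a noticeably higher expected magnitude, consistent with the reasoning above about more frequent, if smaller, opportunities.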
With archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y = 3x^2 - 2y\) as the equation for covariance.
\end{document}