the odds that such a friend may also be willing to throw limbs at his car so as to maintain their ever-reliable facade. If one also considers the possibility
that Shafer's friends mistakenly believed a limb fell on his car, this uncertainty must also be combined with the evidence for the most accurate picture.

\subsubsection{Minority Rule through Renormalization}
One way that details about a sample can be suppressed is through minority rule, where analysis is skewed by the influence of a small subsection of the population
imposing attributes onto a pliable, but larger, subsection of the population. Often used in social sciences and asymmetric warfare, the stubbornness of a handful of
people, say, those with a demanding preference for organic foods, requires the surrounding environment to adapt. Most people do not eat organic but would not
object if it were all that was offered. Thus, a family with a single person with a dietary preference can flip the entire kitchen to fit that preference. This
process is called renormalization, and it runs counter to the observations of outsiders, who might infer that the whole family prefers organic foods.

Scaled upwards, the renormalization effect might then apply itself to a cookout between families who acknowledge one family has a dietary preference. That might
then renormalize the entire community, resulting in local grocery store offerings being near-exclusive to the dietary preference of a remarkably small portion of the
community. If a data scientist then infers the dietary preferences of the community from the offerings of this grocery store, they would be inclined to believe that
what is actually a minority preference is not just a majority preference, but a requirement amongst the population. In this sense, tolerance for intolerance begets intolerance.
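The cascade described above can be caricatured in a few lines of Python; every number here is invented purely for illustration:

```python
# A toy caricature of minority rule (all numbers invented): one stubborn
# organic-only eater flips their family's kitchen, the shared grocery store
# then caters to that family, and an observer reading the shelves infers a
# community-wide requirement.

population = 40          # 10 families of 4 people
family_size = 4
stubborn = 1             # people who actually insist on organic

# Share of people who truly hold the preference
actual_share = stubborn / population                        # 0.025

# After the kitchen flips, the whole family appears to hold it
family_share = (stubborn > 0) * family_size / population    # 0.1

# After the store renormalizes, the preference looks universal
store_share = 1.0 if stubborn > 0 else 0.0                  # 1.0
```

Each renormalization step inflates the apparent share, which is exactly what misleads the observer at the end.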
\subsubsection{Methodology Considerations}
I have taken 10134023 instances of the last 40 years, during all of which Obama has been alive. Therefore I can say with a high degree of certainty that Obama is
immortal.
\subsubsection{Markov Chains}
Markov Chains are a form of probabilistic automaton where the likelihood of transitioning to a new state depends solely on the current state, with no memory of prior
states. For example\footnote{example sourced from:\\\url{https://towardsdatascience.com/introduction-to-markov-chains-50da3645a50d}}, suppose a weather prediction
program wants to know whether tomorrow will be a sunny or cloudy day, based on the current weather. Using the current weather as a state, the program identifies that
there is a 10\% chance of a sunny day transitioning into a cloudy day and a 50\% chance that a cloudy day transitions into a sunny day:
\end{center}

Note that there is no information preserved between steps. Markov Chains are memoryless, so any information that must be available to them must be expressed as the
state, such as the sunny and cloudy states in the example above. Academically, this is called the \textbf{Markov Assumption}, though it is vocabulary that can easily
be explained with few additional words and won't be used for the rest of this paper. One benefit of such a straightforward structure is that it enables easy
calculation of the probabilities of reaching a state k steps from the current position. By expressing the chain as a transition matrix where the rows represent the
current state, the columns represent the next state, and each cell contains the probability of moving from the row state to the column state, we get a
1-step transition matrix:
\[
\begin{pmatrix}
\end{pmatrix}
\]

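The k-step calculation above is just repeated matrix multiplication. A minimal sketch using the sunny/cloudy numbers (numpy is my choice here, not a dependency of this paper):

```python
import numpy as np

# 1-step transition matrix for the sunny/cloudy example, ordered [Sunny, Cloudy]:
# rows are the current state, columns are the next state.
P = np.array([[0.9, 0.1],    # Sunny -> Sunny, Sunny -> Cloudy
              [0.5, 0.5]])   # Cloudy -> Sunny, Cloudy -> Cloudy

# The k-step transition probabilities are the k-th power of the matrix.
P2 = np.linalg.matrix_power(P, 2)
# P2[0, 1]: probability a sunny day is cloudy two days later (0.14)
```

Each row of \(P^k\) still sums to one, since every row remains a probability distribution over next states.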
\subsubsection{Hidden Markov Models}\label{HMMs}
In contrast to the visible Markov Models above, Hidden Markov Models cannot observe the states within the model. The benefit to using such a model is that
observations of occurrences can be run through algorithms such as the Viterbi Algorithm to determine the probability of a sequence of observations and estimate which state is
active in a given instance. This extrapolation from results back to the process that produced them is reminiscent of inverse problems and many explanatory uses of data science, such as
in finance where, with the benefit of hindsight, analysts work to determine why events unfolded the way they did.

In addition to states, initial state probabilities, and transition probabilities, Hidden Markov Models also utilize observations and emission probabilities, or the
probability of an observation given a transition from state a to b. Using the earlier example where states represent either a sunny or cloudy day, an observation
likelihood matrix can be created for a weather sensor that can only determine if the ground is wet. On a cloudy day there is a probability of rain and thus a high
probability of the ground being wet, whereas on a sunny day the sensor would be triggered far less often, perhaps by dew or sensor tampering:
\[
\begin{array}{c c}
& \begin{array}{ccc} % Align column labels above the matrix
\text{dry} & \text{wet}
\end{array} \\ % End the first row (labels) with double backslash
\begin{array}{c} % Row labels
\text{Sunny} \\
\text{Cloudy} \\
\end{array} &
\begin{bmatrix} % Matrix with brackets
.95 & .05 \\
.6 & .4 \\
\end{bmatrix}
\end{array}
\]

Thus, an observation sequence may look like this:
\[
[\text{Dry, Dry, Wet}]
\]

In this case, it can be confidently assumed that the wet signal is representative of a rainy, cloudy day. In contrast, we can only be moderately confident that the
two dry days leading up to it were sunny days. Intuitively, it is most likely that there were two sunny days followed by a rainy day. By multiplying the emission
probability of each observation by the probability of the transition into the corresponding state, the probability of each sequence is revealed. Below, we assume a 50-50 chance of initialization at a sunny
or cloudy day:
\begin{center}
Three consecutive sunny days:
\[(.5 * .95) * (.9 * .95) * (.9 * .05) \approx 0.01828 \]
Three consecutive cloudy days:
\[(.5 * .6) * (.5 * .6) * (.5 * .4) = 0.018 \]
Sunny, sunny, cloudy:
\[(.5 * .95) * (.9 * .95) * (.1 * .4) \approx 0.01625 \]
\end{center}

Interestingly, the calculation reveals that it is actually more probable that there was an unusual wet third day during a sunny streak than for there to have been
a cloudy day following two sunny days.\footnote{I say interesting because I forgot how low I set the probability of sunny to cloudy and wholly expected the intuitive
sun-sun-cloud answer to prove accurate. Math moment.}
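The three probabilities above can be reproduced by brute-force scoring of every hidden sequence. A minimal sketch, with parameter names of my own choosing (note that the exhaustive \texttt{product} enumeration is exactly the exponential blowup the Viterbi Algorithm below avoids):

```python
from itertools import product

# Model parameters from the sunny/cloudy example
states = ["Sunny", "Cloudy"]
initial = {"Sunny": 0.5, "Cloudy": 0.5}
transition = {"Sunny": {"Sunny": 0.9, "Cloudy": 0.1},
              "Cloudy": {"Sunny": 0.5, "Cloudy": 0.5}}
emission = {"Sunny": {"Dry": 0.95, "Wet": 0.05},
            "Cloudy": {"Dry": 0.6, "Wet": 0.4}}

observations = ["Dry", "Dry", "Wet"]

def sequence_probability(state_seq, obs):
    """Joint probability of a hidden state sequence and the observations."""
    p = initial[state_seq[0]] * emission[state_seq[0]][obs[0]]
    for prev, cur, o in zip(state_seq, state_seq[1:], obs[1:]):
        p *= transition[prev][cur] * emission[cur][o]
    return p

# Brute force: score every possible hidden sequence (exponential in length)
scores = {seq: sequence_probability(seq, observations)
          for seq in product(states, repeat=len(observations))}
best = max(scores, key=scores.get)   # ("Sunny", "Sunny", "Sunny")
```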

Brief sidenote: since the initial state is not known, the probability of initialization at state \(n\) is expressed in calculations as \(\pi_n\). I will
not use this notation in this report because I think it is confusing and somewhat ridiculous to have mathematical notation with as ubiquitous and universally constant
a meaning as \(\pi\) be repurposed for something that has no relation to the constant. Whatever convention made this determination is seriously damaging the
accessibility of mathematics for anybody shy of a walking computational index.

\subsubsection{Viterbi Algorithm}
While it is feasible to calculate the probabilities for each possible route to a series of observations, such a process produces an exponential time complexity.
With each state change, the number of paths to keep track of grows exponentially, which in practical terms means countless threads on each state separated only by
the history of how they got there. Enter the Viterbi Algorithm, which reduces the effect of a step (or, as in our example, a new day) from an exponential
relationship ( \(O(N^T)\) ) to a flat multiple ( \(O(N^2 T)\) ). This is possible because the Viterbi Algorithm creates partial solutions by eliminating all but the
most probable branch to reach each state instead of recomputing each exit from a state for each entry. If a route is deemed improbable, it will not be considered
the next time the same observation sequence occurs at that state.

More intuitively, consider that there are multiple ways to reach a given state in 1 step. Once each path's probability is computed, you only need to retain the
highest probability path to that state, and the next step will only require calculation from that state once.\footnote{The mathematical notation to describe this
algorithm is criminally challenging to parse. I want to acknowledge this video for being the only one of its kind that did not rely on the notation:
\url{https://www.youtube.com/watch?v=6JVqutwtzmo}} Consider the following graphic rendition of each possible 3-day sequence of sunny vs cloudy:

\begin{center}
\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]

\node[state] (Sunny1) {Sunny};
\node[state, below=of Sunny1] (Cloudy1) {Cloudy};
\node[state, right=of Sunny1] (Sunny2) {Sunny};
\node[state, below=of Sunny2] (Cloudy2) {Cloudy};
\node[state, right=of Sunny2] (Sunny3) {Sunny};
\node[state, below=of Sunny3] (Cloudy3) {Cloudy};
\node[above=of Sunny1, yshift=-1.5cm]{Day 1};
\node[above=of Sunny2, yshift=-1.5cm]{Day 2};
\node[above=of Sunny3, yshift=-1.5cm]{Day 3};

\path[->]
(Sunny1) edge node {} (Sunny2)
edge node {} (Cloudy2)
(Cloudy1) edge node {} (Sunny2)
edge node {} (Cloudy2)
([yshift=1mm] Sunny2.east) edge[->] node {} ([yshift=1mm] Sunny3.west)
([yshift=-1mm] Sunny2.east) edge[->] node {} ([yshift=-1mm] Sunny3.west)
([yshift=1mm] Sunny2.east) edge[->] node {} ([yshift=1mm] Cloudy3.west)
([yshift=-1mm] Sunny2.east) edge[->] node {} ([yshift=-1mm] Cloudy3.west)
([yshift=1mm] Cloudy2.east) edge[->] node {} ([yshift=1mm] Sunny3.west)
([yshift=-1mm] Cloudy2.east) edge[->] node {} ([yshift=-1mm] Sunny3.west)
([yshift=1mm] Cloudy2.east) edge[->] node {} ([yshift=1mm] Cloudy3.west)
([yshift=-1mm] Cloudy2.east) edge[->] node {} ([yshift=-1mm] Cloudy3.west);

\end{tikzpicture}
\end{center}

Notice that there are two arrows from each day 2 state to each day 3 state because two paths were created to reach each of the day 2 states. If there was a
fourth day depicted, there would be 4 calculations from each day 3 state to each day 4 state. To prevent this, the Viterbi Algorithm only preserves the most likely
path to each node. For instance, there are two paths to a sunny day on day 2. Either the first day was sunny and it stayed sunny, or the first day was cloudy but
transitioned to sunny the next day. Using the same \([\text{Dry, Dry, Wet}]\) observation sequence as before, the probabilities of these paths occurring can be
calculated:

\begin{center}
Two consecutive sunny days:
\[(.5 * .95) * (.9 * .95) = 0.406125 \]
Cloudy, sunny:
\[(.5 * .6) * (.5 * .95) = 0.1425 \]
\end{center}

Hence, we can eliminate the \([\text{Cloudy, Sunny}]\) starting sequence from the most probable sequence of steps given the observations. Doing the same thing
for the rest of the visualization leaves fewer arrows and therefore fewer calculations:

\begin{center}
\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]

\node[state] (Sunny1) {Sunny};
\node[state, below=of Sunny1] (Cloudy1) {Cloudy};
\node[state, right=of Sunny1] (Sunny2) {Sunny};
\node[state, below=of Sunny2] (Cloudy2) {Cloudy};
\node[state, right=of Sunny2] (Sunny3) {Sunny};
\node[state, below=of Sunny3] (Cloudy3) {Cloudy};
\node[above=of Sunny1, yshift=-1.5cm]{Day 1};
\node[above=of Sunny2, yshift=-1.5cm]{Day 2};
\node[above=of Sunny3, yshift=-1.5cm]{Day 3};

\path[->]
(Sunny1) edge node {} (Sunny2)
(Cloudy1) edge node {} (Cloudy2)
(Sunny2) edge node {} (Sunny3)
(Cloudy2) edge node {} (Cloudy3);
\path[->, draw=red]
(Sunny1) edge node[midway] {\textbf{x}} (Cloudy2)
(Cloudy1) edge node[midway] {\textbf{x}} (Sunny2)
(Sunny2) edge node[midway] {\textbf{x}} (Cloudy3)
(Cloudy2) edge node[midway] {\textbf{x}} (Sunny3);
\end{tikzpicture}
\end{center}

With only two sequences remaining, the final comparison needs only to determine if it is more likely for there to have been three consecutive sunny days or three
consecutive cloudy days, which was already done in the Hidden Markov Model section (\ref{HMMs}).

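The pruning described above can be sketched in a few lines of Python, assuming the same sunny/cloudy parameters; all names are my own:

```python
# A minimal Viterbi sketch for the sunny/cloudy example.
states = ["Sunny", "Cloudy"]
initial = {"Sunny": 0.5, "Cloudy": 0.5}
transition = {"Sunny": {"Sunny": 0.9, "Cloudy": 0.1},
              "Cloudy": {"Sunny": 0.5, "Cloudy": 0.5}}
emission = {"Sunny": {"Dry": 0.95, "Wet": 0.05},
            "Cloudy": {"Dry": 0.6, "Wet": 0.4}}

def viterbi(observations):
    """Most probable hidden state sequence, keeping only the best path to each state."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (initial[s] * emission[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for cur in states:
            # Keep only the most probable way to arrive at `cur`;
            # every other branch into `cur` is pruned (the red x's above).
            new_best[cur] = max(
                (best[prev][0] * transition[prev][cur] * emission[cur][obs],
                 best[prev][1] + [cur])
                for prev in states)
        best = new_best
    return max(best.values())

prob, path = viterbi(["Dry", "Dry", "Wet"])
```

Because only one path per state survives each step, the work per day is a fixed \(N \times N\) sweep rather than a doubling of live paths.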
\newpage
\subsection{Unit 5: Monte Carlo Simulations}
Found this later, it's sort of similar. \url{https://www.bayesserver.com/}

Even better than their jank Bayesian belief network, I may be able to make mixed Bayesian/Markov chain models. This is a big project.

\subsection{Cost-Benefit Analysis of Remote Education}
This section covers a calculation I devised to make me feel better about my life decisions. The data is based on implicit guesswork and, while I will be taking it
somewhat seriously for my decision to do either the online or on-campus RIT Data Science Masters Program, it should not be taken seriously as a probabilistic model.
Since there is no framework for making a subjective decision weighing the potential benefits of on-campus life against the value of entering the workforce 18 months
sooner, I decided to make one. Inshallah I shall reach my true potential and fulfill destiny.

\subsubsection{Selecting and Creating Key Metrics}
Since both programs result in a Data Science M.S. degree (albeit under the school of Software Engineering for on-campus versus the school of Information for online),
the functional equivalence of the resulting certificate of completion is an effective isolator of potential long-term ramifications in career path that might otherwise
be dictated by hiring processes that favor one degree over the other. Therefore, this analysis is justified in focusing only on events occurring during my extended
education. I have selected two calculated features\footnote{features that I do not intend to calculate on the basis that it is impossible without a crystal ball and
knowledge of fortune telling - a cursed art that has been forbidden by the council for centuries.} that are important to determining the utility of
potential events from each masters program.

The generalized feature I've selected is serendipity\footnote{Read more about this definition of serendipity in \textit{Where Good Ideas Come From: The Natural
History of Innovation} by Steven Johnson}: the potential for the spontaneous formulation of creative genius brought about by the random collision of ideas - the
proverbial cafe of intellectuals where overheard conversations turn into incredible revelations. The on-campus program excels in this category because it extends
my stay in the academically diverse setting of Rochester Institute of Technology's main campus, potentially enabling interdisciplinary connections and research
opportunities. It also would grant me more time to get involved in the Simone Center for Innovation and Entrepreneurship, which is an enticing hub for startups that
I can see myself becoming a key part of. In contrast, the online program offers me few opportunities to connect within RIT while opening the door to starting a
career in person sooner, which holds potential for intrapreneurship and a more directed interdisciplinary relationship. I acknowledge the magnitude of such
opportunities to be lesser, but more probable, especially if I change jobs more frequently.

When I was first choosing features, I wanted to include a second metric to capture a level of character growth and mental health as a reflection of the impact of being
online and not being face-to-face with other people. In doing so, I'd be modeling real-life variables that most would overlook.
Digging into it, I realized I'd have to derive it from the magnitude and probabilities of social advantages of each program.
The community fostered, the friends not made. I can't bring myself to even make up numbers for that in a goof napkin-math formula.
Measuring covariance between these two features just feels disgusting. Instead, I'm going to negate the whole variable with this assumption about finding something
else to do with my life outside of work:
\begin{center}
\textit{The negative social effects of online program isolation are equal to and canceled out by the personal growth derived from the extra effort to find
'the third place'\footnote{First and second places are home and work. Read more at: \url{https://en.wikipedia.org/wiki/Third_place}} seeded by the frustration
towards myself for putting myself in this position.}
\end{center}

\paragraph{Creating PMFs}

Let's create probability mass functions for our feature in each program to subjectively measure potential:

Let the probability of magnitude \(X\) serendipity on the campus program and the online program be denoted \(P(X_c)\) and \(P(X_o)\) respectively.

The on-campus program has advantages in serendipity, but while events may be an order of magnitude more impactful, I've already been on campus for three and a half
years and it feels highly unlikely that I will make sufficient changes to my routines to grant me more than a marginal probability of a serendipitous event occurring:
\begin{equation*}
P(X_c) =
\begin{cases}
.8\qquad\text{if }&X=0\\
.105&X=1\\
.045&X=2\\
.025&X=3\\
.0125&X=4\\
.009&X=5\\
.0035&X=6\\
0&\text{Otherwise}
\end{cases}
\end{equation*}

**graph**

The online program wields greater chances of serendipity by placing me in more unique environments by means of starting my career sooner, hopefully giving me more
time to utilize what remains of my ambition before it crumbles with age and routine. There may be less of an impact for a serendipitous event when experiencing it
remotely or within a corporate structure, but what does a foolish little boy still in school know about the passion imbued by one's own accidental discoveries?

\begin{equation*}
P(X_o) =
\begin{cases}
.6\qquad\text{if }&X=0\\
.225&X=1\\
.115&X=2\\
.045&X=3\\
.0087&X=4\\
.0043&X=5\\
.002&X=6\\
0&\text{Otherwise}
\end{cases}
\end{equation*}

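As a quick sanity check in the same napkin-math spirit (my own addition, names invented), both PMFs sum to one, and their expected magnitudes can be compared:

```python
# Sanity check on the two PMFs above: magnitude -> probability
pmf_campus = {0: .8, 1: .105, 2: .045, 3: .025, 4: .0125, 5: .009, 6: .0035}
pmf_online = {0: .6, 1: .225, 2: .115, 3: .045, 4: .0087, 5: .0043, 6: .002}

def expected_magnitude(pmf):
    """E[X] = sum of x * P(x) over the support."""
    return sum(x * p for x, p in pmf.items())

# Both are valid PMFs: probabilities sum to 1
assert abs(sum(pmf_campus.values()) - 1) < 1e-9
assert abs(sum(pmf_online.values()) - 1) < 1e-9

e_campus = expected_magnitude(pmf_campus)   # 0.386
e_online = expected_magnitude(pmf_online)   # ~0.6583
```

By this crude summary the online program's expected serendipity magnitude edges out campus, though collapsing a PMF to its mean discards exactly the heavier-tail, higher-impact argument made above for the on-campus events.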

**graph**

With archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y = 3x^2 - 2y\) as the equation for covariance.

\end{document}