mirror of
https://github.com/asimonson1125/Implementations-of-Probability-Theory.git
synced 2026-02-24 21:59:50 -06:00
Semester Submission
@@ -57,8 +57,9 @@ used at the intersection of data and computing after the preprocessing stage.
After beginning this study the intended deliverable outline was determined to be technically implausible and has been replaced with demonstrations of applied algorithms. Taking inspiration from the retinal mosaic as displayed in \textbf{CSCI 431: Intro to Computer Vision} and discussion in \textbf{IGME 589: Computational Creativity and Algorithmic Art} on the appearance and nature of randomness in graphics, I will use this report as a platform for conceptual refactorization. These experiments are designed to appeal to human logical heuristics, helping them function as educational resources that develop a deeper understanding of why these systems work, not just the equations needed to use them.

\newpage
\section{Units}

@@ -66,8 +67,9 @@ a program that can determine the liklihood that randomly distributed colors on a

\subsection{Unit 1: Statistics Review}
To ensure a strong statistical foundation for the later units on probabilistic models, the first objective is to create a document outlining and defining the key topics that are prerequisites for probability in statistics or for understanding generic analytical models. While not intended to be in-depth, the review can function as a quick topic recall and simplification dictionary.

\subsubsection{Random Variables}
\begin{enumerate}
@@ -115,14 +117,15 @@ There are three probability axioms:
Where \(N\) is the population size, \(\mu\) is the population mean, and \(X\) is each value in the population.\\
For samples, variance is calculated with \textbf{Bessel's Correction}, which divides by \(n - 1\) rather than \(n\) to correct the bias introduced by estimating the mean from the sample itself:
\[s^{2}=\frac{\sum(X - \bar{x})^{2}}{n - 1}\]
\item \textbf{Standard Deviation - }The square root of the variance, giving a measure of the average distance of each data point from the mean in the same units as the data.
\[\sigma = \sqrt{V}\quad\text{where variance is }V\]
\end{enumerate}
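The population and sample formulas above can be sanity-checked with a short sketch (a minimal example; the data and function names are mine):

```python
import math

def population_variance(xs):
    """Variance over a full population: divide by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sample variance with Bessel's Correction: divide by n - 1."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))             # 4.0
print(sample_variance(data))                 # ~4.571 (Bessel inflates the estimate)
print(math.sqrt(population_variance(data)))  # population standard deviation: 2.0
```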
\subsubsection{Probability Functions}
Probability functions map random variables to the likelihood that they take a specific value.

\subsubsection*{Probability Mass Functions}\label{PMF}
Probability Mass Functions (PMFs) map discrete random variables.
For example, a six-sided die roll creates a uniform random PMF. Each side of the die has a one-sixth chance of landing face-up, so the discrete chance of each \(x\) value between 1 and 6 is represented by a \(\frac{1}{6}\) portion of the sample space:
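This uniform PMF can also be checked empirically; a quick sketch (the sample size is my choice):

```python
import random
from collections import Counter

# Roll a fair six-sided die many times and compare the empirical
# frequencies to the theoretical 1/6 per face.
rng = random.Random(0)
n = 60_000
rolls = Counter(rng.randint(1, 6) for _ in range(n))
pmf = {face: count / n for face, count in sorted(rolls.items())}
print(pmf)  # every face lands near 1/6 ~ 0.167
```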
@@ -285,8 +288,8 @@ sample size.
\subsubsection{Dempster-Shafer Theory}\label{Dempster_Shafer_Theory}
This section is an extra theory chosen to coincide with the unit 3 focus on Bayesian statistics. The Dempster-Shafer theory is a derivative application of Bayes Theorem (\ref{Bayes Theorem}) where subjective beliefs are applied to independent variables not tracked by the belief network. Shafer eloquently describes this process by supposing that two friends, both of whom he subjectively believes are 90\% reliable, tell him that a limb has fallen on his car\footnote{\url{http://glennshafer.com/assets/downloads/articles/article48.pdf}}. Without observing Shafer's car, we can calculate that there is only a 1\% chance that both friends are unreliable, so there is a high likelihood that the statement is true.

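The 1\% figure falls out directly from independence; a two-line sketch (variable names are mine):

```python
# Each friend is subjectively believed to be 90% reliable, independently.
r1 = r2 = 0.90
p_both_unreliable = (1 - r1) * (1 - r2)  # ~0.01: a 1% chance neither is credible
support = 1 - p_both_unreliable          # Dempster-Shafer support for the report
print(round(p_both_unreliable, 2), round(support, 2))
```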
@@ -307,13 +310,47 @@ then renormalize the entire community, resulting in local grocery store offering
community. If a data scientist then infers from the offerings of this grocery store the dietary preferences of the community, they would be inclined to believe that the actual minority is not just a majority, but a requirement amongst the population. In this sense, tolerance for intolerance begets intolerance.

\subsubsection{Scale as a Dimension}
Just as the rate and plausibility of renormalization is impacted by the ratio of the minority to the flexible majority, other interactions can become more complex through scale to the same effect as the curse of dimensionality. The curse of dimensionality is a reference to the exponential complexity of solving a problem with \(x\) variables. Two boolean variables, each containing one of two values, have 4 possible combinations of values. A third variable doubles this number to 8, and a fourth doubles it again to 16. In complex interactions, scale acts as its own source of dimensionality because each new node in an ecosystem can interact with each pre-existing node, influencing interactions between it and another pre-existing node, which then influences the interactions from that node, and so forth.

In \textit{Skin in the Game}, Taleb uses the example of neuroscience to show the improbability of AI ever reflecting the full complexity of the human brain. He acknowledges advancements in neuroscience that accurately model interactions between neurons in the human brain, but scaling this up to replicate human behavior is not so easy. While binary variables apply an exponential effect with a base of 2 (\(2^x\) where \(x\) is the number of binary variables), neurons interlock and may have an effective base in the hundreds, thousands, or even millions.

This complexity, Taleb says, explains why even carefully studied brains of worms with only 300 neurons are still too complex to really understand, let alone simulate. If neurons had only a binary effect, the complexity could be calculated as \(2^{300} \approx 2 * 10^{90}\) which, while massive, could conceivably be computed in the distant future. However, if each neuron can interact with just 5 others, the combinatorial explosion grows to \((2^5)^{300} \approx 3.5 * 10^{451}\). Applying Moore's Law, assuming that a society's computational capacity doubles every two years, it would take roughly 2400 years before this difference in computational power could be rectified:
\[
2 * \log_2\left(\frac{3.5 * 10^{451}}{2 * 10^{90}}\right) \approx 2400.04 \text{ years}
\]
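The magnitudes and the 2400-year figure can be reproduced with logarithms; a sketch (using the exact \(2^{300}\) and \(2^{1500}\) rather than their rounded decimal forms, which is why the result lands on 2400.0 instead of 2400.04):

```python
import math

# log10 of the two complexity estimates from the text.
binary_brain = 300 * math.log10(2)        # log10(2^300)     ~ 90.3
interlocked_brain = 1500 * math.log10(2)  # log10((2^5)^300) ~ 451.5

# One Moore's-Law doubling every two years until the gap is covered.
doublings = (interlocked_brain - binary_brain) / math.log10(2)
print(round(2 * doublings, 1))  # 2400.0 years
```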

Not to mention, neuron interactions are incredibly complex, containing dimensions in and of themselves, not binary values. Good luck computing that, robot overlords.

\subsubsection{Methodology Considerations}
As another homage to \textit{Garbage In, Garbage Out}, I'd like to present some instances of methodology creating useless data for the target variables. This is not just a reference to bad studies, such as those that try to measure social behaviors while oblivious to the fact that participant observation alters that behavior. There are many instances where data can be untainted but used without appropriate context. In particular, \textit{The Signal and the Noise} and \textit{Fooled by Randomness} highlight many instances where time-series studies assume that decades of historical data are necessarily comprehensive. Financial events in particular are often labelled as unpredictable by experts only when their models fail; the context of a national economy can change dramatically, revealing attributes of market economics that were previously obscured by practices that isolated those variables. An event never occurring in history does not discount its possibility of occurring in the future. Similarly, events that may have been impossible in the past are not necessarily impossible in the future. As an extreme example to prove a point, consider the following:
\begin{quote}
I have taken 10134023 instances of the last 40 years, during all of which Obama has been alive. Therefore, with so much time passed and many trials, I can say with a high degree of certainty that Obama is immortal.
\end{quote}
Silly, yes, but it is easy to become detached from context points when you begin digging deep into mathematical models. Data science is generally considered to be the intersection of coding, statistics, and domain knowledge, implying domain knowledge is secondary to computational ability. I'd argue just the opposite - incomplete knowledge of contemporary models still lends itself to effective data analysis, but an incomplete understanding of what is being measured is dangerous and potentially counterproductive.

\newpage

\subsection{Unit 3: Bayesian Statistics}
@@ -415,7 +452,8 @@ category:
\vskip 2pt
Bayes Theorem as applied to this problem can be simply expressed as:
\[
P(\text{has cancer given positive test}) = \frac{\colorbox{blue!5}{TP}}{\colorbox{blue!5}{TP} + \colorbox{red!5}{FP}} = \frac{\colorbox{blue!5}{\(\frac{95}{1000}\)}}{\colorbox{blue!5}{\(\frac{95}{1000}\)} + \colorbox{red!5}{\(\frac{45}{1000}\)}} = 67.9\%
\]
Meaning that, given a random positive test, there is a 67.9\% chance of the patient actually having cancer, not far off from the two-thirds visual trick.

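The same arithmetic in a few lines, using the counts per 1000 patients from the example:

```python
# True positives and false positives per 1000 patients.
tp = 95 / 1000
fp = 45 / 1000
posterior = tp / (tp + fp)  # P(has cancer | positive test)
print(round(posterior, 3))  # 0.679
```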
@@ -429,6 +467,9 @@ the calculation only performed once.
\subsubsection{Bayesian Belief Networks}
\begin{center}
\textit{Using Bayes to build an ensemble of models}
\end{center}
Bayesian Belief Networks are probabilistic graphical models that preserve conditional dependence between random variables. In spite of their name, Bayesian Belief Networks do not necessarily apply Bayesian models, though they are a way to utilize Bayes Theorem for domains whose complexity extends beyond a single posterior probability. In this type of network, edges are directed and the structure is utilized in a single direction. This is in contrast to undirected
@@ -493,7 +534,7 @@ positives from two independent tests each with 50\% of positives being true. If
that the tests partially measure the same thing, as would have occurred in a Naive Bayes model, the tests' combined accuracy would be unjustly inflated.

\newpage
\subsection{Unit 4: Markov Methods}\label{Markov}

\subsubsection{Markov Chains}
@@ -617,15 +658,15 @@ Thus, an observation sequence may look like this:

In this case, it can be confidently assumed that the wet signal is representative of a rainy, cloudy day. In contrast, we can only be moderately confident that the two dry days leading up to it were sunny days. Intuitively, it is most likely that there were two sunny days followed by a rainy day. By multiplying the probability of each observation by the probability of the transition into the potential state, the probability of occurrence is revealed. For the purposes of the example we will use the 83\%-16\% equilibrium matrix from earlier as the initialization matrix to reflect the random chance of any given day being sunny or cloudy:
\begin{center}
Three consecutive sunny days:
\[(\frac{5}{6} * .95) * (.9 * .95) * (.9 * .05) \approx 0.03 \]
Three consecutive cloudy days:
\[(\frac{1}{6} * .6) * (.5 * .6) * (.5 * .4) = 0.006 \]
Sunny, sunny, cloudy:
\[(\frac{5}{6} * .95) * (.9 * .95) * (.1 * .4) \approx 0.027 \]
\end{center}
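These path probabilities can be enumerated mechanically; a sketch with the example's transition, emission, and equilibrium numbers (the dictionary and function names are mine):

```python
from itertools import product

# Model pieces reconstructed from the example.
init = {"Sunny": 5 / 6, "Cloudy": 1 / 6}
trans = {("Sunny", "Sunny"): 0.9, ("Sunny", "Cloudy"): 0.1,
         ("Cloudy", "Sunny"): 0.5, ("Cloudy", "Cloudy"): 0.5}
emit = {("Sunny", "Dry"): 0.95, ("Sunny", "Wet"): 0.05,
        ("Cloudy", "Dry"): 0.6, ("Cloudy", "Wet"): 0.4}

def path_probability(states, observations):
    """Joint probability of a hidden state path and the observed signals."""
    p = init[states[0]] * emit[(states[0], observations[0])]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= trans[(prev, cur)] * emit[(cur, obs)]
    return p

obs = ["Dry", "Dry", "Wet"]
for path in product(["Sunny", "Cloudy"], repeat=3):
    print(path, round(path_probability(path, obs), 4))
```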

Interestingly, the calculation reveals that it is actually more probable that there was an unusual wet third day during a sunny streak than for there to have been
@@ -638,6 +679,9 @@ a meaning as \(\pi\) be addressed for something that has no relation to the cons
accessibility of mathematics for anybody shy of a walking computational index.

\subsubsection{Viterbi Algorithm}
\begin{center}
\textit{Markov is memoryless - only the most probable sequence to a state matters}
\end{center}
While it is feasible to calculate the probabilities for each possible route to a series of observations, such a process has exponential time complexity. With each state change, the number of paths to keep track of grows exponentially, which in practical terms means countless threads on each state separated only by the history of how they got there. Enter the Viterbi Algorithm, which reduces the effect of a step (or, as in our example, a new day) from an exponential
@@ -688,9 +732,9 @@ calculated:

\begin{center}
Two consecutive sunny days:
\[(\frac{5}{6} * .95) * (.9 * .95) \approx 0.677 \]
Cloudy, sunny:
\[(\frac{1}{6} * .6) * (.5 * .95) = 0.0475 \]
\end{center}

Hence, we can eliminate the \([\text{Cloudy, Sunny}]\) starting sequence from the most probable sequence of steps given the observations. Doing the same thing
@@ -713,121 +757,258 @@ for the rest of the visualization leaves fewer arrows and therefore fewer calcul
(Sunny1) edge node {} (Sunny2)
(Cloudy1) edge node {} (Cloudy2)
(Sunny2) edge node {} (Sunny3)
(Cloudy2) edge node {} (Cloudy3)
(Sunny2) edge node {} (Cloudy3);
% \path[->, draw=red]
% (Sunny1) edge node {} (Cloudy2)
% (Cloudy1) edge node {} (Sunny2)
% (Cloudy2) edge node {} (Cloudy3)
% (Cloudy2) edge node {} (Sunny3);
\end{tikzpicture}
\end{center}

With only two sequences remaining, the final comparison needs only to determine if it is more likely for there to have been three consecutive sunny days or a sequence of two sunny days and a cloudy day\footnote{Had we assumed a 50-50 chance of initialization on a sunny or cloudy day, the probability of three consecutive cloudy days would have been more likely than a sunny, sunny, cloudy sequence. Yet another example where contextual completeness in the methodology makes a significant improvement in accuracy over what might otherwise have been napkin math.}, which we already calculated in the Hidden Markov Model section (\ref{HMMs}). If this calculation were extended to include additional days, the Viterbi Algorithm would never need to calculate a path that started with two cloudy days because all branches stemming from that route have already been pruned by the third day.
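A sketch of the pruning described above, using the same model numbers (the structure and names are mine):

```python
# Viterbi on the weather example: keep only the single most probable
# path into each state at each step, pruning everything else.
init = {"Sunny": 5 / 6, "Cloudy": 1 / 6}
trans = {"Sunny": {"Sunny": 0.9, "Cloudy": 0.1},
         "Cloudy": {"Sunny": 0.5, "Cloudy": 0.5}}
emit = {"Sunny": {"Dry": 0.95, "Wet": 0.05},
        "Cloudy": {"Dry": 0.6, "Wet": 0.4}}

def viterbi(observations):
    # best[s] = (probability, path) of the most probable path ending in state s
    best = {s: (init[s] * emit[s][observations[0]], [s]) for s in init}
    for obs in observations[1:]:
        new_best = {}
        for cur in init:
            # One surviving branch per state keeps the work linear per step.
            p, path = max((best[prev][0] * trans[prev][cur], best[prev][1])
                          for prev in init)
            new_best[cur] = (p * emit[cur][obs], path + [cur])
        best = new_best
    return max(best.values())

prob, path = viterbi(["Dry", "Dry", "Wet"])
print(path, round(prob, 4))  # three consecutive sunny days win
```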

\newpage
\subsection{Unit 5: Monte Carlo Simulations}
Monte Carlo Simulations are models that directly recreate the conditions of an environment containing random variables, simulating the outcome given a value in place of each random variable. This placeholder value may be an average of an expected occurrence, but often the simulation is run many times with a randomly selected value so the results can be analyzed in place of many trials in the real environment.

Monte Carlo is useful when interactions between many variables produce deterministic but intractable results, or when the steps to translate a process into a deterministic model are not fully understood. For every probability problem there exists a Monte Carlo Simulation that steps through the process of how a result is created without any derived formulation (which may be incorrect, especially if a problem is not completely understood). While the results are influenced by short-term bias in the random variable, they converge towards the true Probability Mass Function (\ref{PMF}) as long as the simulation accurately reflects the interaction between variables.

\subsubsection{How To Make a Monte Carlo Simulation}
If you've ever created a simulation and run it multiple times to get a feel for what is most likely to happen, congratulations! You've created a Monte Carlo Simulation.

As an example, consider the scenario described in the Markov Model section of this report (\ref{Markov}) where we want to predict if a day \(x\) days in the future will be either sunny or cloudy. Here is that same table representing the odds of a day transitioning from the state of the previous day:
\begin{center}
\begin{tabular}{ | c | c | c | }
\hline
Current State & Next: Sunny & Next: Cloudy \\
\hline
\hline
Sunny & 90\% & 10\% \\
\hline
Cloudy & 50\% & 50\% \\
\hline
\end{tabular}
\end{center}

\paragraph{Creating PMFs}
To run a single possibility of this interaction, initialize the state to define if the first day is sunny or cloudy (possibly using the equilibrium matrix as discussed previously). Then, generate a random number between 0 and 1 and partition the possible results to match the table. If the first day is sunny, for example, transition to a cloudy state if the number is greater than .9, reflecting the 90\% chance that the next day will also be sunny. Continuing this for the next few days, the random variable may leave a state transition path like \([\text{Sunny, Sunny, Cloudy}]\). Running the simulation again may net a different path: \([\text{Sunny, Cloudy, Sunny}]\). With more simulations, the collected random sample will quantify the probability of a sunny day on the third day with a simple ratio:
\[\frac{\text{\# of simulations that end with a sunny day}}{\text{total \# of simulations}} \approx 0.86\footnote{Again, assuming a 100\% chance of sunny day initialization.}\]
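A sketch of this simulation loop (assuming, as in the footnote, a guaranteed sunny first day; names are mine):

```python
import random

# Probability that the *next* day is sunny, given the current state (from the table).
NEXT_SUNNY = {"Sunny": 0.9, "Cloudy": 0.5}

def simulate(days, rng):
    state = "Sunny"  # 100% sunny initialization, per the footnote
    path = [state]
    for _ in range(days - 1):
        state = "Sunny" if rng.random() < NEXT_SUNNY[state] else "Cloudy"
        path.append(state)
    return path

rng = random.Random(0)
runs = 100_000
sunny_endings = sum(simulate(3, rng)[-1] == "Sunny" for _ in range(runs))
print(sunny_endings / runs)  # converges toward the 0.86 given by the k-step matrix
```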

We can validate this model by using our k-step transition matrix (\ref{Markov}):
\[
\begin{pmatrix}
.9 & .1 \\
.5 & .5
\end{pmatrix}^2
=
\begin{pmatrix}
.86 & .14 \\
.7 & .3
\end{pmatrix}
\]
Recall that the top left number of this matrix reflects the probability of ending on a sunny day (column) given that the first day was sunny (row).

\subsubsection{Monte Carlo Integration}
Monte Carlo Integration is one use of Monte Carlo Simulations where the area of an object (or a graphical integral) is calculated by selecting random coordinates and computing the ratio of points that landed in the object (under the curve) to the total number of random coordinates. I'm including this section in the report for completeness: when I drafted this study's schedule, I incorrectly assumed that this was a topic that would extend Monte Carlo, not just apply it.\footnote{I made this mistake at least twice. If you're bored, try to spot which topics they are. Unlicensed gamification moment.}

One example of this integration method, called Buffon's Needle, is an approximation of pi (yes, \(\pi\)) by dropping sticks on a series of parallel lines. Assuming the length of the sticks is shorter than the distance between the parallel lines, the probability of a stick crossing a line is governed by the expression \(\frac{2l}{\pi d}\), where \(l\) is the length of the sticks and \(d\) is the space between parallel lines\footnote{Learn more about and run a Monte Carlo Simulation of the sticks approximation at \url{https://prancer.physics.louisville.edu/modules/pi/index.html}}.

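Inverting the crossing probability gives an estimator for \(\pi\); a sketch (the needle length, line spacing, and sample count are my choices):

```python
import math
import random

def buffon_pi(n, l=1.0, d=2.0, seed=0):
    """Estimate pi by dropping n needles of length l on lines spaced d apart (l <= d).

    P(cross) = 2l / (pi * d), so pi ~ 2l * n / (d * crossings).
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n):
        center = rng.uniform(0, d / 2)           # distance from needle center to nearest line
        theta = rng.uniform(0, math.pi / 2)      # needle angle relative to the lines
        if center <= (l / 2) * math.sin(theta):  # the needle reaches the line
            crossings += 1
    return 2 * l * n / (d * crossings)

print(buffon_pi(200_000))  # roughly 3.14; varies with the random draws
```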
\subsubsection{Markov Chain Monte Carlo (MCMC) methods}
\begin{center}
\textit{Simulations can depend on their prior results}
\end{center}
MCMCs are a class of Monte Carlo simulations that epitomize stochastic sampling. Given a probability distribution that is too complex to be analyzed traditionally, MCMCs approximate the target distribution with an equilibrium distribution that converges on that target distribution.

Contrary to the name of "Markov Chain Monte Carlo" and most educational works on the topic, I believe the easiest way to understand MCMC is as a Monte Carlo simulation with 1-step memory. MCMC invokes the name of Markov Chains because each value in the array of sampled random values is randomly selected under the influence of the previous value - something many compare to the memoryless state-hopping in Markov Chains. In reality, the 'state' in MCMCs is just a value whose importance is in how often it appears in the array. It's not a state with contextual value or an associated transition matrix.

There are a number of algorithms that implement the concept of MCMC, the most common of which is called the \textbf{Metropolis-Hastings Algorithm}.\footnote{If you're like me and can't handle the abstractions that education by mathematical notation requires, this video on Metropolis-Hastings is the best I can point you to on the topic of MCMCs: \url{https://www.youtube.com/watch?v=oX2wIGSn4jY}}
In this variation, an initial value is selected at random. For each step, another random value, frequently in the range of one standard deviation, is added to this number, and the resulting proposal has a \(\frac{P(\text{new value})}{P(\text{new value}) + P(\text{current value})}\) chance of becoming the new current value and being added to the list of samples. If the current sample is selected over the new value, the current sample is added a second time to the list of samples. This acceptance criterion directs the samples towards high probability events while still keeping open the chance of the samples bridging the gap between local probabilistic maxima.

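A sketch of this sampler targeting an unnormalized standard normal, using the acceptance chance exactly as stated above (that particular ratio is sometimes called Barker's rule; the target density, step size, and names are my choices):

```python
import math
import random

def target(x):
    # Unnormalized density of a standard normal; stands in for any distribution
    # we can evaluate pointwise but not sample from directly.
    return math.exp(-x * x / 2)

def mcmc_samples(n, step=1.0, seed=0):
    rng = random.Random(seed)
    current = rng.uniform(-1, 1)  # arbitrary starting value
    samples = []
    for _ in range(n):
        proposal = current + rng.gauss(0, step)
        # Acceptance chance as described in the text: P(new) / (P(new) + P(current)).
        if rng.random() < target(proposal) / (target(proposal) + target(current)):
            current = proposal
        samples.append(current)  # a rejected step re-records the current value
    return samples

xs = mcmc_samples(50_000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(round(mean, 2), round(var, 2))  # near 0 and 1 for a standard normal target
```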
% \newpage
% \subsection{Unit 6: Miscellaneous}
% This section represents research on topics that were not initially a part of the study's scope but were either interesting, relevant, or suggested to me.

% \subsubsection{Spatial Descriptive Statistics and Ripley's K and L Functions}
% \url{https://en.wikipedia.org/wiki/Spatial_descriptive_statistics}

% \subsubsection{Gibbs Sampling}

% \newpage
% \section{Applied Projects}
% \rule{14cm}{0.05cm}

% \subsection{Randomness of Retinal Mosaic layout}
% hexagonal grid of marbles. are colors randomly distributed?
% Hexagonal basis vectors, retinal mosaic, entropy

% \subsection{Bayes Server Ripoff}
% I planned to create a trickle-down density belief network using probability density functions as nodes that choose the direction of rows in a relational database.
% Found this later, it's sort of similar. \url{https://www.bayesserver.com/}

% Even better than their jank bayesian belief network I may be able to make mixed bayesian/markov chain models. This is a big project.

% \subsection{Modeling the Invisible Hand}
% Skin in the Game: Taleb. Unintelligent random actors in structure create intelligent decisions.
% I can monte carlo that shit.

% \subsection{Cost-Benefit Analysis of Remote Education}
% This section covers a calculation I devised to make me feel better about my life decisions. The data is based on implicit guesswork and, while I will be taking it
% somewhat seriously for my decision to do either the online or on-campus RIT Data Science Masters Program, it should not be taken seriously as a probabilistic model.
% Since there is no framework for making a subjective decision weighting the potential benefits of on-campus life with the value of entering the workforce 18 months
% sooner, I decided to make one. Inshallah I shall reach my true potential and fulfill destiny.

% \subsubsection{Selecting and Creating Key Metrics}
% Since both programs result in a Data Science M.S. degree (albeit under the school of Software Engineering for on-campus versus the school of information for online),
% the functional equivalence of the resulting certificate of completion is an effective isolator of potential long-term ramifications in career path that might otherwise
|
||||
% be dictated by hiring processes that favor one degree over the other. Therefore, this analysis is justified in focusing only on events occurring during my extended
|
||||
% education. I have selected two calculated features\footnote{features that I do not intend to calculate on the basis that it is impossible without a crystal ball and
|
||||
% knowledge of fortune telling - a cursed art that has been forbidden by the council for centuries.} that are important to determining the utility of
|
||||
% potential events from each masters program.
|
||||
|
||||
% The generalized feature I've selected is serendipity\footnote{Read more about this definition of serendipity in \textit{Where Good Ideas Come From: The Natural
|
||||
% History of Innovation} by Steven Johnson}: the potential for the spontaneous formulation of creative genius brought about by the random collision of ideas - the
|
||||
% proverbial cafe of intellectuals where overheard conversations turn into incredible revelations. The on-campus program excels in this category because it extends
|
||||
% my stay in the academically diverse setting of Rochester Institute of Technology's main campus, potentially enabling interdisciplinary connections and research
|
||||
% opportunities. It also would grant me more time to get involved in the Simone Center for Innovation and Entrepreneurship which is an enticing hub for startups that
|
||||
% I can see myself becoming a key part of. In contrast, the online program offers me few opportunities to connect within RIT while opening the door to starting a
|
||||
% career in person sooner, which holds potential for intrapreneurship and a more directed interdisciplinary relationship. I acknowledge the magnitude of such
|
||||
% opportunities to be lesser, but more probable, especially if I change jobs more frequently.
|
||||
|
||||
% When I was first choosing features I wanted to include a second metric to capture a level of character growth and mental health as a reflection of the impact of being
|
||||
% online and not being face-to-face with other people. In doing so I'd be modeling real-life variables that most would overlook.
|
||||
% Digging into it I realized I'd have to derive it from the magnitude and probabilities of social advantages of each program.
|
||||
% The community fostered, the friends not made. I can't bring myself to even make up numbers for that in a goof napkin-math formula.
|
||||
% Measuring covariance between these two features just feels disgusting. Instead, I'm going to negate the whole variable with this assumption about finding something
|
||||
% else to do with my life outside of work:
|
||||
% \begin{center}
|
||||
% \textit{The negative social effects of online program isolation are equal to and canceled out by the personal growth derived from the extra effort to find
|
||||
% 'the third place' \footnote{First and second places are home and work. Read more at: \url{https://en.wikipedia.org/wiki/Third_place}} seeded by the frustration
|
||||
% towards myself for puttimg myself in this position.}
|
||||
% \end{center}
|
||||
|
||||
% \paragraph{Creating PMFs}
|
||||
|
||||
% Let's create probability mass functions for our feature in each program to subjectively measure potential:
|
||||
|
||||
% Let the probability of magnitude \(X\) serendipity on the campus program and the online program as \(P(X_c)\) and \(P(X_o)\) respectively.
|
||||
|
||||
% The on-campus program has advantages in serendipity, but while events may be an order of magnitude more impactful, I've already been on campus for three and a half
|
||||
% years and it feels highly unlikely that I will make sufficient changes to my routines to grant me more than a marginal probability of a serendipitous event occurring
|
||||
% \begin{equation*}
|
||||
% P(A_c) =
|
||||
% \begin{cases}
|
||||
% .8\qquad\text{if }&X=0\\
|
||||
% .105&X=1\\
|
||||
% .045&X=2\\
|
||||
% .025&X=3\\
|
||||
% .0125&X=4\\
|
||||
% .009&X=5\\
|
||||
% .0035&X=6\\
|
||||
% 0&\text{Otherwise}
|
||||
% \end{cases}
|
||||
% \end{equation*}
|
||||
|
||||
% **graph**
|
||||
|
||||
% The online program wields greater chances of serendipity by placing me in more unique environments by means of starting my career sooner, hopefully giving me more
|
||||
% time to utilize what remains of my ambition before it crumbles with age and routine. There may be less of an impact for a serendipitous event when experiencing it
|
||||
% remotely or within a corporate structure, but what does a foolish little boy still in school know about the passion inbued by one's own accidental discoveries?
|
||||
|
||||
% \begin{equation*}
|
||||
% P(A_o) =
|
||||
% \begin{cases}
|
||||
% .6\qquad\text{if }&X=0\\
|
||||
% .225&X=1\\
|
||||
% .115&X=2\\
|
||||
% .045&X=3\\
|
||||
% .0087&X=4\\
|
||||
% .0043&X=5\\
|
||||
% .002&X=6\\
|
||||
% 0&\text{Otherwise}
|
||||
% \end{cases}
|
||||
% \end{equation*}
|
||||
|
||||
% **graph**
|
||||
|
||||
% with archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y= 3x^2 - 2y\) as the equation for covariance.
|
||||
|
||||
\newpage
\section{Retrospective Discussion}

At the end of this independent study, it is worth reflecting on how my initial proposal changed as I learned more about this topic. Going into the Fall 2024 semester, I wanted to understand how complex algorithms manage the influence of untracked variables and how they could be used to derive formulas for the influence of tracked variables on the target. While I did receive some insight on how to formulate experiments to do this, especially through a more personal understanding of the foundational statistics, I found fairly little industry application of the scientific conceptualization that I expected. Most practical applications of probability theory rely less on an in-depth understanding of a scenario's component interactions and more on building a model that is robust to what it does not understand. Instead of removing noise, probabilistic techniques work within the noise and are capable of correcting themselves when noise leads them to an incorrect assessment.

I still believe in the value of probability for tracking underlying and derivative features, and in the future I will consider the development of multivariate and noise-isolation techniques. By executing this study when I did, not only will the content I learned be fresh in my mind when I start my Data Science graduate classes next month, but the unresolved curiosities it uncovered will also be given a chance to develop. I'm already half-expecting one of the projects I thought up for this study to end up in my thesis. If in two years I publish some model built on the intelligent action of random but structured agents, you'll know that something from this semester stuck.

A major challenge of this study was sifting through mountains of educational resources that rely on obscure mathematical notation of monumental complexity. It is simply an unfathomable failure on behalf of the educational systems that instruct probability to convey intuitive algorithms in an archaic language that nobody speaks. It felt like striking gold when I finally found the one resource that graphically, or even programmatically, translated these formulas. Most of my research time was dedicated to interpreting educational resources that appeared to have been made to appease superior instructors rather than to instruct. I may have spent hours researching confidence intervals, Bayes' Theorem, and the Viterbi algorithm, but ultimately only a single article or video for each of these topics bridged the gap from abstraction to conceptualization.

I want to propagate this treasure, and I wrote this report to utilize those methods of instruction: not through mathematical abstractions of memory but through description. I am very proud of my newfound skills in writing expressions and creating graphics in \LaTeX, but even here I disjointed and rejoined each calculation with textual explanation, just as one would comment the code of any remotely complex function. Mathematicians should not be exempt from this procedure. Additionally, I structured my report to be comprehensive, down to the order of the axiom review. Content relevant to a section is either found in previous sections or simply described such that there is no need for the actual academic terminology. While there is little expectation that this report will be read by anyone seeking to learn these concepts, I very much hope to hone the explanatory qualities that I have started here and share them with future students.

There may not have been a major application project as we had originally intended for this independent study, but I feel that what came out of it has grounded my understanding of probability theory in how it is actually used, more so than some niche demonstration of poorly-considered viability would have. I'd like to thank my advisor, Dr. Kinsman, for seeing this endeavor for what it is and for encouraging me to keep the research moving in its natural direction. This flexibility amid uncertain guidance is exactly what is needed from data scientists if we are to truly find the unseen gems in our experiments. With the indefinite optimism that is lacking the world over, take confidence in the solutions not yet found.

\newpage
\section{Appendix Information}
Given that this report may only be shared by the RIT Computer Science Department without the appendix, the appendix for this report, including the timesheet and tasks completed for this independent study, will be made available as a separate document.

\end{document}

@@ -1,5 +1,6 @@
Week,Date,Type,Duration (Hours),Description
1,08/30,Advising Meetings,2,"Stat Review Content acknowledgement, Latex overview for reports"
1,08/26-08/30,Research,8,"Subjective Probablistic Models in The Scout Mindset"
2,09/02,Reporting,3,"First applications of Latex for final report, created Timesheet System."
2,09/02,Research,2.5,"Stat Review: Sample Space through Probability Density Functions"
2,09/06,Advising Meetings,1,"Research Review and exploration of PDF expected values and confidence intervals"
@@ -47,3 +48,17 @@ Week,Date,Type,Duration (Hours),Description
12,11/14,Reporting,1.5,"Contrasting Viterbi to exponential blowup"
12,11/15,Reporting,3,"Viterbi graphics"
12,11/15,Reporting,1,"Renormalization and Minority Rule Reporting"
12,11/16,Reporting,2,"applying post-wandering mental feedback"
13,11/18,Research,3,"Monte Carlo Simulations"
13,11/19,Reporting,1,"Retrospective Discussion"
13,11/19,Reporting,2,"Monte Carlo Simulations, Buffon's Needle"
13,11/21,Research,1.5,"Skin in the Game - Fragile System Risk Aversion"
13,11/21,Research,5,"Understanding MCMC. So not worth it."
13,11/21,Reporting,2,"MCMC writeup"
13,11/22,Advising Meetings,1,"Virtual Review of Research"
14,11/26-11/30,Research,4.5,"The Signal and the Noise - Nate Silver"
15,12/3,Advising Meetings,1,"Impromptu Discussion"
15,12/5,Reporting,2,"Scale as Dimension, Methodology Considerations"
15,12/5,Application,1.5,"Scale as Dimension and Moore's Law calculation (hugggeee numbers)"
15,12/6,Advising Meetings,1,"Report Submission Discussion and Future Application"
15,12/6,Reporting,3,"Pre-Finals Report finalization for semester submission"

Binary file not shown.
@@ -15,7 +15,7 @@

\rule{14cm}{0.05cm}\\ \vspace{.5cm}

\Large{Independent Study Timesheet}\\
\Large{Independent Study Appendix and Timesheet}\\
\large{Andrew Simonson}

\vspace*{\fill}
@@ -33,6 +33,8 @@ Week & Date & Type & Duration (Hours) & Description \\
\hline
1 & 08/30 & Advising Meetings & 2 & Stat Review Content acknowledgement, Latex overview for reports \\
\hline
1 & 08/26-08/30 & Research & 8 & Subjective Probablistic Models in The Scout Mindset \\
\hline
2 & 09/02 & Reporting & 3 & First applications of Latex for final report, created Timesheet System. \\
\hline
2 & 09/02 & Research & 2.5 & Stat Review: Sample Space through Probability Density Functions \\
@@ -127,12 +129,40 @@ Week & Date & Type & Duration (Hours) & Description \\
\hline
12 & 11/15 & Reporting & 1 & Renormalization and Minority Rule Reporting \\
\hline
12 & 11/16 & Reporting & 2 & applying post-wandering mental feedback \\
\hline
13 & 11/18 & Research & 3 & Monte Carlo Simulations \\
\hline
13 & 11/19 & Reporting & 1 & Retrospective Discussion \\
\hline
13 & 11/19 & Reporting & 2 & Monte Carlo Simulations, Buffon's Needle \\
\hline
13 & 11/21 & Research & 1.5 & Skin in the Game - Fragile System Risk Aversion \\
\hline
13 & 11/21 & Research & 5 & Understanding MCMC. So not worth it. \\
\hline
13 & 11/21 & Reporting & 2 & MCMC writeup \\
\hline
13 & 11/22 & Advising Meetings & 1 & Virtual Review of Research \\
\hline
14 & 11/26-11/30 & Research & 4.5 & The Signal and the Noise - Nate Silver \\
\hline
15 & 12/3 & Advising Meetings & 1 & Impromptu Discussion \\
\hline
15 & 12/5 & Reporting & 2 & Scale as Dimension, Methodology Considerations \\
\hline
15 & 12/5 & Application & 1.5 & Scale as Dimension and Moore's Law calculation (hugggeee numbers) \\
\hline
15 & 12/6 & Advising Meetings & 1 & Report Submission Discussion and Future Application \\
\hline
15 & 12/6 & Reporting & 3 & Pre-Finals Report finalization for semester submission \\
\hline
\end{longtable}
\noindent Hours for Advising Meetings: 10.0\\
Hours for Application: 16.0\\
Hours for Reporting: 30.0\\
Hours for Research: 46.0\\
\textbf{Total Hours: 102.0}\\
\noindent Hours for Advising Meetings: 13.0\\
Hours for Application: 17.5\\
Hours for Reporting: 42.0\\
Hours for Research: 68.0\\
\textbf{Total Hours: 140.5}\\
% CLOSE Timesheet

\end{document}