Semester Submission

This commit is contained in:
2024-12-06 18:02:57 -05:00
parent a7b3e46f56
commit c9189c0e8f
5 changed files with 351 additions and 125 deletions



@@ -57,8 +57,9 @@ used at the intersection of data and computing after the preprocessing stage.
After beginning this study the intended deliverable outline was determined to be technically implausible and has been replaced with
demonstrations of applied algorithms. Taking inspiration from the retinal mosaic as displayed in \textbf{CSCI 431: Intro to Computer Vision}
and discussion in \textbf{IGME 589: Computational Creativity and Algorithmic Art} on the appearance and nature of randomness in graphics, I will use this report as a
platform for conceptual refactorization. These experiments are designed to appeal to human logical heuristics, helping them function as educational resources that
develop a deeper understanding of why these systems work, not just the equations to use them.
\newpage
\section{Units}
@@ -66,8 +67,9 @@ a program that can determine the likelihood that randomly distributed colors on a
\subsection{Unit 1: Statistics Review}
To ensure a strong statistical foundation for future learning in probabilistic models,
the first objective is to create a document outlining and defining key topics that are
prerequisites for probability in statistics or for understanding generic analytical models. While not intended to be in-depth, the reported review can function as
a topic recall and simplification dictionary.
\subsubsection{Random Variables}
\begin{enumerate}
@@ -115,14 +117,15 @@ There are three probability axioms:
Where \(N\) is the population size, \(\mu\) is the population average, and \(X\) is each value in the population.\\
For samples, variance is calculated with \textbf{Bessel's Correction}, which increases the variance to avoid overfitting the sample:
\[s^{2}=\frac{\sum(X - \bar{x})^{2}}{n - 1}\]
\item \textbf{Standard Deviation - }The square root of the variance, giving a measure of the average distance of each data point from the mean in the same units as
the data.
\[\sigma = \sqrt{V}\quad\text{where variance is }V\]
\end{enumerate}
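As a sanity check on the formulas above, here is a minimal sketch in Python; the function names are my own, not from any particular library:

```python
import math

def variance(data, sample=True):
    """Average squared distance from the mean; Bessel's Correction (n - 1) for samples."""
    n = len(data)
    mean = sum(data) / n
    squared_error = sum((x - mean) ** 2 for x in data)
    return squared_error / (n - 1) if sample else squared_error / n

def std_dev(data, sample=True):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(data, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False))  # population variance: 4.0
print(std_dev(data, sample=False))   # population standard deviation: 2.0
print(variance(data))                # sample variance is larger: ~4.571
```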
\subsubsection{Probability Functions}
Probability functions map the likelihood that a random variable takes a specific value.
\subsubsection*{Probability Mass Functions}\label{PMF}
Probability Mass Functions (PMFs) map discrete random variables.
For example, a six-sided die roll creates a uniform random PMF. Each side of the die has a one-sixth chance of landing face-up, so the discrete chance of each \(x\)
value between 1 and 6 is represented by a \(\frac{1}{6}\) portion of the sample space:
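This uniform die PMF can also be written out directly; a small sketch using exact fractions (my own construction, not code from the report):

```python
from fractions import Fraction

# PMF of a fair six-sided die: every x in 1..6 maps to 1/6 of the sample space.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid PMF sums to 1 over the whole sample space.
print(sum(die_pmf.values()))  # 1
print(die_pmf[3])             # 1/6
```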
@@ -285,8 +288,8 @@ sample size.
\subsubsection{Dempster-Shafer Theory}\label{Dempster_Shafer_Theory}
This section is an extra theory chosen to coincide with the unit 3 focus on Bayesian statistics. The Dempster-Shafer theory is a derivative application of
Bayes Theorem (\ref{Bayes Theorem}) where subjective beliefs are applied to independent variables not tracked by the belief network. Shafer so eloquently describes
this process by supposing that two friends, both of whom he subjectively believes are 90\% reliable, tell him that a limb has fallen on his car
\footnote{\url{http://glennshafer.com/assets/downloads/articles/article48.pdf}}. Without observing Shafer's car we can calculate that there is only a 1\% chance that
both friends are unreliable, so there is a high likelihood that the statement is true.
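The 1\% figure follows directly from treating the two reliabilities as independent; a quick sketch:

```python
# Shafer subjectively believes each friend is 90% reliable, and the two
# reliabilities are independent, so the report is baseless only if BOTH
# friends happen to be unreliable at once.
p_unreliable = 1 - 0.9
p_both_unreliable = p_unreliable ** 2
print(round(p_both_unreliable, 2))      # 0.01 -> only a 1% chance
print(round(1 - p_both_unreliable, 2))  # 0.99 -> high belief that a limb fell on the car
```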
@@ -307,13 +310,47 @@ then renormalize the entire community, resulting in local grocery store offering
community. If a data scientist then infers from the offerings of this grocery store the dietary preferences of the community, they would be inclined to believe that
the actual minority is not just a majority, but a requirement amongst the population. In this sense, tolerance for intolerance begets intolerance.
\subsubsection{Scale as a Dimension}
Just as the rate and plausibility of renormalization is impacted by the ratio of the minority to the flexible majority, other interactions can become more complex
through scale, to the same effect as the curse of dimensionality. The curse of dimensionality is a reference to the exponential complexity of solving a problem with
\(x\) variables. Two boolean variables, each containing one of two values, have 4 possible combinations of values. A third variable doubles this number to 8, and a
fourth doubles it again to 16. In complex interactions, scale acts as its own source of dimensionality because each new node in an ecosystem can interact with each
pre-existing node, influencing interactions between it and another pre-existing node, which then influences the interactions from that node, and so forth.
In \textit{Skin in the Game}, Taleb uses the example of neuroscience to show the improbability of AI ever reflecting the full complexity of the human brain. He
acknowledges advancements in neuroscience that accurately model interactions between neurons in the human brain, but scaling this up to replicate human behavior is
not so easy. While binary variables apply an exponential effect with a base of 2 (\(2^x\) where \(x\) is the number of binary variables), neurons interlock and may
have an effect with a base in the hundreds, thousands, or even millions.
This complexity, Taleb says, explains why even carefully studied brains of worms with only 300 neurons are still too complex to really understand, let alone
simulate. If neurons had only a binary effect, the complexity could be calculated as \(2^{300} \approx 2 * 10^{90}\) which, while massive, could conceivably be
computed in the distant future. However, if each neuron can interact with just 5 others, the combinatorial explosion grows to \((2^5)^{300} \approx 3.5 * 10^{451}\).
If we apply Moore's Law and assume that a society's computational capacity doubles every two years, it would take about 2400 years before this difference in
computational power could be rectified:
\[
2 * \log_2\left(\frac{3.5 * 10^{451}}{2 * 10^{90}}\right) \approx 2400.04 \text{ years}
\]
Not to mention, neuron interactions are incredibly complex, containing dimensions in and of themselves, not binary values. Good luck computing that, robot overlords.
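The 2400-year figure can be checked directly; Python's arbitrary-precision integers make the exact powers easy, and using them instead of the rounded \(3.5 * 10^{451}\) and \(2 * 10^{90}\) gives 2400 on the nose:

```python
import math

binary_brain = 2 ** 300              # each of 300 neurons limited to a binary effect
interlocked_brain = (2 ** 5) ** 300  # each neuron interacting with just 5 others

# Moore's Law: capacity doubles every two years, so the number of extra
# doublings needed, times two, is the wait in years.
doublings = math.log2(interlocked_brain) - math.log2(binary_brain)
print(2 * doublings)  # 2400.0
```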
\subsubsection{Methodology Considerations}
As another homage to \textit{Garbage In, Garbage Out}, I'd like to present some instances of methodology creating useless data for the target variables. This is not
just a reference to bad studies, such as those that try to measure social behaviors while oblivious to the fact that participant observation alters behavior. There
are many instances where data can be untainted but used without appropriate context. In particular, \textit{The Signal and the Noise} and
\textit{Fooled by Randomness} highlight many instances where time-series studies assume that decades of historical data are necessarily comprehensive. Financial
events in particular are often labelled as unpredictable by experts only after their models fail, because the context of a national economy can change dramatically,
revealing attributes of market economics that were previously obscured by practices that isolate those variables. An event never occurring in history does not
discount its possibility of occurring in the future. Similarly, events that may have been impossible in the past are not necessarily impossible in the future. As an
extreme example to prove a point, consider the following:
\begin{quote}
I have taken 10134023 instances of the last 40 years, during all of which Obama has been alive. Therefore, with so much time passed and many trials, I can say with
a high degree of certainty that Obama is immortal.
\end{quote}
Silly, yes, but it is easy to become detached from context when you begin digging deep into mathematical models. Data science is generally considered to be
the intersection of coding, statistics, and domain knowledge, implying domain knowledge is secondary to computational ability. I'd argue just the opposite -
incomplete knowledge of contemporary models still lends itself to effective data analysis, but an incomplete understanding of what is being measured is dangerous and
potentially counterproductive.
Psychology supplies one more example: someone who knows they are being studied will act differently than someone who isn't, so models built on those observations
will be inaccurate.
\newpage
\subsection{Unit 3: Bayesian Statistics}
@@ -415,7 +452,8 @@ category:
\vskip 2pt
Bayes Theorem as applied to this problem can be simply expressed as:
\[
P(\text{has cancer given positive test}) = \frac{\colorbox{blue!5}{TP}}{\colorbox{blue!5}{TP} + \colorbox{red!5}{FP}} = \frac{\colorbox{blue!5}{\(\frac{95}{1000}\)}}{\colorbox{blue!5}{\(\frac{95}{1000}\)} + \colorbox{red!5}{\(\frac{45}{1000}\)}} = 67.9\%
\]
Meaning that, given a random positive test, there is a 67.9\% chance of the patient actually having cancer, not far off from the two-thirds visual trick.
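The same arithmetic, written as a quick Python check (the 95 and 45 per-thousand counts are taken from the worked example above):

```python
# Per 1000 patients: 95 true positives, 45 false positives.
tp = 95 / 1000
fp = 45 / 1000

# P(has cancer | positive test) = TP / (TP + FP)
posterior = tp / (tp + fp)
print(round(posterior * 100, 1))  # 67.9
```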
@@ -429,6 +467,9 @@ the calculation only performed once.
\subsubsection{Bayesian Belief Networks}
\begin{center}
\textit{Using Bayes to build an ensemble of models}
\end{center}
Bayesian Belief Networks are probabilistic graphical models that preserve conditional dependence between random variables. In spite of its name,
Bayesian Belief Networks do not necessarily apply Bayesian models, though they are a way to utilize Bayes Theorem for domains with greater complexity beyond a
single posterior probability. In this type of network, edges are directed and the structure is utilized in a single direction. This is in contrast to undirected
@@ -493,7 +534,7 @@ positives from two independent tests each with 50\% of positives being true. If
that the tests partially measure the same thing, as would have occurred in a Naive Bayes model, the tests' combined accuracy would be unjustly inflated.
\newpage
\subsection{Unit 4: Markov Methods}\label{Markov}
\subsubsection{Markov Chains}
@@ -617,15 +658,15 @@ Thus, an observation sequence may look like this:
In this case, it can be confidently assumed that the wet signal is representative of a rainy, cloudy day. In contrast, we can only be moderately confident that the
two dry days leading up to it were sunny days. Intuitively, it is most likely that there were two sunny days followed by a rainy day. By multiplying the probability
of the observation by the probability of the transition into the potential state, the probability of occurrence is revealed. For the purposes of the example we will
use the 83\%-17\% equilibrium matrix from earlier as the initialization matrix to reflect the random chance of any given day being sunny or cloudy:
\begin{center}
Three consecutive sunny days:
\[(\frac{5}{6} * .95) * (.9 * .95) * (.9 * .05) \approx 0.03 \]
Three consecutive cloudy days:
\[(\frac{1}{6} * .6) * (.5 * .6) * (.5 * .4) = 0.006 \]
Sunny, sunny, cloudy:
\[(\frac{5}{6} * .95) * (.9 * .95) * (.1 * .4) \approx 0.027 \]
\end{center}
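These hand-multiplied products generalize to any state path; a small sketch with the example's transition and observation probabilities transcribed into Python (the helper is mine, not standard library code):

```python
# States: "S" (sunny), "C" (cloudy). Observations: "dry" or "wet".
init = {"S": 5 / 6, "C": 1 / 6}  # equilibrium initialization from earlier
trans = {"S": {"S": 0.9, "C": 0.1}, "C": {"S": 0.5, "C": 0.5}}
emit = {"S": {"dry": 0.95, "wet": 0.05}, "C": {"dry": 0.6, "wet": 0.4}}

def path_probability(states, observations):
    """Joint probability of a hidden state path producing the observations."""
    prob = init[states[0]] * emit[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        prob *= trans[prev][cur] * emit[cur][obs]
    return prob

obs = ["dry", "dry", "wet"]
print(round(path_probability(["S", "S", "S"], obs), 3))  # 0.03
print(round(path_probability(["C", "C", "C"], obs), 3))  # 0.006
print(round(path_probability(["S", "S", "C"], obs), 3))  # 0.027
```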
Interestingly, the calculation reveals that it is actually more probable that there was an unusual wet third day during a sunny streak than for there to have been
@@ -638,6 +679,9 @@ a meaning as \(\pi\) be addressed for something that has no relation to the cons
accessibility of mathematics for anybody shy of a walking computational index.
\subsubsection{Viterbi Algorithm}
\begin{center}
\textit{Markov is memoryless - only the most probable sequence to a state matters}
\end{center}
While it is feasible to calculate the probabilities for each possible route to a series of observations, such a process produces an exponential time complexity.
With each state change, the number of paths to keep track of grows exponentially, which in practical terms means countless threads on each state separated only by
the history of how they got there. Enter the Viterbi Algorithm, which reduces the effect of a step (or, as in our example, a new day) from an exponential
@@ -688,9 +732,9 @@ calculated:
\begin{center}
Two consecutive sunny days:
\[(\frac{5}{6} * .95) * (.9 * .95) \approx 0.677 \]
Cloudy, sunny:
\[(\frac{1}{6} * .6) * (.5 * .95) \approx 0.048 \]
\end{center}
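That day-two comparison can be checked with exact fractions; a small sketch (my own transcription of the example's numbers):

```python
from fractions import Fraction as F

# Day-1 scores: initialization * P(dry | state).
day1 = {"S": F(5, 6) * F(95, 100), "C": F(1, 6) * F(60, 100)}

# Two candidate predecessors for being Sunny on day 2; Viterbi keeps the larger.
to_sunny = {
    "S": day1["S"] * F(9, 10) * F(95, 100),  # Sunny -> Sunny, then observe dry
    "C": day1["C"] * F(5, 10) * F(95, 100),  # Cloudy -> Sunny, then observe dry
}
print({k: float(v) for k, v in to_sunny.items()})  # {'S': 0.676875, 'C': 0.0475}
print(max(to_sunny, key=to_sunny.get))  # S (the [Cloudy, Sunny] start is pruned)
```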
Hence, we can eliminate the \([\text{Cloudy, Sunny}]\) starting sequence from the most probable sequence of steps given the observations. Doing the same thing
@@ -713,121 +757,258 @@ for the rest of the visualization leaves fewer arrows and therefore fewer calcul
(Sunny1) edge node {} (Sunny2)
(Cloudy1) edge node {} (Cloudy2)
(Sunny2) edge node {} (Sunny3)
(Sunny2) edge node {} (Cloudy3);
% \path[->, draw=red]
% (Sunny1) edge node {} (Cloudy2)
% (Cloudy1) edge node {} (Sunny2)
% (Cloudy2) edge node {} (Cloudy3)
% (Cloudy2) edge node {} (Sunny3);
\end{tikzpicture}
\end{center}
With only two sequences remaining, the final comparison needs only to determine if it is more likely for there to have been three consecutive sunny days or a sequence
of two sunny days and a cloudy day\footnote{Had we assumed a 50-50 chance of initialization on a sunny or cloudy day, the probability of three consecutive cloudy days
would have been more likely than a sunny, sunny, cloudy sequence. Yet another example where contextual completeness in the methodology makes a significant
improvement in accuracy over what might otherwise have been napkin math.}, which we already calculated in the Hidden Markov Model section (\ref{HMMs}). If this
calculation were extended to include additional days, the Viterbi Algorithm would never need to calculate a path that started with two cloudy days because all branches
stemming from that route would already have been pruned by the third day.
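Putting the whole pruning procedure together, here is a compact Viterbi sketch over the example's parameters (one best predecessor kept per state per day; the naming is my own):

```python
def viterbi(observations, states, init, trans, emit):
    """Return (probability, path) of the most probable hidden state sequence."""
    best = {s: (init[s] * emit[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        # For each state, keep only the highest-probability way of reaching it.
        best = {
            s: max(
                ((p * trans[prev][s] * emit[s][obs], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda candidate: candidate[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda candidate: candidate[0])

states = ["Sunny", "Cloudy"]
init = {"Sunny": 5 / 6, "Cloudy": 1 / 6}
trans = {"Sunny": {"Sunny": 0.9, "Cloudy": 0.1},
         "Cloudy": {"Sunny": 0.5, "Cloudy": 0.5}}
emit = {"Sunny": {"dry": 0.95, "wet": 0.05},
        "Cloudy": {"dry": 0.6, "wet": 0.4}}

prob, path = viterbi(["dry", "dry", "wet"], states, init, trans, emit)
print(path, round(prob, 3))  # ['Sunny', 'Sunny', 'Sunny'] 0.03
```

The winner matches the section's conclusion: an unusual wet day during a sunny streak beats the alternatives.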
\newpage
\subsection{Unit 5: Monte Carlo Simulations}
Monte Carlo Simulations are models that directly recreate the conditions of an environment containing random variables to simulate the outcome given a value in place
of the random variable. This placeholder value may be an average of an expected occurrence, but often the simulation is run many times with a randomly selected value
so the results can be analyzed in place of many trials in the real environment.

Monte Carlo is useful when interactions between many variables produce deterministic but intractable results or if the steps to translate into a deterministic
model are not fully understood. For every probability problem there exists a Monte Carlo Simulation that steps through the process of how a result is created without
any derived formulation (which may be incorrect, especially if a problem is not completely understood). While the results are influenced by short-term bias in the
random variable, the results converge towards the true Probability Mass Function (\ref{PMF}) as long as the simulation accurately reflects the interaction between
variables.
\subsubsection{How To Make a Monte Carlo Simulation}
If you've ever created a simulation and run it multiple times to get a feel for what is most likely to happen, congratulations! You've created a Monte Carlo
Simulation.
As an example, consider the scenario described in the Markov Model section of this report (\ref{Markov}) where we want to predict if a day \(x\) days in the future
will be either sunny or cloudy. Here is that same table representing the odds of a day transitioning from the state of the previous day:
\begin{center}
\begin{tabular}{ | c | c | c | }
\hline
Current State & Next: Sunny & Next: Cloudy \\
\hline
\hline
Sunny & 90\% & 10\% \\
\hline
Cloudy & 50\% & 50\% \\
\hline
\end{tabular}
\end{center}
To run a single possibility of this interaction, initialize the state to define if the first day is sunny or cloudy (possibly using the equilibrium matrix as discussed
previously). Then, generate a random number between 0 and 1 and partition the possible results to match the table. If the first day is sunny, one option is to
transition to a cloudy state if the number is greater than .9, reflecting the 90\% chance that the next day will also be sunny. Continuing this
for the next few days, the random variable may leave a state transition path like \([\text{Sunny, Sunny, Cloudy}]\). Running the simulation again may net a different
path: \([\text{Sunny, Cloudy, Sunny}]\). With more simulations, the collected random sample will quantify the probability of a sunny day on the third day with a
simple ratio:
\[\frac{\text{\# of simulations that end with a sunny day}}{\text{total \# of simulations}} \approx 0.86\footnote{Again, assuming a 100\% chance of sunny day
initialization.}\]
We can validate this model by using our \(k\)-step transition matrix (\ref{Markov}):
\[
\begin{pmatrix}
.9 & .1 \\
.5 & .5
\end{pmatrix}^2
=
\begin{pmatrix}
.86 & .14 \\
.7 & .3
\end{pmatrix}
\]
Recall the top left number of this matrix reflects the probability of ending on a sunny day (column) given that the first day was sunny (row).
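A runnable sketch of the simulation just described, with the transition table transcribed from this section and the seed fixed for reproducibility:

```python
import random

random.seed(42)

# Probability that the NEXT day is sunny, given the current state (table above).
trans = {"Sunny": 0.9, "Cloudy": 0.5}

def simulate(days, start="Sunny"):
    """One Monte Carlo run: walk the transition table and return the last state."""
    state = start
    for _ in range(days - 1):
        state = "Sunny" if random.random() < trans[state] else "Cloudy"
    return state

trials = 100_000
sunny = sum(simulate(3) == "Sunny" for _ in range(trials))
print(sunny / trials)  # converges on 0.86, matching the 2-step matrix entry
```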
\subsubsection{Monte Carlo Integration}
Monte Carlo Integration is one use of Monte Carlo Simulations where the area of an object (or the integral of a graph) is calculated by selecting random coordinates
and computing the ratio of points that land inside the object (under the curve) to the total number of random points. I'm including this section in
the report for completeness since, when I drafted this study's schedule, I incorrectly assumed that this was a topic that would extend Monte Carlo, not just
apply it.\footnote{I made this mistake at least twice. If you're bored, try to spot which topics they are. Unlicensed gamification moment.}
One example of this integration method, called Buffon's Needle, is an approximation of pi (yes, \(\pi\)) by dropping sticks on a series of parallel lines. Assuming
the length of the sticks is shorter than the distance between the parallel lines, the probability of a stick crossing a line is governed by the expression
\(\frac{2l}{\pi d}\), where \(l\) is the length of the sticks and \(d\) is the space between the parallel lines\footnote{Learn more about and run a Monte Carlo
Simulation of the sticks approximation at \url{https://prancer.physics.louisville.edu/modules/pi/index.html}}.
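A Monte Carlo sketch of Buffon's Needle, inverting the \(\frac{2l}{\pi d}\) crossing probability to estimate \(\pi\) (the parameter choices are mine):

```python
import math
import random

random.seed(0)

def buffon_pi(drops, l=1.0, d=2.0):
    """Estimate pi from needle drops; requires stick length l <= line spacing d."""
    crossings = 0
    for _ in range(drops):
        center = random.uniform(0, d / 2)       # distance from the nearest line
        theta = random.uniform(0, math.pi / 2)  # needle angle against the lines
        if center <= (l / 2) * math.sin(theta):
            crossings += 1
    # P(cross) = 2l / (pi * d), so pi ~= 2l * drops / (d * crossings)
    return (2 * l * drops) / (d * crossings)

print(buffon_pi(1_000_000))  # ~3.14; convergence is slow, as Monte Carlo tends to be
```

Note the wink: sampling the needle's angle uniformly already uses \(\pi\), a common quirk of simulating Buffon's Needle on a computer.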
\subsubsection{Markov Chain Monte Carlo (MCMC) methods}
\begin{center}
\textit{Simulations can depend on their prior results}
\end{center}
MCMCs are a class of Monte Carlo simulations that epitomize stochastic sampling. Given a probability distribution that is too complex to be analyzed
traditionally, MCMCs approximate the target distribution with an equilibrium distribution that converges on that target distribution.
with archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y= 3x^2 - 2y\) as the equation for covariance. Contrary to the name of "Markov Chain Monte Carlos" and most educational works on the topic, I believe the easiest way to understand MCMC as a Monte Carlo simulation
with 1-step memory. MCMC invokes the name of Markov Chains because, in the array of sampled random values, each value is randomly selected under the influence of the
previous value - something many compare to the memoryless state-hopping in Markov Chains. In reality, the 'state' in MCMCs is just a value whose importance lies in how
often it appears in the array. It's not a state with contextual value or an associated transition matrix.
There are a number of algorithms that implement the concept of MCMC, the most common of which is called the \textbf{Metropolis-Hastings Algorithm}.
\footnote{If you're like me and can't handle the abstractions that education by mathematical notation requires, this video on Metropolis-Hastings is the best I can
point you to on the topic of MCMCs: \url{https://www.youtube.com/watch?v=oX2wIGSn4jY}}
In this variation, an initial value is selected at random. For each step, another random value, frequently on the order of one standard deviation, is added to this
number; the result has a \(\min\left(1, \frac{P(\text{new value})}{P(\text{current value})}\right)\) chance of becoming the new current value and being added to the list of
samples. If the current sample is kept over the new value, the current sample is added a second time to the list of samples. This acceptance criterion directs
the samples towards high-probability events while still keeping open the chance of the samples bridging the gap between local probabilistic maxima.
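The stepping procedure above can be sketched in a few lines. This is my own toy implementation using a Gaussian random-walk proposal, the standard Metropolis acceptance rule \(\min(1, P_\text{new}/P_\text{current})\) computed in log space, and a standard normal target known only up to its normalizing constant; none of the names here come from a library:

```python
import math
import random

def metropolis_hastings(log_p, n_samples, x0=0.0, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sampling of an unnormalized log-density log_p."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0, step)  # new value = current value + random jitter
        # Accept with probability min(1, P(proposal)/P(current)), done in log space
        # so that very small densities do not underflow.
        if rng.random() < math.exp(min(0.0, log_p(proposal) - log_p(x))):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current value in the array
    return samples

# Target: standard normal. log_p may omit the normalizing constant entirely.
samples = metropolis_hastings(lambda v: -v * v / 2, n_samples=50_000)
print(sum(samples) / len(samples))  # the sample mean hovers near the target's mean of 0
```

Note that the "state" is just the current float; how often each region of values appears in `samples` is what approximates the target distribution.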
% \newpage
% \subsection{Unit 6: Miscellaneous}
% This section represents research on topics that were not initially a part of the study's scope but were either interesting, relevant, or suggested to me.
% \subsubsection{Spatial Descriptive Statistics and Ripley's K and L Functions}
% \url{https://en.wikipedia.org/wiki/Spatial_descriptive_statistics}
% \subsubsection{Gibbs Sampling}
% \newpage
% \section{Applied Projects}
% \rule{14cm}{0.05cm}
% \subsection{Randomness of Retinal Mosaic layout}
% hexagonal grid of marbles. are colors randomly distributed?
% Hexagonal basis vectors, retinal mosaic, entropy
% \subsection{Bayes Server Ripoff}
% I planned to create a trickle-down density belief network using probability density functions as nodes that choose the direction of rows in a relational database.
% Found this later, it's sort of similar. \url{https://www.bayesserver.com/}
% Even better than their jank bayesian belief network I may be able to make mixed bayesian/markov chain models. This is a big project.
% \subsection{Modeling the Invisible Hand}
% Skin in the Game: Taleb. Unintelligent random actors in structure create intelligent decisions.
% I can monte carlo that shit.
% \subsection{Cost-Benefit Analysis of Remote Education}
% This section covers a calculation I devised to make me feel better about my life decisions. The data is based on implicit guesswork and, while I will be taking it
% somewhat seriously for my decision to do either the online or on-campus RIT Data Science Masters Program, it should not be taken seriously as a probabilistic model.
% Since there is no framework for making a subjective decision weighting the potential benefits of on-campus life with the value of entering the workforce 18 months
% sooner, I decided to make one. Inshallah I shall reach my true potential and fulfill destiny.
% \subsubsection{Selecting and Creating Key Metrics}
% Since both programs result in a Data Science M.S. degree (albeit under the school of Software Engineering for on-campus versus the school of information for online),
% the functional equivalence of the resulting certificate of completion is an effective isolator of potential long-term ramifications in career path that might otherwise
% be dictated by hiring processes that favor one degree over the other. Therefore, this analysis is justified in focusing only on events occurring during my extended
% education. I have selected two calculated features\footnote{features that I do not intend to calculate on the basis that it is impossible without a crystal ball and
% knowledge of fortune telling - a cursed art that has been forbidden by the council for centuries.} that are important to determining the utility of
% potential events from each masters program.
% The generalized feature I've selected is serendipity\footnote{Read more about this definition of serendipity in \textit{Where Good Ideas Come From: The Natural
% History of Innovation} by Steven Johnson}: the potential for the spontaneous formulation of creative genius brought about by the random collision of ideas - the
% proverbial cafe of intellectuals where overheard conversations turn into incredible revelations. The on-campus program excels in this category because it extends
% my stay in the academically diverse setting of Rochester Institute of Technology's main campus, potentially enabling interdisciplinary connections and research
% opportunities. It also would grant me more time to get involved in the Simone Center for Innovation and Entrepreneurship which is an enticing hub for startups that
% I can see myself becoming a key part of. In contrast, the online program offers me few opportunities to connect within RIT while opening the door to starting a
% career in person sooner, which holds potential for intrapreneurship and a more directed interdisciplinary relationship. I acknowledge the magnitude of such
% opportunities to be lesser, but more probable, especially if I change jobs more frequently.
% When I was first choosing features I wanted to include a second metric to capture a level of character growth and mental health as a reflection of the impact of being
% online and not being face-to-face with other people. In doing so I'd be modeling real-life variables that most would overlook.
% Digging into it I realized I'd have to derive it from the magnitude and probabilities of social advantages of each program.
% The community fostered, the friends not made. I can't bring myself to even make up numbers for that in a goof napkin-math formula.
% Measuring covariance between these two features just feels disgusting. Instead, I'm going to negate the whole variable with this assumption about finding something
% else to do with my life outside of work:
% \begin{center}
% \textit{The negative social effects of online program isolation are equal to and canceled out by the personal growth derived from the extra effort to find
% 'the third place' \footnote{First and second places are home and work. Read more at: \url{https://en.wikipedia.org/wiki/Third_place}} seeded by the frustration
% towards myself for puttimg myself in this position.}
% \end{center}
% \paragraph{Creating PMFs}
% Let's create probability mass functions for our feature in each program to subjectively measure potential:
% Let the probability of magnitude \(X\) serendipity on the campus program and the online program as \(P(X_c)\) and \(P(X_o)\) respectively.
% The on-campus program has advantages in serendipity, but while events may be an order of magnitude more impactful, I've already been on campus for three and a half
% years and it feels highly unlikely that I will make sufficient changes to my routines to grant me more than a marginal probability of a serendipitous event occurring
% \begin{equation*}
% P(A_c) =
% \begin{cases}
% .8\qquad\text{if }&X=0\\
% .105&X=1\\
% .045&X=2\\
% .025&X=3\\
% .0125&X=4\\
% .009&X=5\\
% .0035&X=6\\
% 0&\text{Otherwise}
% \end{cases}
% \end{equation*}
% **graph**
% The online program wields greater chances of serendipity by placing me in more unique environments by means of starting my career sooner, hopefully giving me more
% time to utilize what remains of my ambition before it crumbles with age and routine. There may be less of an impact for a serendipitous event when experiencing it
% remotely or within a corporate structure, but what does a foolish little boy still in school know about the passion inbued by one's own accidental discoveries?
% \begin{equation*}
% P(A_o) =
% \begin{cases}
% .6\qquad\text{if }&X=0\\
% .225&X=1\\
% .115&X=2\\
% .045&X=3\\
% .0087&X=4\\
% .0043&X=5\\
% .002&X=6\\
% 0&\text{Otherwise}
% \end{cases}
% \end{equation*}
% **graph**
% with archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y= 3x^2 - 2y\) as the equation for covariance.
\newpage
\section{Retrospective Discussion}
At the end of this independent study it's worth reflecting on how my initial proposal has changed as I've learned more about this topic. Going into the Fall 2024
semester I wanted to understand how complex algorithms manage the influence of untracked variables and how they could be used to derive formulas for the influence of
tracked variables on the target. While I did receive some insight on how to go about formulating experiments to do this, especially through a more personal
understanding of the foundational statistics, I found fairly little industry application of the scientific conceptualization that I expected. Most practical
applications of probability theory rely less on an in-depth understanding of a scenario's component interactions and more on building a model that is robust to what
it does not understand. Instead of removing noise, probabilistic techniques work within the noise and are capable of correcting course when noise leads them to make an
incorrect assessment.
I still believe in the value of probability for tracking underlying and derivative features. In the future I will be considering the development of multivariate and
noise-isolation techniques. By executing this study when I did, not only will the content I learned be fresh in my mind when I start my Data Science
graduate classes next month, but the unresolved curiosities that it uncovered will also be given a chance to develop. I'm already half-expecting one of the projects
that I thought up for this study to end up in my thesis. If in two years I publish some model derived from the intelligent action of random but structured agents, you'll
know that something this semester stuck.
A major challenge of this study was sifting through the mountains of educational resources that rely on obscure mathematical notation of monumental complexity.
It is simply an unfathomable failure on behalf of the educational systems that teach probability to convey intuitive algorithms in an archaic language that
nobody speaks. It felt like striking gold when I finally found the one resource that graphically or even programmatically translates these formulas. Most of my
research time was dedicated to interpreting educational resources that appeared to have been made to appease superior instructors rather than to make an effort to
instruct. There may have been hours spent researching confidence intervals, Bayes' Theorem, and the Viterbi Algorithm, but there was ultimately only a single
article or video for each of these topics that bridged the gap between abstraction and conceptualization.
I want to propagate this treasure, and I wrote this report to utilize those methods of instruction - not through mathematical abstractions of memory but through
description. I am very proud of my newfound skills in writing expressions and creating graphics in \LaTeX\ but even here I broke apart and rejoined each calculation
with textual explanation, just as one would comment code in any remotely complex function. Mathematicians should not be exempt from this procedure. Additionally, I
structured my report to be comprehensive, down to the order of axiom review. Content relevant to a section is either found in previous sections or simply described
such that there isn't even a need for the actual academic terminology. While there is little expectation that this report will be read by anyone seeking to learn
these concepts, I very much hope to hone the explanatory qualities that I have started here and share them with future students.
There may not have been a major application project as we'd originally intended for this independent study, but I feel what came out of it has made my understanding
of probability theory more grounded in how it's actually used than if I had made some niche demonstration of poorly-considered viability. I'd like to
thank my advisor, Dr. Kinsman, for seeing this endeavor for what it is and for encouraging me to keep up the research in its natural direction. This flexibility and
comfort with uncertain guidance is exactly what is needed from data scientists if we are to truly find the unseen gems in our experiments. With the indefinite
optimism that is lacking the world over, take confidence in the solutions not yet found.
\newpage
\section{Appendix Information}
Given that this report may be shared by the RIT Computer Science Department without the appendix, the appendix for this report, including the timesheet and
tasks completed for this independent study, will be made available as a separate document.
\end{document}

View File

@@ -1,5 +1,6 @@
Week,Date,Type,Duration (Hours),Description
1,08/30,Advising Meetings,2,"Stat Review Content acknowledgement, Latex overview for reports"
1,08/26-08/30,Research,8,"Subjective Probablistic Models in The Scout Mindset"
2,09/02,Reporting,3,"First applications of Latex for final report, created Timesheet System."
2,09/02,Research,2.5,"Stat Review: Sample Space through Probability Density Functions"
2,09/06,Advising Meetings,1,"Research Review and exploration of PDF expected values and confidence intervals"
@@ -47,3 +48,17 @@ Week,Date,Type,Duration (Hours),Description
12,11/14,Reporting,1.5,"Contrasting Viterbi to exponential blowup"
12,11/15,Reporting,3,"Viterbi graphics"
12,11/15,Reporting,1,"Renormalization and Minority Rule Reporting"
12,11/16,Reporting,2,"applying post-wandering mental feedback"
13,11/18,Research,3,"Monte Carlo Simulations"
13,11/19,Reporting,1,"Retrospective Discussion"
13,11/19,Reporting,2,"Monte Carlo Simulations, Buffon's Needle"
13,11/21,Research,1.5,"Skin in the Game - Fragile System Risk Aversion"
13,11/21,Research,5,"Understanding MCMC. So not worth it."
13,11/21,Reporting,2,"MCMC writeup"
13,11/22,Advising Meetings,1,"Virtual Review of Research"
14,11/26-11/30,Research,4.5,"The Signal and the Noise - Nate Silver"
15,12/3,Advising Meetings,1,"Impromptu Discussion"
15,12/5,Reporting,2,"Scale as Dimension, Methodology Considerations"
15,12/5,Application,1.5,"Scale as Dimension and Moore's Law calculation (hugggeee numbers)"
15,12/6,Advising Meetings,1,"Report Submission Discussion and Future Application"
15,12/6,Reporting,3,"Pre-Finals Report finalization for semester submission"

Binary file not shown.

View File

@@ -15,7 +15,7 @@
\rule{14cm}{0.05cm}\\ \vspace{.5cm}
\Large{Independent Study Appendix and Timesheet}\\
\large{Andrew Simonson}
\vspace*{\fill}
@@ -33,6 +33,8 @@ Week & Date & Type & Duration (Hours) & Description \\
\hline
1 & 08/30 & Advising Meetings & 2 & Stat Review Content acknowledgement, Latex overview for reports \\
\hline
1 & 08/26-08/30 & Research & 8 & Subjective Probablistic Models in The Scout Mindset \\
\hline
2 & 09/02 & Reporting & 3 & First applications of Latex for final report, created Timesheet System. \\
\hline
2 & 09/02 & Research & 2.5 & Stat Review: Sample Space through Probability Density Functions \\
@@ -127,12 +129,40 @@ Week & Date & Type & Duration (Hours) & Description \\
\hline
12 & 11/15 & Reporting & 1 & Renormalization and Minority Rule Reporting \\
\hline
12 & 11/16 & Reporting & 2 & applying post-wandering mental feedback \\
\hline
13 & 11/18 & Research & 3 & Monte Carlo Simulations \\
\hline
13 & 11/19 & Reporting & 1 & Retrospective Discussion \\
\hline
13 & 11/19 & Reporting & 2 & Monte Carlo Simulations, Buffon's Needle \\
\hline
13 & 11/21 & Research & 1.5 & Skin in the Game - Fragile System Risk Aversion \\
\hline
13 & 11/21 & Research & 5 & Understanding MCMC. So not worth it. \\
\hline
13 & 11/21 & Reporting & 2 & MCMC writeup \\
\hline
13 & 11/22 & Advising Meetings & 1 & Virtual Review of Research \\
\hline
14 & 11/26-11/30 & Research & 4.5 & The Signal and the Noise - Nate Silver \\
\hline
15 & 12/3 & Advising Meetings & 1 & Impromptu Discussion \\
\hline
15 & 12/5 & Reporting & 2 & Scale as Dimension, Methodology Considerations \\
\hline
15 & 12/5 & Application & 1.5 & Scale as Dimension and Moore's Law calculation (hugggeee numbers) \\
\hline
15 & 12/6 & Advising Meetings & 1 & Report Submission Discussion and Future Application \\
\hline
15 & 12/6 & Reporting & 3 & Pre-Finals Report finalization for semester submission \\
\hline
\end{longtable}
\noindent Hours for Advising Meetings: 13.0\\
Hours for Application: 17.5\\
Hours for Reporting: 42.0\\
Hours for Research: 68.0\\
\textbf{Total Hours: 140.5}\\
% CLOSE Timesheet
\end{document}