Drafted bayes report

This commit is contained in:
2024-10-16 15:50:07 -04:00
parent 6473765329
commit f0eb89cd4f
6 changed files with 215 additions and 15 deletions

Binary file not shown.

@@ -4,6 +4,8 @@
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[a4paper, total={6in, 10in}]{geometry}
\usepackage{setspace}
\setstretch{1.25}
\hyphenpenalty 1000
\begin{document}
@@ -33,7 +35,24 @@
\newpage
% Begin report
\section{Objective}
yada yada yah I started this independent study for my own selfish gain
The educational focus of Implementations of Probability Theory is the application of data
models that produce non-deterministic insights through probabilistic methodology. By pursuing this
study I hope to gain a deeper understanding of how to apply data to risk calculation in mitigation
scenarios as they appear in real life, rather than under the experimental lab conditions that enable algorithmic
certainty.
In contrast to the black-box artificial intelligence and algorithms taught in \textbf{CSCI 335: Machine Learning}, this study is tailored to methods
designed to produce confidence levels for uncertain events using certain terms, leveraging logical,
traceable, and definite calculations. Current course offerings in the realm of data science focus largely on
the storage and management of data; notably, the data science cluster was until very recently
branded as data management. Implementations of Probability Theory is intended to extend
what was learned in previous courses, notably \textbf{CSCI 420: Principles of Data Mining}, to the more advanced algorithms
used at the intersection of data and computing after the preprocessing stage.
After beginning this study the intended deliverable outline was determined to be technically implausible and has been replaced with
demonstrations of applied algorithms. Taking inspiration from the retinal mosaic as displayed in \textbf{CSCI 431: Intro to Computer Vision}
and discussion in \textbf{IGME 589: Computational Creativity and Algorithmic Art} on the appearance and nature of randomness in graphics, I hope to create
a program that can determine the likelihood that randomly distributed colors on a hexagonal grid appear as they do in an image.
\newpage
\section{Units}
@@ -155,4 +174,139 @@ To calculate standard error, kys.
Statistical inference is any data analysis that draws conclusions from a sample in order to make assertions about the population.
Methods include estimation via averages and confidence intervals, and hypothesis testing, which attempts to invalidate (never \textit{validate}) a hypothesis.
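As a small worked illustration of estimation (the numbers are hypothetical and not drawn from this study), a 95\% confidence interval for a population mean can be approximated from a sample of size \(n\) with sample mean \(\bar{x}\) and sample standard deviation \(s\), assuming the sample is large enough for the normal approximation to hold:
\[
\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}
\]
With \(n = 100\), \(\bar{x} = 50\), and \(s = 10\), the standard error is \(10 / \sqrt{100} = 1\), giving an interval of roughly \(50 \pm 1.96\), or \([48.04,\ 51.96]\).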
\newpage
\subsection{Unit 2: Probabilistic Theories and Epistemology}
When developing probabilistic models it is vital to use domain expertise to expose the product to the full range of external variables that would be expected
of a model applied to the real world. Without an appropriate understanding of both the limitations in research procedures and the true value of the data collected,
the integrity of the model becomes inherently compromised.
As data scientists, we are uniquely at risk of falling into this trap because it is hard to fully grasp domain expertise when the nature of data science
in a business setting frequently means consulting for many separate projects with a collectively massive scope. Of equal consideration, it is also easy
to assume that the sophistication of our tools overrides imperfections in the data, in spite of mantras like `Garbage In, Garbage Out'.
In this unit I explored some common fallacies and assumptions held by analysts who may not fully grasp the content that they work with,
nor the problems they intend to solve. This required extensive research that I found was best digested in the form of books whose chapters chronicle multiple
examples of a given principle. As such, the reading was not confined to just the timeslot designated for this unit. Research started during the months leading up
to the start of the semester\footnote{Only research during the semester was logged in the timesheet.} and has continued through the independent study. This structure was particularly helpful for pulling me back to regain perspective on what
my goal was when I was knee-deep in feature construction and model formulation.
\subsubsection{Moral Hazards and The Bob Rubin Trade}
Picking pennies in front of a steamroller.
When studying the effectiveness of a model, the scope of review must capture the entire range of the sample space. Discarding black swans that do not directly impact
the client does not mean that the oversight will not reflect on the client. There is therefore a question of obligation for data scientists to include
flags for significant events in reality that do not affect the proposed course of action to the client.
The 2008 financial crisis, attributed to the collapse of the housing market bubble, is the most common example of a moral hazard: risk was displaced from the
banks, which were federally required to issue subprime loans, onto the taxpayer, so the banks could profit from subprime loans without being harmed when the inevitable
occurred. In popular media, the bursting of the housing bubble is attributed to the banks, while those in the industry passed off the event as something that nobody could
have foreseen.\footnote{For instance, in the 2015 movie \textit{The Big Short}, only a few savvy traders who bothered to look into the details find that banks had,
in their ignorance, built the bundled mortgages on an unstable foundation.} In reality, banks ignored a probabilistic eventuality only because their models did not
need to account for such an event.
Most discussions emphasize the problems with risk transference when creating models. For this study's purposes, the important lesson is that probabilistic models should not
drop evaluations as soon as an event leaves the scope of the immediate client.
\subsubsection{Ignoring Improbable Outliers with Outsized Impact}
In machine learning it is common for algorithms to drop the most extreme (or a random selection of) datapoints to avoid overfitting and errors in data collection.
One issue with the current implementation of this procedure is that it is often done blindly, ignorant of information that these outliers may relay. For instance,
suppose that in a selection of 300 water samples from a stream, all but a few show a normal amount of oxygen. A citizen scientist may discount the remaining pockets
as a statistical implausibility that most likely indicates a failure in sample testing, and drop the most extreme 5\% of datapoints.
However, if these few pockets show a complete disruption of the dissolution process, the vast majority of aquatic life in the stream will eventually pass through
these pockets without oxygen and die, resulting in an outsized impact from just a few sources.
Nassim Taleb in \textit{Fooled By Randomness} describes this event with an analogy to Russian Roulette: if there were a 5/6 chance of winning a million dollars and a
1/6 chance of killing yourself, many people would at least hesitate before pulling the trigger. But what if the revolver held 10,000 chambers and there were only a
1/10,000 chance of harm? In this case, many less-than-rational actors would play the game repeatedly to acquire wealth indefinitely, forgetting or even outright ignorant
that eventually the unlikely (or, as the actor would see it, the unthinkable) happens and all of the gains are completely negated.
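A rough calculation, assuming each pull of the trigger is independent and using the 1/10{,}000 figure above, suggests why repeated play ends badly. The probability of surviving \(n\) consecutive rounds is
\[
P(\text{survive } n \text{ rounds}) = \left(1 - \frac{1}{10{,}000}\right)^{n}
\]
For a single round this is 0.9999, but after \(n = 10{,}000\) rounds it falls to approximately \(e^{-1} \approx 0.37\), so by that point the ``unthinkable'' outcome has become more likely than not.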
\subsubsection{Fooled By Randomness}
May justify its own subsection since the others acknowledge small probabilities whereas this is outright randomness.
\subsubsection{Lindy Effect}
``For the perishable, every additional day in its life translates into a shorter additional life expectancy.
For the nonperishable, every additional day may imply a longer life expectancy.''
A proven tool is more likely to stand the test of time than a new tool replacing it, since the new tool is unproven.
``The robustness of an item is proportional to its life!''
``Inaccurate science\ldots is constantly being published. The Lindy-conscious consumer of scientific data will take seriously only
information that has held up over a period of time.''\footnote{\url{https://www.nytimes.com/2021/06/17/style/lindy.html}}
\subsubsection{Decision Theory}
Decision theory is the study of how people make decisions with uncertain information. There are two main branches of decision theory:
\subsubsection*{Normative/Rational Decision Theory}
This branch studies how people \textit{should} make decisions. In problems with other actors, as in game theory, it is assumed that all other actors will also
act with perfect rationality, allowing for precise calculation of the actions of all of the others and their expected utility to the agent.
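One common way to formalize this, sketched here with symbols that are introduced only for this illustration, is expected utility. For an action \(a\), possible outcomes \(s_1, \ldots, s_n\) with probabilities \(P(s_i)\), and a utility \(U(a, s_i)\) for each outcome:
\[
EU(a) = \sum_{i=1}^{n} P(s_i) \cdot U(a, s_i)
\]
Under this sketch the rational agent chooses the action with the highest \(EU(a)\). As an invented example, a wager that pays 100 with probability 0.3 and loses 20 with probability 0.7 has \(EU = 0.3 \cdot 100 + 0.7 \cdot (-20) = 16\), so it would be preferred to declining the wager, which has \(EU = 0\).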
\subsubsection*{Descriptive Decision Theory}
This branch studies how people actually make decisions, which includes factors such as psychological and emotional biases.
\subsubsection{Info Gap Decisions}
In info gap decision theory there is not enough information to assign probabilities to events, and the goal is to select a course of action that is robust in the
face of uncertainty. Where decision theory can account for expected irrationality when determining expected values, info gap decisions approximate a range of
probabilities and weight them to estimate expected value. In essence, it applies probabilities to probabilities, adding an additional layer to insulate calculations
from a lack of data or a lack of understanding of a topic.
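Taking the description above at face value, the weighting idea can be sketched as follows; the symbols and numbers are hypothetical, and this is only an illustration of ``probabilities of probabilities,'' not a full treatment of info gap decision theory. Suppose an event pays a value \(V\) if it occurs, and its probability is believed to be one of the candidates \(p_1, \ldots, p_k\), held with weights \(w_1, \ldots, w_k\) that sum to 1:
\[
E[V] \approx V \cdot \sum_{j=1}^{k} w_j \, p_j
\]
With \(V = 100\), candidates \(0.1\), \(0.2\), and \(0.4\), and weights \(0.5\), \(0.3\), and \(0.2\), the weighted probability is \(0.05 + 0.06 + 0.08 = 0.19\), for an estimated expected value of 19.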
\subsubsection{Methodology Considerations}
Given that I have taken 10134023 instances from the last 40 years, in all of which Obama has been alive, I can say with a high degree of certainty that Obama is immortal.
The absurdity of this conclusion illustrates that an event never having occurred in history does not discount the possibility of it occurring in the future. Similarly, events that may have been impossible in the past
are not necessarily impossible in the future.
Psychology is another consideration: someone who knows they are being studied will act differently than someone who does not, so models built on such observations will be less accurate.
\newpage
\subsection{Unit 3: Bayesian Statistics}
This unit was deliberately separated from the statistical review due to the perceived complexity of the topic and the magnitude of its usage in recent data science
breakthroughs. Bayes Theorem is a part of the curriculum for both \textbf{MATH 351: Probability and Statistics} and \textbf{CSCI 420: Principles of Data Mining}.
However, because the two courses approached the topic from different perspectives and neither solidified my personal confidence in its use, I chose to take extra time to learn
this important topic in my own way.
It has been said that statistics does not come naturally to the human brain, which is why statistics is, by mathematical standards, a
young discipline. My research on Bayesian statistics has led me to the conclusion that the opposite may be true: Bayes Theorem is quite intuitive, but
the discipline has not had the time to crystallize best practices for teaching it. For instance, updating one's beliefs by weighing probabilities against the
number of documented occurrences is frequently used in philosophical discussion, in the form of explanations that a subset whose members have a high likelihood of fulfilling some terms
remains a valid classification even when the subset is so small that most cases fulfilling those terms fall outside it. Most people understand
these expressions, but once they are shown a table and asked to calculate those ratios, the content enters the realm of collegiate instruction.
\subsubsection{Bayes Theorem}
The equation for Bayes Theorem is as follows:
\[
P(A \mid E) = \frac{P(A) \cdot P(E \mid A)}{P(A) \cdot P(E \mid A) + (1 - P(A)) \cdot P(E \mid \neg A)}
\]
This formula appears more complex than it is. The denominator, while directly translating to ``the probability of A times the probability of event E occurring in A,
plus the probability of not A times the probability of E occurring in not A,''
can be expressed more simply as \(P(E)\), the probability of event E occurring.
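To see where the denominator comes from (a standard derivation, using only the symbols in the equation above), start from the definition of conditional probability and split \(E\) into the part that occurs together with \(A\) and the part that occurs without it:
\[
P(A \mid E) = \frac{P(A \cap E)}{P(E)}, \qquad P(A \cap E) = P(A) \cdot P(E \mid A)
\]
\[
P(E) = P(A) \cdot P(E \mid A) + P(\neg A) \cdot P(E \mid \neg A), \qquad P(\neg A) = 1 - P(A)
\]
Substituting the second line into the first recovers the full formula.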
By utilizing vernacular more familiar to everyday life, Bayes Theorem can be translated into:
\[
\text{P(occurrence came from category)} = \frac{\text{\# of occurrences from category}}{\text{total \# of occurrences}}
\]
Finally, this equation is updated to replace descriptions with technical terms:
\[
\text{Posterior} = \frac{\text{Prior} \times \text{Likelihood}}{\text{Evidence}}
\]
Even this equation can be misconstrued as any of a number of arrangements of ratios involving total occurrences from a category or non-occurrences from outside
of the category, so as a final demonstration the sample space will be visualized geometrically%
\footnote{Concept credit to 3Blue1Brown on YouTube; this video is what finally clarified in my mind what the equation behind Bayes Theorem meant.\\
\url{https://www.youtube.com/watch?v=HZGCoVF3YvM}} as a 1 unit by 1 unit square.
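The square can be worked through with illustrative numbers (chosen here for the example, not taken from the video or from any data). Split the square into a vertical strip of width \(P(A) = 0.1\) and a remaining strip of width \(P(\neg A) = 0.9\). Within the first strip, shade a region of height \(P(E \mid A) = 0.8\); within the second, shade a region of height \(P(E \mid \neg A) = 0.2\). The shaded areas are the cases where \(E\) occurs:
\[
\underbrace{0.1 \cdot 0.8}_{A \text{ and } E} = 0.08, \qquad \underbrace{0.9 \cdot 0.2}_{\neg A \text{ and } E} = 0.18
\]
\[
P(A \mid E) = \frac{0.08}{0.08 + 0.18} = \frac{0.08}{0.26} \approx 0.31
\]
Even though \(E\) is four times as likely inside \(A\) as outside it, the posterior is only about 31\% because the strip for \(A\) is so narrow.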
\subsubsection{Bayesian Updating}
Bayesian Updating is another term that has been added to the buzzword vocabulary. It describes a process that isn't directly related to Bayesian Statistics but appears
to have been rediscovered by academia through the study of applied Bayes Theorem. In essence, Bayesian Updating states that newly observed occurrences should not
override previous evidence but should instead be added to it with equal weight (equal weighting being a naive assumption). With this evidence updating,
applications of Bayes Theorem calculate posterior probabilities continuously as new information enters the system, rather than in a calculation that is done only once.
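A minimal sketch of this continuous calculation treats each posterior as the prior for the next observation. It assumes, as a simplification that is not stated above, that the observations \(E_1\) and \(E_2\) are independent of each other given \(A\) and given \(\neg A\):
\[
P(A \mid E_1, E_2) = \frac{P(A \mid E_1) \cdot P(E_2 \mid A)}{P(A \mid E_1) \cdot P(E_2 \mid A) + (1 - P(A \mid E_1)) \cdot P(E_2 \mid \neg A)}
\]
With illustrative values of a prior \(P(A) = 0.5\), and \(P(E \mid A) = 0.8\) and \(P(E \mid \neg A) = 0.4\) for every observation, the first observation gives \(0.4 / (0.4 + 0.2) = 2/3\), and feeding that back in as the prior for the second observation gives
\[
\frac{\tfrac{2}{3} \cdot 0.8}{\tfrac{2}{3} \cdot 0.8 + \tfrac{1}{3} \cdot 0.4} = 0.8
\]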
\subsubsection{Bayesian Belief Networks}
Bayesian Belief Networks are probabilistic graphical models that preserve conditional dependence between random variables. In spite of their name,
Bayesian Belief Networks do not necessarily apply Bayesian models, though they are a way to utilize Bayes Theorem for domains with greater complexity beyond a
single posterior probability. In this type of network, edges are directed and the structure is utilized in a single direction. This is in contrast to undirected
models such as Markov random fields, which do not assume the order of acquisition of random variables.
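The role of the directed edges can be made concrete with the standard factorization of a directed network's joint distribution, where \(\mathrm{Pa}(X_i)\) is notation introduced here for the set of nodes with an edge pointing into \(X_i\):
\[
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
\]
For a hypothetical three-node chain \(A \rightarrow B \rightarrow C\), this gives
\[
P(A, B, C) = P(A) \cdot P(B \mid A) \cdot P(C \mid B)
\]
rather than the general chain rule \(P(A) \cdot P(B \mid A) \cdot P(C \mid A, B)\). The absence of \(A\) from the last factor encodes that \(C\) is conditionally independent of \(A\) once \(B\) is known.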
\end{document}