mirror of https://github.com/asimonson1125/Implementations-of-Probability-Theory.git
synced 2026-02-25 06:09:50 -06:00
starting markov things
Binary file not shown.
@@ -3,7 +3,10 @@
 \usepackage{hyperref}
 \usepackage{amsmath}
 \usepackage{amssymb}
+
 \usepackage{tikz}
+\usetikzlibrary{arrows, automata, positioning}
+
 \usepackage[a4paper, total={6in, 10in}]{geometry}
 \usepackage{setspace}
 \setstretch{1.25}
@@ -36,6 +39,8 @@
 \newpage
 % Begin report
 \section{Objective}
+\rule{14cm}{0.05cm}
+
 The educational focus of Implementations of Probability Theory centers on the application of data
 models that produce non-deterministic insights through probabilistic methodology. By pursuing this
 study I hope to gain a deeper understanding of how to apply data in risk calculation for mitigation
@@ -58,6 +63,7 @@ a program that can determine the likelihood that randomly distributed colors on a
 \newpage
 \section{Units}
 \rule{14cm}{0.05cm}
+
 \subsection{Unit 1: Statistics Review}
 To ensure a strong statistical foundation for future learning in probabilistic models,
 the first objective was to create a document outlining and defining key topics that are
@@ -118,7 +124,8 @@ Probability Functions map the likelihood of random variables to be a specific va
 
 \subsubsection*{Probability Mass Functions}
 Probability Mass Functions (PMFs) map discrete random variables.
-For example, a six-sided die roll creates a uniform random PMF:
+For example, a six-sided die roll creates a uniform PMF. Each side of the die has a one-sixth chance of landing face-up, so the chance of each x
+value between 1 and 6 is represented by a \(\frac{1}{6}\) portion of the sample space:
 \begin{equation*}
 P(A) =
 \begin{cases}
@@ -133,7 +140,9 @@ For example, a six-sided die roll creates a uniform random PMF:
 
 \subsubsection*{Probability Density Functions}
 Probability Density Functions (PDFs) map continuous random variables.
-For example, this is a PDF where things happen.
+For example, this is a PDF representing a vehicle's risk of being stranded as it travels (in a line at a fixed speed). The y value increases as the vehicle puts
+distance between itself and the starting point but, once the halfway point is reached, the risk decreases as the distance between the vehicle and the destination
+decreases.
 \begin{equation*}
 P(A) =
 \begin{cases}
@@ -144,7 +153,7 @@ For example, this is a PDF where things happen.
 \end{equation*}
 
 \subsubsection{Limit Theorems}
-\subsubsection*{Law of Large Numbers}
+\subsubsection*{Law of Large Numbers}\label{Law of Large Numbers}
 The Law of Large Numbers states that as the number of independent random samples increases, the average of the samples
 will approach the true mean of the population.
 \[\text{true average}\approx \frac{1}{n} \sum_{i=1}^{n} X_{i} \qquad\text{as }n \rightarrow \infty\]
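The convergence is easy to see numerically. A minimal Python sketch of the die-roll case (illustrative only, not code from the repository):

```python
import random

# A fair die has a true mean of (1+2+3+4+5+6)/6 = 3.5; the running
# average of independent rolls drifts toward it as the sample grows.
for n in (10, 1_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)
```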
@@ -156,7 +165,7 @@ population distribution is not normal.
 \]
 Where \(X_i\) is an individual sample, \(N(0, 1)\) is a standard normal distribution, and \(\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n}X_i\).\\
 This is challenging to understand solely as an equation. As an example, take a sample of two six-sided dice rolls and average their numbers.
-The more sample averages taken, the more they will resemble a normal distribution where the majority of samples average around 3.
+The more sample averages taken, the more they will resemble a normal distribution where the majority of samples average around 3.5.
 
 \subsubsection{Confidence}
 Confidence is described using a confidence interval, which is a range of values that the true value is expected to be in, and its associated confidence level,
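The two-dice example can be simulated directly. A small Python sketch (illustrative, not part of the commit) that histograms the sample averages:

```python
import random
from collections import Counter

# Average two die rolls, many times over; the averages bunch up around
# 3.5 in a roughly normal shape, per the Central Limit Theorem.
samples = [(random.randint(1, 6) + random.randint(1, 6)) / 2
           for _ in range(60_000)]
for value, count in sorted(Counter(samples).items()):
    print(f"{value:4}: {'#' * (count // 500)}")
```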
@@ -165,9 +174,22 @@ which is a probability (expressed as a percentage) that the true value is in the
 It is important to note that confidence levels, such as 95\%, do not indicate that the real value is within 5\% of the point estimate. The confidence level expresses
 the probability that the real value is in the range provided by the confidence interval.
 
-At the highest level, calculating confidence intervals is simply the observed statistic (generally the mean) plus or minus the standard error.
-To calculate standard error, kys.
+At the highest level, calculating confidence intervals is simply the observed statistic (generally the mean) plus or minus the standard error. The percentage is
+identified by applying the z-score coefficient (in the case of a normal distribution; other distributions use non-parametric methods) that corresponds to that level
+of confidence. For instance, the z-multiplier for a confidence level of 95\% is 1.96, so a confidence interval formula around the mean would look like this:
+
+\[\text{interval} = \mu \pm (1.96 * \text{SE})\]
+
+To calculate standard error when the population standard deviation (\(\sigma\)) is known:
+
+\[\text{SE} = \frac{\sigma}{\sqrt{n}}\]
+
+When \(\sigma\) is unknown:
+
+\[\text{SE} = \frac{s}{\sqrt{n}}\]
+
+where \(n\) is the size of the sample and \(s\) is the sample standard deviation. Notice how the standard error decreases with a larger sample size, because a
+larger sample is more resilient to random events, as per the Law of Large Numbers (\ref{Law of Large Numbers}).
 
 % Confidence intervals can be calculated with z-tests, t-tests. Go into parametric vs non-parametric
 
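Putting the added interval and standard-error formulas together, a minimal Python sketch (the data values are made up for illustration):

```python
import math
import statistics

# 95% CI around a sample mean when sigma is unknown:
# SE = s / sqrt(n), interval = mean +/- 1.96 * SE.
data = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2, 5.0]
mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(len(data))
print(f"95% CI: {mean:.3f} +/- {1.96 * se:.3f}")
```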
@@ -201,10 +223,10 @@ The 2009 recession, attributed to the collapse of the housing market bubble, is
 banks who were federally required to give subprime loans to the taxpayer meant that banks could profit from subprime loans but would not be harmed when the inevitable
 occurred. In popular media, the housing bubble bursting is attributed to the banks, whereas those in the industry passed off the event as something that nobody could
 have foreseen\footnote{For instance, in the 2015 movie \textit{The Big Short}, only a few savvy traders who bothered to look into the details find that banks had,
-in their ignorance, built the bundled mortgages on an unstable foundation.}. In reality, banks only ignored a probablistic eventuality because their models did not
+in their ignorance, built the bundled mortgages on an unstable foundation.}. In reality, banks only ignored a probabilistic eventuality because their models did not
 need to account for such an event.
 
-Most emphasize the problems with risk transferrence when creating models. For this study's purposes, the important learning is that probablistic models should not
+Most emphasize the problems with risk transference when creating models. For this study's purposes, the important learning is that probabilistic models should not
 drop evaluations as soon as an event leaves the scope of the immediate client.
 
 \subsubsection{Ignoring Improbable Outliers with Outsized Impact}
@@ -221,16 +243,28 @@ Nassim Taleb in \textit{Fooled By Randomness} describes this event with an analo
 that eventually the unlikely, or, as the actor would see it, the unthinkable, happens and all of the gains are completely negated.
 
 \subsubsection{Fooled By Randomness}
-May justify its own subsection since the others acknowledge small probabilities whereas this is outright randomness.
+While most statisticians are familiar with techniques to remove noise to get a clearer picture of long-term trends, many forget that noise over longer terms can
+materialize as highly improbable events. For instance, it is improbable to flip a fair coin and have heads land face up 5 times in a row, but if the coin is flipped
+millions of times, it's exceedingly unlikely that a 5-head sequence does not occur.
 
-\subsubsection{Lindy Effect}
-"For the perishable, every additional day in its life translates into a shorter additional life expectancy.
-For the nonperishable, every additional day may imply a longer life expectancy."
-A tool that is proven is more likely to stand the test of time than a new tool replacing it since it is unproven.
-"The robustness of an item is proportional to its life!"
+In Nassim Taleb's namesake book, \textit{Fooled By Randomness}, this concept is applied to ongoing timeseries analysis in stock markets. By accounting for the scope
+of the prior evidence, Taleb models the probability that daily events are the effect of noise, a number that remains high even in the face of multiple point swings
+in the market. Understanding this chance is critical because observers often attempt to attribute random market movements to highly publicized events that in reality
+had a negligible effect on the market, fooling investors out of acting on prices deviating from their target.
 
-"Inaccurate science\ldots is constantly being published. The Lindy-conscious consumer of scientific data will take seriously only
-information that has held up over a period of time."\footnote{\url{https://www.nytimes.com/2021/06/17/style/lindy.html}}
+\subsubsection{Lindy Effect}\label{Lindy Effect}
+The Lindy Effect describes the importance of historical evidence of continuity when estimating an item's continuity into the future. For items with a set lifespan, such as
+perishable goods, each passing day is indicative of a shorter remaining life expectancy, but the same is not true for nonperishables like tools and concepts.
+For example, consider the lifespan of a news story or hot book. Many such stories may take the world by storm, but then be nearly forgotten months later. However,
+older writings are incredibly unlikely to be forgotten in the next few months. It would be truly bizarre if everyone decided Shakespeare was not worth learning in
+the next few years, because its value has held up long enough to maintain its popularity for centuries.
+
+Applying this concept to probability theory, information and evidence that has been important for a long time is likely to stick around long after hot new examples
+or tactics that contradict it fade into obscurity. When measuring the risk of startups, the concept and foundations may indeed be strong, but they have to be contrasted
+with the robustness of past ideas as proven over time. This concept also has applications for how people think about new things in their day-to-day life.
+In the news and papers outlining new developments, "Inaccurate science\ldots is constantly being published. The Lindy-conscious consumer of scientific data will take
+seriously only information that has held up over a period of time"\footnote{\url{https://www.nytimes.com/2021/06/17/style/lindy.html}} because time has removed
+the uncertainty associated with the volatility of untested (or less tested than the alternative) information.
 
 \subsubsection{Decision Theory}
 Decision theory is the study of how people make decisions with uncertain information. There are two main branches of decision theory:
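The coin-flip claim in the new Fooled By Randomness text is cheap to verify. A short Python sketch (illustrative only):

```python
import random

# Probability that 1,000 fair flips contain at least one run of 5 heads.
# Any particular 5-flip window has only a 1/32 chance, yet some run of 5
# is all but guaranteed over this many flips.
trials, hits = 2_000, 0
for _ in range(trials):
    streak = longest = 0
    for _ in range(1_000):
        streak = streak + 1 if random.random() < 0.5 else 0
        longest = max(longest, streak)
    hits += longest >= 5
print(hits / trials)  # prints a value very close to 1.0
```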
@@ -238,16 +272,32 @@ Decision theory is the study of how people make decisions with uncertain informa
 This branch studies how people \textit{should} make decisions. In problems with other actors, as in game theory, it is assumed that all other actors will also
 act with perfect rationality, allowing for precise calculation of the actions of all of the others and their expected utility to the agent.
 \subsubsection*{Descriptive Decision Theory}
-This branch studies how people actually make decisions which includes factors such as psychological and emotional biases.
+This branch studies how people actually make decisions, which includes factors such as psychological and emotional biases. It applies subjective value measurements,
+frequently working in parallel with Dempster-Shafer Theory (\ref{Dempster_Shafer_Theory}).
 
 \subsubsection{Info Gap Decisions}
-In info gap decision theory there is not enough information to assign probabilities to events and the goal is to select a course of action that is robust in the
+In info gap decision theory there is not enough information to assign probabilities to events. The goal, then, is to select a course of action that is robust in the
 face of uncertainty. Where decision theory can predict expectations in irrationality to determine expected values, info gap decisions approximate the range of
 probabilities and weight them to estimate expected value. In essence, it applies probabilities to probabilities, adding an additional layer to insulate calculations
-from a lack of data or lack of understanding of a topic.
+from a lack of data or lack of understanding of a topic. Tying this into the Lindy Effect (\ref{Lindy Effect}), we can compare the large range of probabilities of
+new, untested information with the narrower range from old, tested information which has experienced more challenges, just as confidence increases with a larger
+sample size.
+
+\subsubsection{Dempster-Shafer Theory}\label{Dempster_Shafer_Theory}
+This section is an extra theory chosen to coincide with the unit 3 focus on Bayesian statistics. The Dempster-Shafer theory is a derivative application of
+Bayes Theorem (\ref{Bayes Theorem}) where subjective beliefs are applied to independent variables not tracked by the belief network. Shafer eloquently describes this
+process by supposing that two friends, both of whom he subjectively believes are 90\% reliable, tell him that a limb has fallen on his car
+\footnote{\url{http://glennshafer.com/assets/downloads/articles/article48.pdf}}. Without observing Shafer's car we can calculate that there is only a 1\% chance that
+both friends are unreliable, so there's a high likelihood that the statement is true.
+
+However, even if both friends are unreliable, they are not necessarily lying. Thus, there is actually less than a 1\% chance that no limb fell on the car. The exact
+probability can only be calculated by determining how likely it is that the friends would find it funny to tell Shafer that a limb fell on his car, contrasted with
+the odds that such a friend may also be willing to throw limbs at his car so as to maintain their ever-reliable facade. If one also considers the possibility
+that Shafer's friends mistakenly believed a limb fell on his car, this uncertainty must also be combined with the evidence for the most accurate picture.
 
 \subsubsection{Methodology Considerations}
-Given I have taken 10134023 instances of the last 40 years, all of which Obama has been alive, I can say with a high degree of certainty that Obama is immortal.
+I have taken 10134023 instances of the last 40 years, during all of which Obama has been alive. Therefore I can say with a high degree of certainty that Obama is
+immortal.
 
 An event never occurring in history does not discount its possibility of occurring in the future. Similarly, events that may have been impossible in the past
 are not necessarily impossible in the future.
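The 99\%/1\% arithmetic in the two-friends story can be written out as a tiny Dempster-style combination. The mass assignments below are an assumed encoding of "each friend is 90\% reliable", not something taken from the report:

```python
# Each friend is modeled as a simple support function: 0.9 of the mass
# supports "a limb fell" and 0.1 is uncommitted (the whole frame).
m1_limb, m1_unknown = 0.9, 0.1
m2_limb, m2_unknown = 0.9, 0.1

# The testimonies never conflict, so combined support for "a limb fell"
# is everything except the case where both friends are unreliable.
belief_limb = 1 - m1_unknown * m2_unknown
print(belief_limb)  # 0.99, i.e. only a 1% chance neither testimony counts
```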
@@ -269,20 +319,19 @@ these expressions but, when shown a table and how to calculate those ratios, the
 
 \subsubsection{Bayes Theorem}\label{Bayes Theorem}
 
-The equation for Bayes Theorem is as follows:
+Bayes Theorem is a rule for conditional probability that calculates the probability of a cause given that an event has occurred. The equation for Bayes Theorem is as
+follows:
 
 \[
 P(A|E) = \frac{P(A) * P(E|A)}{P(A) * P(E|A) + (1 - P(A)) * P(E|\neg A)}
 \]
 
-This formula appears more complex as it is. The denominator, while directly translating to "The probability of A times the probability of event E occuring in A
-divided by the probability of A times the probability of event E occuring in A plus the probability of not A times the probability of E occuring in not A"
-can be more easily expressed simply as \(P(E)\) or the probability of event E occuring.
-
-By utilizing venacular more familiar to everyday life, Bayes Theorem can be translated into:
+This formula appears more complex than it is. The denominator, while directly translating to "the probability of A times the probability of event E occurring given A
+plus the probability of not A times the probability of E occurring given not A",
+can be more easily expressed as \(P(E)\), the probability of event E occurring:
 
 \[
-\text{P(occurence came from category)} = \frac{\text{\# of occurences from category}}{\text{total \# of occurences}}
+P(A|E) = \frac{P(A) * P(E|A)}{P(E)}
 \]
 
 Finally, this equation is updated to replace descriptions with technical terms:
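The expanded denominator translates directly into code. A minimal Python sketch (the numbers reuse the cancer-test example that follows in the report):

```python
def posterior(p_a, p_e_given_a, p_e_given_not_a):
    """P(A|E) via Bayes Theorem, with the denominator expanded into P(E)."""
    p_e = p_a * p_e_given_a + (1 - p_a) * p_e_given_not_a
    return p_a * p_e_given_a / p_e

# 10% prevalence, 95% true-positive rate, 5% false-positive rate.
print(posterior(0.10, 0.95, 0.05))  # ~0.679
```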
@@ -291,47 +340,84 @@ Finally, this equation is updated to replace descriptions with technical terms:
 \text{Posterior Probability} = \frac{\text{prior} * \text{likelihood}}{\text{Evidence}}
 \]
 
-Even this equation can be misconstrued as a number of arrangements of ratios involving total occurrences from a category or non-occurrences from outside
-of the category so as a final demonstration, the sample space can be visualized geometrically as a 1 unit by 1 unit
-square\footnote{Concept credit to 3Blue1Brown on Youtube, this video is what finally clarified in my mind what the frankly simple equation behind Bayes Theorem
-meant.\\\url{https://www.youtube.com/watch?v=HZGCoVF3YvM}}. The area of this square, 1 unit squared, is the equivalent to a probability of 1 (or 100\%).
-In such an example, a vertical line is drawn to separate proportions representative of the category (or the assumed-true event) and observations not of that category.
-Horizontal lines drawn in each represent the probability of an occurrence in each category.
-
-Consider an example where a cancer test given to 1,000 people has a 95\% accuracy rate. Of those 1,000 people, 10\% of them have cancer, 95 of whom test positive
-(true positive) and 5 who test negative (false negative). Of the remaining 900, 45 test positive (false positive) and 855 test negative (true negative). Such
-an example can be expressed visually as:
-\vskip 2pt
+By utilizing vernacular more familiar to everyday life, Bayes Theorem can be translated as:
+
+\[
+\text{P(occurrence stems from A)} = \frac{\text{\# of occurrences from A}}{\text{total \# of occurrences}}
+\]
+
+To appeal to mental visualization, the sample space can be imagined geometrically as a 1 unit by 1 unit
+square\footnote{Concept credit to 3Blue1Brown on Youtube, this video is what finally clarified in my mind what the frankly simple equation behind Bayes Theorem
+meant.\\\url{https://www.youtube.com/watch?v=HZGCoVF3YvM}}. The area of this square, 1 unit squared, represents a probability of 1 (or 100\%) and the probability of
+any possible outcome fits inside this square. Intuitively, this visualization can also be thought of as a confusion matrix where the squares are drawn proportional
+to their representative probabilities.
+
+Consider an example where a patient wants to know if their positive cancer test is actually a false positive. Reviewing the test history, it's found to be accurate
+in 95\% of its 1,000 uses. Given that we want to find the chances that a positive test is truly from a patient with cancer, let's highlight only the cases where a
+test is positive. A confusion matrix for this example would look like this:
+
 \begin{center}
 \begin{tikzpicture}
-\draw[gray, thick] (0,0) rectangle (3,3);
-\draw[gray, thin] (3/10, 0) -- (3/10, 3);
-\draw[gray, thin] (0, 0) rectangle (3/10, 3*.95);
-\node[label=below:95/1000] at (-1,1) {TP};
-\draw[->] (-.6, 1) -- (.15, 1);
-\node[label=below:45/1000] at (1.5,-2/3) {FP};
-\draw[->] (1.5, -1/3) -- (1.5, .05);
-\draw[gray, thin] (3/10, 0) rectangle (3, 3*.05);
+\draw[gray, thick, fill=blue!5] (0, 0) rectangle (3, 3);
+\node[align=center, text width=3cm] at (1.5, 1.5) {True Positives\\95 patients};
+
+\draw[gray, thick, fill=red!5] (3, 0) rectangle (6, 3);
+\node[align=center, text width=3cm] at (4.5, 1.5) {False Positives\\45 patients};
+
+\draw[gray, thick] (0, 3) rectangle (3, 6);
+\node[align=center, text width=3cm] at (1.5, 4.5) {False Negatives\\5 patients};
+
+\draw[gray, thick] (3, 3) rectangle (6, 6);
+\node[align=center, text width=3cm] at (4.5, 4.5) {True Negatives\\855 patients};
+
+\node[label, align=center, text width=3cm] at (1.5, 6.75) {Cancer\\ (100 patients)};
+\node[label, align=center, text width=3cm] at (4.5, 6.75) {No Cancer\\ (900 patients)};
+\node[label, rotate=90] at (-0.5, 1.5) {Positive};
+\node[label, rotate=90] at (-0.5, 4.5) {Negative};
+\end{tikzpicture}
+\end{center}
+
+Notice that the test does make the correct identification 95\% of the time (and in this example, 95\% regardless of actual value) but that there are almost half as
+many false positives as there are true positives, meaning a positive test is not representative of a 95\% chance of having cancer.
+
+Proportionally scaling the probability matrix squares to create the sample space square defined earlier, we can see that the TP box appears to be approximately
+twice the size of the FP box. Logically, then, if we choose a random positive test, there's a two-thirds chance of the patient selected being from the true positive
+category:
+
+\vfil % Added to keep the footer down since a new page is entering on the next tikz picture
+\begin{center}
+\begin{tikzpicture}
+\draw[gray, thick] (0,0) rectangle (6, 6);
+\draw[gray, thin] (6/10, 0) -- (6/10, 6);
+\draw[gray, thin, fill=blue!5] (0, 0) rectangle (6/10, 6*.95);
+\draw[gray, thin, fill=red!5] (6/10, 0) rectangle (6, 6*.05);
+\node[label=below:95/1000] at (-1, 2.5) {TP};
+\draw[->] (-0.6, 2.5) -- (0.25, 2.5);
+\node[label=below:45/1000] at (4,-2/3) {FP};
+\draw[->] (4, -1/3) -- (4, .15);
+\node[label=below:5/1000] at (-1, 5.85) {FN};
+\node[label=below:855/1000] at (3.5, 3.5) {TN};
+\draw[->] (-0.6, 5.85) -- (0.25, 5.85);
 \end{tikzpicture}
 \end{center}
 \vskip 2pt
-Using this visual where TP represents true positives and FP representing false positives, Bayes Theorem is simply expressed as:
+Bayes Theorem as applied to this problem can be simply expressed as:
 \[
-P(A|E) = \frac{TP}{TP + FP} = \frac{\frac{95}{1000}}{\frac{95}{1000} + \frac{45}{1000}} = 67.9\%
+P(\text{has cancer given positive test}) = \frac{\colorbox{blue!5}{TP}}{\colorbox{blue!5}{TP} + \colorbox{red!5}{FP}} = \frac{\colorbox{blue!5}{\(\frac{95}{1000}\)}}{\colorbox{blue!5}{\(\frac{95}{1000}\)} + \colorbox{red!5}{\(\frac{45}{1000}\)}} = 67.9\%
 \]
-Meaning that, given a random positive test, there is a 67.9\% chance of the patient actually having cancer. This percentage visually tracks with the graphic as
-the TP box appears to be approximately twice the size of the FP box, giving a two-thirds chance of the patient being a true positive.
+Meaning that, given a random positive test, there is a 67.9\% chance of the patient actually having cancer, not far off from the two-thirds visual estimate.
 
 \subsubsection{Bayesian Updating}
 Bayesian Updating is another term that has been added to buzzword vocabulary to describe a process that isn't directly related to Bayesian Statistics but appears
 to have been rediscovered by academia through study of applied Bayes Theorem. In essence, Bayesian Updating simply states that observed occurrences should not
-override previous evidence and that it should instead be added to it in equal weight (equal value being a naive assumption). This evidence updating makes
-applications of Bayes Theory calculate posterior probabilities continuously as new information enters the system rather than a calculation that is only done once.
+override previous evidence but should instead be added to it in equal weight (equal value being a naive assumption). This evidence updating makes
+applications of Bayes Theory calculate posterior probabilities continuously as new information enters the system rather than a frequentist approach where
+the calculation is only performed once.
 
 \subsubsection{Bayesian Belief Networks}
-Bayesian Belief Networks are probablistic graphical models that preserve conditional dependence between random variables. In spite of its name,
+Bayesian Belief Networks are probabilistic graphical models that preserve conditional dependence between random variables. In spite of its name,
 Bayesian Belief Networks do not necessarily apply Bayesian models, though they are a way to utilize Bayes Theorem for domains with greater complexity beyond a
 single posterior probability. In this type of network, edges are directed and the structure is utilized in a single direction. This is in contrast to undirected
 Hidden Markov Models (to be covered in the next unit) that do not assume the order of acquisition of random variables. While it may not be practical to calculate
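The 67.9\% figure can be reproduced straight from the confusion-matrix counts. A two-line Python check (illustrative):

```python
# Counts from the confusion matrix above, out of 1,000 patients.
tp, fp = 95, 45
print(f"P(cancer | positive) = {tp / (tp + fp):.1%}")  # 67.9%
```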
@@ -339,8 +425,7 @@ the full conditional probability of a variable, Bayesian Belief Networks allow u
 an earlier random variable.
 
 Following the example in the Bayes Theorem section of this report (\ref{Bayes Theorem}), let's suppose that a patient with a positive test takes a hypothetical
-second test whose results are partially dependent on the first as they measure overlapping biological markers. In this case, the results of the first test
-is relevant to the second test:
+second test. However, the second test's results are partially dependent on the first since they measure overlapping biological markers.
 \vskip 5pt
 \begin{center}
 \begin{tikzpicture}
@@ -391,11 +476,132 @@ is relevant to the second test:
 \hline
 \end{tabular}
 \end{center}
-Note that this probability of positive results in both tests (which both have greater than 50\% of positives being true positives) is as certain as two positives
-from two completely independent tests with 50\% of positives being true. If the partial dependence was not included in the calculation, as would have occured in a
-Naive Bayes model, the model's listed accuracy would be inflated.
+Note that this probability of positive results in both tests (which both have greater than 50\% of positives being true positives) is only as certain as two
+positives from two independent tests each with 50\% of positives being true. If the dependence was not included in the calculation and we ignored the fact
+that the tests partially measure the same thing, as would have occurred in a Naive Bayes model, the tests' combined accuracy would be unjustly inflated.
 
 \newpage
-\section{Unit 4: Markov Chains}
+\subsection{Unit 4: Markov Methods}
+
+\subsubsection{Markov Chains}
+Markov Chains are a form of probabilistic automaton where the likelihood of transitioning to a new state depends solely on the current state, with no memory of prior
+states. For example\footnote{example sourced from:\\\url{https://towardsdatascience.com/introduction-to-markov-chains-50da3645a50d}}, suppose a weather prediction
+program wants to know whether tomorrow will be a sunny or cloudy day, based on the current weather. Using the current weather as a state, the program identifies that
+there is a 10\% chance of a sunny day transitioning into a cloudy day and a 50\% chance that a cloudy day transitions into a sunny day:
+
+\begin{center}
+\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
+
+\node[state] (Sunny) {Sunny};
+\node[state, right=of Sunny] (Cloudy) {Cloudy};
+
+\path[->]
+(Sunny) edge [loop left] node {.9} (Sunny)
+edge [bend right=-15] node {.1} (Cloudy)
+(Cloudy) edge [loop right] node {.5} (Cloudy)
+edge [bend left=15] node {.5} (Sunny);
+
+\end{tikzpicture}
+\end{center}
+
+Note that there is no information preserved between steps. Markov Chains are memoryless, so any information that must be available to them must be expressed as the
+state, such as the sunny and cloudy states in the example above. One benefit of such a straightforward structure is that it enables easy calculation of the
+probabilities of reaching a state k steps from the current position. By expressing the chain as a transition matrix where the row represents the current state, the
+column represents the next state, and each cell contains the probability of moving from the row state to the column state, we get a 1-step transition matrix:
+
+\[
+\begin{pmatrix}
+.9 & .1 \\
+.5 & .5
+\end{pmatrix}
+\]
+or, expressed as a table:
+\begin{center}
+\begin{tabular}{ | c | c | c | }
+\hline
+Current State & Next: Sunny & Next: Cloudy \\
+\hline
+\hline
+Sunny & 90\% & 10\% \\
+\hline
+Cloudy & 50\% & 50\% \\
+\hline
+\end{tabular}
+\end{center}
+
+To turn this into a k-step transition matrix, this 1-step matrix only needs to be raised to the k-th power:
+\[
+\begin{pmatrix}
+.9 & .1 \\
+.5 & .5
+\end{pmatrix}^k
+\]
+To find the probability of the weather two days from the current state, plug 2 into k:
+\[
+\begin{pmatrix}
+.9 & .1 \\
+.5 & .5
+\end{pmatrix}^2 =
+\begin{pmatrix}
+.86 & .14 \\
+.7 & .3
+\end{pmatrix}
+\]
+
+From this matrix we can determine that if it is currently sunny, there is an 86\% chance that it will be sunny in two days and, if it is currently cloudy, there is a
+70\% chance that it will be sunny in two days. As k approaches infinity, the model approaches its equilibrium where the starting state becomes irrelevant. In this
+example, any random day would be 83.333\% likely to be sunny, representative of the long-term behavior of the system (climate), so the matrix of the equilibrium
+looks like this:
+
+\[\begin{pmatrix}
+.9 & .1 \\
+.5 & .5
+\end{pmatrix}^\infty \approx
+\begin{pmatrix}
+.83333 & .16666 \\
+.83333 & .16666
+\end{pmatrix}
+\text{ OR: }
+\begin{pmatrix}
+.83333 \\
+.16666
+\end{pmatrix}
+\]
+
+\subsubsection{Hidden Markov Models}
+% TODO: add notes, possibly including mixed models
+
+\newpage
+\subsection{Unit 5: Monte Carlo Simulations}
+% TODO: introduce what Monte Carlo simulations are
+
+\subsubsection{How To Make a Monte Carlo Simulation}
+
+\subsubsection{Monte Carlo Integration}
+
+\subsubsection{Markov Chain Monte Carlo (MCMC) methods}
+
+\newpage
+\section{Applied Projects}
+\rule{14cm}{0.05cm}
+
+\subsection{Randomness of Retinal Mosaic Layout}
+A hexagonal grid of marbles: are the colors randomly distributed?
+Key concepts: hexagonal basis vectors, the retinal mosaic, entropy.
+
+\subsection{Bayes Server Ripoff}
+I planned to create a trickle-down density belief network using probability density functions as nodes that choose the direction of rows in a relational database.
+Found this later; it's sort of similar: \url{https://www.bayesserver.com/}
+
+Even better than their jank Bayesian belief network, I may be able to make mixed Bayesian/Markov chain models. This is a big project.
+
+\subsection{Cost-Benefit Analysis of Asynchronous Education}
+This section covers a calculation I devised to make me feel better about my life decisions. The data is based on implicit guesswork and, while I will be taking it
+seriously for my decision to do either the online or on-campus RIT Data Science Masters Program, it should not be taken seriously as a probabilistic model.
+Since there is no framework for making a subjective decision weighting the potential benefits of on-campus life with the value of entering the workforce 18 months
+sooner, I decided to make one. Inshallah I shall reach my true potential and fulfill destiny.
+
+With archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y = 3x^2 - 2y\) as the equation for covariance.
+
 \end{document}
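The k-step and equilibrium matrices in the new Markov section can be checked with a few lines of numpy (a sketch, not code from the repository):

```python
import numpy as np

# 1-step weather transition matrix: rows = current state, columns = next
# state, ordered [sunny, cloudy].
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

print(np.linalg.matrix_power(P, 2))   # [[0.86 0.14], [0.70 0.30]]
print(np.linalg.matrix_power(P, 50))  # rows converge to [0.8333, 0.1667]
```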
@@ -35,8 +35,7 @@ def csv2Table(inFile):
         reader = csv.reader(f)
         rows = list(reader)
 
-        out = "\\begin{table}[h!]\n\\centering\n"
-        out += "\\begin{tabular}[t]{|" + " c | p{1.3cm} | c | c | p{6cm} |}\n"
+        out = "\\begin{longtable}{|" + " c | p{1.3cm} | c | c | p{6cm} |}\n"
         out += "\\hline\n"
 
         for row in rows:
@@ -45,7 +44,7 @@ def csv2Table(inFile):
             out += " & ".join(row) + " \\\\\n"
             out += "\\hline\n"
 
-        out += "\\end{tabular}\n\\end{table}\n"
+        out += "\\end{longtable}\n"
         return out
 
 def findIn(arr, target):
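For reference, the whole helper after this change plausibly reads as below; the `with open(...)` wrapper and the exact indentation are assumptions based on the visible context lines, not shown in the diff:

```python
import csv

def csv2Table(inFile):
    """Render a CSV file as a LaTeX longtable (sketch of the post-commit shape)."""
    with open(inFile, newline="") as f:
        reader = csv.reader(f)
        rows = list(reader)

        # One longtable environment replaces the old table/tabular pair so
        # long timesheets can break across pages.
        out = "\\begin{longtable}{|" + " c | p{1.3cm} | c | c | p{6cm} |}\n"
        out += "\\hline\n"
        for row in rows:
            out += " & ".join(row) + " \\\\\n"
            out += "\\hline\n"

        out += "\\end{longtable}\n"
        return out
```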
@@ -24,3 +24,13 @@ Week,Date,Type,Duration (Hours),Description
 8,10/16,Application,2.5,"Bayes visualizations and practice worksheets"
 8,10/16,Reporting,2,"Early Bayesian Statistics Report"
 8,10/17,Application,2,"Bayes Geometric Visualization"
+8,10/18,Application,2,"Bayes Belief Network Visualization and reporting"
+8,10/18,Advising Meetings,1,"Bayes Report Review"
+8,10/18,Reporting,2.5,"Applying meeting feedback"
+9,10/22,Research,2,"Dempster-Shafer Theory"
+9,10/25,Advising Meetings,0.5,"'are you doing things' check"
+9,10/26,Research,1,"First Pass Markov Chains"
+10,10/28,Reporting,3,"Finalization, added Standard Error, Dempster-Shafer Theory, Fooled by Randomness"
+10,10/29,Reporting,1,"Lindy, Info Gap tie ins"
+10,10/29,Research,3,"Markov Chains (brilliant.org, towardsdatascience)"
+10,10/29,Reporting,2,"Markov Chain Summary and Visuals"
Binary file not shown.
@@ -1,5 +1,6 @@
 \documentclass[12pt]{article}
 \usepackage{blindtext}
+\usepackage{longtable}
 \usepackage[a4paper, total={6in, 10in}]{geometry}
 \nofiles
 \hyphenpenalty 1000
@@ -26,9 +27,7 @@
 \newpage
 
 % OPEN Timesheet
-\begin{table}[h!]
-\centering
-\begin{tabular}[t]{| c | p{1.3cm} | c | c | p{6cm} |}
+\begin{longtable}{| c | p{1.3cm} | c | c | p{6cm} |}
 \hline
 Week & Date & Type & Duration (Hours) & Description \\
 \hline
@@ -82,13 +81,32 @@ Week & Date & Type & Duration (Hours) & Description \\
 \hline
 8 & 10/17 & Application & 2 & Bayes Geometric Visualization \\
 \hline
-\end{tabular}
-\end{table}
-\noindent Hours for Advising Meetings: 6.0\\
-Hours for Application: 6.0\\
-Hours for Reporting: 16.0\\
-Hours for Research: 28.5\\
-\textbf{Total Hours: 56.5}\\
+8 & 10/18 & Application & 2 & Bayes Belief Network Visualization and reporting \\
+\hline
+8 & 10/18 & Advising Meetings & 1 & Bayes Report Review \\
+\hline
+8 & 10/18 & Reporting & 2.5 & Applying meeting feedback \\
+\hline
+9 & 10/22 & Research & 2 & Dempster-Shafer Theory \\
+\hline
+9 & 10/25 & Advising Meetings & 0.5 & 'are you doing things' check \\
+\hline
+9 & 10/26 & Research & 1 & First Pass Markov Chains \\
+\hline
+10 & 10/28 & Reporting & 3 & Finalization, added Standard Error, Dempster-Shafer Theory, Fooled by Randomness \\
+\hline
+10 & 10/29 & Reporting & 1 & Lindy, Info Gap tie ins \\
+\hline
+10 & 10/29 & Research & 3 & Markov Chains (brilliant.org, towardsdatascience) \\
+\hline
+10 & 10/29 & Reporting & 2 & Markov Chain Summary and Visuals \\
+\hline
+\end{longtable}
+\noindent Hours for Advising Meetings: 7.5\\
+Hours for Application: 8.0\\
+Hours for Reporting: 24.5\\
+Hours for Research: 34.5\\
+\textbf{Total Hours: 74.5}\\
 % CLOSE Timesheet
 
 \end{document}