starting markov things

2024-10-29 15:38:53 -04:00
parent e8749deb8d
commit 6ad7ce6fdd
6 changed files with 303 additions and 70 deletions
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{tikz}
\usetikzlibrary{arrows, automata, positioning}
\usepackage[a4paper, total={6in, 10in}]{geometry}
\usepackage{setspace}
\setstretch{1.25}
\newpage
% Begin report
\section{Objective}
\rule{14cm}{0.05cm}
The educational focus of Implementations of Probability Theory centers on the application of data
models that produce non-deterministic insights through probabilistic methodology. By pursuing this
study I hope to gain a deeper understanding of how to apply data in risk calculation for mitigation
\newpage
\section{Units}
\rule{14cm}{0.05cm}
\subsection{Unit 1: Statistics Review}
To ensure a strong statistical foundation for future learning in probabilistic models,
the first objective was to create a document outlining and defining key topics that are
Probability Functions map the likelihood of random variables to be a specific value.
\subsubsection*{Probability Mass Functions}
Probability Mass Functions (PMFs) map discrete random variables.
For example, a six-sided die roll creates a uniform random PMF. Each side of the die has a one-sixth chance of landing face-up, so the discrete chance of each x
value between 1 and 6 is represented by a \(\frac{1}{6}\) portion of the sample space:
\begin{equation*}
P(A) =
\begin{cases}
\frac{1}{6} & x \in \{1, 2, 3, 4, 5, 6\} \\
0 & \text{otherwise}
\end{cases}
\end{equation*}
\subsubsection*{Probability Density Functions}
Probability Density Functions (PDFs) map continuous random variables.
For example, this is a PDF representing a vehicle's risk of being stranded as it travels (in a line at a fixed speed). The y value increases as the vehicle puts
distance between itself and the starting point but, once the halfway point is reached, the risk decreases as the distance between the vehicle and the destination
decreases.
\begin{equation*}
P(A) =
\begin{cases}
4x & 0 \leq x \leq \frac{1}{2} \\
4(1 - x) & \frac{1}{2} < x \leq 1
\end{cases}
\end{equation*}
\subsubsection{Limit Theorems}
\subsubsection*{Law of Large Numbers}\label{Law of Large Numbers}
The Law of Large Numbers states that as the number of independent random samples increases, the average of the samples'
means will approach the true mean of the population.
\[\text{true average}\approx \frac{1}{n} \sum_{i=1}^{n} X_{i} \qquad\text{as }n \rightarrow \infty\]
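As a quick sanity check (my own illustrative snippet, not part of the unit material), the Law of Large Numbers can be simulated by averaging repeated die rolls in Python:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def running_mean_of_die_rolls(n):
    """Average of n fair six-sided die rolls."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# The true mean of a fair die is (1 + 2 + ... + 6) / 6 = 3.5;
# larger samples should land closer to it.
print(abs(running_mean_of_die_rolls(100) - 3.5))
print(abs(running_mean_of_die_rolls(100_000) - 3.5))
```

The gap for the 100,000-roll average should be far smaller than for the 100-roll average.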
The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the sample size grows, even if the
population distribution is not normal:
\[
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \rightarrow N(0, 1) \qquad \text{as } n \rightarrow \infty
\]
Where \(X_i\) is the \(i\)-th sample, \(N(0, 1)\) is a standard normal distribution, and \(\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n}X_i\) is the sample mean.\\
This is challenging to understand solely as an equation. As an example, take a sample of two six-sided dice rolls and average their numbers.
The more sample averages taken, the more they will resemble a normal distribution where the majority of samples average around 3.5.
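To make the dice illustration concrete, here is a small simulation I added (the \(\sigma\) values in the comments are the standard results for a fair die):

```python
import random
import statistics

random.seed(1)

def two_die_average():
    """One sample: the mean of two fair six-sided dice."""
    return (random.randint(1, 6) + random.randint(1, 6)) / 2

samples = [two_die_average() for _ in range(50_000)]
# Sample means cluster around 3.5. One die has sigma = sqrt(35/12) ~ 1.708,
# so averages of two dice spread as sigma / sqrt(2) ~ 1.208.
print(statistics.mean(samples))
print(statistics.stdev(samples))
```

A histogram of `samples` would show the bell shape forming even though a single die roll is uniform.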
\subsubsection{Confidence}
Confidence is described using a confidence interval, which is a range of values that the true value is expected to be in, and its associated confidence level,
which is a probability (expressed as a percentage) that the true value is in the interval.
It is important to note that confidence levels, such as 95\%, do not indicate that the real value is within 5\% of the point estimate. The confidence level expresses
the probability that the real value is in the range provided by the confidence interval.
At the highest level, calculating confidence intervals is simply the observed statistic (generally the mean) plus or minus a multiple of the standard error. The
percentage is identified by applying the z-score coefficient (in the case of a normal distribution; other distributions use non-parametric methods) that corresponds
to that level of confidence. For instance, the z-multiplier for a confidence level of 95\% is 1.96, so a confidence interval formula around the mean would look like this:
\[\text{interval} = \mu \pm (1.96 * \text{SE})\]
To calculate standard error when the population standard deviation (\(\sigma\)) is known:
\[\text{SE} = \frac{\sigma}{\sqrt{n}}\]
When \(\sigma\) is unknown:
\[\text{SE} = \frac{s}{\sqrt{n}}\]
where \(n\) is the size of the sample and \(s\) is the sample standard deviation. Notice that the standard error decreases with a larger sample size: a bigger
sample is more resilient to random events, as per the Law of Large Numbers (\ref{Law of Large Numbers}).
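As a worked sketch (the sample values below are invented purely for illustration), the 95\% interval formula translates directly:

```python
import math
import statistics

def confidence_interval_95(sample):
    """95% confidence interval around the sample mean, using SE = s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # sigma unknown, so use s
    return (mean - 1.96 * se, mean + 1.96 * se)

# Hypothetical measurements with mean 5.0 and s = 0.2:
low, high = confidence_interval_95([4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3])
print(low, high)  # roughly (4.861, 5.139)
```

Doubling the sample size would shrink the interval by a factor of \(\sqrt{2}\), matching the formula above.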
% Confidence intervals can be calculated with z-tests, t-tests. Go into parametric vs non-parametric
banks who were federally required to give subprime loans to the taxpayer meant that banks could profit from subprime loans but would not be harmed when the inevitable
occurred. In popular media, the housing bubble bursting is attributed to the banks where those in the industry passed off the event as something that nobody could
have foreseen\footnote{For instance, in the 2015 movie \textit{The Big Short}, only a few savvy traders who bothered to look into the details find that banks had,
in their ignorance, built the bundled mortgages on an unstable foundation.}. In reality, banks only ignored a probabilistic eventuality because their models did not
need to account for such an event.
Most emphasize the problems with risk transference when creating models. For this study's purposes, the important lesson is that probabilistic models should not
drop evaluations as soon as an event leaves the scope of the immediate client.
\subsubsection{Ignoring Improbable Outliers with Outsized Impact}
that eventually the unlikely, or, as the actor would see it, the unthinkable, happens and all of the gains are completely negated.
\subsubsection{Fooled By Randomness}
% May justify its own subsection since the others acknowledge small probabilities whereas this is outright randomness.
While most statisticians are familiar with techniques to remove noise to get a clearer picture of long-term trends, many forget that noise over longer terms can
materialize as highly improbable events. For instance, it is improbable to flip a fair coin and have heads land face up 5 times in a row, but if the coin is flipped
millions of times, it's exceedingly unlikely that a 5-head sequence does not occur.
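The five-heads claim can be verified exactly with a small dynamic program (my own sketch; it tracks the length of the current run of heads):

```python
def prob_five_heads_run(n):
    """Probability of at least one run of 5 consecutive heads in n fair flips."""
    # dp[k] = probability of sequences with no run yet, currently on k heads in a row
    dp = [1.0, 0.0, 0.0, 0.0, 0.0]
    hit = 0.0
    for _ in range(n):
        new = [0.0] * 5
        for k in range(5):
            new[0] += dp[k] * 0.5        # tails resets the run
            if k == 4:
                hit += dp[k] * 0.5       # a fifth head completes the run
            else:
                new[k + 1] += dp[k] * 0.5
        dp = new
    return hit

print(prob_five_heads_run(5))      # 0.03125, i.e. (1/2)^5
print(prob_five_heads_run(1_000))  # already very close to 1
```

Even at a thousand flips (let alone millions) a five-head run is nearly guaranteed.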
In Nassim Taleb's namesake book, \textit{Fooled By Randomness}, this concept is applied to ongoing timeseries analysis in stock markets. By accounting for the scope
of the prior evidence, Taleb models the probability that daily events are the effect of noise, a number that remains high even in the face of multiple-point swings
in the market. Understanding this chance is critical because observers often attribute random market events to highly publicized news that in reality had a
negligible effect on the market, fooling investors out of acting on prices deviating from their target.
"Inaccurate science\ldots is constantly being published. The Lindy-conscious consumer of scientific data will take seriously only
information that has held up over a period of time."\footnote{\url{https://www.nytimes.com/2021/06/17/style/lindy.html}}
\subsubsection{Lindy Effect}\label{Lindy Effect}
The Lindy Effect describes the importance of historical evidence of longevity when estimating an item's continuity into the future. For items with a set lifespan,
such as perishable goods, each passing day is indicative of a shorter remaining life expectancy, but the same is not true for nonperishables like tools and concepts.
For example, consider the lifespan of a news story or hot book. Many such stories may take the world by storm, only to be nearly forgotten months later. Older
writings, however, are incredibly unlikely to be forgotten in the next few months. It would be truly bizarre if everyone decided in the next few years that
Shakespeare was not worth learning, because its value has been judged high enough, for long enough, to maintain its popularity.
Applying this concept to probability theory, information and evidence that has been important for a long time is likely to stick around long after hot new examples
or tactics that contradict it fade into obscurity. When measuring risk of startups, the concept and foundations may indeed be strong, but they have to be contrasted
with the robustness of past ideas as proven over time. This concept also has applications for how people think about new things in their day to day life.
In the news and papers outlining new developments, "Inaccurate science\ldots is constantly being published. The Lindy-conscious consumer of scientific data will take
seriously only information that has held up over a period of time"\footnote{\url{https://www.nytimes.com/2021/06/17/style/lindy.html}} because time has removed the
uncertainty associated with the volatility of untested (or less tested) information.
\subsubsection{Decision Theory}
Decision theory is the study of how people make decisions with uncertain information. There are two main branches of decision theory:
This branch studies how people \textit{should} make decisions. In problems with other actors, as in game theory, it is assumed that all other actors will also
act with perfect rationality, allowing for precise calculation of the actions of all of the others and their expected utility to the agent.
\subsubsection*{Descriptive Decision Theory}
This branch studies how people actually make decisions which includes factors such as psychological and emotional biases. It applies subjective value measurements,
frequently working in parallel with Dempster-Shafer Theory (\ref{Dempster_Shafer_Theory}).
\subsubsection{Info Gap Decisions}
In info gap decision theory there is not enough information to assign probabilities to events. The goal, then, is to select a course of action that is robust in the
face of uncertainty. Where decision theory can model expected irrationality to determine expected values, info gap decisions approximate the range of
probabilities and weight them to estimate expected value. In essence, it applies probabilities to probabilities, adding an additional layer to insulate calculations
from a lack of data or lack of understanding of a topic.
from a lack of data or lack of understanding of a topic. Tying this into the Lindy Effect (\ref{Lindy Effect}), we can compare the large range of probabilities of
new, untested information with the narrower range from old, tested information which has experienced more challenges, just as confidence increases with a larger
sample size.
\subsubsection{Dempster-Shafer Theory}\label{Dempster_Shafer_Theory}
This section is an extra theory chosen to coincide with the unit 3 focus on Bayesian statistics. The Dempster-Shafer theory is a derivative application of
Bayes Theorem (\ref{Bayes Theorem}) where subjective beliefs are applied to independent variables not tracked by the belief network. Shafer so eloquently describes this
process by supposing that two friends, both of whom he subjectively believes are 90\% reliable, tell him that a limb has fallen on his car
\footnote{\url{http://glennshafer.com/assets/downloads/articles/article48.pdf}}. Without observing Shafer's car we can calculate that there is only a 1\% chance that
both friends are unreliable, so there's a high likelihood that the statement is true.
However, if both friends are unreliable, they are not necessarily lying. Thus, there is actually less than 1\% chance that a limb fell on the car. The exact
probability can only be calculated by determining how likely it is that the friends would find it funny to tell Shafer that a limb fell on his car, contrasted with
the odds that such a friend may also be willing to throw limbs at his car so as to maintain their ever-reliable facade. If one also considers the possibility
that Shafer's friends mistakenly believed a limb fell on his car, this uncertainty must also be combined with the evidence for the most accurate picture.
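The two-witness arithmetic can be sketched in a few lines (my own illustration of only the agreeing-evidence case, before the corrections discussed above):

```python
def combined_belief(r1, r2):
    """Belief that at least one of two independent witnesses is reliable.

    In the simple agreeing-witnesses case, this is the (naive) support
    for their shared claim."""
    return 1 - (1 - r1) * (1 - r2)

print(combined_belief(0.9, 0.9))  # ~0.99: only about a 1% chance both are unreliable
```

The full Dempster combination rule also handles conflicting evidence, which this minimal case leaves out.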
\subsubsection{Methodology Considerations}
I have taken 10134023 instances of the last 40 years, during all of which Obama has been alive. Therefore I can say with a high degree of certainty that Obama is
immortal.
An event never occurring in history does not discount its possibility of occurring in the future. Similarly, events that may have been impossible in the past
are not necessarily impossible in the future.
\subsubsection{Bayes Theorem}\label{Bayes Theorem}
Bayes Theorem is a rule for conditional probability that calculates the probability of a cause given an event has occurred. The equation for Bayes Theorem is as
follows:
\[
P(A|E) = \frac{P(A) * P(E|A)}{P(A) * P(E|A) + (1 - P(A)) * P(E|\neg A)}
\]
This formula appears more complex than it is. The denominator, which directly translates to "the probability of A times the probability of event E occurring given A,
plus the probability of not A times the probability of E occurring given not A,"
can be more easily expressed as \(P(E)\), the probability of event E occurring:
\[
P(A|E) = \frac{P(A) * P(E|A)}{P(E)}
\]
Finally, this equation is updated to replace descriptions with technical terms:
\text{Posterior Probability} = \frac{\text{prior} * \text{likelihood}}{\text{Evidence}}
\]
By utilizing vernacular more familiar to everyday life, Bayes Theorem can be translated as:
\[
\text{P(occurrence stems from A)} = \frac{\text{\# of occurrences from A}}{\text{total \# of occurrences}}
\]
To appeal to mental visualization, the sample space can be imagined geometrically as a 1 unit by 1 unit
square\footnote{Concept credit to 3Blue1Brown on Youtube, this video is what finally clarified in my mind what the frankly simple equation behind Bayes Theorem
meant.\\\url{https://www.youtube.com/watch?v=HZGCoVF3YvM}}. The area of this square, 1 unit squared, represents a probability of 1 (or 100\%) and the probability of
any possible outcome fits inside this square. Intuitively, this visualization can also be thought of as a confusion matrix where the squares are drawn proportional
to their representative probabilities.
Consider an example where a cancer test given to 1,000 people has a 95\% accuracy rate. Of those 1,000 people, 10\% have cancer: 95 of them test positive
(true positives) and 5 test negative (false negatives). Of the remaining 900, 45 test positive (false positives) and 855 test negative (true negatives). Such
an example can be expressed visually as:
\vskip 2pt
\begin{center}
\begin{tikzpicture}
\draw[gray, thick, fill=blue!5] (0, 0) rectangle (3, 3);
\node[align=center, text width=3cm] at (1.5, 1.5) {True Positives\\95 patients};
\draw[gray, thick, fill=red!5] (3, 0) rectangle (6, 3);
\node[align=center, text width=3cm] at (4.5, 1.5) {False Positives\\45 patients};
\draw[gray, thick] (0, 3) rectangle (3, 6);
\node[align=center, text width=3cm] at (1.5, 4.5) {False Negatives\\5 patients};
\draw[gray, thick] (3, 3) rectangle (6, 6);
\node[align=center, text width=3cm] at (4.5, 4.5) {True Negatives\\855 patients};
\node[label, align=center, text width=3cm] at (1.5, 6.75) {Cancer\\ (100 patients)};
\node[label, align=center, text width=3cm] at (4.5, 6.75) {No Cancer\\ (900 patients)};
\node[label, rotate=90] at (-0.5, 1.5) {Positive};
\node[label, rotate=90] at (-0.5, 4.5) {Negative};
\end{tikzpicture}
\end{center}
Notice that the test does make the correct identification 95\% of the time (and in this example, 95\% regardless of actual value) but that there are almost half as
many false positives as there are true positives, meaning a positive test is not representative of a 95\% chance of having cancer.
Proportionally scaling the probability matrix squares to create the sample space square defined earlier, we can see that the TP box appears to be approximately
twice the size of the FP box. Logically, then, if we choose a random positive test, there's a two-thirds chance of the selected patient being from the true positive
category:
\vfil % Added to keep the footer down since a new page is entering on the next tikz picture
\begin{center}
\begin{tikzpicture}
\draw[gray, thick] (0,0) rectangle (6, 6);
\draw[gray, thin] (6/10, 0) -- (6/10, 6);
\draw[gray, thin, fill=blue!5] (0, 0) rectangle (6/10, 6*.95);
\draw[gray, thin, fill=red!5] (6/10, 0) rectangle (6, 6*.05);
\node[label=below:95/1000] at (-1, 2.5) {TP};
\draw[->] (-0.6, 2.5) -- (0.25, 2.5);
\node[label=below:45/1000] at (4,-2/3) {FP};
\draw[->] (4, -1/3) -- (4, .15);
\node[label=below:5/1000] at (-1, 5.85) {FN};
\node[label=below:855/1000] at (3.5, 3.5) {TN};
\draw[->] (-0.6, 5.85) -- (0.25, 5.85);
\end{tikzpicture}
\end{center}
\vskip 2pt
Bayes Theorem as applied to this problem can be simply expressed as:
\[
P(\text{has cancer given positive test}) = \frac{\colorbox{blue!5}{TP}}{\colorbox{blue!5}{TP} + \colorbox{red!5}{FP}} = \frac{\colorbox{blue!5}{\(\frac{95}{1000}\)}}{\colorbox{blue!5}{\(\frac{95}{1000}\)} + \colorbox{red!5}{\(\frac{45}{1000}\)}} = 67.9\%
\]
Meaning that, given a random positive test, there is a 67.9\% chance of the patient actually having cancer, not far off from the two-thirds visual trick.
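The same arithmetic as a small function (the parameter names are my own labels for the example's numbers):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(cancer | positive) = prior * P(+|cancer) / P(+)."""
    evidence = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / evidence

# 10% prior, 95% accurate test (so a 5% false positive rate):
print(round(posterior(0.10, 0.95, 0.05), 3))  # 0.679, matching 95 / (95 + 45)
```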
\subsubsection{Bayesian Updating}
Bayesian Updating is another term that has been added to buzzword vocabulary to describe a process that isn't directly related to Bayesian Statistics but appears
to have been rediscovered by academia through study of applied Bayes Theorem. In essence, Bayesian Updating simply states that observed occurrences should not
override previous evidence and that it should instead be added to it in equal weight (equal value being a naive assumption). This evidence updating makes
applications of Bayes Theorem calculate posterior probabilities continuously as new information enters the system, rather than a frequentist approach where
the calculation is only performed once.
\subsubsection{Bayesian Belief Networks}
Bayesian Belief Networks are probabilistic graphical models that preserve conditional dependence between random variables. In spite of its name,
Bayesian Belief Networks do not necessarily apply Bayesian models, though they are a way to utilize Bayes Theorem for domains with greater complexity beyond a
single posterior probability. In this type of network, edges are directed and the structure is utilized in a single direction. This is in contrast to undirected
Hidden Markov Models (to be covered in the next unit) that do not assume the order of acquisition of random variables. While it may not be practical to calculate
the full conditional probability of a variable, Bayesian Belief Networks allow us
an earlier random variable.
Following the example in the Bayes Theorem section of this report (\ref{Bayes Theorem}), let's suppose that a patient with a positive test takes a hypothetical
second test. However, the second test's results are partially dependent on the first since they measure overlapping biological markers.
\vskip 5pt
\begin{center}
\begin{tikzpicture}
\hline
\end{tabular}
\end{center}
Note that this probability of positive results in both tests (which both have greater than 50\% of positives being true positives) is only as certain as two
positives from two independent tests, each with 50\% of positives being true. If the dependence were not included in the calculation and we ignored the fact
that the tests partially measure the same thing, as would have occurred in a Naive Bayes model, the tests' combined accuracy would be unjustly inflated.
\newpage
\subsection{Unit 4: Markov Methods}
\subsubsection{Markov Chains}
Markov Chains are a form of probabilistic automaton where the likelihood of transitioning to a new state depends solely on the current state, with no memory of prior
states. For example\footnote{example sourced from:\\\url{https://towardsdatascience.com/introduction-to-markov-chains-50da3645a50d}}, suppose a weather prediction
program wants to know whether tomorrow will be a sunny or cloudy day, based on the current weather. Using the current weather as a state, the program identifies that
there is a 10\% chance of a sunny day transitioning into a cloudy day and a 50\% chance that a cloudy day transitions into a sunny day:
\begin{center}
\begin{tikzpicture}[shorten >=1pt, node distance=3cm, on grid, auto]
\node[state] (Sunny) {Sunny};
\node[state, right=of Sunny] (Cloudy) {Cloudy};
\path[->]
(Sunny) edge [loop left] node {.9} (Sunny)
edge [bend right=-15] node {.1} (Cloudy)
(Cloudy) edge [loop right] node {.5} (Cloudy)
edge [bend left=15] node {.5} (Sunny);
\end{tikzpicture}
\end{center}
Note that there is no information preserved between steps. Markov Chains are memoryless, so any information that must be available to them must be expressed as the
state, such as the sunny and cloudy states in the example above. One benefit of such a straightforward structure is that it enables easy calculation of the
probabilities of reaching a state k steps from the current position. By expressing the chain as a transition matrix where each row represents the current state,
each column represents the next state, and each cell contains the probability of moving from the row state to the column state, we get a 1-step transition matrix:
\[
\begin{pmatrix}
.9 & .1 \\
.5 & .5
\end{pmatrix}
\]
or, expressed as a table:
\begin{center}
\begin{tabular}{ | c | c | c | }
\hline
Current State & Next: Sunny & Next: Cloudy \\
\hline
\hline
Sunny & 90\% & 10\% \\
\hline
Cloudy & 50\% & 50\% \\
\hline
\end{tabular}
\end{center}
To turn this into a k-steps transition matrix, this 1-step matrix only needs to be raised to the k-th power:
\[
\begin{pmatrix}
.9 & .1 \\
.5 & .5
\end{pmatrix}^k
\]
To find the probability of the weather two days from the current state, plug 2 into k:
\[
\begin{pmatrix}
.9 & .1 \\
.5 & .5
\end{pmatrix}^2 =
\begin{pmatrix}
.86 & .14 \\
.7 & .3
\end{pmatrix}
\]
From this matrix we can determine that if it is currently sunny, there is an 86\% chance that it will be sunny in two days and, if it is currently cloudy, there is a
70\% chance that it will be sunny in two days. As k approaches infinity, the model approaches its equilibrium where the starting state becomes irrelevant. In this
example, any random day would be 83.333\% likely to be sunny, representative of the long-term behavior of the system (climate), so the matrix of the equilibrium
looks like this:
\[\begin{pmatrix}
.9 & .1 \\
.5 & .5
\end{pmatrix}^\infty \approx
\begin{pmatrix}
.83333 & .16666 \\
.83333 & .16666
\end{pmatrix}
\text{ OR: }
\begin{pmatrix}
.83333 \\
.16666
\end{pmatrix}
\]
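The matrix powers above are easy to verify in a few lines (a sketch I added; plain nested lists so no libraries are needed):

```python
def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def k_step(p, k):
    """k-step transition matrix: the 1-step matrix raised to the k-th power."""
    out = [[1.0, 0.0], [0.0, 1.0]]  # identity
    for _ in range(k):
        out = matmul2(out, p)
    return out

P = [[0.9, 0.1], [0.5, 0.5]]     # rows: current state, columns: next state
print(k_step(P, 2))              # ~[[0.86, 0.14], [0.70, 0.30]]
print(k_step(P, 50)[0])          # rows approach the ~[0.8333, 0.1667] equilibrium
```

By k = 50 both rows are indistinguishable from the equilibrium, confirming that the starting state becomes irrelevant.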
\subsubsection{Hidden Markov Models}
% TODO: maybe add notes on mixed
\newpage
\subsection{Unit 5: Monte Carlo Simulations}
\subsubsection{How To Make a Monte Carlo Simulation}
\subsubsection{Monte Carlo Integration}
\subsubsection{Markov Chain Monte Carlo (MCMC) methods}
\newpage
\section{Applied Projects}
\rule{14cm}{0.05cm}
\subsection{Randomness of Retinal Mosaic layout}
A hexagonal grid of marbles: are the colors randomly distributed? Key topics: hexagonal basis vectors, the retinal mosaic, and entropy.
\subsection{Bayes Server Ripoff}
I planned to create a trickle-down density belief network using probability density functions as nodes that choose the direction of rows in a relational database.
I later found \url{https://www.bayesserver.com/}, which is somewhat similar.
Even better than their jank Bayesian belief network, I may be able to make mixed Bayesian/Markov chain models. This is a big project.
\subsection{Cost-Benefit Analysis of Asynchronous Education}
This section covers a calculation I devised to make me feel better about my life decisions. The data is based on implicit guesswork and, while I will be taking it
seriously for my decision to do either the online or on-campus RIT Data Science Masters Program, it should not be taken seriously as a probabilistic model.
Since there is no framework for making a subjective decision weighting the potential benefits of on-campus life with the value of entering the workforce 18 months
sooner, I decided to make one. Inshallah I shall reach my true potential and fulfill destiny.
With archaic knowledge imbued by Dr. Pepper flowing through my veins, I have selected \(y = 3x^2 - 2y\) as the equation for covariance.
\end{document}