diff --git a/report/report.pdf b/report/report.pdf index a8fc1ae..ae18a61 100644 Binary files a/report/report.pdf and b/report/report.pdf differ diff --git a/report/report.tex b/report/report.tex index 48dd965..b4a02ce 100644 --- a/report/report.tex +++ b/report/report.tex @@ -3,6 +3,7 @@ \usepackage{hyperref} \usepackage{amsmath} \usepackage{amssymb} +\usepackage{tikz} \usepackage[a4paper, total={6in, 10in}]{geometry} \usepackage{setspace} \setstretch{1.25} @@ -187,8 +188,8 @@ to assume that the sophistication of our tools overrides imperfections in the da In this unit I explored some common fallacies and assumptions held by analysts who may not fully grasp the content that they work with, nor the problems they intend to solve. This required extensive research that I found was best digested in the form of books whose chapters chronicle multiple examples of a given principle. As such, the reading was not confined to just the timeslot designated for this unit. Research started during the months leading up -to the start of the semester\footnote{Only research during the semester was logged in the timesheet} and have continued through the independent study. This structure was particularly helpful to pull me back and gain perspective of what -my goal was when I was knee-deep in feature construction and model formulation. +to the start of the semester\footnote{Only research during the semester was logged in the timesheet.} and has continued through the independent study. This +structure was particularly helpful for pulling me back to regain perspective on my goal when I was knee-deep in feature construction and model formulation. \subsubsection{Moral Hazards and The Bob Rubin Trade} Picking pennies in front of a steamroller. 
@@ -199,8 +200,8 @@ flags for significant events in reality that do not effect the proposed course o The 2009 recession, attributed to the collapse of the housing market bubble, is the most common example of a moral hazard because the displacement of risk from banks who were federally required to give subprime loans to the taxpayer meant that banks could profit from subprime loans but would not be harmed when the inevitable occurred. In popular media, the housing bubble bursting is attributed to the banks where those in the industry passed off the event as something that nobody could -have forseen.\footnote{For instance, in the 2015 movie \textit{The Big Short}, only a few savvy traders who bothered to look into the details find that banks had, -in their ignorance, built the bundled mortgages on an unstable foundation.} In reality, banks only ignored a probablistic eventuality because their models did not +have foreseen\footnote{For instance, in the 2015 movie \textit{The Big Short}, only a few savvy traders who bothered to look into the details find that banks had, +in their ignorance, built the bundled mortgages on an unstable foundation.}. In reality, banks only ignored a probabilistic eventuality because their models did not need to account for such an event. Most emphasize the problems with risk transferrence when creating models. 
For this study's purposes, the important learning is that probablistic models should not @@ -291,9 +292,35 @@ Finally, this equation is updated to replace descriptions with technical terms: \] Even this equation can be misconstrued as a number of arrangements of ratios involving total occurrences from a category or non-occurrences from outside -of the category so as a final demonstration, the sample space will be visualized geometrically -\footnote{Concept credit to 3Blue1Brown on Youtube, this video is what finally clarified in my mind what the equation behind Bayes Theorem meant.\\ -\url{https://www.youtube.com/watch?v=HZGCoVF3YvM}} as a 1 unit by 1 unit square. +of the category, so as a final demonstration, the sample space can be visualized geometrically as a 1 unit by 1 unit +square\footnote{Concept credit to 3Blue1Brown on YouTube; this video is what finally clarified in my mind what the frankly simple equation behind Bayes' Theorem +meant.\\\url{https://www.youtube.com/watch?v=HZGCoVF3YvM}}. The area of this square, 1 unit squared, is equivalent to a probability of 1 (or 100\%). +In this picture, a vertical line separates the proportion of observations in the category (the assumed-true event) from the observations not in that category. +A horizontal line drawn in each part marks the probability of an occurrence within that part. + +Consider an example where a cancer test given to 1,000 people is 95\% accurate for patients both with and without cancer. Of those 1,000 people, 10\% (100 people) have cancer, 95 of whom test positive +(true positives) and 5 of whom test negative (false negatives). Of the remaining 900, 45 test positive (false positives) and 855 test negative (true negatives). 
Such +an example can be expressed visually as: +\vskip 2pt +\begin{center} + \begin{tikzpicture} + \draw[gray, thick] (0,0) rectangle (3,3); + \draw[gray, thin] (3/10, 0) -- (3/10, 3); + \draw[gray, thin] (0, 0) rectangle (3/10, 3*.95); + \node[label=below:95/1000] at (-1,1) {TP}; + \draw[->] (-.6, 1) -- (.15, 1); + \node[label=below:45/1000] at (1.5,-2/3) {FP}; + \draw[->] (1.5, -1/3) -- (1.5, .05); + \draw[gray, thin] (3/10, 0) rectangle (3, 3*.05); + \end{tikzpicture} +\end{center} +\vskip 2pt +Using this visual, where TP represents true positives and FP represents false positives, Bayes' Theorem is simply expressed as: +\[ +P(A|E) = \frac{TP}{TP + FP} = \frac{\frac{95}{1000}}{\frac{95}{1000} + \frac{45}{1000}} \approx 67.9\% +\] +This means that, given a random positive test, there is a 67.9\% chance that the patient actually has cancer. This percentage visually tracks with the graphic, as +the TP box appears to be approximately twice the size of the FP box, giving a roughly two-thirds chance of the patient being a true positive. 
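+
+The same figure can be reached from the general form of the theorem. As a sketch, assume $A$ denotes the patient having cancer, $E$ denotes a positive test, and $\neg A$ (notation introduced here for illustration) denotes the patient not having cancer. Each box's area is then its width (the share of the population) multiplied by its height (the rate of positive tests within that share):
+\[
+P(A|E) = \frac{P(E|A)P(A)}{P(E|A)P(A) + P(E|\neg A)P(\neg A)} = \frac{0.95 \times 0.10}{0.95 \times 0.10 + 0.05 \times 0.90} = \frac{0.095}{0.140} \approx 67.9\%
+\]
+Here $0.95 \times 0.10$ is the TP area and $0.05 \times 0.90$ is the FP area, matching the counts of 95 and 45 out of 1,000 in the example above.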
\subsubsection{Bayesian Updating} diff --git a/timesheet/timesheet.csv b/timesheet/timesheet.csv index d709162..c48e3a2 100644 --- a/timesheet/timesheet.csv +++ b/timesheet/timesheet.csv @@ -22,4 +22,5 @@ Week,Date,Type,Duration (Hours),Description 7,10/11,Advising Meetings,1,"Epistemology and Overview discussion, hex mapping" 8,10/15,Research,3,"Bayes Belief Networks" 8,10/16,Application,2.5,"Bayes visualizations and practice worksheets" -8,10/16,Reporting,2,"Early Bayesian Statistics Report" \ No newline at end of file +8,10/16,Reporting,2,"Early Bayesian Statistics Report" +8,10/17,Application,2,"Bayes Geometric Visualization" \ No newline at end of file diff --git a/timesheet/timesheet.pdf b/timesheet/timesheet.pdf index 168777c..2cdc13f 100644 Binary files a/timesheet/timesheet.pdf and b/timesheet/timesheet.pdf differ diff --git a/timesheet/timesheet.tex b/timesheet/timesheet.tex index f40f11a..99fb68d 100644 --- a/timesheet/timesheet.tex +++ b/timesheet/timesheet.tex @@ -80,13 +80,15 @@ Week & Date & Type & Duration (Hours) & Description \\ \hline 8 & 10/16 & Reporting & 2 & Early Bayesian Statistics Report \\ \hline +8 & 10/17 & Application & 2 & Bayes Geometric Visualization \\ +\hline \end{tabular} \end{table} \noindent Hours for Advising Meetings: 6.0\\ -Hours for Application: 4.0\\ +Hours for Application: 6.0\\ Hours for Reporting: 16.0\\ Hours for Research: 28.5\\ -\textbf{Total Hours: 54.5}\\ +\textbf{Total Hours: 56.5}\\ % CLOSE Timesheet \end{document} \ No newline at end of file