Measure theory and probability
1 Intro
- A σ-algebra on a set \(X\) is a collection of subsets of \(X\), called the measurable subsets of \(X\)
- We call these subsets of \(X\) measurable sets (on the real line, intervals are the standard example).
- We can have different perspectives, aka different σ-algebras, for the same set \(X\)
\[ X = \{a,b\} \qquad \mathcal{A} \subseteq P(X) \]
\[ X = \mathbb{N} = \{1,2,3,4,5,\dots\} \qquad P(X) = \{\emptyset,\{1,2\},\{1,3\},\dots\}\tag{example}\]
\[\mathcal{A}= \{\emptyset , \{1\},\{1,3,5\},\{1,3,5,7\},\dots\}\]
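- Instead of having a measure for the entire powerset of natural numbers \(P(X)\),
we choose \(\mathcal{A}\), a smaller collection built from sets of odd numbers (to be a σ-algebra it must also contain \(X\), the complements, and the countable unions of the sets listed)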
A trivial smallest σ-algebra is \(\mathcal{A} = \{\emptyset , X\}\)
A trivial largest σ-algebra is \(\mathcal{A} = P(X)\)
\(\mathcal{A}\) is a σ-algebra aka A :: σ-algebra(X)
- each element of a σ-algebra \(\mathcal{A}\) is a subset of \(X\)
- each of these subsets of \(X\) is called a measurable set
- ex. \(\emptyset\) is a measurable set, \(X\) is a measurable set, \(\{1\}\) is a measurable set, \(\{1,3,5\}\) is a measurable set
- A σ-algebra is some set \(\mathcal{A}\) containing the listed measurable sets above
- Rules of σ-algebra
- \(\emptyset \in \mathcal{A}\) and \(X \in \mathcal{A}\), i.e. \(\{\emptyset,X\} \subseteq \mathcal{A}\)
- All σ-algebras must contain the empty set and the base set
- \(L \in \mathcal{A} \rightarrow X \setminus L \in \mathcal{A}\)
- The complement of each measurable set \(L\) is also a measurable set
- if \(X=\{1,2,3\}, L=\{2\}\) then \(X \setminus L=\{1,3\}\) must also be in the σ-algebra
- to build a σ-algebra, we can just break \(X\) into two subsets \(L\) and \(X \setminus L\), then add these measurable sets (together with \(\emptyset\) and \(X\)) to \(\mathcal{A}\)
- for \(L_i \in \mathcal{A}\), \(\bigcup\limits_{i \in \mathbb{N}} L_i \in \mathcal{A}\)
- If we merge countably many measurable sets \(L_i\) from a σ-algebra together, the resulting set is still a measurable set in the same σ-algebra.
For a given set we may have multiple perspectives of what can be measured aka we can have multiple σ-algebras \(\mathcal{A}\) on \(X\).
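The three rules above can be checked mechanically when \(X\) is finite. The following is a minimal sketch, not a general implementation: the function name `is_sigma_algebra` is made up for these notes, and it relies on the fact that on a finite set every countable union reduces to a finite one, so checking pairwise unions is enough.

```python
from itertools import combinations


def is_sigma_algebra(X, family):
    """Check the sigma-algebra rules for a family of subsets of a FINITE set X.

    Hypothetical helper for illustration only.
    """
    X = frozenset(X)
    sets = {frozenset(s) for s in family}

    # Rule 1: the empty set and the base set must be measurable.
    if frozenset() not in sets or X not in sets:
        return False

    # Rule 2: the complement of every measurable set must be measurable.
    if any(X - s not in sets for s in sets):
        return False

    # Rule 3: closed under unions. For a finite family, pairwise closure
    # implies closure under every finite (hence every countable) union.
    return all(a | b in sets for a, b in combinations(sets, 2))


X = {1, 2, 3}
print(is_sigma_algebra(X, [set(), X]))             # trivial smallest σ-algebra
print(is_sigma_algebra(X, [set(), {2}, {1, 3}, X]))  # split X into two pieces
print(is_sigma_algebra(X, [set(), {2}, X]))        # missing the complement {1, 3}
```

The first two calls return `True` and the last returns `False`, since \(\{2\}\) is present without its complement.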
1.1 Borel-Sigma algebra and closure
\[X=\{1,2,3\} \qquad \mathcal{M}=\{\{2\}\}\]
\[ \mathcal{M} \cup \{\emptyset, X\} \tag{add empty and base set}\]
\[ X \setminus \{2\} = \{1,3\} \quad \sigma (\mathcal{M}) = \mathcal{M}\cup \{\emptyset, X\} \cup \{\{1,3\}\} \tag{add complement of measurable sets}\]
Given \(\mathcal{M} \subseteq P(X)\) which is not a σ-algebra, what is the minimal collection of measurable sets we need to add to turn it into a σ-algebra? The result is \(\sigma (\mathcal{M})\), the σ-algebra generated by \(\mathcal{M}\). When \(\mathcal{M}\) is the collection of open sets of a space, \(\sigma (\mathcal{M})\) is called the Borel-Sigma algebra.
- Let \((X,\mathcal{T})\) be a topological space or a metric space (more importantly, \(\mathcal{T}\) is its collection of open sets)
- then \(B(X) := \sigma (\mathcal{T})\) is the Borel-Sigma algebra generated by the open sets
- \(\sigma (\mathcal{M})\) is the σ-algebra closure of \(\mathcal{M}\)
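For a finite \(X\), the closure \(\sigma(\mathcal{M})\) can be computed by repeatedly adding complements and unions until nothing new appears. This is only a sketch under that finiteness assumption; `generate_sigma_algebra` is a name invented here, and the loop is the naive fixed-point iteration rather than anything efficient.

```python
def generate_sigma_algebra(X, generators):
    """Smallest sigma-algebra on a FINITE set X containing every generator.

    Hypothetical helper: naive closure under complement and union.
    """
    X = frozenset(X)
    # Start with the generators plus the empty set and the base set.
    sigma = {frozenset(), X} | {frozenset(g) for g in generators}

    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            # Candidates: the complement of a, and the union of a with each member.
            candidates = [X - a] + [a | b for b in sigma]
            for c in candidates:
                if c not in sigma:
                    sigma.add(c)
                    changed = True
    return sigma


result = generate_sigma_algebra({1, 2, 3}, [{2}])
print(sorted(sorted(s) for s in result))
# [[], [1, 2, 3], [1, 3], [2]]
```

With \(X=\{1,2,3\}\) and \(\mathcal{M}=\{\{2\}\}\) the only set that has to be added beyond \(\emptyset\) and \(X\) is the complement \(\{1,3\}\).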
2 What is a measure
3 Terms
- Probability space is \((\Omega , \mathcal{F} , P)\)
- \(X: \Omega \rightarrow \mathbb{R}^n\)
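- \(\Omega\) is the “Sample Space”, the set of all possible outcomes, which plays the role of the Statistical Population.
- \(\mathcal{F}\) is the “Event Space”, a σ-algebra on \(\Omega\) whose measurable sets (events) play the role of the Statistical random samples.
- \(P\) is the probability measure, which assigns each event in \(\mathcal{F}\) a probability in \([0,1]\).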
Stats
- Population is reality
- Sample is a subset of some population.
A Statistical Test is all about asking: do these 2 samples come from the same population?
- Statistics always uses a sample to estimate or model the population.
- Aka use a Subset of the Population to estimate the population.
3.1 Notation
- \(X \sim N(0,1)\) means random variable \(X\) has the normal distribution with mean 0 and variance 1
- \(X\) is a random variable means we choose a subset aka sample \(X\) from a population.
- This subset can take on many different RANDOM VARying combinations of values aka “random variable”.
- Wiki: a Random Variable is any (measurable) function that maps from the Sample Space to the Real numbers.
- Sample Space is just the set of possible outcomes, here the possible sample subsets of the population.
Population Mean
- \(\mu = E(Y_i)\)
Sample Mean
- \(\bar{x} \sim N(\mu,\frac{\sigma^2}{n})\) (exact when the population is normal, approximate for large \(n\) otherwise)
- How does a sample mean have a distribution?
- The sample mean is a RANDOM VARIABLE, not a constant, since its value will differ depending on the subset of the population sampled. This variability allows the sample mean to have a distribution.
- Saying the sample mean is normally distributed means
“the probability that the sample mean falls within a given interval is given by a normal distribution”
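One way to see that the sample mean has its own distribution is to draw many samples and look at the spread of their means. The sketch below assumes a normal population with \(\mu = 10\), \(\sigma = 2\) and sample size \(n = 25\); these numbers are made up for illustration. The variance of the sample means should land near \(\sigma^2/n = 0.16\).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population parameters and sample size (illustrative values only).
mu, sigma, n = 10.0, 2.0, 25
num_samples = 20_000

# Each entry is the mean of one sample of size n drawn from the population.
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print("mean of sample means:", round(mean_of_means, 3))      # close to mu
print("variance of sample means:", round(var_of_means, 4))   # close to sigma**2 / n
print("sigma^2 / n:", sigma**2 / n)
```

Every sample gives a slightly different mean, and the collection of those means clusters around \(\mu\) with a spread that shrinks as \(n\) grows.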
Parameters
- \(\mu\) mu is the mean
- \(\sigma\) sigma is std (a z-score counts how many sigmas a value lies from the mean)
- \(\sigma^2\) sigma squared is variance
3.2 Z-test T-test ANOVA
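- z-test uses the standard normal dist as its reference; it applies when the population std is known (or the sample is large).
- t-test is similar to z-test but the std is estimated from the sample, so it uses the t-distribution, which takes into account degrees of freedom.
- ANOVA (analysis of variance) is basically a t-test but with more than 2 populations (groups).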
Tails
- 2-tail test when the Alt Hypothesis is an inequality (not equal)
- 1-tail test when the Alt Hypothesis is gt or lt
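The difference between the tails shows up directly in how the p-value of a z-test is computed. This is a minimal sketch of a one-sample z-test with known population std; the function name `z_test` and the `alternative` labels are choices made for these notes, and the input numbers are invented.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """One-sample z-test with known population std (hypothetical helper).

    Returns the z statistic and the p-value for the chosen alternative.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, std 1

    if alternative == "two-sided":      # Alt Hypothesis: mean != mu0
        p = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "less":         # Alt Hypothesis: mean < mu0
        p = std_normal.cdf(z)
    elif alternative == "greater":      # Alt Hypothesis: mean > mu0
        p = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p


# Invented numbers: sample mean 10.4, null mean 10, sigma 2, n 25 gives z close to 1.
for alt in ("two-sided", "less", "greater"):
    z, p = z_test(10.4, 10, 2, 25, alternative=alt)
    print(alt, round(z, 3), round(p, 4))
```

The same z statistic gives three different p-values depending on which tail(s) the Alt Hypothesis points at.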
Multiple Regression vs Multivariate regression
- Multiple regression means more than one independent variable
- Age, Weight, Height as predictors for one dependent variable GPA
- Multivariate means more than one dependent variable
independent random variable = Subset
Note these Subsets can come from the same or different populations.
4 P-value
4.1 Example p-value of fair coin
p-value for 2 heads: Probability of Event + Probability of Equally Rare Events + Probability of Rarer Events = Prob(HH) + Prob(TT) + 0 = 0.25 + 0.25 + 0 = 0.5
Notice that even though the probability of 2 heads is only Prob(HH)=0.25, the p-value is 0.5
p-value is almost like inverse surprisal (Shannon information). A high p-value means the data is not surprising if the coin is “fair”, i.e. under the Equal (Null) hypothesis.
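The "event + equally rare + rarer" recipe can be written out for any number of flips. The sketch below groups outcomes by their number of heads and sums the probabilities of every group that is no more likely than the observed one; `coin_p_value` is a name made up here, and the small tolerance is only there to guard against floating-point ties.

```python
from math import comb


def coin_p_value(num_heads, num_flips, p_heads=0.5):
    """Exact two-sided p-value for seeing num_heads heads in num_flips flips.

    Hypothetical helper: sums the probability of the observed head count and
    of every head count that is equally rare or rarer under the null.
    """
    def prob(k):
        # Binomial probability of exactly k heads.
        return comb(num_flips, k) * p_heads**k * (1 - p_heads) ** (num_flips - k)

    observed = prob(num_heads)
    return sum(
        prob(k)
        for k in range(num_flips + 1)
        if prob(k) <= observed * (1 + 1e-9)  # tolerance for float ties
    )


print(coin_p_value(2, 2))  # 0.5    -> Prob(HH) + Prob(TT)
print(coin_p_value(5, 5))  # 0.0625 -> Prob(HHHHH) + Prob(TTTTT)
```

Two heads in two flips is not surprising for a fair coin, while five heads in five flips starts to be.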
5 z-score
A z-score expresses a value in units of std from the mean, \(z = (x-\mu)/\sigma\); it is used when you can normalize your dataset to a 0 mean and 1 std
[Figure: z-distribution curve with both tails shaded, below z-score -1 and above z-score 1]
\[P(X \lt -1)+P (X \gt 1) = \text{two-tailed p-value for a z-score of 1}\]
p-value = 0.3173 aka 31.73% probability or area under the z-distribution curve
Notice how the complement 1-0.3173 = 0.6827 is around 68%, which aligns with the 68-95-99.7 rule
[Figure: z-distribution curve with the left tail shaded, below z-score -1]
\[P(X \lt -1) = \text{one-tailed p-value for a z-score of -1}\]
p-value = 0.1587 aka 15.87% probability or area under the z-distribution curve
[Figure: z-distribution curve with everything to the left of z-score 1 shaded]
\[P(X \lt 1) = \text{one-tailed (lower) p-value for a z-score of 1}\]
p-value = 0.8413 which is 84.13% probability or 84.13% of the area under the z-distribution curve
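The three shaded areas can be reproduced with the standard library's `statistics.NormalDist`, whose `cdf` gives the area to the left of a z-score. A small sketch:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, std 1

two_tailed = z.cdf(-1) + (1 - z.cdf(1))  # P(X < -1) + P(X > 1)
left_tail = z.cdf(-1)                    # P(X < -1)
below_one = z.cdf(1)                     # P(X < 1), area left of z-score 1

print(round(two_tailed, 4), round(left_tail, 4), round(below_one, 4))
# 0.3173 0.1587 0.8413
```

The left tail and the area below 1 add up to 1 because the curve is symmetric: \(P(X \lt -1) = P(X \gt 1) = 1 - P(X \lt 1)\).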
6 CLT
- no matter what type of distribution the population distribution is (as long as its variance is finite), if each sample is large enough, the sampling distribution AKA the distribution of the means of all our samplings is approximately normal.
- The standardized sum of multiple independent, identically distributed random variables converges to a normal dist as the # of variables increases.
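A quick simulation makes the CLT concrete. The sketch below assumes an exponential population (rate 1, so mean 1 and std 1), which is strongly skewed and clearly not normal; the sample size of 50 and the 10,000 repetitions are arbitrary choices. If the sample means are roughly normal, about 68% of them should fall within one std of their center.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 50                # size of each sample (assumed)
num_samples = 10_000  # number of repeated samplings (assumed)

# Exponential population with rate 1: mean 1, std 1, heavily skewed.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = statistics.fmean(sample_means)
spread = statistics.pstdev(sample_means)

# Fraction of sample means within one std of the center;
# a normal distribution would give about 0.68.
within_one = sum(abs(m - center) <= spread for m in sample_means) / num_samples

print("center:", round(center, 3))          # close to the population mean 1
print("spread:", round(spread, 3))          # close to 1 / sqrt(50)
print("within one std:", round(within_one, 3))
```

The population itself is nowhere near bell-shaped, yet the means of the samples already follow the 68% rule closely at \(n = 50\).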
7 Distributions
- Bernoulli = LEM in probability
- VERY SIMPLE
- An event either happens or it doesn't.
- Bernoulli distribution is just a plot of these 2 probabilities, which add up to 1.
- Eg. we draw sticks, probability we get red is 0.2, probability of other color is (1-0.2).
- Binomial = repeated Bernoulli
- Bernoulli but Sample multiple times
- Eg. we draw sticks multiple times, probability we get red 5 times.
- Beta-distribution = Ordering
- if n points are chosen independently and uniformly at random from the interval [0,1], the j-th smallest point has the beta(j,n-j+1) dist
- Hyper-geom = Finite resources
- 1000 chocolate bars contain 5 golden tickets, and 20 chocolate bars are bought. What is the prob we get X golden tickets?
- Logistic dist = Closely related to Logistic regression
- CDF is the logistic function (sigmoid is an example of a logistic function)
- Maps the real number line (-inf,inf) to probabilities (0,1)
- Lognormal = dist where the log is normally distributed
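- The log of stock returns is often modeled as normally distributed.
- Given this, what is the probability we achieve a 0.05% return for today? To answer this, look up the log of the gross return, log(1.0005), in the normal dist of log returns (equivalently, look up 1.0005 in the lognormal dist)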
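The Bernoulli, Binomial and Hyper-geom examples above can be computed directly from counting formulas. This is a sketch using only `math.comb`; the function names are made up for these notes, and the number of stick draws (10) is an assumption since the notes do not say how many times we draw.

```python
from math import comb


def bernoulli_pmf(k, p):
    """P(event happens) = p for k == 1, P(it doesn't) = 1 - p for k == 0."""
    return p if k == 1 else 1 - p


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k, population, successes, draws):
    """Probability of k successes when drawing without replacement."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


# Sticks: probability of red is 0.2 on a single draw.
print(bernoulli_pmf(1, 0.2), bernoulli_pmf(0, 0.2))

# Sticks drawn 10 times (assumed count): probability of exactly 5 reds.
print(binomial_pmf(5, 10, 0.2))

# Chocolate: 1000 bars, 5 golden tickets, 20 bars bought.
for tickets in range(6):
    print(tickets, hypergeom_pmf(tickets, 1000, 5, 20))
```

The Hyper-geom case differs from the Binomial one because each bar bought changes what is left, so the draws are not independent.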
- Chi-Squared
Jaynes