Measure theory and probability

Posted on April 2, 2019

Tags: probability

1 Intro
- 1.1 Borel-Sigma algebra and closure
2 What is a measure
3 Terms
- 3.1 Notation
- 3.2 Z-test T-test ANOVA
4 P-value
- 4.1 Example p-value of fair coin
5 z-score
6 CLT
7 Distributions

1 Intro

A σ-algebra on some set \(X\) is a collection of subsets of \(X\), called the measurable subsets of \(X\)
- We call these measurable subsets of \(X\) measurable sets (on the real line, intervals are the typical example).
We can have different perspectives, aka different σ-algebras, for the same set \(X\)

\[ X = \{a,b\} \qquad \mathcal{A} \subseteq P(X) \] \[ X = \mathbb{N} = \{1,2,3,4,5..\} \qquad P(X) = \{\emptyset,\{1,2\},\{1,3\},..\}\tag{example}\] \[\mathcal{A}= \{\emptyset , \{1\},\{1,3,5\},\{1,3,5,7\}..\}\]

Instead of having a measure for the entire powerset of natural numbers \(P(X)\),
we choose the set containing measurable sets of odd numbers \(\mathcal{A}\)

A trivial smallest σ-algebra is \(\mathcal{A} = \{\emptyset , X\}\)
A trivial largest σ-algebra is \(\mathcal{A} = P(X)\)

\(\mathcal{A}\) is a σ-algebra aka A :: σ-algebra(X)

Each element of a σ-algebra \(\mathcal{A}\) is a subset of \(X\)
- each of these subsets of \(X\) is called a measurable set
  - ex. \(\emptyset\) is a measurable set, \(X\) is a measurable set, \(\{1\}\) is a measurable set, \(\{1,3,5\}\) is a measurable set
  - A σ-algebra is some set \(\mathcal{A}\) containing the listed measurable sets above
Rules of σ-algebra
1. \(\emptyset, X \in \mathcal{A}\)
  - All σ-algebras must contain the empty set and the base set
2. \(L \in \mathcal{A} \rightarrow X \setminus L \in \mathcal{A}\)
  - Each measurable set \(L\) has a measurable complement
    - if \(X=\{1,2,3\}, L=\{2\}\) then \(X \setminus L=\{1,3\}\) must also be in the σ-algebra
    - to build a σ-algebra, we can just break \(X\) into two subsets then add these measurable sets to \(\mathcal{A}\), along with \(\emptyset\) and \(X\)
3. \(L_i \in \mathcal{A} \text{ for all } i \in \mathbb{N} \rightarrow \bigcup\limits_{i \in \mathbb{N}} L_i \in \mathcal{A}\)
  - If we merge countably many measurable sets \(L_i\) from a σ-algebra, the resulting set is still in the same σ-algebra.
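
The three rules can be checked mechanically when \(X\) is finite. The sketch below is only an illustration: the function name `is_sigma_algebra` is made up for this post, and it relies on the fact that for a finite collection, closure under pairwise unions is enough to get closure under countable unions.

```python
from itertools import combinations

def is_sigma_algebra(X, A):
    """Check the three rules for a finite base set X and a collection A of subsets."""
    X = frozenset(X)
    A = {frozenset(s) for s in A}
    # Rule 1: the empty set and the base set are measurable.
    if frozenset() not in A or X not in A:
        return False
    # Rule 2: the complement of every measurable set is measurable.
    if any(X - L not in A for L in A):
        return False
    # Rule 3: unions stay inside A (pairwise unions suffice because A is finite).
    return all(L1 | L2 in A for L1, L2 in combinations(A, 2))

X = {1, 2, 3}
print(is_sigma_algebra(X, [set(), X]))                # True  (trivial smallest)
print(is_sigma_algebra(X, [set(), {2}, {1, 3}, X]))   # True
print(is_sigma_algebra(X, [set(), {2}, X]))           # False (complement of {2} missing)
```

This does not work for infinite sets such as \(\mathbb{N}\), where the collection cannot be enumerated.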

For a given set we may have multiple perspectives of what can be measured aka we can have multiple σ-algebras \(\mathcal{A}\) on \(X\).

1.1 Borel-Sigma algebra and closure

\[X=\{1,2,3\} \qquad \mathcal{M}=\{\{2\}\}\] \[ \mathcal{M} \cup \{\emptyset, X\} \tag{add empty and base set}\] \[ X \setminus \{2\} = \{1,3\} \quad \sigma (\mathcal{M}) = \mathcal{M}\cup \{\emptyset, X\} \cup \{\{1,3\}\} \tag{add complement of measurable sets}\]
Given a subset \(\mathcal{M}\) of \(P(X)\) which is not a σ-algebra, what are the minimal measurable sets we need to add to turn it into a σ-algebra? The result is \(\sigma (\mathcal{M})\), the σ-algebra generated by \(\mathcal{M}\). When \(\mathcal{M}\) is the collection of open sets of a topology, \(\sigma (\mathcal{M})\) is called the Borel-Sigma algebra.

Let \((X,\mathcal{T})\) be a topological space, e.g. a metric space with its open sets (more importantly \(\mathcal{T}\) is the collection of open sets)
- then \(B(X) := \sigma (\mathcal{T})\) is the Borel-Sigma algebra generated by the open sets
- in general, \(\sigma (\mathcal{M})\) is the σ-algebra closure of \(\mathcal{M}\)
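
For a finite base set, the closure \(\sigma (\mathcal{M})\) can be computed by brute force: keep adding complements and unions until nothing new appears. This is a minimal sketch under that finiteness assumption, and `generate_sigma_algebra` is a name invented here rather than a library function.

```python
def generate_sigma_algebra(X, M):
    """Smallest σ-algebra on the finite set X that contains every set in M."""
    X = frozenset(X)
    # Start from M plus the empty set and the base set.
    sigma = {frozenset(), X} | {frozenset(s) for s in M}
    while True:
        new = {X - L for L in sigma}                       # complements
        new |= {L1 | L2 for L1 in sigma for L2 in sigma}   # pairwise unions
        if new <= sigma:        # nothing new: sigma is closed, so we are done
            return sigma
        sigma |= new

result = generate_sigma_algebra({1, 2, 3}, [{2}])
print(sorted(sorted(s) for s in result))   # [[], [1, 2, 3], [1, 3], [2]]
```

The Borel-Sigma algebra on \(\mathbb{R}\) cannot be enumerated this way; the loop only illustrates what "closure" means.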

2 What is a measure

3 Terms

Probability space is \((\Omega , \mathcal{F} , P)\)
\(X: \Omega \rightarrow \mathbb{R}^n\)
\(\Omega\) is the “Sample Space”, which plays the role of the Statistical Population.
\(\mathcal{F}\) is the “Event Space”, a σ-algebra on \(\Omega\), which plays the role of the Statistical random samples.

Stats

Population is reality.
Sample is a subset of some population.

A Statistical Test is all about asking: do these 2 samples come from the same population?

Statistics always uses a sample to estimate or model the population.
- Aka use a Subset of the Population to estimate the population.

3.1 Notation

\(X \sim N(0,1)\) means random variable \(X\) has the normal distribution with mean 0 and variance 1
- \(X\) is a random variable means we choose a subset aka sample \(X\) from a population.
  - This subset can take on many different RANDOM VARying combinations of values aka “random variable”.
- Wiki: Random Variable is any function that maps from the Sample Space to a Real number.
  - Sample Space is just the possible sample subsets of the population.

Population Mean

\(\mu = E(Y_i)\)

Sample Mean

\(\bar{x} \sim N(\mu,\frac{\sigma^2}{n})\)
How does a sample mean have a distribution?
- The sample mean is a RANDOM VARIABLE, not a constant, since its value will differ depending on the subset of population sampled. This variability allows the sample mean to have a distribution.
  - The meaning of a normally distributed sample mean is
    “the sample mean has some probability of falling within some interval, and that probability follows a normal distribution”
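
A small simulation makes this concrete. The numbers below (`mu`, `sigma`, `n`, `num_samples`) are hypothetical values chosen for illustration, and the population is assumed to be normal. The sketch draws many samples, records each sample mean, and compares the spread of those means with \(\frac{\sigma^2}{n}\).

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable

mu, sigma, n = 10.0, 2.0, 25   # hypothetical population mean, std, and sample size
num_samples = 20_000           # how many samples of size n we draw

# Each entry is the mean of one sample of size n drawn from N(mu, sigma^2).
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)
print(mean_of_means)   # close to mu = 10
print(var_of_means)    # close to sigma**2 / n = 0.16
```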

Parameters

\(\mu\) mu is the mean
σ sigma is std (a z-score counts how many sigmas a value lies from the mean)
\(\sigma^2\) sigma squared is variance

3.2 Z-test T-test ANOVA

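z-test uses the standard normal dist as its reference distribution (population variance known, or a large sample).
t-test is similar to z-test but uses the t-distribution, which takes into account degrees of freedom when the variance is estimated from the sample.
ANOVA (analysis of variance) is basically a t-test but with more than 2 populations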

Tails

2-tail test for Alt Hypothesis inequality (≠)
1-tail test for Alt Hypothesis gt or lt (> or <)

Multiple Regression vs Multivariate regression

Multiple regression means more than one independent variable
- Age, Weight, Height as predictors for one dependent variable GPA
Multivariate regression means more than one dependent variable

independent random variable = Subset
Note these Subsets can come from the same or different populations.

4 P-value

4.1 Example p-value of fair coin

p-value for 2 heads in 2 flips: Probability of Event + Probability of Equally Rare Events + Probability of Rarer Events = Prob(HH) + Prob(TT) + 0 = 0.25 + 0.25 + 0 = 0.5

Notice even though Probability of 2 heads is only Prob(HH)=0.25, the p-value is 0.5
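
The same recipe (observed event, plus equally rare events, plus rarer events) can be written out for any number of flips. This is a sketch only: `two_sided_p_value` is a name invented for this post, and it counts outcomes by number of heads, so TT shows up as "0 heads".

```python
import math
from fractions import Fraction

def two_sided_p_value(observed_heads, flips, p_heads=Fraction(1, 2)):
    """Sum the probabilities of every head count that is no more likely than the observed one."""
    def prob(k):
        # Probability of exactly k heads in `flips` independent flips.
        return math.comb(flips, k) * p_heads**k * (1 - p_heads)**(flips - k)

    p_obs = prob(observed_heads)
    return sum(prob(k) for k in range(flips + 1) if prob(k) <= p_obs)

print(two_sided_p_value(2, 2))   # 1/2
```

Using `Fraction` keeps the arithmetic exact, so the "equally rare" comparison is not affected by floating point rounding.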

p-value is almost like inverse surprisal (Shannon information). A high p-value means the data is not surprising if something is “fair” or Equal (Null hypothesis).

5 z-score

z-score is used when you can normalize your dataset to a 0 mean and 1 std

     ____
   /      \ 
 /+|      |+\
/++|      |++\
  -1  0   1     z-score

\[P(X \lt -1)+P (X \gt 1) = \text{two-tailed p-value for a zscore of 1}\]

p-value = 0.3173 aka 31.73% probability or area under the z-distribution curve

Notice how the complement 1-0.3173 = 0.6827 is around 68% which aligns with the 68-95-99.7 rule

     ____
   /      \ 
 /+|        \
/++|         \
  -1  0   1     z-score

\[P(X \lt -1) = \text{one-tailed p-value for a zscore of -1}\]

p-value = 0.1587 aka 15.87% probability or area under the z-distribution curve

     ____
   /++++++\ 
 /++++++++| \
/+++++++++|  \
  -1  0   1     z-score

\[P(X \lt 1) = \text{zscore of 1 one-tailed (left tail)}\]

p-value = 0.8413 which is 84.13% probability or 84.13% of the area under the z-distribution curve
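
The shaded areas can be computed with `statistics.NormalDist` from the Python standard library instead of a lookup table. The variable names below are just labels chosen for this sketch.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # the standard normal (z) distribution

left_tail = z.cdf(-1)           # area to the left of z = -1
two_tailed = 2 * left_tail      # both tails beyond -1 and 1, by symmetry
below_one = z.cdf(1)            # area to the left of z = 1

print(round(two_tailed, 4))       # 0.3173
print(round(left_tail, 4))        # 0.1587
print(round(below_one, 4))        # 0.8413
print(round(1 - two_tailed, 4))   # 0.6827
```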

6 CLT

No matter what type of distribution the population has (as long as its variance is finite), if each sample is large enough, the sampling distribution AKA the distribution of the means of all our samplings is approximately normal.
The sum of multiple independent, identically distributed random variables, once standardized, converges to a normal dist as the # of variables increases.
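
A quick way to see this is to sample from a population that is clearly not normal. The sketch below assumes an Exponential(1) population (mean 1, std 1, strongly skewed). The sample size `n` and the number of samples are arbitrary choices for illustration. If the sample means are roughly normal, about 68% of them should fall within one standard error of the population mean.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is repeatable

n = 50                 # size of each sample
num_samples = 10_000   # how many samples we draw

# Each entry is the mean of one sample of size n from an Exponential(1) population.
means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

std_error = 1 / n ** 0.5   # sigma / sqrt(n), with sigma = 1 for Exponential(1)
within_one = sum(abs(m - 1) <= std_error for m in means) / num_samples
print(within_one)   # roughly 0.68, matching the 68-95-99.7 rule
```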

7 Distributions

Bernoulli = LEM (law of excluded middle) in probability
- VERY SIMPLE
- An event either happens or it doesn't.
  - Bernoulli distribution is just a plot of these 2 probabilities that add up to 1.
- Eg. we draw sticks, probability we get red is 0.2, probability of other color is (1-0.2).
Binomial = repeated Bernoulli
- Bernoulli but Sample multiple times
- Eg. we draw sticks multiple times, probability we get red 5 times.
Beta-distribution = Ordering
- if n points are chosen independently and uniformly at random from interval [0,1], the j-th smallest point has beta(j,n-j+1) as its distribution
Hyper-geom = Finite resources
- 1000 chocolate bars, 5 of which hold golden tickets, and 20 chocolate bars are bought. What is prob we get X golden tickets?
Logistic dist = Closely related to Logistic regression
- CDF is the logistic function (sigmoid is an example of a logistic function)
  - Maps the real number line (-inf,inf) to probabilities (0,1)
Lognormal = dist where the log is normally distributed
- The log of the stock price ratio (gross return) is modeled as normally distributed.
- What is the probability we achieve a 0.05% return for today? To answer this, look up the gross return 1.0005 in the lognormal dist, or equivalently log(1.0005) in the normal dist
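
The stick and chocolate bar examples above can be turned into numbers with the standard binomial and hypergeometric formulas. This is a sketch: the function names are made up for this post, and the "10 draws" in the binomial example is an assumed value, since the note only says "multiple times".

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, N, K, n):
    """P(exactly k special items when drawing n of N items without replacement, K of them special)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Sticks: probability of red is 0.2; chance of exactly 5 reds in 10 draws (10 is assumed).
print(binomial_pmf(5, 10, 0.2))   # about 0.026

# Chocolate: 1000 bars, 5 golden tickets, 20 bars bought.
for tickets in range(6):
    print(tickets, hypergeom_pmf(tickets, 1000, 5, 20))
```

With `n = 1` the binomial formula reduces to the Bernoulli case: `binomial_pmf(1, 1, 0.2)` gives 0.2.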
Chi-Squared

Jaynes