Quant Basic Statistics

Posted on July 1, 2015
Tags: Economics

1 Core Statistics

1.1 Mean

  • Means are “Point Estimates” meaning all the data is collapsed into one point
  • Point Estimates Can Be Deceiving since you may lose important information

1.1.1 Arithmetic mean

obvious

1.1.2 Geometric mean

The geometric mean is preferable for understanding returns, because it accounts for compounding across periods.

  • Example: a stock that stays the same for one year, quadruples the next year, then doubles the year after
  • {100, 100, 400, 800}
    • Remember: 4 datapoints =implies=> 3 ratios aka 3 returns
    • The geometric mean of the growth ratios is 2.0, i.e. 200% of the previous value
    • This means that starting at 100, if we take 200% of our previous value each year, we reach 800
    • 100 =200%=> 200 =200%=> 400 =200%=> 800
    • \(G = \sqrt[3]{(\frac{100}{100}) (\frac{400}{100})(\frac{800}{400})} = 2.0\)
      • this is the geometric mean of the growth ratios
    • \(R_G = \sqrt[3]{(1 + 0\%) (1 + 300\%)(1 + 100\%)} - 1 = 1.0\)
      • this is in terms of returns, meaning 100% return each year (200% of the previous value is the same as a 100% return)
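
The example above can be checked numerically; a minimal NumPy sketch using the {100, 100, 400, 800} dataset from the bullets:

```python
import numpy as np

prices = np.array([100.0, 100.0, 400.0, 800.0])
ratios = prices[1:] / prices[:-1]        # 4 datapoints -> 3 growth ratios
G = ratios.prod() ** (1 / len(ratios))   # geometric mean of the ratios
R_G = G - 1                              # geometric mean return

print(G)    # ~2.0: on average, 200% of the previous value each year
print(R_G)  # ~1.0: i.e. 100% return per year
```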

1.1.3 Harmonic mean

\(H = \frac{n}{\sum_{i=1}^n \frac{1}{X_i}}\)

\(Reciprocal(HarmonicMean[Dataset]) = ArithmeticMean(Reciprocal([Dataset]))\)

If the data is (100, 200, 300),
the reciprocals are (1/100, 1/200, 1/300); their arithmetic mean is 0.006111.
The harmonic mean is 1/0.006111 ≈ 163.64.

  • The harmonic mean is appropriate if the data values are ratios of two variables with different measures, called rates
  • The harmonic mean can be used when the data can be naturally phrased in terms of ratios. For instance, in the dollar-cost averaging strategy, a fixed amount is spent on shares of a stock at regular intervals. The higher the price of the stock, then, the fewer shares an investor following this strategy buys.
    • The average price per share the investor pays is the harmonic mean of the prices.
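
A small NumPy sketch of the dollar-cost-averaging claim, using the (100, 200, 300) prices from above and an assumed (hypothetical) fixed budget of $600 per period:

```python
import numpy as np

prices = np.array([100.0, 200.0, 300.0])
H = len(prices) / np.sum(1.0 / prices)   # harmonic mean, ~163.64

# Dollar-cost averaging: spend a fixed $600 at each price.
budget = 600.0
shares = budget / prices                 # shares bought each period: 6, 3, 2
avg_price = (budget * len(prices)) / shares.sum()   # total spent / total shares

print(H, avg_price)   # the average price paid equals the harmonic mean
```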

2 Variance (Measures of dispersion)

2.1 Range

  • Range = Max - Min

2.2 Mean Absolute Deviation (MAD)

\(MAD = \frac{\sum_{i=1}^n |X_i - \mu|}{n}\)

  • MAD is the average absolute difference of each datapoint from the mean

2.3 Variance

\(\sigma^2 = \frac{\sum_{i=1}^n (X_i - \mu)^2}{n}\)

  • Variance (\(\sigma^2\)) is the average squared difference of each datapoint from the mean
  • Variance is often preferred over MAD because the squared function is smooth and differentiable, while the absolute-value function is not (at zero)

2.4 Std dev

  • Std dev (\(\sigma\)) is the square root of the variance
  • It is easier to interpret since it is in the same units as the datapoints
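
The three dispersion measures can be computed side by side; a minimal NumPy sketch on a toy dataset:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mu = data.mean()                     # 3.0

mad = np.abs(data - mu).mean()       # mean absolute deviation: 1.2
var = ((data - mu) ** 2).mean()      # population variance: 2.0
std = np.sqrt(var)                   # std dev, in the same units as the data

print(mad, var, std)
```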

2.4.1 Chebyshev’s inequality

  • Chebyshev’s inequality is useful for ALL DISTRIBUTIONS
    • This inequality is more general than the normal distribution’s 68-95-99.7 rule
  • Chebyshev’s inequality tells us that the proportion of samples within \(k\) standard deviations of the mean is at least \(1 - 1/k^2\)
    • Within k=1 std dev of the mean: at least (1 - 1/1) = 0% of samples (a trivial bound)
    • Within k=2 std devs of the mean: at least (1 - 1/4) = 75% of samples
    • Within k=3 std devs of the mean: at least (1 - 1/9) ≈ 88.89% of samples
    • Within k=4 std devs of the mean: at least (1 - 1/16) = 93.75% of samples
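
A quick empirical check of the bound on a deliberately non-normal (exponential) sample:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_exponential(100_000)   # deliberately non-normal data
mu, sigma = samples.mean(), samples.std()

for k in [2, 3, 4]:
    within = np.mean(np.abs(samples - mu) < k * sigma)   # empirical proportion
    bound = 1 - 1 / k**2                                 # Chebyshev lower bound
    print(k, within, bound)
    assert within >= bound   # the bound holds for any distribution
```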

2.5 Semivariance and semideviation

  • For stocks, we worry more about downward deviation,
  • Semivariance (\(\sigma^2_{<\mu}\)) and semideviation (\(\sigma_{<\mu}\)) are like variance and std dev but computed only over datapoints less than or equal to the mean (\(\mu\))

(1,2,3,4,5) has a mean of 3. Semivariance only looks at the datapoints (1,2,3) and ignores (4,5). \(\sigma^2_{<\mu}=\frac{(1-3)^2 + (2-3)^2 + (3-3)^2}{3}\)

  • Another related concept is target semivariance or target semideviation, which measures deviation below a chosen target value rather than below the mean.
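
A sketch of the (1,2,3,4,5) example in NumPy, including a hypothetical target of 2 for the target semivariance:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mu = data.mean()                        # 3.0

below = data[data <= mu]                # (1, 2, 3): points at or below the mean
semivar = ((below - mu) ** 2).mean()    # (4 + 1 + 0) / 3
semidev = np.sqrt(semivar)

# Target semivariance: same idea, but below a chosen (hypothetical) target of 2.
target = 2.0
below_target = data[data <= target]     # (1, 2)
target_semivar = ((below_target - target) ** 2).mean()

print(semivar, semidev, target_semivar)
```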

3 Skew - Statistical Moments

Positive Skew - long right tail (a few large positive values), Mean > Median > Mode
Negative Skew - long left tail (a few large negative values), Mean < Median < Mode

3.1 Kurtosis

  • kurtosis = 3: all normal distributions, regardless of mean and variance, have kurtosis = 3
  • kurtosis > 3: called a leptokurtic distribution; fatter tails, with more extreme values far from the mean
  • kurtosis < 3: called a platykurtic distribution; thinner tails, with fewer extreme values far from the mean

S&P 500 returns are leptokurtic, meaning they exhibit more extreme moves away from the mean than a normal distribution would predict
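
A sketch comparing the sample kurtosis of normal and Laplace (fat-tailed) draws, using `scipy.stats.kurtosis` with `fisher=False` so the normal benchmark is 3 rather than 0:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal_draws = rng.standard_normal(100_000)
laplace_draws = rng.laplace(size=100_000)   # fat-tailed (leptokurtic) distribution

# fisher=False reports raw kurtosis, where the normal distribution is 3.
print(kurtosis(normal_draws, fisher=False))   # close to 3
print(kurtosis(laplace_draws, fisher=False))  # close to 6, i.e. leptokurtic
```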

3.2 Normality Testing Using Jarque-Bera

Example: taking the mean of a bi-modal distribution is useless.
Typically we assume the distribution of asset returns is normal, but it isn't, so many statistical tools built on that assumption are actually flawed.

  • The Jarque-Bera test is a common statistical test that compares whether sample data has skewness and kurtosis similar to a normal distribution.
  • The Jarque-Bera test's null hypothesis is that the data came from a normal distribution, so a low p-value is evidence of non-normality. Conversely, a high p-value does not prove normality; with small samples the test can fail to catch a non-normal process.
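
A minimal sketch of the test with `scipy.stats.jarque_bera` on simulated normal and skewed samples:

```python
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(0)
normal_data = rng.standard_normal(1000)
skewed_data = rng.standard_exponential(1000)   # clearly non-normal

res_normal = jarque_bera(normal_data)
res_skewed = jarque_bera(skewed_data)

# A low p-value is evidence against the null hypothesis of normality.
print(res_normal.pvalue)   # typically large: fail to reject normality
print(res_skewed.pvalue)   # tiny: reject normality
```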

4 Linear correlation analysis

4.1 Covariance vs Correlation

  • Correlation is covariance normalized by the standard deviations of the two variables: \(\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}\), which always lies in \([-1, 1]\)
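
The normalization can be verified directly; a NumPy sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 2 * x + rng.standard_normal(500)   # y positively related to x

cov = np.cov(x, y)[0, 1]
corr = cov / (x.std(ddof=1) * y.std(ddof=1))   # normalize by both std devs

# np.corrcoef performs the same normalization internally.
print(corr, np.corrcoef(x, y)[0, 1])
```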

4.2 Rolling window correlation

  • Correlation may vary over time, so it can be useful to compute a rolling-window correlation between assets
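
A sketch with pandas, using hypothetical simulated returns and an assumed 60-day window:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical daily returns for two assets (252 trading days).
returns = pd.DataFrame({
    "asset_a": rng.standard_normal(252) * 0.01,
    "asset_b": rng.standard_normal(252) * 0.01,
})
returns["asset_b"] += 0.5 * returns["asset_a"]   # induce some correlation

# 60-day rolling correlation between the two return series.
rolling_corr = returns["asset_a"].rolling(60).corr(returns["asset_b"])
print(rolling_corr.dropna().describe())
```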

5 Instability of Parameter Estimates

5.1 Sharpe Ratio

\[R = \frac{E[r_a - r_b]}{\sqrt{Var(r_a - r_b)}}\]

  • One statistic often used to describe the performance of assets and portfolios is the Sharpe ratio, which measures the additional return per unit additional risk achieved by a portfolio, relative to a risk-free source of return such as Treasury bills
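
A minimal sketch on hypothetical daily returns, with the common \(\sqrt{252}\) annualization convention:

```python
import numpy as np

rng = np.random.default_rng(0)
asset_returns = rng.normal(0.001, 0.01, 252)   # hypothetical daily returns
riskfree = 0.0001                              # hypothetical daily risk-free rate

excess = asset_returns - riskfree              # r_a - r_b
sharpe = excess.mean() / excess.std(ddof=1)    # per-period Sharpe ratio
annualized = sharpe * np.sqrt(252)             # common annualization convention

print(sharpe, annualized)
```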

5.2 Moving averages

  • You can take a moving average of the Sharpe ratio, the mean, or the standard deviation
  • A moving average of the price together with bands at ± some number of standard deviations (commonly 2) is known as Bollinger Bands
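
A pandas sketch of Bollinger-style bands over a hypothetical price series, assuming a 20-day window and 2-sigma bands:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.standard_normal(252).cumsum())  # hypothetical prices

window, k = 20, 2                       # assumed 20-day window, 2-sigma bands
middle = prices.rolling(window).mean()
sigma = prices.rolling(window).std()
bands = pd.DataFrame({
    "lower": middle - k * sigma,        # lower band
    "mid": middle,                      # moving average
    "upper": middle + k * sigma,        # upper band
})
print(bands.dropna().tail())
```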

6 Random Variables

6.1 Binomial distribution

\[p(x) = P(X = x) = \binom{n}{x}p^x(1-p)^{n-x} = \frac{n!}{(n-x)! \ x!} p^x(1-p)^{n-x}\]

  • Either it happens or it doesn't, with probability \(p\) and \(1-p\)
  • \(\binom{6}{2} = \frac{6 \times 5}{2 \times 1} = 15\)

6.2 Stock movement as a Binomial Distribution

  • Example: assume we know a stock moves UP or DOWN with probability 50% each day. Over 5 days, we can compute the chance of exactly 1, 2, 3, 4, or 5 UP days using a binomial distribution. Of course, the downside is that order doesn't matter: for example, \(\binom{5}{3}\) counts the outcomes with 3 UP days, in any order.
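
The 5-day example can be tabulated with `scipy.stats.binom`:

```python
from scipy.stats import binom

n, p = 5, 0.5                        # 5 trading days, 50% chance of an UP day
for k in range(n + 1):
    print(k, binom.pmf(k, n, p))     # P(exactly k UP days out of 5)

# e.g. P(3 UP days) = C(5,3) * 0.5^3 * 0.5^2 = 10/32 = 0.3125
```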

6.2.1 Binomial Model of Stock Price Movement

  • This is used as one of the foundations for option pricing.
  • In the Binomial Model, it is assumed that for any given time period a stock price can move up or down by a value determined by the up or down probabilities.
  • This makes the stock price a function of a binomial random variable, the magnitudes of the upward and downward movements, and the initial stock price. We can vary these parameters to approximate different stock price distributions.
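
A minimal simulation sketch of this model, with assumed up/down factors of 1.01/0.99 and p = 0.5 (all hypothetical parameters):

```python
import numpy as np

def binomial_price_path(s0, up, down, p, steps, rng):
    """One simulated path: each period the price is multiplied by `up`
    with probability p, otherwise by `down`."""
    moves = np.where(rng.random(steps) < p, up, down)
    return s0 * np.cumprod(moves)

rng = np.random.default_rng(0)
path = binomial_price_path(s0=100.0, up=1.01, down=0.99, p=0.5, steps=252, rng=rng)
print(path[-1])   # final price after 252 periods
```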

6.3 Linearity of Normal Distributions

\[N(\mu_1, \sigma_1^2) + N(\mu_2, \sigma_2^2) = N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)\]

  • The mean and variance of normal distributions are additive: adding two independent normal random variables gives another normal random variable whose mean and variance are the sums of the originals

  • In modern portfolio theory, stock returns are generally assumed to follow a normal distribution. One major characteristic of normal random variables is that a linear combination of two or more of them is another normal random variable. This is useful for computing the mean return and variance of a portfolio of multiple stocks.
  • Up until this point, we have only considered single-variable (univariate) probability distributions. When we want to describe multiple random variables at once, as in the case of observing multiple stocks, we can instead look at a multivariate distribution. A multivariate normal distribution is described entirely by the means of each variable, their variances, and the distinct correlations between each and every pair of variables.
  • This is important when determining characteristics of portfolios, because the variance of the overall portfolio depends on both the variances of its securities and the correlations between them.
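
The additivity of means and variances for independent normals can be checked by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, 1_000_000)   # N(1, 4): std dev 2, variance 4
y = rng.normal(3.0, 1.0, 1_000_000)   # N(3, 1), independent of x

z = x + y
print(z.mean())  # close to 1 + 3 = 4
print(z.var())   # close to 4 + 1 = 5 (variances add only when independent)
```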

7 Linear Regression

\[Y = \alpha + \beta X\]

The linear regression model rests on several assumptions, e.g. that the errors are uncorrelated and have constant variance.

If we confirm that the necessary assumptions of the regression model are satisfied, we can safely use the reported statistics to analyze the fit. For example, the \(R^2\) value tells us the fraction of the total variation of \(Y\) that is explained by the model.

7.1 Deriving Linear Regression using OLS

\[OLS=\sum_{i=1}^n (Y_i - a - bX_i)^2\]

  • The linear regression model is fit by optimization, using Ordinary Least Squares (OLS) as the objective function; under normally distributed errors, minimizing squared error is equivalent to maximizing the likelihood
  • We use \(a\) and \(b\) to represent the potential candidates for \(\alpha\) and \(\beta\). What this objective function means is that for each point on the line of best fit we compare it with the real point and take the square of the difference. This function will decrease as we get better parameter estimates.
  • Regression is a simple case of numerical optimization that has a closed form solution and does not need any optimizer. We just find the results that minimize the objective function.

We will denote the eventual model that results from minimizing our objective function as:

\[\hat{Y} = \hat{\alpha} + \hat{\beta}X\]
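
The closed-form solution (\(\hat{\beta} = Cov(X, Y)/Var(X)\), \(\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}\)) can be sketched directly in NumPy on simulated data with known true parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(200)
Y = 2.0 + 3.0 * X + rng.normal(0, 0.5, 200)   # true alpha = 2, beta = 3

# Closed-form OLS: beta_hat = Cov(X, Y) / Var(X), alpha_hat = mean(Y) - beta_hat * mean(X)
beta_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
alpha_hat = Y.mean() - beta_hat * X.mean()

print(alpha_hat, beta_hat)   # close to the true values 2 and 3
```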

7.2 Standard Error

  • We can also find the standard error of estimate, which measures the standard deviation of the error term \(\epsilon\) by getting the scale parameter of the model returned by the regression and taking its square root. The formula for standard error of estimate is

\[s = \left( \frac{\sum_{i=1}^n \epsilon_i^2}{n-2} \right)^{1/2}\]
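
A sketch computing \(s\) from the residuals of a simple fit, on simulated data with a known error std dev of 0.5 (`np.polyfit` used for the fit):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(200)
Y = 2.0 + 3.0 * X + rng.normal(0, 0.5, 200)   # errors have true std dev 0.5

beta_hat, alpha_hat = np.polyfit(X, Y, deg=1)  # returns (slope, intercept)
residuals = Y - (alpha_hat + beta_hat * X)

# Standard error of estimate: sqrt of the residual sum of squares over (n - 2).
s = np.sqrt((residuals ** 2).sum() / (len(X) - 2))
print(s)   # close to the true error std dev of 0.5
```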

8 Maximum Likelihood Estimation (MLE)

9 Regression Model Instability

9.1 Regime changes

  • In this case our regression model will not be predictive of future datapoints, since the underlying system is no longer the same as in the sample. In fact, regression analysis assumes that the errors are uncorrelated and have constant variance, which is often not the case when there is a regime change.

9.2 Multicollinearity