Quant Basic Statistics

Posted on July 1, 2015
Tags: Economics

1 Core Statistics

1.1 Mean

  • Means are “Point Estimates” meaning all the data is collapsed into one point
  • Point Estimates Can Be Deceiving since you may lose important information

1.1.1 Arithmetic mean

obvious

1.1.2 Geometric mean

The geometric mean is preferable for understanding returns, because it accounts for compounding across periods.

  • Example: a stock that stays the same for one year, quadruples the next year, then doubles the year after
  • {100, 100, 400, 800}
    • Remember: 4 datapoints =implies=> 3 ratios aka 3 returns
    • The geometric mean of the growth ratios is 2.0, i.e. 200% of the previous value
    • This means that starting at 100, if we take 200% of our previous value each year, we reach 800
    • 100 =200%=> 200 =200%=> 400 =200%=> 800
    • \(G = \sqrt[3]{(\frac{100}{100}) (\frac{400}{100})(\frac{800}{400})} = 2.0\)
      • this is the geometric mean of the growth ratios
    • \(R_G = \sqrt[3]{(1 + 0\%) (1 + 300\%)(1 + 100\%)} - 1 = 1.0\)
      • this is in terms of returns, meaning 100% return each year (200% of the previous value is the same as a 100% return)
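
The example above can be checked numerically; a minimal NumPy sketch using the {100, 100, 400, 800} dataset from the bullets:

```python
import numpy as np

prices = np.array([100.0, 100.0, 400.0, 800.0])
ratios = prices[1:] / prices[:-1]        # 4 datapoints -> 3 growth ratios
G = ratios.prod() ** (1 / len(ratios))   # geometric mean of the ratios
R_G = G - 1                              # geometric mean return

print(G)    # ~2.0: on average, 200% of the previous value each year
print(R_G)  # ~1.0: i.e. 100% return per year
```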

1.1.3 Harmonic mean

\(H = \frac{n}{\sum_{i=1}^n \frac{1}{X_i}}\)

\(Reciprocal(HarmonicMean[Dataset]) = ArithmeticMean(Reciprocal([Dataset]))\)

If the data is (100, 200, 300),
the reciprocals are (1/100, 1/200, 1/300); their arithmetic mean is 0.006111.
The harmonic mean is 1/0.006111 ≈ 163.64.

  • The harmonic mean is appropriate if the data values are ratios of two variables with different measures, called rates
  • The harmonic mean can be used when the data can be naturally phrased in terms of ratios. For instance, in the dollar-cost averaging strategy, a fixed amount is spent on shares of a stock at regular intervals. The higher the price of the stock, then, the fewer shares an investor following this strategy buys.
    • The average price per share the investor pays is the harmonic mean of the prices.
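
A small NumPy sketch of the dollar-cost-averaging claim, using the (100, 200, 300) prices from above and an assumed (hypothetical) fixed budget of $600 per period:

```python
import numpy as np

prices = np.array([100.0, 200.0, 300.0])
H = len(prices) / np.sum(1.0 / prices)   # harmonic mean, ~163.64

# Dollar-cost averaging: spend a fixed $600 at each price.
budget = 600.0
shares = budget / prices                 # shares bought each period: 6, 3, 2
avg_price = (budget * len(prices)) / shares.sum()   # total spent / total shares

print(H, avg_price)   # the average price paid equals the harmonic mean
```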

2 Variance (Measures of dispersion)

2.1 Range

  • Range = Max - Min

2.2 Mean Absolute Deviation (MAD)

\(MAD = \frac{\sum_{i=1}^n |X_i - \mu|}{n}\)

  • MAD is the average absolute difference of each datapoint from the mean

2.3 Variance

\(\sigma^2 = \frac{\sum_{i=1}^n (X_i - \mu)^2}{n}\)

  • Variance (\(\sigma^2\)) is the average squared difference of each datapoint from the mean
  • Variance is often preferred over MAD because the squared function is smooth and differentiable, while the absolute-value function is not (at zero)

2.4 Std dev

  • Std dev (\(\sigma\)) is the square root of the variance
  • It is easier to interpret since it is in the same units as the datapoints
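
The three dispersion measures can be computed side by side; a minimal NumPy sketch on a toy dataset:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mu = data.mean()                     # 3.0

mad = np.abs(data - mu).mean()       # mean absolute deviation: 1.2
var = ((data - mu) ** 2).mean()      # population variance: 2.0
std = np.sqrt(var)                   # std dev, in the same units as the data

print(mad, var, std)
```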

2.4.1 Chebyshev’s inequality

  • Chebyshev’s inequality is useful for ALL DISTRIBUTIONS
    • This inequality is more general than the normal distribution’s 68-95-99.7 rule
  • Chebyshev’s inequality tells us that the proportion of samples within \(k\) standard deviations of the mean is at least \(1 - 1/k^2\)
    • Within k=1 std dev of the mean: at least (1 - 1/1) = 0% of samples (a trivial bound)
    • Within k=2 std devs of the mean: at least (1 - 1/4) = 75% of samples
    • Within k=3 std devs of the mean: at least (1 - 1/9) ≈ 88.89% of samples
    • Within k=4 std devs of the mean: at least (1 - 1/16) = 93.75% of samples
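
A quick empirical check of the bound on a deliberately non-normal (exponential) sample:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_exponential(100_000)   # deliberately non-normal data
mu, sigma = samples.mean(), samples.std()

for k in [2, 3, 4]:
    within = np.mean(np.abs(samples - mu) < k * sigma)   # empirical proportion
    bound = 1 - 1 / k**2                                 # Chebyshev lower bound
    print(k, within, bound)
    assert within >= bound   # the bound holds for any distribution
```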

2.5 Semivariance and semideviation

  • For stocks, we worry more about downward deviation,
  • Semivariance (\(\sigma^2_{<\mu}\)) and semideviation (\(\sigma_{<\mu}\)) are like variance and std dev but computed only over datapoints less than or equal to the mean (\(\mu\))

(1,2,3,4,5) has a mean of 3. Semivariance only looks at the datapoints (1,2,3) and ignores (4,5). \(\sigma^2_{<\mu}=\frac{(1-3)^2 + (2-3)^2 + (3-3)^2}{3}\)

  • Another related concept is target semivariance or target semideviation, which measures deviation below a chosen target value rather than below the mean.
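
A sketch of the (1,2,3,4,5) example in NumPy, including a hypothetical target of 2 for the target semivariance:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mu = data.mean()                        # 3.0

below = data[data <= mu]                # (1, 2, 3): points at or below the mean
semivar = ((below - mu) ** 2).mean()    # (4 + 1 + 0) / 3
semidev = np.sqrt(semivar)

# Target semivariance: same idea, but below a chosen (hypothetical) target of 2.
target = 2.0
below_target = data[data <= target]     # (1, 2)
target_semivar = ((below_target - target) ** 2).mean()

print(semivar, semidev, target_semivar)
```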

3 Skew - Statistical Moments

Positive Skew - long right tail (a few large positive values), Mean > Median > Mode
Negative Skew - long left tail (a few large negative values), Mean < Median < Mode

3.1 Kurtosis

  • kurtosis = 3: all normal distributions, regardless of mean and variance, have kurtosis = 3
  • kurtosis > 3: called a leptokurtic distribution; fatter tails, with more extreme values far from the mean
  • kurtosis < 3: called a platykurtic distribution; thinner tails, with fewer extreme values far from the mean

S&P 500 returns are leptokurtic, meaning they exhibit more extreme moves away from the mean than a normal distribution would predict
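
A sketch comparing the sample kurtosis of normal and Laplace (fat-tailed) draws, using `scipy.stats.kurtosis` with `fisher=False` so the normal benchmark is 3 rather than 0:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal_draws = rng.standard_normal(100_000)
laplace_draws = rng.laplace(size=100_000)   # fat-tailed (leptokurtic) distribution

# fisher=False reports raw kurtosis, where the normal distribution is 3.
print(kurtosis(normal_draws, fisher=False))   # close to 3
print(kurtosis(laplace_draws, fisher=False))  # close to 6, i.e. leptokurtic
```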

3.2 Normality Testing Using Jarque-Bera

Example: taking the mean of a bi-modal distribution is useless.
Typically we assume the distribution of asset returns is normal, but it isn't, so many statistical tools built on that assumption are actually flawed.

  • The Jarque-Bera test is a common statistical test that compares whether sample data has skewness and kurtosis similar to a normal distribution.
  • The Jarque-Bera test's null hypothesis is that the data came from a normal distribution, so a low p-value is evidence of non-normality. Conversely, a high p-value does not prove normality; with small samples the test can fail to catch a non-normal process.
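
A minimal sketch of the test with `scipy.stats.jarque_bera` on simulated normal and skewed samples:

```python
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(0)
normal_data = rng.standard_normal(1000)
skewed_data = rng.standard_exponential(1000)   # clearly non-normal

res_normal = jarque_bera(normal_data)
res_skewed = jarque_bera(skewed_data)

# A low p-value is evidence against the null hypothesis of normality.
print(res_normal.pvalue)   # typically large: fail to reject normality
print(res_skewed.pvalue)   # tiny: reject normality
```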

4 Linear correlation analysis

4.1 Covariance vs Correlation

  • Correlation is covariance normalized by the standard deviations of the two variables: \(\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}\), which always lies in \([-1, 1]\)
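
The normalization can be verified directly; a NumPy sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 2 * x + rng.standard_normal(500)   # y positively related to x

cov = np.cov(x, y)[0, 1]
corr = cov / (x.std(ddof=1) * y.std(ddof=1))   # normalize by both std devs

# np.corrcoef performs the same normalization internally.
print(corr, np.corrcoef(x, y)[0, 1])
```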

4.2 Rolling window correlation

  • Correlation may vary over time, so it can be useful to compute a rolling-window correlation between assets
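
A sketch with pandas, using hypothetical simulated returns and an assumed 60-day window:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical daily returns for two assets (252 trading days).
returns = pd.DataFrame({
    "asset_a": rng.standard_normal(252) * 0.01,
    "asset_b": rng.standard_normal(252) * 0.01,
})
returns["asset_b"] += 0.5 * returns["asset_a"]   # induce some correlation

# 60-day rolling correlation between the two return series.
rolling_corr = returns["asset_a"].rolling(60).corr(returns["asset_b"])
print(rolling_corr.dropna().describe())
```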

5 Instability of Parameter Estimates

5.1 Sharpe Ratio

\[R = \frac{E[r_a - r_b]}{\sqrt{Var(r_a - r_b)}}\]

  • One statistic often used to describe the performance of assets and portfolios is the Sharpe ratio, which measures the additional return per unit additional risk achieved by a portfolio, relative to a risk-free source of return such as Treasury bills
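
A minimal sketch on hypothetical daily returns, with the common \(\sqrt{252}\) annualization convention:

```python
import numpy as np

rng = np.random.default_rng(0)
asset_returns = rng.normal(0.001, 0.01, 252)   # hypothetical daily returns
riskfree = 0.0001                              # hypothetical daily risk-free rate

excess = asset_returns - riskfree              # r_a - r_b
sharpe = excess.mean() / excess.std(ddof=1)    # per-period Sharpe ratio
annualized = sharpe * np.sqrt(252)             # common annualization convention

print(sharpe, annualized)
```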

5.2 Moving averages

  • You can take a moving average of the Sharpe ratio, the mean, or the standard deviation
  • A moving average of the price together with bands at ± some number of standard deviations (commonly 2) is known as Bollinger Bands
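
A pandas sketch of Bollinger-style bands over a hypothetical price series, assuming a 20-day window and 2-sigma bands:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.standard_normal(252).cumsum())  # hypothetical prices

window, k = 20, 2                       # assumed 20-day window, 2-sigma bands
middle = prices.rolling(window).mean()
sigma = prices.rolling(window).std()
bands = pd.DataFrame({
    "lower": middle - k * sigma,        # lower band
    "mid": middle,                      # moving average
    "upper": middle + k * sigma,        # upper band
})
print(bands.dropna().tail())
```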

6 Random Variables

6.1 Binomial distribution

\[p(x) = P(X = x) = \binom{n}{x}p^x(1-p)^{n-x} = \frac{n!}{(n-x)! \ x!} p^x(1-p)^{n-x}\]

  • Either it happens or it doesn't, with probability \(p\) and \(1-p\)
  • \(\binom{6}{2} = \frac{6 \times 5}{2 \times 1} = 15\)

6.2 Stock movement as a Binomial Distribution

  • Example: assume we know a stock moves UP or DOWN with probability 50% each day. Over 5 days, we can compute the chance of exactly 1, 2, 3, 4, or 5 UP days using a binomial distribution. Of course, the downside is that order doesn't matter: for example, \(\binom{5}{3}\) counts the outcomes with 3 UP days, in any order.
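
The 5-day example can be tabulated with `scipy.stats.binom`:

```python
from scipy.stats import binom

n, p = 5, 0.5                        # 5 trading days, 50% chance of an UP day
for k in range(n + 1):
    print(k, binom.pmf(k, n, p))     # P(exactly k UP days out of 5)

# e.g. P(3 UP days) = C(5,3) * 0.5^3 * 0.5^2 = 10/32 = 0.3125
```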

6.2.1 Binomial Model of Stock Price Movement

  • This is used as one of the foundations for option pricing.
  • In the Binomial Model, it is assumed that for any given time period a stock price can move up or down by a value determined by the up or down probabilities.
  • This makes the stock price a function of a binomial random variable, the magnitudes of the upward and downward movements, and the initial stock price. We can vary these parameters to approximate different stock price distributions.
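
A minimal simulation sketch of this model, with assumed up/down factors of 1.01/0.99 and p = 0.5 (all hypothetical parameters):

```python
import numpy as np

def binomial_price_path(s0, up, down, p, steps, rng):
    """One simulated path: each period the price is multiplied by `up`
    with probability p, otherwise by `down`."""
    moves = np.where(rng.random(steps) < p, up, down)
    return s0 * np.cumprod(moves)

rng = np.random.default_rng(0)
path = binomial_price_path(s0=100.0, up=1.01, down=0.99, p=0.5, steps=252, rng=rng)
print(path[-1])   # final price after 252 periods
```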

6.3 Linearity of Normal Distributions

\[N(\mu_1, \sigma_1^2) + N(\mu_2, \sigma_2^2) = N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)\]

  • The mean and variance of normal distributions are additive: adding two independent normal random variables gives another normal random variable whose mean and variance are the sums of the originals

  • In modern portfolio theory, stock returns are generally assumed to follow a normal distribution. One major characteristic of normal random variables is that a linear combination of two or more of them is another normal random variable. This is useful for computing the mean return and variance of a portfolio of multiple stocks.
  • Up until this point, we have only considered single-variable (univariate) probability distributions. When we want to describe multiple random variables at once, as in the case of observing multiple stocks, we can instead look at a multivariate distribution. A multivariate normal distribution is described entirely by the means of each variable, their variances, and the distinct correlations between each and every pair of variables.
  • This is important when determining characteristics of portfolios, because the variance of the overall portfolio depends on both the variances of its securities and the correlations between them.
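
The additivity of means and variances for independent normals can be checked by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, 1_000_000)   # N(1, 4): std dev 2, variance 4
y = rng.normal(3.0, 1.0, 1_000_000)   # N(3, 1), independent of x

z = x + y
print(z.mean())  # close to 1 + 3 = 4
print(z.var())   # close to 4 + 1 = 5 (variances add only when independent)
```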

7 Linear Regression

\[Y = \alpha + \beta X\]

The linear regression model rests on several assumptions, e.g. that the errors are uncorrelated and have constant variance.

If we confirm that the necessary assumptions of the regression model are satisfied, we can safely use the reported statistics to analyze the fit. For example, the \(R^2\) value tells us the fraction of the total variation of \(Y\) that is explained by the model.

7.1 Deriving Linear Regression using OLS

\[OLS=\sum_{i=1}^n (Y_i - a - bX_i)^2\]

  • The linear regression model is fit by optimization, using Ordinary Least Squares (OLS) as the objective function; under normally distributed errors, minimizing squared error is equivalent to maximizing the likelihood
  • We use \(a\) and \(b\) to represent the potential candidates for \(\alpha\) and \(\beta\). What this objective function means is that for each point on the line of best fit we compare it with the real point and take the square of the difference. This function will decrease as we get better parameter estimates.
  • Regression is a simple case of numerical optimization that has a closed form solution and does not need any optimizer. We just find the results that minimize the objective function.

We will denote the eventual model that results from minimizing our objective function as:

\[\hat{Y} = \hat{\alpha} + \hat{\beta}X\]
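
The closed-form solution (\(\hat{\beta} = Cov(X, Y)/Var(X)\), \(\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}\)) can be sketched directly in NumPy on simulated data with known true parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(200)
Y = 2.0 + 3.0 * X + rng.normal(0, 0.5, 200)   # true alpha = 2, beta = 3

# Closed-form OLS: beta_hat = Cov(X, Y) / Var(X), alpha_hat = mean(Y) - beta_hat * mean(X)
beta_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
alpha_hat = Y.mean() - beta_hat * X.mean()

print(alpha_hat, beta_hat)   # close to the true values 2 and 3
```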

7.2 Standard Error

  • We can also find the standard error of estimate, which measures the standard deviation of the error term \(\epsilon\) by getting the scale parameter of the model returned by the regression and taking its square root. The formula for standard error of estimate is

\[s = \left( \frac{\sum_{i=1}^n \epsilon_i^2}{n-2} \right)^{1/2}\]
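
A sketch computing \(s\) from the residuals of a simple fit, on simulated data with a known error std dev of 0.5 (`np.polyfit` used for the fit):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(200)
Y = 2.0 + 3.0 * X + rng.normal(0, 0.5, 200)   # errors have true std dev 0.5

beta_hat, alpha_hat = np.polyfit(X, Y, deg=1)  # returns (slope, intercept)
residuals = Y - (alpha_hat + beta_hat * X)

# Standard error of estimate: sqrt of the residual sum of squares over (n - 2).
s = np.sqrt((residuals ** 2).sum() / (len(X) - 2))
print(s)   # close to the true error std dev of 0.5
```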

8 Maximum Likelihood Estimation (MLE)

9 Regression Model Instability

9.1 Regime changes

  • In this case our regression model will not be predictive of future datapoints, since the underlying system is no longer the same as in the sample. In fact, regression analysis assumes that the errors are uncorrelated and have constant variance, which is often not the case when there is a regime change.

9.2 Multicollinearity