Loading and Plotting Financial Data

Posted on July 1, 2015

Tags: Economics

0.1 Python

Example file: /home/kali/Desktop/SPY.csv

Date	Close	Volume	Open	High	Low
1/1/2020	4471.31	–	4447.69	4475.82	4447.69

from pathlib import Path
import pandas as pd
mycsv = Path("/home/kali/Desktop/SPY.csv")
mycsv.exists()
#> True

#Pandas automatically takes the first line as the header row.
outDF=pd.read_csv(mycsv,
                  infer_datetime_format=True,
                  parse_dates=["Date"], #Convert "Date" :: object => "Date" :: datetime64[ns]
                  index_col="Date", #Set index from natural numbers to date 
                  na_values="--")
print(outDF.dtypes)

# Close/Last    float64
# Volume        float64
# Open          float64
# High          float64
# Low           float64
# dtype: object

0.1.1 Plotting

import matplotlib.pyplot as plt
%matplotlib inline

plotDF = pd.DataFrame(outDF, index=outDF.index, columns=["Close/Last","Open", "High", "Low"])
plotDF.plot()

Pandas Series analogous to a list
Pandas DataFrame analogous to dictionary of series

0.1.1.1 Series

0.1.1.1.1 Convert DataFrame to Series - squeeze()

closeSERIES = outDF["Close/Last"].squeeze()

0.1.1.1.2 Plotting a Series

closeSERIES.plot()

0.1.2 Shifting by time

Shifting a DF or Series is practically the same
Need to make sure our index is a “datatime64[ns]” to use “freq” argument

shiftedDF=outDF.shift(periods=30, freq=None) # returns a new shifted DF by 30 days

shiftSERIES = closeSERIES.shift(periods=30, freq=None)

Possible values for freq :: str

0.1.3 Getting daily % returns

Note we don’t even have to use a .shift()
forward fill missing data because backfill has lookahead bias

pctReturnSERIES = closeSERIES.pct_change(
                                        periods=1, 
                                        fill_method ="ffill")
#default fill_method is 'pad' == 'ffill'

0.1.3.0.1 Plotting histogram of % returns

pctReturnSERIES.hist()

0.1.4 Converting % returns into Categorical buckets with .cut()

First let’s analyze returns of the SPY

pctReturnSERIES.describe()
# count    251.000000
# mean      -0.000957
# std        0.008594
# min       -0.023238
# 25%       -0.006424    IQR lower bound
# 50%       -0.001255    Median
# 75%        0.003340    IQR upper bound
# max        0.036579
# Name: Close/Last, dtype: float64

The 50% represents the median or most common return. (line 7)
(25% 75%) is the IQR or where our returns will lie half of the time. (line 6+8)
- half of the time, we will see a return that lies in the interval
  [-0.6424%, 0.3340%]

Prefer IQR over std because IQR is a frequency measure resistant to outliers while std depends on the mean.

0.2 Mathematica

CandlestickChart[{"SPY", {{2020, 1, 1}, {2021, 10, 20}}}]