Loading and Plotting Financial Data
Posted on July 1, 2015
Tags: Economics
0.1 Python
Example file: /home/kali/Desktop/SPY.csv
Date | Close | Volume | Open | High | Low |
---|---|---|---|---|---|
1/1/2020 | 4471.31 | – | 4447.69 | 4475.82 | 4447.69 |
from pathlib import Path
import pandas as pd
= Path("/home/kali/Desktop/SPY.csv")
mycsv
mycsv.exists()#> True
#Pandas automatically takes the first line as the header row.
=pd.read_csv(mycsv,
outDF=True,
infer_datetime_format=["Date"], #Convert "Date" :: object => "Date" :: datetime64[ns]
parse_dates="Date", #Set index from natural numbers to date
index_col="--")
na_valuesprint(outDF.dtypes)
# Close/Last float64
# Volume float64
# Open float64
# High float64
# Low float64
# dtype: object
0.1.1 Plotting
import matplotlib.pyplot as plt
%matplotlib inline
= pd.DataFrame(outDF, index=outDF.index, columns=["Close/Last","Open", "High", "Low"])
plotDF plotDF.plot()
- Pandas Series analogous to a list
- Pandas DataFrame analogous to dictionary of series
0.1.1.1 Series
0.1.1.1.1 Convert DataFrame to Series - squeeze()
= outDF["Close/Last"].squeeze() closeSERIES
0.1.1.1.2 Plotting a Series
closeSERIES.plot()
0.1.2 Shifting by time
- Shifting a DF or Series is practically the same
- Need to make sure our index is a “datatime64[ns]” to use “freq” argument
=outDF.shift(periods=30, freq=None) # returns a new shifted DF by 30 days
shiftedDF
= closeSERIES.shift(periods=30, freq=None) shiftSERIES
Possible values for freq :: str
0.1.3 Getting daily % returns
- Note we don’t even have to use a .shift()
- forward fill missing data because backfill has lookahead bias
= closeSERIES.pct_change(
pctReturnSERIES =1,
periods="ffill")
fill_method #default fill_method is 'pad' == 'ffill'
0.1.3.0.1 Plotting histogram of % returns
pctReturnSERIES.hist()
0.1.4 Converting % returns into Categorical buckets with .cut()
- First let’s analyze returns of the SPY
pctReturnSERIES.describe()
# count 251.000000
# mean -0.000957
# std 0.008594
# min -0.023238
# 25% -0.006424 IQR lower bound
# 50% -0.001255 Median
# 75% 0.003340 IQR upper bound
# max 0.036579
# Name: Close/Last, dtype: float64
- The
50%
represents the median or most common return. (line 7) - (
25% 75%
) is the IQR or where our returns will lie half of the time. (line 6+8)- half of the time, we will see a return that lies in the interval
[-0.6424%, 0.3340%]
- half of the time, we will see a return that lies in the interval
Prefer IQR over std because IQR is a frequency measure resistant to outliers while std depends on the mean.
0.2 Mathematica
CandlestickChart[{"SPY", {{2020, 1, 1}, {2021, 10, 20}}}]