6 Data used here

We use three data files in these notes, all with daily US stock prices between 2003 and 2022, both unadjusted and adjusted. The data was obtained from the data vendor we usually use, EOD, and is distributed here with their permission. The files can be found on https://www.financialriskforecasting.com/data as CSV files.

SP500, daily prices of the standard and Poor’s 500 index (S&P 500), the data we used to illustrate univariate applications;
SP500TR, total returns on the S&P 500;
Stocks, the prices of six different stocks, chosen to represent the various sectors of the US economy, and including both winners and losers, “stocks.csv”. The file contains both unadjusted and adjusted prices.

The raw data arrives as CSV files called stocks.csv, sp500.csv and sp500tr.csv.

6.1 Libraries

library(reshape2)

6.2 Processing the `sp500.csv` file

We can import directly a .csv file into R as a data.frame object using the function read.csv(). The first three lines in the file are:

date,Close
20030102,909.03
20030103,908.59

And the last two:

20221229,3849.28
20221230,3839.5

sp500=read.csv('data/sp500.csv')

Now let’s check that our variable df is actually a dataframe:

class(sp500)

[1] "data.frame"

We can check the structure of a dataframe using the function str(). This can be quite useful to see what every column of the dataframe holds and what type of data it has:

str(sp500)

'data.frame':   5035 obs. of  2 variables:
 $ date : int  20030102 20030103 20030106 20030107 20030108 20030109 20030110 20030113 20030114 20030115 ...
 $ Close: num  909 909 929 923 910 ...

head(sp500,2)

date	Close
20030102	909.03
20030103	908.59

tail(sp500,2)

	date	Close
5034	20221229	3849.28
5035	20221230	3839.50

names(sp500)

[1] "date"  "Close"

The name Close is not that convenient, so we renamed it.

names(sp500)[2]="price"
names(sp500)

[1] "date"  "price"

We can make a new column with log returns.

\[ y_t=\log P_1 - \log P_{t-1}. \]

When we do this, we will not have an observation for day 1, and since the new column needs to have the same number of rows as the dataframe, we have to add a first observation and put NA into it:

sp500$y = c(NA,diff(log(sp500$price)))
head(sp500)

date	price	y
20030102	909.03	NA
20030103	908.59	-0.0004841
20030106	929.01	0.0222256
20030107	922.93	-0.0065661
20030108	909.93	-0.0141857
20030109	927.57	0.0192006

It will be nicer to remove the first row. One way to do that is

sp500=sp500[2:dim(sp500)[1],]

A more elegant way is to

sp500=sp500[!is.na(sp500$y),]
head(sp500,2)

	date	price	y
2	20030103	908.59	-0.0004841
3	20030106	929.01	0.0222256

6.2.1 Processing the `sp500tr.csv` file

We do the same with the SP-500 TR file

sp500tr=read.csv('data/sp500tr.csv')
names(sp500tr)[2]="price"
sp500tr$y = c(NA,diff(log(sp500tr$price)))

6.3 Processing the `stocks.csv` file

The start of the stocks.csv file looks like

ticker,date,Close,Adjusted_close
MCD,20030102,16.55,9.6942
MCD,20030103,16.12,9.4423

And the end

INTC,20221229,26.21,25.8945
INTC,20221230,26.43,26.1118

stocks=read.csv('data/stocks.csv')
str(stocks)

'data.frame':   30210 obs. of  4 variables:
 $ ticker        : chr  "MCD" "MCD" "MCD" "MCD" ...
 $ date          : int  20030102 20030103 20030106 20030107 20030108 20030109 20030110 20030113 20030114 20030115 ...
 $ Close         : num  16.6 16.1 16.6 16.7 16.8 ...
 $ Adjusted_close: num  9.69 9.44 9.75 9.76 9.86 ...

The dataframe has five columns and 30210 rows. The str() function shows us the names of each column, along with the data type it holds and some observations. There is a column for every stock, which holds numeric data, and a column for dates, which is of type character.

We can also check parts of the structure with different functions:

dim(stocks)

[1] 30210     4

nrow(stocks)

[1] 30210

ncol(stocks)

[1] 4

colnames(stocks)

[1] "ticker"         "date"           "Close"          "Adjusted_close"

We should rename some of the columns.

names(stocks)[3:4]=c("UnAdjustedPrice","price")
names(stocks)

[1] "ticker"          "date"            "UnAdjustedPrice" "price"

The reason is that we will not work much with the unadjusted prices, so price is transparent and simple.

We can show the first and last row, and rbind put that into the same dataframe

rbind(head(stocks,1),tail(stocks,1))

	ticker	date	UnAdjustedPrice	price
1	MCD	20030102	16.55	9.6942
30210	INTC	20221230	26.43	26.1118

We can now see how which stocks we have:

unique(stocks$ticker)

[1] "MCD"  "DIS"  "AAPL" "GE"   "JPM"  "INTC"

6.3.0.1 Reshaping

The stocks dataframe isn’t particularly useful as it would be much better to have the prices for each stock in its column. While there are many ways to do that, the best is to use the package reshape2.

If we want to create a dataframe with the returns for every stock, we can use the dcast() function. date ~ means each row is data from the same date and each column is data from the same ticker, value.var is the column of the input dataframe that will be filled into the new dataframe.

Price = dcast(stocks, date ~ ticker, value.var = "price")
head(Price,2)

date	AAPL	DIS	GE	INTC	JPM	MCD
20030102	0.2250	13.7547	88.5027	9.7865	14.5077	9.6942
20030103	0.2265	13.8344	88.2248	9.6985	14.7929	9.4423

UnAdjustedPrice = dcast(stocks, date ~ ticker, value.var = "UnAdjustedPrice")

This works since all the stocks are American and from the same exchange, so they have observations on the same days.

We can now make a dataframe with returns. There are several ways to do that. Note that the first column is dates and we dont want to transform those.

Return=Price
for (i in 2:dim(Price)[2]) Return[,i]=c(NA,diff(log(Price[,i])))
head(Return,2)

date	AAPL	DIS	GE	INTC	JPM	MCD
20030102	NA	NA	NA	NA	NA	NA
20030103	0.0066445	0.0057777	-0.003145	-0.0090327	0.0194678	-0.0263282

And to remove the NA.

Price=Price[!is.na(Return[,2]),]
UnAdjustedPrice=UnAdjustedPrice[!is.na(Return[,2]),]
Return=Return[!is.na(Return[,2]),]
head(Price,2)

	date	AAPL	DIS	GE	INTC	JPM	MCD
2	20030103	0.2265	13.8344	88.2248	9.6985	14.7929	9.4423
3	20030106	0.2265	14.5436	90.4825	10.0738	15.9562	9.7527

head(UnAdjustedPrice,2)

	date	AAPL	DIS	GE	INTC	JPM	MCD
2	20030103	14.9016	17.1292	25.3826	16.54	25.94	16.12
3	20030106	14.9016	18.0074	26.0322	17.18	27.98	16.65

head(Return,2)

	date	AAPL	DIS	GE	INTC	JPM	MCD
2	20030103	0.0066445	0.0057777	-0.0031450	-0.0090327	0.0194678	-0.0263282
3	20030106	0.0000000	0.0499928	0.0252684	0.0379668	0.0757001	0.0323446

But it can be more convenient and safer to put all the dataframes into one list:

x=list(Return=Return,Price=Price,UnAdjustedPrice=UnAdjustedPrice)

6.4 Process raw data

Even better is to put all variables into the same list and load them all with ProcessRawData(). We put that into functions.r.

ProcessRawData=function(){

    sp500=read.csv('data/sp500.csv')
    names(sp500)[2]="price"
    sp500$y = c(NA,diff(log(sp500$price)))
    sp500=sp500[!is.na(sp500$y),]
    sp500$date.ts = ymd(sp500$date)
    sp500$y.ts = zoo(sp500$y,order.by=sp500$date.ts)

    sp500tr=read.csv('data/sp500tr.csv')
    names(sp500tr)[2]="price"
    sp500tr$y = c(NA,diff(log(sp500tr$price)))
    sp500tr$date.ts = ymd(sp500tr$date)
    sp500tr$y.ts = zoo(sp500tr$y,order.by=sp500tr$date.ts)

    stocks=read.csv('data/stocks.csv')
    names(stocks)[3:4]=c("UnAdjustedPrice","price")
    Price = dcast(stocks, date ~ ticker, value.var = "price")
    head(Price,2)
    UnAdjustedPrice = dcast(stocks, date ~ ticker, value.var = "UnAdjustedPrice")

    Return=Price
    for (i in 2:dim(Price)[2]) Return[,i]=c(NA,diff(log(Price[,i])))

    Price=Price[!is.na(Return[,2]),]
    UnAdjustedPrice=UnAdjustedPrice[!is.na(Return[,2]),]
    Return=Return[!is.na(Return[,2]),]

    data=list(
        Return=Return,
        Price=Price,
        UnAdjustedPrice=UnAdjustedPrice,
        sp500=sp500,
        sp500tr=sp500tr,
        Ticker=unique(stocks$ticker)
    )

    return(data)
}

6.1 Libraries

6.2 Processing the sp500.csv file

6.2.1 Processing the sp500tr.csv file

6.3 Processing the stocks.csv file

6.3.0.1 Reshaping

6.4 Process raw data

6.2 Processing the `sp500.csv` file

6.2.1 Processing the `sp500tr.csv` file

6.3 Processing the `stocks.csv` file