9  Descriptive statistics

Below we compute descriptive statistics for our data, focusing on dependence and tail shape. We use the SP-500 as the example.

Libraries

library(moments)
suppressPackageStartupMessages(library(tseries))
suppressPackageStartupMessages(library(car))

9.1 Data

load('data/data.RData')
sp500=data$sp500

9.2 Statistical analysis

9.2.1 Sample stats

mean(sp500$y)
sd(sp500$y)
skewness(sp500$y)
kurtosis(sp500$y)
0.000286197721858836
0.0121610363140039
-0.511112007341173
15.765198795919
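For reference, skewness and kurtosis can be recomputed by hand. The sketch below does so in base R, assuming the moment-based definitions (central moments with divisor n) under which kurtosis is not reported in excess form. The vector `x` is simulated normal data standing in for the returns, not the SP-500 series, and the variable names are illustrative.

```r
# Simulated stand-in for a return series (assumption: not the SP-500 data)
set.seed(1)
x <- rnorm(100000)

# Central moments with divisor n (moment-based convention)
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
m4 <- mean((x - mean(x))^4)

skew_by_hand <- m3 / m2^(3/2)   # about 0 for symmetric data
kurt_by_hand <- m4 / m2^2       # not in excess form: about 3 for normal data

c(skewness = skew_by_hand, kurtosis = kurt_by_hand)
```

Against the normal benchmark of 3, the SP-500 kurtosis of about 15.8 reported above points to much fatter tails than the normal distribution.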

We can present these results more neatly by collecting them into a small table, here a character matrix built with rbind().

cat('Sample statistics for the SP-500 from',min(sp500$date),'to',max(sp500$date),'\n')

# Build a character matrix row by row: label in column 1, formatted value in column 2
stats=c('mean:',paste0(round(mean(sp500$y)*100,3),'%'))
stats=rbind(stats,c('sd:',paste0(round(sd(sp500$y)*100,2),'%')))
stats=rbind(stats,c('skewness:',round(skewness(sp500$y),1)))
stats=rbind(stats,c('kurtosis:',round(kurtosis(sp500$y),1)))
stats
Sample statistics for the SP-500 from 20030103 to 20221230 
A matrix: 4 × 2 of type chr
stats  mean:      0.029%
       sd:        1.22%
       skewness:  -0.5
       kurtosis:  15.8

Alternatively, we can print them with aligned columns.

# Pad each label to a common width so the values line up
for(i in seq_len(nrow(stats)))
    cat(stats[i,1],rep("",10-nchar(stats[i,1])),stats[i,2],"\n")
mean:      0.029% 
sd:        1.22% 
skewness:  -0.5 
kurtosis:  15.8 
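If an actual data frame is preferred over a character matrix, something along the following lines would work. This is only a sketch: it copies the rounded values from the output above rather than recomputing them from sp500$y, and the object name `stats_df` is an assumption.

```r
# Values copied from the output above, used here in place of sp500$y
stats_df <- data.frame(
  statistic = c("mean", "sd", "skewness", "kurtosis"),
  value     = c("0.029%", "1.22%", "-0.5", "15.8"),
  stringsAsFactors = FALSE
)

# Left-align each label in a field of width 10, then print the value
lines_out <- sprintf("%-10s %s", paste0(stats_df$statistic, ":"), stats_df$value)
cat(lines_out, sep = "\n")
```

Using sprintf() with a fixed field width avoids counting characters by hand when padding the labels.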

9.2.2 Jarque-Bera

The Jarque-Bera (JB) test, available in the tseries library as jarque.bera.test(), uses the third and fourth central moments of the sample to check whether its skewness and kurtosis match those of a normal distribution.

jarque.bera.test(sp500$y)

    Jarque Bera Test

data:  sp500$y
X-squared = 34398, df = 2, p-value < 2.2e-16

The p-value is the upper-tail probability of the JB statistic under its asymptotic distribution, a chi-squared with two degrees of freedom; that is, one minus the CDF evaluated at the statistic. The p-value of this test is virtually zero, which means that we have enough evidence to reject the null hypothesis that our data has been drawn from a normal distribution.
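The statistic and its p-value can be reproduced by hand. The sketch below assumes the standard form JB = n/6 (S^2 + (K - 3)^2/4), where S is the sample skewness and K the sample kurtosis, both based on central moments with divisor n. It uses simulated fat-tailed data rather than the SP-500 returns, and the names `x`, `jb` and `p_value` are illustrative.

```r
set.seed(1)
x <- rt(5000, df = 4)   # simulated fat-tailed data (assumption, not the SP-500)

n <- length(x)
d <- x - mean(x)
S <- mean(d^3) / mean(d^2)^(3/2)   # sample skewness
K <- mean(d^4) / mean(d^2)^2       # sample kurtosis (not in excess form)

jb <- n / 6 * (S^2 + (K - 3)^2 / 4)

# Upper-tail probability under a chi-squared distribution with 2 degrees of freedom
p_value <- pchisq(jb, df = 2, lower.tail = FALSE)

c(JB = jb, p_value = p_value)
```

Because the statistic scales with n, even moderate departures from normality produce very large values in samples of several thousand observations.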

9.2.3 Ljung-Box

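The Ljung-Box (LB) test checks whether the data exhibit serial correlation. We can run it with the Box.test() function by specifying type = "Ljung-Box". We apply it to both the returns and the squared returns; since no lag is given, Box.test() uses its default of a single lag: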

Box.test(sp500$y, type = "Ljung-Box")
Box.test(sp500$y^2, type = "Ljung-Box")

    Box-Ljung test

data:  sp500$y
X-squared = 81.253, df = 1, p-value < 2.2e-16

    Box-Ljung test

data:  sp500$y^2
X-squared = 551.08, df = 1, p-value < 2.2e-16
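Both tests above use a single lag (df = 1). The sketch below shows how more lags can be requested through the lag argument and recomputes the statistic by hand, assuming the usual form Q = n(n+2) Σ ρ_k²/(n − k) summed over lags k = 1, ..., m. Simulated white noise stands in for the returns, and the choice of 20 lags is purely illustrative.

```r
set.seed(1)
x <- rnorm(1000)   # simulated white noise (assumption, not the SP-500)
m <- 20            # number of lags, an illustrative choice

bt <- Box.test(x, lag = m, type = "Ljung-Box")

n   <- length(x)
rho <- acf(x, lag.max = m, plot = FALSE)$acf[-1]   # sample ACF, lag 0 dropped
Q   <- n * (n + 2) * sum(rho^2 / (n - 1:m))
p   <- pchisq(Q, df = m, lower.tail = FALSE)

c(Q_by_hand = Q, Q_Box.test = unname(bt$statistic), p_value = p)
```

Under the null of no serial correlation, Q is asymptotically chi-squared with m degrees of freedom, which is why the df in the output equals the number of lags.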

9.2.4 Autocorrelation

The autocorrelation function shows the linear correlation between a value in our time series and its lags. The acf() function plots the autocorrelation of an ordered vector. The horizontal lines mark a 95% confidence band around zero: an autocorrelation that falls outside the band is considered significantly different from zero.

par(mar=c(4,4,1,0))
acf(sp500$y, main = "Autocorrelation of returns")

The plots made by the acf() function do not look all that nice. While there are other versions available for R, including one for ggplot, for now we will simply modify the built-in acf(). We call the result myACF() and put it into functions.r.

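# co2 is assumed to be a vector of colours defined elsewhere in the project
myACF=function (x, n = 50, lwd = 2, col1 = co2[2], col2 = co2[1], main = NULL) 
{
    a = acf(x, n, plot = FALSE)   # sample ACF up to lag n, without plotting
    # 95% band for zero autocorrelation: +/- 1.96/sqrt(number of observations)
    significance_level = qnorm((1 + 0.95)/2)/sqrt(sum(!is.na(x)))
    # the first row of a$acf is lag 0, so rows 2:n hold lags 1 to n-1
    barplot(a$acf[2:n, , 1], las = 1, col = col1, border = FALSE, 
        xlab = "lag", ylab = "ACF", main = main)
    abline(significance_level, 0, col = col2, lwd = lwd)
    abline(-significance_level, 0, col = col2, lwd = lwd)
    axis(side = 1)
}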
par(mar=c(4,4,1,0))
myACF(sp500$y, main = "Autocorrelation of SP-500 returns")
myACF(sp500$y^2, main = "Autocorrelation of squared SP-500 returns")
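The same band can also be used without a plot. As a sketch, the code below counts how many of the first 50 lags have a sample autocorrelation outside the 95% band, for a series and for its squares. The data are simulated with a simple volatility-clustering recursion in place of the SP-500 returns; the recursion, its parameters and the helper `outside()` are illustrative assumptions.

```r
set.seed(1)
n  <- 5000
x  <- numeric(n)
s2 <- 1                                      # conditional variance, started at 1
for (i in seq_len(n)) {
  x[i] <- sqrt(s2) * rnorm(1)
  s2   <- 0.05 + 0.10 * x[i]^2 + 0.85 * s2   # today's shock feeds tomorrow's variance
}

# Same band as in myACF(): +/- 1.96/sqrt(n)
band <- qnorm((1 + 0.95) / 2) / sqrt(n)

# Hypothetical helper: number of lags whose autocorrelation lies outside the band
outside <- function(y, lags = 50) {
  r <- acf(y, lag.max = lags, plot = FALSE)$acf[-1]   # drop lag 0
  sum(abs(r) > band)
}

counts <- c(returns = outside(x), squared = outside(x^2))
counts
```

For a series with volatility clustering, one would expect far more lags outside the band for the squared series than for the series itself.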

9.2.5 QQ-Plots

We can use a quantile-quantile (QQ) plot to see how well our data fits a particular distribution. To make the QQ plots, we will use the qqPlot() function from the car package.

We start with the normal distribution, indicated by distribution = "norm", which is the default.

par(mar=c(4,4,1,0))
x=qqPlot(sp500$y, distribution = "norm", envelope = FALSE)

We can also use QQ plots to see how well our data fits Student-t distributions with various degrees of freedom.

par(mfrow=c(2,2), mar=c(4,4,1,0))
x=qqPlot(sp500$y, distribution = "norm", envelope = FALSE,xlab="normal")
x=qqPlot(sp500$y, distribution = "t", df = 4, envelope = FALSE,xlab="t(4)")
x=qqPlot(sp500$y, distribution = "t", df = 3.5, envelope = FALSE,xlab="t(3.5)")
x=qqPlot(sp500$y, distribution = "t", df = 3, envelope = FALSE,xlab="t(3)")
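As a numeric companion to the plots, the sample quantiles can be compared with the theoretical quantiles directly. The sketch below does this in base R on simulated t(4) data, not the SP-500 returns. The helper `qq_cor()` is hypothetical: it reports the correlation between the sorted data and the quantiles of each candidate distribution, with values closer to one corresponding to a straighter QQ plot.

```r
set.seed(1)
x <- rt(5000, df = 4)        # simulated stand-in (assumption, not the SP-500)
p <- ppoints(length(x))      # plotting positions in (0, 1)

# Hypothetical helper: correlation between sorted data and theoretical quantiles
qq_cor <- function(x, theo) cor(sort(x), theo)

fit <- c(
  normal = qq_cor(x, qnorm(p)),
  t4     = qq_cor(x, qt(p, df = 4)),
  t3     = qq_cor(x, qt(p, df = 3))
)
fit
```

Plotting sort(x) against qt(p, df = 4) with plot() draws the corresponding QQ plot by hand. The correlation is only a rough summary, since it is dominated by the few most extreme observations.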