2  Financial and economic data

While financial markets generate a vast amount of data (sometimes with new observations every nanosecond), this data is generally difficult and costly to obtain. Data providers are commercial, often with complicated interfaces and sometimes very high prices, and the few that offer free data tend to have erratic access and errors in their data.

2.1 Symbols, names and identifiers

There are many categories of financial data one might use, like stocks, bonds, futures, options, commodities and foreign exchange. For each of these, we have a large number of individual assets and types of assets. Identifying the asset we need can be quite complex.

2.1.1 Names and tickers

All stocks that are trading in the market are associated with a ticker symbol, serving as an identifier for the specific security. These are often specific to the particular exchange or country of listing. This can often lead to confusion or ambiguity when a company has cross-listings. For instance, the Japanese car manufacturer Toyota is listed as 7203 on the Tokyo Stock Exchange, TYT on the London Stock Exchange, and TM on the New York Stock Exchange.

The same security can also have different tickers depending on the data source. For instance, searching for data on the S&P 500 (a major American stock market index) might require the ticker SPX (Bloomberg), ^GSPC (Yahoo Finance), INX (Google Finance), GSPC.INDX (EOD), and so on.

Furthermore, tickers often change over time (for example, due to a merger or name change) and can be recycled. The same ticker can therefore refer to two or more firms at different points in time, which causes problems if not taken into consideration when querying financial data.

When searching for a particular security from the data source, it is good practice to verify that the ticker symbol used indeed corresponds to the correct data series by checking the data description.
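One way to guard against such mismatches is to keep an explicit table of source-specific tickers. The following is a minimal sketch using the S&P 500 symbols quoted above; the data frame sp500_tickers and the helper ticker_for() are hypothetical names introduced only for illustration.

```r
# Source-specific ticker symbols for the S&P 500 index, as listed in the text
sp500_tickers <- data.frame(
  source = c("Bloomberg", "Yahoo Finance", "Google Finance", "EOD"),
  ticker = c("SPX", "^GSPC", "INX", "GSPC.INDX"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the ticker a given source expects,
# and fail loudly if the source is not in the table
ticker_for <- function(source, table = sp500_tickers) {
  hit <- table$ticker[table$source == source]
  if (length(hit) != 1) stop("No unique ticker for source: ", source)
  hit
}

ticker_for("Yahoo Finance")   # "^GSPC"
```

Failing with an error, rather than returning an empty result, makes it harder to download the wrong series without noticing.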

2.1.2 Tickers, ISIN and PERMNO

When downloading data, it is best not to refer to stocks by their ticker symbol but rather by one of the permanent asset identifiers.

The ISIN is the International Securities Identification Number, an internationally recognised 12-character code unique to each stock. Unlike tickers, the ISIN for a stock is the same regardless of the market.
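Because an ISIN has a fixed structure (a two-letter country code, nine alphanumeric characters, and a final check digit computed with the Luhn algorithm), a typo can often be caught before any data is requested. The function below is a sketch of such a check; is_valid_isin() is a hypothetical helper, and the two ISINs used to try it out (Toyota and Apple) are not taken from these notes, so verify them against your data source.

```r
# Check whether a string is a well-formed ISIN:
# 2 letters (country), 9 alphanumeric characters, 1 check digit
is_valid_isin <- function(isin) {
  if (!grepl("^[A-Z]{2}[A-Z0-9]{9}[0-9]$", isin)) return(FALSE)
  chars <- strsplit(isin, "")[[1]]
  # Map digits to themselves and letters to 10..35 (A = 10, ..., Z = 35)
  values <- match(chars, c(0:9, LETTERS)) - 1
  # Expand every value into its individual decimal digits
  digits <- as.integer(strsplit(paste(values, collapse = ""), "")[[1]])
  # Luhn check: double every second digit, counting from the right
  n <- length(digits)
  double_it <- (n - seq_len(n)) %% 2 == 1
  digits[double_it] <- digits[double_it] * 2
  digits[digits > 9] <- digits[digits > 9] - 9
  sum(digits) %% 10 == 0
}

is_valid_isin("JP3633400001")   # TRUE  (assumed to be Toyota's ISIN)
is_valid_isin("US0378331006")   # FALSE (last digit of Apple's ISIN altered)
```

A passing check only says the code is well formed; it does not confirm that the ISIN belongs to the intended security.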

The PERMNO is the permanent issue identifier of the CRSP dataset.

2.2 Dates

A date has three components: year, month, and day. We refer to these as YYYY, MM, and DD, respectively. There may also be a time component: hours, minutes, seconds, and fractions of a second. In addition, there may be a time zone and a summer time indicator.

Dates are complicated to work with in software because most code does not store years, months, and days internally. Instead, a date is a number measured relative to some origin. For example, R’s origin date/time is 1 January 1970 00:00:00, and a date is stored as the number of days since then, so midnight on 1 January 2000 is the number 10957.
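The internal representation can be inspected directly in base R. The snippet below is a small illustration; the variable names are arbitrary.

```r
# A Date is stored as the number of days since 1970-01-01
d <- as.Date("2000-01-01")
as.numeric(d)                            # 10957

# A date-time (POSIXct) is stored as seconds since 1970-01-01 00:00:00 UTC
dt <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
as.numeric(dt)                           # 946684800 (= 10957 * 86400)

# Going from a number back to a date requires stating the origin
as.Date(10957, origin = "1970-01-01")    # "2000-01-01"
```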

Dates often present problems when using multiple languages or programs to carry out numerical work. Excel has two different conventions for dates depending on the version: the origin year can be either 1900 or 1904, so this requires verification before use. Furthermore, Excel does not allow dates before 1900.
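When Excel serial numbers end up in R, the origin has to be supplied by hand. The sketch below follows the origins given in R's as.Date documentation ("1899-12-30" for the 1900 system, valid for dates after February 1900, and "1904-01-01" for the 1904 system); the serial numbers are worked out for 13 September 2018 and the variable names are illustrative.

```r
# Excel serial number for 13 September 2018 under the 1900 date system
serial_1900 <- 43356
as.Date(serial_1900, origin = "1899-12-30")   # "2018-09-13"

# The same calendar date under the 1904 date system is 1462 days smaller
serial_1904 <- serial_1900 - 1462
as.Date(serial_1904, origin = "1904-01-01")   # "2018-09-13"

# Using the wrong origin silently shifts every date by about four years
as.Date(serial_1900, origin = "1904-01-01")   # "2022-09-14"
```

The last line shows why the convention needs checking: the result is a perfectly valid date, just the wrong one.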

A date can be represented numerically in many different ways. Consider the date 13 September 2018:

Format       Example
DD-MM-YYYY   13-09-2018
MM-DD-YYYY   09-13-2018
YYYY-MM-DD   2018-09-13
YYYYMMDD     20180913

The best way is to use the YYYYMMDD convention for two reasons:

  1. It can be represented as an integer, not as a string, making data handling more convenient;
  2. It sorts naturally (in chronological order).

The R package lubridate is very useful for dates.
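As an illustration of the YYYYMMDD convention, the following base R sketch sorts integer dates and converts them to and from R's Date class. The variable names are arbitrary, and lubridate's ymd() can be used for the parsing step instead.

```r
# Integer dates in YYYYMMDD form, deliberately out of order
dates_int <- c(20180913L, 20171231L, 20180102L)

# Sorting the integers gives chronological order
sorted_int <- sort(dates_int)

# Convert to R's Date class when calendar arithmetic is needed
dates <- as.Date(as.character(sorted_int), format = "%Y%m%d")
diff(dates)                            # gaps in days: 2 and 254

# Convert back from Date to a YYYYMMDD integer
as.integer(format(dates, "%Y%m%d"))
```

Note that the integers are convenient for storage and sorting but not for arithmetic: 20180102 - 20171231 is 8871, while the two dates are only two days apart.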

2.2.1 Adjusted and unadjusted prices

If you download stock prices, the raw data will often be unusable because the number of outstanding shares is adjusted from time to time, usually through stock splits.

For example, Amazon announced a 20-for-1 stock split in 2022. This meant that every Amazon share became 20 shares and, correspondingly, the price dropped by a factor of 20, from about $2,450 to about $122.

This means that if you load such price data into R and analyse it, you will see a large price drop that has no effect on risk or wealth.

We therefore work with adjusted prices, that is, prices adjusted for stock splits.
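To see the effect of the adjustment, consider the following sketch with made-up prices for a hypothetical stock that undergoes a 20-for-1 split. All numbers, dates, and variable names are assumptions chosen for illustration, and only splits are handled.

```r
# Hypothetical unadjusted closing prices around a 20-for-1 split
prices <- data.frame(
  date  = as.Date(c("2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04")),
  close = c(2000, 2020, 102, 101)
)
split_date  <- as.Date("2021-03-03")   # first day of trading at the new price
split_ratio <- 20

# Divide every price before the split by the split ratio
prices$adjusted <- ifelse(prices$date < split_date,
                          prices$close / split_ratio,
                          prices$close)

# Simple returns from unadjusted and adjusted prices
ret_raw <- diff(prices$close) / head(prices$close, -1)
ret_adj <- diff(prices$adjusted) / head(prices$adjusted, -1)
round(cbind(ret_raw, ret_adj), 4)
```

On the split date the unadjusted series shows a return of about -95%, while the adjusted series shows a return of about +1%, which is what an investor actually experienced.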

2.2.2 Asynchronous prices

Problems arise when data comes from different markets and countries due to:

  1. Holidays;
  2. Time zones.

Public holidays, days when the markets are closed, often differ across countries. Examples are national holidays, such as Independence Day on 4 July in the United States, and religious holidays. Most exchanges are open Monday through Friday, but the Saudi Stock Exchange is open from Sunday through Thursday. Some exchanges close for a lunch break. Some countries have summer time and others do not, and the switch to summer time often happens on different dates, as in the US and Europe.

Name                       Time Zone   Trading Hours             Lunch Break
New York Stock Exchange    EDT         9:30 a.m. to 4:00 p.m.    No
Shanghai Stock Exchange    CST         9:30 a.m. to 3:00 p.m.    11:30 a.m. to 1:00 p.m.
Tokyo Stock Exchange       JST         9:00 a.m. to 3:00 p.m.    11:30 a.m. to 12:30 p.m.
London Stock Exchange      BST         8:00 a.m. to 4:30 p.m.    12:00 p.m. to 12:02 p.m.
Frankfurt Stock Exchange   CET         9:00 a.m. to 5:30 p.m.    No

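The New York market overlaps with London only for a short period late in the London trading day, and may not overlap at all with markets that close earlier. Since Tokyo is eight or nine hours ahead of London, depending on summer time, the Tokyo session ends before London opens, and there is no overlap in trading hours.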

This means that any research comparing prices across countries at a daily frequency needs to consider these issues. They can be bypassed by using weekly or monthly data.
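The extent of the overlap can be checked by converting local trading hours to a common clock. The sketch below does this for one arbitrary date using the New York, Frankfurt, and Tokyo hours from the table; session() and overlap_hours() are hypothetical helpers, the time zone names are standard tz database identifiers, and lunch breaks are ignored.

```r
# Hypothetical helper: a trading session as a pair of POSIXct instants
session <- function(date, open, close, tz) {
  as.POSIXct(paste(date, c(open, close)), format = "%Y-%m-%d %H:%M", tz = tz)
}

# Hours during which two sessions are open at the same time
overlap_hours <- function(a, b) {
  start <- max(as.numeric(a[1]), as.numeric(b[1]))
  end   <- min(as.numeric(a[2]), as.numeric(b[2]))
  max(0, end - start) / 3600
}

day <- "2018-09-13"
new_york  <- session(day, "09:30", "16:00", "America/New_York")
frankfurt <- session(day, "09:00", "17:30", "Europe/Berlin")
tokyo     <- session(day, "09:00", "15:00", "Asia/Tokyo")

overlap_hours(new_york, frankfurt)   # 2
overlap_hours(tokyo, frankfurt)      # 0
overlap_hours(tokyo, new_york)       # 0
```

Because POSIXct values are stored as seconds since the origin in UTC, comparing them across time zones needs no manual offset arithmetic, and summer time is handled by the time zone database.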

2.3 Common sources of financial and economics data

The type of data we use here can only be obtained from a commercial vendor, either for free or by paying. Your university might have a subscription to a commercial vendor that you can use for free.

2.3.1 What we usually use

2.3.1.1 EOD historical data

Our primary source of financial prices is End of Day Historical Data (EOD), which provides a fundamental data API as well as live and end-of-day historical prices for stocks, ETFs, and mutual funds from exchanges worldwide. While not free, it is not very expensive and comes with an academic discount.

It has a very useful API that allows data to be downloaded directly into R. Suppose you have obtained an API token. This is how you can download daily Apple stock prices.

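# API token issued by EOD (placeholder)
api.token <- "YOUR_API_KEY_HERE"
symbol <- "AAPL.US"
# Build the request URL: period=d asks for daily data, order=d for newest observations first
ticker.link <- paste0("http://nonsecure.eodhistoricaldata.com/api/eod/", symbol, "?api_token=", api.token, "&period=d&order=d")
# Download the CSV response into a data frame
data <- read.csv(url(ticker.link))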

The data we use in these notes was obtained with permission from EOD.

2.3.1.2 DBnomics

Our main source of economic data is DBnomics, which aggregates free data feeds from various sources, including the World Bank, IMF, BIS, and OECD.

It comes with both a browser and an API interface. For R, we can use rdbnomics.

library(rdbnomics)
# Fetch a single series by its full DBnomics ID (provider/dataset/series)
df1 <- rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN")

2.3.2 Some other vendors

2.3.2.1 Bloomberg

One of the most ubiquitous data sources in finance is the Bloomberg Terminal. Due to its pervasiveness throughout the industry, there are numerous packages in practically every language that allow access to its APIs. We can download Bloomberg data directly into R.

LSE students have access to a number of Bloomberg terminals in the library and the master’s students’ common rooms.

2.3.2.2 Wind

The Wind Financial Terminal (WFT) also provides market data like the Bloomberg Terminal but with a specific focus on the Chinese financial markets. It supports APIs for MATLAB, R, C++ and Python, among others. LSE has access to Wind.

2.3.2.3 WRDS

The Wharton School at the University of Pennsylvania provides a service called Wharton Research Data Services (WRDS) that many universities subscribe to. This provides a common interface to several databases, including CRSP and TAQ high-frequency data. WRDS and many of its databases are available to LSE students and staff.

2.3.2.4 Yahoo Finance

The go-to place for many researchers requiring financial data has been finance.yahoo.com. This data can be automatically downloaded for free into many software packages, including MATLAB, R and Python.

There are three problems with Yahoo Finance.

  1. Yahoo occasionally changes how the API works, requiring updates to software;
  2. It often is unavailable for days or weeks;
  3. There are errors in the data. For example, UK prices, quoted in pence by convention, sometimes appear in pounds for one or two days before reverting to pence. On other occasions, individual numbers are simply wrong and need to be corrected.
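The pence and pounds problem in the last item can often be detected mechanically, because the affected observations are roughly 100 times smaller than their neighbours. The following is a rough sketch on made-up prices; the thresholds and variable names are assumptions, and for a long series a rolling median would be a more sensible reference than the full-sample median used here.

```r
# Hypothetical daily closing prices for a UK stock, quoted in pence,
# where two observations were mistakenly recorded in pounds
close <- c(1520, 1534, 15.41, 15.38, 1545, 1551)

# Compare each price with the median of the series: an observation that is
# roughly 100 times smaller than typical is a likely pounds-for-pence error
ratio   <- close / median(close)
suspect <- ratio > 0.005 & ratio < 0.02

# Rescale the suspect observations back to pence
fixed <- ifelse(suspect, close * 100, close)
fixed   # 1520 1534 1541 1538 1545 1551
```

Any observation flagged this way should still be checked against another source before being used.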

2.3.2.5 Federal Reserve Economic Data (FRED)

FRED, the economic database maintained by the Federal Reserve Bank of St. Louis, is a good source of macroeconomic data, including unemployment, GDP, interest rates, and the money supply. It can also be accessed via DBnomics.

2.3.2.6 IEX

IEX provides access to US equity data via https://iextrading.com/developer/.

2.3.2.7 ECB FX

The European Central Bank Statistical Data Warehouse and its corresponding SDMX interface allow for retrieval of daily Euro FX data.

The entire dataset is available at http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip, and it can be downloaded using

wget http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip -O eurofxref-hist.zip

EOD also has this data.
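Once downloaded, the ECB file can be loaded into R along the following lines. This is only a sketch: read_ecb_fx() is a hypothetical helper, and the file layout it expects (a Date column, one column per currency, "N/A" for missing quotes, and a trailing comma on each line) is an assumption that should be checked against the actual file. The small stand-in file below contains made-up numbers.

```r
# Hypothetical helper: read the ECB reference-rate CSV into a data frame.
# Assumes a "Date" column followed by one column per currency and "N/A"
# for missing quotes; check this against the file you actually download.
read_ecb_fx <- function(path) {
  fx <- read.csv(path, na.strings = c("N/A", ""))
  fx$Date <- as.Date(fx$Date)
  # Drop columns that are entirely empty (for example from a trailing comma)
  fx <- fx[, colSums(!is.na(fx)) > 0, drop = FALSE]
  fx[order(fx$Date), ]
}

# Small stand-in file with made-up numbers in the assumed layout
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,USD,JPY,",
             "2018-09-13,1.1620,129.70,",
             "2018-09-12,1.1585,N/A,"), tmp)
fx <- read_ecb_fx(tmp)
fx

# With network access, the real file could be fetched from within R:
# download.file("http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip",
#               "eurofxref-hist.zip", mode = "wb")
# unzip("eurofxref-hist.zip")   # assumed to extract a single CSV file
```

Sorting by date puts the observations in chronological order, which is the order most time series functions expect.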

2.3.2.8 Alpha Vantage

Alpha Vantage provides free daily and real-time stock price data, and API access is available in R and Python. Its data source appears to be the same as that of Yahoo Finance and, hence, it is subject to the same errors.

2.3.2.9 Quandl

Quandl provides R and Python with common API access to a large number of commercial databases, some of which are free. While comprehensive, one may need to subscribe to data from several providers.

2.3.2.10 Fama-French Data Library

The Fama-French Data Library provides a large amount of historical data compiled by Eugene Fama and Kenneth French. The data is updated regularly, and the Fama-French 3-factor data is especially useful for analysing fund and portfolio performance.

2.3.3 Other useful databases

Some useful databases that can be accessed in different ways include:

2.3.3.1 CRSP

One of the major sources of historical US stock market data is the Center for Research in Security Prices (CRSP, pronounced “crisp”), headquartered at the University of Chicago.

CRSP can be accessed via WRDS.

2.3.3.2 BIS

The BIS provides a very useful database on credit and banking statistics. You can access it either directly at stats.bis.org or via DBnomics.

2.3.3.3 World Bank

The World Bank provides extensive economic development data. The best way to access it is via DBnomics.

2.3.3.4 OECD

The OECD provides extensive data on member states. The best way to access this is via DBnomics.