3  The programming language we use here is R

We do not assume you have any knowledge of programming, but if you want to use these notes, we expect you to be willing to learn a programming language.

3.1 Excel is not useful for this work

Microsoft Excel is a very useful piece of software and is extensively used throughout the financial system. Many professionals spend more time with Excel than with their family or pets. While Excel is extremely powerful, and the appropriate choice in many applications, we cannot recommend it for the type of work we are doing here.

It is very difficult to do numerical programming in Excel, even with VBA. Of course, you might use an external package to do the types of calculations we are doing here, but then you are not studying how to implement risk forecasts; you are just using the software’s output. If that output is all you need, Excel can be quite beneficial.

3.2 Numerical programming language options

There are many programming languages one could use, ranging from general-purpose languages such as C, C++, and Rust to mathematics languages such as Fortran and specialised statistical languages like Stata. All of these are designed for other purposes and are not recommended here unless there is a special need for them, in which case you know you need them.

There are four main software choices, all of which would be very useful for risk forecasting.

  1. Matlab;
  2. Python (NumPy);
  3. Julia;
  4. R.

We recently compared them to some commonly used alternatives in Choosing a numerical programming language for economic research: Julia, MATLAB, Python or R. The website for Financial Risk Forecasting contains basic code in all four languages.

3.3 Our choice: R

We opt for R, a widely used open-source package especially good for statistical analysis of the type we do here. At the time of writing, it has better statistical libraries than the other three languages, the best user interface, RStudio, and a large number of resources available for learning it.

3.3.1 Problems with R

R, like any other language, has problems. Its design descends from the S language of the 1970s, and it comes with a huge number of design decisions that might have made sense decades ago but are bizarre or worse today. Patrick Burns, in The R Inferno, does an excellent job exposing those problems.

3.4 Matlab

One of the most widely used numerical programming languages is Matlab. It has been around since the 1970s and is particularly good at calculations involving linear algebra. It comes with a number of high-quality libraries and remains widely used in a number of industries, not least in engineering.

While we can certainly do the type of work we are doing here with Matlab, it is hard to recommend it. Not only is it commercial and very expensive to acquire, but it also lacks specialised numerical libraries relevant to what we are doing here. Because it is commercial, very few people outside of the vendor create libraries for it, and it is therefore missing the types of specialist libraries we use in this work.

It also shows its age. At almost 50 years old, it is archaic and very difficult to program in a way that feels natural to people trained in more modern languages.

3.5 Python

One of the most commonly used programming languages in the world is Python. Compared to R and Matlab, it is relatively young, dating back to the early 1990s.

While Python was originally designed for text and file system handling, because of its flexibility and power, it has acquired a very powerful numerical package, NumPy. Success breeds success, and because of its popularity, it is usually the first and even the only place where powerful libraries are developed. This has become particularly clear with machine learning, where libraries such as PyTorch are the primary way most people do machine learning. The benefit is that the user can leverage the power of Python for data handling and network file systems, and then call complicated algorithms, usually coded in specialised languages such as C.

So why not recommend Python for the work we are doing here? The reason is that we are not calling libraries. We are implementing code to forecast risk. That means one would have to program directly in NumPy, which is more cumbersome than in the other three languages. Furthermore, it lacks specialised numerical and statistical libraries.
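To give a flavour of what implementing, rather than calling a library, looks like in NumPy, the following is a minimal sketch of a one-day-ahead volatility and value-at-risk forecast. The choice of an exponentially weighted moving average (EWMA) model, the decay parameter of 0.94, the simulated returns, the normality assumption and the function name `ewma_variance` are all our assumptions for this illustration, not a method prescribed in this section.

```python
import numpy as np


def ewma_variance(returns, lam=0.94):
    """Conditional variances from an EWMA recursion (hypothetical helper).

    sigma2[t] = lam * sigma2[t-1] + (1 - lam) * returns[t-1]**2
    The last element is the forecast for the day after the sample ends.
    """
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty(returns.size + 1)
    sigma2[0] = returns.var()  # assumption: initialise with the sample variance
    for t in range(1, sigma2.size):
        sigma2[t] = lam * sigma2[t - 1] + (1 - lam) * returns[t - 1] ** 2
    return sigma2


# Simulated daily returns stand in for real data (assumption).
rng = np.random.default_rng(seed=1)
y = rng.normal(loc=0.0, scale=0.01, size=1000)

sigma2 = ewma_variance(y)
next_day_vol = np.sqrt(sigma2[-1])  # one-day-ahead volatility forecast

# 1% value-at-risk per unit of portfolio value, assuming conditional normality;
# 2.326 approximates the magnitude of the 1% standard normal quantile.
var_1pct = 2.326 * next_day_vol
print(f"Volatility forecast: {next_day_vol:.4f}, 1% VaR: {var_1pct:.4f}")
```

Every array has to be converted, allocated and indexed explicitly, which is the kind of overhead we have in mind when describing direct NumPy programming as cumbersome.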

3.6 Julia

One might then say, why not use Julia, a modern and much better-designed language? Julia, unlike the other three languages, is a child of this century, dating back to 2012. It is designed with numerical work in mind, and we have found it to be excellent. We make far fewer programming errors, write code faster, and find that the code runs much faster than in the other three.

So why not recommend Julia instead of R? A key reason is that it is more complicated to use than the others. It does not come with a high-quality development environment like RStudio and the threshold for getting started with it is higher than that of the other three. Furthermore, it does not enjoy the rich ecosystem of R, which has better libraries, documentation and development environments.

3.7 What do we use?

In our daily work, we use R, Python, and Julia. We pick the best language for the task at hand. For example, our website extreme risk, which has risk forecasts that are updated every day, is based on all three: Python for downloading and basic data processing and to run the processing pipeline; Julia for the actual risk calculation; and R for the graphics.