3  Why we use R

We do not assume you have any knowledge of programming, but if you want to use these notes, we expect you to be willing to learn a programming language.

3.1 Excel vs programming languages

Microsoft Excel is extensively used throughout the financial system and remains an excellent tool for many financial tasks including data exploration, reporting and ad-hoc analysis. Many professionals rely on Excel for its accessibility, familiar interface and powerful built-in functions.

However, for the type of systematic risk forecasting we do in these notes, programming languages offer significant advantages. Excel becomes cumbersome for complex statistical calculations, large datasets and reproducible research. While Excel can interface with external packages for sophisticated analysis, this approach limits your understanding of the underlying methods. For learning how to implement risk forecasting techniques from first principles, a programming environment is more suitable.

Excel and R can work well together: many workflows use R for analysis and Excel for presentation and reporting.

3.2 Numerical programming language options

There are many programming languages one could use, ranging from general-purpose languages such as C, C++ and Rust to mathematics languages such as Fortran and specialised statistical languages like Stata. All of these are designed for other purposes and are not recommended here unless there is a special need for them, in which case you will already know you need them.

There are four main software choices, all of which would be very useful for risk forecasting.

  1. Matlab;
  2. Python (NumPy);
  3. Julia;
  4. R.

We recently compared them to some commonly used alternatives in Choosing a numerical programming language for economic research: Julia, MATLAB, Python or R. The website for Financial Risk Forecasting contains basic code in all four languages.

3.3 Our choice: R

We opt for R, a widely used open-source language that is especially good for statistical analysis of the type we do here. At the time of writing, it has better statistical libraries than the other three languages, the best user interface, RStudio, and a large number of resources available for learning it.

3.3.1 Problems with R

R, like any programming language, has limitations. It dates from the early 1990s and inherits much of its design from the S language of the 1970s, so it carries some design decisions that can be confusing for new programmers. These include inconsistent function naming conventions, multiple ways to accomplish the same task, and some counterintuitive default behaviours. Patrick Burns, in The R Inferno, catalogues these issues comprehensively.

However, for the statistical and risk analysis work we do here, these problems are manageable. The extensive statistical libraries, excellent documentation, and large user community mean that solutions to common problems are well-established. Modern R practices and tools like RStudio help mitigate many of the historical design issues, and the benefits for financial analysis outweigh the learning curve challenges.

3.4 Matlab

One of the most widely used numerical programming languages is Matlab. It has been around since the 1970s and is particularly good in calculations involving linear algebra. It comes with a number of high-quality libraries and remains widely used in a number of industries, not least in engineering.

While we can certainly do the type of work we are doing here with Matlab, it is hard to recommend it. Not only is it commercial and very expensive to acquire, but the work we do here also needs specialised numerical libraries. Because Matlab is commercial, very few people outside of the vendor create libraries for it, and it is therefore missing the types of specialist libraries we use in this work.

It also shows its age: at almost 50 years old, it feels archaic and is very difficult to program in a way that comes naturally to people trained in more modern languages.

3.5 Python

One of the most commonly used programming languages in the world is Python. Compared to Matlab, it is relatively young, dating back to the early 1990s, which makes it a contemporary of R.

While Python was originally designed for text and file system handling, its flexibility and power have led it to acquire a very powerful numerical package, NumPy. Success breeds success, and because of its popularity, Python is often the first, and sometimes the only, place where powerful libraries are developed. This has become particularly clear with machine learning, where libraries such as PyTorch are the primary way most people do machine learning. The benefit is that the user can leverage the power of Python for data handling and network file systems, and then call complicated algorithms, usually coded in specialised languages such as C.

So why not recommend Python for the work we are doing here? Python has excellent libraries for financial applications and is widely used in quantitative finance, particularly for data processing, machine learning and production systems. However, R remains better suited for the statistical analysis and risk modelling we focus on in these notes. R’s syntax is more naturally designed for statistical operations, and it has more comprehensive libraries specifically for econometrics and risk analysis. Many quantitative professionals use Python for data engineering and R for statistical modelling, combining both languages effectively.

3.6 Julia

One might then ask, why not use Julia, a modern and much better-designed language? Julia, unlike the other three languages, is a child of this century, dating back to 2012. It is designed with numerical work in mind, and we have found it to be excellent. We make far fewer programming errors, write code faster and run it much faster than in the other three.

So why not recommend Julia instead of R? Julia is excellent for performance-intensive applications and has a growing ecosystem for financial analysis. However, the learning curve remains steeper, particularly for those new to programming. While development environments like VSCode with Julia extensions have improved substantially, R’s combination of RStudio and extensive educational resources makes it more accessible for learning risk forecasting concepts.

3.7 Stata

Stata is a specialised statistical package widely used in economics and finance. It provides a comprehensive suite of statistical functions through both a graphical interface and command-line scripting. Stata excels at handling panel data, survey data and econometric analysis, with excellent documentation and built-in help systems.

However, Stata is expensive proprietary software with licensing restrictions. Its programming capabilities are more limited compared to general-purpose languages, and it lacks the flexibility needed for complex risk modelling workflows. While Stata produces publication-ready output easily, R offers greater programmability and a much larger ecosystem of packages relevant to financial risk analysis.

3.8 Lower-level languages

Some programming languages prioritise performance and system control over ease of use. C and C++ have been extensively used for decades and remain important for performance-intensive applications. Fortran, despite its age, continues to be used for intensive numerical computations and forms the backbone of many mathematical libraries. Rust offers memory safety with C-like performance, while Zig provides a modern approach to system programming.

These languages are used extensively in finance for high-frequency trading systems, exchange matching engines and other applications where microsecond performance matters. Many of the libraries we use in R, Python and Julia are actually implemented in C, C++ or Fortran underneath, providing the speed while hiding the complexity.

However, these languages require substantial programming expertise and are not suitable for learning statistical concepts. The development time is much longer, debugging is more complex, and the focus shifts from statistical thinking to memory management and system programming. For the educational approach in these notes, they would create unnecessary barriers to understanding risk forecasting methods.

3.9 Practical considerations

3.9.1 Learning considerations

The time investment required to become productive varies significantly across these languages. R has a relatively gentle learning curve for statistical work, particularly if you focus on the core functions needed for risk analysis. Most users become reasonably proficient within a few months of regular use.

Python requires learning more general programming concepts before becoming effective for statistical analysis, though this broader knowledge can be valuable for other applications. Matlab has a moderate learning curve but its commercial nature limits practice opportunities. Julia offers excellent performance but requires more programming sophistication to use effectively.

For the focused statistical work in these notes, R allows you to start producing meaningful results quickly while building programming skills gradually.

3.9.2 Cost considerations

R is free and open-source, as are Python and Julia. Matlab requires expensive commercial licences that can cost thousands of pounds per user annually, making it impractical for individual learning or small organisations. While some universities provide Matlab access, you may not be able to continue using it after graduation. The free alternatives also tend to have more active development communities and faster adoption of new methods.

3.9.3 Industry usage

Industry usage varies significantly by institution, department and specific role. Large financial firms typically use multiple languages across different functions. Python is common for data analysis and some quantitative research, while R is widely used in risk management, econometric research and regulatory reporting. Julia is making rapid inroads due to its combination of speed and ease of coding, particularly for performance-intensive applications. Legacy functions often use Matlab. Many quantitative analysts work with several languages depending on the task. Learning R provides strong foundations in statistical thinking that transfer well to other programming environments when workplace requirements differ.

3.10 What do we use?

In our daily work, we use R, Python and Julia. We pick the best language for the task at hand. For example, our website Extreme Risk, which has risk forecasts that are updated every day, is based on all three. Python handles downloading and basic data processing and runs the processing pipeline. Julia performs the actual risk calculations. R creates the graphics.