4 Variables and objects

When working with financial data in R, you need to store and manipulate different types of information, such as stock prices, returns, volatility measures, portfolio weights and risk metrics. Understanding how R handles these different data types is helpful for financial analysis.

Variables in R are containers that hold your financial data, whether that’s a single stock price, a time series of returns, or a complex portfolio structure. Choosing appropriate variable types and names makes your analysis more reliable and easier to understand.

4.1 Variables

Variables have three key components:

Name (Identifier): A unique label like stock_price, daily_returns, or portfolio_value;
Data Type: The kind of financial data it holds (e.g., numeric for prices, character for ticker symbols, logical for buy/sell signals);
Value: The actual financial data stored (e.g., 150.25, “AAPL”, TRUE for buy).
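As a minimal sketch of these three components, the lines below create one variable of each kind. The names and values are the illustrative ones from the list above, not real market data.

```r
# Name on the left, value on the right; R infers the data type from the value
stock_price = 150.25   # numeric: a price
ticker = "AAPL"        # character: a ticker symbol
buy_signal = TRUE      # logical: a buy/sell signal

class(stock_price)   # "numeric"
class(ticker)        # "character"
class(buy_signal)    # "logical"
```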

It can be surprisingly difficult to name variables. While it is often tempting to use one character, like x or P, we might not remember what x actually stands for when looking at the code later. It is usually better to use more descriptive names like Price or VaR99.

When working with financial data in R, you encounter several fundamental data types. Each type serves specific purposes in financial analysis and has particular characteristics that affect how your calculations behave.

4.1.1 Types of variables

An object is a self-contained entity that combines related data and functionality within a single structure. For financial analysis, this means a portfolio object might contain:

Attributes: Holdings data like stock weights, prices and returns;
Methods: Functions to calculate portfolio return, risk or rebalance holdings;
State: Current portfolio composition and value;
Identity: Each object is distinct, like different traders (buyer/seller) or separate portfolios, even with identical holdings.

All variables in R are objects, which is helpful when building complex financial structures.
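One way to sketch such a portfolio object in base R is a list with a class attribute. The class name portfolio, the field names, the function portfolio_return() and all numbers are hypothetical and chosen only for illustration.

```r
# Attributes and state: holdings stored in a list (made-up values)
my_portfolio = list(
  tickers = c("AAPL", "MSFT"),
  weights = c(0.6, 0.4),
  returns = c(0.02, -0.01)
)
class(my_portfolio) = "portfolio"   # hypothetical class name

# Method: a function that operates on the object
portfolio_return = function(p) {
  sum(p$weights * p$returns)   # weighted sum of asset returns
}

portfolio_return(my_portfolio)

# Identity: a copy is a separate object, so changing it leaves the original alone
other_portfolio = my_portfolio
other_portfolio$weights = c(0.5, 0.5)
my_portfolio$weights   # still 0.6 0.4
```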

In R, every object has a class that describes what type of object it is. The class determines how R treats the object and what operations you can perform on it. You can discover an object’s class using the class() function and its underlying storage type using typeof(). Understanding classes helps you work more effectively with financial data since different classes behave differently in calculations and analysis.
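The difference between the two functions can be seen in a short sketch. The variable names and values below are illustrative assumptions; the data frame is included because its class and storage type differ.

```r
price = 150.25
class(price)    # "numeric"
typeof(price)   # "double"

# A small table of made-up prices
quotes = data.frame(ticker = c("AAPL", "MSFT"), price = c(150.25, 310.40))
class(quotes)   # "data.frame"
typeof(quotes)  # "list": a data frame is stored as a list of columns
```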

The main variable types covered in this chapter are integers, floating point numbers, characters/strings and logical values. Date and time variables are covered separately in Chapter 5 due to their complexity in financial applications.

4.1.2 Integer

In mathematics, an integer is a discrete number like 0, -1, 10. In financial analysis, integers are commonly used for counting discrete items: the number of shares in a portfolio (100 shares), trading days in a period (252 trading days per year), or position sizes (long 500, short 200).

In R, a whole number typed without a suffix, such as 100, is stored as a floating point number of class numeric. R also has a dedicated integer class, which you request with the L suffix (for example, 100L). The numeric class works seamlessly for most financial calculations, whether you’re working with share quantities (whole numbers) or prices (floating point numbers).
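The sketch below checks how R stores whole numbers. The variable names are illustrative. A number typed without a suffix has class numeric, while the L suffix creates a value of class integer, and the two can be mixed freely in calculations.

```r
shares = 100          # a plain number is stored as a double
class(shares)         # "numeric"
typeof(shares)        # "double"

shares_int = 100L     # the L suffix requests integer storage
class(shares_int)     # "integer"
typeof(shares_int)    # "integer"

shares == shares_int  # TRUE: the values are equal
shares_int * 150.25   # 15025: integer times double gives a double
```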

4.1.3 Floating point numbers

Floating point numbers represent real numbers with decimal places, such as stock prices (£150.25), returns (0.0523), or volatility measures (0.187). These are the most common type of numerical data in financial analysis.

In R, floating point numbers are stored using the numeric class and can handle very large and very small values, making them suitable for financial calculations ranging from individual stock prices to portfolio valuations in millions or billions.
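To give a feel for the range, the sketch below multiplies a large value by a small one and prints the limits R reports for the numeric class through the built-in .Machine list. The portfolio value and return are made-up numbers.

```r
portfolio_value = 2.5e9     # 2.5 billion, written in scientific notation
daily_return = 0.0000125    # a very small return

portfolio_value * daily_return   # about 31250

.Machine$double.xmax   # largest representable value, about 1.8e308
.Machine$double.xmin   # smallest positive normalised value, about 2.2e-308
```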

However, floating point numbers have precision limitations that can affect financial calculations. These are discussed in the Technical background section below.

4.1.4 Characters and strings

In R, text data is stored as the character class. You create character values by enclosing text in quotes: "AAPL" or 'Risk'. R treats both single and double quotes the same way, but double quotes are more common.

In financial analysis, character data is commonly used for ticker symbols ("AAPL", "MSFT"), instrument names ("Apple Inc.", "10-Year Treasury"), currency codes ("USD", "GBP"), and categorical data like sector classifications ("Technology", "Healthcare").

R provides many functions for working with character data, such as paste() for combining strings, substr() for extracting parts of strings, and grep() for pattern matching, all of which are useful when cleaning and manipulating financial datasets.
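A brief sketch of these three functions applied to ticker-style strings. The exchange label and the list of tickers are illustrative assumptions.

```r
ticker = "AAPL"
exchange = "NASDAQ"

paste(ticker, exchange, sep = ":")   # "AAPL:NASDAQ"
substr("GBPUSD", 1, 3)               # "GBP": characters 1 to 3

tickers = c("AAPL", "MSFT", "AMZN", "GOOG")
grep("^A", tickers)                  # 1 3: positions of tickers starting with A
grep("^A", tickers, value = TRUE)    # "AAPL" "AMZN": the matching tickers themselves
```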

4.1.5 Logical

Logical values (also called Boolean values) can only be TRUE or FALSE. In financial analysis, logical values are commonly used for decision making and filtering: buy/sell signals (buy_signal = TRUE), portfolio inclusion flags (include_in_portfolio = FALSE), market condition indicators (bull_market = TRUE), or risk threshold breaches (exceeds_VaR = FALSE).

Logical values are particularly useful for subsetting financial data. For example, you might filter stocks where the price-to-earnings ratio is below 15 (PE_ratio < 15) or identify days when trading volume exceeded the average (volume > AVG_volume). These comparisons return logical vectors that can be used to select specific observations from your datasets.
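The filtering idea can be sketched with a small made-up dataset. The tickers and price-to-earnings values below are invented for illustration only.

```r
tickers = c("AAPL", "MSFT", "JPM", "XOM")
PE_ratio = c(28.5, 32.1, 11.4, 13.9)   # made-up values

cheap = PE_ratio < 15   # comparison returns a logical vector
cheap                   # FALSE FALSE TRUE TRUE
tickers[cheap]          # "JPM" "XOM": only the elements where cheap is TRUE
sum(cheap)              # 2: TRUE counts as 1, FALSE as 0
```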

4.2 Technical background

This section covers lower-level computer science concepts that underpin how R handles data.

4.2.1 Bit

In computer science, a bit (short for binary digit) is the smallest unit of data in a computer. It can hold only one of two possible values:

0: Represents an “off” state or false;
1: Represents an “on” state or true.

A bit is the fundamental building block of all digital information and the basis of binary code, the language computers use to process and store data.

The letter ‘A’ in ASCII is represented as 01000001.

The number 5 in binary is 00000101.
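These two bit patterns can be reproduced in R. The helper to_binary() below is a hypothetical function written for this illustration; it relies on the base functions intToBits() and utf8ToInt() and only handles whole numbers from 0 to 255.

```r
# Convert a whole number between 0 and 255 to its 8-bit binary string
to_binary = function(n) {
  bits = as.integer(intToBits(n))[1:8]   # intToBits() returns the lowest bit first
  paste(rev(bits), collapse = "")        # reverse so the highest bit comes first
}

to_binary(5)                # "00000101"
to_binary(utf8ToInt("A"))   # "01000001": utf8ToInt("A") is 65
```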

4.2.2 Character encoding

Characters are stored in memory as numeric codes using character encoding schemes like ASCII or Unicode.

4.2.2.1 ASCII and Unicode

ASCII and Unicode are both character encoding standards that map text characters to numeric codes, enabling computers to store and process text data. However, they differ in their scope, character range, and flexibility.

ASCII was developed in the 1960s as a standard for encoding English text. It uses 7 bits to represent each character, allowing for 128 unique characters (values 0–127), and covers basic English letters, digits (0–9), punctuation marks, and control characters (e.g., newline, carriage return). It supports only English characters and has no native support for special symbols, emojis, or non-Latin scripts (e.g., Chinese, Arabic). For example, the character ‘A’ is represented by the number 65.

Unicode was introduced in the 1990s to address the global limitations of ASCII and aims to encode all characters from all written languages, as well as emojis and symbols. It supports over 143,000 characters from many scripts and has several encoding formats:

UTF-8 (variable length, backwards compatible with ASCII)
UTF-16
UTF-32
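The mapping between characters and numeric codes can be inspected from R. The sketch below assumes the string is stored as UTF-8, which is what R does when a character is written with a \u escape. The euro sign is used as an example of a character outside ASCII.

```r
utf8ToInt("A")   # 65: the code for A, the same in ASCII and Unicode
intToUtf8(65)    # "A"

euro = "\u20ac"               # the euro sign, Unicode code point U+20AC
utf8ToInt(euro)               # 8364
nchar(euro, type = "chars")   # 1: one character
nchar(euro, type = "bytes")   # 3: stored as three bytes in UTF-8
nchar("A", type = "bytes")    # 1: ASCII characters need a single byte
```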

4.2.3 Floating point precision

While floating point numbers work well for most financial calculations, they have inherent precision limitations due to how computers store decimal numbers in binary format. These precision errors can accumulate in complex financial calculations and affect results.

Here is the classic arithmetic error that demonstrates the issue:

result = 0.1 + 0.2
print(result)

[1] 0.3

print(result == 0.3)  # FALSE - causes problems in financial comparisons

[1] FALSE

cat("Difference from 0.3:", result - 0.3, "\n")

Difference from 0.3: 5.551115e-17

Such errors matter most when many operations are chained together, for example in risk management systems processing thousands of securities over years of data, where they can potentially affect regulatory capital calculations and portfolio valuations.
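A common way to guard against this is to compare with a tolerance instead of testing exact equality. The sketch below uses base R's all.equal(); the explicit tolerance of 1e-9 is an arbitrary choice for illustration and would need to suit the scale of the numbers being compared.

```r
result = 0.1 + 0.2

result == 0.3                    # FALSE: exact comparison
isTRUE(all.equal(result, 0.3))   # TRUE: equal within all.equal()'s default tolerance
abs(result - 0.3) < 1e-9         # TRUE: equal within an explicit tolerance

# Small errors build up when an operation is repeated
total = 0
for (i in 1:10) total = total + 0.1
total == 1                       # FALSE
isTRUE(all.equal(total, 1))      # TRUE
```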