Portfolio Optimization and Returns in R

Photo by Lukas from Pexels

       In this post, we will explore some finance topics— portfolio optimization and computing portfolio returns. My goal is to apply what I’ve learned in portfolio theory using R as the main tool of analysis. There are many advantages to using a GUI like MS Excel, but R has an amazing data analysis work flow— a sort of one-stop solution from initial data importation to data wrangling and transformation to computations and analysis and then finally to visualizing and reporting results. We will be using functions from several R packages— xts, PerformanceAnalytics, PortfolioAnalytics, tidyquant, tidyverse. In particular, the tidyquant package is a recent development that has markedly enriched R’s financial analytics infrastructure, enhancing its usability in finance. While I cover some theory in this post, my main focus will be on the implementation of these topics in R.

       For more readings on the theory, I recommend Essentials of Investments and Practical Portfolio Performance Measurement and Attribution. To learn more about analyzing financial data in R, there is Reproducible Finance with R, which is a very practical book with a strong emphasis on application.

       With that being said, let us get financial!

Optimal Weights for a five-asset portfolio (Minimum Variance)

       We will employ Markowitz’s Mean-Variance model as the framework for computing optimal weights, essentially treating the task as an “unconstrained” optimization problem. The objective of this optimization problem is one of minimization:

\[\begin{align*} \text{Minimize}\hspace{2mm}(\sigma^{2}=\vec{W}^{T}\sum\vec{W})) \end{align*}\]

subject to the sum of weights constraint and the box constraint: \[ \sum_{i=1}^{N} W_{i}=1 \quad \text { and } \hspace{3mm} \varepsilon_{i} \leq W_{i} \leq \delta_{i} \] where

  • where \(\varepsilon_{i}=0.05 \hspace{3mm} \delta_{i}=0.6\) are the lower and upper bounds for the weights \(W_{i}\)

       The minimization problem is a quadratic problem with linear constraints, since the variance formula is a quadratic function and the constraints are linear functions; this type of problem is well suited to be solved using a quadratic programming solver. The PortfolioAnalytics package uses ROI.plugin.quadprog, a plug-in for the “R” Optimization Infrastructure, to solve the problem. The solver can be specified with the optimize_method argument in optimize.portfolio(). If optimize_method = “ROI” is specified, a default solver will be selected based on the optimization problem. The glpk solver is the default solver for LP and MILP optimization problems. The quadprog solver is the default solver for QP optimization problems.


Implementation in R

       For this task, we will import our data from Yahoo Finance. The five assets under examination are Exchange Traded Funds, which are funds that can be traded on an exchange like a stock. Exchange-traded funds are a type of investment fund that offers the best attributes of two popular assets: they have the diversification benefits of mutual funds and the ease with which stocks are traded. However, before we import any data, we must answer the following question: In what form do we want our data to be? Since we are in the world of finance, times series is the most common type of data. R has the xts package to handle data that are indexed by date. Our task, therefore, reduces to the following:

  • Import daily prices from Yahoo Finance

  • Convert daily prices to monthly return

  • Because we will be aggregating monthly returns to form a portfolio, we will need to compute the simple returns

# Create a vector of ticker symbols
symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")
# Load data from 2012 to today
# Specify the "to = " argument to specify an end date
prices <- quantmod::getSymbols(
  Symbols = symbols,
  src = "yahoo",
  from = "2012-12-31",
  auto.assign = TRUE,
  warnings = FALSE
) %>%
  # The map function takes an anonymous function and will return a list of five
  # The function Ad() extracts the daily adjusted price series
  purrr::map(.f = ~ quantmod::Ad(get(x = .x))) %>%
  # Use reduce() to merge the elements of .x interactively
  purrr::reduce(.f = merge) %>%
  # Use a replacement function to set column names to ticker symbols
  # This function is in prefix form
  # It is equivalent to colnames(x = prices) <- value
  `colnames<-`(value = symbols)
# Keep only the last reading of each month
# We could have chosen to keep only the first reading of each month
asset_returns_xts <- xts::to.monthly(
  x = prices,
  drop.time = TRUE,
  indexAt = "lastof",
  OHLC = FALSE
) %>%
  # Compute simple returns
  # Log returns are time-additive but not portfolio additive
  PerformanceAnalytics::Return.calculate(method = "discrete") %>%
  # Drop the first row since we lose 12/31/2012
  stats::na.omit()
# Keep only the xts returns, ticker symbols, and the prices series
rm(list = setdiff(x = ls(), y = c("symbols", "prices", "asset_returns_xts")))
  • Create Portfolio object
# Examine the monthly simple returns for our five ETF's
head(x = asset_returns_xts, 5)
                  SPY         EFA         IJS          EEM           AGG
2013-01-31 0.05119052  0.03728466 0.053516334 -0.002930946 -0.0062115144
2013-02-28 0.01275821 -0.01288577 0.016306731 -0.022840526  0.0059086990
2013-03-31 0.03797176  0.01305398 0.041079862 -0.010182972  0.0009852088
2013-04-30 0.01921213  0.05018653 0.001223168  0.012158139  0.0096857020
2013-05-31 0.02360972 -0.03019051 0.042869516 -0.048279085 -0.0200111498
# Create Portfolio object which is essentially a list object
min_var_portfolio <- PortfolioAnalytics::portfolio.spec(assets = symbols)
typeof(min_var_portfolio)
[1] "list"
  • Add constraints to the portfolio object
# Add the full investment constraint that specifies that the weights must sum to 1
min_var_portfolio <- PortfolioAnalytics::add.constraint(
  portfolio = min_var_portfolio,
  type = "full_investment"
)
# Examine the constraint element by extracting min_var_portfolio[["constraints"]][[1]]
str(pluck(.x = min_var_portfolio, "constraints", 1))
List of 6
 $ type   : chr "full_investment"
 $ enabled: logi TRUE
 $ message: logi FALSE
 $ min_sum: num 1
 $ max_sum: num 1
 $ call   : language PortfolioAnalytics::add.constraint(portfolio = min_var_portfolio, type = "full_investment")
 - attr(*, "class")= chr [1:2] "weight_sum_constraint" "constraint"
# Add the box constraint that ensure the weights are between 0.1 and 0.6
min_var_portfolio <- PortfolioAnalytics::add.constraint(
  portfolio = min_var_portfolio,
  type = "box", min = 0.05, max = 0.6
)
# Examine the constraint element by extracting min_var_portfolio[["constraints"]][[2]]
str(pluck(.x = min_var_portfolio, "constraints", 2))
List of 5
 $ type   : chr "box"
 $ enabled: logi TRUE
 $ min    : Named num [1:5] 0.05 0.05 0.05 0.05 0.05
  ..- attr(*, "names")= chr [1:5] "SPY" "EFA" "IJS" "EEM" ...
 $ max    : Named num [1:5] 0.6 0.6 0.6 0.6 0.6
  ..- attr(*, "names")= chr [1:5] "SPY" "EFA" "IJS" "EEM" ...
 $ call   : language PortfolioAnalytics::add.constraint(portfolio = min_var_portfolio, type = "box",      min = 0.05, max = 0.6)
 - attr(*, "class")= chr [1:2] "box_constraint" "constraint"
  • Add objective function
# Add objective to minimize variance
min_var_portfolio <- PortfolioAnalytics::add.objective(
  portfolio = min_var_portfolio,
  # Minimize risk
  type = "risk",
  # A character corresponding to a function name, var()
  name = "var"
)
  • Optimization
# Run the optimization
global_min_portfolio <- PortfolioAnalytics::optimize.portfolio(
  R = asset_returns_xts,
  portfolio = min_var_portfolio,
  # This defaults to the "quadprog" solver
  optimize_method = "quadprog",
  # Return additional information on the path or portfolios searched
  trace = TRUE
)
# Examine returned portfolio list object
global_min_portfolio
***********************************
PortfolioAnalytics Optimization
***********************************

Call:
PortfolioAnalytics::optimize.portfolio(R = asset_returns_xts, 
    portfolio = min_var_portfolio, optimize_method = "quadprog", 
    trace = TRUE)

Optimal Weights:
  SPY   EFA   IJS   EEM   AGG 
0.213 0.087 0.050 0.050 0.600 

Objective Measure:
 StdDev 
0.01644 

Optimal Weights for the five-asset portfolio (Maximum Expected Return)

       Now that we found the global minimum variance portfolio, we may be interested in finding the maximal expected return portfolio. The objective is one of maximization:

\[\begin{align*} \text{Maximize}\hspace{2mm}(\vec{\mu}^{T}\vec{W}) \end{align*}\]

subject to the sum of weights constraint and the box constraint: \[ \sum_{i=1}^{N} W_{i}=1 \quad \text { and } \hspace{3mm} \varepsilon_{i} \leq W_{i} \leq \delta_{i} \]

       The optimization problem in this case is a linear programming problem, since the portfolio expected return formula is a linear function. This is best tackled using a linear programming solver. The package PortfolioAnalytics uses the ROI package with the glpk plugin, the GNU Linear Programming toolkit of R’s Optimization Infrastructure.


Implementation in R

  • Create portfolio object
# Create Portfolio object
max_exp_return_portfolio <- PortfolioAnalytics::portfolio.spec(assets = symbols)
  • Add constraints to the object
# Add the full investment constraint that specifies the weights must sum to 1
max_exp_return_portfolio <- PortfolioAnalytics::add.constraint(
  portfolio = max_exp_return_portfolio,
  type = "full_investment"
)
# Add the box constraint that ensure the weights are between 0.1 and 0.6
max_exp_return_portfolio <- PortfolioAnalytics::add.constraint(
  portfolio = max_exp_return_portfolio,
  type = "box", min = 0.05, max = 0.6
)
  • Add objective function
# Add objective to maximize mean returns
max_exp_return_portfolio <- PortfolioAnalytics::add.objective(
  portfolio = max_exp_return_portfolio,
  # Maximize expected returns
  type = "return",
  # A character corresponding to a function name, mean()
  name = "mean"
)
  • Optimization
# Run the optimization
global_max_portfolio <- PortfolioAnalytics::optimize.portfolio(
  R = asset_returns_xts,
  portfolio = max_exp_return_portfolio,
  # This defaults to the "glpk" solver
  optimize_method = "glpk",
  # Return additional information on the path or portfolios searched
  trace = TRUE
)
# Examine returned portfolio list object
global_max_portfolio
***********************************
PortfolioAnalytics Optimization
***********************************

Call:
PortfolioAnalytics::optimize.portfolio(R = asset_returns_xts, 
    portfolio = max_exp_return_portfolio, optimize_method = "glpk", 
    trace = TRUE)

Optimal Weights:
 SPY  EFA  IJS  EEM  AGG 
0.60 0.05 0.25 0.05 0.05 

Objective Measure:
  mean 
0.0114 

Building a portfolio

       We have found two sets of optimal weights that yield portfolios that offer the lowest possible risk and the high possible expected return given two basic constraints. Our next task is to aggregate the monthly returns of the individual ETF’s to find the monthly returns of our five-asset portfolio.

# Set optimal weights
weights <- pluck(.x = global_max_portfolio, "weights")
# Check if the weights and symbols align
tibble(weights, symbols)
# A tibble: 5 × 2
  weights symbols
    <dbl> <chr>  
1   0.6   SPY    
2   0.05  EFA    
3   0.250 IJS    
4   0.05  EEM    
5   0.05  AGG    
# Ensure that the weights vector sums up to 1
tibble(weights, symbols) %>%
  dplyr::summarize(total_weight = sum(weights))
# A tibble: 1 × 1
  total_weight
         <dbl>
1            1

The portfolio return in month, \(t\), is given by:

\[\begin{align*} r_{\text{portfolio,t}}=\sum_{i=1}^{5}W_{i}r_{i,t} \end{align*}\]

Monthly portfolio returns by hand

       We can compute portfolio monthly returns using a brute force method:

# Compute by hand
portfolio_returns_by_hand <-
  (weights[[1]] * asset_returns_xts[, 1]) +
  (weights[[2]] * asset_returns_xts[, 2]) +
  (weights[[3]] * asset_returns_xts[, 3]) +
  (weights[[4]] * asset_returns_xts[, 4]) +
  (weights[[5]] * asset_returns_xts[, 5])
# Name the series
portfolio_returns_by_hand <- `names<-`(portfolio_returns_by_hand, "Monthly portfolio returns")
# Examine
head(portfolio_returns_by_hand, 5)
           Monthly portfolio returns
2013-01-31                0.04550050
2013-02-28                0.01024073
2013-03-31                0.03324583
2013-04-30                0.01543459
2013-05-31                0.01995917

Monthly portfolio returns in xts

       Another way to compute portfolio monthly returns is to use functions from the PerformanceAnalytics package, which works well with xts objects. We also adopt monthly re-balancing as a strategy. The re-balancing of investments is the action of keeping the portfolio weights consistent with the optimal weights. Note that re-balancing the weights every month may be unrealistic; in reality, the convention is often to re-balance quarterly or annually. For this example, however, we will re-balance monthly to be consistent with our brute force computation.

# Compute monthly portfolio returns
portfolio_returns_xts_rebalanced_monthly <-
  PerformanceAnalytics::Return.portfolio(
    R = asset_returns_xts,
    weights = weights,
    # Monthly re-balancing
    reblance_on = "months",
    # Use simple/arithmetic chaining to aggregate returns
    geometric = FALSE
  ) %>%
  `colnames<-`("Monthly_portfolio_returns")
# Examine
head(portfolio_returns_xts_rebalanced_monthly, 5)
           Monthly_portfolio_returns
2013-01-31                0.04550050
2013-02-28                0.01024073
2013-03-31                0.03324583
2013-04-30                0.01543459
2013-05-31                0.01995917
  • The function Return.portfolio(R, weights = NULL, wealth.index = FALSE, contribution = FALSE, geometric = TRUE, rebalance_on = c(NA, "years", "quarters", "months", "weeks", "days"), value = 1, verbose = FALSE, ...) calculates the returns of a portfolio with the same periodicity of the returns data.

Monthly portfolio returns in the tidyverse

       The tidyverse is a collection of R packages designed with the same underlying philosophy, grammar, and data structures. Simply put, the “tidy” data structure that works well with tidyverse functions is one where every row is an observation and every column is a variable. If we re-examine the xts object called “asset_returns_xts,” we see that every column is a returns series for a particular asset. This is inconsistent with the tidyverse data structure, and so we must convert the xts object to a tidy format for our computations. Ideally, we would like to have one column called “asset” that specifies the names of the ETF instead of having five individual columns of returns data. This idea will become clearer once we convert our xts object to a tibble.

Now, examine the “tidy” data structure and compare it to the xts object created earlier:

FSA::headtail(asset_returns_long, n = 5)
          date asset       returns
1   2013-01-31   SPY  0.0511905160
2   2013-01-31   EFA  0.0372846569
3   2013-01-31   IJS  0.0535163338
4   2013-01-31   EEM -0.0029309465
5   2013-01-31   AGG -0.0062115144
521 2021-09-30   SPY -0.0224997875
522 2021-09-30   EFA -0.0017360987
523 2021-09-30   IJS -0.0316548170
524 2021-09-30   EEM -0.0198435604
525 2021-09-30   AGG -0.0001984158

Since our returns data is in a “tidy” format, computing portfolio monthly returns is very a straight forward. For those who are familiar with SQL, we are essentially using the CASE statement here.

# Use vectorized if else statements to assign weights according to the asset column
potfolio_returns_dplyr_byhand <- asset_returns_long %>%
  group_by(asset) %>%
  mutate(
    weights = dplyr::case_when(
      asset == symbols[[1]] ~ weights[[1]],
      asset == symbols[[2]] ~ weights[[2]],
      asset == symbols[[3]] ~ weights[[3]],
      asset == symbols[[4]] ~ weights[[4]],
      asset == symbols[[5]] ~ weights[[5]]
    ),
    weighted_returns = weights * returns
  ) %>%
  # Group by date
  # We need to group by date so that the aggregate sum() function is carried out by month
  # For each date, the original series has 5 weighted returns, one for each ETF
  # The results should be 1 portfolio return (the sum of the 5 weighted returns) for each month
  dplyr::group_by(date) %>%
  # Compute monthly portfolio returns
  dplyr::summarize(Monthly_portfolio_returns = sum(weighted_returns))
# Examine the data
head(potfolio_returns_dplyr_byhand, 5)
# A tibble: 5 × 2
  date       Monthly_portfolio_returns
  <date>                         <dbl>
1 2013-01-31                    0.0455
2 2013-02-28                    0.0102
3 2013-03-31                    0.0332
4 2013-04-30                    0.0154
5 2013-05-31                    0.0200
  • The function summarize(.data, ..., .groups = NULL) creates a new data frame. It will have one (or more) rows for each combination of grouping variables.

Monthly portfolio returns in tidyquant

       The tidyquant package gives us two ways to compute the portfolio monthly returns.

# Use function from the tidyquant
portfolio_returns_tq_rebalanced_monthly_method_1 <- asset_returns_long %>%
  tidyquant::tq_portfolio(
    assets_col = asset,
    returns_col = returns,
    weights = weights,
    col_rename = "Monthly_portfolio_returns",
    rebalance_on = "months"
  )
# Examine
head(portfolio_returns_tq_rebalanced_monthly_method_1, 5)
# A tibble: 5 × 2
  date       Monthly_portfolio_returns
  <date>                         <dbl>
1 2013-01-31                    0.0455
2 2013-02-28                    0.0102
3 2013-03-31                    0.0332
4 2013-04-30                    0.0154
5 2013-05-31                    0.0200
  • The function tq_portfolio(data, assets_col, returns_col, weights = NULL, col_rename = NULL, ...) aggregates a group of returns by asset into portfolio returns.

       An alternative way to load data is to use the tidyquant wrapper function, which automatically coerces the returns into a long tidy format:

# Load data
asset_returns_tq <- tidyquant::tq_get(
  x = symbols,
  get = "stock.prices",
  from = "2012-12-31"
) %>%
  # The asset column is named symbol by default (see body(tidyquant::tq_get))
  dplyr::group_by(symbol) %>%
  # Select adjusted daily prices
  tidyquant::tq_transmute(
    select = adjusted,
    col_rename = "returns",
    # This function is from quantmod
    mutate_fun = periodReturn,
    # These arguments are passed along to the mutate function quantmod::periodReturn
    period = "monthly",
    # Simple returns
    type = "arithmetic",
    # Do not return leading period data
    leading = FALSE,
    # This argument is passed along to xts::to.period, which is wrapped by quantmod::periodReturn
    # We use the last reading of each month to find percentage changes
    indexAt = "lastof"
  ) %>%
  dplyr::rename(asset = symbol) %>%
  na.omit()
  • The function tq_get(x, get = "stock.prices", complete_cases = TRUE, ...) gets data in tibble format. The most important argument is perhaps the dot-dot-dot, since these are the arguments passed to the underlying functions from the other packages that tq_get() uses. There could be multiple layers of wrapper functions, and so it is best practice to read the documentations carefully.

       A possibly more useful method of aggregating asset returns to portfolio returns is then to map a tibble of ticker symbols and weights to the tibble of ticker symbols and monthly returns:

# Create a tibble of weights
weights_tibble <- tibble(
  asset = symbols,
  weights = weights
)
head(weights_tibble, 5)
# A tibble: 5 × 2
  asset weights
  <chr>   <dbl>
1 SPY     0.6  
2 EFA     0.05 
3 IJS     0.250
4 EEM     0.05 
5 AGG     0.05 
# Map the weights to the returns column using the asset column as the identifier
portfolio_returns_tq_rebalanced_monthly_method_2 <- asset_returns_tq %>%
  tidyquant::tq_portfolio(
    assets_col = asset,
    returns_col = returns,
    weights = weights_tibble,
    col_rename = "Monthly_portfolio_returns",
    rebalance_on = "months"
  )
head(portfolio_returns_tq_rebalanced_monthly_method_2, 5)
# A tibble: 5 × 2
  date       Monthly_portfolio_returns
  <date>                         <dbl>
1 2013-01-31                    0.0455
2 2013-02-28                    0.0102
3 2013-03-31                    0.0332
4 2013-04-30                    0.0154
5 2013-05-31                    0.0200

Compare and Contrast the four methods

       We have covered a variety of different methods for aggregating asset monthly returns to portfolio monthly returns. As a sanity check, we want to ensure that these methods yield consistent results:

potfolio_returns_dplyr_byhand %>%
  # Rename the column
  rename(tidyverse_method = Monthly_portfolio_returns) %>%
  # Create three new columns that contain results for other methods
  dplyr::mutate(
    "equation_byhand" = zoo::coredata(x = portfolio_returns_by_hand),
    "tq_method_1" = portfolio_returns_tq_rebalanced_monthly_method_1[["Monthly_portfolio_returns"]],
    "tq_method_2" = portfolio_returns_tq_rebalanced_monthly_method_2[["Monthly_portfolio_returns"]],
    "xts_method" = zoo::coredata(portfolio_returns_xts_rebalanced_monthly)
  ) %>%
  purrr::modify_if(.p = is.numeric, .f = ~ round(x = .x, digits = 7)) %>%
  dplyr::select(-date) %>%
  # Examine
  head(10)
# A tibble: 10 × 5
   tidyverse_method equation_byhand[,… tq_method_1 tq_method_2 xts_method[,"Mon…
              <dbl>              <dbl>       <dbl>       <dbl>             <dbl>
 1           0.0455             0.0455      0.0455      0.0455            0.0455
 2           0.0102             0.0102      0.0102      0.0102            0.0102
 3           0.0332             0.0332      0.0332      0.0332            0.0332
 4           0.0154             0.0154      0.0154      0.0154            0.0154
 5           0.0200             0.0200      0.0200      0.0200            0.0200
 6          -0.0131            -0.0131     -0.0131     -0.0131           -0.0131
 7           0.0509             0.0509      0.0509      0.0509            0.0509
 8          -0.0292            -0.0292     -0.0292     -0.0292           -0.0292
 9           0.0436             0.0436      0.0436      0.0436            0.0436
10           0.0406             0.0406      0.0406      0.0406            0.0406
  • The function coredata(x, ...) is a generic function for extracting the core data contained in, a (more complex) object and replacing it. The replacement function is coredata<-.

       As can be seen, the results are all consistent. Sigh of relief. For the upcoming posts, I also plan on tackling other topics in portfolio theory and to build off of what I’ve covered in this post. In particular, Python also has many great libraries— NumPy, SciPy, Quandl— all of which contain great tools for financial analysis. Utilizing R’s tidyverse ecosystem and Python’s fast, memory-efficient methods, there’s great deal of content to cover.

Related