Portfolio Optimization and Returns in R
In this post, we will explore some finance topics— portfolio optimization and computing portfolio returns. My goal is to apply what I’ve learned in portfolio theory using R as the main tool of analysis. There are many advantages to using a GUI like MS Excel, but R has an amazing data analysis work flow— a sort of one-stop solution from initial data importation to data wrangling and transformation to computations and analysis and then finally to visualizing and reporting results. We will be using functions from several R packages— xts
, PerformanceAnalytics
, PortfolioAnalytics
, tidyquant
, tidyverse
. In particular, the tidyquant
package is a recent development that has markedly enriched R’s financial analytics infrastructure, enhancing its usability in finance. While I cover some theory in this post, my main focus will be on the implementation of these topics in R.
For more readings on the theory, I recommend Essentials of Investments and Practical Portfolio Performance Measurement and Attribution. To learn more about analyzing financial data in R, there is Reproducible Finance with R, which is a very practical book with a strong emphasis on application.
With that being said, let us get financial!
Optimal Weights for a five-asset portfolio (Minimum Variance)
We will employ Markowitz’s Mean-Variance model as the framework for computing optimal weights, essentially treating the task as an “unconstrained” optimization problem. The objective of this optimization problem is one of minimization:
\[\begin{align*} \text{Minimize}\hspace{2mm}(\sigma^{2}=\vec{W}^{T}\sum\vec{W})) \end{align*}\]
subject to the sum of weights constraint and the box constraint: \[ \sum_{i=1}^{N} W_{i}=1 \quad \text { and } \hspace{3mm} \varepsilon_{i} \leq W_{i} \leq \delta_{i} \] where
- where \(\varepsilon_{i}=0.05 \hspace{3mm} \delta_{i}=0.6\) are the lower and upper bounds for the weights \(W_{i}\)
The minimization problem is a quadratic problem with linear constraints, since the variance formula is a quadratic function and the constraints are linear functions; this type of problem is well suited to be solved using a quadratic programming solver. The PortfolioAnalytics
package uses ROI.plugin.quadprog
, a plug-in for the “R” Optimization Infrastructure, to solve the problem. The solver can be specified with the optimize_method argument in optimize.portfolio()
. If optimize_method = “ROI” is specified, a default solver will be selected based on the optimization problem. The glpk
solver is the default solver for LP and MILP optimization problems. The quadprog
solver is the default solver for QP optimization problems.
Implementation in R
For this task, we will import our data from Yahoo Finance. The five assets under examination are Exchange Traded Funds, which are funds that can be traded on an exchange like a stock. Exchange-traded funds are a type of investment fund that offers the best attributes of two popular assets: they have the diversification benefits of mutual funds and the ease with which stocks are traded. However, before we import any data, we must answer the following question: In what form do we want our data to be? Since we are in the world of finance, times series is the most common type of data. R has the xts
package to handle data that are indexed by date. Our task, therefore, reduces to the following:
Import daily prices from Yahoo Finance
Convert daily prices to monthly return
Because we will be aggregating monthly returns to form a portfolio, we will need to compute the simple returns
# Create a vector of ticker symbols
symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")
# Load data from 2012 to today
# Specify the "to = " argument to specify an end date
prices <- quantmod::getSymbols(
Symbols = symbols,
src = "yahoo",
from = "2012-12-31",
auto.assign = TRUE,
warnings = FALSE
) %>%
# The map function takes an anonymous function and will return a list of five
# The function Ad() extracts the daily adjusted price series
purrr::map(.f = ~ quantmod::Ad(get(x = .x))) %>%
# Use reduce() to merge the elements of .x interactively
purrr::reduce(.f = merge) %>%
# Use a replacement function to set column names to ticker symbols
# This function is in prefix form
# It is equivalent to colnames(x = prices) <- value
`colnames<-`(value = symbols)
# Keep only the last reading of each month
# We could have chosen to keep only the first reading of each month
asset_returns_xts <- xts::to.monthly(
x = prices,
drop.time = TRUE,
indexAt = "lastof",
OHLC = FALSE
) %>%
# Compute simple returns
# Log returns are time-additive but not portfolio additive
PerformanceAnalytics::Return.calculate(method = "discrete") %>%
# Drop the first row since we lose 12/31/2012
stats::na.omit()
# Keep only the xts returns, ticker symbols, and the prices series
rm(list = setdiff(x = ls(), y = c("symbols", "prices", "asset_returns_xts")))
- Create Portfolio object
# Examine the monthly simple returns for our five ETF's
head(x = asset_returns_xts, 5)
SPY EFA IJS EEM AGG
2013-01-31 0.05119012 0.03728447 0.05351525 -0.00293138 -0.0062122794
2013-02-28 0.01275911 -0.01288565 0.01630715 -0.02284013 0.0059090583
2013-03-31 0.03797120 0.01305385 0.04107958 -0.01018292 0.0009854246
2013-04-30 0.01921222 0.05018654 0.00122343 0.01215845 0.0096857443
2013-05-31 0.02360991 -0.03019031 0.04286969 -0.04827924 -0.0200115718
# Create Portfolio object which is essentially a list object
min_var_portfolio <- PortfolioAnalytics::portfolio.spec(assets = symbols)
typeof(min_var_portfolio)
[1] "list"
- Add constraints to the portfolio object
# Add the full investment constraint that specifies that the weights must sum to 1
min_var_portfolio <- PortfolioAnalytics::add.constraint(
portfolio = min_var_portfolio,
type = "full_investment"
)
# Examine the constraint element by extracting min_var_portfolio[["constraints"]][[1]]
str(pluck(.x = min_var_portfolio, "constraints", 1))
List of 6
$ type : chr "full_investment"
$ enabled: logi TRUE
$ message: logi FALSE
$ min_sum: num 1
$ max_sum: num 1
$ call : language PortfolioAnalytics::add.constraint(portfolio = min_var_portfolio, type = "full_investment")
- attr(*, "class")= chr [1:2] "weight_sum_constraint" "constraint"
# Add the box constraint that ensure the weights are between 0.1 and 0.6
min_var_portfolio <- PortfolioAnalytics::add.constraint(
portfolio = min_var_portfolio,
type = "box", min = 0.05, max = 0.6
)
# Examine the constraint element by extracting min_var_portfolio[["constraints"]][[2]]
str(pluck(.x = min_var_portfolio, "constraints", 2))
List of 5
$ type : chr "box"
$ enabled: logi TRUE
$ min : Named num [1:5] 0.05 0.05 0.05 0.05 0.05
..- attr(*, "names")= chr [1:5] "SPY" "EFA" "IJS" "EEM" ...
$ max : Named num [1:5] 0.6 0.6 0.6 0.6 0.6
..- attr(*, "names")= chr [1:5] "SPY" "EFA" "IJS" "EEM" ...
$ call : language PortfolioAnalytics::add.constraint(portfolio = min_var_portfolio, type = "box", min = 0.05, max = 0.6)
- attr(*, "class")= chr [1:2] "box_constraint" "constraint"
- Add objective function
# Add objective to minimize variance
min_var_portfolio <- PortfolioAnalytics::add.objective(
portfolio = min_var_portfolio,
# Minimize risk
type = "risk",
# A character corresponding to a function name, var()
name = "var"
)
- Optimization
# Run the optimization
global_min_portfolio <- PortfolioAnalytics::optimize.portfolio(
R = asset_returns_xts,
portfolio = min_var_portfolio,
# This defaults to the "quadprog" solver
optimize_method = "quadprog",
# Return additional information on the path or portfolios searched
trace = TRUE
)
# Examine returned portfolio list object
global_min_portfolio
***********************************
PortfolioAnalytics Optimization
***********************************
Call:
PortfolioAnalytics::optimize.portfolio(R = asset_returns_xts,
portfolio = min_var_portfolio, optimize_method = "quadprog",
trace = TRUE)
Optimal Weights:
SPY EFA IJS EEM AGG
0.2352 0.0546 0.0500 0.0602 0.6000
Objective Measure:
StdDev
0.02121
Optimal Weights for the five-asset portfolio (Maximum Expected Return)
Now that we found the global minimum variance portfolio, we may be interested in finding the maximal expected return portfolio. The objective is one of maximization:
\[\begin{align*} \text{Maximize}\hspace{2mm}(\vec{\mu}^{T}\vec{W}) \end{align*}\]
subject to the sum of weights constraint and the box constraint: \[ \sum_{i=1}^{N} W_{i}=1 \quad \text { and } \hspace{3mm} \varepsilon_{i} \leq W_{i} \leq \delta_{i} \]
The optimization problem in this case is a linear programming problem, since the portfolio expected return formula is a linear function. This is best tackled using a linear programming solver. The package PortfolioAnalytics
uses the ROI package with the glpk plugin
, the GNU Linear Programming toolkit of R’s Optimization Infrastructure.
Implementation in R
- Create portfolio object
# Create Portfolio object
max_exp_return_portfolio <- PortfolioAnalytics::portfolio.spec(assets = symbols)
- Add constraints to the object
# Add the full investment constraint that specifies the weights must sum to 1
max_exp_return_portfolio <- PortfolioAnalytics::add.constraint(
portfolio = max_exp_return_portfolio,
type = "full_investment"
)
# Add the box constraint that ensure the weights are between 0.1 and 0.6
max_exp_return_portfolio <- PortfolioAnalytics::add.constraint(
portfolio = max_exp_return_portfolio,
type = "box", min = 0.05, max = 0.6
)
- Add objective function
# Add objective to maximize mean returns
max_exp_return_portfolio <- PortfolioAnalytics::add.objective(
portfolio = max_exp_return_portfolio,
# Maximize expected returns
type = "return",
# A character corresponding to a function name, mean()
name = "mean"
)
- Optimization
# Run the optimization
global_max_portfolio <- PortfolioAnalytics::optimize.portfolio(
R = asset_returns_xts,
portfolio = max_exp_return_portfolio,
# This defaults to the "glpk" solver
optimize_method = "glpk",
# Return additional information on the path or portfolios searched
trace = TRUE
)
# Examine returned portfolio list object
global_max_portfolio
***********************************
PortfolioAnalytics Optimization
***********************************
Call:
PortfolioAnalytics::optimize.portfolio(R = asset_returns_xts,
portfolio = max_exp_return_portfolio, optimize_method = "glpk",
trace = TRUE)
Optimal Weights:
SPY EFA IJS EEM AGG
0.60 0.05 0.25 0.05 0.05
Objective Measure:
mean
0.01037
Building a portfolio
We have found two sets of optimal weights that yield portfolios that offer the lowest possible risk and the high possible expected return given two basic constraints. Our next task is to aggregate the monthly returns of the individual ETF’s to find the monthly returns of our five-asset portfolio.
# Set optimal weights
weights <- pluck(.x = global_max_portfolio, "weights")
# Check if the weights and symbols align
tibble(weights, symbols)
# A tibble: 5 × 2
weights symbols
<dbl> <chr>
1 0.6 SPY
2 0.05 EFA
3 0.250 IJS
4 0.05 EEM
5 0.05 AGG
# Ensure that the weights vector sums up to 1
tibble(weights, symbols) %>%
dplyr::summarize(total_weight = sum(weights))
# A tibble: 1 × 1
total_weight
<dbl>
1 1
The portfolio return in month, \(t\), is given by:
\[\begin{align*} r_{\text{portfolio,t}}=\sum_{i=1}^{5}W_{i}r_{i,t} \end{align*}\]
Monthly portfolio returns by hand
We can compute portfolio monthly returns using a brute force method:
# Compute by hand
portfolio_returns_by_hand <-
(weights[[1]] * asset_returns_xts[, 1]) +
(weights[[2]] * asset_returns_xts[, 2]) +
(weights[[3]] * asset_returns_xts[, 3]) +
(weights[[4]] * asset_returns_xts[, 4]) +
(weights[[5]] * asset_returns_xts[, 5])
# Name the series
portfolio_returns_by_hand <- `names<-`(portfolio_returns_by_hand, "Monthly portfolio returns")
# Examine
head(portfolio_returns_by_hand, 5)
Monthly portfolio returns
2013-01-31 0.04549992
2013-02-28 0.01024142
2013-03-31 0.03324543
2013-04-30 0.01543473
2013-05-31 0.01995931
Monthly portfolio returns in xts
Another way to compute portfolio monthly returns is to use functions from the PerformanceAnalytics
package, which works well with xts
objects. We also adopt monthly re-balancing as a strategy. The re-balancing of investments is the action of keeping the portfolio weights consistent with the optimal weights. Note that re-balancing the weights every month may be unrealistic; in reality, the convention is often to re-balance quarterly or annually. For this example, however, we will re-balance monthly to be consistent with our brute force computation.
# Compute monthly portfolio returns
portfolio_returns_xts_rebalanced_monthly <-
PerformanceAnalytics::Return.portfolio(
R = asset_returns_xts,
weights = weights,
# Monthly re-balancing
reblance_on = "months",
# Use simple/arithmetic chaining to aggregate returns
geometric = FALSE
) %>%
`colnames<-`("Monthly_portfolio_returns")
# Examine
head(portfolio_returns_xts_rebalanced_monthly, 5)
Monthly_portfolio_returns
2013-01-31 0.04549992
2013-02-28 0.01024142
2013-03-31 0.03324543
2013-04-30 0.01543473
2013-05-31 0.01995931
- The function
Return.portfolio(R, weights = NULL, wealth.index = FALSE, contribution = FALSE, geometric = TRUE, rebalance_on = c(NA, "years", "quarters", "months", "weeks", "days"), value = 1, verbose = FALSE, ...)
calculates the returns of a portfolio with the same periodicity of the returns data.
Monthly portfolio returns in the tidyverse
The tidyverse
is a collection of R packages designed with the same underlying philosophy, grammar, and data structures. Simply put, the “tidy” data structure that works well with tidyverse functions is one where every row is an observation and every column is a variable. If we re-examine the xts
object called “asset_returns_xts,” we see that every column is a returns series for a particular asset. This is inconsistent with the tidyverse data structure, and so we must convert the xts object to a tidy format for our computations. Ideally, we would like to have one column called “asset” that specifies the names of the ETF instead of having five individual columns of returns data. This idea will become clearer once we convert our xts object to a tibble.
Now, examine the “tidy” data structure and compare it to the xts object created earlier:
FSA::headtail(asset_returns_long, n = 5)
date asset returns
1 2013-01-31 SPY 0.051190115
2 2013-01-31 EFA 0.037284472
3 2013-01-31 IJS 0.053515246
4 2013-01-31 EEM -0.002931380
5 2013-01-31 AGG -0.006212279
696 2024-08-31 SPY 0.023365580
697 2024-08-31 EFA 0.032603319
698 2024-08-31 IJS -0.013794372
699 2024-08-31 EEM 0.009778770
700 2024-08-31 AGG 0.014614500
Since our returns data is in a “tidy” format, computing portfolio monthly returns is very a straight forward. For those who are familiar with SQL, we are essentially using the CASE statement here.
# Use vectorized if else statements to assign weights according to the asset column
potfolio_returns_dplyr_byhand <- asset_returns_long %>%
group_by(asset) %>%
mutate(
weights = dplyr::case_when(
asset == symbols[[1]] ~ weights[[1]],
asset == symbols[[2]] ~ weights[[2]],
asset == symbols[[3]] ~ weights[[3]],
asset == symbols[[4]] ~ weights[[4]],
asset == symbols[[5]] ~ weights[[5]]
),
weighted_returns = weights * returns
) %>%
# Group by date
# We need to group by date so that the aggregate sum() function is carried out by month
# For each date, the original series has 5 weighted returns, one for each ETF
# The results should be 1 portfolio return (the sum of the 5 weighted returns) for each month
dplyr::group_by(date) %>%
# Compute monthly portfolio returns
dplyr::summarize(Monthly_portfolio_returns = sum(weighted_returns))
# Examine the data
head(potfolio_returns_dplyr_byhand, 5)
# A tibble: 5 × 2
date Monthly_portfolio_returns
<date> <dbl>
1 2013-01-31 0.0455
2 2013-02-28 0.0102
3 2013-03-31 0.0332
4 2013-04-30 0.0154
5 2013-05-31 0.0200
- The function
summarize(.data, ..., .groups = NULL)
creates a new data frame. It will have one (or more) rows for each combination of grouping variables.
Monthly portfolio returns in tidyquant
The tidyquant
package gives us two ways to compute the portfolio monthly returns.
# Use function from the tidyquant
portfolio_returns_tq_rebalanced_monthly_method_1 <- asset_returns_long %>%
tidyquant::tq_portfolio(
assets_col = asset,
returns_col = returns,
weights = weights,
col_rename = "Monthly_portfolio_returns",
rebalance_on = "months"
)
# Examine
head(portfolio_returns_tq_rebalanced_monthly_method_1, 5)
# A tibble: 5 × 2
date Monthly_portfolio_returns
<date> <dbl>
1 2013-01-31 0.0455
2 2013-02-28 0.0102
3 2013-03-31 0.0332
4 2013-04-30 0.0154
5 2013-05-31 0.0200
- The function
tq_portfolio(data, assets_col, returns_col, weights = NULL, col_rename = NULL, ...)
aggregates a group of returns by asset into portfolio returns.
An alternative way to load data is to use the tidyquant
wrapper function, which automatically coerces the returns into a long tidy format:
# Load data
asset_returns_tq <- tidyquant::tq_get(
x = symbols,
get = "stock.prices",
from = "2012-12-31"
) %>%
# The asset column is named symbol by default (see body(tidyquant::tq_get))
dplyr::group_by(symbol) %>%
# Select adjusted daily prices
tidyquant::tq_transmute(
select = adjusted,
col_rename = "returns",
# This function is from quantmod
mutate_fun = periodReturn,
# These arguments are passed along to the mutate function quantmod::periodReturn
period = "monthly",
# Simple returns
type = "arithmetic",
# Do not return leading period data
leading = FALSE,
# This argument is passed along to xts::to.period, which is wrapped by quantmod::periodReturn
# We use the last reading of each month to find percentage changes
indexAt = "lastof"
) %>%
dplyr::rename(asset = symbol) %>%
na.omit()
- The function
tq_get(x, get = "stock.prices", complete_cases = TRUE, ...)
gets data in tibble format. The most important argument is perhaps the dot-dot-dot, since these are the arguments passed to the underlying functions from the other packages thattq_get()
uses. There could be multiple layers of wrapper functions, and so it is best practice to read the documentations carefully.
A possibly more useful method of aggregating asset returns to portfolio returns is then to map a tibble of ticker symbols and weights to the tibble of ticker symbols and monthly returns:
# Create a tibble of weights
weights_tibble <- tibble(
asset = symbols,
weights = weights
)
head(weights_tibble, 5)
# A tibble: 5 × 2
asset weights
<chr> <dbl>
1 SPY 0.6
2 EFA 0.05
3 IJS 0.250
4 EEM 0.05
5 AGG 0.05
# Map the weights to the returns column using the asset column as the identifier
portfolio_returns_tq_rebalanced_monthly_method_2 <- asset_returns_tq %>%
tidyquant::tq_portfolio(
assets_col = asset,
returns_col = returns,
weights = weights_tibble,
col_rename = "Monthly_portfolio_returns",
rebalance_on = "months"
)
head(portfolio_returns_tq_rebalanced_monthly_method_2, 5)
# A tibble: 5 × 2
date Monthly_portfolio_returns
<date> <dbl>
1 2013-01-31 0.0455
2 2013-02-28 0.0102
3 2013-03-31 0.0332
4 2013-04-30 0.0154
5 2013-05-31 0.0200
Compare and Contrast the four methods
We have covered a variety of different methods for aggregating asset monthly returns to portfolio monthly returns. As a sanity check, we want to ensure that these methods yield consistent results:
potfolio_returns_dplyr_byhand %>%
# Rename the column
rename(tidyverse_method = Monthly_portfolio_returns) %>%
# Create three new columns that contain results for other methods
dplyr::mutate(
"equation_byhand" = zoo::coredata(x = portfolio_returns_by_hand),
"tq_method_1" = portfolio_returns_tq_rebalanced_monthly_method_1[["Monthly_portfolio_returns"]],
"tq_method_2" = portfolio_returns_tq_rebalanced_monthly_method_2[["Monthly_portfolio_returns"]],
"xts_method" = zoo::coredata(portfolio_returns_xts_rebalanced_monthly)
) %>%
purrr::modify_if(.p = is.numeric, .f = ~ round(x = .x, digits = 7)) %>%
dplyr::select(-date) %>%
# Examine
head(10)
# A tibble: 10 × 5
tidyverse_method equation_byhand[,"Monthly portfoli…¹ tq_method_1 tq_method_2
<dbl> <dbl> <dbl> <dbl>
1 0.0455 0.0455 0.0455 0.0455
2 0.0102 0.0102 0.0102 0.0102
3 0.0332 0.0332 0.0332 0.0332
4 0.0154 0.0154 0.0154 0.0154
5 0.0200 0.0200 0.0200 0.0200
6 -0.0131 -0.0131 -0.0131 -0.0131
7 0.0509 0.0509 0.0509 0.0509
8 -0.0292 -0.0292 -0.0292 -0.0292
9 0.0436 0.0436 0.0436 0.0436
10 0.0406 0.0406 0.0406 0.0406
# ℹ abbreviated name: ¹equation_byhand[,"Monthly portfolio returns"]
# ℹ 1 more variable: xts_method <dbl[,1]>
- The function
coredata(x, ...)
is a generic function for extracting the core data contained in, a (more complex) object and replacing it. The replacement function iscoredata<-
.
As can be seen, the results are all consistent. Sigh of relief. For the upcoming posts, I also plan on tackling other topics in portfolio theory and to build off of what I’ve covered in this post. In particular, Python also has many great libraries— NumPy, SciPy, Quandl— all of which contain great tools for financial analysis. Utilizing R’s tidyverse ecosystem and Python’s fast, memory-efficient methods, there’s great deal of content to cover.