diff --git a/.DS_Store b/.DS_Store index 1cbe1de6..10bae7ee 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/_freeze/r/capital-asset-pricing-model/execute-results/html.json b/_freeze/r/capital-asset-pricing-model/execute-results/html.json new file mode 100644 index 00000000..99a2cc43 --- /dev/null +++ b/_freeze/r/capital-asset-pricing-model/execute-results/html.json @@ -0,0 +1,17 @@ +{ + "hash": "a27fbd74deab7a6c98e3794d0047cf17", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: The Capital Asset Pricing Model\nmetadata:\n pagetitle: The CAPM with R\n description-meta: Learn how to use the programming language R for estimating the CAPM for asset evaluation.\n---\n\n\n\n\nKey questions: \n\n- What is the expected return of an asset?\n- Which portfolios should investors hold?\n\nCAPM is an **equilibrium model**\n\n- Extends MPT by including **systematic risk**\n- Adds simplifying assumptions to model **rational investors**\n- Developed by: [Sharpe (1964)](https://onlinelibrary.wiley.com/doi/10.1111/j.1540-6261.1964.tb02865.x), [Lintner (1965)](https://www.jstor.org/stable/1924119?origin=crossref) & [Mossin (1966)](https://www.jstor.org/stable/1910098?origin=crossref)\n\nThe CAPM in a nutshell: investors demand a **compensation for risk**\n\n- Expected Return = Risk-Free Rate + Compensation for Market Risk\n- **Risk-Free Rate**: return on an investment without risk\n- **Market Risk**: how much asset returns co-move with the overall market\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nlibrary(ggrepel)\n```\n:::\n\n\n\n\n## Asset Returns & Volatilities\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsymbols <- download_data(\n type = \"constituents\",\n index = \"Dow Jones Industrial Average\"\n)\n\nprices_daily <- download_data(\n type = \"stock_prices\", symbol = symbols$symbol,\n start_date = \"2019-10-01\", end_date = \"2024-09-30\"\n) |> \n select(symbol, date, price = adjusted_close)\n```\n:::\n\n\n\n\nCalculate *daily* returns\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns_daily <- prices_daily |>\n group_by(symbol) |> \n mutate(ret = price / lag(price) - 1) |>\n ungroup() |> \n select(symbol, date, ret) |> \n drop_na(ret) |> \n arrange(symbol, date)\n```\n:::\n\n\n\n\nPlot risk & return @fig-300\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nassets <- returns_daily |> \n group_by(symbol) |> \n summarize(mu = mean(ret), sigma = sd(ret))\n\nfig_vola_return <- assets |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_point() + \n geom_label_repel(data = assets |> filter(symbol %in% c(\"BA\", \"NVDA\")),\n aes(label = symbol)) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(x = \"Volatility\", y = \"Average return\",\n title = \"Average returns and volatilities of Dow index constituents\") \nfig_vola_return\n```\n\n::: {.cell-output-display}\n![Average returns and volatilities are based on returns adjusted for dividend payments and stock splits.](capital-asset-pricing-model_files/figure-html/fig-300-1.png){#fig-300 fig-alt='Title: Average returns and volatilities of Dow index constituents. The figure shows a scatter plot with volatities on the horizontal axis and average returns on the vertical axis. The stocks Nvidia and Boeing are highlighted because they exhibit the highest and lower average returns, respectively.' 
width=2100}\n:::\n:::\n\n\n\n\nDoes high risk bring high returns?\n\nBoeing (BA) vs Nvidia (NVDA)\n\n- Company-specific events might affect stock prices\n- Examples: CEO resignation, product launch, earnings report\n- Idiosyncratic events don't impact the overall market\n- This asset-specific risk can be eliminated through diversification\n\nFocus on **systematic risk** that affects all assets in the market\n\nSystematic vs idiosyncratic risk: Investors **dislike risk**\n\nDifferent sources of risk\n\n- **Systematic risk**: all assets are exposed to it, cannot be diversified away\n- **Idiosyncratic risk**: unique to particular asset, can be diversified away\n\n## Portfolio Return & Variance\n\n$\\text{Expected Portfolio Return} = \\omega'\\mu$\n\n -\t$\\omega$: vector of asset weights\n - $\\mu$: vector of expected return of assets\n\n$\\text{Portfolio Variance} = \\omega' \\Sigma \\omega$\n\n - $\\Sigma$: variance-covariance matrix\n\nIntroducing the risk-free asset: Allocate capital between risk-free asset & risky portfolio\n\n$$\\mu_c = c \\omega'\\mu + (1-c)r_f$$\n\n- $\\mu_{c}$: combined portfolio return\n- $r_f$: return of risk-free asset (e.g. government bond)\n- $c$ fraction of capital in risky portfolio\n\nRisk-free asset has 0 volatility.\n\n- Portfolio risk $\\sigma_c$ is measured by volatility of risky asset\n- $\\sigma_c= c\\sqrt{\\omega' \\Sigma \\omega}$ $\\Rightarrow$ $c = \\frac{\\sigma_c}{\\sqrt{\\omega' \\Sigma \\omega}}$\n\nAllows us to derive a **Capital Allocation Line** (CAL) \n\n$$\\mu_c = r_f +\\sigma_c \\frac{\\omega'\\mu-r_f}{\\sqrt{\\omega' \\Sigma \\omega}}$$\n\nSlope of CAL is called **Sharpe ratio**\n\n$$\\text{Sharpe ratio} = \\frac{\\omega'\\mu-r_f}{\\sqrt{\\omega' \\Sigma \\omega}}$$\n\n- Measures **excess return per unit of risk**\n- Higher ratio indicates more attractive **risk-adjusted return**\n\nCalculate the risk-free rate: 13-week T-bill rate (^IRX) is quoted in annualized percentage yields. Convert annualized to daily rates (252 trading days). 
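For example, an annualized quote of, say, 5.2 (i.e., 5.2%) corresponds to a daily rate of $(1 + 0.052)^{1/252} - 1 \approx 0.0002$, or roughly 0.02% per day.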
Note: this approach has a 99% correlation with Fama-French risk free rate.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrisk_free_daily <- download_data(\n type = \"stock_prices\", symbol = \"^IRX\", \n start_date = \"2019-10-01\", end_date = \"2024-09-30\"\n) |> \n mutate(\n risk_free = (1 + adjusted_close / 100)^(1 / 252) - 1\n ) |> \n select(date, risk_free) |> \n drop_na()\n```\n:::\n\n\n\n\nCreate example portfolios\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmu <- assets$mu\nsigma <- returns_daily |> \n pivot_wider(names_from = symbol, values_from = ret) |> \n select(-date) |> \n cov()\n```\n:::\n\n\n\n\nPortfolio with equal weights\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnumber_of_assets <- nrow(assets)\nomega_ew <- rep(1 / number_of_assets, number_of_assets)\n\nsummary_ew <- tibble(\n mu = as.numeric(t(omega_ew) %*% mu),\n sigma = as.numeric(sqrt(t(omega_ew) %*% sigma %*% omega_ew)),\n type = \"Equal-Weighted Portfolio\"\n)\n```\n:::\n\n\n\n\nPortfolio with random weights\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\nomega_random <- runif(number_of_assets, -1, 1)\nomega_random <- omega_random / sum(omega_random)\n\nsummary_random <- tibble(\n mu = as.numeric(t(omega_random) %*% mu),\n sigma = as.numeric(sqrt(t(omega_random) %*% sigma %*% omega_random)),\n type = \"Randomly-Weighted Portfolio\"\n)\n```\n:::\n\n\n\n\nRisk-free asset \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsummary_risk_free <- tibble(\n mu = mean(risk_free_daily$risk_free),\n sigma = 0,\n type = \"Risk-Free Asset\"\n)\n\nsummaries <- bind_rows(assets, summary_ew, summary_random, summary_risk_free)\n```\n:::\n\n\n\n\n\nPlot CALs. First introduce helper function to calculate Sharpe Ratio. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncalculate_sharpe_ratio <- function(mu, sigma, risk_free) {\n as.numeric(mu - risk_free) / sigma \n}\n\nsummaries <- summaries |> \n mutate(\n sharpe_ratio = if_else(\n str_detect(type, \"Portfolio\"), \n calculate_sharpe_ratio(mu, sigma, risk_free = summary_risk_free$mu),\n NA\n ),\n risk_free = summary_risk_free$mu\n )\n```\n:::\n\n\n\n\nSee @fig-301\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_cal <- summaries |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_abline(aes(intercept = risk_free, slope = sharpe_ratio, color = type),\n linetype = \"dashed\", linewidth = 1) +\n geom_point(data = summaries |> filter(is.na(type))) +\n geom_point(data = summaries |> filter(!is.na(type)), shape = 4, size = 4) + \n geom_label_repel(aes(label = type)) + \n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(\n x = \"Volatility\", y = \"Average return\",\n title = \"Average returns and volatilities of Dow index constituents with capital allocation lines\"\n ) +\n theme(legend.position = \"none\")\nfig_cal\n```\n\n::: {.cell-output-display}\n![Points correspond to individual assets, crosses to portfolios.](capital-asset-pricing-model_files/figure-html/fig-301-1.png){#fig-301 fig-alt='Title: Average returns and volatilities of Dow index constituents with capital allocation lines. he figure shows a scatter plot with volatities on the horizontal axis and average returns on the vertical axis. In addition, the figure shows capital allocation lines that connect the risk-free asset to the equal-weighted and randomly-weighted portfolio, respectively.' 
width=2100}\n:::\n:::\n\n\n\n\n## The Tangency Portfolio\n\nThe portfolio that **maximizes Sharpe ratio**\n\n$$\\max_w \\frac{\\omega' \\mu - r_f}{\\sqrt{\\omega' \\Sigma \\omega}}$$\nwhile staying **fully invested**\n\n$$ \\omega'\\iota = 1$$\n\nis called the **tangency portfolio**\n\nCalculate the tangency portfolio\n\n**Analytic solution** for tangency portfolio (see [here](https://bookdown.org/compfinezbook/introcompfinr/Efficient-portfolios-of.html#computing-the-tangency-portfolio-using-matrix-algebra))\n\n$$\\omega_{tan}=\\frac{\\Sigma^{-1}(\\mu-r_f)}{\\iota'\\Sigma^{-1}(\\mu-r_f)}$$\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nomega_tangency <- solve(sigma) %*% (mu - summary_risk_free$mu)\nomega_tangency <- as.vector(omega_tangency / sum(omega_tangency))\n\nsummary_tangency <- tibble(\n mu = as.numeric(t(omega_tangency) %*% mu),\n sigma = as.numeric(sqrt(t(omega_tangency) %*% sigma %*% omega_tangency)),\n type = \"Tangency Portfolio\",\n sharpe_ratio = calculate_sharpe_ratio(mu, sigma, risk_free = summary_risk_free$mu),\n risk_free = summary_risk_free$mu\n)\n```\n:::\n\n\n\n\n## The Capital Market Line \n\nCombination of risk-free asset & the tangency portfolio $\\omega_{tan}$\n\n$$\\mu_{c} = r_f +\\sigma_c \\frac{\\omega_{tan}'\\mu-r_f}{\\sqrt{\\omega_{tan}' \\Sigma \\omega_{tan}}}$$\n\nis called the **Capital Market Line** (CML)\n\n
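As a quick numeric check (a minimal sketch reusing the objects defined above, with an illustrative 50/50 split), a mix of the risk-free asset and the tangency portfolio indeed lies on this line:\n\n::: {.cell}\n\n```{.r .cell-code}\n# Minimal sketch: a 50/50 mix of the risk-free asset and the tangency\n# portfolio (illustrative split) should lie exactly on the CML\nc_risky <- 0.5\nmu_mix <- c_risky * summary_tangency$mu + (1 - c_risky) * summary_risk_free$mu\nsigma_mix <- c_risky * summary_tangency$sigma\nmu_cml <- summary_risk_free$mu + sigma_mix * summary_tangency$sharpe_ratio\nall.equal(mu_mix, mu_cml)\n```\n:::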
\n\nCML describes **best risk-return trade-off** for portfolios that contain risk-free asset & tangency portfolio\n\nPlot the CML\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsummaries <- bind_rows(summaries, summary_tangency)\n\nfig_cml <- summaries |> \n  ggplot(aes(x = sigma, y = mu)) +\n  geom_abline(aes(intercept = risk_free, slope = sharpe_ratio, color = type),\n              linetype = \"dashed\", linewidth = 1) +\n  geom_point(data = summaries |> filter(is.na(type))) +\n  geom_point(data = summaries |> filter(!is.na(type)), shape = 4, size = 4) + \n  ggrepel::geom_label_repel(aes(label = type)) + \n  scale_x_continuous(labels = percent) +\n  scale_y_continuous(labels = percent) + \n  labs(x = \"Volatility\", y = \"Average return\",\n       title = \"Average returns and volatilities of Dow index constituents with the capital market line\") +\n  theme(legend.position = \"none\")\nfig_cml\n```\n\n::: {.cell-output-display}\n![Points correspond to individual assets, crosses to portfolios.](capital-asset-pricing-model_files/figure-html/fig-302-1.png){#fig-302 fig-alt='Title: Average returns and volatilities of Dow index constituents with the capital market line. The figure shows a scatter plot with volatilities on the horizontal axis and average returns on the vertical axis. In addition, the figure shows capital allocation lines that connect the risk-free asset to the equal-weighted, randomly-weighted, and tangency portfolio, respectively.' width=2100}\n:::\n:::\n\n\n\n\nPortfolios vs individual assets. In the CAPM model:\n\n- Investors prefer to hold any portfolio on the CML over individual assets or any other portfolio\n- All rational investors hold the tangency portfolio\n- Return of an individual asset can be benchmarked against the tangency portfolio\n- Risk of an asset is proportional to its covariance with the tangency portfolio\n\nExpected excess returns and the tangency portfolio. 
**Expected excess return of asset** $i$ is\n\n$$\mu_i - r_f = \beta_i \cdot (\omega_{tan}'\mu - r_f)$$\n\nwhere\n\n$$\beta_i = \frac{\text{Cov}(r_i, \omega_{tan}'r)}{\omega_{tan}' \Sigma \omega_{tan}}$$\n\nis called the **asset beta**\n\nCalculate excess returns\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntangency_weights <- tibble(\n  symbol = assets$symbol, \n  omega_tangency = omega_tangency\n)\n\nreturns_tangency_daily <- returns_daily |> \n  left_join(tangency_weights, join_by(symbol)) |> \n  group_by(date) |> \n  summarize(mkt_ret = weighted.mean(ret, omega_tangency))\n\nreturns_excess_daily <- returns_daily |> \n  left_join(returns_tangency_daily, join_by(date)) |> \n  left_join(risk_free_daily, join_by(date)) |> \n  mutate(ret_excess = ret - risk_free,\n         mkt_excess = mkt_ret - risk_free) |> \n  select(symbol, date, ret_excess, mkt_excess)\n```\n:::\n\n\n\n\nEstimate Asset Betas\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nestimate_beta <- function(data) {\n  fit <- lm(\"ret_excess ~ mkt_excess - 1\", data = data)\n  coefficients(fit)\n}\n \nbeta_results <- returns_excess_daily |> \n  nest(data = -symbol) |> \n  mutate(beta = map_dbl(data, estimate_beta))\n```\n:::\n\n\n\n\nPlot asset betas\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_betas <- beta_results |> \n  ggplot(aes(x = beta, y = fct_reorder(symbol, beta))) +\n  geom_col() +\n  labs(\n    x = \"Estimated asset beta\", y = NULL, \n    title = \"Estimated asset betas based on the tangency portfolio for Dow index constituents\"\n  )\nfig_betas\n```\n\n::: {.cell-output-display}\n![Estimates are based on returns adjusted for dividend payments and stock splits.](capital-asset-pricing-model_files/figure-html/fig-303-1.png){#fig-303 fig-alt='Title: Estimated asset betas based on the tangency portfolio for Dow index constituents. The figure shows a bar chart with estimated asset betas for each Dow index constituent.' width=2100}\n:::\n:::\n\n\n\n\nAsset returns vs systematic risk: the assets all fall onto the 45-degree line, as they should according to the CAPM, see @fig-304\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nassets <- assets |> \n  mutate(mu_excess = mu - summary_risk_free$mu) |> \n  left_join(beta_results, join_by(symbol))\n \nfig_betas_returns <- assets |> \n  ggplot(aes(x = beta, y = mu_excess)) + \n  geom_abline(intercept = 0, \n              slope = summary_tangency$mu - summary_risk_free$mu) + \n  geom_point() +\n  geom_label_repel(data = assets |> filter(symbol %in% c(\"BA\", \"NVDA\")),\n                   aes(label = symbol)) + \n  scale_y_continuous(labels = percent) + \n  labs(\n    x = \"Estimated asset beta\", y = \"Average return\", \n    title = \"Estimated CAPM-betas and average returns for Dow index constituents\"\n  )\nfig_betas_returns\n```\n\n::: {.cell-output-display}\n![Estimates are based on returns adjusted for dividend payments and stock splits and using the tangency portfolio as a measure for the market.](capital-asset-pricing-model_files/figure-html/fig-304-1.png){#fig-304 fig-alt='Title: Estimated CAPM-betas and average returns for Dow index constituents. The figure shows a scatter plot with estimated asset betas on the horizontal axis and average returns on the vertical axis. All points fall onto the 45-degree line.' 
width=2100}\n:::\n:::\n\n\n\n\n## CAPM in Practice\n\nHow to estimate betas in practice?\n\nCalculating the **tangency portfolio** can be **cumbersome**\n\n- What is the correct asset universe?\n- How to estimate $\\mu$ and $\\Sigma$ for many assets?\n\nIn the CAPM: **market portfolio = tangency portfolio**\n\n- Skip calculation of tangency portfolio weights\n- Use portfolios weighted by market capitalization \n\nKey assumptions behind CAPM:\n\n- Equilibrium model in a single-period economy\n-\tNo transaction costs or taxes\n-\tRisk-free borrowing and lending are available to all investors\n-\tInvestors share homogeneous expectations\n-\tInvestors maximize returns for limited level of risk\n\nCAPM is a **foundation for other models** because of its simplicity\n\nThe Security Market Line (SML). **Expected return of asset** $i$ is\n\n$$\\mu_i = r_f + \\beta_i \\cdot (\\mu_m - r_f)$$\n\nwhere\n\n$$\\beta_i = \\frac{\\sigma_{im}}{\\sigma_m^2}$$\n\n- $\\mu_m$: expected market returns\n- $\\sigma_{im}$: covariance of asset $i$ with market\n- $\\sigma_m$: market volatility\n\nEvaluate asset performance with the SML: **Alpha** is difference between actual excess return & expected return\n\n$$\\mu_i - r_f = \\alpha_i + \\beta_i \\cdot (\\mu_m - r_f)$$\n\nAlpha is **performance adjusted for market risk**\n\n- **Positive alpha**: outperformance relative to market\n- **Negative alpha**: underperformance relative to market\n\nEstimate Asset Alphas & Beta. Regression model:\n\n$$r_{i,t} - r_{f,t} = \\hat{\\alpha}_i + \\hat{\\beta}_i \\cdot (r_{m,t} - r_{f,t} ) + \\hat{\\varepsilon}_{i,t} $$\n\n- $r_{i,t}$: actual returns of asset $i$ on day $t$\n- $r_{m,t}$: actual market returns on day $t$\n\nDownload excess market returns\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors <- download_data(\n type = \"factors_ff_5_2x3_daily\", \n start_date = \"2019-10-01\", end_date = \"2024-09-30\"\n) |> \n select(date, mkt_excess, risk_free)\n```\n:::\n\n\n\n\nEstimate alphas & betas\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns_excess_daily <- returns_daily |> \n left_join(factors, join_by(date)) |> \n mutate(ret_excess = ret - risk_free) |> \n select(symbol, date, ret_excess, mkt_excess)\n\nestimate_capm <- function(data) {\n fit <- lm(\"ret_excess ~ mkt_excess\", data = data)\n tibble(\n coefficient = c(\"alpha\", \"beta\"),\n estimate = coefficients(fit),\n t_statistic = summary(fit)$coefficients[, \"t value\"]\n )\n}\n \ncapm_results <- returns_excess_daily |> \n nest(data = -symbol) |> \n mutate(capm = map(data, estimate_capm)) |> \n unnest(capm) |> \n select(symbol, coefficient, estimate, t_statistic)\n```\n:::\n\n\n\n\nPlot asset alphas\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_alpha <- capm_results |> \n filter(coefficient == \"alpha\") |> \n mutate(is_significant = abs(t_statistic) >= 1.96) |> \n ggplot(aes(x = estimate, y = fct_reorder(symbol, estimate), \n fill = is_significant)) +\n geom_col() +\n scale_x_continuous(labels = percent) + \n labs(\n x = \"Estimated asset alphas\", y = NULL, fill = \"Significant at 95%?\",\n title = \"Estimated CAPM alphas for Dow index constituents\"\n )\nfig_alpha\n```\n\n::: {.cell-output-display}\n![Estimates are based on returns adjusted for dividend payments and stock splits and using the Fama-French market excess returns as a measure for the market.](capital-asset-pricing-model_files/figure-html/fig-305-1.png){#fig-305 fig-alt='Title: Estimated CAPM alphas for Dow index constituents. 
The figure shows a bar chart with estimated alphas and indicates whether an estimate is statistically significant at 95%. Only Nvidia exhibits a statistically significant positive alpha.' width=2100}\n:::\n:::\n\n\n\n\n## Shortcomings & Extensions\n\nPopular shortcomings of CAPM\n\n- Impossible to create **universal measure for market**\n  - Market definition might depend on context (e.g. S&P 500, DAX, TOPIX)\n- **Beta** might **not be stable** over time\n  - Company operations, leverage or competitive environment might change beta\n- **Systematic risk** might not be the **only factor**\n  - Poor empirical performance in explaining small-cap or high-growth returns\n- Many more: behavioral biases, heterogeneous preferences, liquidity, etc.\n\nAlternatives & extensions.\n\n[Fama-French 3-Factor model](https://en.wikipedia.org/wiki/Fama%E2%80%93French_three-factor_model) extends CAPM\n\n- Outperformance of small vs big companies (see [tidy-finance.org](size-sorts-and-p-hacking.qmd))\n- Outperformance of high vs low value companies (see [tidy-finance.org](value-and-bivariate-sorts.qmd))\n \nFama-French 5-Factor model extends 3-factor model (see [tidy-finance.org](replicating-fama-and-french-factors.qmd))\n\n- Outperformance of companies with robust vs weak operating profitability\n- Outperformance of companies with conservative vs aggressive investment\n\nMany more: [consumption CAPM](https://en.wikipedia.org/wiki/Consumption-based_capital_asset_pricing_model), [conditional CAPM](https://www.jstor.org/stable/2329301), [Carhart Four-Factor Model](https://en.wikipedia.org/wiki/Carhart_four-factor_model), [Q-Factor Model & investment CAPM](https://global-q.org/index.html)\n\n## Key takeaways\n\n- CAPM is an **equilibrium model** in a **frictionless** economy\n- Investors hold mix of **market portfolio & risk-free asset**\n- **Expected return** of a stock is a linear function of its **beta**\n- Beta is the **sensitivity** of a stock to **market movements**\n- Beta estimation via **linear regression** using historical data\n\n## Exercises\n\n1. 
...\n", + "supporting": [ + "capital-asset-pricing-model_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/fig-300-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/fig-300-1.png new file mode 100644 index 00000000..fbbb2318 Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/fig-300-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/fig-301-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/fig-301-1.png new file mode 100644 index 00000000..62bc0123 Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/fig-301-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/fig-302-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/fig-302-1.png new file mode 100644 index 00000000..5cefde47 Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/fig-302-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/fig-303-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/fig-303-1.png new file mode 100644 index 00000000..1290a054 Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/fig-303-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/fig-304-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/fig-304-1.png new file mode 100644 index 00000000..26ccc78f Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/fig-304-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/fig-305-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/fig-305-1.png new file mode 100644 index 00000000..56414b40 Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/fig-305-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-11-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-11-1.png new file mode 100644 index 00000000..510f0063 Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-11-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-13-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-13-1.png new file mode 100644 index 00000000..476dd21b Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-13-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-16-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 00000000..1bbfd08a Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-16-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-17-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-17-1.png new file mode 100644 index 00000000..39db18cc Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-17-1.png differ diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-20-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-20-1.png new file mode 100644 index 00000000..56414b40 Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-20-1.png differ 
diff --git a/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-4-1.png b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 00000000..461e346f Binary files /dev/null and b/_freeze/r/capital-asset-pricing-model/figure-html/unnamed-chunk-4-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/execute-results/html.json b/_freeze/r/discounted-cash-flow-analysis/execute-results/html.json new file mode 100644 index 00000000..b8fd0bf9 --- /dev/null +++ b/_freeze/r/discounted-cash-flow-analysis/execute-results/html.json @@ -0,0 +1,17 @@ +{ + "hash": "56a2b044540c4d9af0239547115c19bc", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: Discounted Cash Flow Analysis\nmetadata:\n pagetitle: DCF with R\n description-meta: Learn how to use the programming language R to value companies using discounted cash flow analysis.\n---\n\n\n\n\nIn this chapter, we address a fundamental question: what is the value of a company? Company valuation is a critical tool that helps us determine the economic value of a business. Whether it’s for investment decisions, mergers and acquisitions, or financial reporting, understanding a company’s value is essential. But valuation isn’t just about assigning a number - it’s about providing a framework for making informed decisions. For example, investors use valuation to identify whether a stock is under- or over-valued. Companies rely on valuation for strategic decisions, like pricing an acquisition or preparing for an IPO.\n\nThere are several approaches to valuation, each suited to different purposes and scenarios: \n\n- Market-based valuation compares the company to others in the market, often using multiples like the Price-to-Earnings ratio or Enterprise Value-to-EBITDA. It’s quick and intuitive but relies heavily on the availability of comparable data.\n- Asset-based valuation focuses on the net value of a company’s assets. For instance, the book value method considers what the company would be worth if it sold off all its assets and settled its liabilities. It’s straightforward but doesn’t always capture intangible assets like brand value or future growth potential.\n- Income-based valuation puts the focus on the company’s ability to generate future earnings. A popular method in this category is Discounted Cash Flow (DCF), which estimates value based on expected future cash flows discounted to present value. This method is comprehensive but requires robust assumptions about the future.\n\nWe focus on DCF analysis in this chapter for a couple of reasons:\n\n- DCF accounts for the time value of money—a dollar today is worth more than a dollar in the future. By discounting future cash flows back to the present, we ensure that our valuation reflects the cost of waiting and the risk involved.\n- Unlike methods that only rely on historical data, DCF addresses future cash flows. This forward-looking approach makes it especially useful for dynamic businesses where past performance doesn’t fully capture future potential.\n- DCF is applicable across industries and company sizes, whether you’re valuing a tech startup or a mature manufacturing firm. It’s also adaptable for companies with varying capital structures or growth trajectories.\n- DCF isn’t limited to valuing companies - it’s also a great tool for assessing the viability of individual projects.\n\nBecause it focuses on long-term cash flows and risk factors, DCF serves as a foundation for long-term strategic decision making. 
It helps management, investors, and analysts take a holistic view of a company’s future value.\n\nIn its essence, DCF comprises three key components. The first component is forecasted free cash flows (FCF), which represent the expected future earnings of the company. FCF measure the cash that remains after accounting for operating expenses, taxes, investments, and changes in working capital. They give us a clear picture of the cash available for distribution to investors, making them a key indicator of value.\n\nNext, we have the continuation value, also known as the terminal value. This captures the value of the business beyond the explicit forecast period. Since forecasting cash flows far into the future is inherently uncertain, the terminal value accounts for the bulk of a company’s value in many DCF analyses.\n\nThe final component is the discount rate, which adjusts future cash flows to their present value by accounting for risk and the time value of money. Typically, we use the Weighted Average Cost of Capital (WACC), which combines the costs of equity and debt, weighted by their proportion in the company’s capital structure. In practice, getting the WACC right is crucial, as small changes in the discount rate can have a significant impact on the final valuation.\n\nIn this chapter, we rely on the following packages to build a simple DCF analysis:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nlibrary(fmpapi)\n```\n:::\n\n\n\n\n## Prepare Data\n\n- Import data using [Financial Modeling Prep (FMP) API](https://site.financialmodelingprep.com/developer/docs)\n- R package: [tidy-finance/r-fmpapi](https://github.com/tidy-finance/r-fmpapi)\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsymbol <- \"MSFT\"\n\nincome_statements <- fmp_get(\"income-statement\", symbol, list(period = \"annual\", limit = 5))\ncash_flow_statements <- fmp_get(\"cash-flow-statement\", symbol, list(period = \"annual\", limit = 5))\n```\n:::\n\n\n\n\nFCF is the cash that a company generates after accounting for outflows to support operations and maintain capital assets. It represents the cash available to investors - both equity holders and debt holders - after a company has met its operational and capital expenditure needs. 
There are multiple ways to calculate FCF and we use the following definition^[See [investopedia.com](https://www.investopedia.com/terms/f/freecashflow.asp) for alternative definitions.]\n\n$$\\text{FCF} = \\text{EBIT} + \\text{Depreciation & Amortization} - \\text{Taxes} + \\Delta \\text{Working Capital} - \\text{CAPEX}$$\nLet’s break down the formula for FCF step by step:\n\n- EBIT (Earnings Before Interest and Taxes): represents the company’s core operating profit, excluding the effects of financing and tax expenses.\n-\tDepreciation & Amortization: non-cash expenses that allocate the cost of tangible and intangible assets over their useful lives.\n-\tTaxes: the amount a company pays to the government, calculated on its taxable income.\n-\t$\\Delta$ Working Capital: the change in current assets minus current liabilities, reflecting the cash needed to support daily operations.\n-\tCAPEX (Capital Expenditures): funds used by a company to acquire or upgrade physical assets like property, buildings, or equipment.\n\nWe can calculate FCF using these items as follows: \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndcf_data <- income_statements |> \n mutate(\n ebit = net_income + income_tax_expense - interest_expense - interest_income\n ) |> \n select(\n year = calendar_year, \n ebit, revenue, depreciation_and_amortization, taxes = income_tax_expense\n ) |> \n left_join(\n cash_flow_statements |> \n select(year = calendar_year, \n delta_working_capital = change_in_working_capital,\n capex = capital_expenditure), join_by(year)\n ) |> \n mutate(\n fcf = ebit + depreciation_and_amortization - taxes + delta_working_capital - capex\n ) |> \n arrange(year) \n```\n:::\n\n\n\n\n## Forecast Free-Cash-Flow\n\nNow that we’ve covered the components of FCF, let’s discuss how to forecast FCF over the projection period. Forecasting FCF typically involves a balance between data-driven analysis and informed judgment. While historical data provides a foundation, financial analysts often need to make educated assumptions about the future. The first step is to express the components of FCF as ratios relative to revenue. This ratio-based approach makes it easier to link the components of FCF to a single driving variable: revenue.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndcf_data <- dcf_data |> \n mutate(\n revenue_growth = revenue / lag(revenue) - 1,\n operating_margin = ebit / revenue,\n da_margin = depreciation_and_amortization / revenue,\n taxes_to_revenue = taxes / revenue,\n delta_working_capital_to_revenue = delta_working_capital / revenue,\n capex_to_revenue = capex / revenue\n )\n\nfig_financial_ratios <- dcf_data |> \n pivot_longer(cols = c(operating_margin:capex_to_revenue)) |>\n ggplot(aes(x = year, y = value, color = name)) +\n geom_line() +\n scale_x_continuous(breaks = pretty_breaks()) +\n scale_y_continuous(labels = percent) +\n labs(\n x = NULL, y = NULL, color = NULL,\n title = \"Key financial ratios of Microsoft between 2020 and 2024\"\n )\nfig_financial_ratios\n```\n\n::: {.cell-output-display}\n![Ratios are based on financial statements as provided through the FMP API.](discounted-cash-flow-analysis_files/figure-html/fig-500-1.png){#fig-500 fig-alt='Title: Key financial ratios of Microsoft between 2020 and 2024. The figure shows a line chart with years on the horizontal axis and financial ratios on the vertical axis.' 
width=2100}\n:::\n:::\n\n\n\n\nNext, analysts use their understanding of the company’s operations and industry dynamics to make subjective guesses about how these ratios might change in the future.\n\nWe define examplary ratio dynamics in @fig-501. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndcf_data_forecast_ratios <- tribble(\n ~year, ~operating_margin, ~da_margin, ~taxes_to_revenue, ~delta_working_capital_to_revenue, ~capex_to_revenue,\n 2025, 0.41, 0.09, 0.08, 0.001, -0.2,\n 2026, 0.42, 0.09, 0.07, 0.001, -0.22,\n 2027, 0.43, 0.09, 0.06, 0.001, -0.2,\n 2028, 0.44, 0.09, 0.06, 0.001, -0.18,\n 2029, 0.45, 0.09, 0.06, 0.001, -0.16\n) |> \n mutate(type = \"Forecast\")\n\ndcf_data <- dcf_data |> \n mutate(type = \"Realized\") |> \n bind_rows(dcf_data_forecast_ratios)\n\nfig_financial_ratios_forecast <- dcf_data |> \n pivot_longer(cols = c(operating_margin:capex_to_revenue)) |> \n ggplot(aes(x = year, y = value, color = name, linetype = rev(type))) +\n geom_line() +\n scale_x_continuous(breaks = pretty_breaks()) +\n scale_y_continuous(labels = percent) +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"Key financial ratios and ad-hoc forecasts of Microsoft between 2020 and 2029\"\n )\nfig_financial_ratios_forecast\n```\n\n::: {.cell-output-display}\n![Realized ratios are based on financial statements as provided through the FMP API, while forecasts are manually defined.](discounted-cash-flow-analysis_files/figure-html/fig-501-1.png){#fig-501 fig-alt='Title: Key financial ratios and ad-hoc forecasts of Microsoft between 2020 and 2029. The figure shows a line chart with years on the horizontal axis and financial ratios and their forecasts on the vertical axis.' width=2100}\n:::\n:::\n\n\n\n\nFinally, revenue growth projections are typically based on macroeconomic factors, such as GDP growth or industry trends, as well as company-specific factors like market share and product pipelines. A great starting point for revenue growth projections is the [IMF World Economic Outlook (WEO)](https://www.imf.org/en/Publications/WEO/weo-database/2024/October/weo-report?c=111,&s=NGDP_RPCH,&sy=2020&ey=2029&ssm=0&scsm=1&scc=0&ssd=1&ssc=0&sic=1&sort=country&ds=.&br=1) data for the US. This resource provides in-depth analyses of the global economy, including trends and forecasts for key metrics like GDP growth. However, keep in mind that macroeconomic data and hence GDP forecasts always have some inherent lag. \n\nA simple method is to model revenue growth as a linear function of GDP growth. 
The idea is straightforward: if GDP grows by, e.g., 3%, you might project company revenue to grow by a similar rate, adjusted for factors like market share and industry dynamics.\n\nThe following code chunk implements this approach:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngdp_growth <- tibble(\n  year = 2020:2029,\n  gdp_growth = c(-0.02163, 0.06055,\t0.02512, 0.02887, 0.02765, 0.02153, 0.02028, 0.02120, 0.02122, 0.02122)\n)\n\ndcf_data <- dcf_data |> \n  left_join(gdp_growth, join_by(year)) \n\nrevenue_growth_model <- dcf_data |> \n  lm(revenue_growth ~ gdp_growth, data = _) |> \n  coefficients()\n \ndcf_data <- dcf_data |> \n  mutate(\n    revenue_growth_modeled = revenue_growth_model[1] + revenue_growth_model[2] * gdp_growth,\n    revenue_growth = if_else(type == \"Forecast\", revenue_growth_modeled, revenue_growth) \n  ) \n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_growth <- dcf_data |> \n  filter(year >= 2021) |> \n  pivot_longer(cols = c(revenue_growth, gdp_growth)) |> \n  ggplot(aes(x = year, y = value, color = name, linetype = rev(type))) +\n  geom_line() +\n  scale_x_continuous(breaks = pretty_breaks()) +\n  scale_y_continuous(labels = percent) +\n  labs(\n    x = NULL, y = NULL, color = NULL, linetype = NULL,\n    title = \"GDP growth and Microsoft revenue growth and modeled forecasts between 2020 and 2029\"\n  )\nfig_growth\n```\n\n::: {.cell-output-display}\n![Realized revenue growth rates are based on financial statements as provided through the FMP API, while forecasts are modeled using IMF WEO forecasts.](discounted-cash-flow-analysis_files/figure-html/fig-502-1.png){#fig-502 fig-alt='Title: GDP growth and Microsoft revenue growth and modeled forecasts between 2020 and 2029. The figure shows a line chart with years on the horizontal axis and GDP growth, revenue growth, and their forecasts on the vertical axis.' width=2100}\n:::\n:::\n\n\n\n\nAn alternative approach would be to look up consensus analyst forecasts, but this data is typically proprietary.\n\nNow that we have all required components, we can finally calculate FCF forecasts.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndcf_data$revenue_growth[1] <- 0\ndcf_data$revenue <- dcf_data$revenue[1] * cumprod(1 + dcf_data$revenue_growth)\n\ndcf_data <- dcf_data |> \n  mutate(\n    ebit = operating_margin * revenue,\n    depreciation_and_amortization = da_margin * revenue,\n    taxes = taxes_to_revenue * revenue,\n    delta_working_capital = delta_working_capital_to_revenue * revenue,\n    capex = capex_to_revenue * revenue,\n    fcf = ebit + depreciation_and_amortization - taxes + delta_working_capital - capex\n  )\n```\n:::\n\n\n\n\n@fig-503 visualizes these FCF forecasts.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_fcf <- dcf_data |>\n  ggplot(aes(x = year, y = fcf / 1e9)) +\n  geom_col(aes(fill = type)) +\n  scale_x_continuous(breaks = pretty_breaks()) +\n  scale_y_continuous(labels = comma) + \n  labs(\n    x = NULL, y = \"Free Cash Flow (in B USD)\", fill = NULL,\n    title = \"Actual and predicted free cash flow for Microsoft from 2020 to 2029\"\n  )\nfig_fcf\n```\n\n::: {.cell-output-display}\n![Realized free cash flows are based on financial statements as provided through the FMP API, while forecasts combine the ad-hoc ratio forecasts with modeled revenue growth.](discounted-cash-flow-analysis_files/figure-html/fig-503-1.png){#fig-503 fig-alt='Title: Actual and predicted free cash flow for Microsoft from 2020 to 2029. The figure shows a bar chart with years on the horizontal axis and actual and predicted free cash flow on the vertical axis.' 
width=2100}\n:::\n:::\n\n\n\n\n## Continuation Value\n\nNow, let’s discuss how to compute the continuation value, also known as the terminal value. This value is a critical component of the DCF analysis, as it often represents a significant portion of a company’s overall valuation. One typical approach is the Perpetuity Growth Model, which simply assumes that free cash flows grow at a constant rate indefinitely:\n\n$$TV_{T} = \frac{FCF_{T+1}}{r - g},$$\nwhere $r$ is the discount rate, typically measured by the WACC, and $g$ is the perpetual growth rate. \n\nFor our application, we need to make an assumption about the perpetual growth rate. For instance, average GDP growth over the last 20 years is a sensible assumption (the nominal growth rate is about 4% for the US).\n\nAn alternative method is the exit multiple approach, which estimates the continuation value based on a multiple of EBITDA or another financial metric at the end of the forecast period.^[See [corporatefinanceinstitute.com](https://corporatefinanceinstitute.com/resources/valuation/exit-multiple/) for an intuitive explanation of the exit multiple approach.]\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncompute_terminal_value <- function(last_fcf, growth_rate, discount_rate){\n  last_fcf * (1 + growth_rate) / (discount_rate - growth_rate)\n}\n\nlast_fcf <- tail(dcf_data$fcf, 1)\nterminal_value <- compute_terminal_value(last_fcf, 0.04, 0.08)\nterminal_value / 1e9\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 7564\n```\n\n\n:::\n:::\n\n\n\n\n## Discount Rates\n\nAs a last critical step, we need to bring future cash flows to their present value using a discount factor. In company valuation settings, the WACC typically serves this purpose. The WACC represents the average rate of return required by all the company’s investors, including both equity holders and debt holders. It reflects the overall cost of financing the company’s operations and serves as the discount rate in our valuation. The definition is as follows:\n\n$$WACC = \frac{E}{D+E} \cdot r^E + \frac{D}{D+E} \cdot r^D \cdot (1 - \tau),$$\n\nwhere $E$ is the market value of the company’s equity with required return $r^E$, $D$ is the market value of the company's debt with pre-tax return $r^D$, and $\tau$ is the tax rate.\n\nWhile you can often find estimates of WACC from financial databases or analysts’ reports, sometimes you may need to calculate it yourself. Let’s walk through the practical steps to estimate WACC using real-world data: \n\n- $E$ is typically measured as the market value of the company’s equity. One common approach is to calculate it by subtracting net debt (total debt minus cash) from the enterprise value.\n- $D$ is often measured using the book value of the company’s debt. While this might not perfectly reflect market conditions, it’s a practical starting point when market data is unavailable.\n- The Capital Asset Pricing Model (CAPM) is a popular method to estimate the cost of equity $r^E$. It considers the risk-free rate, the equity risk premium, and the company’s beta. For a detailed guide on how to estimate the CAPM, we refer to Chapter [Capital Asset Pricing Model](capital-asset-pricing-model.qmd).\n- The return on debt $r^D$ can also be estimated in different ways. For instance, effective interest rates can be calculated as the ratio of interest expense to total debt from financial statements. This gives you a real-world measure of what the company is currently paying. 
Alternatively, you can look up corporate bond spreads for companies in the same rating group. For highly rated companies like Microsoft, this would reflect their low-risk profile and correspondingly low borrowing costs.\n\nIf you'd rather not estimate WACC manually, there are excellent resources available to help you find industry-specific discount rates. One of the most widely used sources is Aswath Damodaran’s [database](https://pages.stern.nyu.edu/~adamodar/New_Home_Page/datacurrent.html). He maintains an extensive database that provides a wealth of financial data, including estimated discount rates, cash flows, growth rates, multiples, and more. What makes his database particularly valuable is its level of detail and coverage of multiple industries and regions. For example, if you’re analyzing a company in the Computer Services sector, as we do here, you can look up the industry’s average WACC and use it as a benchmark for your analysis. The following code chunk downloads the WACC data and extracts the value for this industry:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(readxl)\n\nfile <- tempfile(fileext = \"xls\")\n\nurl <- \"https://pages.stern.nyu.edu/~adamodar/pc/datasets/wacc.xls\"\ndownload.file(url, file)\nwacc_raw <- read_xls(file, sheet = 2, skip = 18)\nunlink(file)\n\nwacc <- wacc_raw |> \n  filter(`Industry Name` == \"Computer Services\") |> \n  pull(`Cost of Capital`)\n```\n:::\n\n\n\n\n## Compute DCF Value\n\n$$\n\\text{Total DCF Value} = \\sum_{t=1}^{\\text{T}} \\frac{\\text{FCF}_t}{(1 + \\text{WACC})^t} + \\frac{\\text{TV}_{T}}{(1 + \\text{WACC})^{\\text{T}}}\n$$\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nforecasted_years <- 5\n\ncompute_dcf <- function(wacc, growth_rate, years = 5) {\n  free_cash_flow <- dcf_data$fcf\n  last_fcf <- tail(free_cash_flow, 1)\n  terminal_value <- compute_terminal_value(last_fcf, growth_rate, wacc)\n  \n  present_value_fcf <- free_cash_flow / (1 + wacc)^(1:years)\n  present_value_tv <- terminal_value / (1 + wacc)^years\n  total_dcf_value <- sum(present_value_fcf) + present_value_tv\n  total_dcf_value\n}\n\ncompute_dcf(wacc, 0.03) / 1e9\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 6084\n```\n\n\n:::\n:::\n\n\n\n\n## Sensitivity Analysis\n\nOne of the key challenges in a DCF analysis is that it relies heavily on assumptions about the future, growth, and risk. This is where sensitivity analysis comes into play, helping us understand how changes in these assumptions can impact our valuation. For instance, small changes in assumptions like operating margin or CAPEX as a percentage of revenue might lead to noticeable shifts in FCF projections, or overestimating or underestimating revenue growth might have a cascading effect on cash flow projections.\n\nWe focus on what are commonly the biggest drivers of valuation and can have dramatic effects on the calculation: the perpetual growth rate and the WACC. 
The following code chunk implements different WACC and growth scenarios:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwacc_range <- seq(0.06, 0.08, by = 0.01)\ngrowth_rate_range <- seq(0.02, 0.04, by = 0.01)\n\nsensitivity <- expand_grid(\n wacc = wacc_range,\n growth_rate = growth_rate_range\n) |>\n mutate(value = pmap_dbl(list(wacc, growth_rate), compute_dcf))\n\nfig_sensitivity <- sensitivity |> \n mutate(value = round(value / 1e9, 0)) |> \n ggplot(aes(x = wacc, y = growth_rate, fill = value)) +\n geom_tile() +\n geom_text(aes(label = comma(value)), color = \"white\") +\n scale_x_continuous(labels = percent) + \n scale_y_continuous(labels = percent) +\n scale_fill_continuous(labels = comma) + \n labs(\n title = \"DCF value of Microsoft for different WACC and growth scenarios\",\n x = \"WACC\",\n y = \"Perpetual growth rate\",\n fill = \"Company value\"\n ) + \n guides(fill = guide_colorbar(barwidth = 15, barheight = 0.5))\nfig_sensitivity\n```\n\n::: {.cell-output-display}\n![DCF value combines data from FMP API, ad-hoc forecasts of financial ratios, and IMF WEO growth forecasts.](discounted-cash-flow-analysis_files/figure-html/fig-504-1.png){#fig-504 fig-alt='Title: DCF value of Microsoft for different WACC and growth scenarios. The figure shows a tile chart different values of WACC on the horizontal axis and perpetual growth rates on the vertical axis. Each tile shows a corresponding DCF value, illustrating the sensitivity of the DCF analysis to assumptions.' width=2100}\n:::\n:::\n\n\n\n\n@fig-504 shows that ...\n\n## From DCF to Equity Value\n\nDCF model provides an estimate for value of operations\n\n$$\\text{Equity Value} = \\text{DCF Value} + \\text{Non-Operating Assets} - \\text{Value of Debt}$$\n\n- Non-Operating Assets: not essential to operations, but generate income (e.g., marketable securities, vacant land, idle equipment)\n- Value of Debt: in theory market value of total debt, in practice book debt\n\n## Key takeaways\n\nThe DCF method provides a structured framework for making informed decisions. By breaking down the valuation process into clear, logical steps, it helps analysts and decision-makers focus on the fundamentals that drive value. DCF stands out because it values companies or projects based on their projected future cash flows, rather than just historical data or market sentiment. This forward-looking approach makes it especially useful for long-term strategic decisions. The three core elements that we discussed in this chapter are: free cash flow, continuation value, and the WACC. Finally, the quality of a DCF analysis critically depends on the assumptions we make. Key drivers like financial ratios, revenue growth, and WACC require careful validation, as even small errors can lead to significant deviations in valuation.\n\n## Exercises\n\n1. 
...\n\n", + "supporting": [ + "discounted-cash-flow-analysis_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-500-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-500-1.png new file mode 100644 index 00000000..ec8c370d Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-500-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-501-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-501-1.png new file mode 100644 index 00000000..4605c224 Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-501-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-502-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-502-1.png new file mode 100644 index 00000000..8f3c467a Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-502-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-503-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-503-1.png new file mode 100644 index 00000000..7b894616 Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-503-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-504-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-504-1.png new file mode 100644 index 00000000..b1df1f8d Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/fig-504-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-13-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-13-1.png new file mode 100644 index 00000000..b7912c80 Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-13-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-4-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 00000000..42d79f34 Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-4-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-5-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-5-1.png new file mode 100644 index 00000000..60af24ab Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-5-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-7-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-7-1.png new file mode 100644 index 00000000..768b26cc Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-7-1.png differ diff --git a/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-9-1.png b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-9-1.png new file mode 100644 index 00000000..7b894616 Binary files /dev/null and b/_freeze/r/discounted-cash-flow-analysis/figure-html/unnamed-chunk-9-1.png differ diff --git a/_freeze/r/financial-ratios/execute-results/html.json b/_freeze/r/financial-ratios/execute-results/html.json new file mode 100644 index 00000000..befc48d4 --- /dev/null +++ b/_freeze/r/financial-ratios/execute-results/html.json 
@@ -0,0 +1,15 @@ +{ + "hash": "4cf3e31894787f5ca7ac4bafc8ed4199", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: Financial Ratios\nmetadata:\n pagetitle: Financial Ratios with R\n description-meta: Learn how to use the programming language R to analyze companies using financial ratios.\ncache: true\n---\n\n\n\n\nIn this chapter, we explore the role of financial statements and financial ratios in analyzing companies. Financial statements are essential because they serve as a standardized source of information, providing a consistent framework that enables investors, creditors, and analysts to assess a company’s financial health and performance. All companies are legally required to file financial statements, which adds a layer of accountability and reliability to the information they disclose. Public companies, in particular, are subject to even more rigorous standards: they must have their financial statements independently audited, which helps ensure accuracy and integrity in reporting. Additionally, in the United States, public companies are required by the Securities and Exchange Commission, or SEC, to file their financials quarterly and annually. This requirement ensures that investors and analysts have timely information, allowing them to make informed decisions throughout the year.\n\nFinancial ratios are tools for understanding a company’s financial health and performance. They facilitate:\n\n- Benchmarking: By comparing ratios across different companies, investors and analysts can assess how a company stands relative to its peers within the same industry.\n-\tTrend Analysis: Evaluating a company’s financial ratios over multiple periods reveals trends and patterns, offering insights into its financial trajectory and operational effectiveness.\n-\tPortfolio Selection: Investors often use financial ratios to screen and select high-quality firms for their portfolios. \n-\tAsset Pricing Models: Financial ratios are integral to factor models in asset pricing, such as the Fama-French three-factor model or Q-factors)\n-\tCapital Structure and Risk Management: Ratios aid in assessing its risk profile and predicting potential financial distress. \n\nThis chapter is based on the following packages: \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nlibrary(ggrepel)\nlibrary(fmpapi)\n```\n:::\n\n\n\n\n## Balance Sheet Statements\n\nThe balance sheet provides a snapshot of a company's financial standing at a specific point in time. It shows the company’s assets, liabilities, and shareholders’ equity, according to the fundamental accounting equation: \n\n$$\\text{Assets} = \\text{Liabilities} + \\text{Shareholders’ Equity}$$\n\nAssets are the resources owned by the company that are expected to provide future economic benefits, liabilities are obligations the company owes to external parties, and shareholders’ equity is the residual interest in the assets of the company after deducting liabilities. 
@fig-400 shows a stylized balance sheet with these components.\n\n![A stylized representation of a balance sheet statement.](../assets/img/balance-sheet.svg){#fig-400 alt=\"A stylized representation of a balance sheet statement.\"}\n\nLet us dive deeper into the asset side, which typically comprises the following parts: current assets, which are expected to be converted into cash or used up within one year, such as cash, accounts receivable (money owed to a business for goods or services), and inventory; non-current assets, which are long-term investments and property, plant, and equipment (PP&E) that are not expected to be liquidated within a year; and intangible assets, which are non-physical assets such as a patent, brand, trademark, or copyright. @fig-401 shows a stylized breakdown of the asset side. \n \n![A stylized representation of a breakdown of assets on a balance sheet.](../assets/img/assets.svg){#fig-401 alt=\"A stylized representation of a breakdown of assets on a balance sheet.\"}\n\nLiabilities are typically split into current liabilities, which are debts or obligations due within one year, including accounts payable and short-term loans, and non-current liabilities, which are long-term debts and obligations due beyond one year, such as long-term debt or deferred taxes. @fig-402 illustrates this breakdown in liabilities. \n\n![A stylized representation of a breakdown of liabilities on a balance sheet.](../assets/img/liabilities.svg){#fig-402 alt=\"A stylized representation of a breakdown of liabilities on a balance sheet.\"}\n\nLastly, equity is typically divided into retained earnings, which are accumulated profits that have been reinvested in the business rather than distributed as dividends, common stock, which is capital contributed by shareholders, and preferred stock, which is a different type of equity that represents ownership of a company and the right to claim income from the company’s operations, but with limited voting rights. @fig-403 shows the corresponding equity breakdown.\n\n![A stylized representation of a breakdown of equity on a balance sheet.](../assets/img/equity.svg){#fig-403 alt=\"A stylized representation of a breakdown of equity on a balance sheet.\"}\n\n@fig-404 shows an example balance sheet from Microsoft in 2023. 
\n\n![A screenshot of the balance sheet statement of Microsoft in 2023.](../assets/img/balance-sheet-msft.png){#fig-404 alt=\"A screenshot of the balance sheet statement of Microsoft in 2023.\"}\n\nWe can use the `fmpapi` package to download financial statements:\n\n- SEC provides interface to [search filings](https://www.sec.gov/search-filings)\n- [Financial Modeling Prep (FMP) API](https://site.financialmodelingprep.com/developer/docs) provides programming interface\n- Free tier: 250 calls / day, 5 year historical fundamental data\n- R package: [tidy-finance/r-fmpapi](https://github.com/tidy-finance/r-fmpapi)\n- Install via `install.packages(\"fmpapi\")`\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfmp_get(\n resource = \"balance-sheet-statement\", \n symbol = \"MSFT\", \n params = list(period = \"annual\", limit = 5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 54\n date symbol reported_currency cik filling_date\n \n1 2024-06-30 MSFT USD 0000789019 2024-07-30 \n2 2023-06-30 MSFT USD 0000789019 2023-07-27 \n3 2022-06-30 MSFT USD 0000789019 2022-07-28 \n4 2021-06-30 MSFT USD 0000789019 2021-07-29 \n5 2020-06-30 MSFT USD 0000789019 2020-07-30 \n# ℹ 49 more variables: accepted_date , calendar_year ,\n# period , cash_and_cash_equivalents ,\n# short_term_investments ,\n# cash_and_short_term_investments , net_receivables ,\n# inventory , other_current_assets ,\n# total_current_assets , property_plant_equipment_net ,\n# goodwill , intangible_assets , …\n```\n\n\n:::\n:::\n\n\n\n\n## Income Statements\n\nIncome statements show a company’s financial performance over a quarter or year by detailing revenue, costs, and profits. It's main components are:\n\n- Revenue (Sales): the total income generated from goods or services sold.\n- Cost of Goods Sold (COGS): direct costs associated with producing the goods or services (raw materials, labor, etc.).\n- Gross Profit: revenue minus COGS, showing the basic profitability from core operations.\n- Operating Expenses: costs related to regular business operations (Salaries, Rent, Marketing).\n- Operating Income (EBIT): earnings before interest and taxes (measures profitability from core operations before financing and tax costs).\n- Net Income: The “bottom line”—total profit after all expenses, taxes, and interest are subtracted from revenue.\n\n@fig-405 provides a stylized representation of these components. 
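To see how these components fit together, here is a small, purely illustrative calculation with made-up figures; the numbers are hypothetical and only demonstrate how gross profit, operating income, and net income are derived from revenue step by step.\n\n::: {.cell}\n\n```{.r .cell-code}\n# hypothetical income statement figures in USD millions\nrevenue <- 1000\ncost_of_goods_sold <- 400\noperating_expenses <- 250\ninterest_and_taxes <- 100\n\ngross_profit <- revenue - cost_of_goods_sold # 600\noperating_income <- gross_profit - operating_expenses # 350, i.e., EBIT\nnet_income <- operating_income - interest_and_taxes # 250, the bottom line\n```\n:::\n\n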
Income statements are key in analyizing profitability, operational efficiency, and cost management of a company.\n\n![A stylized representation of an income statement.](../assets/img/income-statements.svg){#fig-405 alt=\"A stylized representation of an income statement.\"}\n\n@fig-406 shows an example income statements of Microsoft 2023.\n\n![A screenshot of the income statement of Microsoft in 2023.](../assets/img/income-statements-msft.png){#fig-406 alt=\"A screenshot of the income statement of Microsoft in 2023.\"}\n\nDownload income statements data:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfmp_get(\n resource = \"income-statement\", \n symbol = \"MSFT\", \n params = list(period = \"annual\", limit = 5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 38\n date symbol reported_currency cik filling_date\n \n1 2024-06-30 MSFT USD 0000789019 2024-07-30 \n2 2023-06-30 MSFT USD 0000789019 2023-07-27 \n3 2022-06-30 MSFT USD 0000789019 2022-07-28 \n4 2021-06-30 MSFT USD 0000789019 2021-07-29 \n5 2020-06-30 MSFT USD 0000789019 2020-07-30 \n# ℹ 33 more variables: accepted_date , calendar_year ,\n# period , revenue , cost_of_revenue ,\n# gross_profit , gross_profit_ratio ,\n# research_and_development_expenses ,\n# general_and_administrative_expenses ,\n# selling_and_marketing_expenses ,\n# selling_general_and_administrative_expenses , …\n```\n\n\n:::\n:::\n\n\n\n\n## Cash Flow Statements\n\nCash flow statements provide details about the flow of cash in and out of the business during a quarter or year, categorized into operating, investing, and financing activities. Overall, they show a company’s ability to generate cash to fund operations and growth.\n\n- Operating Activities: cash generated from a company’s core business activities (Net Income adjusted for non-cash items like depreciation, and changes in working capital).\n- Financing Activities: cash flows related to borrowing, repaying debt, issuing equity, or paying dividends.\n- Investing Activities: cash spent on or received from long-term investments, such as purchasing or selling property, equipment, or securities.\n\n@fig-407 shows a stylized cash flow statement. 
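A quantity that analysts frequently derive from these categories is free cash flow, i.e., operating cash flow minus capital expenditures. As a sketch, and assuming that the cash flow statements downloaded below contain columns named `operating_cash_flow` and `capital_expenditure` (check the column names in your own download; capital expenditures are sometimes reported with a negative sign, in which case they have to be added rather than subtracted), it could be computed as follows:\n\n::: {.cell}\n\n```{.r .cell-code}\n# sketch: free cash flow from the FMP cash flow statement (column names assumed)\nfmp_get(\n resource = \"cash-flow-statement\", \n symbol = \"MSFT\", \n params = list(period = \"annual\", limit = 5)\n) |> \n mutate(free_cash_flow_manual = operating_cash_flow - capital_expenditure) |> \n select(date, operating_cash_flow, capital_expenditure, free_cash_flow_manual)\n```\n:::\n\n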
\n\n![A stylized representation of a cash flow statement.](../assets/img/cash-flow-statements.svg){#fig-407 alt=\"A stylized representation of a cash flow statement.\"}\n\n@fig-408 shows an example cash flow statement of Microsoft in 2023.\n\n![A screenshot of the cash flow statement of Microsoft in 2023.](../assets/img/cash-flow-statements-msft.png){#fig-408 alt=\"A screenshot of the cash flow statement of Microsoft in 2023.\"}\n\nDownload cash flow statement data for Microsoft:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfmp_get(\n resource = \"cash-flow-statement\", \n symbol = \"MSFT\", \n params = list(period = \"annual\", limit = 5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 40\n date symbol reported_currency cik filling_date\n \n1 2024-06-30 MSFT USD 0000789019 2024-07-30 \n2 2023-06-30 MSFT USD 0000789019 2023-07-27 \n3 2022-06-30 MSFT USD 0000789019 2022-07-28 \n4 2021-06-30 MSFT USD 0000789019 2021-07-29 \n5 2020-06-30 MSFT USD 0000789019 2020-07-30 \n# ℹ 35 more variables: accepted_date , calendar_year ,\n# period , net_income ,\n# depreciation_and_amortization , deferred_income_tax ,\n# stock_based_compensation , change_in_working_capital ,\n# accounts_receivables , inventory ,\n# accounts_payables , other_working_capital ,\n# other_non_cash_items , …\n```\n\n\n:::\n:::\n\n\n\n\n## Download Financial Statements\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nconstituents <- download_data_constituents(\"Dow Jones Industrial Average\") |> \n pull(symbol)\n\nparams <- list(period = \"annual\", limit = 5)\n\nbalance_sheet_statements <- constituents |> \n map_df(\n \\(x) fmp_get(resource = \"balance-sheet-statement\", symbol = x, params = params)\n )\n\nincome_statements <- constituents |> \n map_df(\n \\(x) fmp_get(resource = \"income-statement\", symbol = x, params = params)\n )\n\ncash_flow_statements <- constituents |> \n map_df(\n \\(x) fmp_get(resource = \"cash-flow-statement\", symbol = x, params = params)\n )\n```\n:::\n\n\n\n\n## Liquidity Ratios\n\nWe start with ratios that aim to assess a company's liquidity using items from balance sheet statements. The Current Ratio measures a company's ability to pay off its short-term liabilities with its short-term assets. A ratio above 1 indicates that the company has more current assets than current liabilities, suggesting good short-term financial health. \n\n$$\\text{Current Ratio} = \\frac{\\text{Current Assets}}{\\text{Current Liabilities}}$$\n\nThe next ratio is the Quick Ratio, which measures a company's ability to meet its short-term obligations without relying on the sale of inventory. A ratio above 1 here indicates that the company can cover its short-term liabilities with its most liquid assets. \n\n$$\\text{Quick Ratio} = \\frac{\\text{Current Assets - Inventory}}{\\text{Current Liabilities}}$$\n\nLastly, the Cash Ratio measures a company's ability to pay off its short-term liabilities with its cash and cash equivalents. A ratio of 1 or higher indicates a strong liquidity position. 
\n\n$$\\text{Cash Ratio} = \\frac{\\text{Cash and Cash Equivalents}}{\\text{Current Liabilities}}$$\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nselected_symbols <- c(\"MSFT\", \"AAPL\", \"AMZN\")\n\nbalance_sheets_statements <- balance_sheet_statements |> \n mutate(\n current_ratio = total_current_assets / total_current_liabilities,\n quick_ratio = (total_current_assets - inventory) / total_current_liabilities,\n cash_ratio = cash_and_cash_equivalents / total_current_liabilities,\n label = if_else(symbol %in% selected_symbols, symbol, NA),\n )\n```\n:::\n\n\n\n\n@fig-409 compares these liquidity ratios for selected stocks in 2023.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_liquidity_ratios <- balance_sheets_statements |> \n filter(calendar_year == 2023 & !is.na(label)) |> \n select(symbol, contains(\"ratio\")) |> \n pivot_longer(-symbol) |> \n mutate(name = str_to_title(str_replace_all(name, \"_\", \" \"))) |> \n ggplot(aes(x = value, y = name, fill = symbol)) +\n geom_col(position = \"dodge\") +\n scale_x_continuous(labels = percent) + \n labs(\n x = NULL, y = NULL, fill = NULL,\n title = \"Liquidity ratios for selected stocks from the Dow index for 2023\"\n )\nfig_liquidity_ratios\n```\n\n::: {.cell-output-display}\n![Liquidity ratios are based on financial statements as provided through the FMP API.](financial-ratios_files/figure-html/fig-409-1.png){#fig-409 fig-alt='Title: Liquidity ratios for selected stocks for 2023. The figure shows a bar chart with liquidity ratios on the vertical and corresponding values on the horizontal axis.' width=2100}\n:::\n:::\n\n\n\n\n## Leverage Ratios\n\nThe debt-to-equity ratio measures the proportion of debt financing relative to equity financing. A higher ratio indicates more leverage and potentially higher financial risk. \n\n$$\\text{Debt-to-Equity} = \\frac{\\text{Total Debt}}{\\text{Total Equity}}$$\n\nThe debt-to-asset ratio indicates the percentage of a company’s assets that are financed by debt. A higher ratio also suggests more leverage. \n\n$$\\text{Debt-to-Asset} = \\frac{\\text{Total Debt}}{\\text{Total Assets}}$$\n\nInterest Coverage assesses a company’s ability to pay interest on its debt. Here, a higher ratio indicates better capability to meet interest obligations, and hence less financial risk. 
\n\n$$\\text{Interest Coverage} = \\frac{\\text{EBIT}}{\\text{Interest Expense}}$$\n\nWe can easily calculate these ratios using our balance sheet and income statements data:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbalance_sheets_statements <- balance_sheets_statements |> \n mutate(\n debt_to_equity = total_debt / total_equity,\n debt_to_asset = total_debt / total_assets\n )\n\nincome_statements <- income_statements |> \n mutate(\n interest_coverage = operating_income / interest_expense,\n label = if_else(symbol %in% selected_symbols, symbol, NA),\n )\n```\n:::\n\n\n\n\n@fig-410 shows the debt-to-assets over time.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_debt_to_asset <- balance_sheets_statements |> \n filter(symbol %in% selected_symbols) |> \n ggplot(aes(x = calendar_year, y = debt_to_asset,\n color = symbol)) +\n geom_line(linewidth = 1) +\n scale_y_continuous(labels = percent) +\n labs(x = NULL, y = NULL, color = NULL,\n title = \"Debt-to-asset ratios of selected stocks between 2020 and 2024\") \nfig_debt_to_asset\n```\n\n::: {.cell-output-display}\n![Debt-to-asset ratios are based on financial statements as provided through the FMP API.](financial-ratios_files/figure-html/fig-410-1.png){#fig-410 fig-alt='Title: Debt-to-asset ratios of selected stocks between 2020 and 2024. The figure shows a line chart with years on the horizontal axis and debt-to-asset ratios on the vertical axis.' width=2100}\n:::\n:::\n\n\n\n\n@fig-411 shows debt-to-asset ratio in the cross-section\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nselected_colors <- c(\"#F21A00\", \"#EBCC2A\", \"#3B9AB2\", \"lightgrey\")\n\nfig_debt_to_asset_cross_section <- balance_sheets_statements |> \n filter(calendar_year == 2023) |> \n ggplot(aes(x = debt_to_asset,\n y = fct_reorder(symbol, debt_to_asset),\n fill = label)) +\n geom_col() +\n scale_x_continuous(labels = percent) +\n scale_fill_manual(values = selected_colors) +\n labs(x = NULL, y = NULL, color = NULL,\n title = \"Debt-to-asset ratios of Dow index constituents in 2023\") + \n theme(legend.position = \"none\")\nfig_debt_to_asset_cross_section\n```\n\n::: {.cell-output-display}\n![Debt-to-asset ratios are based on financial statements as provided through the FMP API.](financial-ratios_files/figure-html/fig-411-1.png){#fig-411 fig-alt='Title: Debt-to-asset ratios of Dow index constituents in 2023. The figure shows a bar chart with debt-to-asset ratios on the horizontal and corresponding symbols on the vertical axis.' 
width=2100}\n:::\n:::\n\n\n\n\n@fig-412 shows debt-to-asset vs interest coverage\n \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_debt_to_asset_interest_coverage <- income_statements |> \n filter(calendar_year == 2023) |> \n select(symbol, interest_coverage, calendar_year) |> \n left_join(\n balance_sheets_statements,\n join_by(symbol, calendar_year)\n ) |> \n ggplot(aes(x = debt_to_asset, y = interest_coverage, color = label)) +\n geom_point(size = 2) +\n geom_label_repel(aes(label = label), seed = 42, box.padding = 0.75) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) +\n scale_color_manual(values = selected_colors) +\n labs(\n x = \"Debt-to-Asset\", y = \"Interest Coverage\",\n title = \"Debt-to-asset ratios and interest coverages for Dow index constituents\"\n ) +\n theme(legend.position = \"none\")\nfig_debt_to_asset_interest_coverage\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWarning: Removed 27 rows containing missing values\n(`geom_label_repel()`).\n```\n\n\n:::\n\n::: {.cell-output-display}\n![Debt-to-asset ratios and interest coverages are based on financial statements as provided through the FMP API.](financial-ratios_files/figure-html/fig-412-1.png){#fig-412 fig-alt='Title: Debt-to-asset ratios and interest coverages for Dow index constituents. The figure shows a scatter plot with debt-to-asset on the horizontal and interest coverage on the vertical axis.' width=2100}\n:::\n:::\n\n\n\n\n## Efficiency Ratios\n\nAsset Turnover measures how efficiently a company uses its assets to generate revenue. A higher ratio indicates more efficient use of assets. \n\n$$\\text{Asset Turnover} = \\frac{\\text{Revenue}}{\\text{Total Assets}}$$\n\nInventory turnover indicates how many times a company’s inventory is sold and replaced over a period. The higher the ratio, the more efficient is the inventory management. \n\n$$\\text{Inventory Turnover} = \\frac{\\text{COGS}}{\\text{Inventory}}$$\n\nReceivables turnover measures how effectively a company collects receivables. A higher ratio indicates a more efficient credit and collection processes. \n\n$$\\text{Receivables Turnover} = \\frac{\\text{Revenue}}{\\text{Accounts Receivable}}$$\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncombined_statements <- balance_sheets_statements |> \n select(symbol, calendar_year, label, current_ratio, quick_ratio, cash_ratio,\n debt_to_equity, debt_to_asset, total_assets, total_equity) |> \n left_join(\n income_statements |> \n select(symbol, calendar_year, interest_coverage, revenue, cost_of_revenue,\n selling_general_and_administrative_expenses, interest_expense,\n gross_profit, net_income),\n join_by(symbol, calendar_year)\n ) |> \n left_join(\n cash_flow_statements |> \n select(symbol, calendar_year, inventory, accounts_receivables),\n join_by(symbol, calendar_year)\n )\n\ncombined_statements <- combined_statements |> \n mutate(\n asset_turnover = revenue / total_assets,\n inventory_turnover = cost_of_revenue / inventory,\n receivables_turnover = revenue / accounts_receivables\n )\n```\n:::\n\n\n\n\n## Profitability Ratios\n\nGross margin shows the percentage of revenue that exceeds the cost of goods sold (COGS). A higher gross margin implies that the company retains a higher percentage of revenue as gross profit. \n\n$$\\text{Gross Margin} = \\frac{\\text{Gross Profit}}{\\text{Revenue}}$$\n\nProfit margin is the percentage of revenue that translates into net income. A higher profit margin suggests a more profitable company. 
\n\n$$\\text{Profit Margin} = \\frac{\\text{Net Income}}{\\text{Revenue}}$$\n\nAfter-tax ROE measures the return on shareholders' equity after accounting for taxes. A higher ROE indicates that the company is effectively generating profit from shareholders' investments. \n\n$$\\text{After-Tax ROE} = \\frac{\\text{Net Income}}{\\text{Total Equity}}$$\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncombined_statements <- combined_statements |> \n mutate(\n gross_margin = gross_profit / revenue,\n profit_margin = net_income / revenue,\n after_tax_roe = net_income / total_equity\n )\n```\n:::\n\n\n\n\nGross margin over time @fig-413 shows\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_gross_margin <- combined_statements |> \n filter(symbol %in% selected_symbols) |> \n ggplot(aes(x = calendar_year, y = gross_margin, color = symbol)) +\n geom_line() +\n scale_y_continuous(labels = percent) + \n labs(x = NULL, y = NULL, color = NULL,\n title = \"Gross margins for selected stocks between 2019 and 2023\")\nfig_gross_margin\n```\n\n::: {.cell-output-display}\n![Gross margins are based on financial statements as provided through the FMP API.](financial-ratios_files/figure-html/fig-413-1.png){#fig-413 fig-alt='Title: Gross margins for selected stocks between 2019 and 2023. The figure shows a line chart with years on the horizontal axis and gross margins on the vertical axis.' width=2100}\n:::\n:::\n\n\n\n\nProfit margin vs gross margin @fig-414 shows\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_gross_margin_profit_margin <- combined_statements |> \n filter(calendar_year == 2023) |> \n ggplot(aes(x = gross_margin, y = profit_margin, color = label)) +\n geom_point(size = 2) +\n geom_label_repel(aes(label = label), seed = 42, box.padding = 0.75) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n scale_color_manual(values = selected_colors) + \n labs(\n x = \"Gross margin\", y = \"Profit margin\",\n title = \"Gross and profit margins for Dow index constituents for 2023\"\n ) +\n theme(legend.position = \"none\")\nfig_gross_margin_profit_margin\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWarning: Removed 27 rows containing missing values\n(`geom_label_repel()`).\n```\n\n\n:::\n\n::: {.cell-output-display}\n![Gross and profit margins are based on financial statements as provided through the FMP API.](financial-ratios_files/figure-html/fig-414-1.png){#fig-414 fig-alt='Title: Gross and profit margins for Dow index constituents for 2023. The figure shows a scatter plot with gross margins on the horizontal and profit margins on the vertical axis.' 
width=2100}\n:::\n:::\n\n\n\n\n## Combining Financial Ratios\n\nRanking companies in different categories\n\n@fig-415 shows\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinancial_ratios <- combined_statements |> \n filter(calendar_year == 2023) |> \n select(symbol, \n contains(c(\"ratio\", \"margin\", \"roe\", \"_to_\", \"turnover\", \"interest_coverage\"))) |> \n pivot_longer(cols = -symbol) |> \n mutate(\n type = case_when(\n name %in% c(\"current_ratio\", \"quick_ratio\", \"cash_ratio\") ~ \"Liquidity Ratios\",\n name %in% c(\"debt_to_equity\", \"debt_to_asset\", \"interest_coverage\") ~ \"Leverage Ratios\",\n name %in% c(\"asset_turnover\", \"inventory_turnover\", \"receivables_turnover\") ~ \"Efficiency Ratios\",\n name %in% c(\"gross_margin\", \"profit_margin\", \"after_tax_roe\") ~ \"Profitability Ratios\"\n )\n ) \n\nfig_ranks <- financial_ratios |> \n group_by(type, name) |> \n arrange(desc(value)) |> \n mutate(rank = row_number()) |> \n group_by(symbol, type) |> \n summarize(rank = mean(rank), \n .groups = \"drop\") |> \n filter(symbol %in% selected_symbols) |> \n ggplot(aes(x = rank, y = type, color = symbol)) +\n geom_point(shape = 17, size = 4) +\n scale_color_manual(values = selected_colors) + \n labs(x = \"Average rank\", y = NULL, color = NULL,\n title = \"Average rank among Dow index constituents for selected stocks\") +\n coord_cartesian(xlim = c(1, 30))\nfig_ranks\n```\n\n::: {.cell-output-display}\n![Ranks are based on financial statements as provided through the FMP API.](financial-ratios_files/figure-html/fig-415-1.png){#fig-415 fig-alt='Title: Rank in financial ratio categories for selected stocks from the Dow index. The figure shows a scatter plot with ranks for selected stocks on the horizontal and categories of financial ratios on the vertical axis.' width=2100}\n:::\n:::\n\n\n\n\n## Financial Ratios in Asset Pricing\n\nThe Fama-French five-factor model aims to explain stock returns by incorporating specific financial metrics ratios. We provide more details in [Replicating Fama-French Factors](replicating-fama-and-french-factors.qmd), but here is an intuitive overview:\n\n- Size: Calculated as the logarithm of a company’s market capitalization, which is the total market value of its outstanding shares. This factor captures the tendency for smaller firms to outperform larger ones over time.\n- Book-to-Market Ratio: Determined by dividing the company’s book equity by its market capitalization. A higher ratio indicates a 'value' stock, while a lower ratio suggests a 'growth'’' stock. This metric helps differentiate between undervalued and overvalued companies.\n- Profitability: Measured as the ratio of operating profit to book equity, where operating profit is calculated as revenue minus cost of goods sold (COGS), selling, general, and administrative expenses (SG&A), and interest expense. This factor assesses a company’s efficiency in generating profits from its equity base.\n- Investment: Calculated as the percentage change in total assets from the previous period. 
This factor reflects the company’s growth strategy, indicating whether it is investing aggressively or conservatively.\n\nWe can calculate these factors using the FMP API as follows:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmarket_cap <- constituents |> \n map_df(\n \\(x) fmp_get(\n resource = \"historical-market-capitalization\", \n x, \n list(from = \"2023-12-29\", to = \"2023-12-29\")\n )\n ) \n\ncombined_statements_ff <- combined_statements |> \n filter(calendar_year == 2023) |> \n left_join(market_cap, join_by(symbol)) |> \n left_join(\n balance_sheets_statements |> \n filter(calendar_year == 2022) |> \n select(symbol, total_assets_lag = total_assets), \n join_by(symbol)\n ) |> \n mutate(\n size = log(market_cap),\n book_to_market = total_equity / market_cap,\n operating_profitability = (revenue - cost_of_revenue - selling_general_and_administrative_expenses - interest_expense) / total_equity,\n investment = total_assets / total_assets_lag\n )\n```\n:::\n\n\n\n\n@fig-416 shows the ranks of our selected stocks for the Fama-French factors. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_rank_ff <- combined_statements_ff |> \n select(symbol, Size = size, \n `Book-to-Market` = book_to_market, \n `Profitability` = operating_profitability,\n Investment = investment) |> \n pivot_longer(-symbol) |> \n group_by(name) |> \n arrange(desc(value)) |> \n mutate(rank = row_number()) |> \n ungroup() |> \n filter(symbol %in% selected_symbols) |> \n ggplot(aes(x = rank, y = name, color = symbol)) +\n geom_point(shape = 17, size = 4) +\n scale_color_manual(values = selected_colors) + \n labs(\n x = \"Rank\", y = NULL, color = NULL,\n title = \"Rank in Fama-French variables for selected stocks from the Dow index\"\n ) +\n coord_cartesian(xlim = c(1, 30))\nfig_rank_ff\n```\n\n::: {.cell-output-display}\n![Ranks are based on financial statements and historical market capitalization as provided through the FMP API.](financial-ratios_files/figure-html/fig-416-1.png){#fig-416 fig-alt='Title: Rank in Fama-French variables for selected stocks from the Dow index. The figure shows a scatter plot with ranks for selected stocks on the horizontal and Fama-French variables on the vertical axis.' width=2100}\n:::\n:::\n\n\n\n\n## Key Takeaways\n\n- Financial statements provide standardized, legally required insights into a company’s financial position\n- Ratios allow benchmarking & trend analysis across liquidity, leverage, efficiency & profitability dimensions\n- `fmpapi` enables easy access to financial data for ratio calculations & peer comparisons\n\n## Exercises\n\n1. 
...\n\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/r/financial-ratios/figure-html/fig-409-1.png b/_freeze/r/financial-ratios/figure-html/fig-409-1.png new file mode 100644 index 00000000..4d1ae385 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/fig-409-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/fig-410-1.png b/_freeze/r/financial-ratios/figure-html/fig-410-1.png new file mode 100644 index 00000000..3bd291af Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/fig-410-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/fig-411-1.png b/_freeze/r/financial-ratios/figure-html/fig-411-1.png new file mode 100644 index 00000000..95548af6 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/fig-411-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/fig-412-1.png b/_freeze/r/financial-ratios/figure-html/fig-412-1.png new file mode 100644 index 00000000..ef2c02fa Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/fig-412-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/fig-413-1.png b/_freeze/r/financial-ratios/figure-html/fig-413-1.png new file mode 100644 index 00000000..5493dfe7 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/fig-413-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/fig-414-1.png b/_freeze/r/financial-ratios/figure-html/fig-414-1.png new file mode 100644 index 00000000..a721c62a Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/fig-414-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/fig-415-1.png b/_freeze/r/financial-ratios/figure-html/fig-415-1.png new file mode 100644 index 00000000..05b9d02d Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/fig-415-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/fig-416-1.png b/_freeze/r/financial-ratios/figure-html/fig-416-1.png new file mode 100644 index 00000000..99ba9539 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/fig-416-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/unnamed-chunk-10-1.png b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-10-1.png new file mode 100644 index 00000000..1c4633cb Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-10-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/unnamed-chunk-11-1.png b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-11-1.png new file mode 100644 index 00000000..6506a319 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-11-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/unnamed-chunk-14-1.png b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-14-1.png new file mode 100644 index 00000000..5493dfe7 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-14-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/unnamed-chunk-15-1.png b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-15-1.png new file mode 100644 index 00000000..fbe557e0 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-15-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/unnamed-chunk-16-1.png b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 
00000000..05b9d02d Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-16-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/unnamed-chunk-18-1.png b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-18-1.png new file mode 100644 index 00000000..99ba9539 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-18-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/unnamed-chunk-7-1.png b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-7-1.png new file mode 100644 index 00000000..4d1ae385 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-7-1.png differ diff --git a/_freeze/r/financial-ratios/figure-html/unnamed-chunk-9-1.png b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-9-1.png new file mode 100644 index 00000000..83311cc4 Binary files /dev/null and b/_freeze/r/financial-ratios/figure-html/unnamed-chunk-9-1.png differ diff --git a/_freeze/r/modern-portfolio-theory/execute-results/html.json b/_freeze/r/modern-portfolio-theory/execute-results/html.json new file mode 100644 index 00000000..e1330851 --- /dev/null +++ b/_freeze/r/modern-portfolio-theory/execute-results/html.json @@ -0,0 +1,17 @@ +{ + "hash": "35794a8526818fe3aa9679072ee29e40", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: Modern Portfolio Theory\nmetadata:\n pagetitle: Modern Portfolio Theory with R\n description-meta: Learn how to use the programming language R for implementing the Markowitz model for portfolio optimization.\n---\n\n\n\n\nIn the previous chapter, we showed how to download stock market data and analyze them with graphs and summary statistics. Now, we move to a typical question in Finance: how should wealth be allocated across assets with varying returns, risks, and correlations to optimize a portfolio’s performance?\\index{Portfolio choice} Modern Portfolio Theory (MPT), introduced by [@Markowitz1952], revolutionized the way we think about such investments by formalizing the trade-off between risk and return. Markowitz’s framework laid the foundation for much of modern finance, earning him the Sveriges Riksbank Prize in Economic Sciences in 1990.\n\nMarkowitz demonstrates that portfolio risk depends not only on individual asset volatilities but also on the correlations between asset returns. This insight highlights the power of diversification: combining assets with low or negative correlations reduces overall portfolio risk. This principle is often illustrated with the analogy of a fruit basket: If all you have are apples & they spoil, you lose everything. With a variety of fruits, some fruits may spoil, but others will stay fresh.\n\nAt the heart of MPT is mean-variance analysis, which evaluates portfolios based on two dimensions: expected return and risk. By balancing these two factors, investors can construct portfolios that either maximize return for a given level of risk or minimize risk for a desired level of return.\n\nWe use the following packages throughout this chapter: \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nlibrary(ggrepel)\n```\n:::\n\n\n\n\n\n\n## Estimate Expected Returns\n\nExpected returns, denoted as $\\mu_i$, represent the anticipated profit from holding an asset $i$. 
They are typically estimated using historical data by computing the average of past returns:\n\n$$\\hat{\\mu}_i = \\frac{1}{T} \\sum_{t=1}^{T} r_{it},$$\n\nwhere $r_{it}$ is the return of asset $i$ in period $t$, and $T$ is the total number of periods. While past performance does not guarantee future results, the typical assumption is that it is at least indicative of future performance.\n\nLeveraging the approach of [Working with Stock Returns](working-with-stock-returns.qmd), we download the constituents of the Dow Jones Industrial Average as an example portfolio, as well as their daily adjusted close prices:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsymbols <- download_data(\n type = \"constituents\",\n index = \"Dow Jones Industrial Average\"\n)\n\nprices_daily <- download_data(\n type = \"stock_prices\", symbol = symbols$symbol,\n start_date = \"2019-08-01\", end_date = \"2024-07-31\"\n) |> \n select(symbol, date, price = adjusted_close)\n\nprices_daily\n```\n:::\n\n\n\n\nThen, we proceed to calculate daily returns for each asset. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns_daily <- prices_daily |>\n group_by(symbol) |> \n mutate(ret = price / lag(price) - 1) |>\n ungroup() |> \n select(symbol, date, ret) |> \n drop_na(ret) |> \n arrange(symbol, date)\n\nreturns_daily \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 37,680 × 3\n symbol date ret\n \n1 AAPL 2019-08-02 -0.0212\n2 AAPL 2019-08-05 -0.0523\n3 AAPL 2019-08-06 0.0189\n4 AAPL 2019-08-07 0.0104\n5 AAPL 2019-08-08 0.0221\n# ℹ 37,675 more rows\n```\n\n\n:::\n:::\n\n\n\n\nWe can use the tidy return data to quickly calcualte the estimated expected return of each asset in the Dow Jones Industrial Average.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nassets <- returns_daily |> \n group_by(symbol) |> \n summarize(mu = mean(ret))\n```\n:::\n\n\n\n\nFigure @fig-201 shows the corresponding average daily returns of the constituents of our example portfolio. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_mu <- assets |> \n ggplot(aes(x = mu, y = fct_reorder(symbol, mu), \n fill = mu > 0)) +\n geom_col() +\n scale_x_continuous(labels = percent) + \n labs(x = NULL, y = NULL, fill = NULL,\n title = \"Average daily returns of Dow index constituents\") +\n theme(legend.position = \"none\")\nfig_mu\n```\n\n::: {.cell-output-display}\n![Average daily returns based on prices adjusted for dividend payments and stock splits.](modern-portfolio-theory_files/figure-html/fig-201-1.png){#fig-201 fig-alt='Title: Average daily stock returns of Dow index constituents. The figure shows 30 bars with average daily returns.' width=2100}\n:::\n:::\n\n\n\n\n## Estimate the Variance-Covariance Matrix\n\nIndividual asset risk in MPT is typically quantified using variance ($\\sigma^2$) or volatilities ($\\sigma$). The latter can be estimated as:\n\n$$\\hat{\\sigma}_i = \\sqrt{\\frac{1}{T-1} \\sum_{t=1}^{T} (r_{it} - \\hat{\\mu}_i)^2}$$\n\n\nNext, we transform the returns from a tidy tibble into a $(T \\times N)$ matrix with one column for each of the $N$ symbols and one row for each of the $T$ trading days to compute the sample average return vector $$\\hat\\mu = \\frac{1}{T}\\sum\\limits_{t=1}^T r_t$$ where $r_t$ is the $N$ vector of returns on date $t$ and the sample covariance matrix $$\\hat\\Sigma = \\frac{1}{T-1}\\sum\\limits_{t=1}^T (r_t - \\hat\\mu)(r_t - \\hat\\mu)'.$$ We achieve this by using `pivot_wider()` with the new column names from the column `symbol` and setting the values to `ret`. 
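As a compact sketch of what these formulas mean in code, the wide return matrix can be passed directly to `colMeans()` and `cov()`; the chapter constructs the same objects step by step below, so the names used here are only for illustration.\n\n::: {.cell}\n\n```{.r .cell-code}\n# sketch: sample mean vector and variance-covariance matrix from the wide matrix\nreturns_matrix_sketch <- returns_daily |> \n pivot_wider(names_from = symbol, values_from = ret) |> \n select(-date) |> \n as.matrix()\n\nmu_sketch <- colMeans(returns_matrix_sketch) # N-vector of average returns\nsigma_sketch <- cov(returns_matrix_sketch) # N x N variance-covariance matrix\n```\n:::\n\n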
We compute the vector of sample average returns and the sample variance-covariance matrix, which we consider as proxies for the parameters of the distribution of future stock returns. Thus, for simplicity, we refer to $\\Sigma$ and $\\mu$ instead of explicitly highlighting that the sample moments are estimates. \\index{Covariance} In later chapters, we discuss the issues that arise once we take estimation uncertainty into account.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvolatilities <- returns_daily |> \n group_by(symbol) |> \n summarize(sigma = sd(ret))\n\nassets <- assets |> \n left_join(volatilities, join_by(symbol))\n```\n:::\n\n\n\n\nFigure @fig-202 shows the corresponding invidiual stock volatitilies. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_sigma <- assets |> \n ggplot(aes(x = sigma, y = fct_reorder(symbol, sigma))) +\n geom_col() +\n scale_x_continuous(labels = percent) + \n labs(x = NULL, y = NULL,\n title = \"Daily volatilities of Dow index constituents\")\nfig_sigma\n```\n\n::: {.cell-output-display}\n![Daily volatilities based on prices adjusted for dividend payments and stock splits.](modern-portfolio-theory_files/figure-html/fig-202-1.png){#fig-202 fig-alt='Title: Daily volatilities of DOW index constituents. The figure shows 30 bars with daily volatilities.' width=2100}\n:::\n:::\n\n\n\n\n*Covariance* measures interaction between assets\n\n$$\\hat{\\sigma}_{ij} = \\frac{1}{T-1} \\sum_{t=1}^{T} (R_{it} - \\hat{\\mu}_i)(R_{jt} - \\hat{\\mu}_j)$$\n\n**Interpretation**:\n\n- **Positive**: assets move in the same direction, potentially increasing portfolio risk\n- **Negative**: assets move in opposite directions, which can reduce risk through diversification\n\nEstimating the *variance-covariance matrix*\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns_wide <- returns_daily |> \n pivot_wider(names_from = symbol, values_from = ret) \n\nvcov <- returns_wide |> \n select(-date) |> \n cov()\n```\n:::\n\n\n\n\nFigure @fig-203 provides an illustration of the variance-covariance matrix. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_vcov <- vcov |> \n as_tibble(rownames = \"symbol_a\") |> \n pivot_longer(-symbol_a, names_to = \"symbol_b\") |> \n ggplot(aes(x = symbol_a, y = fct_rev(symbol_b), fill = value)) +\n geom_tile() +\n labs(\n x = NULL, y = NULL, fill = \"(Co-)Variance\",\n title = \"Variance-covariance matrix of Dow index constituents\"\n ) + \n theme(axis.text.x = element_text(angle = 45, hjust = 1)) +\n guides(fill = guide_colorbar(barwidth = 15, barheight = 0.5))\nfig_vcov\n```\n\n::: {.cell-output-display}\n![Variances and covariances based on prices adjusted for dividend payments and stock splits.](modern-portfolio-theory_files/figure-html/fig-203-1.png){#fig-203 fig-alt='Title: Variance-covariance matrix of DOW index constituents. The figure shows 900 tiles with variances and covariances between each constituent-pair.' 
width=2100}\n:::\n:::\n\n\n\n\n## The Minimum-Variance Framework\n\n$\\text{Expected Portfolio Return} = \\sum_{i=1}^n \\omega_i \\hat{\\mu}_i$\n\n-\t$\\omega_i$: weight of asset $i$ in the portfolio\n- $\\hat{\\mu}_i$: estimated expected return of asset $i$\n\nExample:\n\n- Asset A: 60% weight, expected return 8%\n- Asset B: 40% weight, expected return 12%\n- $(0.6 \\times 8\\%) + (0.4 \\times 12\\%) = 9.6\\%$\n\n**Assumption**: portfolio weights are constant over time\n\nPortfolio variance is calculated as\n\n$$\\sum_{i=1}^{n} \\sum_{j=1}^{n} \\omega_i \\omega_j \\hat{\\sigma}_{ij}$$\n\n- $\\omega_i$, $\\omega_j$: the weights of assets $i$, $j$ in the portfolio\n- $\\hat{\\sigma}_{ij}$: covariance between returns of assets $i$ and $j$\n-\t$n$: number of assets in portfolio\n\n**Minimize portfolio variance**\n\n$$\\min_{\\omega_1, ... \\omega_n} \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\omega_i \\omega_j \\hat{\\sigma}_{ij}$$\n\nwhile staying **fully invested**\n\n$$\\sum_{i=1}^{n} \\omega_i = 1$$\n\nMinimum variance in *matrix notation*\n\n**Minimize portfolio variance**\n\n$$\\min_{\\omega} \\omega' \\hat{\\Sigma} \\omega$$\n\nwhile staying **fully invested**\n\n$$ \\omega'\\iota = 1$$\n\nSolution for minimum-variance portfolio\n\n$$\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}$$\n\n- $\\iota$: vector of 1's\n- $\\Sigma^{-1}$: inverse of variance-covariance matrix $\\Sigma$\n\nThen, we compute the minimum variance portfolio weights $\\omega_\\text{mvp}$ as well as the expected portfolio return $\\omega_\\text{mvp}'\\mu$ and volatility $\\sqrt{\\omega_\\text{mvp}'\\Sigma\\omega_\\text{mvp}}$ of this portfolio. \\index{Minimum variance portfolio} Recall that the minimum variance portfolio is the vector of portfolio weights that are the solution to $$\\omega_\\text{mvp} = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\sum\\limits_{i=1}^N\\omega_i = 1.$$ The constraint that weights sum up to one simply implies that all funds are distributed across the available asset universe, i.e., there is no possibility to retain cash. It is easy to show analytically that $\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}$, where $\\iota$ is a vector of ones and $\\Sigma^{-1}$ is the inverse of $\\Sigma$.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\niota <- rep(1, dim(vcov)[1])\nvcov_inv <- solve(vcov)\nomega_mvp <- as.vector(vcov_inv %*% iota) / \n as.numeric(t(iota) %*% vcov_inv %*% iota)\n```\n:::\n\n\n\n\nFigure @fig-204 shows the resulting portfolio weights. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nassets <- bind_cols(assets, omega_mvp = omega_mvp)\n\nfig_omega_mvp <- assets |>\n ggplot(aes(x = omega_mvp, y = fct_reorder(symbol, omega_mvp), \n fill = omega_mvp > 0)) +\n geom_col() +\n scale_x_continuous(labels = percent) + \n labs(x = NULL, y = NULL, \n title = \"Minimum-variance portfolio weights\") +\n theme(legend.position = \"none\")\nfig_omega_mvp\n```\n\n::: {.cell-output-display}\n![Weights are based on returns adjusted for dividend payments and stock splits.](modern-portfolio-theory_files/figure-html/fig-204-1.png){#fig-204 fig-alt='Title: Minimum-variance portfolio weights. The figure shows a bar chart with portfolio weights for each DOW index constituent.' 
width=2100}\n:::\n:::\n\n\n\n\nWe can now compute the minimum-variance portfolio return and volatility:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmu <- assets$mu\n\nsummary_mvp <- tibble(\n mu = sum(omega_mvp * mu),\n sigma = as.numeric(sqrt(t(omega_mvp) %*% vcov %*% omega_mvp)),\n type = \"Minimum-Variance Portfolio\"\n)\n\nsummary_mvp\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 3\n mu sigma type \n \n1 0.000307 0.00937 Minimum-Variance Portfolio\n```\n\n\n:::\n:::\n\n\n\n\n## Efficient Portfolios\n\n**Minimize portfolio variance**\n\n$$\\min_{\\omega} \\omega' \\hat{\\Sigma} \\omega$$\n\nwhile earning a **minimum expected return** $\\bar{\\mu}$\n\n- $\\omega'\\iota = 1$\n- $\\omega'\\hat{\\mu} \\geq \\bar{\\mu}$\n\nTo motivate a sensible target return, we compare the Dow Jones with the Nasdaq 100:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"stock_prices\", \n symbol = c(\"^DJI\", \"^NDX\"), \n start_date = \"2019-08-01\", end_date = \"2024-07-31\"\n) |> \n group_by(symbol) |> \n arrange(date) |> \n mutate(adjusted_close = adjusted_close / first(adjusted_close)) |> \n ggplot(aes(x = date, y = adjusted_close, color = symbol)) +\n geom_line() +\n scale_y_continuous(labels = percent) + \n labs(x = NULL, y = NULL, color = NULL,\n title = \"Performance of Dow (^DJI) vs Nasdaq 100 (^NDX)\",\n subtitle = \"Both indexes start at 100%\") \n```\n\n::: {.cell-output-display}\n![](modern-portfolio-theory_files/figure-html/unnamed-chunk-13-1.png){width=2100}\n:::\n:::\n\n\n\n\nWe choose the minimum expected return $\\bar{\\mu}$ such that the efficient portfolio achieves at least the average Nasdaq 100 return:\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmu_bar <- download_data(\n \"stock_prices\", symbol = \"^NDX\", \n start_date = \"2019-08-01\", end_date = \"2024-07-31\"\n) |> \n mutate(\n ret = adjusted_close / lag(adjusted_close) - 1\n ) |> \n summarize(mean(ret, na.rm = TRUE)) |> \n pull() \n```\n:::\n\n\n\n\n**Note:** $\\bar\\mu$ needs to be higher than the expected return of the minimum-variance portfolio.\n\nThe solution for the efficient portfolio is\n\n$$\\omega_{efp} = \\omega_{mvp} + \\frac{\\lambda^*}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right)$$ \n\nwhere $\\lambda^* = 2\\frac{\\bar\\mu - D/C}{E-D^2/C}$, $C = \\iota'\\Sigma^{-1}\\iota$, $D=\\iota'\\Sigma^{-1}\\mu$, and $E=\\mu'\\Sigma^{-1}\\mu$.\n\nSee details on [tidy-finance.org](https://www.tidy-finance.org/r/introduction-to-tidy-finance.html#the-efficient-frontier)\n\nWe now calculate the efficient portfolio.\n\nThe command `solve(A, b)` returns the solution of a system of equations $Ax = b$. If `b` is not provided, as in the example above, it defaults to the identity matrix such that `solve(vcov)` delivers $\\Sigma^{-1}$ (if a unique solution exists).\\\nNote that the daily volatility of the minimum variance portfolio is substantially lower than the daily volatilities of the individual components. Thus, the diversification benefits in terms of risk reduction are tremendous!\\index{Diversification}\n\nNext, we set out to find the weights for a portfolio that achieves, as an example, at least the average return of the Nasdaq 100 computed above. However, mean-variance investors are not interested in any portfolio that achieves the required return but rather in the efficient portfolio, i.e., the portfolio with the lowest standard deviation. If you wonder where the solution $\\omega_\\text{eff}$ comes from: \\index{Efficient portfolio} The efficient portfolio is chosen by an investor who aims to achieve minimum variance *given a minimum acceptable expected return* $\\bar{\\mu}$. 
Hence, their objective function is to choose $\\omega_\\text{eff}$ as the solution to $$\\omega_\\text{eff}(\\bar{\\mu}) = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\omega'\\iota = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}.$$\n\nThe code below implements the analytic solution to this optimization problem for a benchmark return $\\bar\\mu$, which we set to the average daily return of the Nasdaq 100 computed above. We encourage you to verify that it is correct.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nC <- as.numeric(t(iota) %*% vcov_inv %*% iota)\nD <- as.numeric(t(iota) %*% vcov_inv %*% mu)\nE <- as.numeric(t(mu) %*% vcov_inv %*% mu)\nlambda_tilde <- as.numeric(2 * (mu_bar - D / C) / (E - D^2 / C))\nomega_efp <- as.vector(omega_mvp + lambda_tilde / 2 * (vcov_inv %*% mu - D * omega_mvp))\n\nsummary_efp <- tibble(\n mu = sum(omega_efp * mu),\n sigma = as.numeric(sqrt(t(omega_efp) %*% vcov %*% omega_efp)),\n type = \"Efficient Portfolio\"\n)\n```\n:::\n\n\n\n\nFigure @fig-205 shows the average return and volatility of the minimum-variance and efficient portfolios relative to the index constituents. \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsummaries <- bind_rows(\n assets, summary_mvp, summary_efp\n) \n\nfig_summaries <- summaries |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_point(\n data = summaries |> filter(is.na(type))\n ) +\n geom_point(\n data = summaries |> filter(!is.na(type)), color = \"#F21A00\", size = 3\n ) +\n geom_label_repel(aes(label = type)) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(\n x = \"Volatility\", y = \"Average return\",\n title = \"Efficient & minimum-variance portfolios for Dow index constituents\"\n ) \nfig_summaries\n```\n\n::: {.cell-output-display}\n![The big dots indicate the location of the minimum variance and the efficient portfolio that delivers the expected return of the Nasdaq 100, respectively. The small dots indicate the location of the individual constituents.](modern-portfolio-theory_files/figure-html/fig-205-1.png){#fig-205 fig-alt='Title: Efficient & minimum-variance portfolios for Dow index constituents. The figure shows a scatter plot of average returns against volatilities for the index constituents, with the minimum-variance and efficient portfolios highlighted as larger dots.' width=2100}\n:::\n:::\n\n\n\n\n## The Efficient Frontier\n\n\\index{Efficient frontier} An essential tool to evaluate portfolios in the mean-variance context is the *efficient frontier*, the set of portfolios which satisfies the condition that no other portfolio exists with a higher expected return but with the same volatility (the square root of the variance, i.e., the risk), see, e.g., @Merton1972.\\index{Return volatility} We compute and visualize the efficient frontier for the Dow index constituents, relying on the daily return moments estimated above.\n\n\\index{Separation theorem} The [mutual fund separation theorem](https://en.wikipedia.org/wiki/Mutual_fund_separation_theorem) states that as soon as we have two efficient portfolios (such as the minimum variance portfolio $\\omega_\\text{mvp}$ and the efficient portfolio for a higher required level of expected returns $\\omega_\\text{eff}(\\bar{\\mu})$), we can characterize the entire efficient frontier by combining these two portfolios. That is, any linear combination of the two portfolio weights will again represent an efficient portfolio. 
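As a quick numerical illustration of this statement, the following sketch combines the two portfolios computed above with equal weights; the combined weights still sum to one, and the implied mean-volatility pair lies on the frontier constructed below.\n\n::: {.cell}\n\n```{.r .cell-code}\n# sketch: an equal-weighted combination of the two efficient portfolios\na_sketch <- 0.5\nomega_combined <- a_sketch * omega_efp + (1 - a_sketch) * omega_mvp\n\nsum(omega_combined) # weights still sum to one\n\ntibble(\n mu = sum(omega_combined * mu),\n sigma = as.numeric(sqrt(t(omega_combined) %*% vcov %*% omega_combined))\n)\n```\n:::\n\n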
\\index{Efficient frontier} The code below implements the construction of the *efficient frontier*, which characterizes the highest expected return achievable at each level of risk.\n\n$$\\omega_{eff} = a \\cdot \\omega_{efp} + (1-a) \\cdot\\omega_{mvp}$$\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nefficient_frontier <- tibble(\n a = seq(from = -1, to = 4, by = 0.01),\n) |> \n mutate(\n omega = map(a, ~ .x * omega_efp + (1 - .x) * omega_mvp),\n mu = map_dbl(omega, ~ t(.x) %*% mu),\n sigma = map_dbl(omega, ~ sqrt(t(.x) %*% vcov %*% .x)),\n ) \n```\n:::\n\n\n\n\nThe code above proceeds in two steps: First, we compute a vector of combination weights $a$ and then we evaluate the resulting linear combination with $a\\in\\mathbb{R}$:\\\n$$\\omega^* = a\\omega_\\text{eff}(\\bar\\mu) + (1-a)\\omega_\\text{mvp} = \\omega_\\text{mvp} + \\frac{\\lambda^*}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right)$$ with $\\lambda^* = 2\\frac{a\\bar\\mu + (1-a)\\tilde\\mu - D/C}{E-D^2/C}$ where $C = \\iota'\\Sigma^{-1}\\iota$, $D=\\iota'\\Sigma^{-1}\\mu$, and $E=\\mu'\\Sigma^{-1}\\mu$. Finally, it is simple to visualize the efficient frontier alongside the two efficient portfolios within one powerful figure using `ggplot` (see @fig-206). We also add the individual stocks in the same call.\\index{Graph!Efficient frontier}\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsummaries <- bind_rows(\n summaries, efficient_frontier\n )\n\nfig_efficient_frontier <- summaries |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_point(\n data = summaries |> filter(is.na(type))\n ) +\n geom_point(\n data = summaries |> filter(!is.na(type)), color = \"#F21A00\", size = 3\n ) +\n geom_label_repel(aes(label = type)) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(x = \"Volatility\", y = \"Average return\",\n title = \"Efficient frontier for Dow index constituents\") + \n theme(legend.position = \"none\")\nfig_efficient_frontier\n```\n\n::: {.cell-output-display}\n![The big dots indicate the location of the minimum variance and the efficient portfolio that delivers the expected return of the Nasdaq 100, respectively. The small dots indicate the location of the individual constituents.](modern-portfolio-theory_files/figure-html/fig-206-1.png){#fig-206 fig-alt='Title: Efficient frontier for Dow index constituents. The figure shows Dow index constituents in a mean-variance diagram. A hyperbola indicates the efficient frontier of portfolios that dominate the individual holdings in the sense that they deliver higher expected returns for the same level of volatility.' width=2100}\n:::\n:::\n\n\n\n\n## Extending the Markowitz Model\n\nWe can replicate the minimum-variance portfolio via the `PortfolioAnalytics` package. 
\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(PortfolioAnalytics)\nlibrary(CVXR)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns_matrix <- column_to_rownames(\n returns_wide, var = \"date\"\n)\n\nproblem_mvp <- portfolio.spec(colnames(returns_matrix)) |>\n add.objective(type = \"risk\", name = \"var\") |> \n add.constraint(\"full_investment\")\n\nsolution_mvp <- optimize.portfolio(\n returns_matrix, problem_mvp, optimize_method = \"CVXR\"\n)\n\nall.equal(omega_mvp, as.vector(solution_mvp$weights))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] TRUE\n```\n\n\n:::\n:::\n\n\n\n\nReplicate efficient portfolio via *PortfolioAnalytics*\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nproblem_efp <- problem_mvp |> \n add.constraint(\"return\", return_target = mu_bar)\n\nsolution_efp <- optimize.portfolio(\n returns_matrix, problem_efp, optimize_method = \"CVXR\"\n)\n\nall.equal(omega_efp, as.vector(solution_efp$weights)) \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] TRUE\n```\n\n\n:::\n:::\n\n\n\n\nEasy to extend Markowitz model\n\n- Short sale constraints: `add.constraint(\"long_only\")`\n- Position limit: `add.constraint(\"position_limit\", max_pos = 10)`\n- Expected shortfall: `add.objective(type = \"risk\", name = \"ES\")`\n\n.. and many more, see [official *PortfolioAnalytics* vignette](https://cran.r-project.org/web/packages/PortfolioAnalytics/vignettes/portfolio_vignette.pdf) \n\n- Mean-variance framework is a cornerstone of finance\n- Download financial data using `tidyfinance` package\n- Easy to compute analytic solutions 'manually'\n- Implement extensions using `PortfolioAnalytics`\n- More advanced: [constrained optimization & backtesting](https://www.tidy-finance.org/r/constrained-optimization-and-backtesting.html)\n\n## Key Takeaways\n\n...\n\n## Exercises\n\n1. In the portfolio choice analysis, we restricted our sample to all assets trading every day since 2000. How is such a decision a problem when you want to infer future expected portfolio performance from the results?\n1. The efficient frontier characterizes the portfolios with the highest expected return for different levels of risk. Identify the portfolio with the highest expected return per standard deviation. 
Which famous performance measure is close to the ratio of average returns to the standard deviation of returns?", + "supporting": [ + "modern-portfolio-theory_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/r/modern-portfolio-theory/figure-html/fig-201-1.png b/_freeze/r/modern-portfolio-theory/figure-html/fig-201-1.png new file mode 100644 index 00000000..282377dd Binary files /dev/null and b/_freeze/r/modern-portfolio-theory/figure-html/fig-201-1.png differ diff --git a/_freeze/r/modern-portfolio-theory/figure-html/fig-202-1.png b/_freeze/r/modern-portfolio-theory/figure-html/fig-202-1.png new file mode 100644 index 00000000..53976510 Binary files /dev/null and b/_freeze/r/modern-portfolio-theory/figure-html/fig-202-1.png differ diff --git a/_freeze/r/modern-portfolio-theory/figure-html/fig-203-1.png b/_freeze/r/modern-portfolio-theory/figure-html/fig-203-1.png new file mode 100644 index 00000000..d2d809d1 Binary files /dev/null and b/_freeze/r/modern-portfolio-theory/figure-html/fig-203-1.png differ diff --git a/_freeze/r/modern-portfolio-theory/figure-html/fig-204-1.png b/_freeze/r/modern-portfolio-theory/figure-html/fig-204-1.png new file mode 100644 index 00000000..54aed6ea Binary files /dev/null and b/_freeze/r/modern-portfolio-theory/figure-html/fig-204-1.png differ diff --git a/_freeze/r/modern-portfolio-theory/figure-html/fig-205-1.png b/_freeze/r/modern-portfolio-theory/figure-html/fig-205-1.png new file mode 100644 index 00000000..c9183387 Binary files /dev/null and b/_freeze/r/modern-portfolio-theory/figure-html/fig-205-1.png differ diff --git a/_freeze/r/modern-portfolio-theory/figure-html/fig-206-1.png b/_freeze/r/modern-portfolio-theory/figure-html/fig-206-1.png new file mode 100644 index 00000000..4dbf1356 Binary files /dev/null and b/_freeze/r/modern-portfolio-theory/figure-html/fig-206-1.png differ diff --git a/_freeze/r/modern-portfolio-theory/figure-html/unnamed-chunk-13-1.png b/_freeze/r/modern-portfolio-theory/figure-html/unnamed-chunk-13-1.png new file mode 100644 index 00000000..2b6dd217 Binary files /dev/null and b/_freeze/r/modern-portfolio-theory/figure-html/unnamed-chunk-13-1.png differ diff --git a/_freeze/r/working-with-stock-returns/execute-results/html.json b/_freeze/r/working-with-stock-returns/execute-results/html.json new file mode 100644 index 00000000..7230d96c --- /dev/null +++ b/_freeze/r/working-with-stock-returns/execute-results/html.json @@ -0,0 +1,17 @@ +{ + "hash": "8390f7ba2b1e79b035301d39ca33c81c", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: Working with Stock Returns\naliases:\n - ../introduction-to-tidy-finance.html\n - ../r/introduction-to-tidy-finance.html\nmetadata:\n pagetitle: Working with Stock Returns in R\n description-meta: Learn how to use the programming language R for downloading and analyzing stock market data.\n---\n\n\n\n\n::: callout-note\nYou are reading **Tidy Finance with R**. You can find the equivalent chapter for the sibling **Tidy Finance with Python** [here](../python/introduction-to-tidy-finance.qmd).\n:::\n\nThe main aim of this chapter is to familiarize yourself with the `tidyverse` for working with stock market data. We focus on downloading and visualizing stock data from Yahoo Finance.\n\nAt the start of each session, we load the required R packages. Throughout the entire book, we always use the `tidyverse` [@Wickham2019]. 
In this chapter, we also load the `tidyfinance` package to download stock price data. This package provides a convenient wrapper for various quantitative functions compatible with the `tidyverse` and our book.\\index{tidyverse} Finally, the package `scales` [@scales] provides useful scale functions for visualizations.\n\nYou typically have to install a package once before you can load it. In case you have not done this yet, call, for instance, `install.packages(\"tidyfinance\")`. \\index{tidyfinance}\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\n```\n:::\n\n\n\n\nWe first download daily prices for one stock symbol, e.g., the Apple stock, *AAPL*, directly from the data provider Yahoo Finance. To download the data, you can use the function `download_data`. If you do not know how to use it, make sure you read the help file by calling `?download_data`. We especially recommend taking a look at the examples section of the documentation. We request daily data for a period of more than 20 years.\\index{Stock prices}\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprices <- download_data(\n type = \"stock_prices\",\n symbols = \"AAPL\",\n start_date = \"2000-01-01\",\n end_date = \"2023-12-31\"\n)\nprices\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6,037 × 8\n symbol date volume open low high close adjusted_close\n \n1 AAPL 2000-01-03 535796800 0.936 0.908 1.00 0.999 0.843\n2 AAPL 2000-01-04 512377600 0.967 0.903 0.988 0.915 0.772\n3 AAPL 2000-01-05 778321600 0.926 0.920 0.987 0.929 0.783\n4 AAPL 2000-01-06 767972800 0.948 0.848 0.955 0.848 0.716\n5 AAPL 2000-01-07 460734400 0.862 0.853 0.902 0.888 0.749\n# ℹ 6,032 more rows\n```\n\n\n:::\n:::\n\n\n\n\n\\index{Data!Yahoo Finance} `download_data(type = \"stock_prices\")` downloads stock market data from Yahoo Finance. The function returns a tibble with eight quite self-explanatory columns: `symbol`, `date`, the daily `volume` (in the number of traded shares), the market prices at the `open`, `high`, `low`, `close`, and the `adjusted` price in USD. The adjusted prices are corrected for anything that might affect the stock price after the market closes, e.g., stock splits and dividends. These actions affect the quoted prices, but they have no direct impact on the investors who hold the stock. Therefore, we often rely on adjusted prices when it comes to analyzing the returns an investor would have earned by holding the stock continuously.\\index{Stock price adjustments}\n\nNext, we use the `ggplot2` package [@ggplot2] to visualize the time series of adjusted prices in @fig-100 . This package takes care of visualization tasks based on the principles of the grammar of graphics [@Wilkinson2012].\\index{Graph!Time series}\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprices |>\n ggplot(aes(x = date, y = adjusted_close)) +\n geom_line() +\n labs(\n x = NULL,\n y = NULL,\n title = \"Apple stock prices between beginning of 2000 and end of 2023\"\n )\n```\n\n::: {.cell-output-display}\n![Prices are in USD, adjusted for dividend payments and stock splits.](working-with-stock-returns_files/figure-html/fig-100-1.png){#fig-100 fig-alt='Title: Apple stock prices between the beginning of 2000 and the end of 2023. The figure shows that the stock price of Apple increased dramatically from about 1 USD to around 125 USD.' 
width=2100}\n:::\n:::\n\n\n\n\n\\index{Returns} Instead of analyzing prices, we compute daily net returns defined as $r_t = p_t / p_{t-1} - 1$, where $p_t$ is the adjusted day $t$ price. In that context, the function `lag()` is helpful, which returns the previous value in a vector.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns <- prices |>\n arrange(date) |>\n mutate(ret = adjusted_close / lag(adjusted_close) - 1) |>\n select(symbol, date, ret)\nreturns\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6,037 × 3\n symbol date ret\n \n1 AAPL 2000-01-03 NA \n2 AAPL 2000-01-04 -0.0843\n3 AAPL 2000-01-05 0.0146\n4 AAPL 2000-01-06 -0.0865\n5 AAPL 2000-01-07 0.0474\n# ℹ 6,032 more rows\n```\n\n\n:::\n:::\n\n\n\n\nThe resulting tibble contains three columns, where the last contains the daily returns (`ret`). Note that the first entry naturally contains a missing value (`NA`) because there is no previous price.\\index{Missing value} Obviously, the use of `lag()` would be meaningless if the time series is not ordered by ascending dates.\\index{Lag observations} The command `arrange()` provides a convenient way to order observations in the correct way for our application. In case you want to order observations by descending dates, you can use `arrange(desc(date))`.\n\nFor the upcoming examples, we remove missing values as these would require separate treatment when computing, e.g., sample averages. In general, however, make sure you understand why `NA` values occur and carefully examine if you can simply get rid of these observations.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns <- returns |>\n drop_na(ret)\n```\n:::\n\n\n\n\nNext, we visualize the distribution of daily returns in a histogram in @fig-101. \\index{Graph!Histogram} Additionally, we add a dashed line that indicates the 5 percent quantile of the daily returns to the histogram, which is a (crude) proxy for the worst return of the stock with a probability of at most 5 percent. The 5 percent quantile is closely connected to the (historical) value-at-risk, a risk measure commonly monitored by regulators. \\index{Value-at-risk} We refer to @Tsay2010 for a more thorough introduction to stylized facts of returns.\\index{Returns}\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nquantile_05 <- quantile(returns |> pull(ret), probs = 0.05)\nreturns |>\n ggplot(aes(x = ret)) +\n geom_histogram(bins = 100) +\n geom_vline(aes(xintercept = quantile_05),\n linetype = \"dashed\"\n ) +\n labs(\n x = NULL,\n y = NULL,\n title = \"Distribution of daily Apple stock returns\"\n ) +\n scale_x_continuous(labels = percent)\n```\n\n::: {.cell-output-display}\n![The dotted vertical line indicates the historical 5 percent quantile.](working-with-stock-returns_files/figure-html/fig-101-1.png){#fig-101 fig-alt='Title: Distribution of daily Apple stock returns in percent. The figure shows a histogram of daily returns. The range indicates a few large negative values, while the remaining returns are distributed around 0. The vertical line indicates that the historical 5 percent quantile of daily returns was around negative 3 percent.' width=2100}\n:::\n:::\n\n\n\n\nHere, `bins = 100` determines the number of bins used in the illustration and hence implicitly the width of the bins. Before proceeding, make sure you understand how to use the geom `geom_vline()` to add a dashed line that indicates the 5 percent quantile of the daily returns. 
A typical task before proceeding with *any* data is to compute summary statistics for the main variables of interest.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n )\n ))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 4\n ret_daily_mean ret_daily_sd ret_daily_min ret_daily_max\n \n1 0.00122 0.0247 -0.519 0.139\n```\n\n\n:::\n:::\n\n\n\n\nWe see that the maximum *daily* return was 13.905 percent. Perhaps not surprisingly, the average daily return is close to but slightly above 0. In line with the illustration above, the large losses on the day with the minimum returns indicate a strong asymmetry in the distribution of returns.\\\nYou can also compute these summary statistics for each year individually by imposing `group_by(year = year(date))`, where the call `year(date)` returns the year. More specifically, the few lines of code below compute the summary statistics from above for individual groups of data defined by year. The summary statistics, therefore, allow an eyeball analysis of the time-series dynamics of the return distribution.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns |>\n group_by(year = year(date)) |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n ),\n .names = \"{.fn}\"\n )) |>\n print(n = Inf)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 24 × 5\n year daily_mean daily_sd daily_min daily_max\n \n 1 2000 -0.00346 0.0549 -0.519 0.137 \n 2 2001 0.00233 0.0393 -0.172 0.129 \n 3 2002 -0.00121 0.0305 -0.150 0.0846\n 4 2003 0.00186 0.0234 -0.0814 0.113 \n 5 2004 0.00470 0.0255 -0.0558 0.132 \n 6 2005 0.00349 0.0245 -0.0921 0.0912\n 7 2006 0.000950 0.0243 -0.0633 0.118 \n 8 2007 0.00366 0.0238 -0.0702 0.105 \n 9 2008 -0.00265 0.0367 -0.179 0.139 \n10 2009 0.00382 0.0214 -0.0502 0.0676\n11 2010 0.00183 0.0169 -0.0496 0.0769\n12 2011 0.00104 0.0165 -0.0559 0.0589\n13 2012 0.00130 0.0186 -0.0644 0.0887\n14 2013 0.000472 0.0180 -0.124 0.0514\n15 2014 0.00145 0.0136 -0.0799 0.0820\n16 2015 0.0000199 0.0168 -0.0612 0.0574\n17 2016 0.000575 0.0147 -0.0657 0.0650\n18 2017 0.00164 0.0111 -0.0388 0.0610\n19 2018 -0.0000573 0.0181 -0.0663 0.0704\n20 2019 0.00266 0.0165 -0.0996 0.0683\n21 2020 0.00281 0.0294 -0.129 0.120 \n22 2021 0.00131 0.0158 -0.0417 0.0539\n23 2022 -0.000970 0.0225 -0.0587 0.0890\n24 2023 0.00168 0.0128 -0.0480 0.0469\n```\n\n\n:::\n:::\n\n\n\n\n\\index{Summary statistics}\n\nIn case you wonder: the additional argument `.names = \"{.fn}\"` in `across()` determines how to name the output columns. The specification is rather flexible and allows almost arbitrary column names, which can be useful for reporting. The `print()` function simply controls the output options for the R console.\n\n## Scaling Up the Analysis\n\nAs a next step, we generalize the code from before such that all the computations can handle an arbitrary vector of symbols (e.g., all constituents of an index). Following tidy principles, it is quite easy to download the data, plot the price time series, and tabulate the summary statistics for an arbitrary number of assets.\n\nThis is where the `tidyverse` magic starts: tidy data makes it extremely easy to generalize the computations from before to as many assets as you like. 
The following code takes any vector of symbols, e.g., `symbol <- c(\"AAPL\", \"MMM\", \"BA\")`, and automates the download as well as the plot of the price time series. In the end, we create the table of summary statistics for an arbitrary number of assets. We perform the analysis with data from all current constituents of the [Dow Jones Industrial Average index.](https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average) \\index{Data!Dow Jones Index}\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsymbols <- download_data(type = \"constituents\", index = \"Dow Jones Industrial Average\") \nsymbols\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 30 × 5\n symbol name location exchange currency\n \n1 GS GOLDMAN SACHS GROUP INC Vereinigte Staaten New Yor… USD \n2 UNH UNITEDHEALTH GROUP INC Vereinigte Staaten New Yor… USD \n3 MSFT MICROSOFT CORP Vereinigte Staaten NASDAQ USD \n4 HD HOME DEPOT INC Vereinigte Staaten New Yor… USD \n5 CAT CATERPILLAR INC Vereinigte Staaten New Yor… USD \n# ℹ 25 more rows\n```\n\n\n:::\n:::\n\n\n\n\nConveniently, `tidyfinance` provides the functionality to get all stock prices from an index with a single call. \\index{Exchange!NASDAQ}\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprices_daily <- download_data(\n type = \"stock_prices\",\n symbols = symbols$symbol,\n start_date = \"2000-01-01\",\n end_date = \"2023-12-31\"\n)\n```\n:::\n\n\n\n\nThe resulting tibble contains 177925 daily observations for GS, UNH, MSFT, HD, CAT, SHW, CRM, V, AXP, MCD, AMGN, AAPL, TRV, JPM, HON, AMZN, IBM, BA, PG, CVX, JNJ, NVDA, MMM, DIS, MRK, WMT, NKE, KO, CSCO, VZ different stocks. @fig-103 illustrates the time series of downloaded *adjusted* prices for each of the constituents of the Dow index. Make sure you understand every single line of code! What are the arguments of `aes()`? Which alternative `geoms` could you use to visualize the time series? Hint: if you do not know the answers try to change the code to see what difference your intervention causes.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_prices <- prices_daily |>\n ggplot(aes(\n x = date,\n y = adjusted_close,\n color = symbol\n )) +\n geom_line() +\n labs(\n x = NULL,\n y = NULL,\n color = NULL,\n title = \"Stock prices of Dow index constituents\"\n ) +\n theme(legend.position = \"none\")\nfig_prices\n```\n\n::: {.cell-output-display}\n![Prices in USD, adjusted for dividend payments and stock splits.](working-with-stock-returns_files/figure-html/fig-103-1.png){#fig-103 fig-alt='Title: Stock prices of Dow index constituents. The figure shows many time series with daily prices. The general trend seems positive for most stocks in the Dow index.' width=2100}\n:::\n:::\n\n\n\n\nDo you notice the small differences relative to the code we used before? All we need to do to illustrate all stock symbols simultaneously is to include `color = symbol` in the `ggplot` aesthetics. In this way, we generate a separate line for each symbol. Of course, there are simply too many lines on this graph to identify the individual stocks properly, but it illustrates the point well.\n\nThe same holds for stock returns. Before computing the returns, we use `group_by(symbol)` such that the `mutate()` command is performed for each symbol individually. 
The same logic also applies to the computation of summary statistics: `group_by(symbol)` is the key to aggregating the time series into symbol-specific variables of interest.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreturns_daily <- prices_daily |>\n group_by(symbol) |>\n mutate(ret = adjusted_close / lag(adjusted_close) - 1) |>\n select(symbol, date, ret) |>\n drop_na(ret)\n\nreturns_daily |>\n group_by(symbol) |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n ),\n .names = \"{.fn}\"\n )) |>\n print(n = Inf)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 30 × 5\n symbol daily_mean daily_sd daily_min daily_max\n \n 1 AAPL 0.00122 0.0247 -0.519 0.139\n 2 AMGN 0.000493 0.0194 -0.134 0.151\n 3 AMZN 0.00107 0.0315 -0.248 0.345\n 4 AXP 0.000544 0.0227 -0.176 0.219\n 5 BA 0.000628 0.0222 -0.238 0.243\n 6 CAT 0.000724 0.0203 -0.145 0.147\n 7 CRM 0.00119 0.0266 -0.271 0.260\n 8 CSCO 0.000322 0.0234 -0.162 0.244\n 9 CVX 0.000511 0.0175 -0.221 0.227\n10 DIS 0.000414 0.0194 -0.184 0.160\n11 GS 0.000557 0.0229 -0.190 0.265\n12 HD 0.000544 0.0192 -0.287 0.141\n13 HON 0.000497 0.0191 -0.174 0.282\n14 IBM 0.000297 0.0163 -0.155 0.120\n15 JNJ 0.000379 0.0121 -0.158 0.122\n16 JPM 0.000606 0.0238 -0.207 0.251\n17 KO 0.000318 0.0131 -0.101 0.139\n18 MCD 0.000536 0.0145 -0.159 0.181\n19 MMM 0.000363 0.0151 -0.129 0.126\n20 MRK 0.000371 0.0166 -0.268 0.130\n21 MSFT 0.000573 0.0193 -0.156 0.196\n22 NKE 0.000708 0.0193 -0.198 0.155\n23 NVDA 0.00175 0.0376 -0.352 0.424\n24 PG 0.000362 0.0133 -0.302 0.120\n25 SHW 0.000860 0.0180 -0.208 0.153\n26 TRV 0.000555 0.0181 -0.208 0.256\n27 UNH 0.000948 0.0196 -0.186 0.348\n28 V 0.000933 0.0185 -0.136 0.150\n29 VZ 0.000238 0.0151 -0.118 0.146\n30 WMT 0.000323 0.0148 -0.114 0.117\n```\n\n\n:::\n:::\n\n\n\n\n\\index{Summary statistics}\n\nNote that you are now also equipped with all tools to download price data for *each* symbol listed in the S&P 500 index with the same number of lines of code. Just use `symbol <- download_data(type = \"constituents\", index = \"S&P 500\")`, which provides you with a tibble that contains each symbol that is (currently) part of the S&P 500.\\index{Data!SP 500} However, don't try this if you are not prepared to wait for a couple of minutes because this is quite some data to download!\n\n## Other Forms of Data Aggregation\n\nOf course, aggregation across variables other than `symbol` can also make sense. For instance, suppose you are interested in answering the question: Are days with high aggregate trading volume likely followed by days with high aggregate trading volume? To provide some initial analysis on this question, we take the downloaded data and compute aggregate daily trading volume for all Dow index constituents in USD. Recall that the column `volume` is denoted in the number of traded shares.\\index{Trading volume} Thus, we multiply the trading volume with the daily closing price to get a proxy for the aggregate trading volume in USD. 
Scaling by `1e9` (R can handle scientific notation) denotes daily trading volume in billion USD.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrading_volume <- prices_daily |>\n group_by(date) |>\n summarize(trading_volume = sum(volume * adjusted_close))\n\nfig_trading_volume <- trading_volume |>\n ggplot(aes(x = date, y = trading_volume)) +\n geom_line() +\n labs(\n x = NULL, y = NULL,\n title = \"Aggregate daily trading volume of Dow index constitutens\"\n ) +\n scale_y_continuous(labels = unit_format(unit = \"B\", scale = 1e-9))\nfig_trading_volume\n```\n\n::: {.cell-output-display}\n![Total daily trading volume in billion USD.](working-with-stock-returns_files/figure-html/fig-104-1.png){#fig-104 fig-alt='Title: Aggregate daily trading volume. The figure shows a volatile time series of daily trading volume, ranging from 15 in 2000 to 20.5 in 2023, with a maximum of more than 100.' width=2100}\n:::\n:::\n\n\n\n\n@fig-104 indicates a clear upward trend in aggregated daily trading volume. In particular, since the outbreak of the COVID-19 pandemic, markets have processed substantial trading volumes, as analyzed, for instance, by @Goldstein2021.\\index{Covid 19} One way to illustrate the persistence of trading volume would be to plot volume on day $t$ against volume on day $t-1$ as in the example below. In @fig-105, we add a dotted 45°-line to indicate a hypothetical one-to-one relation by `geom_abline()`, addressing potential differences in the axes' scales.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfig_persistence <- trading_volume |>\n ggplot(aes(x = lag(trading_volume), y = trading_volume)) +\n geom_point() +\n geom_abline(aes(intercept = 0, slope = 1),\n linetype = \"dashed\"\n ) +\n labs(\n x = \"Previous day aggregate trading volume\",\n y = \"Aggregate trading volume\",\n title = \"Persistence in daily trading volume of Dow index constituents\"\n ) + \n scale_x_continuous(labels = unit_format(unit = \"B\", scale = 1e-9)) +\n scale_y_continuous(labels = unit_format(unit = \"B\", scale = 1e-9))\nfig_persistence\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWarning: Removed 1 rows containing missing values (`geom_point()`).\n```\n\n\n:::\n\n::: {.cell-output-display}\n![Total daily trading volume in billion USD.](working-with-stock-returns_files/figure-html/fig-105-1.png){#fig-105 fig-alt='Title: Persistence in daily trading volume of Dow index constituents. The figure shows a scatterplot where aggregate trading volume and previous-day aggregate trading volume neatly line up along a 45-degree line.' width=2100}\n:::\n:::\n\n\n\n\nDo you understand where the warning `## Warning: Removed 1 rows containing missing values (geom_point).` comes from and what it means? Purely eye-balling reveals that days with high trading volume are often followed by similarly high trading volume days.\\index{Error message}\n\n## Key Takeaways\n\nIn this chapter, you learned how to effectively use R to download, analyze, and visualize stock market data using tidy principles. From downloading adjusted stock prices to computing returns, summarizing statistics, and visualizing trends, we have laid a solid foundation for working with financial data. Key takeaways include the importance of using adjusted prices for return calculations, leveraging `tidyverse`-tools for efficient data manipulation, and employing visualizations like histograms and line charts to uncover insights. Scaling up analyses to handle multiple stocks or broader indices demonstrates the flexibility of tidy data workflows. 
Equipped with these foundational techniques, you are now ready to apply them to different contexts in financial economics coming in subsequent chapters.\n\n## Exercises\n\n1. Download daily prices for another stock market symbol of your choice from Yahoo Finance with `download_data()` from the `tidyfinance` package. Plot two time series of the symbol’s un-adjusted and adjusted closing prices. Explain the differences.\n1. Compute daily net returns for an asset of your choice and visualize the distribution of daily returns in a histogram using 100 bins. Also, use `geom_vline()` to add a dashed red vertical line that indicates the 5 percent quantile of the daily returns. Compute summary statistics (mean, standard deviation, minimum and maximum) for the daily returns.\n1. Take your code from before and generalize it such that you can perform all the computations for an arbitrary vector of symbols (e.g., `symbol <- c(\"AAPL\", \"MMM\", \"BA\")`). Automate the download, the plot of the price time series, and create a table of return summary statistics for this arbitrary number of assets.\n1. Are days with high aggregate trading volume often also days with large absolute returns? Find an appropriate visualization to analyze the question using the symbol `AAPL`.\n1.Compute monthly returns from the downloaded stock market prices. Compute the vector of historical average returns and the sample variance-covariance matrix. Compute the minimum variance portfolio weights and the portfolio volatility and average returns. Visualize the mean-variance efficient frontier. Choose one of your assets and identify the portfolio which yields the same historical volatility but achieves the highest possible average return.\n", + "supporting": [ + "working-with-stock-returns_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/r/working-with-stock-returns/figure-html/fig-100-1.png b/_freeze/r/working-with-stock-returns/figure-html/fig-100-1.png new file mode 100644 index 00000000..02b8831a Binary files /dev/null and b/_freeze/r/working-with-stock-returns/figure-html/fig-100-1.png differ diff --git a/_freeze/r/working-with-stock-returns/figure-html/fig-101-1.png b/_freeze/r/working-with-stock-returns/figure-html/fig-101-1.png new file mode 100644 index 00000000..3b2854df Binary files /dev/null and b/_freeze/r/working-with-stock-returns/figure-html/fig-101-1.png differ diff --git a/_freeze/r/working-with-stock-returns/figure-html/fig-103-1.png b/_freeze/r/working-with-stock-returns/figure-html/fig-103-1.png new file mode 100644 index 00000000..5571b074 Binary files /dev/null and b/_freeze/r/working-with-stock-returns/figure-html/fig-103-1.png differ diff --git a/_freeze/r/working-with-stock-returns/figure-html/fig-104-1.png b/_freeze/r/working-with-stock-returns/figure-html/fig-104-1.png new file mode 100644 index 00000000..c6647a9d Binary files /dev/null and b/_freeze/r/working-with-stock-returns/figure-html/fig-104-1.png differ diff --git a/_freeze/r/working-with-stock-returns/figure-html/fig-105-1.png b/_freeze/r/working-with-stock-returns/figure-html/fig-105-1.png new file mode 100644 index 00000000..21a2c471 Binary files /dev/null and b/_freeze/r/working-with-stock-returns/figure-html/fig-105-1.png differ diff --git a/_quarto.yml b/_quarto.yml index 6b44e2d3..f495fadb 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -97,7 +97,11 @@ website: contents: - 
r/setting-up-your-environment.qmd - r/the-tidyfinance-r-package.qmd - - r/introduction-to-tidy-finance.qmd + - r/working-with-stock-returns.qmd + - r/modern-portfolio-theory.qmd + - r/capital-asset-pricing-model.qmd + - r/financial-ratios.qmd + - r/discounted-cash-flow-analysis.qmd - section: "Financial Data" contents: - r/accessing-and-managing-financial-data.qmd diff --git a/assets/bib/packages-r.bib b/assets/bib/packages-r.bib index cafac939..6ee5af76 100644 --- a/assets/bib/packages-r.bib +++ b/assets/bib/packages-r.bib @@ -135,6 +135,13 @@ @book{ggplot2 note = {R package version 3.3.6}, isbn = {978-3-319-24277-4}, } +@manual{@ggrepel, + title = {ggrepel: Automatically Position Non-Overlapping Text Labels with 'ggplot2'}, + author = {Kamil Slowikowski}, + year = {2024}, + url = {https://ggrepel.slowkow.com/}, + note = {R package version 0.9.6}, +} @article{glmnet, title = {{Regularization paths for Cox's proportional hazards model via coordinate descent}}, author = {Noah Simon and Jerome Friedman and Trevor Hastie and Rob Tibshirani}, diff --git a/assets/img/assets.svg b/assets/img/assets.svg new file mode 100644 index 00000000..b3f0ea2e --- /dev/null +++ b/assets/img/assets.svg @@ -0,0 +1,48 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/img/balance-sheet-msft.png b/assets/img/balance-sheet-msft.png new file mode 100644 index 00000000..0a4631a6 Binary files /dev/null and b/assets/img/balance-sheet-msft.png differ diff --git a/assets/img/balance-sheet.svg b/assets/img/balance-sheet.svg new file mode 100644 index 00000000..4624a975 --- /dev/null +++ b/assets/img/balance-sheet.svg @@ -0,0 +1,24 @@ + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/img/cash-flow-statements-msft.png b/assets/img/cash-flow-statements-msft.png new file mode 100644 index 00000000..5ba7b235 Binary files /dev/null and b/assets/img/cash-flow-statements-msft.png differ diff --git a/assets/img/cash-flow-statements.svg b/assets/img/cash-flow-statements.svg new file mode 100644 index 00000000..ddd22224 --- /dev/null +++ b/assets/img/cash-flow-statements.svg @@ -0,0 +1,115 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/img/equity.svg b/assets/img/equity.svg new file mode 100644 index 00000000..1ee9c732 --- /dev/null +++ b/assets/img/equity.svg @@ -0,0 +1,30 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/img/income-statements-msft.png b/assets/img/income-statements-msft.png new file mode 100644 index 00000000..e769c523 Binary files /dev/null and b/assets/img/income-statements-msft.png differ diff --git a/assets/img/income-statements.svg b/assets/img/income-statements.svg new file mode 100644 index 00000000..1724b7ad --- /dev/null +++ b/assets/img/income-statements.svg @@ -0,0 +1,55 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/img/liabilities.svg b/assets/img/liabilities.svg new file mode 100644 index 00000000..a3fe946a --- /dev/null +++ b/assets/img/liabilities.svg @@ -0,0 +1,44 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/img/stockholders-equity-msft.png b/assets/img/stockholders-equity-msft.png new file mode 100644 index 00000000..c0b84d57 
Binary files /dev/null and b/assets/img/stockholders-equity-msft.png differ diff --git a/docs/assets/img/assets.svg b/docs/assets/img/assets.svg new file mode 100644 index 00000000..b3f0ea2e --- /dev/null +++ b/docs/assets/img/assets.svg @@ -0,0 +1,48 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/assets/img/balance-sheet-msft.png b/docs/assets/img/balance-sheet-msft.png new file mode 100644 index 00000000..0a4631a6 Binary files /dev/null and b/docs/assets/img/balance-sheet-msft.png differ diff --git a/docs/assets/img/balance-sheet.svg b/docs/assets/img/balance-sheet.svg new file mode 100644 index 00000000..4624a975 --- /dev/null +++ b/docs/assets/img/balance-sheet.svg @@ -0,0 +1,24 @@ + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/assets/img/cash-flow-statements-msft.png b/docs/assets/img/cash-flow-statements-msft.png new file mode 100644 index 00000000..5ba7b235 Binary files /dev/null and b/docs/assets/img/cash-flow-statements-msft.png differ diff --git a/docs/assets/img/cash-flow-statements.svg b/docs/assets/img/cash-flow-statements.svg new file mode 100644 index 00000000..ddd22224 --- /dev/null +++ b/docs/assets/img/cash-flow-statements.svg @@ -0,0 +1,115 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/assets/img/equity.svg b/docs/assets/img/equity.svg new file mode 100644 index 00000000..1ee9c732 --- /dev/null +++ b/docs/assets/img/equity.svg @@ -0,0 +1,30 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/assets/img/income-statements-msft.png b/docs/assets/img/income-statements-msft.png new file mode 100644 index 00000000..e769c523 Binary files /dev/null and b/docs/assets/img/income-statements-msft.png differ diff --git a/docs/assets/img/income-statements.svg b/docs/assets/img/income-statements.svg new file mode 100644 index 00000000..1724b7ad --- /dev/null +++ b/docs/assets/img/income-statements.svg @@ -0,0 +1,55 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/assets/img/liabilities.svg b/docs/assets/img/liabilities.svg new file mode 100644 index 00000000..a3fe946a --- /dev/null +++ b/docs/assets/img/liabilities.svg @@ -0,0 +1,44 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/introduction-to-tidy-finance.html b/docs/introduction-to-tidy-finance.html index dad6b89d..e83bfdf2 100644 --- a/docs/introduction-to-tidy-finance.html +++ b/docs/introduction-to-tidy-finance.html @@ -2,7 +2,7 @@ Redirect - + @@ -292,8 +292,32 @@

Accessing and Managing Financial Data

+ + + + @@ -1423,8 +1447,8 @@

Exercises

diff --git a/docs/r/trace-and-fisd.html b/docs/r/trace-and-fisd.html index 148cbbcb..aeec59dd 100644 --- a/docs/r/trace-and-fisd.html +++ b/docs/r/trace-and-fisd.html @@ -298,8 +298,32 @@

TRACE and FISD

+ + + + diff --git a/docs/r/univariate-portfolio-sorts.html b/docs/r/univariate-portfolio-sorts.html index 479d1ad6..7cfee417 100644 --- a/docs/r/univariate-portfolio-sorts.html +++ b/docs/r/univariate-portfolio-sorts.html @@ -327,8 +327,32 @@

Univariate Portfolio Sorts

+ + + + diff --git a/docs/r/value-and-bivariate-sorts.html b/docs/r/value-and-bivariate-sorts.html index 739ed18f..3f6e4498 100644 --- a/docs/r/value-and-bivariate-sorts.html +++ b/docs/r/value-and-bivariate-sorts.html @@ -292,8 +292,32 @@

Value and Bivariate Sorts

+ + + + diff --git a/docs/r/working-with-stock-returns.html b/docs/r/working-with-stock-returns.html new file mode 100644 index 00000000..0e67b7c4 --- /dev/null +++ b/docs/r/working-with-stock-returns.html @@ -0,0 +1,1482 @@ + + + + + + + + + + +Working with Stock Returns in R – Tidy Finance + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+ +
+ + +
+ + + +
+ +
+
+

Working with Stock Returns

+
+ + + +
+ + + + +
+ + + +
+ + +
+
+
+ +
+
+Note +
+
+
+

You are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.

+
+
+

The main aim of this chapter is to familiarize yourself with the tidyverse for working with stock market data. We focus on downloading and visualizing stock data from Yahoo Finance.

+

At the start of each session, we load the required R packages. Throughout the entire book, we always use the tidyverse (Wickham et al. 2019). In this chapter, we also load the tidyfinance package to download stock price data. This package provides a convenient wrapper for various quantitative functions compatible with the tidyverse and our book. Finally, the package scales (Wickham and Seidel 2022) provides useful scale functions for visualizations.

+

You typically have to install a package once before you can load it. In case you have not done this yet, call, for instance, install.packages("tidyfinance").

+
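If you have not installed these packages yet, a one-time setup could look like the following sketch (run it once in the console, not in every script); the package names are simply the three loaded below.

```r
# One-time setup (sketch): install the packages used in this chapter.
install.packages(c("tidyverse", "tidyfinance", "scales"))
```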
+
library(tidyverse)
+library(tidyfinance)
+library(scales)
+
+

We first download daily prices for one stock symbol, e.g., the Apple stock, AAPL, directly from the data provider Yahoo Finance. To download the data, you can use the function download_data. If you do not know how to use it, make sure you read the help file by calling ?download_data. We especially recommend taking a look at the examples section of the documentation. We request daily data for a period of more than 20 years.

+
+
prices <- download_data(
+  type = "stock_prices",
+  symbols = "AAPL",
+  start_date = "2000-01-01",
+  end_date = "2023-12-31"
+)
+prices
+
+
# A tibble: 6,037 × 8
+  symbol date          volume  open   low  high close adjusted_close
+  <chr>  <date>         <dbl> <dbl> <dbl> <dbl> <dbl>          <dbl>
+1 AAPL   2000-01-03 535796800 0.936 0.908 1.00  0.999          0.843
+2 AAPL   2000-01-04 512377600 0.967 0.903 0.988 0.915          0.772
+3 AAPL   2000-01-05 778321600 0.926 0.920 0.987 0.929          0.783
+4 AAPL   2000-01-06 767972800 0.948 0.848 0.955 0.848          0.716
+5 AAPL   2000-01-07 460734400 0.862 0.853 0.902 0.888          0.749
+# ℹ 6,032 more rows
+
+
+

download_data(type = "stock_prices") downloads stock market data from Yahoo Finance. The function returns a tibble with eight quite self-explanatory columns: symbol, date, the daily volume (in the number of traded shares), the market prices at the open, high, low, close, and the adjusted price in USD. The adjusted prices are corrected for anything that might affect the stock price after the market closes, e.g., stock splits and dividends. These actions affect the quoted prices, but they have no direct impact on the investors who hold the stock. Therefore, we often rely on adjusted prices when it comes to analyzing the returns an investor would have earned by holding the stock continuously.

+
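To make the effect of these adjustments tangible, the following sketch (not part of the original chapter) plots the raw close next to the adjusted_close for the Apple data downloaded above; the gap that widens as you go back in time reflects accumulated dividends and splits.

```r
# Sketch: compare raw and adjusted closing prices for AAPL.
library(tidyverse)

prices |>
  select(date, close, adjusted_close) |>
  pivot_longer(-date, names_to = "series", values_to = "price") |>
  ggplot(aes(x = date, y = price, color = series)) +
  geom_line() +
  labs(x = NULL, y = NULL, color = NULL,
       title = "Raw vs. adjusted closing prices of Apple")
```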

Next, we use the ggplot2 package (Wickham 2016) to visualize the time series of adjusted prices in Figure 1. This package takes care of visualization tasks based on the principles of the grammar of graphics (Wilkinson 2012).

+
+
prices |>
+  ggplot(aes(x = date, y = adjusted_close)) +
+  geom_line() +
+  labs(
+    x = NULL,
+    y = NULL,
+    title = "Apple stock prices between beginning of 2000 and end of 2023"
+  )
+
+
+
+
+Title: Apple stock prices between the beginning of 2000 and the end of 2023. The figure shows that the stock price of Apple increased dramatically from about 1 USD to around 125 USD. +
+
+Figure 1: Prices are in USD, adjusted for dividend payments and stock splits. +
+
+
+
+
+

Instead of analyzing prices, we compute daily net returns defined as \(r_t = p_t / p_{t-1} - 1\), where \(p_t\) is the adjusted day \(t\) price. In that context, the function lag() is helpful, which returns the previous value in a vector.

+
+
returns <- prices |>
+  arrange(date) |>
+  mutate(ret = adjusted_close / lag(adjusted_close) - 1) |>
+  select(symbol, date, ret)
+returns
+
+
# A tibble: 6,037 × 3
+  symbol date           ret
+  <chr>  <date>       <dbl>
+1 AAPL   2000-01-03 NA     
+2 AAPL   2000-01-04 -0.0843
+3 AAPL   2000-01-05  0.0146
+4 AAPL   2000-01-06 -0.0865
+5 AAPL   2000-01-07  0.0474
+# ℹ 6,032 more rows
+
+
+

The resulting tibble contains three columns, where the last contains the daily returns (ret). Note that the first entry naturally contains a missing value (NA) because there is no previous price. Obviously, the use of lag() would be meaningless if the time series is not ordered by ascending dates. The command arrange() provides a convenient way to order observations in the correct way for our application. In case you want to order observations by descending dates, you can use arrange(desc(date)).

+
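The following toy example (hypothetical numbers, not from the chapter) shows why the sort order matters: lag() simply returns the previous row, so computing returns on descending dates silently produces something other than daily net returns.

```r
library(tidyverse)

toy <- tibble(
  date  = as.Date("2024-01-01") + 0:2,
  price = c(100, 110, 99)
)

# Ascending dates: proper daily net returns
toy |>
  arrange(date) |>
  mutate(ret = price / lag(price) - 1)

# Descending dates: lag() now refers to the *next* trading day
toy |>
  arrange(desc(date)) |>
  mutate(ret = price / lag(price) - 1)
```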

For the upcoming examples, we remove missing values as these would require separate treatment when computing, e.g., sample averages. In general, however, make sure you understand why NA values occur and carefully examine if you can simply get rid of these observations.

+
+
returns <- returns |>
+  drop_na(ret)
+
+

Next, we visualize the distribution of daily returns in a histogram in Figure 2. Additionally, we add a dashed line that indicates the 5 percent quantile of the daily returns to the histogram, which is a (crude) proxy for the worst return of the stock with a probability of at most 5 percent. The 5 percent quantile is closely connected to the (historical) value-at-risk, a risk measure commonly monitored by regulators. We refer to Tsay (2010) for a more thorough introduction to stylized facts of returns.

+
+
quantile_05 <- quantile(returns |> pull(ret), probs = 0.05)
+returns |>
+  ggplot(aes(x = ret)) +
+  geom_histogram(bins = 100) +
+  geom_vline(aes(xintercept = quantile_05),
+    linetype = "dashed"
+  ) +
+  labs(
+    x = NULL,
+    y = NULL,
+    title = "Distribution of daily Apple stock returns"
+  ) +
+  scale_x_continuous(labels = percent)
+
+
+
+
+Title: Distribution of daily Apple stock returns in percent. The figure shows a histogram of daily returns. The range indicates a few large negative values, while the remaining returns are distributed around 0. The vertical line indicates that the historical 5 percent quantile of daily returns was around negative 3 percent. +
+
+Figure 2: The dashed vertical line indicates the historical 5 percent quantile. 
+
+
+
+
+

Here, bins = 100 determines the number of bins used in the illustration and hence implicitly the width of the bins. Before proceeding, make sure you understand how to use the geom geom_vline() to add a dashed line that indicates the 5 percent quantile of the daily returns. A typical task before proceeding with any data is to compute summary statistics for the main variables of interest.

+
+
returns |>
+  summarize(across(
+    ret,
+    list(
+      daily_mean = mean,
+      daily_sd = sd,
+      daily_min = min,
+      daily_max = max
+    )
+  ))
+
+
# A tibble: 1 × 4
+  ret_daily_mean ret_daily_sd ret_daily_min ret_daily_max
+           <dbl>        <dbl>         <dbl>         <dbl>
+1        0.00122       0.0247        -0.519         0.139
+
+
+

We see that the maximum daily return was 13.905 percent. Perhaps not surprisingly, the average daily return is close to but slightly above 0. In line with the illustration above, the large loss on the day with the minimum return indicates a strong asymmetry in the distribution of returns.
+You can also compute these summary statistics for each year individually by imposing group_by(year = year(date)), where the call year(date) returns the year. More specifically, the few lines of code below compute the summary statistics from above for individual groups of data defined by year. The summary statistics, therefore, allow an eyeball analysis of the time-series dynamics of the return distribution.

+
+
returns |>
+  group_by(year = year(date)) |>
+  summarize(across(
+    ret,
+    list(
+      daily_mean = mean,
+      daily_sd = sd,
+      daily_min = min,
+      daily_max = max
+    ),
+    .names = "{.fn}"
+  )) |>
+  print(n = Inf)
+
+
# A tibble: 24 × 5
+    year daily_mean daily_sd daily_min daily_max
+   <dbl>      <dbl>    <dbl>     <dbl>     <dbl>
+ 1  2000 -0.00346     0.0549   -0.519     0.137 
+ 2  2001  0.00233     0.0393   -0.172     0.129 
+ 3  2002 -0.00121     0.0305   -0.150     0.0846
+ 4  2003  0.00186     0.0234   -0.0814    0.113 
+ 5  2004  0.00470     0.0255   -0.0558    0.132 
+ 6  2005  0.00349     0.0245   -0.0921    0.0912
+ 7  2006  0.000950    0.0243   -0.0633    0.118 
+ 8  2007  0.00366     0.0238   -0.0702    0.105 
+ 9  2008 -0.00265     0.0367   -0.179     0.139 
+10  2009  0.00382     0.0214   -0.0502    0.0676
+11  2010  0.00183     0.0169   -0.0496    0.0769
+12  2011  0.00104     0.0165   -0.0559    0.0589
+13  2012  0.00130     0.0186   -0.0644    0.0887
+14  2013  0.000472    0.0180   -0.124     0.0514
+15  2014  0.00145     0.0136   -0.0799    0.0820
+16  2015  0.0000199   0.0168   -0.0612    0.0574
+17  2016  0.000575    0.0147   -0.0657    0.0650
+18  2017  0.00164     0.0111   -0.0388    0.0610
+19  2018 -0.0000573   0.0181   -0.0663    0.0704
+20  2019  0.00266     0.0165   -0.0996    0.0683
+21  2020  0.00281     0.0294   -0.129     0.120 
+22  2021  0.00131     0.0158   -0.0417    0.0539
+23  2022 -0.000970    0.0225   -0.0587    0.0890
+24  2023  0.00168     0.0128   -0.0480    0.0469
+
+
+

+

In case you wonder: the additional argument .names = "{.fn}" in across() determines how to name the output columns. The specification is rather flexible and allows almost arbitrary column names, which can be useful for reporting. The print() function simply controls the output options for the R console.

+
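As a minimal illustration (toy data, not from the chapter), the three calls below contrast the default naming with two explicit .names specifications.

```r
library(tidyverse)

toy <- tibble(ret = c(0.01, -0.02, 0.005))

toy |> summarize(across(ret, list(mean = mean, sd = sd)))
#> columns: ret_mean, ret_sd (default "{.col}_{.fn}")

toy |> summarize(across(ret, list(mean = mean, sd = sd), .names = "{.fn}"))
#> columns: mean, sd

toy |> summarize(across(ret, list(mean = mean, sd = sd), .names = "{.fn}_of_{.col}"))
#> columns: mean_of_ret, sd_of_ret
```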
+

Scaling Up the Analysis

+

As a next step, we generalize the code from before such that all the computations can handle an arbitrary vector of symbols (e.g., all constituents of an index). Following tidy principles, it is quite easy to download the data, plot the price time series, and tabulate the summary statistics for an arbitrary number of assets.

+

This is where the tidyverse magic starts: tidy data makes it extremely easy to generalize the computations from before to as many assets as you like. The following code takes any vector of symbols, e.g., symbol <- c("AAPL", "MMM", "BA"), and automates the download as well as the plot of the price time series. In the end, we create the table of summary statistics for an arbitrary number of assets. We perform the analysis with data from all current constituents of the Dow Jones Industrial Average index.

+
+
symbols <- download_data(type = "constituents", index = "Dow Jones Industrial Average") 
+symbols
+
+
# A tibble: 30 × 5
+  symbol name                    location           exchange currency
+  <chr>  <chr>                   <chr>              <chr>    <chr>   
+1 GS     GOLDMAN SACHS GROUP INC Vereinigte Staaten New Yor… USD     
+2 UNH    UNITEDHEALTH GROUP INC  Vereinigte Staaten New Yor… USD     
+3 MSFT   MICROSOFT CORP          Vereinigte Staaten NASDAQ   USD     
+4 HD     HOME DEPOT INC          Vereinigte Staaten New Yor… USD     
+5 CAT    CATERPILLAR INC         Vereinigte Staaten New Yor… USD     
+# ℹ 25 more rows
+
+
+

Conveniently, tidyfinance provides the functionality to get all stock prices from an index with a single call.

+
+
prices_daily <- download_data(
+  type = "stock_prices",
+  symbols = symbols$symbol,
+  start_date = "2000-01-01",
+  end_date = "2023-12-31"
+)
+
+

The resulting tibble contains 177925 daily observations for 30 different stocks: GS, UNH, MSFT, HD, CAT, SHW, CRM, V, AXP, MCD, AMGN, AAPL, TRV, JPM, HON, AMZN, IBM, BA, PG, CVX, JNJ, NVDA, MMM, DIS, MRK, WMT, NKE, KO, CSCO, and VZ. Figure 3 illustrates the time series of downloaded adjusted prices for each of the constituents of the Dow index. Make sure you understand every single line of code! What are the arguments of aes()? Which alternative geoms could you use to visualize the time series? Hint: if you do not know the answers, try to change the code to see what difference your intervention causes.

+
+
fig_prices <- prices_daily |>
+  ggplot(aes(
+    x = date,
+    y = adjusted_close,
+    color = symbol
+  )) +
+  geom_line() +
+  labs(
+    x = NULL,
+    y = NULL,
+    color = NULL,
+    title = "Stock prices of Dow index constituents"
+  ) +
+  theme(legend.position = "none")
+fig_prices
+
+
+
+
+Title: Stock prices of Dow index constituents. The figure shows many time series with daily prices. The general trend seems positive for most stocks in the Dow index. +
+
+Figure 3: Prices in USD, adjusted for dividend payments and stock splits. +
+
+
+
+
+

Do you notice the small differences relative to the code we used before? All we need to do to illustrate all stock symbols simultaneously is to include color = symbol in the ggplot aesthetics. In this way, we generate a separate line for each symbol. Of course, there are simply too many lines on this graph to identify the individual stocks properly, but it illustrates the point well.

+

The same holds for stock returns. Before computing the returns, we use group_by(symbol) such that the mutate() command is performed for each symbol individually. The same logic also applies to the computation of summary statistics: group_by(symbol) is the key to aggregating the time series into symbol-specific variables of interest.

+
+
returns_daily <- prices_daily |>
+  group_by(symbol) |>
+  mutate(ret = adjusted_close / lag(adjusted_close) - 1) |>
+  select(symbol, date, ret) |>
+  drop_na(ret)
+
+returns_daily |>
+  group_by(symbol) |>
+  summarize(across(
+    ret,
+    list(
+      daily_mean = mean,
+      daily_sd = sd,
+      daily_min = min,
+      daily_max = max
+    ),
+    .names = "{.fn}"
+  )) |>
+  print(n = Inf)
+
+
# A tibble: 30 × 5
+   symbol daily_mean daily_sd daily_min daily_max
+   <chr>       <dbl>    <dbl>     <dbl>     <dbl>
+ 1 AAPL     0.00122    0.0247    -0.519     0.139
+ 2 AMGN     0.000493   0.0194    -0.134     0.151
+ 3 AMZN     0.00107    0.0315    -0.248     0.345
+ 4 AXP      0.000544   0.0227    -0.176     0.219
+ 5 BA       0.000628   0.0222    -0.238     0.243
+ 6 CAT      0.000724   0.0203    -0.145     0.147
+ 7 CRM      0.00119    0.0266    -0.271     0.260
+ 8 CSCO     0.000322   0.0234    -0.162     0.244
+ 9 CVX      0.000511   0.0175    -0.221     0.227
+10 DIS      0.000414   0.0194    -0.184     0.160
+11 GS       0.000557   0.0229    -0.190     0.265
+12 HD       0.000544   0.0192    -0.287     0.141
+13 HON      0.000497   0.0191    -0.174     0.282
+14 IBM      0.000297   0.0163    -0.155     0.120
+15 JNJ      0.000379   0.0121    -0.158     0.122
+16 JPM      0.000606   0.0238    -0.207     0.251
+17 KO       0.000318   0.0131    -0.101     0.139
+18 MCD      0.000536   0.0145    -0.159     0.181
+19 MMM      0.000363   0.0151    -0.129     0.126
+20 MRK      0.000371   0.0166    -0.268     0.130
+21 MSFT     0.000573   0.0193    -0.156     0.196
+22 NKE      0.000708   0.0193    -0.198     0.155
+23 NVDA     0.00175    0.0376    -0.352     0.424
+24 PG       0.000362   0.0133    -0.302     0.120
+25 SHW      0.000860   0.0180    -0.208     0.153
+26 TRV      0.000555   0.0181    -0.208     0.256
+27 UNH      0.000948   0.0196    -0.186     0.348
+28 V        0.000933   0.0185    -0.136     0.150
+29 VZ       0.000238   0.0151    -0.118     0.146
+30 WMT      0.000323   0.0148    -0.114     0.117
+
+
+

+

Note that you are now also equipped with all tools to download price data for each symbol listed in the S&P 500 index with the same number of lines of code. Just use symbol <- download_data(type = "constituents", index = "S&P 500"), which provides you with a tibble that contains each symbol that is (currently) part of the S&P 500. However, don’t try this if you are not prepared to wait for a couple of minutes because this is quite some data to download!

+
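Spelled out, the S&P 500 variant of the pipeline above might look like the following sketch (the object names are illustrative); with roughly 500 symbols, expect the download to take several minutes.

```r
# Sketch: download constituents and daily prices for the S&P 500.
symbols_sp500 <- download_data(
  type = "constituents",
  index = "S&P 500"
)

prices_sp500 <- download_data(
  type = "stock_prices",
  symbols = symbols_sp500$symbol,
  start_date = "2000-01-01",
  end_date = "2023-12-31"
)
```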
+
+

Other Forms of Data Aggregation

+

Of course, aggregation across variables other than symbol can also make sense. For instance, suppose you are interested in answering the question: Are days with high aggregate trading volume likely followed by days with high aggregate trading volume? To provide some initial analysis on this question, we take the downloaded data and compute aggregate daily trading volume for all Dow index constituents in USD. Recall that the column volume is denoted in the number of traded shares. Thus, we multiply the trading volume with the daily closing price to get a proxy for the aggregate trading volume in USD. Scaling by 1e9 (R can handle scientific notation) denotes daily trading volume in billion USD.

+
+
trading_volume <- prices_daily |>
+  group_by(date) |>
+  summarize(trading_volume = sum(volume * adjusted_close))
+
+fig_trading_volume <- trading_volume |>
+  ggplot(aes(x = date, y = trading_volume)) +
+  geom_line() +
+  labs(
+    x = NULL, y = NULL,
+    title = "Aggregate daily trading volume of Dow index constitutens"
+  ) +
+    scale_y_continuous(labels = unit_format(unit = "B", scale = 1e-9))
+fig_trading_volume
+
+
+
+
+Title: Aggregate daily trading volume. The figure shows a volatile time series of daily trading volume, ranging from around 15 billion USD in 2000 to around 20.5 billion USD in 2023, with a maximum of more than 100 billion USD. 
+
+Figure 4: Total daily trading volume in billion USD. +
+
+
+
+
+

Figure 4 indicates a clear upward trend in aggregated daily trading volume. In particular, since the outbreak of the COVID-19 pandemic, markets have processed substantial trading volumes, as analyzed, for instance, by Goldstein, Koijen, and Mueller (2021). One way to illustrate the persistence of trading volume would be to plot volume on day \(t\) against volume on day \(t-1\) as in the example below. In Figure 5, we add a dashed 45°-line to indicate a hypothetical one-to-one relation by geom_abline(), addressing potential differences in the axes’ scales.

+
+
fig_persistence <- trading_volume |>
+  ggplot(aes(x = lag(trading_volume), y = trading_volume)) +
+  geom_point() +
+  geom_abline(aes(intercept = 0, slope = 1),
+    linetype = "dashed"
+  ) +
+  labs(
+    x = "Previous day aggregate trading volume",
+    y = "Aggregate trading volume",
+    title = "Persistence in daily trading volume of Dow index constituents"
+  ) + 
+  scale_x_continuous(labels = unit_format(unit = "B", scale = 1e-9)) +
+  scale_y_continuous(labels = unit_format(unit = "B", scale = 1e-9))
+fig_persistence
+
+
Warning: Removed 1 rows containing missing values (`geom_point()`).
+
+
+
+
+
+Title: Persistence in daily trading volume of Dow index constituents. The figure shows a scatterplot where aggregate trading volume and previous-day aggregate trading volume neatly line up along a 45-degree line. +
+
+Figure 5: Aggregate daily trading volume in billion USD, plotted against the previous day's aggregate trading volume. 
+
+
+
+
+

Do you understand where the warning ## Warning: Removed 1 rows containing missing values (geom_point). comes from and what it means? Pure eyeballing already reveals that days with high trading volume are often followed by days with similarly high trading volume.

+
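If you want to verify this yourself, a quick sketch is to look for the row that geom_point() drops: it is the first trading day, for which lag(trading_volume) is necessarily NA.

```r
# Sketch: the single dropped row is the first date, where the lag is NA.
trading_volume |>
  mutate(trading_volume_lag = lag(trading_volume)) |>
  filter(is.na(trading_volume_lag))
```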
+
+

Key Takeaways

+

In this chapter, you learned how to effectively use R to download, analyze, and visualize stock market data using tidy principles. From downloading adjusted stock prices to computing returns, summarizing statistics, and visualizing trends, we have laid a solid foundation for working with financial data. Key takeaways include the importance of using adjusted prices for return calculations, leveraging tidyverse-tools for efficient data manipulation, and employing visualizations like histograms and line charts to uncover insights. Scaling up analyses to handle multiple stocks or broader indices demonstrates the flexibility of tidy data workflows. Equipped with these foundational techniques, you are now ready to apply them to different contexts in financial economics coming in subsequent chapters.

+
+
+

Exercises

+
    +
  1. Download daily prices for another stock market symbol of your choice from Yahoo Finance with download_data() from the tidyfinance package. Plot two time series of the symbol’s un-adjusted and adjusted closing prices. Explain the differences.
  2. +
  3. Compute daily net returns for an asset of your choice and visualize the distribution of daily returns in a histogram using 100 bins. Also, use geom_vline() to add a dashed red vertical line that indicates the 5 percent quantile of the daily returns. Compute summary statistics (mean, standard deviation, minimum and maximum) for the daily returns.
  4. +
  5. Take your code from before and generalize it such that you can perform all the computations for an arbitrary vector of symbols (e.g., symbol <- c("AAPL", "MMM", "BA")). Automate the download, the plot of the price time series, and create a table of return summary statistics for this arbitrary number of assets.
  6. +
  7. Are days with high aggregate trading volume often also days with large absolute returns? Find an appropriate visualization to analyze the question using the symbol AAPL.
  8. +
  9. Compute monthly returns from the downloaded stock market prices. Compute the vector of historical average returns and the sample variance-covariance matrix. Compute the minimum variance portfolio weights and the portfolio volatility and average returns. Visualize the mean-variance efficient frontier. Choose one of your assets and identify the portfolio which yields the same historical volatility but achieves the highest possible average return.
  10. +
+ + + +
+ +

References

+
+Goldstein, Itay, Ralph S J Koijen, and Holger M. Mueller. 2021. COVID-19 and its impact on financial markets and the real economy.” Review of Financial Studies 34 (11): 5135–48. https://doi.org/10.1093/rfs/hhab085. +
+
+Tsay, Ruey S. 2010. Analysis of financial time series. John Wiley & Sons. +
+
+Wickham, Hadley. 2016. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org. +
+
+Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686. +
+
+Wickham, Hadley, and Dana Seidel. 2022. scales: Scale functions for visualization. https://CRAN.R-project.org/package=scales. +
+
+Wilkinson, Leland. 2012. The grammar of graphics. Springer. +
+
+ + + +
+ + + + + + + + \ No newline at end of file diff --git a/docs/r/working-with-stock-returns_files/figure-html/fig-100-1.png b/docs/r/working-with-stock-returns_files/figure-html/fig-100-1.png new file mode 100644 index 00000000..02b8831a Binary files /dev/null and b/docs/r/working-with-stock-returns_files/figure-html/fig-100-1.png differ diff --git a/docs/r/working-with-stock-returns_files/figure-html/fig-101-1.png b/docs/r/working-with-stock-returns_files/figure-html/fig-101-1.png new file mode 100644 index 00000000..3b2854df Binary files /dev/null and b/docs/r/working-with-stock-returns_files/figure-html/fig-101-1.png differ diff --git a/docs/r/working-with-stock-returns_files/figure-html/fig-103-1.png b/docs/r/working-with-stock-returns_files/figure-html/fig-103-1.png new file mode 100644 index 00000000..5571b074 Binary files /dev/null and b/docs/r/working-with-stock-returns_files/figure-html/fig-103-1.png differ diff --git a/docs/r/working-with-stock-returns_files/figure-html/fig-104-1.png b/docs/r/working-with-stock-returns_files/figure-html/fig-104-1.png new file mode 100644 index 00000000..c6647a9d Binary files /dev/null and b/docs/r/working-with-stock-returns_files/figure-html/fig-104-1.png differ diff --git a/docs/r/working-with-stock-returns_files/figure-html/fig-105-1.png b/docs/r/working-with-stock-returns_files/figure-html/fig-105-1.png new file mode 100644 index 00000000..21a2c471 Binary files /dev/null and b/docs/r/working-with-stock-returns_files/figure-html/fig-105-1.png differ diff --git a/docs/r/wrds-crsp-and-compustat.html b/docs/r/wrds-crsp-and-compustat.html index c794bdae..21330675 100644 --- a/docs/r/wrds-crsp-and-compustat.html +++ b/docs/r/wrds-crsp-and-compustat.html @@ -298,8 +298,32 @@

WRDS, CRSP, and Compustat

+ + + + diff --git a/docs/r/wrds-dummy-data.html b/docs/r/wrds-dummy-data.html index 7566f7d2..dc41a8e6 100644 --- a/docs/r/wrds-dummy-data.html +++ b/docs/r/wrds-dummy-data.html @@ -272,8 +272,32 @@

WRDS Dummy Data

+ + + + diff --git a/docs/search.json b/docs/search.json index e117e977..aae7b54a 100644 --- a/docs/search.json +++ b/docs/search.json @@ -519,171 +519,243 @@ ] }, { - "objectID": "r/proofs.html", - "href": "r/proofs.html", - "title": "Proofs", + "objectID": "r/clean-enhanced-trace-with-r.html", + "href": "r/clean-enhanced-trace-with-r.html", + "title": "Clean Enhanced TRACE with R", "section": "", - "text": "The minimum variance portfolio weights are given by the solution to \\[\\omega_\\text{mvp} = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega= 1,\\] where \\(\\iota\\) is an \\((N \\times 1)\\) vector of ones. The Lagrangian reads \\[ \\mathcal{L}(\\omega) = \\omega'\\Sigma \\omega - \\lambda(\\omega'\\iota - 1).\\] We can solve the first-order conditions of the Lagrangian equation: \\[\n\\begin{aligned}\n& \\frac{\\partial\\mathcal{L}(\\omega)}{\\partial\\omega} = 0 \\Leftrightarrow 2\\Sigma \\omega = \\lambda\\iota \\Rightarrow \\omega = \\frac{\\lambda}{2}\\Sigma^{-1}\\iota \\\\ \\end{aligned}\n\\] Next, the constraint that weights have to sum up to one delivers: \\(1 = \\iota'\\omega = \\frac{\\lambda}{2}\\iota'\\Sigma^{-1}\\iota \\Rightarrow \\lambda = \\frac{2}{\\iota'\\Sigma^{-1}\\iota}.\\) Finally, plug-in the derived value of \\(\\lambda\\) to get \\[\n\\begin{aligned}\n\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}.\n\\end{aligned}\n\\]\n\n\n\nConsider an investor who aims to achieve minimum variance given a desired expected return \\(\\bar{\\mu}\\), that is: \\[\\omega_\\text{eff}\\left(\\bar{\\mu}\\right) = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}.\\] The Lagrangian reads \\[ \\mathcal{L}(\\omega) = \\omega'\\Sigma \\omega - \\lambda(\\omega'\\iota - 1) - \\tilde{\\lambda}(\\omega'\\mu - \\bar{\\mu}). 
\\] We can solve the first-order conditions to get \\[\n\\begin{aligned}\n2\\Sigma \\omega &= \\lambda\\iota + \\tilde\\lambda \\mu\\\\\n\\Rightarrow\\omega &= \\frac{\\lambda}{2}\\Sigma^{-1}\\iota + \\frac{\\tilde\\lambda}{2}\\Sigma^{-1}\\mu.\n\\end{aligned}\n\\]\nNext, the two constraints (\\(w'\\iota = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}\\)) imply \\[\n\\begin{aligned}\n1 &= \\iota'\\omega = \\frac{\\lambda}{2}\\underbrace{\\iota'\\Sigma^{-1}\\iota}_{C} + \\frac{\\tilde\\lambda}{2}\\underbrace{\\iota'\\Sigma^{-1}\\mu}_D\\\\\n\\Rightarrow \\lambda&= \\frac{2 - \\tilde\\lambda D}{C}\\\\\n\\bar\\mu &= \\mu'\\omega = \\frac{\\lambda}{2}\\underbrace{\\mu'\\Sigma^{-1}\\iota}_{D} + \\frac{\\tilde\\lambda}{2}\\underbrace{\\mu'\\Sigma^{-1}\\mu}_E = \\frac{1}{2}\\left(\\frac{2 - \\tilde\\lambda D}{C}\\right)D+\\frac{\\tilde\\lambda}{2}E \\\\&=\\frac{D}{C}+\\frac{\\tilde\\lambda}{2}\\left(E - \\frac{D^2}{C}\\right)\\\\\n\\Rightarrow \\tilde\\lambda &= 2\\frac{\\bar\\mu - D/C}{E-D^2/C}.\n\\end{aligned}\n\\] As a result, the efficient portfolio weight takes the form (for \\(\\bar{\\mu} \\geq D/C = \\mu'\\omega_\\text{mvp}\\)) \\[\\omega_\\text{eff}\\left(\\bar\\mu\\right) = \\omega_\\text{mvp} + \\frac{\\tilde\\lambda}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right).\\] Thus, the efficient portfolio allocates wealth in the minimum variance portfolio \\(\\omega_\\text{mvp}\\) and a levered (self-financing) portfolio to increase the expected return.\nNote that the portfolio weights sum up to one as \\[\\iota'\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right) = D - D = 0\\text{ so }\\iota'\\omega_\\text{eff} = \\iota'\\omega_\\text{mvp} = 1.\\] Finally, the expected return of the efficient portfolio is \\[\\mu'\\omega_\\text{eff} = \\frac{D}{C} + \\bar\\mu - \\frac{D}{C} = \\bar\\mu.\\]\n\n\n\nWe argue that an investor with a quadratic utility function with certainty equivalent \\[\\max_\\omega CE(\\omega) = \\omega'\\mu - \\frac{\\gamma}{2} \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega = 1\\] faces an equivalent optimization problem to a framework where portfolio weights are chosen with the aim to minimize volatility given a pre-specified level or expected returns \\[\\min_\\omega \\omega'\\Sigma \\omega \\text{ s.t. } \\omega'\\mu = \\bar\\mu \\text{ and } \\iota'\\omega = 1.\\] Note the difference: In the first case, the investor has a (known) risk aversion \\(\\gamma\\) which determines their optimal balance between risk (\\(\\omega'\\Sigma\\omega)\\) and return (\\(\\mu'\\omega\\)). In the second case, the investor has a target return they want to achieve while minimizing the volatility. Intuitively, both approaches are closely connected if we consider that the risk aversion \\(\\gamma\\) determines the desirable return \\(\\bar\\mu\\). More risk-averse investors (higher \\(\\gamma\\)) will chose a lower target return to keep their volatility level down. The efficient frontier then spans all possible portfolios depending on the risk aversion \\(\\gamma\\), starting from the minimum variance portfolio (\\(\\gamma = \\infty\\)).\nTo proof this equivalence, consider first the optimal portfolio weights for a certainty equivalent maximizing investor. The first-order condition reads \\[\n\\begin{aligned}\n\\mu - \\lambda \\iota &= \\gamma \\Sigma \\omega \\\\\n\\Leftrightarrow \\omega &= \\frac{1}{\\gamma}\\Sigma^{-1}\\left(\\mu - \\lambda\\iota\\right)\n\\end{aligned}\n\\] Next, we make use of the constraint \\(\\iota'\\omega = 1\\). 
\\[\n\\begin{aligned}\n\\iota'\\omega &= 1 = \\frac{1}{\\gamma}\\left(\\iota'\\Sigma^{-1}\\mu - \\lambda\\iota'\\Sigma^{-1}\\iota\\right)\\\\\n\\Rightarrow \\lambda &= \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\left(\\iota'\\Sigma^{-1}\\mu - \\gamma \\right).\n\\end{aligned}\n\\] Plugging in the value of \\(\\lambda\\) reveals the desired portfolio for an investor with risk aversion \\(\\gamma\\). \\[\n\\begin{aligned}\n\\omega &= \\frac{1}{\\gamma}\\Sigma^{-1}\\left(\\mu - \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\left(\\iota'\\Sigma^{-1}\\mu - \\gamma \\right)\\right) \\\\\n\\Rightarrow \\omega &= \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota} + \\frac{1}{\\gamma}\\left(\\Sigma^{-1} - \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}\\iota'\\Sigma^{-1}\\right)\\mu\\\\\n&= \\omega_\\text{mvp} + \\frac{1}{\\gamma}\\left(\\Sigma^{-1}\\mu - \\frac{\\iota'\\Sigma^{-1}\\mu}{\\iota'\\Sigma^{-1}\\iota}\\Sigma^{-1}\\iota\\right).\n\\end{aligned}\n\\] The resulting weights correspond to the efficient portfolio with desired return \\(\\bar r\\) such that (in the notation of book) \\[\\frac{1}{\\gamma} = \\frac{\\tilde\\lambda}{2} = \\frac{\\bar\\mu - D/C}{E - D^2/C}\\] which implies that the desired return is just \\[\\bar\\mu = \\frac{D}{C} + \\frac{1}{\\gamma}\\left({E - D^2/C}\\right)\\] which is \\(\\frac{D}{C} = \\mu'\\omega_\\text{mvp}\\) for \\(\\gamma\\rightarrow \\infty\\) as expected. For instance, letting \\(\\gamma \\rightarrow \\infty\\) implies \\(\\bar\\mu = \\frac{D}{C} = \\omega_\\text{mvp}'\\mu\\).", + "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\n\n\nThis appendix contains code to clean enhanced TRACE with R. It is also available via the following GitHub gist. Hence, you could also source the function with devtools::source_gist(\"3a05b3ab281563b2e94858451c2eb3a4\"). We need this function in Chapter TRACE and FISD to download and clean enhanced TRACE trade messages following Dick-Nielsen (2009) and Dick-Nielsen (2014) for enhanced TRACE specifically. Relatedly, WRDS provides SAS code and there is Python code available by the project Open Source Bond Asset Pricing.\nThe function takes a vector of CUSIPs (in cusips), a connection to WRDS (connection) explained in Chapter 3, and a start and end date (start_date and end_date, respectively). Specifying too many CUSIPs will result in very slow downloads and a potential failure due to the size of the request to WRDS. The dates should be within the coverage of TRACE itself, i.e., starting after 2002, and the dates should be supplied using the class date. 
The output of the function contains all valid trade messages for the selected CUSIPs over the specified period.\n\nclean_enhanced_trace <- function(cusips,\n connection,\n start_date = as.Date(\"2002-01-01\"),\n end_date = today()) {\n\n # Packages (required)\n library(dplyr)\n library(lubridate)\n library(dbplyr)\n library(RPostgres)\n\n # Function checks ---------------------------------------------------------\n # Input parameters\n ## Cusips\n if (length(cusips) == 0 | any(is.na(cusips))) stop(\"Check cusips.\")\n\n ## Dates\n if (!is.Date(start_date) | !is.Date(end_date)) stop(\"Dates needed\")\n if (start_date < as.Date(\"2002-01-01\")) stop(\"TRACE starts later.\")\n if (end_date > today()) stop(\"TRACE does not predict the future.\")\n if (start_date >= end_date) stop(\"Date conflict.\")\n\n ## Connection\n if (!dbIsValid(connection)) stop(\"Connection issue.\")\n\n # Enhanced Trace ----------------------------------------------------------\n trace_enhanced_db <- tbl(connection, I(\"trace.trace_enhanced\"))\n \n # Main file\n trace_all <- trace_enhanced_db |>\n filter(\n cusip_id %in% cusips,\n between(trd_exctn_dt, start_date, end_date)\n ) |>\n select(cusip_id, msg_seq_nb, orig_msg_seq_nb,\n entrd_vol_qt, rptd_pr, yld_pt, rpt_side_cd, cntra_mp_id,\n trd_exctn_dt, trd_exctn_tm, trd_rpt_dt, trd_rpt_tm,\n pr_trd_dt, trc_st, asof_cd, wis_fl,\n days_to_sttl_ct, stlmnt_dt, spcl_trd_fl) |>\n collect()\n\n # Enhanced Trace: Post 06-02-2012 -----------------------------------------\n # Trades (trc_st = T) and correction (trc_st = R)\n trace_post_TR <- trace_all |>\n filter((trc_st == \"T\" | trc_st == \"R\"),\n trd_rpt_dt >= as.Date(\"2012-02-06\"))\n\n # Cancellations (trc_st = X) and correction cancellations (trc_st = C)\n trace_post_XC <- trace_all |>\n filter((trc_st == \"X\" | trc_st == \"C\"),\n trd_rpt_dt >= as.Date(\"2012-02-06\"))\n\n # Cleaning corrected and cancelled trades\n trace_post_TR <- trace_post_TR |>\n anti_join(trace_post_XC,\n by = join_by(cusip_id, msg_seq_nb, entrd_vol_qt,\n rptd_pr, rpt_side_cd, cntra_mp_id,\n trd_exctn_dt, trd_exctn_tm))\n\n # Reversals (trc_st = Y)\n trace_post_Y <- trace_all |>\n filter(trc_st == \"Y\",\n trd_rpt_dt >= as.Date(\"2012-02-06\"))\n\n # Clean reversals\n ## match the orig_msg_seq_nb of the Y-message to\n ## the msg_seq_nb of the main message\n trace_post <- trace_post_TR |>\n anti_join(trace_post_Y,\n by = join_by(cusip_id, msg_seq_nb == orig_msg_seq_nb,\n entrd_vol_qt, rptd_pr, rpt_side_cd,\n cntra_mp_id, trd_exctn_dt, trd_exctn_tm))\n\n\n # Enhanced TRACE: Pre 06-02-2012 ------------------------------------------\n # Cancellations (trc_st = C)\n trace_pre_C <- trace_all |>\n filter(trc_st == \"C\",\n trd_rpt_dt < as.Date(\"2012-02-06\"))\n\n # Trades w/o cancellations\n ## match the orig_msg_seq_nb of the C-message\n ## to the msg_seq_nb of the main message\n trace_pre_T <- trace_all |>\n filter(trc_st == \"T\",\n trd_rpt_dt < as.Date(\"2012-02-06\")) |>\n anti_join(trace_pre_C,\n by = join_by(cusip_id, msg_seq_nb == orig_msg_seq_nb,\n entrd_vol_qt, rptd_pr, rpt_side_cd,\n cntra_mp_id, trd_exctn_dt, trd_exctn_tm))\n\n # Corrections (trc_st = W) - W can also correct a previous W\n trace_pre_W <- trace_all |>\n filter(trc_st == \"W\",\n trd_rpt_dt < as.Date(\"2012-02-06\"))\n\n # Implement corrections in a loop\n ## Correction control\n correction_control <- nrow(trace_pre_W)\n correction_control_last <- nrow(trace_pre_W)\n\n ## Correction loop\n while (correction_control > 0) {\n # Corrections that correct some msg\n 
trace_pre_W_correcting <- trace_pre_W |>\n semi_join(trace_pre_T,\n by = join_by(cusip_id, trd_exctn_dt,\n orig_msg_seq_nb == msg_seq_nb))\n\n # Corrections that do not correct some msg\n trace_pre_W <- trace_pre_W |>\n anti_join(trace_pre_T,\n by = join_by(cusip_id, trd_exctn_dt,\n orig_msg_seq_nb == msg_seq_nb))\n\n # Delete msgs that are corrected and add correction msgs\n trace_pre_T <- trace_pre_T |>\n anti_join(trace_pre_W_correcting,\n by = join_by(cusip_id, trd_exctn_dt,\n msg_seq_nb == orig_msg_seq_nb)) |>\n union_all(trace_pre_W_correcting)\n\n # Escape if no corrections remain or they cannot be matched\n correction_control <- nrow(trace_pre_W)\n\n if (correction_control == correction_control_last) {\n\n correction_control <- 0\n\n }\n\n correction_control_last <- nrow(trace_pre_W)\n\n }\n\n\n # Clean reversals\n ## Record reversals\n trace_pre_R <- trace_pre_T |>\n filter(asof_cd == 'R') |>\n group_by(cusip_id, trd_exctn_dt, entrd_vol_qt,\n rptd_pr, rpt_side_cd, cntra_mp_id) |>\n arrange(trd_exctn_tm, trd_rpt_dt, trd_rpt_tm) |>\n mutate(seq = row_number()) |>\n ungroup()\n\n ## Remove reversals and the reversed trade\n trace_pre <- trace_pre_T |>\n filter(is.na(asof_cd) | !(asof_cd %in% c('R', 'X', 'D'))) |>\n group_by(cusip_id, trd_exctn_dt, entrd_vol_qt,\n rptd_pr, rpt_side_cd, cntra_mp_id) |>\n arrange(trd_exctn_tm, trd_rpt_dt, trd_rpt_tm) |>\n mutate(seq = row_number()) |>\n ungroup() |>\n anti_join(trace_pre_R,\n by = join_by(cusip_id, trd_exctn_dt, entrd_vol_qt,\n rptd_pr, rpt_side_cd, cntra_mp_id, seq)) |>\n select(-seq)\n\n\n # Agency trades -----------------------------------------------------------\n # Combine pre and post trades\n trace_clean <- trace_post |>\n union_all(trace_pre)\n\n # Keep angency sells and unmatched agency buys\n ## Agency sells\n trace_agency_sells <- trace_clean |>\n filter(cntra_mp_id == \"D\",\n rpt_side_cd == \"S\")\n\n # Agency buys that are unmatched\n trace_agency_buys_filtered <- trace_clean |>\n filter(cntra_mp_id == \"D\",\n rpt_side_cd == \"B\") |>\n anti_join(trace_agency_sells,\n by = join_by(cusip_id, trd_exctn_dt,\n entrd_vol_qt, rptd_pr))\n\n # Agency clean\n trace_clean <- trace_clean |>\n filter(cntra_mp_id == \"C\") |>\n union_all(trace_agency_sells) |>\n union_all(trace_agency_buys_filtered)\n\n\n # Additional Filters ------------------------------------------------------\n trace_add_filters <- trace_clean |>\n mutate(days_to_sttl_ct2 = stlmnt_dt - trd_exctn_dt) |>\n filter(is.na(days_to_sttl_ct) | as.numeric(days_to_sttl_ct) <= 7,\n is.na(days_to_sttl_ct2) | as.numeric(days_to_sttl_ct2) <= 7,\n wis_fl == \"N\",\n is.na(spcl_trd_fl) | spcl_trd_fl == \"\",\n is.na(asof_cd) | asof_cd == \"\")\n\n\n # Output ------------------------------------------------------------------\n # Only keep necessary columns\n trace_final <- trace_add_filters |>\n arrange(cusip_id, trd_exctn_dt, trd_exctn_tm) |>\n select(cusip_id, trd_exctn_dt, trd_exctn_tm,\n rptd_pr, entrd_vol_qt, yld_pt, rpt_side_cd, cntra_mp_id) |>\n mutate(trd_exctn_tm = format(as_datetime(trd_exctn_tm, tz = \"America/New_York\"), \"%H:%M:%S\"))\n\n trace_final\n}\n\n\n\n\n\nReferences\n\nDick-Nielsen, Jens. 2009. “Liquidity biases in TRACE.” The Journal of Fixed Income 19 (2): 43–55. https://doi.org/10.3905/jfi.2009.19.2.043.\n\n\n———. 2014. “How to clean enhanced TRACE data.” Working Paper. 
https://ssrn.com/abstract=2337908.", "crumbs": [ "R", "Appendix", - "Proofs" + "Clean Enhanced TRACE with R" ] }, { - "objectID": "r/proofs.html#optimal-portfolio-choice", - "href": "r/proofs.html#optimal-portfolio-choice", - "title": "Proofs", + "objectID": "r/changelog.html", + "href": "r/changelog.html", + "title": "Changelog", "section": "", - "text": "The minimum variance portfolio weights are given by the solution to \\[\\omega_\\text{mvp} = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega= 1,\\] where \\(\\iota\\) is an \\((N \\times 1)\\) vector of ones. The Lagrangian reads \\[ \\mathcal{L}(\\omega) = \\omega'\\Sigma \\omega - \\lambda(\\omega'\\iota - 1).\\] We can solve the first-order conditions of the Lagrangian equation: \\[\n\\begin{aligned}\n& \\frac{\\partial\\mathcal{L}(\\omega)}{\\partial\\omega} = 0 \\Leftrightarrow 2\\Sigma \\omega = \\lambda\\iota \\Rightarrow \\omega = \\frac{\\lambda}{2}\\Sigma^{-1}\\iota \\\\ \\end{aligned}\n\\] Next, the constraint that weights have to sum up to one delivers: \\(1 = \\iota'\\omega = \\frac{\\lambda}{2}\\iota'\\Sigma^{-1}\\iota \\Rightarrow \\lambda = \\frac{2}{\\iota'\\Sigma^{-1}\\iota}.\\) Finally, plug-in the derived value of \\(\\lambda\\) to get \\[\n\\begin{aligned}\n\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}.\n\\end{aligned}\n\\]\n\n\n\nConsider an investor who aims to achieve minimum variance given a desired expected return \\(\\bar{\\mu}\\), that is: \\[\\omega_\\text{eff}\\left(\\bar{\\mu}\\right) = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}.\\] The Lagrangian reads \\[ \\mathcal{L}(\\omega) = \\omega'\\Sigma \\omega - \\lambda(\\omega'\\iota - 1) - \\tilde{\\lambda}(\\omega'\\mu - \\bar{\\mu}). 
\\] We can solve the first-order conditions to get \\[\n\\begin{aligned}\n2\\Sigma \\omega &= \\lambda\\iota + \\tilde\\lambda \\mu\\\\\n\\Rightarrow\\omega &= \\frac{\\lambda}{2}\\Sigma^{-1}\\iota + \\frac{\\tilde\\lambda}{2}\\Sigma^{-1}\\mu.\n\\end{aligned}\n\\]\nNext, the two constraints (\\(w'\\iota = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}\\)) imply \\[\n\\begin{aligned}\n1 &= \\iota'\\omega = \\frac{\\lambda}{2}\\underbrace{\\iota'\\Sigma^{-1}\\iota}_{C} + \\frac{\\tilde\\lambda}{2}\\underbrace{\\iota'\\Sigma^{-1}\\mu}_D\\\\\n\\Rightarrow \\lambda&= \\frac{2 - \\tilde\\lambda D}{C}\\\\\n\\bar\\mu &= \\mu'\\omega = \\frac{\\lambda}{2}\\underbrace{\\mu'\\Sigma^{-1}\\iota}_{D} + \\frac{\\tilde\\lambda}{2}\\underbrace{\\mu'\\Sigma^{-1}\\mu}_E = \\frac{1}{2}\\left(\\frac{2 - \\tilde\\lambda D}{C}\\right)D+\\frac{\\tilde\\lambda}{2}E \\\\&=\\frac{D}{C}+\\frac{\\tilde\\lambda}{2}\\left(E - \\frac{D^2}{C}\\right)\\\\\n\\Rightarrow \\tilde\\lambda &= 2\\frac{\\bar\\mu - D/C}{E-D^2/C}.\n\\end{aligned}\n\\] As a result, the efficient portfolio weight takes the form (for \\(\\bar{\\mu} \\geq D/C = \\mu'\\omega_\\text{mvp}\\)) \\[\\omega_\\text{eff}\\left(\\bar\\mu\\right) = \\omega_\\text{mvp} + \\frac{\\tilde\\lambda}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right).\\] Thus, the efficient portfolio allocates wealth in the minimum variance portfolio \\(\\omega_\\text{mvp}\\) and a levered (self-financing) portfolio to increase the expected return.\nNote that the portfolio weights sum up to one as \\[\\iota'\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right) = D - D = 0\\text{ so }\\iota'\\omega_\\text{eff} = \\iota'\\omega_\\text{mvp} = 1.\\] Finally, the expected return of the efficient portfolio is \\[\\mu'\\omega_\\text{eff} = \\frac{D}{C} + \\bar\\mu - \\frac{D}{C} = \\bar\\mu.\\]\n\n\n\nWe argue that an investor with a quadratic utility function with certainty equivalent \\[\\max_\\omega CE(\\omega) = \\omega'\\mu - \\frac{\\gamma}{2} \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega = 1\\] faces an equivalent optimization problem to a framework where portfolio weights are chosen with the aim to minimize volatility given a pre-specified level or expected returns \\[\\min_\\omega \\omega'\\Sigma \\omega \\text{ s.t. } \\omega'\\mu = \\bar\\mu \\text{ and } \\iota'\\omega = 1.\\] Note the difference: In the first case, the investor has a (known) risk aversion \\(\\gamma\\) which determines their optimal balance between risk (\\(\\omega'\\Sigma\\omega)\\) and return (\\(\\mu'\\omega\\)). In the second case, the investor has a target return they want to achieve while minimizing the volatility. Intuitively, both approaches are closely connected if we consider that the risk aversion \\(\\gamma\\) determines the desirable return \\(\\bar\\mu\\). More risk-averse investors (higher \\(\\gamma\\)) will chose a lower target return to keep their volatility level down. The efficient frontier then spans all possible portfolios depending on the risk aversion \\(\\gamma\\), starting from the minimum variance portfolio (\\(\\gamma = \\infty\\)).\nTo proof this equivalence, consider first the optimal portfolio weights for a certainty equivalent maximizing investor. The first-order condition reads \\[\n\\begin{aligned}\n\\mu - \\lambda \\iota &= \\gamma \\Sigma \\omega \\\\\n\\Leftrightarrow \\omega &= \\frac{1}{\\gamma}\\Sigma^{-1}\\left(\\mu - \\lambda\\iota\\right)\n\\end{aligned}\n\\] Next, we make use of the constraint \\(\\iota'\\omega = 1\\). 
\\[\n\\begin{aligned}\n\\iota'\\omega &= 1 = \\frac{1}{\\gamma}\\left(\\iota'\\Sigma^{-1}\\mu - \\lambda\\iota'\\Sigma^{-1}\\iota\\right)\\\\\n\\Rightarrow \\lambda &= \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\left(\\iota'\\Sigma^{-1}\\mu - \\gamma \\right).\n\\end{aligned}\n\\] Plugging in the value of \\(\\lambda\\) reveals the desired portfolio for an investor with risk aversion \\(\\gamma\\). \\[\n\\begin{aligned}\n\\omega &= \\frac{1}{\\gamma}\\Sigma^{-1}\\left(\\mu - \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\left(\\iota'\\Sigma^{-1}\\mu - \\gamma \\right)\\right) \\\\\n\\Rightarrow \\omega &= \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota} + \\frac{1}{\\gamma}\\left(\\Sigma^{-1} - \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}\\iota'\\Sigma^{-1}\\right)\\mu\\\\\n&= \\omega_\\text{mvp} + \\frac{1}{\\gamma}\\left(\\Sigma^{-1}\\mu - \\frac{\\iota'\\Sigma^{-1}\\mu}{\\iota'\\Sigma^{-1}\\iota}\\Sigma^{-1}\\iota\\right).\n\\end{aligned}\n\\] The resulting weights correspond to the efficient portfolio with desired return \\(\\bar r\\) such that (in the notation of book) \\[\\frac{1}{\\gamma} = \\frac{\\tilde\\lambda}{2} = \\frac{\\bar\\mu - D/C}{E - D^2/C}\\] which implies that the desired return is just \\[\\bar\\mu = \\frac{D}{C} + \\frac{1}{\\gamma}\\left({E - D^2/C}\\right)\\] which is \\(\\frac{D}{C} = \\mu'\\omega_\\text{mvp}\\) for \\(\\gamma\\rightarrow \\infty\\) as expected. For instance, letting \\(\\gamma \\rightarrow \\infty\\) implies \\(\\bar\\mu = \\frac{D}{C} = \\omega_\\text{mvp}'\\mu\\).", + "text": "You can find every single change in our commit history. We collect the most important changes for Tidy Finance with R in the list below.\n\nSeptember 13, 2024, Commit 6464f94: we introduced the tidyfinance R package into the book. The package is available on CRAN.\nAugust 4, 2024, Commit 524bdd1: We added an additional filter to the Compustat download to exclude non-US companies in WRDS, CRSP, and Compustat.\nAugust 1, 2024, Commit 2980cf2: We updated the data until 2023-12-31 in all chapters.\nJuly 29, 2024, Commit cedec3e: We removed the month column from all chapters because it was misleading and consistently introduced date.\nJuly 16, 2024, Commit f4bbd00: We improved the documentation with respect to delisting returns in WRDS, CRSP, and Compustat.\nJune 3, 2024, Commit 23d379f: We fixed a bug in Univaritate Portfolio Sorts, which led to wrong annual returns in Figure 3.\nMay 15, 2024, Commit 2bb2e07: We added a new subsection about creating environment variables to Setting Up Your Environment.\nMay 15, 2024, Commit adccfc9: We updated the filters in CRSP download, so that correct historical information is used and daily and monthly data are aligned.\nApril 19, 2024, Commit d8c4de3: We updated to dbplyr version 2.5.0 and switched to the new I() instead of in_schema() syntax.\nMarch 4, 2024, Commit 6acb50b: We updated the download of monthly and daily CRSP data to the new official file format as distributed by WRDS, see additional information on the WRDS website.\nFeb 13, 2024, Commit 7871900: We updated the function for cleaning enhanced TRACE used in TRACE and FISD and shown in Appendix Clean Enhanced TRACE with R to reflect the correct time zone (i.e., New York, ET) and require less dependencies. 
We also updated the respective gist.\nFeb 13, 2024, Commit 5fce497: We removed the dependency on googledrive in Accessing and Managing Financial Data because of frequently encountered failed downloads due to quota limits on the Google API.\nJan 4, 2024, Commit e9ab1a3: We updated the syntax of *_join() functions to use join_by() instead of by.\nDec 10, 2023, Commit 9814a2f: We added handling of delisting returns to daily CRSP download.\nOct 14, 2023, Commit b5a7495: We changed the download of daily CRSP data from individual stocks to batches in WRDS, CRSP, and Compustat.\nOct 12, 2023, Commit 48b6b29: We migrated from keras to torch in Option Pricing via Machine Learning for improved environment management.\nOct 4, 2023, Commit d4e0717: We added a new chapter Setting Up Your Environment.\nSep 28, 2023, Commit 290a612: We updated all data sources until 2022-12-31.\nSep 23, 2023, Commit f88f6c9: We switched from alabama and quadprog to nloptr in Constrained Optimization and Backtesting to be more consistent with the optimization in Python and to provide more flexibility with respect to constraints.\nJune 15, 2023, Commit 47dbb30: We moved the first usage of broom::tidy() from Fama-MacBeth Regressions to Univariate Portfolio Sorts to clean up the CAPM estimation.\nJune 12, 2023, Commit e008622: We fixed some inconsistencies in the notation of portfolio weights. Now, we refer to portfolio weights with \\(\\omega\\) throughout the complete book.\nJune 12, 2023, Commit 186ec7b2: We fixed a typo in the discussion of the elastic net in Chapter Factor Selection via Machine Learning.\nMay 23, 2023, Commit d5e355c: We updated the workflow to collect() tables from tidy_finance.sqlite: To make variable selection more obvious, we now explicitly select() columns before collecting. As part of the pull request Commit 91d3077, we now select excess returns instead of net returns in the Chapter Fama-MacBeth Regressions.\nMay 20, 2023, Commit be0f0b4: We include NA-observations in the Mergent filters in Chapter TRACE and FISD.\nMay 17, 2023, Commit 2209bb1: We changed the assign_portfolio()-functions in Chapters Univariate Portfolio Sorts, Size Sorts and p-Hacking, Value and Bivariate Sorts, and Replicating Fama and French Factors. Additionally, we added a small explanation to potential issues with the function for clustered sorting variables in Chapter Univariate Portfolio Sorts.\nMay 12, 2023, Commit 54b76d7: We removed magic numbers in Chapter Introduction to Tidy Finance and introduced the scales package already in the introduction chapter to reduce scaling issues in figures.\nMar. 30, 2023, Issue 29: We upgraded to tidyverse 2.0.0 and R 4.2.3 and removed all explicit loads of lubridate.\nFeb. 15, 2023, Commit bfda6af: We corrected an error in the calculation of the annualized average return volatility in the Chapter Introduction to Tidy Finance.\nMar. 06, 2023, Commit 857f0f5: We corrected an error in the label of Figure 6 in Chapter Introduction to Tidy Finance, which wrongly claimed to show the efficient tangency portfolio.\nMar. 09, 2023, Commit fae4ac3: We corrected a typo in the definition of the power utility function in Chapter Portfolio Performance.
The utility function implemented in the code is now consistent with the text.", "crumbs": [ "R", "Appendix", - "Proofs" + "Changelog" ] }, { - "objectID": "r/changelog.html", - "href": "r/changelog.html", - "title": "Changelog", + "objectID": "r/accessing-and-managing-financial-data.html", + "href": "r/accessing-and-managing-financial-data.html", + "title": "Accessing and Managing Financial Data", "section": "", - "text": "You can find every single change in our commit history. We collect the most important changes for Tidy Finance with R in the list below.\n\nSeptember 13, 2024, Commit 6464f94: we introduced the tidyfinance R package into the book. The package is available on CRAN.\nAugust 4, 2024, Commit 524bdd1: We added an additional filter to the Compustat download to exclude non-US companies in WRDS, CRSP, and Compustat.\nAugust 1, 2024, Commit 2980cf2: We updated the data until 2023-12-31 in all chapters.\nJuly 29, 2024, Commit cedec3e: We removed the month column from all chapters because it was misleading and consistently introduced date.\nJuly 16, 2024, Commit f4bbd00: We improved the documentation with respect to delisting returns in WRDS, CRSP, and Compustat.\nJune 3, 2024, Commit 23d379f: We fixed a bug in Univaritate Portfolio Sorts, which led to wrong annual returns in Figure 3.\nMay 15, 2024, Commit 2bb2e07: We added a new subsection about creating environment variables to Setting Up Your Environment.\nMay 15, 2024, Commit adccfc9: We updated the filters in CRSP download, so that correct historical information is used and daily and monthly data are aligned.\nApril 19, 2024, Commit d8c4de3: We updated to dbplyr version 2.5.0 and switched to the new I() instead of in_schema() syntax.\nMarch 4, 2024, Commit 6acb50b: We updated the download of monthly and daily CRSP data to the new official file format as distributed by WRDS, see additional information on the WRDS website.\nFeb 13, 2024, Commit 7871900: We updated the function for cleaning enhanced TRACE used in TRACE and FISD and shown in Appendix Clean Enhanced TRACE with R to reflect the correct time zone (i.e., New York, ET) and require less dependencies. We also updated the respective gist.\nFeb 13, 2024, Commit 5fce497: We removed the depedency on googledrive in Accessing and Managing Financial Data because frequently encountered failed downloads due to quota limits on the Google API.\nJan 4, 2024, Commit e9ab1a3: We updated the syntax of *_join() functions to use join_by() instead of by.\nDec 10, 2023, Commit 9814a2f: We added handling of delisting returns to daily CRSP download.\nOct 14, 2023 Commit b5a7495: We changed the download of daily CRSP data from individual stocks to batches in WRDS, CRSP, and Compustat.\nOct 12, 2023, Commit 48b6b29: We migrated from keras to torch in Option Pricing via Machine Learning for improved environment management.\nOct 4, 2023, Commit d4e0717: We added a new chapter Setting Up Your Environment.\nSep 28, 2023, Commit 290a612: We updated all data sources until 2022-12-31.\nSep 23, 2023, Commit f88f6c9: We switched from alabama and quadprog to nloptr in Constrained Optimization and Backtesting to be more consistent with the optimization in Python and to provide more flexibility with respect to constraints.\nJune 15, 2023, Commit 47dbb30: We moved the first usage of broom:tidy() from Fama-Macbeth Regressions to Univariate Portfolio Sorts to clean up the CAPM estimation.\nJune 12, 2023, Commit e008622: We fixed some inconsencies in notation of portfolio weights. 
Now, we refer to portfolio weights with \\(\\omega\\) throughout the complete book.\nJune 12, 2023, Commit 186ec7b2: We fixed a typo in the discussion of the elastic net in Chapter Factor Selection via Machine Learning.\nMay 23, 2023, Commit d5e355c: We update the workflow to collect() tables from tidy_finance.sqlite: To make variable selection more obvious, we now explicitly select() columns before collecting. As part of the pull request Commit 91d3077, we now select excess returns instead of net returns in the Chapter Fama-MacBeth Regressions.\nMay 20, 2023, Commit be0f0b4: We include NA-observations in the Mergent filters in Chapter TRACE and FISD.\nMay 17, 2023, Commit 2209bb1: We changed the assign_portfolio()-functions in Chapters Univariate Portfolio Sorts, Size Sorts and p-Hacking, Value and Bivariate Sorts, and Replicating Fama and French Factors. Additionally, we added a small explanation to potential issues with the function for clustered sorting variables in Chapter Univariate Portfolio Sorts.\nMay 12, 2023, Commit 54b76d7: We removed magic numbers in Chapter Introduction to Tidy Finance and introduced the scales packages already in the introduction chapter to reduce scaling issues in figures.\nMar. 30, 2023, Issue 29: We upgraded to tidyverse 2.0.0 and R 4.2.3 and removed all explicit loads of lubridate.\nFeb. 15, 2023, Commit bfda6af: We corrected an error in the calculation of the annualized average return volatility in the Chapter Introduction to Tidy Finance.\nMar. 06, 2023, Commit 857f0f5: We corrected an error in the label of Figure 6, which wrongly claimed to show the efficient tangency portfolio.\nMar. 09, 2023, Commit fae4ac3: We corrected a typo in the definition of the power utility function in Chapter Portfolio Performance. The utility function implemented in the code is now consistent with the text.", + "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\nIn this chapter, we suggest a way to organize your financial data. Everybody who has experience with data is also familiar with storing data in various formats like CSV, XLS, XLSX, or other delimited value storage. Reading and saving data can become very cumbersome in the case of using different data formats, both across different projects and across different programming languages. Moreover, storing data in delimited files often leads to problems with respect to column type consistency. For instance, date-type columns frequently lead to inconsistencies across different data formats and programming languages.\nThis chapter shows how to import different open source data sets. Specifically, our data comes from the application programming interface (API) of Yahoo Finance, a downloaded standard CSV file, an XLSX file stored in a public Google Drive repository, and other macroeconomic time series that can be scraped directly from a website. We show how to process these raw data, as well as how to take a shortcut using the tidyfinance package, which provides a consistent interface to tidy financial data. We store all the data in a single database, which serves as the only source of data in subsequent chapters. We conclude the chapter by providing some tips on managing databases.\nFirst, we load the global R packages that we use throughout this chapter. 
Later on, we load more packages in the sections where we need them.\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nMoreover, we initially define the date range for which we fetch and store the financial data, making future data updates tractable. In case you need another time frame, you can adjust the dates below. Our data starts with 1960 since most asset pricing studies use data from 1962 on.\nstart_date <- ymd(\"1960-01-01\")\nend_date <- ymd(\"2023-12-31\")", "crumbs": [ "R", - "Appendix", - "Changelog" + "Financial Data", + "Accessing and Managing Financial Data" ] }, { - "objectID": "r/parametric-portfolio-policies.html", - "href": "r/parametric-portfolio-policies.html", - "title": "Parametric Portfolio Policies", - "section": "", - "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\nIn this chapter, we apply different portfolio performance measures to evaluate and compare portfolio allocation strategies. For this purpose, we introduce a direct way to estimate optimal portfolio weights for large-scale cross-sectional applications. More precisely, the approach of Brandt, Santa-Clara, and Valkanov (2009) proposes to parametrize the optimal portfolio weights as a function of stock characteristics instead of estimating the stock’s expected return, variance, and covariances with other stocks in a prior step. We choose weights as a function of the characteristics, which maximize the expected utility of the investor. This approach is feasible for large portfolio dimensions (such as the entire CRSP universe) and has been proposed by Brandt, Santa-Clara, and Valkanov (2009). See the review paper by Brandt (2010) for an excellent treatment of related portfolio choice methods.\nThe current chapter relies on the following set of R packages:\nlibrary(tidyverse)\nlibrary(RSQLite)", + "objectID": "r/accessing-and-managing-financial-data.html#fama-french-data", + "href": "r/accessing-and-managing-financial-data.html#fama-french-data", + "title": "Accessing and Managing Financial Data", + "section": "Fama-French Data", + "text": "Fama-French Data\nWe start by downloading some famous Fama-French factors (e.g., Fama and French 1993) and portfolio returns commonly used in empirical asset pricing. Fortunately, there is a neat package by Nelson Areal that allows us to access the data easily: the frenchdata package provides functions to download and read data sets from Prof. Kenneth French finance data library (Areal 2021). \n\nlibrary(frenchdata)\n\nWe can use the download_french_data() function of the package to download monthly Fama-French factors. The set Fama/French 3 Factors contains the return time series of the market mkt_excess, size smb and value hml alongside the risk-free rates rf. Note that we have to do some manual work to correctly parse all the columns and scale them appropriately, as the raw Fama-French data comes in a very unpractical data format. For precise descriptions of the variables, we suggest consulting Prof. Kenneth French’s finance data library directly. If you are on the website, check the raw data files to appreciate the time you can save thanks to frenchdata.\n\nfactors_ff3_monthly_raw <- download_french_data(\"Fama/French 3 Factors\")\nfactors_ff3_monthly <- factors_ff3_monthly_raw$subsets$data[[1]] |>\n mutate(\n date = floor_date(ymd(str_c(date, \"01\")), \"month\"),\n across(c(RF, `Mkt-RF`, SMB, HML), ~as.numeric(.) 
/ 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |> \n filter(date >= start_date & date <= end_date)\n\nWe also download the set 5 Factors (2x3), which additionally includes the return time series of the profitability rmw and investment cma factors. We demonstrate how the monthly factors are constructed in the chapter Replicating Fama and French Factors.\n\nfactors_ff5_monthly_raw <- download_french_data(\"Fama/French 5 Factors (2x3)\")\n\nfactors_ff5_monthly <- factors_ff5_monthly_raw$subsets$data[[1]] |>\n mutate(\n date = floor_date(ymd(str_c(date, \"01\")), \"month\"),\n across(c(RF, `Mkt-RF`, SMB, HML, RMW, CMA), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |> \n filter(date >= start_date & date <= end_date)\n\nIt is straightforward to download the corresponding daily Fama-French factors with the same function.\n\nfactors_ff3_daily_raw <- download_french_data(\"Fama/French 3 Factors [Daily]\")\n\nfactors_ff3_daily <- factors_ff3_daily_raw$subsets$data[[1]] |>\n mutate(\n date = ymd(date),\n across(c(RF, `Mkt-RF`, SMB, HML), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |>\n filter(date >= start_date & date <= end_date)\n\nIn a subsequent chapter, we also use the 10 monthly industry portfolios, so let us fetch that data, too.\n\nindustries_ff_monthly_raw <- download_french_data(\"10 Industry Portfolios\")\n\nindustries_ff_monthly <- industries_ff_monthly_raw$subsets$data[[1]] |>\n mutate(date = floor_date(ymd(str_c(date, \"01\")), \"month\")) |>\n mutate(across(where(is.numeric), ~ . / 100)) |>\n select(date, everything()) |>\n filter(date >= start_date & date <= end_date) |> \n rename_with(str_to_lower)\n\nIt is worth taking a look at all available portfolio return time series from Kenneth French’s homepage. You should check out the other sets by calling get_french_data_list().\nTo automatically download and process Fama-French data, you can also use the tidyfinance package with type = \"factors_ff_3_monthly\" or similar, e.g.:\n\ndownload_data(\n type = \"factors_ff_3_monthly\", \n start_date = start_date, \n end_date = end_date\n)\n\nThe tidyfinance package implements the processing steps as above and returns the same cleaned data frame. The list of supported Fama-French data types can be called as follows:\n\nlist_supported_types(domain = \"Fama-French\")", "crumbs": [ "R", - "Portfolio Optimization", - "Parametric Portfolio Policies" + "Financial Data", + "Accessing and Managing Financial Data" ] }, { - "objectID": "r/parametric-portfolio-policies.html#data-preparation", - "href": "r/parametric-portfolio-policies.html#data-preparation", - "title": "Parametric Portfolio Policies", - "section": "Data Preparation", - "text": "Data Preparation\nTo get started, we load the monthly CRSP file, which forms our investment universe. 
We load the data from our SQLite-database introduced in Accessing and Managing Financial Data and WRDS, CRSP, and Compustat.\n\ntidy_finance <- dbConnect(\n SQLite(), \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\ncrsp_monthly <- tbl(tidy_finance, \"crsp_monthly\") |>\n select(permno, date, ret_excess, mktcap, mktcap_lag) |>\n collect()\n\nTo evaluate the performance of portfolios, we further use monthly market returns as a benchmark to compute CAPM alphas.\n\nfactors_ff3_monthly <- tbl(tidy_finance, \"factors_ff3_monthly\") |>\n select(date, mkt_excess) |>\n collect()\n\nNext, we retrieve some stock characteristics that have been shown to have an effect on the expected returns or expected variances (or even higher moments) of the return distribution. In particular, we record the lagged one-year return momentum (momentum_lag), defined as the compounded return between months \\(t-13\\) and \\(t-2\\) for each firm. In finance, momentum is the empirically observed tendency for rising asset prices to rise further, and falling prices to keep falling (Jegadeesh and Titman 1993). The second characteristic is the firm’s market equity (size_lag), defined as the log of the price per share times the number of shares outstanding (Banz 1981). To construct the correct lagged values, we use the approach introduced in WRDS, CRSP, and Compustat.\n\ncrsp_monthly_lags <- crsp_monthly |>\n transmute(permno,\n date_13 = date %m+% months(13),\n mktcap\n )\n\ncrsp_monthly <- crsp_monthly |>\n inner_join(crsp_monthly_lags,\n join_by(permno, date == date_13),\n suffix = c(\"\", \"_13\")\n )\n\ndata_portfolios <- crsp_monthly |>\n mutate(\n momentum_lag = mktcap_lag / mktcap_13,\n size_lag = log(mktcap_lag)\n ) |>\n drop_na(contains(\"lag\"))", + "objectID": "r/accessing-and-managing-financial-data.html#q-factors", + "href": "r/accessing-and-managing-financial-data.html#q-factors", + "title": "Accessing and Managing Financial Data", + "section": "q-Factors", + "text": "q-Factors\nIn recent years, the academic discourse experienced the rise of alternative factor models, e.g., in the form of the Hou, Xue, and Zhang (2014) q-factor model. We refer to the extended background information provided by the original authors for further information. The q factors can be downloaded directly from the authors’ homepage from within read_csv().\nWe also need to adjust this data. First, we discard information we will not use in the remainder of the book. Then, we rename the columns with the “R_”-prescript using regular expressions and write all column names in lowercase. You should always try sticking to a consistent style for naming objects, which we try to illustrate here - the emphasis is on try. You can check out style guides available online, e.g., Hadley Wickham’s tidyverse style guide.\n\nfactors_q_monthly_link <-\n \"https://global-q.org/uploads/1/2/2/6/122679606/q5_factors_monthly_2023.csv\"\n\nfactors_q_monthly <- read_csv(factors_q_monthly_link) |>\n mutate(date = ymd(str_c(year, month, \"01\", sep = \"-\"))) |>\n rename_with(~str_remove(., \"R_\")) |>\n rename_with(str_to_lower) |>\n mutate(across(-date, ~. 
/ 100)) |>\n select(date, risk_free = f, mkt_excess = mkt, everything()) |>\n filter(date >= start_date & date <= end_date)\n\nAgain, you can use the tidyfinance package for a shortcut:\n\ndownload_data(\n type = \"factors_q5_monthly\", \n start_date = start_date, \n end_date = end_date\n)", "crumbs": [ "R", - "Portfolio Optimization", - "Parametric Portfolio Policies" + "Financial Data", + "Accessing and Managing Financial Data" ] }, { - "objectID": "r/parametric-portfolio-policies.html#parametric-portfolio-policies", - "href": "r/parametric-portfolio-policies.html#parametric-portfolio-policies", - "title": "Parametric Portfolio Policies", - "section": "Parametric Portfolio Policies", - "text": "Parametric Portfolio Policies\nThe basic idea of parametric portfolio weights is as follows. Suppose that at each date \\(t\\) we have \\(N_t\\) stocks in the investment universe, where each stock \\(i\\) has a return of \\(r_{i, t+1}\\) and is associated with a vector of firm characteristics \\(x_{i, t}\\) such as time-series momentum or the market capitalization. The investor’s problem is to choose portfolio weights \\(w_{i,t}\\) to maximize the expected utility of the portfolio return: \\[\\begin{aligned}\n\\max_{\\omega} E_t\\left(u(r_{p, t+1})\\right) = E_t\\left[u\\left(\\sum\\limits_{i=1}^{N_t}\\omega_{i,t}r_{i,t+1}\\right)\\right]\n\\end{aligned}\\] where \\(u(\\cdot)\\) denotes the utility function.\nWhere do the stock characteristics show up? We parameterize the optimal portfolio weights as a function of the stock characteristic \\(x_{i,t}\\) with the following linear specification for the portfolio weights: \\[\\omega_{i,t} = \\bar{\\omega}_{i,t} + \\frac{1}{N_t}\\theta'\\hat{x}_{i,t},\\] where \\(\\bar{\\omega}_{i,t}\\) is a stock’s weight in a benchmark portfolio (we use the value-weighted or naive portfolio in the application below), \\(\\theta\\) is a vector of coefficients which we are going to estimate, and \\(\\hat{x}_{i,t}\\) are the characteristics of stock \\(i\\), cross-sectionally standardized to have zero mean and unit standard deviation.\nIntuitively, the portfolio strategy is a form of active portfolio management relative to a performance benchmark. Deviations from the benchmark portfolio are derived from the individual stock characteristics. Note that by construction the weights sum up to one as \\(\\sum_{i=1}^{N_t}\\hat{x}_{i,t} = 0\\) due to the standardization. Moreover, the coefficients are constant across assets and over time. The implicit assumption is that the characteristics fully capture all aspects of the joint distribution of returns that are relevant for forming optimal portfolios.\nWe first implement cross-sectional standardization for the entire CRSP universe. We also keep track of (lagged) relative market capitalization relative_mktcap, which will represent the value-weighted benchmark portfolio, while n denotes the number of traded assets \\(N_t\\), which we use to construct the naive portfolio benchmark.\n\ndata_portfolios <- data_portfolios |>\n group_by(date) |>\n mutate(\n n = n(),\n relative_mktcap = mktcap_lag / sum(mktcap_lag),\n across(contains(\"lag\"), ~ (. 
- mean(.)) / sd(.)),\n ) |>\n ungroup() |>\n select(-mktcap_lag)", + "objectID": "r/accessing-and-managing-financial-data.html#macroeconomic-predictors", + "href": "r/accessing-and-managing-financial-data.html#macroeconomic-predictors", + "title": "Accessing and Managing Financial Data", + "section": "Macroeconomic Predictors", + "text": "Macroeconomic Predictors\nOur next data source is a set of macroeconomic variables often used as predictors for the equity premium. Welch and Goyal (2008) comprehensively reexamine the performance of variables suggested by the academic literature to be good predictors of the equity premium. The authors host the data updated to 2022 on Amit Goyal’s website. The data is an XLSX-file stored on a public Google drive location and we directly export a CSV file.\n\nsheet_id <- \"1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG\"\nsheet_name <- \"Monthly\"\nmacro_predictors_url <- paste0(\n \"https://docs.google.com/spreadsheets/d/\", sheet_id,\n \"/gviz/tq?tqx=out:csv&sheet=\", sheet_name\n)\nmacro_predictors_raw <- read_csv(macro_predictors_url)\n\nNext, we transform the columns into the variables that we later use:\n\nThe dividend price ratio (dp), the difference between the log of dividends and the log of prices, where dividends are 12-month moving sums of dividends paid on the S&P 500 index, and prices are monthly averages of daily closing prices (Campbell and Shiller 1988; Campbell and Yogo 2006).\nDividend yield (dy), the difference between the log of dividends and the log of lagged prices (Ball 1978).\nEarnings price ratio (ep), the difference between the log of earnings and the log of prices, where earnings are 12-month moving sums of earnings on the S&P 500 index (Campbell and Shiller 1988).\nDividend payout ratio (de), the difference between the log of dividends and the log of earnings (Lamont 1998).\nStock variance (svar), the sum of squared daily returns on the S&P 500 index (Guo 2006).\nBook-to-market ratio (bm), the ratio of book value to market value for the Dow Jones Industrial Average (Kothari and Shanken 1997).\nNet equity expansion (ntis), the ratio of 12-month moving sums of net issues by NYSE listed stocks divided by the total end-of-year market capitalization of NYSE stocks (Campbell, Hilscher, and Szilagyi 2008).\nTreasury bills (tbl), the 3-Month Treasury Bill: Secondary Market Rate from the economic research database at the Federal Reserve Bank at St. 
Louis (Campbell 1987).\nLong-term yield (lty), the long-term government bond yield from Ibbotson’s Stocks, Bonds, Bills, and Inflation Yearbook (Welch and Goyal 2008).\nLong-term rate of returns (ltr), the long-term government bond returns from Ibbotson’s Stocks, Bonds, Bills, and Inflation Yearbook (Welch and Goyal 2008).\nTerm spread (tms), the difference between the long-term yield on government bonds and the Treasury bill (Campbell 1987).\nDefault yield spread (dfy), the difference between BAA and AAA-rated corporate bond yields (Fama and French 1989).\nInflation (infl), the Consumer Price Index (All Urban Consumers) from the Bureau of Labor Statistics (Campbell and Vuolteenaho 2004).\n\nFor variable definitions and the required data transformations, you can consult the material on Amit Goyal’s website.\n\nmacro_predictors <- macro_predictors_raw |>\n mutate(date = ym(yyyymm)) |>\n mutate(across(where(is.character), as.numeric)) |>\n mutate(\n IndexDiv = Index + D12,\n logret = log(IndexDiv) - log(lag(IndexDiv)),\n Rfree = log(Rfree + 1),\n rp_div = lead(logret - Rfree, 1), # Future excess market return\n dp = log(D12) - log(Index), # Dividend Price ratio\n dy = log(D12) - log(lag(Index)), # Dividend yield\n ep = log(E12) - log(Index), # Earnings price ratio\n de = log(D12) - log(E12), # Dividend payout ratio\n tms = lty - tbl, # Term spread\n dfy = BAA - AAA # Default yield spread\n ) |>\n select(\n date, rp_div, dp, dy, ep, de, svar,\n bm = `b/m`, ntis, tbl, lty, ltr,\n tms, dfy, infl\n ) |>\n filter(date >= start_date & date <= end_date) |>\n drop_na()\n\nTo get the equivalent data through tidyfinance, you can call:\n\ndownload_data(\n type = \"macro_predictors_monthly\",\n start_date = start_date,\n end_date = end_date\n)", "crumbs": [ "R", - "Portfolio Optimization", - "Parametric Portfolio Policies" + "Financial Data", + "Accessing and Managing Financial Data" ] }, { - "objectID": "r/parametric-portfolio-policies.html#computing-portfolio-weights", - "href": "r/parametric-portfolio-policies.html#computing-portfolio-weights", - "title": "Parametric Portfolio Policies", - "section": "Computing Portfolio Weights", - "text": "Computing Portfolio Weights\nNext, we move on to identify optimal choices of \\(\\theta\\). We rewrite the optimization problem together with the weight parametrization and can then estimate \\(\\theta\\) to maximize the objective function based on our sample \\[\\begin{aligned}\nE_t\\left(u(r_{p, t+1})\\right) = \\frac{1}{T}\\sum\\limits_{t=0}^{T-1}u\\left(\\sum\\limits_{i=1}^{N_t}\\left(\\bar{\\omega}_{i,t} + \\frac{1}{N_t}\\theta'\\hat{x}_{i,t}\\right)r_{i,t+1}\\right).\n\\end{aligned}\\] The allocation strategy is straightforward because the number of parameters to estimate is small. Instead of a tedious specification of the \\(N_t\\) dimensional vector of expected returns and the \\(N_t(N_t+1)/2\\) free elements of the covariance matrix, all we need to focus on in our application is the vector \\(\\theta\\). \\(\\theta\\) contains only two elements in our application: the relative deviation from the benchmark due to size and momentum.\nTo get a feeling for the performance of such an allocation strategy, we start with an arbitrary initial vector \\(\\theta_0\\). The next step is to choose \\(\\theta\\) optimally to maximize the objective function. 
We automatically detect the number of parameters by counting the number of columns with lagged values.\n\nn_parameters <- sum(str_detect(\n colnames(data_portfolios), \"lag\"\n))\n\ntheta <- rep(1.5, n_parameters)\n\nnames(theta) <- colnames(data_portfolios)[str_detect(\n colnames(data_portfolios), \"lag\"\n)]\n\nThe function compute_portfolio_weights() below computes the portfolio weights \\(\\bar{\\omega}_{i,t} + \\frac{1}{N_t}\\theta'\\hat{x}_{i,t}\\) according to our parametrization for a given value \\(\\theta_0\\). Everything happens within a single pipeline. Hence, we provide a short walk-through.\nWe first compute characteristic_tilt, the tilting values \\(\\frac{1}{N_t}\\theta'\\hat{x}_{i, t}\\) which resemble the deviation from the benchmark portfolio. Next, we compute the benchmark portfolio weight_benchmark, which can be any reasonable set of portfolio weights. In our case, we choose either the value or equal-weighted allocation. weight_tilt completes the picture and contains the final portfolio weights weight_tilt = weight_benchmark + characteristic_tilt which deviate from the benchmark portfolio depending on the stock characteristics.\nThe final few lines go a bit further and implement a simple version of a no-short sale constraint. While it is generally not straightforward to ensure portfolio weight constraints via parameterization, we simply normalize the portfolio weights such that they are enforced to be positive. Finally, we make sure that the normalized weights sum up to one again: \\[\\omega_{i,t}^+ = \\frac{\\max(0, \\omega_{i,t})}{\\sum_{j=1}^{N_t}\\max(0, \\omega_{i,t})}.\\]\nThe following function computes the optimal portfolio weights in the way just described.\n\ncompute_portfolio_weights <- function(theta,\n data,\n value_weighting = TRUE,\n allow_short_selling = TRUE) {\n data |>\n group_by(date) |>\n bind_cols(\n characteristic_tilt = data |>\n transmute(across(contains(\"lag\"), ~ . / n)) |>\n as.matrix() %*% theta |> as.numeric()\n ) |>\n mutate(\n # Definition of benchmark weight\n weight_benchmark = case_when(\n value_weighting == TRUE ~ relative_mktcap,\n value_weighting == FALSE ~ 1 / n\n ),\n # Parametric portfolio weights\n weight_tilt = weight_benchmark + characteristic_tilt,\n # Short-sell constraint\n weight_tilt = case_when(\n allow_short_selling == TRUE ~ weight_tilt,\n allow_short_selling == FALSE ~ pmax(0, weight_tilt)\n ),\n # Weights sum up to 1\n weight_tilt = weight_tilt / sum(weight_tilt)\n ) |>\n ungroup()\n}\n\nIn the next step, we compute the portfolio weights for the arbitrary vector \\(\\theta_0\\). In the example below, we use the value-weighted portfolio as a benchmark and allow negative portfolio weights.\n\nweights_crsp <- compute_portfolio_weights(\n theta,\n data_portfolios,\n value_weighting = TRUE,\n allow_short_selling = TRUE\n)", + "objectID": "r/accessing-and-managing-financial-data.html#other-macroeconomic-data", + "href": "r/accessing-and-managing-financial-data.html#other-macroeconomic-data", + "title": "Accessing and Managing Financial Data", + "section": "Other Macroeconomic Data", + "text": "Other Macroeconomic Data\nThe Federal Reserve bank of St. Louis provides the Federal Reserve Economic Data (FRED), an extensive database for macroeconomic data. In total, there are 817,000 US and international time series from 108 different sources. The data can be downloaded directly from FRED by constructing the appropriate URL. 
For instance, let us consider the consumer price index (CPI) data that can be found under the CPIAUCNS:\n\nseries <- \"CPIAUCNS\"\ncpi_url <- paste0(\"https://fred.stlouisfed.org/series/\", series, \"/downloaddata/\", series, \".csv\")\n\nWe can then use the httr2 (Wickham 2024) package to request the CSV, extract the data from the response body, and convert the columns to a tidy format:\n\nlibrary(httr2)\n\ncpi_daily <- request(cpi_url) |>\n req_perform() |> \n resp_body_string() |> \n read_csv() |> \n mutate(\n date = as.Date(DATE),\n value = as.numeric(VALUE),\n series = series,\n .keep = \"none\"\n )\n\nWe convert the daily CPI data to monthly because we use the latter in later chapters.\n\ncpi_monthly <- cpi_daily |>\n mutate(\n date = floor_date(date, \"month\"),\n cpi = value / value[date == max(date)],\n .keep = \"none\"\n )\n\nThe tidyfinance package can, of course, also fetch the same daily data and many more data series:\n\ndownload_data(\n type = \"fred\",\n series = \"CPIAUCNS\",\n start_date = start_date,\n end_date = end_date\n)\n\n# A tibble: 768 × 3\n date value series \n <date> <dbl> <chr> \n1 1960-01-01 29.3 CPIAUCNS\n2 1960-02-01 29.4 CPIAUCNS\n3 1960-03-01 29.4 CPIAUCNS\n4 1960-04-01 29.5 CPIAUCNS\n5 1960-05-01 29.5 CPIAUCNS\n# ℹ 763 more rows\n\n\nTo download other time series, we just have to look it up on the FRED website and extract the corresponding key from the address. For instance, the producer price index for gold ores can be found under the PCU2122212122210 key. If your desired time series is not supported through tidyfinance, we recommend working with the fredr package (Boysel and Vaughan 2021). Note that you need to get an API key to use its functionality. We refer to the package documentation for details.", "crumbs": [ "R", - "Portfolio Optimization", - "Parametric Portfolio Policies" + "Financial Data", + "Accessing and Managing Financial Data" ] }, { - "objectID": "r/parametric-portfolio-policies.html#portfolio-performance", - "href": "r/parametric-portfolio-policies.html#portfolio-performance", - "title": "Parametric Portfolio Policies", - "section": "Portfolio Performance", - "text": "Portfolio Performance\n Are the computed weights optimal in any way? Most likely not, as we picked \\(\\theta_0\\) arbitrarily. To evaluate the performance of an allocation strategy, one can think of many different approaches. In their original paper, Brandt, Santa-Clara, and Valkanov (2009) focus on a simple evaluation of the hypothetical utility of an agent equipped with a power utility function \\(u_\\gamma(r) = \\frac{(1 + r)^{(1-\\gamma)}}{1-\\gamma}\\), where \\(\\gamma\\) is the risk aversion factor.\n\npower_utility <- function(r, gamma = 5) {\n (1 + r)^(1 - gamma) / (1 - gamma)\n}\n\nWe want to note that Gehrig, Sögner, and Westerkamp (2020) warn that, in the leading case of constant relative risk aversion (CRRA), strong assumptions on the properties of the returns, the variables used to implement the parametric portfolio policy, and the parameter space are necessary to obtain a well-defined optimization problem.\nNo doubt, there are many other ways to evaluate a portfolio. The function below provides a summary of all kinds of interesting measures that can be considered relevant. Do we need all these evaluation measures? It depends: the original paper by Brandt, Santa-Clara, and Valkanov (2009) only cares about the expected utility to choose \\(\\theta\\). 
However, if you want to choose optimal values that achieve the highest performance while putting some constraints on your portfolio weights, it is helpful to have everything in one function.\n\nevaluate_portfolio <- function(weights_crsp,\n capm_evaluation = TRUE,\n full_evaluation = TRUE,\n length_year = 12) {\n \n evaluation <- weights_crsp |>\n group_by(date) |>\n summarize(\n tilt = weighted.mean(ret_excess, weight_tilt),\n benchmark = weighted.mean(ret_excess, weight_benchmark)\n ) |>\n pivot_longer(\n -date,\n values_to = \"portfolio_return\",\n names_to = \"model\"\n ) \n \n evaluation_stats <- evaluation |>\n group_by(model) |>\n left_join(factors_ff3_monthly, join_by(date)) |>\n summarize(tibble(\n \"Expected utility\" = mean(power_utility(portfolio_return)),\n \"Average return\" = 100 * mean(length_year * portfolio_return),\n \"SD return\" = 100 * sqrt(length_year) * sd(portfolio_return),\n \"Sharpe ratio\" = sqrt(length_year) * mean(portfolio_return) / sd(portfolio_return),\n\n )) |>\n mutate(model = str_remove(model, \"return_\")) \n \n if (capm_evaluation) {\n evaluation_capm <- evaluation |> \n left_join(factors_ff3_monthly, join_by(date)) |>\n group_by(model) |>\n summarize(\n \"CAPM alpha\" = coefficients(lm(portfolio_return ~ mkt_excess))[1],\n \"Market beta\" = coefficients(lm(portfolio_return ~ mkt_excess))[2]\n )\n \n evaluation_stats <- evaluation_stats |> \n left_join(evaluation_capm, join_by(model))\n }\n\n if (full_evaluation) {\n evaluation_weights <- weights_crsp |>\n select(date, contains(\"weight\")) |>\n pivot_longer(-date, values_to = \"weight\", names_to = \"model\") |>\n group_by(model, date) |>\n mutate(\n \"Absolute weight\" = abs(weight),\n \"Max. weight\" = max(weight),\n \"Min. weight\" = min(weight),\n \"Avg. sum of negative weights\" = -sum(weight[weight < 0]),\n \"Avg. fraction of negative weights\" = sum(weight < 0) / n(),\n .keep = \"none\"\n ) |>\n group_by(model) |>\n summarize(across(-date, ~ 100 * mean(.))) |>\n mutate(model = str_remove(model, \"weight_\")) \n \n evaluation_stats <- evaluation_stats |> \n left_join(evaluation_weights, join_by(model))\n }\n \n evaluation_output <- evaluation_stats |> \n pivot_longer(cols = -model, names_to = \"measure\") |> \n pivot_wider(names_from = model)\n \n return(evaluation_output)\n}\n\n Let us take a look at the different portfolio strategies and evaluation measures.\n\nevaluate_portfolio(weights_crsp) |>\n print(n = Inf)\n\n# A tibble: 11 × 3\n measure benchmark tilt\n <chr> <dbl> <dbl>\n 1 Expected utility -0.250 -0.261 \n 2 Average return 6.87 0.537 \n 3 SD return 15.5 21.2 \n 4 Sharpe ratio 0.444 0.0254 \n 5 CAPM alpha 0.000141 -0.00485\n 6 Market beta 0.994 0.943 \n 7 Absolute weight 0.0249 0.0638 \n 8 Max. weight 3.63 3.76 \n 9 Min. weight 0.0000270 -0.144 \n10 Avg. sum of negative weights 0 78.1 \n11 Avg. fraction of negative weights 0 49.5 \n\n\nThe value-weighted portfolio delivers an annualized return of more than 6 percent and clearly outperforms the tilted portfolio, irrespective of whether we evaluate expected utility, the Sharpe ratio, or the CAPM alpha. We can conclude the market beta is close to one for both strategies (naturally almost identically 1 for the value-weighted benchmark portfolio). When it comes to the distribution of the portfolio weights, we see that the benchmark portfolio weight takes less extreme positions (lower average absolute weights and lower maximum weight). 
By definition, the value-weighted benchmark does not take any negative positions, while the tilted portfolio also takes short positions.", + "objectID": "r/accessing-and-managing-financial-data.html#setting-up-a-database", + "href": "r/accessing-and-managing-financial-data.html#setting-up-a-database", + "title": "Accessing and Managing Financial Data", + "section": "Setting Up a Database", + "text": "Setting Up a Database\nNow that we have downloaded some (freely available) data from the web into the memory of our R session let us set up a database to store that information for future use. We will use the data stored in this database throughout the following chapters, but you could alternatively implement a different strategy and replace the respective code.\nThere are many ways to set up and organize a database, depending on the use case. For our purpose, the most efficient way is to use an SQLite database, which is the C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. Note that SQL (Structured Query Language) is a standard language for accessing and manipulating databases and heavily inspired the dplyr functions. We refer to this tutorial for more information on SQL.\nThere are two packages that make working with SQLite in R very simple: RSQLite (Müller et al. 2022) embeds the SQLite database engine in R, and dbplyr (Wickham, Girlich, and Ruiz 2022) is the database back-end for dplyr. These packages allow to set up a database to remotely store tables and use these remote database tables as if they are in-memory data frames by automatically converting dplyr into SQL. Check out the RSQLite and dbplyr vignettes for more information.\n\nlibrary(RSQLite)\nlibrary(dbplyr)\n\nAn SQLite database is easily created - the code below is really all there is. You do not need any external software. Note that we use the extended_types = TRUE option to enable date types when storing and fetching data. Otherwise, date columns are stored and retrieved as integers. We will use the file tidy_finance_r.sqlite, located in the data subfolder, to retrieve data for all subsequent chapters. The initial part of the code ensures that the directory is created if it does not already exist.\n\nif (!dir.exists(\"data\")) {\n dir.create(\"data\")\n}\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\nNext, we create a remote table with the monthly Fama-French factor data. We do so with the function dbWriteTable(), which copies the data to our SQLite-database.\n\ndbWriteTable(\n tidy_finance,\n \"factors_ff3_monthly\",\n value = factors_ff3_monthly,\n overwrite = TRUE\n)\n\nWe can use the remote table as an in-memory data frame by building a connection via tbl().\n\nfactors_ff3_monthly_db <- tbl(tidy_finance, \"factors_ff3_monthly\")\n\nAll dplyr calls are evaluated lazily, i.e., the data is not in our R session’s memory, and the database does most of the work. You can see that by noticing that the output below does not show the number of rows. In fact, the following code chunk only fetches the top 10 rows from the database for printing.\n\nfactors_ff3_monthly_db |>\n select(date, rf)\n\n# Source: SQL [?? x 2]\n# Database: sqlite 3.41.2 [data/tidy_finance_r.sqlite]\n date rf\n <date> <dbl>\n1 1960-01-01 0.0033\n2 1960-02-01 0.0029\n3 1960-03-01 0.0035\n4 1960-04-01 0.0019\n5 1960-05-01 0.0027\n# ℹ more rows\n\n\nIf we want to have the whole table in memory, we need to collect() it. 
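As an aside that is not part of the original chapter: before collecting anything, you can inspect the SQL that dbplyr generates for such a lazy query with show_query(). A minimal sketch, assuming the tidy_finance connection and the factors_ff3_monthly_db table from above:

```{.r .cell-code}
# Sketch: print the SQL translation of a lazy dplyr pipeline (assumes the
# tidy_finance connection and factors_ff3_monthly_db defined above).
factors_ff3_monthly_db |>
  select(date, rf) |>
  filter(rf > 0) |>
  show_query()
```

The printed statement is what the SQLite engine eventually runs once the result is printed or collected.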
You will see that we regularly load the data into the memory in the next chapters.\n\nfactors_ff3_monthly_db |>\n select(date, rf) |>\n collect()\n\n# A tibble: 768 × 2\n date rf\n <date> <dbl>\n1 1960-01-01 0.0033\n2 1960-02-01 0.0029\n3 1960-03-01 0.0035\n4 1960-04-01 0.0019\n5 1960-05-01 0.0027\n# ℹ 763 more rows\n\n\nThe last couple of code chunks is really all there is to organizing a simple database! You can also share the SQLite database across devices and programming languages.\nBefore we move on to the next data source, let us also store the other five tables in our new SQLite database.\n\ndbWriteTable(\n tidy_finance,\n \"factors_ff5_monthly\",\n value = factors_ff5_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"factors_ff3_daily\",\n value = factors_ff3_daily,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"industries_ff_monthly\",\n value = industries_ff_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"factors_q_monthly\",\n value = factors_q_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"macro_predictors\",\n value = macro_predictors,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"cpi_monthly\",\n value = cpi_monthly,\n overwrite = TRUE\n)\n\nFrom now on, all you need to do to access data that is stored in the database is to follow three steps: (i) Establish the connection to the SQLite database, (ii) call the table you want to extract, and (iii) collect the data. For your convenience, the following steps show all you need in a compact fashion.\n\nlibrary(tidyverse)\nlibrary(RSQLite)\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\nfactors_q_monthly <- tbl(tidy_finance, \"factors_q_monthly\")\nfactors_q_monthly <- factors_q_monthly |> collect()", "crumbs": [ "R", - "Portfolio Optimization", - "Parametric Portfolio Policies" + "Financial Data", + "Accessing and Managing Financial Data" ] }, { - "objectID": "r/parametric-portfolio-policies.html#optimal-parameter-choice", - "href": "r/parametric-portfolio-policies.html#optimal-parameter-choice", - "title": "Parametric Portfolio Policies", - "section": "Optimal Parameter Choice", - "text": "Optimal Parameter Choice\nNext, we move to a choice of \\(\\theta\\) that actually aims to improve some (or all) of the performance measures. We first define a helper function compute_objective_function(), which we then pass to an optimizer.\n\ncompute_objective_function <- function(theta,\n data,\n objective_measure = \"Expected utility\",\n value_weighting = TRUE,\n allow_short_selling = TRUE) {\n processed_data <- compute_portfolio_weights(\n theta,\n data,\n value_weighting,\n allow_short_selling\n )\n\n objective_function <- evaluate_portfolio(\n processed_data,\n capm_evaluation = FALSE,\n full_evaluation = FALSE\n ) |>\n filter(measure == objective_measure) |>\n pull(tilt)\n\n return(-objective_function)\n}\n\nYou may wonder why we return the negative value of the objective function. This is simply due to the common convention for optimization procedures to search for minima as a default. By minimizing the negative value of the objective function, we get the maximum value as a result. In its most basic form, R optimization relies on the function optim(). As main inputs, the function requires an initial guess of the parameters and the objective function to minimize. 
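To see this interface in isolation (a toy example that is not part of the original text), we can minimize a simple quadratic function of two parameters with the same Nelder-Mead method that we use below:

```{.r .cell-code}
# Toy optim() example: minimize a quadratic with known minimum at c(1, -2),
# starting from an arbitrary initial guess (not the book's code).
toy_objective <- function(par) {
  (par[1] - 1)^2 + (par[2] + 2)^2
}

toy_solution <- optim(
  par = c(0, 0),
  fn = toy_objective,
  method = "Nelder-Mead"
)
toy_solution$par
```

The returned par is close to c(1, -2); the portfolio application below works the same way, just with compute_objective_function() as the objective.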
Now, we are fully equipped to compute the optimal values of \\(\\hat\\theta\\), which maximize the hypothetical expected utility of the investor.\n\noptimal_theta <- optim(\n par = theta,\n fn = compute_objective_function,\n objective_measure = \"Expected utility\",\n data = data_portfolios,\n value_weighting = TRUE,\n allow_short_selling = TRUE,\n method = \"Nelder-Mead\"\n)\n\noptimal_theta$par\n\nmomentum_lag size_lag \n 0.304 -1.705 \n\n\nThe resulting values of \\(\\hat\\theta\\) are easy to interpret: intuitively, expected utility increases by tilting weights from the value-weighted portfolio toward smaller stocks (negative coefficient for size) and toward past winners (positive value for momentum). Both findings are in line with the well-documented size effect (Banz 1981) and the momentum anomaly (Jegadeesh and Titman 1993).", + "objectID": "r/accessing-and-managing-financial-data.html#managing-sqlite-databases", + "href": "r/accessing-and-managing-financial-data.html#managing-sqlite-databases", + "title": "Accessing and Managing Financial Data", + "section": "Managing SQLite Databases", + "text": "Managing SQLite Databases\nFinally, at the end of our data chapter, we revisit the SQLite database itself. When you drop database objects such as tables or delete data from tables, the database file size remains unchanged because SQLite just marks the deleted objects as free and reserves their space for future uses. As a result, the database file always grows in size.\nTo optimize the database file, you can run the VACUUM command in the database, which rebuilds the database and frees up unused space. You can execute the command in the database using the dbSendQuery() function.\n\nres <- dbSendQuery(tidy_finance, \"VACUUM\")\nres\n\n<SQLiteResult>\n SQL VACUUM\n ROWS Fetched: 0 [complete]\n Changed: 0\n\n\nThe VACUUM command actually performs a couple of additional cleaning steps, which you can read about in this tutorial. \nWe store the result of the above query in res because the database keeps the result set open. To close open results and avoid warnings going forward, we can use dbClearResult().\n\ndbClearResult(res)\n\nApart from cleaning up, you might be interested in listing all the tables that are currently in your database. You can do this via the dbListTables() function.\n\ndbListTables(tidy_finance)\n\n [1] \"beta\" \"compustat\" \n [3] \"cpi_monthly\" \"crsp_daily\" \n [5] \"crsp_monthly\" \"factors_ff3_daily\" \n [7] \"factors_ff3_monthly\" \"factors_ff5_monthly\" \n [9] \"factors_q_monthly\" \"fisd\" \n[11] \"industries_ff_monthly\" \"macro_predictors\" \n[13] \"trace_enhanced\" \n\n\nThis function comes in handy if you are unsure about the correct naming of the tables in your database.", "crumbs": [ "R", - "Portfolio Optimization", - "Parametric Portfolio Policies" + "Financial Data", + "Accessing and Managing Financial Data" ] }, { - "objectID": "r/parametric-portfolio-policies.html#more-model-specifications", - "href": "r/parametric-portfolio-policies.html#more-model-specifications", - "title": "Parametric Portfolio Policies", - "section": "More Model Specifications", - "text": "More Model Specifications\nHow does the portfolio perform for different model specifications? For this purpose, we compute the performance of a number of different modeling choices based on the entire CRSP sample. 
The next code chunk performs all the heavy lifting.\n\nevaluate_optimal_performance <- function(data, \n objective_measure,\n value_weighting, \n allow_short_selling) {\n optimal_theta <- optim(\n par = theta,\n fn = compute_objective_function,\n data = data,\n objective_measure = \"Expected utility\",\n value_weighting = TRUE,\n allow_short_selling = TRUE,\n method = \"Nelder-Mead\"\n )\n\n processed_data = compute_portfolio_weights(\n optimal_theta$par, \n data,\n value_weighting,\n allow_short_selling\n )\n \n portfolio_evaluation = evaluate_portfolio(\n processed_data,\n capm_evaluation = TRUE,\n full_evaluation = TRUE\n )\n \n return(portfolio_evaluation) \n}\n\nspecifications <- expand_grid(\n data = list(data_portfolios),\n objective_measure = \"Expected utility\",\n value_weighting = c(TRUE, FALSE),\n allow_short_selling = c(TRUE, FALSE)\n) |> \n mutate(\n portfolio_evaluation = pmap(\n .l = list(data, objective_measure, value_weighting, allow_short_selling),\n .f = evaluate_optimal_performance\n )\n)\n\nFinally, we can compare the results. The table below shows summary statistics for all possible combinations: equal- or value-weighted benchmark portfolio, with or without short-selling constraints, and tilted toward maximizing expected utility.\n\nperformance_table <- specifications |>\n select(\n value_weighting,\n allow_short_selling,\n portfolio_evaluation\n ) |>\n unnest(portfolio_evaluation)\n\nperformance_table |>\n rename(\n \" \" = benchmark,\n Optimal = tilt\n ) |>\n mutate(\n value_weighting = case_when(\n value_weighting == TRUE ~ \"VW\",\n value_weighting == FALSE ~ \"EW\"\n ),\n allow_short_selling = case_when(\n allow_short_selling == TRUE ~ \"\",\n allow_short_selling == FALSE ~ \"(no s.)\"\n )\n ) |>\n pivot_wider(\n names_from = value_weighting:allow_short_selling,\n values_from = \" \":Optimal,\n names_glue = \"{value_weighting} {allow_short_selling} {.value} \"\n ) |>\n select(\n measure,\n `EW `,\n `VW `,\n sort(contains(\"Optimal\"))\n ) |>\n print(n = 11)\n\n# A tibble: 11 × 7\n measure `EW ` `VW ` `VW Optimal ` `VW (no s.) Optimal `\n <chr> <dbl> <dbl> <dbl> <dbl>\n 1 Expected u… -0.251 -2.50e-1 -0.247 -0.248 \n 2 Average re… 10.0 6.87e+0 12.9 12.1 \n 3 SD return 20.5 1.55e+1 19.5 19.0 \n 4 Sharpe rat… 0.489 4.44e-1 0.660 0.636 \n 5 CAPM alpha 0.00200 1.41e-4 0.00506 0.00425\n 6 Market beta 1.13 9.94e-1 1.01 1.03 \n 7 Absolute w… 0.0249 2.49e-2 0.0345 0.0249 \n 8 Max. weight 0.0249 3.63e+0 3.48 2.91 \n 9 Min. weight 0.0249 2.70e-5 -0.0281 0 \n10 Avg. sum o… 0 0 20.1 0 \n11 Avg. fract… 0 0 36.8 0 \n# ℹ 2 more variables: `EW Optimal ` <dbl>,\n# `EW (no s.) Optimal ` <dbl>\n\n\nThe results indicate that the average annualized Sharpe ratio of the equal-weighted portfolio exceeds the Sharpe ratio of the value-weighted benchmark portfolio. Nevertheless, starting with the weighted value portfolio as a benchmark and tilting optimally with respect to momentum and small stocks yields the highest Sharpe ratio across all specifications. Finally, imposing no short-sale constraints does not improve the performance of the portfolios in our application.", + "objectID": "r/accessing-and-managing-financial-data.html#exercises", + "href": "r/accessing-and-managing-financial-data.html#exercises", + "title": "Accessing and Managing Financial Data", + "section": "Exercises", + "text": "Exercises\n\nDownload the monthly Fama-French factors manually from Ken French’s data library and read them in via read_csv(). 
Validate that you get the same data as via the frenchdata package.\nDownload the daily Fama-French 5 factors using the frenchdata package. Use get_french_data_list() to find the corresponding table name. After the successful download and conversion to the column format that we used above, compare the rf, mkt_excess, smb, and hml columns of factors_ff3_daily to factors_ff5_daily. Discuss any differences you might find.", "crumbs": [ "R", - "Portfolio Optimization", - "Parametric Portfolio Policies" + "Financial Data", + "Accessing and Managing Financial Data" ] }, { - "objectID": "r/parametric-portfolio-policies.html#exercises", - "href": "r/parametric-portfolio-policies.html#exercises", - "title": "Parametric Portfolio Policies", + "objectID": "r/fixed-effects-and-clustered-standard-errors.html", + "href": "r/fixed-effects-and-clustered-standard-errors.html", + "title": "Fixed Effects and Clustered Standard Errors", + "section": "", + "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\nIn this chapter, we provide an intuitive introduction to the two popular concepts of fixed effects regressions and clustered standard errors. When working with regressions in empirical finance, you will sooner or later be confronted with discussions around how you deal with omitted variables bias and dependence in your residuals. The concepts we introduce in this chapter are designed to address such concerns.\nWe focus on a classical panel regression common to the corporate finance literature (e.g., Fazzari et al. 1988; Erickson and Whited 2012; Gulen and Ion 2015): firm investment modeled as a function that increases in firm cash flow and firm investment opportunities.\nTypically, this investment regression uses quarterly balance sheet data provided via Compustat because it allows for richer dynamics in the regressors and more opportunities to construct variables. As we focus on the implementation of fixed effects and clustered standard errors, we use the annual Compustat data from our previous chapters and leave the estimation using quarterly data as an exercise. We demonstrate below that the regression based on annual data yields qualitatively similar results to estimations based on quarterly data from the literature, namely confirming the positive relationships between investment and the two regressors.\nThe current chapter relies on the following set of R packages.\nlibrary(tidyverse)\nlibrary(RSQLite)\nlibrary(fixest)\nCompared to previous chapters, we introduce fixest (Bergé 2018) for the fixed effects regressions, the implementation of standard error clusters, and tidy estimation output.", + "crumbs": [ + "R", + "Modeling and Machine Learning", + "Fixed Effects and Clustered Standard Errors" + ] + }, + { + "objectID": "r/fixed-effects-and-clustered-standard-errors.html#data-preparation", + "href": "r/fixed-effects-and-clustered-standard-errors.html#data-preparation", + "title": "Fixed Effects and Clustered Standard Errors", + "section": "Data Preparation", + "text": "Data Preparation\nWe use CRSP and annual Compustat as data sources from our SQLite-database introduced in Accessing and Managing Financial Data and WRDS, CRSP, and Compustat. In particular, Compustat provides balance sheet and income statement data on a firm level, while CRSP provides market valuations. 
\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\ncrsp_monthly <- tbl(tidy_finance, \"crsp_monthly\") |>\n select(gvkey, date, mktcap) |>\n collect()\n\ncompustat <- tbl(tidy_finance, \"compustat\") |>\n select(datadate, gvkey, year, at, be, capx, oancf, txdb) |>\n collect()\n\nThe classical investment regressions model the capital investment of a firm as a function of operating cash flows and Tobin’s q, a measure of a firm’s investment opportunities. We start by constructing investment and cash flows which are usually normalized by lagged total assets of a firm. In the following code chunk, we construct a panel of firm-year observations, so we have both cross-sectional information on firms as well as time-series information for each firm.\n\ndata_investment <- compustat |>\n mutate(date = floor_date(datadate, \"month\")) |>\n left_join(compustat |>\n select(gvkey, year, at_lag = at) |>\n mutate(year = year + 1),\n join_by(gvkey, year)\n ) |>\n filter(at > 0, at_lag > 0) |>\n mutate(\n investment = capx / at_lag,\n cash_flows = oancf / at_lag\n )\n\ndata_investment <- data_investment |>\n left_join(data_investment |>\n select(gvkey, year, investment_lead = investment) |>\n mutate(year = year - 1),\n join_by(gvkey, year)\n )\n\nTobin’s q is the ratio of the market value of capital to its replacement costs. It is one of the most common regressors in corporate finance applications (e.g., Fazzari et al. 1988; Erickson and Whited 2012). We follow the implementation of Gulen and Ion (2015) and compute Tobin’s q as the market value of equity (mktcap) plus the book value of assets (at) minus book value of equity (be) plus deferred taxes (txdb), all divided by book value of assets (at). Finally, we only keep observations where all variables of interest are non-missing, and the reported book value of assets is strictly positive.\n\ndata_investment <- data_investment |>\n left_join(crsp_monthly, join_by(gvkey, date)) |>\n mutate(tobins_q = (mktcap + at - be + txdb) / at) |>\n select(gvkey, year, investment_lead, cash_flows, tobins_q) |>\n drop_na()\n\nAs the variable construction typically leads to extreme values that are most likely related to data issues (e.g., reporting errors), many papers include winsorization of the variables of interest. Winsorization involves replacing values of extreme outliers with quantiles on the respective end. The following function implements the winsorization for any percentage cut that should be applied on either end of the distributions. In the specific example, we winsorize the main variables (investment, cash_flows, and tobins_q) at the one percent level.\n\nwinsorize <- function(x, cut) {\n x <- replace(\n x,\n x > quantile(x, 1 - cut, na.rm = T),\n quantile(x, 1 - cut, na.rm = T)\n )\n x <- replace(\n x,\n x < quantile(x, cut, na.rm = T),\n quantile(x, cut, na.rm = T)\n )\n return(x)\n}\n\ndata_investment <- data_investment |>\n mutate(across(\n c(investment_lead, cash_flows, tobins_q),\n ~ winsorize(., 0.01)\n ))\n\nBefore proceeding to any estimations, we highly recommend tabulating summary statistics of the variables that enter the regression. These simple tables allow you to check the plausibility of your numerical variables, as well as spot any obvious errors or outliers. 
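As a quick toy check that is not part of the original text (it assumes the winsorize() helper defined above), applying the function to a vector with two extreme observations shows how both tails are clipped toward the respective quantiles:

```{.r .cell-code}
# Toy check of the winsorization helper from above: the extreme values -100
# and 100 are pulled in to (roughly) the 1 and 99 percent quantiles.
x <- c(-100, 1:98, 100)
range(x)
range(winsorize(x, cut = 0.01))
```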
Additionally, for panel data, plotting the time series of the variable’s mean and the number of observations is a useful exercise to spot potential problems.\n\ndata_investment |>\n pivot_longer(\n cols = c(investment_lead, cash_flows, tobins_q),\n names_to = \"measure\"\n ) |>\n group_by(measure) |>\n summarize(\n mean = mean(value),\n sd = sd(value),\n min = min(value),\n q05 = quantile(value, 0.05),\n q50 = quantile(value, 0.50),\n q95 = quantile(value, 0.95),\n max = max(value),\n n = n(),\n .groups = \"drop\"\n )\n\n# A tibble: 3 × 9\n measure mean sd min q05 q50 q95 max n\n <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>\n1 cash_flo… 0.00985 0.274 -1.55 -4.75e-1 0.0630 0.272 0.478 130559\n2 investme… 0.0570 0.0766 0 6.18e-4 0.0322 0.204 0.460 130559\n3 tobins_q 1.99 1.69 0.565 7.91e-1 1.39 5.35 10.9 130559", + "crumbs": [ + "R", + "Modeling and Machine Learning", + "Fixed Effects and Clustered Standard Errors" + ] + }, + { + "objectID": "r/fixed-effects-and-clustered-standard-errors.html#fixed-effects", + "href": "r/fixed-effects-and-clustered-standard-errors.html#fixed-effects", + "title": "Fixed Effects and Clustered Standard Errors", + "section": "Fixed Effects", + "text": "Fixed Effects\nTo illustrate fixed effects regressions, we use the fixest package, which is both computationally powerful and flexible with respect to model specifications. We start out with the basic investment regression using the simple model \\[ \\text{Investment}_{i,t+1} = \\alpha + \\beta_1\\text{Cash Flows}_{i,t}+\\beta_2\\text{Tobin's q}_{i,t}+\\varepsilon_{i,t},\\] where \\(\\varepsilon_t\\) is i.i.d. normally distributed across time and firms. We use the feols()-function to estimate the simple model so that the output has the same structure as the other regressions below, but you could also use lm().\n\nmodel_ols <- feols(\n fml = investment_lead ~ cash_flows + tobins_q,\n vcov = \"iid\",\n data = data_investment\n)\nmodel_ols\n\nOLS estimation, Dep. Var.: investment_lead\nObservations: 130,559 \nStandard-errors: IID \n Estimate Std. Error t value Pr(>|t|) \n(Intercept) 0.04209 0.000327 128.8 < 2.2e-16 ***\ncash_flows 0.04923 0.000777 63.3 < 2.2e-16 ***\ntobins_q 0.00724 0.000126 57.5 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\nRMSE: 0.074865 Adj. R2: 0.043615\n\n\nAs expected, the regression output shows significant coefficients for both variables. Higher cash flows and investment opportunities are associated with higher investment. However, the simple model actually may have a lot of omitted variables, so our coefficients are most likely biased. As there is a lot of unexplained variation in our simple model (indicated by the rather low adjusted R-squared), the bias in our coefficients is potentially severe, and the true values could be above or below zero. Note that there are no clear cutoffs to decide when an R-squared is high or low, but it depends on the context of your application and on the comparison of different models for the same data.\nOne way to tackle the issue of omitted variable bias is to get rid of as much unexplained variation as possible by including fixed effects; i.e., model parameters that are fixed for specific groups (e.g., Wooldridge 2010). In essence, each group has its own mean in fixed effects regressions. The simplest group that we can form in the investment regression is the firm level. 
The firm fixed effects regression is then \\[ \\text{Investment}_{i,t+1} = \\alpha_i + \\beta_1\\text{Cash Flows}_{i,t}+\\beta_2\\text{Tobin's q}_{i,t}+\\varepsilon_{i,t},\\] where \\(\\alpha_i\\) is the firm fixed effect and captures the firm-specific mean investment across all years. In fact, you could also compute firms’ investments as deviations from the firms’ average investments and estimate the model without the fixed effects. The idea of the firm fixed effect is to remove the firm’s average investment, which might be affected by firm-specific variables that you do not observe. For example, firms in a specific industry might invest more on average. Or you observe a young firm with large investments but only small concurrent cash flows, which will only happen in a few years. This sort of variation is unwanted because it is related to unobserved variables that can bias your estimates in any direction.\nTo include the firm fixed effect, we use gvkey (Compustat’s firm identifier) as follows:\n\nmodel_fe_firm <- feols(\n investment_lead ~ cash_flows + tobins_q | gvkey,\n vcov = \"iid\",\n data = data_investment\n)\nmodel_fe_firm\n\nOLS estimation, Dep. Var.: investment_lead\nObservations: 130,559 \nFixed-effects: gvkey: 14,556\nStandard-errors: IID \n Estimate Std. Error t value Pr(>|t|) \ncash_flows 0.0141 0.000897 15.8 < 2.2e-16 ***\ntobins_q 0.0107 0.000130 82.5 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\nRMSE: 0.049257 Adj. R2: 0.534034\n Within R2: 0.056459\n\n\nThe regression output shows a lot of unexplained variation at the firm level that is taken care of by including the firm fixed effect as the adjusted R-squared rises above 50 percent. In fact, it is more interesting to look at the within R-squared that shows the explanatory power of a firm’s cash flow and Tobin’s q on top of the average investment of each firm. We can also see that the coefficients changed slightly in magnitude but not in sign.\nThere is another source of variation that we can get rid of in our setting: average investment across firms might vary over time due to macroeconomic factors that affect all firms, such as economic crises. By including year fixed effects, we can take out the effect of unobservables that vary over time. The two-way fixed effects regression is then \\[ \\text{Investment}_{i,t+1} = \\alpha_i + \\alpha_t + \\beta_1\\text{Cash Flows}_{i,t}+\\beta_2\\text{Tobin's q}_{i,t}+\\varepsilon_{i,t},\\] where \\(\\alpha_t\\) is the time fixed effect. Here you can think of higher investments during an economic expansion with simultaneously high cash flows.\n\nmodel_fe_firmyear <- feols(\n investment_lead ~ cash_flows + tobins_q | gvkey + year,\n vcov = \"iid\",\n data = data_investment\n)\nmodel_fe_firmyear\n\nOLS estimation, Dep. Var.: investment_lead\nObservations: 130,559 \nFixed-effects: gvkey: 14,556, year: 36\nStandard-errors: IID \n Estimate Std. Error t value Pr(>|t|) \ncash_flows 0.01721 0.000877 19.6 < 2.2e-16 ***\ntobins_q 0.00972 0.000128 75.8 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\nRMSE: 0.047969 Adj. R2: 0.557959\n Within R2: 0.049442\n\n\nThe inclusion of time fixed effects did only marginally affect the R-squared and the coefficients, which we can interpret as a good thing as it indicates that the coefficients are not driven by an omitted variable that varies over time.\nHow can we further improve the robustness of our regression results? 
Ideally, we want to get rid of unexplained variation at the firm-year level, which means we need to include more variables that vary across firm and time and are likely correlated with investment. Note that we cannot include firm-year fixed effects in our setting because then cash flows and Tobin’s q are colinear with the fixed effects, and the estimation becomes void.\nBefore we discuss the properties of our estimation errors, we want to point out that regression tables are at the heart of every empirical analysis, where you compare multiple models. Fortunately, the etable() function provides a convenient way to tabulate the regression output (with many parameters to customize and even print the output in LaTeX). We recommend printing \\(t\\)-statistics rather than standard errors in regression tables because the latter are typically very hard to interpret across coefficients that vary in size. We also do not print p-values because they are sometimes misinterpreted to signal the importance of observed effects (Wasserstein and Lazar 2016). The \\(t\\)-statistics provide a consistent way to interpret changes in estimation uncertainty across different model specifications.\n\netable(\n model_ols, model_fe_firm, model_fe_firmyear,\n coefstat = \"tstat\", digits = 3, digits.stats = 3\n)\n\n model_ols model_fe_firm model_fe_firm..\nDependent Var.: investment_lead investment_lead investment_lead\n \nConstant 0.042*** (128.8) \ncash_flows 0.049*** (63.3) 0.014*** (15.8) 0.017*** (19.6)\ntobins_q 0.007*** (57.5) 0.011*** (82.5) 0.010*** (75.8)\nFixed-Effects: ---------------- --------------- ---------------\ngvkey No Yes Yes\nyear No No Yes\n_______________ ________________ _______________ _______________\nVCOV type IID IID IID\nObservations 130,559 130,559 130,559\nR2 0.044 0.586 0.607\nWithin R2 -- 0.057 0.049\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1", + "crumbs": [ + "R", + "Modeling and Machine Learning", + "Fixed Effects and Clustered Standard Errors" + ] + }, + { + "objectID": "r/fixed-effects-and-clustered-standard-errors.html#clustering-standard-errors", + "href": "r/fixed-effects-and-clustered-standard-errors.html#clustering-standard-errors", + "title": "Fixed Effects and Clustered Standard Errors", + "section": "Clustering Standard Errors", + "text": "Clustering Standard Errors\nApart from biased estimators, we usually have to deal with potentially complex dependencies of our residuals with each other. Such dependencies in the residuals invalidate the i.i.d. assumption of OLS and lead to biased standard errors. With biased OLS standard errors, we cannot reliably interpret the statistical significance of our estimated coefficients.\nIn our setting, the residuals may be correlated across years for a given firm (time-series dependence), or, alternatively, the residuals may be correlated across different firms (cross-section dependence). One of the most common approaches to dealing with such dependence is the use of clustered standard errors (Petersen 2008). The idea behind clustering is that the correlation of residuals within a cluster can be of any form. As the number of clusters grows, the cluster-robust standard errors become consistent (Donald and Lang 2007; Wooldridge 2010). A natural requirement for clustering standard errors in practice is hence a sufficiently large number of clusters. 
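A quick way to gauge whether this requirement is met in our panel (a small check that is not part of the original text, using the data_investment tibble from above) is to count the distinct firms and years:

```{.r .cell-code}
# Count the available clusters in each dimension of the panel (sketch; assumes
# the data_investment tibble constructed above).
data_investment |>
  summarize(
    firm_clusters = n_distinct(gvkey),
    year_clusters = n_distinct(year)
  )
```

The fixed effects output above already hints at the answer: there are 14,556 firms but only 36 years in our sample.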
Typically, around at least 30 to 50 clusters are seen as sufficient (Cameron, Gelbach, and Miller 2011).\nInstead of relying on the iid assumption, we can use the cluster option in the feols-function as above. The code chunk below applies both one-way clustering by firm as well as two-way clustering by firm and year.\n\nmodel_cluster_firm <- feols(\n investment_lead ~ cash_flows + tobins_q | gvkey + year,\n cluster = \"gvkey\",\n data = data_investment\n)\n\nmodel_cluster_firmyear <- feols(\n investment_lead ~ cash_flows + tobins_q | gvkey + year,\n cluster = c(\"gvkey\", \"year\"),\n data = data_investment\n)\n\n The table below shows the comparison of the different assumptions behind the standard errors. In the first column, we can see highly significant coefficients on both cash flows and Tobin’s q. By clustering the standard errors on the firm level, the \\(t\\)-statistics of both coefficients drop in half, indicating a high correlation of residuals within firms. If we additionally cluster by year, we see a drop, particularly for Tobin’s q, again. Even after relaxing the assumptions behind our standard errors, both coefficients are still comfortably significant as the \\(t\\)-statistics are well above the usual critical values of 1.96 or 2.576 for two-tailed significance tests.\n\netable(\n model_fe_firmyear, model_cluster_firm, model_cluster_firmyear,\n coefstat = \"tstat\", digits = 3, digits.stats = 3\n)\n\n model_fe_firm.. model_cluster.. model_cluster...1\nDependent Var.: investment_lead investment_lead investment_lead\n \ncash_flows 0.017*** (19.6) 0.017*** (11.4) 0.017*** (9.58)\ntobins_q 0.010*** (75.8) 0.010*** (35.6) 0.010*** (15.1)\nFixed-Effects: --------------- --------------- ---------------\ngvkey Yes Yes Yes\nyear Yes Yes Yes\n_______________ _______________ _______________ _______________\nVCOV type IID by: gvkey by: gvkey & year\nObservations 130,559 130,559 130,559\nR2 0.607 0.607 0.607\nWithin R2 0.049 0.049 0.049\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\nInspired by Abadie et al. (2017), we want to close this chapter by highlighting that choosing the right dimensions for clustering is a design problem. Even if the data is informative about whether clustering matters for standard errors, they do not tell you whether you should adjust the standard errors for clustering. Clustering at too aggregate levels can hence lead to unnecessarily inflated standard errors.", + "crumbs": [ + "R", + "Modeling and Machine Learning", + "Fixed Effects and Clustered Standard Errors" + ] + }, + { + "objectID": "r/fixed-effects-and-clustered-standard-errors.html#exercises", + "href": "r/fixed-effects-and-clustered-standard-errors.html#exercises", + "title": "Fixed Effects and Clustered Standard Errors", "section": "Exercises", - "text": "Exercises\n\nHow do the estimated parameters \\(\\hat\\theta\\) and the portfolio performance change if your objective is to maximize the Sharpe ratio instead of the hypothetical expected utility?\nThe code above is very flexible in the sense that you can easily add new firm characteristics. Construct a new characteristic of your choice and evaluate the corresponding coefficient \\(\\hat\\theta_i\\).\nTweak the function optimal_theta() such that you can impose additional performance constraints in order to determine \\(\\hat\\theta\\), which maximizes expected utility under the constraint that the market beta is below 1.\nDoes the portfolio performance resemble a realistic out-of-sample backtesting procedure? 
Verify the robustness of the results by first estimating \\(\\hat\\theta\\) based on past data only. Then, use more recent periods to evaluate the actual portfolio performance.\nBy formulating the portfolio problem as a statistical estimation problem, you can easily obtain standard errors for the coefficients of the weight function. Brandt, Santa-Clara, and Valkanov (2009) provide the relevant derivations in their paper in Equation (10). Implement a small function that computes standard errors for \\(\\hat\\theta\\).", + "text": "Exercises\n\nEstimate the two-way fixed effects model with two-way clustered standard errors using quarterly Compustat data from WRDS. Note that you can access quarterly data via tbl(wrds, I(\"comp.fundq\")).\nFollowing Peters and Taylor (2017), compute Tobin’s q as the market value of outstanding equity mktcap plus the book value of debt (dltt + dlc) minus the current assets atc and everything divided by the book value of property, plant and equipment ppegt. What is the correlation between the measures of Tobin’s q? What is the impact on the two-way fixed effects regressions?", "crumbs": [ "R", - "Portfolio Optimization", - "Parametric Portfolio Policies" + "Modeling and Machine Learning", + "Fixed Effects and Clustered Standard Errors" ] }, { - "objectID": "r/wrds-dummy-data.html", - "href": "r/wrds-dummy-data.html", - "title": "WRDS Dummy Data", + "objectID": "r/working-with-stock-returns.html", + "href": "r/working-with-stock-returns.html", + "title": "Working with Stock Returns", "section": "", - "text": "Note\n\n\n\nThis appendix chapter is based on a blog post Dummy Data for Tidy Finance Readers without Access to WRDS by Christoph Scheuch.\nIn this appendix chapter, we alleviate the constraints of readers who do not have access to WRDS and hence cannot run the code that we provide. We show how to create a dummy database that contains the WRDS tables and corresponding columns such that all code chunks in this book can be executed with this dummy database. We do not create dummy data for tables of macroeconomic variables because they can be freely downloaded from the original sources; check out Accessing and Managing Financial Data.\nWe deliberately use the dummy label because the data is not meaningful in the sense that it allows readers to actually replicate the results of the book. For legal reasons, the data does not contain any samples of the original data. We merely generate random numbers for all columns of the tables that we use throughout the books.\nTo generate the dummy database, we use the following packages:\nlibrary(tidyverse)\nlibrary(RSQLite)\nLet us initialize a SQLite database (tidy_finance_r.sqlite) or connect to your existing one. Be careful, if you already downloaded the data from WRDS, then the code in this chapter will overwrite your data!\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\nSince we draw random numbers for most of the columns, we also define a seed to ensure that the generated numbers are replicable. 
We also initialize vectors of dates of different frequencies over ten years that we then use to create yearly, monthly, and daily data, respectively.\nset.seed(1234)\n\nstart_date <- as.Date(\"2003-01-01\")\nend_date <- as.Date(\"2022-12-31\")\n\ntime_series_years <- seq(year(start_date), year(end_date), 1)\ntime_series_months <- seq(start_date, end_date, \"1 month\")\ntime_series_days <- seq(start_date, end_date, \"1 day\")", + "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\nThe main aim of this chapter is to familiarize yourself with the tidyverse for working with stock market data. We focus on downloading and visualizing stock data from Yahoo Finance.\nAt the start of each session, we load the required R packages. Throughout the entire book, we always use the tidyverse (Wickham et al. 2019). In this chapter, we also load the tidyfinance package to download stock price data. This package provides a convenient wrapper for various quantitative functions compatible with the tidyverse and our book. Finally, the package scales (Wickham and Seidel 2022) provides useful scale functions for visualizations.\nYou typically have to install a package once before you can load it. In case you have not done this yet, call, for instance, install.packages(\"tidyfinance\").\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nWe first download daily prices for one stock symbol, e.g., the Apple stock, AAPL, directly from the data provider Yahoo Finance. To download the data, you can use the function download_data. If you do not know how to use it, make sure you read the help file by calling ?download_data. We especially recommend taking a look at the examples section of the documentation. We request daily data for a period of more than 20 years.\nprices <- download_data(\n type = \"stock_prices\",\n symbols = \"AAPL\",\n start_date = \"2000-01-01\",\n end_date = \"2023-12-31\"\n)\nprices\n\n# A tibble: 6,037 × 8\n symbol date volume open low high close adjusted_close\n <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n1 AAPL 2000-01-03 535796800 0.936 0.908 1.00 0.999 0.843\n2 AAPL 2000-01-04 512377600 0.967 0.903 0.988 0.915 0.772\n3 AAPL 2000-01-05 778321600 0.926 0.920 0.987 0.929 0.783\n4 AAPL 2000-01-06 767972800 0.948 0.848 0.955 0.848 0.716\n5 AAPL 2000-01-07 460734400 0.862 0.853 0.902 0.888 0.749\n# ℹ 6,032 more rows\ndownload_data(type = \"stock_prices\") downloads stock market data from Yahoo Finance. The function returns a tibble with eight quite self-explanatory columns: symbol, date, the daily volume (in the number of traded shares), the market prices at the open, high, low, close, and the adjusted price in USD. The adjusted prices are corrected for anything that might affect the stock price after the market closes, e.g., stock splits and dividends. These actions affect the quoted prices, but they have no direct impact on the investors who hold the stock. Therefore, we often rely on adjusted prices when it comes to analyzing the returns an investor would have earned by holding the stock continuously.\nNext, we use the ggplot2 package (Wickham 2016) to visualize the time series of adjusted prices in Figure 1 . 
This package takes care of visualization tasks based on the principles of the grammar of graphics (Wilkinson 2012).\nprices |>\n ggplot(aes(x = date, y = adjusted_close)) +\n geom_line() +\n labs(\n x = NULL,\n y = NULL,\n title = \"Apple stock prices between beginning of 2000 and end of 2023\"\n )\n\n\n\n\n\n\n\nFigure 1: Prices are in USD, adjusted for dividend payments and stock splits.\nInstead of analyzing prices, we compute daily net returns defined as \\(r_t = p_t / p_{t-1} - 1\\), where \\(p_t\\) is the adjusted day \\(t\\) price. In that context, the function lag() is helpful, which returns the previous value in a vector.\nreturns <- prices |>\n arrange(date) |>\n mutate(ret = adjusted_close / lag(adjusted_close) - 1) |>\n select(symbol, date, ret)\nreturns\n\n# A tibble: 6,037 × 3\n symbol date ret\n <chr> <date> <dbl>\n1 AAPL 2000-01-03 NA \n2 AAPL 2000-01-04 -0.0843\n3 AAPL 2000-01-05 0.0146\n4 AAPL 2000-01-06 -0.0865\n5 AAPL 2000-01-07 0.0474\n# ℹ 6,032 more rows\nThe resulting tibble contains three columns, where the last contains the daily returns (ret). Note that the first entry naturally contains a missing value (NA) because there is no previous price. Obviously, the use of lag() would be meaningless if the time series is not ordered by ascending dates. The command arrange() provides a convenient way to order observations in the correct way for our application. In case you want to order observations by descending dates, you can use arrange(desc(date)).\nFor the upcoming examples, we remove missing values as these would require separate treatment when computing, e.g., sample averages. In general, however, make sure you understand why NA values occur and carefully examine if you can simply get rid of these observations.\nreturns <- returns |>\n drop_na(ret)\nNext, we visualize the distribution of daily returns in a histogram in Figure 2. Additionally, we add a dashed line that indicates the 5 percent quantile of the daily returns to the histogram, which is a (crude) proxy for the worst return of the stock with a probability of at most 5 percent. The 5 percent quantile is closely connected to the (historical) value-at-risk, a risk measure commonly monitored by regulators. We refer to Tsay (2010) for a more thorough introduction to stylized facts of returns.\nquantile_05 <- quantile(returns |> pull(ret), probs = 0.05)\nreturns |>\n ggplot(aes(x = ret)) +\n geom_histogram(bins = 100) +\n geom_vline(aes(xintercept = quantile_05),\n linetype = \"dashed\"\n ) +\n labs(\n x = NULL,\n y = NULL,\n title = \"Distribution of daily Apple stock returns\"\n ) +\n scale_x_continuous(labels = percent)\n\n\n\n\n\n\n\nFigure 2: The dotted vertical line indicates the historical 5 percent quantile.\nHere, bins = 100 determines the number of bins used in the illustration and hence implicitly the width of the bins. Before proceeding, make sure you understand how to use the geom geom_vline() to add a dashed line that indicates the 5 percent quantile of the daily returns. A typical task before proceeding with any data is to compute summary statistics for the main variables of interest.\nreturns |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n )\n ))\n\n# A tibble: 1 × 4\n ret_daily_mean ret_daily_sd ret_daily_min ret_daily_max\n <dbl> <dbl> <dbl> <dbl>\n1 0.00122 0.0247 -0.519 0.139\nWe see that the maximum daily return was 13.905 percent. Perhaps not surprisingly, the average daily return is close to but slightly above 0. 
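To translate these daily magnitudes into more familiar annual terms, we can use a back-of-the-envelope annualization (a sketch that is not part of the original text, assuming 252 trading days per year): the mean scales with 252 and the volatility with the square root of 252.

```{.r .cell-code}
# Back-of-the-envelope annualization of the daily summary statistics above,
# assuming 252 trading days per year (not the book's code).
returns |>
  summarize(
    annualized_mean = 252 * mean(ret),
    annualized_volatility = sqrt(252) * sd(ret)
  )
```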
In line with the illustration above, the large losses on the day with the minimum returns indicate a strong asymmetry in the distribution of returns.\nYou can also compute these summary statistics for each year individually by imposing group_by(year = year(date)), where the call year(date) returns the year. More specifically, the few lines of code below compute the summary statistics from above for individual groups of data defined by year. The summary statistics, therefore, allow an eyeball analysis of the time-series dynamics of the return distribution.\nreturns |>\n group_by(year = year(date)) |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n ),\n .names = \"{.fn}\"\n )) |>\n print(n = Inf)\n\n# A tibble: 24 × 5\n year daily_mean daily_sd daily_min daily_max\n <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 2000 -0.00346 0.0549 -0.519 0.137 \n 2 2001 0.00233 0.0393 -0.172 0.129 \n 3 2002 -0.00121 0.0305 -0.150 0.0846\n 4 2003 0.00186 0.0234 -0.0814 0.113 \n 5 2004 0.00470 0.0255 -0.0558 0.132 \n 6 2005 0.00349 0.0245 -0.0921 0.0912\n 7 2006 0.000950 0.0243 -0.0633 0.118 \n 8 2007 0.00366 0.0238 -0.0702 0.105 \n 9 2008 -0.00265 0.0367 -0.179 0.139 \n10 2009 0.00382 0.0214 -0.0502 0.0676\n11 2010 0.00183 0.0169 -0.0496 0.0769\n12 2011 0.00104 0.0165 -0.0559 0.0589\n13 2012 0.00130 0.0186 -0.0644 0.0887\n14 2013 0.000472 0.0180 -0.124 0.0514\n15 2014 0.00145 0.0136 -0.0799 0.0820\n16 2015 0.0000199 0.0168 -0.0612 0.0574\n17 2016 0.000575 0.0147 -0.0657 0.0650\n18 2017 0.00164 0.0111 -0.0388 0.0610\n19 2018 -0.0000573 0.0181 -0.0663 0.0704\n20 2019 0.00266 0.0165 -0.0996 0.0683\n21 2020 0.00281 0.0294 -0.129 0.120 \n22 2021 0.00131 0.0158 -0.0417 0.0539\n23 2022 -0.000970 0.0225 -0.0587 0.0890\n24 2023 0.00168 0.0128 -0.0480 0.0469\nIn case you wonder: the additional argument .names = \"{.fn}\" in across() determines how to name the output columns. The specification is rather flexible and allows almost arbitrary column names, which can be useful for reporting. The print() function simply controls the output options for the R console.", "crumbs": [ "R", - "Appendix", - "WRDS Dummy Data" + "Getting Started", + "Working with Stock Returns" ] }, { - "objectID": "r/wrds-dummy-data.html#create-stock-dummy-data", - "href": "r/wrds-dummy-data.html#create-stock-dummy-data", - "title": "WRDS Dummy Data", - "section": "Create Stock Dummy Data", - "text": "Create Stock Dummy Data\nLet us start with the core data used throughout the book: stock and firm characteristics. We first generate a table with a cross-section of stock identifiers with unique permno and gvkey values, as well as associated exchcd, exchange, industry, and siccd values. The generated data is based on the characteristics of stocks in the crsp_monthly table of the original database, ensuring that the generated stocks roughly reflect the distribution of industries and exchanges in the original data, but the identifiers and corresponding exchanges or industries do not reflect actual firms. 
Similarly, the permno-gvkey combinations are purely nonsensical and should not be used together with actual CRSP or Compustat data.\n\nnumber_of_stocks <- 100\n\nindustries <- tibble(\n industry = c(\"Agriculture\", \"Construction\", \"Finance\", \n \"Manufacturing\", \"Mining\", \"Public\", \"Retail\", \n \"Services\", \"Transportation\", \"Utilities\", \n \"Wholesale\"),\n n = c(81, 287, 4682, 8584, 1287, 1974, 1571, 4277, 1249, \n 457, 904),\n prob = c(0.00319, 0.0113, 0.185, 0.339, 0.0508, 0.0779, \n 0.0620, 0.169, 0.0493, 0.0180, 0.0357)\n)\n\nexchanges <- exchanges <- tibble(\n exchange = c(\"AMEX\", \"NASDAQ\", \"NYSE\"),\n n = c(2893, 17236, 5553),\n prob = c(0.113, 0.671, 0.216)\n)\n\nstock_identifiers <- 1:number_of_stocks |> \n map_df(\n function(x) {\n tibble(\n permno = x,\n gvkey = as.character(x + 10000),\n exchange = sample(exchanges$exchange, 1, \n prob = exchanges$prob),\n industry = sample(industries$industry, 1, \n prob = industries$prob)\n ) |> \n mutate(\n exchcd = case_when(\n exchange == \"NYSE\" ~ sample(c(1, 31), n()),\n exchange == \"AMEX\" ~ sample(c(2, 32), n()),\n exchange == \"NASDAQ\" ~ sample(c(3, 33), n())\n ),\n siccd = case_when(\n industry == \"Agriculture\" ~ sample(1:999, n()),\n industry == \"Mining\" ~ sample(1000:1499, n()),\n industry == \"Construction\" ~ sample(1500:1799, n()),\n industry == \"Manufacturing\" ~ sample(1800:3999, n()),\n industry == \"Transportation\" ~ sample(4000:4899, n()),\n industry == \"Utilities\" ~ sample(4900:4999, n()),\n industry == \"Wholesale\" ~ sample(5000:5199, n()),\n industry == \"Retail\" ~ sample(5200:5999, n()),\n industry == \"Finance\" ~ sample(6000:6799, n()),\n industry == \"Services\" ~ sample(7000:8999, n()),\n industry == \"Public\" ~ sample(9000:9999, n())\n )\n )\n }\n )\n\nNext, we construct three panels of stock data with varying frequencies: yearly, monthly, and daily. We begin by creating the stock_panel_yearly panel. To achieve this, we combine the stock_identifiers table with a new table containing the variable year from dummy_years. The expand_grid() function ensures that we get all possible combinations of the two tables. After combining, we select only the gvkey and year columns for our final yearly panel.\nNext, we construct the stock_panel_monthly panel. Similar to the yearly panel, we use the expand_grid() function to combine stock_identifiers with a new table that has the date variable from dummy_months. After merging, we select the columns permno, gvkey, date, siccd, industry, exchcd, and exchange to form our monthly panel.\nLastly, we create the stock_panel_daily panel. We combine stock_identifiers with a table containing the date variable from dummy_days. After merging, we retain only the permno and date columns for our daily panel.\n\nstock_panel_yearly <- expand_grid(\n stock_identifiers, \n tibble(year = time_series_years)\n) |> \n select(gvkey, year)\n\nstock_panel_monthly <- expand_grid(\n stock_identifiers, \n tibble(date = time_series_months)\n) |> \n select(permno, gvkey, date, siccd, industry, exchcd, exchange)\n\nstock_panel_daily <- expand_grid(\n stock_identifiers, \n tibble(date = time_series_days)\n)|> \n select(permno, date)\n\n\nDummy beta table\nWe then proceed to create dummy beta values for our stock_panel_monthly table. We generate monthly beta values beta_monthly using the rnorm() function with a mean and standard deviation of 1. For daily beta values beta_daily, we take the dummy monthly beta and add a small random noise to it. 
This noise is generated again using the rnorm() function, but this time we divide the random values by 100 to ensure they are small deviations from the monthly beta.\n\nbeta_dummy <- stock_panel_monthly |> \n mutate(\n beta_monthly = rnorm(n(), mean = 1, sd = 1),\n beta_daily = beta_monthly + rnorm(n()) / 100\n )\n\ndbWriteTable(\n tidy_finance,\n \"beta\", \n beta_dummy, \n overwrite = TRUE\n)\n\n\n\nDummy compustat table\nTo create dummy firm characteristics, we take all columns from the compustat table and create random numbers between 0 and 1. For simplicity, we set the datadate for each firm-year observation to the last day of the year, although it is empirically not the case. \n\nrelevant_columns <- c(\n \"seq\", \"ceq\", \"at\", \"lt\", \"txditc\", \"txdb\", \"itcb\", \n \"pstkrv\", \"pstkl\", \"pstk\", \"capx\", \"oancf\", \"sale\", \n \"cogs\", \"xint\", \"xsga\", \"be\", \"op\", \"at_lag\", \"inv\"\n)\n\ncommands <- unlist(\n map(\n relevant_columns, \n ~rlang::exprs(!!..1 := runif(n()))\n )\n)\n\ncompustat_dummy <- stock_panel_yearly |> \n mutate(\n datadate = ymd(str_c(year, \"12\", \"31\")),\n !!!commands\n )\n\ndbWriteTable(\n tidy_finance, \n \"compustat\", \n compustat_dummy,\n overwrite = TRUE\n)\n\n\n\nDummy crsp_monthly table\nThe crsp_monthly table only lacks a few more columns compared to stock_panel_monthly: the returns ret drawn from a normal distribution, the excess returns ret_excess with small deviations from the returns, the shares outstanding shrout and the last price per month altprc both drawn from uniform distributions, and the market capitalization mktcap as the product of shrout and altprc. \n\ncrsp_monthly_dummy <- stock_panel_monthly |> \n mutate(\n ret = pmax(rnorm(n()), -1),\n ret_excess = pmax(ret - runif(n(), 0, 0.0025), -1),\n shrout = runif(n(), 1, 50) * 1000,\n altprc = runif(n(), 0, 1000),\n mktcap = shrout * altprc\n ) |> \n group_by(permno) |> \n arrange(date) |> \n mutate(mktcap_lag = lag(mktcap)) |> \n ungroup()\n\ndbWriteTable(\n tidy_finance, \n \"crsp_monthly\",\n crsp_monthly_dummy,\n overwrite = TRUE\n)\n\n\n\nDummy crsp_daily table\nThe crsp_daily table only contains a date column and the daily excess returns ret_excess as additional columns to stock_panel_daily.\n\ncrsp_daily_dummy <- stock_panel_daily |> \n mutate(\n ret_excess = pmax(rnorm(n()), -1)\n )\n\ndbWriteTable(\n tidy_finance,\n \"crsp_daily\",\n crsp_daily_dummy, \n overwrite = TRUE\n)", + "objectID": "r/working-with-stock-returns.html#scaling-up-the-analysis", + "href": "r/working-with-stock-returns.html#scaling-up-the-analysis", + "title": "Working with Stock Returns", + "section": "Scaling Up the Analysis", + "text": "Scaling Up the Analysis\nAs a next step, we generalize the code from before such that all the computations can handle an arbitrary vector of symbols (e.g., all constituents of an index). Following tidy principles, it is quite easy to download the data, plot the price time series, and tabulate the summary statistics for an arbitrary number of assets.\nThis is where the tidyverse magic starts: tidy data makes it extremely easy to generalize the computations from before to as many assets as you like. The following code takes any vector of symbols, e.g., symbol <- c(\"AAPL\", \"MMM\", \"BA\"), and automates the download as well as the plot of the price time series. In the end, we create the table of summary statistics for an arbitrary number of assets. We perform the analysis with data from all current constituents of the Dow Jones Industrial Average index. 
\n\nsymbols <- download_data(type = \"constituents\", index = \"Dow Jones Industrial Average\") \nsymbols\n\n# A tibble: 30 × 5\n symbol name location exchange currency\n <chr> <chr> <chr> <chr> <chr> \n1 GS GOLDMAN SACHS GROUP INC Vereinigte Staaten New Yor… USD \n2 UNH UNITEDHEALTH GROUP INC Vereinigte Staaten New Yor… USD \n3 MSFT MICROSOFT CORP Vereinigte Staaten NASDAQ USD \n4 HD HOME DEPOT INC Vereinigte Staaten New Yor… USD \n5 CAT CATERPILLAR INC Vereinigte Staaten New Yor… USD \n# ℹ 25 more rows\n\n\nConveniently, tidyfinance provides the functionality to get all stock prices from an index with a single call. \n\nprices_daily <- download_data(\n type = \"stock_prices\",\n symbols = symbols$symbol,\n start_date = \"2000-01-01\",\n end_date = \"2023-12-31\"\n)\n\nThe resulting tibble contains 177925 daily observations for GS, UNH, MSFT, HD, CAT, SHW, CRM, V, AXP, MCD, AMGN, AAPL, TRV, JPM, HON, AMZN, IBM, BA, PG, CVX, JNJ, NVDA, MMM, DIS, MRK, WMT, NKE, KO, CSCO, VZ different stocks. Figure 3 illustrates the time series of downloaded adjusted prices for each of the constituents of the Dow index. Make sure you understand every single line of code! What are the arguments of aes()? Which alternative geoms could you use to visualize the time series? Hint: if you do not know the answers try to change the code to see what difference your intervention causes.\n\nfig_prices <- prices_daily |>\n ggplot(aes(\n x = date,\n y = adjusted_close,\n color = symbol\n )) +\n geom_line() +\n labs(\n x = NULL,\n y = NULL,\n color = NULL,\n title = \"Stock prices of Dow index constituents\"\n ) +\n theme(legend.position = \"none\")\nfig_prices\n\n\n\n\n\n\n\nFigure 3: Prices in USD, adjusted for dividend payments and stock splits.\n\n\n\n\n\nDo you notice the small differences relative to the code we used before? All we need to do to illustrate all stock symbols simultaneously is to include color = symbol in the ggplot aesthetics. In this way, we generate a separate line for each symbol. Of course, there are simply too many lines on this graph to identify the individual stocks properly, but it illustrates the point well.\nThe same holds for stock returns. Before computing the returns, we use group_by(symbol) such that the mutate() command is performed for each symbol individually. 
The same logic also applies to the computation of summary statistics: group_by(symbol) is the key to aggregating the time series into symbol-specific variables of interest.\n\nreturns_daily <- prices_daily |>\n group_by(symbol) |>\n mutate(ret = adjusted_close / lag(adjusted_close) - 1) |>\n select(symbol, date, ret) |>\n drop_na(ret)\n\nreturns_daily |>\n group_by(symbol) |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n ),\n .names = \"{.fn}\"\n )) |>\n print(n = Inf)\n\n# A tibble: 30 × 5\n symbol daily_mean daily_sd daily_min daily_max\n <chr> <dbl> <dbl> <dbl> <dbl>\n 1 AAPL 0.00122 0.0247 -0.519 0.139\n 2 AMGN 0.000493 0.0194 -0.134 0.151\n 3 AMZN 0.00107 0.0315 -0.248 0.345\n 4 AXP 0.000544 0.0227 -0.176 0.219\n 5 BA 0.000628 0.0222 -0.238 0.243\n 6 CAT 0.000724 0.0203 -0.145 0.147\n 7 CRM 0.00119 0.0266 -0.271 0.260\n 8 CSCO 0.000322 0.0234 -0.162 0.244\n 9 CVX 0.000511 0.0175 -0.221 0.227\n10 DIS 0.000414 0.0194 -0.184 0.160\n11 GS 0.000557 0.0229 -0.190 0.265\n12 HD 0.000544 0.0192 -0.287 0.141\n13 HON 0.000497 0.0191 -0.174 0.282\n14 IBM 0.000297 0.0163 -0.155 0.120\n15 JNJ 0.000379 0.0121 -0.158 0.122\n16 JPM 0.000606 0.0238 -0.207 0.251\n17 KO 0.000318 0.0131 -0.101 0.139\n18 MCD 0.000536 0.0145 -0.159 0.181\n19 MMM 0.000363 0.0151 -0.129 0.126\n20 MRK 0.000371 0.0166 -0.268 0.130\n21 MSFT 0.000573 0.0193 -0.156 0.196\n22 NKE 0.000708 0.0193 -0.198 0.155\n23 NVDA 0.00175 0.0376 -0.352 0.424\n24 PG 0.000362 0.0133 -0.302 0.120\n25 SHW 0.000860 0.0180 -0.208 0.153\n26 TRV 0.000555 0.0181 -0.208 0.256\n27 UNH 0.000948 0.0196 -0.186 0.348\n28 V 0.000933 0.0185 -0.136 0.150\n29 VZ 0.000238 0.0151 -0.118 0.146\n30 WMT 0.000323 0.0148 -0.114 0.117\n\n\n\nNote that you are now also equipped with all tools to download price data for each symbol listed in the S&P 500 index with the same number of lines of code. Just use symbol <- download_data(type = \"constituents\", index = \"S&P 500\"), which provides you with a tibble that contains each symbol that is (currently) part of the S&P 500. However, don’t try this if you are not prepared to wait for a couple of minutes because this is quite some data to download!", "crumbs": [ "R", - "Appendix", - "WRDS Dummy Data" + "Getting Started", + "Working with Stock Returns" ] }, { - "objectID": "r/wrds-dummy-data.html#create-bond-dummy-data", - "href": "r/wrds-dummy-data.html#create-bond-dummy-data", - "title": "WRDS Dummy Data", - "section": "Create Bond Dummy Data", - "text": "Create Bond Dummy Data\nLastly, we move to the bond data that we use in our books.\n\nDummy fisd data\nTo create dummy data with the structure of Mergent FISD, we calculate the empirical probabilities of actual bonds for several variables: maturity, offering_amt, interest_frequency, coupon, and sic_code. 
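To make this sampling idea concrete (a hypothetical sketch that is not part of the original text; the observed_frequencies tibble below is made up), empirical sampling probabilities for a categorical column can be tabulated with count() and then passed to sample() via its prob argument, just like the industry and exchange probabilities above:

```{.r .cell-code}
# Hypothetical sketch (not the book's code): derive empirical sampling
# probabilities for interest payment frequencies and draw from them.
observed_frequencies <- tibble(interest_frequency = c(0, 1, 2, 2, 2, 4, 4, 12))

frequency_probs <- observed_frequencies |>
  count(interest_frequency) |>
  mutate(prob = n / sum(n))

sample(frequency_probs$interest_frequency, 100, replace = TRUE, prob = frequency_probs$prob)
```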
We use these probabilities to sample a small cross-section of bonds with completely made up complete_cusip, issue_id, and issuer_id.\n\nnumber_of_bonds <- 100\n\nfisd_dummy <- 1:number_of_bonds |> \n map_df(\n function(x) {\n tibble(\n complete_cusip = str_to_upper(\n str_c(\n sample(c(letters, 0:9), 12, replace = TRUE), \n collapse = \"\"\n )\n ),\n )\n }\n ) |> \n mutate(\n maturity = sample(time_series_days, n(), replace = TRUE),\n offering_amt = sample(seq(1:100) * 100000, n(), replace = TRUE),\n offering_date = maturity - sample(seq(1:25) * 365, n(),replace = TRUE),\n dated_date = offering_date - sample(-10:10, n(), replace = TRUE),\n interest_frequency = sample(c(0, 1, 2, 4, 12), n(), replace = TRUE),\n coupon = sample(seq(0, 2, by = 0.1), n(), replace = TRUE),\n last_interest_date = pmax(maturity, offering_date, dated_date),\n issue_id = row_number(),\n issuer_id = sample(1:250, n(), replace = TRUE),\n sic_code = as.character(sample(seq(1:9)*1000, n(), replace = TRUE))\n )\n \ndbWriteTable(\n tidy_finance, \n \"fisd\", \n fisd_dummy, \n overwrite = TRUE\n)\n\n\n\nDummy trace_enhanced data\nFinally, we create a dummy bond transaction data for the fictional CUSIPs of the dummy fisd data. We take the date range that we also analyze in the book and ensure that we have at least five transactions per day to fulfill a filtering step in the book. \n\nstart_date <- as.Date(\"2014-01-01\")\nend_date <- as.Date(\"2016-11-30\")\n\nbonds_panel <- expand_grid(\n fisd_dummy |> \n select(cusip_id = complete_cusip),\n tibble(\n trd_exctn_dt = seq(start_date, end_date, \"1 day\")\n )\n)\n\ntrace_enhanced_dummy <- bind_rows(\n bonds_panel, bonds_panel, \n bonds_panel, bonds_panel, \n bonds_panel) |> \n mutate(\n trd_exctn_tm = str_c(\n sample(0:24, n(), replace = TRUE), \":\", \n sample(0:60, n(), replace = TRUE), \":\", \n sample(0:60, n(), replace = TRUE)\n ),\n rptd_pr = runif(n(), 10, 200),\n entrd_vol_qt = sample(1:20, n(), replace = TRUE) * 1000,\n yld_pt = runif(n(), -10, 10),\n rpt_side_cd = sample(c(\"B\", \"S\"), n(), replace = TRUE),\n cntra_mp_id = sample(c(\"C\", \"D\"), n(), replace = TRUE)\n ) \n \ndbWriteTable(\n tidy_finance, \n \"trace_enhanced\", \n trace_enhanced_dummy, \n overwrite = TRUE\n)\n\nAs stated in the introduction, the data does not contain any samples of the original data. We merely generate random numbers for all columns of the tables that we use throughout this book.", + "objectID": "r/working-with-stock-returns.html#other-forms-of-data-aggregation", + "href": "r/working-with-stock-returns.html#other-forms-of-data-aggregation", + "title": "Working with Stock Returns", + "section": "Other Forms of Data Aggregation", + "text": "Other Forms of Data Aggregation\nOf course, aggregation across variables other than symbol can also make sense. For instance, suppose you are interested in answering the question: Are days with high aggregate trading volume likely followed by days with high aggregate trading volume? To provide some initial analysis on this question, we take the downloaded data and compute aggregate daily trading volume for all Dow index constituents in USD. Recall that the column volume is denoted in the number of traded shares. Thus, we multiply the trading volume with the daily closing price to get a proxy for the aggregate trading volume in USD. 
Scaling by 1e9 (R can handle scientific notation) denotes daily trading volume in billion USD.\n\ntrading_volume <- prices_daily |>\n group_by(date) |>\n summarize(trading_volume = sum(volume * adjusted_close))\n\nfig_trading_volume <- trading_volume |>\n ggplot(aes(x = date, y = trading_volume)) +\n geom_line() +\n labs(\n x = NULL, y = NULL,\n title = \"Aggregate daily trading volume of Dow index constitutens\"\n ) +\n scale_y_continuous(labels = unit_format(unit = \"B\", scale = 1e-9))\nfig_trading_volume\n\n\n\n\n\n\n\nFigure 4: Total daily trading volume in billion USD.\n\n\n\n\n\nFigure 4 indicates a clear upward trend in aggregated daily trading volume. In particular, since the outbreak of the COVID-19 pandemic, markets have processed substantial trading volumes, as analyzed, for instance, by Goldstein, Koijen, and Mueller (2021). One way to illustrate the persistence of trading volume would be to plot volume on day \\(t\\) against volume on day \\(t-1\\) as in the example below. In Figure 5, we add a dotted 45°-line to indicate a hypothetical one-to-one relation by geom_abline(), addressing potential differences in the axes’ scales.\n\nfig_persistence <- trading_volume |>\n ggplot(aes(x = lag(trading_volume), y = trading_volume)) +\n geom_point() +\n geom_abline(aes(intercept = 0, slope = 1),\n linetype = \"dashed\"\n ) +\n labs(\n x = \"Previous day aggregate trading volume\",\n y = \"Aggregate trading volume\",\n title = \"Persistence in daily trading volume of Dow index constituents\"\n ) + \n scale_x_continuous(labels = unit_format(unit = \"B\", scale = 1e-9)) +\n scale_y_continuous(labels = unit_format(unit = \"B\", scale = 1e-9))\nfig_persistence\n\nWarning: Removed 1 rows containing missing values (`geom_point()`).\n\n\n\n\n\n\n\n\nFigure 5: Total daily trading volume in billion USD.\n\n\n\n\n\nDo you understand where the warning ## Warning: Removed 1 rows containing missing values (geom_point). comes from and what it means? Purely eye-balling reveals that days with high trading volume are often followed by similarly high trading volume days.", "crumbs": [ "R", - "Appendix", - "WRDS Dummy Data" + "Getting Started", + "Working with Stock Returns" + ] + }, + { + "objectID": "r/working-with-stock-returns.html#key-takeaways", + "href": "r/working-with-stock-returns.html#key-takeaways", + "title": "Working with Stock Returns", + "section": "Key Takeaways", + "text": "Key Takeaways\nIn this chapter, you learned how to effectively use R to download, analyze, and visualize stock market data using tidy principles. From downloading adjusted stock prices to computing returns, summarizing statistics, and visualizing trends, we have laid a solid foundation for working with financial data. Key takeaways include the importance of using adjusted prices for return calculations, leveraging tidyverse-tools for efficient data manipulation, and employing visualizations like histograms and line charts to uncover insights. Scaling up analyses to handle multiple stocks or broader indices demonstrates the flexibility of tidy data workflows. 
Equipped with these foundational techniques, you are now ready to apply them to the different contexts in financial economics covered in subsequent chapters.", "crumbs": [ "R", "Getting Started", "Working with Stock Returns" ] }, { "objectID": "r/working-with-stock-returns.html#exercises", "href": "r/working-with-stock-returns.html#exercises", "title": "Working with Stock Returns", "section": "Exercises", "text": "Exercises\n\nDownload daily prices for another stock market symbol of your choice from Yahoo Finance with download_data() from the tidyfinance package. Plot two time series of the symbol’s un-adjusted and adjusted closing prices. Explain the differences.\nCompute daily net returns for an asset of your choice and visualize the distribution of daily returns in a histogram using 100 bins. Also, use geom_vline() to add a dashed red vertical line that indicates the 5 percent quantile of the daily returns. Compute summary statistics (mean, standard deviation, minimum and maximum) for the daily returns.\nTake your code from before and generalize it such that you can perform all the computations for an arbitrary vector of symbols (e.g., symbol <- c(\"AAPL\", \"MMM\", \"BA\")). Automate the download, the plot of the price time series, and create a table of return summary statistics for this arbitrary number of assets.\nAre days with high aggregate trading volume often also days with large absolute returns? Find an appropriate visualization to analyze the question using the symbol AAPL.\nCompute monthly returns from the downloaded stock market prices. Compute the vector of historical average returns and the sample variance-covariance matrix. Compute the minimum variance portfolio weights and the portfolio volatility and average returns. Visualize the mean-variance efficient frontier. Choose one of your assets and identify the portfolio which yields the same historical volatility but achieves the highest possible average return.", "crumbs": [ "R", "Getting Started", "Working with Stock Returns" ] }, { @@ -890,6 +962,150 @@ "Factor Selection via Machine Learning" ] }, + { + "objectID": "r/discounted-cash-flow-analysis.html", + "href": "r/discounted-cash-flow-analysis.html", + "title": "Discounted Cash Flow Analysis", + "section": "", + "text": "In this chapter, we address a fundamental question: what is the value of a company? Company valuation is a critical tool that helps us determine the economic value of a business. Whether it’s for investment decisions, mergers and acquisitions, or financial reporting, understanding a company’s value is essential. But valuation isn’t just about assigning a number - it’s about providing a framework for making informed decisions. For example, investors use valuation to identify whether a stock is under- or over-valued. Companies rely on valuation for strategic decisions, like pricing an acquisition or preparing for an IPO.\nThere are several approaches to valuation, each suited to different purposes and scenarios:\nWe focus on DCF analysis in this chapter for a couple of reasons:\nBecause it focuses on long-term cash flows and risk factors, DCF serves as a foundation for long-term strategic decision making. It helps management, investors, and analysts take a holistic view of a company’s future value.\nIn its essence, DCF comprises three key components. The first component is forecasted free cash flows (FCF), which represent the expected future earnings of the company. 
FCF measures the cash that remains after accounting for operating expenses, taxes, investments, and changes in working capital. It gives us a clear picture of the cash available for distribution to investors, making it a key indicator of value.\nNext, we have the continuation value, also known as the terminal value. This captures the value of the business beyond the explicit forecast period. Since forecasting cash flows far into the future is inherently uncertain, the terminal value accounts for the bulk of a company’s value in many DCF analyses.\nThe final component is the discount rate, which adjusts future cash flows to their present value by accounting for risk and the time value of money. Typically, we use the Weighted Average Cost of Capital (WACC), which combines the costs of equity and debt, weighted by their proportion in the company’s capital structure. In practice, getting the WACC right is crucial, as small changes in the discount rate can have a significant impact on the final valuation.\nIn this chapter, we rely on the following packages to build a simple DCF analysis:\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nlibrary(fmpapi)", "crumbs": [ "R", "Getting Started", "Discounted Cash Flow Analysis" ] }, { "objectID": "r/discounted-cash-flow-analysis.html#prepare-data", "href": "r/discounted-cash-flow-analysis.html#prepare-data", "title": "Discounted Cash Flow Analysis", "section": "Prepare Data", "text": "Prepare Data\n\nImport data using Financial Modeling Prep (FMP) API\nR package: tidy-finance/r-fmpapi\n\n\nsymbol <- \"MSFT\"\n\nincome_statements <- fmp_get(\"income-statement\", symbol, list(period = \"annual\", limit = 5))\ncash_flow_statements <- fmp_get(\"cash-flow-statement\", symbol, list(period = \"annual\", limit = 5))\n\nFCF is the cash that a company generates after accounting for outflows to support operations & maintain capital assets. It represents the cash available to investors - both equity holders and debt holders - after a company has met its operational and capital expenditure needs. 
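Before turning to the formal FCF definition below, it can help to glance at which line items the downloaded statements actually contain. The following minimal sketch assumes the income_statements object created above and only inspects the columns that feed into the subsequent calculations.

# Quick sanity check: glimpse the income statement columns used for the FCF
# calculation below.
income_statements |>
  select(
    calendar_year, revenue, net_income, income_tax_expense,
    interest_expense, interest_income, depreciation_and_amortization
  ) |>
  glimpse()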
There are multiple ways to calculate FCF and we use the following definition1\n\\[\\text{FCF} = \\text{EBIT} + \\text{Depreciation & Amortization} - \\text{Taxes} + \\Delta \\text{Working Capital} - \\text{CAPEX}\\] Let’s break down the formula for FCF step by step:\n\nEBIT (Earnings Before Interest and Taxes): represents the company’s core operating profit, excluding the effects of financing and tax expenses.\nDepreciation & Amortization: non-cash expenses that allocate the cost of tangible and intangible assets over their useful lives.\nTaxes: the amount a company pays to the government, calculated on its taxable income.\n\\(\\Delta\\) Working Capital: the change in current assets minus current liabilities, reflecting the cash needed to support daily operations.\nCAPEX (Capital Expenditures): funds used by a company to acquire or upgrade physical assets like property, buildings, or equipment.\n\nWe can calculate FCF using these items as follows:\n\ndcf_data <- income_statements |> \n mutate(\n ebit = net_income + income_tax_expense - interest_expense - interest_income\n ) |> \n select(\n year = calendar_year, \n ebit, revenue, depreciation_and_amortization, taxes = income_tax_expense\n ) |> \n left_join(\n cash_flow_statements |> \n select(year = calendar_year, \n delta_working_capital = change_in_working_capital,\n capex = capital_expenditure), join_by(year)\n ) |> \n mutate(\n fcf = ebit + depreciation_and_amortization - taxes + delta_working_capital - capex\n ) |> \n arrange(year)", + "crumbs": [ + "R", + "Getting Started", + "Discounted Cash Flow Analysis" + ] + }, + { + "objectID": "r/discounted-cash-flow-analysis.html#forecast-free-cash-flow", + "href": "r/discounted-cash-flow-analysis.html#forecast-free-cash-flow", + "title": "Discounted Cash Flow Analysis", + "section": "Forecast Free-Cash-Flow", + "text": "Forecast Free-Cash-Flow\nNow that we’ve covered the components of FCF, let’s discuss how to forecast FCF over the projection period. Forecasting FCF typically involves a balance between data-driven analysis and informed judgment. While historical data provides a foundation, financial analysts often need to make educated assumptions about the future. The first step is to express the components of FCF as ratios relative to revenue. 
This ratio-based approach makes it easier to link the components of FCF to a single driving variable: revenue.\n\ndcf_data <- dcf_data |> \n mutate(\n revenue_growth = revenue / lag(revenue) - 1,\n operating_margin = ebit / revenue,\n da_margin = depreciation_and_amortization / revenue,\n taxes_to_revenue = taxes / revenue,\n delta_working_capital_to_revenue = delta_working_capital / revenue,\n capex_to_revenue = capex / revenue\n )\n\nfig_financial_ratios <- dcf_data |> \n pivot_longer(cols = c(operating_margin:capex_to_revenue)) |>\n ggplot(aes(x = year, y = value, color = name)) +\n geom_line() +\n scale_x_continuous(breaks = pretty_breaks()) +\n scale_y_continuous(labels = percent) +\n labs(\n x = NULL, y = NULL, color = NULL,\n title = \"Key financial ratios of Microsoft between 2020 and 2024\"\n )\nfig_financial_ratios\n\n\n\n\n\n\n\nFigure 1: Ratios are based on financial statements as provided through the FMP API.\n\n\n\n\n\nNext, analysts use their understanding of the company’s operations and industry dynamics to make subjective guesses about how these ratios might change in the future.\nWe define examplary ratio dynamics in Figure 2.\n\ndcf_data_forecast_ratios <- tribble(\n ~year, ~operating_margin, ~da_margin, ~taxes_to_revenue, ~delta_working_capital_to_revenue, ~capex_to_revenue,\n 2025, 0.41, 0.09, 0.08, 0.001, -0.2,\n 2026, 0.42, 0.09, 0.07, 0.001, -0.22,\n 2027, 0.43, 0.09, 0.06, 0.001, -0.2,\n 2028, 0.44, 0.09, 0.06, 0.001, -0.18,\n 2029, 0.45, 0.09, 0.06, 0.001, -0.16\n) |> \n mutate(type = \"Forecast\")\n\ndcf_data <- dcf_data |> \n mutate(type = \"Realized\") |> \n bind_rows(dcf_data_forecast_ratios)\n\nfig_financial_ratios_forecast <- dcf_data |> \n pivot_longer(cols = c(operating_margin:capex_to_revenue)) |> \n ggplot(aes(x = year, y = value, color = name, linetype = rev(type))) +\n geom_line() +\n scale_x_continuous(breaks = pretty_breaks()) +\n scale_y_continuous(labels = percent) +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"Key financial ratios and ad-hoc forecasts of Microsoft between 2020 and 2029\"\n )\nfig_financial_ratios_forecast\n\n\n\n\n\n\n\nFigure 2: Realized ratios are based on financial statements as provided through the FMP API, while forecasts are manually defined.\n\n\n\n\n\nFinally, revenue growth projections are typically based on macroeconomic factors, such as GDP growth or industry trends, as well as company-specific factors like market share and product pipelines. A great starting point for revenue growth projections is the IMF World Economic Outlook (WEO) data for the US. This resource provides in-depth analyses of the global economy, including trends and forecasts for key metrics like GDP growth. However, keep in mind that macroeconomic data and hence GDP forecasts always have some inherent lag.\nA simple method is to model revenue growth as a linear function of GDP growth. 
The idea is straightforward: if GDP grows by, e.g., 3%, you might project company revenue to grow by a similar rate, adjusted for factors like market share and industry dynamics.\nThe following code chunk implements this approach:\n\ngdp_growth <- tibble(\n year = 2020:2029,\n gdp_growth = c(-0.02163, 0.06055, 0.02512, 0.02887, 0.02765, 0.02153, 0.02028, 0.02120, 0.02122, 0.02122)\n)\n\ndcf_data <- dcf_data |> \n left_join(gdp_growth, join_by(year)) \n\nrevenue_growth_model <- dcf_data |> \n lm(revenue_growth ~ gdp_growth, data = _) |> \n coefficients()\n \ndcf_data <- dcf_data |> \n mutate(\n revenue_growth_modeled = revenue_growth_model[1] + revenue_growth_model[2] * gdp_growth,\n revenue_growth = if_else(type == \"Forecast\", revenue_growth_modeled, revenue_growth) \n ) \n\n\nfig_growth <- dcf_data |> \n filter(year >= 2021) |> \n pivot_longer(cols = c(revenue_growth, gdp_growth)) |> \n ggplot(aes(x = year, y = value, color = name, linetype = rev(type))) +\n geom_line() +\n scale_x_continuous(breaks = pretty_breaks()) +\n scale_y_continuous(labels = percent) +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"GDP growth and Microsoft revenue growth and modeled forecasts between 2020 and 2029\"\n )\nfig_growth\n\n\n\n\n\n\n\nFigure 3: Realized revenue growth rates are based on financial statements as provided through the FMP API, while forecasts are modeled using IMF WEO forecasts.\n\n\n\n\n\nAn alternative approach would be to look up consensus analyst forecasts, but this data is typically proprietary.\nNow that we have all required components, we can finally calculate FCF forecasts.\n\ndcf_data$revenue_growth[1] <- 0\ndcf_data$revenue <- dcf_data$revenue[1] * cumprod(1 + dcf_data$revenue_growth)\n\ndcf_data <- dcf_data |> \n mutate(\n ebit = operating_margin * revenue,\n depreciation_and_amortization = da_margin * revenue,\n taxes = taxes_to_revenue * revenue,\n delta_working_capital = delta_working_capital_to_revenue * revenue,\n capex = capex_to_revenue * revenue,\n fcf = ebit + depreciation_and_amortization - taxes + delta_working_capital - capex\n )\n\nFigure 4 visualizes these FCF forecasts.\n\nfig_fcf <- dcf_data |>\n ggplot(aes(x = year, y = fcf / 1e9)) +\n geom_col(aes(fill = type)) +\n scale_x_continuous(breaks = pretty_breaks()) +\n scale_y_continuous(labels = comma) + \n labs(\n x = NULL, y = \"Free Cash Flow (in B USD)\", fill = NULL,\n title = \"Actual and predicted free cash flow for Microsoft from 2020 to 2029\"\n )\nfig_fcf\n\n\n\n\n\n\n\nFigure 4: Realized growth rates are based on financial statements as provided through the FMP API, while forecasts are manually defined.", "crumbs": [ "R", "Getting Started", "Discounted Cash Flow Analysis" ] }, { "objectID": "r/discounted-cash-flow-analysis.html#forecast-revenue-growth", "href": "r/discounted-cash-flow-analysis.html#forecast-revenue-growth", "title": "Discounted Cash Flow Analysis", "section": "Forecast Revenue Growth", "text": "Forecast Revenue Growth\n\nGet current IMF World Economic Outlook (WEO) data for US\nIMF publishes analyses of global economy, including trends & forecasts\nLast forecast update: October 2024\nSimple approach: model revenue growth as a linear function of GDP growth\n\nAlternative: look up consensus analyst forecasts (typically proprietary)\nCreate forecasts using IMF WEO\n\ngdp_growth <- tibble(\n year = 2020:2029,\n gdp_growth = c(-0.02163, 0.06055, 0.02512, 0.02887, 0.02765, 0.02153, 0.02028, 0.02120, 0.02122, 0.02122)\n)\n\ndcf_data <- dcf_data 
|> \n left_join(gdp_growth, join_by(year)) \n\nrevenue_growth_model <- dcf_data |> \n lm(revenue_growth ~ gdp_growth, data = _) |> \n coefficients()\n \ndcf_data <- dcf_data |> \n mutate(\n revenue_growth_modeled = revenue_growth_model[1] + revenue_growth_model[2] * gdp_growth,\n revenue_growth = if_else(type == \"Forecast\", revenue_growth_modeled, revenue_growth) \n ) \n\n\nfig_growth <- dcf_data |> \n filter(year >= 2021) |> \n pivot_longer(cols = c(revenue_growth, gdp_growth)) |> \n ggplot(aes(x = year, y = value, color = name, linetype = rev(type))) +\n geom_line() +\n scale_x_continuous(breaks = pretty_breaks()) +\n scale_y_continuous(labels = percent) +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"GDP growth and Microsoft revenue growth and modeled forecasts between 2020 and 2029\"\n )\nfig_growth\n\n\n\n\n\n\n\nFigure 3: Realized revenue growth rates are based on financial statements as provided through the FMP API, while forecasts are modeled using IMF WEO forecasts.", "crumbs": [ "R", "Getting Started", "Discounted Cash Flow Analysis" ] }, { "objectID": "r/discounted-cash-flow-analysis.html#calculate-fcf-forecasts", "href": "r/discounted-cash-flow-analysis.html#calculate-fcf-forecasts", "title": "Discounted Cash Flow Analysis", "section": "Calculate FCF Forecasts", "text": "Calculate FCF Forecasts\n\ndcf_data$revenue_growth[1] <- 0\ndcf_data$revenue <- dcf_data$revenue[1] * cumprod(1 + dcf_data$revenue_growth)\n\ndcf_data <- dcf_data |> \n mutate(\n ebit = operating_margin * revenue,\n depreciation_and_amortization = da_margin * revenue,\n taxes = taxes_to_revenue * revenue,\n delta_working_capital = delta_working_capital_to_revenue * revenue,\n capex = capex_to_revenue * revenue,\n fcf = ebit + depreciation_and_amortization - taxes + delta_working_capital - capex\n )\n\nVisualize FCF\n\nfig_fcf <- dcf_data |>\n ggplot(aes(x = year, y = fcf / 1e9)) +\n geom_col(aes(fill = type)) +\n scale_x_continuous(breaks = pretty_breaks()) +\n scale_y_continuous(labels = comma) + \n labs(\n x = NULL, y = \"Free Cash Flow (in B USD)\", fill = NULL,\n title = \"Actual and predicted free cash flow for Microsoft from 2020 to 2029\"\n )\nfig_fcf\n\n\n\n\n\n\n\nFigure 4: Realized growth rates are based on financial statements as provided through the FMP API, while forecasts are manually defined.", "crumbs": [ "R", "Getting Started", "Discounted Cash Flow Analysis" ] }, { "objectID": "r/discounted-cash-flow-analysis.html#continuation-value", "href": "r/discounted-cash-flow-analysis.html#continuation-value", "title": "Discounted Cash Flow Analysis", "section": "Continuation Value", "text": "Continuation Value\nNow, let’s discuss how to compute the continuation value, also known as the terminal value. This value is a critical component of the DCF analysis, as it often represents a significant portion of a company’s overall valuation. One typical approach is the Perpetuity Growth Model, which simply assumes that free cash flows grow at a constant rate indefinitely:\n\\[TV_{T} = \\frac{FCF_{T+1}}{r - g},\\] where \\(r\\) is the discount rate, typically measured by the WACC, and \\(g\\) is the perpetual growth rate.\nFor our application, we need to make an assumption for the perpetual growth rate. 
For instance, average GDP growth over the last 20 years is a sensible assumption (the nominal growth rate is roughly 4% for the US).\nAn alternative method is the exit multiple approach, which estimates the continuation value based on a multiple of EBITDA or another financial metric at the end of the forecast period.\n\ncompute_terminal_value <- function(last_fcf, growth_rate, discount_rate){\n last_fcf * (1 + growth_rate) / (discount_rate - growth_rate)\n}\n\nlast_fcf <- tail(dcf_data$fcf, 1)\nterminal_value <- compute_terminal_value(last_fcf, 0.04, 0.08)\nterminal_value / 1e9\n\n[1] 7564", "crumbs": [ "R", "Getting Started", "Discounted Cash Flow Analysis" ] }, { "objectID": "r/discounted-cash-flow-analysis.html#discount-rates", "href": "r/discounted-cash-flow-analysis.html#discount-rates", "title": "Discounted Cash Flow Analysis", "section": "Discount Rates", "text": "Discount Rates\nAs a last critical step, we need to bring future cash flows to their present value using a discount factor. In company valuation settings, the WACC typically serves this purpose. The WACC represents the average rate of return required by all the company’s investors, including both equity holders and debt holders. It reflects the overall cost of financing the company’s operations and serves as the discount rate in our valuation. The definition is as follows:\n\\[WACC = \\frac{E}{D+E} \\cdot r^E + \\frac{D}{D+E} \\cdot r^D \\cdot (1 - \\tau),\\]\nwhere \\(E\\) is the market value of the company’s equity with required return \\(r^E\\), \\(D\\) is the market value of the company’s debt with pre-tax return \\(r^D\\), and \\(\\tau\\) is the tax rate.\nWhile you can often find estimates of WACC from financial databases or analysts’ reports, sometimes you may need to calculate it yourself. Let’s walk through the practical steps to estimate WACC using real-world data:\n\n\\(E\\) is typically measured as the market value of the company’s equity. One common approach is to calculate it by subtracting net debt (total debt minus cash) from the enterprise value.\n\\(D\\) is often measured using the book value of the company’s debt. While this might not perfectly reflect market conditions, it’s a practical starting point when market data is unavailable.\nThe Capital Asset Pricing Model (CAPM) is a popular method to estimate the cost of equity \\(r^E\\). It considers the risk-free rate, the equity risk premium, and the company’s beta. For a detailed guide on how to estimate the CAPM, we refer to Chapter Capital Asset Pricing Model.\nThe return on debt \\(r^D\\) can also be estimated in different ways. For instance, effective interest rates can be calculated as the ratio of interest expense to total debt from financial statements. This gives you a real-world measure of what the company is currently paying. Alternatively, you can look up corporate bond spreads for companies in the same rating group. For highly rated companies like Microsoft, this would reflect their low-risk profile and correspondingly low borrowing costs.\n\nIf you’d rather not estimate WACC manually, there are excellent resources available to help you find industry-specific discount rates. One of the most widely used sources is Aswath Damodaran’s database. He maintains an extensive database that provides a wealth of financial data, including estimated discount rates, cash flows, growth rates, multiples, and more. What makes his database particularly valuable is its level of detail and coverage of multiple industries and regions. 
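Before relying on such an industry benchmark, it can be instructive to see how the components described above combine into a WACC estimate. The following is a minimal sketch in which every input is an assumed placeholder value chosen for illustration, not an actual estimate for Microsoft.

# Manual WACC calculation with assumed placeholder inputs.
equity_value <- 3000e9  # market value of equity (E), assumed
debt_value <- 100e9     # value of debt (D), assumed
cost_of_equity <- 0.09  # r^E, e.g., from a CAPM estimate, assumed
cost_of_debt <- 0.04    # r^D, e.g., interest expense over total debt, assumed
tax_rate <- 0.19        # tau, assumed effective tax rate

total_value <- equity_value + debt_value

wacc_manual <- equity_value / total_value * cost_of_equity +
  debt_value / total_value * cost_of_debt * (1 - tax_rate)
wacc_manual

With a manual estimate like this in hand, an industry average from a source such as Damodaran’s database serves as a useful cross-check.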
For example, if you’re analyzing a company in the Computer Services sector, as we do here, you can look up the industry’s average WACC and use it as a benchmark for your analysis. The following code chunk downloads the WACC data and extracts the value for this industry:\n\nlibrary(readxl)\n\nfile <- tempfile(fileext = \"xls\")\n\nurl <- \"https://pages.stern.nyu.edu/~adamodar/pc/datasets/wacc.xls\"\ndownload.file(url, file)\nwacc_raw <- read_xls(file, sheet = 2, skip = 18)\nunlink(file)\n\nwacc <- wacc_raw |> \n filter(`Industry Name` == \"Computer Services\") |> \n pull(`Cost of Capital`)", "crumbs": [ "R", "Getting Started", "Discounted Cash Flow Analysis" ] }, { "objectID": "r/discounted-cash-flow-analysis.html#compute-dcf-value", "href": "r/discounted-cash-flow-analysis.html#compute-dcf-value", "title": "Discounted Cash Flow Analysis", "section": "Compute DCF Value", "text": "Compute DCF Value\n\\[\n\\text{Total DCF Value} = \\sum_{t=1}^{\\text{T}} \\frac{\\text{FCF}_t}{(1 + \\text{WACC})^t} + \\frac{\\text{TV}_{T}}{(1 + \\text{WACC})^{\\text{T}}}\n\\]\n\nforecasted_years <- 5\n\ncompute_dcf <- function(wacc, growth_rate, years = 5) {\n free_cash_flow <- dcf_data$fcf\n last_fcf <- tail(free_cash_flow, 1)\n terminal_value <- compute_terminal_value(last_fcf, growth_rate, wacc)\n \n present_value_fcf <- free_cash_flow / (1 + wacc)^(1:years)\n present_value_tv <- terminal_value / (1 + wacc)^years\n total_dcf_value <- sum(present_value_fcf) + present_value_tv\n total_dcf_value\n}\n\ncompute_dcf(wacc, 0.03) / 1e9\n\n[1] 6084", "crumbs": [ "R", "Getting Started", "Discounted Cash Flow Analysis" ] }, { "objectID": "r/discounted-cash-flow-analysis.html#sensitvity-analysis", "href": "r/discounted-cash-flow-analysis.html#sensitvity-analysis", "title": "Discounted Cash Flow Analysis", "section": "Sensitivity Analysis", "text": "Sensitivity Analysis\nOne of the key challenges in a DCF analysis is that it relies heavily on assumptions about the future, growth, and risk. This is where sensitivity analysis comes into play, helping us understand how changes in these assumptions can impact our valuation. For instance, small changes in assumptions like operating margin or CAPEX as a percentage of revenue might lead to noticeable shifts in FCF projections, or overestimating or underestimating revenue growth might have a cascading effect on cash flow projections.\nWe focus on the two drivers that typically have the biggest effect on the valuation: the perpetual growth rate and the WACC. 
The following code chunk implements different WACC and growth scenarios:\n\nwacc_range <- seq(0.06, 0.08, by = 0.01)\ngrowth_rate_range <- seq(0.02, 0.04, by = 0.01)\n\nsensitivity <- expand_grid(\n wacc = wacc_range,\n growth_rate = growth_rate_range\n) |>\n mutate(value = pmap_dbl(list(wacc, growth_rate), compute_dcf))\n\nfig_sensitivity <- sensitivity |> \n mutate(value = round(value / 1e9, 0)) |> \n ggplot(aes(x = wacc, y = growth_rate, fill = value)) +\n geom_tile() +\n geom_text(aes(label = comma(value)), color = \"white\") +\n scale_x_continuous(labels = percent) + \n scale_y_continuous(labels = percent) +\n scale_fill_continuous(labels = comma) + \n labs(\n title = \"DCF value of Microsoft for different WACC and growth scenarios\",\n x = \"WACC\",\n y = \"Perpetual growth rate\",\n fill = \"Company value\"\n ) + \n guides(fill = guide_colorbar(barwidth = 15, barheight = 0.5))\nfig_sensitivity\n\n\n\n\n\n\n\nFigure 5: DCF value combines data from FMP API, ad-hoc forecasts of financial ratios, and IMF WEO growth forecasts.\n\n\n\n\n\nFigure 5 shows that …", + "crumbs": [ + "R", + "Getting Started", + "Discounted Cash Flow Analysis" + ] + }, + { + "objectID": "r/discounted-cash-flow-analysis.html#from-dcf-to-equity-value", + "href": "r/discounted-cash-flow-analysis.html#from-dcf-to-equity-value", + "title": "Discounted Cash Flow Analysis", + "section": "From DCF to Equity Value", + "text": "From DCF to Equity Value\nDCF model provides an estimate for value of operations\n\\[\\text{Equity Value} = \\text{DCF Value} + \\text{Non-Operating Assets} - \\text{Value of Debt}\\]\n\nNon-Operating Assets: not essential to operations, but generate income (e.g., marketable securities, vacant land, idle equipment)\nValue of Debt: in theory market value of total debt, in practice book debt", + "crumbs": [ + "R", + "Getting Started", + "Discounted Cash Flow Analysis" + ] + }, + { + "objectID": "r/discounted-cash-flow-analysis.html#key-takeaways", + "href": "r/discounted-cash-flow-analysis.html#key-takeaways", + "title": "Discounted Cash Flow Analysis", + "section": "Key takeaways", + "text": "Key takeaways\nThe DCF method provides a structured framework for making informed decisions. By breaking down the valuation process into clear, logical steps, it helps analysts and decision-makers focus on the fundamentals that drive value. DCF stands out because it values companies or projects based on their projected future cash flows, rather than just historical data or market sentiment. This forward-looking approach makes it especially useful for long-term strategic decisions. The three core elements that we discussed in this chapter are: free cash flow, continuation value, and the WACC. Finally, the quality of a DCF analysis critically depends on the assumptions we make. 
Key drivers like financial ratios, revenue growth, and WACC require careful validation, as even small errors can lead to significant deviations in valuation.", + "crumbs": [ + "R", + "Getting Started", + "Discounted Cash Flow Analysis" + ] + }, + { + "objectID": "r/discounted-cash-flow-analysis.html#exercises", + "href": "r/discounted-cash-flow-analysis.html#exercises", + "title": "Discounted Cash Flow Analysis", + "section": "Exercises", + "text": "Exercises\n\n…", + "crumbs": [ + "R", + "Getting Started", + "Discounted Cash Flow Analysis" + ] + }, { "objectID": "r/trace-and-fisd.html", "href": "r/trace-and-fisd.html", @@ -3098,6 +3314,162 @@ "Option Pricing via Machine Learning" ] }, + { + "objectID": "r/financial-ratios.html", + "href": "r/financial-ratios.html", + "title": "Financial Ratios", + "section": "", + "text": "In this chapter, we explore the role of financial statements and financial ratios in analyzing companies. Financial statements are essential because they serve as a standardized source of information, providing a consistent framework that enables investors, creditors, and analysts to assess a company’s financial health and performance. All companies are legally required to file financial statements, which adds a layer of accountability and reliability to the information they disclose. Public companies, in particular, are subject to even more rigorous standards: they must have their financial statements independently audited, which helps ensure accuracy and integrity in reporting. Additionally, in the United States, public companies are required by the Securities and Exchange Commission, or SEC, to file their financials quarterly and annually. This requirement ensures that investors and analysts have timely information, allowing them to make informed decisions throughout the year.\nFinancial ratios are tools for understanding a company’s financial health and performance. They facilitate:\nThis chapter is based on the following packages:\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nlibrary(ggrepel)\nlibrary(fmpapi)", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, + { + "objectID": "r/financial-ratios.html#balance-sheet-statements", + "href": "r/financial-ratios.html#balance-sheet-statements", + "title": "Financial Ratios", + "section": "Balance Sheet Statements", + "text": "Balance Sheet Statements\nThe balance sheet provides a snapshot of a company’s financial standing at a specific point in time. It shows the company’s assets, liabilities, and shareholders’ equity, according to the fundamental accounting equation:\n\\[\\text{Assets} = \\text{Liabilities} + \\text{Shareholders’ Equity}\\]\nAssets are the resources owned by the company that are expected to provide future economic benefits, liabilities are obligations the company owes to external parties, and shareholders’ equity is the residual interest in the assets of the company after deducting liabilities. 
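To see the identity in action, the following toy example uses entirely made-up numbers together with the tidyverse packages loaded above; shareholders’ equity simply falls out as the residual of assets over liabilities.

# Toy balance sheet with made-up numbers to illustrate
# Assets = Liabilities + Shareholders' Equity.
toy_balance_sheet <- tibble(
  total_assets = 500,
  total_liabilities = 320
) |>
  mutate(shareholders_equity = total_assets - total_liabilities)

toy_balance_sheet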
Figure 1 shows a stylized balance sheet with these components.\n\n\n\n\n\n\nFigure 1: A stylized representation of a balance sheet statement.\n\n\n\nLet us dive deeper into the asset side which typically comprises of the following parts: current assets that are expected to be converted into cash or used up within one year, such as cash, accounts receivable (=money owed to a business for goods or services), and inventory, non-current assets, which are long-term investments and property, plant, and equipment (PP&E) that are not expected to be liquidated within a year, and intangible assets, which are non-physical asset such as a patent, brand, trademark, or copyright. Figure 2 shows a stylized breakdown of the asset side.\n\n\n\n\n\n\nFigure 2: A stylized representation of a breakdown of assets on a balance sheet.\n\n\n\nLiabilities are typically split into current liabilities, which are debts or obligations due within one year, including accounts payable and short-term loans, and non-current liabilities, which are long-term debts and obligations due beyond one year, such as long-term debt or deferred taxes. Figure 3 illustrates this breakdown in liabilities.\n\n\n\n\n\n\nFigure 3: A stylized representation of a breakdown of liabilities on a balance sheet.\n\n\n\nLastly, equity is typically divided into retained earnings, which are accumulated profits that have been reinvested in the business rather than distributed as dividends, common stock, which is capital contributed by shareholders, and preferred stock, which is a different type of equity that represents ownership of a company and the right to claim income from the company’s operations, but with limited voting rights. Figure 4 shows the corresponding equity breakdown.\n\n\n\n\n\n\nFigure 4: A stylized representation of a breakdown of equity on a balance sheet.\n\n\n\nFigure 5 shows an example balance sheet from Microsoft in 2023.\n\n\n\n\n\n\nFigure 5: A screenshot of the balance sheet statement of Microsoft in 2023.\n\n\n\nWe can use the fmpapi package to download financial statements:\n\nSEC provides interface to search filings\nFinancial Modeling Prep (FMP) API provides programming interface\nFree tier: 250 calls / day, 5 year historical fundamental data\nR package: tidy-finance/r-fmpapi\nInstall via install.packages(\"fmpapi\")\n\n\nfmp_get(\n resource = \"balance-sheet-statement\", \n symbol = \"MSFT\", \n params = list(period = \"annual\", limit = 5)\n)\n\n# A tibble: 5 × 54\n date symbol reported_currency cik filling_date\n <date> <chr> <chr> <chr> <date> \n1 2024-06-30 MSFT USD 0000789019 2024-07-30 \n2 2023-06-30 MSFT USD 0000789019 2023-07-27 \n3 2022-06-30 MSFT USD 0000789019 2022-07-28 \n4 2021-06-30 MSFT USD 0000789019 2021-07-29 \n5 2020-06-30 MSFT USD 0000789019 2020-07-30 \n# ℹ 49 more variables: accepted_date <dttm>, calendar_year <int>,\n# period <chr>, cash_and_cash_equivalents <dbl>,\n# short_term_investments <dbl>,\n# cash_and_short_term_investments <dbl>, net_receivables <dbl>,\n# inventory <dbl>, other_current_assets <dbl>,\n# total_current_assets <dbl>, property_plant_equipment_net <dbl>,\n# goodwill <dbl>, intangible_assets <dbl>, …", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, + { + "objectID": "r/financial-ratios.html#income-statements", + "href": "r/financial-ratios.html#income-statements", + "title": "Financial Ratios", + "section": "Income Statements", + "text": "Income Statements\nIncome statements show a company’s financial performance over a quarter or year by detailing 
revenue, costs, and profits. Their main components are:\n\nRevenue (Sales): the total income generated from goods or services sold.\nCost of Goods Sold (COGS): direct costs associated with producing the goods or services (raw materials, labor, etc.).\nGross Profit: revenue minus COGS, showing the basic profitability from core operations.\nOperating Expenses: costs related to regular business operations (Salaries, Rent, Marketing).\nOperating Income (EBIT): earnings before interest and taxes (measures profitability from core operations before financing and tax costs).\nNet Income: The “bottom line”—total profit after all expenses, taxes, and interest are subtracted from revenue.\n\nFigure 6 provides a stylized representation of these components. Income statements are key to analyzing profitability, operational efficiency, and cost management of a company.\n\n\n\n\n\n\nFigure 6: A stylized representation of an income statement.\n\n\n\nFigure 7 shows an example income statement of Microsoft in 2023.\n\n\n\n\n\n\nFigure 7: A screenshot of the income statement of Microsoft in 2023.\n\n\n\nDownload income statements data:\n\nfmp_get(\n resource = \"income-statement\", \n symbol = \"MSFT\", \n params = list(period = \"annual\", limit = 5)\n)\n\n# A tibble: 5 × 38\n date symbol reported_currency cik filling_date\n <date> <chr> <chr> <chr> <date> \n1 2024-06-30 MSFT USD 0000789019 2024-07-30 \n2 2023-06-30 MSFT USD 0000789019 2023-07-27 \n3 2022-06-30 MSFT USD 0000789019 2022-07-28 \n4 2021-06-30 MSFT USD 0000789019 2021-07-29 \n5 2020-06-30 MSFT USD 0000789019 2020-07-30 \n# ℹ 33 more variables: accepted_date <dttm>, calendar_year <int>,\n# period <chr>, revenue <dbl>, cost_of_revenue <dbl>,\n# gross_profit <dbl>, gross_profit_ratio <dbl>,\n# research_and_development_expenses <dbl>,\n# general_and_administrative_expenses <dbl>,\n# selling_and_marketing_expenses <dbl>,\n# selling_general_and_administrative_expenses <dbl>, …", "crumbs": [ "R", "Getting Started", "Financial Ratios" ] }, { "objectID": "r/financial-ratios.html#cash-flow-statements", "href": "r/financial-ratios.html#cash-flow-statements", "title": "Financial Ratios", "section": "Cash Flow Statements", "text": "Cash Flow Statements\nCash flow statements provide details about the flow of cash in and out of the business during a quarter or year, categorized into operating, investing, and financing activities. 
Overall, they show a company’s ability to generate cash to fund operations and growth.\n\nOperating Activities: cash generated from a company’s core business activities (Net Income adjusted for non-cash items like depreciation, and changes in working capital).\nFinancing Activities: cash flows related to borrowing, repaying debt, issuing equity, or paying dividends.\nInvesting Activities: cash spent on or received from long-term investments, such as purchasing or selling property, equipment, or securities.\n\nFigure 8 shows a stylized cash flow statement.\n\n\n\n\n\n\nFigure 8: A stylized representation of a cash flow statement.\n\n\n\nFigure 9 shows an example cash flow statement of Microsoft in 2023.\n\n\n\n\n\n\nFigure 9: A screenshot of the cash flow statement of Microsoft in 2023.\n\n\n\nDownload cash flow statements data for Microsoft:\n\nfmp_get(\n resource = \"cash-flow-statement\", \n symbol = \"MSFT\", \n params = list(period = \"annual\", limit = 5)\n)\n\n# A tibble: 5 × 40\n date symbol reported_currency cik filling_date\n <date> <chr> <chr> <chr> <date> \n1 2024-06-30 MSFT USD 0000789019 2024-07-30 \n2 2023-06-30 MSFT USD 0000789019 2023-07-27 \n3 2022-06-30 MSFT USD 0000789019 2022-07-28 \n4 2021-06-30 MSFT USD 0000789019 2021-07-29 \n5 2020-06-30 MSFT USD 0000789019 2020-07-30 \n# ℹ 35 more variables: accepted_date <dttm>, calendar_year <int>,\n# period <chr>, net_income <dbl>,\n# depreciation_and_amortization <dbl>, deferred_income_tax <dbl>,\n# stock_based_compensation <dbl>, change_in_working_capital <dbl>,\n# accounts_receivables <dbl>, inventory <int>,\n# accounts_payables <dbl>, other_working_capital <dbl>,\n# other_non_cash_items <int>, …", "crumbs": [ "R", "Getting Started", "Financial Ratios" ] }, { "objectID": "r/financial-ratios.html#download-financial-statements", "href": "r/financial-ratios.html#download-financial-statements", "title": "Financial Ratios", "section": "Download Financial Statements", "text": "Download Financial Statements\n\nconstituents <- download_data_constituents(\"Dow Jones Industrial Average\") |> \n pull(symbol)\n\nparams <- list(period = \"annual\", limit = 5)\n\nbalance_sheet_statements <- constituents |> \n map_df(\n \\(x) fmp_get(resource = \"balance-sheet-statement\", symbol = x, params = params)\n )\n\nincome_statements <- constituents |> \n map_df(\n \\(x) fmp_get(resource = \"income-statement\", symbol = x, params = params)\n )\n\ncash_flow_statements <- constituents |> \n map_df(\n \\(x) fmp_get(resource = \"cash-flow-statement\", symbol = x, params = params)\n )", "crumbs": [ "R", "Getting Started", "Financial Ratios" ] }, { "objectID": "r/financial-ratios.html#liquidity-ratios", "href": "r/financial-ratios.html#liquidity-ratios", "title": "Financial Ratios", "section": "Liquidity Ratios", "text": "Liquidity Ratios\nWe start with ratios that aim to assess a company’s liquidity using items from balance sheet statements. The Current Ratio measures a company’s ability to pay off its short-term liabilities with its short-term assets. A ratio above 1 indicates that the company has more current assets than current liabilities, suggesting good short-term financial health.\n\\[\\text{Current Ratio} = \\frac{\\text{Current Assets}}{\\text{Current Liabilities}}\\]\nThe next ratio is the Quick Ratio, which measures a company’s ability to meet its short-term obligations without relying on the sale of inventory. 
A ratio above 1 here indicates that the company can cover its short-term liabilities with its most liquid assets.\n\\[\\text{Quick Ratio} = \\frac{\\text{Current Assets} - \\text{Inventory}}{\\text{Current Liabilities}}\\] Lastly, the Cash Ratio measures a company’s ability to pay off its short-term liabilities with its cash and cash equivalents. A ratio of 1 or higher indicates a strong liquidity position.\n\\[\\text{Cash Ratio} = \\frac{\\text{Cash and Cash Equivalents}}{\\text{Current Liabilities}}\\]\n\nselected_symbols <- c(\"MSFT\", \"AAPL\", \"AMZN\")\n\nbalance_sheets_statements <- balance_sheet_statements |> \n mutate(\n current_ratio = total_current_assets / total_current_liabilities,\n quick_ratio = (total_current_assets - inventory) / total_current_liabilities,\n cash_ratio = cash_and_cash_equivalents / total_current_liabilities,\n label = if_else(symbol %in% selected_symbols, symbol, NA),\n )\n\nComparing liquidity ratios: Figure 10 shows…\n\nfig_liquidity_ratios <- balance_sheets_statements |> \n filter(calendar_year == 2023 & !is.na(label)) |> \n select(symbol, contains(\"ratio\")) |> \n pivot_longer(-symbol) |> \n mutate(name = str_to_title(str_replace_all(name, \"_\", \" \"))) |> \n ggplot(aes(x = value, y = name, fill = symbol)) +\n geom_col(position = \"dodge\") +\n scale_x_continuous(labels = percent) + \n labs(\n x = NULL, y = NULL, fill = NULL,\n title = \"Liquidity ratios for selected stocks from the Dow index for 2023\"\n )\nfig_liquidity_ratios\n\n\n\n\n\n\n\nFigure 10: Liquidity ratios are based on financial statements as provided through the FMP API.", "crumbs": [ "R", "Getting Started", "Financial Ratios" ] }, { "objectID": "r/financial-ratios.html#leverage-ratios", "href": "r/financial-ratios.html#leverage-ratios", "title": "Financial Ratios", "section": "Leverage Ratios", "text": "Leverage Ratios\nThe debt-to-equity ratio measures the proportion of debt financing relative to equity financing. A higher ratio indicates more leverage and potentially higher financial risk.\n\\[\\text{Debt-to-Equity} = \\frac{\\text{Total Debt}}{\\text{Total Equity}}\\]\nThe debt-to-asset ratio indicates the percentage of a company’s assets that are financed by debt. A higher ratio also suggests more leverage.\n\\[\\text{Debt-to-Asset} = \\frac{\\text{Total Debt}}{\\text{Total Assets}}\\]\nInterest Coverage assesses a company’s ability to pay interest on its debt. 
Here, a higher ratio indicates better capability to meet interest obligations, and hence less financial risk.\n\\[\\text{Interest Coverage} = \\frac{\\text{EBIT}}{\\text{Interest Expense}}\\]\nWe can easily calculate these ratios using our balance sheet and income statements data:\n\nbalance_sheets_statements <- balance_sheets_statements |> \n mutate(\n debt_to_equity = total_debt / total_equity,\n debt_to_asset = total_debt / total_assets\n )\n\nincome_statements <- income_statements |> \n mutate(\n interest_coverage = operating_income / interest_expense,\n label = if_else(symbol %in% selected_symbols, symbol, NA),\n )\n\nFigure 11 shows the debt-to-assets over time.\n\nfig_debt_to_asset <- balance_sheets_statements |> \n filter(symbol %in% selected_symbols) |> \n ggplot(aes(x = calendar_year, y = debt_to_asset,\n color = symbol)) +\n geom_line(linewidth = 1) +\n scale_y_continuous(labels = percent) +\n labs(x = NULL, y = NULL, color = NULL,\n title = \"Debt-to-asset ratios of selected stocks between 2020 and 2024\") \nfig_debt_to_asset\n\n\n\n\n\n\n\nFigure 11: Debt-to-asset ratios are based on financial statements as provided through the FMP API.\n\n\n\n\n\nFigure 12 shows debt-to-asset ratio in the cross-section\n\nselected_colors <- c(\"#F21A00\", \"#EBCC2A\", \"#3B9AB2\", \"lightgrey\")\n\nfig_debt_to_asset_cross_section <- balance_sheets_statements |> \n filter(calendar_year == 2023) |> \n ggplot(aes(x = debt_to_asset,\n y = fct_reorder(symbol, debt_to_asset),\n fill = label)) +\n geom_col() +\n scale_x_continuous(labels = percent) +\n scale_fill_manual(values = selected_colors) +\n labs(x = NULL, y = NULL, color = NULL,\n title = \"Debt-to-asset ratios of Dow index constituents in 2023\") + \n theme(legend.position = \"none\")\nfig_debt_to_asset_cross_section\n\n\n\n\n\n\n\nFigure 12: Debt-to-asset ratios are based on financial statements as provided through the FMP API.\n\n\n\n\n\nFigure 13 shows debt-to-asset vs interest coverage\n\nfig_debt_to_asset_interest_coverage <- income_statements |> \n filter(calendar_year == 2023) |> \n select(symbol, interest_coverage, calendar_year) |> \n left_join(\n balance_sheets_statements,\n join_by(symbol, calendar_year)\n ) |> \n ggplot(aes(x = debt_to_asset, y = interest_coverage, color = label)) +\n geom_point(size = 2) +\n geom_label_repel(aes(label = label), seed = 42, box.padding = 0.75) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) +\n scale_color_manual(values = selected_colors) +\n labs(\n x = \"Debt-to-Asset\", y = \"Interest Coverage\",\n title = \"Debt-to-asset ratios and interest coverages for Dow index constituents\"\n ) +\n theme(legend.position = \"none\")\nfig_debt_to_asset_interest_coverage\n\nWarning: Removed 27 rows containing missing values\n(`geom_label_repel()`).\n\n\n\n\n\n\n\n\nFigure 13: Debt-to-asset ratios and interest coverages are based on financial statements as provided through the FMP API.", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, + { + "objectID": "r/financial-ratios.html#efficiency-ratios", + "href": "r/financial-ratios.html#efficiency-ratios", + "title": "Financial Ratios", + "section": "Efficiency Ratios", + "text": "Efficiency Ratios\nAsset Turnover measures how efficiently a company uses its assets to generate revenue. 
A higher ratio indicates more efficient use of assets.\n\\[\\text{Asset Turnover} = \\frac{\\text{Revenue}}{\\text{Total Assets}}\\]\nInventory turnover indicates how many times a company’s inventory is sold and replaced over a period. The higher the ratio, the more efficient is the inventory management.\n\\[\\text{Inventory Turnover} = \\frac{\\text{COGS}}{\\text{Inventory}}\\]\nReceivables turnover measures how effectively a company collects receivables. A higher ratio indicates a more efficient credit and collection processes.\n\\[\\text{Receivables Turnover} = \\frac{\\text{Revenue}}{\\text{Accounts Receivable}}\\]\n\ncombined_statements <- balance_sheets_statements |> \n select(symbol, calendar_year, label, current_ratio, quick_ratio, cash_ratio,\n debt_to_equity, debt_to_asset, total_assets, total_equity) |> \n left_join(\n income_statements |> \n select(symbol, calendar_year, interest_coverage, revenue, cost_of_revenue,\n selling_general_and_administrative_expenses, interest_expense,\n gross_profit, net_income),\n join_by(symbol, calendar_year)\n ) |> \n left_join(\n cash_flow_statements |> \n select(symbol, calendar_year, inventory, accounts_receivables),\n join_by(symbol, calendar_year)\n )\n\ncombined_statements <- combined_statements |> \n mutate(\n asset_turnover = revenue / total_assets,\n inventory_turnover = cost_of_revenue / inventory,\n receivables_turnover = revenue / accounts_receivables\n )", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, + { + "objectID": "r/financial-ratios.html#profitability-ratios", + "href": "r/financial-ratios.html#profitability-ratios", + "title": "Financial Ratios", + "section": "Profitability Ratios", + "text": "Profitability Ratios\nGross margin shows the percentage of revenue that exceeds the cost of goods sold (COGS). A higher gross margin implies that the company retains a higher percentage of revenue as gross profit.\n\\[\\text{Gross Margin} = \\frac{\\text{Gross Profit}}{\\text{Revenue}}\\]\nProfit margin is the percentage of revenue that translates into net income. A higher profit margin suggests a more profitable company.\n\\[\\text{Profit Margin} = \\frac{\\text{Net Income}}{\\text{Revenue}}\\]\nAfter-tax ROE measures the return on shareholders’ equity after accounting for taxes. 
A higher ROE indicates that the company is effectively generating profit from shareholders’ investments.\n\\[\\text{After-Tax ROE} = \\frac{\\text{Net Income}}{\\text{Total Equity}}\\]\n\ncombined_statements <- combined_statements |> \n mutate(\n gross_margin = gross_profit / revenue,\n profit_margin = net_income / revenue,\n after_tax_roe = net_income / total_equity\n )\n\nGross margin over time Figure 14 shows\n\nfig_gross_margin <- combined_statements |> \n filter(symbol %in% selected_symbols) |> \n ggplot(aes(x = calendar_year, y = gross_margin, color = symbol)) +\n geom_line() +\n scale_y_continuous(labels = percent) + \n labs(x = NULL, y = NULL, color = NULL,\n title = \"Gross margins for selected stocks between 2019 and 2023\")\nfig_gross_margin\n\n\n\n\n\n\n\nFigure 14: Gross margins are based on financial statements as provided through the FMP API.\n\n\n\n\n\nProfit margin vs gross margin Figure 15 shows\n\nfig_gross_margin_profit_margin <- combined_statements |> \n filter(calendar_year == 2023) |> \n ggplot(aes(x = gross_margin, y = profit_margin, color = label)) +\n geom_point(size = 2) +\n geom_label_repel(aes(label = label), seed = 42, box.padding = 0.75) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n scale_color_manual(values = selected_colors) + \n labs(\n x = \"Gross margin\", y = \"Profit margin\",\n title = \"Gross and profit margins for Dow index constituents for 2023\"\n ) +\n theme(legend.position = \"none\")\nfig_gross_margin_profit_margin\n\nWarning: Removed 27 rows containing missing values\n(`geom_label_repel()`).\n\n\n\n\n\n\n\n\nFigure 15: Gross and profit margins are based on financial statements as provided through the FMP API.", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, + { + "objectID": "r/financial-ratios.html#combining-financial-ratios", + "href": "r/financial-ratios.html#combining-financial-ratios", + "title": "Financial Ratios", + "section": "Combining Financial Ratios", + "text": "Combining Financial Ratios\nRanking companies in different categories\nFigure 16 shows\n\nfinancial_ratios <- combined_statements |> \n filter(calendar_year == 2023) |> \n select(symbol, \n contains(c(\"ratio\", \"margin\", \"roe\", \"_to_\", \"turnover\", \"interest_coverage\"))) |> \n pivot_longer(cols = -symbol) |> \n mutate(\n type = case_when(\n name %in% c(\"current_ratio\", \"quick_ratio\", \"cash_ratio\") ~ \"Liquidity Ratios\",\n name %in% c(\"debt_to_equity\", \"debt_to_asset\", \"interest_coverage\") ~ \"Leverage Ratios\",\n name %in% c(\"asset_turnover\", \"inventory_turnover\", \"receivables_turnover\") ~ \"Efficiency Ratios\",\n name %in% c(\"gross_margin\", \"profit_margin\", \"after_tax_roe\") ~ \"Profitability Ratios\"\n )\n ) \n\nfig_ranks <- financial_ratios |> \n group_by(type, name) |> \n arrange(desc(value)) |> \n mutate(rank = row_number()) |> \n group_by(symbol, type) |> \n summarize(rank = mean(rank), \n .groups = \"drop\") |> \n filter(symbol %in% selected_symbols) |> \n ggplot(aes(x = rank, y = type, color = symbol)) +\n geom_point(shape = 17, size = 4) +\n scale_color_manual(values = selected_colors) + \n labs(x = \"Average rank\", y = NULL, color = NULL,\n title = \"Average rank among Dow index constituents for selected stocks\") +\n coord_cartesian(xlim = c(1, 30))\nfig_ranks\n\n\n\n\n\n\n\nFigure 16: Ranks are based on financial statements as provided through the FMP API.", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, + { + "objectID": 
"r/financial-ratios.html#financial-ratios-in-asset-pricing", + "href": "r/financial-ratios.html#financial-ratios-in-asset-pricing", + "title": "Financial Ratios", + "section": "Financial Ratios in Asset Pricing", + "text": "Financial Ratios in Asset Pricing\nThe Fama-French five-factor model aims to explain stock returns by incorporating specific financial metrics ratios. We provide more details in Replicating Fama-French Factors, but here is an intuitive overview:\n\nSize: Calculated as the logarithm of a company’s market capitalization, which is the total market value of its outstanding shares. This factor captures the tendency for smaller firms to outperform larger ones over time.\nBook-to-Market Ratio: Determined by dividing the company’s book equity by its market capitalization. A higher ratio indicates a ‘value’ stock, while a lower ratio suggests a ‘growth’’’ stock. This metric helps differentiate between undervalued and overvalued companies.\nProfitability: Measured as the ratio of operating profit to book equity, where operating profit is calculated as revenue minus cost of goods sold (COGS), selling, general, and administrative expenses (SG&A), and interest expense. This factor assesses a company’s efficiency in generating profits from its equity base.\nInvestment: Calculated as the percentage change in total assets from the previous period. This factor reflects the company’s growth strategy, indicating whether it is investing aggressively or conservatively.\n\nWe can calculate these factors using the FMP API as follows:\n\nmarket_cap <- constituents |> \n map_df(\n \\(x) fmp_get(\n resource = \"historical-market-capitalization\", \n x, \n list(from = \"2023-12-29\", to = \"2023-12-29\")\n )\n ) \n\ncombined_statements_ff <- combined_statements |> \n filter(calendar_year == 2023) |> \n left_join(market_cap, join_by(symbol)) |> \n left_join(\n balance_sheets_statements |> \n filter(calendar_year == 2022) |> \n select(symbol, total_assets_lag = total_assets), \n join_by(symbol)\n ) |> \n mutate(\n size = log(market_cap),\n book_to_market = market_cap / total_equity,\n operating_profitability = (revenue - cost_of_revenue - selling_general_and_administrative_expenses - interest_expense) / total_equity,\n investment = total_assets / total_assets_lag\n )\n\nFigure 17 shows the ranks of our selected stocks for the Fama-French factors.\n\nfig_rank_ff <- combined_statements_ff |> \n select(symbol, Size = size, \n `Book-to-Market` = book_to_market, \n `Profitability` = operating_profitability,\n Investment = investment) |> \n pivot_longer(-symbol) |> \n group_by(name) |> \n arrange(desc(value)) |> \n mutate(rank = row_number()) |> \n ungroup() |> \n filter(symbol %in% selected_symbols) |> \n ggplot(aes(x = rank, y = name, color = symbol)) +\n geom_point(shape = 17, size = 4) +\n scale_color_manual(values = selected_colors) + \n labs(\n x = \"Rank\", y = NULL, color = NULL,\n title = \"Rank in Fama-French variables for selected stocks from the Dow index\"\n ) +\n coord_cartesian(xlim = c(1, 30))\nfig_rank_ff\n\n\n\n\n\n\n\nFigure 17: Ranks are based on financial statements and historical market capitalization as provided through the FMP API.", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, + { + "objectID": "r/financial-ratios.html#key-takeaways", + "href": "r/financial-ratios.html#key-takeaways", + "title": "Financial Ratios", + "section": "Key Takeaways", + "text": "Key Takeaways\n\nFinancial statements provide standardized, legally required insights into a 
company’s financial position\nRatios allow benchmarking & trend analysis across liquidity, leverage, efficiency & profitability dimensions\nfmpapi enables easy access to financial data for ratio calculations & peer comparisons", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, + { + "objectID": "r/financial-ratios.html#exercises", + "href": "r/financial-ratios.html#exercises", + "title": "Financial Ratios", + "section": "Exercises", + "text": "Exercises\n\n…", + "crumbs": [ + "R", + "Getting Started", + "Financial Ratios" + ] + }, { "objectID": "r/index.html", "href": "r/index.html", @@ -3274,332 +3646,452 @@ "text": "First Glimpse of the CRSP Sample\nBefore we move on to other data sources, let us look at some descriptive statistics of the CRSP sample, which is our main source for stock returns.\nFigure 1 shows the monthly number of securities by listing exchange over time. NYSE has the longest history in the data, but NASDAQ lists a considerably large number of stocks. The number of stocks listed on AMEX decreased steadily over the last couple of decades. By the end of 2023, there were 2565 stocks with a primary listing on NASDAQ, 1287 on NYSE, and 168 on AMEX. \n\ncrsp_monthly |>\n count(exchange, date) |>\n ggplot(aes(x = date, y = n, color = exchange, linetype = exchange)) +\n geom_line() +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"Monthly number of securities by listing exchange\"\n ) +\n scale_x_date(date_breaks = \"10 years\", date_labels = \"%Y\") +\n scale_y_continuous(labels = comma)\n\n\n\n\n\n\n\nFigure 1: Number of stocks in the CRSP sample listed at each of the US exchanges.\n\n\n\n\n\nNext, we look at the aggregate market capitalization grouped by the respective listing exchanges in Figure 2. To ensure that we look at meaningful data which is comparable over time, we adjust the nominal values for inflation. In fact, we can use the tables that are already in our database to calculate aggregate market caps by listing exchange and plotting it just as if they were in memory. All values in Figure 2 are at the end of 2023 USD to ensure intertemporal comparability. NYSE-listed stocks have by far the largest market capitalization, followed by NASDAQ-listed stocks.\n\ntbl(tidy_finance, \"crsp_monthly\") |>\n left_join(tbl(tidy_finance, \"cpi_monthly\"), join_by(date)) |>\n group_by(date, exchange) |>\n summarize(\n mktcap = sum(mktcap, na.rm = TRUE) / cpi,\n .groups = \"drop\"\n ) |>\n collect() |>\n mutate(date = ymd(date)) |>\n ggplot(aes(\n x = date, y = mktcap / 1000,\n color = exchange, linetype = exchange\n )) +\n geom_line() +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"Monthly market cap by listing exchange in billions of Dec 2023 USD\"\n ) +\n scale_x_date(date_breaks = \"10 years\", date_labels = \"%Y\") +\n scale_y_continuous(labels = comma)\n\n\n\n\n\n\n\nFigure 2: Market capitalization is measured in billion USD, adjusted for consumer price index changes such that the values on the horizontal axis reflect the buying power of billion USD in December 2023.\n\n\n\n\n\nOf course, performing the computation in the database is not really meaningful because we can easily pull all the required data into our memory. The code chunk above is slower than performing the same steps on tables that are already in memory. However, we just want to illustrate that you can perform many things in the database before loading the data into your memory. 
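To see what such an in-database pipeline actually sends to SQLite before anything is loaded, you can inspect the translated SQL. The following is a minimal sketch (the object name mktcap_by_exchange is ours for illustration); it assumes the tidy_finance connection from above is still open, and no data leaves the database until collect() is called.

```r
# Lazy table: dplyr only records the query, nothing is downloaded yet
mktcap_by_exchange <- tbl(tidy_finance, "crsp_monthly") |>
  group_by(date, exchange) |>
  summarize(mktcap = sum(mktcap, na.rm = TRUE), .groups = "drop")

# Show the SQL that dbplyr would send to the database
show_query(mktcap_by_exchange)

# Only collect() materializes the result in your R session
mktcap_by_exchange_local <- collect(mktcap_by_exchange)
```

Whether you aggregate in the database or in memory is mostly a question of data size; the result is the same.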
Before we proceed, we load the monthly CPI data.\n\ncpi_monthly <- tbl(tidy_finance, \"cpi_monthly\") |>\n collect()\n\nNext, we look at the same descriptive statistics by industry. Figure 3 plots the number of stocks in the sample for each of the SIC industry classifiers. For most of the sample period, the largest share of stocks is in manufacturing, albeit the number peaked somewhere in the 90s. The number of firms associated with public administration seems to be the only category on the rise in recent years, even surpassing manufacturing at the end of our sample period.\n\ncrsp_monthly_industry <- crsp_monthly |>\n left_join(cpi_monthly, join_by(date)) |>\n group_by(date, industry) |>\n summarize(\n securities = n_distinct(permno),\n mktcap = sum(mktcap) / mean(cpi),\n .groups = \"drop\"\n )\n\ncrsp_monthly_industry |>\n ggplot(aes(\n x = date,\n y = securities,\n color = industry,\n linetype = industry\n )) +\n geom_line() +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"Monthly number of securities by industry\"\n ) +\n scale_x_date(date_breaks = \"10 years\", date_labels = \"%Y\") +\n scale_y_continuous(labels = comma)\n\n\n\n\n\n\n\nFigure 3: Number of stocks in the CRSP sample associated with different industries.\n\n\n\n\n\nWe also compute the market cap of all stocks belonging to the respective industries and show the evolution over time in Figure 4. All values are again in terms of billions of end of 2023 USD. At all points in time, manufacturing firms comprise of the largest portion of market capitalization. Toward the end of the sample, however, financial firms and services begin to make up a substantial portion of the market cap.\n\ncrsp_monthly_industry |>\n ggplot(aes(\n x = date,\n y = mktcap / 1000,\n color = industry,\n linetype = industry\n )) +\n geom_line() +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"Monthly total market cap by industry in billions as of Dec 2023 USD\"\n ) +\n scale_x_date(date_breaks = \"10 years\", date_labels = \"%Y\") +\n scale_y_continuous(labels = comma)\n\n\n\n\n\n\n\nFigure 4: Market capitalization is measured in billion USD, adjusted for consumer price index changes such that the values on the y-axis reflect the buying power of billion USD in December 2023.", "crumbs": [ "R", - "Financial Data", - "WRDS, CRSP, and Compustat" + "Financial Data", + "WRDS, CRSP, and Compustat" + ] + }, + { + "objectID": "r/wrds-crsp-and-compustat.html#daily-crsp-data", + "href": "r/wrds-crsp-and-compustat.html#daily-crsp-data", + "title": "WRDS, CRSP, and Compustat", + "section": "Daily CRSP Data", + "text": "Daily CRSP Data\nBefore we turn to accounting data, we provide a proposal for downloading daily CRSP data with the same filters used for the monthly data (i.e., using information from stksecurityinfohist). While the monthly data from above typically fit into your memory and can be downloaded in a meaningful amount of time, this is usually not true for daily return data. The daily CRSP data file is substantially larger than monthly data and can exceed 20 GB. This has two important implications: you cannot hold all the daily return data in your memory (hence it is not possible to copy the entire dataset to your local database), and in our experience, the download usually crashes (or never stops) because it is too much data for the WRDS cloud to prepare and send to your R session.\nThere is a solution to this challenge. 
As with many big data problems, you can split up the big task into several smaller tasks that are easier to handle. That is, instead of downloading data about all stocks at once, download the data in small batches of stocks consecutively. Such operations can be implemented in for-loops, where we download, prepare, and store the data for a small number of stocks in each iteration. This operation might nonetheless take around 5 minutes, depending on your internet connection. To keep track of the progress, we create ad-hoc progress updates using message(). Notice that we also use the function dbWriteTable() here with the option to append the new data to an existing table, when we process the second and all following batches. As for the monthly CRSP data, there is no need to adjust for delisting returns in the daily CRSP data since July 2022.\n\ndsf_db <- tbl(wrds, I(\"crsp.dsf_v2\"))\nstksecurityinfohist_db <- tbl(wrds, I(\"crsp.stksecurityinfohist\"))\n\nfactors_ff3_daily <- tbl(tidy_finance, \"factors_ff3_daily\") |>\n collect()\n\npermnos <- stksecurityinfohist_db |>\n distinct(permno) |> \n pull(permno)\n\nbatch_size <- 500\nbatches <- ceiling(length(permnos) / batch_size)\n\nfor (j in 1:batches) {\n \n permno_batch <- permnos[\n ((j - 1) * batch_size + 1):min(j * batch_size, length(permnos))\n ]\n\n crsp_daily_sub <- dsf_db |>\n filter(permno %in% permno_batch) |> \n filter(dlycaldt >= start_date & dlycaldt <= end_date) |> \n inner_join(\n stksecurityinfohist_db |>\n filter(sharetype == \"NS\" & \n securitytype == \"EQTY\" & \n securitysubtype == \"COM\" & \n usincflg == \"Y\" & \n issuertype %in% c(\"ACOR\", \"CORP\") & \n primaryexch %in% c(\"N\", \"A\", \"Q\") &\n conditionaltype %in% c(\"RW\", \"NW\") &\n tradingstatusflg == \"A\") |> \n select(permno, secinfostartdt, secinfoenddt),\n join_by(permno)\n ) |>\n filter(dlycaldt >= secinfostartdt & dlycaldt <= secinfoenddt) |> \n select(permno, date = dlycaldt, ret = dlyret) |>\n collect() |>\n drop_na()\n\n if (nrow(crsp_daily_sub) > 0) {\n \n crsp_daily_sub <- crsp_daily_sub |>\n left_join(factors_ff3_daily |>\n select(date, rf), join_by(date)) |>\n mutate(\n ret_excess = ret - rf,\n ret_excess = pmax(ret_excess, -1)\n ) |>\n select(permno, date, ret, ret_excess)\n\n dbWriteTable(\n tidy_finance,\n \"crsp_daily\",\n value = crsp_daily_sub,\n overwrite = ifelse(j == 1, TRUE, FALSE),\n append = ifelse(j != 1, TRUE, FALSE)\n )\n }\n\n message(\"Batch \", j, \" out of \", batches, \" done (\", percent(j / batches), \")\\n\")\n}\n\nEventually, we end up with more than 71 million rows of daily return data. Note that we only store the identifying information that we actually need, namely permno and date alongside the excess returns. We thus ensure that our local database contains only the data that we actually use.\nTo download the daily CRSP data via the tidyfinance package, you can call:\n\ncrsp_daily <- download_data(\n type = \"wrds_crsp_daily\",\n start_date = start_date,\n end_date = end_date\n)\n\nNote that you need at least 16 GB of memory to hold all the daily CRSP returns in memory. 
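If even this shortcut exceeds your memory budget, you can apply the batching idea over time rather than over permnos. The loop below is only a sketch: the year range is hypothetical, and it assumes that download_data() accepts arbitrary start_date and end_date values, as in the call above; each annual slice is appended to the local crsp_daily table and then dropped from memory.

```r
sample_years <- 1968:2023  # hypothetical sample period; align with your start_date and end_date

for (current_year in sample_years) {
  crsp_daily_chunk <- download_data(
    type = "wrds_crsp_daily",
    start_date = paste0(current_year, "-01-01"),
    end_date = paste0(current_year, "-12-31")
  )

  # Append each annual chunk to the local table instead of keeping it in memory
  dbWriteTable(
    tidy_finance,
    "crsp_daily",
    value = crsp_daily_chunk,
    overwrite = (current_year == min(sample_years)),
    append = (current_year > min(sample_years))
  )

  rm(crsp_daily_chunk)
}
```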
We hence recommend to use loop the function over different date periods and store the results.", + "crumbs": [ + "R", + "Financial Data", + "WRDS, CRSP, and Compustat" + ] + }, + { + "objectID": "r/wrds-crsp-and-compustat.html#preparing-compustat-data", + "href": "r/wrds-crsp-and-compustat.html#preparing-compustat-data", + "title": "WRDS, CRSP, and Compustat", + "section": "Preparing Compustat Data", + "text": "Preparing Compustat Data\nFirm accounting data are an important source of information that we use in portfolio analyses in subsequent chapters. The commonly used source for firm financial information is Compustat provided by S&P Global Market Intelligence, which is a global data vendor that provides financial, statistical, and market information on active and inactive companies throughout the world. For US and Canadian companies, annual history is available back to 1950 and quarterly as well as monthly histories date back to 1962.\nTo access Compustat data, we can again tap WRDS, which hosts the funda table that contains annual firm-level information on North American companies.\n\nfunda_db <- tbl(wrds, I(\"comp.funda\"))\n\nWe follow the typical filter conventions and pull only data that we actually need: (i) we get only records in industrial data format, which includes companies that are primarily involved in manufacturing, services, and other non-financial business activities,3 (ii) in the standard format (i.e., consolidated information in standard presentation), (iii) reported in USD,4 and (iv) only data in the desired time window.\n\ncompustat <- funda_db |>\n filter(\n indfmt == \"INDL\" &\n datafmt == \"STD\" & \n consol == \"C\" &\n curcd == \"USD\" &\n datadate >= start_date & datadate <= end_date\n ) |>\n select(\n gvkey, # Firm identifier\n datadate, # Date of the accounting data\n seq, # Stockholders' equity\n ceq, # Total common/ordinary equity\n at, # Total assets\n lt, # Total liabilities\n txditc, # Deferred taxes and investment tax credit\n txdb, # Deferred taxes\n itcb, # Investment tax credit\n pstkrv, # Preferred stock redemption value\n pstkl, # Preferred stock liquidating value\n pstk, # Preferred stock par value\n capx, # Capital investment\n oancf, # Operating cash flow\n sale, # Revenue\n cogs, # Costs of goods sold\n xint, # Interest expense\n xsga # Selling, general, and administrative expenses\n ) |>\n collect()\n\nNext, we calculate the book value of preferred stock and equity be and the operating profitability op inspired by the variable definitions in Ken French’s data library. Note that we set negative or zero equity to missing which is a common practice when working with book-to-market ratios (see Fama and French 1992 for details).\n\ncompustat <- compustat |>\n mutate(\n be = coalesce(seq, ceq + pstk, at - lt) +\n coalesce(txditc, txdb + itcb, 0) -\n coalesce(pstkrv, pstkl, pstk, 0),\n be = if_else(be <= 0, NA, be),\n op = (sale - coalesce(cogs, 0) - \n coalesce(xsga, 0) - coalesce(xint, 0)) / be,\n )\n\nWe keep only the last available information for each firm-year group. Note that datadate defines the time the corresponding financial data refers to (e.g., annual report as of December 31, 2022). Therefore, datadate is not the date when data was made available to the public. 
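For later applications you sometimes want a rough proxy for when the accounting information became public. A common convention, sketched here purely as an assumption rather than a rule, is to add a publication lag of several months to datadate (using lubridate's %m+% from the tidyverse); the column available_date below is illustrative and not a Compustat field.

```r
compustat <- compustat |>
  mutate(
    # Illustrative assumption: a report dated `datadate` is treated as
    # publicly known six months after the fiscal year end
    available_date = datadate %m+% months(6)
  )
```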
Check out the exercises for more insights into the peculiarities of datadate.\n\ncompustat <- compustat |>\n mutate(year = year(datadate)) |>\n group_by(gvkey, year) |>\n filter(datadate == max(datadate)) |>\n ungroup()\n\nWe also compute the investment ratio inv according to Ken French’s variable definitions as the change in total assets from one fiscal year to another. Note that we again use the approach using joins as introduced with the CRSP data above to construct lagged assets.\n\ncompustat <- compustat |> \n left_join(\n compustat |> \n select(gvkey, year, at_lag = at) |> \n mutate(year = year + 1), \n join_by(gvkey, year)\n ) |> \n mutate(\n inv = at / at_lag - 1,\n inv = if_else(at_lag <= 0, NA, inv)\n )\n\nWith the last step, we are already done preparing the firm fundamentals. Thus, we can store them in our local database.\n\ndbWriteTable(\n tidy_finance,\n \"compustat\",\n value = compustat,\n overwrite = TRUE\n)\n\nThe tidyfinance package provides a shortcut for the processing steps as well:\n\ncompustat <- download_data(\n type = \"wrds_compustat_annual\",\n start_date = start_date,\n end_date = end_date\n)", + "crumbs": [ + "R", + "Financial Data", + "WRDS, CRSP, and Compustat" + ] + }, + { + "objectID": "r/wrds-crsp-and-compustat.html#merging-crsp-with-compustat", + "href": "r/wrds-crsp-and-compustat.html#merging-crsp-with-compustat", + "title": "WRDS, CRSP, and Compustat", + "section": "Merging CRSP with Compustat", + "text": "Merging CRSP with Compustat\nUnfortunately, CRSP and Compustat use different keys to identify stocks and firms. CRSP uses permno for stocks, while Compustat uses gvkey to identify firms. Fortunately, a curated matching table on WRDS allows us to merge CRSP and Compustat, so we create a connection to the CRSP-Compustat Merged table (provided by CRSP).\n\nccm_linking_table_db <- tbl(wrds, I(\"crsp.ccmxpf_lnkhist\"))\n\nThe linking table contains links between CRSP and Compustat identifiers from various approaches. However, we need to make sure that we keep only relevant and correct links, again following the description outlined in Bali, Engle, and Murray (2016). Note also that currently active links have no end date, so we just enter the current date via today().\n\nccm_linking_table <- ccm_linking_table_db |>\n filter(\n linktype %in% c(\"LU\", \"LC\") &\n linkprim %in% c(\"P\", \"C\")\n ) |>\n select(permno = lpermno, gvkey, linkdt, linkenddt) |>\n collect() |>\n mutate(linkenddt = replace_na(linkenddt, today()))\n\nWe use these links to create a new table with a mapping between stock identifier, firm identifier, and month. We then add these links to the Compustat gvkey to our monthly stock data.\n\nccm_links <- crsp_monthly |>\n inner_join(ccm_linking_table, \n join_by(permno), relationship = \"many-to-many\") |>\n filter(!is.na(gvkey) & \n (date >= linkdt & date <= linkenddt)) |>\n select(permno, gvkey, date)\n\nTo fetch these links via tidyfinance, you can call:\n\nccm_links <- download_data(type = \"wrds_ccm_links\")\n\nAs the last step, we update the previously prepared monthly CRSP file with the linking information in our local database.\n\ncrsp_monthly <- crsp_monthly |>\n left_join(ccm_links, join_by(permno, date))\n\ndbWriteTable(\n tidy_finance,\n \"crsp_monthly\",\n value = crsp_monthly,\n overwrite = TRUE\n)\n\nBefore we close this chapter, let us look at an interesting descriptive statistic of our data. 
As the book value of equity plays a crucial role in many asset pricing applications, it is interesting to know for how many of our stocks this information is available. Hence, Figure 5 plots the share of securities with book equity values for each exchange. It turns out that the coverage is pretty bad for AMEX- and NYSE-listed stocks in the 1960s but hovers around 80 percent for all periods thereafter. We can ignore the erratic coverage of securities that belong to the other category since there is only a handful of them anyway in our sample.\n\ncrsp_monthly |>\n group_by(permno, year = year(date)) |>\n filter(date == max(date)) |>\n ungroup() |>\n left_join(compustat, join_by(gvkey, year)) |>\n group_by(exchange, year) |>\n summarize(\n share = n_distinct(permno[!is.na(be)]) / n_distinct(permno),\n .groups = \"drop\"\n ) |>\n ggplot(aes(\n x = year, \n y = share, \n color = exchange,\n linetype = exchange\n )) +\n geom_line() +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"Share of securities with book equity values by exchange\"\n ) +\n scale_y_continuous(labels = percent) +\n coord_cartesian(ylim = c(0, 1))\n\n\n\n\n\n\n\nFigure 5: End-of-year share of securities with book equity values by listing exchange.", + "crumbs": [ + "R", + "Financial Data", + "WRDS, CRSP, and Compustat" + ] + }, + { + "objectID": "r/wrds-crsp-and-compustat.html#some-tricks-for-postgresql-databases", + "href": "r/wrds-crsp-and-compustat.html#some-tricks-for-postgresql-databases", + "title": "WRDS, CRSP, and Compustat", + "section": "Some Tricks for PostgreSQL Databases", + "text": "Some Tricks for PostgreSQL Databases\nAs we mentioned above, the WRDS database runs on PostgreSQL rather than SQLite. Finding the right tables for your data needs can be tricky in the WRDS PostgreSQL instance, as the tables are organized in schemas. If you wonder what the purpose of schemas is, check out this documetation. For instance, if you want to find all tables that live in the crsp schema, you run\n\ndbListObjects(wrds, Id(schema = \"crsp\"))\n\nThis operation returns a list of all tables that belong to the crsp family on WRDS, e.g., <Id> schema = crsp, table = msenames. Similarly, you can fetch a list of all tables that belong to the comp family via\n\ndbListObjects(wrds, Id(schema = \"comp\"))\n\nIf you want to get all schemas, then run\n\ndbListObjects(wrds)", + "crumbs": [ + "R", + "Financial Data", + "WRDS, CRSP, and Compustat" + ] + }, + { + "objectID": "r/wrds-crsp-and-compustat.html#exercises", + "href": "r/wrds-crsp-and-compustat.html#exercises", + "title": "WRDS, CRSP, and Compustat", + "section": "Exercises", + "text": "Exercises\n\nCheck out the structure of the WRDS database by sending queries in the spirit of “Querying WRDS Data using R” and verify the output with dbListObjects(). How many tables are associated with CRSP? Can you identify what is stored within msp500?\nCompute mkt_cap_lag using lag(mktcap) rather than using joins as above. Filter out all the rows where the lag-based market capitalization measure is different from the one we computed above. Why are the two measures they different?\nPlot the average market capitalization of firms for each exchange and industry, respectively, over time. What do you find?\nIn the compustat table, datadate refers to the date to which the fiscal year of a corresponding firm refers. Count the number of observations in Compustat by month of this date variable. What do you find? 
What does the finding suggest about pooling observations with the same fiscal year?\nGo back to the original Compustat data in funda_db and extract rows where the same firm has multiple rows for the same fiscal year. What is the reason for these observations?\nKeep the last observation of crsp_monthly by year and join it with the compustat table. Create the following plots: (i) aggregate book equity by exchange over time and (ii) aggregate annual book equity by industry over time. Do you notice any different patterns to the corresponding plots based on market capitalization?\nRepeat the analysis of market capitalization for book equity, which we computed from the Compustat data. Then, use the matched sample to plot book equity against market capitalization. How are these two variables related?", + "crumbs": [ + "R", + "Financial Data", + "WRDS, CRSP, and Compustat" + ] + }, + { + "objectID": "r/wrds-crsp-and-compustat.html#footnotes", + "href": "r/wrds-crsp-and-compustat.html#footnotes", + "title": "WRDS, CRSP, and Compustat", + "section": "Footnotes", + "text": "Footnotes\n\n\nThe tbl() function creates a lazy table in our R session based on the remote WRDS database. To look up specific tables, we use the I(\"schema_name.table_name\") approach.↩︎\nThese three criteria jointly replicate the filter exchcd %in% c(1, 2, 3, 31, 32, 33) used for the legacy version of CRSP. If you do not want to include stocks at issuance, you can set the conditionaltype == \"RW\", which is equivalent to the restriction of exchcd %in% c(1, 2, 3) with the old CRSP format.↩︎\nCompanies that operate in the banking, insurance, or utilities sector typically report in different industry formats that reflect their specific regulatory requirements.↩︎\nCompustat also contains reports in CAD, which can lead a currency mismatch, e.g., when relating book equity to market equity.↩︎", + "crumbs": [ + "R", + "Financial Data", + "WRDS, CRSP, and Compustat" + ] + }, + { + "objectID": "r/wrds-dummy-data.html", + "href": "r/wrds-dummy-data.html", + "title": "WRDS Dummy Data", + "section": "", + "text": "Note\n\n\n\nThis appendix chapter is based on a blog post Dummy Data for Tidy Finance Readers without Access to WRDS by Christoph Scheuch.\nIn this appendix chapter, we alleviate the constraints of readers who do not have access to WRDS and hence cannot run the code that we provide. We show how to create a dummy database that contains the WRDS tables and corresponding columns such that all code chunks in this book can be executed with this dummy database. We do not create dummy data for tables of macroeconomic variables because they can be freely downloaded from the original sources; check out Accessing and Managing Financial Data.\nWe deliberately use the dummy label because the data is not meaningful in the sense that it allows readers to actually replicate the results of the book. For legal reasons, the data does not contain any samples of the original data. We merely generate random numbers for all columns of the tables that we use throughout the books.\nTo generate the dummy database, we use the following packages:\nlibrary(tidyverse)\nlibrary(RSQLite)\nLet us initialize a SQLite database (tidy_finance_r.sqlite) or connect to your existing one. 
Be careful, if you already downloaded the data from WRDS, then the code in this chapter will overwrite your data!\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\nSince we draw random numbers for most of the columns, we also define a seed to ensure that the generated numbers are replicable. We also initialize vectors of dates of different frequencies over ten years that we then use to create yearly, monthly, and daily data, respectively.\nset.seed(1234)\n\nstart_date <- as.Date(\"2003-01-01\")\nend_date <- as.Date(\"2022-12-31\")\n\ntime_series_years <- seq(year(start_date), year(end_date), 1)\ntime_series_months <- seq(start_date, end_date, \"1 month\")\ntime_series_days <- seq(start_date, end_date, \"1 day\")", + "crumbs": [ + "R", + "Appendix", + "WRDS Dummy Data" + ] + }, + { + "objectID": "r/wrds-dummy-data.html#create-stock-dummy-data", + "href": "r/wrds-dummy-data.html#create-stock-dummy-data", + "title": "WRDS Dummy Data", + "section": "Create Stock Dummy Data", + "text": "Create Stock Dummy Data\nLet us start with the core data used throughout the book: stock and firm characteristics. We first generate a table with a cross-section of stock identifiers with unique permno and gvkey values, as well as associated exchcd, exchange, industry, and siccd values. The generated data is based on the characteristics of stocks in the crsp_monthly table of the original database, ensuring that the generated stocks roughly reflect the distribution of industries and exchanges in the original data, but the identifiers and corresponding exchanges or industries do not reflect actual firms. Similarly, the permno-gvkey combinations are purely nonsensical and should not be used together with actual CRSP or Compustat data.\n\nnumber_of_stocks <- 100\n\nindustries <- tibble(\n industry = c(\"Agriculture\", \"Construction\", \"Finance\", \n \"Manufacturing\", \"Mining\", \"Public\", \"Retail\", \n \"Services\", \"Transportation\", \"Utilities\", \n \"Wholesale\"),\n n = c(81, 287, 4682, 8584, 1287, 1974, 1571, 4277, 1249, \n 457, 904),\n prob = c(0.00319, 0.0113, 0.185, 0.339, 0.0508, 0.0779, \n 0.0620, 0.169, 0.0493, 0.0180, 0.0357)\n)\n\nexchanges <- exchanges <- tibble(\n exchange = c(\"AMEX\", \"NASDAQ\", \"NYSE\"),\n n = c(2893, 17236, 5553),\n prob = c(0.113, 0.671, 0.216)\n)\n\nstock_identifiers <- 1:number_of_stocks |> \n map_df(\n function(x) {\n tibble(\n permno = x,\n gvkey = as.character(x + 10000),\n exchange = sample(exchanges$exchange, 1, \n prob = exchanges$prob),\n industry = sample(industries$industry, 1, \n prob = industries$prob)\n ) |> \n mutate(\n exchcd = case_when(\n exchange == \"NYSE\" ~ sample(c(1, 31), n()),\n exchange == \"AMEX\" ~ sample(c(2, 32), n()),\n exchange == \"NASDAQ\" ~ sample(c(3, 33), n())\n ),\n siccd = case_when(\n industry == \"Agriculture\" ~ sample(1:999, n()),\n industry == \"Mining\" ~ sample(1000:1499, n()),\n industry == \"Construction\" ~ sample(1500:1799, n()),\n industry == \"Manufacturing\" ~ sample(1800:3999, n()),\n industry == \"Transportation\" ~ sample(4000:4899, n()),\n industry == \"Utilities\" ~ sample(4900:4999, n()),\n industry == \"Wholesale\" ~ sample(5000:5199, n()),\n industry == \"Retail\" ~ sample(5200:5999, n()),\n industry == \"Finance\" ~ sample(6000:6799, n()),\n industry == \"Services\" ~ sample(7000:8999, n()),\n industry == \"Public\" ~ sample(9000:9999, n())\n )\n )\n }\n )\n\nNext, we construct three panels of stock data with varying frequencies: yearly, monthly, and daily. 
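Before building the panels (see the code after this note), it may help to see what expand_grid() does on a toy input with made-up values: it returns every combination of the rows of its arguments, which is exactly how a cross-section of identifiers becomes a panel.

```r
# Toy illustration with made-up values; expand_grid() comes from tidyr,
# which is already loaded via the tidyverse
expand_grid(
  tibble(permno = c(1, 2)),
  tibble(date = as.Date(c("2003-01-31", "2003-02-28")))
)
# Returns four rows: each permno paired with each date
```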
We begin by creating the stock_panel_yearly panel. To achieve this, we combine the stock_identifiers table with a new table containing the variable year from dummy_years. The expand_grid() function ensures that we get all possible combinations of the two tables. After combining, we select only the gvkey and year columns for our final yearly panel.\nNext, we construct the stock_panel_monthly panel. Similar to the yearly panel, we use the expand_grid() function to combine stock_identifiers with a new table that has the date variable from dummy_months. After merging, we select the columns permno, gvkey, date, siccd, industry, exchcd, and exchange to form our monthly panel.\nLastly, we create the stock_panel_daily panel. We combine stock_identifiers with a table containing the date variable from dummy_days. After merging, we retain only the permno and date columns for our daily panel.\n\nstock_panel_yearly <- expand_grid(\n stock_identifiers, \n tibble(year = time_series_years)\n) |> \n select(gvkey, year)\n\nstock_panel_monthly <- expand_grid(\n stock_identifiers, \n tibble(date = time_series_months)\n) |> \n select(permno, gvkey, date, siccd, industry, exchcd, exchange)\n\nstock_panel_daily <- expand_grid(\n stock_identifiers, \n tibble(date = time_series_days)\n)|> \n select(permno, date)\n\n\nDummy beta table\nWe then proceed to create dummy beta values for our stock_panel_monthly table. We generate monthly beta values beta_monthly using the rnorm() function with a mean and standard deviation of 1. For daily beta values beta_daily, we take the dummy monthly beta and add a small random noise to it. This noise is generated again using the rnorm() function, but this time we divide the random values by 100 to ensure they are small deviations from the monthly beta.\n\nbeta_dummy <- stock_panel_monthly |> \n mutate(\n beta_monthly = rnorm(n(), mean = 1, sd = 1),\n beta_daily = beta_monthly + rnorm(n()) / 100\n )\n\ndbWriteTable(\n tidy_finance,\n \"beta\", \n beta_dummy, \n overwrite = TRUE\n)\n\n\n\nDummy compustat table\nTo create dummy firm characteristics, we take all columns from the compustat table and create random numbers between 0 and 1. For simplicity, we set the datadate for each firm-year observation to the last day of the year, although it is empirically not the case. \n\nrelevant_columns <- c(\n \"seq\", \"ceq\", \"at\", \"lt\", \"txditc\", \"txdb\", \"itcb\", \n \"pstkrv\", \"pstkl\", \"pstk\", \"capx\", \"oancf\", \"sale\", \n \"cogs\", \"xint\", \"xsga\", \"be\", \"op\", \"at_lag\", \"inv\"\n)\n\ncommands <- unlist(\n map(\n relevant_columns, \n ~rlang::exprs(!!..1 := runif(n()))\n )\n)\n\ncompustat_dummy <- stock_panel_yearly |> \n mutate(\n datadate = ymd(str_c(year, \"12\", \"31\")),\n !!!commands\n )\n\ndbWriteTable(\n tidy_finance, \n \"compustat\", \n compustat_dummy,\n overwrite = TRUE\n)\n\n\n\nDummy crsp_monthly table\nThe crsp_monthly table only lacks a few more columns compared to stock_panel_monthly: the returns ret drawn from a normal distribution, the excess returns ret_excess with small deviations from the returns, the shares outstanding shrout and the last price per month altprc both drawn from uniform distributions, and the market capitalization mktcap as the product of shrout and altprc. 
\n\ncrsp_monthly_dummy <- stock_panel_monthly |> \n mutate(\n ret = pmax(rnorm(n()), -1),\n ret_excess = pmax(ret - runif(n(), 0, 0.0025), -1),\n shrout = runif(n(), 1, 50) * 1000,\n altprc = runif(n(), 0, 1000),\n mktcap = shrout * altprc\n ) |> \n group_by(permno) |> \n arrange(date) |> \n mutate(mktcap_lag = lag(mktcap)) |> \n ungroup()\n\ndbWriteTable(\n tidy_finance, \n \"crsp_monthly\",\n crsp_monthly_dummy,\n overwrite = TRUE\n)\n\n\n\nDummy crsp_daily table\nThe crsp_daily table only contains a date column and the daily excess returns ret_excess as additional columns to stock_panel_daily.\n\ncrsp_daily_dummy <- stock_panel_daily |> \n mutate(\n ret_excess = pmax(rnorm(n()), -1)\n )\n\ndbWriteTable(\n tidy_finance,\n \"crsp_daily\",\n crsp_daily_dummy, \n overwrite = TRUE\n)", + "crumbs": [ + "R", + "Appendix", + "WRDS Dummy Data" + ] + }, + { + "objectID": "r/wrds-dummy-data.html#create-bond-dummy-data", + "href": "r/wrds-dummy-data.html#create-bond-dummy-data", + "title": "WRDS Dummy Data", + "section": "Create Bond Dummy Data", + "text": "Create Bond Dummy Data\nLastly, we move to the bond data that we use in our books.\n\nDummy fisd data\nTo create dummy data with the structure of Mergent FISD, we calculate the empirical probabilities of actual bonds for several variables: maturity, offering_amt, interest_frequency, coupon, and sic_code. We use these probabilities to sample a small cross-section of bonds with completely made up complete_cusip, issue_id, and issuer_id.\n\nnumber_of_bonds <- 100\n\nfisd_dummy <- 1:number_of_bonds |> \n map_df(\n function(x) {\n tibble(\n complete_cusip = str_to_upper(\n str_c(\n sample(c(letters, 0:9), 12, replace = TRUE), \n collapse = \"\"\n )\n ),\n )\n }\n ) |> \n mutate(\n maturity = sample(time_series_days, n(), replace = TRUE),\n offering_amt = sample(seq(1:100) * 100000, n(), replace = TRUE),\n offering_date = maturity - sample(seq(1:25) * 365, n(),replace = TRUE),\n dated_date = offering_date - sample(-10:10, n(), replace = TRUE),\n interest_frequency = sample(c(0, 1, 2, 4, 12), n(), replace = TRUE),\n coupon = sample(seq(0, 2, by = 0.1), n(), replace = TRUE),\n last_interest_date = pmax(maturity, offering_date, dated_date),\n issue_id = row_number(),\n issuer_id = sample(1:250, n(), replace = TRUE),\n sic_code = as.character(sample(seq(1:9)*1000, n(), replace = TRUE))\n )\n \ndbWriteTable(\n tidy_finance, \n \"fisd\", \n fisd_dummy, \n overwrite = TRUE\n)\n\n\n\nDummy trace_enhanced data\nFinally, we create a dummy bond transaction data for the fictional CUSIPs of the dummy fisd data. We take the date range that we also analyze in the book and ensure that we have at least five transactions per day to fulfill a filtering step in the book. 
\n\nstart_date <- as.Date(\"2014-01-01\")\nend_date <- as.Date(\"2016-11-30\")\n\nbonds_panel <- expand_grid(\n fisd_dummy |> \n select(cusip_id = complete_cusip),\n tibble(\n trd_exctn_dt = seq(start_date, end_date, \"1 day\")\n )\n)\n\ntrace_enhanced_dummy <- bind_rows(\n bonds_panel, bonds_panel, \n bonds_panel, bonds_panel, \n bonds_panel) |> \n mutate(\n trd_exctn_tm = str_c(\n sample(0:24, n(), replace = TRUE), \":\", \n sample(0:60, n(), replace = TRUE), \":\", \n sample(0:60, n(), replace = TRUE)\n ),\n rptd_pr = runif(n(), 10, 200),\n entrd_vol_qt = sample(1:20, n(), replace = TRUE) * 1000,\n yld_pt = runif(n(), -10, 10),\n rpt_side_cd = sample(c(\"B\", \"S\"), n(), replace = TRUE),\n cntra_mp_id = sample(c(\"C\", \"D\"), n(), replace = TRUE)\n ) \n \ndbWriteTable(\n tidy_finance, \n \"trace_enhanced\", \n trace_enhanced_dummy, \n overwrite = TRUE\n)\n\nAs stated in the introduction, the data does not contain any samples of the original data. We merely generate random numbers for all columns of the tables that we use throughout this book.", + "crumbs": [ + "R", + "Appendix", + "WRDS Dummy Data" + ] + }, + { + "objectID": "r/parametric-portfolio-policies.html", + "href": "r/parametric-portfolio-policies.html", + "title": "Parametric Portfolio Policies", + "section": "", + "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\nIn this chapter, we apply different portfolio performance measures to evaluate and compare portfolio allocation strategies. For this purpose, we introduce a direct way to estimate optimal portfolio weights for large-scale cross-sectional applications. More precisely, the approach of Brandt, Santa-Clara, and Valkanov (2009) proposes to parametrize the optimal portfolio weights as a function of stock characteristics instead of estimating the stock’s expected return, variance, and covariances with other stocks in a prior step. We choose weights as a function of the characteristics, which maximize the expected utility of the investor. This approach is feasible for large portfolio dimensions (such as the entire CRSP universe) and has been proposed by Brandt, Santa-Clara, and Valkanov (2009). See the review paper by Brandt (2010) for an excellent treatment of related portfolio choice methods.\nThe current chapter relies on the following set of R packages:\nlibrary(tidyverse)\nlibrary(RSQLite)", + "crumbs": [ + "R", + "Portfolio Optimization", + "Parametric Portfolio Policies" ] }, { - "objectID": "r/wrds-crsp-and-compustat.html#daily-crsp-data", - "href": "r/wrds-crsp-and-compustat.html#daily-crsp-data", - "title": "WRDS, CRSP, and Compustat", - "section": "Daily CRSP Data", - "text": "Daily CRSP Data\nBefore we turn to accounting data, we provide a proposal for downloading daily CRSP data with the same filters used for the monthly data (i.e., using information from stksecurityinfohist). While the monthly data from above typically fit into your memory and can be downloaded in a meaningful amount of time, this is usually not true for daily return data. The daily CRSP data file is substantially larger than monthly data and can exceed 20 GB. 
This has two important implications: you cannot hold all the daily return data in your memory (hence it is not possible to copy the entire dataset to your local database), and in our experience, the download usually crashes (or never stops) because it is too much data for the WRDS cloud to prepare and send to your R session.\nThere is a solution to this challenge. As with many big data problems, you can split up the big task into several smaller tasks that are easier to handle. That is, instead of downloading data about all stocks at once, download the data in small batches of stocks consecutively. Such operations can be implemented in for-loops, where we download, prepare, and store the data for a small number of stocks in each iteration. This operation might nonetheless take around 5 minutes, depending on your internet connection. To keep track of the progress, we create ad-hoc progress updates using message(). Notice that we also use the function dbWriteTable() here with the option to append the new data to an existing table, when we process the second and all following batches. As for the monthly CRSP data, there is no need to adjust for delisting returns in the daily CRSP data since July 2022.\n\ndsf_db <- tbl(wrds, I(\"crsp.dsf_v2\"))\nstksecurityinfohist_db <- tbl(wrds, I(\"crsp.stksecurityinfohist\"))\n\nfactors_ff3_daily <- tbl(tidy_finance, \"factors_ff3_daily\") |>\n collect()\n\npermnos <- stksecurityinfohist_db |>\n distinct(permno) |> \n pull(permno)\n\nbatch_size <- 500\nbatches <- ceiling(length(permnos) / batch_size)\n\nfor (j in 1:batches) {\n \n permno_batch <- permnos[\n ((j - 1) * batch_size + 1):min(j * batch_size, length(permnos))\n ]\n\n crsp_daily_sub <- dsf_db |>\n filter(permno %in% permno_batch) |> \n filter(dlycaldt >= start_date & dlycaldt <= end_date) |> \n inner_join(\n stksecurityinfohist_db |>\n filter(sharetype == \"NS\" & \n securitytype == \"EQTY\" & \n securitysubtype == \"COM\" & \n usincflg == \"Y\" & \n issuertype %in% c(\"ACOR\", \"CORP\") & \n primaryexch %in% c(\"N\", \"A\", \"Q\") &\n conditionaltype %in% c(\"RW\", \"NW\") &\n tradingstatusflg == \"A\") |> \n select(permno, secinfostartdt, secinfoenddt),\n join_by(permno)\n ) |>\n filter(dlycaldt >= secinfostartdt & dlycaldt <= secinfoenddt) |> \n select(permno, date = dlycaldt, ret = dlyret) |>\n collect() |>\n drop_na()\n\n if (nrow(crsp_daily_sub) > 0) {\n \n crsp_daily_sub <- crsp_daily_sub |>\n left_join(factors_ff3_daily |>\n select(date, rf), join_by(date)) |>\n mutate(\n ret_excess = ret - rf,\n ret_excess = pmax(ret_excess, -1)\n ) |>\n select(permno, date, ret, ret_excess)\n\n dbWriteTable(\n tidy_finance,\n \"crsp_daily\",\n value = crsp_daily_sub,\n overwrite = ifelse(j == 1, TRUE, FALSE),\n append = ifelse(j != 1, TRUE, FALSE)\n )\n }\n\n message(\"Batch \", j, \" out of \", batches, \" done (\", percent(j / batches), \")\\n\")\n}\n\nEventually, we end up with more than 71 million rows of daily return data. Note that we only store the identifying information that we actually need, namely permno and date alongside the excess returns. We thus ensure that our local database contains only the data that we actually use.\nTo download the daily CRSP data via the tidyfinance package, you can call:\n\ncrsp_daily <- download_data(\n type = \"wrds_crsp_daily\",\n start_date = start_date,\n end_date = end_date\n)\n\nNote that you need at least 16 GB of memory to hold all the daily CRSP returns in memory. 
We hence recommend to use loop the function over different date periods and store the results.", + "objectID": "r/parametric-portfolio-policies.html#data-preparation", + "href": "r/parametric-portfolio-policies.html#data-preparation", + "title": "Parametric Portfolio Policies", + "section": "Data Preparation", + "text": "Data Preparation\nTo get started, we load the monthly CRSP file, which forms our investment universe. We load the data from our SQLite-database introduced in Accessing and Managing Financial Data and WRDS, CRSP, and Compustat.\n\ntidy_finance <- dbConnect(\n SQLite(), \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\ncrsp_monthly <- tbl(tidy_finance, \"crsp_monthly\") |>\n select(permno, date, ret_excess, mktcap, mktcap_lag) |>\n collect()\n\nTo evaluate the performance of portfolios, we further use monthly market returns as a benchmark to compute CAPM alphas.\n\nfactors_ff3_monthly <- tbl(tidy_finance, \"factors_ff3_monthly\") |>\n select(date, mkt_excess) |>\n collect()\n\nNext, we retrieve some stock characteristics that have been shown to have an effect on the expected returns or expected variances (or even higher moments) of the return distribution. In particular, we record the lagged one-year return momentum (momentum_lag), defined as the compounded return between months \\(t-13\\) and \\(t-2\\) for each firm. In finance, momentum is the empirically observed tendency for rising asset prices to rise further, and falling prices to keep falling (Jegadeesh and Titman 1993). The second characteristic is the firm’s market equity (size_lag), defined as the log of the price per share times the number of shares outstanding (Banz 1981). To construct the correct lagged values, we use the approach introduced in WRDS, CRSP, and Compustat.\n\ncrsp_monthly_lags <- crsp_monthly |>\n transmute(permno,\n date_13 = date %m+% months(13),\n mktcap\n )\n\ncrsp_monthly <- crsp_monthly |>\n inner_join(crsp_monthly_lags,\n join_by(permno, date == date_13),\n suffix = c(\"\", \"_13\")\n )\n\ndata_portfolios <- crsp_monthly |>\n mutate(\n momentum_lag = mktcap_lag / mktcap_13,\n size_lag = log(mktcap_lag)\n ) |>\n drop_na(contains(\"lag\"))", "crumbs": [ "R", - "Financial Data", - "WRDS, CRSP, and Compustat" + "Portfolio Optimization", + "Parametric Portfolio Policies" ] }, { - "objectID": "r/wrds-crsp-and-compustat.html#preparing-compustat-data", - "href": "r/wrds-crsp-and-compustat.html#preparing-compustat-data", - "title": "WRDS, CRSP, and Compustat", - "section": "Preparing Compustat Data", - "text": "Preparing Compustat Data\nFirm accounting data are an important source of information that we use in portfolio analyses in subsequent chapters. The commonly used source for firm financial information is Compustat provided by S&P Global Market Intelligence, which is a global data vendor that provides financial, statistical, and market information on active and inactive companies throughout the world. 
For US and Canadian companies, annual history is available back to 1950 and quarterly as well as monthly histories date back to 1962.\nTo access Compustat data, we can again tap WRDS, which hosts the funda table that contains annual firm-level information on North American companies.\n\nfunda_db <- tbl(wrds, I(\"comp.funda\"))\n\nWe follow the typical filter conventions and pull only data that we actually need: (i) we get only records in industrial data format, which includes companies that are primarily involved in manufacturing, services, and other non-financial business activities,3 (ii) in the standard format (i.e., consolidated information in standard presentation), (iii) reported in USD,4 and (iv) only data in the desired time window.\n\ncompustat <- funda_db |>\n filter(\n indfmt == \"INDL\" &\n datafmt == \"STD\" & \n consol == \"C\" &\n curcd == \"USD\" &\n datadate >= start_date & datadate <= end_date\n ) |>\n select(\n gvkey, # Firm identifier\n datadate, # Date of the accounting data\n seq, # Stockholders' equity\n ceq, # Total common/ordinary equity\n at, # Total assets\n lt, # Total liabilities\n txditc, # Deferred taxes and investment tax credit\n txdb, # Deferred taxes\n itcb, # Investment tax credit\n pstkrv, # Preferred stock redemption value\n pstkl, # Preferred stock liquidating value\n pstk, # Preferred stock par value\n capx, # Capital investment\n oancf, # Operating cash flow\n sale, # Revenue\n cogs, # Costs of goods sold\n xint, # Interest expense\n xsga # Selling, general, and administrative expenses\n ) |>\n collect()\n\nNext, we calculate the book value of preferred stock and equity be and the operating profitability op inspired by the variable definitions in Ken French’s data library. Note that we set negative or zero equity to missing which is a common practice when working with book-to-market ratios (see Fama and French 1992 for details).\n\ncompustat <- compustat |>\n mutate(\n be = coalesce(seq, ceq + pstk, at - lt) +\n coalesce(txditc, txdb + itcb, 0) -\n coalesce(pstkrv, pstkl, pstk, 0),\n be = if_else(be <= 0, NA, be),\n op = (sale - coalesce(cogs, 0) - \n coalesce(xsga, 0) - coalesce(xint, 0)) / be,\n )\n\nWe keep only the last available information for each firm-year group. Note that datadate defines the time the corresponding financial data refers to (e.g., annual report as of December 31, 2022). Therefore, datadate is not the date when data was made available to the public. Check out the exercises for more insights into the peculiarities of datadate.\n\ncompustat <- compustat |>\n mutate(year = year(datadate)) |>\n group_by(gvkey, year) |>\n filter(datadate == max(datadate)) |>\n ungroup()\n\nWe also compute the investment ratio inv according to Ken French’s variable definitions as the change in total assets from one fiscal year to another. Note that we again use the approach using joins as introduced with the CRSP data above to construct lagged assets.\n\ncompustat <- compustat |> \n left_join(\n compustat |> \n select(gvkey, year, at_lag = at) |> \n mutate(year = year + 1), \n join_by(gvkey, year)\n ) |> \n mutate(\n inv = at / at_lag - 1,\n inv = if_else(at_lag <= 0, NA, inv)\n )\n\nWith the last step, we are already done preparing the firm fundamentals. 
Thus, we can store them in our local database.\n\ndbWriteTable(\n tidy_finance,\n \"compustat\",\n value = compustat,\n overwrite = TRUE\n)\n\nThe tidyfinance package provides a shortcut for the processing steps as well:\n\ncompustat <- download_data(\n type = \"wrds_compustat_annual\",\n start_date = start_date,\n end_date = end_date\n)", + "objectID": "r/parametric-portfolio-policies.html#parametric-portfolio-policies", + "href": "r/parametric-portfolio-policies.html#parametric-portfolio-policies", + "title": "Parametric Portfolio Policies", + "section": "Parametric Portfolio Policies", + "text": "Parametric Portfolio Policies\nThe basic idea of parametric portfolio weights is as follows. Suppose that at each date \\(t\\) we have \\(N_t\\) stocks in the investment universe, where each stock \\(i\\) has a return of \\(r_{i, t+1}\\) and is associated with a vector of firm characteristics \\(x_{i, t}\\) such as time-series momentum or the market capitalization. The investor’s problem is to choose portfolio weights \\(w_{i,t}\\) to maximize the expected utility of the portfolio return: \\[\\begin{aligned}\n\\max_{\\omega} E_t\\left(u(r_{p, t+1})\\right) = E_t\\left[u\\left(\\sum\\limits_{i=1}^{N_t}\\omega_{i,t}r_{i,t+1}\\right)\\right]\n\\end{aligned}\\] where \\(u(\\cdot)\\) denotes the utility function.\nWhere do the stock characteristics show up? We parameterize the optimal portfolio weights as a function of the stock characteristic \\(x_{i,t}\\) with the following linear specification for the portfolio weights: \\[\\omega_{i,t} = \\bar{\\omega}_{i,t} + \\frac{1}{N_t}\\theta'\\hat{x}_{i,t},\\] where \\(\\bar{\\omega}_{i,t}\\) is a stock’s weight in a benchmark portfolio (we use the value-weighted or naive portfolio in the application below), \\(\\theta\\) is a vector of coefficients which we are going to estimate, and \\(\\hat{x}_{i,t}\\) are the characteristics of stock \\(i\\), cross-sectionally standardized to have zero mean and unit standard deviation.\nIntuitively, the portfolio strategy is a form of active portfolio management relative to a performance benchmark. Deviations from the benchmark portfolio are derived from the individual stock characteristics. Note that by construction the weights sum up to one as \\(\\sum_{i=1}^{N_t}\\hat{x}_{i,t} = 0\\) due to the standardization. Moreover, the coefficients are constant across assets and over time. The implicit assumption is that the characteristics fully capture all aspects of the joint distribution of returns that are relevant for forming optimal portfolios.\nWe first implement cross-sectional standardization for the entire CRSP universe. We also keep track of (lagged) relative market capitalization relative_mktcap, which will represent the value-weighted benchmark portfolio, while n denotes the number of traded assets \\(N_t\\), which we use to construct the naive portfolio benchmark.\n\ndata_portfolios <- data_portfolios |>\n group_by(date) |>\n mutate(\n n = n(),\n relative_mktcap = mktcap_lag / sum(mktcap_lag),\n across(contains(\"lag\"), ~ (. 
- mean(.)) / sd(.)),\n ) |>\n ungroup() |>\n select(-mktcap_lag)", "crumbs": [ "R", - "Financial Data", - "WRDS, CRSP, and Compustat" + "Portfolio Optimization", + "Parametric Portfolio Policies" ] }, { - "objectID": "r/wrds-crsp-and-compustat.html#merging-crsp-with-compustat", - "href": "r/wrds-crsp-and-compustat.html#merging-crsp-with-compustat", - "title": "WRDS, CRSP, and Compustat", - "section": "Merging CRSP with Compustat", - "text": "Merging CRSP with Compustat\nUnfortunately, CRSP and Compustat use different keys to identify stocks and firms. CRSP uses permno for stocks, while Compustat uses gvkey to identify firms. Fortunately, a curated matching table on WRDS allows us to merge CRSP and Compustat, so we create a connection to the CRSP-Compustat Merged table (provided by CRSP).\n\nccm_linking_table_db <- tbl(wrds, I(\"crsp.ccmxpf_lnkhist\"))\n\nThe linking table contains links between CRSP and Compustat identifiers from various approaches. However, we need to make sure that we keep only relevant and correct links, again following the description outlined in Bali, Engle, and Murray (2016). Note also that currently active links have no end date, so we just enter the current date via today().\n\nccm_linking_table <- ccm_linking_table_db |>\n filter(\n linktype %in% c(\"LU\", \"LC\") &\n linkprim %in% c(\"P\", \"C\")\n ) |>\n select(permno = lpermno, gvkey, linkdt, linkenddt) |>\n collect() |>\n mutate(linkenddt = replace_na(linkenddt, today()))\n\nWe use these links to create a new table with a mapping between stock identifier, firm identifier, and month. We then add these links to the Compustat gvkey to our monthly stock data.\n\nccm_links <- crsp_monthly |>\n inner_join(ccm_linking_table, \n join_by(permno), relationship = \"many-to-many\") |>\n filter(!is.na(gvkey) & \n (date >= linkdt & date <= linkenddt)) |>\n select(permno, gvkey, date)\n\nTo fetch these links via tidyfinance, you can call:\n\nccm_links <- download_data(type = \"wrds_ccm_links\")\n\nAs the last step, we update the previously prepared monthly CRSP file with the linking information in our local database.\n\ncrsp_monthly <- crsp_monthly |>\n left_join(ccm_links, join_by(permno, date))\n\ndbWriteTable(\n tidy_finance,\n \"crsp_monthly\",\n value = crsp_monthly,\n overwrite = TRUE\n)\n\nBefore we close this chapter, let us look at an interesting descriptive statistic of our data. As the book value of equity plays a crucial role in many asset pricing applications, it is interesting to know for how many of our stocks this information is available. Hence, Figure 5 plots the share of securities with book equity values for each exchange. It turns out that the coverage is pretty bad for AMEX- and NYSE-listed stocks in the 1960s but hovers around 80 percent for all periods thereafter. 
We can ignore the erratic coverage of securities that belong to the other category since there is only a handful of them anyway in our sample.\n\ncrsp_monthly |>\n group_by(permno, year = year(date)) |>\n filter(date == max(date)) |>\n ungroup() |>\n left_join(compustat, join_by(gvkey, year)) |>\n group_by(exchange, year) |>\n summarize(\n share = n_distinct(permno[!is.na(be)]) / n_distinct(permno),\n .groups = \"drop\"\n ) |>\n ggplot(aes(\n x = year, \n y = share, \n color = exchange,\n linetype = exchange\n )) +\n geom_line() +\n labs(\n x = NULL, y = NULL, color = NULL, linetype = NULL,\n title = \"Share of securities with book equity values by exchange\"\n ) +\n scale_y_continuous(labels = percent) +\n coord_cartesian(ylim = c(0, 1))\n\n\n\n\n\n\n\nFigure 5: End-of-year share of securities with book equity values by listing exchange.", + "objectID": "r/parametric-portfolio-policies.html#computing-portfolio-weights", + "href": "r/parametric-portfolio-policies.html#computing-portfolio-weights", + "title": "Parametric Portfolio Policies", + "section": "Computing Portfolio Weights", + "text": "Computing Portfolio Weights\nNext, we move on to identify optimal choices of \\(\\theta\\). We rewrite the optimization problem together with the weight parametrization and can then estimate \\(\\theta\\) to maximize the objective function based on our sample \\[\\begin{aligned}\nE_t\\left(u(r_{p, t+1})\\right) = \\frac{1}{T}\\sum\\limits_{t=0}^{T-1}u\\left(\\sum\\limits_{i=1}^{N_t}\\left(\\bar{\\omega}_{i,t} + \\frac{1}{N_t}\\theta'\\hat{x}_{i,t}\\right)r_{i,t+1}\\right).\n\\end{aligned}\\] The allocation strategy is straightforward because the number of parameters to estimate is small. Instead of a tedious specification of the \\(N_t\\) dimensional vector of expected returns and the \\(N_t(N_t+1)/2\\) free elements of the covariance matrix, all we need to focus on in our application is the vector \\(\\theta\\). \\(\\theta\\) contains only two elements in our application: the relative deviation from the benchmark due to size and momentum.\nTo get a feeling for the performance of such an allocation strategy, we start with an arbitrary initial vector \\(\\theta_0\\). The next step is to choose \\(\\theta\\) optimally to maximize the objective function. We automatically detect the number of parameters by counting the number of columns with lagged values.\n\nn_parameters <- sum(str_detect(\n colnames(data_portfolios), \"lag\"\n))\n\ntheta <- rep(1.5, n_parameters)\n\nnames(theta) <- colnames(data_portfolios)[str_detect(\n colnames(data_portfolios), \"lag\"\n)]\n\nThe function compute_portfolio_weights() below computes the portfolio weights \\(\\bar{\\omega}_{i,t} + \\frac{1}{N_t}\\theta'\\hat{x}_{i,t}\\) according to our parametrization for a given value \\(\\theta_0\\). Everything happens within a single pipeline. Hence, we provide a short walk-through.\nWe first compute characteristic_tilt, the tilting values \\(\\frac{1}{N_t}\\theta'\\hat{x}_{i, t}\\) which resemble the deviation from the benchmark portfolio. Next, we compute the benchmark portfolio weight_benchmark, which can be any reasonable set of portfolio weights. In our case, we choose either the value or equal-weighted allocation. weight_tilt completes the picture and contains the final portfolio weights weight_tilt = weight_benchmark + characteristic_tilt which deviate from the benchmark portfolio depending on the stock characteristics.\nThe final few lines go a bit further and implement a simple version of a no-short sale constraint. 
While it is generally not straightforward to ensure portfolio weight constraints via parameterization, we simply normalize the portfolio weights such that they are enforced to be positive. Finally, we make sure that the normalized weights sum up to one again: \\[\\omega_{i,t}^+ = \\frac{\\max(0, \\omega_{i,t})}{\\sum_{j=1}^{N_t}\\max(0, \\omega_{j,t})}.\\]\nThe following function computes the optimal portfolio weights in the way just described.\n\ncompute_portfolio_weights <- function(theta,\n data,\n value_weighting = TRUE,\n allow_short_selling = TRUE) {\n data |>\n group_by(date) |>\n bind_cols(\n characteristic_tilt = data |>\n transmute(across(contains(\"lag\"), ~ . / n)) |>\n as.matrix() %*% theta |> as.numeric()\n ) |>\n mutate(\n # Definition of benchmark weight\n weight_benchmark = case_when(\n value_weighting == TRUE ~ relative_mktcap,\n value_weighting == FALSE ~ 1 / n\n ),\n # Parametric portfolio weights\n weight_tilt = weight_benchmark + characteristic_tilt,\n # Short-sell constraint\n weight_tilt = case_when(\n allow_short_selling == TRUE ~ weight_tilt,\n allow_short_selling == FALSE ~ pmax(0, weight_tilt)\n ),\n # Weights sum up to 1\n weight_tilt = weight_tilt / sum(weight_tilt)\n ) |>\n ungroup()\n}\n\nIn the next step, we compute the portfolio weights for the arbitrary vector \\(\\theta_0\\). In the example below, we use the value-weighted portfolio as a benchmark and allow negative portfolio weights.\n\nweights_crsp <- compute_portfolio_weights(\n theta,\n data_portfolios,\n value_weighting = TRUE,\n allow_short_selling = TRUE\n)", "crumbs": [ "R", - "Financial Data", - "WRDS, CRSP, and Compustat" + "Portfolio Optimization", + "Parametric Portfolio Policies" ] }, { - "objectID": "r/wrds-crsp-and-compustat.html#some-tricks-for-postgresql-databases", - "href": "r/wrds-crsp-and-compustat.html#some-tricks-for-postgresql-databases", - "title": "WRDS, CRSP, and Compustat", - "section": "Some Tricks for PostgreSQL Databases", - "text": "Some Tricks for PostgreSQL Databases\nAs we mentioned above, the WRDS database runs on PostgreSQL rather than SQLite. Finding the right tables for your data needs can be tricky in the WRDS PostgreSQL instance, as the tables are organized in schemas. If you wonder what the purpose of schemas is, check out this documentation. For instance, if you want to find all tables that live in the crsp schema, you run\n\ndbListObjects(wrds, Id(schema = \"crsp\"))\n\nThis operation returns a list of all tables that belong to the crsp family on WRDS, e.g., <Id> schema = crsp, table = msenames. Similarly, you can fetch a list of all tables that belong to the comp family via\n\ndbListObjects(wrds, Id(schema = \"comp\"))\n\nIf you want to get all schemas, then run\n\ndbListObjects(wrds)", + "objectID": "r/parametric-portfolio-policies.html#portfolio-performance", + "href": "r/parametric-portfolio-policies.html#portfolio-performance", + "title": "Parametric Portfolio Policies", + "section": "Portfolio Performance", + "text": "Portfolio Performance\nAre the computed weights optimal in any way? Most likely not, as we picked \\(\\theta_0\\) arbitrarily. To evaluate the performance of an allocation strategy, one can think of many different approaches. 
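Before turning to formal performance measures, a quick sanity check of the weight parametrization itself can be reassuring. The following minimal sketch is not part of the original text; it assumes the objects data_portfolios and theta as well as the function compute_portfolio_weights() from above are available. With theta set to zero the characteristic tilt vanishes, so the tilted weights should coincide with the benchmark weights (up to the final renormalization) and sum to one within each month.

```r
library(tidyverse)

# Hypothetical sanity check (not from the original chapter): at theta = 0 the
# tilt is zero, so weight_tilt should equal weight_benchmark and sum to one.
weights_check <- compute_portfolio_weights(
  theta = rep(0, length(theta)),
  data = data_portfolios,
  value_weighting = TRUE,
  allow_short_selling = TRUE
)

weights_check |>
  group_by(date) |>
  summarize(
    sum_tilt = sum(weight_tilt),
    max_deviation = max(abs(weight_tilt - weight_benchmark)),
    .groups = "drop"
  ) |>
  summarize(
    all_sum_to_one = all(near(sum_tilt, 1)),
    max_deviation = max(max_deviation)
  )
```

If either check fails, a likely culprit is a characteristic that is not standardized by month or that contains missing values in the lagged columns.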
In their original paper, Brandt, Santa-Clara, and Valkanov (2009) focus on a simple evaluation of the hypothetical utility of an agent equipped with a power utility function \\(u_\\gamma(r) = \\frac{(1 + r)^{(1-\\gamma)}}{1-\\gamma}\\), where \\(\\gamma\\) is the risk aversion factor.\n\npower_utility <- function(r, gamma = 5) {\n (1 + r)^(1 - gamma) / (1 - gamma)\n}\n\nWe want to note that Gehrig, Sögner, and Westerkamp (2020) warn that, in the leading case of constant relative risk aversion (CRRA), strong assumptions on the properties of the returns, the variables used to implement the parametric portfolio policy, and the parameter space are necessary to obtain a well-defined optimization problem.\nNo doubt, there are many other ways to evaluate a portfolio. The function below provides a summary of all kinds of interesting measures that can be considered relevant. Do we need all these evaluation measures? It depends: the original paper by Brandt, Santa-Clara, and Valkanov (2009) only cares about the expected utility to choose \\(\\theta\\). However, if you want to choose optimal values that achieve the highest performance while putting some constraints on your portfolio weights, it is helpful to have everything in one function.\n\nevaluate_portfolio <- function(weights_crsp,\n capm_evaluation = TRUE,\n full_evaluation = TRUE,\n length_year = 12) {\n \n evaluation <- weights_crsp |>\n group_by(date) |>\n summarize(\n tilt = weighted.mean(ret_excess, weight_tilt),\n benchmark = weighted.mean(ret_excess, weight_benchmark)\n ) |>\n pivot_longer(\n -date,\n values_to = \"portfolio_return\",\n names_to = \"model\"\n ) \n \n evaluation_stats <- evaluation |>\n group_by(model) |>\n left_join(factors_ff3_monthly, join_by(date)) |>\n summarize(tibble(\n \"Expected utility\" = mean(power_utility(portfolio_return)),\n \"Average return\" = 100 * mean(length_year * portfolio_return),\n \"SD return\" = 100 * sqrt(length_year) * sd(portfolio_return),\n \"Sharpe ratio\" = sqrt(length_year) * mean(portfolio_return) / sd(portfolio_return),\n\n )) |>\n mutate(model = str_remove(model, \"return_\")) \n \n if (capm_evaluation) {\n evaluation_capm <- evaluation |> \n left_join(factors_ff3_monthly, join_by(date)) |>\n group_by(model) |>\n summarize(\n \"CAPM alpha\" = coefficients(lm(portfolio_return ~ mkt_excess))[1],\n \"Market beta\" = coefficients(lm(portfolio_return ~ mkt_excess))[2]\n )\n \n evaluation_stats <- evaluation_stats |> \n left_join(evaluation_capm, join_by(model))\n }\n\n if (full_evaluation) {\n evaluation_weights <- weights_crsp |>\n select(date, contains(\"weight\")) |>\n pivot_longer(-date, values_to = \"weight\", names_to = \"model\") |>\n group_by(model, date) |>\n mutate(\n \"Absolute weight\" = abs(weight),\n \"Max. weight\" = max(weight),\n \"Min. weight\" = min(weight),\n \"Avg. sum of negative weights\" = -sum(weight[weight < 0]),\n \"Avg. 
fraction of negative weights\" = sum(weight < 0) / n(),\n .keep = \"none\"\n ) |>\n group_by(model) |>\n summarize(across(-date, ~ 100 * mean(.))) |>\n mutate(model = str_remove(model, \"weight_\")) \n \n evaluation_stats <- evaluation_stats |> \n left_join(evaluation_weights, join_by(model))\n }\n \n evaluation_output <- evaluation_stats |> \n pivot_longer(cols = -model, names_to = \"measure\") |> \n pivot_wider(names_from = model)\n \n return(evaluation_output)\n}\n\n Let us take a look at the different portfolio strategies and evaluation measures.\n\nevaluate_portfolio(weights_crsp) |>\n print(n = Inf)\n\n# A tibble: 11 × 3\n measure benchmark tilt\n <chr> <dbl> <dbl>\n 1 Expected utility -0.250 -0.261 \n 2 Average return 6.87 0.537 \n 3 SD return 15.5 21.2 \n 4 Sharpe ratio 0.444 0.0254 \n 5 CAPM alpha 0.000141 -0.00485\n 6 Market beta 0.994 0.943 \n 7 Absolute weight 0.0249 0.0638 \n 8 Max. weight 3.63 3.76 \n 9 Min. weight 0.0000270 -0.144 \n10 Avg. sum of negative weights 0 78.1 \n11 Avg. fraction of negative weights 0 49.5 \n\n\nThe value-weighted portfolio delivers an annualized return of more than 6 percent and clearly outperforms the tilted portfolio, irrespective of whether we evaluate expected utility, the Sharpe ratio, or the CAPM alpha. We can conclude the market beta is close to one for both strategies (naturally almost identically 1 for the value-weighted benchmark portfolio). When it comes to the distribution of the portfolio weights, we see that the benchmark portfolio weight takes less extreme positions (lower average absolute weights and lower maximum weight). By definition, the value-weighted benchmark does not take any negative positions, while the tilted portfolio also takes short positions.", "crumbs": [ "R", - "Financial Data", - "WRDS, CRSP, and Compustat" + "Portfolio Optimization", + "Parametric Portfolio Policies" ] }, { - "objectID": "r/wrds-crsp-and-compustat.html#exercises", - "href": "r/wrds-crsp-and-compustat.html#exercises", - "title": "WRDS, CRSP, and Compustat", - "section": "Exercises", - "text": "Exercises\n\nCheck out the structure of the WRDS database by sending queries in the spirit of “Querying WRDS Data using R” and verify the output with dbListObjects(). How many tables are associated with CRSP? Can you identify what is stored within msp500?\nCompute mkt_cap_lag using lag(mktcap) rather than using joins as above. Filter out all the rows where the lag-based market capitalization measure is different from the one we computed above. Why are the two measures they different?\nPlot the average market capitalization of firms for each exchange and industry, respectively, over time. What do you find?\nIn the compustat table, datadate refers to the date to which the fiscal year of a corresponding firm refers. Count the number of observations in Compustat by month of this date variable. What do you find? What does the finding suggest about pooling observations with the same fiscal year?\nGo back to the original Compustat data in funda_db and extract rows where the same firm has multiple rows for the same fiscal year. What is the reason for these observations?\nKeep the last observation of crsp_monthly by year and join it with the compustat table. Create the following plots: (i) aggregate book equity by exchange over time and (ii) aggregate annual book equity by industry over time. 
Do you notice any different patterns to the corresponding plots based on market capitalization?\nRepeat the analysis of market capitalization for book equity, which we computed from the Compustat data. Then, use the matched sample to plot book equity against market capitalization. How are these two variables related?", + "objectID": "r/parametric-portfolio-policies.html#optimal-parameter-choice", + "href": "r/parametric-portfolio-policies.html#optimal-parameter-choice", + "title": "Parametric Portfolio Policies", + "section": "Optimal Parameter Choice", + "text": "Optimal Parameter Choice\nNext, we move to a choice of \\(\\theta\\) that actually aims to improve some (or all) of the performance measures. We first define a helper function compute_objective_function(), which we then pass to an optimizer.\n\ncompute_objective_function <- function(theta,\n data,\n objective_measure = \"Expected utility\",\n value_weighting = TRUE,\n allow_short_selling = TRUE) {\n processed_data <- compute_portfolio_weights(\n theta,\n data,\n value_weighting,\n allow_short_selling\n )\n\n objective_function <- evaluate_portfolio(\n processed_data,\n capm_evaluation = FALSE,\n full_evaluation = FALSE\n ) |>\n filter(measure == objective_measure) |>\n pull(tilt)\n\n return(-objective_function)\n}\n\nYou may wonder why we return the negative value of the objective function. This is simply due to the common convention for optimization procedures to search for minima as a default. By minimizing the negative value of the objective function, we get the maximum value as a result. In its most basic form, R optimization relies on the function optim(). As main inputs, the function requires an initial guess of the parameters and the objective function to minimize. Now, we are fully equipped to compute the optimal values of \\(\\hat\\theta\\), which maximize the hypothetical expected utility of the investor.\n\noptimal_theta <- optim(\n par = theta,\n fn = compute_objective_function,\n objective_measure = \"Expected utility\",\n data = data_portfolios,\n value_weighting = TRUE,\n allow_short_selling = TRUE,\n method = \"Nelder-Mead\"\n)\n\noptimal_theta$par\n\nmomentum_lag size_lag \n 0.304 -1.705 \n\n\nThe resulting values of \\(\\hat\\theta\\) are easy to interpret: intuitively, expected utility increases by tilting weights from the value-weighted portfolio toward smaller stocks (negative coefficient for size) and toward past winners (positive value for momentum). Both findings are in line with the well-documented size effect (Banz 1981) and the momentum anomaly (Jegadeesh and Titman 1993).", "crumbs": [ "R", - "Financial Data", - "WRDS, CRSP, and Compustat" + "Portfolio Optimization", + "Parametric Portfolio Policies" ] }, { - "objectID": "r/wrds-crsp-and-compustat.html#footnotes", - "href": "r/wrds-crsp-and-compustat.html#footnotes", - "title": "WRDS, CRSP, and Compustat", - "section": "Footnotes", - "text": "Footnotes\n\n\nThe tbl() function creates a lazy table in our R session based on the remote WRDS database. To look up specific tables, we use the I(\"schema_name.table_name\") approach.↩︎\nThese three criteria jointly replicate the filter exchcd %in% c(1, 2, 3, 31, 32, 33) used for the legacy version of CRSP. 
If you do not want to include stocks at issuance, you can set the conditionaltype == \"RW\", which is equivalent to the restriction of exchcd %in% c(1, 2, 3) with the old CRSP format.↩︎\nCompanies that operate in the banking, insurance, or utilities sector typically report in different industry formats that reflect their specific regulatory requirements.↩︎\nCompustat also contains reports in CAD, which can lead a currency mismatch, e.g., when relating book equity to market equity.↩︎", + "objectID": "r/parametric-portfolio-policies.html#more-model-specifications", + "href": "r/parametric-portfolio-policies.html#more-model-specifications", + "title": "Parametric Portfolio Policies", + "section": "More Model Specifications", + "text": "More Model Specifications\nHow does the portfolio perform for different model specifications? For this purpose, we compute the performance of a number of different modeling choices based on the entire CRSP sample. The next code chunk performs all the heavy lifting.\n\nevaluate_optimal_performance <- function(data, \n objective_measure,\n value_weighting, \n allow_short_selling) {\n optimal_theta <- optim(\n par = theta,\n fn = compute_objective_function,\n data = data,\n objective_measure = \"Expected utility\",\n value_weighting = TRUE,\n allow_short_selling = TRUE,\n method = \"Nelder-Mead\"\n )\n\n processed_data = compute_portfolio_weights(\n optimal_theta$par, \n data,\n value_weighting,\n allow_short_selling\n )\n \n portfolio_evaluation = evaluate_portfolio(\n processed_data,\n capm_evaluation = TRUE,\n full_evaluation = TRUE\n )\n \n return(portfolio_evaluation) \n}\n\nspecifications <- expand_grid(\n data = list(data_portfolios),\n objective_measure = \"Expected utility\",\n value_weighting = c(TRUE, FALSE),\n allow_short_selling = c(TRUE, FALSE)\n) |> \n mutate(\n portfolio_evaluation = pmap(\n .l = list(data, objective_measure, value_weighting, allow_short_selling),\n .f = evaluate_optimal_performance\n )\n)\n\nFinally, we can compare the results. The table below shows summary statistics for all possible combinations: equal- or value-weighted benchmark portfolio, with or without short-selling constraints, and tilted toward maximizing expected utility.\n\nperformance_table <- specifications |>\n select(\n value_weighting,\n allow_short_selling,\n portfolio_evaluation\n ) |>\n unnest(portfolio_evaluation)\n\nperformance_table |>\n rename(\n \" \" = benchmark,\n Optimal = tilt\n ) |>\n mutate(\n value_weighting = case_when(\n value_weighting == TRUE ~ \"VW\",\n value_weighting == FALSE ~ \"EW\"\n ),\n allow_short_selling = case_when(\n allow_short_selling == TRUE ~ \"\",\n allow_short_selling == FALSE ~ \"(no s.)\"\n )\n ) |>\n pivot_wider(\n names_from = value_weighting:allow_short_selling,\n values_from = \" \":Optimal,\n names_glue = \"{value_weighting} {allow_short_selling} {.value} \"\n ) |>\n select(\n measure,\n `EW `,\n `VW `,\n sort(contains(\"Optimal\"))\n ) |>\n print(n = 11)\n\n# A tibble: 11 × 7\n measure `EW ` `VW ` `VW Optimal ` `VW (no s.) Optimal `\n <chr> <dbl> <dbl> <dbl> <dbl>\n 1 Expected u… -0.251 -2.50e-1 -0.247 -0.248 \n 2 Average re… 10.0 6.87e+0 12.9 12.1 \n 3 SD return 20.5 1.55e+1 19.5 19.0 \n 4 Sharpe rat… 0.489 4.44e-1 0.660 0.636 \n 5 CAPM alpha 0.00200 1.41e-4 0.00506 0.00425\n 6 Market beta 1.13 9.94e-1 1.01 1.03 \n 7 Absolute w… 0.0249 2.49e-2 0.0345 0.0249 \n 8 Max. weight 0.0249 3.63e+0 3.48 2.91 \n 9 Min. weight 0.0249 2.70e-5 -0.0281 0 \n10 Avg. sum o… 0 0 20.1 0 \n11 Avg. 
fract… 0 0 36.8 0 \n# ℹ 2 more variables: `EW Optimal ` <dbl>,\n# `EW (no s.) Optimal ` <dbl>\n\n\nThe results indicate that the average annualized Sharpe ratio of the equal-weighted portfolio exceeds the Sharpe ratio of the value-weighted benchmark portfolio. Nevertheless, starting with the value-weighted portfolio as a benchmark and tilting optimally with respect to momentum and small stocks yields the highest Sharpe ratio across all specifications. Finally, imposing the no-short-sale constraint does not improve the performance of the portfolios in our application.", "crumbs": [ "R", - "Financial Data", - "WRDS, CRSP, and Compustat" + "Portfolio Optimization", + "Parametric Portfolio Policies" ] }, { - "objectID": "r/parametric-portfolio-policies.html#exercises", - "href": "r/parametric-portfolio-policies.html#exercises", - "title": "Parametric Portfolio Policies", - "section": "Exercises", - "text": "Exercises\n\nHow do the estimated parameters \\(\\hat\\theta\\) and the portfolio performance change if your objective is to maximize the Sharpe ratio instead of the hypothetical expected utility?\nThe code above is very flexible in the sense that you can easily add new firm characteristics (see the sketch below). 
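As a purely illustrative sketch (not part of the original exercises), suppose a raw candidate characteristic, here called vol_lag as a stand-in for some hypothetical lagged volatility measure, has already been merged into data_portfolios. Because the number of parameters and the names of theta are derived from all columns whose names contain "lag", standardizing the new column by month is all that is needed for the rest of the code to pick it up.

```r
library(tidyverse)

# vol_lag is a hypothetical, already-merged raw characteristic; standardize it
# cross-sectionally by month, mirroring the treatment of momentum_lag and size_lag.
data_portfolios <- data_portfolios |>
  group_by(date) |>
  mutate(vol_lag = (vol_lag - mean(vol_lag)) / sd(vol_lag)) |>
  ungroup()

# Re-run the parameter setup from above; the new column is detected via "lag".
n_parameters <- sum(str_detect(colnames(data_portfolios), "lag"))

theta <- rep(1.5, n_parameters)
names(theta) <- colnames(data_portfolios)[str_detect(colnames(data_portfolios), "lag")]
```

Under these assumptions, compute_portfolio_weights(), evaluate_portfolio(), and the optim() call work unchanged and simply return an additional element of theta for the new characteristic.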
Construct a new characteristic of your choice and evaluate the corresponding coefficient \\(\\hat\\theta_i\\).\nTweak the function optimal_theta() such that you can impose additional performance constraints in order to determine \\(\\hat\\theta\\), which maximizes expected utility under the constraint that the market beta is below 1.\nDoes the portfolio performance resemble a realistic out-of-sample backtesting procedure? Verify the robustness of the results by first estimating \\(\\hat\\theta\\) based on past data only. Then, use more recent periods to evaluate the actual portfolio performance.\nBy formulating the portfolio problem as a statistical estimation problem, you can easily obtain standard errors for the coefficients of the weight function. Brandt, Santa-Clara, and Valkanov (2009) provide the relevant derivations in their paper in Equation (10). Implement a small function that computes standard errors for \\(\\hat\\theta\\).", "crumbs": [ "R", - "Modeling and Machine Learning", - "Fixed Effects and Clustered Standard Errors" + "Portfolio Optimization", + "Parametric Portfolio Policies" ] }, { - "objectID": "r/fixed-effects-and-clustered-standard-errors.html#data-preparation", - "href": "r/fixed-effects-and-clustered-standard-errors.html#data-preparation", - "title": "Fixed Effects and Clustered Standard Errors", - "section": "Data Preparation", - "text": "Data Preparation\nWe use CRSP and annual Compustat as data sources from our SQLite-database introduced in Accessing and Managing Financial Data and WRDS, CRSP, and Compustat. In particular, Compustat provides balance sheet and income statement data on a firm level, while CRSP provides market valuations. \n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\ncrsp_monthly <- tbl(tidy_finance, \"crsp_monthly\") |>\n select(gvkey, date, mktcap) |>\n collect()\n\ncompustat <- tbl(tidy_finance, \"compustat\") |>\n select(datadate, gvkey, year, at, be, capx, oancf, txdb) |>\n collect()\n\nThe classical investment regressions model the capital investment of a firm as a function of operating cash flows and Tobin’s q, a measure of a firm’s investment opportunities. We start by constructing investment and cash flows which are usually normalized by lagged total assets of a firm. In the following code chunk, we construct a panel of firm-year observations, so we have both cross-sectional information on firms as well as time-series information for each firm.\n\ndata_investment <- compustat |>\n mutate(date = floor_date(datadate, \"month\")) |>\n left_join(compustat |>\n select(gvkey, year, at_lag = at) |>\n mutate(year = year + 1),\n join_by(gvkey, year)\n ) |>\n filter(at > 0, at_lag > 0) |>\n mutate(\n investment = capx / at_lag,\n cash_flows = oancf / at_lag\n )\n\ndata_investment <- data_investment |>\n left_join(data_investment |>\n select(gvkey, year, investment_lead = investment) |>\n mutate(year = year - 1),\n join_by(gvkey, year)\n )\n\nTobin’s q is the ratio of the market value of capital to its replacement costs. It is one of the most common regressors in corporate finance applications (e.g., Fazzari et al. 1988; Erickson and Whited 2012). We follow the implementation of Gulen and Ion (2015) and compute Tobin’s q as the market value of equity (mktcap) plus the book value of assets (at) minus book value of equity (be) plus deferred taxes (txdb), all divided by book value of assets (at). 
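Stated compactly, the measure just described (using the CRSP and Compustat variable names from the text) is

$$\text{Tobin's } q = \frac{\text{mktcap} + \text{at} - \text{be} + \text{txdb}}{\text{at}},$$

which is exactly what the mutate() call below implements.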
Finally, we only keep observations where all variables of interest are non-missing, and the reported book value of assets is strictly positive.\n\ndata_investment <- data_investment |>\n left_join(crsp_monthly, join_by(gvkey, date)) |>\n mutate(tobins_q = (mktcap + at - be + txdb) / at) |>\n select(gvkey, year, investment_lead, cash_flows, tobins_q) |>\n drop_na()\n\nAs the variable construction typically leads to extreme values that are most likely related to data issues (e.g., reporting errors), many papers include winsorization of the variables of interest. Winsorization involves replacing values of extreme outliers with quantiles on the respective end. The following function implements the winsorization for any percentage cut that should be applied on either end of the distributions. In the specific example, we winsorize the main variables (investment, cash_flows, and tobins_q) at the one percent level.\n\nwinsorize <- function(x, cut) {\n x <- replace(\n x,\n x > quantile(x, 1 - cut, na.rm = T),\n quantile(x, 1 - cut, na.rm = T)\n )\n x <- replace(\n x,\n x < quantile(x, cut, na.rm = T),\n quantile(x, cut, na.rm = T)\n )\n return(x)\n}\n\ndata_investment <- data_investment |>\n mutate(across(\n c(investment_lead, cash_flows, tobins_q),\n ~ winsorize(., 0.01)\n ))\n\nBefore proceeding to any estimations, we highly recommend tabulating summary statistics of the variables that enter the regression. These simple tables allow you to check the plausibility of your numerical variables, as well as spot any obvious errors or outliers. Additionally, for panel data, plotting the time series of the variable’s mean and the number of observations is a useful exercise to spot potential problems.\n\ndata_investment |>\n pivot_longer(\n cols = c(investment_lead, cash_flows, tobins_q),\n names_to = \"measure\"\n ) |>\n group_by(measure) |>\n summarize(\n mean = mean(value),\n sd = sd(value),\n min = min(value),\n q05 = quantile(value, 0.05),\n q50 = quantile(value, 0.50),\n q95 = quantile(value, 0.95),\n max = max(value),\n n = n(),\n .groups = \"drop\"\n )\n\n# A tibble: 3 × 9\n measure mean sd min q05 q50 q95 max n\n <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>\n1 cash_flo… 0.00985 0.274 -1.55 -4.75e-1 0.0630 0.272 0.478 130559\n2 investme… 0.0570 0.0766 0 6.18e-4 0.0322 0.204 0.460 130559\n3 tobins_q 1.99 1.69 0.565 7.91e-1 1.39 5.35 10.9 130559", + "objectID": "r/capital-asset-pricing-model.html", + "href": "r/capital-asset-pricing-model.html", + "title": "The Capital Asset Pricing Model", + "section": "", + "text": "Key questions:\nCAPM is an equilibrium model\nThe CAPM in a nutshell: investors demand a compensation for risk\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nlibrary(ggrepel)", "crumbs": [ "R", - "Modeling and Machine Learning", - "Fixed Effects and Clustered Standard Errors" + "Getting Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/fixed-effects-and-clustered-standard-errors.html#fixed-effects", - "href": "r/fixed-effects-and-clustered-standard-errors.html#fixed-effects", - "title": "Fixed Effects and Clustered Standard Errors", - "section": "Fixed Effects", - "text": "Fixed Effects\nTo illustrate fixed effects regressions, we use the fixest package, which is both computationally powerful and flexible with respect to model specifications. 
We start out with the basic investment regression using the simple model \\[ \\text{Investment}_{i,t+1} = \\alpha + \\beta_1\\text{Cash Flows}_{i,t}+\\beta_2\\text{Tobin's q}_{i,t}+\\varepsilon_{i,t},\\] where \\(\\varepsilon_t\\) is i.i.d. normally distributed across time and firms. We use the feols()-function to estimate the simple model so that the output has the same structure as the other regressions below, but you could also use lm().\n\nmodel_ols <- feols(\n fml = investment_lead ~ cash_flows + tobins_q,\n vcov = \"iid\",\n data = data_investment\n)\nmodel_ols\n\nOLS estimation, Dep. Var.: investment_lead\nObservations: 130,559 \nStandard-errors: IID \n Estimate Std. Error t value Pr(>|t|) \n(Intercept) 0.04209 0.000327 128.8 < 2.2e-16 ***\ncash_flows 0.04923 0.000777 63.3 < 2.2e-16 ***\ntobins_q 0.00724 0.000126 57.5 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\nRMSE: 0.074865 Adj. R2: 0.043615\n\n\nAs expected, the regression output shows significant coefficients for both variables. Higher cash flows and investment opportunities are associated with higher investment. However, the simple model actually may have a lot of omitted variables, so our coefficients are most likely biased. As there is a lot of unexplained variation in our simple model (indicated by the rather low adjusted R-squared), the bias in our coefficients is potentially severe, and the true values could be above or below zero. Note that there are no clear cutoffs to decide when an R-squared is high or low, but it depends on the context of your application and on the comparison of different models for the same data.\nOne way to tackle the issue of omitted variable bias is to get rid of as much unexplained variation as possible by including fixed effects; i.e., model parameters that are fixed for specific groups (e.g., Wooldridge 2010). In essence, each group has its own mean in fixed effects regressions. The simplest group that we can form in the investment regression is the firm level. The firm fixed effects regression is then \\[ \\text{Investment}_{i,t+1} = \\alpha_i + \\beta_1\\text{Cash Flows}_{i,t}+\\beta_2\\text{Tobin's q}_{i,t}+\\varepsilon_{i,t},\\] where \\(\\alpha_i\\) is the firm fixed effect and captures the firm-specific mean investment across all years. In fact, you could also compute firms’ investments as deviations from the firms’ average investments and estimate the model without the fixed effects. The idea of the firm fixed effect is to remove the firm’s average investment, which might be affected by firm-specific variables that you do not observe. For example, firms in a specific industry might invest more on average. Or you observe a young firm with large investments but only small concurrent cash flows, which will only happen in a few years. This sort of variation is unwanted because it is related to unobserved variables that can bias your estimates in any direction.\nTo include the firm fixed effect, we use gvkey (Compustat’s firm identifier) as follows:\n\nmodel_fe_firm <- feols(\n investment_lead ~ cash_flows + tobins_q | gvkey,\n vcov = \"iid\",\n data = data_investment\n)\nmodel_fe_firm\n\nOLS estimation, Dep. Var.: investment_lead\nObservations: 130,559 \nFixed-effects: gvkey: 14,556\nStandard-errors: IID \n Estimate Std. Error t value Pr(>|t|) \ncash_flows 0.0141 0.000897 15.8 < 2.2e-16 ***\ntobins_q 0.0107 0.000130 82.5 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\nRMSE: 0.049257 Adj. 
R2: 0.534034\n Within R2: 0.056459\n\n\nThe regression output shows a lot of unexplained variation at the firm level that is taken care of by including the firm fixed effect as the adjusted R-squared rises above 50 percent. In fact, it is more interesting to look at the within R-squared that shows the explanatory power of a firm’s cash flow and Tobin’s q on top of the average investment of each firm. We can also see that the coefficients changed slightly in magnitude but not in sign.\nThere is another source of variation that we can get rid of in our setting: average investment across firms might vary over time due to macroeconomic factors that affect all firms, such as economic crises. By including year fixed effects, we can take out the effect of unobservables that vary over time. The two-way fixed effects regression is then \\[ \\text{Investment}_{i,t+1} = \\alpha_i + \\alpha_t + \\beta_1\\text{Cash Flows}_{i,t}+\\beta_2\\text{Tobin's q}_{i,t}+\\varepsilon_{i,t},\\] where \\(\\alpha_t\\) is the time fixed effect. Here you can think of higher investments during an economic expansion with simultaneously high cash flows.\n\nmodel_fe_firmyear <- feols(\n investment_lead ~ cash_flows + tobins_q | gvkey + year,\n vcov = \"iid\",\n data = data_investment\n)\nmodel_fe_firmyear\n\nOLS estimation, Dep. Var.: investment_lead\nObservations: 130,559 \nFixed-effects: gvkey: 14,556, year: 36\nStandard-errors: IID \n Estimate Std. Error t value Pr(>|t|) \ncash_flows 0.01721 0.000877 19.6 < 2.2e-16 ***\ntobins_q 0.00972 0.000128 75.8 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\nRMSE: 0.047969 Adj. R2: 0.557959\n Within R2: 0.049442\n\n\nThe inclusion of time fixed effects did only marginally affect the R-squared and the coefficients, which we can interpret as a good thing as it indicates that the coefficients are not driven by an omitted variable that varies over time.\nHow can we further improve the robustness of our regression results? Ideally, we want to get rid of unexplained variation at the firm-year level, which means we need to include more variables that vary across firm and time and are likely correlated with investment. Note that we cannot include firm-year fixed effects in our setting because then cash flows and Tobin’s q are colinear with the fixed effects, and the estimation becomes void.\nBefore we discuss the properties of our estimation errors, we want to point out that regression tables are at the heart of every empirical analysis, where you compare multiple models. Fortunately, the etable() function provides a convenient way to tabulate the regression output (with many parameters to customize and even print the output in LaTeX). We recommend printing \\(t\\)-statistics rather than standard errors in regression tables because the latter are typically very hard to interpret across coefficients that vary in size. We also do not print p-values because they are sometimes misinterpreted to signal the importance of observed effects (Wasserstein and Lazar 2016). 
The \\(t\\)-statistics provide a consistent way to interpret changes in estimation uncertainty across different model specifications.\n\netable(\n model_ols, model_fe_firm, model_fe_firmyear,\n coefstat = \"tstat\", digits = 3, digits.stats = 3\n)\n\n model_ols model_fe_firm model_fe_firm..\nDependent Var.: investment_lead investment_lead investment_lead\n \nConstant 0.042*** (128.8) \ncash_flows 0.049*** (63.3) 0.014*** (15.8) 0.017*** (19.6)\ntobins_q 0.007*** (57.5) 0.011*** (82.5) 0.010*** (75.8)\nFixed-Effects: ---------------- --------------- ---------------\ngvkey No Yes Yes\nyear No No Yes\n_______________ ________________ _______________ _______________\nVCOV type IID IID IID\nObservations 130,559 130,559 130,559\nR2 0.044 0.586 0.607\nWithin R2 -- 0.057 0.049\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1", + "objectID": "r/capital-asset-pricing-model.html#asset-returns-volatilities", + "href": "r/capital-asset-pricing-model.html#asset-returns-volatilities", + "title": "The Capital Asset Pricing Model", + "section": "Asset Returns & Volatilities", + "text": "Asset Returns & Volatilities\n\nsymbols <- download_data(\n type = \"constituents\",\n index = \"Dow Jones Industrial Average\"\n)\n\nprices_daily <- download_data(\n type = \"stock_prices\", symbol = symbols$symbol,\n start_date = \"2019-10-01\", end_date = \"2024-09-30\"\n) |> \n select(symbol, date, price = adjusted_close)\n\nCalculate daily returns\n\nreturns_daily <- prices_daily |>\n group_by(symbol) |> \n mutate(ret = price / lag(price) - 1) |>\n ungroup() |> \n select(symbol, date, ret) |> \n drop_na(ret) |> \n arrange(symbol, date)\n\nPlot risk & return Figure 1\n\nassets <- returns_daily |> \n group_by(symbol) |> \n summarize(mu = mean(ret), sigma = sd(ret))\n\nfig_vola_return <- assets |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_point() + \n geom_label_repel(data = assets |> filter(symbol %in% c(\"BA\", \"NVDA\")),\n aes(label = symbol)) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(x = \"Volatility\", y = \"Average return\",\n title = \"Average returns and volatilities of Dow index constituents\") \nfig_vola_return\n\n\n\n\n\n\n\nFigure 1: Average returns and volatilities are based on returns adjusted for dividend payments and stock splits.\n\n\n\n\n\nDoes high risk bring high returns?\nBoeing (BA) vs Nvidia (NVDA)\n\nCompany-specific events might affect stock prices\nExamples: CEO resignation, product launch, earnings report\nIdiosyncratic events don’t impact the overall market\nThis asset-specific risk can be eliminated through diversification\n\nFocus on systematic risk that affects all assets in the market\nSystematic vs idiosyncratic risk: Investors dislike risk\nDifferent sources of risk\n\nSystematic risk: all assets are exposed to it, cannot be diversified away\nIdiosyncratic risk: unique to particular asset, can be diversified away", "crumbs": [ "R", - "Modeling and Machine Learning", - "Fixed Effects and Clustered Standard Errors" + "Getting Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/fixed-effects-and-clustered-standard-errors.html#clustering-standard-errors", - "href": "r/fixed-effects-and-clustered-standard-errors.html#clustering-standard-errors", - "title": "Fixed Effects and Clustered Standard Errors", - "section": "Clustering Standard Errors", - "text": "Clustering Standard Errors\nApart from biased estimators, we usually have to deal with potentially complex dependencies of our residuals with each other. 
Such dependencies in the residuals invalidate the i.i.d. assumption of OLS and lead to biased standard errors. With biased OLS standard errors, we cannot reliably interpret the statistical significance of our estimated coefficients.\nIn our setting, the residuals may be correlated across years for a given firm (time-series dependence), or, alternatively, the residuals may be correlated across different firms (cross-section dependence). One of the most common approaches to dealing with such dependence is the use of clustered standard errors (Petersen 2008). The idea behind clustering is that the correlation of residuals within a cluster can be of any form. As the number of clusters grows, the cluster-robust standard errors become consistent (Donald and Lang 2007; Wooldridge 2010). A natural requirement for clustering standard errors in practice is hence a sufficiently large number of clusters. Typically, around at least 30 to 50 clusters are seen as sufficient (Cameron, Gelbach, and Miller 2011).\nInstead of relying on the iid assumption, we can use the cluster option in the feols-function as above. The code chunk below applies both one-way clustering by firm as well as two-way clustering by firm and year.\n\nmodel_cluster_firm <- feols(\n investment_lead ~ cash_flows + tobins_q | gvkey + year,\n cluster = \"gvkey\",\n data = data_investment\n)\n\nmodel_cluster_firmyear <- feols(\n investment_lead ~ cash_flows + tobins_q | gvkey + year,\n cluster = c(\"gvkey\", \"year\"),\n data = data_investment\n)\n\n The table below shows the comparison of the different assumptions behind the standard errors. In the first column, we can see highly significant coefficients on both cash flows and Tobin’s q. By clustering the standard errors on the firm level, the \\(t\\)-statistics of both coefficients drop in half, indicating a high correlation of residuals within firms. If we additionally cluster by year, we see a drop, particularly for Tobin’s q, again. Even after relaxing the assumptions behind our standard errors, both coefficients are still comfortably significant as the \\(t\\)-statistics are well above the usual critical values of 1.96 or 2.576 for two-tailed significance tests.\n\netable(\n model_fe_firmyear, model_cluster_firm, model_cluster_firmyear,\n coefstat = \"tstat\", digits = 3, digits.stats = 3\n)\n\n model_fe_firm.. model_cluster.. model_cluster...1\nDependent Var.: investment_lead investment_lead investment_lead\n \ncash_flows 0.017*** (19.6) 0.017*** (11.4) 0.017*** (9.58)\ntobins_q 0.010*** (75.8) 0.010*** (35.6) 0.010*** (15.1)\nFixed-Effects: --------------- --------------- ---------------\ngvkey Yes Yes Yes\nyear Yes Yes Yes\n_______________ _______________ _______________ _______________\nVCOV type IID by: gvkey by: gvkey & year\nObservations 130,559 130,559 130,559\nR2 0.607 0.607 0.607\nWithin R2 0.049 0.049 0.049\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\nInspired by Abadie et al. (2017), we want to close this chapter by highlighting that choosing the right dimensions for clustering is a design problem. Even if the data is informative about whether clustering matters for standard errors, they do not tell you whether you should adjust the standard errors for clustering. 
Clustering at too aggregate levels can hence lead to unnecessarily inflated standard errors.", + "objectID": "r/capital-asset-pricing-model.html#portfolio-return-variance", + "href": "r/capital-asset-pricing-model.html#portfolio-return-variance", + "title": "The Capital Asset Pricing Model", + "section": "Portfolio Return & Variance", + "text": "Portfolio Return & Variance\n\\(\\text{Expected Portfolio Return} = \\omega'\\mu\\)\n\n\\(\\omega\\): vector of asset weights\n\\(\\mu\\): vector of expected return of assets\n\n\\(\\text{Portfolio Variance} = \\omega' \\Sigma \\omega\\)\n\n\\(\\Sigma\\): variance-covariance matrix\n\nIntroducing the risk-free asset: Allocate capital between risk-free asset & risky portfolio\n\\[\\mu_c = c \\omega'\\mu + (1-c)r_f\\]\n\n\\(\\mu_{c}\\): combined portfolio return\n\\(r_f\\): return of risk-free asset (e.g. government bond)\n\\(c\\) fraction of capital in risky portfolio\n\nRisk-free asset has 0 volatility.\n\nPortfolio risk \\(\\sigma_c\\) is measured by volatility of risky asset\n\\(\\sigma_c= c\\sqrt{\\omega' \\Sigma \\omega}\\) \\(\\Rightarrow\\) \\(c = \\frac{\\sigma_c}{\\sqrt{\\omega' \\Sigma \\omega}}\\)\n\nAllows us to derive a Capital Allocation Line (CAL)\n\\[\\mu_c = r_f +\\sigma_c \\frac{\\omega'\\mu-r_f}{\\sqrt{\\omega' \\Sigma \\omega}}\\]\nSlope of CAL is called Sharpe ratio\n\\[\\text{Sharpe ratio} = \\frac{\\omega'\\mu-r_f}{\\sqrt{\\omega' \\Sigma \\omega}}\\]\n\nMeasures excess return per unit of risk\nHigher ratio indicates more attractive risk-adjusted return\n\nCalculate the risk-free rate: 13-week T-bill rate (^IRX) is quoted in annualized percentage yields. Convert annualized to daily rates (252 trading days). Note: this approach has a 99% correlation with Fama-French risk free rate.\n\nrisk_free_daily <- download_data(\n type = \"stock_prices\", symbol = \"^IRX\", \n start_date = \"2019-10-01\", end_date = \"2024-09-30\"\n) |> \n mutate(\n risk_free = (1 + adjusted_close / 100)^(1 / 252) - 1\n ) |> \n select(date, risk_free) |> \n drop_na()\n\nCreate example portfolios\n\nmu <- assets$mu\nsigma <- returns_daily |> \n pivot_wider(names_from = symbol, values_from = ret) |> \n select(-date) |> \n cov()\n\nPortfolio with equal weights\n\nnumber_of_assets <- nrow(assets)\nomega_ew <- rep(1 / number_of_assets, number_of_assets)\n\nsummary_ew <- tibble(\n mu = as.numeric(t(omega_ew) %*% mu),\n sigma = as.numeric(sqrt(t(omega_ew) %*% sigma %*% omega_ew)),\n type = \"Equal-Weighted Portfolio\"\n)\n\nPortfolio with random weights\n\nset.seed(1234)\nomega_random <- runif(number_of_assets, -1, 1)\nomega_random <- omega_random / sum(omega_random)\n\nsummary_random <- tibble(\n mu = as.numeric(t(omega_random) %*% mu),\n sigma = as.numeric(sqrt(t(omega_random) %*% sigma %*% omega_random)),\n type = \"Randomly-Weighted Portfolio\"\n)\n\nRisk-free asset\n\nsummary_risk_free <- tibble(\n mu = mean(risk_free_daily$risk_free),\n sigma = 0,\n type = \"Risk-Free Asset\"\n)\n\nsummaries <- bind_rows(assets, summary_ew, summary_random, summary_risk_free)\n\nPlot CALs. 
First introduce helper function to calculate Sharpe Ratio.\n\ncalculate_sharpe_ratio <- function(mu, sigma, risk_free) {\n as.numeric(mu - risk_free) / sigma \n}\n\nsummaries <- summaries |> \n mutate(\n sharpe_ratio = if_else(\n str_detect(type, \"Portfolio\"), \n calculate_sharpe_ratio(mu, sigma, risk_free = summary_risk_free$mu),\n NA\n ),\n risk_free = summary_risk_free$mu\n )\n\nSee Figure 2\n\nfig_cal <- summaries |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_abline(aes(intercept = risk_free, slope = sharpe_ratio, color = type),\n linetype = \"dashed\", linewidth = 1) +\n geom_point(data = summaries |> filter(is.na(type))) +\n geom_point(data = summaries |> filter(!is.na(type)), shape = 4, size = 4) + \n geom_label_repel(aes(label = type)) + \n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(\n x = \"Volatility\", y = \"Average return\",\n title = \"Average returns and volatilities of Dow index constituents with capital allocation lines\"\n ) +\n theme(legend.position = \"none\")\nfig_cal\n\n\n\n\n\n\n\nFigure 2: Points correspond to individual assets, crosses to portfolios.", "crumbs": [ "R", - "Modeling and Machine Learning", - "Fixed Effects and Clustered Standard Errors" + "Getting Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/fixed-effects-and-clustered-standard-errors.html#exercises", - "href": "r/fixed-effects-and-clustered-standard-errors.html#exercises", - "title": "Fixed Effects and Clustered Standard Errors", - "section": "Exercises", - "text": "Exercises\n\nEstimate the two-way fixed effects model with two-way clustered standard errors using quarterly Compustat data from WRDS. Note that you can access quarterly data via tbl(wrds, I(\"comp.fundq\")).\nFollowing Peters and Taylor (2017), compute Tobin’s q as the market value of outstanding equity mktcap plus the book value of debt (dltt + dlc) minus the current assets atc and everything divided by the book value of property, plant and equipment ppegt. What is the correlation between the measures of Tobin’s q? 
What is the impact on the two-way fixed effects regressions?", + "objectID": "r/capital-asset-pricing-model.html#the-tangency-portfolio", + "href": "r/capital-asset-pricing-model.html#the-tangency-portfolio", + "title": "The Capital Asset Pricing Model", + "section": "The Tangency Portfolio", + "text": "The Tangency Portfolio\nThe portfolio that maximizes Sharpe ratio\n\\[\\max_w \\frac{\\omega' \\mu - r_f}{\\sqrt{\\omega' \\Sigma \\omega}}\\] while staying fully invested\n\\[ \\omega'\\iota = 1\\]\nis called the tangency portfolio\nCalculate the tangency portfolio\nAnalytic solution for tangency portfolio (see here)\n\\[\\omega_{tan}=\\frac{\\Sigma^{-1}(\\mu-r_f)}{\\iota'\\Sigma^{-1}(\\mu-r_f)}\\]\n\nomega_tangency <- solve(sigma) %*% (mu - summary_risk_free$mu)\nomega_tangency <- as.vector(omega_tangency / sum(omega_tangency))\n\nsummary_tangency <- tibble(\n mu = as.numeric(t(omega_tangency) %*% mu),\n sigma = as.numeric(sqrt(t(omega_tangency) %*% sigma %*% omega_tangency)),\n type = \"Tangency Portfolio\",\n sharpe_ratio = calculate_sharpe_ratio(mu, sigma, risk_free = summary_risk_free$mu),\n risk_free = summary_risk_free$mu\n)", "crumbs": [ "R", - "Modeling and Machine Learning", - "Fixed Effects and Clustered Standard Errors" + "Getting Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/accessing-and-managing-financial-data.html", - "href": "r/accessing-and-managing-financial-data.html", - "title": "Accessing and Managing Financial Data", - "section": "", - "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\nIn this chapter, we suggest a way to organize your financial data. Everybody who has experience with data is also familiar with storing data in various formats like CSV, XLS, XLSX, or other delimited value storage. Reading and saving data can become very cumbersome in the case of using different data formats, both across different projects and across different programming languages. Moreover, storing data in delimited files often leads to problems with respect to column type consistency. For instance, date-type columns frequently lead to inconsistencies across different data formats and programming languages.\nThis chapter shows how to import different open source data sets. Specifically, our data comes from the application programming interface (API) of Yahoo Finance, a downloaded standard CSV file, an XLSX file stored in a public Google Drive repository, and other macroeconomic time series that can be scraped directly from a website. We show how to process these raw data, as well as how to take a shortcut using the tidyfinance package, which provides a consistent interface to tidy financial data. We store all the data in a single database, which serves as the only source of data in subsequent chapters. We conclude the chapter by providing some tips on managing databases.\nFirst, we load the global R packages that we use throughout this chapter. Later on, we load more packages in the sections where we need them.\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nMoreover, we initially define the date range for which we fetch and store the financial data, making future data updates tractable. In case you need another time frame, you can adjust the dates below. 
Our data starts with 1960 since most asset pricing studies use data from 1962 on.\nstart_date <- ymd(\"1960-01-01\")\nend_date <- ymd(\"2023-12-31\")", + "objectID": "r/capital-asset-pricing-model.html#the-capital-market-line", + "href": "r/capital-asset-pricing-model.html#the-capital-market-line", + "title": "The Capital Asset Pricing Model", + "section": "The Capital Market Line", + "text": "The Capital Market Line\nCombination of risk-free asset & the tangency portfolio \\(\\omega_{tan}\\)\n\\[\\mu_{c} = r_f +\\sigma_c \\frac{\\omega_{tan}'\\mu-r_f}{\\sqrt{\\omega_{tan}' \\Sigma \\omega_{tan}}}\\]\nis called the Capital Market Line (CML)\n\nCML describes best risk-return trade-off for portfolios that contain risk-free asset & tangency portfolio\nPlot the CML\n\nsummaries <- bind_rows(summaries, summary_tangency)\n\nfig_cml <- summaries |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_abline(aes(intercept = risk_free, slope = sharpe_ratio, color = type),\n linetype = \"dashed\", linewidth = 1) +\n geom_point(data = summaries |> filter(is.na(type))) +\n geom_point(data = summaries |> filter(!is.na(type)), shape = 4, size = 4) + \n ggrepel::geom_label_repel(aes(label = type)) + \n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(x = \"Volatility\", y = \"Average return\",\n title = \"Average returns and volatilities of Dow index constituents with the capital market line\") +\n theme(legend.position = \"none\")\nfig_cml\n\n\n\n\n\n\n\nFigure 3: Points correspond to individual assets, crosses to portfolios.\n\n\n\n\n\nPortfolios vs individual assets. In the CAPM model:\n\nInvestors prefer to hold any portfolio on the CML over individual assets or any other portfolio\nAll rational investors hold the tangency portfolio\nReturn of an individual asset can be compared to efficient tangency weight\nRisk of an asset is proportional to covariance with tangency portfolio weight\n\nExpected excess returns vs tangency weights. 
Expected excess return of asset \\(i\\) is\n\\[\\mu_i - r_f = \\beta_i \\cdot (\\omega_{tan}'\\mu - r_f)\\]\nwhere\n\\[\\beta_i = \\frac{\\text{Cov}(r_i, \\omega_{tan}r)}{\\omega_{tan}' \\Sigma \\omega_{tan}}\\]\nis called the asset beta\nCalculate excess returns\n\ntangency_weights <- tibble(\n symbol = assets$symbol, \n omega_tangency = omega_tangency\n)\n\nreturns_tangency_daily <- returns_daily |> \n left_join(tangency_weights, join_by(symbol)) |> \n group_by(date) |> \n summarize(mkt_ret = weighted.mean(ret, omega_tangency))\n\nreturns_excess_daily <- returns_daily |> \n left_join(returns_tangency_daily, join_by(date)) |> \n left_join(risk_free_daily, join_by(date)) |> \n mutate(ret_excess = ret - risk_free,\n mkt_excess = mkt_ret - risk_free) |> \n select(symbol, date, ret_excess, mkt_excess)\n\nEstimate Asset Betas\n\nestimate_beta <- function(data) {\n fit <- lm(\"ret_excess ~ mkt_excess - 1\", data = data)\n coefficients(fit)\n}\n \nbeta_results <- returns_excess_daily |> \n nest(data = -symbol) |> \n mutate(beta = map_dbl(data, estimate_beta))\n\nPlot asset betas\n\nfig_betas <- beta_results |> \n ggplot(aes(x = beta, y = fct_reorder(symbol, beta))) +\n geom_col() +\n labs(\n x = \"Estimated asset beta\", y = NULL, \n title = \"Estimated asset betas based on the tangency portfolio for Dow index constituents\"\n )\nfig_betas\n\n\n\n\n\n\n\nFigure 4: Weights are based on returns adjusted for dividend payments and stock splits.\n\n\n\n\n\nAsset returns vs systematic risk: the assets all fall onto the 45 degree line, as they should according to CAPM Figure 5\n\nassets <- assets |> \n mutate(mu_excess = mu - summary_risk_free$mu) |> \n left_join(beta_results, join_by(symbol))\n \nfig_betas_returns <- assets |> \n ggplot(aes(x = beta, y = mu_excess)) + \n geom_abline(intercept = 0, \n slope = summary_tangency$mu - summary_risk_free$mu) + \n geom_point() +\n geom_label_repel(data = assets |> filter(symbol %in% c(\"BA\", \"NVDA\")),\n aes(label = symbol)) + \n scale_y_continuous(labels = percent) + \n labs(\n x = \"Estimated asset beta\", y = \"Average return\", \n title = \"Estimated CAPM-betas and average returns for Dow index constituents\"\n )\nfig_betas_returns\n\n\n\n\n\n\n\nFigure 5: Estimates are based on returns adjusted for dividend payments and stock splits and using the tangency portfolio as a measure for the market.", "crumbs": [ "R", - "Financial Data", - "Accessing and Managing Financial Data" + "Getting Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/accessing-and-managing-financial-data.html#fama-french-data", - "href": "r/accessing-and-managing-financial-data.html#fama-french-data", - "title": "Accessing and Managing Financial Data", - "section": "Fama-French Data", - "text": "Fama-French Data\nWe start by downloading some famous Fama-French factors (e.g., Fama and French 1993) and portfolio returns commonly used in empirical asset pricing. Fortunately, there is a neat package by Nelson Areal that allows us to access the data easily: the frenchdata package provides functions to download and read data sets from Prof. Kenneth French finance data library (Areal 2021). \n\nlibrary(frenchdata)\n\nWe can use the download_french_data() function of the package to download monthly Fama-French factors. The set Fama/French 3 Factors contains the return time series of the market mkt_excess, size smb and value hml alongside the risk-free rates rf. 
Note that we have to do some manual work to correctly parse all the columns and scale them appropriately, as the raw Fama-French data comes in a very unpractical data format. For precise descriptions of the variables, we suggest consulting Prof. Kenneth French’s finance data library directly. If you are on the website, check the raw data files to appreciate the time you can save thanks to frenchdata.\n\nfactors_ff3_monthly_raw <- download_french_data(\"Fama/French 3 Factors\")\nfactors_ff3_monthly <- factors_ff3_monthly_raw$subsets$data[[1]] |>\n mutate(\n date = floor_date(ymd(str_c(date, \"01\")), \"month\"),\n across(c(RF, `Mkt-RF`, SMB, HML), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |> \n filter(date >= start_date & date <= end_date)\n\nWe also download the set 5 Factors (2x3), which additionally includes the return time series of the profitability rmw and investment cma factors. We demonstrate how the monthly factors are constructed in the chapter Replicating Fama and French Factors.\n\nfactors_ff5_monthly_raw <- download_french_data(\"Fama/French 5 Factors (2x3)\")\n\nfactors_ff5_monthly <- factors_ff5_monthly_raw$subsets$data[[1]] |>\n mutate(\n date = floor_date(ymd(str_c(date, \"01\")), \"month\"),\n across(c(RF, `Mkt-RF`, SMB, HML, RMW, CMA), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |> \n filter(date >= start_date & date <= end_date)\n\nIt is straightforward to download the corresponding daily Fama-French factors with the same function.\n\nfactors_ff3_daily_raw <- download_french_data(\"Fama/French 3 Factors [Daily]\")\n\nfactors_ff3_daily <- factors_ff3_daily_raw$subsets$data[[1]] |>\n mutate(\n date = ymd(date),\n across(c(RF, `Mkt-RF`, SMB, HML), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |>\n filter(date >= start_date & date <= end_date)\n\nIn a subsequent chapter, we also use the 10 monthly industry portfolios, so let us fetch that data, too.\n\nindustries_ff_monthly_raw <- download_french_data(\"10 Industry Portfolios\")\n\nindustries_ff_monthly <- industries_ff_monthly_raw$subsets$data[[1]] |>\n mutate(date = floor_date(ymd(str_c(date, \"01\")), \"month\")) |>\n mutate(across(where(is.numeric), ~ . / 100)) |>\n select(date, everything()) |>\n filter(date >= start_date & date <= end_date) |> \n rename_with(str_to_lower)\n\nIt is worth taking a look at all available portfolio return time series from Kenneth French’s homepage. You should check out the other sets by calling get_french_data_list().\nTo automatically download and process Fama-French data, you can also use the tidyfinance package with type = \"factors_ff_3_monthly\" or similar, e.g.:\n\ndownload_data(\n type = \"factors_ff_3_monthly\", \n start_date = start_date, \n end_date = end_date\n)\n\nThe tidyfinance package implements the processing steps as above and returns the same cleaned data frame. 
The list of supported Fama-French data types can be called as follows:\n\nlist_supported_types(domain = \"Fama-French\")", + "objectID": "r/capital-asset-pricing-model.html#capm-in-practice", + "href": "r/capital-asset-pricing-model.html#capm-in-practice", + "title": "The Capital Asset Pricing Model", + "section": "CAPM in Practice", + "text": "CAPM in Practice\nHow to estimate betas in practice?\nCalculating the tangency portfolio can be cumbersome\n\nWhat is the correct asset universe?\nHow to estimate \\(\\mu\\) and \\(\\Sigma\\) for many assets?\n\nIn the CAPM: market portfolio = tangency portfolio\n\nSkip calculation of tangency portfolio weights\nUse portfolios weighted by market capitalization\n\nKey assumptions behind CAPM:\n\nEquilibrium model in a single-period economy\nNo transaction costs or taxes\nRisk-free borrowing and lending are available to all investors\nInvestors share homogeneous expectations\nInvestors maximize returns for limited level of risk\n\nCAPM is a foundation for other models because of its simplicity\nThe Security Market Line (SML). Expected return of asset \\(i\\) is\n\\[\\mu_i = r_f + \\beta_i \\cdot (\\mu_m - r_f)\\]\nwhere\n\\[\\beta_i = \\frac{\\sigma_{im}}{\\sigma_m^2}\\]\n\n\\(\\mu_m\\): expected market returns\n\\(\\sigma_{im}\\): covariance of asset \\(i\\) with market\n\\(\\sigma_m\\): market volatility\n\nEvaluate asset performance with the SML: Alpha is difference between actual excess return & expected return\n\\[\\mu_i - r_f = \\alpha_i + \\beta_i \\cdot (\\mu_m - r_f)\\]\nAlpha is performance adjusted for market risk\n\nPositive alpha: outperformance relative to market\nNegative alpha: underperformance relative to market\n\nEstimate Asset Alphas & Beta. Regression model:\n\\[r_{i,t} - r_{f,t} = \\hat{\\alpha}_i + \\hat{\\beta}_i \\cdot (r_{m,t} - r_{f,t} ) + \\hat{\\varepsilon}_{i,t} \\]\n\n\\(r_{i,t}\\): actual returns of asset \\(i\\) on day \\(t\\)\n\\(r_{m,t}\\): actual market returns on day \\(t\\)\n\nDownload excess market returns\n\nfactors <- download_data(\n type = \"factors_ff_5_2x3_daily\", \n start_date = \"2019-10-01\", end_date = \"2024-09-30\"\n) |> \n select(date, mkt_excess, risk_free)\n\nEstimate alphas & betas\n\nreturns_excess_daily <- returns_daily |> \n left_join(factors, join_by(date)) |> \n mutate(ret_excess = ret - risk_free) |> \n select(symbol, date, ret_excess, mkt_excess)\n\nestimate_capm <- function(data) {\n fit <- lm(\"ret_excess ~ mkt_excess\", data = data)\n tibble(\n coefficient = c(\"alpha\", \"beta\"),\n estimate = coefficients(fit),\n t_statistic = summary(fit)$coefficients[, \"t value\"]\n )\n}\n \ncapm_results <- returns_excess_daily |> \n nest(data = -symbol) |> \n mutate(capm = map(data, estimate_capm)) |> \n unnest(capm) |> \n select(symbol, coefficient, estimate, t_statistic)\n\nPlot asset alphas\n\nfig_alpha <- capm_results |> \n filter(coefficient == \"alpha\") |> \n mutate(is_significant = abs(t_statistic) >= 1.96) |> \n ggplot(aes(x = estimate, y = fct_reorder(symbol, estimate), \n fill = is_significant)) +\n geom_col() +\n scale_x_continuous(labels = percent) + \n labs(\n x = \"Estimated asset alphas\", y = NULL, fill = \"Significant at 95%?\",\n title = \"Estimated CAPM alphas for Dow index constituents\"\n )\nfig_alpha\n\n\n\n\n\n\n\nFigure 6: Estimates are based on returns adjusted for dividend payments and stock splits and using the Fama-French market excess returns as a measure for the market.", "crumbs": [ "R", - "Financial Data", - "Accessing and Managing Financial Data" + "Getting 
Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/accessing-and-managing-financial-data.html#q-factors", - "href": "r/accessing-and-managing-financial-data.html#q-factors", - "title": "Accessing and Managing Financial Data", - "section": "q-Factors", - "text": "q-Factors\nIn recent years, the academic discourse experienced the rise of alternative factor models, e.g., in the form of the Hou, Xue, and Zhang (2014) q-factor model. We refer to the extended background information provided by the original authors for further information. The q factors can be downloaded directly from the authors’ homepage from within read_csv().\nWe also need to adjust this data. First, we discard information we will not use in the remainder of the book. Then, we rename the columns with the “R_”-prescript using regular expressions and write all column names in lowercase. You should always try sticking to a consistent style for naming objects, which we try to illustrate here - the emphasis is on try. You can check out style guides available online, e.g., Hadley Wickham’s tidyverse style guide.\n\nfactors_q_monthly_link <-\n \"https://global-q.org/uploads/1/2/2/6/122679606/q5_factors_monthly_2023.csv\"\n\nfactors_q_monthly <- read_csv(factors_q_monthly_link) |>\n mutate(date = ymd(str_c(year, month, \"01\", sep = \"-\"))) |>\n rename_with(~str_remove(., \"R_\")) |>\n rename_with(str_to_lower) |>\n mutate(across(-date, ~. / 100)) |>\n select(date, risk_free = f, mkt_excess = mkt, everything()) |>\n filter(date >= start_date & date <= end_date)\n\nAgain, you can use the tidyfinance package for a shortcut:\n\ndownload_data(\n type = \"factors_q5_monthly\", \n start_date = start_date, \n end_date = end_date\n)", + "objectID": "r/capital-asset-pricing-model.html#shortcomings-extensions", + "href": "r/capital-asset-pricing-model.html#shortcomings-extensions", + "title": "The Capital Asset Pricing Model", + "section": "Shortcomings & Extensions", + "text": "Shortcomings & Extensions\nPopular shortcomings of CAPM\n\nImpossible to create universal measure for market\n\nMarket definition might depend on context (e.g. 
S&P 500, DAX, TOPIX)\n\nBeta might not be stable over time\n\nCompany operations, leverage or competitive environment might change beta\n\nSystematic risk might not be the only factor\n\nPoor empirical performance in explaining small-cap or high-growth returns\n\nMany more: behavioral biases, heterogeneous preferences, liquidity, etc.\n\nAlternatives & extensions.\nFama-French 3-Factor model extends CAPM\n\nOutperformance of small vs big companies (see tidy-finance.org)\nOutperformance of high vs low value companies (see tidy-finance.org)\n\nFama-French 5-Factor model extends 3-factor model (see tidy-finance.org)\n\nOutperformance of companies with robust vs weak operating profitability\nOutperformance of companies with conservative vs aggressive investment\n\nMany more: consumption CAPM, conditional CAPM, Carhart Four-Factor Model, Q-Factor Model & investment CAPM", "crumbs": [ "R", - "Financial Data", - "Accessing and Managing Financial Data" + "Getting Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/accessing-and-managing-financial-data.html#macroeconomic-predictors", - "href": "r/accessing-and-managing-financial-data.html#macroeconomic-predictors", - "title": "Accessing and Managing Financial Data", - "section": "Macroeconomic Predictors", - "text": "Macroeconomic Predictors\nOur next data source is a set of macroeconomic variables often used as predictors for the equity premium. Welch and Goyal (2008) comprehensively reexamine the performance of variables suggested by the academic literature to be good predictors of the equity premium. The authors host the data updated to 2022 on Amit Goyal’s website. The data is an XLSX-file stored on a public Google drive location and we directly export a CSV file.\n\nsheet_id <- \"1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG\"\nsheet_name <- \"Monthly\"\nmacro_predictors_url <- paste0(\n \"https://docs.google.com/spreadsheets/d/\", sheet_id,\n \"/gviz/tq?tqx=out:csv&sheet=\", sheet_name\n)\nmacro_predictors_raw <- read_csv(macro_predictors_url)\n\nNext, we transform the columns into the variables that we later use:\n\nThe dividend price ratio (dp), the difference between the log of dividends and the log of prices, where dividends are 12-month moving sums of dividends paid on the S&P 500 index, and prices are monthly averages of daily closing prices (Campbell and Shiller 1988; Campbell and Yogo 2006).\nDividend yield (dy), the difference between the log of dividends and the log of lagged prices (Ball 1978).\nEarnings price ratio (ep), the difference between the log of earnings and the log of prices, where earnings are 12-month moving sums of earnings on the S&P 500 index (Campbell and Shiller 1988).\nDividend payout ratio (de), the difference between the log of dividends and the log of earnings (Lamont 1998).\nStock variance (svar), the sum of squared daily returns on the S&P 500 index (Guo 2006).\nBook-to-market ratio (bm), the ratio of book value to market value for the Dow Jones Industrial Average (Kothari and Shanken 1997).\nNet equity expansion (ntis), the ratio of 12-month moving sums of net issues by NYSE listed stocks divided by the total end-of-year market capitalization of NYSE stocks (Campbell, Hilscher, and Szilagyi 2008).\nTreasury bills (tbl), the 3-Month Treasury Bill: Secondary Market Rate from the economic research database at the Federal Reserve Bank at St.
Louis (Campbell 1987).\nLong-term yield (lty), the long-term government bond yield from Ibbotson’s Stocks, Bonds, Bills, and Inflation Yearbook (Welch and Goyal 2008).\nLong-term rate of returns (ltr), the long-term government bond returns from Ibbotson’s Stocks, Bonds, Bills, and Inflation Yearbook (Welch and Goyal 2008).\nTerm spread (tms), the difference between the long-term yield on government bonds and the Treasury bill (Campbell 1987).\nDefault yield spread (dfy), the difference between BAA and AAA-rated corporate bond yields (Fama and French 1989).\nInflation (infl), the Consumer Price Index (All Urban Consumers) from the Bureau of Labor Statistics (Campbell and Vuolteenaho 2004).\n\nFor variable definitions and the required data transformations, you can consult the material on Amit Goyal’s website.\n\nmacro_predictors <- macro_predictors_raw |>\n mutate(date = ym(yyyymm)) |>\n mutate(across(where(is.character), as.numeric)) |>\n mutate(\n IndexDiv = Index + D12,\n logret = log(IndexDiv) - log(lag(IndexDiv)),\n Rfree = log(Rfree + 1),\n rp_div = lead(logret - Rfree, 1), # Future excess market return\n dp = log(D12) - log(Index), # Dividend Price ratio\n dy = log(D12) - log(lag(Index)), # Dividend yield\n ep = log(E12) - log(Index), # Earnings price ratio\n de = log(D12) - log(E12), # Dividend payout ratio\n tms = lty - tbl, # Term spread\n dfy = BAA - AAA # Default yield spread\n ) |>\n select(\n date, rp_div, dp, dy, ep, de, svar,\n bm = `b/m`, ntis, tbl, lty, ltr,\n tms, dfy, infl\n ) |>\n filter(date >= start_date & date <= end_date) |>\n drop_na()\n\nTo get the equivalent data through tidyfinance, you can call:\n\ndownload_data(\n type = \"macro_predictors_monthly\",\n start_date = start_date,\n end_date = end_date\n)", + "objectID": "r/capital-asset-pricing-model.html#key-takeways", + "href": "r/capital-asset-pricing-model.html#key-takeways", + "title": "The Capital Asset Pricing Model", + "section": "Key takeaways", + "text": "Key takeaways\n\nCAPM is an equilibrium model in a frictionless economy\nInvestors hold mix of market portfolio & risk-free asset\nExpected return of a stock is a linear function of its beta\nBeta is the sensitivity of a stock to market movements\nBeta estimation via linear regression using historical data", "crumbs": [ "R", - "Financial Data", - "Accessing and Managing Financial Data" + "Getting Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/accessing-and-managing-financial-data.html#other-macroeconomic-data", - "href": "r/accessing-and-managing-financial-data.html#other-macroeconomic-data", - "title": "Accessing and Managing Financial Data", - "section": "Other Macroeconomic Data", - "text": "Other Macroeconomic Data\nThe Federal Reserve bank of St. Louis provides the Federal Reserve Economic Data (FRED), an extensive database for macroeconomic data. In total, there are 817,000 US and international time series from 108 different sources. The data can be downloaded directly from FRED by constructing the appropriate URL.
For instance, let us consider the consumer price index (CPI) data that can be found under the CPIAUCNS:\n\nseries <- \"CPIAUCNS\"\ncpi_url <- paste0(\"https://fred.stlouisfed.org/series/\", series, \"/downloaddata/\", series, \".csv\")\n\nWe can then use the httr2 (Wickham 2024) package to request the CSV, extract the data from the response body, and convert the columns to a tidy format:\n\nlibrary(httr2)\n\ncpi_daily <- request(cpi_url) |>\n req_perform() |> \n resp_body_string() |> \n read_csv() |> \n mutate(\n date = as.Date(DATE),\n value = as.numeric(VALUE),\n series = series,\n .keep = \"none\"\n )\n\nWe convert the daily CPI data to monthly because we use the latter in later chapters.\n\ncpi_monthly <- cpi_daily |>\n mutate(\n date = floor_date(date, \"month\"),\n cpi = value / value[date == max(date)],\n .keep = \"none\"\n )\n\nThe tidyfinance package can, of course, also fetch the same daily data and many more data series:\n\ndownload_data(\n type = \"fred\",\n series = \"CPIAUCNS\",\n start_date = start_date,\n end_date = end_date\n)\n\n# A tibble: 768 × 3\n date value series \n <date> <dbl> <chr> \n1 1960-01-01 29.3 CPIAUCNS\n2 1960-02-01 29.4 CPIAUCNS\n3 1960-03-01 29.4 CPIAUCNS\n4 1960-04-01 29.5 CPIAUCNS\n5 1960-05-01 29.5 CPIAUCNS\n# ℹ 763 more rows\n\n\nTo download other time series, we just have to look it up on the FRED website and extract the corresponding key from the address. For instance, the producer price index for gold ores can be found under the PCU2122212122210 key. If your desired time series is not supported through tidyfinance, we recommend working with the fredr package (Boysel and Vaughan 2021). Note that you need to get an API key to use its functionality. We refer to the package documentation for details.", + "objectID": "r/capital-asset-pricing-model.html#exercises", + "href": "r/capital-asset-pricing-model.html#exercises", + "title": "The Capital Asset Pricing Model", + "section": "Exercises", + "text": "Exercises\n\n…", "crumbs": [ "R", - "Financial Data", - "Accessing and Managing Financial Data" + "Getting Started", + "The Capital Asset Pricing Model" ] }, { - "objectID": "r/accessing-and-managing-financial-data.html#setting-up-a-database", - "href": "r/accessing-and-managing-financial-data.html#setting-up-a-database", - "title": "Accessing and Managing Financial Data", - "section": "Setting Up a Database", - "text": "Setting Up a Database\nNow that we have downloaded some (freely available) data from the web into the memory of our R session let us set up a database to store that information for future use. We will use the data stored in this database throughout the following chapters, but you could alternatively implement a different strategy and replace the respective code.\nThere are many ways to set up and organize a database, depending on the use case. For our purpose, the most efficient way is to use an SQLite database, which is the C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. Note that SQL (Structured Query Language) is a standard language for accessing and manipulating databases and heavily inspired the dplyr functions. We refer to this tutorial for more information on SQL.\nThere are two packages that make working with SQLite in R very simple: RSQLite (Müller et al. 2022) embeds the SQLite database engine in R, and dbplyr (Wickham, Girlich, and Ruiz 2022) is the database back-end for dplyr. 
These packages allow to set up a database to remotely store tables and use these remote database tables as if they are in-memory data frames by automatically converting dplyr into SQL. Check out the RSQLite and dbplyr vignettes for more information.\n\nlibrary(RSQLite)\nlibrary(dbplyr)\n\nAn SQLite database is easily created - the code below is really all there is. You do not need any external software. Note that we use the extended_types = TRUE option to enable date types when storing and fetching data. Otherwise, date columns are stored and retrieved as integers. We will use the file tidy_finance_r.sqlite, located in the data subfolder, to retrieve data for all subsequent chapters. The initial part of the code ensures that the directory is created if it does not already exist.\n\nif (!dir.exists(\"data\")) {\n dir.create(\"data\")\n}\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\nNext, we create a remote table with the monthly Fama-French factor data. We do so with the function dbWriteTable(), which copies the data to our SQLite-database.\n\ndbWriteTable(\n tidy_finance,\n \"factors_ff3_monthly\",\n value = factors_ff3_monthly,\n overwrite = TRUE\n)\n\nWe can use the remote table as an in-memory data frame by building a connection via tbl().\n\nfactors_ff3_monthly_db <- tbl(tidy_finance, \"factors_ff3_monthly\")\n\nAll dplyr calls are evaluated lazily, i.e., the data is not in our R session’s memory, and the database does most of the work. You can see that by noticing that the output below does not show the number of rows. In fact, the following code chunk only fetches the top 10 rows from the database for printing.\n\nfactors_ff3_monthly_db |>\n select(date, rf)\n\n# Source: SQL [?? x 2]\n# Database: sqlite 3.41.2 [data/tidy_finance_r.sqlite]\n date rf\n <date> <dbl>\n1 1960-01-01 0.0033\n2 1960-02-01 0.0029\n3 1960-03-01 0.0035\n4 1960-04-01 0.0019\n5 1960-05-01 0.0027\n# ℹ more rows\n\n\nIf we want to have the whole table in memory, we need to collect() it. You will see that we regularly load the data into the memory in the next chapters.\n\nfactors_ff3_monthly_db |>\n select(date, rf) |>\n collect()\n\n# A tibble: 768 × 2\n date rf\n <date> <dbl>\n1 1960-01-01 0.0033\n2 1960-02-01 0.0029\n3 1960-03-01 0.0035\n4 1960-04-01 0.0019\n5 1960-05-01 0.0027\n# ℹ 763 more rows\n\n\nThe last couple of code chunks is really all there is to organizing a simple database! You can also share the SQLite database across devices and programming languages.\nBefore we move on to the next data source, let us also store the other five tables in our new SQLite database.\n\ndbWriteTable(\n tidy_finance,\n \"factors_ff5_monthly\",\n value = factors_ff5_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"factors_ff3_daily\",\n value = factors_ff3_daily,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"industries_ff_monthly\",\n value = industries_ff_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"factors_q_monthly\",\n value = factors_q_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"macro_predictors\",\n value = macro_predictors,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"cpi_monthly\",\n value = cpi_monthly,\n overwrite = TRUE\n)\n\nFrom now on, all you need to do to access data that is stored in the database is to follow three steps: (i) Establish the connection to the SQLite database, (ii) call the table you want to extract, and (iii) collect the data. 
For your convenience, the following steps show all you need in a compact fashion.\n\nlibrary(tidyverse)\nlibrary(RSQLite)\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\nfactors_q_monthly <- tbl(tidy_finance, \"factors_q_monthly\")\nfactors_q_monthly <- factors_q_monthly |> collect()", + "objectID": "r/modern-portfolio-theory.html", + "href": "r/modern-portfolio-theory.html", + "title": "Modern Portfolio Theory", + "section": "", + "text": "In the previous chapter, we showed how to download stock market data and analyze them with graphs and summary statistics. Now, we move to a typical question in Finance: how should wealth be allocated across assets with varying returns, risks, and correlations to optimize a portfolio’s performance? Modern Portfolio Theory (MPT), introduced by (Markowitz 1952), revolutionized the way we think about such investments by formalizing the trade-off between risk and return. Markowitz’s framework laid the foundation for much of modern finance, earning him the Sveriges Riksbank Prize in Economic Sciences in 1990.\nMarkowitz demonstrates that portfolio risk depends not only on individual asset volatilities but also on the correlations between asset returns. This insight highlights the power of diversification: combining assets with low or negative correlations reduces overall portfolio risk. This principle is often illustrated with the analogy of a fruit basket: If all you have are apples & they spoil, you lose everything. With a variety of fruits, some fruits may spoil, but others will stay fresh.\nAt the heart of MPT is mean-variance analysis, which evaluates portfolios based on two dimensions: expected return and risk. By balancing these two factors, investors can construct portfolios that either maximize return for a given level of risk or minimize risk for a desired level of return.\nWe use the following packages throughout this chapter:\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\nlibrary(ggrepel)", "crumbs": [ "R", - "Financial Data", - "Accessing and Managing Financial Data" + "Getting Started", + "Modern Portfolio Theory" ] }, { - "objectID": "r/accessing-and-managing-financial-data.html#managing-sqlite-databases", - "href": "r/accessing-and-managing-financial-data.html#managing-sqlite-databases", - "title": "Accessing and Managing Financial Data", - "section": "Managing SQLite Databases", - "text": "Managing SQLite Databases\nFinally, at the end of our data chapter, we revisit the SQLite database itself. When you drop database objects such as tables or delete data from tables, the database file size remains unchanged because SQLite just marks the deleted objects as free and reserves their space for future uses. As a result, the database file always grows in size.\nTo optimize the database file, you can run the VACUUM command in the database, which rebuilds the database and frees up unused space. You can execute the command in the database using the dbSendQuery() function.\n\nres <- dbSendQuery(tidy_finance, \"VACUUM\")\nres\n\n<SQLiteResult>\n SQL VACUUM\n ROWS Fetched: 0 [complete]\n Changed: 0\n\n\nThe VACUUM command actually performs a couple of additional cleaning steps, which you can read about in this tutorial. \nWe store the result of the above query in res because the database keeps the result set open. 
To close open results and avoid warnings going forward, we can use dbClearResult().\n\ndbClearResult(res)\n\nApart from cleaning up, you might be interested in listing all the tables that are currently in your database. You can do this via the dbListTables() function.\n\ndbListTables(tidy_finance)\n\n [1] \"beta\" \"compustat\" \n [3] \"cpi_monthly\" \"crsp_daily\" \n [5] \"crsp_monthly\" \"factors_ff3_daily\" \n [7] \"factors_ff3_monthly\" \"factors_ff5_monthly\" \n [9] \"factors_q_monthly\" \"fisd\" \n[11] \"industries_ff_monthly\" \"macro_predictors\" \n[13] \"trace_enhanced\" \n\n\nThis function comes in handy if you are unsure about the correct naming of the tables in your database.", + "objectID": "r/modern-portfolio-theory.html#estimate-expected-returns", + "href": "r/modern-portfolio-theory.html#estimate-expected-returns", + "title": "Modern Portfolio Theory", + "section": "Estimate Expected Returns", + "text": "Estimate Expected Returns\nExpected returns, denoted as \\(\\mu_i\\), represent the anticipated profit from holding an asset \\(i\\). They are typically estimated using historical data by computing the average of past returns:\n\\[\\hat{\\mu}_i = \\frac{1}{T} \\sum_{t=1}^{T} r_{it},\\]\nwhere \\(r_{it}\\) is the return of asset \\(i\\) in period \\(t\\), and \\(T\\) is the total number of periods. While past performance does not guarantee future results, the typical assumption is that it is at least indicative of future performance.\nLeveraging the approach of Working with Stock Returns, we download the constituents of the Dow Jones Industrial Average as an example portfolio, as well as their daily adjusted close prices:\n\nsymbols <- download_data(\n type = \"constituents\",\n index = \"Dow Jones Industrial Average\"\n)\n\nprices_daily <- download_data(\n type = \"stock_prices\", symbol = symbols$symbol,\n start_date = \"2019-08-01\", end_date = \"2024-07-31\"\n) |> \n select(symbol, date, price = adjusted_close)\n\nprices_daily\n\nThen, we proceed to calculate daily returns for each asset.\n\nreturns_daily <- prices_daily |>\n group_by(symbol) |> \n mutate(ret = price / lag(price) - 1) |>\n ungroup() |> \n select(symbol, date, ret) |> \n drop_na(ret) |> \n arrange(symbol, date)\n\nreturns_daily \n\n# A tibble: 37,680 × 3\n symbol date ret\n <chr> <date> <dbl>\n1 AAPL 2019-08-02 -0.0212\n2 AAPL 2019-08-05 -0.0523\n3 AAPL 2019-08-06 0.0189\n4 AAPL 2019-08-07 0.0104\n5 AAPL 2019-08-08 0.0221\n# ℹ 37,675 more rows\n\n\nWe can use the tidy return data to quickly calculate the estimated expected return of each asset in the Dow Jones Industrial Average.\n\nassets <- returns_daily |> \n group_by(symbol) |> \n summarize(mu = mean(ret))\n\nFigure Figure 1 shows the corresponding average daily returns of the constituents of our example portfolio.\n\nfig_mu <- assets |> \n ggplot(aes(x = mu, y = fct_reorder(symbol, mu), \n fill = mu > 0)) +\n geom_col() +\n scale_x_continuous(labels = percent) + \n labs(x = NULL, y = NULL, fill = NULL,\n title = \"Average daily returns of Dow index constituents\") +\n theme(legend.position = \"none\")\nfig_mu\n\n\n\n\n\n\n\nFigure 1: Average daily returns based on prices adjusted for dividend payments and stock splits.", "crumbs": [ "R", - "Financial Data", - "Accessing and Managing Financial Data" + "Getting Started", + "Modern Portfolio Theory" ] }, { - "objectID": "r/accessing-and-managing-financial-data.html#exercises", - "href": "r/accessing-and-managing-financial-data.html#exercises", - "title": "Accessing and Managing Financial Data", -
"section": "Exercises", - "text": "Exercises\n\nDownload the monthly Fama-French factors manually from Ken French’s data library and read them in via read_csv(). Validate that you get the same data as via the frenchdata package.\nDownload the daily Fama-French 5 factors using the frenchdata package. Use get_french_data_list() to find the corresponding table name. After the successful download and conversion to the column format that we used above, compare the rf, mkt_excess, smb, and hml columns of factors_ff3_daily to factors_ff5_daily. Discuss any differences you might find.", + "objectID": "r/modern-portfolio-theory.html#estimate-the-variance-covariance-matrix", + "href": "r/modern-portfolio-theory.html#estimate-the-variance-covariance-matrix", + "title": "Modern Portfolio Theory", + "section": "Estimate the Variance-Covariance Matrix", + "text": "Estimate the Variance-Covariance Matrix\nIndividual asset risk in MPT is typically quantified using variance (\\(\\sigma^2\\)) or volatilities (\\(\\sigma\\)). The latter can be estimated as:\n\\[\\hat{\\sigma}_i = \\sqrt{\\frac{1}{T-1} \\sum_{t=1}^{T} (r_{it} - \\hat{\\mu}_i)^2}\\]\nNext, we transform the returns from a tidy tibble into a \\((T \\times N)\\) matrix with one column for each of the \\(N\\) symbols and one row for each of the \\(T\\) trading days to compute the sample average return vector \\[\\hat\\mu = \\frac{1}{T}\\sum\\limits_{t=1}^T r_t\\] where \\(r_t\\) is the \\(N\\) vector of returns on date \\(t\\) and the sample covariance matrix \\[\\hat\\Sigma = \\frac{1}{T-1}\\sum\\limits_{t=1}^T (r_t - \\hat\\mu)(r_t - \\hat\\mu)'.\\] We achieve this by using pivot_wider() with the new column names from the column symbol and setting the values to ret. We compute the vector of sample average returns and the sample variance-covariance matrix, which we consider as proxies for the parameters of the distribution of future stock returns. Thus, for simplicity, we refer to \\(\\Sigma\\) and \\(\\mu\\) instead of explicitly highlighting that the sample moments are estimates. 
In later chapters, we discuss the issues that arise once we take estimation uncertainty into account.\n\nvolatilities <- returns_daily |> \n group_by(symbol) |> \n summarize(sigma = sd(ret))\n\nassets <- assets |> \n left_join(volatilities, join_by(symbol))\n\nFigure Figure 2 shows the corresponding individual stock volatilities.\n\nfig_sigma <- assets |> \n ggplot(aes(x = sigma, y = fct_reorder(symbol, sigma))) +\n geom_col() +\n scale_x_continuous(labels = percent) + \n labs(x = NULL, y = NULL,\n title = \"Daily volatilities of Dow index constituents\")\nfig_sigma\n\n\n\n\n\n\n\nFigure 2: Daily volatilities based on prices adjusted for dividend payments and stock splits.\n\n\n\n\n\nCovariance measures interaction between assets\n\\[\\hat{\\sigma}_{ij} = \\frac{1}{T-1} \\sum_{t=1}^{T} (R_{it} - \\hat{\\mu}_i)(R_{jt} - \\hat{\\mu}_j)\\]\nInterpretation:\n\nPositive: assets move in the same direction, potentially increasing portfolio risk\nNegative: assets move in opposite directions, which can reduce risk through diversification\n\nEstimating the variance-covariance matrix\n\nreturns_wide <- returns_daily |> \n pivot_wider(names_from = symbol, values_from = ret) \n\nvcov <- returns_wide |> \n select(-date) |> \n cov()\n\nFigure Figure 3 provides an illustration of the variance-covariance matrix.\n\nfig_vcov <- vcov |> \n as_tibble(rownames = \"symbol_a\") |> \n pivot_longer(-symbol_a, names_to = \"symbol_b\") |> \n ggplot(aes(x = symbol_a, y = fct_rev(symbol_b), fill = value)) +\n geom_tile() +\n labs(\n x = NULL, y = NULL, fill = \"(Co-)Variance\",\n title = \"Variance-covariance matrix of Dow index constituents\"\n ) + \n theme(axis.text.x = element_text(angle = 45, hjust = 1)) +\n guides(fill = guide_colorbar(barwidth = 15, barheight = 0.5))\nfig_vcov\n\n\n\n\n\n\n\nFigure 3: Variances and covariances based on prices adjusted for dividend payments and stock splits.", "crumbs": [ "R", - "Financial Data", - "Accessing and Managing Financial Data" + "Getting Started", + "Modern Portfolio Theory" ] }, { - "objectID": "r/clean-enhanced-trace-with-r.html", - "href": "r/clean-enhanced-trace-with-r.html", - "title": "Clean Enhanced TRACE with R", - "section": "", - "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\n\n\nThis appendix contains code to clean enhanced TRACE with R. It is also available via the following GitHub gist. Hence, you could also source the function with devtools::source_gist(\"3a05b3ab281563b2e94858451c2eb3a4\"). We need this function in Chapter TRACE and FISD to download and clean enhanced TRACE trade messages following Dick-Nielsen (2009) and Dick-Nielsen (2014) for enhanced TRACE specifically. Relatedly, WRDS provides SAS code and there is Python code available by the project Open Source Bond Asset Pricing.\nThe function takes a vector of CUSIPs (in cusips), a connection to WRDS (connection) explained in Chapter 3, and a start and end date (start_date and end_date, respectively). Specifying too many CUSIPs will result in very slow downloads and a potential failure due to the size of the request to WRDS. The dates should be within the coverage of TRACE itself, i.e., starting after 2002, and the dates should be supplied using the class date.
The output of the function contains all valid trade messages for the selected CUSIPs over the specified period.\n\nclean_enhanced_trace <- function(cusips,\n connection,\n start_date = as.Date(\"2002-01-01\"),\n end_date = today()) {\n\n # Packages (required)\n library(dplyr)\n library(lubridate)\n library(dbplyr)\n library(RPostgres)\n\n # Function checks ---------------------------------------------------------\n # Input parameters\n ## Cusips\n if (length(cusips) == 0 | any(is.na(cusips))) stop(\"Check cusips.\")\n\n ## Dates\n if (!is.Date(start_date) | !is.Date(end_date)) stop(\"Dates needed\")\n if (start_date < as.Date(\"2002-01-01\")) stop(\"TRACE starts later.\")\n if (end_date > today()) stop(\"TRACE does not predict the future.\")\n if (start_date >= end_date) stop(\"Date conflict.\")\n\n ## Connection\n if (!dbIsValid(connection)) stop(\"Connection issue.\")\n\n # Enhanced Trace ----------------------------------------------------------\n trace_enhanced_db <- tbl(connection, I(\"trace.trace_enhanced\"))\n \n # Main file\n trace_all <- trace_enhanced_db |>\n filter(\n cusip_id %in% cusips,\n between(trd_exctn_dt, start_date, end_date)\n ) |>\n select(cusip_id, msg_seq_nb, orig_msg_seq_nb,\n entrd_vol_qt, rptd_pr, yld_pt, rpt_side_cd, cntra_mp_id,\n trd_exctn_dt, trd_exctn_tm, trd_rpt_dt, trd_rpt_tm,\n pr_trd_dt, trc_st, asof_cd, wis_fl,\n days_to_sttl_ct, stlmnt_dt, spcl_trd_fl) |>\n collect()\n\n # Enhanced Trace: Post 06-02-2012 -----------------------------------------\n # Trades (trc_st = T) and correction (trc_st = R)\n trace_post_TR <- trace_all |>\n filter((trc_st == \"T\" | trc_st == \"R\"),\n trd_rpt_dt >= as.Date(\"2012-02-06\"))\n\n # Cancellations (trc_st = X) and correction cancellations (trc_st = C)\n trace_post_XC <- trace_all |>\n filter((trc_st == \"X\" | trc_st == \"C\"),\n trd_rpt_dt >= as.Date(\"2012-02-06\"))\n\n # Cleaning corrected and cancelled trades\n trace_post_TR <- trace_post_TR |>\n anti_join(trace_post_XC,\n by = join_by(cusip_id, msg_seq_nb, entrd_vol_qt,\n rptd_pr, rpt_side_cd, cntra_mp_id,\n trd_exctn_dt, trd_exctn_tm))\n\n # Reversals (trc_st = Y)\n trace_post_Y <- trace_all |>\n filter(trc_st == \"Y\",\n trd_rpt_dt >= as.Date(\"2012-02-06\"))\n\n # Clean reversals\n ## match the orig_msg_seq_nb of the Y-message to\n ## the msg_seq_nb of the main message\n trace_post <- trace_post_TR |>\n anti_join(trace_post_Y,\n by = join_by(cusip_id, msg_seq_nb == orig_msg_seq_nb,\n entrd_vol_qt, rptd_pr, rpt_side_cd,\n cntra_mp_id, trd_exctn_dt, trd_exctn_tm))\n\n\n # Enhanced TRACE: Pre 06-02-2012 ------------------------------------------\n # Cancellations (trc_st = C)\n trace_pre_C <- trace_all |>\n filter(trc_st == \"C\",\n trd_rpt_dt < as.Date(\"2012-02-06\"))\n\n # Trades w/o cancellations\n ## match the orig_msg_seq_nb of the C-message\n ## to the msg_seq_nb of the main message\n trace_pre_T <- trace_all |>\n filter(trc_st == \"T\",\n trd_rpt_dt < as.Date(\"2012-02-06\")) |>\n anti_join(trace_pre_C,\n by = join_by(cusip_id, msg_seq_nb == orig_msg_seq_nb,\n entrd_vol_qt, rptd_pr, rpt_side_cd,\n cntra_mp_id, trd_exctn_dt, trd_exctn_tm))\n\n # Corrections (trc_st = W) - W can also correct a previous W\n trace_pre_W <- trace_all |>\n filter(trc_st == \"W\",\n trd_rpt_dt < as.Date(\"2012-02-06\"))\n\n # Implement corrections in a loop\n ## Correction control\n correction_control <- nrow(trace_pre_W)\n correction_control_last <- nrow(trace_pre_W)\n\n ## Correction loop\n while (correction_control > 0) {\n # Corrections that correct some msg\n 
trace_pre_W_correcting <- trace_pre_W |>\n semi_join(trace_pre_T,\n by = join_by(cusip_id, trd_exctn_dt,\n orig_msg_seq_nb == msg_seq_nb))\n\n # Corrections that do not correct some msg\n trace_pre_W <- trace_pre_W |>\n anti_join(trace_pre_T,\n by = join_by(cusip_id, trd_exctn_dt,\n orig_msg_seq_nb == msg_seq_nb))\n\n # Delete msgs that are corrected and add correction msgs\n trace_pre_T <- trace_pre_T |>\n anti_join(trace_pre_W_correcting,\n by = join_by(cusip_id, trd_exctn_dt,\n msg_seq_nb == orig_msg_seq_nb)) |>\n union_all(trace_pre_W_correcting)\n\n # Escape if no corrections remain or they cannot be matched\n correction_control <- nrow(trace_pre_W)\n\n if (correction_control == correction_control_last) {\n\n correction_control <- 0\n\n }\n\n correction_control_last <- nrow(trace_pre_W)\n\n }\n\n\n # Clean reversals\n ## Record reversals\n trace_pre_R <- trace_pre_T |>\n filter(asof_cd == 'R') |>\n group_by(cusip_id, trd_exctn_dt, entrd_vol_qt,\n rptd_pr, rpt_side_cd, cntra_mp_id) |>\n arrange(trd_exctn_tm, trd_rpt_dt, trd_rpt_tm) |>\n mutate(seq = row_number()) |>\n ungroup()\n\n ## Remove reversals and the reversed trade\n trace_pre <- trace_pre_T |>\n filter(is.na(asof_cd) | !(asof_cd %in% c('R', 'X', 'D'))) |>\n group_by(cusip_id, trd_exctn_dt, entrd_vol_qt,\n rptd_pr, rpt_side_cd, cntra_mp_id) |>\n arrange(trd_exctn_tm, trd_rpt_dt, trd_rpt_tm) |>\n mutate(seq = row_number()) |>\n ungroup() |>\n anti_join(trace_pre_R,\n by = join_by(cusip_id, trd_exctn_dt, entrd_vol_qt,\n rptd_pr, rpt_side_cd, cntra_mp_id, seq)) |>\n select(-seq)\n\n\n # Agency trades -----------------------------------------------------------\n # Combine pre and post trades\n trace_clean <- trace_post |>\n union_all(trace_pre)\n\n # Keep angency sells and unmatched agency buys\n ## Agency sells\n trace_agency_sells <- trace_clean |>\n filter(cntra_mp_id == \"D\",\n rpt_side_cd == \"S\")\n\n # Agency buys that are unmatched\n trace_agency_buys_filtered <- trace_clean |>\n filter(cntra_mp_id == \"D\",\n rpt_side_cd == \"B\") |>\n anti_join(trace_agency_sells,\n by = join_by(cusip_id, trd_exctn_dt,\n entrd_vol_qt, rptd_pr))\n\n # Agency clean\n trace_clean <- trace_clean |>\n filter(cntra_mp_id == \"C\") |>\n union_all(trace_agency_sells) |>\n union_all(trace_agency_buys_filtered)\n\n\n # Additional Filters ------------------------------------------------------\n trace_add_filters <- trace_clean |>\n mutate(days_to_sttl_ct2 = stlmnt_dt - trd_exctn_dt) |>\n filter(is.na(days_to_sttl_ct) | as.numeric(days_to_sttl_ct) <= 7,\n is.na(days_to_sttl_ct2) | as.numeric(days_to_sttl_ct2) <= 7,\n wis_fl == \"N\",\n is.na(spcl_trd_fl) | spcl_trd_fl == \"\",\n is.na(asof_cd) | asof_cd == \"\")\n\n\n # Output ------------------------------------------------------------------\n # Only keep necessary columns\n trace_final <- trace_add_filters |>\n arrange(cusip_id, trd_exctn_dt, trd_exctn_tm) |>\n select(cusip_id, trd_exctn_dt, trd_exctn_tm,\n rptd_pr, entrd_vol_qt, yld_pt, rpt_side_cd, cntra_mp_id) |>\n mutate(trd_exctn_tm = format(as_datetime(trd_exctn_tm, tz = \"America/New_York\"), \"%H:%M:%S\"))\n\n trace_final\n}\n\n\n\n\n\nReferences\n\nDick-Nielsen, Jens. 2009. “Liquidity biases in TRACE.” The Journal of Fixed Income 19 (2): 43–55. https://doi.org/10.3905/jfi.2009.19.2.043.\n\n\n———. 2014. “How to clean enhanced TRACE data.” Working Paper. 
https://ssrn.com/abstract=2337908.", + "objectID": "r/modern-portfolio-theory.html#the-minimum-variance-framework", + "href": "r/modern-portfolio-theory.html#the-minimum-variance-framework", + "title": "Modern Portfolio Theory", + "section": "The Minimum-Variance Framework", + "text": "The Minimum-Variance Framework\n\\(\\text{Expected Portfolio Return} = \\sum_{i=1}^n \\omega_i \\hat{\\mu}_i\\)\n\n\\(\\omega_i\\): weight of asset \\(i\\) in the portfolio\n\\(\\hat{\\mu}_i\\): estimated expected return of asset \\(i\\)\n\nExample:\n\nAsset A: 60% weight, expected return 8%\nAsset B: 40% weight, expected return 12%\n\\((0.6 \\times 8\\%) + (0.4 \\times 12\\%) = 9.6\\%\\)\n\nAssumption: portfolio weights are constant over time\nPortfolio variance is calculated as\n\\[\\sum_{i=1}^{n} \\sum_{j=1}^{n} \\omega_i \\omega_j \\hat{\\sigma}_{ij}\\]\n\n\\(\\omega_i\\), \\(\\omega_j\\): the weights of assets \\(i\\), \\(j\\) in the portfolio\n\\(\\hat{\\sigma}_{ij}\\): covariance between returns of assets \\(i\\) and \\(j\\)\n\\(n\\): number of assets in portfolio\n\nMinimize portfolio variance\n\\[\\min_{\\omega_1, ... \\omega_n} \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\omega_i \\omega_j \\hat{\\sigma}_{ij}\\]\nwhile staying fully invested\n\\[\\sum_{i=1}^{n} \\omega_i = 1\\]\nMinimum variance in matrix notation\nMinimize portfolio variance\n\\[\\min_{\\omega} \\omega' \\hat{\\Sigma} \\omega\\]\nwhile staying fully invested\n\\[ \\omega'\\iota = 1\\]\nSolution for minimum-variance portfolio\n\\[\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}\\]\n\n\\(\\iota\\): vector of 1’s\n\\(\\Sigma^{-1}\\): inverse of variance-covariance matrix \\(\\Sigma\\)\n\nThen, we compute the minimum variance portfolio weights \\(\\omega_\\text{mvp}\\) as well as the expected portfolio return \\(\\omega_\\text{mvp}'\\mu\\) and volatility \\(\\sqrt{\\omega_\\text{mvp}'\\Sigma\\omega_\\text{mvp}}\\) of this portfolio. Recall that the minimum variance portfolio is the vector of portfolio weights that are the solution to \\[\\omega_\\text{mvp} = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\sum\\limits_{i=1}^N\\omega_i = 1.\\] The constraint that weights sum up to one simply implies that all funds are distributed across the available asset universe, i.e., there is no possibility to retain cash. 
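For completeness, a brief sketch of the Lagrangian argument behind the closed-form solution stated in the next sentence (this derivation is an addition and only restates a textbook result):

\\[\\mathcal{L}(\\omega, \\lambda) = \\omega'\\Sigma\\omega + \\lambda(1 - \\iota'\\omega), \\qquad \\frac{\\partial \\mathcal{L}}{\\partial \\omega} = 2\\Sigma\\omega - \\lambda\\iota = 0 \\quad\\Rightarrow\\quad \\omega = \\frac{\\lambda}{2}\\Sigma^{-1}\\iota.\\]

Imposing the full-investment constraint \\(\\iota'\\omega = 1\\) pins down \\(\\frac{\\lambda}{2} = \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\), which yields the expression for \\(\\omega_\\text{mvp}\\) quoted just below.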
It is easy to show analytically that \\(\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}\\), where \\(\\iota\\) is a vector of ones and \\(\\Sigma^{-1}\\) is the inverse of \\(\\Sigma\\).\n\niota <- rep(1, dim(vcov)[1])\nvcov_inv <- solve(vcov)\nomega_mvp <- as.vector(vcov_inv %*% iota) / \n as.numeric(t(iota) %*% vcov_inv %*% iota)\n\nFigure Figure 4 shows the resulting portfolio weights.\n\nassets <- bind_cols(assets, omega_mvp = omega_mvp)\n\nfig_omega_mvp <- assets |>\n ggplot(aes(x = omega_mvp, y = fct_reorder(symbol, omega_mvp), \n fill = omega_mvp > 0)) +\n geom_col() +\n scale_x_continuous(labels = percent) + \n labs(x = NULL, y = NULL, \n title = \"Minimum-variance portfolio weights\") +\n theme(legend.position = \"none\")\nfig_omega_mvp\n\n\n\n\n\n\n\nFigure 4: Weights are based on returns adjusted for dividend payments and stock splits.\n\n\n\n\n\nMinimum-variance portfolio return\n\nmu <- assets$mu\n\nsummary_mvp <- tibble(\n mu = sum(omega_mvp * mu),\n sigma = as.numeric(sqrt(t(omega_mvp) %*% vcov %*% omega_mvp)),\n type = \"Minimum-Variance Portfolio\"\n)\n\nsummary_mvp\n\n# A tibble: 1 × 3\n mu sigma type \n <dbl> <dbl> <chr> \n1 0.000307 0.00937 Minimum-Variance Portfolio", "crumbs": [ "R", - "Appendix", - "Clean Enhanced TRACE with R" + "Getting Started", + "Modern Portfolio Theory" ] }, { - "objectID": "r/introduction-to-tidy-finance.html", - "href": "r/introduction-to-tidy-finance.html", - "title": "Introduction to Tidy Finance", - "section": "", - "text": "Note\n\n\n\nYou are reading Tidy Finance with R. You can find the equivalent chapter for the sibling Tidy Finance with Python here.\nThe main aim of this chapter is to familiarize yourself with the tidyverse. We start by downloading and visualizing stock data from Yahoo Finance. Then we move to a simple portfolio choice problem and construct the efficient frontier. 
These examples introduce you to our approach of Tidy Finance.", + "objectID": "r/modern-portfolio-theory.html#efficient-portfolios", + "href": "r/modern-portfolio-theory.html#efficient-portfolios", + "title": "Modern Portfolio Theory", + "section": "Efficient Portfolios", + "text": "Efficient Portfolios\nMinimize portfolio variance\n\\[\\min_{\\omega} \\omega' \\hat{\\Sigma} \\omega\\]\nWhile earning minimum expected return \\(\\bar{\\mu}\\)\n\n\\[ \\omega'\\iota = 1\\]\n\\(\\omega'\\hat{\\mu} = \\bar{\\mu}\\)\n\nDow Jones vs Nasdaq 100\n\ndownload_data(\n type = \"stock_prices\", \n symbol = c(\"^DJI\", \"^NDX\"), \n start_date = \"2019-08-01\", end_date = \"2024-07-31\"\n) |> \n group_by(symbol) |> \n arrange(date) |> \n mutate(adjusted_close = adjusted_close / first(adjusted_close)) |> \n ggplot(aes(x = date, y = adjusted_close, color = symbol)) +\n geom_line() +\n scale_y_continuous(labels = percent) + \n labs(x = NULL, y = NULL, color = NULL,\n title = \"Performance of Dow (^DJI) vs Nasdaq 100 (^NDX)\",\n subtitle = \"Both indexes start at 100%\") \n\n\n\n\n\n\n\n\nChoose a minimum expected return: Achieve at least average Nasdaq 100 return:\n\nmu_bar <- download_data(\n \"stock_prices\", symbol = \"^NDX\", \n start_date = \"2019-08-01\", end_date = \"2024-07-31\"\n) |> \n mutate(\n ret = adjusted_close / lag(adjusted_close) - 1\n ) |> \n summarize(mean(ret, na.rm = TRUE)) |> \n pull() \n\nNote: \\(\\bar\\mu\\) needs to be higher than \\(\\hat\\mu_{mvp}\\)\nSolution for efficient portfolio:\n\\[\\omega_{efp} = \\frac{\\lambda^*}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right)\\]\nwhere \\(\\lambda^* = 2\\frac{\\bar\\mu - D/C}{E-D^2/C}\\), \\(C = \\iota'\\Sigma^{-1}\\iota\\), \\(D=\\iota'\\Sigma^{-1}\\mu\\) & \\(E=\\mu'\\Sigma^{-1}\\mu\\)\nSee details on tidy-finance.org\nCalculate efficient portfolio\nThe command solve(A, b) returns the solution of a system of equations \\(Ax = b\\). If b is not provided, as in the example above, it defaults to the identity matrix such that solve(sigma) delivers \\(\\Sigma^{-1}\\) (if a unique solution exists).\nNote that the monthly volatility of the minimum variance portfolio is of the same order of magnitude as the daily standard deviation of the individual components. Thus, the diversification benefits in terms of risk reduction are tremendous!\nNext, we set out to find the weights for a portfolio that achieves, as an example, three times the expected return of the minimum variance portfolio. However, mean-variance investors are not interested in any portfolio that achieves the required return but rather in the efficient portfolio, i.e., the portfolio with the lowest standard deviation. If you wonder where the solution \\(\\omega_\\text{eff}\\) comes from: The efficient portfolio is chosen by an investor who aims to achieve minimum variance given a minimum acceptable expected return \\(\\bar{\\mu}\\). Hence, their objective function is to choose \\(\\omega_\\text{eff}\\) as the solution to \\[\\omega_\\text{eff}(\\bar{\\mu}) = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\omega'\\iota = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}.\\]\nThe code below implements the analytic solution to this optimization problem for a benchmark return \\(\\bar\\mu\\), which we set to 3 times the expected return of the minimum variance portfolio. 
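To see where \\(\\lambda^*\\) comes from, here is a compact sketch (an addition for completeness, using the same \\(C\\), \\(D\\), and \\(E\\) as above and assuming the return constraint binds):

\\[\\mathcal{L}(\\omega, \\lambda, \\tilde\\lambda) = \\omega'\\Sigma\\omega + \\lambda(1 - \\iota'\\omega) + \\tilde\\lambda(\\bar\\mu - \\mu'\\omega) \\quad\\Rightarrow\\quad 2\\Sigma\\omega = \\lambda\\iota + \\tilde\\lambda\\mu \\quad\\Rightarrow\\quad \\omega = \\tfrac{1}{2}\\Sigma^{-1}(\\lambda\\iota + \\tilde\\lambda\\mu).\\]

Plugging this into the two constraints \\(\\iota'\\omega = 1\\) and \\(\\mu'\\omega = \\bar\\mu\\) gives the linear system \\(\\tfrac{1}{2}(\\lambda C + \\tilde\\lambda D) = 1\\) and \\(\\tfrac{1}{2}(\\lambda D + \\tilde\\lambda E) = \\bar\\mu\\), whose solution is \\(\\tilde\\lambda = \\lambda^* = 2\\frac{\\bar\\mu - D/C}{E - D^2/C}\\). Substituting back gives \\(\\omega_\\text{efp} = \\omega_\\text{mvp} + \\frac{\\lambda^*}{2}\\left(\\Sigma^{-1}\\mu - \\frac{D}{C}\\Sigma^{-1}\\iota\\right)\\), which is what the code below implements (note the \\(\\omega_\\text{mvp}\\) term, consistent with omega_efp being built from omega_mvp in the code).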
We encourage you to verify that it is correct.\n\nC <- as.numeric(t(iota) %*% vcov_inv %*% iota)\nD <- as.numeric(t(iota) %*% vcov_inv %*% mu)\nE <- as.numeric(t(mu) %*% vcov_inv %*% mu)\nlambda_tilde <- as.numeric(2 * (mu_bar - D / C) / (E - D^2 / C))\nomega_efp <- as.vector(omega_mvp + lambda_tilde / 2 * (vcov_inv %*% mu - D * omega_mvp))\n\nsummary_efp <- tibble(\n mu = sum(omega_efp * mu),\n sigma = as.numeric(sqrt(t(omega_efp) %*% vcov %*% omega_efp)),\n type = \"Efficient Portfolio\"\n)\n\nFigure Figure 5 shows the average return and volatility of the minimum-variance and efficient portfolio relative to the index constituents.\n\nsummaries <- bind_rows(\n assets, summary_mvp, summary_efp\n) \n\nfig_summaries <- summaries |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_point(\n data = summaries |> filter(is.na(type))\n ) +\n geom_point(\n data = summaries |> filter(!is.na(type)), color = \"#F21A00\", size = 3\n ) +\n geom_label_repel(aes(label = type)) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(\n x = \"Volatility\", y = \"Average return\",\n title = \"Efficient & minimum-variance portfolios for Dow index constituents\"\n ) \nfig_summaries\n\n\n\n\n\n\n\nFigure 5: The big dots indicate the location of the minimum variance and the efficient portfolio that delivers the expected return of the Nasdaq 100, respectively. The small dots indicate the location of the individual constituents.", "crumbs": [ "R", "Getting Started", - "Introduction to Tidy Finance" + "Modern Portfolio Theory" ] }, { - "objectID": "r/introduction-to-tidy-finance.html#working-with-stock-market-data", - "href": "r/introduction-to-tidy-finance.html#working-with-stock-market-data", - "title": "Introduction to Tidy Finance", - "section": "Working with Stock Market Data", - "text": "Working with Stock Market Data\nAt the start of each session, we load the required R packages. Throughout the entire book, we always use the tidyverse (Wickham et al. 2019). In this chapter, we also load the tidyfinance package to download stock price data. This package provides a convenient wrapper for various quantitative functions compatible with the tidyverse and our book. Finally, the package scales (Wickham and Seidel 2022) provides useful scale functions for visualizations.\nYou typically have to install a package once before you can load it. In case you have not done this yet, call install.packages(\"tidyfinance\"). \n\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\n\nWe first download daily prices for one stock symbol, e.g., the Apple stock, AAPL, directly from the data provider Yahoo Finance. To download the data, you can use the function download_data. If you do not know how to use it, make sure you read the help file by calling ?download_data. We especially recommend taking a look at the examples section of the documentation.
We request daily data for a period of more than 20 years.\n\nprices <- download_data(\n type = \"stock_prices\",\n symbols = \"AAPL\",\n start_date = \"2000-01-01\",\n end_date = \"2023-12-31\"\n)\nprices\n\n# A tibble: 6,037 × 8\n symbol date volume open low high close adjusted_close\n <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n1 AAPL 2000-01-03 535796800 0.936 0.908 1.00 0.999 0.844\n2 AAPL 2000-01-04 512377600 0.967 0.903 0.988 0.915 0.773\n3 AAPL 2000-01-05 778321600 0.926 0.920 0.987 0.929 0.784\n4 AAPL 2000-01-06 767972800 0.948 0.848 0.955 0.848 0.716\n5 AAPL 2000-01-07 460734400 0.862 0.853 0.902 0.888 0.750\n# ℹ 6,032 more rows\n\n\n download_data(type = \"stock_prices\") downloads stock market data from Yahoo Finance. The function returns a tibble with eight quite self-explanatory columns: symbol, date, the daily volume (in the number of traded shares), the market prices at the open, high, low, close, and the adjusted price in USD. The adjusted prices are corrected for anything that might affect the stock price after the market closes, e.g., stock splits and dividends. These actions affect the quoted prices, but they have no direct impact on the investors who hold the stock. Therefore, we often rely on adjusted prices when it comes to analyzing the returns an investor would have earned by holding the stock continuously.\nNext, we use the ggplot2 package (Wickham 2016) to visualize the time series of adjusted prices in Figure 1 . This package takes care of visualization tasks based on the principles of the grammar of graphics (Wilkinson 2012).\n\nprices |>\n ggplot(aes(x = date, y = adjusted_close)) +\n geom_line() +\n labs(\n x = NULL,\n y = NULL,\n title = \"Apple stock prices between beginning of 2000 and end of 2023\"\n )\n\n\n\n\n\n\n\nFigure 1: Prices are in USD, adjusted for dividend payments and stock splits.\n\n\n\n\n\n Instead of analyzing prices, we compute daily net returns defined as \\(r_t = p_t / p_{t-1} - 1\\), where \\(p_t\\) is the adjusted day \\(t\\) price. In that context, the function lag() is helpful, which returns the previous value in a vector.\n\nreturns <- prices |>\n arrange(date) |>\n mutate(ret = adjusted_close / lag(adjusted_close) - 1) |>\n select(symbol, date, ret)\nreturns\n\n# A tibble: 6,037 × 3\n symbol date ret\n <chr> <date> <dbl>\n1 AAPL 2000-01-03 NA \n2 AAPL 2000-01-04 -0.0843\n3 AAPL 2000-01-05 0.0146\n4 AAPL 2000-01-06 -0.0865\n5 AAPL 2000-01-07 0.0474\n# ℹ 6,032 more rows\n\n\nThe resulting tibble contains three columns, where the last contains the daily returns (ret). Note that the first entry naturally contains a missing value (NA) because there is no previous price. Obviously, the use of lag() would be meaningless if the time series is not ordered by ascending dates. The command arrange() provides a convenient way to order observations in the correct way for our application. In case you want to order observations by descending dates, you can use arrange(desc(date)).\nFor the upcoming examples, we remove missing values as these would require separate treatment when computing, e.g., sample averages. In general, however, make sure you understand why NA values occur and carefully examine if you can simply get rid of these observations.\n\nreturns <- returns |>\n drop_na(ret)\n\nNext, we visualize the distribution of daily returns in a histogram in Figure 2. 
Additionally, we add a dashed line that indicates the 5 percent quantile of the daily returns to the histogram, which is a (crude) proxy for the worst return of the stock with a probability of at most 5 percent. The 5 percent quantile is closely connected to the (historical) value-at-risk, a risk measure commonly monitored by regulators. We refer to Tsay (2010) for a more thorough introduction to stylized facts of returns.\n\nquantile_05 <- quantile(returns |> pull(ret), probs = 0.05)\nreturns |>\n ggplot(aes(x = ret)) +\n geom_histogram(bins = 100) +\n geom_vline(aes(xintercept = quantile_05),\n linetype = \"dashed\"\n ) +\n labs(\n x = NULL,\n y = NULL,\n title = \"Distribution of daily Apple stock returns\"\n ) +\n scale_x_continuous(labels = percent)\n\n\n\n\n\n\n\nFigure 2: The dotted vertical line indicates the historical 5 percent quantile.\n\n\n\n\n\nHere, bins = 100 determines the number of bins used in the illustration and hence implicitly the width of the bins. Before proceeding, make sure you understand how to use the geom geom_vline() to add a dashed line that indicates the 5 percent quantile of the daily returns. A typical task before proceeding with any data is to compute summary statistics for the main variables of interest.\n\nreturns |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n )\n ))\n\n# A tibble: 1 × 4\n ret_daily_mean ret_daily_sd ret_daily_min ret_daily_max\n <dbl> <dbl> <dbl> <dbl>\n1 0.00122 0.0247 -0.519 0.139\n\n\nWe see that the maximum daily return was 13.905 percent. Perhaps not surprisingly, the average daily return is close to but slightly above 0. In line with the illustration above, the large losses on the day with the minimum returns indicate a strong asymmetry in the distribution of returns.\nYou can also compute these summary statistics for each year individually by imposing group_by(year = year(date)), where the call year(date) returns the year. More specifically, the few lines of code below compute the summary statistics from above for individual groups of data defined by year. 
The summary statistics, therefore, allow an eyeball analysis of the time-series dynamics of the return distribution.\n\nreturns |>\n group_by(year = year(date)) |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n ),\n .names = \"{.fn}\"\n )) |>\n print(n = Inf)\n\n# A tibble: 24 × 5\n year daily_mean daily_sd daily_min daily_max\n <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 2000 -0.00346 0.0549 -0.519 0.137 \n 2 2001 0.00233 0.0393 -0.172 0.129 \n 3 2002 -0.00121 0.0305 -0.150 0.0846\n 4 2003 0.00186 0.0234 -0.0814 0.113 \n 5 2004 0.00470 0.0255 -0.0558 0.132 \n 6 2005 0.00349 0.0245 -0.0921 0.0912\n 7 2006 0.000949 0.0243 -0.0633 0.118 \n 8 2007 0.00366 0.0238 -0.0702 0.105 \n 9 2008 -0.00265 0.0367 -0.179 0.139 \n10 2009 0.00382 0.0214 -0.0502 0.0676\n11 2010 0.00183 0.0169 -0.0496 0.0769\n12 2011 0.00104 0.0165 -0.0559 0.0589\n13 2012 0.00130 0.0186 -0.0644 0.0887\n14 2013 0.000472 0.0180 -0.124 0.0514\n15 2014 0.00145 0.0136 -0.0799 0.0820\n16 2015 0.0000199 0.0168 -0.0612 0.0574\n17 2016 0.000575 0.0147 -0.0657 0.0650\n18 2017 0.00164 0.0111 -0.0388 0.0610\n19 2018 -0.0000573 0.0181 -0.0663 0.0704\n20 2019 0.00266 0.0165 -0.0996 0.0683\n21 2020 0.00281 0.0294 -0.129 0.120 \n22 2021 0.00131 0.0158 -0.0417 0.0539\n23 2022 -0.000970 0.0225 -0.0587 0.0890\n24 2023 0.00168 0.0128 -0.0480 0.0469\n\n\n\nIn case you wonder: the additional argument .names = \"{.fn}\" in across() determines how to name the output columns. The specification is rather flexible and allows almost arbitrary column names, which can be useful for reporting. The print() function simply controls the output options for the R console.", + "objectID": "r/modern-portfolio-theory.html#the-efficient-frontier", + "href": "r/modern-portfolio-theory.html#the-efficient-frontier", + "title": "Modern Portfolio Theory", + "section": "The Efficient Frontier", + "text": "The Efficient Frontier\n An essential tool to evaluate portfolios in the mean-variance context is the efficient frontier, the set of portfolios which satisfies the condition that no other portfolio exists with a higher expected return but with the same volatility (the square root of the variance, i.e., the risk), see, e.g., Merton (1972). We compute and visualize the efficient frontier for several stocks. First, we extract each asset’s monthly returns. In order to keep things simple, we work with a balanced panel and exclude Dow constituents for which we do not observe a price on every single trading day since the year 2000.\n The mutual fund separation theorem states that as soon as we have two efficient portfolios (such as the minimum variance portfolio \\(\\omega_\\text{mvp}\\) and the efficient portfolio for a higher required level of expected returns \\(\\omega_\\text{eff}(\\bar{\\mu})\\), we can characterize the entire efficient frontier by combining these two portfolios. That is, any linear combination of the two portfolio weights will again represent an efficient portfolio. 
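As a small numerical illustration of this statement (an addition; it assumes the omega_mvp and omega_efp vectors as well as mu and vcov computed earlier in this chapter, while a, omega_mix, mu_mix, and sigma_mix are purely illustrative names), any mix of the two portfolios remains fully invested, and its moments follow directly from the combined weights:

a <- 0.5 # any scalar mix; values outside [0, 1] are allowed as well
omega_mix <- a * omega_efp + (1 - a) * omega_mvp

sum(omega_mix) # still fully invested: the weights sum to one
mu_mix <- sum(omega_mix * mu) # expected return of the combination
sigma_mix <- as.numeric(sqrt(t(omega_mix) %*% vcov %*% omega_mix)) # volatility of the combination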
The code below implements the construction of the efficient frontier, which characterizes the highest expected return achievable at each level of risk.\n\\[\\omega_{eff} = a \\cdot \\omega_{efp} + (1-a) \\cdot\\omega_{mvp}\\]\n\nefficient_frontier <- tibble(\n a = seq(from = -1, to = 4, by = 0.01),\n) |> \n mutate(\n omega = map(a, ~ .x * omega_efp + (1 - .x) * omega_mvp),\n mu = map_dbl(omega, ~ t(.x) %*% mu),\n sigma = map_dbl(omega, ~ sqrt(t(.x) %*% vcov %*% .x)),\n ) \n\nThe code above proceeds in two steps: First, we compute a vector of combination weights \\(a\\) and then we evaluate the resulting linear combination with \\(a\\in\\mathbb{R}\\):\n\\[\\omega^* = a\\omega_\\text{eff}(\\bar\\mu) + (1-a)\\omega_\\text{mvp} = \\omega_\\text{mvp} + \\frac{\\lambda^*}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right)\\] with \\(\\lambda^* = 2\\frac{a\\bar\\mu + (1-a)\\tilde\\mu - D/C}{E-D^2/C}\\) where \\(C = \\iota'\\Sigma^{-1}\\iota\\), \\(D=\\iota'\\Sigma^{-1}\\mu\\), and \\(E=\\mu'\\Sigma^{-1}\\mu\\). Finally, it is simple to visualize the efficient frontier alongside the two efficient portfolios within one powerful figure using ggplot (see Figure 6). We also add the individual stocks in the same call. We compute annualized returns based on the simple assumption that monthly returns are independent and identically distributed. Thus, the average annualized return is just 12 times the expected monthly return.\n\nsummaries <- bind_rows(\n summaries, efficient_frontier\n )\n\nfig_efficient_frontier <- summaries |> \n ggplot(aes(x = sigma, y = mu)) +\n geom_point(\n data = summaries |> filter(is.na(type))\n ) +\n geom_point(\n data = summaries |> filter(!is.na(type)), color = \"#F21A00\", size = 3\n ) +\n geom_label_repel(aes(label = type)) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent) + \n labs(x = \"Volatility\", y = \"Average return\",\n title = \"Efficient frontier for Dow index constituents\") + \n theme(legend.position = \"none\")\nfig_efficient_frontier\n\n\n\n\n\n\n\nFigure 6: The big dots indicate the location of the minimum variance and the efficient portfolio that delivers 3 times the expected return of the minimum variance portfolio, respectively. The small dots indicate the location of the individual constituents.", "crumbs": [ "R", "Getting Started", - "Introduction to Tidy Finance" + "Modern Portfolio Theory" ] }, { - "objectID": "r/introduction-to-tidy-finance.html#scaling-up-the-analysis", - "href": "r/introduction-to-tidy-finance.html#scaling-up-the-analysis", - "title": "Introduction to Tidy Finance", - "section": "Scaling Up the Analysis", - "text": "Scaling Up the Analysis\nAs a next step, we generalize the code from before such that all the computations can handle an arbitrary vector of symbols (e.g., all constituents of an index). Following tidy principles, it is quite easy to download the data, plot the price time series, and tabulate the summary statistics for an arbitrary number of assets.\nThis is where the tidyverse magic starts: tidy data makes it extremely easy to generalize the computations from before to as many assets as you like. The following code takes any vector of symbols, e.g., symbol <- c(\"AAPL\", \"MMM\", \"BA\"), and automates the download as well as the plot of the price time series. In the end, we create the table of summary statistics for an arbitrary number of assets. We perform the analysis with data from all current constituents of the Dow Jones Industrial Average index. 
\n\nsymbols <- download_data(type = \"constituents\", index = \"Dow Jones Industrial Average\") \nsymbols\n\n# A tibble: 30 × 5\n symbol name location exchange currency\n <chr> <chr> <chr> <chr> <chr> \n1 UNH UNITEDHEALTH GROUP INC Vereinigte Staaten New Yor… USD \n2 GS GOLDMAN SACHS GROUP INC Vereinigte Staaten New Yor… USD \n3 MSFT MICROSOFT CORP Vereinigte Staaten NASDAQ USD \n4 HD HOME DEPOT INC Vereinigte Staaten New Yor… USD \n5 CAT CATERPILLAR INC Vereinigte Staaten New Yor… USD \n# ℹ 25 more rows\n\n\nConveniently, tidyfinance provides the functionality to get all stock prices from an index with a single call. \n\nprices_daily <- download_data(\n type = \"stock_prices\",\n symbols = symbols$symbol,\n start_date = \"2000-01-01\",\n end_date = \"2023-12-31\"\n)\n\nThe resulting tibble contains 173093 daily observations for UNH, GS, MSFT, HD, CAT, AMGN, MCD, V, AXP, CRM, TRV, AAPL, IBM, JPM, HON, AMZN, PG, JNJ, BA, CVX, MMM, MRK, DIS, WMT, NKE, KO, DOW, CSCO, VZ, INTC different stocks. Figure 3 illustrates the time series of downloaded adjusted prices for each of the constituents of the Dow Jones index. Make sure you understand every single line of code! What are the arguments of aes()? Which alternative geoms could you use to visualize the time series? Hint: if you do not know the answers try to change the code to see what difference your intervention causes.\n\nprices_daily |>\n ggplot(aes(\n x = date,\n y = adjusted_close,\n color = symbol\n )) +\n geom_line() +\n labs(\n x = NULL,\n y = NULL,\n color = NULL,\n title = \"Stock prices of DOW index constituents\"\n ) +\n theme(legend.position = \"none\")\n\n\n\n\n\n\n\nFigure 3: Prices in USD, adjusted for dividend payments and stock splits.\n\n\n\n\n\nDo you notice the small differences relative to the code we used before? All we need to do to illustrate all stock symbols simultaneously is to include color = symbol in the ggplot aesthetics. In this way, we generate a separate line for each symbol. Of course, there are simply too many lines on this graph to identify the individual stocks properly, but it illustrates the point well.\nThe same holds for stock returns. Before computing the returns, we use group_by(symbol) such that the mutate() command is performed for each symbol individually. 
The same logic also applies to the computation of summary statistics: group_by(symbol) is the key to aggregating the time series into symbol-specific variables of interest.\n\nreturns_daily <- prices_daily |>\n group_by(symbol) |>\n mutate(ret = adjusted_close / lag(adjusted_close) - 1) |>\n select(symbol, date, ret) |>\n drop_na(ret)\n\nreturns_daily |>\n group_by(symbol) |>\n summarize(across(\n ret,\n list(\n daily_mean = mean,\n daily_sd = sd,\n daily_min = min,\n daily_max = max\n ),\n .names = \"{.fn}\"\n )) |>\n print(n = Inf)\n\n# A tibble: 30 × 5\n symbol daily_mean daily_sd daily_min daily_max\n <chr> <dbl> <dbl> <dbl> <dbl>\n 1 AAPL 0.00122 0.0247 -0.519 0.139\n 2 AMGN 0.000493 0.0194 -0.134 0.151\n 3 AMZN 0.00107 0.0315 -0.248 0.345\n 4 AXP 0.000544 0.0227 -0.176 0.219\n 5 BA 0.000628 0.0222 -0.238 0.243\n 6 CAT 0.000724 0.0203 -0.145 0.147\n 7 CRM 0.00119 0.0266 -0.271 0.260\n 8 CSCO 0.000322 0.0234 -0.162 0.244\n 9 CVX 0.000511 0.0175 -0.221 0.227\n10 DIS 0.000414 0.0194 -0.184 0.160\n11 DOW 0.000580 0.0240 -0.217 0.209\n12 GS 0.000557 0.0229 -0.190 0.265\n13 HD 0.000544 0.0192 -0.287 0.141\n14 HON 0.000497 0.0191 -0.174 0.282\n15 IBM 0.000297 0.0163 -0.155 0.120\n16 INTC 0.000396 0.0236 -0.220 0.201\n17 JNJ 0.000379 0.0121 -0.158 0.122\n18 JPM 0.000606 0.0238 -0.207 0.251\n19 KO 0.000318 0.0131 -0.101 0.139\n20 MCD 0.000536 0.0145 -0.159 0.181\n21 MMM 0.000363 0.0151 -0.129 0.126\n22 MRK 0.000371 0.0166 -0.268 0.130\n23 MSFT 0.000573 0.0193 -0.156 0.196\n24 NKE 0.000708 0.0193 -0.198 0.155\n25 PG 0.000362 0.0133 -0.302 0.120\n26 TRV 0.000555 0.0181 -0.208 0.256\n27 UNH 0.000948 0.0196 -0.186 0.348\n28 V 0.000933 0.0185 -0.136 0.150\n29 VZ 0.000238 0.0151 -0.118 0.146\n30 WMT 0.000323 0.0148 -0.114 0.117\n\n\n\nNote that you are now also equipped with all tools to download price data for each symbol listed in the S&P 500 index with the same number of lines of code. Just use symbol <- download_data(type = \"constituents\", index = \"S&P 500\"), which provides you with a tibble that contains each symbol that is (currently) part of the S&P 500. However, don’t try this if you are not prepared to wait for a couple of minutes because this is quite some data to download!", + "objectID": "r/modern-portfolio-theory.html#extending-the-markowitz-model", + "href": "r/modern-portfolio-theory.html#extending-the-markowitz-model", + "title": "Modern Portfolio Theory", + "section": "Extending the Markowitz Model", + "text": "Extending the Markowitz Model\nReplicate minimum-variance via PortfolioAnalytics package.\n\nlibrary(PortfolioAnalytics)\nlibrary(CVXR)\n\n\nreturns_matrix <- column_to_rownames(\n returns_wide, var = \"date\"\n)\n\nproblem_mvp <- portfolio.spec(colnames(returns_matrix)) |>\n add.objective(type = \"risk\", name = \"var\") |> \n add.constraint(\"full_investment\")\n\nsolution_mvp <- optimize.portfolio(\n returns_matrix, problem_mvp, optimize_method = \"CVXR\"\n)\n\nall.equal(omega_mvp, as.vector(solution_mvp$weights))\n\n[1] TRUE\n\n\nReplicate efficient portfolio via PortfolioAnalytics\n\nproblem_efp <- problem_mvp |> \n add.constraint(\"return\", return_target = mu_bar)\n\nsolution_efp <- optimize.portfolio(\n returns_matrix, problem_efp, optimize_method = \"CVXR\"\n)\n\nall.equal(omega_efp, as.vector(solution_efp$weights)) \n\n[1] TRUE\n\n\nEasy to extend Markowitz model\n\nShort sale constraints: add.constraint(\"long_only\")\nPosition limit: add.constraint(\"position_limit\", max_pos = 10)\nExpected shortfall: add.objective(type = \"risk\", name = \"ES\")\n\n.. 
and many more, see official PortfolioAnalytics vignette\n\nMean-variance framework is a cornerstone of finance\nDownload financial data using tidyfinance package\nEasy to compute analytic solutions ‘manually’\nImplement extensions using PortfolioAnalytics\nMore advanced: constrained optimization & backtesting", "crumbs": [ "R", "Getting Started", - "Introduction to Tidy Finance" + "Modern Portfolio Theory" ] }, { - "objectID": "r/introduction-to-tidy-finance.html#other-forms-of-data-aggregation", - "href": "r/introduction-to-tidy-finance.html#other-forms-of-data-aggregation", - "title": "Introduction to Tidy Finance", - "section": "Other Forms of Data Aggregation", - "text": "Other Forms of Data Aggregation\nOf course, aggregation across variables other than symbol can also make sense. For instance, suppose you are interested in answering the question: Are days with high aggregate trading volume likely followed by days with high aggregate trading volume? To provide some initial analysis on this question, we take the downloaded data and compute aggregate daily trading volume for all Dow Jones constituents in USD. Recall that the column volume is denoted in the number of traded shares. Thus, we multiply the trading volume with the daily closing price to get a proxy for the aggregate trading volume in USD. Scaling by 1e9 (R can handle scientific notation) denotes daily trading volume in billion USD.\n\ntrading_volume <- prices_daily |>\n group_by(date) |>\n summarize(trading_volume = sum(volume * adjusted_close))\n\ntrading_volume |>\n ggplot(aes(x = date, y = trading_volume)) +\n geom_line() +\n labs(\n x = NULL, y = NULL,\n title = \"Aggregate daily trading volume of DOW index constitutens\"\n ) +\n scale_y_continuous(labels = unit_format(unit = \"B\", scale = 1e-9))\n\n\n\n\n\n\n\nFigure 4: Total daily trading volume in billion USD.\n\n\n\n\n\nFigure 4 indicates a clear upward trend in aggregated daily trading volume. In particular, since the outbreak of the COVID-19 pandemic, markets have processed substantial trading volumes, as analyzed, for instance, by Goldstein, Koijen, and Mueller (2021). One way to illustrate the persistence of trading volume would be to plot volume on day \\(t\\) against volume on day \\(t-1\\) as in the example below. In Figure 5, we add a dotted 45°-line to indicate a hypothetical one-to-one relation by geom_abline(), addressing potential differences in the axes’ scales.\n\ntrading_volume |>\n ggplot(aes(x = lag(trading_volume), y = trading_volume)) +\n geom_point() +\n geom_abline(aes(intercept = 0, slope = 1),\n linetype = \"dashed\"\n ) +\n labs(\n x = \"Previous day aggregate trading volume\",\n y = \"Aggregate trading volume\",\n title = \"Persistence in daily trading volume of DOW index constituents\"\n ) + \n scale_x_continuous(labels = unit_format(unit = \"B\", scale = 1e-9)) +\n scale_y_continuous(labels = unit_format(unit = \"B\", scale = 1e-9))\n\nWarning: Removed 1 rows containing missing values (`geom_point()`).\n\n\n\n\n\n\n\n\nFigure 5: Total daily trading volume in billion USD.\n\n\n\n\n\nDo you understand where the warning ## Warning: Removed 1 rows containing missing values (geom_point). comes from and what it means? 
Purely eye-balling reveals that days with high trading volume are often followed by similarly high trading volume days.", + "objectID": "r/modern-portfolio-theory.html#key-takeaways", + "href": "r/modern-portfolio-theory.html#key-takeaways", + "title": "Modern Portfolio Theory", + "section": "Key Takeaways", + "text": "Key Takeaways\n…", "crumbs": [ "R", "Getting Started", - "Introduction to Tidy Finance" + "Modern Portfolio Theory" ] }, { - "objectID": "r/introduction-to-tidy-finance.html#portfolio-choice-problems", - "href": "r/introduction-to-tidy-finance.html#portfolio-choice-problems", - "title": "Introduction to Tidy Finance", - "section": "Portfolio Choice Problems", - "text": "Portfolio Choice Problems\nIn the previous part, we show how to download stock market data and inspect it with graphs and summary statistics. Now, we move to a typical question in Finance: how to allocate wealth across different assets optimally. The standard framework for optimal portfolio selection considers investors that prefer higher future returns but dislike future return volatility (defined as the square root of the return variance): the mean-variance investor (Markowitz 1952).\n An essential tool to evaluate portfolios in the mean-variance context is the efficient frontier, the set of portfolios which satisfies the condition that no other portfolio exists with a higher expected return but with the same volatility (the square root of the variance, i.e., the risk), see, e.g., Merton (1972). We compute and visualize the efficient frontier for several stocks. First, we extract each asset’s monthly returns. In order to keep things simple, we work with a balanced panel and exclude DOW constituents for which we do not observe a price on every single trading day since the year 2000.\n\nprices_daily <- prices_daily |>\n group_by(symbol) |>\n mutate(n = n()) |>\n ungroup() |>\n filter(n == max(n)) |>\n select(-n)\n\nreturns_monthly <- prices_daily |>\n mutate(date = floor_date(date, \"month\")) |>\n group_by(symbol, date) |>\n summarize(price = last(adjusted_close), .groups = \"drop_last\") |>\n mutate(ret = price / lag(price) - 1) |>\n drop_na(ret) |>\n select(-price)\n\nHere, floor_date() is a function from the lubridate package (Grolemund and Wickham 2011), which provides useful functions to work with dates and times.\nNext, we transform the returns from a tidy tibble into a \\((T \\times N)\\) matrix with one column for each of the \\(N\\) symbols and one row for each of the \\(T\\) trading days to compute the sample average return vector \\[\\hat\\mu = \\frac{1}{T}\\sum\\limits_{t=1}^T r_t\\] where \\(r_t\\) is the \\(N\\) vector of returns on date \\(t\\) and the sample covariance matrix \\[\\hat\\Sigma = \\frac{1}{T-1}\\sum\\limits_{t=1}^T (r_t - \\hat\\mu)(r_t - \\hat\\mu)'.\\] We achieve this by using pivot_wider() with the new column names from the column symbol and setting the values to ret. We compute the vector of sample average returns and the sample variance-covariance matrix, which we consider as proxies for the parameters of the distribution of future stock returns. Thus, for simplicity, we refer to \\(\\Sigma\\) and \\(\\mu\\) instead of explicitly highlighting that the sample moments are estimates. 
In later chapters, we discuss the issues that arise once we take estimation uncertainty into account.\n\nreturns_matrix <- returns_monthly |>\n pivot_wider(\n names_from = symbol,\n values_from = ret\n ) |>\n select(-date)\nsigma <- cov(returns_matrix)\nmu <- colMeans(returns_matrix)\n\nThen, we compute the minimum variance portfolio weights \\(\\omega_\\text{mvp}\\) as well as the expected portfolio return \\(\\omega_\\text{mvp}'\\mu\\) and volatility \\(\\sqrt{\\omega_\\text{mvp}'\\Sigma\\omega_\\text{mvp}}\\) of this portfolio. Recall that the minimum variance portfolio is the vector of portfolio weights that are the solution to \\[\\omega_\\text{mvp} = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\sum\\limits_{i=1}^N\\omega_i = 1.\\] The constraint that weights sum up to one simply implies that all funds are distributed across the available asset universe, i.e., there is no possibility to retain cash. It is easy to show analytically that \\(\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}\\), where \\(\\iota\\) is a vector of ones and \\(\\Sigma^{-1}\\) is the inverse of \\(\\Sigma\\).\n\nN <- ncol(returns_matrix)\niota <- rep(1, N)\nsigma_inv <- solve(sigma)\nmvp_weights <- sigma_inv %*% iota\nmvp_weights <- mvp_weights / sum(mvp_weights)\ntibble(\n average_ret = as.numeric(t(mvp_weights) %*% mu),\n volatility = as.numeric(sqrt(t(mvp_weights) %*% sigma %*% mvp_weights))\n)\n\n# A tibble: 1 × 2\n average_ret volatility\n <dbl> <dbl>\n1 0.00783 0.0323\n\n\nThe command solve(A, b) returns the solution of a system of equations \\(Ax = b\\). If b is not provided, as in the example above, it defaults to the identity matrix such that solve(sigma) delivers \\(\\Sigma^{-1}\\) (if a unique solution exists).\nNote that the monthly volatility of the minimum variance portfolio is of the same order of magnitude as the daily standard deviation of the individual components. Thus, the diversification benefits in terms of risk reduction are tremendous!\nNext, we set out to find the weights for a portfolio that achieves, as an example, three times the expected return of the minimum variance portfolio. However, mean-variance investors are not interested in any portfolio that achieves the required return but rather in the efficient portfolio, i.e., the portfolio with the lowest standard deviation. If you wonder where the solution \\(\\omega_\\text{eff}\\) comes from: The efficient portfolio is chosen by an investor who aims to achieve minimum variance given a minimum acceptable expected return \\(\\bar{\\mu}\\). Hence, their objective function is to choose \\(\\omega_\\text{eff}\\) as the solution to \\[\\omega_\\text{eff}(\\bar{\\mu}) = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\omega'\\iota = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}.\\]\nThe code below implements the analytic solution to this optimization problem for a benchmark return \\(\\bar\\mu\\), which we set to 3 times the expected return of the minimum variance portfolio. 
We encourage you to verify that it is correct.\n\nbenchmark_multiple <- 3\nmu_bar <- benchmark_multiple * t(mvp_weights) %*% mu\nC <- as.numeric(t(iota) %*% sigma_inv %*% iota)\nD <- as.numeric(t(iota) %*% sigma_inv %*% mu)\nE <- as.numeric(t(mu) %*% sigma_inv %*% mu)\nlambda_tilde <- as.numeric(2 * (mu_bar - D / C) / (E - D^2 / C))\nefp_weights <- mvp_weights +\n lambda_tilde / 2 * (sigma_inv %*% mu - D * mvp_weights)", + "objectID": "r/modern-portfolio-theory.html#exercises", + "href": "r/modern-portfolio-theory.html#exercises", + "title": "Modern Portfolio Theory", + "section": "Exercises", + "text": "Exercises\n\nIn the portfolio choice analysis, we restricted our sample to all assets trading every day since 2000. How is such a decision a problem when you want to infer future expected portfolio performance from the results?\nThe efficient frontier characterizes the portfolios with the highest expected return for different levels of risk. Identify the portfolio with the highest expected return per standard deviation. Which famous performance measure is close to the ratio of average returns to the standard deviation of returns?", "crumbs": [ "R", "Getting Started", - "Introduction to Tidy Finance" + "Modern Portfolio Theory" ] }, { - "objectID": "r/introduction-to-tidy-finance.html#the-efficient-frontier", - "href": "r/introduction-to-tidy-finance.html#the-efficient-frontier", - "title": "Introduction to Tidy Finance", - "section": "The Efficient Frontier", - "text": "The Efficient Frontier\n The mutual fund separation theorem states that as soon as we have two efficient portfolios (such as the minimum variance portfolio \\(\\omega_\\text{mvp}\\) and the efficient portfolio for a higher required level of expected returns \\(\\omega_\\text{eff}(\\bar{\\mu})\\), we can characterize the entire efficient frontier by combining these two portfolios. That is, any linear combination of the two portfolio weights will again represent an efficient portfolio. The code below implements the construction of the efficient frontier, which characterizes the highest expected return achievable at each level of risk. To understand the code better, make sure to familiarize yourself with the inner workings of the for loop.\n\nlength_year <- 12\na <- seq(from = -0.4, to = 1.9, by = 0.01)\nresults <- tibble(\n a = a,\n mu = NA,\n sd = NA\n)\nfor (i in seq_along(a)) {\n w <- (1 - a[i]) * mvp_weights + (a[i]) * efp_weights\n results$mu[i] <- length_year * t(w) %*% mu \n results$sd[i] <- sqrt(length_year) * sqrt(t(w) %*% sigma %*% w)\n}\n\nThe code above proceeds in two steps: First, we compute a vector of combination weights \\(a\\) and then we evaluate the resulting linear combination with \\(a\\in\\mathbb{R}\\):\n\\[\\omega^* = a\\omega_\\text{eff}(\\bar\\mu) + (1-a)\\omega_\\text{mvp} = \\omega_\\text{mvp} + \\frac{\\lambda^*}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right)\\] with \\(\\lambda^* = 2\\frac{a\\bar\\mu + (1-a)\\tilde\\mu - D/C}{E-D^2/C}\\) where \\(C = \\iota'\\Sigma^{-1}\\iota\\), \\(D=\\iota'\\Sigma^{-1}\\mu\\), and \\(E=\\mu'\\Sigma^{-1}\\mu\\). Finally, it is simple to visualize the efficient frontier alongside the two efficient portfolios within one powerful figure using ggplot (see Figure 6). We also add the individual stocks in the same call. We compute annualized returns based on the simple assumption that monthly returns are independent and identically distributed. 
Thus, the average annualized return is just 12 times the expected monthly return.\n\nresults |>\n ggplot(aes(x = sd, y = mu)) +\n geom_point() +\n geom_point(\n data = results |> filter(a %in% c(0, 1)),\n size = 4\n ) +\n geom_point(\n data = tibble(\n mu = length_year * mu, \n sd = sqrt(length_year) * sqrt(diag(sigma))\n ),\n aes(y = mu, x = sd), size = 1\n ) +\n labs(\n x = \"Annualized standard deviation\",\n y = \"Annualized expected return\",\n title = \"Efficient frontier for DOW index constituents\"\n ) +\n scale_x_continuous(labels = percent) +\n scale_y_continuous(labels = percent)\n\n\n\n\n\n\n\nFigure 6: The big dots indicate the location of the minimum variance and the efficient portfolio that delivers 3 times the expected return of the minimum variance portfolio, respectively. The small dots indicate the location of the individual constituents.\n\n\n\n\n\nThe line in Figure 6 indicates the efficient frontier: the set of portfolios a mean-variance efficient investor would choose from. Compare the performance relative to the individual assets (the dots) - it should become clear that diversifying yields massive performance gains (at least as long as we take the parameters \\(\\Sigma\\) and \\(\\mu\\) as given).", + "objectID": "r/proofs.html", + "href": "r/proofs.html", + "title": "Proofs", + "section": "", + "text": "The minimum variance portfolio weights are given by the solution to \\[\\omega_\\text{mvp} = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega= 1,\\] where \\(\\iota\\) is an \\((N \\times 1)\\) vector of ones. The Lagrangian reads \\[ \\mathcal{L}(\\omega) = \\omega'\\Sigma \\omega - \\lambda(\\omega'\\iota - 1).\\] We can solve the first-order conditions of the Lagrangian equation: \\[\n\\begin{aligned}\n& \\frac{\\partial\\mathcal{L}(\\omega)}{\\partial\\omega} = 0 \\Leftrightarrow 2\\Sigma \\omega = \\lambda\\iota \\Rightarrow \\omega = \\frac{\\lambda}{2}\\Sigma^{-1}\\iota \\\\ \\end{aligned}\n\\] Next, the constraint that weights have to sum up to one delivers: \\(1 = \\iota'\\omega = \\frac{\\lambda}{2}\\iota'\\Sigma^{-1}\\iota \\Rightarrow \\lambda = \\frac{2}{\\iota'\\Sigma^{-1}\\iota}.\\) Finally, plug-in the derived value of \\(\\lambda\\) to get \\[\n\\begin{aligned}\n\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}.\n\\end{aligned}\n\\]\n\n\n\nConsider an investor who aims to achieve minimum variance given a desired expected return \\(\\bar{\\mu}\\), that is: \\[\\omega_\\text{eff}\\left(\\bar{\\mu}\\right) = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}.\\] The Lagrangian reads \\[ \\mathcal{L}(\\omega) = \\omega'\\Sigma \\omega - \\lambda(\\omega'\\iota - 1) - \\tilde{\\lambda}(\\omega'\\mu - \\bar{\\mu}). 
\\] We can solve the first-order conditions to get \\[\n\\begin{aligned}\n2\\Sigma \\omega &= \\lambda\\iota + \\tilde\\lambda \\mu\\\\\n\\Rightarrow\\omega &= \\frac{\\lambda}{2}\\Sigma^{-1}\\iota + \\frac{\\tilde\\lambda}{2}\\Sigma^{-1}\\mu.\n\\end{aligned}\n\\]\nNext, the two constraints (\\(w'\\iota = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}\\)) imply \\[\n\\begin{aligned}\n1 &= \\iota'\\omega = \\frac{\\lambda}{2}\\underbrace{\\iota'\\Sigma^{-1}\\iota}_{C} + \\frac{\\tilde\\lambda}{2}\\underbrace{\\iota'\\Sigma^{-1}\\mu}_D\\\\\n\\Rightarrow \\lambda&= \\frac{2 - \\tilde\\lambda D}{C}\\\\\n\\bar\\mu &= \\mu'\\omega = \\frac{\\lambda}{2}\\underbrace{\\mu'\\Sigma^{-1}\\iota}_{D} + \\frac{\\tilde\\lambda}{2}\\underbrace{\\mu'\\Sigma^{-1}\\mu}_E = \\frac{1}{2}\\left(\\frac{2 - \\tilde\\lambda D}{C}\\right)D+\\frac{\\tilde\\lambda}{2}E \\\\&=\\frac{D}{C}+\\frac{\\tilde\\lambda}{2}\\left(E - \\frac{D^2}{C}\\right)\\\\\n\\Rightarrow \\tilde\\lambda &= 2\\frac{\\bar\\mu - D/C}{E-D^2/C}.\n\\end{aligned}\n\\] As a result, the efficient portfolio weight takes the form (for \\(\\bar{\\mu} \\geq D/C = \\mu'\\omega_\\text{mvp}\\)) \\[\\omega_\\text{eff}\\left(\\bar\\mu\\right) = \\omega_\\text{mvp} + \\frac{\\tilde\\lambda}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right).\\] Thus, the efficient portfolio allocates wealth in the minimum variance portfolio \\(\\omega_\\text{mvp}\\) and a levered (self-financing) portfolio to increase the expected return.\nNote that the portfolio weights sum up to one as \\[\\iota'\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right) = D - D = 0\\text{ so }\\iota'\\omega_\\text{eff} = \\iota'\\omega_\\text{mvp} = 1.\\] Finally, the expected return of the efficient portfolio is \\[\\mu'\\omega_\\text{eff} = \\frac{D}{C} + \\bar\\mu - \\frac{D}{C} = \\bar\\mu.\\]\n\n\n\nWe argue that an investor with a quadratic utility function with certainty equivalent \\[\\max_\\omega CE(\\omega) = \\omega'\\mu - \\frac{\\gamma}{2} \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega = 1\\] faces an equivalent optimization problem to a framework where portfolio weights are chosen with the aim to minimize volatility given a pre-specified level or expected returns \\[\\min_\\omega \\omega'\\Sigma \\omega \\text{ s.t. } \\omega'\\mu = \\bar\\mu \\text{ and } \\iota'\\omega = 1.\\] Note the difference: In the first case, the investor has a (known) risk aversion \\(\\gamma\\) which determines their optimal balance between risk (\\(\\omega'\\Sigma\\omega)\\) and return (\\(\\mu'\\omega\\)). In the second case, the investor has a target return they want to achieve while minimizing the volatility. Intuitively, both approaches are closely connected if we consider that the risk aversion \\(\\gamma\\) determines the desirable return \\(\\bar\\mu\\). More risk-averse investors (higher \\(\\gamma\\)) will chose a lower target return to keep their volatility level down. The efficient frontier then spans all possible portfolios depending on the risk aversion \\(\\gamma\\), starting from the minimum variance portfolio (\\(\\gamma = \\infty\\)).\nTo proof this equivalence, consider first the optimal portfolio weights for a certainty equivalent maximizing investor. The first-order condition reads \\[\n\\begin{aligned}\n\\mu - \\lambda \\iota &= \\gamma \\Sigma \\omega \\\\\n\\Leftrightarrow \\omega &= \\frac{1}{\\gamma}\\Sigma^{-1}\\left(\\mu - \\lambda\\iota\\right)\n\\end{aligned}\n\\] Next, we make use of the constraint \\(\\iota'\\omega = 1\\). 
\\[\n\\begin{aligned}\n\\iota'\\omega &= 1 = \\frac{1}{\\gamma}\\left(\\iota'\\Sigma^{-1}\\mu - \\lambda\\iota'\\Sigma^{-1}\\iota\\right)\\\\\n\\Rightarrow \\lambda &= \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\left(\\iota'\\Sigma^{-1}\\mu - \\gamma \\right).\n\\end{aligned}\n\\] Plugging in the value of \\(\\lambda\\) reveals the desired portfolio for an investor with risk aversion \\(\\gamma\\). \\[\n\\begin{aligned}\n\\omega &= \\frac{1}{\\gamma}\\Sigma^{-1}\\left(\\mu - \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\left(\\iota'\\Sigma^{-1}\\mu - \\gamma \\right)\\right) \\\\\n\\Rightarrow \\omega &= \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota} + \\frac{1}{\\gamma}\\left(\\Sigma^{-1} - \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}\\iota'\\Sigma^{-1}\\right)\\mu\\\\\n&= \\omega_\\text{mvp} + \\frac{1}{\\gamma}\\left(\\Sigma^{-1}\\mu - \\frac{\\iota'\\Sigma^{-1}\\mu}{\\iota'\\Sigma^{-1}\\iota}\\Sigma^{-1}\\iota\\right).\n\\end{aligned}\n\\] The resulting weights correspond to the efficient portfolio with desired return \\(\\bar r\\) such that (in the notation of book) \\[\\frac{1}{\\gamma} = \\frac{\\tilde\\lambda}{2} = \\frac{\\bar\\mu - D/C}{E - D^2/C}\\] which implies that the desired return is just \\[\\bar\\mu = \\frac{D}{C} + \\frac{1}{\\gamma}\\left({E - D^2/C}\\right)\\] which is \\(\\frac{D}{C} = \\mu'\\omega_\\text{mvp}\\) for \\(\\gamma\\rightarrow \\infty\\) as expected. For instance, letting \\(\\gamma \\rightarrow \\infty\\) implies \\(\\bar\\mu = \\frac{D}{C} = \\omega_\\text{mvp}'\\mu\\).", "crumbs": [ "R", - "Getting Started", - "Introduction to Tidy Finance" + "Appendix", + "Proofs" ] }, { - "objectID": "r/introduction-to-tidy-finance.html#exercises", - "href": "r/introduction-to-tidy-finance.html#exercises", - "title": "Introduction to Tidy Finance", - "section": "Exercises", - "text": "Exercises\n\nDownload daily prices for another stock market symbol of your choice from Yahoo Finance with download_data() from the tidyfinance package. Plot two time series of the symbol’s un-adjusted and adjusted closing prices. Explain the differences.\nCompute daily net returns for an asset of your choice and visualize the distribution of daily returns in a histogram using 100 bins. Also, use geom_vline() to add a dashed red vertical line that indicates the 5 percent quantile of the daily returns. Compute summary statistics (mean, standard deviation, minimum and maximum) for the daily returns.\nTake your code from before and generalize it such that you can perform all the computations for an arbitrary vector of symbols (e.g., symbol <- c(\"AAPL\", \"MMM\", \"BA\")). Automate the download, the plot of the price time series, and create a table of return summary statistics for this arbitrary number of assets.\nAre days with high aggregate trading volume often also days with large absolute returns? Find an appropriate visualization to analyze the question using the symbol AAPL. 1.Compute monthly returns from the downloaded stock market prices. Compute the vector of historical average returns and the sample variance-covariance matrix. Compute the minimum variance portfolio weights and the portfolio volatility and average returns. Visualize the mean-variance efficient frontier. Choose one of your assets and identify the portfolio which yields the same historical volatility but achieves the highest possible average return.\nIn the portfolio choice analysis, we restricted our sample to all assets trading every day since 2000. 
How is such a decision a problem when you want to infer future expected portfolio performance from the results?\nThe efficient frontier characterizes the portfolios with the highest expected return for different levels of risk. Identify the portfolio with the highest expected return per standard deviation. Which famous performance measure is close to the ratio of average returns to the standard deviation of returns?", + "objectID": "r/proofs.html#optimal-portfolio-choice", + "href": "r/proofs.html#optimal-portfolio-choice", + "title": "Proofs", + "section": "", + "text": "The minimum variance portfolio weights are given by the solution to \\[\\omega_\\text{mvp} = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega= 1,\\] where \\(\\iota\\) is an \\((N \\times 1)\\) vector of ones. The Lagrangian reads \\[ \\mathcal{L}(\\omega) = \\omega'\\Sigma \\omega - \\lambda(\\omega'\\iota - 1).\\] We can solve the first-order conditions of the Lagrangian equation: \\[\n\\begin{aligned}\n& \\frac{\\partial\\mathcal{L}(\\omega)}{\\partial\\omega} = 0 \\Leftrightarrow 2\\Sigma \\omega = \\lambda\\iota \\Rightarrow \\omega = \\frac{\\lambda}{2}\\Sigma^{-1}\\iota \\\\ \\end{aligned}\n\\] Next, the constraint that weights have to sum up to one delivers: \\(1 = \\iota'\\omega = \\frac{\\lambda}{2}\\iota'\\Sigma^{-1}\\iota \\Rightarrow \\lambda = \\frac{2}{\\iota'\\Sigma^{-1}\\iota}.\\) Finally, plug-in the derived value of \\(\\lambda\\) to get \\[\n\\begin{aligned}\n\\omega_\\text{mvp} = \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}.\n\\end{aligned}\n\\]\n\n\n\nConsider an investor who aims to achieve minimum variance given a desired expected return \\(\\bar{\\mu}\\), that is: \\[\\omega_\\text{eff}\\left(\\bar{\\mu}\\right) = \\arg\\min \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}.\\] The Lagrangian reads \\[ \\mathcal{L}(\\omega) = \\omega'\\Sigma \\omega - \\lambda(\\omega'\\iota - 1) - \\tilde{\\lambda}(\\omega'\\mu - \\bar{\\mu}). 
\\] We can solve the first-order conditions to get \\[\n\\begin{aligned}\n2\\Sigma \\omega &= \\lambda\\iota + \\tilde\\lambda \\mu\\\\\n\\Rightarrow\\omega &= \\frac{\\lambda}{2}\\Sigma^{-1}\\iota + \\frac{\\tilde\\lambda}{2}\\Sigma^{-1}\\mu.\n\\end{aligned}\n\\]\nNext, the two constraints (\\(w'\\iota = 1 \\text{ and } \\omega'\\mu \\geq \\bar{\\mu}\\)) imply \\[\n\\begin{aligned}\n1 &= \\iota'\\omega = \\frac{\\lambda}{2}\\underbrace{\\iota'\\Sigma^{-1}\\iota}_{C} + \\frac{\\tilde\\lambda}{2}\\underbrace{\\iota'\\Sigma^{-1}\\mu}_D\\\\\n\\Rightarrow \\lambda&= \\frac{2 - \\tilde\\lambda D}{C}\\\\\n\\bar\\mu &= \\mu'\\omega = \\frac{\\lambda}{2}\\underbrace{\\mu'\\Sigma^{-1}\\iota}_{D} + \\frac{\\tilde\\lambda}{2}\\underbrace{\\mu'\\Sigma^{-1}\\mu}_E = \\frac{1}{2}\\left(\\frac{2 - \\tilde\\lambda D}{C}\\right)D+\\frac{\\tilde\\lambda}{2}E \\\\&=\\frac{D}{C}+\\frac{\\tilde\\lambda}{2}\\left(E - \\frac{D^2}{C}\\right)\\\\\n\\Rightarrow \\tilde\\lambda &= 2\\frac{\\bar\\mu - D/C}{E-D^2/C}.\n\\end{aligned}\n\\] As a result, the efficient portfolio weight takes the form (for \\(\\bar{\\mu} \\geq D/C = \\mu'\\omega_\\text{mvp}\\)) \\[\\omega_\\text{eff}\\left(\\bar\\mu\\right) = \\omega_\\text{mvp} + \\frac{\\tilde\\lambda}{2}\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right).\\] Thus, the efficient portfolio allocates wealth in the minimum variance portfolio \\(\\omega_\\text{mvp}\\) and a levered (self-financing) portfolio to increase the expected return.\nNote that the portfolio weights sum up to one as \\[\\iota'\\left(\\Sigma^{-1}\\mu -\\frac{D}{C}\\Sigma^{-1}\\iota \\right) = D - D = 0\\text{ so }\\iota'\\omega_\\text{eff} = \\iota'\\omega_\\text{mvp} = 1.\\] Finally, the expected return of the efficient portfolio is \\[\\mu'\\omega_\\text{eff} = \\frac{D}{C} + \\bar\\mu - \\frac{D}{C} = \\bar\\mu.\\]\n\n\n\nWe argue that an investor with a quadratic utility function with certainty equivalent \\[\\max_\\omega CE(\\omega) = \\omega'\\mu - \\frac{\\gamma}{2} \\omega'\\Sigma \\omega \\text{ s.t. } \\iota'\\omega = 1\\] faces an equivalent optimization problem to a framework where portfolio weights are chosen with the aim to minimize volatility given a pre-specified level or expected returns \\[\\min_\\omega \\omega'\\Sigma \\omega \\text{ s.t. } \\omega'\\mu = \\bar\\mu \\text{ and } \\iota'\\omega = 1.\\] Note the difference: In the first case, the investor has a (known) risk aversion \\(\\gamma\\) which determines their optimal balance between risk (\\(\\omega'\\Sigma\\omega)\\) and return (\\(\\mu'\\omega\\)). In the second case, the investor has a target return they want to achieve while minimizing the volatility. Intuitively, both approaches are closely connected if we consider that the risk aversion \\(\\gamma\\) determines the desirable return \\(\\bar\\mu\\). More risk-averse investors (higher \\(\\gamma\\)) will chose a lower target return to keep their volatility level down. The efficient frontier then spans all possible portfolios depending on the risk aversion \\(\\gamma\\), starting from the minimum variance portfolio (\\(\\gamma = \\infty\\)).\nTo proof this equivalence, consider first the optimal portfolio weights for a certainty equivalent maximizing investor. The first-order condition reads \\[\n\\begin{aligned}\n\\mu - \\lambda \\iota &= \\gamma \\Sigma \\omega \\\\\n\\Leftrightarrow \\omega &= \\frac{1}{\\gamma}\\Sigma^{-1}\\left(\\mu - \\lambda\\iota\\right)\n\\end{aligned}\n\\] Next, we make use of the constraint \\(\\iota'\\omega = 1\\). 
\\[\n\\begin{aligned}\n\\iota'\\omega &= 1 = \\frac{1}{\\gamma}\\left(\\iota'\\Sigma^{-1}\\mu - \\lambda\\iota'\\Sigma^{-1}\\iota\\right)\\\\\n\\Rightarrow \\lambda &= \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\left(\\iota'\\Sigma^{-1}\\mu - \\gamma \\right).\n\\end{aligned}\n\\] Plugging in the value of \\(\\lambda\\) reveals the desired portfolio for an investor with risk aversion \\(\\gamma\\). \\[\n\\begin{aligned}\n\\omega &= \\frac{1}{\\gamma}\\Sigma^{-1}\\left(\\mu - \\frac{1}{\\iota'\\Sigma^{-1}\\iota}\\left(\\iota'\\Sigma^{-1}\\mu - \\gamma \\right)\\right) \\\\\n\\Rightarrow \\omega &= \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota} + \\frac{1}{\\gamma}\\left(\\Sigma^{-1} - \\frac{\\Sigma^{-1}\\iota}{\\iota'\\Sigma^{-1}\\iota}\\iota'\\Sigma^{-1}\\right)\\mu\\\\\n&= \\omega_\\text{mvp} + \\frac{1}{\\gamma}\\left(\\Sigma^{-1}\\mu - \\frac{\\iota'\\Sigma^{-1}\\mu}{\\iota'\\Sigma^{-1}\\iota}\\Sigma^{-1}\\iota\\right).\n\\end{aligned}\n\\] The resulting weights correspond to the efficient portfolio with desired return \\(\\bar r\\) such that (in the notation of book) \\[\\frac{1}{\\gamma} = \\frac{\\tilde\\lambda}{2} = \\frac{\\bar\\mu - D/C}{E - D^2/C}\\] which implies that the desired return is just \\[\\bar\\mu = \\frac{D}{C} + \\frac{1}{\\gamma}\\left({E - D^2/C}\\right)\\] which is \\(\\frac{D}{C} = \\mu'\\omega_\\text{mvp}\\) for \\(\\gamma\\rightarrow \\infty\\) as expected. For instance, letting \\(\\gamma \\rightarrow \\infty\\) implies \\(\\bar\\mu = \\frac{D}{C} = \\omega_\\text{mvp}'\\mu\\).", "crumbs": [ "R", - "Getting Started", - "Introduction to Tidy Finance" + "Appendix", + "Proofs" ] }, { @@ -4076,5 +4568,17 @@ "title": "Non-standard errors in portfolio sorts", "section": "Footnotes", "text": "Footnotes\n\n\nMenkveld, A. J. et al. (2023). “Non-standard Errors”, Journal of Finance (forthcoming). http://dx.doi.org/10.2139/ssrn.3961574↩︎\nWalter, D., Weber, R., and Weiss, P. (2023). “Non-Standard Errors in Portfolio Sorts”. http://dx.doi.org/10.2139/ssrn.4164117↩︎\nCooper, M. J., Gulen, H., and Schill, M. J. (2008). 
“Asset growth and the cross‐section of stock returns”, The Journal of Finance, 63(4), 1609-1651.↩︎" + }, + { + "objectID": "r/discounted-cash-flow-analysis.html#footnotes", + "href": "r/discounted-cash-flow-analysis.html#footnotes", + "title": "Discounted Cash Flow Analysis", + "section": "Footnotes", + "text": "Footnotes\n\n\nSee investopedia.com for alternative definitions.↩︎\nSee (corporatefinanceinstitute.com/)[https://corporatefinanceinstitute.com/resources/valuation/exit-multiple/)] for an intuitive explanation of the exit multiple approach.↩︎", + "crumbs": [ + "R", + "Getting Started", + "Discounted Cash Flow Analysis" + ] } ] \ No newline at end of file diff --git a/docs/sitemap.xml b/docs/sitemap.xml index d1069fed..a76c6786 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -57,20 +57,24 @@ 2024-02-20T18:08:18.412Z - https://www.tidy-finance.org/r/proofs.html - 2024-02-20T18:08:18.417Z + https://www.tidy-finance.org/r/clean-enhanced-trace-with-r.html + 2024-09-13T19:16:35.907Z https://www.tidy-finance.org/r/changelog.html - 2024-09-13T19:18:16.354Z + 2025-01-06T09:20:36.899Z - https://www.tidy-finance.org/r/parametric-portfolio-policies.html - 2024-07-29T16:56:21.339Z + https://www.tidy-finance.org/r/accessing-and-managing-financial-data.html + 2024-11-19T18:31:08.401Z - https://www.tidy-finance.org/r/wrds-dummy-data.html - 2024-07-29T18:22:18.412Z + https://www.tidy-finance.org/r/fixed-effects-and-clustered-standard-errors.html + 2024-08-04T17:54:28.621Z + + + https://www.tidy-finance.org/r/working-with-stock-returns.html + 2025-01-06T07:21:45.365Z https://www.tidy-finance.org/r/difference-in-differences.html @@ -84,6 +88,10 @@ https://www.tidy-finance.org/r/factor-selection-via-machine-learning.html 2024-09-26T10:09:59.088Z + + https://www.tidy-finance.org/r/discounted-cash-flow-analysis.html + 2025-01-06T11:17:09.342Z + https://www.tidy-finance.org/r/trace-and-fisd.html 2024-09-13T19:16:35.909Z @@ -110,7 +118,7 @@ https://www.tidy-finance.org/python/introduction-to-tidy-finance.html - 2024-12-23T16:36:44.160Z + 2025-01-06T06:22:06.475Z https://www.tidy-finance.org/python/changelog.html @@ -224,6 +232,10 @@ https://www.tidy-finance.org/r/option-pricing-via-machine-learning.html 2024-04-17T14:42:51.025Z + + https://www.tidy-finance.org/r/financial-ratios.html + 2025-01-06T10:42:13.834Z + https://www.tidy-finance.org/r/index.html 2024-08-23T13:04:59.135Z @@ -237,20 +249,24 @@ 2024-10-12T11:14:16.872Z - https://www.tidy-finance.org/r/fixed-effects-and-clustered-standard-errors.html - 2024-08-04T17:54:28.621Z + https://www.tidy-finance.org/r/wrds-dummy-data.html + 2024-07-29T18:22:18.412Z - https://www.tidy-finance.org/r/accessing-and-managing-financial-data.html - 2024-11-19T18:31:08.401Z + https://www.tidy-finance.org/r/parametric-portfolio-policies.html + 2024-07-29T16:56:21.339Z - https://www.tidy-finance.org/r/clean-enhanced-trace-with-r.html - 2024-09-13T19:16:35.907Z + https://www.tidy-finance.org/r/capital-asset-pricing-model.html + 2025-01-06T08:28:53.560Z - https://www.tidy-finance.org/r/introduction-to-tidy-finance.html - 2024-12-23T16:06:06.365Z + https://www.tidy-finance.org/r/modern-portfolio-theory.html + 2025-01-06T08:02:28.875Z + + + https://www.tidy-finance.org/r/proofs.html + 2024-02-20T18:08:18.417Z https://www.tidy-finance.org/r/beta-estimation.html diff --git a/r/capital-asset-pricing-model.qmd b/r/capital-asset-pricing-model.qmd new file mode 100644 index 00000000..57cbe25e --- /dev/null +++ b/r/capital-asset-pricing-model.qmd @@ -0,0 +1,496 @@ +--- 
+title: The Capital Asset Pricing Model +metadata: + pagetitle: The CAPM with R + description-meta: Learn how to use the programming language R for estimating the CAPM for asset evaluation. +--- + +The Capital Asset Pricing Model (CAPM) is one of the most influential theories in finance and builds on the foundation laid by [Modern Portfolio Theory](modern-portfolio-theory.qmd) (MPT). It was simultaneously developed by @Sharpe1964, @Lintner1965, and @Mossin1966. While MPT shows how to construct optimal portfolios, the CAPM extends this framework to explain how assets should be priced in equilibrium when all investors follow MPT principles. The CAPM is the simplest model that aims to explain equilibrium asset prices and hence the cornerstone for a myriad of extensions. It is also used in industry applications, in particular for performance evaluation, as it provides a simple measure of the market risk of an asset or portfolio. This chapter shows how to estimate the CAPM. We download stock market data, estimate betas using regression analysis, and evaluate asset performance. + +We use the following packages throughout this chapter: + + +```{r} +#| message: false +#| warning: false +library(tidyverse) +library(tidyfinance) +library(scales) +library(ggrepel) +``` + +## Asset Returns & Volatilities + +Building on our analysis from the previous chapter on [Modern Portfolio Theory](modern-portfolio-theory.qmd), we again examine the Dow Jones Industrial Average constituents. However, our focus shifts from portfolio optimization to understanding how these assets are priced in equilibrium. Let's start by downloading and preparing our daily return data: + +```{r} +#| output: false +symbols <- download_data( + type = "constituents", + index = "Dow Jones Industrial Average" +) + +prices_daily <- download_data( + type = "stock_prices", symbol = symbols$symbol, + start_date = "2019-10-01", end_date = "2024-09-30" +) |> + select(symbol, date, price = adjusted_close) + +returns_daily <- prices_daily |> + group_by(symbol) |> + mutate(ret = price / lag(price) - 1) |> + ungroup() |> + select(symbol, date, ret) |> + drop_na(ret) |> + arrange(symbol, date) +``` + +The relationship between risk and return is central to the CAPM. @fig-300 plots average returns against volatilities for our sample of stocks, highlighting two particularly interesting cases: Boeing (BA) and Nvidia (NVDA). This visualization raises an important question: why don't we observe a clear positive relationship between risk and return? + +```{r} +#| label: fig-300 +#| fig-cap: "Average returns and volatilities are based on returns adjusted for dividend payments and stock splits." +#| fig-alt: "Title: Average returns and volatilities of Dow index constituents. The figure shows a scatter plot with volatilities on the horizontal axis and average returns on the vertical axis. The stocks Nvidia and Boeing are highlighted because they exhibit the highest and lowest average returns, respectively."
+assets <- returns_daily |> + group_by(symbol) |> + summarize(mu = mean(ret), sigma = sd(ret)) + +fig_vola_return <- assets |> + ggplot(aes(x = sigma, y = mu)) + + geom_point() + + geom_label_repel(data = assets |> filter(symbol %in% c("BA", "NVDA")), + aes(label = symbol)) + + scale_x_continuous(labels = percent) + + scale_y_continuous(labels = percent) + + labs(x = "Volatility", y = "Average return", + title = "Average returns and volatilities of Dow index constituents") +fig_vola_return +``` + +This apparent puzzle - that higher volatility doesn't necessarily lead to higher returns - motivates one of CAPM's key insights: not all risks are rewarded in equilibrium. To understand this, we need to distinguish between systematic and idiosyncratic risk. + +Company-specific events might affect individual stock prices, e.g., CEO resignations, product launches, and earnings reports. Such events don't necessarily impact the overall market, and the resulting asset-specific risk can be eliminated through diversification. Therefore, we call this risk idiosyncratic. Systematic risk, on the other hand, affects all assets in the market at the same time, and investors dislike it because it cannot be diversified away. + +## Portfolio Return & Variance + +While we covered portfolio mathematics in detail in the previous chapter, CAPM introduces a crucial new element: the risk-free asset. This addition fundamentally changes the investment opportunity set and leads to powerful conclusions about optimal portfolio choice. + +To recap, we use the notion of expected portfolio return: + +$$\text{Expected Portfolio Return} = \omega'\mu,$$ + +where $\omega$ is the vector of asset weights and $\mu$ is the vector of expected asset returns. + +Portfolio variance is given by: + +$$\text{Portfolio Variance} = \omega' \Sigma \omega,$$ +where $\Sigma$ is the variance-covariance matrix. + +The introduction of a risk-free asset transforms the investment problem. Instead of choosing only between risky assets, investors can now combine a risky portfolio with risk-free lending or borrowing. Let $r_f$ be the return of the risk-free asset (e.g., a short-term government bond). Then, we can describe the combined portfolio return as + +$$\mu_c = c \omega'\mu + (1-c)r_f = r_f + c (\omega'\mu - r_f),$$ +where $c$ is the fraction of capital in the risky portfolio. + +By assumption, the risk-free asset has zero volatility. The risk of the combined portfolio is thus only measured by the volatility of the risky assets: + +$$\sigma_c= c\sqrt{\omega' \Sigma \omega} \Rightarrow c = \frac{\sigma_c}{\sqrt{\omega' \Sigma \omega}}$$ + +This equation allows us to derive a **Capital Allocation Line** (CAL): + +$$\mu_c = r_f +\sigma_c \frac{\omega'\mu-r_f}{\sqrt{\omega' \Sigma \omega}}$$ + +Importantly, the slope of the CAL is called the **Sharpe ratio**: + +$$\text{Sharpe ratio} = \frac{\omega'\mu-r_f}{\sqrt{\omega' \Sigma \omega}}$$ + +The Sharpe ratio measures the *excess return per unit of risk*. A higher ratio hence indicates a more attractive *risk-adjusted return*.
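+
+To make these formulas concrete before we turn to the real-data illustration below, consider a short sketch with made-up, annualized numbers: a single risky portfolio with an expected return of 8% and a volatility of 20%, combined with a risk-free rate of 2%. The code uses only the tidyverse functions loaded above and shows that the Sharpe ratio is the same for every combination along the CAL.
+
+```{r}
+# Illustrative sketch (hypothetical numbers): trace out a capital allocation
+# line for different fractions c invested in the risky portfolio.
+cal_sketch <- tibble(c_risky = seq(0, 2, by = 0.25)) |>
+  mutate(
+    mu_c = 0.02 + c_risky * (0.08 - 0.02),
+    sigma_c = c_risky * 0.20,
+    sharpe_ratio = if_else(sigma_c > 0, (mu_c - 0.02) / sigma_c, NA_real_)
+  )
+cal_sketch
+```
+
+Every row with a positive weight in the risky portfolio exhibits the same Sharpe ratio of 0.3, which is exactly the slope of this (hypothetical) CAL.
+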
+Let's illustrate the Sharpe ratios and CALs using a few example portfolios. To construct combined portfolios, we need a measure for the risk-free rate. One way to calculate risk-free rates is to use the 13-week T-bill rate (with the symbol ^IRX). Since the prices are quoted in annualized percentage yields, we have to divide them by 100 and convert them to daily rates using 252 trading days.^[Note that the resulting returns have a 99% correlation with the Fama-French risk-free rate.] + +```{r} +risk_free_daily <- download_data( + type = "stock_prices", symbol = "^IRX", + start_date = "2019-10-01", end_date = "2024-09-30" +) |> + mutate( + risk_free = (1 + adjusted_close / 100)^(1 / 252) - 1 + ) |> + select(date, risk_free) |> + drop_na() +``` + +We also prepare the vector of expected returns and calculate the variance-covariance matrix. + +```{r} +mu <- assets$mu +sigma <- returns_daily |> + pivot_wider(names_from = symbol, values_from = ret) |> + select(-date) |> + cov() +``` + +As a first example portfolio, we construct a portfolio of equal weights across the risky assets. + +```{r} +number_of_assets <- nrow(assets) +omega_ew <- rep(1 / number_of_assets, number_of_assets) + +summary_ew <- tibble( + mu = as.numeric(t(omega_ew) %*% mu), + sigma = as.numeric(sqrt(t(omega_ew) %*% sigma %*% omega_ew)), + type = "Equal-Weighted Portfolio" +) +``` + +As another example, we construct a portfolio using random weights in the risky assets. + +```{r} +set.seed(1234) +omega_random <- runif(number_of_assets, -1, 1) +omega_random <- omega_random / sum(omega_random) + +summary_random <- tibble( + mu = as.numeric(t(omega_random) %*% mu), + sigma = as.numeric(sqrt(t(omega_random) %*% sigma %*% omega_random)), + type = "Randomly-Weighted Portfolio" +) +``` + +The last portfolio is invested only in the risk-free asset. + +```{r} +summary_risk_free <- tibble( + mu = mean(risk_free_daily$risk_free), + sigma = 0, + type = "Risk-Free Asset" +) +``` + +By combining this risk-free portfolio with the other two risky portfolios, we can draw CALs. To facilitate the computation, we introduce a helper function that calculates the slope of the CALs - the Sharpe ratios. + +```{r} +calculate_sharpe_ratio <- function(mu, sigma, risk_free) { + as.numeric(mu - risk_free) / sigma +} + +summaries <- bind_rows(assets, summary_ew, summary_random, summary_risk_free) + +summaries <- summaries |> + mutate( + sharpe_ratio = if_else( + str_detect(type, "Portfolio"), + calculate_sharpe_ratio(mu, sigma, risk_free = summary_risk_free$mu), + NA + ), + risk_free = summary_risk_free$mu + ) +``` + +@fig-301 now plots the CALs that connect the risk-free asset to the equal-weighted and randomly-weighted portfolio, respectively. We can see that the CALs look very similar although both portfolios have different risk-return profiles. + +```{r} +#| label: fig-301 +#| fig-cap: "Points correspond to individual assets, crosses to portfolios." +#| fig-alt: "Title: Average returns and volatilities of Dow index constituents with capital allocation lines. The figure shows a scatter plot with volatilities on the horizontal axis and average returns on the vertical axis. In addition, the figure shows capital allocation lines that connect the risk-free asset to the equal-weighted and randomly-weighted portfolio, respectively."
+#| warning: false +fig_cal <- summaries |> + ggplot(aes(x = sigma, y = mu)) + + geom_abline(aes(intercept = risk_free, slope = sharpe_ratio, color = type), + linetype = "dashed", linewidth = 1) + + geom_point(data = summaries |> filter(is.na(type))) + + geom_point(data = summaries |> filter(!is.na(type)), shape = 4, size = 4) + + geom_label_repel(aes(label = type)) + + scale_x_continuous(labels = percent) + + scale_y_continuous(labels = percent) + + labs( + x = "Volatility", y = "Average return", + title = "Average returns and volatilities of Dow index constituents with capital allocation lines" + ) + + theme(legend.position = "none") +fig_cal +``` + +## The Tangency Portfolio + +So far, we have only chosen arbitrary portfolios. Now, we want to look for an optimal portfolio. In particular, we look for a portfolio that maximizes the Sharpe ratio: + +$$\max_\omega \frac{\omega' \mu - r_f}{\sqrt{\omega' \Sigma \omega}},$$ +while staying **fully invested**: + +$$ \omega'\iota = 1.$$ + +The resulting optimal portfolio is called the *tangency portfolio* (and you will see in a minute why). We can calculate the tangency portfolio using the following **analytic solution**^[See [this online resource](https://bookdown.org/compfinezbook/introcompfinr/Efficient-portfolios-of.html#computing-the-tangency-portfolio-using-matrix-algebra) provided by @Zivot2026 for details on the derivation.] + +$$\omega_{tan}=\frac{\Sigma^{-1}(\mu-r_f)}{\iota'\Sigma^{-1}(\mu-r_f)}$$ +The next code chunk implements the analytic solution. + +```{r} +omega_tangency <- solve(sigma) %*% (mu - summary_risk_free$mu) +omega_tangency <- as.vector(omega_tangency / sum(omega_tangency)) + +summary_tangency <- tibble( + mu = as.numeric(t(omega_tangency) %*% mu), + sigma = as.numeric(sqrt(t(omega_tangency) %*% sigma %*% omega_tangency)), + type = "Tangency Portfolio", + sharpe_ratio = calculate_sharpe_ratio(mu, sigma, risk_free = summary_risk_free$mu), + risk_free = summary_risk_free$mu +) +```
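+
+As a quick, optional sanity check on the claim that the tangency portfolio maximizes the Sharpe ratio, we can compare its Sharpe ratio to those of the example portfolios from above. The `bind_rows()` call in this sketch is only for inspection; the chapter adds the tangency portfolio to `summaries` in the next code chunk.
+
+```{r}
+# In-sample comparison of Sharpe ratios: the tangency portfolio should
+# appear at the top of this ranking by construction.
+bind_rows(summaries, summary_tangency) |>
+  filter(!is.na(sharpe_ratio)) |>
+  select(type, mu, sigma, sharpe_ratio) |>
+  arrange(desc(sharpe_ratio))
+```
+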
+## The Capital Market Line + +Now that we have the tangency portfolio, we want to derive the CAL by combining it with the risk-free asset. This combination is called the *Capital Market Line* (CML): + +$$\mu_{c} = r_f +\sigma_c \frac{\omega_{tan}'\mu-r_f}{\sqrt{\omega_{tan}' \Sigma \omega_{tan}}}$$ + +The CML describes the *best risk-return trade-off* for portfolios that contain the risk-free asset and the tangency portfolio. We can plot the CML by recycling the code from above: + +```{r} +#| label: fig-302 +#| fig-cap: "Points correspond to individual assets, crosses to portfolios." +#| fig-alt: "Title: Average returns and volatilities of Dow index constituents with capital allocation lines. The figure shows a scatter plot with volatilities on the horizontal axis and average returns on the vertical axis. In addition, the figure shows capital allocation lines that connect the risk-free asset to the equal-weighted, randomly-weighted, and tangency portfolio, respectively." +#| warning: false +summaries <- bind_rows(summaries, summary_tangency) + +fig_cml <- summaries |> + ggplot(aes(x = sigma, y = mu)) + + geom_abline(aes(intercept = risk_free, slope = sharpe_ratio, color = type), + linetype = "dashed", linewidth = 1) + + geom_point(data = summaries |> filter(is.na(type))) + + geom_point(data = summaries |> filter(!is.na(type)), shape = 4, size = 4) + + ggrepel::geom_label_repel(aes(label = type)) + + scale_x_continuous(labels = percent) + + scale_y_continuous(labels = percent) + + labs(x = "Volatility", y = "Average return", + title = "Average returns and volatilities of Dow index constituents with the capital market line") + + theme(legend.position = "none") +fig_cml +``` + +## Individual Assets and the Market Portfolio + +While our previous analysis focused on portfolios, the CAPM's real power lies in explaining the pricing of individual assets. The model provides two key insights: + +1. All rational investors prefer portfolios on the CML to individual assets or any other portfolios. +1. The tangency portfolio serves as the optimal risky portfolio for all investors. + +This leads to a powerful conclusion: in equilibrium, all investors hold some combination of the risk-free asset and the tangency portfolio, regardless of their risk preferences. Their only choice is how much to allocate to each. + +Let's examine how individual assets relate to this optimal portfolio. We can calculate each asset $i$'s expected excess return (return above the risk-free rate) and relate it to the asset's comovement with the tangency portfolio: + +$$\mu_i - r_f = \beta_i \cdot (\omega_{tan}'\mu - r_f),$$ + +where + +$$\beta_i = \frac{\text{Cov}(r_i, \omega_{tan}'r)}{\omega_{tan}' \Sigma \omega_{tan}}$$ + +is called the **asset beta**. The mathematics of CAPM thus shows that an asset's expected excess return must be proportional to its risk contribution to the tangency portfolio. + +We can calculate these excess returns as follows: + +```{r} +tangency_weights <- tibble( + symbol = assets$symbol, + omega_tangency = omega_tangency +) + +returns_tangency_daily <- returns_daily |> + left_join(tangency_weights, join_by(symbol)) |> + group_by(date) |> + summarize(mkt_ret = weighted.mean(ret, omega_tangency)) + +returns_excess_daily <- returns_daily |> + left_join(returns_tangency_daily, join_by(date)) |> + left_join(risk_free_daily, join_by(date)) |> + mutate(ret_excess = ret - risk_free, + mkt_excess = mkt_ret - risk_free) |> + select(symbol, date, ret_excess, mkt_excess) +``` + +For beta estimation, we employ regression analysis through a custom function. The `estimate_beta()` function runs a regression without an intercept (note the -1 in the formula) to directly capture the relationship between asset excess returns and market excess returns. We leverage `tidyverse`'s nested dataframes to efficiently run these regressions for all assets simultaneously. The `map_dbl()` function applies our regression to each nested dataset and extracts the beta coefficient, giving us a clean data frame of assets and their corresponding betas. + +```{r} +estimate_beta <- function(data) { + fit <- lm("ret_excess ~ mkt_excess - 1", data = data) + coefficients(fit) +} + +beta_results <- returns_excess_daily |> + nest(data = -symbol) |> + mutate(beta = map_dbl(data, estimate_beta)) +```
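+
+Before plotting, we can cross-check the regression estimates against the covariance formula for beta given above. The sketch below computes $\text{Cov}(r_i, \omega_{tan}'r) / (\omega_{tan}' \Sigma \omega_{tan})$ directly from the `sigma` matrix and the tangency weights; it assumes that the row order of `assets` matches the column order of `sigma`, just as the construction of `tangency_weights` above does. The two sets of estimates should be very close, although they need not match exactly because the regressions use excess returns and omit the intercept.
+
+```{r}
+# Sketch: betas implied by the covariance formula, using objects from above.
+beta_check <- tibble(
+  symbol = assets$symbol,
+  beta_covariance = as.vector(sigma %*% omega_tangency) /
+    as.numeric(t(omega_tangency) %*% sigma %*% omega_tangency)
+)
+
+beta_check |>
+  left_join(beta_results |> select(symbol, beta), join_by(symbol)) |>
+  mutate(difference = beta_covariance - beta)
+```
+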
+@fig-303 presents these estimated betas. By ordering assets according to their beta values, we create a clear ranking of market sensitivity across our sample of stocks. This visualization makes it easy to identify which stocks are more defensive (beta < 1) versus aggressive (beta > 1) in their relationship with market movements. + +```{r} +#| label: fig-303 +#| fig-cap: "Estimates are based on returns adjusted for dividend payments and stock splits." +#| fig-alt: "Title: Estimated asset betas based on the tangency portfolio for Dow index constituents. The figure shows a bar chart with estimated asset betas for each Dow index constituent." +fig_betas <- beta_results |> + ggplot(aes(x = beta, y = fct_reorder(symbol, beta))) + + geom_col() + + labs( + x = "Estimated asset beta", y = NULL, + title = "Estimated asset betas based on the tangency portfolio for Dow index constituents" + ) +fig_betas +``` + +@fig-304 brings together theory and empirics by plotting the security market line (SML). We first calculate excess returns for the assets and combine them with our beta estimates. The plot includes a 45-degree line representing the theoretical CAPM relationship, where an asset's excess return should exactly equal its beta times the market excess return. By overlaying the actual data points and highlighting specific stocks (BA and NVDA), we can visually assess how well CAPM explains the cross-section of returns in our sample. This plot effectively demonstrates both the strengths and potential limitations of CAPM as an asset pricing model. + +```{r} +#| label: fig-304 +#| fig-cap: "Estimates are based on returns adjusted for dividend payments and stock splits and using the tangency portfolio as a measure for the market." +#| fig-alt: "Title: Estimated CAPM-betas and average returns for Dow index constituents. The figure shows a scatter plot with estimated asset betas on the horizontal axis and average returns on the vertical axis. All points fall onto the 45-degree line." +assets <- assets |> + mutate(mu_excess = mu - summary_risk_free$mu) |> + left_join(beta_results, join_by(symbol)) + +fig_betas_returns <- assets |> + ggplot(aes(x = beta, y = mu_excess)) + + geom_abline(intercept = 0, + slope = summary_tangency$mu - summary_risk_free$mu) + + geom_point() + + geom_label_repel(data = assets |> filter(symbol %in% c("BA", "NVDA")), + aes(label = symbol)) + + scale_y_continuous(labels = percent) + + labs( + x = "Estimated asset beta", y = "Average return", + title = "Estimated CAPM-betas and average returns for Dow index constituents" + ) +fig_betas_returns +``` + +## CAPM in Practice + +While our previous analysis used the tangency portfolio to derive CAPM's key insights, implementing this approach in practice presents several challenges. Computing the tangency portfolio requires estimating expected returns and the variance-covariance matrix for potentially thousands of assets. This raises difficult questions: Which assets should we include in our universe? How can we reliably estimate these parameters, especially when the number of assets is large? + +Fortunately, the CAPM provides an elegant solution. The model implies that in equilibrium, the market portfolio must equal the tangency portfolio. This theoretical insight leads to a practical simplification: instead of calculating complex tangency portfolio weights, we can use market capitalization-weighted portfolios as our benchmark. This approach aligns with the theoretical framework while being much simpler to implement. + +It's worth noting that this simplification relies on several key assumptions underlying the CAPM: + +- The model describes equilibrium in a single-period economy.
+- Markets are frictionless, with no transaction costs or taxes. +- All investors can borrow and lend at the risk-free rate. +- Investors share the same expectations about returns and risks. +- Investors are rational, seeking to maximize returns for a given level of risk. + +Despite these stringent assumptions (or rather because of them), the CAPM has become a cornerstone of modern finance. Its simplicity makes it an excellent foundation for more sophisticated models and a useful starting point for practical applications. + +The model's practical implementation centers on SML. An asset's expected return is given by: + +$$\mu_i = r_f + \beta_i \cdot (\mu_m - r_f),$$ + +where beta measures the asset's systematic risk: + +$$\beta_i = \frac{\sigma_{im}}{\sigma_m^2}.$$ + +Here, $\mu_m$ represents the expected market return, $\sigma_{im}$ is the covariance between the asset and market returns, and $\sigma_m^2$ is the market variance. This formulation is particularly useful for performance evaluation. + +To assess an asset's performance, we introduce alpha ($\alpha_i$), which measures the difference between actual and expected returns: + +$$\mu_i - r_f = \alpha_i + \beta_i \cdot (\mu_m - r_f)$$ + +Alpha provides a risk-adjusted performance measure. A positive alpha indicates that the asset outperformed its risk-adjusted benchmark, while a negative alpha suggests underperformance. This makes alpha a valuable tool for investment managers and analysts. + +In practice, we estimate both alpha and beta using regression analysis. The empirical model is: + +$$r_{i,t} - r_{f,t} = \hat{\alpha}_i + \hat{\beta}_i \cdot (r_{m,t} - r_{f,t} ) + \varepsilon_{i,t}, $$ +where $r_{i,t}$ and $r_{m,t}$ are the realized returns of the asset and market portfolio on day $t$, respectively. The error term $\varepsilon_{i,t}$ captures the asset's idiosyncratic risk – the portion of returns unexplained by market movements. + +Let's turn to estimating CAPM parameters using real market data. Instead of using our previously constructed tangency portfolio, we employ the Fama-French market returns, which provide a widely accepted proxy for the market portfolio. These returns are already adjusted to represent excess returns over the risk-free rate, simplifying our analysis. + +```{r} +factors <- download_data( + type = "factors_ff_5_2x3_daily", + start_date = "2019-10-01", end_date = "2024-09-30" +) |> + select(date, mkt_excess, risk_free) +``` + +For our regression analysis, we first prepare the data by calculating excess returns for each stock. We join our daily returns with the Fama-French factors and subtract the risk-free rate to obtain excess returns. The `estimate_capm()` function then implements the regression equation we previously discussed: + +```{r} +returns_excess_daily <- returns_daily |> + left_join(factors, join_by(date)) |> + mutate(ret_excess = ret - risk_free) |> + select(symbol, date, ret_excess, mkt_excess) + +estimate_capm <- function(data) { + fit <- lm("ret_excess ~ mkt_excess", data = data) + tibble( + coefficient = c("alpha", "beta"), + estimate = coefficients(fit), + t_statistic = summary(fit)$coefficients[, "t value"] + ) +} + +capm_results <- returns_excess_daily |> + nest(data = -symbol) |> + mutate(capm = map(data, estimate_capm)) |> + unnest(capm) |> + select(symbol, coefficient, estimate, t_statistic) +``` + +The results are particularly interesting when we visualize the alphas across our sample of Dow Jones constituents. 
@fig-305 reveals the cross-sectional distribution of risk-adjusted performance, with positive values indicating outperformance and negative values indicating underperformance relative to what CAPM would predict. Statistical significance is indicated through color coding, showing which alphas are reliably different from zero at the 95% confidence level. + +```{r} +#| label: fig-305 +#| fig-cap: "Estimates are based on returns adjusted for dividend payments and stock splits and using the Fama-French market excess returns as a measure for the market." +#| fig-alt: "Title: Estimated CAPM alphas for Dow index constituents. The figure shows a bar chart with estimated alphas and indicates whether an estimate is statistically significant at 95%. Only Nvidia exhibits a statistically significant positive alpha." +fig_alpha <- capm_results |> + filter(coefficient == "alpha") |> + mutate(is_significant = abs(t_statistic) >= 1.96) |> + ggplot(aes(x = estimate, y = fct_reorder(symbol, estimate), + fill = is_significant)) + + geom_col() + + scale_x_continuous(labels = percent) + + labs( + x = "Estimated asset alphas", y = NULL, fill = "Significant at 95%?", + title = "Estimated CAPM alphas for Dow index constituents" + ) +fig_alpha +``` + +Most notably, our analysis shows that only NVIDIA (NVDA) exhibits a statistically significant positive alpha during our sample period. This finding aligns with the exceptional performance of technology stocks, particularly those involved in AI and chip manufacturing, but suggests that most Dow components' returns can be explained by their market exposure alone. + +## Shortcomings & Extensions + +While the CAPM's elegance and simplicity have made it a cornerstone of modern finance, the model faces several important challenges in practice. Understanding these limitations is crucial for both academic research and practical applications. + +A fundamental challenge lies in the identification of the market portfolio. The CAPM theory requires a truly universal market portfolio that includes all investable assets – not just stocks, but also real estate, private businesses, human capital, and even intangible assets. In practice, we must rely on proxies like the S&P 500, DAX, or TOPIX. The choice of market proxy can significantly impact our estimates and may need to be tailored to specific contexts. For instance, a U.S.-focused investor might use the S&P 500, while a Japanese investor might prefer the TOPIX. + +Another crucial limitation concerns the stability of beta over time. The model assumes that an asset's systematic risk remains constant, but this rarely holds in practice. Companies undergo significant changes that can affect their market sensitivity: they may alter their capital structure, enter new markets, face new competitors, or transform their business models. Consider how tech companies' betas might change as they mature from growth startups to established enterprises, or how a retailer's beta might shift as it expands its online presence. + +Perhaps most importantly, empirical evidence suggests that systematic risk alone cannot fully explain asset returns. Numerous studies have documented patterns in stock returns that CAPM cannot explain. Small-cap stocks tend to outperform large-cap stocks, and value stocks (those with high book-to-market ratios) tend to outperform growth stocks, even after adjusting for market risk. These "anomalies" suggest that investors may care about multiple dimensions of risk beyond market sensitivity. 
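+
+One quick, informal way to gauge the beta-stability concern is to re-estimate betas on subsamples and compare them. The following sketch is an illustration rather than a formal stability test; it reuses the `returns_excess_daily` data frame (based on the Fama-French market factor) and the `estimate_beta()` function defined earlier in this chapter and simply splits each stock's sample into an earlier and a later half.
+
+```{r}
+# Split each stock's sample at its median date and re-estimate the beta
+# in both halves to see how much the estimates drift across subperiods.
+beta_stability <- returns_excess_daily |>
+  group_by(symbol) |>
+  mutate(period = if_else(date <= median(date), "first_half", "second_half")) |>
+  ungroup() |>
+  nest(data = -c(symbol, period)) |>
+  mutate(beta = map_dbl(data, estimate_beta)) |>
+  select(-data) |>
+  pivot_wider(names_from = period, values_from = beta) |>
+  mutate(difference = second_half - first_half) |>
+  arrange(desc(abs(difference)))
+```
+
+Stocks whose two subperiod estimates differ substantially are exactly the cases where a single full-sample beta can be a misleading summary of market sensitivity.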
+ +These limitations have spawned a rich literature of alternative and extended models. The Fama-French three-factor model represents a seminal extension, adding two factors to capture the size and value effects: + +- The SMB (Small Minus Big) factor captures the tendency of small stocks to outperform large stocks, as we discuss in our chapter [Size Sorts and P-Hacking](size-sorts-and-p-hacking.qmd). +- The HML (High Minus Low) factor captures the tendency of value stocks to outperform growth stocks, as we show in [Value and Bivariate Sorts](value-and-bivariate-sorts.qmd). + +Building on this framework, the Fama-French five-factor model adds two more dimensions, which we later discuss in [Replicating Fama-French Factors](replicating-fama-and-french-factors.qmd): + +- The RMW (Robust Minus Weak) factor captures the outperformance of companies with strong operating profitability +- The CMA (Conservative Minus Aggressive) factor reflects the tendency of companies with conservative investment policies to outperform those with aggressive investment policies + +The field continues to evolve with various theoretical and empirical innovations. The Consumption CAPM links asset prices to macroeconomic risks through aggregate consumption. The Conditional CAPM allows risk premiums and betas to vary with the business cycle. The Carhart four-factor model adds momentum to the three-factor framework, while the Q-factor model and investment CAPM provide alternative theoretical foundations rooted in corporate finance. + +Despite its limitations, the CAPM remains valuable as a conceptual framework and practical tool. Its core insight – that only systematic risk should be priced in equilibrium – continues to influence how we think about risk and return. Understanding both its strengths and weaknesses helps us apply it more effectively and appreciate the contributions of newer models that build upon its foundation. + +## Key Takeaways + +The CAPM provides a fundamental framework for understanding asset pricing and risk-return relationships. The key insights from this chapter are: + +- The CAPM is an equilibrium model that assumes a frictionless economy where investors can freely trade without costs and borrow or lend at the risk-free rate +- In equilibrium, all investors hold some combination of the market portfolio and the risk-free asset, with their risk preferences determining only the proportion of each of the two. +- Expected returns follow a linear relationship with systematic risk: Assets with higher market sensitivity (beta) should earn higher returns to compensate investors for bearing undiversifiable risk. +- Beta measures an asset's sensitivity to market movements and can be estimated through linear regression of asset excess returns on market excess returns. +- While the model has limitations, its insights about systematic versus idiosyncratic risk continue to influence both academic research and practical investment decisions. + +## Exercises + +1. Download daily returns for a German stock of your choice and the S&P 500 index for the past five years. Calculate the stock's beta and interpret its meaning. How does your estimate change if you use monthly instead of daily returns? +1. Compare the betas of stocks estimated using different market proxies (e.g., S&P 500, Russell 3000, MSCI World). How do the differences in market definition affect your conclusions about systematic risk? +1. Select a mutual fund and estimate its alpha and beta relative to its benchmark index. 
Is the fund's performance statistically significant after accounting for market risk? How do your conclusions change if you use a different benchmark? +1. Compare betas of multinational companies using both local and global market indices. How do the estimates differ? What might explain these differences? \ No newline at end of file diff --git a/r/changelog.qmd b/r/changelog.qmd index 8cc7022f..d005da7c 100644 --- a/r/changelog.qmd +++ b/r/changelog.qmd @@ -35,8 +35,8 @@ You can find every single change in our [commit history](https://github.com/tidy - [May 23, 2023, Commit d5e355c:](https://github.com/tidy-finance/website/commit/d5e355ca6cf117bcc193376124c46ca1b2e9ed1d) We update the workflow to `collect()` tables from `tidy_finance.sqlite`: To make variable selection more obvious, we now explicitly `select()` columns before collecting. As part of the pull request [Commit 91d3077](https://github.com/tidy-finance/website/pull/42/commits/91d3077ee75a3ab71db684001d0562a53031c73c), we now select excess returns instead of net returns in the Chapter [Fama-MacBeth Regressions](fama-macbeth-regressions.qmd). - [May 20, 2023, Commit be0f0b4:](https://github.com/tidy-finance/website/commit/be0f0b4b156487299369c682a4d47d1d10ec5485) We include `NA`-observations in the Mergent filters in Chapter [TRACE and FISD](trace-and-fisd.qmd). - [May 17, 2023, Commit 2209bb1:](https://github.com/tidy-finance/website/commit/2209bb133d2080eae52cbbc5ec14e4550ff186d3) We changed the `assign_portfolio()`-functions in Chapters [Univariate Portfolio Sorts](univariate-portfolio-sorts.qmd), [Size Sorts and p-Hacking](size-sorts-and-p-hacking.qmd), [Value and Bivariate Sorts](value-and-bivariate-sorts.qmd), and [Replicating Fama and French Factors](replicating-fama-and-french-factors.qmd). Additionally, we added a small explanation to potential issues with the function for clustered sorting variables in Chapter [Univariate Portfolio Sorts](univariate-portfolio-sorts.qmd). -- [May 12, 2023, Commit 54b76d7:](54b76d7c1977c3759ed8bd641940d17add1a755b) We removed magic numbers in Chapter [Introduction to Tidy Finance](introduction-to-tidy-finance.qmd#the-efficient-frontier) and introduced the `scales` packages already in the introduction chapter to reduce scaling issues in figures. +- [May 12, 2023, Commit 54b76d7:](54b76d7c1977c3759ed8bd641940d17add1a755b) We removed magic numbers in Chapter Introduction to Tidy Finance and introduced the `scales` packages already in the introduction chapter to reduce scaling issues in figures. - [Mar. 30, 2023, Issue 29:](https://github.com/tidy-finance/website/issues/29) We upgraded to `tidyverse` 2.0.0 and R 4.2.3 and removed all explicit loads of `lubridate`. -- [Feb. 15, 2023, Commit bfda6af: ](https://github.com/tidy-finance/website/commit/bfda6af6169a42f433568e32b7a9cce06cb948ac) We corrected an error in the calculation of the annualized average return volatility in the Chapter [Introduction to Tidy Finance](introduction-to-tidy-finance.qmd#the-efficient-frontier). -- [Mar. 06, 2023, Commit 857f0f5: ](https://github.com/tidy-finance/website/commit/857f0f5893a8e7e4c2b4475e1461ebf3d0abe2d6) We corrected an error in the label of [Figure 6](introduction-to-tidy-finance.qmd#fig-106), which wrongly claimed to show the efficient tangency portfolio. +- [Feb. 
15, 2023, Commit bfda6af: ](https://github.com/tidy-finance/website/commit/bfda6af6169a42f433568e32b7a9cce06cb948ac) We corrected an error in the calculation of the annualized average return volatility in the Chapter Introduction to Tidy Finance.
+- [Mar. 06, 2023, Commit 857f0f5: ](https://github.com/tidy-finance/website/commit/857f0f5893a8e7e4c2b4475e1461ebf3d0abe2d6) We corrected an error in the label of Figure 6 in Chapter Introduction to Tidy Finance, which wrongly claimed to show the efficient tangency portfolio.
- [Mar. 09, 2023, Commit fae4ac3: ](https://github.com/tidy-finance/website/commit/fae4ac3fd12797d66a48f43af3d8e84ded694f13) We corrected a typo in the definition of the power utility function in Chapter [Portfolio Performance](parametric-portfolio-policies.qmd#portfolio-performance). The utility function implemented in the code is now consistent with the text.
diff --git a/r/discounted-cash-flow-analysis.qmd b/r/discounted-cash-flow-analysis.qmd
new file mode 100644
index 00000000..9aab5c78
--- /dev/null
+++ b/r/discounted-cash-flow-analysis.qmd
@@ -0,0 +1,398 @@
+---
+title: Discounted Cash Flow Analysis
+metadata:
+  pagetitle: DCF with R
+  description-meta: Learn how to use the programming language R to value companies using discounted cash flow analysis.
+---
+
+In this chapter, we address a fundamental question: what is the value of a company? Company valuation is a critical tool that helps us determine the economic value of a business. Whether it’s for investment decisions, mergers and acquisitions, or financial reporting, understanding a company’s value is important. But valuation isn’t just about assigning a number - it’s about providing a framework for making informed decisions. For example, investors use valuation to identify whether a stock is under- or overvalued, and companies rely on valuation for strategic decisions, like pricing an acquisition or preparing for an IPO.
+
+Company valuation methods broadly fall into three categories:\index{Valuation methods}
+
+- Market-based approaches compare companies using relative metrics like Price-to-Earnings ratios
+- Asset-based methods focus on the net value of a company's tangible and intangible assets
+- Income-based techniques value companies based on their ability to generate future cash flows
+
+We focus on DCF analysis, an income-based approach, because it captures three crucial aspects of valuation:\index{DCF components} First, DCF explicitly accounts for the time value of money - the principle that a dollar today is worth more than a dollar in the future. By discounting future cash flows to present value, we incorporate both time preferences and risk.\index{Time value of money} Second, DCF is inherently forward-looking, making it particularly suitable for companies where historical performance may not fully reflect future potential. This characteristic is especially relevant when valuing growth companies or analyzing new business opportunities. Third, DCF analysis is flexible enough to accommodate various business models and capital structures, making it applicable across different industries and company sizes.
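+
+To make the time value of money concrete, here is a minimal numerical sketch of discounting a single cash flow; the amount, rate, and horizon are purely illustrative assumptions, not inputs of the analysis that follows.
+
+```{r}
+# Illustrative only: a hypothetical cash flow of 100 received in 5 years,
+# discounted at an assumed required return of 8% per year.
+cash_flow <- 100
+discount_rate <- 0.08
+years_ahead <- 5
+
+present_value <- cash_flow / (1 + discount_rate)^years_ahead
+present_value
+```
+
+Every DCF result in this chapter ultimately repeats this one-line present value calculation across a stream of forecasted cash flows and a terminal value.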
+ +A DCF valuation consists of three key components:\index{DCF structure} + +- Free Cash Flow (FCF) forecasts represent the expected future cash available for distribution to investors after accounting for operating expenses, taxes, and investments.\index{Free Cash Flow} +- Terminal value captures the company's value beyond the explicit forecast period, often representing a significant portion of the total valuation.\index{Terminal value} +- Discount rate, typically the Weighted Average Cost of Capital (WACC), adjusts future cash flows to present value by incorporating risk and capital structure considerations.\index{WACC} + +In this chapter, we rely on the following packages to build a simple DCF analysis: + +```{r} +#| message: false +#| warning: false +library(tidyverse) +library(tidyfinance) +library(scales) +library(fmpapi) +``` + +## Prepare Data + +Before we can perform a DCF analysis, we need historical financial data to inform our forecasts. We use the Financial Modeling Prep (FMP) API to download financial statements. The `fmpapi` package provides a convenient interface for accessing this data in a tidy format.\index{Data!FMP API} + +```{r} +#| cache: true +symbol <- "MSFT" + +income_statements <- fmp_get("income-statement", symbol, list(period = "annual", limit = 5)) +cash_flow_statements <- fmp_get("cash-flow-statement", symbol, list(period = "annual", limit = 5)) +``` + +Our analysis centers on Free Cash Flow (FCF), which represents the cash available to all investors after the company has covered its operational needs and capital investments.\index{Free Cash Flow} We calculate FCF using the following formula: +$$\text{FCF} = \text{EBIT} + \text{Depreciation & Amortization} - \text{Taxes} + \Delta \text{Working Capital} - \text{CAPEX}$$ +Each component of this formula serves a specific purpose in capturing the company's cash-generating ability:\index{Free Cash Flow components} + +- EBIT (Earnings Before Interest and Taxes) measures core operating profit +- Depreciation & Amortization accounts for non-cash expenses +- Taxes reflect actual cash payments to tax authorities +- Changes in Working Capital capture cash tied up in operations +- Capital Expenditures (CAPEX) represent investments in long-term assets + +We can implement this calculation by combining and transforming our financial statement data: + +```{r} +dcf_data <- income_statements |> + mutate( + ebit = net_income + income_tax_expense - interest_expense - interest_income + ) |> + select( + year = calendar_year, + ebit, revenue, depreciation_and_amortization, taxes = income_tax_expense + ) |> + left_join( + cash_flow_statements |> + select(year = calendar_year, + delta_working_capital = change_in_working_capital, + capex = capital_expenditure), join_by(year) + ) |> + mutate( + fcf = ebit + depreciation_and_amortization - taxes + delta_working_capital - capex + ) |> + arrange(year) +``` + +## Forecast Free-Cash-Flow + +After calculating historical FCF, we need to project it into the future. While historical data provides a foundation, forecasting requires both quantitative analysis and qualitative judgment. We use a ratio-based approach that links all FCF components to revenue growth, making our forecasts more tractable.\index{Free Cash Flow!Forecasting} + +First, we express each FCF component as a ratio relative to revenue. This standardization helps identify trends and makes forecasting more systematic. @fig-500 shows the historical evolution of these key financial ratios. 
+ +```{r} +#| label: fig-500 +#| fig-cap: "Ratios are based on financial statements as provided through the FMP API." +#| fig-alt: "Title: Key financial ratios of Microsoft between 2020 and 2024. The figure shows a line chart with years on the horizontal axis and financial ratios on the vertical axis." +dcf_data <- dcf_data |> + mutate( + revenue_growth = revenue / lag(revenue) - 1, + operating_margin = ebit / revenue, + da_margin = depreciation_and_amortization / revenue, + taxes_to_revenue = taxes / revenue, + delta_working_capital_to_revenue = delta_working_capital / revenue, + capex_to_revenue = capex / revenue + ) + +fig_financial_ratios <- dcf_data |> + pivot_longer(cols = c(operating_margin:capex_to_revenue)) |> + ggplot(aes(x = year, y = value, color = name)) + + geom_line() + + scale_x_continuous(breaks = pretty_breaks()) + + scale_y_continuous(labels = percent) + + labs( + x = NULL, y = NULL, color = NULL, + title = "Key financial ratios of Microsoft between 2020 and 2024" + ) +fig_financial_ratios +``` + +The operating margin, for instance, represents how much of each revenue dollar translates into operating profit (EBIT), while the CAPEX-to-revenue ratio indicates the company's investment intensity.\index{Financial ratios} + +For our DCF analysis, we need to project these ratios into the future. These projections should reflect both historical patterns and forward-looking considerations such as: industry trends and competitive dynamics, company-specific growth initiatives, expected operational efficiency improvements, planned capital investments, or working capital management strategies. + +We demonstrate this forecasting approach in @fig-501: + +```{r} +#| label: fig-501 +#| fig-cap: "Realized ratios are based on financial statements as provided through the FMP API, while forecasts are manually defined." +#| fig-alt: "Title: Key financial ratios and ad-hoc forecasts of Microsoft between 2020 and 2029. The figure shows a line chart with years on the horizontal axis and financial ratios and their forecasts on the vertical axis." +dcf_data_forecast_ratios <- tribble( + ~year, ~operating_margin, ~da_margin, ~taxes_to_revenue, ~delta_working_capital_to_revenue, ~capex_to_revenue, + 2025, 0.41, 0.09, 0.08, 0.001, -0.2, + 2026, 0.42, 0.09, 0.07, 0.001, -0.22, + 2027, 0.43, 0.09, 0.06, 0.001, -0.2, + 2028, 0.44, 0.09, 0.06, 0.001, -0.18, + 2029, 0.45, 0.09, 0.06, 0.001, -0.16 +) |> + mutate(type = "Forecast") + +dcf_data <- dcf_data |> + mutate(type = "Realized") |> + bind_rows(dcf_data_forecast_ratios) + +fig_financial_ratios_forecast <- dcf_data |> + pivot_longer(cols = c(operating_margin:capex_to_revenue)) |> + ggplot(aes(x = year, y = value, color = name, linetype = rev(type))) + + geom_line() + + scale_x_continuous(breaks = pretty_breaks()) + + scale_y_continuous(labels = percent) + + labs( + x = NULL, y = NULL, color = NULL, linetype = NULL, + title = "Key financial ratios and ad-hoc forecasts of Microsoft between 2020 and 2029" + ) +fig_financial_ratios_forecast +``` + +The final step in our FCF forecast is projecting revenue growth. While there are multiple approaches to this task, we demonstrate a GDP-based method that links company growth to macroeconomic forecasts.\index{Revenue growth} + +We use GDP growth forecasts from the [IMF World Economic Outlook (WEO)](https://www.imf.org/en/Publications/WEO/weo-database/2024/October/weo-report?c=111,&s=NGDP_RPCH,&sy=2020&ey=2029&ssm=0&scsm=1&scc=0&ssd=1&ssc=0&sic=1&sort=country&ds=.&br=1) database as our baseline. 
The WEO provides comprehensive economic projections, though it's important to note that these forecasts are periodically revised as new data becomes available.\index{Data!IMF WEO}
+
+```{r}
+gdp_growth <- tibble(
+  year = 2020:2029,
+  gdp_growth = c(-0.02163, 0.06055, 0.02512, 0.02887, 0.02765, 0.02153, 0.02028, 0.02120, 0.02122, 0.02122)
+)
+
+dcf_data <- dcf_data |>
+  left_join(gdp_growth, join_by(year))
+```
+
+Our approach models revenue growth as a linear function of GDP growth. This relationship captures the intuition that company revenues often move in tandem with broader economic activity, though usually with different sensitivity.\index{Growth modeling}
+
+```{r}
+revenue_growth_model <- dcf_data |>
+  lm(revenue_growth ~ gdp_growth, data = _) |>
+  coefficients()
+
+dcf_data <- dcf_data |>
+  mutate(
+    revenue_growth_modeled = revenue_growth_model[1] + revenue_growth_model[2] * gdp_growth,
+    revenue_growth = if_else(type == "Forecast", revenue_growth_modeled, revenue_growth)
+  )
+```
+
+The model estimates two parameters: (i) an intercept that captures the company's baseline growth, and (ii) a slope coefficient that measures the company's sensitivity to GDP changes.
+
+We visualize the historical and projected growth rates using this approach in @fig-502:
+
+```{r}
+#| label: fig-502
+#| fig-cap: "Realized revenue growth rates are based on financial statements as provided through the FMP API, while forecasts are modeled using IMF WEO forecasts."
+#| fig-alt: "Title: GDP growth and Microsoft revenue growth and modeled forecasts between 2020 and 2029. The figure shows a line chart with years on the horizontal axis and gdp growth, revenue growth and their forecasts on the vertical axis."
+fig_growth <- dcf_data |>
+  filter(year >= 2021) |>
+  pivot_longer(cols = c(revenue_growth, gdp_growth)) |>
+  ggplot(aes(x = year, y = value, color = name, linetype = rev(type))) +
+  geom_line() +
+  scale_x_continuous(breaks = pretty_breaks()) +
+  scale_y_continuous(labels = percent) +
+  labs(
+    x = NULL, y = NULL, color = NULL, linetype = NULL,
+    title = "GDP growth and Microsoft revenue growth and modeled forecasts between 2020 and 2029"
+  )
+fig_growth
+```
+
+While more sophisticated approaches exist, including proprietary analyst forecasts or bottom-up market analysis, this method provides a transparent and data-driven starting point for revenue projections.\index{Forecasting methods}
+
+With all components in place - revenue growth projections and financial ratios - we can now calculate our FCF forecasts. We first need to convert our growth rates into revenue projections and then apply our forecasted ratios to compute each FCF component.\index{Free Cash Flow!Computation}
+
+```{r}
+dcf_data$revenue_growth[1] <- 0
+dcf_data$revenue <- dcf_data$revenue[1] * cumprod(1 + dcf_data$revenue_growth)
+
+dcf_data <- dcf_data |>
+  mutate(
+    ebit = operating_margin * revenue,
+    depreciation_and_amortization = da_margin * revenue,
+    taxes = taxes_to_revenue * revenue,
+    delta_working_capital = delta_working_capital_to_revenue * revenue,
+    capex = capex_to_revenue * revenue,
+    fcf = ebit + depreciation_and_amortization - taxes + delta_working_capital - capex
+  )
+```
+
+We visualize the resulting FCF projections in @fig-503:
+
+```{r}
+#| label: fig-503
+#| fig-cap: "Realized free cash flows are based on financial statements as provided through the FMP API, while forecasts combine manually defined ratios with modeled revenue growth."
+#| fig-alt: "Title: Actual and predicted free cash flow for Microsoft from 2020 to 2029. The figure shows a bar chart with years on the horizontal axis and free cash flow on the vertical axis, distinguishing realized values from forecasts."
+fig_fcf <- dcf_data |>
+  ggplot(aes(x = year, y = fcf / 1e9)) +
+  geom_col(aes(fill = type)) +
+  scale_x_continuous(breaks = pretty_breaks()) +
+  scale_y_continuous(labels = comma) +
+  labs(
+    x = NULL, y = "Free Cash Flow (in B USD)", fill = NULL,
+    title = "Actual and predicted free cash flow for Microsoft from 2020 to 2029"
+  )
+fig_fcf
+```
+
+## Continuation Value
+
+A key component of DCF analysis is the continuation value (or terminal value), which represents the company's value beyond the explicit forecast period. This value often constitutes the majority of the total valuation, making its estimation particularly important.\index{Continuation value}
+The most common approach is the Perpetuity Growth Model, which assumes FCF grows at a constant rate indefinitely. The formula for this model is:
+
+$$TV_{T} = \frac{FCF_{T+1}}{r - g}$$
+
+where $TV_{T}$ is the terminal value at time $T$, $FCF_{T+1}$ is the free cash flow in the first year after our forecast period, $r$ is the discount rate (typically WACC, see below), and $g$ is the perpetual growth rate.
+
+The perpetual growth rate $g$ should reflect long-term economic growth potential. A common benchmark is the long-term GDP growth rate, as few companies can sustainably grow faster than the overall economy indefinitely.\index{Growth rate!Perpetual} For instance, average GDP growth over the last 20 years is a sensible anchor (nominal growth has been roughly 4% for the US).
+
+Let's implement the perpetuity growth model:
+
+```{r}
+compute_terminal_value <- function(last_fcf, growth_rate, discount_rate){
+  last_fcf * (1 + growth_rate) / (discount_rate - growth_rate)
+}
+
+last_fcf <- tail(dcf_data$fcf, 1)
+terminal_value <- compute_terminal_value(last_fcf, 0.04, 0.08)
+terminal_value / 1e9
+```
+
+Note that while we use the Perpetuity Growth Model here, practitioners often cross-check their estimates with alternative methods like the exit multiple approach, which bases the terminal value on comparable company valuations.^[See [corporatefinanceinstitute.com](https://corporatefinanceinstitute.com/resources/valuation/exit-multiple/) for an intuitive explanation of the exit multiple approach.]
+\index{Valuation methods!Exit multiple}
+
+## Discount Rates
+
+The final component of our DCF analysis involves discounting future cash flows to present value. We typically use the Weighted Average Cost of Capital (WACC) as the discount rate, as it represents the blended cost of financing for all company stakeholders.\index{WACC}
+
+The WACC formula combines the costs of equity and debt financing:
+
+$$WACC = \frac{E}{D+E} \cdot r^E + \frac{D}{D+E} \cdot r^D \cdot (1 - \tau),$$
+
+where $E$ is the market value of the company’s equity with required return $r^E$, $D$ is the market value of the company's debt with pre-tax return $r^D$, and $\tau$ is the tax rate.
+
+While you can often find estimates of WACC from financial databases or analysts’ reports, sometimes you may need to calculate it yourself. Let’s walk through the practical steps to estimate WACC using real-world data (a small numerical sketch follows the list):
+
+- $E$ is typically measured as the market value of the company’s equity. One common approach is to calculate it by subtracting net debt (total debt minus cash) from the enterprise value.
+- $D$ is often measured using the book value of the company’s debt. While this might not perfectly reflect market conditions, it’s a practical starting point when market data is unavailable.
+- The Capital Asset Pricing Model (CAPM) is a popular method to estimate the cost of equity $r^E$. It considers the risk-free rate, the equity risk premium, and the company’s beta. For a detailed guide on how to estimate the CAPM, we refer to Chapter [Capital Asset Pricing Model](capital-asset-pricing-model.qmd).
+- The return on debt $r^D$ can also be estimated in different ways. For instance, effective interest rates can be calculated as the ratio of interest expense to total debt from financial statements. This gives you a real-world measure of what the company is currently paying. Alternatively, you can look up corporate bond spreads for companies in the same rating group. For highly rated companies like Microsoft, this would reflect their low-risk profile and correspondingly low borrowing costs.
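+
+To see how these ingredients combine, here is a minimal sketch with purely illustrative inputs; every number below (equity and debt values, cost of equity and debt, tax rate) is an assumption chosen for demonstration, not an estimate for Microsoft or any other company.
+
+```{r}
+# Illustrative WACC calculation with assumed, hypothetical inputs.
+equity_value <- 3000    # market value of equity (e.g., in billion USD)
+debt_value <- 100       # book value of debt as a proxy for its market value
+cost_of_equity <- 0.09  # e.g., taken from a CAPM estimate
+cost_of_debt <- 0.04    # e.g., interest expense relative to total debt
+tax_rate <- 0.21
+
+total_value <- equity_value + debt_value
+wacc_example <- equity_value / total_value * cost_of_equity +
+  debt_value / total_value * cost_of_debt * (1 - tax_rate)
+wacc_example
+```
+
+In practice, the hard part is not the formula but sourcing defensible inputs, which is where the resources discussed next come in.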
+
+If you'd rather not estimate WACC manually, there are excellent resources available to help you find industry-specific discount rates. One of the most widely used sources is Aswath Damodaran’s [database](https://pages.stern.nyu.edu/~adamodar/New_Home_Page/datacurrent.html), which provides a wealth of financial data, including estimated discount rates, cash flows, growth rates, multiples, and more. What makes it particularly valuable is its level of detail and coverage of multiple industries and regions. For example, if you’re analyzing a company in the Computer Services sector, as we do here, you can look up the industry’s average WACC and use it as a benchmark for your analysis. The following code chunk downloads the WACC data and extracts the value for this industry:
+
+```{r}
+#| cache: true
+library(readxl)
+
+file <- tempfile(fileext = ".xls")
+
+url <- "https://pages.stern.nyu.edu/~adamodar/pc/datasets/wacc.xls"
+download.file(url, file)
+wacc_raw <- read_xls(file, sheet = 2, skip = 18)
+unlink(file)
+
+wacc <- wacc_raw |>
+  filter(`Industry Name` == "Computer Services") |>
+  pull(`Cost of Capital`)
+```
+
+## Compute DCF Value
+
+Having established all components, we can now compute the total company value. The DCF value combines two elements:\index{DCF!Computation}
+
+- The present value of explicit forecast period cash flows
+- The present value of the terminal value
+
+This is expressed mathematically as:
+
+$$
+\text{Total DCF Value} = \sum_{t=1}^{T} \frac{\text{FCF}_t}{(1 + \text{WACC})^t} + \frac{\text{TV}_{T}}{(1 + \text{WACC})^T}
+$$
+
+where $T$ is the length of our forecast period. Let's implement this calculation:
+
+```{r}
+forecasted_years <- 5
+
+compute_dcf <- function(wacc, growth_rate, years = forecasted_years) {
+  # Discount only the forecasted free cash flows, matching the sum over t = 1, ..., T
+  free_cash_flow <- tail(dcf_data$fcf, years)
+  last_fcf <- tail(free_cash_flow, 1)
+  terminal_value <- compute_terminal_value(last_fcf, growth_rate, wacc)
+
+  present_value_fcf <- free_cash_flow / (1 + wacc)^(1:years)
+  present_value_tv <- terminal_value / (1 + wacc)^years
+  total_dcf_value <- sum(present_value_fcf) + present_value_tv
+  total_dcf_value
+}
+
+compute_dcf(wacc, 0.03) / 1e9
+```
+
+Note that this valuation represents an enterprise value - the total value of the company's operations. To arrive at an equity value, we would need to subtract net debt (total debt minus cash and equivalents).\index{Enterprise value}
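+
+Because the terminal value typically accounts for a large share of the result, it is instructive to check how much of the total it represents. The short sketch below reuses `dcf_data`, `wacc`, and `compute_terminal_value()` from above, mirrors the discounting inside `compute_dcf()`, and uses the same 3% perpetual growth assumption; it is a quick diagnostic rather than an additional modeling step.
+
+```{r}
+# Share of the total DCF value that comes from the discounted terminal value.
+forecast_fcf <- tail(dcf_data$fcf, 5)
+terminal_value_3pct <- compute_terminal_value(tail(forecast_fcf, 1), 0.03, wacc)
+
+pv_fcf <- sum(forecast_fcf / (1 + wacc)^(1:5))
+pv_terminal <- terminal_value_3pct / (1 + wacc)^5
+
+pv_terminal / (pv_fcf + pv_terminal)
+```
+
+A share well above one half is common in practice and foreshadows why the growth and discount rate assumptions deserve the scrutiny they receive in the next section.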
+
+## Sensitivity Analysis
+
+DCF valuation is only as robust as its underlying assumptions. Given the inherent uncertainty in forecasting, it's crucial to understand how changes in key inputs affect our valuation.\index{Sensitivity analysis}
+
+While we could examine sensitivity to various inputs like operating margins or capital expenditure ratios, we focus on two critical drivers:
+
+- The perpetual growth rate, which determines long-term value creation
+- The WACC, which affects how we value future cash flows
+
+Let's implement a sensitivity analysis that varies these parameters:
+
+```{r}
+#| label: fig-504
+#| fig-cap: "DCF value combines data from FMP API, ad-hoc forecasts of financial ratios, and IMF WEO growth forecasts."
+#| fig-alt: "Title: DCF value of Microsoft for different WACC and growth scenarios. The figure shows a tile chart with different values of WACC on the horizontal axis and perpetual growth rates on the vertical axis. Each tile shows a corresponding DCF value, illustrating the sensitivity of the DCF analysis to assumptions."
+wacc_range <- seq(0.06, 0.08, by = 0.01)
+growth_rate_range <- seq(0.02, 0.04, by = 0.01)
+
+sensitivity <- expand_grid(
+  wacc = wacc_range,
+  growth_rate = growth_rate_range
+) |>
+  mutate(value = pmap_dbl(list(wacc, growth_rate), compute_dcf))
+
+fig_sensitivity <- sensitivity |>
+  mutate(value = round(value / 1e9, 0)) |>
+  ggplot(aes(x = wacc, y = growth_rate, fill = value)) +
+  geom_tile() +
+  geom_text(aes(label = comma(value)), color = "white") +
+  scale_x_continuous(labels = percent) +
+  scale_y_continuous(labels = percent) +
+  scale_fill_continuous(labels = comma) +
+  labs(
+    title = "DCF value of Microsoft for different WACC and growth scenarios",
+    x = "WACC",
+    y = "Perpetual growth rate",
+    fill = "Company value"
+  ) +
+  guides(fill = guide_colorbar(barwidth = 15, barheight = 0.5))
+fig_sensitivity
+```
+
+@fig-504 reveals several key insights about our valuation:\index{Visualization!Sensitivity} The valuation is highly sensitive to both WACC and growth assumptions; small changes in either parameter can lead to substantial changes in company value; the relationship between these parameters and company value is non-linear; and the impact of growth rate changes becomes more pronounced at lower WACCs.
+
+## From DCF to Equity Value
+
+Our DCF analysis yields the enterprise value - the value of the company's operations. To arrive at the equity value that belongs to shareholders, we need to make several adjustments:\index{Equity value}
+
+$$\text{Equity Value} = \text{DCF Value} + \text{Non-Operating Assets} - \text{Value of Debt}$$
+where Non-Operating Assets are assets that are not essential to operations but still generate income (e.g., marketable securities, vacant land, idle equipment), and the Value of Debt is in theory the market value of total debt, though in practice typically its book value. We leave it as an exercise to calculate the Equity Value using the DCF Value from above.
+
+## Key Takeaways
+
+Discounted Cash Flow analysis provides a structured approach to company valuation. In this chapter, you learned to:
+
+- Calculate Free Cash Flow using financial statement data and ratio analysis.
+- Project future cash flows by combining company-specific metrics with macroeconomic forecasts.
+- Estimate terminal value using the perpetuity growth method.
+- Apply appropriate discount rates based on industry benchmarks.
+- Test the sensitivity of valuations to key assumptions.
+- Convert enterprise values to equity values using balance sheet adjustments.
+
+## Exercises
+
+1. 
Download financial statements for another company of your choice and compute its historical Free Cash Flow. Compare the results with the Microsoft example from this chapter. +1. Create a function that automatically generates FCF forecasts using different sets of ratio assumptions. Use it to create alternative scenarios for Microsoft. +1. Implement an exit multiple approach for terminal value calculation and compare the results with the perpetuity growth method. +1. Extend the sensitivity analysis to include operating margin assumptions. Create a visualization showing how changes in margins affect the final valuation. + diff --git a/r/financial-ratios.qmd b/r/financial-ratios.qmd new file mode 100644 index 00000000..d740012e --- /dev/null +++ b/r/financial-ratios.qmd @@ -0,0 +1,595 @@ +--- +title: Financial Ratios +metadata: + pagetitle: Financial Ratios with R + description-meta: Learn how to use the programming language R to analyze companies using financial ratios. +cache: true +--- + +Financial statements and ratios are fundamental tools for understanding and evaluating companies. While the previous chapter on the [Capital Asset Pricing Model](capital-asset-pricing-model.qmd) focused on how assets are priced in equilibrium, this chapter examines how investors and analysts assess individual companies using accounting information. Financial statements serve as the primary source of standardized information about a company's operations, financial position, and performance. Their standardization and legal requirements make them particularly valuable: all companies must file financial statements, and public companies face additional scrutiny through mandatory independent audits and regular filings with the Securities and Exchange Commission (SEC). + +Building on this foundation of standardized financial information, financial ratios transform raw accounting data into meaningful metrics that facilitate analysis across companies and over time. These ratios serve multiple purposes in both academic research and practical applications. They enable investors to benchmark companies against their peers, identify industry trends, and screen for investment opportunities. In academic finance, ratios play a crucial role in asset pricing models - from the book-to-market ratio in the Fama-French three-factor model discussed in Value and Bivariate Sorts to the investment and profitability ratios underlying the Q-factor model. Moreover, ratios help assess a company's financial health, capital structure decisions, and potential distress risk. + +This chapter demonstrates how to access, process, and analyze financial statements using R. We show how to calculate key financial ratios, implement common screening strategies, and evaluate companies' financial health. Our analysis combines theoretical frameworks with practical implementation, providing tools for both academic research and investment practice. + +We use the following packages throughout this chapter: + +```{r} +#| message: false +#| warning: false +library(tidyverse) +library(tidyfinance) +library(scales) +library(ggrepel) +library(fmpapi) +``` + +## Balance Sheet Statements + +The balance sheet is one of the three primary financial statements, alongside the income statement and cash flow statement. It captures a company's financial position at a specific moment in time, much like a financial photograph. 
The statement is built on the fundamental accounting equation: + +$$\text{Assets} = \text{Liabilities} + \text{Shareholders’ Equity}$$ + +This equation reflects a core principle of accounting: a company's resources (assets) must equal its sources of funding, whether from creditors (liabilities) or investors (shareholders' equity). Assets represent resources that the company controls and expects to generate future economic benefits, such as cash, inventory, or equipment. Liabilities encompass all obligations to external parties, from short-term payables to long-term debt. Shareholders' equity, sometimes called net worth or book value, represents the residual claim on assets after accounting for all liabilities. + +@fig-400 provides a stylized representation of a balance sheet's structure. The visualization highlights how assets on the left side must equal the combined claims of creditors and shareholders on the right side. This balance illustrates the dual nature of corporate finance: every asset must be financed either through debt (liabilities) or equity. + +![A stylized representation of a balance sheet statement.](../assets/img/balance-sheet.svg){#fig-400 alt="A stylized representation of a balance sheet statement."} + +The asset side of the balance sheet typically comprises three main categories, each serving different roles in the company's operations: + +1. Current Assets: These are assets expected to be converted into cash or used within one operating cycle (typically one year). They include: cash and cash equivalents, short-term investments, accounts receivable (money owed by customers), inventory (raw materials, work in progress, and finished goods). +1. Non-Current Assets: These long-term assets support the company's operations beyond one year: Property, Plant, and Equipment (PP&E), long-term investments, other long-term assets. +1. Intangible Assets: These non-physical assets often represent significant value in modern companies: patents and intellectual property, trademarks and brands, goodwill from acquisitions, Software and development costs. + +@fig-401 illustrates this hierarchical breakdown of assets, showing how companies classify their resources based on liquidity and nature. + +![A stylized representation of a breakdown of assets on a balance sheet.](../assets/img/assets.svg){#fig-401 alt="A stylized representation of a breakdown of assets on a balance sheet."} + +The liability side similarly follows a temporal classification, dividing obligations based on when they come due: + +1. Current Liabilities: Obligations due within one year such as accounts payable, short-term debt, current portion of long-term debt, accrued expenses. +1. Non-Current Liabilities: Long-term obligations such as long-term debt, bonds payable, deferred tax liabilities, pension obligations. + +@fig-402 shows this breakdown, highlighting how companies structure their debt obligations. + +![A stylized representation of a breakdown of liabilities on a balance sheet.](../assets/img/liabilities.svg){#fig-402 alt="A stylized representation of a breakdown of liabilities on a balance sheet."} + +Lastly, the equity section represents ownership claims and typically consists of: + +- Retained Earnings: Accumulated profits reinvested in the business. 
+- Common Stock: Par value and additional paid-in capital from share issuance +- Preferred Stock: Hybrid securities with characteristics of both debt and equity + +@fig-403 depicts this equity structure, showing how companies track different forms of ownership claims. + +![A stylized representation of a breakdown of equity on a balance sheet.](../assets/img/equity.svg){#fig-403 alt="A stylized representation of a breakdown of equity on a balance sheet."} + +To illustrate these concepts in practice, @fig-404 presents Microsoft's balance sheet from 2023. This real-world example demonstrates how one of the world's largest technology companies structures its financial position, reflecting both traditional elements like PP&E and modern aspects like significant intangible assets. + +![A screenshot of the balance sheet statement of Microsoft in 2023.](../assets/img/balance-sheet-msft.png){#fig-404 alt="A screenshot of the balance sheet statement of Microsoft in 2023."} + +Microsoft's balance sheet exemplifies several key trends in modern corporate finance: the growing importance of intangible assets in technology companies, the strategic use of cash and investments, and the complex interplay between different forms of financing. In subsequent sections, we'll explore how to analyze such statements using financial ratios, particularly focusing on measures of liquidity, solvency, and efficiency. + +While the SEC provides a web interface to search filings, programmatic access to financial statements greatly facilitates systematic analysis. The Financial Modeling Prep (FMP) API offers such programmatic access, which we can leverage through the R package fmpapi. + +The FMP API's free tier provides access to: + +- 250 API calls per day +- 5 years of historical fundamental data +- Real-time and historical stock prices +- Key financial ratios and metrics + +Let's examine Microsoft's balance sheet statements using the `fmp_get()` function. This function requires three main arguments: The type of financial data to retrieve (`resource`), the stock ticker symbol (`symbol`), and additional parameters like periodicity and number of periods (`params`). + +```{r} +#| cache: true +fmp_get( + resource = "balance-sheet-statement", + symbol = "MSFT", + params = list(period = "annual", limit = 5) +) +``` + +The function returns a data frame containing detailed balance sheet information, with each row representing a different reporting period. This structured format makes it easy to analyze trends over time and calculate financial ratios. We can see how the data aligns with the balance sheet components we discussed earlier, from current assets like cash and receivables to long-term assets and various forms of liabilities and equity. + +## Income Statements + +While the balance sheet provides a snapshot of a company's financial position at a point in time, the income statement (also called profit and loss statement) measures financial performance over a period, typically a quarter or year. It follows a hierarchical structure that progressively captures different levels of profitability: + +- Revenue (Sales): the total income generated from goods or services sold. +- Cost of Goods Sold (COGS): direct costs associated with producing the goods or services (raw materials, labor, etc.). +- Gross Profit: revenue minus COGS, showing the basic profitability from core operations. +- Operating Expenses: costs related to regular business operations (Salaries, Rent, Marketing). 
+- Operating Income (EBIT): earnings before interest and taxes (measures profitability from core operations before financing and tax costs). +- Net Income: The “bottom line”—total profit after all expenses, taxes, and interest are subtracted from revenue. + +@fig-405 illustrates this progression from total revenue to net income, showing how various costs and expenses are subtracted to arrive at different profitability measures. + +![A stylized representation of an income statement.](../assets/img/income-statements.svg){#fig-405 alt="A stylized representation of an income statement."} + +Consider Microsoft's 2023 income statement in @fig-406, which exemplifies how a leading technology company reports its financial performance: + +![A screenshot of the income statement of Microsoft in 2023.](../assets/img/income-statements-msft.png){#fig-406 alt="A screenshot of the income statement of Microsoft in 2023."} + +We can also access this data programmatically using the FMP API: + +```{r} +#| cache: true +fmp_get( + resource = "income-statement", + symbol = "MSFT", + params = list(period = "annual", limit = 5) +) +``` + +The structured format of this data enables systematic analysis of profitability trends and operational efficiency. In later sections, we'll use these figures to calculate important profitability ratios and examine how they compare across companies and industries. The income statement's focus on performance complements the balance sheet's position snapshot, providing a more complete picture of a company's financial health. + +## Cash Flow Statements + +The cash flow statement complements the balance sheet and income statement by tracking the actual movement of cash through the business. While the income statement shows profitability and the balance sheet shows financial position, the cash flow statement reveals a company's ability to generate and manage cash - a crucial aspect of financial health. The statement is divided into three main categories: + +- Operating Activities: cash generated from a company’s core business activities (Net Income adjusted for non-cash items like depreciation, and changes in working capital). +- Financing Activities: cash flows related to borrowing, repaying debt, issuing equity, or paying dividends. +- Investing Activities: cash spent on or received from long-term investments, such as purchasing or selling property, equipment, or securities. + +@fig-407 illustrates how these three categories combine to show the overall change in a company's cash position. + +![A stylized representation of a cash flow statement.](../assets/img/cash-flow-statements.svg){#fig-407 alt="A stylized representation of a cash flow statement."} + +The statement reconciles accrual-based accounting (used in the income statement) with actual cash movements. This reconciliation is crucial because profitable companies can still face cash shortages, and unprofitable companies might maintain positive cash flow. 
Microsoft's 2023 cash flow statement in @fig-408 demonstrates how a large technology company manages its cash flows:
+
+![A screenshot of the cash flow statement of Microsoft in 2023.](../assets/img/cash-flow-statements-msft.png){#fig-408 alt="A screenshot of the cash flow statement of Microsoft in 2023."}
+
+Of course, we can access this data systematically through the FMP API:
+
+```{r}
+#| cache: true
+fmp_get(
+  resource = "cash-flow-statement",
+  symbol = "MSFT",
+  params = list(period = "annual", limit = 5)
+)
+```
+
+In subsequent sections, we'll use this data to calculate important cash flow ratios that help assess a company's liquidity, capital allocation efficiency, and overall financial sustainability. The combination of all three financial statements - balance sheet, income statement, and cash flow statement - provides a comprehensive view of a company's financial health and performance.
+
+## Download Financial Statements
+
+We now turn to downloading and processing statements for multiple companies systematically. The next code chunk demonstrates how to retrieve financial data for all constituents of the Dow Jones Industrial Average, similar to our approach in the CAPM chapter.
+
+```{r}
+#| cache: true
+constituents <- download_data_constituents("Dow Jones Industrial Average") |>
+  pull(symbol)
+
+params <- list(period = "annual", limit = 5)
+
+balance_sheet_statements <- constituents |>
+  map_df(
+    \(x) fmp_get(resource = "balance-sheet-statement", symbol = x, params = params)
+  )
+
+income_statements <- constituents |>
+  map_df(
+    \(x) fmp_get(resource = "income-statement", symbol = x, params = params)
+  )
+
+cash_flow_statements <- constituents |>
+  map_df(
+    \(x) fmp_get(resource = "cash-flow-statement", symbol = x, params = params)
+  )
+```
+
+The resulting datasets provide a foundation for cross-sectional analysis of financial ratios and trends across major U.S. companies. In the following sections, we'll use these datasets to calculate various financial ratios and analyze patterns in corporate financial performance.
+
+## Liquidity Ratios
+
+Liquidity ratios assess a company's ability to meet its short-term obligations and are typically calculated using balance sheet items. These ratios are particularly important for creditors and investors concerned about a company's short-term financial health and ability to cover immediate obligations.
+
+The Current Ratio is the most basic measure of liquidity, comparing all current assets to current liabilities:
+
+$$\text{Current Ratio} = \frac{\text{Current Assets}}{\text{Current Liabilities}}$$
+
+A ratio above 1 indicates that the company has enough current assets to cover its current liabilities.
+
+However, not all current assets are equally liquid, leading to the Quick Ratio:
+
+$$\text{Quick Ratio} = \frac{\text{Current Assets - Inventory}}{\text{Current Liabilities}}$$
+The Quick Ratio provides a more stringent measure of liquidity by excluding inventory, which is typically the least liquid current asset. A ratio above 1 suggests strong short-term solvency without relying on inventory sales.
+
+The most conservative liquidity measure is the Cash Ratio:
+
+$$\text{Cash Ratio} = \frac{\text{Cash and Cash Equivalents}}{\text{Current Liabilities}}$$
+
+This ratio focuses solely on the most liquid assets - cash and cash equivalents. While a ratio of 1 indicates robust liquidity, most companies maintain lower cash ratios to avoid holding excessive non-productive assets.
+Let's calculate these ratios for Dow Jones constituents, focusing on three major technology companies:
+
+```{r}
+selected_symbols <- c("MSFT", "AAPL", "AMZN")
+
+balance_sheets_statements <- balance_sheet_statements |>
+  mutate(
+    current_ratio = total_current_assets / total_current_liabilities,
+    quick_ratio = (total_current_assets - inventory) / total_current_liabilities,
+    cash_ratio = cash_and_cash_equivalents / total_current_liabilities,
+    label = if_else(symbol %in% selected_symbols, symbol, NA)
+  )
+```
+
+@fig-409 compares these liquidity ratios across Microsoft, Apple, and Amazon for 2023:
+
+```{r}
+#| label: fig-409
+#| fig-cap: "Liquidity ratios are based on financial statements as provided through the FMP API."
+#| fig-alt: "Title: Liquidity ratios for selected stocks for 2023. The figure shows a bar chart with liquidity ratios on the vertical and corresponding values on the horizontal axis."
+fig_liquidity_ratios <- balance_sheets_statements |>
+  filter(calendar_year == 2023 & !is.na(label)) |>
+  select(symbol, contains("ratio")) |>
+  pivot_longer(-symbol) |>
+  mutate(name = str_to_title(str_replace_all(name, "_", " "))) |>
+  ggplot(aes(x = value, y = name, fill = symbol)) +
+  geom_col(position = "dodge") +
+  scale_x_continuous(labels = percent) +
+  labs(
+    x = NULL, y = NULL, fill = NULL,
+    title = "Liquidity ratios for selected stocks from the Dow index for 2023"
+  )
+fig_liquidity_ratios
+```
+
+The liquidity ratios for Microsoft, Apple, and Amazon in 2023 reveal distinct patterns in how these technology giants manage their short-term financial positions. Microsoft demonstrates the most conservative approach with the highest quick ratio of around 160%, significantly above Amazon's 100% and Apple's 90%, indicating a strong ability to cover short-term obligations without relying on inventory. Because inventory plays only a minor role for these firms, their current ratios sit just slightly above their quick ratios. Their cash ratios also show interesting variation, with Amazon leading at 45%, followed by Microsoft at 40%, and Apple holding a leaner cash position at 20%. These patterns reflect the technology sector's characteristics of minimal inventory requirements and strategic cash management, while also highlighting company-specific differences in financial management approaches, with Microsoft generally maintaining the most conservative liquidity position overall.
+
+## Leverage Ratios
+
+Leverage ratios assess a company's capital structure and its ability to meet financial obligations. These metrics are crucial for understanding financial risk and long-term solvency.
We examine three key leverage measures: + +The debt-to-equity ratio indicates how much a company is financing its operations through debt versus shareholders' equity: + +$$\text{Debt-to-Equity} = \frac{\text{Total Debt}}{\text{Total Equity}}$$ + +The debt-to-asset ratio shows the percentage of assets financed through debt: + +$$\text{Debt-to-Asset} = \frac{\text{Total Debt}}{\text{Total Assets}}$$ + +Interest coverage measures a company's ability to meet interest payments: + +$$\text{Interest Coverage} = \frac{\text{EBIT}}{\text{Interest Expense}}$$ + +Let's calculate these ratios for our sample of companies: + +```{r} +balance_sheets_statements <- balance_sheets_statements |> + mutate( + debt_to_equity = total_debt / total_equity, + debt_to_asset = total_debt / total_assets + ) + +income_statements <- income_statements |> + mutate( + interest_coverage = operating_income / interest_expense, + label = if_else(symbol %in% selected_symbols, symbol, NA), + ) +``` + +@fig-410 tracks the evolution of debt-to-asset ratios for Microsoft, Apple, and Amazon over time: + +```{r} +#| label: fig-410 +#| fig-cap: "Debt-to-asset ratios are based on financial statements as provided through the FMP API." +#| fig-alt: "Title: Debt-to-asset ratios of selected stocks between 2020 and 2024. The figure shows a line chart with years on the horizontal axis and debt-to-asset ratios on the vertical axis." +fig_debt_to_asset <- balance_sheets_statements |> + filter(symbol %in% selected_symbols) |> + ggplot(aes(x = calendar_year, y = debt_to_asset, + color = symbol)) + + geom_line(linewidth = 1) + + scale_y_continuous(labels = percent) + + labs(x = NULL, y = NULL, color = NULL, + title = "Debt-to-asset ratios of selected stocks between 2020 and 2024") +fig_debt_to_asset +``` + +The evolution of debt-to-asset ratios among these major technology companies reveals distinct capital structure strategies and their changes over time. Apple shows the most dramatic shift, with its debt-to-asset ratio declining significantly from approximately 38% in 2021 to around 29% by 2024, suggesting a deliberate deleveraging strategy. Amazon maintained a relatively stable debt-to-asset ratio between 25-30% throughout the period, indicating a consistent approach to financial leverage. Microsoft demonstrates the most conservative leverage policy, systematically reducing its debt-to-asset ratio from about 27% in 2020 to below 20% by 2024. + +@fig-411 provides a cross-sectional view of debt-to-asset ratios across Dow Jones constituents in 2023. + +```{r} +#| label: fig-411 +#| fig-cap: "Debt-to-asset ratios are based on financial statements as provided through the FMP API." +#| fig-alt: "Title: Debt-to-asset ratios of Dow index constituents in 2023. The figure shows a bar chart with debt-to-asset ratios on the horizontal and corresponding symbols on the vertical axis." 
+selected_colors <- c("#F21A00", "#EBCC2A", "#3B9AB2", "lightgrey") + +fig_debt_to_asset_cross_section <- balance_sheets_statements |> + filter(calendar_year == 2023) |> + ggplot(aes(x = debt_to_asset, + y = fct_reorder(symbol, debt_to_asset), + fill = label)) + + geom_col() + + scale_x_continuous(labels = percent) + + scale_fill_manual(values = selected_colors) + + labs(x = NULL, y = NULL, color = NULL, + title = "Debt-to-asset ratios of Dow index constituents in 2023") + + theme(legend.position = "none") +fig_debt_to_asset_cross_section +``` + +McDonald's (MCD) shows the highest leverage with a debt-to-asset ratio approaching 90%, followed by Home Depot (HD) and Amgen (AMGN) at around 75%. At the other end of the spectrum, Travelers (TRV) and Cisco (CSCO) maintain the lowest leverage ratios at approximately 15%. Among our selected companies, Apple (AAPL) sits in the middle range with about 30% debt-to-asset ratio, while Amazon (AMZN) and Microsoft (MSFT) show more conservative leverage positions at around 25% and 20% respectively. + +@fig-412 reveals the relationship between companies' debt levels and their ability to service that debt. + +```{r} +#| label: fig-412 +#| fig-cap: "Debt-to-asset ratios and interest coverages are based on financial statements as provided through the FMP API." +#| fig-alt: "Title: Debt-to-asset ratios and interest coverages for Dow index constituents. The figure shows a scatter plot with debt-to-asset on the horizontal and interest coverage on the vertical axis." +fig_debt_to_asset_interest_coverage <- income_statements |> + filter(calendar_year == 2023) |> + select(symbol, interest_coverage, calendar_year) |> + left_join( + balance_sheets_statements, + join_by(symbol, calendar_year) + ) |> + ggplot(aes(x = debt_to_asset, y = interest_coverage, color = label)) + + geom_point(size = 2) + + geom_label_repel(aes(label = label), seed = 42, box.padding = 0.75) + + scale_x_continuous(labels = percent) + + scale_y_continuous(labels = percent) + + scale_color_manual(values = selected_colors) + + labs( + x = "Debt-to-Asset", y = "Interest Coverage", + title = "Debt-to-asset ratios and interest coverages for Dow index constituents" + ) + + theme(legend.position = "none") +fig_debt_to_asset_interest_coverage +``` + +The scatter plot suggests that companies with higher debt-to-asset ratios tend to have lower interest coverage ratios, though there's considerable variation in this relationship. This pattern reflects the natural trade-off between leverage and financial flexibility - higher debt loads typically result in higher interest expenses, which reduce interest coverage ratios. Microsoft stands out with a conservative debt-to-asset ratio around 20% but a strong interest coverage ratio of nearly 500%, indicating very comfortable debt servicing capacity. Apple shows a moderate debt-to-asset ratio near 30% with interest coverage around 300%, while Amazon maintains similar leverage but lower interest coverage at about 100%. + +## Efficiency Ratios + +Efficiency ratios measure how effectively a company utilizes its assets and manages its operations. These metrics are crucial for understanding operational performance and management effectiveness, particularly in how well a company converts its various assets into revenue and profit. 
+ +Asset Turnover measures how efficiently a company uses its total assets to generate revenue: + +$$\text{Asset Turnover} = \frac{\text{Revenue}}{\text{Total Assets}}$$ + +A higher ratio indicates more efficient use of assets in generating sales. However, this ratio typically varies significantly across industries - retail companies often have higher turnover ratios due to lower asset requirements, while manufacturing companies might show lower ratios due to substantial fixed asset investments. + +Inventory turnover indicates how many times a company's inventory is sold and replaced over a period: + +$$\text{Inventory Turnover} = \frac{\text{COGS}}{\text{Inventory}}$$ + +Higher inventory turnover suggests more efficient inventory management and working capital utilization. However, extremely high ratios might indicate potential stockouts, while very low ratios could suggest obsolete inventory or overinvestment in working capital. + +Receivables turnover measures how effectively a company collects payments from customers: + +$$\text{Receivables Turnover} = \frac{\text{Revenue}}{\text{Accounts Receivable}}$$ +A higher ratio indicates more efficient credit and collection processes, though this must be balanced against the potential impact on sales from overly restrictive credit policies. + +Here's how we can calculate these efficiency metrics across our sample of Dow Jones companies: + +```{r} +combined_statements <- balance_sheets_statements |> + select(symbol, calendar_year, label, current_ratio, quick_ratio, cash_ratio, + debt_to_equity, debt_to_asset, total_assets, total_equity) |> + left_join( + income_statements |> + select(symbol, calendar_year, interest_coverage, revenue, cost_of_revenue, + selling_general_and_administrative_expenses, interest_expense, + gross_profit, net_income), + join_by(symbol, calendar_year) + ) |> + left_join( + cash_flow_statements |> + select(symbol, calendar_year, inventory, accounts_receivables), + join_by(symbol, calendar_year) + ) + +combined_statements <- combined_statements |> + mutate( + asset_turnover = revenue / total_assets, + inventory_turnover = cost_of_revenue / inventory, + receivables_turnover = revenue / accounts_receivables + ) +``` + +We'll leave the visualization and interpretation of these figures as an exercise and move on to the last category of financial ratios. + +## Profitability Ratios + +Profitability ratios evaluate a company's ability to generate earnings relative to its revenue, assets, and equity. These metrics are fundamental to investment analysis as they directly measure a company's operational efficiency and financial success. + +The gross margin measures what percentage of revenue remains after accounting for the direct costs of producing goods or services: + +$$\text{Gross Margin} = \frac{\text{Gross Profit}}{\text{Revenue}}$$ + +A higher gross margin indicates stronger pricing power or more efficient production processes. This metric is particularly useful for comparing companies within the same industry, as it reveals their relative efficiency in core operations before accounting for operating expenses and other costs. + +The profit margin reveals what percentage of revenue ultimately becomes net income: + +$$\text{Profit Margin} = \frac{\text{Net Income}}{\text{Revenue}}$$ +This comprehensive profitability measure accounts for all costs, expenses, interest, and taxes.
A higher profit margin suggests more effective overall cost management and stronger competitive position, though optimal margins vary significantly across industries. + +Return on Equity (ROE) measures how efficiently a company uses shareholders' investments to generate profits: + +$$\text{After-Tax ROE} = \frac{\text{Net Income}}{\text{Total Equity}}$$ +This metric is particularly important for investors as it directly measures the return on their invested capital. A higher ROE indicates more effective use of shareholders' equity, though it must be considered alongside leverage ratios since high debt levels can artificially inflate ROE. + +The next code chunk calculates these profitability metrics for our sample of companies, allowing us to analyze how effectively different firms convert their revenue into various levels of profit and return on investment. + +```{r} +combined_statements <- combined_statements |> + mutate( + gross_margin = gross_profit / revenue, + profit_margin = net_income / revenue, + after_tax_roe = net_income / total_equity + ) +``` + +@fig-413 shows the gross margin trends of Microsoft, Apple, and Amazon between 2019 and 2023. + +```{r} +#| label: fig-413 +#| fig-cap: "Gross margins are based on financial statements as provided through the FMP API." +#| fig-alt: "Title: Gross margins for selected stocks between 2019 and 2023. The figure shows a line chart with years on the horizontal axis and gross margins on the vertical axis." +fig_gross_margin <- combined_statements |> + filter(symbol %in% selected_symbols) |> + ggplot(aes(x = calendar_year, y = gross_margin, color = symbol)) + + geom_line() + + scale_y_continuous(labels = percent) + + labs(x = NULL, y = NULL, color = NULL, + title = "Gross margins for selected stocks between 2019 and 2023") +fig_gross_margin +``` + +Microsoft maintains the highest margins at 65-70%, reflecting its low-cost software business model, while Apple and Amazon show lower but improving margins from around 40% to 45-47%. This divergence highlights fundamental business model differences: Microsoft's software and cloud services have minimal direct costs compared to Apple's hardware manufacturing and Amazon's retail operations, though the upward trends across all three companies suggest successful shifts toward higher-margin segments and improved operational efficiency. + +@fig-414 illustrates the relationship between gross margins and profit margins across Dow Jones constituents in 2023: + +```{r} +#| label: fig-414 +#| fig-cap: "Gross and profit margins are based on financial statements as provided through the FMP API." +#| fig-alt: "Title: Gross and profit margins for Dow index constituents for 2023. The figure shows a scatter plot with gross margins on the horizontal and profit margins on the vertical axis."
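+# Selected symbols are colored and labeled; all other constituents have missing
+# labels and are drawn in the color scale's default for missing values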
+fig_gross_margin_profit_margin <- combined_statements |> + filter(calendar_year == 2023) |> + ggplot(aes(x = gross_margin, y = profit_margin, color = label)) + + geom_point(size = 2) + + geom_label_repel(aes(label = label), seed = 42, box.padding = 0.75) + + scale_x_continuous(labels = percent) + + scale_y_continuous(labels = percent) + + scale_color_manual(values = selected_colors) + + labs( + x = "Gross margin", y = "Profit margin", + title = "Gross and profit margins for Dow index constituents for 2023" + ) + + theme(legend.position = "none") +fig_gross_margin_profit_margin +``` + +Microsoft shows superior profitability with approximately 75% gross margin and 35% profit margin, indicating strong cost management throughout its operations. Apple maintains solid performance with 45% gross margin converting to about 25% profit margin, while Amazon, despite similar gross margins to Apple at around 45%, achieves a much lower profit margin of approximately 5%. This disparity in margin conversion efficiency across companies with similar gross margins suggests significant differences in operating costs, with Microsoft's software-focused model enabling more efficient conversion of gross profits into net income compared to Apple's hardware business and Amazon's retail operations. + +## Combining Financial Ratios + +While individual financial ratios provide specific insights, combining them offers a more comprehensive view of company performance. By examining how companies rank across different ratio categories, we can better understand their overall financial position and identify potential strengths and weaknesses in their operations. + +@fig-415 compares Microsoft, Apple, and Amazon's rankings across four key financial ratio categories among Dow Jones constituents. Rankings closer to 1 indicate better performance within each category. + +```{r} +#| label: fig-415 +#| fig-cap: "Ranks are based on financial statements as provided through the FMP API." +#| fig-alt: "Title: Rank in financial ratio categories for selected stocks from the Dow index. The figure shows a scatter plot with ranks for selected stocks on the horizontal and categories of financial ratios on the vertical axis." +financial_ratios <- combined_statements |> + filter(calendar_year == 2023) |> + select(symbol, + contains(c("ratio", "margin", "roe", "_to_", "turnover", "interest_coverage"))) |> + pivot_longer(cols = -symbol) |> + mutate( + type = case_when( + name %in% c("current_ratio", "quick_ratio", "cash_ratio") ~ "Liquidity Ratios", + name %in% c("debt_to_equity", "debt_to_asset", "interest_coverage") ~ "Leverage Ratios", + name %in% c("asset_turnover", "inventory_turnover", "receivables_turnover") ~ "Efficiency Ratios", + name %in% c("gross_margin", "profit_margin", "after_tax_roe") ~ "Profitability Ratios" + ) + ) + +fig_ranks <- financial_ratios |> + group_by(type, name) |> + arrange(desc(value)) |> + mutate(rank = row_number()) |> + group_by(symbol, type) |> + summarize(rank = mean(rank), + .groups = "drop") |> + filter(symbol %in% selected_symbols) |> + ggplot(aes(x = rank, y = type, color = symbol)) + + geom_point(shape = 17, size = 4) + + scale_color_manual(values = selected_colors) + + labs(x = "Average rank", y = NULL, color = NULL, + title = "Average rank among Dow index constituents for selected stocks") + + coord_cartesian(xlim = c(1, 30)) +fig_ranks +``` + +These combined rankings highlight how different business models and strategies lead to varying financial profiles. 
Technology companies tend to show distinct patterns in their financial ratios, often excelling in certain categories while facing challenges in others based on their specific business focus and operational requirements. This analysis underscores the importance of considering multiple financial metrics together rather than in isolation when evaluating company performance. + +## Financial Ratios in Asset Pricing + +The Fama-French five-factor model aims to explain stock returns by incorporating specific financial ratios. We provide more details in [Replicating Fama-French Factors](replicating-fama-and-french-factors.qmd), but here is an intuitive overview: + +- Size: Calculated as the logarithm of a company’s market capitalization, which is the total market value of its outstanding shares. This factor captures the tendency for smaller firms to outperform larger ones over time. +- Book-to-Market Ratio: Determined by dividing the company’s book equity by its market capitalization. A higher ratio indicates a 'value' stock, while a lower ratio suggests a 'growth' stock. This metric helps differentiate between undervalued and overvalued companies. +- Profitability: Measured as the ratio of operating profit to book equity, where operating profit is calculated as revenue minus cost of goods sold (COGS), selling, general, and administrative expenses (SG&A), and interest expense. This factor assesses a company’s efficiency in generating profits from its equity base. +- Investment: Calculated as the percentage change in total assets from the previous period. This factor reflects the company’s growth strategy, indicating whether it is investing aggressively or conservatively. + +We can calculate these factors using the FMP API as follows: + +```{r} +market_cap <- constituents |> + map_df( + \(x) fmp_get( + resource = "historical-market-capitalization", + x, + list(from = "2023-12-29", to = "2023-12-29") + ) + ) + +combined_statements_ff <- combined_statements |> + filter(calendar_year == 2023) |> + left_join(market_cap, join_by(symbol)) |> + left_join( + balance_sheets_statements |> + filter(calendar_year == 2022) |> + select(symbol, total_assets_lag = total_assets), + join_by(symbol) + ) |> + mutate( + size = log(market_cap), + book_to_market = total_equity / market_cap, + operating_profitability = (revenue - cost_of_revenue - selling_general_and_administrative_expenses - interest_expense) / total_equity, + investment = total_assets / total_assets_lag - 1 + ) +``` + +@fig-416 shows the ranks of our selected stocks for the Fama-French factors. The ranks of Microsoft, Apple, and Amazon across Fama-French factors reveal interesting patterns in how these major technology companies align with established asset pricing factors. + +```{r} +#| label: fig-416 +#| fig-cap: "Ranks are based on financial statements and historical market capitalization as provided through the FMP API." +#| fig-alt: "Title: Rank in Fama-French variables for selected stocks from the Dow index. The figure shows a scatter plot with ranks for selected stocks on the horizontal and Fama-French variables on the vertical axis."
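+# Rank 1 corresponds to the highest value of each Fama-French variable among
+# the Dow constituents; only the three selected symbols are kept for plotting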
+fig_rank_ff <- combined_statements_ff |> + select(symbol, Size = size, + `Book-to-Market` = book_to_market, + `Profitability` = operating_profitability, + Investment = investment) |> + pivot_longer(-symbol) |> + group_by(name) |> + arrange(desc(value)) |> + mutate(rank = row_number()) |> + ungroup() |> + filter(symbol %in% selected_symbols) |> + ggplot(aes(x = rank, y = name, color = symbol)) + + geom_point(shape = 17, size = 4) + + scale_color_manual(values = selected_colors) + + labs( + x = "Rank", y = NULL, color = NULL, + title = "Rank in Fama-French variables for selected stocks from the Dow index" + ) + + coord_cartesian(xlim = c(1, 30)) +fig_rank_ff +``` + +As expected, all three tech giants rank among the largest firms by size. Apple shows the highest profitability among the three tech giants according to the operating profitability measure, while Microsoft ranks only in the middle. In terms of investment, however, Apple ranks in the lower third of the distribution. All three stocks tend to have low book-to-market ratios, which is typical for growth stocks. + +## Key Takeaways + +This chapter introduced fundamental tools for analyzing companies through their financial statements and ratios. Key insights include: + +- Financial statements are standardized tools that capture a company's financial position (balance sheet), performance (income statement), and cash movements (cash flow statement). +- Liquidity ratios reveal short-term solvency, with tech companies typically showing lower ratios due to strong cash flows. +- Leverage and profitability ratios highlight distinct business strategies, even within sectors, as seen in the varying approaches of Microsoft, Apple, and Amazon. +- Financial ratios connect to asset pricing, for instance, through the Fama-French factors, demonstrating how fundamental characteristics influence expected returns. + +## Exercises + +1. Download the financial statements for Netflix (NFLX) using the FMP API. Calculate its current ratio, quick ratio, and cash ratio for the past 3 years. Create a line plot showing how these liquidity ratios have evolved over time. How do Netflix's liquidity ratios compare to those of the technology companies discussed in this chapter? +1. Select three companies from different industries in the Dow Jones Industrial Average. Calculate their debt-to-equity ratios, debt-to-asset ratios, and interest coverage ratios. Create a visualization comparing these leverage metrics across the companies. Write a brief analysis explaining how and why leverage patterns differ across industries. +1. For all Dow Jones constituents, calculate asset turnover, inventory turnover, and receivables turnover. Create a scatter plot showing the relationship between asset turnover and profitability. Identify any outliers and explain potential reasons for their unusual performance. Which industries tend to show higher efficiency ratios? Why might this be the case? +1. Create an R function that (i) takes a company symbol as input, (ii) downloads the latest financial statements, (iii) calculates the gross margin, and (iv) plots a visualization that shows the company's rank compared to the Dow Jones Industrial Average companies.
diff --git a/r/financial-ratios_cache/html/__packages b/r/financial-ratios_cache/html/__packages new file mode 100644 index 00000000..72a6b799 --- /dev/null +++ b/r/financial-ratios_cache/html/__packages @@ -0,0 +1,14 @@ +ggplot2 +tidyverse +tibble +tidyr +readr +purrr +dplyr +stringr +forcats +lubridate +tidyfinance +scales +ggrepel +fmpapi diff --git a/r/financial-ratios_files/figure-html/fig-409-1.png b/r/financial-ratios_files/figure-html/fig-409-1.png new file mode 100644 index 00000000..4d1ae385 Binary files /dev/null and b/r/financial-ratios_files/figure-html/fig-409-1.png differ diff --git a/r/financial-ratios_files/figure-html/fig-410-1.png b/r/financial-ratios_files/figure-html/fig-410-1.png new file mode 100644 index 00000000..3bd291af Binary files /dev/null and b/r/financial-ratios_files/figure-html/fig-410-1.png differ diff --git a/r/financial-ratios_files/figure-html/fig-411-1.png b/r/financial-ratios_files/figure-html/fig-411-1.png new file mode 100644 index 00000000..95548af6 Binary files /dev/null and b/r/financial-ratios_files/figure-html/fig-411-1.png differ diff --git a/r/financial-ratios_files/figure-html/fig-412-1.png b/r/financial-ratios_files/figure-html/fig-412-1.png new file mode 100644 index 00000000..ef2c02fa Binary files /dev/null and b/r/financial-ratios_files/figure-html/fig-412-1.png differ diff --git a/r/financial-ratios_files/figure-html/fig-413-1.png b/r/financial-ratios_files/figure-html/fig-413-1.png new file mode 100644 index 00000000..5493dfe7 Binary files /dev/null and b/r/financial-ratios_files/figure-html/fig-413-1.png differ diff --git a/r/financial-ratios_files/figure-html/fig-414-1.png b/r/financial-ratios_files/figure-html/fig-414-1.png new file mode 100644 index 00000000..a721c62a Binary files /dev/null and b/r/financial-ratios_files/figure-html/fig-414-1.png differ diff --git a/r/financial-ratios_files/figure-html/fig-415-1.png b/r/financial-ratios_files/figure-html/fig-415-1.png new file mode 100644 index 00000000..05b9d02d Binary files /dev/null and b/r/financial-ratios_files/figure-html/fig-415-1.png differ diff --git a/r/financial-ratios_files/figure-html/fig-416-1.png b/r/financial-ratios_files/figure-html/fig-416-1.png new file mode 100644 index 00000000..99ba9539 Binary files /dev/null and b/r/financial-ratios_files/figure-html/fig-416-1.png differ diff --git a/r/financial-ratios_files/figure-html/unnamed-chunk-10-1.png b/r/financial-ratios_files/figure-html/unnamed-chunk-10-1.png new file mode 100644 index 00000000..1c4633cb Binary files /dev/null and b/r/financial-ratios_files/figure-html/unnamed-chunk-10-1.png differ diff --git a/r/financial-ratios_files/figure-html/unnamed-chunk-11-1.png b/r/financial-ratios_files/figure-html/unnamed-chunk-11-1.png new file mode 100644 index 00000000..6506a319 Binary files /dev/null and b/r/financial-ratios_files/figure-html/unnamed-chunk-11-1.png differ diff --git a/r/financial-ratios_files/figure-html/unnamed-chunk-14-1.png b/r/financial-ratios_files/figure-html/unnamed-chunk-14-1.png new file mode 100644 index 00000000..5493dfe7 Binary files /dev/null and b/r/financial-ratios_files/figure-html/unnamed-chunk-14-1.png differ diff --git a/r/financial-ratios_files/figure-html/unnamed-chunk-15-1.png b/r/financial-ratios_files/figure-html/unnamed-chunk-15-1.png new file mode 100644 index 00000000..fbe557e0 Binary files /dev/null and b/r/financial-ratios_files/figure-html/unnamed-chunk-15-1.png differ diff --git a/r/financial-ratios_files/figure-html/unnamed-chunk-16-1.png 
b/r/financial-ratios_files/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 00000000..05b9d02d Binary files /dev/null and b/r/financial-ratios_files/figure-html/unnamed-chunk-16-1.png differ diff --git a/r/financial-ratios_files/figure-html/unnamed-chunk-18-1.png b/r/financial-ratios_files/figure-html/unnamed-chunk-18-1.png new file mode 100644 index 00000000..99ba9539 Binary files /dev/null and b/r/financial-ratios_files/figure-html/unnamed-chunk-18-1.png differ diff --git a/r/financial-ratios_files/figure-html/unnamed-chunk-7-1.png b/r/financial-ratios_files/figure-html/unnamed-chunk-7-1.png new file mode 100644 index 00000000..4d1ae385 Binary files /dev/null and b/r/financial-ratios_files/figure-html/unnamed-chunk-7-1.png differ diff --git a/r/financial-ratios_files/figure-html/unnamed-chunk-9-1.png b/r/financial-ratios_files/figure-html/unnamed-chunk-9-1.png new file mode 100644 index 00000000..83311cc4 Binary files /dev/null and b/r/financial-ratios_files/figure-html/unnamed-chunk-9-1.png differ diff --git a/r/introduction-to-tidy-finance.qmd b/r/introduction-to-tidy-finance.qmd deleted file mode 100644 index 94fa1483..00000000 --- a/r/introduction-to-tidy-finance.qmd +++ /dev/null @@ -1,390 +0,0 @@ ---- -title: Introduction to Tidy Finance -aliases: - - ../introduction-to-tidy-finance.html -metadata: - pagetitle: Introduction to Tidy Finance with R - description-meta: Learn how to use the programming language R for downloading and analyzing stock market data. ---- - -::: callout-note -You are reading **Tidy Finance with R**. You can find the equivalent chapter for the sibling **Tidy Finance with Python** [here](../python/introduction-to-tidy-finance.qmd). -::: - -The main aim of this chapter is to familiarize yourself with the `tidyverse`. We start by downloading and visualizing stock data from Yahoo Finance. Then we move to a simple portfolio choice problem and construct the efficient frontier. These examples introduce you to our approach of *Tidy Finance*. - -## Working with Stock Market Data - -At the start of each session, we load the required R packages. Throughout the entire book, we always use the `tidyverse` [@Wickham2019]. In this chapter, we also load the `tidyfinance` package to download stock price data. This package provides a convenient wrapper for various quantitative functions compatible with the `tidyverse` and our book.\index{tidyverse} Finally, the package `scales` [@scales] provides useful scale functions for visualizations. - -You typically have to install a package once before you can load it. In case you have not done this yet, call `install.packages("tidyfinance")`. \index{tidyfinance} - -```{r} -#| message: false -library(tidyverse) -library(tidyfinance) -library(scales) -``` - -We first download daily prices for one stock symbol, e.g., the Apple stock, *AAPL*, directly from the data provider Yahoo Finance. To download the data, you can use the function `download_data`. If you do not know how to use it, make sure you read the help file by calling `?download_data`. We especially recommend taking a look at the examples section of the documentation. We request daily data for a period of more than 20 years.\index{Stock prices} - -```{r} -#| cache: true -prices <- download_data( - type = "stock_prices", - symbols = "AAPL", - start_date = "2000-01-01", - end_date = "2023-12-31" -) -prices -``` - -\index{Data!Yahoo Finance} `download_data(type = "stock_prices")` downloads stock market data from Yahoo Finance. 
The function returns a tibble with eight quite self-explanatory columns: `symbol`, `date`, the daily `volume` (in the number of traded shares), the market prices at the `open`, `high`, `low`, `close`, and the `adjusted` price in USD. The adjusted prices are corrected for anything that might affect the stock price after the market closes, e.g., stock splits and dividends. These actions affect the quoted prices, but they have no direct impact on the investors who hold the stock. Therefore, we often rely on adjusted prices when it comes to analyzing the returns an investor would have earned by holding the stock continuously.\index{Stock price adjustments} - -Next, we use the `ggplot2` package [@ggplot2] to visualize the time series of adjusted prices in @fig-100 . This package takes care of visualization tasks based on the principles of the grammar of graphics [@Wilkinson2012].\index{Graph!Time series} - -```{r} -#| label: fig-100 -#| fig-cap: "Prices are in USD, adjusted for dividend payments and stock splits." -#| fig-alt: "Title: Apple stock prices between the beginning of 2000 and the end of 2023. The figure shows that the stock price of Apple increased dramatically from about 1 USD to around 125 USD." -prices |> - ggplot(aes(x = date, y = adjusted_close)) + - geom_line() + - labs( - x = NULL, - y = NULL, - title = "Apple stock prices between beginning of 2000 and end of 2023" - ) -``` - -\index{Returns} Instead of analyzing prices, we compute daily net returns defined as $r_t = p_t / p_{t-1} - 1$, where $p_t$ is the adjusted day $t$ price. In that context, the function `lag()` is helpful, which returns the previous value in a vector. - -```{r} -returns <- prices |> - arrange(date) |> - mutate(ret = adjusted_close / lag(adjusted_close) - 1) |> - select(symbol, date, ret) -returns -``` - -The resulting tibble contains three columns, where the last contains the daily returns (`ret`). Note that the first entry naturally contains a missing value (`NA`) because there is no previous price.\index{Missing value} Obviously, the use of `lag()` would be meaningless if the time series is not ordered by ascending dates.\index{Lag observations} The command `arrange()` provides a convenient way to order observations in the correct way for our application. In case you want to order observations by descending dates, you can use `arrange(desc(date))`. - -For the upcoming examples, we remove missing values as these would require separate treatment when computing, e.g., sample averages. In general, however, make sure you understand why `NA` values occur and carefully examine if you can simply get rid of these observations. - -```{r} -returns <- returns |> - drop_na(ret) -``` - -Next, we visualize the distribution of daily returns in a histogram in @fig-101. \index{Graph!Histogram} Additionally, we add a dashed line that indicates the 5 percent quantile of the daily returns to the histogram, which is a (crude) proxy for the worst return of the stock with a probability of at most 5 percent. The 5 percent quantile is closely connected to the (historical) value-at-risk, a risk measure commonly monitored by regulators. \index{Value-at-risk} We refer to @Tsay2010 for a more thorough introduction to stylized facts of returns.\index{Returns} - -```{r} -#| label: fig-101 -#| fig-alt: "Title: Distribution of daily Apple stock returns in percent. The figure shows a histogram of daily returns. The range indicates a few large negative values, while the remaining returns are distributed around 0. 
The vertical line indicates that the historical 5 percent quantile of daily returns was around negative 3 percent." -#| fig-cap: "The dotted vertical line indicates the historical 5 percent quantile." -quantile_05 <- quantile(returns |> pull(ret), probs = 0.05) -returns |> - ggplot(aes(x = ret)) + - geom_histogram(bins = 100) + - geom_vline(aes(xintercept = quantile_05), - linetype = "dashed" - ) + - labs( - x = NULL, - y = NULL, - title = "Distribution of daily Apple stock returns" - ) + - scale_x_continuous(labels = percent) -``` - -Here, `bins = 100` determines the number of bins used in the illustration and hence implicitly the width of the bins. Before proceeding, make sure you understand how to use the geom `geom_vline()` to add a dashed line that indicates the 5 percent quantile of the daily returns. A typical task before proceeding with *any* data is to compute summary statistics for the main variables of interest. - -```{r} -returns |> - summarize(across( - ret, - list( - daily_mean = mean, - daily_sd = sd, - daily_min = min, - daily_max = max - ) - )) -``` - -We see that the maximum *daily* return was `r returns |> pull(ret) |> max() * 100` percent. Perhaps not surprisingly, the average daily return is close to but slightly above 0. In line with the illustration above, the large losses on the day with the minimum returns indicate a strong asymmetry in the distribution of returns.\ -You can also compute these summary statistics for each year individually by imposing `group_by(year = year(date))`, where the call `year(date)` returns the year. More specifically, the few lines of code below compute the summary statistics from above for individual groups of data defined by year. The summary statistics, therefore, allow an eyeball analysis of the time-series dynamics of the return distribution. - -```{r} -returns |> - group_by(year = year(date)) |> - summarize(across( - ret, - list( - daily_mean = mean, - daily_sd = sd, - daily_min = min, - daily_max = max - ), - .names = "{.fn}" - )) |> - print(n = Inf) -``` - -\index{Summary statistics} - -In case you wonder: the additional argument `.names = "{.fn}"` in `across()` determines how to name the output columns. The specification is rather flexible and allows almost arbitrary column names, which can be useful for reporting. The `print()` function simply controls the output options for the R console. - -## Scaling Up the Analysis - -As a next step, we generalize the code from before such that all the computations can handle an arbitrary vector of symbols (e.g., all constituents of an index). Following tidy principles, it is quite easy to download the data, plot the price time series, and tabulate the summary statistics for an arbitrary number of assets. - -This is where the `tidyverse` magic starts: tidy data makes it extremely easy to generalize the computations from before to as many assets as you like. The following code takes any vector of symbols, e.g., `symbol <- c("AAPL", "MMM", "BA")`, and automates the download as well as the plot of the price time series. In the end, we create the table of summary statistics for an arbitrary number of assets. 
We perform the analysis with data from all current constituents of the [Dow Jones Industrial Average index.](https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average) \index{Data!Dow Jones Index} - -```{r} -#| message: false -symbols <- download_data(type = "constituents", index = "Dow Jones Industrial Average") -symbols -``` - -Conveniently, `tidyfinance` provides the functionality to get all stock prices from an index with a single call. \index{Exchange!NASDAQ} - -```{r} -#| cache: true -#| output: false -prices_daily <- download_data( - type = "stock_prices", - symbols = symbols$symbol, - start_date = "2000-01-01", - end_date = "2023-12-31" -) -``` - -The resulting tibble contains `r nrow(prices_daily)` daily observations for `r unique(prices_daily$symbol)` different stocks. @fig-103 illustrates the time series of downloaded *adjusted* prices for each of the constituents of the Dow Jones index. Make sure you understand every single line of code! What are the arguments of `aes()`? Which alternative `geoms` could you use to visualize the time series? Hint: if you do not know the answers try to change the code to see what difference your intervention causes. - -```{r} -#| label: fig-103 -#| fig-cap: "Prices in USD, adjusted for dividend payments and stock splits." -#| fig-alt: "Title: Stock prices of DOW index constituents. The figure shows many time series with daily prices. The general trend seems positive for most stocks in the DOW index." -prices_daily |> - ggplot(aes( - x = date, - y = adjusted_close, - color = symbol - )) + - geom_line() + - labs( - x = NULL, - y = NULL, - color = NULL, - title = "Stock prices of DOW index constituents" - ) + - theme(legend.position = "none") -``` - -Do you notice the small differences relative to the code we used before? All we need to do to illustrate all stock symbols simultaneously is to include `color = symbol` in the `ggplot` aesthetics. In this way, we generate a separate line for each symbol. Of course, there are simply too many lines on this graph to identify the individual stocks properly, but it illustrates the point well. - -The same holds for stock returns. Before computing the returns, we use `group_by(symbol)` such that the `mutate()` command is performed for each symbol individually. The same logic also applies to the computation of summary statistics: `group_by(symbol)` is the key to aggregating the time series into symbol-specific variables of interest. - -```{r} -returns_daily <- prices_daily |> - group_by(symbol) |> - mutate(ret = adjusted_close / lag(adjusted_close) - 1) |> - select(symbol, date, ret) |> - drop_na(ret) - -returns_daily |> - group_by(symbol) |> - summarize(across( - ret, - list( - daily_mean = mean, - daily_sd = sd, - daily_min = min, - daily_max = max - ), - .names = "{.fn}" - )) |> - print(n = Inf) -``` - -\index{Summary statistics} - -Note that you are now also equipped with all tools to download price data for *each* symbol listed in the S&P 500 index with the same number of lines of code. Just use `symbol <- download_data(type = "constituents", index = "S&P 500")`, which provides you with a tibble that contains each symbol that is (currently) part of the S&P 500.\index{Data!SP 500} However, don't try this if you are not prepared to wait for a couple of minutes because this is quite some data to download! - -## Other Forms of Data Aggregation - -Of course, aggregation across variables other than `symbol` can also make sense. 
For instance, suppose you are interested in answering the question: Are days with high aggregate trading volume likely followed by days with high aggregate trading volume? To provide some initial analysis on this question, we take the downloaded data and compute aggregate daily trading volume for all Dow Jones constituents in USD. Recall that the column `volume` is denoted in the number of traded shares.\index{Trading volume} Thus, we multiply the trading volume with the daily closing price to get a proxy for the aggregate trading volume in USD. Scaling by `1e9` (R can handle scientific notation) denotes daily trading volume in billion USD. - -```{r} -#| label: fig-104 -#| fig-cap: "Total daily trading volume in billion USD." -#| fig-alt: "Title: Aggregate daily trading volume. The figure shows a volatile time series of daily trading volume, ranging from 15 in 2000 to 20.5 in 2023, with a maximum of more than 100." -trading_volume <- prices_daily |> - group_by(date) |> - summarize(trading_volume = sum(volume * adjusted_close)) - -trading_volume |> - ggplot(aes(x = date, y = trading_volume)) + - geom_line() + - labs( - x = NULL, y = NULL, - title = "Aggregate daily trading volume of DOW index constitutens" - ) + - scale_y_continuous(labels = unit_format(unit = "B", scale = 1e-9)) - -``` - -@fig-104 indicates a clear upward trend in aggregated daily trading volume. In particular, since the outbreak of the COVID-19 pandemic, markets have processed substantial trading volumes, as analyzed, for instance, by @Goldstein2021.\index{Covid 19} One way to illustrate the persistence of trading volume would be to plot volume on day $t$ against volume on day $t-1$ as in the example below. In @fig-105, we add a dotted 45°-line to indicate a hypothetical one-to-one relation by `geom_abline()`, addressing potential differences in the axes' scales. - -```{r} -#| label: fig-105 -#| fig-cap: "Total daily trading volume in billion USD." -#| fig-alt: "Title: Persistence in daily trading volume of DOW index constituents. The figure shows a scatterplot where aggregate trading volume and previous-day aggregate trading volume neatly line up along a 45-degree line." -trading_volume |> - ggplot(aes(x = lag(trading_volume), y = trading_volume)) + - geom_point() + - geom_abline(aes(intercept = 0, slope = 1), - linetype = "dashed" - ) + - labs( - x = "Previous day aggregate trading volume", - y = "Aggregate trading volume", - title = "Persistence in daily trading volume of DOW index constituents" - ) + - scale_x_continuous(labels = unit_format(unit = "B", scale = 1e-9)) + - scale_y_continuous(labels = unit_format(unit = "B", scale = 1e-9)) -``` - -Do you understand where the warning `## Warning: Removed 1 rows containing missing values (geom_point).` comes from and what it means? Purely eye-balling reveals that days with high trading volume are often followed by similarly high trading volume days.\index{Error message} - -## Portfolio Choice Problems - -In the previous part, we show how to download stock market data and inspect it with graphs and summary statistics. 
Now, we move to a typical question in Finance: how to allocate wealth across different assets optimally.\index{Portfolio choice} The standard framework for optimal portfolio selection considers investors that prefer higher future returns but dislike future return volatility (defined as the square root of the return variance): the *mean-variance investor* [@Markowitz1952].\index{Markowitz optimization} - -\index{Efficient frontier} An essential tool to evaluate portfolios in the mean-variance context is the *efficient frontier*, the set of portfolios which satisfies the condition that no other portfolio exists with a higher expected return but with the same volatility (the square root of the variance, i.e., the risk), see, e.g., @Merton1972.\index{Return volatility} We compute and visualize the efficient frontier for several stocks. First, we extract each asset's *monthly* returns. In order to keep things simple, we work with a balanced panel and exclude DOW constituents for which we do not observe a price on every single trading day since the year 2000. - -```{r} -prices_daily <- prices_daily |> - group_by(symbol) |> - mutate(n = n()) |> - ungroup() |> - filter(n == max(n)) |> - select(-n) - -returns_monthly <- prices_daily |> - mutate(date = floor_date(date, "month")) |> - group_by(symbol, date) |> - summarize(price = last(adjusted_close), .groups = "drop_last") |> - mutate(ret = price / lag(price) - 1) |> - drop_na(ret) |> - select(-price) -``` - -Here, `floor_date()` is a function from the `lubridate` package [@lubridate], which provides useful functions to work with dates and times. - -Next, we transform the returns from a tidy tibble into a $(T \times N)$ matrix with one column for each of the $N$ symbols and one row for each of the $T$ trading days to compute the sample average return vector $$\hat\mu = \frac{1}{T}\sum\limits_{t=1}^T r_t$$ where $r_t$ is the $N$ vector of returns on date $t$ and the sample covariance matrix $$\hat\Sigma = \frac{1}{T-1}\sum\limits_{t=1}^T (r_t - \hat\mu)(r_t - \hat\mu)'.$$ We achieve this by using `pivot_wider()` with the new column names from the column `symbol` and setting the values to `ret`. We compute the vector of sample average returns and the sample variance-covariance matrix, which we consider as proxies for the parameters of the distribution of future stock returns. Thus, for simplicity, we refer to $\Sigma$ and $\mu$ instead of explicitly highlighting that the sample moments are estimates. \index{Covariance} In later chapters, we discuss the issues that arise once we take estimation uncertainty into account. - -```{r} -returns_matrix <- returns_monthly |> - pivot_wider( - names_from = symbol, - values_from = ret - ) |> - select(-date) -sigma <- cov(returns_matrix) -mu <- colMeans(returns_matrix) -``` - -Then, we compute the minimum variance portfolio weights $\omega_\text{mvp}$ as well as the expected portfolio return $\omega_\text{mvp}'\mu$ and volatility $\sqrt{\omega_\text{mvp}'\Sigma\omega_\text{mvp}}$ of this portfolio. \index{Minimum variance portfolio} Recall that the minimum variance portfolio is the vector of portfolio weights that are the solution to $$\omega_\text{mvp} = \arg\min \omega'\Sigma \omega \text{ s.t. } \sum\limits_{i=1}^N\omega_i = 1.$$ The constraint that weights sum up to one simply implies that all funds are distributed across the available asset universe, i.e., there is no possibility to retain cash. 
It is easy to show analytically that $\omega_\text{mvp} = \frac{\Sigma^{-1}\iota}{\iota'\Sigma^{-1}\iota}$, where $\iota$ is a vector of ones and $\Sigma^{-1}$ is the inverse of $\Sigma$. - -```{r} -N <- ncol(returns_matrix) -iota <- rep(1, N) -sigma_inv <- solve(sigma) -mvp_weights <- sigma_inv %*% iota -mvp_weights <- mvp_weights / sum(mvp_weights) -tibble( - average_ret = as.numeric(t(mvp_weights) %*% mu), - volatility = as.numeric(sqrt(t(mvp_weights) %*% sigma %*% mvp_weights)) -) -``` - -The command `solve(A, b)` returns the solution of a system of equations $Ax = b$. If `b` is not provided, as in the example above, it defaults to the identity matrix such that `solve(sigma)` delivers $\Sigma^{-1}$ (if a unique solution exists).\ -Note that the *monthly* volatility of the minimum variance portfolio is of the same order of magnitude as the *daily* standard deviation of the individual components. Thus, the diversification benefits in terms of risk reduction are tremendous!\index{Diversification} - -Next, we set out to find the weights for a portfolio that achieves, as an example, three times the expected return of the minimum variance portfolio. However, mean-variance investors are not interested in any portfolio that achieves the required return but rather in the efficient portfolio, i.e., the portfolio with the lowest standard deviation. If you wonder where the solution $\omega_\text{eff}$ comes from: \index{Efficient portfolio} The efficient portfolio is chosen by an investor who aims to achieve minimum variance *given a minimum acceptable expected return* $\bar{\mu}$. Hence, their objective function is to choose $\omega_\text{eff}$ as the solution to $$\omega_\text{eff}(\bar{\mu}) = \arg\min \omega'\Sigma \omega \text{ s.t. } \omega'\iota = 1 \text{ and } \omega'\mu \geq \bar{\mu}.$$ - -The code below implements the analytic solution to this optimization problem for a benchmark return $\bar\mu$, which we set to 3 times the expected return of the minimum variance portfolio. We encourage you to verify that it is correct. - -```{r} -benchmark_multiple <- 3 -mu_bar <- benchmark_multiple * t(mvp_weights) %*% mu -C <- as.numeric(t(iota) %*% sigma_inv %*% iota) -D <- as.numeric(t(iota) %*% sigma_inv %*% mu) -E <- as.numeric(t(mu) %*% sigma_inv %*% mu) -lambda_tilde <- as.numeric(2 * (mu_bar - D / C) / (E - D^2 / C)) -efp_weights <- mvp_weights + - lambda_tilde / 2 * (sigma_inv %*% mu - D * mvp_weights) -``` - -## The Efficient Frontier - -\index{Separation theorem} The mutual fund separation theorem states that as soon as we have two efficient portfolios (such as the minimum variance portfolio $\omega_\text{mvp}$ and the efficient portfolio for a higher required level of expected returns $\omega_\text{eff}(\bar{\mu})$, we can characterize the entire efficient frontier by combining these two portfolios. That is, any linear combination of the two portfolio weights will again represent an efficient portfolio. \index{Efficient frontier} The code below implements the construction of the *efficient frontier*, which characterizes the highest expected return achievable at each level of risk. To understand the code better, make sure to familiarize yourself with the inner workings of the `for` loop. 
- -```{r} -length_year <- 12 -a <- seq(from = -0.4, to = 1.9, by = 0.01) -results <- tibble( - a = a, - mu = NA, - sd = NA -) -for (i in seq_along(a)) { - w <- (1 - a[i]) * mvp_weights + (a[i]) * efp_weights - results$mu[i] <- length_year * t(w) %*% mu - results$sd[i] <- sqrt(length_year) * sqrt(t(w) %*% sigma %*% w) -} -``` - -The code above proceeds in two steps: First, we compute a vector of combination weights $a$ and then we evaluate the resulting linear combination with $a\in\mathbb{R}$:\ -$$\omega^* = a\omega_\text{eff}(\bar\mu) + (1-a)\omega_\text{mvp} = \omega_\text{mvp} + \frac{\lambda^*}{2}\left(\Sigma^{-1}\mu -\frac{D}{C}\Sigma^{-1}\iota \right)$$ with $\lambda^* = 2\frac{a\bar\mu + (1-a)\tilde\mu - D/C}{E-D^2/C}$ where $C = \iota'\Sigma^{-1}\iota$, $D=\iota'\Sigma^{-1}\mu$, and $E=\mu'\Sigma^{-1}\mu$. Finally, it is simple to visualize the efficient frontier alongside the two efficient portfolios within one powerful figure using `ggplot` (see @fig-106). We also add the individual stocks in the same call. We compute annualized returns based on the simple assumption that monthly returns are independent and identically distributed. Thus, the average annualized return is just 12 times the expected monthly return.\index{Graph!Efficient frontier} - -```{r} -#| label: fig-106 -#| fig-cap: "The big dots indicate the location of the minimum variance and the efficient portfolio that delivers 3 times the expected return of the minimum variance portfolio, respectively. The small dots indicate the location of the individual constituents." -#| fig-alt: "Title: Efficient frontier for DOW index constituents. The figure shows DOW index constituents in a mean-variance diagram. A hyperbola indicates the efficient frontier of portfolios that dominate the individual holdings in the sense that they deliver higher expected returns for the same level of volatility." -results |> - ggplot(aes(x = sd, y = mu)) + - geom_point() + - geom_point( - data = results |> filter(a %in% c(0, 1)), - size = 4 - ) + - geom_point( - data = tibble( - mu = length_year * mu, - sd = sqrt(length_year) * sqrt(diag(sigma)) - ), - aes(y = mu, x = sd), size = 1 - ) + - labs( - x = "Annualized standard deviation", - y = "Annualized expected return", - title = "Efficient frontier for DOW index constituents" - ) + - scale_x_continuous(labels = percent) + - scale_y_continuous(labels = percent) -``` - -The line in @fig-106 indicates the efficient frontier: the set of portfolios a mean-variance efficient investor would choose from. Compare the performance relative to the individual assets (the dots) - it should become clear that diversifying yields massive performance gains (at least as long as we take the parameters $\Sigma$ and $\mu$ as given). - -## Exercises - -1. Download daily prices for another stock market symbol of your choice from Yahoo Finance with `download_data()` from the `tidyfinance` package. Plot two time series of the symbol’s un-adjusted and adjusted closing prices. Explain the differences. -1. Compute daily net returns for an asset of your choice and visualize the distribution of daily returns in a histogram using 100 bins. Also, use `geom_vline()` to add a dashed red vertical line that indicates the 5 percent quantile of the daily returns. Compute summary statistics (mean, standard deviation, minimum and maximum) for the daily returns. -1. Take your code from before and generalize it such that you can perform all the computations for an arbitrary vector of symbols (e.g., `symbol <- c("AAPL", "MMM", "BA")`). 
Automate the download, the plot of the price time series, and create a table of return summary statistics for this arbitrary number of assets. -1. Are days with high aggregate trading volume often also days with large absolute returns? Find an appropriate visualization to analyze the question using the symbol `AAPL`. -1.Compute monthly returns from the downloaded stock market prices. Compute the vector of historical average returns and the sample variance-covariance matrix. Compute the minimum variance portfolio weights and the portfolio volatility and average returns. Visualize the mean-variance efficient frontier. Choose one of your assets and identify the portfolio which yields the same historical volatility but achieves the highest possible average return. -1. In the portfolio choice analysis, we restricted our sample to all assets trading every day since 2000. How is such a decision a problem when you want to infer future expected portfolio performance from the results? -1. The efficient frontier characterizes the portfolios with the highest expected return for different levels of risk. Identify the portfolio with the highest expected return per standard deviation. Which famous performance measure is close to the ratio of average returns to the standard deviation of returns? diff --git a/r/modern-portfolio-theory.qmd b/r/modern-portfolio-theory.qmd new file mode 100644 index 00000000..367e2f06 --- /dev/null +++ b/r/modern-portfolio-theory.qmd @@ -0,0 +1,435 @@ +--- +title: Modern Portfolio Theory +metadata: + pagetitle: Modern Portfolio Theory with R + description-meta: Learn how to use the programming language R for implementing the Markowitz model for portfolio optimization. +--- + +In the previous chapter, we show how to download stock market data and analyze them with graphs and summary statistics. Now, we move to a typical question in finance: How should an investor allocate her wealth across assets with varying returns, risks, and correlations to optimize her portfolio’s performance?\index{Portfolio choice} Modern Portfolio Theory (MPT), introduced by @Markowitz1952, revolutionized the way we think about such investment decisions by formalizing the trade-off between risk and expected return. Markowitz’s framework laid the foundation for much of modern finance, also earning him the Sveriges Riksbank Prize in Economic Sciences in 1990. + +Markowitz demonstrates that portfolio risk depends on individual asset volatilities as well as on the correlations between asset returns. This insight highlights the power of diversification: combining assets with low or negative correlations reduces the overall portfolio risk. This principle is often illustrated with the analogy of a fruit basket: If all you have are apples and they spoil, you lose everything. With a variety of fruits, some fruits may spoil, but others will stay fresh. + +At the heart of MPT is mean-variance analysis, which evaluates portfolios based on two dimensions: expected return and risk. By balancing these two factors, investors can construct portfolios that either maximize their expected return for a given level of risk or minimize risk for a desired level of return. In this chapter, we'll implement this mean-variance approach in R.
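+
+To see the diversification effect in numbers before we formalize it, consider a minimal two-asset sketch in base R (the volatilities, correlation, and weights below are made up purely for illustration):
+
+```{r}
+# Two hypothetical assets with identical volatility and a 50/50 allocation
+sigma_1 <- 0.20
+sigma_2 <- 0.20
+rho <- 0.3
+omega_1 <- 0.5
+omega_2 <- 0.5
+
+# Portfolio volatility from the two-asset variance formula
+sqrt(
+  omega_1^2 * sigma_1^2 + omega_2^2 * sigma_2^2 +
+    2 * omega_1 * omega_2 * rho * sigma_1 * sigma_2
+)
+```
+
+Although each asset has a volatility of 20 percent, the 50/50 portfolio has a volatility of only about 16 percent, and the lower the correlation between the two assets, the stronger this reduction.
+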
+ +We use the following packages throughout this chapter: + +```{r} +#| message: false +#| warning: false +library(tidyverse) +library(tidyfinance) +library(scales) +library(ggrepel) +``` + +We introduce the `ggrepel` package [@ggrepel] for adding text labels to the figures in this chapter. + +## Estimate Expected Returns + +Expected returns, denoted as $\mu_i$, represent the anticipated profit from holding an asset $i$ with $i=1,..., N$. They are typically estimated using historical data by computing the average of past returns: + +$$\hat{\mu}_i = \frac{1}{T} \sum_{t=1}^{T} r_{it},$$ + +where $r_{it}$ is the return of asset $i$ in period $t$, and $T$ is the total number of periods. Note that the hat in $\hat{\mu}_i$ indicates that it is the estimated counterpart of the theoretical entity $\mu_i$. While past performance does not guarantee future results, the typical assumption is that it is at least indicative of future performance and hence a sensible estimator. + +Leveraging the approach introduced in [Working with Stock Returns](working-with-stock-returns.qmd), we download the constituents of the Dow Jones Industrial Average as an example portfolio, as well as their daily adjusted close prices: + +```{r} +#| output: false +symbols <- download_data( + type = "constituents", + index = "Dow Jones Industrial Average" +) + +prices_daily <- download_data( + type = "stock_prices", + symbol = symbols$symbol, + start_date = "2019-08-01", + end_date = "2024-07-31" +) |> + select(symbol, date, price = adjusted_close) +``` + +Then, we proceed to calculate daily returns for each asset. + +```{r} +returns_daily <- prices_daily |> + group_by(symbol) |> + mutate(ret = price / lag(price) - 1) |> + ungroup() |> + select(symbol, date, ret) |> + drop_na(ret) |> + arrange(symbol, date) +``` + +We can use the tidy return data to quickly calculate the sample average return of each asset in the Dow Jones Industrial Average, which we use as the estimated expected return. + +```{r} +assets <- returns_daily |> + group_by(symbol) |> + summarize(mu = mean(ret)) +``` + +@fig-201 shows the corresponding average daily returns of the constituents of our example portfolio. + +```{r} +#| label: fig-201 +#| fig-cap: "Average daily returns based on prices adjusted for dividend payments and stock splits." +#| fig-alt: "Title: Average daily stock returns of Dow index constituents. The figure shows 30 bars with average daily returns." +fig_mu <- assets |> + ggplot(aes(x = mu, y = fct_reorder(symbol, mu), fill = mu > 0)) + + geom_col() + + scale_x_continuous(labels = percent) + + labs( + x = NULL, y = NULL, fill = NULL, + title = "Average daily returns of Dow index constituents" + ) + + theme(legend.position = "none") +fig_mu +``` + +## Estimate the Variance-Covariance Matrix + +Individual asset risk in MPT is typically quantified using variance ($\sigma^2$) or volatilities ($\sigma$).^[Alternative approaches include Value-at-Risk (VaR), Expected Shortfall, or higher-order moments such as skewness and kurtosis.] The latter can be estimated as the sample standard deviation: + +$$\hat{\sigma}_i = \sqrt{\frac{1}{T-1} \sum_{t=1}^{T} (r_{it} - \hat{\mu}_i)^2}$$ + +We can estimate the volatilities for each asset by simply using the `sd()` function. + +```{r} +volatilities <- returns_daily |> + group_by(symbol) |> + summarize(sigma = sd(ret)) + +assets <- assets |> + left_join(volatilities, join_by(symbol)) +``` + +@fig-202 shows the corresponding individual stock volatilities.
+ +```{r} +#| label: fig-202 +#| fig-cap: "Daily volatilities based on prices adjusted for dividend payments and stock splits." +#| fig-alt: "Title: Daily volatilities of Dow index constituents. The figure shows 30 bars with daily volatilities." +fig_sigma <- assets |> + ggplot(aes(x = sigma, y = fct_reorder(symbol, sigma))) + + geom_col() + + scale_x_continuous(labels = percent) + + labs(x = NULL, y = NULL, + title = "Daily volatilities of Dow index constituents") +fig_sigma +``` + +As highlighted above, a key innovation of MPT is to consider interactions between assets. *Covariance* provides a measure for these interactions and can be estimated as follows: \index{Covariance} + +$$\hat{\sigma}_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} (r_{it} - \hat{\mu}_i)(r_{jt} - \hat{\mu}_j)$$ + +The interpretation is straightforward: while a positive covariance indicates that assets tend to move in the same direction, a negative covariance indicates that assets move in opposite directions. Since the overall portfolio volatility is a function of the individual covariances, the lower the covariances, the greater the reduction in overall risk through diversification. + +Since individual asset variances are just a special case of the covariance formula above with $i=j$, we can denote the estimated variance-covariance matrix as: + +$$\hat\Sigma = \frac{1}{T-1}\sum\limits_{t=1}^T (r_t - \hat\mu)(r_t - \hat\mu)'.$$ + +To estimate this variance-covariance matrix, we can use the `cov()` function, which takes a matrix of returns as input. We thus need to transform the returns from a tidy data frame into a $(T \times N)$ matrix with one column for each of the $N$ symbols and one row for each of the $T$ trading days. We achieve this by using `pivot_wider()`, taking the new column names from the `symbol` column and the values from `ret`. + +```{r} +returns_wide <- returns_daily |> + pivot_wider(names_from = symbol, values_from = ret) + +vcov <- returns_wide |> + select(-date) |> + cov() +``` + +@fig-203 illustrates the resulting variance-covariance matrix. + +```{r} +#| label: fig-203 +#| fig-cap: "Variances and covariances based on prices adjusted for dividend payments and stock splits." +#| fig-alt: "Title: Variance-covariance matrix of Dow index constituents. The figure shows 900 tiles with variances and covariances between each constituent-pair." +fig_vcov <- vcov |> + as_tibble(rownames = "symbol_a") |> + pivot_longer(-symbol_a, names_to = "symbol_b") |> + ggplot(aes(x = symbol_a, y = fct_rev(symbol_b), fill = value)) + + geom_tile() + + labs( + x = NULL, y = NULL, fill = "(Co-)Variance", + title = "Variance-covariance matrix of Dow index constituents" + ) + + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + + guides(fill = guide_colorbar(barwidth = 15, barheight = 0.5)) +fig_vcov +``` + +## The Minimum-Variance Framework + +The expected asset returns and variance-covariance matrix allow us to formally define portfolio returns: + +$$\text{Expected Portfolio Return} = \sum_{i=1}^N \omega_i \hat{\mu}_i,$$ +where $\omega_i$ is the weight of asset $i$ in the portfolio and $\hat{\mu}_i$ is the estimated expected return of asset $i$. For simplicity, we assume that portfolio weights are constant over time for now.
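As a quick illustration of this formula, we can compute the estimated expected return of a hypothetical equally weighted portfolio of the Dow constituents. This is a minimal sketch that reuses the `assets` data frame from above; the equal weights are an assumption for illustration only, not a recommendation.

```{r}
# Hypothetical equally weighted portfolio: omega_i = 1/N for all N constituents
omega_ew <- rep(1 / nrow(assets), nrow(assets))

# Expected portfolio return as the weighted sum of estimated expected returns
sum(omega_ew * assets$mu)
```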
+ +The portfolio variance is further calculated as + +$$\text{Portfolio Variance} =\sum_{i=1}^{N} \sum_{j=1}^{N} \omega_i \omega_j \hat{\sigma}_{ij},$$ + +where $\omega_i$ and $\omega_j$ are the weights of assets $i$ and $j$ in the portfolio, respectively, and $\hat{\sigma}_{ij}$ is the estimated covariance between returns of assets $i$ and $j$. + +Let us begin by finding the portfolio with the minimum risk as a reference point. The optimization problem that aims only at minimizing the portfolio variance is given by + +$$\min_{\omega_1, \ldots, \omega_N} \sum_{i=1}^{N} \sum_{j=1}^{N} \omega_i \omega_j \hat{\sigma}_{ij},$$ + +while staying fully invested across all $N$ available assets: + +$$\sum_{i=1}^{N} \omega_i = 1$$ + +Solving this problem analytically is easier in matrix notation, so we transform the problem to + +$$\min_{\omega} \omega' \hat{\Sigma} \omega \text{ s.t. } \omega'\iota = 1,$$ + +where $\hat{\Sigma}$ is the $(N\times N)$ variance-covariance matrix and $\omega=\left(\omega_1,\ldots,\omega_N\right)'$ is the $(N\times1)$ vector of portfolio weights. The solution for the minimum-variance portfolio can then be described as + +$$\omega_\text{mvp} = \frac{\Sigma^{-1}\iota}{\iota'\Sigma^{-1}\iota},$$ +where $\iota$ is an $(N\times1)$ vector of ones and $\Sigma^{-1}$ is the inverse of the variance-covariance matrix $\Sigma$. See [Proofs](proofs.qmd) in the Appendix for details on the derivation. In the following code chunk, we calculate the weights of the minimum-variance portfolio using this formula: + +```{r} +iota <- rep(1, dim(vcov)[1]) +vcov_inv <- solve(vcov) +omega_mvp <- as.vector(vcov_inv %*% iota) / + as.numeric(t(iota) %*% vcov_inv %*% iota) +``` + +@fig-204 shows the resulting portfolio weights. + +```{r} +#| label: fig-204 +#| fig-cap: "Weights are based on returns adjusted for dividend payments and stock splits." +#| fig-alt: "Title: Minimum-variance portfolio weights. The figure shows a bar chart with portfolio weights for each Dow index constituent." +assets <- bind_cols(assets, omega_mvp = omega_mvp) + +fig_omega_mvp <- assets |> + ggplot(aes(x = omega_mvp, y = fct_reorder(symbol, omega_mvp), + fill = omega_mvp > 0)) + + geom_col() + + scale_x_continuous(labels = percent) + + labs(x = NULL, y = NULL, + title = "Minimum-variance portfolio weights") + + theme(legend.position = "none") +fig_omega_mvp +``` + +Before we move on to other portfolios, we collect the return and risk of the minimum-variance portfolio in a data frame: + +```{r} +mu <- assets$mu + +summary_mvp <- tibble( + mu = as.numeric(t(omega_mvp) %*% mu), + sigma = as.numeric(sqrt(t(omega_mvp) %*% vcov %*% omega_mvp)), + type = "Minimum-Variance Portfolio" +) +summary_mvp +``` + +## Efficient Portfolios + +Next, we introduce the concept of efficient portfolios.
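Before setting up that problem formally, a quick comparison helps to motivate it. The following sketch reuses `mu`, `vcov`, and `omega_mvp` from above and contrasts the minimum-variance portfolio with the hypothetical equally weighted benchmark from earlier. By construction, the minimum-variance portfolio has the lower volatility, but its expected return need not be attractive to an investor.

```{r}
# Hypothetical equally weighted benchmark for comparison
omega_ew <- rep(1 / length(mu), length(mu))

tibble(
  type = c("Minimum-Variance Portfolio", "Equally Weighted Portfolio"),
  mu = c(
    as.numeric(t(omega_mvp) %*% mu),
    as.numeric(t(omega_ew) %*% mu)
  ),
  sigma = c(
    as.numeric(sqrt(t(omega_mvp) %*% vcov %*% omega_mvp)),
    as.numeric(sqrt(t(omega_ew) %*% vcov %*% omega_ew))
  )
)
```

An investor who targets a higher expected return than the minimum-variance portfolio offers has to accept additional risk, and this trade-off is exactly what efficient portfolios formalize.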
Again, we aim to minimize portfolio variance + +$$\min_{\omega} \omega' \hat{\Sigma} \omega,$$ + +while staying fully invested *and* earning a **minimum expected return** $\bar{\mu}$ + +$$ \omega'\iota = 1$$ +$$\omega'\hat{\mu} = \bar{\mu}$$ + +To choose a concrete value for $\bar{\mu}$, we compare the performance of the Dow (^DJI) with the Nasdaq 100 index (^NDX) over our sample period: + +```{r} +indices <- download_data( + "stock_prices", symbol = c("^DJI", "^NDX"), + start_date = "2019-08-01", end_date = "2024-07-31" +) |> + group_by(symbol) |> + arrange(date) |> + mutate(adjusted_close = adjusted_close / first(adjusted_close), + ret = adjusted_close / lag(adjusted_close) - 1) + +indices |> + ggplot(aes(x = date, y = adjusted_close, color = symbol)) + + geom_line() + + scale_y_continuous(labels = percent) + + labs(x = NULL, y = NULL, color = NULL, + title = "Performance of Dow (^DJI) vs Nasdaq 100 (^NDX)", + subtitle = "Both indexes start at 100%") +``` + +We choose the minimum expected return such that we achieve the average Nasdaq 100 return: + +```{r} +mu_bar <- mean(filter(indices, symbol == "^NDX")$ret, na.rm = TRUE) +``` + +Note that $\bar\mu$ needs to be higher than the expected return of the minimum-variance portfolio; otherwise, the return constraint would place us on the inefficient lower branch of the frontier. + +The analytic solution for the efficient portfolio can be derived as: + +$$\omega_\text{efp} = \omega_\text{mvp} + \frac{\tilde\lambda}{2}\left(\Sigma^{-1}\mu -\frac{D}{C}\Sigma^{-1}\iota \right),$$ + +where $\tilde\lambda = 2\frac{\bar\mu - D/C}{E-D^2/C}$, $C = \iota'\Sigma^{-1}\iota$, $D=\iota'\Sigma^{-1}\mu$, and $E=\mu'\Sigma^{-1}\mu$. For details, we again refer to the [Proofs](proofs.qmd) in the Appendix. + +The code below implements the analytic solution to this optimization problem and collects the resulting portfolio return and risk in a data frame. + +```{r} +C <- as.numeric(t(iota) %*% vcov_inv %*% iota) +D <- as.numeric(t(iota) %*% vcov_inv %*% mu) +E <- as.numeric(t(mu) %*% vcov_inv %*% mu) +lambda_tilde <- as.numeric(2 * (mu_bar - D / C) / (E - D^2 / C)) +omega_efp <- as.vector(omega_mvp + lambda_tilde / 2 * (vcov_inv %*% mu - D * omega_mvp)) + +summary_efp <- tibble( + mu = as.numeric(t(omega_efp) %*% mu), + sigma = as.numeric(sqrt(t(omega_efp) %*% vcov %*% omega_efp)), + type = "Efficient Portfolio" +) +``` + +@fig-205 shows the average return and volatility of the minimum-variance and the efficient portfolio relative to the individual index constituents. We can see that the efficient portfolio achieves a higher average return than the minimum-variance portfolio, but only at the cost of a higher volatility. + +```{r} +#| label: fig-205 +#| fig-cap: "The big dots indicate the location of the minimum-variance portfolio and the efficient portfolio that delivers the average return of the Nasdaq 100, respectively. The small dots indicate the location of the individual constituents." +#| fig-alt: "Title: Efficient and minimum-variance portfolios for Dow index constituents. The figure shows a scatter plot of average returns against volatilities, with the individual Dow constituents as small dots and the minimum-variance and efficient portfolios highlighted as large dots."
+#| warning: false +summaries <- bind_rows( + assets, summary_mvp, summary_efp +) + +fig_summaries <- summaries |> + ggplot(aes(x = sigma, y = mu)) + + geom_point( + data = summaries |> filter(is.na(type)) + ) + + geom_point( + data = summaries |> filter(!is.na(type)), color = "#F21A00", size = 3 + ) + + geom_label_repel(aes(label = type)) + + scale_x_continuous(labels = percent) + + scale_y_continuous(labels = percent) + + labs( + x = "Volatility", y = "Average return", + title = "Efficient & minimum-variance portfolios for Dow index constituents" + ) +fig_summaries +``` + +## The Efficient Frontier + +\index{Efficient frontier} An essential tool to evaluate portfolios in the mean-variance context is the *efficient frontier*, the set of portfolios for which no other portfolio with a higher expected return exists at the same level of volatility [see, e.g., @Merton1972].\index{Return volatility} \index{Separation theorem} The [mutual fund separation theorem](https://en.wikipedia.org/wiki/Mutual_fund_separation_theorem) states that as soon as we have two efficient portfolios (such as the minimum-variance portfolio $\omega_\text{mvp}$ and the efficient portfolio for a higher required level of expected returns $\omega_\text{efp}$), we can characterize the entire efficient frontier by combining these two portfolios. That is, any linear combination of the two portfolio weights will again represent an efficient portfolio. \index{Efficient frontier} + +The code below implements the construction of this efficient frontier, which characterizes the highest expected return achievable at each level of risk, by computing the linear combinations + +$$\omega(a) = a \cdot \omega_\text{efp} + (1-a) \cdot\omega_\text{mvp}$$ + +for a range of values of $a$. + +```{r} +efficient_frontier <- tibble( + a = seq(from = -1, to = 4, by = 0.01) +) |> + mutate( + omega = map(a, ~ .x * omega_efp + (1 - .x) * omega_mvp), + mu = map_dbl(omega, ~ t(.x) %*% mu), + sigma = map_dbl(omega, ~ sqrt(t(.x) %*% vcov %*% .x)) + ) +``` + +Finally, it is simple to visualize the efficient frontier alongside the two efficient portfolios in a figure using `ggplot` (see @fig-206). We also add the individual stocks in the same plot.\index{Graph!Efficient frontier} + +```{r} +#| label: fig-206 +#| fig-cap: "The big dots indicate the location of the minimum-variance portfolio and the efficient portfolio that delivers the average return of the Nasdaq 100, respectively. The small dots indicate the location of the individual constituents." +#| fig-alt: "Title: Efficient frontier for Dow index constituents. The figure shows Dow index constituents in a mean-variance diagram. A hyperbola indicates the efficient frontier of portfolios that dominate the individual holdings in the sense that they deliver higher expected returns for the same level of volatility." +#| warning: false +summaries <- bind_rows( + summaries, efficient_frontier + ) + +fig_efficient_frontier <- summaries |> + ggplot(aes(x = sigma, y = mu)) + + geom_point( + data = summaries |> filter(is.na(type)) + ) + + geom_point( + data = summaries |> filter(!is.na(type)), color = "#F21A00", size = 3 + ) + + geom_label_repel(aes(label = type)) + + scale_x_continuous(labels = percent) + + scale_y_continuous(labels = percent) + + labs(x = "Volatility", y = "Average return", + title = "Efficient frontier for Dow index constituents") + + theme(legend.position = "none") +fig_efficient_frontier +``` + +## Extending the Markowitz Model + +There are many ways to extend the Markowitz framework.
We want to point interested readers to the `PortfolioAnalytics` package, which provides a consistent interface to many popular extensions. + +To illustrate the `PortfolioAnalytics` package, we replicate our analytic solutions from above. We additionally use the `CVXR` package for a flexible suite of numerical optimizers. + +```{r} +#| message: false +#| warning: false +library(PortfolioAnalytics) +library(CVXR) +``` + +We first transform our wide returns data into a matrix whose column names are the symbols. These symbols are the key input to `portfolio.spec()`, which initializes a portfolio specification. We can then sequentially add objectives and constraints using the corresponding functions. Finally, the `optimize.portfolio()` function returns weights that coincide with our analytic solution. + +```{r} +returns_matrix <- column_to_rownames( + returns_wide, var = "date" +) + +problem_mvp <- portfolio.spec(colnames(returns_matrix)) |> + add.objective(type = "risk", name = "var") |> + add.constraint("full_investment") + +solution_mvp <- optimize.portfolio( + returns_matrix, problem_mvp, optimize_method = "CVXR" +) + +all.equal(omega_mvp, as.vector(solution_mvp$weights)) +``` + +Similarly, we can calculate the efficient portfolio by adding the return target constraint to the problem from above. Again, the numerical solution coincides with our analytic weights. + +```{r} +problem_efp <- problem_mvp |> + add.constraint("return", return_target = mu_bar) + +solution_efp <- optimize.portfolio( + returns_matrix, problem_efp, optimize_method = "CVXR" +) + +all.equal(omega_efp, as.vector(solution_efp$weights)) +``` + +You can easily extend this, e.g., by adding short-sale constraints (`add.constraint("long_only")`), position limits (`add.constraint("position_limit", max_pos = 10)`), or using different objectives (`add.objective(type = "risk", name = "ES")`). See the [official *PortfolioAnalytics* vignette](https://cran.r-project.org/web/packages/PortfolioAnalytics/vignettes/portfolio_vignette.pdf) for more possibilities. + +## Key Takeaways + +The mean-variance framework is a cornerstone of modern finance, emphasizing the trade-off between risk and return. The key insights from this chapter are: + +- Mean-variance optimization balances expected returns against expected portfolio risk. +- Portfolio risk depends on both volatilities and correlations between assets. +- The minimum-variance portfolio achieves the lowest possible risk for a given set of assets. +- Efficient portfolios maximize expected return for each level of risk. +- Analytical solutions exist for simple constraints, but numerical optimization is needed for more complex problems. +- The `PortfolioAnalytics` package provides a robust interface for extending the mean-variance framework with additional constraints or alternative objectives. + +## Exercises + +1. We restricted our sample to all assets trading every day since 2019-08-01. Discuss why this restriction might introduce survivorship bias and how it could affect inferences about future expected portfolio performance. +1. The efficient frontier characterizes portfolios with the highest expected return for different levels of risk. Identify the portfolio with the highest expected return per unit of standard deviation (risk). Which famous performance measure corresponds to the ratio of average returns to standard deviation? +1. Analyze how varying the correlation coefficients between asset returns influences the shape of the efficient frontier. Use hypothetical data for a small number of assets to visualize and interpret these changes. +1.
Modify the optimization problem to include a constraint that disallows short-selling (i.e., all portfolio weights must be non-negative). How does this constraint affect the minimum-variance portfolio and the efficient frontier? +1. Implement position limits (e.g., no single asset can represent more than 20% of the portfolio). Compare the resulting efficient frontier with the unrestricted case and discuss the implications. +1. Replace the variance and standard deviation with alternative risk metrics, such as Value at Risk (VaR) or Expected Shortfall (ES), in the portfolio optimization process. How do these changes affect the composition of the efficient portfolio? \ No newline at end of file diff --git a/r/render-settings.R b/r/render-settings.R index a5eda4f4..be8cc619 100644 --- a/r/render-settings.R +++ b/r/render-settings.R @@ -32,6 +32,7 @@ scale_colour_continuous <- function(...) { ... ) } + scale_colour_discrete <- function(...) { discrete_scale("colour", scale_name = "pal", @@ -45,6 +46,7 @@ scale_fill_continuous <- function(...) { ... ) } + scale_fill_discrete <- function(...) { discrete_scale("fill", scale_name = "pal", diff --git a/r/working-with-stock-returns.qmd b/r/working-with-stock-returns.qmd new file mode 100644 index 00000000..e376538e --- /dev/null +++ b/r/working-with-stock-returns.qmd @@ -0,0 +1,279 @@ +--- +title: Working with Stock Returns +aliases: + - ../introduction-to-tidy-finance.html + - ../r/introduction-to-tidy-finance.html +metadata: + pagetitle: Working with Stock Returns in R + description-meta: Learn how to use the programming language R for downloading and analyzing stock market data. +--- + +::: callout-note +You are reading **Tidy Finance with R**. You can find the equivalent chapter for the sibling **Tidy Finance with Python** [here](../python/introduction-to-tidy-finance.qmd). +::: + +The main aim of this chapter is to familiarize you with the `tidyverse` for working with stock market data. We focus on downloading and visualizing stock data from the data provider Yahoo Finance.\index{tidyverse} + +At the start of each session, we load the required R packages. Throughout the entire book, we always use the `tidyverse` [@Wickham2019] package. In this chapter, we also load the `tidyfinance` package to download stock price data. This package provides a convenient wrapper for various quantitative functions compatible with the `tidyverse` and our book. Finally, the package `scales` [@scales] provides useful scale functions for visualizations. + +You typically have to install a package once before you can load it into your active R session. In case you have not done this yet, call, for instance, `install.packages("tidyfinance")`. \index{tidyfinance} + +```{r} +#| message: false +library(tidyverse) +library(tidyfinance) +library(scales) +``` + +We first download daily prices for one stock symbol, e.g., the Apple stock (*AAPL*), directly from Yahoo Finance. To download the data, you can use the function `download_data`. If you do not know how to use it, make sure you read the help file by calling `?download_data`. We especially recommend taking a look at the examples section of the documentation.
We request daily data from the beginning of 2000 to the end of 2024, a period of 25 years.\index{Stock prices} + +```{r} +#| cache: true +prices <- download_data( + type = "stock_prices", + symbols = "AAPL", + start_date = "2000-01-01", + end_date = "2024-12-31" +) +prices +``` + +\index{Data!Yahoo Finance} `download_data(type = "stock_prices")` downloads stock market data from Yahoo Finance. The function returns a [tibble](https://tibble.tidyverse.org/) with eight self-explanatory columns: `symbol`, `date`, the daily `volume` (in the number of traded shares), the market prices at the `open`, `high`, `low`, `close`, and the `adjusted_close` price in USD. The adjusted prices are corrected for anything that might affect the stock price after the market closes, e.g., stock splits and dividends. These actions affect the quoted prices, but they have no direct impact on the investors who hold the stock. Therefore, we often rely on adjusted prices when it comes to analyzing the returns an investor would have earned by holding the stock continuously.\index{Stock price adjustments} + +Next, we use the `ggplot2` package [@ggplot2] to visualize the time series of adjusted prices in @fig-100. This package takes care of visualization tasks based on the principles of the grammar of graphics [@Wilkinson2012].\index{Graph!Time series} + +```{r} +#| label: fig-100 +#| fig-cap: "Prices are in USD, adjusted for dividend payments and stock splits." +#| fig-alt: "Title: Apple stock prices between the beginning of 2000 and the end of 2024. The figure shows that the stock price of Apple increased dramatically from about 1 USD to well above 100 USD." +prices |> + ggplot(aes(x = date, y = adjusted_close)) + + geom_line() + + labs( + x = NULL, + y = NULL, + title = "Apple stock prices between beginning of 2000 and end of 2024" + ) +``` + +Instead of analyzing prices, we compute daily net returns defined as $r_t = p_t / p_{t-1} - 1$, where $p_t$ is the adjusted price at the end of day $t$.\index{Returns} In that context, the function `lag()` is helpful by returning the previous value in a vector. + +```{r} +returns <- prices |> + arrange(date) |> + mutate(ret = adjusted_close / lag(adjusted_close) - 1) |> + select(symbol, date, ret) +returns +``` + +The resulting tibble has three columns, the last of which contains the daily returns (`ret`). Note that the first entry naturally contains a missing value (`NA`) because there is no previous price.\index{Missing value} Obviously, the use of `lag()` would be meaningless if the time series is not ordered by ascending dates.\index{Lag observations} The command `arrange()` provides a convenient way to order observations correctly for our application. In case you want to order observations by descending dates, you can use `arrange(desc(date))`. + +For the upcoming examples, we remove missing values, as these would require separate treatment in many computations. For example, missing values can affect sums and averages by reducing the number of valid data points, leading to underestimated sums and potentially skewed averages if not properly accounted for. In general, always make sure you understand why `NA` values occur and carefully examine if you can simply get rid of these observations. + +```{r} +returns <- returns |> + drop_na(ret) +``` + +Next, we visualize the distribution of daily returns in a histogram in @fig-101.
\index{Graph!Histogram} Additionally, we add a dashed vertical line to the histogram that indicates the five percent quantile of the daily returns, which is a crude proxy for the worst possible return of the stock with a probability of at most five percent. This quantile is closely connected to the (historical) value-at-risk, a risk measure commonly monitored by regulators.\index{Value-at-risk} We refer to @Tsay2010 for a more thorough introduction to the stylized facts of financial returns. + +```{r} +#| label: fig-101 +#| fig-alt: "Title: Distribution of daily Apple stock returns in percent. The figure shows a histogram of daily returns. The range indicates a few large negative values, while the remaining returns are distributed around 0. The vertical line indicates that the historical 5 percent quantile of daily returns was around negative 3 percent." +#| fig-cap: "The dashed vertical line indicates the historical 5 percent quantile." +quantile_05 <- quantile(returns |> pull(ret), probs = 0.05) +returns |> + ggplot(aes(x = ret)) + + geom_histogram(bins = 100) + + geom_vline(aes(xintercept = quantile_05), + linetype = "dashed" + ) + + labs( + x = NULL, + y = NULL, + title = "Distribution of daily Apple stock returns" + ) + + scale_x_continuous(labels = percent) +``` + +Here, `bins = 100` determines the number of bins used in the illustration and hence implicitly sets the width of the bins. Before proceeding, make sure you understand how to use the geom `geom_vline()` to add a dashed line that indicates the five percent quantile of the daily returns. A typical task before proceeding with *any* data is to compute and analyze the summary statistics for the main variables of interest. + +```{r} +returns |> + summarize(across( + ret, + list( + daily_mean = mean, + daily_sd = sd, + daily_min = min, + daily_max = max + ) + )) +``` + +We see that the maximum *daily* return was `r returns |> pull(ret) |> max() * 100` percent. Perhaps not surprisingly, the average daily return is close to but slightly above 0. In line with the illustration above, the large loss on the day with the minimum return indicates a strong asymmetry in the distribution of returns.\ +You can also compute these summary statistics for each year individually by using `group_by(year = year(date))`, where the call `year(date)` returns the year. More specifically, the few lines of code below compute the summary statistics from above for individual groups of data defined by the values of the column year. The summary statistics, therefore, allow an eyeball analysis of the time-series dynamics of the daily return distribution. + +```{r} +returns |> + group_by(year = year(date)) |> + summarize(across( + ret, + list( + daily_mean = mean, + daily_sd = sd, + daily_min = min, + daily_max = max + ), + .names = "{.fn}" + )) |> + print(n = Inf) +``` + +\index{Summary statistics} + +In case you wonder: the additional argument `.names = "{.fn}"` in `across()` determines how to name the output columns. It acts as a placeholder that gets replaced by the name of the function being applied (e.g., mean, sd, min, max) when creating new column names. The specification is rather flexible and allows almost arbitrary column names, which can be useful for reporting. The `print()` function simply controls the output options for the R console. + +## Scaling Up the Analysis + +As a next step, we generalize the code from before such that all computations can handle an arbitrary vector of symbols (e.g., all constituents of an index).
Following tidy principles, it is quite easy to download the data, plot the price time series, and tabulate the summary statistics for an arbitrary number of assets. + +This is where the `tidyverse` magic starts: tidy data makes it extremely easy to generalize the computations from before to as many assets or groups as you like. The following code takes any vector of symbols, e.g., `symbol <- c("AAPL", "MMM", "BA")`, and automates the download as well as the plot of the price time series. In the end, we create the table of summary statistics for all assets at once. We perform the analysis with data from all current constituents of the [Dow Jones Industrial Average index](https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average). \index{Data!Dow Jones Index} + +```{r} +#| message: false +symbols <- download_data( + type = "constituents", + index = "Dow Jones Industrial Average" +) +symbols +``` + +Conveniently, `tidyfinance` provides the functionality to download the stock prices of all index constituents with a single call. + +```{r} +#| cache: true +#| output: false +prices_daily <- download_data( + type = "stock_prices", + symbols = symbols$symbol, + start_date = "2000-01-01", + end_date = "2023-12-31" +) +``` + +The resulting tibble contains `r nrow(prices_daily)` daily observations for `r length(unique(prices_daily$symbol))` different stocks. @fig-103 illustrates the time series of the downloaded *adjusted* prices for each of the constituents of the Dow index. Make sure you understand every single line of code! What are the arguments of `aes()`? Which alternative `geoms` could you use to visualize the time series? Hint: If you do not know the answers, try to change the code to see what difference your intervention causes. + +```{r} +#| label: fig-103 +#| fig-cap: "Prices in USD, adjusted for dividend payments and stock splits." +#| fig-alt: "Title: Stock prices of Dow index constituents. The figure shows many time series with daily prices. The general trend seems positive for most stocks in the Dow index." +fig_prices <- prices_daily |> + ggplot(aes( + x = date, + y = adjusted_close, + color = symbol + )) + + geom_line() + + labs( + x = NULL, + y = NULL, + color = NULL, + title = "Stock prices of Dow index constituents" + ) + + theme(legend.position = "none") +fig_prices +``` + +Do you notice the small differences relative to the code we used before? All we needed to do to illustrate all stock symbols simultaneously is to include `color = symbol` in the `ggplot` aesthetics. In this way, we generate a separate line for each symbol. Of course, there are simply too many lines on this graph to identify the individual stocks properly, but it illustrates our point of how to generalize a specific analysis to an arbitrary number of subjects quite well. + +The same holds for stock returns. Before computing the returns, we use `group_by(symbol)` such that the `mutate()` command is performed for each symbol individually.
The same logic also applies to the computation of summary statistics: `group_by(symbol)` is the key to aggregating the time series into symbol-specific variables of interest.\index{Summary statistics} + +```{r} +returns_daily <- prices_daily |> + group_by(symbol) |> + mutate(ret = adjusted_close / lag(adjusted_close) - 1) |> + select(symbol, date, ret) |> + drop_na(ret) + +returns_daily |> + group_by(symbol) |> + summarize(across( + ret, + list( + daily_mean = mean, + daily_sd = sd, + daily_min = min, + daily_max = max + ), + .names = "{.fn}" + )) |> + print(n = Inf) +``` + +Note that you are now also equipped with all tools to download price data for *each* symbol listed in the S&P 500 index with the same number of lines of code. Just use `symbols <- download_data(type = "constituents", index = "S&P 500")`, which provides you with a tibble that contains each symbol that is (currently) part of the S&P 500.\index{Data!SP 500} However, don't try this if you are not prepared to wait for a couple of minutes because this is quite some data to download! + +## Other Forms of Data Aggregation + +Of course, aggregation across variables other than `symbol` can also make sense. For instance, suppose you are interested in answering questions like: Are days with high aggregate trading volume likely followed by days with high aggregate trading volume? To provide some initial analysis on this question, we take the downloaded data and compute aggregate daily trading volume for all Dow index constituents in USD. Recall that the column `volume` is denoted in the number of traded shares.\index{Trading volume} Thus, we multiply the trading volume by the daily adjusted closing price to get a proxy for the aggregate trading volume in USD. Scaling by `1e-9` (R can handle scientific notation) expresses daily trading volume in billion USD. + +```{r} +#| label: fig-104 +#| fig-cap: "Total daily trading volume in billion USD." +#| fig-alt: "Title: Aggregate daily trading volume. The figure shows a volatile time series of daily trading volume, ranging from 15 in 2000 to 20.5 in 2023, with a maximum of more than 100." +trading_volume <- prices_daily |> + group_by(date) |> + summarize(trading_volume = sum(volume * adjusted_close)) + +fig_trading_volume <- trading_volume |> + ggplot(aes(x = date, y = trading_volume)) + + geom_line() + + labs( + x = NULL, y = NULL, + title = "Aggregate daily trading volume of Dow index constituents" + ) + + scale_y_continuous(labels = unit_format(unit = "B", scale = 1e-9)) +fig_trading_volume +``` + +@fig-104 indicates a clear upward trend in aggregated daily trading volume. In particular, since the outbreak of the COVID-19 pandemic, markets have processed substantial trading volumes, as analyzed, for instance, by @Goldstein2021.\index{Covid 19} One way to illustrate the persistence of trading volume would be to plot volume on day $t$ against volume on day $t-1$ as in the example below. In @fig-105, we use `geom_abline()` to add a dashed 45°-line that indicates a hypothetical one-to-one relation, and we format both axes identically so that the comparison is straightforward. + +```{r} +#| label: fig-105 +#| fig-cap: "Aggregate daily trading volume in billion USD, plotted against the previous day's aggregate trading volume." +#| fig-alt: "Title: Persistence in daily trading volume of Dow index constituents. The figure shows a scatterplot where aggregate trading volume and previous-day aggregate trading volume neatly line up along a 45-degree line."
+fig_persistence <- trading_volume |> + ggplot(aes(x = lag(trading_volume), y = trading_volume)) + + geom_point() + + geom_abline(aes(intercept = 0, slope = 1), + linetype = "dashed" + ) + + labs( + x = "Previous day aggregate trading volume", + y = "Aggregate trading volume", + title = "Persistence in daily trading volume of Dow index constituents" + ) + + scale_x_continuous(labels = unit_format(unit = "B", scale = 1e-9)) + + scale_y_continuous(labels = unit_format(unit = "B", scale = 1e-9)) +fig_persistence +``` + +Do you understand where the warning `## Warning: Removed 1 rows containing missing values (geom_point).` comes from and what it means?\index{Error message} Eyeballing the figure reveals that days with high trading volume are often followed by similarly high trading volume days. + +## Key Takeaways + +In this chapter, you learned how to effectively use R to download, analyze, and visualize stock market data using tidy principles: + +- Tidy data principles enable efficient analysis of financial data. +- Adjusted prices account for corporate actions like splits and dividends. +- Summary statistics help identify key patterns in financial data. +- Visualization techniques reveal trends and distributions in returns. +- Data manipulation with tidyverse scales easily to multiple assets. +- Consistent workflows form the foundation for advanced financial analysis. + +## Exercises + +1. Download daily prices for another stock market symbol of your choice from Yahoo Finance using `download_data()` from the `tidyfinance` package. Plot two time series of the symbol’s unadjusted and adjusted closing prices. Explain any visible differences. +1. Compute daily net returns for an asset of your choice and visualize the distribution of daily returns in a histogram using 100 bins. Also, use `geom_vline()` to add a dashed red vertical line that indicates the 5 percent quantile of the daily returns. Compute summary statistics (mean, standard deviation, minimum and maximum) for the daily returns. +1. Take your code from the previous exercises and generalize it such that you can perform all the computations for an arbitrary vector of symbols (e.g., `symbol <- c("AAPL", "MMM", "BA")`). Automate the download, the plot of the price time series, and create a table of return summary statistics for this arbitrary number of assets. +1. Are days with high aggregate trading volume often also days with large absolute returns? Find an appropriate visualization to analyze the question using the symbol `AAPL`.