Mean-variance optimization, efficient frontier, and portfolio backtesting
Portfolio Theory in Python
Expected return:
$$er_p = w_1 \cdot er_1 + w_2 \cdot er_2$$
Variance:
$$\sigma_p^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \, \text{cov}_{12}$$
Volatility:
$$\sigma_p = \sqrt{\sigma_p^2}$$
Covariance:
$$\sigma_{ij} = \text{cov}_{ij} = \sum_{1}^{obs} \frac{(r_i - \bar{r_i})(r_j - \bar{r_j})}{n}$$
Expected return vector:
$$e_r = \begin{bmatrix} \bar{r_1} \\ \bar{r_2} \\ \vdots \\ \bar{r_n} \end{bmatrix}$$
Weights vector:
$$w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \quad \sum w_i = 1$$
Variance-covariance matrix:
$$S = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}$$
Portfolio expected return: $\; Er_p = er^T w$
Portfolio variance: $\; \sigma_p^2 = w^T S \, w$
Covariance of two portfolios: $\; \sigma_{p_1 p_2} = w_1^T S \, w_2$
Portfolio volatility: $\; \sigma_p = \sqrt{\sigma_p^2}$
Merton Proposition 1 — Weights of an efficient portfolio:
$$w = \frac{S^{-1}(e_r - c)}{\sum S^{-1}(e_r - c)}$$
Merton Proposition 2 — Combination of two envelope portfolios $w_1$ & $w_2$ is another envelope portfolio:
$$w = w_1 \lambda + w_2 (1 - \lambda)$$
Global Minimum Variance Portfolio (GMVP):
$$w_{\text{gmvp}} = \frac{\mathbf{1}^T S^{-1}}{\sum \mathbf{1}^T S^{-1}}$$
We download 5 years of daily prices from Yahoo Finance for 5 asset classes:
The risk-free rate is fetched from the FRED API (5-Year Treasury, series DGS5).
Then compute log returns, annualized expected returns, volatility, variance, covariance matrix, and correlation matrix. All portfolios in this section are long-only (no negative weights).
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.optimize as spo
import yfinance as yf
import requests
from numpy.linalg import inv
from math import sqrt
tickers = ['SPY', 'GLD', 'CEMB', 'TLT', 'EFA']
n_assets = len(tickers)
df = yf.download(tickers, period='5y')['Close']
df = df[tickers] # ensure column order
df
from fredapi import Fred
fred = Fred(api_key="YOUR_FRED_API_KEY")
# Get latest 5Y Treasury yield
rf = fred.get_series("DGS5").dropna().iloc[-1] / 100
print(f"5Y Treasury: {rf:.4f} ({rf*100:.2f}%)")
smat = df.values
smat
$$r_t = \ln\!\left(\frac{S_t}{S_{t-1}}\right)$$
ret = np.log(smat[1:, :] / smat[:-1, :])
ret
We annualize by multiplying daily means by 250 and daily std by $\sqrt{250}$.
er = np.mean(ret, axis=0) * 250
print("Expected Returns (annualized):")
for i, t in enumerate(tickers):
print(f" {t}: {er[i]:.4f}")
vol = np.std(ret, axis=0, ddof=1) * sqrt(250)
print("\nVolatility (annualized):")
for i, t in enumerate(tickers):
print(f" {t}: {vol[i]:.4f}")
var = np.var(ret, axis=0, ddof=1) * 250
print("\nVariance (annualized):")
for i, t in enumerate(tickers):
print(f" {t}: {var[i]:.4f}")
s = np.cov(ret, rowvar=False, ddof=1) * 250
rho = np.corrcoef(ret, rowvar=False)
print("Covariance Matrix (annualized):")
print(pd.DataFrame(s, index=tickers, columns=tickers).round(4))
print("\nCorrelation Matrix:")
print(pd.DataFrame(rho, index=tickers, columns=tickers).round(4))
# Equal-weight portfolio
n_assets = len(tickers)
w = np.ones(n_assets) / n_assets
er1 = er.T @ w
var = (w.T @ s) @ w
er1
Using the risk-free rate $r_f$ from FRED as the constant $c$:
$$w = \frac{S^{-1}(e_r - r_f)}{\sum S^{-1}(e_r - r_f)}$$
This is the unconstrained analytical solution — it may produce negative weights (short positions). The proper long-only solution requires numerical optimization (Part 2).
w1 = inv(s) @ (er - rf) / sum(inv(s) @ (er - rf))
w1
sum(w1)
er_p1 = er.T @ w1
er_p1
var_p1 = (w1.T @ s) @ w1
vol_p1 = sqrt(var_p1)
vol_p1
sharpe_p1 = (er_p1 - rf) / np.sqrt(var_p1)
sharpe_p1
With 2 assets, we can sweep one weight $w$ and set the other to $1 - w$:
$$er_p = w \cdot er_1 + (1-w) \cdot er_2$$
$$\sigma_p = \sqrt{w^2 \sigma_1^2 + (1-w)^2 \sigma_2^2 + 2w(1-w)\sigma_{12}}$$
# 2-asset example: SPY (index 0) and GLD (index 1)
w = np.arange(-2, 8, 0.01)
er_spy = er[0]
er_gld = er[1]
var_spy = s[0, 0]
var_gld = s[1, 1]
cov_spy_gld = s[0, 1]
r_2 = er_spy * w + er_gld * (1 - w)
sigma_2 = np.sqrt(var_spy * w**2 + var_gld * (1 - w)**2 + 2 * w * (1 - w) * cov_spy_gld)
plt.plot(sigma_2, r_2)
plt.title("Efficient Frontier - 2 Assets (SPY & GLD)")
plt.xlabel("Volatility")
plt.ylabel("Return")
plt.scatter(vol[0], er[0])
plt.annotate("SPY", (vol[0], er[0]))
plt.scatter(vol[1], er[1])
plt.annotate("GLD", (vol[1], er[1]))
plt.show()
We maximize the Sharpe ratio using scipy.optimize.minimize on the negative Sharpe.
Objective function:
$$f(w) = -\frac{er_p - r_f}{\sigma_p}$$
Constraints — system of inequations:
$$\sum w_i = 1 \quad \text{(equality)}$$
$$w_i \geq 0 \quad \forall \, i \quad \text{(inequality - long only)}$$
def f_negative_sharpe(w, er, s, r):
er_p = er.T @ w
var_p = (w.T @ s) @ w
vol_p = sqrt(var_p)
return -(er_p - r) / vol_p
w0 = np.ones(n_assets) / n_assets # equal-weight starting guess
# Constraints: sum(w) = 1 and w_i >= 0
cons_p1 = (
{'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
{'type': 'ineq', 'fun': lambda w: w}
)
bo_p1 = tuple((0, 1) for _ in range(n_assets))
result_p1 = spo.minimize(f_negative_sharpe, w0, args=(er, s, rf),
constraints=cons_p1, bounds=bo_p1)
print("Portfolio 1 - Long-Only Optimal Weights:")
for i, t in enumerate(tickers):
print(f" {t}: {result_p1.x[i]:.4f}")
print(f" Sharpe: {-result_p1.fun:.4f}")
Same objective, but each weight is capped at 30%.
Constraints — system of inequations:
$$\sum w_i = 1 \quad \text{(equality)}$$
$$w_i \geq 0 \quad \forall \, i \quad \text{(inequality - long only)}$$
$$w_i \leq 0.30 \quad \forall \, i \quad \text{(inequality - max 30\%)}$$
# Constraints: sum(w) = 1, w_i >= 0, w_i <= 0.30
cons_p2 = (
{'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
{'type': 'ineq', 'fun': lambda w: w},
{'type': 'ineq', 'fun': lambda w: 0.30 - w}
)
bo_p2 = tuple((0, 0.30) for _ in range(n_assets))
result_p2 = spo.minimize(f_negative_sharpe, w0, args=(er, s, rf),
constraints=cons_p2, bounds=bo_p2)
print("Portfolio 2 - Max 30% Allocation Weights:")
for i, t in enumerate(tickers):
print(f" {t}: {result_p2.x[i]:.4f}")
print(f" Sharpe: {-result_p2.fun:.4f}")
We remove GLD from the universe. The weight of GLD is forced to zero.
Constraints — system of inequations:
$$\sum w_i = 1 \quad \text{(equality)}$$
$$w_i \geq 0 \quad \forall \, i \quad \text{(inequality - long only)}$$
$$w_{\text{GLD}} = 0 \quad \text{(equality - exclude gold)}$$
gld_idx = tickers.index('GLD')
# Constraints: sum(w) = 1, w_i >= 0, w_GLD = 0
cons_p3 = (
{'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
{'type': 'ineq', 'fun': lambda w: w},
{'type': 'eq', 'fun': lambda w: w[gld_idx]}
)
bo_p3 = tuple((0, 1) for _ in range(n_assets))
result_p3 = spo.minimize(f_negative_sharpe, w0, args=(er, s, rf),
constraints=cons_p3, bounds=bo_p3)
print("Portfolio 3 - Exclude Gold Weights:")
for i, t in enumerate(tickers):
print(f" {t}: {result_p3.x[i]:.4f}")
print(f" Sharpe: {-result_p3.fun:.4f}")
results = {'P1 Long-Only': result_p1.x, 'P2 Max30%': result_p2.x, 'P3 No Gold': result_p3.x}
summary = pd.DataFrame(results, index=tickers)
for name, w_opt in results.items():
summary.loc['Er', name] = er.T @ w_opt
summary.loc['Vol', name] = sqrt((w_opt.T @ s) @ w_opt)
summary.loc['Sharpe', name] = (er.T @ w_opt - rf) / sqrt((w_opt.T @ s) @ w_opt)
print(f"Risk-free rate (rf): {rf:.4f}\n")
print(summary.round(4))
fig, axes = plt.subplots(1, 3, figsize=(14, 5), sharey=True)
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']
port_names = ['P1 Long-Only', 'P2 Max 30%', 'P3 No Gold']
port_weights = [result_p1.x, result_p2.x, result_p3.x]
for ax, name, w_opt in zip(axes, port_names, port_weights):
bars = ax.bar(tickers, w_opt, color=colors, edgecolor='black', linewidth=0.5)
ax.set_title(name, fontsize=12, fontweight='bold')
ax.set_ylim(0, 1.0)
ax.set_ylabel('Weight' if ax == axes[0] else '')
ax.axhline(y=0, color='black', linewidth=0.5)
ax.grid(axis='y', alpha=0.3)
for bar, w in zip(bars, w_opt):
if w > 0.01:
ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.02,
f'{w:.0%}', ha='center', va='bottom', fontsize=9, fontweight='bold')
plt.suptitle('Portfolio Weights Comparison', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
For each constraint set, we sweep target returns from the minimum (worst asset) to the maximum (best asset), and for each target we minimize portfolio variance:
$$\min_w \; w^T S \, w \quad \text{s.t.} \quad er^T w = \text{target}, \;\sum w_i = 1, \; \text{constraints}$$
The GMVP (leftmost point) splits the curve into:
def f_variance(w, s):
return (w.T @ s) @ w
def build_frontier(cons, bo, w0, er, s, n_points=500):
# Find actual min and max achievable return within THIS constraint set
res_min = spo.minimize(lambda w: er.T @ w, w0, constraints=cons, bounds=bo)
res_max = spo.minimize(lambda w: -er.T @ w, w0, constraints=cons, bounds=bo)
ret_min = er.T @ res_min.x
ret_max = er.T @ res_max.x
# Step 1: find the GMVP for this constraint set
res_gmvp = spo.minimize(f_variance, w0, args=(s,), constraints=cons,
bounds=bo, method='SLSQP', options={'ftol': 1e-12})
er_gmvp_val = er.T @ res_gmvp.x
# Use only sum(w)=1 equality + bounds for variance minimization
cons_eq = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1},)
# Step 2: upper branch - from GMVP return up to max return
r_upper, sigma_upper = [], []
for target in np.linspace(er_gmvp_val, ret_max, n_points):
cons_target = list(cons_eq) + [
{'type': 'eq', 'fun': lambda w, t=target: er.T @ w - t}
]
res = spo.minimize(f_variance, w0, args=(s,), constraints=cons_target,
bounds=bo, method='SLSQP', options={'ftol': 1e-12})
if res.success:
r_upper.append(er.T @ res.x)
sigma_upper.append(sqrt((res.x.T @ s) @ res.x))
# Step 3: lower branch - from GMVP return down to min return
r_lower, sigma_lower = [], []
for target in np.linspace(er_gmvp_val, ret_min, n_points):
cons_target = list(cons_eq) + [
{'type': 'eq', 'fun': lambda w, t=target: er.T @ w - t}
]
res = spo.minimize(f_variance, w0, args=(s,), constraints=cons_target,
bounds=bo, method='SLSQP', options={'ftol': 1e-12})
if res.success:
r_lower.append(er.T @ res.x)
sigma_lower.append(sqrt((res.x.T @ s) @ res.x))
return sigma_upper, r_upper, sigma_lower, r_lower
# Bounds for each constraint set
bo_frontier_p1 = tuple((0, 1) for _ in range(n_assets)) # P1: long-only
bo_frontier_p2 = tuple((0, 0.30) for _ in range(n_assets)) # P2: max 30%
bo_frontier_p3 = tuple( # P3: exclude gold
(0, 0) if i == gld_idx else (0, 1) for i in range(n_assets)
)
cons_frontier = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1},)
# Build all three frontiers
sigma_p1_up, r_p1_up, sigma_p1_lo, r_p1_lo = build_frontier(
cons_frontier, bo_frontier_p1, w0, er, s)
sigma_p2_up, r_p2_up, sigma_p2_lo, r_p2_lo = build_frontier(
cons_frontier, bo_frontier_p2, w0, er, s)
sigma_p3_up, r_p3_up, sigma_p3_lo, r_p3_lo = build_frontier(
cons_frontier, bo_frontier_p3, w0, er, s)
def port_stats(w_opt):
return er.T @ w_opt, sqrt((w_opt.T @ s) @ w_opt)
er_opt1, vol_opt1 = port_stats(result_p1.x)
er_opt2, vol_opt2 = port_stats(result_p2.x)
er_opt3, vol_opt3 = port_stats(result_p3.x)
# Capital Market Line
cml_slope = (er_opt1 - rf) / vol_opt1
cml_x = np.linspace(0, max(vol) * 1.3, 200)
cml_y = rf + cml_slope * cml_x
plt.figure(figsize=(10, 6))
# Upper branches (efficient frontier) - solid
plt.plot(sigma_p1_up, r_p1_up, label='P1 Long-Only', linewidth=2, color='#1f77b4')
plt.plot(sigma_p2_up, r_p2_up, label='P2 Max 30%', linewidth=2,
linestyle='--', color='#ff7f0e')
plt.plot(sigma_p3_up, r_p3_up, label='P3 No Gold', linewidth=2,
linestyle='-.', color='#2ca02c')
# Lower branches (inefficient) - same color, thinner, dashed
plt.plot(sigma_p1_lo, r_p1_lo, linewidth=1, color='#1f77b4', alpha=0.5)
plt.plot(sigma_p2_lo, r_p2_lo, linewidth=1, color='#ff7f0e', alpha=0.5)
plt.plot(sigma_p3_lo, r_p3_lo, linewidth=1, color='#2ca02c', alpha=0.5)
# CML
plt.plot(cml_x, cml_y, label=f'CML (slope={cml_slope:.2f})',
linewidth=1.5, linestyle=':', color='red')
# Optimal portfolios (max Sharpe)
plt.scatter(vol_opt1, er_opt1, s=100, zorder=5, marker='*')
plt.annotate(f"P1 (Sharpe={-result_p1.fun:.2f})", (vol_opt1, er_opt1),
textcoords="offset points", xytext=(10, 5), fontsize=9)
plt.scatter(vol_opt2, er_opt2, s=100, zorder=5, marker='*')
plt.annotate(f"P2 (Sharpe={-result_p2.fun:.2f})", (vol_opt2, er_opt2),
textcoords="offset points", xytext=(10, 5), fontsize=9)
plt.scatter(vol_opt3, er_opt3, s=100, zorder=5, marker='*')
plt.annotate(f"P3 (Sharpe={-result_p3.fun:.2f})", (vol_opt3, er_opt3),
textcoords="offset points", xytext=(10, 5), fontsize=9)
# Individual assets
for i, t in enumerate(tickers):
plt.scatter(vol[i], er[i], s=40, color='gray', zorder=4)
plt.annotate(t, (vol[i], er[i]), textcoords="offset points",
xytext=(6, 4), fontsize=8, color='gray')
# Risk-free rate
plt.scatter(0, rf, s=80, color='red', zorder=5, marker='o')
plt.annotate(f"rf={rf:.2%}", (0, rf), textcoords="offset points",
xytext=(8, -10), fontsize=9, color='red')
plt.title("Efficient Frontiers - 3 Portfolio Constraints + CML")
plt.xlabel("Volatility")
plt.ylabel("Return")
plt.grid(alpha=0.3)
plt.legend(fontsize=10)
plt.tight_layout()
plt.show()