MVO Portfolio Optimization

Mean-variance optimization, efficient frontier, and portfolio backtesting

Mean-Variance Optimization

Portfolio Theory in Python


Review — Portfolio Theory

1. Portfolio of 2 Assets

Expected return:

$$er_p = w_1 \cdot er_1 + w_2 \cdot er_2$$

Variance:

$$\sigma_p^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \, \text{cov}_{12}$$

Volatility:

$$\sigma_p = \sqrt{\sigma_p^2}$$

Covariance:

$$\sigma_{ij} = \text{cov}_{ij} = \sum_{1}^{obs} \frac{(r_i - \bar{r_i})(r_j - \bar{r_j})}{n}$$

2. Portfolio of n Assets

Expected return vector:

$$e_r = \begin{bmatrix} \bar{r_1} \\ \bar{r_2} \\ \vdots \\ \bar{r_n} \end{bmatrix}$$

Weights vector:

$$w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \quad \sum w_i = 1$$

Variance-covariance matrix:

$$S = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}$$

Portfolio expected return: $\; Er_p = er^T w$

Portfolio variance: $\; \sigma_p^2 = w^T S \, w$

Covariance of two portfolios: $\; \sigma_{p_1 p_2} = w_1^T S \, w_2$

Portfolio volatility: $\; \sigma_p = \sqrt{\sigma_p^2}$

Key Propositions

Merton Proposition 1 — Weights of an efficient portfolio:

$$w = \frac{S^{-1}(e_r - c)}{\sum S^{-1}(e_r - c)}$$

Merton Proposition 2 — Combination of two envelope portfolios $w_1$ & $w_2$ is another envelope portfolio:

$$w = w_1 \lambda + w_2 (1 - \lambda)$$

Global Minimum Variance Portfolio (GMVP):

$$w_{\text{gmvp}} = \frac{\mathbf{1}^T S^{-1}}{\sum \mathbf{1}^T S^{-1}}$$


Part 1 — Data & Covariance Matrix

We download 5 years of daily prices from Yahoo Finance for 5 asset classes:

The risk-free rate is fetched from the FRED API (5-Year Treasury, series DGS5).

Then compute log returns, annualized expected returns, volatility, variance, covariance matrix, and correlation matrix. All portfolios in this section are long-only (no negative weights).

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.optimize as spo
import yfinance as yf
import requests
from numpy.linalg import inv
from math import sqrt
tickers = ['SPY', 'GLD', 'CEMB', 'TLT', 'EFA']
n_assets = len(tickers)

df = yf.download(tickers, period='5y')['Close']
df = df[tickers]  # ensure column order
df

Risk-Free Rate — FRED API (5-Year Treasury)

from fredapi import Fred

fred = Fred(api_key="YOUR_FRED_API_KEY")

# Get latest 5Y Treasury yield
rf = fred.get_series("DGS5").dropna().iloc[-1] / 100

print(f"5Y Treasury: {rf:.4f} ({rf*100:.2f}%)")
smat = df.values
smat

Log Returns

$$r_t = \ln\!\left(\frac{S_t}{S_{t-1}}\right)$$

ret = np.log(smat[1:, :] / smat[:-1, :])
ret

Annualized Statistics

We annualize by multiplying daily means by 250 and daily std by $\sqrt{250}$.

er = np.mean(ret, axis=0) * 250
print("Expected Returns (annualized):")
for i, t in enumerate(tickers):
    print(f"  {t}: {er[i]:.4f}")

vol = np.std(ret, axis=0, ddof=1) * sqrt(250)
print("\nVolatility (annualized):")
for i, t in enumerate(tickers):
    print(f"  {t}: {vol[i]:.4f}")

var = np.var(ret, axis=0, ddof=1) * 250
print("\nVariance (annualized):")
for i, t in enumerate(tickers):
    print(f"  {t}: {var[i]:.4f}")

Covariance & Correlation Matrices

s = np.cov(ret, rowvar=False, ddof=1) * 250
rho = np.corrcoef(ret, rowvar=False)

print("Covariance Matrix (annualized):")
print(pd.DataFrame(s, index=tickers, columns=tickers).round(4))
print("\nCorrelation Matrix:")
print(pd.DataFrame(rho, index=tickers, columns=tickers).round(4))
# Equal-weight portfolio
n_assets = len(tickers)
w = np.ones(n_assets) / n_assets
er1 = er.T @ w
var = (w.T @ s) @ w

er1

Efficient Portfolio — Merton Proposition 1

Using the risk-free rate $r_f$ from FRED as the constant $c$:

$$w = \frac{S^{-1}(e_r - r_f)}{\sum S^{-1}(e_r - r_f)}$$

This is the unconstrained analytical solution — it may produce negative weights (short positions). The proper long-only solution requires numerical optimization (Part 2).

w1 = inv(s) @ (er - rf) / sum(inv(s) @ (er - rf))
w1
sum(w1)
er_p1 = er.T @ w1
er_p1
var_p1 = (w1.T @ s) @ w1
vol_p1 = sqrt(var_p1)
vol_p1
sharpe_p1 = (er_p1 - rf) / np.sqrt(var_p1)
sharpe_p1

Efficient Frontier — 2 Asset Example (SPY & GLD)

With 2 assets, we can sweep one weight $w$ and set the other to $1 - w$:

$$er_p = w \cdot er_1 + (1-w) \cdot er_2$$

$$\sigma_p = \sqrt{w^2 \sigma_1^2 + (1-w)^2 \sigma_2^2 + 2w(1-w)\sigma_{12}}$$

# 2-asset example: SPY (index 0) and GLD (index 1)
w = np.arange(-2, 8, 0.01)

er_spy = er[0]
er_gld = er[1]
var_spy = s[0, 0]
var_gld = s[1, 1]
cov_spy_gld = s[0, 1]

r_2 = er_spy * w + er_gld * (1 - w)
sigma_2 = np.sqrt(var_spy * w**2 + var_gld * (1 - w)**2 + 2 * w * (1 - w) * cov_spy_gld)

plt.plot(sigma_2, r_2)
plt.title("Efficient Frontier - 2 Assets (SPY & GLD)")
plt.xlabel("Volatility")
plt.ylabel("Return")
plt.scatter(vol[0], er[0])
plt.annotate("SPY", (vol[0], er[0]))
plt.scatter(vol[1], er[1])
plt.annotate("GLD", (vol[1], er[1]))
plt.show()

Part 2 — Optimizer (Portfolio 1: Long-Only)

We maximize the Sharpe ratio using scipy.optimize.minimize on the negative Sharpe.

Objective function:

$$f(w) = -\frac{er_p - r_f}{\sigma_p}$$

Constraints — system of inequations:

$$\sum w_i = 1 \quad \text{(equality)}$$

$$w_i \geq 0 \quad \forall \, i \quad \text{(inequality - long only)}$$

def f_negative_sharpe(w, er, s, r):
    er_p = er.T @ w
    var_p = (w.T @ s) @ w
    vol_p = sqrt(var_p)
    return -(er_p - r) / vol_p
w0 = np.ones(n_assets) / n_assets  # equal-weight starting guess

# Constraints: sum(w) = 1 and w_i >= 0
cons_p1 = (
    {'type': 'eq',   'fun': lambda w: np.sum(w) - 1},
    {'type': 'ineq', 'fun': lambda w: w}
)

bo_p1 = tuple((0, 1) for _ in range(n_assets))

result_p1 = spo.minimize(f_negative_sharpe, w0, args=(er, s, rf),
                          constraints=cons_p1, bounds=bo_p1)

print("Portfolio 1 - Long-Only Optimal Weights:")
for i, t in enumerate(tickers):
    print(f"  {t}: {result_p1.x[i]:.4f}")
print(f"  Sharpe: {-result_p1.fun:.4f}")

Part 3 — Constrained Portfolios

Portfolio 2 — Max 30% Allocation

Same objective, but each weight is capped at 30%.

Constraints — system of inequations:

$$\sum w_i = 1 \quad \text{(equality)}$$

$$w_i \geq 0 \quad \forall \, i \quad \text{(inequality - long only)}$$

$$w_i \leq 0.30 \quad \forall \, i \quad \text{(inequality - max 30\%)}$$

# Constraints: sum(w) = 1, w_i >= 0, w_i <= 0.30
cons_p2 = (
    {'type': 'eq',   'fun': lambda w: np.sum(w) - 1},
    {'type': 'ineq', 'fun': lambda w: w},
    {'type': 'ineq', 'fun': lambda w: 0.30 - w}
)

bo_p2 = tuple((0, 0.30) for _ in range(n_assets))

result_p2 = spo.minimize(f_negative_sharpe, w0, args=(er, s, rf),
                          constraints=cons_p2, bounds=bo_p2)

print("Portfolio 2 - Max 30% Allocation Weights:")
for i, t in enumerate(tickers):
    print(f"  {t}: {result_p2.x[i]:.4f}")
print(f"  Sharpe: {-result_p2.fun:.4f}")

Portfolio 3 — Exclude Gold

We remove GLD from the universe. The weight of GLD is forced to zero.

Constraints — system of inequations:

$$\sum w_i = 1 \quad \text{(equality)}$$

$$w_i \geq 0 \quad \forall \, i \quad \text{(inequality - long only)}$$

$$w_{\text{GLD}} = 0 \quad \text{(equality - exclude gold)}$$

gld_idx = tickers.index('GLD')

# Constraints: sum(w) = 1, w_i >= 0, w_GLD = 0
cons_p3 = (
    {'type': 'eq',   'fun': lambda w: np.sum(w) - 1},
    {'type': 'ineq', 'fun': lambda w: w},
    {'type': 'eq',   'fun': lambda w: w[gld_idx]}
)

bo_p3 = tuple((0, 1) for _ in range(n_assets))

result_p3 = spo.minimize(f_negative_sharpe, w0, args=(er, s, rf),
                          constraints=cons_p3, bounds=bo_p3)

print("Portfolio 3 - Exclude Gold Weights:")
for i, t in enumerate(tickers):
    print(f"  {t}: {result_p3.x[i]:.4f}")
print(f"  Sharpe: {-result_p3.fun:.4f}")

Summary — All Three Portfolios

results = {'P1 Long-Only': result_p1.x, 'P2 Max30%': result_p2.x, 'P3 No Gold': result_p3.x}

summary = pd.DataFrame(results, index=tickers)
for name, w_opt in results.items():
    summary.loc['Er',  name] = er.T @ w_opt
    summary.loc['Vol', name] = sqrt((w_opt.T @ s) @ w_opt)
    summary.loc['Sharpe', name] = (er.T @ w_opt - rf) / sqrt((w_opt.T @ s) @ w_opt)

print(f"Risk-free rate (rf): {rf:.4f}\n")
print(summary.round(4))
fig, axes = plt.subplots(1, 3, figsize=(14, 5), sharey=True)

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']
port_names = ['P1 Long-Only', 'P2 Max 30%', 'P3 No Gold']
port_weights = [result_p1.x, result_p2.x, result_p3.x]

for ax, name, w_opt in zip(axes, port_names, port_weights):
    bars = ax.bar(tickers, w_opt, color=colors, edgecolor='black', linewidth=0.5)
    ax.set_title(name, fontsize=12, fontweight='bold')
    ax.set_ylim(0, 1.0)
    ax.set_ylabel('Weight' if ax == axes[0] else '')
    ax.axhline(y=0, color='black', linewidth=0.5)
    ax.grid(axis='y', alpha=0.3)
    for bar, w in zip(bars, w_opt):
        if w > 0.01:
            ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.02,
                    f'{w:.0%}', ha='center', va='bottom', fontsize=9, fontweight='bold')

plt.suptitle('Portfolio Weights Comparison', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

Part 4 — Efficient Frontier Visualization

For each constraint set, we sweep target returns from the minimum (worst asset) to the maximum (best asset), and for each target we minimize portfolio variance:

$$\min_w \; w^T S \, w \quad \text{s.t.} \quad er^T w = \text{target}, \;\sum w_i = 1, \; \text{constraints}$$

The GMVP (leftmost point) splits the curve into:

def f_variance(w, s):
    return (w.T @ s) @ w

def build_frontier(cons, bo, w0, er, s, n_points=500):
    # Find actual min and max achievable return within THIS constraint set
    res_min = spo.minimize(lambda w: er.T @ w, w0, constraints=cons, bounds=bo)
    res_max = spo.minimize(lambda w: -er.T @ w, w0, constraints=cons, bounds=bo)
    ret_min = er.T @ res_min.x
    ret_max = er.T @ res_max.x

    # Step 1: find the GMVP for this constraint set
    res_gmvp = spo.minimize(f_variance, w0, args=(s,), constraints=cons,
                             bounds=bo, method='SLSQP', options={'ftol': 1e-12})
    er_gmvp_val = er.T @ res_gmvp.x

    # Use only sum(w)=1 equality + bounds for variance minimization
    cons_eq = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1},)

    # Step 2: upper branch - from GMVP return up to max return
    r_upper, sigma_upper = [], []
    for target in np.linspace(er_gmvp_val, ret_max, n_points):
        cons_target = list(cons_eq) + [
            {'type': 'eq', 'fun': lambda w, t=target: er.T @ w - t}
        ]
        res = spo.minimize(f_variance, w0, args=(s,), constraints=cons_target,
                            bounds=bo, method='SLSQP', options={'ftol': 1e-12})
        if res.success:
            r_upper.append(er.T @ res.x)
            sigma_upper.append(sqrt((res.x.T @ s) @ res.x))

    # Step 3: lower branch - from GMVP return down to min return
    r_lower, sigma_lower = [], []
    for target in np.linspace(er_gmvp_val, ret_min, n_points):
        cons_target = list(cons_eq) + [
            {'type': 'eq', 'fun': lambda w, t=target: er.T @ w - t}
        ]
        res = spo.minimize(f_variance, w0, args=(s,), constraints=cons_target,
                            bounds=bo, method='SLSQP', options={'ftol': 1e-12})
        if res.success:
            r_lower.append(er.T @ res.x)
            sigma_lower.append(sqrt((res.x.T @ s) @ res.x))

    return sigma_upper, r_upper, sigma_lower, r_lower

# Bounds for each constraint set
bo_frontier_p1 = tuple((0, 1) for _ in range(n_assets))          # P1: long-only
bo_frontier_p2 = tuple((0, 0.30) for _ in range(n_assets))       # P2: max 30%
bo_frontier_p3 = tuple(                                            # P3: exclude gold
    (0, 0) if i == gld_idx else (0, 1) for i in range(n_assets)
)

cons_frontier = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1},)

# Build all three frontiers
sigma_p1_up, r_p1_up, sigma_p1_lo, r_p1_lo = build_frontier(
    cons_frontier, bo_frontier_p1, w0, er, s)
sigma_p2_up, r_p2_up, sigma_p2_lo, r_p2_lo = build_frontier(
    cons_frontier, bo_frontier_p2, w0, er, s)
sigma_p3_up, r_p3_up, sigma_p3_lo, r_p3_lo = build_frontier(
    cons_frontier, bo_frontier_p3, w0, er, s)
def port_stats(w_opt):
    return er.T @ w_opt, sqrt((w_opt.T @ s) @ w_opt)

er_opt1, vol_opt1 = port_stats(result_p1.x)
er_opt2, vol_opt2 = port_stats(result_p2.x)
er_opt3, vol_opt3 = port_stats(result_p3.x)

# Capital Market Line
cml_slope = (er_opt1 - rf) / vol_opt1
cml_x = np.linspace(0, max(vol) * 1.3, 200)
cml_y = rf + cml_slope * cml_x

plt.figure(figsize=(10, 6))

# Upper branches (efficient frontier) - solid
plt.plot(sigma_p1_up, r_p1_up, label='P1 Long-Only', linewidth=2, color='#1f77b4')
plt.plot(sigma_p2_up, r_p2_up, label='P2 Max 30%', linewidth=2,
         linestyle='--', color='#ff7f0e')
plt.plot(sigma_p3_up, r_p3_up, label='P3 No Gold', linewidth=2,
         linestyle='-.', color='#2ca02c')

# Lower branches (inefficient) - same color, thinner, dashed
plt.plot(sigma_p1_lo, r_p1_lo, linewidth=1, color='#1f77b4', alpha=0.5)
plt.plot(sigma_p2_lo, r_p2_lo, linewidth=1, color='#ff7f0e', alpha=0.5)
plt.plot(sigma_p3_lo, r_p3_lo, linewidth=1, color='#2ca02c', alpha=0.5)

# CML
plt.plot(cml_x, cml_y, label=f'CML (slope={cml_slope:.2f})',
         linewidth=1.5, linestyle=':', color='red')

# Optimal portfolios (max Sharpe)
plt.scatter(vol_opt1, er_opt1, s=100, zorder=5, marker='*')
plt.annotate(f"P1 (Sharpe={-result_p1.fun:.2f})", (vol_opt1, er_opt1),
             textcoords="offset points", xytext=(10, 5), fontsize=9)
plt.scatter(vol_opt2, er_opt2, s=100, zorder=5, marker='*')
plt.annotate(f"P2 (Sharpe={-result_p2.fun:.2f})", (vol_opt2, er_opt2),
             textcoords="offset points", xytext=(10, 5), fontsize=9)
plt.scatter(vol_opt3, er_opt3, s=100, zorder=5, marker='*')
plt.annotate(f"P3 (Sharpe={-result_p3.fun:.2f})", (vol_opt3, er_opt3),
             textcoords="offset points", xytext=(10, 5), fontsize=9)

# Individual assets
for i, t in enumerate(tickers):
    plt.scatter(vol[i], er[i], s=40, color='gray', zorder=4)
    plt.annotate(t, (vol[i], er[i]), textcoords="offset points",
                 xytext=(6, 4), fontsize=8, color='gray')

# Risk-free rate
plt.scatter(0, rf, s=80, color='red', zorder=5, marker='o')
plt.annotate(f"rf={rf:.2%}", (0, rf), textcoords="offset points",
             xytext=(8, -10), fontsize=9, color='red')

plt.title("Efficient Frontiers - 3 Portfolio Constraints + CML")
plt.xlabel("Volatility")
plt.ylabel("Return")
plt.grid(alpha=0.3)
plt.legend(fontsize=10)
plt.tight_layout()
plt.show()
David Arias, CFA
Written and Modelled by

David Arias, CFA

Licensed portfolio manager with 4+ years of experience, specializing in emerging markets private debt, derivatives, and quantitative finance.

Let's Connect