Application of Extreme Value Theory on Time Series Data for Risk Management

Feb 10, 2026


Introduction

Extreme Value Theory (EVT) is a branch of probability theory that focuses on the laws governing extreme values in large samples of data.

The goal of this theory is to model what happens in the extreme cases we find in nature or industry, such as when a rope breaks or a market crashes. If we understand the underlying dynamics and laws that govern these events, we are able to better manage risk.

Usually, this is done with a dataset containing observations of extreme events and the magnitudes that come with them, such as the financial losses they incur. A very common example of such a dataset is the Danish Fire Insurance data.

We start by considering a sequence of independent, identically distributed random variables

$$(X_{i})_{i \in \mathbb{N}}$$

which typically represent the financial losses incurred when such events take place.

Generalized Extreme Value Distribution (GEV)

The Generalized Extreme Value distribution (GEV) is given by

$$H_{\xi}(x) = \begin{cases} e^{-(1+\xi x)^{-1/\xi}}, & \xi \neq 0\\ e^{-e^{-x}}, & \xi = 0 \end{cases}$$

Here $\xi$ is the shape parameter of the distribution, and $H_{\xi}$ defines a family of distributions.

We can try and illustrate this by choosing different shape parameters:

  • $\xi > 0$: Fréchet (usually the most observed)
  • $\xi = 0$: Gumbel
  • $\xi < 0$: Weibull
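To make the three cases concrete, here is a minimal sketch (assuming NumPy is available) that evaluates the standard GEV CDF; note that all three types agree at $x = 0$, where $H_\xi(0) = e^{-1}$:

```python
import numpy as np

def gev_cdf_standard(x: float, xi: float) -> float:
    """Standard GEV CDF H_xi(x), with location 0 and scale 1."""
    if xi != 0:
        t = 1 + xi * x
        if t <= 0:
            # outside the support: below it in the Frechet case,
            # above it in the Weibull case
            return 0.0 if xi > 0 else 1.0
        return np.exp(-t ** (-1 / xi))
    return np.exp(-np.exp(-x))

# All three types pass through e^{-1} at x = 0
for xi in (0.5, 0.0, -0.5):   # Frechet, Gumbel, Weibull
    print(gev_cdf_standard(0.0, xi))
```

The Fréchet case has support bounded below and the Weibull case has support bounded above, which is why the function clamps to 0 or 1 outside the support.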

Block Maxima Method

The Block Maxima Method (BMM) divides the dataset into intervals, or blocks; these can be years, quarters, and so on. Within each block, we then take the largest-magnitude adverse event that happened during that period. So in finance, we look at the largest losses.

Given this, we then fit a three-parameter GEV to these block maxima. The three-parameter GEV is obtained by replacing $x$ with

$$\frac{x-\mu}{\sigma}$$

which is called a location-scale transformation. Doing this gives us the following GEV

$$H_{\xi, \mu, \sigma}(x) = \begin{cases} e^{-\left(1 + \xi \frac{x - \mu}{\sigma}\right)^{-1/\xi}}, & \xi \neq 0 \\ e^{-e^{-\frac{x - \mu}{\sigma}}}, & \xi = 0 \end{cases}$$

Example: S&P500

To apply the theory in context, we will use the S&P 500 index and try to fit this extreme-loss model to it.

To start with, we fetch our data: the price of the S&P 500 over time.

ticker = "^GSPC"
Ticker

The ticker ^GSPC refers to the S&P 500 data

data = yf.download(ticker, start="1960-01-01", end="1987-10-17")
Fetching data

We then fetch the data, using the yfinance package.

Having done this, we then calculate the returns of the closing price for every single day. Note that this leaves the first observation with a NA value, since there is no prior observation to compare it to.

data['Returns'] = data['Close'].pct_change()
Returns

We get the day-by-day returns, as decimal fractions

data['Daily_Fall'] = -1 * data['Returns'] * 100
Scaling the data

We scale the data by 100 to get a non-decimal (percentage) representation, and multiply by -1 so that losses become positive, since we want to find the largest loss.

data = data.dropna()
Remove NA values

Since the first observation is a NA value, we drop it

We then group by years; this is where the blocks come in. Given the length of our data interval, we end up with a total of 24 blocks.

annual_maxima = data['Daily_Fall'].resample('YE').max()
Grouping by year

We group the data by year, which gives us the block maxima $M_j$, one per block — 24 in total.

Having done so, we are then able to fit $H_{\xi, \mu, \sigma}$ to the block maxima; from the fitting we obtain:

  1. Shape $(\xi)$
  2. Location $(\mu)$
  3. Scale $(\sigma)$
params = stats.genextreme.fit(annual_maxima)
Fitting

We apply the fitting using stats.genextreme.fit on the block maxima; it solves the problem using Maximum Likelihood

c, mu, sigma = params
Getting parameters

We extract the parameters

xi = -c
Changing sign

scipy defines its shape parameter $c$ as the negative of $\xi$
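As a quick sanity check on the sign convention (a sketch with illustrative parameter values, not the fitted ones), scipy's genextreme with $c = -\xi$ reproduces the GEV formula from above:

```python
import numpy as np
from scipy import stats

# Illustrative values, not the fitted ones
xi, mu, sigma = 0.3, 2.0, 0.7
x = 3.5

# H_{xi,mu,sigma}(x) written out by hand
manual = np.exp(-(1 + xi * (x - mu) / sigma) ** (-1 / xi))
# scipy's parametrization: pass c = -xi
via_scipy = stats.genextreme.cdf(x, c=-xi, loc=mu, scale=sigma)
print(manual, via_scipy)
```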

Given this, we get the shape parameters of the distribution of the extreme losses observed in the S&P 500 return data

$$\begin{aligned} \xi &= 0.2858\\ \mu &= 2.0348\\ \sigma &= 0.7235 \end{aligned}$$

We can then try and illustrate this Fréchet distribution

Example: AMD

But are we able to do this for other assets? The answer is yes. Let's try fetching AMD. Given that AMD is a younger asset than the S&P 500, we shift the window to run from 1980 to 2004, so that we get the same number of blocks

Get the daily returns, and construct the fall

And then get the block maxima losses, yearly.

And we can then present the distribution parameters;

$$\begin{aligned} \xi &= 0.3886\\ \mu &= 11.0590\\ \sigma &= 4.8099 \end{aligned}$$

From this, we then get a Fréchet distribution

Note: $\xi \geq 1$

It should be noted that, in the case $\xi \geq 1$, we are dealing with an infinite mean
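For $\xi < 1$ the GEV mean is finite and equals $\mu + \sigma\,\frac{\Gamma(1-\xi)-1}{\xi}$, so we can check how close a fit is to the infinite-mean regime. A small sketch using the AMD values quoted above (the value 1.2 is purely hypothetical, to illustrate the infinite-mean case):

```python
import math

def gev_mean(xi: float, mu: float, sigma: float) -> float:
    """Mean of the GEV distribution; finite only for xi < 1."""
    if xi >= 1:
        return float('inf')
    if xi == 0:
        return mu + sigma * 0.5772156649015329  # Euler-Mascheroni constant
    return mu + sigma * (math.gamma(1 - xi) - 1) / xi

print(gev_mean(0.3886, 11.0590, 4.8099))  # AMD fit: xi < 1, finite mean
print(gev_mean(1.2, 11.0590, 4.8099))     # hypothetical xi > 1: infinite mean
```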

Return level

Given the distributions we have now found, we can look at the frequency of events of a given size. The return level $r_{n,k}$ is given by

$$r_{n, k} = \begin{cases} \mu + \frac{\sigma}{\xi}\left(\left(-\log\left(1 - \frac{1}{k}\right)\right)^{-\xi}-1\right), & \xi \neq 0 \\ \mu - \sigma \log\left(-\log\left(1 - \frac{1}{k}\right)\right), & \xi = 0 \end{cases}$$

Here $k$ is the return period: the expected number of years between events of a given magnitude. Using AMD as an example, we can calculate the return level for a 20-year event, i.e. a $95\%$ confidence level. We start by defining the function

def return_level(xi: float, mu: float, sigma: float, k: int) -> float:
    if xi != 0:
Statement for first case

We create a branch for the case where $\xi \neq 0$

        level = mu + (sigma / xi) * ((-np.log(1 - 1 / k)) ** (-xi) - 1)
Calculate return level

We calculate the return level and store it in a variable

    else:
Statement for second case

The remaining branch handles the Gumbel case, where $\xi = 0$

        level = mu - sigma * np.log(-np.log(1 - 1 / k))
Calculate return level

We calculate the return level for the Gumbel distribution

    return level

After defining the function, we can then calculate the return level for a 20 year event

k = 20
Set period

We set the period to a 1-in-20-year event, which gives us a confidence level of $95\%$

rl = return_level(xi, mu, sigma, k)
Calculate return level

We then calculate the return level

print(rl) # 37.9377

From this, we then have that

$$r_{260,20} = 37.9377\%$$

Which means that a single-day loss of 37.9377% or more is a 1-in-20-year event.
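The return level is just the $(1 - 1/k)$ quantile of the fitted GEV, so the hand-written formula can be cross-checked against scipy's `ppf` (a sketch using the AMD parameters quoted above; remember scipy's $c = -\xi$ convention):

```python
import numpy as np
from scipy import stats

# AMD-fit parameters as quoted in the text, and a 20-year period
xi, mu, sigma, k = 0.3886, 11.0590, 4.8099, 20

# Hand-written return level formula for the Frechet/Weibull case
manual = mu + (sigma / xi) * ((-np.log(1 - 1 / k)) ** (-xi) - 1)
# Same quantity as the (1 - 1/k) quantile of the fitted GEV
via_scipy = stats.genextreme.ppf(1 - 1 / k, c=-xi, loc=mu, scale=sigma)
print(manual, via_scipy)
```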

Return period

If we instead wish to calculate the return period for a given loss, we can then use the following formula

$$k_{n, u} = \frac{1}{\bar{H}(u)}$$

So, given that we wish to find the return period for a loss of $37.9377\%$, we can calculate this using the GEV distribution that we have fitted.

def return_period(xi: float, mu: float, sigma: float, u: float) -> float:
    if xi != 0:
Statement for first case

We create a branch for the case where $\xi \neq 0$

        t = 1 + xi * (u - mu) / sigma
Helper variable

We construct a helper $t$, to better keep track of the formula

        cdf_value = np.exp(-t ** (-1 / xi))
Main variable for Fréchet and Weibull

We then calculate $H(u)$ for the Fréchet and Weibull cases

    else:
Statement for Gumbel case

The remaining branch handles the case where $\xi = 0$

        cdf_value = np.exp(-np.exp(-(u - mu) / sigma))
Main variable for Gumbel

We calculate $H(u)$ for the Gumbel case

    return 1 / (1 - cdf_value)
Return

We then return the return period by calculating $1/(1-H(u))$, where $1-H(u) = \bar{H}(u)$

We then test whether we recover the same period, by feeding the return level result back in

u = 37.93779050159416
Define return level

We use the result from the return level calculation

rp = return_period(xi, mu, sigma, u)
Construct variable to store results

We construct a variable to store the return period returned by the function

print(rp) # 19.9999

So as we can see, we get that

$$k_{260,37.9377} = 19.9999$$

As expected.

Peaks over Threshold (PoT)

If we consider how the block maxima method works, we can see that it is a very wasteful use of data: each of the 24 blocks contains $\sim 250$ observations, yet we keep only a single observation per block, leaving a total of just 24 observations from which to estimate our GEV distribution. One way to combat this is to use the Peaks over Threshold (PoT) method instead.

The idea is that we select some threshold $u$, and then fit a Generalized Pareto Distribution (GPD) to the excesses over that threshold.

The GPD is given by

$$G_{\xi,\beta}(x) = \begin{cases} 1 - \left(1 + \frac{\xi x}{\beta}\right)^{-\frac{1}{\xi}}, & \xi \neq 0, \\[1em] 1 - \exp\left(-\frac{x}{\beta}\right), & \xi = 0. \end{cases}$$

Here $\xi$ is the shape parameter, and $\beta$ is the scale parameter.
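Unlike genextreme, scipy's genpareto uses the same sign convention as the formula above ($c = \xi$), which we can verify with illustrative values:

```python
import numpy as np
from scipy import stats

xi, beta = 0.15, 3.7   # illustrative values, not the fitted ones
x = 5.0

# G_{xi,beta}(x) written out by hand
manual = 1 - (1 + xi * x / beta) ** (-1 / xi)
# scipy's parametrization: c = xi directly
via_scipy = stats.genpareto.cdf(x, c=xi, scale=beta)
print(manual, via_scipy)
```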

Gathering data

We gather data just as in the Block Maxima method, and use the loss function

$$L_t = 100\,(1- e^{X_t})$$

under the assumption that $X_t$ is the log return of the asset. In the case where we instead use simple returns, the loss function simplifies to

$$L_t = -100\, X_t$$
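The two definitions describe the same loss, each in terms of its own return type: if $r$ is the simple return and $X_t = \ln(1+r)$ the corresponding log return, then $100(1-e^{X_t}) = -100r$ exactly. A quick check:

```python
import numpy as np

r = -0.02                 # a simple return of -2%
x = np.log(1 + r)         # the corresponding log return

loss_from_log = 100 * (1 - np.exp(x))   # L_t = 100(1 - e^{X_t})
loss_from_simple = -100 * r             # L_t = -100 X_t with simple returns
print(loss_from_log, loss_from_simple)  # the two agree (both ~2.0)
```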

So in reality, we can construct the loss function by

data['Loss'] = -100 * data['Close'].pct_change()
Simplified loss function

Loss function given that we are using simple returns.

However, if the returns are log returns, we would apply the following function

def loss_function(data: pd.DataFrame, series: str) -> pd.DataFrame:
    df = pd.DataFrame()
Construct empty dataframe

Used to store results.

    df["Loss"] = 100 * (1 - np.exp(data[series]))
Add series to dataframe

Using the equation for converting log returns to losses.

    return df
Return dataframe

We then return the new dataframe.

We can then try and illustrate this using the AMD data; note, however, that we are now using weekly time intervals.

Mean excess plot method

Having gathered the loss data, we then need to find a suitable threshold, which we can do using the mean excess plot, given by

$$e_n(v) = \frac{ \sum_{i=1}^{n} (X_i - v)\, \mathbf{1}_{\{X_i > v\}} }{ \sum_{i=1}^{n} \mathbf{1}_{\{X_i > v\}} }$$

We do this by creating a series of thresholds, and then calculating the mean excess for each of these thresholds.

def mean_excess_series(losses: pd.Series, upper_threshold: int) -> pd.Series:
    thresholds = list(range(1, upper_threshold + 1))
Setting thresholds

We create a list of thresholds; we always start from some low number and move up, so we only need to define the upper limit.

    results = []
Create storage list

A list where we store the mean excess for each threshold.

    for v in thresholds:
Loop over the thresholds

We run a loop over the thresholds.

        excesses = losses[losses > v] - v
Finding excesses

The excesses are the losses that lie above the threshold, with the threshold subtracted.

        results.append(excesses.mean() if not excesses.empty else 0)
Appending the mean

We then append the mean; if there are no excesses, we append 0.

    return pd.Series(results, index=thresholds)
Return as series

We return a series whose index is the threshold levels.

Given the declared function, we then call it on our data

upper_threshold = 30
Set threshold

We set the upper limit

losses = df['Loss']
Define losses

Create a variable for the losses.

mef = mean_excess_series(losses, upper_threshold)
Run function

We then run the function, giving us a variable mef which contains the series.

We observe a decreasing mean excess from 1 to 5, then a roughly linear increase from 6 to 16. Given this, we select $6\%$ as the threshold for our Peaks over Threshold method. Then, by applying Maximum Likelihood Estimation, we can fit the GPD distribution to the excesses over the threshold.

u = 6
Set threshold

We set the threshold to 6%

exceedances = losses[losses > u]
Get exceedances

We get the observations which exceed the threshold.

excess_values = (exceedances - u).values
Calculate excess

We calculate the excesses by subtracting the threshold from the exceedances.

n_exceedances = len(excess_values)
Number of exceedances

We count the exceedances, which is used in the fitting process.

We then create a negative log likelihood function, which we minimize to obtain the parameters of the GPD distribution. The log likelihood is given by

$$\begin{aligned} \ln L(\xi, \beta ; Y_{1}, \ldots, Y_{N_{u}}) &= \sum_{j=1}^{N_{u}} \ln g_{\xi, \beta}(Y_{j}) \\ &= -N_{u} \ln \beta - \left(1 + \frac{1}{\xi}\right) \sum_{j=1}^{N_{u}} \ln \left(1 + \xi \frac{Y_{j}}{\beta}\right) \end{aligned}$$

where $g_{\xi, \beta}$ is the density function of the GPD and $N_u$ is the number of exceedances over the threshold.

def neg_log_likelihood(params: np.ndarray, y_data: np.ndarray) -> float:
    xi, beta = params
Unpack parameters

The optimizer feeds the function candidate parameter values, which we unpack here.

    n_u = len(y_data)
Get the length of the data

This corresponds to $N_u$ in the formula, and is used in the negative log likelihood function.

    if beta <= 0 or np.any(1 + xi * y_data / beta <= 0):
        return 1e10
Set up the constraints

The scale parameter must be positive, and the shape parameter must be such that the term inside the logarithm is positive; otherwise we return a large penalty value.

    term1 = -n_u * np.log(beta)
First term

The first term of the log likelihood, $-N_u \ln \beta$

    term2 = -(1 + 1 / xi) * np.sum(np.log(1 + xi * y_data / beta))
Second term

The second term, $-\left(1 + \frac{1}{\xi}\right) \sum_{j=1}^{N_u} \ln \left(1 + \xi \frac{Y_j}{\beta}\right)$

    return -(term1 + term2)
Return negative log likelihood

We return the negated log likelihood, because we want to minimize it.

We can then run the optimization process, to get the parameters for the GPD distribution.

res = minimize(neg_log_likelihood,
Create variable to store results

We define res to store the results from the optimization process, which we run on the function neg_log_likelihood.

               [0.1, np.mean(excess_values)],
Define initial guess

We define the initial guess for the optimization process: the shape parameter is 0.1, and the scale parameter is the mean of the excesses.

               args=(excess_values,),
Define data to fit on

We give the data to fit on, which is the excesses over the threshold.

               method='BFGS')
Define optimization method

We use the BFGS method, since it returns an approximate inverse Hessian, giving us the variance-covariance matrix.

res
Display results

We display the optimization results, which include the shape and scale parameters for the GPD distribution.

 message: Optimization terminated successfully.
 success: True
  status: 0
     fun: 615.7623180982071
       x: [ 1.460e-01  3.732e+00]
     nit: 9
     jac: [ 0.000e+00  0.000e+00]
hess_inv: [[ 5.401e-03 -1.780e-02]
           [-1.780e-02  1.272e-01]]
    nfev: 54
    njev: 18

Given this, we then get

$$\begin{aligned} \xi &= 0.1459\\ \beta &= 3.7324 \end{aligned}$$

Having done this, we can then try and illustrate the fit of the GPD distribution to the excesses, by plotting the empirical CDF of the excesses, and the theoretical CDF of the GPD distribution.

Furthermore, we can obtain the standard errors, since we have the parameter variances from the variance-covariance matrix.

$$\begin{aligned} \text{Var}(\xi) &= 0.005401\\ \text{Var}(\beta) &= 0.1272 \end{aligned}$$

From these, we can then calculate the standard errors

$$\begin{aligned} \text{SE}(\xi) &= \sqrt{0.005401} = 0.0735\\ \text{SE}(\beta) &= \sqrt{0.1272} = 0.3566 \end{aligned}$$
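With the standard errors in hand, we can sketch approximate 95% Wald confidence intervals (point estimate ± 1.96·SE), using the fitted values and variances quoted above:

```python
import math

# Fitted GPD parameters and their variances, from the text above
xi, var_xi = 0.1459, 0.005401
beta, var_beta = 3.7324, 0.1272

se_xi, se_beta = math.sqrt(var_xi), math.sqrt(var_beta)
ci_xi = (xi - 1.96 * se_xi, xi + 1.96 * se_xi)
ci_beta = (beta - 1.96 * se_beta, beta + 1.96 * se_beta)
print(ci_xi)    # the lower end sits just above 0
print(ci_beta)
```

Note that the interval for $\xi$ only barely excludes zero, so the evidence for a strictly heavy (Fréchet-type) tail is present but not overwhelming.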

Getting Value-at-Risk (VaR)

Given that we now have the parameters for the GPD distribution, we are able to calculate the Value at Risk (VaR) for a given confidence level. The formula for the VaR is given by

$$\text{VaR}_\alpha = u + \frac{\beta}{\xi} \left( \left( \frac{1 - \alpha}{\bar{F}(u)} \right)^{-\xi} - 1 \right)$$
def calculate_var(alpha: float,
Define function

We define a function to calculate the VaR for a given confidence level, using the parameters from the GPD distribution.

                  u: int,
                  xi: float,
                  beta: float,
                  n_total: int,
                  n_exceedances: int) -> float:

    f_bar_u = n_exceedances / n_total
Calculate exceedance probability

We calculate the exceedance probability $\bar{F}(u)$, which is the number of exceedances divided by the total number of observations.

    inner_term = (1 - alpha) / f_bar_u
Calculate inner term

We calculate the inner term of the formula, $(1 - \alpha) / \bar{F}(u)$, where $\bar{F}(u)$ is the exceedance probability.

    var_alpha = u + (beta / xi) * (inner_term ** (-xi) - 1)
Calculate VaR

We then calculate the VaR using the formula, and return it.

    return var_alpha

From calculating $\text{VaR}_{0.99}$, we get

$$\text{VaR}_{0.99} = 19.8798\%$$

Getting Expected Shortfall (ES)

Just as with the VaR, we are also able to get the Expected Shortfall (ES) for a given confidence level, which is given by the formula

$$\text{ES}_\alpha = \frac{1}{1 - \alpha} \int_\alpha^1 q_x(F)\, dx = \frac{\text{VaR}_\alpha}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}$$
def calculate_es(var_alpha: float,
                 u: int,
                 xi: float,
                 beta: float) -> float:

    if xi >= 1:
Condition for undefined ES

In the case where $\xi \geq 1$, the ES is infinite.

        return float('inf')

    es_alpha = (var_alpha / (1 - xi)) + ((beta - xi * u) / (1 - xi))
Calculate ES

We calculate the ES using the formula

    return es_alpha

From this, we then get that $\text{ES}_{0.99}$ is given by

$$\text{ES}_{0.99} = 26.62325\%$$
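As a consistency check, plugging the fitted GPD parameters and the $\text{VaR}_{0.99}$ from above into the closed-form ES expression reproduces this number:

```python
# Fitted values quoted in the text above
u = 6
xi, beta = 0.1459, 3.7324   # fitted GPD parameters
var_99 = 19.8798            # VaR_0.99 from the previous section

# ES_alpha = VaR_alpha/(1-xi) + (beta - xi*u)/(1-xi)
es_99 = (var_99 / (1 - xi)) + ((beta - xi * u) / (1 - xi))
print(es_99)   # ~26.62, matching ES_0.99 above
```

As expected, the ES sits above the VaR: it is the expected loss conditional on the VaR level being exceeded.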