Application of Extreme Value Theory on Time Series Data for Risk Management

Feb 10, 2026


Introduction

Extreme Value Theory (EVT) is a branch of probability theory that focuses on the laws governing extreme values in large samples of data.

The goal of this theory is to model what happens in the extreme cases we find in nature or industry, such as when a rope breaks or a market crashes. If we understand the underlying dynamics and laws that govern these events, we are able to better manage risk.

Usually, this is done with a dataset containing observations of extreme events and the magnitudes that come with them, such as the financial losses they incur. A very common example of such a dataset is the Danish Fire Insurance data.

We start by considering a sequence of independent, identically distributed random variables

$$(X_{i})_{i \in \mathbb{N}}$$

which typically represent the financial losses incurred when such events take place.

Generalized Extreme Value Distribution (GEV)

The Generalized Extreme Value distribution (GEV) is given by

$$H_{\xi}(x) = \begin{cases} e^{-(1+\xi x)^{-1/\xi}}, & \xi \neq 0\\ e^{-e^{-x}}, & \xi = 0 \end{cases}$$

Here $\xi$ is the shape parameter of the distribution, and $H_{\xi}$ defines a family of distributions.

We can try and illustrate this by choosing different shape parameters:

  • $\xi > 0$: Fréchet (usually the most observed)
  • $\xi = 0$: Gumbel
  • $\xi < 0$: Weibull
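To make the three cases concrete, here is a minimal sketch (assuming NumPy is available) that evaluates the standard GEV CDF; note that all three types agree at $x = 0$, where $H_\xi(0) = e^{-1}$:

```python
import numpy as np

def gev_cdf_standard(x: float, xi: float) -> float:
    """Standard GEV CDF H_xi(x), with location 0 and scale 1."""
    if xi != 0:
        t = 1 + xi * x
        if t <= 0:
            # outside the support: below it in the Frechet case,
            # above it in the Weibull case
            return 0.0 if xi > 0 else 1.0
        return np.exp(-t ** (-1 / xi))
    return np.exp(-np.exp(-x))

# All three types pass through e^{-1} at x = 0
for xi in (0.5, 0.0, -0.5):   # Frechet, Gumbel, Weibull
    print(gev_cdf_standard(0.0, xi))
```

The Fréchet case has support bounded below and the Weibull case has support bounded above, which is why the function clamps to 0 or 1 outside the support.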

Block Maxima Method

The Block Maxima Method (BMM) divides the dataset into intervals, or blocks; these can be years, quarters, and so on. Within each block, we then take the largest-magnitude adverse event that happened during that period. So in finance, we look at the largest losses.

Given this, we then fit a three-parameter GEV to these block maxima. The three-parameter GEV is obtained by replacing $x$ with

$$\frac{x-\mu}{\sigma}$$

which is called a location-scale transformation. Doing this gives us the following GEV

$$H_{\xi, \mu, \sigma}(x) = \begin{cases} e^{-\left(1 + \xi \frac{x - \mu}{\sigma}\right)^{-1/\xi}}, & \xi \neq 0 \\ e^{-e^{-\frac{x - \mu}{\sigma}}}, & \xi = 0 \end{cases}$$

Example: S&P500

To apply the theory in context, we will use the S&P 500 index and try to fit this extreme-loss model to it.

To start with, we fetch our data: the price of the S&P 500 over time.

ticker = "^GSPC"
Ticker

The ticker ^GSPC refers to the S&P 500 data

data = yf.download(ticker, start="1960-01-01", end="1987-10-17")
Fetching data

We then fetch the data, using the yfinance package.

Having done this, we then calculate the returns of the closing price for every single day. Note that this leaves the first observation with a NA value, since there is no prior observation to compare it to.

data['Returns'] = data['Close'].pct_change()
Returns

We get the day-by-day returns, as decimal fractions

data['Daily_Fall'] = -1 * data['Returns'] * 100
Scaling the data

We scale the data by 100 to get a non-decimal (percentage) representation, and multiply by -1 so that losses become positive, since we want to find the largest loss.

data = data.dropna()
Remove NA values

Since the first observation is a NA value, we drop it

We then group by years; this is where the blocks come in. Given the length of our data interval, we end up with a total of 24 blocks.

annual_maxima = data['Daily_Fall'].resample('YE').max()
Grouping by year

We group the data by year, which gives us the block maxima $M_j$, one per block — 24 in total.

Having done so, we are then able to fit $H_{\xi, \mu, \sigma}$ to the block maxima; from the fitting we obtain:

  1. Shape $(\xi)$
  2. Location $(\mu)$
  3. Scale $(\sigma)$
params = stats.genextreme.fit(annual_maxima)
Fitting

We apply the fitting using stats.genextreme.fit on the block maxima; it solves the problem using Maximum Likelihood

c, mu, sigma = params
Getting parameters

We extract the parameters

xi = -c
Changing sign

scipy defines its shape parameter $c$ as the negative of $\xi$
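As a quick sanity check on the sign convention (a sketch with illustrative parameter values, not the fitted ones), scipy's genextreme with $c = -\xi$ reproduces the GEV formula from above:

```python
import numpy as np
from scipy import stats

# Illustrative values, not the fitted ones
xi, mu, sigma = 0.3, 2.0, 0.7
x = 3.5

# H_{xi,mu,sigma}(x) written out by hand
manual = np.exp(-(1 + xi * (x - mu) / sigma) ** (-1 / xi))
# scipy's parametrization: pass c = -xi
via_scipy = stats.genextreme.cdf(x, c=-xi, loc=mu, scale=sigma)
print(manual, via_scipy)
```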

Given this, we get the shape parameters of the distribution of the extreme losses observed in the S&P 500 return data

$$\begin{aligned} \xi &= 0.2858\\ \mu &= 2.0348\\ \sigma &= 0.7235 \end{aligned}$$

We can then try and illustrate this Fréchet distribution

Example: AMD

But are we able to do this for other assets? The answer is yes. Let's try fetching AMD. Given that AMD is a younger asset than the S&P 500, we shift the window to run from 1980 to 2004, so that we get the same number of blocks

Get the daily returns, and construct the fall

And then get the block maxima losses, yearly.

And we can then present the distribution parameters;

$$\begin{aligned} \xi &= 0.3886\\ \mu &= 11.0590\\ \sigma &= 4.8099 \end{aligned}$$

From this, we then get a Fréchet distribution

Note: $\xi \geq 1$

It should be noted that, in the case $\xi \geq 1$, we are dealing with an infinite mean
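For $\xi < 1$ the GEV mean is finite and equals $\mu + \sigma\,\frac{\Gamma(1-\xi)-1}{\xi}$, so we can check how close a fit is to the infinite-mean regime. A small sketch using the AMD values quoted above (the value 1.2 is purely hypothetical, to illustrate the infinite-mean case):

```python
import math

def gev_mean(xi: float, mu: float, sigma: float) -> float:
    """Mean of the GEV distribution; finite only for xi < 1."""
    if xi >= 1:
        return float('inf')
    if xi == 0:
        return mu + sigma * 0.5772156649015329  # Euler-Mascheroni constant
    return mu + sigma * (math.gamma(1 - xi) - 1) / xi

print(gev_mean(0.3886, 11.0590, 4.8099))  # AMD fit: xi < 1, finite mean
print(gev_mean(1.2, 11.0590, 4.8099))     # hypothetical xi > 1: infinite mean
```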

Return level

Given the distributions we have now found, we can look at the frequency of events of a given size. The return level $r_{n,k}$ is given by

$$r_{n, k} = \begin{cases} \mu + \frac{\sigma}{\xi}\left(\left(-\log\left(1 - \frac{1}{k}\right)\right)^{-\xi}-1\right), & \xi \neq 0 \\ \mu - \sigma \log\left(-\log\left(1 - \frac{1}{k}\right)\right), & \xi = 0 \end{cases}$$

Here $k$ is the return period: the expected number of years between events of a given magnitude. Using AMD as an example, we can calculate the return level for a 20-year event, i.e. a $95\%$ confidence level. We start by defining the function

def return_level(xi: float, mu: float, sigma: float, k: int) -> float:
    if xi != 0:
Statement for first case

We create a branch for the case where $\xi \neq 0$

        level = mu + (sigma / xi) * ((-np.log(1 - 1 / k)) ** (-xi) - 1)
Calculate return level

We calculate the return level and store it in a variable

    else:
Statement for second case

The remaining branch handles the Gumbel case, where $\xi = 0$

        level = mu - sigma * np.log(-np.log(1 - 1 / k))
Calculate return level

We calculate the return level for the Gumbel distribution

    return level

After defining the function, we can then calculate the return level for a 20 year event

k = 20
Set period

We set the period to a 1-in-20-year event, which gives us a confidence level of $95\%$

rl = return_level(xi, mu, sigma, k)
Calculate return level

We then calculate the return level

print(rl) # 37.9377

From this, we then have that

$$r_{260,20} = 37.9377\%$$

Which means that a single-day loss of 37.9377% or more is a 1-in-20-year event.
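The return level is just the $(1 - 1/k)$ quantile of the fitted GEV, so the hand-written formula can be cross-checked against scipy's `ppf` (a sketch using the AMD parameters quoted above; remember scipy's $c = -\xi$ convention):

```python
import numpy as np
from scipy import stats

# AMD-fit parameters as quoted in the text, and a 20-year period
xi, mu, sigma, k = 0.3886, 11.0590, 4.8099, 20

# Hand-written return level formula for the Frechet/Weibull case
manual = mu + (sigma / xi) * ((-np.log(1 - 1 / k)) ** (-xi) - 1)
# Same quantity as the (1 - 1/k) quantile of the fitted GEV
via_scipy = stats.genextreme.ppf(1 - 1 / k, c=-xi, loc=mu, scale=sigma)
print(manual, via_scipy)
```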

Return period

If we instead wish to calculate the return period for a given loss, we can then use the following formula

$$k_{n, u} = \frac{1}{\bar{H}(u)}$$

So, given that we wish to find the return period for a loss of $37.9377\%$, we can calculate this using the GEV distribution that we have fitted.

def return_period(xi: float, mu: float, sigma: float, u: float) -> float:
    if xi != 0:
Statement for first case

We create a branch for the case where $\xi \neq 0$

        t = 1 + xi * (u - mu) / sigma
Helper variable

We construct a helper $t$, to better keep track of the formula

        cdf_value = np.exp(-t ** (-1 / xi))
Main variable for Fréchet and Weibull

We then calculate $H(u)$ for the Fréchet and Weibull cases

    else:
Statement for Gumbel case

The remaining branch handles the case where $\xi = 0$

        cdf_value = np.exp(-np.exp(-(u - mu) / sigma))
Main variable for Gumbel

We calculate $H(u)$ for the Gumbel case

    return 1 / (1 - cdf_value)
Return

We then return the return period by calculating $1/(1-H(u))$, where $1-H(u) = \bar{H}(u)$

We then test whether we recover the same period, by feeding the return level result back in

u = 37.93779050159416
Define return level

We use the result from the return level calculation

rp = return_period(xi, mu, sigma, u)
Construct variable to store results

We construct a variable to store the return period returned by the function

print(rp) # 19.9999

So as we can see, we get that

$$k_{260,37.9377} = 19.9999$$

As expected.

Peaks over Threshold (PoT)

If we consider how the block maxima method works, we can see that it is a very wasteful use of data: each of the 24 blocks contains $\sim 250$ observations, yet we keep only a single observation per block, leaving a total of just 24 observations from which to estimate our GEV distribution. One way to combat this is to use the Peaks over Threshold (PoT) method instead.

The idea is that we select some threshold $u$, and then fit a Generalized Pareto Distribution (GPD) to the excesses over that threshold.

The GPD is given by

$$G_{\xi,\beta}(x) = \begin{cases} 1 - \left(1 + \frac{\xi x}{\beta}\right)^{-\frac{1}{\xi}}, & \xi \neq 0, \\[1em] 1 - \exp\left(-\frac{x}{\beta}\right), & \xi = 0. \end{cases}$$

Here $\xi$ is the shape parameter, and $\beta$ is the scale parameter.
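Unlike genextreme, scipy's genpareto uses the same sign convention as the formula above ($c = \xi$), which we can verify with illustrative values:

```python
import numpy as np
from scipy import stats

xi, beta = 0.15, 3.7   # illustrative values, not the fitted ones
x = 5.0

# G_{xi,beta}(x) written out by hand
manual = 1 - (1 + xi * x / beta) ** (-1 / xi)
# scipy's parametrization: c = xi directly
via_scipy = stats.genpareto.cdf(x, c=xi, scale=beta)
print(manual, via_scipy)
```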

Gathering data

We gather data just as in the Block Maxima method, and use the loss function

$$L_t = 100\,(1- e^{X_t})$$

under the assumption that $X_t$ is the log return of the asset. In the case where we instead use simple returns, the loss function simplifies to

$$L_t = -100\, X_t$$
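The two definitions describe the same loss, each in terms of its own return type: if $r$ is the simple return and $X_t = \ln(1+r)$ the corresponding log return, then $100(1-e^{X_t}) = -100r$ exactly. A quick check:

```python
import numpy as np

r = -0.02                 # a simple return of -2%
x = np.log(1 + r)         # the corresponding log return

loss_from_log = 100 * (1 - np.exp(x))   # L_t = 100(1 - e^{X_t})
loss_from_simple = -100 * r             # L_t = -100 X_t with simple returns
print(loss_from_log, loss_from_simple)  # the two agree (both ~2.0)
```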

So in reality, we can construct the loss function by

data['Loss'] = -100 * data['Close'].pct_change()
Simplified loss function

Loss function given that we are using simple returns.

However, if the returns are log returns, we would apply the following function

def loss_function(data: pd.DataFrame, series: str) -> pd.DataFrame:
    df = pd.DataFrame()
Construct empty dataframe

Used to store results.

    df["Loss"] = 100 * (1 - np.exp(data[series]))
Add series to dataframe

Using the equation for converting log returns to losses.

    return df
Return dataframe

We then return the new dataframe.

We can then try and illustrate this using the AMD data; note, however, that we are now using weekly time intervals.

Mean excess plot method

Having gathered the loss data, we then need to find a suitable threshold, which we can do using the mean excess plot, given by

$$e_n(v) = \frac{ \sum_{i=1}^{n} (X_i - v)\, \mathbf{1}_{\{X_i > v\}} }{ \sum_{i=1}^{n} \mathbf{1}_{\{X_i > v\}} }$$

We do this by creating a series of thresholds, and then calculating the mean excess for each of these thresholds.

def mean_excess_series(losses: pd.Series, upper_threshold: int) -> pd.Series:
    thresholds = list(range(1, upper_threshold + 1))
Setting thresholds

We create a list of thresholds; we always start from some low number and move up, so we only need to define the upper limit.

    results = []
Create storage list

A list where we store the mean excess for each threshold.

    for v in thresholds:
Loop over the thresholds

We run a loop over the thresholds.

        excesses = losses[losses > v] - v
Finding excesses

The excesses are the losses that lie above the threshold, with the threshold subtracted.

        results.append(excesses.mean() if not excesses.empty else 0)
Appending the mean

We then append the mean; if there are no excesses, we append 0.

    return pd.Series(results, index=thresholds)
Return as series

We return a series whose index is the threshold levels.

Given the declared function, we then call it on our data

upper_threshold = 30
Set threshold

We set the upper limit

losses = df['Loss']
Define losses

Create a variable for the losses.

mef = mean_excess_series(losses, upper_threshold)
Run function

We then run the function, giving us a variable mef which contains the series.

We observe a decreasing mean excess from 1 to 5, then a roughly linear increase from 6 to 16. Given this, we select $6\%$ as the threshold for our Peaks over Threshold method. Then, by applying Maximum Likelihood Estimation, we can fit the GPD distribution to the excesses over the threshold.

u = 6
Set threshold

We set the threshold to 6%

exceedances = losses[losses > u]
Get exceedances

We get the observations which exceed the threshold.

excess_values = (exceedances - u).values
Calculate excess

We calculate the excesses by subtracting the threshold from the exceedances.

n_exceedances = len(excess_values)
Number of exceedances

We count the exceedances, which is used in the fitting process.

We then create a negative log likelihood function, which we minimize to obtain the parameters of the GPD distribution. The log likelihood is given by

$$\begin{aligned} \ln L(\xi, \beta ; Y_{1}, \ldots, Y_{N_{u}}) &= \sum_{j=1}^{N_{u}} \ln g_{\xi, \beta}(Y_{j}) \\ &= -N_{u} \ln \beta - \left(1 + \frac{1}{\xi}\right) \sum_{j=1}^{N_{u}} \ln \left(1 + \xi \frac{Y_{j}}{\beta}\right) \end{aligned}$$

where $g_{\xi, \beta}$ is the density function of the GPD and $N_u$ is the number of exceedances over the threshold.

def neg_log_likelihood(params: np.ndarray, y_data: np.ndarray) -> float:
    xi, beta = params
Unpack parameters

The optimizer feeds the function candidate parameter values, which we unpack here.

    n_u = len(y_data)
Get the length of the data

This corresponds to $N_u$ in the formula, and is used in the negative log likelihood function.

    if beta <= 0 or np.any(1 + xi * y_data / beta <= 0):
        return 1e10
Set up the constraints

The scale parameter must be positive, and the shape parameter must be such that the term inside the logarithm is positive; otherwise we return a large penalty value.

    term1 = -n_u * np.log(beta)
First term

The first term of the log likelihood, $-N_u \ln \beta$

    term2 = -(1 + 1 / xi) * np.sum(np.log(1 + xi * y_data / beta))
Second term

The second term, $-\left(1 + \frac{1}{\xi}\right) \sum_{j=1}^{N_u} \ln \left(1 + \xi \frac{Y_j}{\beta}\right)$

    return -(term1 + term2)
Return negative log likelihood

We return the negated log likelihood, because we want to minimize it.

We can then run the optimization process, to get the parameters for the GPD distribution.

res = minimize(neg_log_likelihood,
Create variable to store results

We define res to store the results from the optimization process, which we run on the function neg_log_likelihood.

               [0.1, np.mean(excess_values)],
Define initial guess

We define the initial guess for the optimization process: the shape parameter is 0.1, and the scale parameter is the mean of the excesses.

               args=(excess_values,),
Define data to fit on

We give the data to fit on, which is the excesses over the threshold.

               method='BFGS')
Define optimization method

We use the BFGS method, since it returns an approximate inverse Hessian, giving us the variance-covariance matrix.

res
Display results

We display the optimization results, which include the shape and scale parameters for the GPD distribution.

 message: Optimization terminated successfully.
 success: True
  status: 0
     fun: 615.7623180982071
       x: [ 1.460e-01  3.732e+00]
     nit: 9
     jac: [ 0.000e+00  0.000e+00]
hess_inv: [[ 5.401e-03 -1.780e-02]
           [-1.780e-02  1.272e-01]]
    nfev: 54
    njev: 18

Given this, we then get

$$\begin{aligned} \xi &= 0.1459\\ \beta &= 3.7324 \end{aligned}$$

Having done this, we can then try and illustrate the fit of the GPD distribution to the excesses, by plotting the empirical CDF of the excesses, and the theoretical CDF of the GPD distribution.

Furthermore, we can obtain the standard errors, since we have the parameter variances from the variance-covariance matrix.

$$\begin{aligned} \text{Var}(\xi) &= 0.005401\\ \text{Var}(\beta) &= 0.1272 \end{aligned}$$

From these, we can then calculate the standard errors

$$\begin{aligned} \text{SE}(\xi) &= \sqrt{0.005401} = 0.0735\\ \text{SE}(\beta) &= \sqrt{0.1272} = 0.3566 \end{aligned}$$
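With the standard errors in hand, we can sketch approximate 95% Wald confidence intervals (point estimate ± 1.96·SE), using the fitted values and variances quoted above:

```python
import math

# Fitted GPD parameters and their variances, from the text above
xi, var_xi = 0.1459, 0.005401
beta, var_beta = 3.7324, 0.1272

se_xi, se_beta = math.sqrt(var_xi), math.sqrt(var_beta)
ci_xi = (xi - 1.96 * se_xi, xi + 1.96 * se_xi)
ci_beta = (beta - 1.96 * se_beta, beta + 1.96 * se_beta)
print(ci_xi)    # the lower end sits just above 0
print(ci_beta)
```

Note that the interval for $\xi$ only barely excludes zero, so the evidence for a strictly heavy (Fréchet-type) tail is present but not overwhelming.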

Getting Value-at-Risk (VaR)

Given that we now have the parameters for the GPD distribution, we are able to calculate the Value at Risk (VaR) for a given confidence level. The formula for the VaR is given by

$$\text{VaR}_\alpha = u + \frac{\beta}{\xi} \left( \left( \frac{1 - \alpha}{\bar{F}(u)} \right)^{-\xi} - 1 \right)$$
def calculate_var(alpha: float,
Define function

We define a function to calculate the VaR for a given confidence level, using the parameters from the GPD distribution.

                  u: int,
                  xi: float,
                  beta: float,
                  n_total: int,
                  n_exceedances: int) -> float:

    f_bar_u = n_exceedances / n_total
Calculate exceedance probability

We calculate the exceedance probability $\bar{F}(u)$, which is the number of exceedances divided by the total number of observations.

    inner_term = (1 - alpha) / f_bar_u
Calculate inner term

We calculate the inner term of the formula, $(1 - \alpha) / \bar{F}(u)$, where $\bar{F}(u)$ is the exceedance probability.

    var_alpha = u + (beta / xi) * (inner_term ** (-xi) - 1)
Calculate VaR

We then calculate the VaR using the formula, and return it.

    return var_alpha

From calculating $\text{VaR}_{0.99}$, we get

$$\text{VaR}_{0.99} = 19.8798\%$$

Getting Expected Shortfall (ES)

Just as with the VaR, we are also able to get the Expected Shortfall (ES) for a given confidence level, which is given by the formula

$$\text{ES}_\alpha = \frac{1}{1 - \alpha} \int_\alpha^1 q_x(F)\, dx = \frac{\text{VaR}_\alpha}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}$$
def calculate_es(var_alpha: float,
                 u: int,
                 xi: float,
                 beta: float) -> float:

    if xi >= 1:
Condition for undefined ES

In the case where $\xi \geq 1$, the ES is infinite.

        return float('inf')

    es_alpha = (var_alpha / (1 - xi)) + ((beta - xi * u) / (1 - xi))
Calculate ES

We calculate the ES using the formula

    return es_alpha

From this, we then get that $\text{ES}_{0.99}$ is given by

$$\text{ES}_{0.99} = 26.62325\%$$
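As a consistency check, plugging the fitted GPD parameters and the $\text{VaR}_{0.99}$ from above into the closed-form ES expression reproduces this number:

```python
# Fitted values quoted in the text above
u = 6
xi, beta = 0.1459, 3.7324   # fitted GPD parameters
var_99 = 19.8798            # VaR_0.99 from the previous section

# ES_alpha = VaR_alpha/(1-xi) + (beta - xi*u)/(1-xi)
es_99 = (var_99 / (1 - xi)) + ((beta - xi * u) / (1 - xi))
print(es_99)   # ~26.62, matching ES_0.99 above
```

As expected, the ES sits above the VaR: it is the expected loss conditional on the VaR level being exceeded.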