Application of Extreme Value Theory on Time Series Data for Risk Management
Feb 10, 2026
Introduction
Extreme Value Theory (EVT) is a branch of probability theory that studies the laws governing extreme values in large samples of data.
The goal of the theory is to model what happens in the extreme cases we find in nature or industry, such as a rope breaking or a market crashing. If we understand the underlying dynamics and laws that govern these events, we are better able to manage risk.
Usually, this is done with a dataset containing observations of extreme events and the magnitudes that come with them, such as the financial losses they incur. A very common example of such a dataset is the Danish Fire Insurance data.
This is done by first considering a sequence of independent, identically distributed random variables

$$X_1, X_2, \ldots, X_n,$$

which are typically the financial losses from such events taking place, and studying their maximum $M_n = \max(X_1, \ldots, X_n)$.
Generalized Extreme Value Distribution (GEV)
By applying the Generalized Extreme Value Distribution (GEV), which is given by

$$H_\xi(x) = \begin{cases} \exp\!\left(-(1 + \xi x)^{-1/\xi}\right), & \xi \neq 0, \\ \exp\!\left(-e^{-x}\right), & \xi = 0, \end{cases} \qquad 1 + \xi x > 0.$$

In this, $\xi$ is the shape parameter of the distribution, and its sign defines the type of distribution.
We can illustrate this by choosing different shape parameters:

- $\xi > 0$: Fréchet (usually the most observed)
- $\xi = 0$: Gumbel
- $\xi < 0$: Weibull
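To get a feel for how the sign of $\xi$ changes the tail, here is a minimal sketch using scipy's `genextreme` (the parameters are purely illustrative, not fitted to any data; note that scipy's shape `c` is the negative of $\xi$):

```python
import numpy as np
from scipy import stats

# scipy's genextreme uses c = -xi, so xi > 0 (Frechet) means c < 0.
# The shape values below are purely illustrative.
x = np.linspace(-2, 6, 9)
frechet = stats.genextreme.cdf(x, c=-0.5)  # xi = 0.5: heavy right tail
gumbel = stats.genextreme.cdf(x, c=0.0)    # xi = 0: light tail
weibull = stats.genextreme.cdf(x, c=0.5)   # xi = -0.5: right endpoint at 2

# The heavier the tail, the slower the CDF approaches 1
print(frechet[-1], gumbel[-1], weibull[-1])
```

The Fréchet case keeps noticeable probability mass far into the right tail, while the Weibull case has a finite right endpoint, which is why its CDF reaches 1 first.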
Block Maxima Method
The Block Maxima Method (BMM) is a method of dividing the dataset into intervals, or blocks; these can be years, quarters, and so on. Given these blocks, we then observe the biggest-magnitude event that happened during each period, which we consider a negative event. So in finance, we look at the case of large losses.
Given this, we then proceed to fit a three-parameter GEV to these observations. The three-parameter GEV is given by redefining $x$ as

$$x \mapsto \frac{x - \mu}{\sigma},$$

which is called a location-scale transformation. Doing this then gives us the following GEV (shown for $\xi \neq 0$; the $\xi = 0$ case is transformed the same way):

$$H_{\xi,\mu,\sigma}(x) = \exp\!\left(-\left(1 + \xi\,\frac{x - \mu}{\sigma}\right)^{-1/\xi}\right), \qquad 1 + \xi\,\frac{x - \mu}{\sigma} > 0.$$
Example: S&P500
Given the context in which we wish to apply this theory, we will use the S&P 500 index and try to fit this extreme-loss model.
To start with, we fetch our data, which will be the price of the S&P 500 through time.

```python
import numpy as np
import pandas as pd
import yfinance as yf
from scipy import stats
from scipy.optimize import minimize

# The ticker ^GSPC corresponds to the S&P 500 index
ticker = "^GSPC"
# Fetch the data using the yfinance package
data = yf.download(ticker, start="1960-01-01", end="1987-10-17")
```
Having done this, we then calculate the returns of the closing price of every single day. Note that this leaves the first observation with an NA value, since there is no earlier observation to compare it to.

```python
# Day-by-day returns, as a fraction
data['Returns'] = data['Close'].pct_change()
# Scale to percent and flip the sign, since we want to find the highest loss
data['Daily_Fall'] = -1 * data['Returns'] * 100
# Drop the NA value left by the first observation
data = data.dropna()
```
We then group by years; this is where the blocks come into play.

```python
# Group the data by year, giving us the block data
annual_maxima = data['Daily_Fall'].resample('YE').max()
```

In this case we will end up with 24 blocks.
Having done so, we are then able to fit the GEV onto it; from the fitting we obtain:

- Shape $\xi$
- Location $\mu$
- Scale $\sigma$

```python
# Fit the GEV to the block maxima; scipy solves this via Maximum Likelihood
params = stats.genextreme.fit(annual_maxima)
# Extract the parameters
c, mu, sigma = params
# scipy defines the shape parameter as c = -xi, so we flip the sign
xi = -c
```
Given this, we get the shape parameter of the distribution of the extreme losses observed in the return data of the S&P 500. Since it is positive, we can then illustrate the resulting Fréchet distribution.
Example: AMD
But are we able to do this for other assets? The answer is yes. Let's try fetching AMD. Given that AMD is a younger asset than the S&P 500, we shift the window to run from 1980 to 2004, such that we get the same number of blocks.
We get the daily returns and construct the daily fall, then take the yearly block maxima of the losses. We can then present the distribution parameters.
From this, we then get a Fréchet distribution
Note:
It should be noted that when $\xi \geq 1$, we will be dealing with an infinite mean.
Return level
Given the distributions we have now found, we can then look at the frequency of given events. The return level formula is given by

$$z_k = \begin{cases} \mu + \dfrac{\sigma}{\xi}\left[\left(-\ln\!\left(1 - \dfrac{1}{k}\right)\right)^{-\xi} - 1\right], & \xi \neq 0, \\[2ex] \mu - \sigma \ln\!\left(-\ln\!\left(1 - \dfrac{1}{k}\right)\right), & \xi = 0, \end{cases}$$

where $k$ is the return period: the expected number of years between events of a given magnitude. Using AMD as an example, we can then calculate the return level for a 20-year event, i.e. a confidence level of $1 - 1/20 = 95\%$. We start by defining the function
```python
def return_level(xi: float, mu: float, sigma: float, k: int) -> float:
    if xi != 0:
        # Frechet / Weibull case
        level = mu + (sigma / xi) * ((-np.log(1 - 1 / k)) ** (-xi) - 1)
    else:
        # Gumbel case
        level = mu - sigma * np.log(-np.log(1 - 1 / k))
    return level
```
After defining the function, we can then calculate the return level for a 20-year event.

```python
# A 1-in-20 year event, i.e. a 95% confidence level
k = 20
rl = return_level(xi, mu, sigma, k)
print(rl)  # 37.9377
```
From this, we then have that

$$z_{20} \approx 37.9377,$$

which means that it is a 1-in-20 year event that we will see a loss of $37.9377\%$ or more in a single day.
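As a quick sanity check on the formula, the return level for a 1-in-$k$ year event should coincide with scipy's inverse survival function evaluated at $1/k$; a small sketch with illustrative (not fitted) parameters:

```python
import numpy as np
from scipy import stats

def return_level(xi: float, mu: float, sigma: float, k: int) -> float:
    if xi != 0:
        return mu + (sigma / xi) * ((-np.log(1 - 1 / k)) ** (-xi) - 1)
    return mu - sigma * np.log(-np.log(1 - 1 / k))

# Illustrative parameters: a 1-in-k year event is exceeded with
# probability 1/k, so the return level is the GEV's isf at 1/k.
xi, mu, sigma, k = 0.3, 2.0, 1.0, 20
rl = return_level(xi, mu, sigma, k)
check = stats.genextreme.isf(1 / k, c=-xi, loc=mu, scale=sigma)
print(np.isclose(rl, check))  # True
```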
Return period
If we instead wish to calculate the return period for a given loss $u$, we can then use the following formula:

$$k(u) = \frac{1}{1 - H_{\xi,\mu,\sigma}(u)}.$$

So, given that we wish to find the return period for a loss of $37.9377\%$ (the return level we just found), we can calculate this by using the GEV distribution that we have fitted.
```python
def return_period(xi: float, mu: float, sigma: float, u: float) -> float:
    if xi != 0:
        # Helper variable, to better keep track of the formula
        t = 1 + xi * (u - mu) / sigma
        # GEV CDF for the Frechet / Weibull case
        cdf_value = np.exp(-t ** (-1 / xi))
    else:
        # GEV CDF for the Gumbel case
        cdf_value = np.exp(-np.exp(-(u - mu) / sigma))
    # The return period is 1 / (1 - H(u))
    return 1 / (1 - cdf_value)
```
We can test that this is consistent by feeding in the return level result from before.

```python
# Use the result from the return level calculation
u = 37.93779050159416
rp = return_period(xi, mu, sigma, u)
print(rp)  # 19.9999
```
So, as we can see, we get $k \approx 20$, as expected.
Peaks over Threshold (PoT)
If we consider the way that block maxima functions, we can without much difficulty observe that it is a method that is very wasteful of data: each of the 24 blocks contains many daily observations, yet we only use one single observation from each block, giving us a total of 24 observations to estimate our GEV distribution from. One way to combat this is to use the Peaks over Threshold (PoT) method instead.
The theory behind this is that we select some threshold $u$, and then fit a Generalized Pareto Distribution (GPD) to the excesses over it.
The GPD is given by

$$G_{\xi,\beta}(y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\beta}\right)^{-1/\xi}, & \xi \neq 0, \\[2ex] 1 - e^{-y/\beta}, & \xi = 0, \end{cases}$$

where $\xi$ is the shape parameter, and $\beta$ is the scale parameter.
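For the $\xi \neq 0$ branch, this matches the distribution scipy ships as `genpareto`; a small check with illustrative parameters (note that, unlike `genextreme`, `genpareto` uses the same sign convention for the shape as $\xi$ here):

```python
import numpy as np
from scipy import stats

def gpd_cdf(y, xi, beta):
    # Hand-written GPD CDF for the xi != 0 case
    return 1.0 - (1.0 + xi * y / beta) ** (-1.0 / xi)

# Illustrative parameters only, not fitted to any data
xi, beta = 0.2, 2.0
y = np.linspace(0.0, 10.0, 6)
print(np.allclose(gpd_cdf(y, xi, beta),
                  stats.genpareto.cdf(y, c=xi, scale=beta)))  # True
```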
Gathering data
We gather data just like for the previous Block Maxima method, and use the loss function

$$L_t = 100\left(1 - e^{r_t}\right),$$

under the assumption that $r_t$ is the log return of the asset. In the case where we just use simple returns, the loss function can be simplified to

$$L_t = -100\, r_t.$$

So in reality, we can construct the loss function by
```python
# Simplified loss function, given that we are using simple returns
data['Loss'] = -100 * data['Close'].pct_change()
```
However, given that the returns are log returns, we would apply the following function:

```python
def loss_function(data: pd.DataFrame, series: str) -> pd.DataFrame:
    # Empty dataframe used to store the results
    df = pd.DataFrame()
    # Convert the log returns to percentage losses
    df["Loss"] = 100 * (1 - np.exp(data[series]))
    # Return the new dataframe
    return df
```
We can then illustrate this using the AMD data; however, we are now using weekly time intervals.
Mean excess plot method
Having gathered the loss data, we then need to find a suitable threshold, which we can do by using the mean excess plot, given by

$$e(v) = \mathbb{E}\left[\,X - v \mid X > v\,\right].$$

We do this by creating a series of thresholds, and then calculating the mean excess for each of these thresholds.
```python
def mean_excess_series(losses: pd.Series, upper_threshold: int) -> pd.Series:
    # We always start from some low threshold and move up,
    # so we only need to define the upper limit
    thresholds = list(range(1, upper_threshold + 1))
    # A list where we store the mean excess for each threshold
    results = []
    for v in thresholds:
        # The excesses are the losses above the threshold,
        # with the threshold subtracted
        excesses = losses[losses > v] - v
        # Append the mean excess; if there are no excesses, append 0
        results.append(excesses.mean() if not excesses.empty else 0)
    # Return a series indexed by the threshold levels
    return pd.Series(results, index=thresholds)
```
Given the declared function, we then call it on our data.

```python
# Set the upper limit for the thresholds
upper_threshold = 30
# A variable for the losses
losses = df['Loss']
# Run the function; mef contains the mean excess series
mef = mean_excess_series(losses, upper_threshold)
```
From the mean excess plot we can observe a decreasing mean excess at low thresholds, followed by a roughly linear increase; given this, we select $6\%$ as the threshold for our Peaks over Threshold method. Then, by applying Maximum Likelihood Estimation, we can fit the GPD distribution to the excesses over the threshold.
```python
# Set the threshold to 6%
u = 6
# The observations which exceed the threshold
exceedances = losses[losses > u]
# Calculate the excesses by subtracting the threshold from the exceedances
excess_values = (exceedances - u).values
# The number of exceedances, used in the fitting process
n_exceedances = len(excess_values)
```
We then create the negative log-likelihood function, which we minimize to get the parameters for the GPD distribution. The negative log-likelihood function is given by

$$-\ell(\xi, \beta) = N_u \ln \beta + \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{N_u} \ln\!\left(1 + \frac{\xi\, y_i}{\beta}\right),$$

which follows from the density function of the GPD distribution; here $N_u$ is the number of exceedances over the threshold, and the $y_i$ are the excesses.
```python
def neg_log_likelihood(params: np.ndarray, y_data: np.ndarray) -> float:
    # The optimizer feeds in the current guess for the parameters
    xi, beta = params
    # The number of exceedances, N_u in the formula
    n_u = len(y_data)

    # Constraints: the scale parameter must be positive, and the term
    # inside the logarithm must be positive
    if beta <= 0 or np.any(1 + xi * y_data / beta <= 0):
        return 1e10

    # First term of the log likelihood: -N_u * ln(beta)
    term1 = -n_u * np.log(beta)
    # Second term: -(1 + 1/xi) * sum of ln(1 + xi * y / beta)
    term2 = -(1 + 1 / xi) * np.sum(np.log(1 + xi * y_data / beta))
    # Negate the sum, since we want to minimize
    return -(term1 + term2)
```
We can then run the optimization process, to get the parameters for the GPD distribution.

```python
res = minimize(neg_log_likelihood,
               # Initial guess: shape 0.1, scale the mean of the excesses
               [0.1, np.mean(excess_values)],
               # The data to fit on: the excesses over the threshold
               args=(excess_values,),
               # BFGS gives us the inverse Hessian, which serves as a
               # variance-covariance matrix
               method='BFGS')
res
```

```
 message: Optimization terminated successfully.
 success: True
  status: 0
     fun: 615.7623180982071
       x: [ 1.460e-01  3.732e+00]
     nit: 9
     jac: [ 0.000e+00  0.000e+00]
hess_inv: [[ 5.401e-03 -1.780e-02]
           [-1.780e-02  1.272e-01]]
    nfev: 54
    njev: 18
```
Given this, we then get $\xi \approx 0.146$ and $\beta \approx 3.732$.
Having done this, we can then illustrate the fit of the GPD distribution to the excesses, by plotting the empirical CDF of the excesses against the theoretical CDF of the GPD distribution.
Furthermore, we can gather the standard errors, since we have the parameter variances on the diagonal of the variance-covariance matrix; the standard errors are simply their square roots.
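Using the `hess_inv` matrix from the optimization output above, a minimal sketch of that calculation:

```python
import numpy as np

# The BFGS inverse Hessian from the optimization output, used as an
# approximate variance-covariance matrix for (xi, beta)
hess_inv = np.array([[5.401e-03, -1.780e-02],
                     [-1.780e-02, 1.272e-01]])

# The standard errors are the square roots of the diagonal variances
se_xi, se_beta = np.sqrt(np.diag(hess_inv))
print(round(se_xi, 4), round(se_beta, 4))  # 0.0735 0.3567
```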
Getting Value-at-Risk (VaR)
Given that we now have the parameters for the GPD distribution, we are able to calculate the Value at Risk (VaR) for a given confidence level $\alpha$. The formula for the VaR is given by

$$\mathrm{VaR}_\alpha = u + \frac{\beta}{\xi}\left[\left(\frac{1 - \alpha}{\bar{F}(u)}\right)^{-\xi} - 1\right],$$

where $\bar{F}(u) = N_u / n$ is the probability of exceeding the threshold.
```python
def calculate_var(alpha: float,
                  u: int,
                  xi: float,
                  beta: float,
                  n_total: int,
                  n_exceedances: int) -> float:
    # Exceedance probability: the number of exceedances
    # over the total number of observations
    f_bar_u = n_exceedances / n_total
    # The inner term (1 - alpha) / F_bar(u)
    inner_term = (1 - alpha) / f_bar_u
    # The VaR, following the formula
    var_alpha = u + (beta / xi) * (inner_term ** (-xi) - 1)
    return var_alpha
```
From calculating the $\mathrm{VaR}_\alpha$, we then get its value for our chosen confidence level.
Getting Expected Shortfall (ES)
Just like we can get the VaR, we are also able to get the Expected Shortfall (ES) for a given confidence level, which is given by the formula

$$\mathrm{ES}_\alpha = \frac{\mathrm{VaR}_\alpha}{1 - \xi} + \frac{\beta - \xi u}{1 - \xi}, \qquad \xi < 1.$$
```python
def calculate_es(var_alpha: float,
                 u: int,
                 xi: float,
                 beta: float) -> float:
    # In the case where xi >= 1, the ES is undefined (infinite mean)
    if xi >= 1:
        return float('inf')
    # The ES, following the formula
    es_alpha = (var_alpha / (1 - xi)) + ((beta - xi * u) / (1 - xi))
    return es_alpha
```
From this, we then get the value of the $\mathrm{ES}_\alpha$.
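As a sanity check on the closed-form ES, we can simulate GPD losses over the threshold, using the fitted shape and scale from above together with a purely hypothetical VaR of 15%, and compare against the empirical tail mean:

```python
import numpy as np
from scipy import stats

def calculate_es(var_alpha: float, u: float, xi: float, beta: float) -> float:
    if xi >= 1:
        return float('inf')
    return (var_alpha / (1 - xi)) + ((beta - xi * u) / (1 - xi))

# Fitted shape/scale from the text; the VaR value here is hypothetical
xi, beta, u, var_alpha = 0.146, 3.732, 6.0, 15.0

# Simulate losses as threshold + GPD excesses, then average the tail
excesses = stats.genpareto.rvs(c=xi, scale=beta, size=1_000_000,
                               random_state=np.random.default_rng(1))
losses = u + excesses
empirical = losses[losses > var_alpha].mean()
closed_form = calculate_es(var_alpha, u, xi, beta)
print(round(closed_form, 2))  # 20.91
```

The Monte Carlo tail mean should agree with the closed form up to sampling noise.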