Friday, March 20, 2009

Why use the Generalized Error Distribution?

Published earlier on blog.gillerinvestments.com


This post addresses the question: why use the Generalized Error Distribution? The subject, the evident non-normality of financial data, should be very familiar to the intended audience of this blog, but since there have been requests to explain why the GED should be used, I'm going to summarize some basic facts here.



Firstly, longitudinal returns of financial asset prices are evidently not described by the Normal distribution. Many statements one hears along the lines of "a once in a hundred years" event are made by comparing the scale of a realized event with its expected rate under the Normal distribution. However, financial data are so clearly non-normal (more specifically, not independently and identically distributed, or i.i.d., normal) that only a naive analyst would even open an argument by entertaining that hypothesis.
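As a minimal sketch of how one might formalize that statement, the Jarque–Bera test rejects normality for any realistic sample of index returns. The `returns` series below is a simulated fat-tailed stand-in, since the original data are not reproduced here:

```python
# Jarque-Bera normality test on daily returns. The `returns` array is a
# simulated stand-in for actual S&P 500 data (an assumption for illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=2500)  # stand-in for daily index returns

jb_stat, jb_pvalue = stats.jarque_bera(returns)
print(f"Jarque-Bera statistic: {jb_stat:.1f}, p-value: {jb_pvalue:.2e}")
# For real index returns the p-value is effectively zero:
# normality is rejected outright.
```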



[Figure: Abnormality of S&P 500 Returns. Upper panel: time series of daily index returns; lower panel: histogram of daily returns with best-fit Normal density.]



Even without doing any statistical tests, a cursory inspection of the time series of daily S&P 500 Index returns (the upper panel in the figure above) suggests that the returns are not homoskedastic, i.e. constant in variance.



The lower panel shows the best fit of the Normal distribution to a histogram of daily index returns. The fit is clearly poor, and the data show the pattern typical of leptokurtotic data: a deficit of events in the shoulders of the distribution (the region around ±1σ) and an excess in the centre and in the tails.
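A quick way to quantify this pattern is to compare the sample excess kurtosis, and the observed frequency of 3σ days, with their Normal values. Again, the `returns` array below is a simulated stand-in for the actual index data:

```python
# Excess kurtosis and tail frequency versus the Normal prediction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=2500)  # stand-in for daily S&P 500 returns

k = stats.kurtosis(returns, fisher=True)   # 0 for a Normal distribution
z = (returns - returns.mean()) / returns.std()
print(f"excess kurtosis: {k:.2f}")
print(f"P(|z| > 3): observed {np.mean(np.abs(z) > 3):.4f}"
      f" vs Normal {2 * stats.norm.sf(3):.4f}")
```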


Since the data appear heteroskedastic, with distinct episodes of high and low volatility, they are a clear candidate for a GARCH model. A GARCH process with normally distributed innovations can itself give rise to a leptokurtotic unconditional distribution like the one we observe in the histogram, so we should test whether normal innovations are sufficient.
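As an illustration of such a fit (the original post does not say which software was used), here is a sketch using Kevin Sheppard's `arch` package in Python, with the same simulated stand-in data:

```python
# A minimal GARCH(1,1) fit with Normal innovations. The `arch` package is an
# assumption, and `returns` is a simulated stand-in for the daily index data.
import numpy as np
from arch import arch_model
from scipy import stats

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=2500)  # stand-in for daily S&P 500 returns (%)

res = arch_model(returns, vol="GARCH", p=1, q=1, dist="normal").fit(disp="off")
print(res.summary())

# If the standardized residuals remain fat-tailed, conditional
# heteroskedasticity with Normal innovations does not fully explain
# the leptokurtosis seen in the unconditional histogram.
std_resid = res.resid / res.conditional_volatility
print("std. residual excess kurtosis:", stats.kurtosis(std_resid))
```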


I'm interested in specifying the process distribution correctly because it directly affects the relative weighting of the various data periods in any regression analysis we do. Ordinary least squares coincides with maximum-likelihood estimation only when the underlying data are i.i.d. normal. That procedure treats deviations at the level of 3σ–5σ, or more, as highly significant, and so the estimated parameters will be chosen to explain those particular realizations at the expense of the ones in the lower range.



In the case of the data above, the regression will listen strongly to the current period, even though the recent process realization may not be characteristic of the sample as a whole. One might argue that we should simply replace OLS with generalized least squares which, when weighted by the appropriate covariance matrix, is equivalent to maximum-likelihood estimation, a very powerful technique. However, this does not circumvent the problem: estimation based on the Normal distribution still treats 3σ–5σ residuals as extremely significant whereas, under a leptokurtotic distribution, they are not particularly so.
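To make that concrete, here is a sketch using scipy's `gennorm` (the generalized normal, i.e. the GED) comparing how much a 4σ residual adds to the negative log-likelihood under the Normal and under a fat-tailed GED, both scaled to unit variance:

```python
# Penalty (in nats) that a 4-sigma residual adds to the negative
# log-likelihood. In scipy's gennorm, beta = 2 recovers the Normal and
# beta < 2 is leptokurtic; beta = 1.2 is an illustrative fat-tailed choice.
from scipy import stats

for beta in (2.0, 1.2):
    scale = 1.0 / stats.gennorm(beta).std()  # rescale to unit variance
    nll = lambda x: -stats.gennorm.logpdf(x, beta, scale=scale)
    print(f"beta = {beta}: penalty for a 4-sigma residual = "
          f"{nll(4.0) - nll(0.0):.2f} nats")

# The Normal (beta = 2) charges roughly 8 nats for the 4-sigma point, far
# more than the fat-tailed GED, so a Normal-based fit bends its parameters
# to explain exactly those observations.
```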



The GED is useful because it can be smoothly transformed from the Normal distribution into a leptokurtotic distribution ("fat tails") or even into a platykurtotic distribution ("thin tails"). This allows us to use a likelihood-ratio test of the hypothesis that the GARCH process innovations are i.i.d. normal.
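For reference, one common parameterization of the GED density is

\[
f(x;\mu,\alpha,\beta) \;=\; \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\,
\exp\!\left[-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right],
\]

where β = 2 recovers the Normal, β < 2 gives fat tails (β = 1 is the Laplace), and β → ∞ gives thin tails. Conventions for the shape parameter differ between authors; the test reported below is stated in a convention where the Normal corresponds to shape = 1.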



[Table: Results of likelihood-ratio test for i.i.d. Normal SPX innovations.]


This test convincingly rejects the null hypothesis that the GARCH process innovations are normally distributed (shape = 1 in the parameterization used here). The estimated shape parameter, which controls the kurtosis of the distribution, also lies approximately six standard errors from the null-hypothesis value.
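A sketch of how such a likelihood-ratio test can be carried out, again using the `arch` package as an assumed toolkit with simulated stand-in data: since the Normal is nested within the GED with one extra free parameter, twice the log-likelihood gap is asymptotically χ² with one degree of freedom.

```python
# Likelihood-ratio test of Normal vs GED innovations in a GARCH(1,1).
import numpy as np
from arch import arch_model
from scipy import stats

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=2500)  # stand-in for daily S&P 500 returns

llf = {}
for dist in ("normal", "ged"):
    res = arch_model(returns, vol="GARCH", p=1, q=1, dist=dist).fit(disp="off")
    llf[dist] = res.loglikelihood

lr = 2.0 * (llf["ged"] - llf["normal"])  # GED nests the Normal: 1 extra parameter
p_value = stats.chi2.sf(lr, 1)           # chi-squared with 1 degree of freedom
print(f"LR statistic = {lr:.1f}, p-value = {p_value:.2e}")
```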


In another post I will go into more depth about the various distributional choices that are available once one rejects the Normal.
