Tuesday, November 4, 2008

VIX vs GARCH: Results from a New Region of Phase Space

In the article Our New Volatility Laboratory, we discussed the impact of exploring a new region of volatility phase space on our econometric modelling procedures. That discussion took as its example the Dow Jones Industrial Average, an index on which I regularly trade futures. However, the recent dramatic increases in the level of volatility have not been restricted to the Dow.

The chart SPX Volatility compactly illustrates the volatility of the S&P 500 index since 1995. The volatility model presented here is a simple GARCH model that was developed on the 2000-2003 daily data and is quoted in terms of daily volatility in index points.
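
As a minimal sketch of the sort of recursion involved (the parameter values below are placeholders, not the coefficients actually fitted to the 2000-2003 data), one could compute the conditional daily point volatility like this:

    import numpy as np

    def garch11_point_vol(price_changes, omega, alpha, beta):
        """Conditional daily volatility, in index points, from a GARCH(1,1)
        recursion: sigma2[t] = omega + alpha*dP[t-1]**2 + beta*sigma2[t-1]."""
        dP = np.asarray(price_changes, dtype=float)
        sigma2 = np.empty_like(dP)
        sigma2[0] = dP.var()  # initialize at the sample variance
        for t in range(1, len(dP)):
            sigma2[t] = omega + alpha * dP[t - 1] ** 2 + beta * sigma2[t - 1]
        return np.sqrt(sigma2)  # daily volatility in index points

    # Illustrative call with made-up parameters; the real values would come
    # from a maximum-likelihood fit to the 2000-2003 daily data.
    # vol = garch11_point_vol(spx_daily_changes, omega=0.5, alpha=0.08, beta=0.90)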

In the chart Comparison of VIX with Volatility Model we compare the GARCH model discussed above with the VIX index, which is published by the CBOE and designed to represent the annualized volatility of the S&P 500 index over the next 30 days, as implied by the current market in index options.

We see very strong agreement between the VIX and the GARCH model, with a regression line gradient statistically indistinguishable from unity. This indicates that changes in the VIX and changes in the GARCH model track each other quite well. However, there does seem to be a persistent and significant premium in the VIX over the GARCH model of about 4%. (In the scatter plot the blue line is the regression line and the green line is a line with unit gradient and zero intercept.)
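
For concreteness, here is a sketch of the unit conversion and regression this comparison involves; the annualization convention and the variable names are my assumptions rather than part of the original analysis:

    import numpy as np
    from scipy.stats import linregress

    def garch_to_vix_scale(point_vol, index_level, trading_days=252):
        """Convert a daily point volatility into an annualized percentage,
        which is the scale the VIX is quoted on."""
        return 100.0 * (point_vol / index_level) * np.sqrt(trading_days)

    # model_vol: GARCH daily point volatility; spx: index closes; vix: VIX closes
    # model_pct = garch_to_vix_scale(model_vol, spx)
    # fit = linregress(model_pct, vix)
    # print(fit.slope, fit.intercept)  # gradient near one, intercept near the premium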

However, if we examine the tail of the scatter, the data where volatility is significantly higher than the norm, we see that this relationship no longer holds: all of these points lie below the regression line. Thus, if we assumed that now would be a good time to sell volatility we might be making a mistake; rather than being very expensive, volatility may actually be a little cheap right now.

Monday, October 20, 2008

What Does the Markowitz Portfolio Really Mean

One of my formative physics memories is watching Prof. Donald H. Perkins derive the Rutherford scattering cross-section formula on half a blackboard in about five minutes. His derivation was based on dimensional arguments and physical principles, and came just one week after my graduate class in Oxford had derived the very same formula from first principles, which took several hours and pages of dense algebra. Prof. Perkins' class was about phenomenology, which means it was about what happens in nature, and his lesson to me was that physics is not applied mathematics: what happens in nature is due to the structure of the universe, not to the way the math works out.

When solving the Markowitz Mean-Variance efficient investment problem one is led to the portfolio defined by the product of the inverse of the covariance matrix into the vector of the asset return forecasts. So let's follow Prof. Perkins' lead and ask what this equation tells us about the principles of how we should structure a portfolio.

First of all, remember that the covariance matrix is required to be a symmetric positive definite matrix. What this means is that it can be diagonalized by a similarity transformation and that the diagonal terms of the resultant matrix are positive quantities. (A little linear algebra reveals that the transformation matrix is the matrix of eigenvectors of the covariance matrix and the diagonal terms are the associated eigenvalues.)
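
For reference, the decomposition just described is, in symbols,

    \Sigma = Q \, \Lambda \, Q^{\mathsf{T}},
    \qquad
    \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n),
    \qquad
    \lambda_i > 0,

with Q the orthogonal matrix of eigenvectors and the lambda_i the associated eigenvalues.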

From a statistical point of view, what we have done is rotate into a new coordinate system in which our set of original, correlated random variables has been replaced with new variables, each one formed from a linear combination of the original variables, that are all uncorrelated with one another (and, under a Gaussian assumption, statistically independent). The new, transformed matrix represents the covariance matrix of these new variables.

Many authors now declare that these new variables are the actual driving factors behind the variance of our original portfolio, and that each asset has a factor loading onto the factors which are the real sources of portfolio variance.

I would not go so far. Mathematically, any symmetric positive definite matrix, whatever its source, can be decomposed in this manner, so I am unwilling to add any interpretational overhead where it is not necessary. I am not saying that factor models do not exist; what I am saying is that all covariance matrices can be diagonalized, with or without the existence of factors, so the fact that a particular matrix can be treated in this manner doesn't actually contain any new information. We will use the common term principal components to refer to the independent variables we have produced, but need go no further than that.

This philosophical diversion notwithstanding, the mathematics is fairly straightforward. When we transform to the principal components coordinate system we move from a system in which each axis represents a particular asset to one in which each axis represents an independent portfolio. The vector of asset forecasts is similarly transformed into a vector of portfolio forecasts. The payoff is that the covariance matrix has become a trivial diagonal matrix and, even more usefully, the inverse of the covariance matrix is simply that matrix with the diagonal elements replaced by their reciprocals. The product of the inverse covariance matrix with the forecast vector then becomes a simple vector in which each element is the ratio of the forecast to the variance of each component portfolio.
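
A small numerical sketch of this change of coordinates (the covariance matrix and forecast vector below are invented purely for illustration):

    import numpy as np

    # Illustrative covariance matrix and forecast vector, not real data.
    cov = np.array([[0.04, 0.01, 0.00],
                    [0.01, 0.09, 0.02],
                    [0.00, 0.02, 0.16]])
    forecast = np.array([0.02, 0.01, 0.03])

    # Diagonalize: the columns of Q are the eigenvectors, lam the eigenvalues.
    lam, Q = np.linalg.eigh(cov)

    # Rotate the forecasts into principal-component coordinates.
    pc_forecast = Q.T @ forecast

    # In these coordinates the Markowitz holding is just forecast / variance
    # for each independent component portfolio.
    pc_holding = pc_forecast / lam

    # Rotating back recovers the textbook answer, inverse(covariance) @ forecast.
    holding = Q @ pc_holding
    assert np.allclose(holding, np.linalg.solve(cov, forecast))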

So the big question is: why is this the right portfolio? The answer comes when we consider the expected profit for each component portfolio. This is just the product of the forecast and the holding, which is the ratio of the square of the component forecast to the component variance. This is interesting because it is dimensionless and structure free (by which I mean that the formula is the same for every component, independent of the component label). We are diversified because we are treating each component equally --- it's not due to the fancy mathematics, but it is clearly the right answer.
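
Written out, with f_i the transformed component forecast and lambda_i the component variance, the holding and expected profit for each component are

    h_i = \frac{f_i}{\lambda_i},
    \qquad
    \mathrm{E}[P_i] = f_i \, h_i = \frac{f_i^{\,2}}{\lambda_i},

the same expression for every component i.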

No Evidence for GARCH in the Mean

Here's a first result using the excursion of DJIA daily volatility into the 300-400 points per day region. Does this market show any evidence of GARCH-in-the-Mean type behaviour? That is, does there seem to be a systematic bias in the drift of the market conditioned on the level of volatility? From the naive regression analysis, this hypothesis should clearly be rejected.

The more sophisticated approach of building a GARCH-M model directly also fails.
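
As a sketch of what the naive check amounts to (the variable names are assumptions; the conditional volatility series would come from the GARCH model already described), one could regress the daily return on the volatility known at the start of the day:

    import numpy as np
    from scipy.stats import linregress

    def garch_in_mean_check(returns, cond_vol):
        """Regress the daily return on the conditional volatility known at the
        start of that day. A GARCH-in-the-Mean effect would show up as a
        significantly non-zero slope."""
        fit = linregress(np.asarray(cond_vol, float), np.asarray(returns, float))
        return fit.slope, fit.pvalue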

Thursday, October 16, 2008

Our New Volatility Laboratory

Recent turmoil in the financial markets has been accompanied by daily volatility reaching unprecedented levels. (The chart DJIA Volatility illustrates the daily point volatility for the DJIA estimated from a simple GARCH model. n.b. This chart was prepared during the day, so the "current" levels indicated numerically do not represent the "end of day" levels.)

The level of volatility enters into our trading strategy in several places. Firstly, if we are risk averse, then our asset price change forecasts must be weighed against a risk metric when we decide whether or not to trade. Most likely this risk metric will scale in some way with the level of volatility. If we do not dynamically alter our risk metric to take account of the current levels of volatility then we will fail to maintain the same risk/reward ratio (or signal-to-noise ratio) in volatile times that we have in quiescent times. This will act to deteriorate the Sharpe Ratio of the trading strategy. To pick a gaudy metaphor, when one hears the noise of the waterfall ahead one should start to paddle less swiftly.

Secondly, if our forecasting procedure involves variables such as lagged returns, cross-sectional dispersion measures, implied volatilities, or similar factors, then our alpha itself will scale in some way with the level of volatility and so will become larger in magnitude during volatile times.

Canonical "Modern Portfolio Theory" explicitly specifies that the ideal portfolio should be linear in the product of the inverse of the covariance matrix into the vector of forecasts. This quantity, whether expressed in price change space or return space or some other manner, is not dimensionless (it has the dimension of quantity/forecast e.g. contracts/dollar) and will therefore scale inversely with the level of volatility.

So theory often tells risk averse traders to take some account of volatility when making their trade decision. However, in practice I've often found it difficult to show the actual benefit of such considerations as an empirical reality. But one problem with econometric analysis of financial markets is that the data does not do a good job of exploring the available ranges of empirically important variables. Interest rates, for example, can stay in a similar range for years. This, as we see from the DJIA chart referenced above, is also true for volatility.

Now, in stark contrast, volatility has broken into a wholly new region of phase space, and we can actually compare decisions made in times of radically high volatility with those made in quieter times. Of course, this analysis still has a temporal bias --- we only have one such region of high volatility, and during that time the markets fell dramatically --- so we must maintain caution about what we do with this dataset; nevertheless, we have a new volatility laboratory to work in.

Monday, September 15, 2008

Mispricing of Correlation Risk --- I Think it is Really that Simple

I'm not an option modeller or a mortgage expert. My knowledge of financial economics is mostly focussed around equity markets. So my understanding of how MBS and CDOs work comes essentially from what I've read in the papers.

Let's take an equity guy's look at an MBS and see how its alchemy is pulled off. We take a portfolio of risky securities (in this case the risk is that the mortgages go into default; for equity it would be the "common" risk we're familiar with --- systematic risk and idiosyncratic, or residual, risk). If we hold the portfolio as a simple group of assets we get the "portfolio effect." That is, the standard deviation per dollar of the portfolio is less than the sum of the per-dollar standard deviations of the assets individually, because the idiosyncratic value fluctuations are not correlated and so sometimes they cancel each other out. This is the statistician's friend, the Law of Large Numbers, working some real, genuine magic for us. Of course, if we have stocks that are dominated by market risk, i.e. stocks whose returns are highly correlated with each other, then we don't get much diversification value; on the other hand, for uncorrelated or anticorrelated stocks we get a big effect.

One can state the "value" of the diversification effect as: St.Dev.(Portfolio) - Sum St.Dev.(Assets).

Now let's take our portfolio and put it in a trust. However, we write the trust documents so that the trust's assets are transferred back to several portfolios at the end of a fixed period, not just the single original portfolio. We set up these portfolios by ranking all the constituent assets by their total return at the end of the period and then giving the top quintile to the A portfolio, the second quintile to the B portfolio, the third to the C portfolio, and so on.

Clearly for "normal stocks" the present value of the "A" portfolio is much higher than that of the "E" portfolio and we've now worked the MBS magic on an equity portfolio. We should be able to sell the "A" portfolio rights for much more money than the "E" portfolio rights. This value comes from the fact that we have written a trust document that allows us to adjust the portfolio constituents after the relative returns are known.

This is essentially what's done with an MBS. A portfolio of mortgages is put in a trust and the trust documents are written so that the "AAA" tranche gets the payments from the mortgages that default last and the "equity" tranche gets the payments from the mortgages that default first.

Going back to our equity trust, look at the spread in value between the "A" portfolio and the "E" portfolio in the circumstance that the equity returns are all completely correlated. Clearly, in that circumstance all the stocks have exactly the same return, whatever that turns out to be, and there is no difference in any of the constituent returns and so:

value(A) - value(E) = 0 when common correlation = 100%.

The other end of the spectrum is when the stocks are uncorrelated (there are mathematical restrictions on the number of stocks that can share a common anti-correlation, for example three stocks cannot all be perfectly anti-correlated with each other, so we won't consider that case). In this case the dispersion between the constituent asset returns is maximized and so:

value(A) - value(E) = maximum when common correlation = 0%.

So, although I'm not going to work out the precise form here (that is going to be a function of how we actually specify our returns model), it seems clear that the "tranching value" will turn out to be a decreasing function of the degree of common correlation.
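
A rough Monte Carlo sketch of this claim, under an assumed one-factor Gaussian returns model (all of the parameter choices here are mine, purely for illustration):

    import numpy as np

    def tranche_spread(common_corr, n_assets=100, n_trials=20000, seed=0):
        """Average spread between the top-quintile ('A') and bottom-quintile ('E')
        portfolio returns when every pair of assets shares one common correlation.
        Returns are drawn from a one-factor Gaussian model."""
        rng = np.random.default_rng(seed)
        common = rng.standard_normal((n_trials, 1))
        idio = rng.standard_normal((n_trials, n_assets))
        returns = np.sqrt(common_corr) * common + np.sqrt(1.0 - common_corr) * idio
        ranked = np.sort(returns, axis=1)
        q = n_assets // 5
        top = ranked[:, -q:].mean(axis=1)    # the 'A' portfolio
        bottom = ranked[:, :q].mean(axis=1)  # the 'E' portfolio
        return (top - bottom).mean()

    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(rho, round(tranche_spread(rho), 3))  # the spread falls as rho rises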

I claim that the same will also be true for an MBS. On the assumption that default is purely idiosyncratic, there is value to the reordering of cash flows done within an MBS or CDO trust. On the assumption that default is purely synchronous, there is no value whatsoever to such a device. In general, the premium for an MBS "AAA" tranche above the "equity" tranche is a decreasing function of the degree of synchronous default.

It was previously claimed (by market participants) that the default of a mortgagee was a highly idiosyncratic event, contingent on personal circumstances such as the loss of a job or a death in the family. So the idea that everybody could default at the same time was not worth considering, and the price of this correlation, expressed as a discount to the "normal" premium of the "AAA" tranche over the "equity" tranche, was zero. But a gigantic, worldwide property bubble gave us a situation in which there would be a lot of systematic defaults, and this correlation discount has been repriced dramatically. Hence the falling apart of the MBS/CDO markets.

I believe it is really that simple.

Wednesday, September 10, 2008

An Aside - Ad-hoc Trades and the Take Profits Algorithm

Everybody makes ad-hoc trades; even the most rigorously algorithmic traders sometimes just pick up the mouse and click themselves into a position. If we're right, or if we're lucky, this starts as a good idea and generates a gain. However, although many people feel they have good insight about when to buy (or sell) --- following an earnings surprise, for example --- it's my observation that the decision to close a position out and take profits, or limit a loss, is a lot harder to make successfully. And this applies to myself as much as others.

So what happens is that the initial information fades and the trade turns against you, but you sit on the position waiting for it to come back. Everybody does this when they "punt" stocks; it's something about how the human brain processes decisions.

I'm going to describe here something that is at complete variance with the statistical-analytical trading methods I use for my normal business. But I find that it helps. I use automated electronic trading to manage my systems, and I apply the following method to get me out of ad-hoc trades, a system I call the take profits algorithm. It's a simple idea that doesn't really need a computer to be operated (although that does make it emotionally easier to deal with); you could use stop orders to do some of this.

Essentially, we're going to resign ourselves to not taking all of the profits nominally available to us. We're going to leave some profits "on the table," as discretionary traders would put it. I would claim that the profits we leave on the table, the opportunity cost of our trading, are the risk premium we pay out in return for our aversion to losing our gains.

I implement an algorithm in which, whenever a trade is profitable, we take a profit. Specifically, if we have a gross gain of G on a position, we cut the position to a fraction of the initial position. I chose the fraction 1/(1+G)^2, but that is essentially an arbitrary choice. The important point is that we take some profits, and the more profit there is the more of it we take. This means that if a stock goes straight up we can never capture all of the gains it makes, because we will book profits on the way up. The chart "Effect of Take Profit Algorithm" shows how much you are theoretically giving up.

Sometimes, of course, we don't get it right and the stock we bought sinks instead of rising; or we bought when it was 10% up on the earnings surprise but it settles to 5% up, so we actually take a 5% loss. Whenever the trade is losing I apply a different algorithm. This one is based on holding time, because I'm all for giving the trade a little time to turn around (especially since academic research indicates that the initial pop on an earnings surprise generally underestimates the final net response to the news). So I cut the position to a fraction exp(-d/5) of the initial position, where d is the number of days since the trade entry. Again, the exact formula is arbitrary, but the idea is that the longer it has been since the initial trade the more you should take off. Essentially what we're saying is that if it has been a long time since the initial trade idea we have to accept the fact that we're probably wrong.

The final step is that when we have adjusted our position, we "reset the clock" and treat the new position as a new trade. (I also make all my decisions on a beta adjusted basis relative to the S&P 500 benchmark, because ad hoc trades are about residual returns not systematic risk.)
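
Putting those rules together, here is a minimal sketch of the algorithm as described (position sizing, order placement, and the beta adjustment are all left out; the two cut-back formulas are the ones given above):

    import math

    def take_profits_fraction(gross_gain, days_held):
        """Fraction of the current ad-hoc position to keep.

        gross_gain : fractional gain on the position since the last reset
                     (e.g. 0.10 for +10%); negative when the trade is losing.
        days_held  : days since the trade was entered or last reset.
        """
        if gross_gain > 0:
            return 1.0 / (1.0 + gross_gain) ** 2  # take more profit the bigger the gain
        return math.exp(-days_held / 5.0)         # losing trades are decayed away with time

    # After cutting the position, reset the clock: treat what remains as a new
    # trade, with days_held set to zero and gains measured from the new level.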

I apply this algorithm automatically to all the ad-hoc trades I do. I have a completely automated environment, so the algorithm just places orders for me; I don't have to work through the rules for each trade every day.

For example, at the end of July this year I decided to take a punt on Lehman Brothers Inc. (I'm a client, and have been for years, and I also know people who work there.) I thought that the housing/credit crisis had got through the worst and that things were going to look up from now on. I bought 2,500 shares of LEH for my personal account at $15.90.

Today LEH is trading at $7.98. Although the financial stocks initially turned up, it seems the market had underestimated the extent of the problems at Lehman, and the loss from entry to date would have been around 50%. However, the TPA got me out at a profit. I'm pretty certain I would have been caught up in the euphoria of my gains and not closed out until it was too late. My actual trading activity is illustrated in the table "Trading in Lehman Brothers."

Monday, August 11, 2008

About the Model Portfolios

I've posted three model portfolios. Two of these (the eurodollar portfolio and the long-short version of the CMP) represent actual positions I hold. The last (long-only CMP) represents the equivalent portfolio when positions in QQQQ are replaced with QID. This data is updated every day and is valid as of the timestamps embedded in the file.

Wednesday, August 6, 2008

Profit Maximizers, Risk Averse Traders and Separable Models

In trying to predict a market, there are essentially three classes of forecasting model we can attempt to build: directional models, which forecast only the future direction of the market; point models, which forecast a conditional mean of the future distribution of returns for some horizon from any given decision point; and, distributional models that attempt to describe the entire conditional probability density function (or maybe the lower moments of it) from which the next period(s) realized return will be drawn.

(Note, in the following we assume that the forecasting models are all actually valid out-of-sample.)

In the first circumstance, we do not have much sophistication. The actual return that occurs on any given day can be modelled as an i.i.d. random number plus a constant times our directional forecast. That constant is the mean return conditioned on our forecast indicator (+1, 0, -1).
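
In symbols, the directional model amounts to

    r_t = \mu \, s_t + \varepsilon_t,
    \qquad
    s_t \in \{+1,\, 0,\, -1\},

where mu is the mean return conditioned on the indicator and epsilon_t is the i.i.d. noise term.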

Now, if that mean return is less than the costs of trading (slippage, brokerage fees, etc.) then a rational profit maximizer would not ever trade. If it exceeded the costs of trading then a rational profit maximizer would always trade in the direction of the forecast.

With the second style of forecast, a point forecast which changes dynamically, then a rational profit maximizer could compare the forecast on any given day with his knowledge of transaction costs (slippage as a function of position size, fixed trading fees etc.) and decide dynamically whether to trade on any given forecast. Since this makes a conditional decision, which should take the trader out of the market on some occasions when the first scenario would have us in the market, we expect that this strategy would perform better than the first (it may fail to for a particular realization). With a point forecast a profit maximizer would trade only on a forecast with a net positive expectation (i.e. forecast return exceeds all trading costs) and they will trade on every forecast with a net positive expectation.

However, we are not pure profit maximizers. Most of us are risk averse to some extent, and few of us would go out and buy 10,000 S&P futures contracts if our net expected profit were just $1 on the whole trade (to give a cartoon example). A risk averse trader, which describes anybody who seeks to maximize a risk-adjusted return metric such as the Sharpe Ratio, does not do every trade that has a net positive expectation. They choose to veto the set of trades in which the expected profit does not pay them sufficiently for the risk they are taking in entering the trade.
This is equivalent to adding an additional cost to the trade: I will trade when my point forecast exceeds my trading costs plus my risk penalty.
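
As a sketch, the risk-averse decision rule amounts to something like the following; the linear form of the risk penalty is my assumption, and any sensible monotone function of the risk estimate would serve:

    def should_trade(forecast, trading_cost, risk_estimate, risk_aversion):
        """Trade only when the point forecast clears both the trading costs and a
        risk penalty. A pure profit maximizer corresponds to risk_aversion = 0."""
        risk_penalty = risk_aversion * risk_estimate
        return abs(forecast) > trading_cost + risk_penalty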

Thus a risk averse trader will outperform a profit maximizing trader in expectation on a risk adjusted basis. This is because they will not do some of the trades that the profit maximizer would enter.

Perhaps surprisingly, we should take all of this into account when choosing our basic modelling paradigm. The reason is the Law of Large Numbers, which tells us that when we make a measurement with a large sample size we get a more accurate estimate of the true value of a parameter than when we make a measurement with a small sample size.

When we build a trading system that is optimized by examining a backtest of the trades, we are dealing with a sub-sample of the available data --- the data for the trades actually entered. When we build a forecasting system that is optimized by following a statistical estimation procedure on the returns data, we are dealing with the full dataset available. Thus the accuracy of our parameter estimates is, by the Law of Large Numbers, as high as possible. Furthermore, we are not embedding assumptions about our risk aversion, transaction costs, and so on into our estimation procedure.

Defining Some Terms

When re-reading the first post I saw that I'd used the term "trading strategy" and so feel a need to define this as I am using it.

Quantitative trading generally means using quantitatively derived information to trade. This can be thought of as a three-stage system. The first stage is using quantitative methods to forecast alphas, or asset-specific (i.e. idiosyncratic) returns. I view most alphas as stochastic (i.e. random) with a zero mean longitudinally, but that does not mean that they cannot be conditionally forecast.

Once one has a set of forecasts one has to decide when to trade. This is what I mean by trading strategy: given private or semi-private forecasts of asset returns, knowledge of trading costs, and forecasts of risk, how do you combine this information to produce a decision to trade?

The third element is how much capital to commit to a given trade. This, you would call risk management.

The nice thing about making this division is that it makes it easier to work and easier to evaluate one's work. One could call a trading system "separable" if its analysis can be cleanly divided in this way (rather in the way a partial differential equation is separable if f(x,y,z) can be written as X(x)Y(y)Z(z), for example).

The job of forecasting, or alpha generation, is a cleanly defined piece of statistical analysis: viz, to construct a forecasting system that is consistently reliable out-of-sample, meaning when used on data not used to develop it. This is unambiguously a piece of science.

The job of trading strategy is a cleanly defined piece of mathematical logic. Given a forecast set, when should one trade? This is applied mathematics, nothing more nor less. We have no need of backtesting if our forecasts are good and our logic is correct.

The job of risk management is fuzzier, as this is the point at which economic theory enters the picture. Given a trade decision and a risk estimate, how much should I invest relative to my capital?

Note how the paradigm described above differs from what one would call a "technical trading" system, which is a black box that takes in market data and outputs trades, based on parameters that are optimized through backtesting. Of course, that method can also work.

Tuesday, August 5, 2008

Identity and Agenda

I am a quantitative trader. I sometimes call myself a statistician, although I have no formal statistical training. However, I have considerable informal statistical training, acquired while completing my doctorate in experimental elementary particle physics at Oxford. For my professional career I have applied this empirical knowledge, and some theoretical skills, to the financial markets.

I used to work in the Process Driven Trading (PDT) group at Morgan Stanley. One of the things I did there was develop a formal mathematical description of the trading strategy used as part of their "Stat. Arb." quantitative trading system. I also managed futures trading which, overall, was not successful. PDT were great at relative value trading, but futures require a different focus, on outright risk taking, and I feel the two didn't mesh very well.

In 1999 I got married, and in 2000 I left PDT. I set up a commodity trading advisory (CTA) firm and, later, a registered exempt commodity pool operator (CPO). I abandoned my futures trading style from Morgan Stanley and created an entirely new business, albeit trading the same contracts -- three month eurodollar futures. This was a much more successful business generating returns, for its partners, of approximately 30% per annum from 2000 to 2003. I closed that business for personal reasons, and have been managing a private family investment fund since then.


I learned a lot working at Morgan Stanley, but I learned much much more investing my own capital. I have always tried to think carefully, and more importantly analytically, about my activities in the markets. Over the years I have developed some interesting models for financial data, and it is my intention to use this forum to publish some of this information.

I don't believe markets are efficient, but I do believe they are nearly so. I will publish some information on methods, some on particular forecasting systems, and some on general items of interest. I do hold positions in the markets and will always disclose them.

I am going to start with something concrete: a stock selection strategy I call the Compact Model Portfolio.