Sunday, December 30, 2012

Is pairs trading dead?

Bad news, everybody: according to my calculations (which I sincerely hope are incorrect), classical pairs trading is dead. Some people would strongly disagree, but here is what I found:

Let's take a hypothetical strategy that works on a basket of ETFs:
['SPY','XLY','XLE','XLF','XLI','XLB','XLK','IWM','QQQ','DIA']
From these ten ETFs, 90 pairs can be made if A/B and B/A are counted separately (10 × 9), or 45 if each combination is counted once. Each pair is constructed as a market-neutral spread.
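As a quick sanity check on the pair count, the standard library can enumerate the pairs. This is only a counting sketch: how the original backtest ordered or constructed its pairs is not stated, so treating A/B and B/A as distinct pairs is an assumption.

```python
from itertools import combinations, permutations

etfs = ['SPY', 'XLY', 'XLE', 'XLF', 'XLI', 'XLB', 'XLK', 'IWM', 'QQQ', 'DIA']

# Unordered pairs: (SPY, XLY) and (XLY, SPY) count once.
unordered = list(combinations(etfs, 2))
# Ordered pairs: the two directions count separately.
ordered = list(permutations(etfs, 2))

print(len(unordered), len(ordered))  # 45 90
```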

Strategy rules:
On each day, for each pair, calculate the z-score based on the 25-day standard deviation.
If z-score > threshold, go short and close the next day.
If z-score < -threshold, go long and close the next day.
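The rules above could be sketched roughly as follows for a single spread. This is a dependency-free illustration, not the code behind the results in this post. Several things are assumptions: the z-score is taken as the spread level minus its trailing 25-day mean, divided by the trailing 25-day sample standard deviation (the window includes the current day); `zscore_positions` and `one_day_pnl` are hypothetical helper names; and the threshold is left as a parameter because no single value is given.

```python
from statistics import mean, stdev

def zscore_positions(spread, threshold, window=25):
    """Position held from day t to day t+1: -1 short, +1 long, 0 flat."""
    positions = [0] * len(spread)
    for t in range(window - 1, len(spread)):
        hist = spread[t - window + 1 : t + 1]  # trailing window, includes day t
        sd = stdev(hist)
        if sd == 0:
            continue  # flat spread, z-score undefined
        z = (spread[t] - mean(hist)) / sd
        if z > threshold:
            positions[t] = -1  # spread stretched upwards: go short
        elif z < -threshold:
            positions[t] = 1   # spread stretched downwards: go long
    return positions

def one_day_pnl(spread, positions):
    """P&L of each position, closed on the next day. No costs, no sizing."""
    return [positions[t] * (spread[t + 1] - spread[t])
            for t in range(len(spread) - 1)]
```

Summing `one_day_pnl` over all pairs on each day would give an unweighted portfolio curve in the spirit of the description, again without capital management or transaction costs.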

To keep it simple, the calculation is done without any capital management (one can have up to 90 pairs in the portfolio on each day). Transaction costs are not taken into account either.

To put it simply, this strategy tracks the one-day mean-reverting nature of market-neutral spreads.
Here are the results simulated for several thresholds:


No matter what threshold is used, the strategy is highly profitable in 2008, pretty good through 2009 and completely worthless from early 2010.
This is not the first time I have come across this change in mean-reverting behavior in ETFs. No matter what I've tried, I had no luck in finding a pairs trading strategy that would work on ETFs past 2010. My conclusion is that these types of simple stat-arb models just don't cut it any more.

PCA - how it really works

I suppose that my previous post did not provide insights on how PCA really works. Here is another try at the subject, using a simple pair as an example.
Let's take SPY and IWM, which are highly correlated. If daily returns of IWM are plotted against daily returns of SPY, the relationship is highly linear (see left chart).
Applying PCA to this data gives two principal component vectors, plotted in red (first) and green (second). These two vectors are orthogonal, with the first one pointing in the direction of highest variance. The transformed data is nothing more than the original data projected onto the new coordinate axes formed by these two vectors. It is shown in the right chart. As you can clearly see, all points are still there, but the dataset is rotated.
In this case the second vector is -0.78 SPY + 0.62 IWM, which produces a market-neutral spread. Of course, the same result would be achieved by using the beta of IWM.
The fun thing about PCA is that it is useful in building spreads with three or more legs. The procedure is exactly the same as above, but the transformation is done in a higher-dimensional space.
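For the two-asset case the rotation can be written out by hand, which makes the geometry concrete. The sketch below is an illustration under stated assumptions: `pca_2d` is a hypothetical helper, it uses the closed-form principal-axis angle of a 2x2 covariance matrix rather than a library PCA routine, and the return numbers are made up (they are not SPY or IWM data). The sign of each vector is arbitrary, as with any PCA.

```python
import math

def pca_2d(x, y):
    """Principal axes of two return series, as unit vectors (pc1, pc2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # Angle of the direction of highest variance for a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    pc1 = (math.cos(theta), math.sin(theta))   # highest-variance direction
    pc2 = (-math.sin(theta), math.cos(theta))  # orthogonal to pc1
    return pc1, pc2

# Made-up daily returns for two correlated instruments.
spy = [0.010, -0.020, 0.015, -0.005, 0.020, -0.010]
iwm = [0.013, -0.024, 0.017, -0.008, 0.026, -0.011]

pc1, pc2 = pca_2d(spy, iwm)
# Spread returns: the data projected onto the second component.
spread = [pc2[0] * s + pc2[1] * i for s, i in zip(spy, iwm)]
```

If one series were an exact multiple of the other, `pc2` would be proportional to (-multiple, 1) and the spread would be identically zero, which is the link to the beta-based construction.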

Monday, December 3, 2012

Using PCA for spread trading

Classical pairs trading usually involves building a pair consisting of two legs, which ideally should be market-neutral; in other words, the pair's returns should have zero correlation with market returns. The process of building a 'good' pair is pretty standard: it typically involves choosing two correlated securities and forming a market-neutral pair (spread) using stock betas.
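One common way to do the beta-based construction is sketched below. It is an assumption about what 'using stock betas' means here: each leg's beta is estimated against a market return series, and the second leg is scaled by the ratio of the two betas. `beta` and `market_neutral_spread` are hypothetical helper names.

```python
def beta(asset, market):
    """OLS slope of asset returns on market returns: cov(asset, market) / var(market)."""
    n = len(market)
    mean_a, mean_m = sum(asset) / n, sum(market) / n
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset, market))
    var = sum((m - mean_m) ** 2 for m in market)
    return cov / var

def market_neutral_spread(ret_a, ret_b, market):
    """Long A, short B scaled so that the spread's estimated beta is zero."""
    hedge = beta(ret_a, market) / beta(ret_b, market)
    return [a - hedge * b for a, b in zip(ret_a, ret_b)]
```

By construction the spread has zero estimated beta over the sample used for the fit. Whether it stays market-neutral afterwards depends on how stable the two betas are.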

Multi-legged spreads are more advanced and very difficult to build using the traditional method.
However, there is a mathematical method called Principal Component Analysis (PCA) that can easily be used to create stable (= tradeable?) spreads. All the linear algebra is luckily hidden inside the princomp function, but if you'd like to understand how PCA really works, take a look at this tutorial.
The transformed data can be described as:
1st component: 'max volatility' portfolio, which is usually very highly correlated with the market.
2nd component: 'market-neutral' portfolio with the maximum remaining variance.
3rd and further components: portfolios with decreasing variance.
Note that by design PCA produces orthogonal components, meaning that the portfolios are uncorrelated with each other over the data used for the fit. So, to the extent that the first component tracks the market, the 2nd and further portfolios are market-neutral.

Here is an example of applying PCA to some correlated ETFs in the energy sector:
The upper chart shows raw prices; the lower chart shows the cumulative returns of the principal components. To compute the principal components I used only the first 250 days of data. It seems that the principal components, which are linear combinations of the securities' returns, are quite stable out-of-sample, which is a pleasant surprise. The first (blue) component has most of the variance, and it is clearly correlated with the movement of the prices in the upper chart.
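The fit-on-250-days, apply-to-everything procedure can be sketched with NumPy as below. This is an illustration, not the code behind the charts: a plain eigen-decomposition of the covariance matrix stands in for princomp, `fit_pca` and `project` are hypothetical helper names, and the returns are synthetic (one common factor plus noise), not the energy ETF data.

```python
import numpy as np

def fit_pca(returns):
    """returns: array (n_days, n_assets). Components are columns, sorted by variance."""
    mu = returns.mean(axis=0)
    cov = np.cov(returns - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    return mu, eigvecs[:, order], eigvals[order]

def project(returns, mu, components):
    """Daily returns of each principal-component portfolio."""
    return (returns - mu) @ components

# Synthetic data: a shared 'market' factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
n_days, n_assets, n_fit = 500, 4, 250
market = rng.normal(0.0, 0.01, size=(n_days, 1))
returns = market + rng.normal(0.0, 0.003, size=(n_days, n_assets))

# Fit on the first 250 days only, then apply the same weights to all days.
mu, components, variances = fit_pca(returns[:n_fit])
in_sample = project(returns[:n_fit], mu, components)
out_of_sample = project(returns[n_fit:], mu, components)
cumulative = np.cumsum(project(returns, mu, components), axis=0)
```

The component portfolios are uncorrelated only on the fitting window. Comparing `np.corrcoef(out_of_sample, rowvar=False)` with the in-sample version is one simple way to see how well the weights hold up out-of-sample.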

Let's take a closer look at the last two components: these seem to be quite stable and tradeable even far out-of-sample.