Monday, December 3, 2012

Using PCA for spread trading

Classical pairs trading usually involves building a pair consisting of two legs, which ideally should be market-neutral or in other words, pair returns should have zero correlation with market returns. The process of building a 'good' pair is pretty standard. A typical way of building a pair (spread) involve choosing two correlated securities and forming a market-neutral pair using stock betas.

Multi-legged spreads are more advanced and very difficult to build using the traditional method.
However, there is a mathematical method called Principal Component Analysis that can be easily used to create stable (=tradeable?) spreads. All the linear algebra is luckily hidden inside the princomp function, but if you'd like to understand how PCA really works, take a look at this tutorial. The transformed data can be described as : 1-st component: 'max volatility portfolio', which is usually very highly correlated with the market. 2-nd component: 'market-neutral' portfolio, having maximum variance. 3-d and further components have decreasing degrees of variance. Note that by design, PCA produces orthogonal components, meaning that all portfolios are not correlated to each other. So 2nd and further portfolios are market-neutral.

Here is an example of applying PCA on some correlated etfs in the energy sector:
The upper chart shows raw prices, the lower char are the cumulative returns of principal components. To compute the principal components I only used first 250 days of data. It seems that the principal components, which are linear combinations of each security returns are quite stable out-of-sample, which is a pleasant surprise. First (blue) component has most of the variance, and it is clearly correlated to the movement of the prices in the upper chart.

Let's take a closer look at the last two components: these seem to be quite stable and tradeable even far out-of-sample.