Monday, December 3, 2012

Using PCA for spread trading

Classical pairs trading usually involves building a pair of two legs that is ideally market-neutral; in other words, the pair's returns should have zero correlation with market returns. The process of building a 'good' pair is fairly standard: a typical approach involves choosing two correlated securities and forming a market-neutral pair (spread) using the stock betas.
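The beta-based construction described above can be sketched roughly as follows. This is only an illustration on synthetic data: the helper names (beta, market_neutral_weights) and the simulated return series are assumptions made for the example, not something taken from the original analysis.

```python
import numpy as np

def beta(asset_returns, market_returns):
    """Slope of a least-squares regression of asset returns on market returns."""
    cov = np.cov(asset_returns, market_returns)
    return cov[0, 1] / cov[1, 1]

def market_neutral_weights(beta_a, beta_b):
    """Long 1 unit of A and short beta_a / beta_b units of B, so the net beta is zero."""
    return np.array([1.0, -beta_a / beta_b])

# Synthetic daily returns (hypothetical data): two stocks driven by one market factor.
rng = np.random.default_rng(0)
market = rng.normal(0, 0.01, 500)
ret_a = 1.2 * market + rng.normal(0, 0.005, 500)
ret_b = 0.8 * market + rng.normal(0, 0.005, 500)

beta_a = beta(ret_a, market)
beta_b = beta(ret_b, market)
weights = market_neutral_weights(beta_a, beta_b)

# Returns of the two-legged spread and its residual market exposure.
pair_returns = weights[0] * ret_a + weights[1] * ret_b
pair_beta = beta(pair_returns, market)
print("weights:", weights, "pair beta:", pair_beta)
```

The pair beta is zero only over the window used to estimate the betas; on new data it will drift as the betas change.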

Multi-legged spreads are more advanced and very difficult to build using the traditional method.
However, there is a mathematical method called Principal Component Analysis (PCA) that can easily be used to create stable (= tradeable?) spreads. All the linear algebra is luckily hidden inside the princomp function, but if you'd like to understand how PCA really works, take a look at this tutorial. The transformed data can be described as:

1st component: 'max volatility' portfolio, which is usually very highly correlated with the market.
2nd component: 'market-neutral' portfolio with maximum variance.
3rd and further components: portfolios with decreasing variance.

Note that by design, PCA produces orthogonal components, meaning that the portfolios are uncorrelated with each other over the sample used to fit them. Since the 1st component usually tracks the market, the 2nd and further portfolios are approximately market-neutral.
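For readers who want to see what a princomp-style routine does, here is a minimal sketch of PCA on a returns matrix using an eigendecomposition of the covariance matrix in NumPy. It is a stand-in for princomp rather than that function itself, and the function name pca_portfolios and the synthetic four-asset data are assumptions made for illustration.

```python
import numpy as np

def pca_portfolios(returns):
    """PCA of a (days x assets) returns matrix.

    Returns (weights, variances): column k of `weights` holds the asset weights
    of the k-th component portfolio, sorted by descending variance.
    """
    centered = returns - returns.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    return eigvecs[:, order], eigvals[order]

# Synthetic data (hypothetical): four assets sharing one market factor.
rng = np.random.default_rng(1)
market = rng.normal(0, 0.01, (1000, 1))
loadings = np.array([[1.0, 0.9, 1.1, 0.8]])
returns = market * loadings + rng.normal(0, 0.003, (1000, 4))

weights, variances = pca_portfolios(returns)
component_returns = (returns - returns.mean(axis=0)) @ weights

# In-sample correlation matrix of the component portfolios (close to identity).
corr = np.corrcoef(component_returns, rowvar=False)
# Correlation of the 1st component with the market factor (sign is arbitrary).
first_vs_market = np.corrcoef(component_returns[:, 0], market[:, 0])[0, 1]
print(variances, abs(first_vs_market))
```

The sign of each eigenvector is arbitrary, so a component may come out as the negative of the portfolio one would naturally trade; flipping all of its weights gives the same spread.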

Here is an example of applying PCA to some correlated ETFs in the energy sector:
The upper chart shows raw prices; the lower chart shows the cumulative returns of the principal components. To compute the principal components I used only the first 250 days of data. It seems that the principal components, which are linear combinations of the securities' returns, are quite stable out-of-sample, which is a pleasant surprise. The first (blue) component has most of the variance, and it is clearly correlated with the movement of the prices in the upper chart.
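The in-sample fit and out-of-sample projection described above might look like the sketch below. It uses synthetic returns in place of the ETF data, and it assumes that the component weights and the mean are estimated on the first 250 days and then held fixed; the names fit_pca and n_train are hypothetical.

```python
import numpy as np

def fit_pca(returns):
    """Estimate the mean and component weights from a (days x assets) returns matrix."""
    mean = returns.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(returns - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1]        # descending variance
    return mean, eigvecs[:, order]

# Synthetic stand-in for the ETF returns (hypothetical data).
rng = np.random.default_rng(2)
n_days, n_assets, n_train = 750, 4, 250
market = rng.normal(0, 0.01, (n_days, 1))
returns = market * np.array([[1.0, 0.9, 1.1, 0.8]]) \
    + rng.normal(0, 0.003, (n_days, n_assets))

# Fit on the first 250 days only, then apply the fixed weights to the full history.
mean, weights = fit_pca(returns[:n_train])
component_returns = (returns - mean) @ weights
cumulative = component_returns.cumsum(axis=0)    # what the lower chart would plot

# Out-of-sample check: the components are no longer exactly uncorrelated,
# but the correlations should stay small if the weights are stable.
oos = component_returns[n_train:]
oos_corr = np.corrcoef(oos, rowvar=False)
oos_var = oos.var(axis=0)
print(np.round(oos_corr, 2))
```

If the off-diagonal entries of oos_corr grow large on real data, the weights estimated in-sample are no longer describing the same portfolios and should be re-estimated.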

Let's take a closer look at the last two components: these seem to be quite stable and tradeable even far out-of-sample.


  1. A (better) human trader is flexible enough to change his view when the facts change. A model cannot do that because it assumes the PCA analysis (or any kind of analysis) is 100% correct.

    John Methew
    Options Trading

    1. True, but a trader is a human being and as such is exposed to changing views, even if the facts do not really change. This is because associations, feelings, wrong conclusions and so on are involved in the decision. To make it short: there is no free lunch. Each concept comes at a cost. Flexibility is not free.

  2. Hi Jev,

    How would you trade market-neutral spreads? I guess you would choose the second one (it explains the most variance while being market-neutral) and trade it either as a trend or as mean reversion, e.g. using RSI or MAs. Or?

    1. Z-score, Bollinger bands, RSI: just pick one :). The only thing that is important is that the spread is mean-reverting.

  3. Jev, have you compared this approach with cointegration, e.g. the Johansen test on a multi-asset portfolio? Are PCA-based spreads more stable? In the article you focus on the 3rd and 4th PCs; why not the 2nd?

    Thanks, Jozef

  4. Hi Jev,
    Can you explain how the PCA components 3 and 4 relate to the actual security pairs in this example?

    I assume the blue or 1st PCA relates to XLE and XOP but I was not sure.


  5. This is a very interesting idea. Maybe it's a way to find non-correlated "composite" asset classes to trade other than the "standard" ones (equities, commodities, bonds, etc).
    Is mean reversion the best way to trade these spreads? If yes, why? I tried RSI OB/OS on PCA components of 4 oil country ETFs and the components do not seem to be extremely mean reverting, at least not short term (i.e. I had to use RSI(10+) rather than a short term RSI(2)).

  6. I am sorry, but pair trading requires correlated tradeable assets.