
Sunday, December 30, 2012

Is pairs trading dead?

Bad news, everybody: according to my calculations (which I sincerely hope are incorrect), classical pairs trading is dead. Some people would strongly disagree, but here is what I found.

Let's take a hypothetical strategy that works on a basket of ETFs:
['SPY','XLY','XLE','XLF','XLI','XLB','XLK','IWM','QQQ','DIA']
From these ETFs, 90 ordered pairs can be made. Each pair is constructed as a market-neutral spread.

Strategy rules:
  • On each day, for each pair, calculate the z-score based on a 25-day standard deviation.
  • If z-score > threshold, go short; close the next day.
  • If z-score < -threshold, go long; close the next day.

To keep things simple, the calculation is done without any capital management (one can have up to 90 pairs in the portfolio on any given day). Transaction costs are not taken into account either.

Put simply, this strategy tracks the one-day mean-reverting nature of market-neutral spreads.
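To make the rules concrete, here is a minimal sketch of the signal logic for a single pair. I'm assuming spread is a column vector of daily spread values and threshold is the z-score entry level (both names are illustrative; movmean/movstd need a reasonably recent Matlab):

lookback = 25;
mu    = movmean(spread, [lookback-1 0]);  % trailing 25-day mean
sigma = movstd(spread,  [lookback-1 0]);  % trailing 25-day st. deviation
z = (spread - mu) ./ sigma;

pos = zeros(size(z));
pos(z >  threshold) = -1;   % spread too high: short it, close next day
pos(z < -threshold) =  1;   % spread too low: long it, close next day

% one-day holding pnl: position taken today, closed tomorrow
pnl = pos(1:end-1) .* diff(spread);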
Here are the results simulated for several thresholds:


No matter what threshold is used, the strategy is highly profitable in 2008, pretty good through 2009, and completely worthless from early 2010 onwards.
This is not the first time I've come across this change in the mean-reverting behavior of ETFs. No matter what I've tried, I've had no luck finding a pairs trading strategy that works on ETFs past 2010. My conclusion is that these types of simple stat-arb models just don't cut it any more.

PCA - how it really works

It seems that my previous post did not provide much insight into how PCA really works. Here is another try at the subject, using a simple pair as an example.
Let's take SPY and IWM, which are highly correlated. If daily returns of IWM are plotted against daily returns of SPY, the relationship is highly linear (see left chart).
Applying PCA to this data gives two principal component vectors, plotted in red (first) and green (second). These two vectors are orthogonal, with the first one pointing in the direction of highest variance. The transformed data is nothing more than the original data projected onto the new coordinate axes formed by these two vectors; it is shown in the right chart. As you can clearly see, all points are still there, but the dataset is rotated.
In this case the second vector is -0.78 SPY + 0.62 IWM, which produces a market-neutral spread. Of course, the same result could be achieved by using the beta of IWM.
The fun thing about PCA is that it is useful for building spreads with three and more legs. The procedure is exactly the same as above, but the transformation is done in a higher-dimensional space.
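For the curious, here is a minimal sketch of the calculation, assuming ret is an N-by-2 matrix of daily [SPY, IWM] returns (variable names are illustrative; pca is called princomp in older Matlab versions):

[coeff, score] = pca(ret);   % columns of coeff are the principal vectors

w = coeff(:,2);              % second vector, e.g. [-0.78; 0.62] for [SPY, IWM]
spread = ret * w;            % daily returns of the market-neutral spread

% score is the original data expressed in the new coordinates:
% score(:,1) ~ the market component, score(:,2) matches spread above
% (up to centering: pca subtracts the column means first).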

Monday, December 3, 2012

Using PCA for spread trading

Classical pairs trading usually involves building a pair consisting of two legs, which ideally should be market-neutral, or in other words, the pair returns should have zero correlation with market returns. The process of building a 'good' pair is pretty standard: choose two correlated securities and form a market-neutral spread using the stock betas.

Multi-legged spreads are more advanced and very difficult to build using the traditional method.
However, there is a mathematical method called Principal Component Analysis (PCA) that can easily be used to create stable (= tradeable?) spreads. All the linear algebra is luckily hidden inside the princomp function, but if you'd like to understand how PCA really works, take a look at this tutorial. The transformed data can be described as follows:
  • 1st component: the 'maximum variance' portfolio, which is usually very highly correlated with the market.
  • 2nd component: a 'market-neutral' portfolio with the largest remaining variance.
  • 3rd and further components: portfolios with decreasing degrees of variance.
Note that by design PCA produces orthogonal components, meaning that the portfolios are uncorrelated with each other. So the 2nd and further portfolios are market-neutral.

Here is an example of applying PCA to some correlated ETFs in the energy sector:
The upper chart shows the raw prices; the lower chart shows the cumulative returns of the principal components. To compute the principal components I used only the first 250 days of data. It seems that the components, which are linear combinations of the individual security returns, are quite stable out-of-sample, which is a pleasant surprise. The first (blue) component carries most of the variance and is clearly correlated with the price movement in the upper chart.
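A sketch of this procedure, assuming ret is a T-by-N matrix of daily returns for the ETFs (names illustrative):

inSample = ret(1:250, :);        % fit the loadings on the first 250 days only
coeff = pca(inSample);           % princomp(inSample) in older Matlab versions

% project the FULL history onto the fixed in-sample loadings
components = ret * coeff;        % column k = daily returns of the k-th component
plot(cumsum(components));        % cumulative returns, in- and out-of-sample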

Let's take a closer look at the last two components: these seem to be quite stable and tradeable even far out-of-sample.


Thursday, September 27, 2012

Gap strategy with intraday data

The gap-fading strategy from the previous posts looked all right, but my worry is that Yahoo data does not provide accurate quotes. To check the strategy performance, I've generated a new OHLC dataset based on the weighted average price (wap) of 30-second intraday data: the opening quote is the wap of the first 30 seconds of trading and the close is the last 30-second wap. To make sure that my dataset is correct, I have compared it to the Yahoo quotes. As shown in the chart below, the difference between the two is ~5 ct, which seems very reasonable.
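A minimal sketch of the aggregation for one symbol, assuming t is a vector of bar timestamps (datenums) and wap the matching 30-second weighted average prices (names illustrative):

days = unique(floor(t));                  % calendar days present in the data
[op, hi, lo, cl] = deal(zeros(size(days)));
for k = 1:numel(days)
    bars  = wap(floor(t) == days(k));     % that day's bars, in time order
    op(k) = bars(1);                      % first 30-sec wap = open
    cl(k) = bars(end);                    % last 30-sec wap = close
    hi(k) = max(bars);
    lo(k) = min(bars);
end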
Now, testing the gap-fade strategy on the OHLC data that I generated myself produces much less favorable results:
One look at the pnl chart is enough to say that this strategy would be rubbish.
This brings me to a conclusion I was already aware of: Yahoo opening quotes are not suitable for strategy backtesting.

Thursday, September 20, 2012

Gap strategy revisited

At the beginning of 2011 I backtested a gap-fade strategy. There seemed to be an edge in fading gaps, so let's take a look at how this strategy has performed since then. Once again, the strategy rules:
  • Trade only gaps larger than 0.1%.
  • Enter on the open (short for an up-gap, long for a down-gap). The profit target is set at the previous day's close.
  • If the profit target is not reached during the day, exit on the close.
This time I corrected the data for dividends.
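A minimal sketch of these rules, assuming column vectors op, hi, lo, cl of dividend-adjusted daily SPY quotes (names illustrative):

o  = op(2:end); h = hi(2:end); l = lo(2:end); c = cl(2:end);
pc = cl(1:end-1);                 % previous close = profit target
gap = o ./ pc - 1;                % overnight gap

pnl = zeros(size(gap));
up = gap >  0.001;                % fade up-gaps: short at the open
dn = gap < -0.001;                % fade down-gaps: long at the open

upFill = up & (l <= pc);          % target touched intraday
pnl(upFill)       = 1 - pc(upFill) ./ o(upFill);            % short, exit at target
pnl(up & ~upFill) = 1 - c(up & ~upFill) ./ o(up & ~upFill); % short, exit on close

dnFill = dn & (h >= pc);
pnl(dnFill)       = pc(dnFill) ./ o(dnFill) - 1;            % long, exit at target
pnl(dn & ~dnFill) = c(dn & ~dnFill) ./ o(dn & ~dnFill) - 1; % long, exit on close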

The out-of-sample results are pretty good; the strategy kept doing well in 2011-2012.
A more realistic case includes a transaction cost of about 0.03%, which is approximately 3 ct for SPY: 1 ct is the IB commission, and another 2 ct are needed for crossing the bid-ask spread.
The Sharpe ratio of these strategies is still not solid enough for me to actually put my money on them.

-----Sharpe----
buyAndHold      0.189366
fadeUpGaps      0.508378
fadeDownGaps    0.595578
fadeAllGaps     0.783124
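For reference, a common way to annualize a Sharpe ratio from daily pnl (a sketch; I'm assuming the figures above follow this convention, with dailyPnl being the strategy's daily returns):

sharpe = sqrt(252) * mean(dailyPnl) / std(dailyPnl);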


... and I still keep wondering how there can be an edge when there seems to be no significant correlation between the overnight gap and the day-session change.

Wednesday, September 19, 2012

SPY opening gaps

There are quite some people in the blogosphere claiming that gap trading is statistically profitable. Just google 'opening gaps' or something similar to get a bunch of links. Some claims are quite interesting, stating a >70% chance of the gap closing after a 'gap up'. Well, I imagine that it is possible to have a 70% 'closed gap' statistic and still have zero edge in trading the gap. I looked into this topic about a year ago and the first results were promising.
This time I took a look at the matter in a slightly different way: looking for a correlation between the overnight change of SPY (previous close to open) and the daily change (open to close) of the following trading session.
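In code this boils down to a couple of lines (a sketch, assuming vectors op and cl of daily SPY quotes; names illustrative):

overnight  = op(2:end) ./ cl(1:end-1) - 1;  % previousClose-to-open return
daySession = cl(2:end) ./ op(2:end) - 1;    % open-to-close return

r = corrcoef(overnight, daySession);
r(1,2)                                      % ~0 on this dataset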
Below is a chart of the cumulative daily percentage changes of SPY for about 3 years of data. The blue line is the overnight change and the green line the day-session change. It is immediately clear that the day session is more volatile than the night session. Apart from that, nothing really special in this chart.
More insight comes from plotting the overnight return vs the daily return:
Judging by eye, there is no relation between the overnight change and the daily session. Testing for correlation between the two gives 0.000062 ... yes, zero. I'm not sure why my previous attempts at building a gap strategy produced positive results, but now I'm determined to get to the bottom of this...

Monday, March 26, 2012

Hidden cost of XIV

Two popular options for trading short-term volatility are the VXX and XIV ETNs. By design, they should provide exactly the same daily returns, but with opposite signs. Both should have similar yearly fees. However, I have noticed that XIV shows a consistent underperformance relative to VXX. Time for a quantitative investigation.
Imagine we traded a portfolio that is long VXX and long XIV, with both legs having the same capital, rebalancing at the close. In this case the daily returns of VXX and XIV should cancel each other out, and the cumulative pnl should be flat. But in reality, a long volatility position using short VXX outperforms long XIV by about 7% a year. Even after correcting for the borrowing cost of VXX, which is about 2.5%, there is still about a 5% difference!
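A sketch of this check, assuming retVXX and retXIV are vectors of daily close-to-close returns (names illustrative):

combined = 0.5 * retVXX + 0.5 * retXIV;   % equal capital, rebalanced daily
cumPnl = cumsum(combined);                % flat if the ETNs were exact opposites
plot(cumPnl);                             % in reality it drifts steadily down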
The green line represents the path that a long XIV+VXX portfolio has followed since its inception. To make a realistic estimate of the daily relative loss, I first filtered out the outliers. The red line shows an estimate of the daily cost of XIV relative to VXX.
Conclusion: playing short volatility using XIV will cost you about 0.028% per day relative to a short VXX position (borrowing cost not included).

P.S. I'm not sure there is money to be made by shorting both ETNs; daily rebalancing costs will most probably eat away all potential profits.

Update 22/07/2012: here is a chart of the estimated price of XIV, obtained by taking the direct daily inverse of VXX. All prices are normalized (starting at 100).


Friday, February 24, 2012

Add database functionality to Matlab with SQLite.

Matlab data structures are fine for most research work, but running a daily trading business is a different thing. A trader or an account manager often needs to maintain lists of trades, accounts, strategies, clients etc., which is relational data. An SQL database is ideal for such a task, but most solutions (like MySQL) are quite an overkill, as we don't need a concurrent multi-client server, data trees etc.
A solution comes in the form of SQLite, a serverless database engine that stores the whole database in a single file. It is widely used in everything from mobile phones to mainframes and runs on almost anything.
Quite some time ago I wrote a post on using SQLite in Matlab. As I am still a happy user, it is time for an update and a demo.
Installation: you only need to download and unzip the files from here: http://mksqlite.berlios.de/mksqlite_eng.html
(note: you'll probably need to compile from source on a 64bit system).

Demo: I have written a simple script to demonstrate how to keep a list of portfolio positions for three separate accounts.


%{

Copyright: Jev Kuznetsov 
License: BSD

demo of SQLite for portfolio management.

%}

% test sqlite
clear all;
clc;
mksqlite('open','test.db');

tables = mksqlite('show tables');
disp(tables);

%% create a new data table 
mksqlite('DROP TABLE IF EXISTS tbl_portfolios'); % IF EXISTS avoids an error on the first run
sql = 'CREATE TABLE tbl_portfolios ( id INTEGER PRIMARY KEY AUTOINCREMENT, accountName TEXT, symbol TEXT, position INTEGER)'; 
mksqlite(sql);

%% now add some random data 
symbols = {'ABC','DEF','GHI','XYZ','AAA','BBB','CCC','DDD'};
accounts = {'acct1','acct2','acct3'}; 

%mksqlite('PRAGMA synchronous=OFF'); % speed tweak, see sqlite doc
mksqlite('BEGIN'); % bundle multiple inserts  into one transaction, speed boost!
tic
for i=1:100
    symbol = symbols{ceil(length(symbols)*rand)}; % pick a random symbol from symbols
    account = accounts{ceil(length(accounts)*rand)}; % same for account
    position = ceil(1000*rand); 
    
    fprintf('adding account: %s symbol: %s position: %i \n', account, symbol, position);
    
    % first, check if symbol is already in portfolio
    res = mksqlite(sprintf('SELECT id FROM tbl_portfolios WHERE accountName="%s" AND symbol="%s"',account,symbol));
    if isempty(res)
      fprintf('Adding symbol \n');
      mksqlite(sprintf('INSERT INTO tbl_portfolios (accountName, symbol, position) VALUES ("%s","%s",%i)',account,symbol,position));
    else
      fprintf('Updating symbol \n');
      mksqlite(sprintf('UPDATE tbl_portfolios SET position=%i WHERE id=%i',position,res.id));
    end
    
end
%mksqlite('PRAGMA synchronous=NORMAL');
mksqlite('END');

toc

%% now pull the data from database

fprintf('\nGetting data from database\n');

res = mksqlite('SELECT * FROM tbl_portfolios ORDER BY accountName ASC');
fprintf('Account\tSymbol\tposition\n-----------------------\n');
for i=1:length(res)
  fprintf('%s\t%s\t%i\n', res(i).accountName, res(i).symbol,res(i).position);
end
  
%% try some handy sql stuff
% unique account names
res= mksqlite('SELECT DISTINCT accountName FROM tbl_portfolios') 
% sum of all positions in acct1
res= mksqlite('SELECT SUM(position) as sm FROM tbl_portfolios WHERE accountName="acct1"')