Wednesday, December 23, 2009

What does the Dickey-Fuller test (not) tell us?

I've been trying to build an index-cointegrating portfolio in the last couple of days. One of the crucial questions here is which criterion to use for ranking stock combinations. The Dickey-Fuller test is the first option that comes to mind. I know it tests for stationarity, but that is just a part of what I'm looking for.
You see, a stationary time series with very little variance is of very little use for trading, as you will probably not earn the transaction costs back. (Low variance is a sign of market efficiency; in an ideal efficient market the variance would be zero, eliminating any arbitrage opportunities.) On the other hand, if a series is non-stationary with a low drift but has plenty of variance, many good trading opportunities will exist.
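To test the DF test, I've simulated a combination of two AR(1) time series, one with coefficient 0.95 (stationary) and one with coefficient 1 (a random walk, I(1)):
y = (1-drift)*s + drift*d

s - stationary AR(1) time series with coefficient 0.95
d - non-stationary I(1) time series (random walk)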
Both series have variance alpha.

I've varied drift from 0 to 1 and alpha from 0.2 to 5, the (drift, alpha) = (0, 5) combination being the most profitable. Increasing the drift value effectively means a transition from a stationary signal to a random walk.
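
The mixture described above can be sketched roughly as follows. This is a Python illustration, not the author's drift_model.m; the function name `simulate_mixture`, the path length, the seed and the reading of alpha as the variance of the innovations are all assumptions made for the example.

```python
import numpy as np

def simulate_mixture(drift, alpha, n=1000, phi=0.95, seed=0):
    """Return one path of y = (1 - drift) * s + drift * d."""
    rng = np.random.default_rng(seed)
    scale = np.sqrt(alpha)  # assumption: alpha is the innovation variance
    eps_s = rng.normal(0.0, scale, n)
    eps_d = rng.normal(0.0, scale, n)

    s = np.zeros(n)
    for t in range(1, n):
        s[t] = phi * s[t - 1] + eps_s[t]  # stationary AR(1), coefficient 0.95
    d = np.cumsum(eps_d)                  # random walk, I(1)
    return (1.0 - drift) * s + drift * d

# A coarse version of the (drift, alpha) grid, using the end points from the post
grid = {(drift, alpha): simulate_mixture(drift, alpha)
        for drift in (0.0, 0.5, 1.0)
        for alpha in (0.2, 5.0)}
```

With drift = 0 the path is the pure stationary series, with drift = 1 it is the pure random walk, and values in between blend the two.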
The results are in the figure below:

Just as I thought, DF cannot distinguish between different levels of variance.
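
One way to see why: the DF statistic is a t-ratio, so multiplying the whole series by a constant leaves it unchanged. The sketch below is a plain Dickey-Fuller regression with a constant and no lagged differences, written with NumPy only; it is an illustration under those assumptions, not the routine used for the figure, and `df_tstat` is a hypothetical helper name.

```python
import numpy as np

def df_tstat(y):
    """t-statistic of gamma in dy_t = c + gamma * y_{t-1} + e_t (no lags)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

# One stationary AR(1) path, then the same path at two variance levels
rng = np.random.default_rng(1)
e = rng.normal(size=2000)
s = np.zeros_like(e)
for t in range(1, len(e)):
    s[t] = 0.95 * s[t - 1] + e[t]

t_low = df_tstat(np.sqrt(0.2) * s)   # low-variance version
t_high = df_tstat(np.sqrt(5.0) * s)  # high-variance version of the same path
print(t_low, t_high)                 # equal up to floating-point error
```

Rescaling multiplies the estimated coefficient's standard error and the residuals by the same factor, so the ratio, and hence the test decision, does not move.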

I clearly need a better estimator, one that looks not for stationarity but for the variance/drift ratio.

Maybe it's time to blow the dust off the good old Fourier transform and look for high spectral peaks.
Probably an even better idea is to use wavelets for spectral decomposition and filtering, then estimate the spectral density of each frequency band.
Any other ideas?
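
For the Fourier idea, a minimal sketch of "looking for high spectral peaks" could be a raw periodogram with the strongest non-zero frequency picked out. The helper `periodogram_peak`, the test signal and the peak-to-average ratio used as a score are assumptions for illustration only, not a tested ranking criterion.

```python
import numpy as np

def periodogram_peak(x, dt=1.0):
    """Return (frequency of the strongest peak, peak power / mean power)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                           # remove the mean (zero frequency)
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=dt)
    k = 1 + np.argmax(power[1:])               # skip the zero-frequency bin
    return freqs[k], power[k] / power[1:].mean()

# Synthetic example: a 32-sample cycle buried in noise
rng = np.random.default_rng(2)
t = np.arange(512)
x = np.sin(2 * np.pi * t / 32) + 0.5 * rng.normal(size=t.size)
freq, ratio = periodogram_peak(x)
print(freq, ratio)
```

A high ratio means one cycle dominates the spectrum; for a series close to white noise the ratio stays small.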

Files: drift_model.m, semistat.m


  1. Did you get a chance to look at this:

  2. Maverick, you've just helped me get a shining idea! At the beginning of my research I played around with neural networks. They turned out to be of little use for predicting stock movements (or at least no more useful than any other technical tool) due to the random-walk nature of stock prices. But NNs are known for their wonderful ability to fit non-linear relationships (just take a look at the house pricing example in Matlab). To make a long story short, my idea is now to create a set of metrics, such as ADF, variance, information ratio etc., and fit them to the performance of my trading model. This will save a tremendous amount of computation time by giving an estimate of model performance before running it. Get a faster PC? I don't think so, as I'll be dealing with ~10e9 portfolio combinations ;-).

  3. Check out the Hilbert-Huang Transform. It's a new form of spectral analysis.

  4. @Gigi: Looks interesting enough to give it a go. There is a Matlab implementation by the way:

  5. I've taken a closer look at the HHT. Elegant. But just like other transforms (FFT, wavelets), it suffers heavily from end effects at the borders of the dataset. This makes it unusable for forecasting.

  6. Does anyone notice that Yahoo historicals are missing data? How are you backtesting pairs with Yahoo? The data needs to be lined up. Some help please.

  7. Hey Anonymous,
    Look up the intersect function in MATLAB. Using the intersect function you can match up the dates that are common to the two data sets and discard the dates that are not.
    I found out about this function from Ernest Chan's book. If you are researching pairs trading and you are using MATLAB, get the book Quantitative Trading, by Chan. This book will answer many of your questions.

  8. Mr. Jesse, OK, this seems to work. Thank you. What is up with this whole Dickey-Fuller thing? I have programmed a correlation test and it seems to work well with this pairs trade thesis (I can now dump as many stocks as I want into my program and it will spit out the most correlated assets!!!)
    Should I really bother with learning the Dickey-Fuller method? The method I am using would seem to hold some promise as is!!

  9. Hi Anonymous,

    The DF test gives you an estimate of the mean-reverting nature of a pair. Correlation does not. E. Chan has a post about this somewhere on this blog. However, many pair traders rely on correlation in pair selection, thus contributing to the mean-reversion process. In my experience, correlation is a good measure to filter the assets. Another thing that you could use is *auto*correlation of the pair returns. If it is not significant, the pair is probably not tradeable.

  10. Oh wow, this is some very, very heavy stuff. I used to mess around with correlation back in the day.

  11. Hello, here is my understanding, sjev. If one uses correlation to filter, one will only have the most highly correlated pairs. However, where traders make their money with pairs is when the stocks are not so correlated, but only cointegrated (meaning they may not move up and down together on a daily basis, but over time they move apart and then move back together at some point). This is where the money is made with pairs strategies that hold positions for weeks or more. Please comment further, sjev.

  12. @Anonymous: correlation is a good thing; a trade opportunity can arise when something goes 'wrong' with a correlation. Pair trading is, from my point of view, all about outlier detection.
    But you are right about the level of correlation. Too high, and the stocks do not drift apart far enough to justify the transaction costs.

  13. Sometimes stocks are correlated: they always move up on the same day and always move down on the same day. Say, however, that one of these stocks always moves up more than the other on the up days and always moves down less than the other on the down days. This pair will look very correlated, but a pairs trader will lose money in this situation. Bottom line: correlation is good for intra-day pairs trading and cointegration is for pairs trading over the longer term.
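
The scenario in the last comment is easy to reproduce on synthetic data. The sketch below assumes daily log returns and made-up multipliers (1.2 on up days, 0.8 on down days); the numbers are illustrative and not taken from any real pair.

```python
import numpy as np

rng = np.random.default_rng(3)
r_b = rng.normal(0.0, 0.01, 1000)              # daily log returns of stock B
r_a = np.where(r_b > 0, 1.2 * r_b, 0.8 * r_b)  # A: up more, down less than B

corr = np.corrcoef(r_a, r_b)[0, 1]             # correlation of daily returns
spread = np.cumsum(r_a - r_b)                  # log price ratio A/B over time

print(f"return correlation: {corr:.3f}")
print(f"final log spread:   {spread[-1]:.3f}")
```

The returns are almost perfectly correlated, yet the spread only ever widens, so a trade betting on the two prices converging would keep losing. A high return correlation says nothing about whether the spread reverts.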