Thursday, March 18, 2010

Intraday data handling can be easy, when you've got the right tools.

The title of this post could be just as well 'In praise of Matlab' ( or possibly SciLab or R, but not Excel). Yesterday it took me only a couple of hours to complete a task that otherwise would be almost impossible to achieve without the right tooling. I was so excited about how easy it turned out to be that I decided to share my experience.
For some time now I was going to take a look at Forex intraday data. It tempted me because of the law of  'large numbers'. A model should be allowed to make a couple of hundred of trades before its profitability can be estimated reliably. For a swing-trading system this should mean at least half a year paper trading and I just don't like to wait. In case of an intraday system, one only needs a couple of days!  Forex data is freely available (intraday data archive)   making it a good candidate to test some intraday strategies.
But once you download some data, a problem becomes obvious: the data is sampled at irregular intervals, making it very difficult to compare one dataset to another. Take a look at the beginning of the data files:
1051976017,D,USD/CHF,2010-01-31 17:04:45,1.060100,1.060600
1051976080,D,USD/CHF,2010-01-31 17:05:20,1.060200,1.060600
1051976102,D,USD/CHF,2010-01-31 17:05:22,1.060100,1.060600
1051976119,D,USD/CHF,2010-01-31 17:05:22,1.060200,1.060700
1051975015,D,USD/CAD,2010-01-31 17:01:50,1.070500,1.071000
1051975069,D,USD/CAD,2010-01-31 17:01:56,1.070500,1.071100
1051976049,D,USD/CAD,2010-01-31 17:05:14,1.070500,1.071200
1051976441,D,USD/CAD,2010-01-31 17:06:10,1.070600,1.071200

The sampling time seems to vary between less than a second to more than a minute! Even if you don't want to compare these two datasets, the data is still unusable for a backtest.

Now imagine trying to align these datasets in Excel. If somebody has an idea, please let me know ;-).
However, in a technical software tool (like Matlab ) there should be an interpolation function.
In Matlab it is the wonderful interp1 . It accepts available data along with a new vector of time values for interpolation.
I've used a 10-second interval for interpolation, while synchronizing start times between different datasets. The result is a neat matrix with prices at even time periods. Of course this introduces an error , but looking at the interpolated data (see graph below, deeply zoomed in), less than 1/2 pip, nothing serious.

Again, it gives me a headache just thinking about having to complete this task without the right tooling.


  1. I know this is an old post.. But here is a tip:

    to avoid the problems of linear interpolation enabling you to see forward in time (as you show in your graph) put this in your interp1.m:

    case 'b' % 'backwards'
    i = find(xiCol >= (xCol(k)+xCol(k+1))/2);
    k(i) = k(i);
    yiMat(p,:) = yMat(k,:);

    You will now have backwards interpolation, and at all times you will only use the previous tick seen, not an interpolation between the previous tick and the next tick (data not yet available).

  2. For free forex historical data, you might look also at