Monday, March 22, 2010

Forex costs

Once again I must come to the conclusion that Forex costs are too high for a decent high-frequency strategy. By high frequency I mean time windows of less than an hour. Everything I've tried so far gets killed by the spreads, which is very frustrating. When the arbitrage spread is just 2-3 times larger than the broker spread, the odds shift against me.
As the arbitrage combinations are quite limited in Forex, there are probably just too many players with access to much tighter spreads playing this game. No easy money here, moving on...

Sunday, March 21, 2010

Transaction costs : forex vs stocks

I've decided to take a quick look at the transaction costs involved in forex versus stock trading. Most forex brokers brag about 'no transaction costs', but the impact of the bid-ask spread is often unclear. The results were as expected: forex has higher transaction costs than the stock market.
To enable the comparison of the costs I've defined cost_ratio = spread/(max(bid)-min(bid)). In other words, the cost ratio is the transaction cost divided by the price range over some given period of time; I am using a week. To calculate the cost ratio for forex I've taken the spreads for the 'USD/CAD', 'USD/CHF', 'USD/DKK', 'USD/NOK' and 'USD/SEK' pairs relative to their weekly range and then averaged them. (data source: Gain Capital)
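As a rough illustration, the cost ratio defined above can be sketched in Python (the post itself uses Matlab). The post does not say how the spread is aggregated over the week, so taking the mean of ask minus bid is an assumption here, and the function names and example quotes are made up for illustration.

```python
from statistics import mean


def cost_ratio(bids, asks):
    """spread / (max(bid) - min(bid)) for the quotes of one period."""
    if not bids or len(bids) != len(asks):
        raise ValueError("need equally long, non-empty bid and ask lists")
    # Assumption: 'spread' is the mean of (ask - bid) over the period.
    spread = mean(a - b for a, b in zip(asks, bids))
    price_range = max(bids) - min(bids)
    if price_range <= 0:
        raise ValueError("price range must be positive")
    return spread / price_range


def average_cost_ratio(quotes_by_pair):
    """Average the per-pair cost ratios, as done for the five pairs."""
    return mean(cost_ratio(b, a) for b, a in quotes_by_pair.values())


# Made-up quotes: a 5-pip spread on a 200-pip range.
example_bids = [1.0600, 1.0650, 1.0700, 1.0800]
example_asks = [1.0605, 1.0655, 1.0705, 1.0805]
print(round(cost_ratio(example_bids, example_asks), 6))
```

With real data, `quotes_by_pair` would map each pair name to its bid and ask lists for one week.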

The cost ratio for a stock is calculated in a similar way. SPY has a weekly range of about 2% (rough estimate) and one share costs around $115, so the weekly range is 0.02*115 = $2.3. Transaction costs at IB are $0.005 a share, resulting in a cost ratio of 0.005/2.3 = 2.2e-3.

The results are in the graph below:

The cost ratio for forex seems to be around 5 times higher than for stocks. This changes, of course, if a lower-priced stock is traded. For a $20 share with the same weekly range of 2%, the cost ratios of forex and the stock market are roughly equal.
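The stock-side arithmetic can be checked with a few lines of Python. This is only a sketch using the rough numbers quoted in this post (2% weekly range, $0.005 per share commission); the function name is hypothetical.

```python
def stock_cost_ratio(price, weekly_range_fraction, commission_per_share):
    """Commission per share divided by the weekly price range in dollars."""
    weekly_range = price * weekly_range_fraction
    return commission_per_share / weekly_range


spy = stock_cost_ratio(115.0, 0.02, 0.005)    # 0.005 / 2.3
cheap = stock_cost_ratio(20.0, 0.02, 0.005)   # 0.005 / 0.4

print(f"SPY:       {spy:.2e}")
print(f"$20 share: {cheap:.2e}")
print(f"ratio:     {cheap / spy:.2f}")
```

With the commission and percentage range held fixed, the cost ratio scales inversely with the share price, so the $20 share comes out 115/20 = 5.75 times more expensive than SPY, the same order as the factor of about 5 seen for forex.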

I guess the difference in cost ratios is the price you pay for liquidity, minimal slippage and real-time data.

Thursday, March 18, 2010

Intraday data handling can be easy, when you've got the right tools.

The title of this post could just as well be 'In praise of Matlab' (or possibly Scilab or R, but not Excel). Yesterday it took me only a couple of hours to complete a task that would be almost impossible without the right tooling. I was so excited about how easy it turned out to be that I decided to share my experience.
For some time now I had been meaning to take a look at Forex intraday data. It tempted me because of the law of large numbers: a model should be allowed to make a couple of hundred trades before its profitability can be estimated reliably. For a swing-trading system this means at least half a year of paper trading, and I just don't like to wait. With an intraday system, one only needs a couple of days! Forex data is freely available (intraday data archive), making it a good candidate for testing some intraday strategies.
But once you download some data, a problem becomes obvious: the data is sampled at irregular intervals, making it very difficult to compare one dataset to another. Take a look at the beginning of the data files:
1051976017,D,USD/CHF,2010-01-31 17:04:45,1.060100,1.060600
1051976080,D,USD/CHF,2010-01-31 17:05:20,1.060200,1.060600
1051976102,D,USD/CHF,2010-01-31 17:05:22,1.060100,1.060600
1051976119,D,USD/CHF,2010-01-31 17:05:22,1.060200,1.060700
1051975015,D,USD/CAD,2010-01-31 17:01:50,1.070500,1.071000
1051975069,D,USD/CAD,2010-01-31 17:01:56,1.070500,1.071100
1051976049,D,USD/CAD,2010-01-31 17:05:14,1.070500,1.071200
1051976441,D,USD/CAD,2010-01-31 17:06:10,1.070600,1.071200

The sampling time seems to vary from less than a second to more than a minute! Even if you don't want to compare these two datasets, the data is still unusable for a backtest.
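To make the irregular sampling concrete, here is a small Python sketch that parses the rows shown above and prints the gap between consecutive ticks. The column meanings are an assumption: the post does not label them, and the last two fields are read here as bid and ask.

```python
import csv
from datetime import datetime
from io import StringIO

USD_CHF = """\
1051976017,D,USD/CHF,2010-01-31 17:04:45,1.060100,1.060600
1051976080,D,USD/CHF,2010-01-31 17:05:20,1.060200,1.060600
1051976102,D,USD/CHF,2010-01-31 17:05:22,1.060100,1.060600
1051976119,D,USD/CHF,2010-01-31 17:05:22,1.060200,1.060700
"""

USD_CAD = """\
1051975015,D,USD/CAD,2010-01-31 17:01:50,1.070500,1.071000
1051975069,D,USD/CAD,2010-01-31 17:01:56,1.070500,1.071100
1051976049,D,USD/CAD,2010-01-31 17:05:14,1.070500,1.071200
1051976441,D,USD/CAD,2010-01-31 17:06:10,1.070600,1.071200
"""


def parse_ticks(text):
    """Return a list of (timestamp, bid, ask) tuples."""
    ticks = []
    for row in csv.reader(StringIO(text)):
        if not row:
            continue
        # Assumed layout: id, flag, pair, timestamp, bid, ask.
        _tick_id, _flag, _pair, stamp, bid, ask = row
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        ticks.append((when, float(bid), float(ask)))
    return ticks


def gaps_seconds(ticks):
    """Seconds elapsed between consecutive ticks."""
    times = [when for when, _bid, _ask in ticks]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


print("USD/CHF gaps:", gaps_seconds(parse_ticks(USD_CHF)))
print("USD/CAD gaps:", gaps_seconds(parse_ticks(USD_CAD)))
```

For these eight rows the gaps range from 0 seconds (two USD/CHF ticks share a timestamp) to 198 seconds in USD/CAD.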

Now imagine trying to align these datasets in Excel. If somebody has an idea, please let me know ;-).
However, a technical software tool (like Matlab) should have an interpolation function.
In Matlab it is the wonderful interp1. It accepts the available data along with a new vector of time values to interpolate at.
I've used a 10-second interval for interpolation, while synchronizing the start times of the different datasets. The result is a neat matrix with prices at even time intervals. Of course this introduces an error, but judging by the interpolated data (see graph below, deeply zoomed in) it is less than 1/2 pip, nothing serious.

Again, it gives me a headache just thinking about having to complete this task without the right tooling.

Saturday, March 13, 2010

The temptation of overfitting and how to resist it.

One of the greatest temptations when designing a quantitative system is without a doubt overfitting (or data snooping). Sometimes it is so evident that only a novice can get fooled by it; sometimes it is really hard to point out. For example, everybody knows that you should not try to predict future prices by linear extrapolation. The higher the polynomial order you use, the better it will fit the training data and the worse it will fit the test data.
Still, I often come across traders running insane optimizations of a 10+ parameter model. But the truth is, if your data is bad, no amount of optimization will help. Imagine a true random walk process. By definition it cannot be predicted, but by pure chance it can produce something that *looks* predictable. These situations can lead to a life-long quest for a 'Jesus indicator', which of course cannot be found. Speaking of indicators, I must say that I do not believe in them. Well, I do use a couple occasionally, but for me they are only tools. A wrench can be useful from time to time, but there is really no need to 'believe' in it.
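The random walk argument can be tried out with a toy experiment. The Python sketch below is an illustration under made-up assumptions (Gaussian steps, a simple momentum rule, 50 candidate lookbacks); none of it comes from an actual trading model. It picks the lookback with the best in-sample P&L on pure noise and then applies it to fresh noise.

```python
import random


def random_walk_returns(n, seed):
    """Independent Gaussian steps: unpredictable by construction."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def momentum_pnl(returns, lookback):
    """Long if the previous `lookback` returns sum to > 0, else short."""
    pnl = 0.0
    for i in range(lookback, len(returns)):
        signal = sum(returns[i - lookback:i])
        position = 1.0 if signal > 0 else -1.0
        pnl += position * returns[i]
    return pnl


def best_lookback(returns, lookbacks):
    """The 'optimization': keep whichever lookback scored best in-sample."""
    return max(lookbacks, key=lambda k: momentum_pnl(returns, k))


returns = random_walk_returns(2000, seed=1)
train, test = returns[:1000], returns[1000:]
lookbacks = range(1, 51)

k = best_lookback(train, lookbacks)
print("chosen lookback:", k)
print("in-sample P&L:  ", round(momentum_pnl(train, k), 2))
print("out-of-sample:  ", round(momentum_pnl(test, k), 2))
```

Because the best of many candidates is selected, the in-sample number tends to look attractive even though there is nothing to find; the out-of-sample number has no such help. Changing the seed shows how much the "result" moves around.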
I've spent a lot of time lately developing a robust model and things are going in the right direction. Here is an example of a P&L from one of the better models. It does not account for transaction costs and slippage yet, but the good thing is that the result is not optimized in any way.
My way of avoiding overfitting is
  • making a hypothesis about a physical phenomenon and testing it on multiple datasets. Only if the existence of the phenomenon is proven can a robust model be built.
  • avoiding models that produce a negative Sharpe ratio for any model settings
  • optimizing the data instead of the model. In other words, if a general model does not work on your data, go find better data.
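The second rule in the list above can be turned into a simple check. This Python sketch reads the rule as "reject the model if any tested setting gives a negative Sharpe ratio", which is one interpretation. The zero risk-free rate, the annualization by 252 periods, the function names and the example returns are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    spread = stdev(returns)
    if spread == 0:
        raise ValueError("returns have zero variance")
    return mean(returns) / spread * math.sqrt(periods_per_year)


def robust_across_settings(returns_by_setting):
    """True only if every tested setting has a positive Sharpe ratio."""
    return all(sharpe(r) > 0 for r in returns_by_setting.values())


# Made-up per-period returns for two settings of a hypothetical model.
results = {
    "lookback=10": [0.010, -0.005, 0.020, 0.004],
    "lookback=20": [-0.010, 0.005, -0.020, -0.004],
}
print(robust_across_settings(results))
```

A model that only works at one carefully tuned setting fails this check, which is the point: the test rewards models whose edge survives a change of parameters.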