Saturday, March 13, 2010

The temptation of overfitting and how to resist it.

One of the greatest temptations when designing a quantitative system is without a doubt overfitting (or data snooping). Sometimes it is so evident that only a novice would be fooled by it; sometimes it is really hard to spot. For example, everybody knows that you should not try to predict future prices by fitting a polynomial and extrapolating it. The higher the polynomial order, the better it fits the training data and, typically, the worse it fits the test data.
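A minimal sketch of the polynomial effect in Python, assuming NumPy is available. The synthetic random-walk "prices", the 40/20 train/test split, and the helper names `rmse` and `fit_errors` are illustrative assumptions, not part of any real system.

```python
import numpy as np
from numpy.polynomial import Polynomial

def rmse(a, b):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def fit_errors(degrees, n_train=40, n_test=20, seed=0):
    """Return {degree: (train_rmse, test_rmse)} for polynomial fits of a random walk."""
    rng = np.random.default_rng(seed)
    # Synthetic "prices": a random walk, so there is nothing real to predict.
    prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, n_train + n_test))
    t = np.arange(n_train + n_test, dtype=float)
    t_train, t_test = t[:n_train], t[n_train:]
    y_train, y_test = prices[:n_train], prices[n_train:]
    errors = {}
    for deg in degrees:
        # Polynomial.fit rescales t internally, which keeps the fit well conditioned.
        poly = Polynomial.fit(t_train, y_train, deg)
        # The test error is measured on points after the training window (extrapolation).
        errors[deg] = (rmse(poly(t_train), y_train), rmse(poly(t_test), y_test))
    return errors

if __name__ == "__main__":
    for deg, (train, test) in fit_errors([1, 3, 5, 9]).items():
        print(f"degree {deg}: train RMSE {train:.2f}, test RMSE {test:.2f}")
```

The training error can only go down as the order rises, because every lower-order polynomial is a special case of a higher-order one. The extrapolated test error has no such guarantee, and in practice it tends to blow up.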
Still, I often come across traders running insane optimizations of models with 10 or more parameters. But the truth is, if your data is bad, no amount of optimization will help. Imagine a true random walk process. By definition it cannot be predicted, but by pure chance it can produce something that *looks* predictable. Such situations can lead to a lifelong quest for a 'Jesus indicator', which of course cannot be found. Speaking of indicators, I must say that I do not believe in them. Well, I do use a couple occasionally, but for me they are only tools. A wrench is useful from time to time, but there is really no need to 'believe' in it.
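To see how pure noise can look predictable, here is a small sketch, again assuming NumPy. It optimizes the lookback of a toy momentum rule on the first half of a random series and then applies the winner to the second half. The rule `momentum_pnl`, the lookback range, and the 50/50 split are hypothetical choices made only for illustration.

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = returns.std(ddof=1)
    return 0.0 if sd == 0 else float(np.sqrt(periods_per_year) * returns.mean() / sd)

def momentum_pnl(returns, lookback):
    """Toy rule: hold the sign of the trailing `lookback`-day move, earn the next day's return."""
    prices = np.cumsum(returns)
    # Signal at day t uses prices up to t only; it is applied to the return of day t + 1.
    signal = np.sign(prices[lookback:-1] - prices[:-lookback - 1])
    return signal * returns[lookback + 1:]

def snoop(n_days=1000, lookbacks=range(2, 60), seed=1):
    """Pick the best in-sample lookback on pure noise, then score it out of sample."""
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, 0.01, n_days)  # i.i.d. noise: nothing to predict
    half = n_days // 2
    train, test = returns[:half], returns[half:]
    in_sample = {lb: sharpe(momentum_pnl(train, lb)) for lb in lookbacks}
    best = max(in_sample, key=in_sample.get)
    return best, in_sample[best], sharpe(momentum_pnl(test, best))

if __name__ == "__main__":
    best, is_sharpe, oos_sharpe = snoop()
    print(f"best lookback {best}: in-sample Sharpe {is_sharpe:.2f}, "
          f"out-of-sample Sharpe {oos_sharpe:.2f}")
```

With enough settings to choose from, the best in-sample Sharpe is inflated by selection alone, while the out-of-sample number has no reason to follow it. Adding more parameters only makes the search space, and the illusion, bigger.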
I've spent a lot of time lately developing a robust model and things are going in the right direction. Here is an example of a P&L from one of the better models. It does not account for transaction costs and slippage yet, but the good thing is that the result is not optimized in any way.
My way of avoiding overfitting is:
  • making a hypothesis about a physical phenomenon and testing it on multiple datasets. Only if the existence of the phenomenon is proven can a robust model be built.
  • avoiding models that produce a negative Sharpe ratio for any of the model settings
  • optimizing the data instead of the model. In other words, if a general model does not work on your data, go find better data.
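The first two points above can be sketched as a simple check: compute the Sharpe ratio for every combination of dataset and model setting, and reject the model if any of them is not positive. This is only a sketch under stated assumptions: the helpers `sharpe_grid` and `is_robust`, the toy `fade_rule`, and the zero risk-free rate are all hypothetical, and NumPy is assumed.

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = returns.std(ddof=1)
    return 0.0 if sd == 0 else float(np.sqrt(periods_per_year) * returns.mean() / sd)

def sharpe_grid(strategy, datasets, settings):
    """Sharpe ratios with one row per dataset and one column per setting."""
    return np.array([[sharpe(strategy(data, s)) for s in settings] for data in datasets])

def is_robust(grid):
    """Accept the model only if no dataset/setting pair has a non-positive Sharpe."""
    return bool((grid > 0).all())

def fade_rule(returns, threshold):
    """Hypothetical toy rule: bet against yesterday's move if it exceeded `threshold`."""
    prev = returns[:-1]
    position = np.where(np.abs(prev) > threshold, -np.sign(prev), 0.0)
    return position * returns[1:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three synthetic datasets of pure noise stand in for real instruments.
    datasets = [rng.normal(0.0, 0.01, 750) for _ in range(3)]
    grid = sharpe_grid(fade_rule, datasets, settings=[0.0, 0.005, 0.01])
    print(np.round(grid, 2))
    print("robust:", is_robust(grid))
```

The point of looking at the whole grid rather than the single best cell is that a real effect should survive a change of settings and of dataset, while a fitted one usually does not.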
