
Tuesday, December 7, 2010

VIX buy & hold strategy

Here is a simple buy & hold strategy that popped up while I was playing with some VIX-based ETNs. Volatility was extremely high through the end of 2008 and the beginning of 2009, presenting a good opportunity to go short on the VIX. Shorting the VXX ETN was probably a good way to do that; however, one would have suffered substantial losses through the summer of 2010.
Now take a look at what happens if the upside risk of the short VXX position (short-term futures) is hedged with the VXZ (longer-dated futures). It turns out that the summer 2010 dip can be completely 'ironed out' by constructing a short VXX-long VXZ pair with roughly twice as many VXZ shares as VXX shares. The graph below shows a portfolio of -22 shares of VXX and 46 shares of VXZ. The Sharpe ratio is just above 2.
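For reference, here is roughly how such a pair can be evaluated in Matlab. This is just a sketch: 'vxx' and 'vxz' are assumed to be date-aligned daily closing price vectors, and the share counts are the ones mentioned above.

sharesVXX = -22;                                  % short VXX
sharesVXZ =  46;                                  % long VXZ

value  = sharesVXX*vxx + sharesVXZ*vxz;           % mark-to-market of the positions
gross  = abs(sharesVXX)*vxx + abs(sharesVXZ)*vxz; % gross exposure
pnl    = diff(value);                             % daily P&L
ret    = pnl ./ gross(1:end-1);                   % daily return on gross exposure
sharpe = sqrt(250)*mean(ret)/std(ret);            % annualized Sharpe ratio

plot(cumsum(pnl)); title('short VXX / long VXZ pair P&L');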
Of course past returns are not necessarily representative of the future, but I imagine that during the next crisis this behavior could repeat itself.
At the moment this strategy may be turning around, as the downside potential of the VIX seems limited for now. But I can't wait for the next crisis ;-).

Thursday, November 25, 2010

ActiveX vs Java API: what's the difference?

There has been quite a discussion on Chan's blog about the ActiveX vs Java API. One of the claims presented by Anonymous is that ActiveX misses ticks when another function is executing. This alarmed me, as I rely heavily on the ActiveX API.
So I've tested this claim: while TWS events are handled in the background by the wrapper and listener classes, I start a ~10-second, 100% CPU-intensive task in the foreground. Afterwards I examine the timestamps of each tick.
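The test itself is nothing fancy; in Matlab it looks roughly like this (the 'listener' object and its two methods are placeholders for the logging my wrapper classes do, not actual code):

listener.resetLog();                    % placeholder: clear the logged tick timestamps

tic;                                    % ~10 seconds of 100% CPU load in the foreground
x = 0;
while toc < 10
    x = x + sum(rand(1000,1));          % pointless busy work
end

stamps = listener.getTickTimes();       % placeholder: timestamps logged per tick event
plot(1:numel(stamps), stamps, '.');     % logged tick number vs. timestamp
xlabel('tick #'); ylabel('timestamp');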
The results are in the figure:
The horizontal axis shows the logged tick number and the vertical axis the tick timestamp. The CPU is loaded between the red lines. The graph shows that during the CPU load no ticks reach the listener class; however, no events are lost, as they are all fired just after the CPU becomes available again.
The other claims, such as 'performance', were quite vague; as far as I'm concerned, I've never had any performance issues using ActiveX. And last but not least, if you need 'performance', why use Matlab at all? I would go straight for a native language and the FIX CTCI API.

So I'm sticking with the ActiveX API for now.

Monday, November 15, 2010

HOWTO: Wrap the Interactive Brokers TWS API in a Matlab class.

There are many ways of interfacing Matlab with the Interactive Brokers API. The main choice is of course whether you are going to buy a commercial product (like quant2ib) or go the DIY route (à la Max Dama). For me it was the second option. The reason is that not only am I short of the $600 for an interface, but in most cases a commercial product will have limitations compared to your own or open-source code.
I started with the code from Max, which really gave me a flying start, but his approach to interfacing is far from optimal. One would need a bunch of global variables and separate functions to get the data back and forth. Synchronization to events also becomes very difficult with this approach. Without object-oriented design, one would quickly get bogged down in 'spaghetti' code.
So I went another route: wrapping all of the TWS stuff in a single class and using events for synchronization. This was not trivial to do, as passing class methods as callback functions proved quite tricky. With code from leptokurtosis.com (thanks!) this issue was solved and I finally managed to create a decent wrapper.
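For the curious, the trick boils down to binding a class method to an ActiveX event through an anonymous function that captures the object. A stripped-down sketch (the class and method names here are illustrative, not the code in the zip below; check the ProgID and event name against your own TWS installation):

% CTwsSketch.m - each classdef goes in its own file
classdef CTwsSketch < handle
    properties
        hTws    % handle to the TWS ActiveX control
    end
    methods
        function obj = CTwsSketch()
            obj.hTws = actxcontrol('Tws.TwsCtrl.1');   % ProgID of the TWS control
            % the anonymous function captures 'obj', so the event ends up in a method
            registerevent(obj.hTws, {'tickPrice', ...
                @(varargin) obj.onTickPrice(varargin{:})});
        end
        function onTickPrice(obj, varargin)
            disp('tickPrice event received');          % class method as event handler
        end
    end
end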

Here is the base code for the wrapper: ib_matlab_tutorial.zip
It's a basic skeleton that can be extended further. At the moment only market data subscriptions are implemented. To add order placement, historical data, etc. you'll need to add some code, but from here on most of the heavy lifting is already done.

How to use:
- install the TWS ActiveX control and enable API connections in TWS
- start TWS
- open demoScript.m and step through the code blocks to see what happens.

File description:
CTws - main wrapper class
CSymbolData - used by CTws as a data container
CListener - basic example of a listener class, listening to TWS events
GenericIbEvent - used for passing symbol data within an event (written by leptokurtosis.com)
demoScript - basic TWS/listener demo

Have fun!
Any remarks/improvements are welcome!

Saturday, October 9, 2010

Behold the ideal database tooling combination

As I've mentioned earlier, I've spent most of my time in the last year managing data instead of developing strategies. It's a dirty job... but somebody's got to do it, and with the right tooling somebody has already done it for you.
In my search for the ideal database tooling I've tried native Matlab files, Excel, MySQL, Access and plain CSV, before settling on a far-from-ideal XML + *.mat combination. Still, a lot of programming is required to manage data this way.
A couple of days ago I finally found the solution: SQLite + mksqlite. Both packages are simply excellent: a serverless database contained in a single file plus a very fast Matlab interface. The best thing about this combination is that one does not need a database server or any drivers to take full advantage of an SQL database.
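To give an idea of how little code is involved, here is a minimal sketch (the table layout and file name are made up for the example):

mksqlite('open', 'quotes.db');                       % creates the file if it does not exist
mksqlite(['CREATE TABLE IF NOT EXISTS quotes ' ...
          '(symbol TEXT, date REAL, close REAL)']);
mksqlite(sprintf('INSERT INTO quotes VALUES (''%s'', %f, %f)', ...
                 'SPY', now, 115.2));
data = mksqlite('SELECT * FROM quotes WHERE symbol = ''SPY''');  % struct array result
mksqlite('close');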
Thanks to the open source community, I can now really start focusing on strategy development instead of data management.

Sunday, September 26, 2010

Yahoo historical data downloader

I've taken a look inside get_hist_stock_data.m from Luminous Logic, the script that I had been using for a looong time. The code seemed like it could use a little bit of cleaning up, but I ended up rewriting it completely. The resulting code is just several lines long, accepts a precise historical period and returns a struct.

Code:  download_hist_yahoo_data.m
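For the curious, the idea boils down to building the right URL and parsing the CSV that Yahoo returns. A stripped-down sketch of the same approach (not the exact code in the file above):

ticker    = 'SPY';
startDate = datenum(2009, 1, 1);
endDate   = datenum(2010, 9, 1);

sv  = datevec(startDate);  ev = datevec(endDate);
url = sprintf(['http://ichart.finance.yahoo.com/table.csv?s=%s' ...
               '&a=%d&b=%d&c=%d&d=%d&e=%d&f=%d&g=d&ignore=.csv'], ...
              ticker, sv(2)-1, sv(3), sv(1), ev(2)-1, ev(3), ev(1));  % months are 0-based

csv = urlread(url);                                       % raw CSV text
c   = textscan(csv, '%s%f%f%f%f%f%f', 'Delimiter', ',', 'HeaderLines', 1);

data.ticker   = ticker;
data.dates    = datenum(c{1}, 'yyyy-mm-dd');              % Yahoo returns newest rows first
data.open     = c{2};  data.high   = c{3};  data.low = c{4};
data.close    = c{5};  data.volume = c{6};  data.adjClose = c{7};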

Sunday, June 27, 2010

Where did all the money go???

We've seen the markets rise and fall over and over again. Sometimes I've asked myself the question: 'When prices fall, where does all the money go?' I could imagine that the 'smart money' left first and somebody made a huge profit (just like in any pyramid scheme). I never really got the time to think it through, until today. Today somebody asked me this question, and while I was trying to explain it, it all suddenly became clear. The 'money' was never really there. A good illustration is house prices. Imagine you bought a house for $500k a couple of years ago. Prices have since risen to $550k, so you've made a virtual profit of 10%. Now prices fall again and you start losing 'virtual' money. This is the same as unrealized P&L. The good thing about real estate is that it has some real underlying value, while for stocks the underlying value is often completely unclear.

Monday, June 7, 2010

Strange things that happen at the open

Each stock has its quoted open price. But how is it formed? I've logged tick data during today's opening and plotted it in the figure below. Each point represents a tick, and the color indicates the tick type (ask, bid, trade). Note that the NYSE opens at 15:30 in my time zone. According to Google Finance, the stock opened at $37.50.
As you can see, the price jumps all over the place in the first minutes of trading. It seems that traders are not sure what the fair price is, but it does settle down after about a minute.
One conclusion is clear: if you want to trade a stock against its opening price, you've got to act fast.

Monday, May 31, 2010

Estimating the fair price of ETF components

The essence of every arbitrage strategy (on any timeframe) is an estimator of a 'fair price'. Once you have that, the rest is easy: just buy the stocks that are too low relative to the fair value and sell the overpriced ones.
Now the hard part: how do we estimate the fair value? The possible solutions are countless: linear regression, neural networks, Kalman filters, weight in the index, the mean value... you name it. And of course the ETF tracker itself.
In the example below I've taken a look at the old-time favorite group of stocks: the XLE components. The data plotted is the cumulative return over one week of history at a 15-minute sampling rate. Notice the fat lines: the blue one is just the mean return of the group and the red one is the return of the XLE itself.

It turns out that the returns of the XLE are almost identical to the mean of the group! A quick conclusion is that the ETF itself is as good an estimator as the plain mean. Another conclusion could be that one does not need PhD-rocket-science-chaos-theory sophistication to design a decent arbitrage strategy.
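Here is a sketch of how this comparison can be made (the component prices and the ETF prices are assumed inputs):

function compareGroupMeanToEtf(prices, etf)
% prices : [nBars x nStocks] component prices (e.g. 15-minute bars, 1 week)
% etf    : [nBars x 1] ETF prices over the same bars
ret     = prices ./ repmat(prices(1,:), size(prices,1), 1) - 1;   % cumulative returns
meanRet = mean(ret, 2);                                           % group mean
etfRet  = etf ./ etf(1) - 1;                                      % ETF return

plot(ret, 'Color', [0.7 0.7 0.7]); hold on;                       % individual stocks
plot(meanRet, 'b', 'LineWidth', 2);                               % fat blue line
plot(etfRet,  'r', 'LineWidth', 2);                               % fat red line
end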

Saturday, May 29, 2010

Intraday data downloader for TWS

Having decent historical data is essential for developing a good strategy, but access to (free) intraday data is almost non-existent. However, if you have an account at Interactive Brokers, they let you download historical data for free.
In the last couple of weeks I've been very busy implementing an intraday arbitrage strategy. The strategy itself is very simple, with zero (!!!) parameters, but the amount of code needed for data and error handling is much more than I anticipated. I have been coding data handling and conversion for a couple of weeks now, with one clear conclusion: I hate programming. So this is how it feels to be a low-level software engineer, a.k.a. code monkey ;-).
However, during this process I managed to get Matlab to talk to TWS through their API and wrote a historical data downloader that could spare other people some boring work.
The code is based on a tutorial written by Max Dama (thanks Max!). You should follow the steps described on his page to install and configure the TWS API. After that, just fire it up:

hDataIntraday = getHistoryTws(tickers, filename_copy, period, barsize);

Keep in mind that IB imposes restrictions on data downloads: Historical Data Limitations
The data is downloaded to a structure:
hDataIntraday :
         stocks: [1x20 struct]
                  ticker: 'AEE'
                 history: [1x1 struct]
                           dates: [253x1 double]
                            open: [253x1 double]
                           close: [253x1 double]
                          volume: [253x1 double]

                  ticker: 'AEP'
                 history: [1x1 struct]
                           dates: [253x1 double]
                            open: [253x1 double]
                           close: [253x1 double]
                          volume: [253x1 double]

         last_update: '29-May-2010 16:07:21'


The code: tws_history_downloader.zip

Monday, March 22, 2010

Forex costs

Once again I must come to the conclusion that Forex costs are too high for a decent high-frequency strategy. By high frequency I mean time windows of less than an hour. Everything I've tried so far gets killed by the spreads, which is very frustrating. When the arbitrage spread is just 2-3 times larger than the broker spread, the odds shift against me.
As the arbitrage combinations are quite limited in Forex, there are probably just too many players with access to much tighter spreads playing this game. No easy money here, moving on...

Sunday, March 21, 2010

Transaction costs : forex vs stocks

I've decided to take a quick look at the transaction costs involved in forex versus stock trading. Most forex brokers brag about 'no transaction costs', but the impact of the bid-ask spread is often unclear. The results were as expected: forex has higher transaction costs than the stock market.
To enable a comparison of the costs I've defined cost_ratio = spread / (max(bid) - min(bid)). In other words, the cost ratio is the transaction cost divided by the price range over some given period of time; I am using a week. To calculate the cost ratio for forex I've taken the spreads of the USD/CAD, USD/CHF, USD/DKK, USD/NOK and USD/SEK pairs relative to their weekly ranges and then averaged them (data source: Gain Capital).

The cost ratio for a stock is calculated in a similar way. SPY has a weekly range of about 2% (rough estimate) and one share costs around $115, so the weekly range is 0.02 * 115 = $2.30. Transaction costs at IB are $0.005 a share, resulting in a cost ratio of 0.005 / 2.30 ≈ 2.2e-3.
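In Matlab the calculation is a one-liner per instrument; a small sketch with made-up example numbers:

% Forex example: spread in price units and a (synthetic) week of bid prices
spread_fx = 0.0003;                                   % e.g. 3 pips on USD/CHF
bid_fx    = 1.06 + cumsum(randn(1000,1))*1e-4;        % fake weekly bid series
ratio_fx  = spread_fx / (max(bid_fx) - min(bid_fx));  % cost_ratio for the pair

% Stock example: SPY at ~$115, ~2% weekly range, $0.005/share commission
range_spy   = 0.02 * 115;                             % ~ $2.30
ratio_stock = 0.005 / range_spy;                      % ~ 2.2e-3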

The results are in the graph below:

The cost ratio for forex seems to be around 5 times higher than for stocks. This changes, of course, if a lower-priced stock is traded. For a $20 share with the same relative weekly range, the cost ratios of the forex and stock markets are roughly equal.

I guess the difference in cost ratios is the price you pay for liquidity, minimal slippage and real-time data.

Thursday, March 18, 2010

Intraday data handling can be easy, when you've got the right tools.

The title of this post could just as well be 'In praise of Matlab' (or possibly Scilab or R, but not Excel). Yesterday it took me only a couple of hours to complete a task that would otherwise have been almost impossible without the right tooling. I was so excited about how easy it turned out to be that I decided to share my experience.
For some time now I've been meaning to take a look at forex intraday data. It tempted me because of the 'law of large numbers': a model should be allowed to make a couple of hundred trades before its profitability can be estimated reliably. For a swing-trading system this would mean at least half a year of paper trading, and I just don't like to wait. With an intraday system, one only needs a couple of days! Forex data is freely available (intraday data archive), making it a good candidate for testing some intraday strategies.
But once you download some data, a problem becomes obvious: the data is sampled at irregular intervals, making it very difficult to compare one dataset to another. Take a look at the beginning of the data files:
USD/CHF:
lTid,cDealable,CurrencyPair,RateDateTime,RateBid,RateAsk
1051976017,D,USD/CHF,2010-01-31 17:04:45,1.060100,1.060600
1051976080,D,USD/CHF,2010-01-31 17:05:20,1.060200,1.060600
1051976102,D,USD/CHF,2010-01-31 17:05:22,1.060100,1.060600
1051976119,D,USD/CHF,2010-01-31 17:05:22,1.060200,1.060700
------------------------------------------------------------------------
USD/CAD
lTid,cDealable,CurrencyPair,RateDateTime,RateBid,RateAsk
1051975015,D,USD/CAD,2010-01-31 17:01:50,1.070500,1.071000
1051975069,D,USD/CAD,2010-01-31 17:01:56,1.070500,1.071100
1051976049,D,USD/CAD,2010-01-31 17:05:14,1.070500,1.071200
1051976441,D,USD/CAD,2010-01-31 17:06:10,1.070600,1.071200

The sampling interval seems to vary from less than a second to more than a minute! Even if you don't want to compare these two datasets, the data is still unusable for a backtest.

Now imagine trying to align these datasets in Excel. If somebody has an idea, please let me know ;-).
However, a technical software tool (like Matlab) should have an interpolation function.
In Matlab it is the wonderful interp1. It accepts the available data along with a new vector of time values to interpolate at.
I've used a 10-second interpolation interval, while synchronizing start times between the different datasets. The result is a neat matrix of prices at evenly spaced times. Of course this introduces an error, but looking at the interpolated data (see graph below, deeply zoomed in), it is less than half a pip, nothing serious.
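The core of it fits in a few lines. A sketch (the bid/time vectors are assumed to be already read from the CSV files above, with the times converted to Matlab datenums):

step   = 10/(24*3600);                          % 10 seconds in datenum units

% duplicate timestamps (see the 17:05:22 ticks above) must be removed first
[t_chf, iu] = unique(t_chf);  bid_chf = bid_chf(iu);
[t_cad, iu] = unique(t_cad);  bid_cad = bid_cad(iu);

t0     = max(t_chf(1),  t_cad(1));              % synchronize start times
t1     = min(t_chf(end), t_cad(end));
grid_t = (t0:step:t1)';                         % common, evenly spaced time axis

chf_i  = interp1(t_chf, bid_chf, grid_t, 'linear');
cad_i  = interp1(t_cad, bid_cad, grid_t, 'linear');

prices = [chf_i cad_i];                         % neat matrix, aligned in time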

Again, it gives me a headache just thinking about having to complete this task without the right tooling.

Saturday, March 13, 2010

The temptation of overfitting and how to resist it.

One of the greatest temptations when designing a quantitative system is without a doubt overfitting (or data snooping). Sometimes it is so evident that only a novice would be fooled by it, sometimes it is really hard to point out. For example, everybody knows that you should not try to predict future prices by linear extrapolation. The higher the polynomial order you use, the better it will fit the training data and the worse it will fit the test data.
Still, I often come across traders running insane optimizations of 10+ parameter models. But the truth is, if your data is bad, no amount of optimization will help. Imagine a true random walk process. It cannot be predicted by definition, but by pure chance it can produce something that *looks* predictable. These situations can lead to a life-long quest for a 'Jesus indicator', which of course cannot be found. Speaking of indicators, I must say that I do not believe in them. Well, I do use a couple occasionally, but for me they are only tools. A wrench can be useful from time to time as a tool, but there is really no need to 'believe' in it.
I've spent a lot of time lately developing a robust model, and things are going in the right direction. Here is an example of the P&L from one of the better models. It does not account for transaction costs and slippage yet, but the good thing is that the result is not optimized in any way.
My ways of avoiding overfitting are:
  • making a hypothesis about an underlying phenomenon and testing it on multiple datasets. Only if the existence of the phenomenon is confirmed can a robust model be built.
  • avoiding models that produce a negative Sharpe ratio for any of the model settings.
  • optimizing the data instead of the model. In other words, if a general model does not work on your data, go find better data.

Thursday, February 25, 2010

Using VIX for volatility correction

Market-neutral strategies often rely on a relative mispricing of two instruments. One of the challenges I've been facing is how to keep this mispricing measure stable over time, allowing it to stretch much further before initiating a trade in times of high market volatility.
A solution I've come up with is using the VIX index as a correction measure. It seems to work much better than a moving-window estimate based on the data itself. The formula I'm using for the correction is C = (100 - VIX) / 100. The spread is then multiplied by C.
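In code it is as trivial as it sounds (a sketch; 'spread' and 'vix' are assumed to be date-aligned column vectors):

C               = (100 - vix) / 100;     % correction factor, e.g. VIX = 40 -> C = 0.6
correctedSpread = spread .* C;           % damps the spread when the VIX is high

plot([spread correctedSpread]);
legend('raw spread', 'corrected spread');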
 
Upper graph: VIX; lower graph: real and corrected spreads.

Notice how the corrected spread remains stable through the end of 2008.

P.S. The spread shown is a relative mispricing over time and not an actual X/Y ratio.

Wednesday, February 17, 2010

Probability mapping (2)

While building a model based on the probability mapping from the previous post I wasn't quite satisfied with the initial results. So I took a step back to a one-dimensional state space and plotted the 5-day forecast of the XLE/XOM ratio against the Bollinger %b. Zero on the x-axis corresponds to a -3 sigma deviation and 100 to +3 sigma. The y-axis shows the ratio between XLE/XOM 5 days into the future and today.


The figure shows just how difficult it is to forecast the movement of the ratio. Ideally, the data should follow a band from the upper left to the lower right corner. Instead, it is quite hard to see any trend. A linear fit (red line) does still find some. Note that around a %b of 50 it is exactly a coin flip between an increase and a decrease in XLE/XOM.
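For reference, the scatter and the linear fit can be produced roughly like this (a sketch; 'xle' and 'xom' are assumed daily closing price vectors of equal length, and the 20-day window is just an illustrative choice):

ratio = xle ./ xom;
win   = 20;  horizon = 5;  nSigma = 3;

n     = numel(ratio);
pct_b = nan(n,1);
for i = win:n
    w        = ratio(i-win+1:i);
    m        = mean(w);  s = std(w);
    pct_b(i) = 100 * (ratio(i) - (m - nSigma*s)) / (2*nSigma*s);  % 0 = -3 sigma, 100 = +3 sigma
end

fwd              = nan(n,1);
fwd(1:n-horizon) = ratio(1+horizon:n) ./ ratio(1:n-horizon);      % ratio 5 days ahead / today

valid = ~isnan(pct_b) & ~isnan(fwd);
p     = polyfit(pct_b(valid), fwd(valid), 1);                     % linear fit (red line)
scatter(pct_b(valid), fwd(valid), 5); hold on;
plot(0:100, polyval(p, 0:100), 'r');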

Tuesday, February 16, 2010

Probability mapping

Here is a nice idea I got during cross-country skiing this weekend. The 'classic' way of trading pairs is to define some measure of divergence from the mean, such as the z-score. Outside some threshold, a buy or sell signal is triggered. This got me thinking about what we are actually doing here. In essence, we are using a linear classifier in a 1-D space. By optimizing the model, the classifier is trained to an optimal threshold. People familiar with pattern recognition know that the linear classifier is the most basic and limited tool there is.
Having quite a background in Q-learning (my master's thesis), I understand its beautiful ability to map a state space to an expected reward, elegantly and without any models. Doesn't trading in general boil down to state-reward mapping? For sure!
I have tried different implementations of reinforcement learning without much success. But this weekend I managed to combine some ideas from pattern recognition and RL into an implementation that could work.
Some people are probably wondering by now: 'Here we go with the AI bullshit again!' I'd prefer to call it probability mapping. Besides, many advanced quantitative traders are probably already using it. It's all about estimating the chances of a bet under a certain set of conditions.
First, I define the conditions, called 'feature 1' and 'feature 2'. Two features mean a two-dimensional feature space; nothing keeps us from using more (or fewer) dimensions, but since my monitor is best at visualizing two dimensions, I'll stick with that number. In my case, both features are oscillators based on cumulative returns over the past x days. Feature 1 uses 3-day averaging and feature 2 uses 20 days. Any other measure could be used (RSI, stochastics, etc.).
In the figure above the ratio between XLE and XOM is plotted along with the two oscillators. It is clear that a high oscillator value correlates with a subsequent drop in the ratio.
Normally you could start applying threshold conditions based on the oscillator levels from here, but I want to go a couple of steps further.

So now I plot my state map for all values of features 1 and 2, along with the corresponding future XLE/XOM ratio after 5 days. A green dot represents an increase and a red dot a decrease of the XLE/XOM ratio.
From this map an estimate can be made of how likely the ratio is to go up or down for each combination of the features. Let's call it the 'Sharpe surface'. I define it similarly to the Sharpe ratio: mean(20 nearest neighbors) / std(20 nearest neighbors).
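A sketch of how the surface can be computed (the feature vectors f1, f2 and the corresponding future return fwdRet are assumed inputs of equal length; 20 neighbors, as above):

k        = 20;
[g1, g2] = meshgrid(0:2:100, 0:2:100);              % grid over the feature space
surfVal  = zeros(size(g1));

for i = 1:numel(g1)
    d          = (f1 - g1(i)).^2 + (f2 - g2(i)).^2; % squared distance to the grid point
    [ds, idx]  = sort(d);
    nn         = fwdRet(idx(1:k));                  % the k nearest observations
    surfVal(i) = mean(nn) / std(nn);                % Sharpe-like score
end

imagesc(0:2:100, 0:2:100, surfVal); axis xy; colorbar;
xlabel('feature 1'); ylabel('feature 2');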
 

Scanning the feature space gives me the nice plot above (note that the vertical axis is flipped compared to the previous figure).

The interpretation of the Sharpe surface is very simple: expect the XLE/XOM ratio to rise in the red areas and drop in the blue ones. This is very much in line with common sense: low values of both features correspond to an anticipated increase of the ratio (see feature 2 < 20 or feature 1 < 20).
Again, normally we would use just one of the dimensions and put thresholds somewhere around 20 and 80. But with this probability mapping we can go for cherry picking!

Any remarks about the code are very welcome.

Saturday, January 30, 2010

Which sharpe is good enough?

I've made some progress designing a market-neutral strategy. My test set is the XLE and its component stocks (2001-2009). The strategy trades each stock against the XLE using Bollinger bands. A trade is entered when the z-score of the spread between the XLE and a stock exceeds 2. A position is closed when the spread crosses zero. As a safety measure, a 5% stop-loss is implemented together with a maximum holding period of 60 days. Transaction costs are 0.2% one way.
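The entry/exit logic for a single stock against the ETF looks roughly like this (a sketch; 'stock' and 'etf' are assumed date-aligned daily price vectors, and the 60-day z-score window is just an illustrative fixed value):

win      = 60;        % rolling z-score window (fixed; value is illustrative)
zEntry   = 2;         % entry threshold
stopLoss = 0.05;      % 5% stop-loss
maxHold  = 60;        % maximum holding period in days

spread = log(stock) - log(etf);                  % log spread of stock vs. ETF
n      = numel(spread);
z      = nan(n,1);
for i = win:n
    w    = spread(i-win+1:i);
    z(i) = (spread(i) - mean(w)) / std(w);
end

pos = 0; entryIdx = 0; signal = zeros(n,1);      % position: +1, -1 or flat
for i = win:n
    if pos == 0 && abs(z(i)) > zEntry
        pos = -sign(z(i));  entryIdx = i;        % fade the stretched spread
    elseif pos ~= 0
        pnl = pos * (spread(i) - spread(entryIdx));   % approximate return of the position
        if sign(z(i)) ~= sign(z(entryIdx)) ...        % spread (z-score) crossed zero
                || pnl < -stopLoss ...                % stop-loss hit
                || i - entryIdx >= maxHold            % held too long
            pos = 0;
        end
    end
    signal(i) = pos;
end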
Generally I get Sharpe ratios around 0.8 without optimization (the z-score threshold and window size are fixed).

Now I'm unsure how far I can push this simple stock-index arbitrage. If I start optimizing for a specific stock, overfitting seems to set in very fast.

I still see some possible improvements to the strategy, but I'm not sure if I'll be able to push the Sharpe ratio above 1. There should be a limit to how much profit you can squeeze out of stock-ETF arbitrage anyway.

The question I'm asking myself now is: 'Is this good enough?' Apart from personal risk preferences, I'm very curious about the results achieved by others.

Which Sharpe ratio is 'good enough' for stock-index arbitrage in your opinion?