November 28, 2007
Backfilled vs Real-Time Data issues
If you are not sure what HFAT (High Frequency Automated Trading) is all about, please Google the topic. This post highlights some of the problems you may encounter when venturing into HFAT of stocks. The views expressed here are based on personal experiences and/or may be anecdotal; not everything that happens in real-time trading is easy to explain. If you have technical insight and see inaccuracies, please comment for the benefit of future readers.
Designing and implementing high frequency trading systems is, from a trader’s viewpoint, probably the ultimate experience. To see and hear trades executed every few seconds and see the profits rolling in should give any trader an unprecedented high.
The catch is that to design an HFAT system that works with real money is very different from designing one using local data. The smaller the time frame the greater the impact of small data discrepancies. In sub-minute time frames, your HFAT system may perform very differently with local data than with real-time, raw market data.
A typical problem is that real-time live market data are delayed by up to several hundred milliseconds and that quotes may arrive out of sequence. What you see on your charts may be several quotes after the trade took place. This flawed data is what your trading system is trading and must be designed to work with. The charts you see in AmiBroker are mostly backfilled and/or updated after trading hours. At the end of a trading day, you will have data in your database that have a mixture of backfilled data (time and data errors have been corrected) and raw data (flawed) that were collected during the current day’s trading session. You may also have several lengthy data gaps that were introduced when you shut down the system and/or you lost your data feed.
While the procedure may vary for different data providers, quotes that are received in real time will lag in time. Since bar periods during live data collection are based on your computer clock, quotes may end up in the next bar due to their delayed arrival. The data used to backfill your database come from a different data server and will be time stamped. This allows AmiBroker to correct the position of quotes that were received out of sequence. This process removes the real time delays that were present when the data was received.
Since there are no delays with backfilled data, your backfilled data look ahead by several hundred milliseconds with respect to the data you will eventually be trading.
It is not unusual to develop a system with 5-second backfilled data (where all bad ticks and time-stamp errors have been corrected by the data provider) and obtain Holy Grail performance only to find out that when traded with real-time streaming data (where the data is delayed, contains bad ticks and time-stamp errors), the system is a total failure. The following charts illustrate this problem. The data to the left of the red line is backfilled and the data to the right of the Red line is data collected in real time. White is the equity.
You will not be able to visually judge whether data are backfilled or raw. The differences will only show up by running a trading system on the data; your trading system may be the only way to distinguish between backfilled and raw data. The chart below shows a close up of the data change.
Backfilling the above database and performing another Backtest over the same period produces a very different equity:
There is no guarantee that a system developed on one type of data will perform equally well with the other. When you first encounter a major equity drawdown, you may assume that this was just “a bad day”; after all, all trading systems have them. You may have developed and backtested your system over thousands of trades, covering a period of six months or more. You have been a good student and have used all the recommended methods to validate your trading system. You have tested in- and out-of-sample, applied intelligent optimizations, used Walk-Forward testing, performed Monte-Carlo analysis, and the list goes on. After being so thorough, how could you go wrong? You are ready to trade real money tomorrow and make your first 50% in one day!
The point is that all this effort is wasted time if the data used during development aren’t 100% identical to what you will be trading with.
The best way to develop an HFAT system is to use real live market data. The earlier you change from local or edemo data to real data, the more time you will save, and the more disappointments you will be spared. An HFAT system can never be completed off line, with a local database, or with simulated edemo Data. Its design must always include a significant paper trading and real-money phase.
Another problem when trading your IB paper-trading (simulated) account is that the user does not know the rules Interactive Brokers uses to decide whether an order should be executed or not. These execution criteria may change without warning. This imposes an artificial order to your executions that is unreal; the simulated market conditions will be different from those encountered in real trading. You may well develop a trading system that exploits IB’s way of processing to give you unreal performance, but such a system would fail in real trading.
Also, your paper-trades are not seen by, and cannot influence, the market. When trading real money your orders could be setting a new High or a Low, or if you are trading large sums, you could draw the price up or down. This means that even if your system performs extremely well in simulated trading, this is no guarantee that your system will perform well trading real money.
Using your simulated account to validate your system should never be your final validation before trading for profits; you should always include a real-money evaluation phase in your development plan. Your first real trades should never be to make money; they should be planned to validate your system under varied conditions.
Market behavior is very complex; be prepared for the unexpected and never skip a development step because something works extremely well. For example you might be testing your system using your IB simulated paper-trading account and see your profits skyrocket too fast to follow, perhaps having 90% winners and RARs that are out of this world. When this happens, it is extremely exciting and fun to watch; it is a rare experience that must be appreciated. It suggests that Holy-Grails are possible. But are they? Such favorable trading conditions may last for a few hundred trades, a few hours, or perhaps a few days. This can happen when technical conditions and market behavior are all just perfect for your system. Some unknown factor just made everything work perfectly. When you experience this, you’ll be analyzing your charts, trading log, execution report, etc. for weeks to follow. The fact is that it may never happen again, and you may never know what really happened.
Order and Position Status
IB Position size reporting may be erratic, is always delayed, and may include transient information. If you are trading fast and you use the IB Position Size to determine your next action, this will be a problem. This is especially the case with reversal systems where Covers may be processed before the Buys, and there may be many partial fills. For example, if you are reversing 100 shares, going alternatively Long and Short, you might read position sizes of 0, 100, 200, and even 300 shares. Do not base your system’s action solely on a single reading of the position size; your protective mechanisms will shut down your system many times a day. If a position is not what it is supposed to be on 5 consecutive queries (at quote interval), you may want to close all positions, suspend operation and continue later, or shut down the system and retry later.
Reporting of order status appears more reliable and stable. Usually it seems unnecessary to repeat Order Status queries.
Not addressed in this post is the matter of Snapshots however it is extremely important for real-time traders to understand how IB compresses and transmits its data. This topic has been discussed on several forums, for more information on IB data in general please read the following threads:
The IB Maximum Message Rate
IB has a limit to the maximum rate of messages (order related) you can transmit per second. The rate of queries is not limited. The current limit is 50 messages per second. If you exceed this rate, IB will produce an error code, and if you continue to exceed the message rate, IB will suspend your connection. This, of course, should be prevented at all cost. The message rates are documented here. How to introduce real-time delays measuring in milliseconds is documented in the post on High Precision Interval and Delay Timing.
Order and Position status is subject to a 50-400 milliseconds Internet delay. This delay will vary with your location and type of Internet connection to IB. You can test this delay by pinging the IB server. To do this type ping gw1.ibllc.com on command in the Start->Run windows (for Windows XP), and click the run button. A window, as shown below, will appear and show you the delays for three consecutive queries (pings) to IB:
If you encounter excessive delays or cannot connect at all, you can get more details about how your connection is routed by running tracert gw1.ibllc.com in the same manner. You may want to browse the Technical FAQ at IB for related items.
Edited by Al Venosa.