High-Frequency Automated Trading (HFAT); Part 1

Backfilled vs Real-Time Data Issues

If you are not sure what HFAT (High Frequency Automated Trading) is all about, please Google the topic. This post highlights some of the problems you may encounter when venturing into HFAT of stocks. The views expressed here are based on personal experiences and/or may be anecdotal; not everything that happens in real-time trading is easy to explain. If you have technical insight and see inaccuracies, please comment for the benefit of future readers.

Designing and implementing high frequency trading systems is, from a trader’s viewpoint, probably the ultimate experience. To see and hear trades executed every few seconds and see the profits rolling in should give any trader an unprecedented high.

The catch is that designing an HFAT system that works with real money is very different from designing one using local data. The smaller the time frame, the greater the impact of small data discrepancies. In sub-minute time frames, your HFAT system may perform very differently with local data than with real-time, raw market data.

A typical problem is that real-time live market data are delayed by up to several hundred milliseconds and that quotes may arrive out of sequence. What you see on your charts may be several quotes behind the trade that actually took place. This flawed data is what your trading system trades on, and the system must be designed to work with it. The charts you see in AmiBroker are mostly backfilled and/or updated after trading hours. At the end of a trading day, your database will hold a mixture of backfilled data (in which time and data errors have been corrected) and raw, flawed data collected during the current day’s trading session. You may also have several lengthy data gaps that were introduced when you shut down the system and/or lost your data feed.
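One way to check a day's raw database for such gaps is to scan the bar timestamps for unusually long jumps. The following is only a minimal Python sketch: the function name, the list of bar start times, and the 60-second threshold are all assumptions made for illustration, and for a thinly traded symbol a missing bar may simply mean that no trade occurred.

```python
def find_gaps(bar_times, min_gap_seconds=60):
    """Return (last_bar_before_gap, first_bar_after_gap) pairs.

    bar_times is a sorted list of bar start times in seconds.
    min_gap_seconds is a hypothetical threshold; tune it to the symbol.
    """
    gaps = []
    for previous, current in zip(bar_times, bar_times[1:]):
        if current - previous >= min_gap_seconds:
            gaps.append((previous, current))
    return gaps


# Hypothetical 5-second bars with a feed outage between 15 s and 300 s.
bar_times = [0, 5, 10, 15, 300, 305, 310]
gaps = find_gaps(bar_times)
print(gaps)
```

Days with long gaps can then be excluded from development, or at least flagged before you draw conclusions from a backtest over them.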

While the procedure may vary for different data providers, quotes received in real time will lag the market. Since bar periods during live data collection are based on your computer clock, quotes may end up in the next bar due to their delayed arrival. The data used to backfill your database come from a different data server and are time-stamped. This allows AmiBroker to correct the position of quotes that were received out of sequence. This process removes the real-time delays that were present when the data were received.

Since there are no delays with backfilled data, your backfilled data look ahead by several hundred milliseconds with respect to the data you will eventually be trading.
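The effect can be illustrated with a small Python sketch. It is not how AmiBroker or any data plug-in actually builds bars; the quote tuples, the 0.3-second delay, and the function names are assumptions chosen only to show how one late quote moves from one 5-second bar to the next.

```python
from collections import defaultdict

BAR_SECONDS = 5  # bar length used in this article's examples


def bar_start(t, bar_seconds=BAR_SECONDS):
    """Return the start time (in seconds) of the bar that contains time t."""
    return int(t // bar_seconds) * bar_seconds


def build_bars(quotes, use_arrival_time):
    """Group quote prices into bars keyed by bar start time.

    Each quote is (exchange_time, arrival_time, price), times in seconds.
    use_arrival_time=True mimics live collection on the local clock;
    use_arrival_time=False mimics time-stamped backfill.
    """
    bars = defaultdict(list)
    for exchange_time, arrival_time, price in quotes:
        t = arrival_time if use_arrival_time else exchange_time
        bars[bar_start(t)].append(price)
    return dict(bars)


# Hypothetical quotes: the third one trades at 4.9 s but arrives at 5.2 s.
quotes = [
    (1.0, 1.2, 100.00),
    (3.5, 3.8, 100.05),
    (4.9, 5.2, 100.20),
    (6.0, 6.3, 100.10),
]

live_bars = build_bars(quotes, use_arrival_time=True)
backfilled_bars = build_bars(quotes, use_arrival_time=False)

print("live close of first bar:      ", live_bars[0][-1])
print("backfilled close of first bar:", backfilled_bars[0][-1])
```

In this made-up example the first bar closes at 100.05 when collected live but at 100.20 after backfill, so a system triggered on the bar close would see a different signal in each case.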

It is not unusual to develop a system with 5-second backfilled data (where all bad ticks and time-stamp errors have been corrected by the data provider) and obtain Holy Grail performance, only to find out that when traded with real-time streaming data (where the data are delayed and contain bad ticks and time-stamp errors), the system is a total failure. The following charts illustrate this problem. The data to the left of the red line is backfilled and the data to the right of the red line was collected in real time. White is the equity.

You will not be able to visually judge whether data are backfilled or raw. The differences will only show up when you run a trading system on the data; your trading system may be the only way to distinguish between backfilled and raw data. The chart below shows a close-up of the data change.

Backfilling the above database and performing another Backtest over the same period produces a very different equity:

There is no guarantee that a system developed on one type of data will perform equally well with the other. When you first encounter a major equity drawdown, you may assume that this was just “a bad day”; after all, all trading systems have them. You may have developed and backtested your system over thousands of trades, covering a period of six months or more. You have been a good student and have used all the recommended methods to validate your trading system. You have tested in- and out-of-sample, applied intelligent optimizations, used Walk-Forward testing, performed Monte-Carlo analysis, and the list goes on. After being so thorough, how could you go wrong? You are ready to trade real money tomorrow and make your first 50% in one day!

The point is that all this effort is wasted time if the data used during development aren’t 100% identical to what you will be trading with.

The best way to develop an HFAT system is to use real live market data. The earlier you change from local or edemo data to real data, the more time you will save, and the more disappointments you will be spared. An HFAT system can never be completed offline, with a local database, or with simulated edemo data. Its design must always include a significant paper-trading and real-money phase.

Another problem when trading your IB paper-trading (simulated) account is that you do not know the rules Interactive Brokers uses to decide whether an order should be executed. These execution criteria may change without warning. They impose an artificial pattern on your executions, so the simulated market conditions will differ from those encountered in real trading. You may well develop a trading system that exploits IB’s way of processing orders and shows unrealistic performance, but such a system would fail in real trading.

Also, your paper trades are not seen by, and cannot influence, the market. When trading real money, your orders could be setting a new High or Low, or, if you are trading large sums, you could move the price up or down. This means that even if your system performs extremely well in simulated trading, there is no guarantee that it will perform well trading real money.

Using your simulated account to validate your system should never be your final validation before trading for profits; you should always include a real-money evaluation phase in your development plan. Your first real trades should never be to make money; they should be planned to validate your system under varied conditions.

Market behavior is very complex; be prepared for the unexpected and never skip a development step because something works extremely well. For example, you might be testing your system using your IB simulated paper-trading account and see your profits skyrocket too fast to follow, perhaps having 90% winners and RARs that are out of this world. When this happens, it is extremely exciting and fun to watch; it is a rare experience that must be appreciated. It suggests that Holy Grails are possible. But are they? Such favorable trading conditions may last for a few hundred trades, a few hours, or perhaps a few days. This can happen when technical conditions and market behavior are all just perfect for your system. Some unknown factor just made everything work perfectly. When you experience this, you’ll be analyzing your charts, trading log, execution report, etc. for weeks afterwards. The fact is that it may never happen again, and you may never know what really happened.

Order and Position Status

IB position size reporting may be erratic, is always delayed, and may include transient information. If you are trading fast and you use the IB position size to determine your next action, this will be a problem. This is especially the case with reversal systems, where Covers may be processed before the Buys and there may be many partial fills. For example, if you are reversing 100 shares, going alternately Long and Short, you might read position sizes of 0, 100, 200, and even 300 shares. Do not base your system’s action solely on a single reading of the position size; if you do, your protective mechanisms will shut down your system many times a day. If a position is not what it is supposed to be on 5 consecutive queries (at quote interval), you may want to close all positions, suspend operation and continue later, or shut down the system and retry later.
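The "5 consecutive queries" rule can be sketched as a small counter that resets whenever a reading matches the expected position. This is a Python illustration only; the class and method names are hypothetical and are not part of any IB or AmiBroker interface, and the reported values would come from whatever position query your own system uses.

```python
class PositionMonitor:
    """Flag a position mismatch only after it persists for N consecutive readings."""

    def __init__(self, max_mismatches=5):
        self.max_mismatches = max_mismatches
        self.mismatch_count = 0

    def update(self, expected, reported):
        """Record one reading; return True when protective action is due."""
        if reported == expected:
            # A single correct reading clears any transient mismatches.
            self.mismatch_count = 0
        else:
            self.mismatch_count += 1
        return self.mismatch_count >= self.max_mismatches


# Hypothetical readings while the system expects to be long 100 shares.
monitor = PositionMonitor(max_mismatches=5)
for reported in [0, 200, 100, 300, 100]:
    if monitor.update(expected=100, reported=reported):
        print("Persistent mismatch: close positions or suspend trading")
```

With the transient readings above the monitor never triggers, because a correct reading arrives before five mismatches accumulate. It would trigger only if the reported size stayed wrong for five queries in a row.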

Reporting of order status appears more reliable and stable. Usually it seems unnecessary to repeat Order Status queries.

IB Snapshots

Snapshots are not addressed in this post; however, it is extremely important for real-time traders to understand how IB compresses and transmits its data. This topic has been discussed on several forums. For more information on IB data in general, please read the following threads:

AmiBroker User Group: Interactive Brokers Plug-in dropping volume data
IB’s Discussion Board: Globex Ticks snapshot or reality?
AmiBroker User Group: AB Tick Bar Analysis

The IB Maximum Message Rate

IB limits the rate at which you can transmit order-related messages. The rate of queries is not limited. The current limit is 50 messages per second. If you exceed this rate, IB will produce an error code, and if you continue to exceed the message rate, IB will suspend your connection. This, of course, should be prevented at all costs. The message rates are documented here. How to introduce real-time delays measured in milliseconds is documented in the post on High Precision Interval and Delay Timing.
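A simple way to stay under such a limit is to remember when recent messages were sent and to wait whenever the rolling one-second window is full. The Python sketch below is an illustration under stated assumptions: the class is hypothetical, the 50-per-second figure is simply the limit quoted above, and the choice of 45 in the usage lines is an arbitrary safety margin.

```python
import time
from collections import deque


class MessageThrottle:
    """Keep the number of messages sent in any rolling window under a limit."""

    def __init__(self, max_messages=50, window=1.0, clock=time.monotonic):
        self.max_messages = max_messages
        self.window = window      # window length in seconds
        self.clock = clock        # injectable clock, useful for testing
        self.sent = deque()       # send times of recent messages

    def wait_time(self):
        """Seconds to wait before the next message may be sent (0.0 if none)."""
        now = self.clock()
        # Forget messages that have left the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_messages:
            return 0.0
        return self.window - (now - self.sent[0])

    def record_send(self):
        """Call immediately after transmitting an order-related message."""
        self.sent.append(self.clock())


# Usage sketch: leave some margin below the quoted limit.
throttle = MessageThrottle(max_messages=45)
delay = throttle.wait_time()
if delay > 0:
    time.sleep(delay)
# ... transmit the order here with your own order-placement call ...
throttle.record_send()
```

Only order-related messages need to pass through the throttle, since the article notes that queries are not rate limited.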

Internet Delays

Order and position status are subject to an Internet delay of 50 to 400 milliseconds. This delay will vary with your location and the type of Internet connection you have to IB. You can test this delay by pinging the IB server. To do this on Windows XP, open Start->Run, type ping gw1.ibllc.com, and click OK. A window, as shown below, will appear and show you the delays for three consecutive queries (pings) to IB:

If you encounter excessive delays or cannot connect at all, you can get more details about how your connection is routed by running tracert gw1.ibllc.com in the same manner. You may want to browse the Technical FAQ at IB for related items.
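If you want to log these delays over time rather than read them off the screen, the ping output can be parsed by a script. The following Python sketch is one possible approach, not a tested monitoring tool: the function names are hypothetical, the sample lines are illustrative rather than a real measurement, and the regular expression assumes the usual "time=85ms" (Windows) or "time=45.3 ms" (Linux/macOS) output format.

```python
import platform
import re
import subprocess


def parse_ping_times(output):
    """Extract round-trip times in milliseconds from ping output text."""
    return [float(ms) for ms in re.findall(r"time[=<]([\d.]+)\s*ms", output)]


def ping_times(host, count=4):
    """Run the system ping command and return the round-trip times in ms."""
    # Windows uses -n for the number of pings; most other systems use -c.
    flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", flag, str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_ping_times(result.stdout)


# Illustrative output lines only, not a real measurement.
sample = (
    "Reply from 192.0.2.1: bytes=32 time=85ms TTL=54\n"
    "Reply from 192.0.2.1: bytes=32 time=92ms TTL=54\n"
)
times = parse_ping_times(sample)
print("min/max/avg ms:", min(times), max(times), sum(times) / len(times))
```

Calling ping_times("gw1.ibllc.com") at intervals during the trading day would give you a record of how the delay varies. An empty list means that no replies were received.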

Edited by Al Venosa.

8 Responses to “High-Frequency Automated Trading (HFAT); Part 1”

  1. December 1st, 2007 | 11:00 am

    It seems to me that this:

    It is not unusual to develop a system with 5-second local data and obtain Holy Grail performance only to find out that, after a backfill of data, the system is a total failure.

    should read like this instead:

    It is not unusual to develop a system with 5-second local data (backfilled, timestamped, analyzed with zero delay) and obtain Holy Grail performance only to find out that when traded in realtime (streaming data, non-zero delay), the system is a total failure.

    Is the first paragraph above really meaning to say the same thing as the 2nd? Or is it saying something else?

  2. December 1st, 2007 | 11:07 am

    BTW, I think that this is a great discussion/illustration of a critical topic for HFAT!

  3. Herman
    December 1st, 2007 | 12:11 pm

    Thank you Progster, you are correct and I revised the sentence.

  4. December 7th, 2007 | 10:25 am

    I really like this article. It repays repeated reading!

    “Since bar periods during live data collection are based on your computer clock, quotes may end up in the next bar due to their delayed arrival.”

    This is a general and eye-catching statement. Is it a property of the data feed being discussed, a property of AB realtime bar building, or a combination of the two? Would this be true if using ESignal instead of IB? (etc.)

    IOW, are there not timestamps on the streaming data as it arrives? Are those timestamps not used by AB when building bars?

    Is there a way to save a day of streaming data, then set it aside, get the backfill, save that, and then compare the two (either visually on a chart, or in Excel)?

    Can a screen movie be used to demonstrate that bar building is based on local clock rather than timestamp of arriving data?

  5. Herman
    December 7th, 2007 | 11:17 am

    Hello Progster,

    As I explained at the start of the article: “The views expressed here are based on personal experiences and/or may be anecdotal”. I am also indebted to my trading partners who helped me put this topic into context with practical observations. Personally, I am by no means an expert on how data are processed between the time the actual trade takes place at the market and the time we see it on our chart. My experience tells me that one could write a book on this. There are many absolutely critical aspects of real-time stock data collection and trading. Most are not covered anywhere.

    AFAIK, live IB data doesn’t have a timestamp but IB Backfill data does (the data comes from different servers). eSignal timestamps its data but that doesn’t help because eSignal is subject to delayed market data just like anyone else. Live data should only contain as-it-becomes-available data; it should never be corrected on the fly, or else our charts and trading signals would change retroactively. The pleasure of changing signals/prices is reserved for after backfill :-) With regard to the deceptive HG performance with 5-second backfilled data, this is true for IB as well as eSignal, and undoubtedly for all other data providers.

    Yes, you can collect raw data and compare it to backfilled data (that is how I produced the charts). In AmiBroker, Backfill cannot be turned OFF during live trading, so you have to be very careful that you do not inadvertently corrupt your raw data with backfill data. To facilitate real-time system development I create a new 5-second database each day and set it to Local before shutting down. Such personal raw databases are vital for real-time system development and cannot be purchased or downloaded from anywhere.


  6. December 7th, 2007 | 1:38 pm

    Creating a new 5-sec database each day …

    That’s serious business! Do you define it prior to the session start so that the first tick in the DB is the first tick of the session? Otherwise there would be backfill present as you mention, yes?

    So if you are successful, you are accumulating dozens and ultimately hundreds of days of data, each in its own database. Is there any way to knit these together? If not, do you test your strategies on each day individually and then somehow aggregate the results?

    It seems the issues you are raising might possibly make a good argument for the ability to optionally disable all backfill for a database.

  7. Herman
    December 7th, 2007 | 2:01 pm

    Databases can be merged; however, if you make hundreds of trades a day you may only need a few days of data to develop with. Even so, it is a good idea to back up old data because things will happen that you want to analyze later. To prevent backfill during a new day, set the DB to 24 hours and start the system before the market opens, but develop with RTH data. Serious development should be done with live data.

  8. December 11th, 2007 | 11:29 am

    RTH = regular trading hours (i.e. the day session)
    (I had to look it up … :^)