Installation and Deployment
Installation Guides
For installation instructions, please see the Installation tutorial for your platform.
Architecture
QuantRocket utilizes a Docker-based microservice architecture. Users who are unfamiliar with microservices or new to Docker may find it helpful to read the overview of QuantRocket's architecture.
License key
Activation
To activate QuantRocket, look up your license key on your account page and enter it in your deployment:
$ quantrocket license set 'XXXXXXXXXXXXXXXX'
account:
  account_limit: 100000 USD
exchanges: ASX
licensekey: XXXXXXXXXXXXXXXX
plan:
  currency: USD
  name: Bottle Rocket
>>> from quantrocket.license import set_license
>>> set_license("XXXXXXXXXXXXXXXX")
{'licensekey': 'XXXXXXXXXXXXXXXX',
'account': {'account_limit': '100000 USD'},
'exchanges': 'ASX',
'plan': {'name': 'Bottle Rocket', 'currency': 'USD'}}
$ curl -X PUT 'http://houston/license-service/license/XXXXXXXXXXXXXXXX'
{"licensekey": "XXXXXXXXXXXXXXXX", "account": {"account_limit": "100000 USD"}, "exchanges": "ASX", "plan": {"name": "Bottle Rocket", "currency": "USD"}}
View your license
You can view the details of the currently installed license:
$ quantrocket license get
account:
  account_limit: 100000 USD
exchanges: ASX
licensekey: XXXXXXXXXXXXXXXX
plan:
  currency: USD
  name: Bottle Rocket
>>> from quantrocket.license import get_license_profile
>>> get_license_profile()
{'licensekey': 'XXXXXXXXXXXXXXXX',
'account': {'account_limit': '100000 USD'},
'exchanges': 'ASX',
'plan': {'name': 'Bottle Rocket', 'currency': 'USD'}}
$ curl -X GET 'http://houston/license-service/license'
{"licensekey": "XXXXXXXXXXXXXXXX", "account": {"account_limit": "100000 USD"}, "exchanges": "ASX", "plan": {"name": "Bottle Rocket", "currency": "USD"}}
Refresh license profile
The license service will re-query your subscriptions and permissions every 10 minutes. If you make a change to your billing plan and want your deployment to see the change immediately, you can force a refresh:
$ quantrocket license get --force-refresh
account:
  account_limit: 100000 USD
exchanges: ASX
licensekey: XXXXXXXXXXXXXXXX
plan:
  currency: USD
  name: Bottle Rocket
>>> from quantrocket.license import get_license_profile
>>> get_license_profile(force_refresh=True)
{'licensekey': 'XXXXXXXXXXXXXXXX',
'account': {'account_limit': '100000 USD'},
'exchanges': 'ASX',
'plan': {'name': 'Bottle Rocket', 'currency': 'USD'}}
$ curl -X GET 'http://houston/license-service/license?force_refresh=true'
{"licensekey": "XXXXXXXXXXXXXXXX", "account": {"account_limit": "100000 USD"}, "exchanges": "ASX", "plan": {"name": "Bottle Rocket", "currency": "USD"}}
Account limit validation
For subscription plans with an account limit, the account limit applies to live trading using the blotter and to real-time data. The account limit does not apply to historical data collection, research, or backtesting. For advisor accounts, the account size is the sum of all master and sub-accounts.
Paper trading is not subject to the account limit; however, paper trading requires that the live account limit has previously been validated. Thus, before paper trading, it is first necessary to connect your live account at least once and let the software validate it.
To validate your account limit if you have only connected your paper account:
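One approach, sketched below with commands covered later in this guide, is to temporarily switch IB Gateway to live trading mode and start it so your live account can be queried. (This sketch assumes a gateway named ibg1 for which you entered your live username with the paper trading mode, as described under "Switch between live and paper account" below.)
$ quantrocket launchpad credentials 'ibg1' --live
status: successfully set ibg1 credentials
$ quantrocket launchpad start --wait
ibg1:
  status: running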
To verify that account validation has occurred, refresh your license profile. It should now display your account balance and whether the balance is under the account limit:
$ quantrocket license get --force-refresh
account:
  account_balance: 73716.57 USD
  account_balance_details:
    - Account: U12345
      Currency: USD
      NetLiquidation: 73716.57
  account_balance_under_limit: true
  account_limit: 100000 USD
exchanges: ASX
licensekey: XXXXXXXXXXXXXXXX
plan:
  currency: USD
  name: Bottle Rocket
>>> from quantrocket.license import get_license_profile
>>> get_license_profile(force_refresh=True)
{'licensekey': 'XXXXXXXXXXXXXXXX',
'account': {'account_limit': '100000 USD',
'account_balance': '73716.57 USD',
'account_balance_under_limit': True,
'account_balance_details': [{'Account': 'U12345',
'Currency': 'USD',
'NetLiquidation': 73716.57}]},
'exchanges': 'ASX',
'plan': {'name': 'Bottle Rocket', 'currency': 'USD'}}
$ curl -X GET 'http://houston/license-service/license?force_refresh=true'
{"licensekey": "XXXXXXXXXXXXXXXX", "account": {"account_limit": "100000 USD", "account_balance": "73716.57 USD", "account_balance_under_limit": true, "account_balance_details": [{"Account": "U12345", "Currency": "USD", "NetLiquidation": 73716.57}]}, "exchanges": "ASX", "plan": {"name": "Bottle Rocket", "currency": "USD"}}
If the command output is missing the account_balance and account_balance_under_limit keys, this indicates that the account limit has not yet been validated.
Now you can switch back to your paper account and begin paper trading.
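Switching back to the paper trading mode doesn't require re-entering your password (a sketch assuming the ibg1 gateway):
$ quantrocket launchpad credentials 'ibg1' --paper
status: successfully set ibg1 credentials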
IB account structure
Multiple logins and data concurrency
The structure of your IB account has a bearing on the speed with which you can collect real-time and historical data with QuantRocket. In short, the more IB Gateways you run, the more data you can collect. The basics of account structure and data concurrency are outlined below:
- All interaction with the IB servers, including real-time and historical data collection, is routed through IB Gateway, IB's slimmed-down version of Trader Workstation.
- IB imposes rate limits on the amount of historical and real-time data that can be received through IB Gateway.
- Each IB Gateway is tied to a particular set of login credentials. Each login can be running only one active IB Gateway session at any given time.
- However, an account holder can have multiple logins—at least two logins or possibly more, depending on the account structure. Each login can run its own IB Gateway session. In this way, an account holder can potentially run multiple instances of IB Gateway simultaneously.
- QuantRocket is designed to take advantage of multiple IB Gateways. When running multiple gateways, QuantRocket will spread your market data requests among the connected gateways.
- Since each instance of IB Gateway is rate limited separately by IB, the combined data throughput from splitting requests between two IB Gateways is twice that of sending all requests to a single IB Gateway.
- Each separate login must separately subscribe to the relevant market data in IB Client Portal. (This refers to IB market data subscriptions, not QuantRocket exchange permissions.)
Below are a few common ways to obtain additional logins.
IB account structures are complex and vary by subsidiary, local regulations, the person opening the account, etc. The following guidelines are suggestions only and may not be applicable to your situation.
Second user login
Individual account holders can add a second login to their account. This is designed to allow you to use one login for API trading while using the other login to run Trader Workstation for manual trading or account monitoring. However, you can use both logins to collect data with QuantRocket. Note that you can't use the same login to simultaneously run Trader Workstation and collect data with QuantRocket. Since QuantRocket makes it easy to start and stop IB Gateway on a schedule, the following is an option (a crontab sketch follows the list):
- Login 1 (used for QuantRocket only)
- IB Gateway always running and available for data collection and placing API orders
- Login 2 (used for QuantRocket and Trader Workstation)
- Automatically stop IB Gateway daily at 9:30 AM
- Run Trader Workstation during the trading session for manual trading/account monitoring
- Automatically start IB Gateway daily at 4:00 PM so it can be used for overnight data collection
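On the countdown (cron) service, the Login 2 schedule above might look something like the following sketch, assuming the second login runs in a gateway named ibg2 (see Multiple IB Gateways below) and your server timezone matches the exchange:
# stop IB Gateway for login 2 before the trading session
30 9 * * 1-5 quantrocket launchpad stop --gateways 'ibg2' --wait
# start IB Gateway for login 2 after the close for overnight data collection
0 16 * * 1-5 quantrocket launchpad start --gateways 'ibg2'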
Advisor/Friends and Family accounts
An advisor account or the similarly structured Friends and Family account offers the possibility to obtain additional logins. Even an individual trader can open a Friends and Family account, in which they serve as their own advisor. The account setup is as follows:
- Master/advisor account: no trading occurs in this account. The account is funded only with enough money to cover market data costs. This yields 1 IB Gateway login.
- Master/advisor second user login: like an individual account, the master account can create a second login, subscribe to market data with this login, and use it for data collection.
- Client account: this is the main trading account, where the trading funds are deposited. This account receives its own login (for 3 total). By default this account does not have trading permissions, but you can enable client trading permissions via the master account, then subscribe to market data in the client account and begin using the client login to run another instance of IB Gateway. (Note that it's not possible to add a second login for a client account.)
If you have other accounts such as retirement accounts, you can add them as additional client accounts and obtain additional logins.
Paper trading accounts
Each IB account holder can enable a paper trading account for simulated trading. You can share market data with your paper account and use the paper account login with QuantRocket to collect data, as well as to paper trade your strategies. You don't need to switch to using your live account until you're ready for live trading (although it's also fine to use your live account login from the start).
Note that, due to restrictions on market data sharing, it's not possible to run IB Gateway using the live account login and corresponding paper account login at the same time. If you try, one of the sessions will disconnect the other session.
IB market data permissions
To collect IB data using QuantRocket, you must subscribe to the relevant market data in your IB account. In IB Client Portal, click on Settings > User Settings > Market Data Subscriptions:

Click the edit icon then select and confirm the relevant subscriptions:

Market data for paper accounts
IB paper accounts do not directly subscribe to market data. Rather, to access market data using your IB paper account, subscribe to the data in your live account and share it with your paper account. Log in to IB Client Portal with your live account login and go to Settings > Account Settings > Paper Trading Account:

Then select the option to share your live account's market data with your paper account:

IB Gateway
QuantRocket uses the IB API to collect market data from IB, submit orders, and track positions and account balances. All communication with IB is routed through IB Gateway, a Java application which is a slimmed-down version of Trader Workstation (TWS) intended for API use. You can run one or more IB Gateway services through QuantRocket, where each gateway instance is associated with a different IB username and password.
Connect to IB
IB Gateway runs inside the ibg1 container and connects to IB using your IB username and password. (If you have multiple IB usernames, you can run multiple IB Gateways.) The launchpad container provides an API that allows you to start and stop IB Gateway inside the ibg container(s).
The steps for connecting to your IB account and starting IB Gateway differ depending on whether your IB account requires the use of a security card at login.
Secure Login System (SLS)
For fully automated configuration and running of IB Gateway, you must partially opt out of the Secure Login System (SLS), IB's two-factor authentication. With a partial opt-out, your username and password (but not your security device) are required for logging into IB Gateway and other IB trading platforms. Your security device is still required for logging in to Client Portal. A partial opt-out can be performed in Client Portal by going to Settings > User Settings > Secure Login System > Secure Login Settings.
If you prefer not to perform a partial opt-out of IB's Secure Login System (SLS) or can't for regulatory reasons, you can still use QuantRocket but will need to manually enter your security code each time you start IB Gateway using your live login.
A security card is not required for paper accounts, so you can enjoy full automation by using your paper account, even if your live account requires a security card for login.
Enter IB login (no security card)
To connect to your IB account, enter your IB login into your deployment, as well as the desired trading mode (live or paper). You'll be prompted for your password:
$ quantrocket launchpad credentials 'ibg1' --username 'myuser' --paper
Enter IB Password:
status: successfully set ibg1 credentials
>>> from quantrocket.launchpad import set_credentials
>>> set_credentials("ibg1", username="myuser", trading_mode="paper")
Enter IB Password:
{'status': 'successfully set ibg1 credentials'}
$ curl -X PUT 'http://houston/ibg1/credentials' -d 'username=myuser' -d 'password=mypassword' -d 'trading_mode=paper'
{"status": "successfully set ibg1 credentials"}
When setting your credentials, QuantRocket performs several steps. It stores your credentials inside your deployment so you don't need to enter them again. It then starts and stops IB Gateway, which causes IB Gateway to download a settings file which QuantRocket then configures appropriately. The entire process takes approximately 30 seconds to complete.
If you encounter errors trying to start IB Gateway, refer to a later section to learn how to access the IB Gateway GUI for troubleshooting.
Enter IB login (security card required)
To connect to a live IB account which requires second factor authentication, enter your IB login into your deployment. You'll be prompted for your password:
$ quantrocket launchpad credentials 'ibg1' --username 'myuser' --live
Enter IB Password:
msg: Cannot start gateway because second factor authentication is required. API settings
not updated. Please open the IB Gateway GUI to complete authentication, then manually
update the API settings. See http://qrok.it/h/ib2fa to learn more
status: error
>>> from quantrocket.launchpad import set_credentials
>>> set_credentials("ibg1", username="myuser", trading_mode="live")
Enter IB Password:
HTTPError: ('401 Client Error: UNAUTHORIZED for url: http://houston/ibg1/credentials', {'status': 'error', 'msg': 'Cannot start gateway because second factor authentication is required. API settings not updated. Please open the IB Gateway GUI to complete authentication, then manually update the API settings. See http://qrok.it/h/ib2fa to learn more'})
$ curl -X PUT 'http://houston/ibg1/credentials' -d 'username=myuser' -d 'password=mypassword' -d 'trading_mode=live'
{"status": "error", "msg": "Cannot start gateway because second factor authentication is required. API settings not updated. Please open the IB Gateway GUI to complete authentication, then manually update the API settings. See http://qrok.it/h/ib2fa to learn more"}
An error message advises you to open the IB Gateway GUI to complete the login. Follow the instructions in a later section to open the GUI, and enter your security code to complete the login.

Due to the security card requirement, QuantRocket wasn't able to programmatically update IB Gateway settings, so you should update them manually. In the IB Gateway GUI, click Configure > Settings and change the following settings:
- uncheck Read-only API (if you intend to place orders using QuantRocket)
- set Master Client ID to 6000 (if you want QuantRocket to track your trades)

To quit the GUI session but leave IB Gateway running, simply close your browser tab.
Verify IB connection
Querying your IB account balance is a good way to verify your IB connection:
$ quantrocket account balance --latest --fields 'NetLiquidation' | csvlook
| Account | Currency | NetLiquidation | LastUpdated |
| --------- | -------- | -------------- | ------------------- |
| DU12345 | USD | 500,000.00 | 2018-02-02 22:57:13 |
>>> from quantrocket.account import download_account_balances
>>> import io
>>> import pandas as pd
>>> f = io.StringIO()
>>> download_account_balances(f, latest=True, fields=["NetLiquidation"])
>>> balances = pd.read_csv(f, parse_dates=["LastUpdated"])
>>> balances.head()
Account Currency NetLiquidation LastUpdated
0 DU12345 USD 500000.0 2018-02-02 22:57:13
$ curl 'http://houston/account/balances.csv?latest=true&fields=NetLiquidation'
Account,Currency,NetLiquidation,LastUpdated
DU12345,USD,500000.0,2018-02-02 22:57:13
Switch between live and paper account
When you sign up for an IB paper account, IB provides login credentials for the paper account. However, it is also possible to login to the paper account by using your live account credentials and specifying the trading mode as "paper". Thus, technically the paper login credentials are unnecessary.
Using your live login credentials for both live and paper trading allows you to easily switch back and forth. Supposing you originally select the paper trading mode:
$ quantrocket launchpad credentials 'ibg1' --username 'myliveuser' --paper
Enter IB Password:
status: successfully set ibg1 credentials
>>> from quantrocket.launchpad import set_credentials
>>> set_credentials("ibg1", username="myliveuser", trading_mode="paper")
Enter IB Password:
{'status': 'successfully set ibg1 credentials'}
$ curl -X PUT 'http://houston/ibg1/credentials' -d 'username=myliveuser' -d 'password=mypassword' -d 'trading_mode=paper'
{"status": "successfully set ibg1 credentials"}
You can later switch to live trading mode without re-entering your credentials:
$ quantrocket launchpad credentials 'ibg1' --live
status: successfully set ibg1 credentials
>>> set_credentials("ibg1", trading_mode="live")
{'status': 'successfully set ibg1 credentials'}
$ curl -X PUT 'http://houston/ibg1/credentials' -d 'trading_mode=live'
{"status": "successfully set ibg1 credentials"}
If you forget which mode you're in (or which login you used), you can check:
$ quantrocket launchpad credentials 'ibg1'
TRADING_MODE: live
TWSUSERID: myliveuser
>>> from quantrocket.launchpad import get_credentials
>>> get_credentials("ibg1")
{'TWSUSERID': 'myliveuser', 'TRADING_MODE': 'live'}
$ curl -X GET 'http://houston/ibg1/credentials'
{"TWSUSERID": "myliveuser", "TRADING_MODE": "live"}
Start/stop IB Gateway
IB Gateway must be running whenever you want to collect market data or place or monitor orders. You can optionally stop IB Gateway when you're not using it.
To check the current status of your IB Gateway(s):
$ quantrocket launchpad status
ibg1: stopped
>>> from quantrocket.launchpad import list_gateway_statuses
>>> list_gateway_statuses()
{'ibg1': 'stopped'}
$ curl -X GET 'http://houston/launchpad/gateways'
{"ibg1": "stopped"}
You can start IB Gateway, optionally waiting for the startup process to complete:
$ quantrocket launchpad start --wait
ibg1:
  status: running
>>> from quantrocket.launchpad import start_gateways
>>> start_gateways(wait=True)
{'ibg1': {'status': 'running'}}
$ curl -X POST 'http://houston/launchpad/gateways?wait=True'
{"ibg1": {"status": "running"}}
And later stop it:
$ quantrocket launchpad stop --wait
ibg1:
  status: stopped
>>> from quantrocket.launchpad import stop_gateways
>>> stop_gateways(wait=True)
{'ibg1': {'status': 'stopped'}}
$ curl -X DELETE 'http://houston/launchpad/gateways?wait=True'
{"ibg1": {"status": "stopped"}}
Although IB Gateway is advertised as not having to be restarted once a day like Trader Workstation, it's not unusual for IB Gateway to display unexpected behavior (such as not returning market data when requested) which is then resolved simply by restarting IB Gateway. Therefore you might find it beneficial to restart your gateways from time to time, which you could do via countdown, QuantRocket's cron service:
0 1 * * * quantrocket launchpad stop --wait && quantrocket launchpad start
Or, perhaps you use one of your IB logins during the day to monitor the market using Trader Workstation, but in the evenings you'd like to use this login to add concurrency to your historical data collection. You could start and stop the IB Gateway service in conjunction with the data collection:
30 17 * * 1-5 quantrocket launchpad start --wait --gateways 'ibg2' && quantrocket history collect "nasdaq-1d" && quantrocket launchpad stop --gateways 'ibg2'
IB Gateway GUI
Normally you won't need to access the IB Gateway GUI. However, you might need access to troubleshoot a login issue, or if you've enabled two-factor authentication for IB Gateway.
To allow access to the IB Gateway GUI, QuantRocket uses NoVNC, which uses the WebSockets protocol to support VNC connections in the browser. To open an IB Gateway GUI connection in your browser, click the "IB Gateway GUI" button located on the JupyterLab Launcher or from the File menu. The IB Gateway GUI will open in a new window (make sure your browser doesn't block the pop-up).

If IB Gateway isn't currently running, the screen will be black.
To quit the VNC session but leave IB Gateway running, simply close your browser tab.
For improved security for cloud deployments, QuantRocket doesn't directly expose any VNC ports to the outside. By proxying VNC connections through houston using NoVNC, such connections are protected by Basic Auth and SSL, just like every other request sent through houston.
Multiple IB Gateways
QuantRocket supports running multiple IB Gateways, each associated with a particular IB login. Two of the main reasons for running multiple IB Gateways are:
- To trade multiple accounts
- To increase market data concurrency
The default IB Gateway service is called ibg1. To run multiple IB Gateways, create a file called docker-compose.override.yml in the same directory as your docker-compose.yml and add the desired number of additional services as shown below. In this example we are adding two additional IB Gateway services, ibg2 and ibg3, which inherit from the definition of ibg1:
version: '2.4'
services:
  ibg2:
    extends:
      file: docker-compose.yml
      service: ibg1
  ibg3:
    extends:
      file: docker-compose.yml
      service: ibg1
You can learn more about docker-compose.override.yml in another section.
Then, deploy the new service(s):
$ cd /path/to/docker-compose.yml
$ docker-compose -p quantrocket up -d
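If the new services deploy successfully, they will appear alongside ibg1 when you check the status of your gateways (output shown is illustrative):
$ quantrocket launchpad status
ibg1: stopped
ibg2: stopped
ibg3: stopped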
You can then enter your login for each of the new IB Gateways:
$ quantrocket launchpad credentials 'ibg2' --username 'myuser' --paper
Enter IB Password:
status: successfully set ibg2 credentials
>>> from quantrocket.launchpad import set_credentials
>>> set_credentials("ibg2", username="myuser", trading_mode="paper")
Enter IB Password:
{'status': 'successfully set ibg2 credentials'}
$ curl -X PUT 'http://houston/ibg2/credentials' -d 'username=myuser' -d 'password=mypassword' -d 'trading_mode=paper'
{"status": "successfully set ibg2 credentials"}
When starting and stopping gateways, the default behavior is to start or stop all gateways. To target specific gateways, use the gateways parameter:
$ quantrocket launchpad start --gateways 'ibg2'
status: the gateways will be started asynchronously
>>> from quantrocket.launchpad import start_gateways
>>> start_gateways(gateways=["ibg2"])
{'status': 'the gateways will be started asynchronously'}
$ curl -X POST 'http://houston/launchpad/gateways?gateways=ibg2'
{"status": "the gateways will be started asynchronously"}
Market data permission file
Generally, loading your market data permissions into QuantRocket is only necessary when you are running multiple IB Gateway services with different market data permissions for each.
To retrieve market data from IB, you must subscribe to the appropriate market data subscriptions in IB Client Portal. QuantRocket can't identify your subscriptions via API, so you must tell QuantRocket about your subscriptions by loading a YAML configuration file. If you don't load a configuration file, QuantRocket will assume you have market data permissions for any data you request through QuantRocket. If you only run one IB Gateway service, this is probably sufficient and you can skip the configuration file. However, if you run multiple IB Gateway services with separate market data permissions for each, you will probably want to load a configuration file so QuantRocket can route your requests to the appropriate IB Gateway service. You should also update your configuration file whenever you modify your market data permissions in IB Client Portal.
QuantRocket looks for a market data permission file called quantrocket.launchpad.permissions.yml at the top level of the Jupyter file browser (that is, /codeload/quantrocket.launchpad.permissions.yml). The format of the YAML file is shown below:
ibg1:
  marketdata:
    STK:
      - NYSE
      - ISLAND
      - TSEJ
    FUT:
      - GLOBEX
      - OSE
    CASH:
      - IDEALPRO
  research:
    - reuters
    - wsh
ibg2:
  marketdata:
    STK:
      - NYSE
When you create or edit this file, QuantRocket will detect the change and load the configuration. It's a good idea to have flightlog open when you do this. If the configuration file is valid, you'll see a success message:
2018-08-12 09:39:31 quantrocket.launchpad: INFO Successfully loaded /codeload/quantrocket.launchpad.permissions.yml
If the configuration file is invalid, you'll see an error message:
2018-08-12 09:46:46 quantrocket.launchpad: ERROR Could not load /codeload/quantrocket.launchpad.permissions.yml:
2018-08-12 09:46:46 quantrocket.launchpad: ERROR unknown key(s) for service ibg1: marketdata-typo
You can also dump out the currently loaded config to confirm it is as you expect:
$ quantrocket launchpad config
ibg1:
  marketdata:
    CASH:
      - IDEALPRO
    FUT:
      - GLOBEX
      - OSE
    STK:
      - NYSE
      - ISLAND
      - TSEJ
  research:
    - reuters
    - wsh
ibg2:
  marketdata:
    STK:
      - NYSE
>>> from quantrocket.launchpad import get_launchpad_config
>>> get_launchpad_config()
{
'ibg1': {
'marketdata': {
'CASH': [
'IDEALPRO'
],
'FUT': [
'GLOBEX',
'OSE'
],
'STK': [
'NYSE',
'ISLAND',
'TSEJ'
]
},
'research': [
'reuters',
'wsh'
]
},
'ibg2': {
'marketdata': {
'STK': [
'NYSE'
]
}
}
}
$ curl -X GET 'http://houston/launchpad/config'
{
"ibg1": {
"marketdata": {
"CASH": [
"IDEALPRO"
],
"FUT": [
"GLOBEX",
"OSE"
],
"STK": [
"NYSE",
"ISLAND",
"TSEJ"
]
},
"research": [
"reuters",
"wsh"
]
},
"ibg2": {
"marketdata": {
"STK": [
"NYSE"
]
}
}
}
IB Gateway log files
If you need to send your IB Gateway log files to IB for troubleshooting, you can use the IB Gateway GUI to export the log files to the Docker filesystem, then copy them to your local filesystem.
- With IB Gateway running, open the GUI.
- In the IB Gateway GUI, click File > Gateway Logs, and select the day you're interested in.
- For small logs, you can view the logs directly in IB Gateway and copy them to your clipboard.
- For larger logs, click Export Logs or Export Today Logs. A file browser will open, showing the filesystem inside the Docker container.
- Export the log file to an easy-to-find location such as /tmp/ibgateway-exported-logs.txt.
- From the host machine, copy the exported logs from the Docker container to your local filesystem. For ibg1 logs saved to the above location, the command would be:
$ docker cp quantrocket_ibg1_1:/tmp/ibgateway-exported-logs.txt ibgateway-exported-logs.txt
Connect from other applications
If you run other applications, you can connect them to your QuantRocket deployment for the purpose of querying data, submitting orders, etc.
Each remote connection to a cloud deployment counts against your plan's concurrent install limit. For example, if you run a single cloud deployment of QuantRocket and connect to it from a single remote application, this is counted as 2 concurrent installs, one for the deployment and one for the remote connection. (Connecting to a local deployment from a separate application running on your local machine does not count against the concurrent install limit.)
To utilize the Python API and/or CLI from outside of QuantRocket, install the client on the application or system you wish to connect from:
$ pip install 'quantrocket-client'
To ensure compatibility, the MAJOR.MINOR version of the client should match the MAJOR.MINOR version of your deployment. For example, if your deployment is version 1.7.x, you can install the latest 1.7.x client:
$ pip install 'quantrocket-client>=1.7,<1.8'
Don't forget to update your client version when you update your deployment version.
Next, set environment variables to tell the client how to connect to your QuantRocket deployment. For a cloud deployment, this means providing the deployment URL and credentials:
$
$ export HOUSTON_URL=https://quantrocket.123capital.com
$ export HOUSTON_USERNAME=myusername
$ export HOUSTON_PASSWORD=mypassword
$
$ [Environment]::SetEnvironmentVariable("HOUSTON_URL", "https://quantrocket.123capital.com", "User")
$ [Environment]::SetEnvironmentVariable("HOUSTON_USERNAME", "myusername", "User")
$ [Environment]::SetEnvironmentVariable("HOUSTON_PASSWORD", "mypassword", "User")
For connecting to a local deployment, only the URL is needed:
$
$ export HOUSTON_URL=http://localhost:1969
$
$ [Environment]::SetEnvironmentVariable("HOUSTON_URL", "http://localhost:1969", "User")
Environment variable syntax varies by operating system. Don't forget to make your environment variables persistent by adding them to .bashrc (Linux) or .profile (MacOS) and sourcing it (for example source ~/.bashrc), or restarting PowerShell (Windows).
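For example, on Linux you might persist the variables for a cloud deployment like this (substitute your own deployment URL and credentials):
$ echo 'export HOUSTON_URL=https://quantrocket.123capital.com' >> ~/.bashrc
$ echo 'export HOUSTON_USERNAME=myusername' >> ~/.bashrc
$ echo 'export HOUSTON_PASSWORD=mypassword' >> ~/.bashrc
$ source ~/.bashrc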
Finally, test that it worked:
$ quantrocket houston ping
msg: hello from houston
>>> from quantrocket.houston import ping
>>> ping()
{u'msg': u'hello from houston'}
$ curl -u myusername:mypassword https://quantrocket.123capital.com/ping
{"msg": "hello from houston"}
To connect from applications running languages other than Python, you can skip the client installation and use the HTTP API directly.
Multi-user deployments
Hedge funds and other multi-user organizations can benefit from the ability to run more than one QuantRocket deployment. Depending on your subscription plan, you can deploy QuantRocket to two (or in some cases more) computers or cloud servers.
The user interface for QuantRocket is JupyterLab, which is best suited for use by a single user at a time. While it is possible for multiple users to log in to the same QuantRocket cloud deployment, it is usually not ideal because they will be working in a shared JupyterLab environment, with a shared filesystem and notebooks, shared JupyterLab terminals and kernels, and shared compute resources. This will likely lead to stepping on each other's toes.
For hedge funds, a recommended deployment strategy is to run a primary deployment for data collection and live trading, and one or more research deployments (depending on subscription) for research and backtesting.
| | Deployed to | How many | Connects to IB Gateway | Used for | Used by |
| --- | --- | --- | --- | --- | --- |
| Primary deployment | Cloud | 1 | Yes | Data collection, live trading | Sys admin / owner / manager |
| Research deployment(s) | Cloud or local | 1 or more | No | Research and backtesting | Quant researchers |
Collect data on the primary deployment and push it to S3. Once pushed, deep historical data can optionally be purged from the primary deployment, retaining only enough historical data to run live trading. Then, selectively pull databases from S3 onto the research deployment(s), where researchers analyze the data and run backtests.
Research deployments can be hosted in the cloud or run on the researcher's local workstation.
Each researcher's code, notebooks, and JupyterLab environment are isolated from those of other researchers. The code can be pushed to separate Git repositories, with sharing and access control managed on the Git repositories.
You can only run IB Gateway on one deployment at a time, due to restrictions imposed by IB. With the deployment strategy above, this is not a problem because IB Gateway only runs on the primary deployment.
Universe Selection
QuantRocket supports dozens of global exchanges and tens of thousands of financial instruments across multiple asset classes including equities, futures, forex, and options. You can easily collect all listings or contracts for the exchanges that interest you and flexibly group them into universes that make sense for your trading strategies.
Collect listings
First, decide which exchange(s) you want to work with. You can view exchange listings on the IB website or use QuantRocket to summarize the IB website by security type:
$ quantrocket master exchanges --regions 'asia' --sec-types 'STK'
STK:
  Australia:
    - ASX
    - CHIXAU
  Hong Kong:
    - SEHK
    - SEHKNTL
    - SEHKSZSE
  India:
    - NSE
  Japan:
    - CHIXJ
    - JPNNEXT
    - TSEJ
  Singapore:
    - SGX
>>> from quantrocket.master import list_exchanges
>>> list_exchanges(regions=["asia"], sec_types=["STK"])
{'STK': {'Australia': ['ASX', 'CHIXAU'],
'Hong Kong': ['SEHK', 'SEHKNTL', 'SEHKSZSE'],
'India': ['NSE'],
'Japan': ['CHIXJ', 'JPNNEXT', 'TSEJ'],
'Singapore': ['SGX']}}
$ curl 'http://houston/master/exchanges?regions=asia&sec_types=STK'
{"STK": {"Australia": ["ASX", "CHIXAU"], "Hong Kong": ["SEHK", "SEHKNTL", "SEHKSZSE"], "India": ["NSE"], "Japan": ["CHIXJ", "JPNNEXT", "TSEJ"], "Singapore": ["SGX"]}}
Let's collect all stock listings on the Hong Kong Stock Exchange:
$ quantrocket master collect --exchanges 'SEHK' --sec-types 'STK'
status: the listing details will be collected asynchronously
>>> from quantrocket.master import collect_listings
>>> collect_listings(exchanges="SEHK", sec_types=["STK"])
{'status': 'the listing details will be collected asynchronously'}
$ curl -X POST 'http://houston/master/securities?exchanges=SEHK&sec_types=STK'
{"status": "the listing details will be collected asynchronously"}
QuantRocket uses the IB website to collect all symbols for the requested exchange, then retrieves contract details from the IB API. The process runs asynchronously; check flightlog to monitor the progress:
$ quantrocket flightlog stream --hist 5
12:07:40 quantrocket.master: INFO Collecting SEHK STK listings from IB website
12:08:29 quantrocket.master: INFO Requesting details for 2220 SEHK listings found on IB website
12:10:06 quantrocket.master: INFO Saved 2215 SEHK listings to securities master database
The number of listings collected from the IB website might be larger than the number of listings actually saved to the database. This is because the IB website lists all symbols that trade on a given exchange, even if the exchange is not the primary listing exchange. For example, the primary listing exchange for Alcoa (AA) is NYSE, but the IB website also lists Alcoa under the BATS exchange because Alcoa also trades on BATS (and many other US exchanges). QuantRocket saves Alcoa's contract details when you collect NYSE listings, not when you collect BATS listings. For futures, the number of contracts saved to the database will typically be larger than the number of listings found on the IB website because the website only lists underlyings but QuantRocket saves all available expiries for each underlying.
The master file
After you collect listings, you can inspect the master file, which provides the symbol, exchange, currency, and many other fields:
$ quantrocket master get --exchanges 'SEHK' -o listings.csv
$ csvlook listings.csv --max-rows 5 --max-columns 10 -I
| ConId | Symbol | Etf | SecType | PrimaryExchange | Currency | LocalSymbol | TradingClass | MarketName | LongName | ... |
| ------- | ------ | --- | ------- | --------------- | -------- | ----------- | ------------ | ---------- | ---------------------------- | --- |
| 1616383 | 11 | 0 | STK | SEHK | HKD | 11 | 11 | 11 | HANG SENG BANK LTD | ... |
| 1616387 | 44 | 0 | STK | SEHK | HKD | 44 | 44 | 44 | HONG KONG AIRCRAFT ENGINEERG | ... |
| 1616390 | 5 | 0 | STK | SEHK | HKD | 5 | 5 | 5 | HSBC HOLDINGS PLC | ... |
| 1616393 | 101 | 0 | STK | SEHK | HKD | 101 | 101 | 101 | HANG LUNG PROPERTIES LTD | ... |
| 1616396 | 2 | 0 | STK | SEHK | HKD | 2 | 2 | 2 | CLP HOLDINGS LTD | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
>>> import pandas as pd
>>> from quantrocket.master import download_master_file
>>> download_master_file("listings.csv", exchanges="SEHK")
>>> securities = pd.read_csv("listings.csv")
>>> securities.head()
ConId Symbol Etf SecType PrimaryExchange Currency LocalSymbol TradingClass
0 1616383 11 0 STK SEHK HKD 11 11
1 1616387 44 0 STK SEHK HKD 44 44
2 1616390 5 0 STK SEHK HKD 5 5
3 1616393 101 0 STK SEHK HKD 101 101
4 1616396 2 0 STK SEHK HKD 2 2
$ curl -X GET 'http://houston/master/securities.csv?exchanges=SEHK' > listings.csv
$ head listings.csv
ConId,Symbol,Etf,SecType,PrimaryExchange,Currency,LocalSymbol,TradingClass,...
1616383,11,0,STK,SEHK,HKD,11,11,...
1616387,44,0,STK,SEHK,HKD,44,44,...
...
Note the ConId column in the CSV file: ConId is short for "contract ID" and is IB's unique identifier for a particular security or contract. ConIds are used throughout QuantRocket to refer to securities.
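For example, you can look up an individual security by its conid using the --conids option (the conid below comes from the listing above; output is illustrative):
$ quantrocket master get --conids 1616390 --fields 'Symbol' 'LongName' | csvlook
| ConId | Symbol | LongName |
| --------- | ------ | ----------------- |
| 1,616,390 | 5 | HSBC HOLDINGS PLC |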
Define universes
Once you've collected listings that interest you, you can group them into meaningful universes. Universes provide a convenient way to refer to and manipulate large groups of securities when collecting historical data, running a trading strategy, etc. You can create universes based on exchanges, security types, sectors, liquidity, or any criteria you like.
There are different ways to create a universe. You can download a CSV of securities, manually pare it down to the desired securities, and create the universe from the edited list:
$ quantrocket master get --exchanges 'SEHK' --outfile hongkong_securities.csv
$
$ quantrocket master universe 'hongkong' --infile hongkong_securities_edited.csv
code: hongkong
inserted: 2216
provided: 2216
total_after_insert: 2216
>>> from quantrocket.master import download_master_file, create_universe
>>> download_master_file("hongkong_securities.csv", exchanges=["SEHK"])
>>>
>>> create_universe("hongkong", infilepath_or_buffer="hongkong_securities_edited.csv")
{'code': 'hongkong',
'inserted': 2216,
'provided': 2216,
'total_after_insert': 2216}
$ curl -X GET 'http://houston/master/securities.csv?exchanges=SEHK' > hongkong_securities.csv
$
$ curl -X PUT 'http://houston/master/universes/hongkong' --upload-file hongkong_securities_edited.csv
{"code": "hongkong", "provided": 2216, "inserted": 2216, "total_after_insert": 2216}
Using the CLI, you can create a universe in one line by piping the downloaded CSV to the universe command:
$ quantrocket master get --exchanges 'SEHK' --sectors 'Financial' | quantrocket master universe 'hongkong-fin' --infile -
code: hongkong-fin
inserted: 416
provided: 416
total_after_insert: 416
You can also create a universe from existing universes:
$ quantrocket master universe 'asx' --from-universes 'asx-sml' 'asx-mid' 'asx-lrg'
code: asx
inserted: 1604
provided: 1604
total_after_insert: 1604
>>> from quantrocket.master import create_universe
>>> create_universe("asx", from_universes=["asx-sml", "asx-mid", "asx-lrg"])
{'code': 'asx',
'inserted': 1604,
'provided': 1604,
'total_after_insert': 1604}
$ curl -X PUT 'http://houston/master/universes/asx?from_universes=asx-sml&from_universes=asx-mid&from_universes=asx-lrg'
{"code": "asx", "provided": 1604, "inserted": 1604, "total_after_insert": 1604}
Filter by securities master fields with csvgrep
You can filter securities master queries by a variety of fields including Symbol, Exchange, Currency, Sector, and more. (Run quantrocket master get -h to see filtering options.) However, sometimes you may want to filter by a field that is not exposed by the API. From a terminal, you can use csvgrep for this purpose. For example, NASDAQ stocks are divided into the NMS (National Market System) and SCM (SmallCap Market) listing tiers, which are stored in the TradingClass field. To create a separate universe for each listing tier:
$ quantrocket master get -e 'NASDAQ' | csvgrep --columns 'TradingClass' --match 'NMS' | quantrocket master universe 'nasdaq-nms' -f -
code: nasdaq-nms
inserted: 2252
provided: 2252
total_after_insert: 2252
$ quantrocket master get -e 'NASDAQ' | csvgrep --columns 'TradingClass' --match 'SCM' | quantrocket master universe 'nasdaq-scm' -f -
code: nasdaq-scm
inserted: 778
provided: 778
total_after_insert: 778
Or save a CSV of OTC stocks, excluding the "NOINFO" trading class:
$ quantrocket master get -e 'PINK' | csvgrep --columns 'TradingClass' --match 'NOINFO' --invert-match > pink_with_info.csv
Define universes by fundamental data availability
If you want to limit a universe to stocks with fundamental data, the best approach is to create a universe comprising the entire pool of relevant securities, collect the needed data for this universe, then create the sub-universe.
Suppose we've collected all NYSE stock listings and want to create a universe of all NYSE stocks with Reuters estimates available. First, define a universe of all NYSE stocks and collect estimates:
$ quantrocket master get -e 'NYSE' -t 'STK' | quantrocket master universe 'nyse-stk' -f -
code: nyse-stk
inserted: 3109
provided: 3109
total_after_insert: 3109
$ quantrocket fundamental collect-estimates --universes 'nyse-stk'
status: the fundamental data will be collected asynchronously
Wait for the fundamental data to be collected (monitor flightlog for status). Then, since a universe can be created from any file with a ConId column, simply download a file of estimates for the desired codes and re-upload the file to create the universe:
$ quantrocket fundamental estimates 'BVPS' 'EPS' 'NAV' 'ROE' 'ROA' -u 'nyse-stk' | quantrocket master universe 'nyse-stk-with-estimates' -f -
code: nyse-stk-with-estimates
inserted: 1957
provided: 1957
total_after_insert: 1957
Define universes by dollar volume
Alternatively, suppose we want to create 3 universes - smallcaps, midcaps, and largecaps - based on the 90-day average dollar volume of NYSE stocks. First, create a history database and collect historical data for all NYSE stocks (see the Historical Data section for more detail).
$ quantrocket history create-db 'nyse-eod' --bar-size '1 day' --universes 'nyse-stk'
status: successfully created quantrocket.history.nyse-eod.sqlite
$ quantrocket history collect 'nyse-eod'
status: the historical data will be collected asynchronously
Once the historical data has been collected (monitor flightlog for status), you can use pandas and the Python client to determine average dollar volume and create your universes. First, query the history database and load into pandas:
>>> from quantrocket import get_prices
>>> from quantrocket.master import create_universe
>>> import io
>>> prices = get_prices("nyse-eod", fields=["Close", "Volume"])
Next we calculate daily dollar volume and take a 90-day average:
>>> closes = prices.loc["Close"]
>>> volumes = prices.loc["Volume"]
>>> dollar_volumes = closes * volumes
>>> avg_dollar_volumes = dollar_volumes.rolling(window=90).mean()
>>>
>>> avg_dollar_volumes = avg_dollar_volumes.iloc[-1]
>>> avg_dollar_volumes.describe()
Out[60]:
count 2.255000e+03
mean 3.609773e+07
std 9.085866e+07
min 3.270559e+04
25% 1.058080e+06
50% 6.229675e+06
75% 3.399090e+07
max 1.719344e+09
Name: 2017-08-15 00:00:00, dtype: float64
Let's make universes of $1-5M, $5-25M, and $25M+:
>>> sml = avg_dollar_volumes[(avg_dollar_volumes >= 1000000) & (avg_dollar_volumes < 5000000)]
>>> mid = avg_dollar_volumes[(avg_dollar_volumes >= 5000000) & (avg_dollar_volumes < 25000000)]
>>> lrg = avg_dollar_volumes[avg_dollar_volumes >= 25000000]
The index of each Series contains the conids, which are needed to make the universes, so we write the Series to in-memory CSVs and pass the CSVs to the master service:
>>> f = io.StringIO()
>>> sml.to_csv(f, header=True)
>>> create_universe("nyse-sml", infilepath_or_buffer=f)
{'code': 'nyse-sml',
'inserted': 509,
'provided': 509,
'total_after_insert': 509}
>>> f = io.StringIO()
>>> mid.to_csv(f, header=True)
>>> create_universe("nyse-mid", infilepath_or_buffer=f)
{'code': 'nyse-mid',
'inserted': 530,
'provided': 530,
'total_after_insert': 530}
>>> f = io.StringIO()
>>> lrg.to_csv(f, header=True)
>>> create_universe("nyse-lrg", infilepath_or_buffer=f)
{'code': 'nyse-lrg',
'inserted': 665,
'provided': 665,
'total_after_insert': 665}
On a side note, now that you've created different universes for different market caps, a typical workflow might involve creating a history database for each universe. As described more fully in the Historical Data documentation, you can seed your databases for each market cap segment from the historical data you've already collected, saving you the trouble of re-collecting the data from scratch.
$ quantrocket history create-db 'nyse-sml-eod' --bar-size '1 day' --universes 'nyse-sml'
status: successfully created quantrocket.history.nyse-sml-eod.sqlite
$ quantrocket history get 'nyse-eod' --universes 'nyse-sml' | quantrocket history load 'nyse-sml-eod'
db: nyse-sml-eod
loaded: 572081
See the Historical Data documentation for more details on copying data from one history database to another.
Security types
The following security types or asset classes are available:
| Code | Asset class |
| --- | --- |
| STK | stocks |
| ETF | ETFs |
| FUT | futures |
| CASH | forex |
| IND | indices |
| OPT | options 1 |
| FOP | futures options 1 |
1 For collecting options and futures options, see option chains.
With the exception of ETFs, these security type codes are stored in the SecType field of the master file. ETFs are a "pseudo" security type in QuantRocket. See ETF classification.
ETF classification
The IB API does not classify ETFs as a separate security type; they are simply classified as stocks (STK). However, the IB website organizes ETF and stock listings separately, and QuantRocket uses this information to classify ETFs. Stocks and ETFs are indicated as follows in the master file:
| | SecType field | Etf field |
| --- | --- | --- |
| ETF | STK | 1 |
| Stock | STK | 0 |
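Because the ETF classification is stored in the Etf field, one way to build an ETF-only universe is to filter on that field with csvgrep (described later in this guide); the exchange and universe code below are just examples:
$ quantrocket master get -e 'NASDAQ' | csvgrep --columns 'Etf' --match '1' | quantrocket master universe 'nasdaq-etfs' -f -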
ADR classification
ADRs (American Depositary Receipts) are not identified as such by the IB API. The best option for identifying ADRs is to search the LongName field for the text "ADR" and create a universe of the results.
This can be done using the CLI with csvgrep. First, peek at a few results:
$ quantrocket master get -e 'NYSE' --fields 'Symbol' 'LongName' | csvgrep --columns 'LongName' --match ' ADR' | csvlook --max-rows 10
| ConId | Symbol | LongName |
| ----- | ------ | ---------------------------- |
| 4,442 | AMX | AMERICA MOVIL-SPN ADR CL L |
| 4,656 | AU | ANGLOGOLD ASHANTI-SPON ADR |
| 4,815 | BBVA | BANCO BILBAO VIZCAYA-SP ADR |
| 4,839 | BCH | BANCO DE CHILE-ADR |
| 4,854 | BCS | BARCLAYS PLC-SPONS ADR |
| 4,940 | BFR | BBVA BANCO FRANCES SA-ADR |
| 4,986 | BHP | BHP BILLITON LTD-SPON ADR |
| 5,171 | BP | BP PLC-SPONS ADR |
| 5,315 | BT | BT GROUP PLC-SPON ADR |
| 6,257 | CX | CEMEX SAB-SPONS ADR PART CER |
| ... | ... | ... |
Note the space in front of " ADR" in the above search, which is intended to prevent matching a word that ends with "ADR". csvgrep also supports regex searches, which would allow for finer-grained searches.
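For example, a regex with word boundaries could be used in place of the leading-space match (the pattern shown is just one possibility):
$ quantrocket master get -e 'NYSE' --fields 'Symbol' 'LongName' | csvgrep --columns 'LongName' --regex '\bADR\b' | csvlook --max-rows 10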
Now create a universe of ADRs from all of the results:
$ quantrocket master get -e 'NYSE' --fields 'Symbol' 'LongName' | csvgrep --columns 'LongName' --match ' ADR' | quantrocket master universe 'nyse-adrs' -f -
code: nyse-adrs
inserted: 228
provided: 228
total_after_insert: 228
This can also be accomplished with the Python API combined with pandas. Searching for "ADR" and creating the universe might look like this:
>>> adrs = securities[securities.LongName.str.contains(" ADR")]
>>> f = io.StringIO()
>>> adrs.to_csv(f)
>>> create_universe("nyse-adrs", f)
{'code': 'nyse-adrs',
'provided': 228,
'inserted': 228,
'total_after_insert': 228}
Moonshot strategies that wish to exclude ADRs could do so using the EXCLUDE_UNIVERSES attribute:
class MyStrategyWithoutADRs(Moonshot):
EXCLUDE_UNIVERSES = ['nyse-adrs']
...
Option chains
To collect option chains, first collect listings for the underlying securities:
$ quantrocket master collect --exchanges 'NASDAQ' --sec-types 'STK' --symbols 'GOOG' 'FB' 'AAPL'
status: the listing details will be collected asynchronously
>>> from quantrocket.master import collect_listings
>>> collect_listings(exchanges="NASDAQ", sec_types=["STK"], symbols=["GOOG", "FB", "AAPL"])
{'status': 'the listing details will be collected asynchronously'}
$ curl -X POST 'http://houston/master/securities?exchanges=NASDAQ&sec_types=STK&symbols=GOOG&symbols=FB&symbols=AAPL'
{"status": "the listing details will be collected asynchronously"}
Then request option chains for the underlying stocks:
$ quantrocket master get -e 'NASDAQ' -t 'STK' -s 'GOOG' 'FB' 'AAPL' | quantrocket master options --infile -
status: the option chains will be collected asynchronously
>>> from quantrocket.master import download_master_file, collect_option_chains
>>> import io
>>> f = io.StringIO()
>>> download_master_file(f, exchanges=["NASDAQ"], sec_types=["STK"], symbols=["GOOG", "FB", "AAPL"])
>>> collect_option_chains(infilepath_or_buffer=f)
{'status': 'the option chains will be collected asynchronously'}
$ curl -X GET 'http://houston/master/securities.csv?exchanges=NASDAQ&sec_types=STK&symbols=GOOG&symbols=FB&symbols=AAPL' > nasdaq_mega.csv
$ curl -X POST 'http://houston/master/options' --upload-file nasdaq_mega.csv
{"status": "the option chains will be collected asynchronously"}
Once the options request has finished, you can query the options like any other security:
$ quantrocket master get -s 'GOOG' 'FB' 'AAPL' -t 'OPT' --outfile 'options.csv'
>>> from quantrocket.master import download_master_file
>>> download_master_file("options.csv", symbols=["GOOG", "FB", "AAPL"], sec_types=["OPT"])
$ curl -X GET 'http://houston/master/securities.csv?symbols=GOOG&symbols=FB&symbols=AAPL&sec_types=OPT' > options.csv
Option chains often consist of hundreds, sometimes thousands, of options per underlying security. Requesting option chains for large universes of underlying securities, such as all stocks on the NYSE, can take many hours to complete.
Maintain listings
Listings change over time and QuantRocket helps you keep your securities master database up-to-date. You can monitor for changes to your existing listings (such as a company moving its listing from one exchange to another), delist securities to exclude them from your backtests and trading (without deleting them), and look for new listings.
Listings diffs
Security listings can change - for example, a stock might be delisted from Nasdaq and start trading OTC - and we probably want to be alerted when this happens. We can flag securities where the details as stored in our database differ from the latest details available from IB.
$ quantrocket master diff --universes 'nasdaq'
status: the diff, if any, will be logged to flightlog asynchronously
>>> from quantrocket.master import diff_securities
>>> diff_securities(universes=["nasdaq"])
{'status': 'the diff, if any, will be logged to flightlog asynchronously'}
$ curl -X GET 'http://houston/master/diff?universes=nasdaq'
{"status": "the diff, if any, will be logged to flightlog asynchronously"}
If any listings have changed, they'll be logged to flightlog at the WARNING
level with a description of what fields have changed. You may wish to schedule this command on your countdown service and monitor Papertrail:

Delist stocks
When a stock ceases trading, IB removes it from their system. To reflect this status in QuantRocket, you can delist the security, which doesn't delete it but simply marks it as delisted. In the master file, delisted securities will be reflected by the Delisted field having a value of 1.
Delisting a security is a matter of proper record-keeping and also benefits data collection as it instructs QuantRocket not to waste time requesting data from IB for this security.
To delist a single security:
$ quantrocket master delist --conid 194245757
msg: delisted conid 194245757
>>> from quantrocket.master import delist_security
>>> delist_security(conid=194245757)
{'msg': 'delisted conid 194245757'}
$ curl -X DELETE 'http://houston/master/securities?conids=194245757'
{"msg": "delisted conid 194245757"}
A more automated approach is to run quantrocket master diff with the --delist-missing option, which delists securities that are no longer available from IB, and with the --delist-exchanges option, which delists securities associated with the exchanges you specify (note that IB uses the "VALUE" exchange as a placeholder for some delisted symbols):
$ quantrocket master diff --universes 'nasdaq' --fields 'ConId' --delist-missing --delist-exchanges 'VALUE' 'PINK'
status: the diff, if any, will be logged to flightlog asynchronously
>>> from quantrocket.master import diff_securities
>>> diff_securities(universes="nasdaq", fields="ConId", delist_missing=True, delist_exchanges=["VALUE", "PINK"])
{'status': 'the diff, if any, will be logged to flightlog asynchronously'}
$ curl -X GET 'http://houston/master/diff?universes=nasdaq&fields=ConId&delist_missing=True&delist_exchanges=VALUE&delist_exchanges=PINK'
{"status": "the diff, if any, will be logged to flightlog asynchronously"}
If delisting is the only goal of running the command, it's useful to limit the diff to the ConId field as shown above, since this field doesn't change. That way changes in other fields won't be logged to flightlog, which is potentially noisy.
Delisted securities will still be included by default in the master file, but you can optionally exclude them when querying, as sketched below.
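The following sketch assumes an --exclude-delisted option on quantrocket master get; run quantrocket master get -h to confirm the exact option name in your version:
$ quantrocket master get --exchanges 'SEHK' --exclude-delisted --outfile active_hongkong_securities.csv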
Ticker symbol changes
Sometimes when a ticker symbol changes IB will preserve the conid (contract ID); in this case, to incorporate the changes into our database, we can simply collect the listing details for the symbol we care about, which will overwrite the old (stale) listing details:
$
$ quantrocket master get --exchanges TSE --symbols OLD --pretty --fields ConId
ConId = 123456
$ quantrocket master collect -i 123456
status: the listing details will be collected asynchronously
However, sometimes IB will issue a new conid. In this case, if you want to continue trading the symbol, you should delist the old symbol, collect the new listing, and append the new symbol to the universe(s) you care about:
$ quantrocket master delist --exchange TSE --symbol OLD
msg: delisted conid 123456
$ quantrocket master collect --exchanges TSE --symbols NEW --sec-types STK
$
$ quantrocket master get -e TSE -s NEW -t STK | quantrocket master universe "canada" --append --infile -
The above examples expect you to take action in response to individual ticker changes, but what if your universes consist of thousands of stocks and you don't want to deal with them individually? Use quantrocket master diff --delist-missing to automate the delisting of symbols that go missing, as described in the previous section, and use quantrocket master collect to periodically collect any listings that might belong in your universe(s), as described in the next section. If any symbols go missing due to ticker changes that cause IB to issue a new conid, you'll pick up the new listings the next time you run quantrocket master collect.
Add new listings
What if you want to look for new listings that IB has added since your initial universe creation and add them to your universe? First, collect all listings again from IB:
$ quantrocket master collect --exchanges SEHK --sec-types STK
status: the listing details will be collected asynchronously
>>> from quantrocket.master import collect_listings
>>> collect_listings(exchanges="SEHK", sec_types=["STK"])
{'status': 'the listing details will be collected asynchronously'}
$ curl -X POST 'http://houston/master/securities?exchanges=SEHK&sec_types=STK'
{"status": "the listing details will be collected asynchronously"}
You can see what's new by excluding what you already have:
$ quantrocket master get --exchanges SEHK --exclude-universes "hongkong" --outfile new_hongkong_securities.csv
>>> from quantrocket.master import download_master_file
>>> download_master_file("new_hongkong_securities.csv", exchanges=["SEHK"], exclude_universes=["hongkong"])
$ curl -X GET 'http://houston/master/securities.csv?exchanges=SEHK&exclude_universes=hongkong' > new_hongkong_securities.csv
If you like what you see, you can then append the new listings to your universe:
$ quantrocket master universe "hongkong" --infile new_hongkong_securities.csv
code: hongkong
inserted: 10
provided: 10
total_after_insert: 2226
>>> from quantrocket.master import create_universe
>>> create_universe("hongkong", infilepath_or_buffer="new_hongkong_securities.csv", append=True)
{'code': 'hongkong',
'inserted': 10,
'provided': 10,
'total_after_insert': 2226}
$ curl -X PATCH 'http://houston/master/universes/hongkong' --upload-file new_hongkong_securities.csv
{"code": "hongkong", "provided": 10, "inserted": 10, "total_after_insert": 2226}
For futures, IB provides several years of future expiries. From time to time, you should collect the listings again for your futures exchange(s) in order to collect the new expiries, then add them to any universes you may wish to include them in.
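For example, to pick up new expiries for a futures exchange and add them to an existing universe, collect the listings again, and once collection finishes (monitor flightlog), append anything new to your universe. This sketch assumes a GLOBEX futures universe named 'globex-fut':
$ quantrocket master collect --exchanges 'GLOBEX' --sec-types 'FUT'
status: the listing details will be collected asynchronously
$ quantrocket master get -e 'GLOBEX' -t 'FUT' --exclude-universes 'globex-fut' | quantrocket master universe 'globex-fut' --append --infile -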
IB Historical Data
QuantRocket makes it easy to retrieve and work with IB's abundant, global historical market data. (Appropriate IB market data subscriptions required.) Simply define your historical data requirements, and QuantRocket will retrieve data from IB according to your requirements and store it in a database for fast, flexible querying. You can create as many databases as you need for your backtesting and trading.
About IB historical data
Split adjustments
All IB historical data is split-adjusted.
Thus, your data will be split-adjusted when you initially retrieve it into your history database. If a split occurs after the initial retrieval, the data that was already stored needs to be adjusted for the split. QuantRocket handles this circumstance by comparing a recent price in the database to the equivalently-timestamped price from IB. If the prices differ, this indicates either that a split has occurred or that the vendor has otherwise adjusted its data since QuantRocket stored it. Either way, QuantRocket deletes the data for that particular security and re-collects the entire history from IB, in order to keep the database synced with IB.
Dividend adjustments
By default, IB historical data is not dividend-adjusted. However, dividend-adjusted data is available from IB using the ADJUSTED_LAST bar type. This bar type has an important limitation: it is only available with a 1 day bar size.
$ quantrocket history create-db 'us-stk-1d' --universes 'us-stk' --bar-size '1 day' --bar-type 'ADJUSTED_LAST'
status: successfully created quantrocket.history.us-stk-1d.sqlite
>>> from quantrocket.history import create_db
>>> create_db("us-stk-1d", universes=["us-stk"], bar_size="1 day", bar_type="ADJUSTED_LAST")
{'status': 'successfully created quantrocket.history.us-stk-1d.sqlite'}
$ curl -X PUT 'http://houston/history/databases/us-stk-1d?universes=us-stk&bar_size=1+day&bar_type=ADJUSTED_LAST'
{"status": "successfully created quantrocket.history.us-stk-1d.sqlite"}
With ADJUSTED_LAST, QuantRocket handles dividend adjustments in the same way it handles split adjustments: whenever IB applies a dividend adjustment, QuantRocket will detect the discrepancy between the IB data and the data as stored in the history database, and will delete the stored data and re-sync with IB.
Primary vs consolidated prices
By default, IB returns consolidated prices for equities. (Consolidated prices are the aggregated prices across all exchanges where a security trades.) If you run an end-of-day strategy that enters and exits in the opening or closing auction, using consolidated prices may be less accurate than using prices from the primary exchange only. This issue is especially significant in US markets due to after-hours trading and the large number of exchanges and ECNs. (For more on this topic, see this blog post by Ernie Chan.)
You can instruct QuantRocket to collect primary exchange prices instead of consolidated prices using the --primary-exchange option. This instructs IB to filter out trades that didn't take place on the primary listing exchange for the security:
$ quantrocket history create-db 'us-stk-1d' --universes 'us-stk' --bar-size '1 day' --primary-exchange
status: successfully created quantrocket.history.us-stk-1d.sqlite
>>> from quantrocket.history import create_db
>>> create_db("us-stk-1d", universes=["us-stk"], bar_size="1 day", primary_exchange=True)
{'status': 'successfully created quantrocket.history.us-stk-1d.sqlite'}
$ curl -X PUT 'http://houston/history/databases/us-stk-1d?universes=us-stk&bar_size=1+day&primary_exchange=true'
{"status": "successfully created quantrocket.history.us-stk-1d.sqlite"}
Note that volume is also filtered by the primary exchange when using this option. Thus, volume will be lower (possibly significantly lower) than the consolidated volume typically reported on financial websites, or than the volume you would get if you omitted this option.
Collecting consolidated historical data typically requires IB market data permissions for all the exchanges where trades occurred. Collecting data with the primary exchange filter typically only requires IB market data permission for the primary exchange.
Bar sizes
IB offers over 20 bar sizes ranging from 1 month to 1 second. The full list includes: 1 month, 1 week, 1 day, 8 hours, 4 hours, 3 hours, 2 hours, 1 hour, 30 mins, 20 mins, 15 mins, 10 mins, 5 mins, 3 mins, 2 mins, 1 min, 30 secs, 15 secs, 10 secs, 5 secs, and 1 secs.
Types of data
You can use the --bar-type parameter with create-db to indicate what type of historical data you want:
Bar type | Description | Available for | Notes |
---|---|---|---|
TRADES | traded price | stocks, futures, options, forex, indexes | adjusted for splits but not dividends |
ADJUSTED_LAST | traded price | stocks | adjusted for splits and dividends |
MIDPOINT | bid-ask midpoint | stocks, futures, options, forex | the open, high, low, and closing midpoint price |
BID | bid | stocks, futures, options, forex | the open, high, low, and closing bid price |
ASK | ask | stocks, futures, options, forex | the open, high, low, and closing ask price |
BID_ASK | time-average bid and ask | stocks, futures, options, forex | time-average bid is stored in the Open field, and time-average ask is stored in the Close field; the High and Low fields contain the max ask and min bid, respectively |
HISTORICAL_VOLATILITY | historical volatility | stocks, indexes | 30 day Garman-Klass volatility of corporate action adjusted data |
OPTION_IMPLIED_VOLATILITY | implied volatility | stocks, indexes | IB calculates implied volatility as follows: "The IB 30-day volatility is the at-market volatility estimated for a maturity thirty calendar days forward of the current trading day, and is based on option prices from two consecutive expiration months." |
If --bar-type is omitted, it defaults to MIDPOINT for forex and TRADES for everything else.
How far back historical data goes
For stocks and currencies, IB historical data depth varies by exchange and bar size. End of day prices go back as far as 1980 for some exchanges, while intraday prices down to 1-minute bars go back as far as 2004. The amount of data available from the IB API is the same as the amount of data available when viewing the corresponding chart in Trader Workstation.
Historical data availability for select exchanges is shown here.
For futures, historical data is available for contracts that expired no more than 2 years ago. IB removes historical futures data from its system 2 years after the contract expiration date. Deeper historical data is available for indices. Thus, for futures contracts with a corresponding index (and for which backwardation and contango are negligible factors), you can run deeper backtests on the index then switch to the futures contract for recent backtests or live trading.
For bar sizes of 30 seconds or smaller, historical data goes back 6 months only.
Survivorship bias
IB historical data does not include delisted companies. If a stock went bankrupt, was acquired, went private, etc., it won't be in IB's data.
QuantRocket doesn't delete data for tickers that are later delisted, so over time your database will come to include delisted tickers.
End-of-day data that includes active and delisted tickers for US stocks and is free of survivorship bias is available as a premium dataset from Sharadar.
Data collection
Create historical databases
Create a database by defining, at minimum, the bar size you want and the universe of securities to include. Suppose we've used the master service to define a universe of banking stocks on the Tokyo Stock Exchange, and now we want to collect end-of-day historical data for those stocks. First, create the database:
$ quantrocket history create-db 'japan-bank-eod' --universes 'japan-bank' --bar-size '1 day'
status: successfully created quantrocket.history.japan-bank-eod.sqlite
>>> from quantrocket.history import create_db
>>> create_db("japan-bank-eod", universes=["japan-bank"], bar_size="1 day")
{'status': 'successfully created quantrocket.history.japan-bank-eod.sqlite'}
$ curl -X PUT 'http://houston/history/databases/japan-bank-eod?universes=japan-bank&bar_size=1+day'
{"status": "successfully created quantrocket.history.japan-bank-eod.sqlite"}
Then, fill up the database with data from IB:
$ quantrocket history collect 'japan-bank-eod'
status: the historical data will be collected asynchronously
>>> from quantrocket.history import collect_history
>>> collect_history("japan-bank-eod")
{'status': 'the historical data will be collected asynchronously'}
$ curl -X POST 'http://houston/history/queue?codes=japan-bank-eod'
{"status": "the historical data will be collected asynchronously"}
QuantRocket will first query the IB API to determine how far back historical data is available for each security, then query the IB API again to collect the data for that date range. Depending on the bar size and the number of securities in the universe, collecting data can take from several minutes to several hours. If you're running multiple IB Gateway services, QuantRocket will spread the requests among the services to speed up the process. Based on how quickly the IB API is responding to requests, QuantRocket will periodically estimate how long it will take to collect the data. You can monitor flightlog via the command line or Papertrail to track progress:
$ quantrocket flightlog stream
2017-08-22 13:24:09 quantrocket.history: INFO [japan-bank-eod] Determining how much history is available from IB for japan-bank-eod
2017-08-22 13:25:45 quantrocket.history: INFO [japan-bank-eod] Collecting history from IB for japan-bank-eod
2017-08-22 13:26:11 quantrocket.history: INFO [japan-bank-eod] Expected remaining runtime to collect japan-bank-eod history based on IB response times so far: 0:23:11
2017-08-22 13:55:00 quantrocket.history: INFO [japan-bank-eod] Saved 468771 total records for 85 total securities to quantrocket.history.japan-bank-eod.sqlite
In addition to bar size and universe(s), you can optionally define the type of data you want (for example, trades, bid/ask, midpoint, etc.), a fixed start date instead of "as far back as possible", whether to include trades from outside regular trading hours, whether to use consolidated prices or primary exchange prices, and more. For a complete list of options, view the API Reference.
As you become interested in new exchanges or want to test new ideas, you can keep adding as many new databases with as many different configurations as you like.
Update historical data
After you create a history database and run the initial data collection, you will often want to keep the data up-to-date over time. To do so, simply collect the data again:
$ quantrocket history collect 'japan-bank-eod'
status: the historical data will be collected asynchronously
>>> from quantrocket.history import collect_history
>>> collect_history("japan-bank-eod")
{'status': 'the historical data will be collected asynchronously'}
$ curl -X POST 'http://houston/history/queue?codes=japan-bank-eod'
{"status": "the historical data will be collected asynchronously"}
QuantRocket will bring the database current, appending new data to what you already have. The update process will run much faster than the initial data collection due to collecting fewer records.
If QuantRocket detects that a split or other adjustment has occurred, it will not only collect the new data but replace the existing data for that security.
You can use the countdown service to schedule your databases to be updated regularly.
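For example, a countdown (cron) entry like the following (an illustrative schedule only) would bring the database current each weekday evening:
0 18 * * mon-fri quantrocket history collect 'japan-bank-eod'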
List databases
List your historical databases to see which ones you've created:
$ quantrocket history list
es-fut-1min
japan-bank-eod
uk-etf-15min
usa-stk-1d
>>> from quantrocket.history import list_databases
>>> list_databases()
['es-fut-1min',
'japan-bank-eod',
'uk-etf-15min',
'usa-stk-1d']
$ curl -X GET 'http://houston/history/databases'
["es-fut-1min", "japan-bank-eod", "uk-etf-15min", "usa-stk-1d"]
You can also check the configuration parameters of a specific database:
$ quantrocket history config 'japan-bank-eod'
bar_size: 1 day
fields:
- Open
- High
- Low
- Close
- Volume
- Wap
- TradeCount
universes:
- japan-bank
vendor: ib
>>> from quantrocket.history import get_db_config
>>> get_db_config("japan-bank-eod")
{'universes': ['japan-bank'],
'bar_size': '1 day',
'vendor': 'ib',
'fields': ['Open', 'High', 'Low', 'Close', 'Volume', 'Wap', 'TradeCount']}
$ curl -X GET 'http://houston/history/databases/japan-bank-eod'
{"universes": ["japan-bank"], "bar_size": "1 day", "vendor": "ib", "fields": ["Open", "High", "Low", "Close", "Volume", "Wap", "TradeCount"]}
Historical data collection queue
You can queue as many historical data requests as you wish, and they will be processed in sequential order, one at a time:
$ quantrocket history collect 'aus-lrg-eod' 'singapore-15min' 'germany-1hr-bid-ask'
status: the historical data will be collected asynchronously
>>> from quantrocket.history import collect_history
>>> collect_history(["aus-lrg-eod", "singapore-15min", "germany-1hr-bid-ask"])
{'status': 'the historical data will be collected asynchronously'}
$ curl -X POST 'http://houston/history/queue?codes=aus-lrg-eod&codes=singapore-15min&codes=germany-1hr-bid-ask'
{"status": "the historical data will be collected asynchronously"}
You can view the current queue:
$ quantrocket history queue
priority: []
standard:
- aus-lrg-eod
- singapore-15min
- germany-1hr-bid-ask
>>> from quantrocket.history import get_history_queue
>>> get_history_queue()
{'priority': [],
'standard': ['aus-lrg-eod', 'singapore-15min', 'germany-1hr-bid-ask']}
$ curl -X GET 'http://houston/history/queue'
{"priority": [], "standard": ["aus-lrg-eod", "singapore-15min", "germany-1hr-bid-ask"]}
Maybe you're regretting that the Germany request is at the end of the queue because you'd like to get that data first and start analyzing it. You can cancel the requests ahead of it, then re-queue them, which moves them to the end of the queue:
$ quantrocket history cancel 'aus-lrg-eod' 'singapore-15min'
priority: []
standard:
- germany-1hr-bid-ask
$ quantrocket history collect 'aus-lrg-eod' 'singapore-15min'
status: the historical data will be collected asynchronously
$ quantrocket history queue
priority: []
standard:
- germany-1hr-bid-ask
- aus-lrg-eod
- singapore-15min
>>> from quantrocket.history import get_history_queue, cancel_collections, collect_history
>>> cancel_collections(codes=["aus-lrg-eod", "singapore-15min"])
{'priority': [],
'standard': ['germany-1hr-bid-ask']}
>>> collect_history(["aus-lrg-eod", "singapore-15min"])
{'status': 'the historical data will be collected asynchronously'}
>>> get_history_queue()
{'priority': [],
'standard': ['germany-1hr-bid-ask', 'aus-lrg-eod', 'singapore-15min']}
$ curl -X DELETE 'http://houston/history/queue?codes=aus-lrg-eod&codes=singapore-15min'
{"priority": [], "standard": ["germany-1hr-bid-ask"]}
$ curl -X POST 'http://houston/history/queue?codes=aus-lrg-eod&codes=singapore-15min'
{"status": "the historical data will be collected asynchronously"}
$ curl -X GET 'http://houston/history/queue'
{"priority": [], "standard": ["germany-1hr-bid-ask", "aus-lrg-eod", "singapore-15min"]}
There's another way to control queue priority: QuantRocket provides a standard queue and a priority queue. The standard queue will only be processed when the priority queue is empty. This can be useful when you're trying to collect a large amount of historical data for backtesting but you don't want it to interfere with daily updates to the databases you use for trading. First, schedule your daily updates on your countdown (cron) service, using the --priority flag to route them to the priority queue:
30 17 * * mon-fri quantrocket history collect --priority 'nyse-eod'
Then, queue your long-running requests on the standard queue:
$ quantrocket history collect 'nyse-15min'
At 5:30pm, when a request is queued on the priority queue, the long-running request on the standard queue will pause until the priority queue is empty again, and then resume.
Delete historical database
Once you've created a database, you can't edit the configuration; you can only add new databases. If you made a mistake or no longer need an old database, you can drop the database and its associated config:
$ quantrocket history drop-db 'japan-bank-eod' --confirm-by-typing-db-code-again 'japan-bank-eod'
status: deleted quantrocket.history.japan-bank-eod.sqlite
>>> from quantrocket.history import drop_db
>>> drop_db("japan-bank-eod", confirm_by_typing_db_code_again="japan-bank-eod")
{'status': 'deleted quantrocket.history.japan-bank-eod.sqlite'}
$ curl -X DELETE 'http://houston/history/databases/japan-bank-eod?confirm_by_typing_db_code_again=japan-bank-eod'
{"status": "deleted quantrocket.history.japan-bank-eod.sqlite"}
Intraday data collection
IB is a treasure trove of global market data, but few IB customers tap its potential because of the API's complexity and the long runtimes imposed by IB's rate limits. QuantRocket can collect data in the background continuously for days, weeks, or months on end, surviving network interruptions, IB server blackouts, and other challenges and idiosyncrasies of the IB API. With sufficient hard drive space and a little patience, you can collect terabytes of market data.
Initial data collection
Depending on the bar size, number of securities, and date range of your historical database, initial data collection from the IB API can take some time. After the initial data collection, keeping your database up to date is much faster and much easier.
QuantRocket fills your historical database by making a series of requests to the IB API to get a portion of the data, from earlier data to later data. The smaller the bars, the more requests are required to collect all the data.
If you run multiple IB Gateways, each with appropriate IB market data subscriptions, QuantRocket splits the requests between the gateways which results in a proportionate reduction in runtime.
IB API response times also vary with the monthly commissions generated on the account. Accounts generating several thousand USD or more in monthly commissions will see response times about twice as fast as those for small accounts (or for large accounts with low commissions).
The following table shows estimated runtimes and database sizes for a variety of historical database configurations:
Bar size | Number of stocks | Years of data | Example universes | Runtime (high commission account, 4 IB Gateways) | Runtime (standard account, 2 IB Gateways) | Database size |
---|---|---|---|---|---|---|
1 day | 6,000 | all available (1980-present) | US listed stocks | 3 hours | 12 hours | 2.5 GB |
15 minutes | 6,000 | all available (2004-present) | US listed stocks | 3 days | 2 weeks | 50 GB |
1 minute | 3,000 | 5 years | one of: NYSE, NASDAQ, TSEJ, LSE | 1 week | 1 month | 150 GB |
1 minute | 6,000 | 5 years | US listed stocks | 2 weeks | 2 months | 200 GB |
1 minute | 6,000 | all available (2004-present) | US listed stocks | 1 month | 4 months | 700 GB |
You can use the table above to infer the collection times for other bar sizes and universe sizes. See the exchanges table on the account page for the approximate number of listings for each exchange.
Data collection strategies
Below are several data collection strategies that may help speed up data collection, reduce the amount of data you need to collect, or allow you to begin working with a subset of data while collecting the full amount of data.
Run multiple IB Gateways
You can cut down initial data collection time by running multiple IB gateways. See the section on obtaining and using multiple IB logins.
Daily bars before intraday bars
Suppose you want to collect intraday bars for the top 1000 liquid securities trading on NYSE and NASDAQ. Instead of collecting intraday bars for all NYSE and NASDAQ securities then filtering out illiquid ones, you could try this approach:
- collect a year's worth of daily bars for all NYSE and NASDAQ securities (this requires only 1 request to the IB API per security and will run much faster than collecting multiple years of intraday bars)
- in a notebook, query the daily bars and use them to calculate dollar volume, then create a universe of liquid securities only (see the usage guide section on using price data to define universes, and the sketch after this list)
- collect intraday bars for the universe of liquid securities only
You can periodically repeat this process to update the universe constituents.
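A minimal sketch of the middle step above, assuming a daily database named 'usa-stk-1d' built from a hypothetical 'usa-stk' universe of all NYSE and NASDAQ listings (the lookback window and liquidity cutoff are arbitrary):
>>> import io
>>> from quantrocket import get_prices
>>> from quantrocket.master import download_master_file, create_universe
>>> # load a year of daily closes and volumes
>>> prices = get_prices("usa-stk-1d", start_date="2018-01-01", fields=["Close", "Volume"])
>>> closes = prices.loc["Close"]
>>> volumes = prices.loc["Volume"]
>>> # rank securities by average daily dollar volume and keep the top 1000
>>> avg_dollar_volumes = (closes * volumes).mean()
>>> top_conids = avg_dollar_volumes.nlargest(1000).index.tolist()
>>> # download a master file for the selected conids and create a universe from it
>>> f = io.StringIO()
>>> download_master_file(f, conids=top_conids)
>>> create_universe("usa-liquid", infilepath_or_buffer=f)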
Filter by availability of fundamentals
Suppose you have a strategy that requires intraday bars and fundamental data and utilizes a universe of small-cap stocks. For many small-cap stocks, fundamental data won't be available, so it doesn't make sense to spend time collecting intraday historical data for stocks that won't have fundamental data. Instead, collect the fundamental data first and filter your universe to stocks with fundamentals, then collect the historical intraday data. For example:
- create a universe of all Japanese small-cap stocks called 'japan-sml'
- collect fundamentals for the universe 'japan-sml'
- in a notebook, query the fundamentals for 'japan-sml' and use the query results to create a new universe called 'japan-sml-with-fundamentals' (see the sketch after this list)
- collect intraday price history for 'japan-sml-with-fundamentals'
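A sketch of steps 2-3 above, using the Reuters financials functions covered later in this guide (the statement code 'ATOT', total assets, is an arbitrary choice; any broadly reported code would work as a filter):
>>> import io
>>> import pandas as pd
>>> from quantrocket.fundamental import download_reuters_financials
>>> from quantrocket.master import download_master_file, create_universe
>>> # download one statement code for the universe and see which conids have data
>>> f = io.StringIO()
>>> download_reuters_financials(["ATOT"], f, universes=["japan-sml"])
>>> financials = pd.read_csv(f)
>>> conids_with_fundamentals = financials.ConId.unique().tolist()
>>> # create a new universe limited to those conids
>>> f = io.StringIO()
>>> download_master_file(f, conids=conids_with_fundamentals)
>>> create_universe("japan-sml-with-fundamentals", infilepath_or_buffer=f)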
Earlier history before later history
Suppose you want to collect numerous years of intraday bars. But you'd like to test your ideas on a smaller date range first in order to decide if collecting the full history is worthwhile. This can be done as follows. First, define your desired start date when you create the database:
$ quantrocket history create-db 'usa-liquid-15min' -u 'usa-liquid' -z '15 mins' -s '2011-01-01'
The above database is designed to collect data back to 2011-01-01 and up to the present. However, you can temporarily specify an end date when collecting the data:
$ quantrocket history collect 'usa-liquid-15min' -e '2012-01-01'
In this example, only a year of data will be collected (that is, from the start date of 2011-01-01 specified when the database was created to the end date of 2012-01-01 specified in the above command). That way you can start your research sooner. Later, you can repeat this command with a later end date or remove the end date entirely to bring the database current.
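The Python equivalents would look something like this (a sketch; the end_date parameter of collect_history is assumed to mirror the -e option above):
>>> from quantrocket.history import create_db, collect_history
>>> create_db("usa-liquid-15min", universes=["usa-liquid"], bar_size="15 mins", start_date="2011-01-01")
>>> # collect only through the end of 2011 for now
>>> collect_history("usa-liquid-15min", end_date="2012-01-01")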
In contrast, it's a bad idea to use a temporary start date to shorten the date range and speed up the data collection, with the intention of going back later to get the earlier data. Since data is filled from back to front (that is, from older dates to newer), once you've collected a later portion of data for a given security, you can't append an earlier portion of data without starting over.
Database per decade
Data for some securities goes back 30 years or more. After testing on recent data, you might want to explore earlier years. While you can't append earlier data to an existing database, you can collect the earlier data in a completely separate database. Depending on your bar size and universe size, you might create a separate database for each decade. These databases would be for backtesting only and, after the initial data collection, would not need to be updated. Only your database of the most recent decade would need to be updated.
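For example, a sketch of this approach using the same parameters shown above (database codes and dates are hypothetical):
>>> from quantrocket.history import create_db, collect_history
>>> # backtest-only database for the 1990s: collected once with an end date, never updated
>>> create_db("usa-stk-1min-1990s", universes=["usa-stk"], bar_size="1 min", start_date="1990-01-01")
>>> collect_history("usa-stk-1min-1990s", end_date="1999-12-31")
>>> # the current decade's database has no end date and is updated regularly
>>> create_db("usa-stk-1min-2010s", universes=["usa-stk"], bar_size="1 min", start_date="2010-01-01")
>>> collect_history("usa-stk-1min-2010s")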
Small universes before large universes
Another option to get you researching and backtesting sooner is to collect a subset of your target universe before collecting the entire universe. For example, instead of collecting intraday bars for 1000 securities, collect bars for 100 securities and start testing with those while collecting the remaining data.
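For instance, here is a sketch of carving a 100-security sample out of a larger universe (universe names are hypothetical):
>>> import io
>>> import pandas as pd
>>> from quantrocket.master import download_master_file, create_universe
>>> # download the full universe and keep the first 100 securities as a sample
>>> f = io.StringIO()
>>> download_master_file(f, universes=["usa-liquid"])
>>> securities = pd.read_csv(f)
>>> securities.head(100).to_csv("sample_securities.csv", index=False)
>>> create_universe("usa-liquid-sample", infilepath_or_buffer="sample_securities.csv")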
Don't collect what you don't need
Many of the strategies outlined above can be summarized in one principle: try to keep your databases as small as possible. Small databases are faster to fill initially, take up less disk space, and, most importantly, are faster and easier to work with in research, backtesting, and trading. If you need a large universe of minute bars, by all means collect it, but in light of the runtime and performance costs of working with large amounts of data, it pays to analyze your data requirements in advance and exclude any data you know you won't need.
Database sharding
Database sharding is only applicable to intraday databases.
Summary of sharding options
Sharding option | Suitable for queries that | Suitable for backtesting |
---|---|---|
shard by year, month, or day | load many securities and many bar times but only a small date range at a time | Moonshot strategies that trade throughout the day, and/or segmented backtests |
shard by time of day | load many securities but only a few bar times at a time | intraday Moonshot strategies that trade once a day |
shard by conid | load a few securities but many bar times and a large date range at a time | Zipline strategies |
shard by conid and time (uses 2x disk space) | load many securities but only a few bar times, or load a few securities but many bar times | intraday Moonshot strategies that trade once a day, or Zipline strategies |
no sharding | load small universes | strategies that use small universes |
More detailed descriptions are provided below.
What is sharding?
In database design, "sharding" refers to dividing a large database into multiple smaller databases, with each smaller database or "shard" containing a subset of the total database rows. A collection of database shards typically performs better than a single large database by allowing more efficient queries. When a query is run, the rows from each shard are combined into a single result set as if they came from a single database.
Very large databases are too large to load entirely into memory, and sharding doesn't circumvent this. Rather, the purpose of sharding is to allow you to efficiently query the particular subset of data you're interested in at the moment.
When you query a sharded database using a filter that corresponds to the sharding scheme (for example, filtering by time for a time-sharded database, or filtering by conid for a conid-sharded database), the query runs faster because it only needs to look in the subset of relevant shards based on the query parameters.
To get the benefit of improved query performance, the sharding scheme must correspond to how you will usually query the database; thus it is necessary to think about this in advance.
A secondary benefit of sharding is that smaller database files are easier to move around, including copying them to and from S3.
Choose sharding option
For intraday databases, you must indicate your sharding option at the time you create the database:
$ # create a database sharded by conid and time
$ quantrocket history create-db 'usa-stk-15min' --universes 'usa-stk' --bar-size '15 mins' --shard 'conid,time'
status: successfully created quantrocket.history.usa-stk-15min.sqlite
>>> # create a database sharded by conid and time
>>> from quantrocket.history import create_db
>>> create_db("usa-stk-15min", universes=["usa-stk"], bar_size="15 mins", shard="conid,time")
{'status': 'successfully created quantrocket.history.usa-stk-15min.sqlite'}
$ # create a database sharded by conid and time
$ curl -X PUT 'http://houston/history/databases/usa-stk-15min?universes=usa-stk&bar_size=15%20mins&shard=conid,time'
{"status": "successfully created quantrocket.history.usa-stk-15min.sqlite"}
The choices are:
- year
- month
- day
- time
- conid
- conid,time
- off
Sharded database storage
If you list a sharded database using the --expand/expand=True parameter, you'll see a separate database file for each time or conid shard:
$ # time shards:
$ quantrocket db list 'history' 'usa-stk-15min' --expand
quantrocket.history.usa-stk-15min.093000.sqlite
quantrocket.history.usa-stk-15min.094500.sqlite
...
$ # conid shards:
$ quantrocket db list 'history' 'usa-stk-1min' --expand
quantrocket.history.usa-stk-1min.100248135.sqlite
quantrocket.history.usa-stk-1min.100296007.sqlite
quantrocket.history.usa-stk-1min.100296028.sqlite
...
Shard by year, month, or day
Sharding by year, month, or day results in a separate database shard for each year, month, or day of data, with each separate database containing all securities for only that time period. The number of shards is equal to the number of years, months, or days of data collected, respectively.
As a broad guideline, if collecting 1-minute bars, sharding by year would be suitable for a universe of tens of securities, sharding by month would be suitable for a universe of hundreds of securities, and sharding by day would be suitable for a universe of thousands of securities.
Sharding by year, month, or day is a sensible approach when you need to analyze the entire universe of securities but only for a small date range at a time. This approach pairs well with segmented backtests in Moonshot.
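For example, a 1-minute database covering thousands of US stocks might be sharded by day (a sketch reusing the shard parameter shown above):
>>> from quantrocket.history import create_db
>>> create_db("usa-stk-1min", universes=["usa-stk"], bar_size="1 min", shard="day")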
Shard by time
Sharding by time results in a separate database shard for each time of day. For example, assuming 15-minute bars, there will be a separate database for 09:30:00 bars, 09:45:00 bars, etc. (with each separate database containing all dates and all securities for only that bar time). The number of shards is equal to the number of bar times per day.
Sharding by time is an efficient approach when you are working with a large universe of securities but only need to query a handful of times for any given analysis. For example, the following query would run efficiently on a time-sharded database because it only needs to look in 3 shards:
>>> prices = get_prices("usa-stk-15min", times=["09:30:00", "12:00:00", "15:45:00"])
Sharding by time is well-suited to intraday Moonshot strategies that trade once a day, since such strategies typically only utilize a subset of bar times.
Shard by conid
Sharding by conid results in a separate database shard for each security. Each shard will contain the entire date range and all bar times for a single security. The number of shards is equal to the number of securities in the universe.
Sharding by conid is an efficient approach when you need to query bars for all times of day but can do so for one or a handful of securities at a time. For example, the following query would run efficiently on a conid-sharded database because it only needs to look in 1 shard:
>>> # querying all bar times for a single security only needs to look in 1 shard
>>> aapl_prices = get_prices("usa-stk-1min", conids=[265598])
Sharding by conid is well-suited for ingesting data into Zipline for backtesting because Zipline ingests data one security at a time.
Shard by conid and time
Sharding by conid and time results in duplicate copies of the database, one sharded by time and one by conid. QuantRocket will look in whichever copy of the database allows for the most efficient query based on your query parameters, that is, whichever copy allows looking in the fewest number of shards. For example, if you query prices at a few times of day for many securities, QuantRocket will use the time-sharded database to satisfy your request; if you query prices for many times of day for a few securities, QuantRocket will use the conid-sharded database to satisfy your request:
>>> # querying a few times of day for many securities is served from the time-sharded copy
>>> prices = get_prices("usa-stk-15min", times=["09:30:00", "12:00:00", "15:45:00"])
>>> # querying many bar times for a few securities is served from the conid-sharded copy
>>> prices = get_prices("usa-stk-15min", conids=[265598, 4075])
Sharding by time and by conid allows for more flexible querying but requires double the disk space. It may also increase collection runtime due to the larger volume of data that must be written to disk.
Time filters for intraday databases
When creating a historical database of intraday bars, you can use the times or between-times options to filter out unwanted bars.
For example, it's usually a good practice to explicitly specify the session start and end times, as the IB API sometimes sends a small number of bars from outside regular trading hours, and any trading activity from these bars will be included in the cumulative daily totals calculated by QuantRocket. The following command instructs QuantRocket to keep only those bars that fall between 9:30 and 15:45, inclusive. (Note that bar times correspond to the start of the bar, so the final bar for US stocks using 15-min bars would be 15:45:00.)
$ quantrocket history create-db 'nasdaq-stk-15min' --universes 'nasdaq-stk' --bar-size '15 mins' --between-times '09:30:00' '15:45:00'
status: successfully created quantrocket.history.nasdaq-stk-15min.sqlite
>>> from quantrocket.history import create_db
>>> create_db("nasdaq-stk-15min", universes=["nasdaq-stk"], bar_size="15 mins", between_times=["09:30:00", "15:45:00"])
{'status': 'successfully created quantrocket.history.nasdaq-stk-15min.sqlite'}
$ curl -X PUT 'http://houston/history/databases/nasdaq-stk-15min?universes=nasdaq-stk&bar_size=15+mins&between_times=09%3A30%3A00&between_times=15%3A45%3A00'
{"status": "successfully created quantrocket.history.nasdaq-stk-15min.sqlite"}
You can view the database config to see how QuantRocket expanded the between-times values into an explicit list of times to keep:
$ quantrocket history config "nasdaq-stk-15min"
bar_size: 15 mins
times:
- 09:30:00
- 09:45:00
- '10:00:00'
- '10:15:00'
- '10:30:00'
- '10:45:00'
- '11:00:00'
- '11:15:00'
- '11:30:00'
- '11:45:00'
- '12:00:00'
- '12:15:00'
- '12:30:00'
- '12:45:00'
- '13:00:00'
- '13:15:00'
- '13:30:00'
- '13:45:00'
- '14:00:00'
- '14:15:00'
- '14:30:00'
- '14:45:00'
- '15:00:00'
- '15:15:00'
- '15:30:00'
- '15:45:00'
universes:
- nasdaq-stk
vendor: ib
>>> from quantrocket.history import get_db_config
>>> get_db_config("nasdaq-stk-15min")
{'bar_size': '15 mins',
'times': ['09:30:00',
'09:45:00',
'10:00:00',
'10:15:00',
'10:30:00',
'10:45:00',
'11:00:00',
'11:15:00',
'11:30:00',
'11:45:00',
'12:00:00',
'12:15:00',
'12:30:00',
'12:45:00',
'13:00:00',
'13:15:00',
'13:30:00',
'13:45:00',
'14:00:00',
'14:15:00',
'14:30:00',
'14:45:00',
'15:00:00',
'15:15:00',
'15:30:00',
'15:45:00'],
'universes': ['nasdaq-stk'],
'vendor': 'ib'}
$ curl 'http://houston/history/databases/nasdaq-stk-15min'
{"universes": ["nasdaq-stk"], "bar_size": "15 mins", "vendor": "ib", "times": ["09:30:00", "09:45:00", "10:00:00", "10:15:00", "10:30:00", "10:45:00", "11:00:00", "11:15:00", "11:30:00", "11:45:00", "12:00:00", "12:15:00", "12:30:00", "12:45:00", "13:00:00", "13:15:00", "13:30:00", "13:45:00", "14:00:00", "14:15:00", "14:30:00", "14:45:00", "15:00:00", "15:15:00", "15:30:00", "15:45:00"]}
More selectively, if you know you only care about particular times, you can keep only those times, which will result in a smaller, faster database:
$ quantrocket history create-db 'nasdaq-stk-15min' --universes 'nasdaq-stk' --bar-size '15 mins' --times '09:30:00' '09:45:00' '10:00:00' '15:45:00'
status: successfully created quantrocket.history.nasdaq-stk-15min.sqlite
>>> from quantrocket.history import create_db
>>> create_db("nasdaq-stk-15min", universes=["nasdaq-stk"], bar_size="15 mins", times=["09:30:00", "09:45:00", "10:00:00", "15:45:00"])
{'status': 'successfully created quantrocket.history.nasdaq-stk-15min.sqlite'}
$ curl -X PUT 'http://houston/history/databases/nasdaq-stk-15min?universes=nasdaq-stk&bar_size=15+mins&times=09%3A30%3A00&times=09%3A45%3A00&times=10%3A00%3A00&times=15%3A45%3A00'
{"status": "successfully created quantrocket.history.nasdaq-stk-15min.sqlite"}
The downside of keeping only a few times is that you'll have to collect data again if you later decide you want to analyze prices at other times of the session. An alternative is to save all the times but filter by time when querying the data, as described below.
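For example, on a database that stores the full session, you can still load only the bar times you currently need (a sketch using the get_prices function covered in the Research section):
>>> from quantrocket import get_prices
>>> # only the bars at the requested times are loaded; the rest stay on disk
>>> prices = get_prices("nasdaq-stk-15min", times=["09:30:00", "15:45:00"], fields=["Close"])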
Data collection start date
When collecting historical data, QuantRocket first queries the IB API to determine how far back historical data is available for each security. By default, QuantRocket will collect as much data as is available. However, for large databases, such as intraday databases with many securities, it may be useful to set a fixed start date. Typically, the further back you go, the fewer securities there are with available data. Setting a fixed start date limits the size of your database (reducing initial data collection time and improving backtest speed).
Deciding how far back to collect data is made easier if you know how far back it's possible to go, and how many securities in your universe are available back to any given date. Historical data availability for select exchanges is shown here. For other exchanges or for more up-to-date availability, you can use the following approach. First, create a database with no start date:
$ quantrocket history create-db 'usa-stk-15min' --universes 'usa-stk' --bar-size '15 mins'
status: successfully created quantrocket.history.usa-stk-15min.sqlite
>>> from quantrocket.history import create_db
>>> create_db("usa-stk-15min", universes=["usa-stk"], bar_size="15 mins")
{'status': 'successfully created quantrocket.history.usa-stk-15min.sqlite'}
$ curl -X PUT 'http://houston/history/databases/usa-stk-15min?universes=usa-stk&bar_size=15+mins'
{"status": "successfully created quantrocket.history.usa-stk-15min.sqlite"}
Next, instruct QuantRocket to determine historical data availability but not yet collect the data. For large universes this might take a few hours but is much faster than actually collecting all the data:
$ quantrocket history collect 'usa-stk-15min' --availability
status: the historical data will be collected asynchronously
>>> from quantrocket.history import collect_history
>>> collect_history("usa-stk-15min", availability_only=True)
{'status': 'the historical data will be collected asynchronously'}
$ curl -X POST 'http://houston/history/queue?codes=usa-stk-15min&availability_only=true'
{"status": "the historical data will be collected asynchronously"}
Monitor flightlog, and after the data availability has been saved to your database, use the Python client to query and summarize the start dates:
>>> from quantrocket.history import get_history_availability
>>> start_dates = get_history_availability("usa-stk-15min")
>>> start_dates.head()
ConId
4027 2001-11-29 14:30:00
4050 1980-03-17 14:30:00
4065 1980-03-17 14:30:00
4151 1994-04-15 13:30:00
4157 1980-03-17 14:30:00
Name: StartDate, dtype: datetime64[ns]
>>> # count how many securities have data available, cumulatively by start year
>>> cumulative_ticker_counts = start_dates.groupby(start_dates.dt.year).count().cumsum()
>>> # exclude securities with no available data (indicated by a far-future start date)
>>> cumulative_ticker_counts = cumulative_ticker_counts[cumulative_ticker_counts.index < 2100]
>>> cumulative_ticker_counts.head()
StartDate
1980 564
1981 591
1982 614
1983 662
1984 693
>>> cumulative_ticker_counts.plot(kind="bar")
When no historical data is available for a particular security, this is indicated by a far future start date of 2200-01-01.
Based on your findings, you can drop and re-create the database with a fixed start date. (The historical availability records you just collected are stored in a separate database, quantrocket.history.availability.sqlite, so you won't lose any data when dropping your history database.)
$ quantrocket history drop-db 'usa-stk-15min' --confirm-by-typing-db-code-again 'usa-stk-15min'
status: deleted quantrocket.history.usa-stk-15min.sqlite
$ quantrocket history create-db 'usa-stk-15min' --universes 'usa-stk' --bar-size '15 mins' --start-date '2005-01-01'
status: successfully created quantrocket.history.usa-stk-15min.sqlite
>>> from quantrocket.history import drop_db, create_db
>>> drop_db("usa-stk-15min", confirm_by_typing_db_code_again="usa-stk-15min")
{'status': 'deleted quantrocket.history.usa-stk-15min.sqlite'}
>>> create_db("usa-stk-15min", universes=["usa-stk"], bar_size="15 mins", start_date="2005-01-01")
{'status': 'successfully created quantrocket.history.usa-stk-15min.sqlite'}
$ curl -X DELETE 'http://houston/history/databases/usa-stk-15min?confirm_by_typing_db_code_again=usa-stk-15min'
{"status": "deleted quantrocket.history.usa-stk-15min.sqlite"}
$ curl -X PUT 'http://houston/history/databases/usa-stk-15min?universes=usa-stk&bar_size=15+mins&start_date=2005-01-01'
{"status": "successfully created quantrocket.history.usa-stk-15min.sqlite"}
Please note that for intraday bars, IB may not provide historical data as far back as their own reported start dates. For US stocks, no intraday data is available prior to January 2004, even if the reported start date is earlier. For Japan stocks, no intraday data is available prior to March 2004.
Query historical data
You can download a file of historical data:
$ quantrocket history get 'demo-stocks-1d' --start-date '2019-01-01' | csvlook --max-rows 5
| ConId | Date | Open | High | Low | Close | Volume | Wap | TradeCount |
| ------ | ---------- | ------ | ------ | ------ | ------ | -------- | -------- | ---------- |
| 265598 | 2019-01-02 | 154.89 | 158.85 | 154.23 | 157.92 | 25246800 | 156.9095 | 119993 |
| 265598 | 2019-01-03 | 143.95 | 145.72 | 142.0 | 142.19 | 72021700 | 143.634 | 336633 |
| 265598 | 2019-01-04 | 144.58 | 148.55 | 143.8 | 148.26 | 46077300 | 146.743 | 211279 |
| 265598 | 2019-01-07 | 148.64 | 148.83 | 145.9 | 147.93 | 45127300 | 147.3325 | 202827 |
| 265598 | 2019-01-08 | 149.34 | 151.82 | 148.52 | 150.75 | 32422400 | 150.095 | 155894 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
>>> import pandas as pd
>>> from quantrocket.history import download_history_file
>>> download_history_file("demo-stocks-1d",
start_date="2019-01-01",
filepath_or_buffer="demo_stocks_1d.csv")
>>> prices = pd.read_csv("demo_stocks_1d.csv", parse_dates=["Date"])
>>> prices.head()
ConId Date Open High Low Close Volume Wap TradeCount
0 265598 2019-01-02 154.89 158.85 154.23 157.92 25246800 156.9095 119993
1 265598 2019-01-03 143.95 145.72 142.00 142.19 72021700 143.6340 336633
2 265598 2019-01-04 144.58 148.55 143.80 148.26 46077300 146.7430 211279
3 265598 2019-01-07 148.64 148.83 145.90 147.93 45127300 147.3325 202827
4 265598 2019-01-08 149.34 151.82 148.52 150.75 32422400 150.0950 155894
$ curl -X GET 'http://houston/history/demo-stocks-1d.csv?start_date=2019-01-01' | head
ConId,Date,Open,High,Low,Close,Volume,Wap,TradeCount
265598,2019-01-02,154.89,158.85,154.23,157.92,25246800,156.9095,119993
265598,2019-01-03,143.95,145.72,142.0,142.19,72021700,143.634,336633
265598,2019-01-04,144.58,148.55,143.8,148.26,46077300,146.743,211279
265598,2019-01-07,148.64,148.83,145.9,147.93,45127300,147.3325,202827
265598,2019-01-08,149.34,151.82,148.52,150.75,32422400,150.095,155894
For a higher-level API for loading historical data into Python, see the get_prices function outlined in the Research section.
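As a preview, get_prices loads the same data into a DataFrame with a (Field, Date) index and one column per conid (a minimal sketch; see the Research section for details):
>>> from quantrocket import get_prices
>>> prices = get_prices("demo-stocks-1d", start_date="2019-01-01", fields=["Close", "Volume"])
>>> closes = prices.loc["Close"]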
IB Fundamental Data
IB provides customers with access to global fundamental data sourced from Reuters. IB enables data access by default, so no subscription in IB Client Portal is required; however, appropriate exchange permissions in QuantRocket are required. There are two available datasets: estimates and actuals, and financial statements.
This section is about collecting and analyzing fundamental data. For an introductory overview of the available data, see the Reuters Fundamentals data guide.
Reuters estimates and actuals
Collect Reuters estimates
To use Reuters estimates and actuals in QuantRocket, first collect the data from IB into your QuantRocket database. Then you can run queries against the database in your research and backtests.
To collect analyst estimates and actuals, specify one or more conids or universes to collect data for:
$ quantrocket fundamental collect-estimates --universes 'japan-banks' 'singapore-banks'
status: the fundamental data will be collected asynchronously
>>> from quantrocket.fundamental import collect_reuters_estimates
>>> collect_reuters_estimates(universes=["japan-banks","singapore-banks"])
{'status': 'the fundamental data will be collected asynchronously'}
$ curl -X POST 'http://houston/fundamental/reuters/estimates?universes=japan-banks&universes=singapore-banks'
{"status": "the fundamental data will be collected asynchronously"}
Multiple requests will be queued and processed sequentially. You can monitor flightlog via the command line or Papertrail to track progress:
$ quantrocket flightlog stream
quantrocket.fundamental: INFO Collecting Reuters estimates from IB for universes japan-banks, singapore-banks
quantrocket.fundamental: INFO Expected remaining runtime to collect Reuters estimates for universes japan-banks, singapore-banks: 0:04:25
quantrocket.fundamental: INFO Saved 3298 total records for 60 total securities to quantrocket.fundamental.reuters.estimates.sqlite for universes japan-banks, singapore-banks
Query Reuters estimates
To query Reuters estimates and actuals, first look up the code(s) for the metrics you care about:
$ quantrocket fundamental codes --report-types 'estimates'
estimates:
BVPS: Book Value Per Share
CAPEX: Capital Expenditure
CPS: Cash Flow Per Share
DPS: Dividend Per Share
EBIT: Earnings Before Interest and Tax
...
>>> from quantrocket.fundamental import list_reuters_codes
>>> list_reuters_codes(report_types=["estimates"])
{'estimates': {'BVPS': 'Book Value Per Share',
'CAPEX': 'Capital Expenditure',
'CPS': 'Cash Flow Per Share',
'DPS': 'Dividend Per Share',
'EBIT': 'Earnings Before Interest and Tax',
...
}}
$ curl -X GET 'http://houston/fundamental/reuters/codes?report_types=estimates'
{"estimates": {"BVPS": "Book Value Per Share", "CAPEX": "Capital Expenditure", "CPS": "Cash Flow Per Share", "DPS": "Dividend Per Share", "EBIT": "Earnings Before Interest and Tax",...}}
Let's query EPS estimates and actuals:
$ quantrocket fundamental estimates 'EPS' -u 'us-banks' -s '2014-01-01' -e '2017-01-01' -o eps_estimates.csv
$ csvlook -I --max-columns 10 --max-rows 5 eps_estimates.csv
| ConId | Indicator | Unit | FiscalYear | FiscalPeriodEndDate | FiscalPeriodType | FiscalPeriodNumber | High | Low | Mean | ... |
| ----- | --------- | ---- | ---------- | ------------------- | ---------------- | ------------------ | ---- | ---- | ------ | --- |
| 9029 | EPS | U | 2014 | 2014-03-31 | Q | 1 | 0.31 | 0.2 | 0.255 | ... |
| 9029 | EPS | U | 2014 | 2014-06-30 | Q | 2 | 0.77 | 0.73 | 0.7467 | ... |
| 9029 | EPS | U | 2014 | 2014-09-30 | Q | 3 | 0.71 | 0.63 | 0.6667 | ... |
| 9029 | EPS | U | 2014 | 2014-12-31 | A | | 2.25 | 2.23 | 2.2433 | ... |
| 9029 | EPS | U | 2014 | 2014-12-31 | Q | 4 | 0.49 | 0.47 | 0.4833 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
>>> from quantrocket.fundamental import download_reuters_estimates
>>> import io
>>> import pandas as pd
>>> f = io.StringIO()
>>> download_reuters_estimates(["EPS"],f,universes=["us-banks"],
start_date="2014-01-01", end_date="2017-01-01")
>>> eps_estimates = pd.read_csv(f, parse_dates=["FiscalPeriodEndDate", "AnnounceDate"])
>>> eps_estimates.head()
ConId Indicator Unit FiscalYear FiscalPeriodEndDate FiscalPeriodType \
0 9029 EPS U 2014 2014-03-31 Q
1 9029 EPS U 2014 2014-06-30 Q
2 9029 EPS U 2014 2014-09-30 Q
3 9029 EPS U 2014 2014-12-31 A
4 9029 EPS U 2014 2014-12-31 Q
FiscalPeriodNumber High Low Mean Median StdDev NumOfEst \
0 1.0 0.31 0.20 0.2550 0.255 0.0550 2.0
1 2.0 0.77 0.73 0.7467 0.740 0.0170 3.0
2 3.0 0.71 0.63 0.6667 0.660 0.0330 3.0
3 NaN 2.25 2.23 2.2433 2.250 0.0094 3.0
4 4.0 0.49 0.47 0.4833 0.490 0.0094 3.0
AnnounceDate UpdatedDate Actual
0 2014-05-01 11:45:00 2014-05-01T12:06:31 0.12
1 2014-07-31 11:45:00 2014-07-31T13:47:24 1.02
2 2014-11-04 12:45:00 2014-11-04T13:27:49 0.62
3 2015-02-27 12:45:00 2015-02-27T13:20:27 2.29
4 2015-02-27 12:45:00 2015-02-27T13:20:26 0.53
$ curl -X GET 'http://houston/fundamental/reuters/estimates.csv?codes=EPS&universes=us-banks&start_date=2014-01-01&end_date=2017-01-01' --output eps_estimates.csv
$ head eps_estimates.csv
ConId,Indicator,Unit,FiscalYear,FiscalPeriodEndDate,FiscalPeriodType,FiscalPeriodNumber,High,Low,Mean,Median,StdDev,NumOfEst,AnnounceDate,UpdatedDate,Actual
9029,EPS,U,2014,2014-03-31,Q,1,0.31,0.2,0.255,0.255,0.055,2,2014-05-01T11:45:00,2014-05-01T12:06:31,0.12
9029,EPS,U,2014,2014-06-30,Q,2,0.77,0.73,0.7467,0.74,0.017,3,2014-07-31T11:45:00,2014-07-31T13:47:24,1.02
9029,EPS,U,2014,2014-09-30,Q,3,0.71,0.63,0.6667,0.66,0.033,3,2014-11-04T12:45:00,2014-11-04T13:27:49,0.62
9029,EPS,U,2014,2014-12-31,A,,2.25,2.23,2.2433,2.25,0.0094,3,2015-02-27T12:45:00,2015-02-27T13:20:27,2.29
9029,EPS,U,2014,2014-12-31,Q,4,0.49,0.47,0.4833,0.49,0.0094,3,2015-02-27T12:45:00,2015-02-27T13:20:26,0.53
Reuters estimates aligned to prices
You can use a DataFrame of historical prices to get Reuters estimates and actuals data that is aligned to the price data. This makes it easy to perform matrix operations using fundamental data. Only the columns (conids) and date index of the price DataFrame are used, so it doesn't matter which price field you select:
>>> from quantrocket import get_prices
>>> prices = get_prices("japan-bank-eod", start_date="2017-01-01", fields=["Open","High","Low","Close", "Volume"])
>>> closes = prices.loc["Close"]
For intraday databases, use .loc and .xs to isolate a particular field and time, so that the DataFrame index consists only of dates. Again, the particular field and time don't matter, as only the columns and index will be used:
>>> from quantrocket import get_prices
>>> prices = get_prices("japan-bank-15min", start_date="2017-01-01", fields=["Close", "Volume"])
>>> closes = prices.loc["Close"].xs("15:30:00", level="Time")
Now use the DataFrame of prices to get a DataFrame of estimates and actuals.
>>> from quantrocket.fundamental import get_reuters_estimates_reindexed_like
>>> estimates = get_reuters_estimates_reindexed_like(
closes,
codes=["EPS", "BVPS"])
Similar to historical data, the resulting DataFrame can be thought of as several stacked DataFrames, with a MultiIndex consisting of the indicator code, the field (by default only Actual is returned), and the date. Note that get_reuters_estimates_reindexed_like shifts values forward by one day to avoid any lookahead bias.
>>> estimates.head()
ConId 265598 3691937 15124833 208813719
Indicator Field Date
BVPS Actual 2016-01-04 21.39 26.5032 5.0875 336.454
2016-01-05 21.39 26.5032 5.0875 336.454
2016-01-06 21.39 26.5032 5.0875 336.454
2016-01-07 21.39 26.5032 5.0875 336.454
2016-01-08 21.39 26.5032 5.0875 336.454
...
EPS Actual 2016-01-04 1.96 0.17 0.07 7.35
2016-01-05 1.96 0.17 0.07 7.35
2016-01-06 1.96 0.17 0.07 7.35
2016-01-07 1.96 0.17 0.07 7.35
2016-01-08 1.96 0.17 0.07 7.35
You can use .loc to isolate a particular indicator and field and perform matrix operations:
>>> book_values_per_share = estimates.loc["BVPS"].loc["Actual"]
Since the columns and date index match that of the historical data, you can perform matrix operations on prices and estimates/actuals together:
>>> # calculate price-to-book ratios
>>> pb_ratios = closes/book_values_per_share
For best performance, make two separate calls to get_reuters_estimates_reindexed_like to retrieve numeric (integer or float) vs non-numeric (string or date) fields. Pandas loads numeric fields in an optimized format compared to non-numeric fields, but mixing numeric and non-numeric fields prevents Pandas from using this optimized format, resulting in slower loads and higher memory consumption.
>>> # slower and more memory-hungry: a single call mixing numeric and non-numeric fields
>>> estimates = get_reuters_estimates_reindexed_like(
closes,
codes=["EPS", "BVPS"],
fields=["Actual", "FiscalPeriodEndDate"])
>>> eps_actuals = estimates.loc["EPS"].loc["Actual"]
>>> fiscal_periods = estimates.loc["EPS"].loc["FiscalPeriodEndDate"]
>>> # faster: separate calls for numeric and non-numeric fields
>>> estimates = get_reuters_estimates_reindexed_like(
closes,
codes=["EPS", "BVPS"],
fields=["Actual"])
>>> eps_actuals = estimates.loc["EPS"].loc["Actual"]
>>> estimates = get_reuters_estimates_reindexed_like(
closes,
codes=["EPS", "BVPS"],
fields=["FiscalPeriodEndDate"])
>>> fiscal_periods = estimates.loc["EPS"].loc["FiscalPeriodEndDate"]
Reuters financial statements
Collect Reuters financials
To use Reuters financial statements in QuantRocket, first collect the data from IB into your QuantRocket database. Then you can run queries against the database in your research and backtests.
To collect financial statements, specify one or more conids or universes to collect data for:
$ quantrocket fundamental collect-financials --universes 'japan-banks' 'singapore-banks'
status: the fundamental data will be collected asynchronously
>>> from quantrocket.fundamental import collect_reuters_financials
>>> collect_reuters_financials(universes=["japan-banks","singapore-banks"])
{'status': 'the fundamental data will be collected asynchronously'}
$ curl -X POST 'http://houston/fundamental/reuters/financials?universes=japan-banks&universes=singapore-banks'
{"status": "the fundamental data will be collected asynchronously"}
Multiple requests will be queued and processed sequentially. You can monitor flightlog via the command line or Papertrail to track progress:
$ quantrocket flightlog stream
quantrocket.fundamental: INFO Collecting Reuters financials from IB for universes japan-banks, singapore-banks
quantrocket.fundamental: INFO Expected remaining runtime to collect Reuters financials for universes japan-banks, singapore-banks: 0:00:33
quantrocket.fundamental: INFO Saved 12979 total records for 100 total securities to quantrocket.fundamental.reuters.financials.sqlite for universes japan-banks, singapore-banks
Query Reuters financials
To query Reuters financials, first look up the code(s) for the metrics you care about, optionally limiting to a particular statement type:
$ quantrocket fundamental codes --report-types 'financials' --statement-types 'CAS'
financials:
FCDP: Total Cash Dividends Paid
FPRD: Issuance (Retirement) of Debt, Net
FPSS: Issuance (Retirement) of Stock, Net
FTLF: Cash from Financing Activities
ITLI: Cash from Investing Activities
OBDT: Deferred Taxes
OCPD: Cash Payments
OCRC: Cash Receipts
...
>>> from quantrocket.fundamental import list_reuters_codes
>>> list_reuters_codes(report_types=["financials"], statement_types=["CAS"])
{'financials': {'FCDP': 'Total Cash Dividends Paid',
'FPRD': 'Issuance (Retirement) of Debt, Net',
'FPSS': 'Issuance (Retirement) of Stock, Net',
'FTLF': 'Cash from Financing Activities',
'ITLI': 'Cash from Investing Activities',
'OBDT': 'Deferred Taxes',
'OCPD': 'Cash Payments',
'OCRC': 'Cash Receipts',
...
}}
$ curl -X GET 'http://houston/fundamental/reuters/codes?report_types=financials&statement_types=CAS'
{"financials": {"FCDP": "Total Cash Dividends Paid", "FPRD": "Issuance (Retirement) of Debt, Net", "FPSS": "Issuance (Retirement) of Stock, Net", "FTLF": "Cash from Financing Activities", "ITLI": "Cash from Investing Activities", "OBDT": "Deferred Taxes", "OCPD": "Cash Payments", "OCRC": "Cash Receipts",...}}
QuantRocket reads the codes from the financial statements database; therefore, you must collect data into the database before you can list the available codes.
Let's query Net Income Before Taxes (code EIBT) for a universe of securities:
$ quantrocket fundamental financials 'EIBT' -u 'us-banks' -s '2014-01-01' -e '2017-01-01' -o financials.csv
$ csvlook -I --max-columns 6 --max-rows 5 financials.csv
| CoaCode | ConId | Amount | FiscalYear | FiscalPeriodEndDate | FiscalPeriodType | ... |
| ------- | ----- | ------ | ---------- | ------------------- | ---------------- | --- |
| EIBT | 9029 | 13.53 | 2014 | 2014-12-31 | Annual | ... |
| EIBT | 9029 | 28.117 | 2015 | 2015-12-31 | Annual | ... |
| EIBT | 12190 | -7.307 | 2014 | 2014-05-31 | Annual | ... |
| EIBT | 12190 | -4.188 | 2015 | 2015-05-31 | Annual | ... |
| EIBT | 12190 | 1.873 | 2016 | 2016-05-31 | Annual | ... |
| ... | ... | ... | ... | ... | ... | ... |
>>> from quantrocket.fundamental import download_reuters_financials
>>> import io
>>> import pandas as pd
>>> f = io.StringIO()
>>> download_reuters_financials(["EIBT"],f,universes=["us-banks"],
start_date="2014-01-01", end_date="2017-01-01")
>>> financials = pd.read_csv(f, parse_dates=["SourceDate", "FiscalPeriodEndDate"])
>>> financials.head()
CoaCode ConId Amount FiscalYear FiscalPeriodEndDate FiscalPeriodType \
0 EIBT 9029 13.530 2014 2014-12-31 Annual
1 EIBT 9029 28.117 2015 2015-12-31 Annual
2 EIBT 12190 -4.188 2015 2015-05-31 Annual
3 EIBT 12190 1.873 2016 2016-05-31 Annual
4 EIBT 270422 -3.770 2015 2015-09-30 Annual
FiscalPeriodNumber StatementType StatementPeriodLength \
0 NaN INC 12
1 NaN INC 12
2 NaN INC 12
3 NaN INC 12
4 NaN INC 12
StatementPeriodUnit UpdateTypeCode UpdateTypeDescription StatementDate \
0 M UPD Updated Normal 2014-12-31
1 M UPD Updated Normal 2015-12-31
2 M UPD Updated Normal 2015-05-31
3 M UPD Updated Normal 2016-05-31
4 M UPD Updated Normal 2015-09-30
AuditorNameCode AuditorName AuditorOpinionCode AuditorOpinion Source \
0 EY Ernst & Young LLP UNQ Unqualified 10-K
1 EY Ernst & Young LLP UNQ Unqualified 10-K
2 CROW Crowe Horwath LLP UNQ Unqualified 10-K
3 CROW Crowe Horwath LLP UNQ Unqualified 10-K
4 CROW Crowe Horwath LLP UNQ Unqualified 10-K
SourceDate
0 2015-03-13
1 2016-02-29
2 2015-08-26
3 2016-08-05
4 2015-12-18
$ curl -X GET 'http://houston/fundamental/reuters/financials.csv?codes=EIBT&universes=us-banks&start_date=2014-01-01&end_date=2017-01-01' --output financials.csv
$ head financials.csv
CoaCode,ConId,Amount,FiscalYear,FiscalPeriodEndDate,FiscalPeriodType,FiscalPeriodNumber,StatementType,StatementPeriodLength,StatementPeriodUnit,UpdateTypeCode,UpdateTypeDescription,StatementDate,AuditorNameCode,AuditorName,AuditorOpinionCode,AuditorOpinion,Source,SourceDate
EIBT,9029,13.53,2014,2014-12-31,Annual,,INC,12,M,UPD,"Updated Normal",2014-12-31,EY,"Ernst & Young LLP",UNQ,Unqualified,10-K,2015-03-13
EIBT,9029,28.117,2015,2015-12-31,Annual,,INC,12,M,UPD,"Updated Normal",2015-12-31,EY,"Ernst & Young LLP",UNQ,Unqualified,10-K,2016-02-29
EIBT,12190,-4.188,2015,2015-05-31,Annual,,INC,12,M,UPD,"Updated Normal",2015-05-31,CROW,"Crowe Horwath LLP",UNQ,Unqualified,10-K,2015-08-26
EIBT,12190,1.873,2016,2016-05-31,Annual,,INC,12,M,UPD,"Updated Normal",2016-05-31,CROW,"Crowe Horwath LLP",UNQ,Unqualified,10-K,2016-08-05
EIBT,270422,-3.77,2015,2015-09-30,Annual,,INC,12,M,UPD,"Updated Normal",2015-09-30,CROW,"Crowe Horwath LLP",UNQ,Unqualified,10-K,2015-12-18
By default, annual rather than interim statements are returned, and restatements are included. If you prefer, you can choose interim instead of annual statements, and/or you can choose to exclude restatements:
$ quantrocket fundamental financials 'EIBT' -u 'us-banks' -s '2014-01-01' -e '2017-01-01' --interim --exclude-restatements -o interim_financials.csv
$ csvlook -I --max-columns 6 --max-rows 5 interim_financials.csv
| CoaCode | ConId | Amount | FiscalYear | FiscalPeriodEndDate | FiscalPeriodType | ... |
| ------- | ------ | ------ | ---------- | ------------------- | ---------------- | --- |
| EIBT | 9029 | 15.386 | 2016 | 2016-06-30 | Interim | ... |
| EIBT | 9029 | 8.359 | 2016 | 2016-09-30 | Interim | ... |
| EIBT | 12190 | 0.744 | 2017 | 2016-08-31 | Interim | ... |
| EIBT | 12190 | -0.595 | 2017 | 2016-11-30 | Interim | ... |
| EIBT | 270422 | 1.599 | 2016 | 2016-07-01 | Interim | ... |
| ... | ... | ... | ... | ... | ... | ... |
>>> from quantrocket.fundamental import download_reuters_financials
>>> import io
>>> import pandas as pd
>>> f = io.StringIO()
>>> download_reuters_financials(["EIBT"],f,universes=["us-banks"],
interim=True,
exclude_restatements=True,
start_date="2014-01-01", end_date="2017-01-01")
>>> interim_financials = pd.read_csv(f, parse_dates=["SourceDate", "FiscalPeriodEndDate"])
>>> interim_financials.head()
CoaCode ConId Amount FiscalYear FiscalPeriodEndDate FiscalPeriodType \
0 EIBT 9029 8.359 2016 2016-09-30 Interim
1 EIBT 9029 3.459 2016 2016-12-31 Interim
2 EIBT 12190 0.744 2017 2016-08-31 Interim
3 EIBT 12190 -0.595 2017 2016-11-30 Interim
4 EIBT 270422 1.599 2016 2016-07-01 Interim
FiscalPeriodNumber StatementType StatementPeriodLength \
0 3 INC 3
1 4 INC 3
2 1 INC 3
3 2 INC 3
4 3 INC 3
StatementPeriodUnit UpdateTypeCode UpdateTypeDescription StatementDate \
0 M UPD Updated Normal 2016-09-30
1 M UCA Updated Calculated 2016-12-31
2 M UPD Updated Normal 2016-08-31
3 M UPD Updated Normal 2016-11-30
4 M UPD Updated Normal 2016-07-01
AuditorNameCode AuditorName AuditorOpinionCode AuditorOpinion \
0 NaN NaN NaN NaN
1 DHS Deloitte & Touche LLP UNQ Unqualified
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
Source SourceDate
0 10-Q 2016-11-04
1 10-K 2017-03-03
2 10-Q 2016-10-13
3 10-Q 2017-01-12
4 10-Q 2016-08-10
$ curl -X GET 'http://houston/fundamental/reuters/financials.csv?codes=EIBT&universes=us-banks&interim=True&exclude_restatements=True&start_date=2014-01-01&end_date=2017-01-01' --output interim_financials.csv
$ head interim_financials.csv
CoaCode,ConId,Amount,FiscalYear,FiscalPeriodEndDate,FiscalPeriodType,FiscalPeriodNumber,StatementType,StatementPeriodLength,StatementPeriodUnit,UpdateTypeCode,UpdateTypeDescription,StatementDate,AuditorNameCode,AuditorName,AuditorOpinionCode,AuditorOpinion,Source,SourceDate
EIBT,9029,8.359,2016,2016-09-30,Interim,3,INC,3,M,UPD,"Updated Normal",2016-09-30,,,,,10-Q,2016-11-04
EIBT,9029,3.459,2016,2016-12-31,Interim,4,INC,3,M,UCA,"Updated Calculated",2016-12-31,DHS,"Deloitte & Touche LLP",UNQ,Unqualified,10-K,2017-03-03
EIBT,12190,0.744,2017,2016-08-31,Interim,1,INC,3,M,UPD,"Updated Normal",2016-08-31,,,,,10-Q,2016-10-13
EIBT,12190,-0.595,2017,2016-11-30,Interim,2,INC,3,M,UPD,"Updated Normal",2016-11-30,,,,,10-Q,2017-01-12
EIBT,270422,1.599,2016,2016-07-01,Interim,3,INC,3,M,UPD,"Updated Normal",2016-07-01,,,,,10-Q,2016-08-10
Reuters financials aligned to prices
As with Reuters estimates, you can use a DataFrame of historical prices to get Reuters fundamental data that is aligned to the price data. This makes it easy to perform matrix operations using fundamental data.
First, isolate a particular field of your prices DataFrame. It doesn't matter what field you select, as only the date index and the column names will be used to query the fundamentals. For daily data, use .loc:
>>> from quantrocket import get_prices
>>> prices = get_prices("japan-bank-eod", start_date="2017-01-01", fields=["Open","High","Low","Close", "Volume"])
>>> closes = prices.loc["Close"]
For intraday databases, use .loc and .xs to isolate a particular field and time, so that the DataFrame index consists only of dates. Again, the particular field and time don't matter, as only the columns and index will be used:
>>> from quantrocket import get_prices
>>> prices = get_prices("japan-bank-15min", start_date="2017-01-01", fields=["Close", "Volume"])
>>> closes = prices.loc["Close"].xs("15:30:00", level="Time")
Now use the DataFrame of prices to get a DataFrame of fundamentals.
>>> # query total assets (ATOT), total liabilities (LTLL), and shares outstanding (QTCO)
>>> from quantrocket.fundamental import get_reuters_financials_reindexed_like
>>> financials = get_reuters_financials_reindexed_like(
closes,
coa_codes=["ATOT", "LTLL", "QTCO"])
Similar to historical data, the resulting DataFrame can be thought of as several stacked DataFrames, with a MultiIndex consisting of the COA (Chart of Account) code, the field (by default only Amount is returned), and the date. Note that get_reuters_financials_reindexed_like shifts fundamental values forward by one day to avoid any lookahead bias.
>>> financials.head()
ConId 4157 4165 4187 4200
CoaCode Field Date
ATOT Amount 2018-01-02 21141.294 39769.0 1545.50394 425935.0
2018-01-03 21141.294 39769.0 1545.50394 425935.0
2018-01-04 21141.294 39769.0 1545.50394 425935.0
2018-01-05 21141.294 39769.0 1545.50394 425935.0
2018-01-08 21141.294 39769.0 1545.50394 425935.0
...
QTCO Amount 2018-04-03 368.63579 557.0 101.73566 2061.06063
2018-04-04 368.63579 557.0 101.73566 2061.06063
2018-04-05 368.63579 557.0 101.73566 2061.06063
2018-04-06 368.63579 557.0 101.73566 2061.06063
2018-04-09 368.63579 557.0 101.73566 2061.06063
You can use .loc to isolate particular COA codes and fields and perform matrix operations:
>>> # isolate the Amount field for each COA code and compute book value per share
>>> tot_assets = financials.loc["ATOT"].loc["Amount"]
>>> tot_liabilities = financials.loc["LTLL"].loc["Amount"]
>>> shares_out = financials.loc["QTCO"].loc["Amount"]
>>> book_values_per_share = (tot_assets - tot_liabilities)/shares_out
Since the columns and date index match that of the historical data, you can perform matrix operations on prices and fundamentals together:
>>> # compute price-to-book ratios
>>> pb_ratios = closes/book_values_per_share
For best performance, make two separate calls to get_reuters_financials_reindexed_like to retrieve numeric (integer or float) vs non-numeric (string or date) fields. Pandas loads numeric fields in an optimized format compared to non-numeric fields, but mixing numeric and non-numeric fields prevents Pandas from using this optimized format, resulting in slower loads and higher memory consumption.
>>> # Inefficient: numeric and non-numeric fields mixed in one query
>>> financials = get_reuters_financials_reindexed_like(
closes,
        coa_codes=["ATOT", "SREV"],
fields=["Amount", "FiscalPeriodEndDate"])
>>> tot_assets = financials.loc["ATOT"].loc["Amount"]
>>> fiscal_periods = financials.loc["ATOT"].loc["FiscalPeriodEndDate"]
>>> # Better: separate queries for numeric and non-numeric fields
>>> financials = get_reuters_financials_reindexed_like(
closes,
        coa_codes=["ATOT", "SREV"],
fields=["Amount"])
>>> tot_assets = financials.loc["ATOT"].loc["Amount"]
>>> financials = get_reuters_financials_reindexed_like(
closes,
        coa_codes=["ATOT", "SREV"],
fields=["FiscalPeriodEndDate"])
>>> fiscal_periods = financials.loc["ATOT"].loc["FiscalPeriodEndDate"]
Reuters fundamental snippets
Enterprise Multiple (EV/EBITDA)
Enterprise multiple (enterprise value divided by EBITDA) is a popular valuation ratio that is not directly provided by the Reuters datasets. It can be calculated from metrics available in the Reuters financials dataset:
from quantrocket import get_prices
from quantrocket.fundamental import get_reuters_financials_reindexed_like
prices = get_prices("usa-stk-eod", fields=["Close"])
closes = prices.loc["Close"]
financials = get_reuters_financials_reindexed_like(
closes,
["QTCO", "QTPO", "STLD", "LMIN", "ACAE", "SOPI", "SDPR"])
shares_out = financials.loc["QTCO"].loc["Amount"]
preferred_shares_out = financials.loc["QTPO"].loc["Amount"]
total_debts = financials.loc["STLD"].loc["Amount"]
minority_interests = financials.loc["LMIN"].loc["Amount"]
cash = financials.loc["ACAE"].loc["Amount"]
market_values_common = closes * shares_out
market_values_preferred = closes * preferred_shares_out.fillna(0)
evs = market_values_common + market_values_preferred + total_debts + minority_interests.fillna(0) - cash
operating_profits = financials.loc["SOPI"].loc["Amount"]
depr_amorts = financials.loc["SDPR"].loc["Amount"]
ebitdas = operating_profits + depr_amorts.fillna(0)
enterprise_multiples = evs / ebitdas.where(ebitdas > 0)
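The resulting DataFrame of enterprise multiples can then be used for cross-sectional screens. The following is a plain pandas sketch (not a QuantRocket API) that flags the cheapest 20% of securities by enterprise multiple on each day:
# percentile-rank securities by enterprise multiple each day (lower = cheaper)
enterprise_multiple_ranks = enterprise_multiples.rank(axis=1, pct=True)
# boolean DataFrame flagging the cheapest 20% of securities per day
are_cheapest_quintile = enterprise_multiple_ranks <= 0.20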
Current vs prior fiscal period
Sometimes you may wish to calculate the change in a financial metric between the prior and current fiscal period. For example, suppose you wanted to calculate the change in the working capital ratio (defined as total assets / total liabilities). First, query the financial statements and calculate the current ratios:
from quantrocket import get_prices
from quantrocket.fundamental import get_reuters_financials_reindexed_like
prices = get_prices("usa-stk-eod", fields=["Close"])
closes = prices.loc["Close"]
financials = get_reuters_financials_reindexed_like(
closes,
["ATOT", "LTLL"])
tot_assets = financials.loc["ATOT"].loc["Amount"]
tot_liabilities = financials.loc["LTLL"].loc["Amount"]
current_ratios = tot_assets / tot_liabilities.where(tot_liabilities != 0)
To get the prior year ratios, a simplistic method would be to shift the current ratios forward 1 year (current_ratios.shift(252)), but this would be suboptimal because company reporting dates may not be spaced exactly one year apart. A more reliable approach is shown below:
fiscal_periods = get_reuters_financials_reindexed_like(
closes,
["ATOT"],
fields=["FiscalPeriodEndDate"]).loc["ATOT"].loc["FiscalPeriodEndDate"]
are_new_fiscal_periods = fiscal_periods != fiscal_periods.shift()
previous_current_ratios = current_ratios.shift().where(are_new_fiscal_periods).fillna(method="ffill")
ratio_increases = current_ratios > previous_current_ratios
If you want to go back more than one period, you can use the following approach, which is more flexible but has the disadvantage of running slower since the calculation is performed conid by conid:
fiscal_periods = get_reuters_financials_reindexed_like(
closes,
["ATOT"],
fields=["FiscalPeriodEndDate"]).loc["ATOT"].loc["FiscalPeriodEndDate"]
are_new_fiscal_periods = fiscal_periods != fiscal_periods.shift()
periods_ago = 4
def n_periods_ago(fundamentals_for_conid):
conid = fundamentals_for_conid.name
new_period_fundamentals = fundamentals_for_conid.where(are_new_fiscal_periods[conid]).dropna()
earlier_fundamentals = new_period_fundamentals.shift(periods_ago)
earlier_fundamentals = earlier_fundamentals.reindex(fundamentals_for_conid.index, method="ffill")
return earlier_fundamentals
earlier_current_ratios = current_ratios.apply(n_periods_ago)
Wall Street Horizon earnings announcement dates
The Wall Street Horizon earnings calendar, available via IB, provides forward-looking earnings announcement dates. (By contrast, the Reuters estimates and actuals dataset provides historical earnings announcement dates but does not provide forward-looking announcement dates.)
To access Wall Street Horizon data you must subscribe to the data feed via IB Client Portal, in the Research subscriptions section.
For US and Canadian stocks, Wall Street Horizon provides good coverage and the data generally indicate whether the earnings announcement occurred before, during, or after the market session. Coverage of other countries is more limited and generally does not include the time of day of the announcement.
Collect earnings announcement dates
To use Wall Street Horizon earnings announcement dates in QuantRocket, first collect the data from IB into your QuantRocket database. Specify one or more conids or universes to collect data for:
$ quantrocket fundamental collect-wsh --universes 'usa-stk'
status: the fundamental data will be collected asynchronously
>>> from quantrocket.fundamental import collect_wsh_earnings_dates
>>> collect_wsh_earnings_dates(universes="usa-stk")
{'status': 'the fundamental data will be collected asynchronously'}
$ curl -X POST 'http://houston/fundamental/wsh/calendar?universes=usa-stk'
{"status": "the fundamental data will be collected asynchronously"}
Multiple requests will be queued and processed sequentially. Monitor flightlog to track progress:
$ quantrocket flightlog stream
quantrocket.fundamental: INFO Collecting Wall Street Horizon earnings dates from IB for universes usa-stk
quantrocket.fundamental: INFO Expected remaining runtime to collect Wall Street Horizon earnings dates for universes usa-stk: 1:34:28
quantrocket.fundamental: INFO Saved 2671 total records for 2556 total securities to quantrocket.fundamental.wsh.calendar.sqlite for universes usa-stk (data unavailable for 1742 securities)
Wall Street Horizon returns the upcoming announcement for each security, including the date, status (confirmed or unconfirmed), and the time of day if available. Over successive data collection runs the details of a particular announcement may change as Wall Street Horizon gains new information. For example, an "unconfirmed" status may change to "confirmed." When this happens, QuantRocket will preserve both the old record and the updated record, allowing you to establish a point-in-time database of announcement forecasts.
Query earnings announcement dates
You can download the earnings announcement dates to CSV:
$ quantrocket fundamental wsh -u 'usa-stk' -o announcement_dates.csv
$ csvlook -I announcement_dates.csv
| ConId | Date | Time | Status | Period | LastUpdated |
| ----- | ---------- | ------------- | ----------- | ------ | ------------------- |
| 4027 | 2019-05-21 | Before Market | Unconfirmed | 2019-1 | 2019-04-04T02:17:27 |
| 4050 | 2019-06-05 | After Market | Unconfirmed | 2019-2 | 2019-04-04T02:17:27 |
| 4065 | 2019-04-17 | Before Market | Confirmed | 2019-1 | 2019-04-04T01:15:46 |
| 4151 | 2019-04-22 | After Market | Confirmed | 2019-1 | 2019-04-04T01:15:49 |
| 4157 | 2019-05-29 | Before Market | Unconfirmed | 2019-2 | 2019-04-04T02:17:27 |
| ... | ... | ... | ... | ... | ... |
>>> from quantrocket.fundamental import download_wsh_earnings_dates
>>> import pandas as pd
>>> download_wsh_earnings_dates("announcement_dates.csv", universes="usa-stk")
>>> announcement_dates = pd.read_csv("announcement_dates.csv", parse_dates=["Date", "LastUpdated"])
>>> announcement_dates.head()
ConId Date Time Status Period LastUpdated
0 4027 2019-05-21 Before Market Unconfirmed 2019-1 2019-04-04 02:17:27
1 4050 2019-06-05 After Market Unconfirmed 2019-2 2019-04-04 02:17:27
2 4065 2019-04-17 Before Market Confirmed 2019-1 2019-04-04 01:15:46
3 4151 2019-04-22 After Market Confirmed 2019-1 2019-04-04 01:15:49
4 4157 2019-05-29 Before Market Unconfirmed 2019-2 2019-04-04 02:17:27
$ curl -X GET 'http://houston/fundamental/wsh/calendar.csv?universes=usa-stk' --output announcement_dates.csv
$ head announcement_dates.csv
ConId,Date,Time,Status,Period,LastUpdated
4027,2019-05-21,"Before Market",Unconfirmed,2019-1,2019-04-04T02:17:27
4050,2019-06-05,"After Market",Unconfirmed,2019-2,2019-04-04T02:17:27
4065,2019-04-17,"Before Market",Confirmed,2019-1,2019-04-04T01:15:46
4151,2019-04-22,"After Market",Confirmed,2019-1,2019-04-04T01:15:49
4157,2019-05-29,"Before Market",Unconfirmed,2019-2,2019-04-04T02:17:27
4165,2019-04-30,"Before Market",Confirmed,2019-1,2019-04-04T02:17:27
4200,2019-08-15,"Before Market",Confirmed,2019-1,2019-04-04T02:17:27
4205,2019-04-25,"After Market",Confirmed,2019-1,2019-04-04T01:15:51
4211,2019-04-25,"Before Market",Unconfirmed,2019-1,2019-04-04T02:17:27
Because QuantRocket preserves changes to records over successive data collection runs, there may be multiple records for a given security and fiscal period. In the following example, Wall Street Horizon was originally expecting an announcement on April 25 but later confirmed the announcement for April 30:
$ quantrocket fundamental wsh -i 234647242 | csvlook
| ConId | Date | Time | Status | Period | LastUpdated |
| --------- | ---------- | ------------- | ----------- | ------ | ------------------- |
| 234647242 | 2019-04-25 | Unspecified | Unconfirmed | 2019-1 | 2018-03-22 13:07:13 |
| 234647242 | 2019-04-30 | Before Market | Confirmed | 2019-1 | 2019-04-04 02:17:27 |
>>> download_wsh_earnings_dates("announcement_dates.csv", conids=234647242)
>>> announcement_dates = pd.read_csv("announcement_dates.csv", parse_dates=["Date", "LastUpdated"])
>>> announcement_dates.head()
ConId Date Time Status Period LastUpdated
0 234647242 2019-04-25 Unspecified Unconfirmed 2019-1 2018-03-22 13:07:13
1 234647242 2019-04-30 Before Market Confirmed 2019-1 2019-04-04 02:17:27
$ curl -X GET 'http://houston/fundamental/wsh/calendar.csv?conids=234647242'
ConId,Date,Time,Status,Period,LastUpdated
234647242,2019-04-25,Unspecified,Unconfirmed,2019-1,2018-03-22T13:07:13
234647242,2019-04-30,"Before Market",Confirmed,2019-1,2019-04-04T02:17:27
If you only want the latest record for any given fiscal period, you should dedupe on ConId and Period, keeping only the latest record as indicated by the LastUpdated field:
>>> # keep only the latest record for each security and fiscal period
>>> announcement_dates = announcement_dates.sort_values("LastUpdated").drop_duplicates(subset=["ConId", "Period"], keep="last")
The function get_wsh_earnings_dates_reindexed_like performs this deduping logic automatically.
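Because older records are preserved, you can also reconstruct what was known as of an earlier date by filtering on LastUpdated before deduping. The following is a plain pandas sketch using the columns shown above; the as-of date is arbitrary:
>>> # keep only records that were known as of the as-of date
>>> as_of_date = "2019-01-01"
>>> known_dates = announcement_dates[announcement_dates.LastUpdated <= as_of_date]
>>> # then keep the latest known record per security and fiscal period
>>> known_dates = known_dates.sort_values("LastUpdated").drop_duplicates(subset=["ConId", "Period"], keep="last")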
Earnings announcement dates aligned to prices
You can use a DataFrame of historical prices to get earnings announcement dates that are aligned to the price data.
>>> from quantrocket import get_prices
>>> from quantrocket.fundamental import get_wsh_earnings_dates_reindexed_like
>>> prices = get_prices("tech-giants-1d", start_date="2019-01-01", fields="Close")
>>> closes = prices.loc["Close"]
>>> announcements = get_wsh_earnings_dates_reindexed_like(closes)
Since Wall Street Horizon data is forward-looking only, this function may not return any data until a few days or weeks after the initial data collection.
By default, only the Time field is returned:
>>> announcements.tail()
ConId 265598 3691937 15124833 208813719
Field Date
Time 2019-04-26 NaN NaN NaN NaN
2019-04-27 NaN NaN NaN NaN
2019-04-28 NaN NaN NaN NaN
2019-04-29 NaN NaN NaN After Market
2019-04-30 Before Market NaN NaN NaN
The resulting DataFrame is sparse: the values are not forward-filled, nor are the announcement dates shifted forward. By default the results are limited to confirmed announcements.
You can get a boolean DataFrame indicating announcements that occurred since the prior close by combining announcements that occurred before today's open or after yesterday's close:
>>> announce_times = announcements.loc["Time"]
>>> announced_since_prior_close = (announce_times == "Before Market") | (announce_times.shift() == "After Market")
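One possible use of this boolean DataFrame is to suppress trading signals for securities that have just reported earnings. The sketch below assumes a hypothetical signals DataFrame with the same index and columns as closes; it is not a QuantRocket API:
>>> # hypothetical signals DataFrame (same shape as closes)
>>> signals = closes.notnull().astype(int)
>>> # zero out signals for securities that announced earnings since the prior close
>>> signals = signals.where(~announced_since_prior_close, 0)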
Suppose you are live trading an end-of-day Moonshot strategy and want to get a boolean DataFrame indicating announcements that will occur before the next session's open. First, you must extend the index of the prices DataFrame to include the next session. This can be done with ib_trading_calendars:
>>> from ib_trading_calendars import get_calendar
>>> nyse_cal = get_calendar("NYSE")
>>> latest_session = closes.index.max()
>>> # find the next session after the latest date in the prices DataFrame
>>> next_session = nyse_cal.next_open(latest_session.replace(hour=23, minute=59)).date()
>>> closes = closes.reindex(closes.index.union([next_session]))
>>> closes.index.name = "Date"
Then get the announcements, shifting pre-market announcements backward:
>>> announcements = get_wsh_earnings_dates_reindexed_like(closes)
>>> announce_times = announcements.loc["Time"]
>>> announces_before_next_open = (announce_times == "After Market") | (announce_times.shift(-1) == "Before Market")
Finally, if needed, restore the DataFrame indexes to their original shape:
>>> closes = closes.drop(next_session)
>>> announces_before_next_open = announces_before_next_open.drop(next_session)
Fundamentals query cache
The fundamental service utilizes a file cache to improve query performance. When you query any of the fundamentals endpoints (Reuters, Sharadar, etc.), the data is loaded from the database and the resulting file is cached by the fundamental service. Later, if you query again using exactly the same query parameters, the cached file will be returned without hitting the database, resulting in a faster response. Whenever you collect fundamental data, the cached files are invalidated, forcing the subsequent query to hit the database in order to see the refreshed data.
Clear the cache
File caching usually requires no special action or awareness by the user, but there are a few edge cases where you might need to clear the cache manually:
- if you query fundamentals by universe, then change the constituents of the universe, then query again with the same parameters, the fundamental service won't know the universe constituents changed and will return the cached file that was generated using the original universe constituents
- if you query fundamentals, then overwrite the database by pulling another version of the database from S3, then query again with the same parameters, the fundamental service will return the cached file that was generated using the original database
If a fundamentals query is not returning expected results and you suspect caching is to blame, you can either vary the query parameters slightly (for example change the date range) to bypass the cache, or re-create the fundamental container (not just restart it) to clear all cached files.
Short Sale Data
QuantRocket provides current and historical short sale availability data from IB. The dataset includes the number of shortable shares available and the associated borrow fees. You can use this dataset to model the constraints and costs of short selling.
IB updates short sale availability data every 15 minutes. IB does not provide a historical archive of data but QuantRocket maintains a historical archive dating from April 16, 2018.
No IB market data subscriptions are required to access this dataset but you must have the appropriate exchange permissions in QuantRocket.
Collect short sale data
Shortable shares data and borrow fee data are stored separately but have similar APIs. Both datasets are organized by country. The available country names are:
australia, austria, belgium, british, canada, dutch, france, germany, hongkong, india, italy, japan, mexico, spain, swedish, swiss, usa
To use the data, first collect the desired dataset and countries from QuantRocket's archive into your local database. For shortable shares:
$ quantrocket fundamental collect-shortshares --countries 'japan' 'usa'
status: the shortable shares will be collected asynchronously
>>> from quantrocket.fundamental import collect_shortable_shares
>>> collect_shortable_shares(countries=["japan","usa"])
{'status': 'the shortable shares will be collected asynchronously'}
$ curl -X POST 'http://houston/fundamental/stockloan/shares?countries=japan&countries=usa'
{"status": "the shortable shares will be collected asynchronously"}
Similarly for borrow fees:
$ quantrocket fundamental collect-shortfees --countries 'japan' 'usa'
status: the borrow fees will be collected asynchronously
>>> from quantrocket.fundamental import collect_borrow_fees
>>> collect_borrow_fees(countries=["japan","usa"])
{'status': 'the borrow fees will be collected asynchronously'}
$ curl -X POST 'http://houston/fundamental/stockloan/fees?countries=japan&countries=usa'
{"status": "the borrow fees will be collected asynchronously"}
You can pass an invalid country such as "?" to either of the above endpoints to see the available country names.
QuantRocket will collect the data in 1-month batches and save it to your database. Monitor flightlog for progress:
2018-07-30 13:40:31 quantrocket.fundamental: INFO Collecting japan shortable shares from 2018-04-01 to present
2018-07-30 13:40:40 quantrocket.fundamental: INFO Collecting usa shortable shares from 2018-04-01 to present
2018-07-30 13:42:07 quantrocket.fundamental: INFO Saved 2993493 total shortable shares records to quantrocket.fundamental.stockloan.shares.sqlite
To update the data later, re-run the same command(s) you ran originally. QuantRocket will collect any new data since your last update and add it to your database.
Short sale data characteristics
Data storage
IB updates short sale availability data every 15 minutes, but the data for any given stock doesn't always change that frequently. To conserve disk space, QuantRocket stores the shortable shares and borrow fees data sparsely. That is, the data for any given security is stored only when the data changes. The following example illustrates:
| Timestamp (UTC)     | Shortable shares reported by IB for ABC stock | Stored in QuantRocket database |
| ------------------- | --------------------------------------------- | ------------------------------ |
| 2018-05-01T09:15:02 | 70,900                                         | yes                            |
| 2018-05-01T09:30:03 | 70,900                                         | -                              |
| 2018-05-01T09:45:02 | 70,900                                         | -                              |
| 2018-05-01T10:00:03 | 84,000                                         | yes                            |
| 2018-05-01T10:15:02 | 84,000                                         | -                              |
With this data storage design, the data is intended to be forward-filled after you query it. (The functions get_shortable_shares_reindexed_like and get_borrow_fees_reindexed_like do this for you.)
QuantRocket stores the first data point of each month for each stock regardless of whether it changed from the previous data point. This is to ensure that the data is not stored so sparsely that stocks are inadvertently omitted from date range queries. When querying and forward-filling the data you should request an initial 1-month buffer to ensure that infrequently-changing data is included in the query results. For example, if you want results back to June 17, 2018, you should query back to June 1, 2018 or earlier, as this ensures you will get the first-of-month data point for any infrequently changing securities. The functions get_shortable_shares_reindexed_like and get_borrow_fees_reindexed_like take care of this for you.
Missing data
The shortable shares and borrow fees datasets represent IB's comprehensive list of shortable stocks. If stocks are missing from the data, that means they were never available to short. Stocks that were available to short and later became unavailable will be present in the data and will have values of 0 when they became unavailable (possibly followed by nonzero values if they later became available again).
Timestamps and latency
The data timestamps are in UTC and indicate the time at which IB made the data available. It takes approximately two minutes for the data to be processed and made available in QuantRocket's archive. Once available, the data will be added to your local database the next time you collect it.
Stocks with >10M shortable shares
In the shortable shares dataset, 10000000 (10 million) is the largest number reported and means "10 million or more."
Query short sale data
You can export the short sale data to CSV (or JSON), querying by universe or conid:
$ quantrocket fundamental shortshares -u 'usa-stk' -o usa_shortable_shares.csv
$ csvlook -I --max-rows 5 usa_shortable_shares.csv
| ConId | Date | Quantity |
| ----- | ------------------- | -------- |
| 4027 | 2018-04-15T21:45:02 | 2200000 |
| 4027 | 2018-04-16T13:15:03 | 2300000 |
| 4027 | 2018-04-17T09:15:03 | 2100000 |
| 4027 | 2018-04-17T11:15:02 | 2000000 |
| 4027 | 2018-04-17T11:45:02 | 2100000 |
>>> from quantrocket.fundamental import download_shortable_shares
>>> import pandas as pd
>>> download_shortable_shares(
"usa_shortable_shares.csv",
universes=["usa-stk"])
>>> shortable_shares = pd.read_csv(
"usa_shortable_shares.csv",
parse_dates=["Date"])
>>> shortable_shares.head()
ConId Date Quantity
0 4027 2018-04-15 21:45:02 2200000
1 4027 2018-04-16 13:15:03 2300000
2 4027 2018-04-17 09:15:03 2100000
3 4027 2018-04-17 11:15:02 2000000
4 4027 2018-04-17 11:45:02 2100000
$ curl -X GET 'http://houston/fundamental/stockloan/shares.csv?universes=usa-stk' --output usa_shortable_shares.csv
$ head usa_shortable_shares.csv
ConId,Date,Quantity
4027,2018-04-15T21:45:02,2200000
4027,2018-04-16T13:15:03,2300000
4027,2018-04-17T09:15:03,2100000
4027,2018-04-17T11:15:02,2000000
4027,2018-04-17T11:45:02,2100000
Similarly with borrow fees:
$ quantrocket fundamental shortfees -u 'usa-stk' -o usa_borrow_fees.csv
$ csvlook -I --max-rows 5 usa_borrow_fees.csv
| ConId | Date | FeeRate |
| ----- | ------------------- | ------- |
| 4027 | 2018-04-15T21:45:02 | 0.25 |
| 4027 | 2018-04-24T14:15:02 | 0.262 |
| 4027 | 2018-04-25T14:15:03 | 0.2945 |
| 4027 | 2018-04-26T14:15:03 | 0.2642 |
| 4027 | 2018-04-27T14:15:02 | 0.2609 |
>>> from quantrocket.fundamental import download_borrow_fees
>>> import pandas as pd
>>> download_borrow_fees(
"usa_borrow_fees.csv",
universes=["usa-stk"])
>>> borrow_fees = pd.read_csv(
"usa_borrow_fees.csv",
parse_dates=["Date"])
>>> borrow_fees.head()
ConId Date FeeRate
0 4027 2018-04-15 21:45:02 0.2500
1 4027 2018-04-24 14:15:02 0.2620
2 4027 2018-04-25 14:15:03 0.2945
3 4027 2018-04-26 14:15:03 0.2642
4 4027 2018-04-27 14:15:02 0.2609
$ curl -X GET 'http://houston/fundamental/stockloan/fees.csv?universes=usa-stk' --output usa_borrow_fees.csv
$ head usa_borrow_fees.csv
ConId,Date,FeeRate
4027,2018-04-15T21:45:02,0.25
4027,2018-04-24T14:15:02,0.262
4027,2018-04-25T14:15:03,0.2945
4027,2018-04-26T14:15:03,0.2642
4027,2018-04-27T14:15:02,0.2609
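As noted above, these records are stored sparsely. If you want to build a continuous daily series yourself rather than using the reindexed functions described in the next section, you can pivot and forward-fill the queried records with pandas (remembering to query an extra month of history to capture the first-of-month data points). This is a plain pandas sketch using the shortable_shares DataFrame queried above, not a QuantRocket API:
>>> # take the last reported quantity per security per calendar day
>>> shortable_shares["Day"] = shortable_shares.Date.dt.normalize()
>>> last_per_day = shortable_shares.groupby(["Day", "ConId"]).Quantity.last()
>>> # pivot into a Day x ConId matrix and forward-fill so each day shows the latest known value
>>> shares_matrix = last_per_day.unstack(level="ConId").fillna(method="ffill")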
Short sale data aligned to prices
As with Reuters fundamentals, you can use a DataFrame of historical prices to get shortable shares or borrow fees data that is aligned to the price data.
First, isolate a particular field of your prices DataFrame. It doesn't matter what field you select, as only the date index and the column names will be used to query the short sale data. For daily data, use .loc:
>>> from quantrocket import get_prices
>>> prices = get_prices("usa-stk-1d", start_date="2018-04-16", fields=["Open","Close", "Volume"])
>>> closes = prices.loc["Close"]
For intraday databases, use .loc and .xs to isolate a particular field and time, so that the DataFrame index consists only of dates. Again, the particular field and time don't matter, as only the columns and index will be used:
>>> from quantrocket import get_prices
>>> prices = get_prices("usa-stk-15min", start_date="2018-04-16", fields=["Close", "Volume"], times="15:30:00")
>>> closes = prices.loc["Close"].xs("15:30:00", level="Time")
Now use the DataFrame of prices to get a DataFrame of shortable shares and/or borrow fees:
>>> from quantrocket.fundamental import get_shortable_shares_reindexed_like, get_borrow_fees_reindexed_like
>>> shortable_shares = get_shortable_shares_reindexed_like(closes)
>>> borrow_fees = get_borrow_fees_reindexed_like(closes)
The resulting DataFrame has a DatetimeIndex matching the input DataFrame:
>>> shortable_shares.head()
ConId 4027 4050 4065 4151 \
Date
2018-04-16 2200000.0 3000000.0 10000000.0 550000.0
2018-04-17 2300000.0 2900000.0 10000000.0 600000.0
2018-04-18 2100000.0 3000000.0 10000000.0 550000.0
2018-04-19 2100000.0 3000000.0 10000000.0 550000.0
2018-04-20 2100000.0 3000000.0 10000000.0 550000.0
The data for each date is as of midnight UTC. You can specify a different time and timezone using the time parameter:
>>> # get shortable shares as of 9:30 AM New York time
>>> shortable_shares = get_shortable_shares_reindexed_like(closes, time="09:30:00 America/New_York")
>>> # get borrow fees as of 5:00 PM New York time
>>> borrow_fees = get_borrow_fees_reindexed_like(closes, time="17:00:00 America/New_York")
Dates prior to April 16, 2018 (the start date of QuantRocket's historical archive) will have NaNs in the resulting DataFrame.
Borrow fees are stored as annualized interest rates. For example, 1.0198 indicates an annualized interest rate of 1.0198%:
>>> borrow_fees.head()
ConId 4187 4200 4205 4211 \
Date
2018-04-16 1.0198 0.7224 0.5954 0.2500
2018-04-17 0.5023 0.5912 0.5954 0.2500
2018-04-18 0.6257 0.5925 0.5943 0.8844
2018-04-19 0.9537 0.5946 0.6463 0.8844
2018-04-20 1.6422 0.5936 0.3096 0.6476
Below is an example of calculating borrow fees for a DataFrame of positions (adapted from Moonshot's BorrowFees slippage class):
borrow_fees = get_borrow_fees_reindexed_like(positions)
# convert the annualized percentage rate to a decimal rate
borrow_fees = borrow_fees / 100
# convert the annual rate to an approximate daily rate (252 trading days)
daily_borrow_fees = borrow_fees / 252
# assess fees on the absolute value of short positions only
assessed_fees = positions.where(positions < 0, 0).abs() * daily_borrow_fees
Sharadar Data
Sharadar provides premium datasets of US fundamentals and end-of-day prices, including active and delisted tickers, with over 20 years of history. You can use these datasets together for performing research and backtests that are free of survivorship bias, or you can use Sharadar fundamentals with IB historical prices for research, backtesting, and live trading.
A Sharadar premium data subscription is required.
This section is about technical usage of Sharadar data in QuantRocket. For an introductory overview of the fundamentals dataset characteristics, see the Sharadar Fundamentals data guide.
Sharadar master
Sharadar domain
Sharadar data comes with its own securities master file listing all tickers (active and delisted) included in the dataset. The Sharadar master file forms a Venn diagram with IB listings: Sharadar includes delisted tickers not available from IB; IB includes global listings not available from Sharadar; and both include currently listed US stocks. Each vendor assigns its own unique IDs (conids) to its listings. For example, the IB conid for AAPL is 265598, while the Sharadar conid for AAPL is 199059.
In QuantRocket each securities master is referred to as a "domain". IB listings and conids constitute the "main" domain and are stored in quantrocket.master.main.sqlite. Sharadar listings and conids constitute the "sharadar" domain and are stored in quantrocket.master.sharadar.sqlite.
Collect Sharadar listings
Sharadar listings are automatically updated each time you collect Sharadar fundamentals or historical prices. Therefore the listings do not need to be collected by the user. However, if you wish to update the master listings without collecting fundamentals or prices, you can do so:
$ quantrocket master collect-sharadar
status: Saved 15411 Sharadar US stock listings to quantrocket.master.sharadar.sqlite
>>> from quantrocket.master import collect_sharadar_listings
>>> collect_sharadar_listings()
{'status': 'Saved 15411 Sharadar US stock listings to quantrocket.master.sharadar.sqlite'}
$ curl -X POST 'http://houston/master/sharadar/securities'
{"status": "Saved 15411 Sharadar US stock listings to quantrocket.master.sharadar.sqlite"}
Sharadar master file
You can download the Sharadar master file by specifying the domain parameter, which tells QuantRocket to run the command against quantrocket.master.sharadar.sqlite instead of quantrocket.master.main.sqlite (with no domain parameter the command runs against the main, or IB, domain):
$ quantrocket master get --exchanges 'NYSE' --domain 'sharadar' -o sharadar_nyse_securities.csv
>>> from quantrocket.master import download_master_file
>>> download_master_file("sharadar_nyse_securities.csv", exchanges="NYSE", domain="sharadar")
$ curl -X GET 'http://houston/master/sharadar/securities.csv?exchanges=NYSE' > sharadar_nyse_securities.csv
The Sharadar master file provides all of the fields in the IB master file (with NULLs if Sharadar doesn't populate the field), plus additional fields not included in the IB master file.
Sharadar universes
The domain parameter allows you to create and manage universes in the Sharadar master:
$
$ quantrocket master universe 'nyse-stk' -f sharadar_nyse_securities.csv --domain 'sharadar'
code: nyse-stk
inserted: 5067
provided: 5067
total_after_insert: 5067
$
$ quantrocket master list-universes --domain 'sharadar'
nyse-stk: 5067
$
$ quantrocket master delete-universe 'nyse-stk' --domain 'sharadar'
code: nyse-stk
deleted: 5067
>>> from quantrocket.master import create_universe, list_universes, delete_universe
>>>
>>> create_universe("nyse-stk", infilepath_or_buffer="sharadar_nyse_securities.csv", domain="sharadar")
{'code': 'nyse-stk',
'provided': 5067,
'inserted': 5067,
'total_after_insert': 5067}
>>>
>>> list_universes(domain="sharadar")
{'nyse-stk': 5067}
>>>
>>> delete_universe("nyse-stk", domain="sharadar")
{'code': 'nyse-stk', 'deleted': 5067}
$
$ curl -X PUT 'http://houston/master/sharadar/universes/nyse-stk' --upload-file sharadar_nyse_securities.csv
{"code": "nyse-stk", "provided": 5067, "inserted": 5067, "total_after_insert": 5067}
$
$ curl -X GET 'http://houston/master/sharadar/universes'
{"nyse-stk": 5067}
$
$ curl -X DELETE 'http://houston/master/sharadar/universes/nyse-stk'
{"code": "nyse-stk", "deleted": 5067}
Sharadar-to-IB ticker translations
To support using Sharadar fundamentals with IB historical prices, QuantRocket provides a mapping of Sharadar conids to IB conids. These translations are stored in quantrocket.master.translations.sqlite.
Typically you won't need to worry about the translations other than to specify the domain ("main" or "sharadar") for certain commands/functions. However, this section helps illustrate how domains work and how to translate between them if needed.
Let's compare the AAPL listing in IB vs Sharadar. First, check the AAPL listing from IB (not passing a domain parameter defaults to the "main" or IB domain):
$ quantrocket master get -s 'AAPL' -e 'NASDAQ' -f 'Symbol' 'PrimaryExchange' 'LongName' 'Sector' 'Industry' --json | json2yaml
---
-
ConId: 265598
Symbol: "AAPL"
PrimaryExchange: "NASDAQ"
LongName: "APPLE INC"
Sector: "Technology"
Industry: "Computers"
>>> from quantrocket.master import download_master_file
>>> import io
>>> import pandas as pd
>>> f = io.StringIO()
>>> download_master_file(f, symbols=["AAPL"], exchanges=["NASDAQ"], fields=["Symbol","PrimaryExchange","LongName","Sector","Industry"])
>>> ib_listings = pd.read_csv(f)
>>> ib_listings.iloc[0]
ConId 265598
Symbol AAPL
PrimaryExchange NASDAQ
LongName APPLE INC
Sector Technology
Industry Computers
Name: 0, dtype: object
$ curl -X GET 'http://houston/master/securities.json?exchanges=NASDAQ&symbols=AAPL&fields=Symbol&fields=PrimaryExchange&fields=LongName&fields=Sector&fields=Industry' | json2yaml
---
-
ConId: 265598
Symbol: "AAPL"
PrimaryExchange: "NASDAQ"
LongName: "APPLE INC"
Sector: "Technology"
Industry: "Computers"
Then, check the AAPL listing from Sharadar by using the domain parameter:
$ quantrocket master get -s 'AAPL' -e 'NASDAQ' -f 'Symbol' 'PrimaryExchange' 'LongName' 'Sector' 'Industry' --domain 'sharadar' --json | json2yaml
---
-
ConId: 199059
Symbol: "AAPL"
PrimaryExchange: "NASDAQ"
LongName: "Apple Inc"
Sector: "Technology"
Industry: "Consumer Electronics"
>>> f = io.StringIO()
>>> download_master_file(f, symbols=["AAPL"], exchanges=["NASDAQ"], fields=["Symbol","PrimaryExchange","LongName","Sector","Industry"], domain="sharadar")
>>> sharadar_listings = pd.read_csv(f)
>>> sharadar_listings.iloc[0]
ConId 199059
Symbol AAPL
PrimaryExchange NASDAQ
LongName Apple Inc
Sector Technology
Industry Consumer Electronics
Name: 0, dtype: object
$ curl -X GET 'http://houston/master/sharadar/securities.json?exchanges=NASDAQ&symbols=AAPL&fields=Symbol&fields=PrimaryExchange&fields=LongName&fields=Sector&fields=Industry' | json2yaml
---
-
ConId: 199059
Symbol: "AAPL"
PrimaryExchange: "NASDAQ"
LongName: "Apple Inc"
Sector: "Technology"
Industry: "Consumer Electronics"
While similar, the exact listing details differ somewhat because they originate from different vendors. Most importantly, the conids are different. We can translate back and forth between conids. To find the Sharadar conid that corresponds to an IB conid:
$ quantrocket master translate 265598 --to 'sharadar'
265598: 199059
>>> from quantrocket.master import translate_conids
>>> translate_conids(list(ib_listings.ConId), to_domain="sharadar")
{265598: 199059}
$ curl -X GET 'http://houston/master/translations?conids=265598&to_domain=sharadar'
{"265598": 199059}
Conversely, to find the IB conid that corresponds to a Sharadar conid:
$ quantrocket master translate 199059 --from 'sharadar'
199059: 265598
>>> translate_conids(list(sharadar_listings.ConId), from_domain="sharadar")
{199059: 265598}
$ curl -X GET 'http://houston/master/translations?conids=199059&from_domain=sharadar'
{"199059": 265598}
US historical prices
Collect historical prices
To collect US historical prices from Sharadar, create a database tied to the "sharadar" vendor:
$ quantrocket history create-db 'sharadar-1d' --vendor 'sharadar'
status: successfully created quantrocket.history.sharadar-1d.sqlite
>>> from quantrocket.history import create_db
>>> create_db("sharadar-1d", vendor="sharadar")
{'status': 'successfully created quantrocket.history.sharadar-1d.sqlite'}
$ curl -X PUT 'http://houston/history/databases/sharadar-1d?vendor=sharadar'
{"status": "successfully created quantrocket.history.sharadar-1d.sqlite"}
Then collect the data:
$ quantrocket history collect 'sharadar-1d'
status: the historical data will be collected asynchronously
>>> from quantrocket.history import collect_history
>>> collect_history("sharadar-1d")
{'status': 'the historical data will be collected asynchronously'}
$ curl -X POST 'http://houston/history/queue?codes=sharadar-1d'
{"status": "the historical data will be collected asynchronously"}
Each time you collect the data, QuantRocket first updates the Sharadar master database with the latest listings, then collects the entire historical prices dataset your subscription gives you access to. This ensures you always have the latest, up-to-date data. Collecting and loading the data into your database takes anywhere from a few minutes up to 10 or 15 minutes, depending on the number of exchanges and years of history. When finished you'll see a completion message in flightlog:
quantrocket.history: INFO [sharadar-1d] Collecting updated Sharadar securities listings
quantrocket.history: INFO [sharadar-1d] Saved 15411 Sharadar US stock listings to quantrocket.master.sharadar.sqlite
quantrocket.history: INFO [sharadar-1d] Collecting all available history from Sharadar
quantrocket.history: INFO [sharadar-1d] Saved 9019707 total Sharadar records for 8380 total securities to quantrocket.history.sharadar-1d.sqlite
Query historical prices
You can query the database like any other history database:
>>> from quantrocket import get_prices
>>> prices = get_prices("sharadar-1d", universes=["nyse-stk"], start_date="2017-01-01", fields=["Close"])
ConId 114299 114300 114301 114302 114303 ...
Field Date
Close 2017-01-03 16.94 4.17 11.210 19.31 9.00
2017-01-04 16.73 4.17 11.074 19.36 9.00
2017-01-05 16.79 4.17 11.175 20.11 9.60
Note that the query runs against the Sharadar master, not the IB master. This means:
- Conids in the query results are Sharadar conids, not IB conids
- If you filter by universes, the Sharadar master is consulted, so the universe should have been defined in the Sharadar master
- If you filter by conids, the conids you provide are interpreted as Sharadar conids, not IB conids
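If you need to combine this data with IB-keyed data, one approach is to translate the Sharadar conids in the columns to IB conids using translate_conids, as shown in the earlier section. This is a sketch; Sharadar conids for delisted tickers may have no IB equivalent, in which case those columns will keep their Sharadar conids:
>>> from quantrocket.master import translate_conids
>>> closes = prices.loc["Close"]
>>> # map Sharadar conids to the corresponding IB conids
>>> sharadar_to_ib_conids = translate_conids(list(closes.columns), from_domain="sharadar")
>>> # rename the columns to IB conids where a translation exists
>>> closes_with_ib_conids = closes.rename(columns=sharadar_to_ib_conids)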
US fundamentals
QuantRocket supports using Sharadar fundamentals with Sharadar historical prices or with IB historical prices. For example, you can run backtests that are free of survivorship bias by using Sharadar fundamentals with Sharadar historical prices. Then, to switch to live trading, you can use Sharadar fundamentals with IB prices.
Collect fundamentals
To collect Sharadar fundamental data:
$ quantrocket fundamental collect-sharadar
status: the fundamental data will be collected asynchronously
>>> from quantrocket.fundamental import collect_sharadar_fundamentals
>>> collect_sharadar_fundamentals()
{'status': 'the fundamental data will be collected asynchronously'}
$ curl -X POST 'http://houston/fundamental/sharadar/sf1'
{"status": "the fundamental data will be collected asynchronously"}
Each time you collect the data, QuantRocket first updates the Sharadar master database with the latest listings, then collects the entire fundamental dataset your subscription gives you access to. This ensures you always have the latest, up-to-date data. Collecting and loading the data into your database takes approximately 5 minutes or less, depending on the number of exchanges and years of history. When finished you'll see a completion message in flightlog:
quantrocket.fundamental: INFO Collecting updated Sharadar securities listings
quantrocket.fundamental: INFO Saved 15411 Sharadar US stock listings to quantrocket.master.sharadar.sqlite
quantrocket.fundamental: INFO Collecting all available fundamentals from Sharadar
quantrocket.fundamental: INFO Saved 1998652 total Sharadar fundamental records for 14082 total securities to quantrocket.fundamental.sharadar.sf1.sqlite
Query fundamentals
Sharadar fundamentals can be used with Sharadar historical prices or with IB historical prices, based on the domain parameter. If the domain is "main", the query results are returned with IB conids; if the domain is "sharadar", the query results are returned with Sharadar conids. The domain parameter also determines whether the universes and conids parameters, if provided, are interpreted as referring to IB conids or Sharadar conids.
For example, in the following query, which specifies the "main" domain, we request as-reported trailing twelve month (ART) fundamentals for AAPL, specified by its IB conid (265598):
$ quantrocket fundamental sharadar --conids 265598 --domain 'main' --dimensions 'ART' -o aapl_fundamentals.csv
$ csvlook aapl_fundamentals.csv --max-columns 7 --max-rows 3
| ConId | DIMENSION | DATEKEY | REPORTPERIOD | CALENDARDATE | LASTUPDATED | REVENUE | ... |
| ------ | --------- | ---------- | ------------ | ------------ | ----------- | ------------- | --- |
| 265598 | ART | 1997-12-05 | 1997-09-26 | 1997-09-30 | 2018-08-01 | 7,081,000,000 | ... |
| 265598 | ART | 1998-02-09 | 1997-12-26 | 1997-12-31 | 2018-08-01 | | ... |
| 265598 | ART | 1998-05-11 | 1998-03-27 | 1998-03-31 | 2018-08-01 | | ... |
>>> from quantrocket.fundamental import download_sharadar_fundamentals
>>> download_sharadar_fundamentals(domain="main", filepath_or_buffer="aapl_fundamentals.csv", conids=265598, dimensions="ART")
>>> fundamentals = pd.read_csv("aapl_fundamentals.csv", parse_dates=["REPORTPERIOD", "DATEKEY", "CALENDARDATE"])
>>> fundamentals.tail()
ConId DIMENSION DATEKEY REPORTPERIOD CALENDARDATE LASTUPDATED REVENUE
78 265598 ART 2017-08-02 2017-07-01 2017-06-30 2018-08-01 2.235070e+11
79 265598 ART 2017-11-03 2017-09-30 2017-09-30 2018-08-01 2.292340e+11
80 265598 ART 2018-02-02 2017-12-30 2017-12-31 2018-08-01 2.391760e+11
$ curl -X GET 'http://houston/fundamental/sharadar/sf1.csv?conids=265598&domain=main&dimensions=ART' > aapl_fundamentals.csv
$ csvlook aapl_fundamentals.csv --max-columns 7 --max-rows 3
| ConId | DIMENSION | DATEKEY | REPORTPERIOD | CALENDARDATE | LASTUPDATED | REVENUE | ... |
| ------ | --------- | ---------- | ------------ | ------------ | ----------- | ------------- | --- |
| 265598 | ART | 1997-12-05 | 1997-09-26 | 1997-09-30 | 2018-08-01 | 7,081,000,000 | ... |
| 265598 | ART | 1998-02-09 | 1997-12-26 | 1997-12-31 | 2018-08-01 | | ... |
| 265598 | ART | 1998-05-11 | 1998-03-27 | 1998-03-31 | 2018-08-01 | | ... |
In the Sharadar master, AAPL's conid is 199059, so we can also get the data by providing this conid and specifying the "sharadar" domain. The data returned is identical except that it is indexed to the Sharadar conid rather than the IB conid:
$ quantrocket fundamental sharadar --conids 199059 --domain 'sharadar' --dimensions 'ART' -o aapl_fundamentals.csv
$ csvlook aapl_fundamentals.csv --max-columns 7 --max-rows 3
| ConId | DIMENSION | DATEKEY | REPORTPERIOD | CALENDARDATE | LASTUPDATED | REVENUE | ... |
| ------ | --------- | ---------- | ------------ | ------------ | ----------- | ------------- | --- |
| 199059 | ART | 1997-12-05 | 1997-09-26 | 1997-09-30 | 2018-08-01 | 7,081,000,000 | ... |
| 199059 | ART | 1998-02-09 | 1997-12-26 | 1997-12-31 | 2018-08-01 | | ... |
| 199059 | ART | 1998-05-11 | 1998-03-27 | 1998-03-31 | 2018-08-01 | | ... |
>>> from quantrocket.fundamental import download_sharadar_fundamentals
>>> download_sharadar_fundamentals(domain="sharadar", filepath_or_buffer="aapl_fundamentals.csv", conids=199059, dimensions="ART")
>>> fundamentals = pd.read_csv("aapl_fundamentals.csv", parse_dates=["REPORTPERIOD", "DATEKEY", "CALENDARDATE"])
>>> fundamentals.tail()
ConId DIMENSION DATEKEY REPORTPERIOD CALENDARDATE LASTUPDATED REVENUE
78 199059 ART 2017-08-02 2017-07-01 2017-06-30 2018-08-01 2.235070e+11
79 199059 ART 2017-11-03 2017-09-30 2017-09-30 2018-08-01 2.292340e+11
80 199059 ART 2018-02-02 2017-12-30 2017-12-31 2018-08-01 2.391760e+11
$ curl -X GET 'http://houston/fundamental/sharadar/sf1.csv?conids=199059&domain=sharadar&dimensions=ART' > aapl_fundamentals.csv
$ csvlook aapl_fundamentals.csv --max-columns 7 --max-rows 3
| ConId | DIMENSION | DATEKEY | REPORTPERIOD | CALENDARDATE | LASTUPDATED | REVENUE | ... |
| ------ | --------- | ---------- | ------------ | ------------ | ----------- | ------------- | --- |
| 199059 | ART | 1997-12-05 | 1997-09-26 | 1997-09-30 | 2018-08-01 | 7,081,000,000 | ... |
| 199059 | ART | 1998-02-09 | 1997-12-26 | 1997-12-31 | 2018-08-01 | | ... |
| 199059 | ART | 1998-05-11 | 1998-03-27 | 1998-03-31 | 2018-08-01 | | ... |
Sharadar fundamentals aligned to prices
You can use a DataFrame of Sharadar historical prices or IB historical prices to get Sharadar fundamental data that is aligned to the price data. This makes it easy to perform matrix operations using fundamental data.
First, load the historical prices into Pandas. In this example we load Sharadar prices:
>>> from quantrocket import get_prices
>>> prices = get_prices("sharadar-1d", start_date="2018-01-01", end_date="2019-01-01", fields=["Close"])
>>> closes = prices.loc["Close"]
Now use the DataFrame of prices to get a DataFrame of fundamentals for several desired indicators. We pass domain="sharadar" to tell the function that the input DataFrame contains Sharadar conids rather than IB conids.
>>> from quantrocket.fundamental import get_sharadar_fundamentals_reindexed_like
>>> fundamentals = get_sharadar_fundamentals_reindexed_like(
closes,
domain="sharadar",
fields=["EPS", "REVENUE", "EVEBITDA"],
dimension="ARQ")
Similar to historical data, the resulting DataFrame can be thought of as several stacked DataFrames, with a MultiIndex consisting of the field (indicator code) and the date. The columns in this example are Sharadar conids, matching the input DataFrame. The DataFrame gives each indicator's current value as of the given date. The function get_sharadar_fundamentals_reindexed_like shifts values forward by one day to avoid lookahead bias.
>>> fundamentals.head()
ConId 124254 124256 124257 124258
Field Date
EPS 2018-01-02 -0.21 -0.18 -0.02 0.28
2018-01-03 -0.21 -0.18 -0.02 0.28
2018-01-04 -0.21 -0.18 -0.02 0.28
2018-01-05 -0.21 -0.18 -0.02 0.28
2018-01-08 -0.21 -0.18 -0.02 0.28
...
REVENUE 2018-10-26 61651000.0 416087000.0 2019730.0 86217000.0
2018-10-29 61651000.0 416087000.0 2019730.0 86217000.0
2018-10-30 61651000.0 416087000.0 2019730.0 86217000.0
2018-10-31 61651000.0 416087000.0 2019730.0 86217000.0
2018-11-01 61651000.0 416087000.0 2019730.0 86217000.0
You can use .loc to isolate a particular indicator:
>>> enterprise_multiples = fundamentals.loc["EVEBITDA"]
Using this function with IB historical prices works exactly the same way except that you would pass domain="main" to tell QuantRocket that you're passing IB conids. The resulting fundamentals DataFrame will contain IB conids matching the input DataFrame.
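For example, the following sketch uses an IB history database of US stocks (the usa-stk-1d code used earlier in this guide) and differs from the Sharadar example only in the domain:
>>> from quantrocket import get_prices
>>> from quantrocket.fundamental import get_sharadar_fundamentals_reindexed_like
>>> # load IB prices; the columns are IB conids
>>> prices = get_prices("usa-stk-1d", start_date="2018-01-01", end_date="2019-01-01", fields=["Close"])
>>> closes = prices.loc["Close"]
>>> # domain="main" tells QuantRocket that the columns are IB conids
>>> fundamentals = get_sharadar_fundamentals_reindexed_like(
        closes,
        domain="main",
        fields=["EVEBITDA"],
        dimension="ARQ")
>>> enterprise_multiples = fundamentals.loc["EVEBITDA"]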
For best performance, make two separate calls to get_sharadar_fundamentals_reindexed_like to retrieve numeric (integer or float) vs non-numeric (string or date) fields. Pandas loads numeric fields in an optimized format compared to non-numeric fields, but mixing numeric and non-numeric fields prevents Pandas from using this optimized format, resulting in slower loads and higher memory consumption.
>>> # Inefficient: numeric and non-numeric fields mixed in one query
>>> fundamentals = get_sharadar_fundamentals_reindexed_like(
closes,
domain="sharadar",
fields=["EPS", "REPORTPERIOD"],
dimension="ARQ")
>>> eps = fundamentals.loc["EPS"]
>>> fiscal_period_end_dates = fundamentals.loc["REPORTPERIOD"]
>>> # Better: separate queries for numeric and non-numeric fields
>>> fundamentals = get_sharadar_fundamentals_reindexed_like(
closes,
domain="sharadar",
fields=["EPS"],
dimension="ARQ")
>>> eps = fundamentals.loc["EPS"]
>>> fundamentals = get_sharadar_fundamentals_reindexed_like(
closes,
domain="sharadar",
fields=["REPORTPERIOD"],
dimension="ARQ")
>>> fiscal_period_end_dates = fundamentals.loc["REPORTPERIOD"]
Available indicators
Available indicators can be found on the Sharadar data guide page or using the API:
$ quantrocket fundamental sharadar-codes
sf1:
ACCOCI:
Description: A component of [EQUITY] representing the accumulated change in equity
from transactions and other events and circumstances from non-owner sources;
net of tax effect; at period end. Includes foreign currency translation items;
certain pension adjustments; unrealized gains and losses on certain investments
in debt and equity securities.
IndicatorType: Balance Sheet
Name: Accumulated Other Comprehensive Income
UnitType: currency
ASSETS:
Description: Sum of the carrying amounts as of the balance sheet date of all assets
that are recognized. Major components are [CASHNEQ]; [INVESTMENTS];[INTANGIBLES];
[PPNENET];[TAXASSETS] and [RECEIVABLES].
IndicatorType: Balance Sheet
Name: Total Assets
UnitType: currency
...
>>> from quantrocket.fundamental import list_sharadar_codes
>>> list_sharadar_codes()
{'sf1': {'ACCOCI': {'Name': 'Accumulated Other Comprehensive Income',
'Description': 'A component of [EQUITY] representing the accumulated change in equity from transactions and other events and circumstances from non-owner sources; net of tax effect; at period end. Includes foreign currency translation items; certain pension adjustments; unrealized gains and losses on certain investments in debt and equity securities.',
'UnitType': 'currency',
'IndicatorType': 'Balance Sheet'},
'ASSETS': {'Name': 'Total Assets',
'Description': 'Sum of the carrying amounts as of the balance sheet date of all assets that are recognized. Major components are [CASHNEQ]; [INVESTMENTS];[INTANGIBLES]; [PPNENET];[TAXASSETS] and[RECEIVABLES].',
'UnitType': 'currency',
'IndicatorType': 'Balance Sheet'},
...}}
$ curl -X GET 'http://houston/fundamental/sharadar/codes'
{"sf1": {"ACCOCI": {"Name": "Accumulated Other Comprehensive Income", "Description": "A component of [EQUITY] representing the accumulated change in equity from transactions and other events and circumstances from non-owner sources; net of tax effect; at period end. Includes foreign currency translation items; certain pension adjustments; unrealized gains and losses on certain investments in debt and equity securities.", "UnitType": "currency", "IndicatorType": "Balance Sheet"}, "ASSETS": {"Name": "Total Assets", "Description": "Sum of the carrying amounts as of the balance sheet date of all assets that are recognized. Major components are [CASHNEQ]; [INVESTMENTS];[INTANGIBLES]; [PPNENET];[TAXASSETS] and [RECEIVABLES].", "UnitType": "currency", "IndicatorType": "Balance Sheet"},...}}
Fundamentals query cache
The fundamental service utilizes a file cache which applies to Sharadar fundamentals and which is explained in another section.
Real-time Data
QuantRocket provides a powerful feature set for collecting, querying, and streaming real-time market data from Interactive Brokers. Highlights include:
- stream or snapshot: collect a continuous stream of market data or a single snapshot of data
- tick or aggregate: collect tick data and optionally aggregate it into bar data of any size
- pull or push: pull tick or aggregate data into your code by querying, or push the stream of tick data to your code over WebSockets
- live market recording: store the data in a database for later replay
Collect tick data
Create tick database
To get started with real-time data, first create an empty database for collecting tick data. Assign a code for the database, specify one or more universes or conids, and optionally specify the fields to collect. (If not specified, "LastPrice" and "Volume" are collected. See the market data field reference for available fields.)
$ quantrocket realtime create-tick-db 'fang-stk-tick' --universes 'fang-stk' --fields 'LastPrice' 'Volume' 'BidPrice' 'AskPrice' 'BidSize' 'AskSize'
status: successfully created tick database fang-stk-tick
>>> from quantrocket.realtime import create_tick_db
>>> create_tick_db("fang-stk-tick", universes="fang-stk",
fields=["LastPrice", "Volume", "BidPrice", "AskPrice", "BidSize", "AskSize"])
{'status': 'successfully created tick database fang-stk-tick'}
$ curl -X PUT 'http://houston/realtime/databases/fang-stk-tick?universes=fang-stk&fields=LastPrice&fields=Volume&fields=BidPrice&fields=AskPrice&fields=BidSize&fields=AskSize'
{"status": "successfully created tick database fang-stk-tick"}
Make sure IB Gateway is running, then begin collecting market data:
$ quantrocket realtime collect 'fang-stk-tick'
status: the market data will be collected until canceled
>>> from quantrocket.realtime import collect_market_data
>>> collect_market_data("fang-stk-tick")
{'status': 'the market data will be collected until canceled'}
$ curl -X POST 'http://houston/realtime/collections?codes=fang-stk-tick'
{"status": "the market data will be collected until canceled"}
You can create any number of databases with differing configurations and collect data for more than one database at a time (subject to concurrent ticker limits).
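For example, assuming a second tick database such as 'etf-sampler-trades' has also been created, a minimal sketch of starting both collections at once (the function accepts one or more database codes) might look like this:
>>> from quantrocket.realtime import collect_market_data
>>> # start collections for both databases; each counts toward your concurrent ticker limits
>>> collect_market_data(["fang-stk-tick", "etf-sampler-trades"])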
Monitor data collection
There are numerous ways to monitor the flow of data as it's being collected.
You can view a simple summary of active collections, which will display the number of securities by database code (you can use --detail/detail=True if you want to see actual conids by database code instead of summary counts):
$ quantrocket realtime active
ib:
etf-sampler-trades: 6
fang-stk-tick: 5
>>> from quantrocket.realtime import get_active_collections
>>> get_active_collections()
{'ib': {'etf-sampler-trades': 6, 'fang-stk-tick': 5}}
$ curl -X GET 'http://houston/realtime/collections'
{"ib": {"etf-sampler-trades": 6, "fang-stk-tick": 5}}
You can monitor the detailed flightlog stream, which will print a summary approximately every minute of the total ticks and tickers recently received:
$ quantrocket flightlog stream -d
...
┌──────────────────────────────────────────────────┐
│ IB market data received: │
│ ibg1 │
│ unique_tickers total_ticks │
│ received at 20:04 UTC 11 2759 │
│ received at 20:05 UTC 11 2716 │
│ received at 20:06 UTC 11 2624 │
│ received at 20:07 UTC 11 2606 │
│ received at 20:08 UTC 11 2602 │
│ received at 20:09 UTC 11 2613 │
│ received at 20:10 UTC 11 2800 │
│ received at 20:11 UTC 11 2518 │
│ received at 20:12 UTC 11 2444 │
│ active collections 11 │
└──────────────────────────────────────────────────┘
...
You can connect directly to the data over a WebSocket to see the full, unfiltered stream, or you can query the database to see what's recently arrived.
Cancel data collection
You can cancel data collection by database code (optionally limiting by universe or conid), which returns the remaining active collections after cancellation:
$ quantrocket realtime cancel 'fang-stk-tick'
ib:
etf-sampler-trades: 6
>>> from quantrocket.realtime import cancel_market_data
>>> cancel_market_data("fang-stk-tick")
{'ib': {'etf-sampler-trades': 6}}
$ curl -X DELETE 'http://houston/realtime/collections?codes=fang-stk-tick'
{"ib": {"etf-sampler-trades": 6}}
Or you can cancel everything:
$ quantrocket realtime cancel --all
>>> cancel_market_data(cancel_all=True)
{}
$ curl -X DELETE 'http://houston/realtime/collections?cancel_all=True'
{}
Another option is to indicate a cancellation time when you initiate the data collection. You can specify a specific time and timezone, for example to cancel data collection after the US market close:
$ quantrocket realtime collect 'fang-stk-tick' --until '16:01:00 America/New_York'
status: the market data will be collected until 16:01:00 America/New_York
>>> from quantrocket.realtime import collect_market_data
>>> collect_market_data("fang-stk-tick", until="16:01:00 America/New_York")
{'status': 'the market data will be collected until 16:01:00 America/New_York'}
$ curl -X POST 'http://houston/realtime/collections?codes=fang-stk-tick&until=16:01:00+America/New_York'
{"status": "the market data will be collected until 16:01:00 America/New_York"}
Or you can specify a Pandas timedelta string, for example to cancel data collection in 30 minutes:
$ quantrocket realtime collect 'fang-stk-tick' --until '30m'
status: the market data will be collected until 30m
>>> from quantrocket.realtime import collect_market_data
>>> collect_market_data("fang-stk-tick", until="30m")
{'status': 'the market data will be collected until 30m'}
$ curl -X POST 'http://houston/realtime/collections?codes=fang-stk-tick&until=30m'
{"status": "the market data will be collected until 30m"}
Concurrent ticker limits
Ticker limits apply to streaming market data but do not apply to snapshot data.
Interactive Brokers limits the number of securities you can stream simultaneously. By default, the limit is 100 concurrent tickers per IB Gateway. The limit can be increased in several ways:
- run multiple IB Gateways. QuantRocket will split requests between the IB Gateways, thereby increasing your ticker limit.
- purchase quote booster packs through IB Client Portal. Each purchased booster pack enables an additional 100 concurrent market data lines.
- accounts which are of significant size or which generate significant monthly commissions are allotted more generous ticker limits. See the "Market Data Display" section of the IB website to learn more about how concurrent ticker limits are calculated.
When you exceed your ticker limits, the IB API returns a "max tickers exceeded" error message for each security above the limit. QuantRocket automatically detects this error message and, if multiple IB Gateways are running, attempts to re-submit the rejected request to a different IB Gateway with additional capacity. Thus, you can run multiple IB Gateways with differing ticker limits and QuantRocket will split up the requests appropriately. If the ticker capacity is maxed out on all connected gateways, you will see warnings in flightlog:
quantrocket.realtime: WARNING All connected gateways have maxed out their concurrent market data collections, skipping SQM STK (conid 12374), please cancel existing collections or increase your market data lines then re-collect this security (max tickers: ibg1:100)
Streaming vs snapshot data
By default, streaming market data is collected. An alternative option is to collect a single snapshot of data. To do so, use the snapshot parameter. The optional wait parameter will cause the command to block until the data collection is complete:
$ quantrocket realtime collect 'usa-stk-quote' --snapshot --wait
status: completed market data snapshot for usa-stk-quote
>>> from quantrocket.realtime import collect_market_data
>>> collect_market_data("usa-stk-quote", snapshot=True, wait=True)
{'status': 'completed market data snapshot for usa-stk-quote'}
$ curl -X POST 'http://houston/realtime/collections?codes=usa-stk-quote&snapshot=True&wait=True'
{"status": "completed market data snapshot for usa-stk-quote"}
Aside from the obvious difference that snapshot data captures a single point in time while streaming data captures a period of time, below are the major points of comparison between streaming and snapshot data.
Ticker limit
The primary advantage of snapshot data is that it is not subject to concurrent ticker limits. If you want the latest quote for several thousand stocks and are limited to 100 concurrent tickers, snapshot data is the best choice.
Initialization latency
When collecting market data (streaming or snapshot) for several thousand securities, it can take a few minutes to issue all of the initial market data requests to the IB API, after which data flows in real time. (This is because the IB API limits the rate of messages that the client can send to the API, but not the rate of messages that the API can send to the client). With streaming data collection, you can work around this initial latency by simply initiating data collection a few minutes before you need the data. With snapshot data, this isn't possible since you're not collecting a continuous stream.
Fields supported
Snapshot data only supports a subset of the fields supported by streaming data. See the market data field reference.
IB market data field reference
IB streaming market data does not deliver every tick but is sampled and delivers ticks representing an average over the sampling interval. The sampling interval is 250 ms (4 samples per second) for stocks, futures, and non-US options, 100 ms (10 samples per second) for US options, and 5 ms (200 samples per second) for FX pairs.
For most fields, IB does not provide a timestamp, so QuantRocket provides one: the Date field returned with real-time data indicates the time when the data first arrived in QuantRocket. Certain IB-provided timestamps are available, however; see LastTimestamp and TimeSales.
Due to the large number of market data fields and asset classes supported by Interactive Brokers, not all fields are applicable to all asset classes. Additionally, not all fields are available at all times of day. If a particular field is unavailable for a particular security at a particular time, often the IB API will not return an error message but will simply return no data. If you expect data but none is being returned, check whether you can view the data in Trader Workstation; data availability through the IB API mirrors availability in Trader Workstation.
Trades and quotes
Field | Description | Supports snapshot? |
--- | --- | --- |
BidSize | Number of contracts or lots offered at the bid price | ✔ |
BidPrice | Highest priced bid for the contract | ✔ |
AskPrice | Lowest price offer on the contract | ✔ |
AskSize | Number of contracts or lots offered at the ask price | ✔ |
LastPrice | Last price at which the contract traded | ✔ |
LastSize | Number of contracts or lots traded at the last price. See note below. | ✔ |
Volume | Trading volume for the day. See note below. | ✔ |
LastTimestamp | Time of the last trade (in UNIX time). This field is provided only for trades, not quotes, and as it arrives separately from LastPrice , it can be difficult to know which LastPrice it corresponds to. It can however be used to calculate latency by comparing the timestamp to the QuantRocket-generated timestamp. See Time and sales for correlating trades with IB-provided timestamps. | ✔ |
LastSize vs Volume
The Volume field contains the cumulative volume for the day, while the LastSize field contains the size of the last trade. Consider using the Volume field for trade size calculation rather than LastSize: because IB market data is not tick-by-tick, LastSize may not provide a complete picture of all trades that have occurred, but the cumulative Volume field will. Trade size can be derived from volume by taking a diff in Pandas:
volumes = prices.loc["Volume"]  # cumulative volume for the day
trade_sizes = volumes.diff()    # change in cumulative volume = size of each trade
Time and sales
TimeSales and TimeSalesFiltered provide an alternative method of collecting trades (but not quotes). These fields are the API equivalent of the Time and Sales window in Trader Workstation.
The primary advantage of these fields is that they provide the trade price, trade size, and trade timestamp (plus other fields) as a unified whole, unlike LastPrice, LastSize, and LastTimestamp, which arrive independently and thus can be difficult to associate with one another in fast-moving markets.
Field | Description | Supports snapshot? |
--- | --- | --- |
TimeSales | Last trade details corresponding to Time & Sales window in TWS. Includes additional trade types such as combos, odd lots, derivatives, etc. that are not reported by the LastPrice field. (In the IB API documentation the TimeSales field is called RtVolume.) | - |
TimeSalesFiltered | Identical to TimeSales but excludes combos, odd lots, derivatives, etc. (In the IB API documentation the TimeSalesFiltered field is called RtTradeVolume.) | - |
When you request TimeSales or TimeSalesFiltered, several nested fields are returned:
- LastPrice - trade price
- LastSize - trade size
- LastTimestamp - UTC datetime of trade
- Volume - total traded volume for the day
- Vwap - volume-weighted average price for the day
- OneFill - whether or not the trade was filled by a single market maker
When streaming over WebSockets, these fields will arrive in a nested data structure:
{
"v": "ib",
"i": 15124833,
"t": "2019-06-05T18:23:16.532644",
"f": "TimeSales",
"d": {
"LastPrice":356.31,
"LastSize": 100,
"LastTimestamp": "2019-06-05T18:23:16.409000",
"Volume": 3043700,
"Vwap": 353.30651072,
"OneFill": 1
}
}
CSV output queried from the database will flatten the nested structure using the following naming convention: TimeSalesLastPrice, TimeSalesLastSize, etc.
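For example, a sketch of pulling the flattened TimeSales columns into Pandas, assuming a hypothetical tick database named 'fang-stk-trades' that collects the TimeSales field:
>>> import pandas as pd
>>> from quantrocket.realtime import download_market_data_file
>>> # hypothetical tick database configured to collect the TimeSales field
>>> download_market_data_file("fang-stk-trades", filepath_or_buffer="fang_stk_trades.csv")
>>> trades = pd.read_csv("fang_stk_trades.csv", parse_dates=["Date"])
>>> # nested fields arrive as flattened columns: TimeSalesLastPrice, TimeSalesLastSize, etc.
>>> trade_prices = trades.TimeSalesLastPrice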
Option Greeks
Field | Description | Supports snapshot? |
--- | --- | --- |
ModelOptionComputation | Computed Greeks and implied volatility based on the underlying stock price and the option model price. Corresponds to Greeks shown in TWS | ✔ |
BidOptionComputation | Computed Greeks and implied volatility based on the underlying stock price and the option bid price | ✔ |
AskOptionComputation | Computed Greeks and implied volatility based on the underlying stock price and the option ask price | ✔ |
LastOptionComputation | Computed Greeks and implied volatility based on the underlying stock price and the option last traded price | ✔ |
When you request an option computation field, several nested fields will be returned representing the different Greeks. When streaming over WebSockets, these fields will arrive in a nested data structure:
{
"v": "ib",
"i": 350561917,
"t": "2019-06-05T16:10:16.162728",
"f": "ModelOptionComputation",
"d": {
"ImpliedVolatility": 0.27965811846647004,
"Delta": 0.01105129271665234,
"OptionPrice": 0.028713083045907993,
"PvDividend": 0.09943775573849334,
"Gamma": 0.0036857174753818366,
"Vega": 0.0103567465788384,
"Theta": -0.0011149809872252135,
"UnderlyingPrice": 52.37
}
}
CSV output queried from the database will flatten the nested structure using the following naming convention: ModelOptionComputationImpliedVolatility, ModelOptionComputationDelta, etc.
See Miscellaneous fields for other options-related fields.
Auction imbalance
Field | Description | Supports snapshot? |
--- | --- | --- |
AuctionVolume | The number of shares that would trade if no new orders were received and the auction were held now. | - |
AuctionPrice | The price at which the auction would occur if no new orders were received and the auction were held now - the indicative price for the auction. Typically received after AuctionImbalance | - |
AuctionImbalance | The number of unmatched shares for the next auction; returns how many more shares are on one side of the auction than the other. Typically received after AuctionVolume | - |
RegulatoryImbalance | The imbalance that is used to determine which at-the-open or at-the-close orders can be entered following the publishing of the regulatory imbalance. | ✔ |
Miscellaneous fields
Field | Description | Supports snapshot? |
--- | --- | --- |
High | High price for the day | ✔ |
Low | Low price for the day | ✔ |
Open | Current session's opening price. Before open will refer to previous day. The official opening price requires a market data subscription to the native exchange of the instrument | ✔ |
Close | Last available closing price for the previous day. | ✔ |
OptionHistoricalVolatility | The 30-day historical volatility (currently for stocks). | - |
OptionImpliedVolatility | A prediction of how volatile an underlying will be in the future. The IB 30-day volatility is the at-market volatility estimated for a maturity thirty calendar days forward of the current trading day, and is based on option prices from two consecutive expiration months. | - |
OptionCallOpenInterest | Call option open interest. | - |
OptionPutOpenInterest | Put option open interest. | - |
OptionCallVolume | Call option volume for the trading day. | - |
OptionPutVolume | Put option volume for the trading day. | - |
IndexFuturePremium | The number of points that the index is over the cash index. | - |
MarkPrice | The mark price is the current theoretical calculated value of an instrument. Since it is a calculated value, it will typically have many digits of precision. | - |
Halted | Indicates if a contract is halted. 1 = General halt imposed for regulatory reasons. 2 = Volatility halt imposed by the exchange to protect against extreme volatility. | - |
LastRthTrade | Last Regular Trading Hours traded price. | - |
RtHistoricalVolatility | 30-day real time historical volatility. | - |
CreditmanSlowMarkPrice | Mark price update used in system calculations | - |
FuturesOpenInterest | Total number of outstanding futures contracts | - |
AverageOptVolume | Average volume of the corresponding option contracts | - |
TradeCount | Trade count for the day. | - |
TradeRate | Trade count per minute. | - |
VolumeRate | Volume per minute. | - |
ShortTermVolume3min | The past three minutes volume. Interpolation may be applied. For stocks only. | - |
ShortTermVolume5min | The past five minutes volume. Interpolation may be applied. For stocks only. | - |
ShortTermVolume10min | The past ten minutes volume. Interpolation may be applied. For stocks only. | - |
Low13Weeks | Lowest price for the last 13 weeks. For stocks only. | - |
High13Weeks | Highest price for the last 13 weeks. For stocks only. | - |
Low26Weeks | Lowest price for the last 26 weeks. For stocks only. | - |
High26Weeks | Highest price for the last 26 weeks. For stocks only. | - |
Low52Weeks | Lowest price for the last 52 weeks. For stocks only. | - |
High52Weeks | Highest price for the last 52 weeks. For stocks only. | - |
AverageVolume | The average daily trading volume over 90 days. For stocks only. | - |
Stream tick data over WebSockets
With data collection in progress, you can connect to the incoming data stream over WebSockets. This allows you to push the data stream to your code; meanwhile the realtime service also saves the incoming data to the database in the background for future use.
Streaming market data to a JupyterLab terminal provides a simple technique to monitor the incoming data. To start the stream:
$ quantrocket realtime stream
Received ping
{"v": "ib", "i": 265598, "t": "2019-06-06T14:07:48.750025", "f": "LastPrice", "d": 182.87}
{"v": "ib", "i": 265598, "t": "2019-06-06T14:07:48.750321", "f": "LastSize", "d": 100}
...
Data arrives as JSON messages with the following structure:
{
"v": "ib",
"i": 265598,
"t": "2019-06-06T14:07:48.732735",
"f": "LastPrice",
"d": 182.87
}
By default all incoming data is streamed, that is, all collected tickers and all fields, even fields that you have not configured to save to the database. You can optionally limit the fields and conids:
$ quantrocket realtime stream --conids '265598' --fields 'LastPrice' 'BidPrice' 'AskPrice'
Remember, filtering the WebSocket stream doesn't control what data is being collected from the vendor, it only controls how much of the collected data is included in the stream.
WebSocket Python integration
Streaming data is not currently integrated into any of QuantRocket's Python libraries or APIs. We plan to add this integration in the future. For now, users can stream data to their own custom scripts by installing and using the WebSockets library.
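For example, here is a minimal sketch of a custom consumer built on the third-party websockets library; the ws:// URL and the message handling below are assumptions based on the HTTP endpoint shown elsewhere in this section, not an official integration:
import asyncio
import json
import websockets  # pip install websockets

async def consume():
    # assumes the houston stream endpoint is reachable over the ws:// scheme
    async with websockets.connect("ws://houston/realtime/stream") as ws:
        # optionally filter the stream, as described for wscat below
        await ws.send(json.dumps({"fields": ["LastPrice", "BidPrice", "AskPrice"]}))
        async for msg in ws:
            if msg.startswith("{"):
                tick = json.loads(msg)
                print(tick["i"], tick["f"], tick["d"])
            else:
                print(msg)  # e.g. "Received ping"

asyncio.run(consume())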
The wscat utility is a useful tool to help you understand the WebSocket API for the purpose of Python development.
wscat
The command quantrocket realtime stream is a lightweight wrapper around wscat, a command-line utility written in Node.js for making WebSocket connections. You can use wscat directly if you prefer, which is useful for experimenting with the WebSocket API. To start the stream:
$ wscat -c 'http://houston/realtime/stream'
connected (press CTRL+C to quit)
< Received ping
< {"v": "ib", "i": 265598, "t": "2019-06-06T14:07:48.750025", "f": "LastPrice", "d": 182.87}
< {"v": "ib", "i": 265598, "t": "2019-06-06T14:07:48.750321", "f": "LastSize", "d": 100}
...
You can send a JSON message to limit the fields:
> {"fields": ["LastPrice", "BidPrice", "AskPrice"]}
To limit the securities being returned, send JSON messages with the keys "conids" or "exclude_conids" to indicate which tickers you want to add to, or subtract from, the current stream. For example, this sequence of messages would exclude all tickers from the stream then re-enable only AAPL:
> {"exclude_conids":"*"}
> {"conids":[265598]}
You can also provide the filters as query string parameters at the time you initiate the WebSocket connection:
$ wscat -c 'http://houston/realtime/stream?conids=265598&conids=3691937&fields=LastPrice&fields=BidPrice'
Query tick data
You can download a file of the ticks stored in your tick database:
$ quantrocket realtime get 'fang-stk-tick' --start-date '2019-06-06' --conids 265598 --fields 'LastPrice' 'BidPrice' 'AskPrice' | csvlook --max-rows 10
| ConId | Date | LastPrice | BidPrice | AskPrice |
| ------ | ----------------------------- | --------- | -------- | -------- |
| 265598 | 2019-06-06 12:56:38.253084+00 | 182.77 | | |
| 265598 | 2019-06-06 12:56:38.273889+00 | | 182.76 | |
| 265598 | 2019-06-06 12:56:38.2748+00 | | | 182.82 |
| 265598 | 2019-06-06 12:56:39.348729+00 | 182.82 | | |
| 265598 | 2019-06-06 12:56:39.349447+00 | | | 182.85 |
| 265598 | 2019-06-06 12:56:39.848083+00 | 182.85 | | |
| 265598 | 2019-06-06 12:56:40.600397+00 | | | 182.98 |
| 265598 | 2019-06-06 12:56:41.102615+00 | | 182.85 | |
| 265598 | 2019-06-06 12:56:46.864118+00 | | | 182.95 |
| 265598 | 2019-06-06 12:56:47.365166+00 | | 182.9 | |
| ... | ... | ... | ... | ... |
>>> import pandas as pd
>>> from quantrocket.realtime import download_market_data_file
>>> download_market_data_file("fang-stk-tick",
start_date="2019-06-06",
conids=265598, fields=["LastPrice","BidPrice","AskPrice"],
filepath_or_buffer="fang_stk_tick.csv")
>>> ticks = pd.read_csv("fang_stk_tick.csv", parse_dates=["Date"])
>>> ticks.head()
ConId Date LastPrice BidPrice AskPrice
0 265598 2019-06-06 12:56:38.253084 182.77 NaN NaN
1 265598 2019-06-06 12:56:38.273889 NaN 182.76 NaN
2 265598 2019-06-06 12:56:38.274800 NaN NaN 182.82
3 265598 2019-06-06 12:56:39.348729 182.82 NaN NaN
4 265598 2019-06-06 12:56:39.349447 NaN NaN 182.85
5 265598 2019-06-06 12:56:39.848083 182.85 NaN NaN
6 265598 2019-06-06 12:56:40.600397 NaN NaN 182.98
7 265598 2019-06-06 12:56:41.102615 NaN 182.85 NaN
8 265598 2019-06-06 12:56:46.864118 NaN NaN 182.95
9 265598 2019-06-06 12:56:47.365166 NaN 182.90 NaN
$ curl -X GET 'http://houston/realtime/fang-stk-tick.csv?start_date=2019-06-06&conids=265598&fields=LastPrice&fields=BidPrice&fields=AskPrice' | head
ConId,Date,LastPrice,BidPrice,AskPrice
265598,2019-06-06 12:56:38.253084+00,182.77,,
265598,2019-06-06 12:56:38.273889+00,,182.76,
265598,2019-06-06 12:56:38.2748+00,,,182.82
265598,2019-06-06 12:56:39.348729+00,182.82,,
265598,2019-06-06 12:56:39.349447+00,,,182.85
265598,2019-06-06 12:56:39.848083+00,182.85,,
265598,2019-06-06 12:56:40.600397+00,,,182.98
265598,2019-06-06 12:56:41.102615+00,,182.85,
265598,2019-06-06 12:56:46.864118+00,,,182.9
Aggregate databases
Aggregate databases provide rolled-up views of tick databases. Tick data can be rolled up to any bar size, for example 1 second, 1 minute, 15 minutes, 2 hours, or 1 day. One of the major benefits of aggregate databases is that they provide a consistent API with historical databases, using the get_prices function.
Create aggregate database
Create an aggregate database by providing a database code, the tick database to aggregate, the bar size (using a Pandas timedelta string such as '1s', '1m', '1h' or '1d'), and how to aggregate the tick fields. For example, the following command creates a 1-minute aggregate database with OHLCV bars, that is, with bars containing the open, high, low, and close of the LastPrice field, plus the close of the Volume field:
$ quantrocket realtime create-agg-db 'fang-stk-tick-1min' --tick-db 'fang-stk-tick' --bar-size '1m' --fields 'LastPrice:Open,High,Low,Close' 'Volume:Close'
status: successfully created aggregate database fang-stk-tick-1min from tick database fang-stk-tick
>>> from quantrocket.realtime import create_agg_db
>>> create_agg_db("fang-stk-tick-1min",
tick_db_code="fang-stk-tick",
bar_size="1m",
fields={"LastPrice":["Open","High","Low","Close"],
"Volume": ["Close"]})
{'status': 'successfully created aggregate database fang-stk-tick-1min from tick database fang-stk-tick'}
$ curl -X PUT 'http://houston/realtime/databases/fang-stk-tick/aggregates/fang-stk-tick-1min?bar_size=1m&fields=LastPrice%3AOpen%2CHigh%2CLow%2CClose&fields=Volume%3AClose'
{"status": "successfully created aggregate database fang-stk-tick-1min from tick database fang-stk-tick"}
Checking the database config reveals the fieldnames in the resulting aggregate database:
$ quantrocket realtime config 'fang-stk-tick-1min'
bar_size: 1m
fields:
- LastPriceClose
- LastPriceHigh
- LastPriceLow
- LastPriceOpen
- VolumeClose
tick_db_code: fang-stk-tick
>>> from quantrocket.realtime import get_db_config
>>> get_db_config("fang-stk-tick-1min")
{'tick_db_code': 'fang-stk-tick',
'bar_size': '1m',
'fields': ['LastPriceClose',
'LastPriceHigh',
'LastPriceLow',
'LastPriceOpen',
'VolumeClose']}
$ curl -X GET 'http://houston/realtime/databases/fang-stk-tick/aggregates/fang-stk-tick-1min'
{"tick_db_code": "fang-stk-tick", "bar_size": "1m", "fields": ["LastPriceClose", "LastPriceHigh", "LastPriceLow", "LastPriceOpen", "VolumeClose"]}
You can create multiple aggregate databases from a single tick database.
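For example, a sketch of creating a second aggregate database from the same tick data, this time building 15-minute bars of closing bid and ask quotes (the BidPrice and AskPrice fields were configured when the tick database was created above):
>>> from quantrocket.realtime import create_agg_db
>>> # a second aggregate database with 15-minute bars of closing quotes
>>> create_agg_db("fang-stk-tick-15min",
                  tick_db_code="fang-stk-tick",
                  bar_size="15m",
                  fields={"BidPrice": ["Close"],
                          "AskPrice": ["Close"]})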
Materialization of aggregate databases
An aggregate database is populated by aggregating the tick data and storing the aggregated results as a separate database table which can then be queried directly. In database terminology, this process is called materialization.
No user action is required to materialize the aggregate database.
QuantRocket uses TimescaleDB to store tick data as well as to build aggregate databases from tick data. After you create an aggregate database, background workers will materialize the aggregate database from the tick data and will periodically run again to keep the aggregate database up-to-date. In addition, the aggregate database is refreshed one final time whenever you query it, to ensure it is fully up-to-date. This final refresh is usually very quick thanks to the ongoing background refreshes that have already occurred. However, the first query might run slowly if the aggregate database is still being materialized for the first time.
Query aggregate data
You can download a file of aggregate data using the same API used to download tick data. Instead of ticks, bars are returned. As with tick data, all timestamps are UTC:
$ quantrocket realtime get 'fang-stk-tick-1min' --start-date '2019-06-06' --conids 265598 | csvlook --max-rows 5
| ConId | Date | VolumeClose | LastPriceOpen | LastPriceClose | LastPriceHigh | LastPriceLow |
| ------ | ---------------------- | ----------- | ------------- | -------------- | ------------- | ------------ |
| 265598 | 2019-06-06 12:56:00+00 | 127000 | 182.77 | 182.9 | 182.9 | 182.77 |
| 265598 | 2019-06-06 12:57:00+00 | 129300 | 182.95 | 182.9 | 182.95 | 182.78 |
| 265598 | 2019-06-06 12:58:00+00 | 131900 | 182.78 | 182.8 | 182.95 | 182.78 |
| 265598 | 2019-06-06 12:59:00+00 | 133400 | 182.81 | 182.75 | 182.88 | 182.75 |
| 265598 | 2019-06-06 13:00:00+00 | 123400 | 182.79 | 182.81 | 182.81 | 182.79 |
| ... | ... | ... | ... | ... | ... | ... |
>>> import pandas as pd
>>> from quantrocket.realtime import download_market_data_file
>>> download_market_data_file("fang-stk-tick-1min",
start_date="2019-06-06",
conids=265598,
filepath_or_buffer="fang_stk_tick_1min.csv")
>>> prices = pd.read_csv("fang_stk_tick_1min.csv", parse_dates=["Date"])
>>> prices.head()
ConId Date VolumeClose LastPriceOpen LastPriceClose LastPriceHigh LastPriceLow
0 265598 2019-06-06 12:56:00 127000.0 182.77 182.90 182.90 182.77
1 265598 2019-06-06 12:57:00 129300.0 182.95 182.90 182.95 182.78
2 265598 2019-06-06 12:58:00 131900.0 182.78 182.80 182.95 182.78
3 265598 2019-06-06 12:59:00 133400.0 182.81 182.75 182.88 182.75
4 265598 2019-06-06 13:00:00 123400.0 182.79 182.81 182.81 182.79
$ curl -X GET 'http://houston/realtime/fang-stk-tick-1min.csv?start_date=2019-06-06&conids=265598' | head
ConId,Date,VolumeClose,LastPriceOpen,LastPriceClose,LastPriceHigh,LastPriceLow
265598,2019-06-06 12:56:00+00,127000,182.77,182.9,182.9,182.77
265598,2019-06-06 12:57:00+00,129300,182.95,182.9,182.95,182.78
265598,2019-06-06 12:58:00+00,131900,182.78,182.8,182.95,182.78
265598,2019-06-06 12:59:00+00,133400,182.81,182.75,182.88,182.75
For a higher-level API, you can load real-time aggregate data with the get_prices function which is also used for loading historical data.
Database size
Collecting tick data can quickly consume a considerable amount of disk space. Creating an aggregate database from the tick database uses additional space. Therefore you should keep an eye on your disk space.
Tick data collection strategy
Here is an example strategy for collecting more tick data than will fit on your local disk.
Suppose you have the following constraints:
- you have only enough local disk space for 3 months of tick data
- you want data that won't fit on your local disk to be preserved in the cloud indefinitely
- your trading strategies require at minimum that the past 2 weeks of tick data are available on the local disk
First, create the tick database and append a date or version number:
$ quantrocket realtime create-tick-db 'globex-fut-taq-1' --universes 'globex-fut' --fields 'LastPrice' 'BidPrice' 'AskPrice'
status: successfully created tick database globex-fut-taq-1
>>> from quantrocket.realtime import create_tick_db
>>> create_tick_db("globex-fut-taq-1", universes="globex-fut", fields=["LastPrice","BidPrice","AskPrice"])
{'status': 'successfully created tick database globex-fut-taq-1'}
$ curl -X PUT 'http://houston/realtime/databases/globex-fut-taq-1?universes=globex-fut&fields=LastPrice&fields=BidPrice&fields=AskPrice'
{"status": "successfully created tick database globex-fut-taq-1"}
Collect data and use this database for your trading. After two and a half months, create a second, identical database:
$ quantrocket realtime create-tick-db 'globex-fut-taq-2' --universes 'globex-fut' --fields 'LastPrice' 'BidPrice' 'AskPrice'
status: successfully created tick database globex-fut-taq-2
>>> create_tick_db("globex-fut-taq-2", universes="globex-fut", fields=["LastPrice","BidPrice","AskPrice"])
{'status': 'successfully created tick database globex-fut-taq-2'}
$ curl -X PUT 'http://houston/realtime/databases/globex-fut-taq-2?universes=globex-fut&fields=LastPrice&fields=BidPrice&fields=AskPrice'
{"status": "successfully created tick database globex-fut-taq-2"}
Begin collecting data into both databases, but continue to point your trading strategies at the first database (since the second database does not yet have two weeks of data). Once you have collected two weeks of data into the new database,
push the first database to S3:
$ quantrocket db s3push --services 'realtime' --codes 'globex-fut-taq-1'
status: the databases will be pushed to S3 asynchronously
>>> from quantrocket.db import s3_push_databases
>>> s3_push_databases(services="realtime", codes="globex-fut-taq-1")
{'status': 'the databases will be pushed to S3 asynchronously'}
$ curl -X PUT 'http://houston/db/s3?services=realtime&codes=globex-fut-taq-1'
{"status": "the databases will be pushed to S3 asynchronously"}
With the first database safely in the cloud, point your trading strategies to the second database, and delete the first database:
$ quantrocket realtime drop-db 'globex-fut-taq-1' --confirm-by-typing-db-code-again 'globex-fut-taq-1'
status: deleted tick database globex-fut-taq-1
>>> from quantrocket.realtime import drop_db
>>> drop_db("globex-fut-taq-1", confirm_by_typing_db_code_again="globex-fut-taq-1")
{'status': 'deleted tick database globex-fut-taq-1'}
$ curl -X DELETE 'http://houston/realtime/databases/globex-fut-taq-1?confirm_by_typing_db_code_again=globex-fut-taq-1'
{"status": "deleted tick database globex-fut-taq-1"}
Repeat this database rotation strategy every 3 months. Later, if you need to perform analysis of an archived tick database, you can
restore it from the cloud.
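For example, a sketch of restoring the archived database later; this assumes an s3pull counterpart to the s3push command shown above, so check the db service documentation for the exact command name:
>>> from quantrocket.db import s3_pull_databases
>>> # pull the archived tick database back from S3 for analysis
>>> s3_pull_databases(services="realtime", codes="globex-fut-taq-1")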
History database as real-time feed
Each time you update a history database, the data is brought current as of the moment you collect it. Thus, for some use cases it may be suitable to use a history database as a real-time data source. One advantage of this approach, compared to using the realtime service, is simplicity: you only have to worry about a single database.
The primary limitation of this approach is that it takes longer to collect data using the history service than using the realtime service. This difference isn't significant for a small number of symbols, but it can be quite significant if you need up-to-date quotes for thousands of securities.
Wait for historical data collection
When using a history database as a real-time data source, you may need to coordinate data collection with other tasks that depend on the data. For example, if trading an intraday strategy using a history database, you will typically want to run your strategy shortly after collecting data, but you want to ensure that the strategy doesn't run while data collection is still in progress. You can use the command quantrocket history wait for this purpose. This command simply blocks until the specified database is no longer being collected:
$ # start data collection
$ quantrocket history collect 'arca-15min'
status: the historical data will be collected asynchronously
$ # block until data collection finishes
$ quantrocket history wait 'arca-15min'
status: data collection finished for arca-15min
An optional timeout can be provided using a Pandas timedelta string; if the data collection doesn't finish within the allotted timeout, the wait command will return an error message and exit nonzero:
$ quantrocket history wait 'arca-15min' --timeout '10sec'
msg: data collection for arca-15min not finished after 10sec
status: error
To use the wait command on your countdown service crontab, you can run it before your trade command. In the example below, we collect data at 9:45 and want to place orders at 10:00. In case data collection is too slow, we will wait up to 5 minutes to place orders (that is, until 10:05). If data collection is still not finished, the wait command will exit nonzero and the strategy will not run. (If data collection is finished before 10:00, the wait command will return immediately and our strategy will run immediately.)
45 9 * * mon-fri quantrocket master isopen 'ARCA' && quantrocket history collect 'arca-15min'
0 10 * * mon-fri quantrocket master isopen 'ARCA' && quantrocket history wait 'arca-15min' --timeout '5min' && quantrocket moonshot trade 'intraday-strategy' | quantrocket blotter order -f '-'
Alternatively, if you want to run your strategy as soon as data collection finishes, you can place everything on one line:
45 9 * * mon-fri quantrocket master isopen 'ARCA' && quantrocket history collect 'arca-15min' && quantrocket history wait 'arca-15min' --timeout '15min' && quantrocket moonshot trade 'intraday-strategy' | quantrocket blotter order -f '-'
Research
Once you've collected some historical data, you're ready to open a research notebook in JupyterLab and start analyzing your ideas. QuantRocket makes it easy to work with historical and fundamental data using Pandas. You can analyze your alpha factors using Alphalens. Once you're ready to backtest, your research code readily transfers to Moonshot since Moonshot is Pandas-based.
Why research notebooks?
The workflow of many quants includes a research stage prior to backtesting. The purpose of a separate research stage is to rapidly test ideas in a preliminary manner to see if they're worth the effort of a full-scale backtest. The research stage typically ignores transaction costs, liquidity constraints, and other real-world challenges that traders face and that backtests try to simulate. Thus, the research stage constitutes a "first cut": promising ideas advance to the more stringent simulations of backtesting, while unpromising ideas are discarded.
Jupyter notebooks provide Python quants with an excellent tool for ad-hoc research. Jupyter notebooks let you write code to crunch your data, run visualizations, and make sense of the results with narrative commentary.
The get_prices function
End-of-day historical data
Using the Python client, you can load historical data into a Pandas DataFrame using the database code:
>>> from quantrocket import get_prices
>>> prices = get_prices("japan-bank-eod", start_date="2017-01-01", fields=["Open","High","Low","Close", "Volume"])
The DataFrame will have a column for each security (represented by conids). For daily bar sizes and larger, the DataFrame will have a two-level index: an outer level for each field (Open, Close, Volume, etc.) and an inner level containing a DatetimeIndex:
>>> prices.head()
ConId 13857203 13905344 13905462 13905522 13905624 \
Field Date
Close 2017-01-04 11150.0 3853.0 4889.0 4321.0 2712.0
2017-01-05 11065.0 3910.0 4927.0 4299.0 2681.0
2017-01-06 11105.0 3918.0 4965.0 4266.0 2672.5
2017-01-10 11210.0 3886.0 4965.0 4227.0 2640.0
2017-01-11 11115.0 3860.0 4970.0 4208.0 2652.0
...
Volume 2018-01-29 685800.0 2996700.0 1000600.0 1339000.0 6499600.0
2018-01-30 641700.0 2686100.0 1421900.0 1709900.0 7039800.0
2018-01-31 603400.0 3179000.0 1517100.0 1471000.0 5855500.0
2018-02-01 447300.0 3300900.0 1295800.0 1329600.0 5540600.0
2018-02-02 510200.0 4739800.0 2060500.0 1145200.0 5585300.0
The DataFrame can be thought of as several stacked DataFrames, one for each field. You can use .loc to isolate a DataFrame for each field:
>>> closes = prices.loc["Close"]
>>> closes.head()
ConId 13857203 13905344 13905462 13905522 13905624 13905665 \
Date
2017-01-04 11150.0 3853.0 4889.0 4321.0 2712.0 655.9
2017-01-05 11065.0 3910.0 4927.0 4299.0 2681.0 658.4
2017-01-06 11105.0 3918.0 4965.0 4266.0 2672.5 656.2
2017-01-10 11210.0 3886.0 4965.0 4227.0 2640.0 652.8
2017-01-11 11115.0 3860.0 4970.0 4208.0 2652.0 665.1
Each field's DataFrame has the same columns and index, which makes it easy to perform matrix operations. For example, calculate dollar volume (or Euro volume, Yen volume, etc. depending on the universe):
>>> volumes = prices.loc["Volume"]
>>> dollar_volumes = closes * volumes
Or calculate overnight (close-to-open) returns:
>>> opens = prices.loc["Open"]
>>> prior_closes = closes.shift()
>>> overnight_returns = (opens - prior_closes) / prior_closes
>>> overnight_returns.head()
ConId 13857203 13905344 13905462 13905522 13905624 13905665 \
Date
2017-01-04 NaN NaN NaN NaN NaN NaN
2017-01-05 0.001345 0.004412 0.003477 -0.002083 0.002765 0.021497
2017-01-06 -0.000904 -0.005115 -0.000812 -0.011165 -0.016039 -0.012606
2017-01-10 -0.003152 -0.006891 0.009869 -0.008204 -0.011038 -0.002591
2017-01-11 0.000446 -0.000257 0.007049 0.004968 0.001894 0.009498
Intraday historical data
In contrast to daily bars, the stacked DataFrame for intraday bars has a three-level index, consisting of the field, the date, and the time as a string (for example, 09:30:00):
>>> prices = get_prices("etf-1h", start_date="2017-01-01", fields=["Open","High","Low","Close", "Volume"])
>>> prices.head()
ConId 756733 72195411 73128548
Field Date Time
Close 2017-07-20 09:30:00 247.28 324.30 216.27
10:00:00 247.08 323.94 216.25
11:00:00 246.97 323.63 215.90
12:00:00 247.25 324.11 216.22
13:00:00 247.29 324.32 216.22
...
Volume 2017-08-04 11:00:00 5896400.0 168700.0 170900.0
12:00:00 2243700.0 237300.0 114100.0
13:00:00 2228000.0 113900.0 107600.0
14:00:00 2841400.0 84500.0 116700.0
15:00:00 11351600.0 334000.0 357000.0
As with daily bars, use .loc to isolate a particular field.
>>> closes = prices.loc["Close"]
>>> closes.head()
ConId 756733 72195411 73128548
Date Time
2017-07-20 09:30:00 247.28 324.30 216.27
10:00:00 247.08 323.94 216.25
11:00:00 246.97 323.63 215.90
12:00:00 247.25 324.11 216.22
13:00:00 247.29 324.32 216.22
To isolate a particular time, use Pandas' .xs method (short for "cross-section"):
>>> session_closes = closes.xs("15:45:00", level="Time")
>>> session_closes.head()
ConId 756733 72195411 73128548
Date
2017-07-20 247.07 323.84 216.16
2017-07-21 246.89 322.93 215.53
2017-07-24 246.81 323.50 215.09
2017-07-25 247.39 326.37 215.88
2017-07-26 247.45 323.36 216.81
A bar's time represents the start of the bar. Thus, to get the 4:00 PM closing price using 15-minute bars, you would look at the close of the "15:45:00" bar. To get the 3:45 PM price using 15-minute bars, you could look at the open of the "15:45:00" bar or the close of the "15:30:00" bar.
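For example, a sketch using a 15-minute database such as the usa-stk-15min database referenced later in this section:
>>> prices = get_prices("usa-stk-15min", start_date="2018-01-01", fields=["Open", "Close"])
>>> closes = prices.loc["Close"]
>>> opens = prices.loc["Open"]
>>> # 4:00 PM closing prices: the close of the bar that starts at 15:45:00
>>> session_closes = closes.xs("15:45:00", level="Time")
>>> # 3:45 PM prices: the open of the 15:45:00 bar (or the close of the 15:30:00 bar)
>>> prices_at_345 = opens.xs("15:45:00", level="Time")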
After taking a cross-section of an intraday DataFrame, you can perform matrix operations with bars from different times of day:
>>> opens = prices.loc["Open"]
>>> session_opens = opens.xs("09:30:00", level="Time")
>>> session_closes = closes.xs("15:45:00", level="Time")
>>> prior_session_closes = session_closes.shift()
>>> overnight_returns = (session_opens - prior_session_closes) / prior_session_closes
>>> overnight_returns.head()
ConId 756733 72195411 73128548
Date
2017-07-20 NaN NaN NaN
2017-07-21 -0.002509 -0.001637 -0.004441
2017-07-24 -0.000405 -0.000929 -0.000139
2017-07-25 0.003525 0.005286 0.006555
2017-07-26 0.001455 0.000123 0.004308
Timezone of intraday data
Intraday historical data is stored in the database in ISO-8601 format, which consists of the date followed by the time in the local timezone of the exchange, followed by a UTC offset. For example, a 9:30 AM bar for a stock trading on the NYSE might have a timestamp of 2017-07-25T09:30:00-04:00, where -04:00 indicates that New York is 4 hours behind Greenwich Mean Time/UTC. This storage format allows QuantRocket to properly align data that may originate from different timezones.
If you don't specify the timezone parameter when loading prices into Pandas using get_prices, the function will infer the timezone from the data itself. (This is accomplished by querying the securities master database to determine the timezone of the securities in your dataset.) This approach works fine as long as your data originates from a single timezone. If multiple timezones are represented, an error will be raised.
>>> prices = get_prices("aapl-arb-5min")
ParameterError: cannot infer timezone because multiple timezones are present in data, please specify timezone explicitly (timezones: America/New_York, America/Mexico_City)
In this case, you should manually specify the timezone to which you want the data to be aligned:
>>> prices = get_prices("aapl-arb-5min", timezone="America/New_York")
Historical data with a bar size of 1 day or higher is stored and returned in YYYY-MM-DD format. Specifying a timezone for such a database has no effect.
Securities master fields aligned to prices
Sometimes it might be useful to use securities master fields such as the primary exchange in your data analysis. To do so, first use .loc (or .loc and .xs for intraday data) to isolate a particular price field:
>>> prices = get_prices("usa-1d", fields=["Close","Open"], start_date="2018-01-01")
>>> closes = prices.loc["Close"]
Then use the DataFrame of prices to get a DataFrame of securities master fields shaped like the prices:
>>> from quantrocket.master import get_securities_reindexed_like
>>> securities = get_securities_reindexed_like(closes, domain="main", fields=["PrimaryExchange", "Symbol"])
The domain parameter specifies which securities master to run the query against.
You can isolate the securities master fields using .loc:
>>> exchanges = securities.loc["PrimaryExchange"]
>>> exchanges.head()
ConId 4027 4157 4205 309001160 309221203
Date
2018-01-02 NYSE NASDAQ NYSE AMEX NASDAQ
2018-01-03 NYSE NASDAQ NYSE AMEX NASDAQ
2018-01-04 NYSE NASDAQ NYSE AMEX NASDAQ
2018-01-05 NYSE NASDAQ NYSE AMEX NASDAQ
2018-01-08 NYSE NASDAQ NYSE AMEX NASDAQ
And perform matrix operations using your securities master data and price data:
>>> closes.where(exchanges=="NYSE").head()
ConId 4027 4157 4205 309001160 309221203
Date
2018-01-02 106.09 NaN 46.73 NaN NaN
2018-01-03 107.05 NaN 46.46 NaN NaN
2018-01-04 111 NaN 46.86 NaN NaN
2018-01-05 112.18 NaN 47.02 NaN NaN
2018-01-08 111.39 NaN 47.13 NaN NaN
Load only what you need
The more data you load into Pandas, the slower the performance will be. Therefore, it's a good idea to filter the dataset before loading it, particularly when working with large universes and intraday bars. Use the fields, times, start_date, and end_date parameters to load only the data you need:
>>> prices = get_prices("usa-stk-15min", start_date="2017-01-01", fields=["Open","Close"], times=["09:30:00", "15:45:00"])
Cumulative daily prices for intraday data
For historical databases with bar sizes smaller than 1 day, QuantRocket will calculate and store the day's high, low, and volume as of each intraday bar. When querying intraday data, the additional fields DayHigh, DayLow, and DayVolume are available. Other fields represent only the trading activity that occurred within the duration of a particular bar: for example, the Volume field for a 15:00:00 bar in a database with 1-hour bars represents the trading volume from 15:00:00 to 16:00:00. In contrast, DayHigh, DayLow, and DayVolume represent the trading activity for the entire day up to and including the particular bar.
>>> prices = get_prices(
"spy-1h",
fields=["Open","High","Low","Close","Volume","DayHigh","DayLow","DayVolume"])
>>> # inspect a single bar: the 15:00:00 bar on 2018-03-08
>>> prices.xs("2018-03-08", level="Date").xs("15:00:00", level="Time")
ConId 756733
Field
Close 274.09
DayHigh 274.24
DayLow 272.42
DayVolume 48126000.00
High 274.24
Low 272.97
Open 273.66
Volume 16897100.00
A common use case for cumulative daily totals is when your research idea or trading strategy needs a selection of intraday prices but also needs access to daily price fields (e.g. to calculate average daily volume). Instead of requesting and aggregating all intraday bars (which for large universes might require loading too much data), you can use the times parameter to load only the intraday bars you need, including the final bar of the trading session to give you access to the daily totals. For example, here is how you might screen for stocks with heavy volume in the opening 30 minutes relative to their average volume:
>>> # load only the 09:45 and 15:45 bars, requesting the cumulative daily volume field
>>> prices = get_prices("usa-stk-15min", start_date="2018-01-01", times=["09:45:00","15:45:00"], fields=["DayVolume"])
>>> # volume as of the first 30 minutes of the session
>>> early_session_volumes = prices.loc["DayVolume"].xs("09:45:00", level="Time")
>>> # full-day volume, averaged over a rolling 30-day window
>>> daily_volumes = prices.loc["DayVolume"].xs("15:45:00", level="Time")
>>> avg_daily_volumes = daily_volumes.rolling(window=30).mean()
>>> # find stocks whose early-session volume is more than twice their average daily volume
>>> volume_surges = early_session_volumes > (avg_daily_volumes.shift() * 2)
Cumulative daily totals are calculated directly from the intraday data in your database and thus will reflect any times or between-times filters used when creating the database.
Real-time aggregate data
You can use the get_prices function to load data from real-time aggregate databases, just like from history databases. Simply provide a real-time aggregate database code instead of a history database code:
>>> from quantrocket import get_prices
>>> prices = get_prices("fang-stk-tick-1min", start_date="2019-06-06")
>>> prices.head()
ConId 265598 3691937 15124833 107113386 208813720
Field Date Time
LastPriceClose 2019-06-06 09:30:00 183.30 1736.32 354.59 167.84 1039.85
09:31:00 182.77 1735.79 353.04 167.72 1037.79
09:32:00 183.12 1730.72 352.99 167.41 1035.11
09:33:00 183.40 1730.54 353.23 167.79 1035.80
09:34:00 183.40 1730.01 353.24 168.16 1036.18
Using get_prices, it is possible to load data from history databases and real-time aggregate databases into the same DataFrame (provided the databases have the same bar size). This allows you (for example) to combine historical data with today's real-time updates:
>>> # load a history database and a real-time aggregate database into one DataFrame
>>> prices = get_prices(["fang-stk-1min",
                         "fang-stk-tick-1min"],
                        start_date="2019-06-01",
                        fields=["Close", "LastPriceClose"])
>>> # extract the historical closes and the real-time closes
>>> history_closes = prices.loc["Close"]
>>> realtime_closes = prices.loc["LastPriceClose"]
>>> # fill missing history closes (i.e. today's) with the real-time closes
>>> combined_closes = realtime_closes.fillna(history_closes)
Alphalens
Alphalens is an open source library created by Quantopian for analyzing alpha factors. You can use Alphalens early in your research process to determine if your ideas look promising.
For example, suppose you wanted to analyze the momentum factor, which says that recent winners tend to outperform recent losers. First, load your historical data and extract the closing prices:
>>> prices = get_prices("demo-stocks-1d", start_date="2010-01-01", fields=["Close"])
>>> closes = prices.loc["Close"]
Next, calculate the 12-month returns, skipping the most recent month (as commonly prescribed in academic papers about the momentum factor):
>>> MOMENTUM_WINDOW = 252
>>> RANKING_PERIOD_GAP = 22
>>> earlier_closes = closes.shift(MOMENTUM_WINDOW)
>>> later_closes = closes.shift(RANKING_PERIOD_GAP)
>>> returns = (later_closes - earlier_closes) / earlier_closes
The 12-month returns are the predictive factor we will pass to Alphalens, along with pricing data so Alphalens can see whether the factor was in fact predictive. To avoid lookahead bias, in this example we should shift() our factor forward one period to align it with the subsequent prices, since the subsequent prices would represent our entry prices after calculating the factor. Alphalens expects the predictive factor to be stacked into a MultiIndex Series, while pricing data should be a DataFrame:
>>> # shift the factor forward one period to align it with subsequent entry prices
>>> returns = returns.shift()
>>> # stack the factor into a MultiIndex Series, as Alphalens expects
>>> returns = returns.stack()
>>> import alphalens
>>> factor_data = alphalens.utils.get_clean_factor_and_forward_returns(returns, closes)
>>> alphalens.tears.create_returns_tear_sheet(factor_data)
You'll see tabular statistics as well as graphs that look something like this:

Code reuse in Jupyter
If you find yourself writing the same code again and again, you can factor it out into a .py file in Jupyter and import it into your notebooks and algo files. Any .py files in or under the /codeload directory inside Jupyter (that is, in or under the top-level directory visible in the Jupyter file browser) can be imported using standard Python import syntax. For example, suppose you've implemented a function in /codeload/research/utils.py called analyze_fundamentals. You can import and use the function in another file or notebook:
from codeload.research.utils import analyze_fundamentals
The .py files can live wherever you like in the directory tree; subdirectories can be reached using standard Python dot syntax.
To make your code importable as a standard Python package, the 'codeload' directory and each subdirectory must contain an __init__.py file. QuantRocket will create these files automatically if they don't exist.
QGrid
QGrid is a Jupyter notebook extension created by Quantopian that provides Excel-like sorting and filtering of DataFrames in Jupyter notebooks. You can use it to explore a DataFrame interactively without writing code. A basic example is shown below:
from quantrocket import get_prices
import qgrid
prices = get_prices("usa-stk-1d")
prices = prices.stack().unstack("Field")
widget = qgrid.show_grid(prices)
widget
You'll see a grid like this:

After filtering the grid, you can get the edited DataFrame:
prices_edited = widget.get_changed_df()
Moonshot
Moonshot is a fast, vectorized Pandas-based backtester that supports daily or intraday data, multi-strategy backtests and parameter scans, and live trading. It is well-suited for running cross-sectional strategies or screens involving hundreds or even thousands of securities.
What is Moonshot?
Key features
Pandas-based: Moonshot is based on Pandas, the centerpiece of the Python data science stack. If you love Pandas you'll love Moonshot. Moonshot can be thought of as a set of conventions for organizing Pandas code for the purpose of running backtests.
Lightweight: Moonshot is simple and lightweight because it relies on the power and flexibility of Pandas and doesn't attempt to re-create functionality that Pandas can already do. No bloated codebase full of countless indicators and models to import and learn. Most of Moonshot's code is contained in a single Moonshot class.
Fast: Moonshot is fast because Pandas is fast. No event-driven backtester can match Moonshot's speed. Speed promotes alpha discovery by facilitating rapid experimentation and research iteration.
Multi-asset class, multi-time frame: Moonshot supports end-of-day and intraday strategies using equities, futures, and forex.
Machine learning support: Moonshot supports machine learning and deep learning strategies using scikit-learn or Keras.
Live trading: Live trading with Moonshot can be thought of as running a backtest on up-to-date historical data and generating a batch of orders based on the latest signals produced by the backtest.
No black boxes, no magic: Moonshot provides many conveniences to make backtesting easier, but it eschews hidden behaviors and complex, under-the-hood simulation rules that are hard to understand or audit. What you see is what you get.
Vectorized vs event-driven backtesters
What's the difference between event-driven backtesters like Zipline and vectorized backtesters like Moonshot? Event-driven backtests process one event at a time, where an event is usually one historical bar (or in the case of live trading, one real-time quote). Vectorized backtests process all events at once, by performing simultaneous calculations on an entire vector or matrix of data. (In pandas, a Series is a vector and a DataFrame is a matrix.)
Imagine a simplistic strategy of buying a security whenever the price falls below $10 and selling whenever it rises above $10. We have a time series of prices and want to know which days to buy and which days to sell. In an event-driven backtester we loop through one date at a time and check the price at each iteration:
>>> data = {
>>>     "2017-02-01": 10.07,
>>>     "2017-02-02": 9.87,
>>>     "2017-02-03": 9.91,
>>>     "2017-02-04": 10.01
>>> }
>>> for date, price in data.items():
>>>     if price < 10:
>>>         buy_signal = True
>>>     else:
>>>         buy_signal = False
>>>     print(date, buy_signal)
2017-02-01 False
2017-02-02 True
2017-02-03 True
2017-02-04 False
In a vectorized backtest, we check all the prices at once to calculate our buy signals:
>>> import pandas as pd
>>> data = {
>>>     "2017-02-01": 10.07,
>>>     "2017-02-02": 9.87,
>>>     "2017-02-03": 9.91,
>>>     "2017-02-04": 10.01
>>> }
>>> prices = pd.Series(data)
>>> buy_signals = prices < 10
>>> buy_signals.head()
2017-02-01 False
2017-02-02 True
2017-02-03 True
2017-02-04 False
dtype: bool
Both backtests produce the same result but use a different approach.
Vectorized backtests are faster than event-driven backtests
Speed is one of the principal benefits of vectorized backtests, thanks to running calculations on an entire time series at once. Event-driven backtests can be prohibitively slow when working with large universes of securities and large amounts of data. Because of their speed, vectorized backtesters support rapid experimentation and testing of new ideas.
Watch out for look-ahead bias with vectorized backtesters
Look-ahead bias refers to making decisions in your backtest based on information that wouldn't have been available at the time of the trade. Because event-driven backtesters only give you one bar at a time, they generally protect you from look-ahead bias. Because a vectorized backtester gives you the entire time-series, it's easier to introduce look-ahead bias by mistake, for example generating signals based on today's close but then calculating the return from today's open instead of tomorrow's.
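Continuing the toy example above, a minimal sketch of the distinction: a signal computed from today's close can only be traded at tomorrow's prices, so shift the positions forward before computing returns.
>>> # the signal is computed from today's close...
>>> buy_signals = prices < 10
>>> # ...so it can only be acted on tomorrow; shift positions forward one period
>>> positions = buy_signals.astype(int).shift()
>>> # strategy returns: today's price change times yesterday's signal
>>> strategy_returns = prices.pct_change() * positions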
If you achieve a phenomenal backtest result on the first try with a vectorized backtester, check for look-ahead bias.
How does live trading work?
With event-driven backtesters, switching from backtesting to live trading typically involves changing out a historical data feed for a real-time market data feed, and replacing a simulated broker with a real broker connection.
With a vectorized backtester, live trading can be achieved by running an up-to-the-moment backtest and using the final row of signals (that is, today's signals) to generate orders.
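Conceptually, and continuing the toy example, today's orders would come from the final row of the backtest's signals:
>>> # in live trading, the most recent signals drive today's orders
>>> todays_signals = buy_signals.iloc[-1:]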
Supported types of strategies
The vectorized design of Moonshot is well-suited for cross-sectional and factor-model strategies with regular rebalancing intervals, or for any strategy that "wakes up" at a particular time, checks current and historical market conditions, and makes trading decisions accordingly.
Examples of supported strategies:
- End-of-day strategies
- Intraday strategies that trade once per day at a particular time of day
- Intraday strategies that trade throughout the day
- Cross-sectional and factor-model strategies
- Market neutral strategies
- Seasonal strategies (where "seasonal" might be time of year, day of month, day of week, or time of day)
- Strategies that use fundamental data
- Strategies that screen thousands of stocks using daily data
- Strategies that screen thousands of stocks using 15- or 30-minute intraday data
- Strategies that screen a few hundred stocks using 5-minute intraday data
- Strategies that screen a few stocks using 1-minute intraday data
Examples of unsupported strategies:
- Path-dependent strategies that don't lend themselves to Moonshot's vectorized design
Total data quantity for intraday strategies
Moonshot supports any number of securities and can utilize any bar size, from 1 month down to 1 minute. However, the combination of bar size and the number of securities in your trading universe determines the total data quantity, and there are practical limits on total data quantity. Smaller universes can support higher data frequencies (i.e. smaller bar sizes); larger universes require lower data frequencies (i.e. larger bar sizes).
In backtesting, the practical limit on total data quantity is imposed by the amount of time it takes to initially collect intraday historical data from IB. See the usage guide for more details on the practicalities of historical data collection.
In live trading, the practical limit on data quantity is imposed by the amount of time it takes to update your historical database before live trading. Updating a database with fewer securities is faster than updating a database with many securities. Consequently, small universes can be traded using higher data frequencies (smaller bar sizes) than large universes.
To give a few rough examples (these are conservative estimates, not hard limits), you should have no problem using 1-minute bars with a universe of 10 securities, 3-minute bars with a universe of 50 securities, or 15-minute bars with a universe of 1,000 securities. 1-minute bars with a universe of 6,000 securities probably won't work because the data quantity will be prohibitive to collect, store, and work with.
Backtesting
Backtesting quickstart
Let's design a dual moving average strategy which buys tech stocks when their short moving average is above their long moving average. Assume we've already created a history database of daily bars for several tech stocks, like so:
$
$ quantrocket master collect --exchanges 'NASDAQ' --symbols 'GOOGL' 'NFLX' 'AAPL' 'AMZN'
status: the listing details will be collected asynchronously
$
$ quantrocket master get -e 'NASDAQ' -s 'GOOGL' 'NFLX' 'AAPL' 'AMZN' | quantrocket master universe 'tech-giants' -f -
code: tech-giants
inserted: 4
provided: 4
total_after_insert: 4
$
$ quantrocket history create-db 'tech-giants-1d' -u 'tech-giants' --bar-size '1 day'
status: successfully created quantrocket.history.tech-giants-1d.sqlite
$ quantrocket history collect 'tech-giants-1d'
status: the historical data will be collected asynchronously
Now let's write the minimal strategy code to run a backtest:
from moonshot import Moonshot

class DualMovingAverageStrategy(Moonshot):

    CODE = "dma-tech"
    DB = "tech-giants-1d"
    LMAVG_WINDOW = 300
    SMAVG_WINDOW = 100

    def prices_to_signals(self, prices):
        closes = prices.loc["Close"]
        lmavgs = closes.rolling(self.LMAVG_WINDOW).mean()
        smavgs = closes.rolling(self.SMAVG_WINDOW).mean()
        signals = smavgs.shift() > lmavgs.shift()
        return signals.astype(int)
A strategy is a subclass of the Moonshot class. You implement your trading logic in the class methods and store your strategy parameters as class attributes. Class attributes include built-in Moonshot parameters which you can specify or override, as well as your own custom parameters. In the above example, CODE and DB are built-in parameters, while LMAVG_WINDOW and SMAVG_WINDOW are custom parameters which we've chosen to store as class attributes, allowing us to run parameter scans or create similar strategies with different parameters.
Place your code in a file inside the 'moonshot' directory in JupyterLab. QuantRocket recursively scans .py
files in this directory and loads your strategies.
You can run backtests via the command line or inside a Jupyter notebook, and you can get back a CSV of backtest results or a tear sheet with performance plots.
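For example, once the strategy file is saved, a run along the lines of the examples later in this guide (shown here with the Python API) produces a results CSV and a tear sheet:
>>> from quantrocket.moonshot import backtest
>>> from moonchart import Tearsheet
>>> backtest("dma-tech", start_date="2005-01-01", end_date="2017-01-01",
        details=True, filepath_or_buffer="dma_tech_results.csv")
>>> Tearsheet.from_moonshot_csv("dma_tech_results.csv")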
The performance plots will resemble the following:

Backtest visualization and analysis in Jupyter
In addition to running backtests from the CLI, you can run backtests from a Jupyter notebook and perform analysis and visualizations inside the notebook. First, run the backtest and save the results to a CSV:
>>> from quantrocket.moonshot import backtest
>>> backtest("dma-tech", start_date="2005-01-01", end_date="2017-01-01",
filepath_or_buffer="dma_tech_results.csv")
You can do four main things with the CSV results:
- generate a performance tear sheet using Moonchart, an open source companion library to Moonshot;
- generate a performance tear sheet using pyfolio, an open source library created by Quantopian;
- use Moonchart to get a DailyPerformance object and create your own plots; and
- load the results into a Pandas DataFrame for further analysis.
Moonchart tear sheet
To look at a Moonchart tear sheet:
>>> from moonchart import Tearsheet
>>> Tearsheet.from_moonshot_csv("dma_tech_results.csv")
pyfolio tear sheet
To look at a pyfolio tear sheet:
>>> import pyfolio as pf
>>> pf.from_moonshot_csv("dma_tech_results.csv")
Moonchart and pyfolio offer somewhat different visualizations so it's nice to look at both.
Custom plots with Moonchart
For finer-grained control with Moonchart or for times when you don't want a full tear sheet, you can instantiate a DailyPerformance
object and create your own individual plots:
>>> from moonchart import DailyPerformance
>>> perf = DailyPerformance.from_moonshot_csv("dma_tech_results.csv")
>>> perf.cum_returns.tail()
AAPL(265598) AMZN(3691937) NFLX(15124833) GOOGL(208813719)
Date
2016-12-23 1.886121 2.456078 2.094612 1.656564
2016-12-27 1.889116 2.464804 2.106120 1.657656
2016-12-28 1.887102 2.465388 2.096028 1.654913
2016-12-29 1.886981 2.459816 2.093697 1.654044
2016-12-30 1.883303 2.447535 2.087307 1.648672
>>> perf.cum_returns.plot()
You can use the DailyPerformance
object to construct an AggregateDailyPerformance
object representing aggregated backtest results:
>>> from moonchart import AggregateDailyPerformance
>>> agg_perf = AggregateDailyPerformance(perf)
>>> agg_perf.cum_returns.tail()
Date
2016-12-23 13.610334
2016-12-27 13.764051
2016-12-28 13.663911
2016-12-29 13.609782
2016-12-30 13.429575
>>> agg_perf.cum_returns.plot()
See Moonchart reference for available performance attributes.
Raw backtest results analysis
You can also load the backtest results into a DataFrame:
>>> from quantrocket.moonshot import read_moonshot_csv
>>> results = read_moonshot_csv("dma_tech_results.csv")
>>> results.tail()
AAPL(265598) AMZN(3691937) NFLX(15124833) GOOGL(208813719)
Field Date
Weight 2016-12-23 0.25 0.25 0.25 0.25
2016-12-27 0.25 0.25 0.25 0.25
2016-12-28 0.25 0.25 0.25 0.25
2016-12-29 0.25 0.25 0.25 0.25
2016-12-30 0.25 0.25 0.25 0.25
The DataFrame consists of several stacked DataFrames, one DataFrame per field (see backtest field reference). Use .loc
to isolate a particular field:
>>> returns = results.loc["Return"]
>>> returns.tail()
AAPL(265598) AMZN(3691937) NFLX(15124833) GOOGL(208813719)
Date
2016-12-23 0.000494 -0.001876 0.000020 -0.000580
2016-12-27 0.001588 0.003553 0.005494 0.000659
2016-12-28 -0.001066 0.000237 -0.004792 -0.001654
2016-12-29 -0.000064 -0.002260 -0.001112 -0.000525
2016-12-30 -0.001949 -0.004992 -0.003052 -0.003248
Since we specified details=True
when running the backtest, there is a column per security. Had we omitted details=True
, or if we were running a multi-strategy backtest, there would be a column per strategy.
How a Moonshot backtest works
Moonshot is all about DataFrames. In a Moonshot backtest, we start with a DataFrame of historical prices and derive a variety of equivalently-indexed DataFrames, including DataFrames of signals, trade allocations, positions, and returns. These DataFrames consist of a time-series index (vertical axis) with one or more securities as columns (horizontal axis). A simple example of a DataFrame of signals is shown below for a strategy with a 2-security universe (securities are identified by conid):
ConId 12345 67890
Date
2017-09-19 0 -1
2017-09-20 1 -1
2017-09-21 1 0
A Moonshot strategy consists of strategy parameters (stored as class attributes) and strategy logic (implemented in class methods). The strategy logic required to run a backtest is spread across four main methods, mirroring the stages of a trade:
| | method name | input/output |
|---|---|---|
| what direction to trade? | prices_to_signals | from a DataFrame of prices, return a DataFrame of integer signals, where 1=long, -1=short, and 0=cash |
| how much capital to allocate to the trades? | signals_to_target_weights | from a DataFrame of integer signals (-1, 0, 1), return a DataFrame indicating how much capital to allocate to the signals, expressed as a percentage of the total capital allocated to the strategy (for example, -0.25, 0, 0.1 to indicate 25% short, cash, 10% long) |
| enter the positions when? | target_weights_to_positions | from a DataFrame of target weights, return a DataFrame of positions (here we model the delay between when the signal occurs and when the position is entered, and possibly model non-fills) |
| what's our return? | positions_to_gross_returns | from a DataFrame of positions and a DataFrame of prices, return a DataFrame of percentage returns before commissions and slippage (our return is the security's percent change over the period, multiplied by the size of the position) |
Since Moonshot is a vectorized backtester, each of these methods is called only once per backtest.
Our demo strategy above relies on the default implementations of several of these methods, but since it's better to be explicit than implicit, you should always implement these methods even if you copy the default behavior. Let's explicitly implement the default behavior in our demo strategy:
from moonshot import Moonshot

class DualMovingAverageStrategy(Moonshot):

    CODE = "dma-tech"
    DB = "tech-giants-1d"
    LMAVG_WINDOW = 300
    SMAVG_WINDOW = 100

    def prices_to_signals(self, prices):
        closes = prices.loc["Close"]
        lmavgs = closes.rolling(self.LMAVG_WINDOW).mean()
        smavgs = closes.rolling(self.SMAVG_WINDOW).mean()
        signals = smavgs.shift() > lmavgs.shift()
        return signals.astype(int)

    def signals_to_target_weights(self, signals, prices):
        weights = self.allocate_equal_weights(signals)
        return weights

    def target_weights_to_positions(self, weights, prices):
        positions = weights.shift()
        return positions

    def positions_to_gross_returns(self, positions, prices):
        closes = prices.loc["Close"]
        gross_returns = closes.pct_change() * positions.shift()
        return gross_returns
To summarize the above code, we generate signals based on moving average crossovers, we divide our capital equally among the securities with signals, we enter the positions the next day, and compute our (gross) returns using the securities' close-to-close returns.
Several weight allocation algorithms are provided out of the box via moonshot.mixins.WeightAllocationMixin.
Benchmarks
Optionally, we can identify a benchmark security and get a plot of the strategy's performance against the benchmark. The benchmark can exist within the same database used by the strategy, or in a different database. Our ETF strategy universe includes SPY, so let's make that our benchmark. First, look up the conid (contract ID) if needed, since that's how we specify the benchmark:
$ quantrocket master get -e ARCA -s SPY -f ConId -p
ConId = 756733
Now set this conid as the benchmark:
class DualMovingAverageStrategyETF(DualMovingAverageStrategy):

    CODE = "dma-etf"
    DB = "etf-sampler-1d"
    LMAVG_WINDOW = 300
    SMAVG_WINDOW = 100
    BENCHMARK = 756733
Run the backtest again, and we'll see an additional chart in our tear sheet:

To use a benchmark security from a different database, specify a BENCHMARK_DB:
class MeanReversionStrategy(Moonshot):

    CODE = "mean-revert-us"
    DB = "usa-stk-1d"
    BENCHMARK = 756733
    BENCHMARK_DB = "etf-sampler-1d"
Multi-strategy backtests
We can easily backtest multiple strategies at once to simulate running complex portfolios of strategies. Simply specify all of the strategies:
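For example, a sketch using the Python API (assuming both strategies from this guide are defined; the backtest function accepts multiple strategy codes):
>>> from quantrocket.moonshot import backtest
>>> from moonchart import Tearsheet
>>> backtest(["dma-tech", "dma-etf"], start_date="2005-01-01", end_date="2017-01-01",
        filepath_or_buffer="dma_multistrat_results.csv")
>>> Tearsheet.from_moonshot_csv("dma_multistrat_results.csv")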
Our tear sheet will show the aggregate portfolio performance as well as the individual strategy performance:

By default, when backtesting multiple strategies, capital is divided equally among the strategies; that is, each strategy's allocation is 1.0 / number of strategies. If this isn't what you want, you can specify custom allocations for each strategy (which need not add up to 1):
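A sketch of what that might look like with the Python API; the allocations argument (a mapping of strategy code to allocation) is an assumption here, so consult the API reference for the exact parameter name:
>>> from quantrocket.moonshot import backtest
>>> backtest(["dma-tech", "dma-etf"],
        allocations={"dma-tech": 0.25, "dma-etf": 0.50},  # assumed parameter name
        start_date="2005-01-01", end_date="2017-01-01",
        filepath_or_buffer="dma_multistrat_results.csv")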
On-the-fly parameters
You can change Moonshot parameters on-the-fly from the Python client or CLI when running backtests, without having to edit your .py algo files. Pass parameters as KEY:VALUE pairs:
$
$ quantrocket moonshot backtest 'dma-tech' -o dma_tech_no_commissions.csv --params 'COMMISSION_CLASS:None'
>>>
>>> backtest("dma-tech", filepath_or_buffer="dma_tech_no_commissions.csv",
params={"COMMISSION_CLASS":None})
$
$ curl -X POST 'http://houston/moonshot/backtests?strategies=dma-tech&params=COMMISSION_CLASS%3ANone' > dma_tech_no_commissions.csv
This capability is provided as a convenience and also helps protect you from temporarily editing your algo file and forgetting to change it back. It is also available for parameter scans:
$
$ quantrocket moonshot paramscan 'dma-tech' -p 'SMAVG_WINDOW' -v 5 20 100 --params 'SLIPPAGE_BPS:2' -o dma_tech_1d_with_slippage.csv
>>>
>>> from quantrocket.moonshot import scan_parameters
>>> scan_parameters("dma-tech",
param1="SMAVG_WINDOW", vals1=[5,20,100],
params={"SLIPPAGE_BPS":2},
filepath_or_buffer="dma_tech_1d_with_slippage.csv")
$
$ curl -X POST 'http://houston/moonshot/paramscans?strategies=dma-tech&param1=SMAVG_WINDOW&vals1=5&vals1=20&vals1=100&params=SLIPPAGE_BPS%3A2' > dma_tech_1d_with_slippage.csv
Lookback windows
Commonly, your strategy may need an initial cushion of data to perform rolling calculations (such as moving averages) before it can begin generating signals. By default, Moonshot will infer the required cushion size by using the largest integer value of any strategy attribute whose name ends with _WINDOW
. In the following example, the lookback window will be set to 200:
class DualMovingAverage(Moonshot):

    ...
    SMAVG_WINDOW = 50
    LMAVG_WINDOW = 200
This means Moonshot will load 200 trading days of historical data (plus a small additional buffer) prior to your backtest start date so that your signals can actually begin on the start date. If there are no _WINDOW
attributes, the cushion defaults to 252 (approx. 1 year).
Additionally, any attributes ending with _INTERVAL
which contain pandas offset aliases will be used to further pad the lookback window. In the following example, the calculated lookback window will be 100 trading days to cover the moving average window plus an additional month to cover the rebalancing interval:
class MonthlyRebalancingStrategy(Moonshot):

    ...
    MAVG_WINDOW = 100
    REBALANCE_INTERVAL = "M"
You can override the default behavior by explicitly setting the LOOKBACK_WINDOW
attribute (set to 0 to disable):
class StrategyWithQuarterlyLookback(Moonshot):

    ...
    LOOKBACK_WINDOW = 63
If you make a habit of storing rolling window lengths as class attributes ending with _WINDOW
and storing rebalancing intervals as class attributes ending with _INTERVAL
, the lookback window will usually take care of itself and you shouldn't need to worry about it.
Adequate lookback windows are especially important for live trading. In case you don't name your rolling window attributes with _WINDOW
, make sure to define a LOOKBACK_WINDOW
that is adequate for your strategy's rolling calculations, as an inadequate lookback window will mean your strategy doesn't load enough data in live trading and therefore never generates any trades.
Segmented backtests
When running a backtest on a large universe and sizable date range, you might run out of memory. You'll see an error like this:
$ quantrocket moonshot backtest 'big-boy' --start-date '2000-01-01'
msg: 'HTTPError(''502 Server Error: Bad Gateway for url: http://houston/moonshot/backtests?strategies=big-boy&start_date=2000-01-01'',
''please check the logs for more details'')'
status: error
And in the logs you'll find this:
$ quantrocket flightlog stream --hist 1
quantrocket.moonshot: ERROR the system killed the worker handling the request, likely an Out Of Memory error; if you were backtesting, try a segmented backtest to reduce memory usage (for example `segment="A"`), or add more memory
When this happens, you can try a segmented backtest. In a segmented backtest, QuantRocket breaks the backtest date range into smaller segments (for example, 1-year segments), runs each segment of the backtest in succession, and concatenates the partial results into a single backtest result. The output is identical to a non-segmented backtest, but the memory footprint is smaller. The segment
option takes a Pandas frequency string specifying the desired size of the segments, for example "A" for annual segments, "Q" for quarterly segments, or "2A" for 2-year segments:
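For example, a sketch of an annually segmented backtest using the Python API (the segment argument appears in the log message above; the remaining arguments follow the earlier examples):
>>> from quantrocket.moonshot import backtest
>>> backtest("big-boy", start_date="2000-01-01", end_date="2018-01-01",
        segment="A", filepath_or_buffer="big_boy_results.csv")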
Providing a start and end date is optional for a non-segmented backtest but required for a segmented backtest.
In the detailed logs, you'll see Moonshot running through each backtest segment:
$ quantrocket flightlog stream -d
quantrocket_moonshot_1|[big-boy] Backtesting strategy from 2001-01-01 to 2001-12-30
quantrocket_moonshot_1|[big-boy] Backtesting strategy from 2001-12-31 to 2002-12-30
quantrocket_moonshot_1|[big-boy] Backtesting strategy from 2002-12-31 to 2003-12-30
quantrocket_moonshot_1|[big-boy] Backtesting strategy from 2003-12-31 to 2004-12-30
quantrocket_moonshot_1|[big-boy] Backtesting strategy from 2004-12-31 to 2005-12-30
...
Backtest field reference
Backtest result CSVs contain the following fields in a stacked format. Each field is a DataFrame from the backtest. For detailed backtests, there is a column per security. For non-detailed or multi-strategy backtests, there is a column per strategy, with each column containing the aggregated (summed) results of all securities in the strategy.
- Signal: the signals returned by prices_to_signals.
- NetExposure: the net long or short positions returned by target_weights_to_positions. Expressed as a proportion of capital base.
- AbsExposure: the absolute value of positions, irrespective of their side (long or short). Expressed as a proportion of capital base. This represents the total market exposure of the strategy.
- Weight: the target weights allocated to the strategy, after multiplying by strategy allocation and applying any weight constraints. Expressed as a proportion of capital base.
- AbsWeight: the absolute value of the target weights.
- Turnover: the strategy's day-to-day turnover. Expressed as a proportion of capital base.
- TotalHoldings: the total number of holdings for the period.
- Return: the returns, after commissions and slippage. Expressed as a proportion of capital base.
- Commission: the commissions deducted from gross returns. Expressed as a proportion of capital base.
- Slippage: the slippage deducted from gross returns. Expressed as a proportion of capital base.
- Benchmark: the prices of the benchmark security, if any.
Moonchart reference
Moonchart DailyPerformance and AggregateDailyPerformance objects provide the following attributes.
Attributes copied directly from backtest results:
- returns: the returns, after commissions and slippage. Expressed as a proportion of capital base.
- net_exposures: the net long or short positions. Expressed as a proportion of capital base.
- abs_exposures: the absolute value of positions, irrespective of their side (long or short). Expressed as a proportion of capital base. This represents the total market exposure of the strategy.
- total_holdings: the total number of holdings for the period.
- turnover: the strategy's day-to-day turnover. Expressed as a proportion of capital base.
- commissions: the commissions deducted from gross returns. Expressed as a proportion of capital base.
- slippages: the slippage deducted from gross returns. Expressed as a proportion of capital base.
- benchmark_prices: the prices of the benchmark security, if any.
Calculated attributes:
- cum_returns: cumulative returns
- cum_commissions: cumulative commissions
- cum_slippage: cumulative slippage
- cagr: compound annual growth rate. DailyPerformance.cagr returns a Series while AggregateDailyPerformance.cagr returns a scalar.
- sharpe: Sharpe ratio. DailyPerformance.sharpe returns a Series while AggregateDailyPerformance.sharpe returns a scalar.
- rolling_sharpe: rolling Sharpe ratio
- drawdowns: drawdowns
- max_drawdown: maximum drawdown. DailyPerformance.max_drawdown returns a Series while AggregateDailyPerformance.max_drawdown returns a scalar.
- benchmark_returns: benchmark returns calculated from benchmark prices
- benchmark_cum_returns: cumulative returns for benchmark
Parameter scans
You can run 1-dimensional or 2-dimensional parameter scans to see how your strategy performs for a variety of parameter values. You can run parameter scans against any parameter which is stored as a class attribute on your strategy (or as a class attribute on a parent class of your strategy).
For example, returning to the moving average crossover example, recall that the long and short moving average windows are stored as class attributes:
class DualMovingAverageStrategy(Moonshot):

    CODE = "dma-tech"
    DB = "tech-giants-1d"
    LMAVG_WINDOW = 300
    SMAVG_WINDOW = 100
Let's try varying the short moving average window on our dual moving average strategy:
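For example, a 1-D scan over a few values of SMAVG_WINDOW (following the same conventions as the 2-D scan below):
>>> from quantrocket.moonshot import scan_parameters
>>> from moonchart import ParamscanTearsheet
>>> scan_parameters("dma-tech", start_date="2005-01-01", end_date="2017-01-01",
        param1="SMAVG_WINDOW", vals1=[5, 20, 100],
        filepath_or_buffer="dma_tech_1d.csv")
>>> ParamscanTearsheet.from_moonshot_csv("dma_tech_1d.csv")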
The resulting tear sheet will show how the strategy performs for each parameter value:

Let's try a 2-dimensional parameter scan, varying both our short and long moving averages:
$ quantrocket moonshot paramscan 'dma-tech' --param1 'SMAVG_WINDOW' --vals1 5 20 100 --param2 'LMAVG_WINDOW' --vals2 150 200 300 -s '2005-01-01' -e '2017-01-01' --pdf -o dma_2d.pdf
>>> from quantrocket.moonshot import scan_parameters
>>> from moonchart import ParamscanTearsheet
>>> scan_parameters("dma-tech", start_date="2005-01-01", end_date="2017-01-01",
param1="SMAVG_WINDOW", vals1=[5,20,100],
param2="LMAVG_WINDOW", vals2=[150,200,300],
filepath_or_buffer="dma_tech_2d.csv")
>>> ParamscanTearsheet.from_moonshot_csv("dma_tech_2d.csv")
$ curl -X POST 'http://houston/moonshot/paramscans?strategies=dma-tech&start_date=2005-01-01&end_date=2017-01-01&param1=SMAVG_WINDOW&vals1=5&vals1=20&vals1=100&param2=LMAVG_WINDOW&vals2=150&vals2=200&vals2=300&pdf=true' > dma_tech_2d.pdf
This time our tear sheet uses a heat map to visualize the 2-D results:

We can even run a 1-D or 2-D parameter scan on multiple strategies at once:
The tear sheet shows the scan results for the individual strategies and the aggregate portfolio:

Often when first coding a strategy your parameter values will be hardcoded in the body of your methods:
class TrendDay(Moonshot):

    ...

    def prices_to_signals(self, prices):
        ...
        afternoon_prices = closes.xs("14:00:00", level="Time")
        ...
When you're ready to run parameter scans, simply factor out the hardcoded values into class attributes, naming the attribute whatever you like:
class TrendDay(Moonshot):

    ...
    DECISION_TIME = "14:00:00"

    def prices_to_signals(self, prices):
        ...
        afternoon_prices = closes.xs(self.DECISION_TIME, level="Time")
        ...
Now run your parameter scan:
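For example, assuming the strategy's CODE is "trend-day", a sketch scanning a few alternative decision times:
>>> from quantrocket.moonshot import scan_parameters
>>> from moonchart import ParamscanTearsheet
>>> scan_parameters("trend-day",
        param1="DECISION_TIME", vals1=["14:00:00", "14:30:00", "15:00:00"],
        filepath_or_buffer="trend_day_times.csv")
>>> ParamscanTearsheet.from_moonshot_csv("trend_day_times.csv")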
You can scan parameter values other than just strings or numbers, including True, False, None, and lists of values. You can pass the special value "default" to run an iteration that preserves the parameter value already defined on your strategy.
$ quantrocket moonshot paramscan 'dma-tech' --param1 'SLIPPAGE_BPS' --vals1 'default' 'None' '2' '100' --param2 'EXCLUDE_CONIDS' --vals2 '756733' '6604766' '756733,6604766' --pdf -o paramscan_results.pdf
>>> from quantrocket.moonshot import scan_parameters
>>> from moonchart import ParamscanTearsheet
>>> scan_parameters("dma-tech",
param1="SLIPPAGE_BPS", vals1=["default",None,2,100],
param2="EXCLUDE_CONIDS", vals2=[756733,6604766,[756733,6604766]],
filepath_or_buffer="paramscan_results.csv")
>>> ParamscanTearsheet.from_moonshot_csv("paramscan_results.csv")
$ curl -X POST 'http://houston/moonshot/paramscans.csv?strategies=dma-tech&param1=SLIPPAGE_BPS&vals1=default&vals1=None&vals1=2&vals1=100&param2=EXCLUDE_CONIDS&vals2=756733&vals2=6604766&vals2=%5B756733%2C+6604766%5D' > paramscan_results.csv
Parameter values are converted to strings, sent over HTTP to the moonshot service, then converted back to the appropriate types by the moonshot service using Python's built-in eval()
function.
Segmented parameter scans
As with backtests, you can run segmented parameter scans to reduce memory usage:
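For example, a sketch using annual segments (the segment argument is assumed to be accepted by scan_parameters just as it is by backtest):
>>> from quantrocket.moonshot import scan_parameters
>>> scan_parameters("dma-tech", start_date="2005-01-01", end_date="2017-01-01",
        param1="SMAVG_WINDOW", vals1=[5, 20, 100],
        segment="A",
        filepath_or_buffer="dma_tech_1d_segmented.csv")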
Learn more about segmented backtests in the section on backtesting.
Moonshot development workflow
Interactive strategy development in Jupyter
Working with DataFrames is much easier when done interactively. You can follow and validate the transformations at each step, rather than having to write lots of code and run a complete backtest only to wonder why the results don't match what you expected.
Luckily, Moonshot is a simple, fairly "raw" framework that doesn't perform lots of invisible, black-box magic, making it straightforward to step through your DataFrame transformations in a notebook and later transfer your working code to a .py
file.
To interactively develop our moving average crossover strategy, define a simple Moonshot class that points to your history database:
from moonshot import Moonshot

class DualMovingAverageStrategy(Moonshot):

    DB = "tech-giants-1d"
To see other built-in parameters you might define besides DB
, check the Moonshot docstring by typing: Moonshot?
Instantiate the strategy and get a DataFrame of prices:
self = DualMovingAverageStrategy()
prices = self.get_prices(start_date="2016-01-01")
This is the same prices DataFrame that will be passed to your prices_to_signals
method in a backtest, so you can now interactively implement your logic to produce a DataFrame of signals from the DataFrame of prices (peeking at the intermediate DataFrames as you go):
closes = prices.loc["Close"]
lmavgs = closes.rolling(300).mean()
smavgs = closes.rolling(100).mean()
signals = smavgs.shift() > lmavgs.shift()
signals = signals.astype(int)
In a backtest your signals DataFrame will be passed to your signals_to_target_weights
method, so now work on the logic for that method. In this case it's easy:
weights = self.allocate_equal_weights(signals)
Next, transform the target weights into a positions DataFrame; this will become the logic of your strategy's target_weights_to_positions
method:
positions = weights.shift()
Finally, compute gross returns from your positions; this will become positions_to_gross_returns
:
closes = prices.loc["Close"]
gross_returns = closes.pct_change() * positions.shift()
Once you've stepped through this process and your code appears to be doing what you expect, you can create a .py
file for your strategy and copy your code into it, then run a full backtest.
Don't forget to add a CODE
attribute to your strategy class at this point to identify it (e.g. "dma-tech"). The class name of your strategy and the name of the file in which you store it don't matter; only the CODE
is used to identify the strategy throughout QuantRocket.
Save custom DataFrames to backtest results
You can add custom DataFrames to your backtest results, in addition to the DataFrames that are included by default. For example, you might save the computed moving averages:
def prices_to_signals(self, prices):
    closes = prices.loc["Close"]
    mavgs = closes.rolling(50).mean()
    self.save_to_results("MAvg", mavgs)
    ...
After running a backtest with details=True
, the resulting CSV will contain the custom DataFrame:
>>> from quantrocket.moonshot import read_moonshot_csv
>>> results = read_moonshot_csv("dma_tech_results.csv")
>>> mavgs = results.loc["MAvg"]
>>> mavgs.head()
AAPL(265598) AMZN(3691937) NFLX(15124833) GOOGL(208813719)
Date
2008-12-22 17.31265 62.4673 3.80260 190.40620
2008-12-23 17.21225 62.2206 3.79965 189.55615
2008-12-24 17.11485 61.9779 3.79620 188.75510
2008-12-26 17.00795 61.7046 3.79415 187.85675
2008-12-29 16.89715 61.4177 3.79120 186.91120
Custom DataFrames are only returned when running single-strategy backtests using the --details
/details=True
option.
Debugging Moonshot strategies
There are several options for debugging your strategies.
First, you can interactively develop the strategy in a notebook. This is particularly helpful in the early stages of development.
Second, if your strategy is already in a .py
file, you can save custom DataFrames to your backtest output and try to see what's going on.
Third, you can add print statements to your .py
file, which will show up in flightlog's detailed logs. Open a terminal and start streaming the logs:
$ quantrocket flightlog stream -d
Then run your backtest from a notebook or another terminal.
If you want to inspect or debug the Moonshot library itself (we hope it's so solid you never need to!), a good tactic is to find the relevant method from the base Moonshot class and copy and paste it into your own strategy:
class MyStrategy(Moonshot):

    ...

    def backtest(self, start_date=None, end_date=None):
        self.is_backtest = True
        ...
This will override the corresponding method on the base Moonshot class, so you can now add print statements to your copy of the method and they'll show up in flightlog.
Strategy inheritance
Often, you may want to re-use a strategy's logic while changing some of the parameters. For example, perhaps you'd like to run an existing strategy on a different market. To do so, simply subclass your existing strategy and modify the parameters as needed. Let's try our dual moving average strategy on a group of ETFs. First, get the historical data for the ETFs:
$
$ quantrocket master collect --exchanges 'ARCA' --symbols 'SPY' 'XLF' 'EEM' 'VNQ' 'XOP' 'GDX'
status: the listing details will be collected asynchronously
$
$ quantrocket master get -e 'ARCA' -s 'SPY' 'XLF' 'EEM' 'VNQ' 'XOP' 'GDX' | quantrocket master universe 'etf-sampler' -f -
code: etf-sampler
inserted: 6
provided: 6
total_after_insert: 6
$
$ quantrocket history create-db 'etf-sampler-1d' -u 'etf-sampler' --bar-size '1 day'
status: successfully created quantrocket.history.etf-sampler-1d.sqlite
$ quantrocket history collect 'etf-sampler-1d'
status: the historical data will be collected asynchronously
Since we're inheriting from an existing strategy, implementing our strategy is easy:
class DualMovingAverageStrategyETF(DualMovingAverageStrategy):

    CODE = "dma-etf"
    DB = "etf-sampler-1d"
    LMAVG_WINDOW = 300
    SMAVG_WINDOW = 100
Now we can run our backtest:
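For example, a sketch following the earlier backtest examples:
>>> from quantrocket.moonshot import backtest
>>> from moonchart import Tearsheet
>>> backtest("dma-etf", details=True, filepath_or_buffer="dma_etf_results.csv")
>>> Tearsheet.from_moonshot_csv("dma_etf_results.csv")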
Code organization
Your Moonshot code should be placed in the /codeload/moonshot
subdirectory inside JupyterLab. QuantRocket recursively scans .py
files in this directory and loads your strategies (a strategy is defined as a subclass of moonshot.Moonshot
). You can place as many strategies as you like within a single .py
file, or you can place them in separate files. If you like, you can organize your .py
files into subdirectories as you see fit.
If you want to re-use code across multiple files, you can do so using standard Python import syntax. Any .py
files in or under the /codeload
directory inside Jupyter (that is, any .py
files you can see in the Jupyter file browser) can be imported from codeload
. For example, consider a simple directory structure containing two files for your strategies and one file with helper functions used by multiple strategies:
/codeload/moonshot/helpers.py
/codeload/moonshot/meanreversion_strategies.py
/codeload/moonshot/momentum_strategies.py
Suppose you've implemented a function in helpers.py
called rebalance_positions
. You can import and use the function in another file like so:
from codeload.moonshot.helpers import rebalance_positions
Importing also works if you're using subdirectories:
/codeload/moonshot/helpers/rebalance.py
/codeload/moonshot/meanreversion/buythedip.py
/codeload/moonshot/momentum/hml.py
Just use standard Python dot syntax to reach your modules wherever they are in the directory tree:
from codeload.moonshot.helpers.rebalance import rebalance_positions
To make your code importable as a standard Python package, the 'codeload' directory and each subdirectory must contain a __init__.py
file. QuantRocket will create these files automatically if they don't exist.
Interactive order creation in Jupyter
This section might make more sense after reading about
live trading.
Just as you can interactively develop your Moonshot backtest code in Jupyter, you can use a similar approach to develop your order_stubs_to_orders
method.
First, import and instantiate your strategy:
from codeload.moonshot.dual_moving_average import DualMovingAverageTechGiantsStrategy
self = DualMovingAverageTechGiantsStrategy()
Next, run the trade method, which returns a DataFrame of orders. You'll need to pass at least one account allocation (normally this would be pulled from quantrocket.moonshot.allocations.yml
).
allocations = {"DU12345": 1.0}
orders = self.trade(allocations)
The account must be a valid account as Moonshot will try to pull the account balance from the account service. You can run quantrocket account balance --latest
to make sure account history is available for the account.
If self.trade() returns no orders, you can pass a review_date to generate orders for an earlier date, and/or modify prices_to_signals to create some trades for the purpose of testing.
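For example (the review_date argument to the trade method is an assumption here, mirroring the --review-date CLI option described in the live trading section):
orders = self.trade(allocations, review_date="2018-05-01")  # assumed keyword; see --review-date in live trading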
If your strategy hasn't overridden order_stubs_to_orders
, you'll receive the orders DataFrame as processed by the default implementation of order_stubs_to_orders
on the Moonshot base class. You can return the orders to the state in which they were passed to order_stubs_to_orders
by dropping a few columns:
orders = orders.drop(["OrderType", "Tif", "Exchange"], axis=1)
You can now experiment with modifying your orders DataFrame. For example, re-add the required fields:
orders["Exchange"] = "SMART"
orders["OrderType"] = "MKT"
orders["Tif"] = "DAY"
Or attach exit orders:
child_orders = self.orders_to_child_orders(orders)
child_orders.loc[:, "OrderType"] = "MOC"
orders = pd.concat([orders, child_orders])
To use the prices DataFrame for order creation (for example, to set limit prices), query recent historical prices. (To learn more about the historical data start date used in live trading, see the section on lookback windows.)
prices = self.get_prices("2018-04-01")
Now create limit prices set to the prior close:
closes = prices.loc["Close"]
prior_closes = closes.shift()
prior_closes = self.reindex_like_orders(prior_closes, orders)
orders["OrderType"] = "LMT"
orders["LmtPrice"] = prior_closes
Intraday strategies
When your strategy points to an intraday history database, the strategy receives a DataFrame of intraday prices, that is, a DataFrame containing the time in the index, not just the date.
Moonshot supports two different conventions for intraday strategies, depending on how frequently the strategy trades.
| Trade frequency | Example strategy |
|---|---|
| throughout the day | using 5 minute bars, enter long (short) position whenever price moves above (below) its N-period moving average |
| once per day | if intraday return is greater than X% as of 2:00 PM, enter long position at 2:15 PM and close position at 4:00 PM |
Throughout-the-day strategies
Intraday strategies that trade throughout the day are very similar to end-of-day strategies, the only difference being that the prices DataFrame and the derived DataFrames (signals, target weights, etc.) have a "Time" level in the index. (See the structure of intraday prices.)
Given the similarity with end-of-day strategies, we can demonstrate an intraday strategy by using the end-of-day dual moving average strategy from an earlier example but pointing it to an intraday database. Suppose we have collected 5-minute bars for a small universe of stocks:
$
$ quantrocket history create-db 'tech-giants-5min' -u 'tech-giants' --bar-size '5 mins' --shard 'off'
status: successfully created quantrocket.history.tech-giants-5min.sqlite
$ quantrocket history collect 'tech-giants-5min'
status: the historical data will be collected asynchronously
We can create a subclass of the end-of-day strategy which points to the intraday database:
class DualMovingAverageIntradayStrategy(DualMovingAverageStrategy):

    CODE = "dma-tech-intraday"
    DB = "tech-giants-5min"
Now we can run the backtest and view the performance:
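For example, a sketch following the earlier backtest examples:
>>> from quantrocket.moonshot import backtest
>>> from moonchart import Tearsheet
>>> backtest("dma-tech-intraday", details=True, filepath_or_buffer="dma_tech_intraday.csv")
>>> Tearsheet.from_moonshot_csv("dma_tech_intraday.csv")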
If you load the backtest results CSV into a DataFrame, it has the same fields as an end-of-day CSV, but the index includes a "Time" level:
>>> from quantrocket.moonshot import read_moonshot_csv
>>> results = read_moonshot_csv("dma_tech_intraday.csv")
>>> results.tail()
AAPL(265598) AMZN(3691937) NFLX(15124833) GOOGL(208813719)
Field Date Time
Weight 2017-03-01 15:35:00 0.25 0.25 0.25 0.25
15:40:00 0.25 0.25 0.25 0.25
15:45:00 0.25 0.25 0.25 0.25
15:50:00 0.25 0.25 0.25 0.25
15:55:00 0.25 0.25 0.25 0.25
When you create a Moonchart or pyfolio tear sheet from an intraday Moonshot CSV, the respective libraries first aggregate the intraday results DataFrame to a daily results DataFrame, then plot the daily results.
Once-a-day strategies
Some intraday strategies only trade at most once per day, at a particular time of day. These strategies can be thought of as "seasonal": that is, instead of treating the intraday prices as a continuous series, the time of day is highly relevant to the trading logic. Once-a-day strategies need to select relevant times of day from the intraday prices DataFrame and perform calculations with those slices of data, rather than using the entirety of intraday prices.
For these once-a-day intraday strategies, the recommended convention is to "reduce" the DataFrame of intraday prices to a DataFrame of daily signals in prices_to_signals
. Since there can only be one signal per day, the signals DataFrame need not have the time in the index. An example will illustrate.
Consider a simple "trend day" strategy using several ETFs: if the ETF is up (down) more than 2% from yesterday's close as of 2:00 PM, buy (sell) the ETF at 2:15 PM and exit the position at the market close.
First, get the historical data for the ETFs:
$
$ quantrocket master collect --exchanges 'ARCA' --symbols 'SPY' 'XLF' 'EEM' 'VNQ' 'XOP' 'GDX'
status: the listing details will be collected asynchronously
$
$ quantrocket master get -e 'ARCA' -s 'SPY' 'XLF' 'EEM' 'VNQ' 'XOP' 'GDX' | quantrocket master universe 'etf-sampler' -f -
code: etf-sampler
inserted: 6
provided: 6
total_after_insert: 6
$
$ quantrocket history create-db 'etf-sampler-15min' -u 'etf-sampler' --bar-size '15 mins'
status: successfully created quantrocket.history.etf-sampler-15min.sqlite
$ quantrocket history collect 'etf-sampler-15min'
status: the historical data will be collected asynchronously
Define a Moonshot strategy and point it to the intraday history database:
class TrendDayStrategy(Moonshot):

    CODE = 'trend-day'
    DB = 'etf-sampler-15min'
    DB_TIMES = ['14:00:00', '15:45:00']
    DB_FIELDS = ['Open','Close']
Note the use of DB_TIMES and DB_FIELDS to limit the amount of data loaded into the backtest. Loading only the data you need is an important performance optimization for intraday strategies with large universes (albeit unnecessary in this particular example since the universe is small).
Working with intraday prices in Moonshot is identical to working with intraday prices in historical research. We use .xs
to select particular times of day from the prices DataFrame, thereby reducing the DataFrame from intraday to daily. In this way our prices_to_signals
method calculates the return from yesterday's close to 2:00 PM and uses it to make trading decisions:
def prices_to_signals(self, prices):
    closes = prices.loc["Close"]
    opens = prices.loc["Open"]

    # select the 15:45 session closes and the 14:00 prices
    session_closes = closes.xs("15:45:00", level="Time")
    afternoon_prices = opens.xs("14:00:00", level="Time")

    # calculate the return from yesterday's close to 2:00 PM
    prior_closes = session_closes.shift()
    returns = (afternoon_prices - prior_closes) / prior_closes

    # long if up more than 2%, short if down more than 2%
    long_signals = returns > 0.02
    short_signals = returns < -0.02
    signals = long_signals.astype(int).where(long_signals, -short_signals.astype(int))
    return signals
If you step through this code interactively, you'll see that after the use of .xs
to select particular times of day from the prices DataFrame, all subsequent DataFrames have dates in the index but not times, just like with an end-of-day strategy.
Because our prices_to_signals
method has reduced intraday prices to daily signals, our signals_to_target_weights
and target_weights_to_positions
methods don't need to do any special "intraday handling" and therefore look similar to how they might look for a daily strategy:
def signals_to_target_weights(self, signals, prices):
    target_weights = self.allocate_fixed_weights_capped(signals, 0.20, cap=1.0)
    return target_weights

def target_weights_to_positions(self, target_weights, prices):
    positions = target_weights.copy()
    return positions
To calculate gross returns, we select the intraday prices that correspond to our entry and exit times and multiply the security's return by our position size:
def positions_to_gross_returns(self, positions, prices):
    closes = prices.loc["Close"]

    # entry price is the close of the 14:00 bar (approximately 2:15 PM);
    # exit price is the close of the 15:45 bar (the session close)
    entry_prices = closes.xs("14:00:00", level="Time")
    session_closes = closes.xs("15:45:00", level="Time")

    pct_changes = (session_closes - entry_prices) / entry_prices
    gross_returns = pct_changes * positions
    return gross_returns
Now we can run the backtest and view the performance:
$ quantrocket moonshot backtest 'trend-day' --pdf -o trend_day.pdf --details
>>> from quantrocket.moonshot import backtest
>>> from moonchart import Tearsheet
>>> backtest("trend-day", details=True, filepath_or_buffer="trend_day.csv")
>>> Tearsheet.from_moonshot_csv("trend_day.csv")
$ curl -X POST 'http://houston/moonshot/backtests.pdf?strategies=trend-day&pdf=true' -o trend_day.pdf
Commissions and slippage
Commissions
Moonshot supports realistic modeling of IB commissions. To model commissions, subclass the appropriate commission class, set the commission costs as per IB's website, then add the commission class to your strategy:
from moonshot import Moonshot
from moonshot.commission import PercentageCommission

class JapanStockFixedCommission(PercentageCommission):
    IB_COMMISSION_RATE = 0.0008
    MIN_COMMISSION = 80.00

class MyJapanStrategy(Moonshot):
    COMMISSION_CLASS = JapanStockFixedCommission
Because commission costs change from time to time, and because some cost components depend on account specifics such as your monthly trade volume or the degree to which you add or remove liquidity, Moonshot provides the commission logic but expects you to fill in the specific cost constants.
Percentage commissions
Use moonshot.commission.PercentageCommission
where IB's commission is calculated as a percentage of the trade value. If you're using the tiered commission structure, you can also set an exchange fee (as a percentage of trade value). A variety of examples are shown below:
from moonshot.commission import PercentageCommission
class MexicoStockCommission(PercentageCommission):
    IB_COMMISSION_RATE = 0.0010
    MIN_COMMISSION = 60.00

class SingaporeStockTieredCommission(PercentageCommission):
    IB_COMMISSION_RATE = 0.0008
    EXCHANGE_FEE_RATE = 0.00034775 + 0.00008025
    MIN_COMMISSION = 2.50

class UKStockTieredCommission(PercentageCommission):
    IB_COMMISSION_RATE = 0.0008
    EXCHANGE_FEE_RATE = 0.000045 + 0.0025
    MIN_COMMISSION = 1.00

class HongKongStockTieredCommission(PercentageCommission):
    IB_COMMISSION_RATE = 0.0008
    EXCHANGE_FEE_RATE = (
        0.00005
        + 0.00002
        + 0.001
        + 0.000027
    )
    MIN_COMMISSION = 18.00

class JapanStockTieredCommission(PercentageCommission):
    IB_COMMISSION_RATE = 0.0005
    EXCHANGE_FEE_RATE = 0.00002 + 0.000004
    MIN_COMMISSION = 80.00
Per Share commissions
Use moonshot.commission.PerShareCommission
to model commissions which are assessed per share (US and Canada stock commissions). Here is an example of a fixed commission for US stocks:
from moonshot.commission import PerShareCommission
class USStockFixedCommission(PerShareCommission):
    IB_COMMISSION_PER_SHARE = 0.005
    MIN_COMMISSION = 1.00
IB Cost-Plus commissions can be complex; in addition to the IB commission, they may include exchange fees which are assessed per share (and which may differ depending on whether you add or remove liquidity), fees which are based on the trade value, and fees which are assessed as a percentage of the IB commission itself. These can also be modeled:
class CostPlusUSStockCommission(PerShareCommission):
    IB_COMMISSION_PER_SHARE = 0.0035
    EXCHANGE_FEE_PER_SHARE = 0.0002 + (0.000119 / 2)
    MAKER_FEE_PER_SHARE = -0.002
    TAKER_FEE_PER_SHARE = 0.00118
    MAKER_RATIO = 0.25
    COMMISSION_PERCENTAGE_FEE_RATE = 0.000175 + 0.00056
    PERCENTAGE_FEE_RATE = 0.0000231
    MIN_COMMISSION = 0.35

class CanadaStockCommission(PerShareCommission):
    IB_COMMISSION_PER_SHARE = 0.008
    EXCHANGE_FEE_PER_SHARE = 0.00017 + 0.00011
    MAKER_FEE_PER_SHARE = -0.0019
    TAKER_FEE_PER_SHARE = 0.003
    MAKER_RATIO = 0
    MIN_COMMISSION = 1.00
Futures commissions
moonshot.commission.FuturesCommission
lets you define a commission, exchange fee, and carrying fee per contract:
from moonshot.commission import FuturesCommission
class GlobexEquityEMiniFixedCommission(FuturesCommission):
    IB_COMMISSION_PER_CONTRACT = 0.85
    EXCHANGE_FEE_PER_CONTRACT = 1.18
    CARRYING_FEE_PER_CONTRACT = 0
Forex commissions
Spot forex commissions are percentage-based, so moonshot.commission.SpotForexCommission
can be used directly without subclassing:
from moonshot import Moonshot
from moonshot.commission import SpotForexCommission
class MyForexStrategy(Moonshot):
    COMMISSION_CLASS = SpotForexCommission
Note that at present, SpotForexCommission
does not model minimum commissions (this has to do with the fact that the minimum commission for forex is always expressed in USD, rather than the currency of the traded security). This limitation means that if your trades are small, SpotForexCommission
may underestimate the commission.
Minimum commissions
During backtests, Moonshot calculates and assesses commissions in percentage terms (relative to the capital allocated to the strategy) rather than in dollar terms. However, since minimum commissions are expressed in dollar terms, Moonshot must know your NLV (Net Liquidation Value, i.e. account balance) in order to accurately model minimum commissions in backtests. You can specify your NLV in your strategy definition or at the time you run a backtest.
If you trade in size and are unlikely ever to trigger minimum commissions, you don't need to model them.
NLV should be provided as key-value pairs of CURRENCY:NLV
. You must provide the NLV in each currency you wish to model. For example, if your account balance is $100K USD, and your strategy trades instruments denominated in JPY and AUD, you could specify this on the strategy:
class MyAsiaStrategy(Moonshot):
    CODE = "my-asia-strategy"
    NLV = {
        "JPY": 100000 * 110,
        "AUD": 100000 * 1.25
    }
Or pass the NLV at the time you run the backtest:
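For example, a sketch using the nlv option mentioned below, passing the same CURRENCY:NLV mapping as the class attribute:
>>> from quantrocket.moonshot import backtest
>>> backtest("my-asia-strategy",
        nlv={"JPY": 100000 * 110, "AUD": 100000 * 1.25},
        filepath_or_buffer="my_asia_strategy_results.csv")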
If you don't specify NLV on the strategy or via the nlv
option, the backtest will still run, it just won't take into account minimum commissions.
Multiple commission structures on the same strategy
You might run a strategy that trades multiple securities with different commission structures. Instead of specifying a single commission class, you can specify a Python dictionary associating each commission class with the respective security type, exchange, and currency it applies to:
from moonshot import Moonshot
from moonshot.commission import PerShareCommission, FuturesCommission

class USStockFixedCommission(PerShareCommission):
    IB_COMMISSION_PER_SHARE = 0.005
    MIN_COMMISSION = 1.00

class GlobexEquityEMiniFixedCommission(FuturesCommission):
    IB_COMMISSION_PER_CONTRACT = 0.85
    EXCHANGE_FEE_PER_CONTRACT = 1.18

class MultiSecTypeStrategy(Moonshot):
    COMMISSION_CLASS = {
        ("STK", "NYSE", "USD"): USStockFixedCommission,
        ("STK", "NASDAQ", "USD"): USStockFixedCommission,
        ("FUT", "GLOBEX", "USD"): GlobexEquityEMiniFixedCommission
    }
Slippage
Fixed slippage
You can apply a fixed amount of slippage (in basis points) to the trades in your backtest by setting SLIPPAGE_BPS
on your strategy:
class MyStrategy(Moonshot):

    ...
    SLIPPAGE_BPS = 5
The above will apply 5 basis points of one-way slippage to each trade. If you expect different slippage for entry vs. exit, take the average.
Parameter scans are a handy way to check your strategy's sensitivity to slippage:
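For example, a sketch scanning several slippage assumptions:
>>> from quantrocket.moonshot import scan_parameters
>>> from moonchart import ParamscanTearsheet
>>> scan_parameters("dma-tech",
        param1="SLIPPAGE_BPS", vals1=[0, 2.5, 5, 10],
        filepath_or_buffer="dma_tech_slippage.csv")
>>> ParamscanTearsheet.from_moonshot_csv("dma_tech_slippage.csv")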
You can research bid-ask spreads for the purpose of estimating slippage by collecting intraday historical data using the BID
, ASK
, or BID_ASK
bar types.
Commissions and slippage for intraday positions
If you run an intraday strategy that closes its positions the same day it opens them, you should set a parameter (POSITIONS_CLOSED_DAILY
, see below) to tell Moonshot you're doing this so that it can more accurately assess commissions and slippage. Here's why:
Moonshot calculates commissions and slippage by first diff()
ing the positions DataFrame in your backtest to calculate the day-to-day turnover. For example, suppose we entered a position in AAPL, then reduced the position the next day, then maintained the position for a day, then closed the position. Our holdings look like this:
>>> positions.head()
AAPL(265598)
Date
2012-01-06 0.000
2012-01-09 0.500
2012-01-10 0.333
2012-01-11 0.333
2012-01-12 0.000
The corresponding DataFrame of trades, representing our turnover due to opening and closing the position, would look like this:
>>> trades = positions.diff()
>>> trades.head()
AAPL(265598)
Date
2012-01-06 NaN
2012-01-09 0.500
2012-01-10 -0.167
2012-01-11 0.000
2012-01-12 -0.333
Commissions and slippage are applied against this DataFrame of trades.
The default use of diff()
to calculate trades from positions involves an assumption: that adjacent, same-side positions in the positions DataFrame represent continuous holdings. For strategies that close out their positions each day, this assumption isn't correct. For example, the positions DataFrame from above might actually indicate 3 positions opened and closed on 3 consecutive days, rather than 1 continuously held position:
>>> positions.head()
AAPL(265598)
Date
2012-01-06 0.000
2012-01-09 0.500
2012-01-10 0.333
2012-01-11 0.333
2012-01-12 0.000
If so, diff()
will underestimate turnover and thus underestimate commissions and slippage. The correct calculation of turnover is to multiply the positions by 2:
>>> trades = positions * 2
>>> trades.head()
AAPL(265598)
Date
2012-01-06 0.000
2012-01-09 1.000
2012-01-10 0.667
2012-01-11 0.667
2012-01-12 0.000
As there is no reliable way for Moonshot to infer automatically whether adjacent, same-side positions are continuously held or closed out daily, you must set POSITIONS_CLOSED_DAILY = True
on the strategy if you want Moonshot to assume they are closed out daily:
class TrendDay(Moonshot):

    ...
    POSITIONS_CLOSED_DAILY = True
Otherwise, Moonshot will assume that adjacent, same-side positions are continuously held.
Position size constraints
Liquidity constraints
Instead of or in addition to limiting position sizes as described below, also consider using
VWAP or other algorithmic orders to trade in size if you have a large account and/or wish to trade illiquid securities. VWAP orders can be modeled in backtests as well as used in live trading.
A backtest that assumes it is possible to buy or sell any security you want in any size you want is likely to be unrealistic. In the real world, a security's liquidity constrains the number of shares it is practical to buy or sell.
Maximum position sizes for long and short positions can be defined in your strategy's limit_position_sizes
method. If defined, this method should return two DataFrames, one defining the maximum quantities (i.e. shares or contracts) allowed for longs and a second defining the maximum quantities allowed for shorts. The following example limits quantities to 1% of 15-day average daily volume:
def limit_position_sizes(self, prices):
    # limit position sizes to 1% of 15-day average daily volume
    volumes = prices.loc["Volume"]
    mean_volumes = volumes.rolling(15).mean()
    max_shares = (mean_volumes * 0.01).round()
    max_quantities_for_longs = max_quantities_for_shorts = max_shares.shift()
    return max_quantities_for_longs, max_quantities_for_shorts
The returned DataFrames might resemble the following:
>>> max_quantities_for_longs.head()
ConId 1234 2345
Date
2018-05-18 100 200
2018-05-19 100 200
>>> max_quantities_for_shorts.head()
ConId 1234 2345
Date
2018-05-18 100 200
2018-05-19 100 200
In the above example, our strategy will be allowed to long or short at most 100 shares of ConId 1234 and 200 shares of ConId 2345.
Note that max_quantities_for_shorts
can equivalently be represented with positive or negative numbers. Values of 100 and -100 are both interpreted to mean: short no more than 100 shares. (The same applies to max_quantities_for_longs
—only the absolute value matters).
The shape and alignment of the returned DataFrames should match that of the target_weights
returned by signals_to_target_weights
. Target weights will be reduced, if necessary, so as not to exceed max_quantities_for_longs
and max_quantities_for_shorts
. Position size limits are applied in backtesting and in live trading.
You can return None
for one or both DataFrames to indicate "no limits" (this is the default implementation in the Moonshot base class). For example to limit shorts but not longs:
def limit_position_sizes(self, prices):
    ...
    return None, max_quantities_for_shorts
Within a DataFrame, any None
or NaN
will be treated as "no limit" for that particular security and date.
If you define position size limits for longs or shorts or both, you must specify the NLV to use for the backtest. This is because the target_weights
returned by signals_to_target_weights
are expressed as percentages of capital, and NLV is required for Moonshot to convert the percentage weights to the corresponding number of shares/contracts so that the position size limits can be enforced. NLV should be provided as key-value pairs of CURRENCY:NLV
, and should be provided for each currency represented in the strategy. For example, if your account balance is $100K USD, and your strategy trades instruments denominated in JPY and USD, you could specify NLV on the strategy:
class MyStrategy(Moonshot):
    CODE = "my-strategy"
    NLV = {
        "USD": 100000,
        "JPY": 100000 * 110,
    }
Or pass the NLV at the time you run the backtest:
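For example, a sketch using the nlv option, just as with minimum commissions:
>>> from quantrocket.moonshot import backtest
>>> backtest("my-strategy", nlv={"USD": 100000, "JPY": 100000 * 110},
        filepath_or_buffer="my_strategy_results.csv")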
Fixed order quantities
Moonshot expects you to define your target weights as a percentage of capital. Moonshot then converts these percentage weights to the corresponding quantities of shares or contracts at the time of live trading.
For some trading strategies, you may wish to set the exact order quantities yourself, rather than using percentage weights. To accomplish this, set your weights very high (in absolute terms) in signals_to_target_weights, then use limit_position_sizes to reduce these percentage weights to the exact desired quantity of shares or contracts. See the examples above for the expected conventions to use in limit_position_sizes.
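A minimal sketch of this approach, with hypothetical conids and fixed quantities:
import pandas as pd

def signals_to_target_weights(self, signals, prices):
    # deliberately oversized weights so the position size limits below are always the binding constraint
    return signals * 1000

def limit_position_sizes(self, prices):
    closes = prices.loc["Close"]
    # hypothetical fixed quantities per security; column labels must be the conids in your universe
    fixed_quantities = {12345: 100, 67890: 200}
    max_shares = pd.DataFrame(fixed_quantities, index=closes.index)
    return max_shares, max_shares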
Short sale constraints
You can use short sale availability data from IB to model short sale constraints in your backtests, including the available quantity of shortable shares and the associated borrow fees for overnight positions.
Shortable sales
One way to use shortable shares data is to enforce position limits based on share availability:
from quantrocket.fundamental import get_shortable_shares_reindexed_like

def limit_position_sizes(self, prices):
    max_shares_for_shorts = get_shortable_shares_reindexed_like(prices.loc["Close"])
    return None, max_shares_for_shorts
Shortable shares data is available back to April 16, 2018. Prior to that date, get_shortable_shares_reindexed_like
will return NaNs, which are interpreted by Moonshot as "no limit on position size".
Due to the limited historical depth of shortable shares data, a useful approach is to develop your strategy without modeling short sale constraints, then run a parameter scan starting at April 16, 2018 to compare the performance with and without short sale constraints. Add a parameter to make your short sale constraint code conditional:
class ShortSaleStrategy(Moonshot):

    CODE = "shortseller"
    CONSTRAIN_SHORTABLE = False
    ...

    def limit_position_sizes(self, prices):
        if self.CONSTRAIN_SHORTABLE:
            max_shares_for_shorts = get_shortable_shares_reindexed_like(prices.loc["Close"])
        else:
            max_shares_for_shorts = None
        return None, max_shares_for_shorts
Then run the parameter scan:
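For example, a sketch of the scan (shortable shares data begins 2018-04-16):
>>> from quantrocket.moonshot import scan_parameters
>>> from moonchart import ParamscanTearsheet
>>> scan_parameters("shortseller", start_date="2018-04-16",
        param1="CONSTRAIN_SHORTABLE", vals1=[True, False],
        filepath_or_buffer="shortseller_shortable_scan.csv")
>>> ParamscanTearsheet.from_moonshot_csv("shortseller_shortable_scan.csv")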
Borrow fees
You can use a built-in slippage class to assess borrow fees on your strategy's overnight short positions. (Note that IB does not assess borrow fees on intraday positions.)
from moonshot import Moonshot
from moonshot.slippage import BorrowFees

class ShortSaleStrategy(Moonshot):

    CODE = "shortseller"
    SLIPPAGE_CLASSES = BorrowFees
    ...
The BorrowFees
slippage class uses get_borrow_fees_reindexed_like
to query annualized borrow fees, divides them by 252 (the approximate number of trading days in a year) to get a daily rate, and applies the daily rate to your short positions in backtesting. No fees are applied prior to the data's start date of April 16, 2018.
To run a parameter scan with and without borrow fees, add the BorrowFees
slippage as shown above and run a scan on the SLIPPAGE_CLASSES
parameter with values of "default" (to test the strategy as-is, that is, with borrow fees) and "None":
Live trading
Live trading quickstart
Live trading with Moonshot can be thought of as running a backtest on up-to-date historical data and placing a batch of orders based on the latest signals generated by the backtest.
Recall the moving average crossover strategy from the backtesting quickstart:
from moonshot import Moonshot

class DualMovingAverageStrategy(Moonshot):

    CODE = "dma-tech"
    DB = "tech-giants-1d"
    LMAVG_WINDOW = 300
    SMAVG_WINDOW = 100

    def prices_to_signals(self, prices):
        closes = prices.loc["Close"]
        lmavgs = closes.rolling(self.LMAVG_WINDOW).mean()
        smavgs = closes.rolling(self.SMAVG_WINDOW).mean()
        signals = smavgs.shift() > lmavgs.shift()
        return signals.astype(int)
To trade the strategy, the first step is to define one or more accounts (live or paper) in which you want to run the strategy, and how much of each account's capital to allocate. Account allocations should be defined in quantrocket.moonshot.allocations.yml, located in the /codeload directory in Jupyter (that is, in the top-level directory of the Jupyter file browser). Allocations should be expressed as a decimal percent of the total capital (Net Liquidation Value) of the account:
DU12345:
dma-tech: 0.75
Next, bring your history database up-to-date if you haven't already done so:
$ quantrocket history collect 'tech-giants-1d'
status: the historical data will be collected asynchronously
>>> from quantrocket.history import collect_history
>>> collect_history("tech-giants-1d")
{'status': 'the historical data will be collected asynchronously'}
$ curl -X POST 'http://houston/history/queue?codes=tech-giants-1d'
{"status": "the historical data will be collected asynchronously"}
Now you're ready to run the strategy. Running the strategy doesn't place any orders but generates a CSV of orders to be placed in a subsequent step:
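For example, using the Python API (this mirrors the semi-manual trading example later in this section):
>>> from quantrocket.moonshot import trade
>>> trade("dma-tech", filepath_or_buffer="orders.csv")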
If any orders were generated, the CSV will look something like this:
$ csvlook -I orders.csv
| ConId | Account | Action | OrderRef | TotalQuantity | Exchange | OrderType | Tif |
| --------- | ------- | ------ | -------- | ------------- | -------- | --------- | --- |
| 265598 | DU12345 | BUY | dma-tech | 501 | SMART | MKT | DAY |
| 3691937 | DU12345 | BUY | dma-tech | 58 | SMART | MKT | DAY |
| 15124833 | DU12345 | BUY | dma-tech | 284 | SMART | MKT | DAY |
| 208813719 | DU12345 | BUY | dma-tech | 86 | SMART | MKT | DAY |
If no orders were generated, there won't be a CSV. If this happens, you can re-run the strategy with the --review-date option to generate orders for an earlier date, and/or modify prices_to_signals to create some trades for the purpose of testing.
Finally, make sure IB Gateway is connected (quantrocket launchpad start) for the account you're trading, then place the orders with QuantRocket's blotter:
$ quantrocket blotter order -f orders.csv
>>> from quantrocket.blotter import place_orders
>>> place_orders(infilepath_or_buffer="orders.csv")
$ curl -X POST 'http://houston/blotter/orders' --upload-file orders.csv
Normally, you will run your live trading in an automated manner from the countdown service using the command line interface (CLI). With the CLI, you can generate and place Moonshot orders in a one-liner by piping the orders CSV to the blotter over stdin (indicated by passing - as the -f/--infile option):
$ quantrocket moonshot trade 'dma-tech' | quantrocket blotter order -f '-'
How live trading works
Live trading in Moonshot starts out just like a backtest:
- Prices are queried from your history database
- The prices DataFrame is passed to your prices_to_signals method, which returns a DataFrame of signals
- The signals DataFrame is passed to signals_to_target_weights, which returns a DataFrame of target weights
At this point, a backtest would proceed to simulate positions (target_weights_to_positions) then simulate returns (positions_to_gross_returns). In contrast, in live trading the target weights must be converted into a batch of live orders to be placed with the broker. This process happens as follows:
- First, Moonshot isolates the last row (corresponding to today) from the target weights DataFrame.
- Moonshot converts the target weights into the actual number of shares of each security to be ordered in each allocated account, taking into account the overall strategy allocation, the account balance, and any existing positions the strategy already holds.
- Moonshot provides you with a DataFrame of "order stubs" containing basic fields such as the account, action (buy or sell), order quantity, and contract ID (ConId).
- You can then customize the orders in the order_stubs_to_orders method by adding other order fields such as the order type, time in force, etc.
By default, the base class implementation of order_stubs_to_orders creates MKT DAY orders routed to SMART. The above quickstart example relies on this default behavior, but you should always override order_stubs_to_orders with your own order specifications.
From order stubs to orders
You can specify detailed order parameters in your strategy's order_stubs_to_orders method.
The order stubs DataFrame provided to this method resembles the following:
>>> print(orders)
ConId Account Action OrderRef TotalQuantity
0 12345 U12345 SELL my-strategy 100
1 12345 U55555 SELL my-strategy 50
2 23456 U12345 BUY my-strategy 100
3 23456 U55555 BUY my-strategy 50
4 34567 U12345 BUY my-strategy 200
5 34567 U55555 BUY my-strategy 100
Modify the DataFrame by appending additional columns. At minimum, you must provide the order type (OrderType), time in force (Tif), and the exchange to route the order to. The default implementation is shown below:
def order_stubs_to_orders(self, orders, prices):
orders["Exchange"] = "SMART"
orders["OrderType"] = "MKT"
orders["Tif"] = "DAY"
return orders
Moonshot isn't limited to a handful of canned order types. You can use any of the order parameters and order types supported by the IB API. Learn more about required and available order fields in the blotter documentation.
As shown in the above example, Moonshot uses your strategy code (e.g. "my-strategy") to populate the OrderRef field, a field used by the blotter for strategy-level tracking of your positions and performance.
Using prices and securities master fields in order creation
The prices DataFrame used throughout Moonshot is passed to order_stubs_to_orders, allowing you to use prices or securities master fields to create your orders. This is useful, for example, for setting limit prices, or applying different order rules for different exchanges.
The prices DataFrame covers multiple dates while the orders DataFrame represents a current snapshot. You can use the reindex_like_orders method to extract a current snapshot of data from the prices DataFrame. For example, create limit prices set to the prior close:
def order_stubs_to_orders(self, orders, prices):
closes = prices.loc["Close"]
prior_closes = closes.shift()
prior_closes = self.reindex_like_orders(prior_closes, orders)
orders["OrderType"] = "LMT"
orders["LmtPrice"] = prior_closes
...
Or, direct-route orders to their primary exchange:
def order_stubs_to_orders(self, orders, prices):
closes = prices.loc["Close"]
exchanges = prices.loc["PrimaryExchange"].reindex(closes.index, method="ffill")
exchanges = self.reindex_like_orders(exchanges, orders)
orders["Exchange"] = exchanges
...
Account allocations
Define your strategy allocations in quantrocket.moonshot.allocations.yml, a YAML file located in the /codeload directory in Jupyter (that is, in the top-level directory of the Jupyter file browser). You can run multiple strategies per account and/or multiple accounts per strategy. Allocations should be expressed as a decimal percent of the total capital (Net Liquidation Value) of the account:
DU12345:
dma-tech: 0.75
dma-etf: 0.5
U12345:
dma-tech: 1
By default, when you trade a strategy, Moonshot generates orders for all accounts which define allocations for that strategy. However, you can limit to particular accounts:
$ quantrocket moonshot trade 'dma-tech' -a 'U12345'
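The Python equivalent might look like this (assuming the trade function's accounts parameter):
>>> from quantrocket.moonshot import trade
>>> trade("dma-tech", accounts=["U12345"], filepath_or_buffer="orders.csv")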
Note that you can also run multiple strategies at a time:
$ quantrocket moonshot trade 'dma-tech' 'dma-etf'
How Moonshot calculates order quantities
The behavior outlined in this section is handled automatically by Moonshot but is provided for informational purposes.
The target weights generated by signals_to_target_weights are expressed in percentage terms (e.g. 0.1 = 10% of capital), but these weights must be converted into the actual numbers of shares, futures contracts, etc. that need to be bought or sold. Converting target weights into order quantities requires taking into account a number of factors including the strategy allocation, account NLV, exchange rates, existing positions and orders, and security price.
The conversion process is outlined below for an account with USD base currency:
Step | Source | Domestic stock example - AAPL (NASDAQ) | Foreign stock example - BP (London Stock Exchange) | Futures example - ES (GLOBEX) |
---|---|---|---|---
What is target weight? | last row (= today) of target weights DataFrame | 0.2 | 0.2 | 0.2 |
What is account allocation for strategy? | quantrocket.moonshot.allocations.yml | 0.5 | 0.5 | 0.5 |
What is target weight for account? | multiply target weights by account allocations | 0.1 (0.2 x 0.5) | 0.1 (0.2 x 0.5) | 0.1 (0.2 x 0.5) |
What is latest account NLV? | account service | $1M USD | $1M USD | $1M USD |
What is target trade value in base currency? | multiply target weight for account by account NLV | $100K USD ($1M x 0.1) | $100K USD ($1M x 0.1) | $100K USD ($1M x 0.1) |
What is exchange rate? (if trade currency differs from base currency) | account service | Not applicable | USD.GBP = 0.75 | Not applicable |
What is target trade value in trade currency? | multiply target trade value in base currency by exchange rate | $100K USD | 75K GBP ($100K USD x 0.75 USD.GBP) | $100K USD |
What is market price of security? | prices DataFrame | $185 USD | 572 pence (quoted in pence, not pounds) | $2690 USD |
What is contract multiplier? (applicable to futures and options) | securities master service | Not applicable | Not applicable | 50x |
What is price magnifier? (used when prices are quoted in fractional units, for example, pence instead of pounds) | securities master service | Not applicable | 100 (i.e. 100 pence per pound) | Not applicable |
What is contract value? | contract value = (price x multiplier / price_magnifier) | $185 USD | 57.20 GBP (572 / 100) | $134,500 USD (2,690 x 50) |
What is target quantity? | divide target trade value by contract value | 540 shares ($100K / $185) | 1311 shares (75K GBP / 57.20 GBP) | 1 contract ($100K / $134.5K) |
Any current positions held by this strategy? | blotter service | 200 shares | 0 shares | 1 contract |
Any current open orders for this strategy? | blotter service | order for 100 shares currently active | none | none |
What is the required order quantity? | subtract current positions and open orders from target quantities | 240 shares (540 - 200 - 100) | 1311 shares (1311 - 0 - 0) | 0 contracts (1 - 1 - 0) |
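For illustration only, here is the domestic stock (AAPL) column of the table expressed as plain Python arithmetic, using the hypothetical numbers from the table:
target_weight = 0.2        # last row (= today) of target weights DataFrame
allocation = 0.5           # from quantrocket.moonshot.allocations.yml
nlv = 1000000              # account NLV in base currency (USD)
price = 185                # market price of AAPL

target_trade_value = target_weight * allocation * nlv   # $100K USD
target_quantity = int(target_trade_value / price)        # 540 shares

existing_position = 200
open_order_quantity = 100
order_quantity = target_quantity - existing_position - open_order_quantity  # 240 shares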
Semi-manual vs automated trading
Since Moonshot generates a CSV of orders but doesn't actually place the orders, you can inspect the orders before placing them, if you prefer:
$ quantrocket moonshot trade 'my-strategy' -o orders.csv
$ csvlook -I orders.csv
| ConId | Account | Action | OrderRef | TotalQuantity | Exchange | OrderType | Tif |
| --------- | ------- | ------ | ----------- | ------------- | -------- | --------- | --- |
| 265598 | DU12345 | BUY | my-strategy | 501 | SMART | MKT | DAY |
| 3691937 | DU12345 | BUY | my-strategy | 58 | SMART | MKT | DAY |
| 15124833 | DU12345 | BUY | my-strategy | 284 | SMART | MKT | DAY |
| 208813719 | DU12345 | BUY | my-strategy | 86 | SMART | MKT | DAY |
If desired, you can edit the orders inside JupyterLab (right-click on filename > Open With > Editor). When ready, place the orders:
$ quantrocket blotter order -f orders.csv
For automated trading, pipe the orders CSV directly to the blotter over stdin:
$ quantrocket moonshot trade 'my-strategy' | quantrocket blotter order -f '-'
You can schedule this command to run on your countdown service. Be sure to read about collecting and using trading calendars, which enable you to run your trading command conditionally based on whether the market is open:
30 10 * * mon-fri quantrocket master isopen 'NASDAQ' && quantrocket moonshot trade 'my-strategy' | quantrocket blotter order -f '-'
In the event your strategy produces no orders, the blotter is designed to accept an empty file and simply do nothing.
End-of-day data collection and scheduling
For end of day strategies, you can use the same history database for live trading that you use for backtesting. Schedule your history database to be brought up-to-date each evening after the market closes and schedule Moonshot to run after that. Your countdown service crontab might look like this:
30 17 * * mon-fri quantrocket master isopen 'NASDAQ' --ago '5h' && quantrocket history collect 'nasdaq-eod'
0 9 * * mon-fri quantrocket master isopen 'NASDAQ' --in '1h' && quantrocket moonshot trade 'eod-strategy' | quantrocket blotter order -f '-'
Intraday real-time data collection and scheduling
For intraday strategies, there are two options for real-time data: your history database, or a real-time aggregate database.
History database as real-time feed
If your strategy trades a small number of securities or uses a large bar size, it may be suitable to use your history database as a real-time feed, updating the history database during the trading session. This approach is conceptually the simplest but historical data collection may be too slow for large universes and/or small bar sizes.
For an intraday strategy that uses 15-minute bars and enters the market at 10:00 AM based on 9:45 AM prices, you can schedule your history database to be brought current just after 9:45 AM and schedule Moonshot to run at 10:00 AM. Moonshot will generate orders based on the just-collected 9:45 AM prices.
46 9 * * mon-fri quantrocket master isopen 'ARCA' && quantrocket history collect 'arca-15min'
0 10 * * mon-fri quantrocket master isopen 'ARCA' && quantrocket moonshot trade 'intraday-strategy' | quantrocket blotter order -f '-'
In the above example, the 15-minute lag between collecting prices and placing orders mirrors the 15-minute bar size used in backtests. For smaller bar sizes, a smaller lag between data collection and order placement would be used.
The following is an example of scheduling an intraday strategy that trades throughout the day using 5-minute bars. Every 5 minutes between 8 AM and 8 PM, we collect forex data and run the strategy as soon as the data has been collected:
*/5 8-19 * * mon-fri quantrocket master isopen 'IDEALPRO' && quantrocket history collect 'fx-majors-5min' && quantrocket history wait 'fx-majors-5min' && quantrocket moonshot trade 'fx-revert' | quantrocket blotter order -f '-'
Real-time aggregate databases
If using your history database as a real-time feed is unsuitable, you should use a real-time aggregate database with a bar size equal to that of your history database.
Example 1: once-a-day equities strategy
In the first example, suppose we have backtested an Australian equities strategy using a history database of 15 minute bars called 'asx-15min'. At 15:00:00 Sydney time each day, we need to get an up-to-date quote for all ASX stocks and run Moonshot immediately afterward. To do so, we will collect real-time snapshot quotes, and aggregate them to 15-minute bars. (Even though there will only be a single quote to aggregate for each bar, aggregation is still required and ensures a uniform bar size.)
First we create the tick database and the aggregate database:
$ quantrocket realtime create-tick-db 'asx-snapshot' --universes 'asx-stk' --fields 'LastPrice'
status: successfully created tick database asx-snapshot
$ quantrocket realtime create-agg-db 'asx-snapshot-15min' --tick-db 'asx-snapshot' --bar-size '15m' --fields 'LastPrice:Close'
status: successfully created aggregate database asx-snapshot-15min from tick database asx-snapshot
>>> from quantrocket.realtime import create_tick_db, create_agg_db
>>> create_tick_db("asx-snapshot", universes="asx-stk",
fields=["LastPrice"])
{'status': 'successfully created tick database asx-snapshot'}
>>> create_agg_db("asx-snapshot-15min",
tick_db_code="asx-snapshot",
bar_size="15m",
fields={"LastPrice":["Close"]})
{'status': 'successfully created aggregate database asx-snapshot-15min from tick database asx-snapshot'}
$ curl -X PUT 'http://houston/realtime/databases/asx-snapshot?universes=asx-stk&fields=LastPrice'
{"status": "successfully created tick database asx-snapshot"}
$ curl -X PUT 'http://houston/realtime/databases/asx-snapshot/aggregates/asx-snapshot-15min?bar_size=15m&fields=LastPrice%3AClose'
{"status": "successfully created aggregate database asx-snapshot-15min from tick database asx-snapshot"}
For live trading, schedule real-time snapshots to be collected at the desired time and schedule Moonshot to run immediately afterward:
0 15 * * mon-fri quantrocket master isopen 'ASX' && quantrocket realtime collect 'asx-snapshot' --snapshot --wait && quantrocket moonshot trade 'asx-intraday-strategy' | quantrocket blotter order -f '-'
You can pull data from both your history database and your real-time aggregate database into your Moonshot strategy by specifying both databases in the DB parameter. Also specify the combined set of fields you need from each database using the DB_FIELDS parameter. In this example we need 'Close' from the history database and 'LastPriceClose' from the real-time aggregate database:
class ASXIntradayStrategy(Moonshot):
CODE = "asx-intraday-strategy"
DB = ["asx-15min", "asx-snapshot-15min"]
DB_FIELDS = ["Close", "LastPriceClose"]
Moonshot loads data using the get_prices function, which supports querying a mix of history and real-time aggregate databases.
In your Moonshot code, you might combine the two data sources as follows:
>>> history_closes = prices.loc["Close"]
>>> realtime_closes = prices.loc["LastPriceClose"]
>>> # use the real-time close where available, otherwise fall back to
>>> # the history database close
>>> combined_closes = realtime_closes.fillna(history_closes)
Example 2: continuous intraday futures strategy
In this example, we don't use a history database but rather collect real-time NYMEX futures data continuously throughout the day and run Moonshot every minute on the 1-minute aggregates.
First we create the tick database and the aggregate database:
$ quantrocket realtime create-tick-db 'nymex-fut-tick' --universes 'nymex-fut' --fields 'LastPrice' 'BidPrice' 'AskPrice'
status: successfully created tick database nymex-fut-tick
$ quantrocket realtime create-agg-db 'nymex-fut-tick-1min' --tick-db 'nymex-fut-tick' --bar-size '1m' --fields 'LastPrice:Close' 'BidPrice:Close' 'AskPrice:Close'
status: successfully created aggregate database nymex-fut-tick-1min from tick database nymex-fut-tick
>>> from quantrocket.realtime import create_tick_db, create_agg_db
>>> create_tick_db("nymex-fut-tick", universes="nymex-fut",
fields=["LastPrice","BidPrice","AskPrice"])
{'status': 'successfully created tick database nymex-fut-tick'}
>>> create_agg_db("nymex-fut-tick-1min",
tick_db_code="nymex-fut-tick",
bar_size="1m",
fields={"LastPrice":["Close"],"BidPrice":["Close"],"AskPrice":["Close"]})
{'status': 'successfully created aggregate database nymex-fut-tick-1min from tick database nymex-fut-tick'}
$ curl -X PUT 'http://houston/realtime/databases/nymex-fut-tick?universes=nymex-fut&fields=LastPrice&fields=BidPrice&fields=AskPrice'
{"status": "successfully created tick database nymex-fut-tick"}
$ curl -X PUT 'http://houston/realtime/databases/nymex-fut-tick/aggregates/nymex-fut-tick-1min?bar_size=1m&fields=LastPrice%3AClose&fields=BidPrice%3AClose&fields=AskPrice%3AClose'
{"status": "successfully created aggregate database nymex-fut-tick-1min from tick database nymex-fut-tick"}
Then, we schedule streaming market data to be collected throughout the day from 8:50 AM to 4:10 PM, and we schedule Moonshot to run every minute from 9:00 AM to 4:00 PM:
50 8 * * mon-fri quantrocket master isopen 'NYMEX' && quantrocket realtime collect 'nymex-fut-tick' --until '16:10:00 America/New_York'
* 9-15 * * mon-fri quantrocket master isopen 'NYMEX' && quantrocket moonshot trade 'nymex-futures-strategy' | quantrocket blotter order -f '-'
Since we aren't using a history database, Moonshot only needs to reference the real-time aggregate database:
class NymexFuturesStrategy(Moonshot):
CODE = "nymex-futures-strategy"
DB = "nymex-fut-tick-1min"
DB_FIELDS = ["LastPriceClose", "BidPriceClose", "AskPriceClose"]
Trade date validation
In live trading as in backtesting, a Moonshot strategy receives a DataFrame of historical prices and derives DataFrames of signals and target weights. In live trading, orders are created from the last row of the target weights DataFrame. To make sure you're not trading on stale data (for example because your history database hasn't been brought current), Moonshot validates that the target weights DataFrame is up-to-date.
Suppose our target weights DataFrame resembles the following:
>>> target_weights.tail()
AAPL(265598) AMZN(3691937)
Date
2018-05-04 0 0
2018-05-07 0.5 0
2018-05-08 0.5 0
2018-05-09 0 0
2018-05-10 0.25 0.25
By default, Moonshot looks for and extracts the row corresponding to today's date in the strategy timezone. (The strategy timezone can be set with the class attribute TIMEZONE and is otherwise inferred from the timezone of the component securities.) Thus, if running the strategy on 2018-05-10, Moonshot would extract the last row from the above DataFrame. If running the strategy on 2018-05-11 or later, Moonshot will fail with the error:
msg: expected signal date 2018-05-11 not found in target weights DataFrame, is the underlying
data up-to-date? (max date is 2018-05-10)
status: error
This default validation behavior is appropriate for intraday strategies that trade once-a-day as well as end-of-day strategies that run after the market close, in both cases ensuring that today's price history is available to the strategy. However, if your strategy doesn't run until the following morning before the market opens (for example because you need to collect fundamental data overnight), this validation behavior is too restrictive. In this case, you can set the CALENDAR attribute on the strategy to an exchange code, and that exchange's trading calendar will be used for trade date validation instead of the timezone:
class MyStrategy(Moonshot):
...
CALENDAR = "NYSE"
...
Specifying the calendar allows Moonshot to be a little smarter, as it will only enforce the data being updated through the last date the exchange was open. Thus, if the strategy runs when the exchange is open, Moonshot still expects today's date to be in the target weights DataFrame. But if the exchange is currently closed, Moonshot expects the data date to correspond to the last date the exchange was open. This allows you to run the strategy before the market open using the prior session's data, while still enforcing that the data is not older than the previous session.
Intraday trade time validation
For intraday strategies that trade throughout the day (more specifically, for strategies that produce target weights DataFrames with a 'Time' level in the index), Moonshot validates the time of the data in addition to the date. For example, if you are using 15-minute bars and running a trading strategy at 11:48 AM, trade time validation ensures that the 11:45 AM target weights are used to create orders.
Trade time validation works as follows: Moonshot consults the entire date range of your DataFrame (not just the trade date) and finds the latest time that is earlier than the current time. In the example of running the strategy at 11:48 AM using 15-minute bars, this would be the 11:45 AM bar. Moonshot then checks that your prices DataFrame contains at least some non-null data for 11:45 AM on the trade date. If not, validation fails:
msg: no 11:45:00 data found in prices DataFrame for signal date 2018-05-10,
is the underlying data up-to-date? (max time for 2018-05-10 is 11:30:00)
status: error
This ensures that the intraday strategy won't run unless your data is up-to-date.
Review orders from earlier dates
At times you may want to bypass trade date validation and generate orders for an earlier date, for testing or troubleshooting purposes. You can pass a --review-date for this purpose. For end-of-day strategies and once-a-day intraday strategies, only a date is needed:
$ quantrocket moonshot trade 'dma-tech' --review-date '2018-05-09' -o past_orders.csv
>>> from quantrocket.moonshot import trade
>>> trade("dma-tech", review_date="2018-05-09", filepath_or_buffer="past_orders.csv")
$ curl -X POST 'http://houston/moonshot/orders.csv?strategies=dma-tech&review_date=2018-05-09' > past_orders.csv
For intraday strategies that trade throughout the day, provide a date and time (you need not specify a timezone; the strategy timezone based on TIMEZONE or inferred from the component securities is assumed):
$ quantrocket moonshot trade 'fx-revert' --review-date '2018-05-09 11:45:00' -o past_intraday_orders.csv
>>> from quantrocket.moonshot import trade
>>> trade("fx-revert", review_date="2018-05-09 11:45:00", filepath_or_buffer="past_intraday_orders.csv")
$ curl -X POST 'http://houston/moonshot/orders.csv?strategies=fx-revert&review_date=2018-05-09+11%3A45%3A00' > past_intraday_orders.csv
Exiting positions
There are 3 ways to exit positions in Moonshot:
- Exit by rebalancing
- Attach exit orders
- Close positions with the blotter
Exit by rebalancing
By default, Moonshot calculates an order diff between your target positions and existing positions. This means that previously entered positions will be closed once the target position goes to 0, as Moonshot will generate the closing order needed to achieve the target position. This is a good fit for strategies that periodically rebalance.
Learn more about rebalancing.
Attach exit orders
Sometimes, instead of relying on rebalancing, it's helpful to submit exit orders at the time you submit your entry orders. For example, if your strategy enters the market intraday and exits at market close, it's easiest to submit the entry and exit orders at the same time.
This is referred to as attaching a child order, and can be used for bracket orders, hedging orders, or in this case, simply a pre-planned exit order. The attached order is submitted to IB's system but is only executed if the parent order executes.
Moonshot provides a utility method for creating attached child orders, orders_to_child_orders, which can be used like this:
def order_stubs_to_orders(self, orders, prices):
orders["Exchange"] = "SMART"
orders["OrderType"] = "MKT"
orders["Tif"] = "Day"
child_orders = self.orders_to_child_orders(orders)
child_orders.loc[:, "OrderType"] = "MOC"
orders = pd.concat([orders, child_orders])
return orders
The orders_to_child_orders method creates child orders by copying your orders DataFrame but reversing the Action (BUY/SELL), and linking the child orders to the parent orders via an OrderId column on the parent orders and a ParentId column on the child orders. Interactively, the above example would look like this:
>>> orders.head()
ConId Action TotalQuantity Exchange OrderType Tif
0 12345 BUY 200 SMART MKT Day
1 23456 BUY 400 SMART MKT Day
>>> # create child orders that reverse the Action of the parent orders
>>> child_orders = self.orders_to_child_orders(orders)
>>> # make the exit orders market-on-close
>>> child_orders.loc[:, "OrderType"] = "MOC"
>>> orders = pd.concat([orders, child_orders])
>>> orders.head()
ConId Action TotalQuantity Exchange OrderType Tif OrderId ParentId
0 12345 BUY 200 SMART MKT Day 0 NaN
1 23456 BUY 400 SMART MKT Day 1 NaN
0 12345 SELL 200 SMART MOC Day NaN 0
1 23456 SELL 400 SMART MOC Day NaN 1
Note that the OrderId and ParentId generated by Moonshot are not the actual order IDs used by the blotter. The blotter uses OrderId/ParentId (if provided) to identify linked orders but then generates the actual order IDs at the time of order submission to IB.
Close positions with the blotter
A third option for closing positions is to use the blotter to flatten all positions for a strategy. For example, if your strategy enters positions in the morning and exits on the close, you could design the strategy to create the entry orders only, then schedule a command in the afternoon to flatten the positions:
0 10 * * mon-fri quantrocket master isopen 'TSE' && quantrocket moonshot trade 'canada-intraday' | quantrocket blotter order -f '-'
0 15 * * mon-fri quantrocket blotter close --order-refs 'canada-intraday' --params 'OrderType:MOC' 'Tif:Day' 'Exchange:TSE' | quantrocket blotter order -f '-'
This approach works best in scenarios where you want to flatten all positions in between each successive run of the strategy. Such scenarios can also be handled by attaching exit orders.
Learn more about closing positions with the blotter.
Tick sizes
Price rounding
When placing limit orders, stop orders, or other orders that specify price levels, it is necessary to ensure that the price you submit to IB adheres to the security's tick size rules (also called minimum price increments in IB parlance). This refers to the minimum difference between price levels at which a security can trade.
Some securities have constant price increments at all price levels. For example, most US stocks trade in penny increments. Other securities have different minimum increments on different exchanges on which they trade and/or different minimum increments at different price levels. For example, these are the tick size rules for orders for MITSUBISHI CORP direct-routed to the Tokyo Stock Exchange:
If price is between... | Tick size is... |
---|---
0 - 1,000 | 0.1 |
1,000 - 3,000 | 0.5 |
3,000 - 10,000 | 1 |
10,000 - 30,000 | 5 |
30,000 - 100,000 | 10 |
100,000 - 300,000 | 50 |
300,000 - 1,000,000 | 100 |
1,000,000 - 3,000,000 | 500 |
3,000,000 - 10,000,000 | 1,000 |
10,000,000 - 30,000,000 | 5,000 |
30,000,000 - | 10,000 |
In contrast, SMART-routed orders for Mitsubishi must adhere to a different, simpler set of tick size rules:
If price is between... | Tick size is... |
---|---
0 - 5,000 | 0.1 |
5,000 - 100,000 | 1 |
100,000 - | 10 |
Luckily you don't need to keep track of tick size rules as they are stored in the securities master database. You can create your Moonshot orders CSV with unrounded prices then pass the CSV to the master service for price rounding. For example, consider two limit orders for Mitsubishi, one SMART-routed and one direct-routed to TSEJ, with unrounded limit prices of 15203.1135 JPY:
$ csvlook -I orders.csv
| ConId | Account | Action | OrderRef | TotalQuantity | Exchange | OrderType | LmtPrice | Tif |
| -------- | ------- | ------ | -------------- | ------------- | -------- | --------- | ---------- | --- |
| 13905888 | DU12345 | BUY | japan-strategy | 1000 | SMART | LMT | 15203.1135 | DAY |
| 13905888 | DU12345 | BUY | japan-strategy | 1000 | TSEJ | LMT | 15203.1135 | DAY |
If you pass this CSV to the master service and tell it which columns to round, it will round the prices in those columns based on the tick size rules for that ConId and Exchange:
$ quantrocket master ticksize -f orders.csv --round 'LmtPrice' -o rounded_orders.csv
>>> from quantrocket.master import round_to_tick_sizes
>>> round_to_tick_sizes("orders.csv", round_fields=["LmtPrice"], outfilepath_or_buffer="rounded_orders.csv")
$ curl -X GET 'http://houston/master/ticksizes.csv?round_fields=LmtPrice' --upload-file orders.csv > rounded_orders.csv
The SMART-routed order is rounded to the nearest Yen while the TSEJ-routed order is rounded to the nearest 5 Yen, as per the tick size rules. Other columns are returned unchanged:
$ csvlook -I rounded_orders.csv
| ConId | Account | Action | OrderRef | TotalQuantity | Exchange | OrderType | LmtPrice | Tif |
| -------- | ------- | ------ | -------------- | ------------- | -------- | --------- | -------- | --- |
| 13905888 | DU12345 | BUY | japan-strategy | 1000 | SMART | LMT | 15203.0 | DAY |
| 13905888 | DU12345 | BUY | japan-strategy | 1000 | TSEJ | LMT | 15205.0 | DAY |
The ticksize command accepts file input over stdin, so you can pipe your moonshot orders directly to the master service for rounding, then pipe the rounded orders to the blotter for submission:
$ quantrocket moonshot trade 'my-japan-strategy' | quantrocket master ticksize -f '-' --round 'LmtPrice' | quantrocket blotter order -f '-'
In the event your strategy produces no orders, the ticksize command, like the blotter, is designed to accept an empty file and simply do nothing.
If you need the actual tick sizes and not just the rounded prices, you can instruct the ticksize endpoint to include the tick sizes in the resulting file:
$ quantrocket master ticksize -f orders.csv --round 'LmtPrice' --append-ticksize -o rounded_orders.csv
>>> from quantrocket.master import round_to_tick_sizes
>>> round_to_tick_sizes("orders.csv", round_fields=["LmtPrice"], append_ticksize=True, outfilepath_or_buffer="rounded_orders.csv")
$ curl -X GET 'http://houston/master/ticksizes.csv?round_fields=LmtPrice&append_ticksize=true' --upload-file orders.csv > rounded_orders.csv
A new column with the tick sizes will be appended, in this case called "LmtPriceTickSize":
$ csvlook -I rounded_orders.csv
| ConId | Account | Action | OrderRef | TotalQuantity | Exchange | OrderType | LmtPrice | Tif | LmtPriceTickSize |
| -------- | ------- | ------ | -------------- | ------------- | -------- | --------- | -------- | --- | ---------------- |
| 13905888 | DU12345 | BUY | japan-strategy | 1000 | SMART | LMT | 15203.0 | DAY | 1.0 |
| 13905888 | DU12345 | BUY | japan-strategy | 1000 | TSEJ | LMT | 15205.0 | DAY | 5.0 |
Tick sizes can be used for submitting orders that require price offsets such as Relative/Pegged-to-Primary orders.
Note that for securities with constant price increments, for example US stocks that trade in penny increments, you also have the option of simply rounding the prices in your strategy code using Pandas' round():
def order_stubs_to_orders(self, orders, prices):
...
orders["OrderType"] = "LMT"
limit_prices = prior_closes * 1.02
orders["LmtPrice"] = limit_prices.round(2)
...
Price offsets
Some orders, such as Relative/Pegged-to-Primary orders, require defining an offset amount using the AuxPrice field. In the case of Relative orders, which move dynamically with the market, the offset amount defines how much more aggressive than the NBBO the order should be.
In some cases, it may suffice to hard-code an offset amount, e.g. $0.01:
def order_stubs_to_orders(self, orders, prices):
orders["Exchange"] = "SMART"
orders["OrderType"] = "REL"
orders["AuxPrice"] = 0.01
...
However, as the offset must conform to the security's tick size rules, for some exchanges it's necessary to look up the tick size and use that to define the offset:
import pandas as pd
import io
from quantrocket.master import round_to_tick_sizes
...
def order_stubs_to_orders(self, orders, prices):
orders["Exchange"] = "SMART"
orders["OrderType"] = "REL"
prior_closes = prices.loc["Close"].shift()
prior_closes = self.reindex_like_orders(prior_closes, orders)
orders["PriorClose"] = prior_closes
infile = io.StringIO()
outfile = io.StringIO()
orders.to_csv(infile, index=False)
# rewind the in-memory CSV before passing it to the master service
infile.seek(0)
round_to_tick_sizes(infile, round_fields=["PriorClose"], append_ticksize=True, outfilepath_or_buffer=outfile)
# rewind the output buffer before reading it back into pandas
outfile.seek(0)
tick_sizes = pd.read_csv(outfile).PriorCloseTickSize
# use an offset of 2 tick sizes
orders["AuxPrice"] = tick_sizes * 2
orders.drop("PriorClose", axis=1, inplace=True)
...
Paper trading
There are several options for testing your trades before you run your strategy on a live account. You can log the trades to flightlog, you can inspect the orders before placing them, and you can trade against your IB paper account.
Log trades to flightlog
After researching and backtesting a strategy in aggregate it's often nice to carefully inspect a handful of actual trades before committing real money. A good option is to start running the strategy but log the trades to flightlog instead of sending them to the blotter:
0 9 * * mon-fri quantrocket master isopen 'NYSE' --in 1h && quantrocket moonshot trade 'mean-reverter' | quantrocket flightlog log --name 'mean-reverter'
Then manually inspect the trades to see if you're happy with them.
Semi-manual trading
Another option which works well for end-of-day strategies is to generate the Moonshot orders, inspect the CSV file, then manually place the orders if you're happy. See the section on semi-manual trading.
IB Paper trading
You can also paper trade the strategy using your IB paper trading account. To do so, allocate the strategy to your paper account in quantrocket.moonshot.allocations.yml:
DU12345:
mystrategy: 0.5
Then add the appropriate command to your countdown crontab, just as you would for a live account.
IB Paper trading limitations
IB paper trading accounts provide a useful way to dry-run your strategy, but it's important to note that IB's paper trading environment is not a full-scale simulation. For example, IB doesn't attempt to simulate certain order types such as on-the-open and on-the-close orders; such orders are accepted by the system but never filled. You may need to work around this limitation by modifying your orders for live vs paper accounts.
Paper trading is primarily useful for validating that your strategy is generating the orders you expect. It's less helpful for seeing what those orders do in the market or performing out-of-sample testing. For that, consider a small allocation to a live account.
See IB's website for a list of paper trading limitations .
Different orders for live vs paper accounts
As some order types aren't supported in IB paper accounts, you can specify different orders for paper vs live accounts:
def order_stubs_to_orders(self, orders, prices):
orders["OrderType"] = "MKT"
orders["Tif"] = "OPG"
orders.loc[orders.Account.str.startswith("D"), "Tif"] = "DAY"
...
Rebalancing
Periodic rebalancing
A Moonshot strategy's prices_to_signals logic will typically calculate signals for each day in the prices DataFrame. However, for many factor model or cross-sectional strategies, you may not wish to rebalance that frequently. For example, suppose our strategy logic ranks stocks every day by momentum and buys the top 10%:
>>> # calculate trailing 12-month (~252 trading day) returns
>>> returns = closes/closes.shift(252) - 1
>>> # rank the cross-section of stocks from highest to lowest return
>>> ranks = returns.rank(axis=1, ascending=False, pct=True)
>>> # buy the top 10%
>>> signals = (ranks <= 0.1).astype(int)
>>> signals.head()
ConId 123456 234567 ...
Date
2018-05-31 1 0
2018-06-01 0 1
2018-06-02 0 0
2018-06-03 1 0
...
2018-06-30 0 1
2018-07-01 0 1
2018-07-02 1 0
As implemented above, the strategy will trade in and out of positions daily. Instead, we can limit the strategy to monthly rebalancing:
>>> # to limit the strategy to monthly rebalancing, take the last signal
>>> # of each month, then forward-fill it across the daily index so the
>>> # resulting positions are held until the next month-end rebalance
>>> signals = signals.resample("M").last()
>>> signals = signals.reindex(closes.index, method="ffill")
>>> signals.head()
ConId 123456 234567 ...
Date
2018-05-31 1 0
2018-06-01 1 0
2018-06-02 1 0
2018-06-03 1 0
...
2018-06-30 0 1
2018-07-01 0 1
2018-07-02 0 1
Then, in live trading, to mirror the resampling logic, schedule the strategy to run only on the first trading day of the month:
0 9 * * mon-fri quantrocket master isclosed 'NASDAQ' --since 'M' && quantrocket master isopen 'NASDAQ' --in '1h' && quantrocket moonshot trade 'nasdaq-momentum' | quantrocket blotter order -f '-'
Disabling rebalancing
By default, Moonshot generates orders as needed to achieve your target weights, after taking account of your existing positions. This design is well-suited for strategies that periodically rebalance positions. However, in live trading, this behavior can be suboptimal for strategies that hold multi-day positions which are not intended to be rebalanced. You may wish to disable rebalancing for such strategies.
For example, suppose your strategy calls for holding a 5% position of AAPL for a period of several days. When you enter the position, your account balance is $1M USD and the price of AAPL is $100, so you buy 500 shares ($1M X 0.05 / $100). A day later, your account balance is $1.02M, while the price of AAPL is $97, so Moonshot calculates your target position as 526 shares ($1.02M X 0.05 / $97) and creates an order to buy 26 shares (526 - 500). The following day, your account balance is unchanged at $1.02M but the price of AAPL is $98.50, resulting in a target position of 518 shares and a net order to sell 8 shares (518 - 526). Day-to-day changes in the share price and/or your account balance result in small buy or sell orders for the duration of the position.
These small rebalancing orders are problematic because they incur slippage and commissions which are not reflected in a backtest. In a backtest, the position is maintained at a constant weight of 5% so there are no day-to-day transaction costs. Thus, the daily rebalancing orders will introduce hidden costs into live performance compared to backtested performance.
You can disable rebalancing for a strategy using the ALLOW_REBALANCE parameter:
class MultiDayStrategy(Moonshot):
...
ALLOW_REBALANCE = False
When ALLOW_REBALANCE is set to False, Moonshot will not create orders to rebalance a position which is already on the correct side (long or short). Moonshot will still create orders as needed to open a new position, close an existing position, or change sides (long to short or short to long). When ALLOW_REBALANCE is True (the default), Moonshot creates orders as needed to achieve the target weight.
You can also use a decimal value with ALLOW_REBALANCE to allow rebalancing only when the target position is sufficiently different from the existing position size. For example, don't rebalance unless the position size will change by at least 25%:
class MultiDayStrategy(Moonshot):
...
ALLOW_REBALANCE = 0.25
In this example, if the target position size is 600 shares and the current position size is 500 shares, the rebalancing order will be suppressed because 100/500 < 0.25. If the target position is 300 shares, the rebalancing order will be allowed because 200/500 > 0.25.
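For illustration, the threshold comparison from this example as plain arithmetic:
current_position = 500
threshold = 0.25
# target of 600 shares: change of 100/500 = 0.2, below the threshold, so suppressed
abs(600 - current_position) / abs(current_position) >= threshold   # False
# target of 300 shares: change of 200/500 = 0.4, above the threshold, so allowed
abs(300 - current_position) / abs(current_position) >= threshold   # True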
By disabling rebalancing, your commissions and slippage will mirror your backtest. However, your live position weights will fluctuate and differ somewhat from the constant weights of your backtest, and as a result your live returns will not match your backtest returns exactly. This is often a good trade-off because the discrepancy in position weights (and thus returns) is usually two-sided (i.e. sometimes in your favor, sometimes not) and thus roughly nets out, while the added transaction costs of daily rebalancing are a one-sided drag that degrades live performance.
Algorithmic orders
IB provides various algorithmic order types which can be helpful for working large orders into the market. In fact, if you submit a market order that is too big based on the security's liquidity, IB might reject the order with this message:
quantrocket.blotter: WARNING ibg2 client 6001 got IB error code 202: Order Canceled - reason:In accordance with our regulatory obligations, we have rejected this order because it is too large compared to the liquidity that is generally available for this product. If you would like to submit an order of this size, please submit an algorithmic order (such as VWAP, TWAP, or Percent of Volume)
IB historical data for the default TRADES bar type includes a Wap field, which is defined by IB as "the VWAP over trade data filtered for some trade types (combos, derivatives, odd lots, block trades)".
>>> prices = get_prices("usa-stk-1d", fields=["Wap"])
>>> vwaps = prices.loc["Wap"]
This makes it possible to use the Wap field to calculate returns in your backtest, then use IB's "Vwap" order algo in live trading (or a similar order algo) to mirror your backtest.
VWAP for end-of-day strategies
For an end-of-day strategy, the relevant example code for a backtest is shown below:
class UpMinusDown(Moonshot):
...
DB_FIELDS = ["Wap", "Volume", "Close"]
...
def positions_to_gross_returns(self, positions, prices):
vwaps = prices.loc["Wap"]
gross_returns = vwaps.pct_change() * positions.shift()
return gross_returns
Here, we are modeling our orders being filled at the next day's VWAP. Then, for live trading, create orders using IB's VWAP algo:
class UpMinusDown(Moonshot):
...
def order_stubs_to_orders(self, orders, prices):
orders["OrderType"] = "MKT"
orders["AlgoStrategy"] = "Vwap"
orders["Tif"] = "DAY"
orders["Exchange"] = "SMART"
return orders
If placed before the market open, IB will seek to fill this order over the course of the day at the day's VWAP, thus mirroring our backtest.
VWAP for intraday strategies
VWAP orders can also be modeled and used on an intraday timeframe. For example, suppose we are using 30-minute bars and want to enter and exit positions gradually between 3:00 and 3:30 PM. In backtesting, we can use the 15:00:00 Wap:
class IntradayStrategy(Moonshot):
...
DB_FIELDS = ["Wap", "Volume", "Close"]
...
def positions_to_gross_returns(self, positions, prices):
vwaps = prices.loc["Wap"].xs("15:00:00", level="Time")
gross_returns = vwaps.pct_change() * positions.shift()
return gross_returns
Then, for live trading, run the strategy at 15:00:00 and instruct IB to finish the VWAP orders by 15:30:00:
class IntradayStrategy(Moonshot):
...
def order_stubs_to_orders(self, orders, prices):
orders["OrderType"] = "MKT"
orders["AlgoStrategy"] = "Vwap"
now = pd.Timestamp.now("America/New_York")
end_time = now.replace(hour=15, minute=30, second=0)
end_time_str = end_time.astimezone("UTC").strftime("%Y%m%d %H:%M:%S GMT")
orders["AlgoParams_endTime"] = end_time_str
orders["AlgoParams_allowPastEndTime"] = 1
orders["Tif"] = "DAY"
orders["Exchange"] = "SMART"
return orders
Algo parameters
In the IB API, algorithmic orders are specified by the AlgoStrategy field, with additional algo parameters specified in the AlgoParams field (algo parameters are optional or required depending on the algo). The AlgoParams field is a nested field which expects a list of multiple algo-specific parameters; since the orders CSV (and the DataFrame it derives from) is a flat-file format, these nested parameters can be specified using underscore separators, e.g. AlgoParams_maxPctVol:
def order_stubs_to_orders(self, orders, prices):
orders["AlgoStrategy"] = "Vwap"
orders["AlgoParams_maxPctVol"] = 0.1
orders["AlgoParams_noTakeLiq"] = 1
...
Moonshot snippets
These snippets are meant to be useful and suggestive as starting points, but they may require varying degrees of modification to conform to the particulars of your strategy.
Multi-day holding periods
One way to implement multi-day holding periods is to forward-fill signals with a limit:
def signals_to_target_weights(self, signals, prices):
weights = self.allocate_fixed_weights(signals, 0.05)
weights = weights.where(weights!=0).fillna(method="ffill", limit=2)
weights.fillna(0, inplace=True)
return weights
Limit orders
To use limit orders in a backtest, you can model whether they get filled in target_weights_to_positions. For example, suppose we generate signals after the close and place orders to enter on the open the following day using limit orders set 1% above the prior close for BUYs and 1% below the prior close for SELLs:
def target_weights_to_positions(self, weights, prices):
positions = weights.shift()
prior_closes = prices.loc["Close"].shift()
buy_limit_prices = prior_closes * 1.01
sell_limit_prices = prior_closes * 0.99
opens = prices.loc["Open"]
buy_orders = positions > 0
sell_orders = positions < 0
opens_below_buy_limit = opens < buy_limit_prices
opens_above_sell_limit = opens > sell_limit_prices
gets_filled = (buy_orders & opens_below_buy_limit) | (sell_orders & opens_above_sell_limit)
positions = positions.where(gets_filled, 0)
return positions
For live trading, create the corresponding order parameters in order_stubs_to_orders:
def order_stubs_to_orders(self, orders, prices):
prior_closes = prices.loc["Close"].shift()
prior_closes = self.reindex_like_orders(prior_closes, orders)
buy_limit_prices = prior_closes * 1.01
sell_limit_prices = prior_closes * 0.99
buy_orders = orders.Action == "BUY"
sell_orders = ~buy_orders
orders["LmtPrice"] = None
orders.loc[buy_orders, "LmtPrice"] = buy_limit_prices.loc[buy_orders]
orders.loc[sell_orders, "LmtPrice"] = sell_limit_prices.loc[sell_orders]
...
GoodAfterTime orders
Place market orders that won't become active until 3:55 PM:
def order_stubs_to_orders(self, orders, prices):
now = pd.Timestamp.now(self.TIMEZONE)
good_after_time = now.replace(hour=15, minute=55, second=0)
good_after_time_str = good_after_time.astimezone("UTC").strftime("%Y%m%d %H:%M:%S GMT")
orders["GoodAfterTime"] = good_after_time_str
...
Early close
For intraday strategies that use the session close bar for rolling calculations, early close days can interfere with the rolling calculations by introducing NaNs. Below, with 15-minute data, we calculate a 50-day moving average using the early close bar whenever the regular close bar is missing:
session_closes = prices.loc["Close"].xs("15:45:00", level="Time")
early_close_session_closes = prices.loc["Close"].xs("12:45:00", level="Time")
session_closes.fillna(early_close_session_closes, inplace=True)
mavgs = session_closes.rolling(window=50).mean()
The scheduling section contains examples of scheduling live trading around early close days.
Moonshot cache
Moonshot implements DataFrame caching to improve performance.
When you run a Moonshot backtest, historical price data is retrieved from the database and loaded into Pandas, and the resulting DataFrame is cached to disk. If you run another backtest without changing any parameters that affect the historical data query (including start and end date, universes and conids, and database fields and times), the cached DataFrame is used without hitting the database, resulting in a faster runtime. Caching is particularly useful for parameter scans, which run repeated backtests using the same data.
No caching is used for live trading.
Bypass the cache
Moonshot tries to be intelligent about when the cache should not be used. For example, if you run a backtest with no end date (indicating you want up-to-date history from your database), Moonshot will bypass the cache if the database was recently modified (indicating there might be new data available). However, there are certain cases where you might need to manually bypass the Moonshot cache:
- if your strategy uses the UNIVERSES or EXCLUDE_UNIVERSES parameters, and you change the constituents of the universe, then run another backtest, Moonshot will re-use the cached DataFrame, not realizing that the underlying universe constituents have changed.
- if you run a backtest that specifies an end date, Moonshot will try to use the cache, even if the underlying history database has changed for whatever reason.
You can manually bypass the cache using the --no-cache/no_cache option:
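For example, a minimal sketch (assuming the backtest function accepts a no_cache argument corresponding to the option named above):
>>> from quantrocket.moonshot import backtest
>>> backtest("dma-tech", no_cache=True, filepath_or_buffer="dma_tech_results.csv")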
A similar parameter is available for parameter scans and machine learning walk-forward optimizations.
Machine Learning
Machine learning in QuantRocket utilizes Moonshot and this section assumes basic familiarity with Moonshot.
QuantRocket supports backtesting and live trading of machine learning strategies using Moonshot. Key features include:
- Walk-forward optimization: Support for rolling and expanding walk-forward optimization, widely considered the best technique for validating machine learning models in finance.
- Incremental/out-of-core learning: Train models and run backtests even when your data is too large to fit in memory.
- Multiple machine learning/deep learning packages: Support for multiple Python machine learning packages including scikit-learn, Keras + TensorFlow, and XGBoost.
The basic workflow of a machine learning strategy is as follows:
- use prices, fundamentals, or other data to create features and targets for your model (features are the predictors, for example past returns, and targets are what you want to predict, for example future returns)
- choose and customize a machine learning model (or rely on QuantRocket's default model)
- train the model with your features and targets
- use the model's predictions to generate trading signals
MoonshotML
Below is a simple machine learning strategy which asks the model to predict next-day returns based on prior 1- and 2-day returns, then uses the model's predictions to generate signals:
from moonshot import MoonshotML
class DemoMLStrategy(MoonshotML):
CODE = "demo-ml"
DB = "demo-stk-1d"
def prices_to_features(self, prices):
closes = prices.loc["Close"]
features = {}
features["returns_1d"]= closes.pct_change()
features["returns_2d"] = (closes - closes.shift(2)) / closes.shift(2)
targets = closes.pct_change().shift(-1)
return features, targets
def predictions_to_signals(self, predictions, prices):
signals = predictions > 0
return signals.astype(int)
Machine learning strategies inherit from MoonshotML instead of Moonshot. Instead of defining a prices_to_signals method as with a standard Moonshot strategy, a machine learning strategy should define two methods for generating signals: prices_to_features and predictions_to_signals.
Prices to features
The prices_to_features method takes a DataFrame of prices and should return a tuple of features and targets that will be used to train the machine learning model.
The features should be a dict or list of DataFrames, where each DataFrame is a single feature. You can provide as many features as you want. If using a dict, assign each feature to a unique key in the dict (the specific names of the dict keys are not used and don't matter).
features = {}
features["returns_1d"]= closes.pct_change()
features["returns_2d"] = (closes - closes.shift(2)) / closes.shift(2)
Alternatively features can be a list of DataFrames:
features = []
features.append(closes.pct_change())
features.append((closes - closes.shift(2)) / closes.shift(2))
The targets (what you want to predict) should be a DataFrame with an index matching that of the individual features DataFrames. The targets are only consulted by QuantRocket during the training segments of walk-forward optimization, in order to train the model. They are ignored during the backtesting segments of walk-forward optimization (as well as in live trading), when the model is used for prediction rather than training.
If using a regression model (which includes the default model), the targets should be a continuous variable such as returns. If using a classification model, the targets should represent two or more discrete classes (for example 1 and 0 for buy and don't-buy).
You can predict any variable you want; you need not predict returns.
Predictions to signals
In a backtest or live trading, the features (but not targets) from your prices_to_features method are fed to the machine learning model to generate predictions. These predictions are in turn fed to your predictions_to_signals method, which should use them (in conjunction with any other logic you wish to apply) to generate a DataFrame of signals. In the simple example below, we generate long signals when the predicted return is positive.
def predictions_to_signals(self, predictions, prices):
signals = predictions > 0
return signals.astype(int)
After you've generated signals, a MoonshotML strategy is identical to a standard Moonshot strategy. You can define the standard Moonshot methods including signals_to_target_weights, target_weights_to_positions, and positions_to_gross_returns.
Single-security vs multi-security predictions
You can use different conventions for your features and targets, depending on how many things you are trying to predict.
The above examples demonstrate the use of DataFrames for the features and targets. This convention is suitable when you are making predictions about each security in the prices DataFrame. In the example, the model trains on the past returns of all securities and predicts the future returns of all securities.
When you create multiple DataFrames of features, QuantRocket prepares the DataFrames for the machine learning model by stacking each DataFrame into a single column and concatenating the columns into a single 2d numpy array of features, where each column is a feature.
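As a conceptual illustration only (not QuantRocket's actual internals), the stacking might look like this, where features is the list of DataFrames returned by prices_to_features and every DataFrame shares the same index and columns:
import numpy as np

# stack each feature DataFrame into a single column with one row per
# date/security combination, then place the stacked columns side by side
stacked = [feature.stack(dropna=False) for feature in features]
X = np.column_stack(stacked)
# X has shape (num_dates * num_securities, num_features)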
Alternatively, you might have multiple instruments in your prices DataFrame but only wish to make predictions about one of them. This can be accomplished by using Series for the features and targets instead of DataFrames. In the following example, we want to predict the future return of the S&P 500 index using its past return and the level of the VIX:
SPX = 416904
VIX = 13455763
def prices_to_features(self, prices):
closes = prices.loc["Close"]
spx_closes = closes[SPX]
vix_closes = closes[VIX]
features = {}
features["spx_returns_1d"]= spx_closes.pct_change()
features["vix_above_20"] = (vix_closes > 20).astype(int)
targets = spx_closes.pct_change().shift(-1)
return features, targets
Since the features and targets are Series, the model's predictions that are fed back to predictions_to_signals will also be a Series, which we can use to generate our SPX signals:
def predictions_to_signals(self, predictions, prices):
closes = prices.loc["Close"]
signals = pd.DataFrame(False, index=closes.index, columns=closes.columns)
signals.loc[:, SPX] = predictions > 0
return signals.astype(int)
Predict probabilities
By default, Moonshot always calls the predict method on your model to generate predictions. Some scikit-learn classifiers provide an additional predict_proba method, which predicts the probability that a sample belongs to each class. To use predict_proba, you can monkey patch the model in prices_to_features:
def prices_to_features(self, prices):
if self.model:
self.model.predict = self.model.predict_proba
...
The targets you define in prices_to_features
must be 0s and 1s (for example by casting a boolean DataFrame to integers). The predictions returned to predictions_to_signals
represent the probabilities that the samples belong to class label 1 (that is, True). An example is shown below:
def prices_to_features(self, prices):
...
are_hot_stocks = next_day_returns > 0.04
targets = are_hot_stocks.astype(int)
return features, targets
def predictions_to_signals(self, predictions, prices):
likely_hot_stocks = predictions > 0.70
long_signals = likely_hot_stocks.astype(int)
return long_signals
Walk-forward backtesting
With the MoonshotML strategy code in place, we are ready to run a walk-forward optimization:
>>> from quantrocket.moonshot import ml_walkforward
>>> ml_walkforward("demo-ml",
start_date="2006-01-01", end_date="2012-12-31",
train="Y", min_train="4Y",
filepath_or_buffer="demo_ml*")
In a walk-forward optimization, the data is split into segments. The model is trained on the first segment of data then tested on the second segment, then trained again with the second segment and tested on the third segment, and so on. In the above example, we retrain the model annually (train="Y") and require 4 years of initial training (min_train="4Y") before performing any backtesting. (Training intervals should be specified as Pandas offset aliases.) The above parameters result in the following sequence of training and testing:
segment | period |
---|---|
train | 2006-2009 |
test | 2010 |
train | 2010 |
test | 2011 |
train | 2011 |
test | 2012 |
train | 2012 |
During each training segment, the features and targets for the training dates are collected from your MoonshotML strategy and used to train the model. During each testing segment, the features for the testing dates are collected from your MoonshotML strategy and used to make predictions, which are fed back to your strategy's predictions_to_signals
method.
Walk-forward results
The walk-forward optimization returns a Zip file containing the backtest results CSV (which is a concatenation of backtest results for each individual test period) and the trained model. As a convenience, you can use an asterisk in the output filename as in the above example (filepath_or_buffer="demo_ml*") to instruct the QuantRocket client to automatically extract the files from the Zip file, saving them in this example to "demo_ml_results.csv" and "demo_ml_trained_model.joblib".
The backtest results CSV is a standard Moonshot CSV which can be used to generate a Moonchart tear sheet:
>>> from moonchart import Tearsheet
>>> Tearsheet.from_moonshot_csv("demo_ml_results.csv")
The model file is a pickle (serialization) of the now-trained machine learning model that was used in the walk-forward optimization. (In this example we did not specify a custom model so the default model was used.) The trained model can be loaded into Python using joblib:
>>> import joblib
>>> trained_model = joblib.load("demo_ml_trained_model.joblib")
>>> print(trained_model.coef_)
Joblib is a package which, among other features, provides a replacement for Python's standard pickle library that is optimized for serializing objects containing large numpy arrays, as is the case for some trained machine learning models.
If you like the backtest results, make sure to save the trained model so you can use it later for live trading.
Rolling vs expanding windows
QuantRocket supports rolling or expanding walk-forward optimizations.
With an expanding window (the default), the training start date remains fixed to the beginning of the simulation and consequently the size of the training window expands over time. In contrast, with a rolling window, the model is trained using a rolling window of data that moves forward over time and remains constant in size. For example, assuming a model with 3 years initial training and retrained annually, the following table depicts the difference between expanding and rolling windows:
iteration | training period (expanding) | training period (rolling 3-yr) |
---|---|---|
1 | 2006-2009 | 2006-2009 |
2 | 2006-2010 | 2007-2010 |
3 | 2006-2011 | 2008-2011 |
Thus, a rolling walk-forward optimization trains the model using recent data only, whereas an expanding walk-forward optimization trains the model using all available data since the start of the simulation.
To run a rolling optimization, specify the rolling window size using the rolling_train parameter:
>>> ml_walkforward("demo-ml",
start_date="2006-01-01", end_date="2012-12-31",
train="Y", rolling_train="4Y",
force_nonincremental=True,
filepath_or_buffer="demo_ml*")
Note the distinction between train and rolling_train: the model will be re-trained at intervals of size train using data windows of size rolling_train.
If using the default model or another model that supports incremental learning, you must also specify force_nonincremental=True, as rolling optimizations cannot be run incrementally. See the incremental learning section to learn more.
Progress indicator
For long-running walk-forward optimizations, you can specify progress=True, which will instruct QuantRocket to log the ongoing progress of the walk-forward optimization to flightlog at each iteration, showing which segments are completed as well as the Sharpe ratio of each test segment:
[demo-ml] Walk-forward analysis progress
train test progress
start end start end status Sharpe
iteration
0 2005-12-31 2009-12-30 2009-12-31 2010-12-30 ✓ 0.94
1 2009-12-31 2010-12-30 2010-12-31 2011-12-30 ✓ -0.11
2 2010-12-31 2011-12-30 2011-12-31 2012-12-31 -
...
8 2017-12-31 2018-12-31 NaN NaN
Note that the logged progress indicator will include timestamps and service names like any other log line and as a result may not fit nicely in your Terminal window. You can use the Unix cut utility to trim the log lines and produce the cleaner output shown above:
$ # stream the logs, trimming the leading fields (timestamp and service name)
$ quantrocket flightlog stream | cut -d ' ' -f 5-
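For reference, the earlier walk-forward call could be run with progress logging enabled like this:
>>> ml_walkforward("demo-ml",
                   start_date="2006-01-01", end_date="2012-12-31",
                   train="Y", min_train="4Y",
                   progress=True,
                   filepath_or_buffer="demo_ml*")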
Model customization
From the numerous machine learning algorithms that are available, QuantRocket provides a sensible default but also allows you to choose and customize your own.
To customize the model and/or its hyper-parameters, instantiate the model as desired, serialize it to disk, and pass the serialized model to the walk-forward optimization.
>>> from sklearn.tree import DecisionTreeRegressor
>>> import joblib
>>> regr = DecisionTreeRegressor()
>>> joblib.dump(regr, "tree_model.joblib")
>>> from quantrocket.moonshot import ml_walkforward
>>> ml_walkforward("demo-ml",
start_date="2006-01-01", end_date="2012-12-31",
train="Y",
model_filepath="tree_model.joblib",
filepath_or_buffer="demo_ml_decision_tree*")
Default model
If you don't specify a model, the model used is scikit-learn's SGDRegressor, which provides linear regression with Stochastic Gradient Descent. Because SGD is sensitive to feature scaling, the default model first runs the features through scikit-learn's StandardScaler, using a scikit-learn Pipeline to combine the two steps. Using the default model is equivalent to creating the model shown below:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
model = Pipeline([("scaler", StandardScaler()),
("estimator", SGDRegressor())])
SGDRegressor is used as the default model in part because it supports incremental learning and thus is suitable for larger-than-memory datasets.
Scikit-learn
Scikit-learn is perhaps the most commonly used machine learning library for Python. It provides a variety of off-the-shelf machine learning algorithms and boasts a user guide that is excellent not only as an API reference but as an introduction to many machine learning concepts. Depending on your needs, your model can be a single estimator:
>>> from sklearn.tree import DecisionTreeRegressor
>>> import joblib
>>> regr = DecisionTreeRegressor(max_depth=2)
>>> joblib.dump(regr, "tree_model.joblib")
Or a multi-step pipeline:
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import IncrementalPCA
>>> from sklearn.linear_model import SGDRegressor
>>> from sklearn.preprocessing import StandardScaler
>>> import joblib
>>> model = Pipeline([("scaler", StandardScaler()),
("pca", IncrementalPCA(n_components=3))
("estimator", SGDRegressor())])
>>> joblib.dump(model, "pipeline.joblib")
Keras + TensorFlow
Keras is a neural networks/deep learning library for Python which runs on top of TensorFlow. To use Keras with your machine learning strategy, build, compile, and save your model to disk. Use Keras's save method to serialize the model to disk, rather than joblib. Make sure your model filename ends with .keras.h5, as this provides a hint to the walk-forward optimization that the serialized model should be opened as a Keras model.
>>> from keras.models import Sequential
>>> from keras.layers import Dense
>>> model = Sequential()
>>> # a minimal network: a single Dense output neuron with 2 input features
>>> model.add(Dense(1, input_dim=2))
>>> model.compile(loss='mean_squared_error', optimizer='adam')
>>> model.save('my_model.keras.h5')
Then run the walk-forward optimization, passing the Keras model:
>>> ml_walkforward("demo-ml",
start_date="2006-01-01", end_date="2012-12-31",
train="Y",
model_filepath="my_model.keras.h5",
filepath_or_buffer="demo_ml_keras*")
You can load the trained Keras model using the load_model() function:
>>> from keras.models import load_model
>>> trained_model = load_model("demo_ml_keras_trained_model.keras.h5")
Keras models support incremental learning and thus are suitable for larger-than-memory datasets.
XGBoost
XGBoost is a popular implementation of gradient boosted trees. It provides wrappers with a scikit-learn-compatible API, which can be used with QuantRocket:
>>> from xgboost import XGBRegressor
>>> import joblib
>>> regr = XGBRegressor()
>>> joblib.dump(regr, "xgb_model.joblib")
Decision tree algorithms like XGBoost require loading the entire dataset into memory. Although XGBoost supports distributing a dataset across a cluster, this functionality isn't currently supported by QuantRocket. To use XGBoost on a large amount of data, launch a cloud server that is large enough to hold the data in memory.
Data preprocessing
Feature standardization
Many machine learning algorithms work best when the features are standardized in some way, for example have comparable scales, zero mean, etc. The first step for properly standardizing your data is to understand your machine learning algorithm and your data. (Check the scikit-learn docs for your algorithm.) Once you know what you want to do, there are generally two different places where you can standardize your features: using scikit-learn or using Pandas.
Using scikit-learn
Scikit-learn provides a variety of transformers to preprocess data before the data are used to fit your estimator. Transformers and estimators can be combined using scikit-learn pipelines. For example, QuantRocket's default model, shown below, preprocesses features using StandardScaler, which centers the data at 0 and scales to unit variance, before using the data to fit SGDRegressor:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
model = Pipeline([("scaler", StandardScaler()),
("estimator", SGDRegressor())])
See the scikit-learn user guide to learn more about available transformers.
Using pandas
You can also standardize your features in your prices_to_features method. For example, you might rank stocks with pct=True, which nicely results in a scale of 0 to 1:
features["winners"] = twelve_month_returns.rank(axis=1, ascending=False, pct=True).fillna(1)
Or if your data has outliers and your model is sensitive to them, you might winsorize them:
features["1d_returns"] = returns.where(returns < 1, 1)
Or re-create the StandardScaler's behavior yourself by subtracting the mean and scaling to unit variance:
pb_ratios = pb_ratios - pb_ratios.stack().mean()
features["price_to_book"] = pb_ratios / pb_ratios.stack().std()
One-hot encoding
One-hot encoding (aka dummy encoding) is a data preprocessing technique whereby a categorical feature such as stock sectors is converted to multiple features, with each feature containing a boolean 1 or 0 to indicate whether the sample (stock) belongs to the category (sector). One-hot encoding is a necessary step for using categorical data with machine learning. The snippet below illustrates the before and after of one-hot encoding:
>>> sectors
Sector
Stock
AAPL Technology
BAC Financial
>>> sectors.Sector.str.get_dummies()
Financial Technology
Stock
AAPL 0 1
BAC 1 0
To one-hot encode a Series, you can use pandas' get_dummies() as shown above, but this isn't suitable for DataFrames. To one-hot encode a categorical feature such as sector when working with a DataFrame, loop through the sectors and add a feature per sector as shown below:
from quantrocket.master import get_securities_reindexed_like
closes = prices.loc["Close"]
securities = get_securities_reindexed_like(closes, domain="main", fields="Sector")
sectors = securities.loc["Sector"]
features = {}
for sector in sectors.stack().unique():
features[sector] = (sectors == sector).astype(int)
Handling of NaNs
Most machine learning models do not handle NaNs, which therefore must be removed or replaced. If your features DataFrames contain any NaNs, QuantRocket replaces the NaNs with 0 before providing the data to your model. Sometimes this behavior might not be suitable; for example, if ranking stocks on a scale of 0 to 1 using pct=True, 0 implies having the best rank, which is probably not what you want. In these cases you should fill your own NaNs:
features["winners"] = twelve_month_returns.rank(axis=1, ascending=False, pct=True).fillna(1)
Unlike features DataFrames, if there are NaNs in your targets DataFrame, they are not filled. Rather, the NaN targets and their corresponding features are dropped and thus excluded from model training.
Incremental vs non-incremental learning
To avoid overfitting, it is often desirable to train machine learning models with large amounts of data. Depending on your computer specs, this data might not fit in memory.
A subset of machine learning algorithms supports incremental learning, also known as out-of-core learning, meaning they can be trained on small, successive batches of data without the need to load the entire dataset into memory. Other machine learning algorithms cannot learn incrementally as they require seeing the complete dataset, which therefore must be loaded into memory in its entirety.
The following table summarizes the pros and cons of incremental and non-incremental algorithms:
| Incremental algorithms | Non-incremental algorithms |
---|---|---|
memory requirements | low due to loading dataset in batches | high due to loading entire dataset |
runtime | faster due to loading less data | slower due to loading more data |
supports rolling windows | no | yes |
Incremental algorithms
Algorithms that support incremental learning include:
- the default model, scikit-learn's SGDRegressor (linear regression with Stochastic Gradient Descent)
- other scikit-learn algorithms that implement a partial_fit method. See the full list.
- Keras + TensorFlow neural networks
Algorithms that do not support incremental learning include:
- Decision trees
- scikit-learn algorithms not included in the above list
- XGBoost
Memory and runtime
For an expanding walk-forward optimization with a 3-year initial training window and annual retraining, the following table shows the sequence of training periods for an incremental vs non-incremental learning algorithm:
iteration | training period (incremental) | training period (non-incremental) |
---|---|---|
1 | 2006-2009 | 2006-2009 |
2 | 2010 | 2006-2010 |
3 | 2011 | 2006-2011 |
... | ... | ... |
10 | 2018 | 2006-2018 |
The non-incremental algorithm must be trained from scratch at each iteration and thus must load more and more data as the simulation progresses, eventually loading the entire dataset. Moreover, the runtime is slower because many periods of data must be reloaded again and again (for example 2006 data is loaded in every iteration).
In contrast, the incremental algorithm is not re-trained from scratch at each iteration but is simply updated with the latest year of data, resulting in much lower memory usage and a faster runtime.
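To illustrate the difference with scikit-learn directly (a conceptual sketch outside of QuantRocket; the batch variables are hypothetical):
from sklearn.linear_model import SGDRegressor
from sklearn.tree import DecisionTreeRegressor

# incremental: update the same model with each successive year of data
incremental_model = SGDRegressor()
for X_year, y_year in yearly_batches:  # hypothetical iterable of (features, targets) arrays
    incremental_model.partial_fit(X_year, y_year)

# non-incremental: must re-fit from scratch on the full dataset each time
nonincremental_model = DecisionTreeRegressor()
nonincremental_model.fit(X_all, y_all)  # hypothetical arrays holding all years of data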
Sub-segmentation of incremental learning
Sometimes your dataset might be too large for your training periods, even with incremental learning. This can especially be true for the initial training period when you specify a longer value for min_train.
You can use the segment parameter to further limit the amount of data loaded into memory. The following example specifies annual model training (train="Y") with 4 years of initial training (min_train="4Y"), but the segment parameter ensures that the 4 years of initial training will only be loaded 1 year at a time:
>>> ml_walkforward("demo-ml",
start_date="2006-01-01", end_date="2012-12-31",
train="Y",
min_train="4Y",
segment="Y",
filepath_or_buffer="demo_ml*")
Alternatively, the following example would retrain annually but only load 1 quarter of data at a time:
>>> ml_walkforward("demo-ml",
start_date="2006-01-01", end_date="2012-12-31",
train="Y",
segment="Q",
filepath_or_buffer="demo_ml*")
The segment parameter might seem redundant with the train parameter: why not simply use train="Q" to load quarterly data? Consider that the segment parameter is a purely technical parameter that exists solely for the purpose of controlling memory usage. Meanwhile the train and min_train parameters, though they do affect memory usage, also express a strategic decision by the trader as to how often the model should be updated. The segment parameter allows this strategic decision to be separated from the purely technical constraint of available memory.
Rolling optimization support
Incremental algorithms do not support rolling windows. This is because incremental learning updates a model's earlier training with new training but does not expunge the earlier training, as would be required for a rolling optimization. To use a rolling window with an incremental algorithm, you must force the algorithm to run non-incrementally (which will load the entire dataset):
>>> ml_walkforward("demo-ml",
start_date="2006-01-01", end_date="2012-12-31",
train="Y", rolling_train="4Y",
force_nonincremental=True,
filepath_or_buffer="demo_ml*")
Live trading
Live trading a MoonshotML machine learning strategy is nearly identical to live trading a standard Moonshot strategy. The only special requirement is that you must indicate which trained model to use with the strategy.
To do so, save the trained model from your walk-forward optimization to any location in or under the /codeload directory. (Including a date or version number in the filename is a good idea.) Then, specify the full path to the model file in your MoonshotML strategy:
class DemoMLStrategy(MoonshotML):

    CODE = "demo-ml"
    DB = "demo-stk-1d"
    MODEL = "/codeload/demo_ml_trained_model_20190101.joblib"
Then trade the strategy like any other:
$ quantrocket moonshot trade 'demo-ml' | quantrocket blotter order -f '-'
Periodically update the model based on your training interval. For example, if your walk-forward optimization used annual training (train="Y"), you should re-run the walk-forward optimization annually to generate an updated model file, then reference this new model file in your MoonshotML strategy.
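For instance, an annual refresh might look like this (the end date and file names are hypothetical):
>>> from quantrocket.moonshot import ml_walkforward
>>> ml_walkforward("demo-ml",
                   start_date="2006-01-01", end_date="2019-12-31",
                   train="Y", min_train="4Y",
                   filepath_or_buffer="/codeload/demo_ml_20200101*")
Then point the MODEL attribute of your strategy to the newly saved model file (in this hypothetical example, /codeload/demo_ml_20200101_trained_model.joblib).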
Zipline
Zipline and pyfolio are open-source libraries for running backtests and analyzing algorithm performance, respectively. Both libraries are developed by Quantopian. QuantRocket makes it easy to run Zipline backtests using historical data from QuantRocket's history service and view a pyfolio tear sheet of the results.
Data ingestion
To run a Zipline backtest using data from a QuantRocket history database, the first step is to collect the historical data, and the second step is to "ingest", or import, the historical data into Zipline's native format. Ingested data is referred to as a "data bundle."
Initial ingestion
You can ingest 1-day or 1-minute history databases (the two bar sizes Zipline supports). Let's ingest historical data for AAPL so we can run the Zipline demo strategy.
First, assume we've already collected 1-day bars for AAPL, like so:
$ # collect the listing details for AAPL from NASDAQ
$ quantrocket master collect --exchanges NASDAQ --symbols AAPL
status: the listing details will be collected asynchronously
$ # create a universe containing AAPL
$ quantrocket master get -e NASDAQ -s AAPL | quantrocket master universe 'just-aapl' -f -
code: just-aapl
inserted: 1
provided: 1
total_after_insert: 1
$ # create a 1-day history database for the universe
$ quantrocket history create-db 'aapl-1d' --universes 'just-aapl' --bar-size '1 day'
status: successfully created quantrocket.history.aapl-1d.sqlite
$ quantrocket history collect 'aapl-1d'
status: the historical data will be collected asynchronously
After the historical data request finishes, we can ingest our historical data into Zipline:
$ quantrocket zipline ingest --history-db 'aapl-1d' --calendar 'NYSE'
msg: successfully ingested aapl-1d bundle
status: success
>>> from quantrocket.zipline import ingest_bundle
>>> ingest_bundle(history_db="aapl-1d", calendar="NYSE")
{'status': 'success', 'msg': 'successfully ingested aapl-1d bundle'}
$ curl -X POST 'http://houston/zipline/bundles?history_db=aapl-1d&calendar=NYSE'
{"status": "success", "msg": "successfully ingested aapl-1d bundle"}
The calendar option is required the first time you ingest data. Calendars are important in Zipline, and choosing a calendar that doesn't align well with your data can lead to confusing error messages. The above data bundle will use Zipline's NYSE calendar. Or you can associate your data bundle with a different Zipline calendar:
$ # pass ? to see the available calendar choices
$ quantrocket zipline ingest --history-db 'london-stk-1min' --calendar ?
msg: 'unknown calendar ''?'', choices are: BMF, CFE, CME, ICE, LSE, NYSE, TSX, us_futures'
status: error
$ quantrocket zipline ingest --history-db 'london-stk-1min' --calendar 'LSE'
msg: successfully ingested london-stk-1min bundle
status: success
>>> # pass ? to see the available calendar choices
>>> ingest_bundle(history_db="london-stk-1min", calendar="?")
{'status': 'error', 'msg': 'unknown calendar ?, choices are: BMF, CFE, CME, ICE, LSE, NYSE, TSX, us_futures'}
>>> ingest_bundle(history_db="london-stk-1min", calendar="LSE")
{'status': 'success', 'msg': 'successfully ingested london-stk-1min bundle'}
$ curl -X POST 'http://houston/zipline/bundles?history_db=london-stk-1min&calendar=?'
{"status": "error", "msg": "unknown calendar ?, choices are: BMF, CFE, CME, ICE, LSE, NYSE,TSX, us_futures"}
$ curl -X POST 'http://houston/zipline/bundles?history_db=london-stk-1min&calendar=LSE'
{"status": "success", "msg": "successfully ingested london-stk-1min bundle"}
You can optionally ingest a subset of the history database, filtering by date range, universe, or conid. For example, you might import only a single year of a large 1-minute database:
$ quantrocket zipline ingest --history-db 'usa-stk-1min' -s '2017-01-01' -e '2017-12-31' --calendar 'NYSE'
msg: successfully ingested usa-stk-1min bundle
status: success
>>> ingest_bundle(history_db="usa-stk-1min",
start_date="2017-01-01", end_date="2017-12-31",
calendar="NYSE")
{'status': 'success', 'msg': 'successfully ingested usa-stk-1min bundle'}
$ curl -X POST 'http://houston/zipline/bundles?history_db=usa-stk-1min&start_date=2017-01-01&end_date=2017-12-31&calendar=NYSE'
{"status": "success", "msg": "successfully ingested usa-stk-1min bundle"}
By default the history database code is used as the bundle name, but you can optionally assign a bundle name. Assigning a bundle name allows you to separately ingest multiple subsets of the same database:
$ quantrocket zipline ingest --history-db 'usa-stk-1min' --universes 'nyse-stk' --bundle 'nyse-stk-1min' --calendar 'NYSE'
msg: successfully ingested nyse-stk-1min bundle
status: success
$ quantrocket zipline ingest --history-db 'usa-stk-1min' --universes 'nasdaq-stk' --bundle 'nasdaq-stk-1min' --calendar 'NYSE'
msg: successfully ingested nasdaq-stk-1min bundle
status: success
>>> ingest_bundle(history_db="usa-stk-1min",
universes="nyse-stk",
bundle="nyse-stk-1min", calendar="NYSE")
{'status': 'success', 'msg': 'successfully ingested nyse-stk-1min bundle'}
>>> ingest_bundle(history_db="usa-stk-1min",
universes="nasdaq-stk",
bundle="nasdaq-stk-1min", calendar="NYSE")
{'status': 'success', 'msg': 'successfully ingested nasdaq-stk-1min bundle'}
$ curl -X POST 'http://houston/zipline/bundles?history_db=usa-stk-1min&universes=nyse-stk&bundle=nyse-stk-1min&calendar=NYSE'
{"status": "success", "msg": "successfully ingested nyse-stk-1min bundle"}
$ curl -X POST 'http://houston/zipline/bundles?history_db=usa-stk-1min&universes=nasdaq-stk&bundle=nasdaq-stk-1min&calendar=NYSE'
{"status": "success", "msg": "successfully ingested nasdaq-stk-1min bundle"}
Re-ingesting data
After you update your history database with new data, you can re-ingest the database into Zipline using the same API:
$ quantrocket zipline ingest --history-db 'aapl-1d'
msg: successfully ingested aapl-1d bundle
status: success
>>> from quantrocket.zipline import ingest_bundle
>>> ingest_bundle(history_db="aapl-1d")
{'status': 'success', 'msg': 'successfully ingested aapl-1d bundle'}
$ curl -X POST 'http://houston/zipline/bundles?history_db=aapl-1d'
{"status": "success", "msg": "successfully ingested aapl-1d bundle"}
The calendar and any date range or universe filters that you specified during the initial ingestion will be used for the re-ingestion as well. If you need to change the calendar or filters, you must first completely remove the existing bundle:
$ quantrocket zipline clean -b 'aapl-1d' --all
aapl-1d:
- /root/.zipline/data/aapl-1d/2018-10-05T14;07;51.482331
>>> from quantrocket.zipline import clean_bundles
>>> clean_bundles(bundles=["aapl-1d"], clean_all=True)
{'aapl-1d': ['/root/.zipline/data/aapl-1d/2018-10-05T14;07;51.482331']}
$ curl -X DELETE 'http://houston/zipline/bundles?bundles=aapl-1d&clean_all=true'
{"aapl-1d": ["/root/.zipline/data/aapl-1d/2018-10-05T14;07;51.482331"]}
The --all/clean_all option removes all ingestions for the bundle and also deletes the stored bundle configuration. Then, you can ingest the database again with the correct calendar or filters.
Bundle cleanup
Data re-ingestion is not incremental. That is, new data is not appended to earlier data. Rather, the entire database (or the subset based on your filters, if applicable) is ingested each time you run the ingest function. Re-ingested data does not replace the earlier ingested data; rather, Zipline stores each ingestion as a new version of the bundle. By default the most recent ingestion is used when you run a backtest.
You can list your bundles and see the different versions you've ingested:
$ quantrocket zipline bundles
aapl-1d:
- '2018-10-05 14:16:12.246592'
- '2018-10-05 14:07:51.482331'
london-stk-1min:
- '2018-10-05 14:20:11.241632'
>>> from quantrocket.zipline import list_bundles
>>> list_bundles()
{'aapl-1d': ['2018-10-05 14:16:12.246592',
'2018-10-05 14:07:51.482331'],
'london-stk-1min': ['2018-10-05 14:20:11.241632']}
$ curl -X GET 'http://houston/zipline/bundles'
{"aapl-1d": ["2018-10-05 14:16:12.246592","2018-10-05 14:07:51.482331"], "london-stk-1min": ["2018-10-05 14:20:11.241632"]}
And you can remove old ingestions:
$ # remove all but the most recent ingestion
$ quantrocket zipline clean -b 'aapl-1d' --keep-last 1
aapl-1d:
- /root/.zipline/data/aapl-1d/2018-10-05T14;07;51.482331
>>> # remove all but the most recent ingestion
>>> from quantrocket.zipline import clean_bundles
>>> clean_bundles(bundles=["aapl-1d"], keep_last=1)
{'aapl-1d': ['/root/.zipline/data/aapl-1d/2018-10-05T14;07;51.482331']}
$ # remove all but the most recent ingestion
$ curl -X DELETE 'http://houston/zipline/bundles?bundles=aapl-1d&keep_last=1'
{"aapl-1d": ["/root/.zipline/data/aapl-1d/2018-10-05T14;07;51.482331"]}
Suppose you update a history database each evening and want to re-ingest it into Zipline each time. To avoid filling up your hard drive with all the bundle ingestions, you might schedule the following commands:
0 17 * * mon-fri quantrocket master isopen 'NYSE' --ago '6H' && quantrocket history collect 'nyse-stk-1min' --priority
0 23 * * mon-fri quantrocket zipline clean -b 'nyse-stk-1min' --all && quantrocket zipline ingest --history-db 'nyse-stk-1min' --calendar 'NYSE'
Backtesting
Zipline provides the following demo file of a dual moving average crossover strategy using AAPL:
from zipline.api import order_target_percent, record, symbol, set_benchmark

def initialize(context):
    context.sym = symbol('AAPL')
    set_benchmark(symbol('AAPL'))
    context.i = 0

def handle_data(context, data):
    context.i += 1
    if context.i < 300:
        return
    short_mavg = data.history(context.sym, 'price', 100, '1d').mean()
    long_mavg = data.history(context.sym, 'price', 300, '1d').mean()
    if short_mavg > long_mavg:
        order_target_percent(context.sym, 0.2)
    elif short_mavg < long_mavg:
        order_target_percent(context.sym, 0)
    record(AAPL=data.current(context.sym, "price"),
           short_mavg=short_mavg,
           long_mavg=long_mavg)
Place this file in the 'zipline' subdirectory inside Jupyter.
Next, run the backtest from a notebook, specifying the bundle you ingested earlier, and save the backtest results to a CSV:
from quantrocket.zipline import run_algorithm
run_algorithm("dual_moving_average.py", data_frequency="daily",
bundle="aapl-1d",
start="2000-01-01", end="2017-01-01",
filepath_or_buffer="aapl_results.csv")
By default, Zipline tries to download S&P 500 price history from the web to use as a performance benchmark. This has been a perennial source of backtest failures, as numerous data sources have deprecated their APIs over time, including Yahoo! Finance, Google Finance, and IEX. As a workaround, QuantRocket patches Zipline to replace the remote data calls with dummy benchmark data consisting of 0 returns. If you want a meaningful benchmark, set the benchmark to a security already in your data by adding the following line to your initialize function: set_benchmark(symbol("SOME_SYMBOL_IN_YOUR_DATA"))
You can plot the backtest results using pyfolio:
import pyfolio as pf
pf.from_zipline_csv("aapl_results.csv")
You can also load the backtest results into a DataFrame:
>>> from quantrocket.zipline import ZiplineBacktestResult
>>> result = ZiplineBacktestResult.from_csv("aapl_results.csv")
>>> result.perf.iloc[-1]
column
algorithm_period_return 0.140275
benchmark_period_return 2.6665
capital_used -21976.8
ending_cash 9.15718e+06
ending_exposure 2.24557e+06
ending_value 2.24557e+06
excess_return 0
gross_leverage 0.196932
long_exposure 2.24557e+06
long_value 2.24557e+06
longs_count 1
max_drawdown -0.0993747
max_leverage 0.214012
net_leverage 0.196932
orders [{'filled': 199, 'limit': None, 'commission': ...}]
period_close 2014-12-31 21:00:00+00:00
period_label 2014-12
period_open 2014-12-31 14:31:00+00:00
pnl -43121.5
portfolio_value 1.14028e+07
positions [{'last_sale_price': 110.38, 'sid': Equity(265...)}]
returns -0.00376743
...
Backtesting via CLI
You can also use the command line to run a backtest and generate a PDF tear sheet:
$ quantrocket zipline run --bundle 'aapl-1d' -f 'dual_moving_average.py' -s '2000-01-01' -e '2017-01-01' -o aapl_results.csv
$ quantrocket zipline tearsheet aapl_results.csv -o aapl_results.pdf
Open the PDF and have a look:

Fundamental data in Zipline
QuantRocket provides access to the Reuters Worldwide Fundamentals dataset via Zipline's Pipeline API. First collect the data into your QuantRocket database as described in the fundamental data section of the usage guide.
To use the fundamental data in Pipeline, import the ReutersFinancials Pipeline dataset (for annual financial reports) or the ReutersInterimFinancials dataset (for interim/quarterly financial reports) from the zipline_extensions package provided by QuantRocket. You can reference any of the available financial statement indicator codes and use them to build a custom Pipeline factor. (See the fundamental data section of the usage guide for help looking up the codes.)
Below, we create a custom Pipeline factor that calculates price-to-book ratio.
from zipline.pipeline import Pipeline, CustomFactor
from zipline.pipeline.data import USEquityPricing
from zipline_extensions.pipeline.data import ReutersFinancials

class PriceBookRatio(CustomFactor):
    """
    Custom factor that calculates price-to-book ratio.

    First, calculate book value per share, defined as:

        (Total Assets - Total Liabilities) / Number of shares outstanding

    The codes we'll use for these metrics are 'ATOT' (Total Assets),
    'LTLL' (Total Liabilities), and 'QTCO' (Total Common Shares Outstanding).

    Price-to-book ratio is then calculated as:

        closing price / book value per share
    """
    inputs = [
        USEquityPricing.close,
        ReutersFinancials.ATOT,
        ReutersFinancials.LTLL,
        ReutersFinancials.QTCO
    ]
    window_length = 1

    def compute(self, today, assets, out, closes, tot_assets, tot_liabilities, shares_out):
        book_values_per_share = (tot_assets - tot_liabilities)/shares_out
        pb_ratios = closes/book_values_per_share
        out[:] = pb_ratios
Now we can use our custom factor in our Pipeline:
pipe = Pipeline()
pb_ratios = PriceBookRatio()
pipe.add(pb_ratios, 'pb_ratio')
See Zipline's documentation for more on using the Pipeline API.
Custom Scripts
QuantRocket's satellite service makes it easy to create and integrate custom scripts into QuantRocket. Here are some of the things you can do with custom scripts:
- create and schedule multi-step maintenance tasks that are too complex for the command line
- schedule download of custom data from a third party API to use in Moonshot or elsewhere
- connect directly to the IB API
- run backtests using a third-party backtester such as backtrader (see tutorial in Code Library)
- create an options trading script that uses QuantRocket's Python API to query data and place orders using the blotter
With the satellite service you get the benefit of QuantRocket's infrastructure and data services together with the freedom and flexibility to execute your own custom logic.
Jupyter vs Satellite
Why should you use the satellite service to run your custom code instead of simply running the code within JupyterLab? For one-and-done scripts or interactive research, it is fine to run your custom code from a Notebook, Console, or Terminal within JupyterLab. Running code via the satellite service provides two main benefits:
- The ability to schedule your custom code to run automatically via your countdown service crontab.
- The ability to run your custom code within a dedicated container, optionally with custom packages you install. The container's environment is isolated from and unaffected by your JupyterLab environment.
Execute Python functions
Suppose you need to run a Python function once a day that creates a calendar spread in the securities master database. You create a file at /codeload/scripts/combos.py in which you define a function called create_calendar_spread which accepts the name of a universe and the contract numbers from which to create the calendar spread:
def create_calendar_spread(universe, contract_nums=[1,2]):
    ...
You can use the satellite service to run this function and pass it arguments. Specify the function using Python dot notation. The notation must start with codeload. in order for the satellite service to interpret it as a Python function:
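For example (this is the same command that is scheduled on the crontab below):
$ quantrocket satellite exec 'codeload.scripts.combos.create_calendar_spread' --params 'universe:cl-fut' 'contract_nums:[1,2]'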
You can schedule this command to run on your crontab:
0 9 * * mon-fri quantrocket satellite exec 'codeload.scripts.combos.create_calendar_spread' --params 'universe:cl-fut' 'contract_nums:[1,2]'
Execute shell commands
Any command that does not begin with 'codeload.' is interpreted and executed as a shell command. For example, you can execute a bash script:
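For example (the script path here is hypothetical; any executable shell command works):
$ quantrocket satellite exec '/codeload/scripts/myscript.sh'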
Install custom packages
The satellite service ships with the same Python and Linux (Debian) packages that are available inside the jupyter service. This is a well-stocked environment as it includes the full Anaconda distribution. However, if needed, you can install additional Python or Debian packages.
To install additional Python packages, create a pip requirements file called quantrocket.satellite.pip.txt and place it in the /codeload directory, that is, in the top level of the Jupyter file browser. Add one package per line (see more file format examples in Python's documentation):
beautifulsoup4
docopt==0.6.1
To install Linux (Debian) packages, create a file called quantrocket.satellite.apt.txt in the /codeload directory and add one package per line (these will be installed with apt-get):
procps
r-base
To make the satellite service actually install the packages, execute the install-packages command inside the satellite container:
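For example:
$ quantrocket satellite exec '/opt/quantrocket/bin/install-packages'
status: success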
Connect to IB API directly
You can use the satellite service to connect directly to the IB API. You might do this to access a particular IB API call that is not currently mapped to QuantRocket. We recommend the ib_insync package for directly accessing the IB API.
The IB API is already installed on the satellite service, but you must install ib_insync. From a JupyterLab terminal, append the package to quantrocket.satellite.pip.txt and tell the service to install it:
$ # append ib_insync to the requirements file, then install it
$ echo 'ib_insync==0.9.37' >> quantrocket.satellite.pip.txt
$ quantrocket satellite exec '/opt/quantrocket/bin/install-packages'
status: success
Create an empty .py file in or under the codeload directory (the top-level directory in the Jupyter file browser). For this example we create the script at /codeload/get_scan_data.py. In your custom script, you can access your IB Gateway(s) using their service name as the host and port 4001. Using a clientId smaller than 1000 will avoid collisions with other QuantRocket services:
from quantrocket.launchpad import start_gateways
from ib_insync import *
start_gateways(wait=True)
ib = IB()
ib.connect('ibg1', 4001, clientId=1)
Any data you request can be saved to a file in or under the /codeload directory, where it can be accessed via JupyterLab or by other scripts:
sub = ScannerSubscription(
instrument='FUT.US',
locationCode='FUT.GLOBEX',
scanCode='TOP_PERC_GAIN')
scanData = ib.reqScannerData(sub)
scanData = util.df(scanData)
scanData.to_csv('/codeload/scandata.csv')
You can then execute the script as follows:
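For example, one way to run the script is as a shell command that invokes Python on it (the exact invocation shown is an assumption):
$ quantrocket satellite exec 'python /codeload/get_scan_data.py'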
This command can be scheduled on your countdown service to automate the process.
Scheduling
You can use QuantRocket's cron service, named "countdown," to schedule automated tasks such as collecting historical data or running your trading strategies.
You can pick the timezone in which you want to schedule your tasks, and you can create as many countdown services as you like. If you plan to trade in multiple timezones, consider creating a separate countdown service for each timezone where you will trade.
Set timezone
By default, deployments come equipped with a single countdown service (called "countdown"). The countdown service's default timezone is UTC, meaning the times in your crontab are interpreted as UTC. However, it's best to change the timezone so that you can schedule your jobs in the timezone of the exchange they relate to. For example, if you want to collect shortable shares data for Australian stocks every day at 9:45 AM before the market opens at 10:00 AM local time, it's better to schedule this in Sydney time than in UTC or some other timezone, because scheduling in another timezone will necessitate editing the crontab several times per year due to daylight savings changes, which is error prone. By scheduling the cron job in Sydney time, you never have to worry about this.
You can set the timezone as follows:
$ quantrocket countdown timezone 'Australia/Sydney'
status: successfully set timezone to Australia/Sydney
>>> from quantrocket.countdown import set_timezone
>>> set_timezone("Australia/Sydney")
{'status': 'successfully set timezone to Australia/Sydney'}
$ curl -X PUT 'http://houston/countdown/timezone?tz=Australia%2FSydney'
{"status": "successfully set timezone to Australia/Sydney"}
If you're not sure of the timezone name, type as much as you know to see a list of close matches:
$ quantrocket countdown timezone 'newyork'
msg: 'invalid timezone: newyork (close matches are: America/New_York)'
status: error
>>> set_timezone("newyork")
HTTPError: ('400 Client Error: BAD REQUEST for url: http://houston/countdown/timezone?tz=newyork', {'status': 'error', 'msg': 'invalid timezone: newyork (close matches are: America/New_York)'})
$ curl -X PUT 'http://houston/countdown/timezone?tz=newyork'
{"status": "error", "msg": "invalid timezone: newyork (close matches are: America/New_York)"}
You can pass '?' to see all available timezones:
$ quantrocket countdown timezone '?'
msg: 'invalid timezone: ? (choices are: Africa/Abidjan, Africa/Accra, Africa/Addis_Ababa,
Africa/Algiers, Africa/Asmara, Africa/Asmera, Africa/Bamako, Africa/Bangui, Africa/Banjul,
Africa/Bissau, Africa/Blantyre, Africa/Brazzaville, Africa/Bujumbura, Africa/Cairo,'
...
>>> set_timezone("?")
HTTPError: ('400 Client Error: BAD REQUEST for url: http://houston/countdown/timezone?tz=%3F', {'status': 'error', 'msg': 'invalid timezone: ? (choices are: Africa/Abidjan, Africa/Accra, Africa/Addis_Ababa, Africa/Algiers, Africa/Asmara, Africa/Asmera, Africa/Bamako, Africa/Bangui, Africa/Banjul, Africa/Bissau, Africa/Blantyre, Africa/Brazzaville, Africa/Bujumbura, Africa/Cairo,'...})
$ curl -X PUT 'http://houston/countdown/timezone?tz=?'
{"status": "error", "msg": "invalid timezone: ? (choices are: Africa/Abidjan, Africa/Accra, Africa/Addis_Ababa, Africa/Algiers, Africa/Asmara, Africa/Asmera, Africa/Bamako, Africa/Bangui, Africa/Banjul, Africa/Bissau, Africa/Blantyre, Africa/Brazzaville, Africa/Bujumbura, Africa/Cairo, ..."
Create your crontab
You can create and edit your crontab within the Jupyter environment. The countdown service uses a naming convention to recognize and load the correct crontab (in case you're running multiple countdown services). For the default countdown service, named countdown, the service will look for and load a crontab named quantrocket.countdown.crontab. This file should be created in the top level of your codeload volume, that is, in the top level of your Jupyter file browser.
After you create the file, you can add cron jobs as on a standard crontab. An example crontab is shown below:
30 17 * * 1-5 quantrocket history collect 'nasdaq-1d'
0 14 * * 7 quantrocket fundamental collect-financials -u 'nasdaq'
Each time you edit the crontab, the corresponding countdown service will detect the change and reload the file.
Crontab syntax help
There are many online crontab generators to help you generate correct cron schedule expressions. We like Crontab Guru, which validates your syntax and also provides helpful examples.
Crontab syntax highlighting
JupyterLab doesn't currently provide syntax highlighting for .crontab files. To trigger shell syntax highlighting, you can optionally append .sh to your filename: quantrocket.countdown.crontab.sh. QuantRocket monitors for both the .crontab and .crontab.sh file extensions.
Validate your crontab
Whenever you save your crontab, it's a good idea to have flightlog open (quantrocket flightlog stream) so you can check that it was successfully loaded by the countdown service:
2018-02-21 09:31:57 quantrocket.countdown: INFO Successfully loaded quantrocket.countdown.crontab
If there are syntax errors in the file, it will be rejected (a common error is failing to include an empty line at the bottom of the crontab):
2018-02-21 09:32:38 quantrocket.countdown: ERROR quantrocket.countdown.crontab is invalid, please correct the errors:
2018-02-21 09:32:38 quantrocket.countdown: ERROR new crontab file is missing newline before EOF, cannot install.
2018-02-21 09:32:38 quantrocket.countdown: ERROR
You can also use the client to print out the crontab installed in your container so you can verify that it is as expected.
Monitor cron errors
Assuming your crontab is free of syntax errors and loaded successfully, there might still be errors when your commands run and you will want to know about those. You can monitor flightlog for this purpose, as any errors returned by the unattended commands will be logged to flightlog. Setting up flightlog's Papertrail integration works well for this purpose as it allows you to monitor anywhere and set up alerts.
Generally, errors will be logged to flightlog's application (non-detailed) logs. The exception is that if you misspell "quantrocket" or call a program that doesn't exist, the error message will only show up in flightlog's detailed logs:
$ quantrocket flightlog get --detailed /tmp/system.log
$ tail /tmp/system.log
quantrocket_countdown_1|Date: Tue, 24 Apr 2018 13:04:01 -0400
quantrocket_countdown_1|
quantrocket_countdown_1|/bin/sh: 1: quantrockettttt: not found
quantrocket_countdown_1|
In addition to error output, flightlog's detailed logs will log all output from your cron jobs. The output will be formatted as text emails because this is the format the cron utility uses.
Multiple countdown services
By default, deployments include a single countdown service (called "countdown"). If you need to schedule jobs in multiple timezones, you can create additional countdown services.
To do so, create a file called docker-compose.override.yml in the same directory as your docker-compose.yml and add the desired additional countdown services. Each countdown service must have a unique name, which must start with "countdown". In this example we are adding two countdown services, one for Australia and one for Japan, which inherit from the definition of the default countdown service:
version: '2.4'
services:
  countdown-australia:
    extends:
      file: docker-compose.yml
      service: countdown
  countdown-japan:
    extends:
      file: docker-compose.yml
      service: countdown
You can learn more about docker-compose.override.yml in another section.
Then, deploy the new service(s):
$ cd /path/to/docker-compose.yml
$ docker-compose -p quantrocket up -d
You can then set the timezone for the new services:
$ quantrocket countdown timezone 'Australia/Sydney' --service 'countdown-australia'
status: successfully set timezone to Australia/Sydney
>>> from quantrocket.countdown import set_timezone
>>> set_timezone("Australia/Sydney", service="countdown-australia")
{'status': 'successfully set timezone to Australia/Sydney'}
$ curl -X PUT 'http://houston/countdown/timezone?tz=Australia%2FSydney&service=countdown-australia'
{"status": "successfully set timezone to Australia/Sydney"}
You would schedule jobs for these services in quantrocket.countdown-australia.crontab and quantrocket.countdown-japan.crontab, respectively, in the codeload directory within JupyterLab.
Trading calendars
Trading calendars in QuantRocket allow you to conditionally schedule data collection, trading, and other tasks based on the exchange hours of the relevant exchange. This allows you to avoid being tripped up by holidays, early closes, lunch breaks, and so on.
Calendar data sources
QuantRocket relies on two calendar data sources: the IB API, and ib-trading-calendars, an open-source package from QuantRocket built on top of the trading_calendars package from Quantopian. Both data sources have certain limitations which are remedied in combination:
| IB API | ib-trading-calendars package |
---|---|---|
Covers all IB-supported exchanges | yes | no |
Correctly handles holidays | no (reports the exchange as open) | yes |
Correctly handles lunch breaks (Asian exchanges) | yes | no |
Regular or extended hours | yes | regular hours only |
Historical data | no, forward-looking only (1 month) | historical and forward-looking |
Requires periodic data collection | yes | no |
QuantRocket consults both calendar data sources, when available, in order to report the most accurate calendar.
Collect trading calendars
No data collection is required for the ib-trading-calendars package, but calendar data from the IB API must be collected periodically. If the exchanges you care about are supported by ib-trading-calendars and don't have lunch breaks, and you don't care about extended hours, you can rely on ib-trading-calendars and need not collect calendar data from the IB API. Otherwise, you should collect the IB API data.
To collect upcoming trading hours for the exchanges you care about, first make sure you've already collected listings for the exchange(s):
$ quantrocket master collect --exchanges 'TSEJ' --sec-types 'STK'
status: the listing details will be collected asynchronously
>>> from quantrocket.master import collect_listings
>>> collect_listings(exchanges="TSEJ", sec_types=["STK"])
{'status': 'the listing details will be collected asynchronously'}
$ curl -X POST 'http://houston/master/securities?exchanges=TSEJ&sec_types=STK'
{"status": "the listing details will be collected asynchronously"}
Once the listings are saved to your database, you're ready to collect the exchange hours:
$ quantrocket master collect-calendar
status: the trading hours will be collected asynchronously
>>> from quantrocket.master import collect_calendar
>>> collect_calendar()
{'status': 'the trading hours will be collected asynchronously'}
$ curl -X POST 'http://houston/master/calendar'
{"status": "the trading hours will be collected asynchronously"}
This will collect trading hours for all exchanges in your securities master database. Optionally, you can limit by exchange:
$ quantrocket master collect-calendar -e 'TSEJ'
status: the trading hours will be collected asynchronously
>>> from quantrocket.master import collect_calendar
>>> collect_calendar(exchanges=["TSEJ"])
{'status': 'the trading hours will be collected asynchronously'}
$ curl -X POST 'http://houston/master/calendar?exchanges=TSEJ'
{"status": "the trading hours will be collected asynchronously"}
Trading hours for the next month are returned by the IB API; this means you need to re-run the command periodically. You can add it to one of your countdown service crontabs:
0 3 * * mon-fri quantrocket master collect-calendar
The IB API provides trading hours by security, but for simplicity QuantRocket stores trading hours by exchange. QuantRocket selects a sampling of securities for each exchange and requests trading hours for those securities.
Query trading hours
Once you've collected trading hours for an exchange, you can query to see if the exchange is open or closed. You'll get the status (open or closed) as well as when the status took effect and when it will next change:
$ quantrocket master calendar 'NYSE'
NYSE:
since: '2018-05-10T09:30:00'
status: open
timezone: America/New_York
until: '2018-05-10T16:00:00'
>>> from quantrocket.master import list_calendar_statuses
>>> list_calendar_statuses(["NYSE"])
{'NYSE': {'since': '2018-05-10T09:30:00',
'status': 'open',
'timezone': 'America/New_York',
'until': '2018-05-10T16:00:00'}}
$ curl 'http://houston/master/calendar?exchanges=NYSE'
{"NYSE": {"status": "open", "since": "2018-05-10T09:30:00", "until": "2018-05-10T16:00:00", "timezone": "America/New_York"}}
By default the exchange's current status is returned, but you can also check what the exchange status was in the past (using a Pandas timedelta string):
$ quantrocket master calendar 'NYSE' --ago '12h'
NYSE:
since: '2018-05-09T16:00:00'
status: closed
timezone: America/New_York
until: '2018-05-10T09:30:00'
>>> from quantrocket.master import list_calendar_statuses
>>> list_calendar_statuses(["NYSE"], ago="12h")
{'NYSE': {'since': '2018-05-09T16:00:00',
'status': 'closed',
'timezone': 'America/New_York',
'until': '2018-05-10T09:30:00'}}
$ curl 'http://houston/master/calendar?exchanges=NYSE&ago=12h'
{"NYSE": {"status": "closed", "since": "2018-05-09T16:00:00", "until": "2018-05-10T09:30:00", "timezone": "America/New_York"}}
Or what the exchange status will be in the future:
$ quantrocket master calendar 'NYSE' --in '30min'
NYSE:
since: '2018-05-10T16:00:00'
status: closed
timezone: America/New_York
until: '2018-05-11T09:30:00'
>>> from quantrocket.master import list_calendar_statuses
>>> list_calendar_statuses(["NYSE"], in_="30min")
{'NYSE': {'since': '2018-05-10T16:00:00',
'status': 'closed',
'timezone': 'America/New_York',
'until': '2018-05-11T09:30:00'}}
$ curl 'http://houston/master/calendar?exchanges=NYSE&in=30min'
{"NYSE": {"status": "closed", "since": "2018-05-10T16:00:00", "until": "2018-05-11T09:30:00", "timezone": "America/New_York"}}
Conditional scheduling with isopen / isclosed
The most common use of trading calendars in QuantRocket is to conditionally schedule commands that run on the countdown service. Conditional scheduling is accomplished using quantrocket master isopen and quantrocket master isclosed. For example, we could schedule a NASDAQ history database to be updated only if the NASDAQ was open today:
30 17 * * mon-fri quantrocket master isopen 'NASDAQ' --ago '5h' && quantrocket history collect 'nasdaq-eod'
quantrocket master isopen and quantrocket master isclosed are used as true/false assertions: they don't print any output but return an exit code of 0 (indicating success) if the condition is met and an exit code of 1 (indicating failure) if it is not met. In shell, a double ampersand (&&) between commands indicates that the second command will only run if the preceding command returns a 0 exit code. Thus, in the above example, if the NASDAQ was open 5 hours ago, the historical data command will run; if the NASDAQ wasn't open, it won't.
The --in and --ago options allow you to check the exchange status in the past or future; if omitted, the command checks the current exchange status. The --in/--ago options accept any string that can be passed to pd.Timedelta.
To get the feel of using isopen/isclosed, you can open a terminal and try the commands in conjunction with echo:
$ # 'assertion passed' is echoed only if GLOBEX will be open in 1 hour
$ quantrocket master isopen 'GLOBEX' --in '1h' && echo "assertion passed"
Generally, live trading commands should always be prefixed with an appropriate isopen/isclosed:
0 9 * * mon-fri quantrocket master isopen 'NASDAQ' --in '1h' && quantrocket moonshot trade 'my-strategy' | quantrocket blotter order -f '-'
30 10 * * mon-fri quantrocket master isopen 'NASDAQ' && quantrocket moonshot trade 'my-intraday-strategy' | quantrocket blotter order -f '-'
You can chain together multiple isopen/isclosed calls for more complex conditions. The following example shows how to run a strategy at 12:45 PM on early close days and at 3:45 PM on regular days:
45 12 * * mon-fri quantrocket master isopen 'ARCA' && quantrocket master isclosed 'ARCA' --in '1h' && quantrocket moonshot trade 'my-etf-strategy' | quantrocket blotter order -f '-'
45 15 * * mon-fri quantrocket master isopen 'ARCA' && quantrocket moonshot trade 'my-etf-strategy' | quantrocket blotter order -f '-'
Using the --since and --until options, you can schedule commands to run only at the beginning (or end) of the month, quarter, etc. This can be useful for strategies that periodically rebalance:
30 8 * * mon-fri quantrocket master isopen 'TSEJ' --in '1h' && quantrocket master isclosed 'TSEJ' --since 'Q' && quantrocket moonshot trade 'monthly-strategy' | quantrocket blotter order -f '-'
45 15 * * mon-fri quantrocket master isopen 'NYSE' && quantrocket master isclosed 'NYSE' --in '1h' --until 'M' && quantrocket moonshot trade 'window-dressing' | quantrocket blotter order -f '-'
0 9 * * mon-fri quantrocket master isclosed 'NYSE' --since 'W' && quantrocket master isopen 'NYSE' --in 1h && quantrocket moonshot trade 'umd-us' | quantrocket blotter order -f -
The --since/--until options are applied after --in/--ago, if both are specified. For example, quantrocket master isclosed 'NYSE' --in '1h' --until 'M' asserts that the NYSE will be closed in 1 hour and will remain closed through month end. The --since/--until options accept a Pandas offset alias or anchored offset, or more broadly any string that can be passed as the freq argument to pd.date_range.
Account Monitoring
QuantRocket keeps track of your IB account balances and of exchange rates between your IB base currency and other currencies you might trade. You can also check your IB portfolio in real-time.
IB account balances
You can query your latest account balance through QuantRocket without having to open Trader Workstation. IB provides many account-related fields, so you might want to limit which fields are returned. This will check your Net Liquidation Value (IB's term for your account balance):
$ quantrocket account balance --latest --fields 'NetLiquidation' | csvlook
| Account | Currency | NetLiquidation | LastUpdated |
| --------- | -------- | -------------- | ------------------- |
| DU12345 | USD | 500,000.00 | 2018-02-02 22:57:13 |
>>> from quantrocket.account import download_account_balances
>>> import io
>>> import pandas as pd
>>> f = io.StringIO()
>>> download_account_balances(f, latest=True, fields=["NetLiquidation"])
>>> balances = pd.read_csv(f, parse_dates=["LastUpdated"])
>>> balances.head()
Account Currency NetLiquidation LastUpdated
0 DU12345 USD 500000.0 2018-02-02 22:57:13
$ curl 'http://houston/account/balances.csv?latest=true&fields=NetLiquidation'
Account,Currency,NetLiquidation,LastUpdated
DU12345,USD,500000.0,2018-02-02 22:57:13
Using the CLI, you can filter the output to show only accounts where the margin cushion is below 5%, and log the results (if any) to flightlog:
$ quantrocket account balance --latest --below 'Cushion:0.05' --fields 'NetLiquidation' 'Cushion' | quantrocket flightlog log --name 'quantrocket.account' --level 'CRITICAL'
If you've set up Twilio alerts for CRITICAL messages, you can add this command to the crontab on one of your countdown services, and you'll get a text message whenever you're at risk of auto-liquidation by IB. If no accounts are below the cushion, nothing will be logged.
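For example, a crontab entry along the following lines (the schedule shown is only illustrative) would run the check every 30 minutes during regular US trading hours:
*/30 9-16 * * mon-fri quantrocket account balance --latest --below 'Cushion:0.05' --fields 'NetLiquidation' 'Cushion' | quantrocket flightlog log --name 'quantrocket.account' --level 'CRITICAL'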
Account balance history
Whenever you're connected to IB, QuantRocket pings IB every few minutes and saves your latest account balance details to your database. One reading per day (if available) is retained permanently to provide a historical record of your account balances over time. This is used by the blotter for performance tracking. You can download a CSV of your available account balance history:
$ quantrocket account balance --outfile balances.csv
>>> from quantrocket.account import download_account_balances
>>> download_account_balances("balances.csv")
>>> balances = pd.read_csv("balances.csv")
$ curl 'http://houston/account/balances.csv' > balances.csv
IB portfolio
You can check your current IB portfolio without logging into Trader Workstation:
$ quantrocket account portfolio | csvlook -I
| Account | ConId | Description | Position | UnrealizedPnl | RealizedPnl | MarketPrice | ...
| -------- | --------- | ------------------------ | ---------- | ------------- | ----------- | ------------ |
| DU123456 | 255253337 | MXP FUT @GLOBEX 20180618 | -1.0 | 1173.72 | 0.0 | 0.0504276 |
| DU123456 | 35045199 | USD.MXN CASH @IDEALPRO | -24402.0 | 11960.16 | 0.0 | 19.7354698 |
| DU123456 | 185291219 | WALMEX STK @MEXI | 165.0 | 796.8 | 0.0 | 48.92274855 |
| DU123456 | 253190540 | EWI STK @ARCA | 109.0 | -2.03 | 0.0 | 32.38597105 |
>>> from quantrocket.account import download_account_portfolio
>>> import io
>>> f = io.StringIO()
>>> download_account_portfolio(f)
>>> portfolio = pd.read_csv(f, parse_dates=["LastUpdated"])
>>> portfolio.head()
Account ConId Description Position UnrealizedPnl RealizedPnl MarketPrice ...
0 DU123456 255253337 MXP FUT @GLOBEX 20180618 -1.0 1173.72 0.0 0.050428
1 DU123456 35045199 USD.MXN CASH @IDEALPRO -24402.0 12368.15 0.0 19.718750
2 DU123456 185291219 WALMEX STK @MEXI 165.0 796.80 0.0 48.922749
3 DU123456 253190540 EWI STK @ARCA 109.0 -2.03 0.0 32.385971
$ curl -X GET 'http://houston/account/portfolio.csv' | csvlook -I
| Account | ConId | Description | Position | UnrealizedPnl | RealizedPnl | MarketPrice | ...
| -------- | --------- | ------------------------ | ---------- | ------------- | ----------- | ------------ |
| DU123456 | 255253337 | MXP FUT @GLOBEX 20180618 | -1.0 | 1173.72 | 0.0 | 0.0504276 |
| DU123456 | 35045199 | USD.MXN CASH @IDEALPRO | -24402.0 | 11960.16 | 0.0 | 19.7354698 |
| DU123456 | 185291219 | WALMEX STK @MEXI | 165.0 | 796.8 | 0.0 | 48.92274855 |
| DU123456 | 253190540 | EWI STK @ARCA | 109.0 | -2.03 | 0.0 | 32.38597105 |
The portfolio is a basic snapshot of what is visible in TWS. Checking your portfolio requires IB Gateway to be connected and is mainly intended to be used when you can't log in to Trader Workstation because your login is being used by IB Gateway. Only the current portfolio is available; historical performance tracking is provided separately by QuantRocket's blotter.
Exchange rates
To support currency conversions between your base currency and other currencies you might trade, QuantRocket collects daily exchange rates and stores them in your database. Exchange rates come from the European Central Bank, which updates them each business day at 4 PM CET.
You probably won't need to query the exchange rates directly very often, but you can if needed. You can check the latest exchange rates:
$ quantrocket account rates --latest | csvlook -I
| BaseCurrency | QuoteCurrency | Rate | Date |
| ------------ | ------------- | ------- | ---------- |
| USD | AUD | 1.2774 | 2018-01-09 |
| USD | CAD | 1.2425 | 2018-01-09 |
| USD | CHF | 0.98282 | 2018-01-09 |
...
>>> from quantrocket.account import download_exchange_rates
>>> import io
>>> import pandas as pd
>>> f = io.StringIO()
>>> download_exchange_rates(f, latest=True)
>>> rates = pd.read_csv(f, parse_dates=["Date"])
>>> rates.head()
BaseCurrency QuoteCurrency Rate Date
0 USD AUD 1.2774 2018-01-09
1 USD CAD 1.2425 2018-01-09
2 USD CHF 0.98282 2018-01-09
...
$ curl 'http://houston/account/rates.csv?latest=true'
BaseCurrency,QuoteCurrency,Rate,Date
USD,AUD,1.2774,2018-01-09
USD,CAD,1.2425,2018-01-09
USD,CHF,0.98282,2018-01-09
...
Or download a CSV of all exchange rates stored in your database:
$ quantrocket account rates --outfile rates.csv
>>> from quantrocket.account import download_exchange_rates
>>> download_exchange_rates("rates.csv")
>>> rates = pd.read_csv("rates.csv")
$ curl 'http://houston/account/rates.csv' > rates.csv
Note on CNH (offshore Yuan)
The European Central Bank provides exchange rates for CNY (onshore Yuan) but not CNH (offshore Yuan). Some IB products are denominated in CNH. To facilitate currency conversions of CNH-denominated products, QuantRocket returns CNY rates for both CNH and CNY. CNY and CNH exchange rates are typically very similar but not identical. We believe this approximation will be satisfactory for most QuantRocket use cases.
Orders and Positions
You can use QuantRocket's blotter service to place, monitor, and cancel orders, track open positions, and record and analyze live trading performance.
In trading terminology, a "blotter" is a detailed log or record of orders and executions. In QuantRocket the blotter is not only used for tracking orders but for placing orders as well.
Place orders
You can place orders from a CSV or JSON file, or directly from the CLI or Python client. A CSV of orders should have one order per row:
$
$ csvlook -I orders.csv
| ConId | Account | Action | OrderRef | TotalQuantity | Exchange | OrderType | Tif |
| ------- | -------- | ------ | -------- | ------------- | -------- | --------- | --- |
| 265598 | DU123456 | BUY | dma-tech | 500 | SMART | MKT | DAY |
| 3691937 | DU123456 | BUY | dma-tech | 50 | SMART | MKT | DAY |
For live trading, Moonshot produces a CSV of orders similar to the above example. A JSON file of orders can also be used and should consist of an array of orders:
$
$ cat orders.json
[
{
"ConId": 265598,
"Account": "DU123456",
"Action": "BUY",
"OrderRef": "dma-tech",
"TotalQuantity": 500,
"Exchange": "SMART",
"OrderType": "MKT",
"Tif": "DAY"
},
{
"ConId": 3691937,
"Account": "DU123456",
"Action": "BUY",
"OrderRef": "dma-tech",
"TotalQuantity": 50,
"Exchange": "SMART",
"OrderType": "MKT",
"Tif": "DAY"
}
]
Use the blotter to place the orders in the file. The order IDs will be returned:
$ quantrocket blotter order -f orders.csv
6001:25
6001:26
>>> from quantrocket.blotter import place_orders
>>> place_orders(infilepath_or_buffer="orders.csv")
['6001:25', '6001:26']
$ curl -X POST 'http://houston/blotter/orders' --upload-file orders.csv
["6001:25", "6001:26"]
Instead of submitting a pre-made file of orders, you can also create orders directly in Python:
>>> from quantrocket.blotter import place_orders
>>> orders = []
>>> order1 = {
"ConId": 265598,
"Account": "DU123456",
"Action": "BUY",
"OrderRef": "dma-tech",
"TotalQuantity": 500,
"Exchange": "SMART",
"OrderType": "MKT",
"Tif": "DAY"
}
>>> orders.append(order1)
>>> order2 = {
"ConId": 3691937,
"Account": "DU123456",
"Action": "BUY",
"OrderRef": "dma-tech",
"TotalQuantity": 50,
"Exchange": "SMART",
"OrderType": "MKT",
"Tif": "DAY"
}
>>> orders.append(order2)
>>> order_ids = place_orders(orders)
Alternatively, you can place an order by specifying the order parameters directly on the command line. This approach is limited to placing one order at a time but is useful for testing and experimentation as well as one-off orders:
$
$ quantrocket blotter order --params 'ConId:265598' 'Action:BUY' 'Exchange:SMART' 'TotalQuantity:500' 'OrderType:MKT' 'Tif:DAY' 'Account:DU123456' 'OrderRef:dma-tech'
6001:27
Order fields
IB offers a large assortment of order types and algos. Learn about the available order types on IB's website, and refer to the IB API documentation for API example orders and a full list of possible order parameters. It can be helpful to manually create an order in Trader Workstation to familiarize yourself with the order attributes before trying to create the order via the API.
Order fields in QuantRocket should always use UpperCamelCase, that is, a concatenation of capitalized words, e.g. "OrderType". (Within the IB API documentation itself you will sometimes see UpperCamelCase and sometimes lowerCamelCase depending on the programming language.)
Required fields
The following fields are required when placing an order:
- ConId: the unique contract identifier for the security/instrument
- Action: "BUY" or "SELL"
- TotalQuantity: the number of shares or contracts to order
- OrderType: the order type, e.g. "MKT" or "LMT"
- Tif: the time-in-force, e.g. "DAY" or "GTC" (good-till-canceled)
- OrderRef: a user-defined identifier used to associate the order with a trading strategy
- Exchange: the exchange to route the order to (not necessarily the primary listing exchange), e.g. "SMART" or "NYSE". To see the available exchanges for a security, check the ValidExchanges field in the master file (quantrocket master get), or use Trader Workstation.
- Account: the account number; required if connected to multiple accounts, as explained below
Specifying the account number in the Account field is a best practice and is required if IB Gateway is connected to more than one account. (Moonshot order CSVs always include the Account field.) If Account is not specified and the blotter (via the IB Gateway services) is only connected to one account, that account will be used. If Account is not specified and the blotter is connected to multiple accounts, the orders will be rejected:
$ quantrocket blotter order --params 'ConId:265598' 'Action:BUY' 'Exchange:SMART' 'TotalQuantity:500' 'OrderType:MKT' 'Tif:Day' 'OrderRef:dma-tech'
msg: 'no account specified and cannot infer because multiple accounts connected (connected
accounts: DU12345,U12345; order:
{"ConId": "265598", "Action": "BUY", "Exchange": "SMART", "TotalQuantity": "500",
"OrderType": "MKT", "Tif": "Day", "OrderRef": "dma-tech"}'
status: error
The OrderRef field
IB provides the OrderRef field to allow users to assign arbitrary labels to orders for the user's own tracking purposes. In QuantRocket, the OrderRef field is required, as it is used to associate orders with a particular trading strategy. For orders generated by Moonshot, the strategy code (e.g. "dma-tech") is used as the order ref. This enables the blotter to track positions and performance on a strategy-by-strategy basis.
Order IDs
When you place orders, the blotter generates and returns unique order IDs for each order:
$ quantrocket blotter order -f orders.csv
6001:25
6001:26
>>> from quantrocket.blotter import place_orders
>>> place_orders(infilepath_or_buffer="orders.csv")
['6001:25', '6001:26']
$ curl -X POST 'http://houston/blotter/orders' --upload-file orders.csv
["6001:25", "6001:26"]
Order IDs are used internally by the blotter and can be used to check order statuses or cancel orders. You can also check order statuses or cancel orders based on other lookups such as the order ref, account, or conid, so it is typically not necessary to hold on to the order IDs.
Order IDs take the form <ClientId:OrderNum>, where ClientId is the ID used by the blotter (the client) to connect to the IB API, and OrderNum is an auto-incrementing order number.
Parent-child orders, aka attached orders
IB provides the concept of attached orders, whereby a "parent" and "child" order are submitted to IB at the same time, but IB only activates the child order and submits it to the exchange if the parent order executes. Attached orders can be used for bracket orders and hedging orders, and can also be used in Moonshot to attach exit orders to entry orders.
Submitting an attached order requires adding a ParentId attribute to the child order, which should be set to the OrderId of the parent order. The following example CSV includes a market order to BUY 100 shares of AAPL, as well as a child order to sell 100 shares of AAPL at the close.
$ csvlook -I parent_child_orders.csv
| ConId | Account | Action | OrderRef | TotalQuantity | Exchange | OrderType | Tif | OrderId | ParentId |
| ------ | -------- | ------ | --------- | ------------- | -------- | --------- | --- | ------- | -------- |
| 265598 | DU123456 | BUY | strategy1 | 100 | SMART | MKT | DAY | 1 | |
| 265598 | DU123456 | SELL | strategy1 | 100 | SMART | MOC | DAY | | 1 |
The ParentId of the second order links it as a child order to the OrderId of the first order. Note that the OrderId and ParentId fields in your orders file are not the actual order IDs used by the blotter. The blotter uses OrderId/ParentId (if provided) to identify linked orders but then generates the actual order IDs at the time of order submission to IB. Therefore any numbers can be used for OrderId/ParentId as long as they are unique within the file.
The parent order must precede the child order in the orders file.
The blotter expects parent-child orders to be submitted within the same file. Attaching child orders to parent orders that were placed at a previous time is not supported.
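If you prefer to construct attached orders in Python rather than in a CSV, the same OrderId/ParentId fields should be usable when passing order dicts to place_orders. The sketch below mirrors the CSV example above; treating OrderId/ParentId as dict keys is an assumption based on the CSV/JSON formats described here, and the placeholder ID values are arbitrary:
>>> from quantrocket.blotter import place_orders
>>> parent = {
    "OrderId": 1,               # placeholder ID, used only to link the child order
    "ConId": 265598,
    "Account": "DU123456",
    "Action": "BUY",
    "OrderRef": "strategy1",
    "TotalQuantity": 100,
    "Exchange": "SMART",
    "OrderType": "MKT",
    "Tif": "DAY"
}
>>> child = {
    "ParentId": 1,              # links this order to the parent above
    "ConId": 265598,
    "Account": "DU123456",
    "Action": "SELL",
    "OrderRef": "strategy1",
    "TotalQuantity": 100,
    "Exchange": "SMART",
    "OrderType": "MOC",
    "Tif": "DAY"
}
>>> order_ids = place_orders([parent, child])  # parent must precede child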
IB execution algos
IB provides various execution algos which can be helpful for working large orders into the market. In the IB API, these are specified by the AlgoStrategy and AlgoParams fields. The AlgoParams field is a nested field which expects a list of multiple algo-specific parameters. When submitting orders via a JSON file or directly via Python, AlgoParams can be provided in a nested format. Here is an example of a VWAP order:
>>> orders = []
>>> order1 = {
"ConId": 265598,
"Account": "DU123456",
"Action": "BUY",
"OrderRef": "dma-tech",
"TotalQuantity": 10000,
"Exchange": "SMART",
"OrderType": "LMT",
"LmtPrice": 104.30,
"AlgoStrategy": "Vwap",
"AlgoParams": {
"maxPctVol": 0.1,
"noTakeLiq": 1,
},
"Tif": "DAY"
}
>>> orders.append(order1)
>>> place_orders(orders)
Since CSV is a flat-file format, a CSV orders file requires a different syntax for AlgoParams. Algo parameters can be specified using underscore separators, e.g. AlgoParams_maxPctVol:
$ csvlook -I vwap_orders.csv
| ConId | Account | Action | OrderRef | TotalQuantity | AlgoStrategy | AlgoParams_maxPctVol | AlgoParams_noTakeLiq | ...
| ------ | -------- | ------ | -------- | ------------- | ------------ | -------------------- | -------------------- |
| 265598 | DU123456 | BUY | dma-tech | 10000 | Vwap | 0.1 | 1 |
In the above example, carefully note that AlgoParams is UpperCamelCase like other order fields, but the nested parameters (e.g. maxPctVol) are lowerCamelCase.
Order status
You can check order statuses based on a variety of lookups including the order ref, account, conid, order ID, or date range the order was submitted. For example, you could check the order statuses of all orders associated with a particular order ref and submitted on or after a particular date (such as today's date):
$ quantrocket blotter status -r 'my-strategy' -s '2018-05-18' | csvlook -I
| OrderId | ConId | Action | TotalQuantity | Account | OrderRef | Status | Filled | Remaining | ...
| ------- | ------ | ------ | ------------- | -------- | ----------- | ------------ | ------ | --------- |
| 6001:61 | 265598 | BUY | 100 | DU123456 | my-strategy | Filled | 100 | 0 |
| 6001:62 | 265598 | SELL | 100 | DU123456 | my-strategy | PreSubmitted | 0 | 100 |
>>> from quantrocket.blotter import download_order_statuses
>>> import io
>>> f = io.StringIO()
>>> download_order_statuses(f, order_refs=["my-strategy"], start_date="2018-05-18")
>>> statuses = pd.read_csv(f, parse_dates=["Submitted"])
>>> statuses.head()
OrderId Submitted ConId Action TotalQuantity Account OrderRef Status Filled Remaining Errors
0 6001:61 2018-05-18 18:10:29 265598 BUY 100 DU123456 my-strategy Filled 100 0 NaN
1 6001:62 2018-05-18 18:10:29 265598 SELL 100 DU123456 my-strategy PreSubmitted 0 100 NaN
$ curl -X GET 'http://houston/blotter/orders.csv?order_refs=my-strategy&start_date=2018-05-18' | csvlook -I
| OrderId | ConId | Action | TotalQuantity | Account | OrderRef | Status | Filled | Remaining | ...
| ------- | ------ | ------ | ------------- | -------- | ----------- | ------------ | ------ | --------- |
| 6001:61 | 265598 | BUY | 100 | DU123456 | my-strategy | Filled | 100 | 0 |
| 6001:62 | 265598 | SELL | 100 | DU123456 | my-strategy | PreSubmitted | 0 | 100 |
You'll see the order status as well as the shares filled and shares remaining. Open orders as well as completed orders are included. Optionally, you can show open orders only (this filter can also be combined with other filters):
$ quantrocket blotter status --open | csvlook -I
| OrderId | ConId | Action | TotalQuantity | Account | OrderRef | Status | Filled | Remaining | ...
| ------- | --------- | ------ | ------------- | -------- | --------------- | ------------ | ------ | --------- |
| 6001:62 | 265598 | SELL | 100 | DU123456 | my-strategy | PreSubmitted | 0 | 100 |
| 6001:64 | 269745169 | BUY | 1 | DU123456 | es-fut-daytrade | Submitted | 0 | 1 |
>>> f = io.StringIO()
>>> download_order_statuses(f, open_orders=True)
>>> statuses = pd.read_csv(f, parse_dates=["Submitted"])
>>> statuses.head()
OrderId Submitted ConId Action TotalQuantity Account OrderRef Status Filled Remaining Errors
0 6001:62 2018-05-18 18:10:29 265598 SELL 100 DU123456 my-strategy PreSubmitted 0 100 NaN
1 6001:64 2018-05-18 18:33:08 269745169 BUY 1 DU123456 es-fut-daytrade Submitted 0 1 NaN
$ curl -X GET 'http://houston/blotter/orders.csv?open_orders=true' | csvlook -I
| OrderId | ConId | Action | TotalQuantity | Account | OrderRef | Status | Filled | Remaining | ...
| ------- | --------- | ------ | ------------- | -------- | --------------- | ------------ | ------ | --------- |
| 6001:62 | 265598 | SELL | 100 | DU123456 | my-strategy | PreSubmitted | 0 | 100 |
| 6001:64 | 269745169 | BUY | 1 | DU123456 | es-fut-daytrade | Submitted | 0 | 1 |
You can request that additional order fields be returned:
$
$
$ quantrocket blotter status --order-ids '6001:64' --fields 'OrderType' 'LmtPrice' --json | json2yaml
---
-
OrderId: "6001:64"
Submitted: "2018-05-18T18:33:08+00:00"
ConId: 269745169
Action: "BUY"
TotalQuantity: 1
Account: "DU123456"
OrderRef: "es-fut-daytrade"
LmtPrice: 2000
OrderType: "LMT"
Status: "Submitted"
Filled: 0
Remaining: 1
Errors: null
>>> f = io.StringIO()
>>>
>>> download_order_statuses(f, order_ids=["6001:64"], fields=["OrderType", "LmtPrice"])
>>> statuses = pd.read_csv(f, parse_dates=["Submitted"])
>>> statuses.to_dict(orient="records")
[{'Account': 'DU123456',
'Action': 'BUY',
'ConId': 269745169,
'Errors': nan,
'Filled': 0,
'LmtPrice': 2000.0,
'OrderId': '6001:64',
'OrderRef': 'es-fut-daytrade',
'OrderType': 'LMT',
'Remaining': 1,
'Status': 'Submitted',
'Submitted': Timestamp('2018-05-18 18:33:08'),
'TotalQuantity': 1}]
$
$
$ curl -X GET 'http://houston/blotter/orders.json?order_ids=6001%3A64&fields=OrderType&fields=LmtPrice' | json2yaml
---
-
OrderId: "6001:64"
Submitted: "2018-05-18T18:33:08+00:00"
ConId: 269745169
Action: "BUY"
TotalQuantity: 1
Account: "DU123456"
OrderRef: "es-fut-daytrade"
LmtPrice: 2000
OrderType: "LMT"
Status: "Submitted"
Filled: 0
Remaining: 1
Errors: null
Because there are many possible order parameters and because IB periodically adds new parameters, not every order parameter is saved to its own field in the blotter database. Order parameters which aren't saved to their own field are saved in JSON format to a common field called OrderDetailsJson. You can pass a "?" or any invalid fieldname to see the list of available fields; if the field you want is missing, it's stored in OrderDetailsJson:
$
$ quantrocket blotter status --field '?'
msg: 'unknown order status fields: ? (available fields are: Account, Action, AdjustableTrailingUnit,
AdjustedStopLimitPrice, AdjustedStopPrice, AdjustedTrailingAmount, AlgoId, AlgoStrategy,
AllOrNone, AuxPrice, BlockOrder, ClientId, ConId, DiscretionaryAmt, DisplaySize,
Errors, Exchange, FaGroup, FaMethod, FaPercentage, FaProfile, Filled, GoodAfterTime,
GoodTillDate, Hidden, LmtPrice, LmtPriceOffset, MinQty, NotHeld, OcaGroup, OcaType,
OpenClose, OrderDetailsJson, OrderId, OrderNum, OrderRef, OrderType, Origin, OutsideRth,
ParentId, PercentOffset, PermId, Remaining, Status, Submitted, SweepToFill, Tif,
TotalQuantity, TrailStopPrice, TrailingPercent, Transmit, TriggerMethod, TriggerPrice,
WhatIf'
status: error
$
$
$ quantrocket blotter status -d '6001:65' --fields 'AlgoStrategy' 'OrderDetailsJson' --json | json2yaml
---
-
OrderId: "6001:65"
Submitted: "2018-05-18T19:02:25+00:00"
ConId: 265598
Action: "BUY"
TotalQuantity: 10000
Account: "DU123456"
OrderRef: "my-strategy"
OrderDetailsJson:
AlgoParams:
maxPctVol: 0.1
noTakeLiq: 0
AlgoStrategy: "Vwap"
Status: "Submitted"
Filled: 4000
Remaining: 6000
Errors: null
>>> f = io.StringIO()
>>>
>>> download_order_statuses(f, fields=["?"])
HTTPError: ('400 Client Error: BAD REQUEST for url: http://houston/blotter/orders.csv?fields=%3F', {'status': 'error', 'msg': 'unknown order status fields: ? (available fields are: Account,Action, AdjustableTrailingUnit, AdjustedStopLimitPrice, AdjustedStopPrice, AdjustedTrailingAmount, AlgoId, AlgoStrategy, AllOrNone, AuxPrice, BlockOrder, ClientId, ConId, DiscretionaryAmt, DisplaySize, Errors, Exchange, FaGroup, FaMethod, FaPercentage, FaProfile, Filled, GoodAfterTime, GoodTillDate, Hidden, LmtPrice, LmtPriceOffset, MinQty, NotHeld, OcaGroup, OcaType, OpenClose, OrderDetailsJson, OrderId, OrderNum, OrderRef, OrderType, Origin, OutsideRth, ParentId, PercentOffset, PermId, Remaining, Status, Submitted, SweepToFill, Tif, TotalQuantity, TrailStopPrice, TrailingPercent, Transmit, TriggerMethod, TriggerPrice, WhatIf'})
>>>
>>>
>>> download_order_statuses(f, order_ids=["6001:65"], fields=["AlgoStrategy", "OrderDetailsJson"])
>>> statuses = pd.read_csv(f, parse_dates=["Submitted"])
>>> statuses.iloc[0]
OrderId 6001:65
Submitted 2018-05-18 19:02:25
ConId 265598
Action BUY
TotalQuantity 1000
Account DU123456
OrderRef my-strategy
OrderDetailsJson {'AlgoParams': {'maxPctVol': 0.1, 'noTakeLiq': 0}}
AlgoStrategy Vwap
Status Submitted
Filled 0
Remaining 1000
Errors NaN
$
$ curl -X GET 'http://houston/blotter/orders.csv?fields=?'
{"status": "error", "msg": "unknown order status fields: ? (available fields are: Account, Action, AdjustableTrailingUnit, AdjustedStopLimitPrice, AdjustedStopPrice, AdjustedTrailingAmount,AlgoId, AlgoStrategy, AllOrNone, AuxPrice, BlockOrder, ClientId, ConId, DiscretionaryAmt, DisplaySize, Errors, Exchange, FaGroup, FaMethod, FaPercentage, FaProfile, Filled, GoodAfterTime, GoodTillDate, Hidden, LmtPrice, LmtPriceOffset, MinQty, NotHeld, OcaGroup, OcaType, OpenClose, OrderDetailsJson, OrderId, OrderNum, OrderRef, OrderType, Origin, OutsideRth, ParentId, PercentOffset, PermId, Remaining, Status, Submitted, SweepToFill, Tif, TotalQuantity, TrailStopPrice, TrailingPercent, Transmit, TriggerMethod, TriggerPrice, WhatIf"}
$
$
$ curl -X GET 'http://houston/blotter/orders.json?order_ids=6001%3A65&fields=AlgoStrategy&fields=OrderDetailsJson' | json2yaml
---
-
OrderId: "6001:65"
Submitted: "2018-05-18T19:02:25+00:00"
ConId: 265598
Action: "BUY"
TotalQuantity: 10000
Account: "DU123456"
OrderRef: "my-strategy"
OrderDetailsJson:
AlgoParams:
maxPctVol: 0.1
noTakeLiq: 0
AlgoStrategy: "Vwap"
Status: "Submitted"
Filled: 4000
Remaining: 6000
Errors: null
Possible order statuses
The IB API defines the following order statuses:
- ApiPending - indicates the order has not yet been sent to the IB server, for instance if there is a delay in receiving the security definition. Uncommonly received.
- PendingSubmit - indicates the order was sent from TWS, but confirmation has not been received that it has been received by the destination. Most commonly because the exchange is closed.
- PendingCancel - indicates that a request has been sent to cancel an order but confirmation has not been received of its cancellation.
- PreSubmitted - indicates that a simulated order type has been accepted by the IB system and that this order has yet to be elected. The order is held in the IB system until the election criteria are met. At that time the order is transmitted to the order destination as specified.
- Submitted - indicates that your order has been accepted at the order destination and is working.
- ApiCancelled - after an order has been submitted and before it has been acknowledged, an API client can request its cancellation, producing this state.
- Cancelled - indicates that the balance of your order has been confirmed cancelled by the IB system. This could occur unexpectedly when IB or the destination has rejected your order.
- Filled - indicates that the order has been completely filled.
- Inactive - indicates an order is not working; possible reasons include:
  - it is invalid or triggered an error. A corresponding error code is expected.
  - the order is to short shares but the order is being held while shares are being located.
  - an order is placed manually in TWS while the exchange is closed.
  - an order is blocked by TWS due to a precautionary setting and appears there in an untransmitted state.
- Error - this order status is provided by QuantRocket for orders that are immediately rejected by IB's system and thus never receive an order status from IB.
Order errors and rejections
Your order might be rejected by the blotter or (more commonly) by IB or the exchange. The blotter performs basic validation of your orders such as making sure required fields are present:
$ quantrocket blotter order -p 'ConId:269745169' 'Action:BUY' 'OrderType:MKT' 'Tif:DAY' 'TotalQuantity:1'
msg: 'missing required fields OrderRef,Exchange,Account for order: {"ConId": "269745169",
"Action": "BUY", "OrderType": "MKT", "Tif": "DAY", "TotalQuantity": "1"}'
status: error
If the blotter rejects your orders, as indicated by an error message being returned, this means the whole batch of orders was rejected. In other words, either all of the orders are submitted to IB, or none are.
In contrast, if the batch of orders is submitted to IB (as indicated by the blotter returning a list of order IDs), IB and/or the exchange will accept or reject each order independently. You can check the order status to see if the order was rejected or cancelled. Any error messages from IB will be provided in the Errors field. For example, if you don't have sufficient equity in your account, you might see an error like this:
$ quantrocket blotter status -d '6003:15' --json | json2yaml
---
-
OrderId: "6003:15"
Submitted: "2018-02-20T16:59:40+00:00"
ConId: 3691937
Action: "SELL"
TotalQuantity: 300
Account: "DU123456"
OrderRef: "my-strategy"
Status: "Cancelled"
Filled: 0
Remaining: 300
Errors:
-
ErrorCode: 202
ErrorMsg: "Order Canceled - reason:Your order is not accepted because your Equity with Loan Value of [499521.99 USD] is insufficient to cover the Initial Margin requirement of [537520.21 USD]\n"
>>> f = io.StringIO()
>>> download_order_statuses(f, order_ids=["6003:15"])
>>> statuses = pd.read_csv(f, parse_dates=["Submitted"])
>>> statuses.to_dict(orient="records")
[{'Account': 'DU123456',
'Action': 'SELL',
'ConId': 3691937,
'Errors': '[{"ErrorCode": 202, "ErrorMsg": "Order Canceled - reason:Your order is not accepted because your Equity with Loan Value of [499521.99 USD] is insufficient to cover the Initial Margin requirement of [537520.21 USD]\n"}]',
'Filled': 0,
'OrderId': '6003:15',
'OrderRef': 'my-strategy',
'Remaining': 300,
'Status': 'Cancelled',
'Submitted': Timestamp('2018-02-20 16:59:40'),
'TotalQuantity': 300}]
$ curl -X GET 'http://houston/blotter/orders.json?order_ids=6003%3A15' | json2yaml
---
-
OrderId: "6003:15"
Submitted: "2018-02-20T16:59:40+00:00"
ConId: 3691937
Action: "SELL"
TotalQuantity: 300
Account: "DU123456"
OrderRef: "my-strategy"
Status: "Cancelled"
Filled: 0
Remaining: 300
Errors:
-
ErrorCode: 202
ErrorMsg: "Order Canceled - reason:Your order is not accepted because your Equity with Loan Value of [499521.99 USD] is insufficient to cover the Initial Margin requirement of [537520.21 USD]\n"
Error messages don't always mean the order was rejected or cancelled. Some errors are more like informational warnings (for example, error 404 when shares aren't available for shorting: "Order held while securities are located"). Always check the specific error message and accompanying order status. You can look up the error code in IB's API documentation to get more information about the error, or open a support ticket with IB customer service.
One error that bears special mention because it is potentially confusing is error code 200: "No security definition has been found for the request." Normally, this error occurs when a security has been delisted and is no longer available in IB's database. However, in the context of order statuses, you can receive error code 200 for a valid conid if you try to route the order to an invalid exchange for the security:
$
$ quantrocket blotter order -p 'ConId:265598' 'Action:BUY' 'OrderType:MKT' 'Exchange:GLOBEX' 'Tif:DAY' 'OrderRef:my-strategy' 'TotalQuantity:100'
6001:66
$ quantrocket blotter status -d '6001:66' --json | json2yaml
---
-
OrderId: "6001:66"
Submitted: "2018-05-18T20:37:25+00:00"
ConId: 265598
Action: "BUY"
TotalQuantity: 100
Account: "DU123456"
OrderRef: "my-strategy"
Status: "Error"
Filled: 0
Remaining: 100
Errors:
-
ErrorCode: 200
ErrorMsg: "No security definition has been found for the request"
Cancel orders
You can cancel orders by order ID, account, conid, or order ref. For example, cancel all open orders for a particular order ref:
$ quantrocket blotter cancel --order-refs 'my-strategy'
order_ids:
- 6001:62
- 6001:65
status: the orders will be canceled asynchronously
>>> from quantrocket.blotter import cancel_orders
>>> cancel_orders(order_refs=["my-strategy"])
{'order_ids': ['6001:62', '6001:65'],
'status': 'the orders will be canceled asynchronously'}
$ curl -X DELETE 'http://houston/blotter/orders?order_refs=my-strategy'
{"order_ids": ["6001:62", "6001:65"], "status": "the orders will be canceled asynchronously"}
Or cancel all open orders:
$ quantrocket blotter cancel --all
order_ids:
- 6001:66
- 6001:67
- 6001:70
status: the orders will be canceled asynchronously
>>> from quantrocket.blotter import cancel_orders
>>> cancel_orders(cancel_all=True)
{'order_ids': ['6001:66', '6001:67', '6001:70'],
'status': 'the orders will be canceled asynchronously'}
$ curl -X DELETE 'http://houston/blotter/orders?cancel_all=true'
{"order_ids": ["6001:66", "6001:67", "6001:70"], "status": "the orders will be canceled asynchronously"}
Canceling an order submits the cancellation request to IB. To verify that the orders were actually cancelled, check the order status:
$ quantrocket blotter status -d '6001:62' --json | json2yaml
---
-
OrderId: "6001:62"
Submitted: "2018-05-18T18:33:08+00:00"
ConId: 265598
Action: "BUY"
TotalQuantity: 100
Account: "DU12345"
OrderRef: "my-strategy"
Status: "Cancelled"
Filled: 0
Remaining: 100
Errors:
-
ErrorCode: 202
ErrorMsg: "Order Canceled - reason:"
>>> f = io.StringIO()
>>> download_order_statuses(f, order_ids=["6001:62"])
>>> statuses = pd.read_csv(f, parse_dates=["Submitted"])
>>> statuses.to_dict(orient="records")
[{'Account': 'DU12345',
'Action': 'BUY',
'ConId': 265598,
'Errors': '[{"ErrorCode": 202, "ErrorMsg": "Order Canceled - reason:"}]',
'Filled': 0,
'OrderId': '6001:62',
'OrderRef': 'my-strategy',
'Remaining': 100,
'Status': 'Cancelled',
'Submitted': Timestamp('2018-05-18 18:33:08'),
'TotalQuantity': 100}]
$ curl -X GET 'http://houston/blotter/orders.json?order_ids=6001%3A62' | json2yaml
---
-
OrderId: "6001:62"
Submitted: "2018-05-18T18:33:08+00:00"
ConId: 265598
Action: "BUY"
TotalQuantity: 100
Account: "DU12345"
OrderRef: "my-strategy"
Status: "Cancelled"
Filled: 0
Remaining: 100
Errors:
-
ErrorCode: 202
ErrorMsg: "Order Canceled - reason:"
Track positions
The blotter tracks your positions by account, conid, and order ref:
$ quantrocket blotter positions | csvlook -I
| Account | OrderRef | ConId | Quantity |
| -------- | ---------------- | --------- | -------- |
| DU123456 | dma-tech | 265598 | 541 |
| DU123456 | dma-tech | 3691937 | 108 |
| DU123456 | my-strategy | 265598 | 200 |
| U1234567 | es-fut-daytrade | 269745169 | -1 |
| U1234567 | my-strategy | 265598 | -100 |
>>> from quantrocket.blotter import download_positions
>>> import io
>>> f = io.StringIO()
>>> download_positions(f)
>>> positions = pd.read_csv(f)
>>> positions.head()
Account OrderRef ConId Quantity
0 DU123456 dma-tech 265598 541
1 DU123456 dma-tech 3691937 108
2 DU123456 my-strategy 265598 200
3 U1234567 es-fut-daytrade 269745169 -1
4 U1234567 my-strategy 265598 -100
$ curl -X GET 'http://houston/blotter/positions.csv' | csvlook -I
| Account | OrderRef | ConId | Quantity |
| -------- | ---------------- | --------- | -------- |
| DU123456 | dma-tech | 265598 | 541 |
| DU123456 | dma-tech | 3691937 | 108 |
| DU123456 | my-strategy | 265598 | 200 |
| U1234567 | es-fut-daytrade | 269745169 | -1 |
| U1234567 | my-strategy | 265598 | -100 |
The blotter tracks positions by order ref so that multiple trading strategies can trade the same security and independently manage their positions. (Moonshot uses the blotter to take account of existing positions when generating orders.) IB does not track or report positions by order ref (only by account and conid), so the blotter tracks positions independently by monitoring trade executions.
For casual viewing of your portfolio where segregation by order ref isn't required, you may find the account portfolio endpoint more convenient than using the blotter. The account portfolio endpoint provides a basic snapshot of what is visible in TWS, including descriptive labels for your positions (the blotter shows conids only), realized and unrealized PNL, and several other fields.
Close positions
You can use the blotter to generate a CSV of orders to close existing positions by account, conid, and/or order ref. Suppose you hold the following positions for a particular strategy:
$ quantrocket blotter positions --order-refs 'dma-tech' | csvlook -I
| Account | OrderRef | ConId | Quantity |
| -------- | -------- | ------- | -------- |
| DU123456 | dma-tech | 265598 | 1001 |
| DU123456 | dma-tech | 3691937 | -108 |
>>> f = io.StringIO()
>>> download_positions(f, order_refs=["dma-tech"])
>>> positions = pd.read_csv(f)
>>> positions.head()
Account OrderRef ConId Quantity
0 DU123456 dma-tech 265598 1001
1 DU123456 dma-tech 3691937 -108
$ curl -X GET 'http://houston/blotter/positions.csv?order_refs=dma-tech' | csvlook -I
| Account | OrderRef | ConId | Quantity |
| -------- | -------- | ------- | -------- |
| DU123456 | dma-tech | 265598 | 1001 |
| DU123456 | dma-tech | 3691937 | -108 |
To facilitate closing the positions, the blotter can generate a similar CSV output with the addition of an Action column set to "BUY" or "SELL" as needed to flatten the positions. You can specify additional order parameters to be appended to the CSV. In this example, we create SMART-routed market orders:
$ quantrocket blotter close --order-refs 'dma-tech' --params 'OrderType:MKT' 'Tif:Day' 'Exchange:SMART' | csvlook -I
| Account | OrderRef | ConId | TotalQuantity | Action | OrderType | Tif | Exchange |
| -------- | -------- | ------- | ------------- | ------ | --------- | --- | -------- |
| DU123456 | dma-tech | 265598 | 1001 | SELL | MKT | Day | SMART |
| DU123456 | dma-tech | 3691937 | 108 | BUY | MKT | Day | SMART |
>>> from quantrocket.blotter import close_positions
>>> import io
>>> f = io.StringIO()
>>> close_positions(f, order_refs=["dma-tech"], params={"OrderType":"MKT", "Tif":"Day", "Exchange":"SMART"})
>>> orders = pd.read_csv(f)
>>> orders.head()
Account OrderRef ConId TotalQuantity Action OrderType Tif Exchange
0 DU123456 dma-tech 265598 1001 SELL MKT Day SMART
1 DU123456 dma-tech 3691937 108 BUY MKT Day SMART
$ curl -X DELETE 'http://houston/blotter/positions.csv?order_refs=dma-tech&params=OrderType%3AMKT&params=Tif%3ADay&params=Exchange%3ASMART' | csvlook -I
| Account | OrderRef | ConId | TotalQuantity | Action | OrderType | Tif | Exchange |
| -------- | -------- | ------- | ------------- | ------ | --------- | --- | -------- |
| DU123456 | dma-tech | 265598 | 1001 | SELL | MKT | Day | SMART |
| DU123456 | dma-tech | 3691937 | 108 | BUY | MKT | Day | SMART |
Using the CLI, you can pipe the resulting orders CSV to the blotter to be placed:
$ quantrocket blotter close --order-refs 'dma-tech' --params 'OrderType:MKT' 'Tif:Day' 'Exchange:SMART' | quantrocket blotter order -f '-'
6001:79
6001:80
Any order parameters you specify using --params are applied to each order in the file. To set parameters that vary per order (such as limit prices), save the CSV to file, edit it, then submit the orders:
$ quantrocket blotter close --order-refs 'dma-tech' --params 'OrderType:LMT' 'LmtPrice:0' 'Exchange:SMART' -o orders.csv
$
$ quantrocket blotter order -f orders.csv
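The same workflow can be scripted in Python. This is only a sketch using the functions shown above; it assumes close_positions accepts a file path like the other download functions in this guide, and the limit prices assigned here are purely illustrative:
>>> import pandas as pd
>>> from quantrocket.blotter import close_positions, place_orders
>>> close_positions("orders.csv", order_refs=["dma-tech"],
                    params={"OrderType": "LMT", "LmtPrice": 0, "Exchange": "SMART"})
>>> orders = pd.read_csv("orders.csv")
>>> # set a different limit price per order (values shown are just examples)
>>> orders.loc[orders.ConId == 265598, "LmtPrice"] = 186.50
>>> orders.loc[orders.ConId == 3691937, "LmtPrice"] = 93.20
>>> orders.to_csv("orders.csv", index=False)
>>> order_ids = place_orders(infilepath_or_buffer="orders.csv")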
Close positions from TWS
If you prefer, you can close a position manually from within Trader Workstation. If you do so, make sure to enable the Order Ref field in TWS (field location varies by TWS screen and configuration) and set the appropriate order ref so that the blotter can associate the trade execution with the correct strategy:

Tracking the performance of your trading strategies after they go live is just as important as backtesting them before they go live. As D.E. Shaw once said, "Analyzing the results of live trading taught us things that couldn't be learned by studying historical data." QuantRocket saves all of your trade executions to the blotter database and makes it easy to analyze your live performance. You can plot your PNL (profit and loss) by strategy and account using Moonchart and overlay your live results with your backtests to measure implementation shortfall.
Once you've accumulated some live trading results, you can query your PNL (profit and loss) from the blotter, optionally filtering by account, order ref (=strategy code), conid, or date range. Moonchart, the library used for Moonshot backtest visualizations, is also designed to support live trading visualization:
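For example, you might download a CSV of PNL for a particular strategy and account and plot it with Moonchart. This is only a sketch: the strategy code, account, and start date are placeholders, and the Tearsheet.from_pnl_csv entry point is assumed here; check the Moonchart documentation for the exact constructor name in your installed version:
>>> from quantrocket.blotter import download_pnl
>>> from moonchart import Tearsheet
>>> download_pnl("pnl.csv", order_refs="my-strategy", accounts=["DU12345"],
                 start_date="2019-01-01")
>>> Tearsheet.from_pnl_csv("pnl.csv")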
The performance plots will look similar to those you get for a Moonshot backtest, plus a few additional PNL-specific plots:

The blotter can return a CSV of PNL results, or a PDF tear sheet created from the CSV. The CSV output can be loaded into a DataFrame:
>>> from quantrocket.blotter import download_pnl, read_pnl_csv
>>> download_pnl("pnl.csv")
>>> results = read_pnl_csv("pnl.csv")
>>> results.head()
japan-overnight canada-energy midcap-earnings t3-nyse
Field Date Time
AbsExposure 2016-01-04 09:30:00 0.449054 0.0724852 0 0
09:30:01 0.289684 0.0724852 0 0
09:30:02 0 0.0724852 0 0
09:40:03 0 0.109379 0 0
09:40:04 0 0.0520223 0 0
Similar to a Moonshot backtest, the DataFrame consists of several stacked DataFrames, one DataFrame per field (see the PNL field reference). Use .loc to isolate a particular field:
>>> pnl = results.loc["Pnl"]
>>> pnl.head()
japan-overnight canada-energy midcap-earnings t3-nyse
Date Time
2016-01-04 09:30:00 -6643.5 -152.325 0 0
09:30:01 -2349.53 0 0 0
09:30:02 -3014.09 0 0 0
09:40:03 0 903.324 0 0
09:40:04 0 -626.888 0 0
PNL is reported in your account's base currency. QuantRocket's blotter takes care of converting trades denominated in foreign currencies.
PNL result CSVs contain the following fields in a stacked format. Each field is a DataFrame:
- Pnl: the daily profit and loss after commissions, expressed in the base currency
- CommissionAmount: the daily commissions paid, expressed in the base currency
- Commission: the commissions expressed as a decimal percentage of the net liquidation value
- NetLiquidation: the net liquidation value (account balance) for the account, as stored in the account database
- Return: the daily PNL (after commissions) expressed as a decimal percentage of the net liquidation value
- NetExposure: the net long or short positions, expressed as a proportion of the net liquidation value
- AbsExposure: the absolute value of positions, irrespective of their side (long or short), expressed as a proportion of the net liquidation value. This represents the total market exposure of the strategy.
- TotalHoldings: the total number of holdings
- Turnover: the turnover as a proportion of net liquidation value
- OrderRef: the order ref (= strategy code)
- Account: the account number
The CSV/DataFrame column names (and the resulting series names in tear sheet plots) depend on how many accounts and order refs are included in the query results. For PNL results using --details/details=True, there is a column per security. For non-detailed, multi-strategy, or multi-account PNL results, there is a column per strategy per account, with each column containing the aggregated (summed) results of all component securities for that strategy and account. The table below provides a summary:
| If PNL query results are for...                                      | column names will be... |
| -------------------------------------------------------------------- | ----------------------- |
| one account, multiple order refs                                      | order refs              |
| one order ref, multiple accounts                                      | accounts                |
| multiple accounts, multiple order refs                                | <OrderRef> - <Account>  |
| one account, one order ref, and --details/details=True is specified   | securities (conids)     |
PNL is calculated from trade execution records received from IB and saved to the blotter database. The calculation (in simplified form) works as follows:
- for each execution, calculate the proceeds (price X quantity bought or sold). For sales, the proceeds are positive; for purchases, the proceeds are negative (referred to as the cost basis).
- for each security (segregated per account and order ref), calculate the cumulative proceeds over time as shares/contracts are bought and sold.
- Likewise, calculate the cumulative quantity/position size over time as shares are bought and sold.
- The cumulative PNL (before commissions) is equal to the cumulative proceeds, but only when the cumulative quantity is zero, i.e. when the position has been closed. (When the quantity is nonzero, i.e. a position is open, the cumulative proceeds reflect a temporary credit or debit that will be offset when the position is closed. Thus cumulative proceeds do not represent PNL when there is an open position.)
The following example illustrates the calculation:
| Action                          | Proceeds | Cumulative proceeds | Cumulative quantity | Cumulative PNL |
| ------------------------------- | -------- | ------------------- | ------------------- | -------------- |
| BUY 200 shares of AAPL at $100  | -$20,000 | -$20,000            | 200                 | —              |
| SELL 100 shares of AAPL at $105 | $10,500  | -$9,500             | 100                 | —              |
| SELL 100 shares of AAPL at $110 | $11,000  | $1,500              | 0                   | $1,500         |
| SELL 100 shares of AAPL at $115 | $11,500  | $13,000             | -100                | —              |
| BUY 100 shares of AAPL at $120  | -$12,000 | $1,000              | 0                   | $1,000         |
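The same arithmetic can be reproduced with a few lines of pandas. This is only a sketch of the calculation illustrated in the table above, not the blotter's actual implementation:
>>> import pandas as pd
>>> executions = pd.DataFrame([
    {"Action": "BUY",  "Quantity": 200, "Price": 100},
    {"Action": "SELL", "Quantity": 100, "Price": 105},
    {"Action": "SELL", "Quantity": 100, "Price": 110},
    {"Action": "SELL", "Quantity": 100, "Price": 115},
    {"Action": "BUY",  "Quantity": 100, "Price": 120},
])
>>> # signed quantities: purchases positive, sales negative
>>> signed_qty = executions.Quantity.where(executions.Action == "BUY", -executions.Quantity)
>>> # proceeds: sales positive, purchases negative (the cost basis)
>>> proceeds = -signed_qty * executions.Price
>>> cum_proceeds = proceeds.cumsum()   # -20000, -9500, 1500, 13000, 1000
>>> cum_qty = signed_qty.cumsum()      #    200,   100,    0,  -100,    0
>>> # cumulative PNL is defined only where the position is flat (cum_qty == 0)
>>> cum_pnl = cum_proceeds.where(cum_qty == 0)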
Accurate PNL calculation requires the blotter to have a complete history of trade executions. If executions are missing, not only will those trades not be reflected in the PNL but the cumulative quantities will be wrong, impacting the entire calculation. See the next section for best practices to ensure a complete history.
You may notice that PNL queries run faster the second time than the first time. The first time a PNL query runs, the blotter queries the entire execution history, calculates PNL, and caches the results in the blotter database. Subsequently, the cached results are returned, resulting in a speedup. The next time a new execution occurs for a particular account and order ref, the cached results for that account and order ref are deleted, forcing the blotter to recalculate PNL from the raw execution history the next time a PNL query is run.
Accurate PNL calculation requires the blotter to have a complete history of trade executions.
Whenever the blotter is connected to IB Gateway, it retrieves all available executions from IB every minute or so. The IB API makes available the current day's executions; more specifically, it makes available all executions which have occurred since the most recent IB server restart, the timing of which depends on the customer's location.
Consequently, to ensure the blotter has a complete execution history, the blotter must be connected to IB Gateway at least once after all executions for the day have finished and before the daily IB server restart. Executions could be missed under the following sequence of events:
- you place a non-marketable or held order
- you stop the IB Gateway service; thus the blotter is no longer receiving execution notifications from IB
- the order is subsequently filled
- you don't restart IB Gateway until after the next IB server restart, at which time the missed execution is no longer available from the IB API
A good rule of thumb is, if you have working orders, try to keep IB Gateway running so the blotter can be notified of executions. If you need to stop IB Gateway while there are working orders, make sure to restart it at least once before the end of the day.
Be aware of the following current limitations of PNL calculation:
- At present, positions are only priced when there is an execution; they are not marked-to-market on a daily basis. Thus, only realized PNL is reflected; unrealized PNL/open positions are not reflected.
- Due to positions not being marked-to-market, performance plots for multi-day positions may appear jumpy, that is, have flat lines for the duration of the position followed by a large jump in PNL when the position is closed. This jumpiness can affect the Sharpe ratio compared to what it would be if the positions were marked-to-market. The more frequently your strategy trades, the less this will be an issue.
- At present, dividends (received or debited) are not reflected in PNL.
- Margin interest and other fees are not reflected in PNL.
- At present, stock splits on existing positions are not accounted for by the blotter. Consequently PNL calculations will be wrong for positions that undergo splits while the position is held, since the opening and closing quantities will not match.
- IB commissions for FX trades are denominated in USD rather than in the base currency or trade currency. The blotter handles FX commissions correctly for accounts with USD base currency, but not for accounts with non-USD base currencies. This defect will be remedied in a future release.
Backtesting is fraught with biases that can inflate your backtest results and cause live trading to fall short of your expectations.
For Moonshot strategies, you can use Moonchart to compare your actual performance with the simulated performance of your backtest. This is an important tool for assessing whether your backtest adequately models live trading conditions and therefore whether the backtest can be trusted.
The term "implementation shortfall" often refers narrowly to the difference between the price when a trading decision is made and the price when the trade is executed. In QuantRocket, the term is used more broadly to refer to the difference between simulated and actual performance, whatever the cause.
To create a shortfall tear sheet, download CSVs of the backtest results and live performance results over the same date range:
>>> from quantrocket.blotter import download_pnl
>>> from quantrocket.moonshot import backtest
>>> download_pnl("pnl.csv", start_date="2019-06-01", order_refs="demo-strategy", accounts=["DU12345"])
>>> backtest("demo-strategy", start_date="2019-06-01", filepath_or_buffer="backtest_results.csv")
Then use the CSVs to create the tear sheet:
>>> from moonchart import ShortfallTearsheet
>>>
>>> ShortfallTearsheet.from_csvs(
x_filepath_or_buffer="backtest_results.csv",
y_filepath_or_buffer="pnl.csv")
The resulting tear sheet compares the cumulative returns and various other metrics:

Shortfall tear sheets can optionally include a table of specific dates and instruments with the largest magnitude shortfall (that is, the largest difference, whether positive or negative, between simulated and actual results). To use this feature, download detailed CSVs (details=True) of PNL and backtest results. Then use the largest_n parameter to specify how many specific dates and instruments to highlight:
>>> download_pnl("pnl.csv", details=True, start_date="2019-06-01", order_refs="demo-strategy", accounts=["DU12345"])
>>> backtest("demo-strategy", details=True, start_date="2019-06-01", filepath_or_buffer="backtest_results.csv")
>>> ShortfallTearsheet.from_csvs(
x_filepath_or_buffer="backtest_results_details.csv",
y_filepath_or_buffer="pnl_details.csv",
largest_n=5)
The resulting table of dates and instruments provides a useful starting point for a more detailed investigation of the causes of shortfall.

The tear sheet also includes an additional plot which compares simulated and actual performance excluding the dates and instruments that differ most. This can help you understand whether the shortfall is systemic or caused by a few outliers.
You can download and review the "raw" execution records from the blotter rather than the calculated PNL, optionally filtering by account, order ref, conid, or date range:
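For example, using the Python client; this is a sketch based on the blotter's download_executions function, and the specific filter parameters shown (order_refs, start_date) are assumptions modeled on the other blotter queries in this guide:
>>> import pandas as pd
>>> from quantrocket.blotter import download_executions
>>> download_executions("executions.csv", order_refs=["dma-tech"], start_date="2018-05-01")
>>> executions = pd.read_csv("executions.csv", parse_dates=["Time"])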
Execution records contain a combination of fields provided directly by the IB API and QuantRocket-provided fields related to currency conversions. An example execution is shown below:
$ head executions.csv | csvjson | json2yaml
-
ExecId: "00018037.55555555.01.01"
OrderId: "6001:55"
Account: "DU123456"
OrderRef: "dma-tech"
ConId: "265598"
Time: "2018-05-18 14:01:36"
Exchange: "BEX"
Price: "186.84"
Side: "BOT"
Quantity: "100"
Commission: "0.360257"
Liquidation: "0"
LastLiquidity: "2"
Symbol: "AAPL"
PrimaryExchange: "NASDAQ"
Currency: "USD"
SecType: "STK"
Multiplier: "1.0"
PriceMagnifier: "1"
LastTradeDate: null
Strike: "0.0"
Right: null
NetLiquidation: "1008491.11"
BaseCurrency: "USD"
Rate: "1"
GrossProceeds: "-18684.0"
Proceeds: "-18684.360257"
ProceedsInBaseCurrency: "-18684.360257"
CommissionInBaseCurrency: "0.360257"
Futures
Rollover rules
You can define rollover rules for the futures contracts you trade, and QuantRocket will automatically calculate the rollover date for each expiry and store it in the securities master database. Your rollover rules are used to identify each contract's sequence in the futures chain and optionally to provide continuous futures.
Rollover rules should be defined in a YAML file named quantrocket.master.rollover.yml, which should be located in the /codeload directory, that is, the top level of the Jupyter file browser.
The format of the rollover rules configuration file is shown below. You can roll based on calendar days before expiration, business days before expiration, a specific day on the month of expiration or the month before expiration, etc. For underlyings that have a mix of illiquid and liquid contract months, you can define months to skip using the only_months key.
GLOBEX:
ES:
rollrule:
days: -8
same_for:
- NQ
- RS
- YM
MXP:
only_months:
- 3
- 6
- 9
- 12
rollrule:
days: -7
same_for:
- GBP
- JPY
- AUD
HE:
rollrule:
months: -1
day: 27
NYMEX:
RB:
rollrule:
bdays: -2
The master service monitors this file and automatically recalculates rollover dates whenever you edit it.
You can query your rollover dates:
$ quantrocket master get --exchanges GLOBEX --symbols ES --sec-types FUT --fields Symbol LastTradeDate RolloverDate | csvlook -I
| ConId | Symbol | LastTradeDate | RolloverDate |
| --------- | ------ | ------------------- | ------------ |
| 177525433 | ES | 2016-03-18T00:00:00 | 2016-03-10 |
| 187532577 | ES | 2016-06-17T00:00:00 | 2016-06-09 |
| 197307551 | ES | 2016-09-16T00:00:00 | 2016-09-08 |
| 206848474 | ES | 2016-12-16T00:00:00 | 2016-12-08 |
| 215465490 | ES | 2017-03-17T00:00:00 | 2017-03-09 |
| 225652200 | ES | 2017-06-16T00:00:00 | 2017-06-08 |
| 236950077 | ES | 2017-09-15T00:00:00 | 2017-09-07 |
| 247950613 | ES | 2017-12-15T00:00:00 | 2017-12-07 |
| 258973438 | ES | 2018-03-16T00:00:00 | 2018-03-08 |
>>> from quantrocket.master import download_master_file
>>> import io
>>> import pandas as pd
>>> f = io.StringIO()
>>> download_master_file(f, exchanges=["GLOBEX"], symbols=["ES"], sec_types=["FUT"], fields=["Symbol", "LastTradeDate", "RolloverDate"])
>>> df = pd.read_csv(f)
>>> df.tail()
ConId Symbol LastTradeDate RolloverDate
8 236950077 ES 2017-09-15 2017-09-07
9 247950613 ES 2017-12-15 2017-12-07
10 258973438 ES 2018-03-16 2018-03-08
11 269745169 ES 2018-06-15 2018-06-07
12 279396694 ES 2018-09-21 2018-09-13
$ curl 'http://houston/master/securities.csv?exchanges=GLOBEX&symbols=ES&sec_types=FUT&fields=Symbol&fields=LastTradeDate&fields=RolloverDate'
177525433,ES,2016-03-18T00:00:00,2016-03-10
187532577,ES,2016-06-17T00:00:00,2016-06-09
197307551,ES,2016-09-16T00:00:00,2016-09-08
206848474,ES,2016-12-16T00:00:00,2016-12-08
215465490,ES,2017-03-17T00:00:00,2017-03-09
225652200,ES,2017-06-16T00:00:00,2017-06-08
236950077,ES,2017-09-15T00:00:00,2017-09-07
247950613,ES,2017-12-15T00:00:00,2017-12-07
258973438,ES,2018-03-16T00:00:00,2018-03-08
Or query only the front month contract:
$ quantrocket master get --exchanges GLOBEX --symbols ES --sec-types FUT --frontmonth --pretty
ConId = 236950077
Symbol = ES
SecType = FUT
PrimaryExchange = GLOBEX
Currency = USD
LocalSymbol = ESU7
TradingClass = ES
MarketName = ES
LongName = E-mini S&P 500
Timezone = America/Chicago
MinTick = 0.25
PriceMagnifier = 1
LastTradeDate = 2017-09-15
RolloverDate = 2017-09-07
Multiplier = 50
>>> f = io.StringIO()
>>> download_master_file(f, exchanges=["GLOBEX"], symbols=["ES"], sec_types=["FUT"], frontmonth=True, output="txt")
>>> print(f.getvalue())
ConId = 236950077
Symbol = ES
SecType = FUT
PrimaryExchange = GLOBEX
Currency = USD
LocalSymbol = ESU7
TradingClass = ES
MarketName = ES
LongName = E-mini S&P 500
Timezone = America/Chicago
MinTick = 0.25
PriceMagnifier = 1
MdSizeMultiplier = 1
LastTradeDate = 2017-09-15
RolloverDate = 2017-09-07
Multiplier = 50
$ curl 'http://houston/master/securities.txt?exchanges=GLOBEX&symbols=ES&sec_types=FUT&frontmonth=true'
ConId = 236950077
Symbol = ES
SecType = FUT
PrimaryExchange = GLOBEX
Currency = USD
LocalSymbol = ESU7
TradingClass = ES
MarketName = ES
LongName = E-mini S&P 500
Timezone = America/Chicago
MinTick = 0.25
PriceMagnifier = 1
MdSizeMultiplier = 1
LastTradeDate = 2017-09-15
RolloverDate = 2017-09-07
Continuous futures
QuantRocket collects and stores data for each individual futures expiry, but can optionally stitch the data into a continuous contract at query time.
Suppose we've created a universe of all expiries of KOSPI 200 futures, trading on the Korea Stock Exchange:
$ quantrocket master collect --exchanges 'KSE' --sec-types 'FUT' --symbols 'K200'
status: the listing details will be collected asynchronously
$
$ quantrocket master get -e 'KSE' -t 'FUT' -s 'K200' | quantrocket master universe 'k200' -f '-'
code: k200
inserted: 15
provided: 15
total_after_insert: 15
>>> from quantrocket.master import collect_listings, create_universe, download_master_file
>>> import io
>>> collect_listings(exchanges="KSE", sec_types=["FUT"], symbols=["K200"])
{'status': 'the listing details will be collected asynchronously'}
>>>
>>> f = io.StringIO()
>>> download_master_file(f, exchanges=["KSE"], sec_types=["FUT"], symbols=["K200"])
>>> create_universe("k200", infilepath_or_buffer=f)
{'code': 'k200', 'inserted': 15, 'provided': 15, 'total_after_insert': 15}
$ curl -X POST 'http://houston/master/securities?exchanges=KSE&sec_types=FUT&symbols=K200'
{"status": "the listing details will be collected asynchronously"}
$
$ curl -X GET 'http://houston/master/securities.csv?exchanges=KSE&sec_types=FUT&symbols=K200' > k200.csv
$ curl -X PUT 'http://houston/master/universes/k200' --upload-file k200.csv
{"code": "k200", "provided": 15, "inserted": 15, "total_after_insert": 15}
We can create a history database and collect historical data for each expiry:
$ quantrocket history create-db 'k200-1h' --universes 'k200' --bar-size '1 hour' --shard 'year'
status: successfully created quantrocket.history.k200-1h.sqlite
$ quantrocket history collect 'k200-1h'
status: the historical data will be collected asynchronously
>>> from quantrocket.history import create_db, collect_history
>>> create_db("k200-1h", universes=["k200"], bar_size="1 hour", shard="year")
{'status': 'successfully created quantrocket.history.k200-1h.sqlite'}
>>> collect_history("k200-1h")
{'status': 'the historical data will be collected asynchronously'}
$ curl -X PUT 'http://houston/history/databases/k200-1h?universes=k200&bar_size=1+hour&shard=year'
{"status": "successfully created quantrocket.history.k200-1h.sqlite"}
$ curl -X POST 'http://houston/history/queue?codes=k200-1h'
{"status": "the historical data will be collected asynchronously"}
The historical prices for each futures expiry are stored separately and by default are returned separately at query time, but we can optionally use the cont_fut parameter to tell QuantRocket to stitch the contracts together at query time. The only supported value is concat, indicating simple concatenation of contracts with no adjustments applied:
$ quantrocket history get 'k200-1h' --fields 'Open' 'Close' 'Volume' --outfile 'k200_1h.csv' --cont-fut 'concat'
>>> from quantrocket.history import download_history_file
>>> download_history_file("k200-1h", filepath_or_buffer="k200_1h.csv", fields=["Open","Close", "Volume"], cont_fut="concat")
$ curl -X GET 'http://houston/history/k200-1h.csv?fields=Open&fields=Close&fields=Volume&cont_fut=concat' > k200_1h.csv
The contracts will be stitched together according to the rollover dates as configured in the master service, and the continuous contract will be returned under the conid of the front-month contract as of the query's end date.
A history database need not contain only futures in order to use the continuous futures query option. The option will be ignored for any non-futures, which will be returned as stored. Any futures in the database will be grouped together by symbol, exchange, currency, and multiplier in order to create the continuous contracts. The continuous contracts will be returned alongside the non-futures.
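For a quick sanity check of the stitched result, you can load the CSV generated above into pandas. This is a minimal sketch that assumes the usual history CSV layout of ConId and Date columns followed by the requested fields:
>>> import pandas as pd
>>> # each ConId reflects the front-month contract as of the query's end date
>>> k200 = pd.read_csv("k200_1h.csv", parse_dates=["Date"])
>>> k200.head()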
Contract numbers aligned to prices
For futures traders who work with individual contracts rather than continuous contracts, QuantRocket provides a useful function to identify each contract's sequence in the futures chain at any given time, based on the rollover rules you've defined.
Start with a DataFrame of futures prices:
>>> from quantrocket import get_prices
>>> prices = get_prices("cl-fut-1d", fields=["Close"])
>>> closes = prices.loc["Close"]
Pass the prices to the get_contract_nums_reindexed_like function and use the limit parameter to specify how far out in the chain to sequence. For example, the following function call will identify the 1st, 2nd, and 3rd nearest contracts to expiration:
>>> from quantrocket.master import get_contract_nums_reindexed_like
>>> contract_nums = get_contract_nums_reindexed_like(closes, limit=3)
Each row in the resulting DataFrame shows the sequence of contracts for that date. This example illustrates a rollover that happened on March 7, 2019:
>>> contract_nums.head()
ConId 81037223 81093789 138979241 138979255 138979261
Date
2019-03-04 3.0 NaN 2.0 1.0 NaN
2019-03-05 3.0 NaN 2.0 1.0 NaN
2019-03-06 3.0 NaN 2.0 1.0 NaN
2019-03-07 2.0 NaN 1.0 NaN 3.0
2019-03-08 2.0 NaN 1.0 NaN 3.0
2019-03-11 2.0 NaN 1.0 NaN 3.0
You can use the contract_nums DataFrame to mask your prices DataFrame:
>>> month1_closes = closes.where(contract_nums==1)
>>> month2_closes = closes.where(contract_nums==2)
To calculate a calendar spread, you might convert the masked DataFrames to Series and subtract one Series from another:
>>> # collapse each masked DataFrame to a Series by averaging across the
>>> # contract columns (each row has at most one non-NaN value)
>>> month1_closes = month1_closes.mean(axis=1)
>>> month2_closes = month2_closes.mean(axis=1)
>>> spreads = month1_closes - month2_closes
Combos
Combos, also known as spreads, are composite financial instruments consisting of two or more individual instruments (legs) that are traded as a single instrument. Examples of combos include futures spreads such as calendar spreads or intercommodity spreads, option combos such as straddles or strangles, and stock combos. QuantRocket supports defining combos in the securities master database, collecting real-time data for combos, and placing combo orders through the blotter.
Define combos
Define combos by uploading a list of the combo legs you wish to include. For each combo leg, specify the action ("BUY" or "SELL"), the ratio (as an integer), and the conid of the instrument. The example below shows how to create a futures calendar spread:
>>> from quantrocket.master import download_master_file, create_combo
>>> import pandas as pd
>>>
>>> # download the VIX futures chain so we can look up conids by local symbol
>>> download_master_file("vx.csv", symbols="VIX", exchanges="CFE", sec_types="FUT")
>>> # map each LocalSymbol (e.g. VXV9) to its conid
>>> vx_conids = pd.read_csv("vx.csv", index_col="LocalSymbol").ConId.to_dict()
>>> # define the calendar spread: buy VXV9, sell VXQ9
>>> create_combo([
["BUY", 1, vx_conids["VXV9"]],
["SELL", 1, vx_conids["VXQ9"]]
])
{"conid": -111, "created": True}
QuantRocket assigns a negative integer as the conid for the combo. The negative integer always has a prefix of -11 followed by an auto-incrementing digit, for example: -111, -112, -113, ...
Negative conids are used to avoid collisions with IB-assigned conids, which are always positive.
The assigned combo conids are specific to your deployment. The integers are assigned in sequential order and thus the conid of a given combo depends on how many combos you have previously defined. This means that once you have begun collecting real-time combo data or placing combo orders you should avoid deleting and re-creating combos (which is not supported by the QuantRocket API anyway), as this would break the references to the combo which are stored in your real-time or blotter databases.
Each user-defined combo is stored in the securities master database with a SecType of "BAG". The combo legs are stored in the ComboLegs field as a JSON array:
$ quantrocket master get --conids -111 --fields 'Symbol' 'PrimaryExchange' 'SecType' 'ComboLegs' | csvlook -I
| ConId | Symbol | PrimaryExchange | SecType | ComboLegs |
| ----- | ------ | --------------- | ------- | --------------------------------------------------------------------------- |
| -111 | VIX | CFE | BAG | [["BUY", 1, 351024203, "CFE", "FUT"], ["SELL", 1, 343395828, "CFE", "FUT"]] |
>>> download_master_file("combo.csv", conids=-111, fields=["Symbol","PrimaryExchange","SecType","ComboLegs"])
>>> combos = pd.read_csv("combo.csv")
>>> combos.iloc[0].to_dict()
{'ConId': -111,
'Symbol': 'VIX',
'PrimaryExchange': 'CFE',
'SecType': 'BAG',
'ComboLegs': '[["BUY", 1, 351024203, "CFE", "FUT"], ["SELL", 1, 343395828, "CFE", "FUT"]]'}
$ curl -X GET 'http://houston/master/securities.csv?conids=-111&fields=Symbol&fields=PrimaryExchange&fields=SecType&fields=ComboLegs'
ConId,Symbol,PrimaryExchange,SecType,ComboLegs
-111,VIX,CFE,BAG,"[[""BUY"", 1, 351024203, ""CFE"", ""FUT""], [""SELL"", 1, 343395828, ""CFE"", ""FUT""]]"
If you attempt to create a combo that already exists, the existing conid will be returned instead of creating a duplicate record:
>>> create_combo([
["BUY", 1, vx_conids["VXV9"]],
["SELL", 1, vx_conids["VXQ9"]]
])
{"conid": -111, "created": False}
Real-time data collection
Collecting real-time data for combos is generally no different from collecting data for other instruments. The exceptions are noted below.
Historical data for combos is not available. You can build your own historical record by collecting real-time data over a period of time. Or you can collect historical data for the individual legs and calculate the spreads in your own code.
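For example, here is a minimal sketch of the second approach, approximating the combo's history from its legs. The history database code "vx-fut-1d" is hypothetical; the leg conids are the ones from the combo defined above:
>>> from quantrocket import get_prices
>>> # query daily closes for both legs from a hypothetical history database
>>> prices = get_prices("vx-fut-1d", conids=[351024203, 343395828], fields=["Close"])
>>> closes = prices.loc["Close"]
>>> # spread = BUY leg minus SELL leg, matching the combo ratios defined above
>>> spreads = closes[351024203] - closes[343395828]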
Native combo data
Some combos trade natively on an exchange (for example many futures spreads and intercommodity spreads), while other combos do not. For combos that do not trade natively on an exchange, Interactive Brokers provides synthetic market data constructed from the market data of the individual combo legs. For combos that trade natively on an exchange, you can choose whether to collect synthetic data or native data from the exchange. To collect native combo data, specify the --primary-exchange/primary_exchange option when creating the database, as in the sketch below.
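This is a minimal sketch using the Python API; the real-time service's create_tick_db function and the database code "vx-native-tick" are assumed here for illustration:
>>> from quantrocket.realtime import create_tick_db
>>> # primary_exchange=True requests native combo data from the exchange
>>> create_tick_db("vx-native-tick", conids=[-111], primary_exchange=True)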
Without the primary_exchange option, synthetic data will be collected. Using the primary_exchange option for combos which don't trade natively on an exchange has no impact; synthetic data will be collected for such combos.
Combo orders
Combo orders can be placed like most other orders, with the following special considerations.
Native vs SMART-routed combos
The Exchange field controls whether combo orders are executed by Interactive Brokers' SMART router or routed natively to the exchange (for combos that trade natively on an exchange). Set the Exchange field to "SMART" or to the exchange code (for example "CFE" or "GLOBEX") to control the routing. Below is an example of a natively routed combo order:
$ quantrocket blotter order --params 'ConId:-111' 'Exchange:CFE' 'OrderType:LMT' 'LmtPrice:-0.50' 'TotalQuantity:1' 'Action:BUY' 'Tif:Day' 'Account:DU12345' 'OrderRef:vix-spread-strategy'
>>> from quantrocket.blotter import place_orders
>>> place_orders(orders=[{
'ConId': -111,
'Exchange': 'CFE',
'OrderType': 'LMT',
'LmtPrice': -0.50,
'TotalQuantity': 1,
'Action': 'BUY',
'Tif': 'Day',
'Account': 'DU12345',
'OrderRef': 'vix-spread-strategy'}])
$ cat > orders.json << EOF
[{
"ConId": -111,
"Exchange": "CFE",
"OrderType": "LMT",
"LmtPrice": -0.50,
"TotalQuantity": 1,
"Action": "BUY",
"Tif": "Day",
"Account": "DU12345",
"OrderRef": "vix-spread-strategy"
}]
EOF
$ curl -X POST 'http://houston/blotter/orders' --upload-file orders.json
Note that natively routed combos are guaranteed (the combo will execute in its entirety or not at all), while SMART-routed combos are not guaranteed (one leg may execute while another leg does not). The IB API requires setting a "non-guaranteed" flag on SMART-routed combo orders to acknowledge the risk of partial execution. QuantRocket sets this flag for you on SMART-routed combo orders.
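For comparison, here is a sketch of the same order SMART-routed; the only change from the native example above is the Exchange field, and QuantRocket adds the required non-guaranteed flag for you:
>>> from quantrocket.blotter import place_orders
>>> # identical to the native order above except for the Exchange field
>>> place_orders(orders=[{
    'ConId': -111,
    'Exchange': 'SMART',
    'OrderType': 'LMT',
    'LmtPrice': -0.50,
    'TotalQuantity': 1,
    'Action': 'BUY',
    'Tif': 'Day',
    'Account': 'DU12345',
    'OrderRef': 'vix-spread-strategy'}])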
Combo order tracking
Combo orders are tracked as composite instruments and/or as individual legs, depending on the context.
Order status
The blotter treats combos as a single composite instrument for the purpose of tracking order status:
$ quantrocket blotter status --order-refs 'vix-spread-strategy' | csvlook
| OrderId | Submitted | ConId | Action | TotalQuantity | Account | OrderRef | Status | Filled | Remaining | Errors |
| ------- | ------------------------- | ----- | ------ | ------------- | -------- | ------------------- | --------- | ------ | --------- | ------ |
| 6001:51 | 2019-08-01T21:05:00+00:00 | -111 | BUY | 1 | DU123456 | vix-spread-strategy | Submitted | 1 | 0 | |
>>> from quantrocket.blotter import download_order_statuses
>>> import io
>>> f = io.StringIO()
>>> download_order_statuses(f, order_refs="vix-spread-strategy")
>>> statuses = pd.read_csv(f, parse_dates=["Submitted"])
>>> statuses.head()
OrderId Submitted ConId Action TotalQuantity Account OrderRef Status Filled Remaining Errors
0 6001:51 2019-08-01 21:05:00 -111 BUY 1 DU123456 vix-spread-strategy Submitted 1 0 NaN
$ curl -X GET 'http://houston/blotter/orders.csv?order_refs=vix-spread-strategy' | csvlook
| OrderId | Submitted | ConId | Action | TotalQuantity | Account | OrderRef | Status | Filled | Remaining | Errors |
| ------- | ------------------------- | ----- | ------ | ------------- | -------- | ------------------- | --------- | ------ | --------- | ------ |
| 6001:51 | 2019-08-01T21:05:00+00:00 | -111 | BUY | 1 | DU123456 | vix-spread-strategy | Submitted | 1 | 0 | |
Positions
Combo positions are also tracked as composite instruments:
$ quantrocket blotter positions | csvlook
| Account | OrderRef | ConId | Quantity |
| -------- | ------------------- | ----- | -------- |
| DU123456 | vix-spread-strategy | -111 | 1 |
>>> from quantrocket.blotter import list_positions
>>> positions = list_positions()
>>> positions = pd.DataFrame(positions)
>>> positions.head()
Account OrderRef ConId Quantity
0 DU123456 vix-spread-strategy -111 1
$ curl -X GET 'http://houston/blotter/positions.csv' | csvlook
| Account | OrderRef | ConId | Quantity |
| -------- | ------------------- | ----- | -------- |
| DU123456 | vix-spread-strategy | -111 | 1 |
This allows you to manage the position just as you would manage any other instrument. For example, you can close the position by conid:
$ quantrocket blotter close --order-refs 'vix-spread-strategy' --conids -111 --params 'OrderType:MKT' 'Tif:Day' 'Exchange:CFE' | csvlook -I
| Account | OrderRef | ConId | TotalQuantity | Action | OrderType | Tif | Exchange |
| -------- | ------------------- | ----- | ------------- | ------ | --------- | --- | -------- |
| DU123456 | vix-spread-strategy | -111 | 1 | SELL | MKT | Day | CFE |
>>> from quantrocket.blotter import close_positions
>>> import io
>>> f = io.StringIO()
>>> close_positions(f, order_refs=["vix-spread-strategy"], params={"OrderType":"MKT", "Tif":"Day", "Exchange":"CFE"})
>>> orders = pd.read_csv(f)
>>> orders.head()
Account OrderRef ConId TotalQuantity Action OrderType Tif Exchange
0 DU123456 vix-spread-strategy -111 1 SELL MKT Day CFE
$ curl -X DELETE 'http://houston/blotter/positions.csv?order_refs=vix-spread-strategy¶ms=OrderType%3AMKT¶ms=Tif%3ADay¶ms=Exchange%3ACFE' | csvlook -I
| Account | OrderRef | ConId | TotalQuantity | Action | OrderType | Tif | Exchange |
| -------- | ------------------- | ----- | ------------- | ------ | --------- | --- | -------- |
| DU123456 | vix-spread-strategy | -111 | 1 | SELL | MKT | Day | CFE |
Combos in portfolio
If you view your account portfolio (similar to logging in to Trader Workstation), you will see the individual legs rather than the composite combo:
$ quantrocket account portfolio | csvlook
| Account | ConId | Description | Position | ...
| -------- | --------- | --------------------- | ---------- | ---
| DU123456 | 255253337 | VX FUT @CFE 20180618 | -1.0 | ...
| DU123456 | 35045199 | VX FUT @CFE 20180917 | 1.0 | ...
>>> from quantrocket.account import download_account_portfolio
>>> import io
>>> f = io.StringIO()
>>> download_account_portfolio(f)
>>> portfolio = pd.read_csv(f, parse_dates=["LastUpdated"])
>>> portfolio.head()
Account ConId Description Position ...
0 DU123456 255253337 VX FUT @CFE 20180618 -1.0
1 DU123456 35045199 VX FUT @CFE 20180917 1.0
$ curl -X GET 'http://houston/account/portfolio.csv' | csvlook
| Account | ConId | Description | Position | ...
| -------- | --------- | --------------------- | ---------- | ---
| DU123456 | 255253337 | VX FUT @CFE 20180618 | -1.0 | ...
| DU123456 | 35045199 | VX FUT @CFE 20180917 | 1.0 | ...
Combo executions
For executions, you will see an execution record representing the composite combo as well as execution records representing the individual legs. These records are distinguished by the ComboType field, with values of "BAG" and "LEG" respectively (blank for non-combo orders):
$ quantrocket blotter executions --order-refs 'vix-spread-strategy' -o executions.csv
$ csvlook executions.csv
| Symbol | PrimaryExchange | SecType | ComboType | Side | Quantity | ...
| ------ | --------------- | ------- | --------- | ---- | -------- | ---
| VX | CFE | BAG | BAG | BOT | 1 | ...
| VX | CFE | FUT | LEG | BOT | 1 | ...
| VX | CFE | FUT | LEG | SLD | -1 | ...
>>> from quantrocket.blotter import download_executions
>>> download_executions("executions.csv", order_refs=["vix-spread-strategy"])
>>> executions = pd.read_csv("executions.csv")
>>> executions.head()
Symbol PrimaryExchange SecType ComboType Side Quantity ...
0 VX CFE BAG BAG BOT 1 ...
1 VX CFE FUT LEG BOT 1 ...
2 VX CFE FUT LEG SLD -1 ...
$ curl -X GET 'http://houston/blotter/executions.csv?order_refs=vix-spread-strategy' > executions.csv
$ csvlook executions.csv
| Symbol | PrimaryExchange | SecType | ComboType | Side | Quantity | ...
| ------ | --------------- | ------- | --------- | ---- | -------- | ---
| VX | CFE | BAG | BAG | BOT | 1 | ...
| VX | CFE | FUT | LEG | BOT | 1 | ...
| VX | CFE | FUT | LEG | SLD | -1 | ...
Combo PNL
PNL for combo orders is calculated by consulting the leg executions and ignoring the composite execution record. This means that if you download a detailed PNL CSV, you will see the conids of the individual legs, not the composite combo.
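As an illustration, here is a hedged sketch of downloading such a detailed CSV via the Python API. The download_pnl function name and its details parameter are assumptions for illustration and are not documented in this section:
>>> from quantrocket.blotter import download_pnl  # assumed function name
>>> # details=True (assumed) requests the per-conid breakdown; the resulting
>>> # CSV will reference the leg conids rather than the combo's negative conid
>>> download_pnl("vix_spread_pnl.csv", order_refs=["vix-spread-strategy"], details=True)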
Logging
Stream logs in real-time
You can stream your logs, tail -f style, from flightlog:
$ quantrocket flightlog stream
2017-01-18 10:19:31 quantrocket.flightlog: INFO Detected a change in flightlog configs directory, reloading configs...
2017-01-18 10:19:31 quantrocket.flightlog: INFO Successfully loaded config
2017-01-18 14:25:57 quantrocket.master: INFO Requesting contract details for 200 symbols
Flightlog provides application-level monitoring of the sort you will typically want to keep an eye on. For more verbose, low-level system logging which may be useful for troubleshooting, you can stream logs from the logspout service:
$ quantrocket flightlog stream --detail
quantrocket_houston_1|172.18.0.22 - - [29/May/2018:20:21:45 +0000] GET /launchpad/gateways?status=running HTTP/1.1 200 3 - python-requests/2.14.2
quantrocket_blotter_1|[spooler /var/tmp/uwsgi/spool pid: 12] managing request uwsgi_spoolfile_on_2f55d7838d0f_5_2_1972169246_1526920303_414476 ...
quantrocket_account_1|waiting until ECB's next expected 4PM CET update to collect exchange rates
quantrocket_houston_1|172.18.0.18 - - [29/May/2018:20:21:52 +0000] GET /ibg5/gateway HTTP/1.1 200 22 - python-requests/2.14.2
The quickstart tutorial (view in GitHub) describes a useful technique of docking terminals in JupyterLab for the purpose of log monitoring.
Filtering logs
The logs can be noisy, and sometimes you may want to filter out some of the noise. You can use standard Unix grep for this purpose. For example:
$
$ quantrocket flightlog stream | grep 'usa-stk-1d'
Or use grep -v to exclude log output:
$
$ quantrocket flightlog stream --detail | grep -v 'blotter'
You can also stream filtered logs with the Python API:
from quantrocket.flightlog import stream_logs

for line in stream_logs():
    if "usa-stk-1d" in line:
        print(line)
Download log files
In addition to streaming your logs, you can also download log files, which contain up to 7 days of log history. You can download the application logs:
$ quantrocket flightlog get app.log
>>> from quantrocket.flightlog import download_logfile
>>> download_logfile("app.log")
$ curl -X GET 'http://houston/flightlog/logfile/app' > app.log
Or you can download the more verbose system logs:
$ quantrocket flightlog get --detail system.log
>>> download_logfile("system.log", detail=True)
$ curl -X GET 'http://houston/flightlog/logfile/system' > system.log
To download a filtered log file, use the match parameter to specify a string to search for in each log line. For example, download a detailed log file for the fundamental service:
$ quantrocket flightlog get --detail --match 'quantrocket_fundamental' fundamental.log
>>> download_logfile("fundamental.log", detail=True, match="quantrocket_fundamental")
$ curl -X GET 'http://houston/flightlog/logfile/system?match=quantrocket_fundamental' > fundamental.log
Timezone of logs
Logs are timestamped in UTC by default, but you can set your preferred timezone:
$ quantrocket flightlog timezone 'America/New_York'
status: successfully set timezone to America/New_York
>>> from quantrocket.flightlog import set_timezone
>>> set_timezone("America/New_York")
{'status': 'successfully set timezone to America/New_York'}
$ curl -X PUT 'http://houston/flightlog/timezone?tz=America%2FNew_York'
{"status": "successfully set timezone to America/New_York"}
If you're not sure of the timezone name, type as much as you know to see a list of close matches:
$ quantrocket flightlog timezone 'sydney'
msg: 'invalid timezone: sydney (close matches are: Australia/Sydney)'
status: error
>>> set_timezone("sydney")
HTTPError: ('400 Client Error: BAD REQUEST for url: http://houston/flightlog/timezone?tz=sydney', {'status': 'error', 'msg': 'invalid timezone: sydney (close matches are: Australia/Sydney)'})
$ curl -X PUT 'http://houston/flightlog/timezone?tz=sydney'
{"status": "error", "msg": "invalid timezone: sydney (close matches are: Australia/Sydney)"}
You can pass '?' to see all available timezones:
$ quantrocket flightlog timezone '?'
msg: 'invalid timezone: ? (choices are: Africa/Abidjan, Africa/Accra, Africa/Addis_Ababa,
Africa/Algiers, Africa/Asmara, Africa/Asmera, Africa/Bamako, Africa/Bangui, Africa/Banjul,
Africa/Bissau, Africa/Blantyre, Africa/Brazzaville, Africa/Bujumbura, Africa/Cairo,'
...
>>> set_timezone("?")
HTTPError: ('400 Client Error: BAD REQUEST for url: http://houston/flightlog/timezone?tz=%3F', {'status': 'error', 'msg': 'invalid timezone: ? (choices are: Africa/Abidjan, Africa/Accra, Africa/Addis_Ababa, Africa/Algiers, Africa/Asmara, Africa/Asmera, Africa/Bamako, Africa/Bangui, Africa/Banjul, Africa/Bissau, Africa/Blantyre, Africa/Brazzaville, Africa/Bujumbura, Africa/Cairo,'...})
$ curl -X PUT 'http://houston/flightlog/timezone?tz=?'
{"status": "error", "msg": "invalid timezone: ? (choices are: Africa/Abidjan, Africa/Accra, Africa/Addis_Ababa, Africa/Algiers, Africa/Asmara, Africa/Asmera, Africa/Bamako, Africa/Bangui, Africa/Banjul, Africa/Bissau, Africa/Blantyre, Africa/Brazzaville, Africa/Bujumbura, Africa/Cairo, ..."
Send log messages
You can use the Python client to log to Flightlog from your own code:
import logging
from quantrocket.flightlog import FlightlogHandler
logger = logging.getLogger('myapp')
logger.setLevel(logging.DEBUG)
handler = FlightlogHandler()
logger.addHandler(handler)
logger.info('my app just opened a position')
You can also log directly from the CLI (this is a good way to test your Flightlog configuration):
$ quantrocket flightlog log "this is a test" --name myapp --level INFO
If you're streaming your logs, you should see your message show up:
2018-02-21 10:59:01 myapp: INFO this is a test
Log command output
The CLI can accept a log message over stdin, which is useful for piping in the output of another command. In the example below, we check our balance with the --below option to only show account balance info if the cushion has dropped too low. If the cushion is safe, the first command produces no output and nothing is logged. If the cushion is too low, the output is logged to flightlog at a CRITICAL level:
$ quantrocket account balance --latest --below 'Cushion:0.02' --fields 'NetLiquidation' 'Cushion' | quantrocket flightlog log --name 'quantrocket.account' --level 'CRITICAL'
If you've set up Papertrail alerts for CRITICAL messages, you can add this command to the crontab on one of your countdown services, and you'll get a text message whenever there's trouble.
Log levels
Log levels in QuantRocket are used as follows. (This applies to the application logs; detailed logs don't have log levels.)
| Log level | How used | Examples |
| --------- | -------- | -------- |
| INFO | default log level for status messages | started collecting data; finished collecting data |
| WARNING | recoverable errors and IB warning messages | an IB API call failed and will be automatically re-tried; a security was delisted and is no longer available in IB's database; an order expired without being filled |
| ERROR | unrecoverable error - the command or function may need to be manually re-run | a temporary but unexpected error occurred and the command should be retried; or, there may be a bug |
| CRITICAL | not used by QuantRocket - reserved for user | can be used for critical account monitoring by user (see below) |
Most log messages will be at the INFO level, with WARNING being the second most common. ERROR level messages are less common and indicate something went wrong with the application. CRITICAL messages are not used by QuantRocket; you can use CRITICAL to monitor for urgent situations such as the margin in your account falling too low:
30-59 9 * * mon-fri quantrocket account balance --latest --below 'Cushion:0.05' --fields 'NetLiquidation' 'Cushion' | quantrocket flightlog log --name 'quantrocket.account' --level 'CRITICAL'
You can set up Papertrail alerts to be notified of ERROR or CRITICAL messages, or any other types of messages you wish to highlight.
Papertrail integration
Papertrail is a log management service that lets you monitor logs from a web interface, flexibly search the logs, and send alerts to other services (email, Slack, PagerDuty, webhooks, etc.) based on log message criteria. You can configure flightlog to send your logs to your Papertrail account.
To get started, sign up for a Papertrail account (free plan available).
In Papertrail, locate your Papertrail host and port number (Settings > Log Destinations).
Enter your Papertrail settings into your deployment:
$ quantrocket flightlog papertrail --host 'logs.papertrailapp.com' --port 55555
status: successfully set papertrail config
>>> from quantrocket.flightlog import set_papertrail_config
>>> set_papertrail_config("logs.papertrailapp.com", 55555)
{'status': 'successfully set papertrail config'}
$ curl -X PUT 'http://houston/flightlog/papertrail?host=logs.papertrailapp.com&port=55555'
{"status": "successfully set papertrail config"}
You can log a message from the CLI to test your Flightlog configuration (first wait 10 seconds to give flightlog time to load the new configuration):
$ quantrocket flightlog log "this is a test" --name myapp --level INFO
Your message should show up in Papertrail:

Papertrail alerts
One of the benefits of Papertrail is that you can set up alerts based on specific log criteria. Alerts can be sent to email or a variety of third-party notification services. Below is an example of how you might monitor various types of log messages using Papertrail:
| Message type | How to monitor | Papertrail saved search term | How Papertrail alerts you |
| ------------ | -------------- | ---------------------------- | ------------------------- |
| INFO/WARNING messages | Periodically log into Papertrail web viewer | | |
| ERROR messages | Papertrail saved search alert | program:quantrocket severity:ERROR | hourly email |
| CRITICAL messages | Papertrail saved search alert | program:quantrocket severity:CRITICAL | Pushover (mobile push notifications) |
| algo orders placed with the blotter | Papertrail saved search alert | program:quantrocket.blotter | daily email |
These are intended only as examples to hint at what's possible; the monitoring capabilities with QuantRocket and Papertrail are highly flexible.
Database Management
QuantRocket uses PostgreS