QuantConnect x DataBento x Dagster x QuestDB

A robust data pipeline to convert vendor data to QuantConnect’s Lean CLI Local format.

Data makes or breaks your research and your strategy. What started as a simple data pull evolved into a full-blown infrastructure project.

The goal of this project was to enable local backtesting on hardware more powerful than the cloud nodes available on QuantConnect. With QuantConnect’s Lean Local engine as the plumbing, I set out to build a pipeline around it.

Interested in seeing the code? Click the GitHub icon to view it!

The Pipeline Design

The first step in building the pipeline was determining the flow of data:

  1. A ticker list is determined

  2. This request is sent to DataBento

  3. The returned data is stored on a QuestDB instance running on my Kubernetes cluster

  4. This data is then transformed into a Lean-readable format (stored on the backtesting machine) for backtesting

  5. If requested data is not already on the QuestDB instance, the request is passed back up to the requester, who downloads the missing data from DataBento

The Pull

Data is downloaded from DataBento, which maintains high-quality real-time and historical financial data. That makes it a perfect fit for my trading strategies, although I am currently focused on historical data only.
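A minimal sketch of such a pull using DataBento’s Python client. The dataset name (`XNAS.ITCH`) and the request-building helper are assumptions for illustration, not the project’s actual configuration:

```python
import datetime as dt

def build_request(tickers: list[str], start: dt.date, end: dt.date) -> dict:
    """Assemble kwargs for a daily-OHLCV history request."""
    return {
        "dataset": "XNAS.ITCH",  # assumed dataset; use the one you have licensed
        "symbols": tickers,
        "schema": "ohlcv-1d",    # daily bars, matching the pipeline's current scope
        "start": start.isoformat(),
        "end": end.isoformat(),
    }

def pull_daily_bars(api_key: str, tickers: list[str], start: dt.date, end: dt.date):
    """Fetch daily bars from DataBento's historical API as a DataFrame."""
    import databento as db  # requires the databento package and an API key
    client = db.Historical(api_key)
    data = client.timeseries.get_range(**build_request(tickers, start, end))
    return data.to_df()
```

Keeping the request construction separate from the network call makes the parameters easy to inspect (and test) before spending money on a pull.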

Storage

After being downloaded from DataBento, the data is uploaded to my self-hosted QuestDB instance (accessed over its PostgreSQL wire protocol) for storage.

To avoid potentially expensive vendor calls, the script first checks whether the requested tickers and date ranges are already in the database. If they are, the script pulls the data from QuestDB rather than DataBento and runs it through the next stage of the pipeline.
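The core of that check is working out which requested dates are not yet in the database. A minimal, self-contained version of that gap logic (the real script queries QuestDB for the cached set):

```python
from datetime import date, timedelta

def missing_dates(start: date, end: date, cached: set[date]) -> list[date]:
    """Return the requested dates (inclusive) that are not already cached.

    `cached` stands in for the set of dates QuestDB already holds for a
    ticker; only the returned dates need to be fetched from the vendor.
    """
    n_days = (end - start).days + 1
    wanted = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in wanted if d not in cached]
```

An empty result means the whole request can be served from QuestDB; anything else becomes the DataBento request, so a partial cache hit still saves money.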

Conversion

The key caveat of local backtesting via the Lean CLI is that the data must be present locally in a specific format: CSV files stored inside zip archives. Currently the implementation handles only daily data, with plans to add hourly and minute data if my strategies require it.

Built into the pipeline is a function that takes either QuestDB data or fresh data from DataBento and converts it into the format the Lean CLI requires.
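A sketch of that conversion, assuming Lean’s documented daily US equity layout: a `<symbol>.zip` under `equity/usa/daily/` containing `<symbol>.csv`, rows timestamped `yyyyMMdd 00:00`, and prices scaled by 10,000 (deci-cents). Worth verifying against the Lean data documentation before relying on it:

```python
import zipfile
from datetime import date
from pathlib import Path

def to_lean_daily_row(d: date, o: float, h: float, l: float, c: float, vol: int) -> str:
    """Format one Lean daily equity row: 'yyyyMMdd 00:00' plus OHLC and volume."""
    def scale(px: float) -> str:
        # Lean stores equity prices multiplied by 10,000 (deci-cents)
        return str(round(px * 10000))
    return (f"{d.strftime('%Y%m%d')} 00:00,"
            f"{scale(o)},{scale(h)},{scale(l)},{scale(c)},{vol}")

def write_lean_daily_zip(symbol: str, rows: list[str], data_dir: Path) -> Path:
    """Write rows to data_dir/equity/usa/daily/<symbol>.zip holding <symbol>.csv."""
    out_dir = data_dir / "equity" / "usa" / "daily"
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{symbol.lower()}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{symbol.lower()}.csv", "\n".join(rows) + "\n")
    return zip_path
```

Pointing `data_dir` at the Lean CLI project’s data folder is what lets backtests pick the files up without any further configuration.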

Future Improvements

While the pipeline is complete for now, I have a series of improvements planned as I continue developing this project.

  1. Data Orchestration via Dagster - As I progress through the Dagster University course, I find myself champing at the bit to start Dagsterizing my pipeline. I have begun the implementation and am actively working on it alongside my other projects.

  2. Higher Resolution Data - This is on the list, but since my strategies currently have no need for higher-resolution data, it sits on the back burner for now. The modifications would be relatively small, mostly changes to storage and handling. The framework and functions in this project already lay the foundation for handling hourly, minute, second, and even tick data, though each comes with its own set of implementation caveats to tackle.
