QuantConnect x DataBento x Dagster x QuestDB

A robust data pipeline to convert vendor data to QuantConnect’s Lean CLI Local format.

Data makes or breaks your research and your strategy. What started as a simple data pull evolved into a full-blown infrastructure project.

The goal of this project was to enable local backtesting on hardware more powerful than the cloud nodes available on QuantConnect. With QuantConnect’s Lean Local engine as the plumbing, I set out to build a pipeline around it.

Interested in seeing the code? Click the GitHub icon to view it!

The Pipeline Design

The first step in building the pipeline was determining the flow of data:

  1. A ticker list is determined

  2. This request is sent to DataBento

  3. The returned data is stored on a QuestDB instance running on my Kubernetes cluster

  4. This data is then transformed into a Lean-readable format (stored on the backtesting machine) for backtesting

  5. If requested data is not already on the QuestDB instance, the request is passed back up to the requester, who downloads the missing data from DataBento

The Pull

Data is downloaded from DataBento, which maintains high-quality real-time and historical financial data. That makes it a perfect fit for my trading strategies, although I am currently focused on historical data only.
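A minimal sketch of such a pull using DataBento’s Python client. The dataset name (`XNAS.ITCH`) and the request-building helper are assumptions for illustration, not the project’s actual configuration:

```python
import datetime as dt

def build_request(tickers: list[str], start: dt.date, end: dt.date) -> dict:
    """Assemble kwargs for a daily-OHLCV history request."""
    return {
        "dataset": "XNAS.ITCH",  # assumed dataset; use the one you have licensed
        "symbols": tickers,
        "schema": "ohlcv-1d",    # daily bars, matching the pipeline's current scope
        "start": start.isoformat(),
        "end": end.isoformat(),
    }

def pull_daily_bars(api_key: str, tickers: list[str], start: dt.date, end: dt.date):
    """Fetch daily bars from DataBento's historical API as a DataFrame."""
    import databento as db  # requires the databento package and an API key
    client = db.Historical(api_key)
    data = client.timeseries.get_range(**build_request(tickers, start, end))
    return data.to_df()
```

Keeping the request construction separate from the network call makes the parameters easy to inspect (and test) before spending money on a pull.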

Storage

After being downloaded from DataBento, the data is uploaded to my self-hosted QuestDB instance (accessed over its PostgreSQL wire protocol) for storage.

To avoid potentially expensive vendor calls, the script first checks whether the requested tickers and date ranges are already in the database. If they are, the script pulls the data from QuestDB rather than DataBento and runs it through the next stage of the pipeline.
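The core of that check is working out which requested dates are not yet in the database. A minimal, self-contained version of that gap logic (the real script queries QuestDB for the cached set):

```python
from datetime import date, timedelta

def missing_dates(start: date, end: date, cached: set[date]) -> list[date]:
    """Return the requested dates (inclusive) that are not already cached.

    `cached` stands in for the set of dates QuestDB already holds for a
    ticker; only the returned dates need to be fetched from the vendor.
    """
    n_days = (end - start).days + 1
    wanted = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in wanted if d not in cached]
```

An empty result means the whole request can be served from QuestDB; anything else becomes the DataBento request, so a partial cache hit still saves money.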

Conversion

The key caveat of local backtesting via the Lean CLI is that the data must be present locally in a specific format: CSV files stored inside zip archives. Currently the implementation handles only daily data, with plans to add hourly and minute data if my strategies require it.

Built into the pipeline is a function that takes either QuestDB data or fresh data from DataBento and converts it into the format the Lean CLI requires.
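A sketch of that conversion, assuming Lean’s documented daily US equity layout: a `<symbol>.zip` under `equity/usa/daily/` containing `<symbol>.csv`, rows timestamped `yyyyMMdd 00:00`, and prices scaled by 10,000 (deci-cents). Worth verifying against the Lean data documentation before relying on it:

```python
import zipfile
from datetime import date
from pathlib import Path

def to_lean_daily_row(d: date, o: float, h: float, l: float, c: float, vol: int) -> str:
    """Format one Lean daily equity row: 'yyyyMMdd 00:00' plus OHLC and volume."""
    def scale(px: float) -> str:
        # Lean stores equity prices multiplied by 10,000 (deci-cents)
        return str(round(px * 10000))
    return (f"{d.strftime('%Y%m%d')} 00:00,"
            f"{scale(o)},{scale(h)},{scale(l)},{scale(c)},{vol}")

def write_lean_daily_zip(symbol: str, rows: list[str], data_dir: Path) -> Path:
    """Write rows to data_dir/equity/usa/daily/<symbol>.zip holding <symbol>.csv."""
    out_dir = data_dir / "equity" / "usa" / "daily"
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{symbol.lower()}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{symbol.lower()}.csv", "\n".join(rows) + "\n")
    return zip_path
```

Pointing `data_dir` at the Lean CLI project’s data folder is what lets backtests pick the files up without any further configuration.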

Future Improvements

While the pipeline is complete for now, I have a series of improvements planned as I continue developing this project.

  1. Data Orchestration via Dagster - As I progress through the Dagster University course, I find myself champing at the bit to start Dagsterizing my pipeline. I have begun the implementation and am actively working on it alongside my other projects.

  2. Higher Resolution Data - This is on the list, but since my strategies currently have no need for higher-resolution data, it sits on the back burner for now. The modifications would be relatively small, mostly changes to storage and handling. The framework and functions in this project already lay the foundation for handling hourly, minute, second, and even tick data, though each comes with its own set of implementation caveats to tackle.
