Chronon is an open-source, end-to-end feature platform designed for machine learning (ML) teams to build, deploy, manage, and monitor their data pipelines.
Chronon enables you to use all the data within your organization, including batch tables, event streams, and services, to power your AI/ML projects without managing the orchestration this typically requires.
Key features:
- Consume data from various sources, including event streams, DB table snapshots, change data streams, service endpoints, and warehouse tables modeled as slowly changing dimensions, fact, or dimension tables.
- Produce results in both online and offline contexts: online as scalable, low-latency endpoints for feature serving, or offline as Hive tables for generating training data.
- Real-time or batch accuracy: configure results to be either Temporal or Snapshot accurate. Temporal accuracy updates feature values in real time in online contexts and produces point-in-time correct features offline. Snapshot accuracy updates features once daily at midnight.
- Backfill training sets from raw data without waiting months to accumulate feature logs for model training.
- Use a powerful Python API: data source types, freshness, and contexts are API-level abstractions that you compose with intuitive SQL primitives like group-by, join, and select, with powerful enhancements.
- Automate feature monitoring: auto-generate monitoring pipelines to understand training data quality, measure training-serving skew, and monitor feature drift.
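To make the difference between Temporal and Snapshot accuracy concrete, here is a minimal sketch in plain Python. It does not use Chronon's API: the event data, `purchase_sum`, `temporal_feature`, and `snapshot_feature` are hypothetical names invented for illustration, and "midnight" is assumed to mean midnight UTC.

```python
from datetime import datetime, timezone

# Hypothetical raw events: (user_id, event_time, purchase_amount).
EVENTS = [
    ("u1", datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc), 10.0),
    ("u1", datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc), 20.0),
    ("u1", datetime(2024, 1, 2, 15, 0, tzinfo=timezone.utc), 5.0),
]


def purchase_sum(user_id, cutoff):
    """Sum purchase amounts for user_id over events strictly before cutoff."""
    return sum(
        amount for uid, ts, amount in EVENTS if uid == user_id and ts < cutoff
    )


def temporal_feature(user_id, query_time):
    """Temporal accuracy: every event up to the query timestamp counts."""
    return purchase_sum(user_id, query_time)


def snapshot_feature(user_id, query_time):
    """Snapshot accuracy: only events before the most recent midnight count.

    Assumption: midnight is taken in the timezone of query_time (UTC here).
    """
    midnight = query_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return purchase_sum(user_id, midnight)


query_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(temporal_feature("u1", query_time))  # 30.0 (includes the 08:00 event)
print(snapshot_feature("u1", query_time))  # 10.0 (only the previous day's events)
```

Evaluating `temporal_feature` at each historical label timestamp is also the idea behind a point-in-time correct backfill: every training row sees only the events that had occurred before its own timestamp, so no future data leaks into the features.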
Chronon is available for free on GitHub.
Must read: