AI-ready data infrastructure — from notebook to production
We build a self-hosted, production-grade platform for data engineering and AI. One framework and six components take you from raw data to trained models to deployed agents, with observability built in at every layer.
- Self-hosted. Your data never leaves your infrastructure.
- Composable. Use only what you need; each component works standalone.
- Production-grade. Medallion pipelines, ML drift detection, structured logging, Prometheus metrics, and OpenTelemetry tracing out of the box.
graph LR
subgraph Sources
S1[(Databases)]
S2[(APIs)]
S3[(Files / S3)]
end
subgraph Core ["dataenginex — core framework"]
direction TB
MW[Middleware\nauth · rate limit · tracing]
QG[Quality Gates\nbronze → silver → gold]
ML[ML Registry\ntraining · drift · serving]
OB[Observability\nPrometheus · OTEL · structlog]
end
subgraph Apps
DD[DataDEX\npipeline engine]
AD[AgentDEX\nAI agents]
CD[CareerDEX\ncareer intel]
UI[DEX Studio\ndesktop UI]
end
subgraph Infra ["infradex — K3s / EKS"]
TF[Terraform]
HE[Helm]
AN[Ansible]
MON[Prometheus\nGrafana\nJaeger]
end
S1 & S2 & S3 --> DD --> Core
AD --> Core
CD --> Core
UI -->|HTTP| Core
Infra -->|deploys| Core
Infra -->|deploys| Apps
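The quality-gate flow in the diagram (bronze → silver → gold) can be illustrated with a minimal, framework-free sketch. This is not the dataenginex API: the function names (`to_silver`, `to_gold`), the record fields, and the validation rules are all hypothetical, chosen only to show how each layer tightens the contract on the data.

```python
from collections import defaultdict

# Hypothetical raw (bronze) records; field names are illustrative only.
BRONZE = [
    {"user_id": "1", "amount": "19.99"},
    {"user_id": "2", "amount": "oops"},   # fails the numeric check
    {"user_id": None, "amount": "5.00"},  # fails the null check
    {"user_id": "1", "amount": "0.01"},
]


def to_silver(rows):
    """Silver gate: keep rows with a user_id and a numeric amount, cast types."""
    passed, rejected = [], []
    for row in rows:
        try:
            if row["user_id"] is None:
                raise ValueError("missing user_id")
            passed.append(
                {"user_id": int(row["user_id"]), "amount": float(row["amount"])}
            )
        except (ValueError, TypeError) as exc:
            # Rejected rows are kept with a reason so they can be audited later.
            rejected.append({"row": row, "reason": str(exc)})
    return passed, rejected


def to_gold(rows):
    """Gold layer: aggregate clean rows into per-user totals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["user_id"]] += row["amount"]
    return {user: round(total, 2) for user, total in totals.items()}


silver, rejected = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
print(f"{len(rejected)} rows rejected at the silver gate")
```

Keeping rejected rows alongside a reason, rather than dropping them silently, is one way to make a gate observable; how DEX actually reports gate failures is not specified here.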
| Repo | What it does | Status | CI |
|---|---|---|---|
| DEX (dataenginex) | Core framework: medallion pipelines, ML registry, auth, observability, plugin system | | |
| DataDEX (datadex) | YAML-defined pipelines: ingest → transform → quality → lineage | Alpha | |
| AgentDEX (agentdex) | AI agent runtime: persistent memory, tool registry, multi-model routing, audit trails | Alpha | |
| CareerDEX (careerdex) | Career intelligence: job matching, salary prediction, skill gap analysis | In development | |
| DEX Studio (dex-studio) | Cross-platform desktop UI: unified control plane for the full stack | v0.1.0 Alpha | |
| InfraDEX (infradex) | IaC + monitoring: Terraform, Helm, Ansible, Prometheus, Grafana, Jaeger | Alpha | |
Install the core package:
uv add dataenginex # core (FastAPI server included)
Run from source:
git clone https://github.com/TheDataEngineX/DEX && cd DEX
uv run poe setup
uv run poe dev # → http://localhost:8000
Try the examples:
curl http://localhost:8000/health
curl http://localhost:8000/metrics
ls examples/
# 01_medallion_pipeline.py 04_ml_training.py 07_plugin_system.py
# 02_api_quickstart.py 05_data_catalog.py 08_spark_ml.py
# 03_auth_jwt.py 06_observability.py ...
Full observability stack (requires infradex):
git clone https://github.com/TheDataEngineX/infradex && cd infradex
docker compose -f docker-compose.monitoring.yml up -d
# Grafana → http://localhost:3000 (admin / admin)
# Prometheus → http://localhost:9090
# Jaeger → http://localhost:16686
What's shipping next across the platform:
- MLflow integration: replace the custom JSON model registry with MLflow tracking + lifecycle stages
- DataSecops module: PII detection, field masking, structured audit logs
- PySpark + Databricks connectors for the `datadex` pipeline engine
- Cloud storage backends (S3, GCS, BigQuery) in the `dex` lakehouse
- Complete DB connectors (Postgres, Kafka, MySQL) in `datadex`
- Locust load tests across all API services
- Grafana dashboard per component wired to the infradex monitoring stack
- Full Terraform + Ansible + ArgoCD deploy verified end-to-end
- Language-Learning Agent (`agentdex`): memory, planning, tool use, multi-model routing
- Book Recommender: Open Library → embeddings → Qdrant → recommendations
- Movie Recommender: MovieLens → collaborative filtering + content-based fallback
- Docker images published to GHCR (`ghcr.io/thedataenginex/*`)
- Docs site live at thedataenginex.org via Netlify
- Public CareerDEX demo: semantic job search powered by the platform
| | DEX | DIY stack |
|---|---|---|
| Medallion pipelines | Built-in Bronze/Silver/Gold with quality gates | Wire together dbt + Airflow + custom validators |
| ML lifecycle | Registry, drift detection (PSI), staging → prod promotion | MLflow + custom scripts |
| Observability | Prometheus metrics, OTEL tracing, structlog — zero config | Manually instrument every service |
| AI agents | Persistent memory, tool registry, cost tracking, audit log | LangChain boilerplate per project |
| Deployment | One Helm chart per service, ArgoCD GitOps, Terraform modules | Write your own IaC from scratch |
| Plugin system | Drop-in extensions via entry points | Fork and modify the framework |
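The ML lifecycle row mentions drift detection via PSI (Population Stability Index). For readers unfamiliar with the metric, here is a minimal standard-library sketch of the usual formula, the sum over bins of `(actual - expected) * ln(actual / expected)` on bin proportions. It is not DEX's implementation: the equal-width binning over the baseline range, the `eps` floor for empty bins, and the `psi` function name are all assumptions made for this example.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]          # roughly uniform on [0, 1)
shifted = [min(x + 0.3, 0.99) for x in baseline]  # mass pushed to the right

print(psi(baseline, list(baseline)))  # identical samples give zero drift
print(psi(baseline, shifted))         # a clearly positive score
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as moderate drift, and above 0.25 as significant drift; the thresholds DEX applies, if any, are not stated here.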

| | |
|---|---|
| 📖 Documentation | docs.thedataenginex.org |
| 💬 Discussions | github.com/orgs/TheDataEngineX/discussions |
| 🐛 Bug reports | Open an issue in the relevant repo |
| 🤝 Contributing | CONTRIBUTING.md |
| 🔒 Security | SECURITY.md |
| 🌐 Website | thedataenginex.org |
MIT License · Python 3.12+ · Self-hosted · Production-grade