
TheDataEngineX

AI-ready data infrastructure — from notebook to production

License: MIT · Python 3.12+ · PyPI · Docs · Discussions


We build a self-hosted, production-grade platform for data engineering and AI. One framework, six components: it takes you from raw data to trained models to deployed agents, with observability built in at every layer.

  • Self-hosted. Your data never leaves your infrastructure.
  • Composable. Use only what you need; each component works standalone.
  • Production-grade. Medallion pipelines, ML drift detection, structured logging, Prometheus metrics, and OpenTelemetry tracing out of the box.
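To make the bronze → silver → gold flow concrete, here is a minimal, framework-free sketch of medallion promotion guarded by a quality gate. It does not use the dataenginex API: the record shape, the checks, the helper names (`quality_gate`, `bronze_to_silver`, `silver_to_gold`), and the 95% pass-rate threshold are all hypothetical, chosen only to illustrate the pattern.

```python
# Hypothetical sketch of medallion promotion; this is NOT the dataenginex API.
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]
Check = Callable[[Record], bool]


@dataclass
class GateResult:
    passed: list[Record]
    rejected: list[Record]

    @property
    def pass_rate(self) -> float:
        total = len(self.passed) + len(self.rejected)
        return len(self.passed) / total if total else 1.0  # an empty batch trivially passes


def quality_gate(records: list[Record], checks: list[Check]) -> GateResult:
    """Split records into those passing every check and those failing any."""
    passed: list[Record] = []
    rejected: list[Record] = []
    for rec in records:
        (passed if all(check(rec) for check in checks) else rejected).append(rec)
    return GateResult(passed, rejected)


def bronze_to_silver(bronze: list[Record], min_pass_rate: float = 0.95) -> list[Record]:
    """Promote raw (bronze) rows to validated (silver) rows, or reject the whole batch."""
    checks: list[Check] = [
        lambda r: r.get("user_id") is not None,
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ]
    result = quality_gate(bronze, checks)
    if result.pass_rate < min_pass_rate:
        raise ValueError(f"quality gate failed: pass rate {result.pass_rate:.0%}")
    # Rejected rows are simply dropped here; a real pipeline might quarantine them.
    return result.passed


def silver_to_gold(silver: list[Record]) -> dict[object, float]:
    """Aggregate silver rows into a business-level (gold) view: total amount per user."""
    totals: dict[object, float] = {}
    for rec in silver:
        totals[rec["user_id"]] = totals.get(rec["user_id"], 0.0) + float(rec["amount"])
    return totals
```

Failing the whole batch when too many rows are rejected, instead of silently dropping them, keeps a broken upstream source from quietly shrinking the silver layer.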


The Platform

```mermaid
graph LR
    subgraph Sources
        S1[(Databases)]
        S2[(APIs)]
        S3[(Files / S3)]
    end

    subgraph Core ["dataenginex — core framework"]
        direction TB
        MW[Middleware\nauth · rate limit · tracing]
        QG[Quality Gates\nbronze → silver → gold]
        ML[ML Registry\ntraining · drift · serving]
        OB[Observability\nPrometheus · OTEL · structlog]
    end

    subgraph Apps
        DD[DataDEX\npipeline engine]
        AD[AgentDEX\nAI agents]
        CD[CareerDEX\ncareer intel]
        UI[DEX Studio\ndesktop UI]
    end

    subgraph Infra ["infradex — K3s / EKS"]
        TF[Terraform]
        HE[Helm]
        AN[Ansible]
        MON[Prometheus\nGrafana\nJaeger]
    end

    S1 & S2 & S3 --> DD --> Core
    AD --> Core
    CD --> Core
    UI -->|HTTP| Core
    Infra -->|deploys| Core
    Infra -->|deploys| Apps
```

Components

| Repo | What it does | Status |
|------|--------------|--------|
| DEX (`dataenginex`) | Core framework: medallion pipelines, ML registry, auth, observability, plugin system | PyPI |
| DataDEX (`datadex`) | YAML-defined pipelines: ingest → transform → quality → lineage | Alpha |
| AgentDEX (`agentdex`) | AI agent runtime: persistent memory, tool registry, multi-model routing, audit trails | Alpha |
| CareerDEX (`careerdex`) | Career intelligence: job matching, salary prediction, skill gap analysis | In development |
| DEX Studio (`dex-studio`) | Cross-platform desktop UI: unified control plane for the full stack | v0.1.0 Alpha |
| InfraDEX (`infradex`) | IaC + monitoring: Terraform, Helm, Ansible, Prometheus, Grafana, Jaeger | Alpha |
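DEX's ML lifecycle uses the Population Stability Index (PSI) for drift detection. PSI compares the binned distribution of a baseline sample (bin proportions e_i) with a new sample (proportions a_i) as PSI = Σ (a_i − e_i) · ln(a_i / e_i). The sketch below computes it from scratch in plain Python; the equal-width binning over the baseline range, the default of 10 bins, and the `eps` floor for empty bins are assumptions, and DEX's own implementation may bin differently.

```python
import math
from bisect import bisect_right


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the baseline `expected`.

    Uses equal-width bins spanning the baseline's range; values outside that range
    fall into the first or last bin. Empty bins are floored at `eps` so the
    logarithm and the ratio stay finite.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # a constant baseline would otherwise give zero-width bins
    edges = [lo + width * i for i in range(1, bins)]  # the bins - 1 interior edges

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1  # index in 0 .. bins - 1
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    # Each term is >= 0 because (a - e) and ln(a / e) always share a sign.
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A common rule of thumb reads PSI below 0.1 as a stable distribution and above 0.25 as significant drift; treat those cut-offs as conventions to tune per feature, not as DEX defaults.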

Quick Start

Install the core package:

```bash
uv add dataenginex           # core (FastAPI server included)
```

Run from source:

```bash
git clone https://github.com/TheDataEngineX/DEX && cd DEX
uv run poe setup
uv run poe dev               # → http://localhost:8000
```

Try the examples:

```bash
curl http://localhost:8000/health
curl http://localhost:8000/metrics

ls examples/
# 01_medallion_pipeline.py  04_ml_training.py  07_plugin_system.py
# 02_api_quickstart.py      05_data_catalog.py  08_spark_ml.py
# 03_auth_jwt.py            06_observability.py  ...
```
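If you would rather script the `/health` and `/metrics` checks than run `curl` by hand (for example in CI), a standard-library sketch might look like this. It only assumes that the two endpoints exist at `http://localhost:8000` and answer with HTTP 200 when healthy; the `smoke_check` name is hypothetical, and response bodies are not inspected because their exact format is not documented here.

```python
import urllib.error
import urllib.request


def smoke_check(
    base_url: str = "http://localhost:8000",
    paths: tuple[str, ...] = ("/health", "/metrics"),
    timeout: float = 5.0,
) -> dict[str, int]:
    """Return the HTTP status for each path; 0 means the server could not be reached."""
    results: dict[str, int] = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status code
            results[path] = err.code
        except OSError:  # connection refused, DNS failure, timeout (URLError is an OSError)
            results[path] = 0
    return results
```

Against a healthy dev server, `smoke_check()` should report 200 for both paths; a CI job could fail whenever any value differs from 200.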

Full observability stack (requires infradex):

```bash
git clone https://github.com/TheDataEngineX/infradex && cd infradex
docker compose -f docker-compose.monitoring.yml up -d
# Grafana    → http://localhost:3000  (admin / admin)
# Prometheus → http://localhost:9090
# Jaeger     → http://localhost:16686
```

Roadmap

What's shipping next across the platform:

Now — Core Completion

  • MLflow integration — replace custom JSON model registry with MLflow tracking + lifecycle stages
  • DataSecops module — PII detection, field masking, structured audit logs
  • PySpark + Databricks connectors for datadex pipeline engine
  • Cloud storage backends (S3, GCS, BigQuery) in dex lakehouse
  • Complete DB connectors (Postgres, Kafka, MySQL) in datadex

Next — Load Testing + Observability

  • Locust load tests across all API services
  • Grafana dashboard per component wired to the infradex monitoring stack
  • Full Terraform + Ansible + ArgoCD deploy verified end-to-end

Then — Demo Projects

  • Language-Learning Agent (agentdex) — memory, planning, tool use, multi-model routing
  • Book Recommender — Open Library → embeddings → Qdrant → recommendations
  • Movie Recommender — MovieLens → collaborative filtering + content-based fallback

Later — Infrastructure & Distribution

  • Docker images published to GHCR (ghcr.io/thedataenginex/*)
  • Docs site live at thedataenginex.org via Netlify
  • Public CareerDEX demo — semantic job search powered by the platform

Why DEX

| | DEX | DIY stack |
|---|-----|-----------|
| Medallion pipelines | Built-in Bronze/Silver/Gold with quality gates | Wire together dbt + Airflow + custom validators |
| ML lifecycle | Registry, drift detection (PSI), staging → prod promotion | MLflow + custom scripts |
| Observability | Prometheus metrics, OTEL tracing, structlog, with zero config | Manually instrument every service |
| AI agents | Persistent memory, tool registry, cost tracking, audit log | LangChain boilerplate per project |
| Deployment | One Helm chart per service, ArgoCD GitOps, Terraform modules | Write your own IaC from scratch |
| Plugin system | Drop-in extensions via entry points | Fork and modify the framework |

Community

📖 Documentation docs.thedataenginex.org
💬 Discussions github.com/orgs/TheDataEngineX/discussions
🐛 Bug reports Open an issue in the relevant repo
🤝 Contributing CONTRIBUTING.md
🔒 Security SECURITY.md
🌐 Website thedataenginex.org

MIT License · Python 3.12+ · Self-hosted · Production-grade

Pinned

  1. dex (Public · Python)

    DataEngineX core framework: medallion pipelines, ML registry, RAG, observability, plugin system. Published on PyPI.

  2. dex-studio (Public · Python)

    Desktop control plane for the DataEngineX platform, built with NiceGUI; connects to all services via HTTP.

  3. agentdex (Public archive · Python)

    AI agent orchestration platform: persistent memory, multi-model routing, tool registry, workflows, and audit trails.

  4. careerdex (Public · Python)

    Reference implementation for the DataEngineX ecosystem: an AI career intelligence platform with job matching, salary prediction, and career path recommendations.

  5. datadex (Public archive · Python)

    Config-driven data pipeline engine: YAML-defined ingest, transform, quality checks, and column-level lineage.

  6. infradex (Public · Shell)

    IaC + monitoring for DataEngineX: Terraform, Helm charts, Ansible playbooks, Prometheus, Grafana, Jaeger.
