VEDA
[Project] VEDA - I built an autonomous ML platform with 140+ agents that takes any data source and a plain English goal, then builds and deploys the model itself
I've been working on this for a few months and finally launched it. Wanted to share it here and get some feedback from people who actually know ML.
What it does:
You connect a data source and describe your goal in plain English. VEDA then handles the rest: cleaning, feature engineering, training, tuning, model selection, and a final report.
Supported data sources:
- CSV, Excel, JSON, Parquet
- SQL databases
- REST APIs
- Cloud storage (S3, GCS)
- PDFs and documents
- Real-time streams
The pipeline runs 11 sequential agents:
Ingest → Clean → Profile → Feature Engineering → Feature Selection → Scaling → Training → Evaluation → Hyperparameter Tuning → Model Selection → Report
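To make "sequential agents" concrete, here is a minimal sketch of the pattern, not VEDA's actual code. It assumes each stage is a function that takes a shared context dict and returns an updated one. The names `STAGE_NAMES`, `make_stub_stage`, and `run_pipeline` are hypothetical, and the stages are stubs that only record that they ran.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Callable[[Context], Context]

# Stage names taken from the pipeline listed above.
STAGE_NAMES = [
    "Ingest", "Clean", "Profile", "Feature Engineering", "Feature Selection",
    "Scaling", "Training", "Evaluation", "Hyperparameter Tuning",
    "Model Selection", "Report",
]


def make_stub_stage(name: str) -> Stage:
    """Return a placeholder stage that only records that it ran (hypothetical)."""
    def stage(ctx: Context) -> Context:
        history = list(ctx.get("history", []))
        history.append(name)
        return {**ctx, "history": history}
    return stage


def run_pipeline(stages: List[Tuple[str, Stage]], ctx: Context) -> Context:
    """Run stages in order; each one receives the previous stage's output."""
    for _name, stage in stages:
        ctx = stage(ctx)
    return ctx


stages = [(name, make_stub_stage(name)) for name in STAGE_NAMES]
result = run_pipeline(stages, {"goal": "predict churn", "history": []})
print(len(result["history"]), result["history"][0], result["history"][-1])
```

In a real system each stub would be replaced by an agent that reads what it needs from the context (data, profile, candidate models) and writes its results back.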
The ML stack:
- Optuna for Bayesian hyperparameter optimization (50 trials via TPE sampler)
- XGBoost, LightGBM, Random Forest benchmarked automatically
- SHAP explainability on every prediction
- KS-test + PSI drift detection on live predictions
- A/B testing with a chi-square significance test
- Hash-based data versioning with full lineage tracking
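For anyone unfamiliar with PSI (Population Stability Index), below is a minimal standard-library sketch of how the drift score can be computed. This is an illustration under assumptions, not VEDA's implementation: it uses equal-width bins derived from the reference data, clamps out-of-range live values into the edge bins, and floors empty bins at a small `eps`. The function name `psi` and its parameters are hypothetical. The KS-test half of the drift check is not shown.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], live: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if reference is constant

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp into the edge bins
            counts[idx] += 1
        total = len(values)
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    ref_p = proportions(reference)
    live_p = proportions(live)
    return sum((a - e) * math.log(a / e) for e, a in zip(ref_p, live_p))


reference = [i / 100 for i in range(100)]
shifted = [x + 0.3 for x in reference]

print(psi(reference, reference))  # identical distributions give a score of 0
print(psi(reference, shifted))    # a shifted distribution gives a larger score
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant drift. Those cutoffs are conventions, not something stated for VEDA.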
The AI layer:
- Groq LLM (Llama 3.3 70B) for natural language goal interpretation
- Claude AI for agent reasoning and decision-making
- LangGraph for multi-agent orchestration
Production engineering (the part most ML projects skip):
- FastAPI backend with async SQLAlchemy + PostgreSQL
- Celery + Redis task queue — jobs persist across server restarts
- Circuit breakers per agent with CLOSED/OPEN/HALF-OPEN state transitions
- Alembic database migrations
- Rate limiting (5/min login, 10/min workflow creation)
- Brute-force protection: 5 failed login attempts trigger a 15-minute lockout
- Secrets management with Vault/AWS/env backends
- Full docker-compose stack with Nginx + TLS
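The circuit breaker item is the one most people ask about, so here is a minimal sketch of the CLOSED/OPEN/HALF-OPEN state machine. It is a generic illustration, not the code in the repo. The class name `CircuitBreaker`, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF-OPEN"  # timeout elapsed: allow one trial call
            else:
                raise CircuitOpenError("circuit is open; call skipped")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()


# Demo with a fake clock so the timeout can be simulated without sleeping.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0,
                         clock=lambda: now[0])


def flaky_agent() -> None:
    raise ValueError("agent failed")


for _ in range(2):
    try:
        breaker.call(flaky_agent)
    except ValueError:
        pass
print(breaker.state)  # OPEN after two consecutive failures

now[0] = 11.0  # pretend the reset timeout has passed
print(breaker.call(lambda: "ok"), breaker.state)  # trial call succeeds, circuit closes
```

With one breaker per agent, a failing agent gets skipped quickly instead of stalling the whole pipeline, and it is retried once the timeout expires.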
Numbers:
- 140+ agents across 12 domains
- 35 REST endpoints
- 7,000+ lines of Python
- Deployed live on HuggingFace Spaces
Links:
- Live demo: https://keshav1838-veda-ml-platform.hf.space
- GitHub: https://github.com/keshavloma1081-ctrl/VEDA--Auto-DS
- API docs: https://keshav1838-veda-ml-platform.hf.space/docs
Honest limitations:
- Currently optimized for tabular data (classification + regression)
- The Celery/Redis task queue requires a local setup; the HuggingFace deployment falls back to BackgroundTasks instead
- Some advanced agents (GNN, RL, CV) are scaffolded but not fully wired into the main pipeline yet
Happy to answer any technical questions. Roast it if you want — genuine feedback is more useful than likes.