Projects
- 🎨 TeleTron: scalable long-context multi-modal Transformer training framework.
- 🔀 Kubernetes Scheduler Simulator: evaluates different scheduling policies in GPU-sharing clusters.
- 📊 Alibaba Cluster Trace Program: provides AI workload traces from real production clusters, together with analysis of those traces.