We limit the hardware available to each participant for running their solutions. Specifically:
- All solutions will be run on AWS g4dn.12xlarge instances equipped with NVIDIA T4 GPUs.
- Solutions for Phase 1 will have access to:
    - 2x NVIDIA T4 GPUs
    - 20x vCPUs (10 physical CPU cores)
    - 90 GB RAM
- Solutions for Phase 2 will have access to:
    - 4x NVIDIA T4 GPUs
    - 40x vCPUs (20 physical CPU cores)
    - 180 GB RAM
Note: When running in gpu:false mode, you will have access to 4x vCPUs (2 physical cores) and 8 GB RAM.
Please note that the NVIDIA T4 uses a somewhat outdated architecture and is therefore not compatible with certain acceleration toolkits (e.g. Flash Attention), so please check the compatibility of your dependencies carefully.
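As a rough illustration of such a compatibility check, the sketch below compares a GPU's CUDA compute capability against a minimum requirement. The T4 reports compute capability 7.5. The threshold of 8.0 for FlashAttention-2 is an assumption based on our understanding of that project's stated hardware support and may change between releases; the function name `meets_min_capability` and the constants are hypothetical and used only for this example.

```python
# Compute capability of the NVIDIA T4 (Turing architecture).
T4_CAPABILITY = (7, 5)

# ASSUMPTION: minimum compute capability needed by FlashAttention-2.
# Verify against the toolkit's own documentation for the version you install.
FLASH_ATTENTION_2_MIN_CAPABILITY = (8, 0)


def meets_min_capability(device_capability, required_capability):
    """Return True if a (major, minor) capability meets the requirement."""
    # Tuples compare element by element, so (7, 5) < (8, 0).
    return tuple(device_capability) >= tuple(required_capability)


if __name__ == "__main__":
    ok = meets_min_capability(T4_CAPABILITY, FLASH_ATTENTION_2_MIN_CAPABILITY)
    print("FlashAttention-2 usable on T4 (under the stated assumption):", ok)
```

If you use PyTorch, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair for the current device, which can be passed to a check like this one so that your code falls back to a standard attention implementation on the evaluation hardware.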
In addition, the following restrictions will be imposed:
- Network connection will be disabled.
- Each submission will be assigned a certain amount of time to run. Submissions that exceed the time limit will be killed and will not be evaluated. The tentative time limits are as follows:
| Phase | Track 1 | Track 2 | Track 3 | Track 4 | Track 5 |
|---|---|---|---|---|---|
| Phase 1 | 140 minutes | 40 minutes | 60 minutes | 60 minutes | 5 hours |
- Each team will be able to make up to 2 submissions per week per track for Tracks 1-4, and 1 submission per week for Track 5 (all-around).
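Because a submission that exceeds its time limit is killed and not evaluated, it can help to track elapsed time inside your own code. The following is a minimal sketch, not part of the official starter kit: the class `TimeBudget`, its parameters, and the 2-minute safety margin are all hypothetical choices for illustration. How the evaluator actually measures time (for example, whether model loading is included) is not specified here, so treat the margin as a guess.

```python
import time


class TimeBudget:
    """Tracks how much of a wall-clock budget is left (hypothetical helper)."""

    def __init__(self, limit_minutes, safety_margin_minutes=2.0,
                 clock=time.monotonic):
        # A monotonic clock is not affected by system clock adjustments.
        self._clock = clock
        self._start = clock()
        # Reserve a margin so the run finishes before the hard limit.
        self._budget_s = (limit_minutes - safety_margin_minutes) * 60

    def remaining_seconds(self):
        """Seconds left before the (margin-adjusted) budget is used up."""
        return self._budget_s - (self._clock() - self._start)

    def can_afford(self, estimated_seconds):
        """True if a step of the given estimated duration still fits."""
        return self.remaining_seconds() >= estimated_seconds


if __name__ == "__main__":
    # Example: the tentative Phase 1 limit for Track 2 is 40 minutes.
    budget = TimeBudget(limit_minutes=40)
    print(f"Remaining: {budget.remaining_seconds():.0f} s")
```

A solution could call `can_afford` before each batch and switch to a cheaper fallback (for example, shorter generations) when the remaining time becomes tight.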
Based on the hardware and system configuration, we recommend that participants begin with 7B models. According to our experiments, 7B models such as Vicuna-7B and Mistral can perform inference smoothly on 2 NVIDIA T4 GPUs, while 13B models will result in out-of-memory (OOM) errors.
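A back-of-the-envelope estimate is consistent with this observation. The sketch below assumes 16 GB of memory per T4, 16-bit weights (2 bytes per parameter), and a hypothetical 30% overhead for activations, the KV cache, and framework buffers. The overhead figure and the function names are illustrative assumptions rather than measured values; real usage depends on batch size, sequence length, and the inference library.

```python
# ASSUMPTION: nominal memory of a single NVIDIA T4, in GB.
T4_MEMORY_GB = 16


def weight_memory_gb(num_params_billion, bytes_per_param=2):
    """Memory for the weights alone: 1e9 params * bytes / 1e9 bytes per GB."""
    return num_params_billion * bytes_per_param


def fits_on_t4s(num_params_billion, num_gpus, bytes_per_param=2,
                overhead_fraction=0.3):
    """Rough check whether a model fits across `num_gpus` T4 GPUs.

    `overhead_fraction` is a hypothetical allowance for activations,
    KV cache, and framework buffers on top of the weights.
    """
    needed = weight_memory_gb(num_params_billion, bytes_per_param)
    needed *= 1 + overhead_fraction
    return needed <= num_gpus * T4_MEMORY_GB


if __name__ == "__main__":
    for size in (7, 13):
        print(f"{size}B fp16 weights: {weight_memory_gb(size)} GB,",
              "fits on 2x T4:", fits_on_t4s(size, num_gpus=2))
```

Under these assumptions, a 7B model needs about 14 GB for weights, well within the 32 GB available on two T4s, whereas a 13B model needs about 26 GB for weights alone and leaves too little headroom for inference.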