Please see the Schedule and Judging Criteria for details.
Entries for the Benchmarking Small Language Models in the Real World hackathon.
The evaluation is designed specifically for this hackathon and reflects what matters most for the task: building the best text-to-Polars query generation model.
Submissions are evaluated primarily on their ability to generate correct Polars queries from natural language prompts. The most important factor is therefore the number of correct answers produced on the evaluation set.
The public leaderboard ranks teams by their best completed Global submission. Scores come from the official benchmark endpoint (POST /submit_final), and higher is better.
Score = N / (T * VRAM^0.1 * RAM^0.01)
Where:
This reflects the order of importance:
All teams are evaluated using the same set of benchmark questions and consistent evaluation conditions so results are directly comparable.
If two projects receive very similar scores, judges may use submission quality as a tie-breaker (reproducibility, clarity, documentation), with correctness remaining the primary objective.
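As a rough illustration of the scoring formula and the ranking rule above, here is a minimal Python sketch. It rests on assumptions: N is taken to be the number of correct answers, while T, VRAM and RAM are treated as positive numeric measurements whose exact meaning and units are not defined in this text. The `Submission` class and the `final_score` and `leaderboard` helpers are hypothetical names used only for this example and are not part of the portal.

```python
from dataclasses import dataclass


def final_score(n_correct: int, t: float, vram: float, ram: float) -> float:
    """Score = N / (T * VRAM^0.1 * RAM^0.01), following the formula above.

    Assumption: t, vram and ram are positive measurements; their units
    are not specified in the source text.
    """
    if t <= 0 or vram <= 0 or ram <= 0:
        raise ValueError("T, VRAM and RAM must be positive")
    return n_correct / (t * vram ** 0.1 * ram ** 0.01)


@dataclass
class Submission:
    # Hypothetical record layout, for illustration only.
    team: str
    mode: str        # "test" or "global"
    completed: bool
    score: float


def leaderboard(submissions):
    """Rank teams by their best completed Global submission, highest first."""
    best = {}
    for s in submissions:
        if s.mode == "global" and s.completed:
            best[s.team] = max(best.get(s.team, float("-inf")), s.score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Because of the exponents, doubling T halves the score, doubling VRAM lowers it by roughly 7%, and doubling RAM lowers it by less than 1%, which is why correctness and T dominate the result.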
The portal supports two submission modes:
Test runs (test): Use these to iterate quickly and debug. Test runs provide development feedback to help you improve correctness and performance, but may not match final leaderboard rankings.
Global submissions (global): These are the submissions that count for the leaderboard and produce a Final Score using the formula above.
README with exact install and run steps.
This milestone has passed
The team formation deadline has passed, but you may still create or join a team now if you need to.
Hint: You may still edit your submission until it is actually judged.
Times shown are in Europe/Paris time.