Welcome to the Polars Bench hackathon. Your goal is to build a text → Polars system: given a natural-language prompt, your model should generate Polars code that produces the correct answer, reliably and efficiently.
Key links (start here)
You will submit your project on AIT as well (see “Final submission (AIT)” below).
Template repo (recommended starting point)
Use the provided GitHub template to get the correct structure and runner contract from day one:
Recommended approach:
- Click “Use this template” to create your own repository (clean history), or fork it.
- Keep the entrypoint and required files consistent with the template, then swap in your model + logic.
Dataset access (important)
You do not download the evaluation dataset.
For official runs, the dataset is provided inside the evaluation runner (mounted/available in the runtime). This is done specifically to:
- prevent dataset leakage, and
- keep evaluation conditions consistent across teams.
Your code should read data from the runner-provided location(s) as defined by the template/runner contract.
What you’re building (hackathon theme)
Build a model + runtime that can:
- read a natural language analytics request,
- generate Polars operations (expressions / DataFrame code),
- run them correctly on the provided data,
- return valid outputs consistently (no crashes, no malformed responses).
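The last two bullets (run the generated code, never crash, never return a malformed response) can be sketched as a guarded executor. This is a minimal sketch, not the runner contract: `run_generated`, the convention that generated code stores its answer in a variable called `result`, and the shape of the returned payload are all hypothetical. The template defines the real output format. Note that `exec` is not a sandbox.

```python
import json
import traceback


def run_generated(code: str, namespace: dict) -> dict:
    """Execute generated code and always return a well-formed payload.

    Assumption: the generated snippet stores its answer in a variable
    named `result`. That convention is hypothetical, not from the template.
    """
    scope = dict(namespace)  # copy so the snippet's assignments do not leak into the caller's dict
    try:
        exec(code, scope)  # NOTE: not a sandbox; this runs whatever the model produced
        if "result" not in scope:
            return {"ok": False, "answer": None, "error": "no `result` variable set"}
        return {"ok": True, "answer": scope["result"], "error": None}
    except Exception:
        # Keep only the last traceback line so logs stay short.
        last_line = traceback.format_exc().strip().splitlines()[-1]
        return {"ok": False, "answer": None, "error": last_line}


good = run_generated("result = sum(values)", {"values": [1, 2, 3]})
bad = run_generated("result = values[10]", {"values": [1, 2, 3]})
print(json.dumps(good))
print(json.dumps(bad))
```

In a Polars pipeline the namespace would typically carry the module and the loaded frame, for example `{"pl": pl, "df": df}`, so a failing query turns into a logged error instead of a crashed run.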
The platform supports two submission modes:
1) Test submissions (test)
Use these during development to iterate quickly and debug. Test runs provide development feedback (logs, outputs, and other signals) to help you improve correctness and performance, but they may not match final leaderboard rankings.
2) Global submissions (global)
These count for the public leaderboard. The leaderboard ranks teams by their best completed Global submission (status done) and uses the Final Score returned by the official benchmark endpoint (POST /submit_final).
Final Score = N / (T * VRAM^0.1 * RAM^0.01)
Where:
- N = number of correct answers (exact matches)
- T = total generation time across the evaluation set
- VRAM = GPU memory usage
- RAM = system memory usage
Higher is better. Only your best global score per team is shown on the leaderboard.
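To get a feel for how the terms trade off, the formula above can be evaluated directly. This is a sketch with made-up inputs: the units the official endpoint uses for T, VRAM, and RAM are not stated here, so only the ratios between scores are meaningful (they do not depend on units).

```python
def final_score(n_correct: int, total_time: float, vram: float, ram: float) -> float:
    """Final Score = N / (T * VRAM^0.1 * RAM^0.01), as stated above."""
    return n_correct / (total_time * vram ** 0.1 * ram ** 0.01)


# Illustrative inputs only; the units used by the official endpoint are an assumption.
base = final_score(n_correct=80, total_time=100.0, vram=8.0, ram=16.0)
half_time = final_score(n_correct=80, total_time=50.0, vram=8.0, ram=16.0)
half_vram = final_score(n_correct=80, total_time=100.0, vram=4.0, ram=16.0)
half_ram = final_score(n_correct=80, total_time=100.0, vram=8.0, ram=8.0)

print(f"halving T    multiplies the score by {half_time / base:.3f}")
print(f"halving VRAM multiplies the score by {half_vram / base:.3f}")
print(f"halving RAM  multiplies the score by {half_ram / base:.3f}")
```

Halving T doubles the score, halving VRAM raises it by about 7% (a factor of 2^0.1), and halving RAM raises it by about 0.7% (a factor of 2^0.01). Correctness and generation time dominate; memory acts mostly as a tie-breaker.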
How to use the evaluation platform (recommended workflow)
- Create / join a team on polarsbench.net.
- Start from the template repo and implement your model.
- Submit a test run early:
  - confirm the runner can build your repo
  - confirm outputs are valid
- Iterate until you’re stable and correct.
- Submit a global run when ready for an official ranking.
- Keep iterating: your leaderboard entry updates when you achieve a better global Final Score.
Repo requirements (what the runner expects)
Your submission is the URL of a public GitHub repository that is:
- Reproducible: pinned dependencies (lockfiles recommended)
- Self-contained: includes all code/config required to run
- Non-interactive: no prompts; fully automated execution
- Runner-friendly:
  - clear entrypoint
  - predictable install/build steps
  - sensible logging (enough to debug failures)
Best practices:
- Pin versions (Python/Node deps, model revision, etc.).
- Avoid downloading large artifacts at runtime unless required (and cache when possible).
- Make your runtime deterministic where possible (seed, fixed decoding, stable formatting).
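The determinism bullet (seed, stable formatting) can be sketched with the standard library alone. Assumptions: `seed_everything` and `format_answer` are hypothetical helper names, and JSON is used here only as an example of a canonical serialization. The expected output format is whatever the template specifies. If you use an ML framework, its own seeding and decoding settings need to be fixed as well.

```python
import json
import random


def seed_everything(seed: int = 0) -> None:
    """Seed the stdlib RNG; extend with your framework's seeding calls if you use one."""
    random.seed(seed)


def format_answer(answer) -> str:
    """Serialize with sorted keys and fixed separators so equal answers print identically."""
    return json.dumps(answer, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)

print(format_answer({"b": 1, "a": [1.5, None]}))
```

Since N counts exact matches, a stable serialization removes one source of spurious mismatches between otherwise identical answers.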
Best practices to score well (and demo well)
1) Correctness first (maximize N)
- Ensure outputs follow the expected format every time.
- Handle edge cases: nulls, empty results, dtype pitfalls, parsing issues.
- Add quick regression tests for prompts you commonly fail.
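A regression check for commonly failed prompts can be very small. The sketch below is one possible shape, under stated assumptions: the cases, `fake_answer`, and `run_regressions` are all hypothetical, and `fake_answer` stands in for your real prompt-to-answer pipeline so the example runs on its own.

```python
from typing import Callable

# Hypothetical regression cases: (prompt, expected answer). Replace with prompts
# your own system has failed on during test runs.
CASES = [
    ("How many rows are there?", "3"),
    ("What is the largest value?", "30"),
]


def fake_answer(prompt: str) -> str:
    """Stand-in for your real pipeline so this sketch runs on its own."""
    data = [10, 20, 30]
    if "rows" in prompt:
        return str(len(data))
    if "largest" in prompt:
        return str(max(data))
    return ""


def run_regressions(answer_fn: Callable[[str], str], cases) -> list:
    """Return the failing cases as (prompt, expected, got); empty means all pass."""
    failures = []
    for prompt, expected in cases:
        try:
            got = answer_fn(prompt)
        except Exception as exc:  # a crash counts as a failure, not a test abort
            got = f"<crashed: {type(exc).__name__}>"
        if got != expected:  # exact match, mirroring how N is counted
            failures.append((prompt, expected, got))
    return failures


failures = run_regressions(fake_answer, CASES)
print(f"{len(CASES) - len(failures)}/{len(CASES)} passed")
```

Running this before every test submission catches regressions locally, where the feedback loop is shortest.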
2) Then optimize latency (minimize T)
- Keep generation short and structured.
- Reduce overhead: model warm-up, repeated loads, unnecessary preprocessing.
- Cache safely where it doesn’t change correctness.
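Two of the overheads above, repeated model loads and repeated work on identical input, can be removed with `functools.lru_cache`. This is a sketch under assumptions: `load_model` and `generate` are hypothetical stand-ins, and caching by prompt is only safe if generation is deterministic. Whether the evaluation set repeats prompts at all is not stated, so the model-load cache is the part most likely to matter.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(name: str) -> dict:
    """Stand-in for an expensive model load; the sleep simulates load time."""
    time.sleep(0.05)
    return {"name": name}


@lru_cache(maxsize=256)
def generate(prompt: str) -> str:
    """Cache by exact prompt. Only safe if generation is deterministic."""
    model = load_model("hypothetical-model")
    return f"{model['name']}: {prompt.strip().lower()}"


generate("Count rows")
generate("Count rows")      # served from cache, no second generation
generate("Sum the column")  # new prompt, but the model is not reloaded

print(load_model.cache_info())
print(generate.cache_info())
```

`cache_info()` reports hits and misses, which makes it easy to confirm in logs that the model was loaded exactly once.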
3) Control memory (VRAM / RAM)
- Prefer smaller / quantized models if they preserve accuracy.
- Avoid loading duplicate model copies.
- Watch peak allocations during generation and execution.
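For the RAM side, the process-wide peak can be read from the operating system. The sketch below uses the standard `resource` module, which is Unix only. `peak_rss_mib` is a hypothetical helper, and how the official benchmark measures RAM is not stated here, so treat the number as a local indicator rather than the scored value. GPU memory needs framework-specific tooling and is not covered.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mib()
buffer = b"\x01" * (50 * 1024 * 1024)  # allocate and fill about 50 MiB
after = peak_rss_mib()
del buffer
print(f"peak RSS before: {before:.1f} MiB, after: {after:.1f} MiB")
```

Because the value is a high-water mark, it never decreases after memory is freed. Logging it after model load and again after the last query shows which phase sets the peak.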
4) Make it easy to demo
- Provide a one-command run path.
- Add a short “Demo” section in your README with setup + example prompts + expected outputs.
- Keep logs readable: show key timings + failure reasons.
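Readable logs with key timings and failure reasons can come from one small context manager. This is a sketch, not part of the template: `timed`, the stage names, and the log format are hypothetical choices.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("demo")

timings = {}


@contextmanager
def timed(stage: str):
    """Record wall-clock time for a stage and log failures with their reason."""
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        log.error("stage=%s failed: %s: %s", stage, type(exc).__name__, exc)
        raise
    finally:
        timings[stage] = time.perf_counter() - start
        log.info("stage=%s seconds=%.4f", stage, timings[stage])


with timed("generate"):
    time.sleep(0.01)  # stand-in for model generation

try:
    with timed("execute"):
        raise ValueError("column 'price' not found")  # stand-in for a runtime failure
except ValueError:
    pass  # already logged; a real runner would return a fallback answer here
```

Each stage produces one timing line, and a failing stage adds one error line with the exception type and message, which is usually enough to diagnose a failed run from its logs.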
Final submission (AIT)
In addition to polarsbench.net runs, you must submit on the AIT platform.
Include in your AIT submission:
- your team name
- your public GitHub repo URL
- your polarsbench.net team page URL
- (recommended) your best global run / leaderboard proof (score + timestamp)
- short notes on:
  - model choice + why
  - key optimizations
  - reproducibility instructions (how to run)
Troubleshooting checklist
If a run fails or you don’t appear on the leaderboard:
- Did you submit as global (not test)?
- Did the run reach status = done?
- Does your run have a non-null Final Score?
- Is your repo public and buildable from scratch?
- Are you printing / returning outputs in the expected format?
If you’re stuck, do a test run first, fix build/runtime/output issues, then go global.