⬇️ Download links in the "Assets" section below ⬇️
New in this release: Evals, a comprehensive evaluation toolkit.
- Build powerful SOTA evals (G-Eval, LLM as Judge)
- Check how well each eval correlates with human preferences (Kiln Ratings) to find the best evaluator
- Synthetically generate eval datasets with Kiln Synthetic Data Gen
- Use the analysis tools to find the optimal prompt+model for your task
- Automatic eval: Kiln will automatically build an eval for any Kiln task using your task definition
- Templates for common eval use cases: bias, toxicity, jailbreaking, maliciousness, and factual correctness
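
To illustrate the idea of checking evals against human preferences, here is a minimal sketch in plain Python. It is not Kiln's API or implementation: the function names (`average_ranks`, `spearman`, `best_evaluator`) and the sample data are hypothetical, and Spearman rank correlation is just one reasonable choice of metric, assumed here for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


def best_evaluator(human_ratings, evaluator_scores):
    """Pick the evaluator whose scores track the human ratings most closely.

    evaluator_scores maps a (hypothetical) evaluator name to its scores,
    one score per rated item, in the same order as human_ratings.
    """
    correlations = {
        name: spearman(human_ratings, scores)
        for name, scores in evaluator_scores.items()
    }
    return max(correlations, key=correlations.get), correlations


if __name__ == "__main__":
    # Made-up example data: five items rated 1-5 by a human.
    human = [1, 2, 3, 4, 5]
    candidates = {
        "g_eval": [0.1, 0.3, 0.2, 0.8, 0.9],
        "llm_as_judge": [0.9, 0.1, 0.5, 0.2, 0.4],
    }
    winner, scores = best_evaluator(human, candidates)
    print(winner, scores)
```

An evaluator whose scores rank items in the same order as the human ratings gets a correlation near 1, so it is the one to prefer when comparing prompts and models.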
Other new features include:
- Support for distilling (fine-tuning) an open model from Sonnet 3.7 thinking
- New Built-In Models: Sonnet 3.7, Dolphin 2.9 8x22B, and Grok
- Improved logging (thanks to @leonardmq)
- ARM Linux builds now included
CI build source for this release (Mac and Linux): /Kiln-AI/Kiln/actions/runs/13638728833
CI build source for this release (Windows): /Kiln-AI/Kiln/actions/runs/13638728765
Full Changelog: v0.11.1...v0.12.1