MagicPill Labs
Live App Lab Project

SparkEval

Lightweight LLM evaluation for small teams and individuals. Compare AI model providers, create custom graders, and run automated evaluations.

Get Started on SparkEval ↗

What SparkEval Does

Everything you need to evaluate and compare LLM performance, without the enterprise complexity.

⚖️

Compare Providers

Side-by-side comparison of AI model providers. See how GPT-4, Claude, Gemini, and others stack up on your data.

📝

Custom Graders

Define your own evaluation criteria. Score on accuracy, tone, format, or any custom metric that matters.
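As a rough illustration of what a custom grader amounts to, here is a minimal Python sketch. It is not SparkEval's actual interface, which this page does not show. The `Grader` type and the `exact_match`, `valid_json`, and `grade` names are hypothetical, and the 0-to-1 score range is an assumption.

```python
import json
from typing import Callable

# Hypothetical grader shape (an assumption, not SparkEval's API):
# takes the model output and a reference answer, returns a score in [0, 1].
Grader = Callable[[str, str], float]


def exact_match(output: str, expected: str) -> float:
    """Accuracy criterion: 1.0 when the texts match, ignoring case and outer whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def valid_json(output: str, expected: str) -> float:
    """Format criterion: 1.0 when the output parses as JSON. The reference is ignored."""
    try:
        json.loads(output)
        return 1.0
    except json.JSONDecodeError:
        return 0.0


def grade(output: str, expected: str, graders: dict[str, Grader]) -> dict[str, float]:
    """Apply every named grader to one model output and collect the scores."""
    return {name: grader(output, expected) for name, grader in graders.items()}
```

Calling `grade(" Paris ", "paris", {"accuracy": exact_match, "format": valid_json})` scores the answer as accurate but not valid JSON. A tone grader would follow the same shape, most likely by calling another model to do the judging.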

🔄

Automated Evals

Schedule recurring evaluations. Track model performance over time and catch regressions early.
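One simple way to think about catching a regression is to compare each run's score with the best score seen before it. The sketch below assumes scores between 0 and 1 and a fixed tolerance. The `find_regressions` name and the 0.05 default are hypothetical choices for illustration, not SparkEval behavior.

```python
def find_regressions(history: list[float], tolerance: float = 0.05) -> list[int]:
    """Return the indices of runs that scored more than `tolerance`
    below the best score of any earlier run."""
    regressions = []
    best = None  # best score seen so far; None until the first run
    for index, score in enumerate(history):
        if best is not None and score < best - tolerance:
            regressions.append(index)
        best = score if best is None else max(best, score)
    return regressions


# Four scheduled runs of the same evaluation, oldest first (made-up scores).
weekly_scores = [0.90, 0.92, 0.80, 0.91]
flagged = find_regressions(weekly_scores)
```

Here only the third run (index 2) is flagged, because it falls more than 0.05 below the earlier best of 0.92.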

📊

Dataset Testing

Upload your test datasets and run evaluations at scale. Batch testing for systematic quality assurance.
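Conceptually, a batch run applies one grader to every problem in a dataset and summarizes the scores, which also makes side-by-side provider comparison straightforward. The Python sketch below assumes a dataset is a list of records with `input` and `expected` fields. That layout, the `run_dataset` function, and the two stand-in providers are all hypothetical. A real run would call each provider's API instead.

```python
from statistics import mean
from typing import Callable

# A "model" here is any function from prompt text to response text.
Model = Callable[[str], str]


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output equals the expected answer, ignoring outer whitespace."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_dataset(dataset: list[dict], model: Model, grader=exact_match) -> dict:
    """Run every test problem through the model and summarize the scores."""
    scores = [grader(model(case["input"]), case["expected"]) for case in dataset]
    return {
        "problems": len(scores),
        "mean_score": mean(scores) if scores else 0.0,
        "failed": [i for i, score in enumerate(scores) if score < 1.0],
    }


# Stand-in providers for illustration only.
def provider_a(prompt: str) -> str:
    return prompt.upper()


def provider_b(prompt: str) -> str:
    return prompt


dataset = [
    {"input": "a", "expected": "A"},
    {"input": "B", "expected": "B"},
]

# One summary per provider, keyed by name, for side-by-side comparison.
report = {
    name: run_dataset(dataset, model)
    for name, model in {"provider_a": provider_a, "provider_b": provider_b}.items()
}
```

The `failed` list keeps the indices of problems that did not get a perfect score, so a low mean can be traced back to specific test cases.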

Pricing

Start free, upgrade when you need more. No credit card required.

Free

$0 / forever
  • Custom Graders: 10
  • File Uploads: 50
  • Dataset Test Problems: 100
  • Automated Evals per month: 50
  • Export as JSON

Basic

$10 / month
  • Custom Graders: 50
  • File Uploads: 250
  • Dataset Test Problems: 500
  • Automated Evals per month: 100
  • Export as JSON
Most Popular

Pro

$25 / month
  • Custom Graders: Unlimited
  • File Uploads: Unlimited
  • Dataset Test Problems: Unlimited
  • Automated Evals per month: Unlimited
  • Export as JSON

Ready to evaluate your AI models?

Try SparkEval Free →