Leaderboard

Fewer prompts. More passing tests.

The demo ranks prompts on four criteria: public checks, hidden domain checks, UX/style, and prompt efficiency. Runtime is intentionally excluded from scoring.
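As a rough illustration of how the four criteria might combine into one ranking, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the weights, the `Entry` fields, the 0 to 1 scales, and the `1/n` efficiency formula are hypothetical, not the demo's actual scoring. The only details taken from the description are the four criteria and the fact that runtime does not count.

```python
from dataclasses import dataclass

# Hypothetical weights: the four criteria are named in the description,
# but their relative weighting is an assumption made for this sketch.
WEIGHTS = {"public": 0.4, "hidden": 0.3, "ux_style": 0.2, "efficiency": 0.1}


@dataclass
class Entry:
    name: str
    public_pass_rate: float   # fraction of public checks passed, 0..1 (assumed scale)
    hidden_pass_rate: float   # fraction of hidden domain checks passed, 0..1 (assumed scale)
    ux_style: float           # UX/style rating, 0..1 (assumed scale)
    prompts_used: int         # number of prompts needed to produce the app
    runtime_seconds: float = 0.0  # recorded, but deliberately never read by score()


def efficiency(prompts_used: int) -> float:
    """Assumed efficiency measure: one prompt scores 1.0, n prompts score 1/n."""
    return 1.0 / max(prompts_used, 1)


def score(entry: Entry) -> float:
    """Weighted sum of the four criteria; runtime is not an input."""
    return (
        WEIGHTS["public"] * entry.public_pass_rate
        + WEIGHTS["hidden"] * entry.hidden_pass_rate
        + WEIGHTS["ux_style"] * entry.ux_style
        + WEIGHTS["efficiency"] * efficiency(entry.prompts_used)
    )


def rank(entries: list[Entry]) -> list[Entry]:
    """Return entries ordered from highest to lowest score."""
    return sorted(entries, key=score, reverse=True)
```

Under these assumptions, two entries with identical check results differ only by prompt count, so the one that used fewer prompts ranks higher, and changing `runtime_seconds` never moves an entry.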

Illustration of a generated app turning into scored prompt evaluation cards