Each challenge gives you an ordinary public brief. A hidden evaluator then checks for the messy product and engineering behavior that a good spec should cover.
A production checkout brief where vague specs collapse under real commerce edge cases.
The visible checkout is the easy part. Hidden tests reveal whether your spec encodes cents math, stock rules, async safety, and mobile checkout behavior.
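As a rough illustration of two of those edge cases, here is a minimal sketch of integer-cents totals with a stock check. The names `LineItem` and `order_total_cents` are hypothetical and are not taken from the challenge or its evaluator; the sketch only shows the kind of rule a spec might state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    sku: str
    unit_price_cents: int  # prices stored as integer cents, never floats
    quantity: int


def order_total_cents(items, stock):
    """Return the order total in cents, rejecting invalid or unavailable quantities.

    `stock` maps SKU -> units available (an assumed shape for this sketch).
    """
    total = 0
    for item in items:
        if item.quantity <= 0:
            raise ValueError(f"quantity for {item.sku} must be positive")
        if item.quantity > stock.get(item.sku, 0):
            raise ValueError(f"not enough stock for {item.sku}")
        total += item.unit_price_cents * item.quantity
    return total
```

Keeping money in integer cents avoids float rounding drift (in Python, `0.1 + 0.2 != 0.3`), and the check against `stock` refuses an order that asks for more units than are available.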
A SaaS settings workflow with hidden lifecycle and permission traps.
CRUD is not enough: hidden tests check duplicate invites, single-use tokens, last-owner protection, and role authorization.
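A minimal in-memory sketch of those four rules follows. The `Team` class, its method names, and the role names are assumptions made for illustration, not the evaluator's actual interface.

```python
class Team:
    """Toy team model: roles keyed by email, invites keyed by token."""

    def __init__(self, owner):
        self.roles = {owner: "owner"}
        self.pending_invites = {}  # token -> invited email

    def invite(self, actor, email, token):
        # Role authorization: only owners and admins may invite.
        if self.roles.get(actor) not in ("owner", "admin"):
            raise PermissionError("only owners and admins can invite")
        # Duplicate invites: reject existing members and pending invitees.
        if email in self.roles or email in self.pending_invites.values():
            raise ValueError("already a member or already invited")
        self.pending_invites[token] = email

    def accept(self, token):
        # Single-use tokens: popping the token means a replay fails.
        email = self.pending_invites.pop(token, None)
        if email is None:
            raise ValueError("invalid or already used token")
        self.roles[email] = "member"
        return email

    def remove(self, actor, user):
        if self.roles.get(actor) != "owner":
            raise PermissionError("only owners can remove members")
        if user not in self.roles:
            raise ValueError("unknown user")
        # Last-owner protection: a team must always keep one owner.
        owners = [u for u, role in self.roles.items() if role == "owner"]
        if self.roles[user] == "owner" and len(owners) == 1:
            raise ValueError("cannot remove the last owner")
        del self.roles[user]
```

Each rule is a guard that plain CRUD would not include, which is why a spec needs to state it before any code is written.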