Run report

Structured prompt scorecard

Full Stack Ecommerce Checkout Web App: PromptGolf compares visible app completion against hidden product-engineering checks.

Codex CLI · gpt-5.5

Structured spec

Use a spec with states, acceptance tests, promo validation, quantity limits, cents-safe totals, and clear mobile/error behavior.

Score: 84 (7/10 hidden · 2 prompts)
Public tests: 5/5
Hidden tests: 7/10
  • Displays cart items, prices, and quantities (public)

    Cart table is visible and scannable.

    pass
  • Allows quantity changes (public)

    Increment and decrement controls are present.

    pass
  • Shows subtotal, shipping, tax, discount, and total (public)

    Order summary includes expected rows.

    pass
  • Accepts promo codes (public)

    Promo input and apply action are present.

    pass
  • Shows order confirmation (public)

    Checkout reaches a success state.

    pass
  • Integer cents math (hidden)

    Avoids floating-point totals and tax drift.

    pass
  • Promo normalization (hidden)

    Trims codes and handles case-insensitive matches.

    pass
  • Invalid code error (hidden)

    Bad codes produce clear, recoverable feedback.

    pass
  • Discount floor (hidden)

    Discounts cannot push payable total below zero.

    pass
  • Shipping threshold order (hidden)

    Free shipping uses the specified subtotal-before-discount rule.

    fail
  • Out-of-stock block (hidden)

    Unavailable line items prevent checkout.

    fail
  • Double-submit prevention (hidden)

    Repeated clicks cannot create duplicate orders.

    fail
  • Quantity boundaries (hidden)

    Quantities cannot go negative, drop to zero accidentally, or exceed stock.

    pass
  • Loading and error states (hidden)

    Async states are visible and buttons disable while pending.

    pass
  • Mobile usability and accessibility (hidden)

    Core controls work on small screens with labels and keyboard affordances.

    pass
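
Several of the passing hidden checks (integer cents math, promo normalization, invalid code error, discount floor, quantity boundaries) come down to a few small pure functions. The following is a minimal sketch of what such helpers might look like, not the code the agent generated. The function names, the `CartLine` type, and the promo table are all hypothetical; the 15% value for PROMO15 is only inferred from the preview figures.

```typescript
// Sketch only: names and the promo table are illustrative assumptions,
// not taken from the generated app. All money values are integer cents.
type CartLine = { unitPriceCents: number; quantity: number };

// Hypothetical promo table: normalized code -> whole-percent discount.
const PROMO_PERCENT = new Map<string, number>([["PROMO15", 15]]);

// Trim whitespace and uppercase so " promo15 " matches "PROMO15".
function normalizePromoCode(input: string): string {
  return input.trim().toUpperCase();
}

// Keep a quantity within 1..stock; return 0 only when nothing is in stock.
function clampQuantity(requested: number, stock: number): number {
  if (stock <= 0) return 0;
  if (!Number.isFinite(requested)) return 1;
  return Math.min(Math.max(Math.trunc(requested), 1), stock);
}

// Sum of unit price times quantity, with no floating-point dollars involved.
function subtotalCents(lines: CartLine[]): number {
  return lines.reduce((sum, line) => sum + line.unitPriceCents * line.quantity, 0);
}

// Returns null for an unknown code so the UI can show a recoverable error.
function discountCents(subtotal: number, rawCode: string): number | null {
  const percent = PROMO_PERCENT.get(normalizePromoCode(rawCode));
  if (percent === undefined) return null;
  // Round once, to whole cents, at the point where a fraction can appear.
  const discount = Math.round((subtotal * percent) / 100);
  // Discount floor: matters for fixed-amount or oversized promos.
  return Math.min(discount, subtotal);
}

// Format integer cents for display only; never parse this string back.
function formatCents(cents: number): string {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  const rem = abs % 100;
  return sign + "$" + Math.floor(abs / 100) + "." + (rem < 10 ? "0" : "") + rem;
}
```

With the preview's subtotal of 15200 cents, `discountCents(15200, " promo15 ")` returns 2280, which is consistent with the "-$22.80" discount row in the preview.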
Robust checkout shell
generated checkout preview
Checkout
PROMO15
Canvas Tote x1
Field Notebook x2
USB-C Dock x3
Subtotal: $152.00
Discount: -$22.80
Total: $140.18

Structured requirements catch most bugs but miss a few domain-specific checkout rules.
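
Two of the missed rules, the shipping threshold order and the out-of-stock block, are easy to state as pure functions. The sketch below is one possible reading of those checks, assuming a flat shipping fee and a single free-shipping threshold. The threshold, the fee, and every name here are hypothetical, since the report does not state the actual values.

```typescript
// Sketch only: constants and names are assumptions, not values from the run.
type StockedLine = { name: string; quantity: number; stock: number };

const FREE_SHIPPING_THRESHOLD_CENTS = 10000; // hypothetical threshold
const FLAT_SHIPPING_CENTS = 799; // hypothetical flat fee

// The free-shipping test takes the subtotal BEFORE any discount is applied.
// Passing the discounted amount here is the ordering bug the check targets.
function shippingCents(subtotalBeforeDiscountCents: number): number {
  return subtotalBeforeDiscountCents >= FREE_SHIPPING_THRESHOLD_CENTS
    ? 0
    : FLAT_SHIPPING_CENTS;
}

// Lines that cannot be fulfilled: no stock left, or more requested than stocked.
function unavailableLines(lines: StockedLine[]): StockedLine[] {
  return lines.filter((line) => line.stock <= 0 || line.quantity > line.stock);
}

// Checkout is allowed only for a non-empty cart with no unavailable lines.
function canCheckout(lines: StockedLine[]): boolean {
  return lines.length > 0 && unavailableLines(lines).length === 0;
}
```

Under these assumed values, a cart with a 11000-cent subtotal and a 1650-cent discount still ships free, because the comparison uses 11000 rather than the discounted 9350.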

Failure categories
Shipping threshold order · Out-of-stock block · Double-submit race
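
For the double-submit race, one common client-side approach is to share a single in-flight promise across repeated clicks. The sketch below illustrates that idea under the assumption that order submission is an async function. `createSubmitGuard` is a hypothetical helper, not part of the generated app or of any library.

```typescript
// Sketch only: wraps an async submit function so that calls made while a
// submission is pending reuse the same promise instead of starting another.
function createSubmitGuard<T>(submit: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight) return inFlight; // repeated click: reuse the pending request
    inFlight = submit().finally(() => {
      inFlight = null; // allow a retry once the request has settled
    });
    return inFlight;
  };
}

// Usage with a stand-in submit function that counts how often it runs.
let submitCalls = 0;
const placeOrder = createSubmitGuard(() => {
  submitCalls += 1;
  return Promise.resolve("order-placed");
});
const firstClick = placeOrder();
const secondClick = placeOrder(); // same promise as firstClick, no second call
```

A guard like this only covers clicks within one page session. Disabling the button while pending gives the user visible feedback, and a server-side idempotency check would be needed to rule out duplicates from retries or multiple tabs.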

Sandbox/run timeline

  1. Resolve model
    complete
    Codex CLI provider selected: gpt-5.5 through AI SDK adapter.
  2. Provision sandbox
    running
    Live sandbox adapter is configured; run creation probes the sandbox API and reports connected or degraded state.
  3. Generate app
    complete
    Agent applied the submitted spec to a Next.js checkout implementation.
  4. Install + build
    complete
    npm install cache restored; TypeScript build completed.
  5. Playwright evaluation
    complete
    Public and hidden checkout tests executed with product seed cart data.
  6. Scorecard
    complete
    Scoring rewards public tests, hidden tests, UX/style, and prompt efficiency.