
Test robot policies before field time.
Compare your policy against earlier checkpoints, another team, or a vendor runner on the same captured task pack, with provenance and proof boundaries attached.
Packing cell · 500 episodes · rank fidelity
Illustrative readout. Generated and simulated media are review support, not real-world proof.
How it works
Capture first. Package the proof. Decide the next test.
A run is configured per site. Blueprint turns a real captured site into a comparable task envelope so you can rank policies before spending scarce robot time.
Capture the site
A capturer records the real indoor site as a task pack — walkthrough media, depth, poses, and capture notes.
Package the evidence
The capture becomes a site-specific package with provenance, rights, and privacy limits attached and visible.
Run the comparison
Your policy is ranked against earlier checkpoints, another team, or a vendor runner on the same task envelope.
Decide the next test
Use the ranking, failure clusters, and missing-proof labels to pilot, tune, recapture, or hold.
Same task, same robot
One captured envelope. A clear policy ranking.
Compare your own checkpoints or policies submitted by other teams and vendors under one captured site, task, and threshold scope. Rankings are diagnostic: they report rank fidelity within that scope, not a universal accuracy guarantee.
Illustrative values. Correlation reference 0.929 (SC3-Eval).
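One common way to quantify rank fidelity is a rank correlation between the ordering a captured-site evaluation produces and the ordering observed on the real robot. The sketch below computes Spearman's rank correlation in plain Python. It is an illustration under stated assumptions, not Blueprint's method: the policy names and success rates are hypothetical, and the page does not say which correlation statistic the 0.929 reference uses.

```python
import math
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical success rates per policy (illustrative values only).
captured_site: Dict[str, float] = {
    "ckpt_a": 0.62, "ckpt_b": 0.71, "team_x": 0.55, "vendor_y": 0.80,
}
field_trial: Dict[str, float] = {
    "ckpt_a": 0.58, "ckpt_b": 0.66, "team_x": 0.60, "vendor_y": 0.75,
}

policies = sorted(captured_site)
rho = spearman(
    [captured_site[p] for p in policies],
    [field_trial[p] for p in policies],
)
print(f"rank correlation across {len(policies)} policies: {rho:.3f}")
```

A value near 1 means the captured-site evaluation orders the policies the same way the field does, even if the absolute success rates differ. That is why a ranking can be useful for deciding which policy to pilot without being an accuracy guarantee.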
Command center
See the clips.
First-person POV clips make policy failures easier to review across factory, warehouse, industrial, and home-task variants.

Boundary: Blueprint uses policy-evaluation research as category evidence for ranking and diagnostic workflows. It does not turn a virtual score into a universal accuracy guarantee or a public policy-ranking result outside the measured evaluation scope.
Request evaluation
Rank your policies before field time.
Bring your checkpoints, a teammate's policy, or a vendor runner. We package a captured real site and return a ranked, proof-bounded readout.
