sample reviewer packet
What a reviewer receives at debrief.
Every aifluent work sample ends in a packet a reviewer can defend in debrief, in hiring committee, and in audit. This page renders one in full: the brief, the five-dimension signal, the audit trail, the candidate's own AI-use reflection, the reviewer's guidance, and the limits of the run. No real candidate. No auto-rank. Decision support.
- Candidate
- Candidate (PM / AI-PM applicant, anonymised)
- Role
- Senior Product Manager, Workflow
- Scenario
- Feedback synthesis under AI-assisted constraints.
- Evidence summary
- Strong evidence in 4 of 5 dimensions
- The reviewer decides direction in debrief.
- Reviewer
- Moiz Ibn Yousaf
- Final report approved by the human reviewer.
- Approved at
- Apr 22, 2026, 4:14 PM UTC
artifact digest · sha-256
9b1c8a2f … d6c5b4a3
01 · the brief
The brief and the bar.
A B2B SaaS workflow product receives 240 mixed customer feedback snippets. The candidate uses the approved AI workbench to summarise and segment the feedback, then proposes a launch trade-off the leadership team can defend.
- Problem
- The product is a B2B SaaS workflow tool, and leadership wants to add AI summarisation. The PM has 240 mixed customer feedback snippets, usage metrics, and a short revenue and customer-segment table.
- Constraints
- Two constraints, from engineering and legal. The candidate decides which constraint binds the launch and which can be relaxed in a follow-up.
- Deliverable
- A short memo: problem framing, prioritised recommendation, trade-off analysis, experiment or rollout plan, risks, and an AI-use reflection.
- Tools
- The aifluent controlled workbench. Approved AI stack only. Every AI turn captured. Every keystroke timestamped. The evaluator stays out of the candidate's reach.
02 · reviewer signal
Five dimensions of reviewer signal.
Each dimension renders the reviewer's observation, the calibrated rubric note, the evidence cues that informed it, and one debrief follow-up the hiring team can use. No total. No ranking.
- 01
Structure.
Rubric note · 4 / 5. Calibrated against the locked evaluator. Not a total.
Memo opens with the decision, then the trade-offs, then the supporting segments. Reviewer flagged a single ordering issue in the appendix.
- Evidence cues
- Outline, problem framing, prioritisation logic, assumptions, trade-offs, final memo structure.
- Debrief follow-up
- Ask how the candidate would resequence the memo if engineering shipped only a partial release.
- 02
AI fluency.
Rubric note · 5 / 5. Calibrated against the locked evaluator. Not a total.
Candidate prompts the model for compression and segmentation, never for the recommendation itself. Three follow-up prompts narrow the segments by industry.
- Evidence cues
- Prompt strategy, iteration, verification, critique of AI output, tool choice, when they override AI.
- Debrief follow-up
- Ask which prompt produced the recommendation they kept, and which they overrode.
- 03
Business sense.
Rubric note · 4 / 5. Calibrated against the locked evaluator. Not a total.
Names the revenue-at-risk segment first. The launch trade-off is framed in account-tier language the reviewer used in calibration.
- Evidence cues
- Prioritisation, commercial logic, constraints, metrics, stakeholder trade-offs, decision rationale.
- Debrief follow-up
- Ask how the candidate would validate the top recommendation with customers in two weeks.
- 04
Communication clarity.
Rubric note · 4 / 5. Calibrated against the locked evaluator. Not a total.
Five short sections, no filler. The memo is reviewable in under three minutes; one chart is mislabelled.
- Evidence cues
- Executive summary, crisp recommendation, evidence-backed claims, audience fit, concise reasoning.
- Debrief follow-up
- Ask which two sentences they would cut if the memo had to fit one page.
- 05
Design thinking.
Rubric note · 3 / 5. Calibrated against the locked evaluator. Not a total.
Considers two user segments deeply, but the third is treated as a footnote. Reviewer recommends a debrief question on under-served personas.
- Evidence cues
- User problem framing, customer evidence, experiment design, accessibility and inclusion, failure modes.
- Debrief follow-up
- Ask how the under-served third segment would change the rollout plan.
03 · audit trail
Evidence, in the order it happened.
The aifluent ledger captures every relevant action across the run. The reviewer's packet shows the curated subset that explains the trade-offs in the memo.
audit ledger · run-relative timestamps
11 of 47 captured events shown · 6 AI turns in the full run
- 00:00 · Step 01 · consent.accepted
Recording policy accepted. Run begins under the locked AI workbench.
- 00:01 · Step 02 · brief.opened
Candidate scopes the scenario in their own words before any AI use.
- 00:08 · Step 03 · ai.prompt.submitted
First AI prompt requests compression and segmentation of the feedback corpus.
- 00:14 · Step 04 · source.opened
Candidate opens raw feedback rows and spot-reads two segment counts against the AI output.
- 00:22 · Step 05 · candidate.note.created
Candidate flags an AI segment count as four percentage points off.
- 00:31 · Step 06 · ai.prompt.submitted
Second AI prompt narrows segments by industry and account tier.
- 00:43 · Step 07 · memo.draft.started
Candidate drafts the trade-off framing in their own voice. AI not used for this paragraph.
- 00:55 · Step 08 · ai.prompt.submitted
Third AI prompt pressure-tests the revenue-at-risk claim against source rows.
- 01:02 · Step 09 · candidate.override.recorded
Candidate overrides the AI recommendation to drop the smallest segment and keeps it in the rollout plan.
- 01:08 · Step 10 · artifact.submitted
Final memo submitted. SHA-256 captured for tamper-evident audit.
- 01:09 · Step 11 · reflection.submitted
AI-use reflection submitted. Six AI turns logged across the run.
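The tamper-evidence at step 10 rests on a SHA-256 digest captured when the memo is submitted, the same kind of digest shown abbreviated at the top of this packet. A minimal Python sketch of how such a check could work, assuming the memo is hashed as UTF-8 text. The function names `artifact_digest`, `verify_artifact` and `short_digest` are hypothetical and do not describe aifluent's implementation; `short_digest` also assumes the display keeps eight hex characters at each end.

```python
import hashlib
import hmac


def artifact_digest(memo_text: str) -> str:
    """Return the SHA-256 hex digest of an artifact (assumed UTF-8 text)."""
    return hashlib.sha256(memo_text.encode("utf-8")).hexdigest()


def verify_artifact(memo_text: str, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one captured at submission.

    hmac.compare_digest compares in constant time. Any edit to the memo,
    even a single character, yields a different digest and fails the check.
    """
    return hmac.compare_digest(artifact_digest(memo_text), recorded_digest)


def short_digest(digest: str, keep: int = 8) -> str:
    """Abbreviate a hex digest for display (hypothetical format: first and
    last `keep` characters joined by an ellipsis)."""
    return f"{digest[:keep]} … {digest[-keep:]}"


if __name__ == "__main__":
    memo = "Recommendation: keep the smallest segment in the rollout plan."
    recorded = artifact_digest(memo)  # captured at artifact.submitted
    print(short_digest(recorded))
    print(verify_artifact(memo, recorded))                # True
    print(verify_artifact(memo + " (edited)", recorded))  # False
```

The digest proves only that the submitted memo has not changed since capture; it says nothing about who wrote which paragraph, which is why the ledger events and the reflection sit alongside it.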
04 · ai-use reflection
What AI helped with, what it did not.
The candidate names what AI did, what they verified by hand, and where they overrode the model. The reflection is the part of the packet a reviewer reads twice.
I used the AI workbench to compress 240 feedback snippets into segment summaries, and to pressure-test my revenue-at-risk number against the source rows. I prompted for compression and segmentation, never for the recommendation itself.
I verified two segment counts by re-reading the source rows. One AI count was off by four percentage points. I corrected it before it reached the memo.
I overrode the AI recommendation to drop the smallest segment. The support-tier mix made it strategically louder than its raw size suggests, and the trade-off only makes sense if that segment stays in the rollout plan.
The trade-off framing and the final recommendation are my own words. AI helped me see the inputs faster. The judgment is mine.
05 · reviewer guidance
What to ask in the debrief.
The reviewer's notes sit below, alongside one suggested follow-up so the debrief can challenge the strongest claim, not the weakest. The hiring team owns the final call.
suggested follow-up
Press the candidate on the override: ask what evidence would have changed their mind about keeping the smallest segment in the rollout.
reviewer's debrief note
Strong AI-native judgment on a constrained brief. The candidate uses AI to compress the input, then sets it aside to write the trade-off in their own voice. Reviewer is inclined; recommend a debrief on prioritisation under ambiguity.
what went well, in the reviewer's own words
Verification is built into the workflow, not bolted on. The candidate questions an AI claim, re-reads source rows, and adjusts the recommendation before submitting. That habit shows up four times across the run.
evidence the reviewer cited
Candidate scoped the brief in their own words before invoking AI. They prompted the model for segmentation, then verified two segment counts by spot-reading source rows. The final memo names the three trade-offs the reviewer flagged in calibration.
06 · limits of this run
Limitations of this single run.
aifluent reports are decision support. Every limitation a reviewer would flag in debrief belongs in the packet.
Single seventy-minute run. Single role. Single scenario. No hidden holdout this cycle. Replicate across a panel before any final decision.
Optional same-task AI reference run included for context only. Not an answer key. Not an uplift score.
One numeric claim (segment share at thirty-eight percent) was reproduced by the candidate but not independently sampled by the reviewer.