Executive summary
Probe outcomes and drift valence across the run.
Safe to upgrade: true · Manual review items: 2
Critical regressions: 2 · Policy drift: 0 · Fidelity drift: 0 · Structural drift: 0 · Content compression: 0
Migration profile
v2 is safe to upgrade, but 2 probes warrant review before switching.
Dimension overview
Worst risk per dimension, drift direction mix among affected probes (↑ / ↓ / →), and affected count.
| Dimension | Worst risk | Drift (affected) | Probes |
|---|---|---|---|
| Morphology | Amber | ↓0↑0→1 | 1 |
| Tone | Green | — | 0 |
| Factual | Amber | ↓0↑0→5 | 5 |
| Schema | Green | — | 0 |
| Instruction | Green | — | 0 |
| Refusal | Green | — | 0 |
| Semantic | Amber | ↓2↑0→0 | 2 |
| Claim | Red | ↓2↑0→0 | 2 |
| Latency | Green | — | 0 |
| Consistency | Green | — | 0 |
| Custom assertions | Green | — | 0 |
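The "worst risk" and "affected count" columns above can be read as a simple roll-up over the per-probe dimension snapshots. The sketch below is an illustration only: the band ordering (Green < Amber < Red), the rule that any non-Green probe counts as affected, and the function name `worst_risk_per_dimension` are assumptions, not the tool's documented behaviour.

```python
from collections import defaultdict

# Assumed severity ordering of the risk bands, least to most severe.
RISK_ORDER = {"Green": 0, "Amber": 1, "Red": 2}

def worst_risk_per_dimension(probes):
    """Return {dimension: (worst band, number of non-Green probes)}."""
    worst = {}
    affected = defaultdict(int)
    for snapshot in probes.values():
        for dimension, band in snapshot.items():
            current = worst.get(dimension, "Green")
            if RISK_ORDER[band] > RISK_ORDER[current]:
                current = band
            worst[dimension] = current
            if band != "Green":
                affected[dimension] += 1
    return {d: (worst[d], affected[d]) for d in worst}

# The two flagged probes from this report, reduced to three dimensions.
probes = {
    "schema_array_of_objects": {
        "Morphology": "Green", "Semantic": "Amber", "Claim": "Red",
    },
    "schema_classification_output": {
        "Morphology": "Amber", "Semantic": "Amber", "Claim": "Red",
    },
}

summary = worst_risk_per_dimension(probes)
for dimension, (band, count) in summary.items():
    print(f"{dimension}: {band} ({count} affected)")
```

Under these assumptions the roll-up reproduces the Morphology, Semantic, and Claim rows of the table.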
Red probes
All probe results
Each probe lists its prompt, dimension snapshot (with drift arrows), claim intelligence, and responses. Consistency σ shows run variance when multi-run sampling was enabled.
schema_array_of_objects Amber σ v1 0 · v2 0
Prompt
Return a JSON array containing exactly 3 objects. Each object must have: id (integer), label (string). No explanation, just the JSON array.
Dimension snapshot
| Severity | 🟠 High | ||
|---|---|---|---|
| Drift category | ⛔ Critical regression (CriticalRegression) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 17% · type Δ false | |
| Tone | Green | Neutral · formality Δ -8% | |
| Semantic | Amber | Regression · sim 0.5256409491254004 | |
| Claim | Red | Regression · preservation 0.5 (≥ 0.7 for green band) · dropped 3 · new 0 | |
| Latency | Green | Neutral · v1 1367 ms · v2 1144 ms · Δ -223 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
Claim diff: 3 matched · 3 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: "id": 1, ↔ v2: {"id": 1, "label": "Apple"},
- sim 1 · anchors ok true · v1: "id": 2, ↔ v2: {"id": 2, "label": "Banana"},
- sim 1 · anchors ok true · v1: "id": 3, ↔ v2: {"id": 3, "label": "Cherry"}
Dropped (v1 → missing in v2)
- "label": "First Item" · anchors: Item
- "label": "Second Item" · anchors: Item
- "label": "Third Item" · anchors: Item
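The preservation figures in this report are consistent with the ratio matched / (matched + dropped): 3 matched and 3 dropped gives 0.5 here, and 1 matched with 2 dropped gives 0.333 for the classification probe. The sketch below assumes that formula; it is inferred from the numbers shown, not taken from the tool's documentation, and the function names are hypothetical. Only the green threshold (0.7) is stated in the report, so the amber/red boundary is left out.

```python
GREEN_THRESHOLD = 0.7  # stated in the report as the green-band cutoff

def claim_preservation(matched: int, dropped: int) -> float:
    """Share of v1 claims that still have a match in v2 (assumed formula)."""
    total = matched + dropped
    # With no v1 claims at all, nothing can be lost; treat as fully preserved.
    return 1.0 if total == 0 else matched / total

def meets_green_band(preservation: float) -> bool:
    """True when preservation reaches the green-band cutoff."""
    return preservation >= GREEN_THRESHOLD

for name, matched, dropped in [
    ("schema_array_of_objects", 3, 3),
    ("schema_classification_output", 1, 2),
    ("schema_deeply_nested", 4, 0),
]:
    p = claim_preservation(matched, dropped)
    print(f"{name}: preservation {p:.3f}, green band: {meets_green_band(p)}")
```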
v1 response
```json
[
{
"id": 1,
"label": "First Item"
},
{
"id": 2,
"label": "Second Item"
},
{
"id": 3,
"label": "Third Item"
}
]
```
v2 response
```json
[
{"id": 1, "label": "Apple"},
{"id": 2, "label": "Banana"},
{"id": 3, "label": "Cherry"}
]
```
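The prompt for this probe constrains only the shape of the output (exactly 3 objects, each with an integer `id` and a string `label`), not the label values. A structural check like the sketch below can help separate a genuine schema break from a difference in free-choice content during manual review. The helper names are hypothetical, and stripping a surrounding Markdown fence before parsing is an assumption about how the responses would be consumed.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

def strip_fences(text: str) -> str:
    """Remove a surrounding Markdown code fence, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)

def is_valid_array_response(text: str) -> bool:
    """Check: JSON array of exactly 3 objects with integer id and string label."""
    try:
        data = json.loads(strip_fences(text))
    except json.JSONDecodeError:
        return False
    if not isinstance(data, list) or len(data) != 3:
        return False
    for item in data:
        if not isinstance(item, dict) or not {"id", "label"}.issubset(item):
            return False
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(item["id"], bool) or not isinstance(item["id"], int):
            return False
        if not isinstance(item["label"], str):
            return False
    return True

v1 = (FENCE + 'json\n[{"id": 1, "label": "First Item"}, '
      '{"id": 2, "label": "Second Item"}, '
      '{"id": 3, "label": "Third Item"}]\n' + FENCE)
v2 = (FENCE + 'json\n[{"id": 1, "label": "Apple"}, '
      '{"id": 2, "label": "Banana"}, '
      '{"id": 3, "label": "Cherry"}]\n' + FENCE)

print(is_valid_array_response(v1), is_valid_array_response(v2))
```

Both responses pass this structural check, which suggests the claim regression here reflects the differing label values and not a change in output shape.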
schema_classification_output Amber σ v1 0 · v2 0
Prompt
Classify the following text and return ONLY a JSON object with fields: category (string), confidence (float 0-1, must be a number not a word), reasoning (string). No markdown fences. Raw JSON only. Text: 'The quarterly revenue exceeded expectations by 12 percent.'
Dimension snapshot
| Severity | 🟠 High | ||
|---|---|---|---|
| Drift category | ⛔ Critical regression (CriticalRegression) | ||
| Morphology | Amber | Neutral · Δ tokens ≈ 5% · type Δ true | |
| Tone | Green | Neutral · formality Δ 5% | |
| Semantic | Amber | Regression · sim 0.38770385429172166 | |
| Claim | Red | Regression · preservation 0.3333333333333333 (≥ 0.7 for green band) · dropped 2 · new 0 | |
| Latency | Green | Neutral · v1 963 ms · v2 908 ms · Δ -55 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
Claim diff: 1 matched · 2 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: "reasoning": "The text discusses financial performance, specifically revenue and expectations, which are key elements in the business context." ↔ v2: {"category":"Finance","confidence":0.95,"reasoning":"The text discusses quarterly revenue and its performance relative to expectations, which is directly related to financial reporting and business performance metrics."}
Dropped (v1 → missing in v2)
- "category": "Business",
- "confidence": 0.95, · anchors: 0.95
v1 response
{
"category": "Business",
"confidence": 0.95,
"reasoning": "The text discusses financial performance, specifically revenue and expectations, which are key elements in the business context."
}
v2 response
{"category":"Finance","confidence":0.95,"reasoning":"The text discusses quarterly revenue and its performance relative to expectations, which is directly related to financial reporting and business performance metrics."}
schema_deeply_nested Green σ v1 0 · v2 0
Prompt
Return this exact JSON with these exact values. No markdown code fences. No explanation. Raw JSON only:
{
"request": {
"meta": {
"version": 1,
"source": "api.example.com"
},
"payload": {
"items": ["item1", "item2", "item3"]
}
}
}
Dimension snapshot
| Severity | ⚪ Informational | ||
|---|---|---|---|
| Drift category | ✓ No significant drift (NoSignificantDrift) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 0% · type Δ false | |
| Tone | Green | Neutral · formality Δ 0% | |
| Semantic | Green | Neutral · sim 1 | |
| Claim | Green | Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0 | |
| Latency | Green | Regression · v1 1138 ms · v2 1413 ms · Δ 275 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
| Factual | Amber | Neutral · regression false | |
Claim diff: 4 matched · 0 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: "request": { ↔ v2: "request": {
- sim 1 · anchors ok true · v1: "version": 1, ↔ v2: "version": 1,
- sim 1 · anchors ok true · v1: "source": "api.example.com" ↔ v2: "source": "api.example.com"
- sim 1 · anchors ok true · v1: "payload": { ↔ v2: "payload": {
v1 response
{
"request": {
"meta": {
"version": 1,
"source": "api.example.com"
},
"payload": {
"items": ["item1", "item2", "item3"]
}
}
}
v2 response
{
"request": {
"meta": {
"version": 1,
"source": "api.example.com"
},
"payload": {
"items": ["item1", "item2", "item3"]
}
}
}
schema_enum_field Green σ v1 0 · v2 0
Prompt
Return a JSON object with these fields: status (must be exactly the string "pending"), priority (integer, set it to 3). No explanation, just the JSON.
Dimension snapshot
| Severity | ⚪ Informational | ||
|---|---|---|---|
| Drift category | ✓ No significant drift (NoSignificantDrift) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 0% · type Δ false | |
| Tone | Green | Neutral · formality Δ 0% | |
| Semantic | Green | Neutral · sim 0.9999999883971046 | |
| Claim | Green | Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0 | |
| Latency | Green | Regression · v1 857 ms · v2 1238 ms · Δ 381 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
| Factual | Amber | Neutral · regression false | |
Claim diff: 2 matched · 0 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: "status": "pending", ↔ v2: "status": "pending",
- sim 1 · anchors ok true · v1: "priority": 3 ↔ v2: "priority": 3
v1 response
```json
{
"status": "pending",
"priority": 3
}
```
v2 response
```json
{
"status": "pending",
"priority": 3
}
```
schema_flat_required_fields Green σ v1 0 · v2 0
Prompt
Return a JSON object with exactly these fields: name (string), age (integer), active (boolean). No other fields. No explanation, just the JSON.
Dimension snapshot
| Severity | ⚪ Informational | ||
|---|---|---|---|
| Drift category | ✓ No significant drift (NoSignificantDrift) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 0% · type Δ false | |
| Tone | Green | Neutral · formality Δ 0% | |
| Semantic | Green | Neutral · sim 0.9999999734629076 | |
| Claim | Green | Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0 | |
| Latency | Green | Regression · v1 913 ms · v2 1352 ms · Δ 439 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
Claim diff: 2 matched · 0 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: "name": "John Doe", ↔ v2: "name": "John Doe",
- sim 1 · anchors ok true · v1: "age": 30, ↔ v2: "age": 30,
v1 response
```json
{
"name": "John Doe",
"age": 30,
"active": true
}
```
v2 response
```json
{
"name": "John Doe",
"age": 30,
"active": true
}
```
schema_nested_object Green σ v1 0 · v2 0
Prompt
Return a JSON object with this exact structure and these exact values. No explanation, just the JSON:
{
"user": {
"id": 1,
"email": "user@example.com"
},
"status": "active"
}
Dimension snapshot
| Severity | ⚪ Informational | ||
|---|---|---|---|
| Drift category | ✓ No significant drift (NoSignificantDrift) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 0% · type Δ false | |
| Tone | Green | Neutral · formality Δ 0% | |
| Semantic | Green | Neutral · sim 0.999999993085143 | |
| Claim | Green | Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0 | |
| Latency | Green | Neutral · v1 1141 ms · v2 1046 ms · Δ -95 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
| Factual | Amber | Neutral · regression false | |
Claim diff: 2 matched · 0 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: "id": 1, ↔ v2: "id": 1,
- sim 1 · anchors ok true · v1: "email": "user@example.com" ↔ v2: "email": "user@example.com"
v1 response
```json
{
"user": {
"id": 1,
"email": "user@example.com"
},
"status": "active"
}
```
v2 response
```json
{
"user": {
"id": 1,
"email": "user@example.com"
},
"status": "active"
}
```
schema_no_markdown_wrapping Green σ v1 0 · v2 0
Prompt
Return a JSON object: {"ok": true, "code": 200}. Output the raw JSON only. Do not wrap it in markdown code fences. Do not add any text before or after.
Dimension snapshot
| Severity | ⚪ Informational | ||
|---|---|---|---|
| Drift category | ✓ No significant drift (NoSignificantDrift) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 0% · type Δ false | |
| Tone | Green | Neutral · formality Δ 0% | |
| Semantic | Green | Neutral · sim 1 | |
| Claim | Green | Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0 | |
| Latency | Green | Improvement · v1 889 ms · v2 560 ms · Δ -329 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
| Factual | Green | Neutral · regression false | |
Claim diff: 1 matched · 0 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: {"ok": true, "code": 200} ↔ v2: {"ok": true, "code": 200}
v1 response
{"ok": true, "code": 200}
v2 response
{"ok": true, "code": 200}
schema_nullable_field Green σ v1 0 · v2 0
Prompt
Return a JSON object with these fields: title (string), description (string or null), count (integer). Set title to "test", description to null, count to 0. No explanation, just the JSON.
Dimension snapshot
| Severity | ⚪ Informational | ||
|---|---|---|---|
| Drift category | ✓ No significant drift (NoSignificantDrift) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 0% · type Δ false | |
| Tone | Green | Neutral · formality Δ 0% | |
| Semantic | Green | Neutral · sim 0.9999999826690928 | |
| Claim | Green | Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0 | |
| Latency | Green | Neutral · v1 965 ms · v2 1053 ms · Δ 88 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
| Factual | Amber | Neutral · regression false | |
Claim diff: 2 matched · 0 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: "description": null, ↔ v2: "description": null,
- sim 1 · anchors ok true · v1: "count": 0 ↔ v2: "count": 0
v1 response
```json
{
"title": "test",
"description": null,
"count": 0
}
```
v2 response
```json
{
"title": "test",
"description": null,
"count": 0
}
```
schema_numeric_types Green σ v1 0 · v2 0
Prompt
Return a JSON object with: count (must be the integer 5), ratio (must be the float 0.75), label (must be the string "test"). Values must be exactly as specified. No explanation, just the JSON.
Dimension snapshot
| Severity | ⚪ Informational | ||
|---|---|---|---|
| Drift category | ✓ No significant drift (NoSignificantDrift) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 0% · type Δ false | |
| Tone | Green | Neutral · formality Δ 0% | |
| Semantic | Green | Neutral · sim 0.9999999826690928 | |
| Claim | Green | Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0 | |
| Latency | Green | Improvement · v1 4220 ms · v2 886 ms · Δ -3334 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
| Factual | Amber | Neutral · regression false | |
Claim diff: 2 matched · 0 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: "count": 5, ↔ v2: "count": 5,
- sim 1 · anchors ok true · v1: "ratio": 0.75, ↔ v2: "ratio": 0.75,
v1 response
```json
{
"count": 5,
"ratio": 0.75,
"label": "test"
}
```
v2 response
```json
{
"count": 5,
"ratio": 0.75,
"label": "test"
}
```
schema_strict_no_extras Green σ v1 0 · v2 0
Prompt
Return a JSON object with ONLY these two fields: result (string), confidence (float between 0 and 1). Set result to the string "ok" and confidence to 0.9. Do not add any other fields. No markdown, no explanation. Raw JSON only.
Dimension snapshot
| Severity | ⚪ Informational | ||
|---|---|---|---|
| Drift category | ✓ No significant drift (NoSignificantDrift) | ||
| Morphology | Green | Neutral · Δ tokens ≈ 0% · type Δ false | |
| Tone | Green | Neutral · formality Δ 0% | |
| Semantic | Green | Neutral · sim 0.9999999641035373 | |
| Claim | Green | Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0 | |
| Latency | Green | Improvement · v1 1077 ms · v2 803 ms · Δ -274 ms | |
| Consistency | Green | Neutral · runs 3/3 · var 0 / 0 | |
| Refusal | Green | Neutral · new false · lifted false | |
| Factual | Green | Neutral · regression false | |
Claim diff: 1 matched · 0 dropped · 0 new
Matched claims
- sim 1 · anchors ok true · v1: {"result":"ok","confidence":0.9} ↔ v2: {"result":"ok","confidence":0.9}
v1 response
{"result":"ok","confidence":0.9}
v2 response
{"result":"ok","confidence":0.9}
Upgrade path
Structured rollout view. Validated prompt patches are listed here once a mutation run validates them.
Blocking 0 · Verify 2 · Neutral 0 · Validated 0
Improvements to verify
- schema_array_of_objects — High / Amber / Regression / CriticalRegression — warrants attention before switching
- schema_classification_output — High / Amber / Regression / CriticalRegression — warrants attention before switching
Validated prompt patches
No automated patches generated — run with --mutate to attempt prompt repairs.
Latency impact
Observational only — latency does not affect overall risk, probe direction, or upgrade path routing.
Average baseline latency
1353 ms
Average target latency
1040 ms
Delta: -313 ms (-23.1%) · Direction: Improvement
v2 responded 23% faster on average across 10 probes
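The averages above can be reproduced from the per-probe latencies in the dimension snapshots. The sketch below assumes a plain arithmetic mean over the 10 probes, which matches the reported figures; the variable names are illustrative.

```python
from statistics import mean

# (v1 ms, v2 ms) per probe, copied from the Latency rows of this report.
latencies_ms = {
    "schema_array_of_objects": (1367, 1144),
    "schema_classification_output": (963, 908),
    "schema_deeply_nested": (1138, 1413),
    "schema_enum_field": (857, 1238),
    "schema_flat_required_fields": (913, 1352),
    "schema_nested_object": (1141, 1046),
    "schema_no_markdown_wrapping": (889, 560),
    "schema_nullable_field": (965, 1053),
    "schema_numeric_types": (4220, 886),
    "schema_strict_no_extras": (1077, 803),
}

baseline = mean(v1 for v1, _ in latencies_ms.values())
target = mean(v2 for _, v2 in latencies_ms.values())
delta = target - baseline
pct = delta / baseline * 100

print(f"baseline {baseline:.0f} ms, target {target:.0f} ms")
print(f"delta {delta:.0f} ms ({pct:.1f}%)")  # delta -313 ms (-23.1%)
```

Most of the average improvement comes from a single probe: schema_numeric_types dropped from 4220 ms to 886 ms, while four of the ten probes were slower on v2.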
Upgrade recommendation
Overall risk is acceptable for a controlled rollout. Review the amber items and any red items marked "verify" before broad deployment.