ARSENIC · Drift 7efb5fe6-38d2-4013-a68f-2449f2f84122

Executive summary

Probe outcomes and drift valence across the run.

Total probes: 10 · Green: 8 · Amber: 2 · Red: 0

↓ Regressions: 2 · ↑ Improvements: 0 · → Neutral: 8

Auto-fix candidates

Safe to upgrade: true · Manual review items: 2

Critical regressions: 2 · Policy drift: 0 · Fidelity drift: 0 · Structural drift: 0 · Content compression: 0

Migration profile

v2 is safe to upgrade — 2 probes warrant review before switching

⚡ 23% faster


Dimension overview

Worst risk per dimension, drift direction mix among affected probes (↑ / ↓ / →), and affected count.

Dimension	Worst risk	Drift (affected)	Probes
Morphology	Amber	↓0↑0→1	1
Tone	Green	—	0
Factual	Amber	↓0↑0→5	5
Schema	Green	—	0
Instruction	Green	—	0
Refusal	Green	—	0
Semantic	Amber	↓2↑0→0	2
Claim	Red	↓2↑0→0	2
Latency	Green	—	0
Consistency	Green	—	0
Custom assertions	Green	—	0

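Red probes

None. No probe is rated Red overall; the two Red results are Claim-dimension scores on Amber probes.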

All probe results

Each probe lists its prompt, dimension snapshot (with drift arrows), claim intelligence, and both responses. Consistency σ shows run variance when multi-run sampling was enabled.

schema_array_of_objects · Amber · σ (v1 0, v2 0) · Instruction · Regression

Prompt

Return a JSON array containing exactly 3 objects. Each object must have: id (integer), label (string). No explanation, just the JSON array.

Dimension snapshot

Severity	🟠 High
Drift category	⛔ Critical regression (CriticalRegression)
Morphology	Green	Neutral · Δ tokens ≈ 17% · type Δ false
Tone	Green	Neutral · formality Δ -8%
Semantic	Amber	Regression · sim 0.5256409491254004
Claim	Red	Regression · preservation 0.5 (≥ 0.7 for green band) · dropped 3 · new 0
Latency	Green	Neutral · v1 1367 ms · v2 1144 ms · Δ -223 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false

Claim diff: 3 matched · 3 dropped · 0 new
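The Claim score in the snapshot appears to follow directly from these counts: 3 matched out of 6 v1 claims gives 0.5, and the schema_classification_output probe (1 matched, 2 dropped) gives 0.333. A minimal sketch of that arithmetic follows. The formula `matched / (matched + dropped)` is an assumption inferred from the reported numbers, not a documented ARSENIC definition, and `claim_preservation` and `is_green` are hypothetical helper names. Only the 0.7 green threshold is taken from the report.

```python
GREEN_THRESHOLD = 0.7  # from the report: "≥ 0.7 for green band"


def claim_preservation(matched: int, dropped: int) -> float:
    """Share of v1 claims that still have a match in v2 (assumed formula)."""
    total = matched + dropped
    # Assumption: with no v1 claims there is nothing to lose, so score 1.0.
    return 1.0 if total == 0 else matched / total


def is_green(preservation: float) -> bool:
    """True when the score reaches the green band stated in the report."""
    return preservation >= GREEN_THRESHOLD


array_probe = claim_preservation(matched=3, dropped=3)
classification_probe = claim_preservation(matched=1, dropped=2)

print(array_probe, is_green(array_probe))
print(round(classification_probe, 4), is_green(classification_probe))
```

Under this reading, "new" claims do not enter the score, which matches both probes here (0 new in each) but cannot be confirmed from this report alone.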

Matched claims

sim 1 · anchors ok true · v1: "id": 1, ↔ v2: {"id": 1, "label": "Apple"},
sim 1 · anchors ok true · v1: "id": 2, ↔ v2: {"id": 2, "label": "Banana"},
sim 1 · anchors ok true · v1: "id": 3, ↔ v2: {"id": 3, "label": "Cherry"}

Dropped (v1 → missing in v2)

"label": "First Item" · anchors: Item
"label": "Second Item" · anchors: Item
"label": "Third Item" · anchors: Item

v1 response

```json
[
    {
        "id": 1,
        "label": "First Item"
    },
    {
        "id": 2,
        "label": "Second Item"
    },
    {
        "id": 3,
        "label": "Third Item"
    }
]
```

v2 response

```json
[
  {"id": 1, "label": "Apple"},
  {"id": 2, "label": "Banana"},
  {"id": 3, "label": "Cherry"}
]
```
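Both responses look structurally valid against the prompt; the flagged drift comes from the label values changing ("First Item" to "Apple", and so on), which the prompt leaves unconstrained. A rough way to confirm this during manual review is sketched below. The helper names are hypothetical, and stripping a surrounding Markdown fence before parsing is an assumption about how the responses should be normalised.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def strip_fences(text: str) -> str:
    """Drop a surrounding Markdown fence (with or without a language tag)."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def is_int(value) -> bool:
    # bool is a subclass of int in Python, so rule it out explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def matches_prompt(response: str) -> bool:
    """True for a JSON array of exactly 3 objects with id (int) and label (str)."""
    try:
        data = json.loads(strip_fences(response))
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, list)
        and len(data) == 3
        and all(
            isinstance(obj, dict)
            and is_int(obj.get("id"))
            and isinstance(obj.get("label"), str)
            for obj in data
        )
    )


# Responses copied from the report (whitespace condensed).
v1 = FENCE + 'json\n[{"id": 1, "label": "First Item"}, {"id": 2, "label": "Second Item"}, {"id": 3, "label": "Third Item"}]\n' + FENCE
v2 = FENCE + 'json\n[{"id": 1, "label": "Apple"}, {"id": 2, "label": "Banana"}, {"id": 3, "label": "Cherry"}]\n' + FENCE

print(matches_prompt(v1), matches_prompt(v2))
```

If both checks pass, the item is a candidate for being treated as benign content variance rather than a schema regression, though that call remains a manual review decision.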

schema_classification_output · Amber · σ (v1 0, v2 0) · Instruction · Regression

Prompt

Classify the following text and return ONLY a JSON object with fields: category (string), confidence (float 0-1, must be a number not a word), reasoning (string). No markdown fences. Raw JSON only.

Text: 'The quarterly revenue exceeded expectations by 12 percent.'

Dimension snapshot

Severity	🟠 High
Drift category	⛔ Critical regression (CriticalRegression)
Morphology	Amber	Neutral · Δ tokens ≈ 5% · type Δ true
Tone	Green	Neutral · formality Δ 5%
Semantic	Amber	Regression · sim 0.38770385429172166
Claim	Red	Regression · preservation 0.3333333333333333 (≥ 0.7 for green band) · dropped 2 · new 0
Latency	Green	Neutral · v1 963 ms · v2 908 ms · Δ -55 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false

Claim diff: 1 matched · 2 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: "reasoning": "The text discusses financial performance, specifically revenue and expectations, which are key elements in the business context." ↔ v2: {"category":"Finance","confidence":0.95,"reasoning":"The text discusses quarterly revenue and its performance relative to expectations, which is directly related to financial reporting and business performance metrics."}

Dropped (v1 → missing in v2)

"category": "Business",
"confidence": 0.95, · anchors: 0.95

v1 response

{
  "category": "Business",
  "confidence": 0.95,
  "reasoning": "The text discusses financial performance, specifically revenue and expectations, which are key elements in the business context."
}

v2 response

{"category":"Finance","confidence":0.95,"reasoning":"The text discusses quarterly revenue and its performance relative to expectations, which is directly related to financial reporting and business performance metrics."}
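For manual review, it may help to separate format compliance from content change. The sketch below checks each response against the constraints stated in the prompt (raw JSON, no fences, three typed fields) and then compares the category values. `check_classification` is a hypothetical helper, and treating any extra field as a violation is an assumption drawn from the word "ONLY" in the prompt.

```python
import json


def check_classification(response: str) -> list[str]:
    """Return a list of violations; an empty list means the response passes."""
    text = response.strip()
    if text.startswith("`"):
        return ["response is wrapped in a Markdown fence"]
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    if set(data) != {"category", "confidence", "reasoning"}:
        problems.append(f"unexpected field set: {sorted(data)}")
    for key in ("category", "reasoning"):
        if not isinstance(data.get(key), str):
            problems.append(f"{key} is not a string")
    conf = data.get("confidence")
    # bool is a subclass of int, so exclude it from the numeric check.
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not 0 <= conf <= 1:
        problems.append("confidence is not a number in [0, 1]")
    return problems


# Responses copied from the report.
v1 = """{
  "category": "Business",
  "confidence": 0.95,
  "reasoning": "The text discusses financial performance, specifically revenue and expectations, which are key elements in the business context."
}"""
v2 = '{"category":"Finance","confidence":0.95,"reasoning":"The text discusses quarterly revenue and its performance relative to expectations, which is directly related to financial reporting and business performance metrics."}'

print(check_classification(v1), check_classification(v2))
print(json.loads(v1)["category"], "->", json.loads(v2)["category"])
```

Both responses satisfy the format constraints under these checks; the substantive change is the category label moving from "Business" to "Finance", which matters only if downstream code depends on a fixed label set.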

schema_deeply_nested · Green · σ (v1 0, v2 0) · Instruction · Neutral

Prompt

Return this exact JSON with these exact values. No markdown code fences. No explanation. Raw JSON only:
{
  "request": {
    "meta": {
      "version": 1,
      "source": "api.example.com"
    },
    "payload": {
      "items": ["item1", "item2", "item3"]
    }
  }
}

Dimension snapshot

Severity	⚪ Informational
Drift category	✓ No significant drift (NoSignificantDrift)
Morphology	Green	Neutral · Δ tokens ≈ 0% · type Δ false
Tone	Green	Neutral · formality Δ 0%
Semantic	Green	Neutral · sim 1
Claim	Green	Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency	Green	Regression · v1 1138 ms · v2 1413 ms · Δ 275 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false
Factual	Amber	Neutral · regression false

Claim diff: 4 matched · 0 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: "request": { ↔ v2: "request": {
sim 1 · anchors ok true · v1: "version": 1, ↔ v2: "version": 1,
sim 1 · anchors ok true · v1: "source": "api.example.com" ↔ v2: "source": "api.example.com"
sim 1 · anchors ok true · v1: "payload": { ↔ v2: "payload": {

v1 response

{
  "request": {
    "meta": {
      "version": 1,
      "source": "api.example.com"
    },
    "payload": {
      "items": ["item1", "item2", "item3"]
    }
  }
}

v2 response

{
  "request": {
    "meta": {
      "version": 1,
      "source": "api.example.com"
    },
    "payload": {
      "items": ["item1", "item2", "item3"]
    }
  }
}

schema_enum_field · Green · σ (v1 0, v2 0) · Instruction · Neutral

Prompt

Return a JSON object with these fields: status (must be exactly the string "pending"), priority (integer, set it to 3). No explanation, just the JSON.

Dimension snapshot

Severity	⚪ Informational
Drift category	✓ No significant drift (NoSignificantDrift)
Morphology	Green	Neutral · Δ tokens ≈ 0% · type Δ false
Tone	Green	Neutral · formality Δ 0%
Semantic	Green	Neutral · sim 0.9999999883971046
Claim	Green	Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency	Green	Regression · v1 857 ms · v2 1238 ms · Δ 381 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false
Factual	Amber	Neutral · regression false

Claim diff: 2 matched · 0 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: "status": "pending", ↔ v2: "status": "pending",
sim 1 · anchors ok true · v1: "priority": 3 ↔ v2: "priority": 3

v1 response

```json
{
  "status": "pending",
  "priority": 3
}
```

v2 response

```json
{
  "status": "pending",
  "priority": 3
}
```

schema_flat_required_fields · Green · σ (v1 0, v2 0) · Instruction · Neutral

Prompt

Return a JSON object with exactly these fields: name (string), age (integer), active (boolean). No other fields. No explanation, just the JSON.

Dimension snapshot

Severity	⚪ Informational
Drift category	✓ No significant drift (NoSignificantDrift)
Morphology	Green	Neutral · Δ tokens ≈ 0% · type Δ false
Tone	Green	Neutral · formality Δ 0%
Semantic	Green	Neutral · sim 0.9999999734629076
Claim	Green	Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency	Green	Regression · v1 913 ms · v2 1352 ms · Δ 439 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false

Claim diff: 2 matched · 0 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: "name": "John Doe", ↔ v2: "name": "John Doe",
sim 1 · anchors ok true · v1: "age": 30, ↔ v2: "age": 30,

v1 response

```json
{
  "name": "John Doe",
  "age": 30,
  "active": true
}
```

v2 response

```json
{
  "name": "John Doe",
  "age": 30,
  "active": true
}
```

schema_nested_object · Green · σ (v1 0, v2 0) · Instruction · Neutral

Prompt

Return a JSON object with this exact structure and these exact values. No explanation, just the JSON:
{
  "user": {
    "id": 1,
    "email": "user@example.com"
  },
  "status": "active"
}

Dimension snapshot

Severity	⚪ Informational
Drift category	✓ No significant drift (NoSignificantDrift)
Morphology	Green	Neutral · Δ tokens ≈ 0% · type Δ false
Tone	Green	Neutral · formality Δ 0%
Semantic	Green	Neutral · sim 0.999999993085143
Claim	Green	Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency	Green	Neutral · v1 1141 ms · v2 1046 ms · Δ -95 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false
Factual	Amber	Neutral · regression false

Claim diff: 2 matched · 0 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: "id": 1, ↔ v2: "id": 1,
sim 1 · anchors ok true · v1: "email": "user@example.com" ↔ v2: "email": "user@example.com"

v1 response

```json
{
  "user": {
    "id": 1,
    "email": "user@example.com"
  },
  "status": "active"
}
```

v2 response

```json
{
  "user": {
    "id": 1,
    "email": "user@example.com"
  },
  "status": "active"
}
```

schema_no_markdown_wrapping · Green · σ (v1 0, v2 0) · Instruction · Neutral

Prompt

Return a JSON object: {"ok": true, "code": 200}. Output the raw JSON only. Do not wrap it in markdown code fences. Do not add any text before or after.

Dimension snapshot

Severity	⚪ Informational
Drift category	✓ No significant drift (NoSignificantDrift)
Morphology	Green	Neutral · Δ tokens ≈ 0% · type Δ false
Tone	Green	Neutral · formality Δ 0%
Semantic	Green	Neutral · sim 1
Claim	Green	Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency	Green	Improvement · v1 889 ms · v2 560 ms · Δ -329 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false
Factual	Green	Neutral · regression false

Claim diff: 1 matched · 0 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: {"ok": true, "code": 200} ↔ v2: {"ok": true, "code": 200}

v1 response

{"ok": true, "code": 200}

v2 response

{"ok": true, "code": 200}

schema_nullable_field · Green · σ (v1 0, v2 0) · Instruction · Neutral

Prompt

Return a JSON object with these fields: title (string), description (string or null), count (integer). Set title to "test", description to null, count to 0. No explanation, just the JSON.

Dimension snapshot

Severity	⚪ Informational
Drift category	✓ No significant drift (NoSignificantDrift)
Morphology	Green	Neutral · Δ tokens ≈ 0% · type Δ false
Tone	Green	Neutral · formality Δ 0%
Semantic	Green	Neutral · sim 0.9999999826690928
Claim	Green	Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency	Green	Neutral · v1 965 ms · v2 1053 ms · Δ 88 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false
Factual	Amber	Neutral · regression false

Claim diff: 2 matched · 0 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: "description": null, ↔ v2: "description": null,
sim 1 · anchors ok true · v1: "count": 0 ↔ v2: "count": 0

v1 response

```json
{
  "title": "test",
  "description": null,
  "count": 0
}
```

v2 response

```json
{
  "title": "test",
  "description": null,
  "count": 0
}
```

schema_numeric_types · Green · σ (v1 0, v2 0) · Instruction · Neutral

Prompt

Return a JSON object with: count (must be the integer 5), ratio (must be the float 0.75), label (must be the string "test"). Values must be exactly as specified. No explanation, just the JSON.

Dimension snapshot

Severity	⚪ Informational
Drift category	✓ No significant drift (NoSignificantDrift)
Morphology	Green	Neutral · Δ tokens ≈ 0% · type Δ false
Tone	Green	Neutral · formality Δ 0%
Semantic	Green	Neutral · sim 0.9999999826690928
Claim	Green	Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency	Green	Improvement · v1 4220 ms · v2 886 ms · Δ -3334 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false
Factual	Amber	Neutral · regression false

Claim diff: 2 matched · 0 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: "count": 5, ↔ v2: "count": 5,
sim 1 · anchors ok true · v1: "ratio": 0.75, ↔ v2: "ratio": 0.75,

v1 response

```json
{
  "count": 5,
  "ratio": 0.75,
  "label": "test"
}
```

v2 response

```json
{
  "count": 5,
  "ratio": 0.75,
  "label": "test"
}
```

schema_strict_no_extras · Green · σ (v1 0, v2 0) · Instruction · Neutral

Prompt

Return a JSON object with ONLY these two fields: result (string), confidence (float between 0 and 1). Set result to the string "ok" and confidence to 0.9. Do not add any other fields. No markdown, no explanation. Raw JSON only.

Dimension snapshot

Severity	⚪ Informational
Drift category	✓ No significant drift (NoSignificantDrift)
Morphology	Green	Neutral · Δ tokens ≈ 0% · type Δ false
Tone	Green	Neutral · formality Δ 0%
Semantic	Green	Neutral · sim 0.9999999641035373
Claim	Green	Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency	Green	Improvement · v1 1077 ms · v2 803 ms · Δ -274 ms
Consistency	Green	Neutral · runs 3/3 · var 0 / 0
Refusal	Green	Neutral · new false · lifted false
Factual	Green	Neutral · regression false

Claim diff: 1 matched · 0 dropped · 0 new

Matched claims

sim 1 · anchors ok true · v1: {"result":"ok","confidence":0.9} ↔ v2: {"result":"ok","confidence":0.9}

v1 response

{"result":"ok","confidence":0.9}

v2 response

{"result":"ok","confidence":0.9}

Upgrade path

Structured rollout view. Validated prompt patches appear when mutations validate.

Blocking 0 · Verify 2 · Neutral 0 · Validated 0

Items to verify

schema_array_of_objects — High / Amber / Regression / CriticalRegression — warrants attention before switching
schema_classification_output — High / Amber / Regression / CriticalRegression — warrants attention before switching

Validated prompt patches

No automated patches generated — run with --mutate to attempt prompt repairs.

Latency impact

Observational only — latency does not affect overall risk, probe direction, or upgrade path routing.

Average baseline latency

1353 ms

Average target latency

1040 ms

Delta: -313 ms (-23.1%) · Direction: Improvement

v2 responded 23% faster on average across 10 probes
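The averages above can be reproduced from the per-probe Latency rows. The sketch below assumes a plain arithmetic mean over the ten probes, which matches the reported figures; whether ARSENIC computes it exactly this way is not stated in the report.

```python
# (v1 ms, v2 ms) per probe, copied from the Latency rows in this report.
latencies = {
    "schema_array_of_objects": (1367, 1144),
    "schema_classification_output": (963, 908),
    "schema_deeply_nested": (1138, 1413),
    "schema_enum_field": (857, 1238),
    "schema_flat_required_fields": (913, 1352),
    "schema_nested_object": (1141, 1046),
    "schema_no_markdown_wrapping": (889, 560),
    "schema_nullable_field": (965, 1053),
    "schema_numeric_types": (4220, 886),
    "schema_strict_no_extras": (1077, 803),
}

n = len(latencies)
baseline = sum(v1 for v1, _ in latencies.values()) / n  # average v1 latency
target = sum(v2 for _, v2 in latencies.values()) / n    # average v2 latency
delta = target - baseline
pct = delta / baseline * 100

print(f"baseline {baseline:.0f} ms, target {target:.0f} ms")
print(f"delta {delta:.0f} ms ({pct:.1f}%)")

# How much of the average gain comes from the single largest improvement.
largest = min(latencies, key=lambda name: latencies[name][1] - latencies[name][0])
print(largest, latencies[largest][1] - latencies[largest][0])
```

Note that one probe, schema_numeric_types (4220 ms to 886 ms), accounts for most of the average gain, and four of the ten probes were slower on v2 by 88 to 439 ms, so the 23% figure is best read as a run-level observation rather than a per-probe expectation.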

Upgrade recommendation

Overall risk is acceptable for a controlled rollout. Review the two Amber probes in the Verify bucket, both of which carry Red Claim scores, before broad deployment.