ARSENIC drift report

Run 4f9993c2-4ed5-459e-b91d-7bd7e868ac27 · 2026-05-18T21:18:10.945715Z · Overall Green

v1 (gpt-4o-mini) vs v2 (gpt-4.1-mini)

Executive summary

Probe outcomes and drift valence across the run.

Total probes: 10 · Green: 10 · Amber: 0 · Red: 0
↓ Regressions: 0 · ↑ Improvements: 0 · → Neutral: 10 · Auto-fix candidates: 0

Safe to upgrade: true · Manual review items: 0

Critical regressions: 0 · Policy drift: 0 · Fidelity drift: 0 · Structural drift: 0 · Content compression: 0

Migration profile

v2 is 32% faster

Safe to upgrade: true

Dimension overview

Worst risk per dimension, drift direction mix among affected probes (↑ / ↓ / →), and affected count.

Dimension           Worst risk   Drift (affected)   Probes
Morphology          Green        -                  0
Tone                Green        -                  0
Factual             Green        -                  0
Schema              Green        -                  0
Instruction         Green        -                  0
Refusal             Green        -                  0
Semantic            Amber        ↓0 ↑1 →0           1
Claim               Green        -                  0
Latency             Green        -                  0
Consistency         Amber        ↓0 ↑1 →0           1
Custom assertions   Green        -                  0
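
The "worst risk per dimension" roll-up can be sketched as below. This is a minimal illustration that assumes a Green < Amber < Red severity ordering; the `worst_risk` helper and the sample input are hypothetical and are not taken from the report's tooling.

```python
# Hypothetical sketch: roll per-probe risk bands up to one "worst risk" per dimension.
# The ordering Green < Amber < Red is an assumption, not taken from the tool.
RISK_ORDER = {"Green": 0, "Amber": 1, "Red": 2}


def worst_risk(bands):
    """Return the most severe band in `bands`, or "Green" if the list is empty."""
    return max(bands, key=RISK_ORDER.__getitem__, default="Green")


# Illustrative input: nine Green probes and one Amber probe on a single dimension.
semantic_bands = ["Green"] * 9 + ["Amber"]
print(worst_risk(semantic_bands))  # Amber
```

Under this reading, a single Amber probe is enough to mark the whole dimension Amber, which matches the Semantic and Consistency rows (one affected probe each).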

Red probes

No blocking regressions detected.

All probe results

Each probe lists its prompt, dimension snapshot (with drift arrows), claim intelligence, and responses. Consistency σ shows run variance when multi-run sampling was enabled.

code_javascript_arrow_function Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a JavaScript arrow function called `double` that takes a number and returns it multiplied by 2. Use const. Return only the function definition, no explanation.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 4% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Green Neutral · sim 0.9999999576965958
Claim Green Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Regression · v1 642 ms · v2 775 ms · Δ 133 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 2 matched · 0 dropped · 0 new

Matched claims

  • sim 1 · anchors ok true · v1: ```javascript ↔ v2: ```javascript
  • sim 1 · anchors ok true · v1: const double = (num) => num * 2; ↔ v2: const double = num => num * 2;

v1 response

```javascript
const double = (num) => num * 2;
```

v2 response

```javascript
const double = num => num * 2;
```
code_no_explanation_discipline Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a Python function called `fibonacci` that returns the nth Fibonacci number using recursion. Return ONLY the function. No explanation before it. No explanation after it. No usage example. Just the def block.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 2% · type Δ false
Tone Green Neutral · formality Δ -2%
Semantic Green Neutral · sim 0.999999992218997
Claim Green Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Improvement · v1 1441 ms · v2 1138 ms · Δ -303 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 6 matched · 0 dropped · 0 new

Matched claims

  • sim 1 · anchors ok true · v1: def fibonacci(n): ↔ v2: def fibonacci(n):
  • sim 1 · anchors ok true · v1: if n <= 0: ↔ v2: if n <= 0:
  • sim 1 · anchors ok true · v1: return 0 ↔ v2: return 0
  • sim 1 · anchors ok true · v1: elif n == 1: ↔ v2: elif n == 1:
  • sim 1 · anchors ok true · v1: return 1 ↔ v2: return 1
  • sim 1 · anchors ok true · v1: return fibonacci(n - 1) + fibonacci(n - 2) ↔ v2: return fibonacci(n-1) + fibonacci(n-2)

v1 response

```python
def fibonacci(n):
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
```

v2 response

```python
def fibonacci(n):
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
```
code_python_class_definition Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a Python class called `Counter` with: an `__init__` method that sets `self.count` to 0, an `increment` method that adds 1 to count, and a `reset` method that sets count back to 0. Return only the class definition, no explanation.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 0% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Green Neutral · sim 0.999999982479268
Claim Green Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Improvement · v1 3984 ms · v2 1042 ms · Δ -2942 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 7 matched · 0 dropped · 0 new

Matched claims

  • sim 1 · anchors ok true · v1: class Counter: ↔ v2: class Counter:
  • sim 1 · anchors ok true · v1: def __init__(self): ↔ v2: def __init__(self):
  • sim 1 · anchors ok true · v1: self.count = 0 ↔ v2: self.count = 0
  • sim 1 · anchors ok true · v1: def increment(self): ↔ v2: def increment(self):
  • sim 1 · anchors ok true · v1: self.count += 1 ↔ v2: self.count += 1
  • sim 1 · anchors ok true · v1: def reset(self): ↔ v2: def reset(self):
  • sim 1 · anchors ok true · v1: self.count = 0 ↔ v2: self.count = 0

v1 response

```python
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

    def reset(self):
        self.count = 0
```

v2 response

```python
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

    def reset(self):
        self.count = 0
```
code_python_dict_manipulation Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a Python function called `invert_dict` that takes a dictionary and returns a new dictionary with keys and values swapped. Return only the function, no explanation.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 0% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Green Neutral · sim 0.9999999889549346
Claim Green Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Improvement · v1 1064 ms · v2 734 ms · Δ -330 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 1 matched · 0 dropped · 0 new

Matched claims

  • sim 1 · anchors ok true · v1: def invert_dict(d): ↔ v2: def invert_dict(d):

v1 response

```python
def invert_dict(d):
    return {v: k for k, v in d.items()}
```

v2 response

```python
def invert_dict(d):
    return {v: k for k, v in d.items()}
```
code_python_error_handling Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a Python function called `safe_divide` that takes two numbers `a` and `b`, returns `a / b`, but returns `None` if `b` is zero. Use a try/except block. Return only the function, no explanation.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 0% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Green Neutral · sim 0.9999999824548148
Claim Green Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Improvement · v1 1316 ms · v2 831 ms · Δ -485 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 3 matched · 0 dropped · 0 new

Matched claims

  • sim 1 · anchors ok true · v1: def safe_divide(a, b): ↔ v2: def safe_divide(a, b):
  • sim 1 · anchors ok true · v1: except ZeroDivisionError: ↔ v2: except ZeroDivisionError:
  • sim 1 · anchors ok true · v1: return None ↔ v2: return None

v1 response

```python
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
```

v2 response

```python
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
```
code_python_list_comprehension Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a Python one-liner using list comprehension that returns all even numbers from a list called `numbers`. Assign the result to a variable called `evens`. Return only the code, no explanation.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 0% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Green Neutral · sim 0.999999947142337
Claim Green Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Improvement · v1 1143 ms · v2 845 ms · Δ -298 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 1 matched · 0 dropped · 0 new

Matched claims

  • sim 1 · anchors ok true · v1: evens = [num for num in numbers if num % 2 == 0] ↔ v2: evens = [num for num in numbers if num % 2 == 0]

v1 response

```python
evens = [num for num in numbers if num % 2 == 0]
```

v2 response

```python
evens = [num for num in numbers if num % 2 == 0]
```
code_python_simple_function Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a Python function called `add` that takes two integers and returns their sum. Return only the function definition, no explanation, no usage example.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 0% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Green NotApplicable · sim 0.9999999659801694
Claim Green NotApplicable · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Regression · v1 1135 ms · v2 1689 ms · Δ 554 ms
Consistency Amber Improvement · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 0 matched · 0 dropped · 0 new

No claim rows extracted for this pair.

v1 response

```python
def add(a: int, b: int) -> int:
    return a + b
```

v2 response

```python
def add(a: int, b: int) -> int:
    return a + b
```
code_regex_email Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a Python regex pattern string (just the pattern, assigned to a variable called `pattern`) that matches a basic email address format: word characters, an @ symbol, word characters, a dot, and 2-4 word characters for the domain. Return only the assignment, no explanation, no import.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 2% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Green Neutral · sim 1
Claim Green Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Improvement · v1 1008 ms · v2 592 ms · Δ -416 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 1 matched · 0 dropped · 0 new

Matched claims

  • sim 1 · anchors ok true · v1: pattern = r'^\w+@\w+\.\w{2,4}$' ↔ v2: pattern = r'\w+@\w+\.\w{2,4}'

v1 response

```python
pattern = r'^\w+@\w+\.\w{2,4}$'
```

v2 response

```python
pattern = r'\w+@\w+\.\w{2,4}'
```
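
Although the two patterns are scored as equivalent here, they can behave differently depending on how the caller applies them: v1 is anchored with `^` and `$`, while v2 is not. The following is a minimal sketch using only the standard `re` module; the sample strings are illustrative and do not come from the run.

```python
import re

v1_pattern = r'^\w+@\w+\.\w{2,4}$'
v2_pattern = r'\w+@\w+\.\w{2,4}'

samples = ["user@example.com", "user@example.com trailing text", "see user@example.com"]

for text in samples:
    # re.search scans the whole string, so anchoring decides whether extra text is tolerated.
    v1_hit = re.search(v1_pattern, text) is not None
    v2_hit = re.search(v2_pattern, text) is not None
    # re.fullmatch requires the entire string to match, regardless of anchors in the pattern.
    v2_full = re.fullmatch(v2_pattern, text) is not None
    print(f"{text!r}: v1 search={v1_hit}, v2 search={v2_hit}, v2 fullmatch={v2_full}")
```

On these samples, the unanchored v2 pattern finds an address inside longer text under `re.search`, whereas v1 accepts only the bare address. Callers that validate whole strings may want `re.fullmatch` with the v2 pattern to keep the v1 behavior.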
code_sql_basic_select Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a SQL query that selects all columns from a table called `orders` where the `status` column equals 'pending' and orders the results by `created_at` descending. Return only the SQL, no explanation.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 1% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Amber Improvement · sim 0.8005305268146136
Claim Green Improvement · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 2
Latency Green Neutral · v1 1269 ms · v2 1024 ms · Δ -245 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 1 matched · 0 dropped · 2 new

Matched claims

  • sim 1 · anchors ok true · v1: SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC; ↔ v2: ORDER BY created_at DESC;

New in v2

  • SELECT * FROM orders
  • WHERE status = 'pending'

v1 response

```sql
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC;
```

v2 response

```sql
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC;
```
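
The Amber semantic score on this probe (sim ≈ 0.80) coincides with a purely cosmetic difference: v2 splits the same query across three lines. The sketch below assumes whitespace normalization is an acceptable step before comparing responses; `normalize_sql` is a hypothetical helper, not part of the report's tooling.

```python
def normalize_sql(query: str) -> str:
    """Collapse every run of whitespace (including newlines) into a single space.

    Naive by design: it would also alter whitespace inside string literals.
    """
    return " ".join(query.split())


v1_sql = "SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC;"
v2_sql = """SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC;"""

print(normalize_sql(v1_sql) == normalize_sql(v2_sql))  # True
```

After normalization the two responses are character-for-character identical, which suggests the two "new" claims reported for v2 come from line splitting rather than from added content.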
code_sql_join Green σ v1 0 · v2 0 Instruction · Neutral

Prompt

Write a SQL query that joins the `users` table with the `orders` table on `users.id = orders.user_id`, returning `users.name` and `orders.total`. Return only the SQL, no explanation.

Dimension snapshot

Severity ⚪ Informational
Drift category ✓ No significant drift (NoSignificantDrift)
Morphology Green Neutral · Δ tokens ≈ 0% · type Δ false
Tone Green Neutral · formality Δ 0%
Semantic Green Neutral · sim 0.9999999756073144
Claim Green Neutral · preservation 1 (≥ 0.7 for green band) · dropped 0 · new 0
Latency Green Neutral · v1 1269 ms · v2 1027 ms · Δ -242 ms
Consistency Green Neutral · runs 3/3 · var 0 / 0
Refusal Green Neutral · new false · lifted false
Claim diff: 3 matched · 0 dropped · 0 new

Matched claims

  • sim 1 · anchors ok true · v1: SELECT users.name, orders.total ↔ v2: SELECT users.name, orders.total
  • sim 1 · anchors ok true · v1: FROM users ↔ v2: FROM users
  • sim 1 · anchors ok true · v1: JOIN orders ON users.id = orders.user_id; ↔ v2: JOIN orders ON users.id = orders.user_id;

v1 response

```sql
SELECT users.name, orders.total
FROM users
JOIN orders ON users.id = orders.user_id;
```

v2 response

```sql
SELECT users.name, orders.total
FROM users
JOIN orders ON users.id = orders.user_id;
```

Upgrade path

Structured rollout view. Validated prompt patches appear when mutations validate.

Blocking 0 · Verify 0 · Neutral 0 · Validated 0

No blocking regressions detected.

Validated prompt patches

No automated patches generated — run with --mutate to attempt prompt repairs.

Latency impact

Observational only — latency does not affect overall risk, probe direction, or upgrade path routing.

Average baseline latency: 1427 ms

Average target latency: 969 ms

Delta: -458 ms (-32.1%) · Direction: Improvement

v2 responded 32% faster on average across 10 probes
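
The averages and the percentage delta above can be reproduced from the per-probe latencies listed in the dimension snapshots. This is a minimal sketch; the reported 969 ms and -458 ms figures are consistent with truncating the averages rather than rounding them, which is an inference from the numbers and not documented behavior.

```python
# Per-probe latencies in ms, copied from the dimension snapshots in report order.
v1_ms = [642, 1441, 3984, 1064, 1316, 1143, 1135, 1008, 1269, 1269]
v2_ms = [775, 1138, 1042, 734, 831, 845, 1689, 592, 1024, 1027]

v1_avg = sum(v1_ms) / len(v1_ms)
v2_avg = sum(v2_ms) / len(v2_ms)
delta_pct = (v2_avg - v1_avg) / v1_avg * 100

print(f"baseline {v1_avg:.1f} ms, target {v2_avg:.1f} ms, delta {delta_pct:.1f}%")
# baseline 1427.1 ms, target 969.7 ms, delta -32.1%
```

One probe dominates the gap: code_python_class_definition dropped from 3984 ms to 1042 ms, which accounts for roughly 64% of the total latency reduction across the 10 probes.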

Upgrade recommendation

Overall risk is acceptable for a controlled rollout. Review amber items and any red probes routed to "Verify" before broad deployment.
