# Endobest Quality Checks & Regression Testing Guide

## Part 3: Quality Assurance, Validation Rules & Configuration

**Document Version:** 3.1 (Updated with new Excel export module reference)
**Last Updated:** 2025-11-08
**Audience:** Developers, Business Analysts, QA Engineers
**Language:** English

**Note:** Excel export functionality now available - see DOCUMENTATION_13_EXCEL_EXPORT.md, DOCUMENTATION_98_USER_GUIDE.md, and DOCUMENTATION_99_CONFIG_GUIDE.md

---

## Version History

### Version 3.0 (2025-10-22) - UNIFIED FIELD SELECTION PIPELINE
**Complete Refactorization of Field Selection**
- ✅ **Merged Columns:** `field_group` (F) + `field_name` (G) → single `field_selection` (F)
- ✅ **Simplified Syntax:** Field selection uses same pipeline format as transitions: `[["action", "field_selector"], ...]`
- ✅ **3 Selector Patterns:** `*.*` (all fields), `group.*` (group), `group.field` (specific)
- ✅ **Cleaner Code:** Removed 150+ lines of dual-filter logic (field_group + field_name combinations)
- ✅ **Config-Driven Keys:** Key field determination (Patient_Id, Pseudo) now read from `field_selection` instead of hardcoded
- ✅ **Unified Key Detection:** New `_get_key_field_from_new_inclusions_rule()` applies field_selection pipeline directly to first inclusion (15 LOC, -75% vs manual parsing)
- ✅ **Helper Functions:** `_apply_field_selection_pipeline()`, `_get_key_field_from_new_inclusions_rule()`, `_build_candidate_fields()`
- ⚠️ **MAJOR Breaking Change:** Old `field_group` and `field_name` columns (F, G) are **removed**
- ⚠️ **Column Shifts:** `bloc_scope` moves H→G, `transitions` moves I→H
- ⚠️ **Configuration Migration Required:** Completely restructure Excel `Regression_Check` sheet

**Technical Details:**
- Field selection pipeline starts with empty set, each step adds/removes fields
- Responsibility on admin to order rules correctly (no implicit logic)
- Special rules `"New Fields", "Deleted Fields", "Deleted Inclusions"` must have empty field_selection
- Special rule `"New Inclusions"` applies field_selection pipeline to first inclusion sample (assumes stable structure)
- Key field detection: finds first field from pipeline that has non-null value in both first new and old inclusion
- Configuration validation: missing/invalid field_selection = CRITICAL error

**Removed Dead Code:**
- `_determine_key_field()` - hardcoded Patient_Id/Pseudo logic
- `_matches_field_group_filter()` - replaced by pipeline
- `_matches_field_name_filter()` - replaced by pipeline
- `_determine_key_field_from_config()` - replaced by simplified unified `_get_key_field_from_new_inclusions_rule()`

### Version 2.0 (2025-10-22) - Pipeline Architecture
**Transitions Pipeline Introduced**
- ✅ **Unified Format:** Merged `transitions` + `transition_exceptions` into single `transitions` column
- ✅ **Simplified Syntax:** Each step is a 4-element array `[action, field_selector, from, to]`
- ✅ **Sequential Processing:** Pipeline steps applied in order, allowing fine-grained control
- ✅ **Better Determinism:** All sets sorted for reproducible logs
- ✅ **Improved Error Handling:** Invalid configs silently skipped with warnings
- ⚠️ **Breaking Change:** Old `transition_exceptions` column (J) merged into `transitions` (I)

### Version 1.0 (2025-10-21) - Initial Release
- Dual-column system: `transitions` (I) + `transition_exceptions` (J)
- Include/exclude exception handling
- Multiple transition support per exception

---

## Table of Contents

1. [Overview](#overview)
2. [Quality Assurance Strategy](#quality-assurance-strategy)
3. [Coherence Check (Technical Details)](#coherence-check-technical-details)
4. [Non-Regression Check Framework](#non-regression-check-framework)
5. [Regression Check Configuration File](#regression-check-configuration-file)
6. [Column Reference](#column-reference)
7. [Special Keywords & Wildcards](#special-keywords--wildcards)
8. [Rule Types & Logic](#rule-types--logic)
9. [Field Selection Pipeline](#field-selection-pipeline-v30)
10. [Transition Patterns](#transition-patterns)
11. [Exception Handling](#exception-handling)
12. [Configuration Examples](#configuration-examples)
13. [User Guide: Adding/Modifying Rules](#user-guide-adding-modifying-rules)
14. [Execution Modes](#execution-modes)
15. [Troubleshooting](#troubleshooting)

---

## ⚠️ CRITICAL - Version 3.0 Migration Required

**This document describes v3.0 with BREAKING CHANGES from v2.0**

| Item | v2.0 | v3.0 |
|------|------|------|
| **Excel Columns F-I** | `field_group`, `field_name`, `bloc_scope`, `transitions` | `field_selection`, `bloc_scope`, `transitions` |
| **Column Count** | 4 columns for filtering+transitions | 3 columns (merged field_selection) |
| **Key Field Config** | Hardcoded (Patient_Id/Pseudo) | Config-driven (from field_selection) |
| **Field Filtering Logic** | 6+ combinations (complex) | Single pipeline (simple) |

**ACTION REQUIRED:**
1. ✅ Update Excel file column positions
2. ✅ Migrate field_group + field_name → field_selection
3. ✅ Run non-regression tests
4. ✅ Verify key field detection works with new config

---

## Overview

The **Quality Checks System** provides comprehensive data validation in two stages:

1. **Coherence Check:** Verifies that organization statistics (API counters) match the actual detailed inclusion data
2. **Non-Regression Check:** Detects unexpected data changes between current and previous collection runs

Both checks are **configurable via Excel** with **Warning/Critical severity levels** that can trigger user confirmation prompts.

### Design Philosophy

```
Trust, but Verify

- Trust: API data is generally reliable
- Verify: Statistical consistency and change detection
- Report: Multi-level severity (OK, Warning, Critical)
- Decide: User confirmation before export on critical issues
```

---

## Quality Assurance Strategy

### Workflow Integration

```
Data Collection
    ↓
QUALITY CHECKS
├─ COHERENCE CHECK (mandatory)
│  ├─ Load organization statistics from API responses
│  ├─ Calculate actual counts from detailed inclusions
│  └─ Compare: Stats vs. Actual
│
├─ NON-REGRESSION CHECK (if old file exists)
│  ├─ Load previous inclusions (_old file)
│  ├─ Apply config-driven comparison rules
│  └─ Report: Changes matching configured patterns
│
└─ RESULT
    ├─ has_coherence_critical flag
    └─ has_regression_critical flag
        ↓
    IF critical issues detected:
      ├─ Display warning: ⚠ CRITICAL
      ├─ Ask user: "Write results anyway?"
      ├─ If NO → Abort export, preserve old files
      └─ If YES → Continue with export (user override)
    ELSE:
      └─ Continue with export automatically
```

### Severity Levels

| Level | Display | Meaning | Action |
|-------|---------|---------|--------|
| **OK** | ✓ Green | No issues, within normal range | Continue automatically |
| **WARNING** | ⚠ Yellow | Issue detected, exceeds warning threshold | Log and display, continue automatically |
| **CRITICAL** | ✗ Red | Severe issue, exceeds critical threshold | Display, ask user before export |

### User Interaction

```
Quality Checks Complete

✗ [red]Coherence Check: CRITICAL[/red]
  ⚠ [yellow]Organization 1 mismatch: 95 vs 98[/yellow]

✗ [red]Non-Regression: CRITICAL[/red]
  ⚠ [yellow]New Inclusions: 42 (threshold 50)[/yellow]
  ✗ [red]Deleted Inclusions: 15 (threshold 0)[/red]

[bold]⚠ CRITICAL issues detected in quality checks![/bold]
Do you want to write the results anyway? [y/N]:
  y → Export anyway (risky, user override)
  n → Cancel export (preserve old files)
```

---

## Coherence Check (Technical Details)

### Purpose

Verify that **organization statistics** (fetched from API) match **actual detailed data** (inclusion-by-inclusion count).

### Data Sources

**Source 1: Organization Statistics (API)**
```
For each organization:
  GET /api/inclusions/inclusion-statistics
  Returns:
  {
    "totalInclusions": N,      // Total patients
    "preIncluded": P,          // Pré-inclus count
    "included": I,             // Inclus count
    "prematurelyTerminated": T // Prematurely terminated
  }
```

**Source 2: Inclusion Details (JSON Array)**
```
For each patient in endobest_inclusions:
  Check: Patient_Identification.Organisation_Id
  Count: Based on Inclusion.Inclusion_Status

  Classification rules:
  1. If status ends with " - AP" → prematurely_terminated
  2. Else if status starts with "pré-inclus" → preincluded
  3. Else if status starts with "inclus" → included
  Always count: patients += 1
```

### Validation Logic

```python
def coherence_check(current_inclusions, organizations_list):
    # STEP 1: Collect statistics from API
    total_stats = {
        'patients': sum(org['patients_count'] for org in organizations),
        'preincluded': sum(org['preincluded_count'] for org in organizations),
        'included': sum(org['included_count'] for org in organizations),
        'prematurely_terminated': sum(org['prematurely_terminated_count'] for org in organizations)
    }

    # STEP 2: Calculate actual counts from detailed data
    total_detail = calculate_detail_counters(current_inclusions, org_id=None)
    #  = (patients, preincluded, included, prematurely_terminated)

    # STEP 3: Compare all 4 counters
    is_match = (
        total_stats['patients'] == total_detail['patients'] AND
        total_stats['preincluded'] == total_detail['preincluded'] AND
        total_stats['included'] == total_detail['included'] AND
        total_stats['prematurely_terminated'] == total_detail['prematurely_terminated']
    )

    # STEP 4: Report total comparison
    IF is_match:
        PRINT: ✓ [green]TOTAL matches[/green]
    ELSE:
        PRINT: ✗ [red]TOTAL mismatch[/red]
        PRINT: Stats({P}/{Pre}/{Inc}/{Term}) vs Detail({p}/{pre}/{inc}/{term})
        set has_critical = True

    # STEP 5: Detail-level comparison (only if not OK)
    FOR EACH organization:
        org_stats = get organization counters
        org_detail = calculate_detail_counters(current_inclusions, org_id=org.id)

        IF org_stats != org_detail:
            PRINT: ⚠ [yellow]Organization "{name}" mismatch[/yellow]
            PRINT: Stats vs Detail breakdown
            set has_critical = True

    RETURN has_critical
```

### Example Output

**Scenario: Perfect Match**
```
═══ Coherence Check ═══

✓ [green]TOTAL - Stats(150/20/120/10) vs Detail(150/20/120/10)[/green]
```

**Scenario: Mismatch Detected**
```
═══ Coherence Check ═══

✗ [red]TOTAL - Stats(150/20/118/10) vs Detail(150/20/120/10)[/red]
  ⚠ [yellow]Center A - Stats(50/5/40/5) vs Detail(50/5/42/5)[/yellow]
  ⚠ [yellow]Center B - Stats(100/15/78/5) vs Detail(100/15/78/5)[/yellow]
```

### Interpretation

**Match (Green):**
```
API statistics perfectly align with detailed data
→ No data collection issues
→ Continue processing
```

**Minor Mismatch (Yellow):**
```
1-2 patients differ between statistics and details
→ Possible API consistency issue
→ Monitor but continue (it happens occasionally)
```

**Major Mismatch (Red):**
```
10+ patients difference
→ Significant data collection issue
→ Investigate root cause
→ Consider re-running collection
```

---

## Non-Regression Check Framework

### Purpose

Detect **unexpected data changes** between current and previous collections by comparing field values against configured transition patterns.

### Architecture

```
Previous Inclusions (File)
    ↓
┌─────────────────────────────┐
│ NON-REGRESSION CHECK        │
├─────────────────────────────┤
│ 1. Load Regression Config   │
│    (Excel: Regression_Check sheet)
│                             │
│ 2. Build Inclusion Dicts    │
│    Index by: Patient_Id or Pseudo
│                             │
│ 3. Group Rules by Bloc      │
│    - Structure              │
│    - Identification         │
│    - Inclusion Protocol     │
│    - Endotest               │
│    - Other Questionnaires   │
│                             │
│ 4. For Each Rule:           │
│    a) Detect rule type      │
│       - Normal rule         │
│       - New Inclusions      │
│       - Deleted Inclusions  │
│       - New Fields          │
│       - Deleted Fields      │
│                             │
│    b) Process rule logic    │
│       - Collect candidates  │
│       - Match transitions   │
│       - Apply exceptions    │
│       - Apply bloc_scope    │
│                             │
│    c) Calculate severity    │
│       - Count vs thresholds │
│       - Determine status    │
│                             │
│ 5. Display Results          │
│    - By bloc                │
│    - Color-coded status     │
│    - Detailed changes (debug)
│                             │
└─────────────────────────────┘
    ↓
Current Inclusions (Memory)
```

---

## Regression Check Configuration File

### File Location & Sheet

```
Endobest_Dashboard_Config.xlsx
│
├─ Sheet 1: "Inclusions_Mapping" (See DOCUMENTATION_11_FIELD_MAPPING.md)
│
└─ Sheet 2: "Regression_Check"
   ├─ Row 1: Headers
   └─ Row 2+: Rules
```

### Sheet Structure (Version 3.0)

```
Row 1 (Headers):
A            B          C              D                  E
ignore      bloc_title  line_label     warning_threshold  critical_threshold
F                       G              H
field_selection         bloc_scope     transitions

Row 2+: Rule definitions (one per row)
```

**BREAKING CHANGE (v3.0):** Columns F and G from v2.0 (`field_group` and `field_name`) have been **merged into single column F (`field_selection`)**. All subsequent columns shifted left by one position.

**Color Coding:**
- **Yellow:** Structure/Identification bloc (foundational rules)
- **Blue:** Inclusion Protocol bloc (inclusion status rules)
- **Light Purple:** Endotest bloc (test-related rules)
- **White:** Regular rules
- **Red:** Incomplete/error rules (missing required columns)

---

## Column Reference

### Column A: ignore
**Type:** String (optional)
**Description:** Skip this row if contains "ignore" (case-insensitive)
**Purpose:** Comment out rules without deleting rows
**Values:**
```
ignore          → Row is skipped
(empty)         → Row is processed
any_other_text  → Row is processed
```

### Column B: bloc_title
**Type:** String (required)
**Description:** Logical grouping of related rules
**Purpose:** Visual organization and blocking/reporting
**Valid Values:**
```
Structure              → File format and field availability rules
Identification         → Patient identification changes
Inclusion Protocol     → Inclusion status and protocol changes
Endotest               → Laboratory test request changes
Other Questionnaires   → Non-specific questionnaire changes
[Custom Group Names]   → Any custom bloc name for organization
```

**Rules Per Bloc:**
```
Structure bloc (Example):
  ├─ New Fields
  ├─ Deleted Fields
  └─ (Structure-specific rules)

Identification bloc:
  ├─ New Inclusions
  ├─ Deleted Inclusions
  ├─ Changed (Excluding Birthday)
  ├─ Changed Date of Birth/Age
  └─ (Identification-specific rules)

Endotest bloc:
  ├─ Undefined to Defined (Only)
  ├─ Defined to Undefined
  ├─ Changed Value
  └─ (Endotest-specific rules)
```

### Column C: line_label
**Type:** String (required)
**Description:** Unique rule identifier within its bloc
**Purpose:** Displayed in output, identifies rule in reports
**Examples:**
```
New Inclusions
Deleted Inclusions
New Fields
Deleted Fields
Changed Value
Undefined to Defined (Only)
```

**Requirements:**
- Must be unique within bloc_title
- Should be descriptive

### Column D: warning_threshold
**Type:** Numeric (required, >= 0)
**Description:** Count threshold that triggers WARNING level
**Position:** Column D (after line_label)
**Logic:**
```
IF count > warning_threshold AND count <= critical_threshold:
  Status = WARNING (yellow ⚠)
```

**Examples:**
```
0      → Any change triggers warning (strict)
5      → 1-5 changes = OK, 6-10 = Warning
50     → 1-50 changes = OK, 51+ = Warning (lenient)
200    → Very lenient, only alert on large changes
```

### Column E: critical_threshold
**Type:** Numeric (required, >= warning_threshold)
**Description:** Count threshold that triggers CRITICAL level
**Position:** Column E (after warning_threshold)
**Logic:**
```
IF count > critical_threshold:
  Status = CRITICAL (red ✗)
  → May prompt user for confirmation
```

**Relationship:**
```
warning_threshold <= critical_threshold

Examples:
(0, 1)      → Strict: any change is critical
(0, 50)     → Any warning also becomes critical
(50, 100)   → Normal operation: 1-50 OK, 51-100 warning, 100+ critical
(200, 200)  → Same thresholds: jump directly from OK to critical
```

### Column F: field_selection (NEW - v3.0)
**Type:** JSON array of 2-element arrays (mandatory for most rules)
**Description:** Pipeline-based field selection using include/exclude actions
**Position:** Column F (after critical_threshold) - **REPLACES old field_group + field_name**
**Rules:**
- **Format:** `[["action", "field_selector"], ["action", "field_selector"], ...]`
- **Mandatory:** For all rules EXCEPT `"New Fields"`, `"Deleted Fields"`, `"Deleted Inclusions"`
- **For special rules:** Must be empty `[]` or null
- **Explicit:** No implicit logic - admin must order steps correctly
- **Pipeline:** Starts with empty set, each step adds or removes fields

**Elements:**

| Element | Type | Valid Values | Example |
|---------|------|--------------|---------|
| **action** | String | `"include"` or `"exclude"` | `"include"` |
| **field_selector** | String | `*.*`, `group.*`, `group.field` | `"Endotest.Request_Sent"` |

**Selector Patterns (3 only):**
```
*.*              → All fields in all groups
group.*          → All fields in specific group (e.g., "Endotest.*")
group.field      → Specific field only (e.g., "Endotest.Request_Sent")
```

**Examples:**

**1. Include Single Group**
```json
[["include", "Endotest.*"]]
// All Endotest fields
```

**2. Include Multiple Groups**
```json
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Endotest AND Inclusion fields
```

**3. Include All, Exclude Some**
```json
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// All fields EXCEPT Endotest.Last_Updated
```

**4. Key Field Selection (for "New Inclusions" rule)**
```json
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Tries Patient_Id first, then Pseudo (in order)
```

**5. Complex Pipeline**
```json
[
  ["include", "*.*"],
  ["exclude", "Inclusion.*"],
  ["exclude", "Patient_Identification.*"]
]
// All fields EXCEPT Inclusion and Patient_Identification
```

**Special Rules (field_selection must be EMPTY):**
```
"New Fields"           → [] or null
"Deleted Fields"       → [] or null
"Deleted Inclusions"   → [] or null
```

**Validation:**
- ✅ Missing or null field_selection for normal rules → **CRITICAL ERROR**
- ✅ Invalid selector (no dot) → **CRITICAL ERROR**
- ✅ Non-list format → **CRITICAL ERROR, skip rule**
- ✅ Step with wrong element count → **CRITICAL ERROR, skip rule**

### Column G: bloc_scope (moved from H - v3.0)
**Type:** String enum (optional, default: "any")
**Description:** Aggregation logic for matching fields within an inclusion
**Position:** Column G (after field_selection)
**Valid Values:**
```
"any"    → At least ONE field must match transitions
"all"    → ALL changed fields must match transitions
```

**Logic:**

**bloc_scope = "any" (Default)**
```
IF ANY candidate field has matching transition:
  RETURN inclusion matches rule

Use for: "Alert if any change occurs"
```

**bloc_scope = "all"**
```
IF ALL changed fields have matching transitions:
  RETURN inclusion matches rule

Use for: "Alert only if all changes match pattern"
```

**Example Comparison:**

```
Inclusion with 5 fields in scope:
  Field1: Changed, matches transition ✓
  Field2: Unchanged (always ignored)
  Field3: Changed, does NOT match transition ✗
  Field4: Unchanged (always ignored)
  Field5: Changed, matches transition ✓

Changed fields: [Field1, Field3, Field5]
Matched changed: [Field1, Field5]

Result with bloc_scope="any":  ✓ COUNT (Field1 matched)
Result with bloc_scope="all":  ✗ SKIP (Field3 didn't match)
```

| Scenario | bloc_scope="any" | bloc_scope="all" |
|----------|------------------|-----------------|
| 1 match, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 1 match, 1 mismatch | ✓ COUNT | ✗ SKIP |
| 0 matches, 1 mismatch | ✗ SKIP | ✗ SKIP |
| 3 matches, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 3 matches, 1 mismatch | ✓ COUNT | ✗ SKIP |

---

### Column H: transitions (moved from I - v3.0)
**Type:** JSON array of 4-element arrays (optional)
**Description:** Pipeline-based transition rules (old_value → new_value)
**Position:** Column H (after bloc_scope)
**Format:** `[["action", "field_selector", "from_pattern", "to_pattern"], ...]`
- Each step is exactly 4 elements
- If None/empty: Rule applies to ALL field changes
- Supports wildcard keywords: `*undefined`, `*defined`, `*`
- Supports literal values for exact matching

**Pipeline Concept (v2.0+):**

```
Initial state: All changed fields → is_checked = False

Step 1: Include rule for all fields (*.*) with *defined→*defined
  └─ is_checked = True if transition matches

Step 2: Include rule for Endotest.Diagnostic_Status with waiting→*undefined
  └─ is_checked = True (whitelisted exception)

Step 3: Exclude rule for Endotest.Request_Sent with false→true
  └─ is_checked = False (blacklisted exception)

Final result: Only fields matching the pipeline are checked
```

---

#### Syntax: 4-Element Pipeline Array

Each pipeline step is a **4-element array**:
```json
[action, field_selector, from_pattern, to_pattern]
```

| Element | Description | Examples |
|---------|-------------|----------|
| **action** | "include" (whitelist) or "exclude" (blacklist) | "include", "exclude" |
| **field_selector** | Which fields this step applies to | "*.*", "group.*", "group.field" |
| **from_pattern** | Old value pattern to match | "*undefined", "*defined", "*", literal value |
| **to_pattern** | New value pattern to match | "*undefined", "*defined", "*", literal value |

**Important:** The syntax is **strictly enforced** - each step must have exactly 4 elements. No shortcuts or variants are accepted.

---

#### Field Selector Patterns

```
*.*                    → All fields in all groups
group.*                → All fields in specific group (e.g., "Endotest.*")
group.field            → Specific field only (e.g., "Endotest.Request_Sent")
```

---

#### Complete Examples

**Example 1: Simple All-Fields Rule (Most Common)**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"]
  ]
}
// Pipeline: Include all fields that change between two defined values
```

**Example 2: Main Rule + One Include Exception**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"]
  ]
}
// Step 1: Include all *defined→*defined changes
// Step 2: ALSO include specific Endotest.Diagnostic_Status changes from waiting to undefined
```

**Example 3: Main Rule + Include Exception + Exclude Exception**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"],
    ["exclude", "Endotest.Request_Sent", false, true]
  ]
}
// Step 1: Include all *defined→*defined
// Step 2: Include Diagnostic_Status waiting→undefined (whitelist)
// Step 3: Exclude Request_Sent false→true (blacklist)
// Result: Step 3 overrides Step 1 for that specific field+transition
```

**Example 4: Multiple Include Steps for Different Fields**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "GDD.Status", "pending", "completed"],
    ["include", "GDD.Status", "pending", "failed"]
  ]
}
// Step 1: Include all *defined→*defined changes
// Step 2: Include GDD.Status pending→completed
// Step 3: Include GDD.Status pending→failed
```

**Example 5: Exclude Rule with Wildcard**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["exclude", "Endotest.Last_Modified", "*", "*"]
  ]
}
// Include all changes EXCEPT any change to Last_Modified field
```

---

#### Processing Logic

The pipeline is executed **sequentially**, with each step modifying the `is_checked` status in-place:

```
1. Initialize: All changed fields have is_checked = False

2. For each transition step in order:
   a. Check if the current field matches the field_selector
   b. If yes: Check if the old→new values match from_pattern→to_pattern
   c. If yes:
      - If action="include": Set is_checked = True
      - If action="exclude": Set is_checked = False
   d. If no: Leave is_checked unchanged

3. Final: Only fields with is_checked = True are counted as matching
```

**Important:** Later steps can override earlier steps. Example:
```json
[
  ["include", "*.*", "*", "*"],      // Step 1: include everything
  ["exclude", "Field.X", "*", "*"]   // Step 2: exclude Field.X (overrides Step 1)
]
```
Result: Everything is included EXCEPT Field.X

---

#### Configuration Error Handling

If a transitions step has invalid syntax:
- The rule is silently skipped (logged with yellow warning)
- No exception is thrown
- User can see the ⚠ warning in the output
- User can choose to save the report or fix the config

**Valid syntax example:**
```json
["include", "*.*", "*defined", "*defined"]  // ✓ Exactly 4 elements
["include", "*.*", "*defined"]               // ✗ Only 3 elements (INVALID)
["maybe", "*.*", "*defined", "*defined"]    // ✗ Invalid action (INVALID)
["include", "invalid", "*defined", "*defined"] // ✗ No dot in selector (INVALID)
```

---

## Special Keywords & Wildcards

This section documents the special keywords and patterns used in transition specifications throughout the configuration.

### Keywords in Transition Patterns

The regression check configuration supports special keywords with `*` prefix for flexible transition matching:

#### Keyword 1: `*undefined`

**Meaning:** Matches any "undefined-like" value

**Matches:**
- `null` (None in Python)
- `""` (empty string)
- `"undefined"` (literal string)

**Example:**
```json
{
  "transitions": [["*undefined", "*defined"]]
}
// Matches: undefined → Active, null → 42, "" → true, etc.
```

**Use Case:** Detect when a field gets populated for the first time

---

#### Keyword 2: `*defined`

**Meaning:** Matches any "defined" value (opposite of *undefined)

**Matches:** Anything EXCEPT:
- `null` (None)
- `""` (empty string)
- `"undefined"` (literal string)

**Example:**
```json
{
  "transitions": [["*defined", "*undefined"]]
}
// Matches: Active → null, 42 → "", true → "undefined", etc.
```

**Use Case:** Detect when a field loses its value

---

#### Keyword 3: `*` (Wildcard)

**Meaning:** Matches absolutely any value

**Matches:** Any value including:
- Defined values (strings, numbers, booleans)
- Undefined-like values (null, "", "undefined")
- Objects, arrays, etc.

**Example:**
```json
{
  "transitions": [["*", "*"]]
}
// Matches: ANY old value → ANY new value
// Essentially: "any change at all"
```

**Use Case:** Monitor all changes to a field, filter out specific cases with exceptions

---

### Combining Keywords with Literal Values

Patterns can mix keywords and literal values:

| Pattern | Meaning |
|---------|---------|
| `["*undefined", "*defined"]` | Undefined → Defined (field becomes populated) |
| `["*defined", "*undefined"]` | Defined → Undefined (field gets cleared) |
| `["*defined", "*defined"]` | Value change while staying defined (actual value change required) |
| `["*", "*"]` | Any change at all |
| `["Active", "*defined"]` | From literal "Active" to any defined value |
| `["*undefined", "Active"]` | From undefined to literal "Active" |

---

### Literal Values (No `*` Prefix)

Any value that does NOT start with `*` is treated as a literal value and matched exactly:

```json
{
  "transitions": [
    ["pending", "accepted"],    // Exact string match
    [false, true],              // Exact boolean match
    [0, 1],                     // Exact numeric match
    [null, "Active"],           // null matches null, "Active" matches "Active"
    ["undefined", "Done"]       // "undefined" (literal string) matches "undefined"
  ]
}
```

**Important:** Literal values are matched by exact equality, including:
- `"undefined"` - matches the exact string "undefined" (not undefined state)
- `null` - matches null values
- `""` - matches empty string

---

## Summary Table: Special Keywords in Transitions

| Keyword | Matches | Use Case |
|---------|---------|----------|
| `*undefined` | null, "", "undefined" (any undefined-like value) | Detect when field becomes populated |
| `*defined` | Any defined value (NOT null, "", "undefined") | Detect when field loses value |
| `*` | Any value whatsoever | Alert on any change; use with exceptions for fine control |
| (no `*` prefix) | Exact literal values | Specific value matching (e.g., "pending" → "accepted") |

---

### Rule Type 1: Standard Rules (Normal Comparison)

**Purpose:** Detect field value changes matching configured patterns

**Processing Steps:**

```
Step 1: Collect Candidate Fields
├─ Filter by field_group (if specified)
├─ Filter by field_name (if specified)
└─ Result: List of (group_name, field_name) tuples

Step 2: For Each Candidate Field
├─ Get new_value and old_value
├─ Check if transition matches (if transitions specified)
├─ Apply exceptions (include/exclude)
├─ Mark as "checked" if matches

Step 3: Apply bloc_scope
├─ With "any": Count inclusion if ANY field is checked
├─ With "all": Count inclusion if ALL changed fields are checked

Step 4: Report Matching Inclusions
└─ Count vs. thresholds (warning/critical)
```

**Example Configuration:**

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined (Only)",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_group": {"include": ["Inclusion"]},
  "field_name": null,
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ],
  "bloc_scope": "all"
}
```

### Rule Type 2: New Inclusions

**Purpose:** Count patients that exist in current data but not in previous

**Syntax:**
```json
{
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_group": "Patient_Identification",
  "field_name": ["Patient_Id", "Pseudo"],
  "transitions": [],
  "bloc_scope": null
}
```
**Note:** For special rules like "New Inclusions", transitions can be left as empty array `[]` since these rules don't use transition matching.

**Processing:**
```
1. Build dictionaries indexed by key field
   - Key field candidates: Patient_Id, Pseudo (tried in order)
   - key_dict_new = {patient_key: patient_data for patient in current}
   - key_dict_old = {patient_key: patient_data for patient in previous}

2. Find new inclusions
   new_keys = set(key_dict_new.keys()) - set(key_dict_old.keys())
   count = len(new_keys)

3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK
```

**Example Output:**
```
✓ [green]New Inclusions: 0[/green]
  (No new patients added)

⚠ [yellow]New Inclusions: 42[/yellow]
  (42 new patients - warning threshold exceeded)

✗ [red]New Inclusions: 75[/red]
  (75 new patients - exceeds critical threshold of 50)
```

### Rule Type 3: Deleted Inclusions

**Purpose:** Count patients that exist in previous but not in current

**Syntax:**
```json
{
  "bloc_title": "Identification",
  "line_label": "Deleted Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 0,
  "field_group": "Patient_Identification",
  "field_name": ["Patient_Id", "Pseudo"],
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:**
```
1. Build dictionaries (same as New Inclusions)

2. Find deleted inclusions
   deleted_keys = set(key_dict_old.keys()) - set(key_dict_new.keys())
   count = len(deleted_keys)

3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK
```

**Note:** Typically `critical_threshold=0` because any deletion is concerning.

### Rule Type 4: New Fields

**Purpose:** Detect field names that appear in current but not in previous

**Syntax:**
```json
{
  "bloc_title": "Structure",
  "line_label": "New Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_group": null,
  "field_name": null,
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:**
```
1. For each patient in common (present in both versions):
   a) Get all groups and fields from current version
   b) Get all groups and fields from previous version
   c) Find new fields: current_fields - previous_fields
   d) Qualified name: "group_name.field_name"

2. Count by field name
   field_counts = {field_qualified_name: count_of_inclusions}
   total_new_fields = len(field_counts)

3. Display results
   For each new field:
     "Inclusion.New_Field (42 inclusions)"
     [count = number of inclusions that gained this field]
```

**Example Output:**
```
✓ [green]New Fields: 0[/green]

⚠ [yellow]New Fields: 2[/yellow]
    Endotest.New_Request_Type (1 inclusion)
    Inclusion.New_Status_Code (2 inclusions)
```

### Rule Type 5: Deleted Fields

**Purpose:** Detect field names that exist in previous but not in current

**Syntax:**
```json
{
  "bloc_title": "Structure",
  "line_label": "Deleted Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_group": null,
  "field_name": null,
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:** Same as "New Fields" but reversed:
```
deleted_fields = previous_fields - current_fields
```

---

## Field Selection Pipeline (v3.0)

**NEW APPROACH:** Field selection now uses the **same pipeline architecture as transitions**.

### Pipeline Ordering (Key Concept)

Start with an **empty set of fields**. Each step either **includes** or **excludes** fields:

```python
candidate_fields = set()  # Empty initially

# Step 1: Include all Endotest fields
for each field in all_fields:
    if selector matches "Endotest.*":
        candidate_fields.add(field)

# Step 2: Also include Inclusion.Status
for each field in all_fields:
    if selector matches "Inclusion.Status":
        candidate_fields.add(field)

# Step 3: But exclude Endotest.Last_Updated
for each field in all_fields:
    if selector matches "Endotest.Last_Updated":
        candidate_fields.discard(field)

# Result: Endotest.* + Inclusion.Status, except Endotest.Last_Updated
```

### Simple Examples

#### Example 1: Single Group
```json
[["include", "Endotest.*"]]
// Result: All Endotest fields
```

#### Example 2: Multiple Groups
```json
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Result: All Endotest + all Inclusion fields
```

#### Example 3: Specific Fields
```json
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Result: Only Patient_Id and Pseudo fields
```

#### Example 4: All Except Some
```json
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// Result: All fields EXCEPT Endotest.Last_Updated
```

#### Example 5: Complex Selection
```json
[
  ["include", "*.*"],
  ["exclude", "Patient_Identification.*"],
  ["exclude", "Inclusion.*"]
]
// Result: All fields EXCEPT Patient_Identification and Inclusion
```

### Important Notes

- ✅ **Order matters:** Steps are applied sequentially
- ✅ **Explicit:** Admin responsible for correct pipeline
- ✅ **No implicit AND/OR:** Use multiple include steps for OR logic
- ✅ **Deterministic:** Sets sorted, reproducible results

---

## Transition Patterns

### Pattern Matching Rules

#### Literal Value Matching
```json
[
  ["active", "inactive"],
  [true, false],
  [0, 1]
]
// Match exact value changes
// Type must match (string vs. number vs. boolean)
```

#### Undefined Keyword
```
*undefined: Matches any undefined-like value
  - null
  - "" (empty string)
  - "undefined"

*defined: Matches any defined value
  - NOT null
  - NOT ""
  - NOT "undefined"
```

**Examples:**
```json
[
  ["*undefined", "*defined"]
]
// Transition FROM any undefined TO any defined

[
  ["*defined", "*undefined"]
]
// Transition FROM any defined TO any undefined

[
  ["*defined", "*defined"]
]
// Transition FROM defined TO different defined
// (with actual value change check)
```

#### Wildcard Pattern
```json
[
  ["*", "*"]
]
// Match ANY transition
// Useful for: "Alert on any change to this field"
```

### Transition Combination Examples

**Example 1: Detect New Values Only**
```json
{
  "transitions": [["*undefined", "*defined"]]
}
// Alert when field goes from undefined to any value
// Ignore when field already had value
```

**Example 2: Detect Value Reversal**
```json
{
  "transitions": [
    [true, false],
    [false, true]
  ]
}
// Alert when boolean field toggles in either direction
```

**Example 3: Detect Specific Status Change**
```json
{
  "transitions": [
    ["pending", "approved"],
    ["pending", "rejected"]
  ]
}
// Alert when pending status changes to approved or rejected
// Ignore all other transitions
```

**Example 4: Detect Anything But This**
```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Endotest.Last_Updated", "*", "*"]
  ]
}
// Alert on any field change
// EXCEPT exclude changes to Last_Updated
```

---

## Exception Handling (Pipeline Architecture)

With the new unified pipeline format, exceptions are now just regular pipeline steps with different actions. This section explains the patterns.

### Pattern 1: Simple Whitelist (Include Only)

Allow specific field/transition combinations:

```json
{
  "transitions": [
    ["include", "Request_Sent", false, true],
    ["include", "Diagnostic_Status", "warning", "complete"]
  ]
}
```

**Logic:**
```
Step 1: Include Request_Sent with false→true transition
Step 2: Include Diagnostic_Status with warning→complete
Result: ONLY these specific field+transition combinations are checked
```

### Pattern 2: Simple Blacklist (Exclude Only)

Block specific field/transition combinations:

```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Endotest.Import_Time", "*", "*"]
  ]
}
```

**Logic:**
```
Step 1: Include all fields with any change (*→*)
Step 2: Exclude Last_Updated from being checked
Step 3: Exclude Endotest.Import_Time from being checked
Result: All fields EXCEPT Last_Updated and Import_Time
```

### Pattern 3: Main Rule + Multiple Exceptions

Combine main transition rule with field-specific exceptions:

```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", false, true],
    ["exclude", "Endotest.Last_Modified", "*", "*"]
  ]
}
```

**Logic:**
```
Step 1: Include fields that change between two defined values
Step 2: ALSO include Request_Sent changing from false to true (even if not *defined→*defined)
Step 3: But exclude any change to Last_Modified (overrides Step 1)
Result: *defined→*defined changes PLUS Request_Sent false→true, EXCEPT Last_Modified
```

### Field Selector Formats in Pipeline

**Simple field name (matches in any group):**
```json
{
  "field_selector": "Status"
}
// Matches "Status" in any group
// But this is NOT pipeline syntax - use "*.*" with field matching instead
```

**Better: Use qualified notation in field_selector:**
```json
["include", "Endotest.Request_Sent", false, true]
// Matches only Endotest group, Request_Sent field
// Matches ONLY Endotest.Request_Sent
```

**Full Specification:**
```json
{
  "field": "Endotest.Request_Sent",
  "transition": [false, true]
}
// Matches this specific field AND transition combination
```

### Practical Examples with Pipeline

**Example 1: Alert on Most Changes, Except System Fields**
```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Last_Modified_By", "*", "*"],
    ["exclude", "Import_Timestamp", "*", "*"]
  ]
}
// Step 1: Include ANY field change
// Step 2-4: Exclude system timestamp/audit fields
```

**Example 2: Alert on Undefined→Defined, Plus Status Reversals**
```json
{
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Request_Status", "rejected", "submitted"]
  ]
}
// Step 1: Include when field goes from undefined to defined
// Step 2: ALSO include Request_Status: rejected → submitted (even if not undefined→defined)
```

**Example 3: Complex Medical Rules with Multiple Conditions**
```json
{
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Endotest.Test_Result", "pending", "completed"],
    ["include", "GDD.Status", "pending", "failed"],
    ["exclude", "Endotest.Last_Sync", "*", "*"]
  ]
}
// Step 1: Include main rule: undefined→defined
// Step 2: ALSO include Test_Result pending→completed
// Step 3: ALSO include GDD.Status pending→failed
// Step 4: But exclude any change to Last_Sync field
// Result: All matching transitions except Last_Sync changes
```

**Example 4: Fine-Grained Control with Include + Exclude**
```json
{
  "transitions": [
    ["include", "*.*", "*"],
    ["include", "Status", "*undefined", "*defined"],
    ["include", "Status", "*defined", "*undefined"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Internal_Id", "*", "*"]
  ]
}
// Step 1: Include any change (baseline)
// Step 2-3: Specifically include Status becoming defined/undefined
// Step 4-5: Exclude Last_Updated and Internal_Id changes (override Step 1)
// Result: All changes EXCEPT Last_Updated/Internal_Id, plus Status transitions
```

---

## Configuration Examples

### Example 1: Monitor New Inclusions (v3.0)

**Requirement:** Alert if unexpected number of patients added

```json
{
  "ignore": null,
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
  "bloc_scope": null,
  "transitions": []
}
```

**Field Selection Logic:**
```
Starts empty: candidate_fields = {}
Step 1: Include Patient_Identification.Patient_Id
Step 2: Include Patient_Identification.Pseudo
Result: [Patient_Identification.Patient_Id, Patient_Identification.Pseudo]
These become key candidates (tried in order)
```

**Logic:**
```
Count patients in current but not in previous
If count > 50: CRITICAL (too many new patients)
If count > 0: WARNING (any new patients)
If count == 0: OK
```

### Example 2: Detect Undefined→Defined Changes (v3.0)

**Requirement:** Alert if any field becomes defined

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "Inclusion.*"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ]
}
```

**Field Selection & Transitions:**
```
Field Selection: Include all Inclusion fields
Transitions Pipeline:
  Step 1: Include *.*  *undefined→*defined
  Result: Only undefined→defined changes
```

**Logic:**
```
For each inclusion:
  Check if Inclusion.Inclusion_Status changed
  If transition is: undefined → defined:
    COUNT this inclusion
If count > 5: CRITICAL
If count > 0: WARNING
```

### Example 3: Strict All-Fields Completeness (v3.0)

**Requirement:** Ensure ALL changed fields follow undefined→defined pattern

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "All Changes Undefined to Defined",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_selection": [["include", "Inclusion.*"]],
  "bloc_scope": "all",
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ]
}
```

**Key Difference with bloc_scope="all":**
```
With bloc_scope="any": Count if ANY field matches
With bloc_scope="all": Count ONLY if ALL changed fields match
```

**Logic:**
```
For each inclusion:
  Find all Inclusion fields that changed
  Check if ALL changes are: undefined → defined
If all changed fields match pattern:
  COUNT this inclusion (expected pattern)
If any changed field doesn't match:
  SKIP (unexpected pattern)

If count > 200: CRITICAL (too many gaining data)
```

### Example 4: Request Lifecycle Validation (v3.0)

**Requirement:** Detect expected test request state transitions

```json
{
  "bloc_title": "Endotest",
  "line_label": "Request Status Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "Endotest.Request_Sent"], ["include", "Endotest.Request_Status"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "Endotest.Request_Sent", false, true],
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"]
  ]
}
```

**Field Selection Pipeline:**
```
Empty set start
Step 1: Include Endotest.Request_Sent
Step 2: Include Endotest.Request_Status
Result: {Endotest.Request_Sent, Endotest.Request_Status}
```

**Logic:**
```
For each inclusion:
  Check Endotest fields (Request_Sent, Request_Status)
  If ANY field matches transitions:
    COUNT this inclusion
If count > 100: CRITICAL (too many status changes)
```

### Example 5: Valid Workflow Transitions

**Requirement:** Alert on workflow changes but only for valid state transitions (request can go from pending to accepted/rejected/resubmitted)

```json
{
  "bloc_title": "Endotest",
  "line_label": "Valid Request Transitions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_group": {"include": ["Endotest"]},
  "field_name": ["Request_Status"],
  "transitions": [
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"],
    ["include", "Endotest.Request_Status", "rejected", "resubmitted"],
    ["include", "Endotest.Request_Status", "accepted", "cancelled"]
  ],
  "bloc_scope": "any"
}
```

**Logic:**
```
For each inclusion:
  Check if Request_Status field changed
  If transition matches ONE of the 4 allowed transitions:
    COUNT this inclusion (valid workflow)
  If transition is different:
    SKIP (unexpected change - needs investigation)

If count > 50: CRITICAL (too many valid status transitions)
```

**Note:** With multiple transitions in the exception, the field must match ANY of the specified transitions to be included.

---

### Example 6: Exclude Internal Fields

**Requirement:** Monitor data changes but ignore internal/system fields

```json
{
  "bloc_title": "Identification",
  "line_label": "Data Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_group": null,
  "field_name": {"exclude": ["Last_Updated", "Import_Time", "Internal_Id"]},
  "transitions": [
    ["include", "*.*", "*", "*"]
  ],
  "bloc_scope": "any"
}
```

**Logic:**
```
For each inclusion:
  Check ALL fields EXCEPT [Last_Updated, Import_Time, Internal_Id]
  If ANY field changed:
    COUNT this inclusion
If count > 100: CRITICAL (too many changes)
```

---

## User Guide: Adding/Modifying Rules

### Step 1: Identify Rule Need

Determine the data validation requirement:

```
Detection Type          Use Pattern
─────────────────────────────────────────────────
New patients added      "New Inclusions" rule
Patients removed        "Deleted Inclusions" rule
Field values changed    Standard rule + transitions
Field added/removed     "New/Deleted Fields" rule
Specific transitions    Standard rule + narrow transitions
Exclude system changes  Standard rule + exceptions
```

### Step 2: Choose Rule Type

| Rule Type | When to Use | Complexity |
|-----------|------------|-----------|
| New Inclusions | Track patient additions | Simple |
| Deleted Inclusions | Track patient removals | Simple |
| New Fields | Monitor schema changes | Simple |
| Deleted Fields | Detect removed data | Simple |
| Standard (Transitions) | Monitor specific changes | Medium |
| Standard (with Exceptions) | Monitor changes + allowances | Complex |

### Step 3: Define Thresholds

```
Decision Matrix:

Threshold Pattern    Meaning              Example Use
─────────────────────────────────────────────────────
(0, 0)              No changes allowed   Critical data
(0, 1)              Anything is critical Surgery dates
(0, 50)             Strict monitoring    High-value fields
(50, 100)           Normal operation     Flexible fields
(200, 200)          Skip to critical     Lenient tracking
```

Recommendation:
```
Strict validation (medical):
  warning = 0, critical = 1

Normal validation (most fields):
  warning = 5, critical = 20

Lenient validation (administrative):
  warning = 50, critical = 100
```

### Step 4: Create Rule Row in Excel

Open `Endobest_Dashboard_Config.xlsx` → `Regression_Check` sheet

```
Row N:
A: ignore            (leave empty)
B: bloc_title        (e.g., "Inclusion Protocol")
C: line_label        (e.g., "Status Changed")
D: warning_threshold (e.g., 0)
E: critical_threshold (e.g., 20)
F: field_group       (e.g., "Inclusion")
G: field_name        (e.g., ["Status", "Date"])
H: bloc_scope        (e.g., "any")
I: transitions       (e.g., [["include", "*.*", "*", "*"]])
```

### Step 5: Define Field Scope

Decide which fields the rule applies to:

```
Scope                    JSON
──────────────────────────────────────────────
All fields               null
All in group X           "group_name"
Multiple groups          {"include": ["group1", "group2"]}
All except group X       {"exclude": ["group1"]}
Specific field           "field_name"
Multiple fields          ["field1", "field2"]
Field with notation      ["Group.field1", "Group.field2"]
```

### Step 6: Define Transitions

Specify what changes to monitor:

```
Pattern                  JSON              Meaning
────────────────────────────────────────────────────────────
Any change               [["*", "*"]]      Monitor all changes
Become defined           [["*undefined", "*defined"]]      Field gets value
Become undefined         [["*defined", "*undefined"]]      Field loses value
Toggle boolean           [[true, false], [false, true]]    Boolean flip
Specific change          [["old", "new"]]  Exact transition
Multiple changes         [["old1", "new1"], ["old2", "new2"]]  Multiple patterns
```

### Step 7: Set Exceptions (Optional)

Allow specific field/transition combinations:

```
If needed:
i: transition_exceptions = {
    "include": [
      {"field": "Request_Sent", "transition": [false, true]}
    ]
  }

Or exclude specific cases:
i: transition_exceptions = {
    "exclude": [
      {"field": "Last_Updated"}
    ]
  }
```

### Step 8: Choose Bloc Scope

Decide aggregation logic:

```
Requirement              bloc_scope
─────────────────────────────────────────────
Any field changes        "any" (default)
All changes match        "all"
```

### Step 9: Validate & Test

```bash
# Check-only mode (validates configuration)
python eb_dashboard.py --check-only

# Expected output:
# ✓ Loaded 42 regression check rules
# ✓ All checks passed
```

### Step 10: Full Collection Test

```bash
# Run full collection to test rule
python eb_dashboard.py

# After collection, verify:
# 1. Rule appears in output
# 2. Severity level is correct (OK/Warning/Critical)
# 3. Count matches expectations
```

---

## Execution Modes

### Mode 1: Normal Collection with Quality Checks

```bash
python eb_dashboard.py
```

**Workflow:**
```
1. Collect data (organizations, inclusions)
2. Run Coherence Check
3. Run Non-Regression Check (if old file exists)
4. If critical issues: Ask user for confirmation
5. If OK or user confirms: Export files
6. Display elapsed time
```

**Output:**
```
Collecting data from 15 organizations...
[████████████████████] 1200/1200

═══ Coherence Check ═══
✓ [green]TOTAL matches[/green]

═══ Non Regression Check ═══
✓ [green]Structure: New Fields: 0[/green]
✓ [green]Identification: New Inclusions: 0[/green]
...

✓ All checks passed successfully!

Writing files...
Elapsed time: 3:42
```

### Mode 2: Check-Only (Validation Only)

```bash
python eb_dashboard.py --check-only
```

**Workflow:**
```
1. Load existing JSON files (no API calls)
2. Load regression configuration
3. Run Coherence Check
4. Run Non-Regression Check
5. Report results
6. Exit
```

**Use Case:** Validate data before distribution without fresh collection

**Output:**
```
═══ CHECK ONLY MODE ═══
Running quality checks on existing data files...

[Loading configuration...]
[Running checks...]

✓ All checks passed successfully!
```

### Mode 3: Compare Two Files

```bash
python eb_dashboard.py --check-only file1.json file2.json
```

**Workflow:**
```
1. Load file1 and file2 (as current and old)
2. Skip coherence check (organizations not provided)
3. Run regression check comparing them
4. Report differences
5. Exit
```

**Use Case:** Compare two snapshots, detect changes between versions

**Output:**
```
═══ CHECK ONLY COMPARE MODE ═══
Comparing two specific files:
  Current: file1.json
  Old: file2.json

[Running regression checks...]

⚠ [yellow]New Inclusions: 15[/yellow]
✗ [red]Deleted Inclusions: 5[/red]
...
```

### Mode 4: Debug Mode (Verbose Output)

```bash
python eb_dashboard.py --debug
```

**Workflow:**
```
1. Execute as Normal Mode
2. Enable DEBUG_MODE in quality checks
3. Display detailed field-by-field changes
4. Show individual inclusion comparisons
5. Verbose logging
```

**Use Case:** Troubleshoot regression rules, understand data changes

**Output:**
```
Running collection...
[████████] 1200/1200

═══ Non Regression Check (DEBUG MODE) ═══

Endotest - Undefined to Defined (Only): 12
  ✓ Patient-001:
    - Endotest.Request_Sent: false → true
    - Endotest.Request_Status: undefined → 'completed'

  ✓ Patient-002:
    - Endotest.Request_Sent: false → true

...
```

---

## Troubleshooting

### Issue 1: "Invalid JSON format" Error

**Symptom:** Configuration validation fails

**Cause:** Malformed JSON in transitions, field_name, or exceptions

**Solution:**
1. Open cell in JSON validator
2. Fix syntax errors
3. Re-run check

**Example - WRONG:**
```json
{
  "transitions": [["active", "inactive" ]  // Missing comma
}

{
  "field_name": ["Status" "Date"]  // Missing comma between array elements
}
```

**Example - CORRECT:**
```json
{
  "transitions": [["active", "inactive"]]
}

{
  "field_name": ["Status", "Date"]
}
```

### Issue 2: Rule Never Triggers

**Symptom:** Count always shows 0 even when data changes

**Causes:**
1. Field filters too restrictive
2. Transition pattern doesn't match actual changes
3. field_group/field_name filtering excludes target fields

**Solution:**
1. Loosen field filters: Set field_name to null
2. Use wildcards in transitions: `["*", "*"]`
3. Check actual field names in JSON output
4. Enable debug mode to see field matching

### Issue 3: Too Many False Positives

**Symptom:** Rule triggers unexpectedly, too many violations

**Causes:**
1. Thresholds set too low
2. Transitions too broad (matching unintended changes)
3. field_group/field_name too permissive

**Solution:**
1. Increase thresholds: Raise warning_threshold and critical_threshold
2. Narrow transitions: Use specific values instead of wildcards
3. Add exceptions: Use transition_exceptions to exclude specific cases
4. Narrow field scope: Specify field_name instead of null

### Issue 4: Configuration Changes Not Taking Effect

**Symptom:** Modifications to Excel file don't affect results

**Causes:**
1. File not saved
2. Regression_Check sheet not loaded
3. Old configuration still in memory

**Solution:**
1. Save Excel file (Ctrl+S)
2. Restart Python script
3. Verify sheet name is exactly "Regression_Check"
4. Check file path is correct

### Issue 5: User Confirmation Not Appearing

**Symptom:** Expected prompt for critical issues doesn't show

**Causes:**
1. Issues are at warning level, not critical
2. Thresholds higher than actual counts
3. Running in check-only mode (no export decision needed)

**Solution:**
1. Verify thresholds: warning < critical
2. Check actual violation counts
3. Run normal mode (not check-only)

### Issue 6: Comparison Mode Showing Unexpected Differences

**Symptom:** `--check-only file1 file2` reports many changes

**Causes:**
1. Files are from different collection dates (expected)
2. Configuration changed between collections (expected)
3. Field order or grouping changed (might be false positive)

**Solution:**
1. Review reported changes manually
2. Check if changes are expected (new patient data added)
3. Verify no data corruption occurred
4. Compare file sizes and counts manually

---

## Performance Considerations

### Regression Check Execution Time

**Factors Affecting Performance:**

```
1. Number of Inclusions (patients)
   - N patients = O(N) iterations
   - Typical: 1200 patients = 1-2 seconds

2. Number of Rules
   - R rules applied to each inclusion
   - Typical: 20-30 rules = <100ms total

3. Field Matching Complexity
   - Filter evaluation per field
   - Notation pointée parsing: O(1) per field
   - Typical: <50ms for all rules

4. Total Typical Time
   - 1200 inclusions × 25 rules = 1-3 seconds
```

### Optimization Tips

**If Regression Check is Slow:**

1. **Reduce rule count:**
   - Remove inactive rules (add "ignore" label)
   - Combine similar rules

2. **Simplify field filters:**
   - Use null instead of large filter lists
   - Use include (smaller) instead of exclude (larger)

3. **Narrow transitions:**
   - Use specific values instead of wildcards
   - Reduce number of transition pairs

4. **Consider file size:**
   - Large JSON files (>20MB) take longer to parse
   - This is rare and usually not the bottleneck

---

## Summary

The Quality Checks System provides:

✅ **Multi-Level Validation:** Coherence + Regression checks
✅ **Config-Driven Rules:** No code changes needed
✅ **Flexible Thresholds:** Warning and Critical levels
✅ **Rich Filtering:** Group, field, notation pointée support
✅ **Transition Patterns:** Wildcard, keyword, and specific matching
✅ **Advanced Exception Handling:**
   - Multiple transitions per exception: `[[old1, new1], [old2, new2], ...]`
   - Include + Exclude can coexist simultaneously
   - Fine-grained control over allowed/blocked transitions
✅ **Backward Compatible:** Legacy single-transition format still supported
✅ **Debug Support:** Detailed logging and debug mode
✅ **Execution Modes:** Normal, check-only, compare, debug

This architecture enables robust data quality monitoring without requiring code modifications, empowering business analysts to define and evolve validation rules independently.

---

**Document End**