Abdelkouddous LHACHIMI cb8b5d9a12 Version fonctionnelle

2025-12-12 23:07:26 +01:00

Endobest Quality Checks & Regression Testing Guide

Part 3: Quality Assurance, Validation Rules & Configuration

Document Version: 3.1 (Updated with new Excel export module reference)
Last Updated: 2025-11-08
Audience: Developers, Business Analysts, QA Engineers
Language: English

Note: Excel export functionality now available - see DOCUMENTATION_13_EXCEL_EXPORT.md, DOCUMENTATION_98_USER_GUIDE.md, and DOCUMENTATION_99_CONFIG_GUIDE.md

Version History

Version 3.0 (2025-10-22) - UNIFIED FIELD SELECTION PIPELINE

Complete Refactorization of Field Selection

✅ Merged Columns: field_group (F) + field_name (G) → single field_selection (F)
✅ Simplified Syntax: Field selection uses same pipeline format as transitions: [["action", "field_selector"], ...]
✅ 3 Selector Patterns: *.* (all fields), group.* (group), group.field (specific)
✅ Cleaner Code: Removed 150+ lines of dual-filter logic (field_group + field_name combinations)
✅ Config-Driven Keys: Key field determination (Patient_Id, Pseudo) now read from field_selection instead of hardcoded
✅ Unified Key Detection: New _get_key_field_from_new_inclusions_rule() applies field_selection pipeline directly to first inclusion (15 LOC, -75% vs manual parsing)
✅ Helper Functions: _apply_field_selection_pipeline(), _get_key_field_from_new_inclusions_rule(), _build_candidate_fields()
⚠️ MAJOR Breaking Change: Old field_group and field_name columns (F, G) are removed
⚠️ Column Shifts: bloc_scope moves H→G, transitions moves I→H
⚠️ Configuration Migration Required: Completely restructure Excel Regression_Check sheet

Technical Details:

The field selection pipeline starts with an empty set; each step adds or removes fields
The admin is responsible for ordering steps correctly (no implicit logic)
Special rules "New Fields", "Deleted Fields", "Deleted Inclusions" must have empty field_selection
Special rule "New Inclusions" applies field_selection pipeline to first inclusion sample (assumes stable structure)
Key field detection: finds first field from pipeline that has non-null value in both first new and old inclusion
Configuration validation: missing/invalid field_selection = CRITICAL error

Removed Dead Code:

_determine_key_field() - hardcoded Patient_Id/Pseudo logic
_matches_field_group_filter() - replaced by pipeline
_matches_field_name_filter() - replaced by pipeline
_determine_key_field_from_config() - replaced by simplified unified _get_key_field_from_new_inclusions_rule()

Version 2.0 (2025-10-22) - Pipeline Architecture

Transitions Pipeline Introduced

✅ Unified Format: Merged transitions + transition_exceptions into single transitions column
✅ Simplified Syntax: Each step is a 4-element array [action, field_selector, from, to]
✅ Sequential Processing: Pipeline steps applied in order, allowing fine-grained control
✅ Better Determinism: All sets sorted for reproducible logs
✅ Improved Error Handling: Invalid configs are skipped with a logged warning instead of raising an exception
⚠️ Breaking Change: Old transition_exceptions column (J) merged into transitions (I)

Version 1.0 (2025-10-21) - Initial Release

Dual-column system: transitions (I) + transition_exceptions (J)
Include/exclude exception handling
Multiple transition support per exception

Overview
Quality Assurance Strategy
Coherence Check (Technical Details)
Non-Regression Check Framework
Regression Check Configuration File
Column Reference
Special Keywords & Wildcards
Rule Types & Logic
Field Selection Pipeline
Transition Patterns
Exception Handling
Configuration Examples
User Guide: Adding/Modifying Rules
Execution Modes
Troubleshooting

⚠️ CRITICAL - Version 3.0 Migration Required

This document describes v3.0 with BREAKING CHANGES from v2.0

Item	v2.0	v3.0
Excel Columns F-I	`field_group`, `field_name`, `bloc_scope`, `transitions`	`field_selection`, `bloc_scope`, `transitions`
Column Count	4 columns for filtering+transitions	3 columns (merged field_selection)
Key Field Config	Hardcoded (Patient_Id/Pseudo)	Config-driven (from field_selection)
Field Filtering Logic	6+ combinations (complex)	Single pipeline (simple)

ACTION REQUIRED:

✅ Update Excel file column positions
✅ Migrate field_group + field_name → field_selection
✅ Run non-regression tests
✅ Verify key field detection works with new config

Overview

The Quality Checks System provides comprehensive data validation in two stages:

Coherence Check: Verifies that organization statistics (API counters) match the actual detailed inclusion data
Non-Regression Check: Detects unexpected data changes between current and previous collection runs

Both checks are configurable via Excel with Warning/Critical severity levels that can trigger user confirmation prompts.

Design Philosophy

Trust, but Verify

- Trust: API data is generally reliable
- Verify: Statistical consistency and change detection
- Report: Multi-level severity (OK, Warning, Critical)
- Decide: User confirmation before export on critical issues

Quality Assurance Strategy

Workflow Integration

Data Collection
    ↓
QUALITY CHECKS
├─ COHERENCE CHECK (mandatory)
│  ├─ Load organization statistics from API responses
│  ├─ Calculate actual counts from detailed inclusions
│  └─ Compare: Stats vs. Actual
│
├─ NON-REGRESSION CHECK (if old file exists)
│  ├─ Load previous inclusions (_old file)
│  ├─ Apply config-driven comparison rules
│  └─ Report: Changes matching configured patterns
│
└─ RESULT
    ├─ has_coherence_critical flag
    └─ has_regression_critical flag
        ↓
    IF critical issues detected:
      ├─ Display warning: ⚠ CRITICAL
      ├─ Ask user: "Write results anyway?"
      ├─ If NO → Abort export, preserve old files
      └─ If YES → Continue with export (user override)
    ELSE:
      └─ Continue with export automatically

Severity Levels

Level	Display	Meaning	Action
OK	✓ Green	No issues, within normal range	Continue automatically
WARNING	⚠ Yellow	Issue detected, exceeds warning threshold	Log and display, continue automatically
CRITICAL	✗ Red	Severe issue, exceeds critical threshold	Display, ask user before export

User Interaction

Quality Checks Complete

✗ [red]Coherence Check: CRITICAL[/red]
  ⚠ [yellow]Organization 1 mismatch: 95 vs 98[/yellow]

✗ [red]Non-Regression: CRITICAL[/red]
  ⚠ [yellow]New Inclusions: 42 (threshold 50)[/yellow]
  ✗ [red]Deleted Inclusions: 15 (threshold 0)[/red]

[bold]⚠ CRITICAL issues detected in quality checks![/bold]
Do you want to write the results anyway? [y/N]:
  y → Export anyway (risky, user override)
  n → Cancel export (preserve old files)
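
The decision at the end of the workflow can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names `should_export` and `ask_on_console` are hypothetical, and only the two flags and the prompt text come from the description above.

```python
from typing import Callable


def should_export(has_coherence_critical: bool,
                  has_regression_critical: bool,
                  confirm: Callable[[str], bool]) -> bool:
    """Return True when results may be written, False to keep the old files."""
    if not (has_coherence_critical or has_regression_critical):
        return True  # no critical issue: export continues automatically
    # Critical issue: the user decides whether to override
    return confirm("Do you want to write the results anyway? [y/N]: ")


def ask_on_console(prompt: str) -> bool:
    """Hypothetical console prompt; only y/yes overrides, anything else cancels."""
    return input(prompt).strip().lower() in ("y", "yes")
```

Passing the confirmation step as a callable keeps the decision testable without a terminal; in interactive use it would be called as `should_export(flag_a, flag_b, ask_on_console)`.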

Coherence Check (Technical Details)

Purpose

Verify that organization statistics (fetched from API) match actual detailed data (inclusion-by-inclusion count).

Data Sources

Source 1: Organization Statistics (API)

For each organization:
  GET /api/inclusions/inclusion-statistics
  Returns:
  {
    "totalInclusions": N,      // Total patients
    "preIncluded": P,          // Pré-inclus count
    "included": I,             // Inclus count
    "prematurelyTerminated": T // Prematurely terminated
  }

Source 2: Inclusion Details (JSON Array)

For each patient in endobest_inclusions:
  Check: Patient_Identification.Organisation_Id
  Count: Based on Inclusion.Inclusion_Status

  Classification rules:
  1. If status ends with " - AP" → prematurely_terminated
  2. Else if status starts with "pré-inclus" → preincluded
  3. Else if status starts with "inclus" → included
  Always count: patients += 1
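
The classification rules above can be sketched in Python as follows. The names `classify_status` and `count_inclusions` are hypothetical, and the case-insensitive prefix comparison is an assumption (the rules do not say whether case matters).

```python
from typing import Optional


def classify_status(status: Optional[str]) -> Optional[str]:
    """Map an Inclusion_Status string to a counter name, or None if no rule applies."""
    text = status or ""
    lowered = text.lower()  # assumption: prefixes are compared case-insensitively
    if text.endswith(" - AP"):           # rule 1 takes precedence
        return "prematurely_terminated"
    if lowered.startswith("pré-inclus"):  # rule 2
        return "preincluded"
    if lowered.startswith("inclus"):      # rule 3
        return "included"
    return None


def count_inclusions(inclusions, org_id=None):
    """Count patients and status categories, optionally for one organization."""
    counters = {"patients": 0, "preincluded": 0,
                "included": 0, "prematurely_terminated": 0}
    for inclusion in inclusions:
        identification = inclusion.get("Patient_Identification", {})
        if org_id is not None and identification.get("Organisation_Id") != org_id:
            continue
        counters["patients"] += 1  # every patient is counted
        category = classify_status(
            inclusion.get("Inclusion", {}).get("Inclusion_Status"))
        if category is not None:
            counters[category] += 1
    return counters
```

The order of the tests matters: a status such as "Inclus - AP" starts with "inclus" but is counted as prematurely terminated because the suffix rule is checked first.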

Validation Logic

def coherence_check(current_inclusions, organizations_list):
    """Compare API statistics with counts derived from the detailed inclusions."""
    counters = ('patients', 'preincluded', 'included', 'prematurely_terminated')

    def fmt(values):
        # Render the four counters as "patients/preincluded/included/terminated"
        return '/'.join(str(values[name]) for name in counters)

    has_critical = False

    # STEP 1: Collect statistics from API (summed over all organizations)
    total_stats = {
        name: sum(org[f'{name}_count'] for org in organizations_list)
        for name in counters
    }

    # STEP 2: Calculate actual counts from detailed data
    # calculate_detail_counters is defined elsewhere; it is assumed here to
    # return a dict keyed by the four counter names
    total_detail = calculate_detail_counters(current_inclusions, org_id=None)

    # STEP 3: Compare all 4 counters
    is_match = all(total_stats[name] == total_detail[name] for name in counters)

    # STEP 4: Report total comparison
    if is_match:
        print("✓ [green]TOTAL matches[/green]")
    else:
        print("✗ [red]TOTAL mismatch[/red]")
        print(f"Stats({fmt(total_stats)}) vs Detail({fmt(total_detail)})")
        has_critical = True

    # STEP 5: Detail-level comparison (report only organizations that mismatch)
    for org in organizations_list:
        org_stats = {name: org[f'{name}_count'] for name in counters}
        # Organization records are assumed to expose 'id' and 'name' keys
        org_detail = calculate_detail_counters(current_inclusions, org_id=org['id'])

        if any(org_stats[name] != org_detail[name] for name in counters):
            print(f'⚠ [yellow]Organization "{org["name"]}" mismatch[/yellow]')
            print(f"Stats({fmt(org_stats)}) vs Detail({fmt(org_detail)})")
            has_critical = True

    return has_critical

Example Output

Scenario: Perfect Match

═══ Coherence Check ═══

✓ [green]TOTAL - Stats(150/20/120/10) vs Detail(150/20/120/10)[/green]

Scenario: Mismatch Detected

═══ Coherence Check ═══

✗ [red]TOTAL - Stats(150/20/118/10) vs Detail(150/20/120/10)[/red]
  ⚠ [yellow]Center A - Stats(50/5/40/5) vs Detail(50/5/42/5)[/yellow]

Interpretation

Match (Green):

API statistics perfectly align with detailed data
→ No data collection issues
→ Continue processing

Minor Mismatch (Yellow):

1-2 patients differ between statistics and details
→ Possible API consistency issue
→ Monitor but continue (it happens occasionally)

Major Mismatch (Red):

10+ patients difference
→ Significant data collection issue
→ Investigate root cause
→ Consider re-running collection

Non-Regression Check Framework

Purpose

Detect unexpected data changes between current and previous collections by comparing field values against configured transition patterns.

Architecture

Previous Inclusions (File)
    ↓
┌─────────────────────────────┐
│ NON-REGRESSION CHECK        │
├─────────────────────────────┤
│ 1. Load Regression Config   │
│    (Excel: Regression_Check sheet)
│                             │
│ 2. Build Inclusion Dicts    │
│    Index by: Patient_Id or Pseudo
│                             │
│ 3. Group Rules by Bloc      │
│    - Structure              │
│    - Identification         │
│    - Inclusion Protocol     │
│    - Endotest               │
│    - Other Questionnaires   │
│                             │
│ 4. For Each Rule:           │
│    a) Detect rule type      │
│       - Normal rule         │
│       - New Inclusions      │
│       - Deleted Inclusions  │
│       - New Fields          │
│       - Deleted Fields      │
│                             │
│    b) Process rule logic    │
│       - Collect candidates  │
│       - Match transitions   │
│       - Apply exceptions    │
│       - Apply bloc_scope    │
│                             │
│    c) Calculate severity    │
│       - Count vs thresholds │
│       - Determine status    │
│                             │
│ 5. Display Results          │
│    - By bloc                │
│    - Color-coded status     │
│    - Detailed changes (debug)
│                             │
└─────────────────────────────┘
    ↓
Current Inclusions (Memory)

Regression Check Configuration File

File Location & Sheet

Endobest_Dashboard_Config.xlsx
│
├─ Sheet 1: "Inclusions_Mapping" (See DOCUMENTATION_11_FIELD_MAPPING.md)
│
└─ Sheet 2: "Regression_Check"
   ├─ Row 1: Headers
   └─ Row 2+: Rules

Sheet Structure (Version 3.0)

Row 1 (Headers):
A            B          C              D                  E
ignore      bloc_title  line_label     warning_threshold  critical_threshold
F                       G              H
field_selection         bloc_scope     transitions

Row 2+: Rule definitions (one per row)

BREAKING CHANGE (v3.0): Columns F and G from v2.0 (field_group and field_name) have been merged into single column F (field_selection). All subsequent columns shifted left by one position.
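
As an illustration of the v3.0 column layout, the sketch below maps one sheet row (columns A to H) to a rule dictionary. `parse_rule_row` and `COLUMNS` are hypothetical names; decoding the JSON cells with `json.loads` and defaulting empty cells to `[]` are assumptions about how the loader behaves.

```python
import json

# Column order A..H of the Regression_Check sheet in v3.0
COLUMNS = ("ignore", "bloc_title", "line_label", "warning_threshold",
           "critical_threshold", "field_selection", "bloc_scope", "transitions")


def parse_rule_row(row):
    """Turn one row of cell values into a rule dict, or None if the row is ignored."""
    row = tuple(row) + (None,) * (len(COLUMNS) - len(row))  # pad short rows
    cells = dict(zip(COLUMNS, row))
    # Column A: skip the row when the cell contains "ignore" (case-insensitive)
    if "ignore" in str(cells["ignore"] or "").lower():
        return None
    # Columns F and H hold JSON arrays; empty cells become empty pipelines
    for name in ("field_selection", "transitions"):
        raw = cells[name]
        cells[name] = json.loads(raw) if isinstance(raw, str) and raw.strip() else []
    cells["bloc_scope"] = cells["bloc_scope"] or "any"  # documented default
    del cells["ignore"]
    return cells
```

The rows themselves could come from any reader that yields cell values per row, for example openpyxl's `iter_rows(min_row=2, values_only=True)` on the "Regression_Check" sheet.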

Color Coding:

Yellow: Structure/Identification bloc (foundational rules)
Blue: Inclusion Protocol bloc (inclusion status rules)
Light Purple: Endotest bloc (test-related rules)
White: Regular rules
Red: Incomplete/error rules (missing required columns)

Column Reference

Column A: ignore

Type: String (optional)
Description: Skip this row if contains "ignore" (case-insensitive)
Purpose: Comment out rules without deleting rows
Values:

ignore          → Row is skipped
(empty)         → Row is processed
any_other_text  → Row is processed

Column B: bloc_title

Type: String (required)
Description: Logical grouping of related rules
Purpose: Visual organization and blocking/reporting
Valid Values:

Structure              → File format and field availability rules
Identification         → Patient identification changes
Inclusion Protocol     → Inclusion status and protocol changes
Endotest               → Laboratory test request changes
Other Questionnaires   → Non-specific questionnaire changes
[Custom Group Names]   → Any custom bloc name for organization

Rules Per Bloc:

Structure bloc (Example):
  ├─ New Fields
  ├─ Deleted Fields
  └─ (Structure-specific rules)

Identification bloc:
  ├─ New Inclusions
  ├─ Deleted Inclusions
  ├─ Changed (Excluding Birthday)
  ├─ Changed Date of Birth/Age
  └─ (Identification-specific rules)

Endotest bloc:
  ├─ Undefined to Defined (Only)
  ├─ Defined to Undefined
  ├─ Changed Value
  └─ (Endotest-specific rules)

Column C: line_label

Type: String (required)
Description: Unique rule identifier within its bloc
Purpose: Displayed in output, identifies rule in reports
Examples:

New Inclusions
Deleted Inclusions
New Fields
Deleted Fields
Changed Value
Undefined to Defined (Only)

Requirements:

Must be unique within bloc_title
Should be descriptive

Column D: warning_threshold

Type: Numeric (required, >= 0)
Description: Count threshold that triggers WARNING level
Position: Column D (after line_label)
Logic:

IF count > warning_threshold AND count <= critical_threshold:
  Status = WARNING (yellow ⚠)

Examples:

0      → Any change triggers warning (strict)
5      → 0-5 changes = OK, 6+ = Warning (until critical_threshold is exceeded)
50     → 0-50 changes = OK, 51+ = Warning (lenient)
200    → Very lenient, only alert on large changes

Column E: critical_threshold

Type: Numeric (required, >= warning_threshold)
Description: Count threshold that triggers CRITICAL level
Position: Column E (after warning_threshold)
Logic:

IF count > critical_threshold:
  Status = CRITICAL (red ✗)
  → May prompt user for confirmation

Relationship:

warning_threshold <= critical_threshold

Examples:
(0, 1)      → Strict: 1 change is a warning, 2+ changes are critical
(0, 50)     → 1-50 changes = warning, 51+ = critical
(50, 100)   → Normal operation: 0-50 OK, 51-100 warning, 101+ critical
(200, 200)  → Same thresholds: jump directly from OK to critical
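
The two threshold comparisons combine into a single status function. The sketch below follows the strict "greater than" logic stated for columns D and E; the function name `severity` is hypothetical.

```python
def severity(count, warning_threshold, critical_threshold):
    """Return "OK", "WARNING" or "CRITICAL" for a rule's match count."""
    if warning_threshold < 0 or critical_threshold < warning_threshold:
        raise ValueError("expected 0 <= warning_threshold <= critical_threshold")
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"
```

Because both comparisons are strict, a count equal to a threshold stays at the lower level: `severity(50, 50, 100)` is "OK" and `severity(100, 50, 100)` is "WARNING".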

Column F: field_selection (NEW - v3.0)

Type: JSON array of 2-element arrays (mandatory for most rules)
Description: Pipeline-based field selection using include/exclude actions
Position: Column F (after critical_threshold) - REPLACES old field_group + field_name
Rules:

Format: [["action", "field_selector"], ["action", "field_selector"], ...]
Mandatory: For all rules EXCEPT "New Fields", "Deleted Fields", "Deleted Inclusions"
For special rules: Must be empty [] or null
Explicit: No implicit logic - admin must order steps correctly
Pipeline: Starts with empty set, each step adds or removes fields

Elements:

Element	Type	Valid Values	Example
action	String	`"include"` or `"exclude"`	`"include"`
field_selector	String	`*.*`, `group.*`, `group.field`	`"Endotest.Request_Sent"`

Selector Patterns (3 only):

*.*              → All fields in all groups
group.*          → All fields in specific group (e.g., "Endotest.*")
group.field      → Specific field only (e.g., "Endotest.Request_Sent")

Examples:

1. Include Single Group

[["include", "Endotest.*"]]
// All Endotest fields

2. Include Multiple Groups

[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Endotest AND Inclusion fields

3. Include All, Exclude Some

[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// All fields EXCEPT Endotest.Last_Updated

4. Key Field Selection (for "New Inclusions" rule)

[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Tries Patient_Id first, then Pseudo (in order)

5. Complex Pipeline

[
  ["include", "*.*"],
  ["exclude", "Inclusion.*"],
  ["exclude", "Patient_Identification.*"]
]
// All fields EXCEPT Inclusion and Patient_Identification

Special Rules (field_selection must be EMPTY):

"New Fields"           → [] or null
"Deleted Fields"       → [] or null
"Deleted Inclusions"   → [] or null

Validation:

✅ Missing or null field_selection for normal rules → CRITICAL ERROR
✅ Invalid selector (no dot) → CRITICAL ERROR
✅ Non-list format → CRITICAL ERROR, skip rule
✅ Step with wrong element count → CRITICAL ERROR, skip rule
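
The validation rules above could be checked along these lines. This is a sketch under assumptions: `validate_field_selection` and `is_valid_selector` are hypothetical names, the cell is assumed to arrive as a JSON string (or None when empty), and rejecting `*.field` follows from the statement that only three selector patterns exist.

```python
import json

# Rules whose field_selection must stay empty
EMPTY_SELECTION_RULES = {"New Fields", "Deleted Fields", "Deleted Inclusions"}


def is_valid_selector(selector):
    """Accept only *.*, group.* and group.field."""
    if not isinstance(selector, str) or "." not in selector:
        return False
    group, field = selector.split(".", 1)
    if not group or not field:
        return False
    return group != "*" or field == "*"  # "*.field" is not one of the 3 patterns


def validate_field_selection(line_label, raw_cell):
    """Return a list of error messages; an empty list means the cell is valid."""
    if raw_cell is None or (isinstance(raw_cell, str) and not raw_cell.strip()):
        steps = None
    else:
        try:
            steps = json.loads(raw_cell)
        except (TypeError, ValueError):
            return ["field_selection is not valid JSON"]
    if line_label in EMPTY_SELECTION_RULES:
        return [] if not steps else ["field_selection must be empty for this rule"]
    if steps is None:
        return ["field_selection is missing"]
    if not isinstance(steps, list):
        return ["field_selection must be a JSON array"]
    errors = []
    for number, step in enumerate(steps, start=1):
        if not isinstance(step, list) or len(step) != 2:
            errors.append(f"step {number}: expected [action, field_selector]")
            continue
        action, selector = step
        if action not in ("include", "exclude"):
            errors.append(f"step {number}: invalid action {action!r}")
        if not is_valid_selector(selector):
            errors.append(f"step {number}: invalid selector {selector!r}")
    return errors
```

Any non-empty result would correspond to the CRITICAL ERROR cases listed above; how the caller reports it is left out of the sketch.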

Column G: bloc_scope (moved from H - v3.0)

Type: String enum (optional, default: "any")
Description: Aggregation logic for matching fields within an inclusion
Position: Column G (after field_selection)
Valid Values:

"any"    → At least ONE field must match transitions
"all"    → ALL changed fields must match transitions

Logic:

bloc_scope = "any" (Default)

IF ANY candidate field has matching transition:
  RETURN inclusion matches rule

Use for: "Alert if any change occurs"

bloc_scope = "all"

IF ALL changed fields have matching transitions:
  RETURN inclusion matches rule

Use for: "Alert only if all changes match pattern"

Example Comparison:

Inclusion with 5 fields in scope:
  Field1: Changed, matches transition ✓
  Field2: Unchanged (always ignored)
  Field3: Changed, does NOT match transition ✗
  Field4: Unchanged (always ignored)
  Field5: Changed, matches transition ✓

Changed fields: [Field1, Field3, Field5]
Matched changed: [Field1, Field5]

Result with bloc_scope="any":  ✓ COUNT (Field1 matched)
Result with bloc_scope="all":  ✗ SKIP (Field3 didn't match)

Scenario	bloc_scope="any"	bloc_scope="all"
1 match, 0 mismatches	✓ COUNT	✓ COUNT
1 match, 1 mismatch	✓ COUNT	✗ SKIP
0 matches, 1 mismatch	✗ SKIP	✗ SKIP
3 matches, 0 mismatches	✓ COUNT	✓ COUNT
3 matches, 1 mismatch	✓ COUNT	✗ SKIP
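
The aggregation in the table above can be expressed compactly. In this sketch each candidate field is represented by a `(changed, matched)` pair; the function name `inclusion_matches` is hypothetical, and treating an inclusion with no changed field as a non-match is an assumption consistent with "unchanged fields are always ignored".

```python
def inclusion_matches(field_results, bloc_scope="any"):
    """Decide whether an inclusion counts for a rule.

    field_results: iterable of (changed, matched) booleans, one per candidate field.
    """
    if bloc_scope not in ("any", "all"):
        raise ValueError(f"unknown bloc_scope: {bloc_scope!r}")
    # Unchanged fields are always ignored
    matched_flags = [matched for changed, matched in field_results if changed]
    if not matched_flags:
        return False  # nothing changed, nothing to count
    if bloc_scope == "all":
        return all(matched_flags)
    return any(matched_flags)
```

With the five-field example above, `inclusion_matches` returns True for "any" (Field1 matched) and False for "all" (Field3 did not match).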

Column H: transitions (moved from I - v3.0)

Type: JSON array of 4-element arrays (optional)
Description: Pipeline-based transition rules (old_value → new_value)
Position: Column H (after bloc_scope)
Format: [["action", "field_selector", "from_pattern", "to_pattern"], ...]

Each step is exactly 4 elements
If None/empty: Rule applies to ALL field changes
Supports wildcard keywords: *undefined, *defined, *
Supports literal values for exact matching

Pipeline Concept (v2.0+):

Initial state: All changed fields → is_checked = False

Step 1: Include rule for all fields (*.*) with *defined→*defined
  └─ is_checked = True if transition matches

Step 2: Include rule for Endotest.Diagnostic_Status with waiting→*undefined
  └─ is_checked = True (whitelisted exception)

Step 3: Exclude rule for Endotest.Request_Sent with false→true
  └─ is_checked = False (blacklisted exception)

Final result: Only fields matching the pipeline are checked

Syntax: 4-Element Pipeline Array

Each pipeline step is a 4-element array:

[action, field_selector, from_pattern, to_pattern]

Element	Description	Examples
action	"include" (whitelist) or "exclude" (blacklist)	"include", "exclude"
field_selector	Which fields this step applies to	"*.*", "group.*", "group.field"
from_pattern	Old value pattern to match	"*undefined", "*defined", "*", literal value
to_pattern	New value pattern to match	"*undefined", "*defined", "*", literal value

Important: The syntax is strictly enforced - each step must have exactly 4 elements. No shortcuts or variants are accepted.

Field Selector Patterns

*.*                    → All fields in all groups
group.*                → All fields in specific group (e.g., "Endotest.*")
group.field            → Specific field only (e.g., "Endotest.Request_Sent")

Complete Examples

Example 1: Simple All-Fields Rule (Most Common)

{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"]
  ]
}
// Pipeline: Include all fields that change between two defined values

Example 2: Main Rule + One Include Exception

{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"]
  ]
}
// Step 1: Include all *defined→*defined changes
// Step 2: ALSO include specific Endotest.Diagnostic_Status changes from waiting to undefined

Example 3: Main Rule + Include Exception + Exclude Exception

{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"],
    ["exclude", "Endotest.Request_Sent", false, true]
  ]
}
// Step 1: Include all *defined→*defined
// Step 2: Include Diagnostic_Status waiting→undefined (whitelist)
// Step 3: Exclude Request_Sent false→true (blacklist)
// Result: Step 3 overrides Step 1 for that specific field+transition

Example 4: Multiple Include Steps for Different Fields

{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "GDD.Status", "pending", "completed"],
    ["include", "GDD.Status", "pending", "failed"]
  ]
}
// Step 1: Include all *defined→*defined changes
// Step 2: Include GDD.Status pending→completed
// Step 3: Include GDD.Status pending→failed

Example 5: Exclude Rule with Wildcard

{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["exclude", "Endotest.Last_Modified", "*", "*"]
  ]
}
// Include all changes EXCEPT any change to Last_Modified field

Processing Logic

The pipeline is executed sequentially, with each step modifying the is_checked status in-place:

1. Initialize: All changed fields have is_checked = False

2. For each transition step in order:
   a. Check if the current field matches the field_selector
   b. If yes: Check if the old→new values match from_pattern→to_pattern
   c. If yes:
      - If action="include": Set is_checked = True
      - If action="exclude": Set is_checked = False
   d. If no: Leave is_checked unchanged

3. Final: Only fields with is_checked = True are counted as matching

Important: Later steps can override earlier steps. Example:

[
  ["include", "*.*", "*", "*"],      // Step 1: include everything
  ["exclude", "Field.X", "*", "*"]   // Step 2: exclude Field.X (overrides Step 1)
]

Result: Everything is included EXCEPT Field.X
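
The processing logic can be sketched for a single changed field as follows. The function names are hypothetical, and two points are assumptions: the undefined-like values are taken to be `None`, `""` and `"undefined"`, and literal patterns must have the same type as the value (so that `True` does not match `1`).

```python
UNDEFINED_LIKE = (None, "", "undefined")


def matches_pattern(value, pattern):
    """Match one value against *, *undefined, *defined or a literal."""
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return value in UNDEFINED_LIKE
    if pattern == "*defined":
        return value not in UNDEFINED_LIKE
    # Literal: exact equality, and the types must agree (True must not equal 1)
    return type(value) is type(pattern) and value == pattern


def selector_matches(selector, group, field):
    """Match *.*, group.* or group.field against one qualified field."""
    sel_group, sel_field = selector.split(".", 1)
    return sel_group in ("*", group) and sel_field in ("*", field)


def is_checked(group, field, old_value, new_value, transitions):
    """Run the pipeline for one changed field and return its final state."""
    checked = False  # every changed field starts unchecked
    for action, selector, from_pattern, to_pattern in transitions:
        if not selector_matches(selector, group, field):
            continue
        if (matches_pattern(old_value, from_pattern)
                and matches_pattern(new_value, to_pattern)):
            checked = (action == "include")  # later steps override earlier ones
    return checked
```

The function is meant to be called only for fields whose value actually changed; unchanged fields never enter the pipeline.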

Configuration Error Handling

If a transitions step has invalid syntax:

The rule is skipped and a yellow warning is logged
No exception is thrown
User can see the ⚠ warning in the output
User can choose to save the report or fix the config

Valid and invalid syntax examples:

["include", "*.*", "*defined", "*defined"]  // ✓ Exactly 4 elements
["include", "*.*", "*defined"]               // ✗ Only 3 elements (INVALID)
["maybe", "*.*", "*defined", "*defined"]    // ✗ Invalid action (INVALID)
["include", "invalid", "*defined", "*defined"] // ✗ No dot in selector (INVALID)
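
A step-level check matching the four cases above might look like this. `validate_transition_step` and `transition_warnings` are hypothetical names; returning messages instead of raising reflects the "no exception is thrown" behaviour described above.

```python
def validate_transition_step(step):
    """Return None when the step is valid, otherwise a short reason string."""
    if not isinstance(step, (list, tuple)) or len(step) != 4:
        return "expected exactly 4 elements"
    action, selector, _from_pattern, _to_pattern = step
    if action not in ("include", "exclude"):
        return f"invalid action {action!r}"
    if not isinstance(selector, str) or "." not in selector:
        return f"selector {selector!r} has no dot"
    return None


def transition_warnings(transitions):
    """Collect one warning per invalid step; an empty list means the rule can run."""
    warnings = []
    for number, step in enumerate(transitions or [], start=1):
        reason = validate_transition_step(step)
        if reason is not None:
            warnings.append(f"transitions step {number}: {reason}")
    return warnings
```

A caller would skip the rule and display the collected messages whenever `transition_warnings` returns a non-empty list.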

Special Keywords & Wildcards

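This section documents the special keywords and patterns used in transition specifications throughout the configuration. The examples in this section show only the [from_pattern, to_pattern] pair; in the transitions column each pair forms the last two elements of a 4-element step [action, field_selector, from_pattern, to_pattern].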

Keywords in Transition Patterns

The regression check configuration supports special keywords with * prefix for flexible transition matching:

Keyword 1: `*undefined`

Meaning: Matches any "undefined-like" value

Matches:

null (None in Python)
"" (empty string)
"undefined" (literal string)

Example:

{
  "transitions": [["*undefined", "*defined"]]
}
// Matches: undefined → Active, null → 42, "" → true, etc.

Use Case: Detect when a field gets populated for the first time

Keyword 2: `*defined`

Meaning: Matches any "defined" value (opposite of *undefined)

Matches: Anything EXCEPT:

null (None)
"" (empty string)
"undefined" (literal string)

Example:

{
  "transitions": [["*defined", "*undefined"]]
}
// Matches: Active → null, 42 → "", true → "undefined", etc.

Use Case: Detect when a field loses its value

Keyword 3: `*` (Wildcard)

Meaning: Matches absolutely any value

Matches: Any value including:

Defined values (strings, numbers, booleans)
Undefined-like values (null, "", "undefined")
Objects, arrays, etc.

Example:

{
  "transitions": [["*", "*"]]
}
// Matches: ANY old value → ANY new value
// Essentially: "any change at all"

Use Case: Monitor all changes to a field, filter out specific cases with exceptions

Combining Keywords with Literal Values

Patterns can mix keywords and literal values:

Pattern	Meaning
`["*undefined", "*defined"]`	Undefined → Defined (field becomes populated)
`["*defined", "*undefined"]`	Defined → Undefined (field gets cleared)
`["*defined", "*defined"]`	Value change while staying defined (actual value change required)
`["*", "*"]`	Any change at all
`["Active", "*defined"]`	From literal "Active" to any defined value
`["*undefined", "Active"]`	From undefined to literal "Active"

Literal Values (No `*` Prefix)

Any value that does NOT start with * is treated as a literal value and matched exactly:

{
  "transitions": [
    ["pending", "accepted"],    // Exact string match
    [false, true],              // Exact boolean match
    [0, 1],                     // Exact numeric match
    [null, "Active"],           // null matches null, "Active" matches "Active"
    ["undefined", "Done"]       // "undefined" (literal string) matches "undefined"
  ]
}

Important: Literal values are matched by exact equality, including:

"undefined" - matches the exact string "undefined" (not undefined state)
null - matches null values
"" - matches empty string

Summary Table: Special Keywords in Transitions

Keyword	Matches	Use Case
`*undefined`	null, "", "undefined" (any undefined-like value)	Detect when field becomes populated
`*defined`	Any defined value (NOT null, "", "undefined")	Detect when field loses value
`*`	Any value whatsoever	Alert on any change; use with exceptions for fine control
(no `*` prefix)	Exact literal values	Specific value matching (e.g., "pending" → "accepted")

Rule Types & Logic

Rule Type 1: Standard Rules (Normal Comparison)

Purpose: Detect field value changes matching configured patterns

Processing Steps:

Step 1: Collect Candidate Fields
├─ Apply the field_selection pipeline (include/exclude steps, in order)
└─ Result: List of (group_name, field_name) tuples

Step 2: For Each Candidate Field
├─ Get new_value and old_value
├─ Run the transitions pipeline in order (if transitions specified)
└─ Mark as "checked" if the last matching step is an include

Step 3: Apply bloc_scope
├─ With "any": Count inclusion if ANY field is checked
├─ With "all": Count inclusion if ALL changed fields are checked

Step 4: Report Matching Inclusions
└─ Count vs. thresholds (warning/critical)

Example Configuration:

{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined (Only)",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_selection": [["include", "Inclusion.*"]],
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ],
  "bloc_scope": "all"
}

Rule Type 2: New Inclusions

Purpose: Count patients that exist in current data but not in previous

Syntax:

{
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [
    ["include", "Patient_Identification.Patient_Id"],
    ["include", "Patient_Identification.Pseudo"]
  ],
  "transitions": [],
  "bloc_scope": null
}

Note: For special rules like "New Inclusions", transitions can be left as empty array [] since these rules don't use transition matching.

Processing:

1. Build dictionaries indexed by key field
   - Key field candidates: taken from field_selection and tried in order (here Patient_Id, then Pseudo)
   - key_dict_new = {patient_key: patient_data for patient in current}
   - key_dict_old = {patient_key: patient_data for patient in previous}

2. Find new inclusions
   new_keys = set(key_dict_new.keys()) - set(key_dict_old.keys())
   count = len(new_keys)

3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK

Example Output:

✓ [green]New Inclusions: 0[/green]
  (No new patients added)

⚠ [yellow]New Inclusions: 42[/yellow]
  (42 new patients - warning threshold exceeded)

✗ [red]New Inclusions: 75[/red]
  (75 new patients - exceeds critical threshold of 50)
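
The key-field detection and the set difference can be sketched together. Both function names are hypothetical; the key-field rule (first candidate with a non-null value in both the first new and the first old inclusion) follows the v3.0 technical notes, and each inclusion is assumed to be a dict of groups, each group a dict of fields.

```python
def pick_key_field(candidates, first_new, first_old):
    """Return the first (group, field) that is non-null in both sample inclusions."""
    for group, field in candidates:
        new_value = first_new.get(group, {}).get(field)
        old_value = first_old.get(group, {}).get(field)
        if new_value is not None and old_value is not None:
            return group, field
    return None


def keys_only_in(first, second, key_field):
    """Return the sorted key values present in `first` but absent from `second`."""
    group, field = key_field

    def keys(inclusions):
        return {inc[group][field] for inc in inclusions
                if inc.get(group, {}).get(field) is not None}

    return sorted(keys(first) - keys(second))
```

`keys_only_in(current, previous, key)` yields the new inclusions; swapping the first two arguments yields the opposite difference.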

Rule Type 3: Deleted Inclusions

Purpose: Count patients that exist in previous but not in current

Syntax:

{
  "bloc_title": "Identification",
  "line_label": "Deleted Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 0,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}

Processing:

1. Build dictionaries (same as New Inclusions)

2. Find deleted inclusions
   deleted_keys = set(key_dict_old.keys()) - set(key_dict_new.keys())
   count = len(deleted_keys)

3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK

Note: Typically critical_threshold=0 because any deletion is concerning.

Rule Type 4: New Fields

Purpose: Detect field names that appear in current but not in previous

Syntax:

{
  "bloc_title": "Structure",
  "line_label": "New Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}

Processing:

1. For each patient in common (present in both versions):
   a) Get all groups and fields from current version
   b) Get all groups and fields from previous version
   c) Find new fields: current_fields - previous_fields
   d) Qualified name: "group_name.field_name"

2. Count by field name
   field_counts = {field_qualified_name: count_of_inclusions}
   total_new_fields = len(field_counts)

3. Display results
   For each new field:
     "Inclusion.New_Field (42 inclusions)"
     [count = number of inclusions that gained this field]

Example Output:

✓ [green]New Fields: 0[/green]

✗ [red]New Fields: 2[/red]
    Endotest.New_Request_Type (1 inclusion)
    Inclusion.New_Status_Code (2 inclusions)
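
The per-field counting can be sketched with a `Counter`. The function names are hypothetical, and the inputs are assumed to be dictionaries indexed by the key field, each inclusion being a dict of groups that map field names to values.

```python
from collections import Counter


def qualified_fields(inclusion):
    """Return the set of "group.field" names present in one inclusion."""
    return {f"{group}.{field}"
            for group, fields in inclusion.items() if isinstance(fields, dict)
            for field in fields}


def count_new_fields(current_by_key, previous_by_key):
    """Map each new qualified field name to the number of inclusions that gained it."""
    counts = Counter()
    # Only patients present in both versions are compared
    for key in current_by_key.keys() & previous_by_key.keys():
        gained = (qualified_fields(current_by_key[key])
                  - qualified_fields(previous_by_key[key]))
        counts.update(gained)
    return dict(sorted(counts.items()))  # sorted for reproducible output
```

The rule's count is the number of distinct new fields, `len(count_new_fields(...))`. Calling the function with the two arguments swapped gives the fields that disappeared.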

Rule Type 5: Deleted Fields

Purpose: Detect field names that exist in previous but not in current

Syntax:

{
  "bloc_title": "Structure",
  "line_label": "Deleted Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}

Processing: Same as "New Fields" but reversed:

deleted_fields = previous_fields - current_fields

Field Selection Pipeline (v3.0)

NEW APPROACH: Field selection now uses the same pipeline architecture as transitions.

Pipeline Ordering (Key Concept)

Start with an empty set of fields. Each step either includes or excludes fields:

candidate_fields = set()  # Empty initially

# Step 1: Include all Endotest fields
for each field in all_fields:
    if selector matches "Endotest.*":
        candidate_fields.add(field)

# Step 2: Also include Inclusion.Status
for each field in all_fields:
    if selector matches "Inclusion.Status":
        candidate_fields.add(field)

# Step 3: But exclude Endotest.Last_Updated
for each field in all_fields:
    if selector matches "Endotest.Last_Updated":
        candidate_fields.discard(field)

# Result: Endotest.* + Inclusion.Status, except Endotest.Last_Updated
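
The same walk-through can be written as one reusable function. This is a sketch rather than the actual `_apply_field_selection_pipeline()` helper: the name `apply_field_selection` is hypothetical, and fields are assumed to be available as `(group, field)` tuples.

```python
def apply_field_selection(all_fields, pipeline):
    """Apply [[action, selector], ...] to (group, field) tuples, in order."""
    all_fields = list(all_fields)  # iterated once per step
    selected = set()               # the pipeline starts with an empty set
    for action, selector in pipeline:
        sel_group, sel_field = selector.split(".", 1)
        hits = {(group, field) for group, field in all_fields
                if sel_group in ("*", group) and sel_field in ("*", field)}
        if action == "include":
            selected |= hits
        elif action == "exclude":
            selected -= hits
        else:
            raise ValueError(f"unknown action: {action!r}")
    return sorted(selected)  # sorted for reproducible results
```

Reordering the steps changes the result: an `exclude` placed before the `include` that adds the same field has no effect, because the field is not in the set yet.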

Simple Examples

Example 1: Single Group

[["include", "Endotest.*"]]
// Result: All Endotest fields

Example 2: Multiple Groups

[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Result: All Endotest + all Inclusion fields

Example 3: Specific Fields

[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Result: Only Patient_Id and Pseudo fields

Example 4: All Except Some

[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// Result: All fields EXCEPT Endotest.Last_Updated

Example 5: Complex Selection

[
  ["include", "*.*"],
  ["exclude", "Patient_Identification.*"],
  ["exclude", "Inclusion.*"]
]
// Result: All fields EXCEPT Patient_Identification and Inclusion

Important Notes

✅ Order matters: Steps are applied sequentially
✅ Explicit: Admin responsible for correct pipeline
✅ No implicit AND/OR: Use multiple include steps for OR logic
✅ Deterministic: Resulting field sets are sorted, so results are reproducible

Transition Patterns

Pattern Matching Rules

Literal Value Matching

[
  ["active", "inactive"],
  [true, false],
  [0, 1]
]
// Match exact value changes
// Type must match (string vs. number vs. boolean)

Undefined Keyword

*undefined: Matches any undefined-like value
  - null
  - "" (empty string)
  - "undefined"

*defined: Matches any defined value
  - NOT null
  - NOT ""
  - NOT "undefined"

Examples:

[
  ["*undefined", "*defined"]
]
// Transition FROM any undefined TO any defined

[
  ["*defined", "*undefined"]
]
// Transition FROM any defined TO any undefined

[
  ["*defined", "*defined"]
]
// Transition FROM defined TO different defined
// (with actual value change check)

Wildcard Pattern

[
  ["*", "*"]
]
// Match ANY transition
// Useful for: "Alert on any change to this field"
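
A minimal Python sketch of the matching rules described in this section (literal values with strict type matching, *undefined, *defined, and the * wildcard). The function names are illustrative assumptions, and the "value actually changed" test is one possible reading of the change check mentioned above.

```python
def is_undefined(value):
    """True for the undefined-like values: null, empty string, "undefined"."""
    return value is None or (isinstance(value, str) and value in ("", "undefined"))


def matches_pattern(pattern, value):
    """Match one side of a transition against a keyword or a literal."""
    if isinstance(pattern, str):
        if pattern == "*":
            return True
        if pattern == "*undefined":
            return is_undefined(value)
        if pattern == "*defined":
            return not is_undefined(value)
    # Literal: the type must match too. In Python, 0 == False and 1 == True,
    # so comparing with == alone would confuse numbers and booleans.
    return type(pattern) is type(value) and pattern == value


def matches_transition(pair, old, new):
    """Match an [old_pattern, new_pattern] pair against an actual change."""
    old_pattern, new_pattern = pair
    changed = type(old) is not type(new) or old != new
    return (
        changed
        and matches_pattern(old_pattern, old)
        and matches_pattern(new_pattern, new)
    )


print(matches_transition(["*undefined", "*defined"], None, "active"))
print(matches_transition([0, 1], False, True))
```

The second call shows why the type check matters: the literal pair [0, 1] does not match a boolean change from false to true.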

Transition Combination Examples

Example 1: Detect New Values Only

{
  "transitions": [["*undefined", "*defined"]]
}
// Alert when field goes from undefined to any value
// Ignore when field already had value

Example 2: Detect Value Reversal

{
  "transitions": [
    [true, false],
    [false, true]
  ]
}
// Alert when boolean field toggles in either direction

Example 3: Detect Specific Status Change

{
  "transitions": [
    ["pending", "approved"],
    ["pending", "rejected"]
  ]
}
// Alert when pending status changes to approved or rejected
// Ignore all other transitions

Example 4: Detect Anything But This

{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Endotest.Last_Updated", "*", "*"]
  ]
}
// Alert on any field change
// EXCEPT changes to Endotest.Last_Updated

Exception Handling (Pipeline Architecture)

With the new unified pipeline format, exceptions are now just regular pipeline steps with different actions. This section explains the patterns.

Pattern 1: Simple Whitelist (Include Only)

Allow specific field/transition combinations:

{
  "transitions": [
    ["include", "Request_Sent", false, true],
    ["include", "Diagnostic_Status", "warning", "complete"]
  ]
}

Logic:

Step 1: Include Request_Sent with false→true transition
Step 2: Include Diagnostic_Status with warning→complete
Result: ONLY these specific field+transition combinations are checked

Pattern 2: Simple Blacklist (Exclude Only)

Block specific field/transition combinations:

{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Endotest.Import_Time", "*", "*"]
  ]
}

Logic:

Step 1: Include all fields with any change (*→*)
Step 2: Exclude Last_Updated from being checked
Step 3: Exclude Endotest.Import_Time from being checked
Result: All fields EXCEPT Last_Updated and Import_Time

Pattern 3: Main Rule + Multiple Exceptions

Combine main transition rule with field-specific exceptions:

{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", false, true],
    ["exclude", "Endotest.Last_Modified", "*", "*"]
  ]
}

Logic:

Step 1: Include fields that change between two defined values
Step 2: ALSO include Request_Sent changing from false to true (even if not *defined→*defined)
Step 3: But exclude any change to Last_Modified (overrides Step 1)
Result: *defined→*defined changes PLUS Request_Sent false→true, EXCEPT Last_Modified
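
The include/exclude ordering of Pattern 3 can be illustrated with a short Python sketch. It assumes that the last matching step decides whether a field change is checked, which is one way to express "each step adds or removes combinations". It also assumes that a selector without a dot matches the field name in any group. All function names are hypothetical.

```python
def is_undefined(value):
    return value is None or (isinstance(value, str) and value in ("", "undefined"))


def value_matches(pattern, value):
    if isinstance(pattern, str):
        if pattern == "*":
            return True
        if pattern == "*undefined":
            return is_undefined(value)
        if pattern == "*defined":
            return not is_undefined(value)
    return type(pattern) is type(value) and pattern == value


def selector_matches(selector, qualified_name):
    group, _, field = qualified_name.partition(".")
    if "." not in selector:
        # Assumption: a bare field name matches that field in any group.
        return selector == field
    sel_group, _, sel_field = selector.partition(".")
    return sel_group in ("*", group) and sel_field in ("*", field)


def transition_is_checked(pipeline, qualified_name, old, new):
    """Walk the 4-element steps in order; the last matching step wins."""
    checked = False
    for action, selector, old_pattern, new_pattern in pipeline:
        if (
            selector_matches(selector, qualified_name)
            and value_matches(old_pattern, old)
            and value_matches(new_pattern, new)
        ):
            checked = action == "include"
    return checked


pattern_3 = [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", False, True],
    ["exclude", "Endotest.Last_Modified", "*", "*"],
]

print(transition_is_checked(pattern_3, "Inclusion.Status", "active", "inactive"))
print(transition_is_checked(pattern_3, "Endotest.Last_Modified", "2025-01-01", "2025-01-02"))
print(transition_is_checked(pattern_3, "Inclusion.Status", None, "active"))
```

The first change is checked by Step 1, the second is removed again by Step 3, and the third never matches any include step.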

Field Selector Formats in Pipeline

Simple field name (matches in any group):

{
  "field_selector": "Status"
}
// Matches "Status" in any group
// But this is NOT pipeline syntax - use "*.*" with field matching instead

Better: Use qualified notation in field_selector:

["include", "Endotest.Request_Sent", false, true]
// Matches ONLY the Request_Sent field of the Endotest group

Full Specification:

{
  "field": "Endotest.Request_Sent",
  "transition": [false, true]
}
// Matches this specific field AND transition combination

Practical Examples with Pipeline

Example 1: Alert on Most Changes, Except System Fields

{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Last_Modified_By", "*", "*"],
    ["exclude", "Import_Timestamp", "*", "*"]
  ]
}
// Step 1: Include ANY field change
// Step 2-4: Exclude system timestamp/audit fields

Example 2: Alert on Undefined→Defined, Plus Status Reversals

{
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Request_Status", "rejected", "submitted"]
  ]
}
// Step 1: Include when field goes from undefined to defined
// Step 2: ALSO include Request_Status: rejected → submitted (even if not undefined→defined)

Example 3: Complex Medical Rules with Multiple Conditions

{
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Endotest.Test_Result", "pending", "completed"],
    ["include", "GDD.Status", "pending", "failed"],
    ["exclude", "Endotest.Last_Sync", "*", "*"]
  ]
}
// Step 1: Include main rule: undefined→defined
// Step 2: ALSO include Test_Result pending→completed
// Step 3: ALSO include GDD.Status pending→failed
// Step 4: But exclude any change to Last_Sync field
// Result: All matching transitions except Last_Sync changes

Example 4: Fine-Grained Control with Include + Exclude

{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["include", "Status", "*undefined", "*defined"],
    ["include", "Status", "*defined", "*undefined"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Internal_Id", "*", "*"]
  ]
}
// Step 1: Include any change (baseline)
// Step 2-3: Specifically include Status becoming defined/undefined
// Step 4-5: Exclude Last_Updated and Internal_Id changes (override Step 1)
// Result: All changes EXCEPT Last_Updated/Internal_Id, plus Status transitions

Configuration Examples

Example 1: Monitor New Inclusions (v3.0)

Requirement: Alert if unexpected number of patients added

{
  "ignore": null,
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
  "bloc_scope": null,
  "transitions": []
}

Field Selection Logic:

Starts empty: candidate_fields = {}
Step 1: Include Patient_Identification.Patient_Id
Step 2: Include Patient_Identification.Pseudo
Result: [Patient_Identification.Patient_Id, Patient_Identification.Pseudo]
These become key candidates (tried in order)

Logic:

Count patients in current but not in previous
If count > 50: CRITICAL (too many new patients)
If count > 0: WARNING (any new patients)
If count == 0: OK
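
The threshold logic above can be written as a small Python sketch. The strict "greater than" comparisons follow the logic listed in this example; the function names severity and count_new_inclusions are illustrative assumptions.

```python
def severity(count, warning_threshold, critical_threshold):
    """Map a violation count to a severity level using strict comparisons."""
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"


def count_new_inclusions(previous_keys, current_keys):
    """Count keys present in the current snapshot but not in the previous one."""
    return len(set(current_keys) - set(previous_keys))


previous_keys = ["P-001", "P-002", "P-003"]
current_keys = ["P-001", "P-002", "P-003", "P-004", "P-005"]

count = count_new_inclusions(previous_keys, current_keys)
print(count, severity(count, warning_threshold=0, critical_threshold=50))
```

With warning_threshold = 0 and critical_threshold = 50, two new patients produce a WARNING, and the rule only becomes CRITICAL from 51 new patients onward.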

Example 2: Detect Undefined→Defined Changes (v3.0)

Requirement: Alert if any field becomes defined

{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "Inclusion.*"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ]
}

Field Selection & Transitions:

Field Selection: Include all Inclusion fields
Transitions Pipeline:
  Step 1: Include *.*  *undefined→*defined
  Result: Only undefined→defined changes

Logic:

For each inclusion:
  Check every selected Inclusion.* field that changed
  If ANY change is: undefined → defined:
    COUNT this inclusion
If count > 100: CRITICAL
If count > 0: WARNING

Example 3: Strict All-Fields Completeness (v3.0)

Requirement: Ensure ALL changed fields follow undefined→defined pattern

{
  "bloc_title": "Inclusion Protocol",
  "line_label": "All Changes Undefined to Defined",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_selection": [["include", "Inclusion.*"]],
  "bloc_scope": "all",
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ]
}

Key Difference with bloc_scope="all":

With bloc_scope="any": Count if ANY field matches
With bloc_scope="all": Count ONLY if ALL changed fields match

Logic:

For each inclusion:
  Find all Inclusion fields that changed
  Check if ALL changes are: undefined → defined
If all changed fields match pattern:
  COUNT this inclusion (expected pattern)
If any changed field doesn't match:
  SKIP (unexpected pattern)

If count > 200: CRITICAL (too many gaining data)
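
The difference between bloc_scope "any" and "all" can be sketched as below. The input is assumed to be one boolean per changed field of a single inclusion (True when the change matches the transitions pipeline). Treating an inclusion with no changed field as not counted is an assumption of this sketch, as is the function name.

```python
def inclusion_is_counted(changed_field_matches, bloc_scope="any"):
    """Decide whether one inclusion is counted for a rule.

    changed_field_matches: list of booleans, one per changed field,
    True when that change matches the rule's transitions.
    """
    if not changed_field_matches:
        # Assumption: nothing changed, so there is nothing to count.
        # (all([]) would otherwise return True in Python.)
        return False
    if bloc_scope == "all":
        return all(changed_field_matches)
    return any(changed_field_matches)


# Two fields went undefined -> defined, one field changed in another way.
matches = [True, True, False]
print(inclusion_is_counted(matches, "any"))
print(inclusion_is_counted(matches, "all"))
```

With the same changes, the inclusion is counted under "any" but skipped under "all", because one changed field does not follow the expected pattern.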

Example 4: Request Lifecycle Validation (v3.0)

Requirement: Detect expected test request state transitions

{
  "bloc_title": "Endotest",
  "line_label": "Request Status Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "Endotest.Request_Sent"], ["include", "Endotest.Request_Status"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "Endotest.Request_Sent", false, true],
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"]
  ]
}

Field Selection Pipeline:

Empty set start
Step 1: Include Endotest.Request_Sent
Step 2: Include Endotest.Request_Status
Result: {Endotest.Request_Sent, Endotest.Request_Status}

Logic:

For each inclusion:
  Check Endotest fields (Request_Sent, Request_Status)
  If ANY field matches transitions:
    COUNT this inclusion
If count > 100: CRITICAL (too many status changes)

Example 5: Valid Workflow Transitions

Requirement: Count workflow changes only when they follow a valid state transition (pending → accepted or rejected, rejected → resubmitted, accepted → cancelled)

{
  "bloc_title": "Endotest",
  "line_label": "Valid Request Transitions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Endotest.Request_Status"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"],
    ["include", "Endotest.Request_Status", "rejected", "resubmitted"],
    ["include", "Endotest.Request_Status", "accepted", "cancelled"]
  ]
}

Logic:

For each inclusion:
  Check if Request_Status field changed
  If transition matches ONE of the 4 allowed transitions:
    COUNT this inclusion (valid workflow)
  If transition is different:
    SKIP (unexpected change - needs investigation)

If count > 50: CRITICAL (too many valid status transitions)

Note: With several include steps in the transitions pipeline, a field change is counted if it matches ANY of the specified transitions.

Example 6: Exclude Internal Fields

Requirement: Monitor data changes but ignore internal/system fields

{
  "bloc_title": "Identification",
  "line_label": "Data Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "*.*"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Import_Time", "*", "*"],
    ["exclude", "Internal_Id", "*", "*"]
  ]
}

Logic:

For each inclusion:
  Check ALL fields EXCEPT [Last_Updated, Import_Time, Internal_Id]
  If ANY field changed:
    COUNT this inclusion
If count > 100: CRITICAL (too many changes)

User Guide: Adding/Modifying Rules

Step 1: Identify Rule Need

Determine the data validation requirement:

Detection Type          Use Pattern
─────────────────────────────────────────────────
New patients added      "New Inclusions" rule
Patients removed        "Deleted Inclusions" rule
Field values changed    Standard rule + transitions
Field added/removed     "New/Deleted Fields" rule
Specific transitions    Standard rule + narrow transitions
Exclude system changes  Standard rule + exceptions

Step 2: Choose Rule Type

Rule Type                    When to Use                     Complexity
───────────────────────────────────────────────────────────────────────
New Inclusions               Track patient additions         Simple
Deleted Inclusions           Track patient removals          Simple
New Fields                   Monitor schema changes          Simple
Deleted Fields               Detect removed data             Simple
Standard (Transitions)       Monitor specific changes        Medium
Standard (with Exceptions)   Monitor changes + allowances    Complex

Step 3: Define Thresholds

Decision Matrix:

Threshold Pattern    Meaning              Example Use
─────────────────────────────────────────────────────
(0, 0)              No changes allowed   Critical data
(0, 1)              Anything is critical Surgery dates
(0, 50)             Strict monitoring    High-value fields
(50, 100)           Normal operation     Flexible fields
(200, 200)          Skip to critical     Lenient tracking

Recommendation:

Strict validation (medical):
  warning = 0, critical = 1

Normal validation (most fields):
  warning = 5, critical = 20

Lenient validation (administrative):
  warning = 50, critical = 100

Step 4: Create Rule Row in Excel

Open Endobest_Dashboard_Config.xlsx → Regression_Check sheet

Row N:
A: ignore            (leave empty)
B: bloc_title        (e.g., "Inclusion Protocol")
C: line_label        (e.g., "Status Changed")
D: warning_threshold (e.g., 0)
E: critical_threshold (e.g., 20)
F: field_selection   (e.g., [["include", "Inclusion.Status"], ["include", "Inclusion.Date"]])
G: bloc_scope        (e.g., "any")
H: transitions       (e.g., [["include", "*.*", "*", "*"]])

Step 5: Define Field Scope

Decide which fields the rule applies to:

Scope                    field_selection (column F)
──────────────────────────────────────────────────────────────────────────────────
All fields               [["include", "*.*"]]
All in group X           [["include", "group_name.*"]]
Multiple groups          [["include", "group1.*"], ["include", "group2.*"]]
All except group X       [["include", "*.*"], ["exclude", "group1.*"]]
Specific field           [["include", "Group.field1"]]
Multiple fields          [["include", "Group.field1"], ["include", "Group.field2"]]

Step 6: Define Transitions

Specify what changes to monitor:

Pattern                  transitions (column H)                                    Meaning
──────────────────────────────────────────────────────────────────────────────────────────────────────
Any change               [["include", "*.*", "*", "*"]]                            Monitor all changes
Become defined           [["include", "*.*", "*undefined", "*defined"]]            Field gets value
Become undefined         [["include", "*.*", "*defined", "*undefined"]]            Field loses value
Toggle boolean           [["include", "*.*", true, false],
                          ["include", "*.*", false, true]]                         Boolean flip
Specific change          [["include", "*.*", "old", "new"]]                        Exact transition
Multiple changes         [["include", "*.*", "old1", "new1"],
                          ["include", "*.*", "old2", "new2"]]                      Multiple patterns

Step 7: Set Exceptions (Optional)

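Exceptions are additional steps in the transitions pipeline (column H), not a separate column.

To allow a specific field/transition combination, append an include step:
H: transitions = [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Request_Sent", false, true]
  ]

Or exclude specific cases by appending an exclude step:
H: transitions = [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"]
  ]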

Step 8: Choose Bloc Scope

Decide aggregation logic:

Requirement              bloc_scope
─────────────────────────────────────────────
Any field changes        "any" (default)
All changes match        "all"

Step 9: Validate & Test

# Check-only mode (validates configuration)
python eb_dashboard.py --check-only

# Expected output:
# ✓ Loaded 42 regression check rules
# ✓ All checks passed

Step 10: Full Collection Test

# Run full collection to test rule
python eb_dashboard.py

# After collection, verify:
# 1. Rule appears in output
# 2. Severity level is correct (OK/Warning/Critical)
# 3. Count matches expectations

Execution Modes

Mode 1: Normal Collection with Quality Checks

python eb_dashboard.py

Workflow:

1. Collect data (organizations, inclusions)
2. Run Coherence Check
3. Run Non-Regression Check (if old file exists)
4. If critical issues: Ask user for confirmation
5. If OK or user confirms: Export files
6. Display elapsed time

Output:

Collecting data from 15 organizations...
[████████████████████] 1200/1200

═══ Coherence Check ═══
✓ [green]TOTAL matches[/green]

═══ Non Regression Check ═══
✓ [green]Structure: New Fields: 0[/green]
✓ [green]Identification: New Inclusions: 0[/green]
...

✓ All checks passed successfully!

Writing files...
Elapsed time: 3:42

Mode 2: Check-Only (Validation Only)

python eb_dashboard.py --check-only

Workflow:

1. Load existing JSON files (no API calls)
2. Load regression configuration
3. Run Coherence Check
4. Run Non-Regression Check
5. Report results
6. Exit

Use Case: Validate data before distribution without fresh collection

Output:

═══ CHECK ONLY MODE ═══
Running quality checks on existing data files...

[Loading configuration...]
[Running checks...]

✓ All checks passed successfully!

Mode 3: Compare Two Files

python eb_dashboard.py --check-only file1.json file2.json

Workflow:

1. Load file1 and file2 (as current and old)
2. Skip coherence check (organizations not provided)
3. Run regression check comparing them
4. Report differences
5. Exit

Use Case: Compare two snapshots, detect changes between versions

Output:

═══ CHECK ONLY COMPARE MODE ═══
Comparing two specific files:
  Current: file1.json
  Old: file2.json

[Running regression checks...]

⚠ [yellow]New Inclusions: 15[/yellow]
✗ [red]Deleted Inclusions: 5[/red]
...

Mode 4: Debug Mode (Verbose Output)

python eb_dashboard.py --debug

Workflow:

1. Execute as Normal Mode
2. Enable DEBUG_MODE in quality checks
3. Display detailed field-by-field changes
4. Show individual inclusion comparisons
5. Verbose logging

Use Case: Troubleshoot regression rules, understand data changes

Output:

Running collection...
[████████] 1200/1200

═══ Non Regression Check (DEBUG MODE) ═══

Endotest - Undefined to Defined (Only): 12
  ✓ Patient-001:
    - Endotest.Request_Sent: false → true
    - Endotest.Request_Status: undefined → 'completed'

  ✓ Patient-002:
    - Endotest.Request_Sent: false → true

...
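
As an illustration of how the four modes map to command-line arguments, here is a hedged argparse sketch. It is not the actual argument handling of eb_dashboard.py: only the --check-only and --debug flags come from this guide, while the function names, the mode labels, and the error message are assumptions for the example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="eb_dashboard.py")
    # nargs="*" gives None when the flag is absent, [] when it is given
    # alone, and a list of file names when files follow it.
    parser.add_argument("--check-only", nargs="*", default=None, metavar="FILE")
    parser.add_argument("--debug", action="store_true")
    return parser


def select_mode(argv):
    """Return a label for the execution mode implied by the arguments."""
    args = build_parser().parse_args(argv)
    if args.check_only is None:
        return "debug" if args.debug else "normal"
    if len(args.check_only) == 0:
        return "check-only"
    if len(args.check_only) == 2:
        return "compare"
    raise SystemExit("--check-only takes either no file or exactly two files")


print(select_mode([]))
print(select_mode(["--check-only"]))
print(select_mode(["--check-only", "file1.json", "file2.json"]))
print(select_mode(["--debug"]))
```

Rejecting a single file explicitly avoids silently falling back to one of the other modes when the second file name is forgotten.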

Troubleshooting

Issue 1: "Invalid JSON format" Error

Symptom: Configuration validation fails

Cause: Malformed JSON in the field_selection or transitions cell

Solution:

Open cell in JSON validator
Fix syntax errors
Re-run check

Example - WRONG:

{
  "transitions": [["active", "inactive" ]  // Missing closing bracket
}

{
  "field_selection": [["include" "Endotest.*"]]  // Missing comma between array elements
}

Example - CORRECT:

{
  "transitions": [["active", "inactive"]]
}

{
  "field_selection": [["include", "Endotest.*"]]
}
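
A cell value can also be checked locally before running the full validation. The sketch below uses Python's standard json module to report where parsing fails; the function name check_cell and the message format are assumptions made for this example.

```python
import json


def check_cell(column, text):
    """Parse one JSON cell; return (value, None) or (None, error message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # msg, lineno and colno are attributes of json.JSONDecodeError.
        return None, f"{column}: {err.msg} (line {err.lineno}, column {err.colno})"


cells = {
    "transitions": '[["active", "inactive" ]',
    "field_selection": '[["include" "Endotest.*"]]',
    "transitions (fixed)": '[["active", "inactive"]]',
}

for column, text in cells.items():
    value, error = check_cell(column, text)
    print(error if error else f"{column}: OK -> {value}")
```

The reported line and column point at the position where the parser gave up, which is usually at or just after the missing bracket or comma.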

Issue 2: Rule Never Triggers

Symptom: Count always shows 0 even when data changes

Causes:

Field selection too restrictive
Transition pattern doesn't match actual changes
An exclude step in field_selection removes the target fields

Solution:

Loosen field selection: Start from [["include", "*.*"]]
Use wildcards in transitions: ["include", "*.*", "*", "*"]
Check actual field names in JSON output
Enable debug mode to see field matching

Issue 3: Too Many False Positives

Symptom: Rule triggers unexpectedly, too many violations

Causes:

Thresholds set too low
Transitions too broad (matching unintended changes)
field_selection too permissive

Solution:

Increase thresholds: Raise warning_threshold and critical_threshold
Narrow transitions: Use specific values instead of wildcards
Add exceptions: Add exclude steps to the transitions pipeline for specific cases
Narrow field scope: Select specific group.field entries instead of *.*

Issue 4: Configuration Changes Not Taking Effect

Symptom: Modifications to Excel file don't affect results

Causes:

File not saved
Regression_Check sheet not loaded
Old configuration still in memory

Solution:

Save Excel file (Ctrl+S)
Restart Python script
Verify sheet name is exactly "Regression_Check"
Check file path is correct

Issue 5: User Confirmation Not Appearing

Symptom: Expected prompt for critical issues doesn't show

Causes:

Issues are at warning level, not critical
Thresholds higher than actual counts
Running in check-only mode (no export decision needed)

Solution:

Verify thresholds: warning < critical
Check actual violation counts
Run normal mode (not check-only)

Issue 6: Comparison Mode Showing Unexpected Differences

Symptom: --check-only file1 file2 reports many changes

Causes:

Files are from different collection dates (expected)
Configuration changed between collections (expected)
Field order or grouping changed (might be false positive)

Solution:

Review reported changes manually
Check if changes are expected (new patient data added)
Verify no data corruption occurred
Compare file sizes and counts manually

Performance Considerations

Regression Check Execution Time

Factors Affecting Performance:

1. Number of Inclusions (patients)
   - N patients = O(N) iterations
   - Typical: 1200 patients = 1-2 seconds

2. Number of Rules
   - R rules applied to each inclusion
   - Typical: 20-30 rules = <100ms total

3. Field Matching Complexity
   - Filter evaluation per field
   - Dotted-notation parsing: O(1) per field
   - Typical: <50ms for all rules

4. Total Typical Time
   - 1200 inclusions × 25 rules = 1-3 seconds

Optimization Tips

If Regression Check is Slow:

Reduce rule count:
- Remove inactive rules (add "ignore" label)
- Combine similar rules
Simplify field filters:
- Use "*.*" or "group.*" selectors instead of long lists of individual fields
- Use include (smaller) instead of exclude (larger)
Narrow transitions:
- Use specific values instead of wildcards
- Reduce number of transition pairs
Consider file size:
- Large JSON files (>20MB) take longer to parse
- This is rare and usually not the bottleneck

Summary

The Quality Checks System provides:

✅ Multi-Level Validation: Coherence + Regression checks
✅ Config-Driven Rules: No code changes needed
✅ Flexible Thresholds: Warning and Critical levels
✅ Rich Filtering: Group, field, and dotted-notation support
✅ Transition Patterns: Wildcard, keyword, and specific matching
✅ Advanced Exception Handling:
    Multiple transitions per exception: [[old1, new1], [old2, new2], ...]
    Include + Exclude can coexist simultaneously
    Fine-grained control over allowed/blocked transitions
✅ Backward Compatible: Legacy single-transition format still supported
✅ Debug Support: Detailed logging and debug mode
✅ Execution Modes: Normal, check-only, compare, debug

This architecture enables robust data quality monitoring without requiring code modifications, empowering business analysts to define and evolve validation rules independently.

Document End
