# Endobest Quality Checks & Regression Testing Guide

## Part 3: Quality Assurance, Validation Rules & Configuration

**Document Version:** 3.1 (Updated with new Excel export module reference)
**Last Updated:** 2025-11-08
**Audience:** Developers, Business Analysts, QA Engineers
**Language:** English

**Note:** Excel export functionality now available - see DOCUMENTATION_13_EXCEL_EXPORT.md, DOCUMENTATION_98_USER_GUIDE.md, and DOCUMENTATION_99_CONFIG_GUIDE.md

---

## Version History

### Version 3.0 (2025-10-22) - UNIFIED FIELD SELECTION PIPELINE

**Complete Refactorization of Field Selection**

- ✅ **Merged Columns:** `field_group` (F) + `field_name` (G) → single `field_selection` (F)
- ✅ **Simplified Syntax:** Field selection uses same pipeline format as transitions: `[["action", "field_selector"], ...]`
- ✅ **3 Selector Patterns:** `*.*` (all fields), `group.*` (group), `group.field` (specific)
- ✅ **Cleaner Code:** Removed 150+ lines of dual-filter logic (field_group + field_name combinations)
- ✅ **Config-Driven Keys:** Key field determination (Patient_Id, Pseudo) now read from `field_selection` instead of hardcoded
- ✅ **Unified Key Detection:** New `_get_key_field_from_new_inclusions_rule()` applies field_selection pipeline directly to first inclusion (15 LOC, -75% vs manual parsing)
- ✅ **Helper Functions:** `_apply_field_selection_pipeline()`, `_get_key_field_from_new_inclusions_rule()`, `_build_candidate_fields()`
- ⚠️ **MAJOR Breaking Change:** Old `field_group` and `field_name` columns (F, G) are **removed**
- ⚠️ **Column Shifts:** `bloc_scope` moves H→G, `transitions` moves I→H
- ⚠️ **Configuration Migration Required:** Completely restructure Excel `Regression_Check` sheet

**Technical Details:**

- Field selection pipeline starts with empty set, each step adds/removes fields
- Responsibility on admin to order rules correctly (no implicit logic)
- Special rules `"New Fields", "Deleted Fields", "Deleted Inclusions"` must have empty field_selection
- Special rule `"New Inclusions"` applies field_selection pipeline to first inclusion sample (assumes stable structure)
- Key field detection: finds first field from pipeline that has non-null value in both first new and old inclusion
- Configuration validation: missing/invalid field_selection = CRITICAL error

**Removed Dead Code:**

- `_determine_key_field()` - hardcoded Patient_Id/Pseudo logic
- `_matches_field_group_filter()` - replaced by pipeline
- `_matches_field_name_filter()` - replaced by pipeline
- `_determine_key_field_from_config()` - replaced by simplified unified `_get_key_field_from_new_inclusions_rule()`

### Version 2.0 (2025-10-22) - Pipeline Architecture

**Transitions Pipeline Introduced**

- ✅ **Unified Format:** Merged `transitions` + `transition_exceptions` into single `transitions` column
- ✅ **Simplified Syntax:** Each step is a 4-element array `[action, field_selector, from, to]`
- ✅ **Sequential Processing:** Pipeline steps applied in order, allowing fine-grained control
- ✅ **Better Determinism:** All sets sorted for reproducible logs
- ✅ **Improved Error Handling:** Invalid configs silently skipped with warnings
- ⚠️ **Breaking Change:** Old `transition_exceptions` column (J) merged into `transitions` (I)

### Version 1.0 (2025-10-21) - Initial Release

- Dual-column system: `transitions` (I) + `transition_exceptions` (J)
- Include/exclude exception handling
- Multiple transition support per exception

---

## Table of Contents

1. [Overview](#overview)
2. [Quality Assurance Strategy](#quality-assurance-strategy)
3. [Coherence Check (Technical Details)](#coherence-check-technical-details)
4. [Non-Regression Check Framework](#non-regression-check-framework)
5. [Regression Check Configuration File](#regression-check-configuration-file)
6. [Column Reference](#column-reference)
7. [Special Keywords & Wildcards](#special-keywords--wildcards)
8. [Rule Types & Logic](#rule-types--logic)
9. [Field Selection Pipeline](#field-selection-pipeline-v30)
10. 
[Transition Patterns](#transition-patterns)
11. [Exception Handling](#exception-handling)
12. [Configuration Examples](#configuration-examples)
13. [User Guide: Adding/Modifying Rules](#user-guide-adding-modifying-rules)
14. [Execution Modes](#execution-modes)
15. [Troubleshooting](#troubleshooting)

---

## ⚠️ CRITICAL - Version 3.0 Migration Required

**This document describes v3.0 with BREAKING CHANGES from v2.0**

| Item | v2.0 | v3.0 |
|------|------|------|
| **Excel Columns F-I** | `field_group`, `field_name`, `bloc_scope`, `transitions` | `field_selection`, `bloc_scope`, `transitions` |
| **Column Count** | 4 columns for filtering+transitions | 3 columns (merged field_selection) |
| **Key Field Config** | Hardcoded (Patient_Id/Pseudo) | Config-driven (from field_selection) |
| **Field Filtering Logic** | 6+ combinations (complex) | Single pipeline (simple) |

**ACTION REQUIRED:**

1. ✅ Update Excel file column positions
2. ✅ Migrate field_group + field_name → field_selection
3. ✅ Run non-regression tests
4. ✅ Verify key field detection works with new config

---

## Overview

The **Quality Checks System** provides comprehensive data validation in two stages:

1. **Coherence Check:** Verifies that organization statistics (API counters) match the actual detailed inclusion data
2. **Non-Regression Check:** Detects unexpected data changes between current and previous collection runs

Both checks are **configurable via Excel** with **Warning/Critical severity levels** that can trigger user confirmation prompts.
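As a rough illustration of how a count might be turned into one of the severity levels mentioned above, here is a minimal sketch. The function names `classify` and `needs_confirmation` are hypothetical and not taken from the codebase; the comparison rule (`count > threshold`) follows the threshold logic documented in the Column Reference section.

```python
def classify(count: int, warning_threshold: int, critical_threshold: int) -> str:
    """Return the severity level for a rule's count (hypothetical helper)."""
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"


def needs_confirmation(has_coherence_critical: bool, has_regression_critical: bool) -> bool:
    """The user is only prompted when at least one check reported a critical issue."""
    return has_coherence_critical or has_regression_critical


print(classify(42, warning_threshold=0, critical_threshold=50))  # WARNING
print(classify(15, warning_threshold=0, critical_threshold=0))   # CRITICAL
print(classify(0, warning_threshold=0, critical_threshold=0))    # OK
```

Note that a count equal to a threshold does not exceed it, so `(0, 0)` is the only pair for which a single change is already critical.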
### Design Philosophy

```
Trust, but Verify
- Trust: API data is generally reliable
- Verify: Statistical consistency and change detection
- Report: Multi-level severity (OK, Warning, Critical)
- Decide: User confirmation before export on critical issues
```

---

## Quality Assurance Strategy

### Workflow Integration

```
Data Collection
    ↓
QUALITY CHECKS
    ├─ COHERENCE CHECK (mandatory)
    │   ├─ Load organization statistics from API responses
    │   ├─ Calculate actual counts from detailed inclusions
    │   └─ Compare: Stats vs. Actual
    │
    ├─ NON-REGRESSION CHECK (if old file exists)
    │   ├─ Load previous inclusions (_old file)
    │   ├─ Apply config-driven comparison rules
    │   └─ Report: Changes matching configured patterns
    │
    └─ RESULT
        ├─ has_coherence_critical flag
        └─ has_regression_critical flag
    ↓
IF critical issues detected:
    ├─ Display warning: ⚠ CRITICAL
    ├─ Ask user: "Write results anyway?"
    ├─ If NO → Abort export, preserve old files
    └─ If YES → Continue with export (user override)
ELSE:
    └─ Continue with export automatically
```

### Severity Levels

| Level | Display | Meaning | Action |
|-------|---------|---------|--------|
| **OK** | ✓ Green | No issues, within normal range | Continue automatically |
| **WARNING** | ⚠ Yellow | Issue detected, exceeds warning threshold | Log and display, continue automatically |
| **CRITICAL** | ✗ Red | Severe issue, exceeds critical threshold | Display, ask user before export |

### User Interaction

```
Quality Checks Complete

✗ [red]Coherence Check: CRITICAL[/red]
  ⚠ [yellow]Organization 1 mismatch: 95 vs 98[/yellow]

✗ [red]Non-Regression: CRITICAL[/red]
  ⚠ [yellow]New Inclusions: 42 (threshold 50)[/yellow]
  ✗ [red]Deleted Inclusions: 15 (threshold 0)[/red]

[bold]⚠ CRITICAL issues detected in quality checks![/bold]
Do you want to write the results anyway? [y/N]:

  y → Export anyway (risky, user override)
  n → Cancel export (preserve old files)
```

---

## Coherence Check (Technical Details)

### Purpose

Verify that **organization statistics** (fetched from API) match **actual detailed data** (inclusion-by-inclusion count).

### Data Sources

**Source 1: Organization Statistics (API)**

```
For each organization:
  GET /api/inclusions/inclusion-statistics

Returns:
{
  "totalInclusions": N,          // Total patients
  "preIncluded": P,              // Pré-inclus count
  "included": I,                 // Inclus count
  "prematurelyTerminated": T     // Prematurely terminated
}
```

**Source 2: Inclusion Details (JSON Array)**

```
For each patient in endobest_inclusions:
  Check: Patient_Identification.Organisation_Id
  Count: Based on Inclusion.Inclusion_Status

Classification rules:
  1. If status ends with " - AP" → prematurely_terminated
  2. Else if status starts with "pré-inclus" → preincluded
  3. Else if status starts with "inclus" → included

Always count: patients += 1
```

### Validation Logic

```python
COUNTERS = ('patients', 'preincluded', 'included', 'prematurely_terminated')

def coherence_check(current_inclusions, organizations_list):
    """Return True if a coherence mismatch (critical issue) is detected."""
    has_critical = False

    # STEP 1: Collect statistics from API (summed over all organizations)
    total_stats = {c: sum(org[f'{c}_count'] for org in organizations_list) for c in COUNTERS}

    # STEP 2: Calculate actual counts from detailed data
    # calculate_detail_counters() is defined elsewhere; it is assumed here to
    # return a dict keyed by the four counter names
    total_detail = calculate_detail_counters(current_inclusions, org_id=None)

    # STEP 3: Compare all 4 counters
    is_match = all(total_stats[c] == total_detail[c] for c in COUNTERS)

    # STEP 4: Report total comparison
    if is_match:
        print("✓ [green]TOTAL matches[/green]")
    else:
        stats = "/".join(str(total_stats[c]) for c in COUNTERS)
        detail = "/".join(str(total_detail[c]) for c in COUNTERS)
        print("✗ [red]TOTAL mismatch[/red]")
        print(f"Stats({stats}) vs Detail({detail})")
        has_critical = True

        # STEP 5: Detail-level comparison (only if not OK)
        for org in organizations_list:
            org_stats = {c: org[f'{c}_count'] for c in COUNTERS}
            # The 'id' and 'name' keys are illustrative names for the organization record
            org_detail = calculate_detail_counters(current_inclusions, org_id=org['id'])
            if any(org_stats[c] != org_detail[c] for c in COUNTERS):
                print(f"⚠ [yellow]Organization \"{org['name']}\" mismatch[/yellow]")
                print(f"Stats {org_stats} vs Detail {org_detail}")
                has_critical = True

    return has_critical
```

### Example Output

**Scenario: Perfect Match**

```
═══ Coherence Check ═══

✓ [green]TOTAL - Stats(150/20/120/10) vs Detail(150/20/120/10)[/green]
```

**Scenario: Mismatch Detected**

```
═══ Coherence Check ═══

✗ [red]TOTAL - Stats(150/20/118/10) vs Detail(150/20/120/10)[/red]
  ⚠ [yellow]Center A - Stats(50/5/40/5) vs Detail(50/5/42/5)[/yellow]
```

Only organizations whose counters differ are listed. In this scenario a second center with Stats(100/15/78/5) and Detail(100/15/78/5) matches and is therefore not reported.

### Interpretation

**Match (Green):**

```
API statistics perfectly align with detailed data
→ No data collection issues
→ Continue processing
```

**Minor Mismatch (Yellow):**

```
1-2 patients differ between statistics and details
→ Possible API consistency issue
→ Monitor but continue (it happens occasionally)
```

**Major Mismatch (Red):**

```
10+ patients difference
→ Significant data collection issue
→ Investigate root cause
→ Consider re-running collection
```

---

## Non-Regression Check Framework

### Purpose

Detect **unexpected data changes** between current and previous collections by comparing field values against configured transition patterns.

### Architecture

```
Previous Inclusions (File)
        ↓
┌────────────────────────────────────┐
│ NON-REGRESSION CHECK               │
├────────────────────────────────────┤
│ 1. Load Regression Config          │
│    (Excel: Regression_Check sheet) │
│                                    │
│ 2. Build Inclusion Dicts           │
│    Index by: Patient_Id or Pseudo  │
│                                    │
│ 3. Group Rules by Bloc             │
│    - Structure                     │
│    - Identification                │
│    - Inclusion Protocol            │
│    - Endotest                      │
│    - Other Questionnaires          │
│                                    │
│ 4. For Each Rule:                  │
│    a) Detect rule type             │
│       - Normal rule                │
│       - New Inclusions             │
│       - Deleted Inclusions         │
│       - New Fields                 │
│       - Deleted Fields             │
│                                    │
│    b) Process rule logic           │
│       - Collect candidates         │
│       - Match transitions          │
│       - Apply exceptions           │
│       - Apply bloc_scope           │
│                                    │
│    c) Calculate severity           │
│       - Count vs thresholds        │
│       - Determine status           │
│                                    │
│ 5. Display Results                 │
│    - By bloc                       │
│    - Color-coded status            │
│    - Detailed changes (debug)      │
│                                    │
└────────────────────────────────────┘
        ↓
Current Inclusions (Memory)
```

---

## Regression Check Configuration File

### File Location & Sheet

```
Endobest_Dashboard_Config.xlsx
│
├─ Sheet 1: "Inclusions_Mapping"
│    (See DOCUMENTATION_11_FIELD_MAPPING.md)
│
└─ Sheet 2: "Regression_Check"
     ├─ Row 1: Headers
     └─ Row 2+: Rules
```

### Sheet Structure (Version 3.0)

```
Row 1 (Headers):
  A        B            C            D                   E
  ignore   bloc_title   line_label   warning_threshold   critical_threshold

  F                 G            H
  field_selection   bloc_scope   transitions

Row 2+: Rule definitions (one per row)
```

**BREAKING CHANGE (v3.0):** Columns F and G from v2.0 (`field_group` and `field_name`) have been **merged into single column F (`field_selection`)**. All subsequent columns shifted left by one position.
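To make the v3.0 column layout concrete, the following sketch maps the eight cells of one sheet row (columns A to H) onto a rule dictionary. It assumes the cells have already been read from the spreadsheet as plain Python values and that the two JSON columns arrive as strings; `parse_rule_row` and `HEADERS` are hypothetical names, and validation is deliberately left out.

```python
import json

# Column order A..H of the Regression_Check sheet (v3.0 layout)
HEADERS = [
    "ignore", "bloc_title", "line_label", "warning_threshold",
    "critical_threshold", "field_selection", "bloc_scope", "transitions",
]


def parse_rule_row(cells):
    """Map one row (8 cell values) to a rule dict; hypothetical helper."""
    rule = dict(zip(HEADERS, cells))
    # field_selection (F) and transitions (H) hold JSON arrays stored as text
    for column in ("field_selection", "transitions"):
        raw = rule[column]
        rule[column] = json.loads(raw) if raw else []
    # bloc_scope (G) is optional and defaults to "any"
    rule["bloc_scope"] = rule["bloc_scope"] or "any"
    return rule


row = [
    None, "Endotest", "Changed Value", 0, 10,
    '[["include", "Endotest.*"]]',
    None,
    '[["include", "*.*", "*defined", "*defined"]]',
]
rule = parse_rule_row(row)
print(rule["field_selection"])  # [['include', 'Endotest.*']]
print(rule["bloc_scope"])       # any
```

A real loader would also have to skip ignored rows and report missing or malformed `field_selection` values as critical configuration errors, as described in the Column Reference.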
**Color Coding:**

- **Yellow:** Structure/Identification bloc (foundational rules)
- **Blue:** Inclusion Protocol bloc (inclusion status rules)
- **Light Purple:** Endotest bloc (test-related rules)
- **White:** Regular rules
- **Red:** Incomplete/error rules (missing required columns)

---

## Column Reference

### Column A: ignore

**Type:** String (optional)
**Description:** The row is skipped if this cell contains "ignore" (case-insensitive)
**Purpose:** Comment out rules without deleting rows

**Values:**

```
ignore          → Row is skipped
(empty)         → Row is processed
any_other_text  → Row is processed
```

### Column B: bloc_title

**Type:** String (required)
**Description:** Logical grouping of related rules
**Purpose:** Visual organization and per-bloc grouping in reports

**Valid Values:**

```
Structure              → File format and field availability rules
Identification         → Patient identification changes
Inclusion Protocol     → Inclusion status and protocol changes
Endotest               → Laboratory test request changes
Other Questionnaires   → Non-specific questionnaire changes
[Custom Group Names]   → Any custom bloc name for organization
```

**Rules Per Bloc:**

```
Structure bloc (Example):
  ├─ New Fields
  ├─ Deleted Fields
  └─ (Structure-specific rules)

Identification bloc:
  ├─ New Inclusions
  ├─ Deleted Inclusions
  ├─ Changed (Excluding Birthday)
  ├─ Changed Date of Birth/Age
  └─ (Identification-specific rules)

Endotest bloc:
  ├─ Undefined to Defined (Only)
  ├─ Defined to Undefined
  ├─ Changed Value
  └─ (Endotest-specific rules)
```

### Column C: line_label

**Type:** String (required)
**Description:** Unique rule identifier within its bloc
**Purpose:** Displayed in output, identifies rule in reports

**Examples:**

```
New Inclusions
Deleted Inclusions
New Fields
Deleted Fields
Changed Value
Undefined to Defined (Only)
```

**Requirements:**

- Must be unique within bloc_title
- Should be descriptive

### Column D: warning_threshold

**Type:** Numeric (required, >= 0)
**Description:** Count threshold that triggers WARNING level
**Position:** Column D (after line_label)

**Logic:**

```
IF count > warning_threshold AND count <= critical_threshold:
    Status = WARNING (yellow ⚠)
```

**Examples:**

```
0    → Any change triggers at least a warning (strict)
5    → 0-5 changes = OK, 6+ = Warning (until critical_threshold is exceeded)
50   → 0-50 changes = OK, 51+ = Warning (lenient)
200  → Very lenient, only alert on large changes
```

### Column E: critical_threshold

**Type:** Numeric (required, >= warning_threshold)
**Description:** Count threshold that triggers CRITICAL level
**Position:** Column E (after warning_threshold)

**Logic:**

```
IF count > critical_threshold:
    Status = CRITICAL (red ✗)
    → May prompt user for confirmation
```

**Relationship:**

```
warning_threshold <= critical_threshold

Examples:
(0, 0)     → Strictest: any change is critical
(0, 1)     → Strict: 1 change = warning, 2+ = critical
(0, 50)    → 1-50 = warning, 51+ = critical
(50, 100)  → Normal operation: 0-50 OK, 51-100 warning, 101+ critical
(200, 200) → Same thresholds: jump directly from OK to critical
```

### Column F: field_selection (NEW - v3.0)

**Type:** JSON array of 2-element arrays (mandatory for most rules)
**Description:** Pipeline-based field selection using include/exclude actions
**Position:** Column F (after critical_threshold) - **REPLACES old field_group + field_name**

**Rules:**

- **Format:** `[["action", "field_selector"], ["action", "field_selector"], ...]`
- **Mandatory:** For all rules EXCEPT `"New Fields"`, `"Deleted Fields"`, `"Deleted Inclusions"`
- **For special rules:** Must be empty `[]` or null
- **Explicit:** No implicit logic - admin must order steps correctly
- **Pipeline:** Starts with empty set, each step adds or removes fields

**Elements:**

| Element | Type | Valid Values | Example |
|---------|------|--------------|---------|
| **action** | String | `"include"` or `"exclude"` | `"include"` |
| **field_selector** | String | `*.*`, `group.*`, `group.field` | `"Endotest.Request_Sent"` |

**Selector Patterns (3 only):**

```
*.*          → All fields in all groups
group.*      → All fields in specific group (e.g., "Endotest.*")
group.field  → Specific field only (e.g., "Endotest.Request_Sent")
```

**Examples:**

**1. Include Single Group**

```json
[["include", "Endotest.*"]]
// All Endotest fields
```

**2. Include Multiple Groups**

```json
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Endotest AND Inclusion fields
```

**3. Include All, Exclude Some**

```json
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// All fields EXCEPT Endotest.Last_Updated
```

**4. Key Field Selection (for "New Inclusions" rule)**

```json
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Tries Patient_Id first, then Pseudo (in order)
```

**5. Complex Pipeline**

```json
[
  ["include", "*.*"],
  ["exclude", "Inclusion.*"],
  ["exclude", "Patient_Identification.*"]
]
// All fields EXCEPT Inclusion and Patient_Identification
```

**Special Rules (field_selection must be EMPTY):**

```
"New Fields"         → [] or null
"Deleted Fields"     → [] or null
"Deleted Inclusions" → [] or null
```

**Validation:**

- ✅ Missing or null field_selection for normal rules → **CRITICAL ERROR**
- ✅ Invalid selector (no dot) → **CRITICAL ERROR**
- ✅ Non-list format → **CRITICAL ERROR, skip rule**
- ✅ Step with wrong element count → **CRITICAL ERROR, skip rule**

### Column G: bloc_scope (moved from H - v3.0)

**Type:** String enum (optional, default: "any")
**Description:** Aggregation logic for matching fields within an inclusion
**Position:** Column G (after field_selection)

**Valid Values:**

```
"any" → At least ONE field must match transitions
"all" → ALL changed fields must match transitions
```

**Logic:**

**bloc_scope = "any" (Default)**

```
IF ANY candidate field has matching transition:
    RETURN inclusion matches rule

Use for: "Alert if any change occurs"
```

**bloc_scope = "all"**

```
IF ALL changed fields have matching transitions:
    RETURN inclusion matches rule

Use for: "Alert only if all changes match pattern"
```

**Example Comparison:**

```
Inclusion with 5 fields in scope:
  Field1: Changed, matches transition ✓
  Field2: Unchanged (always ignored)
  Field3: Changed, does NOT match transition ✗
  Field4: Unchanged (always ignored)
  Field5: Changed, matches transition ✓

Changed fields:  [Field1, Field3, Field5]
Matched changed: [Field1, Field5]

Result with bloc_scope="any": ✓ COUNT (Field1 matched)
Result with bloc_scope="all": ✗ SKIP (Field3 didn't match)
```

| Scenario | bloc_scope="any" | bloc_scope="all" |
|----------|------------------|-----------------|
| 1 match, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 1 match, 1 mismatch | ✓ COUNT | ✗ SKIP |
| 0 matches, 1 mismatch | ✗ SKIP | ✗ SKIP |
| 3 matches, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 3 matches, 1 mismatch | ✓ COUNT | ✗ SKIP |

---

### Column H: transitions (moved from I - v3.0)

**Type:** JSON array of 4-element arrays (optional)
**Description:** Pipeline-based transition rules (old_value → new_value)
**Position:** Column H (after bloc_scope)

**Format:** `[["action", "field_selector", "from_pattern", "to_pattern"], ...]`

- Each step is exactly 4 elements
- If None/empty: Rule applies to ALL field changes
- Supports wildcard keywords: `*undefined`, `*defined`, `*`
- Supports literal values for exact matching

**Pipeline Concept (v2.0+):**

```
Initial state: All changed fields → is_checked = False

Step 1: Include rule for all fields (*.*) with *defined→*defined
        └─ is_checked = True if transition matches

Step 2: Include rule for Endotest.Diagnostic_Status with waiting→*undefined
        └─ is_checked = True (whitelisted exception)

Step 3: Exclude rule for Endotest.Request_Sent with false→true
        └─ is_checked = False (blacklisted exception)

Final result: Only fields matching the pipeline are checked
```

---

#### Syntax: 4-Element Pipeline Array

Each pipeline step is a **4-element array**:

```json
[action, field_selector, from_pattern, to_pattern]
```

| Element | Description | Examples |
|---------|-------------|----------|
| **action** | "include" (whitelist) or "exclude" (blacklist) | "include", "exclude" |
| **field_selector** | Which fields this step applies to | "*.*", "group.*", "group.field" |
| **from_pattern** | Old value pattern to match | "*undefined", "*defined", "*", literal value |
| **to_pattern** | New value pattern to match | "*undefined", "*defined", "*", literal value |

**Important:** The syntax is **strictly enforced** - each step must have exactly 4 elements. No shortcuts or variants are accepted.

---

#### Field Selector Patterns

```
*.*          → All fields in all groups
group.*      → All fields in specific group (e.g., "Endotest.*")
group.field  → Specific field only (e.g., "Endotest.Request_Sent")
```

---

#### Complete Examples

**Example 1: Simple All-Fields Rule (Most Common)**

```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"]
  ]
}
// Pipeline: Include all fields that change between two defined values
```

**Example 2: Main Rule + One Include Exception**

```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"]
  ]
}
// Step 1: Include all *defined→*defined changes
// Step 2: ALSO include specific Endotest.Diagnostic_Status changes from waiting to undefined
```

**Example 3: Main Rule + Include Exception + Exclude Exception**

```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"],
    ["exclude", "Endotest.Request_Sent", false, true]
  ]
}
// Step 1: Include all *defined→*defined
// Step 2: Include Diagnostic_Status waiting→undefined (whitelist)
// Step 3: Exclude Request_Sent false→true (blacklist)
// Result: Step 3 overrides Step 1 for that specific field+transition
```

**Example 4: Multiple Include Steps for Different Fields**

```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "GDD.Status", "pending", "completed"],
    ["include", "GDD.Status", "pending", "failed"]
  ]
}
// Step 1: Include all *defined→*defined changes
// Step 2: Include GDD.Status pending→completed
// Step 3: Include GDD.Status pending→failed
```

**Example 5: Exclude Rule with Wildcard**

```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["exclude", "Endotest.Last_Modified", "*", "*"]
  ]
}
// Include all changes EXCEPT any change to Last_Modified field
```

---

#### Processing Logic

The pipeline is executed **sequentially**, with each step modifying the `is_checked` status in-place:

```
1. Initialize: All changed fields have is_checked = False

2. For each transition step in order:
   a. Check if the current field matches the field_selector
   b. If yes: Check if the old→new values match from_pattern→to_pattern
   c. If yes:
      - If action="include": Set is_checked = True
      - If action="exclude": Set is_checked = False
   d. If no: Leave is_checked unchanged

3. Final: Only fields with is_checked = True are counted as matching
```

**Important:** Later steps can override earlier steps. Example:

```json
[
  ["include", "*.*", "*", "*"],      // Step 1: include everything
  ["exclude", "Field.X", "*", "*"]   // Step 2: exclude Field.X (overrides Step 1)
]
```

Result: Everything is included EXCEPT Field.X

---

#### Configuration Error Handling

If a transitions step has invalid syntax:

- The rule is skipped and a yellow ⚠ warning is logged
- No exception is thrown
- The warning is visible in the output
- The user can choose to save the report or fix the config

**Valid and invalid syntax examples:**

```json
["include", "*.*", "*defined", "*defined"]      // ✓ Exactly 4 elements
["include", "*.*", "*defined"]                  // ✗ Only 3 elements (INVALID)
["maybe", "*.*", "*defined", "*defined"]        // ✗ Invalid action (INVALID)
["include", "invalid", "*defined", "*defined"]  // ✗ No dot in selector (INVALID)
```

---

## Special Keywords & Wildcards

This section documents the special keywords and patterns used in transition specifications throughout the configuration.
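The keywords covered in this section and the sequential `is_checked` processing described under Column H can be sketched together as follows. This is an illustrative reading of the documented rules, not the actual implementation: the function names are hypothetical, literal matching is assumed to require identical types, and the caller is assumed to pass only fields whose value actually changed.

```python
def is_undefined(value):
    """*undefined covers null, the empty string and the literal string "undefined"."""
    return value is None or value == "" or value == "undefined"


def matches_pattern(pattern, value):
    """Match one from_pattern/to_pattern against an old or new value."""
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return is_undefined(value)
    if pattern == "*defined":
        return not is_undefined(value)
    # Literal value: exact equality, types must agree (assumption)
    return type(pattern) is type(value) and pattern == value


def selector_matches(selector, field):
    """Support the selector forms *.*, group.* and group.field."""
    sel_group, sel_name = selector.split(".", 1)
    group, name = field.split(".", 1)
    return sel_group in ("*", group) and sel_name in ("*", name)


def is_checked(field, old, new, transitions):
    """Run the pipeline in order; later matching steps override earlier ones."""
    checked = False
    for action, selector, from_pattern, to_pattern in transitions:
        if (selector_matches(selector, field)
                and matches_pattern(from_pattern, old)
                and matches_pattern(to_pattern, new)):
            checked = action == "include"
    return checked


transitions = [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"],
    ["exclude", "Endotest.Request_Sent", False, True],
]
print(is_checked("Inclusion.Status", "A", "B", transitions))                     # True
print(is_checked("Endotest.Diagnostic_Status", "waiting", None, transitions))    # True
print(is_checked("Endotest.Request_Sent", False, True, transitions))             # False
```

The third call shows the override behaviour: the change first matches the `*defined` → `*defined` step and is then removed again by the later `exclude` step.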
### Keywords in Transition Patterns

The regression check configuration supports special keywords with a `*` prefix for flexible transition matching. They are used in the `from_pattern` and `to_pattern` positions of a 4-element transition step.

#### Keyword 1: `*undefined`

**Meaning:** Matches any "undefined-like" value

**Matches:**

- `null` (None in Python)
- `""` (empty string)
- `"undefined"` (literal string)

**Example:**

```json
{
  "transitions": [["include", "*.*", "*undefined", "*defined"]]
}
// Matches: undefined → Active, null → 42, "" → true, etc.
```

**Use Case:** Detect when a field gets populated for the first time

---

#### Keyword 2: `*defined`

**Meaning:** Matches any "defined" value (opposite of *undefined)

**Matches:** Anything EXCEPT:

- `null` (None)
- `""` (empty string)
- `"undefined"` (literal string)

**Example:**

```json
{
  "transitions": [["include", "*.*", "*defined", "*undefined"]]
}
// Matches: Active → null, 42 → "", true → "undefined", etc.
```

**Use Case:** Detect when a field loses its value

---

#### Keyword 3: `*` (Wildcard)

**Meaning:** Matches absolutely any value

**Matches:** Any value including:

- Defined values (strings, numbers, booleans)
- Undefined-like values (null, "", "undefined")
- Objects, arrays, etc.
**Example:**

```json
{
  "transitions": [["include", "*.*", "*", "*"]]
}
// Matches: ANY old value → ANY new value
// Essentially: "any change at all"
```

**Use Case:** Monitor all changes to a field, filter out specific cases with exceptions

---

### Combining Keywords with Literal Values

Keywords and literal values can be mixed in the `from_pattern` and `to_pattern` positions of a step:

| `from_pattern`, `to_pattern` | Meaning |
|---------|---------|
| `"*undefined", "*defined"` | Undefined → Defined (field becomes populated) |
| `"*defined", "*undefined"` | Defined → Undefined (field gets cleared) |
| `"*defined", "*defined"` | Value change while staying defined (actual value change required) |
| `"*", "*"` | Any change at all |
| `"Active", "*defined"` | From literal "Active" to any defined value |
| `"*undefined", "Active"` | From undefined to literal "Active" |

---

### Literal Values (No `*` Prefix)

Any value that does NOT start with `*` is treated as a literal value and matched exactly:

```json
{
  "transitions": [
    ["include", "*.*", "pending", "accepted"],   // Exact string match
    ["include", "*.*", false, true],             // Exact boolean match
    ["include", "*.*", 0, 1],                    // Exact numeric match
    ["include", "*.*", null, "Active"],          // null matches null, "Active" matches "Active"
    ["include", "*.*", "undefined", "Done"]      // "undefined" (literal string) matches "undefined"
  ]
}
```

**Important:** Literal values are matched by exact equality, including:

- `"undefined"` - matches the exact string "undefined" (not undefined state)
- `null` - matches null values
- `""` - matches empty string

---

### Summary Table: Special Keywords in Transitions

| Keyword | Matches | Use Case |
|---------|---------|----------|
| `*undefined` | null, "", "undefined" (any undefined-like value) | As `from_pattern`: detect when a field becomes populated |
| `*defined` | Any defined value (NOT null, "", "undefined") | As `from_pattern` with `*undefined` as `to_pattern`: detect when a field loses its value |
| `*` | Any value whatsoever | Alert on any change; use with exceptions for fine control |
| (no `*` prefix) | Exact literal values | Specific value matching (e.g., "pending" → "accepted") |

---

## Rule Types & Logic

### Rule Type 1: Standard Rules (Normal Comparison)
**Purpose:** Detect field value changes matching configured patterns

**Processing Steps:**

```
Step 1: Collect Candidate Fields
  ├─ Apply the field_selection pipeline (start with an empty set)
  ├─ Each step includes or excludes fields by selector
  └─ Result: List of (group_name, field_name) tuples

Step 2: For Each Candidate Field
  ├─ Get new_value and old_value
  ├─ Check if transition matches (if transitions specified)
  ├─ Apply later pipeline steps as exceptions (include/exclude)
  └─ Mark as "checked" if matches

Step 3: Apply bloc_scope
  ├─ With "any": Count inclusion if ANY field is checked
  └─ With "all": Count inclusion if ALL changed fields are checked

Step 4: Report Matching Inclusions
  └─ Count vs. thresholds (warning/critical)
```

**Example Configuration:**

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined (Only)",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_selection": [["include", "Inclusion.*"]],
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ],
  "bloc_scope": "all"
}
```

### Rule Type 2: New Inclusions

**Purpose:** Count patients that exist in current data but not in previous

**Syntax:**

```json
{
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [
    ["include", "Patient_Identification.Patient_Id"],
    ["include", "Patient_Identification.Pseudo"]
  ],
  "transitions": [],
  "bloc_scope": null
}
```

**Note:** For special rules like "New Inclusions", transitions can be left as empty array `[]` since these rules don't use transition matching. In v3.0 the `field_selection` of this rule supplies the key field candidates.

**Processing:**

```
1. Build dictionaries indexed by key field
   - Key field candidates: Patient_Id, Pseudo (tried in order)
   - key_dict_new = {patient_key: patient_data for patient in current}
   - key_dict_old = {patient_key: patient_data for patient in previous}

2. Find new inclusions
   new_keys = set(key_dict_new.keys()) - set(key_dict_old.keys())
   count = len(new_keys)

3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK
```

**Example Output:**

```
✓ [green]New Inclusions: 0[/green]
  (No new patients added)

⚠ [yellow]New Inclusions: 42[/yellow]
  (42 new patients - warning threshold exceeded)

✗ [red]New Inclusions: 75[/red]
  (75 new patients - exceeds critical threshold of 50)
```

### Rule Type 3: Deleted Inclusions

**Purpose:** Count patients that exist in previous but not in current

**Syntax:**

```json
{
  "bloc_title": "Identification",
  "line_label": "Deleted Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 0,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:**

```
1. Build dictionaries (same as New Inclusions, using the same key field)

2. Find deleted inclusions
   deleted_keys = set(key_dict_old.keys()) - set(key_dict_new.keys())
   count = len(deleted_keys)

3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK
```

**Note:** Typically `critical_threshold=0` because any deletion is concerning. In v3.0, `field_selection` must be empty for this rule.

### Rule Type 4: New Fields

**Purpose:** Detect field names that appear in current but not in previous

**Syntax:**

```json
{
  "bloc_title": "Structure",
  "line_label": "New Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:**

```
1. For each patient in common (present in both versions):
   a) Get all groups and fields from current version
   b) Get all groups and fields from previous version
   c) Find new fields: current_fields - previous_fields
   d) Qualified name: "group_name.field_name"

2. Count by field name
   field_counts = {field_qualified_name: count_of_inclusions}
   total_new_fields = len(field_counts)

3. Display results
   For each new field: "Inclusion.New_Field (42 inclusions)"
   [count = number of inclusions that gained this field]
```

**Example Output:**

```
✓ [green]New Fields: 0[/green]

⚠ [yellow]New Fields: 2[/yellow]
  Endotest.New_Request_Type (1 inclusion)
  Inclusion.New_Status_Code (2 inclusions)
```

### Rule Type 5: Deleted Fields

**Purpose:** Detect field names that exist in previous but not in current

**Syntax:**

```json
{
  "bloc_title": "Structure",
  "line_label": "Deleted Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:** Same as "New Fields" but reversed:

```
deleted_fields = previous_fields - current_fields
```

---

## Field Selection Pipeline (v3.0)

**NEW APPROACH:** Field selection now uses the **same pipeline architecture as transitions**.

### Pipeline Ordering (Key Concept)

Start with an **empty set of fields**. Each step either **includes** or **excludes** fields:

```python
def selector_matches(selector, field):
    """Match "group.field" against *.*, group.* or group.field."""
    sel_group, sel_name = selector.split(".", 1)
    group, name = field.split(".", 1)
    return sel_group in ("*", group) and sel_name in ("*", name)

# Illustrative sample of qualified field names found in an inclusion
all_fields = ["Endotest.Request_Sent", "Endotest.Last_Updated",
              "Inclusion.Status", "Inclusion.Date"]

pipeline = [
    ["include", "Endotest.*"],             # Step 1: Include all Endotest fields
    ["include", "Inclusion.Status"],       # Step 2: Also include Inclusion.Status
    ["exclude", "Endotest.Last_Updated"],  # Step 3: But exclude Endotest.Last_Updated
]

candidate_fields = set()  # Empty initially
for action, selector in pipeline:
    for field in all_fields:
        if selector_matches(selector, field):
            if action == "include":
                candidate_fields.add(field)
            else:
                candidate_fields.discard(field)

print(sorted(candidate_fields))
# Result: Endotest.* + Inclusion.Status, except Endotest.Last_Updated
```

### Simple Examples

#### Example 1: Single Group

```json
[["include", "Endotest.*"]]
// Result: All Endotest fields
```

#### Example 2: Multiple Groups

```json
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Result: All Endotest + all Inclusion fields
```

#### Example 3: Specific Fields

```json
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Result: Only Patient_Id and Pseudo fields
```
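Example 3 is the kind of selection the "New Inclusions" rule relies on for key-field detection. The sketch below shows one way this could work, following the documented behaviour (first candidate with a non-null value in both the first new and the first old inclusion, then a set difference on the keys). The helper names and the sample data are hypothetical, and the candidate list is assumed to preserve the order of the pipeline steps.

```python
def get_value(inclusion, qualified_name):
    """Read "group.field" from a nested inclusion dict; None if absent."""
    group, field = qualified_name.split(".", 1)
    return inclusion.get(group, {}).get(field)


def detect_key_field(candidates, first_new, first_old):
    """Return the first candidate that is non-null in both sample inclusions."""
    for name in candidates:
        if get_value(first_new, name) is not None and get_value(first_old, name) is not None:
            return name
    return None


def diff_inclusions(current, previous, key_field):
    """Return (new_keys, deleted_keys), sorted for reproducible output."""
    new_keys = {get_value(p, key_field) for p in current}
    old_keys = {get_value(p, key_field) for p in previous}
    return sorted(new_keys - old_keys), sorted(old_keys - new_keys)


# Hypothetical sample data: Patient_Id is empty, so Pseudo becomes the key
previous = [
    {"Patient_Identification": {"Patient_Id": None, "Pseudo": "AB-01"}},
    {"Patient_Identification": {"Patient_Id": None, "Pseudo": "AB-02"}},
]
current = [
    {"Patient_Identification": {"Patient_Id": None, "Pseudo": "AB-01"}},
    {"Patient_Identification": {"Patient_Id": None, "Pseudo": "AB-03"}},
]
candidates = ["Patient_Identification.Patient_Id", "Patient_Identification.Pseudo"]

key_field = detect_key_field(candidates, current[0], previous[0])
new_keys, deleted_keys = diff_inclusions(current, previous, key_field)
print(key_field)     # Patient_Identification.Pseudo
print(new_keys)      # ['AB-03']
print(deleted_keys)  # ['AB-02']
```

Because only the first inclusion of each file is sampled, this approach assumes a stable structure across inclusions, as noted in the version history.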
#### Example 4: All Except Some
```json
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// Result: All fields EXCEPT Endotest.Last_Updated
```

#### Example 5: Complex Selection
```json
[
  ["include", "*.*"],
  ["exclude", "Patient_Identification.*"],
  ["exclude", "Inclusion.*"]
]
// Result: All fields EXCEPT Patient_Identification and Inclusion
```

### Important Notes

- ✅ **Order matters:** Steps are applied sequentially
- ✅ **Explicit:** Admin responsible for correct pipeline
- ✅ **No implicit AND/OR:** Use multiple include steps for OR logic
- ✅ **Deterministic:** Sets sorted, reproducible results

---

## Transition Patterns

### Pattern Matching Rules

#### Literal Value Matching
```json
[
  ["active", "inactive"],
  [true, false],
  [0, 1]
]
// Match exact value changes
// Type must match (string vs. number vs. boolean)
```

#### Undefined Keyword
```
*undefined: Matches any undefined-like value
  - null
  - "" (empty string)
  - "undefined"

*defined: Matches any defined value
  - NOT null
  - NOT ""
  - NOT "undefined"
```

**Examples:**
```json
[ ["*undefined", "*defined"] ]
// Transition FROM any undefined TO any defined

[ ["*defined", "*undefined"] ]
// Transition FROM any defined TO any undefined

[ ["*defined", "*defined"] ]
// Transition FROM defined TO different defined
// (with actual value change check)
```

#### Wildcard Pattern
```json
[ ["*", "*"] ]
// Match ANY transition
// Useful for: "Alert on any change to this field"
```

### Transition Combination Examples

**Example 1: Detect New Values Only**
```json
{
  "transitions": [["*undefined", "*defined"]]
}
// Alert when field goes from undefined to any value
// Ignore when field already had value
```

**Example 2: Detect Value Reversal**
```json
{
  "transitions": [
    [true, false],
    [false, true]
  ]
}
// Alert when boolean field toggles in either direction
```

**Example 3: Detect Specific Status Change**
```json
{
  "transitions": [
    ["pending", "approved"],
    ["pending", "rejected"]
  ]
}
// Alert when pending status changes to approved or rejected
// Ignore all other transitions
```

**Example 4: Detect Anything But This**
```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Endotest.Last_Updated", "*", "*"]
  ]
}
// Alert on any field change
// EXCEPT changes to Last_Updated
```

---

## Exception Handling (Pipeline Architecture)

With the unified pipeline format, exceptions are regular pipeline steps with a different action. This section explains the common patterns.

### Pattern 1: Simple Whitelist (Include Only)

Allow specific field/transition combinations:

```json
{
  "transitions": [
    ["include", "Request_Sent", false, true],
    ["include", "Diagnostic_Status", "warning", "complete"]
  ]
}
```

**Logic:**
```
Step 1: Include Request_Sent with false→true transition
Step 2: Include Diagnostic_Status with warning→complete
Result: ONLY these specific field+transition combinations are checked
```

### Pattern 2: Simple Blacklist (Exclude Only)

Block specific field/transition combinations:

```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Endotest.Import_Time", "*", "*"]
  ]
}
```

**Logic:**
```
Step 1: Include all fields with any change (*→*)
Step 2: Exclude Last_Updated from being checked
Step 3: Exclude Endotest.Import_Time from being checked
Result: All fields EXCEPT Last_Updated and Import_Time
```

### Pattern 3: Main Rule + Multiple Exceptions

Combine a main transition rule with field-specific exceptions:

```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", false, true],
    ["exclude", "Endotest.Last_Modified", "*", "*"]
  ]
}
```

**Logic:**
```
Step 1: Include fields that change between two defined values
Step 2: ALSO include Request_Sent changing from false to true (even if not *defined→*defined)
Step 3: But exclude any change to Last_Modified (overrides Step 1)
Result: *defined→*defined changes PLUS Request_Sent false→true, EXCEPT Last_Modified
```

### Field Selector Formats in Pipeline

**Simple field name (matches in any group):**
```json
{ "field_selector": "Status" }
// Matches "Status" in any group
// But this is NOT pipeline syntax - use "*.*" with field matching instead
```

**Better: Use qualified notation in field_selector:**
```json
["include", "Endotest.Request_Sent", false, true]
// Matches ONLY the Request_Sent field of the Endotest group
```

**Full Specification:**
```json
{
  "field": "Endotest.Request_Sent",
  "transition": [false, true]
}
// Matches this specific field AND transition combination
```

### Practical Examples with Pipeline

**Example 1: Alert on Most Changes, Except System Fields**
```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Last_Modified_By", "*", "*"],
    ["exclude", "Import_Timestamp", "*", "*"]
  ]
}
// Step 1: Include ANY field change
// Step 2-4: Exclude system timestamp/audit fields
```

**Example 2: Alert on Undefined→Defined, Plus Status Reversals**
```json
{
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Request_Status", "rejected", "submitted"]
  ]
}
// Step 1: Include when field goes from undefined to defined
// Step 2: ALSO include Request_Status: rejected → submitted (even if not undefined→defined)
```

**Example 3: Complex Medical Rules with Multiple Conditions**
```json
{
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Endotest.Test_Result", "pending", "completed"],
    ["include", "GDD.Status", "pending", "failed"],
    ["exclude", "Endotest.Last_Sync", "*", "*"]
  ]
}
// Step 1: Include main rule: undefined→defined
// Step 2: ALSO include Test_Result pending→completed
// Step 3: ALSO include GDD.Status pending→failed
// Step 4: But exclude any change to Last_Sync field
// Result: All matching transitions except Last_Sync changes
```

**Example 4: Fine-Grained Control with Include + Exclude**
```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["include", "Status", "*undefined", "*defined"],
    ["include", "Status", "*defined", "*undefined"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Internal_Id", "*", "*"]
  ]
}
// Step 1: Include any change (baseline)
// Step 2-3: Specifically include Status becoming defined/undefined
// Step 4-5: Exclude Last_Updated and Internal_Id changes (override Step 1)
// Result: All changes EXCEPT Last_Updated/Internal_Id, plus Status transitions
```

---

## Configuration Examples

### Example 1: Monitor New Inclusions (v3.0)

**Requirement:** Alert if an unexpected number of patients is added

```json
{
  "ignore": null,
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
  "bloc_scope": null,
  "transitions": []
}
```

**Field Selection Logic:**
```
Starts empty: candidate_fields = {}
Step 1: Include Patient_Identification.Patient_Id
Step 2: Include Patient_Identification.Pseudo
Result: [Patient_Identification.Patient_Id, Patient_Identification.Pseudo]
These become key candidates (tried in order)
```

**Logic:**
```
Count patients in current but not in previous
If count > 50: CRITICAL (too many new patients)
If count > 0: WARNING (any new patients)
If count == 0: OK
```

### Example 2: Detect Undefined→Defined Changes (v3.0)

**Requirement:** Alert if any Inclusion field becomes defined

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "Inclusion.*"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ]
}
```

**Field Selection & Transitions:**
```
Field Selection: Include all Inclusion fields
Transitions Pipeline:
  Step 1: Include *.* *undefined→*defined
Result: Only undefined→defined changes
```

**Logic:**
```
For each inclusion:
  Check the Inclusion fields that changed
  If ANY of them went undefined → defined:
    COUNT this inclusion

If count > 100: CRITICAL
If count > 0: WARNING
```

### Example 3: Strict All-Fields Completeness (v3.0)

**Requirement:** Ensure ALL changed fields follow the undefined→defined pattern

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "All Changes Undefined to Defined",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_selection": [["include", "Inclusion.*"]],
  "bloc_scope": "all",
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ]
}
```

**Key Difference with bloc_scope="all":**
```
With bloc_scope="any": Count if ANY field matches
With bloc_scope="all": Count ONLY if ALL changed fields match
```

**Logic:**
```
For each inclusion:
  Find all Inclusion fields that changed
  Check if ALL changes are: undefined → defined
  If all changed fields match pattern:
    COUNT this inclusion (expected pattern)
  If any changed field doesn't match:
    SKIP (unexpected pattern)

If count > 200: CRITICAL (too many gaining data)
```

### Example 4: Request Lifecycle Validation (v3.0)

**Requirement:** Detect expected test request state transitions

```json
{
  "bloc_title": "Endotest",
  "line_label": "Request Status Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "Endotest.Request_Sent"], ["include", "Endotest.Request_Status"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "Endotest.Request_Sent", false, true],
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"]
  ]
}
```

**Field Selection Pipeline:**
```
Empty set start
Step 1: Include Endotest.Request_Sent
Step 2: Include Endotest.Request_Status
Result: {Endotest.Request_Sent, Endotest.Request_Status}
```

**Logic:**
```
For each inclusion:
  Check Endotest fields (Request_Sent, Request_Status)
  If ANY field matches transitions:
    COUNT this inclusion

If count > 100: CRITICAL (too many status changes)
```

### Example 5: Valid Workflow Transitions

**Requirement:** Alert on workflow changes, but only for valid state transitions (pending → accepted or rejected, rejected → resubmitted, accepted → cancelled)

```json
{
  "bloc_title": "Endotest",
  "line_label": "Valid Request Transitions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Endotest.Request_Status"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"],
    ["include", "Endotest.Request_Status", "rejected", "resubmitted"],
    ["include", "Endotest.Request_Status", "accepted", "cancelled"]
  ]
}
```

**Logic:**
```
For each inclusion:
  Check if Request_Status field changed
  If transition matches ONE of the 4 allowed transitions:
    COUNT this inclusion (valid workflow)
  If transition is different:
    SKIP (unexpected change - needs investigation)

If count > 50: CRITICAL (too many valid status transitions)
```

**Note:** When the pipeline lists several `include` steps for the same field, a change is counted if it matches ANY of them.
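The counting logic used in the examples above (transition matching, `bloc_scope`, then threshold comparison) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual Endobest implementation: all function names are hypothetical, each inclusion is assumed to be given as a dict of its changed fields already restricted by `field_selection`, bare field names are assumed to match in any group, and a later matching step is assumed to override an earlier one.

```python
def is_undefined(value):
    """Undefined-like values: null, empty string, or the string 'undefined'."""
    return value is None or value == "" or value == "undefined"


def value_matches(pattern, value):
    """Match one side of a transition against a pattern."""
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return is_undefined(value)
    if pattern == "*defined":
        return not is_undefined(value)
    # Literal match: the type must match too (True is not 1, "1" is not 1)
    return type(pattern) is type(value) and pattern == value


def field_matches(selector, field):
    """Selector is '*.*', 'group.*', 'group.field', or a bare field name."""
    group, name = field.split(".", 1)
    if "." not in selector:
        return selector == name  # assumption: bare name matches in any group
    sel_group, sel_name = selector.split(".", 1)
    return sel_group in ("*", group) and sel_name in ("*", name)


def change_is_flagged(pipeline, field, old, new):
    """Apply the steps in order; the last matching step decides."""
    flagged = False
    for action, selector, old_pat, new_pat in pipeline:
        if (field_matches(selector, field)
                and value_matches(old_pat, old)
                and value_matches(new_pat, new)):
            flagged = action == "include"
    return flagged


def inclusion_counts(pipeline, changes, bloc_scope="any"):
    """changes maps each changed field to its (old, new) pair."""
    results = [change_is_flagged(pipeline, f, o, n) for f, (o, n) in changes.items()]
    if not results:
        return False
    return all(results) if bloc_scope == "all" else any(results)


def severity(count, warning_threshold, critical_threshold):
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"


# Transitions from Example 5; the inclusion data below is made up
PIPELINE = [
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"],
    ["include", "Endotest.Request_Status", "rejected", "resubmitted"],
    ["include", "Endotest.Request_Status", "accepted", "cancelled"],
]
inclusions = [
    {"Endotest.Request_Status": ("pending", "accepted")},      # counted
    {"Endotest.Request_Status": ("accepted", "pending")},      # not an allowed transition
    {"Endotest.Request_Status": ("rejected", "resubmitted")},  # counted
]
count = sum(inclusion_counts(PIPELINE, changes) for changes in inclusions)
print(count, severity(count, warning_threshold=0, critical_threshold=50))  # 2 WARNING
```

Because the thresholds are compared with a strict `>`, a rule with `warning_threshold=0` raises a warning as soon as a single inclusion is counted.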
---

### Example 6: Exclude Internal Fields

**Requirement:** Monitor data changes but ignore internal/system fields

```json
{
  "bloc_title": "Identification",
  "line_label": "Data Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "*.*"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Import_Time", "*", "*"],
    ["exclude", "Internal_Id", "*", "*"]
  ]
}
```

**Logic:**
```
For each inclusion:
  Check ALL fields EXCEPT [Last_Updated, Import_Time, Internal_Id]
  If ANY field changed:
    COUNT this inclusion

If count > 100: CRITICAL (too many changes)
```

---

## User Guide: Adding/Modifying Rules

### Step 1: Identify Rule Need

Determine the data validation requirement:

```
Detection Type            Use Pattern
─────────────────────────────────────────────────
New patients added        "New Inclusions" rule
Patients removed          "Deleted Inclusions" rule
Field values changed      Standard rule + transitions
Field added/removed       "New/Deleted Fields" rule
Specific transitions      Standard rule + narrow transitions
Exclude system changes    Standard rule + exceptions
```

### Step 2: Choose Rule Type

| Rule Type | When to Use | Complexity |
|-----------|------------|-----------|
| New Inclusions | Track patient additions | Simple |
| Deleted Inclusions | Track patient removals | Simple |
| New Fields | Monitor schema changes | Simple |
| Deleted Fields | Detect removed data | Simple |
| Standard (Transitions) | Monitor specific changes | Medium |
| Standard (with Exceptions) | Monitor changes + allowances | Complex |

### Step 3: Define Thresholds

Thresholds are written as (warning, critical) and compared with a strict `>`.

```
Decision Matrix:

Threshold Pattern    Meaning                              Example Use
─────────────────────────────────────────────────────────────────────
(0, 0)               No changes allowed                   Critical data
(0, 1)               1 change warns, 2+ are critical      Surgery dates
(0, 50)              Strict monitoring                    High-value fields
(50, 100)            Normal operation                     Flexible fields
(200, 200)           No warning level, skip to critical   Lenient tracking
```

Recommendation:
```
Strict validation (medical):          warning = 0,  critical = 1
Normal validation (most fields):      warning = 5,  critical = 20
Lenient validation (administrative):  warning = 50, critical = 100
```

### Step 4: Create Rule Row in Excel

Open `Endobest_Dashboard_Config.xlsx` → `Regression_Check` sheet

```
Row N:
  A: ignore (leave empty)
  B: bloc_title (e.g., "Inclusion Protocol")
  C: line_label (e.g., "Status Changed")
  D: warning_threshold (e.g., 0)
  E: critical_threshold (e.g., 20)
  F: field_selection (e.g., [["include", "Inclusion.Status"], ["include", "Inclusion.Date"]])
  G: bloc_scope (e.g., "any")
  H: transitions (e.g., [["include", "*.*", "*", "*"]])
```

### Step 5: Define Field Scope

Decide which fields the rule applies to (column F, `field_selection`):

```
Scope                  JSON
──────────────────────────────────────────────────────────────────────
All fields             [["include", "*.*"]]
All in group X         [["include", "group_name.*"]]
Multiple groups        [["include", "group1.*"], ["include", "group2.*"]]
All except group X     [["include", "*.*"], ["exclude", "group1.*"]]
Specific field         [["include", "Group.field1"]]
Multiple fields        [["include", "Group.field1"], ["include", "Group.field2"]]
```

### Step 6: Define Transitions

Specify what changes to monitor:

```
Pattern             JSON                                    Meaning
─────────────────────────────────────────────────────────────────────────────
Any change          [["*", "*"]]                            Monitor all changes
Become defined      [["*undefined", "*defined"]]            Field gets value
Become undefined    [["*defined", "*undefined"]]            Field loses value
Toggle boolean      [[true, false], [false, true]]          Boolean flip
Specific change     [["old", "new"]]                        Exact transition
Multiple changes    [["old1", "new1"], ["old2", "new2"]]    Multiple patterns
```

### Step 7: Set Exceptions (Optional)

In v3.0, exceptions are extra steps in the `transitions` pipeline (column H):

```
Allow a specific case by adding an include step:
  H: transitions = [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", false, true]
  ]

Or exclude specific cases by adding an exclude step:
  H: transitions = [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"]
  ]
```

### Step 8: Choose Bloc Scope

Decide aggregation logic:

```
Requirement          bloc_scope
─────────────────────────────────────────────
Any field changes    "any" (default)
All changes match    "all"
```

### Step 9: Validate & Test

```bash
# Check-only mode (validates configuration)
python eb_dashboard.py --check-only

# Expected output:
# ✓ Loaded 42 regression check rules
# ✓ All checks passed
```

### Step 10: Full Collection Test

```bash
# Run full collection to test rule
python eb_dashboard.py

# After collection, verify:
# 1. Rule appears in output
# 2. Severity level is correct (OK/Warning/Critical)
# 3. Count matches expectations
```

---

## Execution Modes

### Mode 1: Normal Collection with Quality Checks

```bash
python eb_dashboard.py
```

**Workflow:**
```
1. Collect data (organizations, inclusions)
2. Run Coherence Check
3. Run Non-Regression Check (if old file exists)
4. If critical issues: Ask user for confirmation
5. If OK or user confirms: Export files
6. Display elapsed time
```

**Output:**
```
Collecting data from 15 organizations...
[████████████████████] 1200/1200

═══ Coherence Check ═══
✓ [green]TOTAL matches[/green]

═══ Non Regression Check ═══
✓ [green]Structure: New Fields: 0[/green]
✓ [green]Identification: New Inclusions: 0[/green]
...

✓ All checks passed successfully!
Writing files...
Elapsed time: 3:42
```

### Mode 2: Check-Only (Validation Only)

```bash
python eb_dashboard.py --check-only
```

**Workflow:**
```
1. Load existing JSON files (no API calls)
2. Load regression configuration
3. Run Coherence Check
4. Run Non-Regression Check
5. Report results
6. Exit
```

**Use Case:** Validate data before distribution without a fresh collection

**Output:**
```
═══ CHECK ONLY MODE ═══
Running quality checks on existing data files...

[Loading configuration...]
[Running checks...]

✓ All checks passed successfully!
```

### Mode 3: Compare Two Files

```bash
python eb_dashboard.py --check-only file1.json file2.json
```

**Workflow:**
```
1. Load file1 and file2 (as current and old)
2. Skip coherence check (organizations not provided)
3. Run regression check comparing them
4. Report differences
5. Exit
```

**Use Case:** Compare two snapshots, detect changes between versions

**Output:**
```
═══ CHECK ONLY COMPARE MODE ═══
Comparing two specific files:
  Current: file1.json
  Old: file2.json

[Running regression checks...]

⚠ [yellow]New Inclusions: 15[/yellow]
✗ [red]Deleted Inclusions: 5[/red]
...
```

### Mode 4: Debug Mode (Verbose Output)

```bash
python eb_dashboard.py --debug
```

**Workflow:**
```
1. Execute as Normal Mode
2. Enable DEBUG_MODE in quality checks
3. Display detailed field-by-field changes
4. Show individual inclusion comparisons
5. Verbose logging
```

**Use Case:** Troubleshoot regression rules, understand data changes

**Output:**
```
Running collection...
[████████] 1200/1200

═══ Non Regression Check (DEBUG MODE) ═══

Endotest - Undefined to Defined (Only): 12
  ✓ Patient-001:
    - Endotest.Request_Sent: false → true
    - Endotest.Request_Status: undefined → 'completed'
  ✓ Patient-002:
    - Endotest.Request_Sent: false → true
  ...
```

---

## Troubleshooting

### Issue 1: "Invalid JSON format" Error

**Symptom:** Configuration validation fails

**Cause:** Malformed JSON in `field_selection` or `transitions`

**Solution:**
1. Paste the cell content into a JSON validator
2. Fix syntax errors
3. Re-run check

**Example - WRONG:**
```json
{
  "transitions": [["active", "inactive" ]  // Missing closing bracket
}

{
  "field_selection": [["include", "Endotest.*"] ["include", "Inclusion.*"]]  // Missing comma between array elements
}
```

**Example - CORRECT:**
```json
{
  "transitions": [["active", "inactive"]]
}

{
  "field_selection": [["include", "Endotest.*"], ["include", "Inclusion.*"]]
}
```

### Issue 2: Rule Never Triggers

**Symptom:** Count always shows 0 even when data changes

**Causes:**
1. Field selection too restrictive
2. Transition pattern doesn't match actual changes
3. `field_selection` pipeline excludes the target fields

**Solution:**
1. Loosen the field selection: try `[["include", "*.*"]]`
2. Use wildcards in transitions: `["*", "*"]`
3. Check actual field names in JSON output
4. Enable debug mode to see field matching

### Issue 3: Too Many False Positives

**Symptom:** Rule triggers unexpectedly, too many violations

**Causes:**
1. Thresholds set too low
2. Transitions too broad (matching unintended changes)
3. `field_selection` too permissive

**Solution:**
1. Increase thresholds: Raise warning_threshold and critical_threshold
2. Narrow transitions: Use specific values instead of wildcards
3. Add exceptions: Add `exclude` steps to the `transitions` pipeline for specific cases
4. Narrow field scope: Select `group.field` entries instead of `*.*`

### Issue 4: Configuration Changes Not Taking Effect

**Symptom:** Modifications to Excel file don't affect results

**Causes:**
1. File not saved
2. Regression_Check sheet not loaded
3. Old configuration still in memory

**Solution:**
1. Save Excel file (Ctrl+S)
2. Restart Python script
3. Verify sheet name is exactly "Regression_Check"
4. Check file path is correct

### Issue 5: User Confirmation Not Appearing

**Symptom:** Expected prompt for critical issues doesn't show

**Causes:**
1. Issues are at warning level, not critical
2. Thresholds higher than actual counts
3. Running in check-only mode (no export decision needed)

**Solution:**
1. Verify thresholds: warning_threshold ≤ critical_threshold
2. Check actual violation counts
3. Run normal mode (not check-only)

### Issue 6: Comparison Mode Showing Unexpected Differences

**Symptom:** `--check-only file1 file2` reports many changes

**Causes:**
1. Files are from different collection dates (expected)
2. Configuration changed between collections (expected)
3. Field order or grouping changed (might be false positive)

**Solution:**
1. Review reported changes manually
2. Check if changes are expected (new patient data added)
3. Verify no data corruption occurred
4. Compare file sizes and counts manually

---

## Performance Considerations

### Regression Check Execution Time

**Factors Affecting Performance:**

```
1. Number of Inclusions (patients)
   - N patients = O(N) iterations
   - Typical: 1200 patients = 1-2 seconds

2. Number of Rules
   - R rules applied to each inclusion
   - Typical: 20-30 rules = <100ms total

3. Field Matching Complexity
   - Filter evaluation per field
   - Dotted-notation parsing: O(1) per field
   - Typical: <50ms for all rules

4. Total Typical Time
   - 1200 inclusions × 25 rules = 1-3 seconds
```

### Optimization Tips

**If Regression Check is Slow:**

1. **Reduce rule count:**
   - Remove inactive rules (add "ignore" label)
   - Combine similar rules

2. **Simplify field filters:**
   - Use `*.*` or `group.*` selectors instead of long lists of individual fields
   - Prefer a few include steps over many exclude steps

3. **Narrow transitions:**
   - Use specific values instead of wildcards
   - Reduce number of transition pairs

4. **Consider file size:**
   - Large JSON files (>20MB) take longer to parse
   - This is rare and usually not the bottleneck

---

## Summary

The Quality Checks System provides:

✅ **Multi-Level Validation:** Coherence + Regression checks
✅ **Config-Driven Rules:** No code changes needed
✅ **Flexible Thresholds:** Warning and Critical levels
✅ **Rich Filtering:** Group, field, and dotted-notation (`group.field`) support
✅ **Transition Patterns:** Wildcard, keyword, and specific matching
✅ **Advanced Exception Handling:**
   - Multiple transitions per exception: `[[old1, new1], [old2, new2], ...]`
   - Include + Exclude can coexist simultaneously
   - Fine-grained control over allowed/blocked transitions

✅ **Backward Compatible:** Legacy single-transition format still supported
✅ **Debug Support:** Detailed logging and debug mode
✅ **Execution Modes:** Normal, check-only, compare, debug

This architecture enables robust data quality monitoring without code modifications, so business analysts can define and evolve validation rules independently.

---

**Document End**