Endobest Quality Checks & Regression Testing Guide
Part 3: Quality Assurance, Validation Rules & Configuration
Document Version: 3.1 (Updated with new Excel export module reference)
Last Updated: 2025-11-08
Audience: Developers, Business Analysts, QA Engineers
Language: English
Note: Excel export functionality now available - see DOCUMENTATION_13_EXCEL_EXPORT.md, DOCUMENTATION_98_USER_GUIDE.md, and DOCUMENTATION_99_CONFIG_GUIDE.md
Version History
Version 3.0 (2025-10-22) - UNIFIED FIELD SELECTION PIPELINE
Complete Refactorization of Field Selection
- ✅ Merged Columns: field_group (F) + field_name (G) → single field_selection (F)
- ✅ Simplified Syntax: Field selection uses same pipeline format as transitions: [["action", "field_selector"], ...]
- ✅ 3 Selector Patterns: *.* (all fields), group.* (group), group.field (specific)
- ✅ Cleaner Code: Removed 150+ lines of dual-filter logic (field_group + field_name combinations)
- ✅ Config-Driven Keys: Key field determination (Patient_Id, Pseudo) now read from field_selection instead of hardcoded
- ✅ Unified Key Detection: New _get_key_field_from_new_inclusions_rule() applies field_selection pipeline directly to first inclusion (15 LOC, -75% vs manual parsing)
- ✅ Helper Functions: _apply_field_selection_pipeline(), _get_key_field_from_new_inclusions_rule(), _build_candidate_fields()
- ⚠️ MAJOR Breaking Change: Old field_group and field_name columns (F, G) are removed
- ⚠️ Column Shifts: bloc_scope moves H→G, transitions moves I→H
- ⚠️ Configuration Migration Required: Completely restructure Excel Regression_Check sheet
Technical Details:
- Field selection pipeline starts with empty set, each step adds/removes fields
- Responsibility on admin to order rules correctly (no implicit logic)
- Special rules "New Fields", "Deleted Fields", "Deleted Inclusions" must have empty field_selection
- Special rule "New Inclusions" applies field_selection pipeline to first inclusion sample (assumes stable structure)
- Key field detection: finds first field from pipeline that has non-null value in both first new and old inclusion
- Configuration validation: missing/invalid field_selection = CRITICAL error
Removed Dead Code:
- _determine_key_field() - hardcoded Patient_Id/Pseudo logic
- _matches_field_group_filter() - replaced by pipeline
- _matches_field_name_filter() - replaced by pipeline
- _determine_key_field_from_config() - replaced by simplified unified _get_key_field_from_new_inclusions_rule()
Version 2.0 (2025-10-22) - Pipeline Architecture
Transitions Pipeline Introduced
- ✅ Unified Format: Merged transitions + transition_exceptions into single transitions column
- ✅ Simplified Syntax: Each step is a 4-element array [action, field_selector, from, to]
- ✅ Sequential Processing: Pipeline steps applied in order, allowing fine-grained control
- ✅ Better Determinism: All sets sorted for reproducible logs
- ✅ Improved Error Handling: Invalid configs silently skipped with warnings
- ⚠️ Breaking Change: Old transition_exceptions column (J) merged into transitions (I)
Version 1.0 (2025-10-21) - Initial Release
- Dual-column system: transitions (I) + transition_exceptions (J)
- Include/exclude exception handling
- Multiple transition support per exception
Table of Contents
- Overview
- Quality Assurance Strategy
- Coherence Check (Technical Details)
- Non-Regression Check Framework
- Regression Check Configuration File
- Column Reference
- Special Keywords & Wildcards
- Rule Types & Logic
- Field Selection Pipeline
- Transition Patterns
- Exception Handling
- Configuration Examples
- User Guide: Adding/Modifying Rules
- Execution Modes
- Troubleshooting
⚠️ CRITICAL - Version 3.0 Migration Required
This document describes v3.0 with BREAKING CHANGES from v2.0
| Item | v2.0 | v3.0 |
|---|---|---|
| Excel Columns F-I | field_group, field_name, bloc_scope, transitions | field_selection, bloc_scope, transitions |
| Column Count | 4 columns for filtering+transitions | 3 columns (merged field_selection) |
| Key Field Config | Hardcoded (Patient_Id/Pseudo) | Config-driven (from field_selection) |
| Field Filtering Logic | 6+ combinations (complex) | Single pipeline (simple) |
ACTION REQUIRED:
- ✅ Update Excel file column positions
- ✅ Migrate field_group + field_name → field_selection
- ✅ Run non-regression tests
- ✅ Verify key field detection works with new config
Overview
The Quality Checks System provides comprehensive data validation in two stages:
- Coherence Check: Verifies that organization statistics (API counters) match the actual detailed inclusion data
- Non-Regression Check: Detects unexpected data changes between current and previous collection runs
Both checks are configurable via Excel with Warning/Critical severity levels that can trigger user confirmation prompts.
Design Philosophy
Trust, but Verify
- Trust: API data is generally reliable
- Verify: Statistical consistency and change detection
- Report: Multi-level severity (OK, Warning, Critical)
- Decide: User confirmation before export on critical issues
Quality Assurance Strategy
Workflow Integration
Data Collection
↓
QUALITY CHECKS
├─ COHERENCE CHECK (mandatory)
│ ├─ Load organization statistics from API responses
│ ├─ Calculate actual counts from detailed inclusions
│ └─ Compare: Stats vs. Actual
│
├─ NON-REGRESSION CHECK (if old file exists)
│ ├─ Load previous inclusions (_old file)
│ ├─ Apply config-driven comparison rules
│ └─ Report: Changes matching configured patterns
│
└─ RESULT
├─ has_coherence_critical flag
└─ has_regression_critical flag
↓
IF critical issues detected:
├─ Display warning: ⚠ CRITICAL
├─ Ask user: "Write results anyway?"
├─ If NO → Abort export, preserve old files
└─ If YES → Continue with export (user override)
ELSE:
└─ Continue with export automatically
Severity Levels
| Level | Display | Meaning | Action |
|---|---|---|---|
| OK | ✓ Green | No issues, within normal range | Continue automatically |
| WARNING | ⚠ Yellow | Issue detected, exceeds warning threshold | Log and display, continue automatically |
| CRITICAL | ✗ Red | Severe issue, exceeds critical threshold | Display, ask user before export |
User Interaction
Quality Checks Complete
✗ [red]Coherence Check: CRITICAL[/red]
⚠ [yellow]Organization 1 mismatch: 95 vs 98[/yellow]
✗ [red]Non-Regression: CRITICAL[/red]
⚠ [yellow]New Inclusions: 42 (threshold 50)[/yellow]
✗ [red]Deleted Inclusions: 15 (threshold 0)[/red]
[bold]⚠ CRITICAL issues detected in quality checks![/bold]
Do you want to write the results anyway? [y/N]:
y → Export anyway (risky, user override)
n → Cancel export (preserve old files)
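The decision step at the end of the workflow can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `should_export` and the injectable `ask` callable are hypothetical, and accepting "yes" in addition to "y" is an assumption.

```python
def should_export(has_coherence_critical, has_regression_critical, ask=input):
    """Return True if results should be written, False to abort the export.

    `ask` is a hypothetical hook (defaults to input) so the prompt can be tested.
    """
    # No critical issue in either check: export continues automatically.
    if not (has_coherence_critical or has_regression_critical):
        return True
    # Critical issue: the user decides; the default answer (empty input) is "no".
    answer = ask("Do you want to write the results anyway? [y/N]: ")
    return answer.strip().lower() in ("y", "yes")
```

Passing a stub such as `ask=lambda prompt: "n"` makes the prompt path testable without a terminal.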
Coherence Check (Technical Details)
Purpose
Verify that organization statistics (fetched from API) match actual detailed data (inclusion-by-inclusion count).
Data Sources
Source 1: Organization Statistics (API)
For each organization:
GET /api/inclusions/inclusion-statistics
Returns:
{
"totalInclusions": N, // Total patients
"preIncluded": P, // Pré-inclus count
"included": I, // Inclus count
"prematurelyTerminated": T // Prematurely terminated
}
Source 2: Inclusion Details (JSON Array)
For each patient in endobest_inclusions:
Check: Patient_Identification.Organisation_Id
Count: Based on Inclusion.Inclusion_Status
Classification rules:
1. If status ends with " - AP" → prematurely_terminated
2. Else if status starts with "pré-inclus" → preincluded
3. Else if status starts with "inclus" → included
Always count: patients += 1
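The classification rules above could be implemented along these lines. This is a sketch under stated assumptions: the nesting `Inclusion.Inclusion_Status` and `Patient_Identification.Organisation_Id` follows the field names quoted above, the prefix comparison is assumed to be case-insensitive, and `classify_status` is a hypothetical helper name.

```python
from collections import Counter

def classify_status(status):
    """Map an Inclusion_Status string to a counter name, or None if no rule applies."""
    text = (status or "").strip()
    lowered = text.lower()  # assumption: prefixes are compared case-insensitively
    if text.endswith(" - AP"):
        return "prematurely_terminated"
    if lowered.startswith("pré-inclus"):
        return "preincluded"
    if lowered.startswith("inclus"):
        return "included"
    return None

def calculate_detail_counters(inclusions, org_id=None):
    """Count patients and statuses, optionally restricted to one organization."""
    counters = Counter(patients=0, preincluded=0, included=0, prematurely_terminated=0)
    for inclusion in inclusions:
        ident = inclusion.get("Patient_Identification", {})
        if org_id is not None and ident.get("Organisation_Id") != org_id:
            continue
        counters["patients"] += 1  # every patient is counted, whatever the status
        bucket = classify_status(inclusion.get("Inclusion", {}).get("Inclusion_Status"))
        if bucket:
            counters[bucket] += 1
    return dict(counters)
```

The " - AP" test comes first on purpose: a status such as "Inclus - AP" would otherwise be counted as included.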
Validation Logic
def coherence_check(current_inclusions, organizations_list):
    has_critical = False
    template = "({patients}/{preincluded}/{included}/{prematurely_terminated})"

    # STEP 1: Collect statistics from API
    total_stats = {
        'patients': sum(org['patients_count'] for org in organizations_list),
        'preincluded': sum(org['preincluded_count'] for org in organizations_list),
        'included': sum(org['included_count'] for org in organizations_list),
        'prematurely_terminated': sum(org['prematurely_terminated_count'] for org in organizations_list)
    }

    # STEP 2: Calculate actual counts from detailed data
    # (dict with the same 4 keys as total_stats)
    total_detail = calculate_detail_counters(current_inclusions, org_id=None)

    # STEP 3: Compare all 4 counters
    is_match = all(total_stats[key] == total_detail[key] for key in total_stats)

    # STEP 4: Report total comparison
    if is_match:
        print("✓ [green]TOTAL matches[/green]")
    else:
        print("✗ [red]TOTAL mismatch[/red]")
        print("Stats" + template.format(**total_stats) + " vs Detail" + template.format(**total_detail))
        has_critical = True

        # STEP 5: Detail-level comparison (only if the total is not OK)
        # The 'id' and 'name' keys are assumed; adapt them to the organization records.
        for org in organizations_list:
            org_stats = {key: org[key + '_count'] for key in total_stats}
            org_detail = calculate_detail_counters(current_inclusions, org_id=org['id'])
            if org_stats != org_detail:
                print('⚠ [yellow]Organization "' + org['name'] + '" mismatch[/yellow]')
                print("Stats" + template.format(**org_stats) + " vs Detail" + template.format(**org_detail))
                has_critical = True

    return has_critical
Example Output
Scenario: Perfect Match
═══ Coherence Check ═══
✓ [green]TOTAL - Stats(150/20/120/10) vs Detail(150/20/120/10)[/green]
Scenario: Mismatch Detected
═══ Coherence Check ═══
✗ [red]TOTAL - Stats(150/20/118/10) vs Detail(150/20/120/10)[/red]
⚠ [yellow]Center A - Stats(50/5/40/5) vs Detail(50/5/42/5)[/yellow]
⚠ [yellow]Center B - Stats(100/15/78/5) vs Detail(100/15/78/5)[/yellow]
Interpretation
Match (Green):
API statistics perfectly align with detailed data
→ No data collection issues
→ Continue processing
Minor Mismatch (Yellow):
1-2 patients differ between statistics and details
→ Possible API consistency issue
→ Monitor but continue (it happens occasionally)
Major Mismatch (Red):
10+ patients difference
→ Significant data collection issue
→ Investigate root cause
→ Consider re-running collection
Non-Regression Check Framework
Purpose
Detect unexpected data changes between current and previous collections by comparing field values against configured transition patterns.
Architecture
Previous Inclusions (File)
↓
┌─────────────────────────────┐
│ NON-REGRESSION CHECK │
├─────────────────────────────┤
│ 1. Load Regression Config │
│ (Excel: Regression_Check sheet)
│ │
│ 2. Build Inclusion Dicts │
│ Index by: Patient_Id or Pseudo
│ │
│ 3. Group Rules by Bloc │
│ - Structure │
│ - Identification │
│ - Inclusion Protocol │
│ - Endotest │
│ - Other Questionnaires │
│ │
│ 4. For Each Rule: │
│ a) Detect rule type │
│ - Normal rule │
│ - New Inclusions │
│ - Deleted Inclusions │
│ - New Fields │
│ - Deleted Fields │
│ │
│ b) Process rule logic │
│ - Collect candidates │
│ - Match transitions │
│ - Apply exceptions │
│ - Apply bloc_scope │
│ │
│ c) Calculate severity │
│ - Count vs thresholds │
│ - Determine status │
│ │
│ 5. Display Results │
│ - By bloc │
│ - Color-coded status │
│ - Detailed changes (debug)
│ │
└─────────────────────────────┘
↓
Current Inclusions (Memory)
Regression Check Configuration File
File Location & Sheet
Endobest_Dashboard_Config.xlsx
│
├─ Sheet 1: "Inclusions_Mapping" (See DOCUMENTATION_11_FIELD_MAPPING.md)
│
└─ Sheet 2: "Regression_Check"
├─ Row 1: Headers
└─ Row 2+: Rules
Sheet Structure (Version 3.0)
Row 1 (Headers):
A B C D E
ignore bloc_title line_label warning_threshold critical_threshold
F G H
field_selection bloc_scope transitions
Row 2+: Rule definitions (one per row)
BREAKING CHANGE (v3.0): Columns F and G from v2.0 (field_group and field_name) have been merged into single column F (field_selection). All subsequent columns shifted left by one position.
Color Coding:
- Yellow: Structure/Identification bloc (foundational rules)
- Blue: Inclusion Protocol bloc (inclusion status rules)
- Light Purple: Endotest bloc (test-related rules)
- White: Regular rules
- Red: Incomplete/error rules (missing required columns)
Column Reference
Column A: ignore
Type: String (optional)
Description: Skip this row if the cell contains "ignore" (case-insensitive)
Purpose: Comment out rules without deleting rows
Values:
ignore → Row is skipped
(empty) → Row is processed
any_other_text → Row is processed
Column B: bloc_title
Type: String (required)
Description: Logical grouping of related rules
Purpose: Visual organization and blocking/reporting
Valid Values:
Structure → File format and field availability rules
Identification → Patient identification changes
Inclusion Protocol → Inclusion status and protocol changes
Endotest → Laboratory test request changes
Other Questionnaires → Non-specific questionnaire changes
[Custom Group Names] → Any custom bloc name for organization
Rules Per Bloc:
Structure bloc (Example):
├─ New Fields
├─ Deleted Fields
└─ (Structure-specific rules)
Identification bloc:
├─ New Inclusions
├─ Deleted Inclusions
├─ Changed (Excluding Birthday)
├─ Changed Date of Birth/Age
└─ (Identification-specific rules)
Endotest bloc:
├─ Undefined to Defined (Only)
├─ Defined to Undefined
├─ Changed Value
└─ (Endotest-specific rules)
Column C: line_label
Type: String (required)
Description: Unique rule identifier within its bloc
Purpose: Displayed in output, identifies rule in reports
Examples:
New Inclusions
Deleted Inclusions
New Fields
Deleted Fields
Changed Value
Undefined to Defined (Only)
Requirements:
- Must be unique within bloc_title
- Should be descriptive
Column D: warning_threshold
Type: Numeric (required, >= 0)
Description: Count threshold that triggers WARNING level
Position: Column D (after line_label)
Logic:
IF count > warning_threshold AND count <= critical_threshold:
Status = WARNING (yellow ⚠)
Examples:
0 → Any change triggers warning (strict)
5 → 1-5 changes = OK, 6-10 = Warning
50 → 1-50 changes = OK, 51+ = Warning (lenient)
200 → Very lenient, only alert on large changes
Column E: critical_threshold
Type: Numeric (required, >= warning_threshold)
Description: Count threshold that triggers CRITICAL level
Position: Column E (after warning_threshold)
Logic:
IF count > critical_threshold:
Status = CRITICAL (red ✗)
→ May prompt user for confirmation
Relationship:
warning_threshold <= critical_threshold
Examples:
(0, 1) → Strict: 1 change = warning, 2+ = critical
(0, 50) → Any change warns: 1-50 = warning, 51+ = critical
(50, 100) → Normal operation: 0-50 OK, 51-100 warning, 101+ critical
(200, 200) → Same thresholds: jump directly from OK to critical
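The two threshold rules combine into a single status function. The sketch below follows the strict "greater than" comparisons given in the Logic blocks for columns D and E; the function name `severity` is hypothetical.

```python
def severity(count, warning_threshold, critical_threshold):
    """Return "OK", "WARNING" or "CRITICAL" for a count of matching inclusions."""
    if warning_threshold < 0 or critical_threshold < warning_threshold:
        raise ValueError("expected 0 <= warning_threshold <= critical_threshold")
    # Both comparisons are strict: a count equal to a threshold stays below it.
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"
```

Because the comparisons are strict, a count equal to a threshold never triggers that level, and equal thresholds leave no room for a WARNING status.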
Column F: field_selection (NEW - v3.0)
Type: JSON array of 2-element arrays (mandatory for most rules)
Description: Pipeline-based field selection using include/exclude actions
Position: Column F (after critical_threshold) - REPLACES old field_group + field_name
Rules:
- Format: [["action", "field_selector"], ["action", "field_selector"], ...]
- Mandatory: For all rules EXCEPT "New Fields", "Deleted Fields", "Deleted Inclusions"
- For special rules: Must be empty [] or null
- Explicit: No implicit logic - admin must order steps correctly
- Pipeline: Starts with empty set, each step adds or removes fields
Elements:
| Element | Type | Valid Values | Example |
|---|---|---|---|
| action | String | "include" or "exclude" | "include" |
| field_selector | String | *.*, group.*, group.field | "Endotest.Request_Sent" |
Selector Patterns (3 only):
*.* → All fields in all groups
group.* → All fields in specific group (e.g., "Endotest.*")
group.field → Specific field only (e.g., "Endotest.Request_Sent")
Examples:
1. Include Single Group
[["include", "Endotest.*"]]
// All Endotest fields
2. Include Multiple Groups
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Endotest AND Inclusion fields
3. Include All, Exclude Some
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// All fields EXCEPT Endotest.Last_Updated
4. Key Field Selection (for "New Inclusions" rule)
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Tries Patient_Id first, then Pseudo (in order)
5. Complex Pipeline
[
["include", "*.*"],
["exclude", "Inclusion.*"],
["exclude", "Patient_Identification.*"]
]
// All fields EXCEPT Inclusion and Patient_Identification
Special Rules (field_selection must be EMPTY):
"New Fields" → [] or null
"Deleted Fields" → [] or null
"Deleted Inclusions" → [] or null
Validation:
- ✅ Missing or null field_selection for normal rules → CRITICAL ERROR
- ✅ Invalid selector (no dot) → CRITICAL ERROR
- ✅ Non-list format → CRITICAL ERROR, skip rule
- ✅ Step with wrong element count → CRITICAL ERROR, skip rule
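The validation rules above might be checked as in this sketch. It is not the tool's own validator: `validate_field_selection` is a hypothetical name, the cell is assumed to arrive as a JSON string (or None when empty), and the error messages are illustrative.

```python
import json

SPECIAL_EMPTY_RULES = ("New Fields", "Deleted Fields", "Deleted Inclusions")
VALID_ACTIONS = ("include", "exclude")

def validate_field_selection(line_label, raw_cell):
    """Return a list of error messages; an empty list means the cell is valid."""
    if raw_cell is None or str(raw_cell).strip() == "":
        steps = None
    else:
        try:
            steps = json.loads(raw_cell)
        except (TypeError, ValueError):
            return ["field_selection is not valid JSON"]
    if line_label in SPECIAL_EMPTY_RULES:
        return [] if not steps else ["field_selection must be empty for this special rule"]
    if steps is None:
        return ["field_selection is missing"]
    if not isinstance(steps, list):
        return ["field_selection must be a list of steps"]
    errors = []
    for index, step in enumerate(steps, 1):
        if not isinstance(step, list) or len(step) != 2:
            errors.append(f"step {index}: expected [action, field_selector]")
            continue
        action, selector = step
        if action not in VALID_ACTIONS:
            errors.append(f"step {index}: invalid action {action!r}")
        if not isinstance(selector, str) or "." not in selector:
            errors.append(f"step {index}: invalid selector {selector!r} (no dot)")
    return errors
```

Per the rules above, any returned message would be reported as a CRITICAL configuration error.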
Column G: bloc_scope (moved from H - v3.0)
Type: String enum (optional, default: "any")
Description: Aggregation logic for matching fields within an inclusion
Position: Column G (after field_selection)
Valid Values:
"any" → At least ONE field must match transitions
"all" → ALL changed fields must match transitions
Logic:
bloc_scope = "any" (Default)
IF ANY candidate field has matching transition:
RETURN inclusion matches rule
Use for: "Alert if any change occurs"
bloc_scope = "all"
IF ALL changed fields have matching transitions:
RETURN inclusion matches rule
Use for: "Alert only if all changes match pattern"
Example Comparison:
Inclusion with 5 fields in scope:
Field1: Changed, matches transition ✓
Field2: Unchanged (always ignored)
Field3: Changed, does NOT match transition ✗
Field4: Unchanged (always ignored)
Field5: Changed, matches transition ✓
Changed fields: [Field1, Field3, Field5]
Matched changed: [Field1, Field5]
Result with bloc_scope="any": ✓ COUNT (Field1 matched)
Result with bloc_scope="all": ✗ SKIP (Field3 didn't match)
| Scenario | bloc_scope="any" | bloc_scope="all" |
|---|---|---|
| 1 match, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 1 match, 1 mismatch | ✓ COUNT | ✗ SKIP |
| 0 matches, 1 mismatch | ✗ SKIP | ✗ SKIP |
| 3 matches, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 3 matches, 1 mismatch | ✓ COUNT | ✗ SKIP |
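The table reduces to a small set comparison. In this sketch the function name `inclusion_matches` and its arguments are hypothetical; unchanged fields are assumed to have been filtered out before the call.

```python
def inclusion_matches(changed_fields, matched_fields, bloc_scope="any"):
    """Decide whether an inclusion counts for a rule.

    changed_fields: fields in scope whose value changed.
    matched_fields: changed fields whose transition matched the pipeline.
    """
    changed = set(changed_fields)
    matched = set(matched_fields) & changed
    if not matched:
        return False  # zero matches never counts, whatever the scope
    if bloc_scope == "any":
        return True
    if bloc_scope == "all":
        return matched == changed  # every changed field must have matched
    raise ValueError(f"unknown bloc_scope: {bloc_scope!r}")
```

With the five-field example above, `changed_fields` is Field1, Field3 and Field5 and `matched_fields` is Field1 and Field5, so the inclusion counts under "any" but not under "all".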
Column H: transitions (moved from I - v3.0)
Type: JSON array of 4-element arrays (optional)
Description: Pipeline-based transition rules (old_value → new_value)
Position: Column H (after bloc_scope)
Format: [["action", "field_selector", "from_pattern", "to_pattern"], ...]
- Each step is exactly 4 elements
- If None/empty: Rule applies to ALL field changes
- Supports wildcard keywords: *undefined, *defined, *
- Supports literal values for exact matching
Pipeline Concept (v2.0+):
Initial state: All changed fields → is_checked = False
Step 1: Include rule for all fields (*.*) with *defined→*defined
└─ is_checked = True if transition matches
Step 2: Include rule for Endotest.Diagnostic_Status with waiting→*undefined
└─ is_checked = True (whitelisted exception)
Step 3: Exclude rule for Endotest.Request_Sent with false→true
└─ is_checked = False (blacklisted exception)
Final result: Only fields matching the pipeline are checked
Syntax: 4-Element Pipeline Array
Each pipeline step is a 4-element array:
[action, field_selector, from_pattern, to_pattern]
| Element | Description | Examples |
|---|---|---|
| action | "include" (whitelist) or "exclude" (blacklist) | "include", "exclude" |
| field_selector | Which fields this step applies to | "*.*", "group.*", "group.field" |
| from_pattern | Old value pattern to match | "*undefined", "*defined", "*", literal value |
| to_pattern | New value pattern to match | "*undefined", "*defined", "*", literal value |
Important: The syntax is strictly enforced - each step must have exactly 4 elements. No shortcuts or variants are accepted.
Field Selector Patterns
*.* → All fields in all groups
group.* → All fields in specific group (e.g., "Endotest.*")
group.field → Specific field only (e.g., "Endotest.Request_Sent")
Complete Examples
Example 1: Simple All-Fields Rule (Most Common)
{
"transitions": [
["include", "*.*", "*defined", "*defined"]
]
}
// Pipeline: Include all fields that change between two defined values
Example 2: Main Rule + One Include Exception
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"]
]
}
// Step 1: Include all *defined→*defined changes
// Step 2: ALSO include specific Endotest.Diagnostic_Status changes from waiting to undefined
Example 3: Main Rule + Include Exception + Exclude Exception
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"],
["exclude", "Endotest.Request_Sent", false, true]
]
}
// Step 1: Include all *defined→*defined
// Step 2: Include Diagnostic_Status waiting→undefined (whitelist)
// Step 3: Exclude Request_Sent false→true (blacklist)
// Result: Step 3 overrides Step 1 for that specific field+transition
Example 4: Multiple Include Steps for Different Fields
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["include", "GDD.Status", "pending", "completed"],
["include", "GDD.Status", "pending", "failed"]
]
}
// Step 1: Include all *defined→*defined changes
// Step 2: Include GDD.Status pending→completed
// Step 3: Include GDD.Status pending→failed
Example 5: Exclude Rule with Wildcard
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["exclude", "Endotest.Last_Modified", "*", "*"]
]
}
// Include all changes EXCEPT any change to Last_Modified field
Processing Logic
The pipeline is executed sequentially, with each step modifying the is_checked status in-place:
1. Initialize: All changed fields have is_checked = False
2. For each transition step in order:
a. Check if the current field matches the field_selector
b. If yes: Check if the old→new values match from_pattern→to_pattern
c. If yes:
- If action="include": Set is_checked = True
- If action="exclude": Set is_checked = False
d. If no: Leave is_checked unchanged
3. Final: Only fields with is_checked = True are counted as matching
Important: Later steps can override earlier steps. Example:
[
["include", "*.*", "*", "*"], // Step 1: include everything
["exclude", "Field.X", "*", "*"] // Step 2: exclude Field.X (overrides Step 1)
]
Result: Everything is included EXCEPT Field.X
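The processing logic above can be sketched for a single changed field as follows. All function names are hypothetical, the value matcher is a compact reading of the keyword rules, and treating an empty pipeline as "every change is checked" follows the column description.

```python
def selector_matches(selector, group, field):
    """Match "*.*", "group.*" or "group.field" against one field."""
    sel_group, _, sel_field = selector.partition(".")
    return sel_group in ("*", group) and sel_field in ("*", field)

def value_matches(value, pattern):
    """Compact keyword/literal matcher (assumption: booleans only match booleans)."""
    undefined = value is None or value == "" or value == "undefined"
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return undefined
    if pattern == "*defined":
        return not undefined
    return isinstance(value, bool) == isinstance(pattern, bool) and value == pattern

def is_checked(group, field, old_value, new_value, transitions):
    """Run the transitions pipeline for one changed field and return is_checked."""
    if not transitions:
        return True  # None/empty pipeline: rule applies to all field changes
    checked = False
    for action, selector, from_pattern, to_pattern in transitions:
        if not selector_matches(selector, group, field):
            continue
        if value_matches(old_value, from_pattern) and value_matches(new_value, to_pattern):
            checked = (action == "include")  # a later step overrides earlier ones
    return checked
```

Because `checked` is simply overwritten by each matching step, the last matching step wins, which is what lets an exclude step cancel an earlier include.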
Configuration Error Handling
If a transitions step has invalid syntax:
- The rule is skipped and a yellow warning is logged
- No exception is thrown
- User can see the ⚠ warning in the output
- User can choose to save the report or fix the config
Valid and invalid syntax examples:
["include", "*.*", "*defined", "*defined"] // ✓ Exactly 4 elements
["include", "*.*", "*defined"] // ✗ Only 3 elements (INVALID)
["maybe", "*.*", "*defined", "*defined"] // ✗ Invalid action (INVALID)
["include", "invalid", "*defined", "*defined"] // ✗ No dot in selector (INVALID)
Special Keywords & Wildcards
This section documents the special keywords and patterns used in transition specifications throughout the configuration.
Keywords in Transition Patterns
The regression check configuration supports special keywords with * prefix for flexible transition matching:
Keyword 1: *undefined
Meaning: Matches any "undefined-like" value
Matches:
- null (None in Python)
- "" (empty string)
- "undefined" (literal string)
Example:
{
"transitions": [["include", "*.*", "*undefined", "*defined"]]
}
// Matches: undefined → Active, null → 42, "" → true, etc.
Use Case: Detect when a field gets populated for the first time
Keyword 2: *defined
Meaning: Matches any "defined" value (opposite of *undefined)
Matches: Anything EXCEPT:
- null (None)
- "" (empty string)
- "undefined" (literal string)
Example:
{
"transitions": [["include", "*.*", "*defined", "*undefined"]]
}
// Matches: Active → null, 42 → "", true → "undefined", etc.
Use Case: Detect when a field loses its value
Keyword 3: * (Wildcard)
Meaning: Matches absolutely any value
Matches: Any value including:
- Defined values (strings, numbers, booleans)
- Undefined-like values (null, "", "undefined")
- Objects, arrays, etc.
Example:
{
"transitions": [["include", "*.*", "*", "*"]]
}
// Matches: ANY old value → ANY new value
// Essentially: "any change at all"
Use Case: Monitor all changes to a field, filter out specific cases with exceptions
Combining Keywords with Literal Values
Patterns can mix keywords and literal values:
| Pattern | Meaning |
|---|---|
| ["*undefined", "*defined"] | Undefined → Defined (field becomes populated) |
| ["*defined", "*undefined"] | Defined → Undefined (field gets cleared) |
| ["*defined", "*defined"] | Value change while staying defined (actual value change required) |
| ["*", "*"] | Any change at all |
| ["Active", "*defined"] | From literal "Active" to any defined value |
| ["*undefined", "Active"] | From undefined to literal "Active" |
Literal Values (No * Prefix)
Any value that does NOT start with * is treated as a literal value and matched exactly:
{
"transitions": [
["include", "*.*", "pending", "accepted"], // Exact string match
["include", "*.*", false, true], // Exact boolean match
["include", "*.*", 0, 1], // Exact numeric match
["include", "*.*", null, "Active"], // null matches null, "Active" matches "Active"
["include", "*.*", "undefined", "Done"] // "undefined" (literal string) matches "undefined"
]
}
Important: Literal values are matched by exact equality, including:
- "undefined" - matches the exact string "undefined" (not undefined state)
- null - matches null values
- "" - matches empty string
Summary Table: Special Keywords in Transitions
| Keyword | Matches | Use Case |
|---|---|---|
| *undefined | null, "", "undefined" (any undefined-like value) | Detect when field becomes populated |
| *defined | Any defined value (NOT null, "", "undefined") | Detect when field loses value |
| * | Any value whatsoever | Alert on any change; use with exceptions for fine control |
| (no * prefix) | Exact literal values | Specific value matching (e.g., "pending" → "accepted") |
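The keyword table can be turned into a small matcher. This is a sketch: the function names are hypothetical, and the refusal to match a boolean against a number is an assumption made so that Python's `False == 0` does not blur the "type must match" rule for literals.

```python
def is_undefined(value):
    """True for the three undefined-like values: None, "" and "undefined"."""
    return value is None or value == "" or value == "undefined"

def matches_pattern(value, pattern):
    """Match one value against a keyword (*undefined, *defined, *) or a literal."""
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return is_undefined(value)
    if pattern == "*defined":
        return not is_undefined(value)
    # Literal: exact equality, never mixing booleans with other types.
    if isinstance(value, bool) != isinstance(pattern, bool):
        return False
    return value == pattern

def matches_transition(old_value, new_value, from_pattern, to_pattern):
    """A transition only matches when the value actually changed."""
    return (
        old_value != new_value
        and matches_pattern(old_value, from_pattern)
        and matches_pattern(new_value, to_pattern)
    )
```

Note the difference between the literal "undefined", which only matches that exact string, and the keyword *undefined, which also matches null and the empty string.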
Rule Type 1: Standard Rules (Normal Comparison)
Purpose: Detect field value changes matching configured patterns
Processing Steps:
Step 1: Collect Candidate Fields
├─ Apply the field_selection pipeline (column F) to the available fields
└─ Result: List of (group_name, field_name) tuples
Step 2: For Each Candidate Field
├─ Get new_value and old_value
├─ Check if transition matches (if transitions specified)
├─ Apply exceptions (include/exclude)
├─ Mark as "checked" if matches
Step 3: Apply bloc_scope
├─ With "any": Count inclusion if ANY field is checked
├─ With "all": Count inclusion if ALL changed fields are checked
Step 4: Report Matching Inclusions
└─ Count vs. thresholds (warning/critical)
Example Configuration:
{
"bloc_title": "Inclusion Protocol",
"line_label": "Undefined to Defined (Only)",
"warning_threshold": 0,
"critical_threshold": 200,
"field_selection": [["include", "Inclusion.*"]],
"bloc_scope": "all",
"transitions": [
["include", "*.*", "*undefined", "*defined"]
]
}
Rule Type 2: New Inclusions
Purpose: Count patients that exist in current data but not in previous
Syntax:
{
"bloc_title": "Identification",
"line_label": "New Inclusions",
"warning_threshold": 0,
"critical_threshold": 50,
"field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
"bloc_scope": null,
"transitions": []
}
Note: For special rules like "New Inclusions", transitions can be left as empty array [] since these rules don't use transition matching.
Processing:
1. Build dictionaries indexed by key field
- Key field candidates: Patient_Id, Pseudo (tried in order)
- key_dict_new = {patient_key: patient_data for patient in current}
- key_dict_old = {patient_key: patient_data for patient in previous}
2. Find new inclusions
new_keys = set(key_dict_new.keys()) - set(key_dict_old.keys())
count = len(new_keys)
3. Compare to thresholds
IF count > critical_threshold: CRITICAL
ELIF count > warning_threshold: WARNING
ELSE: OK
Example Output:
✓ [green]New Inclusions: 0[/green]
(No new patients added)
⚠ [yellow]New Inclusions: 42[/yellow]
(42 new patients - warning threshold exceeded)
✗ [red]New Inclusions: 75[/red]
(75 new patients - exceeds critical threshold of 50)
Rule Type 3: Deleted Inclusions
Purpose: Count patients that exist in previous but not in current
Syntax:
{
"bloc_title": "Identification",
"line_label": "Deleted Inclusions",
"warning_threshold": 0,
"critical_threshold": 0,
"field_selection": [],
"bloc_scope": null,
"transitions": []
}
Processing:
1. Build dictionaries (same as New Inclusions)
2. Find deleted inclusions
deleted_keys = set(key_dict_old.keys()) - set(key_dict_new.keys())
count = len(deleted_keys)
3. Compare to thresholds
IF count > critical_threshold: CRITICAL
ELIF count > warning_threshold: WARNING
ELSE: OK
Note: Typically critical_threshold=0 because any deletion is concerning.
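Key field detection and the two set differences used by "New Inclusions" and "Deleted Inclusions" can be sketched together. The helper names are hypothetical; the candidates are assumed to be an ordered list of (group, field) pairs produced by the field_selection pipeline of the "New Inclusions" rule, and the keys are assumed to be strings.

```python
def detect_key_field(candidates, first_new, first_old):
    """Return the first (group, field) with a non-null value in both samples."""
    for group, field in candidates:
        new_value = first_new.get(group, {}).get(field)
        old_value = first_old.get(group, {}).get(field)
        if new_value is not None and old_value is not None:
            return group, field
    return None

def index_by_key(inclusions, key):
    """Build {key_value: inclusion}, skipping inclusions without a key value."""
    group, field = key
    return {
        inclusion[group][field]: inclusion
        for inclusion in inclusions
        if inclusion.get(group, {}).get(field) is not None
    }

def new_and_deleted(current, previous, key):
    """Return (new_keys, deleted_keys), sorted for reproducible logs."""
    key_dict_new = index_by_key(current, key)
    key_dict_old = index_by_key(previous, key)
    new_keys = key_dict_new.keys() - key_dict_old.keys()
    deleted_keys = key_dict_old.keys() - key_dict_new.keys()
    return sorted(new_keys), sorted(deleted_keys)
```

The lengths of the two returned lists are the counts compared against the warning and critical thresholds of each rule.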
Rule Type 4: New Fields
Purpose: Detect field names that appear in current but not in previous
Syntax:
{
"bloc_title": "Structure",
"line_label": "New Fields",
"warning_threshold": 0,
"critical_threshold": 1,
"field_selection": null,
"bloc_scope": null,
"transitions": []
}
Processing:
1. For each patient in common (present in both versions):
a) Get all groups and fields from current version
b) Get all groups and fields from previous version
c) Find new fields: current_fields - previous_fields
d) Qualified name: "group_name.field_name"
2. Count by field name
field_counts = {field_qualified_name: count_of_inclusions}
total_new_fields = len(field_counts)
3. Display results
For each new field:
"Inclusion.New_Field (42 inclusions)"
[count = number of inclusions that gained this field]
Example Output:
✓ [green]New Fields: 0[/green]
⚠ [yellow]New Fields: 2[/yellow]
Endotest.New_Request_Type (1 inclusion)
Inclusion.New_Status_Code (2 inclusions)
Rule Type 5: Deleted Fields
Purpose: Detect field names that exist in previous but not in current
Syntax:
{
"bloc_title": "Structure",
"line_label": "Deleted Fields",
"warning_threshold": 0,
"critical_threshold": 1,
"field_selection": null,
"bloc_scope": null,
"transitions": []
}
Processing: Same as "New Fields" but reversed:
deleted_fields = previous_fields - current_fields
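The "New Fields" and "Deleted Fields" processing can be sketched in one pass over the patients present in both versions. The function names are hypothetical, and each inclusion is assumed to be a dict of groups, each group being a dict of fields.

```python
from collections import Counter

def qualified_fields(inclusion):
    """Return the set of "group_name.field_name" strings of one inclusion."""
    return {
        f"{group}.{field}"
        for group, fields in inclusion.items()
        if isinstance(fields, dict)
        for field in fields
    }

def field_changes(key_dict_new, key_dict_old):
    """Count, per qualified field name, the inclusions that gained or lost it."""
    new_counts, deleted_counts = Counter(), Counter()
    # Only patients present in both versions are compared.
    for key in key_dict_new.keys() & key_dict_old.keys():
        current_fields = qualified_fields(key_dict_new[key])
        previous_fields = qualified_fields(key_dict_old[key])
        new_counts.update(current_fields - previous_fields)
        deleted_counts.update(previous_fields - current_fields)
    return dict(new_counts), dict(deleted_counts)
```

The number of keys in each returned dict is the value compared with the thresholds, and the per-field counts feed the "(N inclusions)" detail lines.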
Field Selection Pipeline (v3.0)
NEW APPROACH: Field selection now uses the same pipeline architecture as transitions.
Pipeline Ordering (Key Concept)
Start with an empty set of fields. Each step either includes or excludes fields:
candidate_fields = set() # Empty initially
# Step 1: Include all Endotest fields
for each field in all_fields:
if selector matches "Endotest.*":
candidate_fields.add(field)
# Step 2: Also include Inclusion.Status
for each field in all_fields:
if selector matches "Inclusion.Status":
candidate_fields.add(field)
# Step 3: But exclude Endotest.Last_Updated
for each field in all_fields:
if selector matches "Endotest.Last_Updated":
candidate_fields.discard(field)
# Result: Endotest.* + Inclusion.Status, except Endotest.Last_Updated
Simple Examples
Example 1: Single Group
[["include", "Endotest.*"]]
// Result: All Endotest fields
Example 2: Multiple Groups
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Result: All Endotest + all Inclusion fields
Example 3: Specific Fields
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Result: Only Patient_Id and Pseudo fields
Example 4: All Except Some
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// Result: All fields EXCEPT Endotest.Last_Updated
Example 5: Complex Selection
[
["include", "*.*"],
["exclude", "Patient_Identification.*"],
["exclude", "Inclusion.*"]
]
// Result: All fields EXCEPT Patient_Identification and Inclusion
Important Notes
- ✅ Order matters: Steps are applied sequentially
- ✅ Explicit: Admin responsible for correct pipeline
- ✅ No implicit AND/OR: Use multiple include steps for OR logic
- ✅ Deterministic: Sets sorted, reproducible results
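A runnable version of the field selection pipeline could look like this. It is a sketch, not the body of `_apply_field_selection_pipeline()`: the function names and the (group, field) tuple representation are assumptions.

```python
def selector_matches(selector, group, field):
    """Match "*.*", "group.*" or "group.field" against one field."""
    sel_group, _, sel_field = selector.partition(".")
    return sel_group in ("*", group) and sel_field in ("*", field)

def apply_field_selection(all_fields, steps):
    """Apply include/exclude steps in order, starting from an empty set.

    all_fields: iterable of (group, field) tuples available in the data.
    steps: list of [action, field_selector] pairs from column F.
    """
    candidates = set()
    for action, selector in steps:
        hits = {(g, f) for g, f in all_fields if selector_matches(selector, g, f)}
        if action == "include":
            candidates |= hits
        elif action == "exclude":
            candidates -= hits
        else:
            raise ValueError(f"invalid action: {action!r}")
    return sorted(candidates)  # sorted for reproducible results
```

Swapping the order of an exclude step and a broader include step changes the result, which is why step ordering is left to the admin.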
Transition Patterns
Pattern Matching Rules
Literal Value Matching
[
["active", "inactive"],
[true, false],
[0, 1]
]
// Match exact value changes
// Type must match (string vs. number vs. boolean)
Undefined Keyword
*undefined: Matches any undefined-like value
- null
- "" (empty string)
- "undefined"
*defined: Matches any defined value
- NOT null
- NOT ""
- NOT "undefined"
Examples:
[
["*undefined", "*defined"]
]
// Transition FROM any undefined TO any defined
[
["*defined", "*undefined"]
]
// Transition FROM any defined TO any undefined
[
["*defined", "*defined"]
]
// Transition FROM defined TO different defined
// (with actual value change check)
Wildcard Pattern
[
["*", "*"]
]
// Match ANY transition
// Useful for: "Alert on any change to this field"
Transition Combination Examples
Example 1: Detect New Values Only
{
"transitions": [["*undefined", "*defined"]]
}
// Alert when field goes from undefined to any value
// Ignore when field already had value
Example 2: Detect Value Reversal
{
"transitions": [
[true, false],
[false, true]
]
}
// Alert when boolean field toggles in either direction
Example 3: Detect Specific Status Change
{
"transitions": [
["pending", "approved"],
["pending", "rejected"]
]
}
// Alert when pending status changes to approved or rejected
// Ignore all other transitions
Example 4: Detect Anything But This
{
"transitions": [
["include", "*.*", "*", "*"],
["exclude", "Endotest.Last_Updated", "*", "*"]
]
}
// Alert on any field change
// EXCEPT exclude changes to Last_Updated
Exception Handling (Pipeline Architecture)
With the new unified pipeline format, exceptions are now just regular pipeline steps with different actions. This section explains the patterns.
Pattern 1: Simple Whitelist (Include Only)
Allow specific field/transition combinations:
{
"transitions": [
["include", "Request_Sent", false, true],
["include", "Diagnostic_Status", "warning", "complete"]
]
}
Logic:
Step 1: Include Request_Sent with false→true transition
Step 2: Include Diagnostic_Status with warning→complete
Result: ONLY these specific field+transition combinations are checked
Pattern 2: Simple Blacklist (Exclude Only)
Block specific field/transition combinations:
{
"transitions": [
["include", "*.*", "*", "*"],
["exclude", "Last_Updated", "*", "*"],
["exclude", "Endotest.Import_Time", "*", "*"]
]
}
Logic:
Step 1: Include all fields with any change (*→*)
Step 2: Exclude Last_Updated from being checked
Step 3: Exclude Endotest.Import_Time from being checked
Result: All fields EXCEPT Last_Updated and Import_Time
Pattern 3: Main Rule + Multiple Exceptions
Combine main transition rule with field-specific exceptions:
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["include", "Request_Sent", false, true],
["exclude", "Endotest.Last_Modified", "*", "*"]
]
}
Logic:
Step 1: Include fields that change between two defined values
Step 2: ALSO include Request_Sent changing from false to true (even if not *defined→*defined)
Step 3: But exclude any change to Last_Modified (overrides Step 1)
Result: *defined→*defined changes PLUS Request_Sent false→true, EXCEPT Last_Modified
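A transitions pipeline like Pattern 3 can be evaluated per changed field with a "last matching step decides" loop, which gives the same result as adding to and removing from a set. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that an unqualified selector such as `Request_Sent` matches that field name in any group.

```python
def _undefined(value):
    return value is None or value in ("", "undefined")


def _value_ok(pattern, value):
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return _undefined(value)
    if pattern == "*defined":
        return not _undefined(value)
    return type(pattern) is type(value) and pattern == value


def _field_ok(selector, qualified_field):
    # Assumption: an unqualified selector matches the field name in any group.
    group, _, field = qualified_field.partition(".")
    if "." not in selector:
        return selector == field
    sel_group, _, sel_field = selector.partition(".")
    return sel_group in ("*", group) and sel_field in ("*", field)


def change_is_selected(pipeline, qualified_field, old, new):
    """Run the steps in order; the last matching step decides the outcome."""
    selected = False
    for action, selector, from_pattern, to_pattern in pipeline:
        if (_field_ok(selector, qualified_field)
                and _value_ok(from_pattern, old)
                and _value_ok(to_pattern, new)):
            selected = (action == "include")
    return selected


pattern_3 = [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", False, True],
    ["exclude", "Endotest.Last_Modified", "*", "*"],
]
print(change_is_selected(pattern_3, "Inclusion.Status", "a", "b"))         # True
print(change_is_selected(pattern_3, "Endotest.Last_Modified", "t1", "t2"))  # False
```

The caller is expected to pass only fields whose value actually changed; this sketch does not check that `old` and `new` differ.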
Field Selector Formats in Pipeline
Simple field name (matches in any group):
{
"field_selector": "Status"
}
// Matches "Status" in any group
// But this is NOT pipeline syntax - use "*.*" with field matching instead
Better: Use qualified notation in field_selector:
["include", "Endotest.Request_Sent", false, true]
// Matches ONLY the Request_Sent field of the Endotest group
Full Specification:
{
"field": "Endotest.Request_Sent",
"transition": [false, true]
}
// Matches this specific field AND transition combination
Practical Examples with Pipeline
Example 1: Alert on Most Changes, Except System Fields
{
"transitions": [
["include", "*.*", "*", "*"],
["exclude", "Last_Updated", "*", "*"],
["exclude", "Last_Modified_By", "*", "*"],
["exclude", "Import_Timestamp", "*", "*"]
]
}
// Step 1: Include ANY field change
// Step 2-4: Exclude system timestamp/audit fields
Example 2: Alert on Undefined→Defined, Plus Status Reversals
{
"transitions": [
["include", "*.*", "*undefined", "*defined"],
["include", "Request_Status", "rejected", "submitted"]
]
}
// Step 1: Include when field goes from undefined to defined
// Step 2: ALSO include Request_Status: rejected → submitted (even if not undefined→defined)
Example 3: Complex Medical Rules with Multiple Conditions
{
"transitions": [
["include", "*.*", "*undefined", "*defined"],
["include", "Endotest.Test_Result", "pending", "completed"],
["include", "GDD.Status", "pending", "failed"],
["exclude", "Endotest.Last_Sync", "*", "*"]
]
}
// Step 1: Include main rule: undefined→defined
// Step 2: ALSO include Test_Result pending→completed
// Step 3: ALSO include GDD.Status pending→failed
// Step 4: But exclude any change to Last_Sync field
// Result: All matching transitions except Last_Sync changes
Example 4: Fine-Grained Control with Include + Exclude
{
"transitions": [
["include", "*.*", "*", "*"],
["include", "Status", "*undefined", "*defined"],
["include", "Status", "*defined", "*undefined"],
["exclude", "Last_Updated", "*", "*"],
["exclude", "Internal_Id", "*", "*"]
]
}
// Step 1: Include any change (baseline)
// Step 2-3: Specifically include Status becoming defined/undefined
// Step 4-5: Exclude Last_Updated and Internal_Id changes (override Step 1)
// Result: All changes EXCEPT Last_Updated/Internal_Id, plus Status transitions
Configuration Examples
Example 1: Monitor New Inclusions (v3.0)
Requirement: Alert if unexpected number of patients added
{
"ignore": null,
"bloc_title": "Identification",
"line_label": "New Inclusions",
"warning_threshold": 0,
"critical_threshold": 50,
"field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
"bloc_scope": null,
"transitions": []
}
Field Selection Logic:
Starts empty: candidate_fields = {}
Step 1: Include Patient_Identification.Patient_Id
Step 2: Include Patient_Identification.Pseudo
Result: [Patient_Identification.Patient_Id, Patient_Identification.Pseudo]
These become key candidates (tried in order)
Logic:
Count patients in current but not in previous
If count > 50: CRITICAL (too many new patients)
If count > 0: WARNING (any new patients)
If count == 0: OK
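The counting step can be sketched as a set difference on the key field. The sketch assumes each inclusion is a nested dict of the form `{group: {field: value}}` and simplifies key detection to "first candidate that is non-null in the first inclusion of both files". The function names are hypothetical and are not the project's `_get_key_field_from_new_inclusions_rule()`.

```python
def get_value(inclusion, qualified_field):
    """Read 'Group.Field' from a nested {group: {field: value}} inclusion dict."""
    group, _, field = qualified_field.partition(".")
    return (inclusion.get(group) or {}).get(field)


def find_key_field(candidates, current, previous):
    """Pick the first candidate that is non-null in the first inclusion of both files."""
    if not current or not previous:
        return None
    for candidate in candidates:
        if (get_value(current[0], candidate) is not None
                and get_value(previous[0], candidate) is not None):
            return candidate
    return None


def count_new_inclusions(candidates, current, previous):
    """Count inclusions whose key is present in current but absent from previous."""
    key_field = find_key_field(candidates, current, previous)
    if key_field is None:
        raise ValueError("No usable key field among the candidates")
    previous_keys = {get_value(inc, key_field) for inc in previous}
    current_keys = {get_value(inc, key_field) for inc in current}
    return len(current_keys - previous_keys)


candidates = ["Patient_Identification.Patient_Id", "Patient_Identification.Pseudo"]
previous = [{"Patient_Identification": {"Patient_Id": "P1", "Pseudo": "a"}}]
current = previous + [{"Patient_Identification": {"Patient_Id": "P2", "Pseudo": "b"}}]
print(count_new_inclusions(candidates, current, previous))  # 1
```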
Example 2: Detect Undefined→Defined Changes (v3.0)
Requirement: Alert if any field becomes defined
{
"bloc_title": "Inclusion Protocol",
"line_label": "Undefined to Defined",
"warning_threshold": 0,
"critical_threshold": 100,
"field_selection": [["include", "Inclusion.*"]],
"bloc_scope": "any",
"transitions": [
["include", "*.*", "*undefined", "*defined"]
]
}
Field Selection & Transitions:
Field Selection: Include all Inclusion fields
Transitions Pipeline:
Step 1: Include *.* *undefined→*defined
Result: Only undefined→defined changes
Logic:
For each inclusion:
Check each selected Inclusion.* field that changed
If any of these transitions is: undefined → defined:
COUNT this inclusion
If count > 100: CRITICAL
If count > 0: WARNING
Example 3: Strict All-Fields Completeness (v3.0)
Requirement: Ensure ALL changed fields follow undefined→defined pattern
{
"bloc_title": "Inclusion Protocol",
"line_label": "All Changes Undefined to Defined",
"warning_threshold": 0,
"critical_threshold": 200,
"field_selection": [["include", "Inclusion.*"]],
"bloc_scope": "all",
"transitions": [
["include", "*.*", "*undefined", "*defined"]
]
}
Key Difference with bloc_scope="all":
With bloc_scope="any": Count if ANY field matches
With bloc_scope="all": Count ONLY if ALL changed fields match
Logic:
For each inclusion:
Find all Inclusion fields that changed
Check if ALL changes are: undefined → defined
If all changed fields match pattern:
COUNT this inclusion (expected pattern)
If any changed field doesn't match:
SKIP (unexpected pattern)
If count > 200: CRITICAL (too many gaining data)
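The difference between the two scopes comes down to `any()` versus `all()` over the changed fields of one inclusion. A minimal sketch, with a hypothetical function name and assuming that an inclusion with no changed fields is never counted:

```python
def inclusion_is_counted(changed_field_matches, bloc_scope="any"):
    """changed_field_matches: one boolean per changed field (True = matches the transitions)."""
    if not changed_field_matches:
        return False  # nothing changed, nothing to count
    if bloc_scope == "any":
        return any(changed_field_matches)
    if bloc_scope == "all":
        return all(changed_field_matches)
    raise ValueError(f"Unknown bloc_scope: {bloc_scope!r}")


# Two fields changed; only the first follows undefined -> defined.
matches = [True, False]
print(inclusion_is_counted(matches, "any"))  # True
print(inclusion_is_counted(matches, "all"))  # False
```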
Example 4: Request Lifecycle Validation (v3.0)
Requirement: Detect expected test request state transitions
{
"bloc_title": "Endotest",
"line_label": "Request Status Changes",
"warning_threshold": 0,
"critical_threshold": 100,
"field_selection": [["include", "Endotest.Request_Sent"], ["include", "Endotest.Request_Status"]],
"bloc_scope": "any",
"transitions": [
["include", "Endotest.Request_Sent", false, true],
["include", "Endotest.Request_Status", "pending", "accepted"],
["include", "Endotest.Request_Status", "pending", "rejected"]
]
}
Field Selection Pipeline:
Empty set start
Step 1: Include Endotest.Request_Sent
Step 2: Include Endotest.Request_Status
Result: {Endotest.Request_Sent, Endotest.Request_Status}
Logic:
For each inclusion:
Check Endotest fields (Request_Sent, Request_Status)
If ANY field matches transitions:
COUNT this inclusion
If count > 100: CRITICAL (too many status changes)
Example 5: Valid Workflow Transitions
Requirement: Count workflow changes, but only for valid state transitions (pending → accepted or rejected, rejected → resubmitted, accepted → cancelled)
{
"bloc_title": "Endotest",
"line_label": "Valid Request Transitions",
"warning_threshold": 0,
"critical_threshold": 50,
"field_selection": [["include", "Endotest.Request_Status"]],
"transitions": [
["include", "Endotest.Request_Status", "pending", "accepted"],
["include", "Endotest.Request_Status", "pending", "rejected"],
["include", "Endotest.Request_Status", "rejected", "resubmitted"],
["include", "Endotest.Request_Status", "accepted", "cancelled"]
],
"bloc_scope": "any"
}
Logic:
For each inclusion:
Check if Request_Status field changed
If transition matches ONE of the 4 allowed transitions:
COUNT this inclusion (valid workflow)
If transition is different:
SKIP (unexpected change - needs investigation)
If count > 50: CRITICAL (too many valid status transitions)
Note: With several include steps for the same field, a change is counted if it matches ANY of the listed transitions.
Example 6: Exclude Internal Fields
Requirement: Monitor data changes but ignore internal/system fields
{
"bloc_title": "Identification",
"line_label": "Data Changes",
"warning_threshold": 0,
"critical_threshold": 100,
"field_selection": [["include", "*.*"]],
"transitions": [
["include", "*.*", "*", "*"],
["exclude", "Last_Updated", "*", "*"],
["exclude", "Import_Time", "*", "*"],
["exclude", "Internal_Id", "*", "*"]
],
"bloc_scope": "any"
}
Logic:
For each inclusion:
Check ALL fields EXCEPT [Last_Updated, Import_Time, Internal_Id]
If ANY field changed:
COUNT this inclusion
If count > 100: CRITICAL (too many changes)
User Guide: Adding/Modifying Rules
Step 1: Identify Rule Need
Determine the data validation requirement:
Detection Type Use Pattern
─────────────────────────────────────────────────
New patients added "New Inclusions" rule
Patients removed "Deleted Inclusions" rule
Field values changed Standard rule + transitions
Field added/removed "New/Deleted Fields" rule
Specific transitions Standard rule + narrow transitions
Exclude system changes Standard rule + exceptions
Step 2: Choose Rule Type
| Rule Type | When to Use | Complexity |
|---|---|---|
| New Inclusions | Track patient additions | Simple |
| Deleted Inclusions | Track patient removals | Simple |
| New Fields | Monitor schema changes | Simple |
| Deleted Fields | Detect removed data | Simple |
| Standard (Transitions) | Monitor specific changes | Medium |
| Standard (with Exceptions) | Monitor changes + allowances | Complex |
Step 3: Define Thresholds
Decision Matrix:
(warning, critical)   Meaning                                  Example Use
─────────────────────────────────────────────────────────────────────────
(0, 0)                Any change is critical                   Critical data
(0, 1)                1 change warns, 2 or more are critical   Surgery dates
(0, 50)               Any change warns, over 50 is critical    High-value fields
(50, 100)             Up to 50 changes tolerated               Flexible fields
(200, 200)            No warning band, over 200 is critical    Lenient tracking
Recommendation:
Strict validation (medical):
warning = 0, critical = 1
Normal validation (most fields):
warning = 5, critical = 20
Lenient validation (administrative):
warning = 50, critical = 100
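How a count maps to a severity can be sketched as below. The strict `>` comparisons are taken from the "New Inclusions" example logic (count > critical is CRITICAL, count > warning is WARNING); the function name is hypothetical, and the real checker may differ in detail.

```python
def classify(count, warning_threshold, critical_threshold):
    """Map a violation count to a severity using strict 'greater than' comparisons."""
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"


# Strict validation (warning = 0, critical = 1)
for count in (0, 1, 2):
    print(count, classify(count, 0, 1))
# 0 OK
# 1 WARNING
# 2 CRITICAL
```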
Step 4: Create Rule Row in Excel
Open Endobest_Dashboard_Config.xlsx → Regression_Check sheet
Row N:
A: ignore (leave empty)
B: bloc_title (e.g., "Inclusion Protocol")
C: line_label (e.g., "Status Changed")
D: warning_threshold (e.g., 0)
E: critical_threshold (e.g., 20)
F: field_selection (e.g., [["include", "Inclusion.Status"], ["include", "Inclusion.Date"]])
G: bloc_scope (e.g., "any")
H: transitions (e.g., [["include", "*.*", "*", "*"]])
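A rule row can be turned into a dictionary by parsing the JSON cells. The sketch below assumes the v3.0 column layout from the version history (`field_selection` in F, `bloc_scope` in G, `transitions` in H) and that the row has already been read into a list of cell values. How the script actually loads the sheet is not shown here, and `parse_rule_row` is a hypothetical name.

```python
import json

COLUMNS = ["ignore", "bloc_title", "line_label", "warning_threshold",
           "critical_threshold", "field_selection", "bloc_scope", "transitions"]
JSON_COLUMNS = ("field_selection", "transitions")


def parse_rule_row(cells):
    """Turn one row of cell values (columns A to H) into a rule dict."""
    rule = dict(zip(COLUMNS, cells))
    for name in JSON_COLUMNS:
        raw = rule[name]
        # Empty cells become an empty pipeline.
        rule[name] = json.loads(raw) if raw else []
    return rule


row = [None, "Inclusion Protocol", "Status Changed", 0, 20,
       '[["include", "Inclusion.*"]]', "any",
       '[["include", "*.*", "*", "*"]]']
rule = parse_rule_row(row)
print(rule["field_selection"])  # [['include', 'Inclusion.*']]
print(rule["transitions"])      # [['include', '*.*', '*', '*']]
```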
Step 5: Define Field Scope
Decide which fields the rule applies to:
Scope                 field_selection
──────────────────────────────────────────────────────────────────────
All fields            [["include", "*.*"]]
All in group X        [["include", "X.*"]]
Multiple groups       [["include", "group1.*"], ["include", "group2.*"]]
All except group X    [["include", "*.*"], ["exclude", "X.*"]]
Specific field        [["include", "Group.field"]]
Multiple fields       [["include", "Group.field1"], ["include", "Group.field2"]]
Step 6: Define Transitions
Specify what changes to monitor:
Pattern JSON Meaning
────────────────────────────────────────────────────────────
Any change [["*", "*"]] Monitor all changes
Become defined [["*undefined", "*defined"]] Field gets value
Become undefined [["*defined", "*undefined"]] Field loses value
Toggle boolean [[true, false], [false, true]] Boolean flip
Specific change [["old", "new"]] Exact transition
Multiple changes [["old1", "new1"], ["old2", "new2"]] Multiple patterns
Step 7: Set Exceptions (Optional)
Exceptions are extra steps in the transitions pipeline (column H). To allow a specific field/transition combination, add an include step:
H: transitions = [
["include", "*.*", "*defined", "*defined"],
["include", "Request_Sent", false, true]
]
Or exclude specific cases with an exclude step:
H: transitions = [
["include", "*.*", "*", "*"],
["exclude", "Last_Updated", "*", "*"]
]
Step 8: Choose Bloc Scope
Decide aggregation logic:
Requirement bloc_scope
─────────────────────────────────────────────
Any field changes "any" (default)
All changes match "all"
Step 9: Validate & Test
# Check-only mode (validates configuration)
python eb_dashboard.py --check-only
# Expected output:
# ✓ Loaded 42 regression check rules
# ✓ All checks passed
Step 10: Full Collection Test
# Run full collection to test rule
python eb_dashboard.py
# After collection, verify:
# 1. Rule appears in output
# 2. Severity level is correct (OK/Warning/Critical)
# 3. Count matches expectations
Execution Modes
Mode 1: Normal Collection with Quality Checks
python eb_dashboard.py
Workflow:
1. Collect data (organizations, inclusions)
2. Run Coherence Check
3. Run Non-Regression Check (if old file exists)
4. If critical issues: Ask user for confirmation
5. If OK or user confirms: Export files
6. Display elapsed time
Output:
Collecting data from 15 organizations...
[████████████████████] 1200/1200
═══ Coherence Check ═══
✓ [green]TOTAL matches[/green]
═══ Non Regression Check ═══
✓ [green]Structure: New Fields: 0[/green]
✓ [green]Identification: New Inclusions: 0[/green]
...
✓ All checks passed successfully!
Writing files...
Elapsed time: 3:42
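Steps 4 and 5 of the Mode 1 workflow (confirm on critical issues, then export) amount to a small gate in front of the export. A minimal sketch, in which `should_export` and the `confirm` callback are hypothetical names and not part of `eb_dashboard.py`:

```python
def should_export(severities, confirm):
    """severities: list of 'OK' / 'WARNING' / 'CRITICAL'; confirm: callable returning bool."""
    if "CRITICAL" in list(severities):
        # Only critical results require an explicit decision from the user.
        return bool(confirm("Critical issues found. Export anyway? [y/N] "))
    return True


# Warnings alone do not block the export.
print(should_export(["OK", "WARNING"], confirm=lambda prompt: False))   # True
# A critical result blocks the export unless the user confirms.
print(should_export(["OK", "CRITICAL"], confirm=lambda prompt: False))  # False
```

In an interactive run, `confirm` could be `lambda prompt: input(prompt).strip().lower() == "y"`.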
Mode 2: Check-Only (Validation Only)
python eb_dashboard.py --check-only
Workflow:
1. Load existing JSON files (no API calls)
2. Load regression configuration
3. Run Coherence Check
4. Run Non-Regression Check
5. Report results
6. Exit
Use Case: Validate data before distribution without fresh collection
Output:
═══ CHECK ONLY MODE ═══
Running quality checks on existing data files...
[Loading configuration...]
[Running checks...]
✓ All checks passed successfully!
Mode 3: Compare Two Files
python eb_dashboard.py --check-only file1.json file2.json
Workflow:
1. Load file1 and file2 (as current and old)
2. Skip coherence check (organizations not provided)
3. Run regression check comparing them
4. Report differences
5. Exit
Use Case: Compare two snapshots, detect changes between versions
Output:
═══ CHECK ONLY COMPARE MODE ═══
Comparing two specific files:
Current: file1.json
Old: file2.json
[Running regression checks...]
⚠ [yellow]New Inclusions: 15[/yellow]
✗ [red]Deleted Inclusions: 5[/red]
...
Mode 4: Debug Mode (Verbose Output)
python eb_dashboard.py --debug
Workflow:
1. Execute as Normal Mode
2. Enable DEBUG_MODE in quality checks
3. Display detailed field-by-field changes
4. Show individual inclusion comparisons
5. Verbose logging
Use Case: Troubleshoot regression rules, understand data changes
Output:
Running collection...
[████████] 1200/1200
═══ Non Regression Check (DEBUG MODE) ═══
Endotest - Undefined to Defined (Only): 12
✓ Patient-001:
- Endotest.Request_Sent: false → true
- Endotest.Request_Status: undefined → 'completed'
✓ Patient-002:
- Endotest.Request_Sent: false → true
...
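The four modes map onto two command-line options. The sketch below shows one way such options could be parsed with `argparse`. It only illustrates the documented flags (`--check-only` with zero or two files, `--debug`); how `eb_dashboard.py` really parses its arguments is not described in this guide, and `select_mode` is a hypothetical helper.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="eb_dashboard.py")
    parser.add_argument("--check-only", nargs="*", metavar="FILE", default=None,
                        help="run checks only; pass two JSON files to compare them")
    parser.add_argument("--debug", action="store_true",
                        help="show field-by-field changes")
    return parser


def select_mode(argv):
    """Map command-line arguments to one of the documented modes."""
    args = build_parser().parse_args(argv)
    if args.check_only is None:          # flag absent
        return "debug" if args.debug else "normal"
    if len(args.check_only) == 0:        # flag present, no files
        return "check-only"
    if len(args.check_only) == 2:        # flag present, two files
        return "compare"
    raise SystemExit("--check-only takes either no files or exactly two")


print(select_mode([]))                                         # normal
print(select_mode(["--check-only"]))                           # check-only
print(select_mode(["--check-only", "file1.json", "file2.json"]))  # compare
print(select_mode(["--debug"]))                                # debug
```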
Troubleshooting
Issue 1: "Invalid JSON format" Error
Symptom: Configuration validation fails
Cause: Malformed JSON in the field_selection or transitions cell
Solution:
- Open cell in JSON validator
- Fix syntax errors
- Re-run check
Example - WRONG:
{
"transitions": [["active", "inactive"] // Missing closing bracket
}
{
"field_selection": [["include", "Endotest.*"] ["include", "Inclusion.*"]] // Missing comma between array elements
}
Example - CORRECT:
{
"transitions": [["active", "inactive"]]
}
{
"field_selection": [["include", "Endotest.*"], ["include", "Inclusion.*"]]
}
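When no JSON validator is at hand, the standard `json` module can locate the error in a cell. A small sketch follows; `check_json_cell` is a hypothetical helper, not part of the checker.

```python
import json


def check_json_cell(column, text):
    """Return None if the cell parses, else a message pointing at the error position."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"{column}: {err.msg} (line {err.lineno}, column {err.colno})"
    return None


# Missing closing bracket: a message with the position is returned.
print(check_json_cell("transitions", '[["active", "inactive"]'))
# Valid cell: None is returned.
print(check_json_cell("transitions", '[["active", "inactive"]]'))
```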
Issue 2: Rule Never Triggers
Symptom: Count always shows 0 even when data changes
Causes:
- Field filters too restrictive
- Transition pattern doesn't match actual changes
- field_selection pipeline excludes the target fields
Solution:
- Loosen field filters: Set field_selection to [["include", "*.*"]]
- Use wildcards in transitions: ["*", "*"]
- Check actual field names in JSON output
- Enable debug mode to see field matching
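To check actual field names against a selector, it helps to list the qualified names present in one inclusion. The sketch assumes an inclusion is a nested dict of the form `{group: {field: value}}`; both function names are hypothetical.

```python
def list_qualified_fields(inclusion):
    """Return sorted 'Group.Field' names found in one inclusion record."""
    names = set()
    for group, fields in inclusion.items():
        if isinstance(fields, dict):
            names.update(f"{group}.{field}" for field in fields)
    return sorted(names)


def selector_hits(selector, available):
    """Return the available names matched by '*.*', 'group.*' or 'group.field'."""
    sel_group, _, sel_field = selector.partition(".")
    return [
        name for name in available
        if sel_group in ("*", name.partition(".")[0])
        and sel_field in ("*", name.partition(".")[2])
    ]


inclusion = {
    "Endotest": {"Request_Sent": True, "Request_Status": "pending"},
    "Inclusion": {"Status": "active"},
}
available = list_qualified_fields(inclusion)
print(selector_hits("Endotest.*", available))
# ['Endotest.Request_Sent', 'Endotest.Request_Status']
print(selector_hits("Endotest.request_sent", available))
# [] (names are case-sensitive in this sketch, so the selector matches nothing)
```

An empty result for a selector is a quick hint that the rule can never trigger on that field.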
Issue 3: Too Many False Positives
Symptom: Rule triggers unexpectedly, too many violations
Causes:
- Thresholds set too low
- Transitions too broad (matching unintended changes)
- field_selection too permissive
Solution:
- Increase thresholds: Raise warning_threshold and critical_threshold
- Narrow transitions: Use specific values instead of wildcards
- Add exceptions: Add exclude steps to the transitions pipeline to drop specific cases
- Narrow field scope: Use group.field selectors instead of *.*
Issue 4: Configuration Changes Not Taking Effect
Symptom: Modifications to Excel file don't affect results
Causes:
- File not saved
- Regression_Check sheet not loaded
- Old configuration still in memory
Solution:
- Save Excel file (Ctrl+S)
- Restart Python script
- Verify sheet name is exactly "Regression_Check"
- Check file path is correct
Issue 5: User Confirmation Not Appearing
Symptom: Expected prompt for critical issues doesn't show
Causes:
- Issues are at warning level, not critical
- Thresholds higher than actual counts
- Running in check-only mode (no export decision needed)
Solution:
- Verify thresholds: warning < critical
- Check actual violation counts
- Run normal mode (not check-only)
Issue 6: Comparison Mode Showing Unexpected Differences
Symptom: --check-only file1 file2 reports many changes
Causes:
- Files are from different collection dates (expected)
- Configuration changed between collections (expected)
- Field order or grouping changed (might be false positive)
Solution:
- Review reported changes manually
- Check if changes are expected (new patient data added)
- Verify no data corruption occurred
- Compare file sizes and counts manually
Performance Considerations
Regression Check Execution Time
Factors Affecting Performance:
1. Number of Inclusions (patients)
- N patients = O(N) iterations
- Typical: 1200 patients = 1-2 seconds
2. Number of Rules
- R rules applied to each inclusion
- Typical: 20-30 rules = <100ms total
3. Field Matching Complexity
- Filter evaluation per field
- Dot-notation parsing: O(1) per field
- Typical: <50ms for all rules
4. Total Typical Time
- 1200 inclusions × 25 rules = 1-3 seconds
Optimization Tips
If Regression Check is Slow:
- Reduce rule count:
  - Remove inactive rules (add "ignore" label)
  - Combine similar rules
- Simplify field filters:
  - Use a wildcard selector (*.* or group.*) instead of long lists of individual fields
  - Prefer a short include list over *.* followed by many exclude steps
- Narrow transitions:
  - Use specific values instead of wildcards
  - Reduce number of transition pairs
- Consider file size:
  - Large JSON files (>20MB) take longer to parse
  - This is rare and usually not the bottleneck
Summary
The Quality Checks System provides:
- ✅ Multi-Level Validation: Coherence + Regression checks
- ✅ Config-Driven Rules: No code changes needed
- ✅ Flexible Thresholds: Warning and Critical levels
- ✅ Rich Filtering: Group, field, and dot-notation support
- ✅ Transition Patterns: Wildcard, keyword, and specific matching
- ✅ Advanced Exception Handling:
  - Multiple transitions per exception: [[old1, new1], [old2, new2], ...]
  - Include + Exclude can coexist simultaneously
  - Fine-grained control over allowed/blocked transitions
- ✅ Backward Compatible: Legacy single-transition format still supported
- ✅ Debug Support: Detailed logging and debug mode
- ✅ Execution Modes: Normal, check-only, compare, debug
This architecture enables robust data quality monitoring without requiring code modifications, empowering business analysts to define and evolve validation rules independently.
Document End