# Endobest Quality Checks & Regression Testing Guide
## Part 3: Quality Assurance, Validation Rules & Configuration
**Document Version:** 3.1 (Updated with new Excel export module reference)
**Last Updated:** 2025-11-08
**Audience:** Developers, Business Analysts, QA Engineers
**Language:** English
**Note:** Excel export functionality now available - see DOCUMENTATION_13_EXCEL_EXPORT.md, DOCUMENTATION_98_USER_GUIDE.md, and DOCUMENTATION_99_CONFIG_GUIDE.md
---
## Version History
### Version 3.0 (2025-10-22) - UNIFIED FIELD SELECTION PIPELINE
**Complete Refactorization of Field Selection**
- **Merged Columns:** `field_group` (F) + `field_name` (G) → single `field_selection` (F)
- **Simplified Syntax:** Field selection uses same pipeline format as transitions: `[["action", "field_selector"], ...]`
- **3 Selector Patterns:** `*.*` (all fields), `group.*` (group), `group.field` (specific)
- **Cleaner Code:** Removed 150+ lines of dual-filter logic (field_group + field_name combinations)
- **Config-Driven Keys:** Key field determination (Patient_Id, Pseudo) now read from `field_selection` instead of hardcoded
- **Unified Key Detection:** New `_get_key_field_from_new_inclusions_rule()` applies field_selection pipeline directly to first inclusion (15 LOC, -75% vs manual parsing)
- **Helper Functions:** `_apply_field_selection_pipeline()`, `_get_key_field_from_new_inclusions_rule()`, `_build_candidate_fields()`
- ⚠️ **MAJOR Breaking Change:** Old `field_group` and `field_name` columns (F, G) are **removed**
- ⚠️ **Column Shifts:** `bloc_scope` moves H→G, `transitions` moves I→H
- ⚠️ **Configuration Migration Required:** Completely restructure Excel `Regression_Check` sheet
**Technical Details:**
- Field selection pipeline starts with empty set, each step adds/removes fields
- Responsibility on admin to order rules correctly (no implicit logic)
- Special rules `"New Fields", "Deleted Fields", "Deleted Inclusions"` must have empty field_selection
- Special rule `"New Inclusions"` applies field_selection pipeline to first inclusion sample (assumes stable structure)
- Key field detection: finds first field from pipeline that has non-null value in both first new and old inclusion
- Configuration validation: missing/invalid field_selection = CRITICAL error
**Removed Dead Code:**
- `_determine_key_field()` - hardcoded Patient_Id/Pseudo logic
- `_matches_field_group_filter()` - replaced by pipeline
- `_matches_field_name_filter()` - replaced by pipeline
- `_determine_key_field_from_config()` - replaced by simplified unified `_get_key_field_from_new_inclusions_rule()`
### Version 2.0 (2025-10-22) - Pipeline Architecture
**Transitions Pipeline Introduced**
- **Unified Format:** Merged `transitions` + `transition_exceptions` into single `transitions` column
- **Simplified Syntax:** Each step is a 4-element array `[action, field_selector, from, to]`
- **Sequential Processing:** Pipeline steps applied in order, allowing fine-grained control
- **Better Determinism:** All sets sorted for reproducible logs
- **Improved Error Handling:** Invalid configs are skipped without raising an exception, and a warning is logged
- ⚠️ **Breaking Change:** Old `transition_exceptions` column (J) merged into `transitions` (I)
### Version 1.0 (2025-10-21) - Initial Release
- Dual-column system: `transitions` (I) + `transition_exceptions` (J)
- Include/exclude exception handling
- Multiple transition support per exception
---
## Table of Contents
1. [Overview](#overview)
2. [Quality Assurance Strategy](#quality-assurance-strategy)
3. [Coherence Check (Technical Details)](#coherence-check-technical-details)
4. [Non-Regression Check Framework](#non-regression-check-framework)
5. [Regression Check Configuration File](#regression-check-configuration-file)
6. [Column Reference](#column-reference)
7. [Special Keywords & Wildcards](#special-keywords--wildcards)
8. [Rule Types & Logic](#rule-types--logic)
9. [Field Selection Pipeline](#field-selection-pipeline-v30)
10. [Transition Patterns](#transition-patterns)
11. [Exception Handling](#exception-handling)
12. [Configuration Examples](#configuration-examples)
13. [User Guide: Adding/Modifying Rules](#user-guide-adding-modifying-rules)
14. [Execution Modes](#execution-modes)
15. [Troubleshooting](#troubleshooting)
---
## ⚠️ CRITICAL - Version 3.0 Migration Required
**This document describes v3.0 with BREAKING CHANGES from v2.0**
| Item | v2.0 | v3.0 |
|------|------|------|
| **Excel Columns F-I** | `field_group`, `field_name`, `bloc_scope`, `transitions` | `field_selection`, `bloc_scope`, `transitions` |
| **Column Count** | 4 columns for filtering+transitions | 3 columns (merged field_selection) |
| **Key Field Config** | Hardcoded (Patient_Id/Pseudo) | Config-driven (from field_selection) |
| **Field Filtering Logic** | 6+ combinations (complex) | Single pipeline (simple) |
**ACTION REQUIRED:**
1. ✅ Update Excel file column positions
2. ✅ Migrate field_group + field_name → field_selection
3. ✅ Run non-regression tests
4. ✅ Verify key field detection works with new config
---
## Overview
The **Quality Checks System** provides comprehensive data validation in two stages:
1. **Coherence Check:** Verifies that organization statistics (API counters) match the actual detailed inclusion data
2. **Non-Regression Check:** Detects unexpected data changes between current and previous collection runs
Both checks are **configurable via Excel** with **Warning/Critical severity levels** that can trigger user confirmation prompts.
### Design Philosophy
```
Trust, but Verify
- Trust: API data is generally reliable
- Verify: Statistical consistency and change detection
- Report: Multi-level severity (OK, Warning, Critical)
- Decide: User confirmation before export on critical issues
```
---
## Quality Assurance Strategy
### Workflow Integration
```
Data Collection
    ↓
QUALITY CHECKS
  ├─ COHERENCE CHECK (mandatory)
  │   ├─ Load organization statistics from API responses
  │   ├─ Calculate actual counts from detailed inclusions
  │   └─ Compare: Stats vs. Actual
  ├─ NON-REGRESSION CHECK (if old file exists)
  │   ├─ Load previous inclusions (_old file)
  │   ├─ Apply config-driven comparison rules
  │   └─ Report: Changes matching configured patterns
  └─ RESULT
      ├─ has_coherence_critical flag
      └─ has_regression_critical flag
IF critical issues detected:
  ├─ Display warning: ⚠ CRITICAL
  ├─ Ask user: "Write results anyway?"
  ├─ If NO → Abort export, preserve old files
  └─ If YES → Continue with export (user override)
ELSE:
  └─ Continue with export automatically
```
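The export decision at the end of this workflow can be sketched as a small function. This is a minimal illustration, not the project's actual code: the function name `should_export` and the `confirm` callback are assumptions introduced here; only the two flags come from the workflow above.

```python
from typing import Callable


def should_export(has_coherence_critical: bool,
                  has_regression_critical: bool,
                  confirm: Callable[[str], bool]) -> bool:
    """Return True when the results should be written (hypothetical helper)."""
    if has_coherence_critical or has_regression_critical:
        # Critical issue: only an explicit user override lets the export proceed.
        return confirm("Write results anyway?")
    # No critical issue: the export continues automatically.
    return True


if __name__ == "__main__":
    # Simulate a user who always answers "no" to the confirmation prompt.
    print(should_export(True, False, confirm=lambda question: False))
```

Passing the prompt as a callback keeps the decision testable without real console input.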
### Severity Levels
| Level | Display | Meaning | Action |
|-------|---------|---------|--------|
| **OK** | ✓ Green | No issues, within normal range | Continue automatically |
| **WARNING** | ⚠ Yellow | Issue detected, exceeds warning threshold | Log and display, continue automatically |
| **CRITICAL** | ✗ Red | Severe issue, exceeds critical threshold | Display, ask user before export |
### User Interaction
```
Quality Checks Complete
✗ [red]Coherence Check: CRITICAL[/red]
⚠ [yellow]Organization 1 mismatch: 95 vs 98[/yellow]
✗ [red]Non-Regression: CRITICAL[/red]
⚠ [yellow]New Inclusions: 42 (threshold 50)[/yellow]
✗ [red]Deleted Inclusions: 15 (threshold 0)[/red]
[bold]⚠ CRITICAL issues detected in quality checks![/bold]
Do you want to write the results anyway? [y/N]:
y → Export anyway (risky, user override)
n → Cancel export (preserve old files)
```
---
## Coherence Check (Technical Details)
### Purpose
Verify that **organization statistics** (fetched from API) match **actual detailed data** (inclusion-by-inclusion count).
### Data Sources
**Source 1: Organization Statistics (API)**
```
For each organization:
  GET /api/inclusions/inclusion-statistics
  Returns:
  {
    "totalInclusions": N,        // Total patients
    "preIncluded": P,            // Pré-inclus count
    "included": I,               // Inclus count
    "prematurelyTerminated": T   // Prematurely terminated
  }
```
**Source 2: Inclusion Details (JSON Array)**
```
For each patient in endobest_inclusions:
  Check: Patient_Identification.Organisation_Id
  Count: Based on Inclusion.Inclusion_Status
  Classification rules:
    1. If status ends with " - AP" → prematurely_terminated
    2. Else if status starts with "pré-inclus" → preincluded
    3. Else if status starts with "inclus" → included
  Always count: patients += 1
```
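The classification rules above could be implemented roughly as follows. This is a sketch under stated assumptions: each inclusion is assumed to be a nested dict (`group → field → value`), the prefix comparison is assumed to be case-insensitive, and the returned dict keys are chosen to mirror the counter names used in this guide. The real `calculate_detail_counters()` may differ.

```python
def calculate_detail_counters(inclusions, org_id=None):
    """Count inclusions by status; restrict to one organization if org_id is given."""
    counters = {"patients": 0, "preincluded": 0,
                "included": 0, "prematurely_terminated": 0}
    for inclusion in inclusions:
        identification = inclusion.get("Patient_Identification") or {}
        if org_id is not None and identification.get("Organisation_Id") != org_id:
            continue
        counters["patients"] += 1  # Always counted, whatever the status
        status = (inclusion.get("Inclusion") or {}).get("Inclusion_Status") or ""
        lowered = status.lower()  # Assumption: prefixes are matched case-insensitively
        if status.endswith(" - AP"):
            counters["prematurely_terminated"] += 1
        elif lowered.startswith("pré-inclus"):
            counters["preincluded"] += 1
        elif lowered.startswith("inclus"):
            counters["included"] += 1
    return counters
```

The order of the tests matters: the " - AP" suffix is checked first so that a status such as "Inclus - AP" is counted as prematurely terminated and not as included.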
### Validation Logic
```python
def coherence_check(current_inclusions, organizations_list):
    """Return True if API statistics and detailed inclusions disagree."""
    counters = ('patients', 'preincluded', 'included', 'prematurely_terminated')

    def fmt(values):
        # Render the 4 counters as "patients/preincluded/included/terminated"
        return '/'.join(str(values[c]) for c in counters)

    # STEP 1: Sum the per-organization statistics collected from the API
    total_stats = {c: sum(org[f'{c}_count'] for org in organizations_list)
                   for c in counters}

    # STEP 2: Calculate actual counts from detailed data (all organizations).
    # calculate_detail_counters() is defined elsewhere in the module and is
    # expected to return a dict with the same 4 keys as `counters`.
    total_detail = calculate_detail_counters(current_inclusions, org_id=None)

    # STEP 3: Compare all 4 counters
    is_match = all(total_stats[c] == total_detail[c] for c in counters)

    # STEP 4: Report total comparison
    summary = f"TOTAL - Stats({fmt(total_stats)}) vs Detail({fmt(total_detail)})"
    if is_match:
        print(f"[green]{summary}[/green]")
        return False
    print(f"[red]{summary}[/red]")

    # STEP 5: Per-organization comparison (only reached when totals differ).
    # The 'id' and 'name' keys are assumed here for illustration.
    for org in organizations_list:
        org_stats = {c: org[f'{c}_count'] for c in counters}
        org_detail = calculate_detail_counters(current_inclusions, org_id=org['id'])
        if any(org_stats[c] != org_detail[c] for c in counters):
            print(f"[yellow]{org['name']} - Stats({fmt(org_stats)}) "
                  f"vs Detail({fmt(org_detail)})[/yellow]")
    return True  # has_critical
```
### Example Output
**Scenario: Perfect Match**
```
═══ Coherence Check ═══
✓ [green]TOTAL - Stats(150/20/120/10) vs Detail(150/20/120/10)[/green]
```
**Scenario: Mismatch Detected**
```
═══ Coherence Check ═══
✗ [red]TOTAL - Stats(150/20/118/10) vs Detail(150/20/120/10)[/red]
⚠ [yellow]Center A - Stats(50/5/40/5) vs Detail(50/5/42/5)[/yellow]
```
### Interpretation
**Match (Green):**
```
API statistics perfectly align with detailed data
→ No data collection issues
→ Continue processing
```
**Minor Mismatch (Yellow):**
```
1-2 patients differ between statistics and details
→ Possible API consistency issue
→ Monitor but continue (it happens occasionally)
```
**Major Mismatch (Red):**
```
10+ patients difference
→ Significant data collection issue
→ Investigate root cause
→ Consider re-running collection
```
---
## Non-Regression Check Framework
### Purpose
Detect **unexpected data changes** between current and previous collections by comparing field values against configured transition patterns.
### Architecture
```
Previous Inclusions (File)
┌─────────────────────────────┐
│ NON-REGRESSION CHECK │
├─────────────────────────────┤
│ 1. Load Regression Config │
│ (Excel: Regression_Check sheet)
│ │
│ 2. Build Inclusion Dicts │
│ Index by: Patient_Id or Pseudo
│ │
│ 3. Group Rules by Bloc │
│ - Structure │
│ - Identification │
│ - Inclusion Protocol │
│ - Endotest │
│ - Other Questionnaires │
│ │
│ 4. For Each Rule: │
│ a) Detect rule type │
│ - Normal rule │
│ - New Inclusions │
│ - Deleted Inclusions │
│ - New Fields │
│ - Deleted Fields │
│ │
│ b) Process rule logic │
│ - Collect candidates │
│ - Match transitions │
│ - Apply exceptions │
│ - Apply bloc_scope │
│ │
│ c) Calculate severity │
│ - Count vs thresholds │
│ - Determine status │
│ │
│ 5. Display Results │
│ - By bloc │
│ - Color-coded status │
│ - Detailed changes (debug)
│ │
└─────────────────────────────┘
Current Inclusions (Memory)
```
---
## Regression Check Configuration File
### File Location & Sheet
```
Endobest_Dashboard_Config.xlsx
├─ Sheet 1: "Inclusions_Mapping" (See DOCUMENTATION_11_FIELD_MAPPING.md)
└─ Sheet 2: "Regression_Check"
├─ Row 1: Headers
└─ Row 2+: Rules
```
### Sheet Structure (Version 3.0)
```
Row 1 (Headers):
A B C D E
ignore bloc_title line_label warning_threshold critical_threshold
F G H
field_selection bloc_scope transitions
Row 2+: Rule definitions (one per row)
```
**BREAKING CHANGE (v3.0):** Columns F and G from v2.0 (`field_group` and `field_name`) have been **merged into single column F (`field_selection`)**. All subsequent columns shifted left by one position.
**Color Coding:**
- **Yellow:** Structure/Identification bloc (foundational rules)
- **Blue:** Inclusion Protocol bloc (inclusion status rules)
- **Light Purple:** Endotest bloc (test-related rules)
- **White:** Regular rules
- **Red:** Incomplete/error rules (missing required columns)
---
## Column Reference
### Column A: ignore
**Type:** String (optional)
**Description:** Skip this row if the cell contains "ignore" (case-insensitive)
**Purpose:** Comment out rules without deleting rows
**Values:**
```
ignore → Row is skipped
(empty) → Row is processed
any_other_text → Row is processed
```
### Column B: bloc_title
**Type:** String (required)
**Description:** Logical grouping of related rules
**Purpose:** Visual organization and blocking/reporting
**Valid Values:**
```
Structure → File format and field availability rules
Identification → Patient identification changes
Inclusion Protocol → Inclusion status and protocol changes
Endotest → Laboratory test request changes
Other Questionnaires → Non-specific questionnaire changes
[Custom Group Names] → Any custom bloc name for organization
```
**Rules Per Bloc:**
```
Structure bloc (Example):
├─ New Fields
├─ Deleted Fields
└─ (Structure-specific rules)
Identification bloc:
├─ New Inclusions
├─ Deleted Inclusions
├─ Changed (Excluding Birthday)
├─ Changed Date of Birth/Age
└─ (Identification-specific rules)
Endotest bloc:
├─ Undefined to Defined (Only)
├─ Defined to Undefined
├─ Changed Value
└─ (Endotest-specific rules)
```
### Column C: line_label
**Type:** String (required)
**Description:** Unique rule identifier within its bloc
**Purpose:** Displayed in output, identifies rule in reports
**Examples:**
```
New Inclusions
Deleted Inclusions
New Fields
Deleted Fields
Changed Value
Undefined to Defined (Only)
```
**Requirements:**
- Must be unique within bloc_title
- Should be descriptive
### Column D: warning_threshold
**Type:** Numeric (required, >= 0)
**Description:** Count threshold that triggers WARNING level
**Position:** Column D (after line_label)
**Logic:**
```
IF count > warning_threshold AND count <= critical_threshold:
Status = WARNING (yellow ⚠)
```
**Examples:**
```
0 → Any change triggers at least a warning (strict)
5 → 0-5 changes = OK, 6+ = Warning (until critical_threshold is exceeded)
50 → 0-50 changes = OK, 51+ = Warning (lenient)
200 → Very lenient, only alert on large changes
```
### Column E: critical_threshold
**Type:** Numeric (required, >= warning_threshold)
**Description:** Count threshold that triggers CRITICAL level
**Position:** Column E (after warning_threshold)
**Logic:**
```
IF count > critical_threshold:
Status = CRITICAL (red ✗)
→ May prompt user for confirmation
```
**Relationship:**
```
warning_threshold <= critical_threshold
Examples:
(0, 0) → Strict: any change is critical
(0, 1) → 1 change = warning, 2+ = critical
(0, 50) → 1-50 = warning, 51+ = critical
(50, 100) → Normal operation: 0-50 OK, 51-100 warning, 101+ critical
(200, 200) → Same thresholds: jump directly from OK to critical
```
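The two threshold checks (`count > warning_threshold` and `count > critical_threshold`) combine into a single status function. The sketch below is illustrative only; the function name `severity` is an assumption, not an identifier from the codebase.

```python
def severity(count, warning_threshold, critical_threshold):
    """Map a match count to OK / WARNING / CRITICAL (hypothetical helper)."""
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for count in (0, 50, 51, 100, 101):
        print(count, severity(count, warning_threshold=50, critical_threshold=100))
```

Because both comparisons are strict, a count equal to a threshold stays at the lower level.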
### Column F: field_selection (NEW - v3.0)
**Type:** JSON array of 2-element arrays (mandatory for most rules)
**Description:** Pipeline-based field selection using include/exclude actions
**Position:** Column F (after critical_threshold) - **REPLACES old field_group + field_name**
**Rules:**
- **Format:** `[["action", "field_selector"], ["action", "field_selector"], ...]`
- **Mandatory:** For all rules EXCEPT `"New Fields"`, `"Deleted Fields"`, `"Deleted Inclusions"`
- **For special rules:** Must be empty `[]` or null
- **Explicit:** No implicit logic - admin must order steps correctly
- **Pipeline:** Starts with empty set, each step adds or removes fields
**Elements:**
| Element | Type | Valid Values | Example |
|---------|------|--------------|---------|
| **action** | String | `"include"` or `"exclude"` | `"include"` |
| **field_selector** | String | `*.*`, `group.*`, `group.field` | `"Endotest.Request_Sent"` |
**Selector Patterns (3 only):**
```
*.* → All fields in all groups
group.* → All fields in specific group (e.g., "Endotest.*")
group.field → Specific field only (e.g., "Endotest.Request_Sent")
```
**Examples:**
**1. Include Single Group**
```json
[["include", "Endotest.*"]]
// All Endotest fields
```
**2. Include Multiple Groups**
```json
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Endotest AND Inclusion fields
```
**3. Include All, Exclude Some**
```json
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// All fields EXCEPT Endotest.Last_Updated
```
**4. Key Field Selection (for "New Inclusions" rule)**
```json
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Tries Patient_Id first, then Pseudo (in order)
```
**5. Complex Pipeline**
```json
[
["include", "*.*"],
["exclude", "Inclusion.*"],
["exclude", "Patient_Identification.*"]
]
// All fields EXCEPT Inclusion and Patient_Identification
```
**Special Rules (field_selection must be EMPTY):**
```
"New Fields" → [] or null
"Deleted Fields" → [] or null
"Deleted Inclusions" → [] or null
```
**Validation:**
- ✅ Missing or null field_selection for normal rules → **CRITICAL ERROR**
- ✅ Invalid selector (no dot) → **CRITICAL ERROR**
- ✅ Non-list format → **CRITICAL ERROR, skip rule**
- ✅ Step with wrong element count → **CRITICAL ERROR, skip rule**
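The validation rules above can be sketched as a function that returns a list of error messages for one cell of column F. This is a minimal illustration, assuming the cell arrives as a JSON string (or `None` when empty); the name `validate_field_selection` and the message wording are hypothetical, and how an empty list on a normal rule is treated is left open here.

```python
import json

# Rules whose field_selection must stay empty
SPECIAL_EMPTY_RULES = {"New Fields", "Deleted Fields", "Deleted Inclusions"}


def validate_field_selection(line_label, raw_cell):
    """Return a list of error messages; an empty list means the cell is valid."""
    try:
        steps = json.loads(raw_cell) if raw_cell not in (None, "") else None
    except json.JSONDecodeError as exc:
        return [f"field_selection is not valid JSON: {exc.msg}"]

    if line_label in SPECIAL_EMPTY_RULES:
        return ["field_selection must be empty for this rule"] if steps else []
    if steps is None:
        return ["field_selection is missing"]
    if not isinstance(steps, list):
        return ["field_selection must be a list of steps"]

    errors = []
    for index, step in enumerate(steps, start=1):
        if not isinstance(step, list) or len(step) != 2:
            errors.append(f"step {index}: expected [action, field_selector]")
            continue
        action, selector = step
        if action not in ("include", "exclude"):
            errors.append(f"step {index}: invalid action {action!r}")
        if not isinstance(selector, str) or "." not in selector:
            errors.append(f"step {index}: selector must contain a dot")
    return errors
```

Any non-empty result would correspond to a CRITICAL configuration error for that rule.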
### Column G: bloc_scope (moved from H - v3.0)
**Type:** String enum (optional, default: "any")
**Description:** Aggregation logic for matching fields within an inclusion
**Position:** Column G (after field_selection)
**Valid Values:**
```
"any" → At least ONE field must match transitions
"all" → ALL changed fields must match transitions
```
**Logic:**
**bloc_scope = "any" (Default)**
```
IF ANY candidate field has matching transition:
RETURN inclusion matches rule
Use for: "Alert if any change occurs"
```
**bloc_scope = "all"**
```
IF ALL changed fields have matching transitions:
RETURN inclusion matches rule
Use for: "Alert only if all changes match pattern"
```
**Example Comparison:**
```
Inclusion with 5 fields in scope:
Field1: Changed, matches transition ✓
Field2: Unchanged (always ignored)
Field3: Changed, does NOT match transition ✗
Field4: Unchanged (always ignored)
Field5: Changed, matches transition ✓
Changed fields: [Field1, Field3, Field5]
Matched changed: [Field1, Field5]
Result with bloc_scope="any": ✓ COUNT (Field1 matched)
Result with bloc_scope="all": ✗ SKIP (Field3 didn't match)
```
| Scenario | bloc_scope="any" | bloc_scope="all" |
|----------|------------------|-----------------|
| 1 match, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 1 match, 1 mismatch | ✓ COUNT | ✗ SKIP |
| 0 matches, 1 mismatch | ✗ SKIP | ✗ SKIP |
| 3 matches, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 3 matches, 1 mismatch | ✓ COUNT | ✗ SKIP |
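The table above can be reproduced with a few lines of set logic. The following is a sketch, assuming the changed fields and the fields that matched the transitions are already known for one inclusion; the function name `inclusion_matches` is hypothetical.

```python
def inclusion_matches(changed_fields, matched_fields, bloc_scope="any"):
    """Decide whether one inclusion counts for a rule (hypothetical helper)."""
    changed = set(changed_fields)
    matched = set(matched_fields) & changed  # Unchanged fields are always ignored
    if not changed:
        return False  # Assumption: an inclusion with no changed field never counts
    if bloc_scope == "all":
        return matched == changed  # Every changed field must match
    return bool(matched)  # "any": one matching field is enough


if __name__ == "__main__":
    changed = ["Field1", "Field3", "Field5"]
    matched = ["Field1", "Field5"]
    print(inclusion_matches(changed, matched, "any"))
    print(inclusion_matches(changed, matched, "all"))
```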
---
### Column H: transitions (moved from I - v3.0)
**Type:** JSON array of 4-element arrays (optional)
**Description:** Pipeline-based transition rules (old_value → new_value)
**Position:** Column H (after bloc_scope)
**Format:** `[["action", "field_selector", "from_pattern", "to_pattern"], ...]`
- Each step is exactly 4 elements
- If None/empty: Rule applies to ALL field changes
- Supports wildcard keywords: `*undefined`, `*defined`, `*`
- Supports literal values for exact matching
**Pipeline Concept (v2.0+):**
```
Initial state: All changed fields → is_checked = False
Step 1: Include rule for all fields (*.*) with *defined→*defined
└─ is_checked = True if transition matches
Step 2: Include rule for Endotest.Diagnostic_Status with waiting→*undefined
└─ is_checked = True (whitelisted exception)
Step 3: Exclude rule for Endotest.Request_Sent with false→true
└─ is_checked = False (blacklisted exception)
Final result: Only fields matching the pipeline are checked
```
---
#### Syntax: 4-Element Pipeline Array
Each pipeline step is a **4-element array**:
```json
[action, field_selector, from_pattern, to_pattern]
```
| Element | Description | Examples |
|---------|-------------|----------|
| **action** | "include" (whitelist) or "exclude" (blacklist) | "include", "exclude" |
| **field_selector** | Which fields this step applies to | "*.*", "group.*", "group.field" |
| **from_pattern** | Old value pattern to match | "*undefined", "*defined", "*", literal value |
| **to_pattern** | New value pattern to match | "*undefined", "*defined", "*", literal value |
**Important:** The syntax is **strictly enforced** - each step must have exactly 4 elements. No shortcuts or variants are accepted.
---
#### Field Selector Patterns
```
*.* → All fields in all groups
group.* → All fields in specific group (e.g., "Endotest.*")
group.field → Specific field only (e.g., "Endotest.Request_Sent")
```
---
#### Complete Examples
**Example 1: Simple All-Fields Rule (Most Common)**
```json
{
"transitions": [
["include", "*.*", "*defined", "*defined"]
]
}
// Pipeline: Include all fields that change between two defined values
```
**Example 2: Main Rule + One Include Exception**
```json
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"]
]
}
// Step 1: Include all *defined→*defined changes
// Step 2: ALSO include specific Endotest.Diagnostic_Status changes from waiting to undefined
```
**Example 3: Main Rule + Include Exception + Exclude Exception**
```json
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"],
["exclude", "Endotest.Request_Sent", false, true]
]
}
// Step 1: Include all *defined→*defined
// Step 2: Include Diagnostic_Status waiting→undefined (whitelist)
// Step 3: Exclude Request_Sent false→true (blacklist)
// Result: Step 3 overrides Step 1 for that specific field+transition
```
**Example 4: Multiple Include Steps for Different Fields**
```json
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["include", "GDD.Status", "pending", "completed"],
["include", "GDD.Status", "pending", "failed"]
]
}
// Step 1: Include all *defined→*defined changes
// Step 2: Include GDD.Status pending→completed
// Step 3: Include GDD.Status pending→failed
```
**Example 5: Exclude Rule with Wildcard**
```json
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["exclude", "Endotest.Last_Modified", "*", "*"]
]
}
// Include all changes EXCEPT any change to Last_Modified field
```
---
#### Processing Logic
The pipeline is executed **sequentially**, with each step modifying the `is_checked` status in-place:
```
1. Initialize: All changed fields have is_checked = False
2. For each transition step in order:
a. Check if the current field matches the field_selector
b. If yes: Check if the old→new values match from_pattern→to_pattern
c. If yes:
- If action="include": Set is_checked = True
- If action="exclude": Set is_checked = False
d. If no: Leave is_checked unchanged
3. Final: Only fields with is_checked = True are counted as matching
```
**Important:** Later steps can override earlier steps. Example:
```json
[
["include", "*.*", "*", "*"], // Step 1: include everything
["exclude", "Field.X", "*", "*"] // Step 2: exclude Field.X (overrides Step 1)
]
```
Result: Everything is included EXCEPT Field.X
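The sequential processing described above can be sketched for a single changed field as follows. All function names are hypothetical, and two details are assumptions rather than documented behavior: literal patterns are compared with a same-type equality test (so `false` does not match `0`), and the caller only passes fields whose value actually changed.

```python
def is_undefined(value):
    """True for the undefined-like values: None, "" and the string "undefined"."""
    return value is None or value == "" or value == "undefined"


def pattern_matches(value, pattern):
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return is_undefined(value)
    if pattern == "*defined":
        return not is_undefined(value)
    # Literal: assumed to require the same type and the same value
    return type(value) is type(pattern) and value == pattern


def selector_matches(qualified_name, selector):
    group, name = qualified_name.split(".", 1)
    sel_group, sel_name = selector.split(".", 1)
    return sel_group in ("*", group) and sel_name in ("*", name)


def is_field_checked(qualified_name, old_value, new_value, steps):
    """Run the transitions pipeline for one changed field."""
    is_checked = False  # Initial state for every changed field
    for action, selector, from_pattern, to_pattern in steps:
        if (selector_matches(qualified_name, selector)
                and pattern_matches(old_value, from_pattern)
                and pattern_matches(new_value, to_pattern)):
            is_checked = (action == "include")  # Later steps override earlier ones
    return is_checked


if __name__ == "__main__":
    steps = [
        ["include", "*.*", "*defined", "*defined"],
        ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"],
        ["exclude", "Endotest.Request_Sent", False, True],
    ]
    print(is_field_checked("Endotest.Request_Sent", False, True, steps))
    print(is_field_checked("Endotest.Diagnostic_Status", "waiting", None, steps))
```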
---
#### Configuration Error Handling
If a transitions step has invalid syntax:
- The rule is skipped and a yellow warning is logged
- No exception is thrown
- User can see the ⚠ warning in the output
- User can choose to save the report or fix the config
**Valid and invalid syntax examples:**
```json
["include", "*.*", "*defined", "*defined"] // ✓ Exactly 4 elements
["include", "*.*", "*defined"] // ✗ Only 3 elements (INVALID)
["maybe", "*.*", "*defined", "*defined"] // ✗ Invalid action (INVALID)
["include", "invalid", "*defined", "*defined"] // ✗ No dot in selector (INVALID)
```
---
## Special Keywords & Wildcards
This section documents the special keywords and patterns used in transition specifications throughout the configuration.
### Keywords in Transition Patterns
The regression check configuration supports special keywords with `*` prefix for flexible transition matching:
#### Keyword 1: `*undefined`
**Meaning:** Matches any "undefined-like" value
**Matches:**
- `null` (None in Python)
- `""` (empty string)
- `"undefined"` (literal string)
**Example:**
```json
{
  "transitions": [["include", "*.*", "*undefined", "*defined"]]
}
// Matches: undefined → Active, null → 42, "" → true, etc.
```
**Use Case:** Detect when a field gets populated for the first time
---
#### Keyword 2: `*defined`
**Meaning:** Matches any "defined" value (opposite of *undefined)
**Matches:** Anything EXCEPT:
- `null` (None)
- `""` (empty string)
- `"undefined"` (literal string)
**Example:**
```json
{
  "transitions": [["include", "*.*", "*defined", "*undefined"]]
}
// Matches: Active → null, 42 → "", true → "undefined", etc.
```
**Use Case:** Detect when a field loses its value
---
#### Keyword 3: `*` (Wildcard)
**Meaning:** Matches absolutely any value
**Matches:** Any value including:
- Defined values (strings, numbers, booleans)
- Undefined-like values (null, "", "undefined")
- Objects, arrays, etc.
**Example:**
```json
{
  "transitions": [["include", "*.*", "*", "*"]]
}
// Matches: ANY old value → ANY new value
// Essentially: "any change at all"
```
**Use Case:** Monitor all changes to a field, filter out specific cases with exceptions
---
### Combining Keywords with Literal Values
Patterns can mix keywords and literal values. The pairs below are the `from_pattern` and `to_pattern` elements (the last two elements) of a 4-element transition step:
| `from_pattern`, `to_pattern` | Meaning |
|---------|---------|
| `"*undefined", "*defined"` | Undefined → Defined (field becomes populated) |
| `"*defined", "*undefined"` | Defined → Undefined (field gets cleared) |
| `"*defined", "*defined"` | Value change while staying defined (actual value change required) |
| `"*", "*"` | Any change at all |
| `"Active", "*defined"` | From literal "Active" to any defined value |
| `"*undefined", "Active"` | From undefined to literal "Active" |
---
### Literal Values (No `*` Prefix)
Any value that does NOT start with `*` is treated as a literal value and matched exactly:
```json
{
  "transitions": [
    ["include", "*.*", "pending", "accepted"],   // Exact string match
    ["include", "*.*", false, true],             // Exact boolean match
    ["include", "*.*", 0, 1],                    // Exact numeric match
    ["include", "*.*", null, "Active"],          // null matches null, "Active" matches "Active"
    ["include", "*.*", "undefined", "Done"]      // "undefined" (literal string) matches "undefined"
  ]
}
```
**Important:** Literal values are matched by exact equality, including:
- `"undefined"` - matches the exact string "undefined" (not undefined state)
- `null` - matches null values
- `""` - matches empty string
---
### Summary Table: Special Keywords in Transitions
| Keyword | Matches | Use Case |
|---------|---------|----------|
| `*undefined` | null, "", "undefined" (any undefined-like value) | Detect when field becomes populated |
| `*defined` | Any defined value (NOT null, "", "undefined") | Detect when field loses value |
| `*` | Any value whatsoever | Alert on any change; use with exceptions for fine control |
| (no `*` prefix) | Exact literal values | Specific value matching (e.g., "pending" → "accepted") |
---
## Rule Types & Logic
### Rule Type 1: Standard Rules (Normal Comparison)
**Purpose:** Detect field value changes matching configured patterns
**Processing Steps:**
```
Step 1: Collect Candidate Fields
  ├─ Apply the field_selection pipeline (include/exclude steps, in order)
  └─ Result: List of (group_name, field_name) tuples
Step 2: For Each Candidate Field
  ├─ Get new_value and old_value
  ├─ Ignore the field if the value is unchanged
  ├─ Run the transitions pipeline (if transitions specified)
  └─ Mark as "checked" if the pipeline ends with is_checked = True
Step 3: Apply bloc_scope
  ├─ With "any": Count inclusion if ANY field is checked
  └─ With "all": Count inclusion if ALL changed fields are checked
Step 4: Report Matching Inclusions
  └─ Count vs. thresholds (warning/critical)
```
**Example Configuration:**
```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined (Only)",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_selection": [["include", "Inclusion.*"]],
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ],
  "bloc_scope": "all"
}
```
### Rule Type 2: New Inclusions
**Purpose:** Count patients that exist in current data but not in previous
**Syntax:**
```json
{
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [
    ["include", "Patient_Identification.Patient_Id"],
    ["include", "Patient_Identification.Pseudo"]
  ],
  "transitions": [],
  "bloc_scope": null
}
```
**Note:** For special rules like "New Inclusions", transitions can be left as empty array `[]` since these rules don't use transition matching.
**Processing:**
```
1. Build dictionaries indexed by key field
   - Key field candidates: taken from field_selection, tried in order
     (here Patient_Id, then Pseudo)
   - key_dict_new = {patient_key: patient_data for patient in current}
   - key_dict_old = {patient_key: patient_data for patient in previous}
2. Find new inclusions
   new_keys = set(key_dict_new.keys()) - set(key_dict_old.keys())
   count = len(new_keys)
3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK
```
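The processing above could look roughly like this in Python. It is a sketch, not the actual implementation: inclusions are assumed to be nested dicts (`group → field → value`), key values are assumed to be strings so that they can be sorted, and all function names are hypothetical.

```python
def find_key_field(candidates, first_new, first_old):
    """Return the first (group, field) that is non-null in both sample inclusions."""
    for qualified_name in candidates:  # e.g. "Patient_Identification.Patient_Id"
        group, name = qualified_name.split(".", 1)
        if (first_new.get(group, {}).get(name) is not None
                and first_old.get(group, {}).get(name) is not None):
            return group, name
    return None


def index_by_key(inclusions, key_field):
    """Build {key_value: inclusion}, skipping inclusions without a key value."""
    group, name = key_field
    return {inc[group][name]: inc for inc in inclusions
            if inc.get(group, {}).get(name) is not None}


def new_inclusion_keys(current, previous, key_field):
    """Keys present in the current run but absent from the previous one."""
    key_dict_new = index_by_key(current, key_field)
    key_dict_old = index_by_key(previous, key_field)
    return sorted(set(key_dict_new) - set(key_dict_old))  # Sorted for stable logs
```

Swapping the two arguments of the set difference gives the deleted inclusions instead.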
**Example Output:**
```
✓ [green]New Inclusions: 0[/green]
(No new patients added)
⚠ [yellow]New Inclusions: 42[/yellow]
(42 new patients - warning threshold exceeded)
✗ [red]New Inclusions: 75[/red]
(75 new patients - exceeds critical threshold of 50)
```
### Rule Type 3: Deleted Inclusions
**Purpose:** Count patients that exist in previous but not in current
**Syntax:**
```json
{
  "bloc_title": "Identification",
  "line_label": "Deleted Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 0,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}
```
**Processing:**
```
1. Build dictionaries (same as New Inclusions)
2. Find deleted inclusions
deleted_keys = set(key_dict_old.keys()) - set(key_dict_new.keys())
count = len(deleted_keys)
3. Compare to thresholds
IF count > critical_threshold: CRITICAL
ELIF count > warning_threshold: WARNING
ELSE: OK
```
**Note:** Typically `critical_threshold=0` because any deletion is concerning.
### Rule Type 4: New Fields
**Purpose:** Detect field names that appear in current but not in previous
**Syntax:**
```json
{
  "bloc_title": "Structure",
  "line_label": "New Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}
```
**Processing:**
```
1. For each patient in common (present in both versions):
a) Get all groups and fields from current version
b) Get all groups and fields from previous version
c) Find new fields: current_fields - previous_fields
d) Qualified name: "group_name.field_name"
2. Count by field name
field_counts = {field_qualified_name: count_of_inclusions}
total_new_fields = len(field_counts)
3. Display results
For each new field:
"Inclusion.New_Field (42 inclusions)"
[count = number of inclusions that gained this field]
```
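The per-field counting above can be sketched with a `Counter`. This is an illustration under assumptions: both inputs are dicts indexed by the key field, each inclusion is a nested dict (`group → field → value`), and the function name `count_new_fields` is hypothetical.

```python
from collections import Counter


def qualified_fields(inclusion):
    """Return the set of "group_name.field_name" names present in one inclusion."""
    return {f"{group}.{field}"
            for group, fields in inclusion.items()
            for field in fields}


def count_new_fields(current_by_key, previous_by_key):
    """Map each new qualified field name to the number of inclusions that gained it."""
    field_counts = Counter()
    # Only patients present in both versions are compared
    for key in current_by_key.keys() & previous_by_key.keys():
        new_fields = (qualified_fields(current_by_key[key])
                      - qualified_fields(previous_by_key[key]))
        field_counts.update(new_fields)
    return dict(sorted(field_counts.items()))  # Sorted for reproducible output
```

The value compared to the thresholds is then `len(count_new_fields(...))`, the number of distinct new fields. Swapping the two sides of the set difference yields deleted fields.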
**Example Output:**
```
✓ [green]New Fields: 0[/green]
✗ [red]New Fields: 2[/red]
Endotest.New_Request_Type (1 inclusion)
Inclusion.New_Status_Code (2 inclusions)
```
### Rule Type 5: Deleted Fields
**Purpose:** Detect field names that exist in previous but not in current
**Syntax:**
```json
{
  "bloc_title": "Structure",
  "line_label": "Deleted Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_selection": [],
  "transitions": [],
  "bloc_scope": null
}
```
**Processing:** Same as "New Fields" but reversed:
```
deleted_fields = previous_fields - current_fields
```
---
## Field Selection Pipeline (v3.0)
**NEW APPROACH:** Field selection now uses the **same pipeline architecture as transitions**.
### Pipeline Ordering (Key Concept)
Start with an **empty set of fields**. Each step either **includes** or **excludes** fields:
```python
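from fnmatch import fnmatchcase  # glob-style matching is an assumption about selector semantics

# Illustrative field list; real names come from the collected data
all_fields = ["Endotest.Request_Sent", "Endotest.Last_Updated", "Inclusion.Status", "Inclusion.Date"]
candidate_fields = set()  # Empty initially
# Step 1: Include all Endotest fields
candidate_fields |= {f for f in all_fields if fnmatchcase(f, "Endotest.*")}
# Step 2: Also include Inclusion.Status
candidate_fields |= {f for f in all_fields if fnmatchcase(f, "Inclusion.Status")}
# Step 3: But exclude Endotest.Last_Updated
candidate_fields -= {f for f in all_fields if fnmatchcase(f, "Endotest.Last_Updated")}
# Result: Endotest.* + Inclusion.Status, except Endotest.Last_Updated
print(sorted(candidate_fields))  # ['Endotest.Request_Sent', 'Inclusion.Status']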
```
### Simple Examples
#### Example 1: Single Group
```json
[["include", "Endotest.*"]]
// Result: All Endotest fields
```
#### Example 2: Multiple Groups
```json
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Result: All Endotest + all Inclusion fields
```
#### Example 3: Specific Fields
```json
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Result: Only Patient_Id and Pseudo fields
```
#### Example 4: All Except Some
```json
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// Result: All fields EXCEPT Endotest.Last_Updated
```
#### Example 5: Complex Selection
```json
[
["include", "*.*"],
["exclude", "Patient_Identification.*"],
["exclude", "Inclusion.*"]
]
// Result: All fields EXCEPT Patient_Identification and Inclusion
```
### Important Notes
- **Order matters:** Steps are applied sequentially
- **Explicit:** Admin responsible for correct pipeline
- **No implicit AND/OR:** Use multiple include steps for OR logic
- **Deterministic:** Sets sorted, reproducible results
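To make the "order matters" point concrete, here is a small sketch that applies a `field_selection` cell given as JSON text. The function name `apply_field_selection` and the use of `fnmatchcase` for the `*.*`, `group.*` and `group.field` selectors are assumptions made for illustration; the project's own helper may behave differently.

```python
import json
from fnmatch import fnmatchcase


def apply_field_selection(pipeline_json, all_fields):
    """Apply include/exclude steps in order and return a sorted list of fields."""
    selected = set()  # the pipeline always starts empty
    for action, selector in json.loads(pipeline_json):
        matching = {f for f in all_fields if fnmatchcase(f, selector)}
        if action == "include":
            selected |= matching
        elif action == "exclude":
            selected -= matching
        else:
            raise ValueError(f"Unknown action: {action!r}")
    return sorted(selected)  # sorted for reproducible output


all_fields = [
    "Endotest.Request_Sent",
    "Endotest.Last_Updated",
    "Inclusion.Status",
]

all_but_one = apply_field_selection(
    '[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]', all_fields
)
# Reversing the two steps re-adds the excluded field: order matters
reversed_steps = apply_field_selection(
    '[["exclude", "Endotest.Last_Updated"], ["include", "*.*"]]', all_fields
)
```

In the first call the exclusion wins because it runs last; in the second call the later `include` puts `Endotest.Last_Updated` back.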
---
## Transition Patterns
### Pattern Matching Rules
#### Literal Value Matching
```json
[
["active", "inactive"],
[true, false],
[0, 1]
]
// Match exact value changes
// Type must match (string vs. number vs. boolean)
```
#### Undefined Keyword
```
*undefined: Matches any undefined-like value
- null
- "" (empty string)
- "undefined"
*defined: Matches any defined value
- NOT null
- NOT ""
- NOT "undefined"
```
**Examples:**
```json
[
["*undefined", "*defined"]
]
// Transition FROM any undefined TO any defined
[
["*defined", "*undefined"]
]
// Transition FROM any defined TO any undefined
[
["*defined", "*defined"]
]
// Transition FROM defined TO different defined
// (with actual value change check)
```
#### Wildcard Pattern
```json
[
["*", "*"]
]
// Match ANY transition
// Useful for: "Alert on any change to this field"
```
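The matching rules above (literal values with strict typing, the `*undefined` / `*defined` keywords, and the `*` wildcard) could be expressed roughly as follows. The function names are hypothetical, and the sketch assumes that a transition only counts when the value actually changed.

```python
def is_undefined(value):
    """True for the undefined-like values: null, "" and "undefined"."""
    return value is None or value == "" or value == "undefined"


def matches_value(pattern, value):
    """Match one side of a transition against a pattern."""
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return is_undefined(value)
    if pattern == "*defined":
        return not is_undefined(value)
    # Literal match: the type must match too, so the number 1 does not match True
    return type(pattern) is type(value) and pattern == value


def matches_transition(pattern_pair, old, new):
    """True if old -> new is an actual change that fits [old_pattern, new_pattern]."""
    old_pattern, new_pattern = pattern_pair
    if type(old) is type(new) and old == new:
        return False  # no value change, nothing to report
    return matches_value(old_pattern, old) and matches_value(new_pattern, new)


became_defined = matches_transition(["*undefined", "*defined"], None, "approved")
unchanged = matches_transition(["*defined", "*defined"], "a", "a")
wrong_type = matches_transition([0, 1], False, True)
```

The explicit type comparison matters in Python because `True == 1` evaluates to true; without it, a `[0, 1]` pattern would also match a boolean flip.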
### Transition Combination Examples
**Example 1: Detect New Values Only**
```json
{
"transitions": [["*undefined", "*defined"]]
}
// Alert when field goes from undefined to any value
// Ignore when field already had value
```
**Example 2: Detect Value Reversal**
```json
{
"transitions": [
[true, false],
[false, true]
]
}
// Alert when boolean field toggles in either direction
```
**Example 3: Detect Specific Status Change**
```json
{
"transitions": [
["pending", "approved"],
["pending", "rejected"]
]
}
// Alert when pending status changes to approved or rejected
// Ignore all other transitions
```
**Example 4: Detect Anything But This**
```json
{
"transitions": [
["include", "*.*", "*", "*"],
["exclude", "Endotest.Last_Updated", "*", "*"]
]
}
// Alert on any field change
// EXCEPT exclude changes to Last_Updated
```
---
## Exception Handling (Pipeline Architecture)
With the new unified pipeline format, exceptions are now just regular pipeline steps with different actions. This section explains the patterns.
### Pattern 1: Simple Whitelist (Include Only)
Allow specific field/transition combinations:
```json
{
"transitions": [
["include", "Request_Sent", false, true],
["include", "Diagnostic_Status", "warning", "complete"]
]
}
```
**Logic:**
```
Step 1: Include Request_Sent with false→true transition
Step 2: Include Diagnostic_Status with warning→complete
Result: ONLY these specific field+transition combinations are checked
```
### Pattern 2: Simple Blacklist (Exclude Only)
Block specific field/transition combinations:
```json
{
"transitions": [
["include", "*.*", "*", "*"],
["exclude", "Last_Updated", "*", "*"],
["exclude", "Endotest.Import_Time", "*", "*"]
]
}
```
**Logic:**
```
Step 1: Include all fields with any change (*→*)
Step 2: Exclude Last_Updated from being checked
Step 3: Exclude Endotest.Import_Time from being checked
Result: All fields EXCEPT Last_Updated and Import_Time
```
### Pattern 3: Main Rule + Multiple Exceptions
Combine main transition rule with field-specific exceptions:
```json
{
"transitions": [
["include", "*.*", "*defined", "*defined"],
["include", "Request_Sent", false, true],
["exclude", "Endotest.Last_Modified", "*", "*"]
]
}
```
**Logic:**
```
Step 1: Include fields that change between two defined values
Step 2: ALSO include Request_Sent changing from false to true (even if not *defined→*defined)
Step 3: But exclude any change to Last_Modified (overrides Step 1)
Result: *defined→*defined changes PLUS Request_Sent false→true, EXCEPT Last_Modified
```
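The three patterns share one evaluation rule: steps run in order, and the last step that matches a given change decides whether it is flagged. The sketch below illustrates this with Pattern 3. All function names are hypothetical, callers are assumed to pass only real changes (old differs from new), and treating a selector without a dot as "this field name in any group" is an assumption.

```python
from fnmatch import fnmatchcase


def is_undefined(value):
    return value is None or value in ("", "undefined")


def value_matches(pattern, value):
    if pattern == "*":
        return True
    if pattern in ("*undefined", "*defined"):
        return is_undefined(value) == (pattern == "*undefined")
    return type(pattern) is type(value) and pattern == value  # strict literal match


def selector_matches(selector, qualified_name):
    # Assumption: a selector without a dot matches the field name in any group
    if "." not in selector:
        return qualified_name.split(".", 1)[-1] == selector
    return fnmatchcase(qualified_name, selector)


def is_flagged(pipeline, qualified_name, old, new):
    """Run the steps in order; the last matching step decides."""
    flagged = False
    for action, selector, old_pattern, new_pattern in pipeline:
        if (selector_matches(selector, qualified_name)
                and value_matches(old_pattern, old)
                and value_matches(new_pattern, new)):
            flagged = action == "include"
    return flagged


pipeline = [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", False, True],
    ["exclude", "Endotest.Last_Modified", "*", "*"],
]

status_change = is_flagged(pipeline, "Inclusion.Status", "a", "b")
audit_change = is_flagged(pipeline, "Endotest.Last_Modified", "2025-01-01", "2025-02-01")
```

Here `status_change` is flagged by step 1, while `audit_change` is first included by step 1 and then removed again by the final `exclude` step.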
### Field Selector Formats in Pipeline
**Simple field name (matches in any group):**
```json
{
"field_selector": "Status"
}
// Matches "Status" in any group
// But this is NOT pipeline syntax - use "*.*" with field matching instead
```
**Better: Use qualified notation in field_selector:**
```json
["include", "Endotest.Request_Sent", false, true]
// Matches ONLY Endotest.Request_Sent (group Endotest, field Request_Sent)
```
**Full Specification:**
```json
{
"field": "Endotest.Request_Sent",
"transition": [false, true]
}
// Matches this specific field AND transition combination
```
### Practical Examples with Pipeline
**Example 1: Alert on Most Changes, Except System Fields**
```json
{
"transitions": [
["include", "*.*", "*", "*"],
["exclude", "Last_Updated", "*", "*"],
["exclude", "Last_Modified_By", "*", "*"],
["exclude", "Import_Timestamp", "*", "*"]
]
}
// Step 1: Include ANY field change
// Step 2-4: Exclude system timestamp/audit fields
```
**Example 2: Alert on Undefined→Defined, Plus Status Reversals**
```json
{
"transitions": [
["include", "*.*", "*undefined", "*defined"],
["include", "Request_Status", "rejected", "submitted"]
]
}
// Step 1: Include when field goes from undefined to defined
// Step 2: ALSO include Request_Status: rejected → submitted (even if not undefined→defined)
```
**Example 3: Complex Medical Rules with Multiple Conditions**
```json
{
"transitions": [
["include", "*.*", "*undefined", "*defined"],
["include", "Endotest.Test_Result", "pending", "completed"],
["include", "GDD.Status", "pending", "failed"],
["exclude", "Endotest.Last_Sync", "*", "*"]
]
}
// Step 1: Include main rule: undefined→defined
// Step 2: ALSO include Test_Result pending→completed
// Step 3: ALSO include GDD.Status pending→failed
// Step 4: But exclude any change to Last_Sync field
// Result: All matching transitions except Last_Sync changes
```
**Example 4: Fine-Grained Control with Include + Exclude**
```json
{
"transitions": [
["include", "*.*", "*", "*"],
["include", "Status", "*undefined", "*defined"],
["include", "Status", "*defined", "*undefined"],
["exclude", "Last_Updated", "*", "*"],
["exclude", "Internal_Id", "*", "*"]
]
}
// Step 1: Include any change (baseline)
// Step 2-3: Specifically include Status becoming defined/undefined
// Step 4-5: Exclude Last_Updated and Internal_Id changes (override Step 1)
// Result: All changes EXCEPT Last_Updated/Internal_Id, plus Status transitions
```
---
## Configuration Examples
### Example 1: Monitor New Inclusions (v3.0)
**Requirement:** Alert if unexpected number of patients added
```json
{
"ignore": null,
"bloc_title": "Identification",
"line_label": "New Inclusions",
"warning_threshold": 0,
"critical_threshold": 50,
"field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
"bloc_scope": null,
"transitions": []
}
```
**Field Selection Logic:**
```
Starts empty: candidate_fields = {}
Step 1: Include Patient_Identification.Patient_Id
Step 2: Include Patient_Identification.Pseudo
Result: [Patient_Identification.Patient_Id, Patient_Identification.Pseudo]
These become key candidates (tried in order)
```
**Logic:**
```
Count patients in current but not in previous
If count > 50: CRITICAL (too many new patients)
If count > 0: WARNING (any new patients)
If count == 0: OK
```
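One way to picture "key candidates tried in order" is shown below. This is a sketch under assumptions: inclusions are taken to be nested `{group: {field: value}}` records, the key field is chosen once from the first inclusion, and `get_value`, `pick_key_field` and `build_key_dict` are hypothetical names.

```python
def get_value(inclusion, qualified_name):
    """Read "group.field" from a nested inclusion record; None if absent."""
    group, field = qualified_name.split(".", 1)
    return inclusion.get(group, {}).get(field)


def pick_key_field(first_inclusion, candidates):
    """Return the first candidate that holds a usable value in the first inclusion."""
    for name in candidates:
        if get_value(first_inclusion, name) not in (None, "", "undefined"):
            return name
    raise ValueError("No key candidate has a value")


def build_key_dict(inclusions, key_field):
    """Index inclusions by the value of the chosen key field."""
    return {get_value(inc, key_field): inc for inc in inclusions}


candidates = ["Patient_Identification.Patient_Id", "Patient_Identification.Pseudo"]
previous = [{"Patient_Identification": {"Patient_Id": "", "Pseudo": "AB12"}}]
current = [
    {"Patient_Identification": {"Patient_Id": "", "Pseudo": "AB12"}},
    {"Patient_Identification": {"Patient_Id": "", "Pseudo": "CD34"}},
]

key_field = pick_key_field(current[0], candidates)
new_keys = set(build_key_dict(current, key_field)) - set(build_key_dict(previous, key_field))
```

Because `Patient_Id` is empty in this sample, the sketch falls back to `Pseudo`, and one new key is found, which would give a WARNING with the thresholds of this example.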
### Example 2: Detect Undefined→Defined Changes (v3.0)
**Requirement:** Alert if any `Inclusion` field goes from undefined to defined
```json
{
"bloc_title": "Inclusion Protocol",
"line_label": "Undefined to Defined",
"warning_threshold": 0,
"critical_threshold": 100,
"field_selection": [["include", "Inclusion.*"]],
"bloc_scope": "any",
"transitions": [
["include", "*.*", "*undefined", "*defined"]
]
}
```
**Field Selection & Transitions:**
```
Field Selection: Include all Inclusion fields
Transitions Pipeline:
Step 1: Include *.* *undefined→*defined
Result: Only undefined→defined changes
```
**Logic:**
```
For each inclusion:
  Check every selected Inclusion.* field for a change
  If ANY changed field shows the transition undefined → defined:
    COUNT this inclusion
If count > 100: CRITICAL
If count > 0: WARNING
```
### Example 3: Strict All-Fields Completeness (v3.0)
**Requirement:** Ensure ALL changed fields follow undefined→defined pattern
```json
{
"bloc_title": "Inclusion Protocol",
"line_label": "All Changes Undefined to Defined",
"warning_threshold": 0,
"critical_threshold": 200,
"field_selection": [["include", "Inclusion.*"]],
"bloc_scope": "all",
"transitions": [
["include", "*.*", "*undefined", "*defined"]
]
}
```
**Key Difference with bloc_scope="all":**
```
With bloc_scope="any": Count if ANY field matches
With bloc_scope="all": Count ONLY if ALL changed fields match
```
**Logic:**
```
For each inclusion:
Find all Inclusion fields that changed
Check if ALL changes are: undefined → defined
If all changed fields match pattern:
COUNT this inclusion (expected pattern)
If any changed field doesn't match:
SKIP (unexpected pattern)
If count > 200: CRITICAL (too many gaining data)
```
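The difference between the two `bloc_scope` values comes down to `any` versus `all` over the changed fields of one inclusion. A minimal sketch, in which `inclusion_counts` and the simplified matcher `undefined_to_defined` are illustrative names rather than project functions:

```python
def inclusion_counts(changes, matches, bloc_scope="any"):
    """Decide whether one inclusion is counted.

    changes: list of (field, old, new) tuples for the fields that changed
    matches: callable(field, old, new) -> bool, the transition test
    """
    results = [matches(field, old, new) for field, old, new in changes]
    if not results:
        return False  # nothing changed, nothing to count
    return all(results) if bloc_scope == "all" else any(results)


def undefined_to_defined(field, old, new):
    """Simplified matcher for the *undefined -> *defined transition."""
    undefined = (None, "", "undefined")
    return old in undefined and new not in undefined


changes = [
    ("Inclusion.Status", None, "included"),          # matches the pattern
    ("Inclusion.Date", "2025-01-01", "2025-02-01"),  # defined -> defined: no match
]

counted_any = inclusion_counts(changes, undefined_to_defined, "any")
counted_all = inclusion_counts(changes, undefined_to_defined, "all")
```

The same inclusion is counted under `"any"` but skipped under `"all"`, because one of its two changes does not follow the pattern.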
### Example 4: Request Lifecycle Validation (v3.0)
**Requirement:** Detect expected test request state transitions
```json
{
"bloc_title": "Endotest",
"line_label": "Request Status Changes",
"warning_threshold": 0,
"critical_threshold": 100,
"field_selection": [["include", "Endotest.Request_Sent"], ["include", "Endotest.Request_Status"]],
"bloc_scope": "any",
"transitions": [
["include", "Endotest.Request_Sent", false, true],
["include", "Endotest.Request_Status", "pending", "accepted"],
["include", "Endotest.Request_Status", "pending", "rejected"]
]
}
```
**Field Selection Pipeline:**
```
Empty set start
Step 1: Include Endotest.Request_Sent
Step 2: Include Endotest.Request_Status
Result: {Endotest.Request_Sent, Endotest.Request_Status}
```
**Logic:**
```
For each inclusion:
Check Endotest fields (Request_Sent, Request_Status)
If ANY field matches transitions:
COUNT this inclusion
If count > 100: CRITICAL (too many status changes)
```
### Example 5: Valid Workflow Transitions
**Requirement:** Alert on workflow changes, but only for valid state transitions (pending → accepted or rejected, rejected → resubmitted, accepted → cancelled)
```json
{
  "bloc_title": "Endotest",
  "line_label": "Valid Request Transitions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Endotest.Request_Status"]],
  "transitions": [
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"],
    ["include", "Endotest.Request_Status", "rejected", "resubmitted"],
    ["include", "Endotest.Request_Status", "accepted", "cancelled"]
  ],
  "bloc_scope": "any"
}
```
**Logic:**
```
For each inclusion:
Check if Request_Status field changed
If transition matches ONE of the 4 allowed transitions:
COUNT this inclusion (valid workflow)
If transition is different:
SKIP (unexpected change - needs investigation)
If count > 50: CRITICAL (too many valid status transitions)
```
**Note:** With several `include` steps for the same field, a change is counted if it matches ANY of the specified transitions.
---
### Example 6: Exclude Internal Fields
**Requirement:** Monitor data changes but ignore internal/system fields
```json
{
  "bloc_title": "Identification",
  "line_label": "Data Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "*.*"]],
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Import_Time", "*", "*"],
    ["exclude", "Internal_Id", "*", "*"]
  ],
  "bloc_scope": "any"
}
```
**Logic:**
```
For each inclusion:
Check ALL fields EXCEPT [Last_Updated, Import_Time, Internal_Id]
If ANY field changed:
COUNT this inclusion
If count > 100: CRITICAL (too many changes)
```
---
## User Guide: Adding/Modifying Rules
### Step 1: Identify Rule Need
Determine the data validation requirement:
```
Detection Type Use Pattern
─────────────────────────────────────────────────
New patients added "New Inclusions" rule
Patients removed "Deleted Inclusions" rule
Field values changed Standard rule + transitions
Field added/removed "New/Deleted Fields" rule
Specific transitions Standard rule + narrow transitions
Exclude system changes Standard rule + exceptions
```
### Step 2: Choose Rule Type
| Rule Type | When to Use | Complexity |
|-----------|------------|-----------|
| New Inclusions | Track patient additions | Simple |
| Deleted Inclusions | Track patient removals | Simple |
| New Fields | Monitor schema changes | Simple |
| Deleted Fields | Detect removed data | Simple |
| Standard (Transitions) | Monitor specific changes | Medium |
| Standard (with Exceptions) | Monitor changes + allowances | Complex |
### Step 3: Define Thresholds
```
Decision Matrix:
Threshold Pattern    Meaning                                Example Use
───────────────────────────────────────────────────────────────────────────
(0, 0)               Any change is critical                 Critical data
(0, 1)               1 change warns, 2+ are critical        Surgery dates
(0, 50)              Any change warns, >50 is critical      High-value fields
(50, 100)            >50 warns, >100 is critical            Flexible fields
(200, 200)           No warning band, >200 is critical      Lenient tracking
```
Recommendation:
```
Strict validation (medical):
warning = 0, critical = 1
Normal validation (most fields):
warning = 5, critical = 20
Lenient validation (administrative):
warning = 50, critical = 100
```
### Step 4: Create Rule Row in Excel
Open `Endobest_Dashboard_Config.xlsx` → `Regression_Check` sheet
```
Row N:
A: ignore (leave empty)
B: bloc_title (e.g., "Inclusion Protocol")
C: line_label (e.g., "Status Changed")
D: warning_threshold (e.g., 0)
E: critical_threshold (e.g., 20)
F: field_selection (e.g., [["include", "Inclusion.Status"], ["include", "Inclusion.Date"]])
G: bloc_scope (e.g., "any")
H: transitions (e.g., [["include", "*.*", "*", "*"]])
```
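As an illustration of how such a row might be turned into a rule, the sketch below maps the cells of one row to named fields and decodes the two JSON cells. It assumes the v3.0 column layout (A to H, with `field_selection` in column F); `COLUMNS`, `JSON_COLUMNS` and `parse_rule_row` are hypothetical names, and reading the workbook itself is left out.

```python
import json

# Assumed v3.0 column order, A through H
COLUMNS = ["ignore", "bloc_title", "line_label", "warning_threshold",
           "critical_threshold", "field_selection", "bloc_scope", "transitions"]
JSON_COLUMNS = {"field_selection", "transitions"}


def parse_rule_row(cells):
    """Turn one row (cells A..H) into a rule dict, decoding the JSON cells."""
    rule = dict(zip(COLUMNS, cells))
    for name in JSON_COLUMNS:
        raw = rule[name]
        rule[name] = json.loads(raw) if raw else []  # empty cell -> empty pipeline
    return rule


row = (None, "Inclusion Protocol", "Status Changed", 0, 20,
       '[["include", "Inclusion.*"]]', "any",
       '[["include", "*.*", "*", "*"]]')
rule = parse_rule_row(row)
```

Keeping the pipelines as JSON text in the cells means a malformed cell fails at load time, before any check runs.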
### Step 5: Define Field Scope
Decide which fields the rule applies to:
```
Scope                  field_selection
──────────────────────────────────────────────────────────────────────────────────
All fields             [["include", "*.*"]]
All in group X         [["include", "X.*"]]
Multiple groups        [["include", "group1.*"], ["include", "group2.*"]]
All except group X     [["include", "*.*"], ["exclude", "X.*"]]
Specific field         [["include", "Group.field1"]]
Multiple fields        [["include", "Group.field1"], ["include", "Group.field2"]]
```
### Step 6: Define Transitions
Specify what changes to monitor:
```
Pattern JSON Meaning
────────────────────────────────────────────────────────────
Any change [["*", "*"]] Monitor all changes
Become defined [["*undefined", "*defined"]] Field gets value
Become undefined [["*defined", "*undefined"]] Field loses value
Toggle boolean [[true, false], [false, true]] Boolean flip
Specific change [["old", "new"]] Exact transition
Multiple changes [["old1", "new1"], ["old2", "new2"]] Multiple patterns
```
### Step 7: Set Exceptions (Optional)
Exceptions are extra steps appended to the `transitions` pipeline:
```
If needed, allow an extra field/transition combination:
  ["include", "Request_Sent", false, true]
Or exclude specific cases:
  ["exclude", "Last_Updated", "*", "*"]
```
### Step 8: Choose Bloc Scope
Decide aggregation logic:
```
Requirement bloc_scope
─────────────────────────────────────────────
Any field changes "any" (default)
All changes match "all"
```
### Step 9: Validate & Test
```bash
# Check-only mode (validates configuration)
python eb_dashboard.py --check-only
# Expected output:
# ✓ Loaded 42 regression check rules
# ✓ All checks passed
```
### Step 10: Full Collection Test
```bash
# Run full collection to test rule
python eb_dashboard.py
# After collection, verify:
# 1. Rule appears in output
# 2. Severity level is correct (OK/Warning/Critical)
# 3. Count matches expectations
```
---
## Execution Modes
### Mode 1: Normal Collection with Quality Checks
```bash
python eb_dashboard.py
```
**Workflow:**
```
1. Collect data (organizations, inclusions)
2. Run Coherence Check
3. Run Non-Regression Check (if old file exists)
4. If critical issues: Ask user for confirmation
5. If OK or user confirms: Export files
6. Display elapsed time
```
**Output:**
```
Collecting data from 15 organizations...
[████████████████████] 1200/1200
═══ Coherence Check ═══
✓ [green]TOTAL matches[/green]
═══ Non Regression Check ═══
✓ [green]Structure: New Fields: 0[/green]
✓ [green]Identification: New Inclusions: 0[/green]
...
✓ All checks passed successfully!
Writing files...
Elapsed time: 3:42
```
### Mode 2: Check-Only (Validation Only)
```bash
python eb_dashboard.py --check-only
```
**Workflow:**
```
1. Load existing JSON files (no API calls)
2. Load regression configuration
3. Run Coherence Check
4. Run Non-Regression Check
5. Report results
6. Exit
```
**Use Case:** Validate data before distribution without fresh collection
**Output:**
```
═══ CHECK ONLY MODE ═══
Running quality checks on existing data files...
[Loading configuration...]
[Running checks...]
✓ All checks passed successfully!
```
### Mode 3: Compare Two Files
```bash
python eb_dashboard.py --check-only file1.json file2.json
```
**Workflow:**
```
1. Load file1 and file2 (as current and old)
2. Skip coherence check (organizations not provided)
3. Run regression check comparing them
4. Report differences
5. Exit
```
**Use Case:** Compare two snapshots, detect changes between versions
**Output:**
```
═══ CHECK ONLY COMPARE MODE ═══
Comparing two specific files:
Current: file1.json
Old: file2.json
[Running regression checks...]
⚠ [yellow]New Inclusions: 15[/yellow]
✗ [red]Deleted Inclusions: 5[/red]
...
```
### Mode 4: Debug Mode (Verbose Output)
```bash
python eb_dashboard.py --debug
```
**Workflow:**
```
1. Execute as Normal Mode
2. Enable DEBUG_MODE in quality checks
3. Display detailed field-by-field changes
4. Show individual inclusion comparisons
5. Verbose logging
```
**Use Case:** Troubleshoot regression rules, understand data changes
**Output:**
```
Running collection...
[████████] 1200/1200
═══ Non Regression Check (DEBUG MODE) ═══
Endotest - Undefined to Defined (Only): 12
✓ Patient-001:
- Endotest.Request_Sent: false → true
- Endotest.Request_Status: undefined → 'completed'
✓ Patient-002:
- Endotest.Request_Sent: false → true
...
```
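The four modes above map onto two command-line flags. The following sketch shows one way such flags could be parsed with `argparse`; it is an illustration only, and the actual argument handling in `eb_dashboard.py` may differ. `build_parser` and `select_mode` are hypothetical names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="eb_dashboard.py")
    # nargs="*" lets --check-only appear alone or be followed by file names
    parser.add_argument("--check-only", nargs="*", metavar="FILE", default=None,
                        help="run checks on existing files; pass two files to compare them")
    parser.add_argument("--debug", action="store_true",
                        help="verbose quality-check output")
    return parser


def select_mode(args):
    """Map parsed arguments to one of the execution modes described above."""
    if args.check_only is None:  # flag not given at all
        return "debug" if args.debug else "normal"
    if len(args.check_only) == 0:
        return "check-only"
    if len(args.check_only) == 2:
        return "compare"
    raise SystemExit("--check-only takes either no file or exactly two files")


parser = build_parser()
mode_normal = select_mode(parser.parse_args([]))
mode_compare = select_mode(parser.parse_args(["--check-only", "file1.json", "file2.json"]))
```

Using `default=None` distinguishes "flag absent" from "flag present without files", which is what separates normal mode from check-only mode in this sketch.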
---
## Troubleshooting
### Issue 1: "Invalid JSON format" Error
**Symptom:** Configuration validation fails
**Cause:** Malformed JSON in the `transitions` or `field_selection` cell
**Solution:**
1. Open cell in JSON validator
2. Fix syntax errors
3. Re-run check
**Example - WRONG:**
```json
{
"transitions": [["active", "inactive" ] // Missing closing bracket
}
{
"field_selection": [["include", "Inclusion.Status"] ["include", "Inclusion.Date"]] // Missing comma between array elements
}
```
**Example - CORRECT:**
```json
{
"transitions": [["active", "inactive"]]
}
{
"field_selection": [["include", "Inclusion.Status"], ["include", "Inclusion.Date"]]
}
```
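Instead of pasting each cell into an external validator, the cell text can be checked with Python's standard `json` module, which reports where parsing failed. A small sketch, with `check_json_cell` as a hypothetical helper name:

```python
import json


def check_json_cell(column, text):
    """Return None if the cell parses, else a message locating the syntax error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"{column}: {err.msg} at line {err.lineno}, column {err.colno}"
    return None


bad = check_json_cell("transitions", '[["active", "inactive" ]')
good = check_json_cell("transitions", '[["active", "inactive"]]')
print(bad)
```

Note that cell values must be plain JSON: the `//` comments used in this guide's examples are explanatory only and would themselves be rejected by the parser.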
### Issue 2: Rule Never Triggers
**Symptom:** Count always shows 0 even when data changes
**Causes:**
1. Field selection too restrictive
2. Transition pattern doesn't match actual changes
3. `field_selection` pipeline excludes the target fields
**Solution:**
1. Loosen the field selection: start from `[["include", "*.*"]]`
2. Use wildcards in transitions: `["*", "*"]`
3. Check actual field names in JSON output
4. Enable debug mode to see field matching
### Issue 3: Too Many False Positives
**Symptom:** Rule triggers unexpectedly, too many violations
**Causes:**
1. Thresholds set too low
2. Transitions too broad (matching unintended changes)
3. `field_selection` too permissive
**Solution:**
1. Increase thresholds: Raise warning_threshold and critical_threshold
2. Narrow transitions: Use specific values instead of wildcards
3. Add exceptions: Append `exclude` steps to the transitions pipeline for specific cases
4. Narrow field scope: Select specific fields (`group.field`) instead of `*.*`
### Issue 4: Configuration Changes Not Taking Effect
**Symptom:** Modifications to Excel file don't affect results
**Causes:**
1. File not saved
2. Regression_Check sheet not loaded
3. Old configuration still in memory
**Solution:**
1. Save Excel file (Ctrl+S)
2. Restart Python script
3. Verify sheet name is exactly "Regression_Check"
4. Check file path is correct
### Issue 5: User Confirmation Not Appearing
**Symptom:** Expected prompt for critical issues doesn't show
**Causes:**
1. Issues are at warning level, not critical
2. Thresholds higher than actual counts
3. Running in check-only mode (no export decision needed)
**Solution:**
1. Verify thresholds: warning ≤ critical, and remember that the prompt only appears when a count exceeds critical_threshold
2. Check actual violation counts
3. Run normal mode (not check-only)
### Issue 6: Comparison Mode Showing Unexpected Differences
**Symptom:** `--check-only file1 file2` reports many changes
**Causes:**
1. Files are from different collection dates (expected)
2. Configuration changed between collections (expected)
3. Field order or grouping changed (might be false positive)
**Solution:**
1. Review reported changes manually
2. Check if changes are expected (new patient data added)
3. Verify no data corruption occurred
4. Compare file sizes and counts manually
---
## Performance Considerations
### Regression Check Execution Time
**Factors Affecting Performance:**
```
1. Number of Inclusions (patients)
- N patients = O(N) iterations
- Typical: 1200 patients = 1-2 seconds
2. Number of Rules
- R rules applied to each inclusion
- Typical: 20-30 rules = <100ms total
3. Field Matching Complexity
- Filter evaluation per field
- Dotted notation (group.field) parsing: O(1) per field
- Typical: <50ms for all rules
4. Total Typical Time
- 1200 inclusions × 25 rules = 1-3 seconds
```
### Optimization Tips
**If Regression Check is Slow:**
1. **Reduce rule count:**
- Remove inactive rules (add "ignore" label)
- Combine similar rules
2. **Simplify field filters:**
- Prefer a few group selectors (`group.*`) over long lists of individual fields
- Start from a narrow include rather than `*.*` followed by many excludes
3. **Narrow transitions:**
- Use specific values instead of wildcards
- Reduce number of transition pairs
4. **Consider file size:**
- Large JSON files (>20MB) take longer to parse
- This is rare and usually not the bottleneck
---
## Summary
The Quality Checks System provides:
- **Multi-Level Validation:** Coherence + Regression checks
- **Config-Driven Rules:** No code changes needed
- **Flexible Thresholds:** Warning and Critical levels
- **Rich Filtering:** Group, field, and dotted-notation (`group.field`) support
- **Transition Patterns:** Wildcard, keyword, and specific matching
- **Advanced Exception Handling:**
  - Multiple transitions per exception: `[[old1, new1], [old2, new2], ...]`
  - Include + Exclude can coexist simultaneously
  - Fine-grained control over allowed/blocked transitions
- **Backward Compatible:** Legacy single-transition format still supported
- **Debug Support:** Detailed logging and debug mode
- **Execution Modes:** Normal, check-only, compare, debug
This architecture enables robust data quality monitoring without requiring code modifications, empowering business analysts to define and evolve validation rules independently.
---
**Document End**