# Endobest Quality Checks & Regression Testing Guide

## Part 3: Quality Assurance, Validation Rules & Configuration

**Document Version:** 3.1 (Updated with new Excel export module reference)
**Last Updated:** 2025-11-08
**Audience:** Developers, Business Analysts, QA Engineers
**Language:** English

**Note:** Excel export functionality now available - see DOCUMENTATION_13_EXCEL_EXPORT.md, DOCUMENTATION_98_USER_GUIDE.md, and DOCUMENTATION_99_CONFIG_GUIDE.md

---

## Version History

### Version 3.0 (2025-10-22) - UNIFIED FIELD SELECTION PIPELINE
**Complete Refactoring of Field Selection**
- ✅ **Merged Columns:** `field_group` (F) + `field_name` (G) → single `field_selection` (F)
- ✅ **Simplified Syntax:** Field selection uses the same pipeline format as transitions: `[["action", "field_selector"], ...]`
- ✅ **3 Selector Patterns:** `*.*` (all fields), `group.*` (group), `group.field` (specific)
- ✅ **Cleaner Code:** Removed 150+ lines of dual-filter logic (field_group + field_name combinations)
- ✅ **Config-Driven Keys:** Key field determination (Patient_Id, Pseudo) is now read from `field_selection` instead of being hardcoded
- ✅ **Unified Key Detection:** New `_get_key_field_from_new_inclusions_rule()` applies the field_selection pipeline directly to the first inclusion (15 LOC, -75% vs manual parsing)
- ✅ **Helper Functions:** `_apply_field_selection_pipeline()`, `_get_key_field_from_new_inclusions_rule()`, `_build_candidate_fields()`
- ⚠️ **MAJOR Breaking Change:** Old `field_group` and `field_name` columns (F, G) are **removed**
- ⚠️ **Column Shifts:** `bloc_scope` moves H→G, `transitions` moves I→H
- ⚠️ **Configuration Migration Required:** The Excel `Regression_Check` sheet must be completely restructured

**Technical Details:**
- The field selection pipeline starts with an empty set; each step adds or removes fields
- The admin is responsible for ordering the steps correctly (no implicit logic)
- Special rules `"New Fields"`, `"Deleted Fields"`, `"Deleted Inclusions"` must have an empty field_selection
- Special rule `"New Inclusions"` applies the field_selection pipeline to the first inclusion sample (assumes a stable structure)
- Key field detection: finds the first field from the pipeline that has a non-null value in both the first new and the first old inclusion
- Configuration validation: missing/invalid field_selection = CRITICAL error

**Removed Dead Code:**
- `_determine_key_field()` - hardcoded Patient_Id/Pseudo logic
- `_matches_field_group_filter()` - replaced by pipeline
- `_matches_field_name_filter()` - replaced by pipeline
- `_determine_key_field_from_config()` - replaced by simplified unified `_get_key_field_from_new_inclusions_rule()`

### Version 2.0 (2025-10-22) - Pipeline Architecture
**Transitions Pipeline Introduced**
- ✅ **Unified Format:** Merged `transitions` + `transition_exceptions` into single `transitions` column
- ✅ **Simplified Syntax:** Each step is a 4-element array `[action, field_selector, from, to]`
- ✅ **Sequential Processing:** Pipeline steps applied in order, allowing fine-grained control
- ✅ **Better Determinism:** All sets sorted for reproducible logs
- ✅ **Improved Error Handling:** Invalid configs are skipped with a logged warning instead of raising an exception
- ⚠️ **Breaking Change:** Old `transition_exceptions` column (J) merged into `transitions` (I)

### Version 1.0 (2025-10-21) - Initial Release
- Dual-column system: `transitions` (I) + `transition_exceptions` (J)
- Include/exclude exception handling
- Multiple transition support per exception

---

## Table of Contents

1. [Overview](#overview)
2. [Quality Assurance Strategy](#quality-assurance-strategy)
3. [Coherence Check (Technical Details)](#coherence-check-technical-details)
4. [Non-Regression Check Framework](#non-regression-check-framework)
5. [Regression Check Configuration File](#regression-check-configuration-file)
6. [Column Reference](#column-reference)
7. [Special Keywords & Wildcards](#special-keywords--wildcards)
8. [Rule Types & Logic](#rule-types--logic)
9. [Field Selection Pipeline](#field-selection-pipeline-v30)
10. [Transition Patterns](#transition-patterns)
11. [Exception Handling](#exception-handling)
12. [Configuration Examples](#configuration-examples)
13. [User Guide: Adding/Modifying Rules](#user-guide-adding-modifying-rules)
14. [Execution Modes](#execution-modes)
15. [Troubleshooting](#troubleshooting)

---

## ⚠️ CRITICAL - Version 3.0 Migration Required

**This document describes v3.0 with BREAKING CHANGES from v2.0**

| Item | v2.0 | v3.0 |
|------|------|------|
| **Excel Columns** | F-I: `field_group`, `field_name`, `bloc_scope`, `transitions` | F-H: `field_selection`, `bloc_scope`, `transitions` |
| **Column Count** | 4 columns for filtering+transitions | 3 columns (merged field_selection) |
| **Key Field Config** | Hardcoded (Patient_Id/Pseudo) | Config-driven (from field_selection) |
| **Field Filtering Logic** | 6+ combinations (complex) | Single pipeline (simple) |

**ACTION REQUIRED:**
1. ✅ Update Excel file column positions
2. ✅ Migrate field_group + field_name → field_selection
3. ✅ Run non-regression tests
4. ✅ Verify key field detection works with new config

---

## Overview

The **Quality Checks System** provides comprehensive data validation in two stages:

1. **Coherence Check:** Verifies that organization statistics (API counters) match the actual detailed inclusion data
2. **Non-Regression Check:** Detects unexpected data changes between current and previous collection runs

Both checks are **configurable via Excel** with **Warning/Critical severity levels** that can trigger user confirmation prompts.

### Design Philosophy

```
Trust, but Verify

- Trust: API data is generally reliable
- Verify: Statistical consistency and change detection
- Report: Multi-level severity (OK, Warning, Critical)
- Decide: User confirmation before export on critical issues
```

---

## Quality Assurance Strategy

### Workflow Integration

```
Data Collection
    ↓
QUALITY CHECKS
    ├─ COHERENCE CHECK (mandatory)
    │   ├─ Load organization statistics from API responses
    │   ├─ Calculate actual counts from detailed inclusions
    │   └─ Compare: Stats vs. Actual
    │
    ├─ NON-REGRESSION CHECK (if old file exists)
    │   ├─ Load previous inclusions (_old file)
    │   ├─ Apply config-driven comparison rules
    │   └─ Report: Changes matching configured patterns
    │
    └─ RESULT
        ├─ has_coherence_critical flag
        └─ has_regression_critical flag
    ↓
IF critical issues detected:
    ├─ Display warning: ⚠ CRITICAL
    ├─ Ask user: "Write results anyway?"
    ├─ If NO → Abort export, preserve old files
    └─ If YES → Continue with export (user override)
ELSE:
    └─ Continue with export automatically
```

### Severity Levels

| Level | Display | Meaning | Action |
|-------|---------|---------|--------|
| **OK** | ✓ Green | No issues, within normal range | Continue automatically |
| **WARNING** | ⚠ Yellow | Issue detected, exceeds warning threshold | Log and display, continue automatically |
| **CRITICAL** | ✗ Red | Severe issue, exceeds critical threshold | Display, ask user before export |

### User Interaction

```
Quality Checks Complete

✗ [red]Coherence Check: CRITICAL[/red]
  ⚠ [yellow]Organization 1 mismatch: 95 vs 98[/yellow]

✗ [red]Non-Regression: CRITICAL[/red]
  ⚠ [yellow]New Inclusions: 42 (threshold 50)[/yellow]
  ✗ [red]Deleted Inclusions: 15 (threshold 0)[/red]

[bold]⚠ CRITICAL issues detected in quality checks![/bold]
Do you want to write the results anyway? [y/N]:
  y → Export anyway (risky, user override)
  n → Cancel export (preserve old files)
```

---

## Coherence Check (Technical Details)

### Purpose

Verify that **organization statistics** (fetched from API) match **actual detailed data** (inclusion-by-inclusion count).

### Data Sources

**Source 1: Organization Statistics (API)**
```
For each organization:
  GET /api/inclusions/inclusion-statistics
  Returns:
  {
    "totalInclusions": N,           // Total patients
    "preIncluded": P,               // Pré-inclus count
    "included": I,                  // Inclus count
    "prematurelyTerminated": T      // Prematurely terminated
  }
```

**Source 2: Inclusion Details (JSON Array)**
```
For each patient in endobest_inclusions:
  Check: Patient_Identification.Organisation_Id
  Count: Based on Inclusion.Inclusion_Status

  Classification rules:
  1. If status ends with " - AP" → prematurely_terminated
  2. Else if status starts with "pré-inclus" → preincluded
  3. Else if status starts with "inclus" → included
  Always count: patients += 1
```

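The classification rules above can be sketched as a small counting helper. This is a minimal illustration, not the project's actual implementation: it assumes each inclusion is a nested dict (`inclusion["Patient_Identification"]["Organisation_Id"]`, `inclusion["Inclusion"]["Inclusion_Status"]`), that the helper returns a dict keyed by counter name, and that the prefix comparison is case-insensitive.

```python
def calculate_detail_counters(inclusions, org_id=None):
    """Count inclusions per status category, optionally for one organization.

    Assumed data shape (hypothetical): each inclusion is a dict of groups,
    each group a dict of fields.
    """
    counters = {
        "patients": 0,
        "preincluded": 0,
        "included": 0,
        "prematurely_terminated": 0,
    }
    for inclusion in inclusions:
        identification = inclusion.get("Patient_Identification") or {}
        if org_id is not None and identification.get("Organisation_Id") != org_id:
            continue

        # Every patient is counted, whatever the status
        counters["patients"] += 1

        status = ((inclusion.get("Inclusion") or {}).get("Inclusion_Status") or "").strip()
        lowered = status.lower()  # assumption: prefixes are compared case-insensitively

        # Order matters: the " - AP" suffix wins over the prefix rules
        if status.endswith(" - AP"):
            counters["prematurely_terminated"] += 1
        elif lowered.startswith("pré-inclus"):
            counters["preincluded"] += 1
        elif lowered.startswith("inclus"):
            counters["included"] += 1
    return counters
```

Checking the `" - AP"` suffix first matters because a terminated status may still begin with "inclus"; reversing the order would count such patients as included.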
### Validation Logic

```python
def coherence_check(current_inclusions, organizations_list):
    """Return True if API statistics and detailed inclusions disagree."""
    counters = ("patients", "preincluded", "included", "prematurely_terminated")
    has_critical = False

    def fmt(values):
        # Render the four counters as "P/Pre/Inc/Term"
        return "/".join(str(values[c]) for c in counters)

    # STEP 1: Collect statistics from API (summed over all organizations)
    total_stats = {
        c: sum(org[f"{c}_count"] for org in organizations_list) for c in counters
    }

    # STEP 2: Calculate actual counts from detailed data.
    # calculate_detail_counters applies the classification rules from "Data Sources"
    # and is assumed here to return a dict keyed by the four counter names.
    total_detail = calculate_detail_counters(current_inclusions, org_id=None)

    # STEP 3: Compare all 4 counters
    is_match = all(total_stats[c] == total_detail[c] for c in counters)

    # STEP 4: Report total comparison
    if is_match:
        print("✓ [green]TOTAL matches[/green]")
    else:
        print("✗ [red]TOTAL mismatch[/red]")
        print(f"Stats({fmt(total_stats)}) vs Detail({fmt(total_detail)})")
        has_critical = True

    # STEP 5: Detail-level comparison (only organizations that are not OK are reported)
    for org in organizations_list:
        # The "id" and "name" keys are assumed; adapt to the actual organization records
        org_stats = {c: org[f"{c}_count"] for c in counters}
        org_detail = calculate_detail_counters(current_inclusions, org_id=org["id"])

        if any(org_stats[c] != org_detail[c] for c in counters):
            print(f'⚠ [yellow]Organization "{org["name"]}" mismatch[/yellow]')
            print(f"Stats({fmt(org_stats)}) vs Detail({fmt(org_detail)})")
            has_critical = True

    return has_critical
```

### Example Output

**Scenario: Perfect Match**
```
═══ Coherence Check ═══

✓ [green]TOTAL - Stats(150/20/120/10) vs Detail(150/20/120/10)[/green]
```

**Scenario: Mismatch Detected**
```
═══ Coherence Check ═══

✗ [red]TOTAL - Stats(150/20/118/10) vs Detail(150/20/120/10)[/red]
  ⚠ [yellow]Center A - Stats(50/5/40/5) vs Detail(50/5/42/5)[/yellow]
```

In this scenario Center B (Stats 100/15/78/5, Detail 100/15/78/5) matches, so only Center A is reported.

### Interpretation

The check itself flags any mismatch as critical (`has_critical = True`), whatever its size. The categories below are guidance for deciding how to answer the confirmation prompt.

**Match (Green):**
```
API statistics perfectly align with detailed data
→ No data collection issues
→ Continue processing
```

**Minor Mismatch (Yellow):**
```
1-2 patients differ between statistics and details
→ Possible API consistency issue
→ Monitor but continue (it happens occasionally)
```

**Major Mismatch (Red):**
```
10+ patients difference
→ Significant data collection issue
→ Investigate root cause
→ Consider re-running collection
```

---

## Non-Regression Check Framework

### Purpose

Detect **unexpected data changes** between current and previous collections by comparing field values against configured transition patterns.

### Architecture

```
Previous Inclusions (File)
                   ↓
┌──────────────────────────────────────┐
│ NON-REGRESSION CHECK                 │
├──────────────────────────────────────┤
│ 1. Load Regression Config            │
│    (Excel: Regression_Check sheet)   │
│                                      │
│ 2. Build Inclusion Dicts             │
│    Index by: Patient_Id or Pseudo    │
│                                      │
│ 3. Group Rules by Bloc               │
│    - Structure                       │
│    - Identification                  │
│    - Inclusion Protocol              │
│    - Endotest                        │
│    - Other Questionnaires            │
│                                      │
│ 4. For Each Rule:                    │
│    a) Detect rule type               │
│       - Normal rule                  │
│       - New Inclusions               │
│       - Deleted Inclusions           │
│       - New Fields                   │
│       - Deleted Fields               │
│                                      │
│    b) Process rule logic             │
│       - Collect candidates           │
│       - Match transitions            │
│       - Apply exceptions             │
│       - Apply bloc_scope             │
│                                      │
│    c) Calculate severity             │
│       - Count vs thresholds          │
│       - Determine status             │
│                                      │
│ 5. Display Results                   │
│    - By bloc                         │
│    - Color-coded status              │
│    - Detailed changes (debug)        │
│                                      │
└──────────────────────────────────────┘
                   ↓
Current Inclusions (Memory)
```

---

## Regression Check Configuration File

### File Location & Sheet

```
Endobest_Dashboard_Config.xlsx
│
├─ Sheet 1: "Inclusions_Mapping" (See DOCUMENTATION_11_FIELD_MAPPING.md)
│
└─ Sheet 2: "Regression_Check"
   ├─ Row 1: Headers
   └─ Row 2+: Rules
```

### Sheet Structure (Version 3.0)

```
Row 1 (Headers):
  A        B            C            D                   E
  ignore   bloc_title   line_label   warning_threshold   critical_threshold

  F                 G            H
  field_selection   bloc_scope   transitions

Row 2+: Rule definitions (one per row)
```

**BREAKING CHANGE (v3.0):** Columns F and G from v2.0 (`field_group` and `field_name`) have been **merged into single column F (`field_selection`)**. All subsequent columns shifted left by one position.

**Color Coding:**
- **Yellow:** Structure/Identification bloc (foundational rules)
- **Blue:** Inclusion Protocol bloc (inclusion status rules)
- **Light Purple:** Endotest bloc (test-related rules)
- **White:** Regular rules
- **Red:** Incomplete/error rules (missing required columns)

---

## Column Reference

### Column A: ignore
**Type:** String (optional)
**Description:** Skip this row if the cell contains "ignore" (case-insensitive)
**Purpose:** Comment out rules without deleting rows
**Values:**
```
ignore          → Row is skipped
(empty)         → Row is processed
any_other_text  → Row is processed
```

### Column B: bloc_title
**Type:** String (required)
**Description:** Logical grouping of related rules
**Purpose:** Visual organization and per-bloc reporting
**Valid Values:**
```
Structure              → File format and field availability rules
Identification         → Patient identification changes
Inclusion Protocol     → Inclusion status and protocol changes
Endotest               → Laboratory test request changes
Other Questionnaires   → Non-specific questionnaire changes
[Custom Group Names]   → Any custom bloc name for organization
```

**Rules Per Bloc:**
```
Structure bloc (Example):
├─ New Fields
├─ Deleted Fields
└─ (Structure-specific rules)

Identification bloc:
├─ New Inclusions
├─ Deleted Inclusions
├─ Changed (Excluding Birthday)
├─ Changed Date of Birth/Age
└─ (Identification-specific rules)

Endotest bloc:
├─ Undefined to Defined (Only)
├─ Defined to Undefined
├─ Changed Value
└─ (Endotest-specific rules)
```

### Column C: line_label
**Type:** String (required)
**Description:** Unique rule identifier within its bloc
**Purpose:** Displayed in output, identifies rule in reports
**Examples:**
```
New Inclusions
Deleted Inclusions
New Fields
Deleted Fields
Changed Value
Undefined to Defined (Only)
```

**Requirements:**
- Must be unique within bloc_title
- Should be descriptive

### Column D: warning_threshold
**Type:** Numeric (required, >= 0)
**Description:** Count threshold that triggers WARNING level
**Position:** Column D (after line_label)
**Logic:**
```
IF count > warning_threshold AND count <= critical_threshold:
    Status = WARNING (yellow ⚠)
```

**Examples:**
```
0   → Any change triggers warning (strict)
5   → 1-5 changes = OK, 6+ = Warning (up to critical_threshold)
50  → 1-50 changes = OK, 51+ = Warning (lenient)
200 → Very lenient, only alert on large changes
```

### Column E: critical_threshold
**Type:** Numeric (required, >= warning_threshold)
**Description:** Count threshold that triggers CRITICAL level
**Position:** Column E (after warning_threshold)
**Logic:**
```
IF count > critical_threshold:
    Status = CRITICAL (red ✗)
    → May prompt user for confirmation
```

**Relationship:**
```
warning_threshold <= critical_threshold

Examples:
(0, 1)     → Strict: 1 change = warning, 2+ = critical
(0, 50)    → 1-50 changes = warning, 51+ = critical
(50, 100)  → Normal operation: 1-50 OK, 51-100 warning, 101+ critical
(200, 200) → Same thresholds: jump directly from OK to critical
```

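The two threshold rules combine into a three-way classification. The following is a minimal sketch of that logic as stated in the Column D and Column E sections; the function name `classify_severity` is hypothetical and not taken from the code base.

```python
def classify_severity(count, warning_threshold, critical_threshold):
    """Map a match count to "OK", "WARNING" or "CRITICAL".

    Both comparisons are strict (count > threshold), as in the Logic blocks
    of columns D and E.
    """
    if warning_threshold < 0 or critical_threshold < warning_threshold:
        raise ValueError("expected 0 <= warning_threshold <= critical_threshold")
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"
```

Because both comparisons are strict, a count equal to a threshold stays at the lower level: with thresholds `(50, 100)`, a count of 100 is still a warning and 101 is the first critical value.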
### Column F: field_selection (NEW - v3.0)
**Type:** JSON array of 2-element arrays (mandatory for most rules)
**Description:** Pipeline-based field selection using include/exclude actions
**Position:** Column F (after critical_threshold) - **REPLACES old field_group + field_name**
**Rules:**
- **Format:** `[["action", "field_selector"], ["action", "field_selector"], ...]`
- **Mandatory:** For all rules EXCEPT `"New Fields"`, `"Deleted Fields"`, `"Deleted Inclusions"`
- **For special rules:** Must be empty `[]` or null
- **Explicit:** No implicit logic - admin must order steps correctly
- **Pipeline:** Starts with empty set, each step adds or removes fields

**Elements:**

| Element | Type | Valid Values | Example |
|---------|------|--------------|---------|
| **action** | String | `"include"` or `"exclude"` | `"include"` |
| **field_selector** | String | `*.*`, `group.*`, `group.field` | `"Endotest.Request_Sent"` |

**Selector Patterns (3 only):**
```
*.*          → All fields in all groups
group.*      → All fields in specific group (e.g., "Endotest.*")
group.field  → Specific field only (e.g., "Endotest.Request_Sent")
```

**Examples:**

**1. Include Single Group**
```json
[["include", "Endotest.*"]]
// All Endotest fields
```

**2. Include Multiple Groups**
```json
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Endotest AND Inclusion fields
```

**3. Include All, Exclude Some**
```json
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// All fields EXCEPT Endotest.Last_Updated
```

**4. Key Field Selection (for "New Inclusions" rule)**
```json
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Tries Patient_Id first, then Pseudo (in order)
```

**5. Complex Pipeline**
```json
[
  ["include", "*.*"],
  ["exclude", "Inclusion.*"],
  ["exclude", "Patient_Identification.*"]
]
// All fields EXCEPT Inclusion and Patient_Identification
```

**Special Rules (field_selection must be EMPTY):**
```
"New Fields"          → [] or null
"Deleted Fields"      → [] or null
"Deleted Inclusions"  → [] or null
```

**Validation:**
- ✅ Missing or null field_selection for normal rules → **CRITICAL ERROR**
- ✅ Invalid selector (no dot) → **CRITICAL ERROR**
- ✅ Non-list format → **CRITICAL ERROR, skip rule**
- ✅ Step with wrong element count → **CRITICAL ERROR, skip rule**

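To make the pipeline semantics concrete, here is a hedged sketch of how a `field_selection` value could be applied to a list of available fields. It is not the project's `_apply_field_selection_pipeline()`; the signature, the `(group, field)` tuple representation, and the use of `ValueError` for the validation failures listed above are assumptions made for illustration. It also assumes that group and field names contain no dots.

```python
def _parse_selector(selector):
    """Split a selector into (group, field); only the 3 documented patterns pass."""
    if not isinstance(selector, str) or selector.count(".") != 1:
        raise ValueError(f"invalid selector (expected 'group.field'): {selector!r}")
    group, field = selector.split(".")
    if not group or not field or (group == "*" and field != "*"):
        raise ValueError(f"unsupported selector pattern: {selector!r}")
    return group, field


def apply_field_selection_pipeline(steps, available_fields):
    """Apply include/exclude steps in order, starting from an empty selection.

    steps: parsed JSON value of the field_selection cell.
    available_fields: list of (group, field) tuples found in an inclusion.
    Returns the selected (group, field) tuples in pipeline order.
    """
    if not isinstance(steps, list):
        raise ValueError("field_selection must be a JSON array of steps")

    selected = []
    for step in steps:
        if not isinstance(step, list) or len(step) != 2:
            raise ValueError(f"each step needs exactly 2 elements: {step!r}")
        action, selector = step
        if action not in ("include", "exclude"):
            raise ValueError(f"invalid action: {action!r}")
        group, field = _parse_selector(selector)

        matched = [
            (g, f) for (g, f) in available_fields
            if group in ("*", g) and field in ("*", f)
        ]
        if action == "include":
            for item in matched:
                if item not in selected:
                    selected.append(item)
        else:
            selected = [item for item in selected if item not in matched]
    return selected
```

Keeping the result as an ordered list rather than a set preserves the order in which steps added fields, which is what lets Example 4 try `Patient_Id` before `Pseudo`.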
### Column G: bloc_scope (moved from H - v3.0)
**Type:** String enum (optional, default: "any")
**Description:** Aggregation logic for matching fields within an inclusion
**Position:** Column G (after field_selection)
**Valid Values:**
```
"any"  → At least ONE field must match transitions
"all"  → ALL changed fields must match transitions
```

**Logic:**

**bloc_scope = "any" (Default)**
```
IF ANY candidate field has matching transition:
    RETURN inclusion matches rule

Use for: "Alert if any change occurs"
```

**bloc_scope = "all"**
```
IF ALL changed fields have matching transitions:
    RETURN inclusion matches rule

Use for: "Alert only if all changes match pattern"
```

**Example Comparison:**

```
Inclusion with 5 fields in scope:
  Field1: Changed, matches transition ✓
  Field2: Unchanged (always ignored)
  Field3: Changed, does NOT match transition ✗
  Field4: Unchanged (always ignored)
  Field5: Changed, matches transition ✓

Changed fields: [Field1, Field3, Field5]
Matched changed: [Field1, Field5]

Result with bloc_scope="any": ✓ COUNT (Field1 matched)
Result with bloc_scope="all": ✗ SKIP (Field3 didn't match)
```

| Scenario | bloc_scope="any" | bloc_scope="all" |
|----------|------------------|-----------------|
| 1 match, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 1 match, 1 mismatch | ✓ COUNT | ✗ SKIP |
| 0 matches, 1 mismatch | ✗ SKIP | ✗ SKIP |
| 3 matches, 0 mismatches | ✓ COUNT | ✓ COUNT |
| 3 matches, 1 mismatch | ✓ COUNT | ✗ SKIP |

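The aggregation in the table above reduces to a comparison between two sets per inclusion: the changed fields and the changed fields that matched. The sketch below illustrates this under assumptions: the function name `inclusion_matches` is hypothetical, and an inclusion with no changed fields is treated as not counted for both scopes (the document does not state this case explicitly). With the Example Comparison data, `inclusion_matches(["Field1", "Field3", "Field5"], ["Field1", "Field5"], "any")` counts the inclusion, while the same call with `"all"` skips it.

```python
def inclusion_matches(changed_fields, matched_fields, bloc_scope="any"):
    """Decide whether one inclusion counts toward a rule.

    changed_fields: fields in scope whose value differs between old and new.
    matched_fields: subset of changed fields whose transition matched.
    """
    changed = set(changed_fields)
    matched = set(matched_fields) & changed  # unchanged fields are always ignored

    if not changed:
        # Assumption: nothing changed means nothing to report, even for "all"
        return False
    if bloc_scope == "any":
        return bool(matched)
    if bloc_scope == "all":
        return matched == changed
    raise ValueError(f"invalid bloc_scope: {bloc_scope!r}")
```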
---

### Column H: transitions (moved from I - v3.0)
**Type:** JSON array of 4-element arrays (optional)
**Description:** Pipeline-based transition rules (old_value → new_value)
**Position:** Column H (after bloc_scope)
**Format:** `[["action", "field_selector", "from_pattern", "to_pattern"], ...]`
- Each step is exactly 4 elements
- If None/empty: Rule applies to ALL field changes
- Supports wildcard keywords: `*undefined`, `*defined`, `*`
- Supports literal values for exact matching

**Pipeline Concept (v2.0+):**

```
Initial state: All changed fields → is_checked = False

Step 1: Include rule for all fields (*.*) with *defined→*defined
  └─ is_checked = True if transition matches

Step 2: Include rule for Endotest.Diagnostic_Status with waiting→*undefined
  └─ is_checked = True (whitelisted exception)

Step 3: Exclude rule for Endotest.Request_Sent with false→true
  └─ is_checked = False (blacklisted exception)

Final result: Only fields matching the pipeline are checked
```

---

#### Syntax: 4-Element Pipeline Array

Each pipeline step is a **4-element array**:
```json
[action, field_selector, from_pattern, to_pattern]
```

| Element | Description | Examples |
|---------|-------------|----------|
| **action** | "include" (whitelist) or "exclude" (blacklist) | "include", "exclude" |
| **field_selector** | Which fields this step applies to | "*.*", "group.*", "group.field" |
| **from_pattern** | Old value pattern to match | "*undefined", "*defined", "*", literal value |
| **to_pattern** | New value pattern to match | "*undefined", "*defined", "*", literal value |

**Important:** The syntax is **strictly enforced** - each step must have exactly 4 elements. No shortcuts or variants are accepted.

---

#### Field Selector Patterns

```
*.*          → All fields in all groups
group.*      → All fields in specific group (e.g., "Endotest.*")
group.field  → Specific field only (e.g., "Endotest.Request_Sent")
```

---

#### Complete Examples

**Example 1: Simple All-Fields Rule (Most Common)**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"]
  ]
}
// Pipeline: Include all fields that change between two defined values
```

**Example 2: Main Rule + One Include Exception**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"]
  ]
}
// Step 1: Include all *defined→*defined changes
// Step 2: ALSO include specific Endotest.Diagnostic_Status changes from waiting to undefined
```

**Example 3: Main Rule + Include Exception + Exclude Exception**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Endotest.Diagnostic_Status", "waiting", "*undefined"],
    ["exclude", "Endotest.Request_Sent", false, true]
  ]
}
// Step 1: Include all *defined→*defined
// Step 2: Include Diagnostic_Status waiting→undefined (whitelist)
// Step 3: Exclude Request_Sent false→true (blacklist)
// Result: Step 3 overrides Step 1 for that specific field+transition
```

**Example 4: Multiple Include Steps for the Same Field**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "GDD.Status", "pending", "completed"],
    ["include", "GDD.Status", "pending", "failed"]
  ]
}
// Step 1: Include all *defined→*defined changes
// Step 2: Include GDD.Status pending→completed
// Step 3: Include GDD.Status pending→failed
```

**Example 5: Exclude Rule with Wildcard**
```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["exclude", "Endotest.Last_Modified", "*", "*"]
  ]
}
// Include all changes EXCEPT any change to Last_Modified field
```

---

#### Processing Logic

The pipeline is executed **sequentially**, with each step modifying the `is_checked` status in-place:

```
1. Initialize: All changed fields have is_checked = False

2. For each transition step in order:
   a. Check if the current field matches the field_selector
   b. If yes: Check if the old→new values match from_pattern→to_pattern
   c. If yes:
      - If action="include": Set is_checked = True
      - If action="exclude": Set is_checked = False
   d. If no: Leave is_checked unchanged

3. Final: Only fields with is_checked = True are counted as matching
```

**Important:** Later steps can override earlier steps. Example:
```json
[
  ["include", "*.*", "*", "*"],        // Step 1: include everything
  ["exclude", "Field.X", "*", "*"]     // Step 2: exclude Field.X (overrides Step 1)
]
```
Result: Everything is included EXCEPT Field.X

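The processing steps above can be sketched for a single changed field as follows. This is an illustrative sketch rather than the actual implementation: the function names are hypothetical, literal patterns are compared with a type check so that `false` does not match `0` (an assumption about what "exact" matching means), and an empty pipeline is treated as matching every change, as stated in the Column H format notes. Applied to Example 3, `Endotest.Request_Sent` going from `false` to `true` ends with `is_checked = False`, while `Endotest.Diagnostic_Status` going from `"waiting"` to `null` ends with `True`.

```python
def is_undefined(value):
    """True for the undefined-like values: None, "" and the string "undefined"."""
    return value is None or (isinstance(value, str) and value in ("", "undefined"))


def value_matches(value, pattern):
    """Match one value against a keyword (*, *defined, *undefined) or a literal."""
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return is_undefined(value)
    if pattern == "*defined":
        return not is_undefined(value)
    # Literal: exact equality; the type check keeps False and 0 distinct (assumption)
    return type(value) is type(pattern) and value == pattern


def selector_matches(selector, group, field):
    """Match "*.*", "group.*" or "group.field" against one field."""
    sel_group, sel_field = selector.split(".", 1)
    return sel_group in ("*", group) and sel_field in ("*", field)


def is_field_checked(group, field, old_value, new_value, transitions):
    """Run the transitions pipeline for one changed field and return is_checked."""
    if not transitions:
        return True  # None/empty: the rule applies to all field changes

    is_checked = False
    for action, selector, from_pattern, to_pattern in transitions:
        if not selector_matches(selector, group, field):
            continue
        if value_matches(old_value, from_pattern) and value_matches(new_value, to_pattern):
            # Later steps override earlier ones
            is_checked = action == "include"
    return is_checked
```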
---

#### Configuration Error Handling

If a transitions step has invalid syntax:
- The rule is skipped and a yellow warning is logged
- No exception is thrown
- The user can see the ⚠ warning in the output
- The user can choose to save the report or fix the config

**Valid and invalid syntax examples:**
```json
["include", "*.*", "*defined", "*defined"]      // ✓ Exactly 4 elements
["include", "*.*", "*defined"]                  // ✗ Only 3 elements (INVALID)
["maybe", "*.*", "*defined", "*defined"]        // ✗ Invalid action (INVALID)
["include", "invalid", "*defined", "*defined"]  // ✗ No dot in selector (INVALID)
```

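As an illustration of the skip-with-warning behaviour, the sketch below parses the text of a `transitions` cell and validates each step against the three failure cases listed above. The function names and the `(steps, warning)` return convention are assumptions for this example; only `json.loads` and `json.JSONDecodeError` come from the Python standard library. A caller would skip the rule and log the warning whenever `steps` is `None`.

```python
import json

VALID_ACTIONS = ("include", "exclude")


def validate_transition_step(step):
    """Return None if the step is well-formed, else a short error message."""
    if not isinstance(step, list):
        return "step must be an array"
    if len(step) != 4:
        return f"expected exactly 4 elements, got {len(step)}"
    action, selector = step[0], step[1]
    if action not in VALID_ACTIONS:
        return f"invalid action: {action!r}"
    if not isinstance(selector, str) or "." not in selector:
        return f"invalid field selector (no dot): {selector!r}"
    return None


def parse_transitions(cell_text):
    """Parse a transitions cell.

    Returns (steps, warning). steps is None when the rule should be skipped;
    an empty or missing cell yields an empty pipeline (rule applies to all changes).
    """
    if cell_text is None or not cell_text.strip():
        return [], None
    try:
        steps = json.loads(cell_text)
    except json.JSONDecodeError as exc:
        return None, f"transitions is not valid JSON: {exc.msg}"
    if not isinstance(steps, list):
        return None, "transitions must be a JSON array"
    for index, step in enumerate(steps, start=1):
        error = validate_transition_step(step)
        if error:
            return None, f"step {index}: {error}"
    return steps, None
```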
---

## Special Keywords & Wildcards

This section documents the special keywords and patterns used in transition specifications throughout the configuration.

### Keywords in Transition Patterns

The regression check configuration supports special keywords with a `*` prefix for flexible transition matching. They are used in the `from_pattern` and `to_pattern` positions of a 4-element transition step.

#### Keyword 1: `*undefined`

**Meaning:** Matches any "undefined-like" value

**Matches:**
- `null` (None in Python)
- `""` (empty string)
- `"undefined"` (literal string)

**Example:**
```json
{
  "transitions": [["include", "*.*", "*undefined", "*defined"]]
}
// Matches: undefined → Active, null → 42, "" → true, etc.
```

**Use Case:** Detect when a field gets populated for the first time

---

#### Keyword 2: `*defined`

**Meaning:** Matches any "defined" value (opposite of `*undefined`)

**Matches:** Anything EXCEPT:
- `null` (None)
- `""` (empty string)
- `"undefined"` (literal string)

**Example:**
```json
{
  "transitions": [["include", "*.*", "*defined", "*undefined"]]
}
// Matches: Active → null, 42 → "", true → "undefined", etc.
```

**Use Case:** Detect when a field loses its value

---

#### Keyword 3: `*` (Wildcard)

**Meaning:** Matches absolutely any value

**Matches:** Any value including:
- Defined values (strings, numbers, booleans)
- Undefined-like values (null, "", "undefined")
- Objects, arrays, etc.

**Example:**
```json
{
  "transitions": [["include", "*.*", "*", "*"]]
}
// Matches: ANY old value → ANY new value
// Essentially: "any change at all"
```

**Use Case:** Monitor all changes to a field, filter out specific cases with exceptions

---

### Combining Keywords with Literal Values

Patterns can mix keywords and literal values:

| from_pattern → to_pattern | Meaning |
|---------------------------|---------|
| `"*undefined"` → `"*defined"` | Undefined → Defined (field becomes populated) |
| `"*defined"` → `"*undefined"` | Defined → Undefined (field gets cleared) |
| `"*defined"` → `"*defined"` | Value change while staying defined (actual value change required) |
| `"*"` → `"*"` | Any change at all |
| `"Active"` → `"*defined"` | From literal "Active" to any defined value |
| `"*undefined"` → `"Active"` | From undefined to literal "Active" |

---

### Literal Values (No `*` Prefix)

Any value that does NOT start with `*` is treated as a literal value and matched exactly:

```json
{
  "transitions": [
    ["include", "*.*", "pending", "accepted"],   // Exact string match
    ["include", "*.*", false, true],             // Exact boolean match
    ["include", "*.*", 0, 1],                    // Exact numeric match
    ["include", "*.*", null, "Active"],          // null matches null, "Active" matches "Active"
    ["include", "*.*", "undefined", "Done"]      // "undefined" (literal string) matches "undefined"
  ]
}
```

**Important:** Literal values are matched by exact equality, including:
- `"undefined"` - matches the exact string "undefined" (not undefined state)
- `null` - matches null values
- `""` - matches empty string

---

### Summary Table: Special Keywords in Transitions

| Keyword | Matches | Use Case |
|---------|---------|----------|
| `*undefined` | null, "", "undefined" (any undefined-like value) | Detect when field becomes populated |
| `*defined` | Any defined value (NOT null, "", "undefined") | Detect when field loses value |
| `*` | Any value whatsoever | Alert on any change; use with exceptions for fine control |
| (no `*` prefix) | Exact literal values | Specific value matching (e.g., "pending" → "accepted") |

---

### Rule Type 1: Standard Rules (Normal Comparison)

**Purpose:** Detect field value changes matching configured patterns

**Processing Steps:**

```
Step 1: Collect Candidate Fields
  ├─ Apply the field_selection pipeline (include/exclude steps, in order)
  └─ Result: List of (group_name, field_name) tuples

Step 2: For Each Candidate Field
  ├─ Get new_value and old_value
  ├─ Check if transition matches (if transitions specified)
  ├─ Apply exceptions (include/exclude)
  └─ Mark as "checked" if matches

Step 3: Apply bloc_scope
  ├─ With "any": Count inclusion if ANY field is checked
  └─ With "all": Count inclusion if ALL changed fields are checked

Step 4: Report Matching Inclusions
  └─ Count vs. thresholds (warning/critical)
```

**Example Configuration:**

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined (Only)",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_selection": [["include", "Inclusion.*"]],
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ],
  "bloc_scope": "all"
}
```

### Rule Type 2: New Inclusions

**Purpose:** Count patients that exist in the current data but not in the previous data

**Syntax:**
```json
{
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
  "transitions": [],
  "bloc_scope": null
}
```
**Note:** For special rules like "New Inclusions", transitions can be left as an empty array `[]` since these rules don't use transition matching.

**Processing:**
```
1. Build dictionaries indexed by key field
   - Key field candidates: Patient_Id, Pseudo (taken from field_selection, tried in order)
   - key_dict_new = {patient_key: patient_data for patient in current}
   - key_dict_old = {patient_key: patient_data for patient in previous}

2. Find new inclusions
   new_keys = set(key_dict_new.keys()) - set(key_dict_old.keys())
   count = len(new_keys)

3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK
```

**Example Output:**
```
✓ [green]New Inclusions: 0[/green]
  (No new patients added)

⚠ [yellow]New Inclusions: 42[/yellow]
  (42 new patients - warning threshold exceeded)

✗ [red]New Inclusions: 75[/red]
  (75 new patients - exceeds critical threshold of 50)
```

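The key-based comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names, the `{group: {field: value}}` record shape, and the per-record fallback from `Patient_Id` to `Pseudo` are assumptions made for the example. Reversing the set difference gives the deleted keys.

```python
def index_by_key(inclusions, key_candidates):
    """Index inclusions by the first key candidate that holds a value.

    Assumed shapes: `inclusions` is a list of {group: {field: value}} dicts,
    `key_candidates` a list of (group, field) pairs tried in order.
    """
    indexed = {}
    for inclusion in inclusions:
        for group, field in key_candidates:
            key = inclusion.get(group, {}).get(field)
            if key not in (None, "", "undefined"):
                indexed[key] = inclusion
                break
    return indexed


def new_and_deleted_keys(current, previous, key_candidates):
    """Return (new_keys, deleted_keys) as sorted lists."""
    key_dict_new = index_by_key(current, key_candidates)
    key_dict_old = index_by_key(previous, key_candidates)
    new_keys = set(key_dict_new) - set(key_dict_old)
    deleted_keys = set(key_dict_old) - set(key_dict_new)
    return sorted(new_keys), sorted(deleted_keys)


# Illustrative data only
KEYS = [("Patient_Identification", "Patient_Id"),
        ("Patient_Identification", "Pseudo")]
previous = [{"Patient_Identification": {"Patient_Id": "P-001"}},
            {"Patient_Identification": {"Patient_Id": "P-002"}}]
current = [{"Patient_Identification": {"Patient_Id": "P-001"}},
           {"Patient_Identification": {"Patient_Id": "", "Pseudo": "XK-42"}}]

new_keys, deleted_keys = new_and_deleted_keys(current, previous, KEYS)
print(len(new_keys), new_keys)          # 1 ['XK-42']
print(len(deleted_keys), deleted_keys)  # 1 ['P-002']
```
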
### Rule Type 3: Deleted Inclusions

**Purpose:** Count patients that exist in the previous data but not in the current data

**Syntax:**
```json
{
  "bloc_title": "Identification",
  "line_label": "Deleted Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 0,
  "field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:**
```
1. Build dictionaries (same as New Inclusions)

2. Find deleted inclusions
   deleted_keys = set(key_dict_old.keys()) - set(key_dict_new.keys())
   count = len(deleted_keys)

3. Compare to thresholds
   IF count > critical_threshold: CRITICAL
   ELIF count > warning_threshold: WARNING
   ELSE: OK
```

**Note:** Typically `critical_threshold=0` because any deletion is concerning.

### Rule Type 4: New Fields

**Purpose:** Detect field names that appear in the current data but not in the previous data

**Syntax:**
```json
{
  "bloc_title": "Structure",
  "line_label": "New Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_group": null,
  "field_name": null,
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:**
```
1. For each patient in common (present in both versions):
   a) Get all groups and fields from current version
   b) Get all groups and fields from previous version
   c) Find new fields: current_fields - previous_fields
   d) Qualified name: "group_name.field_name"

2. Count by field name
   field_counts = {field_qualified_name: count_of_inclusions}
   total_new_fields = len(field_counts)

3. Display results
   For each new field:
     "Inclusion.New_Field (42 inclusions)"
     [count = number of inclusions that gained this field]
```

**Example Output:**
```
✓ [green]New Fields: 0[/green]

✗ [red]New Fields: 2[/red]
  Endotest.New_Request_Type (1 inclusion)
  Inclusion.New_Status_Code (2 inclusions)
```

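As a rough sketch of the counting step, assuming both versions are already indexed by patient key and each record has the shape `{group: {field: value}}` (function names are hypothetical):

```python
from collections import Counter


def qualified_fields(inclusion):
    """Set of "group.field" names present in one inclusion."""
    return {f"{group}.{field}"
            for group, fields in inclusion.items()
            for field in fields}


def count_new_fields(current_by_key, previous_by_key):
    """Count, per qualified field name, the common inclusions that gained it."""
    counts = Counter()
    # Only patients present in both versions are compared.
    for key in current_by_key.keys() & previous_by_key.keys():
        gained = (qualified_fields(current_by_key[key])
                  - qualified_fields(previous_by_key[key]))
        counts.update(gained)
    return dict(sorted(counts.items()))


# Illustrative data only
previous_by_key = {
    "P-001": {"Inclusion": {"Status": "ok"}},
    "P-002": {"Inclusion": {"Status": "ok"}},
}
current_by_key = {
    "P-001": {"Inclusion": {"Status": "ok", "New_Status_Code": 3}},
    "P-002": {"Inclusion": {"Status": "ok", "New_Status_Code": 1},
              "Endotest": {"New_Request_Type": "A"}},
    "P-003": {"Inclusion": {"New_Field": 1}},  # new patient, not compared
}

field_counts = count_new_fields(current_by_key, previous_by_key)
for name, count in field_counts.items():
    print(f"{name} ({count} inclusion{'s' if count != 1 else ''})")
```

Calling `count_new_fields(previous_by_key, current_by_key)` with the arguments swapped yields the fields that disappeared instead.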
### Rule Type 5: Deleted Fields

**Purpose:** Detect field names that exist in the previous data but not in the current data

**Syntax:**
```json
{
  "bloc_title": "Structure",
  "line_label": "Deleted Fields",
  "warning_threshold": 0,
  "critical_threshold": 1,
  "field_group": null,
  "field_name": null,
  "transitions": [],
  "bloc_scope": null
}
```

**Processing:** Same as "New Fields" but reversed:
```
deleted_fields = previous_fields - current_fields
```

---

## Field Selection Pipeline (v3.0)

**NEW APPROACH:** Field selection now uses the **same pipeline architecture as transitions**.

### Pipeline Ordering (Key Concept)

Start with an **empty set of fields**. Each step either **includes** or **excludes** fields:

```python
from fnmatch import fnmatchcase  # used here only to illustrate selector matching

# Illustrative qualified field names ("group.field")
all_fields = ["Endotest.Request_Sent", "Endotest.Last_Updated",
              "Inclusion.Status", "Inclusion.Inclusion_Status"]

pipeline = [
    ["include", "Endotest.*"],             # Step 1: Include all Endotest fields
    ["include", "Inclusion.Status"],       # Step 2: Also include Inclusion.Status
    ["exclude", "Endotest.Last_Updated"],  # Step 3: But exclude Endotest.Last_Updated
]

candidate_fields = set()  # Empty initially
for action, selector in pipeline:
    matched = {field for field in all_fields if fnmatchcase(field, selector)}
    if action == "include":
        candidate_fields |= matched
    else:
        candidate_fields -= matched

# Result: Endotest.* + Inclusion.Status, except Endotest.Last_Updated
print(sorted(candidate_fields))  # ['Endotest.Request_Sent', 'Inclusion.Status']
```

### Simple Examples

#### Example 1: Single Group
```json
[["include", "Endotest.*"]]
// Result: All Endotest fields
```

#### Example 2: Multiple Groups
```json
[["include", "Endotest.*"], ["include", "Inclusion.*"]]
// Result: All Endotest + all Inclusion fields
```

#### Example 3: Specific Fields
```json
[["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]]
// Result: Only Patient_Id and Pseudo fields
```

#### Example 4: All Except Some
```json
[["include", "*.*"], ["exclude", "Endotest.Last_Updated"]]
// Result: All fields EXCEPT Endotest.Last_Updated
```

#### Example 5: Complex Selection
```json
[
  ["include", "*.*"],
  ["exclude", "Patient_Identification.*"],
  ["exclude", "Inclusion.*"]
]
// Result: All fields EXCEPT Patient_Identification and Inclusion
```

### Important Notes

- ✅ **Order matters:** Steps are applied sequentially, so a later exclude removes fields added by an earlier include
- ✅ **Explicit:** The administrator is responsible for writing a correct pipeline
- ✅ **No implicit AND/OR:** Use multiple include steps for OR logic
- ✅ **Deterministic:** The resulting field set is sorted, so results are reproducible

---

## Transition Patterns

### Pattern Matching Rules

#### Literal Value Matching
```json
[
  ["active", "inactive"],
  [true, false],
  [0, 1]
]
// Match exact value changes
// Type must match (string vs. number vs. boolean)
```

#### Undefined Keyword
```
*undefined: Matches any undefined-like value
  - null
  - "" (empty string)
  - "undefined"

*defined: Matches any defined value
  - NOT null
  - NOT ""
  - NOT "undefined"
```

**Examples:**
```json
[
  ["*undefined", "*defined"]
]
// Transition FROM any undefined TO any defined

[
  ["*defined", "*undefined"]
]
// Transition FROM any defined TO any undefined

[
  ["*defined", "*defined"]
]
// Transition FROM defined TO different defined
// (with actual value change check)
```

#### Wildcard Pattern
```json
[
  ["*", "*"]
]
// Match ANY transition
// Useful for: "Alert on any change to this field"
```

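The matching rules above (literal, `*undefined`, `*defined`, `*`) can be summarised in a short Python sketch. The function names are hypothetical, and two details are assumptions rather than documented behaviour: every pattern requires an actual value change, and literals compare by type as well as value.

```python
def is_undefined(value):
    """True for the undefined-like values: None, "" and the string "undefined"."""
    # Type check first, so that 0 and False count as defined values.
    return value is None or (isinstance(value, str) and value in ("", "undefined"))


def value_matches(pattern, value):
    """Match one side of a transition pattern against one value."""
    if pattern == "*":
        return True
    if pattern == "*undefined":
        return is_undefined(value)
    if pattern == "*defined":
        return not is_undefined(value)
    # Literal: exact equality, and the type must match too (0 is not False).
    return type(pattern) is type(value) and pattern == value


def transition_matches(pattern, old, new):
    """True when old -> new is a real change matching [old_pattern, new_pattern]."""
    old_pattern, new_pattern = pattern
    changed = type(old) is not type(new) or old != new
    return (changed
            and value_matches(old_pattern, old)
            and value_matches(new_pattern, new))


print(transition_matches(["*undefined", "*defined"], None, "Active"))  # True
print(transition_matches(["*defined", "*defined"], "a", "a"))          # False
print(transition_matches([0, 1], False, True))                         # False
```
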
### Transition Combination Examples

**Example 1: Detect New Values Only**
```json
{
  "transitions": [["*undefined", "*defined"]]
}
// Alert when field goes from undefined to any value
// Ignore when field already had value
```

**Example 2: Detect Value Reversal**
```json
{
  "transitions": [
    [true, false],
    [false, true]
  ]
}
// Alert when boolean field toggles in either direction
```

**Example 3: Detect Specific Status Change**
```json
{
  "transitions": [
    ["pending", "approved"],
    ["pending", "rejected"]
  ]
}
// Alert when pending status changes to approved or rejected
// Ignore all other transitions
```

**Example 4: Detect Anything But This**
```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Endotest.Last_Updated", "*", "*"]
  ]
}
// Alert on any field change
// EXCEPT exclude changes to Last_Updated
```

---

## Exception Handling (Pipeline Architecture)

With the new unified pipeline format, exceptions are now just regular pipeline steps with different actions. This section explains the patterns.

### Pattern 1: Simple Whitelist (Include Only)

Allow specific field/transition combinations:

```json
{
  "transitions": [
    ["include", "Request_Sent", false, true],
    ["include", "Diagnostic_Status", "warning", "complete"]
  ]
}
```

**Logic:**
```
Step 1: Include Request_Sent with false→true transition
Step 2: Include Diagnostic_Status with warning→complete
Result: ONLY these specific field+transition combinations are checked
```

### Pattern 2: Simple Blacklist (Exclude Only)

Block specific field/transition combinations:

```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Endotest.Import_Time", "*", "*"]
  ]
}
```

**Logic:**
```
Step 1: Include all fields with any change (*→*)
Step 2: Exclude Last_Updated from being checked
Step 3: Exclude Endotest.Import_Time from being checked
Result: All fields EXCEPT Last_Updated and Import_Time
```

### Pattern 3: Main Rule + Multiple Exceptions

Combine main transition rule with field-specific exceptions:

```json
{
  "transitions": [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", false, true],
    ["exclude", "Endotest.Last_Modified", "*", "*"]
  ]
}
```

**Logic:**
```
Step 1: Include fields that change between two defined values
Step 2: ALSO include Request_Sent changing from false to true (even if not *defined→*defined)
Step 3: But exclude any change to Last_Modified (overrides Step 1)
Result: *defined→*defined changes PLUS Request_Sent false→true, EXCEPT Last_Modified
```

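One way to read these patterns is "the last matching step decides". The Python sketch below evaluates a four-element transitions pipeline for a single field change under that reading. It is an assumption-laden illustration: the helper names are hypothetical, a bare selector such as `Request_Sent` is assumed to match that field in any group, and the real implementation may differ.

```python
def _is_undefined(value):
    return value is None or (isinstance(value, str) and value in ("", "undefined"))


def _value_matches(pattern, value):
    if pattern == "*":
        return True
    if pattern in ("*undefined", "*defined"):
        return _is_undefined(value) == (pattern == "*undefined")
    return type(pattern) is type(value) and pattern == value


def _field_matches(selector, group, field):
    if "." not in selector:  # bare field name: any group (assumption)
        return selector == field
    sel_group, sel_field = selector.split(".", 1)
    return sel_group in ("*", group) and sel_field in ("*", field)


def is_checked(pipeline, group, field, old, new):
    """Apply include/exclude steps in order; the last matching step decides."""
    checked = False
    for action, selector, old_pattern, new_pattern in pipeline:
        if (_field_matches(selector, group, field)
                and _value_matches(old_pattern, old)
                and _value_matches(new_pattern, new)):
            checked = action == "include"
    return checked and old != new


# The Pattern 3 pipeline from above
pipeline = [
    ["include", "*.*", "*defined", "*defined"],
    ["include", "Request_Sent", False, True],
    ["exclude", "Endotest.Last_Modified", "*", "*"],
]

print(is_checked(pipeline, "Inclusion", "Status", "pending", "accepted"))           # True
print(is_checked(pipeline, "Endotest", "Last_Modified", "2025-01-01", "2025-02-01"))  # False
print(is_checked(pipeline, "Inclusion", "Status", None, "accepted"))                # False
```
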
### Field Selector Formats in Pipeline

**Simple field name (matches in any group):**
```json
{
  "field_selector": "Status"
}
// Matches "Status" in any group
// But this is NOT pipeline syntax - use "*.*" with field matching instead
```

**Better: Use qualified notation in field_selector:**
```json
["include", "Endotest.Request_Sent", false, true]
// Matches ONLY the Request_Sent field of the Endotest group
```

**Full Specification:**
```json
{
  "field": "Endotest.Request_Sent",
  "transition": [false, true]
}
// Matches this specific field AND transition combination
```

### Practical Examples with Pipeline

**Example 1: Alert on Most Changes, Except System Fields**
```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Last_Modified_By", "*", "*"],
    ["exclude", "Import_Timestamp", "*", "*"]
  ]
}
// Step 1: Include ANY field change
// Step 2-4: Exclude system timestamp/audit fields
```

**Example 2: Alert on Undefined→Defined, Plus Status Reversals**
```json
{
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Request_Status", "rejected", "submitted"]
  ]
}
// Step 1: Include when field goes from undefined to defined
// Step 2: ALSO include Request_Status: rejected → submitted (even if not undefined→defined)
```

**Example 3: Complex Medical Rules with Multiple Conditions**
```json
{
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Endotest.Test_Result", "pending", "completed"],
    ["include", "GDD.Status", "pending", "failed"],
    ["exclude", "Endotest.Last_Sync", "*", "*"]
  ]
}
// Step 1: Include main rule: undefined→defined
// Step 2: ALSO include Test_Result pending→completed
// Step 3: ALSO include GDD.Status pending→failed
// Step 4: But exclude any change to Last_Sync field
// Result: All matching transitions except Last_Sync changes
```

**Example 4: Fine-Grained Control with Include + Exclude**
```json
{
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["include", "Status", "*undefined", "*defined"],
    ["include", "Status", "*defined", "*undefined"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Internal_Id", "*", "*"]
  ]
}
// Step 1: Include any change (baseline)
// Step 2-3: Specifically include Status becoming defined/undefined
// Step 4-5: Exclude Last_Updated and Internal_Id changes (override Step 1)
// Result: All changes EXCEPT Last_Updated/Internal_Id, plus Status transitions
```

---

## Configuration Examples

### Example 1: Monitor New Inclusions (v3.0)

**Requirement:** Alert if unexpected number of patients added

```json
{
  "ignore": null,
  "bloc_title": "Identification",
  "line_label": "New Inclusions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Patient_Identification.Patient_Id"], ["include", "Patient_Identification.Pseudo"]],
  "bloc_scope": null,
  "transitions": []
}
```

**Field Selection Logic:**
```
Starts empty: candidate_fields = {}
Step 1: Include Patient_Identification.Patient_Id
Step 2: Include Patient_Identification.Pseudo
Result: [Patient_Identification.Patient_Id, Patient_Identification.Pseudo]
These become key candidates (tried in order)
```

**Logic:**
```
Count patients in current but not in previous
If count > 50: CRITICAL (too many new patients)
If count > 0: WARNING (any new patients)
If count == 0: OK
```

### Example 2: Detect Undefined→Defined Changes (v3.0)

**Requirement:** Alert if any field of the Inclusion group becomes defined

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "Undefined to Defined",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "Inclusion.*"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ]
}
```

**Field Selection & Transitions:**
```
Field Selection: Include all Inclusion fields
Transitions Pipeline:
  Step 1: Include *.* *undefined→*defined
  Result: Only undefined→defined changes
```

**Logic:**
```
For each inclusion:
  Check every Inclusion field that changed
  If ANY transition is: undefined → defined:
    COUNT this inclusion
If count > 100: CRITICAL
If count > 0: WARNING
```

### Example 3: Strict All-Fields Completeness (v3.0)

**Requirement:** Ensure ALL changed fields follow undefined→defined pattern

```json
{
  "bloc_title": "Inclusion Protocol",
  "line_label": "All Changes Undefined to Defined",
  "warning_threshold": 0,
  "critical_threshold": 200,
  "field_selection": [["include", "Inclusion.*"]],
  "bloc_scope": "all",
  "transitions": [
    ["include", "*.*", "*undefined", "*defined"]
  ]
}
```

**Key Difference with bloc_scope="all":**
```
With bloc_scope="any": Count if ANY field matches
With bloc_scope="all": Count ONLY if ALL changed fields match
```

**Logic:**
```
For each inclusion:
  Find all Inclusion fields that changed
  Check if ALL changes are: undefined → defined
  If all changed fields match pattern:
    COUNT this inclusion (expected pattern)
  If any changed field doesn't match:
    SKIP (unexpected pattern)

If count > 200: CRITICAL (too many gaining data)
```

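The difference between the two scopes comes down to a set comparison. A minimal sketch, assuming each inclusion has already been reduced to its changed candidate fields and the subset that matched the transitions pipeline (the function name and the field names in the example are illustrative only):

```python
def inclusion_is_counted(changed_fields, checked_fields, bloc_scope="any"):
    """Decide whether one inclusion counts toward a rule.

    changed_fields: candidate fields whose value changed.
    checked_fields: the subset of those that matched the transitions pipeline.
    """
    changed, checked = set(changed_fields), set(checked_fields)
    if not changed:
        return False  # nothing changed, nothing to count
    if bloc_scope == "all":
        return changed <= checked  # every changed field must match
    return bool(changed & checked)  # "any": one matching field is enough


changed = {"Inclusion.Status", "Inclusion.Inclusion_Status"}
checked = {"Inclusion.Status"}

print(inclusion_is_counted(changed, checked, "any"))  # True
print(inclusion_is_counted(changed, checked, "all"))  # False
```
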
### Example 4: Request Lifecycle Validation (v3.0)

**Requirement:** Detect expected test request state transitions

```json
{
  "bloc_title": "Endotest",
  "line_label": "Request Status Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "Endotest.Request_Sent"], ["include", "Endotest.Request_Status"]],
  "bloc_scope": "any",
  "transitions": [
    ["include", "Endotest.Request_Sent", false, true],
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"]
  ]
}
```

**Field Selection Pipeline:**
```
Empty set start
Step 1: Include Endotest.Request_Sent
Step 2: Include Endotest.Request_Status
Result: {Endotest.Request_Sent, Endotest.Request_Status}
```

**Logic:**
```
For each inclusion:
  Check Endotest fields (Request_Sent, Request_Status)
  If ANY field matches transitions:
    COUNT this inclusion
If count > 100: CRITICAL (too many status changes)
```

### Example 5: Valid Workflow Transitions

**Requirement:** Alert on workflow changes, but only for valid state transitions (pending → accepted or rejected, rejected → resubmitted, accepted → cancelled)

```json
{
  "bloc_title": "Endotest",
  "line_label": "Valid Request Transitions",
  "warning_threshold": 0,
  "critical_threshold": 50,
  "field_selection": [["include", "Endotest.Request_Status"]],
  "transitions": [
    ["include", "Endotest.Request_Status", "pending", "accepted"],
    ["include", "Endotest.Request_Status", "pending", "rejected"],
    ["include", "Endotest.Request_Status", "rejected", "resubmitted"],
    ["include", "Endotest.Request_Status", "accepted", "cancelled"]
  ],
  "bloc_scope": "any"
}
```

**Logic:**
```
For each inclusion:
  Check if Request_Status field changed
  If transition matches ONE of the 4 allowed transitions:
    COUNT this inclusion (valid workflow)
  If transition is different:
    SKIP (unexpected change - needs investigation)

If count > 50: CRITICAL (too many valid status transitions)
```

**Note:** With multiple include steps for the same field, the change is counted when it matches ANY of the listed transitions.

---

### Example 6: Exclude Internal Fields

**Requirement:** Monitor data changes but ignore internal/system fields

```json
{
  "bloc_title": "Identification",
  "line_label": "Data Changes",
  "warning_threshold": 0,
  "critical_threshold": 100,
  "field_selection": [["include", "*.*"]],
  "transitions": [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"],
    ["exclude", "Import_Time", "*", "*"],
    ["exclude", "Internal_Id", "*", "*"]
  ],
  "bloc_scope": "any"
}
```

**Logic:**
```
For each inclusion:
  Check ALL fields EXCEPT [Last_Updated, Import_Time, Internal_Id]
  If ANY field changed:
    COUNT this inclusion
If count > 100: CRITICAL (too many changes)
```

---

## User Guide: Adding/Modifying Rules

### Step 1: Identify Rule Need

Determine the data validation requirement:

```
Detection Type              Use Pattern
─────────────────────────────────────────────────
New patients added          "New Inclusions" rule
Patients removed            "Deleted Inclusions" rule
Field values changed        Standard rule + transitions
Field added/removed         "New/Deleted Fields" rule
Specific transitions        Standard rule + narrow transitions
Exclude system changes      Standard rule + exceptions
```

### Step 2: Choose Rule Type

| Rule Type | When to Use | Complexity |
|-----------|------------|-----------|
| New Inclusions | Track patient additions | Simple |
| Deleted Inclusions | Track patient removals | Simple |
| New Fields | Monitor schema changes | Simple |
| Deleted Fields | Detect removed data | Simple |
| Standard (Transitions) | Monitor specific changes | Medium |
| Standard (with Exceptions) | Monitor changes + allowances | Complex |

### Step 3: Define Thresholds

Thresholds are written as (warning, critical). A level is raised when the count is strictly greater than its threshold.

```
Decision Matrix:

Threshold Pattern    Meaning                            Example Use
─────────────────────────────────────────────────────────────────────
(0, 0)               No changes allowed                 Critical data
(0, 1)               1 change warns, 2+ are critical    Surgery dates
(0, 50)              Strict monitoring                  High-value fields
(50, 100)            Normal operation                   Flexible fields
(200, 200)           Skip to critical                   Lenient tracking
```

Recommendation:
```
Strict validation (medical):
  warning = 0, critical = 1

Normal validation (most fields):
  warning = 5, critical = 20

Lenient validation (administrative):
  warning = 50, critical = 100
```

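To check how a threshold pair behaves before putting it in the configuration, the comparison used by the rule types (`count > critical_threshold`, then `count > warning_threshold`) can be reproduced in a few lines. The function name is hypothetical; only the comparison order comes from this guide.

```python
def classify(count, warning_threshold, critical_threshold):
    """Severity for a rule count, using strict 'greater than' comparisons."""
    if count > critical_threshold:
        return "CRITICAL"
    if count > warning_threshold:
        return "WARNING"
    return "OK"


# Walk through some (warning, critical) pairs from the decision matrix
for warning, critical in [(0, 0), (0, 1), (0, 50), (200, 200)]:
    results = {count: classify(count, warning, critical) for count in (0, 1, 2, 201)}
    print((warning, critical), results)
```

With `(0, 1)`, a single change yields `WARNING` and two changes yield `CRITICAL`; with `(200, 200)` the warning level is never reached because both thresholds are crossed at once.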
### Step 4: Create Rule Row in Excel

Open `Endobest_Dashboard_Config.xlsx` → `Regression_Check` sheet

```
Row N:
  A: ignore (leave empty)
  B: bloc_title (e.g., "Inclusion Protocol")
  C: line_label (e.g., "Status Changed")
  D: warning_threshold (e.g., 0)
  E: critical_threshold (e.g., 20)
  F: field_selection (e.g., [["include", "Inclusion.*"]])
  G: bloc_scope (e.g., "any")
  H: transitions (e.g., [["include", "*.*", "*", "*"]])
```

### Step 5: Define Field Scope

Decide which fields the rule applies to and express the scope as a `field_selection` pipeline:

```
Scope                  field_selection
──────────────────────────────────────────────────────────────────────
All fields             [["include", "*.*"]]
All in group X         [["include", "X.*"]]
Multiple groups        [["include", "group1.*"], ["include", "group2.*"]]
All except group X     [["include", "*.*"], ["exclude", "X.*"]]
Specific field         [["include", "Group.field"]]
Multiple fields        [["include", "Group.field1"], ["include", "Group.field2"]]
```

### Step 6: Define Transitions

Specify what changes to monitor:

```
Pattern             JSON                                    Meaning
──────────────────────────────────────────────────────────────────────────────
Any change          [["*", "*"]]                            Monitor all changes
Become defined      [["*undefined", "*defined"]]            Field gets value
Become undefined    [["*defined", "*undefined"]]            Field loses value
Toggle boolean      [[true, false], [false, true]]          Boolean flip
Specific change     [["old", "new"]]                        Exact transition
Multiple changes    [["old1", "new1"], ["old2", "new2"]]    Multiple patterns
```

### Step 7: Set Exceptions (Optional)

Exceptions are additional steps in the `transitions` pipeline that allow or block specific field/transition combinations:

```
If needed, add an include step for a specific case:
  transitions = [
    ["include", "*.*", "*undefined", "*defined"],
    ["include", "Request_Sent", false, true]
  ]

Or exclude specific cases:
  transitions = [
    ["include", "*.*", "*", "*"],
    ["exclude", "Last_Updated", "*", "*"]
  ]
```

### Step 8: Choose Bloc Scope

Decide aggregation logic:

```
Requirement                  bloc_scope
─────────────────────────────────────────────
Any field changes            "any" (default)
All changes match            "all"
```

### Step 9: Validate & Test

```bash
# Check-only mode (validates configuration)
python eb_dashboard.py --check-only

# Expected output:
# ✓ Loaded 42 regression check rules
# ✓ All checks passed
```

### Step 10: Full Collection Test

```bash
# Run full collection to test rule
python eb_dashboard.py

# After collection, verify:
# 1. Rule appears in output
# 2. Severity level is correct (OK/Warning/Critical)
# 3. Count matches expectations
```

---

## Execution Modes

### Mode 1: Normal Collection with Quality Checks

```bash
python eb_dashboard.py
```

**Workflow:**
```
1. Collect data (organizations, inclusions)
2. Run Coherence Check
3. Run Non-Regression Check (if old file exists)
4. If critical issues: Ask user for confirmation
5. If OK or user confirms: Export files
6. Display elapsed time
```

**Output:**
```
Collecting data from 15 organizations...
[████████████████████] 1200/1200

═══ Coherence Check ═══
✓ [green]TOTAL matches[/green]

═══ Non Regression Check ═══
✓ [green]Structure: New Fields: 0[/green]
✓ [green]Identification: New Inclusions: 0[/green]
...

✓ All checks passed successfully!

Writing files...
Elapsed time: 3:42
```

### Mode 2: Check-Only (Validation Only)

```bash
python eb_dashboard.py --check-only
```

**Workflow:**
```
1. Load existing JSON files (no API calls)
2. Load regression configuration
3. Run Coherence Check
4. Run Non-Regression Check
5. Report results
6. Exit
```

**Use Case:** Validate data before distribution without fresh collection

**Output:**
```
═══ CHECK ONLY MODE ═══
Running quality checks on existing data files...

[Loading configuration...]
[Running checks...]

✓ All checks passed successfully!
```

### Mode 3: Compare Two Files

```bash
python eb_dashboard.py --check-only file1.json file2.json
```

**Workflow:**
```
1. Load file1 and file2 (as current and old)
2. Skip coherence check (organizations not provided)
3. Run regression check comparing them
4. Report differences
5. Exit
```

**Use Case:** Compare two snapshots, detect changes between versions

**Output:**
```
═══ CHECK ONLY COMPARE MODE ═══
Comparing two specific files:
  Current: file1.json
  Old:     file2.json

[Running regression checks...]

⚠ [yellow]New Inclusions: 15[/yellow]
✗ [red]Deleted Inclusions: 5[/red]
...
```

### Mode 4: Debug Mode (Verbose Output)

```bash
python eb_dashboard.py --debug
```

**Workflow:**
```
1. Execute as Normal Mode
2. Enable DEBUG_MODE in quality checks
3. Display detailed field-by-field changes
4. Show individual inclusion comparisons
5. Verbose logging
```

**Use Case:** Troubleshoot regression rules, understand data changes

**Output:**
```
Running collection...
[████████] 1200/1200

═══ Non Regression Check (DEBUG MODE) ═══

Endotest - Undefined to Defined (Only): 12
  ✓ Patient-001:
    - Endotest.Request_Sent: false → true
    - Endotest.Request_Status: undefined → 'completed'

  ✓ Patient-002:
    - Endotest.Request_Sent: false → true

...
```

---

## Troubleshooting

### Issue 1: "Invalid JSON format" Error

**Symptom:** Configuration validation fails

**Cause:** Malformed JSON in the `field_selection` or `transitions` cell

**Solution:**
1. Paste the cell content into a JSON validator
2. Fix syntax errors
3. Re-run check

**Example - WRONG:**
```json
{
  "transitions": [["active", "inactive" ]  // Missing closing bracket
}

{
  "field_selection": [["include", "Endotest.Request_Sent"] ["include", "Endotest.Request_Status"]]  // Missing comma between array elements
}
```

**Example - CORRECT:**
```json
{
  "transitions": [["active", "inactive"]]
}

{
  "field_selection": [["include", "Endotest.Request_Sent"], ["include", "Endotest.Request_Status"]]
}
```

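A cell can also be checked locally with Python's standard `json` module before re-running the dashboard. This is a small sketch with a hypothetical helper name. Note that it validates strict JSON, so the `//` comments used in this guide's examples must not be pasted into the cell.

```python
import json


def validate_json_cell(cell_text):
    """Return (ok, message) for the raw text of one configuration cell."""
    try:
        json.loads(cell_text)
    except json.JSONDecodeError as exc:
        # lineno/colno point at the first character the parser could not accept
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return True, "valid JSON"


print(validate_json_cell('[["active", "inactive"]]'))
print(validate_json_cell('[["active", "inactive" ]'))   # missing closing bracket
print(validate_json_cell('["Status" "Date"]'))          # missing comma
```
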
### Issue 2: Rule Never Triggers

**Symptom:** Count always shows 0 even when data changes

**Causes:**
1. `field_selection` too restrictive, so the target fields are never candidates
2. Transition pattern doesn't match actual changes
3. Group or field names in the selectors don't match the names in the data

**Solution:**
1. Loosen field selection: temporarily use `[["include", "*.*"]]`
2. Use wildcards in transitions: `["*", "*"]`
3. Check actual field names in JSON output
4. Enable debug mode to see field matching

### Issue 3: Too Many False Positives

**Symptom:** Rule triggers unexpectedly, too many violations

**Causes:**
1. Thresholds set too low
2. Transitions too broad (matching unintended changes)
3. `field_selection` too permissive

**Solution:**
1. Increase thresholds: Raise warning_threshold and critical_threshold
2. Narrow transitions: Use specific values instead of wildcards
3. Add exceptions: Add exclude steps to the transitions pipeline for specific cases
4. Narrow field scope: Select specific groups or fields instead of `*.*`

### Issue 4: Configuration Changes Not Taking Effect
|
||
|
||
**Symptom:** Modifications to Excel file don't affect results
|
||
|
||
**Causes:**
|
||
1. File not saved
|
||
2. Regression_Check sheet not loaded
|
||
3. Old configuration still in memory
|
||
|
||
**Solution:**
|
||
1. Save Excel file (Ctrl+S)
|
||
2. Restart Python script
|
||
3. Verify sheet name is exactly "Regression_Check"
|
||
4. Check file path is correct
|
||
|
||
### Issue 5: User Confirmation Not Appearing
|
||
|
||
**Symptom:** Expected prompt for critical issues doesn't show
|
||
|
||
**Causes:**
|
||
1. Issues are at warning level, not critical
|
||
2. Thresholds higher than actual counts
|
||
3. Running in check-only mode (no export decision needed)
|
||
|
||
**Solution:**
|
||
1. Verify thresholds: warning < critical
|
||
2. Check actual violation counts
|
||
3. Run normal mode (not check-only)
|
||
|
||
### Issue 6: Comparison Mode Showing Unexpected Differences

**Symptom:** `--check-only file1 file2` reports many changes

**Causes:**
1. Files are from different collection dates (expected)
2. Configuration changed between collections (expected)
3. Field order or grouping changed (might be false positive)

**Solution:**
1. Review reported changes manually
2. Check if changes are expected (new patient data added)
3. Verify no data corruption occurred
4. Compare file sizes and counts manually

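
A manual count comparison can be scripted in a few lines. The sketch below assumes each file holds a JSON list of inclusion objects that carry a `Patient_Id` key; the real file layout may differ, and both function names are hypothetical.

```python
import json
from pathlib import Path


def load_inclusions(path):
    """Load a JSON file assumed to contain a list of inclusion dicts."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_differences(old, new, key="Patient_Id"):
    """Compare two inclusion lists by key and report added/removed entries."""
    old_ids = {item[key] for item in old}
    new_ids = {item[key] for item in new}
    return {
        "old_count": len(old),
        "new_count": len(new),
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "common": len(old_ids & new_ids),
    }


# Hypothetical in-memory data standing in for two collected files
old = [{"Patient_Id": "P1"}, {"Patient_Id": "P2"}]
new = [{"Patient_Id": "P2"}, {"Patient_Id": "P3"}]
summary = summarize_differences(old, new)
print(summary)
```

A non-empty `removed` list is usually the first thing to investigate, since newly added patients are expected between collections while disappearing ones are not.
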
---

## Performance Considerations

### Regression Check Execution Time

**Factors Affecting Performance:**

```
1. Number of Inclusions (patients)
   - N patients = O(N) iterations
   - Typical: 1200 patients = 1-2 seconds

2. Number of Rules
   - R rules applied to each inclusion
   - Typical: 20-30 rules = <100ms total

3. Field Matching Complexity
   - Filter evaluation per field
   - Dot-notation parsing: O(1) per field
   - Typical: <50ms for all rules

4. Total Typical Time
   - 1200 inclusions × 25 rules = 1-3 seconds
```

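
The N × R cost can be checked empirically by timing the nested loop. The sketch below uses stand-in data and a trivial stand-in rule, so the measured time reflects only loop overhead, not the real field matching work; `run_checks` is a hypothetical name.

```python
import time


def run_checks(inclusions, rules):
    """Apply every rule to every inclusion and time the whole pass."""
    start = time.perf_counter()
    evaluations = 0
    violations = 0
    for inclusion in inclusions:
        for rule in rules:
            evaluations += 1
            if rule(inclusion):
                violations += 1
    elapsed = time.perf_counter() - start
    return evaluations, violations, elapsed


# Stand-in data: 1200 inclusions, 25 identical trivial rules
inclusions = [
    {"id": i, "status": "changed" if i % 100 == 0 else "same"} for i in range(1200)
]
rules = [lambda inc: inc["status"] == "changed"] * 25

evaluations, violations, elapsed = run_checks(inclusions, rules)
print(f"{evaluations} evaluations, {violations} violations, {elapsed:.4f}s")
```

Wrapping the real check in the same `time.perf_counter()` pattern shows whether rule evaluation or file loading dominates before any rules are removed or merged.
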
### Optimization Tips

**If Regression Check is Slow:**

1. **Reduce rule count:**
   - Remove inactive rules (add "ignore" label)
   - Combine similar rules

2. **Simplify field filters:**
   - Use null instead of large filter lists
   - Use include (smaller) instead of exclude (larger)

3. **Narrow transitions:**
   - Use specific values instead of wildcards
   - Reduce number of transition pairs

4. **Consider file size:**
   - Large JSON files (>20MB) take longer to parse
   - This is rare and usually not the bottleneck

---

## Summary

The Quality Checks System provides:

✅ **Multi-Level Validation:** Coherence + Regression checks
✅ **Config-Driven Rules:** No code changes needed
✅ **Flexible Thresholds:** Warning and Critical levels
✅ **Rich Filtering:** Group, field, and dot-notation support
✅ **Transition Patterns:** Wildcard, keyword, and specific matching
✅ **Advanced Exception Handling:**
- Multiple transitions per exception: `[[old1, new1], [old2, new2], ...]`
- Include + Exclude can coexist simultaneously
- Fine-grained control over allowed/blocked transitions
✅ **Backward Compatible:** Legacy single-transition format still supported
✅ **Debug Support:** Detailed logging and debug mode
✅ **Execution Modes:** Normal, check-only, compare, debug

This architecture enables robust data quality monitoring without requiring code modifications, empowering business analysts to define and evolve validation rules independently.

---

**Document End**