# 📊 Endobest Clinical Research Dashboard - Architecture Summary
**Last Updated:** 2025-11-08
**Project Status:** Production Ready with Excel Export Feature
**Language:** Python 3.x
---
## 🎯 Executive Summary
The **Endobest Clinical Research Dashboard** is a sophisticated, production-grade automated data collection and reporting system designed to aggregate patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations. The system combines high-performance multithreading, comprehensive quality assurance, and fully externalized configuration to enable non-technical users to manage complex data extraction workflows without code modifications.
### Core Value Propositions
- **100% Externalized Configuration** - All field definitions, quality rules, and export logic defined in Excel
- **High-Performance Architecture** - 4-5x faster via optimized API calls and parallel processing
- **Robust Resilience** - Automatic token refresh, retries, graceful degradation
- **Comprehensive Quality Assurance** - Coherence checks + config-driven regression testing
- **Multi-Format Export** - JSON + configurable Excel workbooks with data transformation
- **User-Friendly Interface** - Interactive prompts, progress tracking, clear error messages
---
## 📁 Project Structure
```
Endobest Dashboard/
├── 📜 MAIN SCRIPT
│ └── eb_dashboard.py (57.5 KB, 1,021 lines)
│ Core orchestrator for data collection, processing, and export
├── 🔧 UTILITY MODULES
│ ├── eb_dashboard_utils.py (6.4 KB, 184 lines)
│ │ Thread-safe HTTP clients, nested data navigation, config resolution
│ │
│ ├── eb_dashboard_quality_checks.py (58.5 KB, 1,266 lines)
│ │ Coherence checks, non-regression testing, data validation
│ │
│ └── eb_dashboard_excel_export.py (32 KB, ~1,000 lines)
│ Configuration-driven Excel workbook generation
├── 📚 DOCUMENTATION
│ ├── DOCUMENTATION_10_ARCHITECTURE.md (43.7 KB)
│ │ System design, data flow, API integration, multithreading
│ │
│ ├── DOCUMENTATION_11_FIELD_MAPPING.md (56.3 KB)
│ │ Field extraction logic, custom functions, transformations
│ │
│ ├── DOCUMENTATION_12_QUALITY_CHECKS.md (60.2 KB)
│ │ Quality assurance framework, regression rules, validation logic
│ │
│ ├── DOCUMENTATION_13_EXCEL_EXPORT.md (29.6 KB)
│ │ Excel generation architecture, data transformation pipeline
│ │
│ ├── DOCUMENTATION_98_USER_GUIDE.md (8.4 KB)
│ │ End-user instructions, quick start, troubleshooting
│ │
│ └── DOCUMENTATION_99_CONFIG_GUIDE.md (24.8 KB)
│ Administrator configuration reference
├── ⚙️ CONFIGURATION
│ └── config/
│ ├── Endobest_Dashboard_Config.xlsx (Configuration file)
│ │ Inclusions_Mapping
│ │ Organizations_Mapping
│ │ Excel_Workbooks
│ │ Excel_Sheets
│ │ Regression_Check
│ │
│ ├── eb_org_center_mapping.xlsx (Organization enrichment)
│ │
│ └── templates/
│ ├── Endobest_Template.xlsx
│ ├── Statistics_Template.xlsx
│ └── (Other Excel templates)
├── 📊 OUTPUT FILES
│ ├── endobest_inclusions.json (~6-7 MB, patient data)
│ ├── endobest_inclusions_old.json (backup)
│ ├── endobest_organizations.json (~17-20 KB, stats)
│ ├── endobest_organizations_old.json (backup)
│ ├── [Excel outputs] (*.xlsx, configurable)
│ └── dashboard.log (Execution log)
└── 🔨 EXECUTABLES
├── eb_dashboard.exe (16.5 MB, PyInstaller build)
└── [Various .bat launch scripts]
```
---
## 🏗️ System Architecture Overview
### High-Level Component Diagram
```
┌─────────────────────────────────────────────────────────────────────┐
│ ENDOBEST DASHBOARD MAIN PROCESS │
│ eb_dashboard.py │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 1: INITIALIZATION & AUTHENTICATION │ │
│ │ ├─ User Login (IAM API) │ │
│ │ ├─ Token Exchange (RC-specific) │ │
│ │ ├─ Config Loading (Excel parsing & validation) │ │
│ │ └─ Thread Pool Setup (20 workers main, 40 subtasks) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 2: ORGANIZATION & COUNTERS RETRIEVAL │ │
│ │ ├─ Get All Organizations (getAllOrganizations API) │ │
│ │ ├─ Fetch Counters Parallelized (20 workers) │ │
│ │ ├─ Enrich with Center Mapping (optional) │ │
│ │ └─ Calculate Totals & Sort │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 3: PATIENT INCLUSION DATA COLLECTION │ │
│ │ Outer Loop: Organizations (20 parallel workers) │ │
│ │ ├─ For Each Organization: │ │
│ │ │ ├─ Get Inclusions List (POST /api/inclusions/search) │ │
│ │ │ └─ For Each Patient (Sequential): │ │
│ │ │ ├─ Fetch Clinical Record (API) │ │
│ │ │ ├─ Fetch All Questionnaires (Optimized: 1 call) │ │
│ │ │ ├─ Fetch Lab Requests (Async pool) │ │
│ │ │ ├─ Process Field Mappings (extraction + transform) │ │
│ │ │ └─ Update Progress Bars (thread-safe) │ │
│ │ │ │ │
│ │ │ Inner Async: Lab/Questionnaire Fetches (40 workers) │ │
│ │ │ (Non-blocking I/O during main processing) │ │
│ │ └─ Combine Inclusions from All Orgs │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 4: QUALITY ASSURANCE & VALIDATION │ │
│ │ ├─ Coherence Check (API stats vs actual data) │ │
│ │ │ └─ Compares counters with detailed records │ │
│ │ ├─ Non-Regression Check (config-driven) │ │
│ │ │ └─ Detects changes with severity levels │ │
│ │ └─ Critical Issue Handling (user confirmation if needed) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 5: EXPORT & PERSISTENCE │ │
│ │ ├─ Backup Old Files (if quality passed) │ │
│ │ ├─ Write JSON Outputs (endobest_inclusions.json, etc.) │ │
│ │ ├─ Export to Excel (if configured) │ │
│ │ │ ├─ Load Templates │ │
│ │ │ ├─ Apply Filters & Sorts │ │
│ │ │ ├─ Fill Data into Sheets │ │
│ │ │ ├─ Replace Values │ │
│ │ │ └─ Recalculate Formulas (win32com) │ │
│ │ └─ Display Summary & Elapsed Time │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ EXIT │
└─────────────────────────────────────────────────────────────────────┘
↓ EXTERNAL DEPENDENCIES ↓
┌─────────────────────────────────────────────────────────────────────┐
│ EXTERNAL APIS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 🔐 AUTHENTICATION (IAM) │
│ └─ api-auth.ziwig-connect.com │
│ ├─ POST /api/auth/ziwig-pro/login │
│ └─ POST /api/auth/refreshToken │
│ │
│ 🏥 RESEARCH CLINIC (RC) │
│ └─ api-hcp.ziwig-connect.com │
│ ├─ POST /api/auth/config-token │
│ ├─ GET /api/inclusions/getAllOrganizations │
│ ├─ POST /api/inclusions/inclusion-statistics │
│ ├─ POST /api/inclusions/search │
│ ├─ POST /api/records/byPatient │
│ └─ POST /api/surveys/filter/with-answers (optimized!) │
│ │
│ 🧪 LAB / DIAGNOSTICS (GDD) │
│ └─ api-lab.ziwig-connect.com │
│ └─ GET /api/requests/by-tube-id/{tubeId} │
│ │
│ 📝 EXCEL TEMPLATES │
│ └─ config/templates/ │
│ ├─ Endobest_Template.xlsx │
│ ├─ Statistics_Template.xlsx │
│ └─ (Custom templates) │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 🔌 Module Descriptions
### 1. **eb_dashboard.py** - Main Orchestrator (57.5 KB)
**Responsibility:** Complete data collection workflow, API coordination, multithreaded execution
**Structure (9 Blocks):**
| Block | Purpose | Key Functions |
|-------|---------|---|
| **1** | Configuration & Infrastructure | Constants, global vars, progress bar setup |
| **2** | Decorators & Resilience | `@api_call_with_retry`, retry logic |
| **3** | Authentication | `login()`, token exchange, IAM integration |
| **3B** | File Utilities | `load_json_file()` |
| **4** | Inclusions Mapping Config | `load_inclusions_mapping_config()`, validation |
| **5** | Data Search & Extraction | Questionnaire finding, field retrieval |
| **6** | Custom Functions | Business logic, calculated fields |
| **7** | Business API Calls | RC, GDD, organization endpoints |
| **7b** | Organization Center Mapping | `load_org_center_mapping()` |
| **8** | Processing Orchestration | `process_organization_patients()`, patient data processing |
| **9** | Main Execution | Entry point, quality checks, export |
**Key Technologies:**
- `httpx` - HTTP client (with thread-local instances)
- `openpyxl` - Excel parsing
- `concurrent.futures.ThreadPoolExecutor` - Parallel execution
- `tqdm` - Progress tracking
- `questionary` - Interactive prompts
---
### 2. **eb_dashboard_utils.py** - Utility Functions (6.4 KB)
**Responsibility:** Generic, reusable utility functions shared across modules
**Core Functions:**
```python
get_httpx_client() # Thread-local HTTP client management
get_thread_position() # Progress bar positioning
get_nested_value() # JSON path navigation with wildcard support (*)
get_config_path() # Config folder resolution (script vs PyInstaller)
get_old_filename() # Backup filename generation
```
**Key Features:**
- Thread-safe HTTP client pooling
- Wildcard support in nested JSON paths (e.g., `["items", "*", "value"]`); see the sketch after this list
- Cross-platform path resolution
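A minimal sketch of the wildcard navigation, assuming a recursive implementation (the real `get_nested_value()` may differ in signature and edge-case handling):
```python
def get_nested_value(data, path, default="undefined"):
    """Walk a nested dict/list along `path`; a "*" segment fans out over a list."""
    if not path:
        return data
    key, rest = path[0], path[1:]
    if key == "*":                       # wildcard: collect the match from every element
        if not isinstance(data, list):
            return default
        return [get_nested_value(item, rest, default) for item in data]
    if isinstance(data, dict) and key in data:
        return get_nested_value(data[key], rest, default)
    return default

# Example: collect every "value" under "items"
payload = {"items": [{"value": 1}, {"value": 2}]}
print(get_nested_value(payload, ["items", "*", "value"]))   # [1, 2]
```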
---
### 3. **eb_dashboard_quality_checks.py** - QA & Validation (58.5 KB)
**Responsibility:** Quality assurance, data validation, regression checking
**Core Functions:**
| Function | Purpose |
|----------|---------|
| `load_regression_check_config()` | Load regression rules from Excel |
| `run_quality_checks()` | Orchestrate all QA checks |
| `coherence_check()` | Verify stats vs detailed data consistency |
| `non_regression_check()` | Config-driven change validation |
| `run_check_only_mode()` | Standalone validation mode |
| `backup_output_files()` | Create versioned backups |
**Quality Check Types:**
1. **Coherence Check**
- Compares API-provided organization statistics vs. actual inclusion counts
- Severity: Warning/Critical
- Example: Total API count (145) vs. actual inclusions (143)
2. **Non-Regression Check**
- Compares current vs. previous run data
- Applies config-driven rules with transition patterns
- Detects: new inclusions, deletions, field changes
- Severity: Warning/Critical with exceptions
---
### 4. **eb_dashboard_excel_export.py** - Excel Generation & Orchestration (38 KB, v1.1+)
**Responsibility:** Configuration-driven Excel workbook generation with data transformation + high-level orchestration
**Core Functions (Low-Level):**
| Function | Purpose |
|----------|---------|
| `load_excel_export_config()` | Load Excel_Workbooks + Excel_Sheets config |
| `validate_excel_config()` | Validate templates and named ranges |
| `export_to_excel()` | Main export orchestration (openpyxl + win32com) |
| `_apply_filter()` | AND-condition filtering |
| `_apply_sort()` | Multi-key sorting with datetime support |
| `_apply_value_replacement()` | Strict type matching value transformation |
| `_handle_output_exists()` | File conflict resolution |
| `_recalculate_workbook()` | Formula recalculation via win32com |
| `_process_sheet()` | Sheet-specific data filling |
**High-Level Orchestration Functions (v1.1+):**
| Function | Purpose | Called From |
|----------|---------|-------------|
| `export_excel_only()` | Complete --excel-only mode | main() CLI detection |
| `run_normal_mode_export()` | Normal mode export phase | main() after JSON write |
| `prepare_excel_export()` | Preparation + validation | Both orchestration functions |
| `execute_excel_export()` | Execution with error handling | Both orchestration functions |
| `_load_json_file_internal()` | Safe JSON loading | run_normal_mode_export() |
**Data Transformation Pipeline:**
```
1. Load Configuration (Excel_Workbooks + Excel_Sheets)
2. For each workbook:
a. Load template (openpyxl)
b. For each sheet:
- Apply filter (AND conditions)
- Apply sort (multi-key)
- Apply value replacement (strict type matching)
- Fill data into cells/named ranges
c. Handle file conflicts (Overwrite/Increment/Backup)
d. Save workbook (openpyxl)
e. Recalculate formulas (win32com - optional)
```
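The filter, sort, and replacement steps can be pictured with the simplified helpers below; the real `_apply_filter()`, `_apply_sort()`, and `_apply_value_replacement()` read their rules from the `Excel_Sheets` configuration, so the signatures and the date format here are assumptions:
```python
from datetime import datetime

def apply_filter(rows, conditions):
    """Keep only rows matching every condition (AND logic)."""
    return [row for row in rows if all(row.get(k) == v for k, v in conditions.items())]

def apply_sort(rows, keys):
    """Multi-key sort; dd/mm/yyyy strings sort as dates (format is an assumption)."""
    def sort_key(value):
        try:
            return (0, datetime.strptime(value, "%d/%m/%Y"))
        except (TypeError, ValueError):
            return (1, str(value))
    for key in reversed(keys):                    # stable sorts applied right-to-left
        rows = sorted(rows, key=lambda row: sort_key(row.get(key)))
    return rows

def apply_value_replacement(rows, replacements):
    """Replace cell values with strict type matching (True is not "True")."""
    for row in rows:
        for field, mapping in replacements.items():
            for old, new in mapping.items():
                if field in row and row[field] == old and type(row[field]) is type(old):
                    row[field] = new
    return rows

# Example: keep included patients, sort by date, then relabel the status
rows = [{"Pseudo": "ENDO-001", "Inclusion_Status": "incluse", "Inclusion_Date": "15/10/2024"}]
rows = apply_filter(rows, {"Inclusion_Status": "incluse"})
rows = apply_sort(rows, ["Inclusion_Date"])
rows = apply_value_replacement(rows, {"Inclusion_Status": {"incluse": "Included"}})
```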
**Orchestration Pattern (v1.1+):**
As of v1.1, the system delegates all export orchestration to dedicated functions following the pattern established by `run_check_only_mode()` from quality_checks:
1. **--excel-only mode:** Main script calls single function → `export_excel_only()` handles everything
2. **Normal mode export:** Main script calls single function → `run_normal_mode_export()` handles everything
This keeps the main script focused on business logic while all export mechanics are encapsulated in the module.
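A minimal sketch of that delegation, with stubs standing in for the two module entry points (the real `main()` in `eb_dashboard.py` naturally does much more):
```python
import sys

# Stubs standing in for eb_dashboard_excel_export.export_excel_only() and
# run_normal_mode_export(); only the call pattern is illustrated here.
def export_excel_only():
    print("Excel-only mode: reload existing JSON, regenerate workbooks")

def run_normal_mode_export():
    print("Normal-mode export: reload the JSON just written, build workbooks")

def main():
    if "--excel-only" in sys.argv:
        export_excel_only()          # one call, the module handles everything
        return
    # ... phases 1-4 (collection, quality checks) and the JSON write happen here ...
    run_normal_mode_export()         # one call encapsulates the whole export phase

if __name__ == "__main__":
    main()
```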
---
## 🔄 Complete Data Collection Workflow
### Phase 1: Initialization (2-3 seconds)
1. User provides credentials (with defaults)
2. IAM Login: `POST /api/auth/ziwig-pro/login`
3. Token Exchange: `POST /api/auth/config-token`
4. Load configuration from `Endobest_Dashboard_Config.xlsx`
5. Validate field mappings and quality check rules
6. Setup thread pools (main: 20 workers, subtasks: 40 workers)
### Phase 2: Organization Retrieval (5-8 seconds)
1. Get all organizations: `GET /api/inclusions/getAllOrganizations`
2. Filter excluded centers (config-driven)
3. Fetch counters in parallel (20 workers), as sketched after this list:
- For each org: `POST /api/inclusions/inclusion-statistics`
- Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
4. Optional: Enrich with center mapping (from `eb_org_center_mapping.xlsx`)
5. Calculate totals and sort
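A condensed sketch of the parallel counter fetch from step 3 (the endpoint and worker count follow this document; the request body and helper name are assumptions):
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import httpx
from tqdm import tqdm

RC_URL = "https://api-hcp.ziwig-connect.com"

def fetch_counters(access_token, organizations, max_workers=20):
    """Fetch inclusion statistics for every organization in parallel (sketch)."""
    headers = {"Authorization": f"Bearer {access_token}"}

    def fetch_one(org):
        with httpx.Client(timeout=30) as client:
            resp = client.post(f"{RC_URL}/api/inclusions/inclusion-statistics",
                               json={"organizationId": org["id"]},   # body shape is an assumption
                               headers=headers)
        resp.raise_for_status()
        org.update(resp.json())   # patients_count, preincluded_count, included_count, ...
        return org

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_one, org) for org in organizations]
        return [f.result() for f in tqdm(as_completed(futures), total=len(futures))]
```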
### Phase 3: Patient Data Collection (2-4 minutes)
**Nested Parallel Architecture:**
**Outer Loop (20 workers):** For each organization
- `POST /api/inclusions/search?limit=1000&page=1` → Get up to 1000 inclusions
**Middle Loop (Sequential):** For each patient
- Fetch clinical record: `POST /api/records/byPatient`
- Fetch questionnaires: `POST /api/surveys/filter/with-answers` (**optimized: 1 call**)
- Submit async lab request: `GET /api/requests/by-tube-id/{tubeId}` (in subtasks pool)
**Inner Loop (40 async workers):** Non-blocking lab/questionnaire processing
- Parallel fetches of lab requests while main thread processes fields
**Field Processing (per patient):**
- For each field in configuration:
1. Determine source (questionnaire, record, inclusion, request, calculated)
2. Extract raw value (supports JSON paths with wildcards)
3. Check field condition (optional)
4. Apply post-processing transformations
5. Format score dictionaries
6. Store in nested output structure
### Phase 4: Quality Assurance (10-15 seconds)
1. **Coherence Check:** Compare API counters vs. actual data
2. **Non-Regression Check:** Compare current vs. previous run with config rules
3. **Critical Issue Handling:** User confirmation if issues detected
4. If no critical issues are found → continue to export
5. If critical issues are found → prompt the user to confirm before continuing
### Phase 5: Export & Persistence (3-5 seconds)
**Step 1: Backup & JSON Write**
1. Backup old files (if quality checks passed)
2. Write JSON outputs:
- `endobest_inclusions.json` (6-7 MB)
- `endobest_organizations.json` (17-20 KB)
**Step 2: Excel Export (if configured)**
Delegated to `run_normal_mode_export()` function which handles:
1. Load JSONs from filesystem (ensures consistency)
2. Load Excel configuration
3. Validate templates and named ranges
4. For each configured workbook:
- Load template file
- Apply filter conditions (AND logic)
- Apply multi-key sort
- Apply value replacements (strict type matching)
- Fill data into cells/named ranges
- Handle file conflicts (Overwrite/Increment/Backup)
- Save workbook
- Recalculate formulas (optional, via win32com)
5. Display results and return status
**Step 3: Summary**
1. Display elapsed time
2. Report file locations
3. Note any warnings/errors during export
---
## ⚙️ Configuration System
### Three-Layer Configuration Architecture
#### Layer 1: Excel Configuration (`Endobest_Dashboard_Config.xlsx`)
**Sheet 1: Inclusions_Mapping** (Field Extraction)
- Define which patient fields to extract
- Specify sources (questionnaire, record, inclusion, request, calculated)
- Configure transformations (value labels, templates, conditions)
- ~50+ fields typically configured
**Sheet 2: Organizations_Mapping** (Organization Fields)
- Define which organization fields to export
- Rarely modified
**Sheet 3: Excel_Workbooks** (Excel Export Metadata)
- Workbook names
- Template paths
- Output filenames (with template variables)
- File conflict handling strategy (Overwrite/Increment/Backup)
**Sheet 4: Excel_Sheets** (Sheet Configurations)
- Workbook name (reference to Excel_Workbooks)
- Sheet name (in template)
- Source type (Inclusions/Organizations/Variable)
- Target (cell or named range)
- Column mapping (JSON)
- Filter conditions (JSON with AND logic)
- Sort keys (JSON, multi-key with datetime support)
- Value replacements (JSON, strict type matching)
**Sheet 5: Regression_Check** (Quality Rules)
- Rule names
- Field selection pipeline (include/exclude patterns)
- Scope (all organizations or specific org list)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
#### Layer 2: Organization Mapping (`eb_org_center_mapping.xlsx`)
- Optional mapping file
- Sheet: `Org_Center_Mapping`
- Maps organization names to center identifiers
- Degrades gracefully if the file is missing (falls back to the organization name)
#### Layer 3: Excel Templates (`config/templates/`)
- Excel workbook templates with:
- Sheet definitions
- Named ranges (for data fill targets)
- Formula structures
- Formatting and styles
### Configuration Constants (in code)
```python
# API Configuration
IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
GDD_URL = "https://api-lab.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"
RC_ENDOBEST_PROTOCOL_ID = "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e"
# Threading & Performance
MAX_THREADS = 20 # Main thread pool workers
ASYNC_THREADS = 40 # Subtasks thread pool workers
ERROR_MAX_RETRY = 10 # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5 # Seconds between retries
# Excluded Organizations
RC_ENDOBEST_EXCLUDED_CENTERS = ["e18e7487-...", "5582bd75-...", "e053512f-..."]
```
---
## 🔐 API Integration
### Authentication Flow
```
1. IAM Login
POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login
Request: {"username": "...", "password": "..."}
Response: {"access_token": "jwt_master", "userId": "uuid"}
2. Token Exchange (RC-specific)
POST https://api-hcp.ziwig-connect.com/api/auth/config-token
Headers: Authorization: Bearer {master_token}
Request: {"userId": "...", "clientId": "...", "userAgent": "..."}
Response: {"access_token": "jwt_rc", "refresh_token": "refresh_token"}
3. Automatic Token Refresh (on 401)
POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken
Headers: Authorization: Bearer {current_token}
Request: {"refresh_token": "..."}
Response: {"access_token": "jwt_new", "refresh_token": "new_refresh"}
```
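A sketch of steps 1 and 2 with `httpx`; the payload field names follow the flow above, but the exact shapes are assumptions:
```python
import httpx

IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"

def authenticate(username, password, user_agent="eb_dashboard"):
    """Two-step login: IAM master token, then RC-scoped token pair (sketch)."""
    with httpx.Client(timeout=30) as client:
        # Step 1 - IAM login returns the master token and the user id
        iam = client.post(f"{IAM_URL}/api/auth/ziwig-pro/login",
                          json={"username": username, "password": password})
        iam.raise_for_status()
        master = iam.json()

        # Step 2 - exchange the master token for an RC access/refresh token pair
        rc = client.post(f"{RC_URL}/api/auth/config-token",
                         headers={"Authorization": f"Bearer {master['access_token']}"},
                         json={"userId": master["userId"], "clientId": RC_APP_ID,
                               "userAgent": user_agent})
        rc.raise_for_status()
        tokens = rc.json()
        return tokens["access_token"], tokens["refresh_token"]
```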
### Key API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/inclusions/getAllOrganizations` | GET | List all organizations |
| `/api/inclusions/inclusion-statistics` | POST | Get patient counts per org |
| `/api/inclusions/search` | POST | Get inclusions list for org (paginated) |
| `/api/records/byPatient` | POST | Get clinical record for patient |
| `/api/surveys/filter/with-answers` | POST | **OPTIMIZED:** Get all questionnaires for patient |
| `/api/requests/by-tube-id/{tubeId}` | GET | Get lab test results |
### Performance Optimization: Questionnaire Batching
**Problem:** Many API calls per patient (one call per questionnaire, repeated for every patient = slow)
**Solution:** Single optimized call retrieves all questionnaires with answers
```
BEFORE (inefficient):
for qcm_id in questionnaire_ids:
GET /api/surveys/{qcm_id}/answers?subject={patient_id}
# Result: N API calls per patient
AFTER (optimized):
POST /api/surveys/filter/with-answers
{
"context": "clinic_research",
"subject": patient_id
}
# Result: 1 API call per patient
# Impact: 4-5x performance improvement
```
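In code, the optimized path is a single request per patient, roughly as follows (the response shape is an assumption):
```python
import httpx

RC_URL = "https://api-hcp.ziwig-connect.com"

def fetch_all_questionnaires(access_token, patient_id):
    """One call per patient instead of one call per questionnaire (sketch)."""
    with httpx.Client(timeout=30) as client:
        resp = client.post(f"{RC_URL}/api/surveys/filter/with-answers",
                           headers={"Authorization": f"Bearer {access_token}"},
                           json={"context": "clinic_research", "subject": patient_id})
    resp.raise_for_status()
    return resp.json()   # every questionnaire for the patient, with answers
```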
---
## ⚡ Multithreading & Performance Optimization
### Thread Pool Architecture
```
Main Application Thread
┌─ Phase 1: Counter Fetching ──────────────────────────┐
│ ThreadPoolExecutor(max_workers=user_input, cap=20) │
│ ├─ Task 1: Get counters for Org 1 │
│ ├─ Task 2: Get counters for Org 2 │
│ └─ Task N: Get counters for Org N │
│ [Sequential wait: tqdm.as_completed] │
└──────────────────────────────────────────────────────┘
┌─ Phase 2: Inclusion Data Collection (Nested) ────────┐
│ Outer: ThreadPoolExecutor(max_workers=user_input) │
│ │
│ For Org 1: │
│ │ Inner: ThreadPoolExecutor(max_workers=40) │
│ │ ├─ Patient 1: Async lab/questionnaire fetch │
│ │ ├─ Patient 2: Async lab/questionnaire fetch │
│ │ └─ Patient N: Async lab/questionnaire fetch │
│ │ [Sequential outer wait: as_completed] │
│ │ │
│ For Org 2: │
│ │ [Similar parallel processing] │
│ │ │
│ For Org N: │
│ │ [Similar parallel processing] │
└──────────────────────────────────────────────────────┘
```
### Performance Optimizations
1. **Thread-Local HTTP Clients**
- Each thread maintains its own `httpx.Client`
- Avoids connection conflicts
- Implementation via `get_httpx_client()` (sketched after this list)
2. **Nested Parallelization**
- Main pool: Organizations (20 workers)
- Subtasks pool: Lab requests (40 workers)
- Non-blocking I/O during processing
3. **Questionnaire Batching** (4-5x improvement)
- Single call retrieves all questionnaires + answers
- Eliminates N filtered calls per patient
4. **Configurable Worker Threads**
- User input selection (1-20 workers)
- Tunable for network bandwidth and API rate limits
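The thread-local client pattern from item 1 looks roughly like this (a sketch; the real `get_httpx_client()` in `eb_dashboard_utils.py` may configure timeouts and headers differently):
```python
import threading

import httpx

_thread_local = threading.local()

def get_httpx_client():
    """Return a per-thread httpx.Client so worker threads never share connections."""
    client = getattr(_thread_local, "client", None)
    if client is None:
        client = httpx.Client(timeout=30)
        _thread_local.client = client
    return client
```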
### Progress Tracking (Multi-Level)
```
Overall Progress [████████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [██████████░░░░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██████░░░░░░░░░░░░░░░░░░░] 42/110
3/15 - Center 3 [████░░░░░░░░░░░░░░░░░░░░░] 28/85
```
**Thread-Safe Updates:**
```python
with _global_pbar_lock:
if global_pbar:
global_pbar.update(1)
```
---
## 🛡️ Error Handling & Resilience
### Token Management Strategy
1. **Automatic Token Refresh on 401**
- Triggered by `@api_call_with_retry` decorator
- Thread-safe via `_token_refresh_lock`
2. **Retry Mechanism**
- Max retries: 10 attempts
- Delay between retries: 0.5 seconds
- Decorators: `@api_call_with_retry`
3. **Thread-Safe Token Refresh**
```python
def new_token():
    global access_token, refresh_token
    with _token_refresh_lock:  # Only one thread refreshes at a time
        for attempt in range(ERROR_MAX_RETRY):
            try:
                # POST /api/auth/refreshToken, then update the global token pair
                # (refresh_token_call is an illustrative placeholder for that request)
                access_token, refresh_token = refresh_token_call(refresh_token)
                return
            except Exception:
                sleep(WAIT_BEFORE_RETRY)
```
### Exception Handling Categories
| Category | Examples | Handling |
|----------|----------|----------|
| **API Errors** | Network timeouts, HTTP errors | Retry with exponential spacing |
| **File I/O Errors** | Missing config, permission denied | Graceful error + exit |
| **Validation Errors** | Invalid config, incoherent data | Log warning + prompt user |
| **Thread Errors** | Worker thread failures | Shutdown gracefully + propagate |
### Graceful Degradation
1. **Missing Organization Mapping:** Skip silently, use fallback (org name)
2. **Critical Quality Issues:** Prompt user for confirmation before export
3. **Thread Failure:** Shutdown all workers gracefully, preserve partial results
4. **Invalid Configuration:** Clear error messages with remediation suggestions
---
## 📊 Data Output Structure
### JSON Output: `endobest_inclusions.json`
```json
[
{
"Patient_Identification": {
"Organisation_Id": "uuid",
"Organisation_Name": "Hospital Name",
"Center_Name": "HOSP-A",
"Patient_Id": "internal_id",
"Pseudo": "ENDO-001",
"Patient_Name": "Doe, John",
"Patient_Birthday": "1975-05-15",
"Patient_Age": 49
},
"Inclusion": {
"Consent_Signed": true,
"Inclusion_Date": "15/10/2024",
"Inclusion_Status": "incluse",
"isPrematurelyTerminated": false
},
"Extended_Fields": {
"Custom_Field_1": "value",
"Custom_Field_2": 42,
"Composite_Score": "8/10"
},
"Endotest": {
"Request_Sent": true,
"Diagnostic_Status": "Completed"
}
}
]
```
### JSON Output: `endobest_organizations.json`
```json
[
{
"id": "org-uuid",
"name": "Hospital A",
"Center_Name": "HOSP-A",
"patients_count": 45,
"preincluded_count": 8,
"included_count": 35,
"prematurely_terminated_count": 2
}
]
```
---
## 🚀 Execution Modes
### Mode 1: Normal (Full Collection)
```bash
python eb_dashboard.py
```
- Authenticates
- Collects from APIs
- Runs quality checks
- Exports JSON + Excel
- Duration: 2.5-5 minutes (typical)
### Mode 2: Excel-Only (Fast Export)
```bash
python eb_dashboard.py --excel-only
```
- Skips data collection
- Uses existing JSON files
- Regenerates Excel workbooks
- Duration: 5-15 seconds
- Use case: Reconfigure reports, test templates
### Mode 3: Check-Only (Validation Only)
```bash
python eb_dashboard.py --check-only
```
- Loads existing JSON
- Runs quality checks
- No export
- Duration: 5-10 seconds
- Use case: Verify data before distribution
### Mode 4: Debug (Verbose Output)
```bash
python eb_dashboard.py --debug
```
- Executes normal mode
- Enables detailed logging
- Shows field-by-field changes
- Check `dashboard.log` for details
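The four modes map onto simple command-line flags; a minimal `argparse` sketch (the real script may parse its arguments differently):
```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Endobest Clinical Research Dashboard")
    parser.add_argument("--excel-only", action="store_true",
                        help="Skip collection, rebuild Excel from existing JSON")
    parser.add_argument("--check-only", action="store_true",
                        help="Run quality checks on existing JSON, no export")
    parser.add_argument("--debug", action="store_true",
                        help="Normal run with verbose logging to dashboard.log")
    return parser.parse_args()
```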
---
## 📈 Performance Metrics & Benchmarks
### Typical Execution Times (Full Dataset: 1,200+ patients, 15+ organizations)
| Phase | Duration | Notes |
|-------|----------|-------|
| **Login & Config** | 2-3 sec | Sequential, network-dependent |
| **Fetch Counters** | 5-8 sec | 20 workers, parallelized |
| **Collect Inclusions** | 2-4 min | Includes API calls + field processing |
| **Quality Checks** | 10-15 sec | File loads, data comparison |
| **Export to JSON** | 3-5 sec | File I/O |
| **Export to Excel** | 5-15 sec | Template processing + fill |
| **TOTAL** | **~2.5-5 min** | Depends on network, API perf |
### Network Optimization Impact
**With old questionnaire approach (N filtered calls per patient):**
- 1,200 patients × 15 questionnaires = 18,000 API calls
- Estimated: 15-30 minutes
**With optimized single-call questionnaire:**
- 1,200 patients × 1 call = 1,200 API calls
- Estimated: 2-5 minutes
- **Improvement: 3-6x faster** ✅
---
## 🔍 Field Extraction & Processing Logic
### Complete Field Processing Pipeline
```
For each field in INCLUSIONS_MAPPING_CONFIG:
├─ Step 1: Determine Source Type
│ ├─ q_id / q_name / q_category → Find questionnaire
│ ├─ record → Use clinical record
│ ├─ inclusion → Use patient inclusion data
│ ├─ request → Use lab request data
│ └─ calculated → Execute custom function
├─ Step 2: Extract Raw Value
│ ├─ Navigate JSON using field_path
│ ├─ Supports wildcard (*) for list traversal
│ └─ Return value or "undefined"
├─ Step 3: Check Field Condition (optional)
│ ├─ If condition undefined → Set to "undefined"
│ ├─ If condition not boolean → Error flag
│ ├─ If condition false → Set to "N/A"
│ └─ If condition true → Continue
├─ Step 4: Apply Post-Processing Transformations
│ ├─ true_if_any: Convert to boolean
│ ├─ value_labels: Map to localized text
│ ├─ field_template: Apply formatting
│ └─ List joining: Flatten arrays with pipe delimiter
├─ Step 5: Format Score Dictionaries
│ ├─ If {total, max} → Format as "total/max"
│ └─ Otherwise → Keep as-is
└─ Store: output_inclusion[field_group][field_name] = final_value
```
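A compressed sketch of this pipeline for a single field; the configuration keys used here (`source_type`, `field_path`, `value_labels`, ...) are illustrative, not the exact column names of `Inclusions_Mapping`:
```python
def process_field(field_cfg, sources, output):
    """Condensed sketch of steps 1-5 for one configured field."""
    # Step 1: pick the source document (questionnaire / record / inclusion / request)
    source = sources.get(field_cfg["source_type"])

    # Step 2: walk the JSON path (wildcard handling omitted for brevity)
    value = source
    for key in field_cfg["field_path"]:
        value = value.get(key, "undefined") if isinstance(value, dict) else "undefined"

    # Step 3: optional condition gate
    if field_cfg.get("condition") is False:
        value = "N/A"

    # Step 4: post-processing, e.g. mapping raw values to labels
    labels = field_cfg.get("value_labels") or {}
    if isinstance(value, (str, int, float, bool)) and value in labels:
        value = labels[value]

    # Step 5: score dictionaries become "total/max"
    if isinstance(value, dict) and {"total", "max"} <= value.keys():
        value = f"{value['total']}/{value['max']}"

    output.setdefault(field_cfg["field_group"], {})[field_cfg["field_name"]] = value

# Example with hypothetical configuration values
cfg = {"source_type": "record", "field_path": ["consent", "signed"], "condition": True,
       "value_labels": {True: "Yes", False: "No"},
       "field_group": "Inclusion", "field_name": "Consent_Signed"}
result = {}
process_field(cfg, {"record": {"consent": {"signed": True}}}, result)
print(result)   # {'Inclusion': {'Consent_Signed': 'Yes'}}
```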
### Custom Functions for Calculated Fields
| Function | Purpose | Syntax |
|----------|---------|--------|
| `search_in_fields_using_regex` | Search multiple fields for pattern | `["search_in_fields_using_regex", "pattern", "field1", "field2"]` |
| `extract_parentheses_content` | Extract text within parentheses | `["extract_parentheses_content", "field_name"]` |
| `append_terminated_suffix` | Add suffix if patient terminated | `["append_terminated_suffix", "status_field", "is_terminated_field"]` |
| `if_then_else` | Unified conditional with 8 operators | `["if_then_else", "operator", arg1, arg2_optional, true_result, false_result]` |
**if_then_else Operators:**
- `is_true` / `is_false` - Boolean field test
- `is_defined` / `is_undefined` - Existence test
- `all_true` / `all_defined` - Multiple field test
- `==` / `!=` - Value comparison
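A sketch of how the unified conditional could evaluate these operators (the evaluation rules shown are assumptions):
```python
def if_then_else(operator, arg1, arg2, true_result, false_result):
    """Evaluate one of the eight operators and pick a result (sketch)."""
    checks = {
        "is_true": lambda: arg1 is True,
        "is_false": lambda: arg1 is False,
        "is_defined": lambda: arg1 not in (None, "undefined"),
        "is_undefined": lambda: arg1 in (None, "undefined"),
        "all_true": lambda: all(v is True for v in arg1),
        "all_defined": lambda: all(v not in (None, "undefined") for v in arg1),
        "==": lambda: arg1 == arg2,
        "!=": lambda: arg1 != arg2,
    }
    return true_result if checks[operator]() else false_result

# Example: turn a boolean answer into a label
print(if_then_else("is_true", True, None, "Yes", "No"))   # Yes
```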
---
## ✅ Quality Assurance Framework
### Coherence Check
**Purpose:** Verify API-provided statistics match actual collected data
**Logic:**
```
For each organization:
API_Count = statistic.total
Actual_Count = count of inclusion records
if API_Count != Actual_Count:
Report discrepancy with severity
├─ ±10%: Warning
└─ >±10%: Critical
```
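A sketch of that comparison; the counter field name and flat structure are assumptions, while the ±10% threshold follows the rule above:
```python
def coherence_check(organizations, inclusions):
    """Compare API counters with the inclusions actually collected (sketch)."""
    issues = []
    for org in organizations:
        api_count = org.get("patients_count", 0)
        actual = sum(1 for inc in inclusions
                     if inc["Patient_Identification"]["Organisation_Id"] == org["id"])
        if api_count != actual:
            deviation = abs(api_count - actual) / max(api_count, 1)
            severity = "Critical" if deviation > 0.10 else "Warning"
            issues.append((severity, org["name"], api_count, actual))
    return issues
```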
### Non-Regression Check
**Purpose:** Detect unexpected changes between data runs
**Configuration-Driven Rules:**
- Field selection pipeline (include/exclude patterns)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
- Exception handling (exclude specific organizations)
**Logic:**
```
Load previous inclusion data (_old file)
For each rule:
├─ Build candidate fields via pipeline
├─ Determine key field for matching
└─ For each inclusion:
├─ Find matching old inclusion by key
├─ Check for unexpected transitions
├─ Apply exceptions
└─ Report violations
```
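Sketched for one rule, assuming flattened records and a rule shape with `key_field`, `fields`, `allowed_transitions`, and `severity` (the real configuration columns differ):
```python
def non_regression_check(current, previous, rule):
    """Flag field transitions that the rule does not list as allowed (sketch)."""
    old_by_key = {rec[rule["key_field"]]: rec for rec in previous}
    violations = []
    for rec in current:
        old = old_by_key.get(rec[rule["key_field"]])
        if old is None:
            continue                                  # new inclusion, reported separately
        for field in rule["fields"]:
            before, after = old.get(field), rec.get(field)
            if before != after and (before, after) not in rule["allowed_transitions"]:
                violations.append((rule["severity"], rec[rule["key_field"]],
                                   field, before, after))
    return violations
```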
---
## 📋 Documentation Structure
The system includes comprehensive documentation:
| Document | Size | Content |
|----------|------|---------|
| **DOCUMENTATION_10_ARCHITECTURE.md** | 43.7 KB | System design, workflow, APIs, multithreading |
| **DOCUMENTATION_11_FIELD_MAPPING.md** | 56.3 KB | Field extraction logic, custom functions, examples |
| **DOCUMENTATION_12_QUALITY_CHECKS.md** | 60.2 KB | QA framework, regression rules, configuration |
| **DOCUMENTATION_13_EXCEL_EXPORT.md** | 29.6 KB | Excel generation, data transformation, config |
| **DOCUMENTATION_98_USER_GUIDE.md** | 8.4 KB | End-user instructions, troubleshooting, FAQ |
| **DOCUMENTATION_99_CONFIG_GUIDE.md** | 24.8 KB | Administrator reference, Excel tables, examples |
---
## 🔧 Key Technical Features
### Thread Safety
- Per-thread HTTP clients (no connection conflicts)
- Synchronized access to global state via locks
- Thread-safe progress bar updates
### Error Recovery
- Automatic token refresh on 401 errors
- Exponential backoff retry logic (configurable)
- Graceful degradation for optional features
- User confirmation on critical issues
### Configuration Flexibility
- 100% externalized to Excel (zero code changes)
- Supports multiple data sources
- Custom business logic functions
- Field dependencies and conditions
- Value transformations and templates
### Performance
- Optimized API calls (4-5x improvement)
- Parallel processing (20+ workers)
- Async I/O operations
- Configurable thread pools
### Data Quality
- Coherence checking (stats vs actual data)
- Non-regression testing (config-driven)
- Comprehensive validation
- Audit trail logging
---
## 📦 Dependencies
### Core Libraries
- **httpx** - HTTP client with connection pooling
- **openpyxl** - Excel file reading/writing
- **questionary** - Interactive CLI prompts
- **tqdm** - Progress bars
- **rich** - Rich text formatting
- **pywin32** - Windows COM automation (optional, for formula recalculation)
- **pytz** - Timezone support (optional)
### Python Version
- Python 3.7+
### External Services
- Ziwig IAM API
- Ziwig Research Clinic (RC) API
- Ziwig Lab (GDD) API
---
## 🎓 Usage Patterns
### For End Users
1. Configure fields in Excel (no code needed)
2. Run: `python eb_dashboard.py`
3. Review results in JSON or Excel
### For Administrators
1. Add new fields to `Inclusions_Mapping`
2. Define quality rules in `Regression_Check`
3. Configure Excel export in `Excel_Workbooks` + `Excel_Sheets`
4. Restart: script picks up config automatically
### For Developers
1. Add custom function to Block 6 (eb_dashboard.py)
2. Register in field config (Inclusions_Mapping)
3. Use via: `"source_id": "function_name"`
4. Other changes require no code modification or rebuild (configuration is picked up at the next run)
---
## 🎯 Summary
The **Endobest Clinical Research Dashboard** represents a mature, production-ready system that successfully combines:
- **Architectural Excellence** - Clean modular design with separation of concerns
- **User-Centric Configuration** - 100% externalized, no code changes needed
- **Performance Optimization** - 4-5x faster via API and threading improvements
- **Robust Resilience** - Comprehensive error handling, automatic recovery, graceful degradation
- **Quality Assurance** - Multi-level validation, coherence checks, regression testing
- **Comprehensive Documentation** - 250+ KB of technical and user guides
- **Maintainability** - Clear code structure, extensive logging, audit trails
The system successfully enables non-technical users to configure complex data extraction and reporting workflows while maintaining enterprise-grade reliability and performance standards.
---
**Document Version:** 1.0
**Last Updated:** 2025-11-08
**Status:** ✅ Complete & Production Ready