# Endobest Clinical Research Dashboard - Technical Documentation

## Part 1: General Architecture & Report Generation Workflow

**Document Version:** 2.0 (Updated with Excel Export feature)
**Last Updated:** 2025-11-08
**Audience:** Developers, Technical Architects
**Language:** English

---

## Table of Contents

1. [Overview](#overview)
2. [System Architecture](#system-architecture)
3. [Module Structure](#module-structure)
4. [Complete Data Collection Workflow](#complete-data-collection-workflow)
5. [API Integration](#api-integration)
6. [Multithreading & Performance](#multithreading--performance)
7. [Data Processing Pipeline](#data-processing-pipeline)
8. [Execution Modes](#execution-modes)
9. [Error Handling & Resilience](#error-handling--resilience)
10. [Organization ↔ Center Mapping](#organization--center-mapping)
11. [Performance Metrics & Benchmarks](#performance-metrics--benchmarks)
12. [Configuration Files](#configuration-files)
13. [Summary](#summary)

---

## Overview

The **Endobest Clinical Research Dashboard** is an automated data collection and processing system designed to extract, validate, and consolidate patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations.

### Key Characteristics

- **100% Externalized Configuration**: All extraction fields defined in Excel, zero code changes needed
- **Multi-Source Data Integration**: Fetches from RC (Research Clinic), GDD (Lab), and questionnaire APIs
- **High-Performance Multithreading**: Up to 20 concurrent workers for organization-level API parallelization, plus a 40-worker pool for nested tasks
- **Comprehensive Quality Assurance**: Built-in coherence checks and regression testing
- **Thread-Safe Operations**: Dedicated HTTP clients per thread, synchronized access to shared resources
- **Automated Error Recovery**: Token refresh, automatic retry with a fixed delay
- **Audit Trail**: Detailed logging and JSON backup versioning

---

## System Architecture

### High-Level Component Diagram

```
┌─────────────────────────────────────────────────────────┐
│             Endobest Dashboard Main Process             │
│                     eb_dashboard.py                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │  Block 1-3   │  │   Block 4    │  │  Block 5-6   │   │
│  │ Config & Auth│  │ Config Load  │  │ Data Extract │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
│         ↓                 ↓                 ↓           │
│  ┌─────────────────────────────────────────────────┐    │
│  │          Extended Fields Configuration          │    │
│  │   (Excel: Mapping Sheet → JSON field mapping)   │    │
│  └─────────────────────────────────────────────────┘    │
│         ↓                 ↓                 ↓           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │   Block 7    │  │   Block 8    │  │   Block 9    │   │
│  │  API Calls   │  │ Orchestration│  │  Quality QA  │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
│         ↓                 ↓                 ↓           │
│  ┌─────────────────────────────────────────────────┐    │
│  │  Multithreaded Processing (ThreadPoolExecutor)  │    │
│  │  - Organizations: 20 workers (parallel)         │    │
│  │  - Requests/Questionnaires: 40 workers (async)  │    │
│  └─────────────────────────────────────────────────┘    │
│         ↓                 ↓                 ↓           │
│  ┌─────────────────────────────────────────────────┐    │
│  │           Quality Checks & Validation           │    │
│  │  - Coherence Check (stats vs detail)            │    │
│  │  - Non-Regression Check (config-driven)         │    │
│  └─────────────────────────────────────────────────┘    │
│         ↓                 ↓                 ↓           │
│  ┌─────────────────────────────────────────────────┐    │
│  │              Export & Persistence               │    │
│  │  - endobest_inclusions.json                     │    │
│  │  - endobest_organizations.json                  │    │
│  │  - Versioned backups (_old suffix)              │    │
│  └─────────────────────────────────────────────────┘    │
│                                                         │
└─────────────────────────────────────────────────────────┘
                            ↓
           ┌──────────────────────────────────┐
           │         Utility Modules          │
           ├──────────────────────────────────┤
           │ • eb_dashboard_utils.py          │
           │ • eb_dashboard_quality_checks.py │
           └──────────────────────────────────┘
                            ↓
           ┌──────────────────────────────────┐
           │          External APIs           │
           ├──────────────────────────────────┤
           │ • IAM (Authentication)           │
           │ • RC (Research Clinic)           │
           │ • GDD (Lab / Diagnostic Data)    │
           └──────────────────────────────────┘
```

---

## Module Structure

### 1. **eb_dashboard.py** (Primary Orchestrator)
**Size:** ~45 KB | **Lines:** 1,021
**Responsibility:** Main application logic, API coordination, multithreading

#### Major Blocks:
- **Block 1**: Configuration & Base Infrastructure (constants, global variables, progress bar setup)
- **Block 2**: Decorators & Resilience (retry logic, token refresh)
- **Block 3**: Authentication (IAM login, token management)
- **Block 4**: Extended Fields Configuration (Excel loading & validation)
- **Block 5**: Data Search & Extraction (questionnaire finding, field retrieval)
- **Block 6**: Custom Functions & Field Processing (business logic, calculated fields)
- **Block 7**: Business API Calls (RC, GDD endpoints)
- **Block 7b**: Organization Center Mapping (organization enrichment with center identifiers)
- **Block 8**: Processing Orchestration (patient data processing)
- **Block 9**: Main Execution (entry point, quality checks, export)

### 2. **eb_dashboard_utils.py** (Reusable Utilities)
**Size:** ~6.4 KB | **Lines:** 184
**Responsibility:** Generic utility functions shared across modules

#### Core Functions:
```python
get_httpx_client()      # Thread-local HTTP client management
get_thread_position()   # Progress bar positioning
get_nested_value()      # JSON path navigation with wildcard support
get_config_path()       # Config folder resolution (script vs PyInstaller)
get_old_filename()      # Backup filename generation
```

### 3. **eb_dashboard_quality_checks.py** (QA & Validation)
**Size:** ~59 KB | **Lines:** 1,266
**Responsibility:** Quality assurance, data validation, regression checking

#### Core Functions:
```python
load_regression_check_config()  # Load regression rules from Excel
run_quality_checks()            # Orchestrate all QA checks
coherence_check()               # Verify stats vs detailed data consistency
non_regression_check()          # Config-driven change validation
run_check_only_mode()           # Standalone validation mode
backup_output_files()           # Create versioned backups
```

### 4. **eb_dashboard_excel_export.py** (Excel Report Generation & Orchestration)
**Size:** ~38 KB | **Lines:** ~1,340 (v1.1+)
**Responsibility:** Configuration-driven Excel workbook generation with data transformation + high-level orchestration

#### Low-Level Functions (Data Processing):
```python
load_excel_export_config()  # Load Excel_Workbooks and Excel_Sheets config
validate_excel_config()     # Validate templates and named ranges
export_to_excel()           # Main export orchestration (openpyxl + win32com)
_apply_filter()             # AND-condition filtering
_apply_sort()               # Multi-key sorting with datetime support
_apply_value_replacement()  # Strict type matching value transformation
_handle_output_exists()     # File conflict resolution (Overwrite/Increment/Backup)
_recalculate_workbook()     # Formula recalculation via win32com (optional)
_process_sheet()            # Sheet-specific data filling
```

#### High-Level Orchestration Functions (v1.1+):
```python
export_excel_only(sys_argv, console, ...)                        # Complete --excel-only mode orchestration
run_normal_mode_export(inclusions_data, organizations_data, enabled, config, ...)  # Normal mode export phase
prepare_excel_export(inclusions_file, organizations_file, ...)   # Prep + validate
execute_excel_export(inclusions_data, organizations_data, config, ...)  # Exec + error handling
_load_json_file_internal(filename)                               # Safe JSON loading helper
```

**Design Pattern (v1.1+):**
- All export mechanics are delegated to the module (following the quality_checks pattern)
- The main script calls a single function per mode: `export_excel_only()` or `run_normal_mode_export()`
- Configuration validation and error handling are centralized in the module
- Result: the main script stays focused on business logic, with export details encapsulated

**Note:** See [DOCUMENTATION_13_EXCEL_EXPORT.md](DOCUMENTATION_13_EXCEL_EXPORT.md) for complete architecture and configuration details.

### 5. **eb_dashboard_constants.py** (Centralized Configuration)
**Size:** ~3.5 KB | **Lines:** 120
**Responsibility:** Single source of truth for all application constants

#### Constants Categories:
```python
# File Management
INCLUSIONS_FILE_NAME, ORGANIZATIONS_FILE_NAME, CONFIG_FOLDER_NAME, etc.

# Excel Configuration
DASHBOARD_CONFIG_FILE_NAME, ORG_CENTER_MAPPING_FILE_NAME
EXCEL_WORKBOOKS_TABLE_NAME, EXCEL_SHEETS_TABLE_NAME, etc.

# API Configuration
API_TIMEOUT, API_*_ENDPOINT (9 endpoints across Auth, RC, GDD)
DEFAULT_USER_NAME, DEFAULT_PASSWORD, IAM_URL, RC_URL, GDD_URL, RC_APP_ID

# Research Protocol
RC_ENDOBEST_PROTOCOL_ID, RC_ENDOBEST_EXCLUDED_CENTERS

# Performance & Quality
ERROR_MAX_RETRY, WAIT_BEFORE_RETRY, MAX_THREADS
EXCEL_RECALC_TIMEOUT

# Logging & UI
LOG_FILE_NAME, BAR_N_FMT_WIDTH, BAR_TOTAL_FMT_WIDTH, etc.
```

**Design Principle:** All constants are imported from this module and are never duplicated or redefined in other modules, ensuring a single source of truth for configuration values across the entire application.
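
As a minimal usage sketch (the constant names are taken from the categories above), a consuming module imports shared values rather than redefining them:

```python
# Import shared constants instead of redefining them locally.
from eb_dashboard_constants import (
    API_TIMEOUT,
    ERROR_MAX_RETRY,
    INCLUSIONS_FILE_NAME,
    WAIT_BEFORE_RETRY,
)
```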

---

## Complete Data Collection Workflow

### Phase 1: Initialization & Authentication

```
START
  ↓
[1] User Login Prompt
    ├─ Input: username, password (defaults available)
    ├─ IAM Authentication: POST /api/auth/ziwig-pro/login
    ├─ Get Master Token + User ID
    └─ RC Token Exchange: POST /api/auth/config-token
        └─ Output: access_token, refresh_token
  ↓
[2] Configuration Loading
    ├─ Parse Excel: Endobest_Dashboard_Config.xlsx
    ├─ Load Inclusions_Mapping sheet → Field mapping definition
    ├─ Validate all field configurations
    └─ Load Regression_Check sheet → Quality rules
  ↓
[3] Thread Pool Configuration
    ├─ Main pool: ThreadPoolExecutor(user_input_threads, max=20)
    ├─ Async pool: ThreadPoolExecutor(40) for nested tasks
    └─ Initialize per-thread HTTP clients
```

### Phase 2: Organization & Counters Retrieval

```
[4] Get All Organizations
    ├─ API: GET /api/inclusions/getAllOrganizations
    ├─ Filter: Exclude RC_ENDOBEST_EXCLUDED_CENTERS
    └─ Output: List of all centers
  ↓
[5] Fetch Organization Counters (Parallelized)
    ├─ For each organization:
    │   └─ POST /api/inclusions/inclusion-statistics
    │       ├─ Protocol: RC_ENDOBEST_PROTOCOL_ID
    │       └─ Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
    ├─ Execute: 20 parallel workers
    └─ Output: Organizations with counters
  ↓
[5b] Enrich Organizations with Center Mapping (Optional)
    ├─ Load mapping file: eb_org_center_mapping.xlsx (if exists)
    ├─ Parse sheet: Org_Center_Mapping
    │   ├─ Extract: Organization_Name → Center_Name pairs
    │   ├─ Validate: No duplicate organizations or centers
    │   └─ Build: Normalized key mapping (case-insensitive, trimmed)
    ├─ For each organization:
    │   ├─ Normalize organization name
    │   ├─ Lookup in mapping dictionary
    │   ├─ If found: Add Center_Name field (mapped value)
    │   └─ If not found: Add Center_Name field (fallback to org name)
    ├─ Error Handling: Graceful degradation (missing file → warn and skip mapping)
    └─ Output: Organizations with enriched Center_Name field
  ↓
[6] Calculate Totals & Sort
    ├─ Sum all patient counts across organizations
    ├─ Sort organizations by patient count (descending)
    └─ Display summary statistics
```

### Phase 3: Patient Inclusion Data Collection

```
[7] For Each Organization (Parallelized - 20 workers):
    ├─ API: POST /api/inclusions/search?limit=1000&page=1
    │   └─ Retrieve up to 1000 inclusions per organization
    ├─ Store: inclusions_list[]
    └─ For Each Patient in Inclusions (Sequential):
  ↓
[8] Fetch Patient Data Sources (Parallel):
    ├─ THREAD 1: GET /api/records/byPatient
    │   └─ Retrieve clinical record, protocol inclusions, data
    ├─ THREAD 2: GET /api/surveys/filter/with-answers (OPTIMIZED)
    │   └─ Single call retrieves ALL questionnaires + answers for patient
    ├─ THREAD 3: GET /api/requests/by-tube-id/{tubeId}
    │   └─ Retrieve lab test results
    └─ WAIT: All parallel threads complete
  ↓
[9] Process Field Mappings
    ├─ For each field in field mapping config:
    │   ├─ Determine field source (questionnaire, record, inclusion, request)
    │   ├─ Extract raw value using field_path (supports JSON path + wildcards)
    │   ├─ Apply field condition (if specified)
    │   ├─ Execute custom functions (if Calculated type)
    │   ├─ Apply post-processing transformations:
    │   │   ├─ true_if_any: Convert to boolean if value matches list
    │   │   ├─ value_labels: Map value to localized text
    │   │   ├─ field_template: Apply formatting template
    │   │   └─ List joining: Join array values with pipe delimiter
    │   └─ Store in output_inclusion[field_group][field_name]
    └─ Output: Complete inclusion record with all fields
  ↓
[10] Progress Update
    ├─ Update per-organization progress bar
    └─ Update global progress bar (thread-safe)
  ↓
[11] Aggregate Results
    └─ Combine all inclusions from all organizations
```

### Phase 4: Quality Assurance & Validation

```
[12] Sorting
    ├─ Sort by: Organization Name, Inclusion Date, Patient Pseudo
    └─ Output: Ordered inclusions_list[]
  ↓
[13] Quality Checks Execution
    ├─ COHERENCE CHECK:
    │   ├─ Compare organization statistics (API counters)
    │   ├─ vs. actual inclusion data (detailed records)
    │   ├─ Verify: total, preincluded, included, prematurely_terminated counts
    │   └─ Report mismatches with severity levels
    │
    ├─ NON-REGRESSION CHECK:
    │   ├─ Load previous inclusions (_old file)
    │   ├─ Compare current vs. previous data
    │   ├─ Apply config-driven regression rules
    │   ├─ Detect: new inclusions, deleted inclusions, field changes
    │   ├─ Apply transition patterns and exceptions
    │   └─ Report violations by severity (Warning/Critical)
    │
    └─ Result: has_coherence_critical, has_regression_critical flags
  ↓
[14] Critical Issues Handling
    ├─ If NO critical issues:
    │   └─ Continue to export
    └─ If YES critical issues:
        ├─ Display warning: ⚠ CRITICAL issues detected!
        ├─ Prompt user: "Do you want to write results anyway?"
        ├─ If NO → Cancel export, exit gracefully
        └─ If YES → Continue to export (user override)
```

### Phase 5: Export & Persistence

**Phase 5 covers both JSON persistence and optional Excel export. The architecture is flexible:**

```
[15] Backup Old Files (only if checks passed)
    ├─ endobest_inclusions.json → endobest_inclusions_old.json
    ├─ endobest_organizations.json → endobest_organizations_old.json
    └─ Operation: Silent, overwrite existing backups
  ↓
[16] Write JSON Output Files
    ├─ File 1: endobest_inclusions.json
    │   ├─ Format: JSON array of inclusion objects
    │   ├─ Structure: Nested by field groups
    │   └─ Size: Typically 6-7 MB (for full Endobest)
    │
    ├─ File 2: endobest_organizations.json
    │   ├─ Format: JSON array of organization objects
    │   ├─ Includes: counters, statistics
    │   └─ Size: Typically 17-20 KB
    │
    └─ Both: UTF-8 encoding, 4-space indentation
  ↓
[17] Excel Export (if configured)
    ├─ DELEGATED TO: run_normal_mode_export()
    │   (from the eb_dashboard_excel_export module)
    │
    ├─ Workflow:
    │   ├─ Check: Is Excel export enabled?
    │   │   ├─ If NO → Skip to Completion (step 18)
    │   │   └─ If YES → Continue
    │   │
    │   ├─ Load JSONs from filesystem
    │   │   └─ Ensures consistency with just-written files
    │   │
    │   ├─ Load Excel export configuration
    │   │   ├─ Sheet: Excel_Workbooks (workbook definitions)
    │   │   └─ Sheet: Excel_Sheets (sheet configurations)
    │   │
    │   ├─ For each configured workbook:
    │   │   ├─ Load template file (openpyxl)
    │   │   ├─ For each sheet in workbook:
    │   │   │   ├─ Load source data (Inclusions or Organizations JSON)
    │   │   │   ├─ Apply filter (AND conditions)
    │   │   │   ├─ Apply multi-key sort (datetime-aware)
    │   │   │   ├─ Apply value replacements (strict type matching)
    │   │   │   └─ Fill data into cells/named ranges
    │   │   │
    │   │   ├─ Handle file conflicts (Overwrite/Increment/Backup strategy)
    │   │   ├─ Save workbook (openpyxl)
    │   │   └─ Recalculate formulas (optional, via win32com)
    │   │
    │   └─ Return: status (success/failure) + error message
    │
    └─ Note: See DOCUMENTATION_13_EXCEL_EXPORT.md for data transformation details
  ↓
[18] Completion & Reporting
    ├─ Display elapsed time
    ├─ Report all file locations (JSONs + Excel files if generated)
    ├─ Log all operations to dashboard.log
    └─ EXIT
```

**Three Operating Modes:**

1. **NORMAL MODE** (full workflow)
   - Collect data → Quality checks → Write JSONs → Excel export (if enabled)

2. **--excel-only MODE**
   - Skip data collection + quality checks
   - Load existing JSONs → Excel export
   - Uses: `export_excel_only()` function from the module

3. **--check-only MODE**
   - Skip data collection
   - Run quality checks only
   - Uses: `run_check_only_mode()` function from the quality_checks module

### Expected Output Structure

```json
[
    {
        "Patient_Identification": {
            "Organisation_Id": "uuid",
            "Organisation_Name": "Center Name",
            "Patient_Id": "internal_id",
            "Pseudo": "ENDO-001",
            "Patient_Name": "Doe, John",
            "Patient_Birthday": "1975-05-15",
            "Patient_Age": 49
        },
        "Inclusion": {
            "Consent_Signed": true,
            "Inclusion_Date": "15/10/2024",
            "Inclusion_Status": "incluse",
            "Inclusion_Complex": "Non",
            "isPrematurelyTerminated": false,
            "Inclusion_Status_Complete": "incluse",
            "Need_RCP": false
        },
        "Extended_Fields": {
            "Custom_Field_1": "value",
            "Custom_Field_2": 42
        },
        "Endotest": {
            "Request_Sent": true,
            "Diagnostic_Status": "Completed",
            "Request_Overall_Status": "Accepted par Ziwig Lab"
        },
        "Infos Générales": {
            "Couleurs (ex: 8/10)": "8/10",
            "Qualité de vie (ex: 43/55)": "43/55"
        }
    }
]
```

---

## API Integration

### Authentication APIs (IAM)

#### Login Endpoint
```
POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login

Request:
{
    "username": "user@example.com",
    "password": "password123"
}

Response:
{
    "access_token": "jwt_token_master",
    "userId": "user-uuid",
    ...
}
```
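
As a hedged illustration of this call (a minimal httpx sketch built from the sample payload above, not the application's actual login function):

```python
import httpx

# Minimal login sketch; URL and field names follow the sample above.
with httpx.Client(timeout=30.0) as client:
    resp = client.post(
        "https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login",
        json={"username": "user@example.com", "password": "password123"},
    )
    resp.raise_for_status()
    body = resp.json()
    master_token, user_id = body["access_token"], body["userId"]
```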

#### Token Exchange (RC-specific)
```
POST https://api-hcp.ziwig-connect.com/api/auth/config-token

Headers:
    Authorization: Bearer {master_token}

Request:
{
    "userId": "user-uuid",
    "clientId": "602aea51-cdb2-4f73-ac99-fd84050dc393",
    "userAgent": "Mozilla/5.0..."
}

Response:
{
    "access_token": "jwt_token_rc",
    "refresh_token": "refresh_token_value"
}
```

#### Token Refresh (Automatic on 401)
```
POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken

Headers:
    Authorization: Bearer {current_access_token}

Request:
{
    "refresh_token": "refresh_token_value"
}

Response:
{
    "access_token": "new_jwt_token",
    "refresh_token": "new_refresh_token"
}
```

### Research Clinic APIs (RC)

#### Get All Organizations
```
GET https://api-hcp.ziwig-connect.com/api/inclusions/getAllOrganizations

Headers:
    Authorization: Bearer {access_token}

Response:
[
    {
        "id": "org-uuid",
        "name": "Center Name",
        "address": "...",
        ...
    }
]
```

#### Get Organization Statistics
```
POST https://api-hcp.ziwig-connect.com/api/inclusions/inclusion-statistics

Headers:
    Authorization: Bearer {access_token}

Request:
{
    "protocolId": "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e",
    "center": "org-uuid",
    "excludedCenters": ["excluded-org-uuid-1", "excluded-org-uuid-2"]
}

Response:
{
    "statistic": {
        "totalInclusions": 145,
        "preIncluded": 23,
        "included": 110,
        "prematurelyTerminated": 12
    }
}
```

#### Search Inclusions by Organization
```
POST https://api-hcp.ziwig-connect.com/api/inclusions/search?limit=1000&page=1

Headers:
    Authorization: Bearer {access_token}

Request:
{
    "protocolId": "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e",
    "center": "org-uuid",
    "keywords": ""
}

Response:
{
    "data": [
        {
            "id": "patient-uuid",
            "name": "Doe, John",
            "status": "incluse",
            ...
        }
    ]
}
```

#### Get Patient Clinical Record
```
POST https://api-hcp.ziwig-connect.com/api/records/byPatient

Headers:
    Authorization: Bearer {access_token}

Request:
{
    "center": "org-uuid",
    "patientId": "patient-uuid",
    "mode": "exchange",
    "state": "ongoing",
    "includeEndoParcour": false,
    "sourceClient": "pro_prm"
}

Response:
{
    "record": {
        "protocol_inclusions": [
            {
                "status": "incluse",
                "blockedQcmVersions": [],
                "clinicResearchData": [
                    {
                        "requestMetaData": {
                            "tubeId": "tube-uuid"
                        }
                    }
                ]
            }
        ]
    }
}
```

#### Get All Questionnaires for Patient (Optimized)
```
POST https://api-hcp.ziwig-connect.com/api/surveys/filter/with-answers

Headers:
    Authorization: Bearer {access_token}

Request:
{
    "context": "clinic_research",
    "subject": "patient-uuid",
    "blockedQcmVersions": []    (optional)
}

Response:
[
    {
        "questionnaire": {
            "id": "qcm-uuid",
            "name": "Questionnaire Name",
            "category": "Category"
        },
        "answers": {
            "question_1": "answer_value",
            "question_2": true,
            ...
        }
    }
]
```

### Lab APIs (GDD)

#### Get Request by Tube ID
```
GET https://api-lab.ziwig-connect.com/api/requests/by-tube-id/{tubeId}?isAdmin=true&organization=undefined

Headers:
    Authorization: Bearer {access_token}

Response:
{
    "id": "request-uuid",
    "status": "completed",
    "tubeId": "tube-uuid",
    "diagnostic_status": "Completed",
    "results": [
        {
            "test_name": "Test Result",
            "value": "Result Value"
        }
    ]
}
```

---

## Multithreading & Performance

### Thread Pool Architecture

```
Main Application Thread
           ↓
┌───────────────────────────────────────────────────────┐
│ Phase 1: Counter Fetching                             │
│ ThreadPoolExecutor(max_workers=user_input)            │
│   ├─ Task 1: Get counter for Org 1                    │
│   ├─ Task 2: Get counter for Org 2                    │
│   └─ Task N: Get counter for Org N                    │
│ [Sequential wait: tqdm over as_completed]             │
└───────────────────────────────────────────────────────┘
           ↓
┌───────────────────────────────────────────────────────┐
│ Phase 2: Inclusion Data Collection (Nested)           │
│ Outer: ThreadPoolExecutor(max_workers=user_input)     │
│   ├─ For Org 1:                                       │
│   │   └─ Inner: ThreadPoolExecutor(max_workers=40)    │
│   │       ├─ Patient 1: Async request/questionnaires  │
│   │       ├─ Patient 2: Async request/questionnaires  │
│   │       └─ Patient N: Async request/questionnaires  │
│   │           └─ [Sequential wait: as_completed]      │
│   │                                                   │
│   ├─ For Org 2:                                       │
│   │   └─ [Similar parallel processing]                │
│   │                                                   │
│   └─ For Org N:                                       │
│       └─ [Similar parallel processing]                │
│ [Outer wait: tqdm over as_completed]                  │
└───────────────────────────────────────────────────────┘
```
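
A hedged sketch of this nested-pool pattern (it assumes the `organizations` list, the `output_inclusions` accumulator, and the `_process_inclusion_data()` function shown in the snippets later in this section; it is not the script's exact code):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

from tqdm import tqdm

# Inner pool, shared by all workers for per-patient subtasks.
subtasks_thread_pool = ThreadPoolExecutor(max_workers=40)

def process_organization(org: dict) -> list[dict]:
    # Patients are processed sequentially per organization; each patient's
    # request/questionnaire fetches are submitted to subtasks_thread_pool.
    return [_process_inclusion_data(incl, org) for incl in org["inclusions"]]

output_inclusions: list[dict] = []
with ThreadPoolExecutor(max_workers=20) as outer_pool:
    futures = [outer_pool.submit(process_organization, org) for org in organizations]
    for future in tqdm(as_completed(futures), total=len(futures)):
        output_inclusions.extend(future.result())
```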

### Performance Optimizations

#### 1. Questionnaire Batching
**Problem:** Multiple filtered API calls per patient (slow)
**Solution:** A single optimized API call retrieves all questionnaires with answers
**Impact:** 4-5x performance improvement

A hedged sketch of the two access patterns (httpx-style calls using the endpoints documented above; the real helpers live in Blocks 5 and 7):

```python
# BEFORE (inefficient): one filtered call per questionnaire
for qcm_id in questionnaire_ids:
    answers = client.get(
        f"{RC_URL}/api/surveys/{qcm_id}/answers",
        params={"subject": patient_id},
    ).json()

# AFTER (optimized): a single call retrieves all questionnaires with answers
all_answers = client.post(
    f"{RC_URL}/api/surveys/filter/with-answers",
    json={"context": "clinic_research", "subject": patient_id},
).json()
```

#### 2. Thread-Local HTTP Clients
**Problem:** A shared httpx.Client causes connection conflicts
**Solution:** Each thread maintains its own client
**Implementation:**
```python
import threading

import httpx

httpx_clients: dict[int, httpx.Client] = {}  # One client per thread id

def get_httpx_client() -> httpx.Client:
    thread_id = threading.get_ident()
    if thread_id not in httpx_clients:
        httpx_clients[thread_id] = httpx.Client()
    return httpx_clients[thread_id]
```

#### 3. Nested Parallelization
**Problem:** Sequential patient processing within an organization
**Solution:** Submit request/questionnaire fetches to the async pool
**Benefit:** Non-blocking I/O during main thread processing

```python
for inclusion in inclusions:
    output_inclusion = _process_inclusion_data(inclusion, organization)

# Inside _process_inclusion_data():
request_future = subtasks_thread_pool.submit(get_request_by_tube_id, tube_id)  # runs in async pool
all_questionnaires = get_all_questionnaires_by_patient(patient_id, record_data)  # runs meanwhile
request_data = request_future.result()  # Wait for async completion
```

#### 4. Configurable Worker Threads
**User Input:** Thread count selection (1-20 workers)
**Rationale:** Allows tuning for network bandwidth, API rate limits, and system resources

### Progress Tracking

#### Multi-Level Progress Bars
```
Overall Progress   [████████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1    [██████████░░░░░░░░░░░░░░░] 73/95
2/15 - Center 2    [██████░░░░░░░░░░░░░░░░░░░] 42/110
3/15 - Center 3    [████░░░░░░░░░░░░░░░░░░░░░] 28/85
```

#### Thread-Safe Progress Updates
```python
with _global_pbar_lock:
    if global_pbar:
        global_pbar.update(1)  # Thread-safe update
```

---

## Data Processing Pipeline

### Field Extraction Logic

```
For each field in the field mapping configuration:
├─ Input: field configuration from Excel
│
├─ Step 1: Determine Field Source
│   ├─ If source_type in [q_id, q_name, q_category]
│   │   └─ Find questionnaire in all_questionnaires dict
│   ├─ If source_type == "record"
│   │   └─ Use record_data (clinical record)
│   ├─ If source_type == "inclusion"
│   │   └─ Use inclusion_data (patient inclusion data)
│   ├─ If source_type == "request"
│   │   └─ Use request_data (lab test request)
│   └─ If source_name == "Calculated"
│       └─ Execute custom function
│
├─ Step 2: Extract Raw Value
│   ├─ Navigate JSON using field_path (supports * wildcard)
│   ├─ Example: ["record", "clinicResearchData", "*", "value"]
│   └─ Result: raw_value or "undefined"
│
├─ Step 3: Check Field Condition (optional)
│   ├─ If condition field is undefined
│   │   └─ Set final_value = "undefined"
│   ├─ If condition field is not boolean
│   │   └─ Set final_value = "$$$$ Condition Field Error"
│   ├─ If condition field is False
│   │   └─ Set final_value = "N/A"
│   └─ If condition field is True
│       └─ Continue processing
│
├─ Step 4: Apply Post-Processing Transformations
│   ├─ true_if_any: Convert to boolean
│   │   ├─ If raw_value matches any value in true_if_any list → True
│   │   └─ Otherwise → False
│   │
│   ├─ value_labels: Map to localized text
│   │   ├─ Find matching label_map entry by raw_value
│   │   └─ Replace with French text (text.fr)
│   │
│   ├─ field_template: Apply formatting
│   │   ├─ Replace "$value" placeholder with formatted value
│   │   └─ Example: "$value%" → "85%"
│   │
│   └─ List joining: Flatten arrays
│       └─ Join array elements with "|" delimiter
│
├─ Step 5: Format Score Dictionaries
│   ├─ If value is dict with keys ['total', 'max']
│   │   ├─ Format as "total/max" string
│   │   └─ Example: {"total": 8, "max": 10} → "8/10"
│   └─ Otherwise: Keep as-is
│
└─ Output: final_value
    └─ Stored in output_inclusion[field_group][field_name]
```
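
To make Step 2 concrete, here is a hedged sketch of wildcard path navigation in the spirit of `get_nested_value()` from eb_dashboard_utils.py (the first-match semantics for `*` is an assumption, not confirmed behavior):

```python
from typing import Any

def get_nested_value(data: Any, path: list) -> Any:
    # Walk the path one key at a time; "*" fans out over list elements
    # and returns the first defined hit (assumed semantics).
    if not path:
        return data
    key, rest = path[0], path[1:]
    if key == "*" and isinstance(data, list):
        for item in data:
            value = get_nested_value(item, rest)
            if value != "undefined":
                return value
        return "undefined"
    if isinstance(data, dict) and key in data:
        return get_nested_value(data[key], rest)
    return "undefined"

# Example path from the pipeline above:
# get_nested_value(response, ["record", "clinicResearchData", "*", "value"])
```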

### Custom Functions for Calculated Fields

#### 1. search_in_fields_using_regex
**Purpose:** Search multiple fields for a regex pattern match
**Syntax:** `["search_in_fields_using_regex", "regex_pattern", "field_1", "field_2", ...]`
**Logic:**
```
FOR each field in [field_1, field_2, ...]:
    IF field value matches regex_pattern (case-insensitive):
        RETURN True
RETURN False
```
**Example:**
```json
{
    "source_id": "search_in_fields_using_regex",
    "field_path": [".*surgery.*", "Indication", "Previous_Surgery"]
}
```
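
A hedged Python sketch of this logic; `get_value_from_inclusion()` is the lookup helper named in the pseudocode of the other custom functions, and its exact signature is an assumption:

```python
import re

def search_in_fields_using_regex(pattern: str, *field_names: str) -> bool:
    # True as soon as any listed field's value matches the pattern.
    for field_name in field_names:
        value = get_value_from_inclusion(field_name)  # hypothetical signature
        if isinstance(value, str) and re.search(pattern, value, re.IGNORECASE):
            return True
    return False
```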

#### 2. extract_parentheses_content
**Purpose:** Extract the text within parentheses
**Syntax:** `["extract_parentheses_content", "field_name"]`
**Logic:**
```
value = get_value_from_inclusion(field_name)
RETURN first occurrence of the (content) pattern
```
**Example:**
```
Input:  "Status (Active)"
Output: "Active"
```
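
A hedged sketch of the same logic (returning `"undefined"` when no parentheses are found is an assumption):

```python
import re

def extract_parentheses_content(field_name: str) -> str:
    value = get_value_from_inclusion(field_name)  # helper from the Logic block above
    match = re.search(r"\(([^)]*)\)", str(value))  # first (...) occurrence
    return match.group(1) if match else "undefined"
```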

#### 3. append_terminated_suffix
**Purpose:** Add a " - AP" suffix if the patient was prematurely terminated
**Syntax:** `["append_terminated_suffix", "status_field", "is_terminated_field"]`
**Logic:**
```
status = get_value_from_inclusion(status_field)
is_terminated = get_value_from_inclusion(is_terminated_field)
IF is_terminated == True:
    RETURN status + " - AP"
ELSE:
    RETURN status
```

#### 4. if_then_else
**Purpose:** Unified conditional logic with 8 operators
**Syntax:** `["if_then_else", "operator", arg1, arg2_optional, result_if_true, result_if_false]`

**Operators:**

| Operator | Args | Logic |
|----------|------|-------|
| `is_true` | field, true_val, false_val | IF field == True THEN true_val ELSE false_val |
| `is_false` | field, true_val, false_val | IF field == False THEN true_val ELSE false_val |
| `is_defined` | field, true_val, false_val | IF field is not undefined THEN true_val ELSE false_val |
| `is_undefined` | field, true_val, false_val | IF field is undefined THEN true_val ELSE false_val |
| `all_true` | [fields_list], true_val, false_val | IF all fields are True THEN true_val ELSE false_val |
| `all_defined` | [fields_list], true_val, false_val | IF all fields are defined THEN true_val ELSE false_val |
| `==` | value1, value2, true_val, false_val | IF value1 == value2 THEN true_val ELSE false_val |
| `!=` | value1, value2, true_val, false_val | IF value1 != value2 THEN true_val ELSE false_val |

**Value Resolution Rules:**
- **Boolean literals:** `true`, `false` → used directly
- **Numeric literals:** `42`, `3.14` → used directly
- **String literals:** Prefixed with `$` → `$"Active"` → `"Active"`
- **Field references:** No prefix → looked up from inclusion data

**Examples:**
```json
{
    "source_id": "if_then_else",
    "field_path": ["is_defined", "Patient_Id", "$\"DEFINED\"", "$\"UNDEFINED\""]
}

{
    "source_id": "if_then_else",
    "field_path": ["==", "Status", "$\"Active\"", "$\"Is Active\"", "$\"Not Active\""]
}

{
    "source_id": "if_then_else",
    "field_path": ["all_true", ["Is_Consented", "Is_Included"], true, false]
}
```
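
A hedged sketch of the value-resolution and dispatch logic (two operators shown; `get_value_from_inclusion()` is assumed as the field lookup helper, as in the pseudocode above):

```python
def _resolve(token, inclusion):
    # Literals pass through; '$'-prefixed strings are string literals;
    # bare strings are field references looked up in the inclusion.
    if isinstance(token, (bool, int, float)):
        return token
    if isinstance(token, str) and token.startswith("$"):
        return token[1:].strip('"')  # $"Active" -> Active
    return get_value_from_inclusion(token)

def if_then_else(args, inclusion):
    op, rest = args[0], args[1:]
    if op == "==":
        v1, v2, t, f = (_resolve(a, inclusion) for a in rest)
        return t if v1 == v2 else f
    if op == "all_true":
        fields, t, f = rest
        ok = all(_resolve(name, inclusion) is True for name in fields)
        return _resolve(t, inclusion) if ok else _resolve(f, inclusion)
    ...  # the remaining operators follow the table above
```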

---

## Execution Modes

### Mode 1: Normal Mode (Full Data Collection)
```bash
python eb_dashboard.py
```

**Workflow:**
1. User login (with defaults)
2. Load configuration
3. Collect organizations & counters
4. Collect all inclusion data (parallelized)
5. Run quality checks (coherence + regression)
6. Prompt user if critical issues
7. Export JSON files
8. Display elapsed time

**Output Files:**
- `endobest_inclusions.json`
- `endobest_organizations.json`
- Backup files with `_old` suffix
- Excel files (if configured in the Excel_Workbooks table)

### Mode 2: Excel-Only Mode (Fast Export) - NEW
```bash
python eb_dashboard.py --excel-only
```

**Workflow:**
1. Load existing JSON files (no API calls, no collection)
2. Load Excel export configuration
3. Generate Excel workbooks from existing data
4. Exit

**Use Case:** Regenerate Excel reports without data collection (faster iteration); test new configurations; apply new filters/sorts

**Output Files:**
- Excel files as specified in the Excel_Workbooks configuration

### Mode 3: Check-Only Mode (Validation Only)
```bash
python eb_dashboard.py --check-only
```

**Workflow:**
1. Load existing JSON files (no API calls)
2. Load regression check configuration
3. Run quality checks without collecting new data
4. Report any issues
5. Exit

**Use Case:** Validate data before distribution when no fresh collection is needed

### Mode 4: Check-Only Compare Mode (File Comparison)
```bash
python eb_dashboard.py --check-only file1.json file2.json
```

**Workflow:**
1. Load two specific JSON files
2. Run the regression check comparing file1 vs file2
3. Skip the coherence check (organizations file not needed)
4. Report differences
5. Exit

**Use Case:** Compare two snapshot versions without coherence validation

### Mode 5: Debug Mode (Detailed Output)
```bash
python eb_dashboard.py --debug
```

**Workflow:**
1. Execute as normal mode
2. Enable DEBUG_MODE in the quality checks module
3. Display detailed field-by-field changes
4. Show individual inclusion comparisons
5. Verbose logging

**Use Case:** Troubleshoot regression check rules, understand data changes

---

## Organization ↔ Center Mapping

### Overview

The organization-to-center mapping feature enriches healthcare organization records with standardized center identifiers. This enables center-based reporting without requiring code modifications.

### Configuration

**File:** `eb_org_center_mapping.xlsx` (optional, in script directory)

**Sheet Name:** `Org_Center_Mapping` (case-sensitive)

**Required Columns:**
```
| Organization_Name | Center_Name |
|-------------------|-------------|
| Hospital A        | HOSP-A      |
| Hospital B        | HOSP-B      |
```

### Workflow

1. **Load Mapping** (Step [5b] of Phase 2; a loading sketch follows this list)
   - Read `eb_org_center_mapping.xlsx` if the file exists
   - Parse the `Org_Center_Mapping` sheet
   - Warn and skip if the file is not found (graceful degradation)

2. **Validate Data**
   - Check for duplicate organization names (normalized: lowercase, trimmed)
   - Check for duplicate center names
   - If duplicates are found: abort mapping, return an empty dict

3. **Build Mapping Dictionary**
   - Key: normalized organization name
   - Value: center name (original case preserved)
   - Example: `{"hospital a": "HOSP-A"}`

4. **Apply to Organizations**
   - For each organization from the RC API:
     - Normalize the organization name (lowercase, trim)
     - Look it up in the mapping dictionary
     - If found: add a `Center_Name` field with the mapped value
     - If not found: add a `Center_Name` field with the fallback (org name)
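
A hedged loading sketch using openpyxl (which the project already uses for Excel export); the column order and header-row handling are assumptions:

```python
from openpyxl import load_workbook

def load_org_center_mapping(path: str = "eb_org_center_mapping.xlsx") -> dict[str, str]:
    # Returns {normalized_org_name: center_name}; an empty dict means "no mapping".
    try:
        sheet = load_workbook(path, read_only=True)["Org_Center_Mapping"]
    except (FileNotFoundError, KeyError):
        return {}  # Graceful degradation: missing file or sheet
    mapping: dict[str, str] = {}
    seen_centers: set[str] = set()
    for org_name, center_name in sheet.iter_rows(min_row=2, max_col=2, values_only=True):
        if not org_name or not center_name:
            continue
        key = str(org_name).strip().lower()  # case-insensitive, trimmed key
        center = str(center_name).strip()
        if key in mapping or center in seen_centers:
            return {}  # Duplicate organization or center: abort mapping
        mapping[key] = center
        seen_centers.add(center)
    return mapping
```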

### Error Handling

| Scenario | Behavior |
|----------|----------|
| File missing | Print warning, skip mapping |
| Sheet not found | Print warning, skip mapping |
| Columns missing | Print warning, skip mapping |
| Duplicate organizations | Abort mapping, print error |
| Duplicate centers | Abort mapping, print error |
| Organization not in mapping | Use fallback (org name) |

### Output

**In `endobest_organizations.json`:**
```json
{
    "id": "org-uuid",
    "name": "Hospital A",
    "Center_Name": "HOSP-A",
    "patients_count": 45,
    ...
}
```

**In `endobest_inclusions.json` (if an extended field is configured):**
```json
{
    "Patient_Identification": {
        "Organisation_Name": "Hospital A",
        "Center_Name": "HOSP-A",
        ...
    }
}
```

### Example

**Input Organizations (from RC API):**
```json
[
    {"id": "org1", "name": "Hospital A"},
    {"id": "org2", "name": "Hospital B"},
    {"id": "org3", "name": "Clinic C"}
]
```

**Mapping File:**
```
Organization_Name | Center_Name
Hospital A        | HOSP-A
Hospital B        | HOSP-B
```

**Console Output:**
```
Mapping organizations to centers...
⚠ 1 organization(s) not mapped:
    - Clinic C
```

**Result:** Clinic C uses the fallback → `Center_Name = "Clinic C"`

### Features

- ✅ **Case-Insensitive Matching**: "Hospital A" matches "hospital a" in the file
- ✅ **Whitespace Trimming**: " Hospital A " matches "Hospital A"
- ✅ **Graceful Degradation**: A missing file doesn't break the process
- ✅ **Fallback Strategy**: Unmapped organizations use the original name
- ✅ **No Code Changes**: Fully configurable via the Excel file

---

## Error Handling & Resilience

### Token Management Strategy

#### 1. Automatic Token Refresh on 401
```python
@api_call_with_retry
def some_api_call():
    # If response.status_code == 401:
    #   new_token() is called automatically
    #   and the request is retried
    pass
```
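
A hedged sketch of what a decorator like `api_call_with_retry` can look like, combining the retry loop and 401 handling described below (the real decorator lives in Block 2; `ERROR_MAX_RETRY`, `WAIT_BEFORE_RETRY`, and `new_token()` are the constants and helper shown nearby):

```python
import functools
import logging
from time import sleep

import httpx

def api_call_with_retry(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(ERROR_MAX_RETRY):
            try:
                return func(*args, **kwargs)
            except (httpx.RequestError, httpx.HTTPStatusError) as exc:
                logging.warning(f"Error (Attempt {attempt + 1}/{ERROR_MAX_RETRY}): {exc}")
                if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code == 401:
                    new_token()  # thread-safe refresh, see the next section
                if attempt < ERROR_MAX_RETRY - 1:
                    sleep(WAIT_BEFORE_RETRY)
        raise httpx.RequestError("Persistent error")
    return wrapper
```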

#### 2. Thread-Safe Token Refresh
```python
def new_token():
    global access_token, refresh_token
    with _token_refresh_lock:  # Only one thread refreshes at a time
        # Attempt the refresh up to ERROR_MAX_RETRY times
        for attempt in range(ERROR_MAX_RETRY):
            try:
                ...  # POST /api/auth/refreshToken, then update the global tokens
                return
            except httpx.HTTPError:
                sleep(WAIT_BEFORE_RETRY)
```

### Retry Mechanism

#### Configuration Constants
```python
ERROR_MAX_RETRY = 10     # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5  # Seconds between retries (fixed delay, no exponential backoff)
```

#### Retry Logic
```python
for attempt in range(ERROR_MAX_RETRY):
    try:
        # Make the API call
        response.raise_for_status()
        return result
    except (httpx.RequestError, httpx.HTTPStatusError) as exc:
        logging.warning(f"Error (Attempt {attempt + 1}/{ERROR_MAX_RETRY}): {exc}")

        # Handle 401 (token expired)
        if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code == 401:
            logging.info("Token expired. Refreshing token.")
            new_token()

        # Wait before retrying (except on the last attempt)
        if attempt < ERROR_MAX_RETRY - 1:
            sleep(WAIT_BEFORE_RETRY)

# If all retries fail
logging.critical(f"Persistent error after {ERROR_MAX_RETRY} attempts")
raise httpx.RequestError(message="Persistent error")
```

### Exception Handling

#### API Errors
- **httpx.RequestError:** Network errors, connection timeouts, DNS failures
- **httpx.HTTPStatusError:** HTTP status codes >= 400
- **json.JSONDecodeError:** Invalid JSON in configuration or response

#### File I/O Errors
- **FileNotFoundError:** Configuration file missing
- **IOError:** Cannot write output files
- **json.JSONDecodeError:** Corrupted JSON file loading

#### Validation Errors
- **Configuration validation:** Invalid field definitions in Excel
- **Data validation:** Incoherent statistics vs. detailed data
- **Regression check violations:** Unexpected data changes

#### Error Logging
```python
import logging

logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s - %(levelname)s - %(message)s',
    filename='dashboard.log',
    filemode='w'
)
```

**Logged Events:**
- API errors with attempt numbers
- Token refresh events
- Configuration loading status
- Quality check results
- File I/O operations
- Thread errors with stack traces

### Graceful Degradation

#### User Confirmation on Critical Issues
```
If has_coherence_critical or has_regression_critical:
    Display: "⚠ CRITICAL issues detected in quality checks!"
    Prompt: "Do you want to write the results anyway?"

    If YES:
        Continue with export (user override)
    If NO:
        Cancel export, preserve old files
        Exit gracefully
```

#### Thread Failure Handling
```python
try:
    result = future.result()
    output_inclusions.extend(result)
except Exception as exc:
    logging.critical(f"Critical error in worker: {exc}", exc_info=True)
    thread_pool.shutdown(wait=False, cancel_futures=True)
    raise  # Propagate to the main handler
```

#### Main Exception Handler
```python
if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        logging.critical(f"Script terminated prematurely: {e}", exc_info=True)
        print(f"Error: {e}")
    finally:
        if 'subtasks_thread_pool' in globals():
            subtasks_thread_pool.shutdown(wait=False, cancel_futures=True)
        input("Press Enter to exit...")
```

---

## Performance Metrics & Benchmarks

### Typical Execution Times

For a full Endobest dataset (1,200+ patients, 15+ organizations):

| Phase | Duration | Notes |
|-------|----------|-------|
| Login & Config | ~2-3 sec | Sequential |
| Fetch Counters (20 workers) | ~5-8 sec | Parallelized |
| Collect Inclusions (20 workers) | ~2-4 min | Includes API calls + processing |
| Quality Checks | ~10-15 sec | Loads files, compares data |
| Export to JSON | ~3-5 sec | File I/O |
| **Total** | **~2.5-5 min** | Depends on network, API performance |

### Network Optimization Impact

**With the old questionnaire fetching (N filtered calls per patient):**
- 1,200 patients × 15 questionnaires = 18,000 API calls
- Estimated: 15-30 minutes

**With the optimized single-call questionnaire fetching:**
- 1,200 patients × 1 call = 1,200 API calls
- Estimated: 2-5 minutes
- **Improvement: 3-6x faster**

---

## Configuration Files

### Excel Configuration File: `Endobest_Dashboard_Config.xlsx`

#### Sheet 1: Inclusions_Mapping (Field Mapping Definition)
Defines all fields to be extracted and their transformation rules.
See **DOCUMENTATION_11_FIELD_MAPPING.md** for a detailed guide.

#### Sheet 2: Regression_Check (Non-Regression Rules)
Defines data validation rules for detecting unexpected changes.
See **DOCUMENTATION_12_QUALITY_CHECKS.md** for a detailed guide.

---

## Summary

The **Endobest Dashboard** implements a sophisticated, production-grade data collection system with:

✅ **Flexible Configuration:** Zero-code field definitions via Excel
✅ **High Performance:** 4-5x faster via optimized API calls
✅ **Robust Resilience:** Automatic token refresh, retries, error recovery
✅ **Thread Safety:** Per-thread clients, synchronized shared state
✅ **Quality Assurance:** Coherence checks + config-driven regression testing
✅ **Comprehensive Logging:** Full audit trail in dashboard.log
✅ **User-Friendly:** Progress bars, interactive prompts, clear error messages

This architecture enables non-technical users to configure new data sources without code changes, while providing developers with extensible hooks for custom logic and quality validation.

---

**Document End**