# 📊 Endobest Clinical Research Dashboard - Architecture Summary

**Last Updated:** 2025-11-08
**Project Status:** Production Ready with Excel Export Feature
**Language:** Python 3.x

---

## 🎯 Executive Summary

The **Endobest Clinical Research Dashboard** is a production-grade, automated data collection and reporting system. It aggregates patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations. The system combines high-performance multithreading, comprehensive quality assurance, and fully externalized configuration, so non-technical users can manage complex data extraction workflows without modifying code.

### Core Value Propositions

✅ **100% Externalized Configuration** - All field definitions, quality rules, and export logic are defined in Excel
✅ **High-Performance Architecture** - 4-5x faster data collection via batched questionnaire API calls, plus parallel processing
✅ **Robust Resilience** - Automatic token refresh, retries, graceful degradation
✅ **Comprehensive Quality Assurance** - Coherence checks + config-driven regression testing
✅ **Multi-Format Export** - JSON + configurable Excel workbooks with data transformation
✅ **User-Friendly Interface** - Interactive prompts, progress tracking, clear error messages

---

## 📁 Project Structure

```
Endobest Dashboard/
├── 📜 MAIN SCRIPT
│   └── eb_dashboard.py (57.5 KB, 1,021 lines)
│       Core orchestrator for data collection, processing, and export
│
├── 🔧 UTILITY MODULES
│   ├── eb_dashboard_utils.py (6.4 KB, 184 lines)
│   │   Thread-safe HTTP clients, nested data navigation, config resolution
│   │
│   ├── eb_dashboard_quality_checks.py (58.5 KB, 1,266 lines)
│   │   Coherence checks, non-regression testing, data validation
│   │
│   └── eb_dashboard_excel_export.py (32 KB, ~1,000 lines)
│       Configuration-driven Excel workbook generation
│
├── 📚 DOCUMENTATION
│   ├── DOCUMENTATION_10_ARCHITECTURE.md (43.7 KB)
│   │   System design, data flow, API integration, multithreading
│   │
│   ├── DOCUMENTATION_11_FIELD_MAPPING.md (56.3 KB)
│   │   Field extraction logic, custom functions, transformations
│   │
│   ├── DOCUMENTATION_12_QUALITY_CHECKS.md (60.2 KB)
│   │   Quality assurance framework, regression rules, validation logic
│   │
│   ├── DOCUMENTATION_13_EXCEL_EXPORT.md (29.6 KB)
│   │   Excel generation architecture, data transformation pipeline
│   │
│   ├── DOCUMENTATION_98_USER_GUIDE.md (8.4 KB)
│   │   End-user instructions, quick start, troubleshooting
│   │
│   └── DOCUMENTATION_99_CONFIG_GUIDE.md (24.8 KB)
│       Administrator configuration reference
│
├── ⚙️ CONFIGURATION
│   └── config/
│       ├── Endobest_Dashboard_Config.xlsx (Configuration file)
│       │   Inclusions_Mapping
│       │   Organizations_Mapping
│       │   Excel_Workbooks
│       │   Excel_Sheets
│       │   Regression_Check
│       │
│       ├── eb_org_center_mapping.xlsx (Organization enrichment)
│       │
│       └── templates/
│           ├── Endobest_Template.xlsx
│           ├── Statistics_Template.xlsx
│           └── (Other Excel templates)
│
├── 📊 OUTPUT FILES
│   ├── endobest_inclusions.json (~6-7 MB, patient data)
│   ├── endobest_inclusions_old.json (backup)
│   ├── endobest_organizations.json (~17-20 KB, stats)
│   ├── endobest_organizations_old.json (backup)
│   ├── [Excel outputs] (*.xlsx, configurable)
│   └── dashboard.log (Execution log)
│
└── 🔨 EXECUTABLES
    ├── eb_dashboard.exe (16.5 MB, PyInstaller build)
    └── [Various .bat launch scripts]
```

---

## 🏗️ System Architecture Overview

### High-Level Component Diagram

```
┌─────────────────────────────────────────────────────────────────────┐
│ ENDOBEST DASHBOARD MAIN PROCESS │
│ eb_dashboard.py │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 1: INITIALIZATION & AUTHENTICATION │ │
│ │ ├─ User Login (IAM API) │ │
│ │ ├─ Token Exchange (RC-specific) │ │
│ │ ├─ Config Loading (Excel parsing & validation) │ │
│ │ └─ Thread Pool Setup (20 workers main, 40 subtasks) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 2: ORGANIZATION & COUNTERS RETRIEVAL │ │
│ │ ├─ Get All Organizations (getAllOrganizations API) │ │
│ │ ├─ Fetch Counters Parallelized (20 workers) │ │
│ │ ├─ Enrich with Center Mapping (optional) │ │
│ │ └─ Calculate Totals & Sort │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 3: PATIENT INCLUSION DATA COLLECTION │ │
│ │ Outer Loop: Organizations (20 parallel workers) │ │
│ │ ├─ For Each Organization: │ │
│ │ │ ├─ Get Inclusions List (POST /api/inclusions/search) │ │
│ │ │ └─ For Each Patient (Sequential): │ │
│ │ │ ├─ Fetch Clinical Record (API) │ │
│ │ │ ├─ Fetch All Questionnaires (Optimized: 1 call) │ │
│ │ │ ├─ Fetch Lab Requests (Async pool) │ │
│ │ │ ├─ Process Field Mappings (extraction + transform) │ │
│ │ │ └─ Update Progress Bars (thread-safe) │ │
│ │ │ │ │
│ │ │ Inner Async: Lab/Questionnaire Fetches (40 workers) │ │
│ │ │ (Non-blocking I/O during main processing) │ │
│ │ └─ Combine Inclusions from All Orgs │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 4: QUALITY ASSURANCE & VALIDATION │ │
│ │ ├─ Coherence Check (API stats vs actual data) │ │
│ │ │ └─ Compares counters with detailed records │ │
│ │ ├─ Non-Regression Check (config-driven) │ │
│ │ │ └─ Detects changes with severity levels │ │
│ │ └─ Critical Issue Handling (user confirmation if needed) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 5: EXPORT & PERSISTENCE │ │
│ │ ├─ Backup Old Files (if quality passed) │ │
│ │ ├─ Write JSON Outputs (endobest_inclusions.json, etc.) │ │
│ │ ├─ Export to Excel (if configured) │ │
│ │ │ ├─ Load Templates │ │
│ │ │ ├─ Apply Filters & Sorts │ │
│ │ │ ├─ Fill Data into Sheets │ │
│ │ │ ├─ Replace Values │ │
│ │ │ └─ Recalculate Formulas (win32com) │ │
│ │ └─ Display Summary & Elapsed Time │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ EXIT │
└─────────────────────────────────────────────────────────────────────┘

↓ EXTERNAL DEPENDENCIES ↓

┌─────────────────────────────────────────────────────────────────────┐
│ EXTERNAL APIS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 🔐 AUTHENTICATION (IAM) │
│ └─ api-auth.ziwig-connect.com │
│ ├─ POST /api/auth/ziwig-pro/login │
│ └─ POST /api/auth/refreshToken │
│ │
│ 🏥 RESEARCH CLINIC (RC) │
│ └─ api-hcp.ziwig-connect.com │
│ ├─ POST /api/auth/config-token │
│ ├─ GET /api/inclusions/getAllOrganizations │
│ ├─ POST /api/inclusions/inclusion-statistics │
│ ├─ POST /api/inclusions/search │
│ ├─ POST /api/records/byPatient │
│ └─ POST /api/surveys/filter/with-answers (optimized!) │
│ │
│ 🧪 LAB / DIAGNOSTICS (GDD) │
│ └─ api-lab.ziwig-connect.com │
│ └─ GET /api/requests/by-tube-id/{tubeId} │
│ │
│ 📝 EXCEL TEMPLATES │
│ └─ config/templates/ │
│ ├─ Endobest_Template.xlsx │
│ ├─ Statistics_Template.xlsx │
│ └─ (Custom templates) │
│ │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 🔌 Module Descriptions

### 1. **eb_dashboard.py** - Main Orchestrator (57.5 KB)

**Responsibility:** Complete data collection workflow, API coordination, multithreaded execution

**Structure (9 Blocks):**

| Block | Purpose | Key Functions |
|-------|---------|---------------|
| **1** | Configuration & Infrastructure | Constants, global vars, progress bar setup |
| **2** | Decorators & Resilience | `@api_call_with_retry`, retry logic |
| **3** | Authentication | `login()`, token exchange, IAM integration |
| **3B** | File Utilities | `load_json_file()` |
| **4** | Inclusions Mapping Config | `load_inclusions_mapping_config()`, validation |
| **5** | Data Search & Extraction | Questionnaire finding, field retrieval |
| **6** | Custom Functions | Business logic, calculated fields |
| **7** | Business API Calls | RC, GDD, organization endpoints |
| **7b** | Organization Center Mapping | `load_org_center_mapping()` |
| **8** | Processing Orchestration | `process_organization_patients()`, patient data processing |
| **9** | Main Execution | Entry point, quality checks, export |

**Key Technologies:**
- `httpx` - HTTP client (with thread-local instances)
- `openpyxl` - Excel parsing
- `concurrent.futures.ThreadPoolExecutor` - Parallel execution
- `tqdm` - Progress tracking
- `questionary` - Interactive prompts

---

### 2. **eb_dashboard_utils.py** - Utility Functions (6.4 KB)

**Responsibility:** Generic, reusable utility functions shared across modules

**Core Functions:**

```python
get_httpx_client()      # Thread-local HTTP client management
get_thread_position()   # Progress bar positioning
get_nested_value()      # JSON path navigation with wildcard support (*)
get_config_path()       # Config folder resolution (script vs PyInstaller)
get_old_filename()      # Backup filename generation
```

**Key Features:**
- Thread-safe HTTP client pooling
- Wildcard support in nested JSON paths (e.g., `["items", "*", "value"]`)
- Cross-platform path resolution

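
The wildcard navigation that `get_nested_value()` provides can be illustrated with the sketch below. It is not the module's code: the name `nested_get`, the list-of-keys path format, and the rule that `"*"` fans out over list items and drops misses are assumptions. The `"undefined"` default mirrors the field pipeline's convention for missing values.

```python
from typing import Any

UNDEFINED = "undefined"  # Marker the field pipeline uses for missing values


def nested_get(data: Any, path: list, default: Any = UNDEFINED) -> Any:
    """Follow `path` through nested dicts/lists; "*" fans out over a list (sketch)."""
    current = data
    for position, key in enumerate(path):
        if key == "*":
            if not isinstance(current, list):
                return default
            rest = path[position + 1:]
            # Resolve the remaining path on every item and drop the misses
            matches = [nested_get(item, rest, default) for item in current]
            return [m for m in matches if m != default]
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and isinstance(key, int) and -len(current) <= key < len(current):
            current = current[key]
        else:
            return default
    return current
```

For example, `nested_get({"items": [{"value": 1}, {"value": 2}, {}]}, ["items", "*", "value"])` returns `[1, 2]`. Nested wildcards return nested lists in this sketch; the real helper may flatten them instead.
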

---

### 3. **eb_dashboard_quality_checks.py** - QA & Validation (58.5 KB)

**Responsibility:** Quality assurance, data validation, regression checking

**Core Functions:**

| Function | Purpose |
|----------|---------|
| `load_regression_check_config()` | Load regression rules from Excel |
| `run_quality_checks()` | Orchestrate all QA checks |
| `coherence_check()` | Verify stats vs detailed data consistency |
| `non_regression_check()` | Config-driven change validation |
| `run_check_only_mode()` | Standalone validation mode |
| `backup_output_files()` | Create versioned backups |

**Quality Check Types:**

1. **Coherence Check**
   - Compares API-provided organization statistics vs. actual inclusion counts
   - Severity: Warning/Critical
   - Example: Total API count (145) vs. actual inclusions (143)

2. **Non-Regression Check**
   - Compares current vs. previous run data
   - Applies config-driven rules with transition patterns
   - Detects: new inclusions, deletions, field changes
   - Severity: Warning/Critical with exceptions

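
A minimal sketch of the coherence check follows. It assumes `patients_count` is the API counter compared against the number of inclusion records collected per `Organisation_Id`. The function name `check_coherence` and the `critical_gap` threshold that separates Warning from Critical are hypothetical, since the actual severity rules are not described here.

```python
from collections import Counter


def _severity(gap: int, critical_gap: int) -> str:
    return "Critical" if gap >= critical_gap else "Warning"


def check_coherence(organizations: list, inclusions: list, critical_gap: int = 5) -> list:
    """Compare per-organization API counters with the inclusion records collected.

    Returns (organization name, expected, found, severity) tuples; `critical_gap`
    is a hypothetical threshold separating Warning from Critical.
    """
    # Number of detailed records actually collected per organization id
    found_per_org = Counter(inc["Patient_Identification"]["Organisation_Id"] for inc in inclusions)
    issues = []
    for org in organizations:
        expected = org.get("patients_count", 0)
        found = found_per_org.get(org["id"], 0)
        if expected != found:
            issues.append((org["name"], expected, found, _severity(abs(expected - found), critical_gap)))
    # Global comparison, e.g. total API count (145) vs. actual inclusions (143)
    total_expected = sum(org.get("patients_count", 0) for org in organizations)
    if total_expected != len(inclusions):
        gap = abs(total_expected - len(inclusions))
        issues.append(("TOTAL", total_expected, len(inclusions), _severity(gap, critical_gap)))
    return issues
```

With the default threshold, the example above (145 vs. 143, a gap of 2) would be reported as a Warning.
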

---

### 4. **eb_dashboard_excel_export.py** - Excel Generation & Orchestration (38 KB, v1.1+)

**Responsibility:** Configuration-driven Excel workbook generation with data transformation + high-level orchestration

**Core Functions (Low-Level):**

| Function | Purpose |
|----------|---------|
| `load_excel_export_config()` | Load Excel_Workbooks + Excel_Sheets config |
| `validate_excel_config()` | Validate templates and named ranges |
| `export_to_excel()` | Main export orchestration (openpyxl + win32com) |
| `_apply_filter()` | AND-condition filtering |
| `_apply_sort()` | Multi-key sorting with datetime support |
| `_apply_value_replacement()` | Value transformation with strict type matching |
| `_handle_output_exists()` | File conflict resolution |
| `_recalculate_workbook()` | Formula recalculation via win32com |
| `_process_sheet()` | Sheet-specific data filling |

**High-Level Orchestration Functions (v1.1+):**

| Function | Purpose | Called From |
|----------|---------|-------------|
| `export_excel_only()` | Complete `--excel-only` mode | `main()` CLI detection |
| `run_normal_mode_export()` | Normal mode export phase | `main()` after JSON write |
| `prepare_excel_export()` | Preparation + validation | Both orchestration functions |
| `execute_excel_export()` | Execution with error handling | Both orchestration functions |
| `_load_json_file_internal()` | Safe JSON loading | `run_normal_mode_export()` |

**Data Transformation Pipeline:**
```
1. Load Configuration (Excel_Workbooks + Excel_Sheets)
2. For each workbook:
   a. Load template (openpyxl)
   b. For each sheet:
      - Apply filter (AND conditions)
      - Apply sort (multi-key)
      - Apply value replacement (strict type matching)
      - Fill data into cells/named ranges
   c. Handle file conflicts (Overwrite/Increment/Backup)
   d. Save workbook (openpyxl)
   e. Recalculate formulas (win32com - optional)
```
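
The filter, sort, and value-replacement steps can be sketched in plain Python over the inclusion records. The config shapes are hypothetical: conditions as `{"field", "equals"}` dicts with dotted paths, sort keys with optional `date_format` and `descending`, and replacements as `(old, new)` pairs. The sketch only illustrates AND filtering, stable multi-key sorting with date parsing, and strict type matching; it is not the actual `_apply_filter()` / `_apply_sort()` / `_apply_value_replacement()` code.

```python
from datetime import datetime


def _get(record: dict, path: str):
    """Walk a dotted path such as "Inclusion.Inclusion_Status"; None if missing."""
    for key in path.split("."):
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record


def apply_filter(records: list, conditions: list) -> list:
    """Keep only records that satisfy ALL conditions (AND logic)."""
    return [r for r in records if all(_get(r, c["field"]) == c["equals"] for c in conditions)]


def _sort_key(value, date_format):
    # Text dates such as "15/10/2024" are parsed so they sort chronologically
    if date_format and isinstance(value, str):
        try:
            value = datetime.strptime(value, date_format)
        except ValueError:
            value = None
    # Missing values group together and are never compared with real values
    return (value is not None, value)


def apply_sort(records: list, keys: list) -> list:
    """Multi-key sort: stable passes from the least to the most significant key."""
    result = list(records)
    for key in reversed(keys):
        result.sort(key=lambda r: _sort_key(_get(r, key["field"]), key.get("date_format")),
                    reverse=key.get("descending", False))
    return result


def apply_value_replacement(value, replacements: list):
    """Replace only when value AND type match, so True never matches 1."""
    for old, new in replacements:
        if type(value) is type(old) and value == old:
            return new
    return value
```

Sorting from the least significant key to the most significant one relies on Python's sort being stable, which also lets each key carry its own `descending` flag. Strict type matching means a JSON `true` is never replaced by a rule written for `1`.
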

**Orchestration Pattern (v1.1+):**

As of v1.1, the system delegates all export orchestration to dedicated functions, following the pattern established by `run_check_only_mode()` in `eb_dashboard_quality_checks.py`:

1. **`--excel-only` mode:** The main script calls a single function → `export_excel_only()` handles everything
2. **Normal mode export:** The main script calls a single function → `run_normal_mode_export()` handles everything

This keeps the main script focused on business logic while all export mechanics are encapsulated in the module.

---

## 🔄 Complete Data Collection Workflow

### Phase 1: Initialization (2-3 seconds)
1. User provides credentials (with defaults)
2. IAM Login: `POST /api/auth/ziwig-pro/login`
3. Token Exchange: `POST /api/auth/config-token`
4. Load configuration from `Endobest_Dashboard_Config.xlsx`
5. Validate field mappings and quality check rules
6. Set up thread pools (main: 20 workers, subtasks: 40 workers)

### Phase 2: Organization Retrieval (5-8 seconds)
1. Get all organizations: `GET /api/inclusions/getAllOrganizations`
2. Filter out excluded centers (listed in the `RC_ENDOBEST_EXCLUDED_CENTERS` constant)
3. Fetch counters in parallel (20 workers):
   - For each org: `POST /api/inclusions/inclusion-statistics`
   - Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
4. Optional: Enrich with center mapping (from `eb_org_center_mapping.xlsx`)
5. Calculate totals and sort

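
The counter phase maps naturally onto `concurrent.futures`. In this sketch, `fetch_statistics` is a hypothetical stand-in for the `inclusion-statistics` call, and the excluded ids would come from `RC_ENDOBEST_EXCLUDED_CENTERS`. The sort order (largest organizations first, then by name) is an assumption, since the document only says results are sorted.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_THREADS = 20
COUNTER_KEYS = ("patients_count", "preincluded_count", "included_count", "prematurely_terminated_count")


def collect_counters(organizations, fetch_statistics, excluded_ids, workers=MAX_THREADS):
    """Fetch counters for every non-excluded organization in parallel, then total and sort.

    `fetch_statistics(org)` is a hypothetical stand-in for
    POST /api/inclusions/inclusion-statistics and returns a dict of counters.
    """
    orgs = [org for org in organizations if org["id"] not in excluded_ids]
    rows = []
    with ThreadPoolExecutor(max_workers=max(1, min(workers, MAX_THREADS))) as pool:
        futures = {pool.submit(fetch_statistics, org): org for org in orgs}
        for future in as_completed(futures):
            org = futures[future]
            stats = future.result()  # Re-raises any exception from the worker thread
            rows.append({"id": org["id"], "name": org["name"],
                         **{key: stats.get(key, 0) for key in COUNTER_KEYS}})
    totals = {key: sum(row[key] for row in rows) for key in COUNTER_KEYS}
    # Assumed order: largest organizations first, ties broken by name
    rows.sort(key=lambda row: (-row["patients_count"], row["name"]))
    return rows, totals
```

Because `as_completed()` yields futures in completion order, the explicit sort at the end is what makes the output deterministic.
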

### Phase 3: Patient Data Collection (2-4 minutes)
**Nested Parallel Architecture:**

**Outer Loop (20 workers):** For each organization
- `POST /api/inclusions/search?limit=1000&page=1` → Get up to 1000 inclusions

**Middle Loop (Sequential):** For each patient
- Fetch clinical record: `POST /api/records/byPatient`
- Fetch questionnaires: `POST /api/surveys/filter/with-answers` (**optimized: 1 call**)
- Submit async lab request: `GET /api/requests/by-tube-id/{tubeId}` (in subtasks pool)

**Inner Loop (40 async workers):** Non-blocking lab/questionnaire processing
- Parallel fetches of lab requests while the organization's worker thread processes fields

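
The three loops can be sketched with two executors: an outer pool over organizations and a shared subtasks pool for lab requests. All names and parameters below are hypothetical stand-ins (for example, `process_organization` plays the role of `process_organization_patients()`); the point is the shape of the nesting, not the project's code.

```python
from concurrent.futures import ThreadPoolExecutor


def process_organization(org, list_inclusions, fetch_record, fetch_lab, subtasks):
    """Middle loop: the patients of one organization are handled one after another."""
    rows = []
    for inclusion in list_inclusions(org):
        # Start the lab call first so it runs while this thread fetches the record
        lab_future = subtasks.submit(fetch_lab, inclusion)
        row = {"patient": inclusion["id"], "record": fetch_record(inclusion)}
        row["lab"] = lab_future.result()  # Block only when the lab data is needed
        rows.append(row)
    return rows


def collect_all(orgs, list_inclusions, fetch_record, fetch_lab, workers=20, sub_workers=40):
    """Outer loop over organizations plus a shared pool for asynchronous subtasks."""
    with ThreadPoolExecutor(max_workers=sub_workers) as subtasks:
        with ThreadPoolExecutor(max_workers=workers) as outer:
            futures = [outer.submit(process_organization, org, list_inclusions,
                                    fetch_record, fetch_lab, subtasks) for org in orgs]
            # Keep organization order; each result() re-raises worker exceptions
            return [row for future in futures for row in future.result()]
```

Lab calls run on a separate pool on purpose: if they were submitted to the outer pool, every outer worker could end up blocked on `lab_future.result()` while the lab tasks wait for a free worker, which deadlocks. The subtasks pool is opened first so it shuts down last, after all organization workers are done with it.
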

**Field Processing (per patient):**
- For each field in configuration:
  1. Determine source (questionnaire, record, inclusion, request, calculated)
  2. Extract raw value (supports JSON paths with wildcards)
  3. Check field condition (optional)
  4. Apply post-processing transformations
  5. Format score dictionaries
  6. Store in nested output structure

### Phase 4: Quality Assurance (10-15 seconds)
1. **Coherence Check:** Compare API counters vs. actual data
2. **Non-Regression Check:** Compare current vs. previous run with config rules
3. **Critical Issue Handling:**
   - No critical issues → continue to export
   - Critical issues detected → prompt the user to confirm (override) before export

### Phase 5: Export & Persistence (JSON: 3-5 seconds; Excel: 5-15 seconds)

**Step 1: Backup & JSON Write**
1. Back up old files (if quality checks passed)
2. Write JSON outputs:
   - `endobest_inclusions.json` (6-7 MB)
   - `endobest_organizations.json` (17-20 KB)

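
A sketch of the backup-then-write step, assuming the `_old` naming shown in the project structure (`endobest_inclusions.json` → `endobest_inclusions_old.json`). The helper names are hypothetical, and writing to a temporary file before `os.replace()` is a technique added here for safety, not something the document states the project does.

```python
import json
import os
import shutil


def old_filename_for(path: str) -> str:
    """endobest_inclusions.json -> endobest_inclusions_old.json"""
    root, ext = os.path.splitext(path)
    return f"{root}_old{ext}"


def backup_and_write(path: str, data) -> None:
    """Keep the previous run as *_old.json, then write the new JSON output."""
    if os.path.exists(path):
        shutil.copy2(path, old_filename_for(path))  # Overwrites the previous backup
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    os.replace(tmp_path, path)  # Swap the finished file in with a single rename
```

Keeping exactly one `_old` copy is what the non-regression check needs: it always compares the current run with the immediately preceding one.
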

**Step 2: Excel Export (if configured)**
Delegated to the `run_normal_mode_export()` function, which handles:
1. Load JSONs from filesystem (ensures consistency)
2. Load Excel configuration
3. Validate templates and named ranges
4. For each configured workbook:
   - Load template file
   - Apply filter conditions (AND logic)
   - Apply multi-key sort
   - Apply value replacements (strict type matching)
   - Fill data into cells/named ranges
   - Handle file conflicts (Overwrite/Increment/Backup)
   - Save workbook
   - Recalculate formulas (optional, via win32com)
5. Display results and return status

**Step 3: Summary**
1. Display elapsed time
2. Report file locations
3. Note any warnings/errors during export

---

## ⚙️ Configuration System

### Three-Layer Configuration Architecture

#### Layer 1: Excel Configuration (`Endobest_Dashboard_Config.xlsx`)

**Sheet 1: Inclusions_Mapping** (Field Extraction)
- Define which patient fields to extract
- Specify sources (questionnaire, record, inclusion, request, calculated)
- Configure transformations (value labels, templates, conditions)
- 50+ fields typically configured

**Sheet 2: Organizations_Mapping** (Organization Fields)
- Define which organization fields to export
- Rarely modified

**Sheet 3: Excel_Workbooks** (Excel Export Metadata)
- Workbook names
- Template paths
- Output filenames (with template variables)
- File conflict handling strategy (Overwrite/Increment/Backup)

**Sheet 4: Excel_Sheets** (Sheet Configurations)
- Workbook name (reference to Excel_Workbooks)
- Sheet name (in template)
- Source type (Inclusions/Organizations/Variable)
- Target (cell or named range)
- Column mapping (JSON)
- Filter conditions (JSON with AND logic)
- Sort keys (JSON, multi-key with datetime support)
- Value replacements (JSON, strict type matching)

**Sheet 5: Regression_Check** (Quality Rules)
- Rule names
- Field selection pipeline (include/exclude patterns)
- Scope (all organizations or specific org list)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)

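
The transition-pattern idea behind Regression_Check rules can be sketched as follows. The rule shape (`name`, a `(group, field)` pair, `allowed_transitions` as glob patterns, `severity`) and the severities given to deleted and new inclusions are assumptions for illustration. The real sheet also supports include/exclude field selection and per-organization scope, which this sketch omits.

```python
from fnmatch import fnmatchcase


def _matches(pattern, value) -> bool:
    # Glob-style match on the string form of the value, so "*" matches anything
    return fnmatchcase(str(value), str(pattern))


def compare_runs(previous, current, rules, key="Patient_Id"):
    """Report deletions, new inclusions, and field changes not covered by an allowed transition."""
    prev = {p["Patient_Identification"][key]: p for p in previous}
    curr = {c["Patient_Identification"][key]: c for c in current}
    # Hypothetical severities for records that appear or disappear between runs
    issues = [("deleted", pid, None, "Critical") for pid in sorted(prev.keys() - curr.keys())]
    issues += [("new", pid, None, "Warning") for pid in sorted(curr.keys() - prev.keys())]
    for pid in sorted(prev.keys() & curr.keys()):
        for rule in rules:
            group, field = rule["field"]
            old = prev[pid].get(group, {}).get(field)
            new = curr[pid].get(group, {}).get(field)
            if old == new:
                continue
            if not any(_matches(a, old) and _matches(b, new) for a, b in rule["allowed_transitions"]):
                issues.append((rule["name"], pid, (old, new), rule["severity"]))
    return issues
```

A rule allowing `("preincluded", "included")` accepts that expected progression silently, while any other change of the same field is reported with the rule's severity.
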

#### Layer 2: Organization Mapping (`eb_org_center_mapping.xlsx`)
- Optional mapping file
- Sheet: `Org_Center_Mapping`
- Maps organization names to center identifiers
- Degrades gracefully if missing

#### Layer 3: Excel Templates (`config/templates/`)
- Excel workbook templates with:
  - Sheet definitions
  - Named ranges (for data fill targets)
  - Formula structures
  - Formatting and styles

### Configuration Constants (in code)

```python
# API Configuration
IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
GDD_URL = "https://api-lab.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"
RC_ENDOBEST_PROTOCOL_ID = "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e"

# Threading & Performance
MAX_THREADS = 20         # Main thread pool workers
ASYNC_THREADS = 40       # Subtasks thread pool workers
ERROR_MAX_RETRY = 10     # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5  # Seconds between retries

# Excluded Organizations
RC_ENDOBEST_EXCLUDED_CENTERS = ["e18e7487-...", "5582bd75-...", "e053512f-..."]
```

---

## 🔐 API Integration

### Authentication Flow

```
1. IAM Login
   POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login
   Request: {"username": "...", "password": "..."}
   Response: {"access_token": "jwt_master", "userId": "uuid"}

2. Token Exchange (RC-specific)
   POST https://api-hcp.ziwig-connect.com/api/auth/config-token
   Headers: Authorization: Bearer {master_token}
   Request: {"userId": "...", "clientId": "...", "userAgent": "..."}
   Response: {"access_token": "jwt_rc", "refresh_token": "refresh_token"}

3. Automatic Token Refresh (on 401)
   POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken
   Headers: Authorization: Bearer {current_token}
   Request: {"refresh_token": "..."}
   Response: {"access_token": "jwt_new", "refresh_token": "new_refresh"}
```
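
The first two steps of this flow can be sketched with the standard library's `urllib.request` (the project itself uses `httpx`). The builders only prepare requests from the URLs and JSON bodies listed above, and `send()` performs the network call. Which value goes into `clientId` is not stated here (possibly `RC_APP_ID`, but that is an assumption).

```python
import json
import urllib.request

IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"


def _json_post(url, payload, token=None):
    """Prepare (but do not send) a JSON POST request, with a Bearer token if given."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def login_request(username, password):
    """Step 1: IAM login; the response carries the master token and userId."""
    return _json_post(f"{IAM_URL}/api/auth/ziwig-pro/login",
                      {"username": username, "password": password})


def config_token_request(master_token, user_id, client_id, user_agent):
    """Step 2: exchange the master token for an RC-specific token pair."""
    return _json_post(f"{RC_URL}/api/auth/config-token",
                      {"userId": user_id, "clientId": client_id, "userAgent": user_agent},
                      token=master_token)


def send(request, timeout=30):
    """Send a prepared request and decode its JSON response (HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A login would then be `send(login_request(user, password))`, whose `access_token` and `userId` feed `config_token_request()`. Separating request building from sending keeps the payload logic testable without network access.
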

### Key API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/inclusions/getAllOrganizations` | GET | List all organizations |
| `/api/inclusions/inclusion-statistics` | POST | Get patient counts per org |
| `/api/inclusions/search` | POST | Get inclusions list for org (paginated) |
| `/api/records/byPatient` | POST | Get clinical record for patient |
| `/api/surveys/filter/with-answers` | POST | **OPTIMIZED:** Get all questionnaires for patient |
| `/api/requests/by-tube-id/{tubeId}` | GET | Get lab test results |

### Performance Optimization: Questionnaire Batching

**Problem:** Multiple API calls per patient (1 call per questionnaire × N patients = slow)

**Solution:** Single optimized call retrieves all questionnaires with answers

```
BEFORE (inefficient):
for qcm_id in questionnaire_ids:
    GET /api/surveys/{qcm_id}/answers?subject={patient_id}
# Result: N API calls per patient

AFTER (optimized):
POST /api/surveys/filter/with-answers
{
    "context": "clinic_research",
    "subject": patient_id
}
# Result: 1 API call per patient
# Impact: 4-5x performance improvement
```
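
Once the batched response is in memory, each questionnaire lookup (`q_id`, `q_name`, or `q_category` in the field mapping) becomes a local search instead of a network call. This sketch assumes the response is a list of dicts carrying `id`, `name`, and `category` keys; those key names and the helper names are hypothetical.

```python
def build_batch_request(patient_id):
    """Body of the single POST /api/surveys/filter/with-answers call."""
    return {"context": "clinic_research", "subject": patient_id}


def find_questionnaire(questionnaires, q_id=None, q_name=None, q_category=None):
    """Pick one questionnaire from the batched response, entirely in memory.

    Assumes each item carries "id", "name" and "category" keys (hypothetical).
    """
    for q in questionnaires:
        if q_id is not None and q.get("id") == q_id:
            return q
        if q_name is not None and q.get("name") == q_name:
            return q
        if q_category is not None and q.get("category") == q_category:
            return q
    return None  # The field pipeline would then treat the value as undefined
```

Whatever the number of mapped fields, each patient costs one request; the lookups above only scan a short in-memory list.
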

---

## ⚡ Multithreading & Performance Optimization

### Thread Pool Architecture

```
Main Application Thread
↓
┌─ Phase 2: Counter Fetching ──────────────────────────┐
│ ThreadPoolExecutor(max_workers=min(user_input, 20)) │
│ ├─ Task 1: Get counters for Org 1 │
│ ├─ Task 2: Get counters for Org 2 │
│ └─ Task N: Get counters for Org N │
│ [Results gathered via as_completed(), wrapped in tqdm] │
└──────────────────────────────────────────────────────┘
↓
┌─ Phase 3: Inclusion Data Collection (Nested) ────────┐
│ Outer: ThreadPoolExecutor(max_workers=user_input) │
│ │
│ For Org 1: │
│ │ Inner: ThreadPoolExecutor(max_workers=40) │
│ │ ├─ Patient 1: Async lab/questionnaire fetch │
│ │ ├─ Patient 2: Async lab/questionnaire fetch │
│ │ └─ Patient N: Async lab/questionnaire fetch │
│ │ [Outer results gathered via as_completed()] │
│ │ │
│ For Org 2: │
│ │ [Similar parallel processing] │
│ │ │
│ For Org N: │
│ │ [Similar parallel processing] │
└──────────────────────────────────────────────────────┘
```

### Performance Optimizations

1. **Thread-Local HTTP Clients**
   - Each thread maintains its own `httpx.Client`
   - Avoids connection conflicts
   - Implementation via `get_httpx_client()`

2. **Nested Parallelization**
   - Main pool: Organizations (20 workers)
   - Subtasks pool: Lab requests (40 workers)
   - Non-blocking I/O during processing

3. **Questionnaire Batching** (4-5x improvement)
   - Single call retrieves all questionnaires + answers
   - Eliminates N filtered calls per patient

4. **Configurable Worker Threads**
   - User input selection (1-20 workers)
   - Tunable for network bandwidth and API rate limits

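
Thread-local clients can be built with `threading.local()`. In this sketch, `get_client()` is a hypothetical analogue of `get_httpx_client()`: it takes a zero-argument factory (in the project this would presumably be something like `httpx.Client`) so the pattern can be demonstrated without network access.

```python
import threading

_thread_local = threading.local()


def get_client(factory):
    """Return this thread's client, creating it with `factory()` on first use."""
    client = getattr(_thread_local, "client", None)
    if client is None:
        client = factory()  # e.g. httpx.Client in the real module (assumption)
        _thread_local.client = client
    return client
```

Each worker reuses its own client, and therefore its own connection pool, across calls. Because a client is never shared between threads, no lock is needed around it.
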

### Progress Tracking (Multi-Level)

```
Overall Progress [████████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [██████████░░░░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██████░░░░░░░░░░░░░░░░░░░] 42/110
3/15 - Center 3 [████░░░░░░░░░░░░░░░░░░░░░] 28/85
```

**Thread-Safe Updates:**
```python
with _global_pbar_lock:
    if global_pbar:
        global_pbar.update(1)
```

---

## 🛡️ Error Handling & Resilience

### Token Management Strategy

1. **Automatic Token Refresh on 401**
   - Triggered by `@api_call_with_retry` decorator
   - Thread-safe via `_token_refresh_lock`

2. **Retry Mechanism**
   - Max retries: 10 attempts
   - Delay between retries: 0.5 seconds (fixed)
   - Decorator: `@api_call_with_retry`

3. **Thread-Safe Token Refresh**
   ```python
   import httpx
   from time import sleep

   def new_token():
       global access_token, refresh_token
       with _token_refresh_lock:  # Only one thread refreshes at a time
           for _ in range(ERROR_MAX_RETRY):
               try:
                   # POST /api/auth/refreshToken, then update the global tokens
                   response = get_httpx_client().post(f"{RC_URL}/api/auth/refreshToken", json={"refresh_token": refresh_token},
                                                      headers={"Authorization": f"Bearer {access_token}"})
                   response.raise_for_status()
                   tokens = response.json()
                   access_token, refresh_token = tokens["access_token"], tokens["refresh_token"]
                   return
               except httpx.HTTPError:  # Network or HTTP status error: wait, then retry
                   sleep(WAIT_BEFORE_RETRY)
   ```
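
The retry behavior described above (fixed delay, token refresh on 401) can be sketched as a decorator. Unlike the project's bare `@api_call_with_retry`, this hypothetical `retry_api_call` takes the refresh function as a parameter. `Unauthorized` is an assumed exception that an API helper would raise on HTTP 401, and `ConnectionError` / `TimeoutError` stand in for whatever transport errors the real HTTP client raises.

```python
import functools
import time

ERROR_MAX_RETRY = 10
WAIT_BEFORE_RETRY = 0.5


class Unauthorized(Exception):
    """Hypothetical exception an API helper raises on HTTP 401."""


def retry_api_call(refresh_token, retries=ERROR_MAX_RETRY, delay=WAIT_BEFORE_RETRY):
    """Retry with a fixed delay; on 401, refresh the token before the next attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except Unauthorized as exc:
                    last_error = exc
                    refresh_token()  # e.g. new_token(); the next attempt uses the new token
                except (ConnectionError, TimeoutError) as exc:
                    last_error = exc
                if attempt < retries - 1:
                    time.sleep(delay)  # Fixed spacing, no exponential backoff
            raise RuntimeError(f"{func.__name__} failed after {retries} attempts") from last_error
        return wrapper
    return decorator
```

Raising a single error after the last attempt (chained to the original cause) lets the calling thread fail loudly instead of returning partial data silently.
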

### Exception Handling Categories

| Category | Examples | Handling |
|----------|----------|----------|
| **API Errors** | Network timeouts, HTTP errors | Retry up to 10 attempts with a fixed 0.5 s delay |
| **File I/O Errors** | Missing config, permission denied | Graceful error + exit |
| **Validation Errors** | Invalid config, incoherent data | Log warning + prompt user |
| **Thread Errors** | Worker thread failures | Shut down gracefully + propagate |

### Graceful Degradation

1. **Missing Organization Mapping:** Skip silently, use fallback (org name)
2. **Critical Quality Issues:** Prompt user for confirmation before export
3. **Thread Failure:** Shut down all workers gracefully, preserve partial results
4. **Invalid Configuration:** Clear error messages with remediation suggestions

---

## 📊 Data Output Structure

### JSON Output: `endobest_inclusions.json`

```json
[
  {
    "Patient_Identification": {
      "Organisation_Id": "uuid",
      "Organisation_Name": "Hospital Name",
      "Center_Name": "HOSP-A",
      "Patient_Id": "internal_id",
      "Pseudo": "ENDO-001",
      "Patient_Name": "Doe, John",
      "Patient_Birthday": "1975-05-15",
      "Patient_Age": 49
    },
    "Inclusion": {
      "Consent_Signed": true,
      "Inclusion_Date": "15/10/2024",
      "Inclusion_Status": "incluse",
      "isPrematurelyTerminated": false
    },
    "Extended_Fields": {
      "Custom_Field_1": "value",
      "Custom_Field_2": 42,
      "Composite_Score": "8/10"
    },
    "Endotest": {
      "Request_Sent": true,
      "Diagnostic_Status": "Completed"
    }
  }
]
```

### JSON Output: `endobest_organizations.json`

```json
[
  {
    "id": "org-uuid",
    "name": "Hospital A",
    "Center_Name": "HOSP-A",
    "patients_count": 45,
    "preincluded_count": 8,
    "included_count": 35,
    "prematurely_terminated_count": 2
  }
]
```

---

## 🚀 Execution Modes

### Mode 1: Normal (Full Collection)
```bash
python eb_dashboard.py
```
- Authenticates
- Collects from APIs
- Runs quality checks
- Exports JSON + Excel
- Duration: 2.5-5 minutes (typical)

### Mode 2: Excel-Only (Fast Export)
```bash
python eb_dashboard.py --excel-only
```
- Skips data collection
- Uses existing JSON files
- Regenerates Excel workbooks
- Duration: 5-15 seconds
- Use case: Reconfigure reports, test templates

### Mode 3: Check-Only (Validation Only)
```bash
python eb_dashboard.py --check-only
```
- Loads existing JSON
- Runs quality checks
- No export
- Duration: 5-10 seconds
- Use case: Verify data before distribution

### Mode 4: Debug (Verbose Output)
```bash
python eb_dashboard.py --debug
```
- Executes normal mode
- Enables detailed logging
- Shows field-by-field changes
- Check `dashboard.log` for details

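
A sketch of how the documented flags could be parsed with `argparse`. Making `--excel-only` and `--check-only` mutually exclusive, and allowing `--debug` alongside either, are assumptions. The returned mode would then route to `export_excel_only()`, `run_check_only_mode()`, or the full collection followed by `run_normal_mode_export()`.

```python
import argparse


def parse_mode(argv=None):
    """Map the documented command-line flags to (mode, debug)."""
    parser = argparse.ArgumentParser(description="Endobest Clinical Research Dashboard")
    modes = parser.add_mutually_exclusive_group()  # Assumption: one alternate mode at a time
    modes.add_argument("--excel-only", action="store_true",
                       help="regenerate Excel workbooks from the existing JSON files")
    modes.add_argument("--check-only", action="store_true",
                       help="run quality checks on the existing JSON files, no export")
    parser.add_argument("--debug", action="store_true",
                        help="detailed logging in dashboard.log")
    args = parser.parse_args(argv)
    if args.excel_only:
        return "excel-only", args.debug
    if args.check_only:
        return "check-only", args.debug
    return "normal", args.debug
```

Keeping flag parsing in one small function makes the dispatch in `main()` a simple branch on the returned mode.
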

---

## 📈 Performance Metrics & Benchmarks

### Typical Execution Times (Full Dataset: 1,200+ patients, 15+ organizations)

| Phase | Duration | Notes |
|-------|----------|-------|
| **Login & Config** | 2-3 sec | Sequential, network-dependent |
| **Fetch Counters** | 5-8 sec | 20 workers, parallelized |
| **Collect Inclusions** | 2-4 min | Includes API calls + field processing |
| **Quality Checks** | 10-15 sec | File loads, data comparison |
| **Export to JSON** | 3-5 sec | File I/O |
| **Export to Excel** | 5-15 sec | Template processing + fill |
| **TOTAL** | **~2.5-5 min** | Depends on network and API performance |

### Network Optimization Impact

**With old questionnaire approach (N filtered calls per patient):**
- 1,200 patients × 15 questionnaires = 18,000 API calls
- Estimated: 15-30 minutes

**With optimized single-call questionnaire:**
- 1,200 patients × 1 call = 1,200 API calls (15x fewer)
- Estimated: 2-5 minutes
- **Improvement: 3-6x faster** ✅

---

## 🔍 Field Extraction & Processing Logic

### Complete Field Processing Pipeline

```
For each field in INCLUSIONS_MAPPING_CONFIG:
│
├─ Step 1: Determine Source Type
│  ├─ q_id / q_name / q_category → Find questionnaire
│  ├─ record → Use clinical record
│  ├─ inclusion → Use patient inclusion data
│  ├─ request → Use lab request data
│  └─ calculated → Execute custom function
│
├─ Step 2: Extract Raw Value
│  ├─ Navigate JSON using field_path
│  ├─ Supports wildcard (*) for list traversal
│  └─ Return value or "undefined"
│
├─ Step 3: Check Field Condition (optional)
│  ├─ If condition undefined → Set to "undefined"
│  ├─ If condition not boolean → Error flag
│  ├─ If condition false → Set to "N/A"
│  └─ If condition true → Continue
│
├─ Step 4: Apply Post-Processing Transformations
│  ├─ true_if_any: Convert to boolean
│  ├─ value_labels: Map to localized text
│  ├─ field_template: Apply formatting
│  └─ List joining: Flatten arrays with pipe delimiter
│
├─ Step 5: Format Score Dictionaries
│  ├─ If {total, max} → Format as "total/max"
│  └─ Otherwise → Keep as-is
│
└─ Store: output_inclusion[field_group][field_name] = final_value
```
|
||||
|
||||
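Steps 2-5 of the pipeline above can be sketched in Python as follows. This is a minimal illustration, not the code in `eb_dashboard.py`: it assumes `field_path` is a list of keys, uses the string `"undefined"` as the missing-value marker, and raises `ValueError` where the script sets an error flag. `get_nested`, `apply_condition`, `finalize` and `process_field` are hypothetical names, and from Steps 4-5 only list joining and score formatting are shown.

```python
UNDEFINED = "undefined"


def get_nested(data, path):
    """Step 2: follow `path` (a list of keys) through nested dicts/lists.

    A "*" segment fans out over every list element and returns the list of
    defined results; a missing key yields "undefined".
    """
    if not path:
        return data
    head, rest = path[0], path[1:]
    if head == "*":
        if not isinstance(data, list):
            return UNDEFINED
        results = [get_nested(item, rest) for item in data]
        return [r for r in results if r != UNDEFINED]
    if isinstance(data, dict) and head in data:
        return get_nested(data[head], rest)
    return UNDEFINED


def apply_condition(value, condition=None):
    """Step 3: gate the value on an optional boolean condition."""
    if condition is None:  # no condition configured for this field
        return value
    if condition == UNDEFINED:
        return UNDEFINED
    if not isinstance(condition, bool):
        # Stand-in for the script's error flag.
        raise ValueError(f"condition is not boolean: {condition!r}")
    return value if condition else "N/A"


def finalize(value):
    """Steps 4-5 (subset): pipe-join lists, format {total, max} scores."""
    if isinstance(value, list):
        return "|".join(str(v) for v in value)
    if isinstance(value, dict) and "total" in value and "max" in value:
        return f"{value['total']}/{value['max']}"
    return value


def process_field(source, field_path, condition=None):
    """Run one field through extraction, condition and formatting."""
    return finalize(apply_condition(get_nested(source, field_path), condition))
```

For example, `process_field({"items": [{"c": "x"}, {"c": "y"}]}, ["items", "*", "c"])` yields `"x|y"`.
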
### Custom Functions for Calculated Fields

| Function | Purpose | Syntax |
|----------|---------|--------|
| `search_in_fields_using_regex` | Search multiple fields for a pattern | `["search_in_fields_using_regex", "pattern", "field1", "field2"]` |
| `extract_parentheses_content` | Extract the text within parentheses | `["extract_parentheses_content", "field_name"]` |
| `append_terminated_suffix` | Add a suffix if the patient is terminated | `["append_terminated_suffix", "status_field", "is_terminated_field"]` |
| `if_then_else` | Unified conditional with 8 operators | `["if_then_else", "operator", arg1, arg2_optional, true_result, false_result]` |

**`if_then_else` operators:**

- `is_true` / `is_false` - Boolean field test
- `is_defined` / `is_undefined` - Existence test
- `all_true` / `all_defined` - Multiple-field test
- `==` / `!=` - Value comparison

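A possible evaluator for `if_then_else` is sketched below. The argument semantics are assumptions, not taken from the script: `arg1` is a field name (a list of field names for `all_true` / `all_defined`), `arg2_optional` is a literal that only `==` / `!=` use, and the two results are returned as literals. The `fields` dict of already-computed values is hypothetical.

```python
UNDEFINED = "undefined"


def if_then_else(fields, operator, *args):
    """Evaluate the tail of ["if_then_else", operator, arg1, (arg2), t, f].

    `fields` maps field names to already-computed values.
    """
    *operands, true_result, false_result = args
    arg1 = operands[0]

    def value(name):
        return fields.get(name, UNDEFINED)

    if operator == "is_true":
        cond = value(arg1) is True
    elif operator == "is_false":
        cond = value(arg1) is False
    elif operator == "is_defined":
        cond = value(arg1) != UNDEFINED
    elif operator == "is_undefined":
        cond = value(arg1) == UNDEFINED
    elif operator == "all_true":
        cond = all(value(n) is True for n in arg1)
    elif operator == "all_defined":
        cond = all(value(n) != UNDEFINED for n in arg1)
    elif operator in ("==", "!="):
        equal = value(arg1) == operands[1]  # arg2 is the comparison literal
        cond = equal if operator == "==" else not equal
    else:
        raise ValueError(f"unknown operator: {operator}")
    return true_result if cond else false_result
```

A config entry such as `["if_then_else", "==", "age", 42, "match", "no match"]` would be evaluated as `if_then_else(fields, *spec[1:])`.
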

---

## ✅ Quality Assurance Framework

### Coherence Check

**Purpose:** Verify that the statistics reported by the API match the data actually collected.

**Logic:**

```
For each organization:
    API_Count    = statistic.total
    Actual_Count = count of inclusion records

    if API_Count != Actual_Count:
        Report discrepancy with severity
        ├─ Within ±10%: Warning
        └─ Beyond ±10%: Critical
```

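The severity rule above could be expressed as follows. The document does not say what the 10% is measured against, so this sketch assumes the API count as the denominator and treats any records against a zero statistic as Critical. `coherence_severity` and `check_coherence` are hypothetical names.

```python
def coherence_severity(api_count, actual_count, threshold=0.10):
    """Return None if counts agree, else "Warning" or "Critical".

    The relative gap is measured against the API count (an assumption).
    """
    if api_count == actual_count:
        return None
    if api_count == 0:
        return "Critical"  # records exist although the statistic says zero
    gap = abs(actual_count - api_count) / api_count
    return "Warning" if gap <= threshold else "Critical"


def check_coherence(api_stats, inclusions_by_org):
    """Compare each organization's statistic.total with collected records."""
    report = {}
    for org, api_count in api_stats.items():
        actual = len(inclusions_by_org.get(org, []))
        severity = coherence_severity(api_count, actual)
        if severity:
            report[org] = (api_count, actual, severity)
    return report
```
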
### Non-Regression Check

**Purpose:** Detect unexpected changes between data runs.

**Configuration-Driven Rules:**

- Field selection pipeline (include/exclude patterns)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
- Exception handling (exclude specific organizations)

**Logic:**

```
Load previous inclusion data (_old file)

For each rule:
├─ Build candidate fields via pipeline
├─ Determine key field for matching
└─ For each inclusion:
   ├─ Find matching old inclusion by key
   ├─ Check for unexpected transitions
   ├─ Apply exceptions
   └─ Report violations
```

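The rule loop above can be sketched as below, under simplifying assumptions: inclusions are flat dicts (the real output nests fields by group), include/exclude patterns are shell-style wildcards matched with `fnmatchcase`, and expected transitions are explicit `(old, new)` pairs. `RegressionRule`, `select_fields` and `check_rule` are hypothetical and do not mirror the actual columns of the `Regression_Check` table.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


def select_fields(all_fields, include, exclude):
    """Keep fields matching any include pattern and no exclude pattern."""
    return [
        name for name in all_fields
        if any(fnmatchcase(name, p) for p in include)
        and not any(fnmatchcase(name, p) for p in exclude)
    ]


@dataclass
class RegressionRule:
    """Hypothetical in-memory form of one non-regression rule."""
    fields: list                                     # candidate fields
    key_field: str = "patient_id"                    # matches old/new records
    allowed: set = field(default_factory=set)        # expected (old, new) pairs
    excluded_orgs: set = field(default_factory=set)  # exceptions
    severity: str = "Warning"


def check_rule(rule, old_inclusions, new_inclusions):
    """Return (key, field, old, new, severity) for each unexpected change."""
    old_by_key = {inc[rule.key_field]: inc for inc in old_inclusions}
    violations = []
    for inc in new_inclusions:
        old = old_by_key.get(inc[rule.key_field])
        if old is None or inc.get("organization") in rule.excluded_orgs:
            continue  # new inclusion, or organization listed as an exception
        for name in rule.fields:
            before, after = old.get(name), inc.get(name)
            if before != after and (before, after) not in rule.allowed:
                violations.append(
                    (inc[rule.key_field], name, before, after, rule.severity))
    return violations
```
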

---

## 📋 Documentation Structure

The system includes comprehensive documentation:

| Document | Size | Content |
|----------|------|---------|
| **DOCUMENTATION_10_ARCHITECTURE.md** | 43.7 KB | System design, workflow, APIs, multithreading |
| **DOCUMENTATION_11_FIELD_MAPPING.md** | 56.3 KB | Field extraction logic, custom functions, examples |
| **DOCUMENTATION_12_QUALITY_CHECKS.md** | 60.2 KB | QA framework, regression rules, configuration |
| **DOCUMENTATION_13_EXCEL_EXPORT.md** | 29.6 KB | Excel generation, data transformation, config |
| **DOCUMENTATION_98_USER_GUIDE.md** | 8.4 KB | End-user instructions, troubleshooting, FAQ |
| **DOCUMENTATION_99_CONFIG_GUIDE.md** | 24.8 KB | Administrator reference, Excel tables, examples |


---

## 🔧 Key Technical Features

### Thread Safety

- Per-thread HTTP clients (no connection conflicts)
- Synchronized access to global state via locks
- Thread-safe progress bar updates

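One common way to give each worker its own HTTP client is `threading.local()`, with a lock around any shared state. The sketch below assumes nothing about the script's actual helpers in `eb_dashboard_utils.py`; `PerThreadClients` is a hypothetical name, and `factory` would be something like `lambda: httpx.Client(timeout=30)` in practice (a plain `object` stands in for the demo).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadClients:
    """Hand each worker thread its own client built by `factory`."""

    def __init__(self, factory):
        self._factory = factory          # e.g. lambda: httpx.Client(timeout=30)
        self._local = threading.local()  # storage private to each thread
        self._lock = threading.Lock()    # guards the shared counter below
        self.created = 0

    def get(self):
        client = getattr(self._local, "client", None)
        if client is None:
            client = self._factory()
            self._local.client = client
            with self._lock:
                self.created += 1
        return client


if __name__ == "__main__":
    clients = PerThreadClients(object)  # stand-in factory for the demo
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(lambda _: clients.get(), range(20)))
    print(clients.created)  # at most 4: one client per worker thread
```
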
### Error Recovery

- Automatic token refresh on 401 errors
- Exponential backoff retry logic (configurable)
- Graceful degradation for optional features
- User confirmation on critical issues

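Token refresh and exponential backoff can be combined in a single retry wrapper. This is a minimal sketch, not the script's implementation: `send` and `refresh_token` are hypothetical callables, `AuthError` is a hypothetical exception standing in for an HTTP 401, and `sleep` is injectable so the delays can be observed.

```python
import time


class AuthError(Exception):
    """Hypothetical exception a request helper raises on HTTP 401."""


def call_with_retry(send, refresh_token, max_retries=3, base_delay=1.0,
                    sleep=time.sleep):
    """Call `send()`, refreshing the token on 401 and backing off otherwise.

    Non-401 failures wait base_delay * 2**attempt before the next try,
    where `attempt` counts from 0 across all tries. The last error is
    re-raised once `max_retries` retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except AuthError:
            if attempt == max_retries:
                raise
            refresh_token()  # expired token: renew it, then retry at once
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff
```
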
### Configuration Flexibility

- 100% externalized to Excel (zero code changes)
- Supports multiple data sources
- Custom business logic functions
- Field dependencies and conditions
- Value transformations and templates

### Performance

- Optimized API calls (4-5x improvement)
- Parallel processing (20+ workers)
- Async I/O operations
- Configurable thread pools

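A configurable thread pool that tolerates per-item failures can be built on `concurrent.futures`. The sketch below assumes a hypothetical `fetch_counter(org_id)` callable wrapping the real API request; failures are collected per organization rather than aborting the whole run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all_counters(org_ids, fetch_counter, max_workers=20):
    """Fetch one counter per organization in parallel.

    Returns (results, errors), both keyed by organization id.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_counter, org): org for org in org_ids}
        for future in as_completed(futures):
            org = futures[future]
            try:
                results[org] = future.result()
            except Exception as exc:  # keep going; report the failure later
                errors[org] = exc
    return results, errors
```

`max_workers=20` matches the worker count quoted for the counter phase; the default is otherwise an arbitrary choice of this sketch.
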
### Data Quality

- Coherence checking (stats vs actual data)
- Non-regression testing (config-driven)
- Comprehensive validation
- Audit trail logging

---

## 📦 Dependencies

### Core Libraries

- **httpx** - HTTP client with connection pooling
- **openpyxl** - Excel file reading/writing
- **questionary** - Interactive CLI prompts
- **tqdm** - Progress bars
- **rich** - Rich text formatting
- **pywin32** - Windows COM automation (optional, for formula recalculation)
- **pytz** - Timezone support (optional)

### Python Version

- Python 3.7+

### External Services

- Ziwig IAM API
- Ziwig Research Clinic (RC) API
- Ziwig Lab (GDD) API

---

## 🎓 Usage Patterns

### For End Users

1. Configure fields in Excel (no code needed)
2. Run: `python eb_dashboard.py`
3. Review the results in JSON or Excel

### For Administrators

1. Add new fields to `Inclusions_Mapping`
2. Define quality rules in `Regression_Check`
3. Configure the Excel export in `Excel_Workbooks` + `Excel_Sheets`
4. Re-run the script: it picks up the updated configuration automatically

### For Developers

1. Add the custom function to Block 6 of `eb_dashboard.py`
2. Reference it in the field configuration (`Inclusions_Mapping`)
3. Use it via `"source_id": "function_name"`
4. All other changes are configuration-only; no code edits are needed

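Name-based dispatch is what lets a configuration entry refer to a Python function. One way to wire it is a registry dict filled by a decorator, sketched below; the actual mechanism in Block 6 of `eb_dashboard.py` may differ. `register`, `CUSTOM_FUNCTIONS` and `evaluate_calculated` are hypothetical names, and this version of `extract_parentheses_content` is an illustrative guess at that function's behavior.

```python
import re

UNDEFINED = "undefined"
CUSTOM_FUNCTIONS = {}  # function name -> callable


def register(name):
    """Decorator that makes a function callable by name from the config."""
    def decorator(func):
        CUSTOM_FUNCTIONS[name] = func
        return func
    return decorator


@register("extract_parentheses_content")
def extract_parentheses_content(fields, field_name):
    """Return the text inside the first pair of parentheses, if any."""
    match = re.search(r"\(([^)]*)\)", str(fields.get(field_name, "")))
    return match.group(1) if match else UNDEFINED


def evaluate_calculated(spec, fields):
    """Dispatch a spec like ["extract_parentheses_content", "field_name"]."""
    name, *args = spec
    if name not in CUSTOM_FUNCTIONS:
        raise ValueError(f"unknown custom function: {name}")
    return CUSTOM_FUNCTIONS[name](fields, *args)
```

With this pattern, adding a function in code and referencing its name in the configuration are the only two steps; the dispatcher itself never changes.
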

---

## 🎯 Summary

The **Endobest Clinical Research Dashboard** is a mature, production-ready system that combines:

✅ **Architectural Excellence** - Clean modular design with separation of concerns
✅ **User-Centric Configuration** - 100% externalized, no code changes needed
✅ **Performance Optimization** - 4-5x faster via API and threading improvements
✅ **Robust Resilience** - Comprehensive error handling, automatic recovery, graceful degradation
✅ **Quality Assurance** - Multi-level validation, coherence checks, regression testing
✅ **Comprehensive Documentation** - 250+ KB of technical and user guides
✅ **Maintainability** - Clear code structure, extensive logging, audit trails

The system enables non-technical users to configure complex data extraction and reporting workflows while maintaining enterprise-grade reliability and performance standards.

---

**Document Version:** 1.0
**Last Updated:** 2025-11-08
**Status:** ✅ Complete & Production Ready