# 📊 Endobest Clinical Research Dashboard - Architecture Summary
**Last Updated:** 2025-11-08
**Project Status:** Production Ready with Excel Export Feature
**Language:** Python 3.x
---
## 🎯 Executive Summary
The **Endobest Clinical Research Dashboard** is a sophisticated, production-grade automated data collection and reporting system designed to aggregate patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations. The system combines high-performance multithreading, comprehensive quality assurance, and fully externalized configuration to enable non-technical users to manage complex data extraction workflows without code modifications.
### Core Value Propositions
- **100% Externalized Configuration** - All field definitions, quality rules, and export logic defined in Excel
- **High-Performance Architecture** - 4-5x faster via optimized API calls and parallel processing
- **Robust Resilience** - Automatic token refresh, retries, graceful degradation
- **Comprehensive Quality Assurance** - Coherence checks + config-driven regression testing
- **Multi-Format Export** - JSON + configurable Excel workbooks with data transformation
- **User-Friendly Interface** - Interactive prompts, progress tracking, clear error messages
---
## 📁 Project Structure
```
Endobest Dashboard/
├── 📜 MAIN SCRIPT
│ └── eb_dashboard.py (57.5 KB, 1,021 lines)
│ Core orchestrator for data collection, processing, and export
├── 🔧 UTILITY MODULES
│ ├── eb_dashboard_utils.py (6.4 KB, 184 lines)
│ │ Thread-safe HTTP clients, nested data navigation, config resolution
│ │
│ ├── eb_dashboard_quality_checks.py (58.5 KB, 1,266 lines)
│ │ Coherence checks, non-regression testing, data validation
│ │
│ └── eb_dashboard_excel_export.py (~38 KB, ~1,000 lines)
│ Configuration-driven Excel workbook generation
├── 📚 DOCUMENTATION
│ ├── DOCUMENTATION_10_ARCHITECTURE.md (43.7 KB)
│ │ System design, data flow, API integration, multithreading
│ │
│ ├── DOCUMENTATION_11_FIELD_MAPPING.md (56.3 KB)
│ │ Field extraction logic, custom functions, transformations
│ │
│ ├── DOCUMENTATION_12_QUALITY_CHECKS.md (60.2 KB)
│ │ Quality assurance framework, regression rules, validation logic
│ │
│ ├── DOCUMENTATION_13_EXCEL_EXPORT.md (29.6 KB)
│ │ Excel generation architecture, data transformation pipeline
│ │
│ ├── DOCUMENTATION_98_USER_GUIDE.md (8.4 KB)
│ │ End-user instructions, quick start, troubleshooting
│ │
│ └── DOCUMENTATION_99_CONFIG_GUIDE.md (24.8 KB)
│ Administrator configuration reference
├── ⚙️ CONFIGURATION
│ └── config/
│ ├── Endobest_Dashboard_Config.xlsx (Configuration file)
│ │ Inclusions_Mapping
│ │ Organizations_Mapping
│ │ Excel_Workbooks
│ │ Excel_Sheets
│ │ Regression_Check
│ │
│ ├── eb_org_center_mapping.xlsx (Organization enrichment)
│ │
│ └── templates/
│ ├── Endobest_Template.xlsx
│ ├── Statistics_Template.xlsx
│ └── (Other Excel templates)
├── 📊 OUTPUT FILES
│ ├── endobest_inclusions.json (~6-7 MB, patient data)
│ ├── endobest_inclusions_old.json (backup)
│ ├── endobest_organizations.json (~17-20 KB, stats)
│ ├── endobest_organizations_old.json (backup)
│ ├── [Excel outputs] (*.xlsx, configurable)
│ └── dashboard.log (Execution log)
└── 🔨 EXECUTABLES
├── eb_dashboard.exe (16.5 MB, PyInstaller build)
└── [Various .bat launch scripts]
```
---
## 🏗️ System Architecture Overview
### High-Level Component Diagram
```
┌─────────────────────────────────────────────────────────────────────┐
│ ENDOBEST DASHBOARD MAIN PROCESS │
│ eb_dashboard.py │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 1: INITIALIZATION & AUTHENTICATION │ │
│ │ ├─ User Login (IAM API) │ │
│ │ ├─ Token Exchange (RC-specific) │ │
│ │ ├─ Config Loading (Excel parsing & validation) │ │
│ │ └─ Thread Pool Setup (20 workers main, 40 subtasks) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 2: ORGANIZATION & COUNTERS RETRIEVAL │ │
│ │ ├─ Get All Organizations (getAllOrganizations API) │ │
│ │ ├─ Fetch Counters Parallelized (20 workers) │ │
│ │ ├─ Enrich with Center Mapping (optional) │ │
│ │ └─ Calculate Totals & Sort │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 3: PATIENT INCLUSION DATA COLLECTION │ │
│ │ Outer Loop: Organizations (20 parallel workers) │ │
│ │ ├─ For Each Organization: │ │
│ │ │ ├─ Get Inclusions List (POST /api/inclusions/search) │ │
│ │ │ └─ For Each Patient (Sequential): │ │
│ │ │ ├─ Fetch Clinical Record (API) │ │
│ │ │ ├─ Fetch All Questionnaires (Optimized: 1 call) │ │
│ │ │ ├─ Fetch Lab Requests (Async pool) │ │
│ │ │ ├─ Process Field Mappings (extraction + transform) │ │
│ │ │ └─ Update Progress Bars (thread-safe) │ │
│ │ │ │ │
│ │ │ Inner Async: Lab/Questionnaire Fetches (40 workers) │ │
│ │ │ (Non-blocking I/O during main processing) │ │
│ │ └─ Combine Inclusions from All Orgs │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 4: QUALITY ASSURANCE & VALIDATION │ │
│ │ ├─ Coherence Check (API stats vs actual data) │ │
│ │ │ └─ Compares counters with detailed records │ │
│ │ ├─ Non-Regression Check (config-driven) │ │
│ │ │ └─ Detects changes with severity levels │ │
│ │ └─ Critical Issue Handling (user confirmation if needed) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 5: EXPORT & PERSISTENCE │ │
│ │ ├─ Backup Old Files (if quality passed) │ │
│ │ ├─ Write JSON Outputs (endobest_inclusions.json, etc.) │ │
│ │ ├─ Export to Excel (if configured) │ │
│ │ │ ├─ Load Templates │ │
│ │ │ ├─ Apply Filters & Sorts │ │
│ │ │ ├─ Fill Data into Sheets │ │
│ │ │ ├─ Replace Values │ │
│ │ │ └─ Recalculate Formulas (win32com) │ │
│ │ └─ Display Summary & Elapsed Time │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ EXIT │
└─────────────────────────────────────────────────────────────────────┘
↓ EXTERNAL DEPENDENCIES ↓
┌─────────────────────────────────────────────────────────────────────┐
│ EXTERNAL APIS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 🔐 AUTHENTICATION (IAM) │
│ └─ api-auth.ziwig-connect.com │
│ ├─ POST /api/auth/ziwig-pro/login │
│ └─ POST /api/auth/refreshToken │
│ │
│ 🏥 RESEARCH CLINIC (RC) │
│ └─ api-hcp.ziwig-connect.com │
│ ├─ POST /api/auth/config-token │
│ ├─ GET /api/inclusions/getAllOrganizations │
│ ├─ POST /api/inclusions/inclusion-statistics │
│ ├─ POST /api/inclusions/search │
│ ├─ POST /api/records/byPatient │
│ └─ POST /api/surveys/filter/with-answers (optimized!) │
│ │
│ 🧪 LAB / DIAGNOSTICS (GDD) │
│ └─ api-lab.ziwig-connect.com │
│ └─ GET /api/requests/by-tube-id/{tubeId} │
│ │
│ 📝 EXCEL TEMPLATES │
│ └─ config/templates/ │
│ ├─ Endobest_Template.xlsx │
│ ├─ Statistics_Template.xlsx │
│ └─ (Custom templates) │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 🔌 Module Descriptions
### 1. **eb_dashboard.py** - Main Orchestrator (57.5 KB)
**Responsibility:** Complete data collection workflow, API coordination, multithreaded execution
**Structure (9 Blocks):**
| Block | Purpose | Key Functions |
|-------|---------|---|
| **1** | Configuration & Infrastructure | Constants, global vars, progress bar setup |
| **2** | Decorators & Resilience | `@api_call_with_retry`, retry logic |
| **3** | Authentication | `login()`, token exchange, IAM integration |
| **3B** | File Utilities | `load_json_file()` |
| **4** | Inclusions Mapping Config | `load_inclusions_mapping_config()`, validation |
| **5** | Data Search & Extraction | Questionnaire finding, field retrieval |
| **6** | Custom Functions | Business logic, calculated fields |
| **7** | Business API Calls | RC, GDD, organization endpoints |
| **7b** | Organization Center Mapping | `load_org_center_mapping()` |
| **8** | Processing Orchestration | `process_organization_patients()`, patient data processing |
| **9** | Main Execution | Entry point, quality checks, export |
**Key Technologies:**
- `httpx` - HTTP client (with thread-local instances)
- `openpyxl` - Excel parsing
- `concurrent.futures.ThreadPoolExecutor` - Parallel execution
- `tqdm` - Progress tracking
- `questionary` - Interactive prompts
---
### 2. **eb_dashboard_utils.py** - Utility Functions (6.4 KB)
**Responsibility:** Generic, reusable utility functions shared across modules
**Core Functions:**
```python
get_httpx_client() # Thread-local HTTP client management
get_thread_position() # Progress bar positioning
get_nested_value() # JSON path navigation with wildcard support (*)
get_config_path() # Config folder resolution (script vs PyInstaller)
get_old_filename() # Backup filename generation
```
**Key Features:**
- Thread-safe HTTP client pooling
- Wildcard support in nested JSON paths (e.g., `["items", "*", "value"]`)
- Cross-platform path resolution
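
The wildcard navigation can be sketched as follows. The function name `get_nested_value` comes from the module itself, but the implementation shown here (including the `"undefined"` sentinel and the fan-out behavior) is an illustrative assumption, not the actual source:

```python
def get_nested_value(data, path):
    """Walk nested dicts/lists by a list of keys; "*" fans out over a list."""
    current = [data]
    for key in path:
        next_level = []
        for node in current:
            if key == "*" and isinstance(node, list):
                next_level.extend(node)  # collect every element of the list
            elif isinstance(node, dict) and key in node:
                next_level.append(node[key])
            elif isinstance(node, list) and isinstance(key, int) and key < len(node):
                next_level.append(node[key])
        current = next_level
    if not current:
        return "undefined"  # sentinel value used throughout the field pipeline
    return current if len(current) > 1 else current[0]
```

For example, `get_nested_value(record, ["items", "*", "value"])` returns the `value` of every entry under `items`.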
---
### 3. **eb_dashboard_quality_checks.py** - QA & Validation (58.5 KB)
**Responsibility:** Quality assurance, data validation, regression checking
**Core Functions:**
| Function | Purpose |
|----------|---------|
| `load_regression_check_config()` | Load regression rules from Excel |
| `run_quality_checks()` | Orchestrate all QA checks |
| `coherence_check()` | Verify stats vs detailed data consistency |
| `non_regression_check()` | Config-driven change validation |
| `run_check_only_mode()` | Standalone validation mode |
| `backup_output_files()` | Create versioned backups |
**Quality Check Types:**
1. **Coherence Check**
- Compares API-provided organization statistics vs. actual inclusion counts
- Severity: Warning/Critical
- Example: Total API count (145) vs. actual inclusions (143)
2. **Non-Regression Check**
- Compares current vs. previous run data
- Applies config-driven rules with transition patterns
- Detects: new inclusions, deletions, field changes
- Severity: Warning/Critical with exceptions
---
### 4. **eb_dashboard_excel_export.py** - Excel Generation & Orchestration (38 KB, v1.1+)
**Responsibility:** Configuration-driven Excel workbook generation with data transformation + high-level orchestration
**Core Functions (Low-Level):**
| Function | Purpose |
|----------|---------|
| `load_excel_export_config()` | Load Excel_Workbooks + Excel_Sheets config |
| `validate_excel_config()` | Validate templates and named ranges |
| `export_to_excel()` | Main export orchestration (openpyxl + win32com) |
| `_apply_filter()` | AND-condition filtering |
| `_apply_sort()` | Multi-key sorting with datetime support |
| `_apply_value_replacement()` | Strict type matching value transformation |
| `_handle_output_exists()` | File conflict resolution |
| `_recalculate_workbook()` | Formula recalculation via win32com |
| `_process_sheet()` | Sheet-specific data filling |
**High-Level Orchestration Functions (v1.1+):**
| Function | Purpose | Called From |
|----------|---------|-------------|
| `export_excel_only()` | Complete --excel-only mode | main() CLI detection |
| `run_normal_mode_export()` | Normal mode export phase | main() after JSON write |
| `prepare_excel_export()` | Preparation + validation | Both orchestration functions |
| `execute_excel_export()` | Execution with error handling | Both orchestration functions |
| `_load_json_file_internal()` | Safe JSON loading | run_normal_mode_export() |
**Data Transformation Pipeline:**
```
1. Load Configuration (Excel_Workbooks + Excel_Sheets)
2. For each workbook:
a. Load template (openpyxl)
b. For each sheet:
- Apply filter (AND conditions)
- Apply sort (multi-key)
- Apply value replacement (strict type matching)
- Fill data into cells/named ranges
c. Handle file conflicts (Overwrite/Increment/Backup)
d. Save workbook (openpyxl)
e. Recalculate formulas (win32com - optional)
```
**Orchestration Pattern (v1.1+):**
As of v1.1, the system delegates all export orchestration to dedicated functions following the pattern established by `run_check_only_mode()` from quality_checks:
1. **--excel-only mode:** Main script calls single function → `export_excel_only()` handles everything
2. **Normal mode export:** Main script calls single function → `run_normal_mode_export()` handles everything
This keeps the main script focused on business logic while all export mechanics are encapsulated in the module.
---
## 🔄 Complete Data Collection Workflow
### Phase 1: Initialization (2-3 seconds)
1. User provides credentials (with defaults)
2. IAM Login: `POST /api/auth/ziwig-pro/login`
3. Token Exchange: `POST /api/auth/config-token`
4. Load configuration from `Endobest_Dashboard_Config.xlsx`
5. Validate field mappings and quality check rules
6. Setup thread pools (main: 20 workers, subtasks: 40 workers)
### Phase 2: Organization Retrieval (5-8 seconds)
1. Get all organizations: `GET /api/inclusions/getAllOrganizations`
2. Filter excluded centers (config-driven)
3. Fetch counters in parallel (20 workers):
- For each org: `POST /api/inclusions/inclusion-statistics`
- Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
4. Optional: Enrich with center mapping (from `eb_org_center_mapping.xlsx`)
5. Calculate totals and sort
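Step 5 might look like the following sketch. The actual sort key is not documented here, so ordering by `included_count` is an assumption, as is the shape of the TOTAL row:

```python
def summarize_organizations(orgs):
    """Sort organizations by included patients (descending) and append a TOTAL row."""
    counter_keys = ["patients_count", "preincluded_count", "included_count",
                    "prematurely_terminated_count"]
    ranked = sorted(orgs, key=lambda o: o.get("included_count", 0), reverse=True)
    totals = {"name": "TOTAL"}
    for key in counter_keys:
        totals[key] = sum(o.get(key, 0) for o in orgs)
    return ranked + [totals]
```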
### Phase 3: Patient Data Collection (2-4 minutes)
**Nested Parallel Architecture:**
**Outer Loop (20 workers):** For each organization
- `POST /api/inclusions/search?limit=1000&page=1` → Get up to 1000 inclusions
**Middle Loop (Sequential):** For each patient
- Fetch clinical record: `POST /api/records/byPatient`
- Fetch questionnaires: `POST /api/surveys/filter/with-answers` (**optimized: 1 call**)
- Submit async lab request: `GET /api/requests/by-tube-id/{tubeId}` (in subtasks pool)
**Inner Loop (40 async workers):** Non-blocking lab/questionnaire processing
- Parallel fetches of lab requests while main thread processes fields
**Field Processing (per patient):**
- For each field in configuration:
1. Determine source (questionnaire, record, inclusion, request, calculated)
2. Extract raw value (supports JSON paths with wildcards)
3. Check field condition (optional)
4. Apply post-processing transformations
5. Format score dictionaries
6. Store in nested output structure
### Phase 4: Quality Assurance (10-15 seconds)
1. **Coherence Check:** Compare API counters vs. actual data
2. **Non-Regression Check:** Compare current vs. previous run with config rules
3. **Critical Issue Handling:** User confirmation if issues detected
4. If NO critical issues → continue to export
5. If YES critical issues → prompt user for override
### Phase 5: Export & Persistence (3-5 seconds)
**Step 1: Backup & JSON Write**
1. Backup old files (if quality checks passed)
2. Write JSON outputs:
- `endobest_inclusions.json` (6-7 MB)
- `endobest_organizations.json` (17-20 KB)
**Step 2: Excel Export (if configured)**
Delegated to `run_normal_mode_export()` function which handles:
1. Load JSONs from filesystem (ensures consistency)
2. Load Excel configuration
3. Validate templates and named ranges
4. For each configured workbook:
- Load template file
- Apply filter conditions (AND logic)
- Apply multi-key sort
- Apply value replacements (strict type matching)
- Fill data into cells/named ranges
- Handle file conflicts (Overwrite/Increment/Backup)
- Save workbook
- Recalculate formulas (optional, via win32com)
5. Display results and return status
**Step 3: Summary**
1. Display elapsed time
2. Report file locations
3. Note any warnings/errors during export
---
## ⚙️ Configuration System
### Three-Layer Configuration Architecture
#### Layer 1: Excel Configuration (`Endobest_Dashboard_Config.xlsx`)
**Sheet 1: Inclusions_Mapping** (Field Extraction)
- Define which patient fields to extract
- Specify sources (questionnaire, record, inclusion, request, calculated)
- Configure transformations (value labels, templates, conditions)
- ~50+ fields typically configured
**Sheet 2: Organizations_Mapping** (Organization Fields)
- Define which organization fields to export
- Rarely modified
**Sheet 3: Excel_Workbooks** (Excel Export Metadata)
- Workbook names
- Template paths
- Output filenames (with template variables)
- File conflict handling strategy (Overwrite/Increment/Backup)
**Sheet 4: Excel_Sheets** (Sheet Configurations)
- Workbook name (reference to Excel_Workbooks)
- Sheet name (in template)
- Source type (Inclusions/Organizations/Variable)
- Target (cell or named range)
- Column mapping (JSON)
- Filter conditions (JSON with AND logic)
- Sort keys (JSON, multi-key with datetime support)
- Value replacements (JSON, strict type matching)
**Sheet 5: Regression_Check** (Quality Rules)
- Rule names
- Field selection pipeline (include/exclude patterns)
- Scope (all organizations or specific org list)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
#### Layer 2: Organization Mapping (`eb_org_center_mapping.xlsx`)
- Optional mapping file
- Sheet: `Org_Center_Mapping`
- Maps organization names to center identifiers
- Degrades gracefully if missing
#### Layer 3: Excel Templates (`config/templates/`)
- Excel workbook templates with:
- Sheet definitions
- Named ranges (for data fill targets)
- Formula structures
- Formatting and styles
### Configuration Constants (in code)
```python
# API Configuration
IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
GDD_URL = "https://api-lab.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"
RC_ENDOBEST_PROTOCOL_ID = "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e"
# Threading & Performance
MAX_THREADS = 20 # Main thread pool workers
ASYNC_THREADS = 40 # Subtasks thread pool workers
ERROR_MAX_RETRY = 10 # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5 # Seconds between retries
# Excluded Organizations
RC_ENDOBEST_EXCLUDED_CENTERS = ["e18e7487-...", "5582bd75-...", "e053512f-..."]
```
---
## 🔐 API Integration
### Authentication Flow
```
1. IAM Login
POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login
Request: {"username": "...", "password": "..."}
Response: {"access_token": "jwt_master", "userId": "uuid"}
2. Token Exchange (RC-specific)
POST https://api-hcp.ziwig-connect.com/api/auth/config-token
Headers: Authorization: Bearer {master_token}
Request: {"userId": "...", "clientId": "...", "userAgent": "..."}
Response: {"access_token": "jwt_rc", "refresh_token": "refresh_token"}
3. Automatic Token Refresh (on 401)
POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken
Headers: Authorization: Bearer {current_token}
Request: {"refresh_token": "..."}
Response: {"access_token": "jwt_new", "refresh_token": "new_refresh"}
```
### Key API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/inclusions/getAllOrganizations` | GET | List all organizations |
| `/api/inclusions/inclusion-statistics` | POST | Get patient counts per org |
| `/api/inclusions/search` | POST | Get inclusions list for org (paginated) |
| `/api/records/byPatient` | POST | Get clinical record for patient |
| `/api/surveys/filter/with-answers` | POST | **OPTIMIZED:** Get all questionnaires for patient |
| `/api/requests/by-tube-id/{tubeId}` | GET | Get lab test results |
### Performance Optimization: Questionnaire Batching
**Problem:** Multiple API calls per patient (1 call per questionnaire × N patients = slow)
**Solution:** Single optimized call retrieves all questionnaires with answers
```
BEFORE (inefficient):
for qcm_id in questionnaire_ids:
GET /api/surveys/{qcm_id}/answers?subject={patient_id}
# Result: N API calls per patient
AFTER (optimized):
POST /api/surveys/filter/with-answers
{
"context": "clinic_research",
"subject": patient_id
}
# Result: 1 API call per patient
# Impact: 4-5x performance improvement
```
---
## ⚡ Multithreading & Performance Optimization
### Thread Pool Architecture
```
Main Application Thread
┌─ Phase 1: Counter Fetching ──────────────────────────┐
│ ThreadPoolExecutor(max_workers=user_input, cap=20) │
│ ├─ Task 1: Get counters for Org 1 │
│ ├─ Task 2: Get counters for Org 2 │
│ └─ Task N: Get counters for Org N │
│ [Sequential wait: tqdm.as_completed] │
└──────────────────────────────────────────────────────┘
┌─ Phase 2: Inclusion Data Collection (Nested) ────────┐
│ Outer: ThreadPoolExecutor(max_workers=user_input) │
│ │
│ For Org 1: │
│ │ Inner: ThreadPoolExecutor(max_workers=40) │
│ │ ├─ Patient 1: Async lab/questionnaire fetch │
│ │ ├─ Patient 2: Async lab/questionnaire fetch │
│ │ └─ Patient N: Async lab/questionnaire fetch │
│ │ [Sequential outer wait: as_completed] │
│ │ │
│ For Org 2: │
│ │ [Similar parallel processing] │
│ │ │
│ For Org N: │
│ │ [Similar parallel processing] │
└──────────────────────────────────────────────────────┘
```
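The nested pool layout above can be sketched with `concurrent.futures`. The fetch functions below are placeholders standing in for the real API calls; the point is the shape: an outer pool over organizations, a sequential middle loop over patients, and a shared inner pool for non-blocking subtask fetches:

```python
import concurrent.futures as cf

def process_patient(org, patient, subtask_pool):
    """Submit the lab fetch to the inner pool, keep working, then collect it."""
    lab_future = subtask_pool.submit(lambda: f"lab:{patient}")  # placeholder fetch
    record = f"record:{patient}"      # main-thread field processing overlaps the fetch
    return {"org": org, "record": record, "lab": lab_future.result()}

def process_organization(org, patients, subtask_pool):
    # Middle loop is sequential, as in the diagram above
    return [process_patient(org, p, subtask_pool) for p in patients]

def collect(org_patients, main_workers=20, sub_workers=40):
    results = []
    with cf.ThreadPoolExecutor(max_workers=sub_workers) as subtasks, \
         cf.ThreadPoolExecutor(max_workers=main_workers) as orgs:
        futures = [orgs.submit(process_organization, org, pats, subtasks)
                   for org, pats in org_patients.items()]
        for fut in cf.as_completed(futures):
            results.extend(fut.result())
    return results
```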
### Performance Optimizations
1. **Thread-Local HTTP Clients**
- Each thread maintains its own `httpx.Client`
- Avoids connection conflicts
- Implementation via `get_httpx_client()`
2. **Nested Parallelization**
- Main pool: Organizations (20 workers)
- Subtasks pool: Lab requests (40 workers)
- Non-blocking I/O during processing
3. **Questionnaire Batching** (4-5x improvement)
- Single call retrieves all questionnaires + answers
- Eliminates N filtered calls per patient
4. **Configurable Worker Threads**
- User input selection (1-20 workers)
- Tunable for network bandwidth and API rate limits
### Progress Tracking (Multi-Level)
```
Overall Progress [████████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [██████████░░░░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██████░░░░░░░░░░░░░░░░░░░] 42/110
3/15 - Center 3 [████░░░░░░░░░░░░░░░░░░░░░] 28/85
```
**Thread-Safe Updates:**
```python
with _global_pbar_lock:
if global_pbar:
global_pbar.update(1)
```
---
## 🛡️ Error Handling & Resilience
### Token Management Strategy
1. **Automatic Token Refresh on 401**
- Triggered by `@api_call_with_retry` decorator
- Thread-safe via `_token_refresh_lock`
2. **Retry Mechanism**
- Max retries: 10 attempts
- Delay between retries: 0.5 seconds
- Decorators: `@api_call_with_retry`
3. **Thread-Safe Token Refresh**
```python
def new_token():
    global access_token, refresh_token
    with _token_refresh_lock:  # Only one thread refreshes at a time
        for attempt in range(ERROR_MAX_RETRY):
            try:
                # POST /api/auth/refreshToken
                # Update global tokens
                return
            except Exception:
                sleep(WAIT_BEFORE_RETRY)
```
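A hedged sketch of what the `@api_call_with_retry` decorator might look like. `PermissionError` is used here as a stand-in for an HTTP 401 response (the real code presumably inspects the response status), and the `new_token` stub stands in for the thread-safe refresh function:

```python
import functools
import time

ERROR_MAX_RETRY = 10
WAIT_BEFORE_RETRY = 0.5

def new_token():
    """Stub standing in for the thread-safe token refresh."""
    pass

def api_call_with_retry(func):
    """Retry the wrapped API call; on an auth failure, refresh the token first."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        last_error = None
        for attempt in range(ERROR_MAX_RETRY):
            try:
                return func(*args, **kwargs)
            except PermissionError:        # stand-in for an HTTP 401 response
                new_token()                # refresh, then retry immediately
            except Exception as exc:       # network timeout, 5xx, ...
                last_error = exc
                time.sleep(WAIT_BEFORE_RETRY)
        raise RuntimeError("API call failed after retries") from last_error
    return wrapper
```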
### Exception Handling Categories
| Category | Examples | Handling |
|----------|----------|----------|
| **API Errors** | Network timeouts, HTTP errors | Retry with configurable delay |
| **File I/O Errors** | Missing config, permission denied | Graceful error + exit |
| **Validation Errors** | Invalid config, incoherent data | Log warning + prompt user |
| **Thread Errors** | Worker thread failures | Shutdown gracefully + propagate |
### Graceful Degradation
1. **Missing Organization Mapping:** Skip silently, use fallback (org name)
2. **Critical Quality Issues:** Prompt user for confirmation before export
3. **Thread Failure:** Shutdown all workers gracefully, preserve partial results
4. **Invalid Configuration:** Clear error messages with remediation suggestions
---
## 📊 Data Output Structure
### JSON Output: `endobest_inclusions.json`
```json
[
{
"Patient_Identification": {
"Organisation_Id": "uuid",
"Organisation_Name": "Hospital Name",
"Center_Name": "HOSP-A",
"Patient_Id": "internal_id",
"Pseudo": "ENDO-001",
"Patient_Name": "Doe, John",
"Patient_Birthday": "1975-05-15",
"Patient_Age": 49
},
"Inclusion": {
"Consent_Signed": true,
"Inclusion_Date": "15/10/2024",
"Inclusion_Status": "incluse",
"isPrematurelyTerminated": false
},
"Extended_Fields": {
"Custom_Field_1": "value",
"Custom_Field_2": 42,
"Composite_Score": "8/10"
},
"Endotest": {
"Request_Sent": true,
"Diagnostic_Status": "Completed"
}
}
]
```
### JSON Output: `endobest_organizations.json`
```json
[
{
"id": "org-uuid",
"name": "Hospital A",
"Center_Name": "HOSP-A",
"patients_count": 45,
"preincluded_count": 8,
"included_count": 35,
"prematurely_terminated_count": 2
}
]
```
---
## 🚀 Execution Modes
### Mode 1: Normal (Full Collection)
```bash
python eb_dashboard.py
```
- Authenticates
- Collects from APIs
- Runs quality checks
- Exports JSON + Excel
- Duration: 2.5-5 minutes (typical)
### Mode 2: Excel-Only (Fast Export)
```bash
python eb_dashboard.py --excel-only
```
- Skips data collection
- Uses existing JSON files
- Regenerates Excel workbooks
- Duration: 5-15 seconds
- Use case: Reconfigure reports, test templates
### Mode 3: Check-Only (Validation Only)
```bash
python eb_dashboard.py --check-only
```
- Loads existing JSON
- Runs quality checks
- No export
- Duration: 5-10 seconds
- Use case: Verify data before distribution
### Mode 4: Debug (Verbose Output)
```bash
python eb_dashboard.py --debug
```
- Executes normal mode
- Enables detailed logging
- Shows field-by-field changes
- Check `dashboard.log` for details
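The mode dispatch could be sketched with `argparse`. Whether the real script uses `argparse` or manual `sys.argv` parsing is not stated, so this is illustrative only:

```python
import argparse

def parse_mode(argv):
    """Map command-line flags to one of the four documented execution modes."""
    parser = argparse.ArgumentParser(prog="eb_dashboard.py")
    parser.add_argument("--excel-only", action="store_true")
    parser.add_argument("--check-only", action="store_true")
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)
    if args.excel_only:
        return "excel-only"   # delegates to export_excel_only()
    if args.check_only:
        return "check-only"   # delegates to run_check_only_mode()
    return "debug" if args.debug else "normal"
```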
---
## 📈 Performance Metrics & Benchmarks
### Typical Execution Times (Full Dataset: 1,200+ patients, 15+ organizations)
| Phase | Duration | Notes |
|-------|----------|-------|
| **Login & Config** | 2-3 sec | Sequential, network-dependent |
| **Fetch Counters** | 5-8 sec | 20 workers, parallelized |
| **Collect Inclusions** | 2-4 min | Includes API calls + field processing |
| **Quality Checks** | 10-15 sec | File loads, data comparison |
| **Export to JSON** | 3-5 sec | File I/O |
| **Export to Excel** | 5-15 sec | Template processing + fill |
| **TOTAL** | **~2.5-5 min** | Depends on network, API perf |
### Network Optimization Impact
**With old questionnaire approach (N filtered calls per patient):**
- 1,200 patients × 15 questionnaires = 18,000 API calls
- Estimated: 15-30 minutes
**With optimized single-call questionnaire:**
- 1,200 patients × 1 call = 1,200 API calls
- Estimated: 2-5 minutes
- **Improvement: 3-6x faster** ✅
---
## 🔍 Field Extraction & Processing Logic
### Complete Field Processing Pipeline
```
For each field in INCLUSIONS_MAPPING_CONFIG:
├─ Step 1: Determine Source Type
│ ├─ q_id / q_name / q_category → Find questionnaire
│ ├─ record → Use clinical record
│ ├─ inclusion → Use patient inclusion data
│ ├─ request → Use lab request data
│ └─ calculated → Execute custom function
├─ Step 2: Extract Raw Value
│ ├─ Navigate JSON using field_path
│ ├─ Supports wildcard (*) for list traversal
│ └─ Return value or "undefined"
├─ Step 3: Check Field Condition (optional)
│ ├─ If condition undefined → Set to "undefined"
│ ├─ If condition not boolean → Error flag
│ ├─ If condition false → Set to "N/A"
│ └─ If condition true → Continue
├─ Step 4: Apply Post-Processing Transformations
│ ├─ true_if_any: Convert to boolean
│ ├─ value_labels: Map to localized text
│ ├─ field_template: Apply formatting
│ └─ List joining: Flatten arrays with pipe delimiter
├─ Step 5: Format Score Dictionaries
│ ├─ If {total, max} → Format as "total/max"
│ └─ Otherwise → Keep as-is
└─ Store: output_inclusion[field_group][field_name] = final_value
```
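Steps 3 and 5 of the pipeline above translate into small pure functions. The sentinel strings follow the diagram; the exact error-handling convention is an assumption:

```python
def apply_condition(value, condition):
    """Step 3: gate a field on an optional boolean condition."""
    if condition is None:
        return value                          # no condition configured
    if condition == "undefined":
        return "undefined"                    # condition field itself missing
    if not isinstance(condition, bool):
        return "ERROR: condition is not boolean"
    return value if condition else "N/A"

def format_score(value):
    """Step 5: render {total, max} score dictionaries as 'total/max'."""
    if isinstance(value, dict) and set(value) == {"total", "max"}:
        return f"{value['total']}/{value['max']}"
    return value                              # anything else passes through
```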
### Custom Functions for Calculated Fields
| Function | Purpose | Syntax |
|----------|---------|--------|
| `search_in_fields_using_regex` | Search multiple fields for pattern | `["search_in_fields_using_regex", "pattern", "field1", "field2"]` |
| `extract_parentheses_content` | Extract text within parentheses | `["extract_parentheses_content", "field_name"]` |
| `append_terminated_suffix` | Add suffix if patient terminated | `["append_terminated_suffix", "status_field", "is_terminated_field"]` |
| `if_then_else` | Unified conditional with 8 operators | `["if_then_else", "operator", arg1, arg2_optional, true_result, false_result]` |
**if_then_else Operators:**
- `is_true` / `is_false` - Boolean field test
- `is_defined` / `is_undefined` - Existence test
- `all_true` / `all_defined` - Multiple field test
- `==` / `!=` - Value comparison
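A possible dispatch for the eight operators. The runtime signature, in particular how the patient's field values are passed in, is an assumption for illustration:

```python
def if_then_else(fields, operator, *args):
    """Evaluate one documented operator against a flat field dict."""
    *operands, true_result, false_result = args
    def val(name):
        return fields.get(name, "undefined")
    if operator == "is_true":
        cond = val(operands[0]) is True
    elif operator == "is_false":
        cond = val(operands[0]) is False
    elif operator == "is_defined":
        cond = val(operands[0]) != "undefined"
    elif operator == "is_undefined":
        cond = val(operands[0]) == "undefined"
    elif operator == "all_true":
        cond = all(val(n) is True for n in operands)
    elif operator == "all_defined":
        cond = all(val(n) != "undefined" for n in operands)
    elif operator == "==":
        cond = val(operands[0]) == operands[1]   # second operand is a literal
    elif operator == "!=":
        cond = val(operands[0]) != operands[1]
    else:
        raise ValueError(f"unknown operator: {operator}")
    return true_result if cond else false_result
```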
---
## ✅ Quality Assurance Framework
### Coherence Check
**Purpose:** Verify API-provided statistics match actual collected data
**Logic:**
```
For each organization:
API_Count = statistic.total
Actual_Count = count of inclusion records
if API_Count != Actual_Count:
Report discrepancy with severity
├─ ±10%: Warning
└─ >±10%: Critical
```
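The severity thresholds translate directly into code. This sketch assumes the deviation is measured relative to the API count:

```python
def coherence_check(org_name, api_count, actual_count):
    """Compare the API counter with collected records; classify by deviation."""
    if api_count == actual_count:
        return None                                  # no discrepancy
    deviation = abs(api_count - actual_count) / max(api_count, 1)
    severity = "Warning" if deviation <= 0.10 else "Critical"
    return {"org": org_name, "api": api_count,
            "actual": actual_count, "severity": severity}
```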
### Non-Regression Check
**Purpose:** Detect unexpected changes between data runs
**Configuration-Driven Rules:**
- Field selection pipeline (include/exclude patterns)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
- Exception handling (exclude specific organizations)
**Logic:**
```
Load previous inclusion data (_old file)
For each rule:
├─ Build candidate fields via pipeline
├─ Determine key field for matching
└─ For each inclusion:
├─ Find matching old inclusion by key
├─ Check for unexpected transitions
├─ Apply exceptions
└─ Report violations
```
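A simplified sketch of the transition matching. The real rule schema (field selection pipeline, per-organization exceptions) is richer than the minimal `rule` dict assumed here:

```python
def non_regression_check(old_rows, new_rows, rule):
    """Flag field transitions not listed in rule["allowed"], matched by rule["key"]."""
    key, field = rule["key"], rule["field"]
    allowed = set(rule["allowed"])          # e.g. {("preincluse", "incluse")}
    old_by_key = {row[key]: row for row in old_rows}
    violations = []
    for row in new_rows:
        old = old_by_key.get(row[key])
        if old is None:
            continue                        # new inclusion: reported separately
        transition = (old.get(field), row.get(field))
        if transition[0] != transition[1] and transition not in allowed:
            violations.append({"key": row[key], "transition": transition,
                               "severity": rule.get("severity", "Warning")})
    return violations
```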
---
## 📋 Documentation Structure
The system includes comprehensive documentation:
| Document | Size | Content |
|----------|------|---------|
| **DOCUMENTATION_10_ARCHITECTURE.md** | 43.7 KB | System design, workflow, APIs, multithreading |
| **DOCUMENTATION_11_FIELD_MAPPING.md** | 56.3 KB | Field extraction logic, custom functions, examples |
| **DOCUMENTATION_12_QUALITY_CHECKS.md** | 60.2 KB | QA framework, regression rules, configuration |
| **DOCUMENTATION_13_EXCEL_EXPORT.md** | 29.6 KB | Excel generation, data transformation, config |
| **DOCUMENTATION_98_USER_GUIDE.md** | 8.4 KB | End-user instructions, troubleshooting, FAQ |
| **DOCUMENTATION_99_CONFIG_GUIDE.md** | 24.8 KB | Administrator reference, Excel tables, examples |
---
## 🔧 Key Technical Features
### Thread Safety
- Per-thread HTTP clients (no connection conflicts)
- Synchronized access to global state via locks
- Thread-safe progress bar updates
### Error Recovery
- Automatic token refresh on 401 errors
- Retry logic with configurable delay and maximum attempts
- Graceful degradation for optional features
- User confirmation on critical issues
### Configuration Flexibility
- 100% externalized to Excel (zero code changes)
- Supports multiple data sources
- Custom business logic functions
- Field dependencies and conditions
- Value transformations and templates
### Performance
- Optimized API calls (4-5x improvement)
- Parallel processing (20+ workers)
- Async I/O operations
- Configurable thread pools
### Data Quality
- Coherence checking (stats vs actual data)
- Non-regression testing (config-driven)
- Comprehensive validation
- Audit trail logging
---
## 📦 Dependencies
### Core Libraries
- **httpx** - HTTP client with connection pooling
- **openpyxl** - Excel file reading/writing
- **questionary** - Interactive CLI prompts
- **tqdm** - Progress bars
- **rich** - Rich text formatting
- **pywin32** - Windows COM automation (optional, for formula recalculation)
- **pytz** - Timezone support (optional)
### Python Version
- Python 3.7+
### External Services
- Ziwig IAM API
- Ziwig Research Clinic (RC) API
- Ziwig Lab (GDD) API
---
## 🎓 Usage Patterns
### For End Users
1. Configure fields in Excel (no code needed)
2. Run: `python eb_dashboard.py`
3. Review results in JSON or Excel
### For Administrators
1. Add new fields to `Inclusions_Mapping`
2. Define quality rules in `Regression_Check`
3. Configure Excel export in `Excel_Workbooks` + `Excel_Sheets`
4. Restart: script picks up config automatically
### For Developers
1. Add custom function to Block 6 (eb_dashboard.py)
2. Register in field config (Inclusions_Mapping)
3. Use via: `"source_id": "function_name"`
4. No code recompile needed for other changes
---
## 🎯 Summary
The **Endobest Clinical Research Dashboard** represents a mature, production-ready system that successfully combines:
- **Architectural Excellence** - Clean modular design with separation of concerns
- **User-Centric Configuration** - 100% externalized, no code changes needed
- **Performance Optimization** - 4-5x faster via API and threading improvements
- **Robust Resilience** - Comprehensive error handling, automatic recovery, graceful degradation
- **Quality Assurance** - Multi-level validation, coherence checks, regression testing
- **Comprehensive Documentation** - 250+ KB of technical and user guides
- **Maintainability** - Clear code structure, extensive logging, audit trails
The system successfully enables non-technical users to configure complex data extraction and reporting workflows while maintaining enterprise-grade reliability and performance standards.
---
**Document Version:** 1.0
**Last Updated:** 2025-11-08
**Status:** ✅ Complete & Production Ready