📊 Endobest Clinical Research Dashboard - Architecture Summary
Last Updated: 2025-11-08 Project Status: Production Ready with Excel Export Feature Language: Python 3.x
🎯 Executive Summary
The Endobest Clinical Research Dashboard is a sophisticated, production-grade automated data collection and reporting system designed to aggregate patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations. The system combines high-performance multithreading, comprehensive quality assurance, and fully externalized configuration to enable non-technical users to manage complex data extraction workflows without code modifications.
Core Value Propositions
✅ 100% Externalized Configuration - All field definitions, quality rules, and export logic defined in Excel
✅ High-Performance Architecture - 4-5x faster via optimized API calls and parallel processing
✅ Robust Resilience - Automatic token refresh, retries, graceful degradation
✅ Comprehensive Quality Assurance - Coherence checks + config-driven regression testing
✅ Multi-Format Export - JSON + configurable Excel workbooks with data transformation
✅ User-Friendly Interface - Interactive prompts, progress tracking, clear error messages
📁 Project Structure
Endobest Dashboard/
├── 📜 MAIN SCRIPT
│ └── eb_dashboard.py (57.5 KB, 1,021 lines)
│ Core orchestrator for data collection, processing, and export
│
├── 🔧 UTILITY MODULES
│ ├── eb_dashboard_utils.py (6.4 KB, 184 lines)
│ │ Thread-safe HTTP clients, nested data navigation, config resolution
│ │
│ ├── eb_dashboard_quality_checks.py (58.5 KB, 1,266 lines)
│ │ Coherence checks, non-regression testing, data validation
│ │
│ └── eb_dashboard_excel_export.py (32 KB, ~1,000 lines)
│ Configuration-driven Excel workbook generation
│
├── 📚 DOCUMENTATION
│ ├── DOCUMENTATION_10_ARCHITECTURE.md (43.7 KB)
│ │ System design, data flow, API integration, multithreading
│ │
│ ├── DOCUMENTATION_11_FIELD_MAPPING.md (56.3 KB)
│ │ Field extraction logic, custom functions, transformations
│ │
│ ├── DOCUMENTATION_12_QUALITY_CHECKS.md (60.2 KB)
│ │ Quality assurance framework, regression rules, validation logic
│ │
│ ├── DOCUMENTATION_13_EXCEL_EXPORT.md (29.6 KB)
│ │ Excel generation architecture, data transformation pipeline
│ │
│ ├── DOCUMENTATION_98_USER_GUIDE.md (8.4 KB)
│ │ End-user instructions, quick start, troubleshooting
│ │
│ └── DOCUMENTATION_99_CONFIG_GUIDE.md (24.8 KB)
│ Administrator configuration reference
│
├── ⚙️ CONFIGURATION
│ └── config/
│ ├── Endobest_Dashboard_Config.xlsx (Configuration file)
│ │ Inclusions_Mapping
│ │ Organizations_Mapping
│ │ Excel_Workbooks
│ │ Excel_Sheets
│ │ Regression_Check
│ │
│ ├── eb_org_center_mapping.xlsx (Organization enrichment)
│ │
│ └── templates/
│ ├── Endobest_Template.xlsx
│ ├── Statistics_Template.xlsx
│ └── (Other Excel templates)
│
├── 📊 OUTPUT FILES
│ ├── endobest_inclusions.json (~6-7 MB, patient data)
│ ├── endobest_inclusions_old.json (backup)
│ ├── endobest_organizations.json (~17-20 KB, stats)
│ ├── endobest_organizations_old.json (backup)
│ ├── [Excel outputs] (*.xlsx, configurable)
│ └── dashboard.log (Execution log)
│
└── 🔨 EXECUTABLES
├── eb_dashboard.exe (16.5 MB, PyInstaller build)
└── [Various .bat launch scripts]
🏗️ System Architecture Overview
High-Level Component Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ ENDOBEST DASHBOARD MAIN PROCESS │
│ eb_dashboard.py │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 1: INITIALIZATION & AUTHENTICATION │ │
│ │ ├─ User Login (IAM API) │ │
│ │ ├─ Token Exchange (RC-specific) │ │
│ │ ├─ Config Loading (Excel parsing & validation) │ │
│ │ └─ Thread Pool Setup (20 workers main, 40 subtasks) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 2: ORGANIZATION & COUNTERS RETRIEVAL │ │
│ │ ├─ Get All Organizations (getAllOrganizations API) │ │
│ │ ├─ Fetch Counters Parallelized (20 workers) │ │
│ │ ├─ Enrich with Center Mapping (optional) │ │
│ │ └─ Calculate Totals & Sort │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 3: PATIENT INCLUSION DATA COLLECTION │ │
│ │ Outer Loop: Organizations (20 parallel workers) │ │
│ │ ├─ For Each Organization: │ │
│ │ │ ├─ Get Inclusions List (POST /api/inclusions/search) │ │
│ │ │ └─ For Each Patient (Sequential): │ │
│ │ │ ├─ Fetch Clinical Record (API) │ │
│ │ │ ├─ Fetch All Questionnaires (Optimized: 1 call) │ │
│ │ │ ├─ Fetch Lab Requests (Async pool) │ │
│ │ │ ├─ Process Field Mappings (extraction + transform) │ │
│ │ │ └─ Update Progress Bars (thread-safe) │ │
│ │ │ │ │
│ │ │ Inner Async: Lab/Questionnaire Fetches (40 workers) │ │
│ │ │ (Non-blocking I/O during main processing) │ │
│ │ └─ Combine Inclusions from All Orgs │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 4: QUALITY ASSURANCE & VALIDATION │ │
│ │ ├─ Coherence Check (API stats vs actual data) │ │
│ │ │ └─ Compares counters with detailed records │ │
│ │ ├─ Non-Regression Check (config-driven) │ │
│ │ │ └─ Detects changes with severity levels │ │
│ │ └─ Critical Issue Handling (user confirmation if needed) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 5: EXPORT & PERSISTENCE │ │
│ │ ├─ Backup Old Files (if quality passed) │ │
│ │ ├─ Write JSON Outputs (endobest_inclusions.json, etc.) │ │
│ │ ├─ Export to Excel (if configured) │ │
│ │ │ ├─ Load Templates │ │
│ │ │ ├─ Apply Filters & Sorts │ │
│ │ │ ├─ Fill Data into Sheets │ │
│ │ │ ├─ Replace Values │ │
│ │ │ └─ Recalculate Formulas (win32com) │ │
│ │ └─ Display Summary & Elapsed Time │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ EXIT │
└─────────────────────────────────────────────────────────────────────┘
↓ EXTERNAL DEPENDENCIES ↓
┌─────────────────────────────────────────────────────────────────────┐
│ EXTERNAL APIS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 🔐 AUTHENTICATION (IAM) │
│ └─ api-auth.ziwig-connect.com │
│ ├─ POST /api/auth/ziwig-pro/login │
│ └─ POST /api/auth/refreshToken │
│ │
│ 🏥 RESEARCH CLINIC (RC) │
│ └─ api-hcp.ziwig-connect.com │
│ ├─ POST /api/auth/config-token │
│ ├─ GET /api/inclusions/getAllOrganizations │
│ ├─ POST /api/inclusions/inclusion-statistics │
│ ├─ POST /api/inclusions/search │
│ ├─ POST /api/records/byPatient │
│ └─ POST /api/surveys/filter/with-answers (optimized!) │
│ │
│ 🧪 LAB / DIAGNOSTICS (GDD) │
│ └─ api-lab.ziwig-connect.com │
│ └─ GET /api/requests/by-tube-id/{tubeId} │
│ │
│ 📝 EXCEL TEMPLATES │
│ └─ config/templates/ │
│ ├─ Endobest_Template.xlsx │
│ ├─ Statistics_Template.xlsx │
│ └─ (Custom templates) │
│ │
└─────────────────────────────────────────────────────────────────────┘
🔌 Module Descriptions
1. eb_dashboard.py - Main Orchestrator (57.5 KB)
Responsibility: Complete data collection workflow, API coordination, multithreaded execution
Structure (9 Blocks):
| Block | Purpose | Key Functions |
|---|---|---|
| 1 | Configuration & Infrastructure | Constants, global vars, progress bar setup |
| 2 | Decorators & Resilience | @api_call_with_retry, retry logic |
| 3 | Authentication | login(), token exchange, IAM integration |
| 3B | File Utilities | load_json_file() |
| 4 | Inclusions Mapping Config | load_inclusions_mapping_config(), validation |
| 5 | Data Search & Extraction | Questionnaire finding, field retrieval |
| 6 | Custom Functions | Business logic, calculated fields |
| 7 | Business API Calls | RC, GDD, organization endpoints |
| 7b | Organization Center Mapping | load_org_center_mapping() |
| 8 | Processing Orchestration | process_organization_patients(), patient data processing |
| 9 | Main Execution | Entry point, quality checks, export |
Key Technologies:
- httpx - HTTP client (with thread-local instances)
- openpyxl - Excel parsing
- concurrent.futures.ThreadPoolExecutor - Parallel execution
- tqdm - Progress tracking
- questionary - Interactive prompts
2. eb_dashboard_utils.py - Utility Functions (6.4 KB)
Responsibility: Generic, reusable utility functions shared across modules
Core Functions:
```python
get_httpx_client()      # Thread-local HTTP client management
get_thread_position()   # Progress bar positioning
get_nested_value()      # JSON path navigation with wildcard support (*)
get_config_path()       # Config folder resolution (script vs PyInstaller)
get_old_filename()      # Backup filename generation
```
Key Features:
- Thread-safe HTTP client pooling
- Wildcard support in nested JSON paths (e.g., ["items", "*", "value"])
- Cross-platform path resolution
3. eb_dashboard_quality_checks.py - QA & Validation (58.5 KB)
Responsibility: Quality assurance, data validation, regression checking
Core Functions:
| Function | Purpose |
|---|---|
| load_regression_check_config() | Load regression rules from Excel |
| run_quality_checks() | Orchestrate all QA checks |
| coherence_check() | Verify stats vs detailed data consistency |
| non_regression_check() | Config-driven change validation |
| run_check_only_mode() | Standalone validation mode |
| backup_output_files() | Create versioned backups |
Quality Check Types:
1. Coherence Check
   - Compares API-provided organization statistics vs. actual inclusion counts
   - Severity: Warning/Critical
   - Example: Total API count (145) vs. actual inclusions (143)
2. Non-Regression Check
   - Compares current vs. previous run data
   - Applies config-driven rules with transition patterns
   - Detects: new inclusions, deletions, field changes
   - Severity: Warning/Critical with exceptions
4. eb_dashboard_excel_export.py - Excel Generation & Orchestration (38 KB, v1.1+)
Responsibility: Configuration-driven Excel workbook generation with data transformation + high-level orchestration
Core Functions (Low-Level):
| Function | Purpose |
|---|---|
| load_excel_export_config() | Load Excel_Workbooks + Excel_Sheets config |
| validate_excel_config() | Validate templates and named ranges |
| export_to_excel() | Main export orchestration (openpyxl + win32com) |
| _apply_filter() | AND-condition filtering |
| _apply_sort() | Multi-key sorting with datetime support |
| _apply_value_replacement() | Strict type matching value transformation |
| _handle_output_exists() | File conflict resolution |
| _recalculate_workbook() | Formula recalculation via win32com |
| _process_sheet() | Sheet-specific data filling |
High-Level Orchestration Functions (v1.1+):
| Function | Purpose | Called From |
|---|---|---|
| export_excel_only() | Complete --excel-only mode | main() CLI detection |
| run_normal_mode_export() | Normal mode export phase | main() after JSON write |
| prepare_excel_export() | Preparation + validation | Both orchestration functions |
| execute_excel_export() | Execution with error handling | Both orchestration functions |
| _load_json_file_internal() | Safe JSON loading | run_normal_mode_export() |
Data Transformation Pipeline:
1. Load Configuration (Excel_Workbooks + Excel_Sheets)
2. For each workbook:
a. Load template (openpyxl)
b. For each sheet:
- Apply filter (AND conditions)
- Apply sort (multi-key)
- Apply value replacement (strict type matching)
- Fill data into cells/named ranges
c. Handle file conflicts (Overwrite/Increment/Backup)
d. Save workbook (openpyxl)
e. Recalculate formulas (win32com - optional)
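As a concrete illustration of this pipeline, here is a minimal sketch of the three per-sheet transformation steps applied to row dictionaries. The helper logic is simplified and the field names are hypothetical; the real _apply_filter(), _apply_sort(), and _apply_value_replacement() implementations handle more cases.

```python
# Minimal sketch of the per-sheet transformation steps (filter -> sort -> replace).
# Field names and structures here are hypothetical.
from datetime import datetime

rows = [
    {"Inclusion_Status": "incluse", "Inclusion_Date": "15/10/2024", "Consent_Signed": True},
    {"Inclusion_Status": "preincluse", "Inclusion_Date": "02/09/2024", "Consent_Signed": False},
]

# 1. Filter: every condition must hold (AND logic)
filter_conditions = {"Consent_Signed": True}
rows = [r for r in rows if all(r.get(k) == v for k, v in filter_conditions.items())]

# 2. Sort: multi-key, with datetime parsing for date columns
def sort_key(row):
    return (datetime.strptime(row["Inclusion_Date"], "%d/%m/%Y"), row["Inclusion_Status"])
rows.sort(key=sort_key)

# 3. Value replacement: strict type matching (True is replaced, the string "True" is not)
replacements = {"Consent_Signed": {True: "Oui", False: "Non"}}
for row in rows:
    for field, mapping in replacements.items():
        for old, new in mapping.items():
            if type(row.get(field)) is type(old) and row.get(field) == old:
                row[field] = new

print(rows)  # filtered, sorted, transformed rows ready to fill into the sheet
```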
Orchestration Pattern (v1.1+):
As of v1.1, the system delegates all export orchestration to dedicated functions, following the pattern established by run_check_only_mode() from the quality checks module:
- --excel-only mode: the main script calls a single function → export_excel_only() handles everything
- Normal mode export: the main script calls a single function → run_normal_mode_export() handles everything
This keeps the main script focused on business logic while all export mechanics are encapsulated in the module.
🔄 Complete Data Collection Workflow
Phase 1: Initialization (2-3 seconds)
- User provides credentials (with defaults)
- IAM Login: POST /api/auth/ziwig-pro/login
- Token Exchange: POST /api/auth/config-token
- Load configuration from Endobest_Dashboard_Config.xlsx
- Validate field mappings and quality check rules
- Setup thread pools (main: 20 workers, subtasks: 40 workers)
Phase 2: Organization Retrieval (5-8 seconds)
- Get all organizations: GET /api/inclusions/getAllOrganizations
- Filter excluded centers (config-driven)
- Fetch counters in parallel (20 workers):
  - For each org: POST /api/inclusions/inclusion-statistics
  - Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
- Optional: Enrich with center mapping (from eb_org_center_mapping.xlsx)
- Calculate totals and sort
Phase 3: Patient Data Collection (2-4 minutes)
Nested Parallel Architecture:
Outer Loop (20 workers): For each organization
- POST /api/inclusions/search?limit=1000&page=1 → Get up to 1,000 inclusions

Middle Loop (Sequential): For each patient
- Fetch clinical record: POST /api/records/byPatient
- Fetch questionnaires: POST /api/surveys/filter/with-answers (optimized: 1 call)
- Submit async lab request: GET /api/requests/by-tube-id/{tubeId} (in subtasks pool)

Inner Loop (40 async workers): Non-blocking lab/questionnaire processing
- Parallel fetches of lab requests while the main thread processes fields
Field Processing (per patient):
- For each field in the configuration:
  - Determine source (questionnaire, record, inclusion, request, calculated)
  - Extract raw value (supports JSON paths with wildcards)
  - Check field condition (optional)
  - Apply post-processing transformations
  - Format score dictionaries
  - Store in nested output structure
Phase 4: Quality Assurance (10-15 seconds)
- Coherence Check: Compare API counters vs. actual data
- Non-Regression Check: Compare current vs. previous run with config rules
- Critical Issue Handling:
  - If NO critical issues → continue to export
  - If critical issues found → prompt user for override
Phase 5: Export & Persistence (3-5 seconds)
Step 1: Backup & JSON Write
- Backup old files (if quality checks passed)
- Write JSON outputs:
endobest_inclusions.json(6-7 MB)endobest_organizations.json(17-20 KB)
Step 2: Excel Export (if configured)
Delegated to the run_normal_mode_export() function, which handles:
- Load JSONs from the filesystem (ensures consistency)
- Load Excel configuration
- Validate templates and named ranges
- For each configured workbook:
  - Load template file
  - Apply filter conditions (AND logic)
  - Apply multi-key sort
  - Apply value replacements (strict type matching)
  - Fill data into cells/named ranges
  - Handle file conflicts (Overwrite/Increment/Backup)
  - Save workbook
  - Recalculate formulas (optional, via win32com)
- Display results and return status
Step 3: Summary
- Display elapsed time
- Report file locations
- Note any warnings/errors during export
⚙️ Configuration System
Three-Layer Configuration Architecture
Layer 1: Excel Configuration (Endobest_Dashboard_Config.xlsx)
Sheet 1: Inclusions_Mapping (Field Extraction)
- Define which patient fields to extract
- Specify sources (questionnaire, record, inclusion, request, calculated)
- Configure transformations (value labels, templates, conditions)
- Typically 50+ fields configured
Sheet 2: Organizations_Mapping (Organization Fields)
- Define which organization fields to export
- Rarely modified
Sheet 3: Excel_Workbooks (Excel Export Metadata)
- Workbook names
- Template paths
- Output filenames (with template variables)
- File conflict handling strategy (Overwrite/Increment/Backup)
Sheet 4: Excel_Sheets (Sheet Configurations)
- Workbook name (reference to Excel_Workbooks)
- Sheet name (in template)
- Source type (Inclusions/Organizations/Variable)
- Target (cell or named range)
- Column mapping (JSON)
- Filter conditions (JSON with AND logic)
- Sort keys (JSON, multi-key with datetime support)
- Value replacements (JSON, strict type matching)
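For illustration, the JSON cells in an Excel_Sheets row might carry content like the following. This is a hypothetical sketch: the exact column names and JSON schema are defined in DOCUMENTATION_99_CONFIG_GUIDE.md, and in the workbook these values are stored as JSON strings.

```python
# Hypothetical Excel_Sheets cell contents, shown as Python structures before
# JSON serialization. Field names are invented for illustration.
import json

filter_conditions = {   # AND logic: every condition must match
    "Inclusion_Status": "incluse",
    "Consent_Signed": True,
}
sort_keys = [           # multi-key sort; datetime keys are parsed before comparing
    {"field": "Inclusion_Date", "type": "datetime", "order": "asc"},
    {"field": "Pseudo", "order": "asc"},
]
value_replacements = {  # strict type matching: boolean True, not the string "True"
    "Consent_Signed": {True: "Oui", False: "Non"},
}

print(json.dumps(filter_conditions))  # the string that would sit in the config cell
```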
Sheet 5: Regression_Check (Quality Rules)
- Rule names
- Field selection pipeline (include/exclude patterns)
- Scope (all organizations or specific org list)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
Layer 2: Organization Mapping (eb_org_center_mapping.xlsx)
- Optional mapping file
- Sheet: Org_Center_Mapping
- Maps organization names to center identifiers
- Gracefully degrades if missing
Layer 3: Excel Templates (config/templates/)
- Excel workbook templates with:
- Sheet definitions
- Named ranges (for data fill targets)
- Formula structures
- Formatting and styles
Configuration Constants (in code)
```python
# API Configuration
IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
GDD_URL = "https://api-lab.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"
RC_ENDOBEST_PROTOCOL_ID = "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e"

# Threading & Performance
MAX_THREADS = 20         # Main thread pool workers
ASYNC_THREADS = 40       # Subtasks thread pool workers
ERROR_MAX_RETRY = 10     # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5  # Seconds between retries

# Excluded Organizations
RC_ENDOBEST_EXCLUDED_CENTERS = ["e18e7487-...", "5582bd75-...", "e053512f-..."]
```
🔐 API Integration
Authentication Flow
1. IAM Login
POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login
Request: {"username": "...", "password": "..."}
Response: {"access_token": "jwt_master", "userId": "uuid"}
2. Token Exchange (RC-specific)
POST https://api-hcp.ziwig-connect.com/api/auth/config-token
Headers: Authorization: Bearer {master_token}
Request: {"userId": "...", "clientId": "...", "userAgent": "..."}
Response: {"access_token": "jwt_rc", "refresh_token": "refresh_token"}
3. Automatic Token Refresh (on 401)
POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken
Headers: Authorization: Bearer {current_token}
Request: {"refresh_token": "..."}
Response: {"access_token": "jwt_new", "refresh_token": "new_refresh"}
Key API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /api/inclusions/getAllOrganizations | GET | List all organizations |
| /api/inclusions/inclusion-statistics | POST | Get patient counts per org |
| /api/inclusions/search | POST | Get inclusions list for org (paginated) |
| /api/records/byPatient | POST | Get clinical record for patient |
| /api/surveys/filter/with-answers | POST | OPTIMIZED: Get all questionnaires for patient |
| /api/requests/by-tube-id/{tubeId} | GET | Get lab test results |
Performance Optimization: Questionnaire Batching
Problem: Multiple API calls per patient (1 call per questionnaire × N patients = slow)
Solution: Single optimized call retrieves all questionnaires with answers
BEFORE (inefficient):
```
for qcm_id in questionnaire_ids:
    GET /api/surveys/{qcm_id}/answers?subject={patient_id}
# Result: N API calls per patient
```
AFTER (optimized):
```
POST /api/surveys/filter/with-answers
{
  "context": "clinic_research",
  "subject": patient_id
}
# Result: 1 API call per patient
# Impact: 4-5x performance improvement
```
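In code, the optimized call might look like this minimal httpx sketch (request shape as documented above; response handling simplified):

```python
# Sketch of the single-call questionnaire fetch; the production code adds
# retry and token-refresh handling via its decorator.
import httpx

def fetch_all_questionnaires(client: httpx.Client, rc_url: str, token: str, patient_id: str) -> list:
    """One POST returns every questionnaire with its answers for a patient."""
    response = client.post(
        f"{rc_url}/api/surveys/filter/with-answers",
        headers={"Authorization": f"Bearer {token}"},
        json={"context": "clinic_research", "subject": patient_id},
    )
    response.raise_for_status()
    return response.json()  # list of questionnaires, each with embedded answers
```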
⚡ Multithreading & Performance Optimization
Thread Pool Architecture
Main Application Thread
↓
┌─ Phase 1: Counter Fetching ──────────────────────────┐
│ ThreadPoolExecutor(max_workers=user_input, cap=20) │
│ ├─ Task 1: Get counters for Org 1 │
│ ├─ Task 2: Get counters for Org 2 │
│ └─ Task N: Get counters for Org N │
│ [Sequential wait: tqdm.as_completed] │
└──────────────────────────────────────────────────────┘
↓
┌─ Phase 2: Inclusion Data Collection (Nested) ────────┐
│ Outer: ThreadPoolExecutor(max_workers=user_input) │
│ │
│ For Org 1: │
│ │ Inner: ThreadPoolExecutor(max_workers=40) │
│ │ ├─ Patient 1: Async lab/questionnaire fetch │
│ │ ├─ Patient 2: Async lab/questionnaire fetch │
│ │ └─ Patient N: Async lab/questionnaire fetch │
│ │ [Sequential outer wait: as_completed] │
│ │ │
│ For Org 2: │
│ │ [Similar parallel processing] │
│ │ │
│ For Org N: │
│ │ [Similar parallel processing] │
└──────────────────────────────────────────────────────┘
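A condensed sketch of this nested-pool layout, with stub fetch/process helpers standing in for the real API calls:

```python
# Simplified sketch of the nested pools: an outer pool over organizations and
# a shared inner pool for per-patient lab subtasks. Worker counts mirror the
# constants above; the fetch_*/process_* helpers are stand-in stubs.
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_THREADS = 20    # outer pool: one worker per organization
ASYNC_THREADS = 40  # inner pool: lab/questionnaire subtasks

def fetch_inclusions(org):           # stub
    return [{"id": f"{org}-p{i}"} for i in range(3)]

def fetch_lab_request(patient):      # stub: the slow, parallelizable call
    return {"status": "ok"}

def process_fields(patient):         # stub: CPU-side field extraction
    patient["processed"] = True

def process_organization(org, subtask_pool):
    inclusions = fetch_inclusions(org)
    for patient in inclusions:       # sequential per organization
        # submit the lab fetch first so it runs while fields are processed
        lab_future = subtask_pool.submit(fetch_lab_request, patient)
        process_fields(patient)
        patient["lab"] = lab_future.result()  # join only when the value is needed
    return inclusions

def collect_all(organizations):
    results = []
    with ThreadPoolExecutor(max_workers=ASYNC_THREADS) as subtask_pool, \
         ThreadPoolExecutor(max_workers=MAX_THREADS) as org_pool:
        futures = [org_pool.submit(process_organization, org, subtask_pool)
                   for org in organizations]
        for future in as_completed(futures):
            results.extend(future.result())
    return results

print(len(collect_all(["org-A", "org-B"])))  # -> 6
```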
Performance Optimizations
1. Thread-Local HTTP Clients
   - Each thread maintains its own httpx.Client
   - Avoids connection conflicts
   - Implemented via get_httpx_client()
2. Nested Parallelization
   - Main pool: Organizations (20 workers)
   - Subtasks pool: Lab requests (40 workers)
   - Non-blocking I/O during processing
3. Questionnaire Batching (4-5x improvement)
   - Single call retrieves all questionnaires + answers
   - Eliminates N filtered calls per patient
4. Configurable Worker Threads
   - User input selection (1-20 workers)
   - Tunable for network bandwidth and API rate limits
Progress Tracking (Multi-Level)
Overall Progress [████████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [██████████░░░░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██████░░░░░░░░░░░░░░░░░░░] 42/110
3/15 - Center 3 [████░░░░░░░░░░░░░░░░░░░░░] 28/85
Thread-Safe Updates:
```python
with _global_pbar_lock:
    if global_pbar:
        global_pbar.update(1)
```
🛡️ Error Handling & Resilience
Token Management Strategy
1. Automatic Token Refresh on 401
   - Triggered by the @api_call_with_retry decorator (sketched below)
   - Thread-safe via _token_refresh_lock
2. Retry Mechanism
   - Max retries: 10 attempts
   - Delay between retries: 0.5 seconds
   - Decorator: @api_call_with_retry
3. Thread-Safe Token Refresh

```python
def new_token():
    global access_token, refresh_token
    with _token_refresh_lock:  # Only one thread refreshes at a time
        for attempt in range(ERROR_MAX_RETRY):
            try:
                # POST /api/auth/refreshToken
                # Update global tokens
                ...
            except Exception:
                sleep(WAIT_BEFORE_RETRY)
```
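For illustration, a retry decorator in the spirit of @api_call_with_retry might look like the sketch below; the real decorator also detects HTTP 401 and triggers new_token() before retrying.

```python
# Illustrative retry decorator, not the production implementation.
import functools
import time

ERROR_MAX_RETRY = 10
WAIT_BEFORE_RETRY = 0.5

def api_call_with_retry(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        last_error = None
        for attempt in range(ERROR_MAX_RETRY):
            try:
                return func(*args, **kwargs)
            except Exception as error:   # the real code distinguishes 401 from network errors
                last_error = error
                time.sleep(WAIT_BEFORE_RETRY)
        raise last_error                  # all retries exhausted
    return wrapper

@api_call_with_retry
def get_organizations(client, url, token):
    response = client.get(url, headers={"Authorization": f"Bearer {token}"})
    response.raise_for_status()
    return response.json()
```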
Exception Handling Categories
| Category | Examples | Handling |
|---|---|---|
| API Errors | Network timeouts, HTTP errors | Retry with configurable delay |
| File I/O Errors | Missing config, permission denied | Graceful error + exit |
| Validation Errors | Invalid config, incoherent data | Log warning + prompt user |
| Thread Errors | Worker thread failures | Shutdown gracefully + propagate |
Graceful Degradation
- Missing Organization Mapping: Skip silently, use fallback (org name)
- Critical Quality Issues: Prompt user for confirmation before export
- Thread Failure: Shutdown all workers gracefully, preserve partial results
- Invalid Configuration: Clear error messages with remediation suggestions
📊 Data Output Structure
JSON Output: endobest_inclusions.json
```json
[
  {
    "Patient_Identification": {
      "Organisation_Id": "uuid",
      "Organisation_Name": "Hospital Name",
      "Center_Name": "HOSP-A",
      "Patient_Id": "internal_id",
      "Pseudo": "ENDO-001",
      "Patient_Name": "Doe, John",
      "Patient_Birthday": "1975-05-15",
      "Patient_Age": 49
    },
    "Inclusion": {
      "Consent_Signed": true,
      "Inclusion_Date": "15/10/2024",
      "Inclusion_Status": "incluse",
      "isPrematurelyTerminated": false
    },
    "Extended_Fields": {
      "Custom_Field_1": "value",
      "Custom_Field_2": 42,
      "Composite_Score": "8/10"
    },
    "Endotest": {
      "Request_Sent": true,
      "Diagnostic_Status": "Completed"
    }
  }
]
```
JSON Output: endobest_organizations.json
```json
[
  {
    "id": "org-uuid",
    "name": "Hospital A",
    "Center_Name": "HOSP-A",
    "patients_count": 45,
    "preincluded_count": 8,
    "included_count": 35,
    "prematurely_terminated_count": 2
  }
]
```
🚀 Execution Modes
Mode 1: Normal (Full Collection)
python eb_dashboard.py
- Authenticates
- Collects from APIs
- Runs quality checks
- Exports JSON + Excel
- Duration: 2.5-5 minutes (typical)
Mode 2: Excel-Only (Fast Export)
python eb_dashboard.py --excel-only
- Skips data collection
- Uses existing JSON files
- Regenerates Excel workbooks
- Duration: 5-15 seconds
- Use case: Reconfigure reports, test templates
Mode 3: Check-Only (Validation Only)
python eb_dashboard.py --check-only
- Loads existing JSON
- Runs quality checks
- No export
- Duration: 5-10 seconds
- Use case: Verify data before distribution
Mode 4: Debug (Verbose Output)
python eb_dashboard.py --debug
- Executes normal mode
- Enables detailed logging
- Shows field-by-field changes
- Check dashboard.log for details
📈 Performance Metrics & Benchmarks
Typical Execution Times (Full Dataset: 1,200+ patients, 15+ organizations)
| Phase | Duration | Notes |
|---|---|---|
| Login & Config | 2-3 sec | Sequential, network-dependent |
| Fetch Counters | 5-8 sec | 20 workers, parallelized |
| Collect Inclusions | 2-4 min | Includes API calls + field processing |
| Quality Checks | 10-15 sec | File loads, data comparison |
| Export to JSON | 3-5 sec | File I/O |
| Export to Excel | 5-15 sec | Template processing + fill |
| TOTAL | ~2.5-5 min | Depends on network, API perf |
Network Optimization Impact
With old questionnaire approach (N filtered calls per patient):
- 1,200 patients × 15 questionnaires = 18,000 API calls
- Estimated: 15-30 minutes
With optimized single-call questionnaire:
- 1,200 patients × 1 call = 1,200 API calls
- Estimated: 2-5 minutes
- Improvement: 3-6x faster ✅
🔍 Field Extraction & Processing Logic
Complete Field Processing Pipeline
For each field in INCLUSIONS_MAPPING_CONFIG:
│
├─ Step 1: Determine Source Type
│ ├─ q_id / q_name / q_category → Find questionnaire
│ ├─ record → Use clinical record
│ ├─ inclusion → Use patient inclusion data
│ ├─ request → Use lab request data
│ └─ calculated → Execute custom function
│
├─ Step 2: Extract Raw Value
│ ├─ Navigate JSON using field_path
│ ├─ Supports wildcard (*) for list traversal
│ └─ Return value or "undefined"
│
├─ Step 3: Check Field Condition (optional)
│ ├─ If condition undefined → Set to "undefined"
│ ├─ If condition not boolean → Error flag
│ ├─ If condition false → Set to "N/A"
│ └─ If condition true → Continue
│
├─ Step 4: Apply Post-Processing Transformations
│ ├─ true_if_any: Convert to boolean
│ ├─ value_labels: Map to localized text
│ ├─ field_template: Apply formatting
│ └─ List joining: Flatten arrays with pipe delimiter
│
├─ Step 5: Format Score Dictionaries
│ ├─ If {total, max} → Format as "total/max"
│ └─ Otherwise → Keep as-is
│
└─ Store: output_inclusion[field_group][field_name] = final_value
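The gating and formatting steps can be illustrated with a small sketch. This is not the production code: the helper below condenses Steps 3-5 under assumed semantics taken from the diagram above.

```python
# Condensed sketch of Steps 3-5 of the field pipeline (condition check,
# value-label mapping, score formatting). Helper name and arguments are illustrative.
def process_field(raw_value, condition=None, value_labels=None):
    # Step 3: the field condition gates the value
    if condition is not None:
        if condition == "undefined":
            return "undefined"
        if not isinstance(condition, bool):
            return "ERROR: condition not boolean"
        if condition is False:
            return "N/A"
    # Step 4: map raw values to localized labels
    if value_labels and raw_value in value_labels:
        raw_value = value_labels[raw_value]
    # Step 5: format {total, max} score dictionaries as "total/max"
    if isinstance(raw_value, dict) and {"total", "max"} <= raw_value.keys():
        return f"{raw_value['total']}/{raw_value['max']}"
    return raw_value

print(process_field({"total": 8, "max": 10}))                             # -> "8/10"
print(process_field("yes", condition=True, value_labels={"yes": "Oui"}))  # -> "Oui"
print(process_field("yes", condition=False))                              # -> "N/A"
```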
Custom Functions for Calculated Fields
| Function | Purpose | Syntax |
|---|---|---|
| search_in_fields_using_regex | Search multiple fields for a pattern | ["search_in_fields_using_regex", "pattern", "field1", "field2"] |
| extract_parentheses_content | Extract text within parentheses | ["extract_parentheses_content", "field_name"] |
| append_terminated_suffix | Add suffix if patient terminated | ["append_terminated_suffix", "status_field", "is_terminated_field"] |
| if_then_else | Unified conditional with 8 operators | ["if_then_else", "operator", arg1, arg2_optional, true_result, false_result] |
if_then_else Operators:
- is_true / is_false - Boolean field test
- is_defined / is_undefined - Existence test
- all_true / all_defined - Multiple field test
- == / != - Value comparison
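A minimal sketch of how such an evaluator could dispatch on these operators (argument handling simplified relative to the real Block 6 implementation):

```python
# Illustrative if_then_else evaluator; operator names come from the list above,
# but the dispatch and argument conventions here are hypothetical.
def if_then_else(fields, operator, arg1, arg2, true_result, false_result):
    value = fields.get(arg1, "undefined")
    checks = {
        "is_true": lambda: value is True,
        "is_false": lambda: value is False,
        "is_defined": lambda: value != "undefined",
        "is_undefined": lambda: value == "undefined",
        "==": lambda: value == arg2,
        "!=": lambda: value != arg2,
        "all_true": lambda: all(fields.get(f) is True for f in [arg1] + (arg2 or [])),
        "all_defined": lambda: all(fields.get(f, "undefined") != "undefined"
                                   for f in [arg1] + (arg2 or [])),
    }
    return true_result if checks[operator]() else false_result

fields = {"Consent_Signed": True}
print(if_then_else(fields, "is_true", "Consent_Signed", None, "Oui", "Non"))  # -> "Oui"
```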
✅ Quality Assurance Framework
Coherence Check
Purpose: Verify API-provided statistics match actual collected data
Logic:
For each organization:
API_Count = statistic.total
Actual_Count = count of inclusion records
if API_Count != Actual_Count:
Report discrepancy with severity
├─ ±10%: Warning
└─ >±10%: Critical
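A runnable sketch of this logic, assuming the JSON shapes shown in the Data Output Structure section; the ±10% severity split follows the diagram above, the rest is illustrative:

```python
# Illustrative coherence check, not the production coherence_check().
def coherence_check(organizations, inclusions):
    """Compare API-provided counters against actually collected inclusions."""
    issues = []
    for org in organizations:
        api_count = org["patients_count"]
        actual = sum(1 for inc in inclusions
                     if inc["Patient_Identification"]["Organisation_Id"] == org["id"])
        if api_count != actual:
            deviation = abs(api_count - actual) / max(api_count, 1)
            severity = "Critical" if deviation > 0.10 else "Warning"
            issues.append((org["name"], api_count, actual, severity))
    return issues

orgs = [{"id": "org-1", "name": "Hospital A", "patients_count": 145}]
incs = [{"Patient_Identification": {"Organisation_Id": "org-1"}}] * 143
for name, expected, actual, severity in coherence_check(orgs, incs):
    print(f"{severity}: {name} API={expected} actual={actual}")  # Warning: 145 vs 143
```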
Non-Regression Check
Purpose: Detect unexpected changes between data runs
Configuration-Driven Rules:
- Field selection pipeline (include/exclude patterns)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
- Exception handling (exclude specific organizations)
Logic:
Load previous inclusion data (_old file)
For each rule:
├─ Build candidate fields via pipeline
├─ Determine key field for matching
└─ For each inclusion:
├─ Find matching old inclusion by key
├─ Check for unexpected transitions
├─ Apply exceptions
└─ Report violations
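A simplified sketch of the transition-pattern matching, with a hypothetical rule format (the real rules come from the Regression_Check sheet and support include/exclude pipelines and per-organization exceptions):

```python
# Illustrative non-regression check; rule structure is invented for this sketch.
def non_regression_check(old_inclusions, new_inclusions, rule):
    old_by_key = {inc[rule["key_field"]]: inc for inc in old_inclusions}
    violations = []
    for inc in new_inclusions:
        old = old_by_key.get(inc[rule["key_field"]])
        if old is None:
            continue  # new inclusion: reported separately, not a transition violation
        before, after = old[rule["field"]], inc[rule["field"]]
        if before != after and (before, after) not in rule["allowed_transitions"]:
            violations.append((inc[rule["key_field"]], before, after, rule["severity"]))
    return violations

rule = {
    "key_field": "Pseudo",
    "field": "Inclusion_Status",
    "allowed_transitions": {("preincluse", "incluse")},  # expected forward move
    "severity": "Critical",
}
old = [{"Pseudo": "ENDO-001", "Inclusion_Status": "incluse"}]
new = [{"Pseudo": "ENDO-001", "Inclusion_Status": "preincluse"}]  # unexpected rollback
print(non_regression_check(old, new, rule))
```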
📋 Documentation Structure
The system includes comprehensive documentation:
| Document | Size | Content |
|---|---|---|
| DOCUMENTATION_10_ARCHITECTURE.md | 43.7 KB | System design, workflow, APIs, multithreading |
| DOCUMENTATION_11_FIELD_MAPPING.md | 56.3 KB | Field extraction logic, custom functions, examples |
| DOCUMENTATION_12_QUALITY_CHECKS.md | 60.2 KB | QA framework, regression rules, configuration |
| DOCUMENTATION_13_EXCEL_EXPORT.md | 29.6 KB | Excel generation, data transformation, config |
| DOCUMENTATION_98_USER_GUIDE.md | 8.4 KB | End-user instructions, troubleshooting, FAQ |
| DOCUMENTATION_99_CONFIG_GUIDE.md | 24.8 KB | Administrator reference, Excel tables, examples |
🔧 Key Technical Features
Thread Safety
- Per-thread HTTP clients (no connection conflicts)
- Synchronized access to global state via locks
- Thread-safe progress bar updates
Error Recovery
- Automatic token refresh on 401 errors
- Retry logic with configurable attempts and delay
- Graceful degradation for optional features
- User confirmation on critical issues
Configuration Flexibility
- 100% externalized to Excel (zero code changes)
- Supports multiple data sources
- Custom business logic functions
- Field dependencies and conditions
- Value transformations and templates
Performance
- Optimized API calls (4-5x improvement)
- Parallel processing (20+ workers)
- Async I/O operations
- Configurable thread pools
Data Quality
- Coherence checking (stats vs actual data)
- Non-regression testing (config-driven)
- Comprehensive validation
- Audit trail logging
📦 Dependencies
Core Libraries
- httpx - HTTP client with connection pooling
- openpyxl - Excel file reading/writing
- questionary - Interactive CLI prompts
- tqdm - Progress bars
- rich - Rich text formatting
- pywin32 - Windows COM automation (optional, for formula recalculation)
- pytz - Timezone support (optional)
Python Version
- Python 3.7+
External Services
- Ziwig IAM API
- Ziwig Research Clinic (RC) API
- Ziwig Lab (GDD) API
🎓 Usage Patterns
For End Users
- Configure fields in Excel (no code needed)
- Run: python eb_dashboard.py
- Review results in JSON or Excel
For Administrators
- Add new fields to Inclusions_Mapping
- Define quality rules in Regression_Check
- Configure Excel export in Excel_Workbooks + Excel_Sheets
- Restart: the script picks up the config automatically
For Developers
- Add a custom function to Block 6 (eb_dashboard.py)
- Register it in the field config (Inclusions_Mapping)
- Use via: "source_id": "function_name"
- No recompile needed for other changes
🎯 Summary
The Endobest Clinical Research Dashboard represents a mature, production-ready system that successfully combines:
✅ Architectural Excellence - Clean modular design with separation of concerns
✅ User-Centric Configuration - 100% externalized, no code changes needed
✅ Performance Optimization - 4-5x faster via API and threading improvements
✅ Robust Resilience - Comprehensive error handling, automatic recovery, graceful degradation
✅ Quality Assurance - Multi-level validation, coherence checks, regression testing
✅ Comprehensive Documentation - 250+ KB of technical and user guides
✅ Maintainability - Clear code structure, extensive logging, audit trails
The system successfully enables non-technical users to configure complex data extraction and reporting workflows while maintaining enterprise-grade reliability and performance standards.
Document Version: 1.0 Last Updated: 2025-11-08 Status: ✅ Complete & Production Ready