📊 Endobest Clinical Research Dashboard - Architecture Summary
Last Updated: 2025-11-08 Project Status: Production Ready with Excel Export Feature Language: Python 3.x
🎯 Executive Summary
The Endobest Clinical Research Dashboard is a sophisticated, production-grade automated data collection and reporting system designed to aggregate patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations. The system combines high-performance multithreading, comprehensive quality assurance, and fully externalized configuration to enable non-technical users to manage complex data extraction workflows without code modifications.
Core Value Propositions
✅ 100% Externalized Configuration - All field definitions, quality rules, and export logic defined in Excel
✅ High-Performance Architecture - 4-5x faster via optimized API calls and parallel processing
✅ Robust Resilience - Automatic token refresh, retries, graceful degradation
✅ Comprehensive Quality Assurance - Coherence checks + config-driven regression testing
✅ Multi-Format Export - JSON + configurable Excel workbooks with data transformation
✅ User-Friendly Interface - Interactive prompts, progress tracking, clear error messages
📁 Project Structure
Endobest Dashboard/
├── 📜 MAIN SCRIPT
│ └── eb_dashboard.py (57.5 KB, 1,021 lines)
│ Core orchestrator for data collection, processing, and export
│
├── 🔧 UTILITY MODULES
│ ├── eb_dashboard_utils.py (6.4 KB, 184 lines)
│ │ Thread-safe HTTP clients, nested data navigation, config resolution
│ │
│ ├── eb_dashboard_quality_checks.py (58.5 KB, 1,266 lines)
│ │ Coherence checks, non-regression testing, data validation
│ │
│ └── eb_dashboard_excel_export.py (32 KB, ~1,000 lines)
│ Configuration-driven Excel workbook generation
│
├── 📚 DOCUMENTATION
│ ├── DOCUMENTATION_10_ARCHITECTURE.md (43.7 KB)
│ │ System design, data flow, API integration, multithreading
│ │
│ ├── DOCUMENTATION_11_FIELD_MAPPING.md (56.3 KB)
│ │ Field extraction logic, custom functions, transformations
│ │
│ ├── DOCUMENTATION_12_QUALITY_CHECKS.md (60.2 KB)
│ │ Quality assurance framework, regression rules, validation logic
│ │
│ ├── DOCUMENTATION_13_EXCEL_EXPORT.md (29.6 KB)
│ │ Excel generation architecture, data transformation pipeline
│ │
│ ├── DOCUMENTATION_98_USER_GUIDE.md (8.4 KB)
│ │ End-user instructions, quick start, troubleshooting
│ │
│ └── DOCUMENTATION_99_CONFIG_GUIDE.md (24.8 KB)
│ Administrator configuration reference
│
├── ⚙️ CONFIGURATION
│ └── config/
│ ├── Endobest_Dashboard_Config.xlsx (Configuration file)
│ │ Inclusions_Mapping
│ │ Organizations_Mapping
│ │ Excel_Workbooks
│ │ Excel_Sheets
│ │ Regression_Check
│ │
│ ├── eb_org_center_mapping.xlsx (Organization enrichment)
│ │
│ └── templates/
│ ├── Endobest_Template.xlsx
│ ├── Statistics_Template.xlsx
│ └── (Other Excel templates)
│
├── 📊 OUTPUT FILES
│ ├── endobest_inclusions.json (~6-7 MB, patient data)
│ ├── endobest_inclusions_old.json (backup)
│ ├── endobest_organizations.json (~17-20 KB, stats)
│ ├── endobest_organizations_old.json (backup)
│ ├── [Excel outputs] (*.xlsx, configurable)
│ └── dashboard.log (Execution log)
│
└── 🔨 EXECUTABLES
├── eb_dashboard.exe (16.5 MB, PyInstaller build)
└── [Various .bat launch scripts]
🏗️ System Architecture Overview
High-Level Component Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ ENDOBEST DASHBOARD MAIN PROCESS │
│ eb_dashboard.py │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 1: INITIALIZATION & AUTHENTICATION │ │
│ │ ├─ User Login (IAM API) │ │
│ │ ├─ Token Exchange (RC-specific) │ │
│ │ ├─ Config Loading (Excel parsing & validation) │ │
│ │ └─ Thread Pool Setup (20 workers main, 40 subtasks) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 2: ORGANIZATION & COUNTERS RETRIEVAL │ │
│ │ ├─ Get All Organizations (getAllOrganizations API) │ │
│ │ ├─ Fetch Counters Parallelized (20 workers) │ │
│ │ ├─ Enrich with Center Mapping (optional) │ │
│ │ └─ Calculate Totals & Sort │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 3: PATIENT INCLUSION DATA COLLECTION │ │
│ │ Outer Loop: Organizations (20 parallel workers) │ │
│ │ ├─ For Each Organization: │ │
│ │ │ ├─ Get Inclusions List (POST /api/inclusions/search) │ │
│ │ │ └─ For Each Patient (Sequential): │ │
│ │ │ ├─ Fetch Clinical Record (API) │ │
│ │ │ ├─ Fetch All Questionnaires (Optimized: 1 call) │ │
│ │ │ ├─ Fetch Lab Requests (Async pool) │ │
│ │ │ ├─ Process Field Mappings (extraction + transform) │ │
│ │ │ └─ Update Progress Bars (thread-safe) │ │
│ │ │ │ │
│ │ │ Inner Async: Lab/Questionnaire Fetches (40 workers) │ │
│ │ │ (Non-blocking I/O during main processing) │ │
│ │ └─ Combine Inclusions from All Orgs │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 4: QUALITY ASSURANCE & VALIDATION │ │
│ │ ├─ Coherence Check (API stats vs actual data) │ │
│ │ │ └─ Compares counters with detailed records │ │
│ │ ├─ Non-Regression Check (config-driven) │ │
│ │ │ └─ Detects changes with severity levels │ │
│ │ └─ Critical Issue Handling (user confirmation if needed) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 5: EXPORT & PERSISTENCE │ │
│ │ ├─ Backup Old Files (if quality passed) │ │
│ │ ├─ Write JSON Outputs (endobest_inclusions.json, etc.) │ │
│ │ ├─ Export to Excel (if configured) │ │
│ │ │ ├─ Load Templates │ │
│ │ │ ├─ Apply Filters & Sorts │ │
│ │ │ ├─ Fill Data into Sheets │ │
│ │ │ ├─ Replace Values │ │
│ │ │ └─ Recalculate Formulas (win32com) │ │
│ │ └─ Display Summary & Elapsed Time │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ EXIT │
└─────────────────────────────────────────────────────────────────────┘
↓ EXTERNAL DEPENDENCIES ↓
┌─────────────────────────────────────────────────────────────────────┐
│ EXTERNAL APIS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 🔐 AUTHENTICATION (IAM) │
│ └─ api-auth.ziwig-connect.com │
│ ├─ POST /api/auth/ziwig-pro/login │
│ └─ POST /api/auth/refreshToken │
│ │
│ 🏥 RESEARCH CLINIC (RC) │
│ └─ api-hcp.ziwig-connect.com │
│ ├─ POST /api/auth/config-token │
│ ├─ GET /api/inclusions/getAllOrganizations │
│ ├─ POST /api/inclusions/inclusion-statistics │
│ ├─ POST /api/inclusions/search │
│ ├─ POST /api/records/byPatient │
│ └─ POST /api/surveys/filter/with-answers (optimized!) │
│ │
│ 🧪 LAB / DIAGNOSTICS (GDD) │
│ └─ api-lab.ziwig-connect.com │
│ └─ GET /api/requests/by-tube-id/{tubeId} │
│ │
│ 📝 EXCEL TEMPLATES │
│ └─ config/templates/ │
│ ├─ Endobest_Template.xlsx │
│ ├─ Statistics_Template.xlsx │
│ └─ (Custom templates) │
│ │
└─────────────────────────────────────────────────────────────────────┘
🔌 Module Descriptions
1. eb_dashboard.py - Main Orchestrator (57.5 KB)
Responsibility: Complete data collection workflow, API coordination, multithreaded execution
Structure (9 Blocks):
| Block | Purpose | Key Functions |
|---|---|---|
| 1 | Configuration & Infrastructure | Constants, global vars, progress bar setup |
| 2 | Decorators & Resilience | @api_call_with_retry, retry logic |
| 3 | Authentication | login(), token exchange, IAM integration |
| 3B | File Utilities | load_json_file() |
| 4 | Inclusions Mapping Config | load_inclusions_mapping_config(), validation |
| 5 | Data Search & Extraction | Questionnaire finding, field retrieval |
| 6 | Custom Functions | Business logic, calculated fields |
| 7 | Business API Calls | RC, GDD, organization endpoints |
| 7b | Organization Center Mapping | load_org_center_mapping() |
| 8 | Processing Orchestration | process_organization_patients(), patient data processing |
| 9 | Main Execution | Entry point, quality checks, export |
Key Technologies:
- httpx - HTTP client (with thread-local instances)
- openpyxl - Excel parsing
- concurrent.futures.ThreadPoolExecutor - Parallel execution
- tqdm - Progress tracking
- questionary - Interactive prompts
2. eb_dashboard_utils.py - Utility Functions (6.4 KB)
Responsibility: Generic, reusable utility functions shared across modules
Core Functions:
```python
get_httpx_client()      # Thread-local HTTP client management
get_thread_position()   # Progress bar positioning
get_nested_value()      # JSON path navigation with wildcard support (*)
get_config_path()       # Config folder resolution (script vs PyInstaller)
get_old_filename()      # Backup filename generation
```
Key Features:
- Thread-safe HTTP client pooling
- Wildcard support in nested JSON paths (e.g., ["items", "*", "value"])
- Cross-platform path resolution
3. eb_dashboard_quality_checks.py - QA & Validation (58.5 KB)
Responsibility: Quality assurance, data validation, regression checking
Core Functions:
| Function | Purpose |
|---|---|
| load_regression_check_config() | Load regression rules from Excel |
| run_quality_checks() | Orchestrate all QA checks |
| coherence_check() | Verify stats vs detailed data consistency |
| non_regression_check() | Config-driven change validation |
| run_check_only_mode() | Standalone validation mode |
| backup_output_files() | Create versioned backups |
Quality Check Types:
1. Coherence Check
   - Compares API-provided organization statistics vs. actual inclusion counts
   - Severity: Warning/Critical
   - Example: Total API count (145) vs. actual inclusions (143)
2. Non-Regression Check
   - Compares current vs. previous run data
   - Applies config-driven rules with transition patterns
   - Detects: new inclusions, deletions, field changes
   - Severity: Warning/Critical with exceptions
4. eb_dashboard_excel_export.py - Excel Generation & Orchestration (38 KB, v1.1+)
Responsibility: Configuration-driven Excel workbook generation with data transformation + high-level orchestration
Core Functions (Low-Level):
| Function | Purpose |
|---|---|
| load_excel_export_config() | Load Excel_Workbooks + Excel_Sheets config |
| validate_excel_config() | Validate templates and named ranges |
| export_to_excel() | Main export orchestration (openpyxl + win32com) |
| _apply_filter() | AND-condition filtering |
| _apply_sort() | Multi-key sorting with datetime support |
| _apply_value_replacement() | Strict type matching value transformation |
| _handle_output_exists() | File conflict resolution |
| _recalculate_workbook() | Formula recalculation via win32com |
| _process_sheet() | Sheet-specific data filling |
High-Level Orchestration Functions (v1.1+):
| Function | Purpose | Called From |
|---|---|---|
| export_excel_only() | Complete --excel-only mode | main() CLI detection |
| run_normal_mode_export() | Normal mode export phase | main() after JSON write |
| prepare_excel_export() | Preparation + validation | Both orchestration functions |
| execute_excel_export() | Execution with error handling | Both orchestration functions |
| _load_json_file_internal() | Safe JSON loading | run_normal_mode_export() |
Data Transformation Pipeline:
1. Load Configuration (Excel_Workbooks + Excel_Sheets)
2. For each workbook:
a. Load template (openpyxl)
b. For each sheet:
- Apply filter (AND conditions)
- Apply sort (multi-key)
- Apply value replacement (strict type matching)
- Fill data into cells/named ranges
c. Handle file conflicts (Overwrite/Increment/Backup)
d. Save workbook (openpyxl)
e. Recalculate formulas (win32com - optional)
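As a concrete illustration of this pipeline, here is a minimal sketch of the three per-sheet transformation steps applied to row dictionaries. The helper logic is simplified and the field names are hypothetical; the real _apply_filter(), _apply_sort(), and _apply_value_replacement() implementations handle more cases.

```python
# Minimal sketch of the per-sheet transformation steps (filter -> sort -> replace).
# Field names and structures here are hypothetical.
from datetime import datetime

rows = [
    {"Inclusion_Status": "incluse", "Inclusion_Date": "15/10/2024", "Consent_Signed": True},
    {"Inclusion_Status": "preincluse", "Inclusion_Date": "02/09/2024", "Consent_Signed": False},
]

# 1. Filter: every condition must hold (AND logic)
filter_conditions = {"Consent_Signed": True}
rows = [r for r in rows if all(r.get(k) == v for k, v in filter_conditions.items())]

# 2. Sort: multi-key, with datetime parsing for date columns
def sort_key(row):
    return (datetime.strptime(row["Inclusion_Date"], "%d/%m/%Y"), row["Inclusion_Status"])
rows.sort(key=sort_key)

# 3. Value replacement: strict type matching (True is replaced, the string "True" is not)
replacements = {"Consent_Signed": {True: "Oui", False: "Non"}}
for row in rows:
    for field, mapping in replacements.items():
        for old, new in mapping.items():
            if type(row.get(field)) is type(old) and row.get(field) == old:
                row[field] = new

print(rows)  # filtered, sorted, transformed rows ready to fill into the sheet
```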
Orchestration Pattern (v1.1+):
As of v1.1, the system delegates all export orchestration to dedicated functions, following the pattern established by run_check_only_mode() from the quality checks module:
- --excel-only mode: the main script calls a single function → export_excel_only() handles everything
- Normal mode export: the main script calls a single function → run_normal_mode_export() handles everything
This keeps the main script focused on business logic while all export mechanics are encapsulated in the module.
🔄 Complete Data Collection Workflow
Phase 1: Initialization (2-3 seconds)
- User provides credentials (with defaults)
- IAM Login: POST /api/auth/ziwig-pro/login
- Token Exchange: POST /api/auth/config-token
- Load configuration from Endobest_Dashboard_Config.xlsx
- Validate field mappings and quality check rules
- Setup thread pools (main: 20 workers, subtasks: 40 workers)
Phase 2: Organization Retrieval (5-8 seconds)
- Get all organizations: GET /api/inclusions/getAllOrganizations
- Filter excluded centers (config-driven)
- Fetch counters in parallel (20 workers):
  - For each org: POST /api/inclusions/inclusion-statistics
  - Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
- Optional: Enrich with center mapping (from eb_org_center_mapping.xlsx)
- Calculate totals and sort
Phase 3: Patient Data Collection (2-4 minutes)
Nested Parallel Architecture:
Outer Loop (20 workers): For each organization
- POST /api/inclusions/search?limit=1000&page=1 → Get up to 1,000 inclusions

Middle Loop (Sequential): For each patient
- Fetch clinical record: POST /api/records/byPatient
- Fetch questionnaires: POST /api/surveys/filter/with-answers (optimized: 1 call)
- Submit async lab request: GET /api/requests/by-tube-id/{tubeId} (in subtasks pool)

Inner Loop (40 async workers): Non-blocking lab/questionnaire processing
- Parallel fetches of lab requests while the main thread processes fields
Field Processing (per patient):
- For each field in the configuration:
  - Determine source (questionnaire, record, inclusion, request, calculated)
  - Extract raw value (supports JSON paths with wildcards)
  - Check field condition (optional)
  - Apply post-processing transformations
  - Format score dictionaries
  - Store in nested output structure
Phase 4: Quality Assurance (10-15 seconds)
- Coherence Check: Compare API counters vs. actual data
- Non-Regression Check: Compare current vs. previous run with config rules
- Critical Issue Handling:
  - If NO critical issues → continue to export
  - If critical issues found → prompt user for override
Phase 5: Export & Persistence (3-5 seconds)
Step 1: Backup & JSON Write
- Backup old files (if quality checks passed)
- Write JSON outputs:
endobest_inclusions.json(6-7 MB)endobest_organizations.json(17-20 KB)
Step 2: Excel Export (if configured)
Delegated to the run_normal_mode_export() function, which handles:
- Load JSONs from the filesystem (ensures consistency)
- Load Excel configuration
- Validate templates and named ranges
- For each configured workbook:
  - Load template file
  - Apply filter conditions (AND logic)
  - Apply multi-key sort
  - Apply value replacements (strict type matching)
  - Fill data into cells/named ranges
  - Handle file conflicts (Overwrite/Increment/Backup)
  - Save workbook
  - Recalculate formulas (optional, via win32com)
- Display results and return status
Step 3: Summary
- Display elapsed time
- Report file locations
- Note any warnings/errors during export
⚙️ Configuration System
Three-Layer Configuration Architecture
Layer 1: Excel Configuration (Endobest_Dashboard_Config.xlsx)
Sheet 1: Inclusions_Mapping (Field Extraction)
- Define which patient fields to extract
- Specify sources (questionnaire, record, inclusion, request, calculated)
- Configure transformations (value labels, templates, conditions)
- Typically 50+ fields configured
Sheet 2: Organizations_Mapping (Organization Fields)
- Define which organization fields to export
- Rarely modified
Sheet 3: Excel_Workbooks (Excel Export Metadata)
- Workbook names
- Template paths
- Output filenames (with template variables)
- File conflict handling strategy (Overwrite/Increment/Backup)
Sheet 4: Excel_Sheets (Sheet Configurations)
- Workbook name (reference to Excel_Workbooks)
- Sheet name (in template)
- Source type (Inclusions/Organizations/Variable)
- Target (cell or named range)
- Column mapping (JSON)
- Filter conditions (JSON with AND logic)
- Sort keys (JSON, multi-key with datetime support)
- Value replacements (JSON, strict type matching)
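For illustration, the JSON cells in an Excel_Sheets row might carry content like the following. This is a hypothetical sketch: the exact column names and JSON schema are defined in DOCUMENTATION_99_CONFIG_GUIDE.md, and in the workbook these values are stored as JSON strings.

```python
# Hypothetical Excel_Sheets cell contents, shown as Python structures before
# JSON serialization. Field names are invented for illustration.
import json

filter_conditions = {   # AND logic: every condition must match
    "Inclusion_Status": "incluse",
    "Consent_Signed": True,
}
sort_keys = [           # multi-key sort; datetime keys are parsed before comparing
    {"field": "Inclusion_Date", "type": "datetime", "order": "asc"},
    {"field": "Pseudo", "order": "asc"},
]
value_replacements = {  # strict type matching: boolean True, not the string "True"
    "Consent_Signed": {True: "Oui", False: "Non"},
}

print(json.dumps(filter_conditions))  # the string that would sit in the config cell
```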
Sheet 5: Regression_Check (Quality Rules)
- Rule names
- Field selection pipeline (include/exclude patterns)
- Scope (all organizations or specific org list)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
Layer 2: Organization Mapping (eb_org_center_mapping.xlsx)
- Optional mapping file
- Sheet: Org_Center_Mapping
- Maps organization names to center identifiers
- Gracefully degrades if missing
Layer 3: Excel Templates (config/templates/)
- Excel workbook templates with:
- Sheet definitions
- Named ranges (for data fill targets)
- Formula structures
- Formatting and styles
Configuration Constants (in code)
```python
# API Configuration
IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
GDD_URL = "https://api-lab.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"
RC_ENDOBEST_PROTOCOL_ID = "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e"

# Threading & Performance
MAX_THREADS = 20         # Main thread pool workers
ASYNC_THREADS = 40       # Subtasks thread pool workers
ERROR_MAX_RETRY = 10     # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5  # Seconds between retries

# Excluded Organizations
RC_ENDOBEST_EXCLUDED_CENTERS = ["e18e7487-...", "5582bd75-...", "e053512f-..."]
```
🔐 API Integration
Authentication Flow
1. IAM Login
POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login
Request: {"username": "...", "password": "..."}
Response: {"access_token": "jwt_master", "userId": "uuid"}
2. Token Exchange (RC-specific)
POST https://api-hcp.ziwig-connect.com/api/auth/config-token
Headers: Authorization: Bearer {master_token}
Request: {"userId": "...", "clientId": "...", "userAgent": "..."}
Response: {"access_token": "jwt_rc", "refresh_token": "refresh_token"}
3. Automatic Token Refresh (on 401)
POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken
Headers: Authorization: Bearer {current_token}
Request: {"refresh_token": "..."}
Response: {"access_token": "jwt_new", "refresh_token": "new_refresh"}
Key API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /api/inclusions/getAllOrganizations | GET | List all organizations |
| /api/inclusions/inclusion-statistics | POST | Get patient counts per org |
| /api/inclusions/search | POST | Get inclusions list for org (paginated) |
| /api/records/byPatient | POST | Get clinical record for patient |
| /api/surveys/filter/with-answers | POST | OPTIMIZED: Get all questionnaires for patient |
| /api/requests/by-tube-id/{tubeId} | GET | Get lab test results |
Performance Optimization: Questionnaire Batching
Problem: Multiple API calls per patient (1 call per questionnaire × N patients = slow)
Solution: Single optimized call retrieves all questionnaires with answers
BEFORE (inefficient):
```
for qcm_id in questionnaire_ids:
    GET /api/surveys/{qcm_id}/answers?subject={patient_id}
# Result: N API calls per patient
```
AFTER (optimized):
```
POST /api/surveys/filter/with-answers
{
  "context": "clinic_research",
  "subject": patient_id
}
# Result: 1 API call per patient
# Impact: 4-5x performance improvement
```
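In code, the optimized call might look like this minimal httpx sketch (request shape as documented above; response handling simplified):

```python
# Sketch of the single-call questionnaire fetch; the production code adds
# retry and token-refresh handling via its decorator.
import httpx

def fetch_all_questionnaires(client: httpx.Client, rc_url: str, token: str, patient_id: str) -> list:
    """One POST returns every questionnaire with its answers for a patient."""
    response = client.post(
        f"{rc_url}/api/surveys/filter/with-answers",
        headers={"Authorization": f"Bearer {token}"},
        json={"context": "clinic_research", "subject": patient_id},
    )
    response.raise_for_status()
    return response.json()  # list of questionnaires, each with embedded answers
```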
⚡ Multithreading & Performance Optimization
Thread Pool Architecture
Main Application Thread
↓
┌─ Phase 1: Counter Fetching ──────────────────────────┐
│ ThreadPoolExecutor(max_workers=user_input, cap=20) │
│ ├─ Task 1: Get counters for Org 1 │
│ ├─ Task 2: Get counters for Org 2 │
│ └─ Task N: Get counters for Org N │
│ [Sequential wait: tqdm.as_completed] │
└──────────────────────────────────────────────────────┘
↓
┌─ Phase 2: Inclusion Data Collection (Nested) ────────┐
│ Outer: ThreadPoolExecutor(max_workers=user_input) │
│ │
│ For Org 1: │
│ │ Inner: ThreadPoolExecutor(max_workers=40) │
│ │ ├─ Patient 1: Async lab/questionnaire fetch │
│ │ ├─ Patient 2: Async lab/questionnaire fetch │
│ │ └─ Patient N: Async lab/questionnaire fetch │
│ │ [Sequential outer wait: as_completed] │
│ │ │
│ For Org 2: │
│ │ [Similar parallel processing] │
│ │ │
│ For Org N: │
│ │ [Similar parallel processing] │
└──────────────────────────────────────────────────────┘
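A condensed sketch of this nested-pool layout, with stub fetch/process helpers standing in for the real API calls:

```python
# Simplified sketch of the nested pools: an outer pool over organizations and
# a shared inner pool for per-patient lab subtasks. Worker counts mirror the
# constants above; the fetch_*/process_* helpers are stand-in stubs.
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_THREADS = 20    # outer pool: one worker per organization
ASYNC_THREADS = 40  # inner pool: lab/questionnaire subtasks

def fetch_inclusions(org):           # stub
    return [{"id": f"{org}-p{i}"} for i in range(3)]

def fetch_lab_request(patient):      # stub: the slow, parallelizable call
    return {"status": "ok"}

def process_fields(patient):         # stub: CPU-side field extraction
    patient["processed"] = True

def process_organization(org, subtask_pool):
    inclusions = fetch_inclusions(org)
    for patient in inclusions:       # sequential per organization
        # submit the lab fetch first so it runs while fields are processed
        lab_future = subtask_pool.submit(fetch_lab_request, patient)
        process_fields(patient)
        patient["lab"] = lab_future.result()  # join only when the value is needed
    return inclusions

def collect_all(organizations):
    results = []
    with ThreadPoolExecutor(max_workers=ASYNC_THREADS) as subtask_pool, \
         ThreadPoolExecutor(max_workers=MAX_THREADS) as org_pool:
        futures = [org_pool.submit(process_organization, org, subtask_pool)
                   for org in organizations]
        for future in as_completed(futures):
            results.extend(future.result())
    return results

print(len(collect_all(["org-A", "org-B"])))  # -> 6
```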
Performance Optimizations
1. Thread-Local HTTP Clients
   - Each thread maintains its own httpx.Client
   - Avoids connection conflicts
   - Implemented via get_httpx_client()
2. Nested Parallelization
   - Main pool: Organizations (20 workers)
   - Subtasks pool: Lab requests (40 workers)
   - Non-blocking I/O during processing
3. Questionnaire Batching (4-5x improvement)
   - Single call retrieves all questionnaires + answers
   - Eliminates N filtered calls per patient
4. Configurable Worker Threads
   - User input selection (1-20 workers)
   - Tunable for network bandwidth and API rate limits
Progress Tracking (Multi-Level)
Overall Progress [████████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [██████████░░░░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██████░░░░░░░░░░░░░░░░░░░] 42/110
3/15 - Center 3 [████░░░░░░░░░░░░░░░░░░░░░] 28/85
Thread-Safe Updates:
```python
with _global_pbar_lock:
    if global_pbar:
        global_pbar.update(1)
```
🛡️ Error Handling & Resilience
Token Management Strategy
1. Automatic Token Refresh on 401
   - Triggered by the @api_call_with_retry decorator (sketched below)
   - Thread-safe via _token_refresh_lock
2. Retry Mechanism
   - Max retries: 10 attempts
   - Delay between retries: 0.5 seconds
   - Decorator: @api_call_with_retry
3. Thread-Safe Token Refresh

```python
def new_token():
    global access_token, refresh_token
    with _token_refresh_lock:  # Only one thread refreshes at a time
        for attempt in range(ERROR_MAX_RETRY):
            try:
                # POST /api/auth/refreshToken
                # Update global tokens
                ...
            except Exception:
                sleep(WAIT_BEFORE_RETRY)
```
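For illustration, a retry decorator in the spirit of @api_call_with_retry might look like the sketch below; the real decorator also detects HTTP 401 and triggers new_token() before retrying.

```python
# Illustrative retry decorator, not the production implementation.
import functools
import time

ERROR_MAX_RETRY = 10
WAIT_BEFORE_RETRY = 0.5

def api_call_with_retry(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        last_error = None
        for attempt in range(ERROR_MAX_RETRY):
            try:
                return func(*args, **kwargs)
            except Exception as error:   # the real code distinguishes 401 from network errors
                last_error = error
                time.sleep(WAIT_BEFORE_RETRY)
        raise last_error                  # all retries exhausted
    return wrapper

@api_call_with_retry
def get_organizations(client, url, token):
    response = client.get(url, headers={"Authorization": f"Bearer {token}"})
    response.raise_for_status()
    return response.json()
```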
Exception Handling Categories
| Category | Examples | Handling |
|---|---|---|
| API Errors | Network timeouts, HTTP errors | Retry with configurable delay |
| File I/O Errors | Missing config, permission denied | Graceful error + exit |
| Validation Errors | Invalid config, incoherent data | Log warning + prompt user |
| Thread Errors | Worker thread failures | Shutdown gracefully + propagate |
Graceful Degradation
- Missing Organization Mapping: Skip silently, use fallback (org name)
- Critical Quality Issues: Prompt user for confirmation before export
- Thread Failure: Shutdown all workers gracefully, preserve partial results
- Invalid Configuration: Clear error messages with remediation suggestions
📊 Data Output Structure
JSON Output: endobest_inclusions.json
```json
[
  {
    "Patient_Identification": {
      "Organisation_Id": "uuid",
      "Organisation_Name": "Hospital Name",
      "Center_Name": "HOSP-A",
      "Patient_Id": "internal_id",
      "Pseudo": "ENDO-001",
      "Patient_Name": "Doe, John",
      "Patient_Birthday": "1975-05-15",
      "Patient_Age": 49
    },
    "Inclusion": {
      "Consent_Signed": true,
      "Inclusion_Date": "15/10/2024",
      "Inclusion_Status": "incluse",
      "isPrematurelyTerminated": false
    },
    "Extended_Fields": {
      "Custom_Field_1": "value",
      "Custom_Field_2": 42,
      "Composite_Score": "8/10"
    },
    "Endotest": {
      "Request_Sent": true,
      "Diagnostic_Status": "Completed"
    }
  }
]
```
JSON Output: endobest_organizations.json
```json
[
  {
    "id": "org-uuid",
    "name": "Hospital A",
    "Center_Name": "HOSP-A",
    "patients_count": 45,
    "preincluded_count": 8,
    "included_count": 35,
    "prematurely_terminated_count": 2
  }
]
```
🚀 Execution Modes
Mode 1: Normal (Full Collection)
python eb_dashboard.py
- Authenticates
- Collects from APIs
- Runs quality checks
- Exports JSON + Excel
- Duration: 2.5-5 minutes (typical)
Mode 2: Excel-Only (Fast Export)
python eb_dashboard.py --excel-only
- Skips data collection
- Uses existing JSON files
- Regenerates Excel workbooks
- Duration: 5-15 seconds
- Use case: Reconfigure reports, test templates
Mode 3: Check-Only (Validation Only)
python eb_dashboard.py --check-only
- Loads existing JSON
- Runs quality checks
- No export
- Duration: 5-10 seconds
- Use case: Verify data before distribution
Mode 4: Debug (Verbose Output)
python eb_dashboard.py --debug
- Executes normal mode
- Enables detailed logging
- Shows field-by-field changes
- Check dashboard.log for details
📈 Performance Metrics & Benchmarks
Typical Execution Times (Full Dataset: 1,200+ patients, 15+ organizations)
| Phase | Duration | Notes |
|---|---|---|
| Login & Config | 2-3 sec | Sequential, network-dependent |
| Fetch Counters | 5-8 sec | 20 workers, parallelized |
| Collect Inclusions | 2-4 min | Includes API calls + field processing |
| Quality Checks | 10-15 sec | File loads, data comparison |
| Export to JSON | 3-5 sec | File I/O |
| Export to Excel | 5-15 sec | Template processing + fill |
| TOTAL | ~2.5-5 min | Depends on network, API perf |
Network Optimization Impact
With old questionnaire approach (N filtered calls per patient):
- 1,200 patients × 15 questionnaires = 18,000 API calls
- Estimated: 15-30 minutes
With optimized single-call questionnaire:
- 1,200 patients × 1 call = 1,200 API calls
- Estimated: 2-5 minutes
- Improvement: 3-6x faster ✅
🔍 Field Extraction & Processing Logic
Complete Field Processing Pipeline
For each field in INCLUSIONS_MAPPING_CONFIG:
│
├─ Step 1: Determine Source Type
│ ├─ q_id / q_name / q_category → Find questionnaire
│ ├─ record → Use clinical record
│ ├─ inclusion → Use patient inclusion data
│ ├─ request → Use lab request data
│ └─ calculated → Execute custom function
│
├─ Step 2: Extract Raw Value
│ ├─ Navigate JSON using field_path
│ ├─ Supports wildcard (*) for list traversal
│ └─ Return value or "undefined"
│
├─ Step 3: Check Field Condition (optional)
│ ├─ If condition undefined → Set to "undefined"
│ ├─ If condition not boolean → Error flag
│ ├─ If condition false → Set to "N/A"
│ └─ If condition true → Continue
│
├─ Step 4: Apply Post-Processing Transformations
│ ├─ true_if_any: Convert to boolean
│ ├─ value_labels: Map to localized text
│ ├─ field_template: Apply formatting
│ └─ List joining: Flatten arrays with pipe delimiter
│
├─ Step 5: Format Score Dictionaries
│ ├─ If {total, max} → Format as "total/max"
│ └─ Otherwise → Keep as-is
│
└─ Store: output_inclusion[field_group][field_name] = final_value
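The gating and formatting steps can be illustrated with a small sketch. This is not the production code: the helper below condenses Steps 3-5 under assumed semantics taken from the diagram above.

```python
# Condensed sketch of Steps 3-5 of the field pipeline (condition check,
# value-label mapping, score formatting). Helper name and arguments are illustrative.
def process_field(raw_value, condition=None, value_labels=None):
    # Step 3: the field condition gates the value
    if condition is not None:
        if condition == "undefined":
            return "undefined"
        if not isinstance(condition, bool):
            return "ERROR: condition not boolean"
        if condition is False:
            return "N/A"
    # Step 4: map raw values to localized labels
    if value_labels and raw_value in value_labels:
        raw_value = value_labels[raw_value]
    # Step 5: format {total, max} score dictionaries as "total/max"
    if isinstance(raw_value, dict) and {"total", "max"} <= raw_value.keys():
        return f"{raw_value['total']}/{raw_value['max']}"
    return raw_value

print(process_field({"total": 8, "max": 10}))                             # -> "8/10"
print(process_field("yes", condition=True, value_labels={"yes": "Oui"}))  # -> "Oui"
print(process_field("yes", condition=False))                              # -> "N/A"
```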
Custom Functions for Calculated Fields
| Function | Purpose | Syntax |
|---|---|---|
| search_in_fields_using_regex | Search multiple fields for a pattern | ["search_in_fields_using_regex", "pattern", "field1", "field2"] |
| extract_parentheses_content | Extract text within parentheses | ["extract_parentheses_content", "field_name"] |
| append_terminated_suffix | Add suffix if patient terminated | ["append_terminated_suffix", "status_field", "is_terminated_field"] |
| if_then_else | Unified conditional with 8 operators | ["if_then_else", "operator", arg1, arg2_optional, true_result, false_result] |
if_then_else Operators:
- is_true / is_false - Boolean field test
- is_defined / is_undefined - Existence test
- all_true / all_defined - Multiple field test
- == / != - Value comparison
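A minimal sketch of how such an evaluator could dispatch on these operators (argument handling simplified relative to the real Block 6 implementation):

```python
# Illustrative if_then_else evaluator; operator names come from the list above,
# but the dispatch and argument conventions here are hypothetical.
def if_then_else(fields, operator, arg1, arg2, true_result, false_result):
    value = fields.get(arg1, "undefined")
    checks = {
        "is_true": lambda: value is True,
        "is_false": lambda: value is False,
        "is_defined": lambda: value != "undefined",
        "is_undefined": lambda: value == "undefined",
        "==": lambda: value == arg2,
        "!=": lambda: value != arg2,
        "all_true": lambda: all(fields.get(f) is True for f in [arg1] + (arg2 or [])),
        "all_defined": lambda: all(fields.get(f, "undefined") != "undefined"
                                   for f in [arg1] + (arg2 or [])),
    }
    return true_result if checks[operator]() else false_result

fields = {"Consent_Signed": True}
print(if_then_else(fields, "is_true", "Consent_Signed", None, "Oui", "Non"))  # -> "Oui"
```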
✅ Quality Assurance Framework
Coherence Check
Purpose: Verify API-provided statistics match actual collected data
Logic:
For each organization:
API_Count = statistic.total
Actual_Count = count of inclusion records
if API_Count != Actual_Count:
Report discrepancy with severity
├─ ±10%: Warning
└─ >±10%: Critical
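A runnable sketch of this logic, assuming the JSON shapes shown in the Data Output Structure section; the ±10% severity split follows the diagram above, the rest is illustrative:

```python
# Illustrative coherence check, not the production coherence_check().
def coherence_check(organizations, inclusions):
    """Compare API-provided counters against actually collected inclusions."""
    issues = []
    for org in organizations:
        api_count = org["patients_count"]
        actual = sum(1 for inc in inclusions
                     if inc["Patient_Identification"]["Organisation_Id"] == org["id"])
        if api_count != actual:
            deviation = abs(api_count - actual) / max(api_count, 1)
            severity = "Critical" if deviation > 0.10 else "Warning"
            issues.append((org["name"], api_count, actual, severity))
    return issues

orgs = [{"id": "org-1", "name": "Hospital A", "patients_count": 145}]
incs = [{"Patient_Identification": {"Organisation_Id": "org-1"}}] * 143
for name, expected, actual, severity in coherence_check(orgs, incs):
    print(f"{severity}: {name} API={expected} actual={actual}")  # Warning: 145 vs 143
```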
Non-Regression Check
Purpose: Detect unexpected changes between data runs
Configuration-Driven Rules:
- Field selection pipeline (include/exclude patterns)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
- Exception handling (exclude specific organizations)
Logic:
Load previous inclusion data (_old file)
For each rule:
├─ Build candidate fields via pipeline
├─ Determine key field for matching
└─ For each inclusion:
├─ Find matching old inclusion by key
├─ Check for unexpected transitions
├─ Apply exceptions
└─ Report violations
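A simplified sketch of the transition-pattern matching, with a hypothetical rule format (the real rules come from the Regression_Check sheet and support include/exclude pipelines and per-organization exceptions):

```python
# Illustrative non-regression check; rule structure is invented for this sketch.
def non_regression_check(old_inclusions, new_inclusions, rule):
    old_by_key = {inc[rule["key_field"]]: inc for inc in old_inclusions}
    violations = []
    for inc in new_inclusions:
        old = old_by_key.get(inc[rule["key_field"]])
        if old is None:
            continue  # new inclusion: reported separately, not a transition violation
        before, after = old[rule["field"]], inc[rule["field"]]
        if before != after and (before, after) not in rule["allowed_transitions"]:
            violations.append((inc[rule["key_field"]], before, after, rule["severity"]))
    return violations

rule = {
    "key_field": "Pseudo",
    "field": "Inclusion_Status",
    "allowed_transitions": {("preincluse", "incluse")},  # expected forward move
    "severity": "Critical",
}
old = [{"Pseudo": "ENDO-001", "Inclusion_Status": "incluse"}]
new = [{"Pseudo": "ENDO-001", "Inclusion_Status": "preincluse"}]  # unexpected rollback
print(non_regression_check(old, new, rule))
```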
📋 Documentation Structure
The system includes comprehensive documentation:
| Document | Size | Content |
|---|---|---|
| DOCUMENTATION_10_ARCHITECTURE.md | 43.7 KB | System design, workflow, APIs, multithreading |
| DOCUMENTATION_11_FIELD_MAPPING.md | 56.3 KB | Field extraction logic, custom functions, examples |
| DOCUMENTATION_12_QUALITY_CHECKS.md | 60.2 KB | QA framework, regression rules, configuration |
| DOCUMENTATION_13_EXCEL_EXPORT.md | 29.6 KB | Excel generation, data transformation, config |
| DOCUMENTATION_98_USER_GUIDE.md | 8.4 KB | End-user instructions, troubleshooting, FAQ |
| DOCUMENTATION_99_CONFIG_GUIDE.md | 24.8 KB | Administrator reference, Excel tables, examples |
🔧 Key Technical Features
Thread Safety
- Per-thread HTTP clients (no connection conflicts)
- Synchronized access to global state via locks
- Thread-safe progress bar updates
Error Recovery
- Automatic token refresh on 401 errors
- Retry logic with configurable attempts and delay
- Graceful degradation for optional features
- User confirmation on critical issues
Configuration Flexibility
- 100% externalized to Excel (zero code changes)
- Supports multiple data sources
- Custom business logic functions
- Field dependencies and conditions
- Value transformations and templates
Performance
- Optimized API calls (4-5x improvement)
- Parallel processing (20+ workers)
- Async I/O operations
- Configurable thread pools
Data Quality
- Coherence checking (stats vs actual data)
- Non-regression testing (config-driven)
- Comprehensive validation
- Audit trail logging
📦 Dependencies
Core Libraries
- httpx - HTTP client with connection pooling
- openpyxl - Excel file reading/writing
- questionary - Interactive CLI prompts
- tqdm - Progress bars
- rich - Rich text formatting
- pywin32 - Windows COM automation (optional, for formula recalculation)
- pytz - Timezone support (optional)
Python Version
- Python 3.7+
External Services
- Ziwig IAM API
- Ziwig Research Clinic (RC) API
- Ziwig Lab (GDD) API
🎓 Usage Patterns
For End Users
- Configure fields in Excel (no code needed)
- Run: python eb_dashboard.py
- Review results in JSON or Excel
For Administrators
- Add new fields to Inclusions_Mapping
- Define quality rules in Regression_Check
- Configure Excel export in Excel_Workbooks + Excel_Sheets
- Restart: the script picks up the config automatically
For Developers
- Add a custom function to Block 6 (eb_dashboard.py)
- Register it in the field config (Inclusions_Mapping)
- Use via: "source_id": "function_name"
- No recompile needed for other changes
🎯 Summary
The Endobest Clinical Research Dashboard represents a mature, production-ready system that successfully combines:
✅ Architectural Excellence - Clean modular design with separation of concerns
✅ User-Centric Configuration - 100% externalized, no code changes needed
✅ Performance Optimization - 4-5x faster via API and threading improvements
✅ Robust Resilience - Comprehensive error handling, automatic recovery, graceful degradation
✅ Quality Assurance - Multi-level validation, coherence checks, regression testing
✅ Comprehensive Documentation - 250+ KB of technical and user guides
✅ Maintainability - Clear code structure, extensive logging, audit trails
The system successfully enables non-technical users to configure complex data extraction and reporting workflows while maintaining enterprise-grade reliability and performance standards.
Document Version: 1.0 Last Updated: 2025-11-08 Status: ✅ Complete & Production Ready