EB_Dashboard/DOCUMENTATION/DOCUMENTATION_30_ARCHITECTURE_SUMMARY.md


📊 Endobest Clinical Research Dashboard - Architecture Summary

Last Updated: 2025-11-08 | Project Status: Production Ready with Excel Export Feature | Language: Python 3.x


🎯 Executive Summary

The Endobest Clinical Research Dashboard is a sophisticated, production-grade automated data collection and reporting system designed to aggregate patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations. The system combines high-performance multithreading, comprehensive quality assurance, and fully externalized configuration to enable non-technical users to manage complex data extraction workflows without code modifications.

Core Value Propositions

  • 100% Externalized Configuration - All field definitions, quality rules, and export logic defined in Excel
  • High-Performance Architecture - 4-5x faster via optimized API calls and parallel processing
  • Robust Resilience - Automatic token refresh, retries, graceful degradation
  • Comprehensive Quality Assurance - Coherence checks + config-driven regression testing
  • Multi-Format Export - JSON + configurable Excel workbooks with data transformation
  • User-Friendly Interface - Interactive prompts, progress tracking, clear error messages


📁 Project Structure

Endobest Dashboard/
├── 📜 MAIN SCRIPT
│   └── eb_dashboard.py                      (57.5 KB, 1,021 lines)
│       Core orchestrator for data collection, processing, and export
│
├── 🔧 UTILITY MODULES
│   ├── eb_dashboard_utils.py                (6.4 KB, 184 lines)
│   │   Thread-safe HTTP clients, nested data navigation, config resolution
│   │
│   ├── eb_dashboard_quality_checks.py       (58.5 KB, 1,266 lines)
│   │   Coherence checks, non-regression testing, data validation
│   │
│   └── eb_dashboard_excel_export.py         (32 KB, ~1,000 lines)
│       Configuration-driven Excel workbook generation
│
├── 📚 DOCUMENTATION
│   ├── DOCUMENTATION_10_ARCHITECTURE.md     (43.7 KB)
│   │   System design, data flow, API integration, multithreading
│   │
│   ├── DOCUMENTATION_11_FIELD_MAPPING.md    (56.3 KB)
│   │   Field extraction logic, custom functions, transformations
│   │
│   ├── DOCUMENTATION_12_QUALITY_CHECKS.md   (60.2 KB)
│   │   Quality assurance framework, regression rules, validation logic
│   │
│   ├── DOCUMENTATION_13_EXCEL_EXPORT.md     (29.6 KB)
│   │   Excel generation architecture, data transformation pipeline
│   │
│   ├── DOCUMENTATION_98_USER_GUIDE.md       (8.4 KB)
│   │   End-user instructions, quick start, troubleshooting
│   │
│   └── DOCUMENTATION_99_CONFIG_GUIDE.md     (24.8 KB)
│       Administrator configuration reference
│
├── ⚙️  CONFIGURATION
│   └── config/
│       ├── Endobest_Dashboard_Config.xlsx   (Configuration file)
│       │   Inclusions_Mapping
│       │   Organizations_Mapping
│       │   Excel_Workbooks
│       │   Excel_Sheets
│       │   Regression_Check
│       │
│       ├── eb_org_center_mapping.xlsx       (Organization enrichment)
│       │
│       └── templates/
│           ├── Endobest_Template.xlsx
│           ├── Statistics_Template.xlsx
│           └── (Other Excel templates)
│
├── 📊 OUTPUT FILES
│   ├── endobest_inclusions.json             (~6-7 MB, patient data)
│   ├── endobest_inclusions_old.json         (backup)
│   ├── endobest_organizations.json          (~17-20 KB, stats)
│   ├── endobest_organizations_old.json      (backup)
│   ├── [Excel outputs]                      (*.xlsx, configurable)
│   └── dashboard.log                        (Execution log)
│
└── 🔨 EXECUTABLES
    ├── eb_dashboard.exe                     (16.5 MB, PyInstaller build)
    └── [Various .bat launch scripts]

🏗️ System Architecture Overview

High-Level Component Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                   ENDOBEST DASHBOARD MAIN PROCESS                   │
│                        eb_dashboard.py                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  PHASE 1: INITIALIZATION & AUTHENTICATION                   │  │
│  │  ├─ User Login (IAM API)                                    │  │
│  │  ├─ Token Exchange (RC-specific)                            │  │
│  │  ├─ Config Loading (Excel parsing & validation)            │  │
│  │  └─ Thread Pool Setup (20 workers main, 40 subtasks)       │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                              ↓                                      │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  PHASE 2: ORGANIZATION & COUNTERS RETRIEVAL                │  │
│  │  ├─ Get All Organizations (getAllOrganizations API)        │  │
│  │  ├─ Fetch Counters Parallelized (20 workers)               │  │
│  │  ├─ Enrich with Center Mapping (optional)                  │  │
│  │  └─ Calculate Totals & Sort                                │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                              ↓                                      │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  PHASE 3: PATIENT INCLUSION DATA COLLECTION                │  │
│  │  Outer Loop: Organizations (20 parallel workers)           │  │
│  │  ├─ For Each Organization:                                 │  │
│  │  │  ├─ Get Inclusions List (POST /api/inclusions/search)  │  │
│  │  │  └─ For Each Patient (Sequential):                      │  │
│  │  │     ├─ Fetch Clinical Record (API)                      │  │
│  │  │     ├─ Fetch All Questionnaires (Optimized: 1 call)    │  │
│  │  │     ├─ Fetch Lab Requests (Async pool)                  │  │
│  │  │     ├─ Process Field Mappings (extraction + transform)  │  │
│  │  │     └─ Update Progress Bars (thread-safe)               │  │
│  │  │                                                         │  │
│  │  │  Inner Async: Lab/Questionnaire Fetches (40 workers)   │  │
│  │  │     (Non-blocking I/O during main processing)           │  │
│  │  └─ Combine Inclusions from All Orgs                       │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                              ↓                                      │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  PHASE 4: QUALITY ASSURANCE & VALIDATION                   │  │
│  │  ├─ Coherence Check (API stats vs actual data)             │  │
│  │  │  └─ Compares counters with detailed records             │  │
│  │  ├─ Non-Regression Check (config-driven)                   │  │
│  │  │  └─ Detects changes with severity levels                │  │
│  │  └─ Critical Issue Handling (user confirmation if needed)  │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                              ↓                                      │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  PHASE 5: EXPORT & PERSISTENCE                             │  │
│  │  ├─ Backup Old Files (if quality passed)                   │  │
│  │  ├─ Write JSON Outputs (endobest_inclusions.json, etc.)   │  │
│  │  ├─ Export to Excel (if configured)                        │  │
│  │  │  ├─ Load Templates                                      │  │
│  │  │  ├─ Apply Filters & Sorts                               │  │
│  │  │  ├─ Fill Data into Sheets                               │  │
│  │  │  ├─ Replace Values                                      │  │
│  │  │  └─ Recalculate Formulas (win32com)                     │  │
│  │  └─ Display Summary & Elapsed Time                         │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                              ↓                                      │
│                           EXIT                                      │
└─────────────────────────────────────────────────────────────────────┘

                    ↓ EXTERNAL DEPENDENCIES ↓

┌─────────────────────────────────────────────────────────────────────┐
│                        EXTERNAL APIS                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  🔐 AUTHENTICATION (IAM)                                           │
│     └─ api-auth.ziwig-connect.com                                  │
│        ├─ POST /api/auth/ziwig-pro/login                           │
│        └─ POST /api/auth/refreshToken                              │
│                                                                     │
│  🏥 RESEARCH CLINIC (RC)                                           │
│     └─ api-hcp.ziwig-connect.com                                   │
│        ├─ POST /api/auth/config-token                              │
│        ├─ GET /api/inclusions/getAllOrganizations                  │
│        ├─ POST /api/inclusions/inclusion-statistics                │
│        ├─ POST /api/inclusions/search                              │
│        ├─ POST /api/records/byPatient                              │
│        └─ POST /api/surveys/filter/with-answers (optimized!)      │
│                                                                     │
│  🧪 LAB / DIAGNOSTICS (GDD)                                        │
│     └─ api-lab.ziwig-connect.com                                   │
│        └─ GET /api/requests/by-tube-id/{tubeId}                    │
│                                                                     │
│  📝 EXCEL TEMPLATES                                                │
│     └─ config/templates/                                           │
│        ├─ Endobest_Template.xlsx                                   │
│        ├─ Statistics_Template.xlsx                                 │
│        └─ (Custom templates)                                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

🔌 Module Descriptions

1. eb_dashboard.py - Main Orchestrator (57.5 KB)

Responsibility: Complete data collection workflow, API coordination, multithreaded execution

Structure (9 Blocks):

| Block | Purpose | Key Functions |
|-------|---------|---------------|
| 1 | Configuration & Infrastructure | Constants, global vars, progress bar setup |
| 2 | Decorators & Resilience | @api_call_with_retry, retry logic |
| 3 | Authentication | login(), token exchange, IAM integration |
| 3B | File Utilities | load_json_file() |
| 4 | Inclusions Mapping Config | load_inclusions_mapping_config(), validation |
| 5 | Data Search & Extraction | Questionnaire finding, field retrieval |
| 6 | Custom Functions | Business logic, calculated fields |
| 7 | Business API Calls | RC, GDD, organization endpoints |
| 7b | Organization Center Mapping | load_org_center_mapping() |
| 8 | Processing Orchestration | process_organization_patients(), patient data processing |
| 9 | Main Execution | Entry point, quality checks, export |

Key Technologies:

  • httpx - HTTP client (with thread-local instances)
  • openpyxl - Excel parsing
  • concurrent.futures.ThreadPoolExecutor - Parallel execution
  • tqdm - Progress tracking
  • questionary - Interactive prompts

2. eb_dashboard_utils.py - Utility Functions (6.4 KB)

Responsibility: Generic, reusable utility functions shared across modules

Core Functions:

get_httpx_client()          # Thread-local HTTP client management
get_thread_position()       # Progress bar positioning
get_nested_value()          # JSON path navigation with wildcard support (*)
get_config_path()           # Config folder resolution (script vs PyInstaller)
get_old_filename()          # Backup filename generation

Key Features:

  • Thread-safe HTTP client pooling
  • Wildcard support in nested JSON paths (e.g., ["items", "*", "value"])
  • Cross-platform path resolution
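
The wildcard navigation can be sketched as follows (a minimal illustration of the idea; the real get_nested_value() may differ in signature and edge-case handling):

```python
def get_nested_value(data, path):
    """Walk a nested dict/list structure along `path`.

    A "*" segment fans out over every element of a list and collects
    the results; any missing key returns None.
    """
    if not path:
        return data
    key, rest = path[0], path[1:]
    if key == "*":
        if not isinstance(data, list):
            return None
        return [get_nested_value(item, rest) for item in data]
    if isinstance(data, dict) and key in data:
        return get_nested_value(data[key], rest)
    return None
```

With `record = {"items": [{"value": 1}, {"value": 2}]}`, the path `["items", "*", "value"]` yields `[1, 2]`.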

3. eb_dashboard_quality_checks.py - QA & Validation (58.5 KB)

Responsibility: Quality assurance, data validation, regression checking

Core Functions:

| Function | Purpose |
|----------|---------|
| load_regression_check_config() | Load regression rules from Excel |
| run_quality_checks() | Orchestrate all QA checks |
| coherence_check() | Verify stats vs detailed data consistency |
| non_regression_check() | Config-driven change validation |
| run_check_only_mode() | Standalone validation mode |
| backup_output_files() | Create versioned backups |

Quality Check Types:

  1. Coherence Check

    • Compares API-provided organization statistics vs. actual inclusion counts
    • Severity: Warning/Critical
    • Example: Total API count (145) vs. actual inclusions (143)
  2. Non-Regression Check

    • Compares current vs. previous run data
    • Applies config-driven rules with transition patterns
    • Detects: new inclusions, deletions, field changes
    • Severity: Warning/Critical with exceptions

4. eb_dashboard_excel_export.py - Excel Generation & Orchestration (38 KB, v1.1+)

Responsibility: Configuration-driven Excel workbook generation with data transformation + high-level orchestration

Core Functions (Low-Level):

| Function | Purpose |
|----------|---------|
| load_excel_export_config() | Load Excel_Workbooks + Excel_Sheets config |
| validate_excel_config() | Validate templates and named ranges |
| export_to_excel() | Main export orchestration (openpyxl + win32com) |
| _apply_filter() | AND-condition filtering |
| _apply_sort() | Multi-key sorting with datetime support |
| _apply_value_replacement() | Strict type matching value transformation |
| _handle_output_exists() | File conflict resolution |
| _recalculate_workbook() | Formula recalculation via win32com |
| _process_sheet() | Sheet-specific data filling |

High-Level Orchestration Functions (v1.1+):

| Function | Purpose | Called From |
|----------|---------|-------------|
| export_excel_only() | Complete --excel-only mode | main() CLI detection |
| run_normal_mode_export() | Normal mode export phase | main() after JSON write |
| prepare_excel_export() | Preparation + validation | Both orchestration functions |
| execute_excel_export() | Execution with error handling | Both orchestration functions |
| _load_json_file_internal() | Safe JSON loading | run_normal_mode_export() |

Data Transformation Pipeline:

1. Load Configuration (Excel_Workbooks + Excel_Sheets)
2. For each workbook:
   a. Load template (openpyxl)
   b. For each sheet:
      - Apply filter (AND conditions)
      - Apply sort (multi-key)
      - Apply value replacement (strict type matching)
      - Fill data into cells/named ranges
   c. Handle file conflicts (Overwrite/Increment/Backup)
   d. Save workbook (openpyxl)
   e. Recalculate formulas (win32com - optional)

Orchestration Pattern (v1.1+):

As of v1.1, the system delegates all export orchestration to dedicated functions following the pattern established by run_check_only_mode() from quality_checks:

  1. --excel-only mode: Main script calls single function → export_excel_only() handles everything
  2. Normal mode export: Main script calls single function → run_normal_mode_export() handles everything

This keeps the main script focused on business logic while all export mechanics are encapsulated in the module.


🔄 Complete Data Collection Workflow

Phase 1: Initialization (2-3 seconds)

  1. User provides credentials (with defaults)
  2. IAM Login: POST /api/auth/ziwig-pro/login
  3. Token Exchange: POST /api/auth/config-token
  4. Load configuration from Endobest_Dashboard_Config.xlsx
  5. Validate field mappings and quality check rules
  6. Setup thread pools (main: 20 workers, subtasks: 40 workers)

Phase 2: Organization Retrieval (5-8 seconds)

  1. Get all organizations: GET /api/inclusions/getAllOrganizations
  2. Filter excluded centers (config-driven)
  3. Fetch counters in parallel (20 workers):
    • For each org: POST /api/inclusions/inclusion-statistics
    • Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
  4. Optional: Enrich with center mapping (from eb_org_center_mapping.xlsx)
  5. Calculate totals and sort

Phase 3: Patient Data Collection (2-4 minutes)

Nested Parallel Architecture:

Outer Loop (20 workers): For each organization

  • POST /api/inclusions/search?limit=1000&page=1 → Get up to 1000 inclusions

Middle Loop (Sequential): For each patient

  • Fetch clinical record: POST /api/records/byPatient
  • Fetch questionnaires: POST /api/surveys/filter/with-answers (optimized: 1 call)
  • Submit async lab request: GET /api/requests/by-tube-id/{tubeId} (in subtasks pool)

Inner Loop (40 async workers): Non-blocking lab/questionnaire processing

  • Parallel fetches of lab requests while main thread processes fields

Field Processing (per patient):

  • For each field in configuration:
    1. Determine source (questionnaire, record, inclusion, request, calculated)
    2. Extract raw value (supports JSON paths with wildcards)
    3. Check field condition (optional)
    4. Apply post-processing transformations
    5. Format score dictionaries
    6. Store in nested output structure

Phase 4: Quality Assurance (10-15 seconds)

  1. Coherence Check: Compare API counters vs. actual data
  2. Non-Regression Check: Compare current vs. previous run with config rules
  3. Critical Issue Handling: User confirmation if issues detected
  4. If NO critical issues → continue to export
  5. If YES critical issues → prompt user for override

Phase 5: Export & Persistence (3-5 seconds)

Step 1: Backup & JSON Write

  1. Backup old files (if quality checks passed)
  2. Write JSON outputs:
    • endobest_inclusions.json (6-7 MB)
    • endobest_organizations.json (17-20 KB)

Step 2: Excel Export (if configured). Delegated to run_normal_mode_export(), which handles:

  1. Load JSONs from filesystem (ensures consistency)
  2. Load Excel configuration
  3. Validate templates and named ranges
  4. For each configured workbook:
    • Load template file
    • Apply filter conditions (AND logic)
    • Apply multi-key sort
    • Apply value replacements (strict type matching)
    • Fill data into cells/named ranges
    • Handle file conflicts (Overwrite/Increment/Backup)
    • Save workbook
    • Recalculate formulas (optional, via win32com)
  5. Display results and return status

Step 3: Summary

  1. Display elapsed time
  2. Report file locations
  3. Note any warnings/errors during export

⚙️ Configuration System

Three-Layer Configuration Architecture

Layer 1: Excel Configuration (Endobest_Dashboard_Config.xlsx)

Sheet 1: Inclusions_Mapping (Field Extraction)

  • Define which patient fields to extract
  • Specify sources (questionnaire, record, inclusion, request, calculated)
  • Configure transformations (value labels, templates, conditions)
  • ~50+ fields typically configured

Sheet 2: Organizations_Mapping (Organization Fields)

  • Define which organization fields to export
  • Rarely modified

Sheet 3: Excel_Workbooks (Excel Export Metadata)

  • Workbook names
  • Template paths
  • Output filenames (with template variables)
  • File conflict handling strategy (Overwrite/Increment/Backup)

Sheet 4: Excel_Sheets (Sheet Configurations)

  • Workbook name (reference to Excel_Workbooks)
  • Sheet name (in template)
  • Source type (Inclusions/Organizations/Variable)
  • Target (cell or named range)
  • Column mapping (JSON)
  • Filter conditions (JSON with AND logic)
  • Sort keys (JSON, multi-key with datetime support)
  • Value replacements (JSON, strict type matching)

Sheet 5: Regression_Check (Quality Rules)

  • Rule names
  • Field selection pipeline (include/exclude patterns)
  • Scope (all organizations or specific org list)
  • Transition patterns (expected state changes)
  • Severity levels (Warning/Critical)

Layer 2: Organization Mapping (eb_org_center_mapping.xlsx)

  • Optional mapping file
  • Sheet: Org_Center_Mapping
  • Maps organization names to center identifiers
  • Degrades gracefully if missing (falls back to organization names)
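
The graceful-degradation behavior can be sketched like this (illustrative only; the real load_org_center_mapping() may read different columns):

```python
from pathlib import Path

def load_org_center_mapping(path="config/eb_org_center_mapping.xlsx"):
    """Load organization-name -> center pairs from the Org_Center_Mapping sheet.

    A missing file is not an error: the dashboard simply falls back to
    raw organization names, so we return an empty mapping.
    """
    if not Path(path).exists():
        return {}
    from openpyxl import load_workbook  # imported lazily; only needed if the file exists
    ws = load_workbook(path, read_only=True)["Org_Center_Mapping"]
    mapping = {}
    for row in ws.iter_rows(min_row=2, values_only=True):  # skip header row
        if row and row[0]:
            mapping[str(row[0])] = row[1] if len(row) > 1 else None
    return mapping
```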

Layer 3: Excel Templates (config/templates/)

  • Excel workbook templates with:
    • Sheet definitions
    • Named ranges (for data fill targets)
    • Formula structures
    • Formatting and styles

Configuration Constants (in code)

# API Configuration
IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
GDD_URL = "https://api-lab.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"
RC_ENDOBEST_PROTOCOL_ID = "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e"

# Threading & Performance
MAX_THREADS = 20                # Main thread pool workers
ASYNC_THREADS = 40              # Subtasks thread pool workers
ERROR_MAX_RETRY = 10            # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5         # Seconds between retries

# Excluded Organizations
RC_ENDOBEST_EXCLUDED_CENTERS = ["e18e7487-...", "5582bd75-...", "e053512f-..."]

🔐 API Integration

Authentication Flow

1. IAM Login
   POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login
   Request: {"username": "...", "password": "..."}
   Response: {"access_token": "jwt_master", "userId": "uuid"}

2. Token Exchange (RC-specific)
   POST https://api-hcp.ziwig-connect.com/api/auth/config-token
   Headers: Authorization: Bearer {master_token}
   Request: {"userId": "...", "clientId": "...", "userAgent": "..."}
   Response: {"access_token": "jwt_rc", "refresh_token": "refresh_token"}

3. Automatic Token Refresh (on 401)
   POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken
   Headers: Authorization: Bearer {current_token}
   Request: {"refresh_token": "..."}
   Response: {"access_token": "jwt_new", "refresh_token": "new_refresh"}

Key API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /api/inclusions/getAllOrganizations | GET | List all organizations |
| /api/inclusions/inclusion-statistics | POST | Get patient counts per org |
| /api/inclusions/search | POST | Get inclusions list for org (paginated) |
| /api/records/byPatient | POST | Get clinical record for patient |
| /api/surveys/filter/with-answers | POST | OPTIMIZED: Get all questionnaires for patient |
| /api/requests/by-tube-id/{tubeId} | GET | Get lab test results |

Performance Optimization: Questionnaire Batching

Problem: Multiple API calls per patient (one call per questionnaire, so N questionnaires mean N calls per patient = slow)

Solution: Single optimized call retrieves all questionnaires with answers

BEFORE (inefficient):
for qcm_id in questionnaire_ids:
    GET /api/surveys/{qcm_id}/answers?subject={patient_id}
    # Result: N API calls per patient

AFTER (optimized):
POST /api/surveys/filter/with-answers
{
  "context": "clinic_research",
  "subject": patient_id
}
# Result: 1 API call per patient
# Impact: 4-5x performance improvement

Multithreading & Performance Optimization

Thread Pool Architecture

Main Application Thread
    ↓
┌─ Phase 1: Counter Fetching ──────────────────────────┐
│ ThreadPoolExecutor(max_workers=user_input, cap=20)   │
│ ├─ Task 1: Get counters for Org 1                     │
│ ├─ Task 2: Get counters for Org 2                     │
│ └─ Task N: Get counters for Org N                     │
│ [Sequential wait: tqdm.as_completed]                  │
└──────────────────────────────────────────────────────┘
    ↓
┌─ Phase 2: Inclusion Data Collection (Nested) ────────┐
│ Outer: ThreadPoolExecutor(max_workers=user_input)    │
│                                                       │
│ For Org 1:                                            │
│ │   Inner: ThreadPoolExecutor(max_workers=40)        │
│ │   ├─ Patient 1: Async lab/questionnaire fetch      │
│ │   ├─ Patient 2: Async lab/questionnaire fetch      │
│ │   └─ Patient N: Async lab/questionnaire fetch      │
│ │   [Sequential outer wait: as_completed]            │
│ │                                                     │
│ For Org 2:                                            │
│ │   [Similar parallel processing]                    │
│ │                                                     │
│ For Org N:                                            │
│ │   [Similar parallel processing]                    │
└──────────────────────────────────────────────────────┘

Performance Optimizations

  1. Thread-Local HTTP Clients

    • Each thread maintains its own httpx.Client
    • Avoids connection conflicts
    • Implementation via get_httpx_client()
  2. Nested Parallelization

    • Main pool: Organizations (20 workers)
    • Subtasks pool: Lab requests (40 workers)
    • Non-blocking I/O during processing
  3. Questionnaire Batching (4-5x improvement)

    • Single call retrieves all questionnaires + answers
    • Eliminates N filtered calls per patient
  4. Configurable Worker Threads

    • User input selection (1-20 workers)
    • Tunable for network bandwidth and API rate limits
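
The nested pool layout can be sketched with dummy tasks (illustrative only; worker counts mirror MAX_THREADS / ASYNC_THREADS, and fetch_patient stands in for the real API calls):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_THREADS = 20    # outer pool: one task per organization
ASYNC_THREADS = 40  # inner pool: lab/questionnaire fetches

def fetch_patient(org, patient):
    """Stand-in for the per-patient API calls."""
    return f"{org}:{patient}"

def process_org(org, patients):
    """One outer-pool task: fans patient fetches out to an inner pool."""
    with ThreadPoolExecutor(max_workers=ASYNC_THREADS) as inner:
        futures = [inner.submit(fetch_patient, org, p) for p in patients]
        return [f.result() for f in futures]  # preserves patient order

orgs = {"org1": ["p1", "p2"], "org2": ["p3"]}
with ThreadPoolExecutor(max_workers=MAX_THREADS) as outer:
    future_to_org = {outer.submit(process_org, o, ps): o for o, ps in orgs.items()}
    results = {future_to_org[f]: f.result() for f in as_completed(future_to_org)}
```

`as_completed` lets the main thread collect each organization's results as soon as that org finishes, rather than in submission order.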

Progress Tracking (Multi-Level)

Overall Progress [████████████░░░░░░░░░░░░] 847/1200
  1/15 - Center 1 [██████████░░░░░░░░░░░░░░░]  73/95
  2/15 - Center 2 [██████░░░░░░░░░░░░░░░░░░░]  42/110
  3/15 - Center 3 [████░░░░░░░░░░░░░░░░░░░░░]  28/85

Thread-Safe Updates:

with _global_pbar_lock:
    if global_pbar:
        global_pbar.update(1)

🛡️ Error Handling & Resilience

Token Management Strategy

  1. Automatic Token Refresh on 401

    • Triggered by @api_call_with_retry decorator
    • Thread-safe via _token_refresh_lock
  2. Retry Mechanism

    • Max retries: 10 attempts
    • Delay between retries: 0.5 seconds
    • Decorators: @api_call_with_retry
  3. Thread-Safe Token Refresh

    def new_token():
        global access_token, refresh_token
        with _token_refresh_lock:  # Only one thread refreshes at a time
            for attempt in range(ERROR_MAX_RETRY):
                try:
                    # POST /api/auth/refreshToken with the current refresh token
                    resp = get_httpx_client().post(
                        f"{RC_URL}/api/auth/refreshToken",
                        headers={"Authorization": f"Bearer {access_token}"},
                        json={"refresh_token": refresh_token},
                    )
                    resp.raise_for_status()
                    payload = resp.json()
                    # Update global tokens for all worker threads
                    access_token = payload["access_token"]
                    refresh_token = payload["refresh_token"]
                    return
                except Exception:
                    sleep(WAIT_BEFORE_RETRY)
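
A retry decorator in the spirit of @api_call_with_retry might look like this (a minimal sketch; the real decorator additionally calls new_token() when it sees a 401 response):

```python
import time
from functools import wraps

ERROR_MAX_RETRY = 10
WAIT_BEFORE_RETRY = 0.5

def api_call_with_retry(func):
    """Retry a failing API call up to ERROR_MAX_RETRY times."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        last_exc = None
        for attempt in range(ERROR_MAX_RETRY):
            try:
                return func(*args, **kwargs)
            except Exception as exc:  # real code also refreshes the token on 401
                last_exc = exc
                time.sleep(WAIT_BEFORE_RETRY)
        raise last_exc
    return wrapper

calls = {"n": 0}

@api_call_with_retry
def flaky():
    """Stand-in for an API call that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"
```

Here flaky() returns "ok" on the third attempt, after two WAIT_BEFORE_RETRY pauses.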

Exception Handling Categories

| Category | Examples | Handling |
|----------|----------|----------|
| API Errors | Network timeouts, HTTP errors | Retry with exponential spacing |
| File I/O Errors | Missing config, permission denied | Graceful error + exit |
| Validation Errors | Invalid config, incoherent data | Log warning + prompt user |
| Thread Errors | Worker thread failures | Shutdown gracefully + propagate |

Graceful Degradation

  1. Missing Organization Mapping: Skip silently, use fallback (org name)
  2. Critical Quality Issues: Prompt user for confirmation before export
  3. Thread Failure: Shutdown all workers gracefully, preserve partial results
  4. Invalid Configuration: Clear error messages with remediation suggestions

📊 Data Output Structure

JSON Output: endobest_inclusions.json

[
  {
    "Patient_Identification": {
      "Organisation_Id": "uuid",
      "Organisation_Name": "Hospital Name",
      "Center_Name": "HOSP-A",
      "Patient_Id": "internal_id",
      "Pseudo": "ENDO-001",
      "Patient_Name": "Doe, John",
      "Patient_Birthday": "1975-05-15",
      "Patient_Age": 49
    },
    "Inclusion": {
      "Consent_Signed": true,
      "Inclusion_Date": "15/10/2024",
      "Inclusion_Status": "incluse",
      "isPrematurelyTerminated": false
    },
    "Extended_Fields": {
      "Custom_Field_1": "value",
      "Custom_Field_2": 42,
      "Composite_Score": "8/10"
    },
    "Endotest": {
      "Request_Sent": true,
      "Diagnostic_Status": "Completed"
    }
  }
]

JSON Output: endobest_organizations.json

[
  {
    "id": "org-uuid",
    "name": "Hospital A",
    "Center_Name": "HOSP-A",
    "patients_count": 45,
    "preincluded_count": 8,
    "included_count": 35,
    "prematurely_terminated_count": 2
  }
]

🚀 Execution Modes

Mode 1: Normal (Full Collection)

python eb_dashboard.py
  • Authenticates
  • Collects from APIs
  • Runs quality checks
  • Exports JSON + Excel
  • Duration: 2.5-5 minutes (typical)

Mode 2: Excel-Only (Fast Export)

python eb_dashboard.py --excel-only
  • Skips data collection
  • Uses existing JSON files
  • Regenerates Excel workbooks
  • Duration: 5-15 seconds
  • Use case: Reconfigure reports, test templates

Mode 3: Check-Only (Validation Only)

python eb_dashboard.py --check-only
  • Loads existing JSON
  • Runs quality checks
  • No export
  • Duration: 5-10 seconds
  • Use case: Verify data before distribution

Mode 4: Debug (Verbose Output)

python eb_dashboard.py --debug
  • Executes normal mode
  • Enables detailed logging
  • Shows field-by-field changes
  • Check dashboard.log for details

📈 Performance Metrics & Benchmarks

Typical Execution Times (Full Dataset: 1,200+ patients, 15+ organizations)

| Phase | Duration | Notes |
|-------|----------|-------|
| Login & Config | 2-3 sec | Sequential, network-dependent |
| Fetch Counters | 5-8 sec | 20 workers, parallelized |
| Collect Inclusions | 2-4 min | Includes API calls + field processing |
| Quality Checks | 10-15 sec | File loads, data comparison |
| Export to JSON | 3-5 sec | File I/O |
| Export to Excel | 5-15 sec | Template processing + fill |
| TOTAL | ~2.5-5 min | Depends on network, API perf |

Network Optimization Impact

With old questionnaire approach (N filtered calls per patient):

  • 1,200 patients × 15 questionnaires = 18,000 API calls
  • Estimated: 15-30 minutes

With optimized single-call questionnaire:

  • 1,200 patients × 1 call = 1,200 API calls
  • Estimated: 2-5 minutes
  • Improvement: 3-6x faster
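
The call-count arithmetic behind these estimates:

```python
patients = 1200
questionnaires_per_patient = 15

calls_before = patients * questionnaires_per_patient  # one call per questionnaire
calls_after = patients                                # one batched call per patient

assert calls_before == 18_000 and calls_after == 1_200
```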

🔍 Field Extraction & Processing Logic

Complete Field Processing Pipeline

For each field in INCLUSIONS_MAPPING_CONFIG:
  │
  ├─ Step 1: Determine Source Type
  │  ├─ q_id / q_name / q_category → Find questionnaire
  │  ├─ record → Use clinical record
  │  ├─ inclusion → Use patient inclusion data
  │  ├─ request → Use lab request data
  │  └─ calculated → Execute custom function
  │
  ├─ Step 2: Extract Raw Value
  │  ├─ Navigate JSON using field_path
  │  ├─ Supports wildcard (*) for list traversal
  │  └─ Return value or "undefined"
  │
  ├─ Step 3: Check Field Condition (optional)
  │  ├─ If condition undefined → Set to "undefined"
  │  ├─ If condition not boolean → Error flag
  │  ├─ If condition false → Set to "N/A"
  │  └─ If condition true → Continue
  │
  ├─ Step 4: Apply Post-Processing Transformations
  │  ├─ true_if_any: Convert to boolean
  │  ├─ value_labels: Map to localized text
  │  ├─ field_template: Apply formatting
  │  └─ List joining: Flatten arrays with pipe delimiter
  │
  ├─ Step 5: Format Score Dictionaries
  │  ├─ If {total, max} → Format as "total/max"
  │  └─ Otherwise → Keep as-is
  │
  └─ Store: output_inclusion[field_group][field_name] = final_value
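
Steps 3-5 of the pipeline can be condensed into one sketch (keyword names mirror the config columns but are illustrative, not the exact implementation):

```python
def process_field(raw, condition=None, value_labels=None, template=None):
    """Apply steps 3-5 of the field pipeline to one extracted value."""
    # Step 3: optional condition gate
    if condition is not None:
        if condition == "undefined":
            return "undefined"
        if not isinstance(condition, bool):
            return "ERROR: field condition is not boolean"
        if condition is False:
            return "N/A"
    # Step 4: post-processing transformations
    if isinstance(raw, list):
        raw = " | ".join(str(v) for v in raw)   # list joining with pipe delimiter
    if value_labels and isinstance(raw, (str, int, float, bool)) and raw in value_labels:
        raw = value_labels[raw]                 # map to localized label
    if template:
        raw = template.format(value=raw)        # field template formatting
    # Step 5: score dictionaries become "total/max"
    if isinstance(raw, dict) and {"total", "max"} <= raw.keys():
        return f"{raw['total']}/{raw['max']}"
    return raw
```

For example, `process_field({"total": 8, "max": 10})` yields `"8/10"`, while `process_field("x", condition=False)` yields `"N/A"`.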

Custom Functions for Calculated Fields

| Function | Purpose | Syntax |
|----------|---------|--------|
| search_in_fields_using_regex | Search multiple fields for pattern | `["search_in_fields_using_regex", "pattern", "field1", "field2"]` |
| extract_parentheses_content | Extract text within parentheses | `["extract_parentheses_content", "field_name"]` |
| append_terminated_suffix | Add suffix if patient terminated | `["append_terminated_suffix", "status_field", "is_terminated_field"]` |
| if_then_else | Unified conditional with 8 operators | `["if_then_else", "operator", arg1, arg2_optional, true_result, false_result]` |

if_then_else Operators:

  • is_true / is_false - Boolean field test
  • is_defined / is_undefined - Existence test
  • all_true / all_defined - Multiple field test
  • == / != - Value comparison
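
A dispatcher for these operators could be sketched as follows (hypothetical helper; the real Block 6 implementation may differ in how arguments are resolved from the field config):

```python
def if_then_else(op, args, true_result, false_result):
    """Evaluate one if_then_else rule from the field configuration.

    `args` holds the already-extracted field values the rule refers to;
    the operator names follow the list above.
    """
    def defined(v):
        return v not in (None, "undefined")

    tests = {
        "is_true":      lambda a: a[0] is True,
        "is_false":     lambda a: a[0] is False,
        "is_defined":   lambda a: defined(a[0]),
        "is_undefined": lambda a: not defined(a[0]),
        "all_true":     lambda a: all(v is True for v in a),
        "all_defined":  lambda a: all(defined(v) for v in a),
        "==":           lambda a: a[0] == a[1],
        "!=":           lambda a: a[0] != a[1],
    }
    return true_result if tests[op](args) else false_result
```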

Quality Assurance Framework

Coherence Check

Purpose: Verify API-provided statistics match actual collected data

Logic:

For each organization:
  API_Count = statistic.total
  Actual_Count = count of inclusion records

  if API_Count != Actual_Count:
    Report discrepancy with severity
    ├─ Discrepancy ≤ 10%: Warning
    └─ Discrepancy > 10%: Critical
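
The severity classification can be sketched like this (the 10% cutoff mirrors the rule above; the real check's exact thresholds may differ):

```python
def coherence_severity(api_count, actual_count):
    """Classify the API-vs-actual discrepancy for one organization.

    Returns None when the counts agree.
    """
    if api_count == actual_count:
        return None
    if api_count == 0:
        return "Critical"  # no reference count to compare against
    deviation = abs(api_count - actual_count) / api_count
    return "Warning" if deviation <= 0.10 else "Critical"
```

The example from earlier in this document, an API total of 145 against 143 actual inclusions, is a ~1.4% deviation and thus a Warning.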

Non-Regression Check

Purpose: Detect unexpected changes between data runs

Configuration-Driven Rules:

  • Field selection pipeline (include/exclude patterns)
  • Transition patterns (expected state changes)
  • Severity levels (Warning/Critical)
  • Exception handling (exclude specific organizations)

Logic:

Load previous inclusion data (_old file)

For each rule:
  ├─ Build candidate fields via pipeline
  ├─ Determine key field for matching
  └─ For each inclusion:
     ├─ Find matching old inclusion by key
     ├─ Check for unexpected transitions
     ├─ Apply exceptions
     └─ Report violations
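
The core matching-and-transition step can be sketched on plain dicts (illustrative names; the real check builds `allowed` and the key field from the Regression_Check sheet):

```python
def check_transitions(old_rows, new_rows, key, field, allowed, severity="Critical"):
    """Flag field changes between runs that match no allowed transition.

    `allowed` is a set of (old_value, new_value) pairs, e.g. a patient
    may move from "preincluse" to "incluse" but not back. Rows are
    matched by the configured key field.
    """
    old_by_key = {r[key]: r for r in old_rows}
    violations = []
    for row in new_rows:
        old = old_by_key.get(row[key])
        if old is None:
            continue  # new inclusion: reported by a separate rule
        before, after = old.get(field), row.get(field)
        if before != after and (before, after) not in allowed:
            violations.append((row[key], before, after, severity))
    return violations
```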

📋 Documentation Structure

The system includes comprehensive documentation:

| Document | Size | Content |
|----------|------|---------|
| DOCUMENTATION_10_ARCHITECTURE.md | 43.7 KB | System design, workflow, APIs, multithreading |
| DOCUMENTATION_11_FIELD_MAPPING.md | 56.3 KB | Field extraction logic, custom functions, examples |
| DOCUMENTATION_12_QUALITY_CHECKS.md | 60.2 KB | QA framework, regression rules, configuration |
| DOCUMENTATION_13_EXCEL_EXPORT.md | 29.6 KB | Excel generation, data transformation, config |
| DOCUMENTATION_98_USER_GUIDE.md | 8.4 KB | End-user instructions, troubleshooting, FAQ |
| DOCUMENTATION_99_CONFIG_GUIDE.md | 24.8 KB | Administrator reference, Excel tables, examples |

🔧 Key Technical Features

Thread Safety

  • Per-thread HTTP clients (no connection conflicts)
  • Synchronized access to global state via locks
  • Thread-safe progress bar updates

Error Recovery

  • Automatic token refresh on 401 errors
  • Exponential backoff retry logic (configurable)
  • Graceful degradation for optional features
  • User confirmation on critical issues

Configuration Flexibility

  • 100% externalized to Excel (zero code changes)
  • Supports multiple data sources
  • Custom business logic functions
  • Field dependencies and conditions
  • Value transformations and templates

Performance

  • Optimized API calls (4-5x improvement)
  • Parallel processing (20+ workers)
  • Async I/O operations
  • Configurable thread pools

Data Quality

  • Coherence checking (stats vs actual data)
  • Non-regression testing (config-driven)
  • Comprehensive validation
  • Audit trail logging

📦 Dependencies

Core Libraries

  • httpx - HTTP client with connection pooling
  • openpyxl - Excel file reading/writing
  • questionary - Interactive CLI prompts
  • tqdm - Progress bars
  • rich - Rich text formatting
  • pywin32 - Windows COM automation (optional, for formula recalculation)
  • pytz - Timezone support (optional)

Python Version

  • Python 3.7+

External Services

  • Ziwig IAM API
  • Ziwig Research Clinic (RC) API
  • Ziwig Lab (GDD) API

🎓 Usage Patterns

For End Users

  1. Configure fields in Excel (no code needed)
  2. Run: python eb_dashboard.py
  3. Review results in JSON or Excel

For Administrators

  1. Add new fields to Inclusions_Mapping
  2. Define quality rules in Regression_Check
  3. Configure Excel export in Excel_Workbooks + Excel_Sheets
  4. Restart: script picks up config automatically

For Developers

  1. Add custom function to Block 6 (eb_dashboard.py)
  2. Register in field config (Inclusions_Mapping)
  3. Use via: "source_id": "function_name"
  4. No code recompile needed for other changes

🎯 Summary

The Endobest Clinical Research Dashboard represents a mature, production-ready system that successfully combines:

  • Architectural Excellence - Clean modular design with separation of concerns
  • User-Centric Configuration - 100% externalized, no code changes needed
  • Performance Optimization - 4-5x faster via API and threading improvements
  • Robust Resilience - Comprehensive error handling, automatic recovery, graceful degradation
  • Quality Assurance - Multi-level validation, coherence checks, regression testing
  • Comprehensive Documentation - 250+ KB of technical and user guides
  • Maintainability - Clear code structure, extensive logging, audit trails

The system successfully enables non-technical users to configure complex data extraction and reporting workflows while maintaining enterprise-grade reliability and performance standards.


Document Version: 1.0 | Last Updated: 2025-11-08 | Status: Complete & Production Ready