# Endobest Clinical Research Dashboard - Technical Documentation
## Part 1: General Architecture & Report Generation Workflow
**Document Version:** 2.0 (Updated with Excel Export feature)
**Last Updated:** 2025-11-08
**Audience:** Developers, Technical Architects
**Language:** English
---
## Table of Contents
1. [Overview](#overview)
2. [System Architecture](#system-architecture)
3. [Module Structure](#module-structure)
4. [Complete Data Collection Workflow](#complete-data-collection-workflow)
5. [API Integration](#api-integration)
6. [Multithreading & Performance](#multithreading--performance)
7. [Data Processing Pipeline](#data-processing-pipeline)
8. [Execution Modes](#execution-modes)
9. [Error Handling & Resilience](#error-handling--resilience)
---
## Overview
The **Endobest Clinical Research Dashboard** is an automated data collection and processing system designed to extract, validate, and consolidate patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations.
### Key Characteristics
- **100% Externalized Configuration**: All extraction fields defined in Excel, zero code changes needed
- **Multi-Source Data Integration**: Fetches from RC (Research Clinic), GDD (Lab), and questionnaire APIs
- **High-Performance Multithreading**: 20+ concurrent workers for API parallelization
- **Comprehensive Quality Assurance**: Built-in coherence checks and regression testing
- **Thread-Safe Operations**: Dedicated HTTP clients per thread, synchronized access to shared resources
- **Automated Error Recovery**: Token refresh, automatic retry with a fixed wait between attempts
- **Audit Trail**: Detailed logging and JSON backup versioning
---
## System Architecture
### High-Level Component Diagram
```
┌─────────────────────────────────────────────────────────┐
│ Endobest Dashboard Main Process │
│ eb_dashboard.py │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Block 1-3 │ │ Block 4 │ │ Block 5-6 │ │
│ │ Config & Auth│ │ Config Load │ │ Data Extract │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Extended Fields Configuration │ │
│ │ (Excel: Mapping Sheet → JSON field mapping) │ │
│ └─────────────────────────────────────────────────┘ │
│ ↓ ↓ ↓ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Block 7 │ │ Block 8 │ │ Block 9 │ │
│ │ API Calls │ │ Orchestration│ │ Quality QA │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Multithreaded Processing (ThreadPoolExecutor) │ │
│ │ - Organizations: 20 workers (parallel) │ │
│ │ - Requests/Questionnaires: 40 workers (async) │ │
│ └─────────────────────────────────────────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Quality Checks & Validation │ │
│ │ - Coherence Check (stats vs detail) │ │
│ │ - Non-Regression Check (config-driven) │ │
│ └─────────────────────────────────────────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Export & Persistence │ │
│ │ - endobest_inclusions.json │ │
│ │ - endobest_organizations.json │ │
│ │ - Versioned backups (_old suffix) │ │
│ └─────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
┌──────────────────────────────────┐
│ Utility Modules │
├──────────────────────────────────┤
│ • eb_dashboard_utils.py │
│ • eb_dashboard_quality_checks.py │
└──────────────────────────────────┘
┌──────────────────────────────────┐
│ External APIs │
├──────────────────────────────────┤
│ • IAM (Authentication) │
│ • RC (Research Clinic) │
│ • GDD (Lab / Diagnostic Data) │
└──────────────────────────────────┘
```
---
## Module Structure
### 1. **eb_dashboard.py** (Primary Orchestrator)
**Size:** ~45 KB | **Lines:** 1,021
**Responsibility:** Main application logic, API coordination, multithreading
#### Major Blocks:
- **Block 1**: Configuration & Base Infrastructure (constants, global variables, progress bar setup)
- **Block 2**: Decorators & Resilience (retry logic, token refresh)
- **Block 3**: Authentication (IAM login, token management)
- **Block 4**: Extended Fields Configuration (Excel loading & validation)
- **Block 5**: Data Search & Extraction (questionnaire finding, field retrieval)
- **Block 6**: Custom Functions & Field Processing (business logic, calculated fields)
- **Block 7**: Business API Calls (RC, GDD endpoints)
- **Block 7b**: Organization Center Mapping (organization enrichment with center identifiers)
- **Block 8**: Processing Orchestration (patient data processing)
- **Block 9**: Main Execution (entry point, quality checks, export)
### 2. **eb_dashboard_utils.py** (Reusable Utilities)
**Size:** ~6.4 KB | **Lines:** 184
**Responsibility:** Generic utility functions shared across modules
#### Core Functions:
```python
get_httpx_client() # Thread-local HTTP client management
get_thread_position() # Progress bar positioning
get_nested_value() # JSON path navigation with wildcard support
get_config_path() # Config folder resolution (script vs PyInstaller)
get_old_filename() # Backup filename generation
```
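
To make the `get_nested_value` behavior concrete, here is a minimal sketch of path navigation with `*` wildcard support; the actual module may differ in details such as the default sentinel:

```python
from typing import Any

def get_nested_value(data: Any, path: list, default: Any = "undefined") -> Any:
    """Illustrative sketch: walk a JSON structure along `path`; a "*" segment
    fans out over every element of a list and collects the matches."""
    if not path:
        return data
    key, rest = path[0], path[1:]
    if key == "*" and isinstance(data, list):
        results = [get_nested_value(item, rest, default) for item in data]
        return [r for r in results if r != default] or default
    if isinstance(data, dict) and key in data:
        return get_nested_value(data[key], rest, default)
    if isinstance(data, list) and isinstance(key, int) and key < len(data):
        return get_nested_value(data[key], rest, default)
    return default
```

For example, the path `["record", "clinicResearchData", "*", "value"]` collects the `value` entry of every element in the `clinicResearchData` list.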
### 3. **eb_dashboard_quality_checks.py** (QA & Validation)
**Size:** ~59 KB | **Lines:** 1,266
**Responsibility:** Quality assurance, data validation, regression checking
#### Core Functions:
```python
load_regression_check_config() # Load regression rules from Excel
run_quality_checks() # Orchestrate all QA checks
coherence_check() # Verify stats vs detailed data consistency
non_regression_check() # Config-driven change validation
run_check_only_mode() # Standalone validation mode
backup_output_files() # Create versioned backups
```
### 4. **eb_dashboard_excel_export.py** (Excel Report Generation & Orchestration)
**Size:** ~38 KB | **Lines:** ~1,340 (v1.1+)
**Responsibility:** Configuration-driven Excel workbook generation with data transformation + high-level orchestration
#### Low-Level Functions (Data Processing):
```python
load_excel_export_config() # Load Excel_Workbooks and Excel_Sheets config
validate_excel_config() # Validate templates and named ranges
export_to_excel() # Main export orchestration (openpyxl + win32com)
_apply_filter() # AND-condition filtering
_apply_sort() # Multi-key sorting with datetime support
_apply_value_replacement() # Strict type matching value transformation
_handle_output_exists() # File conflict resolution (Overwrite/Increment/Backup)
_recalculate_workbook() # Formula recalculation via win32com (optional)
_process_sheet() # Sheet-specific data filling
```
#### High-Level Orchestration Functions (v1.1+):
```python
export_excel_only(sys_argv, console, ...) # Complete --excel-only mode orchestration
run_normal_mode_export(inclusions_data, organizations_data, enabled, config, ...) # Normal mode export phase
prepare_excel_export(inclusions_file, organizations_file, ...) # Prep + validate
execute_excel_export(inclusions_data, organizations_data, config, ...) # Exec + error handling
_load_json_file_internal(filename) # Safe JSON loading helper
```
**Design Pattern (v1.1+):**
- All export mechanics delegated to module (follows quality_checks pattern)
- Main script calls single function per mode: `export_excel_only()` or `run_normal_mode_export()`
- Configuration validation and error handling centralized in module
- Result: Main script focused on business logic, export details encapsulated
**Note:** See [DOCUMENTATION_13_EXCEL_EXPORT.md](DOCUMENTATION_13_EXCEL_EXPORT.md) for complete architecture and configuration details.
### 5. **eb_dashboard_constants.py** (Centralized Configuration)
**Size:** ~3.5 KB | **Lines:** 120
**Responsibility:** Single source of truth for all application constants
#### Constants Categories:
```python
# File Management
INCLUSIONS_FILE_NAME, ORGANIZATIONS_FILE_NAME, CONFIG_FOLDER_NAME, etc.
# Excel Configuration
DASHBOARD_CONFIG_FILE_NAME, ORG_CENTER_MAPPING_FILE_NAME
EXCEL_WORKBOOKS_TABLE_NAME, EXCEL_SHEETS_TABLE_NAME, etc.
# API Configuration
API_TIMEOUT, API_*_ENDPOINT (9 endpoints across Auth, RC, GDD)
DEFAULT_USER_NAME, DEFAULT_PASSWORD, IAM_URL, RC_URL, GDD_URL, RC_APP_ID
# Research Protocol
RC_ENDOBEST_PROTOCOL_ID, RC_ENDOBEST_EXCLUDED_CENTERS
# Performance & Quality
ERROR_MAX_RETRY, WAIT_BEFORE_RETRY, MAX_THREADS
EXCEL_RECALC_TIMEOUT
# Logging & UI
LOG_FILE_NAME, BAR_N_FMT_WIDTH, BAR_TOTAL_FMT_WIDTH, etc.
```
**Design Principle:** All constants are imported from this module - never duplicated or redefined in other modules. This ensures a single source of truth for all configuration values across the entire application.
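
As an illustration of this principle, a consuming module imports the values it needs rather than hardcoding them (a minimal sketch using constants listed above):

```python
# Sketch: import shared constants instead of redefining them locally,
# so a change in eb_dashboard_constants.py propagates everywhere.
import httpx

from eb_dashboard_constants import API_TIMEOUT, MAX_THREADS, RC_URL

client = httpx.Client(base_url=RC_URL, timeout=API_TIMEOUT)
worker_count = min(20, MAX_THREADS)  # never exceed the shared thread cap
```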
---
## Complete Data Collection Workflow
### Phase 1: Initialization & Authentication
```
START
[1] User Login Prompt
├─ Input: username, password (defaults available)
├─ IAM Authentication: POST /api/auth/ziwig-pro/login
├─ Get Master Token + User ID
└─ RC Token Exchange: POST /api/auth/config-token
└─ Output: access_token, refresh_token
[2] Configuration Loading
├─ Parse Excel: Endobest_Dashboard_Config.xlsx
├─ Load Inclusions_Mapping sheet → Field mapping definition
├─ Validate all field configurations
└─ Load Regression_Check sheet → Quality rules
[3] Thread Pool Configuration
├─ Main pool: ThreadPoolExecutor(user_input_threads, max=20)
├─ Async pool: ThreadPoolExecutor(40) for nested tasks
└─ Initialize per-thread HTTP clients
```
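
The following sketch condenses step [1] into working `httpx` calls, using the endpoints documented in the API Integration section below; constants are inlined and error handling is simplified for brevity:

```python
import httpx

IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"  # clientId from the token-exchange example

def authenticate(username: str, password: str) -> dict:
    """Sketch of step [1]: IAM login, then RC token exchange."""
    with httpx.Client(timeout=30) as client:
        # [1a] IAM login -> master token + user id
        login = client.post(f"{IAM_URL}/api/auth/ziwig-pro/login",
                            json={"username": username, "password": password})
        login.raise_for_status()
        master = login.json()
        # [1b] Exchange the master token for an RC-scoped token pair
        exchange = client.post(
            f"{RC_URL}/api/auth/config-token",
            headers={"Authorization": f"Bearer {master['access_token']}"},
            json={"userId": master["userId"], "clientId": RC_APP_ID,
                  "userAgent": "Mozilla/5.0"},
        )
        exchange.raise_for_status()
        return exchange.json()  # {"access_token": ..., "refresh_token": ...}
```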
### Phase 2: Organization & Counters Retrieval
```
[4] Get All Organizations
├─ API: GET /api/inclusions/getAllOrganizations
├─ Filter: Exclude RC_ENDOBEST_EXCLUDED_CENTERS
└─ Output: List of all centers
[5] Fetch Organization Counters (Parallelized)
├─ For each organization:
│ └─ POST /api/inclusions/inclusion-statistics
│ ├─ Protocol: RC_ENDOBEST_PROTOCOL_ID
│ └─ Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
├─ Execute: 20 parallel workers
└─ Output: Organizations with counters
[5b] Enrich Organizations with Center Mapping (Optional)
├─ Load mapping file: eb_org_center_mapping.xlsx (if exists)
├─ Parse sheet: Org_Center_Mapping
│ ├─ Extract: Organization_Name → Center_Name pairs
│ ├─ Validate: No duplicate organizations or centers
│ └─ Build: Normalized key mapping (case-insensitive, trimmed)
├─ For each organization:
│ ├─ Normalize organization name
│ ├─ Lookup in mapping dictionary
│ ├─ If found: Add Center_Name field (mapped value)
│ └─ If not found: Add Center_Name field (fallback to org name)
├─ Error Handling: Graceful degradation (missing file → mapping skipped with a warning)
└─ Output: Organizations with enriched Center_Name field
[6] Calculate Totals & Sort
├─ Sum all patient counts across organizations
├─ Sort organizations by patient count (descending)
└─ Display summary statistics
```
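
A minimal sketch of the parallel counter fetch in step [5], assuming a `fetch_counters(org)` helper that wraps the inclusion-statistics call and returns a dict of counters:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

def fetch_all_counters(organizations: list[dict], max_workers: int = 20) -> list[dict]:
    """Sketch of step [5]: one statistics call per organization, in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_counters, org): org for org in organizations}
        for future in tqdm(as_completed(futures), total=len(futures), desc="Counters"):
            org = futures[future]
            org.update(future.result())  # patients_count, included_count, ...
    return organizations
```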
### Phase 3: Patient Inclusion Data Collection
```
[7] For Each Organization (Parallelized - 20 workers):
├─ API: POST /api/inclusions/search?limit=1000&page=1
│ └─ Retrieve up to 1000 inclusions per organization
├─ Store: inclusions_list[]
└─ For Each Patient in Inclusions (Sequential):
[8] Fetch Patient Data Sources (Parallel):
├─ THREAD 1: GET /api/records/byPatient
│ └─ Retrieve clinical record, protocol inclusions, data
├─ THREAD 2: GET /api/surveys/filter/with-answers (OPTIMIZED)
│ └─ Single call retrieves ALL questionnaires + answers for patient
├─ THREAD 3: GET /api/requests/by-tube-id/{tubeId}
│ └─ Retrieve lab test results
└─ WAIT: All parallel threads complete
[9] Process Field Mappings
├─ For each field in field mapping config:
│ ├─ Determine field source (questionnaire, record, inclusion, request)
│ ├─ Extract raw value using field_path (supports JSON path + wildcards)
│ ├─ Apply field condition (if specified)
│ ├─ Execute custom functions (if Calculated type)
│ ├─ Apply post-processing transformations:
│ │ ├─ true_if_any: Convert to boolean if value matches list
│ │ ├─ value_labels: Map value to localized text
│ │ ├─ field_template: Apply formatting template
│ │ └─ List joining: Join array values with pipe delimiter
│ └─ Store in output_inclusion[field_group][field_name]
└─ Output: Complete inclusion record with all fields
[10] Progress Update
├─ Update per-organization progress bar
└─ Update global progress bar (thread-safe)
[11] Aggregate Results
└─ Combine all inclusions from all organizations
```
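
A sketch of the per-organization search in step [7], reusing the thread-local client from `eb_dashboard_utils`; the payload mirrors the Search Inclusions endpoint documented below:

```python
def fetch_inclusions_for_org(org: dict, access_token: str) -> list[dict]:
    """Sketch of step [7]: retrieve up to 1000 inclusions for one organization."""
    response = get_httpx_client().post(
        f"{RC_URL}/api/inclusions/search",
        params={"limit": 1000, "page": 1},
        headers={"Authorization": f"Bearer {access_token}"},
        json={"protocolId": RC_ENDOBEST_PROTOCOL_ID, "center": org["id"], "keywords": ""},
    )
    response.raise_for_status()
    return response.json()["data"]
```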
### Phase 4: Quality Assurance & Validation
```
[12] Sorting
├─ Sort by: Organization Name, Inclusion Date, Patient Pseudo
└─ Output: Ordered inclusions_list[]
[13] Quality Checks Execution
├─ COHERENCE CHECK:
│ ├─ Compare organization statistics (API counters)
│ ├─ vs. actual inclusion data (detailed records)
│ ├─ Verify: total, preincluded, included, prematurely_terminated counts
│ └─ Report mismatches with severity levels
├─ NON-REGRESSION CHECK:
│ ├─ Load previous inclusions (_old file)
│ ├─ Compare current vs. previous data
│ ├─ Apply config-driven regression rules
│ ├─ Detect: new inclusions, deleted inclusions, field changes
│ ├─ Apply transition patterns and exceptions
│ └─ Report violations by severity (Warning/Critical)
└─ Result: has_coherence_critical, has_regression_critical flags
[14] Critical Issues Handling
├─ If NO critical issues:
│ └─ Continue to export
├─ If YES critical issues:
│ ├─ Display warning: ⚠ CRITICAL issues detected!
│ ├─ Prompt user: "Do you want to write results anyway?"
│ ├─ If NO → Cancel export, exit gracefully
│ └─ If YES → Continue to export (user override)
```
### Phase 5: Export & Persistence
**Phase 5 covers both JSON persistence and the optional Excel export; the Excel step runs only when workbooks are configured:**
```
[15] Backup Old Files (only if checks passed)
├─ endobest_inclusions.json → endobest_inclusions_old.json
├─ endobest_organizations.json → endobest_organizations_old.json
└─ Operation: Silent, overwrite existing backups
[16] Write JSON Output Files
├─ File 1: endobest_inclusions.json
│ ├─ Format: JSON array of inclusion objects
│ ├─ Structure: Nested by field groups
│ └─ Size: Typically 6-7 MB (for full Endobest)
├─ File 2: endobest_organizations.json
│ ├─ Format: JSON array of organization objects
│ ├─ Includes: counters, statistics
│ └─ Size: Typically 17-20 KB
└─ Both: UTF-8 encoding, 4-space indentation
[17] Excel Export (if configured)
├─ DELEGATED TO: run_normal_mode_export()
├─ (from eb_dashboard_excel_export module)
├─ Workflow:
│ ├─ Check: Is Excel export enabled?
│ │ └─ If NO → Skip to Completion (step 18)
│ │ └─ If YES → Continue
│ │
│ ├─ Load JSONs from filesystem
│ │ └─ Ensures consistency with just-written files
│ │
│ ├─ Load Excel export configuration
│ │ ├─ Sheet: Excel_Workbooks (workbook definitions)
│ │ └─ Sheet: Excel_Sheets (sheet configurations)
│ │
│ ├─ For each configured workbook:
│ │ ├─ Load template file (openpyxl)
│ │ ├─ For each sheet in workbook:
│ │ │ ├─ Load source data (Inclusions or Organizations JSON)
│ │ │ ├─ Apply filter (AND conditions)
│ │ │ ├─ Apply multi-key sort (datetime-aware)
│ │ │ ├─ Apply value replacements (strict type matching)
│ │ │ └─ Fill data into cells/named ranges
│ │ │
│ │ ├─ Handle file conflicts (Overwrite/Increment/Backup strategy)
│ │ ├─ Save workbook (openpyxl)
│ │ └─ Recalculate formulas (optional, via win32com)
│ │
│ └─ Return: status (success/failure) + error message
└─ Note: See DOCUMENTATION_13_EXCEL_EXPORT.md for data transformation details
[18] Completion & Reporting
├─ Display elapsed time
├─ Report all file locations (JSONs + Excel files if generated)
├─ Log all operations to dashboard.log
└─ EXIT
```
**Three Operating Modes:**
1. **NORMAL MODE** (full workflow)
- Collect data → Quality checks → Write JSONs → Excel export (if enabled)
2. **--excel-only MODE**
- Skip data collection + quality checks
- Load existing JSONs → Excel export
- Uses: `export_excel_only()` function from module
3. **--check-only MODE**
- Skip data collection
- Run quality checks only
- Uses: `run_check_only_mode()` function from quality_checks module
### Expected Output Structure
```json
[
    {
        "Patient_Identification": {
            "Organisation_Id": "uuid",
            "Organisation_Name": "Center Name",
            "Patient_Id": "internal_id",
            "Pseudo": "ENDO-001",
            "Patient_Name": "Doe, John",
            "Patient_Birthday": "1975-05-15",
            "Patient_Age": 49
        },
        "Inclusion": {
            "Consent_Signed": true,
            "Inclusion_Date": "15/10/2024",
            "Inclusion_Status": "incluse",
            "Inclusion_Complex": "Non",
            "isPrematurelyTerminated": false,
            "Inclusion_Status_Complete": "incluse",
            "Need_RCP": false
        },
        "Extended_Fields": {
            "Custom_Field_1": "value",
            "Custom_Field_2": 42
        },
        "Endotest": {
            "Request_Sent": true,
            "Diagnostic_Status": "Completed",
            "Request_Overall_Status": "Accepted par Ziwig Lab"
        },
        "Infos Générales": {
            "Couleurs (ex: 8/10)": "8/10",
            "Qualité de vie (ex: 43/55)": "43/55"
        }
    }
]
```
---
## API Integration
### Authentication APIs (IAM)
#### Login Endpoint
```
POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login

Request:
{
    "username": "user@example.com",
    "password": "password123"
}

Response:
{
    "access_token": "jwt_token_master",
    "userId": "user-uuid",
    ...
}
```
#### Token Exchange (RC-specific)
```
POST https://api-hcp.ziwig-connect.com/api/auth/config-token

Headers:
    Authorization: Bearer {master_token}

Request:
{
    "userId": "user-uuid",
    "clientId": "602aea51-cdb2-4f73-ac99-fd84050dc393",
    "userAgent": "Mozilla/5.0..."
}

Response:
{
    "access_token": "jwt_token_rc",
    "refresh_token": "refresh_token_value"
}
```
#### Token Refresh (Automatic on 401)
```
POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken

Headers:
    Authorization: Bearer {current_access_token}

Request:
{
    "refresh_token": "refresh_token_value"
}

Response:
{
    "access_token": "new_jwt_token",
    "refresh_token": "new_refresh_token"
}
```
### Research Clinic APIs (RC)
#### Get All Organizations
```
GET https://api-hcp.ziwig-connect.com/api/inclusions/getAllOrganizations

Headers:
    Authorization: Bearer {access_token}

Response:
[
    {
        "id": "org-uuid",
        "name": "Center Name",
        "address": "...",
        ...
    }
]
```
#### Get Organization Statistics
```
POST https://api-hcp.ziwig-connect.com/api/inclusions/inclusion-statistics

Headers:
    Authorization: Bearer {access_token}

Request:
{
    "protocolId": "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e",
    "center": "org-uuid",
    "excludedCenters": ["excluded-org-uuid-1", "excluded-org-uuid-2"]
}

Response:
{
    "statistic": {
        "totalInclusions": 145,
        "preIncluded": 23,
        "included": 110,
        "prematurelyTerminated": 12
    }
}
```
#### Search Inclusions by Organization
```
POST https://api-hcp.ziwig-connect.com/api/inclusions/search?limit=1000&page=1

Headers:
    Authorization: Bearer {access_token}

Request:
{
    "protocolId": "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e",
    "center": "org-uuid",
    "keywords": ""
}

Response:
{
    "data": [
        {
            "id": "patient-uuid",
            "name": "Doe, John",
            "status": "incluse",
            ...
        }
    ]
}
```
#### Get Patient Clinical Record
```
POST https://api-hcp.ziwig-connect.com/api/records/byPatient

Headers:
    Authorization: Bearer {access_token}

Request:
{
    "center": "org-uuid",
    "patientId": "patient-uuid",
    "mode": "exchange",
    "state": "ongoing",
    "includeEndoParcour": false,
    "sourceClient": "pro_prm"
}

Response:
{
    "record": {
        "protocol_inclusions": [
            {
                "status": "incluse",
                "blockedQcmVersions": [],
                "clinicResearchData": [
                    {
                        "requestMetaData": {
                            "tubeId": "tube-uuid"
                        }
                    }
                ]
            }
        ]
    }
}
```
#### Get All Questionnaires for Patient (Optimized)
```
POST https://api-hcp.ziwig-connect.com/api/surveys/filter/with-answers

Headers:
    Authorization: Bearer {access_token}

Request:
{
    "context": "clinic_research",
    "subject": "patient-uuid",
    "blockedQcmVersions": []    (optional)
}

Response:
[
    {
        "questionnaire": {
            "id": "qcm-uuid",
            "name": "Questionnaire Name",
            "category": "Category"
        },
        "answers": {
            "question_1": "answer_value",
            "question_2": true,
            ...
        }
    }
]
```
### Lab APIs (GDD)
#### Get Request by Tube ID
```
GET https://api-lab.ziwig-connect.com/api/requests/by-tube-id/{tubeId}?isAdmin=true&organization=undefined

Headers:
    Authorization: Bearer {access_token}

Response:
{
    "id": "request-uuid",
    "status": "completed",
    "tubeId": "tube-uuid",
    "diagnostic_status": "Completed",
    "results": [
        {
            "test_name": "Test Result",
            "value": "Result Value"
        }
    ]
}
```
---
## Multithreading & Performance
### Thread Pool Architecture
```
Main Application Thread
┌─────────────────────────────────────────────────────┐
│ Phase 1: Counter Fetching │
│ ThreadPoolExecutor(max_workers=user_input) │
│ ├─ Task 1: Get counter for Org 1 │
│ ├─ Task 2: Get counter for Org 2 │
│ └─ Task N: Get counter for Org N │
│ [Sequential wait: tqdm.as_completed] │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Phase 2: Inclusion Data Collection (Nested) │
│ Outer: ThreadPoolExecutor(max_workers=user_input) │
│ ├─ For Org 1: │
│ │ └─ Inner: ThreadPoolExecutor(max_workers=40) │
│ │ ├─ Patient 1: Async request/questionnaires │
│ │ ├─ Patient 2: Async request/questionnaires │
│ │ └─ Patient N: Async request/questionnaires │
│ │ └─ [Sequential wait: as_completed] │
│ │ │
│ ├─ For Org 2: │
│ │ └─ [Similar parallel processing] │
│ │ │
│ └─ For Org N: │
│ └─ [Similar parallel processing] │
│ [Outer wait: tqdm.as_completed] │
└─────────────────────────────────────────────────────┘
```
### Performance Optimizations
#### 1. Questionnaire Batching
**Problem:** Multiple filtered API calls per patient (slow)
**Solution:** Single optimized API call retrieves all questionnaires with answers
**Impact:** 4-5x performance improvement
```python
# BEFORE (inefficient): one filtered call per questionnaire
for qcm_id in questionnaire_ids:
    answers = client.get(f"{RC_URL}/api/surveys/{qcm_id}/answers",
                         params={"subject": patient_id})

# AFTER (optimized): a single call returns all questionnaires with answers
all_answers = client.post(f"{RC_URL}/api/surveys/filter/with-answers",
                          json={"context": "clinic_research", "subject": patient_id})
```
#### 2. Thread-Local HTTP Clients
**Problem:** Shared httpx.Client causes connection conflicts
**Solution:** Each thread maintains its own client
**Implementation:**
```python
import threading
import httpx

httpx_clients: dict[int, httpx.Client] = {}  # One client per thread id

def get_httpx_client() -> httpx.Client:
    thread_id = threading.get_ident()
    if thread_id not in httpx_clients:
        httpx_clients[thread_id] = httpx.Client()
    return httpx_clients[thread_id]
```
#### 3. Nested Parallelization
**Problem:** Sequential patient processing within organization
**Solution:** Submitting request/questionnaire fetches to async pool
**Benefit:** Non-blocking I/O during main thread processing
```python
for inclusion in inclusions:
    output_inclusion = _process_inclusion_data(inclusion, organization)

# Within _process_inclusion_data():
request_future = subtasks_thread_pool.submit(get_request_by_tube_id, tube_id)
all_questionnaires = get_all_questionnaires_by_patient(patient_id, record_data)
request_data = request_future.result()  # Wait for async completion
```
#### 4. Configurable Worker Threads
**User Input:** Thread count selection (1-20 workers)
**Rationale:** Allows tuning for network bandwidth, API rate limits, system resources
### Progress Tracking
#### Multi-Level Progress Bars
```
Overall Progress [████████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [██████████░░░░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██████░░░░░░░░░░░░░░░░░░░] 42/110
3/15 - Center 3 [████░░░░░░░░░░░░░░░░░░░░░] 28/85
```
#### Thread-Safe Progress Updates
```python
with _global_pbar_lock:
    if global_pbar:
        global_pbar.update(1)  # Thread-safe update
```
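
A sketch of how the bars could be wired together; `total_patients` is illustrative (known after the counter phase), and per-organization bars would use `get_thread_position()` from the utils module for their `position` argument:

```python
import threading
from tqdm import tqdm

_global_pbar_lock = threading.Lock()
total_patients = 1200  # example value, known after the counter phase

# Overall bar at position 0; per-organization bars are created by worker
# threads at positions 1..N (via get_thread_position()).
global_pbar = tqdm(total=total_patients, desc="Overall Progress", position=0)

def on_patient_done():
    # Called from worker threads after each processed inclusion
    with _global_pbar_lock:
        global_pbar.update(1)
```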
---
## Data Processing Pipeline
### Field Extraction Logic
```
For each field in field mapping configuration:
├─ Input: field configuration from Excel
├─ Step 1: Determine Field Source
│ ├─ If source_type in [q_id, q_name, q_category]
│ │ └─ Find questionnaire in all_questionnaires dict
│ ├─ If source_type == "record"
│ │ └─ Use record_data (clinical record)
│ ├─ If source_type == "inclusion"
│ │ └─ Use inclusion_data (patient inclusion data)
│ ├─ If source_type == "request"
│ │ └─ Use request_data (lab test request)
│ └─ If source_name == "Calculated"
│ └─ Execute custom function
├─ Step 2: Extract Raw Value
│ ├─ Navigate JSON using field_path (supports * wildcard)
│ ├─ Example: ["record", "clinicResearchData", "*", "value"]
│ └─ Result: raw_value or "undefined"
├─ Step 3: Check Field Condition (optional)
│ ├─ If condition field is undefined
│ │ └─ Set final_value = "undefined"
│ ├─ If condition field is not boolean
│ │ └─ Set final_value = "$$$$ Condition Field Error"
│ ├─ If condition field is False
│ │ └─ Set final_value = "N/A"
│ └─ If condition field is True
│ └─ Continue processing
├─ Step 4: Apply Post-Processing Transformations
│ ├─ true_if_any: Convert to boolean
│ │ └─ If raw_value matches any value in true_if_any list → True
│ │ └─ Otherwise → False
│ │
│ ├─ value_labels: Map to localized text
│ │ └─ Find matching label_map entry by raw_value
│ │ └─ Replace with French text (text.fr)
│ │
│ ├─ field_template: Apply formatting
│ │ └─ Replace "$value" placeholder with formatted value
│ │ └─ Example: "$value%" → "85%"
│ │
│ └─ List joining: Flatten arrays
│ └─ Join array elements with "|" delimiter
├─ Step 5: Format Score Dictionaries
│ ├─ If value is dict with keys ['total', 'max']
│ │ └─ Format as "total/max" string
│ │ └─ Example: {"total": 8, "max": 10} → "8/10"
│ └─ Otherwise: Keep as-is
└─ Output: final_value
└─ Stored in output_inclusion[field_group][field_name]
```
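
A condensed sketch of steps 4-5 in the order described above, with `value_labels` simplified to a plain dict (the real configuration maps via label entries with localized `text.fr`):

```python
def postprocess(raw_value, field_cfg: dict):
    """Sketch of steps 4-5: post-processing transformations, then score formatting."""
    value = raw_value
    if "true_if_any" in field_cfg:                      # -> boolean
        value = value in field_cfg["true_if_any"]
    if "value_labels" in field_cfg:                     # -> localized text
        value = field_cfg["value_labels"].get(value, value)
    if isinstance(value, list):                         # list joining
        value = "|".join(str(v) for v in value)
    if "field_template" in field_cfg:                   # formatting template
        value = field_cfg["field_template"].replace("$value", str(value))
    if isinstance(value, dict) and {"total", "max"} <= value.keys():
        value = f"{value['total']}/{value['max']}"      # score dict -> "8/10"
    return value
```

For example, `postprocess({"total": 8, "max": 10}, {})` yields `"8/10"`.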
### Custom Functions for Calculated Fields
#### 1. search_in_fields_using_regex
**Purpose:** Search multiple fields for regex pattern match
**Syntax:** `["search_in_fields_using_regex", "regex_pattern", "field_1", "field_2", ...]`
**Logic:**
```
FOR each field in [field_1, field_2, ...]:
    IF field value matches regex_pattern (case-insensitive):
        RETURN True
RETURN False
```
**Example:**
```json
{
    "source_id": "search_in_fields_using_regex",
    "field_path": [".*surgery.*", "Indication", "Previous_Surgery"]
}
```
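
A Python sketch of this function, assuming a `get_value_from_inclusion(field_name)` helper that resolves already-extracted fields of the current inclusion:

```python
import re

def search_in_fields_using_regex(pattern: str, *field_names: str) -> bool:
    """Sketch: True if any listed field's value matches the pattern (case-insensitive)."""
    for name in field_names:
        value = get_value_from_inclusion(name)  # assumed lookup helper
        if value is not None and re.search(pattern, str(value), re.IGNORECASE):
            return True
    return False
```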
#### 2. extract_parentheses_content
**Purpose:** Extract text within parentheses
**Syntax:** `["extract_parentheses_content", "field_name"]`
**Logic:**
```
value = get_value_from_inclusion(field_name)
RETURN match first occurrence of (content) pattern
```
**Example:**
```
Input: "Status (Active)"
Output: "Active"
```
#### 3. append_terminated_suffix
**Purpose:** Add " - AP" suffix if patient prematurely terminated
**Syntax:** `["append_terminated_suffix", "status_field", "is_terminated_field"]`
**Logic:**
```
status = get_value_from_inclusion(status_field)
is_terminated = get_value_from_inclusion(is_terminated_field)
IF is_terminated == True:
    RETURN status + " - AP"
ELSE:
    RETURN status
```
#### 4. if_then_else
**Purpose:** Unified conditional logic with 8 operators
**Syntax:** `["if_then_else", "operator", arg1, arg2_optional, result_if_true, result_if_false]`
**Operators:**
| Operator | Args | Logic |
|----------|------|-------|
| `is_true` | field, true_val, false_val | IF field == True THEN true_val ELSE false_val |
| `is_false` | field, true_val, false_val | IF field == False THEN true_val ELSE false_val |
| `is_defined` | field, true_val, false_val | IF field is not undefined THEN true_val ELSE false_val |
| `is_undefined` | field, true_val, false_val | IF field is undefined THEN true_val ELSE false_val |
| `all_true` | [fields_list], true_val, false_val | IF all fields are True THEN true_val ELSE false_val |
| `all_defined` | [fields_list], true_val, false_val | IF all fields are defined THEN true_val ELSE false_val |
| `==` | value1, value2, true_val, false_val | IF value1 == value2 THEN true_val ELSE false_val |
| `!=` | value1, value2, true_val, false_val | IF value1 != value2 THEN true_val ELSE false_val |
**Value Resolution Rules:**
- **Boolean literals:** `true`, `false` → used directly
- **Numeric literals:** `42`, `3.14` → used directly
- **String literals:** Prefixed with `$` → `$"Active"` resolves to the literal `"Active"`
- **Field references:** No prefix → looked up from inclusion data
**Examples:**
```json
{
    "source_id": "if_then_else",
    "field_path": ["is_defined", "Patient_Id", "$\"DEFINED\"", "$\"UNDEFINED\""]
}
{
    "source_id": "if_then_else",
    "field_path": ["==", "Status", "$\"Active\"", "$\"Is Active\"", "$\"Not Active\""]
}
{
    "source_id": "if_then_else",
    "field_path": ["all_true", ["Is_Consented", "Is_Included"], true, false]
}
```
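
A sketch of the resolution rules and a few of the operators, again assuming the `get_value_from_inclusion` helper; the real implementation covers all 8 operators:

```python
def _resolve(token):
    """Sketch of the value-resolution rules described above."""
    if isinstance(token, (bool, int, float)):
        return token                            # boolean / numeric literal
    if isinstance(token, str) and token.startswith("$"):
        return token[1:].strip('"')             # $"Active" -> "Active"
    return get_value_from_inclusion(token)      # bare string -> field lookup (assumed helper)

def if_then_else(operator, *args):
    """Sketch covering three of the eight operators."""
    if operator == "is_true":
        field, true_val, false_val = args
        return _resolve(true_val) if _resolve(field) is True else _resolve(false_val)
    if operator == "==":
        v1, v2, true_val, false_val = args
        return _resolve(true_val) if _resolve(v1) == _resolve(v2) else _resolve(false_val)
    if operator == "all_true":
        fields, true_val, false_val = args
        ok = all(_resolve(name) is True for name in fields)
        return _resolve(true_val) if ok else _resolve(false_val)
    raise ValueError(f"Unsupported operator: {operator}")
```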
---
## Execution Modes
### Mode 1: Normal Mode (Full Data Collection)
```bash
python eb_dashboard.py
```
**Workflow:**
1. User login (with defaults)
2. Load configuration
3. Collect organizations & counters
4. Collect all inclusion data (parallelized)
5. Run quality checks (coherence + regression)
6. Prompt user if critical issues
7. Export JSON files
8. Display elapsed time
**Output Files:**
- `endobest_inclusions.json`
- `endobest_organizations.json`
- Backup files with `_old` suffix
- Excel files (if configured in Excel_Workbooks table)
### Mode 2: Excel-Only Mode (Fast Export) - NEW
```bash
python eb_dashboard.py --excel-only
```
**Workflow:**
1. Load existing JSON files (no API calls, no collection)
2. Load Excel export configuration
3. Generate Excel workbooks from existing data
4. Exit
**Use Case:** Regenerate Excel reports without data collection (faster iteration), test new configurations, apply new filters/sorts
**Output Files:**
- Excel files as specified in Excel_Workbooks configuration
### Mode 3: Check-Only Mode (Validation Only)
```bash
python eb_dashboard.py --check-only
```
**Workflow:**
1. Load existing JSON files (no API calls)
2. Load regression check configuration
3. Run quality checks without collecting new data
4. Report any issues
5. Exit
**Use Case:** Validate data before distribution, no fresh collection needed
### Mode 4: Check-Only Compare Mode (File Comparison)
```bash
python eb_dashboard.py --check-only file1.json file2.json
```
**Workflow:**
1. Load two specific JSON files
2. Run regression check comparing file1 vs file2
3. Skip coherence check (organizations file not needed)
4. Report differences
5. Exit
**Use Case:** Compare two snapshot versions without coherence validation
### Mode 5: Debug Mode (Detailed Output)
```bash
python eb_dashboard.py --debug
```
**Workflow:**
1. Execute as normal mode
2. Enable DEBUG_MODE in quality checks module
3. Display detailed field-by-field changes
4. Show individual inclusion comparisons
5. Verbose logging
**Use Case:** Troubleshoot regression check rules, understand data changes
---
## Organization ↔ Center Mapping
### Overview
The organization-to-center mapping feature enriches healthcare organization records with standardized center identifiers. This enables center-based reporting without requiring code modifications.
### Configuration
**File:** `eb_org_center_mapping.xlsx` (optional, in script directory)
**Sheet Name:** `Org_Center_Mapping` (case-sensitive)
**Required Columns:**
```
| Organization_Name | Center_Name |
|-------------------|-------------|
| Hospital A | HOSP-A |
| Hospital B | HOSP-B |
```
### Workflow
1. **Load Mapping** (Step [5b] of Phase 2)
- Read `eb_org_center_mapping.xlsx` if file exists
- Parse `Org_Center_Mapping` sheet
- Skip mapping if file not found (graceful degradation, with a printed warning)
2. **Validate Data**
- Check for duplicate organization names (normalized: lowercase, trimmed)
- Check for duplicate center names
- If duplicates found: abort mapping, return empty dict
3. **Build Mapping Dictionary**
- Key: normalized organization name
- Value: center name (original case preserved)
- Example: `{"hospital a": "HOSP-A"}`
4. **Apply to Organizations**
- For each organization from RC API:
- Normalize organization name (lowercase, trim)
- Lookup in mapping dictionary
- If found: Add `Center_Name` field with mapped value
- If not found: Add `Center_Name` field with fallback (org name)
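
A minimal sketch of steps 3-4 (normalization and lookup with fallback), where `mapping` is the normalized-key dict from step 3, e.g. `{"hospital a": "HOSP-A"}`:

```python
def normalize(name: str) -> str:
    return name.strip().lower()

def apply_center_mapping(organizations: list[dict], mapping: dict[str, str]) -> None:
    """Sketch of step 4: enrich each organization in place, falling back to its own name."""
    for org in organizations:
        org["Center_Name"] = mapping.get(normalize(org["name"]), org["name"])
```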
### Error Handling
| Scenario | Behavior |
|----------|----------|
| File missing | Print warning, skip mapping |
| Sheet not found | Print warning, skip mapping |
| Columns missing | Print warning, skip mapping |
| Duplicate organizations | Abort mapping, print error |
| Duplicate centers | Abort mapping, print error |
| Organization not in mapping | Use fallback (org name) |
### Output
**In `endobest_organizations.json`:**
```json
{
    "id": "org-uuid",
    "name": "Hospital A",
    "Center_Name": "HOSP-A",
    "patients_count": 45,
    ...
}
```
**In `endobest_inclusions.json` (if extended field configured):**
```json
{
    "Patient_Identification": {
        "Organisation_Name": "Hospital A",
        "Center_Name": "HOSP-A",
        ...
    }
}
```
### Example
**Input Organizations (from RC API):**
```json
[
    {"id": "org1", "name": "Hospital A"},
    {"id": "org2", "name": "Hospital B"},
    {"id": "org3", "name": "Clinic C"}
]
```
**Mapping File:**
```
Organization_Name | Center_Name
Hospital A | HOSP-A
Hospital B | HOSP-B
```
**Console Output:**
```
Mapping organizations to centers...
⚠ 1 organization(s) not mapped:
- Clinic C
```
**Result:** Clinic C uses fallback → `Center_Name = "Clinic C"`
### Features
- **Case-Insensitive Matching**: "Hospital A" matches "hospital a" in file
- **Whitespace Trimming**: " Hospital A " matches "Hospital A"
- **Graceful Degradation**: Missing file doesn't break process
- **Fallback Strategy**: Unmapped organizations use original name
- **No Code Changes**: Fully configurable via Excel file
---
## Error Handling & Resilience
### Token Management Strategy
#### 1. Automatic Token Refresh on 401
```python
@api_call_with_retry
def some_api_call():
    # If response.status_code == 401, new_token() is called
    # automatically and the request is retried.
    pass
```
#### 2. Thread-Safe Token Refresh
```python
def new_token():
    global access_token, refresh_token
    with _token_refresh_lock:  # Only one thread refreshes at a time
        # Attempt refresh up to ERROR_MAX_RETRY times
        for attempt in range(ERROR_MAX_RETRY):
            try:
                response = get_httpx_client().post(
                    f"{RC_URL}/api/auth/refreshToken",
                    headers={"Authorization": f"Bearer {access_token}"},
                    json={"refresh_token": refresh_token},
                )
                response.raise_for_status()
                tokens = response.json()
                access_token = tokens["access_token"]   # Update global tokens
                refresh_token = tokens["refresh_token"]
                return
            except httpx.HTTPError:
                sleep(WAIT_BEFORE_RETRY)
```
### Retry Mechanism
#### Configuration Constants
```python
ERROR_MAX_RETRY = 10 # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5 # Seconds between retries (no exponential backoff)
```
#### Retry Logic
```python
for attempt in range(ERROR_MAX_RETRY):
    try:
        # Make API call
        response.raise_for_status()
        return result
    except (httpx.RequestError, httpx.HTTPStatusError) as exc:
        logging.warning(f"Error (Attempt {attempt + 1}/{ERROR_MAX_RETRY}): {exc}")
        # Handle 401 (token expired)
        if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code == 401:
            logging.info("Token expired. Refreshing token.")
            new_token()
        # Wait before retry (except last attempt)
        if attempt < ERROR_MAX_RETRY - 1:
            sleep(WAIT_BEFORE_RETRY)

# If all retries fail
logging.critical(f"Persistent error after {ERROR_MAX_RETRY} attempts")
raise httpx.RequestError(message="Persistent error")
```
### Exception Handling
#### API Errors
- **httpx.RequestError:** Network errors, connection timeouts, DNS failures
- **httpx.HTTPStatusError:** HTTP status codes >= 400
- **json.JSONDecodeError:** Invalid JSON in configuration or response
#### File I/O Errors
- **FileNotFoundError:** Configuration file missing
- **IOError:** Cannot write output files
- **json.JSONDecodeError:** Corrupted JSON file loading
#### Validation Errors
- **Configuration validation:** Invalid field definitions in Excel
- **Data validation:** Incoherent statistics vs. detailed data
- **Regression check violations:** Unexpected data changes
#### Error Logging
```python
import logging

logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s - %(levelname)s - %(message)s',
    filename='dashboard.log',
    filemode='w'
)
```
**Logged Events:**
- API errors with attempt numbers
- Token refresh events
- Configuration loading status
- Quality check results
- File I/O operations
- Thread errors with stack traces
### Graceful Degradation
#### User Confirmation on Critical Issues
```
If has_coherence_critical or has_regression_critical:
    Display: "⚠ CRITICAL issues detected in quality checks!"
    Prompt: "Do you want to write the results anyway?"
    If YES:
        Continue with export (user override)
    If NO:
        Cancel export, preserve old files
        Exit gracefully
```
#### Thread Failure Handling
```python
try:
    result = future.result()
    output_inclusions.extend(result)
except Exception as exc:
    logging.critical(f"Critical error in worker: {exc}", exc_info=True)
    thread_pool.shutdown(wait=False, cancel_futures=True)
    raise  # Propagate to main handler
```
#### Main Exception Handler
```python
if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        logging.critical(f"Script terminated prematurely: {e}", exc_info=True)
        print(f"Error: {e}")
    finally:
        if 'subtasks_thread_pool' in globals():
            subtasks_thread_pool.shutdown(wait=False, cancel_futures=True)
        input("Press Enter to exit...")
```
---
## Performance Metrics & Benchmarks
### Typical Execution Times
For a full Endobest dataset (1,200+ patients, 15+ organizations):
| Phase | Duration | Notes |
|-------|----------|-------|
| Login & Config | ~2-3 sec | Sequential |
| Fetch Counters (20 workers) | ~5-8 sec | Parallelized |
| Collect Inclusions (20 workers) | ~2-4 min | Includes API calls + processing |
| Quality Checks | ~10-15 sec | Loads files, compares data |
| Export to JSON | ~3-5 sec | File I/O |
| **Total** | **~2.5-5 min** | Depends on network, API performance |
### Network Optimization Impact
**With old questionnaire fetching (N filtered calls per patient):**
- 1,200 patients × 15 questionnaires = 18,000 API calls
- Estimated: 15-30 minutes
**With optimized single-call questionnaire fetching:**
- 1,200 patients × 1 call = 1,200 API calls
- Estimated: 2-5 minutes
- **Improvement: 3-6x faster**
---
## Configuration Files
### Excel Configuration File: `Endobest_Dashboard_Config.xlsx`
#### Sheet 1: Inclusions_Mapping (Field Mapping Definition)
Defines all fields to be extracted and their transformation rules.
See **DOCUMENTATION_11_FIELD_MAPPING.md** for detailed guide.
#### Sheet 2: Regression_Check (Non-Regression Rules)
Defines data validation rules for detecting unexpected changes.
See **DOCUMENTATION_12_QUALITY_CHECKS.md** for detailed guide.
---
## Summary
The **Endobest Dashboard** implements a sophisticated, production-grade data collection system with:
- **Flexible Configuration:** Zero-code field definitions via Excel
- **High Performance:** 4-5x faster via optimized API calls
- **Robust Resilience:** Automatic token refresh, retries, error recovery
- **Thread Safety:** Per-thread clients, synchronized shared state
- **Quality Assurance:** Coherence checks + config-driven regression testing
- **Comprehensive Logging:** Full audit trail in dashboard.log
- **User-Friendly:** Progress bars, interactive prompts, clear error messages
This architecture enables non-technical users to configure new data sources without code changes, while providing developers with extensible hooks for custom logic and quality validation.
---
**Document End**