# 📊 Endobest Clinical Research Dashboard - Architecture Summary
**Last Updated:** 2025-11-08
**Project Status:** Production Ready with Excel Export Feature
**Language:** Python 3.x
---
## 🎯 Executive Summary
The **Endobest Clinical Research Dashboard** is a sophisticated, production-grade automated data collection and reporting system designed to aggregate patient inclusion data from the Endobest clinical research protocol across multiple healthcare organizations. The system combines high-performance multithreading, comprehensive quality assurance, and fully externalized configuration to enable non-technical users to manage complex data extraction workflows without code modifications.
### Core Value Propositions
- **100% Externalized Configuration** - All field definitions, quality rules, and export logic defined in Excel
- **High-Performance Architecture** - 4-5x faster via optimized API calls and parallel processing
- **Robust Resilience** - Automatic token refresh, retries, graceful degradation
- **Comprehensive Quality Assurance** - Coherence checks + config-driven regression testing
- **Multi-Format Export** - JSON + configurable Excel workbooks with data transformation
- **User-Friendly Interface** - Interactive prompts, progress tracking, clear error messages
---
## 📁 Project Structure
```
Endobest Dashboard/
├── 📜 MAIN SCRIPT
│ └── eb_dashboard.py (57.5 KB, 1,021 lines)
│ Core orchestrator for data collection, processing, and export
├── 🔧 UTILITY MODULES
│ ├── eb_dashboard_utils.py (6.4 KB, 184 lines)
│ │ Thread-safe HTTP clients, nested data navigation, config resolution
│ │
│ ├── eb_dashboard_quality_checks.py (58.5 KB, 1,266 lines)
│ │ Coherence checks, non-regression testing, data validation
│ │
│ └── eb_dashboard_excel_export.py (~38 KB, ~1,000 lines)
│ Configuration-driven Excel workbook generation
├── 📚 DOCUMENTATION
│ ├── DOCUMENTATION_10_ARCHITECTURE.md (43.7 KB)
│ │ System design, data flow, API integration, multithreading
│ │
│ ├── DOCUMENTATION_11_FIELD_MAPPING.md (56.3 KB)
│ │ Field extraction logic, custom functions, transformations
│ │
│ ├── DOCUMENTATION_12_QUALITY_CHECKS.md (60.2 KB)
│ │ Quality assurance framework, regression rules, validation logic
│ │
│ ├── DOCUMENTATION_13_EXCEL_EXPORT.md (29.6 KB)
│ │ Excel generation architecture, data transformation pipeline
│ │
│ ├── DOCUMENTATION_98_USER_GUIDE.md (8.4 KB)
│ │ End-user instructions, quick start, troubleshooting
│ │
│ └── DOCUMENTATION_99_CONFIG_GUIDE.md (24.8 KB)
│ Administrator configuration reference
├── ⚙️ CONFIGURATION
│ └── config/
│ ├── Endobest_Dashboard_Config.xlsx (Configuration file)
│ │ Inclusions_Mapping
│ │ Organizations_Mapping
│ │ Excel_Workbooks
│ │ Excel_Sheets
│ │ Regression_Check
│ │
│ ├── eb_org_center_mapping.xlsx (Organization enrichment)
│ │
│ └── templates/
│ ├── Endobest_Template.xlsx
│ ├── Statistics_Template.xlsx
│ └── (Other Excel templates)
├── 📊 OUTPUT FILES
│ ├── endobest_inclusions.json (~6-7 MB, patient data)
│ ├── endobest_inclusions_old.json (backup)
│ ├── endobest_organizations.json (~17-20 KB, stats)
│ ├── endobest_organizations_old.json (backup)
│ ├── [Excel outputs] (*.xlsx, configurable)
│ └── dashboard.log (Execution log)
└── 🔨 EXECUTABLES
├── eb_dashboard.exe (16.5 MB, PyInstaller build)
└── [Various .bat launch scripts]
```
---
## 🏗️ System Architecture Overview
### High-Level Component Diagram
```
┌─────────────────────────────────────────────────────────────────────┐
│ ENDOBEST DASHBOARD MAIN PROCESS │
│ eb_dashboard.py │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 1: INITIALIZATION & AUTHENTICATION │ │
│ │ ├─ User Login (IAM API) │ │
│ │ ├─ Token Exchange (RC-specific) │ │
│ │ ├─ Config Loading (Excel parsing & validation) │ │
│ │ └─ Thread Pool Setup (20 workers main, 40 subtasks) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 2: ORGANIZATION & COUNTERS RETRIEVAL │ │
│ │ ├─ Get All Organizations (getAllOrganizations API) │ │
│ │ ├─ Fetch Counters Parallelized (20 workers) │ │
│ │ ├─ Enrich with Center Mapping (optional) │ │
│ │ └─ Calculate Totals & Sort │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 3: PATIENT INCLUSION DATA COLLECTION │ │
│ │ Outer Loop: Organizations (20 parallel workers) │ │
│ │ ├─ For Each Organization: │ │
│ │ │ ├─ Get Inclusions List (POST /api/inclusions/search) │ │
│ │ │ └─ For Each Patient (Sequential): │ │
│ │ │ ├─ Fetch Clinical Record (API) │ │
│ │ │ ├─ Fetch All Questionnaires (Optimized: 1 call) │ │
│ │ │ ├─ Fetch Lab Requests (Async pool) │ │
│ │ │ ├─ Process Field Mappings (extraction + transform) │ │
│ │ │ └─ Update Progress Bars (thread-safe) │ │
│ │ │ │ │
│ │ │ Inner Async: Lab/Questionnaire Fetches (40 workers) │ │
│ │ │ (Non-blocking I/O during main processing) │ │
│ │ └─ Combine Inclusions from All Orgs │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 4: QUALITY ASSURANCE & VALIDATION │ │
│ │ ├─ Coherence Check (API stats vs actual data) │ │
│ │ │ └─ Compares counters with detailed records │ │
│ │ ├─ Non-Regression Check (config-driven) │ │
│ │ │ └─ Detects changes with severity levels │ │
│ │ └─ Critical Issue Handling (user confirmation if needed) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PHASE 5: EXPORT & PERSISTENCE │ │
│ │ ├─ Backup Old Files (if quality passed) │ │
│ │ ├─ Write JSON Outputs (endobest_inclusions.json, etc.) │ │
│ │ ├─ Export to Excel (if configured) │ │
│ │ │ ├─ Load Templates │ │
│ │ │ ├─ Apply Filters & Sorts │ │
│ │ │ ├─ Fill Data into Sheets │ │
│ │ │ ├─ Replace Values │ │
│ │ │ └─ Recalculate Formulas (win32com) │ │
│ │ └─ Display Summary & Elapsed Time │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ EXIT │
└─────────────────────────────────────────────────────────────────────┘
↓ EXTERNAL DEPENDENCIES ↓
┌─────────────────────────────────────────────────────────────────────┐
│ EXTERNAL APIS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 🔐 AUTHENTICATION (IAM) │
│ └─ api-auth.ziwig-connect.com │
│ ├─ POST /api/auth/ziwig-pro/login │
│ └─ POST /api/auth/refreshToken │
│ │
│ 🏥 RESEARCH CLINIC (RC) │
│ └─ api-hcp.ziwig-connect.com │
│ ├─ POST /api/auth/config-token │
│ ├─ GET /api/inclusions/getAllOrganizations │
│ ├─ POST /api/inclusions/inclusion-statistics │
│ ├─ POST /api/inclusions/search │
│ ├─ POST /api/records/byPatient │
│ └─ POST /api/surveys/filter/with-answers (optimized!) │
│ │
│ 🧪 LAB / DIAGNOSTICS (GDD) │
│ └─ api-lab.ziwig-connect.com │
│ └─ GET /api/requests/by-tube-id/{tubeId} │
│ │
│ 📝 EXCEL TEMPLATES │
│ └─ config/templates/ │
│ ├─ Endobest_Template.xlsx │
│ ├─ Statistics_Template.xlsx │
│ └─ (Custom templates) │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 🔌 Module Descriptions
### 1. **eb_dashboard.py** - Main Orchestrator (57.5 KB)
**Responsibility:** Complete data collection workflow, API coordination, multithreaded execution
**Structure (9 Blocks):**
| Block | Purpose | Key Functions |
|-------|---------|---|
| **1** | Configuration & Infrastructure | Constants, global vars, progress bar setup |
| **2** | Decorators & Resilience | `@api_call_with_retry`, retry logic |
| **3** | Authentication | `login()`, token exchange, IAM integration |
| **3B** | File Utilities | `load_json_file()` |
| **4** | Inclusions Mapping Config | `load_inclusions_mapping_config()`, validation |
| **5** | Data Search & Extraction | Questionnaire finding, field retrieval |
| **6** | Custom Functions | Business logic, calculated fields |
| **7** | Business API Calls | RC, GDD, organization endpoints |
| **7b** | Organization Center Mapping | `load_org_center_mapping()` |
| **8** | Processing Orchestration | `process_organization_patients()`, patient data processing |
| **9** | Main Execution | Entry point, quality checks, export |
**Key Technologies:**
- `httpx` - HTTP client (with thread-local instances)
- `openpyxl` - Excel parsing
- `concurrent.futures.ThreadPoolExecutor` - Parallel execution
- `tqdm` - Progress tracking
- `questionary` - Interactive prompts
---
### 2. **eb_dashboard_utils.py** - Utility Functions (6.4 KB)
**Responsibility:** Generic, reusable utility functions shared across modules
**Core Functions:**
```python
get_httpx_client() # Thread-local HTTP client management
get_thread_position() # Progress bar positioning
get_nested_value() # JSON path navigation with wildcard support (*)
get_config_path() # Config folder resolution (script vs PyInstaller)
get_old_filename() # Backup filename generation
```
**Key Features:**
- Thread-safe HTTP client pooling
- Wildcard support in nested JSON paths (e.g., `["items", "*", "value"]`)
- Cross-platform path resolution
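
The wildcard navigation can be sketched as follows. The function name `get_nested_value` comes from the module itself, but the implementation shown here (including the `"undefined"` sentinel and the fan-out behavior) is an illustrative assumption, not the actual source:

```python
def get_nested_value(data, path):
    """Walk nested dicts/lists by a list of keys; "*" fans out over a list."""
    current = [data]
    for key in path:
        next_level = []
        for node in current:
            if key == "*" and isinstance(node, list):
                next_level.extend(node)  # collect every element of the list
            elif isinstance(node, dict) and key in node:
                next_level.append(node[key])
            elif isinstance(node, list) and isinstance(key, int) and key < len(node):
                next_level.append(node[key])
        current = next_level
    if not current:
        return "undefined"  # sentinel value used throughout the field pipeline
    return current if len(current) > 1 else current[0]
```

For example, `get_nested_value(record, ["items", "*", "value"])` returns the `value` of every entry under `items`.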
---
### 3. **eb_dashboard_quality_checks.py** - QA & Validation (58.5 KB)
**Responsibility:** Quality assurance, data validation, regression checking
**Core Functions:**
| Function | Purpose |
|----------|---------|
| `load_regression_check_config()` | Load regression rules from Excel |
| `run_quality_checks()` | Orchestrate all QA checks |
| `coherence_check()` | Verify stats vs detailed data consistency |
| `non_regression_check()` | Config-driven change validation |
| `run_check_only_mode()` | Standalone validation mode |
| `backup_output_files()` | Create versioned backups |
**Quality Check Types:**
1. **Coherence Check**
- Compares API-provided organization statistics vs. actual inclusion counts
- Severity: Warning/Critical
- Example: Total API count (145) vs. actual inclusions (143)
2. **Non-Regression Check**
- Compares current vs. previous run data
- Applies config-driven rules with transition patterns
- Detects: new inclusions, deletions, field changes
- Severity: Warning/Critical with exceptions
---
### 4. **eb_dashboard_excel_export.py** - Excel Generation & Orchestration (38 KB, v1.1+)
**Responsibility:** Configuration-driven Excel workbook generation with data transformation + high-level orchestration
**Core Functions (Low-Level):**
| Function | Purpose |
|----------|---------|
| `load_excel_export_config()` | Load Excel_Workbooks + Excel_Sheets config |
| `validate_excel_config()` | Validate templates and named ranges |
| `export_to_excel()` | Main export orchestration (openpyxl + win32com) |
| `_apply_filter()` | AND-condition filtering |
| `_apply_sort()` | Multi-key sorting with datetime support |
| `_apply_value_replacement()` | Strict type matching value transformation |
| `_handle_output_exists()` | File conflict resolution |
| `_recalculate_workbook()` | Formula recalculation via win32com |
| `_process_sheet()` | Sheet-specific data filling |
**High-Level Orchestration Functions (v1.1+):**
| Function | Purpose | Called From |
|----------|---------|-------------|
| `export_excel_only()` | Complete --excel-only mode | main() CLI detection |
| `run_normal_mode_export()` | Normal mode export phase | main() after JSON write |
| `prepare_excel_export()` | Preparation + validation | Both orchestration functions |
| `execute_excel_export()` | Execution with error handling | Both orchestration functions |
| `_load_json_file_internal()` | Safe JSON loading | run_normal_mode_export() |
**Data Transformation Pipeline:**
```
1. Load Configuration (Excel_Workbooks + Excel_Sheets)
2. For each workbook:
a. Load template (openpyxl)
b. For each sheet:
- Apply filter (AND conditions)
- Apply sort (multi-key)
- Apply value replacement (strict type matching)
- Fill data into cells/named ranges
c. Handle file conflicts (Overwrite/Increment/Backup)
d. Save workbook (openpyxl)
e. Recalculate formulas (win32com - optional)
```
**Orchestration Pattern (v1.1+):**
As of v1.1, the system delegates all export orchestration to dedicated functions following the pattern established by `run_check_only_mode()` from quality_checks:
1. **--excel-only mode:** Main script calls single function → `export_excel_only()` handles everything
2. **Normal mode export:** Main script calls single function → `run_normal_mode_export()` handles everything
This keeps the main script focused on business logic while all export mechanics are encapsulated in the module.
---
## 🔄 Complete Data Collection Workflow
### Phase 1: Initialization (2-3 seconds)
1. User provides credentials (with defaults)
2. IAM Login: `POST /api/auth/ziwig-pro/login`
3. Token Exchange: `POST /api/auth/config-token`
4. Load configuration from `Endobest_Dashboard_Config.xlsx`
5. Validate field mappings and quality check rules
6. Setup thread pools (main: 20 workers, subtasks: 40 workers)
### Phase 2: Organization Retrieval (5-8 seconds)
1. Get all organizations: `GET /api/inclusions/getAllOrganizations`
2. Filter excluded centers (config-driven)
3. Fetch counters in parallel (20 workers):
- For each org: `POST /api/inclusions/inclusion-statistics`
- Store: patients_count, preincluded_count, included_count, prematurely_terminated_count
4. Optional: Enrich with center mapping (from `eb_org_center_mapping.xlsx`)
5. Calculate totals and sort
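Step 5 might look like the following sketch. The actual sort key is not documented here, so ordering by `included_count` is an assumption, as is the shape of the TOTAL row:

```python
def summarize_organizations(orgs):
    """Sort organizations by included patients (descending) and append a TOTAL row."""
    counter_keys = ["patients_count", "preincluded_count", "included_count",
                    "prematurely_terminated_count"]
    ranked = sorted(orgs, key=lambda o: o.get("included_count", 0), reverse=True)
    totals = {"name": "TOTAL"}
    for key in counter_keys:
        totals[key] = sum(o.get(key, 0) for o in orgs)
    return ranked + [totals]
```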
### Phase 3: Patient Data Collection (2-4 minutes)
**Nested Parallel Architecture:**
**Outer Loop (20 workers):** For each organization
- `POST /api/inclusions/search?limit=1000&page=1` → Get up to 1000 inclusions
**Middle Loop (Sequential):** For each patient
- Fetch clinical record: `POST /api/records/byPatient`
- Fetch questionnaires: `POST /api/surveys/filter/with-answers` (**optimized: 1 call**)
- Submit async lab request: `GET /api/requests/by-tube-id/{tubeId}` (in subtasks pool)
**Inner Loop (40 async workers):** Non-blocking lab/questionnaire processing
- Parallel fetches of lab requests while main thread processes fields
**Field Processing (per patient):**
- For each field in configuration:
1. Determine source (questionnaire, record, inclusion, request, calculated)
2. Extract raw value (supports JSON paths with wildcards)
3. Check field condition (optional)
4. Apply post-processing transformations
5. Format score dictionaries
6. Store in nested output structure
### Phase 4: Quality Assurance (10-15 seconds)
1. **Coherence Check:** Compare API counters vs. actual data
2. **Non-Regression Check:** Compare current vs. previous run with config rules
3. **Critical Issue Handling:** User confirmation if issues detected
4. If NO critical issues → continue to export
5. If YES critical issues → prompt user for override
### Phase 5: Export & Persistence (3-5 seconds)
**Step 1: Backup & JSON Write**
1. Backup old files (if quality checks passed)
2. Write JSON outputs:
- `endobest_inclusions.json` (6-7 MB)
- `endobest_organizations.json` (17-20 KB)
**Step 2: Excel Export (if configured)**
Delegated to `run_normal_mode_export()` function which handles:
1. Load JSONs from filesystem (ensures consistency)
2. Load Excel configuration
3. Validate templates and named ranges
4. For each configured workbook:
- Load template file
- Apply filter conditions (AND logic)
- Apply multi-key sort
- Apply value replacements (strict type matching)
- Fill data into cells/named ranges
- Handle file conflicts (Overwrite/Increment/Backup)
- Save workbook
- Recalculate formulas (optional, via win32com)
5. Display results and return status
**Step 3: Summary**
1. Display elapsed time
2. Report file locations
3. Note any warnings/errors during export
---
## ⚙️ Configuration System
### Three-Layer Configuration Architecture
#### Layer 1: Excel Configuration (`Endobest_Dashboard_Config.xlsx`)
**Sheet 1: Inclusions_Mapping** (Field Extraction)
- Define which patient fields to extract
- Specify sources (questionnaire, record, inclusion, request, calculated)
- Configure transformations (value labels, templates, conditions)
- ~50+ fields typically configured
**Sheet 2: Organizations_Mapping** (Organization Fields)
- Define which organization fields to export
- Rarely modified
**Sheet 3: Excel_Workbooks** (Excel Export Metadata)
- Workbook names
- Template paths
- Output filenames (with template variables)
- File conflict handling strategy (Overwrite/Increment/Backup)
**Sheet 4: Excel_Sheets** (Sheet Configurations)
- Workbook name (reference to Excel_Workbooks)
- Sheet name (in template)
- Source type (Inclusions/Organizations/Variable)
- Target (cell or named range)
- Column mapping (JSON)
- Filter conditions (JSON with AND logic)
- Sort keys (JSON, multi-key with datetime support)
- Value replacements (JSON, strict type matching)
**Sheet 5: Regression_Check** (Quality Rules)
- Rule names
- Field selection pipeline (include/exclude patterns)
- Scope (all organizations or specific org list)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
#### Layer 2: Organization Mapping (`eb_org_center_mapping.xlsx`)
- Optional mapping file
- Sheet: `Org_Center_Mapping`
- Maps organization names to center identifiers
- Degrades gracefully if missing
#### Layer 3: Excel Templates (`config/templates/`)
- Excel workbook templates with:
- Sheet definitions
- Named ranges (for data fill targets)
- Formula structures
- Formatting and styles
### Configuration Constants (in code)
```python
# API Configuration
IAM_URL = "https://api-auth.ziwig-connect.com"
RC_URL = "https://api-hcp.ziwig-connect.com"
GDD_URL = "https://api-lab.ziwig-connect.com"
RC_APP_ID = "602aea51-cdb2-4f73-ac99-fd84050dc393"
RC_ENDOBEST_PROTOCOL_ID = "3c7bcb4d-91ed-4e9f-b93f-99d8447a276e"
# Threading & Performance
MAX_THREADS = 20 # Main thread pool workers
ASYNC_THREADS = 40 # Subtasks thread pool workers
ERROR_MAX_RETRY = 10 # Maximum retry attempts
WAIT_BEFORE_RETRY = 0.5 # Seconds between retries
# Excluded Organizations
RC_ENDOBEST_EXCLUDED_CENTERS = ["e18e7487-...", "5582bd75-...", "e053512f-..."]
```
---
## 🔐 API Integration
### Authentication Flow
```
1. IAM Login
POST https://api-auth.ziwig-connect.com/api/auth/ziwig-pro/login
Request: {"username": "...", "password": "..."}
Response: {"access_token": "jwt_master", "userId": "uuid"}
2. Token Exchange (RC-specific)
POST https://api-hcp.ziwig-connect.com/api/auth/config-token
Headers: Authorization: Bearer {master_token}
Request: {"userId": "...", "clientId": "...", "userAgent": "..."}
Response: {"access_token": "jwt_rc", "refresh_token": "refresh_token"}
3. Automatic Token Refresh (on 401)
POST https://api-hcp.ziwig-connect.com/api/auth/refreshToken
Headers: Authorization: Bearer {current_token}
Request: {"refresh_token": "..."}
Response: {"access_token": "jwt_new", "refresh_token": "new_refresh"}
```
### Key API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/inclusions/getAllOrganizations` | GET | List all organizations |
| `/api/inclusions/inclusion-statistics` | POST | Get patient counts per org |
| `/api/inclusions/search` | POST | Get inclusions list for org (paginated) |
| `/api/records/byPatient` | POST | Get clinical record for patient |
| `/api/surveys/filter/with-answers` | POST | **OPTIMIZED:** Get all questionnaires for patient |
| `/api/requests/by-tube-id/{tubeId}` | GET | Get lab test results |
### Performance Optimization: Questionnaire Batching
**Problem:** Multiple API calls per patient (1 call per questionnaire × N patients = slow)
**Solution:** Single optimized call retrieves all questionnaires with answers
```
BEFORE (inefficient):
for qcm_id in questionnaire_ids:
GET /api/surveys/{qcm_id}/answers?subject={patient_id}
# Result: N API calls per patient
AFTER (optimized):
POST /api/surveys/filter/with-answers
{
"context": "clinic_research",
"subject": patient_id
}
# Result: 1 API call per patient
# Impact: 4-5x performance improvement
```
---
## ⚡ Multithreading & Performance Optimization
### Thread Pool Architecture
```
Main Application Thread
┌─ Phase 1: Counter Fetching ──────────────────────────┐
│ ThreadPoolExecutor(max_workers=user_input, cap=20) │
│ ├─ Task 1: Get counters for Org 1 │
│ ├─ Task 2: Get counters for Org 2 │
│ └─ Task N: Get counters for Org N │
│ [Sequential wait: tqdm.as_completed] │
└──────────────────────────────────────────────────────┘
┌─ Phase 2: Inclusion Data Collection (Nested) ────────┐
│ Outer: ThreadPoolExecutor(max_workers=user_input) │
│ │
│ For Org 1: │
│ │ Inner: ThreadPoolExecutor(max_workers=40) │
│ │ ├─ Patient 1: Async lab/questionnaire fetch │
│ │ ├─ Patient 2: Async lab/questionnaire fetch │
│ │ └─ Patient N: Async lab/questionnaire fetch │
│ │ [Sequential outer wait: as_completed] │
│ │ │
│ For Org 2: │
│ │ [Similar parallel processing] │
│ │ │
│ For Org N: │
│ │ [Similar parallel processing] │
└──────────────────────────────────────────────────────┘
```
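The nested pool layout above can be sketched with `concurrent.futures`. The fetch functions below are placeholders standing in for the real API calls; the point is the shape: an outer pool over organizations, a sequential middle loop over patients, and a shared inner pool for non-blocking subtask fetches:

```python
import concurrent.futures as cf

def process_patient(org, patient, subtask_pool):
    """Submit the lab fetch to the inner pool, keep working, then collect it."""
    lab_future = subtask_pool.submit(lambda: f"lab:{patient}")  # placeholder fetch
    record = f"record:{patient}"      # main-thread field processing overlaps the fetch
    return {"org": org, "record": record, "lab": lab_future.result()}

def process_organization(org, patients, subtask_pool):
    # Middle loop is sequential, as in the diagram above
    return [process_patient(org, p, subtask_pool) for p in patients]

def collect(org_patients, main_workers=20, sub_workers=40):
    results = []
    with cf.ThreadPoolExecutor(max_workers=sub_workers) as subtasks, \
         cf.ThreadPoolExecutor(max_workers=main_workers) as orgs:
        futures = [orgs.submit(process_organization, org, pats, subtasks)
                   for org, pats in org_patients.items()]
        for fut in cf.as_completed(futures):
            results.extend(fut.result())
    return results
```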
### Performance Optimizations
1. **Thread-Local HTTP Clients**
- Each thread maintains its own `httpx.Client`
- Avoids connection conflicts
- Implementation via `get_httpx_client()`
2. **Nested Parallelization**
- Main pool: Organizations (20 workers)
- Subtasks pool: Lab requests (40 workers)
- Non-blocking I/O during processing
3. **Questionnaire Batching** (4-5x improvement)
- Single call retrieves all questionnaires + answers
- Eliminates N filtered calls per patient
4. **Configurable Worker Threads**
- User input selection (1-20 workers)
- Tunable for network bandwidth and API rate limits
### Progress Tracking (Multi-Level)
```
Overall Progress [████████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [██████████░░░░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██████░░░░░░░░░░░░░░░░░░░] 42/110
3/15 - Center 3 [████░░░░░░░░░░░░░░░░░░░░░] 28/85
```
**Thread-Safe Updates:**
```python
with _global_pbar_lock:
if global_pbar:
global_pbar.update(1)
```
---
## 🛡️ Error Handling & Resilience
### Token Management Strategy
1. **Automatic Token Refresh on 401**
- Triggered by `@api_call_with_retry` decorator
- Thread-safe via `_token_refresh_lock`
2. **Retry Mechanism**
- Max retries: 10 attempts
- Delay between retries: 0.5 seconds
- Decorators: `@api_call_with_retry`
3. **Thread-Safe Token Refresh**
```python
def new_token():
    global access_token, refresh_token
    with _token_refresh_lock:  # Only one thread refreshes at a time
        for attempt in range(ERROR_MAX_RETRY):
            try:
                # POST /api/auth/refreshToken
                # Update global tokens
                return
            except Exception:
                sleep(WAIT_BEFORE_RETRY)
```
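A hedged sketch of what the `@api_call_with_retry` decorator might look like. `PermissionError` is used here as a stand-in for an HTTP 401 response (the real code presumably inspects the response status), and the `new_token` stub stands in for the thread-safe refresh function:

```python
import functools
import time

ERROR_MAX_RETRY = 10
WAIT_BEFORE_RETRY = 0.5

def new_token():
    """Stub standing in for the thread-safe token refresh."""
    pass

def api_call_with_retry(func):
    """Retry the wrapped API call; on an auth failure, refresh the token first."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        last_error = None
        for attempt in range(ERROR_MAX_RETRY):
            try:
                return func(*args, **kwargs)
            except PermissionError:        # stand-in for an HTTP 401 response
                new_token()                # refresh, then retry immediately
            except Exception as exc:       # network timeout, 5xx, ...
                last_error = exc
                time.sleep(WAIT_BEFORE_RETRY)
        raise RuntimeError("API call failed after retries") from last_error
    return wrapper
```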
### Exception Handling Categories
| Category | Examples | Handling |
|----------|----------|----------|
| **API Errors** | Network timeouts, HTTP errors | Retry with configurable delay |
| **File I/O Errors** | Missing config, permission denied | Graceful error + exit |
| **Validation Errors** | Invalid config, incoherent data | Log warning + prompt user |
| **Thread Errors** | Worker thread failures | Shutdown gracefully + propagate |
### Graceful Degradation
1. **Missing Organization Mapping:** Skip silently, use fallback (org name)
2. **Critical Quality Issues:** Prompt user for confirmation before export
3. **Thread Failure:** Shutdown all workers gracefully, preserve partial results
4. **Invalid Configuration:** Clear error messages with remediation suggestions
---
## 📊 Data Output Structure
### JSON Output: `endobest_inclusions.json`
```json
[
{
"Patient_Identification": {
"Organisation_Id": "uuid",
"Organisation_Name": "Hospital Name",
"Center_Name": "HOSP-A",
"Patient_Id": "internal_id",
"Pseudo": "ENDO-001",
"Patient_Name": "Doe, John",
"Patient_Birthday": "1975-05-15",
"Patient_Age": 49
},
"Inclusion": {
"Consent_Signed": true,
"Inclusion_Date": "15/10/2024",
"Inclusion_Status": "incluse",
"isPrematurelyTerminated": false
},
"Extended_Fields": {
"Custom_Field_1": "value",
"Custom_Field_2": 42,
"Composite_Score": "8/10"
},
"Endotest": {
"Request_Sent": true,
"Diagnostic_Status": "Completed"
}
}
]
```
### JSON Output: `endobest_organizations.json`
```json
[
{
"id": "org-uuid",
"name": "Hospital A",
"Center_Name": "HOSP-A",
"patients_count": 45,
"preincluded_count": 8,
"included_count": 35,
"prematurely_terminated_count": 2
}
]
```
---
## 🚀 Execution Modes
### Mode 1: Normal (Full Collection)
```bash
python eb_dashboard.py
```
- Authenticates
- Collects from APIs
- Runs quality checks
- Exports JSON + Excel
- Duration: 2.5-5 minutes (typical)
### Mode 2: Excel-Only (Fast Export)
```bash
python eb_dashboard.py --excel-only
```
- Skips data collection
- Uses existing JSON files
- Regenerates Excel workbooks
- Duration: 5-15 seconds
- Use case: Reconfigure reports, test templates
### Mode 3: Check-Only (Validation Only)
```bash
python eb_dashboard.py --check-only
```
- Loads existing JSON
- Runs quality checks
- No export
- Duration: 5-10 seconds
- Use case: Verify data before distribution
### Mode 4: Debug (Verbose Output)
```bash
python eb_dashboard.py --debug
```
- Executes normal mode
- Enables detailed logging
- Shows field-by-field changes
- Check `dashboard.log` for details
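The mode dispatch could be sketched with `argparse`. Whether the real script uses `argparse` or manual `sys.argv` parsing is not stated, so this is illustrative only:

```python
import argparse

def parse_mode(argv):
    """Map command-line flags to one of the four documented execution modes."""
    parser = argparse.ArgumentParser(prog="eb_dashboard.py")
    parser.add_argument("--excel-only", action="store_true")
    parser.add_argument("--check-only", action="store_true")
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)
    if args.excel_only:
        return "excel-only"   # delegates to export_excel_only()
    if args.check_only:
        return "check-only"   # delegates to run_check_only_mode()
    return "debug" if args.debug else "normal"
```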
---
## 📈 Performance Metrics & Benchmarks
### Typical Execution Times (Full Dataset: 1,200+ patients, 15+ organizations)
| Phase | Duration | Notes |
|-------|----------|-------|
| **Login & Config** | 2-3 sec | Sequential, network-dependent |
| **Fetch Counters** | 5-8 sec | 20 workers, parallelized |
| **Collect Inclusions** | 2-4 min | Includes API calls + field processing |
| **Quality Checks** | 10-15 sec | File loads, data comparison |
| **Export to JSON** | 3-5 sec | File I/O |
| **Export to Excel** | 5-15 sec | Template processing + fill |
| **TOTAL** | **~2.5-5 min** | Depends on network, API perf |
### Network Optimization Impact
**With old questionnaire approach (N filtered calls per patient):**
- 1,200 patients × 15 questionnaires = 18,000 API calls
- Estimated: 15-30 minutes
**With optimized single-call questionnaire:**
- 1,200 patients × 1 call = 1,200 API calls
- Estimated: 2-5 minutes
- **Improvement: 3-6x faster** ✅
---
## 🔍 Field Extraction & Processing Logic
### Complete Field Processing Pipeline
```
For each field in INCLUSIONS_MAPPING_CONFIG:
├─ Step 1: Determine Source Type
│ ├─ q_id / q_name / q_category → Find questionnaire
│ ├─ record → Use clinical record
│ ├─ inclusion → Use patient inclusion data
│ ├─ request → Use lab request data
│ └─ calculated → Execute custom function
├─ Step 2: Extract Raw Value
│ ├─ Navigate JSON using field_path
│ ├─ Supports wildcard (*) for list traversal
│ └─ Return value or "undefined"
├─ Step 3: Check Field Condition (optional)
│ ├─ If condition undefined → Set to "undefined"
│ ├─ If condition not boolean → Error flag
│ ├─ If condition false → Set to "N/A"
│ └─ If condition true → Continue
├─ Step 4: Apply Post-Processing Transformations
│ ├─ true_if_any: Convert to boolean
│ ├─ value_labels: Map to localized text
│ ├─ field_template: Apply formatting
│ └─ List joining: Flatten arrays with pipe delimiter
├─ Step 5: Format Score Dictionaries
│ ├─ If {total, max} → Format as "total/max"
│ └─ Otherwise → Keep as-is
└─ Store: output_inclusion[field_group][field_name] = final_value
```
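Steps 3 and 5 of the pipeline above translate into small pure functions. The sentinel strings follow the diagram; the exact error-handling convention is an assumption:

```python
def apply_condition(value, condition):
    """Step 3: gate a field on an optional boolean condition."""
    if condition is None:
        return value                          # no condition configured
    if condition == "undefined":
        return "undefined"                    # condition field itself missing
    if not isinstance(condition, bool):
        return "ERROR: condition is not boolean"
    return value if condition else "N/A"

def format_score(value):
    """Step 5: render {total, max} score dictionaries as 'total/max'."""
    if isinstance(value, dict) and set(value) == {"total", "max"}:
        return f"{value['total']}/{value['max']}"
    return value                              # anything else passes through
```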
### Custom Functions for Calculated Fields
| Function | Purpose | Syntax |
|----------|---------|--------|
| `search_in_fields_using_regex` | Search multiple fields for pattern | `["search_in_fields_using_regex", "pattern", "field1", "field2"]` |
| `extract_parentheses_content` | Extract text within parentheses | `["extract_parentheses_content", "field_name"]` |
| `append_terminated_suffix` | Add suffix if patient terminated | `["append_terminated_suffix", "status_field", "is_terminated_field"]` |
| `if_then_else` | Unified conditional with 8 operators | `["if_then_else", "operator", arg1, arg2_optional, true_result, false_result]` |
**if_then_else Operators:**
- `is_true` / `is_false` - Boolean field test
- `is_defined` / `is_undefined` - Existence test
- `all_true` / `all_defined` - Multiple field test
- `==` / `!=` - Value comparison
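A possible dispatch for the eight operators. The runtime signature, in particular how the patient's field values are passed in, is an assumption for illustration:

```python
def if_then_else(fields, operator, *args):
    """Evaluate one documented operator against a flat field dict."""
    *operands, true_result, false_result = args
    def val(name):
        return fields.get(name, "undefined")
    if operator == "is_true":
        cond = val(operands[0]) is True
    elif operator == "is_false":
        cond = val(operands[0]) is False
    elif operator == "is_defined":
        cond = val(operands[0]) != "undefined"
    elif operator == "is_undefined":
        cond = val(operands[0]) == "undefined"
    elif operator == "all_true":
        cond = all(val(n) is True for n in operands)
    elif operator == "all_defined":
        cond = all(val(n) != "undefined" for n in operands)
    elif operator == "==":
        cond = val(operands[0]) == operands[1]   # second operand is a literal
    elif operator == "!=":
        cond = val(operands[0]) != operands[1]
    else:
        raise ValueError(f"unknown operator: {operator}")
    return true_result if cond else false_result
```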
---
## ✅ Quality Assurance Framework
### Coherence Check
**Purpose:** Verify API-provided statistics match actual collected data
**Logic:**
```
For each organization:
API_Count = statistic.total
Actual_Count = count of inclusion records
if API_Count != Actual_Count:
Report discrepancy with severity
├─ ±10%: Warning
└─ >±10%: Critical
```
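The severity thresholds translate directly into code. This sketch assumes the deviation is measured relative to the API count:

```python
def coherence_check(org_name, api_count, actual_count):
    """Compare the API counter with collected records; classify by deviation."""
    if api_count == actual_count:
        return None                                  # no discrepancy
    deviation = abs(api_count - actual_count) / max(api_count, 1)
    severity = "Warning" if deviation <= 0.10 else "Critical"
    return {"org": org_name, "api": api_count,
            "actual": actual_count, "severity": severity}
```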
### Non-Regression Check
**Purpose:** Detect unexpected changes between data runs
**Configuration-Driven Rules:**
- Field selection pipeline (include/exclude patterns)
- Transition patterns (expected state changes)
- Severity levels (Warning/Critical)
- Exception handling (exclude specific organizations)
**Logic:**
```
Load previous inclusion data (_old file)
For each rule:
├─ Build candidate fields via pipeline
├─ Determine key field for matching
└─ For each inclusion:
├─ Find matching old inclusion by key
├─ Check for unexpected transitions
├─ Apply exceptions
└─ Report violations
```
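A simplified sketch of the transition matching. The real rule schema (field selection pipeline, per-organization exceptions) is richer than the minimal `rule` dict assumed here:

```python
def non_regression_check(old_rows, new_rows, rule):
    """Flag field transitions not listed in rule["allowed"], matched by rule["key"]."""
    key, field = rule["key"], rule["field"]
    allowed = set(rule["allowed"])          # e.g. {("preincluse", "incluse")}
    old_by_key = {row[key]: row for row in old_rows}
    violations = []
    for row in new_rows:
        old = old_by_key.get(row[key])
        if old is None:
            continue                        # new inclusion: reported separately
        transition = (old.get(field), row.get(field))
        if transition[0] != transition[1] and transition not in allowed:
            violations.append({"key": row[key], "transition": transition,
                               "severity": rule.get("severity", "Warning")})
    return violations
```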
---
## 📋 Documentation Structure
The system includes comprehensive documentation:
| Document | Size | Content |
|----------|------|---------|
| **DOCUMENTATION_10_ARCHITECTURE.md** | 43.7 KB | System design, workflow, APIs, multithreading |
| **DOCUMENTATION_11_FIELD_MAPPING.md** | 56.3 KB | Field extraction logic, custom functions, examples |
| **DOCUMENTATION_12_QUALITY_CHECKS.md** | 60.2 KB | QA framework, regression rules, configuration |
| **DOCUMENTATION_13_EXCEL_EXPORT.md** | 29.6 KB | Excel generation, data transformation, config |
| **DOCUMENTATION_98_USER_GUIDE.md** | 8.4 KB | End-user instructions, troubleshooting, FAQ |
| **DOCUMENTATION_99_CONFIG_GUIDE.md** | 24.8 KB | Administrator reference, Excel tables, examples |
---
## 🔧 Key Technical Features
### Thread Safety
- Per-thread HTTP clients (no connection conflicts)
- Synchronized access to global state via locks
- Thread-safe progress bar updates
### Error Recovery
- Automatic token refresh on 401 errors
- Retry logic with configurable delay and maximum attempts
- Graceful degradation for optional features
- User confirmation on critical issues
### Configuration Flexibility
- 100% externalized to Excel (zero code changes)
- Supports multiple data sources
- Custom business logic functions
- Field dependencies and conditions
- Value transformations and templates
### Performance
- Optimized API calls (4-5x improvement)
- Parallel processing (20+ workers)
- Async I/O operations
- Configurable thread pools
### Data Quality
- Coherence checking (stats vs actual data)
- Non-regression testing (config-driven)
- Comprehensive validation
- Audit trail logging
---
## 📦 Dependencies
### Core Libraries
- **httpx** - HTTP client with connection pooling
- **openpyxl** - Excel file reading/writing
- **questionary** - Interactive CLI prompts
- **tqdm** - Progress bars
- **rich** - Rich text formatting
- **pywin32** - Windows COM automation (optional, for formula recalculation)
- **pytz** - Timezone support (optional)
### Python Version
- Python 3.7+
### External Services
- Ziwig IAM API
- Ziwig Research Clinic (RC) API
- Ziwig Lab (GDD) API
---
## 🎓 Usage Patterns
### For End Users
1. Configure fields in Excel (no code needed)
2. Run: `python eb_dashboard.py`
3. Review results in JSON or Excel
### For Administrators
1. Add new fields to `Inclusions_Mapping`
2. Define quality rules in `Regression_Check`
3. Configure Excel export in `Excel_Workbooks` + `Excel_Sheets`
4. Restart: script picks up config automatically
### For Developers
1. Add custom function to Block 6 (eb_dashboard.py)
2. Register in field config (Inclusions_Mapping)
3. Use via: `"source_id": "function_name"`
4. No code recompile needed for other changes
---
## 🎯 Summary
The **Endobest Clinical Research Dashboard** represents a mature, production-ready system that successfully combines:
- **Architectural Excellence** - Clean modular design with separation of concerns
- **User-Centric Configuration** - 100% externalized, no code changes needed
- **Performance Optimization** - 4-5x faster via API and threading improvements
- **Robust Resilience** - Comprehensive error handling, automatic recovery, graceful degradation
- **Quality Assurance** - Multi-level validation, coherence checks, regression testing
- **Comprehensive Documentation** - 250+ KB of technical and user guides
- **Maintainability** - Clear code structure, extensive logging, audit trails
The system successfully enables non-technical users to configure complex data extraction and reporting workflows while maintaining enterprise-grade reliability and performance standards.
---
**Document Version:** 1.0
**Last Updated:** 2025-11-08
**Status:** ✅ Complete & Production Ready