
Endobest Dashboard - Features & Capabilities Matrix

Complete Feature Reference


🎯 Core Features

1. Automated Data Collection

Multi-Source Data Integration

  • Research Clinic (RC) APIs

    • Organizations listing
    • Inclusion statistics per organization
    • Patient records & clinical data
    • Questionnaire responses (optimized batch call)
  • Lab APIs (GDD)

    • Lab test requests
    • Diagnostic results
    • Tube ID tracking
  • Questionnaire System

    • Single optimized call retrieves ALL questionnaires + answers (4-5x faster than one call per questionnaire)
    • Support for multiple questionnaire sources
    • Nested answer structures with JSON path navigation

Authentication & Token Management

  • IAM integration (Ziwig Pro)
  • Automatic token exchange (RC-specific)
  • Automatic token refresh on 401 errors
  • Thread-safe token refresh with locking
  • Interactive credential input (hidden password, optional defaults)

2. 100% Externalized Configuration

Excel-Based Configuration

No code changes needed - all behavior defined in Excel:

| Configuration Aspect | Location | What You Can Define |
|---|---|---|
| Field Extraction | Inclusions_Mapping sheet | Which fields to extract, and from which sources |
| Organization Fields | Organizations_Mapping sheet | Which organization fields to export |
| Excel Export | Excel_Workbooks + Excel_Sheets sheets | Workbooks, templates, sheets, transformations |
| Quality Rules | Regression_Check sheet | Expected data changes & validation rules |
| Organization Mapping | eb_org_center_mapping.xlsx | Mapping of organization names to center identifiers |

Supported Configuration Types

  • Field sources: Questionnaire (by ID/Name/Category), Record, Inclusion, Request, Calculated
  • Value transformations: Labels, templates, boolean conversion, list joining
  • Field conditions: Optional conditional execution (N/A if condition false)
  • Custom functions: Business logic without code modification
  • Filters: AND-condition filtering with nested field support
  • Sorting: Multi-key sorting with datetime parsing
  • Value replacement: Type-strict mapping (boolean, enum, status codes)

3. High-Performance Multithreading

Parallel Processing Architecture

  • Organization processing: Up to 20 concurrent workers
  • Patient processing: Nested async pool (40 workers) for lab/questionnaire fetches
  • Configurable thread count: User selects 1-20 workers
  • Non-blocking I/O: Async lab fetches during main processing
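The two-level pool layout can be pictured with a minimal sketch. It assumes an outer `ThreadPoolExecutor` sized by the user's choice (1-20) and one shared inner pool of 40 workers for per-patient fetches. `fetch_lab_requests`, `fetch_questionnaires`, `process_organization` and the `orgs` dict shape are hypothetical placeholders, not the dashboard's actual functions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_ORG_WORKERS = 20      # upper bound of the user-selected thread count
ASYNC_POOL_WORKERS = 40   # shared inner pool for per-patient fetches

def fetch_lab_requests(patient_id):
    # Hypothetical placeholder for a GDD lab API call
    return {"patient": patient_id, "labs": []}

def fetch_questionnaires(patient_id):
    # Hypothetical placeholder for the batched RC questionnaire call
    return {"patient": patient_id, "questionnaires": []}

def process_organization(org, async_pool):
    """Process one organization; its patient fetches run on the shared inner pool."""
    pending = []
    for patient_id in org["patients"]:
        # Submit both fetches without blocking this organization's thread
        pending.append((patient_id,
                        async_pool.submit(fetch_lab_requests, patient_id),
                        async_pool.submit(fetch_questionnaires, patient_id)))
    patients = []
    for patient_id, lab_future, quest_future in pending:
        patients.append({"id": patient_id,
                         "lab": lab_future.result(),
                         "questionnaires": quest_future.result()})
    return org["name"], patients

def collect_all(orgs, thread_count):
    if not 1 <= thread_count <= MAX_ORG_WORKERS:
        raise ValueError(f"thread_count must be between 1 and {MAX_ORG_WORKERS}")
    collected = {}
    # Inner tasks never submit further work, so outer threads waiting on them cannot deadlock
    with ThreadPoolExecutor(max_workers=ASYNC_POOL_WORKERS) as async_pool, \
         ThreadPoolExecutor(max_workers=thread_count) as org_pool:
        futures = [org_pool.submit(process_organization, org, async_pool) for org in orgs]
        for future in as_completed(futures):
            name, patients = future.result()
            collected[name] = patients
    return collected
```

Because the inner pool is shared, the number of simultaneous lab and questionnaire calls stays capped at 40, however many organizations are in flight.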

Performance Optimizations

  • Questionnaire batching: Single API call per patient instead of N calls (4-5x improvement)
  • Thread-local HTTP clients: Per-thread client instances prevent connection conflicts
  • Nested parallelization: Outer pool for orgs, inner pool for async tasks
  • Progress tracking: Real-time multi-level progress bars with thread-safe updates
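Here is a minimal sketch of per-thread clients and a lock-protected progress counter, built on `threading.local` and `threading.Lock`. `ApiClient` is a hypothetical stand-in for whatever HTTP client or session the dashboard really uses.

```python
import threading

class ApiClient:
    """Hypothetical stand-in for an HTTP client/session object."""
    def __init__(self):
        self.owner = threading.get_ident()  # thread that created this client

_thread_local = threading.local()

def get_client():
    # Lazily create one client per thread; a client is never shared across threads
    client = getattr(_thread_local, "client", None)
    if client is None:
        client = ApiClient()
        _thread_local.client = client
    return client

class ProgressCounter:
    """Thread-safe counter that a progress bar could read from."""
    def __init__(self, total):
        self.total = total
        self._done = 0
        self._lock = threading.Lock()

    def advance(self, n=1):
        # The lock makes the read-modify-write atomic across worker threads
        with self._lock:
            self._done += n
            return self._done
```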

Typical Performance (Full Dataset)

Data Collection: 2-4 minutes (1,200+ patients, 15+ orgs)
Quality Checks: 10-15 seconds
Excel Export: 5-15 seconds
─────────────────────────────
TOTAL: ~2.5-5 minutes end to end (the three phases above sum to ~2¼-4½ min)

4. Comprehensive Quality Assurance

Coherence Checking

  • Compares API-provided statistics vs. actual collected data
  • Organization-level validation
  • Detects data inconsistencies
  • Warning/Critical severity levels

Non-Regression Testing

  • Compares current vs. previous run
  • Config-driven validation rules
  • Detects unexpected changes:
    • New inclusions
    • Deleted inclusions
    • Field value changes
    • Status transitions
  • Exception handling (org-specific overrides)
  • Transition pattern support (expected state changes)
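The run-to-run comparison can be sketched as a keyed diff. This assumes each run is a dict from inclusion ID to field values, and that a hypothetical `status` field carries the state. Expected transitions are given as `(old, new)` pairs. In the dashboard these rules come from the Regression_Check sheet; the structure here is illustrative only.

```python
def compare_runs(previous, current, expected_transitions=frozenset()):
    """Diff two runs keyed by inclusion ID.

    previous/current: dict mapping inclusion ID -> dict of field values.
    expected_transitions: set of (old_status, new_status) pairs considered normal.
    """
    prev_ids, curr_ids = set(previous), set(current)
    report = {
        "new": sorted(curr_ids - prev_ids),
        "deleted": sorted(prev_ids - curr_ids),
        "changed": [],
        "unexpected_transitions": [],
    }
    for inc_id in sorted(prev_ids & curr_ids):
        old, new = previous[inc_id], current[inc_id]
        # Field-level changes, including fields added or removed between runs
        for field in sorted(set(old) | set(new)):
            if old.get(field) != new.get(field):
                report["changed"].append((inc_id, field, old.get(field), new.get(field)))
        # Status transitions not declared as expected are flagged separately
        transition = (old.get("status"), new.get("status"))
        if transition[0] != transition[1] and transition not in expected_transitions:
            report["unexpected_transitions"].append((inc_id, *transition))
    return report
```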

Critical Issue Handling

  • User confirmation required for critical issues
  • Override capability (continue export despite warnings)
  • Prevents accidental data replacement
  • Clear reporting of issue severity

5. Flexible Data Export

JSON Export

  • Structure: Nested by field groups
  • Files:
    • endobest_inclusions.json (~6-7 MB)
    • endobest_organizations.json (~17-20 KB)
  • Backup: Automatic _old file creation
  • Format: UTF-8, 4-space indentation
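A minimal sketch of the export-with-backup step follows. It assumes the backup is named by inserting `_old` before the extension (e.g. `endobest_inclusions_old.json`); the exact backup filename is an assumption.

```python
import json
import os

def export_json(data, path):
    """Write data as UTF-8 JSON with 4-space indentation, keeping the previous file as *_old."""
    if os.path.exists(path):
        root, ext = os.path.splitext(path)
        backup = f"{root}_old{ext}"  # assumed naming: <name>_old.json
        os.replace(path, backup)     # replaces any older backup, also on Windows
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented organization names readable
        json.dump(data, fh, indent=4, ensure_ascii=False)
```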

Excel Export (New!)

  • Configuration-driven: All behavior in Excel
  • Multi-workbook support: Generate multiple Excel files
  • Template support: Load & fill Excel templates
  • Data transformation pipeline:
    • Filter (AND conditions)
    • Sort (multi-key with datetime)
    • Replace values (type-strict)
    • Fill cells/named ranges
  • Formula recalculation: win32com integration (optional)
  • File conflict handling: Overwrite/Increment/Backup strategies
  • Template variables: Dynamic filenames using .format() pattern
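The filter → sort → replace pipeline and the filename templating can be sketched as follows. It assumes rows are plain dicts and that sort keys arrive as `(field, descending, date_format)` tuples. All function names and the `Endobest_{date}.xlsx` template are hypothetical. Type-strict replacement takes a list of pairs rather than a dict, because `True == 1` in Python and they would collide as dict keys.

```python
from datetime import datetime

def apply_filters(rows, conditions):
    """Keep rows where every field equals its expected value (AND semantics)."""
    return [r for r in rows if all(r.get(f) == v for f, v in conditions.items())]

def sort_rows(rows, keys):
    """Multi-key sort. keys: list of (field, descending, date_format or None)."""
    result = list(rows)
    # Sort by the least significant key first; Python's sort is stable,
    # so keys listed earlier end up taking precedence.
    for field, descending, date_fmt in reversed(keys):
        def sort_key(row, field=field, date_fmt=date_fmt):
            value = row.get(field)
            return datetime.strptime(value, date_fmt) if date_fmt else value
        result.sort(key=sort_key, reverse=descending)
    return result

def replace_values(rows, field, mapping):
    """Type-strict replacement: mapping is a list of (source, target) pairs."""
    out = []
    for row in rows:
        value = row.get(field)
        new_value = value
        for src, dst in mapping:
            # type() check keeps True, 1 and "true" distinct
            if type(value) is type(src) and value == src:
                new_value = dst
                break
        out.append({**row, field: new_value})
    return out

def build_filename(template, **variables):
    # str.format-style placeholders, e.g. "Endobest_{date}.xlsx"
    return template.format(**variables)
```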

6. Advanced Field Processing

Field Sources

  • Questionnaire sources:
    • By ID: Direct lookup (fastest)
    • By Name: Sequential search
    • By Category: Sequential search
  • Record sources: Clinical record data
  • Inclusion sources: Patient inclusion metadata
  • Request sources: Lab test data
  • Calculated sources: Custom function execution
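The speed difference between the questionnaire lookup modes comes from indexing. An ID lookup can use a dictionary built once per patient, while name and category lookups scan the list. In the sketch below, the `id`, `name` and `category` keys are assumed field names, not confirmed API fields.

```python
def index_by_id(questionnaires):
    # Built once per patient so ID lookups are a single dict access
    return {q["id"]: q for q in questionnaires}

def find_questionnaire(questionnaires, by, value, id_index=None):
    """Return the first questionnaire matching by 'id', 'name' or 'category', else None."""
    if by == "id":
        index = id_index if id_index is not None else index_by_id(questionnaires)
        return index.get(value)
    if by not in ("name", "category"):
        raise ValueError(f"unsupported lookup: {by!r}")
    # Sequential search: the first match wins
    return next((q for q in questionnaires if q.get(by) == value), None)
```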

Field Path Navigation

  • Nested path support: Navigate multi-level structures
  • Wildcard support: Extract lists with * operator
  • JSON path expressions: Full structure traversal
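Nested paths with a `*` wildcard can be resolved recursively. This sketch assumes a dotted path syntax (`answers.*.value`) in which numeric segments index lists. The real path notation used in the mapping sheet may differ.

```python
def get_path(data, path):
    """Resolve a dotted path such as 'answers.*.value'.

    '*' fans out over every element of a list (or every value of a dict),
    so the result becomes a list. Missing keys yield None.
    """
    return _walk(data, path.split(".") if path else [])

def _walk(node, parts):
    if not parts:
        return node
    head, rest = parts[0], parts[1:]
    if head == "*":
        if isinstance(node, dict):
            items = node.values()
        elif isinstance(node, list):
            items = node
        else:
            return None
        return [_walk(item, rest) for item in items]
    if isinstance(node, dict):
        return _walk(node[head], rest) if head in node else None
    if isinstance(node, list) and head.isdigit():
        idx = int(head)
        return _walk(node[idx], rest) if idx < len(node) else None
    return None
```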

Value Transformations

  • Boolean conversion: true_if_any with multiple match values
  • Value labels: Map values to localized text (French support)
  • Field templates: Format with placeholders (e.g., "$value%")
  • List joining: Flatten arrays with pipe delimiter
  • Score dictionaries: Format as "total/max"
  • Type preservation: Keep original types unless transformed
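Each transformation above is a small pure function, as this sketch shows. Several details are assumptions: the `$value` placeholder is substituted textually, the list delimiter is a bare `|`, and score dicts use `total`/`max` keys. Values not covered by a transformation are returned unchanged, which preserves their type.

```python
def true_if_any(value, match_values):
    """Boolean conversion: True if the value (or any element of a list value) is in match_values."""
    values = value if isinstance(value, list) else [value]
    return any(v in match_values for v in values)

def apply_label(value, labels):
    """Map a raw value to display text; unknown values pass through unchanged."""
    return labels.get(value, value)

def apply_template(value, template):
    """Substitute the value into a template such as '$value%'."""
    return template.replace("$value", str(value))

def join_list(values, delimiter="|"):
    """Flatten a list into a single delimited string."""
    return delimiter.join(str(v) for v in values)

def format_score(score):
    """Render a score dict as 'total/max' (assumed keys)."""
    return f"{score['total']}/{score['max']}"
```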

Custom Functions

  • search_in_fields_using_regex - Pattern matching across fields
  • extract_parentheses_content - Extract text in parentheses
  • append_terminated_suffix - Conditional suffix appending
  • if_then_else - Unified conditional logic (8 operators)
    • is_true, is_false - Boolean tests
    • is_defined, is_undefined - Existence tests
    • all_true, all_defined - Multiple field tests
    • ==, != - Value comparisons
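One way to organize the eight `if_then_else` operators is a dispatch table. The sketch below is not the dashboard's actual signature. It assumes operands are already-resolved field values, that "undefined" means `None`, and that `is_true`/`is_false` are strict boolean tests. `==` and `!=` compare the first two operands.

```python
def _defined(value):
    # Assumption: a field counts as undefined when it resolved to None
    return value is not None

OPERATORS = {
    "is_true": lambda ops: ops[0] is True,
    "is_false": lambda ops: ops[0] is False,
    "is_defined": lambda ops: _defined(ops[0]),
    "is_undefined": lambda ops: not _defined(ops[0]),
    "all_true": lambda ops: all(op is True for op in ops),
    "all_defined": lambda ops: all(_defined(op) for op in ops),
    "==": lambda ops: ops[0] == ops[1],
    "!=": lambda ops: ops[0] != ops[1],
}

def if_then_else(operator, operands, then_value, else_value):
    """Return then_value if the operator holds for the operands, else else_value."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    return then_value if OPERATORS[operator](list(operands)) else else_value
```

A dispatch table makes adding a ninth operator a one-line change, with no new branching logic.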

7. Organization Enrichment

Center Mapping Feature

  • Optional: Map organization names to center identifiers
  • File: eb_org_center_mapping.xlsx
  • Configuration: Org_Center_Mapping sheet
  • Matching: Case-insensitive, whitespace-trimmed
  • Validation: No duplicate orgs/centers check
  • Graceful degradation: Missing file doesn't break process
  • Fallback: Unmapped orgs use original name
  • No code changes: Fully configurable via Excel
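The matching, validation and fallback rules can be sketched as follows. The sketch assumes the sheet rows have already been read into `(organization, center)` pairs; `read_rows` is a hypothetical reader callable standing in for the Excel parsing. Center identifiers are compared exactly, which is also an assumption.

```python
import os

def normalize(name):
    # Case-insensitive, whitespace-trimmed key (casefold also handles accented text)
    return name.strip().casefold()

def build_center_mapping(rows):
    """rows: iterable of (organization_name, center_id) pairs."""
    mapping, seen_centers = {}, set()
    for org, center in rows:
        key = normalize(org)
        if key in mapping:
            raise ValueError(f"duplicate organization in mapping: {org!r}")
        if center in seen_centers:
            raise ValueError(f"duplicate center in mapping: {center!r}")
        mapping[key] = center
        seen_centers.add(center)
    return mapping

def load_mapping_or_empty(path, read_rows):
    # Graceful degradation: a missing file simply disables the mapping
    if not os.path.exists(path):
        return {}
    return build_center_mapping(read_rows(path))

def center_for(org_name, mapping):
    # Fallback: unmapped organizations keep their original name
    return mapping.get(normalize(org_name), org_name)
```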

8. Robust Error Handling

API Error Recovery

  • Automatic token refresh on 401 errors
  • Retry mechanism: Up to 10 attempts with configurable spacing
  • Network error handling: Timeouts, connection refused, DNS failures
  • Thread-safe: Synchronized token refresh across workers
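The recovery loop can be sketched as a retry wrapper around a single-request callable. Several names here are hypothetical: `send`, the `Unauthorized` exception (standing in for an HTTP 401) and the 2-second default spacing. The 10-attempt limit comes from the list above. The token manager refreshes under a lock and only if the token is still the stale one, so concurrent workers hitting a 401 trigger a single login.

```python
import threading
import time

MAX_ATTEMPTS = 10        # "up to 10 attempts"
RETRY_DELAY_SECONDS = 2  # hypothetical default spacing

class Unauthorized(Exception):
    """Hypothetical: raised by the transport when the API answers 401."""

class TokenManager:
    def __init__(self, login):
        self._login = login  # hypothetical callable returning a fresh token
        self._lock = threading.Lock()
        self.token = login()

    def refresh(self, stale_token):
        # Only the first thread refreshes; later ones reuse the new token
        with self._lock:
            if self.token == stale_token:
                self.token = self._login()
            return self.token

def call_with_retry(send, tokens, delay=RETRY_DELAY_SECONDS, attempts=MAX_ATTEMPTS):
    """send(token) performs one request; 401s trigger a refresh, network errors a retry."""
    last_error = None
    for attempt in range(1, attempts + 1):
        token = tokens.token
        try:
            return send(token)
        except Unauthorized as exc:
            last_error = exc
            tokens.refresh(token)
            continue  # retry immediately with the refreshed token
        except OSError as exc:
            # Covers TimeoutError, ConnectionError (e.g. refused) and socket.gaierror (DNS)
            last_error = exc
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"request failed after {attempts} attempts") from last_error
```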

Graceful Degradation

  • Missing configuration: Clear error messages with file paths
  • Missing organization mapping: Skip silently, use fallback
  • Optional features: Disabled if dependencies missing (win32com, pytz)
  • Partial failures: Continue with available data
  • Thread failures: Shutdown gracefully, preserve partial results

Comprehensive Logging

  • Log file: dashboard.log (per run)
  • Logged events:
    • API errors with attempt counts
    • Token refresh events
    • Configuration loading
    • Quality check results
    • File I/O operations
    • Thread errors with stack traces
  • Log levels: WARNING, CRITICAL
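A per-run log file with a WARNING threshold can be set up with the standard `logging` module. This sketch assumes "per run" means the file is overwritten at each start (`mode="w"`), and the logger name `eb_dashboard` is hypothetical. The thread name is included so that worker errors can be told apart.

```python
import logging

def setup_run_logger(path="dashboard.log"):
    """Create a logger that writes a fresh file for each run (mode 'w')."""
    logger = logging.getLogger("eb_dashboard")  # hypothetical logger name
    logger.setLevel(logging.WARNING)            # records WARNING and CRITICAL only
    # Drop handlers from a previous setup so each run writes to one file once
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s [%(threadName)s] %(message)s"))
    logger.addHandler(handler)
    return logger
```

Tracebacks from worker threads can then be recorded with `logger.error(..., exc_info=True)` or `logger.exception(...)`.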

🚀 Execution Modes

Mode 1: Normal (Full Collection)

python eb_dashboard.py

Workflow:

  1. Authenticate
  2. Load configuration
  3. Collect from all APIs (2-4 min)
  4. Run quality checks (10-15 sec)
  5. Export JSON files
  6. Generate Excel (if configured)

Use case: Regular data updates, scheduled runs


Mode 2: Excel-Only (Fast Export)

python eb_dashboard.py --excel_only

Workflow:

  1. Load existing JSON files
  2. Load configuration
  3. Generate Excel (5-15 sec)

Use case: Reconfigure reports, test templates, quick re-export


Mode 3: Check-Only (Validation)

python eb_dashboard.py --check-only

Workflow:

  1. Load existing JSON files
  2. Run quality checks
  3. Report any issues

Use case: Verify data quality, pre-distribution checks


Mode 4: Check-Only Compare (File Comparison)

python eb_dashboard.py --check-only file1.json file2.json

Workflow:

  1. Load two specific JSON files
  2. Run regression check (file1 vs file2)
  3. Report differences

Use case: Compare data snapshots, version comparison


Mode 5: Debug (Verbose Output)

python eb_dashboard.py --debug

Workflow:

  1. Execute normal mode
  2. Enable detailed logging
  3. Show field-by-field changes

Use case: Troubleshoot issues, detailed analysis


📊 Data Quality Features

Coherence Check

  • Compares API counts vs actual collected data
  • Organization-level validation
  • Severity: Warning for deviations up to ±10%, Critical for deviations beyond ±10%
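The severity rule can be read as a relative deviation between the API-reported count and the collected count. The sketch below assumes three things: an exact match is "OK", a deviation of exactly 10% is still a Warning, and any data found where the API reports zero is Critical.

```python
def coherence_severity(api_count, collected_count, threshold=0.10):
    """Classify the gap between the API-reported count and the collected count."""
    if api_count == collected_count:
        return "OK"
    if api_count == 0:
        return "Critical"  # assumption: data where the API reports none
    deviation = abs(collected_count - api_count) / api_count
    return "Warning" if deviation <= threshold else "Critical"

def check_organizations(stats):
    """stats: dict org -> (api_count, collected_count); returns org -> severity."""
    return {org: coherence_severity(api, got) for org, (api, got) in stats.items()}
```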

Non-Regression Check

  • Config-driven change detection
  • Supports transition patterns
  • Severity: Warning/Critical with overrides
  • Exception handling per organization

Data Validation

  • Field existence checking
  • Type validation (boolean, list, dict)
  • Condition evaluation
  • Nested structure traversal

🔧 Technical Capabilities

Multi-Level Threading

  • Organization-level parallelization (20 workers)
  • Patient-level async operations (40 workers)
  • Non-blocking I/O during processing
  • Thread-safe progress tracking

API Integration

  • 3 separate API domains (IAM, RC, GDD)
  • Dynamic URL routing
  • Request/response logging
  • Connection pooling per thread

Excel Workbook Generation

  • openpyxl for reading/writing
  • Template support
  • Named range targeting
  • Formula preservation
  • win32com integration (formula recalc)

Data Processing

  • JSON parsing & validation
  • Nested structure navigation
  • Wildcard pattern matching
  • Value type preservation
  • List flattening with delimiters

Configuration Management

  • Excel file parsing
  • Sheet validation
  • Row-level configuration loading
  • JSON column parsing
  • Type conversion & validation

🎓 User Experience Features

Interactive Interface

  • Interactive credential input (hidden password, optional defaults)
  • Thread count selection (1-20)
  • Progress bar feedback (multi-level)
  • User confirmation for critical issues
  • Clear error messages with remediation

Progress Tracking

Overall Progress [████████░░░░░░░░░░░░] 847/1200
  1/15 - Center 1 [████████░░░░░░░░░░░░] 73/95
  2/15 - Center 2 [██░░░░░░░░░░░░░░░░░░] 42/110

Logging & Audit Trail

  • Per-run log file
  • Timestamped entries
  • Execution metrics
  • Error details with context
  • Searchable log format

📈 Scalability Features

Configurable Parallelization

  • User-selected thread count (1-20)
  • Network bandwidth tuning
  • API rate limit adaptation
  • Memory-efficient streaming

Large Dataset Support

  • 1,000+ patient support
  • 100+ organization support
  • Paginated API calls
  • Chunked processing

Performance Monitoring

  • Elapsed time tracking
  • Progress rate display
  • Estimated time remaining
  • Per-phase timing

🔐 Security & Data Protection

Credential Handling

  • Interactive password input (hidden)
  • Optional default credentials
  • Token-based API authentication
  • Automatic token refresh

Data Protection

  • File backup strategy (_old files)
  • Critical issue confirmation before overwrite
  • Graceful degradation on errors
  • Guards against accidental data loss

Thread Safety

  • Per-thread HTTP clients
  • Synchronized global state
  • Lock-protected shared resources
  • Safe concurrent access

📦 System Integration

Platform Support

  • Windows (primary)
  • Linux/macOS compatible
  • PyInstaller executable packaging
  • Command-line interface

Dependency Management

  • pip-installable packages
  • Optional dependencies (pywin32, pytz)
  • Fallback for missing modules
  • Clear error messages

File Format Support

  • Excel 2007+ (.xlsx)
  • JSON UTF-8
  • Plain text logging
  • Configuration in Excel

🎯 Feature Comparison Table

| Feature | Implementation | Configuration | Code Changes | Performance |
|---|---|---|---|---|
| Field Extraction | Multi-source | Excel | None | Fast |
| Custom Functions | 4 built-in | Excel register | ⚠️ Function code | Medium |
| Data Filtering | AND conditions | Excel (JSON column) | None | Fast |
| Data Sorting | Multi-key | Excel (JSON column) | None | Medium |
| Value Mapping | Type-strict | Excel (JSON column) | None | Fast |
| Excel Export | Template-based | Excel | None | Medium |
| Quality Checks | Rule-based | Excel | None | Medium |
| Token Refresh | Automatic | Code | ⚠️ Constants only | Fast |
| Error Retry | Configurable | Code | ⚠️ Constants only | Medium |
| Organization Mapping | File-based | Excel file | None | Fast |

Legend (Code Changes column): None = no code changes; ⚠️ = limited code changes (constants only, or new function code for custom functions). No feature listed requires full code modification.


🏆 Key Strengths

  1. 100% Configuration-Driven - No code changes for normal operations
  2. High Performance - 4-5x faster questionnaire retrieval via one batched API call per patient
  3. Enterprise-Grade - Comprehensive error handling & recovery
  4. User-Friendly - Clear prompts, progress tracking, helpful errors
  5. Flexible - Multiple execution modes, configurable parameters
  6. Maintainable - Clean architecture, extensive logging, good documentation
  7. Scalable - Handles 1,000+ patients, 100+ organizations
  8. Reliable - Quality checks, data validation, automatic backups

🔄 Workflow Flexibility

Multi-Stage Process

COLLECT → VALIDATE → EXPORT (JSON + Excel)
           ↓
      CRITICAL ISSUES?
           ↓
        USER CONFIRMS?
           ↓
     YES → CONTINUE | NO → ABORT

Parallel Data Sources

  • Clinical Records (RC)
  • Questionnaire Answers (RC)
  • Lab Requests (GDD)
  • Organization Stats (RC)
  • Patient Demographics (Mixed)

Sequential Validation Gates

  1. Configuration validation (startup)
  2. Authentication validation (login)
  3. API response validation (each call)
  4. Data structure validation (processing)
  5. Coherence validation (checks)
  6. Regression validation (checks)

All features listed above are fully functional and tested in production.

For implementation details, see DOCUMENTATION_30_ARCHITECTURE_SUMMARY.md