✨ Endobest Dashboard - Features & Capabilities Matrix
Complete Feature Reference
🎯 Core Features
1. Automated Data Collection
✅ Multi-Source Data Integration
- Research Clinic (RC) APIs
  - Organizations listing
  - Inclusion statistics per organization
  - Patient records & clinical data
  - Questionnaire responses (optimized batch call)
- Lab APIs (GDD)
  - Lab test requests
  - Diagnostic results
  - Tube ID tracking
- Questionnaire System
  - Single optimized call retrieves ALL questionnaires + answers (4-5x faster)
  - Support for multiple questionnaire sources
  - Nested answer structures with JSON path navigation
✅ Authentication & Token Management
- IAM integration (Ziwig Pro)
- Automatic token exchange (RC-specific)
- Automatic token refresh on 401 errors
- Thread-safe token refresh with locking
- Credential input with secure defaults
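The refresh-on-401 behavior with locking can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: `TokenManager`, `call_with_refresh`, and the `fetch_token` / `send` callables are hypothetical names, and the real IAM and RC token exchange is not modeled.

```python
import threading


class TokenManager:
    """Holds a shared token; refreshes it at most once per stale token."""

    def __init__(self, fetch_token):
        # fetch_token is a hypothetical callable that returns a fresh token.
        self._fetch_token = fetch_token
        self._lock = threading.Lock()
        self._token = fetch_token()

    @property
    def token(self):
        return self._token

    def refresh(self, stale_token):
        """Refresh only if no other thread already replaced stale_token."""
        with self._lock:
            if self._token == stale_token:
                self._token = self._fetch_token()
            return self._token


def call_with_refresh(manager, send):
    """send(token) -> (status_code, body); retry once after a 401."""
    token = manager.token
    status, body = send(token)
    if status == 401:
        token = manager.refresh(token)
        status, body = send(token)
    return status, body
```

Passing the stale token into `refresh` is one way to prevent several workers that hit a 401 at the same moment from each requesting a new token.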
2. 100% Externalized Configuration
✅ Excel-Based Configuration
No code changes needed - all behavior defined in Excel:
| Configuration Aspect | Location | Flexibility |
|---|---|---|
| Field Extraction | Inclusions_Mapping sheet | Define which fields to extract, from which sources |
| Organization Fields | Organizations_Mapping sheet | Define which org fields to export |
| Excel Export | Excel_Workbooks + Excel_Sheets sheets | Define workbooks, templates, sheets, transformations |
| Quality Rules | Regression_Check sheet | Define expected data changes & validation rules |
| Organization Mapping | eb_org_center_mapping.xlsx | Map organization names to center identifiers |
✅ Supported Configuration Types
- Field sources: Questionnaire (by ID/Name/Category), Record, Inclusion, Request, Calculated
- Value transformations: Labels, templates, boolean conversion, list joining
- Field conditions: Optional conditional execution (N/A if condition false)
- Custom functions: Business logic without code modification
- Filters: AND-condition filtering with nested field support
- Sorting: Multi-key sorting with datetime parsing
- Value replacement: Type-strict mapping (boolean, enum, status codes)
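To make the filter, sort, and replacement types more concrete, here is a small sketch. The helper names, the dotted-path syntax, and the tuple-based rule format are assumptions for illustration; the actual Excel JSON syntax is not shown in this document.

```python
from datetime import datetime


def get_nested(record, dotted_path):
    """Follow a dotted path such as 'patient.status'; None if absent."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches_all(record, conditions):
    """AND filter: every (path, expected) pair must match."""
    return all(get_nested(record, path) == expected
               for path, expected in conditions)


def sort_rows(rows, text_key, datetime_key):
    """Multi-key sort: text key first, then an ISO datetime string."""
    return sorted(
        rows,
        key=lambda r: (get_nested(r, text_key),
                       datetime.fromisoformat(get_nested(r, datetime_key))),
    )


def replace_strict(value, mapping):
    """Type-strict replacement: True never matches 1, "1" never matches 1."""
    for source, target in mapping:
        if type(value) is type(source) and value == source:
            return target
    return value
```

The explicit `type(...) is type(...)` check matters in Python because `True == 1` evaluates to true, so a plain dictionary lookup would conflate booleans and integers.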
3. High-Performance Multithreading
✅ Parallel Processing Architecture
- Organization processing: Up to 20 concurrent workers
- Patient processing: Nested async pool (40 workers) for lab/questionnaire fetches
- Configurable thread count: User selects 1-20 workers
- Non-blocking I/O: Async lab fetches during main processing
✅ Performance Optimizations
- Questionnaire batching: Single API call per patient instead of N calls (4-5x improvement)
- Thread-local HTTP clients: Per-thread client instances prevent connection conflicts
- Nested parallelization: Outer pool for orgs, inner pool for async tasks
- Progress tracking: Real-time multi-level progress bars with thread-safe updates
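The outer/inner pool arrangement with per-thread clients might look roughly like this. It is a sketch under assumptions: `FakeClient` stands in for a real HTTP client, and the function names and default worker counts are illustrative rather than taken from the dashboard.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


class FakeClient:
    """Stand-in for an HTTP client; records which thread created it."""

    def __init__(self):
        self.owner = threading.get_ident()


def get_client():
    """Return this thread's client, creating it on first use."""
    if not hasattr(_local, "client"):
        _local.client = FakeClient()
    return _local.client


def fetch_patient_extras(patient_id):
    client = get_client()  # never shared between threads
    return {"patient": patient_id, "client_owner": client.owner}


def process_organization(org, inner_pool):
    # Lab/questionnaire fetches go to the inner pool so the
    # organization worker is not blocked on each one in turn.
    futures = [inner_pool.submit(fetch_patient_extras, pid)
               for pid in org["patients"]]
    return org["name"], [f.result() for f in futures]


def collect(orgs, org_workers=4, async_workers=8):
    with ThreadPoolExecutor(max_workers=async_workers) as inner_pool, \
         ThreadPoolExecutor(max_workers=org_workers) as outer_pool:
        results = outer_pool.map(
            lambda org: process_organization(org, inner_pool), orgs)
        return dict(results)
```

Because inner tasks never wait on outer tasks, the two pools cannot deadlock each other even when every organization worker is waiting on its patients.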
✅ Typical Performance (Full Dataset)
Data Collection: 2-4 minutes (1,200+ patients, 15+ orgs)
Quality Checks: 10-15 seconds
Excel Export: 5-15 seconds
─────────────────────────────
TOTAL: 2.5-5 minutes
4. Comprehensive Quality Assurance
✅ Coherence Checking
- Compares API-provided statistics vs. actual collected data
- Organization-level validation
- Detects data inconsistencies
- Warning/Critical severity levels
✅ Non-Regression Testing
- Compares current vs. previous run
- Config-driven validation rules
- Detects unexpected changes:
  - New inclusions
  - Deleted inclusions
  - Field value changes
  - Status transitions
- Exception handling (org-specific overrides)
- Transition pattern support (expected state changes)
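As an illustration of the comparison between runs, the sketch below diffs two snapshots and flags status transitions that are not on an allowed list. The snapshot shape (`{inclusion_id: {field: value}}`) and both function names are assumptions; the real rules come from the Regression_Check sheet, whose format is not described here.

```python
def diff_snapshots(previous, current):
    """Compare two {inclusion_id: {field: value}} snapshots."""
    new_ids = sorted(current.keys() - previous.keys())
    deleted_ids = sorted(previous.keys() - current.keys())
    changes = []
    for inclusion_id in sorted(previous.keys() & current.keys()):
        old, new = previous[inclusion_id], current[inclusion_id]
        for field in sorted(old.keys() | new.keys()):
            if old.get(field) != new.get(field):
                changes.append(
                    (inclusion_id, field, old.get(field), new.get(field)))
    return {"new": new_ids, "deleted": deleted_ids, "changed": changes}


def unexpected_transitions(changes, field, allowed):
    """Keep changes to `field` whose (old, new) pair is not allowed."""
    return [c for c in changes
            if c[1] == field and (c[2], c[3]) not in allowed]
```

For example, allowing `("screened", "included")` while a patient moves from `"included"` back to `"screened"` would leave that change in the unexpected list.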
✅ Critical Issue Handling
- User confirmation required for critical issues
- Override capability (continue export despite warnings)
- Prevents accidental data replacement
- Clear reporting of issue severity
5. Flexible Data Export
✅ JSON Export
- Structure: Nested by field groups
- Files:
  - endobest_inclusions.json (~6-7 MB)
  - endobest_organizations.json (~17-20 KB)
- Backup: Automatic _old file creation
- Format: UTF-8, 4-space indentation
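A minimal sketch of the export step, assuming the backup is made by renaming the existing file with an `_old` suffix before writing (the exact backup filename and the `export_json` name are assumptions):

```python
import json
import os


def export_json(data, path):
    """Write UTF-8 JSON with 4-space indentation, keeping a *_old backup."""
    if os.path.exists(path):
        root, ext = os.path.splitext(path)
        # e.g. endobest_inclusions.json -> endobest_inclusions_old.json
        os.replace(path, f"{root}_old{ext}")
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps accented characters readable in the file.
        json.dump(data, handle, indent=4, ensure_ascii=False)
```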
✅ Excel Export (New!)
- Configuration-driven: All behavior in Excel
- Multi-workbook support: Generate multiple Excel files
- Template support: Load & fill Excel templates
- Data transformation pipeline:
  - Filter (AND conditions)
  - Sort (multi-key with datetime)
  - Replace values (type-strict)
  - Fill cells/named ranges
- Formula recalculation: win32com integration (optional)
- File conflict handling: Overwrite/Increment/Backup strategies
- Template variables: Dynamic filenames using .format() pattern
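The combination of `.format()` filename templates and conflict handling could be sketched as below. The function name, the `{date}` variable, and the `_1`, `_2` suffix scheme are assumptions; only the Overwrite and Increment strategies are modeled, since Backup involves moving files on disk.

```python
import os


def resolve_filename(template, variables, existing, strategy="increment"):
    """Expand a str.format() template and resolve a name conflict.

    `existing` is the set of filenames already present in the target folder.
    """
    name = template.format(**variables)
    if name not in existing or strategy == "overwrite":
        return name
    if strategy == "increment":
        root, ext = os.path.splitext(name)
        counter = 1
        while f"{root}_{counter}{ext}" in existing:
            counter += 1
        return f"{root}_{counter}{ext}"
    raise ValueError(f"Unsupported strategy in this sketch: {strategy!r}")
```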
6. Advanced Field Processing
✅ Field Sources
- Questionnaire sources:
  - By ID: Direct lookup (fastest)
  - By Name: Sequential search
  - By Category: Sequential search
- Record sources: Clinical record data
- Inclusion sources: Patient inclusion metadata
- Request sources: Lab test data
- Calculated sources: Custom function execution
✅ Field Path Navigation
- Nested path support: Navigate multi-level structures
- Wildcard support: Extract lists with * operator
- JSON path expressions: Full structure traversal
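Path navigation with a wildcard can be illustrated with a short recursive helper. The list-of-keys path representation and the `extract` name are assumptions made for this sketch; the dashboard's own path syntax may differ.

```python
def extract(data, path):
    """Walk `path` (a list of keys) through nested dicts.

    A "*" segment fans out over every element of a list and
    returns a list of the values found under the remaining path.
    """
    if not path:
        return data
    head, rest = path[0], path[1:]
    if head == "*":
        if not isinstance(data, list):
            return None
        return [extract(item, rest) for item in data]
    if isinstance(data, dict) and head in data:
        return extract(data[head], rest)
    return None  # missing key or unexpected structure
```

Returning `None` for a missing segment, instead of raising, lets a caller treat absent fields as undefined values.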
✅ Value Transformations
- Boolean conversion: true_if_any with multiple match values
- Value labels: Map values to localized text (French support)
- Field templates: Format with placeholders (e.g., "$value%")
- List joining: Flatten arrays with pipe delimiter
- Score dictionaries: Format as "total/max"
- Type preservation: Keep original types unless transformed
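A few of these transformations are simple enough to sketch directly. The signatures below, and the `total` / `max` keys of the score dictionary, are assumptions for illustration; only the `true_if_any` name, the pipe delimiter, the "total/max" format, and the "$value%" template come from the list above.

```python
from string import Template


def true_if_any(value, match_values):
    """True if the value (or any element of a list value) is a match."""
    values = value if isinstance(value, list) else [value]
    return any(v in match_values for v in values)


def join_list(values, delimiter="|"):
    """Flatten a list into a single delimited string."""
    return delimiter.join(str(v) for v in values)


def format_score(score):
    """Render a score dictionary as 'total/max'."""
    return f"{score['total']}/{score['max']}"


def apply_template(value, template):
    """Fill a '$value' placeholder, e.g. '$value%' -> '42%'."""
    return Template(template).substitute(value=value)
```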
✅ Custom Functions
- search_in_fields_using_regex - Pattern matching across fields
- extract_parentheses_content - Extract text in parentheses
- append_terminated_suffix - Conditional suffix appending
- if_then_else - Unified conditional logic (8 operators)
  - is_true, is_false - Boolean tests
  - is_defined, is_undefined - Existence tests
  - all_true, all_defined - Multiple field tests
  - ==, != - Value comparisons
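To give a feel for what such functions do, here is a hedged sketch of three of them. The function names come from the list above, but the signatures, the case-insensitive matching, and the way operators are passed are assumptions; only four of the eight operators are shown.

```python
import re


def extract_parentheses_content(text):
    """Return the text inside the first pair of parentheses, or None."""
    match = re.search(r"\(([^)]*)\)", text)
    return match.group(1) if match else None


def search_in_fields_using_regex(record, fields, pattern):
    """True if the pattern matches in any of the named fields."""
    regex = re.compile(pattern, re.IGNORECASE)
    for field in fields:
        value = record.get(field)
        if value is not None and regex.search(str(value)):
            return True
    return False


# Assumed operator table for this sketch (unary operators only).
OPERATORS = {
    "is_true": lambda v: v is True,
    "is_false": lambda v: v is False,
    "is_defined": lambda v: v is not None,
    "is_undefined": lambda v: v is None,
}


def if_then_else(operator, value, then_value, else_value):
    """Pick then_value or else_value based on a named operator."""
    return then_value if OPERATORS[operator](value) else else_value
```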
7. Organization Enrichment
✅ Center Mapping Feature
- Optional: Map organization names to center identifiers
- File: eb_org_center_mapping.xlsx
- Configuration: Org_Center_Mapping sheet
- Matching: Case-insensitive, whitespace-trimmed
- Validation: Checks that no organization or center appears more than once
- Graceful degradation: Missing file doesn't break process
- Fallback: Unmapped orgs use original name
- No code changes: Fully configurable via Excel
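The matching, validation, and fallback rules can be sketched as follows. The function names and the `(organization, center)` row format are assumptions; reading the Org_Center_Mapping sheet itself is left out.

```python
def normalize(name):
    """Case-insensitive, whitespace-trimmed comparison key."""
    return name.strip().casefold()


def build_center_lookup(rows):
    """rows: iterable of (organization_name, center_id) pairs.

    Raises ValueError if an organization or a center appears twice.
    """
    lookup, seen_centers = {}, set()
    for org, center in rows:
        key = normalize(org)
        if key in lookup:
            raise ValueError(f"Duplicate organization: {org!r}")
        if center in seen_centers:
            raise ValueError(f"Duplicate center: {center!r}")
        lookup[key] = center
        seen_centers.add(center)
    return lookup


def center_for(org_name, lookup):
    """Mapped center, or the original name when no mapping exists."""
    return lookup.get(normalize(org_name), org_name)
```

With an empty lookup (for example when the mapping file is missing), `center_for` simply returns every organization name unchanged.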
8. Robust Error Handling
✅ API Error Recovery
- Automatic token refresh on 401 errors
- Retry mechanism: Up to 10 attempts with configurable spacing
- Network error handling: Timeouts, connection refused, DNS failures
- Thread-safe: Synchronized token refresh across workers
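A bounded retry loop with a fixed wait between attempts might look like this. It is a sketch only: the function name, the default wait, and the choice of exception types are assumptions, and the dashboard's actual retry constants are not shown in this document.

```python
import time


def call_with_retry(operation, max_attempts=10, wait_seconds=1.0,
                    retry_on=(ConnectionError, TimeoutError),
                    sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The last failure is re-raised so the caller can log it with
    the attempt count. `sleep` is injectable to keep tests fast.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)
```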
✅ Graceful Degradation
- Missing configuration: Clear error messages with file paths
- Missing organization mapping: Skip silently, use fallback
- Optional features: Disabled if dependencies missing (win32com, pytz)
- Partial failures: Continue with available data
- Thread failures: Shutdown gracefully, preserve partial results
✅ Comprehensive Logging
- Log file: dashboard.log (per run)
- Logged events:
  - API errors with attempt counts
  - Token refresh events
  - Configuration loading
  - Quality check results
  - File I/O operations
  - Thread errors with stack traces
- Log levels: WARNING, CRITICAL
🚀 Execution Modes
Mode 1: Normal (Full Collection)
python eb_dashboard.py
Workflow:
- Authenticate
- Load configuration
- Collect from all APIs (2-4 min)
- Run quality checks (10-15 sec)
- Export JSON files
- Generate Excel (if configured)
Use case: Regular data updates, scheduled runs
Mode 2: Excel-Only (Fast Export)
python eb_dashboard.py --excel_only
Workflow:
- Load existing JSON files
- Load configuration
- Generate Excel (5-15 sec)
Use case: Reconfigure reports, test templates, quick re-export
Mode 3: Check-Only (Validation)
python eb_dashboard.py --check-only
Workflow:
- Load existing JSON files
- Run quality checks
- Report any issues
Use case: Verify data quality, pre-distribution checks
Mode 4: Check-Only Compare (File Comparison)
python eb_dashboard.py --check-only file1.json file2.json
Workflow:
- Load two specific JSON files
- Run regression check (file1 vs file2)
- Report differences
Use case: Compare data snapshots, version comparison
Mode 5: Debug (Verbose Output)
python eb_dashboard.py --debug
Workflow:
- Execute normal mode
- Enable detailed logging
- Show field-by-field changes
Use case: Troubleshoot issues, detailed analysis
📊 Data Quality Features
✅ Coherence Check
- Compares API counts vs actual collected data
- Organization-level validation
- Severity: Warning (deviation within ±10%), Critical (deviation beyond ±10%)
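One possible reading of the severity rule, expressed as code: any mismatch of up to 10% of the API-reported count is a warning, and anything larger is critical. The function name, the `"ok"` result for an exact match, and the handling of a zero reported count are assumptions.

```python
def coherence_severity(reported, collected, threshold=0.10):
    """Classify the gap between the API-reported and collected counts."""
    if reported == collected:
        return "ok"
    if reported == 0:
        # No baseline to compare against; treated as critical here.
        return "critical"
    deviation = abs(collected - reported) / reported
    return "warning" if deviation <= threshold else "critical"
```

Under this reading, an organization reporting 100 inclusions with 95 collected yields a warning, while 80 collected is critical.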
✅ Non-Regression Check
- Config-driven change detection
- Supports transition patterns
- Severity: Warning/Critical with overrides
- Exception handling per organization
✅ Data Validation
- Field existence checking
- Type validation (boolean, list, dict)
- Condition evaluation
- Nested structure traversal
🔧 Technical Capabilities
✅ Multi-Level Threading
- Organization-level parallelization (20 workers)
- Patient-level async operations (40 workers)
- Non-blocking I/O during processing
- Thread-safe progress tracking
✅ API Integration
- 3 separate API domains (IAM, RC, GDD)
- Dynamic URL routing
- Request/response logging
- Connection pooling per thread
✅ Excel Workbook Generation
- openpyxl for reading/writing
- Template support
- Named range targeting
- Formula preservation
- win32com integration (formula recalc)
✅ Data Processing
- JSON parsing & validation
- Nested structure navigation
- Wildcard pattern matching
- Value type preservation
- List flattening with delimiters
✅ Configuration Management
- Excel file parsing
- Sheet validation
- Row-level configuration loading
- JSON column parsing
- Type conversion & validation
🎓 User Experience Features
✅ Interactive Interface
- Credential input with secure defaults
- Thread count selection (1-20)
- Progress bar feedback (multi-level)
- User confirmation for critical issues
- Clear error messages with remediation
✅ Progress Tracking
Overall Progress [████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [████████░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██░░░░░░░░░░░░░░░░░░] 42/110
✅ Logging & Audit Trail
- Per-run log file
- Timestamped entries
- Execution metrics
- Error details with context
- Searchable log format
📈 Scalability Features
✅ Configurable Parallelization
- User-selected thread count (1-20)
- Network bandwidth tuning
- API rate limit adaptation
- Memory-efficient streaming
✅ Large Dataset Support
- 1,000+ patient support
- 100+ organization support
- Paginated API calls
- Chunked processing
✅ Performance Monitoring
- Elapsed time tracking
- Progress rate display
- Estimated time remaining
- Per-phase timing
🔐 Security & Data Protection
✅ Credential Handling
- Interactive password input (hidden)
- Optional default credentials
- Token-based API authentication
- Automatic token refresh
✅ Data Protection
- File backup strategy (_old files)
- Critical issue confirmation before overwrite
- Graceful degradation on errors
- No accidental data loss
✅ Thread Safety
- Per-thread HTTP clients
- Synchronized global state
- Lock-protected shared resources
- Safe concurrent access
📦 System Integration
✅ Platform Support
- Windows (primary)
- Linux/macOS compatible
- PyInstaller executable packaging
- Command-line interface
✅ Dependency Management
- pip-installable packages
- Optional dependencies (pywin32, pytz)
- Fallback for missing modules
- Clear error messages
✅ File Format Support
- Excel 2007+ (.xlsx)
- JSON UTF-8
- Plain text logging
- Configuration in Excel
🎯 Feature Comparison Table
| Feature | Implementation | Configuration | Code Changes | Performance |
|---|---|---|---|---|
| Field Extraction | Multi-source | Excel | ❌ None | Fast |
| Custom Functions | 4 built-in | Excel register | ⚠️ Function code | Medium |
| Data Filtering | AND conditions | Excel JSON | ❌ None | Fast |
| Data Sorting | Multi-key | Excel JSON | ❌ None | Medium |
| Value Mapping | Type-strict | Excel JSON | ❌ None | Fast |
| Excel Export | Template-based | Excel | ❌ None | Medium |
| Quality Checks | Rule-based | Excel | ❌ None | Medium |
| Token Refresh | Automatic | Code | ⚠️ Constants only | Fast |
| Error Retry | Configurable | Code | ⚠️ Constants only | Medium |
| Organization Mapping | File-based | Excel file | ❌ None | Fast |
Legend: ❌ = No code changes, ⚠️ = Limited code changes (constants or function code), ✅ = Full code modification
🏆 Key Strengths
- 100% Configuration-Driven - No code changes for normal operations
- High Performance - 4-5x faster questionnaire retrieval via a single batched API call per patient
- Enterprise-Grade - Comprehensive error handling & recovery
- User-Friendly - Clear prompts, progress tracking, helpful errors
- Flexible - Multiple execution modes, configurable parameters
- Maintainable - Clean architecture, extensive logging, good documentation
- Scalable - Handles 1,000+ patients, 100+ organizations
- Reliable - Quality checks, data validation, automatic backups
🔄 Workflow Flexibility
Multi-Stage Process
COLLECT → VALIDATE → EXPORT (JSON + Excel)
↓
CRITICAL ISSUES?
↓
USER CONFIRMS?
↓
YES → CONTINUE | NO → ABORT
Parallel Data Sources
- Clinical Records (RC)
- Questionnaire Answers (RC)
- Lab Requests (GDD)
- Organization Stats (RC)
- Patient Demographics (Mixed)
Sequential Validation Gates
- Configuration validation (startup)
- Authentication validation (login)
- API response validation (each call)
- Data structure validation (processing)
- Coherence validation (checks)
- Regression validation (checks)
All features listed above are fully functional and tested in production.
For implementation details, see DOCUMENTATION_30_ARCHITECTURE_SUMMARY.md