# ✨ Endobest Dashboard - Features & Capabilities Matrix

**Complete Feature Reference**

---
## 🎯 Core Features

### 1. Automated Data Collection

#### ✅ Multi-Source Data Integration
- **Research Clinic (RC) APIs**
  - Organizations listing
  - Inclusion statistics per organization
  - Patient records & clinical data
  - Questionnaire responses (optimized batch call)

- **Lab APIs (GDD)**
  - Lab test requests
  - Diagnostic results
  - Tube ID tracking

- **Questionnaire System**
  - Single optimized call retrieves ALL questionnaires + answers (4-5x faster)
  - Support for multiple questionnaire sources
  - Nested answer structures with JSON path navigation

#### ✅ Authentication & Token Management
- IAM integration (Ziwig Pro)
- Automatic token exchange (RC-specific)
- Automatic token refresh on 401 errors
- Thread-safe token refresh with locking
- Credential input with secure defaults
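The refresh-on-401 behaviour with locking listed above can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: `TokenManager`, `call_with_refresh`, and the `send` callable returning a `(status, body)` pair are hypothetical names chosen for the example. The point it shows is that a thread only refreshes if the token it saw fail is still the current one, so concurrent workers do not trigger several refreshes in a row.

```python
import itertools
import threading


class TokenManager:
    """Holds the current token; refreshes it at most once per stale token."""

    def __init__(self, fetch_token):
        self._fetch_token = fetch_token  # hypothetical callable returning a fresh token
        self._lock = threading.Lock()
        self._token = fetch_token()

    @property
    def token(self):
        return self._token

    def refresh(self, stale_token):
        """Refresh only if no other thread has already replaced `stale_token`."""
        with self._lock:
            if self._token == stale_token:
                self._token = self._fetch_token()
            return self._token


def call_with_refresh(send, tokens):
    """Call `send(token)`; on a 401 status, refresh the token and retry once."""
    token = tokens.token
    status, body = send(token)
    if status == 401:
        token = tokens.refresh(token)
        status, body = send(token)
    return status, body


# Demo with a fake API that only accepts the second token issued.
counter = itertools.count(1)
tokens = TokenManager(lambda: f"token-{next(counter)}")
fake_send = lambda tok: (200, "ok") if tok == "token-2" else (401, None)
print(call_with_refresh(fake_send, tokens))
```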
---

### 2. 100% Externalized Configuration

#### ✅ Excel-Based Configuration
No code changes needed - all behavior defined in Excel:

| Configuration Aspect | Location | Flexibility |
|---|---|---|
| **Field Extraction** | `Inclusions_Mapping` sheet | Define which fields to extract, from which sources |
| **Organization Fields** | `Organizations_Mapping` sheet | Define which org fields to export |
| **Excel Export** | `Excel_Workbooks` + `Excel_Sheets` sheets | Define workbooks, templates, sheets, transformations |
| **Quality Rules** | `Regression_Check` sheet | Define expected data changes & validation rules |
| **Organization Mapping** | `eb_org_center_mapping.xlsx` | Map organization names to center identifiers |

#### ✅ Supported Configuration Types
- **Field sources:** Questionnaire (by ID/Name/Category), Record, Inclusion, Request, Calculated
- **Value transformations:** Labels, templates, boolean conversion, list joining
- **Field conditions:** Optional conditional execution (N/A if condition false)
- **Custom functions:** Business logic without code modification
- **Filters:** AND-condition filtering with nested field support
- **Sorting:** Multi-key sorting with datetime parsing
- **Value replacement:** Type-strict mapping (boolean, enum, status codes)
---

### 3. High-Performance Multithreading

#### ✅ Parallel Processing Architecture
- **Organization processing:** Up to 20 concurrent workers
- **Patient processing:** Nested async pool (40 workers) for lab/questionnaire fetches
- **Configurable thread count:** User selects 1-20 workers
- **Non-blocking I/O:** Async lab fetches during main processing

#### ✅ Performance Optimizations
- **Questionnaire batching:** Single API call per patient instead of N calls (4-5x improvement)
- **Thread-local HTTP clients:** Per-thread client instances prevent connection conflicts
- **Nested parallelization:** Outer pool for orgs, inner pool for async tasks
- **Progress tracking:** Real-time multi-level progress bars with thread-safe updates
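A rough sketch of the nested-pool layout described above, assuming `concurrent.futures` thread pools and `threading.local` for the per-thread client. The function names, the shape of the organization records, and the dictionary standing in for an HTTP client are all hypothetical; only the pool sizes (20 and 40) come from the list above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def get_client():
    """Return this thread's own client, creating it on first use."""
    if not hasattr(_local, "client"):
        # Stand-in for a real HTTP client object (assumption for the example).
        _local.client = {"owner": threading.get_ident()}
    return _local.client


def fetch_patient_extras(patient_id):
    """Stand-in for the lab/questionnaire fetches of one patient."""
    client = get_client()
    return {"patient": patient_id, "thread": client["owner"]}


def process_org(org, inner_pool):
    """Outer-pool task: fan the org's patients out to the inner pool."""
    futures = [inner_pool.submit(fetch_patient_extras, pid) for pid in org["patients"]]
    return org["name"], [future.result() for future in futures]


def collect(orgs, org_workers=20, task_workers=40):
    """Outer pool handles organizations, inner pool handles per-patient tasks."""
    with ThreadPoolExecutor(max_workers=task_workers) as inner_pool, \
         ThreadPoolExecutor(max_workers=org_workers) as outer_pool:
        results = outer_pool.map(lambda org: process_org(org, inner_pool), orgs)
        return dict(results)
```

Inner tasks never wait on the outer pool, so outer workers blocking on `future.result()` cannot deadlock in this layout.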
#### ✅ Typical Performance (Full Dataset)
```
Data Collection:     2-4 minutes (1,200+ patients, 15+ orgs)
Quality Checks:      10-15 seconds
Excel Export:        5-15 seconds
─────────────────────────────
TOTAL:               2.5-5 minutes
```
---

### 4. Comprehensive Quality Assurance

#### ✅ Coherence Checking
- Compares API-provided statistics vs. actual collected data
- Organization-level validation
- Detects data inconsistencies
- Warning/Critical severity levels

#### ✅ Non-Regression Testing
- Compares current vs. previous run
- Config-driven validation rules
- Detects unexpected changes:
  - New inclusions
  - Deleted inclusions
  - Field value changes
  - Status transitions
- Exception handling (org-specific overrides)
- Transition pattern support (expected state changes)
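To make the comparison concrete, here is a small sketch of a run-to-run diff that reports new inclusions, deleted inclusions, and field changes, while skipping transitions declared as expected. It is an illustration under assumptions: `diff_runs`, the dictionaries keyed by inclusion ID, and the `(field, before, after)` transition tuples are hypothetical and do not reflect the real `Regression_Check` rule format.

```python
def diff_runs(previous, current, expected_transitions=()):
    """Compare two runs keyed by inclusion ID and list what changed.

    `expected_transitions` holds (field, before, after) tuples that are
    considered normal and therefore not reported.
    """
    report = {
        "new": sorted(current.keys() - previous.keys()),
        "deleted": sorted(previous.keys() - current.keys()),
        "changed": [],
    }
    for key in sorted(previous.keys() & current.keys()):
        old, new = previous[key], current[key]
        for field in sorted(old.keys() | new.keys()):
            before, after = old.get(field), new.get(field)
            if before == after:
                continue
            if (field, before, after) in expected_transitions:
                continue  # an allowed state change, not a regression
            report["changed"].append((key, field, before, after))
    return report


# Sample data invented for the example.
previous_run = {"P1": {"status": "included"}, "P2": {"status": "included"}}
current_run = {"P1": {"status": "completed"}, "P3": {"status": "included"}}
print(diff_runs(previous_run, current_run))
```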
#### ✅ Critical Issue Handling
- User confirmation required for critical issues
- Override capability (continue export despite warnings)
- Prevents accidental data replacement
- Clear reporting of issue severity
---

### 5. Flexible Data Export

#### ✅ JSON Export
- **Structure:** Nested by field groups
- **Files:**
  - `endobest_inclusions.json` (~6-7 MB)
  - `endobest_organizations.json` (~17-20 KB)
- **Backup:** Automatic _old file creation
- **Format:** UTF-8, 4-space indentation
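The JSON export settings above (UTF-8, 4-space indentation, `_old` backup) could be reproduced along these lines. This is a sketch only: `export_json` is a hypothetical helper, and the exact backup file name (`<name>_old.json` next to the original) is an assumption about how the `_old` file is named.

```python
import json
import os
import tempfile


def export_json(data, path):
    """Write `data` to `path`, keeping the previous file as `<name>_old<ext>`."""
    if os.path.exists(path):
        root, ext = os.path.splitext(path)
        # Assumed naming scheme; replaces any older backup already present.
        os.replace(path, f"{root}_old{ext}")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=4, ensure_ascii=False)


# Usage in a throwaway directory: the second export keeps the first as a backup.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "endobest_inclusions.json")
    export_json({"inclusions": 1}, target)
    export_json({"inclusions": 2}, target)
    print(sorted(os.listdir(folder)))
```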
#### ✅ Excel Export (New!)
- **Configuration-driven:** All behavior in Excel
- **Multi-workbook support:** Generate multiple Excel files
- **Template support:** Load & fill Excel templates
- **Data transformation pipeline:**
  - Filter (AND conditions)
  - Sort (multi-key with datetime)
  - Replace values (type-strict)
  - Fill cells/named ranges
- **Formula recalculation:** win32com integration (optional)
- **File conflict handling:** Overwrite/Increment/Backup strategies
- **Template variables:** Dynamic filenames using `.format()` pattern
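The filter, sort, and replace steps of the transformation pipeline above can be illustrated with a short sketch. Everything here is hypothetical (function names, the dotted path syntax, the sample rows); it is meant to show what AND-filtering, datetime-aware multi-key sorting, and type-strict replacement mean in practice, not how the dashboard implements them.

```python
from datetime import datetime


def get_nested(row, dotted_path):
    """Follow a dotted path such as 'org.name' through nested dicts."""
    value = row
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def apply_filters(rows, conditions):
    """Keep rows where every (path, expected) pair matches (AND semantics)."""
    return [r for r in rows if all(get_nested(r, p) == v for p, v in conditions)]


def sort_rows(rows, keys):
    """Multi-key sort; keys flagged as datetime are parsed from ISO strings."""
    def sort_key(row):
        parts = []
        for path, is_datetime in keys:
            value = get_nested(row, path)
            parts.append(datetime.fromisoformat(value) if is_datetime else value)
        return tuple(parts)
    return sorted(rows, key=sort_key)


def replace_values(rows, field, pairs):
    """Type-strict replacement: True matches only booleans, never the int 1."""
    result = []
    for row in rows:
        row = dict(row)
        value = row.get(field)
        for old, new in pairs:
            if type(value) is type(old) and value == old:
                row[field] = new
                break
        result.append(row)
    return result


# Sample rows invented for the example.
rows = [
    {"id": 1, "org": {"name": "A"}, "date": "2024-03-01T10:00:00", "consent": True},
    {"id": 2, "org": {"name": "A"}, "date": "2024-01-15T09:30:00", "consent": 1},
    {"id": 3, "org": {"name": "B"}, "date": "2024-02-01T08:00:00", "consent": False},
]
kept = apply_filters(rows, [("org.name", "A")])
ordered = sort_rows(kept, [("date", True)])
final = replace_values(ordered, "consent", [(True, "Yes"), (False, "No")])
print([(r["id"], r["consent"]) for r in final])
```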
---

### 6. Advanced Field Processing

#### ✅ Field Sources
- **Questionnaire sources:**
  - By ID: Direct lookup (fastest)
  - By Name: Sequential search
  - By Category: Sequential search
- **Record sources:** Clinical record data
- **Inclusion sources:** Patient inclusion metadata
- **Request sources:** Lab test data
- **Calculated sources:** Custom function execution

#### ✅ Field Path Navigation
- **Nested path support:** Navigate multi-level structures
- **Wildcard support:** Extract lists with `*` operator
- **JSON path expressions:** Full structure traversal
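As an illustration of nested paths with a `*` wildcard, the sketch below resolves a dotted path and fans out over list elements when it meets `*`. The dotted syntax, the `extract` function, and the sample record are assumptions made for the example; the path format used in the configuration sheets may differ.

```python
def extract(data, path):
    """Resolve a dotted path; a '*' segment fans out over every list element."""
    segments = path.split(".")

    def walk(node, index):
        if index == len(segments):
            return node
        segment = segments[index]
        if segment == "*":
            if not isinstance(node, list):
                return None
            return [walk(item, index + 1) for item in node]
        if isinstance(node, dict) and segment in node:
            return walk(node[segment], index + 1)
        return None  # missing key or unexpected type

    return walk(data, 0)


# Sample record invented for the example.
record = {"answers": {"symptoms": [{"label": "pain"}, {"label": "fatigue"}]}}
print(extract(record, "answers.symptoms.*.label"))
print(extract(record, "answers.missing.label"))
```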
#### ✅ Value Transformations
- **Boolean conversion:** `true_if_any` with multiple match values
- **Value labels:** Map values to localized text (French support)
- **Field templates:** Format with placeholders (e.g., "$value%")
- **List joining:** Flatten arrays with pipe delimiter
- **Score dictionaries:** Format as "total/max"
- **Type preservation:** Keep original types unless transformed
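A few of the transformations above are easy to picture as small functions. The sketch below is hypothetical throughout: it uses Python's `string.Template` to interpret the `$value` placeholder, and it assumes score dictionaries carry `total` and `max` keys. Neither choice is confirmed by this document.

```python
from string import Template


def true_if_any(value, match_values):
    """True if the value (or any element of a list value) is in `match_values`."""
    values = value if isinstance(value, list) else [value]
    return any(v in match_values for v in values)


def apply_template(template, value):
    """Fill a '$value' placeholder, e.g. '$value%' with 42 gives '42%'."""
    return Template(template).safe_substitute(value=value)


def join_list(values, delimiter="|"):
    """Flatten a list into one pipe-delimited string."""
    return delimiter.join(str(v) for v in values)


def format_score(score):
    """Format a score dictionary as 'total/max' (key names are assumed)."""
    return f"{score['total']}/{score['max']}"


print(true_if_any(["yes", "no"], {"yes"}))
print(apply_template("$value%", 42))
print(join_list(["a", "b", 3]))
print(format_score({"total": 7, "max": 10}))
```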
#### ✅ Custom Functions
- `search_in_fields_using_regex` - Pattern matching across fields
- `extract_parentheses_content` - Extract text in parentheses
- `append_terminated_suffix` - Conditional suffix appending
- `if_then_else` - Unified conditional logic (8 operators)
  - `is_true`, `is_false` - Boolean tests
  - `is_defined`, `is_undefined` - Existence tests
  - `all_true`, `all_defined` - Multiple field tests
  - `==`, `!=` - Value comparisons
---

### 7. Organization Enrichment

#### ✅ Center Mapping Feature
- Optional: Map organization names to center identifiers
- **File:** `eb_org_center_mapping.xlsx`
- **Configuration:** `Org_Center_Mapping` sheet
- **Matching:** Case-insensitive, whitespace-trimmed
- **Validation:** Checks that no organization or center appears twice
- **Graceful degradation:** Missing file doesn't break process
- **Fallback:** Unmapped orgs use original name
- **No code changes:** Fully configurable via Excel
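The matching and fallback rules above can be sketched as follows. `normalize`, `build_mapping`, and `center_for` are hypothetical helpers, and the rows are passed in as plain `(organization, center)` pairs rather than read from `eb_org_center_mapping.xlsx`, so the example stays self-contained. An empty mapping stands in for the missing-file case.

```python
def normalize(name):
    """Case-insensitive, whitespace-trimmed key for an organization name."""
    return str(name).strip().casefold()


def build_mapping(rows):
    """Build the lookup from (organization, center) pairs, rejecting duplicates."""
    mapping, seen_centers = {}, set()
    for org_name, center in rows:
        key = normalize(org_name)
        if key in mapping:
            raise ValueError(f"duplicate organization: {org_name!r}")
        if center in seen_centers:
            raise ValueError(f"duplicate center: {center!r}")
        mapping[key] = center
        seen_centers.add(center)
    return mapping


def center_for(org_name, mapping):
    """Return the mapped center, or the original name when unmapped."""
    return mapping.get(normalize(org_name), org_name)


# Sample rows invented for the example.
mapping = build_mapping([("Hospital North ", "C01"), ("clinic south", "C02")])
print(center_for("  hospital north", mapping))
print(center_for("Unknown Org", mapping))
print(center_for("Hospital North", {}))  # no mapping file: fall back to the name
```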
---

### 8. Robust Error Handling

#### ✅ API Error Recovery
- **Automatic token refresh** on 401 errors
- **Retry mechanism:** Up to 10 attempts with configurable spacing
- **Network error handling:** Timeouts, connection refused, DNS failures
- **Thread-safe:** Synchronized token refresh across workers
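A retry loop matching the description above (up to 10 attempts, configurable wait between them) might look like this. It is a sketch under assumptions: `with_retries` is a hypothetical helper, the default wait of 2 seconds is invented, and the built-in `ConnectionError` and `TimeoutError` stand in for whatever exceptions the real HTTP client raises.

```python
import time


def with_retries(operation, max_attempts=10, wait_seconds=2.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Run `operation()`; retry on network-type errors, re-raise on the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on as error:
            if attempt == max_attempts:
                raise
            # A real run would write this to the log file instead of printing.
            print(f"attempt {attempt}/{max_attempts} failed: {error!r}")
            time.sleep(wait_seconds)


# Demo: an operation that fails twice, then succeeds.
calls = []

def flaky_fetch():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("connection refused")
    return "ok"

print(with_retries(flaky_fetch, wait_seconds=0))
```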
#### ✅ Graceful Degradation
- **Missing configuration:** Clear error messages with file paths
- **Missing organization mapping:** Skip silently, use fallback
- **Optional features:** Disabled if dependencies missing (win32com, pytz)
- **Partial failures:** Continue with available data
- **Thread failures:** Shutdown gracefully, preserve partial results

#### ✅ Comprehensive Logging
- **Log file:** `dashboard.log` (per run)
- **Logged events:**
  - API errors with attempt counts
  - Token refresh events
  - Configuration loading
  - Quality check results
  - File I/O operations
  - Thread errors with stack traces
- **Log levels:** WARNING, CRITICAL
---

## 🚀 Execution Modes

### Mode 1: Normal (Full Collection)
```bash
python eb_dashboard.py
```
**Workflow:**
1. Authenticate
2. Load configuration
3. Collect from all APIs (2-4 min)
4. Run quality checks (10-15 sec)
5. Export JSON files
6. Generate Excel (if configured)

**Use case:** Regular data updates, scheduled runs

---

### Mode 2: Excel-Only (Fast Export)
```bash
python eb_dashboard.py --excel_only
```
**Workflow:**
1. Load existing JSON files
2. Load configuration
3. Generate Excel (5-15 sec)

**Use case:** Reconfigure reports, test templates, quick re-export

---

### Mode 3: Check-Only (Validation)
```bash
python eb_dashboard.py --check-only
```
**Workflow:**
1. Load existing JSON files
2. Run quality checks
3. Report any issues

**Use case:** Verify data quality, pre-distribution checks

---

### Mode 4: Check-Only Compare (File Comparison)
```bash
python eb_dashboard.py --check-only file1.json file2.json
```
**Workflow:**
1. Load two specific JSON files
2. Run regression check (file1 vs file2)
3. Report differences

**Use case:** Compare data snapshots, version comparison

---

### Mode 5: Debug (Verbose Output)
```bash
python eb_dashboard.py --debug
```
**Workflow:**
1. Execute normal mode
2. Enable detailed logging
3. Show field-by-field changes

**Use case:** Troubleshoot issues, detailed analysis

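The five modes above map naturally onto a small command-line parser. The sketch below uses `argparse` with the flags exactly as they appear in the commands (`--excel_only`, `--check-only`, `--debug`); whether `eb_dashboard.py` actually parses its arguments this way is an assumption, and `build_parser` and `select_mode` are hypothetical names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="eb_dashboard.py")
    parser.add_argument("--excel_only", action="store_true",
                        help="regenerate Excel from existing JSON files")
    # nargs="*" lets the flag stand alone (Mode 3) or take two files (Mode 4).
    parser.add_argument("--check-only", nargs="*", metavar="FILE", default=None,
                        help="run quality checks only, optionally on two JSON files")
    parser.add_argument("--debug", action="store_true",
                        help="enable detailed logging")
    return parser


def select_mode(args):
    """Translate parsed arguments into one of the documented modes."""
    if args.excel_only:
        return "excel_only"
    if args.check_only is not None:
        if len(args.check_only) == 0:
            return "check_only"
        if len(args.check_only) == 2:
            return "check_compare"
        raise SystemExit("--check-only takes either no files or exactly two")
    return "normal"  # --debug only adds verbosity on top of the normal mode


args = build_parser().parse_args(["--check-only", "file1.json", "file2.json"])
print(select_mode(args), args.check_only)
```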
---

## 📊 Data Quality Features

### ✅ Coherence Check
- Compares API counts vs actual collected data
- Organization-level validation
- Severity: Warning (deviation within ±10%), Critical (deviation beyond ±10%)
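One way to read the ±10% rule above is as a relative deviation between the count reported by the API and the count actually collected. The sketch below encodes that reading; `coherence_severity` is a hypothetical function, and measuring the deviation relative to the API count (and treating an exact match as no issue) are assumptions, not documented behaviour.

```python
def coherence_severity(api_count, collected_count, tolerance=0.10):
    """Classify the gap between the API-reported and the collected count."""
    if api_count == collected_count:
        return "OK"
    if api_count == 0:
        return "CRITICAL"  # data collected although the API reported none
    deviation = abs(collected_count - api_count) / api_count
    return "WARNING" if deviation <= tolerance else "CRITICAL"


# Sample counts invented for the example.
for api, collected in [(100, 100), (100, 95), (100, 80)]:
    print(api, collected, coherence_severity(api, collected))
```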
### ✅ Non-Regression Check
- Config-driven change detection
- Supports transition patterns
- Severity: Warning/Critical with overrides
- Exception handling per organization

### ✅ Data Validation
- Field existence checking
- Type validation (boolean, list, dict)
- Condition evaluation
- Nested structure traversal
---

## 🔧 Technical Capabilities

### ✅ Multi-Level Threading
- Organization-level parallelization (20 workers)
- Patient-level async operations (40 workers)
- Non-blocking I/O during processing
- Thread-safe progress tracking

### ✅ API Integration
- 3 separate API domains (IAM, RC, GDD)
- Dynamic URL routing
- Request/response logging
- Connection pooling per thread

### ✅ Excel Workbook Generation
- openpyxl for reading/writing
- Template support
- Named range targeting
- Formula preservation
- win32com integration (formula recalc)

### ✅ Data Processing
- JSON parsing & validation
- Nested structure navigation
- Wildcard pattern matching
- Value type preservation
- List flattening with delimiters

### ✅ Configuration Management
- Excel file parsing
- Sheet validation
- Row-level configuration loading
- JSON column parsing
- Type conversion & validation
---

## 🎓 User Experience Features

### ✅ Interactive Interface
- Credential input with secure defaults
- Thread count selection (1-20)
- Progress bar feedback (multi-level)
- User confirmation for critical issues
- Clear error messages with remediation

### ✅ Progress Tracking
```
Overall Progress    [████████░░░░░░░░░░░░]  847/1200
1/15 - Center 1     [████████░░░░░░░░░░░░]  73/95
2/15 - Center 2     [██░░░░░░░░░░░░░░░░░░]  42/110
```

### ✅ Logging & Audit Trail
- Per-run log file
- Timestamped entries
- Execution metrics
- Error details with context
- Searchable log format
---

## 📈 Scalability Features

### ✅ Configurable Parallelization
- User-selected thread count (1-20)
- Network bandwidth tuning
- API rate limit adaptation
- Memory-efficient streaming

### ✅ Large Dataset Support
- 1,000+ patient support
- 100+ organization support
- Paginated API calls
- Chunked processing

### ✅ Performance Monitoring
- Elapsed time tracking
- Progress rate display
- Estimated time remaining
- Per-phase timing
---

## 🔐 Security & Data Protection

### ✅ Credential Handling
- Interactive password input (hidden)
- Optional default credentials
- Token-based API authentication
- Automatic token refresh

### ✅ Data Protection
- File backup strategy (_old files)
- Critical issue confirmation before overwrite
- Graceful degradation on errors
- No accidental data loss

### ✅ Thread Safety
- Per-thread HTTP clients
- Synchronized global state
- Lock-protected shared resources
- Safe concurrent access
---

## 📦 System Integration

### ✅ Platform Support
- Windows (primary)
- Linux/macOS compatible
- PyInstaller executable packaging
- Command-line interface

### ✅ Dependency Management
- pip-installable packages
- Optional dependencies (pywin32, pytz)
- Fallback for missing modules
- Clear error messages
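The optional-dependency fallback above is commonly done with a guarded import. The sketch below shows the general pattern only; `optional_import` and the `CAN_RECALCULATE_FORMULAS` flag are hypothetical names, and the real code may simply wrap each import in its own `try`/`except`.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# win32com ships with pywin32 and is only available on Windows.
win32com_client = optional_import("win32com.client")
pytz = optional_import("pytz")

# Features that depend on an optional module are switched off when it is absent.
CAN_RECALCULATE_FORMULAS = win32com_client is not None
print("formula recalculation available:", CAN_RECALCULATE_FORMULAS)
```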
### ✅ File Format Support
- Excel 2007+ (.xlsx)
- JSON UTF-8
- Plain text logging
- Configuration in Excel
---

## 🎯 Feature Comparison Table

| Feature | Implementation | Configuration | Code Changes | Performance |
|---------|---|---|---|---|
| **Field Extraction** | Multi-source | Excel | ❌ None | Fast |
| **Custom Functions** | 4 built-in | Excel register | ⚠️ Function code | Medium |
| **Data Filtering** | AND conditions | Excel JSON | ❌ None | Fast |
| **Data Sorting** | Multi-key | Excel JSON | ❌ None | Medium |
| **Value Mapping** | Type-strict | Excel JSON | ❌ None | Fast |
| **Excel Export** | Template-based | Excel | ❌ None | Medium |
| **Quality Checks** | Rule-based | Excel | ❌ None | Medium |
| **Token Refresh** | Automatic | Code | ⚠️ Constants only | Fast |
| **Error Retry** | Configurable | Code | ⚠️ Constants only | Medium |
| **Organization Mapping** | File-based | Excel file | ❌ None | Fast |

**Legend:** ❌ = No code changes, ⚠️ = Limited code changes (as noted in the cell), ✅ = Full code modification

---
## 🏆 Key Strengths

1. **100% Configuration-Driven** - No code changes for normal operations
2. **High Performance** - 4-5x faster questionnaire retrieval via a single batched API call
3. **Enterprise-Grade** - Comprehensive error handling & recovery
4. **User-Friendly** - Clear prompts, progress tracking, helpful errors
5. **Flexible** - Multiple execution modes, configurable parameters
6. **Maintainable** - Clean architecture, extensive logging, good documentation
7. **Scalable** - Handles 1,000+ patients, 100+ organizations
8. **Reliable** - Quality checks, data validation, automatic backups
---

## 🔄 Workflow Flexibility

### Multi-Stage Process
```
COLLECT → VALIDATE → EXPORT (JSON + Excel)
              ↓
       CRITICAL ISSUES?
              ↓
        USER CONFIRMS?
              ↓
  YES → CONTINUE | NO → ABORT
```

### Parallel Data Sources
- Clinical Records (RC)
- Questionnaire Answers (RC)
- Lab Requests (GDD)
- Organization Stats (RC)
- Patient Demographics (Mixed)

### Sequential Validation Gates
1. Configuration validation (startup)
2. Authentication validation (login)
3. API response validation (each call)
4. Data structure validation (processing)
5. Coherence validation (checks)
6. Regression validation (checks)
---

**All features listed above are fully functional and tested in production.**

**For implementation details, see DOCUMENTATION_30_ARCHITECTURE_SUMMARY.md**