Functional version

2025-12-12 23:07:26 +01:00
commit cb8b5d9a12
42 changed files with 465285 additions and 0 deletions


@@ -0,0 +1,500 @@
# ✨ Endobest Dashboard - Features & Capabilities Matrix
**Complete Feature Reference**

---
## 🎯 Core Features
### 1. Automated Data Collection
#### ✅ Multi-Source Data Integration
- **Research Clinic (RC) APIs**
  - Organizations listing
  - Inclusion statistics per organization
  - Patient records & clinical data
  - Questionnaire responses (optimized batch call)
- **Lab APIs (GDD)**
  - Lab test requests
  - Diagnostic results
  - Tube ID tracking
- **Questionnaire System**
  - Single optimized call retrieves ALL questionnaires + answers (4-5x faster)
  - Support for multiple questionnaire sources
  - Nested answer structures with JSON path navigation
#### ✅ Authentication & Token Management
- IAM integration (Ziwig Pro)
- Automatic token exchange (RC-specific)
- Automatic token refresh on 401 errors
- Thread-safe token refresh with locking
- Credential input with secure defaults
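
The thread-safe refresh described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `TokenStore`, `call_with_refresh`, and the `fetch_token` callable are hypothetical names, and the HTTP call is reduced to a function returning a status code. The key idea is that a worker passes the token it saw fail, so only the first thread to notice the 401 triggers a refresh.

```python
import threading
from typing import Callable


class TokenStore:
    """Holds the current access token (hypothetical helper, not the real class)."""

    def __init__(self, fetch_token: Callable[[], str]) -> None:
        # fetch_token stands in for the IAM login + RC token exchange.
        self._fetch_token = fetch_token
        self._lock = threading.Lock()
        self._token = fetch_token()

    @property
    def token(self) -> str:
        with self._lock:
            return self._token

    def refresh(self, stale_token: str) -> str:
        """Refresh only if no other thread has already replaced stale_token."""
        with self._lock:
            if self._token == stale_token:
                self._token = self._fetch_token()
            return self._token


def call_with_refresh(store: TokenStore, request: Callable[[str], int]) -> int:
    """Run request(token); on a 401 status, refresh the token and retry once."""
    token = store.token
    status = request(token)
    if status == 401:
        status = request(store.refresh(token))
    return status
```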
---
### 2. 100% Externalized Configuration
#### ✅ Excel-Based Configuration
No code changes needed - all behavior defined in Excel:
| Configuration Aspect | Location | Flexibility |
|---|---|---|
| **Field Extraction** | `Inclusions_Mapping` sheet | Define which fields to extract, from which sources |
| **Organization Fields** | `Organizations_Mapping` sheet | Define which org fields to export |
| **Excel Export** | `Excel_Workbooks` + `Excel_Sheets` sheets | Define workbooks, templates, sheets, transformations |
| **Quality Rules** | `Regression_Check` sheet | Define expected data changes & validation rules |
| **Organization Mapping** | `eb_org_center_mapping.xlsx` | Map organization names to center identifiers |
#### ✅ Supported Configuration Types
- **Field sources:** Questionnaire (by ID/Name/Category), Record, Inclusion, Request, Calculated
- **Value transformations:** Labels, templates, boolean conversion, list joining
- **Field conditions:** Optional conditional execution (N/A if condition false)
- **Custom functions:** Business logic without code modification
- **Filters:** AND-condition filtering with nested field support
- **Sorting:** Multi-key sorting with datetime parsing
- **Value replacement:** Type-strict mapping (boolean, enum, status codes)
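
To illustrate AND-condition filtering with nested field support, here is a minimal sketch. The condition format (a list of `(path, operator, value)` tuples with dotted paths) and the field names are assumptions made for the example; the real format stored in the Excel configuration may differ.

```python
import operator
from typing import Any

# Operators supported by this sketch (assumed set, not the real configuration).
OPERATORS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}

_MISSING = object()


def get_nested(record: dict, dotted_path: str) -> Any:
    """Follow a dotted path such as 'Inclusion.status'; return _MISSING if absent."""
    current: Any = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches_all(record: dict, conditions: list) -> bool:
    """True only if every condition holds (logical AND); missing fields never match."""
    for path, op, expected in conditions:
        actual = get_nested(record, path)
        if actual is _MISSING or not OPERATORS[op](actual, expected):
            return False
    return True


def filter_records(records: list, conditions: list) -> list:
    return [record for record in records if matches_all(record, conditions)]
```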
---
### 3. High-Performance Multithreading
#### ✅ Parallel Processing Architecture
- **Organization processing:** Up to 20 concurrent workers
- **Patient processing:** Nested async pool (40 workers) for lab/questionnaire fetches
- **Configurable thread count:** User selects 1-20 workers
- **Non-blocking I/O:** Async lab fetches during main processing
#### ✅ Performance Optimizations
- **Questionnaire batching:** Single API call per patient instead of N calls (4-5x improvement)
- **Thread-local HTTP clients:** Per-thread client instances prevent connection conflicts
- **Nested parallelization:** Outer pool for orgs, inner pool for async tasks
- **Progress tracking:** Real-time multi-level progress bars with thread-safe updates
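
The nested parallelization and thread-local clients listed above could look roughly like the sketch below. It assumes `concurrent.futures.ThreadPoolExecutor` for both pools; `FakeClient`, `get_client`, `fetch_patient`, `process_org`, and `collect` are hypothetical names, and the HTTP client is replaced by a stand-in so the example runs without a network.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_thread_local = threading.local()


class FakeClient:
    """Stand-in for an HTTP client; a real one would hold its own connections."""

    def get(self, resource: str) -> str:
        return f"{resource} fetched by {threading.current_thread().name}"


def get_client() -> FakeClient:
    # One client per thread, created lazily on first use.
    if not hasattr(_thread_local, "client"):
        _thread_local.client = FakeClient()
    return _thread_local.client


def fetch_patient(patient_id: int, inner_pool: ThreadPoolExecutor) -> dict:
    # The lab fetch runs in the inner pool while this thread fetches questionnaires.
    lab_future = inner_pool.submit(lambda: get_client().get(f"lab/{patient_id}"))
    questionnaires = get_client().get(f"questionnaires/{patient_id}")
    return {"id": patient_id, "questionnaires": questionnaires, "lab": lab_future.result()}


def process_org(org: dict, inner_pool: ThreadPoolExecutor) -> list:
    return [fetch_patient(pid, inner_pool) for pid in org["patients"]]


def collect(orgs: list, org_workers: int = 20, task_workers: int = 40) -> list:
    # Outer pool: one task per organization. Inner pool: shared async fetches.
    with ThreadPoolExecutor(max_workers=task_workers) as inner_pool:
        with ThreadPoolExecutor(max_workers=org_workers) as outer_pool:
            futures = [outer_pool.submit(process_org, org, inner_pool) for org in orgs]
            return [future.result() for future in futures]
```

Inner-pool tasks never wait on other pool tasks in this sketch, so the outer workers cannot deadlock while blocking on `lab_future.result()`.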
#### ✅ Typical Performance (Full Dataset)
```
Data Collection: 2-4 minutes (1,200+ patients, 15+ orgs)
Quality Checks: 10-15 seconds
Excel Export: 5-15 seconds
─────────────────────────────
TOTAL: 2.5-5 minutes
```
---
### 4. Comprehensive Quality Assurance
#### ✅ Coherence Checking
- Compares API-provided statistics vs. actual collected data
- Organization-level validation
- Detects data inconsistencies
- Warning/Critical severity levels
#### ✅ Non-Regression Testing
- Compares current vs. previous run
- Config-driven validation rules
- Detects unexpected changes:
  - New inclusions
  - Deleted inclusions
  - Field value changes
  - Status transitions
- Exception handling (org-specific overrides)
- Transition pattern support (expected state changes)
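
As a rough illustration of comparing the current run with the previous one, the sketch below detects new inclusions, deleted inclusions, and field value changes, then flags status transitions that are not in an allowed set. The `patient_id` key, the `status` field, and both function names are assumptions for the example; the real rules come from the `Regression_Check` sheet.

```python
def diff_runs(previous: list, current: list, key: str = "patient_id") -> dict:
    """Compare two runs keyed by an identifier field (hypothetical key name)."""
    prev_by_id = {row[key]: row for row in previous}
    curr_by_id = {row[key]: row for row in current}
    changed = {}
    for pid in sorted(prev_by_id.keys() & curr_by_id.keys()):
        before, after = prev_by_id[pid], curr_by_id[pid]
        fields = {
            name: (before.get(name), after.get(name))
            for name in sorted(before.keys() | after.keys())
            if before.get(name) != after.get(name)
        }
        if fields:
            changed[pid] = fields
    return {
        "new": sorted(curr_by_id.keys() - prev_by_id.keys()),
        "deleted": sorted(prev_by_id.keys() - curr_by_id.keys()),
        "changed": changed,
    }


def unexpected_transitions(changed: dict, field: str, allowed: set) -> dict:
    """Keep only the (before, after) pairs for `field` that are not expected."""
    return {
        pid: fields[field]
        for pid, fields in changed.items()
        if field in fields and fields[field] not in allowed
    }
```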
#### ✅ Critical Issue Handling
- User confirmation required for critical issues
- Override capability (continue export despite warnings)
- Prevents accidental data replacement
- Clear reporting of issue severity
---
### 5. Flexible Data Export
#### ✅ JSON Export
- **Structure:** Nested by field groups
- **Files:**
  - `endobest_inclusions.json` (~6-7 MB)
  - `endobest_organizations.json` (~17-20 KB)
- **Backup:** Automatic _old file creation
- **Format:** UTF-8, 4-space indentation
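
A JSON export with an automatic backup could be sketched like this. The exact backup naming is an assumption (here `name_old.json` next to the original), as is the `export_json` function name; the UTF-8 encoding and 4-space indentation follow the format stated above. The demo at the end writes to a temporary directory only.

```python
import json
import os
import tempfile
from pathlib import Path


def export_json(data, path) -> None:
    """Write data as UTF-8 JSON, keeping the previous file as '<stem>_old<suffix>'."""
    target = Path(path)
    if target.exists():
        backup = target.with_name(f"{target.stem}_old{target.suffix}")
        os.replace(target, backup)  # replaces any earlier backup
    with target.open("w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False, indent=4)


# Demo: the second export pushes the first one into the _old file.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "endobest_organizations.json"
    export_json([{"name": "Center 1"}], out)
    export_json([{"name": "Center 1"}, {"name": "Hôpital 2"}], out)
    backup_path = Path(tmp) / "endobest_organizations_old.json"
    backup_rows = json.loads(backup_path.read_text(encoding="utf-8"))
    current_rows = json.loads(out.read_text(encoding="utf-8"))
```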
#### ✅ Excel Export (New!)
- **Configuration-driven:** All behavior in Excel
- **Multi-workbook support:** Generate multiple Excel files
- **Template support:** Load & fill Excel templates
- **Data transformation pipeline:**
  - Filter (AND conditions)
  - Sort (multi-key with datetime)
  - Replace values (type-strict)
  - Fill cells/named ranges
- **Formula recalculation:** win32com integration (optional)
- **File conflict handling:** Overwrite/Increment/Backup strategies
- **Template variables:** Dynamic filenames using `.format()` pattern
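
Two steps of the transformation pipeline, multi-key sorting with datetime parsing and type-strict value replacement, can be sketched as below. The rule and key formats, the ISO 8601 date strings, and the function names are assumptions for illustration. "Type-strict" is interpreted here as requiring the same Python type, so `True` and `1` are not interchangeable.

```python
from datetime import datetime


def replace_strict(value, rules):
    """Apply the first (source, target) rule whose source has the same type and value."""
    for source, target in rules:
        if type(value) is type(source) and value == source:
            return target
    return value


def sort_rows(rows, keys):
    """Sort by several keys: each key is (field, kind, descending)."""

    def convert(raw, kind):
        return datetime.fromisoformat(raw) if kind == "datetime" else raw

    result = list(rows)
    # Stable sorts applied from the least significant key to the most significant.
    for field, kind, descending in reversed(keys):
        result.sort(key=lambda row: convert(row[field], kind), reverse=descending)
    return result
```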
---
### 6. Advanced Field Processing
#### ✅ Field Sources
- **Questionnaire sources:**
  - By ID: Direct lookup (fastest)
  - By Name: Sequential search
  - By Category: Sequential search
- **Record sources:** Clinical record data
- **Inclusion sources:** Patient inclusion metadata
- **Request sources:** Lab test data
- **Calculated sources:** Custom function execution
#### ✅ Field Path Navigation
- **Nested path support:** Navigate multi-level structures
- **Wildcard support:** Extract lists with `*` operator
- **JSON path expressions:** Full structure traversal
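
Nested path navigation with a `*` wildcard might work along these lines. The path representation (a list of keys) and the `extract` function name are assumptions for the sketch; the actual path syntax used in the mapping sheets is not shown in this document.

```python
def extract(data, path):
    """Walk nested dicts along `path`; a '*' step fans out over every list element."""
    if not path:
        return data
    head, rest = path[0], path[1:]
    if head == "*":
        if not isinstance(data, list):
            return None
        return [extract(item, rest) for item in data]
    if isinstance(data, dict) and head in data:
        return extract(data[head], rest)
    return None  # missing key or unexpected structure
```

With a record such as `{"answers": [{"score": {"total": 3}}, {"score": {"total": 5}}]}`, the path `["answers", "*", "score", "total"]` collects one value per answer.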
#### ✅ Value Transformations
- **Boolean conversion:** `true_if_any` with multiple match values
- **Value labels:** Map values to localized text (French support)
- **Field templates:** Format with placeholders (e.g., "$value%")
- **List joining:** Flatten arrays with pipe delimiter
- **Score dictionaries:** Format as "total/max"
- **Type preservation:** Keep original types unless transformed
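
The transformations above are simple enough to sketch directly. The function names other than `true_if_any`, the `total` and `max` dictionary keys, and the `$value` substitution rule are assumptions made for the example.

```python
def true_if_any(value, match_values):
    """True if the value (or any element of a list value) is among match_values."""
    values = value if isinstance(value, list) else [value]
    return any(item in match_values for item in values)


def join_list(values, delimiter="|"):
    """Flatten a list into a single pipe-delimited string."""
    return delimiter.join(str(item) for item in values)


def format_score(score):
    """Render a score dictionary as 'total/max' (assumed key names)."""
    return f"{score['total']}/{score['max']}"


def apply_template(template, value):
    """Substitute the value into a template such as '$value%'."""
    return template.replace("$value", str(value))
```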
#### ✅ Custom Functions
- `search_in_fields_using_regex` - Pattern matching across fields
- `extract_parentheses_content` - Extract text in parentheses
- `append_terminated_suffix` - Conditional suffix appending
- `if_then_else` - Unified conditional logic (8 operators):
  - `is_true`, `is_false` - Boolean tests
  - `is_defined`, `is_undefined` - Existence tests
  - `all_true`, `all_defined` - Multiple field tests
  - `==`, `!=` - Value comparisons
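
One way a unified conditional with these eight operators could be structured is a dispatch table, as in the sketch below. The signature, the argument layout, and the choice to treat `None` as "undefined" are all assumptions; the real `if_then_else` may behave differently.

```python
# Each operator receives the list of already-resolved field values.
OPERATORS = {
    "is_true": lambda args: args[0] is True,
    "is_false": lambda args: args[0] is False,
    "is_defined": lambda args: args[0] is not None,
    "is_undefined": lambda args: args[0] is None,
    "all_true": lambda args: all(item is True for item in args),
    "all_defined": lambda args: all(item is not None for item in args),
    "==": lambda args: args[0] == args[1],
    "!=": lambda args: args[0] != args[1],
}


def if_then_else(op, args, then_value, else_value):
    """Return then_value if the operator holds for args, else else_value."""
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op!r}")
    return then_value if OPERATORS[op](args) else else_value
```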
---
### 7. Organization Enrichment
#### ✅ Center Mapping Feature
- Optional: Map organization names to center identifiers
- **File:** `eb_org_center_mapping.xlsx`
- **Configuration:** `Org_Center_Mapping` sheet
- **Matching:** Case-insensitive, whitespace-trimmed
- **Validation:** No duplicate orgs/centers check
- **Graceful degradation:** Missing file doesn't break process
- **Fallback:** Unmapped orgs use original name
- **No code changes:** Fully configurable via Excel
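
The matching, validation, and fallback rules above can be sketched as follows. The row format (pairs of organization name and center identifier), the sample names, and the function names are assumptions; reading the `Org_Center_Mapping` sheet itself is left out. An empty mapping models the missing-file case.

```python
def normalize(name: str) -> str:
    """Case-insensitive, whitespace-trimmed comparison key."""
    return name.strip().casefold()


def build_mapping(rows) -> dict:
    """Build org -> center lookup, rejecting duplicate organizations or centers."""
    mapping = {}
    seen_centers = set()
    for org, center in rows:
        org_key, center_key = normalize(org), normalize(center)
        if org_key in mapping:
            raise ValueError(f"duplicate organization: {org!r}")
        if center_key in seen_centers:
            raise ValueError(f"duplicate center: {center!r}")
        mapping[org_key] = center.strip()
        seen_centers.add(center_key)
    return mapping


def center_for(org: str, mapping: dict) -> str:
    """Return the mapped center, or the original name when the org is unmapped."""
    return mapping.get(normalize(org), org)
```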
---
### 8. Robust Error Handling
#### ✅ API Error Recovery
- **Automatic token refresh** on 401 errors
- **Retry mechanism:** Up to 10 attempts with configurable spacing
- **Network error handling:** Timeouts, connection refused, DNS failures
- **Thread-safe:** Synchronized token refresh across workers
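
A bounded retry loop with fixed spacing could be sketched as below. The function name, the default wait of 2 seconds, and the choice of exceptions are assumptions; only the limit of 10 attempts comes from the list above. The `sleep` parameter is injectable so the loop can be exercised without real delays.

```python
import time


def with_retries(call, max_attempts=10, wait_seconds=2.0, sleep=time.sleep):
    """Run call(); on a network-style error, wait and retry up to max_attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except (ConnectionError, TimeoutError) as error:
            last_error = error
            if attempt < max_attempts:
                sleep(wait_seconds)  # spacing between attempts
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Passing `sleep=lambda seconds: None` in tests keeps them fast while still covering the retry path.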
#### ✅ Graceful Degradation
- **Missing configuration:** Clear error messages with file paths
- **Missing organization mapping:** Skip silently, use fallback
- **Optional features:** Disabled if dependencies missing (win32com, pytz)
- **Partial failures:** Continue with available data
- **Thread failures:** Shutdown gracefully, preserve partial results
#### ✅ Comprehensive Logging
- **Log file:** `dashboard.log` (per run)
- **Logged events:**
  - API errors with attempt counts
  - Token refresh events
  - Configuration loading
  - Quality check results
  - File I/O operations
  - Thread errors with stack traces
- **Log levels:** WARNING, CRITICAL
---
## 🚀 Execution Modes
### Mode 1: Normal (Full Collection)
```bash
python eb_dashboard.py
```
**Workflow:**
1. Authenticate
2. Load configuration
3. Collect from all APIs (2-4 min)
4. Run quality checks (10-15 sec)
5. Export JSON files
6. Generate Excel (if configured)

**Use case:** Regular data updates, scheduled runs

---
### Mode 2: Excel-Only (Fast Export)
```bash
python eb_dashboard.py --excel_only
```
**Workflow:**
1. Load existing JSON files
2. Load configuration
3. Generate Excel (5-15 sec)

**Use case:** Reconfigure reports, test templates, quick re-export

---
### Mode 3: Check-Only (Validation)
```bash
python eb_dashboard.py --check-only
```
**Workflow:**
1. Load existing JSON files
2. Run quality checks
3. Report any issues

**Use case:** Verify data quality, pre-distribution checks

---
### Mode 4: Check-Only Compare (File Comparison)
```bash
python eb_dashboard.py --check-only file1.json file2.json
```
**Workflow:**
1. Load two specific JSON files
2. Run regression check (file1 vs file2)
3. Report differences

**Use case:** Compare data snapshots, version comparison

---
### Mode 5: Debug (Verbose Output)
```bash
python eb_dashboard.py --debug
```
**Workflow:**
1. Execute normal mode
2. Enable detailed logging
3. Show field-by-field changes

**Use case:** Troubleshoot issues, detailed analysis

---
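
The five execution modes map onto three command-line flags. The sketch below shows one way such flags could be parsed with `argparse`; it is not the script's actual parser. The flag spellings are taken from the commands above, while `build_parser`, `select_mode`, and the mode labels are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="eb_dashboard.py")
    parser.add_argument("--excel_only", action="store_true",
                        help="regenerate Excel from existing JSON files")
    # No value: check the existing files. Two values: compare the given files.
    parser.add_argument("--check-only", nargs="*", default=None, metavar="JSON_FILE",
                        help="run quality checks without collecting data")
    parser.add_argument("--debug", action="store_true",
                        help="enable detailed logging")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Translate parsed flags into a mode label (labels are illustrative)."""
    if args.check_only is not None:
        if len(args.check_only) == 0:
            return "check"
        if len(args.check_only) == 2:
            return "check-compare"
        raise SystemExit("--check-only takes either no files or exactly two")
    if args.excel_only:
        return "excel"
    return "normal"
```

Using `nargs="*"` with `default=None` lets the same flag distinguish "absent" (`None`) from "present without files" (an empty list).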
## 📊 Data Quality Features
### ✅ Coherence Check
- Compares API counts vs actual collected data
- Organization-level validation
- Severity: Warning (deviation within ±10%), Critical (deviation beyond ±10%)
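
The coherence check can be illustrated with a small severity function. This sketch assumes the 10% threshold is a relative deviation from the API-provided count and that an exact match reports no issue; `coherence_severity` and `check_organizations` are hypothetical names.

```python
def coherence_severity(api_count: int, collected_count: int, tolerance: float = 0.10) -> str:
    """Classify the gap between the API-reported count and the collected count."""
    if api_count == collected_count:
        return "OK"
    if api_count == 0:
        return "CRITICAL"  # data collected although the API reported none
    deviation = abs(collected_count - api_count) / api_count
    return "WARNING" if deviation <= tolerance else "CRITICAL"


def check_organizations(api_counts: dict, collected_counts: dict) -> dict:
    """Run the check per organization; missing collected data counts as zero."""
    return {
        org: coherence_severity(expected, collected_counts.get(org, 0))
        for org, expected in api_counts.items()
    }
```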
### ✅ Non-Regression Check
- Config-driven change detection
- Supports transition patterns
- Severity: Warning/Critical with overrides
- Exception handling per organization
### ✅ Data Validation
- Field existence checking
- Type validation (boolean, list, dict)
- Condition evaluation
- Nested structure traversal
---
## 🔧 Technical Capabilities
### ✅ Multi-Level Threading
- Organization-level parallelization (20 workers)
- Patient-level async operations (40 workers)
- Non-blocking I/O during processing
- Thread-safe progress tracking
### ✅ API Integration
- 3 separate API domains (IAM, RC, GDD)
- Dynamic URL routing
- Request/response logging
- Connection pooling per thread
### ✅ Excel Workbook Generation
- openpyxl for reading/writing
- Template support
- Named range targeting
- Formula preservation
- win32com integration (formula recalc)
### ✅ Data Processing
- JSON parsing & validation
- Nested structure navigation
- Wildcard pattern matching
- Value type preservation
- List flattening with delimiters
### ✅ Configuration Management
- Excel file parsing
- Sheet validation
- Row-level configuration loading
- JSON column parsing
- Type conversion & validation
---
## 🎓 User Experience Features
### ✅ Interactive Interface
- Credential input with secure defaults
- Thread count selection (1-20)
- Progress bar feedback (multi-level)
- User confirmation for critical issues
- Clear error messages with remediation
### ✅ Progress Tracking
```
Overall Progress [████████░░░░░░░░░░░░] 847/1200
1/15 - Center 1 [████████░░░░░░░░░░░░] 73/95
2/15 - Center 2 [██░░░░░░░░░░░░░░░░░░] 42/110
```
### ✅ Logging & Audit Trail
- Per-run log file
- Timestamped entries
- Execution metrics
- Error details with context
- Searchable log format
---
## 📈 Scalability Features
### ✅ Configurable Parallelization
- User-selected thread count (1-20)
- Network bandwidth tuning
- API rate limit adaptation
- Memory-efficient streaming
### ✅ Large Dataset Support
- 1,000+ patient support
- 100+ organization support
- Paginated API calls
- Chunked processing
### ✅ Performance Monitoring
- Elapsed time tracking
- Progress rate display
- Estimated time remaining
- Per-phase timing
---
## 🔐 Security & Data Protection
### ✅ Credential Handling
- Interactive password input (hidden)
- Optional default credentials
- Token-based API authentication
- Automatic token refresh
### ✅ Data Protection
- File backup strategy (_old files)
- Critical issue confirmation before overwrite
- Graceful degradation on errors
- No accidental data loss
### ✅ Thread Safety
- Per-thread HTTP clients
- Synchronized global state
- Lock-protected shared resources
- Safe concurrent access
---
## 📦 System Integration
### ✅ Platform Support
- Windows (primary)
- Linux/macOS compatible
- PyInstaller executable packaging
- Command-line interface
### ✅ Dependency Management
- pip-installable packages
- Optional dependencies (pywin32, pytz)
- Fallback for missing modules
- Clear error messages
### ✅ File Format Support
- Excel 2007+ (.xlsx)
- JSON UTF-8
- Plain text logging
- Configuration in Excel
---
## 🎯 Feature Comparison Table
| Feature | Implementation | Configuration | Code Changes | Performance |
|---------|---|---|---|---|
| **Field Extraction** | Multi-source | Excel | ❌ None | Fast |
| **Custom Functions** | 4 built-in | Excel register | ⚠️ Function code | Medium |
| **Data Filtering** | AND conditions | Excel JSON | ❌ None | Fast |
| **Data Sorting** | Multi-key | Excel JSON | ❌ None | Medium |
| **Value Mapping** | Type-strict | Excel JSON | ❌ None | Fast |
| **Excel Export** | Template-based | Excel | ❌ None | Medium |
| **Quality Checks** | Rule-based | Excel | ❌ None | Medium |
| **Token Refresh** | Automatic | Code | ⚠️ Constants only | Fast |
| **Error Retry** | Configurable | Code | ⚠️ Constants only | Medium |
| **Organization Mapping** | File-based | Excel file | ❌ None | Fast |

**Legend:** ❌ = no code changes, ⚠️ = limited code changes (constants or function code only), ✅ = full code modification (not used in this table)
---
## 🏆 Key Strengths
1. **100% Configuration-Driven** - No code changes for normal operations
2. **High Performance** - 4-5x faster questionnaire retrieval via a single batched API call per patient, plus multithreaded collection
3. **Enterprise-Grade** - Comprehensive error handling & recovery
4. **User-Friendly** - Clear prompts, progress tracking, helpful errors
5. **Flexible** - Multiple execution modes, configurable parameters
6. **Maintainable** - Clean architecture, extensive logging, good documentation
7. **Scalable** - Handles 1,000+ patients, 100+ organizations
8. **Reliable** - Quality checks, data validation, automatic backups
---
## 🔄 Workflow Flexibility
### Multi-Stage Process
```
COLLECT → VALIDATE → EXPORT (JSON + Excel)
CRITICAL ISSUES?
USER CONFIRMS?
YES → CONTINUE | NO → ABORT
```
### Parallel Data Sources
- Clinical Records (RC)
- Questionnaire Answers (RC)
- Lab Requests (GDD)
- Organization Stats (RC)
- Patient Demographics (Mixed)
### Sequential Validation Gates
1. Configuration validation (startup)
2. Authentication validation (login)
3. API response validation (each call)
4. Data structure validation (processing)
5. Coherence validation (checks)
6. Regression validation (checks)
---
**All features listed above are fully functional and tested in production.**
**For implementation details, see DOCUMENTATION_30_ARCHITECTURE_SUMMARY.md**