# Endobest Excel Export Feature & Architecture

## Part 4: Configuration-Driven Excel Workbook Generation

**Document Version:** 1.1
**Last Updated:** 2025-11-11
**Audience:** Developers, Business Analysts, System Architects
**Language:** English

---

## Table of Contents

1. [Overview](#overview)
2. [Architecture & Design](#architecture--design)
3. [Core Components](#core-components)
4. [High-Level Orchestration Functions (v1.1+)](#high-level-orchestration-functions-v11)
5. [Configuration System](#configuration-system)
6. [Data Flow & Processing Pipeline](#data-flow--processing-pipeline)
7. [Excel Export Functions](#excel-export-functions)
8. [Filter, Sort & Replacement Logic](#filter-sort--replacement-logic)
9. [Template Variables](#template-variables)
10. [File Conflict Handling](#file-conflict-handling)
11. [Integration with Main Dashboard](#integration-with-main-dashboard)
12. [Error Handling & Validation](#error-handling--validation)
13. [Configuration Examples](#configuration-examples)
14. [Troubleshooting & Debugging](#troubleshooting--debugging)

---

## Overview

The **Excel Export Feature** generates configurable Excel workbooks from patient inclusion data and organization statistics. The system is entirely configuration-driven, allowing non-technical users to define export behavior through Excel configuration tables without code modifications.

### Key Characteristics

**Configuration-Driven Design:**
- All export behavior defined in `Endobest_Dashboard_Config.xlsx`
- Two tables: `Excel_Workbooks` (metadata) and `Excel_Sheets` (sheet definitions)
- No code changes needed to modify export behavior

**Modular Architecture:**
- New module: `eb_dashboard_excel_export.py`
- Separation of concerns: Excel logic isolated from main dashboard
- Dependency injection for testing and flexibility

**Data Transformation:**
- **Filter:** AND conditions with nested field support
- **Sort:** Multi-key sorting with case-insensitive strings, datetime parsing, natural alphanumeric sorting (`*natsort`)
- **Replace:** Strict type matching with first-match-wins logic
- **Fill:** Direct cell or named range targeting

> **Note:** For complete configuration details and up-to-date column specifications, refer to `DOCUMENTATION_99_CONFIG_GUIDE.md`

**Three Operating Modes:**
1. **Normal:** Full collection → Quality checks → JSON export → Excel export
2. **--excel-only:** Load existing JSON → Excel export (fast iteration)
3. **--check-only:** Quality checks only (unchanged, for backward compatibility)

---

## Architecture & Design

### Module Structure

```
eb_dashboard_excel_export.py
├── Imports & Dependencies
├── Constants & Configuration
│   └── EXCEL_RECALC_TIMEOUT = 60
├── Module-Level Variables (Injected)
│   ├── console (Rich Console instance)
│   ├── DASHBOARD_CONFIG_FILE_NAME
│   └── Other global references
│
├── Public API (Called from main)
│   ├── load_excel_export_config(console)
│   ├── validate_excel_config(excel_config, console, inclusions_mapping, organizations_mapping)
│   └── export_to_excel(inclusions_data, organizations_data, excel_config, console)
│
└── Internal Functions (Helpers)
    ├── _prepare_template_variables()
    ├── _apply_filter(item, filter_condition)
    ├── _apply_sort(items, sort_keys)
    ├── _apply_value_replacement(value, replacements)
    ├── _handle_output_exists(output_path, action)
    ├── _get_named_range_dimensions(workbook, range_name) [openpyxl - validation phase]
    ├── _get_table_dimensions_xlwings(workbook_xw, range_name) [xlwings - data processing]
    ├── _recalculate_workbook(workbook_path)
    ├── _process_sheet_xlwings(workbook_xw, sheet_config, ...) [xlwings - data fill]
    └── set_dependencies(...)
```

### Design Principles

1. **Configuration-First:** Behavior determined by config, not code
2. **Pure Functions:** Helper functions are pure (no side effects) except I/O
3. **xlwings-First Architecture:** Data processing uses xlwings exclusively (native Excel COM API)
   - Configuration validation uses openpyxl (read-only, lighter footprint)
   - Data fill & processing uses xlwings (preserves workbook structure, formulas, images)
   - Automatic formula recalculation via xlwings COM API during cell updates
   - No redundant file reloads: metadata is read via the COM API without reloading
4. **Early Validation:** Config errors detected at startup, before data collection

---

## Core Components

### 1. Configuration Loading
**Function:** `load_excel_export_config(console)`

Loads Excel export configuration from the `Endobest_Dashboard_Config.xlsx` file.

**Responsibilities:**
- Read `Excel_Workbooks` table
- Read `Excel_Sheets` table
- Parse JSON fields (filter_condition, sort_keys, value_replacement)
- Validate structure and presence of required columns
- Return parsed config and error status

**Return Value:**
```python
(config_dict, has_error: bool)
```

**Config Structure:**
```python
{
    "workbooks": [
        {
            "workbook_name": str,
            "template_path": str,
            "output_filename": str,
            "output_exists_action": "Overwrite" | "Increment" | "Backup"
        },
        ...
    ],
    "sheets": [
        {
            "workbook_name": str,
            "sheet_name": str,
            "source_type": "Variable" | "Inclusions" | "Organizations",
            "target": str,
            "column_mapping": dict | None,
            "filter_condition": dict | None,
            "sort_keys": list | None,
            "value_replacement": list | None
        },
        ...
    ]
}
```

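The JSON-typed columns (`filter_condition`, `sort_keys`, `value_replacement`) arrive from the configuration workbook as cell text. The following is a minimal sketch of how one such cell could be parsed while reporting problems through a status flag instead of an exception. `parse_json_cell` is a hypothetical helper, not a function from the module, and treating empty cells as `None` is an assumption.

```python
import json


def parse_json_cell(raw):
    """Parse one JSON-typed config cell (hypothetical helper).

    Returns (value, has_error). Empty or missing cells yield (None, False).
    """
    if raw is None:
        return None, False
    text = str(raw).strip()
    if text == "":
        return None, False
    try:
        return json.loads(text), False
    except json.JSONDecodeError:
        # Report the problem to the caller instead of raising.
        return None, True


print(parse_json_cell('{"status": "active"}'))
print(parse_json_cell('[["date", "asc"]]'))
print(parse_json_cell("{not valid json"))
```

Returning a flag rather than raising mirrors the `(config_dict, has_error)` contract described above, so a caller can aggregate errors from every cell before deciding whether to continue.
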
### 2. Configuration Validation
**Function:** `validate_excel_config(excel_config, console, inclusions_mapping, organizations_mapping)`

Validates that all referenced templates exist and have the correct structure.

**Validations Performed:**
- Template files exist in `config/` directory
- Template files are valid Excel (`.xlsx`)
- Named ranges exist in templates
- Named range dimensions correct (height=1 for tables, width≥max index)
- Column mappings reference valid fields
- Source types are valid

**Return Value:**
```python
(has_critical_error: bool, error_messages: list)
```

### 3. Excel Export Orchestration
**Function:** `export_to_excel(inclusions_data, organizations_data, excel_config, console)`

Main orchestration function for Excel export.

**Workflow:**
1. Prepare template variables (timestamp, extract_date_time, etc.)
2. For each workbook in config:
   - Resolve output filename using template variables
   - Handle file conflicts (Overwrite/Increment/Backup)
   - Copy template to output location
   - **XLWINGS PHASE (native Excel COM API):**
     - Load workbook with xlwings
     - For each sheet config:
       - Apply filters, sorts, replacements
       - Read metadata via xlwings COM API (no file reloads)
       - Fill cells/named ranges with data
     - Formulas automatically recalculated by Excel COM API
     - Save workbook
3. Log summary and completion

**Architecture Change (v1.2+):**
- Migration from openpyxl to xlwings eliminated the need for a separate win32com recalculation phase
- xlwings uses the native Excel COM API, which automatically recalculates formulas during cell updates
- Simplified workflow: one Excel session, no hand-off between libraries

---

## High-Level Orchestration Functions (v1.1+)

**New in v1.1:** High-level orchestration functions were added to move Excel export orchestration out of the main script entirely. These functions follow the established pattern from the quality_checks module.

### 1. `export_excel_only(sys_argv, console_instance, inclusions_filename, organizations_filename, inclusions_mapping_config, organizations_mapping_config)`

**Purpose:** Complete orchestration of `--excel-only` CLI mode

**Workflow:**
1. Initialize console and set default filenames
2. Call `prepare_excel_export()` to load and validate
3. Handle critical configuration errors with user confirmation
4. Call `execute_excel_export()` to perform export
5. Display results and return

**Usage in Main Script:**
```python
if excel_only_mode:
    export_excel_only(sys.argv, console, INCLUSIONS_FILE_NAME, ORGANIZATIONS_FILE_NAME,
                      INCLUSIONS_MAPPING_CONFIG, {})
    return
```

**Impact:** Reduces main script from 34 lines to 4 lines (88% reduction)

---

### 2. `run_normal_mode_export(inclusions_data, organizations_data, excel_enabled, excel_config, console_instance, inclusions_mapping_config, organizations_mapping_config)`

**Purpose:** Orchestrates Excel export phase during normal workflow

**Workflow:**
1. Check if export enabled (returns early if not)
2. Load JSONs from filesystem (ensures consistency)
3. Call `execute_excel_export()` to perform export
4. Display results and return status tuple

**Returns:** `(success: bool, error_message: str)`

**Usage in Main Script:**
```python
# After JSONs are written to disk
run_normal_mode_export(output_inclusions, organizations_list, EXCEL_EXPORT_ENABLED,
                       EXCEL_EXPORT_CONFIG, console, INCLUSIONS_MAPPING_CONFIG, {})
```

**Impact:** Reduces main script from 19 lines to 2 lines (89% reduction)

---

### 3. `prepare_excel_export(inclusions_filename, organizations_filename, console_instance, inclusions_mapping_config, organizations_mapping_config)`

**Purpose:** Centralized preparation function that loads the JSON files and the configuration, then validates them

**Responsibility:**
- Load inclusions JSON from filesystem
- Load organizations JSON from filesystem
- Load Excel export configuration
- Validate configuration against templates
- Aggregate and return all errors

**Returns:** `(prep_success: bool, inclusions_data, organizations_data, excel_config, has_critical_errors: bool, error_messages: list)`

**Used By:** `export_excel_only()`, and potentially `run_normal_mode_export()`

---

### 4. `execute_excel_export(inclusions_data, organizations_data, excel_config, console_instance, inclusions_mapping_config, organizations_mapping_config)`

**Purpose:** Execute Excel export with comprehensive error handling

**Responsibility:**
- Call core `export_to_excel()` function
- Catch and log all exceptions
- Return success/failure status to caller

**Returns:** `(success: bool, error_message: str)`

**Error Handling:** All exceptions are caught and returned as error messages (the function never raises)

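The "never raises" contract can be illustrated with a small wrapper. This is a sketch under stated assumptions, not the module's code: `run_without_raising` is a hypothetical name, the empty string as the success message is an assumption, and the standard `logging` module stands in for whatever logger the dashboard uses.

```python
import logging

logger = logging.getLogger("excel_export_sketch")


def run_without_raising(export_fn, *args, **kwargs):
    """Call export_fn and convert any exception into a (success, message) tuple."""
    try:
        export_fn(*args, **kwargs)
        return True, ""
    except Exception as exc:  # deliberately broad: the caller must never see a raise
        # logger.exception records the full traceback alongside the message.
        logger.exception("Excel export failed")
        return False, f"{type(exc).__name__}: {exc}"


def failing_export():
    raise ValueError("bad template")


print(run_without_raising(failing_export))
print(run_without_raising(lambda: None))
```

Keeping the broad `except` in one wrapper means the core export function can raise freely, while the main script only ever inspects a status tuple.
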

---

### 5. `_load_json_file_internal(filename)`

**Purpose:** Internal helper for safe JSON file loading

**Responsibility:**
- Check file existence
- Load and parse JSON
- Handle errors gracefully
- Return None on failure (instead of raising)

**Used By:** `run_normal_mode_export()` internally

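A minimal sketch of the "return None instead of raising" behavior described above. The function name `load_json_or_none` and the UTF-8 encoding are assumptions for illustration; the real helper may log errors or differ in detail.

```python
import json
import os


def load_json_or_none(filename):
    """Return parsed JSON content, or None if the file is missing or invalid."""
    if not os.path.isfile(filename):
        return None
    try:
        with open(filename, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, ValueError):
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        return None
```

A caller then only needs a single `is None` check per file before deciding whether the export can proceed.
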

---

### Design Pattern: Consistency with Quality Checks

The orchestration functions follow the pattern established by `run_check_only_mode()` from the quality_checks module:

| Aspect | Quality Checks | Excel Export |
|--------|---|---|
| Standalone mode orchestration | `run_check_only_mode()` | `export_excel_only()` |
| Config loading in module | ✅ Yes | ✅ Yes |
| User confirmation in module | ✅ Yes | ✅ Yes |
| Error handling in module | ✅ Yes | ✅ Yes |
| Main script integration | Single function call | Single function call |

**Result:** Consistent architecture across all major features (quality checks, Excel export, etc.)

---

## Configuration System

### Two-Table Configuration

The Excel export is configured through two tables in `Endobest_Dashboard_Config.xlsx`:

#### Table 1: Excel_Workbooks
Defines metadata for each Excel workbook to generate.

| Column | Type | Required | Example | Description |
|--------|------|----------|---------|-------------|
| workbook_name | Text | Yes | "Endobest_Output" | Unique identifier for workbook |
| template_path | Text | Yes | "templates/Endobest_Template.xlsx" | Path relative to config/ folder |
| output_filename | Text | Yes | "{workbook_name}_{extract_date_time}.xlsx" | Template for output filename |
| output_exists_action | Text | Yes | "Increment" | How to handle conflicts (Overwrite/Increment/Backup) |

#### Table 2: Excel_Sheets
Defines how to fill each sheet in the workbooks.

| Column | Type | Required | Example | Description |
|--------|------|----------|---------|-------------|
| workbook_name | Text | Yes | "Endobest_Output" | Must match Excel_Workbooks entry |
| sheet_name | Text | Yes | "Inclusions" | Sheet name in template |
| source_type | Text | Yes | "Inclusions" | Variable / Inclusions / Organizations |
| target | Text | Yes | "DataTable" | Named range or cell reference |
| column_mapping | JSON | Conditional | `{"col_id": "patient_id"}` | For source_type=Inclusions/Organizations only |
| filter_condition | JSON | No | `{"status": "active"}` | AND conditions for filtering |
| sort_keys | JSON | No | `[["date", "asc"], ["id", "asc", "*natsort"]]` | Sort specification with optional datetime/natsort |
| value_replacement | JSON | No | `[{"type": "bool", "true": "Yes", "false": "No"}]` | Value transformations |

---

## Data Flow & Processing Pipeline

### Overview

```
Input Data (inclusions + organizations)
    ↓
Filter (AND conditions)
    ↓
Sort (multi-key with datetime)
    ↓
Value Replacement (strict typing)
    ↓
Fill Excel Cells/Ranges (via xlwings)
    ↓
Save Workbook (xlwings)
    ↓
Formulas Automatically Recalculated (xlwings COM API)
    ↓
Final Excel File
```

### Detailed Processing Steps

#### Step 1: Filter
Applies AND conditions to select matching items.

**Logic:**
- Start with all items
- For each field in filter_condition:
  - Keep only items where field value equals expected value
  - Support nested field paths (dot notation: `patient.status`)
- Return filtered items

**Example:**
```json
{
  "status": "active",
  "visit_type": "inclusion"
}
```
Keeps only items where BOTH conditions are true.

#### Step 2: Sort
Multi-key sort with datetime awareness and missing field handling.

**Logic:**
- Apply sort keys in order (first key is primary, second is secondary, etc.)
- Detect datetime fields automatically (ISO format: YYYY-MM-DD)
- Items with missing fields go to end of sort
- Reverse order for `"desc"` order specification

**Example:**
```json
[
  ["visit_type", "asc"],
  ["date_visit", "desc"]
]
```

#### Step 3: Value Replacement
Transform cell values based on rules (first-match-wins).

**Logic:**
- Evaluate rules in order
- Stop at first matching rule
- Strict type matching (e.g., boolean `True` ≠ string `"true"`)
- Return original value if no match

**Supported Types:**
- `"bool"`: Boolean replacement with `"true"` and `"false"` fields
- `"str"`: String replacement with `"from"` and `"to"` fields
- `"int"`: Integer replacement with `"from"` and `"to"` fields

#### Step 4: Fill Excel
Place transformed data into Excel cells or named ranges.

**Two Modes:**
- **Variable (Single Cell):** Write evaluated template string to target cell
- **Table (Named Range):** Write filtered/sorted/replaced items to target range

##### 4.1 Variable Mode (Template String Substitution)

For `source_type = "Variable"`:
1. Evaluate the source template string using `.format(**template_vars)`
2. Write result to the target named cell
3. Example: `{extract_date_time_french}` → `"2025-01-15 14:30:45+01:00"`

##### 4.2 Table Mode (Data Fill with Column Mapping)

For `source_type = "Inclusions"` or `"Organizations"`:

**Key Concept:** The first row of the table target serves as BOTH TEMPLATE and FIRST DATA ROW.
Some columns may contain formulas that should NOT be overwritten (unmapped columns).

**Algorithm:**

1. **Extract Column Mapping**
   - Load mapping from Inclusions_Mapping or Organizations_Mapping table
   - Mapping column name comes from Excel_Sheets.source parameter
   - Mapping contains indices (0, 1, 2...) indicating Excel column positions
   - Example:
     ```
     Inclusions_Mapping:
     | field_name | field_group            | MainReport_PatientsList |
     | Patient_Id | Patient_Identification | 0                       |
     | Status     | Inclusion              | 1                       |
     | Date       | Inclusion              | 3                       |
     (Column 2 not mapped - preserves template formula!)
     ```
   - Result: `{0: "Patient_Identification.Patient_Id", 1: "Inclusion.Status", 3: "Inclusion.Date"}`

2. **Filter and Sort Data**
   - Apply AND filter conditions
   - Apply multi-key sort with datetime parsing
   - Example: 5 items match filter, sorted by Patient_Id ascending

3. **Extend Table Rows**
   - Delete any existing data rows below the template row
   - Keep the first row (template + first data)
   - For each filtered/sorted item:
     a. Create new row (or use template row for first item)
     b. Copy ALL cells from template row (preserves formulas!)
     c. Overwrite ONLY mapped columns with JSON data
     d. Apply value_replacement to mapped values

4. **Preserve Formulas in Unmapped Columns**
   - Unmapped columns (those without index in mapping) keep template values
   - If a template column contains a formula, it is preserved and recalculates later
   - Allows mixed rows: some columns from JSON, some from formulas

**Example:**

Template Row (Row 1):
```
| A: P001    | B: Active  | C: =SUM(...) | D: 2025-01 |
| (mapped 0) | (mapped 1) | (formula!)   | (mapped 3) |
```

After processing (3 data items):
```
| A: P001 | B: Active | C: 45 | D: 2025-01 | ← Template + first data
| A: P002 | B: Active | C: 67 | D: 2025-02 | ← Data 2 (formula copied)
| A: P003 | B: Active | C: 89 | D: 2025-03 | ← Data 3 (formula copied)
```

Result:
- Columns A, B, D filled with JSON data and value replacement
- Column C: Formula `=SUM(...)` copied to all rows, will recalculate
- All rows have consistent formatting from template

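The "overwrite only mapped columns" rule can be sketched without Excel by computing, for one item, which column positions receive a value. The helper names `get_nested` and `mapped_cell_values` are hypothetical and exist only for this illustration; the mapping and field names come from the example above, and the item itself is sample data.

```python
def get_nested(item, dotted_path):
    """Follow a dot-separated path through nested dicts; None if any key is missing."""
    current = item
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def mapped_cell_values(item, column_mapping):
    """Return {column_index: value} for mapped columns only.

    Indices absent from the mapping are absent from the result, so a caller
    would leave those cells (for example template formulas) untouched.
    """
    return {index: get_nested(item, path)
            for index, path in sorted(column_mapping.items())}


mapping = {
    0: "Patient_Identification.Patient_Id",
    1: "Inclusion.Status",
    3: "Inclusion.Date",
}
item = {
    "Patient_Identification": {"Patient_Id": "P002"},
    "Inclusion": {"Status": "Active", "Date": "2025-02"},
}
print(mapped_cell_values(item, mapping))
```

Column index 2 never appears in the result, which is what allows the copied template formula in that column to survive the fill.
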

---

## Excel Export Functions

### Public Functions (3)

#### load_excel_export_config(console=None)
```python
def load_excel_export_config(console_instance=None):
    """Load Excel export configuration from config file.

    Reads Excel_Workbooks and Excel_Sheets tables from
    Endobest_Dashboard_Config.xlsx, parses JSON fields.

    Args:
        console_instance: Optional Rich Console for messages

    Returns:
        (config_dict, has_error: bool)

    Raises:
        None (returns error status instead)
    """
```

#### validate_excel_config(excel_config, console, inclusions_mapping, organizations_mapping)
```python
def validate_excel_config(excel_config, console_instance,
                          inclusions_mapping_config,
                          organizations_mapping_config):
    """Validate Excel configuration against templates.

    Checks that:
    - Template files exist and are valid
    - Named ranges exist in templates
    - Dimensions are correct
    - Mappings reference valid fields

    Args:
        excel_config: Config dict from load_excel_export_config()
        console_instance: Rich Console instance
        inclusions_mapping_config: List of valid inclusions fields
        organizations_mapping_config: Dict of valid organizations fields

    Returns:
        (has_critical_error: bool, error_messages: list)
    """
```

#### export_to_excel(inclusions_data, organizations_data, excel_config, console=None)
```python
def export_to_excel(inclusions_data, organizations_data, excel_config,
                    console_instance=None):
    """Main orchestration: Generate Excel files from data and config.

    xlwings-based processing with automatic formula recalculation:
    - Load template via xlwings
    - Apply data transformations (filter, sort, replace)
    - Fill cells/ranges with data
    - Save workbook (formulas auto-recalculated by Excel COM API)

    Args:
        inclusions_data: List of inclusion dicts
        organizations_data: List of organization dicts
        excel_config: Config dict from load_excel_export_config()
        console_instance: Optional Rich Console

    Returns:
        None (creates files as side effect)

    Raises:
        None (exceptions are caught and logged; processing continues
        with the next workbook)
    """
```

### Internal Functions (10)

#### _prepare_template_variables()
```python
def _prepare_template_variables():
    """Extract variables for template string substitution.

    Variables:
    - extract_date_time: Full ISO datetime (UTC→Paris TZ)
    - extract_year: Year
    - extract_month: Month (2-digit)
    - extract_day: Day (2-digit)

    Returns:
        dict: Variables for .format(**template_vars)
    """
```

#### _apply_filter(item, filter_condition)
```python
def _apply_filter(item, filter_condition):
    """Apply AND filter to item.

    Returns True only if ALL conditions match.
    Supports nested field paths (dot notation).

    Args:
        item: Dict to filter
        filter_condition: Dict of field:value conditions

    Returns:
        bool: True if matches, False otherwise
    """
```

#### _apply_sort(items, sort_keys)
```python
def _apply_sort(items, sort_keys):
    """Multi-key sort with datetime parsing and natural alphanumeric support.

    Handles:
    - String fields (case-insensitive comparison)
    - Numeric and datetime fields
    - Natural alphanumeric sorting (*natsort option)
    - Missing fields (placed at end)
    - Mixed ascending and descending order

    Args:
        items: List of dicts to sort
        sort_keys: List of [field, order] or [field, order, option]
            where option can be:
            - datetime format string (e.g., "%Y-%m-%d")
            - "*natsort" for natural alphanumeric sorting

    Returns:
        list: Sorted items
    """
```

#### _apply_value_replacement(value, replacements)
```python
def _apply_value_replacement(value, replacements):
    """Transform value using first-matching rule.

    Strict type matching. Returns original if no match.

    Args:
        value: Original value
        replacements: List of replacement rules

    Returns:
        Replaced value or original
    """
```

#### _handle_output_exists(output_path, action)
```python
def _handle_output_exists(output_path, action):
    """Handle file conflicts: Overwrite/Increment/Backup.

    Overwrite: Returns same path (existing file will be overwritten)
    Increment: Returns path with _1, _2, etc. suffix
    Backup: Renames existing to _backup_1, etc.; returns original path

    Args:
        output_path: Target file path
        action: "Overwrite" | "Increment" | "Backup"

    Returns:
        str: Actual path to use
    """
```

#### _get_named_range_dimensions(workbook, range_name)
```python
def _get_named_range_dimensions(workbook, range_name):
    """Extract position and dimensions from named range.

    Uses openpyxl named_ranges to find range definition.

    Args:
        workbook: openpyxl Workbook object
        range_name: Name of the named range

    Returns:
        (sheet_name, start_cell, height, width)

    Raises:
        ValueError if range not found
    """
```

#### _process_sheet_xlwings(workbook_xw, sheet_config, inclusions_data, organizations_data, ...)
```python
def _process_sheet_xlwings(workbook_xw, sheet_config, inclusions_data,
                           organizations_data, inclusions_mapping_config,
                           organizations_mapping_config, template_vars):
    """Fill single sheet using xlwings (native Excel COM API).

    Routes based on source_type:
    - Variable: Evaluate template string, write to cell
    - Inclusions/Organizations: Filter, sort, fill table (bulk operation)

    Automatic formula recalculation occurs via xlwings COM API.

    Args:
        workbook_xw: xlwings Book object (open)
        sheet_config: Single sheet configuration dict
        inclusions_data, organizations_data: Source data
        inclusions_mapping_config, organizations_mapping_config: Field mappings
        template_vars: Variables for template strings

    Returns:
        bool: Success status
    """
```

#### set_dependencies(console_obj, inclusions_file, organizations_file)
```python
def set_dependencies(console_instance, inclusions_filename,
                     organizations_filename, ...):
    """Inject module-level variables (dependency injection).

    Called from main dashboard to provide:
    - console: Rich Console instance
    - File names and configuration

    Args:
        console_instance: Rich Console object
        ... (other global references)

    Returns:
        None
    """
```


---

## Filter, Sort & Replacement Logic

### AND Filter Logic

Conditions combined with AND (all must be true):

```python
filter_condition = {"status": "active", "type": "inclusion"}
# Matches: {"status": "active", "type": "inclusion", "date": "2025-01-15"}
# Does NOT match: {"status": "active", "type": "follow-up"} (type different)
```

**Nested Field Support:**
```python
filter_condition = {"patient.status": "active"}
# Matches: {"patient": {"status": "active"}}
```

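The two behaviors above (AND semantics and dot-notation paths) can be combined in a few lines. This is a minimal sketch, not the module's `_apply_filter`: the names `lookup` and `matches_filter` are hypothetical, and treating an empty or missing filter as "match everything" is an assumption.

```python
_MISSING = object()  # sentinel: never equal to any expected value


def lookup(item, dotted_path):
    """Resolve 'a.b.c' against nested dicts; return _MISSING if the path breaks."""
    current = item
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches_filter(item, filter_condition):
    """True only if every field in filter_condition equals the item's value."""
    if not filter_condition:
        return True  # assumption: no filter means keep everything
    return all(lookup(item, path) == expected
               for path, expected in filter_condition.items())


items = [
    {"status": "active", "type": "inclusion", "date": "2025-01-15"},
    {"status": "active", "type": "follow-up"},
]
condition = {"status": "active", "type": "inclusion"}
print([it for it in items if matches_filter(it, condition)])
```

Using a sentinel for missing paths keeps a filter such as `{"field": None}` distinguishable from "field does not exist".
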
### Multi-Key Sort Logic

Sort keys applied in order (first is primary):

```python
sort_keys = [
    ["status", "asc"],                 # Primary sort
    ["date_visit", "desc"],            # Secondary sort
    ["patient_id", "asc", "*natsort"]  # Tertiary sort with natural alphanumeric
]
```

**String Comparison:**
- **Case-insensitive by default:** `"Centre"` comes before `"CHU"` (natural alphabetical order)
- Tiebreaker: Case-sensitive if lowercase versions are equal

**Datetime Handling:**
- Provide strptime format as third parameter: `["date_field", "desc", "%Y-%m-%d"]`
- Custom formats supported: `"%d/%m/%Y"`, `"%Y-%m-%d %H:%M:%S"`, etc.

**Natural Alphanumeric Sorting:**
- Use `"*natsort"` as third parameter for proper numeric segment handling
- Correctly sorts: `"ENDOBEST-003-3-BA"` < `"ENDOBEST-003-20-BA"` < `"ENDOBEST-003-100-BA"`
- Also handles: `"v1.2"` < `"v1.10"`, `"file2.txt"` < `"file10.txt"`
- Well suited to patient IDs, version codes, sequential identifiers

**Missing Values:**
- Items with missing/null/undefined field values placed at end

### Value Replacement Rules

First-matching rule wins; strict type matching:

```python
replacements = [
    {"type": "bool", "true": "Yes", "false": "No"},
    {"type": "str", "from": "active", "to": "Active"},
]

# True (boolean) → "Yes"
# "active" (string) → "Active"
# "true" (string) → "true" (no match, unchanged)
```

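Strict type matching needs some care in Python because `bool` is a subclass of `int`, so `isinstance(True, int)` is true. The sketch below compares exact types instead. It is an illustration of the rule format shown above rather than the module's `_apply_value_replacement`, and the function name `replace_value` is hypothetical.

```python
def replace_value(value, replacements):
    """Apply the first matching rule; return the value unchanged if none match."""
    for rule in replacements or []:
        kind = rule.get("type")
        # type(...) is ... (not isinstance) so that True never matches an int rule.
        if kind == "bool" and type(value) is bool:
            return rule["true"] if value else rule["false"]
        if kind == "str" and type(value) is str and value == rule["from"]:
            return rule["to"]
        if kind == "int" and type(value) is int and value == rule["from"]:
            return rule["to"]
    return value


rules = [
    {"type": "bool", "true": "Yes", "false": "No"},
    {"type": "str", "from": "active", "to": "Active"},
]
for sample in (True, False, "active", "true", 1):
    print(repr(sample), "->", repr(replace_value(sample, rules)))
```

Returning as soon as a rule matches is what gives the first-match-wins behavior; later rules are never consulted for that value.
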

---

## Template Variables

### Available Variables

Template variables available in `output_filename` and Variable cell content:

| Variable | Type | Example | Notes |
|----------|------|---------|-------|
| `extract_date_time` | ISO datetime | `2025-01-15T14:30:45+01:00` | Full timestamp (UTC→Paris TZ) |
| `extract_year` | Year | `2025` | 4-digit year |
| `extract_month` | Month | `01` | 2-digit month |
| `extract_day` | Day | `15` | 2-digit day |
| `workbook_name` | Text | `"Endobest_Output"` | From config |

### Usage Examples

**Filename Template:**
```
{workbook_name}_{extract_date_time}.xlsx
→ Endobest_Output_2025-01-15T14-30-45.xlsx
```

**Variable Cell Template:**
```
Extracted: {extract_date_time}
→ Extracted: 2025-01-15T14:30:45+01:00
```

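A small sketch of how such a variable dictionary could be built and applied with `str.format`. The function name and signature are hypothetical (the documented `_prepare_template_variables()` takes no arguments), placing `workbook_name` in the same dictionary is an assumption, and a fixed UTC+01:00 offset is used here only to keep the example deterministic.

```python
from datetime import datetime, timedelta, timezone


def prepare_template_variables(now, workbook_name):
    """Build the substitution dict from a timezone-aware datetime."""
    return {
        "extract_date_time": now.isoformat(timespec="seconds"),
        "extract_year": f"{now.year:04d}",
        "extract_month": f"{now.month:02d}",
        "extract_day": f"{now.day:02d}",
        "workbook_name": workbook_name,
    }


# Fixed offset standing in for Paris winter time (assumption for the example).
paris_winter = timezone(timedelta(hours=1))
variables = prepare_template_variables(
    datetime(2025, 1, 15, 14, 30, 45, tzinfo=paris_winter), "Endobest_Output"
)

print("Extracted: {extract_date_time}".format(**variables))
print("{workbook_name}_{extract_year}-{extract_month}-{extract_day}.xlsx".format(**variables))
```

The filename example above shows the time written with hyphens instead of colons. How the module derives that form is not described here, so the sketch builds its filename from the colon-free date parts only.
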

---

## File Conflict Handling

### Three Strategies

#### 1. Overwrite
- Deletes existing file
- Writes new file with same name

```
output_path: report.xlsx
result: report.xlsx (new)
```

#### 2. Increment
- Finds next available number
- Appends _1, _2, etc. to filename

```
existing: report.xlsx, report_1.xlsx, report_2.xlsx
output_path: report.xlsx
result: report_3.xlsx
```

#### 3. Backup
- Renames existing to _backup_N
- Writes new file with original name

```
existing: report.xlsx
output_path: report.xlsx
result:
  - report_backup_1.xlsx (renamed)
  - report.xlsx (new)
```

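The three strategies can be sketched with `pathlib`. This is an illustration under assumptions rather than the module's `_handle_output_exists`: the names `resolve_output_path` and `_numbered` are hypothetical, numbering starts at 1 as in the examples above, and raising `ValueError` for an unknown action is an assumption.

```python
from pathlib import Path


def _numbered(path, tag, n):
    # report.xlsx + ("_", 3) -> report_3.xlsx
    return path.with_name(f"{path.stem}{tag}{n}{path.suffix}")


def resolve_output_path(output_path, action):
    """Return the path to write to, renaming an existing file for "Backup"."""
    path = Path(output_path)
    if not path.exists() or action == "Overwrite":
        return str(path)  # the caller writes (or overwrites) this file
    if action == "Increment":
        n = 1
        while _numbered(path, "_", n).exists():
            n += 1
        return str(_numbered(path, "_", n))
    if action == "Backup":
        n = 1
        while _numbered(path, "_backup_", n).exists():
            n += 1
        path.rename(_numbered(path, "_backup_", n))
        return str(path)
    raise ValueError(f"Unknown output_exists_action: {action!r}")
```

Only the "Backup" branch touches the filesystem; "Overwrite" and "Increment" just decide on a name and leave the actual write to the caller.
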

---

## Integration with Main Dashboard

### Integration Points

1. **Startup Validation (before collection):**
   ```python
   EXCEL_EXPORT_CONFIG, error = load_excel_export_config(console)
   if error:
       # Ask user confirmation
       EXCEL_EXPORT_ENABLED = False
   ```

2. **After JSON Export (after collection):**
   ```python
   if EXCEL_EXPORT_ENABLED:
       inclusions = load_json_file(INCLUSIONS_FILE_NAME)
       organizations = load_json_file(ORGANIZATIONS_FILE_NAME)
       export_to_excel(inclusions, organizations, EXCEL_EXPORT_CONFIG, console)
   ```

3. **--excel-only Mode:**
   ```python
   if "--excel-only" in sys.argv:
       inclusions = load_json_file(INCLUSIONS_FILE_NAME)
       organizations = load_json_file(ORGANIZATIONS_FILE_NAME)
       export_to_excel(inclusions, organizations, EXCEL_EXPORT_CONFIG, console)
   ```

### Global Variables

Added to `eb_dashboard.py`:

```python
EXCEL_EXPORT_CONFIG = None    # Loaded config
EXCEL_EXPORT_ENABLED = False  # Flag to enable/disable export

# Constants
EXCEL_WORKBOOKS_TABLE_NAME = "Excel_Workbooks"
EXCEL_SHEETS_TABLE_NAME = "Excel_Sheets"
```


---

## Error Handling & Validation

### Validation Stages

#### Stage 1: Config Loading (Startup)
- File exists and is in a valid Excel format
- Required columns present
- JSON parsing succeeds
- Returns error status

#### Stage 2: Config Validation (Startup)
- Templates exist in `config/` folder
- Templates are valid `.xlsx` files
- Named ranges exist
- Dimensions correct
- Returns critical error status

#### Stage 3: User Confirmation (Startup)
- If critical errors found:
  - Display error messages
  - Ask user to continue or abort
  - Set EXCEL_EXPORT_ENABLED flag

#### Stage 4: Runtime Error Handling
- Try/except wraps main export
- Logs detailed errors
- Continues with next workbook
- Displays summary

### Error Messages

**Critical Config Error:**
```
⚠ CRITICAL CONFIGURATION ERROR(S) DETECTED
────────────────────────────────────
Error 1: Template file missing: config/templates/Missing.xlsx
Error 2: Named range not found: MyRange in sheet MySheet
...
Do you want to continue anyway? [y/N]:
```

**Runtime Error:**
```
✗ Excel export failed: [Specific error message]
(See dashboard.log for full traceback)
```


---

## Configuration Examples

### Example 1: Simple Inclusion List

**Excel_Workbooks:**
| workbook_name | template_path | output_filename | output_exists_action |
|---|---|---|---|
| Inclusions_Report | templates/Simple.xlsx | Inclusions_{extract_date_time}.xlsx | Increment |

**Excel_Sheets:**
| workbook_name | sheet_name | source_type | target | column_mapping | filter_condition | sort_keys | value_replacement |
|---|---|---|---|---|---|---|---|
| Inclusions_Report | Data | Inclusions | DataTable | {"col_id": "patient_id", "col_name": "name"} | {"status": "active"} | [["date_inclusion", "asc"]] | null |

### Example 2: Multi-Sheet with Variables

**Excel_Sheets (multiple rows):**
| workbook_name | sheet_name | source_type | target | ... |
|---|---|---|---|---|
| Report | Title | Variable | TitleCell | ... |
| Report | Inclusions | Inclusions | InclusionTable | ... |
| Report | Organizations | Organizations | OrgTable | ... |

### Example 3: Value Replacement

**Excel_Sheets:**
```
value_replacement: [
  {
    "type": "bool",
    "true": "Yes",
    "false": "No"
  },
  {
    "type": "str",
    "from": "active",
    "to": "Active Status"
  }
]
```


---

## Troubleshooting & Debugging

### Common Issues

#### "Template file missing"
**Cause:** Template path incorrect or file not in `config/` folder
**Solution:** Verify file exists at `config/{template_path}`

#### "Named range not found"
**Cause:** Range name in config doesn't exist in template
**Solution:** Check the range name in Excel (Formulas → Name Manager)

#### "Dimensions mismatch"
**Cause:** Column count in mapping exceeds named range width
**Solution:** Verify named range dimensions and column mapping count match

#### "Formulas not recalculating"
**Cause:** xlwings not installed or Excel not available on system
**Solution:** Ensure xlwings is installed (`pip install xlwings`) and Excel is available. Formulas are automatically recalculated by xlwings via COM API.

### Debug Mode

```bash
python eb_dashboard.py --debug
```

Enables verbose logging with detailed Excel export operations.

### Log File

Check `dashboard.log` for:
- Configuration load/validation results
- Each workbook processing
- Filter/sort/replace operations
- File creation details
- Error details and tracebacks


---

## Notes for Developers

### Adding New Features

1. **New Transformation Step:** Add function to `eb_dashboard_excel_export.py`, call from `_process_sheet_xlwings()`
2. **New Source Type:** Add case to `_process_sheet_xlwings()` router (update SOURCE_TYPES in constants)
3. **New Template Variable:** Add to `_prepare_template_variables()`
4. **Update Constants:** Add new values to `eb_dashboard_constants.py` (single source of truth)

### Testing

- Unit tests: `test_core_logic.py` (26 tests, 100% pass)
- No external dependencies needed (pure function testing)
- Integration tests: Use `--excel-only` mode with real data

### Performance Considerations

- **Data Filtering:** O(n) per filter rule
- **Sorting:** O(n log n)
- **Excel Fill:** O(n) for cells, time depends on file size
- **Typical Duration:** 1-5 seconds per workbook (depends on data volume and template complexity)

---

**End of Excel Export Architecture Documentation**