# Endobest Excel Export Feature & Architecture
## Part 4: Configuration-Driven Excel Workbook Generation
**Document Version:** 1.1

**Last Updated:** 2025-11-11

**Audience:** Developers, Business Analysts, System Architects

**Language:** English

---
## Table of Contents
1. [Overview](#overview)
2. [Architecture & Design](#architecture--design)
3. [Core Components](#core-components)
4. [High-Level Orchestration Functions (v1.1+)](#high-level-orchestration-functions-v11)
5. [Configuration System](#configuration-system)
6. [Data Flow & Processing Pipeline](#data-flow--processing-pipeline)
7. [Excel Export Functions](#excel-export-functions)
8. [Filter, Sort & Replacement Logic](#filter-sort--replacement-logic)
9. [Template Variables](#template-variables)
10. [File Conflict Handling](#file-conflict-handling)
11. [Integration with Main Dashboard](#integration-with-main-dashboard)
12. [Error Handling & Validation](#error-handling--validation)
13. [Configuration Examples](#configuration-examples)
14. [Troubleshooting & Debugging](#troubleshooting--debugging)
---
## Overview
The **Excel Export Feature** enables generation of configurable Excel workbooks from patient inclusion data and organization statistics. The system is entirely configuration-driven, allowing non-technical users to define export behavior through Excel configuration tables without code modifications.
### Key Characteristics
**Configuration-Driven Design:**
- All export behavior defined in `Endobest_Dashboard_Config.xlsx`
- Two tables: `Excel_Workbooks` (metadata) and `Excel_Sheets` (sheet definitions)
- No code changes needed to modify export behavior

**Modular Architecture:**
- New module: `eb_dashboard_excel_export.py`
- Separation of concerns: Excel logic isolated from main dashboard
- Dependency injection for testing and flexibility

**Data Transformation:**
- **Filter:** AND conditions with nested field support
- **Sort:** Multi-key sorting with case-insensitive strings, datetime parsing, natural alphanumeric sorting (`*natsort`)
- **Replace:** Strict type matching with first-match-wins logic
- **Fill:** Direct cell or named range targeting

> **Note:** For complete configuration details and up-to-date column specifications, refer to `DOCUMENTATION_99_CONFIG_GUIDE.md`

**Three Operating Modes:**
1. **Normal:** Full collection → Quality checks → JSON export → Excel export
2. **--excel-only:** Load existing JSON → Excel export (fast iteration)
3. **--check-only:** Quality checks only (unchanged, for backward compatibility)
---
## Architecture & Design
### Module Structure
```
eb_dashboard_excel_export.py
├── Imports & Dependencies
├── Constants & Configuration
│   └── EXCEL_RECALC_TIMEOUT = 60
├── Module-Level Variables (Injected)
│   ├── console (Rich Console instance)
│   ├── DASHBOARD_CONFIG_FILE_NAME
│   └── Other global references
├── Public API (Called from main)
│   ├── load_excel_export_config(console)
│   ├── validate_excel_config(excel_config, console, inclusions_mapping, organizations_mapping)
│   └── export_to_excel(inclusions_data, organizations_data, excel_config, console)
└── Internal Functions (Helpers)
    ├── _prepare_template_variables()
    ├── _apply_filter(item, filter_condition)
    ├── _apply_sort(items, sort_keys)
    ├── _apply_value_replacement(value, replacements)
    ├── _handle_output_exists(output_path, action)
    ├── _get_named_range_dimensions(workbook, range_name)   [openpyxl - validation phase]
    ├── _get_table_dimensions_xlwings(workbook_xw, range_name)   [xlwings - data processing]
    ├── _recalculate_workbook(workbook_path)
    ├── _process_sheet_xlwings(workbook_xw, sheet_config, ...)   [xlwings - data fill]
    └── set_dependencies(...)
```
### Design Principles
1. **Configuration-First:** Behavior determined by config, not code
2. **Pure Functions:** Transformation helpers (filter, sort, value replacement) have no side effects; side effects are confined to the I/O functions
3. **xlwings-First Architecture:** Data processing uses xlwings exclusively (native Excel COM API)
   - Configuration validation uses openpyxl (read-only, lighter footprint)
   - Data fill & processing uses xlwings (preserves workbook structure, formulas, images)
   - Automatic formula recalculation via xlwings COM API during cell updates
   - No redundant file reloads: metadata is read via the COM API without reloading the file
4. **Early Validation:** Config errors detected at startup, before data collection
---
## Core Components
### 1. Configuration Loading
**Function:** `load_excel_export_config(console)`
Loads Excel export configuration from the `Endobest_Dashboard_Config.xlsx` file.
**Responsibilities:**
- Read `Excel_Workbooks` table
- Read `Excel_Sheets` table
- Parse JSON fields (filter_condition, sort_keys, value_replacement)
- Validate structure and presence of required columns
- Return parsed config and error status
**Return Value:**
```python
(config_dict, has_error: bool)
```
**Config Structure:**
```python
{
    "workbooks": [
        {
            "workbook_name": str,
            "template_path": str,
            "output_filename": str,
            "output_exists_action": "Overwrite" | "Increment" | "Backup"
        },
        ...
    ],
    "sheets": [
        {
            "workbook_name": str,
            "sheet_name": str,
            "source_type": "Variable" | "Inclusions" | "Organizations",
            "target": str,
            "column_mapping": dict | None,
            "filter_condition": dict | None,
            "sort_keys": list | None,
            "value_replacement": list | None
        },
        ...
    ]
}
```
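
The JSON columns are stored as text in the configuration workbook, so they must be decoded before use. A minimal sketch of that step for one `Excel_Sheets` row, assuming rows arrive as plain dicts of cell values and that an empty cell means "not configured"; `parse_sheet_row` and `JSON_COLUMNS` are hypothetical names, not part of the module's documented API.

```python
import json

# Hypothetical list of the Excel_Sheets columns that hold JSON text
JSON_COLUMNS = ("column_mapping", "filter_condition", "sort_keys", "value_replacement")


def parse_sheet_row(row):
    """Return (parsed_row, errors) for one Excel_Sheets row given as a dict of cell values."""
    parsed = dict(row)
    errors = []
    for column in JSON_COLUMNS:
        raw = row.get(column)
        # Assumption: an empty cell (None or blank text) means "not configured"
        if raw is None or (isinstance(raw, str) and not raw.strip()):
            parsed[column] = None
            continue
        try:
            parsed[column] = json.loads(raw)  # the JSON literal "null" also becomes None
        except (TypeError, json.JSONDecodeError) as exc:
            # TypeError covers non-text cells (e.g. a number typed into the column)
            parsed[column] = None
            errors.append(f"{row.get('sheet_name')}: invalid JSON in {column}: {exc}")
    return parsed, errors
```

Collecting errors instead of raising fits the `(config_dict, has_error)` return contract: the caller can report every malformed cell in one pass.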
### 2. Configuration Validation
**Function:** `validate_excel_config(excel_config, console, inclusions_mapping, organizations_mapping)`
Validates that all referenced templates exist and have correct structure.
**Validations Performed:**
- Template files exist in `config/` directory
- Template files are valid Excel (`.xlsx`)
- Named ranges exist in templates
- Named range dimensions correct (height = 1 for tables; width ≥ highest mapped column index + 1, since indices are 0-based)
- Column mappings reference valid fields
- Source types are valid
**Return Value:**
```python
(has_critical_error: bool, error_messages: list)
```
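
The dimension rule can be expressed as a small check. This sketch assumes the mapping has already been reduced to 0-based column indices and that the named range's height and width have been read from the template (the validator does this with openpyxl); `check_table_dimensions` is a hypothetical helper. Because indices are 0-based, a mapping that uses index 3 needs a range at least 4 columns wide.

```python
def check_table_dimensions(range_name, height, width, column_indices):
    """Return a list of error messages for a table target (empty list = valid).

    height, width: size of the named range in the template.
    column_indices: 0-based Excel column positions used by the mapping.
    """
    errors = []
    if height != 1:
        errors.append(f"Named range {range_name} must be exactly 1 row high (found {height})")
    if column_indices:
        required_width = max(column_indices) + 1  # 0-based index -> column count
        if width < required_width:
            errors.append(
                f"Named range {range_name} is {width} columns wide, "
                f"but the mapping needs at least {required_width}"
            )
    return errors
```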
### 3. Excel Export Orchestration
**Function:** `export_to_excel(inclusions_data, organizations_data, excel_config, console)`
Main orchestration function for Excel export.
**Workflow:**
1. Prepare template variables (timestamp, extract_date_time, etc.)
2. For each workbook in config:
   - Resolve output filename using template variables
   - Handle file conflicts (Overwrite/Increment/Backup)
   - Copy template to output location
   - **XLWINGS PHASE (native Excel COM API):**
     - Load workbook with xlwings
     - For each sheet config:
       - Apply filters, sorts, replacements
       - Read metadata via xlwings COM API (no file reloads)
       - Fill cells/named ranges with data
       - Formulas automatically recalculated by Excel COM API
     - Save workbook
3. Log summary and completion

**Architecture Change (v1.2+):**
- Migration from openpyxl to xlwings eliminated the need for a separate win32com recalculation phase
- xlwings uses the native Excel COM API, which automatically recalculates formulas during cell updates
- Simplified workflow: one Excel session, no hand-off between libraries
---
## High-Level Orchestration Functions (v1.1+)
**New in v1.1:** The orchestration functions below (four public functions plus one internal helper) were added to fully externalize Excel export orchestration from the main script. They follow the pattern established by the quality_checks module.
### 1. `export_excel_only(sys_argv, console_instance, inclusions_filename, organizations_filename, inclusions_mapping_config, organizations_mapping_config)`
**Purpose:** Complete orchestration of `--excel-only` CLI mode
**Workflow:**
1. Initialize console and set default filenames
2. Call `prepare_excel_export()` to load and validate
3. Handle critical configuration errors with user confirmation
4. Call `execute_excel_export()` to perform export
5. Display results and return
**Usage in Main Script:**
```python
if excel_only_mode:
    export_excel_only(sys.argv, console, INCLUSIONS_FILE_NAME, ORGANIZATIONS_FILE_NAME,
                      INCLUSIONS_MAPPING_CONFIG, {})
    return
```
**Impact:** Reduces the main script from 34 lines to 4 lines (~88% reduction)
---
### 2. `run_normal_mode_export(inclusions_data, organizations_data, excel_enabled, excel_config, console_instance, inclusions_mapping_config, organizations_mapping_config)`
**Purpose:** Orchestrates Excel export phase during normal workflow
**Workflow:**
1. Check if export enabled (returns early if not)
2. Load JSONs from the filesystem (so the workbook reflects exactly the JSON files written to disk)
3. Call `execute_excel_export()` to perform export
4. Display results and return status tuple
**Returns:** `(success: bool, error_message: str)`
**Usage in Main Script:**
```python
# After JSONs are written to disk
run_normal_mode_export(output_inclusions, organizations_list, EXCEL_EXPORT_ENABLED,
                       EXCEL_EXPORT_CONFIG, console, INCLUSIONS_MAPPING_CONFIG, {})
```
**Impact:** Reduces main script from 19 lines to 2 lines (89% reduction)
---
### 3. `prepare_excel_export(inclusions_filename, organizations_filename, console_instance, inclusions_mapping_config, organizations_mapping_config)`
**Purpose:** Centralized preparation function - loads JSONs, config, and validates
**Responsibility:**
- Load inclusions JSON from filesystem
- Load organizations JSON from filesystem
- Load Excel export configuration
- Validate configuration against templates
- Aggregate and return all errors
**Returns:** `(prep_success: bool, inclusions_data, organizations_data, excel_config, has_critical_errors: bool, error_messages: list)`
**Used By:** Both `export_excel_only()` and potentially `run_normal_mode_export()`
---
### 4. `execute_excel_export(inclusions_data, organizations_data, excel_config, console_instance, inclusions_mapping_config, organizations_mapping_config)`
**Purpose:** Execute Excel export with comprehensive error handling
**Responsibility:**
- Call core `export_to_excel()` function
- Catch and log all exceptions
- Return success/failure status to caller
**Returns:** `(success: bool, error_message: str)`
**Error Handling:** All exceptions caught and returned as error messages (never raises)
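
The "never raises" contract can be sketched as a thin wrapper that turns any exception into the `(success, error_message)` tuple. This assumes the core export is passed in as a callable and that failures are logged through the standard `logging` module; `execute_export_safely` is a hypothetical name, and the real function's logging setup is not shown in this document.

```python
import logging

logger = logging.getLogger(__name__)


def execute_export_safely(export_func, *args, **kwargs):
    """Run export_func and report the outcome as (success, error_message); never raises."""
    try:
        export_func(*args, **kwargs)
        return True, ""
    except Exception as exc:  # deliberately broad: callers rely on nothing escaping
        logger.exception("Excel export failed")  # full traceback goes to the log
        return False, f"{type(exc).__name__}: {exc}"
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` uncaught, so Ctrl+C still stops the program.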
---
### 5. `_load_json_file_internal(filename)`
**Purpose:** Internal helper for safe JSON file loading
**Responsibility:**
- Check file existence
- Load and parse JSON
- Handle errors gracefully
- Return None on failure (instead of raising)
**Used By:** `run_normal_mode_export()` internally
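
A standard-library sketch of this load-or-`None` behavior. Which exceptions the real helper handles is not documented here; catching `OSError` (unreadable file) and `ValueError` (invalid JSON or a bad encoding, both of which subclass it) is an assumption, and `load_json_or_none` is a hypothetical name.

```python
import json
import os


def load_json_or_none(filename):
    """Return the parsed JSON content of filename, or None if it is missing or unreadable."""
    if not os.path.isfile(filename):
        return None
    try:
        with open(filename, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, ValueError):
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError
        return None
```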
---
### Design Pattern: Consistency with Quality Checks
The orchestration functions follow the exact pattern established by `run_check_only_mode()` from the quality_checks module:
| Aspect | Quality Checks | Excel Export |
|--------|---|---|
| Standalone mode orchestration | `run_check_only_mode()` | `export_excel_only()` |
| Config loading in module | ✅ Yes | ✅ Yes |
| User confirmation in module | ✅ Yes | ✅ Yes |
| Error handling in module | ✅ Yes | ✅ Yes |
| Main script integration | 1 line call | 1 line call |
**Result:** Consistent architecture across all major features (quality checks, excel export, etc.)
---
## Configuration System
### Two-Table Configuration
The Excel export is configured through two tables in `Endobest_Dashboard_Config.xlsx`:
#### Table 1: Excel_Workbooks
Defines metadata for each Excel workbook to generate.
| Column | Type | Required | Example | Description |
|--------|------|----------|---------|-------------|
| workbook_name | Text | Yes | "Endobest_Output" | Unique identifier for workbook |
| template_path | Text | Yes | "templates/Endobest_Template.xlsx" | Path relative to config/ folder |
| output_filename | Text | Yes | "{workbook_name}_{extract_date_time}.xlsx" | Template for output filename |
| output_exists_action | Text | Yes | "Increment" | How to handle conflicts (Overwrite/Increment/Backup) |
#### Table 2: Excel_Sheets
Defines how to fill each sheet in the workbooks.
| Column | Type | Required | Example | Description |
|--------|------|----------|---------|-------------|
| workbook_name | Text | Yes | "Endobest_Output" | Must match Excel_Workbooks entry |
| sheet_name | Text | Yes | "Inclusions" | Sheet name in template |
| source_type | Text | Yes | "Inclusions" | Variable / Inclusions / Organizations |
| target | Text | Yes | "DataTable" | Named range or cell reference |
| column_mapping | JSON | Conditional | `{"col_id": "patient_id"}` | For source_type=Inclusions/Organizations only |
| filter_condition | JSON | No | `{"status": "active"}` | AND conditions for filtering |
| sort_keys | JSON | No | `[["date", "asc"], ["id", "asc", "*natsort"]]` | Sort specification with optional datetime/natsort |
| value_replacement | JSON | No | `[{"type": "bool", "true": "Yes", "false": "No"}]` | Value transformations |
---
## Data Flow & Processing Pipeline
### Overview
```
Input Data (inclusions + organizations)
        ↓
Filter (AND conditions)
        ↓
Sort (multi-key with datetime)
        ↓
Value Replacement (strict typing)
        ↓
Fill Excel Cells/Ranges (via xlwings)
        ↓
Save Workbook (xlwings)
        ↓
Formulas Automatically Recalculated (xlwings COM API)
        ↓
Final Excel File
```
### Detailed Processing Steps
#### Step 1: Filter
Applies AND conditions to select matching items.
**Logic:**
- Start with all items
- For each field in filter_condition:
  - Keep only items where the field value equals the expected value
  - Support nested field paths (dot notation: `patient.status`)
- Return filtered items
**Example:**
```json
{
"status": "active",
"visit_type": "inclusion"
}
```
Keeps only items where BOTH conditions are true.
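
A minimal sketch of the AND filter with dot-notation paths, assuming items are nested dicts and that a path that does not exist simply fails its condition. It uses plain `==` comparison; `apply_filter` and `_get_nested` are hypothetical names rather than the module's `_apply_filter`.

```python
_MISSING = object()  # sentinel distinguishing "path absent" from a stored None


def _get_nested(item, path):
    """Follow a dot-separated path such as "patient.status" through nested dicts."""
    current = item
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def apply_filter(item, filter_condition):
    """True only if every field in filter_condition equals its expected value (AND)."""
    if not filter_condition:
        return True  # no filter configured: keep the item
    return all(
        _get_nested(item, field) == expected
        for field, expected in filter_condition.items()
    )
```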
#### Step 2: Sort
Multi-key sort with datetime awareness and missing field handling.
**Logic:**
- Apply sort keys in order (first key is primary, second is secondary, etc.)
- Parse datetime fields using the `strptime` format given as the optional third element of a sort key (e.g. `"%Y-%m-%d"`)
- Items with missing fields go to the end of the sort
- Reverse order for `"desc"` order specification

**Example:**
```json
[
  ["visit_type", "asc"],
  ["date_visit", "desc", "%Y-%m-%d"]
]
```
#### Step 3: Value Replacement
Transform cell values based on rules (first-match-wins).
**Logic:**
- Evaluate rules in order
- Stop at first matching rule
- Strict type matching (e.g., boolean `True` ≠ string `"true"`)
- Return original value if no match
**Supported Types:**
- `"bool"`: Boolean replacement with `"true"` and `"false"` fields
- `"str"`: String replacement with `"from"` and `"to"` fields
- `"int"`: Integer replacement with `"from"` and `"to"` fields
#### Step 4: Fill Excel
Place transformed data into Excel cells or named ranges.
**Two Modes:**
- **Variable (Single Cell):** Write evaluated template string to target cell
- **Table (Named Range):** Write filtered/sorted/replaced items to target range
##### 4.1 Variable Mode (Template String Substitution)
For `source_type = "Variable"`:
1. Evaluate the source template string using `.format(**template_vars)`
2. Write result to the target named cell
3. Example: `{extract_date_time_french}` → `"2025-01-15 14:30:45+01:00"`
##### 4.2 Table Mode (Data Fill with Column Mapping)
For `source_type = "Inclusions"` or `"Organizations"`:
**Key Concept:** The first row of the table target serves as BOTH TEMPLATE and FIRST DATA ROW.
Some columns may contain formulas that should NOT be overwritten (unmapped columns).
**Algorithm:**
1. **Extract Column Mapping**
   - Load the mapping from the Inclusions_Mapping or Organizations_Mapping table
   - The mapping column name comes from the Excel_Sheets `source` parameter
   - The mapping contains 0-based indices (0, 1, 2...) indicating Excel column positions
   - Example:
     ```
     Inclusions_Mapping:
     | field_name | field_group            | MainReport_PatientsList |
     | Patient_Id | Patient_Identification | 0                       |
     | Status     | Inclusion              | 1                       |
     | Date       | Inclusion              | 3                       |
     (Column 2 not mapped - preserves template formula!)
     ```
   - Result: `{0: "Patient_Identification.Patient_Id", 1: "Inclusion.Status", 3: "Inclusion.Date"}`
2. **Filter and Sort Data**
   - Apply AND filter conditions
   - Apply multi-key sort with datetime parsing
   - Example: 5 items match the filter, sorted by Patient_Id ascending
3. **Extend Table Rows**
   - Delete any existing data rows below the template row
   - Keep the first row (template + first data)
   - For each filtered/sorted item:
     1. Create a new row (or use the template row for the first item)
     2. Copy ALL cells from the template row (preserves formulas!)
     3. Overwrite ONLY mapped columns with JSON data
     4. Apply value_replacement to mapped values
4. **Preserve Formulas in Unmapped Columns**
   - Unmapped columns (those without an index in the mapping) keep their template values
   - If a template column contains a formula, it is preserved and recalculates later
   - Allows mixed rows: some columns from JSON, some from formulas

**Example:**
Template Row (Row 1):
```
| A: P001 | B: Active | C: =SUM(...) | D: 2025-01 |
| (mapped 0) | (mapped 1) | (formula!) | (mapped 3) |
```
After processing (3 data items):
```
| A: P001 | B: Active | C: 45 | D: 2025-01 | ← Template + first data
| A: P002 | B: Active | C: 67 | D: 2025-02 | ← Data 2 (formula copied)
| A: P003 | B: Active | C: 89 | D: 2025-03 | ← Data 3 (formula copied)
```
Result:
- Columns A, B, D filled with JSON data and value replacement
- Column C: Formula `=SUM(...)` copied to all rows, will recalculate
- All rows have consistent formatting from template
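
A pure-Python sketch of steps 1 and 3, limited to computing which cells to overwrite. It assumes the mapping table has been read into `(field_group, field_name, index)` tuples and that items are nested `{group: {field: value}}` dicts, as the `"Inclusion.Status"` paths suggest. Copying the template row and writing the values through xlwings are left out; `build_column_mapping` and `build_overwrites` are hypothetical names.

```python
def build_column_mapping(mapping_rows):
    """Turn (field_group, field_name, index) rows into {index: "group.field"}."""
    mapping = {}
    for field_group, field_name, index in mapping_rows:
        if index is None or index == "":
            continue  # field not exported to this sheet
        mapping[int(index)] = f"{field_group}.{field_name}"  # int() also accepts 3.0 from Excel
    return mapping


def build_overwrites(items, mapping, replace=lambda value: value):
    """For each item, return {column_index: value} for the mapped columns only.

    Indices absent from a row dict (column 2 in the mapping example) are never
    written, so whatever the copied template row holds there, such as a
    formula, stays in place.
    """
    rows = []
    for item in items:
        row = {}
        for index, path in sorted(mapping.items()):
            group, field = path.split(".", 1)
            value = (item.get(group) or {}).get(field)  # assumed {group: {field: value}}
            row[index] = replace(value)  # hook for value_replacement
        rows.append(row)
    return rows
```

Returning only the mapped cells, rather than full rows, is what keeps unmapped columns such as the `=SUM(...)` column intact: the sketch never produces a value for them, so nothing overwrites the formula copied from the template row.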
---
## Excel Export Functions
### Public Functions (3)
#### load_excel_export_config(console=None)
```python
def load_excel_export_config(console_instance=None):
    """Load Excel export configuration from config file.

    Reads Excel_Workbooks and Excel_Sheets tables from
    Endobest_Dashboard_Config.xlsx, parses JSON fields.

    Args:
        console_instance: Optional Rich Console for messages

    Returns:
        (config_dict, has_error: bool)

    Raises:
        None (returns error status instead)
    """
```
#### validate_excel_config(excel_config, console, inclusions_mapping, organizations_mapping)
```python
def validate_excel_config(excel_config, console_instance,
                          inclusions_mapping_config,
                          organizations_mapping_config):
    """Validate Excel configuration against templates.

    Checks that:
    - Template files exist and are valid
    - Named ranges exist in templates
    - Dimensions are correct
    - Mappings reference valid fields

    Args:
        excel_config: Config dict from load_excel_export_config()
        console_instance: Rich Console instance
        inclusions_mapping_config: List of valid inclusions fields
        organizations_mapping_config: Dict of valid organizations fields

    Returns:
        (has_critical_error: bool, error_messages: list)
    """
```
#### export_to_excel(inclusions_data, organizations_data, excel_config, console=None)
```python
def export_to_excel(inclusions_data, organizations_data, excel_config,
                    console_instance=None):
    """Main orchestration: Generate Excel files from data and config.

    xlwings-based processing with automatic formula recalculation:
    - Load template via xlwings
    - Apply data transformations (filter, sort, replace)
    - Fill cells/ranges with data
    - Save workbook (formulas auto-recalculated by Excel COM API)

    Args:
        inclusions_data: List of inclusion dicts
        organizations_data: List of organization dicts
        excel_config: Config dict from load_excel_export_config()
        console_instance: Optional Rich Console

    Returns:
        None (creates files as side effect)

    Raises:
        Catches and logs exceptions, continues with next workbook
    """
```
### Internal Functions (10)
#### _prepare_template_variables()
```python
def _prepare_template_variables():
    """Extract variables for template string substitution.

    Variables:
    - extract_date_time: Full ISO datetime (UTC→Paris TZ)
    - extract_year: Year
    - extract_month: Month (2-digit)
    - extract_day: Day (2-digit)

    Returns:
        dict: Variables for str.format(**template_vars)
    """
```
#### _apply_filter(item, filter_condition)
```python
def _apply_filter(item, filter_condition):
    """Apply AND filter to item.

    Returns True only if ALL conditions match.
    Supports nested field paths (dot notation).

    Args:
        item: Dict to filter
        filter_condition: Dict of field:value conditions

    Returns:
        bool: True if matches, False otherwise
    """
```
#### _apply_sort(items, sort_keys)
```python
def _apply_sort(items, sort_keys):
    """Multi-key sort with datetime parsing and natural alphanumeric support.

    Handles:
    - String fields (case-insensitive comparison)
    - Numeric and datetime fields
    - Natural alphanumeric sorting (*natsort option)
    - Missing fields (placed at end)
    - Mixed ascending and descending order

    Args:
        items: List of dicts to sort
        sort_keys: List of [field, order] or [field, order, option]
            where option can be:
            - datetime format string (e.g., "%Y-%m-%d")
            - "*natsort" for natural alphanumeric sorting

    Returns:
        list: Sorted items
    """
```
#### _apply_value_replacement(value, replacements)
```python
def _apply_value_replacement(value, replacements):
    """Transform value using first-matching rule.

    Strict type matching. Returns original if no match.

    Args:
        value: Original value
        replacements: List of replacement rules

    Returns:
        Replaced value or original
    """
```
#### _handle_output_exists(output_path, action)
```python
def _handle_output_exists(output_path, action):
    """Handle file conflicts: Overwrite/Increment/Backup.

    Overwrite: Returns same path (existing file will be overwritten)
    Increment: Returns path with _1, _2, etc. suffix
    Backup: Renames existing to _backup_1, etc.; returns original path

    Args:
        output_path: Target file path
        action: "Overwrite" | "Increment" | "Backup"

    Returns:
        str: Actual path to use
    """
```
#### _get_named_range_dimensions(workbook, range_name)
```python
def _get_named_range_dimensions(workbook, range_name):
    """Extract position and dimensions from named range.

    Uses openpyxl named_ranges to find range definition.

    Args:
        workbook: openpyxl Workbook object
        range_name: Name of the named range

    Returns:
        (sheet_name, start_cell, height, width)

    Raises:
        ValueError if range not found
    """
```
#### _process_sheet_xlwings(workbook_xw, sheet_config, inclusions_data, organizations_data, ...)
```python
def _process_sheet_xlwings(workbook_xw, sheet_config, inclusions_data,
                           organizations_data, inclusions_mapping_config,
                           organizations_mapping_config, template_vars):
    """Fill single sheet using xlwings (native Excel COM API).

    Routes based on source_type:
    - Variable: Evaluate template string, write to cell
    - Inclusions/Organizations: Filter, sort, fill table (bulk operation)

    Automatic formula recalculation occurs via xlwings COM API.

    Args:
        workbook_xw: xlwings Book object (open)
        sheet_config: Single sheet configuration dict
        inclusions_data, organizations_data: Source data
        inclusions_mapping_config, organizations_mapping_config: Field mappings
        template_vars: Variables for template strings

    Returns:
        bool: Success status
    """
```
#### set_dependencies(console_obj, inclusions_file, organizations_file)
```python
def set_dependencies(console_instance, inclusions_filename,
                     organizations_filename, ...):
    """Inject module-level variables (dependency injection).

    Called from main dashboard to provide:
    - console: Rich Console instance
    - File names and configuration

    Args:
        console_instance: Rich Console object
        ... (other global references)

    Returns:
        None
    """
```
---
## Filter, Sort & Replacement Logic
### AND Filter Logic
Conditions combined with AND (all must be true):
```python
filter_condition = {"status": "active", "type": "inclusion"}
# Matches: {"status": "active", "type": "inclusion", "date": "2025-01-15"}
# Does NOT match: {"status": "active", "type": "follow-up"} (type different)
```
**Nested Field Support:**
```python
filter_condition = {"patient.status": "active"}
# Matches: {"patient": {"status": "active"}}
```
### Multi-Key Sort Logic
Sort keys applied in order (first is primary):
```python
sort_keys = [
    ["status", "asc"],                   # Primary sort
    ["date_visit", "desc"],              # Secondary sort
    ["patient_id", "asc", "*natsort"],   # Tertiary sort with natural alphanumeric
]
```
**String Comparison:**
- **Case-insensitive by default:** `"Centre"` comes before `"CHU"` (natural alphabetical order)
- Tiebreaker: Case-sensitive if lowercase versions are equal
**Datetime Handling:**
- Provide strptime format as third parameter: `["date_field", "desc", "%Y-%m-%d"]`
- Custom formats supported: `"%d/%m/%Y"`, `"%Y-%m-%d %H:%M:%S"`, etc.
**Natural Alphanumeric Sorting:**
- Use `"*natsort"` as third parameter for proper numeric segment handling
- Correctly sorts: `"ENDOBEST-003-3-BA"` < `"ENDOBEST-003-20-BA"` < `"ENDOBEST-003-100-BA"`
- Also handles: `"v1.2"` < `"v1.10"`, `"file2.txt"` < `"file10.txt"`
- Well suited to patient IDs, version codes, and other sequential identifiers
**Missing Values:**
- Items whose field is missing or `None` are placed at the end
### Value Replacement Rules
First-matching rule wins; strict type matching:
```python
replacements = [
{"type": "bool", "true": "Yes", "false": "No"},
{"type": "str", "from": "active", "to": "Active"},
]
# True (boolean) → "Yes"
# "active" (string) → "Active"
# "true" (string) → "true" (no match, unchanged)
```
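
A sketch of first-match-wins replacement with strict typing, based on the rule shapes described in this document (`true`/`false` for `bool`, `from`/`to` for `str` and `int`); `apply_value_replacement` is a hypothetical name. Strictness is implemented with `type(value) is ...` rather than `isinstance`, because in Python `bool` is a subclass of `int` and `isinstance(True, int)` is `True`.

```python
_RULE_TYPES = {"bool": bool, "str": str, "int": int}


def apply_value_replacement(value, replacements):
    """Return the replacement from the first matching rule, else the value unchanged."""
    for rule in replacements or []:
        expected_type = _RULE_TYPES.get(rule.get("type"))
        # Strict matching: True never matches an "int" rule, "true" never matches "bool"
        if expected_type is None or type(value) is not expected_type:
            continue
        if expected_type is bool:
            key = "true" if value else "false"
            if key in rule:
                return rule[key]
        elif "from" in rule and value == rule["from"]:
            return rule.get("to")
    return value
```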
---
## Template Variables
### Available Variables
Template variables available in `output_filename` and Variable cell content:
| Variable | Type | Example | Notes |
|----------|------|---------|-------|
| `extract_date_time` | ISO datetime | `2025-01-15T14:30:45+01:00` | Full timestamp (UTC→Paris TZ) |
| `extract_year` | Year | `2025` | 4-digit year |
| `extract_month` | Month | `01` | 2-digit month |
| `extract_day` | Day | `15` | 2-digit day |
| `workbook_name` | Text | `"Endobest_Output"` | From config |
### Usage Examples
**Filename Template:**
```
{workbook_name}_{extract_date_time}.xlsx
→ Endobest_Output_2025-01-15T14-30-45.xlsx
```
**Variable Cell Template:**
```
Extracted: {extract_date_time}
→ Extracted: 2025-01-15T14:30:45+01:00
```
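
A sketch of how the date variables could be produced with the standard library's `zoneinfo` module (Python 3.9+), assuming the extraction time is taken in UTC and converted to `Europe/Paris`. Accepting `now` as a parameter is an addition for testability, and `workbook_name` is assumed to be merged in per workbook. On Windows, `zoneinfo` needs the `tzdata` package, because the OS does not ship the IANA time zone database.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def prepare_template_variables(now=None):
    """Build the template-variable dict from a UTC timestamp converted to Paris time.

    now: optional timezone-aware datetime (defaults to the current UTC time).
    """
    now_utc = now or datetime.now(timezone.utc)
    paris = now_utc.astimezone(ZoneInfo("Europe/Paris"))
    return {
        "extract_date_time": paris.isoformat(timespec="seconds"),
        "extract_year": f"{paris.year:04d}",
        "extract_month": f"{paris.month:02d}",
        "extract_day": f"{paris.day:02d}",
    }
```

For 13:30:45 UTC on 15 January 2025 this yields `"2025-01-15T14:30:45+01:00"` for `extract_date_time`, and `"{workbook_name}_{extract_year}{extract_month}{extract_day}.xlsx".format(workbook_name="Endobest_Output", **variables)` gives `Endobest_Output_20250115.xlsx`.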
---
## File Conflict Handling
### Three Strategies
#### 1. Overwrite
- Deletes existing file
- Writes new file with same name
```
output_path: report.xlsx
result: report.xlsx (new)
```
#### 2. Increment
- Finds next available number
- Appends _1, _2, etc. to filename
```
existing: report.xlsx, report_1.xlsx, report_2.xlsx
output_path: report.xlsx
result: report_3.xlsx
```
#### 3. Backup
- Renames existing to _backup_N
- Writes new file with original name
```
existing: report.xlsx
output_path: report.xlsx
result:
- report_backup_1.xlsx (renamed)
- report.xlsx (new)
```
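
A `pathlib` sketch of the three strategies, following the naming shown in the examples (`_1`, `_2`, ... for Increment, `_backup_N` for Backup) and deleting the old file for Overwrite. `handle_output_exists` and `_next_free` are hypothetical names; raising on an unknown action is an assumption, since config validation would normally reject such a value earlier.

```python
from pathlib import Path


def _next_free(path, label):
    """First '<stem><label>N<ext>' (N = 1, 2, ...) that does not exist yet."""
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem}{label}{n}{path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def handle_output_exists(output_path, action):
    """Return the path to write to, applying Overwrite / Increment / Backup."""
    path = Path(output_path)
    if not path.exists():
        return path  # no conflict, whatever the action
    if action == "Increment":
        return _next_free(path, "_")
    if action == "Backup":
        path.rename(_next_free(path, "_backup_"))  # keep the old file under a new name
        return path
    if action == "Overwrite":
        path.unlink()  # the new workbook replaces the old one
        return path
    raise ValueError(f"Unknown output_exists_action: {action!r}")
```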
---
## Integration with Main Dashboard
### Integration Points
1. **Startup Validation (before collection):**
   ```python
   EXCEL_EXPORT_CONFIG, error = load_excel_export_config(console)
   if error:
       # Ask user confirmation
       EXCEL_EXPORT_ENABLED = False
   ```
2. **After JSON Export (after collection):**
   ```python
   if EXCEL_EXPORT_ENABLED:
       inclusions = load_json_file(INCLUSIONS_FILE_NAME)
       organizations = load_json_file(ORGANIZATIONS_FILE_NAME)
       export_to_excel(inclusions, organizations, EXCEL_EXPORT_CONFIG, console)
   ```
3. **--excel-only Mode:**
   ```python
   if "--excel-only" in sys.argv:
       inclusions = load_json_file(INCLUSIONS_FILE_NAME)
       organizations = load_json_file(ORGANIZATIONS_FILE_NAME)
       export_to_excel(inclusions, organizations, EXCEL_EXPORT_CONFIG, console)
   ```
### Global Variables
Added to `eb_dashboard.py`:
```python
EXCEL_EXPORT_CONFIG = None # Loaded config
EXCEL_EXPORT_ENABLED = False # Flag to enable/disable export
# Constants
EXCEL_WORKBOOKS_TABLE_NAME = "Excel_Workbooks"
EXCEL_SHEETS_TABLE_NAME = "Excel_Sheets"
```
---
## Error Handling & Validation
### Validation Stages
#### Stage 1: Config Loading (Startup)
- File exists and valid Excel format
- Required columns present
- JSON parsing succeeds
- Returns error status
#### Stage 2: Config Validation (Startup)
- Templates exist in `config/` folder
- Templates valid `.xlsx` files
- Named ranges exist
- Dimensions correct
- Returns critical error status
#### Stage 3: User Confirmation (Startup)
- If critical errors found:
  - Display error messages
  - Ask user to continue or abort
  - Set EXCEL_EXPORT_ENABLED flag
#### Stage 4: Runtime Error Handling
- Try/except wraps main export
- Logs detailed errors
- Continues with next workbook
- Displays summary
### Error Messages
**Critical Config Error:**
```
⚠ CRITICAL CONFIGURATION ERROR(S) DETECTED
────────────────────────────────────
Error 1: Template file missing: config/templates/Missing.xlsx
Error 2: Named range not found: MyRange in sheet MySheet
...
Do you want to continue anyway? [y/N]:
```
**Runtime Error:**
```
✗ Excel export failed: [Specific error message]
(See dashboard.log for full traceback)
```
---
## Configuration Examples
### Example 1: Simple Inclusion List
**Excel_Workbooks:**
| workbook_name | template_path | output_filename | output_exists_action |
|---|---|---|---|
| Inclusions_Report | templates/Simple.xlsx | Inclusions_{extract_date_time}.xlsx | Increment |
**Excel_Sheets:**
| workbook_name | sheet_name | source_type | target | column_mapping | filter_condition | sort_keys | value_replacement |
|---|---|---|---|---|---|---|---|
| Inclusions_Report | Data | Inclusions | DataTable | {"col_id": "patient_id", "col_name": "name"} | {"status": "active"} | [["date_inclusion", "asc"]] | null |
### Example 2: Multi-Sheet with Variables
**Excel_Sheets (multiple rows):**
| workbook_name | sheet_name | source_type | target | ... |
|---|---|---|---|---|
| Report | Title | Variable | TitleCell | ... |
| Report | Inclusions | Inclusions | InclusionTable | ... |
| Report | Organizations | Organizations | OrgTable | ... |
### Example 3: Value Replacement
**Excel_Sheets** (`value_replacement` column):
```json
[
  {"type": "bool", "true": "Yes", "false": "No"},
  {"type": "str", "from": "active", "to": "Active Status"}
]
```
---
## Troubleshooting & Debugging
### Common Issues
#### "Template file missing"
**Cause:** Template path incorrect or file not in `config/` folder
**Solution:** Verify file exists at `config/{template_path}`
#### "Named range not found"
**Cause:** Range name in config doesn't exist in template
**Solution:** Check range name in Excel (Formulas → Define Names → Name Manager)
#### "Dimensions mismatch"
**Cause:** The highest column index in the mapping falls outside the named range width
**Solution:** Make the named range at least (highest mapped index + 1) columns wide, since mapping indices are 0-based
#### "Formulas not recalculating"
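**Cause:** xlwings not installed, Excel not available on the system, or Excel's calculation mode set to Manual
**Solution:** Ensure xlwings is installed (`pip install xlwings`), Excel is available, and the calculation mode is Automatic (Formulas → Calculation Options). Formulas are then recalculated by Excel as xlwings writes to the cells.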
### Debug Mode
```bash
python eb_dashboard.py --debug
```
Enables verbose logging with detailed Excel export operations.
### Log File
Check `dashboard.log` for:
- Configuration load/validation results
- Each workbook processing
- Filter/sort/replace operations
- File creation details
- Error details and tracebacks
---
## Notes for Developers
### Adding New Features
1. **New Transformation Step:** Add function to `eb_dashboard_excel_export.py`, call from `_process_sheet_xlwings()`
2. **New Source Type:** Add case to `_process_sheet_xlwings()` router (update SOURCE_TYPES in constants)
3. **New Template Variable:** Add to `_prepare_template_variables()`
4. **Update Constants:** Add new values to `eb_dashboard_constants.py` (single source of truth)
### Testing
- Unit tests: `test_core_logic.py` (26 tests, 100% pass)
- No external dependencies needed (pure function testing)
- Integration tests: Use `--excel-only` mode with real data
### Performance Considerations
- **Data Filtering:** O(n) per filter rule
- **Sorting:** O(n log n)
- **Excel Fill:** O(n) for cells, time depends on file size
- **Typical Duration:** 1-5 seconds per workbook (depends on data volume and template complexity)
---
**End of Excel Export Architecture Documentation**