# Endobest Dashboard - Configuration Guide

**Document Version:** 1.0
**Last Updated:** 2025-11-08
**Audience:** System Administrators, Configuration Managers
**Language:** English

---

## Configuration Overview

The Endobest Dashboard is configured entirely through Excel files - no code changes needed.

### Main Configuration File

**File Location:** `config/Endobest_Dashboard_Config.xlsx`

**Contains:**
- `Inclusions_Mapping` - Field definitions for inclusion data
- `Organizations_Mapping` - Field definitions for organization data
- `Excel_Workbooks` - Metadata for Excel export
- `Excel_Sheets` - Sheet definitions and data transformation rules
- `Regression_Check` - Quality check rules

This guide focuses on the **Excel_Workbooks** and **Excel_Sheets** tables (for Excel export configuration).

---

## Table of Contents

1. [File Location & Structure](#file-location--structure)
2. [Inclusions_Mapping (Reference)](#inclusions_mapping-reference)
3. [Organizations_Mapping (Reference)](#organizations_mapping-reference)
4. [Excel_Workbooks Table](#excel_workbooks-table)
5. [Excel_Sheets Table](#excel_sheets-table)
6. [Data Types & Formats](#data-types--formats)
7. [JSON Field Specifications](#json-field-specifications)
8. [Naming Conventions](#naming-conventions)
9. [Configuration Examples](#configuration-examples)
10. [Validation & Error Messages](#validation--error-messages)
11. [Best Practices](#best-practices)
12. [Troubleshooting](#troubleshooting)
13. [Advanced Configuration](#advanced-configuration)

---

## File Location & Structure

### Directory Layout

```
Endobest Dashboard/
├── eb_dashboard.py                        (main script)
├── config/
│   ├── Endobest_Dashboard_Config.xlsx     (← CONFIGURATION FILE)
│   ├── Endobest_Extended_Fields.xlsx      (old, deprecated)
│   ├── eb_org_center_mapping.xlsx
│   └── templates/
│       ├── Endobest_Template.xlsx
│       ├── Statistics_Template.xlsx
│       └── (other templates)
├── endobest_inclusions.json               (output)
├── endobest_organizations.json            (output)
└── dashboard.log
```

### Opening & Editing

1. Open `config/Endobest_Dashboard_Config.xlsx` in Excel
2. Go to the specific sheet tab
3. Edit rows as needed
4. Save the file
5. Run the script - changes take effect on the next run

**Important:** Do NOT change column order or delete required columns.

---

## Inclusions_Mapping (Reference)

This table defines which patient fields to include in export.

### Purpose

Specifies which inclusion data fields are available for use in:
- Excel export (column_mapping in Excel_Sheets)
- Quality checks
- Regression testing

### Columns

| Column | Type | Example | Notes |
|--------|------|---------|-------|
| Field_Selection | Action | `[["include", "*.*"]]` | Pipeline of include/exclude actions |
| Field_Name | Text | patient_id | Internal name used in column_mapping |

### Usage in Excel Export

The Field_Name values are used in `column_mapping`:

```json
{
  "col_patient_id": "patient_id",
  "col_name": "patient_name",
  "col_status": "inclusion_status"
}
```

**Map Excel Column Name → Inclusion Field Name**

---

## Organizations_Mapping (Reference)

This table defines which organization fields to include in export.

### Purpose

Specifies which organization data fields are available for use in:
- Excel export (column_mapping for Organizations source_type)
- Quality checks

### Columns

| Column | Type | Example | Notes |
|--------|------|---------|-------|
| Field_Name | Text | org_id | Internal name |
| org_id | Text | org.id | Data source path |
| org_name | Text | org.name | Organization name |

### Usage in Excel Export

The Field_Name values are used in `column_mapping`:

```json
{
  "col_org_code": "org_id",
  "col_org_name": "org_name"
}
```

---

## Excel_Workbooks Table

Defines metadata for each Excel file to generate.

### Purpose

Specifies WHAT Excel files to create, using which templates, with what naming.
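
As a rough illustration of "one row, one file", the sketch below takes a single Excel_Workbooks row as a plain dictionary and derives the template location and the output filename from it. The column names and values come from this guide; the function name `describe_workbook`, the `CONFIG_DIR` constant, and the way the extraction date is passed in are assumptions made for the example, not the script's actual code.

```python
from pathlib import PurePosixPath

# Assumption: template_path values are resolved against the config/ folder.
CONFIG_DIR = PurePosixPath("config")


def describe_workbook(row: dict, extract_year: str, extract_month: str) -> tuple:
    """Return (template location, output filename) for one Excel_Workbooks row.

    Hypothetical helper: only three of the documented variables are supplied,
    so a template using {extract_date_time} or {extract_day} would raise KeyError.
    """
    template = CONFIG_DIR / row["template_path"]
    filename = row["output_filename"].format(
        workbook_name=row["workbook_name"],
        extract_year=extract_year,
        extract_month=extract_month,
    )
    return str(template), filename


# One row, written out as a dictionary keyed by column name.
row = {
    "workbook_name": "Statistics_Report",
    "template_path": "templates/Statistics.xlsx",
    "output_filename": "{workbook_name}_{extract_year}-{extract_month}.xlsx",
    "output_exists_action": "Overwrite",
}

template, filename = describe_workbook(row, "2025", "01")
print(template)  # config/templates/Statistics.xlsx
print(filename)  # Statistics_Report_2025-01.xlsx
```

Each of the four columns used here is described in detail in the Column Definitions that follow.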

### Column Definitions

#### workbook_name (Required)
- **Type:** Text
- **Length:** 1-255 characters
- **Example:** `Endobest_Output`, `Statistics_Report`, `Monthly_Summary`
- **Usage:** Unique identifier referenced in Excel_Sheets table
- **Rules:** Must be unique within the table
- **Notes:** Used in template variables as `{workbook_name}`

#### template_path (Required)
- **Type:** Text (file path)
- **Example:** `templates/Endobest_Template.xlsx`
- **Relative To:** `config/` folder
- **Rules:** Path is relative, not absolute
- **Validation:** Script checks file exists before export
- **Notes:** Template must be a valid Excel (.xlsx) file
- **Error if:**
  - File doesn't exist
  - File is not .xlsx format
  - Path is absolute instead of relative

#### output_filename (Required)
- **Type:** Text (filename template)
- **Example:** `{workbook_name}_{extract_date_time}.xlsx`
- **Available Variables:**
  - `{workbook_name}` - From workbook_name column
  - `{extract_date_time}` - Full ISO datetime (2025-01-15T14:30:45+01:00); the example results below show it in a filename-safe form
  - `{extract_year}` - Year (2025)
  - `{extract_month}` - Month (01-12)
  - `{extract_day}` - Day (01-31)
- **Processed As:** Python format string, filled in via `str.format()`
- **Example Results:**
  - `Report_{extract_date_time}.xlsx` → `Report_2025-01-15T14-30-45.xlsx`
  - `{workbook_name}_Month{extract_month}.xlsx` → `Endobest_Output_Month01.xlsx`
- **Rules:**
  - Must include `.xlsx` extension
  - Must be a valid filename (no `/`, `\`, `:`, `*`, `?`, `"`, `<`, `>`, `|`)
  - Variables are case-sensitive

#### output_exists_action (Required)
- **Type:** Text (one of three values)
- **Valid Values:**
  - `Overwrite` - Replace existing file
  - `Increment` - Append _1, _2, etc.
  - `Backup` - Rename existing to _backup_1, etc.
- **Default:** `Increment` (recommended for safety)
- **Behavior:**

  | Action | If file exists | Result |
  |--------|----------------|--------|
  | **Overwrite** | `report.xlsx` | Deletes `report.xlsx`, creates new |
  | **Increment** | `report.xlsx`, `report_1.xlsx` | Creates `report_2.xlsx` |
  | **Backup** | `report.xlsx` | Renames to `report_backup_1.xlsx`, creates new `report.xlsx` |

### Row Rules
- Each row generates ONE Excel file
- All columns must be filled (no empty cells)
- workbook_name must be unique
- Multiple workbooks allowed

### Example Rows

```
Row 1:
  workbook_name: Endobest_Output
  template_path: templates/Endobest_Template.xlsx
  output_filename: {workbook_name}_{extract_date_time}.xlsx
  output_exists_action: Increment

Row 2:
  workbook_name: Statistics_Report
  template_path: templates/Statistics.xlsx
  output_filename: {workbook_name}_{extract_year}-{extract_month}.xlsx
  output_exists_action: Overwrite
```

---

## Excel_Sheets Table

Defines how to fill sheets within the workbooks.

### Purpose

Specifies HOW to fill each sheet:
- Which data to use (Inclusions/Organizations/Variable)
- How to transform it (filter, sort, replace)
- Where to put it (target cell/range)

### Column Definitions

#### workbook_name (Required)
- **Type:** Text
- **Example:** `Endobest_Output`
- **Rules:** Must match exactly one row in Excel_Workbooks table
- **Validation:** Script checks reference exists

#### sheet_name (Required)
- **Type:** Text
- **Example:** `Inclusions`, `Summary`, `Organizations`
- **Rules:** Must match sheet name in template exactly
- **Validation:** Script checks sheet exists in template

#### source_type (Required)
- **Type:** Text (one of three values)
- **Valid Values:**
  - `Variable` - Single variable value (timestamp, text, etc.)
  - `Inclusions` - Patient inclusion data
  - `Organizations` - Organization data
- **Rules:** Determines what column_mapping is required

#### target (Required)
- **Type:** Text (cell reference or named range)
- **Format:**
  - Cell reference: `A1`, `B10`, `Title_Cell`
  - Named range: `DataTable`, `InclusionsRange`, etc.
- **For Variable:** Single cell (not a range)
- **For Inclusions/Organizations:** Named range with height=1 (single row for headers, data below)
- **Validation:** Script checks target exists in template

#### column_mapping (Conditional)
- **Required If:** source_type = `Inclusions` OR `Organizations`
- **Type:** JSON object
- **Format:** `{"excel_column_name": "data_field_name", ...}`
- **Example (Inclusions):**
  ```json
  {
    "col_id": "patient_id",
    "col_name": "patient_name",
    "col_status": "inclusion_status",
    "col_date": "date_inclusion"
  }
  ```
- **Example (Organizations):**
  ```json
  {
    "col_code": "org_id",
    "col_name": "org_name",
    "col_count": "patient_count"
  }
  ```
- **Field Names:** Must match names in Inclusions_Mapping or Organizations_Mapping
- **Column Order:** Determines order of columns in Excel (left to right)
- **Validation:** Script checks all field names exist in mapping
- **For Variable:** Leave empty (NULL or omit)

#### filter_condition (Optional)
- **Type:** JSON object (AND conditions)
- **Default:** NULL (no filtering, all items included)
- **Format:** `{"field_name": expected_value, ...}`
- **Example:**
  ```json
  {
    "status": "active",
    "visit_type": "inclusion"
  }
  ```
- **Logic:** AND (all conditions must match)
  - Item with `{"status": "active", "visit_type": "inclusion"}` → MATCHES
  - Item with `{"status": "active", "visit_type": "follow-up"}` → DOES NOT MATCH
- **Nested Fields:** Support dot notation
  - `"patient.status": "active"` matches `{"patient": {"status": "active"}}`
- **For Variable:** Ignored (leave NULL)
- **Types:** String, number, boolean values all supported

#### sort_keys (Optional)
- **Type:** JSON array of sort specifications
- **Default:** NULL (no sorting, original order)
- **Format:** `[["field_name", "asc"|"desc"], ["field2", "order", "option"], ...]`
- **Example:**
  ```json
  [
    ["date_visit", "desc"],
    ["patient_name", "asc"]
  ]
  ```
- **Primary/Secondary:** First array element is primary sort, second is secondary, etc.
- **Options:** Third element can be datetime format (`"%Y-%m-%d"`) or `"*natsort"` for alphanumeric sorting
- **Order Values:**
  - `"asc"` - Ascending (A→Z, 0→9, old→new dates)
  - `"desc"` - Descending (Z→A, 9→0, new→old dates)
- **Missing Fields:** Items with missing field placed at end
- **Datetime:** Auto-detected from ISO format (YYYY-MM-DD) - no configuration needed
- **For Variable:** Ignored (leave NULL)

#### value_replacement (Optional)
- **Type:** JSON array of replacement rules
- **Default:** NULL (no replacement, original values used)
- **Format:** `[{rule1}, {rule2}, ...]`
- **Logic:** First matching rule wins (stop at first match)
- **Types Supported:**

  **Boolean replacement:**
  ```json
  {
    "type": "bool",
    "true": "Yes",
    "false": "No"
  }
  ```
  - Matches: Python boolean `True` / `False` (not strings)
  - Replaces: `True` → "Yes", `False` → "No"

  **String replacement:**
  ```json
  {
    "type": "str",
    "from": "active",
    "to": "Active Status"
  }
  ```
  - Matches: String "active" (exact, case-sensitive)
  - Does NOT match: "Active" or "ACTIVE"

  **Integer replacement:**
  ```json
  {
    "type": "int",
    "from": 0,
    "to": "Not Applicable"
  }
  ```
  - Matches: Integer 0 (not string "0")
  - Replaces: 0 → "Not Applicable"

- **Type Matching:** Strict - boolean True ≠ string "true"
- **Multiple Rules Example:**
  ```json
  [
    {"type": "bool", "true": "Yes", "false": "No"},
    {"type": "str", "from": "active", "to": "Active"},
    {"type": "str", "from": "inactive", "to": "Inactive"}
  ]
  ```
  - Booleans match first rule
  - "active" matches second rule
  - "inactive" matches third rule
  - Other strings pass through unchanged
- **For Variable:** Ignored (leave NULL)

### Row Rules
- Each row defines ONE sheet in ONE workbook
- `source_type` determines required fields:
  - **Variable:** column_mapping, filter_condition, sort_keys, value_replacement all ignored
  - **Inclusions/Organizations:** column_mapping REQUIRED, others optional
- Multiple rows for same workbook allowed (multiple sheets)
- Multiple rows for same sheet not recommended (last wins)

### Example Configurations

**Simple Inclusions Table:**
```
workbook_name: Endobest_Output
sheet_name: Inclusions
source_type: Inclusions
target: DataTable
column_mapping: {"col_id": "patient_id", "col_name": "patient_name"}
filter_condition: {"status": "active"}
sort_keys: [["date_inclusion", "desc"]]
value_replacement: NULL
```

**Multiple Sheets:**
```
Row 1 (Title):
  workbook_name: Report
  sheet_name: Title
  source_type: Variable
  target: TitleCell
  (other columns ignored)

Row 2 (Inclusions):
  workbook_name: Report
  sheet_name: Data
  source_type: Inclusions
  target: InclusionTable
  column_mapping: {...}

Row 3 (Organizations):
  workbook_name: Report
  sheet_name: Orgs
  source_type: Organizations
  target: OrgTable
  column_mapping: {...}
```

**Complex Transformations:**
```
workbook_name: Statistics
sheet_name: SummaryData
source_type: Inclusions
target: SummaryTable
column_mapping: {
  "col_id": "patient_id",
  "col_status": "status",
  "col_activated": "is_activated"
}
filter_condition: {"status": "active"}
sort_keys: [
  ["status", "asc"],
  ["date_visit", "desc"]
]
value_replacement: [
  {"type": "bool", "true": "✓", "false": "✗"},
  {"type": "str", "from": "active", "to": "Active"},
  {"type": "str", "from": "pending", "to": "Pending"}
]
```

---

## Data Types & Formats

### Text Fields
- **Type:** Plain text
- **Length:** As needed
- **Special Characters:** Allowed in values, but not in field names
- **Examples:** `patient_id`, `Inclusions`, `Endobest_Output`

### JSON Fields
- **Type:** Valid JSON format
- **Validation:** Must be valid JSON or NULL
- **Common Mistakes:**
  - Missing quotes: `{col_id: "patient_id"}` ✗ (should be `{"col_id": "patient_id"}`)
  - Single quotes:
`{'col_id': 'patient_id'}` ✗ (JSON uses double quotes)
  - Trailing commas: `{"a": 1,}` ✗ (not valid JSON)
- **Validation:** Script validates JSON parsing before use

### Dates & Times
- **Format:** ISO 8601 (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS)
- **Example:** `2025-01-15`, `2025-01-15T14:30:45`
- **Timezone:** Convert to UTC before storing
- **Auto-Detection:** Script auto-detects datetime fields and parses correctly

---

## JSON Field Specifications

### column_mapping JSON

**Structure:**
```json
{
  "excel_column_1": "field_name_1",
  "excel_column_2": "field_name_2",
  ...
}
```

**Rules:**
- Keys (left side): Column names (can be any text)
- Values (right side): Must match Inclusions_Mapping or Organizations_Mapping
- Order: Determines column order in Excel (left to right)
- Count: No limit, but must fit in target range

**Validation:**
- All values must exist in source mapping
- Extra columns cause error
- Missing columns fill with blanks

### filter_condition JSON

**Structure:**
```json
{
  "field_1": value_1,
  "field_2": value_2,
  ...
}
```

**Rules:**
- Keys (left side): Field names (from mapping)
- Values (right side): Literal values to match
- Logic: AND (all conditions must match)
- Empty object: `{}` matches all (no filtering)

**Value Types Supported:**
- String: `"active"`
- Number: `123`, `45.67`
- Boolean: `true`, `false` (JSON format, not quoted)
- NULL: `null`

**Example:**
```json
{
  "status": "active",
  "center_code": "PARIS01",
  "patient_count": 10
}
```

Matches only items with ALL three conditions.
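
The filter rules above (AND logic, dot notation for nested fields, `{}` or NULL meaning "no filtering") can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual implementation: the function names `get_field` and `matches` are made up for the example, values are compared with plain `==`, and an item that lacks a filtered field is treated as a non-match. Both of those choices are assumptions.

```python
from collections.abc import Mapping

# Sentinel that distinguishes "field is absent" from a stored null value.
_MISSING = object()


def get_field(item, dotted_name):
    """Follow 'patient.status' through nested dictionaries (hypothetical helper)."""
    current = item
    for part in dotted_name.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(item, filter_condition):
    """Return True when the item satisfies every condition (AND logic)."""
    if not filter_condition:  # None (NULL) or {} means no filtering
        return True
    for field_name, expected in filter_condition.items():
        actual = get_field(item, field_name)
        # Assumption: a missing field never matches; comparison is plain equality.
        if actual is _MISSING or actual != expected:
            return False
    return True


items = [
    {"status": "active", "visit_type": "inclusion"},
    {"status": "active", "visit_type": "follow-up"},
    {"patient": {"status": "active"}},
]

condition = {"status": "active", "visit_type": "inclusion"}
print([matches(item, condition) for item in items])  # [True, False, False]
print(matches(items[2], {"patient.status": "active"}))  # True
```

Note that plain equality treats `True` and `1` as equal in Python; a stricter type check would be needed if the filter is meant to tell them apart.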

### sort_keys JSON

**Structure:**
```json
[
  ["field_name_1", "asc"],
  ["field_name_2", "desc"],
  ["field_name_3", "asc", "option"]
]
```

**Rules:**
- Array of arrays format (ordered list)
- Each sort specification: `[field, order]` or `[field, order, option]`
- Field: Must exist in source data
- Order: `"asc"` or `"desc"` only
- Option (optional): Special sorting behavior (see below)
- Empty array: `[]` means no sorting

**Field Matching:**
- Exact field name match required
- Case-sensitive field names
- **String comparison:** Case-insensitive by default
  - `"Centre Evidens"` comes before `"CHU Hospital"` (natural alphabetical order)

**Optional Third Parameter:**

1. **Datetime Format:**
   ```json
   ["date_field", "desc", "%Y-%m-%d"]
   ```
   - Provide a Python strptime format for custom date parsing
   - Example formats: `"%d/%m/%Y"`, `"%Y-%m-%d %H:%M:%S"`

2. **Natural Alphanumeric Sorting:**
   ```json
   ["patient_id", "asc", "*natsort"]
   ```
   - Use `"*natsort"` for natural sorting of alphanumeric codes
   - Correctly sorts: `"ENDOBEST-003-3-BA"` < `"ENDOBEST-003-20-BA"`
   - Also handles: `"file2.txt"` < `"file10.txt"`, `"v1.9"` < `"v1.10"`
   - Well suited to patient IDs, version numbers, sequential codes

### value_replacement JSON

**Structure:**
```json
[
  {
    "type": "TYPE_NAME",
    "TYPE_SPECIFIC_FIELDS": values
  },
  ...
]
```

**Boolean Type:**
```json
{
  "type": "bool",
  "true": "Replacement for True",
  "false": "Replacement for False"
}
```

**String Type:**
```json
{
  "type": "str",
  "from": "Source string",
  "to": "Replacement string"
}
```

**Integer Type:**
```json
{
  "type": "int",
  "from": 123,
  "to": "Replacement"
}
```

**Rules:**
- Each rule must have a `"type"` field
- Other fields required per type
- Evaluated in order (first match wins)
- NULL or empty array means no replacement

---

## Naming Conventions

### File & Path Naming
- **Paths:** Relative to `config/` folder
- **Separators:** Use forward slash `/` (not backslash `\`)
- **Extensions:** Must include `.xlsx`
- **Spaces:** Avoid in filenames (use underscore or camelCase)

### Column Naming
- **No spaces:** Use underscores or camelCase
- **Avoid special characters:** Letters, numbers, underscore only
- **Length:** Keep reasonable (avoid 100+ char names)
- **Consistency:** Use same names across configuration

### Field Naming
- **From Mapping:** Use exact names from Inclusions_Mapping or Organizations_Mapping
- **Case-Sensitive:** Field_Name ≠ field_name
- **Match Required:** Must exist in mapping

### Excel Named Ranges
- **Define in Excel:** Formulas → Name Manager → New
- **Naming:** Same rules as column naming
- **Scope:** Sheet-level or Workbook-level both OK
- **Used in:** `target` column of Excel_Sheets

---

## Configuration Examples

### Example 1: Simple Patient Report

**Excel_Workbooks:**
```
workbook_name   | template_path         | output_filename                 | output_exists_action
Endobest_Report | templates/Simple.xlsx | Report_{extract_date_time}.xlsx | Increment
```

**Excel_Sheets:**
```
workbook_name   | sheet_name | source_type | target     | column_mapping                                                         | filter_condition     | sort_keys
Endobest_Report | Patients   | Inclusions  | PatientTbl | {"ID": "patient_id", "Name": "patient_name", "Date": "date_inclusion"} | {"status": "active"} | [["date_inclusion", "asc"]]
```

### Example 2: Multi-Sheet Report

**Excel_Workbooks:**
```
workbook_name | template_path        | output_filename                      | output_exists_action
FullReport    | templates/Multi.xlsx | {workbook_name}_{extract_month}.xlsx | Overwrite
```

**Excel_Sheets (3 rows):**
```
Row 1 (Title):
workbook_name | sheet_name | source_type | target    | column_mapping | filter_condition | sort_keys
FullReport    | Cover      | Variable    | TitleCell | NULL           | NULL             | NULL

Row 2 (Inclusions):
workbook_name | sheet_name | source_type | target | column_mapping                                                             | filter_condition     | sort_keys
FullReport    | Inclusions | Inclusions  | IncTbl | {"col_id": "patient_id", "col_name": "patient_name", "col_site": "site_id"} | {"status": "active"} | [["date_visit", "desc"]]

Row 3 (Organizations):
workbook_name | sheet_name | source_type   | target | column_mapping                                 | filter_condition | sort_keys
FullReport    | Summary    | Organizations | OrgTbl | {"Name": "org_name", "Count": "patient_count"} | NULL             | [["org_name", "asc"]]
```

---

## Validation & Error Messages

### Configuration Errors (Startup)

**Template file missing:**
```
✗ CRITICAL: Template file missing: config/templates/Missing.xlsx
```
**Fix:** Verify file exists and path is correct

**Named range not found:**
```
✗ CRITICAL: Named range not found: 'DataTable' in sheet 'Inclusions'
```
**Fix:** Create named range in Excel or correct the name in configuration

**Column reference invalid:**
```
✗ CRITICAL: Column mapping references invalid field: 'unknown_field'
```
**Fix:** Check field name matches Inclusions_Mapping or Organizations_Mapping exactly

**JSON parse error:**
```
✗ CRITICAL: Invalid JSON in column_mapping: {col_id: "patient_id"}
```
**Fix:** Ensure all JSON fields use double quotes and valid syntax

### Runtime Errors

**No matching data:**
```
⚠ WARNING: Filter condition found no matching items for sheet 'Inclusions'
```
**Possible Causes:**
- Filter too restrictive
- Filter field doesn't exist
- No data in source

**Fix:** Review
filter_condition, check data exists

**File write error:**
```
✗ ERROR: Could not write file: Permission denied
```
**Possible Causes:**
- File open in another program
- No write permissions
- Disk full

**Fix:** Close Excel, check permissions, check disk space

---

## Best Practices

### Configuration Management

1. **Backup Config**
   - Keep version history
   - Comment changes in Excel or separate document
2. **Test Changes**
   - Use `--excel_only` mode for quick testing
   - Run full process periodically to verify
3. **Document Mappings**
   - Maintain spreadsheet of field meanings
   - Update when fields change
4. **Naming Consistency**
   - Use same field names across tables
   - Use descriptive, self-documenting names

### Performance Optimization

1. **Filter Early**
   - Use filter_condition to reduce data
   - Smaller datasets = faster processing
2. **Smart Sorting**
   - Don't sort if not needed
   - Sort by indexed fields when possible
3. **Template Optimization**
   - Minimize template complexity
   - Remove unnecessary formulas

### Data Quality

1. **Validation**
   - Verify filter_condition results
   - Check sort_keys order makes sense
   - Test value_replacement transformations
2. **Documentation**
   - Document why each filter exists
   - Document expected results
   - Include contact info for questions

### Security

1. **File Permissions**
   - Restrict config file access (contains sensitive paths)
   - Backup encrypted if needed
2. **Data Privacy**
   - Excel files contain patient data
   - Handle per organization policy
   - Ensure secure storage/transmission

---

## Troubleshooting

### Configuration Issues

**"Excel config file not found"**
- Path: `config/Endobest_Dashboard_Config.xlsx`
- Check file exists in correct location

**"Required column missing"**
- Check all required columns present
- Don't delete or rename columns
- Use exact column names

**"Workbook name mismatch"**
- Excel_Sheets.workbook_name must match Excel_Workbooks.workbook_name exactly
- Check spelling and case

### Template Issues

**"Template file not found"**
- Verify file in `config/templates/` folder
- Check path relative to config (not root)
- Example correct: `templates/MyTemplate.xlsx`
- Example incorrect: `config/templates/MyTemplate.xlsx`

**"Named range not found"**
- Open template in Excel
- Formulas → Name Manager
- Verify range exists and spelling matches

**"Invalid target cell"**
- Check cell reference format (A1, B10, etc.) or range name
- Verify cell/range exists in sheet

### Data Issues

**"No data in Excel cells"**
- Check filter_condition isn't too restrictive
- Verify source data exists (run --check-only)
- Check column_mapping field names are correct

**"Column order wrong"**
- Column order determined by column_mapping object key order
- In newer Excel: right-click → "Edit in formula bar" to see order
- Reorder keys in JSON to change column order

**"Values not replaced"**
- Check value_replacement type matches actual data type
- Boolean True ≠ string "true"
- Check rule order (first match wins)

**"Dates sorting incorrectly"**
- Dates must be ISO format: YYYY-MM-DD
- Check field value format
- If text looks like a date but is formatted as text in Excel, it may sort alphabetically

---

## Advanced Configuration

### Template Variables in Variable Cells

Use variables to populate single cells:

```
target: TimestampCell
source_type: Variable

In Excel template, cell value: "Extracted: {extract_date_time}"
Result: "Extracted: 2025-01-15T14:30:45+01:00"
```

### Dynamic Filenames

Create filenames that reflect data/content:

```
output_filename: "{workbook_name}_{extract_year}_{extract_month}.xlsx"

Results in:
  "Statistics_2025_01.xlsx"
  "Endobest_Output_2025_01.xlsx"
```

### Cascading Filters & Sorts

Apply multiple rules:

```
filter_condition: {"status": "active", "center": "PARIS01", "type": "inclusion"}
sort_keys: [
  ["visit_order", "asc"],
  ["date_visit", "desc"],
  ["patient_name", "asc"]
]
```

---

**End of Configuration Guide**

For user guide, see DOCUMENTATION_98_USER_GUIDE.md
For architecture details, see DOCUMENTATION_13_EXCEL_EXPORT.md