Endobest Dashboard - Configuration Guide

Document Version: 1.0
Last Updated: 2025-11-08
Audience: System Administrators, Configuration Managers
Language: English


Configuration Overview

The Endobest Dashboard is configured entirely through Excel files - no code changes needed.

Main Configuration File

File Location: config/Endobest_Dashboard_Config.xlsx

Contains:

  • Inclusions_Mapping - Field definitions for inclusion data
  • Organizations_Mapping - Field definitions for organization data
  • Excel_Workbooks - Metadata for Excel export
  • Excel_Sheets - Sheet definitions and data transformation rules
  • Regression_Check - Quality check rules

This guide focuses on the Excel_Workbooks and Excel_Sheets tables, which configure the Excel export.


Table of Contents

  1. File Location & Structure
  2. Inclusions_Mapping (Reference)
  3. Organizations_Mapping (Reference)
  4. Excel_Workbooks Table
  5. Excel_Sheets Table
  6. Data Types & Formats
  7. JSON Field Specifications
  8. Naming Conventions
  9. Configuration Examples
  10. Validation & Error Messages
  11. Best Practices
  12. Troubleshooting
  13. Advanced Configuration

File Location & Structure

Directory Layout

Endobest Dashboard/
├── eb_dashboard.py (main script)
├── config/
│   ├── Endobest_Dashboard_Config.xlsx (← CONFIGURATION FILE)
│   ├── Endobest_Extended_Fields.xlsx (old, deprecated)
│   ├── eb_org_center_mapping.xlsx
│   └── templates/
│       ├── Endobest_Template.xlsx
│       ├── Statistics_Template.xlsx
│       └── (other templates)
├── endobest_inclusions.json (output)
├── endobest_organizations.json (output)
└── dashboard.log

Opening & Editing

  1. Open config/Endobest_Dashboard_Config.xlsx in Excel
  2. Go to specific sheet tab
  3. Edit rows as needed
  4. Save file
  5. Run script - changes take effect on next run

Important: Do NOT change column order or delete required columns.


Inclusions_Mapping (Reference)

This table defines which patient fields to include in export.

Purpose

Specifies which inclusion data fields are available for use in:

  • Excel export (column_mapping in Excel_Sheets)
  • Quality checks
  • Regression testing

Columns

Column          | Type   | Example            | Notes
Field_Selection | Action | [["include", "."]] | Pipeline of include/exclude actions
Field_Name      | Text   | patient_id         | Internal name used in column_mapping

Usage in Excel Export

The Field_Name values are used in column_mapping:

{
  "col_patient_id": "patient_id",
  "col_name": "patient_name",
  "col_status": "inclusion_status"
}

Each entry maps an Excel column name (key) to an inclusion field name (value).


Organizations_Mapping (Reference)

This table defines which organization fields to include in export.

Purpose

Specifies which organization data fields are available for use in:

  • Excel export (column_mapping for Organizations source_type)
  • Quality checks

Columns

Column     | Type | Example  | Notes
Field_Name | Text | org_id   | Internal name
org_id     | Text | org.id   | Data source path
org_name   | Text | org.name | Organization name

Usage in Excel Export

The Field_Name values are used in column_mapping:

{
  "col_org_code": "org_id",
  "col_org_name": "org_name"
}

Excel_Workbooks Table

Defines metadata for each Excel file to generate.

Purpose

Specifies WHAT Excel files to create, using which templates, with what naming.

Column Definitions

workbook_name (Required)

  • Type: Text
  • Length: 1-255 characters
  • Example: Endobest_Output, Statistics_Report, Monthly_Summary
  • Usage: Unique identifier referenced in Excel_Sheets table
  • Rules: Must be unique within the table
  • Notes: Used in template variables as {workbook_name}

template_path (Required)

  • Type: Text (file path)
  • Example: templates/Endobest_Template.xlsx
  • Relative To: config/ folder
  • Rules: Path is relative, not absolute
  • Validation: Script checks file exists before export
  • Notes: Template must be valid Excel (.xlsx) file
  • Error if:
    • File doesn't exist
    • File is not .xlsx format
    • Path is absolute instead of relative

output_filename (Required)

  • Type: Text (filename template)
  • Example: {workbook_name}_{extract_date_time}.xlsx
  • Available Variables:
    • {workbook_name} - From workbook_name column
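    • {extract_date_time} - Full ISO datetime (2025-01-15T14:30:45+01:00); the example results below show it written with hyphens instead of colons and without the UTC offset when it appears in a filename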
    • {extract_year} - Year (2025)
    • {extract_month} - Month (01-12)
    • {extract_day} - Day (01-31)
  • Processed As: Python format string expanded via str.format()
  • Example Results:
    • Report_{extract_date_time}.xlsx → Report_2025-01-15T14-30-45.xlsx
    • {workbook_name}_Month{extract_month}.xlsx → Endobest_Output_Month01.xlsx
  • Rules:
    • Must include .xlsx extension
    • Must be valid filename (no /, \, :, *, ?, ", <, >, |)
    • Variables are case-sensitive
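
The placeholder expansion and filename rules above could be sketched as follows. This is a minimal illustration, not the script's actual code: the function name render_output_filename is hypothetical, and the hyphenated timestamp format is an assumption taken from the example results (a colon is not a valid filename character).

```python
from datetime import datetime

# Characters the rules above forbid in a filename.
INVALID_FILENAME_CHARS = '/\\:*?"<>|'


def render_output_filename(template: str, workbook_name: str, extract_dt: datetime) -> str:
    """Expand the documented placeholders with str.format() and check the result."""
    variables = {
        "workbook_name": workbook_name,
        # Assumption: colons become hyphens and the UTC offset is dropped,
        # as in the example results of this guide.
        "extract_date_time": extract_dt.strftime("%Y-%m-%dT%H-%M-%S"),
        "extract_year": extract_dt.strftime("%Y"),
        "extract_month": extract_dt.strftime("%m"),
        "extract_day": extract_dt.strftime("%d"),
    }
    # Unknown or wrongly cased placeholders raise KeyError here.
    filename = template.format(**variables)
    if not filename.endswith(".xlsx"):
        raise ValueError(f"output_filename must end with .xlsx: {filename}")
    if any(char in INVALID_FILENAME_CHARS for char in filename):
        raise ValueError(f"output_filename contains an invalid character: {filename}")
    return filename


print(render_output_filename("{workbook_name}_Month{extract_month}.xlsx",
                             "Endobest_Output", datetime(2025, 1, 15, 14, 30, 45)))
```

Because str.format() looks placeholders up by exact name, a template such as {Workbook_Name} would fail rather than silently produce a wrong filename, which is consistent with the case-sensitivity rule.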

output_exists_action (Required)

  • Type: Text (one of three values)
  • Valid Values:
    • Overwrite - Replace existing file
    • Increment - Append _1, _2, etc.
    • Backup - Rename existing to _backup_1, etc.
  • Default: Increment (recommended for safety)
  • Behavior:
Action    | If file exists             | Result
Overwrite | report.xlsx                | Deletes report.xlsx, creates new
Increment | report.xlsx, report_1.xlsx | Creates report_2.xlsx
Backup    | report.xlsx                | Renames to report_backup_1.xlsx, creates new report.xlsx
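
The three behaviors in the table can be illustrated with a short sketch. The helper names next_free_path and resolve_output_path are hypothetical, and the real script may handle edge cases differently; the code only reproduces the outcomes listed above.

```python
from pathlib import Path


def next_free_path(path: Path, infix: str) -> Path:
    """Return the first sibling named <stem><infix><n><suffix> that does not exist yet."""
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem}{infix}{n}{path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def resolve_output_path(path: Path, action: str) -> Path:
    """Apply output_exists_action and return the path the new file should be written to."""
    if not path.exists():
        return path
    if action == "Overwrite":
        path.unlink()  # delete the existing file, then reuse the name
        return path
    if action == "Increment":
        return next_free_path(path, "_")  # report_1.xlsx, report_2.xlsx, ...
    if action == "Backup":
        path.rename(next_free_path(path, "_backup_"))  # report_backup_1.xlsx, ...
        return path
    raise ValueError(f"Unknown output_exists_action: {action!r}")
```

Increment never touches existing files, which is why the guide recommends it as the safe choice.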

Row Rules

  • Each row generates ONE Excel file
  • All columns must be filled (no empty cells)
  • workbook_name must be unique
  • Multiple workbooks allowed

Example Rows

Row 1:
  workbook_name: Endobest_Output
  template_path: templates/Endobest_Template.xlsx
  output_filename: {workbook_name}_{extract_date_time}.xlsx
  output_exists_action: Increment

Row 2:
  workbook_name: Statistics_Report
  template_path: templates/Statistics.xlsx
  output_filename: {workbook_name}_{extract_year}-{extract_month}.xlsx
  output_exists_action: Overwrite

Excel_Sheets Table

Defines how to fill sheets within the workbooks.

Purpose

Specifies HOW to fill each sheet:

  • Which data to use (Inclusions/Organizations/Variable)
  • How to transform it (filter, sort, replace)
  • Where to put it (target cell/range)

Column Definitions

workbook_name (Required)

  • Type: Text
  • Example: Endobest_Output
  • Rules: Must match exactly one row in Excel_Workbooks table
  • Validation: Script checks reference exists

sheet_name (Required)

  • Type: Text
  • Example: Inclusions, Summary, Organizations
  • Rules: Must match sheet name in template exactly
  • Validation: Script checks sheet exists in template

source_type (Required)

  • Type: Text (one of three values)
  • Valid Values:
    • Variable - Single variable value (timestamp, text, etc.)
    • Inclusions - Patient inclusion data
    • Organizations - Organization data
  • Rules: Determines what column_mapping is required

target (Required)

  • Type: Text (cell reference or named range)
  • Format:
    • Cell reference: A1, B10
    • Named range: Title_Cell, DataTable, InclusionsRange, etc.
  • For Variable: Single cell (not a range)
  • For Inclusions/Organizations: Named range with height=1 (single row for headers, data below)
  • Validation: Script checks target exists in template

column_mapping (Conditional)

  • Required If: source_type = Inclusions OR Organizations
  • Type: JSON object
  • Format: {"excel_column_name": "data_field_name", ...}
  • Example (Inclusions):
    {
      "col_id": "patient_id",
      "col_name": "patient_name",
      "col_status": "inclusion_status",
      "col_date": "date_inclusion"
    }
    
  • Example (Organizations):
    {
      "col_code": "org_id",
      "col_name": "org_name",
      "col_count": "patient_count"
    }
    
  • Field Names: Must match names in Inclusions_Mapping or Organizations_Mapping
  • Column Order: Determines order of columns in Excel (left to right)
  • Validation: Script checks all field names exist in mapping
  • For Variable: Leave empty (NULL or omit)
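
As a rough sketch of how a column_mapping could be applied, the function below validates the field names and projects each item into a row whose column order follows the JSON key order. The name project_rows and the known_fields argument (standing in for the Field_Name values of the mapping tables) are assumptions made for illustration.

```python
import json


def project_rows(items, column_mapping_json, known_fields):
    """Return (header, rows) with columns in the order of the column_mapping keys."""
    mapping = json.loads(column_mapping_json)  # json.loads preserves key order
    unknown = [field for field in mapping.values() if field not in known_fields]
    if unknown:
        raise ValueError(f"Column mapping references invalid field(s): {unknown}")
    header = list(mapping.keys())
    # Fields absent from an item become None (an empty cell).
    rows = [[item.get(field) for field in mapping.values()] for item in items]
    return header, rows


header, rows = project_rows(
    [{"patient_id": "P1", "patient_name": "Ann", "inclusion_status": "active"}],
    '{"col_id": "patient_id", "col_name": "patient_name"}',
    {"patient_id", "patient_name", "inclusion_status"},
)
print(header, rows)
```

Reordering the keys in the JSON is enough to reorder the columns; no other setting is involved.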

filter_condition (Optional)

  • Type: JSON object (AND conditions)
  • Default: NULL (no filtering, all items included)
  • Format: {"field_name": expected_value, ...}
  • Example:
    {
      "status": "active",
      "visit_type": "inclusion"
    }
    
  • Logic: AND (all conditions must match)
    • Item with {"status": "active", "visit_type": "inclusion"} → MATCHES
    • Item with {"status": "active", "visit_type": "follow-up"} → DOES NOT MATCH
  • Nested Fields: Support dot notation
    • "patient.status": "active" matches {"patient": {"status": "active"}}
  • For Variable: Ignored (leave NULL)
  • Types: String, number, boolean values all supported
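
The AND logic and dot notation described above can be sketched in a few lines. The helper names get_field, matches, and filter_items are hypothetical, and plain equality is assumed for the comparison.

```python
_MISSING = object()  # sentinel: distinguishes "field absent" from a stored null


def get_field(item, dotted_name):
    """Follow a dotted path such as "patient.status" through nested dicts."""
    current = item
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(item, condition):
    """True when every key/value pair of the condition matches (AND logic)."""
    return all(get_field(item, name) == expected
               for name, expected in (condition or {}).items())


def filter_items(items, condition):
    """A NULL (None) or empty condition keeps every item."""
    return [item for item in items if matches(item, condition)]


items = [
    {"status": "active", "visit_type": "inclusion"},
    {"status": "active", "visit_type": "follow-up"},
]
print(filter_items(items, {"status": "active", "visit_type": "inclusion"}))
```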

sort_keys (Optional)

  • Type: JSON array of sort specifications
  • Default: NULL (no sorting, original order)
  • Format: [["field_name", "asc"|"desc"], ["field2", "order", "option"], ...]
  • Example:
    [
      ["date_visit", "desc"],
      ["patient_name", "asc"]
    ]
    
  • Primary/Secondary: First array element is primary sort, second is secondary, etc.
  • Options: Third element can be datetime format ("%Y-%m-%d") or "*natsort" for alphanumeric sorting
  • Order Values:
    • "asc" - Ascending (A→Z, 0→9, old→new dates)
    • "desc" - Descending (Z→A, 9→0, new→old dates)
  • Missing Fields: Items with missing field placed at end
  • Datetime: Auto-detected from ISO format (YYYY-MM-DD) - no configuration needed
  • For Variable: Ignored (leave NULL)
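
A multi-key sort with mixed directions can be built from repeated stable sorts, applied from the last key to the first. The sketch below assumes that a "missing" field means the key is absent or null, and that ISO dates are stored as strings (which sort chronologically as text); the function name sort_items is hypothetical and the optional third element is not handled here.

```python
def _sort_value(value):
    # Strings are compared case-insensitively; other values are compared as-is.
    return value.casefold() if isinstance(value, str) else value


def sort_items(items, sort_keys):
    """Sort by several [field, order] pairs; items lacking a field go to the end."""
    result = list(items)
    # Python's sort is stable, so sorting by the least significant key first
    # and the primary key last yields the combined ordering.
    for spec in reversed(sort_keys or []):
        field, order = spec[0], spec[1]
        present = [item for item in result if item.get(field) is not None]
        missing = [item for item in result if item.get(field) is None]
        present.sort(key=lambda item: _sort_value(item[field]), reverse=(order == "desc"))
        result = present + missing
    return result


visits = [
    {"patient_name": "Bob", "date_visit": "2025-01-10"},
    {"patient_name": "alice", "date_visit": "2025-01-10"},
    {"patient_name": "Carl", "date_visit": "2025-02-01"},
    {"patient_name": "Dan"},
]
for row in sort_items(visits, [["date_visit", "desc"], ["patient_name", "asc"]]):
    print(row)
```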

value_replacement (Optional)

  • Type: JSON array of replacement rules

  • Default: NULL (no replacement, original values used)

  • Format: [{rule1}, {rule2}, ...]

  • Logic: First matching rule wins (stop at first match)

  • Types Supported:

    Boolean replacement:

    {
      "type": "bool",
      "true": "Yes",
      "false": "No"
    }
    
    • Matches: Python boolean True / False (not strings)
    • Replaces: True → "Yes", False → "No"

    String replacement:

    {
      "type": "str",
      "from": "active",
      "to": "Active Status"
    }
    
    • Matches: String "active" (exact, case-sensitive)
    • Does NOT match: "Active" or "ACTIVE"

    Integer replacement:

    {
      "type": "int",
      "from": 0,
      "to": "Not Applicable"
    }
    
    • Matches: Integer 0 (not string "0")
    • Replaces: 0 → "Not Applicable"
  • Type Matching: Strict - boolean True ≠ string "true"

  • Multiple Rules Example:

    [
      {"type": "bool", "true": "Yes", "false": "No"},
      {"type": "str", "from": "active", "to": "Active"},
      {"type": "str", "from": "inactive", "to": "Inactive"}
    ]
    
    • Booleans match first rule
    • "active" matches second rule
    • "inactive" matches third rule
    • Other strings pass through unchanged
  • For Variable: Ignored (leave NULL)
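
The first-match-wins logic and strict type matching can be sketched as below. The function name apply_replacements is hypothetical; the rule fields follow the formats documented above.

```python
def apply_replacements(value, rules):
    """Return the replacement from the first matching rule, or the value unchanged."""
    for rule in rules or []:
        kind = rule["type"]
        # type(...) is ... keeps matching strict: True is not treated as the
        # integer 1, and the string "true" is not treated as a boolean.
        if kind == "bool" and type(value) is bool:
            return rule["true"] if value else rule["false"]
        if kind == "str" and type(value) is str and value == rule["from"]:
            return rule["to"]
        if kind == "int" and type(value) is int and value == rule["from"]:
            return rule["to"]
    return value


rules = [
    {"type": "bool", "true": "Yes", "false": "No"},
    {"type": "str", "from": "active", "to": "Active"},
    {"type": "str", "from": "inactive", "to": "Inactive"},
]
for value in (True, "active", "ACTIVE", "true"):
    print(repr(value), "->", repr(apply_replacements(value, rules)))
```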

Row Rules

  • Each row defines ONE sheet in ONE workbook
  • source_type determines required fields:
    • Variable: column_mapping, filter_condition, sort_keys, value_replacement all ignored
    • Inclusions/Organizations: column_mapping REQUIRED, others optional
  • Multiple rows for same workbook allowed (multiple sheets)
  • Multiple rows for same sheet not recommended (last wins)
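
The row rules for both tables lend themselves to a simple pre-flight check. The sketch below assumes each table has already been loaded as a list of dicts keyed by column name; validate_config and its message texts are hypothetical, not the script's actual output.

```python
VALID_SOURCE_TYPES = {"Variable", "Inclusions", "Organizations"}


def validate_config(workbooks, sheets):
    """Return a list of human-readable problems; an empty list means the rows are consistent."""
    errors = []
    names = [wb["workbook_name"] for wb in workbooks]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            errors.append(f"Duplicate workbook_name: {name}")
    known = set(names)
    for row_number, sheet in enumerate(sheets, start=1):
        if sheet["workbook_name"] not in known:
            errors.append(f"Row {row_number}: unknown workbook_name {sheet['workbook_name']!r}")
        source_type = sheet.get("source_type")
        if source_type not in VALID_SOURCE_TYPES:
            errors.append(f"Row {row_number}: invalid source_type {source_type!r}")
        elif source_type != "Variable" and not sheet.get("column_mapping"):
            errors.append(f"Row {row_number}: column_mapping is required for {source_type}")
    return errors


print(validate_config(
    [{"workbook_name": "Report"}],
    [{"workbook_name": "Report", "source_type": "Inclusions"}],
))
```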

Example Configurations

Simple Inclusions Table:

workbook_name: Endobest_Output
sheet_name: Inclusions
source_type: Inclusions
target: DataTable
column_mapping: {"col_id": "patient_id", "col_name": "patient_name"}
filter_condition: {"status": "active"}
sort_keys: [["date_inclusion", "desc"]]
value_replacement: NULL

Multiple Sheets:

Row 1 (Title):
  workbook_name: Report
  sheet_name: Title
  source_type: Variable
  target: TitleCell
  (other columns ignored)

Row 2 (Inclusions):
  workbook_name: Report
  sheet_name: Data
  source_type: Inclusions
  target: InclusionTable
  column_mapping: {...}

Row 3 (Organizations):
  workbook_name: Report
  sheet_name: Orgs
  source_type: Organizations
  target: OrgTable
  column_mapping: {...}

Complex Transformations:

workbook_name: Statistics
sheet_name: SummaryData
source_type: Inclusions
target: SummaryTable
column_mapping: {
  "col_id": "patient_id",
  "col_status": "status",
  "col_activated": "is_activated"
}
filter_condition: {"status": "active"}
sort_keys: [
  ["status", "asc"],
  ["date_visit", "desc"]
]
value_replacement: [
  {"type": "bool", "true": "✓", "false": "✗"},
  {"type": "str", "from": "active", "to": "Active"},
  {"type": "str", "from": "pending", "to": "Pending"}
]

Data Types & Formats

Text Fields

  • Type: Plain text
  • Length: As needed
  • Special Characters: Allowed in values, but not in field names
  • Examples: patient_id, Inclusions, Endobest_Output

JSON Fields

  • Type: Valid JSON format
  • Validation: Must be valid JSON or NULL
  • Common Mistakes:
    • Missing quotes: {col_id: "patient_id"} ✗ (should be {"col_id": "patient_id"})
    • Single quotes: {'col_id': 'patient_id'} ✗ (JSON uses double quotes)
    • Trailing commas: {"a": 1,} ✗ (not valid JSON)
  • Validation: Script validates JSON parsing before use
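
The common mistakes listed above are all rejected by Python's standard json module, so a cell can be checked before it is used. In this sketch the function name parse_json_cell is hypothetical, and treating an empty cell or the literal text NULL as "no value" is an assumption.

```python
import json


def parse_json_cell(raw, column):
    """Parse a JSON cell; empty cells and the text NULL are treated as no value."""
    if raw is None or str(raw).strip() in ("", "NULL"):
        return None
    try:
        return json.loads(str(raw))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON in {column}: {raw} ({exc.msg})") from exc


print(parse_json_cell('{"col_id": "patient_id"}', "column_mapping"))

for bad in ('{col_id: "patient_id"}', "{'col_id': 'patient_id'}", '{"a": 1,}'):
    try:
        parse_json_cell(bad, "column_mapping")
    except ValueError as error:
        print(error)
```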

Dates & Times

  • Format: ISO 8601 (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS)
  • Example: 2025-01-15, 2025-01-15T14:30:45
  • Timezone: Convert to UTC before storing
  • Auto-Detection: Script auto-detects datetime fields and parses correctly

JSON Field Specifications

column_mapping JSON

Structure:

{
  "excel_column_1": "field_name_1",
  "excel_column_2": "field_name_2",
  ...
}

Rules:

  • Keys (left side): Column names (can be any text)
  • Values (right side): Must match Inclusions_Mapping or Organizations_Mapping
  • Order: Determines column order in Excel (left to right)
  • Count: No limit, but must fit in target range

Validation:

  • All values must exist in source mapping
  • Extra columns cause error
  • Missing columns fill with blanks

filter_condition JSON

Structure:

{
  "field_1": value_1,
  "field_2": value_2,
  ...
}

Rules:

  • Keys (left side): Field names (from mapping)
  • Values (right side): Literal values to match
  • Logic: AND (all conditions must match)
  • Empty object: {} matches all (no filtering)

Value Types Supported:

  • String: "active"
  • Number: 123, 45.67
  • Boolean: true, false (JSON format, not quoted)
  • NULL: null

Example:

{
  "status": "active",
  "center_code": "PARIS01",
  "patient_count": 10
}

Matches only items with ALL three conditions.

sort_keys JSON

Structure:

[
  ["field_name_1", "asc"],
  ["field_name_2", "desc"],
  ["field_name_3", "asc", "option"]
]

Rules:

  • Array of arrays format (ordered list)
  • Each sort specification: [field, order] or [field, order, option]
  • Field: Must exist in source data
  • Order: "asc" or "desc" only
  • Option (optional): Special sorting behavior (see below)
  • Empty array: [] means no sorting

Field Matching:

  • Exact field name match required
  • Case-sensitive field names
  • String comparison: Case-insensitive by default
    • "Centre Evidens" comes before "CHU Hospital" (natural alphabetical order)

Optional Third Parameter:

  1. Datetime Format:

    ["date_field", "desc", "%Y-%m-%d"]
    
    • Provide Python strptime format for custom date parsing
    • Example formats: "%d/%m/%Y", "%Y-%m-%d %H:%M:%S"
  2. Natural Alphanumeric Sorting:

    ["patient_id", "asc", "*natsort"]
    
    • Use "*natsort" for natural sorting of alphanumeric codes
    • Correctly sorts: "ENDOBEST-003-3-BA" < "ENDOBEST-003-20-BA"
    • Also handles: "file2.txt" < "file10.txt", "v1.9" < "v1.10"
    • Perfect for patient IDs, version numbers, sequential codes
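
The effect of natural sorting can be approximated with the standard library alone. The sketch below is not the script's "*natsort" implementation; natural_key is a hypothetical helper that splits a string into text and number chunks so that numbers compare by value.

```python
import re


def natural_key(text):
    """Build a sort key in which digit runs compare numerically and text case-insensitively."""
    # re.split with a capturing group alternates text and digit chunks,
    # so odd positions always hold the digit runs.
    chunks = re.split(r"(\d+)", text)
    return [int(chunk) if index % 2 else chunk.casefold()
            for index, chunk in enumerate(chunks)]


codes = ["ENDOBEST-003-20-BA", "ENDOBEST-003-3-BA"]
print(sorted(codes))                   # plain text order puts "20" before "3"
print(sorted(codes, key=natural_key))  # natural order puts 3 before 20

# Case-insensitive comparison, as in the string comparison example above.
print(sorted(["CHU Hospital", "Centre Evidens"], key=str.casefold))
```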

value_replacement JSON

Structure:

[
  {
    "type": "TYPE_NAME",
    "TYPE_SPECIFIC_FIELDS": values
  },
  ...
]

Boolean Type:

{
  "type": "bool",
  "true": "Replacement for True",
  "false": "Replacement for False"
}

String Type:

{
  "type": "str",
  "from": "Source string",
  "to": "Replacement string"
}

Integer Type:

{
  "type": "int",
  "from": 123,
  "to": "Replacement"
}

Rules:

  • Each rule must have "type" field
  • Other fields required per type
  • Evaluated in order (first match wins)
  • NULL or empty array means no replacement

Naming Conventions

File & Path Naming

  • Paths: Relative to config/ folder
  • Separators: Use forward slash / (not backslash \)
  • Extensions: Must include .xlsx
  • Spaces: Avoid in filenames (use underscore or camelCase)

Column Naming

  • No spaces: Use underscores or camelCase
  • Avoid special characters: Letters, numbers, underscore only
  • Length: Keep reasonable (avoid 100+ char names)
  • Consistency: Use same names across configuration

Field Naming

  • From Mapping: Use exact names from Inclusions_Mapping or Organizations_Mapping
  • Case-Sensitive: Field_Name ≠ field_name
  • Match Required: Must exist in mapping

Excel Named Ranges

  • Define in Excel: Formulas → Name Manager → New
  • Naming: Same rules as column naming
  • Scope: Sheet-level or Workbook-level both OK
  • Used in: target column of Excel_Sheets

Configuration Examples

Example 1: Simple Patient Report

Excel_Workbooks:

workbook_name          | template_path              | output_filename                    | output_exists_action
Endobest_Report        | templates/Simple.xlsx      | Report_{extract_date_time}.xlsx    | Increment

Excel_Sheets:

workbook_name    | sheet_name | source_type  | target     | column_mapping                        | filter_condition | sort_keys
Endobest_Report  | Patients   | Inclusions   | PatientTbl | {"ID": "patient_id",                  | {"status":       | [["date_inclusion",
                 |            |              |            |  "Name": "patient_name",              |  "active"}       |   "asc"]]
                 |            |              |            |  "Date": "date_inclusion"}            |                  |

Example 2: Multi-Sheet Report

Excel_Workbooks:

workbook_name | template_path         | output_filename          | output_exists_action
FullReport    | templates/Multi.xlsx  | {workbook_name}_{extract_month}.xlsx | Overwrite

Excel_Sheets (3 rows):

Row 1 (Title):
workbook_name | sheet_name | source_type | target    | column_mapping | filter_condition | sort_keys
FullReport    | Cover      | Variable    | TitleCell | NULL           | NULL             | NULL

Row 2 (Inclusions):
workbook_name | sheet_name | source_type | target     | column_mapping                       | filter_condition     | sort_keys
FullReport    | Inclusions | Inclusions  | IncTbl     | {"col_id": "patient_id",             | {"status": "active"} | [["date_visit",
              |            |             |            |  "col_name": "patient_name",         |                      |   "desc"]]
              |            |             |            |  "col_site": "site_id"}              |                      |

Row 3 (Organizations):
workbook_name | sheet_name | source_type   | target    | column_mapping                | filter_condition | sort_keys
FullReport    | Summary    | Organizations | OrgTbl    | {"Name": "org_name",          | NULL             | [["org_name",
              |            |               |           |  "Count": "patient_count"}    |                  |   "asc"]]

Validation & Error Messages

Configuration Errors (Startup)

Template file missing:

✗ CRITICAL: Template file missing: config/templates/Missing.xlsx

Fix: Verify file exists and path is correct

Named range not found:

✗ CRITICAL: Named range not found: 'DataTable' in sheet 'Inclusions'

Fix: Create named range in Excel or correct the name in configuration

Column reference invalid:

✗ CRITICAL: Column mapping references invalid field: 'unknown_field'

Fix: Check field name matches Inclusions_Mapping or Organizations_Mapping exactly

JSON parse error:

✗ CRITICAL: Invalid JSON in column_mapping: {col_id: "patient_id"}

Fix: Ensure all JSON fields use double quotes and valid syntax

Runtime Errors

No matching data:

⚠ WARNING: Filter condition found no matching items for sheet 'Inclusions'

Possible Causes:

  • Filter too restrictive
  • Filter field doesn't exist
  • No data in source

Fix: Review filter_condition, check data exists

File write error:

✗ ERROR: Could not write file: Permission denied

Possible Causes:

  • File open in another program
  • No write permissions
  • Disk full

Fix: Close Excel, check permissions, check disk space

Best Practices

Configuration Management

  1. Backup Config

    • Keep version history
    • Comment changes in Excel or separate document
  2. Test Changes

    • Use --excel_only mode for quick testing
    • Run full process periodically to verify
  3. Document Mappings

    • Maintain spreadsheet of field meanings
    • Update when fields change
  4. Naming Consistency

    • Use same field names across tables
    • Use descriptive, self-documenting names

Performance Optimization

  1. Filter Early

    • Use filter_condition to reduce data
    • Smaller datasets = faster processing
  2. Smart Sorting

    • Don't sort if not needed
    • Sort by indexed fields when possible
  3. Template Optimization

    • Minimize template complexity
    • Remove unnecessary formulas

Data Quality

  1. Validation

    • Verify filter_condition results
    • Check sort_keys order makes sense
    • Test value_replacement transformations
  2. Documentation

    • Document why each filter exists
    • Document expected results
    • Include contact info for questions

Security

  1. File Permissions

    • Restrict config file access (contains sensitive paths)
    • Backup encrypted if needed
  2. Data Privacy

    • Excel files contain patient data
    • Handle per organization policy
    • Ensure secure storage/transmission

Troubleshooting

Configuration Issues

"Excel config file not found"

  • Path: config/Endobest_Dashboard_Config.xlsx
  • Check file exists in correct location

"Required column missing"

  • Check all required columns present
  • Don't delete or rename columns
  • Use exact column names

"Workbook name mismatch"

  • Excel_Sheets.workbook_name must match Excel_Workbooks.workbook_name exactly
  • Check spelling and case

Template Issues

"Template file not found"

  • Verify file in config/templates/ folder
  • Check path relative to config (not root)
  • Example correct: templates/MyTemplate.xlsx
  • Example incorrect: config/templates/MyTemplate.xlsx

"Named range not found"

  • Open template in Excel
  • Formulas → Name Manager
  • Verify range exists and spelling matches

"Invalid target cell"

  • Check cell reference format (A1, B10, etc.) or range name
  • Verify cell/range exists in sheet

Data Issues

"No data in Excel cells"

  • Check filter_condition isn't too restrictive
  • Verify source data exists (run --check-only)
  • Check column_mapping field names are correct

"Column order wrong"

  • Column order determined by column_mapping object key order
  • In newer Excel: right-click → "Edit in formula bar" to see order
  • Reorder keys in JSON to change column order

"Values not replaced"

  • Check value_replacement type matches actual data type
  • Boolean True ≠ string "true"
  • Check rule order (first match wins)

"Dates sorting incorrectly"

  • Dates must be ISO format: YYYY-MM-DD
  • Check field value format
  • If text looks like date but formats as text in Excel, may sort alphabetically

Advanced Configuration

Template Variables in Variable Cells

Use variables to populate single cells:

target: TimestampCell
source_type: Variable

In Excel template, cell value:
"Extracted: {extract_date_time}"

Result:
"Extracted: 2025-01-15T14:30:45+01:00"

Dynamic Filenames

Create filenames that reflect data/content:

output_filename: "{workbook_name}_{extract_year}_{extract_month}.xlsx"

Results in:
"Statistics_2025_01.xlsx"
"Endobest_Output_2025_01.xlsx"

Cascading Filters & Sorts

Apply multiple rules:

filter_condition: {"status": "active", "center": "PARIS01", "type": "inclusion"}
sort_keys: [
  ["visit_order", "asc"],
  ["date_visit", "desc"],
  ["patient_name", "asc"]
]

End of Configuration Guide

For user guide, see DOCUMENTATION_98_USER_GUIDE.md
For architecture details, see DOCUMENTATION_13_EXCEL_EXPORT.md