Abdelkouddous LHACHIMI cb8b5d9a12 Version fonctionnelle (working version)

2025-12-12 23:07:26 +01:00

Endobest Dashboard - Configuration Guide

Document Version: 1.0
Last Updated: 2025-11-08
Audience: System Administrators, Configuration Managers
Language: English

Configuration Overview

The Endobest Dashboard is configured entirely through Excel files - no code changes needed.

Main Configuration File

File Location: config/Endobest_Dashboard_Config.xlsx

Contains:

Inclusions_Mapping - Field definitions for inclusion data
Organizations_Mapping - Field definitions for organization data
Excel_Workbooks - Metadata for Excel export
Excel_Sheets - Sheet definitions and data transformation rules
Regression_Check - Quality check rules

This guide focuses on Excel_Workbooks and Excel_Sheets tables (for Excel export configuration).

Table of Contents

File Location & Structure
Inclusions_Mapping (Reference)
Organizations_Mapping (Reference)
Excel_Workbooks Table
Excel_Sheets Table
Data Types & Formats
JSON Field Specifications
Naming Conventions
Configuration Examples
Validation & Error Messages
Best Practices
Troubleshooting

File Location & Structure

Directory Layout

Endobest Dashboard/
├── eb_dashboard.py (main script)
├── config/
│   ├── Endobest_Dashboard_Config.xlsx (← CONFIGURATION FILE)
│   ├── Endobest_Extended_Fields.xlsx (old, deprecated)
│   ├── eb_org_center_mapping.xlsx
│   └── templates/
│       ├── Endobest_Template.xlsx
│       ├── Statistics_Template.xlsx
│       └── (other templates)
├── endobest_inclusions.json (output)
├── endobest_organizations.json (output)
└── dashboard.log

Opening & Editing

Open config/Endobest_Dashboard_Config.xlsx in Excel
Go to the relevant sheet tab
Edit rows as needed
Save the file
Run the script; changes take effect on the next run

Important: Do NOT change column order or delete required columns.

Inclusions_Mapping (Reference)

This table defines which patient fields to include in export.

Purpose

Specifies which inclusion data fields are available for use in:

Excel export (column_mapping in Excel_Sheets)
Quality checks
Regression testing

Columns

Column	Type	Example	Notes
Field_Selection	Action	[["include", "."]]	Pipeline of include/exclude actions
Field_Name	Text	patient_id	Internal name used in column_mapping

Usage in Excel Export

The Field_Name values are used in column_mapping:

{
  "col_patient_id": "patient_id",
  "col_name": "patient_name",
  "col_status": "inclusion_status"
}

Map Excel Column Name → Inclusion Field Name

Organizations_Mapping (Reference)

This table defines which organization fields to include in export.

Purpose

Specifies which organization data fields are available for use in:

Excel export (column_mapping for Organizations source_type)
Quality checks

Columns

Column	Type	Example	Notes
Field_Name	Text	org_id	Internal name
org_id	Text	org.id	Data source path
org_name	Text	org.name	Organization name

Usage in Excel Export

The Field_Name values are used in column_mapping:

{
  "col_org_code": "org_id",
  "col_org_name": "org_name"
}

Excel_Workbooks Table

Defines metadata for each Excel file to generate.

Purpose

Specifies WHAT Excel files to create, using which templates, with what naming.

Column Definitions

workbook_name (Required)

Type: Text
Length: 1-255 characters
Example: Endobest_Output, Statistics_Report, Monthly_Summary
Usage: Unique identifier referenced in Excel_Sheets table
Rules: Must be unique within the table
Notes: Used in template variables as {workbook_name}

template_path (Required)

Type: Text (file path)
Example: templates/Endobest_Template.xlsx
Relative To: config/ folder
Rules: Path is relative, not absolute
Validation: Script checks file exists before export
Notes: Template must be valid Excel (.xlsx) file
Error if:
- File doesn't exist
- File is not .xlsx format
- Path is absolute instead of relative
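The template_path checks above can be sketched in Python as follows. This is a minimal illustration, not the dashboard's own code: the function name resolve_template_path and its config_dir argument are hypothetical, and the ".xlsx format" check is approximated by the file extension only.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def resolve_template_path(config_dir: Path, template_path: str) -> Path:
    """Hypothetical helper: validate a template_path cell and resolve it under config/."""
    # Reject absolute paths in POSIX form (/x/y) and Windows form (C:\x\y or \\server\share).
    if PurePosixPath(template_path).is_absolute() or PureWindowsPath(template_path).is_absolute():
        raise ValueError(f"template_path must be relative to config/: {template_path}")
    # Assumption: ".xlsx format" is checked by extension only, not by opening the file.
    if PurePosixPath(template_path).suffix.lower() != ".xlsx":
        raise ValueError(f"Template must be an .xlsx file: {template_path}")
    full_path = config_dir / template_path
    if not full_path.is_file():
        raise FileNotFoundError(f"Template file missing: {full_path}")
    return full_path
```

With config_dir set to Path("config"), a value such as config/templates/MyTemplate.xlsx would resolve to config/config/templates/MyTemplate.xlsx and fail, which is why template paths are written relative to the config/ folder.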

output_filename (Required)

Type: Text (filename template)
Example: {workbook_name}_{extract_date_time}.xlsx
Available Variables:
- {workbook_name} - From workbook_name column
- {extract_date_time} - Full ISO datetime (2025-01-15T14:30:45+01:00)
- {extract_year} - Year (2025)
- {extract_month} - Month (01-12)
- {extract_day} - Day (01-31)
Processed As: Python str.format() template (f-string-style {placeholders}, substituted with .format(), not evaluated as an f-string)
Example Results:
- Report_{extract_date_time}.xlsx → Report_2025-01-15T14-30-45.xlsx
- {workbook_name}_Month{extract_month}.xlsx → Endobest_Output_Month01.xlsx
Rules:
- Must include .xlsx extension
- Must be a valid filename (no /, \, :, *, ?, ", <, >, |)
- Variables are case-sensitive
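A minimal sketch of the filename expansion, assuming str.format() with the variables listed above; the helper name build_output_filename is hypothetical. The guide documents {extract_date_time} as a full ISO datetime, but its example result is Report_2025-01-15T14-30-45.xlsx, so this sketch assumes the timestamp is rendered without the UTC offset and with hyphens instead of colons (":" is not allowed in filenames).

```python
from datetime import datetime, timedelta, timezone

# Characters listed above as invalid in output filenames.
FORBIDDEN_CHARS = set('/\\:*?"<>|')


def build_output_filename(template: str, workbook_name: str, extract_dt: datetime) -> str:
    """Hypothetical helper: expand an output_filename template with str.format()."""
    variables = {
        "workbook_name": workbook_name,
        # Assumption: rendered without the UTC offset and with '-' instead of ':',
        # matching the example result Report_2025-01-15T14-30-45.xlsx.
        "extract_date_time": extract_dt.strftime("%Y-%m-%dT%H-%M-%S"),
        "extract_year": f"{extract_dt.year:04d}",
        "extract_month": f"{extract_dt.month:02d}",
        "extract_day": f"{extract_dt.day:02d}",
    }
    # Unknown or miscased placeholders (e.g. {Workbook_Name}) raise KeyError.
    filename = template.format(**variables)
    if not filename.lower().endswith(".xlsx"):
        raise ValueError(f"output_filename must end with .xlsx: {filename}")
    bad = FORBIDDEN_CHARS.intersection(filename)
    if bad:
        raise ValueError(f"Invalid characters in filename: {''.join(sorted(bad))}")
    return filename


stamp = datetime(2025, 1, 15, 14, 30, 45, tzinfo=timezone(timedelta(hours=1)))
print(build_output_filename("{workbook_name}_Month{extract_month}.xlsx", "Endobest_Output", stamp))
# prints Endobest_Output_Month01.xlsx
```

Because str.format() looks placeholders up by exact name, the case-sensitivity rule above falls out naturally: a misspelled variable fails loudly instead of producing a wrong filename.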

output_exists_action (Required)

Type: Text (one of three values)
Valid Values:
- Overwrite - Replace existing file
- Increment - Append _1, _2, etc.
- Backup - Rename existing to _backup_1, etc.
Default: Increment (recommended for safety)
Behavior:

Action	If file exists	Result
Overwrite	`report.xlsx`	Deletes `report.xlsx`, creates new
Increment	`report.xlsx`, `report_1.xlsx`	Creates `report_2.xlsx`
Backup	`report.xlsx`	Renames to `report_backup_1.xlsx`, creates new `report.xlsx`
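The three actions in the table can be sketched with pathlib. This is an illustration only: resolve_output_path and _first_free are hypothetical names, and the dashboard's own implementation may differ in details such as how backup numbers are chosen.

```python
from pathlib import Path


def _first_free(path: Path, suffix_template: str) -> Path:
    """Return the first sibling stem + suffix_template.format(n) + ext that does not exist."""
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem}{suffix_template.format(n)}{path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def resolve_output_path(path: Path, action: str) -> Path:
    """Hypothetical helper: apply output_exists_action and return the path to write to."""
    if action not in ("Overwrite", "Increment", "Backup"):
        raise ValueError(f"Invalid output_exists_action: {action!r}")
    if not path.exists():
        return path                      # nothing to protect
    if action == "Overwrite":
        path.unlink()                    # delete report.xlsx, then write a new one
        return path
    if action == "Increment":
        return _first_free(path, "_{}")  # report_1.xlsx, report_2.xlsx, ...
    # Backup: move the existing file aside, then write a fresh report.xlsx
    path.rename(_first_free(path, "_backup_{}"))
    return path
```

Increment never touches existing files, which is why it is the safest choice when several runs may produce the same filename.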

Row Rules

Each row generates ONE Excel file
All columns must be filled (no empty cells)
workbook_name must be unique
Multiple workbooks allowed

Example Rows

Row 1:
  workbook_name: Endobest_Output
  template_path: templates/Endobest_Template.xlsx
  output_filename: {workbook_name}_{extract_date_time}.xlsx
  output_exists_action: Increment

Row 2:
  workbook_name: Statistics_Report
  template_path: templates/Statistics.xlsx
  output_filename: {workbook_name}_{extract_year}-{extract_month}.xlsx
  output_exists_action: Overwrite

Excel_Sheets Table

Defines how to fill sheets within the workbooks.

Purpose

Specifies HOW to fill each sheet:

Which data to use (Inclusions/Organizations/Variable)
How to transform it (filter, sort, replace)
Where to put it (target cell/range)

Column Definitions

workbook_name (Required)

Type: Text
Example: Endobest_Output
Rules: Must match exactly one row in Excel_Workbooks table
Validation: Script checks reference exists

sheet_name (Required)

Type: Text
Example: Inclusions, Summary, Organizations
Rules: Must match sheet name in template exactly
Validation: Script checks sheet exists in template

source_type (Required)

Type: Text (one of three values)
Valid Values:
- Variable - Single variable value (timestamp, text, etc.)
- Inclusions - Patient inclusion data
- Organizations - Organization data
Rules: Determines whether column_mapping is required (required for Inclusions and Organizations, ignored for Variable)

target (Required)

Type: Text (cell reference or named range)
Format:
- Cell reference: A1, B10 (or a single-cell named range such as Title_Cell)
- Named range: DataTable, InclusionsRange, etc.
For Variable: Single cell (not a range)
For Inclusions/Organizations: Named range with height=1 (single row for headers, data below)
Validation: Script checks target exists in template

column_mapping (Conditional)

Required If: source_type = Inclusions OR Organizations
Type: JSON object
Format: {"excel_column_name": "data_field_name", ...}

Example (Inclusions):

{
  "col_id": "patient_id",
  "col_name": "patient_name",
  "col_status": "inclusion_status",
  "col_date": "date_inclusion"
}

Example (Organizations):

{
  "col_code": "org_id",
  "col_name": "org_name",
  "col_count": "patient_count"
}

Field Names: Must match names in Inclusions_Mapping or Organizations_Mapping
Column Order: Determines order of columns in Excel (left to right)
Validation: Script checks all field names exist in mapping
For Variable: Leave empty (NULL or omit)

filter_condition (Optional)

Type: JSON object (AND conditions)
Default: NULL (no filtering, all items included)
Format: {"field_name": expected_value, ...}

Example:

{
  "status": "active",
  "visit_type": "inclusion"
}

Logic: AND (all conditions must match)
- Item with {"status": "active", "visit_type": "inclusion"} → MATCHES
- Item with {"status": "active", "visit_type": "follow-up"} → DOES NOT MATCH
Nested Fields: Support dot notation
- "patient.status": "active" matches {"patient": {"status": "active"}}
For Variable: Ignored (leave NULL)
Types: String, number, boolean values all supported
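The AND logic and dot notation described above can be sketched as follows. The function names are hypothetical, and two behaviors are assumptions of this sketch rather than documented rules: a field that is absent from an item never matches (not even a null condition), and values are compared with Python's ==, so no extra type strictness is added here.

```python
from typing import Optional

_MISSING = object()  # sentinel: distinguishes "field absent" from a JSON null value


def get_field(item: dict, dotted_name: str):
    """Follow dot notation, e.g. "patient.status" -> item["patient"]["status"]."""
    current = item
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches_filter(item: dict, condition: Optional[dict]) -> bool:
    """AND logic: every key/value pair in the condition must match."""
    if not condition:  # NULL or {} -> no filtering
        return True
    return all(get_field(item, name) == expected for name, expected in condition.items())


def apply_filter(items: list, condition: Optional[dict]) -> list:
    """Keep only the items that satisfy the filter_condition."""
    return [item for item in items if matches_filter(item, condition)]
```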

sort_keys (Optional)

Type: JSON array of sort specifications
Default: NULL (no sorting, original order)
Format: [["field_name", "asc"|"desc"], ["field2", "order", "option"], ...]

Example:

[
  ["date_visit", "desc"],
  ["patient_name", "asc"]
]

Primary/Secondary: First array element is primary sort, second is secondary, etc.
Options: Third element can be datetime format ("%Y-%m-%d") or "*natsort" for alphanumeric sorting
Order Values:
- "asc" - Ascending (A→Z, 0→9, old→new dates)
- "desc" - Descending (Z→A, 9→0, new→old dates)
Missing Fields: Items with missing field placed at end
Datetime: Auto-detected from ISO format (YYYY-MM-DD) - no configuration needed
For Variable: Ignored (leave NULL)
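A multi-key sort with these rules can be sketched using Python's stable sort: sorting by the least significant key first and the primary key last produces the combined order. The names are hypothetical, and the sketch assumes that None counts as a missing value and that each field holds values of one comparable type. The "*natsort" option is not implemented here (such values fall back to case-insensitive text), and ISO auto-detection is omitted because dates in a consistent YYYY-MM-DD format already sort chronologically as text.

```python
from datetime import datetime


def _sort_value(value, option):
    """Normalize one field value for comparison."""
    # Optional third element: a strptime format such as "%d/%m/%Y".
    if option and option != "*natsort" and isinstance(value, str):
        return datetime.strptime(value, option)
    if isinstance(value, str):
        return value.casefold()  # case-insensitive string comparison
    return value


def sort_items(items: list, sort_keys) -> list:
    """Multi-key sort: the first spec is primary; missing/None values go last."""
    result = list(items)
    # Python's sort is stable, so apply the least significant key first.
    for spec in reversed(sort_keys or []):
        field, order = spec[0], spec[1]
        option = spec[2] if len(spec) > 2 else None
        if order not in ("asc", "desc"):
            raise ValueError(f"sort order must be 'asc' or 'desc', got {order!r}")
        present = [item for item in result if item.get(field) is not None]
        missing = [item for item in result if item.get(field) is None]
        present.sort(key=lambda item: _sort_value(item[field], option),
                     reverse=(order == "desc"))
        result = present + missing  # missing fields stay at the end in both directions
    return result
```

Splitting present and missing items on every pass keeps missing values at the end for both "asc" and "desc", which a single reverse=True sort would not do.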

value_replacement (Optional)

Type: JSON array of replacement rules
Default: NULL (no replacement, original values used)
Format: [{rule1}, {rule2}, ...]
Logic: First matching rule wins (stop at first match)
Types Supported:

Boolean replacement:
```
{
  "type": "bool",
  "true": "Yes",
  "false": "No"
}
```
- Matches: Python boolean True / False (not strings)
- Replaces: True → "Yes", False → "No"
String replacement:
```
{
  "type": "str",
  "from": "active",
  "to": "Active Status"
}
```
- Matches: String "active" (exact, case-sensitive)
- Does NOT match: "Active" or "ACTIVE"
Integer replacement:
```
{
  "type": "int",
  "from": 0,
  "to": "Not Applicable"
}
```
- Matches: Integer 0 (not string "0")
- Replaces: 0 → "Not Applicable"
Type Matching: Strict - boolean True ≠ string "true"

Multiple Rules Example:

[
  {"type": "bool", "true": "Yes", "false": "No"},
  {"type": "str", "from": "active", "to": "Active"},
  {"type": "str", "from": "inactive", "to": "Inactive"}
]

Booleans match first rule
"active" matches second rule
"inactive" matches third rule
Other strings pass through unchanged

For Variable: Ignored (leave NULL)
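The strict, first-match-wins behavior can be sketched as below; replace_value is a hypothetical name. One Python detail matters for strict matching: bool is a subclass of int (False == 0), so the integer rule has to exclude booleans explicitly or False would be replaced by an int rule with "from": 0.

```python
def replace_value(value, rules):
    """Apply value_replacement rules; the first matching rule wins."""
    for rule in rules or []:
        kind = rule["type"]
        if kind == "bool" and isinstance(value, bool):
            return rule["true"] if value else rule["false"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if (kind == "int" and isinstance(value, int) and not isinstance(value, bool)
                and value == rule["from"]):
            return rule["to"]
        if kind == "str" and isinstance(value, str) and value == rule["from"]:
            return rule["to"]
    return value  # no rule matched: keep the original value
```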

Row Rules

Each row defines ONE sheet in ONE workbook
source_type determines which columns apply:
- Variable: column_mapping, filter_condition, sort_keys, value_replacement all ignored
- Inclusions/Organizations: column_mapping REQUIRED, others optional
Multiple rows for same workbook allowed (multiple sheets)
Multiple rows for same sheet not recommended (last wins)
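The row rules for both tables can be checked before any export runs. The sketch below assumes the rows have already been read from the configuration workbook into dicts keyed by column name (reading the .xlsx itself is not shown); validate_config and the constant names are hypothetical.

```python
VALID_SOURCE_TYPES = {"Variable", "Inclusions", "Organizations"}
VALID_EXISTS_ACTIONS = {"Overwrite", "Increment", "Backup"}
WORKBOOK_COLUMNS = ("workbook_name", "template_path", "output_filename", "output_exists_action")


def validate_config(workbooks: list, sheets: list) -> list:
    """Hypothetical cross-table checks on rows already read into dicts; returns error messages."""
    errors = []
    known_workbooks = set()
    for row in workbooks:
        missing = [column for column in WORKBOOK_COLUMNS if not row.get(column)]
        if missing:
            errors.append(f"Excel_Workbooks: empty cell(s) {missing} in row {row}")
        name = row.get("workbook_name")
        if name in known_workbooks:
            errors.append(f"Excel_Workbooks: duplicate workbook_name {name!r}")
        elif name:
            known_workbooks.add(name)
        action = row.get("output_exists_action")
        if action and action not in VALID_EXISTS_ACTIONS:
            errors.append(f"Excel_Workbooks: invalid output_exists_action {action!r}")
    for row in sheets:
        # Exact, case-sensitive match against Excel_Workbooks.workbook_name.
        if row.get("workbook_name") not in known_workbooks:
            errors.append(f"Excel_Sheets: unknown workbook_name {row.get('workbook_name')!r}")
        source = row.get("source_type")
        if source not in VALID_SOURCE_TYPES:
            errors.append(f"Excel_Sheets: invalid source_type {source!r}")
        elif source != "Variable" and not row.get("column_mapping"):
            errors.append(f"Excel_Sheets: column_mapping required for {source} sheet {row.get('sheet_name')!r}")
    return errors
```

Collecting every error instead of stopping at the first one lets a configuration manager fix all problems in a single editing pass.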

Example Configurations

Simple Inclusions Table:

workbook_name: Endobest_Output
sheet_name: Inclusions
source_type: Inclusions
target: DataTable
column_mapping: {"col_id": "patient_id", "col_name": "patient_name"}
filter_condition: {"status": "active"}
sort_keys: [["date_inclusion", "desc"]]
value_replacement: NULL

Multiple Sheets:

Row 1 (Title):
  workbook_name: Report
  sheet_name: Title
  source_type: Variable
  target: TitleCell
  (other columns ignored)

Row 2 (Inclusions):
  workbook_name: Report
  sheet_name: Data
  source_type: Inclusions
  target: InclusionTable
  column_mapping: {...}

Row 3 (Organizations):
  workbook_name: Report
  sheet_name: Orgs
  source_type: Organizations
  target: OrgTable
  column_mapping: {...}

Complex Transformations:

workbook_name: Statistics
sheet_name: SummaryData
source_type: Inclusions
target: SummaryTable
column_mapping: {
  "col_id": "patient_id",
  "col_status": "status",
  "col_activated": "is_activated"
}
filter_condition: {"status": "active"}
sort_keys: [
  ["status", "asc"],
  ["date_visit", "desc"]
]
value_replacement: [
  {"type": "bool", "true": "✓", "false": "✗"},
  {"type": "str", "from": "active", "to": "Active"},
  {"type": "str", "from": "pending", "to": "Pending"}
]

Data Types & Formats

Text Fields

Type: Plain text
Length: As needed
Special Characters: Allowed in values, but not in field names
Examples: patient_id, Inclusions, Endobest_Output

JSON Fields

Type: Valid JSON format
Validation: Must be valid JSON or NULL
Common Mistakes:
- Missing quotes: {col_id: "patient_id"} ✗ (should be {"col_id": "patient_id"})
- Single quotes: {'col_id': 'patient_id'} ✗ (JSON uses double quotes)
- Trailing commas: {"a": 1,} ✗ (not valid JSON)
Validation: Script validates JSON parsing before use
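Parsing a JSON cell can be sketched with the standard json module, which rejects all three common mistakes listed above. The helper name parse_json_cell is hypothetical, and treating an empty cell or the literal text NULL as "not set" is an assumption of this sketch.

```python
import json


def parse_json_cell(raw, column_name: str):
    """Hypothetical helper: parse one JSON configuration cell, or return None if unset."""
    # Assumption: an empty cell or the literal text NULL means "not set".
    if raw is None or str(raw).strip() in ("", "NULL"):
        return None
    try:
        return json.loads(str(raw))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON in {column_name}: {raw} ({exc.msg})") from exc
```

Unquoted keys, single quotes and trailing commas all raise json.JSONDecodeError, which the sketch turns into a message naming the offending column.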

Dates & Times

Format: ISO 8601 (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS)
Example: 2025-01-15, 2025-01-15T14:30:45
Timezone: Convert to UTC before storing
Auto-Detection: Script auto-detects datetime fields and parses correctly

JSON Field Specifications

column_mapping JSON

Structure:

{
  "excel_column_1": "field_name_1",
  "excel_column_2": "field_name_2",
  ...
}

Rules:

Keys (left side): Column names (can be any text)
Values (right side): Must match Inclusions_Mapping or Organizations_Mapping
Order: Determines column order in Excel (left to right)
Count: No limit, but must fit in target range

Validation:

All values must exist in source mapping
Extra columns cause error
Missing columns fill with blanks
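How a column_mapping turns items into a header row and data rows can be sketched as follows. build_table is a hypothetical name, known_fields stands for the Field_Name values of Inclusions_Mapping or Organizations_Mapping, and reading "Missing columns fill with blanks" as "an item lacking a mapped field gets a blank cell" is an assumption.

```python
import json


def build_table(column_mapping_json: str, known_fields: set, items: list) -> tuple:
    """Hypothetical helper: turn already filtered/sorted items into headers and data rows."""
    # json.loads returns a dict, which keeps the JSON key order (Python 3.7+).
    mapping = json.loads(column_mapping_json)
    unknown = [field for field in mapping.values() if field not in known_fields]
    if unknown:
        raise ValueError(f"Column mapping references invalid field: {unknown}")
    headers = list(mapping)  # Excel column names, left to right
    # Assumption: an item lacking a mapped field yields a blank (None) cell.
    rows = [[item.get(field) for field in mapping.values()] for item in items]
    return headers, rows
```

Because the parsed dict preserves key order, reordering keys in the JSON text is all it takes to reorder the Excel columns.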

filter_condition JSON

Structure:

{
  "field_1": value_1,
  "field_2": value_2,
  ...
}

Rules:

Keys (left side): Field names (from mapping)
Values (right side): Literal values to match
Logic: AND (all conditions must match)
Empty object: {} matches all (no filtering)

Value Types Supported:

String: "active"
Number: 123, 45.67
Boolean: true, false (JSON format, not quoted)
NULL: null

Example:

{
  "status": "active",
  "center_code": "PARIS01",
  "patient_count": 10
}

Matches only items with ALL three conditions.

sort_keys JSON

Structure:

[
  ["field_name_1", "asc"],
  ["field_name_2", "desc"],
  ["field_name_3", "asc", "option"]
]

Rules:

Array of arrays format (ordered list)
Each sort specification: [field, order] or [field, order, option]
Field: Must exist in source data
Order: "asc" or "desc" only
Option (optional): Special sorting behavior (see below)
Empty array: [] means no sorting

Field Matching:

Exact field name match required
Case-sensitive field names
String comparison: Case-insensitive by default
- "Centre Evidens" comes before "CHU Hospital" (natural alphabetical order)

Optional Third Parameter:

Datetime Format:
```
["date_field", "desc", "%Y-%m-%d"]
```
- Provide Python strptime format for custom date parsing
- Example formats: "%d/%m/%Y", "%Y-%m-%d %H:%M:%S"
Natural Alphanumeric Sorting:
```
["patient_id", "asc", "*natsort"]
```
- Use "*natsort" for natural sorting of alphanumeric codes
- Correctly sorts: "ENDOBEST-003-3-BA" < "ENDOBEST-003-20-BA"
- Also handles: "file2.txt" < "file10.txt", "v1.9" < "v1.10"
- Well suited to patient IDs, version numbers, and sequential codes
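One common way to get natural alphanumeric ordering in Python is a regex-based key function that splits text into text and number chunks. This is a sketch of the technique, not necessarily how "*natsort" is implemented in the dashboard; natsort_key is a hypothetical name, and comparing the text chunks case-insensitively is an assumption chosen to match the default string comparison.

```python
import re


def natsort_key(value: str) -> list:
    """Split text into alternating text/number chunks so numbers compare numerically."""
    # re.split with a capturing group keeps the digit runs at odd indices:
    # "ENDOBEST-003-20-BA" -> ["ENDOBEST-", "003", "-", "20", "-BA"]
    parts = re.split(r"(\d+)", value)
    return [int(part) if index % 2 else part.casefold() for index, part in enumerate(parts)]


codes = ["ENDOBEST-003-20-BA", "ENDOBEST-003-3-BA", "ENDOBEST-003-10-BA"]
print(sorted(codes, key=natsort_key))
# ['ENDOBEST-003-3-BA', 'ENDOBEST-003-10-BA', 'ENDOBEST-003-20-BA']
```

Using the chunk index rather than str.isdigit() to decide which chunks are numbers keeps text and numbers in fixed positions, so two keys never compare a string against an integer.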

value_replacement JSON

Structure:

[
  {
    "type": "TYPE_NAME",
    "TYPE_SPECIFIC_FIELDS": values
  },
  ...
]

Boolean Type:

{
  "type": "bool",
  "true": "Replacement for True",
  "false": "Replacement for False"
}

String Type:

{
  "type": "str",
  "from": "Source string",
  "to": "Replacement string"
}

Integer Type:

{
  "type": "int",
  "from": 123,
  "to": "Replacement"
}

Rules:

Each rule must have "type" field
Other fields required per type
Evaluated in order (first match wins)
NULL or empty array means no replacement

Naming Conventions

File & Path Naming

Paths: Relative to config/ folder
Separators: Use forward slash / (not backslash \)
Extensions: Must include .xlsx
Spaces: Avoid in filenames (use underscore or camelCase)

Column Naming

No spaces: Use underscores or camelCase
Avoid special characters: Letters, numbers, underscore only
Length: Keep reasonable (avoid 100+ char names)
Consistency: Use same names across configuration

Field Naming

From Mapping: Use exact names from Inclusions_Mapping or Organizations_Mapping
Case-Sensitive: Field_Name ≠ field_name
Match Required: Must exist in mapping

Excel Named Ranges

Define in Excel: Formulas → Name Manager → New
Naming: Same rules as column naming
Scope: Sheet-level or Workbook-level both OK
Used in: target column of Excel_Sheets

Configuration Examples

Example 1: Simple Patient Report

Excel_Workbooks:

workbook_name          | template_path              | output_filename                    | output_exists_action
Endobest_Report        | templates/Simple.xlsx      | Report_{extract_date_time}.xlsx    | Increment

Excel_Sheets:

workbook_name    | sheet_name | source_type | target     | column_mapping                                                          | filter_condition     | sort_keys
Endobest_Report  | Patients   | Inclusions  | PatientTbl | {"ID": "patient_id", "Name": "patient_name", "Date": "date_inclusion"} | {"status": "active"} | [["date_inclusion", "asc"]]

Example 2: Multi-Sheet Report

Excel_Workbooks:

workbook_name | template_path         | output_filename          | output_exists_action
FullReport    | templates/Multi.xlsx  | {workbook_name}_{extract_month}.xlsx | Overwrite

Excel_Sheets (3 rows):

Row 1 (Title):
workbook_name | sheet_name | source_type | target    | column_mapping | filter_condition | sort_keys
FullReport    | Cover      | Variable    | TitleCell | NULL           | NULL             | NULL

Row 2 (Inclusions):
workbook_name | sheet_name | source_type | target | column_mapping                                                               | filter_condition     | sort_keys
FullReport    | Inclusions | Inclusions  | IncTbl | {"col_id": "patient_id", "col_name": "patient_name", "col_site": "site_id"} | {"status": "active"} | [["date_visit", "desc"]]

Row 3 (Organizations):
workbook_name | sheet_name | source_type   | target | column_mapping                                 | filter_condition | sort_keys
FullReport    | Summary    | Organizations | OrgTbl | {"Name": "org_name", "Count": "patient_count"} | NULL             | [["org_name", "asc"]]

Validation & Error Messages

Configuration Errors (Startup)

Template file missing:

✗ CRITICAL: Template file missing: config/templates/Missing.xlsx

Fix: Verify file exists and path is correct

Named range not found:

✗ CRITICAL: Named range not found: 'DataTable' in sheet 'Inclusions'

Fix: Create named range in Excel or correct the name in configuration

Column reference invalid:

✗ CRITICAL: Column mapping references invalid field: 'unknown_field'

Fix: Check field name matches Inclusions_Mapping or Organizations_Mapping exactly

JSON parse error:

✗ CRITICAL: Invalid JSON in column_mapping: {col_id: "patient_id"}

Fix: Ensure all JSON fields use double quotes and valid syntax

Runtime Errors

No matching data:

⚠ WARNING: Filter condition found no matching items for sheet 'Inclusions'

Possible Causes:

Filter too restrictive
Filter field doesn't exist
No data in source

Fix: Review filter_condition, check data exists

File write error:

✗ ERROR: Could not write file: Permission denied

Possible Causes:

File open in another program
No write permissions
Disk full

Fix: Close Excel, check permissions, check disk space

Best Practices

Configuration Management

Back Up Config
- Keep version history
- Record changes in Excel cell comments or a separate document
Test Changes
- Use --excel_only mode for quick testing
- Run full process periodically to verify
Document Mappings
- Maintain spreadsheet of field meanings
- Update when fields change
Naming Consistency
- Use same field names across tables
- Use descriptive, self-documenting names

Performance Optimization

Filter Early
- Use filter_condition to reduce data
- Smaller datasets = faster processing
Smart Sorting
- Don't sort if not needed
- Sort by indexed fields when possible
Template Optimization
- Minimize template complexity
- Remove unnecessary formulas

Data Quality

Validation
- Verify filter_condition results
- Check sort_keys order makes sense
- Test value_replacement transformations
Documentation
- Document why each filter exists
- Document expected results
- Include contact info for questions

Security

File Permissions
- Restrict config file access (contains sensitive paths)
- Backup encrypted if needed
Data Privacy
- Excel files contain patient data
- Handle per organization policy
- Ensure secure storage/transmission

Troubleshooting

Configuration Issues

"Excel config file not found"

Path: config/Endobest_Dashboard_Config.xlsx
Check file exists in correct location

"Required column missing"

Check all required columns present
Don't delete or rename columns
Use exact column names

"Workbook name mismatch"

Excel_Sheets.workbook_name must match Excel_Workbooks.workbook_name exactly
Check spelling and case

Template Issues

"Template file not found"

Verify file in config/templates/ folder
Check path relative to config (not root)
Example correct: templates/MyTemplate.xlsx
Example incorrect: config/templates/MyTemplate.xlsx

"Named range not found"

Open template in Excel
Formulas → Name Manager
Verify range exists and spelling matches

"Invalid target cell"

Check cell reference format (A1, B10, etc.) or range name
Verify cell/range exists in sheet

Data Issues

"No data in Excel cells"

Check filter_condition isn't too restrictive
Verify source data exists (run --check-only)
Check column_mapping field names are correct

"Column order wrong"

Column order determined by column_mapping object key order
Select the column_mapping cell and read its full content in the formula bar to check the key order
Reorder keys in JSON to change column order

"Values not replaced"

Check value_replacement type matches actual data type
Boolean True ≠ string "true"
Check rule order (first match wins)

"Dates sorting incorrectly"

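Dates are auto-detected only in ISO format (YYYY-MM-DD); for other formats, add a strptime format as the third sort_keys element, e.g. ["date_field", "asc", "%d/%m/%Y"]
Check field value format
If a value looks like a date but is stored as text in another format, it may sort alphabetically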

Advanced Configuration

Template Variables in Variable Cells

Use variables to populate single cells:

target: TimestampCell
source_type: Variable

In Excel template, cell value:
"Extracted: {extract_date_time}"

Result:
"Extracted: 2025-01-15T14:30:45+01:00"

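A Variable cell can be filled with a small substitution step. This sketch assumes Variable cells use the same str.format()-style placeholders as output_filename, which the guide does not state explicitly; fill_variable_cell is a hypothetical name, and leaving unknown placeholders untouched (instead of raising an error) is a design choice of this sketch. Reading and writing the template cell itself is not shown.

```python
class _KeepUnknown(dict):
    """Mapping for str.format_map() that leaves unknown {placeholders} as-is."""

    def __missing__(self, key):
        return "{" + key + "}"


def fill_variable_cell(cell_text: str, variables: dict) -> str:
    """Hypothetical helper: substitute template variables in a Variable cell's text."""
    return cell_text.format_map(_KeepUnknown(variables))


print(fill_variable_cell("Extracted: {extract_date_time}",
                         {"extract_date_time": "2025-01-15T14:30:45+01:00"}))
# prints Extracted: 2025-01-15T14:30:45+01:00
```
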
Dynamic Filenames

Build filenames from the workbook name and extraction date:

output_filename: "{workbook_name}_{extract_year}_{extract_month}.xlsx"

Results in:
"Statistics_2025_01.xlsx"
"Endobest_Output_2025_01.xlsx"

Cascading Filters & Sorts

Apply multiple rules:

filter_condition: {"status": "active", "center": "PARIS01", "type": "inclusion"}
sort_keys: [
  ["visit_order", "asc"],
  ["date_visit", "desc"],
  ["patient_name", "asc"]
]

End of Configuration Guide

For the user guide, see DOCUMENTATION_98_USER_GUIDE.md.
For architecture details, see DOCUMENTATION_13_EXCEL_EXPORT.md.
