Endobest Field Mapping Configuration Guide
Part 2: Field Mapping & Configuration
Document Version: 2.0 (Updated with new module references)
Last Updated: 2025-11-08
Audience: Developers, Business Analysts, Data Managers
Language: English
Note: The configuration file Endobest_Dashboard_Config.xlsx uses the Inclusions_Mapping sheet for field definitions (see DOCUMENTATION_13_EXCEL_EXPORT.md and DOCUMENTATION_99_CONFIG_GUIDE.md for Excel export configuration).
Table of Contents
- Overview
- Technical Architecture
- Field Processing Logic
- Configuration File Structure
- Column Reference
- Special Value Prefixes
- Data Sources Explained
- Field Path Syntax
- Custom Functions Reference
- Post-Processing Transformations
- Configuration Examples
- User Guide: Adding/Modifying Fields
- Common Patterns & Recipes
- Troubleshooting
Overview
The Field Mapping Configuration defines which data points are extracted from multiple APIs (RC, GDD, questionnaires) and how they are transformed before export. The configuration is 100% externalized in an Excel file, enabling non-technical users to add new fields without code modifications.
Key Concepts
- Field Group: Logical container for related fields (e.g., "Patient_Identification", "Inclusion", "Endotest")
- Field Name: Unique identifier for the field within its group
- Source: Where the data comes from (questionnaire, record, inclusion, request)
- Field Path: JSON path to navigate nested structures
- Transformations: Post-processing rules (labels, templates, conditions)
- Custom Functions: Calculated fields with business logic
Technical Architecture
Field Extraction Pipeline (Detailed)
CONFIGURATION LOADING (startup):
├─ Load Endobest_Dashboard_Config.xlsx
├─ Parse Inclusions_Mapping sheet (rows 2 onwards)
├─ Validate each field configuration
├─ Parse JSON fields (field_path, value_labels, true_if_any, field_condition)
└─ Store in DASHBOARD_CONFIG array
FIELD PROCESSING (per patient):
├─ For each field in DASHBOARD_CONFIG:
│ ├─ Determine source type (questionnaire, record, inclusion, request, calculated)
│ │
│ ├─ IF source == questionnaire:
│ │ ├─ Method: Search by q_id, q_name, or q_category
│ │ ├─ Data: All questionnaires already fetched for patient
│ │ ├─ Path: Navigate to field_path within questionnaire answers
│ │ └─ Result: raw_value or "undefined"
│ │
│ ├─ IF source == record:
│ │ ├─ Data: Patient's clinical record
│ │ ├─ Path: Navigate JSON structure using field_path
│ │ └─ Result: raw_value or "undefined"
│ │
│ ├─ IF source == inclusion:
│ │ ├─ Data: Patient inclusion metadata
│ │ ├─ Path: Navigate nested inclusion structure
│ │ └─ Result: raw_value or "undefined"
│ │
│ ├─ IF source == request:
│ │ ├─ Data: Lab test request/results
│ │ ├─ Path: Navigate request JSON structure
│ │ └─ Result: raw_value or "undefined"
│ │
│ ├─ IF source == calculated:
│ │ ├─ Function: Custom business logic function
│ │ ├─ Arguments: From field_path
│ │ ├─ Access: Other fields already processed in output_inclusion
│ │ └─ Result: Computed value
│ │
│ ├─ CHECK field_condition (optional):
│ │ ├─ If condition is false → Set to "N/A"
│ │ ├─ If condition is undefined → Set to "undefined"
│ │ └─ If condition is true → Continue processing
│ │
│ ├─ APPLY post-processing transformations:
│ │ ├─ true_if_any: Convert to boolean
│ │ ├─ value_labels: Map to localized text
│ │ ├─ field_template: Apply formatting
│ │ └─ List joining: Flatten arrays
│ │
│ └─ STORE: output_inclusion[field_group][field_name] = final_value
│
└─ Result: Complete inclusion with all extended fields
Questionnaire Finding Strategy
The system supports 3 methods to locate questionnaires:
def find_questionnaire(all_questionnaires, source_type, source_value):
    if source_type == "q_id":
        # Direct lookup by questionnaire ID (fastest)
        return all_questionnaires.get(source_value, {}).get("answers")
    elif source_type == "q_name":
        # Sequential search by questionnaire name
        for qcm_data in all_questionnaires.values():
            if qcm_data["questionnaire"]["name"] == source_value:
                return qcm_data.get("answers")
        return None
    elif source_type == "q_category":
        # Sequential search by questionnaire category
        for qcm_data in all_questionnaires.values():
            if qcm_data["questionnaire"]["category"] == source_value:
                return qcm_data.get("answers")
        return None
Recommendation: Use q_id= for best performance (direct lookup)
Questionnaire Data Optimization
Instead of multiple filtered API calls:
BEFORE (slow):
GET /api/surveys/{qcm_id_1}/answers?subject={patient_id}
GET /api/surveys/{qcm_id_2}/answers?subject={patient_id}
GET /api/surveys/{qcm_id_3}/answers?subject={patient_id}
... (N calls per patient)
AFTER (optimized - single call):
POST /api/surveys/filter/with-answers
payload: {"context": "clinic_research", "subject": patient_id}
returns: [
{"questionnaire": {id, name, category}, "answers": {...}},
{"questionnaire": {id, name, category}, "answers": {...}},
...
]
All questionnaires are returned in a single call, indexed by ID for fast lookup.
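The indexing step can be sketched as follows. This is a minimal illustration, assuming the response is a list shaped like the payload above; the helper name index_questionnaires is hypothetical and not taken from the actual code base.

```python
def index_questionnaires(response_items):
    """Index the list returned by the single API call by questionnaire ID."""
    indexed = {}
    for item in response_items:
        qcm_id = item.get("questionnaire", {}).get("id")
        if qcm_id is not None:
            indexed[qcm_id] = item
    return indexed


# Sample data mirroring the response shape shown above (values are illustrative).
sample_response = [
    {"questionnaire": {"id": "qcm-uuid-1", "name": "Symptom Questionnaire", "category": "Symptoms"},
     "answers": {"question_1": "answer_value"}},
    {"questionnaire": {"id": "qcm-uuid-2", "name": "Medical History", "category": "History"},
     "answers": {"has_diabetes": False}},
]
all_questionnaires = index_questionnaires(sample_response)
```

With this index in place, a q_id lookup is a single dictionary access, while q_name and q_category still require scanning the values.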
Field Processing Logic
Step 1: Source Type Determination
| Source Prefix | Meaning | Example | Data Location |
|---|---|---|---|
| q_id= | Questionnaire by ID | q_id=uuid-123 | all_questionnaires[uuid-123]["answers"] |
| q_name= | Questionnaire by name | q_name=Symptom Check | Search by ["questionnaire"]["name"] |
| q_category= | Questionnaire by category | q_category=Symptoms | Search by ["questionnaire"]["category"] |
| record | Clinical record | record | record_data["record"] |
| inclusion | Inclusion metadata | inclusion | inclusion_data |
| request | Lab test request | request | request_data |
| (Calculated) | Custom function | N/A | Function result |
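One way to turn a source_id cell into a source type is sketched below. The function name parse_source_id and the rule that any unrecognized value is treated as a calculated-function name are assumptions made for illustration; the real parser may validate more strictly.

```python
QUESTIONNAIRE_PREFIXES = ("q_id", "q_name", "q_category")
DIRECT_SOURCES = ("record", "inclusion", "request")


def parse_source_id(source_id):
    """Return (source_type, source_value) for a source_id cell."""
    text = source_id.strip()
    prefix, sep, value = text.partition("=")
    if sep and prefix in QUESTIONNAIRE_PREFIXES:
        return prefix, value
    if text in DIRECT_SOURCES:
        return text, None
    # Assumption: anything else names a calculated (custom) function.
    return "calculated", text
```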
Step 2: Raw Value Extraction
The field_path defines how to navigate nested JSON structures:
# Simple path
field_path = ["patient", "name"]
# Equivalent to: data["patient"]["name"]
# Nested path
field_path = ["record", "clinicResearchData", 0, "data"]
# Equivalent to: data["record"]["clinicResearchData"][0]["data"]
# Wildcard path (returns array)
field_path = ["record", "clinicResearchData", "*", "test_name"]
# Returns: [test_name_1, test_name_2, test_name_3, ...]
# Deep wildcard
field_path = ["record", "*", "results", "*", "value"]
# Matches all results.*.value across all record items
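The path rules above can be sketched as a small recursive resolver. This is an illustration only: the name get_nested_value is hypothetical, and skipping elements where the remaining path is missing is an assumption, since the guide does not say how partially matching wildcards are handled.

```python
UNDEFINED = "undefined"


def get_nested_value(data, field_path):
    """Follow field_path through dicts and lists; "*" fans out over a list."""
    if not field_path:
        return data
    key, rest = field_path[0], field_path[1:]
    if key == "*":
        if not isinstance(data, list):
            return UNDEFINED
        results = []
        for element in data:
            value = get_nested_value(element, rest)
            if value == UNDEFINED:
                continue  # assumption: elements without the path are skipped
            if isinstance(value, list) and "*" in rest:
                results.extend(value)  # flatten results of deeper wildcards
            else:
                results.append(value)
        return results
    if isinstance(data, dict) and key in data:
        return get_nested_value(data[key], rest)
    if isinstance(data, list) and isinstance(key, int) and -len(data) <= key < len(data):
        return get_nested_value(data[key], rest)
    return UNDEFINED
```

Given a record holding two clinicResearchData items, the wildcard path returns a list of both test names, while an index of 0 returns only the first.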
Step 3: Field Condition Checking (Optional)
The field_condition allows skipping field processing based on another field's value:
IF field_condition is specified:
├─ Look up condition field value in output_inclusion
├─ IF condition value is None or "undefined":
│ └─ Set final_value = "undefined" (skip further processing)
├─ IF condition value is not a boolean:
│ └─ Set final_value = "$$$$ Condition Field Error"
├─ IF condition value is False:
│ └─ Set final_value = "N/A" (field not applicable)
└─ IF condition value is True:
└─ Continue with post-processing
Example:
{
"field_group": "Endotest",
"field_name": "Request_Status",
"source_id": "request",
"field_path": ["status"],
"field_condition": "Endotest.Request_Sent"
}
Meaning: Only populate "Request_Status" if "Request_Sent" is True. Otherwise set to "N/A".
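The decision tree above can be sketched as follows. The function name check_field_condition and the convention of returning None to mean "continue processing" are assumptions for illustration; output_inclusion is the nested group/field dictionary described in this guide.

```python
UNDEFINED = "undefined"
NOT_APPLICABLE = "N/A"
CONDITION_ERROR = "$$$$ Condition Field Error"


def check_field_condition(output_inclusion, field_condition):
    """Return None to continue processing, or the value that replaces the field."""
    group, _, name = field_condition.partition(".")
    value = output_inclusion.get(group, {}).get(name)
    if value is None or value == UNDEFINED:
        return UNDEFINED
    if not isinstance(value, bool):
        return CONDITION_ERROR
    if value is False:
        return NOT_APPLICABLE
    return None  # condition is True: carry on with post-processing
```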
Step 4: Post-Processing Transformations
4a. Array Flattening
If raw_value is an array → Join with | delimiter:
Input: ["Active", "Pending", "Resolved"]
Output: "Active|Pending|Resolved"
4b. Score Dictionary Formatting
If raw_value is dict with keys ['total', 'max'] → Format as string:
Input: {"total": 8, "max": 10}
Output: "8/10"
4c. true_if_any Transformation
If true_if_any is specified → Convert to boolean:
true_if_any: ["Active", "Pending"]
raw_value: "Active"
→ Does raw_value match ANY value in true_if_any list?
→ TRUE
4d. value_labels Mapping
If value_labels is specified → Map value to localized text:
{
"raw_value": "active",
"value_labels": [
{"value": "active", "text": {"fr": "Actif", "en": "Active"}},
{"value": "inactive", "text": {"fr": "Inactif", "en": "Inactive"}}
]
}
→ Output: "Actif" (French text)
4e. field_template Formatting
If field_template is specified → Apply template with $value placeholder:
field_template: "Score: $value/100"
final_value: 85
→ Output: "Score: 85/100"
Configuration File Structure
File Location
Endobest_Dashboard_Config.xlsx
├─ Sheet 1: "Inclusions_Mapping" (field mapping definition)
└─ Sheet 2: "Regression_Check" (non-regression rules)
[See DOCUMENTATION_12_QUALITY_CHECKS.md]
Inclusions_Mapping Sheet Overview
Row 1 (Headers):
| A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|
| field_group | field_name | source_name | source_id | field_path | field_template | field_condition | true_if_any | value_labels |
Row 2+: Field definitions (one per row)
Color Coding (for visual identification):
- Yellow: Extended fields or Calculated fields (requires special attention)
- Blue: Questionnaire-sourced fields (q_id, q_name, q_category)
- Red: Fields with errors or missing required data
- White: Record/Inclusion/Request fields
Column Reference
Column A: field_group
Type: String (required)
Description: Logical grouping of related fields in output JSON
Rules:
- Must be unique within context (same field_name can exist in different groups)
- Becomes a dictionary key in JSON: output[field_group][field_name]
- Controls field visibility in regression checks
Examples:
Patient_Identification → Contains patient metadata
Inclusion → Inclusion status and data
Endotest → Lab test information
Custom_Data → Default for general fields
Infos_Générales → General information
Antécédents Médicaux → Medical history
Column B: field_name
Type: String (required)
Description: Unique field identifier within its group
Rules:
- Must not be empty
- Can contain letters, numbers, underscores, hyphens
- Special text in parentheses is automatically removed
  - Example: Patient_Age (years) → Patient_Age
Excel Behavior: When a cell contains Patient_Age (years), the system parses it as:
field_name = "Patient_Age" # Parenthetical text stripped
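The stripping rule can be approximated with a regular expression. This is a sketch under the assumption that every parenthetical group is removed along with the whitespace before it; the helper name clean_field_name is hypothetical.

```python
import re


def clean_field_name(cell_value):
    """Drop parenthetical text, e.g. "Patient_Age (years)" -> "Patient_Age"."""
    return re.sub(r"\s*\([^)]*\)", "", cell_value).strip()
```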
Column C: source_name
Type: String (enum)
Required: Yes (unless cell contains "Not Specified")
Valid Values:
Inclusion → Field from inclusion data
Record → Field from clinical record
Request → Field from lab test request
Patient / Douleurs → Questionnaire name (implicit q_name=)
Signes et symptômes → Questionnaire name (implicit q_name=)
Calculated → Custom function (no direct source)
Not Specified → Skip this row (used for spacing/comments)
Column D: source_id
Type: String (enum with prefixes or JSON array)
Description: Specifies how to identify the data source
Format Options:
1. Questionnaire by ID (Recommended)
Syntax: q_id=<uuid>
Example: q_id=550e8400-e29b-41d4-a716-446655440000
Speed: Fastest (direct lookup)
2. Questionnaire by Name
Syntax: q_name=<name>
Example: q_name=Symptom Questionnaire
Speed: Slower (sequential search)
3. Questionnaire by Category
Syntax: q_category=<category>
Example: q_category=Medical History
Speed: Slower (sequential search)
4. Record Source
Value: record
Means: Extract from clinical record data
5. Inclusion Source
Value: inclusion
Means: Extract from inclusion metadata
6. Request Source
Value: request
Means: Extract from lab test request
7. Calculated Function
Syntax: <function_name>
Example: search_in_fields_using_regex, if_then_else, extract_parentheses_content
See Section: Custom Functions Reference
Column E: field_path
Type: JSON array (required when field is specified)
Description: Path to navigate nested JSON structure
Syntax Examples:
Simple field:
["name"]
// Equivalent to: data["name"]
Nested path:
["record", "patient", "demographics", "age"]
// Equivalent to: data["record"]["patient"]["demographics"]["age"]
Array index:
["record", "clinicResearchData", 0, "test_name"]
// Equivalent to: data["record"]["clinicResearchData"][0]["test_name"]
Wildcard (all elements):
["record", "clinicResearchData", "*", "test_name"]
// Returns: [test_name_1, test_name_2, test_name_3, ...]
// Result: Automatically joined with "|" in final value
For Calculated Functions (arguments):
[
"search_in_fields_using_regex",
".*surgery.*",
"Previous_Surgery",
"Recent_Surgery"
]
// First element: function name
// Rest: arguments to pass to function
Column F: field_template
Type: String with $value placeholder (optional)
Description: Apply formatting to the final value
Rules:
- Only applied if final_value is not "undefined" or "N/A"
- Must contain the $value placeholder
- Result: Template with $value replaced by actual value
Examples:
Template: "$value%"
Value: 85
Result: "85%"
Template: "Score: $value/100"
Value: 42
Result: "Score: 42/100"
Template: "Status: $value (Updated)"
Value: "Active"
Result: "Status: Active (Updated)"
Column G: field_condition
Type: String (field name reference, optional)
Description: Conditional field inclusion based on another field's value
Rules:
- If specified, must reference another field name already processed
- Must evaluate to a boolean value
- Referenced as <field_group>.<field_name>
Logic:
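IF field_condition_value == True:
    Process field normally
ELIF field_condition_value == False:
    Set final_value = "N/A"
ELIF field_condition_value is undefined/null:
    Set final_value = "undefined"
ELSE (non-boolean, see Step 3 of Field Processing Logic):
    Set final_value = "$$$$ Condition Field Error"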
Examples:
field_condition: Inclusion.isPrematurelyTerminated
Meaning: Only process this field if patient is prematurely terminated
field_condition: Endotest.Request_Sent
Meaning: Only process if test request was sent
Column H: true_if_any
Type: JSON array (optional)
Description: Convert to boolean if value matches ANY item in array
Syntax:
["value1", "value2", "value3"]
Logic:
LOOP through true_if_any array:
    IF raw_value == any_item:
        RETURN True
RETURN False
Example:
{
"field_name": "Is_Active",
"true_if_any": ["active", "pending", "processing"]
}
raw_value = "pending"
→ Does "pending" exist in ["active", "pending", "processing"]?
→ YES → Final value = True
raw_value = "completed"
→ Does "completed" exist in list?
→ NO → Final value = False
Column I: value_labels
Type: JSON array of mapping objects (optional)
Description: Map field values to localized text labels
Syntax:
[
{
"value": "raw_value_1",
"text": {
"fr": "Libellé Français",
"en": "English Label"
}
},
{
"value": "raw_value_2",
"text": {
"fr": "Autre Libellé",
"en": "Another Label"
}
}
]
Logic:
LOOP through value_labels array:
    IF label_map.value == raw_value:
        RETURN label_map.text.fr (French text)
IF no match found:
    RETURN "$$$$ Value Error: {raw_value}"
Example:
{
"field_name": "Status",
"value_labels": [
{
"value": 1,
"text": {"fr": "Inclus", "en": "Included"}
},
{
"value": 0,
"text": {"fr": "Pré-inclus", "en": "Pre-included"}
}
]
}
raw_value = 1
→ Map to French label: "Inclus"
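The lookup logic can be sketched as below. The language parameter is a hypothetical addition for illustration; the guide only describes returning the French text. Note that in Python 1 == True, so a boolean raw value would match a numeric label value in this sketch.

```python
def apply_value_labels(raw_value, value_labels, language="fr"):
    """Return the label text for raw_value, or the error marker if none matches."""
    for label_map in value_labels:
        if label_map["value"] == raw_value:
            return label_map["text"][language]
    return f"$$$$ Value Error: {raw_value}"


status_labels = [
    {"value": 1, "text": {"fr": "Inclus", "en": "Included"}},
    {"value": 0, "text": {"fr": "Pré-inclus", "en": "Pre-included"}},
]
```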
Special Value Prefixes
This section documents special prefixes and keywords used in Extended Fields configuration for value resolution and field references.
Prefix: $ (String Literal)
Location: In function arguments (like if_then_else parameters)
Meaning: Marks a string value as a literal (not a field reference)
Syntax: $value (just prefix with $, no quotes needed)
Without $ prefix:
{
"field_path": ["is_true", "Has_Consent", "YES", "NO"]
}
// "YES" is interpreted as a FIELD NAME to look up
// This will fail because no field named "YES" exists
With $ prefix (correct):
{
"field_path": ["is_true", "Has_Consent", "$YES", "$NO"]
}
// $YES is interpreted as LITERAL STRING "YES"
// $NO is interpreted as LITERAL STRING "NO"
// Has_Consent is interpreted as FIELD NAME (no prefix)
Why It Matters: The system needs to distinguish between:
- Field references (look up values): Status, Is_Active, Patient_Id
- Literal values (use as-is): $Active, $N/A, $Ready
No Prefix: Field References
Location: Arguments where field names are expected
Meaning: Refers to a field in the current inclusion data
Examples:
{
"field_path": ["is_true", "Has_Consent", "$YES", "$NO"]
}
// Has_Consent ← field reference (look up this field's value)
// Other unprefixed arguments (e.g., Status, Is_Active) are resolved the same way
Resolution: The system looks up the field in the current inclusion object.
Wildcard: * in Field Paths
Location: In field_path column (Column E in Mapping sheet)
Meaning: Match all elements at this level
Syntax:
["record", "*", "results", "*", "value"]
Example 1: Single Level Wildcard
{
"field_path": ["items", "*", "name"]
}
// Returns all "name" values from each item
// If items = [
// {name: "Item 1", ...},
// {name: "Item 2", ...},
// {name: "Item 3", ...}
// ]
// Result: ["Item 1", "Item 2", "Item 3"]
// Final output: "Item 1|Item 2|Item 3" (pipe-joined)
Example 2: Multiple Level Wildcard
{
"field_path": ["record", "*", "data", "*", "test"]
}
// Matches test values at multiple nesting levels
Post-Processing:
- Arrays are automatically joined with the | delimiter
- Scalar values are kept as-is
Value Resolution in if_then_else
When using the if_then_else function, values are resolved based on their format:
| Format | Type | Resolution |
|---|---|---|
| true, false | Boolean literal | Used directly |
| 42, 3.14 | Numeric literal | Used directly |
| $string | String literal | Remove $ prefix and use value |
| field_name | Field reference | Look up field value |
Examples:
{
"field_path": ["is_true", "Has_Consent", "$APPROVED", "$NOT_APPROVED"]
}
// Has_Consent → field reference (look it up)
// $APPROVED → string literal (use "APPROVED")
// $NOT_APPROVED → string literal (use "NOT_APPROVED")
{
"field_path": ["==", "Status", "$Active", "Overall_Status", "$MISSING"]
}
// Status → field reference
// $Active → string literal (use "Active")
// Overall_Status → field reference
// $MISSING → string literal (use "MISSING")
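The resolution rules in this section can be sketched as a single helper. This is a minimal illustration: the name resolve_value is hypothetical, and fields is assumed to be a flat dictionary of already-processed field values, which simplifies however the real code looks fields up in output_inclusion.

```python
def resolve_value(arg, fields):
    """Resolve one if_then_else argument against already-processed fields."""
    if isinstance(arg, (bool, int, float)):
        return arg  # boolean or numeric literal: used directly
    if isinstance(arg, str) and arg.startswith("$"):
        return arg[1:]  # string literal: drop the $ prefix
    return fields.get(arg, "undefined")  # field reference: look it up
```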
Summary Table: Special Prefixes
| Symbol | Meaning | Example |
|---|---|---|
| $value | String literal (remove $ prefix) | $YES, $READY, $N/A |
| No prefix | Field reference (look up) | Status, Patient_Id |
| * | Wildcard in field_path (all array elements) | ["items", "*", "name"] |
Data Sources Explained
1. Questionnaire Sources (q_id, q_name, q_category)
What Are Questionnaires?
Questionnaires are forms/surveys filled out by patients or clinicians in the Research Clinic system. Each questionnaire has:
- ID: Unique identifier (UUID)
- Name: Display name (e.g., "Symptom Assessment")
- Category: Logical grouping (e.g., "Medical History")
- Answers: Key-value pairs of responses
Data Structure
all_questionnaires: {
"qcm-uuid-1": {
"questionnaire": {
"id": "qcm-uuid-1",
"name": "Symptom Questionnaire",
"category": "Symptoms"
},
"answers": {
"question_1": "answer_value",
"question_2": true,
"question_3": 42
}
},
"qcm-uuid-2": {
"questionnaire": {
"id": "qcm-uuid-2",
"name": "Medical History",
"category": "History"
},
"answers": {
"has_diabetes": false,
"has_hypertension": true
}
}
}
Finding Questionnaires
Option 1: By ID (Fastest)
{
"source_id": "q_id=qcm-uuid-1",
"field_path": ["answers", "question_1"]
}
// Direct lookup in dictionary by ID
// Performance: O(1) constant time
Option 2: By Name
{
"source_id": "q_name=Symptom Questionnaire",
"field_path": ["answers", "question_1"]
}
// Sequential search through all questionnaires
// Performance: O(n) proportional to questionnaire count
Option 3: By Category
{
"source_id": "q_category=Symptoms",
"field_path": ["answers", "question_1"]
}
// Sequential search for category match
// Performance: O(n)
Recommendation: Use q_id= for best performance. Name and category searches are slower but acceptable if IDs are not available.
2. Record Source (Clinical Data)
What Is Record Data?
The clinical record contains all medical information for a patient within the Research Clinic context:
- Protocol inclusions status
- Clinical research data (test requests, results)
- Patient demographics
- Medical history
Data Structure
record_data: {
"record": {
"id": "record-uuid",
"patientId": "patient-uuid",
"protocol_inclusions": [
{
"status": "incluse",
"blockedQcmVersions": [],
"clinicResearchData": [
{
"requestMetaData": {
"tubeId": "tube-uuid-123"
},
"needRcp": false
}
]
}
]
}
}
Example Extraction
{
"source_id": "record",
"field_path": ["record", "protocol_inclusions", 0, "status"]
}
// Result: "incluse"
{
"source_id": "record",
"field_path": ["record", "clinicResearchData", "*", "requestMetaData", "tubeId"]
}
// Result: ["tube-uuid-1", "tube-uuid-2"]
// Final: "tube-uuid-1|tube-uuid-2"
3. Inclusion Source (Inclusion Metadata)
What Is Inclusion Data?
Inclusion data contains metadata about the patient's inclusion in the research protocol:
- Basic patient information (name, birthday)
- Organization assignment
- Inclusion status
- Inclusion date
Data Structure
inclusion_data: {
"id": "patient-uuid",
"name": "Doe, John",
"birthday": "1975-05-15",
"status": "incluse",
"inclusionDate": "2024-10-15",
"organization_id": "org-uuid-added-by-system",
"organization_name": "Center Name-added-by-system"
}
Example Extraction
{
"source_id": "inclusion",
"field_path": ["name"]
}
// Result: "Doe, John"
{
"source_id": "inclusion",
"field_path": ["status"]
}
// Result: "incluse"
4. Request Source (Lab Test Data)
What Is Request Data?
Request data contains information about laboratory tests ordered and their results:
- Test request status
- Diagnostic status
- Individual test results
- Result values
Data Structure
request_data: {
"id": "request-uuid",
"tubeId": "tube-uuid-123",
"status": "completed",
"diagnostic_status": "Completed",
"results": [
{
"testName": "Complete Blood Count",
"value": "Normal",
"unit": ""
},
{
"testName": "Coelioscopie",
"value": "Findings documented",
"unit": ""
}
]
}
Example Extraction
{
"source_id": "request",
"field_path": ["status"]
}
// Result: "completed"
{
"source_id": "request",
"field_path": ["results", "*", "testName"]
}
// Result: ["Complete Blood Count", "Coelioscopie"]
// Final: "Complete Blood Count|Coelioscopie"
5. Calculated Source (Custom Functions)
What Are Calculated Fields?
Calculated fields derive their values from custom business logic functions, not direct data extraction. The function can access other already-processed fields and perform complex transformations.
Examples
{
"source_name": "Calculated",
"source_id": "search_in_fields_using_regex",
"field_path": [".*SURGERY.*", "Previous_Surgery", "Recent_Surgery"]
}
// Function searches multiple fields using regex
{
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": ["is_true", "Requested", "$\"YES\"", "$\"NO\""]
}
// Function applies conditional logic
{
"source_name": "Calculated",
"source_id": "extract_parentheses_content",
"field_path": ["Status_Field"]
}
// Function extracts text from within parentheses
See Section: Custom Functions Reference for detailed function documentation.
6. Inclusion Source with Organization Enrichment (center_name)
What Is Organization Center Mapping?
The organization center mapping feature enriches patient inclusion data with standardized center identifiers. When configured, the center_name field is automatically added to each inclusion record, allowing you to group patients by center codes.
Data Source: Inclusion Type
{
"source_name": "Inclusion",
"source_id": "inclusion",
"source_type": "inclusion",
"field_path": ["center_name"]
}
Fields Available from Organization Enrichment
| Field | Type | Description | Availability |
|---|---|---|---|
| center_name | String | Standardized center identifier | If mapping file exists |
| organization_name | String | Full organization name | Always |
| organization_id | String | Organization UUID | Always |
Data Structure
inclusion_data: {
"organization_id": "org-uuid",
"organization_name": "Hospital Cardiology Research Lab",
"center_name": "HCR-MAIN", // ← Added by organization mapping
"id": "patient-uuid",
...
}
Example Extraction
{
"source_name": "Inclusion",
"source_id": "inclusion",
"source_type": "inclusion",
"field_path": ["center_name"]
}
// Result: "HCR-MAIN"
{
"source_name": "Inclusion",
"source_id": "inclusion",
"source_type": "inclusion",
"field_path": ["organization_name"]
}
// Result: "Hospital Cardiology Research Lab"
Configuration Requirements
To use this feature:
- Create eb_org_center_mapping.xlsx in the script directory (see DOCUMENTATION_10_ARCHITECTURE.md, Organization ↔ Center Mapping section)
- Define mapping rules in the Org_Center_Mapping sheet
- Add an extended field with source type "inclusion" and field_path ["center_name"]
Availability:
- ✅ If mapping file exists and organization is mapped → center_name = mapped value
- ⚠️ If mapping file missing or organization not in mapping → center_name = organization name (fallback)
Example Configuration
{
"field_group": "Patient_Identification",
"field_name": "Center_Name",
"source_name": "Inclusion",
"source_id": "inclusion",
"source_type": "inclusion",
"field_path": ["center_name"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
Result in output:
{
"Patient_Identification": {
"Organisation_Name": "Hospital Cardiology Research Lab",
"Center_Name": "HCR-MAIN",
...
}
}
Field Path Syntax
Basic Path Navigation
Single-Level Access
["field_name"]
// JavaScript equivalent: data.field_name
// Result: value or undefined
Multi-Level Nesting
["record", "patient", "demographics", "age"]
// JavaScript: data.record.patient.demographics.age
Array Index Access
["items", 0, "name"]
// JavaScript: data.items[0].name
// Accesses first element of array
Negative Index (from end)
["items", -1, "name"]
// JavaScript: data.items[data.items.length - 1].name
// Accesses last element of array
Wildcard Paths (Multiple Values)
Single Wildcard (One Level)
["questionnaire", "answers", "*", "value"]
// Returns all values from each answer object
// Result: Array of values [value1, value2, value3, ...]
Multiple Wildcards (Deep)
["record", "*", "data", "*", "test"]
// Matches nested wildcards at multiple levels
// Returns: All tests at matching paths
Wildcard Result Flattening
path: ["items", "*", "values", "*", "score"]
items: [
{
"values": [
{"score": 10},
{"score": 20}
]
},
{
"values": [
{"score": 30},
{"score": 40}
]
}
]
// Without flattening: [[10, 20], [30, 40]]
// With flattening (used): [10, 20, 30, 40]
Edge Cases & Behavior
Missing Path
field_path: ["missing", "field"]
data: {}
Result: "undefined" (not null or empty string)
Null/None Values in Path
field_path: ["patient", "contact", "phone"]
data: {"patient": {"contact": null}}
Result: "undefined" (stops at null)
Non-Dictionary/Non-List Element
field_path: ["patient", "name", "first"]
data: {"patient": {"name": "John"}} // "name" is string, not dict
Result: "undefined" (cannot navigate string)
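The three edge cases, plus negative indexing, can be reproduced with a wildcard-free walker. This is a sketch, not the actual implementation: the name navigate is hypothetical, and treating a null value at the end of the path as "undefined" is an assumption, since the guide only describes a null in the middle of a path.

```python
UNDEFINED = "undefined"


def navigate(data, field_path):
    """Walk field_path without wildcards; any dead end yields "undefined"."""
    current = data
    for key in field_path:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (isinstance(current, list) and isinstance(key, int)
              and -len(current) <= key < len(current)):
            current = current[key]  # supports negative indexes such as -1
        else:
            return UNDEFINED  # missing key, bad index, or non-container value
        if current is None:
            return UNDEFINED  # stop at null
    return current
```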
Custom Functions Reference
Function 1: search_in_fields_using_regex
Purpose: Search multiple fields for regex pattern match (case-insensitive)
Syntax:
{
"source_name": "Calculated",
"source_id": "search_in_fields_using_regex",
"field_path": ["regex_pattern", "field_1", "field_2", ...]
}
Parameters:
- regex_pattern (string): Regular expression pattern (case-insensitive)
- field_1, field_2, ... (strings): Field names to search (looked up in output_inclusion)
Logic:
IF all fields are undefined:
    RETURN "undefined"
FOR EACH field_name in [field_1, field_2, ...]:
    value = get_value_from_inclusion(field_name)
    IF value is string AND value matches regex_pattern:
        RETURN True
RETURN False
Return Value:
- True if ANY field matches the pattern
- False if NO fields match
- "undefined" if ALL fields are undefined
Examples:
Example 1: Detect if any surgery field contains "surgery"
{
"field_name": "Has_Surgery_History",
"source_id": "search_in_fields_using_regex",
"field_path": [".*surgery.*", "Previous_Surgery", "Recent_Surgery", "Planned_Surgery"]
}
If any of these fields contains "surgery" → True
Otherwise → False
Example 2: Check for specific procedures
{
"field_name": "Is_Endoscopy_Planned",
"source_id": "search_in_fields_using_regex",
"field_path": ["endoscopy|colonoscopy", "Procedure_Type", "Procedure_Notes"]
}
Matches if "endoscopy" OR "colonoscopy" appears in either field
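A minimal Python sketch of this function follows. It assumes fields is a flat dictionary of already-processed values (a simplification of the lookup in output_inclusion) and uses re.search so that a pattern such as endoscopy|colonoscopy matches anywhere in the text; both choices are assumptions rather than confirmed implementation details.

```python
import re

UNDEFINED = "undefined"


def _is_undefined(value):
    return value is None or value == UNDEFINED


def search_in_fields_using_regex(fields, pattern, *field_names):
    """True if any defined string field matches pattern (case-insensitive)."""
    values = [fields.get(name, UNDEFINED) for name in field_names]
    if all(_is_undefined(value) for value in values):
        return UNDEFINED
    regex = re.compile(pattern, re.IGNORECASE)
    return any(
        isinstance(value, str) and not _is_undefined(value) and regex.search(value)
        for value in values
    )
```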
Function 2: extract_parentheses_content
Purpose: Extract text within the first set of parentheses
Syntax:
{
"source_name": "Calculated",
"source_id": "extract_parentheses_content",
"field_path": ["field_name"]
}
Parameters:
- field_name (string): Field to extract from (looked up in output_inclusion)
Logic:
value = get_value_from_inclusion(field_name)
IF value is not defined:
    RETURN "undefined"
MATCH first occurrence of (content) pattern
IF match found:
    RETURN content
ELSE:
    RETURN "undefined"
Return Value:
- Text extracted from parentheses (e.g., "Active")
- "undefined" if no parentheses found or field undefined
Examples:
Example 1: Extract status from formatted field
Input: "Patient Status (Active)"
Output: "Active"
Example 2: Extract category name
Input: "Medical Condition (Hypertension)"
Output: "Hypertension"
Example 3: Multi-word content
Input: "Surgery Scheduled (Appendectomy - Jan 15)"
Output: "Appendectomy - Jan 15"
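The extraction can be sketched with a regular expression that captures the first parenthesized group. As elsewhere, fields is assumed to be a flat dictionary of processed values, and nested parentheses are not handled in this illustration.

```python
import re


def extract_parentheses_content(fields, field_name):
    """Return the text inside the first (...) of the field, else "undefined"."""
    value = fields.get(field_name)
    if value is None or value == "undefined":
        return "undefined"
    match = re.search(r"\(([^)]*)\)", str(value))
    return match.group(1) if match else "undefined"
```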
Function 3: append_terminated_suffix
Purpose: Add " - AP" suffix to status if patient prematurely terminated
Syntax:
{
"source_name": "Calculated",
"source_id": "append_terminated_suffix",
"field_path": ["status_field_name", "is_terminated_field_name"]
}
Parameters:
- status_field_name (string): Field containing status value
- is_terminated_field_name (string): Boolean field indicating termination
Logic:
status = get_value_from_inclusion(status_field_name)
is_terminated = get_value_from_inclusion(is_terminated_field_name)
IF status is undefined:
    RETURN "undefined"
IF is_terminated is TRUE:
    RETURN status + " - AP"
ELSE:
    RETURN status
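In Python, the logic above might look like the sketch below. The flat fields dictionary stands in for the lookup in output_inclusion and is an assumption made for illustration.

```python
def append_terminated_suffix(fields, status_field_name, is_terminated_field_name):
    """Append " - AP" to the status when the termination flag is True."""
    status = fields.get(status_field_name)
    if status is None or status == "undefined":
        return "undefined"
    if fields.get(is_terminated_field_name) is True:
        return f"{status} - AP"
    return status
```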
Return Value:
- Status with " - AP" suffix if terminated
- Original status if not terminated
- "undefined" if status field undefined
Examples:
Example 1: Mark prematurely terminated patients
{
"field_name": "Inclusion_Status",
"source_id": "append_terminated_suffix",
"field_path": ["Base_Status", "isPrematurelyTerminated"]
}
If isPrematurelyTerminated = True:
"incluse" → "incluse - AP"
If isPrematurelyTerminated = False:
"incluse" → "incluse"
Function 4: if_then_else
Purpose: Unified conditional logic with 8 different operators
Syntax:
{
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": ["operator", arg1, arg2_optional, result_if_true, result_if_false]
}
Operator Reference
Operator 1: is_true
Signature: ["is_true", field_name, result_if_true, result_if_false]
Logic: IF field == True THEN result_if_true ELSE result_if_false
Example:
{
"field_path": ["is_true", "Has_Consent", "$\"Consented\"", "$\"Not Consented\""]
}
// If Has_Consent = True → "Consented"
// If Has_Consent = False → "Not Consented"
Operator 2: is_false
Signature: ["is_false", field_name, result_if_true, result_if_false]
Logic: IF field == False THEN result_if_true ELSE result_if_false
Example:
{
"field_path": ["is_false", "Has_Exclusion", "$\"Eligible\"", "$\"Excluded\""]
}
Operator 3: is_defined
Signature: ["is_defined", field_name, result_if_true, result_if_false]
Logic: IF field is not undefined THEN result_if_true ELSE result_if_false
Example:
{
"field_path": ["is_defined", "Surgery_Date", "$\"Date Available\"", "$\"No Date\""]
}
Operator 4: is_undefined
Signature: ["is_undefined", field_name, result_if_true, result_if_false]
Logic: IF field is undefined THEN result_if_true ELSE result_if_false
Example:
{
"field_path": ["is_undefined", "Last_Contact", "$\"Never Contacted\"", "$\"Contacted\""]
}
Operator 5: all_true
Signature: ["all_true", [field_1, field_2, ...], result_if_true, result_if_false]
Logic: IF all fields == True THEN result_if_true ELSE result_if_false
Example:
{
"field_path": ["all_true", ["Has_Consent", "Has_Results", "Is_Complete"], "$\"READY\"", "$\"INCOMPLETE\""]
}
// Returns "READY" only if ALL three fields are True
Operator 6: all_defined
Signature: ["all_defined", [field_1, field_2, ...], result_if_true, result_if_false]
Logic: IF all fields are defined THEN result_if_true ELSE result_if_false
Example:
{
"field_path": ["all_defined", ["First_Name", "Last_Name", "Birth_Date"], "$\"COMPLETE\"", "$\"INCOMPLETE\""]
}
// Returns "COMPLETE" only if ALL three fields have values
Operator 7: ==
Signature: ["==", value1, value2, result_if_true, result_if_false]
Logic: IF value1 == value2 THEN result_if_true ELSE result_if_false
Example:
{
"field_path": ["==", "Status", "$\"Active\"", "$\"Is Active\"", "$\"Not Active\""]
}
// If Status equals "Active" → "Is Active"
Operator 8: !=
Signature: ["!=", value1, value2, result_if_true, result_if_false]
Logic: IF value1 != value2 THEN result_if_true ELSE result_if_false
Example:
{
"field_path": ["!=", "Status", "$\"Inactive\"", "$\"Active\"", "$\"Inactive\""]
}
// If Status NOT equal to "Inactive" → "Active"
Value Resolution
The function supports multiple value types:
Boolean Literals:
true, false
// Used directly without field lookup
Numeric Literals:
42, 3.14, 0, -1
// Used directly without field lookup
String Literals (Prefixed with $):
"$\"Active\"", "$\"Ready\"", "$\"N/A\""
// Remove $ prefix before using
// $ prefix signals: don't look this up as field name
Field References (No Prefix):
"Status", "Is_Active", "Patient_Name"
// Looked up in output_inclusion
Complex Examples:
{
"field_path": ["==", "Status_Code", 1, "$\"Active\"", "$\"Inactive\""]
}
// Compare Status_Code field against numeric value 1
{
"field_path": ["all_true", ["Consent_Received", "Test_Completed"], "Overall_Status", "$\"MISSING\""]
}
// If both conditions true, use Overall_Status value
// If either false, use literal "MISSING"
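The eight operators and the value-resolution rules above can be sketched in a few lines of Python. This is an illustration only, not the code in eb_dashboard.py: the names `resolve_value` and `if_then_else`, the flat `fields` dict standing in for output_inclusion, and the use of the string "undefined" as the missing-value marker are all assumptions.

```python
UNDEFINED = "undefined"  # assumed marker for missing values


def resolve_value(token, fields):
    """Resolve one argument: literal, $-prefixed string, or field reference."""
    if isinstance(token, (bool, int, float)):
        return token  # boolean and numeric literals are used as-is
    if isinstance(token, str) and token.startswith("$"):
        return token[1:].strip('"')  # drop the $ prefix and the quotes
    return fields.get(token, UNDEFINED)  # anything else is a field name


def if_then_else(field_path, fields):
    """Evaluate an [operator, ...args, if_true, if_false] expression."""
    operator, *args, if_true, if_false = field_path
    if operator in ("is_true", "is_false"):
        outcome = resolve_value(args[0], fields) is (operator == "is_true")
    elif operator in ("is_defined", "is_undefined"):
        defined = resolve_value(args[0], fields) != UNDEFINED
        outcome = defined == (operator == "is_defined")
    elif operator == "all_true":
        outcome = all(resolve_value(name, fields) is True for name in args[0])
    elif operator == "all_defined":
        outcome = all(resolve_value(name, fields) != UNDEFINED for name in args[0])
    elif operator in ("==", "!="):
        equal = resolve_value(args[0], fields) == resolve_value(args[1], fields)
        outcome = equal == (operator == "==")
    else:
        raise ValueError(f"Unknown operator: {operator}")
    return resolve_value(if_true if outcome else if_false, fields)
```

Because the result arguments go through the same resolver as the comparison arguments, a result can be a literal or another field's value, as in the Overall_Status example.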
Post-Processing Transformations
Transformation Order
Raw Value Extraction
↓
Condition Check
↓
IF final_value is list:
└─ Join with "|" delimiter
↓
IF final_value is score dict (has 'total' and 'max'):
└─ Format as "total/max"
↓
IF true_if_any is specified:
└─ Apply boolean conversion
↓
IF value_labels is specified:
└─ Apply label mapping
↓
IF field_template is specified:
└─ Apply formatting with $value
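The order shown above matters because each step sees the output of the previous one. The sketch below chains the five transformations in that order. The function name `post_process` and the plain `config` dict are assumptions for illustration, and the condition check is taken to have already happened, so a value of "N/A" or "undefined" may arrive as input.

```python
def post_process(value, config):
    """Apply the post-processing steps in the documented order."""
    # 1. Array flattening: join list elements with "|"
    if isinstance(value, list):
        value = "|".join(str(item) for item in value)
    # 2. Score dictionary formatting: {"total": t, "max": m} -> "t/m"
    if isinstance(value, dict) and "total" in value and "max" in value:
        value = f"{value['total']}/{value['max']}"
    # 3. true_if_any: boolean conversion
    if config.get("true_if_any"):
        value = value in config["true_if_any"]
    # 4. value_labels: map to the French label, or flag the unmapped value
    if config.get("value_labels"):
        for label_map in config["value_labels"]:
            if label_map["value"] == value:
                value = label_map["text"]["fr"]
                break
        else:
            value = f"$$$$ Value Error: {value}"
    # 5. field_template: skipped for "undefined" and "N/A"
    template = config.get("field_template")
    if template and value not in ("undefined", "N/A"):
        value = template.replace("$value", str(value))
    return value
```

One consequence of this ordering is that true_if_any and value_labels compare against the already-joined string when the raw value was a list.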
Transformation 1: Array Flattening
When: Raw value is an array/list
Action: Join elements with | delimiter
Example:
Raw: ["Active", "Pending", "Resolved"]
Output: "Active|Pending|Resolved"
Transformation 2: Score Dictionary Formatting
When: Raw value is a dict with the keys 'total' and 'max'
Action: Convert to "total/max" string format
Example:
Raw: {"total": 8, "max": 10}
Output: "8/10"
Transformation 3: true_if_any
When: true_if_any is specified in configuration
Action: Check if raw value matches ANY item in the array
Example:
{
"true_if_any": ["Active", "Pending", "Processing"],
"raw_value": "Active"
}
// Result: true
{
"true_if_any": ["Active", "Pending"],
"raw_value": "Completed"
}
// Result: false
Transformation 4: value_labels
When: value_labels is specified in configuration
Action: Map raw value to localized text
Logic:
FOR EACH label_map in value_labels:
IF label_map.value == raw_value:
RETURN label_map.text.fr (French label)
IF no match:
RETURN "$$$$ Value Error: {raw_value}"
Example:
{
"value_labels": [
{"value": "active", "text": {"fr": "Actif", "en": "Active"}},
{"value": "inactive", "text": {"fr": "Inactif", "en": "Inactive"}}
],
"raw_value": "active"
}
// Result: "Actif"
Transformation 5: field_template
When: field_template is specified (and value is not "undefined" or "N/A")
Action: Replace $value placeholder with actual value
Example:
template: "Score: $value/100"
raw_value: 85
Result: "Score: 85/100"
template: "Status [$value]"
raw_value: "Active"
Result: "Status [Active]"
Configuration Examples
Example 1: Simple Field Extraction
Requirement: Extract patient name from inclusion data
{
"field_group": "Patient_Identification",
"field_name": "Patient_Name",
"source_name": "Inclusion",
"source_id": "inclusion",
"field_path": ["name"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
Flow:
- Source: inclusion data
- Extract: data["name"]
- Result: "Doe, John"
- Output: {"Patient_Identification": {"Patient_Name": "Doe, John"}}
Example 2: Questionnaire Field with Label Mapping
Requirement: Extract symptom severity and map to French labels
{
"field_group": "Symptoms",
"field_name": "Severity",
"source_name": "Symptoms (OUI/NON)",
"source_id": "q_id=77e488a1-d3c-148af-a6bc-8fe1f55e82e4",
"field_path": ["answers", "question5"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": [
{"value": 1, "text": {"fr": "Léger", "en": "Mild"}},
{"value": 2, "text": {"fr": "Modéré", "en": "Moderate"}},
{"value": 3, "text": {"fr": "Sévère", "en": "Severe"}}
]
}
Flow:
- Source: Questionnaire with ID 77e488a1-...
- Extract: answers["question5"] → 2
- Apply value_labels: 2 → "Modéré"
- Output: {"Symptoms": {"Severity": "Modéré"}}
Example 3: Conditional Field
Requirement: Only show request status if test was requested
{
"field_group": "Endotest",
"field_name": "Request_Status",
"source_name": "Request",
"source_id": "request",
"field_path": ["status"],
"field_template": null,
"field_condition": "Endotest.Request_Sent",
"true_if_any": null,
"value_labels": null
}
Flow:
- Check condition: Endotest.Request_Sent
- If False → Set to "N/A"
- If True → Extract status from request data
- Output: {"Endotest": {"Request_Status": "completed"}} or "N/A"
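The condition check in this flow could look roughly like the following. It is a sketch under assumptions: the helper names are hypothetical, the condition is taken to be a plain "Group.Field" string as in the example, and the referenced field is assumed to have been written to the output before this field is processed.

```python
def condition_met(field_condition, output):
    """Look up a 'Group.Field' condition in the output built so far."""
    if not field_condition:
        return True  # no condition configured
    group, field_name = field_condition.split(".", 1)
    return output.get(group, {}).get(field_name) is True


def extract_with_condition(config, output, extract):
    """Return "N/A" when the condition is false, else the extracted value."""
    if not condition_met(config.get("field_condition"), output):
        return "N/A"
    return extract(config["field_path"])
```

Here `extract` is any callable that turns a field_path into a raw value, which keeps the condition logic independent of the data source.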
Example 4: Calculated Field with if_then_else
Requirement: Show overall status based on inclusion and termination
{
"field_group": "Inclusion",
"field_name": "Inclusion_Status_Complete",
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": ["is_true", "isPrematurelyTerminated", "$\"incluse - AP\"", "Inclusion_Status"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
Flow:
- Check: Is isPrematurelyTerminated == True?
- If YES → Return literal "incluse - AP"
- If NO → Return value of Inclusion_Status field
- Output: {"Inclusion": {"Inclusion_Status_Complete": "incluse - AP"}} or "incluse"
Example 5: Array Field with Formatting
Requirement: Extract all test names and format them
{
"field_group": "Endotest",
"field_name": "Tests_Performed",
"source_name": "Request",
"source_id": "request",
"field_path": ["results", "*", "testName"],
"field_template": "Tests: $value",
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
Flow:
- Source: request data
- Extract: results[*].testName → ["Blood Test", "Imaging", "ECG"]
- Array flattening → "Blood Test|Imaging|ECG"
- Apply template → "Tests: Blood Test|Imaging|ECG"
- Output: {"Endotest": {"Tests_Performed": "Tests: Blood Test|Imaging|ECG"}}
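The extraction step with a "*" wildcard can be pictured as a small recursive walk. The function below is a hypothetical sketch, not the project's implementation; it assumes missing keys yield the string "undefined" and that a wildcard drops elements where the rest of the path is missing.

```python
def extract_path(data, path):
    """Follow a field_path through nested dicts and lists; "*" fans out over a list."""
    if not path:
        return data
    key, rest = path[0], path[1:]
    if key == "*":
        if not isinstance(data, list):
            return "undefined"
        values = [extract_path(item, rest) for item in data]
        return [value for value in values if value != "undefined"]
    try:
        # String keys index dicts, integer keys index lists
        return extract_path(data[key], rest)
    except (KeyError, IndexError, TypeError):
        return "undefined"
```

With this behavior, a wildcard over elements that lack the requested key returns an empty list, which is then flattened to an empty string by the array step.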
Example 6: Complex Conditional Logic
Requirement: Show surgery type based on multiple conditions
{
"field_group": "Surgery",
"field_name": "Surgery_Status",
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": [
"all_true",
["Surgery_Planned", "Surgeon_Assigned", "Date_Set"],
"$\"READY_FOR_SURGERY\"",
"$\"INCOMPLETE_PREPARATION\""
],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
Flow:
- Check: Are ALL of [Surgery_Planned, Surgeon_Assigned, Date_Set] == True?
- If YES → "READY_FOR_SURGERY"
- If NO → "INCOMPLETE_PREPARATION"
- Output: Conditional status
Example 7: Search and Boolean Conversion
Requirement: Detect if patient has surgery history
{
"field_group": "Medical_History",
"field_name": "Has_Prior_Surgery",
"source_name": "Calculated",
"source_id": "search_in_fields_using_regex",
"field_path": [".*surgery|.*intervention.*", "History_Notes", "Previous_Procedures"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
Flow:
- Search History_Notes and Previous_Procedures
- Pattern: ".*surgery|.*intervention.*" (case-insensitive)
- If ANY field matches → true
- If NO matches → false
- Output: {"Medical_History": {"Has_Prior_Surgery": true}}
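A minimal sketch of this search, using Python's standard `re` module. The function name `search_fields` and the flat `fields` dict are assumptions, and so is the choice of `re.search` (match anywhere in the text) rather than a full-string match.

```python
import re


def search_fields(field_path, fields):
    """Return True if the pattern matches any of the named fields."""
    pattern, *field_names = field_path
    regex = re.compile(pattern, re.IGNORECASE)  # case-insensitive, as described
    for name in field_names:
        value = fields.get(name)
        if isinstance(value, str) and regex.search(value):
            return True
    return False
```

Non-string values (missing fields, numbers) are skipped here rather than converted, so an absent field can never produce a match.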
User Guide: Adding/Modifying Fields
Step 1: Identify Data Source
Determine where the data lives:
Patient Name → inclusion (inclusion_data)
Symptom Severity → questionnaire (q_id, q_name, or q_category)
Clinical Notes → record (record_data)
Test Results → request (request_data)
Derived Value → calculated (custom function)
Step 2: Locate Field Path
Navigate the JSON structure to find the exact path:
For Inclusion:
Open endobest_inclusions_old.json
Find a patient record
Look for field under "Patient_Identification"
Example path: ["name"]
For Questionnaire:
Need questionnaire ID/name/category
Look inside answers object
Example: q_id=abc-123, field_path: ["answers", "question_5"]
For Record:
Open a record with GET /api/records/byPatient
Navigate structure
Example: ["record", "clinicResearchData", 0, "requestMetaData"]
For Request:
Field from lab request response
Example: ["results", "*", "testName"]
Step 3: Create Configuration Row
Open Endobest_Dashboard_Config.xlsx → Inclusions_Mapping sheet
Row N:
A: field_group (e.g., "Custom_Data")
B: field_name (e.g., "Patient_Status")
C: source_name (e.g., "Inclusion")
D: source_id (e.g., "inclusion")
E: field_path (e.g., ["status"])
F: field_template (optional, e.g., "Status: $value")
G: field_condition (optional, e.g., "Inclusion.Is_Active")
H: true_if_any (optional, e.g., ["active", "pending"])
I: value_labels (optional, complex JSON)
Step 4: Validate Configuration
Run the dashboard in check-only mode:
python eb_dashboard.py --check-only
Expected Output:
✓ Loaded 81 fields from extended configuration.
✓ All checks passed successfully!
If errors occur:
Error in config file, row 42, field 'field_path': Invalid JSON format.
→ Fix the JSON syntax in the cell
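To check a row before running the full validation, the JSON cells can be tested on their own with the standard `json` module. The sketch below is not part of eb_dashboard.py: `validate_row` and the `row` dict keyed by column name are hypothetical, and the extra detail appended to the message comes from Python's `json.JSONDecodeError`.

```python
import json

# Columns that hold JSON; an assumption based on the loading step of this guide
JSON_COLUMNS = ("field_path", "value_labels", "true_if_any", "field_condition")


def validate_row(row_number, row):
    """Return a list of error messages for the JSON cells of one config row."""
    errors = []
    for column in JSON_COLUMNS:
        cell = row.get(column)
        if cell in (None, ""):
            continue  # optional cell left empty
        try:
            json.loads(cell)
        except json.JSONDecodeError as exc:
            errors.append(
                f"Error in config file, row {row_number}, field '{column}': "
                f"Invalid JSON format. ({exc.msg} at column {exc.colno})"
            )
    return errors
```

The column number in the message points at the character where parsing failed, which is usually just after the actual mistake.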
Step 5: Test with Full Collection
python eb_dashboard.py
After collection completes, verify:
- New field appears in endobest_inclusions.json
- Values are populated correctly
- No data quality issues reported
Step 6: Document the Field
Add comments in a separate notes section (if available) explaining:
- Purpose of the field
- Data source and ID
- Any special transformations
- Expected value ranges/types
Common Patterns & Recipes
Pattern 1: Boolean Flag from Multiple Conditions
Requirement: Create true/false flag based on multiple fields
{
"field_group": "Flags",
"field_name": "Is_Ready_For_Export",
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": [
"all_true",
["Has_Consent", "Data_Complete", "Approved"],
true,
false
]
}
Pattern 2: Score Display Formatting
Requirement: Show quality of life score as "X/100" format
{
"field_group": "Quality_Metrics",
"field_name": "QOL_Score_Display",
"source_name": "q_id=...",
"source_id": "q_id=...",
"field_path": ["answers", "overall_score"],
"field_template": "$value/100"
}
Pattern 3: Status Translation with Suffix
Requirement: Show inclusion status with " - AP" for terminated patients
{
"field_group": "Inclusion",
"field_name": "Status_With_Termination",
"source_name": "Calculated",
"source_id": "append_terminated_suffix",
"field_path": ["Inclusion_Status", "isPrematurelyTerminated"]
}
Pattern 4: List-to-String Conversion
Requirement: Show all diagnoses as pipe-separated text
{
"field_group": "Medical_Data",
"field_name": "All_Diagnoses",
"source_name": "Record",
"source_id": "record",
"field_path": ["record", "diagnoses", "*", "code"]
}
// Result: "ICD-001|ICD-002|ICD-003"
Pattern 5: Optional Field Based on Condition
Requirement: Only show surgery details if surgery was performed
{
"field_group": "Surgery",
"field_name": "Surgery_Details",
"source_name": "Record",
"source_id": "record",
"field_path": ["record", "surgery", "details"],
"field_condition": "Surgery.Surgery_Performed"
}
// If Surgery_Performed = false → "N/A"
Pattern 6: Enum-to-Text Mapping
Requirement: Convert numeric status codes to readable text
{
"field_group": "Status",
"field_name": "Inclusion_Status_Text",
"source_name": "Inclusion",
"source_id": "inclusion",
"field_path": ["status_code"],
"value_labels": [
{"value": 0, "text": {"fr": "Pré-inclus", "en": "Pre-included"}},
{"value": 1, "text": {"fr": "Inclus", "en": "Included"}},
{"value": 2, "text": {"fr": "Exclus", "en": "Excluded"}}
]
}
Pattern 7: Pattern Matching in Multiple Fields
Requirement: Check if any medical note mentions specific condition
{
"field_group": "Medical",
"field_name": "Mentions_Hypertension",
"source_name": "Calculated",
"source_id": "search_in_fields_using_regex",
"field_path": [
"hypertension|high.*pressure|HBP",
"Medical_History",
"Current_Conditions",
"Medication_Notes"
]
}
Pattern 8: Extracted Parenthetical Classification
Requirement: Extract diagnosis type from formatted text like "Disease (Type A)"
{
"field_group": "Classification",
"field_name": "Diagnosis_Type",
"source_name": "Calculated",
"source_id": "extract_parentheses_content",
"field_path": ["Formatted_Diagnosis"]
}
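For the "Disease (Type A)" requirement, the extraction amounts to a single regular expression. The sketch below only illustrates the idea; the helper name `parentheses_content`, the choice of the first parenthesized group, and the "undefined" fallback are assumptions rather than the documented behavior of extract_parentheses_content.

```python
import re


def parentheses_content(text):
    """Return the text inside the first pair of parentheses, if any."""
    match = re.search(r"\(([^)]*)\)", text)
    return match.group(1) if match else "undefined"
```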
Troubleshooting
Issue 1: "Invalid JSON format" Error
Symptom: Configuration validation fails with JSON parsing error
Cause: Malformed JSON in field_path, value_labels, or field_condition
Solution:
- Open cell in JSON validator (jsonlint.com)
- Verify all:
  - Array brackets: [...]
  - Object braces: {...}
  - String quotes: "..."
  - Commas between elements
- Fix syntax errors
- Re-run validation
Example:
["name" "address"] // WRONG: no comma after "name"
["name", "address"] // CORRECT
Issue 2: Field Returns "undefined"
Symptom: Field value always "undefined" in output
Causes:
- Field path doesn't match actual data structure
- Questionnaire ID incorrect
- Source type mismatch
Solution:
- Check if source data exists in endobest_inclusions_old.json
- Verify JSON path by stepping through manually
- Check questionnaire ID (use q_id for fastest lookup)
- Enable debug mode to see detailed errors
python eb_dashboard.py --debug
Issue 3: Empty Array Result
Symptom: Wildcard path returns empty array instead of values
Causes:
- Array elements don't exist at specified path
- Wildcard position incorrect in path
Solution:
- Verify array exists in source data
- Check array element structure
- Test path manually in JSON tool
Example:
// WRONG: No elements at this path
["record", "items", "*", "nonexistent_field"]
// CORRECT: Match actual structure
["record", "items", "*", "existing_field"]
Issue 4: Calculated Field Returns Error
Symptom: Calculated field value starts with "$$$$"
Causes:
- Function name wrong
- Function argument count mismatch
- Referenced fields not yet processed
Solution:
- Check function name spelling
- Verify argument count in field_path
- Ensure referenced fields are defined BEFORE calculated field
- Check for circular dependencies
Common Errors:
"$$$$ Unknown Custom Function: typo_name"
→ Check function name spelling
"$$$$ Argument Error: function requires N arguments"
→ Check field_path array length
"$$$$ Value Error: undefined"
→ Referenced field is undefined; check order in config
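Ordering problems can be spotted before a run by scanning the configuration top to bottom. The sketch below is a hypothetical helper, not an existing check: it handles only if_then_else fields, and it assumes every reference is the field_name of another configured row (a reference to a value that comes from elsewhere would be reported as well).

```python
def referenced_fields(field_path):
    """Field names used by an if_then_else expression (operator skipped)."""
    names = []
    for token in field_path[1:]:
        tokens = token if isinstance(token, list) else [token]
        # Strings without a $ prefix are field references
        names += [t for t in tokens if isinstance(t, str) and not t.startswith("$")]
    return names


def find_order_problems(configs):
    """Return (calculated_field, referenced_field) pairs not yet defined."""
    seen, problems = set(), []
    for config in configs:
        if config.get("source_id") == "if_then_else":
            problems += [
                (config["field_name"], name)
                for name in referenced_fields(config["field_path"])
                if name not in seen
            ]
        seen.add(config["field_name"])
    return problems
```

An empty result means every referenced field appears earlier in the sheet than the calculated field that uses it.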
Issue 5: value_labels Not Applied
Symptom: Raw value shown instead of mapped label
Causes:
- Raw value doesn't match any entry in value_labels
- JSON syntax error in value_labels
- Case sensitivity mismatch
Solution:
- Check raw value type (string vs. number)
- Verify exact match in value_labels
- Check for case mismatches (e.g., "Active" vs "active")
- Add wildcard entry if needed
Example:
{
"value_labels": [
{"value": "active", "text": {"fr": "Actif"}},
{"value": "inactive", "text": {"fr": "Inactif"}},
{"value": "*", "text": {"fr": "Autre"}}
]
}
// The "*" entry is a catch-all for unmapped values
Issue 6: Performance Degradation After Adding Field
Symptom: Collection takes significantly longer after adding field
Causes:
- Sequential questionnaire search (use q_id instead)
- Expensive regex in search_in_fields_using_regex
- Deep wildcard paths (multiple levels)
Solution:
- Use q_id= instead of q_name= or q_category=
- Simplify regex patterns
- Flatten wildcard paths where possible
Summary
The Field Mapping Configuration provides:
✅ 100% Externalized: No code changes needed to add fields
✅ Flexible Sourcing: Support for questionnaires, records, requests, calculated fields
✅ Rich Transformations: Labels, templates, conditions, custom functions
✅ User-Friendly: Excel-based configuration with validation
✅ Performance Optimized: Single-call questionnaire fetching, field batching
This architecture enables rapid iteration on data extraction without deploying code changes.
Document End