# Endobest Field Mapping Configuration Guide
## Part 2: Field Mapping & Configuration
**Document Version:** 2.0 (Updated with new module references)
**Last Updated:** 2025-11-08
**Audience:** Developers, Business Analysts, Data Managers
**Language:** English
**Note:** The configuration file `Endobest_Dashboard_Config.xlsx` uses the `Inclusions_Mapping` sheet for field definitions (see DOCUMENTATION_13_EXCEL_EXPORT.md and DOCUMENTATION_99_CONFIG_GUIDE.md for Excel export configuration).
---
## Table of Contents
1. [Overview](#overview)
2. [Technical Architecture](#technical-architecture)
3. [Field Processing Logic](#field-processing-logic)
4. [Configuration File Structure](#configuration-file-structure)
5. [Column Reference](#column-reference)
6. [Special Value Prefixes](#special-value-prefixes)
7. [Data Sources Explained](#data-sources-explained)
8. [Field Path Syntax](#field-path-syntax)
9. [Custom Functions Reference](#custom-functions-reference)
10. [Post-Processing Transformations](#post-processing-transformations)
11. [Configuration Examples](#configuration-examples)
12. [User Guide: Adding/Modifying Fields](#user-guide-adding-modifying-fields)
13. [Common Patterns & Recipes](#common-patterns--recipes)
14. [Troubleshooting](#troubleshooting)
---
## Overview
The **Field Mapping Configuration** defines which data points are extracted from multiple APIs (RC, GDD, questionnaires) and how they are transformed before export. The configuration is **100% externalized in an Excel file**, enabling non-technical users to add new fields without code modifications.
### Key Concepts
- **Field Group:** Logical container for related fields (e.g., "Patient_Identification", "Inclusion", "Endotest")
- **Field Name:** Unique identifier for the field within its group
- **Source:** Where the data comes from (questionnaire, record, inclusion, request)
- **Field Path:** JSON path to navigate nested structures
- **Transformations:** Post-processing rules (labels, templates, conditions)
- **Custom Functions:** Calculated fields with business logic
---
## Technical Architecture
### Field Extraction Pipeline (Detailed)
```
CONFIGURATION LOADING (startup):
├─ Load Endobest_Dashboard_Config.xlsx
├─ Parse Inclusions_Mapping sheet (rows 2 onwards)
├─ Validate each field configuration
├─ Parse JSON fields (field_path, value_labels, true_if_any, field_condition)
└─ Store in DASHBOARD_CONFIG array
FIELD PROCESSING (per patient):
├─ For each field in DASHBOARD_CONFIG:
│  ├─ Determine source type (questionnaire, record, inclusion, request, calculated)
│  │
│  ├─ IF source == questionnaire:
│  │  ├─ Method: Search by q_id, q_name, or q_category
│  │  ├─ Data: All questionnaires already fetched for patient
│  │  ├─ Path: Navigate to field_path within questionnaire answers
│  │  └─ Result: raw_value or "undefined"
│  │
│  ├─ IF source == record:
│  │  ├─ Data: Patient's clinical record
│  │  ├─ Path: Navigate JSON structure using field_path
│  │  └─ Result: raw_value or "undefined"
│  │
│  ├─ IF source == inclusion:
│  │  ├─ Data: Patient inclusion metadata
│  │  ├─ Path: Navigate nested inclusion structure
│  │  └─ Result: raw_value or "undefined"
│  │
│  ├─ IF source == request:
│  │  ├─ Data: Lab test request/results
│  │  ├─ Path: Navigate request JSON structure
│  │  └─ Result: raw_value or "undefined"
│  │
│  ├─ IF source == calculated:
│  │  ├─ Function: Custom business logic function
│  │  ├─ Arguments: From field_path
│  │  ├─ Access: Other fields already processed in output_inclusion
│  │  └─ Result: Computed value
│  │
│  ├─ CHECK field_condition (optional):
│  │  ├─ If condition is false → Set to "N/A"
│  │  ├─ If condition is undefined → Set to "undefined"
│  │  └─ If condition is true → Continue processing
│  │
│  ├─ APPLY post-processing transformations:
│  │  ├─ true_if_any: Convert to boolean
│  │  ├─ value_labels: Map to localized text
│  │  ├─ field_template: Apply formatting
│  │  └─ List joining: Flatten arrays
│  │
│  └─ STORE: output_inclusion[field_group][field_name] = final_value
└─ Result: Complete inclusion with all extended fields
```
### Questionnaire Finding Strategy
The system supports **3 methods** to locate questionnaires:
```python
def find_questionnaire(all_questionnaires, source_type, source_value):
    if source_type == "q_id":
        # Direct lookup by questionnaire ID (fastest)
        return all_questionnaires.get(source_value, {}).get("answers")
    elif source_type == "q_name":
        # Sequential search by questionnaire name
        for qcm_data in all_questionnaires.values():
            if qcm_data["questionnaire"]["name"] == source_value:
                return qcm_data.get("answers")
        return None
    elif source_type == "q_category":
        # Sequential search by questionnaire category
        for qcm_data in all_questionnaires.values():
            if qcm_data["questionnaire"]["category"] == source_value:
                return qcm_data.get("answers")
        return None
```
**Recommendation:** Use `q_id=` for best performance (direct lookup)
### Questionnaire Data Optimization
Instead of multiple filtered API calls:
```
BEFORE (slow):
  GET /api/surveys/{qcm_id_1}/answers?subject={patient_id}
  GET /api/surveys/{qcm_id_2}/answers?subject={patient_id}
  GET /api/surveys/{qcm_id_3}/answers?subject={patient_id}
  ... (N calls per patient)
AFTER (optimized - single call):
  POST /api/surveys/filter/with-answers
  payload: {"context": "clinic_research", "subject": patient_id}
  returns: [
    {"questionnaire": {id, name, category}, "answers": {...}},
    {"questionnaire": {id, name, category}, "answers": {...}},
    ...
  ]
```
All questionnaires for a patient are returned as a list in a single call. The system then indexes that list by questionnaire ID so that `q_id=` lookups are direct.
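The indexing step can be sketched as follows. This is a minimal illustration, assuming the response is a list of objects shaped like the `returns` payload above; `index_questionnaires` is a hypothetical helper name, not necessarily the one used in the codebase.

```python
def index_questionnaires(response):
    """Index the list returned by the single API call by questionnaire ID.

    Hypothetical helper: entries without a questionnaire ID are skipped.
    """
    indexed = {}
    for entry in response:
        q_id = entry.get("questionnaire", {}).get("id")
        if q_id is not None:
            indexed[q_id] = entry
    return indexed


# Illustrative data only (not real questionnaire IDs)
sample_response = [
    {"questionnaire": {"id": "qcm-1", "name": "Symptoms", "category": "A"}, "answers": {"q1": 1}},
    {"questionnaire": {"id": "qcm-2", "name": "History", "category": "B"}, "answers": {"q1": 2}},
]
all_questionnaires = index_questionnaires(sample_response)
```

The resulting dictionary has the shape expected by `find_questionnaire`, so a `q_id` lookup becomes a single dictionary access.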
---
## Field Processing Logic
### Step 1: Source Type Determination
| Source Prefix | Meaning | Example | Data Location |
|---------------|---------|---------|----------------|
| `q_id=` | Questionnaire by ID | `q_id=uuid-123` | `all_questionnaires[uuid-123]["answers"]` |
| `q_name=` | Questionnaire by name | `q_name=Symptom Check` | Search by `["questionnaire"]["name"]` |
| `q_category=` | Questionnaire by category | `q_category=Symptoms` | Search by `["questionnaire"]["category"]` |
| `record` | Clinical record | `record` | `record_data["record"]` |
| `inclusion` | Inclusion metadata | `inclusion` | `inclusion_data` |
| `request` | Lab test request | `request` | `request_data` |
| (Calculated) | Custom function | N/A | Function result |
### Step 2: Raw Value Extraction
The `field_path` defines how to navigate nested JSON structures:
```python
# Simple path
field_path = ["patient", "name"]
# Equivalent to: data["patient"]["name"]
# Nested path
field_path = ["record", "clinicResearchData", 0, "data"]
# Equivalent to: data["record"]["clinicResearchData"][0]["data"]
# Wildcard path (returns array)
field_path = ["record", "clinicResearchData", "*", "test_name"]
# Returns: [test_name_1, test_name_2, test_name_3, ...]
# Deep wildcard
field_path = ["record", "*", "results", "*", "value"]
# Matches all results.*.value across all record items
```
### Step 3: Field Condition Checking (Optional)
The `field_condition` allows skipping field processing based on another field's value:
```
IF field_condition is specified:
├─ Look up condition field value in output_inclusion
├─ IF condition value is None or "undefined":
│  └─ Set final_value = "undefined" (skip further processing)
├─ IF condition value is not a boolean:
│  └─ Set final_value = "$$$$ Condition Field Error"
├─ IF condition value is False:
│  └─ Set final_value = "N/A" (field not applicable)
└─ IF condition value is True:
   └─ Continue with post-processing
```
**Example:**
```json
{
"field_group": "Endotest",
"field_name": "Request_Status",
"source_id": "request",
"field_path": ["status"],
"field_condition": "Endotest.Request_Sent"
}
```
Meaning: Only populate "Request_Status" if "Request_Sent" is True. Otherwise set to "N/A".
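The condition check described in this step could look roughly like the sketch below. It assumes `output_inclusion` is a nested dictionary keyed by group and then field name, and that `field_condition` uses the `<field_group>.<field_name>` form; `check_field_condition` is a hypothetical name introduced for illustration.

```python
UNDEFINED = "undefined"


def check_field_condition(output_inclusion, field_condition):
    """Return None to continue processing, or the final value that ends it early."""
    if not field_condition:
        return None  # no condition configured
    group, _, name = field_condition.partition(".")
    value = output_inclusion.get(group, {}).get(name)
    if value is None or value == UNDEFINED:
        return UNDEFINED
    if not isinstance(value, bool):
        return "$$$$ Condition Field Error"
    return None if value else "N/A"


output_inclusion = {"Endotest": {"Request_Sent": False, "Comment": "yes"}}
result = check_field_condition(output_inclusion, "Endotest.Request_Sent")
```

Here `result` is `"N/A"` because `Request_Sent` is `False`, so `Request_Status` would not be processed further.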
### Step 4: Post-Processing Transformations
#### 4a. Array Flattening
If `raw_value` is an array → Join with `|` delimiter:
```
Input: ["Active", "Pending", "Resolved"]
Output: "Active|Pending|Resolved"
```
#### 4b. Score Dictionary Formatting
If `raw_value` is a dict with exactly the keys `['total', 'max']` → Format as string:
```
Input: {"total": 8, "max": 10}
Output: "8/10"
```
#### 4c. true_if_any Transformation
If `true_if_any` is specified → Convert to boolean:
```
true_if_any: ["Active", "Pending"]
raw_value: "Active"
→ Does raw_value match ANY value in true_if_any list?
→ TRUE
```
#### 4d. value_labels Mapping
If `value_labels` is specified → Map value to localized text:
```json
{
"raw_value": "active",
"value_labels": [
{"value": "active", "text": {"fr": "Actif", "en": "Active"}},
{"value": "inactive", "text": {"fr": "Inactif", "en": "Inactive"}}
]
}
Output: "Actif" (French text)
```
#### 4e. field_template Formatting
If `field_template` is specified → Apply template with `$value` placeholder:
```
field_template: "Score: $value/100"
final_value: 85
→ Output: "Score: 85/100"
```
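Taken together, steps 4a to 4e can be sketched as one function. This is only an illustration under stated assumptions: the transformations are applied in the order they are listed in this section, French (`fr`) is the label language, and `post_process` is a hypothetical name rather than the actual implementation.

```python
def post_process(raw_value, true_if_any=None, value_labels=None,
                 field_template=None, lang="fr"):
    """Apply steps 4a-4e in the order listed above (assumed order)."""
    value = raw_value
    if value in ("undefined", "N/A"):
        return value
    if isinstance(value, list):
        # 4a: join arrays with the "|" delimiter
        value = "|".join(str(item) for item in value)
    elif isinstance(value, dict) and set(value) == {"total", "max"}:
        # 4b: format score dictionaries as "total/max"
        value = f"{value['total']}/{value['max']}"
    if true_if_any is not None:
        # 4c: boolean conversion
        value = value in true_if_any
    if value_labels is not None:
        # 4d: map to the localized label, or flag an unknown value
        for label in value_labels:
            if label["value"] == value:
                value = label["text"][lang]
                break
        else:
            value = f"$$$$ Value Error: {value}"
    if field_template:
        # 4e: substitute the $value placeholder
        value = field_template.replace("$value", str(value))
    return value


labels = [{"value": "active", "text": {"fr": "Actif", "en": "Active"}}]
```

For example, `post_process("active", value_labels=labels)` yields `"Actif"`, and `post_process(85, field_template="Score: $value/100")` yields `"Score: 85/100"`.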
---
## Configuration File Structure
### File Location
```
Endobest_Dashboard_Config.xlsx
├─ Sheet 1: "Inclusions_Mapping" (field mapping definition)
└─ Sheet 2: "Regression_Check" (non-regression rules)
   [See DOCUMENTATION_12_QUALITY_CHECKS.md]
```
### Inclusions_Mapping Sheet Overview
```
Row 1 (Headers):
  A            B           C            D          E
  field_group  field_name  source_name  source_id  field_path
  F               G                H            I
  field_template  field_condition  true_if_any  value_labels
Row 2+: Field definitions (one per row)
```
**Color Coding** (for visual identification):
- **Yellow:** Extended fields or Calculated fields (requires special attention)
- **Blue:** Questionnaire-sourced fields (q_id, q_name, q_category)
- **Red:** Fields with errors or missing required data
- **White:** Record/Inclusion/Request fields
---
## Column Reference
### Column A: field_group
**Type:** String (required)
**Description:** Logical grouping of related fields in output JSON
**Rules:**
- The combination of field_group and field_name must be unique (the same field_name can exist in different groups)
- Becomes a dictionary key in JSON: `output[field_group][field_name]`
- Controls field visibility in regression checks
**Examples:**
```
Patient_Identification → Contains patient metadata
Inclusion → Inclusion status and data
Endotest → Lab test information
Custom_Data → Default for general fields
Infos_Générales → General information
Antécédents Médicaux → Medical history
```
### Column B: field_name
**Type:** String (required)
**Description:** Unique field identifier within its group
**Rules:**
- Must not be empty
- Can contain letters, numbers, underscores, hyphens
- Special text in parentheses is automatically removed
- Example: `Patient_Age (years)` → `Patient_Age`
**Excel Behavior:** When cell contains `Patient_Age (years)`, the system parses it as:
```
field_name = "Patient_Age" # Parenthetical text stripped
```
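A minimal sketch of this stripping rule, assuming any parenthetical text in the cell is removed along with the whitespace before it; `clean_field_name` is a hypothetical helper name.

```python
import re


def clean_field_name(cell_value):
    """Remove parenthetical text (and the whitespace before it) from a field_name cell."""
    return re.sub(r"\s*\([^)]*\)", "", cell_value).strip()


cleaned = clean_field_name("Patient_Age (years)")
```

With this input, `cleaned` is `"Patient_Age"`. A name without parentheses is returned unchanged.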
### Column C: source_name
**Type:** String (enum)
**Required:** Yes (unless cell contains "Not Specified")
**Valid Values:**
```
Inclusion → Field from inclusion data
Record → Field from clinical record
Request → Field from lab test request
Patient / Douleurs → Questionnaire name (implicit q_name=)
Signes et symptômes → Questionnaire name (implicit q_name=)
Calculated → Custom function (no direct source)
Not Specified → Skip this row (used for spacing/comments)
```
### Column D: source_id
**Type:** String (enum with prefixes or JSON array)
**Description:** Specifies how to identify the data source
#### Format Options:
**1. Questionnaire by ID (Recommended)**
```
Syntax: q_id=<uuid>
Example: q_id=550e8400-e29b-41d4-a716-446655440000
Speed: Fastest (direct lookup)
```
**2. Questionnaire by Name**
```
Syntax: q_name=<name>
Example: q_name=Symptom Questionnaire
Speed: Slower (sequential search)
```
**3. Questionnaire by Category**
```
Syntax: q_category=<category>
Example: q_category=Medical History
Speed: Slower (sequential search)
```
**4. Record Source**
```
Value: record
Means: Extract from clinical record data
```
**5. Inclusion Source**
```
Value: inclusion
Means: Extract from inclusion metadata
```
**6. Request Source**
```
Value: request
Means: Extract from lab test request
```
**7. Calculated Function**
```
Syntax: <function_name>
Example: search_in_fields_using_regex, if_then_else, extract_parentheses_content
See Section: Custom Functions Reference
```
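The seven formats above can be told apart with a small parser. The sketch below is an assumption about how such parsing might work, not the actual implementation; `parse_source_id` and the returned tuple shape are hypothetical.

```python
QUESTIONNAIRE_PREFIXES = ("q_id", "q_name", "q_category")
DIRECT_SOURCES = ("record", "inclusion", "request")


def parse_source_id(source_id):
    """Return a (source_type, source_value) tuple for a Column D value."""
    source_id = source_id.strip()
    prefix, separator, value = source_id.partition("=")
    if separator and prefix in QUESTIONNAIRE_PREFIXES:
        return prefix, value
    if source_id in DIRECT_SOURCES:
        return source_id, None
    # Anything else is treated as the name of a custom function
    return "calculated", source_id


parsed = parse_source_id("q_name=Symptom Questionnaire")
```

Here `parsed` is `("q_name", "Symptom Questionnaire")`, while `parse_source_id("if_then_else")` falls through to `("calculated", "if_then_else")`.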
### Column E: field_path
**Type:** JSON array (required when field is specified)
**Description:** Path to navigate nested JSON structure
#### Syntax Examples:
**Simple field:**
```json
["name"]
// Equivalent to: data["name"]
```
**Nested path:**
```json
["record", "patient", "demographics", "age"]
// Equivalent to: data["record"]["patient"]["demographics"]["age"]
```
**Array index:**
```json
["record", "clinicResearchData", 0, "test_name"]
// Equivalent to: data["record"]["clinicResearchData"][0]["test_name"]
```
**Wildcard (all elements):**
```json
["record", "clinicResearchData", "*", "test_name"]
// Returns: [test_name_1, test_name_2, test_name_3, ...]
// Result: Automatically joined with "|" in final value
```
**For Calculated Functions (arguments):**
```json
[
".*surgery.*",
"Previous_Surgery",
"Recent_Surgery"
]
// The function name itself goes in source_id (Column D), e.g. search_in_fields_using_regex
// field_path holds only the arguments passed to that function
```
### Column F: field_template
**Type:** String with `$value` placeholder (optional)
**Description:** Apply formatting to the final value
**Rules:**
- Only applied if final_value is not "undefined" or "N/A"
- Must contain `$value` placeholder
- Result: Template with `$value` replaced by actual value
**Examples:**
```
Template: "$value%"
Value: 85
Result: "85%"
Template: "Score: $value/100"
Value: 42
Result: "Score: 42/100"
Template: "Status: $value (Updated)"
Value: "Active"
Result: "Status: Active (Updated)"
```
### Column G: field_condition
**Type:** String (field name reference, optional)
**Description:** Conditional field inclusion based on another field's value
**Rules:**
- If specified, must reference another field name already processed
- Must evaluate to a boolean value
- Referenced as `<field_group>.<field_name>`
**Logic:**
```
IF field_condition_value == True:
    Process field normally
ELIF field_condition_value == False:
    Set final_value = "N/A"
ELIF field_condition_value is undefined/null:
    Set final_value = "undefined"
ELSE (non-boolean):
    Set final_value = "$$$$ Condition Field Error"
```
**Examples:**
```
field_condition: Inclusion.isPrematurelyTerminated
Meaning: Only process this field if patient is prematurely terminated
field_condition: Endotest.Request_Sent
Meaning: Only process if test request was sent
```
### Column H: true_if_any
**Type:** JSON array (optional)
**Description:** Convert to boolean if value matches ANY item in array
**Syntax:**
```json
["value1", "value2", "value3"]
```
**Logic:**
```
LOOP through true_if_any array:
    IF raw_value == any_item:
        RETURN True
RETURN False
```
**Example:**
```json
{
"field_name": "Is_Active",
"true_if_any": ["active", "pending", "processing"]
}
raw_value = "pending"
→ Does "pending" exist in ["active", "pending", "processing"]?
→ YES → Final value = True
raw_value = "completed"
→ Does "completed" exist in the list?
→ NO → Final value = False
```
### Column I: value_labels
**Type:** JSON array of mapping objects (optional)
**Description:** Map field values to localized text labels
**Syntax:**
```json
[
{
"value": "raw_value_1",
"text": {
"fr": "Libellé Français",
"en": "English Label"
}
},
{
"value": "raw_value_2",
"text": {
"fr": "Autre Libellé",
"en": "Another Label"
}
}
]
```
**Logic:**
```
LOOP through value_labels array:
    IF label_map.value == raw_value:
        RETURN label_map.text.fr (French text)
IF no match found:
    RETURN "$$$$ Value Error: {raw_value}"
```
**Example:**
```json
{
"field_name": "Status",
"value_labels": [
{
"value": 1,
"text": {"fr": "Inclus", "en": "Included"}
},
{
"value": 0,
"text": {"fr": "Pré-inclus", "en": "Pre-included"}
}
]
}
raw_value = 1
Map to French label: "Inclus"
```
---
## Special Value Prefixes
This section documents special prefixes and keywords used in Extended Fields configuration for value resolution and field references.
### Prefix: `$` (String Literal)
**Location:** In function arguments (like `if_then_else` parameters)
**Meaning:** Marks a string value as a literal (not a field reference)
**Syntax:** `$value` (just prefix with `$`, no quotes needed)
**Without `$` prefix:**
```json
{
"field_path": ["is_true", "Has_Consent", "YES", "NO"]
}
// "YES" is interpreted as a FIELD NAME to look up
// This will fail because no field named "YES" exists
```
**With `$` prefix (correct):**
```json
{
"field_path": ["is_true", "Has_Consent", "$YES", "$NO"]
}
// $YES is interpreted as LITERAL STRING "YES"
// $NO is interpreted as LITERAL STRING "NO"
// Has_Consent is interpreted as FIELD NAME (no prefix)
```
**Why It Matters:** The system needs to distinguish between:
- **Field references** (look up values): `Status`, `Is_Active`, `Patient_Id`
- **Literal values** (use as-is): `$Active`, `$N/A`, `$Ready`
---
### No Prefix: Field References
**Location:** Arguments where field names are expected
**Meaning:** Refers to a field in the current inclusion data
**Examples:**
```json
{
"field_path": ["is_true", "Has_Consent", "$YES", "$NO"]
}
// Has_Consent ← field reference (look up this field's value)
// Other unprefixed names, such as Status or Is_Active, are resolved the same way
```
**Resolution:** The system looks up the field in the current inclusion object.
---
### Wildcard: `*` in Field Paths
**Location:** In `field_path` column (Column E in Mapping sheet)
**Meaning:** Match all elements at this level
**Syntax:**
```json
["record", "*", "results", "*", "value"]
```
**Example 1: Single Level Wildcard**
```json
{
"field_path": ["items", "*", "name"]
}
// Returns all "name" values from each item
// If items = [
// {name: "Item 1", ...},
// {name: "Item 2", ...},
// {name: "Item 3", ...}
// ]
// Result: ["Item 1", "Item 2", "Item 3"]
// Final output: "Item 1|Item 2|Item 3" (pipe-joined)
```
**Example 2: Multiple Level Wildcard**
```json
{
"field_path": ["record", "*", "data", "*", "test"]
}
// Matches test values at multiple nesting levels
```
**Post-Processing:**
- Arrays are automatically joined with `|` delimiter
- Scalar values are kept as-is
---
### Value Resolution in if_then_else
When using the `if_then_else` function, values are resolved based on their format:
| Format | Type | Resolution |
|--------|------|-----------|
| `true`, `false` | Boolean literal | Used directly |
| `42`, `3.14` | Numeric literal | Used directly |
| `$string` | String literal | Remove `$` prefix and use value |
| `field_name` | Field reference | Look up field value |
**Examples:**
```json
{
"field_path": ["is_true", "Has_Consent", "$APPROVED", "$NOT_APPROVED"]
}
// Has_Consent → field reference (look it up)
// $APPROVED → string literal (use "APPROVED")
// $NOT_APPROVED → string literal (use "NOT_APPROVED")
{
"field_path": ["==", "Status", "$Active", "Overall_Status", "$MISSING"]
}
// Status → field reference
// $Active → string literal (use "Active")
// Overall_Status → field reference
// $MISSING → string literal (use "MISSING")
```
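The resolution table can be expressed as a short function. This sketch assumes boolean and numeric literals arrive as native JSON values, and that an unprefixed name is searched across all groups of the current inclusion; `lookup_field` and `resolve_argument` are hypothetical names.

```python
def lookup_field(output_inclusion, field_name):
    """Hypothetical lookup: return the first matching field across all groups."""
    for fields in output_inclusion.values():
        if field_name in fields:
            return fields[field_name]
    return "undefined"


def resolve_argument(arg, output_inclusion):
    """Resolve one if_then_else argument according to the table above."""
    if isinstance(arg, (bool, int, float)):
        return arg  # boolean or numeric literal, used directly
    if isinstance(arg, str) and arg.startswith("$"):
        return arg[1:]  # string literal, "$" prefix removed
    return lookup_field(output_inclusion, arg)  # field reference


inclusion = {"Inclusion": {"Status": "Active", "Has_Consent": True}}
resolved = [resolve_argument(a, inclusion) for a in ("Status", "$Active", 42)]
```

After this runs, `resolved` is `["Active", "Active", 42]`: the first entry comes from a field lookup, the second from a literal.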
---
### Summary Table: Special Prefixes
| Symbol | Meaning | Example |
|--------|---------|---------|
| `$value` | String literal (remove `$` prefix) | `$YES`, `$READY`, `$N/A` |
| No prefix | Field reference (look up) | `Status`, `Patient_Id` |
| `*` | Wildcard in field_path (all array elements) | `["items", "*", "name"]` |
---
## Data Sources Explained
### 1. Questionnaire Sources (q_id, q_name, q_category)
#### What Are Questionnaires?
Questionnaires are forms/surveys filled out by patients or clinicians in the Research Clinic system. Each questionnaire has:
- **ID:** Unique identifier (UUID)
- **Name:** Display name (e.g., "Symptom Assessment")
- **Category:** Logical grouping (e.g., "Medical History")
- **Answers:** Key-value pairs of responses
#### Data Structure
```json
all_questionnaires: {
"qcm-uuid-1": {
"questionnaire": {
"id": "qcm-uuid-1",
"name": "Symptom Questionnaire",
"category": "Symptoms"
},
"answers": {
"question_1": "answer_value",
"question_2": true,
"question_3": 42
}
},
"qcm-uuid-2": {
"questionnaire": {
"id": "qcm-uuid-2",
"name": "Medical History",
"category": "History"
},
"answers": {
"has_diabetes": false,
"has_hypertension": true
}
}
}
```
#### Finding Questionnaires
**Option 1: By ID (Fastest)**
```json
{
"source_id": "q_id=qcm-uuid-1",
"field_path": ["answers", "question_1"]
}
// Direct lookup in dictionary by ID
// Performance: O(1) constant time
```
**Option 2: By Name**
```json
{
"source_id": "q_name=Symptom Questionnaire",
"field_path": ["answers", "question_1"]
}
// Sequential search through all questionnaires
// Performance: O(n) proportional to questionnaire count
```
**Option 3: By Category**
```json
{
"source_id": "q_category=Symptoms",
"field_path": ["answers", "question_1"]
}
// Sequential search for category match
// Performance: O(n)
```
**Recommendation:** Use `q_id=` for best performance. Name and category searches are slower but acceptable if IDs are not available.
### 2. Record Source (Clinical Data)
#### What Is Record Data?
The clinical record contains all medical information for a patient within the Research Clinic context:
- Protocol inclusions status
- Clinical research data (test requests, results)
- Patient demographics
- Medical history
#### Data Structure
```json
record_data: {
"record": {
"id": "record-uuid",
"patientId": "patient-uuid",
"protocol_inclusions": [
{
"status": "incluse",
"blockedQcmVersions": [],
"clinicResearchData": [
{
"requestMetaData": {
"tubeId": "tube-uuid-123"
},
"needRcp": false
}
]
}
]
}
}
```
#### Example Extraction
```json
{
"source_id": "record",
"field_path": ["record", "protocol_inclusions", 0, "status"]
}
// Result: "incluse"
{
"source_id": "record",
"field_path": ["record", "clinicResearchData", "*", "requestMetaData", "tubeId"]
}
// Result: ["tube-uuid-1", "tube-uuid-2"]
// Final: "tube-uuid-1|tube-uuid-2"
```
### 3. Inclusion Source (Inclusion Metadata)
#### What Is Inclusion Data?
Inclusion data contains metadata about the patient's inclusion in the research protocol:
- Basic patient information (name, birthday)
- Organization assignment
- Inclusion status
- Inclusion date
#### Data Structure
```json
inclusion_data: {
"id": "patient-uuid",
"name": "Doe, John",
"birthday": "1975-05-15",
"status": "incluse",
"inclusionDate": "2024-10-15",
"organization_id": "org-uuid-added-by-system",
"organization_name": "Center Name-added-by-system"
}
```
#### Example Extraction
```json
{
"source_id": "inclusion",
"field_path": ["name"]
}
// Result: "Doe, John"
{
"source_id": "inclusion",
"field_path": ["status"]
}
// Result: "incluse"
```
### 4. Request Source (Lab Test Data)
#### What Is Request Data?
Request data contains information about laboratory tests ordered and their results:
- Test request status
- Diagnostic status
- Individual test results
- Result values
#### Data Structure
```json
request_data: {
"id": "request-uuid",
"tubeId": "tube-uuid-123",
"status": "completed",
"diagnostic_status": "Completed",
"results": [
{
"testName": "Complete Blood Count",
"value": "Normal",
"unit": ""
},
{
"testName": "Coelioscopie",
"value": "Findings documented",
"unit": ""
}
]
}
```
#### Example Extraction
```json
{
"source_id": "request",
"field_path": ["status"]
}
// Result: "completed"
{
"source_id": "request",
"field_path": ["results", "*", "testName"]
}
// Result: ["Complete Blood Count", "Coelioscopie"]
// Final: "Complete Blood Count|Coelioscopie"
```
### 5. Calculated Source (Custom Functions)
#### What Are Calculated Fields?
Calculated fields derive their values from custom business logic functions, not direct data extraction. The function can access other already-processed fields and perform complex transformations.
#### Examples
```json
{
"source_name": "Calculated",
"source_id": "search_in_fields_using_regex",
"field_path": [".*SURGERY.*", "Previous_Surgery", "Recent_Surgery"]
}
// Function searches multiple fields using regex
{
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": ["is_true", "Requested", "$\"YES\"", "$\"NO\""]
}
// Function applies conditional logic
{
"source_name": "Calculated",
"source_id": "extract_parentheses_content",
"field_path": ["Status_Field"]
}
// Function extracts text from within parentheses
```
See **Section: Custom Functions Reference** for detailed function documentation.
### 6. Inclusion Source with Organization Enrichment (center_name)
#### What Is Organization Center Mapping?
The organization center mapping feature enriches patient inclusion data with standardized center identifiers. When configured, the `center_name` field is automatically added to each inclusion record, allowing you to group patients by center codes.
#### Data Source: Inclusion Type
```json
{
"source_name": "Inclusion",
"source_id": "inclusion",
"source_type": "inclusion",
"field_path": ["center_name"]
}
```
#### Fields Available from Organization Enrichment
| Field | Type | Description | Availability |
|-------|------|-------------|--------------|
| `center_name` | String | Standardized center identifier | If mapping file exists |
| `organization_name` | String | Full organization name | Always |
| `organization_id` | String | Organization UUID | Always |
#### Data Structure
```json
inclusion_data: {
"organization_id": "org-uuid",
"organization_name": "Hospital Cardiology Research Lab",
"center_name": "HCR-MAIN", // ← Added by organization mapping
"id": "patient-uuid",
...
}
```
#### Example Extraction
```json
{
"source_name": "Inclusion",
"source_id": "inclusion",
"source_type": "inclusion",
"field_path": ["center_name"]
}
// Result: "HCR-MAIN"
{
"source_name": "Inclusion",
"source_id": "inclusion",
"source_type": "inclusion",
"field_path": ["organization_name"]
}
// Result: "Hospital Cardiology Research Lab"
```
#### Configuration Requirements
**To use this feature:**
1. Create `eb_org_center_mapping.xlsx` in script directory (see [DOCUMENTATION_10_ARCHITECTURE.md](DOCUMENTATION_10_ARCHITECTURE.md) Organization ↔ Center Mapping section)
2. Define mapping rules in the `Org_Center_Mapping` sheet
3. Add extended field with source type "inclusion" and field_path ["center_name"]
**Availability:**
- ✅ If mapping file exists and organization is mapped → `center_name` = mapped value
- ⚠️ If mapping file missing or organization not in mapping → `center_name` = organization name (fallback)
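The fallback behavior can be sketched as below. The way the mapping is keyed (by organization name) is an assumption made for illustration only, and `add_center_name` is a hypothetical helper; the real mapping rules are defined in the `Org_Center_Mapping` sheet.

```python
def add_center_name(inclusion, org_center_mapping=None):
    """Set center_name from the mapping, falling back to the organization name."""
    organization_name = inclusion.get("organization_name")
    mapping = org_center_mapping or {}  # a missing mapping file behaves like an empty mapping
    inclusion["center_name"] = mapping.get(organization_name, organization_name)
    return inclusion


mapping = {"Hospital Cardiology Research Lab": "HCR-MAIN"}
mapped = add_center_name({"organization_name": "Hospital Cardiology Research Lab"}, mapping)
unmapped = add_center_name({"organization_name": "Other Center"}, mapping)
```

Here `mapped["center_name"]` is `"HCR-MAIN"`, while `unmapped["center_name"]` falls back to `"Other Center"`.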
#### Example Configuration
```json
{
"field_group": "Patient_Identification",
"field_name": "Center_Name",
"source_name": "Inclusion",
"source_id": "inclusion",
"source_type": "inclusion",
"field_path": ["center_name"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
```
**Result in output:**
```json
{
"Patient_Identification": {
"Organisation_Name": "Hospital Cardiology Research Lab",
"Center_Name": "HCR-MAIN",
...
}
}
```
---
## Field Path Syntax
### Basic Path Navigation
#### Single-Level Access
```json
["field_name"]
// JavaScript equivalent: data.field_name
// Result: value or undefined
```
#### Multi-Level Nesting
```json
["record", "patient", "demographics", "age"]
// JavaScript: data.record.patient.demographics.age
```
#### Array Index Access
```json
["items", 0, "name"]
// JavaScript: data.items[0].name
// Accesses first element of array
```
#### Negative Index (from end)
```json
["items", -1, "name"]
// JavaScript: data.items[data.items.length - 1].name
// Accesses last element of array
```
### Wildcard Paths (Multiple Values)
#### Single Wildcard (One Level)
```json
["questionnaire", "answers", "*", "value"]
// Returns all values from each answer object
// Result: Array of values [value1, value2, value3, ...]
```
#### Multiple Wildcards (Deep)
```json
["record", "*", "data", "*", "test"]
// Matches nested wildcards at multiple levels
// Returns: All tests at matching paths
```
#### Wildcard Result Flattening
```json
path: ["items", "*", "values", "*", "score"]
items: [
{
"values": [
{"score": 10},
{"score": 20}
]
},
{
"values": [
{"score": 30},
{"score": 40}
]
}
]
// Without flattening: [[10, 20], [30, 40]]
// With flattening (used): [10, 20, 30, 40]
```
### Edge Cases & Behavior
#### Missing Path
```json
field_path: ["missing", "field"]
data: {}
Result: "undefined" (not null or empty string)
```
#### Null/None Values in Path
```json
field_path: ["patient", "contact", "phone"]
data: {"patient": {"contact": null}}
Result: "undefined" (stops at null)
```
#### Non-Dictionary/Non-List Element
```json
field_path: ["patient", "name", "first"]
data: {"patient": {"name": "John"}} // "name" is string, not dict
Result: "undefined" (cannot navigate string)
```
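The navigation rules in this section (keys, indexes, wildcards with flattening, and the `"undefined"` edge cases) can be sketched as a recursive resolver. This is an illustration under assumptions: wildcards are only applied to lists, elements that resolve to `"undefined"` are dropped from wildcard results, and `get_by_path` is a hypothetical name.

```python
UNDEFINED = "undefined"


def get_by_path(data, field_path):
    """Resolve field_path against data; "*" fans out over list elements."""
    if not field_path:
        return UNDEFINED if data is None else data
    key, rest = field_path[0], field_path[1:]
    if key == "*":
        if not isinstance(data, list):
            return UNDEFINED
        results = []
        for item in data:
            value = get_by_path(item, rest)
            if value == UNDEFINED:
                continue
            if isinstance(value, list) and "*" in rest:
                results.extend(value)  # flatten nested wildcard results
            else:
                results.append(value)
        return results
    if isinstance(key, int) and isinstance(data, list):
        try:
            return get_by_path(data[key], rest)  # negative indexes count from the end
        except IndexError:
            return UNDEFINED
    if isinstance(key, str) and isinstance(data, dict) and key in data:
        return get_by_path(data[key], rest)
    return UNDEFINED  # missing key, null, or non-navigable element


data = {"items": [{"values": [{"score": 10}, {"score": 20}]},
                  {"values": [{"score": 30}, {"score": 40}]}]}
scores = get_by_path(data, ["items", "*", "values", "*", "score"])
```

With the flattening example above, `scores` is `[10, 20, 30, 40]`; the missing-path, null, and string cases all return `"undefined"`.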
---
## Custom Functions Reference
### Function 1: search_in_fields_using_regex
**Purpose:** Search multiple fields for regex pattern match (case-insensitive)
**Syntax:**
```json
{
"source_name": "Calculated",
"source_id": "search_in_fields_using_regex",
"field_path": ["regex_pattern", "field_1", "field_2", ...]
}
```
**Parameters:**
- **regex_pattern** (string): Regular expression pattern (case-insensitive)
- **field_1, field_2, ...** (strings): Field names to search (looked up in output_inclusion)
**Logic:**
```
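all_undefined = True
FOR EACH field_name in [field_1, field_2, ...]:
    value = get_value_from_inclusion(field_name)
    IF value is "undefined":
        CONTINUE
    all_undefined = False
    IF value is string AND value matches regex_pattern:
        RETURN True
RETURN "undefined" IF all_undefined ELSE False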
```
**Return Value:**
- `True` if ANY field matches the pattern
- `False` if NO fields match
- `"undefined"` if ALL fields are undefined
**Examples:**
Example 1: Detect if any surgery field contains "surgery"
```json
{
"field_name": "Has_Surgery_History",
"source_id": "search_in_fields_using_regex",
"field_path": [".*surgery.*", "Previous_Surgery", "Recent_Surgery", "Planned_Surgery"]
}
→ If any of these fields contains "surgery" → True
→ Otherwise → False
```
Example 2: Check for specific procedures
```json
{
"field_name": "Is_Endoscopy_Planned",
"source_id": "search_in_fields_using_regex",
"field_path": ["endoscopy|colonoscopy", "Procedure_Type", "Procedure_Notes"]
}
Matches if "endoscopy" OR "colonoscopy" appears in either field
```
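A rough Python equivalent of this function is sketched below. It assumes field values are available in a flat dictionary (the real function looks them up in `output_inclusion`) and that the pattern may match anywhere in the value, as Example 2 suggests; the signature is illustrative, not the actual one.

```python
import re

UNDEFINED = "undefined"


def search_in_fields_using_regex(fields, pattern, *field_names):
    """Return True if any defined field matches, "undefined" if all are undefined."""
    regex = re.compile(pattern, re.IGNORECASE)
    values = [fields.get(name, UNDEFINED) for name in field_names]
    if all(value == UNDEFINED for value in values):
        return UNDEFINED
    return any(
        isinstance(value, str) and value != UNDEFINED and regex.search(value) is not None
        for value in values
    )


fields = {"Previous_Surgery": "Knee Surgery 2019", "Recent_Surgery": "undefined"}
has_surgery = search_in_fields_using_regex(
    fields, ".*surgery.*", "Previous_Surgery", "Recent_Surgery"
)
```

Here `has_surgery` is `True` because the match is case-insensitive and one defined field contains "Surgery".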
### Function 2: extract_parentheses_content
**Purpose:** Extract text within the first set of parentheses
**Syntax:**
```json
{
"source_name": "Calculated",
"source_id": "extract_parentheses_content",
"field_path": ["field_name"]
}
```
**Parameters:**
- **field_name** (string): Field to extract from (looked up in output_inclusion)
**Logic:**
```
value = get_value_from_inclusion(field_name)
IF value is not defined:
    RETURN "undefined"
MATCH first occurrence of (content) pattern
IF match found:
    RETURN content
ELSE:
    RETURN "undefined"
```
**Return Value:**
- Text extracted from parentheses (e.g., "Active")
- `"undefined"` if no parentheses found or field undefined
**Examples:**
Example 1: Extract status from formatted field
```
Input: "Patient Status (Active)"
Output: "Active"
```
Example 2: Extract category name
```
Input: "Medical Condition (Hypertension)"
Output: "Hypertension"
```
Example 3: Content with spaces and punctuation
```
Input: "Surgery Scheduled (Appendectomy - Jan 15)"
Output: "Appendectomy - Jan 15"
```
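The extraction logic can be sketched with a single regular expression. For brevity this version takes the already-resolved value rather than a field name, which is an assumption of the sketch and differs from the configured signature.

```python
import re


def extract_parentheses_content(value):
    """Return the text inside the first pair of parentheses, or "undefined"."""
    if not isinstance(value, str) or value == "undefined":
        return "undefined"
    match = re.search(r"\(([^)]*)\)", value)
    return match.group(1) if match else "undefined"


extracted = extract_parentheses_content("Surgery Scheduled (Appendectomy - Jan 15)")
```

With this input, `extracted` is `"Appendectomy - Jan 15"`. A value without parentheses returns `"undefined"`.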
### Function 3: append_terminated_suffix
**Purpose:** Add " - AP" suffix to status if patient prematurely terminated
**Syntax:**
```json
{
"source_name": "Calculated",
"source_id": "append_terminated_suffix",
"field_path": ["status_field_name", "is_terminated_field_name"]
}
```
**Parameters:**
- **status_field_name** (string): Field containing status value
- **is_terminated_field_name** (string): Boolean field indicating termination
**Logic:**
```
status = get_value_from_inclusion(status_field_name)
is_terminated = get_value_from_inclusion(is_terminated_field_name)
IF status is undefined:
    RETURN "undefined"
IF is_terminated is TRUE:
    RETURN status + " - AP"
ELSE:
    RETURN status
```
**Return Value:**
- Status with " - AP" suffix if terminated
- Original status if not terminated
- `"undefined"` if status field undefined
**Examples:**
Example 1: Mark prematurely terminated patients
```json
{
"field_name": "Inclusion_Status",
"source_id": "append_terminated_suffix",
"field_path": ["Base_Status", "isPrematurelyTerminated"]
}
If isPrematurelyTerminated = True:
  "incluse" → "incluse - AP"
If isPrematurelyTerminated = False:
  "incluse" → "incluse"
```
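A minimal Python sketch of this function, assuming a flat dict of already-computed fields (the dict and the example values are illustrative, and the real implementation may differ):

```python
UNDEFINED = "undefined"

def append_terminated_suffix(output_inclusion: dict,
                             status_field_name: str,
                             is_terminated_field_name: str) -> str:
    """Append " - AP" to the status when the termination flag is True."""
    status = output_inclusion.get(status_field_name, UNDEFINED)
    if status is None or status == UNDEFINED:
        return UNDEFINED
    # Assumption: only a genuine boolean True triggers the suffix.
    if output_inclusion.get(is_terminated_field_name) is True:
        return f"{status} - AP"
    return status

print(append_terminated_suffix(
    {"Base_Status": "incluse", "isPrematurelyTerminated": True},
    "Base_Status", "isPrematurelyTerminated"))   # incluse - AP
print(append_terminated_suffix(
    {"Base_Status": "incluse", "isPrematurelyTerminated": False},
    "Base_Status", "isPrematurelyTerminated"))   # incluse
```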
### Function 4: if_then_else
**Purpose:** Unified conditional logic with 8 different operators
**Syntax:**
```json
{
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": ["operator", arg1, arg2_optional, result_if_true, result_if_false]
}
```
#### Operator Reference
##### Operator 1: is_true
**Signature:** `["is_true", field_name, result_if_true, result_if_false]`
**Logic:** IF field == True THEN result_if_true ELSE result_if_false
**Example:**
```json
{
"field_path": ["is_true", "Has_Consent", "$\"Consented\"", "$\"Not Consented\""]
}
// If Has_Consent = True → "Consented"
// If Has_Consent = False → "Not Consented"
```
##### Operator 2: is_false
**Signature:** `["is_false", field_name, result_if_true, result_if_false]`
**Logic:** IF field == False THEN result_if_true ELSE result_if_false
**Example:**
```json
{
"field_path": ["is_false", "Has_Exclusion", "$\"Eligible\"", "$\"Excluded\""]
}
```
##### Operator 3: is_defined
**Signature:** `["is_defined", field_name, result_if_true, result_if_false]`
**Logic:** IF field is not undefined THEN result_if_true ELSE result_if_false
**Example:**
```json
{
"field_path": ["is_defined", "Surgery_Date", "$\"Date Available\"", "$\"No Date\""]
}
```
##### Operator 4: is_undefined
**Signature:** `["is_undefined", field_name, result_if_true, result_if_false]`
**Logic:** IF field is undefined THEN result_if_true ELSE result_if_false
**Example:**
```json
{
"field_path": ["is_undefined", "Last_Contact", "$\"Never Contacted\"", "$\"Contacted\""]
}
```
##### Operator 5: all_true
**Signature:** `["all_true", [field_1, field_2, ...], result_if_true, result_if_false]`
**Logic:** IF all fields == True THEN result_if_true ELSE result_if_false
**Example:**
```json
{
"field_path": ["all_true", ["Has_Consent", "Has_Results", "Is_Complete"], "$\"READY\"", "$\"INCOMPLETE\""]
}
// Returns "READY" only if ALL three fields are True
```
##### Operator 6: all_defined
**Signature:** `["all_defined", [field_1, field_2, ...], result_if_true, result_if_false]`
**Logic:** IF all fields are defined THEN result_if_true ELSE result_if_false
**Example:**
```json
{
"field_path": ["all_defined", ["First_Name", "Last_Name", "Birth_Date"], "$\"COMPLETE\"", "$\"INCOMPLETE\""]
}
// Returns "COMPLETE" only if ALL three fields have values
```
##### Operator 7: ==
**Signature:** `["==", value1, value2, result_if_true, result_if_false]`
**Logic:** IF value1 == value2 THEN result_if_true ELSE result_if_false
**Example:**
```json
{
"field_path": ["==", "Status", "$\"Active\"", "$\"Is Active\"", "$\"Not Active\""]
}
// If Status equals "Active" → "Is Active"
```
##### Operator 8: !=
**Signature:** `["!=", value1, value2, result_if_true, result_if_false]`
**Logic:** IF value1 != value2 THEN result_if_true ELSE result_if_false
**Example:**
```json
{
"field_path": ["!=", "Status", "$\"Inactive\"", "$\"Active\"", "$\"Inactive\""]
}
// If Status NOT equal to "Inactive" → "Active"
```
#### Value Resolution
The function supports multiple value types:
**Boolean Literals:**
```json
true, false
// Used directly without field lookup
```
**Numeric Literals:**
```json
42, 3.14, 0, -1
// Used directly without field lookup
```
**String Literals (Prefixed with $):**
```json
"$\"Active\"", "$\"Ready\"", "$\"N/A\""
// The $ prefix marks the value as a literal, not a field name
// The prefix is stripped before the value is used
```
**Field References (No Prefix):**
```json
"Status", "Is_Active", "Patient_Name"
// Looked up in output_inclusion
```
**Complex Examples:**
```json
{
"field_path": ["==", "Status_Code", 1, "$\"Active\"", "$\"Inactive\""]
}
// Compare Status_Code field against numeric value 1
{
"field_path": ["all_true", ["Consent_Received", "Test_Completed"], "Overall_Status", "$\"MISSING\""]
}
// If both conditions true, use Overall_Status value
// If either false, use literal "MISSING"
```
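The operator table and value-resolution rules can be sketched together in Python. This is an illustration only: `resolve_value`, `is_defined`, and the flat `fields` dict are hypothetical names, decoding the text after `$` as JSON is an assumption about how the quoted literal is unwrapped, and raising `ValueError` for an unknown operator is a placeholder for whatever error handling the real code uses.

```python
import json

UNDEFINED = "undefined"

def resolve_value(arg, fields):
    """Resolve one field_path argument: literal, $-prefixed string, or field reference."""
    if isinstance(arg, (bool, int, float)):
        return arg                           # boolean/numeric literal, no lookup
    if isinstance(arg, str) and arg.startswith("$"):
        literal = arg[1:]                    # strip the $ prefix
        try:
            return json.loads(literal)       # '"Active"' -> 'Active'
        except json.JSONDecodeError:
            return literal
    return fields.get(arg, UNDEFINED)        # field reference

def is_defined(value):
    return value is not None and value != UNDEFINED

def if_then_else(field_path, fields):
    operator, *args = field_path
    *operands, if_true, if_false = args      # last two items are the results
    if operator in ("is_true", "is_false", "is_defined", "is_undefined"):
        value = resolve_value(operands[0], fields)
        condition = {
            "is_true": value is True,
            "is_false": value is False,
            "is_defined": is_defined(value),
            "is_undefined": not is_defined(value),
        }[operator]
    elif operator == "all_true":
        condition = all(resolve_value(name, fields) is True for name in operands[0])
    elif operator == "all_defined":
        condition = all(is_defined(resolve_value(name, fields)) for name in operands[0])
    elif operator in ("==", "!="):
        left, right = (resolve_value(item, fields) for item in operands)
        condition = (left == right) if operator == "==" else (left != right)
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return resolve_value(if_true if condition else if_false, fields)

fields = {"Status_Code": 1, "Consent_Received": True,
          "Test_Completed": True, "Overall_Status": "Complete"}
print(if_then_else(["==", "Status_Code", 1, '$"Active"', '$"Inactive"'], fields))
# Active
print(if_then_else(["all_true", ["Consent_Received", "Test_Completed"],
                    "Overall_Status", '$"MISSING"'], fields))
# Complete
```

Because results pass through the same resolver as operands, a result slot can hold either a literal or a field reference, which is what the second example relies on.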
---
## Post-Processing Transformations
### Transformation Order
```
1. Raw Value Extraction
2. Condition Check
3. IF final_value is list:
   └─ Join with "|" delimiter
4. IF final_value is score dict (has 'total' and 'max'):
   └─ Format as "total/max"
5. IF true_if_any is specified:
   └─ Apply boolean conversion
6. IF value_labels is specified:
   └─ Apply label mapping
7. IF field_template is specified:
   └─ Apply formatting with $value
```
### Transformation 1: Array Flattening
**When:** Raw value is an array/list
**Action:** Join elements with `|` delimiter
**Example:**
```
Raw: ["Active", "Pending", "Resolved"]
Output: "Active|Pending|Resolved"
```
### Transformation 2: Score Dictionary Formatting
**When:** Raw value is dict with keys ['total', 'max']
**Action:** Convert to "total/max" string format
**Example:**
```
Raw: {"total": 8, "max": 10}
Output: "8/10"
```
### Transformation 3: true_if_any
**When:** true_if_any is specified in configuration
**Action:** Check if raw value matches ANY item in the array
**Example:**
```json
{
"true_if_any": ["Active", "Pending", "Processing"],
"raw_value": "Active"
}
// Result: true
{
"true_if_any": ["Active", "Pending"],
"raw_value": "Completed"
}
// Result: false
```
### Transformation 4: value_labels
**When:** value_labels is specified in configuration
**Action:** Map raw value to localized text
**Logic:**
```
FOR EACH label_map in value_labels:
    IF label_map.value == raw_value:
        RETURN label_map.text.fr (French label)
IF no match:
    RETURN "$$$$ Value Error: {raw_value}"
```
**Example:**
```json
{
"value_labels": [
{"value": "active", "text": {"fr": "Actif", "en": "Active"}},
{"value": "inactive", "text": {"fr": "Inactif", "en": "Inactive"}}
],
"raw_value": "active"
}
// Result: "Actif"
```
### Transformation 5: field_template
**When:** field_template is specified (and value is not "undefined" or "N/A")
**Action:** Replace $value placeholder with actual value
**Example:**
```
template: "Score: $value/100"
raw_value: 85
Result: "Score: 85/100"
template: "Status [$value]"
raw_value: "Active"
Result: "Status [Active]"
```
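The five transformations can be chained in the documented order as in the sketch below. It is illustrative only: the function name `post_process` and its keyword arguments are hypothetical, and the text does not say how steps other than `field_template` treat "undefined" or "N/A", so the sketch applies them unconditionally.

```python
UNDEFINED = "undefined"

def post_process(value, true_if_any=None, value_labels=None, field_template=None):
    """Apply the post-processing steps in the documented order."""
    # 1. Array flattening
    if isinstance(value, list):
        value = "|".join(str(item) for item in value)
    # 2. Score dictionary formatting
    if isinstance(value, dict) and {"total", "max"} <= value.keys():
        value = f"{value['total']}/{value['max']}"
    # 3. true_if_any: boolean conversion
    if true_if_any is not None:
        value = value in true_if_any
    # 4. value_labels: map to the French label, or flag an error
    if value_labels is not None:
        for label_map in value_labels:
            if label_map["value"] == value:
                value = label_map["text"]["fr"]
                break
        else:
            value = f"$$$$ Value Error: {value}"
    # 5. field_template: skipped for "undefined" and "N/A"
    if field_template is not None and value not in (UNDEFINED, "N/A"):
        value = field_template.replace("$value", str(value))
    return value

print(post_process(["Active", "Pending", "Resolved"]))       # Active|Pending|Resolved
print(post_process({"total": 8, "max": 10}))                 # 8/10
print(post_process(85, field_template="Score: $value/100"))  # Score: 85/100
```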
---
## Configuration Examples
### Example 1: Simple Field Extraction
**Requirement:** Extract patient name from inclusion data
```json
{
"field_group": "Patient_Identification",
"field_name": "Patient_Name",
"source_name": "Inclusion",
"source_id": "inclusion",
"field_path": ["name"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
```
**Flow:**
1. Source: inclusion data
2. Extract: data["name"]
3. Result: "Doe, John"
4. Output: {"Patient_Identification": {"Patient_Name": "Doe, John"}}
### Example 2: Questionnaire Field with Label Mapping
**Requirement:** Extract symptom severity and map to French labels
```json
{
"field_group": "Symptoms",
"field_name": "Severity",
"source_name": "Symptoms (OUI/NON)",
"source_id": "q_id=77e488a1-d3c-148af-a6bc-8fe1f55e82e4",
"field_path": ["answers", "question5"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": [
{"value": 1, "text": {"fr": "Léger", "en": "Mild"}},
{"value": 2, "text": {"fr": "Modéré", "en": "Moderate"}},
{"value": 3, "text": {"fr": "Sévère", "en": "Severe"}}
]
}
```
**Flow:**
1. Source: Questionnaire with ID 77e488a1-...
2. Extract: answers["question5"] → 2
3. Apply value_labels: 2 → "Modéré"
4. Output: {"Symptoms": {"Severity": "Modéré"}}
### Example 3: Conditional Field
**Requirement:** Only show request status if test was requested
```json
{
"field_group": "Endotest",
"field_name": "Request_Status",
"source_name": "Request",
"source_id": "request",
"field_path": ["status"],
"field_template": null,
"field_condition": "Endotest.Request_Sent",
"true_if_any": null,
"value_labels": null
}
```
**Flow:**
1. Check condition: Endotest.Request_Sent
2. If False → Set to "N/A"
3. If True → Extract status from request data
4. Output: {"Endotest": {"Request_Status": "completed"}} or "N/A"
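A hedged sketch of how such a condition might be evaluated. It assumes that `field_condition` is a `Group.Field` reference into the output built so far, and that anything other than a boolean `True` yields "N/A"; the helper names and the `extract` callback are hypothetical.

```python
def condition_is_met(field_condition, output_inclusion):
    """Return True when there is no condition or the referenced field is True."""
    if not field_condition:
        return True
    group, field = field_condition.split(".", 1)
    return output_inclusion.get(group, {}).get(field) is True

def extract_with_condition(field_condition, output_inclusion, extract):
    """Run the extraction only when the condition holds; otherwise return "N/A"."""
    if not condition_is_met(field_condition, output_inclusion):
        return "N/A"
    return extract()

output = {"Endotest": {"Request_Sent": False}}
print(extract_with_condition("Endotest.Request_Sent", output, lambda: "completed"))
# N/A
output["Endotest"]["Request_Sent"] = True
print(extract_with_condition("Endotest.Request_Sent", output, lambda: "completed"))
# completed
```

Evaluating the condition against the output built so far is why the referenced field (`Request_Sent` here) has to appear earlier in the configuration than the field that depends on it.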
### Example 4: Calculated Field with if_then_else
**Requirement:** Show overall status based on inclusion and termination
```json
{
"field_group": "Inclusion",
"field_name": "Inclusion_Status_Complete",
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": ["is_true", "isPrematurelyTerminated", "$\"incluse - AP\"", "Inclusion_Status"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
```
**Flow:**
1. Check: Is isPrematurelyTerminated == True?
2. If YES → Return literal "incluse - AP"
3. If NO → Return value of Inclusion_Status field
4. Output: {"Inclusion": {"Inclusion_Status_Complete": "incluse - AP"}} or "incluse"
### Example 5: Array Field with Formatting
**Requirement:** Extract all test names and format them
```json
{
"field_group": "Endotest",
"field_name": "Tests_Performed",
"source_name": "Request",
"source_id": "request",
"field_path": ["results", "*", "testName"],
"field_template": "Tests: $value",
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
```
**Flow:**
1. Source: request data
2. Extract: results[*].testName → ["Blood Test", "Imaging", "ECG"]
3. Array flattening → "Blood Test|Imaging|ECG"
4. Apply template → "Tests: Blood Test|Imaging|ECG"
5. Output: {"Endotest": {"Tests_Performed": "Tests: Blood Test|Imaging|ECG"}}
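The wildcard step in this flow can be illustrated with a small recursive path walker. This is a sketch under assumptions: `get_by_path` is a hypothetical helper, and dropping elements that lack the requested key is one plausible behaviour, not a confirmed one.

```python
UNDEFINED = "undefined"

def get_by_path(data, path):
    """Follow a field_path through nested dicts/lists; "*" fans out over a list."""
    if not path:
        return data
    key, rest = path[0], path[1:]
    if key == "*":
        if not isinstance(data, list):
            return UNDEFINED
        results = [get_by_path(item, rest) for item in data]
        # Assumption: elements without the requested key are skipped.
        return [r for r in results if r != UNDEFINED]
    if isinstance(data, dict) and key in data:
        return get_by_path(data[key], rest)
    if isinstance(data, list) and isinstance(key, int) and -len(data) <= key < len(data):
        return get_by_path(data[key], rest)
    return UNDEFINED

request = {"results": [{"testName": "Blood Test"},
                       {"testName": "Imaging"},
                       {"testName": "ECG"}]}
names = get_by_path(request, ["results", "*", "testName"])
flattened = "|".join(names)
print("Tests: $value".replace("$value", flattened))
# Tests: Blood Test|Imaging|ECG
```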
### Example 6: Complex Conditional Logic
**Requirement:** Show surgery readiness status based on multiple conditions
```json
{
"field_group": "Surgery",
"field_name": "Surgery_Status",
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": [
"all_true",
["Surgery_Planned", "Surgeon_Assigned", "Date_Set"],
"$\"READY_FOR_SURGERY\"",
"$\"INCOMPLETE_PREPARATION\""
],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
```
**Flow:**
1. Check: Are ALL of [Surgery_Planned, Surgeon_Assigned, Date_Set] == True?
2. If YES → "READY_FOR_SURGERY"
3. If NO → "INCOMPLETE_PREPARATION"
4. Output: {"Surgery": {"Surgery_Status": "READY_FOR_SURGERY"}} or "INCOMPLETE_PREPARATION"
### Example 7: Search and Boolean Conversion
**Requirement:** Detect if patient has surgery history
```json
{
"field_group": "Medical_History",
"field_name": "Has_Prior_Surgery",
"source_name": "Calculated",
"source_id": "search_in_fields_using_regex",
"field_path": [".*surgery|.*intervention.*", "History_Notes", "Previous_Procedures"],
"field_template": null,
"field_condition": null,
"true_if_any": null,
"value_labels": null
}
```
**Flow:**
1. Search History_Notes and Previous_Procedures
2. Pattern: ".*surgery|.*intervention.*" (case-insensitive)
3. If ANY field matches → true
4. If NO matches → false
5. Output: {"Medical_History": {"Has_Prior_Surgery": true}}
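The search flow above might look like this in Python. The sketch assumes a flat dict of already-computed fields and uses `re.search` with `re.IGNORECASE`; whether the real function searches anywhere in the text or matches from the start is not stated here, so treat that choice as an assumption.

```python
import re

def search_in_fields_using_regex(fields: dict, pattern: str, *field_names: str) -> bool:
    """Return True if the pattern is found in ANY of the named fields."""
    regex = re.compile(pattern, re.IGNORECASE)
    for name in field_names:
        value = fields.get(name)
        # Non-string values (missing fields, numbers) are skipped.
        if isinstance(value, str) and regex.search(value):
            return True
    return False

fields = {
    "History_Notes": "Knee SURGERY in 2019",
    "Previous_Procedures": "none",
}
print(search_in_fields_using_regex(
    fields, ".*surgery|.*intervention.*", "History_Notes", "Previous_Procedures"))
# True
```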
---
## User Guide: Adding/Modifying Fields
### Step 1: Identify Data Source
Determine where the data lives:
```
Patient Name → inclusion (inclusion_data)
Symptom Severity → questionnaire (q_id, q_name, or q_category)
Clinical Notes → record (record_data)
Test Results → request (request_data)
Derived Value → calculated (custom function)
```
### Step 2: Locate Field Path
Navigate the JSON structure to find the exact path:
**For Inclusion:**
```
Open endobest_inclusions_old.json
Find a patient record
Look for field under "Patient_Identification"
Example path: ["name"]
```
**For Questionnaire:**
```
Need questionnaire ID/name/category
Look inside answers object
Example: q_id=abc-123, field_path: ["answers", "question_5"]
```
**For Record:**
```
Open a record with GET /api/records/byPatient
Navigate structure
Example: ["record", "clinicResearchData", 0, "requestMetaData"]
```
**For Request:**
```
Field from lab request response
Example: ["results", "*", "testName"]
```
### Step 3: Create Configuration Row
Open Endobest_Dashboard_Config.xlsx → Inclusions_Mapping sheet
```
Row N:
A: field_group (e.g., "Custom_Data")
B: field_name (e.g., "Patient_Status")
C: source_name (e.g., "Inclusion")
D: source_id (e.g., "inclusion")
E: field_path (e.g., ["status"])
F: field_template (optional, e.g., "Status: $value")
G: field_condition (optional, e.g., "Inclusion.Is_Active")
H: true_if_any (optional, e.g., ["active", "pending"])
I: value_labels (optional, complex JSON)
```
### Step 4: Validate Configuration
Run the dashboard in check-only mode:
```bash
python eb_dashboard.py --check-only
```
**Expected Output:**
```
✓ Loaded 81 fields from extended configuration.
✓ All checks passed successfully!
```
**If errors occur:**
```
Error in config file, row 42, field 'field_path': Invalid JSON format.
```
→ Fix the JSON syntax in the cell
### Step 5: Test with Full Collection
```bash
python eb_dashboard.py
```
After collection completes, verify:
1. New field appears in endobest_inclusions.json
2. Values are populated correctly
3. No data quality issues reported
### Step 6: Document the Field
Add comments in a separate notes section (if available) explaining:
- Purpose of the field
- Data source and ID
- Any special transformations
- Expected value ranges/types
---
## Common Patterns & Recipes
### Pattern 1: Boolean Flag from Multiple Conditions
**Requirement:** Create true/false flag based on multiple fields
```json
{
"field_group": "Flags",
"field_name": "Is_Ready_For_Export",
"source_name": "Calculated",
"source_id": "if_then_else",
"field_path": [
"all_true",
["Has_Consent", "Data_Complete", "Approved"],
true,
false
]
}
```
### Pattern 2: Score Display Formatting
**Requirement:** Show quality of life score as "X/100" format
```json
{
"field_group": "Quality_Metrics",
"field_name": "QOL_Score_Display",
"source_name": "q_id=...",
"source_id": "q_id=...",
"field_path": ["answers", "overall_score"],
"field_template": "$value/100"
}
```
### Pattern 3: Status Translation with Suffix
**Requirement:** Show inclusion status with " - AP" for terminated patients
```json
{
"field_group": "Inclusion",
"field_name": "Status_With_Termination",
"source_name": "Calculated",
"source_id": "append_terminated_suffix",
"field_path": ["Inclusion_Status", "isPrematurelyTerminated"]
}
```
### Pattern 4: List-to-String Conversion
**Requirement:** Show all diagnoses as pipe-separated text
```json
{
"field_group": "Medical_Data",
"field_name": "All_Diagnoses",
"source_name": "Record",
"source_id": "record",
"field_path": ["record", "diagnoses", "*", "code"]
// Result: "ICD-001|ICD-002|ICD-003"
}
```
### Pattern 5: Optional Field Based on Condition
**Requirement:** Only show surgery details if surgery was performed
```json
{
"field_group": "Surgery",
"field_name": "Surgery_Details",
"source_name": "Record",
"source_id": "record",
"field_path": ["record", "surgery", "details"],
"field_condition": "Surgery.Surgery_Performed"
// If Surgery_Performed = false → "N/A"
}
```
### Pattern 6: Enum-to-Text Mapping
**Requirement:** Convert numeric status codes to readable text
```json
{
"field_group": "Status",
"field_name": "Inclusion_Status_Text",
"source_name": "Inclusion",
"source_id": "inclusion",
"field_path": ["status_code"],
"value_labels": [
{"value": 0, "text": {"fr": "Pré-inclus", "en": "Pre-included"}},
{"value": 1, "text": {"fr": "Inclus", "en": "Included"}},
{"value": 2, "text": {"fr": "Exclus", "en": "Excluded"}}
]
}
```
### Pattern 7: Pattern Matching in Multiple Fields
**Requirement:** Check if any medical note mentions specific condition
```json
{
"field_group": "Medical",
"field_name": "Mentions_Hypertension",
"source_name": "Calculated",
"source_id": "search_in_fields_using_regex",
"field_path": [
"hypertension|high.*pressure|HBP",
"Medical_History",
"Current_Conditions",
"Medication_Notes"
]
}
```
### Pattern 8: Extracted Parenthetical Classification
**Requirement:** Extract diagnosis type from formatted text like "Disease (Type A)"
```json
{
"field_group": "Classification",
"field_name": "Diagnosis_Type",
"source_name": "Calculated",
"source_id": "extract_parentheses_content",
"field_path": ["Formatted_Diagnosis"]
}
```
---
## Troubleshooting
### Issue 1: "Invalid JSON format" Error
**Symptom:** Configuration validation fails with JSON parsing error
**Cause:** Malformed JSON in field_path, value_labels, or field_condition
**Solution:**
1. Paste the cell contents into a JSON validator (e.g., jsonlint.com)
2. Verify all:
- Array brackets: `[...]`
- Object braces: `{...}`
- String quotes: `"..."`
- Commas between elements
3. Fix syntax errors
4. Re-run validation
**Example:**
```json
["name" "address"]    // WRONG: no comma after "name"
["name", "address"]   // CORRECT
```
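As an offline alternative to an online validator, a cell can be checked with Python's standard `json` module. The helper name `check_json_cell` is hypothetical and is not part of the dashboard. Note that strict JSON does not allow `//` comments, so annotations like the ones in this guide must not be pasted into the cell.

```python
import json

def check_json_cell(cell_text: str):
    """Return None if the cell is valid JSON, otherwise a short error message."""
    try:
        json.loads(cell_text)
    except json.JSONDecodeError as exc:
        return f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None

print(check_json_cell('["name", "address"]'))  # None
print(check_json_cell('["name" "address"]'))   # error message pointing at the missing comma
```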
### Issue 2: Field Returns "undefined"
**Symptom:** Field value always "undefined" in output
**Causes:**
1. Field path doesn't match actual data structure
2. Questionnaire ID incorrect
3. Source type mismatch
**Solution:**
1. Check if source data exists in endobest_inclusions_old.json
2. Verify JSON path by stepping through manually
3. Check questionnaire ID (use `q_id` for fastest lookup)
4. Enable debug mode to see detailed errors
```bash
python eb_dashboard.py --debug
```
### Issue 3: Empty Array Result
**Symptom:** Wildcard path returns empty array instead of values
**Causes:**
1. Array elements don't exist at specified path
2. Wildcard position incorrect in path
**Solution:**
1. Verify array exists in source data
2. Check array element structure
3. Test path manually in JSON tool
**Example:**
```json
// WRONG: No elements at this path
["record", "items", "*", "nonexistent_field"]
// CORRECT: Match actual structure
["record", "items", "*", "existing_field"]
```
### Issue 4: Calculated Field Returns Error
**Symptom:** Calculated field value starts with "$$$$ "
**Causes:**
1. Function name wrong
2. Function argument count mismatch
3. Referenced fields not yet processed
**Solution:**
1. Check function name spelling
2. Verify argument count in field_path
3. Ensure referenced fields are defined BEFORE calculated field
4. Check for circular dependencies
**Common Errors:**
```
"$$$$ Unknown Custom Function: typo_name"
→ Check function name spelling
"$$$$ Argument Error: function requires N arguments"
→ Check field_path array length
"$$$$ Value Error: undefined"
→ Referenced field is undefined; check order in config
```
### Issue 5: value_labels Not Applied
**Symptom:** Raw value shown instead of mapped label
**Causes:**
1. Raw value doesn't match any entry in value_labels
2. JSON syntax error in value_labels
3. Case sensitivity mismatch
**Solution:**
1. Check raw value type (string vs. number)
2. Verify exact match in value_labels
3. Check for case mismatches (e.g., "Active" vs "active")
4. Add wildcard entry if needed
**Example:**
```json
{
"value_labels": [
{"value": "active", "text": {"fr": "Actif"}},
{"value": "inactive", "text": {"fr": "Inactif"}},
{"value": "*", "text": {"fr": "Autre"}} // Catch-all for unmapped values
]
}
```
### Issue 6: Performance Degradation After Adding Field
**Symptom:** Collection takes significantly longer after adding field
**Causes:**
1. Sequential questionnaire search (use q_id instead)
2. Expensive regex in search_in_fields_using_regex
3. Deep wildcard paths (multiple levels)
**Solution:**
1. Use `q_id=` instead of `q_name=` or `q_category=`
2. Simplify regex patterns
3. Flatten wildcard paths where possible
---
## Summary
The Field Mapping Configuration provides:
- **100% Externalized:** No code changes needed to add fields
- **Flexible Sourcing:** Support for questionnaires, records, requests, calculated fields
- **Rich Transformations:** Labels, templates, conditions, custom functions
- **User-Friendly:** Excel-based configuration with validation
- **Performance Optimized:** Single-call questionnaire fetching, field batching
This architecture enables rapid iteration on data extraction without deploying code changes.
---
**Document End**