Gap Filling Modules (GFMs)

Gap Filling Modules are the building blocks of the EOS calculation engine. Each module is a specialized, self-contained unit that handles a specific aspect of environmental impact calculation.

What are Gap Filling Modules?

In the real world, food product data is often incomplete. A product might have:

A name but no ingredient list
Ingredients but no origin information
Origin but no transport details
Partial nutritional data

GFMs solve this problem by automatically "filling the gaps" in product data using scientific models, databases, and intelligent defaults.
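
As a rough illustration of what "a gap" means here, the sketch below checks a product record for missing fields. The field names and the `find_gaps` helper are hypothetical and only mirror the examples listed above; they are not the EOS data model.

```python
from typing import Any

# Hypothetical field names, mirroring the kinds of gaps listed above.
EXPECTED_FIELDS = ("name", "ingredients", "origin", "transport", "nutrients")


def find_gaps(product: dict[str, Any]) -> list[str]:
    """Return the expected fields that are missing or empty."""
    return [field for field in EXPECTED_FIELDS if not product.get(field)]


product = {"name": "Tomato soup", "ingredients": ["tomato", "water", "salt"]}
print(find_gaps(product))  # ['origin', 'transport', 'nutrients']
```

In this picture, each reported gap would be picked up by the module responsible for that kind of data.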

The GFM Concept

Each GFM consists of:

Scheduling Logic (`should_be_scheduled()`) - Determines whether the module is relevant for a given node
Readiness Check (`can_run_now()`) - Verifies that all dependencies are satisfied
Execution (`run()`) - The calculation logic that fills gaps and updates the graph

Available Modules

EOS includes 50+ specialized GFMs organized by function:

Matching Modules

Match product names and ingredients to database entries:

Module	Purpose
`match_product_name_gfm`	Matches product names to database entries
`attachment_ai_matching_gfm`	AI-powered ingredient matching
`link_term_to_activity_node_gfm`	Links terms to LCA activities
`link_food_categories_gfm`	Assigns food categories

Location Modules

Handle geographic and origin data:

Module	Purpose
`origin_gfm`	Determines product origin
`location_gfm`	Geographic location handling
`transportation_decision_gfm`	Determines transport modes
`transportation_mode_distance_gfm`	Calculates transport distances

Lifecycle Modules

Model processing and supply chain:

Module	Purpose
`processing_gfm`	Food processing impacts
`greenhouse_gfm`	Greenhouse gas calculations
`conservation_gfm`	Storage and preservation
`perishability_gfm`	Shelf life and waste factors

Impact Assessment Modules

Calculate environmental metrics:

Module	Purpose
`impact_assessment_gfm`	Aggregates impact calculations
`water_scarcity_gfm`	Water footprint calculation
`rainforest_gfm`	Deforestation impact
`vitascore_gfm`	Nutritional scoring

Aggregation Modules

Combine results across ingredients:

Module	Purpose
`aggregation_gfm`	Aggregates ingredient impacts
`ingredient_splitter_gfm`	Breaks down composite ingredients
`ingredient_amount_estimator_gfm`	Estimates ingredient quantities

Module Architecture

Each GFM follows the Factory/Worker pattern:

# Factory - initialized once per service, manages workers
class ExampleGapFillingFactory(AbstractGapFillingFactory):
    def __init__(self, postgres_db, service_provider):
        super().__init__(postgres_db, service_provider)
        self.cache = {}  # Persistent cache across calculations

    async def init_cache(self):
        # Load required data into memory
        pass

    def spawn_worker(self, node):
        return ExampleGapFillingWorker(node)

# Worker - spawned per node, runs the calculation
class ExampleGapFillingWorker(AbstractGapFillingWorker):
    def should_be_scheduled(self) -> bool:
        # Is this GFM relevant for this node?
        return self.node.needs_processing()

    def can_run_now(self) -> GapFillingWorkerStatusEnum:
        # Are dependencies satisfied?
        if self.node.has_required_data():
            return GapFillingWorkerStatusEnum.READY
        return GapFillingWorkerStatusEnum.WAITING

    async def run(self, calc_graph):
        # Execute the gap-filling logic; calculate() stands in for the
        # module-specific computation and is not defined in this example
        result = await self.calculate()
        self.node.set_property("result", result)

Why Factory/Worker?

Benefit	Description
Isolation	Each calculation runs in its own worker instance
Caching	Factory maintains caches across calculations
Scalability	Workers can be distributed
Testing	Workers can be tested independently
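
The caching and isolation benefits can be sketched with plain stand-in classes. Everything below is illustrative: `SketchFactory` and `SketchWorker` are hypothetical names, nodes are plain dictionaries, and passing the cache into the worker's constructor is an assumption about how a worker might reach factory data, not the EOS API.

```python
import asyncio


class SketchFactory:
    """Created once per service; owns data shared by all workers."""

    def __init__(self) -> None:
        self.cache: dict[str, float] = {}

    async def init_cache(self) -> None:
        # Stand-in for loading reference data from a database.
        self.cache = {"tomato": 1.5, "water": 0.25}

    def spawn_worker(self, node: dict) -> "SketchWorker":
        # Workers receive a reference to the shared cache, not a copy.
        return SketchWorker(node, self.cache)


class SketchWorker:
    """Created per node; holds only node-specific state."""

    def __init__(self, node: dict, cache: dict[str, float]) -> None:
        self.node = node
        self.cache = cache

    async def run(self) -> None:
        # Look up this node's value in the shared cache.
        self.node["factor"] = self.cache.get(self.node["name"])


async def main() -> list[dict]:
    factory = SketchFactory()
    await factory.init_cache()  # loaded once, reused by every worker
    nodes = [{"name": "tomato"}, {"name": "water"}]
    workers = [factory.spawn_worker(n) for n in nodes]
    await asyncio.gather(*(w.run() for w in workers))
    return nodes


print(asyncio.run(main()))
```

The cache is loaded a single time, while each node gets its own short-lived worker that can be tested by handing it a small dictionary.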

Orchestration

The orchestrator coordinates GFM execution.

Scheduling Loop

Node Addition - When nodes are added to the CalcGraph, the orchestrator spawns workers
Scheduling Check - Each worker's `should_be_scheduled()` is called
Readiness Check - `can_run_now()` verifies dependencies
Execution - Ready workers execute asynchronously
Graph Updates - Results are written back to node properties
Propagation - New nodes may trigger additional GFMs
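
The loop above can be sketched for a single node as follows. This is a simplified model under stated assumptions, not the EOS orchestrator: the `Worker` stand-in, its `needs`/`produces` fields, and the worker names are hypothetical, the node is a plain dictionary, and propagation to newly created nodes is left out.

```python
import asyncio
from enum import Enum


class Status(Enum):
    READY = "ready"
    WAITING = "waiting"


class Worker:
    """Stand-in worker: needs some node properties, produces one."""

    def __init__(self, name, node, needs, produces):
        self.name, self.node = name, node
        self.needs, self.produces = needs, produces

    def should_be_scheduled(self) -> bool:
        # Relevant only if the property has not been filled yet.
        return self.produces not in self.node

    def can_run_now(self) -> Status:
        ready = all(key in self.node for key in self.needs)
        return Status.READY if ready else Status.WAITING

    async def run(self) -> None:
        self.node[self.produces] = f"filled by {self.name}"


async def orchestrate(workers: list[Worker]) -> list[str]:
    """Run workers in rounds until none is left or no progress is possible."""
    pending = [w for w in workers if w.should_be_scheduled()]
    order: list[str] = []
    while pending:
        ready = [w for w in pending if w.can_run_now() is Status.READY]
        if not ready:
            break  # remaining workers wait on data nobody will produce
        await asyncio.gather(*(w.run() for w in ready))
        order.extend(w.name for w in ready)
        pending = [w for w in pending if w not in ready]
    return order


node = {"name": "Tomato soup"}
workers = [
    Worker("transport", node, needs=["origin"], produces="transport"),
    Worker("origin", node, needs=["match"], produces="origin"),
    Worker("match", node, needs=["name"], produces="match"),
]
print(asyncio.run(orchestrate(workers)))  # ['match', 'origin', 'transport']
```

Note that the workers are registered in the "wrong" order; the readiness check alone is enough to run them in a valid sequence.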

Module Dependencies

Modules depend on outputs from other modules: a worker's `can_run_now()` reports `WAITING` until the data it needs has been written to the node. Dependencies are therefore resolved automatically through the orchestrator's scheduling loop.
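
To visualize the order that such a loop converges to, one can write the dependencies down explicitly and sort them. The sketch below uses Python's standard `graphlib` for a static topological sort, which is a different technique from the dynamic readiness checks EOS uses. The dependency edges are invented for illustration; only the module names come from the tables above.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical edges: module -> modules whose output it needs.
DEPENDS_ON = {
    "origin_gfm": {"match_product_name_gfm"},
    "transportation_decision_gfm": {"origin_gfm"},
    "transportation_mode_distance_gfm": {"transportation_decision_gfm"},
    "impact_assessment_gfm": {"transportation_mode_distance_gfm"},
}


def execution_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return one valid run order, or raise CycleError on circular deps."""
    return list(TopologicalSorter(depends_on).static_order())


for name in execution_order(DEPENDS_ON):
    print(name)
```

A circular dependency raises `CycleError` in this static view; in a readiness-driven loop the same mistake would show up as workers that stay in the waiting state indefinitely.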

State Management

GFMs track their execution state per node:

# Per-node state stored in GfmStateProp
gfm_state = {
    "match_product_name_gfm": "completed",
    "origin_gfm": "completed",
    "greenhouse_gfm": "running",
    "impact_assessment_gfm": "pending"
}

State Values

State	Description
`pending`	Not yet scheduled
`waiting`	Dependencies not satisfied
`running`	Currently executing
`completed`	Successfully finished
`failed`	Execution failed
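
The state values can be modelled as an enumeration with guarded transitions. This is a minimal sketch: the `GfmState` enum, the `advance` helper, and in particular the `ALLOWED` transition table are assumptions, since the allowed transitions are not spelled out here.

```python
from enum import Enum


class GfmState(str, Enum):
    PENDING = "pending"
    WAITING = "waiting"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions; terminal states have no successors.
ALLOWED = {
    GfmState.PENDING: {GfmState.WAITING, GfmState.RUNNING},
    GfmState.WAITING: {GfmState.RUNNING},
    GfmState.RUNNING: {GfmState.COMPLETED, GfmState.FAILED},
    GfmState.COMPLETED: set(),
    GfmState.FAILED: set(),
}


def advance(states: dict[str, str], gfm: str, new: GfmState) -> None:
    """Move one module to a new state, rejecting illegal jumps."""
    current = GfmState(states.get(gfm, GfmState.PENDING))
    if new not in ALLOWED[current]:
        raise ValueError(f"{gfm}: {current.value} -> {new.value} not allowed")
    states[gfm] = new.value


gfm_state = {"greenhouse_gfm": "running", "impact_assessment_gfm": "pending"}
advance(gfm_state, "greenhouse_gfm", GfmState.COMPLETED)
print(gfm_state["greenhouse_gfm"])  # completed
```

Storing the plain string values keeps the per-node dictionary in the same shape as the example above.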

Error Handling

A failing GFM logs the error and records it on the node instead of aborting the whole calculation:

async def run(self, calc_graph):
    try:
        result = await self.calculate()
        self.node.set_property("result", result)
    except Exception as e:
        # Log error with context (the keyword arguments assume a
        # structlog-style logger, not the standard logging module)
        logger.error("GFM failed",
            gfm=self.__class__.__name__,
            node_uid=self.node.uid,
            error=str(e))
        # Create DataError for tracking
        error = DataError(
            node_uid=self.node.uid,
            gfm_name=self.__class__.__name__,
            message=str(e),
            classification=ErrorClassification.calculation_error
        )
        self.node.add_error(error)

Fallback Strategy

When a module fails:

Log the error - Structured logging with context
Create DataError - Track for reporting
Continue processing - Other modules can still run
Flag uncertainty - Mark result with reduced confidence
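
The "continue processing" and "flag uncertainty" steps can be sketched as below. The `run_isolated` wrapper, the `errors` list, and the `reduced_confidence` flag are hypothetical names on a plain dictionary node; EOS records failures as `DataError` objects as shown earlier.

```python
import asyncio
from typing import Awaitable, Callable


async def run_isolated(
    name: str, module: Callable[[], Awaitable[float]], node: dict
) -> None:
    """Run one module; on failure record the error instead of raising."""
    try:
        node[name] = await module()
    except Exception as exc:
        node.setdefault("errors", []).append({"gfm": name, "message": str(exc)})
        node["reduced_confidence"] = True  # hypothetical uncertainty flag


async def ok() -> float:
    return 1.0


async def broken() -> float:
    raise ValueError("no origin data")


async def main() -> dict:
    node: dict = {}
    # One module fails, the other still completes.
    await asyncio.gather(
        run_isolated("origin_gfm", broken, node),
        run_isolated("processing_gfm", ok, node),
    )
    return node


print(asyncio.run(main()))
```

Because the exception is caught inside the wrapper, `asyncio.gather` never sees it and the remaining modules are unaffected.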

Next Steps

How GFMs Work - Detailed mechanics
Module Catalog - All available modules
GFM SDK (coming soon) - Build custom modules
