
Gap Filling Modules (GFMs)

Gap Filling Modules are the building blocks of the EOS calculation engine. Each module is a specialized, self-contained unit that handles a specific aspect of environmental impact calculation.

What are Gap Filling Modules?

In the real world, food product data is often incomplete. A product might have:

  • A name but no ingredient list
  • Ingredients but no origin information
  • Origin but no transport details
  • Partial nutritional data

GFMs address this by automatically "filling the gaps" in product data with values derived from scientific models, reference databases, and sensible defaults.
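The idea can be illustrated with a deliberately simplified sketch. Missing fields are taken from a fallback source, and the names of the filled fields are recorded so that later steps know which values were estimated. The `fill_gaps` function and the default values are hypothetical. Real GFMs derive values from models and databases, not from a static dictionary.

```python
from typing import Any


def fill_gaps(product: dict[str, Any], defaults: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return a copy of `product` with missing fields taken from `defaults`,
    plus the names of the fields that were filled."""
    completed = dict(product)  # copy, so the input record is left untouched
    filled: list[str] = []
    for key, value in defaults.items():
        if completed.get(key) is None:  # absent or explicitly unknown
            completed[key] = value
            filled.append(key)
    return completed, filled


# Hypothetical record and fallback values, for illustration only.
product = {"name": "Tomato soup", "ingredients": ["tomato", "onion"], "origin": None}
defaults = {"origin": "DE", "transport_mode": "truck"}

completed, filled = fill_gaps(product, defaults)
print(filled)  # ['origin', 'transport_mode']
```

Recording which fields were filled, rather than silently overwriting them, keeps estimated values distinguishable from measured ones.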

The GFM Concept


Each GFM consists of:

  1. Scheduling Logic (should_be_scheduled) - Determines whether the module is relevant for a given node
  2. Readiness Check (can_run_now) - Verifies that all dependencies are satisfied
  3. Execution (run) - Runs the calculation logic that fills gaps and updates the graph

Available Modules

EOS includes 50+ specialized GFMs. Representative modules, grouped by function, are listed below.

Matching Modules

Match product names and ingredients to database entries:

| Module | Purpose |
| --- | --- |
| `match_product_name_gfm` | Matches product names to database entries |
| `attachment_ai_matching_gfm` | AI-powered ingredient matching |
| `link_term_to_activity_node_gfm` | Links terms to LCA activities |
| `link_food_categories_gfm` | Assigns food categories |

Location Modules

Handle geographic and origin data:

| Module | Purpose |
| --- | --- |
| `origin_gfm` | Determines product origin |
| `location_gfm` | Geographic location handling |
| `transportation_decision_gfm` | Determines transport modes |
| `transportation_mode_distance_gfm` | Calculates transport distances |

Lifecycle Modules

Model processing and supply chain:

| Module | Purpose |
| --- | --- |
| `processing_gfm` | Food processing impacts |
| `greenhouse_gfm` | Greenhouse gas calculations |
| `conservation_gfm` | Storage and preservation |
| `perishability_gfm` | Shelf life and waste factors |

Impact Assessment Modules

Calculate environmental metrics:

| Module | Purpose |
| --- | --- |
| `impact_assessment_gfm` | Aggregates impact calculations |
| `water_scarcity_gfm` | Water footprint calculation |
| `rainforest_gfm` | Deforestation impact |
| `vitascore_gfm` | Nutritional scoring |

Aggregation Modules

Combine results across ingredients:

| Module | Purpose |
| --- | --- |
| `aggregation_gfm` | Aggregates ingredient impacts |
| `ingredient_splitter_gfm` | Breaks down composite ingredients |
| `ingredient_amount_estimator_gfm` | Estimates ingredient quantities |

Module Architecture

Each GFM follows the Factory/Worker pattern:

```python
# AbstractGapFillingFactory, AbstractGapFillingWorker and
# GapFillingWorkerStatusEnum come from the EOS core (import path not shown).


# Factory - initialized once per service, manages workers
class ExampleGapFillingFactory(AbstractGapFillingFactory):
    def __init__(self, postgres_db, service_provider):
        super().__init__(postgres_db, service_provider)
        self.cache = {}  # Persistent cache across calculations

    async def init_cache(self):
        # Load required data into memory
        pass

    def spawn_worker(self, node):
        return ExampleGapFillingWorker(node)


# Worker - spawned per node, runs the calculation
class ExampleGapFillingWorker(AbstractGapFillingWorker):
    def should_be_scheduled(self) -> bool:
        # Is this GFM relevant for this node?
        return self.node.needs_processing()

    def can_run_now(self) -> GapFillingWorkerStatusEnum:
        # Are dependencies satisfied?
        if self.node.has_required_data():
            return GapFillingWorkerStatusEnum.READY
        return GapFillingWorkerStatusEnum.WAITING

    async def run(self, calc_graph):
        # Execute the gap-filling logic
        result = await self.calculate()
        self.node.set_property("result", result)

    async def calculate(self):
        # Module-specific computation goes here
        ...
```
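The skeleton above depends on EOS base classes, so it cannot run on its own. The following self-contained sketch replaces them with minimal stand-ins to show the division of labor: the factory loads its cache once, and each node gets its own short-lived worker that reads from that cache. `GapFillingWorkerStatusEnum`, `Node`, and the `FactorLookup*` classes here are hypothetical. The cached numbers are placeholders, not real emission factors.

```python
import asyncio
from enum import Enum
from typing import Any, Optional


class GapFillingWorkerStatusEnum(Enum):
    # Stand-in for the EOS enum, reduced to the two values used above.
    READY = "ready"
    WAITING = "waiting"


class Node:
    # Minimal stand-in for a CalcGraph node: a uid plus a property dict.
    def __init__(self, uid: str, **props: Any):
        self.uid = uid
        self.props: dict[str, Any] = dict(props)

    def get_property(self, name: str) -> Optional[Any]:
        return self.props.get(name)

    def set_property(self, name: str, value: Any) -> None:
        self.props[name] = value


class FactorLookupFactory:
    """Hypothetical factory: created once, holds a cache shared by its workers."""

    def __init__(self) -> None:
        self.cache: dict[str, float] = {}

    async def init_cache(self) -> None:
        # Placeholder numbers, NOT real emission factors; a real factory
        # would load its reference data from the database here.
        self.cache = {"tomato": 1.0, "wheat": 2.0}

    def spawn_worker(self, node: Node) -> "FactorLookupWorker":
        return FactorLookupWorker(node, self)


class FactorLookupWorker:
    """Hypothetical worker: one instance per node, reads the factory cache."""

    def __init__(self, node: Node, factory: FactorLookupFactory) -> None:
        self.node = node
        self.factory = factory

    def should_be_scheduled(self) -> bool:
        # Relevant only for nodes whose category appears in the cache.
        return self.node.get_property("category") in self.factory.cache

    def can_run_now(self) -> GapFillingWorkerStatusEnum:
        # Depends on an amount that another module might still have to fill.
        if self.node.get_property("amount_kg") is not None:
            return GapFillingWorkerStatusEnum.READY
        return GapFillingWorkerStatusEnum.WAITING

    async def run(self, calc_graph: Any = None) -> None:
        factor = self.factory.cache[self.node.get_property("category")]
        self.node.set_property("result", factor * self.node.get_property("amount_kg"))


async def main() -> list[Node]:
    factory = FactorLookupFactory()
    await factory.init_cache()  # loaded once, reused by every worker
    nodes = [
        Node("n1", category="tomato", amount_kg=2.0),
        Node("n2", category="wheat"),  # amount missing: worker waits
        Node("n3", category="unknown", amount_kg=1.0),  # not scheduled
    ]
    for node in nodes:
        worker = factory.spawn_worker(node)
        if worker.should_be_scheduled() and worker.can_run_now() is GapFillingWorkerStatusEnum.READY:
            await worker.run()
    return nodes


nodes = asyncio.run(main())
print({n.uid: n.get_property("result") for n in nodes})  # {'n1': 2.0, 'n2': None, 'n3': None}
```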

Why Factory/Worker?

| Benefit | Description |
| --- | --- |
| Isolation | Each node is processed by its own worker instance |
| Caching | The factory maintains caches across calculations |
| Scalability | Workers can be distributed |
| Testing | Workers can be tested independently |
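The testing benefit follows from the fact that a worker only needs a node. It can therefore be unit-tested against a fake node, with no orchestrator or database. In the following minimal sketch, `FakeNode` and `TotalMassWorker` are hypothetical:

```python
import unittest


class FakeNode:
    # Test double standing in for a CalcGraph node (hypothetical interface).
    def __init__(self, **props):
        self.props = dict(props)

    def get_property(self, name):
        return self.props.get(name)

    def set_property(self, name, value):
        self.props[name] = value


class TotalMassWorker:
    # Hypothetical worker: sums ingredient amounts into "total_kg".
    def __init__(self, node):
        self.node = node

    def should_be_scheduled(self) -> bool:
        return self.node.get_property("ingredient_amounts_kg") is not None

    async def run(self, calc_graph=None) -> None:
        amounts = self.node.get_property("ingredient_amounts_kg")
        self.node.set_property("total_kg", sum(amounts))


class TotalMassWorkerTest(unittest.IsolatedAsyncioTestCase):
    def test_not_scheduled_without_amounts(self):
        self.assertFalse(TotalMassWorker(FakeNode()).should_be_scheduled())

    async def test_run_sets_total(self):
        node = FakeNode(ingredient_amounts_kg=[0.25, 0.5])
        await TotalMassWorker(node).run()
        self.assertEqual(node.get_property("total_kg"), 0.75)
```

Any unittest runner, such as `python -m unittest`, will discover the test case. `IsolatedAsyncioTestCase` provides an event loop, so `async` test methods can await `run()` directly.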

Orchestration

The orchestrator coordinates GFM execution.


Scheduling Loop

  1. Node Addition - When nodes are added to the CalcGraph, the orchestrator spawns GFM workers for them (via each factory's spawn_worker())
  2. Scheduling Check - Each worker's should_be_scheduled() is called
  3. Readiness Check - can_run_now() verifies dependencies
  4. Execution - Ready workers execute asynchronously
  5. Graph Updates - Results are written back to node properties
  6. Propagation - New nodes may trigger additional GFMs
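The loop can be sketched in a few lines of asyncio. This is a simplified illustration, not the EOS orchestrator. `Node`, `CalcGraph`, `SplitWorker`, `MatchWorker`, and `orchestrate` are hypothetical, and readiness is reduced to a boolean instead of `GapFillingWorkerStatusEnum`. `SplitWorker` adds child nodes while it runs, which exercises step 6.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Node:
    uid: str
    props: dict = field(default_factory=dict)


@dataclass
class CalcGraph:
    nodes: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes.append(node)


class SplitWorker:
    # Hypothetical: turns a composite node into child ingredient nodes.
    def __init__(self, node: Node):
        self.node = node

    def should_be_scheduled(self) -> bool:
        return "parts" in self.node.props

    def can_run_now(self) -> bool:
        return True

    async def run(self, graph: CalcGraph) -> None:
        for part in self.node.props["parts"]:
            graph.add_node(Node(f"{self.node.uid}/{part}", {"name": part}))


class MatchWorker:
    # Hypothetical: stands in for a matching module on any named node.
    def __init__(self, node: Node):
        self.node = node

    def should_be_scheduled(self) -> bool:
        return "name" in self.node.props

    def can_run_now(self) -> bool:
        return True

    async def run(self, graph: CalcGraph) -> None:
        self.node.props["matched"] = self.node.props["name"].lower()


async def orchestrate(graph: CalcGraph, worker_types: list) -> list:
    pending: list = []
    seen = 0  # number of nodes that already received workers
    while True:
        # Steps 1-2: spawn workers for new nodes, keep only relevant ones.
        for node in graph.nodes[seen:]:
            pending += [w for w in (t(node) for t in worker_types) if w.should_be_scheduled()]
        seen = len(graph.nodes)
        # Step 3: readiness check.
        ready = [w for w in pending if w.can_run_now()]
        if not ready:
            return pending  # workers that never became ready (empty if all ran)
        pending = [w for w in pending if w not in ready]
        # Steps 4-5: run ready workers concurrently; they write to node props.
        await asyncio.gather(*(w.run(graph) for w in ready))
        # Step 6: loop again so nodes added during run() get workers too.


graph = CalcGraph()
graph.add_node(Node("soup", {"name": "Soup", "parts": ["Tomato", "Onion"]}))
leftover = asyncio.run(orchestrate(graph, [SplitWorker, MatchWorker]))
print([(n.uid, n.props["matched"]) for n in graph.nodes])
# [('soup', 'soup'), ('soup/Tomato', 'tomato'), ('soup/Onion', 'onion')]
```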

Module Dependencies

Modules can depend on outputs produced by other modules.


Dependencies are resolved automatically through the orchestrator's scheduling loop.
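To see why no explicit ordering step is needed, suppose each module declares which node properties it reads and writes. Repeatedly running every module whose inputs are present then yields a valid order on its own. The `ModuleSpec` class, the `run_until_stable` function, and the dependency wiring are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleSpec:
    # Hypothetical declaration of the node properties a module reads and writes.
    name: str
    requires: frozenset
    provides: str


# Illustrative wiring only; these are NOT the real EOS module dependencies.
MODULES = [
    ModuleSpec("impact", frozenset({"emissions"}), "impact"),
    ModuleSpec("emissions", frozenset({"activity", "distance_km"}), "emissions"),
    ModuleSpec("transport", frozenset({"origin"}), "distance_km"),
    ModuleSpec("origin", frozenset({"name"}), "origin"),
    ModuleSpec("match", frozenset({"name"}), "activity"),
]


def run_until_stable(known: set, modules: list) -> tuple:
    """Run every module whose inputs are known, repeating until nothing new is ready.

    Returns (execution order, names of modules whose inputs never appeared)."""
    known = set(known)
    waiting = list(modules)
    order = []
    while True:
        ready = [m for m in waiting if m.requires <= known]
        if not ready:
            return order, [m.name for m in waiting]
        for m in ready:
            order.append(m.name)
            known.add(m.provides)
        waiting = [m for m in waiting if m not in ready]


order, stuck = run_until_stable({"name"}, MODULES)
print(order)  # ['origin', 'match', 'transport', 'emissions', 'impact']
```

The modules are deliberately listed in reverse order, yet they still execute in dependency order. Modules whose inputs never appear are returned instead of raising an error, much like workers that remain in the waiting state.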

State Management

GFMs track their execution state per node:

```python
# Per-node state stored in GfmStateProp
gfm_state = {
    "match_product_name_gfm": "completed",
    "origin_gfm": "completed",
    "greenhouse_gfm": "running",
    "impact_assessment_gfm": "pending",
}
```

State Values

| State | Description |
| --- | --- |
| `pending` | Not yet scheduled |
| `waiting` | Dependencies not satisfied |
| `running` | Currently executing |
| `completed` | Successfully finished |
| `failed` | Execution failed |
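A small helper over the per-node state dictionary can answer questions such as "has this node finished?". The `GfmState` enum and both functions below are hypothetical. Treating `failed` as a terminal state is also an assumption.

```python
from enum import Enum


class GfmState(str, Enum):
    # Hypothetical enum mirroring the five state strings in the table above.
    PENDING = "pending"
    WAITING = "waiting"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: a module that has completed or failed will not run again for this node.
TERMINAL = {GfmState.COMPLETED, GfmState.FAILED}


def group_by_state(gfm_state: dict) -> dict:
    """Map each state value to the modules currently in it (unknown strings raise ValueError)."""
    groups = {state.value: [] for state in GfmState}
    for module, raw in gfm_state.items():
        groups[GfmState(raw).value].append(module)
    return groups


def node_finished(gfm_state: dict) -> bool:
    # True once every module on the node has reached a terminal state.
    return all(GfmState(raw) in TERMINAL for raw in gfm_state.values())


gfm_state = {
    "match_product_name_gfm": "completed",
    "origin_gfm": "completed",
    "greenhouse_gfm": "running",
    "impact_assessment_gfm": "pending",
}
print(group_by_state(gfm_state)["completed"])  # ['match_product_name_gfm', 'origin_gfm']
print(node_finished(gfm_state))  # False
```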

Error Handling

GFMs implement graceful error handling:

```python
async def run(self, calc_graph):
    try:
        result = await self.calculate()
        self.node.set_property("result", result)
    except Exception as e:
        # Log error with context; `logger` is a structured logger, so the
        # keyword arguments are attached to the log entry as fields
        logger.error("GFM failed",
                     gfm=self.__class__.__name__,
                     node_uid=self.node.uid,
                     error=str(e))
        # Create DataError for tracking
        error = DataError(
            node_uid=self.node.uid,
            gfm_name=self.__class__.__name__,
            message=str(e),
            classification=ErrorClassification.calculation_error,
        )
        self.node.add_error(error)
```

Fallback Strategy

When a module fails:

  1. Log the error - Structured logging with context
  2. Create DataError - Track for reporting
  3. Continue processing - Other modules can still run
  4. Flag uncertainty - Mark result with reduced confidence
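The four fallback steps can be combined in a single wrapper around a worker's `run()`. In this sketch, `run_safely`, the simplified `DataError` and `Node` dataclasses, and the two workers are hypothetical. The `"low_confidence"` property name is also an assumption, and stdlib `logging` stands in for the structured logger shown above.

```python
import asyncio
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("gfm")


@dataclass
class DataError:
    # Simplified stand-in for the DataError used above.
    node_uid: str
    gfm_name: str
    message: str


@dataclass
class Node:
    uid: str
    props: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


class OriginWorker:
    # Hypothetical worker that succeeds.
    async def run(self, node: Node) -> None:
        node.props["origin"] = "FR"


class BrokenWorker:
    # Hypothetical worker that fails, e.g. on a missing database entry.
    async def run(self, node: Node) -> None:
        raise ValueError("no matching database entry")


async def run_safely(worker, node: Node) -> None:
    gfm_name = type(worker).__name__
    try:
        await worker.run(node)
    except Exception as exc:
        # 1. Log the error with context (stdlib logging used here for brevity).
        logger.error("GFM failed: gfm=%s node_uid=%s error=%s", gfm_name, node.uid, exc)
        # 2. Record a DataError so the failure shows up in reports.
        node.errors.append(DataError(node.uid, gfm_name, str(exc)))
        # 4. Flag uncertainty ("low_confidence" is a hypothetical property name).
        node.props["low_confidence"] = True
    # 3. Returning normally (instead of re-raising) lets other workers continue.


async def main() -> Node:
    node = Node("n1")
    await asyncio.gather(*(run_safely(w, node) for w in [BrokenWorker(), OriginWorker()]))
    return node


node = asyncio.run(main())
print(node.props, node.errors)
```

Because the exception is caught inside `run_safely`, `asyncio.gather` never sees it. `OriginWorker` therefore still fills in the origin even though `BrokenWorker` failed on the same node.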

Next Steps