Ingredient Amount Estimator GFM

The Ingredient Amount Estimator Gap Filling Module estimates the amount of each ingredient in a food product based on nutritional data. It uses convex optimization (CVXPY) to minimize the difference between declared nutritional values on product packaging and calculated values derived from individual ingredient nutrients.

Quick Reference

Property	Description
Runs on	`FoodProcessingActivityNode` with nutrient values available on parent nodes
Dependencies	`UnitWeightConversionGapFillingWorker`, `AddClientNodesGapFillingWorker`, `MatchProductNameGapFillingWorker`, `IngredientSplitterGapFillingWorker`, `NutrientSubdivisionGapFillingWorker`, `LinkTermToActivityNodeGapFillingWorker`
Key Input	Product nutrient declaration, ingredient list with nutrient profiles
Output	Estimated ingredient amounts (kg per unit of parent product)
Trigger	Product has nutrient values and ingredient declaration

When It Runs

The module triggers when:

The node is a FoodProcessingActivityNode
Parent nodes have nutrient values available (from the API or the database)
The product has an ingredients declaration that has been parsed by the Ingredient Splitter Gap Filling Module
All required dependency Gap Filling Modules have completed

Key Output

The module produces:

Ingredient amounts: Estimated weight of each ingredient in kg per unit of parent product
Solution status: Optimization solver status (optimal, optimal_inaccurate, infeasible, unbounded)
Error squares: Per-nutrient squared errors for quality assessment
Estimated nutrients: Calculated nutrient values based on estimated amounts

Scientific Methodology

The ingredient amount estimator solves a constrained optimization problem to find ingredient percentages that best match declared nutritional values while respecting legal and physical constraints.

Problem Formulation

The algorithm minimizes the normalized difference between calculated and declared nutrients:

minimize ||A_norm @ x - b_norm||

Where:

A: M x N nutrient matrix (M nutrients, N ingredients)
x: N x 1 vector of ingredient amounts (fractions of total, summing to 1)
b: M x 1 vector of declared nutrient values
A_norm, b_norm: A and b with each nutrient row divided by that nutrient's standard deviation

Normalization

Each nutrient is normalized by its population standard deviation to ensure equal weighting:

A_norm = (A.T / norm_vector).T
b_norm = b / norm_vector

This normalization uses statistics derived from thousands of product declarations:

Nutrient	Average	Standard Deviation	Median
Energy (kilocalorie)	284.84	170.00	275.0
Fat (g)	12.61	12.67	7.4
Saturated fat (g)	5.48	6.29	2.5
Carbohydrates (g)	27.47	24.19	16.2
Sugar/Sucrose (g)	15.36	17.05	7.3
Protein (g)	6.94	5.76	5.8
Sodium chloride (g)	0.77	0.85	0.4
Fibers (g)	2.88	2.59	2.2
Sodium (mg)	302.92	938.07	15.74
Chlorine (mg)	461.02	515.61	242.64
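
As a small illustration of the normalization, the sketch below divides each nutrient row by the standard deviation from the table above. It is a plain-Python equivalent of the NumPy expressions, not the module's code; the helper names and the two-ingredient matrix are made up for the example.

```python
# Standard deviations taken from the table above (subset for brevity).
STD = {"energy": 170.00, "fat": 12.67, "protein": 5.76}


def normalize_rows(matrix, nutrient_order, std=STD):
    """Divide every value in a nutrient row by that nutrient's standard deviation."""
    return [
        [value / std[name] for value in row]
        for name, row in zip(nutrient_order, matrix)
    ]


def normalize_vector(vector, nutrient_order, std=STD):
    """Divide every declared nutrient value by its standard deviation."""
    return [value / std[name] for name, value in zip(nutrient_order, vector)]


nutrient_order = ["energy", "fat", "protein"]
# Rows are nutrients, columns are two hypothetical ingredients.
A = [[400.0, 884.0], [0.0, 99.8], [0.0, 0.0]]
b = [530.0, 32.0, 6.0]

A_norm = normalize_rows(A, nutrient_order)
b_norm = normalize_vector(b, nutrient_order)
```

After this step a deviation of one standard deviation in energy weighs the same in the objective as a deviation of one standard deviation in fat or protein.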

Accepted Nutrients

Only nutrients with complete data in the Eaternity Database are accepted. Water is accepted as an input but removed before optimization (see Water Removal):

ACCEPTED_NUTRIENTS = {
    "energy",
    "fat",
    "saturated_fat",
    "carbohydrates",
    "water",
    "sucrose",
    "protein",
    "sodium",
    "chlorine",
    "fibers",
}

Constraint System

The optimization includes several constraint types:

1. Sum-to-One Constraint

All ingredient fractions at each hierarchy level must sum to 1 (100%):

F @ x == g

Where F is the equality constraint matrix ensuring ingredients at each level sum correctly.

2. Decreasing Order Constraint

According to European Union food labeling regulations, ingredients must be listed in decreasing order by weight:

C @ x <= d

Where C is a constraint matrix enforcing:

ingredient[i] >= ingredient[i+1] for same-level ingredients

Exception: This constraint does not apply to subdivisions (nutrient variants of the same ingredient).

3. Fixed Percentage Constraints

When percentages are declared on packaging:

# Exact percentage constraint
x[ingredient_idx] == fixed_percentage * x[parent_idx]

# Or for root-level ingredients
x[ingredient_idx] == fixed_percentage

4. Minimum/Maximum Percentage Constraints

When only bounds are specified:

# Minimum percentage: ingredient >= min_percentage * parent
x[col_idx] - min_pct * x[parent_col_idx] >= 0

# Maximum percentage: ingredient <= max_percentage * parent
x[col_idx] - max_pct * x[parent_col_idx] <= 0

5. Non-negativity

All ingredient amounts must be non-negative:

x >= 0
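
Collected in one place, the objective and the five constraint types can be restated as a single problem. This is only a compact summary of the definitions above. The symbols p(i) (parent of ingredient i), f_i (declared percentage), and \ell_i, u_i (declared minimum and maximum) are notation introduced here for illustration. For root-level ingredients the parent amount x_{p(i)} is replaced by 1.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^{N}} \quad
  & \lVert A_{\text{norm}}\, x - b_{\text{norm}} \rVert \\
\text{subject to} \quad
  & F x = g, \\
  & C x \le d, \\
  & x_i = f_i \, x_{p(i)}
    && \text{for ingredients with a declared percentage}, \\
  & \ell_i \, x_{p(i)} \le x_i \le u_i \, x_{p(i)}
    && \text{for ingredients with declared bounds}, \\
  & x \ge 0 .
\end{aligned}
```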

Implementation Details

Solver Configuration

The module uses the ECOS solver via CVXPY:

problem.solve(solver="ECOS")

ECOS (Embedded Conic Solver) is chosen for its efficiency with second-order cone programs and its ability to handle the least-squares objective function.

Ingredient Hierarchy Handling

The algorithm handles nested ingredient declarations using a level-tuple system:

# Example hierarchy:
# "Chocolate (cocoa 30%, sugar), Milk powder (milk, lactose)"
# Level tuples:
# (0,)    -> Chocolate
# (0, 0)  -> cocoa (30% of Chocolate)
# (0, 1)  -> sugar
# (1,)    -> Milk powder
# (1, 0)  -> milk
# (1, 1)  -> lactose

The constraint matrix ensures:

Sub-ingredients sum to their parent ingredient's amount
Order constraints apply within each level

Special Handling

Fermentation Adjustment

For alcoholic products, the algorithm adjusts nutrients to account for fermentation:

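# Alcohol conversion (Gay-Lussac equation)
# 180g sugar -> 92g ethanol + 88g CO2 (theoretical yield 92/180 ≈ 0.51)
# Note: the factor 0.47 used here is lower than the theoretical yield
alc_as_sugar = alc.value / 0.47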

# Adjust sugar and carbohydrates
sugar.value += alc_as_sugar
carbs.value += alc_as_sugar

# Adjust energy (alcohol: 7 kcal/g, sugar: 4 kcal/g)
energy.value += alc_as_sugar * 4.0  # Add back sugar calories
energy.value -= alc.value * 7.0     # Remove alcohol calories
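
To make the arithmetic concrete, here is a self-contained sketch that applies the same adjustment to a plain dictionary. The function name, the dictionary layout, and the sample beverage values are assumptions for illustration; the module itself works on nutrient objects with a `.value` attribute.

```python
ETHANOL_PER_SUGAR = 0.47    # conversion factor used in the snippet above
KCAL_PER_G_SUGAR = 4.0
KCAL_PER_G_ALCOHOL = 7.0


def undo_fermentation(nutrients):
    """Return a copy of `nutrients` with alcohol converted back to sugar.

    `nutrients` is a hypothetical dict with "alcohol", "sugar" and
    "carbohydrates" in grams and "energy" in kcal.
    """
    adjusted = dict(nutrients)
    alcohol = nutrients["alcohol"]
    alc_as_sugar = alcohol / ETHANOL_PER_SUGAR
    adjusted["sugar"] += alc_as_sugar
    adjusted["carbohydrates"] += alc_as_sugar
    # Add the calories of the reconstructed sugar, remove those of the alcohol.
    adjusted["energy"] += alc_as_sugar * KCAL_PER_G_SUGAR
    adjusted["energy"] -= alcohol * KCAL_PER_G_ALCOHOL
    return adjusted


# Made-up declaration for a beer-like product.
beer = {"alcohol": 4.0, "sugar": 0.0, "carbohydrates": 3.0, "energy": 42.0}
adjusted = undo_fermentation(beer)
```

With 4 g of alcohol, about 8.5 g of sugar is added back, and the energy changes by roughly +34 kcal for the sugar and -28 kcal for the alcohol.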

Sodium Chloride Splitting

When sodium chloride (salt) is declared, it's split into sodium and chlorine for optimization:

NA_PERCENT_IN_NACL = 39.34  # Mass share of sodium in NaCl (%)
CL_PERCENT_IN_NACL = 60.66  # Mass share of chlorine in NaCl (%)

new_sodium = old_sodium + (sodium_chloride * NA_PERCENT_IN_NACL / 100)
new_chlorine = old_chlorine + (sodium_chloride * CL_PERCENT_IN_NACL / 100)
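
A worked instance of the split, as a sketch: the function name and sample values are hypothetical, and all three quantities are assumed to be in the same mass unit (any conversion between g and mg is assumed to happen elsewhere).

```python
NA_PERCENT_IN_NACL = 39.34
CL_PERCENT_IN_NACL = 60.66


def split_sodium_chloride(sodium, chlorine, sodium_chloride):
    """Distribute declared salt onto sodium and chlorine (same mass unit)."""
    new_sodium = sodium + sodium_chloride * NA_PERCENT_IN_NACL / 100
    new_chlorine = chlorine + sodium_chloride * CL_PERCENT_IN_NACL / 100
    return new_sodium, new_chlorine


# 1.2 g of declared salt, no separately declared sodium or chlorine.
sodium_g, chlorine_g = split_sodium_chloride(0.0, 0.0, 1.2)
```

The two shares add up to the declared salt, so no mass is lost or created by the split.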

Water Removal

Water content is excluded from direct optimization since it's implicitly handled through nutrient subdivision:

incoming_nutrients.quantities = {
    k: v for k, v in incoming_nutrients.quantities.items()
    if k != get_nutrient_term("water").uid
}

Non-Food Ingredients

Non-food ingredients (additives, preservatives, E-numbers) are assigned zero amount:

for uid in non_food_ingredient_uids:
    calc_graph.apply_mutation(
        PropMutation(
            node_uid=uid,
            prop_name="amount",
            prop=QuantityProp(
                value=0.0,
                unit_term_uid=kilogram_term.uid,
            ),
        )
    )

Constraint Matrix Construction

Nutrient Matrix (A)

The M x N nutrient matrix maps ingredient nutrient profiles to the optimization:

import numpy as np

def make_nutrient_matrix(flat_nutrients, product_order, nutrient_order, M, N):
    """Build the M x N matrix: one row per nutrient, one column per ingredient."""
    matrix = np.zeros((M, N))
    for col, product_key in enumerate(product_order):
        nutrients = flat_nutrients[product_key]
        if nutrients is None:
            continue  # No nutrient profile: leave the column at zero
        for row, nutrient_key in enumerate(nutrient_order):
            if nutrient_key in nutrients.quantities:
                matrix[row, col] = nutrients.quantities[nutrient_key].value
    matrix[np.isnan(matrix)] = 0.0  # Treat NaN entries as zero
    return matrix

Order Constraint Matrix (C)

Enforces decreasing ingredient order:

# Example for [(0,), (1,), (1,0), (1,1), (2,)]
# Matrix structure:
# [[-1,  1,  0,  0,  0],   # ingredient[0] >= ingredient[1]
#  [ 0, -1,  0,  0,  1],   # ingredient[1] >= ingredient[4]
#  [ 0,  0, -1,  1,  0],   # sub[0] >= sub[1]
#  [ 0,  0,  0, -1,  0],
#  [ 0,  0,  0,  0, -1]]
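
One way such a matrix could be generated from the level tuples is sketched below. This is not the module's implementation: the function name is hypothetical, the right-hand side d is assumed to be the zero vector, and the exception for subdivisions is not handled.

```python
def make_order_matrix(levels):
    """Build C so that C @ x <= 0 encodes x[i] >= x[next sibling of i].

    A row without a next sibling keeps only the -1 entry, which reduces
    to the non-negativity condition -x[i] <= 0.
    """
    index = {level: i for i, level in enumerate(levels)}
    n = len(levels)
    matrix = [[0] * n for _ in range(n)]
    for i, level in enumerate(levels):
        matrix[i][i] = -1
        # The next sibling shares the parent prefix and has the next position.
        next_sibling = level[:-1] + (level[-1] + 1,)
        if next_sibling in index:
            matrix[i][index[next_sibling]] = 1
    return matrix


levels = [(0,), (1,), (1, 0), (1, 1), (2,)]
C = make_order_matrix(levels)
```

For the level tuples above this reproduces the five rows shown in the example.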

Equality Matrix (F)

Ensures ingredients sum correctly at each level:

# Example: F @ x = g
# [[1, 1, 0, 0, 1],    # Top level sums to 1
#  [0, -1, 1, 1, 0]]   # Sub-ingredients sum to parent
# g = [1, 0]
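
The equality system can be derived from the same level tuples. The sketch below is a hypothetical helper, not the module's code: it emits one row for the top level and one row per ingredient that has sub-ingredients.

```python
def make_equality_matrix(levels):
    """Return (F, g) such that F @ x == g enforces the sum constraints."""
    index = {level: i for i, level in enumerate(levels)}
    n = len(levels)
    # First row: all top-level ingredients together make up 100%.
    rows = [[1 if len(level) == 1 else 0 for level in levels]]
    g = [1]
    for parent in levels:
        children = [level for level in levels if level[:-1] == parent]
        if not children:
            continue
        # Sum of children minus the parent must be zero.
        row = [0] * n
        row[index[parent]] = -1
        for child in children:
            row[index[child]] = 1
        rows.append(row)
        g.append(0)
    return rows, g


levels = [(0,), (1,), (1, 0), (1, 1), (2,)]
F, g = make_equality_matrix(levels)
```

With the level tuples from the order-constraint example, this yields the two rows and the right-hand side shown above.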

Calculation Example

Scenario: Chocolate bar with declaration:

"Cocoa mass (45%), Sugar, Cocoa butter, Milk powder (5%)"
Declared nutrients: 530 kcal, 32g fat, 20g saturated fat, 52g carbs, 48g sugar, 6g protein

Step 1: Build Ingredient Hierarchy

Level tuples:
(0,) -> Cocoa mass      [45% fixed]
(1,) -> Sugar
(2,) -> Cocoa butter
(3,) -> Milk powder     [5% fixed]

Step 2: Construct Matrices

Nutrient matrix A (per 100g of each ingredient):

	Cocoa mass	Sugar	Cocoa butter	Milk powder
Energy	228	400	884	496
Fat	14	0	99.8	26.7
Sat. fat	8.1	0	60.5	16.7
Carbs	11.5	100	0	38.4
Sugar	0.5	100	0	38.4
Protein	19.6	0	0	26.3

Constraint matrix C (order constraints):

[[-1, 1, 0, 0],    # x[0] >= x[1]
 [0, -1, 1, 0],    # x[1] >= x[2]
 [0, 0, -1, 1],    # x[2] >= x[3]
 [0, 0, 0, -1]]

Step 3: Solve Optimization

With fixed constraints x[0] = 0.45 and x[3] = 0.05:

minimize ||A_norm @ x - b_norm||
subject to:
  x[0] + x[1] + x[2] + x[3] = 1
  x[0] = 0.45
  x[3] = 0.05
  x[0] >= x[1] >= x[2] >= x[3]
  x >= 0

Step 4: Solution

Cocoa mass:   45.0%  (fixed)
Sugar:        30.2%  (estimated)
Cocoa butter: 19.8%  (estimated)
Milk powder:   5.0%  (fixed)

Step 5: Output to Graph

Each ingredient node receives an amount in kg:

# For 1kg product:
cocoa_mass.amount = 0.450 kg
sugar.amount = 0.302 kg
cocoa_butter.amount = 0.198 kg
milk_powder.amount = 0.050 kg
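
As a sanity check, the solution from Step 4 can be tested against the constraints from Step 3 and converted to kilograms. The helper below is an illustrative sketch with hypothetical names; it only verifies feasibility and says nothing about how well the nutrients are matched.

```python
def check_solution(x, order, fixed, tol=1e-9):
    """Return a list of violated constraints (empty if all hold)."""
    problems = []
    if abs(sum(x.values()) - 1.0) > tol:
        problems.append("fractions do not sum to 1")
    for first, second in zip(order, order[1:]):
        if x[first] + tol < x[second]:
            problems.append(f"{first} is smaller than {second}")
    for name, value in fixed.items():
        if abs(x[name] - value) > tol:
            problems.append(f"{name} deviates from its declared percentage")
    if any(value < 0 for value in x.values()):
        problems.append("negative amount")
    return problems


order = ["cocoa_mass", "sugar", "cocoa_butter", "milk_powder"]
x = {"cocoa_mass": 0.450, "sugar": 0.302, "cocoa_butter": 0.198, "milk_powder": 0.050}
fixed = {"cocoa_mass": 0.45, "milk_powder": 0.05}

problems = check_solution(x, order, fixed)

# Scale the fractions to a product mass of 1 kg.
product_mass_kg = 1.0
amounts_kg = {name: fraction * product_mass_kg for name, fraction in x.items()}
```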

Fallback Mechanisms

When Optimization Fails

If the optimization returns "infeasible" or "unbounded":

Try fallback without estimation: If all percentages are fixed, use those directly
Handle subdivisions: Assign subdivision amounts based on parent production amount
Log data error: Record the failure for manual review

if len(flat_nutrients) == len(fixed_percentages):
    if set(flat_nutrients.keys()) == set(fixed_percentages.keys()):
        self.handle_fixed_percentages(flat_nutrients, fixed_percentages, calc_graph)
        return

Missing Nutrients

When leaf ingredients lack nutrient data:

If all percentages are known, use fixed percentages
Otherwise, raise an error requesting science team data

Quality Metrics

Error Squares

The module calculates per-nutrient squared errors for quality assessment:

error_squares[nutrient] = ((estimated - declared) / STD) ** 2

This normalized error allows comparison across products regardless of scale.
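
A short numeric sketch of this metric, using the standard deviations from the normalization table. The function name and the estimated values are made up for illustration.

```python
# Standard deviations from the normalization table (subset).
STD = {"energy": 170.00, "fat": 12.67}


def error_squares(estimated, declared, std=STD):
    """Squared, standard-deviation-normalized error per nutrient."""
    return {
        name: ((estimated[name] - declared[name]) / std[name]) ** 2
        for name in declared
    }


declared = {"energy": 530.0, "fat": 32.0}
estimated = {"energy": 513.0, "fat": 30.0}  # hypothetical estimator output
errors = error_squares(estimated, declared)
```

Here the energy estimate is off by 17 kcal, a tenth of a standard deviation, which gives an error square of 0.01.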

Solution Status

The solver returns status indicating solution quality:

optimal: Valid solution found
optimal_inaccurate: Solution found but may have numerical issues
infeasible: No valid solution exists (conflicting constraints)
unbounded: Problem not properly constrained
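
A caller might branch on these status strings roughly as follows. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and treating optimal_inaccurate as usable with a warning is a guess, not documented module behavior.

```python
def classify_status(status):
    """Map a solver status string to a hypothetical follow-up action."""
    if status == "optimal":
        return "use"
    if status == "optimal_inaccurate":
        # Assumption: accept the result but flag it for review.
        return "use_with_warning"
    if status in ("infeasible", "unbounded"):
        # These two trigger the fallback mechanisms described earlier.
        return "fallback"
    return "unknown"


actions = {
    status: classify_status(status)
    for status in ("optimal", "optimal_inaccurate", "infeasible", "unbounded")
}
```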

Known Limitations

Data Coverage

Requires nutrient profiles for all ingredients in the Eaternity Database
Only 10 nutrients used for optimization (not all declared nutrients)
Nutrient statistics based on European product data

Model Assumptions

Ingredients listed in decreasing order (European Union regulation)
Sub-ingredient percentages relative to parent (may vary in practice)
Linear relationship between ingredient amounts and nutrients
No processing-related nutrient changes modeled directly

Numerical Considerations

Very small negative solutions (between -1e-5 and 0) are clamped to 1e-10
NaN solutions trigger errors
Solver may fail on ill-conditioned problems
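
The first two points could look roughly like the sketch below. The function and constant names are hypothetical, and raising an error for values at or below -1e-5 is an assumption; the text above only states what happens to NaN and to very small negative values.

```python
import math

NEGATIVE_TOLERANCE = -1e-5  # values above this and below 0 are clamped
CLAMP_VALUE = 1e-10


def clean_solution(solution):
    """Clamp tiny negative amounts and reject NaN entries."""
    cleaned = []
    for value in solution:
        if math.isnan(value):
            raise ValueError("solver returned NaN")
        if value <= NEGATIVE_TOLERANCE:
            # Assumption: clearly negative amounts are treated as an error.
            raise ValueError(f"negative amount {value}")
        cleaned.append(CLAMP_VALUE if value < 0 else value)
    return cleaned


cleaned = clean_solution([0.45, -3e-9, 0.55])
```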

Related Gap Filling Modules

Module	Relationship
Ingredient Splitter	Parses ingredient declarations into structured hierarchy
Nutrient Subdivision	Creates dried/modified variants for water-loss modeling
Match Product Name	Links ingredients to database entries with nutrient profiles
Unit Weight Conversion	Converts amounts between units

References

CVXPY Documentation. https://www.cvxpy.org/
ECOS Solver. Domahidi, A., Chu, E., & Boyd, S. (2013). ECOS: An SOCP solver for embedded systems. European Control Conference.
European Union Food Labeling Regulation. Regulation (EU) No 1169/2011
Mahalanobis Distance. https://en.wikipedia.org/wiki/Mahalanobis_distance
EuroFIR Food Composition Data. http://www.eurofir.org/
