
Ingredient Amount Estimator GFM

The Ingredient Amount Estimator Gap Filling Module estimates the amount of each ingredient in a food product based on nutritional data. It uses convex optimization (CVXPY) to minimize the difference between declared nutritional values on product packaging and calculated values derived from individual ingredient nutrients.

Quick Reference

| Property | Description |
| --- | --- |
| Runs on | FoodProcessingActivityNode with nutrient values available on parent nodes |
| Dependencies | UnitWeightConversionGapFillingWorker, AddClientNodesGapFillingWorker, MatchProductNameGapFillingWorker, IngredientSplitterGapFillingWorker, NutrientSubdivisionGapFillingWorker, LinkTermToActivityNodeGapFillingWorker |
| Key Input | Product nutrient declaration, ingredient list with nutrient profiles |
| Output | Estimated ingredient amounts (kg per unit of parent product) |
| Trigger | Product has nutrient values and ingredient declaration |

When It Runs

The module triggers when:

  1. The node is a FoodProcessingActivityNode
  2. Parent nodes have nutrient values available (from the API or the database)
  3. The product has an ingredients declaration that has been parsed by the Ingredient Splitter Gap Filling Module
  4. All required dependency Gap Filling Modules have completed

Key Output

The module produces:

  • Ingredient amounts: Estimated weight of each ingredient in kg per unit of parent product
  • Solution status: Optimization solver status (optimal, optimal_inaccurate, infeasible, unbounded)
  • Error squares: Per-nutrient squared errors for quality assessment
  • Estimated nutrients: Calculated nutrient values based on estimated amounts

Scientific Methodology

The ingredient amount estimator solves a constrained optimization problem to find ingredient percentages that best match declared nutritional values while respecting legal and physical constraints.

Problem Formulation

The algorithm minimizes the normalized difference between calculated and declared nutrients:

minimize ||A_norm @ x - b_norm||

Where:

  • A: M x N nutrient matrix (M nutrients, N ingredients)
  • x: N x 1 vector of ingredient amounts (fractions of the total product; the top-level fractions sum to 1)
  • b: M x 1 vector of declared nutrient values
  • A_norm, b_norm: A and b with each nutrient row divided by that nutrient's standard deviation

Normalization

Each nutrient is normalized by its population standard deviation to ensure equal weighting:

A_norm = (A.T / norm_vector).T
b_norm = b / norm_vector

This normalization uses statistics derived from thousands of product declarations:

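| Nutrient | Average | Standard Deviation | Median |
| --- | --- | --- | --- |
| Energy (kilocalorie) | 284.84 | 170.00 | 275.0 |
| Fat (g) | 12.61 | 12.67 | 7.4 |
| Saturated fat (g) | 5.48 | 6.29 | 2.5 |
| Carbohydrates (g) | 27.47 | 24.19 | 16.2 |
| Sugar/Sucrose (g) | 15.36 | 17.05 | 7.3 |
| Protein (g) | 6.94 | 5.76 | 5.8 |
| Sodium chloride (g) | 0.77 | 0.85 | 0.4 |
| Fibers (g) | 2.88 | 2.59 | 2.2 |
| Sodium (mg) | 302.92 | 938.07 | 15.74 |
| Chlorine (mg) | 461.02 | 515.61 | 242.64 |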
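
As a small illustration of the normalization, the sketch below divides each nutrient row of a toy matrix by the standard deviation from the table. It uses plain Python lists rather than NumPy. The helper name `normalize_rows` and the two-ingredient matrix are assumptions made for the example, not part of the module.

```python
# Standard deviations taken from the statistics table (per nutrient).
STD = {"energy": 170.00, "fat": 12.67, "protein": 5.76}


def normalize_rows(matrix, vector, nutrient_order, std=STD):
    """Divide each nutrient row of A and each entry of b by that nutrient's std."""
    a_norm = [[v / std[n] for v in row] for row, n in zip(matrix, nutrient_order)]
    b_norm = [v / std[n] for v, n in zip(vector, nutrient_order)]
    return a_norm, b_norm


# Hypothetical problem with 3 nutrients and 2 ingredients (values per 100 g).
nutrient_order = ["energy", "fat", "protein"]
A = [
    [400.0, 884.0],  # energy (kcal)
    [0.0, 99.8],     # fat (g)
    [0.0, 0.0],      # protein (g)
]
b = [530.0, 32.0, 6.0]

A_norm, b_norm = normalize_rows(A, b, nutrient_order)
```

After scaling, a 170 kcal energy mismatch and a 12.67 g fat mismatch each contribute a residual of 1, which is what gives the nutrients comparable weight in the objective.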

Accepted Nutrients

Only nutrients with complete data in the Eaternity Database are used for optimization:

ACCEPTED_NUTRIENTS = {
    "energy",
    "fat",
    "saturated_fat",
    "carbohydrates",
    "water",
    "sucrose",
    "protein",
    "sodium",
    "chlorine",
    "fibers",
}

Constraint System

The optimization includes several constraint types:

1. Sum-to-One Constraint

All ingredient fractions at each hierarchy level must sum to 1 (100%):

F @ x == g

Where F is the equality constraint matrix ensuring ingredients at each level sum correctly.

2. Decreasing Order Constraint

According to European Union food labeling regulations, ingredients must be listed in decreasing order by weight:

C @ x <= d

Where C is a constraint matrix enforcing:

  • ingredient[i] >= ingredient[i+1] for same-level ingredients

Exception: This constraint does not apply to subdivisions (nutrient variants of the same ingredient).

3. Fixed Percentage Constraints

When percentages are declared on packaging:

# Exact percentage constraint
x[ingredient_idx] == fixed_percentage * x[parent_idx]

# Or for root-level ingredients
x[ingredient_idx] == fixed_percentage

4. Minimum/Maximum Percentage Constraints

When only bounds are specified:

# Minimum percentage: ingredient >= min_percentage * parent
x[col_idx] - min_pct * x[parent_col_idx] >= 0

# Maximum percentage: ingredient <= max_percentage * parent
x[col_idx] - max_pct * x[parent_col_idx] <= 0
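
The sketch below shows one way such bounds could be written as rows of a "row . x <= 0" system, so that they fit the same shape as C @ x <= d. The helpers `bound_rows` and `satisfied` are hypothetical. Whether the module folds these bounds into C or keeps them as separate constraints is not stated here, so treat this as an assumption.

```python
def bound_rows(n, col_idx, parent_col_idx, min_pct=None, max_pct=None):
    """Return rows r such that r . x <= 0 encodes the declared bounds.

    Hypothetical helper: percentages are fractions (0.3 means 30%).
    """
    rows = []
    if min_pct is not None:
        # x[col] - min_pct * x[parent] >= 0  <=>  -x[col] + min_pct * x[parent] <= 0
        row = [0.0] * n
        row[col_idx] = -1.0
        row[parent_col_idx] = min_pct
        rows.append(row)
    if max_pct is not None:
        # x[col] - max_pct * x[parent] <= 0 is already in "<=" form
        row = [0.0] * n
        row[col_idx] = 1.0
        row[parent_col_idx] = -max_pct
        rows.append(row)
    return rows


def satisfied(rows, x, tol=1e-9):
    """Check a candidate solution against every row."""
    return all(sum(r * v for r, v in zip(row, x)) <= tol for row in rows)


# Ingredient 1 must be between 20% and 40% of its parent (ingredient 0).
rows = bound_rows(3, col_idx=1, parent_col_idx=0, min_pct=0.2, max_pct=0.4)
```

Flipping the sign of the minimum bound is the only transformation needed; the maximum bound is already an upper-bound inequality.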

5. Non-negativity

All ingredient amounts must be non-negative:

x >= 0

Implementation Details

Solver Configuration

The module uses the ECOS solver via CVXPY:

problem.solve(solver="ECOS")

ECOS (Embedded Conic Solver) is chosen for its efficiency with second-order cone programs and its ability to handle the least-squares objective function.

Ingredient Hierarchy Handling

The algorithm handles nested ingredient declarations using a level-tuple system:

# Example hierarchy:
# "Chocolate (cocoa 30%, sugar), Milk powder (milk, lactose)"
# Level tuples:
# (0,) -> Chocolate
# (0, 0) -> cocoa (30% of Chocolate)
# (0, 1) -> sugar
# (1,) -> Milk powder
# (1, 0) -> milk
# (1, 1) -> lactose

The constraint matrix ensures:

  • Sub-ingredients sum to their parent ingredient's amount
  • Order constraints apply within each level
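
To make the level-tuple idea concrete, the following sketch flattens a nested declaration into level tuples and looks up a parent. The input shape (a list of `(name, sub_ingredients)` pairs) and both function names are assumptions for illustration, not the module's data model.

```python
def assign_level_tuples(ingredients, prefix=()):
    """Flatten a nested ingredient list into {level_tuple: name}."""
    flat = {}
    for i, (name, subs) in enumerate(ingredients):
        level = prefix + (i,)
        flat[level] = name
        # Sub-ingredients extend the parent's tuple by one position.
        flat.update(assign_level_tuples(subs, level))
    return flat


def parent_of(level):
    """Return the parent's level tuple, or None for a root-level ingredient."""
    return level[:-1] or None


# "Chocolate (cocoa 30%, sugar), Milk powder (milk, lactose)"
declaration = [
    ("Chocolate", [("cocoa", []), ("sugar", [])]),
    ("Milk powder", [("milk", []), ("lactose", [])]),
]
levels = assign_level_tuples(declaration)
```

Because a parent is found by dropping the last element of a tuple, siblings share a prefix, which is what the sum and order constraints rely on.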

Special Handling

Fermentation Adjustment

For alcoholic products, the algorithm converts the declared alcohol back into the sugar it was fermented from and adjusts sugar, carbohydrates, and energy accordingly:

# Alcohol conversion (Gay-Lussac equation)
# 180 g sugar -> 92 g ethanol + 88 g CO2
# (theoretical yield 92/180 ≈ 0.51; the code uses a factor of 0.47)
alc_as_sugar = alc.value / 0.47

# Adjust sugar and carbohydrates
sugar.value += alc_as_sugar
carbs.value += alc_as_sugar

# Adjust energy (alcohol: 7 kcal/g, sugar: 4 kcal/g)
energy.value += alc_as_sugar * 4.0  # Add back sugar calories
energy.value -= alc.value * 7.0     # Remove alcohol calories
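
A self-contained version of the same arithmetic is sketched below. The function name, its signature, and the beer-like input values are hypothetical. Only the 0.47 factor and the 4 and 7 kcal/g energy densities come from the snippet above.

```python
def undo_fermentation(energy_kcal, sugar_g, carbs_g, alcohol_g, yield_factor=0.47):
    """Convert declared alcohol back into the sugar it was fermented from.

    All values are assumed to refer to the same amount of product (e.g. 100 g).
    """
    alc_as_sugar = alcohol_g / yield_factor
    sugar_g += alc_as_sugar
    carbs_g += alc_as_sugar
    energy_kcal += alc_as_sugar * 4.0  # sugar calories added back
    energy_kcal -= alcohol_g * 7.0     # alcohol calories removed
    return energy_kcal, sugar_g, carbs_g


# Made-up product: 42 kcal, 0.5 g sugar, 3.0 g carbohydrates, 4.0 g alcohol.
energy, sugar, carbs = undo_fermentation(42.0, 0.5, 3.0, 4.0)
```

In this example the 4 g of alcohol corresponds to about 8.5 g of sugar, so sugar and carbohydrates each rise by that amount and energy rises by about 6 kcal.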

Sodium Chloride Splitting

When sodium chloride (salt) is declared, it's split into sodium and chlorine for optimization:

NA_PERCENT_IN_NACL = 39.34  # 39.34% sodium
CL_PERCENT_IN_NACL = 60.66  # 60.66% chlorine

new_sodium = old_sodium + (sodium_chloride * NA_PERCENT_IN_NACL / 100)
new_chlorine = old_chlorine + (sodium_chloride * CL_PERCENT_IN_NACL / 100)
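
The split can be wrapped in a small function, as in this sketch. The function name is hypothetical, and all three quantities are assumed to share one unit. Any g-to-mg conversion the module performs is not shown here.

```python
NA_PERCENT_IN_NACL = 39.34
CL_PERCENT_IN_NACL = 60.66


def split_sodium_chloride(sodium_chloride, old_sodium=0.0, old_chlorine=0.0):
    """Add the sodium and chlorine shares of declared salt to existing values."""
    new_sodium = old_sodium + sodium_chloride * NA_PERCENT_IN_NACL / 100
    new_chlorine = old_chlorine + sodium_chloride * CL_PERCENT_IN_NACL / 100
    return new_sodium, new_chlorine


# 1.5 units of declared salt, with no sodium or chlorine declared separately.
sodium, chlorine = split_sodium_chloride(1.5)
```

The two shares add up to the declared salt amount because the percentages sum to 100.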

Water Removal

Water content is excluded from direct optimization since it's implicitly handled through nutrient subdivision:

incoming_nutrients.quantities = {
    k: v for k, v in incoming_nutrients.quantities.items()
    if k != get_nutrient_term("water").uid
}

Non-Food Ingredients

Non-food ingredients (additives, preservatives, E-numbers) are assigned zero amount:

for uid in non_food_ingredient_uids:
    calc_graph.apply_mutation(
        PropMutation(
            node_uid=uid,
            prop_name="amount",
            prop=QuantityProp(
                value=0.0,
                unit_term_uid=kilogram_term.uid,
            ),
        )
    )

Constraint Matrix Construction

Nutrient Matrix (A)

The M x N nutrient matrix maps ingredient nutrient profiles to the optimization:

import numpy as np

def make_nutrient_matrix(flat_nutrients, product_order, nutrient_order, M, N):
    """Return the M x N matrix with one row per nutrient and one column per ingredient."""
    matrix = np.zeros((M, N))
    for iy, product_key in enumerate(product_order):
        if flat_nutrients[product_key] is None:
            continue  # No nutrient profile: leave column at zero
        for ix, nutrient_key in enumerate(nutrient_order):
            if nutrient_key in flat_nutrients[product_key].quantities:
                matrix[ix, iy] = flat_nutrients[product_key].quantities[nutrient_key].value
    matrix[np.isnan(matrix)] = 0.0  # Treat NaN entries as zero
    return matrix

Order Constraint Matrix (C)

Enforces decreasing ingredient order:

# Example for [(0,), (1,), (1,0), (1,1), (2,)]
# Matrix structure:
# [[-1, 1, 0, 0, 0], # ingredient[0] >= ingredient[1]
# [ 0, -1, 0, 0, 1], # ingredient[1] >= ingredient[4]
# [ 0, 0, -1, 1, 0], # sub[0] >= sub[1]
# [ 0, 0, 0, -1, 0],
# [ 0, 0, 0, 0, -1]]
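
The example matrix above can be reproduced from the level tuples with a short routine. This is a sketch under stated assumptions: the function name is hypothetical, the right-hand side d is taken to be zero, and the subdivision exception is not handled.

```python
def make_order_matrix(levels):
    """Build rows so that row i reads: -x[i] + x[next sibling of i] <= 0.

    If ingredient i has no next sibling, the row reduces to -x[i] <= 0.
    """
    index = {level: i for i, level in enumerate(levels)}
    n = len(levels)
    matrix = [[0] * n for _ in range(n)]
    for i, level in enumerate(levels):
        matrix[i][i] = -1
        # The next sibling shares the parent prefix and has the next position.
        next_sibling = level[:-1] + (level[-1] + 1,)
        if next_sibling in index:
            matrix[i][index[next_sibling]] = 1
    return matrix


C = make_order_matrix([(0,), (1,), (1, 0), (1, 1), (2,)])
```

Rows for the last ingredient of each level contain only the -1 entry, which with d = 0 amounts to a non-negativity condition on that ingredient.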

Equality Matrix (F)

Ensures ingredients sum correctly at each level:

# Example: F @ x = g
# [[1, 1, 0, 0, 1], # Top level sums to 1
# [0, -1, 1, 1, 0]] # Sub-ingredients sum to parent
# g = [1, 0]
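
A similar sketch builds F and g from the level tuples and reproduces the example above. The function name is hypothetical, and the row ordering (top level first, then one row per parent in list order) is an assumption.

```python
def make_equality_system(levels):
    """Return (F, g): top-level amounts sum to 1, children sum to their parent."""
    index = {level: i for i, level in enumerate(levels)}
    n = len(levels)
    # First row: every root-level ingredient contributes to the total of 1.
    rows = [[1 if len(level) == 1 else 0 for level in levels]]
    g = [1]
    for parent in levels:
        children = [lv for lv in levels if lv[:-1] == parent]
        if children:
            # sum(children) - parent == 0
            row = [0] * n
            row[index[parent]] = -1
            for child in children:
                row[index[child]] = 1
            rows.append(row)
            g.append(0)
    return rows, g


F, g = make_equality_system([(0,), (1,), (1, 0), (1, 1), (2,)])
```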

Calculation Example

Scenario: Chocolate bar with declaration:

  • "Cocoa mass (45%), Sugar, Cocoa butter, Milk powder (5%)"
  • Declared nutrients: 530 kcal, 32g fat, 20g saturated fat, 52g carbs, 48g sugar, 6g protein

Step 1: Build Ingredient Hierarchy

Level tuples:
(0,) -> Cocoa mass [45% fixed]
(1,) -> Sugar
(2,) -> Cocoa butter
(3,) -> Milk powder [5% fixed]

Step 2: Construct Matrices

Nutrient matrix A (per 100g of each ingredient):

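| Nutrient | Cocoa mass | Sugar | Cocoa butter | Milk powder |
| --- | --- | --- | --- | --- |
| Energy (kcal) | 228 | 400 | 884 | 496 |
| Fat (g) | 14 | 0 | 99.8 | 26.7 |
| Sat. fat (g) | 8.1 | 0 | 60.5 | 16.7 |
| Carbs (g) | 11.5 | 100 | 0 | 38.4 |
| Sugar (g) | 0.5 | 100 | 0 | 38.4 |
| Protein (g) | 19.6 | 0 | 0 | 26.3 |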

Constraint matrix C (order constraints):

[[-1,  1,  0,  0],   # x[0] >= x[1]
 [ 0, -1,  1,  0],   # x[1] >= x[2]
 [ 0,  0, -1,  1],   # x[2] >= x[3]
 [ 0,  0,  0, -1]]   # x[3] >= 0

Step 3: Solve Optimization

With fixed constraints x[0] = 0.45 and x[3] = 0.05:

minimize ||A_norm @ x - b_norm||
subject to:
    x[0] + x[1] + x[2] + x[3] = 1
    x[0] = 0.45
    x[3] = 0.05
    x[0] >= x[1] >= x[2] >= x[3]
    x >= 0

Step 4: Solution

Cocoa mass:   45.0%  (fixed)
Sugar: 30.2% (estimated)
Cocoa butter: 19.8% (estimated)
Milk powder: 5.0% (fixed)
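
Because two amounts are fixed and the total is fixed, this particular example has a single free variable (the sugar fraction), so it can be checked without a conic solver. The sketch below is not the module's CVXPY code. It solves the one-dimensional least-squares problem in closed form and clamps the result to the interval allowed by the order constraints, using the standard deviations from the normalization table.

```python
# Standard deviations from the normalization table.
STD = {"energy": 170.00, "fat": 12.67, "saturated_fat": 6.29,
       "carbohydrates": 24.19, "sucrose": 17.05, "protein": 5.76}

# Columns: cocoa mass, sugar, cocoa butter, milk powder (per 100 g).
A = {"energy": [228, 400, 884, 496],
     "fat": [14, 0, 99.8, 26.7],
     "saturated_fat": [8.1, 0, 60.5, 16.7],
     "carbohydrates": [11.5, 100, 0, 38.4],
     "sucrose": [0.5, 100, 0, 38.4],
     "protein": [19.6, 0, 0, 26.3]}
b = {"energy": 530, "fat": 32, "saturated_fat": 20,
     "carbohydrates": 52, "sucrose": 48, "protein": 6}

X0, X3 = 0.45, 0.05          # fixed percentages from the declaration
FREE = 1.0 - X0 - X3         # mass shared by sugar and cocoa butter

# With x[2] = FREE - x[1], each normalized residual is slope * x[1] + offset.
num = den = 0.0
for nutrient, row in A.items():
    slope = (row[1] - row[2]) / STD[nutrient]
    offset = (row[0] * X0 + row[2] * FREE + row[3] * X3 - b[nutrient]) / STD[nutrient]
    num -= slope * offset
    den += slope * slope

# Order constraints x[0] >= x[1] >= x[2] >= x[3] bound x[1] to this interval.
lower, upper = FREE / 2, min(X0, FREE - X3)
sugar = min(max(num / den, lower), upper)
x = [X0, sugar, FREE - sugar, X3]
```

With these rounded inputs the sketch gives roughly 29% sugar and 21% cocoa butter, within about one percentage point of the figures listed in Step 4. Small differences of this kind are expected from input rounding and solver tolerances.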

Step 5: Output to Graph

Each ingredient node receives an amount in kg:

# For 1kg product:
cocoa_mass.amount = 0.450 kg
sugar.amount = 0.302 kg
cocoa_butter.amount = 0.198 kg
milk_powder.amount = 0.050 kg

Fallback Mechanisms

When Optimization Fails

If the optimization returns "infeasible" or "unbounded":

  1. Try fallback without estimation: If all percentages are fixed, use those directly
  2. Handle subdivisions: Assign subdivision amounts based on parent production amount
  3. Log data error: Record the failure for manual review

The fixed-percentage fallback applies only when every ingredient has a declared percentage:

if len(flat_nutrients) == len(fixed_percentages):
    if set(flat_nutrients.keys()) == set(fixed_percentages.keys()):
        self.handle_fixed_percentages(flat_nutrients, fixed_percentages, calc_graph)
        return

Missing Nutrients

When leaf ingredients lack nutrient data:

  • If all percentages are known, use fixed percentages
  • Otherwise, raise an error requesting science team data

Quality Metrics

Error Squares

The module calculates per-nutrient squared errors for quality assessment:

error_squares[nutrient] = ((estimated - declared) / STD) ** 2

This normalized error allows comparison across products regardless of scale.
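
A minimal sketch of this metric follows. The function name and the estimated values are made up for illustration; the standard deviations come from the normalization table.

```python
STD = {"energy": 170.00, "fat": 12.67, "protein": 5.76}


def error_squares(estimated, declared, std=STD):
    """Per-nutrient squared error, scaled by the nutrient's standard deviation."""
    return {
        nutrient: ((estimated[nutrient] - declared[nutrient]) / std[nutrient]) ** 2
        for nutrient in declared
        if nutrient in estimated and nutrient in std
    }


# Hypothetical estimate for a product declaring 530 kcal, 32 g fat, 6 g protein.
declared = {"energy": 530.0, "fat": 32.0, "protein": 6.0}
estimated = {"energy": 513.0, "fat": 34.5, "protein": 6.0}
errors = error_squares(estimated, declared)
```

Here the 17 kcal energy gap scores 0.01 while the 2.5 g fat gap scores about 0.039, which shows how the scaling makes a small absolute fat error count for more than a larger absolute energy error.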

Solution Status

The solver returns status indicating solution quality:

  • optimal: Valid solution found
  • optimal_inaccurate: Solution found but may have numerical issues
  • infeasible: No valid solution exists (conflicting constraints)
  • unbounded: Problem not properly constrained

Known Limitations

Data Coverage

  • Requires nutrient profiles for all ingredients in the Eaternity Database
  • Only 10 nutrients used for optimization (not all declared nutrients)
  • Nutrient statistics based on European product data

Model Assumptions

  • Ingredients listed in decreasing order (European Union regulation)
  • Sub-ingredient percentages relative to parent (may vary in practice)
  • Linear relationship between ingredient amounts and nutrients
  • No processing-related nutrient changes modeled directly

Numerical Considerations

  • Very small negative solutions (> -1e-5) are clamped to 1e-10
  • NaN solutions trigger errors
  • Solver may fail on ill-conditioned problems
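
The clamping rule can be sketched as a post-processing step on the solver output. The function name is hypothetical. Raising an error for values at or below -1e-5 is an assumption, since the text only states what happens to very small negatives and to NaN.

```python
import math


def clean_solution(x, tolerance=1e-5, floor=1e-10):
    """Clamp tiny negative amounts to `floor` and reject unusable values."""
    cleaned = []
    for value in x:
        if math.isnan(value):
            raise ValueError("solver returned NaN")
        if value <= -tolerance:
            # Assumption: clearly negative amounts are treated as an error.
            raise ValueError(f"negative amount {value}")
        cleaned.append(floor if value < 0 else value)
    return cleaned


amounts = clean_solution([0.45, -3e-7, 0.55])
```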

Related Modules

| Module | Relationship |
| --- | --- |
| Ingredient Splitter | Parses ingredient declarations into structured hierarchy |
| Nutrient Subdivision | Creates dried/modified variants for water-loss modeling |
| Match Product Name | Links ingredients to database entries with nutrient profiles |
| Unit Weight Conversion | Converts amounts between units |

References

  1. CVXPY Documentation. https://www.cvxpy.org/

  2. ECOS Solver. Domahidi, A., Chu, E., & Boyd, S. (2013). ECOS: An SOCP solver for embedded systems. European Control Conference.

  3. European Union Food Labeling Regulation. Regulation (EU) No 1169/2011

  4. Mahalanobis Distance. https://en.wikipedia.org/wiki/Mahalanobis_distance

  5. EuroFIR Food Composition Data. http://www.eurofir.org/