Ingredient Amount Estimator GFM
The Ingredient Amount Estimator Gap Filling Module estimates the amount of each ingredient in a food product based on nutritional data. It uses convex optimization (CVXPY) to minimize the difference between declared nutritional values on product packaging and calculated values derived from individual ingredient nutrients.
Quick Reference
| Property | Description |
|---|---|
| Runs on | FoodProcessingActivityNode with nutrient values available on parent nodes |
| Dependencies | UnitWeightConversionGapFillingWorker, AddClientNodesGapFillingWorker, MatchProductNameGapFillingWorker, IngredientSplitterGapFillingWorker, NutrientSubdivisionGapFillingWorker, LinkTermToActivityNodeGapFillingWorker |
| Key Input | Product nutrient declaration, ingredient list with nutrient profiles |
| Output | Estimated ingredient amounts (kg per unit of parent product) |
| Trigger | Product has nutrient values and ingredient declaration |
When It Runs
The module triggers when:
- The node is a FoodProcessingActivityNode
- Parent nodes have nutrient values available (from Application Programming Interface or database)
- The product has an ingredients declaration that has been parsed by the Ingredient Splitter Gap Filling Module
- All required dependency Gap Filling Modules have completed
Key Output
The module produces:
- Ingredient amounts: Estimated weight of each ingredient in kg per unit of parent product
- Solution status: Optimization solver status (e.g., optimal, optimal_inaccurate, infeasible, unbounded)
- Error squares: Per-nutrient normalized squared errors for quality assessment
- Estimated nutrients: Calculated nutrient values based on estimated amounts
Scientific Methodology
The ingredient amount estimator solves a constrained optimization problem to find ingredient percentages that best match declared nutritional values while respecting legal and physical constraints.
Problem Formulation
The algorithm minimizes the Euclidean norm of the normalized difference between calculated and declared nutrients:
```
minimize ||A_norm @ x - b_norm||
```
Where:
- A: M x N nutrient matrix (M nutrients, N ingredients)
- x: N x 1 vector of ingredient amounts, as fractions of the total product (top-level entries sum to 1)
- b: M x 1 vector of declared nutrient values
- A_norm, b_norm: A and b with each nutrient row divided by that nutrient's standard deviation
Normalization
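Each nutrient row is divided by the standard deviation of that nutrient across the product population. This keeps nutrients with large numeric ranges, such as energy in kcal, from dominating the objective:
```python
A_norm = (A.T / norm_vector).T  # row i of A divided by norm_vector[i]
b_norm = b / norm_vector
```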
This normalization uses statistics derived from thousands of product declarations:
| Nutrient | Average | Standard Deviation | Median |
|---|---|---|---|
| Energy (kilocalorie) | 284.84 | 170.00 | 275.0 |
| Fat (g) | 12.61 | 12.67 | 7.4 |
| Saturated fat (g) | 5.48 | 6.29 | 2.5 |
| Carbohydrates (g) | 27.47 | 24.19 | 16.2 |
| Sugar/Sucrose (g) | 15.36 | 17.05 | 7.3 |
| Protein (g) | 6.94 | 5.76 | 5.8 |
| Sodium chloride (g) | 0.77 | 0.85 | 0.4 |
| Fibers (g) | 2.88 | 2.59 | 2.2 |
| Sodium (mg) | 302.92 | 938.07 | 15.74 |
| Chlorine (mg) | 461.02 | 515.61 | 242.64 |
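Below is a minimal NumPy sketch of this scaling. It uses the energy, fat and protein standard deviations from the table above. The two-ingredient nutrient matrix and the declared values are made-up toy numbers, not data from the module. The sketch checks that the transpose form used in the formulation equals broadcasting the divisor over the nutrient (row) axis. It also shows how differently a 10-unit miss is weighted for energy and for fat.

```python
import numpy as np

# Standard deviations from the table for energy (kcal), fat (g) and protein (g).
norm_vector = np.array([170.00, 12.67, 5.76])

# Toy nutrient matrix: rows = nutrients, columns = two ingredients (per 100 g).
A = np.array([
    [400.0, 884.0],  # energy
    [0.0, 99.8],     # fat
    [0.0, 0.0],      # protein
])
b = np.array([530.0, 32.0, 6.0])  # toy declared values

# Row-wise scaling as written in the formulation ...
A_norm = (A.T / norm_vector).T
b_norm = b / norm_vector
# ... equals broadcasting the divisor over the nutrient (row) axis.
assert np.allclose(A_norm, A / norm_vector[:, None])

# Weight of a 10-unit miss on energy, fat and protein after scaling.
miss = np.array([10.0, 10.0, 0.0]) / norm_vector
print(miss.round(3))
```

After scaling, a 10 kcal energy miss costs about 0.06 normalized units while a 10 g fat miss costs about 0.79. Both nutrients therefore influence the fit on comparable terms.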
Accepted Nutrients
Only nutrients with complete data in the Eaternity Database are used for optimization:
```python
ACCEPTED_NUTRIENTS = {
    "energy",
    "fat",
    "saturated_fat",
    "carbohydrates",
    "water",
    "sucrose",
    "protein",
    "sodium",
    "chlorine",
    "fibers",
}
```
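The following plain-Python sketch filters a declaration down to the accepted nutrients and also drops water, as described under Water Removal. The helper name select_optimization_nutrients and the plain string keys are assumptions made for readability. The module itself works with nutrient term UIDs.

```python
ACCEPTED_NUTRIENTS = {
    "energy", "fat", "saturated_fat", "carbohydrates", "water",
    "sucrose", "protein", "sodium", "chlorine", "fibers",
}


def select_optimization_nutrients(declared):
    """Keep accepted nutrients and drop water (hypothetical helper).

    The module uses nutrient term UIDs and removes water in a separate
    step; plain names are used here only for readability.
    """
    return {
        name: value
        for name, value in declared.items()
        if name in ACCEPTED_NUTRIENTS and name != "water"
    }


declared = {"energy": 530.0, "fat": 32.0, "water": 1.5, "vitamin_c": 0.01}
print(select_optimization_nutrients(declared))
```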
Constraint System
The optimization includes several constraint types:
1. Sum-to-One Constraint
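Top-level ingredient fractions must sum to 1 (100%), and the sub-ingredients of each compound ingredient must sum to that ingredient's amount:
```
F @ x == g
```
Where F is the equality constraint matrix and g its right-hand side (1 for the top-level row, 0 for each sub-ingredient row).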
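To make the role of F and g concrete, here is a NumPy sketch of the equality-constrained part of the problem only: minimize ||A_norm @ x - b_norm|| subject to F @ x == g. It solves the KKT system of the equivalent squared objective, which has the same minimizer. This is not how the module works; the module hands the full constrained problem to CVXPY/ECOS. Because the sketch ignores the ordering, percentage and non-negativity constraints, its result may violate them. The function name solve_equality_ls is hypothetical.

```python
import numpy as np


def solve_equality_ls(A_norm, b_norm, F, g):
    """Minimize ||A_norm @ x - b_norm||_2 subject only to F @ x == g.

    Stationarity of ||A x - b||^2 + nu^T (F x - g) gives the KKT system
        [2 A^T A  F^T] [x ]   [2 A^T b]
        [F        0  ] [nu] = [g      ]
    which is solved directly. Inequality constraints are ignored.
    """
    n = A_norm.shape[1]
    p = F.shape[0]
    kkt = np.block([
        [2.0 * A_norm.T @ A_norm, F.T],
        [F, np.zeros((p, p))],
    ])
    rhs = np.concatenate([2.0 * A_norm.T @ b_norm, g])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]  # drop the Lagrange multipliers


# Toy problem: two "nutrients" that directly measure two ingredients,
# whose best unconstrained fit (0.7, 0.2) does not sum to 1.
x = solve_equality_ls(np.eye(2), np.array([0.7, 0.2]),
                      np.array([[1.0, 1.0]]), np.array([1.0]))
print(x)
```

In this toy call the equality constraint shifts both amounts equally, from (0.7, 0.2) to (0.75, 0.25).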
2. Decreasing Order Constraint
According to European Union food labeling regulations, ingredients must be listed in decreasing order by weight, so the estimated amounts must follow the declared order:
```
C @ x <= d
```
Where C is a constraint matrix enforcing ingredient[i] >= ingredient[i+1] for consecutive ingredients at the same level.
Exception: This constraint does not apply to subdivisions (nutrient variants of the same ingredient).
3. Fixed Percentage Constraints
When percentages are declared on packaging:
```python
# Exact percentage constraint (sub-ingredient, relative to its parent)
x[ingredient_idx] == fixed_percentage * x[parent_idx]
# Or for root-level ingredients
x[ingredient_idx] == fixed_percentage
```
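Both forms are linear equalities in x. The sketch below writes either one as a single row of an equality system row @ x == rhs, so it could be stacked with F @ x == g. The helper is hypothetical and not part of the module. Column indices follow the Chocolate/Milk powder hierarchy shown under Ingredient Hierarchy Handling.

```python
import numpy as np


def fixed_percentage_row(n, ingredient_idx, fixed_percentage, parent_idx=None):
    """Return (row, rhs) such that row @ x == rhs encodes a fixed percentage.

    Hypothetical helper. Sub-ingredient: x[i] - p * x[parent] == 0.
    Root-level ingredient: x[i] == p.
    """
    row = np.zeros(n)
    row[ingredient_idx] = 1.0
    if parent_idx is None:
        return row, fixed_percentage
    row[parent_idx] = -fixed_percentage
    return row, 0.0


# Columns: Chocolate, cocoa, sugar, Milk powder, milk, lactose.
row, rhs = fixed_percentage_row(6, ingredient_idx=1, fixed_percentage=0.30, parent_idx=0)
x = np.array([0.6, 0.18, 0.42, 0.4, 0.3, 0.1])  # cocoa = 30 % of Chocolate
print(row, rhs, np.isclose(row @ x, rhs))
```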
4. Minimum/Maximum Percentage Constraints
When only bounds are specified:
```python
# Minimum percentage: ingredient >= min_percentage * parent
x[col_idx] - min_pct * x[parent_col_idx] >= 0
# Maximum percentage: ingredient <= max_percentage * parent
x[col_idx] - max_pct * x[parent_col_idx] <= 0
```
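The order constraints are written as C @ x <= d. One way to stack these bounds with them is to multiply the minimum-percentage inequality by -1, so that both rows use <= with a right-hand side of 0. The NumPy sketch below does this for a sub-ingredient bounded relative to its parent column. bound_rows is a hypothetical helper, not a module function.

```python
import numpy as np


def bound_rows(n, col_idx, parent_col_idx, min_pct=None, max_pct=None):
    """Rows R such that R @ x <= 0 encodes min/max shares of a parent.

    Hypothetical helper.
    min: x[c] - min_pct * x[p] >= 0  ->  -x[c] + min_pct * x[p] <= 0
    max: x[c] - max_pct * x[p] <= 0  (already in <= form)
    """
    rows = []
    if min_pct is not None:
        row = np.zeros(n)
        row[col_idx] = -1.0
        row[parent_col_idx] = min_pct
        rows.append(row)
    if max_pct is not None:
        row = np.zeros(n)
        row[col_idx] = 1.0
        row[parent_col_idx] = -max_pct
        rows.append(row)
    return np.array(rows).reshape(len(rows), n)


# Sub-ingredient in column 1 must be 20-50 % of its parent in column 0.
C_bounds = bound_rows(3, col_idx=1, parent_col_idx=0, min_pct=0.2, max_pct=0.5)
x_ok = np.array([0.5, 0.15, 0.35])
x_bad = np.array([0.5, 0.30, 0.20])
print(np.all(C_bounds @ x_ok <= 0), np.all(C_bounds @ x_bad <= 0))
```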
5. Non-negativity
All ingredient amounts must be non-negative:
```python
x >= 0
```
Implementation Details
Solver Configuration
The module uses the ECOS solver via CVXPY:
```python
problem.solve(solver="ECOS")
```
ECOS (Embedded Conic Solver) is chosen for its efficiency with second-order cone programs and its ability to handle the least-squares objective function.
Ingredient Hierarchy Handling
The algorithm handles nested ingredient declarations using a level-tuple system:
```
# Example hierarchy:
# "Chocolate (cocoa 30%, sugar), Milk powder (milk, lactose)"
# Level tuples:
# (0,)   -> Chocolate
# (0, 0) -> cocoa (30% of Chocolate)
# (0, 1) -> sugar
# (1,)   -> Milk powder
# (1, 0) -> milk
# (1, 1) -> lactose
```
The constraint matrix ensures:
- Sub-ingredients sum to their parent ingredient's amount
- Order constraints apply within each level
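Grouping level tuples by their parent is enough to recover both relationships. The plain-Python sketch below uses the Chocolate/Milk powder example above. parent_of and group_siblings are hypothetical helper names, and the input is assumed to be in declaration order.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Level = Tuple[int, ...]


def parent_of(level: Level) -> Optional[Level]:
    """Parent level tuple, or None for a top-level ingredient."""
    return level[:-1] if len(level) > 1 else None


def group_siblings(levels: List[Level]) -> Dict[Optional[Level], List[Level]]:
    """Group level tuples by parent, preserving the input (declaration) order.

    Each group is one level in the sense above: its members are ordered by
    weight, and sub-ingredient groups sum to their parent's amount.
    """
    groups: Dict[Optional[Level], List[Level]] = defaultdict(list)
    for level in levels:
        groups[parent_of(level)].append(level)
    return dict(groups)


levels = [(0,), (0, 0), (0, 1), (1,), (1, 0), (1, 1)]
print(group_siblings(levels))
```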
Special Handling
Fermentation Adjustment
For alcoholic products, the algorithm adjusts nutrients to account for fermentation:
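```python
# Alcohol conversion (Gay-Lussac equation)
# 180 g sugar -> 92 g ethanol + 88 g CO2 (theoretical yield 92/180, about 0.51 g/g)
# The factor 0.47 g ethanol per g sugar is below this theoretical yield,
# in line with the lower yields of real fermentations.
alc_as_sugar = alc.value / 0.47
# Adjust sugar and carbohydrates
sugar.value += alc_as_sugar
carbs.value += alc_as_sugar
# Adjust energy (alcohol: 7 kcal/g, sugar: 4 kcal/g)
energy.value += alc_as_sugar * 4.0  # Add back sugar calories
energy.value -= alc.value * 7.0  # Remove alcohol calories
```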
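Here is the same adjustment as a self-contained function. It assumes plain float values per 100 g under the hypothetical keys "alcohol", "sucrose", "carbohydrates" and "energy", in place of the module's nutrient objects. The example uses made-up values for a drink whose declared energy is fully explained by 4.7 g alcohol and 3.6 g carbohydrates.

```python
ETHANOL_PER_SUGAR = 0.47  # g ethanol per g sugar, the factor used above
KCAL_PER_G_SUGAR = 4.0
KCAL_PER_G_ALCOHOL = 7.0


def adjust_for_fermentation(nutrients):
    """Return a copy with alcohol converted back into the sugar it came from.

    Keys are hypothetical plain names; values are per 100 g (energy in kcal).
    """
    out = dict(nutrients)
    alc = out.get("alcohol", 0.0)
    alc_as_sugar = alc / ETHANOL_PER_SUGAR
    out["sucrose"] = out.get("sucrose", 0.0) + alc_as_sugar
    out["carbohydrates"] = out.get("carbohydrates", 0.0) + alc_as_sugar
    out["energy"] = (
        out.get("energy", 0.0)
        + alc_as_sugar * KCAL_PER_G_SUGAR  # add back sugar calories
        - alc * KCAL_PER_G_ALCOHOL         # remove alcohol calories
    )
    return out


drink = {"alcohol": 4.7, "energy": 47.3, "carbohydrates": 3.6, "sucrose": 0.0}
print(adjust_for_fermentation(drink))
```

After the adjustment, the 4.7 g of alcohol become 10 g of sugar. All 54.4 kcal are then accounted for by the 13.6 g of carbohydrates at 4 kcal/g.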
Sodium Chloride Splitting
When sodium chloride (salt) is declared, it is split into sodium and chlorine for optimization:
```python
NA_PERCENT_IN_NACL = 39.34  # 39.34% sodium by mass
CL_PERCENT_IN_NACL = 60.66  # 60.66% chlorine by mass
new_sodium = old_sodium + (sodium_chloride * NA_PERCENT_IN_NACL / 100)
new_chlorine = old_chlorine + (sodium_chloride * CL_PERCENT_IN_NACL / 100)
```
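Below is a minimal sketch of the split as a function; split_salt is a hypothetical name. It also makes unit handling explicit. The statistics table tracks sodium and chlorine in mg but salt in g, so the sketch converts salt to mg before splitting. That conversion is an assumption of the sketch, not confirmed module behavior.

```python
NA_PERCENT_IN_NACL = 39.34  # mass % sodium in NaCl
CL_PERCENT_IN_NACL = 60.66  # mass % chlorine in NaCl


def split_salt(sodium, chlorine, sodium_chloride):
    """Add the sodium and chlorine contained in declared salt.

    Assumes all three values share one mass unit.
    """
    new_sodium = sodium + sodium_chloride * NA_PERCENT_IN_NACL / 100
    new_chlorine = chlorine + sodium_chloride * CL_PERCENT_IN_NACL / 100
    return new_sodium, new_chlorine


# 1.2 g salt per 100 g, converted to mg to match the sodium/chlorine units
# of the statistics table (assumption).
na_mg, cl_mg = split_salt(0.0, 0.0, 1.2 * 1000)
print(round(na_mg, 2), round(cl_mg, 2))
```

The two percentages follow from the molar masses of sodium (22.99 g/mol) and chlorine (35.45 g/mol): 22.99 / 58.44 is about 39.34 %.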
Water Removal
Water content is excluded from direct optimization since it's implicitly handled through nutrient subdivision:
```python
incoming_nutrients.quantities = {
    k: v for k, v in incoming_nutrients.quantities.items()
    if k != get_nutrient_term("water").uid
}
```
Non-Food Ingredients
Non-food ingredients (additives, preservatives, E-numbers) are assigned an amount of zero:
```python
for uid in non_food_ingredient_uids:
    calc_graph.apply_mutation(
        PropMutation(
            node_uid=uid,
            prop_name="amount",
            prop=QuantityProp(
                value=0.0,
                unit_term_uid=kilogram_term.uid,
            ),
        )
    )
```
Constraint Matrix Construction
Nutrient Matrix (A)
The M x N nutrient matrix has one row per nutrient and one column per ingredient, filled from each ingredient's nutrient profile:
```python
import numpy as np

def make_nutrient_matrix(flat_nutrients, product_order, nutrient_order, M, N):
    # Rows follow nutrient_order (M nutrients), columns follow product_order (N ingredients)
    matrix = np.zeros((M, N))
    for iy, product_key in enumerate(product_order):
        if flat_nutrients[product_key] is None:
            continue  # No nutrient profile: leave column at zero
        for ix, nutrient_key in enumerate(nutrient_order):
            if nutrient_key in flat_nutrients[product_key].quantities:
                matrix[ix, iy] = flat_nutrients[product_key].quantities[nutrient_key].value
    # Treat missing (NaN) values as zero
    matrix[np.isnan(matrix)] = 0.0
    return matrix
```
Order Constraint Matrix (C)
Enforces decreasing ingredient order:
```
# Example for [(0,), (1,), (1,0), (1,1), (2,)]
# Matrix structure (each row reads row @ x <= 0):
# [[-1,  1,  0,  0,  0],  # x[0] >= x[1]: (0,) >= (1,)
#  [ 0, -1,  0,  0,  1],  # x[1] >= x[4]: (1,) >= (2,)
#  [ 0,  0, -1,  1,  0],  # x[2] >= x[3]: sub[0] >= sub[1]
#  [ 0,  0,  0, -1,  0],  # x[3] >= 0: last sub-ingredient
#  [ 0,  0,  0,  0, -1]]  # x[4] >= 0: last top-level ingredient
```
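The pattern in this example can be generated from the level tuples alone. The NumPy sketch below is inferred from the example matrix, not taken from the module. Row i has -1 on column i and +1 on the column of the next sibling, assuming d = 0 as the comments imply. It does not model the subdivision exception, and order_constraint_matrix is a hypothetical name.

```python
from typing import List, Tuple

import numpy as np


def order_constraint_matrix(levels: List[Tuple[int, ...]]) -> np.ndarray:
    """Build C such that C @ x <= 0 keeps each level in decreasing order.

    Row i has -1 at column i and +1 at the column of i's next sibling
    (same parent, next position), so the row reads x[next] - x[i] <= 0.
    The last sibling gets only -1, i.e. x[i] >= 0.
    """
    index = {level: col for col, level in enumerate(levels)}
    C = -np.eye(len(levels))
    for col, level in enumerate(levels):
        next_sibling = level[:-1] + (level[-1] + 1,)
        if next_sibling in index:
            C[col, index[next_sibling]] = 1.0
    return C


print(order_constraint_matrix([(0,), (1,), (1, 0), (1, 1), (2,)]))
```

Called with [(0,), (1,), (2,), (3,)], the same function reproduces the 4 x 4 order matrix of the calculation example below.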
Equality Matrix (F)
Ensures ingredients sum correctly at each level:
```
# Example for the same hierarchy, [(0,), (1,), (1,0), (1,1), (2,)]: F @ x = g
# [[1,  1, 0, 0, 1],   # Top level sums to 1
#  [0, -1, 1, 1, 0]]   # Sub-ingredients sum to parent
# g = [1, 0]
```
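Similarly, the NumPy sketch below derives F and g from the level tuples. It is inferred from this example, and equality_constraints is a hypothetical name. It builds one row forcing the top-level ingredients to sum to 1. It then adds one row per compound ingredient, forcing its sub-ingredients to sum to the parent's amount.

```python
from typing import List, Tuple

import numpy as np


def equality_constraints(levels: List[Tuple[int, ...]]):
    """Build (F, g) such that F @ x == g.

    Row 0: top-level ingredients sum to 1.
    One further row per compound ingredient: sub-ingredients minus the
    parent amount equal 0.
    """
    index = {level: col for col, level in enumerate(levels)}
    n = len(levels)
    top = np.zeros(n)
    top[[col for level, col in index.items() if len(level) == 1]] = 1.0
    rows, rhs = [top], [1.0]
    for parent, parent_col in index.items():
        children = [col for level, col in index.items() if level[:-1] == parent]
        if children:
            row = np.zeros(n)
            row[parent_col] = -1.0
            row[children] = 1.0
            rows.append(row)
            rhs.append(0.0)
    return np.array(rows), np.array(rhs)


F, g = equality_constraints([(0,), (1,), (1, 0), (1, 1), (2,)])
print(F)
print(g)
```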
Calculation Example
Scenario: Chocolate bar with declaration:
- "Cocoa mass (45%), Sugar, Cocoa butter, Milk powder (5%)"
- Declared nutrients (per 100 g): 530 kcal, 32 g fat, 20 g saturated fat, 52 g carbs, 48 g sugar, 6 g protein
Step 1: Build Ingredient Hierarchy
Level tuples:
```
(0,) -> Cocoa mass [45% fixed]
(1,) -> Sugar
(2,) -> Cocoa butter
(3,) -> Milk powder [5% fixed]
```
Step 2: Construct Matrices
Nutrient matrix A (per 100 g of each ingredient; energy in kcal, other nutrients in g):
| Nutrient | Cocoa mass | Sugar | Cocoa butter | Milk powder |
|---|---|---|---|---|
| Energy | 228 | 400 | 884 | 496 |
| Fat | 14 | 0 | 99.8 | 26.7 |
| Sat. fat | 8.1 | 0 | 60.5 | 16.7 |
| Carbs | 11.5 | 100 | 0 | 38.4 |
| Sugar | 0.5 | 100 | 0 | 38.4 |
| Protein | 19.6 | 0 | 0 | 26.3 |
Constraint matrix C (order constraints):
```
[[-1,  1,  0,  0],  # x[0] >= x[1]
 [ 0, -1,  1,  0],  # x[1] >= x[2]
 [ 0,  0, -1,  1],  # x[2] >= x[3]
 [ 0,  0,  0, -1]]  # x[3] >= 0
```
Step 3: Solve Optimization
With fixed constraints x[0] = 0.45 and x[3] = 0.05:
```
minimize ||A_norm @ x - b_norm||
subject to:
    x[0] + x[1] + x[2] + x[3] = 1
    x[0] = 0.45
    x[3] = 0.05
    x[0] >= x[1] >= x[2] >= x[3]
    x >= 0
```
Step 4: Solution
```
Cocoa mass:   45.0% (fixed)
Sugar:        30.2% (estimated)
Cocoa butter: 19.8% (estimated)
Milk powder:   5.0% (fixed)
```
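Multiplying the Step 2 nutrient matrix by the Step 4 amounts gives the estimated nutrient profile of the product per 100 g. This is what the solver compares with the declaration. The NumPy sketch below computes it and also checks the Step 3 constraints, using only the numbers shown in this example.

```python
import numpy as np

# Step 2 nutrient matrix (per 100 g of each ingredient).
# Rows: energy (kcal), fat, saturated fat, carbs, sugar, protein (g).
# Columns: cocoa mass, sugar, cocoa butter, milk powder.
A = np.array([
    [228.0, 400.0, 884.0, 496.0],
    [14.0, 0.0, 99.8, 26.7],
    [8.1, 0.0, 60.5, 16.7],
    [11.5, 100.0, 0.0, 38.4],
    [0.5, 100.0, 0.0, 38.4],
    [19.6, 0.0, 0.0, 26.3],
])
declared = np.array([530.0, 32.0, 20.0, 52.0, 48.0, 6.0])
x = np.array([0.450, 0.302, 0.198, 0.050])  # Step 4 solution

# Step 3 constraints: sum to one, decreasing order, non-negativity.
assert np.isclose(x.sum(), 1.0)
assert np.all(np.diff(x) <= 0) and np.all(x >= 0)

estimated = A @ x  # nutrients per 100 g of product
names = ["energy", "fat", "sat. fat", "carbs", "sugar", "protein"]
for name, est, dec in zip(names, estimated, declared):
    print(f"{name:9s} estimated {est:7.2f}  declared {dec:7.2f}")
```

The gap between the two columns is what the error squares under Quality Metrics quantify.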
Step 5: Output to Graph
Each ingredient node receives an amount in kg:
```
# For 1 kg of product:
cocoa_mass.amount = 0.450 kg
sugar.amount = 0.302 kg
cocoa_butter.amount = 0.198 kg
milk_powder.amount = 0.050 kg
```
Fallback Mechanisms
When Optimization Fails
If the optimization returns "infeasible" or "unbounded":
- Try fallback without estimation: If all percentages are fixed, use those directly
- Handle subdivisions: Assign subdivision amounts based on parent production amount
- Log data error: Record the failure for manual review
```python
# Every ingredient has a declared percentage: use them and skip the optimization
if len(flat_nutrients) == len(fixed_percentages):
    if set(flat_nutrients.keys()) == set(fixed_percentages.keys()):
        self.handle_fixed_percentages(flat_nutrients, fixed_percentages, calc_graph)
        return
```
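When every ingredient's percentage is declared, the amounts follow from multiplying the percentages down the hierarchy. The code below is a hypothetical stand-in for handle_fixed_percentages, whose actual implementation is not shown here. It assumes percentages are given as fractions, with root-level values relative to the product and nested values relative to the parent, as in the fixed-percentage constraints above.

```python
from typing import Dict, Tuple

Level = Tuple[int, ...]


def amounts_from_fixed_percentages(
    fixed: Dict[Level, float], product_amount_kg: float = 1.0
) -> Dict[Level, float]:
    """Convert fully declared percentages (as fractions) into amounts in kg.

    Hypothetical stand-in: root-level fractions are relative to the product,
    nested ones to the parent ingredient. Every parent must itself appear
    in `fixed`.
    """
    amounts: Dict[Level, float] = {}
    for level in sorted(fixed, key=len):  # parents are processed before children
        if len(level) == 1:
            parent_amount = product_amount_kg
        else:
            parent_amount = amounts[level[:-1]]
        amounts[level] = fixed[level] * parent_amount
    return amounts


fixed = {(0,): 0.6, (0, 0): 0.3, (0, 1): 0.7, (1,): 0.4}
print(amounts_from_fixed_percentages(fixed))
```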
Missing Nutrients
When leaf ingredients lack nutrient data:
- If all percentages are known, use fixed percentages
- Otherwise, raise an error asking the science team to supply the missing nutrient data
Quality Metrics
Error Squares
The module calculates per-nutrient squared errors for quality assessment:
```python
error_squares[nutrient] = ((estimated - declared) / STD) ** 2  # STD: this nutrient's standard deviation
```
This normalized error allows comparison across products regardless of scale.
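The following is a plain-Python sketch of this metric over a dictionary of nutrients. error_squares is a hypothetical name, the standard deviations are a subset of the statistics table, and the estimated values are made up.

```python
# Standard deviations from the statistics table (subset).
STD = {"energy": 170.00, "fat": 12.67, "protein": 5.76}


def error_squares(estimated, declared, std=STD):
    """Per-nutrient normalized squared error ((estimated - declared) / std) ** 2."""
    return {
        name: ((estimated[name] - declared[name]) / std[name]) ** 2
        for name in declared
        if name in std and name in estimated
    }


declared = {"energy": 530.0, "fat": 32.0, "protein": 6.0}
estimated = {"energy": 700.0, "fat": 32.0, "protein": 3.12}
print(error_squares(estimated, declared))
```

An error square of 1 means the estimate misses the declaration by exactly one population standard deviation of that nutrient.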
Solution Status
The solver returns a status indicating solution quality:
- optimal: Valid solution found
- optimal_inaccurate: Solution found but may have numerical issues
- infeasible: No valid solution exists (conflicting constraints)
- unbounded: Problem not properly constrained
Known Limitations
Data Coverage
- Requires nutrient profiles for all ingredients in the Eaternity Database
- Only the nutrients in ACCEPTED_NUTRIENTS (with water removed) are used for optimization, not all declared nutrients
- Nutrient statistics based on European product data
Model Assumptions
- Ingredients listed in decreasing order (European Union regulation)
- Declared sub-ingredient percentages are interpreted relative to the parent ingredient (labels do not always follow this convention)
- Linear relationship between ingredient amounts and nutrients
- No processing-related nutrient changes modeled directly
Numerical Considerations
- Slightly negative solution values (between -1e-5 and 0) are clamped to 1e-10
- NaN solutions trigger errors
- Solver may fail on ill-conditioned problems
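Below is a NumPy sketch of the first two rules; clean_solution is a hypothetical name. Values more negative than -1e-5 are returned unchanged, because their handling is not described here. That behavior is an assumption of the sketch.

```python
import numpy as np


def clean_solution(x, tol=1e-5, floor=1e-10):
    """Post-process raw solver amounts.

    - NaN anywhere raises ValueError.
    - Values in (-tol, 0) are replaced by `floor`.
    - Other values, including more negative ones, are returned unchanged
      (assumption: their handling is not specified).
    """
    x = np.asarray(x, dtype=float)
    if np.isnan(x).any():
        raise ValueError("solver returned NaN amounts")
    return np.where((x < 0) & (x > -tol), floor, x)


print(clean_solution([0.5, -1e-7, 0.0, -0.1]))
```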
Related Gap Filling Modules
| Module | Relationship |
|---|---|
| Ingredient Splitter | Parses ingredient declarations into structured hierarchy |
| Nutrient Subdivision | Creates dried/modified variants for water-loss modeling |
| Match Product Name | Links ingredients to database entries with nutrient profiles |
| Unit Weight Conversion | Converts amounts between units |
References
- CVXPY Documentation. https://www.cvxpy.org/
- ECOS Solver. Domahidi, A., Chu, E., & Boyd, S. (2013). ECOS: An SOCP solver for embedded systems. European Control Conference.
- European Union Food Labeling Regulation. Regulation (EU) No 1169/2011
- Mahalanobis Distance. https://en.wikipedia.org/wiki/Mahalanobis_distance
- EuroFIR Food Composition Data. http://www.eurofir.org/