
LCA Food Glossary

A unified glossary system for Life Cycle Assessment in food, integrating multiple standards with advanced semantic mapping capabilities. Built on a LinkML-first architecture for semantic web compatibility and multi-format data generation.

Looking for Platform Terminology?

This glossary covers food classification standards and LCA databases used in scientific research.

For EOS platform terminology (system concepts, API terms, calculation methods), see the EOS Glossary.

Overview

The LCA Food Glossary bridges the gap between different food classification systems and LCA databases with AI-powered semantic mapping, making it easier for researchers, practitioners, and organizations to perform accurate environmental assessments of food products and supply chains.

Key Statistics

  • Total Terms: 168,626
  • Data Sources: 10 integrated sources
  • Output Formats: 8+ formats (JSON, SQLite, RDF, TypeScript, etc.)
  • Database Size: 133 MB (SQLite)
  • Current Version: 0.1.2

Key Features

Multi-Source Integration

Combines data from 10 leading food and LCA standards, including FoodEx2, Hestia, ecoinvent, and AGROVOC. See the complete Data Sources documentation for term counts and coverage details.

Advanced Semantic Mapping

  • AI-Powered Matching - OpenAI and Google AI integration for intelligent term matching
  • 4-Stage Cascade - Contextual, exact, synonym, and embedding-based matching
  • Quality Validation - Confidence scoring and match quality analysis
  • Interactive Debugging - Real-time match visualization and debugging tools
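
The cascade idea can be illustrated with a minimal Python sketch. This is not the glossary's actual matcher: the `TARGETS` vocabulary, the `Match` type, the `match_term` function, and the confidence values are all hypothetical, and plain string similarity (`difflib.SequenceMatcher`) stands in for the embedding stage so the example runs without an AI provider. Stages are tried in order, and the first one that produces a hit wins.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Match:
    term_id: str
    stage: str         # which cascade stage produced the match
    confidence: float  # illustrative score in [0, 1], not the project's scoring


# Hypothetical target vocabulary: id -> (label, category, synonyms)
TARGETS = {
    "T1": ("wheat flour", "ingredient", ["flour, wheat"]),
    "T2": ("methane emission", "emission", ["CH4 emission"]),
    "T3": ("wheat", "crop", ["common wheat"]),
}


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons are case-insensitive."""
    return " ".join(text.lower().split())


def match_term(label: str, category: Optional[str] = None,
               threshold: float = 0.8) -> Optional[Match]:
    query = _norm(label)

    # Stage 1: contextual. Exact label match restricted to the same category.
    if category is not None:
        for tid, (name, cat, _) in TARGETS.items():
            if cat == category and _norm(name) == query:
                return Match(tid, "contextual", 1.0)

    # Stage 2: exact label match, ignoring context.
    for tid, (name, _, _) in TARGETS.items():
        if _norm(name) == query:
            return Match(tid, "exact", 0.95)

    # Stage 3: synonym match.
    for tid, (_, _, synonyms) in TARGETS.items():
        if query in (_norm(s) for s in synonyms):
            return Match(tid, "synonym", 0.9)

    # Stage 4: similarity fallback. String similarity is a stand-in here
    # for cosine similarity between embedding vectors.
    best_id, best_score = None, 0.0
    for tid, (name, _, _) in TARGETS.items():
        score = SequenceMatcher(None, query, _norm(name)).ratio()
        if score > best_score:
            best_id, best_score = tid, score
    if best_id is not None and best_score >= threshold:
        return Match(best_id, "embedding", best_score)
    return None
```

Calling `match_term("CH4 emission")` falls through the first two stages and resolves at the synonym stage, while an unrelated string such as `"zzz"` returns `None` because no stage clears the threshold. Recording the stage alongside the score is one way to support the kind of match-quality analysis described above.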

Multi-Format Export

Generate data in multiple formats for different use cases:

  • SQLite Database - Optimized for queries and relationships (133 MB)
  • JSON/JSON-LD - Web applications and semantic web integration (189 MB)
  • LinkML YAML - Native format with full semantic annotations (157 MB)
  • TypeScript Types - Type-safe integration for JavaScript/TypeScript
  • RDF/OWL - Semantic web ontologies
  • SQL DDL - Database schema definitions
  • CSV/Excel - Data analysis and spreadsheet applications

LinkML-First Architecture

Built on LinkML (Linked Data Modeling Language) as the primary schema definition:

Raw Data → Parser → LinkML YAML → Validation → Multi-Format Generation
    ├── JSON/JSON-LD
    ├── TypeScript Types
    ├── RDF/OWL Ontologies
    ├── SQL DDL Schemas
    └── SQLite Database

Benefits:

  • Semantic web native with built-in RDF, JSON-LD, and SKOS support
  • FAIR data principles (Findable, Accessible, Interoperable, Reusable)
  • Enhanced validation with pattern matching and conditional rules
  • Single source of truth for all output formats

Use Cases

Environmental Impact Assessment

Perform comprehensive LCA studies with standardized terminology across multiple food classification systems.

Supply Chain Analysis

Map ingredients and processes from different standards to analyze sustainability across the entire supply chain.

Food Product Classification

Standardize product descriptions using unified terminology from multiple authoritative sources.

Research and Academic Studies

Access comprehensive food and LCA vocabulary with semantic relationships for academic research.

Software Integration

Integrate with existing LCA tools using multiple export formats and type-safe interfaces.

Semantic Web Applications

Build linked data applications using JSON-LD, RDF, and SKOS vocabularies.

Quick Start

Download Pre-Built Data

The glossary is available in multiple formats:

# SQLite Database (recommended for queries)
wget https://esfc-glossary-ec2bc9.gitlab.io/downloads/glossary.db

# JSON Format (web applications)
wget https://esfc-glossary-ec2bc9.gitlab.io/downloads/glossary.json

# LinkML YAML (native format)
wget https://esfc-glossary-ec2bc9.gitlab.io/downloads/glossary.yaml

# TypeScript Types
wget https://esfc-glossary-ec2bc9.gitlab.io/downloads/glossary.types.ts

TypeScript/JavaScript Integration

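import type { Glossary } from './glossary.types'

// Load glossary data (adjust the path to wherever glossary.json is served)
const response = await fetch('/glossary.json')
if (!response.ok) throw new Error(`Failed to load glossary: ${response.status}`)
const glossary: Glossary = await response.json()

// Filter terms by source
const hestiaTerms = glossary.terms.filter(t => t.source === 'hestia')
console.log(`Found ${hestiaTerms.length} Hestia terms`)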

Python Integration

from linkml_runtime.loaders import yaml_loader
# glossary_model is assumed to be the Python module generated from the LinkML schema
from glossary_model import Glossary

# Load glossary
glossary = yaml_loader.load('glossary.yaml', target_class=Glossary)

# Query terms
sources = {t.source for t in glossary.terms}
print(f"Loaded {len(glossary.terms)} terms from {len(sources)} sources")
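
Where installing `linkml_runtime` is not an option, the JSON export can be read with the standard library alone. The sketch below assumes the JSON file has a top-level `terms` array whose entries carry a `source` field, as the TypeScript example implies; the `id` and `name` fields and the `count_terms_by_source` helper are hypothetical, and an inline sample stands in for the real file.

```python
import json
from collections import Counter


def count_terms_by_source(glossary: dict) -> Counter:
    """Count terms per source, assuming a top-level 'terms' list."""
    return Counter(
        term.get("source", "unknown") for term in glossary.get("terms", [])
    )


# Inline sample standing in for the downloaded glossary.json.
# With the real file: glossary = json.load(open("glossary.json", encoding="utf-8"))
sample = json.loads("""
{"terms": [
  {"id": "a", "name": "Wheat", "source": "foodex2"},
  {"id": "b", "name": "CH4, to air", "source": "hestia"},
  {"id": "c", "name": "N2O, to air", "source": "hestia"}
]}
""")

counts = count_terms_by_source(sample)
for source, n in counts.most_common():
    print(f"{source}: {n}")
```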

SQL Queries

-- SQLite database queries
SELECT * FROM terms
WHERE source = 'hestia'
AND category LIKE '%emission%'
LIMIT 10;

-- Get term counts by source
SELECT source, COUNT(*) as term_count
FROM terms
GROUP BY source
ORDER BY term_count DESC;
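
The same queries can be run from Python with the built-in `sqlite3` module. This is a sketch under stated assumptions: the `terms` table and its `source` and `category` columns come from the queries above, while the `id` and `name` columns, the helper function names, and the in-memory stand-in database are hypothetical. Against the real download, `sqlite3.connect("glossary.db")` would replace the stand-in.

```python
import sqlite3


def term_counts_by_source(conn: sqlite3.Connection) -> list:
    """Return (source, term_count) rows, largest source first."""
    return conn.execute(
        "SELECT source, COUNT(*) AS term_count "
        "FROM terms GROUP BY source ORDER BY term_count DESC"
    ).fetchall()


def terms_in_category(conn: sqlite3.Connection, source: str,
                      keyword: str, limit: int = 10) -> list:
    """Return terms from one source whose category contains the keyword."""
    # Bound parameters avoid building SQL from user-supplied strings.
    return conn.execute(
        "SELECT * FROM terms WHERE source = ? AND category LIKE ? LIMIT ?",
        (source, f"%{keyword}%", limit),
    ).fetchall()


# Stand-in database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (id TEXT, name TEXT, source TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO terms VALUES (?, ?, ?, ?)",
    [
        ("1", "CH4, to air", "hestia", "emission"),
        ("2", "N2O, to air", "hestia", "emission"),
        ("3", "Wheat", "foodex2", "food"),
    ],
)

for source, count in term_counts_by_source(conn):
    print(source, count)
```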

RDF/SPARQL Queries

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?term ?label ?source WHERE {
  ?term skos:prefLabel ?label ;
        dcterms:source ?source .
  FILTER(CONTAINS(LCASE(?label), "emission"))
}
LIMIT 10

Architecture

Project Structure

esfc-glossary/
├── sources/                  # Source data from providers
│   ├── foodex2/              # FoodEx2 Excel files
│   ├── hestia/               # Live API integration
│   ├── ecoinvent/            # CSV/JSON data
│   ├── agrovoc/              # FAO thesaurus
│   └── ...
├── schema/                   # LinkML schema definitions
│   └── glossary.linkml.yaml
├── scripts/                  # Data processing pipeline
│   ├── *-parser-yaml.js
│   └── build-glossary-linkml.js
├── output/                   # Generated output files
│   ├── glossary.db           # SQLite database
│   ├── glossary.json         # JSON format
│   └── glossary.yaml         # LinkML YAML
└── website/                  # React 19 + Vite application
    └── public/               # Static assets

Data Pipeline

  1. Fetch - Retrieve data from live APIs and static sources
  2. Parse - Convert to LinkML YAML format with semantic annotations
  3. Validate - Validate against LinkML schema
  4. Build - Merge all sources into unified glossary
  5. Generate - Export to multiple formats (JSON, SQLite, TypeScript, RDF)
  6. Deploy - Publish to web application and download endpoints
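
The validate, build, and generate steps above can be sketched in a few lines of Python. This is an illustration only: the project's pipeline is implemented as JavaScript scripts driven by the LinkML schema, and the `ID_PATTERN` rule, the required-field list, the sample identifiers, and the function names below are all hypothetical.

```python
import json
import re
import sqlite3

# Hypothetical identifier rule of the form "source:localId".
ID_PATTERN = re.compile(r"^[a-z0-9]+:[A-Za-z0-9_.-]+$")
REQUIRED = ("id", "name", "source")


def validate(term: dict) -> list:
    """Return a list of error messages; an empty list means the term is valid."""
    errors = [f"missing {field}" for field in REQUIRED if not term.get(field)]
    if term.get("id") and not ID_PATTERN.match(term["id"]):
        errors.append(f"bad id: {term['id']}")
    return errors


def build(parsed_sources: dict) -> list:
    """Merge per-source term lists, dropping invalid terms and duplicate ids."""
    merged = {}
    for source, terms in parsed_sources.items():
        for term in terms:
            term = {**term, "source": source}
            if not validate(term):
                merged.setdefault(term["id"], term)
    return list(merged.values())


def generate(terms: list, conn: sqlite3.Connection) -> str:
    """Write terms to SQLite and return the same data as a JSON string."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS terms (id TEXT PRIMARY KEY, name TEXT, source TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO terms VALUES (:id, :name, :source)", terms
    )
    return json.dumps({"terms": terms}, indent=2)


# Made-up parsed output for two sources; one term has an invalid id.
parsed = {
    "hestia": [{"id": "hestia:ch4ToAir", "name": "CH4, to air"}],
    "foodex2": [
        {"id": "foodex2:A000J", "name": "Wheat"},
        {"id": "no colon", "name": "Broken"},
    ],
}
terms = build(parsed)
conn = sqlite3.connect(":memory:")
payload = generate(terms, conn)
print(f"{len(terms)} valid terms exported")
```

Because both outputs are generated from the same validated list, the JSON and SQLite exports cannot drift apart, which is the practical meaning of a single source of truth.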

Contributing

The LCA Food Glossary is an open project. Contributions are welcome:

  • Report Issues - Submit bug reports and feature requests
  • Add Data Sources - Integrate new food or LCA vocabularies
  • Improve Mappings - Enhance semantic relationships between terms
  • Update Documentation - Help improve this documentation

Version History

Version 0.1.2 (Current)

  • Total Terms: 168,626
  • Build: 6
  • Last Updated: December 8, 2025
  • Features:
    • LinkML-first architecture
    • Live Hestia API integration (36,044 terms)
    • AI-powered semantic matching
    • Multi-format export (8+ formats)
    • Enhanced web interface with SQL query support

License

The LCA Food Glossary is licensed under the MIT License. Individual data sources may have their own licenses.

Acknowledgments

  • EFSA - FoodEx2 food classification system
  • Hestia Project - Food LCA database and API
  • ecoinvent Association - Life Cycle Inventory database
  • FAO - AGROvoc agricultural thesaurus
  • GS1 - Global packaging vocabulary
  • UN Statistics Division - CPC commodity codes and UNECE packaging codes
