LCA Food Glossary
A unified glossary system for Life Cycle Assessment in food, integrating multiple standards with advanced semantic mapping capabilities. Built on a LinkML-first architecture for semantic web compatibility and multi-format data generation.
This glossary covers food classification standards and LCA databases used in scientific research.
For EOS platform terminology (system concepts, API terms, calculation methods), see the EOS Glossary.
Overview
The LCA Food Glossary uses AI-powered semantic mapping to link terms across food classification systems and LCA databases, so that researchers, practitioners, and organizations can assess the environmental impact of food products and supply chains with consistent terminology.
Key Statistics
- Total Terms: 168,626
- Data Sources: 10 integrated sources
- Output Formats: 8+ formats (JSON, SQLite, RDF, TypeScript, etc.)
- Database Size: 133 MB (SQLite)
- Current Version: 0.1.2
Key Features
Multi-Source Integration
Combines data from 10 food and LCA standards, including FoodEx2, Hestia, ecoinvent, and AGROVOC. See the complete Data Sources documentation for term counts and coverage details.
Advanced Semantic Mapping
- AI-Powered Matching - OpenAI and Google AI integration for intelligent term matching
- 4-Stage Cascade - Contextual, exact, synonym, and embedding-based matching
- Quality Validation - Confidence scoring and match quality analysis
- Interactive Debugging - Real-time match visualization and debugging tools
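One way to picture the 4-stage cascade is as a chain of matchers tried in order, where the first stage that succeeds returns a match together with a confidence score. The following is a minimal sketch under stated assumptions, not the project's implementation: the `Term` and `Match` shapes, the fixed confidence values, the `threshold` parameter, and the token-overlap similarity (a stand-in for real embedding similarity from an AI provider) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    # Hypothetical record shape; the real glossary schema may differ.
    id: str
    label: str
    synonyms: list[str] = field(default_factory=list)
    context: Optional[str] = None  # e.g. a category hint


@dataclass
class Match:
    term: Term
    stage: str
    confidence: float  # illustrative values, not the project's scores


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def _token_overlap(a: str, b: str) -> float:
    """Jaccard similarity on tokens; a stand-in for embedding similarity."""
    ta, tb = set(_norm(a).split()), set(_norm(b).split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def match(query: str, candidates: list[Term], context: Optional[str] = None,
          threshold: float = 0.5) -> Optional[Match]:
    q = _norm(query)
    # Stage 1: contextual - same label within the requested context.
    if context is not None:
        for t in candidates:
            if t.context == context and _norm(t.label) == q:
                return Match(t, "contextual", 1.0)
    # Stage 2: exact label match, ignoring context.
    for t in candidates:
        if _norm(t.label) == q:
            return Match(t, "exact", 0.95)
    # Stage 3: synonym match.
    for t in candidates:
        if q in (_norm(s) for s in t.synonyms):
            return Match(t, "synonym", 0.85)
    # Stage 4: similarity fallback (embeddings in a real system).
    best = max(candidates, key=lambda t: _token_overlap(query, t.label), default=None)
    if best is not None:
        score = _token_overlap(query, best.label)
        if score >= threshold:
            return Match(best, "embedding", score)
    return None
```

Returning the stage name alongside the score keeps each match explainable, which is the kind of information a match-debugging view would need. In this sketch, a query that only matches a term's synonym list is reported with stage `"synonym"`, and a query below the threshold in every stage yields `None`.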
Multi-Format Export
Generate data in multiple formats for different use cases:
- SQLite Database - Optimized for queries and relationships (133 MB)
- JSON/JSON-LD - Web applications and semantic web integration (189 MB)
- LinkML YAML - Native format with full semantic annotations (157 MB)
- TypeScript Types - Type-safe integration for JavaScript/TypeScript
- RDF/OWL - Semantic web ontologies
- SQL DDL - Database schema definitions
- CSV/Excel - Data analysis and spreadsheet applications
LinkML-First Architecture
Built on LinkML (Linked Data Modeling Language) as the primary schema definition:
Raw Data → Parser → LinkML YAML → Validation → Multi-Format Generation
                                               ├── JSON/JSON-LD
                                               ├── TypeScript Types
                                               ├── RDF/OWL Ontologies
                                               ├── SQL DDL Schemas
                                               └── SQLite Database
Benefits:
- Semantic web native with built-in RDF, JSON-LD, and SKOS support
- FAIR data principles (Findable, Accessible, Interoperable, Reusable)
- Enhanced validation with pattern matching and conditional rules
- Single source of truth for all output formats
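The "single source of truth" idea can be illustrated with a toy example in which one slot definition drives several outputs. This is only a sketch: the project itself generates its formats through LinkML tooling, and the `TERM_SLOTS` mapping, the helper names, and the three-field term shape below are hypothetical.

```python
import json
import sqlite3

# Hypothetical slot list standing in for a LinkML class definition.
TERM_SLOTS = {
    "id": "TEXT PRIMARY KEY",
    "label": "TEXT NOT NULL",
    "source": "TEXT NOT NULL",
}

SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"


def to_jsonld(terms: list[dict]) -> str:
    """Render terms as a JSON-LD document with SKOS / Dublin Core mappings."""
    context = {"id": "@id", "label": SKOS + "prefLabel", "source": DCTERMS + "source"}
    return json.dumps({"@context": context, "@graph": terms}, indent=2)


def to_sql_ddl(table: str = "terms") -> str:
    """Render the slot list as a CREATE TABLE statement."""
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in TERM_SLOTS.items())
    return f"CREATE TABLE {table} ({cols});"


def to_sqlite(terms: list[dict]) -> sqlite3.Connection:
    """Build an in-memory SQLite database from the same definition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(to_sql_ddl())
    conn.executemany(
        "INSERT INTO terms (id, label, source) VALUES (:id, :label, :source)", terms
    )
    conn.commit()
    return conn
```

Because every output is derived from the same definition, adding a slot in one place changes the JSON-LD context, the DDL, and the database together, which is the property the LinkML-first design aims for.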
Use Cases
Environmental Impact Assessment
Perform comprehensive LCA studies with standardized terminology across multiple food classification systems.
Supply Chain Analysis
Map ingredients and processes from different standards to analyze sustainability across the entire supply chain.
Food Product Classification
Standardize product descriptions using unified terminology from multiple authoritative sources.
Research and Academic Studies
Access comprehensive food and LCA vocabulary with semantic relationships for academic research.
Software Integration
Integrate with existing LCA tools using multiple export formats and type-safe interfaces.
Semantic Web Applications
Build linked data applications using JSON-LD, RDF, and SKOS vocabularies.
Quick Start
Download Pre-Built Data
The glossary is available in multiple formats:
# SQLite Database (recommended for queries)
wget https://esfc-glossary-ec2bc9.gitlab.io/downloads/glossary.db
# JSON Format (web applications)
wget https://esfc-glossary-ec2bc9.gitlab.io/downloads/glossary.json
# LinkML YAML (native format)
wget https://esfc-glossary-ec2bc9.gitlab.io/downloads/glossary.yaml
# TypeScript Types
wget https://esfc-glossary-ec2bc9.gitlab.io/downloads/glossary.types.ts
TypeScript/JavaScript Integration
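import type { Term, Glossary } from './glossary.types'
// Load glossary data (assumes glossary.json is served from the site root)
const response = await fetch('/glossary.json')
if (!response.ok) throw new Error(`Failed to load glossary: ${response.status}`)
const glossary: Glossary = await response.json()
// Filter terms by source
const hestiaTerms = glossary.terms.filter(t => t.source === 'hestia')
console.log(`Found ${hestiaTerms.length} Hestia terms`)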
Python Integration
from linkml_runtime.loaders import yaml_loader
# glossary_model is assumed to be the Python module generated from the LinkML schema
from glossary_model import Glossary
# Load the glossary
glossary = yaml_loader.load('glossary.yaml', target_class=Glossary)
# Count the distinct sources
sources = {t.source for t in glossary.terms}
print(f"Loaded {len(glossary.terms)} terms from {len(sources)} sources")
SQL Queries
-- SQLite database queries
SELECT * FROM terms
WHERE source = 'hestia'
AND category LIKE '%emission%'
LIMIT 10;
-- Get term counts by source
SELECT source, COUNT(*) as term_count
FROM terms
GROUP BY source
ORDER BY term_count DESC;
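The same queries can be run from Python with the standard-library `sqlite3` module. The sketch below assumes the downloaded database exposes the `terms` table with the `source` and `category` columns used in the queries above; the helper names are hypothetical.

```python
import sqlite3


def term_counts_by_source(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (source, term_count) pairs, largest count first."""
    rows = conn.execute(
        "SELECT source, COUNT(*) AS term_count "
        "FROM terms GROUP BY source ORDER BY term_count DESC"
    )
    return [(source, count) for source, count in rows]


def find_terms(conn: sqlite3.Connection, source: str,
               category_fragment: str, limit: int = 10) -> list[dict]:
    """Parameterized form of the filter query; returns rows as dicts."""
    cur = conn.execute(
        "SELECT * FROM terms WHERE source = ? AND category LIKE ? LIMIT ?",
        (source, f"%{category_fragment}%", limit),
    )
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur]
```

With the database downloaded, `find_terms(sqlite3.connect("glossary.db"), "hestia", "emission")` would correspond to the first query. Passing values as parameters rather than formatting them into the SQL string avoids quoting problems when the search text comes from user input.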
RDF/SPARQL Queries
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?term ?label ?source WHERE {
  ?term skos:prefLabel ?label ;
        dcterms:source ?source .
FILTER(CONTAINS(LCASE(?label), "emission"))
}
LIMIT 10
Documentation
Reference Documentation
- Data Sources - Complete list of 10 integrated sources with term counts
- FoodEx2 Reference - EFSA food classification system
- Hestia Reference - Food LCA database
- Ecoinvent Reference - Life Cycle Inventory database
- Eaternity Schema - EOS schema classes and properties
Technical Documentation
- Semantic Mapping - AI-powered term matching strategies
- Data Formats - Export formats and integration examples
Architecture
Project Structure
esfc-glossary/
├── sources/ # Source data from providers
│ ├── foodex2/ # FoodEx2 Excel files
│ ├── hestia/ # Live API integration
│ ├── ecoinvent/ # CSV/JSON data
│ ├── agrovoc/ # FAO thesaurus
│ └── ...
├── schema/ # LinkML schema definitions
│ └── glossary.linkml.yaml
├── scripts/ # Data processing pipeline
│ ├── *-parser-yaml.js
│ └── build-glossary-linkml.js
├── output/ # Generated output files
│ ├── glossary.db # SQLite database
│ ├── glossary.json # JSON format
│ └── glossary.yaml # LinkML YAML
└── website/ # React 19 + Vite application
    └── public/ # Static assets
Data Pipeline
- Fetch - Retrieve data from live APIs and static source files
- Parse - Convert to LinkML YAML format with semantic annotations
- Validate - Validate against LinkML schema
- Build - Merge all sources into unified glossary
- Generate - Export to multiple formats (JSON, SQLite, TypeScript, RDF)
- Deploy - Publish to web application and download endpoints
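The middle stages of this pipeline compose naturally as functions over lists of term records. The toy sketch below covers Parse, Validate, Build, and Generate with inline data; Fetch and Deploy are left out. It is an illustration only: the actual pipeline lives in the JavaScript files under scripts/ and validates against the LinkML schema, while the function names, the record fields, and the `source:localId` ID rule here are hypothetical.

```python
import csv
import io
import json
import re

# Hypothetical ID rule of the form "source:localId".
ID_PATTERN = re.compile(r"^[a-z0-9_]+:[A-Za-z0-9_.-]+$")


def parse(source: str, raw_rows: list[dict]) -> list[dict]:
    """Parse: normalize provider rows into a common term shape."""
    return [
        {"id": f"{source}:{row['code']}", "label": row["name"].strip(), "source": source}
        for row in raw_rows
    ]


def validate(terms: list[dict]) -> list[dict]:
    """Validate: reject records with a malformed ID or an empty label."""
    for term in terms:
        if not ID_PATTERN.match(term["id"]) or not term["label"]:
            raise ValueError(f"invalid term: {term!r}")
    return terms


def build(*term_lists: list[dict]) -> dict:
    """Build: merge all sources, keeping the first record seen for each ID."""
    merged: dict[str, dict] = {}
    for terms in term_lists:
        for term in terms:
            merged.setdefault(term["id"], term)
    return {"terms": list(merged.values())}


def generate(glossary: dict) -> dict[str, str]:
    """Generate: render the merged glossary as JSON and CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "label", "source"])
    writer.writeheader()
    writer.writerows(glossary["terms"])
    return {"glossary.json": json.dumps(glossary, indent=2), "glossary.csv": buf.getvalue()}
```

Running validation per source before the merge means a malformed record from one provider is caught with its source still identifiable, instead of surfacing later in a generated file.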
Contributing
The LCA Food Glossary is an open project. Contributions are welcome:
- Report Issues - Submit bug reports and feature requests
- Add Data Sources - Integrate new food or LCA vocabularies
- Improve Mappings - Enhance semantic relationships between terms
- Update Documentation - Help improve this documentation
Version History
Version 0.1.2 (Current)
- Total Terms: 168,626
- Build: 6
- Last Updated: December 8, 2025
- Features:
  - LinkML-first architecture
  - Live Hestia API integration (36,044 terms)
  - AI-powered semantic matching
  - Multi-format export (8+ formats)
  - Enhanced web interface with SQL query support
License
The LCA Food Glossary is licensed under the MIT License. Individual data sources may have their own licenses.
Acknowledgments
- EFSA - FoodEx2 food classification system
- Hestia Project - Food LCA database and API
- ecoinvent Association - Life Cycle Inventory database
- FAO - AGROVOC agricultural thesaurus
- GS1 - Global packaging vocabulary
- UN Statistics Division - CPC commodity codes and UNECE packaging codes
Support
For questions, issues, or contributions:
- Documentation: esfc-glossary-ec2bc9.gitlab.io
- Repository: GitLab (private)
- Contact: Eaternity team
Next Steps
- Explore Data Sources - Learn about all 10 integrated sources
- Download Formats - Get the glossary in your preferred format
- Semantic Mapping - Understand the AI-powered matching system