Skip to main content

Data Formats

The ESFC Glossary is available in 8+ formats to support different use cases, from web applications to semantic web integration, database queries, and type-safe programming.

Overview

All formats are generated from a single LinkML schema, ensuring consistency across outputs while optimizing each format for its specific use case.

Available Formats:

  • SQLite Database (133 MB)
  • JSON (189 MB)
  • LinkML YAML (157 MB)
  • JSON-LD (semantic web)
  • TypeScript Types
  • RDF/OWL Ontologies
  • SQL DDL Schemas
  • CSV/Excel

Primary Formats

SQLite Database

File: glossary.db Size: 133 MB Use Case: Queries, relationships, application integration

The SQLite database provides optimized storage and fast queries for the complete glossary.

Database Schema

-- Main terms table
CREATE TABLE terms (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
source TEXT NOT NULL,
category TEXT,
properties JSON,
external_mappings JSON,
parent_terms JSON,
metadata JSON,
status TEXT DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Indexes for performance
CREATE INDEX idx_source ON terms(source);
CREATE INDEX idx_category ON terms(category);
CREATE INDEX idx_name ON terms(name);
CREATE INDEX idx_status ON terms(status);

-- Full-text search
CREATE VIRTUAL TABLE terms_fts USING fts5(
id, name, description, category,
content=terms
);

-- Relationships table
CREATE TABLE relationships (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_term_id TEXT NOT NULL,
target_term_id TEXT NOT NULL,
relationship_type TEXT NOT NULL,
confidence REAL,
method TEXT,
FOREIGN KEY (source_term_id) REFERENCES terms(id),
FOREIGN KEY (target_term_id) REFERENCES terms(id)
);

CREATE INDEX idx_relationships_source ON relationships(source_term_id);
CREATE INDEX idx_relationships_target ON relationships(target_term_id);

Query Examples

Basic Queries:

-- Count terms by source
SELECT source, COUNT(*) as term_count
FROM terms
GROUP BY source
ORDER BY term_count DESC;

-- Find all wheat-related terms
SELECT id, name, source, category
FROM terms
WHERE name LIKE '%wheat%'
ORDER BY source, name;

-- Full-text search
SELECT t.id, t.name, t.source, t.category
FROM terms_fts fts
JOIN terms t ON fts.id = t.id
WHERE terms_fts MATCH 'carbon emission'
LIMIT 20;

Advanced Queries:

-- Find terms with relationships
SELECT
t1.id as source_id,
t1.name as source_name,
r.relationship_type,
t2.name as target_name,
t2.source as target_source,
r.confidence
FROM terms t1
JOIN relationships r ON t1.id = r.source_term_id
JOIN terms t2 ON t2.id = r.target_term_id
WHERE t1.source = 'foodex2'
AND t2.source = 'hestia'
ORDER BY r.confidence DESC
LIMIT 10;

-- Hierarchical queries
WITH RECURSIVE hierarchy AS (
SELECT id, name, parent_terms, 1 as level
FROM terms
WHERE id = 'foodex2-A0101'

UNION ALL

SELECT t.id, t.name, t.parent_terms, h.level + 1
FROM terms t
JOIN hierarchy h
WHERE json_extract(t.parent_terms, '$[0]') = h.id
)
SELECT * FROM hierarchy;

Integration Examples

Python:

import sqlite3

# Connect to database
conn = sqlite3.connect('glossary.db')
cursor = conn.cursor()

# Query terms
cursor.execute('''
SELECT id, name, source, category
FROM terms
WHERE source = ?
LIMIT 10
''', ('hestia',))

for row in cursor.fetchall():
print(f"{row[0]}: {row[1]} ({row[2]})")

conn.close()

Node.js:

import Database from 'better-sqlite3'

const db = new Database('glossary.db')

// Prepared statement
const stmt = db.prepare(`
SELECT id, name, source, category
FROM terms
WHERE source = ? AND category LIKE ?
`)

const results = stmt.all('hestia', '%Emission%')
console.log(`Found ${results.length} emission terms`)

JSON Format

File: glossary.json Size: 189 MB Use Case: Web applications, JavaScript/TypeScript integration

Complete glossary data in JSON format with full term details and metadata.

Structure

{
"metadata": {
"version": "0.1.2",
"build": 6,
"lastUpdated": "2025-12-08T02:54:36.996Z",
"totalTerms": 168626,
"sources": {
"foodex2": 31601,
"hestia": 36044,
"ecoinvent": 33784,
"agrovoc": 41447,
"langual": 12836,
"cpc": 4583,
"sentier": 7731,
"unece": 406,
"gs1": 154,
"eaternity": 40
}
},
"terms": [
{
"@type": "Term",
"id": "foodex2-A010101",
"name": "Common wheat",
"description": "Triticum aestivum, bread wheat",
"source": "foodex2",
"category": "Grains",
"properties": {
"hierarchyCode": "A010101",
"scientificName": "Triticum aestivum",
"level": 4
},
"external_mappings": [
{
"externalId": "hestia-crop-wheat",
"externalSource": "hestia",
"mappingType": "related"
}
],
"parent_terms": ["foodex2-A0101"],
"metadata": {
"searchable": true,
"verified": true
},
"status": "active"
}
]
}

Usage Examples

JavaScript/TypeScript:

// Load glossary
const glossary = await fetch('/glossary.json')
.then(r => r.json())

// Filter by source
const hestiaTerms = glossary.terms.filter(t =>
t.source === 'hestia'
)

// Search by name
const searchResults = glossary.terms.filter(t =>
t.name.toLowerCase().includes('wheat')
)

// Group by category
const byCategory = glossary.terms.reduce((acc, term) => {
const cat = term.category || 'Uncategorized'
if (!acc[cat]) acc[cat] = []
acc[cat].push(term)
return acc
}, {})

Python:

import json

with open('glossary.json') as f:
glossary = json.load(f)

# Access metadata
print(f"Version: {glossary['metadata']['version']}")
print(f"Total terms: {glossary['metadata']['totalTerms']}")

# Filter terms
hestia_terms = [
t for t in glossary['terms']
if t['source'] == 'hestia'
]

# Search
wheat_terms = [
t for t in glossary['terms']
if 'wheat' in t['name'].lower()
]

LinkML YAML

File: glossary.yaml Size: 157 MB Use Case: Semantic web, research, data validation

Native LinkML format with full semantic annotations and relationships.

Structure

terms:
- '@type': Term
id: foodex2-A010101
name: Common wheat
description: Triticum aestivum, bread wheat
source: foodex2
category: Grains
properties:
hierarchyCode: A010101
scientificName: Triticum aestivum
level: 4
external_mappings:
- externalId: hestia-crop-wheat
externalSource: hestia
mappingType: related
parent_terms:
- foodex2-A0101
metadata:
searchable: true
verified: true
status: active

Usage with LinkML

Python with LinkML Runtime:

from linkml_runtime.loaders import yaml_loader
from glossary_model import Glossary, Term

# Load glossary
glossary = yaml_loader.load('glossary.yaml', target_class=Glossary)

# Access terms
print(f"Loaded {len(glossary.terms)} terms")

# Filter by source
hestia_terms = [t for t in glossary.terms if t.source == 'hestia']

# Validate against schema
from linkml_runtime.utils.schemaview import SchemaView

schema = SchemaView('schema/glossary.linkml.yaml')
for term in glossary.terms[:10]:
schema.validate_object(term, target_class='Term')

JSON-LD (Semantic Web)

File: glossary.jsonld Size: ~200 MB Use Case: Semantic web, RDF integration, linked data

JSON-LD format with semantic web context for RDF/SPARQL integration.

Structure

{
"@context": {
"@vocab": "http://esfc-glossary.org/vocab/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"dc": "http://purl.org/dc/terms/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"Term": "skos:Concept",
"name": "skos:prefLabel",
"description": "skos:definition",
"source": "dc:source",
"category": "skos:inScheme",
"parent_terms": "skos:broader",
"external_mappings": {
"@id": "skos:relatedMatch",
"@container": "@set"
}
},
"@graph": [
{
"@type": "Term",
"@id": "foodex2:A010101",
"name": "Common wheat",
"description": "Triticum aestivum, bread wheat",
"source": "foodex2",
"category": "Grains",
"parent_terms": ["foodex2:A0101"],
"external_mappings": [
{
"@id": "hestia:crop-wheat",
"mappingType": "related"
}
]
}
]
}

SPARQL Queries

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dc: <http://purl.org/dc/terms/>

# Find all wheat terms
SELECT ?term ?label ?source WHERE {
?term skos:prefLabel ?label ;
dc:source ?source .
FILTER(CONTAINS(LCASE(?label), "wheat"))
}
LIMIT 10

# Find related terms
SELECT ?source ?target ?type WHERE {
?source skos:relatedMatch ?target .
?source dc:source "foodex2" .
?target dc:source "hestia" .
}

Generated Formats

TypeScript Types

File: glossary.types.ts Size: ~500 KB Use Case: Type-safe TypeScript/JavaScript integration

Generated TypeScript type definitions for the glossary schema.

Generated Types

/**
* Main glossary interface
*/
export interface Glossary {
metadata: GlossaryMetadata
terms: Term[]
}

/**
* Metadata about the glossary
*/
export interface GlossaryMetadata {
version: string
build: number
lastUpdated: string
totalTerms: number
sources: Record<string, number>
}

/**
* Individual glossary term
*/
export interface Term {
'@type': 'Term'
id: string
name: string
description?: string
source: GlossarySource
category?: string
properties?: Record<string, any>
external_mappings?: ExternalMapping[]
parent_terms?: string[]
metadata?: Record<string, any>
status: TermStatus
}

/**
* Glossary sources
*/
export type GlossarySource =
| 'foodex2'
| 'hestia'
| 'ecoinvent'
| 'agrovoc'
| 'langual'
| 'cpc'
| 'sentier'
| 'unece'
| 'gs1'
| 'eaternity'

/**
* External mapping to other vocabularies
*/
export interface ExternalMapping {
externalId: string
externalSource: string
mappingType: 'exact' | 'related' | 'broader' | 'narrower'
confidence?: number
}

/**
* Term status
*/
export type TermStatus = 'active' | 'deprecated' | 'obsolete'

Usage

import { Glossary, Term, GlossarySource } from './glossary.types'

async function loadGlossary(): Promise<Glossary> {
const response = await fetch('/glossary.json')
return response.json()
}

function filterBySource(
terms: Term[],
source: GlossarySource
): Term[] {
return terms.filter(t => t.source === source)
}

// Type-safe usage
const glossary = await loadGlossary()
const hestiaTerms = filterBySource(glossary.terms, 'hestia')

// TypeScript ensures type safety
console.log(`Found ${hestiaTerms.length} Hestia terms`)

SQL DDL Schema

File: glossary.sql Size: ~50 KB Use Case: Database schema creation, PostgreSQL/MySQL setup

SQL schema definition for creating database tables.

Generated Schema

-- Terms table
CREATE TABLE terms (
id VARCHAR(255) PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
source VARCHAR(50) NOT NULL,
category VARCHAR(255),
properties JSONB,
external_mappings JSONB,
parent_terms JSONB,
metadata JSONB,
status VARCHAR(50) DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Indexes
CREATE INDEX idx_terms_source ON terms(source);
CREATE INDEX idx_terms_category ON terms(category);
CREATE INDEX idx_terms_name ON terms USING gin(to_tsvector('english', name));
CREATE INDEX idx_terms_properties ON terms USING gin(properties);

-- Full-text search (PostgreSQL)
CREATE INDEX idx_terms_fts ON terms
USING gin(to_tsvector('english', coalesce(name, '') || ' ' || coalesce(description, '')));

-- Materialized view for source statistics
CREATE MATERIALIZED VIEW source_statistics AS
SELECT
source,
COUNT(*) as term_count,
COUNT(DISTINCT category) as category_count,
MIN(created_at) as first_added,
MAX(updated_at) as last_updated
FROM terms
GROUP BY source;

RDF/OWL Ontology

File: glossary.owl Size: ~250 MB Use Case: Semantic web applications, ontology reasoning

OWL ontology for semantic web reasoning and inference.

Ontology Structure

<?xml version="1.0"?>
<rdf:RDF xmlns="http://esfc-glossary.org/ontology#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#">

<owl:Ontology rdf:about="http://esfc-glossary.org/ontology">
<rdfs:label>ESFC Glossary Ontology</rdfs:label>
<rdfs:comment>
Unified food and LCA glossary ontology
</rdfs:comment>
</owl:Ontology>

<!-- Classes -->
<owl:Class rdf:about="http://esfc-glossary.org/ontology#Term">
<rdfs:label>Term</rdfs:label>
<rdfs:subClassOf rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
</owl:Class>

<!-- Properties -->
<owl:DatatypeProperty rdf:about="http://esfc-glossary.org/ontology#source">
<rdfs:domain rdf:resource="http://esfc-glossary.org/ontology#Term"/>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</owl:DatatypeProperty>

<!-- Individuals (Terms) -->
<owl:NamedIndividual rdf:about="http://esfc-glossary.org/terms/foodex2-A010101">
<rdf:type rdf:resource="http://esfc-glossary.org/ontology#Term"/>
<skos:prefLabel>Common wheat</skos:prefLabel>
<skos:definition>Triticum aestivum, bread wheat</skos:definition>
</owl:NamedIndividual>

</rdf:RDF>

Export Formats

CSV Export

Generate CSV files for specific term subsets.

Export Structure

id,name,description,source,category,properties,status
foodex2-A010101,"Common wheat","Triticum aestivum",foodex2,Grains,"{""hierarchyCode"":""A010101""}",active
hestia-crop-wheat,"Wheat crop","Agricultural wheat production",hestia,"Inputs & Products","{}",active

Export Scripts

# Export all terms to CSV
npm run export:csv

# Export specific source
npm run export:csv -- --source hestia

# Export with filters
npm run export:csv -- --source foodex2 --category Grains

Excel Export

Multi-sheet Excel workbook with organized data.

Workbook Structure

Sheet 1: Overview

  • Metadata and statistics
  • Source summary
  • Category breakdown

Sheet 2: All Terms

  • Complete term list
  • Filterable columns
  • Color-coded by source

Sheet 3: FoodEx2

  • FoodEx2 terms with hierarchy
  • Facet information

Sheet 4: Hestia

  • Hestia LCA terms
  • Category organization

Sheet 5: Relationships

  • Cross-source mappings
  • Confidence scores
  • Mapping methods

Generation

# Generate Excel workbook
npm run export:excel

# Custom export
node scripts/export-excel.js \
--output glossary.xlsx \
--include-relationships

Download Locations

All formats are available for download:

https://esfc-glossary-ec2bc9.gitlab.io/downloads/
├── glossary.db # SQLite database (133 MB)
├── glossary.json # JSON format (189 MB)
├── glossary.yaml # LinkML YAML (157 MB)
├── glossary.jsonld # JSON-LD (200 MB)
├── glossary.types.ts # TypeScript types (500 KB)
├── glossary.owl # OWL ontology (250 MB)
├── glossary.sql # SQL DDL (50 KB)
├── glossary.csv # CSV export (variable)
└── glossary.xlsx # Excel workbook (variable)

Format Selection Guide

Choose the right format for your use case:

Use CaseRecommended FormatWhy
Web applicationJSON or SQLiteFast loading, easy integration
Type-safe developmentTypeScript Types + JSONType safety and autocomplete
Database applicationSQLite or SQL DDLOptimized queries
Semantic webJSON-LD or RDF/OWLRDF/SPARQL compatibility
ResearchLinkML YAMLFull semantic annotations
Data analysisCSV or ExcelSpreadsheet tools
Python integrationSQLite or LinkML YAMLNative support
Node.js integrationJSON or SQLiteEasy parsing

Generation Pipeline

All formats are generated from the LinkML schema:

LinkML Schema (glossary.linkml.yaml)

Data Parsing & Validation

LinkML YAML (native format)

Multi-Format Generation
├── JSON (linkml-convert)
├── JSON-LD (linkml-convert)
├── TypeScript (linkml-generate-typescript)
├── OWL (linkml-convert)
├── SQL DDL (linkml-generate-sql)
└── SQLite (custom script)

Optimization & Compression

Deployment to CDN