Data Formats

The ESFC Glossary is available in 8+ formats to support different use cases, from web applications to semantic web integration, database queries, and type-safe programming.

Overview

All formats are generated from a single LinkML schema, ensuring consistency across outputs while optimizing each format for its specific use case.

Available Formats:

SQLite Database (133 MB)
JSON (189 MB)
LinkML YAML (157 MB)
JSON-LD (semantic web)
TypeScript Types
RDF/OWL Ontologies
SQL DDL Schemas
CSV/Excel

Primary Formats

SQLite Database

File: glossary.db Size: 133 MB Use Case: Queries, relationships, application integration

The SQLite database provides optimized storage and fast queries for the complete glossary.

Database Schema

-- Main terms table
CREATE TABLE terms (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL,
  description TEXT,
  source TEXT NOT NULL,
  category TEXT,
  properties JSON,
  external_mappings JSON,
  parent_terms JSON,
  metadata JSON,
  status TEXT DEFAULT 'active',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Indexes for performance
CREATE INDEX idx_source ON terms(source);
CREATE INDEX idx_category ON terms(category);
CREATE INDEX idx_name ON terms(name);
CREATE INDEX idx_status ON terms(status);

-- Full-text search
CREATE VIRTUAL TABLE terms_fts USING fts5(
  id, name, description, category,
  content=terms
);

-- Relationships table
CREATE TABLE relationships (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  source_term_id TEXT NOT NULL,
  target_term_id TEXT NOT NULL,
  relationship_type TEXT NOT NULL,
  confidence REAL,
  method TEXT,
  FOREIGN KEY (source_term_id) REFERENCES terms(id),
  FOREIGN KEY (target_term_id) REFERENCES terms(id)
);

CREATE INDEX idx_relationships_source ON relationships(source_term_id);
CREATE INDEX idx_relationships_target ON relationships(target_term_id);

Query Examples

Basic Queries:

-- Count terms by source
SELECT source, COUNT(*) as term_count
FROM terms
GROUP BY source
ORDER BY term_count DESC;

-- Find all wheat-related terms
SELECT id, name, source, category
FROM terms
WHERE name LIKE '%wheat%'
ORDER BY source, name;

-- Full-text search
SELECT t.id, t.name, t.source, t.category
FROM terms_fts fts
JOIN terms t ON fts.id = t.id
WHERE terms_fts MATCH 'carbon emission'
LIMIT 20;

Advanced Queries:

-- Find terms with relationships
SELECT
  t1.id as source_id,
  t1.name as source_name,
  r.relationship_type,
  t2.name as target_name,
  t2.source as target_source,
  r.confidence
FROM terms t1
JOIN relationships r ON t1.id = r.source_term_id
JOIN terms t2 ON t2.id = r.target_term_id
WHERE t1.source = 'foodex2'
  AND t2.source = 'hestia'
ORDER BY r.confidence DESC
LIMIT 10;

-- Hierarchical queries
WITH RECURSIVE hierarchy AS (
  SELECT id, name, parent_terms, 1 as level
  FROM terms
  WHERE id = 'foodex2-A0101'

  UNION ALL

  SELECT t.id, t.name, t.parent_terms, h.level + 1
  FROM terms t
  JOIN hierarchy h
  WHERE json_extract(t.parent_terms, '$[0]') = h.id
)
SELECT * FROM hierarchy;

Integration Examples

Python:

import sqlite3

# Connect to database
conn = sqlite3.connect('glossary.db')
cursor = conn.cursor()

# Query terms
cursor.execute('''
  SELECT id, name, source, category
  FROM terms
  WHERE source = ?
  LIMIT 10
''', ('hestia',))

for row in cursor.fetchall():
    print(f"{row[0]}: {row[1]} ({row[2]})")

conn.close()

Node.js:

import Database from 'better-sqlite3'

const db = new Database('glossary.db')

// Prepared statement
const stmt = db.prepare(`
  SELECT id, name, source, category
  FROM terms
  WHERE source = ? AND category LIKE ?
`)

const results = stmt.all('hestia', '%Emission%')
console.log(`Found ${results.length} emission terms`)

JSON Format

File: glossary.json Size: 189 MB Use Case: Web applications, JavaScript/TypeScript integration

Complete glossary data in JSON format with full term details and metadata.

Structure

{
  "metadata": {
    "version": "0.1.2",
    "build": 6,
    "lastUpdated": "2025-12-08T02:54:36.996Z",
    "totalTerms": 168626,
    "sources": {
      "foodex2": 31601,
      "hestia": 36044,
      "ecoinvent": 33784,
      "agrovoc": 41447,
      "langual": 12836,
      "cpc": 4583,
      "sentier": 7731,
      "unece": 406,
      "gs1": 154,
      "eaternity": 40
    }
  },
  "terms": [
    {
      "@type": "Term",
      "id": "foodex2-A010101",
      "name": "Common wheat",
      "description": "Triticum aestivum, bread wheat",
      "source": "foodex2",
      "category": "Grains",
      "properties": {
        "hierarchyCode": "A010101",
        "scientificName": "Triticum aestivum",
        "level": 4
      },
      "external_mappings": [
        {
          "externalId": "hestia-crop-wheat",
          "externalSource": "hestia",
          "mappingType": "related"
        }
      ],
      "parent_terms": ["foodex2-A0101"],
      "metadata": {
        "searchable": true,
        "verified": true
      },
      "status": "active"
    }
  ]
}

Usage Examples

JavaScript/TypeScript:

// Load glossary
const glossary = await fetch('/glossary.json')
  .then(r => r.json())

// Filter by source
const hestiaTerms = glossary.terms.filter(t =>
  t.source === 'hestia'
)

// Search by name
const searchResults = glossary.terms.filter(t =>
  t.name.toLowerCase().includes('wheat')
)

// Group by category
const byCategory = glossary.terms.reduce((acc, term) => {
  const cat = term.category || 'Uncategorized'
  if (!acc[cat]) acc[cat] = []
  acc[cat].push(term)
  return acc
}, {})

Python:

import json

with open('glossary.json') as f:
    glossary = json.load(f)

# Access metadata
print(f"Version: {glossary['metadata']['version']}")
print(f"Total terms: {glossary['metadata']['totalTerms']}")

# Filter terms
hestia_terms = [
    t for t in glossary['terms']
    if t['source'] == 'hestia'
]

# Search
wheat_terms = [
    t for t in glossary['terms']
    if 'wheat' in t['name'].lower()
]

LinkML YAML

File: glossary.yaml Size: 157 MB Use Case: Semantic web, research, data validation

Native LinkML format with full semantic annotations and relationships.

Structure

terms:
  - '@type': Term
    id: foodex2-A010101
    name: Common wheat
    description: Triticum aestivum, bread wheat
    source: foodex2
    category: Grains
    properties:
      hierarchyCode: A010101
      scientificName: Triticum aestivum
      level: 4
    external_mappings:
      - externalId: hestia-crop-wheat
        externalSource: hestia
        mappingType: related
    parent_terms:
      - foodex2-A0101
    metadata:
      searchable: true
      verified: true
    status: active

Usage with LinkML

Python with LinkML Runtime:

from linkml_runtime.loaders import yaml_loader
from glossary_model import Glossary, Term

# Load glossary
glossary = yaml_loader.load('glossary.yaml', target_class=Glossary)

# Access terms
print(f"Loaded {len(glossary.terms)} terms")

# Filter by source
hestia_terms = [t for t in glossary.terms if t.source == 'hestia']

# Validate against schema
from linkml_runtime.utils.schemaview import SchemaView

schema = SchemaView('schema/glossary.linkml.yaml')
for term in glossary.terms[:10]:
    schema.validate_object(term, target_class='Term')

JSON-LD (Semantic Web)

File: glossary.jsonld Size: ~200 MB Use Case: Semantic web, RDF integration, linked data

JSON-LD format with semantic web context for RDF/SPARQL integration.

Structure

{
  "@context": {
    "@vocab": "http://esfc-glossary.org/vocab/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "Term": "skos:Concept",
    "name": "skos:prefLabel",
    "description": "skos:definition",
    "source": "dc:source",
    "category": "skos:inScheme",
    "parent_terms": "skos:broader",
    "external_mappings": {
      "@id": "skos:relatedMatch",
      "@container": "@set"
    }
  },
  "@graph": [
    {
      "@type": "Term",
      "@id": "foodex2:A010101",
      "name": "Common wheat",
      "description": "Triticum aestivum, bread wheat",
      "source": "foodex2",
      "category": "Grains",
      "parent_terms": ["foodex2:A0101"],
      "external_mappings": [
        {
          "@id": "hestia:crop-wheat",
          "mappingType": "related"
        }
      ]
    }
  ]
}

SPARQL Queries

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dc: <http://purl.org/dc/terms/>

# Find all wheat terms
SELECT ?term ?label ?source WHERE {
  ?term skos:prefLabel ?label ;
        dc:source ?source .
  FILTER(CONTAINS(LCASE(?label), "wheat"))
}
LIMIT 10

# Find related terms
SELECT ?source ?target ?type WHERE {
  ?source skos:relatedMatch ?target .
  ?source dc:source "foodex2" .
  ?target dc:source "hestia" .
}

Generated Formats

TypeScript Types

File: glossary.types.ts Size: ~500 KB Use Case: Type-safe TypeScript/JavaScript integration

Generated TypeScript type definitions for the glossary schema.

Generated Types

/**
 * Main glossary interface
 */
export interface Glossary {
  metadata: GlossaryMetadata
  terms: Term[]
}

/**
 * Metadata about the glossary
 */
export interface GlossaryMetadata {
  version: string
  build: number
  lastUpdated: string
  totalTerms: number
  sources: Record<string, number>
}

/**
 * Individual glossary term
 */
export interface Term {
  '@type': 'Term'
  id: string
  name: string
  description?: string
  source: GlossarySource
  category?: string
  properties?: Record<string, any>
  external_mappings?: ExternalMapping[]
  parent_terms?: string[]
  metadata?: Record<string, any>
  status: TermStatus
}

/**
 * Glossary sources
 */
export type GlossarySource =
  | 'foodex2'
  | 'hestia'
  | 'ecoinvent'
  | 'agrovoc'
  | 'langual'
  | 'cpc'
  | 'sentier'
  | 'unece'
  | 'gs1'
  | 'eaternity'

/**
 * External mapping to other vocabularies
 */
export interface ExternalMapping {
  externalId: string
  externalSource: string
  mappingType: 'exact' | 'related' | 'broader' | 'narrower'
  confidence?: number
}

/**
 * Term status
 */
export type TermStatus = 'active' | 'deprecated' | 'obsolete'

Usage

import { Glossary, Term, GlossarySource } from './glossary.types'

async function loadGlossary(): Promise<Glossary> {
  const response = await fetch('/glossary.json')
  return response.json()
}

function filterBySource(
  terms: Term[],
  source: GlossarySource
): Term[] {
  return terms.filter(t => t.source === source)
}

// Type-safe usage
const glossary = await loadGlossary()
const hestiaTerms = filterBySource(glossary.terms, 'hestia')

// TypeScript ensures type safety
console.log(`Found ${hestiaTerms.length} Hestia terms`)

SQL DDL Schema

File: glossary.sql Size: ~50 KB Use Case: Database schema creation, PostgreSQL/MySQL setup

SQL schema definition for creating database tables.

Generated Schema

-- Terms table
CREATE TABLE terms (
  id VARCHAR(255) PRIMARY KEY,
  name TEXT NOT NULL,
  description TEXT,
  source VARCHAR(50) NOT NULL,
  category VARCHAR(255),
  properties JSONB,
  external_mappings JSONB,
  parent_terms JSONB,
  metadata JSONB,
  status VARCHAR(50) DEFAULT 'active',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Indexes
CREATE INDEX idx_terms_source ON terms(source);
CREATE INDEX idx_terms_category ON terms(category);
CREATE INDEX idx_terms_name ON terms USING gin(to_tsvector('english', name));
CREATE INDEX idx_terms_properties ON terms USING gin(properties);

-- Full-text search (PostgreSQL)
CREATE INDEX idx_terms_fts ON terms
USING gin(to_tsvector('english', coalesce(name, '') || ' ' || coalesce(description, '')));

-- Materialized view for source statistics
CREATE MATERIALIZED VIEW source_statistics AS
SELECT
  source,
  COUNT(*) as term_count,
  COUNT(DISTINCT category) as category_count,
  MIN(created_at) as first_added,
  MAX(updated_at) as last_updated
FROM terms
GROUP BY source;

RDF/OWL Ontology

File: glossary.owl Size: ~250 MB Use Case: Semantic web applications, ontology reasoning

OWL ontology for semantic web reasoning and inference.

Ontology Structure

<?xml version="1.0"?>
<rdf:RDF xmlns="http://esfc-glossary.org/ontology#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <owl:Ontology rdf:about="http://esfc-glossary.org/ontology">
    <rdfs:label>ESFC Glossary Ontology</rdfs:label>
    <rdfs:comment>
      Unified food and LCA glossary ontology
    </rdfs:comment>
  </owl:Ontology>

  <!-- Classes -->
  <owl:Class rdf:about="http://esfc-glossary.org/ontology#Term">
    <rdfs:label>Term</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  </owl:Class>

  <!-- Properties -->
  <owl:DatatypeProperty rdf:about="http://esfc-glossary.org/ontology#source">
    <rdfs:domain rdf:resource="http://esfc-glossary.org/ontology#Term"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>

  <!-- Individuals (Terms) -->
  <owl:NamedIndividual rdf:about="http://esfc-glossary.org/terms/foodex2-A010101">
    <rdf:type rdf:resource="http://esfc-glossary.org/ontology#Term"/>
    <skos:prefLabel>Common wheat</skos:prefLabel>
    <skos:definition>Triticum aestivum, bread wheat</skos:definition>
  </owl:NamedIndividual>

</rdf:RDF>

Export Formats

CSV Export

Generate CSV files for specific term subsets.

Export Structure

id,name,description,source,category,properties,status
foodex2-A010101,"Common wheat","Triticum aestivum",foodex2,Grains,"{""hierarchyCode"":""A010101""}",active
hestia-crop-wheat,"Wheat crop","Agricultural wheat production",hestia,"Inputs & Products","{}",active

Export Scripts

# Export all terms to CSV
npm run export:csv

# Export specific source
npm run export:csv -- --source hestia

# Export with filters
npm run export:csv -- --source foodex2 --category Grains

Excel Export

Multi-sheet Excel workbook with organized data.

Workbook Structure

Sheet 1: Overview

Metadata and statistics
Source summary
Category breakdown

Sheet 2: All Terms

Complete term list
Filterable columns
Color-coded by source

Sheet 3: FoodEx2

FoodEx2 terms with hierarchy
Facet information

Sheet 4: Hestia

Hestia LCA terms
Category organization

Sheet 5: Relationships

Cross-source mappings
Confidence scores
Mapping methods

Generation

# Generate Excel workbook
npm run export:excel

# Custom export
node scripts/export-excel.js \
  --output glossary.xlsx \
  --include-relationships

Download Locations

All formats are available for download:

https://esfc-glossary-ec2bc9.gitlab.io/downloads/
├── glossary.db          # SQLite database (133 MB)
├── glossary.json        # JSON format (189 MB)
├── glossary.yaml        # LinkML YAML (157 MB)
├── glossary.jsonld      # JSON-LD (200 MB)
├── glossary.types.ts    # TypeScript types (500 KB)
├── glossary.owl         # OWL ontology (250 MB)
├── glossary.sql         # SQL DDL (50 KB)
├── glossary.csv         # CSV export (variable)
└── glossary.xlsx        # Excel workbook (variable)

Format Selection Guide

Choose the right format for your use case:

Use Case	Recommended Format	Why
Web application	JSON or SQLite	Fast loading, easy integration
Type-safe development	TypeScript Types + JSON	Type safety and autocomplete
Database application	SQLite or SQL DDL	Optimized queries
Semantic web	JSON-LD or RDF/OWL	RDF/SPARQL compatibility
Research	LinkML YAML	Full semantic annotations
Data analysis	CSV or Excel	Spreadsheet tools
Python integration	SQLite or LinkML YAML	Native support
Node.js integration	JSON or SQLite	Easy parsing

Generation Pipeline

All formats are generated from the LinkML schema:

LinkML Schema (glossary.linkml.yaml)
    ↓
Data Parsing & Validation
    ↓
LinkML YAML (native format)
    ↓
Multi-Format Generation
    ├── JSON (linkml-convert)
    ├── JSON-LD (linkml-convert)
    ├── TypeScript (linkml-generate-typescript)
    ├── OWL (linkml-convert)
    ├── SQL DDL (linkml-generate-sql)
    └── SQLite (custom script)
    ↓
Optimization & Compression
    ↓
Deployment to CDN

Data Sources - Overview of all 10 sources
Semantic Mapping - Cross-source relationships
Glossary Overview - Main documentation
FoodEx2 Reference - Food classification
Hestia Reference - LCA data

Overview​

Primary Formats​

SQLite Database​

Database Schema​

Query Examples​

Integration Examples​

JSON Format​

Structure​

Usage Examples​

LinkML YAML​

Structure​

Usage with LinkML​

JSON-LD (Semantic Web)​

Structure​

SPARQL Queries​

Generated Formats​

TypeScript Types​

Generated Types​

Usage​

SQL DDL Schema​

Generated Schema​

RDF/OWL Ontology​

Ontology Structure​

Export Formats​

CSV Export​

Export Structure​

Export Scripts​

Excel Export​

Workbook Structure​

Generation​

Download Locations​

Format Selection Guide​

Generation Pipeline​

Related Documentation​

Overview

Primary Formats

SQLite Database

Database Schema

Query Examples

Integration Examples

JSON Format

Structure

Usage Examples

LinkML YAML

Structure

Usage with LinkML

JSON-LD (Semantic Web)

Structure

SPARQL Queries

Generated Formats

TypeScript Types

Generated Types

Usage

SQL DDL Schema

Generated Schema

RDF/OWL Ontology

Ontology Structure

Export Formats

CSV Export

Export Structure

Export Scripts

Excel Export

Workbook Structure

Generation

Download Locations

Format Selection Guide

Generation Pipeline

Related Documentation