Ferrum — GA4GH infrastructure that actually runs.

Complete GA4GH stack on-premise — for clinics, data integration centres, and genomDE data nodes that cannot send raw data to the cloud. Tested, documented, in Rust.

Built in Rust · On-Premises · BUSL-1.1

Evidence you can inspect

Five public repositories form the GA4GH stack: Ferrum (data/compute), ga4gh-infra (identity), Lab Kit (deploy), Demo (benchmark), and HelixTest (conformance). Apache-2.0 where stated; BUSL-1.1 covers the integrated Ferrum runtime with a clear research allowance and four-year conversion to Apache-2.0 (see LICENSE).

Why Ferrum exists

Most GA4GH implementations are cloud-first, hard to verify, or simply not finished. Ferrum is built for teams that know their data must stay on-premise — while still needing to interoperate with GA4GH-compatible networks. The GA4GH APIs are good. There should be a system that implements them consistently. So we built it.

Designed for resource-constrained environments

Ferrum is not built only for European institutions with stable infrastructure. A growing group of users works in settings with intermittent connectivity, limited hardware, and a need for sovereign data infrastructure without cloud dependency. Ferrum runs in laptop mode on a single device, recovers from power loss via checkpoints, and ingests Nanopore data directly from the lab.
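
Recovery from power loss via checkpoints usually depends on each checkpoint write being atomic. The following is a minimal sketch of one common technique (write to a temporary file, sync, then rename). The helper names `write_checkpoint` and `read_checkpoint` and the file layout are assumptions for illustration, not Ferrum's actual API or checkpoint format.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `state` to `path` so that a crash leaves either the old or the new
/// checkpoint on disk, never a half-written one.
/// Hypothetical helper for illustration; not Ferrum's actual API.
pub fn write_checkpoint(path: &Path, state: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut file = File::create(&tmp)?;
        file.write_all(state)?;
        // Flush the file contents to stable storage before the rename.
        file.sync_all()?;
    }
    // Renaming over the destination replaces the old checkpoint in one step.
    fs::rename(&tmp, path)
}

/// Load the last complete checkpoint, or `None` on a first run.
pub fn read_checkpoint(path: &Path) -> io::Result<Option<Vec<u8>>> {
    match fs::read(path) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

A caller would invoke `write_checkpoint` after each completed unit of work and `read_checkpoint` at start-up to resume from the last complete state. For full durability on POSIX systems the parent directory would typically also be synced after the rename; that step is omitted here.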

Features

Feature | Description
Offline-first / laptop mode | Runs on a single laptop with SQLite and local storage. No PostgreSQL, no MinIO required. Recovers cleanly from power loss.
Nanopore / ONT integration | Native ingestion of POD5/FAST5/BLOW5 files. Stores ONT quality metrics (Q-score, N50) alongside GA4GH DRS objects.
Multi-pathogen Beacon | Beacon v2 queries across TB, malaria, AMR, and viral pathogens, on the same infrastructure as human genomics data.
Outbreak mode | Controlled emergency data sharing: pre-configured policies that grant authorised public-health beacon access during an outbreak, with a full audit trail. Not open, not closed: controlled.
Federated Beacon (P2P) | Ferrum instances query each other directly, with no central coordinator and no cloud dependency. Works across slow or intermittent links.
Battery / solar aware | Detects power source (AC/battery/UPS). Reduces load on battery, writes a checkpoint before emergency shutdown.
Data residency audit | Cryptographically chained, append-only log of all data movements. Proves data stayed within your institution.
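
To illustrate what a chained, append-only log means in practice, here is a minimal sketch of a hash chain. Each entry stores the hash of its predecessor, so editing any earlier entry breaks verification from that point on. The `Entry` and `AuditLog` types and their fields are hypothetical, and the standard library's `DefaultHasher` is used only as a stand-in: it is not cryptographic, and a real log would use a digest such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One audit record. Field names are illustrative, not Ferrum's schema.
#[derive(Debug, Clone)]
pub struct Entry {
    pub event: String,
    pub prev_hash: u64,
    pub hash: u64,
}

// Stand-in digest. DefaultHasher is NOT cryptographic; a real audit log
// would use SHA-256 or a comparable hash function here.
fn digest(prev_hash: u64, event: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    event.hash(&mut h);
    h.finish()
}

#[derive(Default)]
pub struct AuditLog {
    pub entries: Vec<Entry>,
}

impl AuditLog {
    /// Append an event, linking it to the hash of the previous entry.
    pub fn append(&mut self, event: &str) {
        let prev_hash = self.entries.last().map_or(0, |e| e.hash);
        self.entries.push(Entry {
            event: event.to_string(),
            prev_hash,
            hash: digest(prev_hash, event),
        });
    }

    /// Return the index of the first entry that breaks the chain, if any.
    pub fn first_broken(&self) -> Option<usize> {
        let mut expected_prev = 0u64;
        for (i, e) in self.entries.iter().enumerate() {
            if e.prev_hash != expected_prev || e.hash != digest(e.prev_hash, &e.event) {
                return Some(i);
            }
            expected_prev = e.hash;
        }
        None
    }
}
```

In this sketch, an auditor who holds only the most recent hash can detect any later rewrite of the history by re-running `first_broken` over the stored entries.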

What we need to say honestly

Ferrum is tested: HelixTest runs in CI, the GA4GH demo is reproducible, and the architecture is designed to scale. What we do not yet have is a deployment with truly large clinical datasets. That is not an architectural limitation; we have simply not had the resources for it yet. It is something we want to do with the first real partner.

We are looking for a first productive pilot

Ferrum is tested and documented, but a deployment with real clinical data volumes, e.g. at a DIC site or genomDE data node, has not happened yet. We want to do that with the right partner: an institution that wants to become GA4GH-compliant on-premise, without cloud dependency, and that is open to genuine collaboration rather than a classical software purchase.

Ferrum + GHGA — two sides of one infrastructure

GHGA is the national archive for genomic data. Ferrum is the local side: GA4GH-compliant infrastructure at the clinic or data node, before data is submitted to GHGA. Ferrum implements Crypt4GH, the same encryption GHGA uses, and exposes a DRS interface for data transfer, with no cloud as an intermediate step.

Crypt4GH · DRS-compatible · No cloud intermediary · GHGA-complementary
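
As a small illustration of the DRS side, the sketch below resolves a hostname-based DRS URI to the HTTPS URL of the object record, following the path convention of the GA4GH DRS v1 specification. The function name `resolve_drs_uri` and the example hostname are assumptions for illustration; compact-identifier URIs are deliberately not handled.

```rust
/// Resolve a hostname-based DRS URI (`drs://<hostname>/<id>`) to the HTTPS
/// URL of the object record, following the GA4GH DRS v1 path convention.
/// Compact-identifier URIs (`drs://prefix:accession`) are out of scope and
/// yield `None`. Hypothetical helper, not part of Ferrum's public API.
pub fn resolve_drs_uri(uri: &str) -> Option<String> {
    let rest = uri.strip_prefix("drs://")?;
    let (host, id) = rest.split_once('/')?;
    if host.is_empty() || id.is_empty() || host.contains(':') {
        return None;
    }
    Some(format!("https://{}/ga4gh/drs/v1/objects/{}", host, id))
}
```

With this sketch, `resolve_drs_uri("drs://drs.example.org/314159")` yields `https://drs.example.org/ga4gh/drs/v1/objects/314159`, which a client would then fetch to obtain the object's metadata and access methods.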

MII core dataset integration

Ferrum MII Connect checks FHIR profiles against the MII core dataset offline, with no external API dependency. It produces JSON/SARIF reports for ETL CI, for DIC sites that want to embed MII conformance checks into their pipeline CI.
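
To show roughly what a SARIF report for a CI gate looks like, here is a minimal sketch that renders findings as a SARIF 2.1.0 log and derives an exit code from them. The `Finding` type, the rule id, and the tool name used in the usage note are hypothetical; this is not MII Connect's actual output format or interface.

```rust
/// One conformance finding. Field names and rule ids are illustrative.
pub struct Finding {
    pub rule_id: String,
    pub message: String,
    pub is_error: bool,
}

// Minimal JSON string escaping for quotes, backslashes and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render findings as a minimal SARIF 2.1.0 log with a single run.
pub fn to_sarif(tool_name: &str, findings: &[Finding]) -> String {
    let results: Vec<String> = findings
        .iter()
        .map(|f| {
            format!(
                r#"{{"ruleId":"{}","level":"{}","message":{{"text":"{}"}}}}"#,
                json_escape(&f.rule_id),
                if f.is_error { "error" } else { "warning" },
                json_escape(&f.message)
            )
        })
        .collect();
    format!(
        r#"{{"version":"2.1.0","runs":[{{"tool":{{"driver":{{"name":"{}"}}}},"results":[{}]}}]}}"#,
        json_escape(tool_name),
        results.join(",")
    )
}

/// Exit code for a CI step: non-zero if any finding is an error.
pub fn ci_exit_code(findings: &[Finding]) -> i32 {
    if findings.iter().any(|f| f.is_error) { 1 } else { 0 }
}
```

A pipeline step could write the output of `to_sarif("mii-check", &findings)` to a file for the CI system to display, then exit with `ci_exit_code(&findings)` so that error-level findings fail the build while warnings do not.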

MII Connect documentation →

GA4GH Services

TRS · DRS · WES · TES · htsget · Beacon v2 · Passports — all under one gateway, with shared authentication.

GA4GH implementation on GitHub →

Deployment options

Demo, single node, HPC cluster, Kubernetes, and laptop/offline: all options are documented.

Option | Description
Demo | Docker stack for evaluation and conformance testing (HelixTest, GA4GH demo).
Single Node | PostgreSQL and MinIO on one server; standard layout for production on-premise deployments.
HPC Cluster | SLURM/LSF integration for workflow execution in cluster environments.
Kubernetes | Kubernetes manifests and Helm charts for scalable cluster deployments.
Laptop / offline | No external dependencies. SQLite and local filesystem. Suitable for field labs and resource-constrained environments.
Field & offline deployment →
Deployment docs on GitHub →

Installation

Quickstart on GitHub →

Licence & collaboration

Ferrum is licensed under BUSL-1.1: free for research and non-commercial use, with a commercial licence for production deployments. After four years the licence converts to Apache-2.0. You can license it, request features, or use it as a starting point for your own infrastructure. No vendor lock-in.

Designed for regulated environments, with GDPR, EHDS, NIS2, and HIPAA as orientation. Includes technical compliance tools (MII profile checks, CI reports). We make no certification promise: Ferrum provides technical evidence, not legal advice.

Read the Ferrum & GA4GH white paper — HelixTest conformance checks, GA4GH demo benchmarks, architecture, and how we work.

Regulatory context: EHDS · NIS2 · GDPR & health data

We typically reply within two business days. Repository and licence details are available on request.