---
title: "Formats, Validation & Identity"
description: "XML ↔ JSON-LD conversion, multi-layer validation, identifier translation, idempotent event hashing."
canonical_url: "https://openepcis.io/docs/platform-overview/modules/formats"
last_updated: "2026-07-02T20:31:25.812Z"
---

Before an EPCIS document reaches the event store, it passes through this layer: it is parsed, validated, canonicalised and hashed for deduplication, and its identifiers are normalised to GS1 Digital Link form. XML or JSON-LD goes in; a trusted, deduplicated, canonical event comes out. By the time anything is indexed, the platform has guaranteed that the event is standards-conformant, semantically equivalent across its XML and JSON-LD representations, and impossible to insert twice, because a re-sent event produces the same hash as the original.
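
Identifier normalisation is the easiest of these steps to picture in isolation. The sketch below converts one EPC URI scheme, SGTIN, into its GS1 Digital Link form on the `https://id.gs1.org` resolver domain. It is a hand-rolled illustration, not the platform's converter: `sgtin_to_digital_link` is a hypothetical name, only the pure-identity `urn:epc:id:sgtin:` form is handled, and other schemes (SSCC, SGLN, GRAI and so on) follow the same pattern with different application identifiers.

```python
from urllib.parse import quote, unquote


def gs1_check_digit(digits: str) -> str:
    """GS1 mod-10 check digit: weights 3, 1, 3, 1, ... starting from the rightmost digit."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(digits)))
    return str((10 - total % 10) % 10)


def sgtin_to_digital_link(epc_uri: str, domain: str = "https://id.gs1.org") -> str:
    """Hypothetical helper: urn:epc:id:sgtin:CompanyPrefix.IndicatorItemRef.Serial -> Digital Link."""
    prefix = "urn:epc:id:sgtin:"
    if not epc_uri.startswith(prefix):
        raise ValueError(f"not an SGTIN EPC URI: {epc_uri}")
    company_prefix, item_ref, serial = epc_uri[len(prefix):].split(".", 2)
    # The first digit of the item reference is the GTIN-14 indicator digit.
    body = item_ref[0] + company_prefix + item_ref[1:]
    if len(body) != 13 or not body.isdigit():
        raise ValueError(f"company prefix + item reference must total 13 digits: {epc_uri}")
    gtin = body + gs1_check_digit(body)
    # EPC URIs percent-encode some serial characters; decode, then re-encode for the URL path.
    return f"{domain}/01/{gtin}/21/{quote(unquote(serial), safe='')}"


print(sgtin_to_digital_link("urn:epc:id:sgtin:0614141.107346.2017"))
# → https://id.gs1.org/01/10614141073464/21/2017
```

Because the GTIN is rebuilt with its check digit, two systems that emit the same product in EPC URI form and in Digital Link form end up with one identifier before indexing.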

Two converters ship, each for a different scope. The open-source edition includes an XSLT-based converter: load the document, run the transform, emit the result. That approach is clean for single events, small batches and plain event shapes. The Business edition adds a SAX-based streaming converter for production-volume work. Multi-gigabyte EPCIS exports stream through at network speed with bounded memory, so the JVM heap stays flat as the document grows. Deep extension trees and sensor payloads survive intact, mixed 1.2 / 2.0 batches pass through cleanly, and the edge cases where load-then-transform either struggles or quietly drops information are handled correctly. Because a streamed conversion feeds the same validation and event-hash stages, its output lands as a fully canonical, deduplicated event in a single pass. For organisations moving production volumes, especially those migrating live 1.2 corpora to 2.0, this is the headline difference.
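
The practical difference between the two converters is whether the whole document is ever held in memory. Here is a minimal sketch of the streaming side, using Python's built-in `xml.sax` rather than the platform's Java converter. The handler builds one small dictionary per event and hands it off as soon as that event's closing tag is parsed, so memory use tracks the largest single event rather than the document size. The class name `EventStreamHandler` and the handful of fields it extracts are illustrative assumptions; a real converter maps every field, extension and sensor element to JSON-LD.

```python
import io
import json
import xml.sax

EVENT_TYPES = {"ObjectEvent", "AggregationEvent", "TransactionEvent",
               "TransformationEvent", "AssociationEvent"}
# Illustrative subset only; a real conversion covers the full event model.
SIMPLE_FIELDS = {"eventTime", "eventTimeZoneOffset", "action", "bizStep", "disposition", "epc"}


class EventStreamHandler(xml.sax.ContentHandler):
    """Emits one dict per EPCIS event as soon as that event's end tag is parsed."""

    def __init__(self, emit):
        super().__init__()
        self.emit = emit
        self.current = None  # event being built; None between events
        self.field = None    # name of the simple field currently open
        self.text = []

    def startElement(self, name, attrs):
        if name in EVENT_TYPES:
            self.current = {"type": name}
        elif self.current is not None and name in SIMPLE_FIELDS:
            self.field, self.text = name, []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if self.current is None:
            return
        if name == self.field:
            value = "".join(self.text).strip()
            if name == "epc":
                self.current.setdefault("epcList", []).append(value)
            else:
                self.current[name] = value
            self.field = None
        elif name in EVENT_TYPES:
            self.emit(self.current)  # hand the event downstream; nothing is retained here
            self.current = None


doc = b"""<epcis:EPCISDocument xmlns:epcis="urn:epcglobal:epcis:xsd:2" schemaVersion="2.0"
    creationDate="2024-01-01T00:00:00Z">
  <EPCISBody><EventList>
    <ObjectEvent>
      <eventTime>2024-01-01T10:00:00Z</eventTime>
      <eventTimeZoneOffset>+00:00</eventTimeZoneOffset>
      <epcList><epc>urn:epc:id:sgtin:0614141.107346.2017</epc></epcList>
      <action>OBSERVE</action>
    </ObjectEvent>
  </EventList></EPCISBody>
</epcis:EPCISDocument>"""

events = []
xml.sax.parse(io.BytesIO(doc), EventStreamHandler(events.append))
print(json.dumps(events[0]))
```

The `emit` callback receives each event the moment it closes, so it could just as well push to a queue or feed the validation stage; the list here only collects results for inspection.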

Validation runs in three layers: JSON Schema first, then the shape of custom extensions at every level of the event where they can appear (parent, readPoint, bizLocation, errorDeclaration, sensorElement, ILMD, bizStep, disposition), then sensor-element rules. A document that fails any layer is rejected at the boundary. Custom namespaces (`battery:`, `eudr:`, `textile:`, customer extensions) are validated only when the request declares them in the `GS1-Extensions` HTTP header; without that declaration the validator passes them through untouched. The header is the explicit opt-in that activates regulation-specific or vendor-specific validation rules.
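
The opt-in rule is the part most worth seeing concretely. The sketch below assumes the header carries a comma-separated list of `prefix=namespaceIRI` pairs, and every name in it is hypothetical (`validate_event`, the rule registry, the `https://example.org/battery/` namespace). It runs a base-schema check first and stops there on failure, then walks every nesting level of the event and applies extension rules only to keys whose prefix the header declared. The sensor-element layer is left out for brevity.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical extension rule: (field name, value) -> list of error messages.
Rule = Callable[[str, object], List[str]]


def parse_extensions_header(value: str) -> Dict[str, str]:
    """Parse an assumed 'prefix=iri, prefix=iri' GS1-Extensions header into {prefix: iri}."""
    declared = {}
    for part in value.split(","):
        prefix, sep, iri = part.partition("=")
        if sep:
            declared[prefix.strip()] = iri.strip()
    return declared


def walk(node: object, path: str = "") -> Iterator[Tuple[str, str, object]]:
    """Yield (path, key, value) for every key at every nesting level."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield f"{path}/{key}", key, value
            yield from walk(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from walk(item, f"{path}[{i}]")


def validate_event(event: dict, extensions_header: Optional[str],
                   schema_check: Callable[[dict], List[str]],
                   rules: Dict[str, Rule]) -> List[str]:
    errors = list(schema_check(event))  # layer 1: base schema
    if errors:
        return errors  # rejected at the boundary; later layers never run
    declared = parse_extensions_header(extensions_header or "")
    for path, key, value in walk(event):  # layer 2: declared extensions only
        prefix, sep, _ = key.partition(":")
        rule = rules.get(declared.get(prefix, "")) if sep else None
        if rule is None:
            continue  # undeclared or unregistered namespace: passed through untouched
        errors.extend(f"{path}: {msg}" for msg in rule(key, value))
    return errors


# Usage with a hypothetical schema check and a hypothetical battery-passport rule.
def schema_check(event: dict) -> List[str]:
    return [] if "eventTime" in event else ["missing eventTime"]


def battery_rule(key: str, value: object) -> List[str]:
    if key == "battery:capacityKWh" and not isinstance(value, (int, float)):
        return ["must be numeric"]
    return []


rules = {"https://example.org/battery/": battery_rule}
event = {"eventTime": "2024-01-01T10:00:00Z",
         "readPoint": {"id": "urn:epc:id:sgln:0614141.00777.0", "battery:capacityKWh": "lots"}}

print(validate_event(event, "battery=https://example.org/battery/", schema_check, rules))
# → ['/readPoint/battery:capacityKWh: must be numeric']
print(validate_event(event, None, schema_check, rules))
# → []
```

The same event fails with the header and passes without it, which is exactly the opt-in behaviour: the extension data is never rejected just because the server happens to know a rule for it.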

Event hashes are computed over a *canonicalised* representation of the event content, with field order, types and whitespace normalised per the EPCIS specification, not over the raw bytes that arrived. Two events that differ only in whitespace or in the order of JSON keys or XML attributes therefore produce the same hash, so re-sends and round-trips through different serialisers yield the same event ID. Canonicalisation is CBV-version-aware, so the rules can evolve alongside the spec without breaking hashes computed under earlier versions.
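
To see why whitespace and field order drop out, here is a deliberately simplified stand-in for the canonicalisation step. It is not the CBV pre-hash algorithm the platform implements: it serialises the event as compact JSON with sorted keys, normalises `eventTime` to UTC with millisecond precision, and, as an assumption for this sketch, excludes server-assigned fields such as `eventID` and `recordTime` before hashing with SHA-256. `event_hash` and the helper names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# Assumption for this sketch: fields assigned by the server do not affect identity.
EXCLUDED = {"eventID", "recordTime"}


def canonical_time(value: str) -> str:
    """Normalise an ISO 8601 timestamp to UTC with millisecond precision."""
    utc = datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def canonicalise(node):
    if isinstance(node, dict):
        return {k: (canonical_time(v) if k == "eventTime" else canonicalise(v))
                for k, v in node.items() if k not in EXCLUDED}
    if isinstance(node, list):
        return [canonicalise(item) for item in node]  # list order is kept as-is here
    return node


def event_hash(event: dict) -> str:
    # Sorted keys and compact separators make the bytes independent of input order and whitespace.
    canonical = json.dumps(canonicalise(event), sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = json.loads('{"type":"ObjectEvent","action":"OBSERVE","eventTime":"2024-01-01T12:00:00+02:00"}')
b = json.loads("""{
  "eventTime": "2024-01-01T10:00:00.000Z",
  "action":    "OBSERVE",
  "type":      "ObjectEvent",
  "recordTime": "2024-01-02T08:30:00Z"
}""")
print(event_hash(a) == event_hash(b))  # → True
```

The two inputs differ in key order, whitespace, timestamp notation and an extra server-side field, yet describe the same occurrence; hashing the canonical form rather than the raw bytes is what makes the resulting ID idempotent.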

EPCIS 1.2 ↔ 2.0 XML migration works in both directions, which matters when an organisation is still receiving 1.2 from upstream partners while shipping 2.0 downstream.
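
For the 1.2 → 2.0 direction, two structural differences dominate, as we read the two XML schemas: the document namespace changes from `urn:epcglobal:epcis:xsd:1` to `urn:epcglobal:epcis:xsd:2`, and the 1.2 `<extension>` and `<baseExtension>` wrappers (which held fields such as `eventID`, `ilmd` and `quantityList`, and wrapped `TransformationEvent` inside `EventList`) disappear, their contents becoming direct children. The sketch below uses Python's standard `xml.etree.ElementTree` and a hypothetical `migrate_12_to_20` function to apply only those two changes; a full migration also covers the reverse direction, element ordering, vocabulary values and user extensions.

```python
import xml.etree.ElementTree as ET

NS1 = "urn:epcglobal:epcis:xsd:1"
NS2 = "urn:epcglobal:epcis:xsd:2"
WRAPPERS = {"extension", "baseExtension"}  # 1.2 wrapper elements with no 2.0 counterpart


def unwrap(parent: ET.Element) -> None:
    """Replace each wrapper child with its own children, in place and in order."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag in WRAPPERS:
            del parent[i]
            for offset, grandchild in enumerate(list(child)):
                parent.insert(i + offset, grandchild)
            # i is not advanced: spliced-in children may themselves be wrappers
        else:
            unwrap(child)
            i += 1


def migrate_12_to_20(xml_text: str) -> str:
    """Hypothetical helper: minimal EPCIS 1.2 -> 2.0 XML rewrite (namespace + wrappers only)."""
    ET.register_namespace("epcis", NS2)
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag.startswith("{" + NS1 + "}"):
            el.tag = "{" + NS2 + "}" + el.tag[len(NS1) + 2:]
    root.set("schemaVersion", "2.0")
    unwrap(root)
    return ET.tostring(root, encoding="unicode")


legacy = """<epcis:EPCISDocument xmlns:epcis="urn:epcglobal:epcis:xsd:1"
    schemaVersion="1.2" creationDate="2024-01-01T00:00:00Z">
  <EPCISBody><EventList>
    <ObjectEvent>
      <eventTime>2024-01-01T10:00:00Z</eventTime>
      <eventTimeZoneOffset>+00:00</eventTimeZoneOffset>
      <baseExtension><eventID>urn:uuid:00000000-0000-4000-8000-000000000001</eventID></baseExtension>
      <epcList><epc>urn:epc:id:sgtin:0614141.107346.2017</epc></epcList>
      <action>ADD</action>
      <extension><ilmd/></extension>
    </ObjectEvent>
    <extension><TransformationEvent><eventTime>2024-01-01T11:00:00Z</eventTime></TransformationEvent></extension>
  </EventList></EPCISBody>
</epcis:EPCISDocument>"""

migrated = ET.fromstring(migrate_12_to_20(legacy))
print(migrated.get("schemaVersion"), migrated.find(".//extension"))  # → 2.0 None
```

Unwrapping in place keeps each field at the position its wrapper occupied, which for these fields already matches the 2.0 element order.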

## Capabilities by edition

<table>
<thead>
  <tr>
    <th>
      Capability
    </th>
    
    <th>
      OSS
    </th>
    
    <th>
      Business
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      XML ↔ JSON-LD conversion (XSLT, load-then-transform)
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
  
  <tr>
    <td>
      Streaming XML ↔ JSON-LD conversion (SAX, bounded memory)
    </td>
    
    <td>
      <span className="fm-no">
        —
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
  
  <tr>
    <td>
      EPCIS 1.2 ↔ 2.0 XML migration
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
  
  <tr>
    <td>
      EPCIS document validation
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
  
  <tr>
    <td>
      Multi-level custom-extension validation
    </td>
    
    <td>
      <span className="fm-no">
        —
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
  
  <tr>
    <td>
      Sensor element validation
    </td>
    
    <td>
      <span className="fm-no">
        —
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
  
  <tr>
    <td>
      Canonical event hash (idempotent IDs)
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
  
  <tr>
    <td>
      Web UI for format conversion
    </td>
    
    <td>
      <span className="fm-no">
        —
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
  
  <tr>
    <td>
      Hash generator as a service
    </td>
    
    <td>
      <span className="fm-no">
        —
      </span>
    </td>
    
    <td>
      <span className="fm-yes">
        ✓
      </span>
    </td>
  </tr>
</tbody>
</table>

## See also

- [Architecture → GS1 conformance contract](/docs/platform-overview/architecture#gs1-conformance-contract) — the discipline rules these modules enforce.
- [Modules → EPCIS Events](/docs/platform-overview/modules/epcis-events) — where the validated, hashed events go next.
- [Modules → Testdata](/docs/platform-overview/modules/testdata) — generators that respect the same conformance rules.
