Cryptography
Canonicalization
The process of producing a deterministic byte representation of structured data — mandatory before hashing anything meant to be signed.
Canonicalization is the generic problem of producing a unique, deterministic serialization for data that has multiple equivalent representations. JSON is the archetypal example: {"a": 1, "b": 2} and {"b":2,"a":1} are semantically identical but byte-different, so hashing one or the other yields different digests. Without canonicalization, signatures on structured data are fragile.
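A minimal sketch of the problem in Python. The `canonicalize` helper here is an illustrative stand-in, not full RFC 8785 JCS: it simply re-serializes parsed JSON with sorted keys and fixed separators, which is enough to show why byte-different but semantically identical documents need canonicalizing before hashing.

```python
import hashlib
import json

def digest(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# Two semantically identical JSON documents, byte-different on the wire.
doc_a = b'{"a": 1, "b": 2}'
doc_b = b'{"b":2,"a":1}'
print(digest(doc_a) == digest(doc_b))  # False: the raw bytes differ

def canonicalize(raw: bytes) -> bytes:
    # Parse, then re-serialize deterministically: sorted keys,
    # no insignificant whitespace. A simplified stand-in for a
    # full scheme such as JCS (RFC 8785).
    obj = json.loads(raw)
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

print(digest(canonicalize(doc_a)) == digest(canonicalize(doc_b)))  # True
```

After canonicalization both inputs collapse to the same byte string, so their digests agree regardless of how the original JSON was formatted.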
For JSON-LD credentials, LearnCoin uses URDNA2015 — the RDF Dataset Canonicalization algorithm that produces a sorted, deterministic N-Quads serialization regardless of how the input JSON was structured. For non-JSON contexts, the parallel algorithms are JCS (JSON Canonicalization Scheme, RFC 8785) for plain JSON and XML Canonicalization (C14N) for XML.
The canonical serialization is the byte input to the hash function, and the hash is what gets signed. Any serializer bug that breaks determinism — a Unicode normalization slip, an integer precision edge case, a sort-order inconsistency — breaks signature verification across all affected credentials. That is why the canonicalization algorithms are so precisely specified.
Related terms