Kez/python/MNEMONIC-TEST-VECTORS.md
Jason Tudisco b0cc1a74a0 feat(python,crosstest): mirror BIP-39 mnemonic to Python + add interop scenarios
Completes the three-way BIP-39 mnemonic surface (Rust + Node landed in
0058d9b) and pins down byte-for-byte agreement with crosstest scenarios.

Python (mirrors rust/crates/kez-core/src/mnemonic.rs + nodejs's mnemonic.ts):
  • python/kez/mnemonic.py — generate_mnemonic, seed_from_mnemonic,
    mnemonic_from_seed_24, ed25519_from_mnemonic,
    generate_ed25519_with_mnemonic. Same 24-word-bijection / 12-word-
    SHA-256-domain-tagged semantics. Uses Trezor's `mnemonic` library
    (v0.21) for the BIP-39 wordlist + entropy parsing; deliberately does
    NOT use BIP-39's PBKDF2 to_seed function.
  • python/kez/keys.py — Ed25519Secret.from_mnemonic() +
    generate_with_mnemonic() classmethods; signer_from_flags widened to
    accept --mnemonic.
  • python/kez/cli.py — identity new --mnemonic-words, identity
    mnemonic [--words], identity from-mnemonic; --mnemonic flag on
    claim create/dns and sigchain add/revoke/show/export. Output format
    matches Rust + Node verbatim so the crosstest harness can grep
    Primary/Public/Secret/Mnemonic lines.
  • python/tests/test_mnemonic.py — 19 tests covering all three
    canonical vectors (exact-match Secret + Public hex), round-trip,
    determinism, whitespace tolerance, bad-checksum, bad-word-count,
    the literal domain-tag bytes, and the 12-vs-24 entropy-overlap
    non-collision case.

Note: --mnemonic is NOT added to `sigchain publish` because that
subcommand doesn't exist in the Python CLI yet (rust + node only). When
the publish surface is ported, --mnemonic should follow it the same way.

Ground truth — python/MNEMONIC-TEST-VECTORS.md:
  V1: 24-word zero-entropy phrase ("abandon… art")
      seed   = 0000…0000
      pubkey = 3b6a27bcceb6a42d62a3a8d02a6f0d73653215771de243a63ac048a18b59da29
  V2: 12-word zero-entropy phrase ("abandon… about")
      seed   = 09451c0f06588db78205e32a793536e15ae263c8f9ee6d14f5c6fd82b8bd20da
      pubkey = 9403c32e0d3b4ce51105c0bcac09a0d73be0cca98a6bf7b3cd434651be866d70
  V3: 12-word "legal winner thank year wave sausage worth useful legal winner thank yellow"
      seed   = 9df434a2bd5dc767ee949d8ab95ca09c4ebbb88cefc3d0b1523f6b2a744ca824
      pubkey = cc99d06b15ccb83a5ca43f25dd3d27f50638c1c6fbe3a822352da3e07156ce03

  The domain tag for the 12-word derivation is exactly the 15 ASCII
  bytes of "kez-bip39-12-v1", documented in the spec doc.

crosstest.sh — new "BIP-39 mnemonic interop" section:
  • Vector match: each impl × each vector × Public hex == expected (9
    scenarios). Catches any silent derivation drift.
  • Cross-impl claim signing via --mnemonic: every signer ↔ verifier
    pair (rust↔node, rust↔py, node↔py), every format (json/compact/
    markdown). 6 pairings × 3 formats = 18 scenarios.
  • Bijection sanity: the 24-word phrase printed by `identity from-
    mnemonic` round-trips to itself byte-for-byte (rust + node).
  • Python-involving scenarios auto-skip if `python/.venv/bin/python
    kez_cli.py identity from-mnemonic` returns non-zero, so the harness
    stays runnable on machines where Python isn't set up.

Verified end-to-end: `bash crosstest.sh` reports
  "All 84 scenarios passed."

Test totals across implementations:
  Rust:   114 (9 mnemonic-specific in kez-core)
  Node:    99 (8 mnemonic-specific in @kez/core)
  Python:  19 (mnemonic only; was no test suite before)
  Crosstest: 84 scenarios end-to-end

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-05 17:50:34 -06:00

KEZ Mnemonic — canonical test vectors

These vectors are the ground truth that all three implementations (Rust, Node, Python) MUST match byte-for-byte. They were generated from the Rust and Node implementations, which have already been verified to agree (see mnemonics branch commit 0058d9b).

Semantics

  • 24-word phrase → entropy IS the 32-byte Ed25519 seed (bijection).
  • 12-word phrase → 16-byte entropy → 32-byte seed via SHA-256("kez-bip39-12-v1" || entropy). Domain tag bytes: 0x6b, 0x65, 0x7a, 0x2d, 0x62, 0x69, 0x70, 0x33, 0x39, 0x2d, 0x31, 0x32, 0x2d, 0x76, 0x31 (15 bytes, UTF-8 of "kez-bip39-12-v1").
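
The two rules above can be sketched with the standard library alone. This is a minimal illustration, not the code in python/kez/mnemonic.py: the helper name `seed_from_entropy` and the constant name `DOMAIN_TAG_12` are hypothetical, and the sketch assumes the phrase has already been decoded to raw entropy bytes.

```python
import hashlib

# Domain tag from the semantics above: the 15 ASCII bytes of "kez-bip39-12-v1".
DOMAIN_TAG_12 = b"kez-bip39-12-v1"


def seed_from_entropy(entropy: bytes) -> bytes:
    """Hypothetical helper: map raw BIP-39 entropy to a 32-byte Ed25519 seed."""
    if len(entropy) == 32:
        # 24-word phrase: the entropy IS the seed (bijection).
        return bytes(entropy)
    if len(entropy) == 16:
        # 12-word phrase: one SHA-256 pass over tag || entropy.
        return hashlib.sha256(DOMAIN_TAG_12 + entropy).digest()
    raise ValueError("expected 16 or 32 bytes of entropy, got %d" % len(entropy))


if __name__ == "__main__":
    # Print the seeds for all-zero entropy so they can be compared by eye
    # with the V1 and V2 seed lines in this document.
    print(seed_from_entropy(bytes(32)).hex())
    print(seed_from_entropy(bytes(16)).hex())
```

Rejecting every other entropy length keeps the 12-word and 24-word paths from silently accepting 15-, 18-, or 21-word phrases, which the semantics above do not define.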

Wordlist: BIP-39 English (the canonical 2048-word list).
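
For readers who want to check a phrase by hand, the sketch below shows how standard BIP-39 encoding turns entropy into 11-bit wordlist indices (entropy bits followed by the leading bits of SHA-256(entropy) as checksum). It stops at the indices because embedding the 2048-word list is out of scope here; the function name `word_indices` is hypothetical and is not part of any KEZ implementation.

```python
import hashlib


def word_indices(entropy: bytes) -> list:
    """Return the BIP-39 wordlist indices (0..2047) that encode `entropy`."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP-39 entropy must be 16, 20, 24, 28, or 32 bytes")
    cs_bits = ent_bits // 32  # 4 checksum bits for 12 words, 8 for 24 words
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    # Append the checksum bits to the entropy, then cut into 11-bit groups.
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    count = (ent_bits + cs_bits) // 11
    return [(value >> (11 * (count - 1 - i))) & 0x7FF for i in range(count)]


if __name__ == "__main__":
    # In the English list, index 0 is "abandon" and index 3 is "about".
    print(word_indices(bytes(16)))
    print(word_indices(bytes(32)))
```

With all-zero entropy only the final index is non-zero, because it is the only word that contains checksum bits. That is why the zero-entropy phrases in V1 and V2 differ from a run of "abandon" only in their last word.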

Vectors

V1 — 24-word, all-zero entropy

phrase:  abandon abandon abandon abandon abandon abandon abandon abandon
         abandon abandon abandon abandon abandon abandon abandon abandon
         abandon abandon abandon abandon abandon abandon abandon art
seed:    0000000000000000000000000000000000000000000000000000000000000000
pubkey:  3b6a27bcceb6a42d62a3a8d02a6f0d73653215771de243a63ac048a18b59da29

V2 — 12-word, all-zero entropy

phrase:  abandon abandon abandon abandon abandon abandon abandon abandon
         abandon abandon abandon about
seed:    09451c0f06588db78205e32a793536e15ae263c8f9ee6d14f5c6fd82b8bd20da
pubkey:  9403c32e0d3b4ce51105c0bcac09a0d73be0cca98a6bf7b3cd434651be866d70

V3 — 12-word, non-trivial entropy

phrase:  legal winner thank year wave sausage worth useful legal winner
         thank yellow
seed:    9df434a2bd5dc767ee949d8ab95ca09c4ebbb88cefc3d0b1523f6b2a744ca824
pubkey:  cc99d06b15ccb83a5ca43f25dd3d27f50638c1c6fbe3a822352da3e07156ce03

What "pubkey" means here

pubkey is the hex-encoded 32-byte Ed25519 public key derived from the seed above via standard Ed25519 key generation as specified in RFC 8032 (the same as ed25519-dalek / @noble/curves/ed25519). The KEZ identity string is ed25519:<pubkey>.
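
To make the seed-to-pubkey step checkable without any third-party package, here is a dependency-free sketch of Ed25519 public-key derivation that follows the reference formulas in RFC 8032. It is for cross-checking vectors only: it is slow, not constant-time, and is not what the KEZ implementations use. All function and constant names are hypothetical.

```python
import hashlib

P = 2**255 - 19                            # field prime
D = -121665 * pow(121666, P - 2, P) % P    # twisted Edwards curve constant d


def _recover_x(y: int, sign: int) -> int:
    """Recover the x coordinate of a curve point from y and the parity of x."""
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P
    if (x & 1) != sign:
        x = P - x
    return x


_GY = 4 * pow(5, P - 2, P) % P
_GX = _recover_x(_GY, 0)
BASE = (_GX, _GY, 1, _GX * _GY % P)        # base point, extended coords (X, Y, Z, T)


def _add(p1, p2):
    """Unified point addition in extended coordinates (also used for doubling)."""
    a = (p1[1] - p1[0]) * (p2[1] - p2[0]) % P
    b = (p1[1] + p1[0]) * (p2[1] + p2[0]) % P
    c = 2 * p1[3] * p2[3] * D % P
    d = 2 * p1[2] * p2[2] % P
    e, f, g, h = b - a, d - c, d + c, b + a
    return (e * f % P, g * h % P, f * g % P, e * h % P)


def _mul(scalar: int, point):
    """Double-and-add scalar multiplication, starting from the neutral element."""
    acc = (0, 1, 1, 0)
    while scalar > 0:
        if scalar & 1:
            acc = _add(acc, point)
        point = _add(point, point)
        scalar >>= 1
    return acc


def ed25519_public_key(seed: bytes) -> bytes:
    """Derive the 32-byte public key from a 32-byte seed."""
    if len(seed) != 32:
        raise ValueError("seed must be exactly 32 bytes")
    h = hashlib.sha512(seed).digest()
    a = int.from_bytes(h[:32], "little")
    a &= (1 << 254) - 8                    # clear the low 3 bits and bit 255
    a |= 1 << 254                          # set bit 254
    x, y, z, _ = _mul(a, BASE)
    zinv = pow(z, P - 2, P)
    x, y = x * zinv % P, y * zinv % P
    return (y | ((x & 1) << 255)).to_bytes(32, "little")


if __name__ == "__main__":
    # Build the identity string for the all-zero seed and compare with V1.
    print("ed25519:" + ed25519_public_key(bytes(32)).hex())
```

Feeding the V2 and V3 seeds through the same function gives an independent check on their pubkey lines.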

Implementation crib

Rust, Node, and Python all load the raw entropy from the BIP-39 phrase, not the 64-byte seed that BIP-39 derives from the phrase with PBKDF2. 24-word entropy is 32 bytes and is used directly as the seed. 12-word entropy is 16 bytes and is hashed once, with the domain tag prepended, to produce the 32-byte seed.

This deliberately differs from how hardware wallets use the same phrases: they feed the 64-byte PBKDF2 seed into BIP-32 derivation. KEZ has exactly one identity per phrase and no derivation tree.
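
The difference can be seen side by side for the V2 phrase. The sketch below computes the 64-byte wallet seed that standard BIP-39 defines (PBKDF2-HMAC-SHA512, 2048 iterations, salt "mnemonic" plus an optional passphrase) and the 32-byte KEZ seed from the same phrase's entropy. The function name `bip39_wallet_seed` is hypothetical, and V2's 16 zero bytes of entropy are hard-coded rather than decoded from the words.

```python
import hashlib
import unicodedata


def bip39_wallet_seed(phrase: str, passphrase: str = "") -> bytes:
    """64-byte seed as defined by BIP-39. KEZ does NOT use this value."""
    def nfkd(text: str) -> bytes:
        return unicodedata.normalize("NFKD", text).encode("utf-8")

    return hashlib.pbkdf2_hmac(
        "sha512", nfkd(phrase), nfkd("mnemonic" + passphrase), 2048
    )


# V2 phrase: eleven "abandon" followed by "about" (all-zero 16-byte entropy).
phrase = " ".join(["abandon"] * 11 + ["about"])

wallet_seed = bip39_wallet_seed(phrase)
kez_seed = hashlib.sha256(b"kez-bip39-12-v1" + bytes(16)).digest()

if __name__ == "__main__":
    print("wallet seed (%d bytes): %s" % (len(wallet_seed), wallet_seed.hex()))
    print("KEZ seed    (%d bytes): %s" % (len(kez_seed), kez_seed.hex()))
```

One consequence is that importing a KEZ phrase into a wallet, or the reverse, produces unrelated keys even though the words are identical.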