Kez/python/kez/mnemonic.py
Jason Tudisco b0cc1a74a0 feat(python,crosstest): mirror BIP-39 mnemonic to Python + add interop scenarios
Completes the three-way BIP-39 mnemonic surface (Rust + Node landed in
0058d9b) and pins down byte-for-byte agreement with crosstest scenarios.

Python (mirrors rust/crates/kez-core/src/mnemonic.rs + nodejs's mnemonic.ts):
  • python/kez/mnemonic.py — generate_mnemonic, seed_from_mnemonic,
    mnemonic_from_seed_24, ed25519_from_mnemonic,
    generate_ed25519_with_mnemonic. Same 24-word-bijection / 12-word-
    SHA-256-domain-tagged semantics. Uses Trezor's `mnemonic` library
    (v0.21) for the BIP-39 wordlist + entropy parsing; deliberately does
    NOT use BIP-39's PBKDF2 to_seed function.
  • python/kez/keys.py — Ed25519Secret.from_mnemonic() +
    generate_with_mnemonic() classmethods; signer_from_flags widened to
    accept --mnemonic.
  • python/kez/cli.py — identity new --mnemonic-words, identity
    mnemonic [--words], identity from-mnemonic; --mnemonic flag on
    claim create/dns and sigchain add/revoke/show/export. Output format
    matches Rust + Node verbatim so the crosstest harness can grep
    Primary/Public/Secret/Mnemonic lines.
  • python/tests/test_mnemonic.py — 19 tests covering all three
    canonical vectors (exact-match Secret + Public hex), round-trip,
    determinism, whitespace tolerance, bad-checksum, bad-word-count,
    the literal domain-tag bytes, and the 12-vs-24 entropy-overlap
    non-collision case.
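
The whitespace tolerance covered by the tests can be pictured with a minimal sketch. It assumes the same normalization that `seed_from_mnemonic` applies before decoding (split on any whitespace, rejoin with single spaces). `normalize_phrase` is a hypothetical helper name used only for illustration.

```python
def normalize_phrase(phrase: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    # str.split() with no arguments splits on any whitespace and drops
    # leading/trailing whitespace, so the rejoined phrase is canonical.
    return " ".join(phrase.split())


# Hypothetical messy input: leading spaces, a tab, a double space, a newline.
messy = "  legal\twinner  thank\nyear "
print(normalize_phrase(messy))
```

Normalizing before decoding means a phrase pasted with stray line breaks or tabs decodes the same way as the clean phrase.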

Note: --mnemonic is NOT added to `sigchain publish` because that
subcommand doesn't exist in the Python CLI yet (rust + node only). When
the publish surface is ported, --mnemonic should follow it the same way.

Ground truth — python/MNEMONIC-TEST-VECTORS.md:
  V1: 24-word zero-entropy phrase ("abandon… art")
      seed   = 0000…0000
      pubkey = 3b6a27bcceb6a42d62a3a8d02a6f0d73653215771de243a63ac048a18b59da29
  V2: 12-word zero-entropy phrase ("abandon… about")
      seed   = 09451c0f06588db78205e32a793536e15ae263c8f9ee6d14f5c6fd82b8bd20da
      pubkey = 9403c32e0d3b4ce51105c0bcac09a0d73be0cca98a6bf7b3cd434651be866d70
  V3: 12-word "legal winner thank year wave sausage worth useful legal winner thank yellow"
      seed   = 9df434a2bd5dc767ee949d8ab95ca09c4ebbb88cefc3d0b1523f6b2a744ca824
      pubkey = cc99d06b15ccb83a5ca43f25dd3d27f50638c1c6fbe3a822352da3e07156ce03

  The domain tag for the 12-word derivation is exactly the 15 ASCII
  bytes of "kez-bip39-12-v1", documented in the spec doc.
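
The entropy-to-seed step behind these vectors can be sketched with `hashlib` alone. This is a minimal illustration under the semantics described above, not the shipped implementation: `seed_from_entropy` and `matches_v2` are hypothetical names, and the wordlist decoding that produces the entropy is left out.

```python
import hashlib

# The 15 ASCII bytes of the 12-word domain tag, as stated above.
DOMAIN_TAG_12 = b"kez-bip39-12-v1"

# V2 seed copied from the vector list above.
V2_SEED_HEX = "09451c0f06588db78205e32a793536e15ae263c8f9ee6d14f5c6fd82b8bd20da"


def seed_from_entropy(entropy: bytes) -> bytes:
    """Map decoded BIP-39 entropy to a 32-byte Ed25519 seed (sketch)."""
    if len(entropy) == 32:
        # 24 words: the entropy is used as the seed unchanged.
        return entropy
    if len(entropy) == 16:
        # 12 words: one domain-tagged SHA-256 pass stretches 16 bytes to 32.
        return hashlib.sha256(DOMAIN_TAG_12 + entropy).digest()
    raise ValueError(f"expected 16 or 32 bytes of entropy, got {len(entropy)}")


def matches_v2() -> bool:
    """Compare the sketch against V2 (zero-entropy 12-word phrase)."""
    return seed_from_entropy(bytes(16)).hex() == V2_SEED_HEX


# Entropy-overlap case: 16 zero bytes and 32 zero bytes share a prefix,
# but the tagged hash keeps the resulting seeds distinct.
assert seed_from_entropy(bytes(16)) != seed_from_entropy(bytes(32))
```

Running `matches_v2()` is a quick way to check the sketch against the V2 seed listed above.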

crosstest.sh — new "BIP-39 mnemonic interop" section:
  • Vector match: each impl × each vector × Public hex == expected (9
    scenarios). Catches any silent derivation drift.
  • Cross-impl claim signing via --mnemonic: every signer ↔ verifier
    pair (rust↔node, rust↔py, node↔py), every format (json/compact/
    markdown). 6 pairings × 3 formats = 18 scenarios.
  • Bijection sanity: the 24-word phrase printed by
    `identity from-mnemonic` round-trips to itself byte-for-byte
    (rust + node).
  • Python-involving scenarios auto-skip if `python/.venv/bin/python
    kez_cli.py identity from-mnemonic` returns non-zero, so the harness
    stays runnable on machines where Python isn't set up.
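
The scenario counts above follow from simple enumeration, sketched below. The labels are illustrative stand-ins, not identifiers taken from crosstest.sh, and signer/verifier pairs are assumed to be ordered (rust→node and node→rust count separately), which is what yields 6 pairings.

```python
from itertools import permutations, product

# Illustrative labels only; crosstest.sh may name these differently.
IMPLS = ("rust", "node", "py")
VECTORS = ("V1", "V2", "V3")
FORMATS = ("json", "compact", "markdown")

# Vector match: every implementation checked against every vector.
vector_checks = list(product(IMPLS, VECTORS))

# Cross-impl signing: every ordered (signer, verifier) pair in every format.
signing_checks = [
    (signer, verifier, fmt)
    for signer, verifier in permutations(IMPLS, 2)
    for fmt in FORMATS
]

print(len(vector_checks), len(signing_checks))
```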

Verified end-to-end: `bash crosstest.sh` reports
  "All 84 scenarios passed."

Test totals across implementations:
  Rust:   114 (9 mnemonic-specific in kez-core)
  Node:    99 (8 mnemonic-specific in @kez/core)
  Python:  19 (mnemonic only; there was no test suite before)
  Crosstest: 84 scenarios end-to-end

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-05 17:50:34 -06:00


"""BIP-39 mnemonic phrases for Ed25519 primary keys.

Mirrors ``rust/crates/kez-core/src/mnemonic.rs`` and
``nodejs/packages/kez-core/src/mnemonic.ts`` byte-for-byte.

Two word counts are supported, with different semantics:

- **24 words** ↔ **32 bytes of entropy** ↔ **Ed25519 seed** (bijection).
  Round-trips perfectly. The entropy *is* the seed.
- **12 words** → **16 bytes of entropy** → **Ed25519 seed**, via
  ``SHA-256("kez-bip39-12-v1" || entropy)``. One-way KEZ-specific
  derivation; you cannot recover a 12-word phrase from a seed.

Wordlist: BIP-39 English. NB: we deliberately do *not* use BIP-39's
``to_seed(passphrase)`` function, which produces a 64-byte seed via
PBKDF2 intended to feed into BIP-32 hierarchical derivation. KEZ has
one identity per phrase, so taking the entropy directly (or hashing it
once for 12-word phrases) is the right primitive.
"""

from __future__ import annotations

import hashlib

from mnemonic import Mnemonic as _Bip39

from .keys import Ed25519Secret

# Domain separator for the 12-word → seed derivation. Bumping this would
# break every existing 12-word KEZ identity, so don't.
DOMAIN_TAG_12: bytes = b"kez-bip39-12-v1"

# Module-level singleton of the English BIP-39 wordlist parser, built once
# at import time.
_M = _Bip39("english")


def _assert_words(n: int) -> None:
    if n not in (12, 24):
        raise ValueError(f"mnemonic word count must be 12 or 24, got {n}")


def generate_mnemonic(words: int) -> str:
    """Generate a fresh BIP-39 mnemonic of the requested length.

    The returned phrase is a space-separated lowercase string from the
    BIP-39 English wordlist. ``words`` must be 12 or 24.
    """
    _assert_words(words)
    # bip39 strength is in bits: 12 words = 128 bits, 24 = 256.
    strength = 256 if words == 24 else 128
    return _M.generate(strength=strength)


def seed_from_mnemonic(phrase: str) -> bytes:
    """Decode a phrase (12 or 24 words) to a 32-byte Ed25519 seed.

    For 24 words the entropy IS the seed; for 12 words the seed is
    ``SHA-256(DOMAIN_TAG_12 || entropy)``.
    """
    # Collapse any run of whitespace so stray tabs/newlines are tolerated.
    trimmed = " ".join(phrase.split())
    try:
        entropy = bytes(_M.to_entropy(trimmed))
    except Exception as exc:  # noqa: BLE001 - wrap as our own error
        raise ValueError(f"invalid mnemonic: {exc}") from exc
    if len(entropy) == 32:
        return entropy
    if len(entropy) == 16:
        return hashlib.sha256(DOMAIN_TAG_12 + entropy).digest()
    raise ValueError(
        f"mnemonic must decode to 16 or 32 bytes of entropy, got {len(entropy)}"
    )


def mnemonic_from_seed_24(seed: bytes) -> str:
    """Inverse of :func:`seed_from_mnemonic` for the 24-word case ONLY.

    There is no inverse for 12-word phrases (hashing is one-way); this
    function always produces 24 words.
    """
    if len(seed) != 32:
        raise ValueError(
            f"mnemonic_from_seed_24: seed must be 32 bytes, got {len(seed)}"
        )
    return _M.to_mnemonic(seed)


def ed25519_from_mnemonic(phrase: str) -> Ed25519Secret:
    """Reconstruct an :class:`Ed25519Secret` from a BIP-39 phrase."""
    return Ed25519Secret(seed_from_mnemonic(phrase))


def generate_ed25519_with_mnemonic(words: int) -> tuple[Ed25519Secret, str]:
    """Generate a fresh Ed25519 identity *and* return its BIP-39 phrase."""
    phrase = generate_mnemonic(words)
    secret = ed25519_from_mnemonic(phrase)
    return secret, phrase
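
For context on why 12 and 24 words correspond to 16 and 32 bytes of entropy, the following is a minimal sketch of the standard BIP-39 entropy-plus-checksum layout using only `hashlib`. It is not how the module above works internally (it delegates to the `mnemonic` library). `word_indices` is a hypothetical helper that returns wordlist indices rather than words, so no wordlist file is needed.

```python
import hashlib


def word_indices(entropy: bytes) -> list[int]:
    """Split entropy + checksum into 11-bit BIP-39 wordlist indices (sketch)."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 256):
        raise ValueError("this sketch only handles 16 or 32 bytes of entropy")
    # BIP-39 appends the first ENT/32 bits of SHA-256(entropy) as a checksum:
    # 4 bits for 128-bit entropy, 8 bits for 256-bit entropy.
    cs_bits = ent_bits // 32
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    # 132 bits -> 12 words, 264 bits -> 24 words, at 11 bits per word.
    n_words = (ent_bits + cs_bits) // 11
    return [
        (combined >> (11 * (n_words - 1 - i))) & 0x7FF
        for i in range(n_words)
    ]


print(len(word_indices(bytes(16))), len(word_indices(bytes(32))))
```

With zero entropy, every index except the last is 0 (the word "abandon"); the last index carries the checksum bits, which is why the zero-entropy phrases end in a different word.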