Completes the three-way BIP-39 mnemonic surface (Rust + Node landed in
0058d9b) and pins down byte-for-byte agreement with crosstest scenarios.
Python (mirrors rust/crates/kez-core/src/mnemonic.rs +
nodejs/packages/kez-core/src/mnemonic.ts):
• python/kez/mnemonic.py — generate_mnemonic, seed_from_mnemonic,
mnemonic_from_seed_24, ed25519_from_mnemonic,
generate_ed25519_with_mnemonic. Same semantics as Rust + Node: a
24-word phrase is a bijection with the 32-byte seed, and a 12-word
phrase goes through the domain-tagged SHA-256 derivation. Uses
Trezor's `mnemonic` library (v0.21) for the BIP-39 wordlist + entropy
parsing; deliberately does NOT use BIP-39's PBKDF2 to_seed function.
• python/kez/keys.py — Ed25519Secret.from_mnemonic() +
generate_with_mnemonic() classmethods; signer_from_flags widened to
accept --mnemonic.
• python/kez/cli.py — identity new --mnemonic-words, identity
mnemonic [--words], identity from-mnemonic; --mnemonic flag on
claim create/dns and sigchain add/revoke/show/export. Output format
matches Rust + Node verbatim so the crosstest harness can grep
Primary/Public/Secret/Mnemonic lines.
• python/tests/test_mnemonic.py — 19 tests covering all three
canonical vectors (exact-match Secret + Public hex), round-trip,
determinism, whitespace tolerance, bad-checksum, bad-word-count,
the literal domain-tag bytes, and the 12-vs-24 entropy-overlap
non-collision case.
Note: --mnemonic is NOT added to `sigchain publish` because that
subcommand doesn't exist in the Python CLI yet (Rust + Node only). When
the publish surface is ported, --mnemonic should be added alongside it.
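To illustrate why the 24-word form is a bijection while the 12-word form is not, here is a minimal stdlib-only sketch of the BIP-39 bit layout at the word-index level (no wordlist lookup). The helper names `entropy_to_indices` and `indices_to_entropy` are hypothetical and are not part of python/kez/mnemonic.py, which delegates this work to the `mnemonic` library.

```python
from __future__ import annotations

import hashlib


def entropy_to_indices(entropy: bytes) -> list[int]:
    """Map 16 or 32 bytes of entropy to 12 or 24 BIP-39 word indices."""
    if len(entropy) not in (16, 32):
        raise ValueError("entropy must be 16 or 32 bytes")
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # 4 checksum bits for 128, 8 for 256
    # The checksum is the leading cs_bits of SHA-256(entropy).
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    # Slice the combined bit string into 11-bit groups, most significant first.
    return [(value >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


def indices_to_entropy(indices: list[int]) -> bytes:
    """Inverse of entropy_to_indices; rejects a wrong checksum."""
    if len(indices) not in (12, 24):
        raise ValueError("expected 12 or 24 word indices")
    if any(not 0 <= i < 2048 for i in indices):
        raise ValueError("word index out of range")
    value = 0
    for idx in indices:
        value = (value << 11) | idx
    total_bits = len(indices) * 11
    cs_bits = total_bits // 33
    entropy = (value >> cs_bits).to_bytes((total_bits - cs_bits) // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    if value & ((1 << cs_bits) - 1) != expected:
        raise ValueError("bad checksum")
    return entropy


# 24 indices carry exactly 32 bytes, so the round trip is lossless.
seed = bytes(range(32))
assert indices_to_entropy(entropy_to_indices(seed)) == seed
```

For a 24-word phrase the 32 recovered bytes are used as the Ed25519 seed directly, so phrase and seed map to each other in both directions. For a 12-word phrase the 16 recovered bytes are hashed to produce the seed, so there is no way back from seed to phrase.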
Ground truth — python/MNEMONIC-TEST-VECTORS.md:
V1: 24-word zero-entropy phrase ("abandon… art")
  seed   = 0000…0000
  pubkey = 3b6a27bcceb6a42d62a3a8d02a6f0d73653215771de243a63ac048a18b59da29
V2: 12-word zero-entropy phrase ("abandon… about")
  seed   = 09451c0f06588db78205e32a793536e15ae263c8f9ee6d14f5c6fd82b8bd20da
  pubkey = 9403c32e0d3b4ce51105c0bcac09a0d73be0cca98a6bf7b3cd434651be866d70
V3: 12-word "legal winner thank year wave sausage worth useful legal winner thank yellow"
  seed   = 9df434a2bd5dc767ee949d8ab95ca09c4ebbb88cefc3d0b1523f6b2a744ca824
  pubkey = cc99d06b15ccb83a5ca43f25dd3d27f50638c1c6fbe3a822352da3e07156ce03
The domain tag for the 12-word derivation is exactly the 15 ASCII
bytes of "kez-bip39-12-v1", documented in the spec doc.
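The 12-word derivation can be reproduced with the standard library alone. The sketch below assumes, as the V2 description states, that the V2 phrase decodes to 16 zero bytes of entropy. The function name `seed_from_entropy_12` is hypothetical (the shipped code does this inside seed_from_mnemonic). The printed hex can be compared against the V2 seed listed above.

```python
import hashlib

# The domain tag quoted above: 15 ASCII bytes, no terminator.
DOMAIN_TAG_12 = b"kez-bip39-12-v1"


def seed_from_entropy_12(entropy: bytes) -> bytes:
    """SHA-256(tag || entropy) for the 16-byte entropy of a 12-word phrase."""
    if len(entropy) != 16:
        raise ValueError(f"expected 16 bytes of entropy, got {len(entropy)}")
    return hashlib.sha256(DOMAIN_TAG_12 + entropy).digest()


# V2 is described as the zero-entropy 12-word phrase, i.e. 16 zero bytes.
v2_seed = seed_from_entropy_12(bytes(16))
print(len(DOMAIN_TAG_12), v2_seed.hex())
```

Because the tag is prepended before hashing, the result differs from a bare SHA-256 of the entropy, which keeps KEZ seeds from colliding with any other scheme that hashes the same 16 bytes.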
crosstest.sh — new "BIP-39 mnemonic interop" section:
• Vector match: each impl × each vector × Public hex == expected (9
scenarios). Catches any silent derivation drift.
• Cross-impl claim signing via --mnemonic: every signer ↔ verifier
pair (rust↔node, rust↔py, node↔py), every format (json/compact/
markdown). 6 pairings × 3 formats = 18 scenarios.
• Bijection sanity: the 24-word phrase printed by
`identity from-mnemonic` round-trips to itself byte-for-byte (rust +
node).
• Python-involving scenarios auto-skip if `python/.venv/bin/python
kez_cli.py identity from-mnemonic` returns non-zero, so the harness
stays runnable on machines where Python isn't set up.
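The scenario counts above follow from simple enumeration. This sketch is illustrative only: the tuples below are hypothetical labels, not identifiers taken from crosstest.sh. It treats each signer/verifier pairing as ordered, which is how three implementations yield six pairings.

```python
from itertools import permutations, product

IMPLS = ("rust", "node", "py")
VECTORS = ("V1", "V2", "V3")
FORMATS = ("json", "compact", "markdown")

# Vector match: every implementation checks every vector's Public hex.
vector_checks = list(product(IMPLS, VECTORS))

# Cross-impl signing: ordered (signer, verifier) pairs, once per format.
signing_checks = [
    (signer, verifier, fmt)
    for signer, verifier in permutations(IMPLS, 2)
    for fmt in FORMATS
]

print(len(vector_checks), len(signing_checks))  # prints "9 18"
```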
Verified end-to-end: `bash crosstest.sh` reports
"All 84 scenarios passed."
Test totals across implementations:
Rust: 114 (9 mnemonic-specific in kez-core)
Node: 99 (8 mnemonic-specific in @kez/core)
Python: 19 (mnemonic only; was no test suite before)
Crosstest: 84 scenarios end-to-end
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
99 lines
3.3 KiB
Python
"""BIP-39 mnemonic phrases for Ed25519 primary keys.

Mirrors ``rust/crates/kez-core/src/mnemonic.rs`` and
``nodejs/packages/kez-core/src/mnemonic.ts`` byte-for-byte.

Two word counts are supported, with different semantics:

- **24 words** ↔ **32 bytes of entropy** ↔ **Ed25519 seed** (bijection).
  Round-trips perfectly. The entropy *is* the seed.

- **12 words** → **16 bytes of entropy** → **Ed25519 seed**, via
  ``SHA-256("kez-bip39-12-v1" || entropy)``. One-way KEZ-specific
  derivation; you cannot recover a 12-word phrase from a seed.

Wordlist: BIP-39 English. NB: we deliberately do *not* use BIP-39's
``to_seed(passphrase)`` function, which produces a 64-byte seed via
PBKDF2, intended to feed into BIP-32 hierarchical derivation. KEZ has
one identity per phrase, so taking the entropy directly (or hashing it
once for 12-word phrases) is the right primitive.
"""

from __future__ import annotations

import hashlib

from mnemonic import Mnemonic as _Bip39

from .keys import Ed25519Secret

# Domain separator for the 12-word → seed derivation. Bumping this would
# break every existing 12-word KEZ identity, so don't.
DOMAIN_TAG_12: bytes = b"kez-bip39-12-v1"

# Module-level singleton of the English BIP-39 wordlist parser, built
# once at import time.
_M = _Bip39("english")


def _assert_words(n: int) -> None:
    if n not in (12, 24):
        raise ValueError(f"mnemonic word count must be 12 or 24, got {n}")


def generate_mnemonic(words: int) -> str:
    """Generate a fresh BIP-39 mnemonic of the requested length.

    The returned phrase is a space-separated lowercase string from the
    BIP-39 English wordlist. ``words`` must be 12 or 24.
    """
    _assert_words(words)
    # bip39 strength is in bits: 12 words = 128 bits, 24 = 256.
    strength = 256 if words == 24 else 128
    return _M.generate(strength=strength)


def seed_from_mnemonic(phrase: str) -> bytes:
    """Decode a phrase (12 or 24 words) to a 32-byte Ed25519 seed.

    For 24 words the entropy IS the seed; for 12 words the seed is
    ``SHA-256(DOMAIN_TAG_12 || entropy)``.
    """
    # Collapse runs of whitespace and strip leading/trailing whitespace.
    trimmed = " ".join(phrase.split())
    try:
        entropy = bytes(_M.to_entropy(trimmed))
    except Exception as exc:  # noqa: BLE001 - wrap as our own error
        raise ValueError(f"invalid mnemonic: {exc}") from exc

    if len(entropy) == 32:
        return entropy
    if len(entropy) == 16:
        return hashlib.sha256(DOMAIN_TAG_12 + entropy).digest()
    raise ValueError(
        f"mnemonic must decode to 16 or 32 bytes of entropy, got {len(entropy)}"
    )


def mnemonic_from_seed_24(seed: bytes) -> str:
    """Inverse of :func:`seed_from_mnemonic` for the 24-word case ONLY.

    There is no inverse for 12-word phrases (hashing is one-way); this
    function always produces 24 words.
    """
    if len(seed) != 32:
        raise ValueError(
            f"mnemonic_from_seed_24: seed must be 32 bytes, got {len(seed)}"
        )
    return _M.to_mnemonic(seed)


def ed25519_from_mnemonic(phrase: str) -> Ed25519Secret:
    """Reconstruct an :class:`Ed25519Secret` from a BIP-39 phrase."""
    return Ed25519Secret(seed_from_mnemonic(phrase))


def generate_ed25519_with_mnemonic(words: int) -> tuple[Ed25519Secret, str]:
    """Generate a fresh Ed25519 identity *and* return its BIP-39 phrase."""
    phrase = generate_mnemonic(words)
    secret = ed25519_from_mnemonic(phrase)
    return secret, phrase