EverOS/src/everos/component/tokenizer/factory.py
Elliot Chen 518b8eca85 chore: initialize EverOS 1.0.0
md-first memory extraction framework for AI agents.

Markdown is the single source of truth; SQLite holds state and LanceDB
provides the rebuildable vector + BM25 + scalar index. The codebase follows
a single-direction DDD layering (entrypoints -> service -> memory -> infra,
with component / core / config cross-cutting) enforced by import-linter.

Engineering surface:
- Coding conventions in .claude/rules/ (path-scoped) and workflows in
  .claude/skills/ (/commit, /new-branch, /pr).
- GitHub Actions CI runs make lint + test + integration; pre-commit mirrors
  the gates locally (ruff, hygiene hooks, gitlint commit-msg).
- Commit messages follow Conventional Commits, enforced by gitlint.
- make lint also enforces datetime two-zone discipline and OpenAPI drift.
2026-06-06 07:33:17 +08:00


"""Factory for the cascade-time tokenizer.

Single implementation today (``JiebaTokenizer``). Lifting this into a
factory keeps callers (cascade handler) decoupled from the concrete
choice, so swapping to char-bigram / hf tokenizer later is a one-file
change; see ``17_lancedb_tables_design.md`` §2.4.1.
"""

from __future__ import annotations

from .jieba_provider import JiebaTokenizer
from .protocol import Tokenizer


def build_tokenizer() -> Tokenizer:
    """Build the default tokenizer (``JiebaTokenizer``)."""
    return JiebaTokenizer()
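
The decoupling described in the module docstring can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the `Tokenizer` protocol exposes a single `tokenize(text) -> list[str]` method (the real `protocol.py` is not shown here), and `CharBigramTokenizer` and `index_text` are hypothetical names standing in for a future alternative implementation and for the cascade handler.

```python
from typing import Protocol


class Tokenizer(Protocol):
    """Assumed shape of the protocol; the real one lives in protocol.py."""

    def tokenize(self, text: str) -> list[str]: ...


class CharBigramTokenizer:
    """Hypothetical char-bigram tokenizer, used only for illustration."""

    def tokenize(self, text: str) -> list[str]:
        # Drop whitespace, then emit every pair of adjacent characters.
        chars = [c for c in text if not c.isspace()]
        if len(chars) < 2:
            return ["".join(chars)] if chars else []
        return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def build_tokenizer() -> Tokenizer:
    """Return the default tokenizer; a swap only changes this body."""
    return CharBigramTokenizer()


def index_text(text: str) -> list[str]:
    """Stand-in for a caller: it depends on the factory, not on a class."""
    return build_tokenizer().tokenize(text)


if __name__ == "__main__":
    print(index_text("ab c"))
```

Because `index_text` only sees the `Tokenizer` return type, replacing the class returned by `build_tokenizer` would not require touching any caller, which is the one-file change the docstring refers to.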