chore: initialize EverOS 1.0.0
md-first memory extraction framework for AI agents. Markdown is the single source of truth; SQLite holds state and LanceDB provides the rebuildable vector + BM25 + scalar index.

The codebase follows a single-direction DDD layering (entrypoints -> service -> memory -> infra, with component / core / config cross-cutting) enforced by import-linter.

Engineering surface:
- Coding conventions in .claude/rules/ (path-scoped) and workflows in .claude/skills/ (/commit, /new-branch, /pr).
- GitHub Actions CI runs make lint + test + integration; pre-commit mirrors the gates locally (ruff, hygiene hooks, gitlint commit-msg).
- Commit messages follow Conventional Commits, enforced by gitlint.
- make lint also enforces datetime two-zone discipline and checks for OpenAPI drift.
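The single-direction layering rule above can be modelled with a small, hypothetical checker. This is not import-linter's API (import-linter expresses the rule as a `layers` contract in its own configuration); it only sketches the direction check, assuming that modules live under the `everos` package, that a higher layer may import any lower one (skipping layers allowed), that the cross-cutting packages may be imported from anywhere, and that cross-cutting packages do not import layered ones. `LAYERS`, `CROSS_CUTTING`, `layer_of`, and `is_allowed` are illustrative names.

```python
"""Hypothetical sketch of a single-direction layer rule (not import-linter's API)."""

from __future__ import annotations

# Layers ordered from highest to lowest; a layer may import the ones after it.
LAYERS: list[str] = ["entrypoints", "service", "memory", "infra"]
# Cross-cutting packages that any layer may import (taken from the commit text).
CROSS_CUTTING: frozenset[str] = frozenset({"component", "core", "config"})


def layer_of(module: str) -> str:
    """Return the top-level subpackage under ``everos`` for a dotted module path."""
    parts = module.split(".")
    if len(parts) < 2 or parts[0] != "everos":
        raise ValueError(f"not an everos module: {module!r}")
    return parts[1]


def is_allowed(importer: str, imported: str) -> bool:
    """True if ``importer`` may import ``imported`` under the assumed rule."""
    src, dst = layer_of(importer), layer_of(imported)
    if src == dst or dst in CROSS_CUTTING:
        # Intra-package imports and imports of cross-cutting code are always fine.
        return True
    if src in CROSS_CUTTING:
        # Assumption: cross-cutting packages must not reach into layered ones.
        return False
    # Downward only: the importer must sit above the imported layer.
    # (An unknown subpackage raises ValueError from list.index.)
    return LAYERS.index(src) < LAYERS.index(dst)
```

For example, `is_allowed("everos.infra.lancedb", "everos.service.cascade")` is `False`, because infra sits below service.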
src/everos/component/tokenizer/factory.py
Normal file
@@ -0,0 +1,17 @@
"""Factory for the cascade-time tokenizer.

Single implementation today (``JiebaTokenizer``). Lifting this into a
factory keeps callers (cascade handler) decoupled from the concrete
choice, so swapping to char-bigram / hf tokenizer later is a one-file
change (see ``17_lancedb_tables_design.md`` §2.4.1).
"""

from __future__ import annotations

from .jieba_provider import JiebaTokenizer
from .protocol import Tokenizer


def build_tokenizer() -> Tokenizer:
    """Build the default tokenizer (``JiebaTokenizer``)."""
    return JiebaTokenizer()
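To illustrate the one-file swap described in the module docstring, here is a self-contained sketch. It assumes the `Tokenizer` protocol exposes a single `tokenize(text) -> list[str]` method; the actual contents of `protocol.py` are not shown in this commit. `CharBigramTokenizer` and `index_terms` are hypothetical names, and the char-bigram scheme (overlapping two-character windows within each whitespace-separated run) is one common choice, not necessarily the one the design doc specifies.

```python
"""Hypothetical sketch: swapping the tokenizer behind build_tokenizer()."""

from __future__ import annotations

from typing import Protocol


class Tokenizer(Protocol):
    """Assumed interface; the real ``protocol.Tokenizer`` may differ."""

    def tokenize(self, text: str) -> list[str]: ...


class CharBigramTokenizer:
    """Hypothetical char-bigram tokenizer using overlapping 2-char windows."""

    def tokenize(self, text: str) -> list[str]:
        tokens: list[str] = []
        # Bigram each whitespace-separated run so no token spans a space.
        for run in text.split():
            if len(run) == 1:
                tokens.append(run)  # a lone character is its own token
            else:
                tokens.extend(run[i : i + 2] for i in range(len(run) - 1))
        return tokens


def build_tokenizer() -> Tokenizer:
    """Same signature as the real factory; only this body changes on a swap."""
    return CharBigramTokenizer()


def index_terms(text: str, tokenizer: Tokenizer | None = None) -> list[str]:
    """Hypothetical caller: depends on the protocol, never on a concrete class."""
    tok = tokenizer if tokenizer is not None else build_tokenizer()
    return sorted(set(tok.tokenize(text)))
```

Because `index_terms` only sees the `Tokenizer` protocol, switching back to `JiebaTokenizer` would touch the body of `build_tokenizer` alone.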