feat(pageindex): add .txt parsing support #252

Open
EthanGuo-coder wants to merge 1 commit into VectifyAI:main from EthanGuo-coder:fix/issue-222

@EthanGuo-coder EthanGuo-coder commented May 2, 2026

Summary

Add .txt parsing support. A new pageindex/page_index_txt.py module exposes async txt_to_tree(), which reads the file as UTF-8 (falling back to latin-1) and emits a single-root-node tree mirroring md_to_tree's output shape. It is wired into the package exports, PageIndexClient.index() (new mode='txt' branch and auto-detection), and the run_pageindex.py CLI via --txt_path.
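
For readers who want a feel for the behavior described above, here is a minimal sketch of a UTF-8 read with latin-1 fallback that returns a single-root-node tree. It is not the code in this PR: the function names `read_text_with_fallback` and `txt_to_tree_sketch` and the node fields (`title`, `node_id`, `line_num`, `text`) are assumptions made for illustration. Only the top-level keys (`doc_name`, `line_count`, `structure`) come from the PR description.

```python
import asyncio
import os


def read_text_with_fallback(path: str) -> str:
    """Decode a file as UTF-8; on failure, fall back to latin-1.

    latin-1 maps every byte to a code point, so the fallback cannot raise.
    """
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


async def txt_to_tree_sketch(txt_path: str) -> dict:
    """Hypothetical stand-in for txt_to_tree(): one root node holding the whole file."""
    # Run the blocking read in a worker thread so the event loop stays free.
    text = await asyncio.to_thread(read_text_with_fallback, txt_path)
    doc_name = os.path.splitext(os.path.basename(txt_path))[0]

    # Node field names below are assumptions, not taken from the PR.
    root = {
        "title": doc_name,
        "node_id": "0000",
        "line_num": 1,
        "text": text,
    }
    return {
        "doc_name": doc_name,
        "line_count": len(text.splitlines()),
        "structure": [root],
    }
```

Under these assumptions, `asyncio.run(txt_to_tree_sketch("notes.txt"))` would return a dict whose `structure` list contains exactly one node with the full file text.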

Fixes #222

Test plan

  • Run the following tests from tests/test_page_index_txt.py locally:
      • test_txt_to_tree_parses_plain_text_into_single_node
      • test_txt_to_tree_preserves_utf8_content
      • test_txt_to_tree_includes_line_count
      • test_txt_to_tree_exposed_from_package
      • test_client_dispatches_txt_extension
  • Run the project's full test suite locally (validator-agent confirmed)
  • CI green on this branch

Fixes VectifyAI#222

Add a new pageindex/page_index_txt.py module exposing async txt_to_tree(),
which reads UTF-8 (with latin-1 fallback) and returns a single-root-node
tree mirroring md_to_tree's output shape (doc_name, line_count, structure).
Wired into pageindex package exports, PageIndexClient.index() (new
mode='txt' branch and is_txt auto-detection; _make_meta_entry treats txt
like md for line_count), and run_pageindex.py CLI (--txt_path with the
same validation/save flow as --md_path).

Co-Authored-By: Claude <noreply@anthropic.com>
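
The extension-based auto-detection mentioned in the commit message could be sketched as follows. This is an illustration under stated assumptions, not the logic in PageIndexClient.index(): the helper name `detect_mode`, the `"auto"` sentinel, and the extension table are all hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from file extension to parsing mode.
_EXT_TO_MODE = {
    ".pdf": "pdf",
    ".md": "md",
    ".markdown": "md",
    ".txt": "txt",
}


def detect_mode(file_path: str, mode: str = "auto") -> str:
    """Return an explicit mode unchanged; otherwise infer it from the extension."""
    if mode != "auto":
        return mode
    ext = Path(file_path).suffix.lower()  # case-insensitive match, e.g. ".TXT"
    if ext not in _EXT_TO_MODE:
        raise ValueError(f"Unsupported file type: {ext or '(no extension)'}")
    return _EXT_TO_MODE[ext]
```

Keeping the mapping in one table means adding a format touches a single line, and an explicit `mode` argument always overrides detection.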

Development

Successfully merging this pull request may close these issues.

Add support for parsing .txt files