[Docs] Add supported model tables to pretrain_sft advanced tutorial #1728

Open: CyCle1024 wants to merge 3 commits into InternLM:main from CyCle1024:docs/add-supported-models-table
Commits:
- e0c7546 docs(pretrain_sft): add supported model tables and xtuner-sync-suppor… (CyCle1024)
- 2bbd39d docs: fix autodoc_mock_imports and add type hints to scan script (CyCle1024)
- 92d2394 docs(model): fix inheritance hierarchy tree for VL text backbone configs (CyCle1024)
@@ -0,0 +1 @@
../.claude/skills
@@ -0,0 +1,123 @@
---
name: xtuner-sync-supported-models
description: Synchronize xtuner's supported model documentation (docs/en/pretrain_sft/advanced_tutorial/model.md and docs/zh_cn/pretrain_sft/advanced_tutorial/model.md) with the actual Config classes defined under xtuner/v1/model/. Use when (1) new TransformerConfig, MoEConfig, or BaseComposeConfig subclasses are added, removed, or renamed in xtuner/v1/model/, (2) existing model configs change their inheritance hierarchy, scale, or HuggingFace counterpart, or (3) a code review or user request points out that model.md is out of sync with the codebase.
---

# Update XTuner Supported Model Docs

Keep the English and Chinese `model.md` files synchronized with the actual Config classes in `xtuner/v1/model/`.

## Scan the Codebase

Run the bundled scan script from the xtuner project root to discover all Config classes and their inheritance:

```bash
python3 .agents/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py
```

The script outputs JSON with two keys:
- `configs`: list of every `*Config` class under `xtuner/v1/model/` with its parent classes and file path
- `children`: parent-to-children mapping for the hierarchy tree
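For orientation, here is a minimal sketch of that output shape and how it might be queried. The payload below is a hypothetical sample, not real scanner output, and the file paths are illustrative:

```python
import json

# Hypothetical sample mimicking the scanner's output shape; the real output
# lists every *Config class found under xtuner/v1/model/.
sample = json.loads("""
{
  "configs": [
    {"class": "Qwen3DenseConfig", "parents": ["TransformerConfig"],
     "file": "xtuner/v1/model/qwen3.py"},
    {"class": "Qwen3Dense8BConfig", "parents": ["Qwen3DenseConfig"],
     "file": "xtuner/v1/model/qwen3.py"}
  ],
  "children": {
    "TransformerConfig": ["Qwen3DenseConfig"],
    "Qwen3DenseConfig": ["Qwen3Dense8BConfig"]
  }
}
""")

# Typical query when updating the tables: list the direct children of a family base.
print(sample["children"]["Qwen3DenseConfig"])
```

Querying `children` this way gives exactly the rows needed for a concrete-config table under one family base.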
## What to Update

Compare the script output against the two files:
- `docs/en/pretrain_sft/advanced_tutorial/model.md`
- `docs/zh_cn/pretrain_sft/advanced_tutorial/model.md`

Both files share the same structure and must stay in sync:

1. **Base Config Classes** — configs that directly inherit from `TransformerConfig` (or `MoEConfig`) and provide a `from_hf` classmethod for loading HuggingFace weights
2. **Concrete Model Configs** — fixed-scale subclasses of the base configs above
3. **Compose Models** — multimodal configs that inherit from `BaseComposeConfig`
4. **Inheritance Hierarchy** — a text tree showing the full `XTunerBaseModelConfig` hierarchy

### Rules for the Base Config table

Include these direct descendants of `TransformerConfig`/`MoEConfig`:
- `Qwen2DenseConfig`
- `Qwen3DenseConfig`
- `DeepSeekV3Config`
- `GptOssConfig`
- `Qwen3MoEConfig`

Exclude from the base table:
- `MoEConfig` — it is an intermediate base class, not a usable model family
- `Qwen3_5_VLTextMoEConfig` — it is an intermediate base with only one concrete child; its child `Qwen3_5_VLTextMoE35BA3BConfig` belongs under the MoE concrete table

### Rules for the Concrete Model table

Include every concrete subclass that has fixed parameter defaults. For each row note:
- `Config Class`
- `Base Class / Family`
- `Architecture Type`: `Dense`, `MoE`, `Dense (VL backbone)`, `MoE (VL backbone)`
- `Scale / Notes`: parameter count or total/activated size; for VL backbones note "for multimodal"

`DeepSeekV3Config` appears here even though it has no separate base entry (it is both base and concrete).

### Rules for the Compose Models section

Include two sub-tables:
1. **Compose Base Config Classes** — `Qwen3VLBaseConfig`, `InternVLBaseConfig`, `InternS1BaseConfig`
   - `Qwen3VLBaseConfig`: VL model based on Qwen3 text backbone
   - `InternVLBaseConfig`: VL model based on InternViT + Qwen3
   - `InternS1BaseConfig`: Science multimodal model based on InternViT + Qwen3
2. **Concrete Compose Model Configs** — every subclass of the above bases; for each row note the wrapped `Text Config` and scale

### Rules for the Inheritance Hierarchy tree

Rebuild the tree from `XTunerBaseModelConfig` with two top-level branches:

```text
XTunerBaseModelConfig
├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)
│       ├── DeepSeekV3Config
│       ├── GptOssConfig
│       │   ├── GptOss21BA3P6Config
│       │   └── GptOss117BA5P8Config
│       ├── Qwen3MoEConfig
│       │   ├── Qwen3MoE30BA3Config
│       │   ├── Qwen3MoE235BA22Config
│       │   ├── Qwen3MoEFoPEConfig
│       │   ├── Qwen3VLTextMoE30BA3Config
│       │   └── Qwen3VLTextMoE235BA22Config
│       └── Qwen3_5_VLTextMoEConfig
│           └── Qwen3_5_VLTextMoE35BA3BConfig
└── BaseComposeConfig
    ├── Qwen3VLBaseConfig
    │   ├── Qwen3VLMoE30BA3Config
    │   ├── Qwen3VLMoE235BA22Config
    │   ├── Qwen3VLDense4BConfig
    │   ├── Qwen3VLDense8BConfig
    │   └── Qwen3_5_BaseConfig
    │       └── Qwen3_5_VLMoE35BA3Config
    ├── InternVLBaseConfig
    │   ├── InternVL3P5Dense8BConfig
    │   ├── InternVL3P5MoE30BA3Config
    │   └── InternVL3P5Dense1BConfig
    └── InternS1BaseConfig
        ├── InternS1Config
        └── InternS1MiniConfig
```

When new configs are added, insert them into the appropriate branch following the same indentation style.
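The indentation style can also be reproduced mechanically from the scanner's `children` mapping. A minimal sketch, using an abbreviated hypothetical mapping (the real tree additionally groups the Dense/MoE headings, which this omits):

```python
# Render a parent -> children mapping in the box-drawing style used by model.md.
def render(node: str, children: dict[str, list[str]], prefix: str = "") -> list[str]:
    # Only the top-level call emits the root name; recursive calls emit
    # their node via the parent's connector line.
    lines = [node] if prefix == "" else []
    kids = children.get(node, [])
    for i, kid in enumerate(kids):
        last = i == len(kids) - 1
        lines.append(prefix + ("└── " if last else "├── ") + kid)
        # Extend the prefix: a vertical bar while siblings remain, blanks after the last.
        lines.extend(render(kid, children, prefix + ("    " if last else "│   ")))
    return lines

children = {"Qwen3DenseConfig": ["Qwen3Dense8BConfig", "Qwen3Dense4BConfig"]}
print("\n".join(render("Qwen3DenseConfig", children)))
```

This prints the branch with `├──`/`└──` connectors matching the tree above, which makes it easy to verify that a hand-edited branch uses consistent indentation.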
## Translation Notes

Keep the Chinese `model.md` (`docs/zh_cn/...`) structurally identical to the English one. Translate:
- Section headings
- Table header cells
- Description cells (e.g., "Image / Video + Text" → "图像/视频 + 文本")
- Scale descriptions (e.g., "~7B parameters" → "约 7B 参数", "FoPE variant" → "FoPE 变体")

Do **not** translate Config class names, file paths, or code identifiers.
`.claude/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py` (71 additions, 0 deletions)

@@ -0,0 +1,71 @@
```python
#!/usr/bin/env python3
"""Scan xtuner/v1/model for all Config classes and output model info as JSON."""

import json
import re
import sys
from pathlib import Path

# We care about configs that are part of the supported model hierarchy
RELEVANT_BASES = {
    "TransformerConfig",
    "MoEConfig",
    "BaseComposeConfig",
    "XTunerBaseModelConfig",
    # Known intermediate/family bases
    "Qwen2DenseConfig",
    "Qwen3DenseConfig",
    "Qwen3MoEConfig",
    "Qwen3_5_VLTextMoEConfig",
    "GptOssConfig",
    "DeepSeekV3Config",
    "Qwen3VLBaseConfig",
    "Qwen3_5_BaseConfig",
    "InternVLBaseConfig",
    "InternS1BaseConfig",
}


def scan_file(path: Path):
    text = path.read_text(encoding="utf-8")
    # Match class definitions like: class FooConfig(BarConfig):
    pattern = r"^class\s+(\w+Config)\s*\(([^)]+)\):"
    results = []
    for m in re.finditer(pattern, text, re.MULTILINE):
        class_name = m.group(1)
        parents = [p.strip() for p in m.group(2).split(",")]
        results.append({"class": class_name, "parents": parents, "file": str(path)})
    return results


def main():
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    model_dir = root / "xtuner" / "v1" / "model"
    if not model_dir.exists():
        print(f"Model directory not found: {model_dir}", file=sys.stderr)
        sys.exit(1)

    all_configs = []
    for py_file in sorted(model_dir.rglob("*.py")):
        all_configs.extend(scan_file(py_file))

    # Build parent -> children map
    children: dict[str, list[str]] = {}
    for cfg in all_configs:
        for p in cfg["parents"]:
            if p in RELEVANT_BASES or p.endswith("Config"):
                children.setdefault(p, []).append(cfg["class"])

    # Deduplicate
    for k in children:
        children[k] = sorted(set(children[k]))

    output = {
        "configs": all_configs,
        "children": children,
    }
    print(json.dumps(output, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()
```
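To see what the script's class-definition regex actually extracts, here is a self-contained demonstration on a hypothetical source snippet (the classes below are illustrative, not real xtuner code):

```python
import re

# The same pattern scan_model_configs.py uses to find Config class definitions.
pattern = r"^class\s+(\w+Config)\s*\(([^)]+)\):"

# Hypothetical source text standing in for a file under xtuner/v1/model/.
source = """
class Qwen3DenseConfig(TransformerConfig):
    pass

class Qwen3Dense8BConfig(Qwen3DenseConfig):
    pass

class Helper(BaseModel):  # skipped: name does not end in Config
    pass
"""

found = [
    (m.group(1), [p.strip() for p in m.group(2).split(",")])
    for m in re.finditer(pattern, source, re.MULTILINE)
]
print(found)
```

Note one caveat: everything inside the parentheses is treated as a parent, so a keyword argument in the base list (e.g. `metaclass=...`) would also show up in `parents`.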
@@ -1,3 +1,110 @@
# Model

XTuner v1's `TrainEngine` supports a variety of Transformer architectures through different `TransformerConfig` subclasses. The documentation below summarizes the currently supported models (RL-related configs are excluded).
## Base Config Classes

The following table lists the **base config classes** that define each model family. They provide the `from_hf` interface for loading pretrained weights from HuggingFace.

| Base Config Class | Model Family | Architecture Type | HuggingFace Counterpart |
|---|---|---|---|
| `Qwen2DenseConfig` | Qwen2 Dense | Dense | `Qwen2ForCausalLM` |
| `Qwen3DenseConfig` | Qwen3 Dense | Dense | `Qwen3ForCausalLM` |
| `DeepSeekV3Config` | DeepSeek-V3 | MoE | `DeepseekV3ForCausalLM` |
| `GptOssConfig` | GPT-OSS | MoE | `GptOssForCausalLM` |
| `Qwen3MoEConfig` | Qwen3 MoE | MoE | `Qwen3MoeForCausalLM` |

## Concrete Model Configs

The following table lists the **concrete model configs** that inherit from the base classes above. Each config corresponds to a specific model scale or variant.

| Config Class | Base Class / Family | Architecture Type | Scale / Notes |
|---|---|---|---|
| `Qwen2Dense7BConfig` | `Qwen2DenseConfig` | Dense | ~7B parameters |
| `Qwen3Dense8BConfig` | `Qwen3DenseConfig` | Dense | ~8B parameters |
| `Qwen3Dense4BConfig` | `Qwen3DenseConfig` | Dense | ~4B parameters |
| `Qwen3Dense0P6BConfig` | `Qwen3DenseConfig` | Dense | ~0.6B parameters |
| `Qwen3VLTextDense4BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~4B parameters, for multimodal |
| `Qwen3VLTextDense8BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~8B parameters, for multimodal |
| `DeepSeekV3Config` | — | MoE | ~671B total / ~37B activated |
| `GptOss21BA3P6Config` | `GptOssConfig` | MoE | ~21B total / ~3.6B activated |
| `GptOss117BA5P8Config` | `GptOssConfig` | MoE | ~117B total / ~5.8B activated |
| `Qwen3MoE30BA3Config` | `Qwen3MoEConfig` | MoE | ~30B total / ~3B activated |
| `Qwen3MoE235BA22Config` | `Qwen3MoEConfig` | MoE | ~235B total / ~22B activated |
| `Qwen3MoEFoPEConfig` | `Qwen3MoEConfig` | MoE | FoPE (Frequency-based Position Embedding) variant |
| `Qwen3VLTextMoE30BA3Config` | `Qwen3MoEConfig` | MoE (VL backbone) | ~30B total, for multimodal |
| `Qwen3VLTextMoE235BA22Config` | `Qwen3MoEConfig` | MoE (VL backbone) | ~235B total, for multimodal |
| `Qwen3_5_VLTextMoE35BA3BConfig` | `Qwen3_5_VLTextMoEConfig` | MoE (VL backbone) | ~35B total / ~3B activated, for multimodal |

## Compose Models

In addition to pure text models, XTuner also supports **multimodal compose models** that combine a vision encoder, a projector, and a language model. These configs inherit from `BaseComposeConfig` rather than `TransformerConfig` directly, but they wrap the text configs listed above.

### Compose Base Config Classes

| Base Config Class | Model Family | Modality | Description |
|---|---|---|---|
| `Qwen3VLBaseConfig` | Qwen3-VL | Image / Video + Text | VL model based on Qwen3 text backbone |
| `InternVLBaseConfig` | InternVL | Image + Text | VL model based on InternViT + Qwen3 |
| `InternS1BaseConfig` | InternS1 | Image + Text | Science multimodal model based on InternViT + Qwen3 |

### Concrete Compose Model Configs

| Config Class | Compose Base / Family | Text Config | Scale / Notes |
|---|---|---|---|
| `Qwen3VLMoE30BA3Config` | `Qwen3VLBaseConfig` | `Qwen3VLTextMoE30BA3Config` | ~30B total, MoE VL |
| `Qwen3VLMoE235BA22Config` | `Qwen3VLBaseConfig` | `Qwen3VLTextMoE235BA22Config` | ~235B total, MoE VL |
| `Qwen3VLDense4BConfig` | `Qwen3VLBaseConfig` | `Qwen3VLTextDense4BConfig` | ~4B parameters, Dense VL |
| `Qwen3VLDense8BConfig` | `Qwen3VLBaseConfig` | `Qwen3VLTextDense8BConfig` | ~8B parameters, Dense VL |
| `Qwen3_5_VLMoE35BA3Config` | `Qwen3_5_BaseConfig` | `Qwen3_5_VLTextMoE35BA3BConfig` | ~35B total / ~3B activated, MoE VL |
| `InternVL3P5Dense8BConfig` | `InternVLBaseConfig` | `Qwen3Dense8BConfig` | ~8B parameters, Dense VL |
| `InternVL3P5MoE30BA3Config` | `InternVLBaseConfig` | `Qwen3MoE30BA3Config` | ~30B total, MoE VL |
| `InternVL3P5Dense1BConfig` | `InternVLBaseConfig` | `Qwen3Dense0P6BConfig` | ~1B parameters, Dense VL |
| `InternS1Config` | `InternS1BaseConfig` | `Qwen3MoE235BA22Config` | ~235B total, MoE multimodal |
| `InternS1MiniConfig` | `InternS1BaseConfig` | `Qwen3Dense8BConfig` | ~8B parameters, Dense multimodal |

## Inheritance Hierarchy

The following diagram shows the complete inheritance hierarchy of all config classes supported by `TrainEngine`, including both `TransformerConfig` and `BaseComposeConfig` branches.

```text
XTunerBaseModelConfig
├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)
│       ├── DeepSeekV3Config
│       ├── GptOssConfig
│       │   ├── GptOss21BA3P6Config
│       │   └── GptOss117BA5P8Config
│       ├── Qwen3MoEConfig
│       │   ├── Qwen3MoE30BA3Config
│       │   ├── Qwen3MoE235BA22Config
│       │   ├── Qwen3MoEFoPEConfig
│       │   ├── Qwen3VLTextMoE30BA3Config
│       │   └── Qwen3VLTextMoE235BA22Config
│       └── Qwen3_5_VLTextMoEConfig
│           └── Qwen3_5_VLTextMoE35BA3BConfig
└── BaseComposeConfig
    ├── Qwen3VLBaseConfig
    │   ├── Qwen3VLMoE30BA3Config
    │   ├── Qwen3VLMoE235BA22Config
    │   ├── Qwen3VLDense4BConfig
    │   ├── Qwen3VLDense8BConfig
    │   └── Qwen3_5_BaseConfig
    │       └── Qwen3_5_VLMoE35BA3Config
    ├── InternVLBaseConfig
    │   ├── InternVL3P5Dense8BConfig
    │   ├── InternVL3P5MoE30BA3Config
    │   └── InternVL3P5Dense1BConfig
    └── InternS1BaseConfig
        ├── InternS1Config
        └── InternS1MiniConfig
```