1 change: 1 addition & 0 deletions .agents/skills
123 changes: 123 additions & 0 deletions .claude/skills/xtuner-sync-supported-models/SKILL.md
@@ -0,0 +1,123 @@
---
name: xtuner-sync-supported-models
description: Synchronize xtuner's supported model documentation (docs/en/pretrain_sft/advanced_tutorial/model.md and docs/zh_cn/pretrain_sft/advanced_tutorial/model.md) with the actual Config classes defined under xtuner/v1/model/. Use when (1) new TransformerConfig, MoEConfig, or BaseComposeConfig subclasses are added, removed, or renamed in xtuner/v1/model/, (2) existing model configs change their inheritance hierarchy, scale, or HuggingFace counterpart, or (3) a code review or user request points out that model.md is out of sync with the codebase.
---

# Update XTuner Supported Model Docs

Keep the English and Chinese `model.md` files synchronized with the actual Config classes in `xtuner/v1/model/`.

## Scan the Codebase

Run the bundled scan script from the xtuner project root to discover all Config classes and their inheritance:

```bash
python3 .agents/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py
```
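
The script also accepts the project root as an optional first argument and defaults to the current directory, so it can be run from elsewhere (the checkout path below is just a placeholder):

```bash
python3 .agents/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py /path/to/xtuner
```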

The script outputs JSON with two keys:
- `configs`: list of every `*Config` class under `xtuner/v1/model/` with its parent classes and file path
- `children`: parent-to-children mapping for the hierarchy tree
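
For illustration, the output has roughly this shape (abbreviated; the `file` value shown here is hypothetical):

```json
{
  "configs": [
    {
      "class": "Qwen3Dense8BConfig",
      "parents": ["Qwen3DenseConfig"],
      "file": "xtuner/v1/model/qwen3_dense.py"
    }
  ],
  "children": {
    "Qwen3DenseConfig": ["Qwen3Dense0P6BConfig", "Qwen3Dense4BConfig", "Qwen3Dense8BConfig"]
  }
}
```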

## What to Update

Compare the script output against the two files:
- `docs/en/pretrain_sft/advanced_tutorial/model.md`
- `docs/zh_cn/pretrain_sft/advanced_tutorial/model.md`

Both files share the same structure and must stay in sync:

1. **Base Config Classes** — configs that directly inherit from `TransformerConfig` (or `MoEConfig`) and provide a `from_hf` classmethod for loading HuggingFace weights
2. **Concrete Model Configs** — fixed-scale subclasses of the base configs above
3. **Compose Models** — multimodal configs that inherit from `BaseComposeConfig`
4. **Inheritance Hierarchy** — a text tree showing the full `XTunerBaseModelConfig` hierarchy

### Rules for the Base Config table

Include these direct descendants of `TransformerConfig`/`MoEConfig`:
- `Qwen2DenseConfig`
- `Qwen3DenseConfig`
- `DeepSeekV3Config`
- `GptOssConfig`
- `Qwen3MoEConfig`

Exclude from the base table:
- `MoEConfig` — it is an intermediate base class, not a usable model family
- `Qwen3_5_VLTextMoEConfig` — it is an intermediate base with only one concrete child; its child `Qwen3_5_VLTextMoE35BA3BConfig` belongs under the MoE concrete table
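
These rules can be applied mechanically to the scan output. A minimal sketch follows; the exclusion set and variable names are illustrative and not part of the bundled script:

```python
# Sketch: derive the Base Config table rows from the scan output,
# applying the include/exclude rules above.
import json
import subprocess

SCRIPT = ".agents/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py"
EXCLUDED_INTERMEDIATES = {"MoEConfig", "Qwen3_5_VLTextMoEConfig"}

scan = subprocess.run(["python3", SCRIPT], capture_output=True, text=True, check=True)
children = json.loads(scan.stdout)["children"]

# Direct descendants of the two family roots, minus the intermediate bases
base_table = sorted(
    cls
    for parent in ("TransformerConfig", "MoEConfig")
    for cls in children.get(parent, [])
    if cls not in EXCLUDED_INTERMEDIATES
)
print(base_table)
```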

### Rules for the Concrete Model table

Include every concrete subclass that has fixed parameter defaults. For each row note:
- `Config Class`
- `Base Class / Family`
- `Architecture Type`: `Dense`, `MoE`, `Dense (VL backbone)`, `MoE (VL backbone)`
- `Scale / Notes`: parameter count or total/activated size; for VL backbones note "for multimodal"

`DeepSeekV3Config` appears here even though it has no separate base entry (it is both base and concrete).

### Rules for the Compose Models section

Include two sub-tables:
1. **Compose Base Config Classes** — `Qwen3VLBaseConfig`, `InternVLBaseConfig`, `InternS1BaseConfig`
- `Qwen3VLBaseConfig`: VL model based on Qwen3 text backbone
- `InternVLBaseConfig`: VL model based on InternViT + Qwen3
- `InternS1BaseConfig`: Science multimodal model based on InternViT + Qwen3
2. **Concrete Compose Model Configs** — every subclass of the above bases; for each row note the wrapped `Text Config` and scale

### Rules for the Inheritance Hierarchy tree

Rebuild the tree from `XTunerBaseModelConfig` with two top-level branches:

```text
XTunerBaseModelConfig
├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)
│       ├── DeepSeekV3Config
│       ├── GptOssConfig
│       │   ├── GptOss21BA3P6Config
│       │   └── GptOss117BA5P8Config
│       ├── Qwen3MoEConfig
│       │   ├── Qwen3MoE30BA3Config
│       │   ├── Qwen3MoE235BA22Config
│       │   ├── Qwen3MoEFoPEConfig
│       │   ├── Qwen3VLTextMoE30BA3Config
│       │   └── Qwen3VLTextMoE235BA22Config
│       └── Qwen3_5_VLTextMoEConfig
│           └── Qwen3_5_VLTextMoE35BA3BConfig
└── BaseComposeConfig
    ├── Qwen3VLBaseConfig
    │   ├── Qwen3VLMoE30BA3Config
    │   ├── Qwen3VLMoE235BA22Config
    │   ├── Qwen3VLDense4BConfig
    │   ├── Qwen3VLDense8BConfig
    │   └── Qwen3_5_BaseConfig
    │       └── Qwen3_5_VLMoE35BA3Config
    ├── InternVLBaseConfig
    │   ├── InternVL3P5Dense8BConfig
    │   ├── InternVL3P5MoE30BA3Config
    │   └── InternVL3P5Dense1BConfig
    └── InternS1BaseConfig
        ├── InternS1Config
        └── InternS1MiniConfig
```

When new configs are added, insert them into the appropriate branch following the same indentation style.
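
To sanity-check the tree against the scan output, a throwaway renderer like the one below can help. Note that the grouping labels in the documented tree ("Dense Models", "MoE Models (via MoEConfig)") are hand-written and not real classes, so they must be re-inserted manually; `print_tree` is an illustrative helper, not part of the bundled script:

```python
# Sketch: print the raw config hierarchy from the scan output's "children" map.
import json
import subprocess

SCRIPT = ".agents/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py"
scan = subprocess.run(["python3", SCRIPT], capture_output=True, text=True, check=True)
children = json.loads(scan.stdout)["children"]


def print_tree(node: str, prefix: str = "") -> None:
    kids = children.get(node, [])
    for i, kid in enumerate(kids):
        last = i == len(kids) - 1
        print(f"{prefix}{'└── ' if last else '├── '}{kid}")
        print_tree(kid, prefix + ("    " if last else "│   "))


print("XTunerBaseModelConfig")
print_tree("XTunerBaseModelConfig")
```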

## Translation Notes

Keep the Chinese `model.md` (`docs/zh_cn/...`) structurally identical to the English one. Translate:
- Section headings
- Table header cells
- Description cells (e.g., "Image / Video + Text" → "图像/视频 + 文本")
- Scale descriptions (e.g., "~7B parameters" → "约 7B 参数", "FoPE variant" → "FoPE 变体")

Do **not** translate Config class names, file paths, or code identifiers.
71 changes: 71 additions & 0 deletions .claude/skills/xtuner-sync-supported-models/scripts/scan_model_configs.py
@@ -0,0 +1,71 @@
#!/usr/bin/env python3
"""Scan xtuner/v1/model for all Config classes and output model info as JSON."""

import json
import re
import sys
from pathlib import Path

# We care about configs that are part of the supported model hierarchy
RELEVANT_BASES = {
    "TransformerConfig",
    "MoEConfig",
    "BaseComposeConfig",
    "XTunerBaseModelConfig",
    # Known intermediate/family bases
    "Qwen2DenseConfig",
    "Qwen3DenseConfig",
    "Qwen3MoEConfig",
    "Qwen3_5_VLTextMoEConfig",
    "GptOssConfig",
    "DeepSeekV3Config",
    "Qwen3VLBaseConfig",
    "Qwen3_5_BaseConfig",
    "InternVLBaseConfig",
    "InternS1BaseConfig",
}


def scan_file(path: Path):
    text = path.read_text()
    # Match class definitions like: class FooConfig(BarConfig):
    pattern = r"^class\s+(\w+Config)\s*\(([^)]+)\):"
    results = []
    for m in re.finditer(pattern, text, re.MULTILINE):
        class_name = m.group(1)
        parents = [p.strip() for p in m.group(2).split(",")]
        results.append({"class": class_name, "parents": parents, "file": str(path)})
    return results


def main():
    # Project root may be passed as the first CLI argument; defaults to CWD
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    model_dir = root / "xtuner" / "v1" / "model"
    if not model_dir.exists():
        print(f"Model directory not found: {model_dir}", file=sys.stderr)
        sys.exit(1)

    all_configs = []
    for py_file in sorted(model_dir.rglob("*.py")):
        all_configs.extend(scan_file(py_file))

    # Build parent -> children map
    children: dict[str, list[str]] = {}
    for cfg in all_configs:
        for p in cfg["parents"]:
            if p in RELEVANT_BASES or p.endswith("Config"):
                children.setdefault(p, []).append(cfg["class"])

    # Deduplicate
    for k in children:
        children[k] = sorted(set(children[k]))

    output = {
        "configs": all_configs,
        "children": children,
    }
    print(json.dumps(output, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()
109 changes: 108 additions & 1 deletion docs/en/pretrain_sft/advanced_tutorial/model.md
@@ -1,3 +1,110 @@
# Model

XTuner v1's `TrainEngine` supports a variety of Transformer architectures through different `TransformerConfig` subclasses. The documentation below summarizes the currently supported models (RL-related configs are excluded).

## Base Config Classes

The following table lists the **base config classes** that define each model family. They provide the `from_hf` interface for loading pretrained weights from HuggingFace.

| Base Config Class | Model Family | Architecture Type | HuggingFace Counterpart |
|---|---|---|---|
| `Qwen2DenseConfig` | Qwen2 Dense | Dense | `Qwen2ForCausalLM` |
| `Qwen3DenseConfig` | Qwen3 Dense | Dense | `Qwen3ForCausalLM` |
| `DeepSeekV3Config` | DeepSeek-V3 | MoE | `DeepseekV3ForCausalLM` |
| `GptOssConfig` | GPT-OSS | MoE | `GptOssForCausalLM` |
| `Qwen3MoEConfig` | Qwen3 MoE | MoE | `Qwen3MoeForCausalLM` |
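
As a quick orientation, loading from a pretrained checkpoint looks roughly like the sketch below. The exact `from_hf` signature is not reproduced in this document, so the import path, argument, and return value here are assumptions to be checked against the code:

```python
# Hypothetical sketch only; verify against xtuner/v1/model before relying on it.
from xtuner.v1.model import Qwen3DenseConfig  # import location assumed

# Assumed: `from_hf` is a classmethod taking a HuggingFace checkpoint path.
cfg = Qwen3DenseConfig.from_hf("Qwen/Qwen3-8B")  # model id is an example
```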

## Concrete Model Configs

The following table lists the **concrete model configs** that inherit from the base classes above. Each config corresponds to a specific model scale or variant.

| Config Class | Base Class / Family | Architecture Type | Scale / Notes |
|---|---|---|---|
| `Qwen2Dense7BConfig` | `Qwen2DenseConfig` | Dense | ~7B parameters |
| `Qwen3Dense8BConfig` | `Qwen3DenseConfig` | Dense | ~8B parameters |
| `Qwen3Dense4BConfig` | `Qwen3DenseConfig` | Dense | ~4B parameters |
| `Qwen3Dense0P6BConfig` | `Qwen3DenseConfig` | Dense | ~0.6B parameters |
| `Qwen3VLTextDense4BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~4B parameters, for multimodal |
| `Qwen3VLTextDense8BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~8B parameters, for multimodal |
| `DeepSeekV3Config` | — | MoE | ~671B total / ~37B activated |
| `GptOss21BA3P6Config` | `GptOssConfig` | MoE | ~21B total / ~3.6B activated |
| `GptOss117BA5P8Config` | `GptOssConfig` | MoE | ~117B total / ~5.8B activated |
| `Qwen3MoE30BA3Config` | `Qwen3MoEConfig` | MoE | ~30B total / ~3B activated |
| `Qwen3MoE235BA22Config` | `Qwen3MoEConfig` | MoE | ~235B total / ~22B activated |
| `Qwen3MoEFoPEConfig` | `Qwen3MoEConfig` | MoE | FoPE (Frequency-based Position Embedding) variant |
| `Qwen3VLTextMoE30BA3Config` | `Qwen3MoEConfig` | MoE (VL backbone) | ~30B total, for multimodal |
| `Qwen3VLTextMoE235BA22Config` | `Qwen3MoEConfig` | MoE (VL backbone) | ~235B total, for multimodal |
| `Qwen3_5_VLTextMoE35BA3BConfig` | `Qwen3_5_VLTextMoEConfig` | MoE (VL backbone) | ~35B total / ~3B activated, for multimodal |

## Compose Models

In addition to pure text models, XTuner also supports **multimodal compose models** that combine a vision encoder, a projector, and a language model. These configs inherit from `BaseComposeConfig` rather than `TransformerConfig` directly, but they wrap the text configs listed above.

### Compose Base Config Classes

| Base Config Class | Model Family | Modality | Description |
|---|---|---|---|
| `Qwen3VLBaseConfig` | Qwen3-VL | Image / Video + Text | VL model based on Qwen3 text backbone |
| `InternVLBaseConfig` | InternVL | Image + Text | VL model based on InternViT + Qwen3 |
| `InternS1BaseConfig` | InternS1 | Image + Text | Science multimodal model based on InternViT + Qwen3 |

### Concrete Compose Model Configs

| Config Class | Compose Base / Family | Text Config | Scale / Notes |
|---|---|---|---|
| `Qwen3VLMoE30BA3Config` | `Qwen3VLBaseConfig` | `Qwen3VLTextMoE30BA3Config` | ~30B total, MoE VL |
| `Qwen3VLMoE235BA22Config` | `Qwen3VLBaseConfig` | `Qwen3VLTextMoE235BA22Config` | ~235B total, MoE VL |
| `Qwen3VLDense4BConfig` | `Qwen3VLBaseConfig` | `Qwen3VLTextDense4BConfig` | ~4B parameters, Dense VL |
| `Qwen3VLDense8BConfig` | `Qwen3VLBaseConfig` | `Qwen3VLTextDense8BConfig` | ~8B parameters, Dense VL |
| `Qwen3_5_VLMoE35BA3Config` | `Qwen3_5_BaseConfig` | `Qwen3_5_VLTextMoE35BA3BConfig` | ~35B total / ~3B activated, MoE VL |
| `InternVL3P5Dense8BConfig` | `InternVLBaseConfig` | `Qwen3Dense8BConfig` | ~8B parameters, Dense VL |
| `InternVL3P5MoE30BA3Config` | `InternVLBaseConfig` | `Qwen3MoE30BA3Config` | ~30B total, MoE VL |
| `InternVL3P5Dense1BConfig` | `InternVLBaseConfig` | `Qwen3Dense0P6BConfig` | ~1B parameters, Dense VL |
| `InternS1Config` | `InternS1BaseConfig` | `Qwen3MoE235BA22Config` | ~235B total, MoE multimodal |
| `InternS1MiniConfig` | `InternS1BaseConfig` | `Qwen3Dense8BConfig` | ~8B parameters, Dense multimodal |

## Inheritance Hierarchy

The following diagram shows the complete inheritance hierarchy of all config classes supported by `TrainEngine`, including both `TransformerConfig` and `BaseComposeConfig` branches.

```text
XTunerBaseModelConfig
├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)
│       ├── DeepSeekV3Config
│       ├── GptOssConfig
│       │   ├── GptOss21BA3P6Config
│       │   └── GptOss117BA5P8Config
│       ├── Qwen3MoEConfig
│       │   ├── Qwen3MoE30BA3Config
│       │   ├── Qwen3MoE235BA22Config
│       │   ├── Qwen3MoEFoPEConfig
│       │   ├── Qwen3VLTextMoE30BA3Config
│       │   └── Qwen3VLTextMoE235BA22Config
│       └── Qwen3_5_VLTextMoEConfig
│           └── Qwen3_5_VLTextMoE35BA3BConfig
└── BaseComposeConfig
    ├── Qwen3VLBaseConfig
    │   ├── Qwen3VLMoE30BA3Config
    │   ├── Qwen3VLMoE235BA22Config
    │   ├── Qwen3VLDense4BConfig
    │   ├── Qwen3VLDense8BConfig
    │   └── Qwen3_5_BaseConfig
    │       └── Qwen3_5_VLMoE35BA3Config
    ├── InternVLBaseConfig
    │   ├── InternVL3P5Dense8BConfig
    │   ├── InternVL3P5MoE30BA3Config
    │   └── InternVL3P5Dense1BConfig
    └── InternS1BaseConfig
        ├── InternS1Config
        └── InternS1MiniConfig
```