chore: update README and model inference testing scripts
56
README.md
@@ -99,7 +99,14 @@ git submodule update --init --recursive Megatron-LM
 - `vocab.json`
 - `tokenizer_config.json`
 
-These files must be extracted manually from the Kaiyuan-2B model weights or the corresponding tokenizer configuration, then passed to Megatron's data preprocessing pipeline.
+These files must be extracted manually from the Kaiyuan-2B model weights or the corresponding tokenizer configuration, then passed to Megatron's data preprocessing pipeline:
 
+```bash
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/tokenizer.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/tokenizer_config.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/vocab.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/merges.txt
+```
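Before handing the tokenizer directory to preprocessing, it may help to confirm that all four files fetched by the `wget` commands above are actually present. The following is a minimal sketch, not part of this commit: the helper name `missing_tokenizer_files` and the constant `REQUIRED_TOKENIZER_FILES` are hypothetical, and the file list is assumed to be exactly the four names downloaded above.

```python
from pathlib import Path

# Assumption: exactly the four files downloaded by the wget commands above.
REQUIRED_TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
)


def missing_tokenizer_files(tokenizer_dir):
    """Return the required tokenizer files that are absent from tokenizer_dir."""
    root = Path(tokenizer_dir)
    return [name for name in REQUIRED_TOKENIZER_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Check the current directory, where wget saves files by default.
    missing = missing_tokenizer_files(".")
    if missing:
        print("Missing tokenizer files:", ", ".join(missing))
    else:
        print("All tokenizer files are present.")
```

An empty result means the directory can be used as the tokenizer path; any listed name should be downloaded again.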
+
 ## 4. Model Definition and Training Scripts
 
@@ -177,6 +184,12 @@ git submodule update --init --recursive Megatron-LM
 
 After training completes, run model inference with `eval_<model_name>.sh` or the corresponding inference script.
 
+Note: before running inference, install `flask-restful` (which also installs `flask`) in the Docker environment:
+
+```bash
+pip install flask-restful
+```
+
 ### 7.1 Required Changes Before Inference
 
 Before running inference, manually edit line 89 of `Megatron-LM/megatron/core/inference/text_generation_server/run_mcore_engine.py`, changing:
@@ -197,6 +210,47 @@ git submodule update --init --recursive Megatron-LM
 AttributeError: 'list' object has no attribute 'tolist'
 ```
 
+You also need to edit line 64 of `Megatron-LM/tools/run_text_generation_server.py`, changing
+
+```python
+inference_context = StaticInferenceContext(args.inference_max_requests, args.inference_max_sequence_length)
+```
+
+to
+
+```python
+inference_context = StaticInferenceContext(args.inference_max_requests, 4096)  # any integer works; 4096 is an example
+```
+
+This avoids the following error:
+
+```text
+[rank0]: AttributeError: 'Namespace' object has no attribute 'inference_max_sequence_length'
+```
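An alternative to hard-coding the integer is to fall back to a default only when the attribute is missing. The sketch below is not part of this commit or of Megatron-LM: `resolve_max_sequence_length` and `DEFAULT_MAX_SEQUENCE_LENGTH` are hypothetical names, and the default of 4096 simply reuses the example value from the edit above.

```python
from argparse import Namespace

# Assumption: same example value as in the README edit above.
DEFAULT_MAX_SEQUENCE_LENGTH = 4096


def resolve_max_sequence_length(args, default=DEFAULT_MAX_SEQUENCE_LENGTH):
    """Return args.inference_max_sequence_length if it exists, else the default."""
    value = getattr(args, "inference_max_sequence_length", None)
    return default if value is None else value


# A Namespace without the attribute, as in the traceback above.
print(resolve_max_sequence_length(Namespace(inference_max_requests=1)))
# A Namespace that does define the attribute.
print(resolve_max_sequence_length(Namespace(inference_max_sequence_length=2048)))
```

With such a helper, the second argument on the patched line could be `resolve_max_sequence_length(args)`, so the edit would keep working if a later Megatron-LM version defines the argument.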
+
+Once the inference server has started successfully, it prints:
+```text
+INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:5000
+ * Running on http://172.17.0.2:5000
+INFO:werkzeug:Press CTRL+C to quit
+```
+
+Open another bash terminal in the same Docker container and run the following command to test model inference:
+
+```bash
+curl -X PUT http://127.0.0.1:5000/api \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompts": ["The capital of France is"],
+    "tokens_to_generate": 50,
+    "temperature": 0.8,
+    "top_k": 0,
+    "top_p": 0.9
+  }'
+```
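The same request can be sent from Python using only the standard library. This is a minimal sketch, not part of this commit: `build_generation_request` and `generate` are hypothetical helper names, the URL and payload fields are copied from the `curl` command above, and the structure of the server's JSON response is deliberately not assumed.

```python
import json
import urllib.request

# Address taken from the server log and curl command above.
API_URL = "http://127.0.0.1:5000/api"


def build_generation_request(prompt, tokens_to_generate=50, temperature=0.8,
                             top_k=0, top_p=0.9, url=API_URL):
    """Build a PUT request with the same JSON payload as the curl example."""
    payload = {
        "prompts": [prompt],
        "tokens_to_generate": tokens_to_generate,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def generate(prompt, timeout=60, **kwargs):
    """Send the request and return the decoded JSON response as-is."""
    request = build_generation_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_generation_request("The capital of France is")
    print(request.get_method(), request.full_url)
    # Uncomment once the server is running:
    # print(generate("The capital of France is"))
```

Building the request separately from sending it makes the payload easy to inspect without a running server.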
+
 ## 8. Common Scripts
 
 ### 8.1 Data Download
@@ -24,27 +24,6 @@ export CUDA_DEVICE_MAX_CONNECTIONS=1
 
 # pip install flask-restful
 
-# torchrun $DISTRIBUTED_ARGS $MEGATRON_PATH/tools/run_text_generation_server.py \
-# --tensor-model-parallel-size 1 \
-# --pipeline-model-parallel-size 1 \
-# --num-layers 12 \
-# --hidden-size 3072 \
-# --load ${CHECKPOINT} \
-# --num-attention-heads 8 \
-# --num-query-groups 4 \
-# --max-position-embeddings 4096 \
-# --fp16 \
-# --micro-batch-size 1 \
-# --seq-length 1024 \
-# --temperature 1.0 \
-
-# --top_p 0.9 \
-# --seed 42 \
-# --tokenizer-type GPT2BPETokenizer
-# --vocab-file $VOCAB_FILE \
-# --merge-file $MERGE_FILE \
-
-
 torchrun $DISTRIBUTED_ARGS $MEGATRON_PATH/tools/run_text_generation_server.py \
     --load $CHECKPOINT \
     --tensor-model-parallel-size 1 \
50
scripts/kaiyuan2b-training/eval_smoke_qwen3_1p7b.sh
Normal file
@@ -0,0 +1,50 @@
+#!/bin/bash
+# This example starts a text generation server for the Qwen3-1.7B smoke-test checkpoint.
+DISTRIBUTED_ARGS="--nproc_per_node 1 \
+                  --nnodes 1 \
+                  --node_rank 0 \
+                  --master_addr localhost \
+                  --master_port 6000"
+
+# <Path to checkpoint (e.g. /345m)>
+CHECKPOINT=/apps/yi/model_training/artifacts/checkpoints/qwen3_1p7b_smoke_yi
+
+# <Path to vocab.json (e.g. /gpt2-vocab.json)>
+VOCAB_FILE=/apps/yi/model_training/data/tokenizer/vocab.json
+
+# <Path to merges.txt (e.g. /gpt2-merges.txt)>
+MERGE_FILE=/apps/yi/model_training/data/tokenizer/merges.txt
+
+# <Path to tokenizer>
+TOKENIZER_PATH=/apps/yi/model_training/data/tokenizer
+
+MEGATRON_PATH=/apps/yi/model_training/Megatron-LM
+
+export CUDA_DEVICE_MAX_CONNECTIONS=1
+
+# pip install flask-restful
+
+torchrun $DISTRIBUTED_ARGS $MEGATRON_PATH/tools/run_text_generation_server.py \
+    --load $CHECKPOINT \
+    --tensor-model-parallel-size 1 \
+    --pipeline-model-parallel-size 1 \
+    --num-layers 28 \
+    --hidden-size 2048 \
+    --ffn-hidden-size 6144 \
+    --num-attention-heads 16 \
+    --num-query-groups 8 \
+    --group-query-attention \
+    --seq-length 4096 \
+    --max-position-embeddings 4096 \
+    --position-embedding-type rope \
+    --rotary-base 10000 \
+    --swiglu \
+    --disable-bias-linear \
+    --normalization RMSNorm \
+    --untie-embeddings-and-output-weights \
+    --tokenizer-type HuggingFaceTokenizer \
+    --tokenizer-model $TOKENIZER_PATH \
+    --bf16 \
+    --micro-batch-size 1 \
+    --micro-batch-size 1 \
+    --inference-max-requests 1