diff --git a/README.md b/README.md
index 6dcb069..b5a7f48 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,14 @@ git submodule update --init --recursive Megatron-LM
 - `vocab.json`
 - `tokenizer_config.json`
 
-These files must be manually extracted from the Kaiyuan-2B model weights or the corresponding tokenizer configuration, then passed to Megatron's data preprocessing pipeline.
+These files must be manually extracted from the Kaiyuan-2B model weights or the corresponding tokenizer configuration, then passed to Megatron's data preprocessing pipeline:
+
+```bash
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/tokenizer.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/tokenizer_config.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/vocab.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/merges.txt
+```
 
 ## 4. Model Definition and Training Scripts
 
@@ -177,6 +184,12 @@ git submodule update --init --recursive Megatron-LM
 
 After training completes, you can run inference with `eval_.sh` or the corresponding inference script.
 
+Note: before running inference, install `flask-restful` in the Docker environment:
+
+```bash
+pip install flask-restful
+```
+
 ### 7.1 Required Changes Before Inference
 
 Before running inference, manually edit line 89 of `Megatron-LM/megatron/core/inference/text_generation_server/run_mcore_engine.py`, changing:
@@ -197,6 +210,47 @@ git submodule update --init --recursive Megatron-LM
 AttributeError: 'list' object has no attribute 'tolist'
 ```
 
+In addition, on line 64 of `Megatron-LM/tools/run_text_generation_server.py`, change
+
+```python
+inference_context = StaticInferenceContext(args.inference_max_requests, args.inference_max_sequence_length)
+```
+
+to
+
+```python
+inference_context = StaticInferenceContext(args.inference_max_requests)
+```
+
+to avoid the error:
+
+```text
+[rank0]: AttributeError: 'Namespace' object has no attribute 'inference_max_sequence_length'
+```
+
+Once the inference server has started successfully, it prints:
+```text
+INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:5000
+ * Running on http://172.17.0.2:5000
+INFO:werkzeug:Press CTRL+C to quit
+```
+
+Switch to another `bash` terminal in the same Docker container and run the following command to test model inference:
+
+```bash
+curl -X PUT http://127.0.0.1:5000/api \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompts": ["The capital of France is"],
+    "tokens_to_generate": 50,
+    "temperature": 0.8,
+    "top_k": 0,
+    "top_p": 0.9
+  }'
+```
+
 ## 8. Common Scripts
 
 ### 8.1 Data Download
diff --git a/scripts/kaiyuan2b-training/eval_smoke_gpt2.sh b/scripts/kaiyuan2b-training/eval_smoke_gpt2.sh
index fde1793..95cfd02 100644
--- a/scripts/kaiyuan2b-training/eval_smoke_gpt2.sh
+++ b/scripts/kaiyuan2b-training/eval_smoke_gpt2.sh
@@ -24,27 +24,6 @@ export CUDA_DEVICE_MAX_CONNECTIONS=1
 
 # pip install flask-restful
 
-# torchrun $DISTRIBUTED_ARGS $MEGATRON_PATH/tools/run_text_generation_server.py \
-#        --tensor-model-parallel-size 1 \
-#        --pipeline-model-parallel-size 1 \
-#        --num-layers 12 \
-#        --hidden-size 3072 \
-#        --load ${CHECKPOINT} \
-#        --num-attention-heads 8 \
-#        --num-query-groups 4 \
-#        --max-position-embeddings 4096 \
-#        --fp16 \
-#        --micro-batch-size 1 \
-#        --seq-length 1024 \
-#        --temperature 1.0 \
-
-#        --top_p 0.9 \
-#        --seed 42 \
-#        --tokenizer-type GPT2BPETokenizer
-#        --vocab-file $VOCAB_FILE \
-#        --merge-file $MERGE_FILE \
-
-
 torchrun $DISTRIBUTED_ARGS $MEGATRON_PATH/tools/run_text_generation_server.py \
        --load $CHECKPOINT \
        --tensor-model-parallel-size 1 \
diff --git a/scripts/kaiyuan2b-training/eval_smoke_qwen3_1p7b.sh b/scripts/kaiyuan2b-training/eval_smoke_qwen3_1p7b.sh
new file mode 100644
index 0000000..9e96f77
--- /dev/null
+++ b/scripts/kaiyuan2b-training/eval_smoke_qwen3_1p7b.sh
@@ -0,0 +1,50 @@
+#!/bin/bash
+# This example starts a text generation server for the Qwen3-1.7B smoke-test checkpoint.
+DISTRIBUTED_ARGS="--nproc_per_node 1 \
+                  --nnodes 1 \
+                  --node_rank 0 \
+                  --master_addr localhost \
+                  --master_port 6000"
+
+# Checkpoint directory to load
+CHECKPOINT=/apps/yi/model_training/artifacts/checkpoints/qwen3_1p7b_smoke_yi
+
+# Vocabulary file (not referenced by the torchrun command below)
+VOCAB_FILE=/apps/yi/model_training/data/tokenizer/vocab.json
+
+# BPE merges file (not referenced by the torchrun command below)
+MERGE_FILE=/apps/yi/model_training/data/tokenizer/merges.txt
+
+# Directory containing the Hugging Face tokenizer files
+TOKENIZER_PATH=/apps/yi/model_training/data/tokenizer
+
+MEGATRON_PATH=/apps/yi/model_training/Megatron-LM
+
+export CUDA_DEVICE_MAX_CONNECTIONS=1
+
+# pip install flask-restful
+
+torchrun $DISTRIBUTED_ARGS $MEGATRON_PATH/tools/run_text_generation_server.py \
+       --load $CHECKPOINT \
+       --tensor-model-parallel-size 1 \
+       --pipeline-model-parallel-size 1 \
+       --num-layers 28 \
+       --hidden-size 2048 \
+       --ffn-hidden-size 6144 \
+       --num-attention-heads 16 \
+       --num-query-groups 8 \
+       --group-query-attention \
+       --seq-length 4096 \
+       --max-position-embeddings 4096 \
+       --position-embedding-type rope \
+       --rotary-base 10000 \
+       --swiglu \
+       --disable-bias-linear \
+       --normalization RMSNorm \
+       --untie-embeddings-and-output-weights \
+       --tokenizer-type HuggingFaceTokenizer \
+       --tokenizer-model $TOKENIZER_PATH \
+       --bf16 \
+       --micro-batch-size 1 \
+       --micro-batch-size 1 \
+       --inference-max-requests 1
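
The `curl` request documented in the README hunk of this diff can also be issued from Python when scripting smoke tests. The sketch below is an illustration and not part of this change: the helper names `build_request` and `generate` are hypothetical, the URL and sampling parameters are copied from the README example, and the response is returned as parsed JSON without assuming any particular schema.

```python
import json
import urllib.request


def build_request(prompt, url="http://127.0.0.1:5000/api",
                  tokens_to_generate=50, temperature=0.8,
                  top_k=0, top_p=0.9):
    """Build the PUT request used in the README's curl example.

    Hypothetical helper: the default values mirror the README, not any
    defaults defined by Megatron-LM itself.
    """
    payload = {
        "prompts": [prompt],
        "tokens_to_generate": tokens_to_generate,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def generate(prompt, timeout=60.0, **kwargs):
    """Send the request and return the decoded JSON body as-is.

    Assumes the server replies with a JSON document; no specific keys
    are read here because the response schema is not shown in the README.
    """
    request = build_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from `eval_smoke_qwen3_1p7b.sh` running, calling `generate("The capital of France is")` from a second terminal in the same container should be equivalent to the `curl` command. Only the standard library is used, so nothing beyond `flask-restful` needs to be installed.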