chore: update README and model inference testing scripts

2026-05-06 18:35:53 +08:00
parent 404f1b85aa
commit 056df3b6ca
3 changed files with 105 additions and 22 deletions
--- a/README.md
+++ b/README.md
@@ -99,7 +99,14 @@ git submodule update --init --recursive Megatron-LM
 - `vocab.json`
 - `tokenizer_config.json`

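-These files must be extracted manually from the Kaiyuan-2B model weights or the corresponding tokenizer configuration, then passed to Megatron's data preprocessing pipeline.
+These files must be extracted manually from the Kaiyuan-2B model weights or the corresponding tokenizer configuration, then passed to Megatron's data preprocessing pipeline: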
+
+```bash
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/tokenizer.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/tokenizer_config.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/vocab.json
+wget https://hf-mirror.com/thu-pacman/PCMind-2.1-Kaiyuan-2B/resolve/refs%2Fpr%2F1/merges.txt
+```

 ## 4. Model definition and training scripts

@@ -177,6 +184,12 @@ git submodule update --init --recursive Megatron-LM

 After training completes, use `eval_<model_name>.sh` or the corresponding inference script to run model inference.

+Note: before running inference, install `flask-restful` (which pulls in `flask` as a dependency) inside the Docker environment:
+
+```bash
+pip install flask-restful
+```
+
 ### 7.1 Required changes before inference

 Before running inference, manually edit line 89 of `Megatron-LM/megatron/core/inference/text_generation_server/run_mcore_engine.py`, changing:
@@ -197,6 +210,47 @@ git submodule update --init --recursive Megatron-LM
 AttributeError: 'list' object has no attribute 'tolist'
 ```

+You also need to edit line 64 of `Megatron-LM/tools/run_text_generation_server.py`, which originally reads:
+
+```python
+inference_context = StaticInferenceContext(args.inference_max_requests, args.inference_max_sequence_length)
+```
+
+Change it to:
+
+```python
+inference_context = StaticInferenceContext(args.inference_max_requests, 4096)  # any integer works here; 4096 is just an example
+```
+
+This avoids the following error:
+
+```text
+[rank0]: AttributeError: 'Namespace' object has no attribute 'inference_max_sequence_length'
+```
+
+Once the inference server has started successfully, it prints:
+```text
+INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:5000
+ * Running on http://172.17.0.2:5000
+INFO:werkzeug:Press CTRL+C to quit
+```
+
+Open another bash terminal in the same Docker container and run the following command to test model inference:
+
+```bash
+curl -X PUT http://127.0.0.1:5000/api \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompts": ["The capital of France is"],
+    "tokens_to_generate": 50,
+    "temperature": 0.8,
+    "top_k": 0,
+    "top_p": 0.9
+  }'
+```
+
 ## 8. Common scripts

 ### 8.1 Data download