
CANN/AMCT OFMR Algorithm Example

AMCT Large Model Quantization

amct: AMCT is the model compression toolkit repository provided by CANN, designed for Ascend AI processors. Project address: https://gitcode.com/cann/amct

1 Quantization Prerequisites

1.1 Install Dependencies

The dependency packages for this sample are listed in requirements.txt.

Note that the torch_npu version must match the installed Python and torch versions, and that the CANN package must be installed.
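Before running the sample, it can help to print which versions are actually installed so they can be compared against requirements.txt. The following is a minimal sketch using only the Python standard library. The distribution names `torch` and `torch_npu` are assumptions about how the packages are registered in your environment, and the script only reports versions; it does not know which combinations are compatible.

```python
import sys
from importlib import metadata


def installed_versions(packages=("torch", "torch_npu")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed under this name
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(name, version or "NOT INSTALLED")
```

Compare the printed versions with the torch_npu release notes to confirm that the combination is supported.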

1.2 Model and Dataset Preparation

This sample uses the Llama2-7b, qwen2-7b, and qwen3-8b models as examples, with pileval as the calibration data and wikitext2 as the evaluation dataset. Download the models yourself and pass the model path to the script. The datasets are loaded online.

1.3 Simple Quantization Configuration

The quantization configuration used in this sample is built into the tool and can be imported as follows:

from amct_pytorch import HIFP8_OFMR_CFG

If you need to modify the detailed configuration, refer to the documentation and construct the required quantization configuration dict.

The OFMR algorithm supports weight-only quantization and full quantization. The supported quantization types and quantization configurations are:

| Field | Type | Description | Value Range | Notes |
| --- | --- | --- | --- | --- |
| batch_num | uint32 | Number of batches used for quantization | 1 | / |
| skip_layers | str | Layers to skip during quantization | / | Supports fuzzy matching. When the configured string is a substring of a layer name or matches the layer name, quantization is skipped for that layer and no quantization configuration is generated for it. The string must contain numbers or letters. |
| weights.type | str | Quantized weight type | 'float8_e4m3fn', 'hifloat8' | / |
| weights.symmetric | bool | Symmetric quantization | TRUE | / |
| weights.strategy | str | Quantization granularity | 'tensor', 'channel' | / |
| inputs.type | str | Quantized activation type | 'float8_e4m3fn', 'hifloat8' | / |
| inputs.symmetric | bool | Symmetric quantization | TRUE | / |
| inputs.strategy | str | Quantization granularity | 'tensor' | / |
| algorithm | dict | Quantization algorithm configuration used | {'ofmr'} | / |
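To make the table concrete, here is a minimal sketch of a configuration dict together with a check against the listed value ranges and an illustration of the skip_layers fuzzy-matching rule. This is not the AMCT implementation. The nested dict layout, the use of a list for skip_layers, the shape of the algorithm entry, and the helper names `check_config` and `should_skip` are all assumptions made for illustration; consult the AMCT documentation for the real schema.

```python
# Allowed values copied from the table above.
# ASSUMPTION: "weights.type" in the table maps to cfg["weights"]["type"], etc.
ALLOWED = {
    "weights": {
        "type": ("float8_e4m3fn", "hifloat8"),
        "symmetric": (True,),
        "strategy": ("tensor", "channel"),
    },
    "inputs": {
        "type": ("float8_e4m3fn", "hifloat8"),
        "symmetric": (True,),
        "strategy": ("tensor",),
    },
}


def check_config(cfg):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for section, fields in ALLOWED.items():
        for field, allowed in fields.items():
            value = cfg.get(section, {}).get(field)
            if value not in allowed:
                problems.append(f"{section}.{field}={value!r} is not one of {allowed}")
    return problems


def should_skip(layer_name, skip_layers):
    """Fuzzy match as described in the table: skip a layer when a configured
    string equals the layer name or is a substring of it."""
    return any(pattern in layer_name for pattern in skip_layers)


# Hypothetical example configuration (layer name and algorithm shape are assumed).
example_cfg = {
    "batch_num": 1,
    "skip_layers": ["lm_head"],
    "weights": {"type": "hifloat8", "symmetric": True, "strategy": "channel"},
    "inputs": {"type": "hifloat8", "symmetric": True, "strategy": "tensor"},
    "algorithm": {"ofmr": {}},
}

if __name__ == "__main__":
    print(check_config(example_cfg))
    print(should_skip("model.lm_head", example_cfg["skip_layers"]))
```

Note that per the table, activations (inputs) only support per-tensor granularity, so a config that sets inputs.strategy to 'channel' would be reported by this check.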

2 Quantization Example

2.1 Calling Through the Interface

Step 1. Run the following commands in the current directory to execute the sample program. Change the model path to match your actual setup:

python3 src/run_llama2_samples.py --model_path=/data/Llama2_7b_hf/
python3 src/run_qwen_samples.py --model_path=/data/Qwen2-7b/
python3 src/run_qwen_samples.py --model_path=/data/Qwen3-8B/

If output like the following appears, quantization succeeded:

Test time taken: 1.0 min 59.24865388870239 s Score: 5.477707

Here, Score is the perplexity (PPL) of the quantized model. For reference values, see the following table:

| Model | Calibration Set | Dataset | Pre-quantization PPL | Post-quantization PPL |
| --- | --- | --- | --- | --- |
| LLAMA2-7B | pileval | wikitext2 | 5.472 | 5.505 |
| QWEN2-7B | pileval | wikitext2 | 7.137 | 7.196 |
| QWEN3-8B | pileval | wikitext2 | 9.715 | 9.808 |
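To put these numbers in context, the sketch below shows the standard definition of perplexity (the exponential of the mean per-token negative log-likelihood), extracts the Score value from an output line shaped like the sample above, and computes the relative PPL increase for each row of the table. How the sample scripts compute PPL internally is not shown in this document, so `perplexity` illustrates the general definition only; the helper names are hypothetical.

```python
import math
import re


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def parse_score(log_line):
    """Extract the float after 'Score:' from a line shaped like the sample output."""
    match = re.search(r"Score:\s*([0-9]*\.?[0-9]+)", log_line)
    return float(match.group(1)) if match else None


def relative_increase(before, after):
    """Relative PPL increase after quantization, in percent."""
    return (after - before) / before * 100.0


# (pre-quantization PPL, post-quantization PPL) copied from the table above.
RESULTS = {
    "LLAMA2-7B": (5.472, 5.505),
    "QWEN2-7B": (7.137, 7.196),
    "QWEN3-8B": (9.715, 9.808),
}

if __name__ == "__main__":
    for model, (before, after) in RESULTS.items():
        print(f"{model}: +{relative_increase(before, after):.2f}% PPL")
```

For all three models in the table, the relative PPL increase stays below 1%.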

After inference succeeds, a quantization log file, ./amct_log/amct_pytorch.log, is generated in the current directory.


