CANN/amct GPTQ Quantization Example

AMCT Large Model GPTQ Quantization

[Free download link] amct: AMCT is the model compression tool repository provided by CANN, designed for Ascend AI processors. Project address: https://gitcode.com/cann/amct

1 Quantization Prerequisites

1.1 Install Dependencies

The dependencies for this sample are listed in requirements.txt.

Note that the torch_npu version must match the installed Python and torch versions, and the CANN package must be installed.
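
Before comparing versions, it can help to see what is actually installed. The following is a minimal sketch using only the Python standard library; the distribution names passed in (torch, torch_npu) are assumptions and should be adjusted to match requirements.txt. It only reports versions and does not decide whether they are compatible.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match requirements.txt.
    for name, version in installed_versions(["torch", "torch_npu"]).items():
        print(f"{name}: {version or 'not installed'}")
```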

1.2 Model and Dataset Preparation

This sample uses the Llama2-7b and Qwen2-7b models, with the pileval and wikitext2 datasets, as examples. The datasets are loaded online. The models must be downloaded by the user, and the model path must be specified when running the script.

1.3 Simple Quantization Configuration

The quantization configuration used in this sample is built into the tool and can be obtained and used in the following ways:

INT4 weight-only quantization configuration:

from amct_pytorch import INT4_GPTQ_WEIGHT_QUANT_CFG

MXFP4_E2M1 weight-only quantization configuration:

cfg = {
    'batch_num': 1,
    'quant_cfg': {
        'weights': {
            'type': 'mxfp4_e2m1',
            'symmetric': True,
            'strategy': 'group',
            'group_size': 32
        },
    },
    'algorithm': {'gptq'},
    'skip_layers': {'lm_head'}
}

If you need to modify the detailed configuration, please refer to the documentation to construct the required quantization configuration dict.
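
As a minimal sketch of such a modification, the snippet below derives an INT8 per-channel variant from the MXFP4_E2M1 dict used in this sample. The field names and values come from this section; the names mxfp4_cfg and int8_cfg are illustrative, and whether the tool accepts this exact combination (including dropping group_size) is an assumption to verify against the AMCT documentation.

```python
import copy

# The MXFP4_E2M1 weight-only configuration from this sample.
mxfp4_cfg = {
    'batch_num': 1,
    'quant_cfg': {
        'weights': {
            'type': 'mxfp4_e2m1',
            'symmetric': True,
            'strategy': 'group',
            'group_size': 32,
        },
    },
    'algorithm': {'gptq'},
    'skip_layers': {'lm_head'},
}

# Hypothetical variant: INT8 weights quantized per channel.
# deepcopy keeps the original dict untouched, including its nested dicts.
int8_cfg = copy.deepcopy(mxfp4_cfg)
weights = int8_cfg['quant_cfg']['weights']
weights['type'] = 'int8'
weights['strategy'] = 'channel'
# Assumption: group_size is only meaningful for the 'group' strategy.
weights.pop('group_size', None)
```

Copying before editing avoids accidentally changing a configuration that is reused elsewhere in the same script.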

The GPTQ algorithm only supports weight quantization. The supported quantization types and quantization configurations are:

| Field | Type | Description | Value Range | Notes |
| --- | --- | --- | --- | --- |
| batch_num | uint32 | Number of batches used for quantization | 1 | / |
| skip_layers | str | Layers to skip during quantization | / | Fuzzy matching is supported: when a configured string is a substring of a layer name or matches the layer name, that layer is skipped and no quantization configuration is generated for it. The string must contain digits or letters |
| weights.type | str | Quantized weight type | 'int4'/'int8'/'float4_e2m1'/'mxfp4_e2m1' | / |
| weights.symmetric | bool | Symmetric quantization | TRUE/FALSE | float4_e2m1 and mxfp4_e2m1 only support symmetric quantization |
| weights.strategy | str | Quantization granularity | 'tensor'/'channel'/'group' | float4_e2m1 and mxfp4_e2m1 only support the group strategy |
| algorithm | dict | Quantization algorithm configuration used | {'gptq'} | / |
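
The constraints in the table can be checked before a configuration is passed to the tool. The following is a minimal sketch, not part of amct_pytorch: check_weights_cfg and is_skipped are hypothetical helpers that only mirror the rules stated above, and the tool's own validation and matching logic may differ.

```python
ALLOWED_TYPES = {'int4', 'int8', 'float4_e2m1', 'mxfp4_e2m1'}
ALLOWED_STRATEGIES = {'tensor', 'channel', 'group'}
FP4_TYPES = {'float4_e2m1', 'mxfp4_e2m1'}


def check_weights_cfg(weights):
    """Return a list of problems found in a 'weights' config dict (empty if none)."""
    problems = []
    wtype = weights.get('type')
    strategy = weights.get('strategy')
    symmetric = weights.get('symmetric')
    if wtype not in ALLOWED_TYPES:
        problems.append(f"unsupported weights.type: {wtype!r}")
    if strategy not in ALLOWED_STRATEGIES:
        problems.append(f"unsupported weights.strategy: {strategy!r}")
    if not isinstance(symmetric, bool):
        problems.append("weights.symmetric must be a bool")
    if wtype in FP4_TYPES:
        # Both FP4 types are restricted to symmetric, group-wise quantization.
        if symmetric is not True:
            problems.append(f"{wtype} only supports symmetric quantization")
        if strategy != 'group':
            problems.append(f"{wtype} only supports the 'group' strategy")
    return problems


def is_skipped(layer_name, skip_layers):
    """Fuzzy match: a layer is skipped when any configured string occurs in its name."""
    return any(pattern in layer_name for pattern in skip_layers)
```

For example, is_skipped('lm_head', {'lm_head'}) is true for an exact match, and a configured string such as 'lm_head' also matches a longer name like 'model.lm_head' because it is a substring.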

2 Quantization Example

2.1 Calling Through the Interface

Step 1. Run the following commands in the current directory to execute the sample programs. Modify the model and dataset paths in the sample programs to match your environment:

python3 src/run_llama2_samples.py --model_path=/data/Llama2_7b_hf/
python3 src/run_qwen_samples.py --model_path=/data/Qwen2-7b/

If output similar to the following appears, quantization succeeded:

Test time taken: 1.0 min 59.24865388870239 s Score: 5.477707
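
When the sample is run from a script, the Score value can be pulled out of the captured output. The following is a minimal sketch that assumes the output keeps the "Score: <number>" form shown above; extract_score is a hypothetical helper, not part of the sample programs.

```python
import re

# Matches "Score:" followed by a non-negative decimal number.
SCORE_PATTERN = re.compile(r"Score:\s*([0-9]+(?:\.[0-9]+)?)")


def extract_score(output):
    """Return the last 'Score: <number>' value in the text, or None if absent."""
    matches = SCORE_PATTERN.findall(output)
    return float(matches[-1]) if matches else None


sample_output = "Test time taken: 1.0 min 59.24865388870239 s Score: 5.477707"
score = extract_score(sample_output)
```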

Step 2. The following configuration is recommended.

In the output above, Score is the perplexity (PPL) of the quantized model. Reference values are listed in the following table:

| Model | Calibration Set | Dataset | Pre-quantization PPL | Post-INT4 quantization PPL | Post-MXFP4 quantization PPL |
| --- | --- | --- | --- | --- | --- |
| LLAMA2-7B | pileval | wikitext2 | 5.472 | 5.601 | 5.799 |
| QWEN2-7B | pileval | wikitext2 | 7.137 | 7.253 | 7.305 |
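
To put these numbers in perspective, the relative PPL increase caused by quantization can be computed directly from the table. The following is a small illustrative sketch; the values are copied from the table above, and relative_increase is a hypothetical helper.

```python
# PPL values copied from the table above: (pre-quantization, INT4, MXFP4).
ppl_table = {
    "LLAMA2-7B": (5.472, 5.601, 5.799),
    "QWEN2-7B": (7.137, 7.253, 7.305),
}


def relative_increase(baseline, quantized):
    """Relative PPL increase of the quantized model over the baseline, in percent."""
    return (quantized - baseline) / baseline * 100.0


for model, (baseline, int4, mxfp4) in ppl_table.items():
    print(f"{model}: INT4 +{relative_increase(baseline, int4):.2f}%, "
          f"MXFP4 +{relative_increase(baseline, mxfp4):.2f}%")
```

On these reference values the increase ranges from roughly 1.6% to 6.0%, with INT4 staying closer to the baseline than MXFP4 for both models.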

After inference completes successfully, the quantization log file ./amct_log/amct_pytorch.log is generated in the current directory.
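
To inspect the end of that log without opening it in an editor, a short helper is enough. The following is a minimal sketch using only the standard library; tail_log is a hypothetical helper, and the log is assumed to be UTF-8 text.

```python
from pathlib import Path


def tail_log(path, max_lines=20):
    """Return the last max_lines lines of a text log, or [] if the file is missing."""
    log_path = Path(path)
    if not log_path.is_file() or max_lines <= 0:
        return []
    # errors="replace" keeps the helper usable if the log has stray bytes.
    lines = log_path.read_text(encoding="utf-8", errors="replace").splitlines()
    return lines[-max_lines:]


if __name__ == "__main__":
    for line in tail_log("./amct_log/amct_pytorch.log"):
        print(line)
```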
