
CANN/amct GPTQ Quantization Example

news 2026/6/6 5:28:49

AMCT Large Model GPTQ Quantization

[Free download link] amct: AMCT is the model compression tool repository provided by CANN, tailored for Ascend AI processors. Project address: https://gitcode.com/cann/amct

1 Quantization Prerequisites

1.1 Install Dependencies

The dependency packages required by this sample are listed in requirements.txt.

Note that the torch_npu package version must match the installed Python and torch versions, and the CANN package must also be installed.
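As a quick sanity check before running the sample, the sketch below reads the installed torch and torch_npu versions with the standard library and compares their release numbers. It assumes the usual torch_npu convention that its version string starts with the torch release it targets (for example, a torch_npu version such as 2.1.0.post3 for torch 2.1.0). This is a heuristic only. The torch_npu release notes remain the authoritative compatibility reference, including the supported Python and CANN versions. The helper names are hypothetical.

```python
import re
from importlib import metadata


def release_prefix(version: str) -> tuple:
    """Extract (major, minor, patch) from a version string; a missing patch counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def versions_compatible(torch_version: str, torch_npu_version: str) -> bool:
    """Assumption: torch_npu X.Y.Z[.postN] targets torch X.Y.Z (suffixes like +cpu are ignored)."""
    return release_prefix(torch_version) == release_prefix(torch_npu_version)


def check_installed() -> None:
    """Print the installed versions and whether their release numbers line up."""
    try:
        torch_v = metadata.version("torch")
        npu_v = metadata.version("torch_npu")
    except metadata.PackageNotFoundError as exc:
        print(f"missing package: {exc}")
        return
    status = "OK" if versions_compatible(torch_v, npu_v) else "MISMATCH"
    print(f"torch {torch_v} / torch_npu {npu_v}: {status}")


if __name__ == "__main__":
    check_installed()
```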

1.2 Model and Dataset Preparation

This sample uses the Llama2-7B and Qwen2-7B models as examples, with pileval as the calibration data and wikitext2 as the evaluation dataset. The datasets are loaded online. The models must be downloaded by the user, and the model path must be specified when running the script.
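GPTQ needs a small set of calibration token sequences. The sketch below shows one common way to cut an already-tokenized calibration corpus (such as pileval) into fixed-length samples by drawing random contiguous windows with a fixed seed. It is an illustrative assumption. The actual sampling done by run_llama2_samples.py and run_qwen_samples.py may differ, and make_calibration_samples is a hypothetical helper.

```python
import random
from typing import List


def make_calibration_samples(
    token_ids: List[int], seq_len: int, n_samples: int, seed: int = 0
) -> List[List[int]]:
    """Draw n_samples random contiguous windows of seq_len tokens (hypothetical helper)."""
    if len(token_ids) < seq_len:
        raise ValueError("corpus is shorter than one calibration sample")
    rng = random.Random(seed)  # fixed seed -> reproducible calibration set
    samples = []
    for _ in range(n_samples):
        start = rng.randint(0, len(token_ids) - seq_len)  # randint's upper bound is inclusive
        samples.append(token_ids[start:start + seq_len])
    return samples


if __name__ == "__main__":
    fake_corpus = list(range(10_000))  # stand-in for tokenized pileval text
    batch = make_calibration_samples(fake_corpus, seq_len=2048, n_samples=4)
    print([len(sample) for sample in batch])
```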

1.3 Simple Quantization Configuration

The quantization configurations used in this sample are built into the tool and can be obtained as follows:

INT4 weight-only quantization configuration:

from amct_pytorch import INT4_GPTQ_WEIGHT_QUANT_CFG

MXFP4_E2M1 weight-only quantization configuration:

cfg = {
    'batch_num': 1,
    'quant_cfg': {
        'weights': {
            'type': 'mxfp4_e2m1',
            'symmetric': True,
            'strategy': 'group',
            'group_size': 32,
        },
    },
    'algorithm': {'gptq'},
    'skip_layers': {'lm_head'},
}
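To make 'type': 'mxfp4_e2m1' with 'strategy': 'group' and 'group_size': 32 more concrete, the sketch below fake-quantizes a flat list of weights following the OCP Microscaling (MX) format: each group of 32 values shares one power-of-two scale, and each element is rounded to a 4-bit E2M1 float (largest magnitude 6.0). It is an illustrative numerical model only, not AMCT's kernel. Ties here resolve toward the smaller magnitude, and the shared scale is kept as a Python float rather than an 8-bit exponent.

```python
import math
from typing import List, Tuple

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest normal value: 6.0 = 1.5 * 2**2


def quantize_mxfp4_group(group: List[float]) -> Tuple[List[float], float]:
    """Fake-quantize one group; return (dequantized values, shared scale)."""
    amax = max(abs(v) for v in group)
    if amax == 0.0:
        return [0.0] * len(group), 1.0
    # Power-of-two scale: the largest element maps into [4, 8) before clamping to 6.0.
    scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)
    dequant = []
    for v in group:
        mag = min(abs(v) / scale, FP4_E2M1_VALUES[-1])  # clamp to the E2M1 maximum
        q = min(FP4_E2M1_VALUES, key=lambda c: abs(c - mag))  # nearest grid point
        dequant.append(math.copysign(q * scale, v))
    return dequant, scale


def quantize_mxfp4(weights: List[float], group_size: int = 32) -> List[float]:
    """Apply group-wise MXFP4 fake quantization to a flat list of weights."""
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        group_out, _ = quantize_mxfp4_group(weights[start:start + group_size])
        out.extend(group_out)
    return out


if __name__ == "__main__":
    print(quantize_mxfp4([12.0, 1.0, -2.4, 0.0], group_size=4))  # -> [12.0, 1.0, -2.0, 0.0]
```

In the usage line, the group maximum 12.0 gives a shared scale of 2.0, so -2.4 becomes -1.2 on the E2M1 grid and rounds to -1.0, which dequantizes to -2.0.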

If you need to modify the detailed configuration, please refer to the documentation to construct the required quantization configuration dict.
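When building a custom configuration dict, it can help to check it against the GPTQ constraints listed in this section before launching a long run. The sketch below encodes those constraints (batch_num of 1, the allowed weight types and strategies, the symmetric-and-group requirement for float4_e2m1 and mxfp4_e2m1, weights-only quantization, and substring matching for skip_layers). It then derives a hypothetical INT8 per-channel variant from the MXFP4 sample config with copy.deepcopy. check_gptq_cfg and is_skipped are hypothetical helpers, not AMCT APIs. The tool itself remains the final authority on what it accepts.

```python
import copy

# Constraints transcribed from the GPTQ field table in this section.
ALLOWED_TYPES = {"int4", "int8", "float4_e2m1", "mxfp4_e2m1"}
ALLOWED_STRATEGIES = {"tensor", "channel", "group"}
FLOAT4_TYPES = {"float4_e2m1", "mxfp4_e2m1"}


def _as_patterns(skip_layers) -> list:
    """Accept either a single string or a collection of strings."""
    return [skip_layers] if isinstance(skip_layers, str) else list(skip_layers)


def is_skipped(layer_name: str, skip_layers) -> bool:
    """Fuzzy match: skip a layer if any configured string is a substring of its name."""
    return any(pattern in layer_name for pattern in _as_patterns(skip_layers))


def check_gptq_cfg(cfg: dict) -> list:
    """Return human-readable problems; an empty list means none were detected."""
    problems = []
    if cfg.get("batch_num") != 1:
        problems.append("batch_num must be 1")
    if set(cfg.get("algorithm", ())) != {"gptq"}:
        problems.append("algorithm must be {'gptq'}")
    quant_cfg = cfg.get("quant_cfg", {})
    if set(quant_cfg) - {"weights"}:
        problems.append("GPTQ supports weight quantization only")
    weights = quant_cfg.get("weights", {})
    wtype, strategy = weights.get("type"), weights.get("strategy")
    if wtype not in ALLOWED_TYPES:
        problems.append(f"unsupported weights.type: {wtype!r}")
    if strategy not in ALLOWED_STRATEGIES:
        problems.append(f"unsupported weights.strategy: {strategy!r}")
    if wtype in FLOAT4_TYPES and weights.get("symmetric") is not True:
        problems.append(f"{wtype} requires symmetric=True")
    if wtype in FLOAT4_TYPES and strategy != "group":
        problems.append(f"{wtype} requires strategy='group'")
    for pattern in _as_patterns(cfg.get("skip_layers", ())):
        if not any(ch.isalnum() for ch in pattern):
            problems.append(f"skip_layers entry {pattern!r} needs letters or digits")
    return problems


# The MXFP4_E2M1 sample configuration from this section.
mxfp4_cfg = {
    "batch_num": 1,
    "quant_cfg": {"weights": {"type": "mxfp4_e2m1", "symmetric": True,
                              "strategy": "group", "group_size": 32}},
    "algorithm": {"gptq"},
    "skip_layers": {"lm_head"},
}

# Hypothetical variant: symmetric INT8 per-channel weights.
int8_cfg = copy.deepcopy(mxfp4_cfg)  # deep copy leaves mxfp4_cfg untouched
int8_weights = int8_cfg["quant_cfg"]["weights"]
int8_weights.update(type="int8", strategy="channel")
int8_weights.pop("group_size")  # assumption: group_size only matters for 'group'

print(check_gptq_cfg(mxfp4_cfg))  # []
print(check_gptq_cfg(int8_cfg))   # []
print(is_skipped("lm_head", mxfp4_cfg["skip_layers"]))  # True
```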

The GPTQ algorithm supports weight quantization only. The supported quantization types and configuration fields are:

Field	Type	Description	Value range	Notes
batch_num	uint32	Number of batches used for quantization	1	/
skip_layers	str	Layers excluded from quantization	/	Supports fuzzy matching: if a configured string equals a layer name or is a substring of it, that layer is not quantized and no quantization configuration is generated for it. Each string must contain letters or digits.
weights.type	str	Data type of the quantized weights	'int4' / 'int8' / 'float4_e2m1' / 'mxfp4_e2m1'	/
weights.symmetric	bool	Whether to use symmetric quantization	True / False	float4_e2m1 and mxfp4_e2m1 support symmetric quantization only
weights.strategy	str	Quantization granularity	'tensor' / 'channel' / 'group'	float4_e2m1 and mxfp4_e2m1 support the group strategy only
algorithm	dict	Quantization algorithm to use	{'gptq'}	/
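For intuition about what 'algorithm': {'gptq'} does to each linear layer, the sketch below implements the basic, unblocked form of the published GPTQ update in NumPy. Columns are quantized one at a time, and each column's rounding error is pushed onto the not-yet-quantized columns through the upper Cholesky factor of the inverse Hessian H = 2 X^T X, built from calibration activations X. It assumes symmetric per-output-channel INT4 (grid [-8, 7]) with a 1% diagonal damping term. AMCT's own implementation, grouping, and column ordering are not shown and may differ. gptq_int4 and rtn_int4 are illustrative names.

```python
import numpy as np


def rtn_int4(w: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Symmetric round-to-nearest onto the INT4 grid [-8, 7], returned dequantized."""
    return np.clip(np.round(w / scale), -8, 7) * scale


def gptq_int4(W: np.ndarray, X: np.ndarray, percdamp: float = 0.01):
    """GPTQ-quantize W (out_features x in_features) using calibration inputs X (n x in_features)."""
    W = W.astype(np.float64).copy()  # work on a copy; the caller's W is not modified
    n_cols = W.shape[1]
    # Per-output-channel symmetric scale, fixed before the column sweep.
    scale = np.maximum(np.abs(W).max(axis=1, keepdims=True), 1e-12) / 7.0
    H = 2.0 * X.T @ X
    H += percdamp * np.mean(np.diag(H)) * np.eye(n_cols)  # damping for numerical stability
    # Upper-triangular Cholesky factor U of H^-1 (H^-1 = U^T U).
    Hinv = np.linalg.cholesky(np.linalg.inv(H)).T
    Q = np.zeros_like(W)
    for i in range(n_cols):
        q = rtn_int4(W[:, i:i + 1], scale)
        Q[:, i:i + 1] = q
        err = (W[:, i:i + 1] - q) / Hinv[i, i]
        # Error compensation: adjust column i and all later columns.
        W[:, i:] -= err @ Hinv[i:i + 1, i:]
    return Q, scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 16))  # correlated inputs
    W = rng.standard_normal((8, 16))
    Q, scale = gptq_int4(W, X)
    print("GPTQ output error:", np.linalg.norm((W - Q) @ X.T))
    print("RTN  output error:", np.linalg.norm((W - rtn_int4(W, scale)) @ X.T))
```

Comparing the two printed errors on the same calibration data shows the effect of the error compensation relative to plain round-to-nearest with identical scales.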

2 Quantization Example

2.1 Invoking the Quantization Interface

Step 1. From the current directory, run one of the following commands to execute the sample program. Modify the model and dataset paths in the sample program to match your environment:

python3 src/run_llama2_samples.py --model_path=/data/Llama2_7b_hf/

python3 src/run_qwen_samples.py --model_path=/data/Qwen2-7b/

Quantization has succeeded if output similar to the following is printed:

Test time taken: 1.0 min 59.24865388870239 s Score: 5.477707
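When the samples are run from a script or CI job, the success line can be checked automatically. The sketch below assumes the output format shown above ("... Score: <number>") and extracts the last score with a regular expression. parse_score is a hypothetical helper, not part of AMCT. The console output could be captured, for example, with subprocess.run(..., capture_output=True, text=True).

```python
import re
from typing import Optional

# Assumes the "Score: <number>" format shown in the sample output.
SCORE_RE = re.compile(r"Score:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_score(output: str) -> Optional[float]:
    """Return the last 'Score: <value>' found in the console output, or None."""
    matches = SCORE_RE.findall(output)
    return float(matches[-1]) if matches else None


if __name__ == "__main__":
    line = "Test time taken: 1.0 min 59.24865388870239 s Score: 5.477707"
    print(parse_score(line))  # 5.477707
```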

Step 2. The following configurations are recommended.

Here, Score is the perplexity (PPL) of the quantized model. Reference values are listed in the following table:

Model	Calibration set	Evaluation dataset	PPL before quantization	PPL after INT4 quantization	PPL after MXFP4 quantization
LLAMA2-7B	pileval	wikitext2	5.472	5.601	5.799
QWEN2-7B	pileval	wikitext2	7.137	7.253	7.305
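Perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. The sketch below shows that definition and computes the relative PPL increase for each row of the table above. perplexity and relative_increase are illustrative helpers, not AMCT functions, and the values in RESULTS are copied from the table.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """PPL = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(before: float, after: float) -> float:
    """Relative PPL degradation in percent."""
    return (after - before) / before * 100.0


# Values copied from the table above.
RESULTS = {
    "LLAMA2-7B": {"fp": 5.472, "int4": 5.601, "mxfp4": 5.799},
    "QWEN2-7B": {"fp": 7.137, "int4": 7.253, "mxfp4": 7.305},
}

if __name__ == "__main__":
    for model, ppl in RESULTS.items():
        for fmt in ("int4", "mxfp4"):
            print(f"{model} {fmt}: +{relative_increase(ppl['fp'], ppl[fmt]):.2f}%")
```

On these numbers, INT4 raises PPL by about 2.4% on Llama2-7B and 1.6% on Qwen2-7B, while MXFP4 raises it by about 6.0% and 2.4% respectively.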

After the run completes successfully, the quantization log file ./amct_log/amct_pytorch.log is generated in the current directory.


Creation statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
