CANN/mat-chem-sim-pred IPDT Batch Rollout Score Benchmark

Published: 2026/7/4 8:06:53

PidIpdtBatchRolloutScore Benchmark Report

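mat-chem-sim-pred targets the industrial domain and focuses on two core scenarios: computational simulation and prediction. It builds a domain computing layer for the process industries driven by both mechanism and data, and promotes the deep application of AI for Science in materials chemistry. Project address: https://gitcode.com/cann/mat-chem-sim-pred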

This document records the measured CPU/NPU behavior of PidIpdtBatchRolloutScore.

Environment

NPU host: node202
Device: Ascend910B3, device id 0
CANN: /usr/local/Ascend/ascend-toolkit/latest
CPU baseline: benchmark program in multi-thread mode
Build: -DCMAKE_BUILD_TYPE=Release -DSOC_VERSION=Ascend910B3 -DRUN_MODE=npu

Method

The benchmark_pid_ipdt_batch_rollout_score_aclnn program builds an in-process multi-thread CPU reference (ComputeRange, using the same integrator recurrence y[k+1] = y[k] + b*u[k-delay]), runs the NPU operator on the same inputs, and reports max_abs_err, max_quality_rel_err and best_idx_diff_count. The pass conditions are npu_zero_score_count == 0, per-candidate scores matching the CPU reference to float32 precision, and any best_idx differences being near-ties (the chosen candidate's metric rel-err stays small), matching the behavior of the verified FOPDT operator.
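The comparison step can be sketched in a few lines of Python. This is only an illustration of the kind of check described above, not the benchmark's actual code: the function name compare_scores, the nested-list layout (one row of candidate scores per loop), the relative-error definition, and the convention that a lower score is better are all assumptions.

```python
def compare_scores(cpu, npu, eps=1e-12):
    """Compare per-candidate scores from a CPU reference and an NPU run.

    cpu, npu: lists of rows, one row per control loop, one score per candidate.
    All names and conventions here are hypothetical.
    """
    max_abs_err = 0.0
    max_rel_err = 0.0
    best_idx_diff_count = 0
    npu_zero_score_count = 0
    for cpu_row, npu_row in zip(cpu, npu):
        for c, n in zip(cpu_row, npu_row):
            err = abs(n - c)
            max_abs_err = max(max_abs_err, err)
            # Relative error against the CPU value, guarded against division by zero.
            max_rel_err = max(max_rel_err, err / max(abs(c), eps))
            if n == 0.0:
                npu_zero_score_count += 1
        # Assumption: the best candidate is the one with the lowest score.
        if cpu_row.index(min(cpu_row)) != npu_row.index(min(npu_row)):
            best_idx_diff_count += 1
    return {
        "max_abs_err": max_abs_err,
        "max_rel_err": max_rel_err,
        "best_idx_diff_count": best_idx_diff_count,
        "npu_zero_score_count": npu_zero_score_count,
    }


# Second loop is a near-tie: a 2e-7 perturbation flips the winning candidate.
cpu = [[3.0, 1.0, 2.0], [5.0, 4.0, 4.0000001]]
npu = [[3.0, 1.0, 2.0], [5.0, 4.0000002, 4.0000001]]
result = compare_scores(cpu, npu)
print(result)
```

In the toy data the second loop has two candidates whose scores differ by about 1e-7, so a rounding-level difference is enough to change the selected index while the error metrics stay tiny. That is the near-tie situation the pass conditions tolerate.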

Correctness

The IPDT kernel differs from the verified FOPDT kernel only in the state recurrence (the a*y decay term is dropped). The candidate-axis SIMD width does not change the numerics (each tile is independent), so the accuracy profile matches FOPDT: NPU output equals the CPU reference within float32 rounding.
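A minimal open-loop sketch of the two recurrences may help. The IPDT form y[k+1] = y[k] + b*u[k-delay] is taken from the Method section. The FOPDT form used here, y[k+1] = y[k] - a*y[k] + b*u[k-delay], is an assumption chosen so that dropping the a*y term yields the integrator; the real kernel's discretization may differ. The function name rollout and the zero input before the delay elapses are also hypothetical.

```python
def rollout(b, u, delay, a=0.0, y0=0.0):
    """Open-loop plant rollout over the input sequence u.

    a == 0.0 gives the IPDT integrator: y[k+1] = y[k] + b*u[k-delay].
    a > 0.0 gives an assumed FOPDT form: y[k+1] = y[k] - a*y[k] + b*u[k-delay].
    The input is treated as zero until the delay has elapsed (assumption).
    """
    y = [y0]
    for k in range(len(u)):
        u_delayed = u[k - delay] if k >= delay else 0.0
        y.append(y[k] - a * y[k] + b * u_delayed)
    return y


step = [1.0] * 5
ipdt = rollout(b=0.5, u=step, delay=2)          # ramps without bound
fopdt = rollout(b=0.5, u=step, delay=2, a=0.5)  # decays toward b/a = 1.0
print(ipdt)   # [0.0, 0.0, 0.0, 0.5, 1.0, 1.5]
print(fopdt)  # [0.0, 0.0, 0.0, 0.5, 0.75, 0.875]
```

With a step input the integrator keeps ramping after the dead time, while the assumed first-order form levels off. Both variants share the same loop structure, which is consistent with the two kernels differing only in the state update.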

Measured on node202 / Ascend910B3, B=128, sim_steps=1024, candidate_tile=C, npu_zero_score_count=0:

candidates	max_abs_err	max_quality_rel_err	best_idx_diff_count
1024	2.4e-4	1.5e-6	0
4096	1.0	1.69e-3	1
16384	1.5e-3	3.3e-5	1

The max_abs_err=1 at 4096 is the discrete settling-time metric crossing the settle band one sample later on NPU than on CPU for a single near-tie loop (dt=1 -> abs diff 1); the corresponding metric rel-err stays < 2e-3. The reference FOPDT operator shows the same behavior at this candidate count (max_abs_err=1, max_quality_rel_err=4.5e-3, best_idx_diff_count=1), so IPDT is within the accepted baseline.
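The one-sample jump can be reproduced with a toy settling-time metric. The sketch below assumes settling time is the time of the first sample after which the response stays inside the band around the setpoint; the operator's real definition, the function name settling_time, and the trajectories are all hypothetical.

```python
def settling_time(y, setpoint, band, dt=1.0):
    """Time of the first sample after which |y - setpoint| stays within band."""
    last_outside = -1
    for k, value in enumerate(y):
        if abs(value - setpoint) > band:
            last_outside = k
    return (last_outside + 1) * dt


# Two trajectories that differ by 2e-7 at one sample sitting right on the band edge.
cpu_traj = [0.0, 0.5, 0.9, 0.9800001, 0.995, 1.0]
npu_traj = [0.0, 0.5, 0.9, 0.9799999, 0.995, 1.0]

t_cpu = settling_time(cpu_traj, setpoint=1.0, band=0.02)
t_npu = settling_time(npu_traj, setpoint=1.0, band=0.02)
print(t_cpu, t_npu, abs(t_npu - t_cpu))
```

Because the metric counts whole samples, a float32-level difference at the band edge changes it by exactly dt. The relative error is large in this six-sample toy only because the trajectory is so short; over a rollout of hundreds of samples the same one-sample shift is a small fraction of the metric.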

Measured timing

node202 / Ascend910B3, B=128, sim_steps=1024, candidate_tile=C, CPU = 64-thread parallel reference.

candidates	CPU parallel ms	NPU kernel ms	NPU kernel vs CPU
1024	32.5	7.45	4.36x
4096	122.1	24.7	4.95x
16384	426.6	93.8	4.55x

Against a 192-thread CPU reference the speedup is 3.8-4.0x (the wider CPU pool narrows the gap).
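As a quick consistency check, the speedup column for the 64-thread run can be recomputed from the two timing columns. The numbers below are copied from the table; the variable and function names are illustrative only.

```python
# (candidates, CPU parallel ms, NPU kernel ms, reported speedup) from the table.
rows = [
    (1024, 32.5, 7.45, 4.36),
    (4096, 122.1, 24.7, 4.95),
    (16384, 426.6, 93.8, 4.55),
]


def speedup(cpu_ms, npu_ms):
    """Ratio of CPU time to NPU kernel time."""
    return cpu_ms / npu_ms


recomputed = [speedup(cpu_ms, npu_ms) for _, cpu_ms, npu_ms, _ in rows]
for (candidates, _, _, reported), value in zip(rows, recomputed):
    print(f"{candidates}: recomputed {value:.2f}x, reported {reported:.2f}x")
```

The recomputed ratios agree with the reported column to within 0.01. Small differences are expected because the timings in the table are already rounded.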

Notes

The kernel reuses the FOPDT wide-lane (kLane=768) and fused inner-loop optimizations unchanged; the only algorithmic difference is the integrator recurrence, which removes one vector multiply per timestep.
