Stoic Model Performance Evaluation: A Machine Learning Method for Accurately Predicting Protein Complex Stoichiometry

news 2026/5/30 9:38:11

[Free download] stoic project page: https://ai.gitcode.com/hf_mirrors/PickyBinders/stoic

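Stoic is a machine-learning tool for predicting the stoichiometry of protein complexes. It predicts copy numbers quickly and directly from protein sequences, and it can export AF3-ready JSON files based on the best predicted stoichiometry. The tool is built on Facebook's ESM2_t33_650M_UR50D foundation model and is intended for use in biological research.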

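Core Features and Technical Strengths

Stoic's core function is predicting protein stoichiometry. Its main technical strengths are:

Multiple input types: accepts a list of sequences, a single FASTA file, or a directory of FASTA files (each FASTA file is treated as a separate complex)
Efficient prediction: network access is needed only on the first inference, to download the model weights; later runs work offline, with the weights cached in ~/.cache/huggingface
Flexible output: returns a configurable number of top stoichiometry candidates (default 3) and supports predicting and saving residue weights
AF3 integration: directly exports input JSON files for AlphaFold3, so predictions feed straight into a structure prediction workflow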
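To make the AF3 integration point concrete, here is a minimal sketch of how a predicted stoichiometry could be turned into an AlphaFold3 input JSON. The field layout follows the publicly documented AlphaFold3 input format; the helper names `chain_ids` and `build_af3_input` are hypothetical, and whether Stoic's own exported file is laid out exactly this way is an assumption.

```python
import json
from itertools import count


def chain_ids():
    """Yield unique uppercase chain IDs: A, B, ..., Z, AA, AB, ..."""
    for index in count(1):
        label, remaining = "", index
        while remaining > 0:
            remaining, offset = divmod(remaining - 1, 26)
            label = chr(ord("A") + offset) + label
        yield label


def build_af3_input(name, sequences, copy_numbers, model_seeds=(1,)):
    """Build an AlphaFold3-style job dict (hypothetical helper, not Stoic's API).

    Each unique sequence becomes one protein entry whose "id" list holds
    one chain ID per predicted copy.
    """
    if len(sequences) != len(copy_numbers):
        raise ValueError("need exactly one copy number per sequence")
    ids = chain_ids()
    entries = []
    for sequence, copies in zip(sequences, copy_numbers):
        entries.append(
            {"protein": {"id": [next(ids) for _ in range(copies)],
                         "sequence": sequence}}
        )
    return {
        "name": name,
        "modelSeeds": list(model_seeds),
        "sequences": entries,
        "dialect": "alphafold3",
        "version": 1,
    }


# Example: two copies of the first chain, one copy of the second.
job = build_af3_input("demo_complex", ["SENECA", "VIRTVS"], [2, 1])
print(json.dumps(job, indent=2))
```

Listing every copy's chain ID under a single entry keeps the file compact, since each sequence is written once regardless of its copy number.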

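Model Architecture and Performance Metrics

Stoic combines sequence embeddings from a protein language model with a graph convolutional network:

Base model: facebook/esm2_t33_650M_UR50D is used as the sequence embedding model
Feature pooling: a SelfAttentionPooling strategy with 4 attention heads
Sequence feature encoding: a GCNConv graph convolutional network combined with 4-head attention
Quantization: supports a 4-bit loading mode to balance performance against resource consumption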
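The pooling step in the list above can be illustrated with a small, dependency-free sketch of multi-head attention pooling: each head scores every residue embedding, turns the scores into weights with a softmax, and returns the weighted average. This is a generic version of the technique, not Stoic's SelfAttentionPooling code. The per-head query vectors stand in for learned parameters, and the 2-dimensional embeddings are toy values (the real ESM2 650M embeddings have 1280 dimensions).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings, head_queries):
    """Pool L residue vectors into one vector using H attention heads.

    embeddings:   L x D list of residue embeddings
    head_queries: H x D list of query vectors (assumed learned parameters)
    Returns the concatenated pooled vector (length H * D) and the
    per-head residue weights (H x L).
    """
    dim = len(embeddings[0])
    pooled, weights = [], []
    for query in head_queries:
        # Scaled dot-product score of every residue against this head's query.
        scores = [
            sum(x * q for x, q in zip(residue, query)) / math.sqrt(dim)
            for residue in embeddings
        ]
        head_weights = softmax(scores)
        weights.append(head_weights)
        # Weighted average of the residue embeddings for this head.
        for d in range(dim):
            pooled.append(
                sum(w * residue[d] for w, residue in zip(head_weights, embeddings))
            )
    return pooled, weights


# Toy input: 3 residues, 2 dimensions, 4 heads.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head_queries = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(embeddings, head_queries)
print(len(pooled), [round(w, 3) for w in weights[1]])
```

A head with an all-zero query weights every residue equally, so its output reduces to plain mean pooling; the other heads show how a query shifts weight toward residues that align with it.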

The model is evaluated on accuracy, F1 score, recall, and precision, and is reported to perform well on all of them. It predicts protein copy numbers in the range 1 to 24.
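As a reminder of how these four metrics are computed when copy numbers are treated as class labels, here is a short sketch. It uses macro averaging (the unweighted mean over classes); which averaging scheme the Stoic authors used is not stated here, so that choice is an assumption, and the labels below are made-up toy values rather than real results.

```python
from collections import Counter


def copy_number_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precisions = [ratio(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [ratio(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [ratio(2 * p * r, p + r) for p, r in zip(precisions, recalls)]
    n = len(labels)
    return {
        "accuracy": ratio(sum(tp.values()), len(y_true)),
        "precision": ratio(sum(precisions), n),
        "recall": ratio(sum(recalls), n),
        "f1": ratio(sum(f1s), n),
    }


# Toy example: four chains, one copy number mispredicted (2 predicted as 3).
metrics = copy_number_metrics([1, 2, 2, 4], [1, 2, 3, 4])
print(metrics)
```

With 24 possible copy numbers and rare high values, macro averaging gives every class the same weight, which makes it stricter than accuracy on imbalanced data.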

Quick Installation Guide

Environment setup

Using venv

    python -m venv .venv
    source .venv/bin/activate

Using conda/mamba

    mamba create -n stoic-env python=3.10 -y
    mamba activate stoic-env

Installing Stoic

Install from a local clone (editable mode)

    git clone https://gitcode.com/hf_mirrors/PickyBinders/stoic
    cd stoic
    python -m pip install --upgrade pip
    python -m pip install -e .

Install directly from the Git repository

    python -m pip install git+https://gitcode.com/hf_mirrors/PickyBinders/stoic.git

Command-Line Prediction

Stoic provides the command-line tool stoic_predict_stoichiometry, which covers several usage scenarios:

Sequence list input

    stoic_predict_stoichiometry \
      --sequences "SENECA" "VIRTVS" \
      --top-n 3

Single FASTA file input

    stoic_predict_stoichiometry \
      --input-path path/to/complex.fasta \
      --top-n 3

Batch processing of a FASTA directory

    stoic_predict_stoichiometry \
      --input-path path/to/fasta_dir \
      --top-n 3 \
      --output-dir stoic_predictions
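When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the flags that appear in this article (`--sequences`, `--input-path`, `--top-n`, `--output-dir`, `--return-residue-weights`). The helper name `build_stoic_command` is hypothetical, and two behaviours are assumptions: that `--sequences` and `--input-path` are alternatives, and that `--return-residue-weights` is a plain on/off switch.

```python
import shlex


def build_stoic_command(sequences=None, input_path=None, top_n=3,
                        output_dir=None, return_residue_weights=False):
    """Assemble an argv list for stoic_predict_stoichiometry (hypothetical helper)."""
    # Assumption: exactly one input source is given per invocation.
    if (sequences is None) == (input_path is None):
        raise ValueError("pass exactly one of sequences or input_path")
    command = ["stoic_predict_stoichiometry"]
    if sequences is not None:
        command += ["--sequences", *sequences]
    else:
        command += ["--input-path", str(input_path)]
    command += ["--top-n", str(top_n)]
    if output_dir is not None:
        command += ["--output-dir", str(output_dir)]
    if return_residue_weights:
        # Assumption: this flag takes no value.
        command.append("--return-residue-weights")
    return command


command = build_stoic_command(
    input_path="path/to/fasta_dir", output_dir="stoic_predictions"
)
print(shlex.join(command))
# To actually run it (requires Stoic to be installed):
#   import subprocess; subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` rather than a shell string avoids quoting problems when paths contain spaces.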

Python API Examples

High-level inference helper

    from stoic.predict_stoichiometry import predict_stoichiometry

    results = predict_stoichiometry(
        sequences=["SENECA", "VIRTVS"],  # or a FASTA path / FASTA directory path
        model_name="PickyBinders/stoic",
        top_n=3,
    )
    print(results)

Loading the model directly from Hugging Face

    import torch
    from stoic.model import Stoic

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Stoic.from_pretrained("PickyBinders/stoic")
    model.eval().to(device)
    pred = model.predict_stoichiometry(["SENECA", "VIRTVS"], top_n=3)
    print(pred)
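Because a FASTA directory is interpreted as one complex per file, it can be useful to check what a directory contains before running predictions on it. The following is a minimal sketch of that grouping. It does not use Stoic's own loader: the helper names, the accepted file suffixes, and the use of the file stem as the complex name are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Assumption: which suffixes count as FASTA files.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def read_fasta(path):
    """Return the sequences in a FASTA file, joining wrapped lines."""
    records, current = [], None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = []  # a header line starts a new record
            records.append(current)
        elif current is not None:
            current.append(line)
    return ["".join(parts) for parts in records]


def load_complexes(path):
    """Map complex name -> list of sequences for a FASTA file or directory."""
    path = Path(path)
    if path.is_dir():
        files = sorted(
            p for p in path.iterdir() if p.suffix.lower() in FASTA_SUFFIXES
        )
    else:
        files = [path]
    return {fasta.stem: read_fasta(fasta) for fasta in files}


# Demo on a temporary directory holding a single two-chain complex.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "dimer.fasta").write_text(">chain_a\nSENECA\n>chain_b\nVIR\nTVS\n")
    Path(tmp, "notes.txt").write_text("ignored\n")
    complexes = load_complexes(tmp)
print(complexes)
```

Each list in the resulting dictionary has the same shape as the `sequences` argument shown in the examples above.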

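Output Files

When --output-dir is specified, Stoic writes the following files:

Single input (sequence list or single FASTA file):
- results.json: prediction results
- af3_input.json: AlphaFold3 input file
- residue_predictions.pkl: residue-level predictions (requires --return-residue-weights)
FASTA directory input:
- <complex_name>.json: prediction results for each complex
- <complex_name>_af3_input.json: AlphaFold3 input file for each complex
- <complex_name>_residue_predictions.pkl: residue-level predictions for each complex (requires --return-residue-weights)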
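The naming scheme above is regular enough to check programmatically after a batch run. The sketch below lists the files one would expect to find for a given run. The helper name `expected_outputs` is hypothetical, and the file contents are deliberately not parsed, since their schema is not described here.

```python
from pathlib import Path


def expected_outputs(output_dir, complex_names=None, residue_weights=False):
    """List the output files described above (hypothetical helper).

    complex_names=None models a single input (sequence list or one FASTA
    file); a list of names models a FASTA directory input.
    """
    out = Path(output_dir)
    if complex_names is None:
        names = ["results.json", "af3_input.json"]
        if residue_weights:
            names.append("residue_predictions.pkl")
    else:
        names = []
        for name in complex_names:
            names += [f"{name}.json", f"{name}_af3_input.json"]
            if residue_weights:
                names.append(f"{name}_residue_predictions.pkl")
    return [out / name for name in names]


paths = expected_outputs("stoic_predictions", ["dimer", "trimer"], residue_weights=True)
missing = [p for p in paths if not p.exists()]
print(f"{len(missing)} of {len(paths)} expected files are missing")
```

Running this after a batch job gives a quick way to spot complexes for which no prediction was written.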

Citation

If you use Stoic in your research, please cite the following paper:

    @article{litvinov2026stoic,
      title = {Stoic: Fast and accurate protein stoichiometry prediction},
      author = {Litvinov, Daniil and Pantolini, Lorenzo and {\v{S}}krinjar, Peter and Tauriello, Gerardo and McCafferty, Caitlyn L and Engel, Benjamin D and Schwede, Torsten and Durairaj, Janani},
      journal = {bioRxiv},
      year = {2026},
      doi = {10.64898/2026.03.13.711535},
      url = {https://www.biorxiv.org/content/10.64898/2026.03.13.711535v1}
    }

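Stoic is an open-source tool released under the MIT license. It gives proteomics researchers a fast way to predict the composition of protein complexes, supporting work in structural biology and systems biology.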

Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
