yolov26改进 | 添加注意力机制篇 | 利用SENetV2改进网络结构 (全网独家改进,含二次创新C2PSA、SPPF)
开始讲解之前推荐一下我的专栏,本专栏的内容支持(分类、检测、分割、追踪、关键点检测),专栏目前为限时折扣,欢迎大家订阅本专栏,本专栏每周更新5-7篇最新机制,更有包含我所有改进的文件和交流群提供给大家,本人定期在群内分享发表论文方法和经验。
一、本文介绍
本文给大家带来的改进机制是SENetV2,其是一种通过调整卷积网络中的通道关系来提升性能的网络结构。SENet并不是一个独立的网络模型,而是一个可以和现有的任何一个模型相结合的模块(可以看作是一种通道型的注意力机制但是相对于SENetV1来说V2又在全局的角度进行了考虑)。在SENet中,所谓的挤压和激励(Squeeze-and-Excitation)操作是作为一个单元添加到传统的卷积网络结构中,如残差单元中(文章中我会把修改好的残差单元给大家大家直接复制粘贴即可使用)亲测大中小三中目标检测上都有一定程度的涨点效果,含二次创新SPPF、C2PSA机制.。
专栏链接:YOLOv26有效涨点专栏包含:Conv、注意力机制、主干/Backbone、损失函数、优化器、后处理等改进机制
二、SENetV2框架原理
论文地址:官方论文地址点击即可跳转
代码地址:官方代码地址点击即可跳转
SENetV2介绍了一种改进的SENet架构,该架构通过引入一种称为Squeeze aggregated excitation(SaE)的新模块来提升网络的表征能力。这个模块结合了挤压和激励(SENetV1)操作,通过多分支全连接层增强了网络的全局表示学习。在基准数据集上的实验结果证明了SENetV2模型相较于现有模型在分类精度上的显著提升。这一架构尤其强调在仅略微增加模型参数的情况下,如何有效地提高模型的性能。
挤压和激励模块大家可以看我发的SENetV1文章里面有介绍。
图中展示了三种不同的神经网络模块对比:
a) ResNeXt模块:采用多分支CNN结构,不同分支的特征图通过卷积操作处理后合并(concatenate),再进行额外的卷积操作。
b) SENet模块:标准卷积操作后,利用全局平均池化来挤压特征,然后通过两个尺寸为1x1的全连接层(FC)和Sigmoid激活函数来获取通道权重,最后对卷积特征进行缩放(Scale)。
c) SENetV2模块:结合了ResNeXt和SENet的特点,采用多分支全连接层(FC)来挤压和激励操作,最后进行特征缩放。
其中SENetV2的设计旨在通过多分支结构进一步提升特征表达的精细度和全局信息的整合能力。
前面我们提到了SaE,就是SENetV2相对于SENetV1的主要改进机制,下面的图片介绍了其内部工作原理。
SENet V2中所提出的SaE(Squeeze-and-Excitation)模块的内部工作机制。挤压输出后,被输入到多分支的全连接(FC)层,然后进行激励过程。分割的输入在最后被传递以恢复其原始形状。这种设计能够让网络更有效地学习到输入数据的不同特征,并且在进行特征转换时考虑到不同通道之间的相互依赖性。
三、SENetV2核心代码
下面的代码是SENetV2的核心代码,我们将其复制导'ultralytics/nn'目录下,在其中创建一个文件,我这里起名为SENetV2然后粘贴进去,其余使用方式看章节四。
import torch import torch.nn as nn __all__ = ['C2PSA_SENetV2', 'SELayerV2', 'SPPFSENetV2'] # 定义SE模块 class SELayer(nn.Module): def __init__(self, channel, reduction=16): super(SELayer, self).__init__() self.avg_pool = nn.AdaptiveAvgPool2d(1) self.fc = nn.Sequential( nn.Linear(channel, channel // reduction, bias=False), nn.ReLU(inplace=True), nn.Linear(channel // reduction, channel, bias=False), nn.Sigmoid() ) def forward(self, x): b, c, _, _ = x.size() y = self.avg_pool(x).view(b, c) y = self.fc(y).view(b, c, 1, 1) return x * y.expand_as(x) # 定义SaE模块 class SELayerV2(nn.Module): def __init__(self, in_channel, reduction=16): super(SELayerV2, self).__init__() assert in_channel >= reduction and in_channel % reduction == 0, 'invalid in_channel in SaElayer' self.reduction = reduction self.cardinality = 4 self.avg_pool = nn.AdaptiveAvgPool2d(1) # cardinality 1 self.fc1 = nn.Sequential( nn.Linear(in_channel, in_channel // self.reduction, bias=False), nn.ReLU(inplace=True) ) # cardinality 2 self.fc2 = nn.Sequential( nn.Linear(in_channel, in_channel // self.reduction, bias=False), nn.ReLU(inplace=True) ) # cardinality 3 self.fc3 = nn.Sequential( nn.Linear(in_channel, in_channel // self.reduction, bias=False), nn.ReLU(inplace=True) ) # cardinality 4 self.fc4 = nn.Sequential( nn.Linear(in_channel, in_channel // self.reduction, bias=False), nn.ReLU(inplace=True) ) self.fc = nn.Sequential( nn.Linear(in_channel // self.reduction * self.cardinality, in_channel, bias=False), nn.Sigmoid() ) def forward(self, x): b, c, _, _ = x.size() y = self.avg_pool(x).view(b, c) y1 = self.fc1(y) y2 = self.fc2(y) y3 = self.fc3(y) y4 = self.fc4(y) y_concate = torch.cat([y1, y2, y3, y4], dim=1) y_ex_dim = self.fc(y_concate).view(b, c, 1, 1) return x * y_ex_dim.expand_as(x) def autopad(k, p=None, d=1): # kernel, padding, dilation """Pad to 'same' shape outputs.""" if d > 1: k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k] # actual kernel-size if p is None: p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad return p class Conv(nn.Module): """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation).""" default_act = nn.SiLU() # default activation def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True): """Initialize Conv layer with given arguments including activation.""" super().__init__() self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False) self.bn = nn.BatchNorm2d(c2) self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity() def forward(self, x): """Apply convolution, batch normalization and activation to input tensor.""" return self.act(self.bn(self.conv(x))) def forward_fuse(self, x): """Perform transposed convolution of 2D data.""" return self.act(self.conv(x)) class PSABlock(nn.Module): """ PSABlock class implementing a Position-Sensitive Attention block for neural networks. This class encapsulates the functionality for applying multi-head attention and feed-forward neural network layers with optional shortcut connections. Attributes: attn (Attention): Multi-head attention module. ffn (nn.Sequential): Feed-forward neural network module. add (bool): Flag indicating whether to add shortcut connections. Methods: forward: Performs a forward pass through the PSABlock, applying attention and feed-forward layers. Examples: Create a PSABlock and perform a forward pass >>> psablock = PSABlock(c=128, attn_ratio=0.5, num_heads=4, shortcut=True) >>> input_tensor = torch.randn(1, 128, 32, 32) >>> output_tensor = psablock(input_tensor) """ def __init__(self, c, attn_ratio=0.5, num_heads=4, shortcut=True) -> None: """Initializes the PSABlock with attention and feed-forward layers for enhanced feature extraction.""" super().__init__() self.attn = SELayerV2(c) self.ffn = nn.Sequential(Conv(c, c * 2, 1), Conv(c * 2, c, 1, act=False)) self.add = shortcut def forward(self, x): """Executes a forward pass through PSABlock, applying attention and feed-forward layers to the input tensor.""" x = x + self.attn(x) if self.add else self.attn(x) x = x + self.ffn(x) if self.add else self.ffn(x) return x class C2PSA_SENetV2(nn.Module): """ C2PSA module with attention mechanism for enhanced feature extraction and processing. This module implements a convolutional block with attention mechanisms to enhance feature extraction and processing capabilities. It includes a series of PSABlock modules for self-attention and feed-forward operations. Attributes: c (int): Number of hidden channels. cv1 (Conv): 1x1 convolution layer to reduce the number of input channels to 2*c. cv2 (Conv): 1x1 convolution layer to reduce the number of output channels to c. m (nn.Sequential): Sequential container of PSABlock modules for attention and feed-forward operations. Methods: forward: Performs a forward pass through the C2PSA module, applying attention and feed-forward operations. Notes: This module essentially is the same as PSA module, but refactored to allow stacking more PSABlock modules. Examples: >>> c2psa = C2PSA(c1=256, c2=256, n=3, e=0.5) >>> input_tensor = torch.randn(1, 256, 64, 64) >>> output_tensor = c2psa(input_tensor) """ def __init__(self, c1, c2, n=1, e=0.5): """Initializes the C2PSA module with specified input/output channels, number of layers, and expansion ratio.""" super().__init__() assert c1 == c2 self.c = int(c1 * e) self.cv1 = Conv(c1, 2 * self.c, 1, 1) self.cv2 = Conv(2 * self.c, c1, 1) self.m = nn.Sequential(*(PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n))) def forward(self, x): """Processes the input tensor 'x' through a series of PSA blocks and returns the transformed tensor.""" a, b = self.cv1(x).split((self.c, self.c), dim=1) b = self.m(b) return self.cv2(torch.cat((a, b), 1)) class SPPFSENetV2(nn.Module): """Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher.""" def __init__(self, c1: int, c2: int, k: int = 5, n: int = 3, shortcut: bool = False): """Initialize the SPPF layer with given input/output channels and kernel size. Args: c1 (int): Input channels. c2 (int): Output channels. k (int): Kernel size. n (int): Number of pooling iterations. shortcut (bool): Whether to use shortcut connection. Notes: This module is equivalent to SPP(k=(5, 9, 13)). """ super().__init__() c_ = c1 // 2 # hidden channels self.cv1 = Conv(c1, c_, 1, 1, act=False) self.cv2 = Conv(c_ * (n + 1), c2, 1, 1) self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) self.n = n self.add = shortcut and c1 == c2 self.Att = SELayerV2(c1) def forward(self, x: torch.Tensor) -> torch.Tensor: """Apply sequential pooling operations to input and return concatenated feature maps.""" x = self.Att(x) y = [self.cv1(x)] y.extend(self.m(y[-1]) for _ in range(getattr(self, "n", 3))) y = self.cv2(torch.cat(y, 1)) return y + x if getattr(self, "add", False) else y if __name__ == "__main__": # Generating Sample image image_size = (1, 64, 240, 240) image = torch.rand(*image_size) # Model mobilenet_v1 = SPPFSENetV2(64, 64) out = mobilenet_v1(image) print(out.size())四、手把手教你添加SENetV2模块
下面的步骤如果你不会或者不想麻烦操作,可以联系作者获得本专栏添加所有项目文件的源代码,可直接训练.
4.1 修改一
第一还是建立文件,我们找到如下ultralytics/nn文件夹下建立一个目录名字呢就是'Addmodules'文件夹!
4.2 修改二
然后在Addmodules文件夹内建立一个新的py文件,将本文章节三中的“核心代码"复制粘贴进去。
4.3 修改三
第二步我们在该目录下创建一个新的py文件名字为'__init__.py',然后在其内部导入我们的文件,如下图所示。
4.4 修改四
第三步我门中到如下文件'ultralytics/nn/tasks.py'进行导入和注册我们的模块(此处只需要添加一次即可,如果你用我其它的改进机制这里的步骤只需要添加一次)!
4.5 修改五
在'ultralytics/nn/tasks.py'文件内的parse_model方法函数内(位置大概在1500+行左右),按照图示位置添加即可(此处需要自己有一定的判别能力,如果不会可联系作者获得视频教程)。
4.6 修改六
在'ultralytics/nn/tasks.py'文件内的parse_model方法函数内(位置大概在1550+行左右),按照图示位置添加即可,此处一定要对应好位置和缩进否则很容易报错。
elif m in {此处填写本章代码的名字.}: c2 = ch[f] args = [c2, *args]五、正式训练
5.1 yaml文件
5.1.1 yaml文件1
训练信息:YOLO26-C2PSA-SENetV2 summary: 263 layers, 2,463,004 parameters, 2,463,004 gradients, 5.7 GFLOPs
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license # Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs # Model docs: https://docs.ultralytics.com/models/yolo26 # Task docs: https://docs.ultralytics.com/tasks/detect # Parameters nc: 80 # number of classes end2end: True # whether to use end-to-end mode reg_max: 1 # DFL bins scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n' # [depth, width, max_channels] n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs # YOLO26n backbone backbone: # [from, repeats, module, args] - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2 - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4 - [-1, 2, C3k2, [256, False, 0.25]] - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8 - [-1, 2, C3k2, [512, False, 0.25]] - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16 - [-1, 2, C3k2, [512, True]] - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32 - [-1, 2, C3k2, [1024, True]] - [-1, 1, SPPF, [1024, 5, 3, True]] # 9 - [-1, 2, C2PSA_SENetV2, [1024]] # 10 # YOLO26n head head: - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 6], 1, Concat, [1]] # cat backbone P4 - [-1, 2, C3k2, [512, True]] # 13 - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 4], 1, Concat, [1]] # cat backbone P3 - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small) - [-1, 1, Conv, [256, 3, 2]] - [[-1, 13], 1, Concat, [1]] # cat head P4 - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium) - [-1, 1, Conv, [512, 3, 2]] - [[-1, 10], 1, Concat, [1]] # cat head P5 - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large) - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)5.1.2 yaml文件2
训练信息:YOLO26-Att-SENetV2 summary: 271 layers, 2,508,188 parameters, 2,508,188 gradients, 5.8 GFLOPs
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license # Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs # Model docs: https://docs.ultralytics.com/models/yolo26 # Task docs: https://docs.ultralytics.com/tasks/detect # Parameters nc: 80 # number of classes end2end: True # whether to use end-to-end mode reg_max: 1 # DFL bins scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n' # [depth, width, max_channels] n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs # YOLO26n backbone backbone: # [from, repeats, module, args] - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2 - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4 - [-1, 2, C3k2, [256, False, 0.25]] - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8 - [-1, 2, C3k2, [512, False, 0.25]] - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16 - [-1, 2, C3k2, [512, True]] - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32 - [-1, 2, C3k2, [1024, True]] - [-1, 1, SPPF, [1024, 5, 3, True]] # 9 - [-1, 2, C2PSA, [1024]] # 10 # YOLO26n head head: - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 6], 1, Concat, [1]] # cat backbone P4 - [-1, 2, C3k2, [512, True]] # 13 - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 4], 1, Concat, [1]] # cat backbone P3 - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small) - [-1, 1, Conv, [256, 3, 2]] - [[-1, 13], 1, Concat, [1]] # cat head P4 - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium) - [-1, 1, Conv, [512, 3, 2]] - [[-1, 10], 1, Concat, [1]] # cat head P5 - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large) - [16, 1, SELayerV2, []] # 23 # - [19, 1, SELayerV2, []] # 24 # - [22, 1, SELayerV2, []] # 25 # 此处的使用说法注释: 其中上面的三个注意力机制目前仅使用了23层,如果你想使用24层那么就取消掉代码注释, # 并将下面检测头中的19改为24,如果想使用第25层注意力机制同理,将下面检测头中的22改为25即可。 # 此处用法比较复杂如过不会联系Snu77博主获取视频教程 - [[23, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)5.1.3 yaml文件3
训练信息:YOLO26-SPPF-SENetV2 summary: 272 layers, 2,538,908 parameters, 2,538,908 gradients, 5.8 GFLOPs
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license # Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs # Model docs: https://docs.ultralytics.com/models/yolo26 # Task docs: https://docs.ultralytics.com/tasks/detect # Parameters nc: 80 # number of classes end2end: True # whether to use end-to-end mode reg_max: 1 # DFL bins scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n' # [depth, width, max_channels] n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs # YOLO26n backbone backbone: # [from, repeats, module, args] - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2 - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4 - [-1, 2, C3k2, [256, False, 0.25]] - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8 - [-1, 2, C3k2, [512, False, 0.25]] - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16 - [-1, 2, C3k2, [512, True]] - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32 - [-1, 2, C3k2, [1024, True]] - [-1, 1, SPPFSENetV2, [1024, 5, 3, True]] # 9 - [-1, 2, C2PSA, [1024]] # 10 # YOLO26n head head: - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 6], 1, Concat, [1]] # cat backbone P4 - [-1, 2, C3k2, [512, True]] # 13 - [-1, 1, nn.Upsample, [None, 2, "nearest"]] - [[-1, 4], 1, Concat, [1]] # cat backbone P3 - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small) - [-1, 1, Conv, [256, 3, 2]] - [[-1, 13], 1, Concat, [1]] # cat head P4 - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium) - [-1, 1, Conv, [512, 3, 2]] - [[-1, 10], 1, Concat, [1]] # cat head P5 - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large) - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)5.2 训练代码
大家可以创建一个py文件将我给的代码复制粘贴进去,配置好自己的文件路径即可运行。
import warnings warnings.filterwarnings('ignore') from ultralytics import YOLO if __name__ == '__main__': model = YOLO('模型配置文件地址,也就是5.1你保存到本地文件的地址') # 如何切换模型版本, 上面的ymal文件可以改为 yolo26s.yaml就是使用的26s, # 类似某个改进的yaml文件名称为yolo26-XXX.yaml那么如果想使用其它版本就把上面的名称改为yolo26l-XXX.yaml即可(改的是上面YOLO中间的名字不是配置文件的)! # model.load('yolo26n.pt') # 是否加载预训练权重,科研不建议大家加载否则很难提升精度 model.train( data=r"数据集文件地址", # 如果大家任务是其它的'ultralytics/cfg/default.yaml'找到这里修改task可以改成detect, segment, classify, pose cache=False, imgsz=640, epochs=20, single_cls=False, # 是否是单类别检测 batch=16, close_mosaic=0, workers=0, device='0', optimizer='MuSGD', # using SGD/MuSGD # resume=, # 这里是填写last.pt地址 amp=True, # 如果出现训练损失为Nan可以关闭amp project='runs/train', name='exp', )5.3 训练过程截图
五、本文总结
到此本文的正式分享内容就结束了,在这里给大家推荐我的YOLOv26改进有效涨点专栏,本专栏目前为新开的平均质量分98分,后期我会根据各种最新的前沿顶会进行论文复现,也会对一些老的改进机制进行补充,如果大家觉得本文帮助到你了,订阅本专栏,关注后续更多的更新~
专栏链接:YOLOv26有效涨点专栏包含:Conv、注意力机制、主干/Backbone、损失函数、优化器、后处理等改进机制
