The Whole DeepLab Family (From v1 to v3+)

DeepLabv1

Paper: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Honestly, reading ancient papers like this is a bit of a chore: the algorithms they reference are rarely used anymore, and the terminology differs from what we use today. The paper mainly introduces atrous (dilated) convolution and the fully connected conditional random field (CRF).

Atrous convolution, as the name suggests, inserts holes between the weights of the convolution kernel, so that the receptive field of a large kernel is obtained at the computational cost of a small one. (If my understanding is wrong, please email me to point it out.)

Compared with an ordinary convolution, an atrous convolution has one extra parameter, the dilation rate, which specifies the spacing between sampled positions within the kernel.

(Animation: how an atrous/dilated convolution samples the input)
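
To see what the dilation rate does in practice, here is a minimal PyTorch sketch (the shapes and layer choices are my own, purely for illustration): a 3x3 kernel with rate d keeps its 9 weights but spreads its taps d pixels apart, so its effective window grows to 3 + 2*(d - 1).

import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)  # dummy single-channel feature map

# standard 3x3 convolution: receptive field 3x3
conv_d1 = nn.Conv2d(1, 1, kernel_size=3, padding=1, dilation=1)
# atrous 3x3 convolution with rate 2: still 9 weights, but the taps are
# 2 pixels apart, so it covers a 5x5 window
conv_d2 = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2)

print(conv_d1(x).shape, conv_d2(x).shape)  # both torch.Size([1, 1, 32, 32])

# effective kernel size: k_eff = k + (k - 1) * (d - 1)
for d in (1, 2, 4):
    print(d, 3 + (3 - 1) * (d - 1))  # 3, 5, 9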

The conditional random field involves a lot of machine learning background and takes quite a while to learn; it was also dropped in later DeepLab versions, so I will skip it here and fill it in if I get the chance.

DeepLabv2

Paper: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Compared with v1, the main change in v2 is the addition of the Atrous Spatial Pyramid Pooling (ASPP) module, an idea borrowed from SPPNet. The paper itself says very little about ASPP and never really explains how the module works, so you have to piece it together from the figures in the paper and from blog posts online.

(Figures: the ASPP module as illustrated in the DeepLabv2 paper)

As the figures show, ASPP applies several atrous convolutions with different dilation rates to the same feature map, producing multiple parallel branches that are finally concatenated together. The following code should make this part easier to understand:

# ASPP without batch normalization
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_channel=512, depth=256):
        super(ASPP, self).__init__()
        # image-level branch: global average pooling to (1, 1), then a 1x1 conv
        self.mean = nn.AdaptiveAvgPool2d((1, 1))
        self.conv = nn.Conv2d(in_channel, depth, 1, 1)
        # parallel atrous branches: a 1x1 conv plus 3x3 convs with rates 6, 12, 18
        self.atrous_block1 = nn.Conv2d(in_channel, depth, 1, 1)
        self.atrous_block6 = nn.Conv2d(in_channel, depth, 3, 1, padding=6, dilation=6)
        self.atrous_block12 = nn.Conv2d(in_channel, depth, 3, 1, padding=12, dilation=12)
        self.atrous_block18 = nn.Conv2d(in_channel, depth, 3, 1, padding=18, dilation=18)
        # 1x1 conv that fuses the five concatenated branches
        self.conv_1x1_output = nn.Conv2d(depth * 5, depth, 1, 1)

    def forward(self, x):
        size = x.shape[2:]

        # image-level features: pool, project, then upsample back to the input size
        image_features = self.mean(x)
        image_features = self.conv(image_features)
        image_features = F.interpolate(image_features, size=size, mode='bilinear',
                                       align_corners=False)

        atrous_block1 = self.atrous_block1(x)
        atrous_block6 = self.atrous_block6(x)
        atrous_block12 = self.atrous_block12(x)
        atrous_block18 = self.atrous_block18(x)

        # concatenate along the channel dimension and fuse with a 1x1 conv
        net = self.conv_1x1_output(torch.cat([image_features, atrous_block1, atrous_block6,
                                              atrous_block12, atrous_block18], dim=1))
        return net
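
A quick shape check (the input size here is arbitrary, chosen just for illustration): every branch preserves the spatial resolution, so a 512-channel feature map comes out as a 256-channel map with the same height and width.

aspp = ASPP(in_channel=512, depth=256)
feat = torch.randn(1, 512, 32, 32)  # e.g. a backbone feature map
out = aspp(feat)
print(out.shape)  # torch.Size([1, 256, 32, 32])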

DeepLabv3

Paper: Rethinking Atrous Convolution for Semantic Image Segmentation

The paper proposes two ways of arranging the atrous modules, cascade and parallel, and finds that the parallel arrangement works better.

(Figures: the cascade and parallel (ASPP) arrangements of atrous convolution modules, from the DeepLabv3 paper)
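
To make the two arrangements concrete, here is a schematic sketch (the rates, channel counts, and plain Sequential/ModuleList structure are illustrative assumptions, not the exact ResNet-block layout in the paper): cascade stacks atrous convolutions one after another to go deeper, while parallel applies them side by side to the same input and fuses the results, which is essentially the ASPP shown above.

import torch
import torch.nn as nn

def atrous_conv(c, rate):
    # 3x3 atrous convolution that preserves the spatial size
    return nn.Conv2d(c, c, 3, padding=rate, dilation=rate)

# cascade: atrous convolutions stacked sequentially (going deeper)
cascade = nn.Sequential(atrous_conv(256, 2), atrous_conv(256, 4), atrous_conv(256, 8))

# parallel: atrous convolutions applied side by side and concatenated (the ASPP idea)
branches = nn.ModuleList([atrous_conv(256, r) for r in (6, 12, 18)])

x = torch.randn(1, 256, 32, 32)
print(cascade(x).shape)                                  # torch.Size([1, 256, 32, 32])
print(torch.cat([b(x) for b in branches], dim=1).shape)  # torch.Size([1, 768, 32, 32])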

The network drops the CRF, adjusts some hyperparameters, and adopts a few newer techniques (such as batch normalization) to make the model leaner.
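
For reference, a minimal sketch of what adding batch normalization looks like for a single ASPP branch, following the usual Conv-BN-ReLU pattern (the exact layer configuration in the paper may differ):

import torch.nn as nn

def aspp_branch_bn(in_channel, depth, rate):
    # one ASPP branch with batch normalization: Conv -> BN -> ReLU
    return nn.Sequential(
        nn.Conv2d(in_channel, depth, 3, padding=rate, dilation=rate, bias=False),
        nn.BatchNorm2d(depth),
        nn.ReLU(inplace=True),
    )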

Even though the text does not make it look like that many changes, the authors report a large performance improvement. Heh.

DeepLabv3+

Paper: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

It's 2018 already, and DeepLab has finally added an encoder-decoder structure to the network; before this it had always used plain bilinear interpolation for upsampling.

(Figures: the DeepLabv3+ encoder-decoder architecture and the atrous separable convolution, from the paper)
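
The "atrous separable convolution" in the title factors an atrous convolution into a depthwise (per-channel) atrous convolution followed by a pointwise 1x1 convolution. A minimal sketch of that factorization (the class name and channel counts are my own, not taken from the paper's implementation):

import torch
import torch.nn as nn

class AtrousSeparableConv(nn.Module):
    # depthwise atrous convolution followed by a pointwise 1x1 convolution
    def __init__(self, in_channel, out_channel, rate):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channel, in_channel, 3, padding=rate,
                                   dilation=rate, groups=in_channel)
        self.pointwise = nn.Conv2d(in_channel, out_channel, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 256, 32, 32)
print(AtrousSeparableConv(256, 256, rate=6)(x).shape)  # torch.Size([1, 256, 32, 32])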

References

  1. 【语义分割系列:一】DeepLab v1 / v2 论文阅读翻译笔记
  2. 语义分割(semantic segmentation)—DeepLabV3之ASPP(Atrous Spatial Pyramid Pooling)代码详解
  3. deeplab v3论文翻译 Rethinking Atrous Convolution for Semantic Image Segmentation
  4. Deeplab相关改进的阅读记录(Deeplab V3和Deeplab V3+)