[Code Reading] Understanding the Sparse Convolution Code
Published: 2021-09-16 07:31:52

Sparse convolution has been used successfully in 3D object detection networks such as SECOND, Part-A^2, and PV-RCNN, which demonstrates its effectiveness. Compared with dense 3D convolution, it has a large advantage in both runtime and GPU memory consumption. Sparse convolution was proposed in the SECOND paper, which also describes how to implement it. Part-A^2 and PV-RCNN, however, use a different implementation, spconv, which is very easy to use from PyTorch.

This post reads through the code to see how spconv is actually implemented. OpenPCDet is used mainly to show how spconv is called.

The most important sentence is in spconv's README: "This implementation use gather-gemm-scatter framework to do sparse convolution." After reading this post, that sentence should make sense.

OpenPCDet: VoxelBackBone8x

This section looks at how spconv is used; see the VoxelBackBone8x class in pcdet/models/backbones_3d/spconv_backbone.py of OpenPCDet.

class VoxelBackBone8x(nn.Module):
    def __init__(self, model_cfg, input_channels, grid_size, **kwargs):
        super().__init__()
        self.model_cfg = model_cfg
        norm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)
        self.sparse_shape = grid_size[::-1] + [1, 0, 0]  # self.sparse_shape = [41, 1600, 1408]
        self.conv_input = spconv.SparseSequential(
            spconv.SubMConv3d(input_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),  # the call to focus on
            norm_fn(16),
            nn.ReLU(),
        )
        ...

    def forward(self, batch_dict):
        """
        Args:
            batch_dict:
                batch_size: int
                vfe_features: (num_voxels, C)
                voxel_coords: (num_voxels, 4), [batch_idx, z_idx, y_idx, x_idx]
        Returns:
            batch_dict:
                encoded_spconv_tensor: sparse tensor
        """
        voxel_features, voxel_coords = batch_dict['voxel_features'], batch_dict['voxel_coords']
        batch_size = batch_dict['batch_size']
        # first build the sparse tensor
        input_sp_tensor = spconv.SparseConvTensor(
            features=voxel_features,    # [32000, 3]: 32000 voxels from the two point-cloud frames, 3 = xyz
            indices=voxel_coords.int(), # [32000, 4]: voxel coordinates; the first of the 4 is the batch index, the rest are the 3 spatial indices
            spatial_shape=self.sparse_shape,  # [41, 1600, 1408], determined by the KITTI voxelization settings
            batch_size=batch_size       # batch_size = 2, two point-cloud frames
        )
        x = self.conv_input(input_sp_tensor)  # run the submanifold convolution
        ...
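To play with the same API outside OpenPCDet, the snippet below is a minimal standalone sketch: it builds a SparseConvTensor from random voxels and pushes it through a single SubMConv3d. It assumes spconv 1.x and a CUDA GPU; the voxel data, counts and variable names are made up for illustration and are not part of OpenPCDet.

# Standalone sketch (random data, illustrative only), assuming spconv 1.x + CUDA.
import torch
import spconv

in_channels = 3
spatial_shape = [41, 1600, 1408]   # [z, y, x], same KITTI setting as above
batch_size = 2

# random voxel coordinates [batch_idx, z, y, x]; real pipelines guarantee unique coordinates
coords = torch.stack([
    torch.randint(0, batch_size, (1000,)),
    torch.randint(0, spatial_shape[0], (1000,)),
    torch.randint(0, spatial_shape[1], (1000,)),
    torch.randint(0, spatial_shape[2], (1000,)),
], dim=1)
coords = torch.unique(coords, dim=0).int().cuda()
features = torch.randn(coords.shape[0], in_channels).cuda()

x = spconv.SparseConvTensor(features, coords, spatial_shape, batch_size)
net = spconv.SparseSequential(
    spconv.SubMConv3d(in_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),
).cuda()
y = net(x)
print(y.features.shape)   # [num_voxels, 16]; for subm conv, y.indices equals x.indices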

spconv

Sparse Tensor

Before running sparse convolution, a sparse tensor has to be constructed. Let's first see how it is built, in spconv/__init__.py of spconv:

class SparseConvTensor(object):
    def __init__(self, features, indices, spatial_shape, batch_size, grid=None):
        """
        Args:
            grid: pre-allocated grid tensor. should be used when the volume of spatial shape
                is very large.
        """
        self.features = features        # features of the non-empty voxels, stored densely
        self.indices = indices          # voxel-grid coordinates corresponding to each feature row
        if self.indices.dtype != torch.int32:
            self.indices.int()
        self.spatial_shape = spatial_shape
        self.batch_size = batch_size    # batch size
        self.indice_dict = {}           # caches the index correspondences (indice pairs) between layers
        self.grid = grid
        ...
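To make the data layout concrete, here is a tiny toy illustration (not spconv code, shapes made up) of what the (features, indices) pair stores: scattering the rows of features into a dense volume at the coordinates given by indices reconstructs the dense tensor, and every voxel not listed in indices is implicitly zero.

# Toy illustration of the sparse (features, indices) representation.
import torch

batch_size, C = 1, 3
spatial_shape = [4, 4, 4]                               # [D, H, W]
features = torch.tensor([[1., 1., 1.],
                         [2., 2., 2.]])                 # two non-empty voxels
indices = torch.tensor([[0, 0, 1, 2],                   # [batch_idx, z, y, x]
                        [0, 3, 0, 1]], dtype=torch.int32)

dense = torch.zeros(batch_size, C, *spatial_shape)
b, z, y, x = indices.long().unbind(dim=1)
dense[b, :, z, y, x] = features                         # all other voxels stay zero
print(dense[0, :, 0, 1, 2])                             # tensor([1., 1., 1.]), the first stored voxel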

SubMConv3d

Continuing from above: the submanifold convolution takes the sparse tensor as its input. See spconv/conv.py in spconv:

class SparseConvolution(SparseModule):
    def __init__(self,
                 ndim,
                 in_channels,
                 out_channels,
                 kernel_size=3,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 bias=True,
                 subm=False,
                 output_padding=0,
                 transposed=False,
                 inverse=False,
                 indice_key=None):
        super(SparseConvolution, self).__init__()
        ...
        # weight and bias below are the filter weights and bias of this layer
        self.weight = Parameter(
            torch.Tensor(*kernel_size, in_channels, out_channels))
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        ...

    def forward(self, input):
        ...
        # the code above does some preparation: it computes out_spatial_shape from the kernel size,
        # padding, etc., and handles the special case of 1x1 kernels

        # input is a sparse tensor; look up the indice pairs cached by a previous layer with the same indice_key
        datas = input.find_indice_pair(self.indice_key)
        ...
        if self.indice_key is not None and datas is not None:
            outids, _, indice_pairs, indice_pair_num, _ = datas
        else:
            # compute the input/output indices of this layer and the mapping between them
            # the input indices are simply input.indices; the output indices are outids
            # indice_pairs stores the correspondence between indices and outids, and
            # indice_pair_num is an auxiliary counter used when building the matrices during convolution
            #   indices: [32000, 4]
            #   outids: [32000, 4] -- because of the submanifold property, outids equals indices;
            #                         for a regular sparse conv they differ
            #   indice_pairs: [27, 2, 32000] -- 27 comes from the 3x3x3 kernel; dim 1 holds the pair:
            #                 position 0 stores offsets into indices, position 1 stores offsets into outids
            #   indice_pair_num: [27] -- indice_pair_num[i] is the number of valid (input, output) pairs
            #                            for kernel offset i
            outids, indice_pairs, indice_pair_num = ops.get_indice_pairs(
                indices, batch_size, spatial_shape, self.kernel_size,
                self.stride, self.padding, self.dilation, self.output_padding,
                self.subm, self.transposed, grid=input.grid)
            input.indice_dict[self.indice_key] = (outids, indices, indice_pairs,
                                                  indice_pair_num, spatial_shape)
        ...
        if self.subm:
            # run the sparse (submanifold) convolution
            out_features = Fsp.indice_subm_conv(features, self.weight,
                                                indice_pairs.to(device),
                                                indice_pair_num,
                                                outids.shape[0])
        ...
        # add the bias
        if self.bias is not None:
            out_features += self.bias

        # build the output sparse tensor
        out_tensor = spconv.SparseConvTensor(out_features, outids,
                                             out_spatial_shape, batch_size)
        out_tensor.indice_dict = input.indice_dict
        out_tensor.grid = input.grid
        return out_tensor
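One detail worth noting in forward: because a submanifold convolution keeps the set of non-empty voxels unchanged, several layers can share the same indice_key, and only the first of them pays for ops.get_indice_pairs; later layers get the cached tuple back from input.find_indice_pair. A small hedged sketch of that pattern (the block itself is made up; the calls are the ones quoted above, assuming spconv 1.x):

# Layers sharing an indice_key reuse the indice pairs computed by the first one.
import torch.nn as nn
import spconv

block = spconv.SparseSequential(
    spconv.SubMConv3d(16, 16, 3, padding=1, bias=False, indice_key='subm1'),  # computes pairs, caches them under 'subm1'
    nn.ReLU(),
    spconv.SubMConv3d(16, 16, 3, padding=1, bias=False, indice_key='subm1'),  # find_indice_pair('subm1') hits the cache
    nn.ReLU(),
)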

Two questions are left open by the code above: 1) where the indices of the output out_tensor (i.e. outids) come from, and how the mapping indice_pairs between input and output indices is computed; 2) how the submanifold convolution itself is computed.

get_indice_pairs

First, let's see how outids and indice_pairs are computed, in spconv/ops.py:

def get_indice_pairs(indices,
                     batch_size,
                     spatial_shape,
                     ksize=3,
                     stride=1,
                     padding=0,
                     dilation=1,
                     out_padding=0,
                     subm=False,
                     transpose=False,
                     grid=None):
    ...
    # the code above computes out_shape from the input indices, kernel size, etc.

    if grid is None:
        if ndim == 2:
            get_indice_pairs_func = torch.ops.spconv.get_indice_pairs_2d
        elif ndim == 3:
            # here get_indice_pairs_func is get_indice_pairs_3d!
            get_indice_pairs_func = torch.ops.spconv.get_indice_pairs_3d
        else:
            raise NotImplementedError

        # the key call; for our example:
        #   indices: [32000, 4]
        #   batch_size = 2
        #   out_shape = [41, 1600, 1408]
        #   spatial_shape = [41, 1600, 1408]
        #   ksize = [3, 3, 3]
        #   stride = [1, 1, 1]
        #   int(subm) = 1
        #   int(transpose) = 0
        return get_indice_pairs_func(indices, batch_size, out_shape, spatial_shape, ksize,
                                     stride, padding, dilation, out_padding,
                                     int(subm), int(transpose))
    ...

Next we follow get_indice_pairs_3d, which is registered in src/spconv/all.cc of spconv:

static auto registry =
    ...
    .op("spconv::get_indice_pairs_3d", &spconv::getIndicePair<3>)   // the get_indice_pairs_3d binding
    .op("spconv::indice_conv_fp32", &spconv::indiceConv<float>)
    ...

So we need to continue into include/spconv/spconv_ops.h in spconv:

template <unsigned NDim>
std::vector<torch::Tensor>
getIndicePair(torch::Tensor indices, int64_t batchSize,
              std::vector<int64_t> outSpatialShape, std::vector<int64_t> spatialShape,
              std::vector<int64_t> kernelSize, std::vector<int64_t> stride,
              std::vector<int64_t> padding, std::vector<int64_t> dilation,
              std::vector<int64_t> outPadding, int64_t _subM, int64_t _transpose) {
    // the details are not expanded here; read the source if interested
    // what it does: given the input indices, the conv type and the kernel size, it determines which
    // output voxels are non-empty, i.e. it produces outids; because of the submanifold property,
    // outids is identical to indices here
    // besides outids and indices, we also need to know, for each input voxel, which of its 27
    // neighbouring kernel positions maps to which entry of outids -- that is indice_pairs
}
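As a mental model of what getIndicePair produces in the submanifold case (the real code builds a grid / hash table in CUDA; the function name subm_indice_pairs and the NumPy types below are mine), it can be written in plain Python: index every input coordinate in a dictionary, then for each voxel and each of the 27 kernel offsets check whether the neighbouring voxel is also occupied; every hit becomes one (input row, output row) pair for that offset.

# Pure-Python sketch of the submanifold indice-pair construction (reference only; sign
# conventions of the offsets and the subM center-offset shortcut are glossed over).
import itertools
import numpy as np

def subm_indice_pairs(indices, ksize=(3, 3, 3)):
    """indices: [N, 4] int array of [batch, z, y, x]."""
    N = indices.shape[0]
    K = int(np.prod(ksize))
    outids = indices.copy()                                            # subm conv: output voxels == input voxels
    lookup = {tuple(c): i for i, c in enumerate(indices.tolist())}     # coordinate -> row index

    indice_pairs = -np.ones((K, 2, N), dtype=np.int64)                 # padded with -1 for unused slots
    indice_pair_num = np.zeros(K, dtype=np.int64)
    offsets = list(itertools.product(*[range(-(k // 2), k // 2 + 1) for k in ksize]))

    for out_row, (b, z, y, x) in enumerate(indices.tolist()):
        for k, (dz, dy, dx) in enumerate(offsets):
            in_row = lookup.get((b, z + dz, y + dy, x + dx))
            if in_row is not None:                                     # neighbouring voxel is occupied
                n = indice_pair_num[k]
                indice_pairs[k, 0, n] = in_row                         # gather from this input row ...
                indice_pairs[k, 1, n] = out_row                        # ... scatter-add into this output row
                indice_pair_num[k] += 1
    return outids, indice_pairs, indice_pair_num

The shapes match what the Python side receives: outids is [N, 4], indice_pairs is [27, 2, N], and indice_pair_num[k] counts the valid pairs for kernel offset k.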

indice_subm_conv

This lives in spconv/functional.py:

class SubMConvFunction(Function):
    @staticmethod
    def forward(ctx,
                features,
                filters,
                indice_pairs,
                indice_pair_num,
                num_activate_out):
        ctx.save_for_backward(
            indice_pairs,
            indice_pair_num,
            features,
            filters)
        # the important call
        return ops.indice_conv(features, filters, indice_pairs, indice_pair_num,
                               num_activate_out, False, True)

    @staticmethod
    def backward(ctx, grad_output):
        indice_pairs, indice_pair_num, features, filters = ctx.saved_tensors
        input_bp, filters_bp = ops.indice_conv_backward(
            features, filters, grad_output, indice_pairs, indice_pair_num, False, True)
        return input_bp, filters_bp, None, None, None


indice_subm_conv = SubMConvFunction.apply

indice_conv is in spconv/ops.py:

def indice_conv(features,
                filters,
                indice_pairs,
                indice_pair_num,
                num_activate_out,
                inverse=False,
                subm=False):
    if filters.dtype == torch.float32:
        # the important call: dispatch to the C++/CUDA op
        return torch.ops.spconv.indice_conv_fp32(features, filters, indice_pairs,
                                                 indice_pair_num, num_activate_out,
                                                 int(inverse), int(subm))
    elif filters.dtype == torch.half:
        return torch.ops.spconv.indice_conv_half(features, filters, indice_pairs,
                                                 indice_pair_num, num_activate_out,
                                                 int(inverse), int(subm))
    else:
        raise NotImplementedError

From src/spconv/all.cc we know that indice_conv_fp32 is bound to spconv::indiceConv, so we look at include/spconv/spconv_ops.h:

template <typename T>
torch::Tensor indiceConv(torch::Tensor features, torch::Tensor filters,
                         torch::Tensor indicePairs, torch::Tensor indiceNum,
                         int64_t numActOut, int64_t _inverse, int64_t _subM) {
    ...
    // kernelVolume = 27
    // i.e. the loop below handles each of the 3x3x3 kernel positions separately
    for (int i = 0; i < kernelVolume; ++i) {
        auto nHot = indicePairNumCpu.data<int>()[i];
        if (nHot <= 0 || (subM && i == indicePairMaxOffset)) {
            continue;
        }
        // auto timer = spconv::CudaContextTimer<>();
        // set up the input and output buffers
        auto outputBufferBlob = torch::from_blob(outputBuffer.data<T>(),
                                                 {nHot, numOutPlanes}, options);
        auto inputBufferBlob = torch::from_blob(inputBuffer.data<T>(),
                                                {nHot, numInPlanes}, options);
        ...
        // with indices, outids and the indice_pairs relating them, gather the input voxels
        // that this kernel position has to process
        functor::SparseGatherFunctor<tv::GPU, T, int> gatherFtor;
        gatherFtor(tv::TorchGPU(), tv::torch2tv<T>(inputBuffer),
                   tv::torch2tv<const T>(features),
                   tv::torch2tv<const int>(indicePairs).subview(i, inverse), nHot);
        TV_CHECK_CUDA_ERR();
        // matrix multiplication: for this kernel position, compute the features produced
        // by this position's filter
        torch::mm_out(outputBufferBlob, inputBufferBlob, filters[i]);
        ...
        // scatter-add the result into the output rows given by outids
        functor::SparseScatterAddFunctor<tv::GPU, T, int> scatterFtor;
        scatterFtor(tv::TorchGPU(), tv::torch2tv<T>(output),
                    tv::torch2tv<const T>(outputBuffer),
                    tv::torch2tv<const int>(indicePairs).subview(i, !inverse), nHot, true);
        TV_CHECK_CUDA_ERR();
    }
    return output;
}

At this point, the sentence from the README makes sense: "This implementation use gather-gemm-scatter framework to do sparse convolution."
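To make the gather-gemm-scatter loop concrete, here is a rough Python transcription of the C++ loop above. It assumes the indice_pairs / indice_pair_num layout described earlier and a filter tensor already flattened to [K, C_in, C_out] (the layer stores its weight as [*kernel_size, C_in, C_out]); the subM shortcut for the center offset is omitted. This is a readable reference, not the real implementation.

# Reference-only Python version of the gather-gemm-scatter loop in indiceConv.
# features: [N_in, C_in], filters: [K, C_in, C_out], indice_pairs: [K, 2, N], indice_pair_num: [K].
import torch

def gather_gemm_scatter(features, filters, indice_pairs, indice_pair_num, num_out):
    C_out = filters.shape[-1]
    output = features.new_zeros(num_out, C_out)
    for k in range(filters.shape[0]):                 # one iteration per kernel offset (27 for 3x3x3)
        n_hot = int(indice_pair_num[k])
        if n_hot <= 0:
            continue
        in_rows = indice_pairs[k, 0, :n_hot].long()
        out_rows = indice_pairs[k, 1, :n_hot].long()
        gathered = features[in_rows]                  # gather: [n_hot, C_in]
        partial = gathered @ filters[k]               # GEMM:   [n_hot, C_out]
        output.index_add_(0, out_rows, partial)       # scatter-add into the output rows
    return output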

GEMM

GEMM refers to optimized general matrix multiplication. In this code base, torch.mm_out is called directly, relying on the GEMM that PyTorch already provides. For background on GEMM, see the following posts:

https://blog.csdn.net/qq_20880415/article/details/104332743
https://zhuanlan.zhihu.com/p/66958390

The short version of GEMM optimization: it does not change the amount of arithmetic; by splitting and reordering the loops it serves more of the memory accesses from cache instead of main memory, which reduces run time.
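The loop restructuring can be illustrated with a tiny blocked matrix multiply. Real GEMM kernels do this in C/assembly with vectorization and register blocking; the Python sketch below only shows how the loops are reorganized so that small tiles of A, B and C are reused while they are still in cache (the inner tile product stands in for the hand-written micro-kernel).

# Naive vs. blocked (tiled) matrix multiply -- same arithmetic, different memory access order.
import numpy as np

def matmul_naive(A, B):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]
    return C

def matmul_blocked(A, B, bs=32):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    # iterate over bs x bs tiles; each tile of A and B is reused many times while it sits in cache
    for i0 in range(0, M, bs):
        for j0 in range(0, N, bs):
            for k0 in range(0, K, bs):
                C[i0:i0+bs, j0:j0+bs] += A[i0:i0+bs, k0:k0+bs] @ B[k0:k0+bs, j0:j0+bs]
    return C

A, B = np.random.rand(128, 96), np.random.rand(96, 64)
assert np.allclose(matmul_naive(A, B), matmul_blocked(A, B))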

Reposted from: https://blog.csdn.net/wqwqqwqw1231/article/details/116463560
