Few prior works study deep learning on point sets. PointNet by Qi et al. is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space in which the points live, which limits its ability to recognize fine-grained patterns and to generalize to complex scenes. In this work, the authors introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, the network is able to learn local features at increasing contextual scales. Observing further that point sets are usually sampled with varying densities, which greatly degrades the performance of networks trained on uniformly dense data, they propose novel set learning layers that adaptively combine features from multiple scales. Experiments show that the resulting network, called PointNet++, learns deep point set features efficiently and robustly. In particular, it obtains results significantly better than the state of the art on challenging benchmarks of 3D point clouds.
1. Architecture
The paper introduces a novel type of neural network, named PointNet++, that processes a set of points sampled in a metric space in a hierarchical fashion (2D points in Euclidean space are used for illustration). The general idea of PointNet++ is simple. The authors first partition the set of points into overlapping local regions by the distance metric of the underlying space. Similar to CNNs, they extract local features that capture fine geometric structures from small neighborhoods; these local features are further grouped into larger units and processed to produce higher-level features. This process is repeated until the features of the whole point set are obtained.
This structure (a set abstraction level) is mainly divided into three parts:
- Sampling layer: selects a subset of points from the input point cloud; these points serve as the centroids of the local regions;
- Grouping layer: selects the points that are "adjacent" to each centroid according to a neighborhood rule (for example, all points within a given radius);
- PointNet layer: a mini-PointNet encodes the geometry of each local region into a feature vector.
Input: a matrix of dimension N × (d + C), i.e., N points with d-dimensional coordinates and C-dimensional features.
Output: a matrix of dimension N′ × (d + C′), i.e., N′ subsampled points, each with d-dimensional coordinates and a new C′-dimensional feature summarizing its local context (see the sketch below).
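To make the three layers concrete, here is a minimal NumPy sketch of one set abstraction level (my own illustration, not the authors' code): farthest point sampling picks the centroids, a ball query groups neighbours within a radius, and a single shared layer followed by max pooling stands in for the mini-PointNet. The function names, the one-layer "MLP", and the toy dimensions are all assumptions made for the example.

```python
import numpy as np

def farthest_point_sampling(xyz, n_centroids):
    """Iteratively pick the point farthest from the points chosen so far."""
    n = xyz.shape[0]
    chosen = np.zeros(n_centroids, dtype=int)
    dist = np.full(n, np.inf)
    chosen[0] = np.random.randint(n)
    for i in range(1, n_centroids):
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[i - 1]], axis=1))
        chosen[i] = int(np.argmax(dist))
    return chosen

def ball_query(xyz, centroid_idx, radius, k):
    """For each centroid, gather up to k neighbour indices within the radius."""
    groups = []
    for c in centroid_idx:
        idx = np.where(np.linalg.norm(xyz - xyz[c], axis=1) < radius)[0][:k]
        if len(idx) < k:                      # pad sparse groups by repeating a neighbour
            idx = np.concatenate([idx, np.full(k - len(idx), idx[0])])
        groups.append(idx)
    return np.stack(groups)                   # (N', k)

def set_abstraction(xyz, feats, n_centroids, radius, k, mlp):
    """One SA level: sampling -> grouping -> mini-PointNet (shared layer + max pool)."""
    centroid_idx = farthest_point_sampling(xyz, n_centroids)
    groups = ball_query(xyz, centroid_idx, radius, k)
    local_xyz = xyz[groups] - xyz[centroid_idx][:, None, :]           # coords relative to centroid
    grouped = np.concatenate([local_xyz, feats[groups]], axis=-1)     # (N', k, d + C)
    pointwise = np.maximum(grouped @ mlp, 0.0)                        # shared layer + ReLU
    return xyz[centroid_idx], pointwise.max(axis=1)                   # max pool -> (N', C')

# toy run: N = 1024 points, d = 3 coordinates, C = 3 features, C' = 64
xyz = np.random.randn(1024, 3).astype(np.float32)
feats = np.random.randn(1024, 3).astype(np.float32)
mlp = 0.1 * np.random.randn(6, 64).astype(np.float32)
new_xyz, new_feats = set_abstraction(xyz, feats, n_centroids=256, radius=0.5, k=32, mlp=mlp)
print(new_xyz.shape, new_feats.shape)   # (256, 3) (256, 64)
```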
2. Non-uniform sampling density
2.1. Multi-scale grouping (MSG)
As shown on the left of the figure above, at each grouping layer every group is formed at multiple scales (several radius values are set); PointNet extracts features at each scale, and the resulting feature vectors are concatenated to form the new feature (a sketch follows).
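A rough sketch of the idea, reusing the `farthest_point_sampling` and `ball_query` helpers (and the `xyz`/`feats` toy arrays) from the sketch in section 1; the radii, group sizes, and per-scale weight matrices below are placeholder assumptions:

```python
import numpy as np

def msg_set_abstraction(xyz, feats, centroid_idx, scales):
    """Multi-scale grouping: group around the same centroids at several radii,
    run the mini-PointNet at each scale, and concatenate the per-scale features."""
    per_scale = []
    for radius, k, mlp in scales:
        groups = ball_query(xyz, centroid_idx, radius, k)
        local_xyz = xyz[groups] - xyz[centroid_idx][:, None, :]
        grouped = np.concatenate([local_xyz, feats[groups]], axis=-1)
        per_scale.append(np.maximum(grouped @ mlp, 0.0).max(axis=1))
    return np.concatenate(per_scale, axis=-1)          # (N', C'_1 + C'_2 + ...)

# three scales, each with its own radius, group size, and (placeholder) weights
scales = [(0.2, 16, 0.1 * np.random.randn(6, 32)),
          (0.4, 32, 0.1 * np.random.randn(6, 64)),
          (0.8, 64, 0.1 * np.random.randn(6, 128))]
centroid_idx = farthest_point_sampling(xyz, 256)
print(msg_set_abstraction(xyz, feats, centroid_idx, scales).shape)   # (256, 224)
```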
2.2. Multi-resolution grouping (MRG)
As shown on the right of the figure above, the left feature vector is obtained through two set abstractions, each with a different radius, while the right feature vector is obtained by running PointNet directly on all points in the current layer. In addition, when the point cloud density is uneven, the left and right feature vectors can be given different weights according to the density of the current patch. For example, when the density within the patch is low, the information in the left vector is less reliable than the features extracted directly from all the points in the patch, so the weight of the right feature vector is increased. This handles the density problem while also reducing the amount of computation (a sketch of such a weighting follows).
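One way the density-dependent weighting described above could be realized (a hypothetical sketch, not the paper's exact scheme; `k_expected` and the blending rule are my own assumptions):

```python
import numpy as np

def combine_mrg_features(vec_low, vec_raw, n_points_in_patch, k_expected=32):
    """Blend the two MRG branches: when the patch is sparsely sampled, trust the
    vector summarized from the lower level (vec_low) less and the vector computed
    directly from the raw points (vec_raw) more. The weights are illustrative."""
    density = min(n_points_in_patch / k_expected, 1.0)   # crude density estimate in [0, 1]
    return np.concatenate([density * vec_low, (1.0 - density) * vec_raw], axis=-1)
```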
3. Classification and segmentation
The segmentation network is divided into Set Abstraction layers, Feature Propagation layers, and FC layers, forming a U-Net-like structure.
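The original code listing did not survive formatting; in its place, here is a minimal structural sketch (my own, not the authors' code) of how the pieces compose, assuming the `set_abstraction` helper from section 1 and the `feature_propagation` helper sketched at the end of this section; the layer widths and configurations are placeholders:

```python
import numpy as np

def segmentation_forward(xyz, feats, sa_cfgs, fp_mlps, fc_weights):
    """U-Net-like forward pass: Set Abstraction levels downsample (encoder),
    Feature Propagation levels interpolate back up with skip links (decoder),
    and a final FC layer scores each point."""
    skips = [(xyz, feats)]
    for n_centroids, radius, k, mlp in sa_cfgs:                        # SA levels
        xyz, feats = set_abstraction(xyz, feats, n_centroids, radius, k, mlp)
        skips.append((xyz, feats))
    for (skip_xyz, skip_feats), mlp in zip(reversed(skips[:-1]), fp_mlps):
        feats = feature_propagation(skip_xyz, xyz, feats)               # FP levels
        feats = np.maximum(np.concatenate([skip_feats, feats], axis=-1) @ mlp, 0.0)
        xyz = skip_xyz
    return feats @ fc_weights                                           # per-point class scores
```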
SA module (feature extraction, i.e., downsampling): the input is N points with D-dimensional features, and the output is N′ points with D′-dimensional features after downsampling; farthest point sampling picks the N′ center points, and the D′-dimensional features are computed by a mini-PointNet, exactly as in the classification network described above, so I won't repeat the details. FP module (feature propagation, i.e., upsampling): features are propagated to the denser point set by interpolation, using the inverse of the distance as the weight; the interpolation takes the input (N, D) to the output (N′, D), so the feature dimension D of the input remains unchanged (see the sketch below).
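A minimal sketch of that interpolation step (my own illustration; k = 3 nearest neighbours is assumed here, and the function and variable names are not from the original code):

```python
import numpy as np

def feature_propagation(query_xyz, ref_xyz, ref_feats, k=3, eps=1e-8):
    """Propagate features from the coarse set (ref_xyz, ref_feats) onto the denser
    set query_xyz by inverse-distance-weighted interpolation over the k nearest
    reference points. The feature dimension D is unchanged."""
    # pairwise distances between query points and reference points
    d = np.linalg.norm(query_xyz[:, None, :] - ref_xyz[None, :, :], axis=-1)  # (N', N)
    nn = np.argsort(d, axis=1)[:, :k]                                          # k nearest refs
    w = 1.0 / (np.take_along_axis(d, nn, axis=1) + eps)                        # inverse distance
    w = w / w.sum(axis=1, keepdims=True)                                       # normalise weights
    return (ref_feats[nn] * w[..., None]).sum(axis=1)                          # (N', D)

# toy run: push 64 coarse features back onto 256 points
ref_xyz = np.random.randn(64, 3)
ref_feats = np.random.randn(64, 128)
query_xyz = np.random.randn(256, 3)
print(feature_propagation(query_xyz, ref_xyz, ref_feats).shape)   # (256, 128)
```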
- Author: Chris Yan
- Link: https:/Yansz.github.io/2019/12/23/PointNet++/
- Copyright: Unless otherwise stated, all posts on this blog are released under the MIT License. Please credit the source when reposting!