3D Robot Perception and Object Classification


This page describes how to create a Python perception pipeline, utilising DBSCAN (Euclidean clustering) and a trained SVM to perform object recognition and localisation, completing a tabletop pick-and-place operation from RGB-D camera point-cloud data using ROS and Gazebo. Below is a walkthrough of the fundamental principles and methods used in the project and their subsequent Python implementation. The information supplied is not exhaustive, and it is recommended to view the source code and comments on GitHub for a deeper understanding of the code. A link to the GitHub repository is here


This project page describes the following sections:

Point-cloud data filtering

Object clustering

Feature extraction

Object recognition with a support vector machine

Testing and results


Point-Cloud Filtering


For this project, we will be using a model of the PR2 robot to collect data and sort objects into bins. Perception on the PR2 is supplied by an RGB-D (red, green, blue, depth) camera, similar to an Xbox Kinect or 3D scanner. These cameras produce point clouds: RGB images whose data points lie in a three-dimensional coordinate system. An example is shown in the image below.


The first step in filtering is to remove outliers. For this, we will use a nearest-neighbour statistical filter based on standard deviation. The following function removes this data noise.



''' Removes outlier data from a point cloud
:param: point_cloud, point cloud containing the filtered clusters
:param: neighboring_pts, the number of neighbouring points to be analysed against
:param: scale_factor, threshold scaling factor of the standard deviation
:return: filtered point cloud
'''
def outlierFilter(point_cloud, neighboring_pts, scale_factor):

    # create the filter object
    outlier_filter = point_cloud.make_statistical_outlier_filter()

    # Set the number of neighbouring points to analyse for any given point
    outlier_filter.set_mean_k(neighboring_pts)

    # Any point with a mean distance larger than the global
    # (mean distance + x * std_dev) will be considered an outlier
    outlier_filter.set_std_dev_mul_thresh(scale_factor)

    # return the filtered data
    return outlier_filter.filter()
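As a rough illustration of the idea only (not the project's PCL code), the same statistical test can be sketched with NumPy alone; the function name and the brute-force distance computation below are purely for demonstration:

```python
import numpy as np

def statistical_outlier_mask(points, k=8, scale_factor=1.0):
    """Boolean inlier mask mimicking a statistical outlier filter
    (brute-force distances; fine for small demo clouds only)."""
    diff = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # mean distance from each point to its k nearest neighbours
    # (column 0 is the point itself at distance 0, so skip it)
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    mean_dists = knn.mean(axis=1)
    # global threshold: mean + scale_factor * standard deviation
    threshold = mean_dists.mean() + scale_factor * mean_dists.std()
    return mean_dists <= threshold

# a tight cluster plus one far-away noise point
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0, 0.01, size=(50, 3)),
                   [[5.0, 5.0, 5.0]]])
mask = statistical_outlier_mask(cloud, k=8, scale_factor=1.0)
# the lone distant point is rejected, the cluster survives
```

PCL performs the same test far more efficiently with a k-d tree rather than a full pairwise distance matrix, which is what makes it practical on real clouds.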


The remaining cloud is detail-rich and contains far more information than is required to perform classification correctly. To minimise the data, we reduce the number of points by downsampling with a voxel grid. A voxel represents a value for the collection of data points that lie within a set cubic grid cell. We accomplish this with the following.



''' Voxel grid down-samples the point cloud
:param: point_cloud, point cloud containing the filtered clusters
:param: leaf_size, the measurement size of each voxel unit
:return: voxel-grid down-sampled point cloud
'''
def voxelGrid(point_cloud, leaf_size):

    # create the voxel grid filter object
    vox = point_cloud.make_voxel_grid_filter()
    vox.set_leaf_size(leaf_size, leaf_size, leaf_size)
    # Call the filter function to obtain the resultant down-sampled point cloud
    return vox.filter()


This produces the following image.
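For intuition, here is a NumPy-only sketch of voxel-grid downsampling (an illustrative helper, not the PCL implementation): points are snapped to integer voxel coordinates, and every occupied voxel is reduced to the centroid of its points.

```python
import numpy as np

def voxel_downsample(points, leaf_size):
    """Average all points that fall into the same cubic voxel."""
    # integer voxel coordinates for every point
    voxel_ids = np.floor(points / leaf_size).astype(np.int64)
    # group points by voxel: inverse maps each point to its voxel index
    _, inverse = np.unique(voxel_ids, axis=0, return_inverse=True)
    inverse = np.asarray(inverse).reshape(-1)
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, points.shape[1]))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)
    np.add.at(counts, inverse, 1)
    # one centroid per occupied voxel
    return sums / counts[:, None]

# eight points in two voxels collapse to two centroids
pts = np.array([[0.01, 0.01, 0.01], [0.02, 0.02, 0.02],
                [0.03, 0.01, 0.02], [0.02, 0.03, 0.01],
                [0.11, 0.11, 0.11], [0.12, 0.12, 0.12],
                [0.13, 0.11, 0.12], [0.12, 0.13, 0.11]])
down = voxel_downsample(pts, leaf_size=0.05)
```

The leaf size directly trades resolution for speed: larger voxels mean fewer points downstream, at the cost of geometric detail.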


Since we know that the objects are placed on a table, we can remove the data surrounding the table by cropping the point cloud to the known locations of the tabletop and objects in the field of view. This is achieved with a pass-through filter in the x, y and z axes: straight ahead, left to right, and bottom to top of the camera respectively. The following removes points outside the supplied min and max values.



''' Filters the point cloud to only the data within the min and max values
:param: point_cloud, point cloud containing the filtered clusters
:param: axis, the x, y or z axis to filter along. x is in the direction of
the camera, y is the left & right of the camera, z up & down
:param: axis_min, min distance to filter
:param: axis_max, max distance to filter
:return: filtered point cloud
'''
def passThroughFilter(point_cloud, axis, axis_min, axis_max):

    # Create a PassThrough filter object
    passthrough = point_cloud.make_passthrough_filter()

    # Set the axis and range to filter along
    filter_axis = axis
    passthrough.set_filter_field_name(filter_axis)
    passthrough.set_filter_limits(axis_min, axis_max)

    # return the passthrough filtered point cloud
    return passthrough.filter()


This produces the following image.
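Conceptually, the pass-through filter is nothing more than a boolean mask on one coordinate. A minimal NumPy sketch (illustrative only, with made-up values):

```python
import numpy as np

def pass_through(points, axis, axis_min, axis_max):
    """Keep only points whose coordinate on `axis` (0=x, 1=y, 2=z)
    lies inside [axis_min, axis_max]."""
    mask = (points[:, axis] >= axis_min) & (points[:, axis] <= axis_max)
    return points[mask]

pts = np.array([[0.0, 0.0, 0.1],    # below the tabletop
                [0.0, 0.0, 0.8],    # object height
                [0.0, 0.0, 2.0]])   # above the region of interest
cropped = pass_through(pts, axis=2, axis_min=0.6, axis_max=1.1)
# only the point at object height survives
```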


We now need to remove the tabletop. For this, segmentation is performed using Random Sample Consensus, or RANSAC. RANSAC identifies points in the dataset that belong to a particular model, which could be a plane or a specific shape. Data points that fit the model are called inliers, and data points that do not are called outliers.


As we wish to remove the table from the data, we can use the plane model, which groups all the data points corresponding to the table as inliers and the objects as outliers.


''' RANSAC plane segmentation for segmenting the table
:param: point_cloud, point cloud containing the filtered clusters
:param: max_distance, max distance for a point to be considered fitting the model
:return: inlier indices and model coefficients of the segment
'''
def ransacFilter(point_cloud, max_distance):

    # Create the segmentation object
    seg = point_cloud.make_segmenter()

    # Set the model used for fitting
    seg.set_model_type(pcl.SACMODEL_PLANE)
    seg.set_method_type(pcl.SAC_RANSAC)

    # Max distance for a point to be considered fitting the model
    seg.set_distance_threshold(max_distance)

    # return the inlier indices and model coefficients of the segment
    return seg.segment()


The inliers, i.e. the tabletop, are as follows.



And the outliers.
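To show what seg.segment() is doing internally, the plane-fitting loop can be sketched in plain NumPy. This is a simplified illustration with made-up data, not PCL's implementation:

```python
import numpy as np

def ransac_plane(points, max_distance, iterations=100, seed=0):
    """Return indices of inliers of the best plane found by RANSAC."""
    rng = np.random.default_rng(seed)
    best_inliers = np.array([], dtype=int)
    for _ in range(iterations):
        # hypothesise a plane through three random points
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:          # degenerate (collinear) sample
            continue
        normal /= norm
        # perpendicular distance of every point to the candidate plane
        distances = np.abs((points - p0) @ normal)
        inliers = np.flatnonzero(distances <= max_distance)
        # keep the plane that explains the most points
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# a flat "table" at z=0 plus a small "object" floating above it
rng = np.random.default_rng(1)
table = np.column_stack([rng.uniform(0, 1, 200),
                         rng.uniform(0, 1, 200),
                         np.zeros(200)])
obj = rng.uniform(0.4, 0.6, size=(30, 3)) + [0, 0, 0.2]
cloud = np.vstack([table, obj])
inliers = ransac_plane(cloud, max_distance=0.01)
# the 200 table points are returned as inliers; the object is not
```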



We have now filtered the point-cloud dataset so that it contains only the target objects, and can move on to clustering to separate each object.


Object Clustering


Now with a dataset that only contains the target objects, we must extract the data points into clusters, one per object. To do this we will use the Euclidean clustering algorithm.


In order to perform Euclidean clustering, we need to construct a k-d tree from the cloud_objects point cloud; this increases the computational efficiency of searching for neighbouring points. A k-d tree can only use spatial information, which requires us to remove the RGB data from the XYZRGB dataset. To do this we will borrow a function from the Udacity pcl_helper.py file.


""" Converts a PCL XYZRGB point cloud to an XYZ point cloud (removes color info)


XYZRGB_cloud (PointCloud_PointXYZRGB): A PCL XYZRGB point cloud


PointCloud_PointXYZ: A PCL XYZ point cloud


def XYZRGB_to_XYZ(XYZRGB_cloud):

    XYZ_cloud = pcl.PointCloud()

    points_list = []


    for data in XYZRGB_cloud:

        points_list.append([data[0], data[1], data[2]])



    return XYZ_cloud



With this colour-less cloud, we can now perform Euclidean clustering.



''' Euclidean Clustering
:param: white_cloud, point cloud containing only xyz data
:return: the cluster indices
'''
def euclideanCluster(white_cloud):
    tree = white_cloud.make_kdtree()

    # Create a cluster extraction object
    ec = white_cloud.make_EuclideanClusterExtraction()

    # Set tolerances for distance threshold
    # as well as minimum and maximum cluster size (in points)
    # (values tuned for this scene)
    ec.set_ClusterTolerance(0.05)
    ec.set_MinClusterSize(100)
    ec.set_MaxClusterSize(3000)

    # Search the k-d tree for clusters
    ec.set_SearchMethod(tree)

    # Extract indices for each of the discovered clusters
    return ec.Extract()
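To make the clustering step concrete, here is a small NumPy-only sketch of Euclidean clustering that uses a brute-force neighbour graph in place of the k-d tree; the names and values are illustrative, not the project's code:

```python
import numpy as np

def euclidean_cluster(points, tolerance, min_size, max_size):
    """Group points whose neighbours lie within `tolerance`,
    keeping clusters between min_size and max_size points."""
    n = len(points)
    # brute-force adjacency: True where two points are within tolerance
    diff = points[:, None, :] - points[None, :, :]
    adjacent = np.sqrt((diff ** 2).sum(axis=-1)) <= tolerance
    visited = np.zeros(n, dtype=bool)
    clusters = []
    for seed in range(n):
        if visited[seed]:
            continue
        # flood-fill outward from the seed point
        stack, members = [seed], []
        visited[seed] = True
        while stack:
            idx = stack.pop()
            members.append(idx)
            for nb in np.flatnonzero(adjacent[idx]):
                if not visited[nb]:
                    visited[nb] = True
                    stack.append(nb)
        # apply the size limits, as the PCL extractor does
        if min_size <= len(members) <= max_size:
            clusters.append(sorted(members))
    return clusters

# two well-separated blobs yield two clusters of 20 points each
rng = np.random.default_rng(2)
blob_a = rng.normal(0.0, 0.01, size=(20, 3))
blob_b = rng.normal(1.0, 0.01, size=(20, 3))
clusters = euclidean_cluster(np.vstack([blob_a, blob_b]),
                             tolerance=0.1, min_size=5, max_size=100)
```

The k-d tree in the real pipeline replaces the O(n^2) adjacency matrix here, which is the whole point of building it first.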


By applying a cluster mask to the returned indices, we can visualise that each object has been correctly grouped into its own cluster. This is performed by the following code, resulting in the image below.


# Create a Cluster-Mask Point Cloud to visualize each cluster separately
# Assign a color corresponding to each segmented object in the scene
cluster_color = get_color_list(len(cluster_indices))
color_cluster_point_list = []

for j, indices in enumerate(cluster_indices):
    for i, indice in enumerate(indices):
        color_cluster_point_list.append([white_cloud[indice][0],
                                         white_cloud[indice][1],
                                         white_cloud[indice][2],
                                         rgb_to_float(cluster_color[j])])

# Create new cloud containing all clusters, each with a unique color
cluster_cloud = pcl.PointCloud_PointXYZRGB()
cluster_cloud.from_list(color_cluster_point_list)



Feature Extraction


In order to produce a robust object classifier, we will need to extract various features from each cluster.


The features we will extract are colour histograms and a histogram of the object's surface normals. Binning the data into histograms removes fine object detail, which normalises the features and allows better classification under variations in an object's orientation, position or environmental lighting. However, as the number of bins decreases, more detail is lost, and different objects begin to show the same characteristics, resulting in incorrect object classification.
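This bin-count trade-off can be demonstrated with two synthetic colour distributions; the hue values below are made up purely for illustration. With too few bins, the normalised histograms of two different objects become identical:

```python
import numpy as np

# two hypothetical hue distributions (e.g. a red and an orange object)
rng = np.random.default_rng(3)
red = rng.normal(10, 3, 1000)
orange = rng.normal(25, 3, 1000)

distances = {}
for nbins in (2, 32):
    h_red, _ = np.histogram(red, bins=nbins, range=(0, 256))
    h_orange, _ = np.histogram(orange, bins=nbins, range=(0, 256))
    # normalise, as done for the classifier features
    h_red = h_red / h_red.sum()
    h_orange = h_orange / h_orange.sum()
    # L1 distance between the two normalised histograms
    distances[nbins] = np.abs(h_red - h_orange).sum()
# with 2 bins both objects land in the same bin and are
# indistinguishable; with 32 bins they are well separated
```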


Before extracting the features, we convert the point-cloud image data from RGB to HSV, which helps protect against variations in environmental lighting. This is performed with the following.


def rgb_to_hsv(rgb_list):

    rgb_normalized = [1.0*rgb_list[0]/255, 1.0*rgb_list[1]/255, 1.0*rgb_list[2]/255]

    hsv_normalized = matplotlib.colors.rgb_to_hsv([[rgb_normalized]])[0][0]

    return hsv_normalized


We can now extract a binned histogram of the object's colour.



''' Computes and bins the point-cloud data using the object's colour
:param: cloud, point cloud containing the filtered clusters
:param: using_hsv, convert the cluster RGB data to HSV when set to True
:param: nbins, number of bins that data will be pooled into
:param: nrange, value range of the data to be pooled
:return: the normalised histogram features
'''
def compute_color_histograms(cloud, using_hsv=False, nbins=32, nrange=(0,256)):

    # Compute histograms for the clusters
    point_colors_list = []

    # Step through each point in the point cloud
    for point in pc2.read_points(cloud, skip_nans=True):
        rgb_list = float_to_rgb(point[3])

        if using_hsv:
            point_colors_list.append(rgb_to_hsv(rgb_list) * 255)
        else:
            point_colors_list.append(rgb_list)

    # Populate lists with color values
    channel_1_vals = []
    channel_2_vals = []
    channel_3_vals = []

    for color in point_colors_list:
        channel_1_vals.append(color[0])
        channel_2_vals.append(color[1])
        channel_3_vals.append(color[2])

    # Compute each channel's histogram
    channel_1_hist = np.histogram(channel_1_vals, bins=nbins, range=nrange)
    channel_2_hist = np.histogram(channel_2_vals, bins=nbins, range=nrange)
    channel_3_hist = np.histogram(channel_3_vals, bins=nbins, range=nrange)

    # Concatenate and normalize the histograms
    hist_features = np.concatenate((channel_1_hist[0], channel_2_hist[0],
                                    channel_3_hist[0])).astype(np.float64)
    normed_features = hist_features / np.sum(hist_features)
    return normed_features


During training of the model, results were found to be consistent for bin counts in the range 24 to 64, although some objects classified better with a low bin count and others with a larger one. The final value used is 44.


As we are dealing with three-dimensional data, we have partial information about each object's shape. To capture it, we will calculate the distribution of the object's surface normals. Since a surface normal is perpendicular to the surface of a shape, the distribution of these normals varies dramatically between flat and round objects. It is this distribution of the normals, taken as a whole, that we use to describe the surface of the object. As with the colour features, we bin the surface normals into a histogram, as the two following functions describe.
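For intuition, a single surface normal can be estimated from a small patch of neighbouring points via a cross product. The helper below is hypothetical and simplified (real normal estimation fits a plane to a whole neighbourhood), but it shows why a flat patch in the x-y plane yields a normal along z:

```python
import numpy as np

def triangle_normal(p0, p1, p2):
    """Unit normal of the plane through three points (hypothetical
    helper; the cross product of two edge vectors is perpendicular
    to both, hence to the surface patch)."""
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n)

# a patch lying flat in the x-y plane has its normal along z
flat = triangle_normal(np.array([0.0, 0.0, 0.0]),
                       np.array([1.0, 0.0, 0.0]),
                       np.array([0.0, 1.0, 0.0]))
```

A box therefore produces normals concentrated in a few histogram bins, while a bottle spreads them across many.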



''' Calculates the surface normal of each point in the point cloud
:param: cloud, point cloud to get the point normals of
:return: array of point surface normals
'''
def get_normals(cloud):
    get_normals_prox = rospy.ServiceProxy('/feature_extractor/get_normals', GetNormals)
    return get_normals_prox(cloud).cluster


We call the above get_normals() function to calculate the surface normals, followed by the histogram binning code below.



''' Computes and bins the point-cloud data using the object's distribution of surface normals
:param: normal_cloud, point cloud containing the filtered clusters
:param: nbins, number of bins that data will be pooled into
:param: nrange, value range of the data to be pooled
:return: the normalised histogram of surface normals
'''
def compute_normal_histograms(normal_cloud, nbins=32, nrange=(-1,1)):
    norm_x_vals = []
    norm_y_vals = []
    norm_z_vals = []

    # Step through each point and collect its normal components
    for norm_component in pc2.read_points(normal_cloud,
                                          field_names=('normal_x', 'normal_y', 'normal_z'),
                                          skip_nans=True):
        norm_x_vals.append(norm_component[0])
        norm_y_vals.append(norm_component[1])
        norm_z_vals.append(norm_component[2])

    # Compute histograms of normal values (just like with color)
    norm_x_hist = np.histogram(norm_x_vals, bins=nbins, range=nrange)
    norm_y_hist = np.histogram(norm_y_vals, bins=nbins, range=nrange)
    norm_z_hist = np.histogram(norm_z_vals, bins=nbins, range=nrange)

    # Concatenate and normalize the histograms
    hist_features = np.concatenate((norm_x_hist[0], norm_y_hist[0],
                                    norm_z_hist[0])).astype(np.float64)
    normed_features = hist_features / np.sum(hist_features)

    return normed_features


Surface normals are unit vectors, so their components range from -1 to 1. Three different bin values were tested: 10, 20 and 40. The results were similar to those of the colour histograms, with different objects performing better with either a smaller or larger value. The final value used was 20.



Object Recognition with an SVM


We have now located the objects in the point cloud, albeit as clusters, and extracted features from the data. It is now time to classify the clusters into objects. We will use a Support Vector Machine, or SVM, classifier, as it performs well in complicated domains with a clear margin of separation.
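To illustrate what a linear-kernel SVM is doing, here is a minimal one trained with full-batch sub-gradient descent on the hinge loss. It is a toy stand-in for sklearn.svm.SVC(kernel='linear'), not the project's classifier, and the two-dimensional "features" are made up:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimal linear SVM: minimise 0.5*||w||^2 + C * sum(hinge)
    by sub-gradient descent. Labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        misses = margins < 1          # points violating the margin
        # sub-gradient of the regularised hinge loss
        grad_w = w - C * (y[misses, None] * X[misses]).sum(axis=0)
        grad_b = -C * y[misses].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# two linearly separable blobs standing in for feature histograms
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.2, size=(30, 2)),
               rng.normal(2.0, 0.2, size=(30, 2))])
y = np.array([-1] * 30 + [1] * 30)
w, b = train_linear_svm(X, y, C=1.0)
accuracy = np.mean(np.sign(X @ w + b) == y)
```

The C parameter plays the same role as in the project's classifier: larger C penalises margin violations more heavily, giving tighter boundaries at some risk of overfitting.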



''' Classifies all the objects in a point cloud
:param: point_cloud, point cloud containing the filtered clusters
:param: white_cloud, point cloud containing only xyz data
:param: cluster_indices, locations of the clusters
:return: the classified objects' positions and their labels
'''
def classifyClusters(point_cloud, white_cloud, cluster_indices):
    detected_objects_labels = []
    detected_objects = []

    # Classify each detected cluster
    for index, pts_list in enumerate(cluster_indices):
        # Grab the points for the cluster from the extracted outliers (cloud_objects)
        pcl_cluster = point_cloud.extract(pts_list)
        # convert the cluster from pcl to ROS using helper function
        ros_cluster = pcl_to_ros(pcl_cluster)
        # Extract the cluster colour feature histogram
        chists = compute_color_histograms(ros_cluster, using_hsv=True, nbins=44)
        normals = get_normals(ros_cluster)
        # Extract the cluster surface normals histogram
        nhists = compute_normal_histograms(normals, nbins=20)
        feature = np.concatenate((chists, nhists))
        # Make the prediction, retrieve the label for the result
        # and add it to detected_objects_labels list
        # clf is the trained SVM classifier
        prediction = clf.predict(scaler.transform(feature.reshape(1, -1)))
        label = encoder.inverse_transform(prediction)[0]
        detected_objects_labels.append(label)

        # store the label to be published into RViz
        label_pos = list(white_cloud[pts_list[0]])
        label_pos[2] += .4
        object_markers_pub.publish(make_label(label, label_pos, index))
        # Add the detected object to the list of detected objects
        do = DetectedObject()
        do.label = label
        do.cloud = ros_cluster
        detected_objects.append(do)

    return [detected_objects, detected_objects_labels]


In this function, we iterate through each cluster, extract the previously described features from the cluster's point cloud, and then predict what the object is. The trained SVM classifier in the above is named clf. With the prediction made, we add the object to the detected-objects list so that it can later be picked up and placed into the correct box.


The training of the clf model is performed before running the project, and the training code is available in the sensor_stick folder here. The training file will not be described here, but it is available in the project repository here. Follow the readme instructions for more details.




Testing and Results

The model used for the following results was trained with 4000 combinations of the eight potential objects: biscuits, soap, soap2, book, glue, sticky_notes, snacks and eraser.


The SVM classifier uses the default sklearn.svm.SVC parameters with two exceptions: kernel='linear' and C=50. The original 'rbf' kernel was found to overfit the data, resulting in inaccurate classification, while a large C creates more precise boundaries at a small risk of overfitting. With the parameters set to kernel='linear', C=50, colour histogram bins = 44 and normal histogram bins = 20, we achieve an accuracy of 0.92 (+/- 0.02).


The most significant contributor to accuracy is the amount of training data collected. The model implemented for testing was trained with 500 point-cloud variations of each object, resulting in a dataset of 4000 samples. The trained model, available on GitHub as model.sav, produced the following normalised confusion matrix.
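Row-normalising a confusion matrix is straightforward; the counts below are hypothetical, not the project's results, and serve only to show the computation:

```python
import numpy as np

# hypothetical label counts: rows are true classes, columns predictions
confusion = np.array([[48.0,  2.0,  0.0],
                      [ 3.0, 45.0,  2.0],
                      [ 0.0,  1.0, 49.0]])
# divide each row by its total so the diagonal shows per-class recall
normalised = confusion / confusion.sum(axis=1, keepdims=True)
```

Each row then sums to 1, so the diagonal reads directly as the fraction of each class that was classified correctly.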


With our trained classifier, we can now test our accuracy with three environments.

Environment one


Environment one resulted in 3/3 correct classifications. The above image shows the classified objects of the environment, as seen by the robot with the cluster overlay.

Upon classification of the objects, a picking list specifying the order in which objects are to be picked up and their drop locations is issued. We will not go into detail about the code for picking up and placing the objects here. The code and ROS commands to pick up and drop each object can be found in the perception.py file under the function pr2_mover(), located between lines 287 and 400. This function uses two for loops: the first iterates over the picking list, and the second compares the current picking item against all the detected clusters. Upon a match, the cluster's point-cloud centroid is calculated with the following.

# Get the PointCloud for a given object and obtain its centroid
centroids.append(np.mean(points_arr, axis=0)[:3])
center_x = np.asscalar(centroids[j][0])
center_y = np.asscalar(centroids[j][1])
center_z = np.asscalar(centroids[j][2])


The centroid coordinates are then passed to the robot's movement service routine (line 387), along with the environment number, the object's name, the arm (left or right) to pick the object up with, the object's pickup location and its final drop location.

If you would like to learn about the kinematics of a robot arm, check out the "Robot Arm Pick and Place" project here.

Environment two


Environment two resulted in 5/5 correct classifications. The above image shows the classified objects of the environment, as seen by the robot with the cluster overlay.

Environment three


Environment three resulted in 7/8 correct classifications.

The complete project code can be viewed and downloaded from GitHub here.



I hope you enjoyed reviewing the project. Let's get in touch; LinkedIn messages work best.
