![]() |
Visual Servoing Platform version 3.7.0
|
In ViSP 3.7, we introduce a new Render-Based Tracker (RBT) that leverages GPU rendering to extract geometric information and perform online, continuous 6D pose estimation of a known 3D object.
It is an extension of the Model-Based Tracker (MBT) already present in ViSP (see Tutorial: Markerless generic model-based tracking using a color camera). This implementation is derived from the work of [45].
A major advantage over the Model-Based Tracker is that the RBT uses a common mesh representation for the object to track. This representation does not require additional preprocessing for feature selection and is widely supported in software such as Blender. It can also be obtained through photogrammetry processes, minimizing the work required before using the tracker. Note that we do not consider deformable objects, only non-articulated rigid objects.
Similar to the MBT, tracking is framed as an optimization problem, where the difference between the features (geometric or photometric) extracted from a 3D model and those observed in the image must be minimized.
The RBT is built as a modular pipeline (see 3. Algorithm overview). In practice, this means:
To successfully build the RBT module, you will need:
Additionally, some optional dependencies are strongly recommended:
To use the RBT you will need several things:
There are very few restrictions on the 3D models that can be used with the RBT.
First, your 3D model's size should be expressed in meters. Be aware that some CAD software exports models in millimeters.
If you are using the initialization by click, you will have to be careful with the model orientation when exporting (see below).
Note that while 3D meshes support textures, the presently available RBT features do not use the texture information.
To correctly process your model, here is a small overview of the steps to follow in Blender:
To make sure your model is correct, you should start by setting the scale and orientation of your 3D model.
You should first import your model in Blender. In the top left corner of the Blender window, click on File > Import > (Your model type), then pick your object in the popup window.
Your model should have been imported and should be visible in the 3D viewer. You should then click on it and press N to bring up the Transform panel in the top right corner. Then you should press Ctrl+A to open the Apply menu and click on the All transforms item. When you do this, you will not see any change applied to the model itself, but this will ensure that what you see in Blender is what is observed by the tracker and that both use the same orientation and scale.
If you are using initialization by click as described in 4.1. Initialization by user click section, you can select the 3D points to click directly in Blender. To do so, first click on your object and press Tab to go into Edit mode. Then, press N to display the Transform panel (top right of the viewer). Choose a 3D point to use for initialization, then click on it. In the Transform panel, you should then see the XYZ coordinates of the point, which you can copy into the init file. Be sure to select the "Global" frame in the Transform panel.
Once you have selected your points, make sure to export your model in the same frame as the Blender frame. You can export your 3D model using File > Export > (a supported 3D format such as .obj) and save it in the desired location. In the export panel, set "Forward Axis" to -Z and "Up Axis" to Y.
With all the prerequisites met, you can now understand (if you wish) how the tracker works and how to customize it for your needs.
This section of the tutorial details how the tracker works.
The RBT is used by building a vpRBTracker object, configuring it and then feeding it RGB or RGB-D frames.
As the RBT is a tracking algorithm, an initial pose must be provided. Given an initial pose, the RBT will continuously track an object in a sequence of frames. This also assumes that the motion between consecutive frames is small enough for the tracker to retrieve the features and correctly update the pose of the object in the camera frame. Otherwise, there is a risk of divergence, leading to a need for reinitialization. Initialization can be done via click (see 4.1. Initialization by user click section and the vpRBTracker::initClick() method). Otherwise, you can use a pose estimation algorithm (e.g., MegaPose, described in Tutorial: Tracking with MegaPose and the vpMegaPose::estimatePoses() method) followed by a call to vpRBTracker::setPose().
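A minimal sketch of both options (the exact signatures are assumptions, mirroring the MBT API):

```cpp
vpRBTracker tracker;
// Option 1: initialization by user clicks, using a .init file of 3D points
tracker.initClick(I, "dragon.init", true);

// Option 2: pose provided by an external method such as MegaPose
vpHomogeneousMatrix cMo = estimateInitialPose(); // hypothetical helper
tracker.setPose(cMo);
```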
Roughly, the algorithm works as follows:
Object rendering is done through Panda3D. The object is rendered at the last computed pose.
All the information derived from the render is stored in a vpRBRenderData object, part of the current frame data (vpRBFeatureTrackerInput). It contains information such as:
For more information on how rendering is performed you can examine the following sources:
This section details the different features that can be used to track the object. The RBT is flexible, and each feature can be added or removed, depending on your scenario and available sensor.
For each feature, a weighting function is computed at every iteration, and a user-specified weight can be used to consider certain features as more or less important than others.
All features inherit from a base class, vpRBFeatureTracker, that defines the different functionalities that should be implemented for a feature to be trackable.
To add or interact with the features of the RBT, you can use vpRBTracker::addTracker() and vpRBTracker::getFeatureTrackers().
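For instance, to add a feature by hand (a sketch; the shared-pointer-based signature is an assumption):

```cpp
// Add a dense depth feature and tune its importance in the optimization
auto depthFeature = std::make_shared<vpRBDenseDepthTracker>();
depthFeature->setTrackerWeight(0.5);
tracker.addTracker(depthFeature);
```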
| Feature | Class | Required inputs | Note |
|---|---|---|---|
| KLT 2D points | vpRBKltTracker | Grayscale image | Requires OpenCV |
| Silhouette moving edges | vpRBSilhouetteMeTracker | Grayscale image | - |
| Silhouette color edges | vpRBSilhouetteCCDTracker | Color image | More expensive than silhouette moving edge based tracker |
| Depth map (point-to-plane) | vpRBDenseDepthTracker | Depth image | - |
| XFeat keypoints | RBXFeatFeatureTracker | Color image | Only available in Python |
To define your own features (advanced), see 7.1. Defining your own features.
While you can use all the different features together, here are some common rules of thumb:
When tracking, it is possible for occlusions or lighting artifacts to appear. In those cases, considering ill-defined or ill-matched features may lead to tracking failure. To handle those cases and improve tracking reliability, the RBT comes with two mechanisms: robust M-estimators, which downweight outlier measurements during optimization, and an optional object segmentation mask, which filters out features that are unlikely to belong to the object.
While M-estimators are always used, the masking approach is entirely optional. To enable masking, you should call vpRBTracker::setObjectSegmentationMethod() with the method that you wish to use. To disable it, pass nullptr as the input argument. As of now, the only method available is vpColorHistogramMask. This method computes time-smoothed histograms of the object's and background's color distributions, and compares their output probabilities for a given pixel color to compute an object membership score. If available, this method may use camera depth information to update the object histograms using only pixels that are reliable in terms of depth (where object and render are closely aligned). This method is ideal when there is a clear photometric distinction between object and environment.
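Enabling or disabling the mask might look like the following sketch (the shared-pointer argument type is an assumption):

```cpp
// Enable color-histogram-based segmentation
tracker.setObjectSegmentationMethod(std::make_shared<vpColorHistogramMask>());
// Disable segmentation entirely
tracker.setObjectSegmentationMethod(nullptr);
```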
You may define your own approach to object segmentation (by, e.g., using a neural network to compare feature maps or using more depth information) by inheriting from vpObjectMask.
Note that the use of the probability map output by the segmentation method is implemented differently for each type of feature. For instance, contour-based methods will search for strong probability differences along the contour normal, while point-based features may only look in a small neighborhood to check whether a given area belongs to the object.
The diagram below sums up the combination of the masking and outlier rejection steps.
To update the camera pose, the RBT implements tracking as a non-linear optimization problem. The error to be minimized is the difference between features extracted from the 3D model that have been matched with features in the current images. For each feature, we must compute the visual error as well as its relationship to the pose (the Jacobian).
The tracking flow is as follows:
Then, optimization is an iterative process:
To influence the importance of each feature type, the user can tweak its weight by calling vpRBFeatureTracker::setTrackerWeight().
There are several options to tweak the behavior of the optimization process, which is based on a Levenberg-Marquardt approach. See 4. Configuration for more information.
Tracking may of course fail. It is thus important to have some measure of confidence in the tracking process.
If this option is enabled, then the RBT will compute a confidence metric for the computed object pose.
It is then your responsibility to use this score to call for a tracker reinitialization.
To enable drift detection, you should call vpRBTracker::setDriftDetector() with the scoring method that you wish to use.
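For instance (a sketch, assuming the setter takes a shared pointer):

```cpp
// Enable drift detection with the default probabilistic method
tracker.setDriftDetector(std::make_shared<vpRBProbabilistic3DDriftDetector>());
```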
The method that is provided by default, vpRBProbabilistic3DDriftDetector, is an online method that compares the photometric information of the model in the current image with a representation that is learned online. If available, it will also compare the depth in the render data with the depth from the camera. This representation is a set of 3D points, for which some color statistics are stored and updated every frame.
This method will output a probability estimate that the model is correctly tracked. The confidence metric is thus between 0 and 1.
This method relies on framing color and depth information as Gaussian distributions. Their spread is controlled by the user: a larger spread will result in greater confidence and smoother changes in the confidence score.
In addition, since color estimation is performed online, the variance of the color distribution of a point is used as a factor to weigh its importance with respect to other samples when computing the final score. This means that more importance will be given to points that have a stable color (that are in a smooth color region of the model and that were not too impacted by lighting changes in the previous frames).
To ease the use of the RBT, it is strongly recommended to have compiled ViSP with the third-party library that enables JSON support (see JSON for modern C++ installation instructions). This section describes how to configure the tracker.
If you wish or must configure the tracker directly from the code, these are the main methods of interest:
To load a json configuration file, you can call vpRBTracker::loadConfigurationFile().
This subsection details the parameters that you can tweak through JSON.
| Field | Type | Description |
|---|---|---|
| model | string | Optional path to the 3D model of the object. If not present, it should be set using vpRBTracker::setModelPath() before calling vpRBTracker::startTracking() |
| displaySilhouette | Boolean | Whether to display the object silhouette from the last rendered frame when calling vpRBTracker::display(). Note that since this is the last rendered frame, it may appear laggy. |
| updateRenderThreshold | float | Optional. The motion threshold between the last render and the current object pose above which the object should be rerendered and the renders updated. By default, this metric is specified in meters. This threshold is also used when rerendering after odometry. Set this value to 0 to always perform rendering. |
| camera | Dictionary | Optional camera intrinsics. See 4.2.2. Camera configuration. |
| vvs | Dictionary | Parameters for optimization. See 4.2.3. Optimization parameters. |
| silhouetteExtractionSettings | Dictionary | Parameters for the contour extraction from the renders. See 4.2.4. Silhouette extraction settings. |
| mask | Dictionary | Optional segmentation method configuration. See 4.2.5. Object segmentation method configuration. |
| drift | Dictionary | Optional drift detection method configuration. See 4.2.6. Drift detection method configuration. |
| features | List | List of features to track. See 4.2.7. Feature configuration. |
| Field | Type | Description |
|---|---|---|
| width | int | Image width |
| height | int | Image height |
| intrinsics | Dictionary | Camera intrinsics, as detailed below |
Intrinsics definition:
| Field | Type | Description |
|---|---|---|
| px | double | Focal length to pixel width ratio |
| py | double | Focal length to pixel height ratio |
| u0 | double | Principal point horizontal pixel coordinate |
| v0 | double | Principal point vertical pixel coordinate |
| model | string | Should be set to "perspectiveWithoutDistortion" |
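A plausible camera block built from these fields (all values are illustrative):

```json
"camera": {
  "width": 640,
  "height": 480,
  "intrinsics": {
    "model": "perspectiveWithoutDistortion",
    "px": 600.0,
    "py": 600.0,
    "u0": 320.0,
    "v0": 240.0
  }
}
```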
Example configuration without defining a stopping criterion for optimization:
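A plausible sketch (values are illustrative; the fields are detailed in the table below):

```json
"vvs": {
  "gain": 1.0,
  "maxIterations": 10,
  "mu": 0.1,
  "muIterFactor": 0.5
}
```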
And with a criterion (stop if 3D motion is below 1/10th of a millimeter between two iterations)
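Here 0.0001 m corresponds to a tenth of a millimeter (again, a sketch with illustrative values):

```json
"vvs": {
  "gain": 1.0,
  "maxIterations": 10,
  "mu": 0.1,
  "muIterFactor": 0.5,
  "convergenceThreshold": 0.0001
}
```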
| Field | Type | Description |
|---|---|---|
| gain | double | Optimization gain. A gain too small may lead to drift issues, while a gain too high may lead to instabilities in pose. |
| maxIterations | int | Maximum number of optimization iterations. |
| mu | double | Initial amount of regularization for the Levenberg-Marquardt optimizer. |
| muIterFactor | double | Scaling factor to the mu parameter applied at every optimization iteration |
| convergenceThreshold | double | Motion threshold (in meters) below which the tracking is considered as having converged. This conditions the number of iterations for optimization. Set to 0 or do not define to run optimization for the maximum number of iterations. |
Example configuration:
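A plausible block combining the threshold and sampling settings described below (the names of the two sub-dictionaries are assumptions):

```json
"silhouetteExtractionSettings": {
  "threshold": {
    "type": "relative",
    "value": 0.1
  },
  "sampling": {
    "samplingRate": 2,
    "numPoints": 128,
    "reusePreviousPoints": true
  }
}
```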
To control the silhouette processing, there are two main settings: threshold and sampling.
The threshold settings determine whether a point belongs to the silhouette. Silhouette extraction is based on the depth disparity between neighboring render pixels: if there is a strong depth disparity, the point is considered as belonging to the silhouette.
The threshold can either be defined as an absolute value, or relative to the object's "depth" size (the difference between the farthest and closest object points as seen from the camera).
| Field | Type | Description |
|---|---|---|
| type | string | Either "relative" or "absolute". If relative, specified as a fraction of the difference between the clipping planes. If absolute, specified in meters. |
| value | double | Value of the threshold. |
The thresholding parameters are used to determine whether a point belongs to the silhouette, but processing the whole silhouette may be too cumbersome, as there can be thousands of contour pixels. Thus, we can sample the silhouette to obtain a subset of points. Sampling has a strong impact on the runtime performance of the tracker.
| Field | Type | Description |
|---|---|---|
| samplingRate | int | This is the step (in pixels) that is used to sample the silhouette map. A value of 1 will examine every pixel. Higher values are susceptible to aliasing artifacts, but may provide more even sampling across the silhouette. |
| numPoints | int | Maximum number of points to keep. If set to 0, then all points are kept. |
| reusePreviousPoints | boolean | Whether to try and reuse the same silhouette points across tracking iterations. This may help improve stability and reduce jitter. |
The use of a segmentation method is defined by the mask key in the main configuration dictionary. Which type of method is used depends on the value of the "type" key of the mask dictionary.
Example configuration:
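A plausible sketch for the histogram mask (values are illustrative; the fields are detailed below):

```json
"mask": {
  "type": "histogram",
  "bins": 32,
  "objectUpdateRate": 0.1,
  "backgroundUpdateRate": 0.1,
  "maxDepthError": 0.01,
  "computeOnlyOnBoundingBox": false
}
```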
| Field | Type | Description |
|---|---|---|
| type | string | Set to "histogram" to use this method. |
| bins | int | Should be a power of 2 (e.g., 16 or 32). Determines the number of bins used per color component for the object and background histograms. Fewer bins will lead to less noise in the pixel map, but may lead to a less confident map. |
| objectUpdateRate | float | Object histogram update rate. Should be between 0 and 1. A value of 1 means that only the last frame will be used in the histogram. Values below 1 will keep a memory of previously seen frames. |
| backgroundUpdateRate | float | Background histogram update rate. Same as objectUpdateRate. |
| maxDepthError | float | When depth information is available, render depth and actual depth will be compared. For a given pixel of the object, if the depth difference is too high, the pixel will be used to compute the background histogram instead of the object histogram. This helps alleviate the influence of occlusions on the mask. This parameter controls the maximum tolerated depth difference, expressed in meters. |
| computeOnlyOnBoundingBox | boolean | Whether to compute the mask only in the area containing the object. If true, this will save computation time, but may lead to preserving silhouette contours that should not be kept. |
Example configuration:
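A plausible sketch for the probabilistic drift detector (values are illustrative; the fields are detailed below):

```json
"drift": {
  "type": "probabilistic",
  "colorUpdateRate": 0.2,
  "initialColorSigma": 25.0,
  "depthSigma": 0.04,
  "filteringMaxDistance": 0.001,
  "minDistanceNewPoints": 0.003
}
```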
| Field | Type | Description |
|---|---|---|
| type | string | Set to "probabilistic" to use this method. |
| colorUpdateRate | float | Update rate for the color statistics of each point on the object's surface. |
| initialColorSigma | float | Standard deviation (R,G,B) used to compute the error function between the observed color and the estimated color distribution. |
| depthSigma | float | Standard deviation used to compute the error function between the observed pixel depth and rendered depth. |
| filteringMaxDistance | float | Maximum error between the rendered depth and the stored point's distance to the camera. This is used to check whether a 3D point is visible from the camera. This value is in meters, and should be left at a small value. |
| minDistanceNewPoints | float | Each frame, this method tries to sample new 3D points on the object. This parameter controls the sampling density, and ensures that a new candidate point is far enough from all the other stored points from previous iterations. |
The type of feature that is used depends on the "type" value of each feature dictionary.
The "features" object of the main dictionary is a list, as multiple features can be used.
For each feature, these values can be defined:
| Field | Type | Description |
|---|---|---|
| weight | float or vpTemporalWeighting | The importance given by the user to this feature type. |
| display | boolean | Whether the features should be displayed when calling vpRBTracker::display(). |
For weighting, it can be of interest for certain features to give them more importance at the end of optimization, when the pose should be close to the correct solution. Indeed, certain features such as silhouette contours or depth features exhibit a limited convergence domain (depending on their settings) but do not suffer from modelling inaccuracies, unlike keypoints. As such, they are ideal to correct the small remaining error that cannot be removed by keypoints.
To do so, you can use a sigmoid-like weighting scheme for such features. This scheme modifies the weight during the optimization (which is run every frame). The configuration is the following:
| Field | Type | Description |
|---|---|---|
| minWeight | float | The minimum weight given to the feature (either at the start or end of optimization, depending on the increasing flag) |
| maxWeight | float | The maximum weight given to the feature (either at the start or end of optimization, depending on the increasing flag) |
| midpointLocation | float | The optimization point where the weight reaches the halfway point between minWeight and maxWeight. It is specified as a fraction of the number of iterations (between 0 and 1) |
| slopePower | int | How fast the weighting transitions from minWeight to maxWeight |
| increasing | boolean | Whether the weight should increase (go from minWeight to maxWeight) or decrease (maxWeight to minWeight) |
Example configurations:
For a feature whose importance does not change during optimization:
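Here the weight is a plain number; the feature's other fields are omitted (a sketch):

```json
{
  "weight": 0.5
}
```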
For a feature that is more important at the end of optimization (a feature that has a more local convergence domain):
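A sketch of an increasing sigmoid weight, using the fields of the table above (values illustrative):

```json
{
  "weight": {
    "minWeight": 0.1,
    "maxWeight": 1.0,
    "midpointLocation": 0.6,
    "slopePower": 4,
    "increasing": true
  }
}
```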
And accordingly, a feature whose importance decreases (such as features that have a large convergence domain but that may be inaccurate):
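The same sketch with the increasing flag disabled, so the weight decays during optimization:

```json
{
  "weight": {
    "minWeight": 0.1,
    "maxWeight": 1.0,
    "midpointLocation": 0.4,
    "slopePower": 4,
    "increasing": false
  }
}
```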
Example configuration:
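A plausible KLT feature entry (the "type" string is an assumption; the other fields are detailed below):

```json
{
  "type": "klt",
  "weight": 0.1,
  "display": true,
  "useMask": true,
  "minMaskConfidence": 0.5,
  "maxReprojectionErrorPixels": 5.0,
  "newPointsMinPixelDistance": 5,
  "minimumNumPoints": 20
}
```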
| Field | Type | Description |
|---|---|---|
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features. |
| minMaskConfidence | float | If useMask is true, minimum confidence for a feature to be accepted. |
| maxReprojectionErrorPixels | float | Maximum reprojection error, above which a tracked KLT point is rejected for future tracking iterations. |
| newPointsMinPixelDistance | float | During KLT resampling, the minimum Euclidean distance (in pixels) that a point should have to all other points to be considered for tracking. |
| minimumNumPoints | int | Minimum number of points that should be tracked at a given timestep. If the number of points is below, then KLT tracking is fully reinitialized. |
Example configuration:
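A plausible silhouette moving-edge entry (the "type" string and the vpMe keys inside movingEdge are assumptions; see vpMe for the actual serialization):

```json
{
  "type": "silhouetteMe",
  "weight": 0.5,
  "numCandidates": 3,
  "useMask": true,
  "minMaskConfidence": 0.5,
  "movingEdge": {
    "range": 5,
    "sampleStep": 4.0
  }
}
```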
| Field | Type | Description |
|---|---|---|
| numCandidates | int | Number of edge locations that are preserved as potential matches for a given silhouette point. |
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features. Here, the segmentation mask is examined along the silhouette normal to find a disparity in the probability mask (marking a sharp distinction between object and background). |
| minMaskConfidence | float | If useMask is true, minimum confidence disparity for a feature to be accepted. |
| movingEdge | dictionary | Moving edge tracker settings. See vpMe for more information. |
Example configuration for a case with small motions:
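A plausible entry with a small search area h, suited to small inter-frame motion (the "type" string is an assumption; the fields are detailed below):

```json
{
  "type": "silhouetteColor",
  "weight": 0.5,
  "maxNumPoints": 200,
  "temporalSmoothing": 0.1,
  "ccd": {
    "h": 4,
    "delta_h": 1
  }
}
```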
Example configuration to handle larger motions:
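The same sketch with a larger h (and adaptive scaling through min_h) to widen the convergence domain:

```json
{
  "type": "silhouetteColor",
  "weight": 0.5,
  "maxNumPoints": 200,
  "temporalSmoothing": 0.1,
  "ccd": {
    "h": 16,
    "delta_h": 2,
    "min_h": 4
  }
}
```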
| Field | Type | Description |
|---|---|---|
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features. Here, the segmentation mask is examined along the silhouette normal to find a disparity in the probability mask (marking a sharp distinction between object and background). |
| minMaskConfidence | float | If useMask is true, minimum confidence disparity for a feature to be accepted. |
| maxNumPoints | int | Number of points to keep after resampling for this tracker. This allows using many points for the ME-based tracker while retaining fewer points for this tracker (which is more expensive). Leave at 0 to not perform resampling. |
| temporalSmoothing | float | Interpolation factor, between 0 and 1, used to blend a contour point's current color statistics with those computed on the previous frame. This may improve stability, but may also dampen the estimated motion. |
| ccd | dictionary | Contracting curve density algorithm settings. See below. |
To configure the actual CCD parameters, you can modify:
| Field | Type | Description |
|---|---|---|
| h | int | The initial contour search area. This size (in pixels) defines the extent along the contour normal that is used to compute the color statistics, and will strongly impact the convergence domain of the tracker. |
| delta_h | int | Subsampling step when examining the contour normal. This helps reduce the tracking time when h is large. Using a high value (relative to h) can lead to instabilities. |
| min_h | int | Parameter to set if you wish to use an adaptive scaling for the silhouette search area. If set, the initial search area will be set to h. Then, at each optimization iteration, if motion is smaller than a multiple of h, then h is halved, down to a minimum value of min_h. Note that if delta_h is set, then it is also halved in order to obtain more reliable statistics. |
Example configuration:
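A plausible dense depth feature entry (the "type" string is an assumption; the fields are detailed below):

```json
{
  "type": "depth",
  "weight": 0.5,
  "step": 8,
  "useMask": true,
  "minMaskConfidence": 0.5
}
```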
| Field | Type | Description |
|---|---|---|
| useMask | boolean | Whether, if available, the segmentation mask should be used to filter wrong features. |
| minMaskConfidence | float | If useMask is true, minimum confidence for a feature to be accepted. |
| step | int | Step in pixels to sample the render map. A low value for this parameter will lead to a larger number of depth features. As this step is used when iterating on rows and columns of the render images, the number of depth features will quickly decrease as the step grows. |
When using the Python bindings, more options are available. These are defined in modules/python/bindings/visp/python/rbt.
You can load Python features from a JSON configuration. To do so, you can place the Python configuration in a JSON object under the "python_ext" key.
Your config file should look like:
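A structural sketch: the usual keys stay at the top level, and the Python-specific settings live under "python_ext" (its content is described in the next subsections):

```json
{
  "model": "path/to/model.bam",
  "features": [],
  "python_ext": {
  }
}
```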
For more information see modules/python/bindings/visp/python/rbt/PythonRBExtensions.py
An example of RBT using Python is available under modules/python/examples/realsense-rbt.py
We provide a wrapper around the XFeat keypoints. XFeat keypoints are robust and allow for reliable matching, even in the presence of large motion blur and image differences. They are far more powerful than the KLT-based features.
XFeat keypoints are matched with 3D points that are stored in a map (see vpPointMap). This map contains either object points (expressed in the object's frame) or points expressed in the environment frame. When a new image arrives, keypoints are extracted, then matched with the points in the map. The correspondences can then be used to minimize a visual error to estimate the object or camera motion. The visual error can be 2D (classical IBVS) or 3D (point minimization in metric space). After minimization, incorrect matches are detected and removed from the 3D point map, and new points are added.
We run XFeat on the whole image, and the keypoints can be used both for object tracking and for camera motion estimation via visual odometry. To use the XFeat-based features, the Python settings should contain the XFeat feature extraction settings, detailed below.
| Field | Type | Description |
|---|---|---|
| numPoints | int | The maximum number of points that are extracted in the frame |
| minCos | float | The minimum matching score for two keypoints (extracted from two different images) to be matched. Use a high value (e.g. 0.9) to minimize the number of matches. |
| useDense | boolean | Whether to use dense keypoint extraction and matching (Xfeat*) |
| onlyOnBB | boolean | Whether to run keypoint extraction only in the object region of the image. Do not use this option if you wish to use odometry. |
| minObjMaskValue | float | Minimum mask value (if the mask is enabled) for a keypoint to be considered as belonging to the object. If the mask value is below, it is considered an environment keypoint (that may be used for odometry). |
| minSilhouetteDistanceEnvPoint | float | Distance in pixels that an environment keypoint should have to the object's silhouette for it to be a valid keypoint that can be used for odometry. Useful to reject keypoints that could potentially belong to the object in scenarios where large motions happen. |
| minSilhouetteDistanceObjectPoint | float | Distance in pixels that an object keypoint should have to the object's silhouette for it to be a valid keypoint that can be used to estimate object motion. Useful to only consider keypoints that are in the textured part of the object, away from its contour, where keypoints may rely on the background and thus provide unreliable matches. |
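Putting the fields above together, a plausible extraction-settings block (the enclosing key name inside "python_ext" is an assumption):

```json
"xfeat": {
  "numPoints": 2048,
  "minCos": 0.9,
  "useDense": false,
  "onlyOnBB": false,
  "minObjMaskValue": 0.5,
  "minSilhouetteDistanceEnvPoint": 10.0,
  "minSilhouetteDistanceObjectPoint": 5.0
}
```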
To enable the XFeat feature for object tracking, you should add the following to the "python_ext" object configuration:
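A plausible sketch (the enclosing key name is an assumption; the fields are detailed below):

```json
"xfeatObjectTracking": {
  "use_3d": true,
  "numPoints": 500,
  "reprojectionThreshold": 5.0,
  "minDistNewPoint": 0.005,
  "maxDepthErrorVisible": 0.01,
  "maxDepthErrorCandidate": 0.01
}
```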
| Field | Type | Description |
|---|---|---|
| use_3d | boolean | Whether to use the 2D or the 3D error criterion (3D requires an RGB-D camera) |
| numPoints | int | The maximum number of 3D points to store in the object map. These points are those considered for matching with the current object keypoints. |
| reprojectionThreshold | float | The maximum reprojection error for a 3D keypoint at the end of a tracking iteration before it is considered as an outlier and removed from the map. |
| minDistNewPoint | float | Minimum distance (in meters) that a new point should have to all the points currently stored in the map for it to be considered as a valid candidate for addition. |
| maxDepthErrorVisible | float | Maximum depth error (in meters) between the rendered depth and the stored 3D point depth before this point is considered as not visible from the current viewpoint. Mainly used to filter self-occlusions. |
| maxDepthErrorCandidate | float | Maximum depth error (in meters) before a new keypoint addition is rejected. Used to filter points that do not belong to the object or for which the actual 3D error is too great. |
To perform odometry using XFeat, you can add these settings:
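A plausible sketch (again, the enclosing key name is an assumption):

```json
"xfeatOdometry": {
  "use_3d": true,
  "numPoints": 1000,
  "reprojectionThreshold": 5.0,
  "minDistNewPoint": 0.01,
  "maxDepthErrorVisible": 0.02,
  "maxDepthErrorCandidate": 0.02
}
```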
| Field | Type | Description |
|---|---|---|
| use_3d | boolean | Whether to use the 2D or the 3D error criterion |
| numPoints | int | The maximum number of 3D points to store in the environment map. These points are those considered for matching with the current environment keypoints. |
| reprojectionThreshold | float | The maximum reprojection error for a 3D keypoint at the end of a tracking iteration before it is considered as an outlier and removed from the map. |
| minDistNewPoint | float | Minimum distance (in meters) that a new point should have to all the points currently stored in the map for it to be considered as a valid candidate for addition. |
| maxDepthErrorVisible | float | Maximum depth error (in meters) between the actual camera depth and the stored 3D point depth before this point is considered as not visible from the current viewpoint. Used to filter occlusions. |
| maxDepthErrorCandidate | float | Maximum depth error (in meters) before a new keypoint addition is rejected. |
In order to test the tracker, we provide a prerecorded sequence of images named dragon.mp4 and the dragon object, whose CAD model file dragon.bam is provided in the ViSP source code.
The result of tracking the dragon can be obtained by going to the build folder:
then launching:
Note that:
More explanation is provided in the next section.
We will now focus on another example usage of the tracker, using a RealSense camera.
Note that:
To run the program, start by placing yourself in the ViSP build directory. Assuming you have compiled and built the tutorials, navigate to the tutorial folder:
~/visp-build $ cd tutorial/tracking/render-based
You can then examine the command line arguments (e.g., by invoking the program with its help option), which outputs:
Program description: Tutorial showing the usage of the Render-Based tracker with a RealSense camera
Arguments:
--config Path to the JSON configuration file. Values in this file are loaded, and can be overridden by command line arguments.
Optional
--debug-display Enable additional displays from the renderer
Default: false
--fps Realsense requested framerate
Default: 60
Optional
--height Realsense requested image height
Default: 480
Optional
--init-file Path to the JSON file containing the 2D/3D correspondences for initialization by click
Default: ""
Optional
--max-depth-display Maximum depth value, used to scale the depth display
Default: 1.0
Optional
--no-display Disable display windows
Default: true
--object Name of the object to track. Used to potentially fetch the init file
Default: ""
Optional
--plot-cov Plot the pose covariance trace for each feature
Default: false
--plot-divergence Plot the metrics associated to the divergence threshold computation
Default: false
--plot-pose Plot the pose of the object in the camera frame
Default: false
--plot-position Plot the position of the object in a 3d figure
Default: false
--pose Initial pose of the object in the camera frame.
Default: []
Optional
--profile Enable the use of Pstats to profile rendering times
Default: false
--save Whether to save experiment data
Default: false
--save-path Where to save the experiment log. The folder should not exist.
Default: ""
Optional
--save-video Whether to save the video
Default: false
--tracker Path to the JSON file containing the tracker
Default: ""
Optional
--video-framerate Output video framerate
Default: 30
Optional
--width Realsense requested image width
Default: 848
Optional
Example JSON configuration file:
{
"--debug-display": false,
"--fps": 60,
"--height": 480,
"--init-file": "",
"--max-depth-display": 1.0,
"--no-display": true,
"--object": "",
"--plot-cov": false,
"--plot-divergence": false,
"--plot-pose": false,
"--plot-position": false,
"--pose": [],
"--profile": false,
"--save": false,
"--save-path": "",
"--save-video": false,
"--tracker": "",
"--video-framerate": 30,
"--width": 848
}
You can see that among those arguments, the program has two main parameters. The first, --tracker, is the path to the .json configuration file, defined following the details given in the 4. Configuration section. The second, --object, is the path to the 3D model (in a format readable by the Panda3D engine). Alternatively, it can be specified in the .json file.
Here, we will run the tutorial with the following arguments:
```
~/visp-build/tutorial/tracking/render-based $ ./tutorial-rbt-realsense \
    --tracker data/dragon/dragon_realsense.json \
    --object data/dragon/dragon.bam \
    --init-file data/dragon/dragon.init
```
The file dragon_realsense.json contains the following configuration:
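The actual file ships with the ViSP source code; a configuration of this general shape, combining the sections of 4. Configuration, might look like the following sketch (all values and feature type strings are illustrative):

```json
{
  "model": "data/dragon/dragon.bam",
  "vvs": { "gain": 1.0, "maxIterations": 10, "mu": 0.1, "muIterFactor": 0.5 },
  "silhouetteExtractionSettings": {
    "threshold": { "type": "relative", "value": 0.1 },
    "sampling": { "samplingRate": 2, "numPoints": 128, "reusePreviousPoints": true }
  },
  "mask": { "type": "histogram", "bins": 32, "objectUpdateRate": 0.1, "backgroundUpdateRate": 0.1 },
  "features": [
    { "type": "silhouetteMe", "weight": 0.5, "numCandidates": 3 },
    { "type": "depth", "weight": 0.5, "step": 8 }
  ]
}
```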
Let's now have a look at the tutorial code that will perform the tracking. We recall that the full code is given in tutorial-rbt-realsense.cpp.
The code starts by parsing the command line arguments, detailed above, using structures defined in render-based-tutorial-utils.h:
We start by declaring an instance of the tracker and loading the given JSON configuration file.
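A sketch of this step (the name of the argument structure member is hypothetical):

```cpp
vpRBTracker tracker;
try {
  tracker.loadConfigurationFile(arguments.trackerConfiguration); // path given by --tracker
}
catch (const vpException &e) {
  std::cerr << "Could not load tracker configuration: " << e.what() << std::endl;
  return EXIT_FAILURE;
}
```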
Depending on the configuration, the call to vpRBTracker::loadConfigurationFile() may raise an error. For instance, the KLT tracker requires OpenCV: if you did not compile ViSP with OpenCV as a third party, an error will be raised.
If successful, we then open a connection to the RealSense camera, requesting a specific configuration. This configuration depends on the input arguments (height, width and framerate).
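A sketch of the RealSense setup, using the vpRealSense2 wrapper (the requested stream formats are assumptions):

```cpp
vpRealSense2 realsense;
rs2::config config;
// Request color and depth streams matching the --width, --height and --fps arguments
config.enable_stream(RS2_STREAM_COLOR, width, height, RS2_FORMAT_RGBA8, fps);
config.enable_stream(RS2_STREAM_DEPTH, width, height, RS2_FORMAT_Z16, fps);
realsense.open(config);
```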
If the configuration is available, we can proceed and update the tracker with a new, more relevant configuration. Indeed, while camera intrinsics and image resolution can be declared in the configuration file, using the intrinsics provided by the RealSense will be more accurate and ensures that there is no configuration mismatch. We then call vpRBTracker::startTracking() to initialize the renderer, load the 3D model and perform other setup-related operations.
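A sketch of this step (the setCameraParameters() argument order is an assumption):

```cpp
// Prefer the intrinsics reported by the camera over those of the JSON file
vpCameraParameters cam = realsense.getCameraParameters(RS2_STREAM_COLOR,
    vpCameraParameters::perspectiveProjWithoutDistortion);
tracker.setCameraParameters(cam, height, width);
tracker.startTracking();
```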
We then proceed and declare the different images that will be needed by the tutorial. Some images contain raw information and will be fed to the tracker. Others will be used for display purposes.
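A plausible set of declarations (names are illustrative):

```cpp
vpImage<vpRGBa> IRGB(height, width);          // raw color frame, fed to the tracker
vpImage<unsigned char> I(height, width);      // grayscale conversion, fed to the tracker
vpImage<float> depth(height, width);          // depth in meters, fed to the tracker
vpImage<uint16_t> depthRaw(height, width);    // raw sensor depth
vpImage<vpRGBa> depthDisplay(height, width);  // scaled depth, for display only
```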
We then create the different display windows:
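For instance, assuming an X11 build of ViSP (any vpDisplay backend works the same way):

```cpp
vpDisplayX displayGray(I, 0, 0, "Grayscale");
vpDisplayX displayColor(IRGB, I.getWidth() + 80, 0, "Color");
vpDisplayX displayDepth(depthDisplay, 0, I.getHeight() + 80, "Depth");
```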
Once the setup is finished, we wait for the user to set up the environment and position the object (here, the dragon) in front of the camera.
When the user has clicked the display window, initialization can be performed.
If the initial pose is given as a command line argument, tracking can directly proceed. Otherwise, the user must click on different pixel positions that correspond to known 3D points, defined in an initialization file located next to the configuration file or the mesh. In this case, the initialization file is located at data/dragon/dragon.init and contains:
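The actual coordinates are specific to the dragon mesh, but .init files follow the same layout as for the MBT: the number of points on the first line, then one X Y Z line per point, expressed in meters in the object frame (the values below are placeholders):

```
4                # number of 3D points
0.0  0.0   0.0   # point 1: X Y Z in the object frame (meters)
0.1  0.0   0.0   # point 2
0.0  0.05  0.0   # point 3
0.0  0.0   0.08  # point 4
```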
Once initialization has been performed and a starting pose given, we can enter the tracking loop. At every iteration, the same simple steps are performed:
We start by acquiring the most recent color and depth frames.
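A sketch of the acquisition, using vpRealSense2::acquire() with an alignment object (the updateDepth() helper is part of the tutorial utilities; its exact signature is an assumption):

```cpp
rs2::align align(RS2_STREAM_COLOR); // align depth on the color frame
realsense.acquire(reinterpret_cast<unsigned char *>(IRGB.bitmap),
                  reinterpret_cast<unsigned char *>(depthRaw.bitmap),
                  nullptr, nullptr, &align);
// Convert the raw uint16_t depth into meters, and build the scaled display image
updateDepth(depthRaw, realsense.getDepthScale(), maxDepthDisplay, depth, depthDisplay);
vpImageConvert::convert(IRGB, I); // grayscale conversion
```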
The alignment parameter ensures that the depth map is aligned with the color frame, as required by the RBT. The call to updateDepth() converts the raw uint16_t representation into a depth map in meters. We also convert the RGB image to a grayscale image using vpImageConvert::convert().
We can then call the tracker with the different frames:
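A sketch of the call (the argument order is an assumption):

```cpp
vpRBTrackingResult result = tracker.track(I, IRGB, depth);
```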
As an output, we get a vpRBTrackingResult object. This object contains information about what the tracker has done, the number of features that were used, etc. It contains one important piece of information: the reason why the tracking call returned, as shown in the following snippet:
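A sketch of the kind of dispatch performed; the enum and accessor names below are placeholders, see vpRBTrackingResult for the actual API:

```cpp
switch (result.getStoppingReason()) {
case vpRBTrackingStoppingReason::EXCEPTION:           // an exception was raised while tracking
case vpRBTrackingStoppingReason::NOT_ENOUGH_FEATURES: // too few features could be used
case vpRBTrackingStoppingReason::OBJECT_NOT_IN_IMAGE: // the object left the field of view
  trackingShouldBeReinitialized = true;
  break;
case vpRBTrackingStoppingReason::CONVERGENCE:         // optimization converged
case vpRBTrackingStoppingReason::MAX_ITERATIONS:      // iteration budget exhausted
  break;
}
```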
We use this information to check whether:
In the first three cases, this means that tracking should be reinitialized.
After that, we check (if it has been computed) the tracking confidence score. If it is low, we emit a warning. In a more complete application, the pipeline would contain an automatic reinitialization procedure.
We then display the current frame and overlay the tracking information:
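A sketch of the display code (the tracker display and pose getter signatures are assumptions):

```cpp
vpDisplay::display(I);
vpDisplay::display(IRGB);
tracker.display(I, IRGB, depthDisplay); // overlay silhouette and feature information
vpHomogeneousMatrix cMo;
tracker.getPose(cMo);
vpDisplay::displayFrame(IRGB, cMo, cam, 0.05, vpColor::none);
vpDisplay::flush(I);
vpDisplay::flush(IRGB);
```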
On the grayscale image, you can see:
On the color image, you can see:
The display also contains the depth image:
Finally, you should also see a black-and-white image, which is the pixel-wise segmentation mask that is used to filter out the features. If no segmentation method is used, the image should be black.
We also log timing details in the console. Running the tutorial with the --verbose flag will print timings for each part of the RBT pipeline.
When the user presses the left mouse button, we exit the tracking loop and perform a small cleanup:
One possible reason for this is that we make heavy use of OpenMP to parallelize and split the workload. An invalid OpenMP configuration may lead to a substantial loss in performance.
On Ubuntu 24.04, we have observed with the GNU OpenMP implementation (used when compiling with gcc/g++) that the default settings lead to a slowdown by a factor of 4 to 5.
This is due to an active waiting policy. We recommend using the following OpenMP settings:
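The key point is to disable the active (spinning) wait policy, which can be done with a standard OpenMP environment variable:

```sh
export OMP_WAIT_POLICY=PASSIVE
```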
Note that this configuration is only kept for the lifetime of your terminal session, and you may have to set it again the next time you start the tracker.
This error can appear when calling vpRBTracker::startTracking().
It may be useful to check the backend used by Panda3D, which should be displayed when you start the tutorial:
Known pipe types: glxGraphicsPipe
You can additionally get more information about your OpenGL support. On Ubuntu, you can use:
```
$ glxinfo | grep version
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
OpenGL core profile version string: 4.6.0 NVIDIA 550.144.03
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL version string: 4.6.0 NVIDIA 550.144.03
OpenGL shading language version string: 4.60 NVIDIA
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 550.144.03
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
GL_EXT_shader_group_vote, GL_EXT_shader_implicit_conversions,
```
If you do not have something similar, it may be that your platform has no OpenGL support.
As seen in 4. Configuration, the different components of the RBT pipeline can be parsed from a JSON configuration file. This requires a ViSP version that is built with the nlohmann::json third party (see 2.1. Building the RBT module).
This functionality is implemented using a factory pattern as defined in the vpDynamicFactory.
For your own class to be loadable, you should call vpDynamicFactory::registerType() with a lambda function that defines how a JSON object can be used to configure an object of your type.
For instance, in the vpRBDriftDetectorFactory, an object of type vpRBProbabilistic3DDriftDetector is registered as:
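A sketch of what the registration looks like (the lambda signature and loading method name are assumptions):

```cpp
registerType("probabilistic", [](const nlohmann::json &j) {
  std::shared_ptr<vpRBDriftDetector> detector = std::make_shared<vpRBProbabilistic3DDriftDetector>();
  detector->loadJsonConfiguration(j); // configure the new instance from the JSON object
  return detector;
});
```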
This means that when parsing the JSON object's drift field, if the "type" field is equal to "probabilistic", then this function will be called and the JSON object will be used to build a vpRBProbabilistic3DDriftDetector.
This pattern holds for the different types of components: