Robotics perception system



Sentience is a volumetric perception system for mobile robots, written in C#. It uses webcam-based stereoscopic vision to generate depth maps, and from these creates colour 3D voxel models of the environment for obstacle avoidance, navigation and object recognition purposes.

This project is not currently active, but might be revived in a different form in future.


Robots which use Sentience:

  • Rodney
  • Flint
  • Surveyor SRV-1 (with stereo vision system)
  • GROK2

Stereo cameras:

How to make a stereo camera

You don't need to use prohibitively expensive dedicated hardware to be able to experiment with stereoscopic vision. Most webcams these days are of a reasonable quality and have a sufficiently high frame rate to be practical on slow moving domestic robots.

Here's how to build a stereo camera in 30 minutes at a cost of under 30 quid (about $60).

Choose your cameras

There are zillions of webcams now on the market, and the models and megapixels are constantly changing. Firstly I should say that both cameras should be of an identical model. This might seem obvious, but I have seen folks in the past try to do stereo using two entirely different cameras and they soon get themselves into trouble. Quite apart from any aesthetic considerations, using the same model ensures that the optics of both cameras will be somewhat similar and that they will have the same field of view and focal length.

If possible try to acquire webcams which use a CCD chip, rather than the CMOS type. CMOS cameras are ok in most situations, but under low illumination conditions such as artificial lighting (especially the low intensity energy saving light bulbs) CCD has superior performance characteristics. Another alternative is to use a camera which has its own LED illumination, although this could be problematic in borderline situations where the LEDs might switch on and off erratically. If you can't find any CCD based cameras, or they're too expensive, just stick with good old CMOS.

In the world of digital photography megapixels rule. However, for a stereo vision system you actually don't need a particularly high resolution. 640x480, or even a mere 320x240 pixels are quite adequate for ranging of distances up to about three metres. When selecting cameras think cheap and nasty rather than top of the range. Any extraneous features or software (other than the driver) are strictly surplus to requirements.

If possible try to use cameras with a wide field of view. The standard field of view for webcams (at the time of writing) is 40 degrees horizontal and 20 degrees vertical. There are a few with a wider view than this. A wide field of view just means that the robot can see more of what's ahead at one time than would otherwise be the case, so things like obstacle avoidance or feature tracking work better.

For this system I'm going to use the Creative Webcam NX Ultra. It's not the smallest webcam you've ever seen, but it has a wide 78 degree field of view and uses a CCD chip. Most important of all, they only cost 13 pounds each on eBay.


You'll need something to mount the cameras onto as a backplate. For this I used a piece of strip aluminium from a local DIY store. The metal is light, yet thick enough not to bend easily - a very important property for a mobile robot.

Cut off a strip of metal so that you can mount the cameras onto it with a baseline separation of 100mm between the centres of each lens. Why 100mm? Well, there doesn't seem to be any general agreement on an appropriate baseline distance for mobile robot stereo vision. It's just down to personal preference, and 100mm seems like a round number in the right sort of ballpark. In the past I have used 70mm and 140mm spacings.
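
To get a feel for what a given baseline means in practice, here is a rough sketch (not part of Sentience) using the standard pinhole stereo relation, disparity = focal length × baseline / range. It assumes an ideal undistorted lens and a 320 pixel wide image, and it treats the 78 degree figure quoted above as the horizontal field of view, which may not be what the camera's specification means. The function names are made up for illustration.

```python
import math


def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels for an ideal pinhole camera."""
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov)


def disparity_px(range_m, baseline_m, focal_px):
    """Expected stereo disparity (pixels) of a point at the given range."""
    return focal_px * baseline_m / range_m


if __name__ == "__main__":
    # Assumed: 320 px wide image, 78 degree horizontal field of view.
    focal = focal_length_px(320, 78.0)
    for baseline_mm in (70, 100, 140):  # baselines mentioned in the text
        d = disparity_px(3.0, baseline_mm / 1000.0, focal)
        print(f"baseline {baseline_mm} mm: {d:.1f} px disparity at 3 m")
```

Under these assumptions a wider baseline gives more pixels of disparity at the same range, and so finer range resolution, at the cost of a larger minimum distance at which both cameras can see the same object.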

The Creative webcams which I'm using are conveniently held together with screws which can be easily removed (it's almost as if they designed them to be disassembled!). Remove the top two screws from each camera, then drill holes in the aluminium strip using an appropriately sized bit, like this:

Final assembly

Now it's time to assemble the whole caboodle. The stereo camera singularity is near. Screw the cameras onto the metal backplate, and make sure that they're secure. In my case I found that the screws weren't long enough, and instead used some small screws which came with radio control servos as a substitute. You may also wish to drill a few holes in the centre of the metal strip to allow the system to be securely bolted onto your robot.

And that's all there is to it. The next step is camera calibration, and then the system is ready to be used as a fully operational ranging device.

Using multiple cameras on Linux

When you only ever have a single camera attached to a computer then life is easy, but if you need to have multiple cameras attached - for example with one or more stereo cameras - then keeping track of the index numbers assigned to each camera can become a nightmare. If cameras are unplugged and then reconnected they can be assigned completely different index numbers.

To have a consistent multi-camera system you need to set up some udev rules. First look at the parameters for each of your video devices. For example, for /dev/video0:

udevadm info -a -p $(udevadm info -q path -n /dev/video0)

This will provide a list of attributes associated with the device. Create a text file similar to the following:

KERNEL=="video[0-9]*", ATTRS{serial}=="A1CDE628", SYMLINK+="video-front-left"
KERNEL=="video[0-9]*", ATTRS{serial}=="70CDEF10", SYMLINK+="video-front-right"
KERNEL=="video[0-9]*", ATTRS{serial}=="208AEF19", SYMLINK+="video-rear-left"
KERNEL=="video[0-9]*", ATTRS{serial}=="E929E62D", SYMLINK+="video-rear-right"

In this case I've picked out the serial number attribute which is unique to each camera, then assigned a more meaningful name to the device. Save this file with a filename such as 10-video.rules, copy it to /etc/udev/rules.d then unplug and reconnect your cameras.

Listing the video devices then looks like this:


By referencing devices by their new names, such as /dev/video-front-left, you can then achieve a completely consistent interface with your software, no matter what the order in which cameras were originally connected.
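
As a small illustration of using the new names from software, the sketch below checks whether a named camera is present and reports which numbered device node it currently points at. It is only an example: the helper name `resolve_camera` is hypothetical, and the symlink names are the ones from the rules file above.

```python
import os


def resolve_camera(name, dev_dir="/dev"):
    """Return the real device node behind a udev symlink, or None if absent."""
    link = os.path.join(dev_dir, name)
    if not os.path.exists(link):
        return None  # camera not plugged in, or the rule has not been applied
    return os.path.realpath(link)


if __name__ == "__main__":
    for name in ("video-front-left", "video-front-right"):
        print(name, "->", resolve_camera(name))
```

Your own program would normally just open /dev/video-front-left directly; resolving the link is mainly useful for logging or for checking that the rules have taken effect.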

The Surveyor Stereo Vision System


This is a stereo vision system for mobile robots sold by Surveyor Corporation. Code and algorithms from the main Sentience project have been used to develop some software for use with this device, giving the ability to range objects within view.

The software is written in C# and is intended to be compiled either using Visual Studio 2005 on a Microsoft Windows system, or using MonoDevelop on GNU/Linux systems. It has been tested on Windows Vista and Ubuntu Hardy, and is licensed under GPL version 3.

Setup

General SVS setup information can be found here. Also a step by step guide to uploading new firmware versions is here.


Main solution files are:

surveyorstereo.mds (GNU/Linux version)
surveyorstereo.sln (Windows version)

First you need to ensure that the IP address and port numbers for the stereo camera are configured correctly. These are contained within MainWindow.cs (GNU/Linux version which uses a Gtk GUI) or frmStereo.cs (Windows version using Windows.Forms). You should then be able to compile and run the software and receive images from both cameras.

On a new installation of Ubuntu you may need to install gmcs (the Mono C# compiler targeting the .NET 2.0 profile) in order to compile the software, like this:

sudo apt-get install mono-gmcs


A video of the calibration procedure is shown here.

Both cameras need to be calibrated before the device can be used for stereo ranging. Calibration removes the lens distortion effects, ensuring that straight lines in the world appear as straight lines in the images so that the epipolar constraint applies, and also corrects for any small misalignments from a perfectly parallel geometry. This is an easy procedure which only takes a few minutes, and only typically needs to be performed once unless the positions of the cameras have changed.

It's a good idea to ensure that both cameras are rigidly mounted in parallel so that they cannot move relative to each other, even despite the knocks and bumps which a mobile robot is bound to encounter. One way to do this is to bolt a piece of wood or metal to the top mounting holes, as in the above picture.

Run the software, then either click on the "calibrate left camera" checkbox or select it from the menu bar (depending upon whether the Windows or GNU/Linux version is being used). You should see a pattern of dots appear next to the left camera image, like this:

Point the left camera at the dot pattern on the screen (it is assumed that the screen being used is flat, not curved like an old CRT monitor) so that the pattern fills the field of view of the camera. You may need to experiment with the distance between the screen and the camera, but once the system acquires a suitable image it will be displayed in the left image area. After a few updates you will notice that the dot pattern appears to go from being somewhat warped around the periphery to being straight, with regular spacing between dots. There will also be a beep sound, indicating that the RMS curve fitting error has fallen below an acceptable threshold. Once this happens you can click on "calibrate right camera" or select it from the menu and perform a similar procedure for the right camera.

You should now be able to see a pair of images where straight lines appear straight. To complete the calibration process you now need to point the stereo camera at something a significant distance away - preferably more than five metres distant. The idea here is that rays of light coming from these distant objects can be considered for practical purposes to be effectively parallel. A good way to do this is to point the stereo camera out of a window at a distant house or trees. Click on "calibrate alignment" or select it from the menu and the calibration process will now be complete.

You may wish to save the calibration XML file to some suitable location for later use, which can be done either from the file menu or by clicking on a button. In the GNU/Linux (Gtk) version the calibration file is saved onto the desktop.

By default you will see the simple (sparse) stereo algorithm running, with green dots appearing in the left image indicating the amount of stereo disparity for detected features. Bigger dots are closer to the camera. The relationship between disparity and distance is given in this document. If the results look unsatisfactory for any reason try going through the calibration procedure again. It cannot be overstated that good camera calibration is essential for any reasonable results on stereo correspondence. Poorly focussed cameras can also cause problems, so ensure that the images look reasonably sharp and if not then manually adjust the focus as needed.
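
For an ideal rectified, parallel stereo pair the usual pinhole relation is range = focal length × baseline / disparity, with the focal length expressed in pixels. The following is a minimal sketch of that relation only. It is not taken from the Sentience code, the document referred to above may use a more detailed model, and the numbers in the demo are hypothetical.

```python
import math


def range_from_disparity(disparity_px, focal_px, baseline_m):
    """Range in metres of a feature, from its disparity in pixels.

    Zero (or negative) disparity is treated as "too far away to measure".
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical calibration values, for illustration only.
    focal_px = 200.0   # focal length expressed in pixels
    baseline_m = 0.1   # distance between the two lens centres
    for d in (20.0, 10.0, 5.0):
        r = range_from_disparity(d, focal_px, baseline_m)
        print(f"disparity {d:4.1f} px -> {r:.2f} m")
```

Because range is inversely proportional to disparity, halving the disparity doubles the estimated range, which is why bigger dots in the display correspond to closer features.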

Calibration using a physical pattern

It is also possible to save the calibration pattern as a bitmap. In the GNU/Linux version this is saved to the desktop by clicking a button, and in other versions a dialog from the file menu allows you to save the image to any location. The image may then be printed out and glued to a cardboard or other rigid backing. This provides an alternative to calibrating directly from the screen, should that be more convenient. The audible beep produced as each camera is calibrated means that you don't necessarily need to be looking at a monitor or laptop in order to know that the process has completed.

The calibration image can also be downloaded here.

Stereo vision server

Server source code

A graphical user interface is obviously necessary for calibration and testing of the stereo correspondence algorithms. However, once calibration has been completed it is possible to run the system "headless" as a command line program. This program computes stereo disparities and broadcasts this data to other connected applications using the TCP protocol. The syntax is as follows:

stereoserver -server <IP address of the stereo camera>
             -leftport <port number of the left camera, usually 10001>
             -rightport <port number of the right camera, usually 10002>
             -broadcastport <port number on which to broadcast stereo feature data>
             -algorithm <"simple" or "dense">
             -calibration <calibration filename>
             -width <width of the image coming from the stereo camera in pixels>
             -height <height of the image coming from the stereo camera in pixels>
             -record <save raw and rectified images to the given path>
             -fps <ideal number of frames per second>

What is the format of the stereo feature data being broadcast? For the simple edge/corner based stereo algorithm the data format is 12 bytes per feature.

public struct BroadcastStereoFeature
{
    public float x;         // 4 bytes
    public float y;         // 4 bytes
    public float disparity; // 4 bytes
}

For the dense algorithm, three bytes giving the colour of the feature are also transmitted, followed by one padding byte, making 16 bytes per feature.

public struct BroadcastStereoFeatureColour
{
    public float x;         // 4 bytes
    public float y;         // 4 bytes
    public float disparity; // 4 bytes
    public byte r, g, b;    // 3 bytes
    public byte pack;       // 1 byte of padding
}

The first byte in the transmission contains either a zero value if the subsequent data is in the first format or a one if the subsequent data is in the second format with additional colour information.
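
A client written in another language could decode this data along the following lines. This is a sketch under stated assumptions rather than a description of the actual wire protocol: it assumes little-endian byte order, that the records follow the format byte back to back with no count or length header, and that a whole message has already been read from the socket. None of these details are given above, and the names `SIMPLE`, `COLOUR` and `parse_features` are made up for the example.

```python
import struct

# Assumed little-endian layouts; the byte order is not stated in the text.
SIMPLE = struct.Struct("<fff")     # x, y, disparity               -> 12 bytes
COLOUR = struct.Struct("<fff3Bx")  # x, y, disparity, r, g, b, pad -> 16 bytes


def parse_features(message):
    """Decode one broadcast message into a list of feature tuples."""
    if not message:
        return []
    flag, payload = message[0], message[1:]
    if flag == 0:
        record = SIMPLE
    elif flag == 1:
        record = COLOUR
    else:
        raise ValueError(f"unknown format byte {flag}")
    # Ignore any trailing partial record.
    usable = len(payload) - len(payload) % record.size
    return [record.unpack_from(payload, offset)
            for offset in range(0, usable, record.size)]


if __name__ == "__main__":
    demo = bytes([1]) + COLOUR.pack(12.5, 40.0, 8.0, 255, 128, 0)
    print(parse_features(demo))
```

Each tuple is (x, y, disparity) for the simple format, or (x, y, disparity, r, g, b) for the colour format, with the padding byte dropped.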

To keep bandwidth usage reasonable a maximum of 200 stereo features are broadcast at any point in time. If more than 200 features are detected, a random sample of them is sent. This also allows computational economies to be made in the dense stereo algorithm, which only needs to sample a few randomly selected image rows in order to acquire enough data. For those familiar with LIDAR systems you can think of this as being like a crazy laser scanner, where the tilt angle used for each scan line is picked randomly.
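
The capping step can be pictured as below. How the server actually draws its sample is not described here, so this sketch simply assumes uniform sampling without replacement; the function name `limit_features` is hypothetical.

```python
import random

MAX_FEATURES = 200  # limit quoted in the text


def limit_features(features, max_features=MAX_FEATURES, rng=random):
    """Return at most max_features items, chosen uniformly at random."""
    if len(features) <= max_features:
        return list(features)
    return rng.sample(features, max_features)


if __name__ == "__main__":
    detected = [(float(i), 0.0, 1.0) for i in range(500)]  # dummy features
    print(len(limit_features(detected)))
```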

An example command line client program which connects to the stereo vision server and receives features can be found here. The syntax for its use is as follows.

stereoclient -server <IP address of the stereo vision server>
             -port <broadcast port number on which the stereo vision server is communicating>

An easy way to write a program which does something with the stereo disparity data would be to create a new class which inherits from StereoVisionClient.cs and overrides the FeaturesArrived method.

Notes

The dense stereo algorithm is loosely based upon a paper by Jenny Read and Bruce Cumming, "Sensors for impossible stimuli may solve the stereo correspondence problem".

Results of stereo correspondence are always noisy in practice, and occasional false matches are an occupational hazard. However, provided that the signal outweighs the noise good results can be achieved, especially when using probabilistic methods such as occupancy grids.

Ideas for future work

It is possible that stereo correspondence could be combined with fast monocular tracking similar to that used by Andrew Davison. Features could be initialized in a very computationally economical manner, perhaps randomly selecting a single image row upon which to find new features, then tracked monocularly at high speed.

Release 0.3.1

This version contains code which has been tested on Windows Vista, both within Visual Studio 2008 and the Windows version of MonoDevelop.

Release 0.3

In the interests of following the mantra "release early and often" this is a somewhat experimental and possibly unstable release. There have been significant improvements in the threading used to grab images from both cameras, which make the GUI a lot less flaky. Buttons have also been added to allow manual teleoperation of the SRV robot, although the screen resolution buttons don't work. There's also a logging feature which allows images and teleoperation commands to be stored to disk for later analysis or simulation.

For the first time binaries have also been released to allow easy installation. There are deb and rpm packages for installation on various Linux distros. The deb package should install necessary dependencies, but if you're installing the rpm you'll need to ensure that the mono-core and gtk-sharp packages are installed.

In this version I expect that there will be broken functionality on Windows based operating systems, because I've paid little attention to the Windows version. This will be improved in the next release.

Report all bugs using the Issues tab above.

Release 0.2

Source code

Stereo correspondence and calibration algorithms are mostly unchanged. This version mainly contains usability improvements, which include:

  • Slightly altered class hierarchy.
  • A command line stereo vision server as an alternative to the GUI based system. This may be useful on robots running GNU/Linux which are not using X. The server program only acquires images from the SVS and computes stereo matches if one or more client applications are connected.
  • An example client command line program which connects to the server and receives stereo feature data.
  • An audible beep when calibration of an individual camera is complete.
  • Ability to save the calibration pattern so that it may be printed out.

Release 0.1

Source code

The initial release contains algorithms for sparse edge/corner based stereo correspondence and a dense disparity map. The dense algorithm has not been optimized, so there is presumably scope for improvement in its performance.

Known issues:

  • When selecting to calibrate left or right cameras on the Linux version sometimes you need to click more than once on the check box.
  • Calibration does not include relative rotation (roll angle) of one camera to another. It is assumed that the relative rotation is zero degrees.
  • There is no way of transmitting the stereo feature data to other applications. Broadcasting the data via TCP or UDP is anticipated for a later version.

Minoru 3D webcam


  • stereosensormodel - a program for creating stereo vision sensor models suitable for use with occupancy grids.
  • stereotweaks - a GUI which can be used to manually set stereo camera calibration offsets, visually check camera calibration and create anaglyph-like animations.
  • surveyorstereo - a graphical user interface for the Surveyor SVS.


What development tools do I need to work with the sentience code?

How to install OpenCV

How to compile and run the code

Fundamentals of stereo vision:

Why stereo vision?

How do I calculate range from stereo disparity?

Characterising uncertainties in stereoscopic vision

Stereo camera calibration

Sensor models

Other topics:

Project roadmap

How do I find the pixel density of the image sensor, or the focal length?

Utilities for webcam based stereo vision

A biologically inspired stereo correspondence algorithm

Moving through space

Autonomous Navigation

Zen and the art of robot design

Occupancy Grids

Background reading:

Articles on occupancy grids and SLAM algorithms

  • The Development of Camera Calibration Methods and Models, by T.A. Clarke and J.G. Fryer.
  • Probabilistic Robotics, by Sebastian Thrun, Wolfram Burgard and Dieter Fox.
  • Estimating egomotion from stereo vision using the principle of least commitment