Proximity Vision
- Proximity vision is computer vision for nearby objects. Here we implement the following functions: detecting an object, estimating the object pose (position and orientation on the image), and detecting the object movement (caused by slippage or deformation).
Algorithm Overview †
We directly process the camera stream for proximity vision.
We implement two algorithms: nearby-object detection and slip estimation.
The object detection is useful to estimate the pose (position and orientation) and size of a grasped or nearby object, and is also used to distinguish object movement from background changes.
Slip is estimated by a background subtraction method with the mask of the detected object. We also considered optical flow, but background subtraction performed better in some cases where the object does not have sufficient texture.
In the calibration stage, we build the background model. An object model is adaptively and dynamically constructed. Both the background and the object models are represented as color histograms.
As the background subtraction method, we use the OpenCV implementation cv::BackgroundSubtractorMOG2. This implementation estimates the strength of slip at each pixel, but not its direction.
Movement Detection †
We found that optical flow and background subtraction are good at detecting changes in a sequence of images. We compared three implementations based on OpenCV functions: cvCalcOpticalFlowLK applied to raw images, cvCalcOpticalFlowLK applied to edge images detected by the Sobel filter, and cv::BackgroundSubtractorMOG2 applied to raw images. In many cases, the three approaches provided similar results. In some cases, especially when the object does not have sufficient texture, cv::BackgroundSubtractorMOG2 was slightly better than the others. Thus, we use the background subtraction approach for movement detection (detection of moving pixels).
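The following is a minimal self-contained sketch of this movement detection approach, assuming the C++ API of OpenCV 3.x or later; the parameter values are illustrative and not necessarily those used by TObjDetTrackBSP.
#include <opencv2/opencv.hpp>
int main()
{
  cv::VideoCapture cap(0);
  if(!cap.isOpened())  return -1;
  // history, varThreshold, detectShadows (cf. the BS_* parameters described later):
  cv::Ptr<cv::BackgroundSubtractorMOG2> bkg_sbtr=
      cv::createBackgroundSubtractorMOG2(30, 10.0, true);
  cv::Mat frame, mask;
  for(;;)
  {
    cap>>frame;
    if(frame.empty())  break;
    bkg_sbtr->apply(frame, mask);  // nonzero pixels of mask are moving pixels
    cv::imshow("movement", mask);
    char c(cv::waitKey(1));
    if(c=='q' || c=='\x1b')  break;
  }
  return 0;
}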
Object Model Construction †
An object model is adaptively and dynamically constructed as follows.
First we create a background model as a histogram of colors; this process is referred to as calibration.
At the beginning of an object manipulation, we detect moving pixels in the image, compute a color histogram of the moving pixels, and subtract the background histogram. The remaining histogram is added to the object model.
In the tracking phase, we apply back projection of the object histogram to the current frame, and threshold the result to detect the object.
As the color histograms, we use the hue and saturation components of the HSV color space, where the numbers of bins of the hue and saturation components are 100 and 10 respectively.
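For reference, the following is a minimal sketch of computing such a hue-saturation histogram with OpenCV; frame is assumed to be a BGR image and mask a CV_8U mask of the pixels to be counted (e.g. the moving pixels).
// Compute a 100x10 hue-saturation histogram of the pixels selected by mask.
cv::Mat hsv, hist;
cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
int channels[]= {0, 1};      // hue and saturation
int hist_size[]= {100, 10};  // numbers of bins
float h_range[]= {0, 180};   // 8-bit hue range of OpenCV
float s_range[]= {0, 256};
const float *ranges[]= {h_range, s_range};
cv::calcHist(&hsv, 1, channels, mask, hist, 2, hist_size, ranges);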
The background model is constructed from several adjacent frames (e.g. 3). We simply compute it as the average of the histograms of all frames. Let us denote the background histogram model as \( H_{bg}(h,s) \) where \( h \) and \( s \) denote the indexes of the hue and saturation bins respectively.
During construction of an object model, the object is assumed to be moving in the image as described above. At each frame, we detect the moving points with the background subtraction method, and calculate their color histogram \( H_{mv}(h,s) \). We update the object histogram model by:
\[ H_{obj}'(h,s) = \min(255, H_{obj}(h,s) + f_{gain} \max(0, H_{mv}(h,s) - f_{bg} H_{bg}(h,s)) ) \]
where \( H_{obj}(h,s) \) and \( H_{obj}'(h,s) \) are the current and the updated object histogram models. At the beginning, \( H_{obj}(h,s) \) is initialized to zero. The component \( \max(0, H_{mv}(h,s) - f_{bg} H_{bg}(h,s)) \) computes the remaining histogram after subtracting the background histogram from the color histogram of the moving points. The \( \min(255, \dots) \) operation is for normalization. \( f_{bg} \) and \( f_{gain} \) are constant values, for example 1.5 and 0.5 respectively.
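The following is a minimal sketch of this update rule, assuming H_obj, H_mv, and H_bg are single-channel CV_32F histograms of the same size (the function name is hypothetical).
// Update the object histogram model; f_bg and f_gain are the constants above.
void UpdateObjHist(cv::Mat &H_obj, const cv::Mat &H_mv, const cv::Mat &H_bg,
                   float f_bg=1.5f, float f_gain=0.5f)
{
  cv::Mat diff= H_mv - f_bg*H_bg;
  diff.setTo(0, diff < 0);        // max(0, H_mv - f_bg*H_bg)
  H_obj+= f_gain*diff;            // accumulate into the object model
  H_obj.setTo(255, H_obj > 255);  // min(255, ...) for normalization
}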
Object Tracking †
At each frame, we track the object by detecting the pixels similar to the object model. Concretely, we apply a back projection method (cv::calcBackProject) with the object histogram \( H_{obj}(h,s) \), and threshold the result to remove uncertain pixels. The remaining pixels are the detected object. These pixels are used in two ways. One is removing the background change from the moving points obtained from the background subtraction. For this purpose we apply an erosion (size 2) and a dilation (size 7) to remove noise and expand the boundary of the object. The other is computing the position and the angle (orientation) of the object, which is done by computing the image moments of the object pixels.
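The following is a minimal sketch of this tracking step, assuming frame is the current (rectified) BGR image, hist is the hue-saturation object histogram, and the threshold value is illustrative; the erosion and dilation sizes mirror the values mentioned above.
// Detect the object by back projection, and compute its mask and moments.
cv::Mat hsv, backproj, obj_raw, obj_exp;
cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
int channels[]= {0, 1};
float h_range[]= {0, 180}, s_range[]= {0, 256};
const float *ranges[]= {h_range, s_range};
cv::calcBackProject(&hsv, 1, channels, hist, backproj, ranges);
cv::threshold(backproj, obj_raw, 30, 255, cv::THRESH_BINARY);  // remove uncertain pixels
cv::erode(obj_raw, obj_exp, cv::Mat(), cv::Point(-1,-1), 2);   // remove noise
cv::dilate(obj_exp, obj_exp, cv::Mat(), cv::Point(-1,-1), 7);  // expand the object boundary
cv::Moments mu= cv::moments(obj_raw, true);  // for position and orientation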
Implementation: TObjDetTrackBSP Class †
The nearby-object detection and slip estimation algorithms are implemented as the object detection and tracking class TObjDetTrackBSP in fv_core/prox_vision.h, which depends on OpenCV and the Boost C++ Libraries.
Reading the code of the standalone program standalone/prox_vision_test.cpp is good practice for learning the usage of this class.
Note that TObjDetTrackBSP itself does not perform image rectification, since rectification is a process shared with other image processing. The image rectification method is introduced in Video Capture and Rectification.
Parameters †
The TObjDetTrackBSPParams structure is the set of parameters of TObjDetTrackBSP. It is defined as follows:
struct TObjDetTrackBSPParams
{
// Resize image for speed up
int Width, Height;
// Remove black background
int ThreshBlkH;
int ThreshBlkS;
int ThreshBlkV;
// Parameters of BackgroundSubtractorMOG2
float BS_History;
float BS_VarThreshold;
bool BS_DetectShadows;
// Background subtraction and post process
int NErode1; // Size of erode applied to the background subtraction result.
// Histogram bins (Hue, Saturation)
int BinsH;
int BinsS;
// Object detection parameters
float Fbg; // Degree to remove the background model (histogram); larger values remove more.
float Fgain; // Learning rate.
int NumModel; // Number of object models to be maintained.
int NumFramesInHist; // Number of frames to make an object histogram.
// Object tracking parameters
int NThreshold2; // Threshold applied to the back projection result.
int NErode2; // Size of erode applied to the thresholded back projection (noise removal).
int NDilate2; // Size of dilate applied to expand the object boundary.
int NCalibBGFrames; // Number of frames to make a background histogram.
// Simplifying detected object and movement for easy use
int ObjSW, ObjSH; // Size of shrunken object data
int MvSW, MvSH; // Size of shrunken movement data
};
The following utility functions are provided for TObjDetTrackBSPParams:
WriteToYAML and ReadFromYAML †
void WriteToYAML(const std::vector<TObjDetTrackBSPParams> &params, const std::string &file_name);
void ReadFromYAML(std::vector<TObjDetTrackBSPParams> &params, const std::string &file_name);
These functions save and load multiple TObjDetTrackBSPParams objects to/from a YAML file. An example of the configuration file is fingervision/config/fv_2_l.yaml. Note that only the ObjDetTrack section matters; other sections such as CameraInfo are ignored.
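For example, a minimal sketch of saving the current parameters of a tracker and loading them back (the file name is arbitrary):
std::vector<TObjDetTrackBSPParams> params;
params.push_back(tracker.Params());
WriteToYAML(params, "/tmp/objtr_params.yaml");

std::vector<TObjDetTrackBSPParams> loaded;
ReadFromYAML(loaded, "/tmp/objtr_params.yaml");
tracker.Params()= loaded[0];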
Member Functions †
Set and Get Parameters †
TObjDetTrackBSPParams& Params() {return params_;}
const TObjDetTrackBSPParams& Params() const {return params_;}
Params gives writable and read-only access to the parameter object.
Init †
void Init();
Initialization of internal objects. Init should be called after setting parameters.
Calibrate †
void CalibBG(const std::vector<cv::Mat> &frames);
Calibrate the background model with a sequence of images. Note that frames should be rectified.
Step †
void Step(const cv::Mat &frame);
Progress the object detection and tracking with a given image frame. Note that frame should be rectified.
Draw †
void Draw(cv::Mat &frame);
Visualize the detection and tracking result on frame.
Object Model Manipulation †
void AddToModel(const cv::Mat &frame, const cv::Point &p);
Forcibly add the area around a point p to the object model. The background model is NOT subtracted. This function is useful for manually updating the object model.
void StartDetect();
void StopDetect();
Start and stop the object detection. When the object detection is stopped, the object model is not updated.
void ClearObject();
Clear the object model.
Access to the Result †
// Raw object mask.
const cv::Mat& ObjRaw() const;
// Expanded object mask.
const cv::Mat& ObjExp() const;
// Movement points.
const cv::Mat& Mv() const;
// Moments of the object (ObjRaw).
const cv::Moments& ObjMoments() const;
// Shrunken object.
const cv::Mat& ObjS() const;
// Shrunken movement.
const cv::Mat& MvS() const;
For details of image moments, refer to the OpenCV documentation and Wikipedia: Image moment.
ObjExp() is the expanded object mask, used to extract the object movement (moving points on the object) from the background subtraction result.
ObjS() and MvS() are simplified (shrunken) versions of ObjRaw() and Mv() respectively. For many robot control purposes, the shrunken versions are sufficient. In the ROS implementation, ObjS() and MvS() are sent as topics instead of ObjRaw() and Mv() to reduce the amount of packets. Shrinking a matrix is similar to cv::resize with the INTER_AREA interpolation, but each element is computed as a float value. The sizes of the shrunken matrices are defined by the (ObjSW, ObjSH) and (MvSW, MvSH) parameters. By default, all of them are 3.
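For example, the following is a minimal sketch of computing the object position and orientation from ObjMoments() (assuming <cmath> and <iostream> are included); the angle formula is the standard principal-axis orientation from central moments.
const cv::Moments &mu(tracker.ObjMoments());
if(mu.m00 > 1.0e-6)  // no object detected if the area is zero
{
  double cx= mu.m10/mu.m00, cy= mu.m01/mu.m00;     // centroid (position)
  double m20= mu.mu20/mu.m00, m02= mu.mu02/mu.m00, m11= mu.mu11/mu.m00;
  double angle= 0.5*std::atan2(2.0*m11, m20-m02);  // orientation in radians
  std::cout<<"center: ("<<cx<<", "<<cy<<"), angle: "<<angle<<std::endl;
}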
Step-by-Step Usage †
The following code shows the step-by-step usage of TObjDetTrackBSP.
Initialize †
A simple way to initialize TObjDetTrackBSP is:
TObjDetTrackBSP tracker;
tracker.Params().xxx= yyy;
tracker.Init();
You can omit tracker.Params().xxx= ... if you use the default parameters.
The following style loads the parameters from a configuration file in YAML format:
TObjDetTrackBSP tracker;
std::vector<TObjDetTrackBSPParams> objdettrack_info;
ReadFromYAML(objdettrack_info, config_file);
tracker.Params()= objdettrack_info[0];
tracker.Init();
Note that ReadFromYAML can load multiple TObjDetTrackBSPParams objects from a single file. The above example uses only the first object.
Progress Object Detection and Tracking †
We assume CAPTURE_AND_RECTIFY() is a function that captures an image and preprocesses it. The image may be captured from a device, a video stream, or a file, and should be preprocessed by resizing, rotating, and rectification as needed.
cv::Mat frame= CAPTURE_AND_RECTIFY();
Then the image is fed to the Step method:
tracker.Step(frame);
To visualize the detection and tracking result:
frame*= 0.3; // darken the original image.
tracker.Draw(frame);
Calibration of Background Model †
Remove any objects from in front of the FingerVision sensor. Then, run the code:
std::vector<cv::Mat> frames;
for(int i(0); i<tracker.Params().NCalibBGFrames; ++i)
frames.push_back(CAPTURE_AND_RECTIFY());
tracker.CalibBG(frames);
Standalone Proximity Vision Program †
The standalone code to test TObjDetTrackBSP is implemented in standalone/prox_vision_test.cpp.
//-------------------------------------------------------------------------------------------
/*! \file prox_vision_test.cpp
\brief Object detection and tracking (proximity vision) of FingerVision.
\author Akihiko Yamaguchi, info@akihikoy.net
\version 0.1
\date Feb.17, 2017
\version 0.2
\date Aug.9, 2018
Using the same core programs as the ROS version.
Supporting to load YAML configuration.
Supporting camera rectification.
g++ -g -Wall -O2 -o prox_vision_test.out prox_vision_test.cpp -I../3rdparty -I../fv_core -lopencv_core -lopencv_imgproc -lopencv_features2d -lopencv_calib3d -lopencv_video -lopencv_highgui -lopencv_videoio
Run:
$ ./prox_vision_test.out [CAMERA_NUMBER [WIDTH HEIGHT]]
CAMERA_NUMBER: Camera device number (e.g. 1).
$ ./prox_vision_test.out CONFIG_FILE
CONFIG_FILE: Configuration file in YAML (e.g. fv_1.yaml).
The configuration file may include camera configuration (CameraInfo) and
the configuration of proximity vision (ObjDetTrack).
Usage:
Press 'q' or Esc: Exit the program.
Press 'c': Calibrate the tracker (constructing the background color model).
Press 'C': Show/hide the parameter configuration trackbars.
Press 'P': Save the tracker parameter into a file in /tmp.
Press 'W' (shift+'w'): Start/stop video recording.
Press 'r': Reset (clear) the tracking object.
Press 'd': On/off the object detection mode (default=on).
Press 'm': Change the dimming-level of the displayed image (0.3 -> 0.7 -> 1.0 -> 0.3 -> ...).
Shift+Click a point on the image: Add the color of the point to the object color model.
Tips: This is useful when making an object model manually.
*/
//-------------------------------------------------------------------------------------------
#include "prox_vision.h"
#include "ay_vision/vision_util.h"
#include "ay_cpp/cpp_util.h"
#include "ay_cpp/sys_util.h"
//-------------------------------------------------------------------------------------------
#include "prox_vision.cpp"
#include "ay_vision/vision_util.cpp"
//-------------------------------------------------------------------------------------------
using namespace std;
using namespace trick;
//-------------------------------------------------------------------------------------------
cv::Mat Capture(cv::VideoCapture &cap, TCameraInfo &info)
{
cv::Mat frame;
while(!cap.read(frame))
{
if(CapWaitReopen(info,cap)) continue;
else return cv::Mat();
}
return frame;
}
//-------------------------------------------------------------------------------------------
struct TMouseEventData
{
cv::Mat &frame;
TObjDetTrackBSP &tracker;
TMouseEventData(cv::Mat &f, TObjDetTrackBSP &t) : frame(f), tracker(t) {}
};
void OnMouse(int event, int x, int y, int flags, void *data)
{
if(event == cv::EVENT_LBUTTONDOWN && (flags & cv::EVENT_FLAG_SHIFTKEY))
{
TMouseEventData &d(*reinterpret_cast<TMouseEventData*>(data));
d.tracker.AddToModel(d.frame, cv::Point(x,y));
}
}
//-------------------------------------------------------------------------------------------
int main(int argc, char**argv)
{
std::string cam("0"), config_file;
if(argc>1) cam= argv[1];
std::vector<TCameraInfo> cam_info;
// If cam exists as a file, it is considered as a configuration YAML file.
if(FileExists(cam))
{
ReadFromYAML(cam_info, cam);
config_file= cam;
}
else
{
cam_info.push_back(TCameraInfo());
cam_info[0].DevID= cam;
cam_info[0].Width= ((argc>2)?atoi(argv[2]):320);
cam_info[0].Height= ((argc>3)?atoi(argv[3]):240);
}
cv::VideoCapture cap;
if(!CapOpen(cam_info[0], cap)) return -1;
TCameraRectifier cam_rectifier;
if(cam_info[0].Rectification)
{
cv::Size size_in(cam_info[0].Width,cam_info[0].Height), size_out(cam_info[0].Width,cam_info[0].Height);
cam_rectifier.Setup(cam_info[0].K, cam_info[0].D, cam_info[0].R, size_in, cam_info[0].Alpha, size_out);
}
TObjDetTrackBSP tracker;
if(config_file=="")
{
tracker.Init();
}
else
{
std::vector<TObjDetTrackBSPParams> objdettrack_info;
ReadFromYAML(objdettrack_info, config_file);
tracker.Params()= objdettrack_info[0];
tracker.Init();
}
cv::Mat frame, frame_src;
cv::namedWindow("camera",1);
TMouseEventData mouse_data(frame_src,tracker);
cv::setMouseCallback("camera", OnMouse, &mouse_data);
int trackbar_mode(0);
bool init_request(false);
TEasyVideoOut vout, vout_orig;
vout.SetfilePrefix("/tmp/objtr");
vout_orig.SetfilePrefix("/tmp/objtr-orig");
int show_fps(0);
double dim_levels[]={0.0,0.3,0.7,1.0}; int dim_idx(1);
for(int f(0);;++f)
{
if(init_request)
{
tracker.Init();
init_request= false;
}
frame= Capture(cap, cam_info[0]);
vout_orig.Step(frame);
Preprocess(frame, cam_info[0], &cam_rectifier);
frame.copyTo(frame_src);
if(f>0)
{
tracker.Step(frame);
frame*= dim_levels[dim_idx];
tracker.Draw(frame);
}
vout.Step(frame);
vout.VizRec(frame);
cv::imshow("camera", frame);
char c(cv::waitKey(1));
if(c=='\x1b'||c=='q') break;
else if(char(c)=='W') {vout.Switch(); vout_orig.Switch();}
else if(c=='m')
{
dim_idx++;
if(dim_idx>=int(sizeof(dim_levels)/sizeof(dim_levels[0]))) dim_idx=0;
}
else if(c=='r')
{
tracker.ClearObject();
}
else if(c=='d')
{
if(tracker.ModeDetect()) tracker.StopDetect();
else tracker.StartDetect();
std::cerr<<"Object detection mode is: "<<(tracker.ModeDetect()?"on":"off")<<std::endl;
}
else if(c=='P')
{
std::vector<TObjDetTrackBSPParams> p;
p.push_back(tracker.Params());
WriteToYAML(p,"/tmp/objtr_params.yaml");
std::cerr<<"Parameters of the tracker are saved into /tmp/objtr_params.yaml"<<std::endl;
}
else if(c=='C')
{
// Remove trackbars from window.
cv::destroyWindow("camera");
cv::namedWindow("camera",1);
++trackbar_mode;
CreateTrackbars("camera", tracker.Params(), trackbar_mode, init_request);
cv::setMouseCallback("camera", OnMouse, &mouse_data);
}
else if(c=='c' || f==0)
{
std::vector<cv::Mat> frames;
for(int i(0); i<tracker.Params().NCalibBGFrames; ++i)
{
frames.push_back(Capture(cap, cam_info[0]));
Preprocess(frames.back(), cam_info[0], &cam_rectifier);
}
tracker.CalibBG(frames);
}
// usleep(10000);
if(show_fps==0)
{
std::cerr<<"FPS: "<<vout.FPS()<<std::endl;
show_fps= vout.FPS()*4;
}
--show_fps;
}
return 0;
}
//-------------------------------------------------------------------------------------------
The TObjDetTrackBSP part of this code is explained above; the capture and rectification part is explained in Video Capture and Rectification.
Compile:
g++ -g -Wall -O2 -o prox_vision_test.out prox_vision_test.cpp -I../3rdparty -I../fv_core -lopencv_core -lopencv_imgproc -lopencv_features2d -lopencv_calib3d -lopencv_video -lopencv_highgui -lopencv_videoio
Execution:
$ ./prox_vision_test.out [CAMERA_NUMBER [WIDTH HEIGHT]]
CAMERA_NUMBER: Camera device number (e.g. 1).
$ ./prox_vision_test.out CONFIG_FILE
CONFIG_FILE: Configuration file in YAML (e.g. fv_1.yaml).
The configuration file may include camera configuration (CameraInfo) and the configuration of proximity vision (ObjDetTrack).
Usage:
- Press 'q' or Esc: Exit the program.
- Press 'c': Calibrate the tracker (constructing the background color model).
- Press 'C': Show/hide the parameter configuration trackbars.
- Press 'P': Save the tracker parameters into a file in /tmp.
- Press 'W' (shift+'w'): Start/stop video recording.
- Press 'r': Reset (clear) the tracking object.
- Press 'd': On/off the object detection mode (default=on).
- Press 'm': Change the dimming-level of the displayed image (0.3 -> 0.7 -> 1.0 -> 0.3 -> ...).
- Shift+Click a point on the image: Add the color of the point to the object color model. Tip: this is useful when making an object model manually.
Test with Dummy Data †
Here we run the standalone program with dummy data (no actual FingerVision is necessary).
Refer to Dummy FingerVision Data for the dummy data and the detailed explanation of streaming the video files over the network.
$ cd fingervision/tools/
$ ./stream_file1.sh
Access http://localhost:8080/stream.html to see the video stream.
Then run the standalone program.
$ cd fingervision/standalone
$ ./prox_vision_test.out fv_3_l.yaml