  • Offer Profile
  • TLD is an award-winning, real-time algorithm for tracking unknown objects in video streams. The object of interest is defined by a bounding box in a single frame. TLD simultaneously Tracks the object, Learns its appearance and Detects it whenever it appears in the video. The result is real-time tracking that typically improves over time.

    TLD was developed by Zdenek Kalal during his PhD, supervised by Krystian Mikolajczyk and Jiri Matas. The main contributions of TLD have been presented at international computer-vision conferences. For his work on TLD, Zdenek Kalal was awarded the UK ICT Pioneers 2011.
Product Portfolio
  • TLD - TRACKING - LEARNING - DETECTION

  • PREDATOR - A smart camera that learns from its errors
    Due to its learning abilities, TLD has been advertised under the name Predator.

    Key Features
    • TLD currently tracks only a single object
    • Input: video stream from a single monocular camera, bounding box defining the object
    • Output: object location in the stream, object detector
    • Implementation: Matlab + C, single thread, no GPU
    • No offline training stage
    • Real-time performance on QVGA video stream
    • Ported to Windows, Mac OS X and Linux
    OBJECTIVES
    Our goal is long-term, real-time tracking of arbitrary objects. The object is defined by a region of interest in a single frame. The video sequence is unconstrained: the object might change appearance significantly, become partially or fully occluded, or move in and out of the field of view.

    MOTIVATION
    Long-term tracking of arbitrary objects is a core problem in many computer vision applications: surveillance, object auto-focus, SLAM, games, HCI, video annotation.

    CHALLENGES
    Real-time performance, partial and full occlusions, illumination changes, large displacements, background clutter, similar objects, low video quality.

    THE APPROACH
    Decomposition of the long-term tracking task into three components: tracking, learning and detection (TLD). Each component deals with a different aspect of the problem. The components run in parallel and are combined in a synergetic manner to suppress each other's drawbacks.

    FUTURE WORK
    Document the code and make it publicly available. Automatic initialization, test different trackers and detectors, eliminate the planarity assumption, explicitly handle out-of-plane rotation, track multiple targets, learn shape.
    • TLD

    • A framework addressing long-term tracking. TLD trains a detector of an object after initialization from a single patch and its warps. The tracker and the detector run in parallel and both contribute to the estimated location of the object. "Not visible" is a possible output. Updates of the tracker and the detector depend on the learning module described below.
    • TRACKING

    • Median-shift tracker: a tracker of a rectangle, based on the Lucas-Kanade tracker and robust to partial occlusions. It estimates translation and scale. Tracker validation: the detector is updated as long as the trajectory is forward-backward consistent.
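
The forward-backward check and the median-based motion estimate can be illustrated with a minimal Python sketch. This is not the TLD implementation: the function names, the fixed threshold `fb_threshold` and the sample points are assumptions made for illustration, the points are assumed to have already been tracked by some Lucas-Kanade routine, and only translation (not scale) is estimated.

```python
from math import hypot
from statistics import median


def forward_backward_error(p0, p_back):
    """Distance between a start point and where it lands after tracking
    forward to the next frame and then backward again."""
    return hypot(p0[0] - p_back[0], p0[1] - p_back[1])


def median_shift(points_prev, points_next, points_back, fb_threshold=2.0):
    """Estimate the box translation as the median displacement of the points
    that pass the forward-backward check.

    points_prev: points sampled inside the box in frame t
    points_next: the same points tracked forward to frame t+1
    points_back: points_next tracked backward to frame t
    fb_threshold: hypothetical pixel threshold (an assumption of this sketch)
    Returns (dx, dy), or None when no point is consistent.
    """
    kept = [
        (p, q)
        for p, q, b in zip(points_prev, points_next, points_back)
        if forward_backward_error(p, b) <= fb_threshold
    ]
    if not kept:
        return None  # trajectory is not forward-backward consistent
    dx = median(q[0] - p[0] for p, q in kept)
    dy = median(q[1] - p[1] for p, q in kept)
    return dx, dy


prev_pts = [(10.0, 10.0), (20.0, 10.0), (10.0, 20.0), (20.0, 20.0)]
next_pts = [(13.0, 11.0), (23.0, 11.0), (13.0, 21.0), (40.0, 5.0)]  # last point drifted
back_pts = [(10.1, 10.0), (20.0, 10.2), (9.9, 20.0), (31.0, 12.0)]  # last point fails the check
shift = median_shift(prev_pts, next_pts, back_pts)
```

Here the drifting point is rejected by the forward-backward check, and the remaining three points agree on a shift of 3 pixels in x and 1 pixel in y. Using the median rather than the mean keeps a few badly tracked points from dominating the estimate.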
    • LEARNING

    • The learning is implemented within the P-N Learning framework. The object is tracked by a tracker. Patches close to the trajectory update the detector with a positive label (P-constraints). The object is detected by the detector; non-maximally confident detections update the detector with a negative label (N-constraints). Both constraints make errors; learning stability is achieved by their mutual compensation.
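
One way to picture how the two constraints split detector responses into training examples is the following Python sketch. It is a toy illustration under stated assumptions, not the P-N Learning code: "close to the trajectory" is approximated by box overlap with a hypothetical threshold `near`, and the names `box_overlap` and `pn_constraints` are invented here.

```python
def box_overlap(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pn_constraints(tracked_box, detections, near=0.6):
    """Split detector responses into positive and negative training boxes.

    tracked_box: box reported by the tracker for the current frame
    detections:  list of (box, confidence) pairs from the detector
    near:        hypothetical overlap threshold for "close to the trajectory"
    """
    positives = [tracked_box]  # P-constraint: the patch on the trajectory
    negatives = []
    if not detections:
        return positives, negatives
    best = max(range(len(detections)), key=lambda i: detections[i][1])
    for i, (box, _conf) in enumerate(detections):
        if box_overlap(box, tracked_box) >= near:
            positives.append(box)   # P-constraint: close to the trajectory
        elif i != best:
            negatives.append(box)   # N-constraint: non-maximally confident
    return positives, negatives


tracked = (10, 10, 50, 50)
detections = [
    ((12, 11, 52, 51), 0.9),      # overlaps the tracked box, most confident
    ((100, 100, 140, 140), 0.7),  # elsewhere in the image
    ((200, 20, 240, 60), 0.4),    # elsewhere in the image
]
pos, neg = pn_constraints(tracked, detections)
```

In this example the tracked box and the overlapping detection become positive examples, while the two weaker detections far from the trajectory become negative examples.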
    • DETECTION

    • 1st stage filter:
      Randomized forest, 2bitBP features

      2nd stage classifier:
      1-NN, 10x10 patch, NCC
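      Confidence = d- / (d- + d+), where d- is the distance to the nearest negative example and d+ is the distance to the nearest positive example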
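
The second-stage confidence can be sketched in Python as follows. The sketch assumes that the two distances in the formula are the distances to the nearest negative and the nearest positive stored patch, and that NCC is turned into a distance as 0.5 * (1 - NCC); both the mapping and the function names are assumptions of this example, and patches are flat lists of 100 intensities standing in for 10x10 patches.

```python
from math import sqrt


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches (flat lists)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def ncc_distance(a, b):
    """Map NCC in [-1, 1] to a distance in [0, 1] (assumed mapping)."""
    return max(0.0, 0.5 * (1.0 - ncc(a, b)))


def nn_confidence(patch, positives, negatives):
    """1-NN confidence: d_neg / (d_neg + d_pos), using nearest-neighbour distances."""
    d_pos = min(ncc_distance(patch, p) for p in positives)
    d_neg = min(ncc_distance(patch, n) for n in negatives)
    total = d_pos + d_neg
    return d_neg / total if total > 0 else 0.5


# Synthetic 10x10 patches stored row by row.
horizontal_ramp = [i % 10 for i in range(100)]    # brightness grows left to right
vertical_ramp = [i // 10 for i in range(100)]     # brightness grows top to bottom
brighter_ramp = [2 * v + 5 for v in horizontal_ramp]  # same pattern, other gain/offset

conf_match = nn_confidence(horizontal_ramp, [brighter_ramp], [vertical_ramp])
conf_miss = nn_confidence(vertical_ramp, [brighter_ramp], [vertical_ramp])
```

Because NCC ignores gain and offset, the horizontal ramp matches the stored positive despite the brightness change and receives a confidence close to 1, while the vertical ramp coincides with the stored negative and receives a confidence close to 0.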
      2bit Binary Feature
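
A 2-bit binary feature of this kind can be sketched in Python as below. The sketch assumes the common description of the feature, in which one bit compares the left and right halves of a box and the other compares the top and bottom halves, giving four possible codes; the function names, the bit order and the sample image are illustrative and not taken from the TLD code.

```python
def region_sum(img, x, y, w, h):
    """Sum of intensities in the w-by-h box whose top-left corner is (x, y).
    img is a list of rows, each row a list of pixel intensities."""
    return sum(sum(row[x:x + w]) for row in img[y:y + h])


def two_bit_bp(img, x, y, w, h):
    """Return a code in {0, 1, 2, 3} for the box (x, y, w, h).

    High bit: left half brighter than right half.
    Low bit:  top half brighter than bottom half.
    For odd sizes the middle column or row is left out of the comparison.
    """
    half_w, half_h = w // 2, h // 2
    left = region_sum(img, x, y, half_w, h)
    right = region_sum(img, x + w - half_w, y, half_w, h)
    top = region_sum(img, x, y, w, half_h)
    bottom = region_sum(img, x, y + h - half_h, w, half_h)
    return (int(left > right) << 1) | int(top > bottom)


image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
code = two_bit_bp(image, 0, 0, 4, 4)
```

For the sample image the left and top halves are the brighter ones, so both bits are set and the code is 3. In a randomized forest, several such codes measured at different boxes would typically be concatenated into an index for each tree, but that step is outside this sketch.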
  • High-level description of TLD