Visual Object Tracking
150 papers with code • 21 benchmarks • 26 datasets
Visual Object Tracking is an important research topic in computer vision, image understanding and pattern recognition. Given the initial state (centre location and scale) of a target in the first frame of a video sequence, the aim of Visual Object Tracking is to automatically obtain the states of the object in the subsequent video frames.
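The task definition above can be sketched as a small interface: a tracker is initialised with the first frame and the initial state, then asked for one state per subsequent frame. This is only an illustration; the names `State`, `Tracker`, `StaticTracker` and `track_sequence` are hypothetical and do not come from any particular library, and frames are treated as opaque objects.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Protocol


@dataclass(frozen=True)
class State:
    """Target state: centre location and scale (hypothetical layout)."""
    cx: float
    cy: float
    scale: float


class Tracker(Protocol):
    """Minimal interface assumed for this sketch."""
    def init(self, frame: Any, state: State) -> None: ...
    def update(self, frame: Any) -> State: ...


class StaticTracker:
    """Trivial baseline that always reports the initial state."""
    def init(self, frame: Any, state: State) -> None:
        self._state = state

    def update(self, frame: Any) -> State:
        return self._state


def track_sequence(tracker: Tracker, frames: Iterable[Any],
                   initial_state: State) -> List[State]:
    """Return one state per frame; the first is the given initial state."""
    it = iter(frames)
    first = next(it)
    tracker.init(first, initial_state)
    states = [initial_state]
    for frame in it:
        states.append(tracker.update(frame))
    return states


states = track_sequence(StaticTracker(), ["f0", "f1", "f2"],
                        State(cx=10.0, cy=20.0, scale=1.0))
```

A real tracker would replace `StaticTracker.update` with a model that localises the target in each new frame.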
Most implemented papers
SSD: Single Shot MultiBox Detector
Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference.
SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
Moreover, we propose a new model architecture to perform depth-wise and layer-wise aggregations, which not only further improves the accuracy but also reduces the model size.
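As a rough illustration of what a depth-wise aggregation can look like, the sketch below computes a generic depth-wise cross-correlation: each channel of a search feature map is correlated only with the matching channel of a template feature map. It uses plain nested lists and "valid" sliding windows; the function name `depthwise_xcorr` is an assumption for this example and this is not the authors' implementation.

```python
def depthwise_xcorr(search, kernel):
    """Correlate each search channel with the matching kernel channel.

    search: C x H x W nested lists, kernel: C x h x w nested lists.
    Returns C x (H - h + 1) x (W - w + 1) response maps.
    """
    if len(search) != len(kernel):
        raise ValueError("search and kernel need the same number of channels")
    out = []
    for s_ch, k_ch in zip(search, kernel):
        H, W = len(s_ch), len(s_ch[0])
        h, w = len(k_ch), len(k_ch[0])
        resp = [
            [
                sum(s_ch[i + di][j + dj] * k_ch[di][dj]
                    for di in range(h) for dj in range(w))
                for j in range(W - w + 1)
            ]
            for i in range(H - h + 1)
        ]
        out.append(resp)
    return out


search = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # channel 0
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # channel 1
]
kernel = [
    [[1, 0], [0, 1]],                    # channel 0
    [[1, 1], [1, 1]],                    # channel 1
]
response = depthwise_xcorr(search, kernel)
```

Because the output keeps one response map per channel instead of summing across channels, later layers can weight channels differently, at a lower parameter cost than a full cross-channel correlation.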
One-Shot Video Object Segmentation
This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame.
SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines
Following these guidelines, we design our Fully Convolutional Siamese tracker++ (SiamFC++) by introducing both classification and target state estimation branches (G1), a classification score without ambiguity (G2), tracking without prior knowledge (G3), and an estimation quality score (G4).
ECO: Efficient Convolution Operators for Tracking
Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.
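The AUC figure quoted for OTB-2015 refers to the area under a success plot: the fraction of frames whose predicted box overlaps the ground truth by more than a threshold, averaged over a range of thresholds. A minimal sketch follows, assuming boxes in (x, y, w, h) format and 21 evenly spaced thresholds in [0, 1], which is the convention commonly used with OTB. The helper names `iou` and `success_auc` are illustrative and not taken from any toolkit.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred, gt, n_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)


gt = [(0, 0, 2, 2), (0, 0, 2, 2)]
pred = [(0, 0, 2, 2), (5, 5, 2, 2)]   # one perfect frame, one complete miss
auc = success_auc(pred, gt)
```

Note that with a strict `>` comparison even a perfect tracker scores slightly below 1.0, because no overlap can exceed the final threshold of 1.0.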
High Performance Visual Tracking With Siamese Region Proposal Network
Visual object tracking has been a fundamental topic in recent years and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks.
Deeper and Wider Siamese Networks for Real-Time Visual Tracking
Siamese networks have drawn great attention in visual tracking because of their balanced accuracy and speed.
Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
To learn generalizable representations for correspondence at large scale, a variety of self-supervised pretext tasks have been proposed to explicitly perform object-level or patch-level similarity learning.
Discriminative Correlation Filter with Channel and Spatial Reliability
Short-term tracking is an open and challenging problem for which discriminative correlation filters (DCF) have shown excellent performance.
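To give a feel for how a discriminative correlation filter localises a target, here is a deliberately simplified, single-channel, 1-D sketch in the style of a MOSSE filter: the filter is solved in closed form in the Fourier domain so that the training signal produces a Gaussian-shaped response, and a shifted copy of the signal then produces a correspondingly shifted peak. It does not model the channel or spatial reliability weights of the paper above, and all names (`train_filter`, `respond`, the toy signal `f`) are assumptions made for this example.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for tiny signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gaussian_target(n, centre, sigma=1.0):
    """Desired response: a circular Gaussian peaked at `centre`."""
    out = []
    for t in range(n):
        d = min(abs(t - centre), n - abs(t - centre))
        out.append(math.exp(-0.5 * (d / sigma) ** 2))
    return out


def train_filter(f, g, lam=1e-3):
    """Closed-form conjugate filter: G * conj(F) / (F * conj(F) + lam)."""
    F, G = dft(f), dft(g)
    return [Gk * Fk.conjugate() / (Fk * Fk.conjugate() + lam)
            for Fk, Gk in zip(F, G)]


def respond(h_conj, z):
    """Correlation response of signal z under the trained filter."""
    Z = dft(z)
    return [v.real for v in idft([Zk * Hk for Zk, Hk in zip(Z, h_conj)])]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


f = [0.0, 1.0, 3.0, 2.0, 0.5, -1.0, 0.2, 0.7]        # toy "appearance" signal
h_conj = train_filter(f, gaussian_target(len(f), centre=3))
shifted = [f[(t - 2) % len(f)] for t in range(len(f))]  # target moved by 2
peak_train = argmax(respond(h_conj, f))
peak_shift = argmax(respond(h_conj, shifted))
```

The displacement between the two peaks recovers the shift applied to the signal, which is the basic localisation mechanism; practical trackers work on 2-D multi-channel features and update the filter over time.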
YouTube-VOS: Sequence-to-Sequence Video Object Segmentation
End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips.