FCOS Fully Convolutional One-Stage Object Detection
Citation: Z. Tian, C. Shen, H. Chen and T. He, “FCOS: Fully Convolutional One-Stage Object Detection,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9626-9635, doi: 10.1109/ICCV.2019.00972.
Brief Summary:
Object Detection has been growing at a rapid pace with various applications in autonomous vehicles, mobile robots, etc. State-of-the-art detectors at that time, like Faster R-CNN use anchor boxes of three specific aspect ratios and sizes for detecting multiple objects in an image. However, this is still a hyperparameter that needs to be tuned carefully to get higher performance. This paper overcomes this issue by proposing an anchorless method of object detection known as Fully Convolutional One-Stage (FCOS) Object Detection. By using ResNeXt-64x4d-101, the model achieves 44.7% in Average Precision (AP) with single-model and single-scale testing, surpassing previous one-stage detectors. On top of this, they additionally propose a “center-ness” approach which helps suppress the low-quality detected bounding boxes and improves overall performance by a large margin. Since the approach is a per-pixel prediction method, it helps in reusing the semantic segmentation pipeline in deployment.
Main Contributions:
The paper proposes an anchor-free method of object detection which helps in reducing the computations by a huge margin.
A novel approach called “centerness” is presented, which helps in suppressing the low-quality detected bounding boxes.
The method proposed helps in generalizing the concept of object detection with other FCN-solvable tasks such as semantic segmentation.
Strengths:
The paper achieves state-of-the-art results among one-stage object detectors.
Avoiding the anchor box helps in avoiding complex computations such as Intersection-Over-Union (IOU).
FCOS can be replaced with Regional Proposal Networks (RPN) in two-stage detectors that enhance the performance of anchor-based methods.
This paper can be extended to tasks such as instance segmentation and key-point detection.
The paper is well-written; the code is provided for reproducibility and has enough mathematical explanations and supporting figures.
Weakness:
There are assumptions for many parts of the paper without sufficient proof, which can be confusing to the readers.
In-Depth Analysis of Strengths and Weaknesses:
The paper with its anchor-free detection method scores over the anchor-based methods such as Faster R-CNN by avoiding complicated IOU computation resulting in faster training times. The novel “center-ness” approach also helped in increasing training time, as it helped in reducing the number of outliers that the Non-Maximum Suppression has to deal with. Avoiding the anchor-box method also helped in overcoming complex computations such as Intersection-Over-Union. All these helped in enhancing the efficiency of the model by reducing computation time, which led to the state-of-the-art status of the paper.
FCOS can also be used along with anchor-based methods such as Faster R-CNN by using it for the Regional Proposal Networks to improve the performance of those models.
It is not described how several levels in multi-level prediction with FPN affect its mean Average Precision.
Experimental Results:
The experiments were conducted on the MS-COCO dataset with ResNet-50 as the backbone network. The ablation study is done in an extensive manner, carefully analyzing the effectiveness of the proposed architecture.
Multi-Level Prediction with FPN: To avoid overlapping bounding boxes, the problems of poor BPR and ambiguous samples must be fixed. They obtained a BPR of roughly 95.55%, which is greater than anchor-based detectors. They get an even greater BPR of 98.40% with FPN included. They outperform anchor-based models with their FCOS because they obtain a substantially smaller proportion of unclear samples.
Center-ness enhances the bounding box quality by adding center-ness, and they found that this raised AP performance to 37.1%, exceeding RetinaNet’s 35.9%. They also eliminated the requirement for computing IOU.
FCOS achieves 43.2% in AP using ResNeXt-64x4d-101-FPN as their guiding principle. It performs far better than the SOTA anchor-free detector CornerNet while requiring substantially less processing.
Extensions:
The FCOS-based object detection can be extended to areas such as semantic segmentation, instance segmentation, and keypoint detection, as the per-pixel prediction methods are analogous to it.
Comments:
The ablation study for every component of the architecture helps to analyze and understand the paper in a better way.