Detection, Recognition, Analysis and Extraction of Chart Content
The principal goal is to develop robust methods for reverse-engineering the data plotted in charts in scientific documents, giving access to underlying quantitative information that is not otherwise available. A solution to this problem would benefit many tasks, such as chart question answering and general information extraction. The problem we address can be summarized by the set of six tasks shown in the figure above. The first is to extract the base structure of the chart, which is needed to constrain data extraction to the plot area. Our approach adopts a U-Net with offset prediction and a Mask R-CNN combined with graph neural networks. To promote general and robust performance, we generate additional charts that follow the real-world data distribution, which increases the diversity of the training datasets.
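The role of the base structure in data extraction can be illustrated with a minimal sketch. It assumes that structure extraction yields an axis-aligned plot-area box and two labelled ticks per axis on a linear scale, and that point detections (from the U-Net or Mask R-CNN stages, not shown) are already given in pixel coordinates. The names `PlotArea`, `LinearAxis`, and `extract_data` are hypothetical illustrations, not part of the method described above. Detections outside the plot area, such as legend markers, are discarded before pixels are converted to data values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlotArea:
    """Hypothetical plot-area box in pixel coordinates (image y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # A point belongs to the plot if it lies inside the box, edges included.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class LinearAxis:
    """Hypothetical axis calibration from two reference ticks.

    Assumes a linear scale; log or date axes would need a different mapping.
    """
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        if self.pixel_a == self.pixel_b:
            raise ValueError("reference ticks must be at distinct pixel positions")
        # Linear interpolation between (pixel_a, value_a) and (pixel_b, value_b).
        return self.value_a + (pixel - self.pixel_a) * (self.value_b - self.value_a) / (
            self.pixel_b - self.pixel_a
        )


def extract_data(points, area, x_axis, y_axis):
    """Keep detections inside the plot area and map them to data coordinates."""
    data = []
    for x, y in points:
        if not area.contains(x, y):
            continue  # e.g. a legend marker or tick label detected outside the plot
        data.append((x_axis.to_value(x), y_axis.to_value(y)))
    return data


# Usage with made-up pixel positions for a 0-10 by 0-100 chart.
area = PlotArea(left=50, top=20, right=450, bottom=320)
x_axis = LinearAxis(pixel_a=50, value_a=0.0, pixel_b=450, value_b=10.0)
# The y-axis ticks are given bottom-up, so larger values map to smaller pixel rows.
y_axis = LinearAxis(pixel_a=320, value_a=0.0, pixel_b=20, value_b=100.0)
points = [(50, 320), (250, 170), (450, 20), (480, 10)]  # the last lies outside the plot
result = extract_data(points, area, x_axis, y_axis)
print(result)
```

Calibrating each axis from two ticks keeps the sketch short; a more robust variant would fit a line to all recognized tick labels so that a single misread label does not distort every extracted value.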