1. Electric Power Research Institute of Guangxi Power Grid Co., Ltd.; 2. Guangxi Power Grid Co., Ltd.
Optical character recognition (OCR) technology has advanced considerably in recent years, but it is still rarely applied to the recognition of electric power work tickets. Existing approaches to work ticket segmentation and recognition usually combine OCR with semantic segmentation to split the ticket table and extract its text. These approaches require a large amount of model training, and their segmentation quality depends heavily on the accuracy of OCR detection. This paper therefore proposes a method for work ticket segmentation and text information extraction. First, the work ticket image is binarized, and morphological operations such as dilation and erosion are applied to the result to extract the table ruling lines. Next, the work ticket is segmented according to the detected ruling lines, yielding the position and image of each cell. OCR is then applied to detect the text in each cell. Finally, the text data are structured using regular expression matching. The effectiveness of the proposed method is verified with a Python implementation.
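The pipeline summarized above (binarization, morphological extraction of the ruling lines, cell segmentation, and regular expression structuring) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The fixed threshold, the kernel length `k`, the assumption of a regular grid without merged cells, and the field names and patterns in `FIELD_PATTERNS` are all hypothetical choices made for illustration. The OCR step is omitted, and plain strings stand in for the per-cell OCR output. The helper `open_runs` relies on the fact that a 1-D morphological opening (erosion followed by dilation) with a flat element of length `k` keeps exactly the runs of at least `k` consecutive foreground pixels.

```python
import re


def binarize(gray, thresh=128):
    """Map a grayscale image (list of rows) to 1 = ink, 0 = background.
    A fixed threshold is an assumption; the source does not name the method."""
    return [[1 if px < thresh else 0 for px in row] for row in gray]


def open_runs(seq, k):
    """1-D opening with a flat element of length k:
    keeps only runs of at least k consecutive 1s."""
    out = [0] * len(seq)
    start = None
    for i, v in enumerate(list(seq) + [0]):  # sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start >= k:
                out[start:i] = [1] * (i - start)
            start = None
    return out


def table_lines(binary, k):
    """Return (horizontal, vertical) ruling-line masks of a binary image."""
    horiz = [open_runs(row, k) for row in binary]
    cols = [open_runs(col, k) for col in zip(*binary)]
    vert = [list(row) for row in zip(*cols)]  # transpose back to row-major
    return horiz, vert


def cell_boxes(horiz, vert):
    """Cell interiors as (top, left, bottom, right), bottom/right exclusive.
    Assumes a regular grid: every ruling spans the whole table."""
    ys = [y for y, row in enumerate(horiz) if any(row)]
    xs = [x for x, col in enumerate(zip(*vert)) if any(col)]
    return [(y0 + 1, x0 + 1, y1, x1)
            for y0, y1 in zip(ys, ys[1:]) if y1 - y0 > 1
            for x0, x1 in zip(xs, xs[1:]) if x1 - x0 > 1]


# Hypothetical fields and patterns; a real ticket layout defines its own.
FIELD_PATTERNS = {
    "ticket_no": re.compile(r"Ticket No\.?\s*[::]\s*([A-Z0-9-]+)"),
    "planned_start": re.compile(
        r"Planned start\s*[::]\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2})"),
}


def structure_text(cell_texts):
    """Collect the first capture group of every matching pattern."""
    record = {}
    for text in cell_texts:
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(text)
            if match:
                record[name] = match.group(1)
    return record


def demo_grid():
    """A 7 x 9 binary image with a 2 x 2 table and one stray text pixel."""
    img = [[1 if y in (0, 3, 6) or x in (0, 4, 8) else 0 for x in range(9)]
           for y in range(7)]
    img[1][2] = 1  # ink inside the first cell, too short to count as a line
    return img


if __name__ == "__main__":
    horiz, vert = table_lines(demo_grid(), k=5)
    print(cell_boxes(horiz, vert))
    print(structure_text(["Ticket No.: GX-2021-0042",
                          "Planned start: 2021-03-15 08:30"]))
```

In practice each box returned by `cell_boxes` would be used to crop the original image, and an OCR engine would be run on the crop before `structure_text` is applied. Because the ruling lines are found with simple morphology rather than a trained segmentation model, this stage needs no training data, which is the motivation stated in the abstract.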