A Power Work Ticket Segmentation and Text Information Extraction Method

Affiliation:

1. Electric Power Research Institute of Guangxi Power Grid Co., Ltd.; 2. Guangxi Power Grid Co., Ltd.

CLC number:

TM721

Fund project:

Research, Development and Application of a Machine-Learning-Based Whole-Process Risk Management and Control System for Power Operations of Guangxi Power Grid

    Abstract:

    Optical character recognition (OCR) has advanced considerably in recent years, but it has seen little application to the recognition of electric power work tickets. Existing approaches to work ticket segmentation and recognition usually combine OCR with semantic segmentation to split the ticket table and extract its text. These approaches require a large amount of model training, and their segmentation quality depends heavily on the accuracy of OCR detection. This paper therefore proposes a method for power work ticket segmentation and text information extraction. First, the work ticket image is binarized, and morphological operations such as dilation and erosion are applied to the result to extract the table frame lines. Next, the work ticket is segmented according to the detected frame lines, yielding an image of each cell. OCR is then applied to recognize the text in each cell. Finally, the recognized text is structured using regular-expression matching. The effectiveness of the proposed method is verified with a Python implementation.
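
The pipeline described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation, which the abstract does not detail. It assumes a toy binary image stored as nested lists, a regular grid without merged cells, and hypothetical field names and cell texts (`ticket_no`, `start_time`, "GX-001"). The OCR step is omitted and its output is replaced by hard-coded strings. Frame lines are extracted by a morphological opening (erosion followed by dilation) with a 1 x k line-shaped window; an image library such as OpenCV would normally provide these operations on real scans.

```python
import re

def erode_row(row, k):
    # Erosion with a 1 x k window: keep pixel i only if row[i:i+k] is all ones.
    n = len(row)
    return [1 if i + k <= n and all(row[i:i + k]) else 0 for i in range(n)]

def dilate_row(row, k):
    # Dilation with the reflected window: restores every run that survived erosion.
    return [1 if any(row[max(0, i - k + 1):i + 1]) else 0 for i in range(len(row))]

def open_rows(img, k):
    # Opening keeps only horizontal runs of at least k pixels (candidate frame lines).
    return [dilate_row(erode_row(r, k), k) for r in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def extract_lines(binary, k):
    # Vertical lines are found by running the same opening on the transposed image.
    horizontal = open_rows(binary, k)
    vertical = transpose(open_rows(transpose(binary), k))
    return horizontal, vertical

def cell_boxes(horizontal, vertical):
    # Assumption: a regular grid, so cells lie between consecutive line rows/columns.
    ys = [y for y, r in enumerate(horizontal) if any(r)]
    xs = [x for x, c in enumerate(transpose(vertical)) if any(c)]
    boxes = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            if bottom - top > 1 and right - left > 1:  # skip rows/columns of thick lines
                # Half-open interior box: (x0, y0, x1, y1)
                boxes.append((left + 1, top + 1, right, bottom))
    return boxes

# Hypothetical field patterns; a real ticket layout would need its own patterns.
FIELD_PATTERNS = {
    "ticket_no": re.compile(r"Ticket No\.?\s*[:：]\s*(\S+)"),
    "start_time": re.compile(r"Start\s*[:：]\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2})"),
}

def structure(cell_texts):
    # Regex-based structuring: map recognized cell text to named fields.
    record = {}
    for text in cell_texts:
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(text)
            if match:
                record[name] = match.group(1)
    return record

# Toy binarized ticket: 1 = ink. The pixel at row 2, column 1 stands in for text.
binary = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
horizontal, vertical = extract_lines(binary, k=3)
boxes = cell_boxes(horizontal, vertical)

# Stand-in for OCR output on the cropped cells (hypothetical strings).
record = structure(["Ticket No: GX-001", "Start: 2021-05-26 08:30"])
```

Because the window length `k` is longer than any stroke of text in the toy image, the opening removes the text pixel while preserving the frame lines. On real tickets `k` would have to be tuned to the image resolution, and merged cells would need a more general cell detection step than the grid assumption used here.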

History
  • Received: 2021-05-26
  • Last revised: 2021-07-06
  • Accepted: 2021-09-11