OCR amount recognition with pytesseract



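  • import traceback
    
    import pytesseract
    from PIL import Image
    
    def getNumber(start_point_array, end_point_array):
        """Return (ok, money): OCR a screen region and keep only digits and '.'."""
        money = ''
    
        try:
            # Skip OCR if any coordinate is missing ('' or 0).
            if start_point_array[0] not in ['', 0] and start_point_array[1] not in ['', 0] \
                    and end_point_array[0] not in ['', 0] and end_point_array[1] not in ['', 0]:
                start_x_point = start_point_array[0]
                start_y_point = start_point_array[1]
                end_x_point = end_point_array[0]
                end_y_point = end_point_array[1]
                # window_capture is my own screenshot helper (not shown here);
                # it is expected to save the region as demo2.png.
                window_capture('demo2', start_x_point, start_y_point, end_x_point, end_y_point)
                image = Image.open('demo2.png')
                # "-psm 7" (treat the image as a single text line) is the Tesseract 3.x
                # spelling; Tesseract 4 and later expect "--psm 7".
                finalCode = pytesseract.image_to_string(image, lang="eng", config="-psm 7")
                # Recognition fix-ups: drop spaces and newlines, ',' -> '.', letter O/o -> digit 0.
                ocr_body = finalCode.strip().replace(' ', '').replace(',', '.').replace('O', '0').replace('o', '0').replace('\n', '')
                for _m in ocr_body:
                    if _m.isdigit() or _m == '.':
                        money += _m
                float(money)  # raises ValueError if the result is not a valid number
                return True, money
            else:
                return False, money
        except Exception:
            print('could not recognize money {}, prepare to query location'.format(money))
            print(traceback.format_exc())
            return False, money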
    

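    This is a code snippet I have used; adapt it to your own needs.
    start_point_array and end_point_array are the (x, y) coordinate tuples of the top-left and bottom-right corners.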
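
The fix-up step can be tried without Tesseract or a screenshot. Below is a minimal sketch that pulls the same string cleanup into a standalone function; the name clean_amount is hypothetical and not part of the snippet above. It also shows one limit of the approach: because every comma becomes a dot, an amount with a thousands separator such as "1,234.50" ends up with two dots and is rejected.

```python
def clean_amount(raw):
    """Apply the same fix-ups as getNumber to a raw OCR string.

    Hypothetical helper for illustration; returns (ok, money) like getNumber.
    """
    # Same replacements as the snippet: drop spaces and newlines,
    # turn ',' into '.', and turn the letters O/o into the digit 0.
    body = (raw.strip()
               .replace(' ', '')
               .replace(',', '.')
               .replace('O', '0')
               .replace('o', '0')
               .replace('\n', ''))
    # Keep only digits and dots.
    money = ''.join(ch for ch in body if ch.isdigit() or ch == '.')
    try:
        float(money)  # validation only; the string itself is returned
    except ValueError:
        return False, money
    return True, money


if __name__ == '__main__':
    print(clean_amount(' 12,5O \n'))
    print(clean_amount('1,234.50'))
```

If amounts with thousands separators are expected, the comma handling would need to change, for example by removing commas instead of converting them.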



  • I forgot to upload the trained amount-recognition library, tesseract-ocr-setup-3.05.00dev.exe. Ask me if you need it.

