Ocr Captcha

Ocr Captcha, by xiaolv, is an image-to-text model with OCR capabilities. This page compares its OCR features, benchmarks, and capabilities with those of Interfaze.

Comparison

| Feature | Ocr Captcha | Interfaze |
| --- | --- | --- |
| Input Modalities | image | image, text, audio, video, document |
| Native OCR | Yes | Yes |
| Long Document Processing | No | Yes |
| Language Support | unknown | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | unknown | 1M |
| Tool Calling | No | Tool calling supported, plus built-in browser, code execution, and web search |

OCR Capabilities

| Feature | Ocr Captcha | Interfaze |
| --- | --- | --- |
| Text Bounding Boxes | No | Yes |
| Confidence Scores | No | Yes |
| Dense Image Processing | No | Yes |
| Low Quality Images | No | Yes |
| Handwritten Text | No | Yes |
| Charts, Tables & Equations | No | Yes |

Scaling

| Feature | Ocr Captcha | Interfaze |
| --- | --- | --- |
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |

View model card on Hugging Face

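Introduction

**ocr-captcha** is a model dedicated to recognizing common CAPTCHAs. Two trained models are available:

1. small: trained on 700 MB of data (about 84,000 CAPTCHA images) for 27 epochs, reaching a final accuracy of nearly 100%. This is the recommended model to download.

2. big: trained on 11 GB of data (about 1.35 million CAPTCHA images) for 1 epoch, reaching a final accuracy of about 93.95% (limited resources prevented longer training).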
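The model card does not say how the accuracy figures were measured. A minimal sketch, assuming accuracy means the fraction of images whose whole predicted string matches the label exactly (the function name `exact_match_accuracy` and the sample data are hypothetical):

```python
def exact_match_accuracy(predictions, labels):
    """Fraction of samples whose predicted text equals the label exactly.

    Assumption: a CAPTCHA counts as correct only if every character matches,
    including letter case. The model card does not state its metric.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical predictions and ground-truth labels, for illustration only.
preds = ["a1B2", "7788", "XyZq9", "00412"]
truth = ["a1B2", "7788", "XyZq8", "00412"]
print(exact_match_accuracy(preds, truth))  # 0.75
```

Under this definition a single wrong character makes the whole CAPTCHA count as a miss, which matches how a CAPTCHA is judged in practice.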

Data distribution

1. Type: (1) digits only; (2) digits and letters; (3) letters only (upper and lower case)

2. Length: 4, 5, or 6 characters
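The data distribution above suggests a simple sanity check on predictions: a well-formed result should be 4 to 6 ASCII letters or digits. A minimal sketch, assuming that constraint also holds for the images you feed the model (the helper names `looks_like_captcha` and `captcha_type` are hypothetical and not part of the model's code):

```python
import re

# 4 to 6 characters, each an ASCII digit or an upper/lower-case letter.
CAPTCHA_PATTERN = re.compile(r"[0-9A-Za-z]{4,6}")


def looks_like_captcha(text):
    """Return True if `text` fits the character set and lengths of the training data."""
    return CAPTCHA_PATTERN.fullmatch(text) is not None


def captcha_type(text):
    """Classify a well-formed string as 'digits', 'letters', or 'mixed'; else None."""
    if not looks_like_captcha(text):
        return None
    if text.isdigit():
        return "digits"
    if text.isalpha():
        return "letters"
    return "mixed"
```

A prediction that fails this check may be worth retrying with the other model or flagging for manual review.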

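Fine-tuning

1. Base model: the base model follows 读光-文字识别-行识别模型-中英-通用领域 (the Duguang text-recognition, line-recognition model for Chinese and English, general domain) released by DAMO Academy.

2. For fine-tuning details, see the link above.

Model demo link

modelscope: CAPTCHA recognition model (ocr-captcha)

Individual model links (modelscope)

1. CAPTCHA recognition model (small)

2. CAPTCHA recognition model (big)

Quickstart

The code provides a web version: myself_train_model.py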

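```python
import os

import gradio as gr  # imported by the original script; not used in this excerpt
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks


class xiaolv_ocr_model():

    def __init__(self):
        # Local directories holding the downloaded small and big models.
        model_small = r"./output_small"
        model_big = r"./output_big"
        self.ocr_recognition_small = pipeline(Tasks.ocr_recognition, model=model_small)
        self.ocr_recognition1_big = pipeline(Tasks.ocr_recognition, model=model_big)

    def run(self, pict_path, moshi="small", context=None):
        # Accept either a plain path string or an uploaded-file object exposing `.name`.
        is_upload = hasattr(pict_path, "name")
        pict_path = pict_path.name if is_upload else pict_path
        # As in the original, the context is reset on every call.
        context = [pict_path]

        # `moshi` (mode) selects which model to run.
        if moshi == "small":
            result = self.ocr_recognition_small(pict_path)
        else:
            result = self.ocr_recognition1_big(pict_path)

        context += [str(result['text'][0])]
        # Pair each image path with its recognized text.
        responses = [(u, b) for u, b in zip(context[::2], context[1::2])]
        print(f"Recognition result: {result}")
        if is_upload:
            os.remove(pict_path)  # delete only the temporary uploaded file
        return responses, context


if __name__ == "__main__":
    pict_path = r"C:\Users\admin\Desktop\图片识别测试\企业微信截图_16895911221007.png"
    ocr_model = xiaolv_ocr_model()
    # ocr_model.run(pict_path)
```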
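The `run` method builds `responses` by pairing alternating entries of `context`. A small standalone sketch of that pairing, using hypothetical file names and texts, shows the shape of the result. The helper name `pair_context` is an assumption for illustration, and the excerpt does not show how the web UI consumes these pairs:

```python
def pair_context(context):
    """Pair even-indexed entries (image paths) with odd-indexed entries (texts)."""
    return [(u, b) for u, b in zip(context[::2], context[1::2])]


# Hypothetical context: alternating image path and recognized text.
context = ["captcha_1.png", "a1B2", "captcha_2.png", "7788"]
print(pair_context(context))
# [('captcha_1.png', 'a1B2'), ('captcha_2.png', '7788')]

# A trailing unpaired entry is dropped, because zip stops at the shorter slice.
print(pair_context(["captcha_3.png"]))
# []
```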

Contact Us

To leave a message for our R&D and product teams, please contact us by email (2240560729@qq.com).
