Optical Character Recognition

Overview

Note: Model Asset eXchange is moving to Machine Learning Exchange (MLX) – a Linux Foundation AI (LFAI) project. Additional info can be found on the MLX GitHub page.

This repository contains code to instantiate and deploy an optical character recognition (OCR) model. The model takes an image of text as input and returns the predicted text. It was trained on 20 samples of each of 94 characters, in 8 different fonts and 4 attributes (regular, bold, italic, bold + italic), for a total of 60,160 training samples. For more detail on how the model was trained, see the paper An Overview of the Tesseract OCR Engine.
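
As a quick sanity check on the training set size quoted above, the figures multiply out as follows. This is only an illustrative calculation; the variable names are not taken from the repository.

```python
# Figures quoted in the overview; names are illustrative only.
SAMPLES_PER_CHARACTER = 20
CHARACTERS = 94
FONTS = 8
ATTRIBUTES = ("regular", "bold", "italic", "bold + italic")

# One training sample per (sample, character, font, attribute) combination.
total = SAMPLES_PER_CHARACTER * CHARACTERS * FONTS * len(ATTRIBUTES)
print(f"{total:,}")  # 60,160
```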

Model Metadata

Domain        | Application                   | Industry | Framework | Training Data        | Input Data Format
Image & Video | Optical Character Recognition | General  | n/a       | Tesseract Data Files | Image (PNG/JPG)

References

Licenses

Component              | License    | Link
This repository        | Apache 2.0 | LICENSE
Model Code (3rd party) | Apache 2.0 | Tesseract OCR Repository
Test Samples           | Apache 2.0 | Sample README

Options available for deploying this model

  • Deploy from Docker Hub:

    docker run -it -p 5000:5000 codait/max-ocr
    
  • Deploy on Red Hat OpenShift:

    Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify codait/max-ocr as the image name.

  • Deploy on Kubernetes:

    kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-OCR/master/max-ocr.yaml
    

    A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here.

  • Deploy locally: follow the instructions in the model README on GitHub.

Test the model using cURL

Once deployed, you can test the model from the command line. For example, if the model is running locally:

$ curl -F "image=@samples/quick_start_watson_studio.jpg" -XPOST http://localhost:5000/model/predict
{
  "status": "ok",
  "text": [
    [
      "Quick Start with Watson Studio"
    ],
    [
      "Watson Studio is IBM’s hosted notebook service, and you can create",
      "a free account at https://www.ibm.com/cloud/watson-studio. Other",
      "hosted notebook services can be used to run the noteooks as well,",
      "but Watson Studio offers all of the frameworks and languages that",
      "are used for this book’s examples. Once you have created an account",
      "and logged in, you can begin by creating a project and notebook."
    ]
  ]
}
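
The same request can also be made from Python. The following is a minimal sketch using only the standard library, assuming the model is running locally on port 5000 as in the cURL example. The helper names (build_multipart, predict, flatten_text) are hypothetical and not part of the repository, and the reading of "text" as a list of blocks, each holding a list of lines, is inferred from the sample response above rather than from a documented schema.

```python
from __future__ import annotations

import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path: str, url: str = "http://localhost:5000/model/predict") -> dict:
    """POST an image under the "image" form field and return the decoded JSON."""
    path = Path(image_path)
    body, content_type = build_multipart("image", path.name, path.read_bytes())
    req = request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def flatten_text(response: dict) -> str:
    """Join each block's lines with newlines; separate blocks with a blank line."""
    if response.get("status") != "ok":
        raise ValueError(f"unexpected status: {response.get('status')!r}")
    return "\n\n".join("\n".join(block) for block in response["text"])


if __name__ == "__main__":
    # Requires a running model server and the sample image from the repository.
    print(flatten_text(predict("samples/quick_start_watson_studio.jpg")))
```

flatten_text can be applied to any response of the shape shown above, so the parsing step can be tried without a running server.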

Resources and Contributions

If you are interested in contributing to the Model Asset Exchange project or have any queries, please follow the instructions here.
