Overview
Note: Model Asset eXchange is moving to Machine Learning Exchange (MLX) – a Linux Foundation AI (LFAI) project. Additional info can be found on the MLX GitHub page.
This model generates captions from a fixed vocabulary that describe the contents of images in the COCO Dataset. The model consists of an encoder model – a deep convolutional net using the Inception-v3 architecture trained on ImageNet-2012 data – and a decoder model – an LSTM network that is trained conditioned on the encoding from the image encoder model. The input to the model is an image, and the output is a sentence describing the image content.
The model is based on the Show and Tell Image Caption Generator Model.
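The encoder/decoder data flow described above can be illustrated with a toy sketch. Nothing here is the actual model: `encode_image`, `decoder_step`, the six-word `VOCAB`, and the scripted output are hypothetical stand-ins for the Inception-v3 encoder and the LSTM decoder. Greedy word selection is assumed only for simplicity and may differ from the decoding strategy the real model uses.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary; the real model uses a much larger fixed vocabulary.
VOCAB = ["<S>", "</S>", "a", "man", "riding", "wave"]
# Scripted output so the toy decoder is deterministic (an assumption for illustration).
SCRIPT = ["a", "man", "riding", "a", "wave", "</S>"]


def encode_image(pixels: Sequence[float]) -> List[float]:
    """Stand-in for the image encoder: map an image to a fixed-length vector."""
    return [sum(pixels) / len(pixels)]


def decoder_step(encoding: List[float], prev_tokens: List[str]) -> List[float]:
    """Stand-in for one decoder step: return one score per vocabulary entry.

    A real decoder would condition on `encoding` and on its recurrent state;
    this toy version simply follows SCRIPT.
    """
    step = len(prev_tokens) - 1  # words generated so far, excluding <S>
    target = SCRIPT[min(step, len(SCRIPT) - 1)]
    return [1.0 if word == target else 0.0 for word in VOCAB]


def generate_caption(pixels: Sequence[float], max_len: int = 20) -> str:
    """Encode the image once, then pick the best-scoring word at each step."""
    encoding = encode_image(pixels)
    tokens = ["<S>"]
    for _ in range(max_len):
        scores = decoder_step(encoding, tokens)
        word = VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]
        if word == "</S>":  # end-of-sentence token stops generation
            break
        tokens.append(word)
    return " ".join(tokens[1:])
```

The point of the sketch is the shape of the loop: the image is encoded once, and every generated word is drawn from the same fixed vocabulary until an end token or a length limit is reached.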
Model Metadata
Domain | Application | Industry | Framework | Training Data | Input Data Format |
---|---|---|---|---|---|
Vision | Image Caption Generator | General | TensorFlow | COCO | Images |
References
- O. Vinyals, A. Toshev, S. Bengio, D. Erhan, “Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
- im2txt TensorFlow Model GitHub Page
- COCO Dataset Project Page
Licenses
Component | License | Link |
---|---|---|
This repository | Apache 2.0 | LICENSE |
Model Weights | MIT | Pretrained Show and Tell Model |
Model Code (3rd party) | Apache 2.0 | im2txt |
Test assets | Various | Asset README |
Options available for deploying this model
This model can be deployed using the following mechanisms:
Deploy from Dockerhub:
docker run -it -p 5000:5000 codait/max-image-caption-generator
Deploy on Red Hat OpenShift:
Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify
codait/max-image-caption-generator
as the image name.
Deploy on Kubernetes:
kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Image-Caption-Generator/master/max-image-caption-generator.yaml
A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here.
Locally: follow the instructions in the model README on GitHub
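After starting the container, the service may need some time before it answers requests. The following is a minimal sketch of a readiness poll, assuming the service is reachable at the port used in the `docker run` example. The helper name `wait_until_serving` and its parameters are hypothetical. Any HTTP response, including an error status, is treated as a sign that the server is answering.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_until_serving(url: str = "http://127.0.0.1:5000/model/predict",
                       timeout: float = 120.0,
                       interval: float = 2.0) -> bool:
    """Poll `url` until the server returns any HTTP response or `timeout` expires.

    Returns True once a response arrives (even an error status such as 405),
    and False if nothing answered within the timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with an HTTP error status, so it is up.
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: not ready yet.
            time.sleep(interval)
    return False
```

A script could call `wait_until_serving()` before sending the first prediction request. It only confirms that something is answering HTTP at that address, not that the model has finished loading.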
Example Usage
You can test or use this model in any of the following ways:
Test the model using cURL
Once deployed, you can test the model from the command line. For example, if the model is running locally:
curl -F "image=@assets/surfing.jpg" -X POST http://127.0.0.1:5000/model/predict
{
"status": "ok",
"predictions": [
{
"index": "0",
"caption": "a man riding a wave on top of a surfboard .",
"probability": 0.038827644239537
},
{
"index": "1",
"caption": "a person riding a surf board on a wave",
"probability": 0.017933410519265
},
{
"index": "2",
"caption": "a man riding a wave on a surfboard in the ocean .",
"probability": 0.0056628732021868
}
]
}
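The same request can be made from Python. The sketch below assumes the third-party `requests` package is installed and that the model is running locally at the address used in the cURL example. The helper names `request_captions` and `best_caption` are illustrative and not part of the model's API. The response fields (`status`, `predictions`, `caption`, `probability`) are taken from the sample output above.

```python
DEFAULT_URL = "http://127.0.0.1:5000/model/predict"  # endpoint from the cURL example


def request_captions(image_path, url=DEFAULT_URL, timeout=30):
    """POST an image as multipart/form-data and return the decoded JSON body."""
    import requests  # third-party; assumed installed (pip install requests)

    with open(image_path, "rb") as image_file:
        # The form field is named "image", as in the cURL command.
        response = requests.post(url, files={"image": image_file}, timeout=timeout)
    response.raise_for_status()
    return response.json()


def best_caption(body):
    """Return the (caption, probability) pair with the highest probability."""
    if body.get("status") != "ok":
        raise ValueError(f"unexpected status: {body.get('status')!r}")
    predictions = body.get("predictions", [])
    if not predictions:
        raise ValueError("response contains no predictions")
    top = max(predictions, key=lambda p: p["probability"])
    return top["caption"], top["probability"]
```

For example, `caption, probability = best_caption(request_captions("assets/surfing.jpg"))` would select the highest-probability caption from a response shaped like the one above.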
Test the model in a Node-RED flow
Complete the node-red-contrib-model-asset-exchange module setup instructions and import the image-caption-generator getting started flow.
Test the model in CodePen
Learn how to send an image to the model and how to render the results in CodePen.
Test the model in a serverless app
You can utilize this model in a serverless application by following the instructions in the Leverage deep learning in IBM Cloud Functions tutorial.
Links
- Image Caption Generator Web App: A reference application created by the IBM CODAIT team that uses the Image Caption Generator
Resources and Contributions
If you are interested in contributing to the Model Asset Exchange project or have any queries, please follow the instructions here.