How to call OpenAI GPT-4o from LangChain to describe images

OpenAI's GPT-4o model is multimodal: it accepts images as well as text, so it can describe what an image shows. This short blog post shows how to call the model via LangChain.

Setting Up Your Environment

First, ensure you have the necessary libraries installed. You will need langchain and langchain-openai. Install them using pip if you haven't already:

pip install langchain langchain-openai
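
Before moving on, a quick preflight check can confirm the setup. The sketch below is only an illustration: the helper names missing_modules and has_api_key are made up for this post, and it assumes the key is supplied through the OPENAI_API_KEY environment variable, which ChatOpenAI reads when no key is passed explicitly.

```python
import importlib.util
import os

# Import names, not pip names: "langchain-openai" is imported as "langchain_openai".
REQUIRED_MODULES = ("langchain_core", "langchain_openai")


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_api_key(env=None):
    """Report whether OPENAI_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


# Report the state of the environment without raising.
print("Missing modules:", missing_modules() or "none")
print("OPENAI_API_KEY set:", has_api_key())
```

If a module is reported missing, rerun the pip command above. If the key is not set, either export it in your shell or pass it to ChatOpenAI directly as shown in step 2 below.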

Writing the Python Script

Here's a step-by-step guide to writing the script that uses GPT-4o to describe an image:

  1. Import the Libraries: Begin by importing the necessary modules from langchain_core and langchain_openai.

     from langchain_core.messages import HumanMessage
     from langchain_openai import ChatOpenAI
    
  2. Initialize the Model: Create an instance of the ChatOpenAI class with the gpt-4o model. Pass openai_api_key only if the OPENAI_API_KEY environment variable is not set. Other parameters, such as temperature or max_tokens, are optional.

     llm = ChatOpenAI(model="gpt-4o", openai_api_key=...)
    
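  3. Prepare the Request: Construct a HumanMessage whose content list contains a text part and an image_url part, then pass it to llm.invoke. The url should resolve to image data, either a direct link to a file such as a JPEG or PNG or a base64 data URL. The Unsplash address used here is a photo page, so if the API rejects it, substitute the direct link to the image file.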

     response = llm.invoke(
         [
             HumanMessage(
                 content=[
                     {"type": "text", "text": "What's in the image?"},
                     {
                         "type": "image_url",
                         "image_url": {
                             "url": "https://unsplash.com/photos/two-people-in-scuba-gear-swimming-in-the-ocean-SuGTwrtPCg4"
                         },
                     },
                 ]
             )
         ]
     )
    
  4. Output the Description: Print the model's response to see the image description.

     print(response.content)
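
The same content structure also works for an image stored on disk if the file is sent as a base64 data URL instead of a web link. The following is a minimal sketch under that assumption: the helper names image_to_data_url and image_content and the file name divers.jpg are hypothetical, and the MIME type is guessed from the file extension only.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    path = Path(path)
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path.name!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_content(prompt, image_url):
    """Build the content list from step 3 for a web URL or a data URL."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]


# Usage (hypothetical file name), with llm and HumanMessage from the steps above:
# content = image_content("What's in the image?", image_to_data_url("divers.jpg"))
# response = llm.invoke([HumanMessage(content=content)])
```

Large files inflate the request, since base64 adds roughly a third to the payload size, so resizing the image first may be worthwhile.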
    

Here's the complete script for clarity:

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

response = llm.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "What's in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://unsplash.com/photos/two-people-in-scuba-gear-swimming-in-the-ocean-SuGTwrtPCg4"
                    },
                },
            ]
        )
    ]
)

print(response.content)

# Example output (the wording varies between runs):
# The image depicts two people in scuba gear swimming underwater in the ocean. The divers are surrounded by clear blue water, giving a serene and adventurous feel to the scene. They appear to be exploring the underwater environment, likely observing marine life or the ocean floor. The image conveys a sense of exploration and the beauty of the underwater world.

It's that simple.

For other calls, you can refer to https://python.langchain.com/v0.1/docs/integrations/chat/openai/