OpenAI's GPT-4o model has multimodal capabilities, which enable it to process and describe images. This short blog post shows how to call the model via LangChain.
Setting Up Your Environment
First, ensure you have the necessary libraries installed. You will need langchain and langchain-openai. Install them using pip if you haven't already:
pip install langchain langchain-openai
Writing the Python Script
Here's a step-by-step guide to writing the script that uses GPT-4o to describe an image:
Import the Libraries: Begin by importing the necessary modules from langchain_core and langchain_openai.
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
Initialize the Model: Create an instance of the ChatOpenAI class with the gpt-4o model. Pass openai_api_key if the key is not already set in the OPENAI_API_KEY environment variable, along with any other parameters you need, such as temperature or max_tokens.
llm = ChatOpenAI(model="gpt-4o", openai_api_key=...)
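As a minimal sketch of the optional parameters mentioned in this step, the snippet below collects them in a dictionary and passes them to ChatOpenAI. The helper names chat_model_kwargs and make_llm, and the default values (temperature 0, 300 tokens), are illustrative assumptions and not part of LangChain.

```python
def chat_model_kwargs(temperature=0.0, max_tokens=300):
    """Collect keyword arguments for ChatOpenAI (hypothetical helper)."""
    return {
        "model": "gpt-4o",
        "temperature": temperature,  # lower values give more deterministic output
        "max_tokens": max_tokens,    # upper bound on the length of the reply
    }


def make_llm(**overrides):
    """Build the chat model (hypothetical helper); overrides replace the defaults."""
    # Imported lazily so chat_model_kwargs can be used without LangChain installed.
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(**{**chat_model_kwargs(), **overrides})
```

Calling make_llm(temperature=0.7) would then give a model with the same settings apart from the temperature.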
Prepare the Request: Construct a HumanMessage with a content list that includes both text and image_url types, then invoke the LLM with it.
response = llm.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "What's in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://unsplash.com/photos/two-people-in-scuba-gear-swimming-in-the-ocean-SuGTwrtPCg4"
                    },
                },
            ]
        )
    ]
)
Output the Description: Print the model's response to see the image description.
print(response.content)
Here's the complete script for clarity:
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

response = llm.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "What's in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://unsplash.com/photos/two-people-in-scuba-gear-swimming-in-the-ocean-SuGTwrtPCg4"
                    },
                },
            ]
        )
    ]
)

print(response.content)
# The image depicts two people in scuba gear swimming underwater in the ocean. The divers are surrounded by clear blue water, giving a serene and adventurous feel to the scene. They appear to be exploring the underwater environment, likely observing marine life or the ocean floor. The image conveys a sense of exploration and the beauty of the underwater world.
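The script above passes a web URL. For an image stored on disk, a common approach is to send it as a base64 data URL in the same image_url field. The following is a minimal sketch under that assumption. The function name image_to_data_url and the JPEG fallback are illustrative choices, not something from the LangChain API.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        mime_type = "image/jpeg"  # assumed fallback when the extension is unknown
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The result can replace the web URL in the message content, for example:
# {"type": "image_url", "image_url": {"url": image_to_data_url("divers.jpg")}}
```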
It's that simple.
For other calls, you can refer to https://python.langchain.com/v0.1/docs/integrations/chat/openai/