GPT-4o ("o" for "omni") is OpenAI's omni-modal model. A single model natively accepts any combination of text, audio, image, and video as input and generates text, audio, and image output. Its response latency on audio input is comparable to human response times in conversation, and it performs strongly on cross-modal understanding and emotionally expressive interaction. This makes it well suited as the core engine for real-time AI assistants and interactive applications.
All API requests must be authenticated with a Bearer token in the Authorization header. Make sure your API key is active.

Authorization: Bearer sk-xxxxxx
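The header can be attached with Python's standard library as shown in this minimal sketch. The endpoint URL `https://api.example.com/v1/generate` and the `API_KEY` environment variable are hypothetical placeholders, because this page states neither. The request is built but not sent, so you can inspect it before calling `urlopen`.

```python
import json
import os
import urllib.request
from typing import Optional

# Hypothetical endpoint: the real URL is not given on this page.
API_URL = "https://api.example.com/v1/generate"


def build_request(payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build an authenticated JSON POST request without sending it."""
    # Assumption: the key is read from a hypothetical API_KEY environment variable
    # when it is not passed explicitly.
    key = api_key or os.environ.get("API_KEY")
    if not key:
        raise ValueError("missing API key")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",  # Bearer token, as required above
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request({"prompt": "Describe this image."}, api_key="sk-xxxxxx")
print(req.get_header("Authorization"))
# To actually send it: urllib.request.urlopen(req)
```

Keeping the key out of source code (for example in an environment variable) avoids committing it to version control.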
The parameters below are defined by the selected model's form_schema.
system_prompt (string, optional): Global instructions or persona for the model.
prompt (string, required): The text prompt sent to the model.
image_urls (array, optional): URLs of input images; supports multi-image understanding.
image_detail (string, optional): Level of detail used when processing input images.
temperature (number, optional): Higher values make output more random; lower values make it more deterministic.
top_p (number, optional): Nucleus sampling threshold; an alternative to temperature.
presence_penalty (number, optional): Increases the model's tendency to talk about new topics.
frequency_penalty (number, optional): Reduces the likelihood of repeating the same text verbatim.
seed (number, optional): A fixed seed yields more reproducible results.
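max_completion_tokens (number, optional): Upper limit on the number of tokens generated in the response.
response_format (string, optional): Format of the model's output.
stream (boolean, optional): Whether to stream the response incrementally as it is generated.

Response fields:
created (integer)
data (array)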
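As a sketch, the request parameters listed here can be assembled into a JSON body as follows. This assumes the endpoint accepts a flat JSON object keyed by the parameter names; the page does not show the actual body format, and the helper `build_payload` is hypothetical. The fields `created` and `data` are left out. Unset optional parameters are omitted so that server-side defaults apply, and the image URLs are placeholders.

```python
import json
from typing import Any, Optional


def build_payload(
    prompt: str,
    *,
    system_prompt: Optional[str] = None,
    image_urls: Optional[list[str]] = None,
    image_detail: Optional[str] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    seed: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
    response_format: Optional[str] = None,
    stream: Optional[bool] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a flat JSON body from the documented parameters."""
    if not prompt:
        raise ValueError("prompt is required")  # the only required parameter
    optional = {
        "system_prompt": system_prompt,
        "image_urls": image_urls,
        "image_detail": image_detail,
        "temperature": temperature,
        "top_p": top_p,
        "presence_penalty": presence_penalty,
        "frequency_penalty": frequency_penalty,
        "seed": seed,
        "max_completion_tokens": max_completion_tokens,
        "response_format": response_format,
        "stream": stream,
    }
    payload: dict[str, Any] = {"prompt": prompt}
    # Omit unset optional parameters so the server's defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_payload(
    "What is shown in these images?",
    image_urls=["https://example.com/a.jpg", "https://example.com/b.jpg"],
    temperature=0.2,
    seed=42,
)
print(json.dumps(payload, indent=2))
```

Keyword-only optional arguments keep call sites readable and prevent values from being passed to the wrong parameter by position.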
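The following sketch shows what temperature and top_p do. It applies both to a toy list of next-token logits using their standard definitions: logits are divided by the temperature before the softmax, and nucleus (top-p) filtering then keeps the smallest set of most likely tokens whose combined probability reaches top_p. This is a generic illustration, not the provider's server-side implementation. The function name `adjust_distribution` is hypothetical, and unlike some APIs, this sketch rejects a temperature of 0.

```python
import math


def adjust_distribution(logits: list[float], temperature: float = 1.0, top_p: float = 1.0) -> list[float]:
    """Return next-token probabilities after temperature scaling and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature: scale logits, then softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: set[int] = set()
    mass = 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens; everything else gets probability 0.
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.5, -1.0]
# With top_p=0.8, only the two most likely tokens keep non-zero probability.
print(adjust_distribution(logits, temperature=1.0, top_p=0.8))
```

Lowering the temperature concentrates probability on the top token. Lowering top_p removes the low-probability tail entirely, which is why the two settings are usually adjusted one at a time.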