deepseek-v4-flash - Create Chat Completion

Create Chat Completion

Supported Model Code

deepseek-v4-flash

deepseek-v4-flash is an efficient, lightweight mixture-of-experts (MoE) model with 284 billion total parameters, of which 13 billion are activated per inference. It natively supports context windows of up to one million tokens and offers fast inference, low latency, and cost-effective invocation while maintaining well-balanced overall performance. It is designed for high-concurrency, lightweight workloads such as everyday dialogue, content creation, basic RAG applications, and batch text processing.

Create chat completion

Base URL: https://api.modelstream.ai

Endpoint: POST /v1/chat/completions

Authentication

BearerAuth

Authentication: Bearer <token>

All API requests must be authenticated using a Bearer token in the Authorization header. Please ensure your API key is active.

Authorization: Bearer sk-xxxxxx

Parameter Location: Header
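As a minimal sketch of the header rule above, the following Python helper assembles the two headers used by the request example on this page. The function name `build_headers` is hypothetical and not part of any SDK; the key `sk-xxxxxx` is the placeholder from this page, not a real credential.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the HTTP headers for an authenticated JSON request."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key must not be empty")
    return {
        # Bearer scheme, as required for every request to this API.
        "Authorization": f"Bearer {key}",
        # The request body is sent as application/json.
        "Content-Type": "application/json",
    }


headers = build_headers("sk-xxxxxx")
```

Stripping whitespace before building the header guards against keys pasted with a trailing newline, which is a common cause of authentication failures.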

Request Body

application/json

These parameters come from the selected model's form_schema. Switching models updates this list and the request example.

system_prompt (string, optional)

Global instructions or persona for the model.

Example Value: You are a helpful and expert assistant.
Placeholder: e.g., You are a senior software architect...

prompt (string, required)

Example Value: Hello! What can you do?
Placeholder: Enter your question or instructions...

temperature (number, optional)

Higher values make output more random, lower more deterministic.

Example Value: 1
Value Range: 0 ≤ value ≤ 2
Step: 0.1

top_p (number, optional)

Nucleus sampling threshold; an alternative to temperature.

Example Value: 1
Value Range: 0 ≤ value ≤ 1
Step: 0.05

presence_penalty (number, optional)

Increases the tendency to talk about new topics.

Example Value: 0
Value Range: -2 ≤ value ≤ 2
Step: 0.1

frequency_penalty (number, optional)

Reduces the likelihood of repeating the same text verbatim.

Example Value: 0
Value Range: -2 ≤ value ≤ 2
Step: 0.1

max_tokens (number, optional)

Example Value: 4096
Value Range: 1 ≤ value ≤ 65536

response_format (string, optional)

Example Value: text

Enum/Options:

Text: text
JSON Object: json_object

stream (boolean, optional)

Example Value: true
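The ranges listed above can be checked on the client before a request is sent. The sketch below is one way to do that in Python; `build_payload`, `RANGES`, and `RESPONSE_FORMATS` are hypothetical names introduced for illustration. It assumes that the `system_prompt` and `prompt` form fields map to a `system` and a `user` entry in `messages`, as the request example on this page suggests, and that `response_format` is sent as the plain string documented here. Neither assumption is confirmed by the page, so verify both against the live API.

```python
from typing import Any, Optional

# Inclusive ranges copied from the parameter list on this page.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "presence_penalty": (-2, 2),
    "frequency_penalty": (-2, 2),
    "max_tokens": (1, 65536),
}
RESPONSE_FORMATS = {"text", "json_object"}


def build_payload(
    prompt: str, system_prompt: Optional[str] = None, **options: Any
) -> dict[str, Any]:
    """Validate the documented options and return a request body dict."""
    if not prompt:
        raise ValueError("prompt is required")
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name == "response_format":
            if value not in RESPONSE_FORMATS:
                raise ValueError("response_format must be text or json_object")
        elif name == "stream":
            if not isinstance(value, bool):
                raise ValueError("stream must be a boolean")
        else:
            raise ValueError(f"unknown parameter: {name}")
    # Assumption: the form fields become chat messages, as in the curl example.
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return {"model": "deepseek-v4-flash", "messages": messages, **options}


payload = build_payload(
    "Hello! What can you do?",
    system_prompt="You are a helpful and expert assistant.",
    temperature=0.7,
    max_tokens=4096,
)
```

Rejecting unknown option names early catches typos such as `max_token` that a server might otherwise ignore silently.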

Response Parameters

application/json

200 Success

id (string, optional)

object (string, optional)

created (integer, optional)

model (string, optional)

Model ID used

choices (array, optional)

usage (object, optional)

  prompt_tokens (integer, optional)

  completion_tokens (integer, optional)

  total_tokens (integer, optional)

  prompt_tokens_details (object, optional)

  completion_tokens_details (object, optional)

system_fingerprint (string, optional)

400 Bad Request (invalid parameters)

error (object, optional)

  message (string, optional)

  Error Message

  type (string, optional)

  Error Type

  param (string, optional)

  Related Parameters

  code (string, optional)

  Error Code

429 Rate Limited

error (object, optional)

  message (string, optional)

  Error Message

  type (string, optional)

  Error Type

  param (string, optional)

  Related Parameters

  code (string, optional)

  Error Code
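The 400 and 429 responses share the same `error` object, so one parser can serve both. The Python sketch below reads that object defensively, since every field is marked optional. `ApiError`, `parse_error`, and `backoff_delays` are hypothetical names. Treating 429 as retryable with exponential backoff is an assumption about sensible client behaviour, not a policy stated on this page, and the error values in the usage line are invented for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    status: int
    message: Optional[str]
    type: Optional[str]
    param: Optional[str]
    code: Optional[str]

    @property
    def retryable(self) -> bool:
        # Assumption: only rate-limit responses are worth retrying.
        return self.status == 429


def parse_error(status: int, body: str) -> ApiError:
    """Extract the documented error fields, tolerating missing or bad JSON."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = None
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        error = {}
    return ApiError(
        status,
        error.get("message"),
        error.get("type"),
        error.get("param"),
        error.get("code"),
    )


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2**i) for i in range(attempts)]


# Illustrative body only; real messages and codes come from the server.
err = parse_error(429, '{"error": {"message": "slow down", "code": "rate_limited"}}')
```

A caller would sleep for each value in `backoff_delays(...)` between attempts while `err.retryable` is true, and surface `err.message` and `err.param` directly for a 400.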

curl -X POST "https://api.modelstream.ai/v1/chat/completions" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "deepseek-v4-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior software architect."
    },
    {
      "role": "user",
      "content": "Explain the trade-offs between microservices and a monolith."
    }
  ],
  "temperature": 0.7,
  "top_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "max_tokens": 4096,
  "stream": true
}'
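The request above sets `"stream": true`, but this page does not describe the streamed wire format. The Python sketch below assumes the widely used server-sent events convention: each event arrives as a `data: <json>` line, incremental text sits under `choices[].delta.content`, and the stream ends with `data: [DONE]`. All of that is an assumption to confirm against the live API. `iter_stream_text` is a hypothetical helper and `sample` is invented input.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines (assumed OpenAI-style chunks)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Invented sample lines standing in for a streamed HTTP response body.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_stream_text(sample))
```

Because the function accepts any iterable of strings, it can be fed decoded lines from whichever HTTP client is in use.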

{
  "id": "string",
  "object": "chat.completion",
  "created": 0,
  "model": "string",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "system",
        "content": null,
        "name": "string",
        "tool_calls": [
          {
            "id": "string",
            "type": "function",
            "function": {
              "name": "string",
              "arguments": "string"
            }
          }
        ],
        "tool_call_id": "string",
        "reasoning_content": "string"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "text_tokens": 0,
      "audio_tokens": 0,
      "image_tokens": 0
    },
    "completion_tokens_details": {
      "text_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0
    }
  },
  "system_fingerprint": "string"
}
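Since every field in the 200 response is marked optional, client code is safer reading it defensively. The Python sketch below pulls out the fields most callers need from a non-streaming response shaped like the example above. `summarize_completion` is a hypothetical helper, and the values in `body` are invented for illustration.

```python
import json
from typing import Any


def summarize_completion(body: str) -> dict[str, Any]:
    """Extract the reply text and token counts from a chat.completion body."""
    data = json.loads(body)
    choices = data.get("choices") or []
    first = choices[0] if choices else {}
    message = first.get("message") or {}
    usage = data.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return {
        "content": message.get("content"),
        "reasoning_content": message.get("reasoning_content"),
        "finish_reason": first.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "cached_tokens": details.get("cached_tokens", 0),
    }


# Invented response body following the schema shown on this page.
body = json.dumps({
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hi there."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
})
summary = summarize_completion(body)
```

Missing objects fall back to empty dicts, so a response without `usage` or `choices` yields `None` or zero values instead of raising a `KeyError`.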
