Streaming vs Blocking Calls: How to Choose and Implement

AI APIs generally support two response modes: blocking (non-streaming) and streaming. This guide covers the pros and cons of each and shows how to implement them in code.

Blocking Calls

The client sends a request, waits for the server to finish generating, and receives the complete result in a single response. Pros: simple to implement and well suited to batch processing. Cons: high perceived latency, because the user sees nothing until generation is complete.
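
For comparison with the streaming examples, here is a minimal sketch of a blocking call. It assumes a client object shaped like the OpenAI Python SDK client; the helper name ask_blocking is hypothetical and not part of any library.

```python
def ask_blocking(client, model: str, prompt: str) -> str:
    """Send one request and return the full reply once generation finishes.

    `client` is assumed to expose chat.completions.create() the way the
    OpenAI Python SDK client does; ask_blocking itself is a hypothetical helper.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        # stream is left unset, so the call returns only when the reply is complete
    )
    # The whole answer arrives in one object; content may be None, so fall back to "".
    return response.choices[0].message.content or ""
```

Calling ask_blocking(client, "qwen3.6-plus", "Explain microservices") would return the whole answer as one string, which is convenient when the result goes straight into a file or a queue rather than onto a screen.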

Streaming Output

The server pushes content to the client in chunks as it is generated, using Server-Sent Events (SSE), and the frontend renders the text incrementally. Pros: better UX (typewriter effect) and lower perceived latency, since the first words appear before generation finishes. Cons: more complex to implement, because the client has to parse the stream.
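
To make "stream parsing" concrete, the sketch below shows roughly what an SDK does under the hood with an SSE response body. It assumes the OpenAI-style wire format: one JSON object per "data:" line, with text under choices[0].delta.content, and a final "data: [DONE]" sentinel. Whether a given provider follows this exactly is an assumption, and the function name iter_sse_deltas is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the lines of an SSE response body.

    Assumes each event is a single "data: <json>" line (no multi-line data fields).
    """
    for raw in lines:
        line = raw.strip()
        # Skip blank event separators, comments (": ..."), and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        event = json.loads(payload)
        choices = event.get("choices") or []
        if not choices:  # e.g. a chunk that carries no choices
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

In practice the official SDKs handle this parsing, so a hand-written parser like this is mainly useful when calling the HTTP endpoint directly.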

Python Streaming Example

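from openai import OpenAI

# Replace "key" with your real API key.
client = OpenAI(api_key="key", base_url="https://chinawhapi.com/v1")

# stream=True returns an iterator of chunks instead of one complete response.
stream = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[{"role": "user", "content": "Explain microservices"}],
    stream=True,
)

for chunk in stream:
    # Skip chunks that have no choices or no text (e.g. role-only chunks).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # end the output with a newline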
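
Applications often need the complete reply as well as the live output, for example to store it in chat history. The following is a minimal sketch of a helper that forwards each fragment to a callback, collects the full text, and records the time to the first fragment as a rough measure of perceived latency. The name consume_stream is hypothetical, and the chunk shape (choices[0].delta.content) is assumed to match the loop above.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def consume_stream(
    stream: Iterable,
    on_delta: Callable[[str], None],
) -> Tuple[str, Optional[float]]:
    """Forward each text delta to on_delta; return (full_text, seconds_to_first_delta)."""
    started = time.perf_counter()
    first_delta_at: Optional[float] = None
    parts = []
    for chunk in stream:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        text = chunk.choices[0].delta.content
        if not text:  # role-only or empty deltas
            continue
        if first_delta_at is None:
            first_delta_at = time.perf_counter() - started
        on_delta(text)      # e.g. print to a terminal or push to a websocket
        parts.append(text)  # keep the fragment for the final result
    return "".join(parts), first_delta_at
```

With the stream object from the example above, consume_stream(stream, lambda t: print(t, end="", flush=True)) would print the reply as it arrives and return the full text together with the time to the first fragment (None if no text arrived).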

Node.js Streaming Example

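// Run as an ES module so that import and top-level await work.
import OpenAI from "openai";

// Replace "key" with your real API key.
const client = new OpenAI({ apiKey: "key", baseURL: "https://chinawhapi.com/v1" });

const stream = await client.chat.completions.create({
  model: "qwen3.6-plus",
  messages: [{ role: "user", content: "Explain microservices" }],
  stream: true,
});

// Write each text fragment as it arrives; chunks without text produce no output.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}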

Use Case Guide

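Use streaming for chat interfaces, real-time assistants, and code completion, where users watch the output as it arrives. Use blocking calls for batch content generation, report exports, and asynchronous processing, where only the finished result matters.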