Multi-turn Conversations: Managing History and Context Windows
Each AI API call is independent, so the client is responsible for maintaining conversation history. This article covers how to implement multi-turn conversations and how to manage the context window.
Multi-turn Conversation Principles
Chat APIs such as /v1/chat/completions are stateless: the server does not remember earlier calls. On every call the client must pass the full conversation history (all user and assistant messages so far) as the messages array.
Implementation
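# Assumption: the OpenAI Python SDK (pip install openai) pointed at an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI()  # reads the API key from OPENAI_API_KEY; pass base_url for other providers
messages = []  # full conversation history, resent on every call

def chat(user_input):
    # Record the user turn before calling the API
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="qwen3.6-plus",
        messages=messages,
    )
    assistant_msg = response.choices[0].message.content
    # Store the reply so the next call includes it
    messages.append({"role": "assistant", "content": assistant_msg})
    return assistant_msg

Context Window Management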
Each model has a context window limit (e.g., 128K or 256K tokens). Exceeding it causes an error or truncation, depending on the API. Long conversations therefore need history compression: keep only the most recent N turns, or have a model summarize the older ones.
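The "keep the most recent N turns" option might look like the sketch below. The function name keep_recent_turns and the choice to always preserve system messages are assumptions made for illustration, not something the article prescribes.

```python
from typing import Dict, List

Message = Dict[str, str]

def keep_recent_turns(messages: List[Message], max_turns: int) -> List[Message]:
    """Return system messages plus the last `max_turns` user/assistant turns.

    Hypothetical helper: one turn is assumed to be a user message
    followed by an assistant message.
    """
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    # Guard max_turns == 0: a slice starting at -0 would return everything.
    recent = dialogue[-2 * max_turns:] if max_turns > 0 else []

    # Make sure the window starts on a user message, not a dangling reply.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]

    return system + recent
```

Calling messages = keep_recent_turns(messages, 10) before each request would cap the history at roughly ten turns, at the cost of the model forgetting anything older.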
Compression Strategies
There are three common strategies. Keep only the most recent 10-20 turns verbatim. Have a model summarize the older history, then replace the original messages with that summary. Or simply drop the oldest messages, which suits scenarios that don't rely heavily on history.
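The summarize-and-replace strategy can be sketched as follows. The name compress_with_summary, the keep_recent parameter, and the summarize callable are all hypothetical. In practice summarize would wrap a model call, but it is passed in here so the sketch runs without network access. Storing the summary as a system message is also an assumption, since some APIs expect a single system message at the start.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def compress_with_summary(
    messages: List[Message],
    summarize: Callable[[str], str],
    keep_recent: int = 4,
) -> List[Message]:
    """Replace all but the last `keep_recent` dialogue messages with a summary."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    # Nothing to compress if the dialogue already fits in the recent window.
    if len(dialogue) <= keep_recent:
        return list(messages)

    split = len(dialogue) - keep_recent
    old, recent = dialogue[:split], dialogue[split:]

    # Flatten the older messages into a transcript for the summarizer.
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
    summary_msg = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(transcript),
    }
    return system + [summary_msg] + recent
```

A summarize implementation could send the transcript to a cheaper model with an instruction such as "Summarize this conversation in a few sentences". The trade-off is one extra API call per compression and some loss of detail.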
Token Counting
Estimate the total token count before submitting to the API, and trigger compression if it would exceed the context limit. For accurate counts, use tiktoken or whichever tokenizer matches your model; a mismatched tokenizer gives only an approximation.
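A pre-flight check along these lines might look like the sketch below. The 4-characters-per-token heuristic, the per-message overhead of 4, and the reserve_for_reply default are rough assumptions, and every function name is hypothetical. The optional tiktoken_counter uses tiktoken's get_encoding and encode calls; the cl100k_base encoding is only an example and may not match your model's tokenizer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
PER_MESSAGE_OVERHEAD = 4  # assumed allowance for role and separator tokens

def approx_token_count(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)

def tiktoken_counter(encoding_name: str = "cl100k_base") -> Callable[[str], int]:
    """Build a counter backed by tiktoken (requires `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the heuristic path has no dependency
    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))

def count_history_tokens(
    messages: List[Message],
    count: Callable[[str], int] = approx_token_count,
) -> int:
    """Sum the estimated tokens across every message in the history."""
    return sum(count(m["content"]) + PER_MESSAGE_OVERHEAD for m in messages)

def needs_compression(
    messages: List[Message],
    context_limit: int,
    reserve_for_reply: int = 1024,
    count: Callable[[str], int] = approx_token_count,
) -> bool:
    """True if history plus the space reserved for the reply exceeds the limit."""
    return count_history_tokens(messages, count) + reserve_for_reply > context_limit
```

Before each request, a check such as needs_compression(messages, context_limit=128_000) could decide whether to trim or summarize first. Passing count=tiktoken_counter() swaps the heuristic for real tokenization.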