Multi-turn Conversations: Managing History and Context Windows

Chat-completion APIs are stateless: each call is independent, so the client is responsible for maintaining the conversation history. This article covers how to implement multi-turn conversations and how to manage the context window.

Multi-turn Conversation Principles

Endpoints such as /v1/chat/completions keep no state between calls. On every call the client must send the conversation so far (any system message plus all prior user and assistant messages, in order) as the messages array.

Implementation

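from openai import OpenAI

# Assumption: an OpenAI-compatible SDK client; set base_url and api_key for your provider.
client = OpenAI()

# Full conversation history, resent on every call.
messages = []

def chat(user_input):
    # Record the user's turn before calling the API.
    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="qwen3.6-plus",
        messages=messages,
    )

    # Store the reply so the next call includes it.
    assistant_msg = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_msg})

    return assistant_msg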

Context Window Management

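Each model has a context window limit, measured in tokens (e.g., 128K or 256K). A request that exceeds it is rejected with an error or has its input truncated, depending on the provider. Because the history grows with every turn, long conversations need history compression: keep only the most recent N turns, or have a model summarize the older ones.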
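Keeping the most recent N turns can be sketched as a small helper. This is a minimal illustration, not a definitive implementation: the function name keep_recent_turns and the default of 10 turns are assumptions, and it treats one turn as a user message followed by an assistant reply. System messages are always kept, since they usually carry instructions the model still needs.

```python
def keep_recent_turns(messages, max_turns=10):
    """Return system messages plus the last `max_turns` user/assistant exchanges.

    Hypothetical helper for illustration; `messages` is a list of
    {"role": ..., "content": ...} dicts as used by chat-completion APIs.
    """
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    # One turn = user message + assistant reply, so keep 2 * max_turns messages.
    # The guard matters because dialogue[-0:] would return the whole list.
    recent = dialogue[-2 * max_turns:] if max_turns > 0 else []

    # Drop a leading assistant reply whose user message was cut off.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]

    return system + recent
```

Calling keep_recent_turns(messages, max_turns=10) just before each request bounds the history length while leaving the stored list untouched.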

Compression Strategies

Three strategies are common. Keep only the core content of the most recent 10-20 turns. Have a model summarize the older history and replace the original messages with that summary. Or simply truncate the excess, which suits scenarios that don't rely heavily on history.
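The summarize-and-replace strategy might look like the sketch below. The names compress_history and summarize are hypothetical: summarize stands for any callable that takes a transcript string and returns a summary, which in practice would probably be another model call. Placing the summary in a system message is one possible choice, not a requirement of any API.

```python
from typing import Callable

def compress_history(
    messages: list[dict],
    summarize: Callable[[str], str],
    keep_last: int = 4,
) -> list[dict]:
    """Replace all but the last `keep_last` messages with a single summary message.

    `summarize` is a caller-supplied function (assumption: typically an LLM call).
    """
    if len(messages) <= keep_last:
        return list(messages)  # nothing old enough to compress

    split = len(messages) - keep_last
    older, recent = messages[:split], messages[split:]

    # Flatten the older messages into a plain transcript for the summarizer.
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in older)
    summary_msg = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(transcript),
    }
    return [summary_msg] + recent
```

The trade-off against plain truncation is one extra model call per compression in exchange for retaining facts from earlier turns.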

Token Counting

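Estimate the total token count of the messages array before each request, and trigger compression when it approaches the context limit, leaving room for the model's reply. Use tiktoken or an equivalent library for counting; the count is accurate only when the tokenizer matches the model being called, and is an approximation otherwise.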
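A pre-flight check along these lines could be sketched as follows. Several things here are assumptions rather than facts about any particular provider: the function names, the cl100k_base encoding (an OpenAI encoding, so only approximate for other model families such as Qwen), the 4-token per-message overhead, and the rough 4-characters-per-token fallback used when tiktoken is unavailable.

```python
def estimate_tokens(messages, encoding_name="cl100k_base"):
    """Roughly estimate how many tokens a messages array will consume."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding(encoding_name)
        # disallowed_special=() lets text containing special-token strings be encoded.
        count = lambda text: len(enc.encode(text, disallowed_special=()))
    except Exception:
        # Fallback heuristic (assumption): about 4 characters per token.
        count = lambda text: max(1, len(text) // 4)

    # Assumption: about 4 extra tokens per message for role and formatting.
    return sum(count(m["content"]) + 4 for m in messages)


def needs_compression(messages, context_limit, reserve_for_reply=1024):
    """True when the history plus a reserved reply budget would exceed the limit."""
    return estimate_tokens(messages) + reserve_for_reply > context_limit
```

Checking needs_compression(messages, context_limit=128_000) before each request gives a natural place to invoke whichever compression strategy is in use.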