China AI API FAQ

These questions will be expanded into dedicated answer pages to help users and AI search engines understand ChinaWHAPI's models, interfaces, and error handling.

Does DeepSeek API charge fees?

DeepSeek API charges per token, with separate rates for input and output. When using ChinaWHAPI, users can view real-time pricing, balance, and daily usage estimates for all models in the unified console.

Do Chinese models support OpenAI SDK?

ChinaWHAPI is fully compatible with the OpenAI SDK protocol. Change the baseURL to https://chinawhapi.com/v1 and supply your ChinaWHAPI API Key, and you can call the Chinese models it offers with the openai Python/JS SDK without any other code changes.

How to use Chinese models in Cursor?

In Cursor settings, find AI Provider, select Custom/OpenAI Compatible, fill in ChinaWHAPI's base URL (https://chinawhapi.com/v1) and your API Key. You can now use Chinese models like DeepSeek and Qwen as alternatives to GPT-4.

Why does API return 401?

401 indicates Unauthorized, usually caused by an invalid, deleted, or expired API Key, or by an incorrectly formatted Authorization header (the value should be Bearer xxx). Check the key status in the console and the request header format.

What does API 403 mean?

403 indicates Forbidden, usually due to insufficient account balance, expired subscription, or attempting to access models not enabled. Log in to console to check subscription status and balance.

What to do when hitting 429 rate limit?

429 indicates rate limit triggered. Recommendations: add exponential backoff retry (wait 2s, then 4s, then 8s); reduce concurrency; consider upgrading subscription for higher rate limits.

How to troubleshoot 400 Bad Request?

400 usually means request body format error. Common causes: incorrect JSON format, non-existent model name, missing messages field, incorrect messages format.

API returns 500 Internal Server Error

500 usually indicates a temporary exception in the upstream model service, not a problem in your code. Wait 5-10 seconds and retry. If it persists, check the ChinaWHAPI status page or contact support.

What is a Token? How to calculate?

A token is the smallest unit of text that a language model processes. English: approximately 4 characters = 1 token; Chinese: approximately 1-2 characters = 1 token. The ChinaWHAPI console shows actual token consumption.

What is Context Window?

The context window is the maximum number of tokens a model can process in a single call (input and output combined). Requests that exceed it are truncated or rejected with an error. ChinaWHAPI models range from 32K to 1M tokens.

What are reasoning models?

Reasoning models (like DeepSeek R1, ERNIE X1.1) have built-in Chain-of-Thought reasoning. They suit math, code analysis, and complex logic tasks, but respond more slowly and cost more.

What is RAG?

RAG (Retrieval-Augmented Generation) retrieves relevant information from knowledge base first, then combines the information with the original question for the LLM to generate answers. Suitable for enterprise knowledge bases and scenarios requiring source citations.

How to get ChinaWHAPI API Key?

Log in to https://chinawhapi.com/console, click to generate new key on API Keys page. Recommend creating separate keys for development, testing, and production environments for easier management and permission isolation.

What is ChinaWHAPI base URL?

ChinaWHAPI's OpenAI-compatible endpoint address is https://chinawhapi.com/v1. Set the base_url parameter in OpenAI SDK to this value.

Which model is best for code generation?

Code task recommendations: Qwen3 Coder Plus (daily code, completion, Bugfix), DeepSeek V4 Pro (complex architecture code, algorithms), Doubao Seed Code (frontend development, Bugfix).

Which model is strongest for Chinese tasks?

Chinese general tasks: Qwen3.6 Max Preview; Chinese reasoning: DeepSeek R1; Chinese long documents: Kimi K2.6. Qwen3.6 Plus offers best cost-performance for mainstream tasks.

Which model for math and reasoning tasks?

First choice for reasoning: DeepSeek R1 (pure reasoning model with strongest Chain-of-Thought); secondary: ERNIE X1.1 or Doubao Seed 1.6 Thinking. Regular math: Qwen3.6 Plus suffices.

Which model for fast response?

Speed-focused recommendations: Doubao Seed 1.6 Flash (fastest), Qwen3.5 Flash, Hunyuan TurboS Latest. Suitable for real-time customer service, high concurrency, and lightweight tasks.

Which model has the lowest cost?

Doubao Seed 1.6 Flash has lowest unit price, suitable for high-concurrency lightweight tasks; Qwen3.5 Flash is second, with good quality and speed, suitable for daily business use.

Which model for long document processing?

First choice for long documents: Kimi K2.6 (256K context); second: Kimi K2.5 (256K); DeepSeek V4 series also supports 1M ultra-long context, suitable for books, contracts, papers.

Which model for image understanding?

Vision model recommendations: Qwen3 VL Plus (strongest Chinese image understanding), GLM-5V Turbo (chart analysis), Hunyuan Vision 1.5 (Tencent ecosystem integration).

Which model for building AI Agent?

Agent planning module: DeepSeek R1 (strong reasoning); tool calling: DeepSeek V4 series or Qwen3.6 Plus (good Function Calling support); memory module: Qwen3.5 Flash (low cost).

Can I switch between models?

Yes. ChinaWHAPI maintains unified request format. Simply modify the model field in requests to switch between models. Perfect for A/B testing and fallback strategies.

What's the difference between DeepSeek R1 and V3?

R1 is a reasoning model (thinking chain), suitable for math, code analysis, complex reasoning, slower response but stronger reasoning; V3 is a general model, suitable for daily conversation and content generation, faster response.

Which has stronger code capability, Qwen Coder or DeepSeek?

Qwen3 Coder Plus is a code-specialized model, more friendly for Chinese-commented code and simple Bugfix; DeepSeek V4 Pro is stronger for complex system code and architecture design. Both can be used together.

What are Kimi's advantages over other models?

Kimi's core advantage is its ultra-long context (256K), which suits long documents, contracts, and papers: entire documents can be processed directly, without RAG-style splitting.

How to call ChinaWHAPI with Python?

Use OpenAI Python SDK: from openai import OpenAI; client = OpenAI(api_key='key', base_url='https://chinawhapi.com/v1'); then call client.chat.completions.create().

How to call ChinaWHAPI with Node.js?

Use openai npm package: new OpenAI({ apiKey: 'key', baseURL: 'https://chinawhapi.com/v1' }); supports both ESM and CommonJS imports.

How to use ChinaWHAPI in LangChain?

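In recent LangChain releases, ChatOpenAI is provided by the langchain-openai package: from langchain_openai import ChatOpenAI; llm = ChatOpenAI(model='qwen3.6-plus', api_key='key', base_url='https://chinawhapi.com/v1'). Older releases import it from langchain.chat_models and use openai_api_key and openai_api_base instead.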

How to use ChinaWHAPI in LlamaIndex?

from llama_index.llms.openai_like import OpenAILike; llm = OpenAILike(model='qwen3.6-plus', api_key='key', api_base='https://chinawhapi.com/v1')

How to integrate ChinaWHAPI in Dify?

In Dify's model provider settings, select OpenAI compatible, fill in https://chinawhapi.com/v1 and API Key. All ChinaWHAPI models are now available.

How to configure ChinaWHAPI in Cursor IDE?

Open Cursor Settings → AI Provider → select Custom, fill in Base URL: https://chinawhapi.com/v1, API Key: your key, select default model.

How to integrate ChinaWHAPI in Jan (local AI app)?

Jan supports OpenAI-compatible interface. In Settings → Models, add ChinaWHAPI base URL and API Key. Ready to use.

How to use ChinaWHAPI in Cherry Studio?

Add ChinaWHAPI in Cherry Studio model settings, select OpenAI Compatible mode, fill in base URL and API Key.

How to test ChinaWHAPI in Postman?

Create new POST request, URL: https://chinawhapi.com/v1/chat/completions, Headers: Authorization: Bearer {key}, Content-Type: application/json, Body: raw JSON format.
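
The same Postman settings map onto a plain HTTP request. Below is a minimal sketch using only Python's standard library. The helper names build_chat_request and send are illustrative, not part of any SDK, and the request body is assumed to follow the OpenAI Chat Completions shape that this FAQ says ChinaWHAPI is compatible with.

```python
import json
import urllib.request

BASE_URL = "https://chinawhapi.com/v1"  # base URL stated in this FAQ


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request to /chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before spending any tokens.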

Does ChinaWHAPI support REST API?

Yes. ChinaWHAPI's /v1/chat/completions is a RESTful API, supporting JSON requests and responses, conforming to standard HTTP specifications.

Does ChinaWHAPI support Webhook?

ChinaWHAPI backend supports webhook callbacks for payment notifications etc. For AI API itself, streaming mode uses Server-Sent Events (SSE) for real-time push.

Is Streaming output supported?

Yes. Set stream: true in request, server pushes content chunks in real-time via SSE, enabling typewriter effect in frontend.

What is the price for each model?

Prices vary by model: Doubao Seed 1.6 Flash lowest ($0.05/1K input tokens), DeepSeek V4 Pro higher ($0.55/1K input tokens). Exact prices available in ChinaWHAPI console.

Is billing based on input or output?

Input and output tokens are billed separately. Output is usually 4-5x more expensive than input because generating output requires more computing resources.

How to estimate cost per API call?

Cost = (input tokens × input price + output tokens × output price) / 1000, where both prices are quoted per 1K tokens. Console usage statistics show real-time cost breakdown.
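
The formula can be written as a small helper. This is a sketch that assumes prices are quoted per 1K tokens; the function name estimate_cost and the prices in the example comment are hypothetical, so take real prices from the console.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate the cost of one call from token counts and per-1K-token prices."""
    return (input_tokens * input_price_per_1k + output_tokens * output_price_per_1k) / 1000


# Example with made-up prices: 1,000 input tokens at 0.05 per 1K
# plus 500 output tokens at 0.20 per 1K.
example_cost = estimate_cost(1000, 500, 0.05, 0.20)
```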

How to set budget alerts?

Configure daily cost alert threshold in console usage settings. You'll receive notification when daily cost exceeds threshold, helping avoid unexpected overspending.

Is there a free trial?

New users typically receive initial credits after registration. Check account page in console for credit amount. Recharge required when credits depleted.

How to recharge account?

Log in to ChinaWHAPI console, go to recharge page. Supports USDT TRC20, Stripe credit card and other payment methods. Instant credit.

Is there a monthly subscription?

ChinaWHAPI offers subscription plans with fixed API call quotas. Suitable for users with stable usage. Check subscription page in console for details.

How to reduce API call costs?

Cost optimization strategies: 1) Use low-cost models for simple tasks (Qwen3.5 Flash); 2) Streamline prompts to reduce input tokens; 3) Implement semantic caching to avoid duplicate requests; 4) Set usage alerts for anomaly monitoring.

How to use caching to reduce API calls?

Embed user queries and store in vector database. Same-intent queries return cached results without calling model. Cache hit rate typically 40-60%, saving significant costs.
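
A minimal in-memory sketch of the idea, using cosine similarity to decide whether two queries share the same intent. The embed function is supplied by the caller (for example, a wrapper around an embedding model), the 0.9 threshold is an assumed starting point to tune, and a real deployment would replace the linear scan with a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # caller-supplied: text -> list of floats
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (vector, answer) pairs

    def get(self, query):
        """Return the best cached answer at or above the threshold, else None."""
        vector = self.embed(query)
        best_answer, best_score = None, self.threshold
        for stored_vector, answer in self.entries:
            score = cosine_similarity(vector, stored_vector)
            if score >= best_score:
                best_answer, best_score = answer, score
        return best_answer

    def put(self, query, answer):
        """Store the answer under the query's embedding."""
        self.entries.append((self.embed(query), answer))
```

On a miss (get returns None), call the model as usual and store the reply with put.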

How to maintain context in multi-turn conversations?

Pass complete conversation history (all user/assistant messages) as messages in each call. Be aware of context window limits; long conversations need compression or truncation.
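
One simple truncation approach is to keep the system prompt and drop the oldest turns until the history fits a token budget. The sketch below uses a rough 4-characters-per-token estimate for English text, so treat the counts as approximate. The names estimate_tokens and trim_history are illustrative.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens, count_tokens=estimate_tokens):
    """Keep system messages plus the most recent turns that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))  # restore chronological order
```

Pass the trimmed list as messages on each call. Summarizing the dropped turns into a single message is an alternative when older context still matters.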

What is Prompt Injection? How to prevent?

Prompt injection is when users try to inject malicious instructions in input (e.g. 'Ignore previous instructions...'). Prevention: clearly define model behavior boundaries in system prompt; filter content before forwarding; use backend validation for critical scenarios.

How to set Temperature parameter?

Temperature controls randomness: 0.1-0.3 (accurate tasks: Q&A, code, summarization), 0.5-0.7 (balanced: writing, conversation), 0.8-1.0 (creative tasks: poetry, stories). Use low temperature for deterministic output.

What is max_tokens parameter for?

max_tokens limits maximum output tokens per call. Proper limits: 1) prevent waste from overly long output; 2) control response time; 3) ensure output fits your display scenario.

What is System Prompt? How to use?

System Prompt is system-level instruction defining model behavior role and output rules. Example: 'You are a professional technical support engineer, only answer technical questions.' Place at first position in messages array.

How to get JSON format output?

Set response_format: {'type':'json_object'} in the request and describe the expected JSON structure in the system prompt. The model may still return JSON wrapped in a Markdown code fence, so the caller should strip the fence before parsing.
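
A defensive parser for this case might look like the sketch below. The function name extract_json is illustrative, and it only handles a reply that is a single fenced block or plain JSON, not JSON embedded in surrounding prose.

```python
import json

FENCE = "`" * 3  # the Markdown code-fence marker


def extract_json(text):
    """Parse a model reply as JSON, removing one surrounding Markdown fence if present."""
    cleaned = text.strip()
    if (
        len(cleaned) >= 2 * len(FENCE)
        and cleaned.startswith(FENCE)
        and cleaned.endswith(FENCE)
    ):
        cleaned = cleaned[len(FENCE):-len(FENCE)]
        # Drop an optional language tag such as "json" on the first line.
        first_line, newline, rest = cleaned.partition("\n")
        if newline and first_line.strip().isalpha():
            cleaned = rest
    return json.loads(cleaned.strip())
```

json.loads still raises json.JSONDecodeError on malformed output, so callers can catch that and retry or fall back.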

What is Few-shot? How to use?

Few-shot provides 2-3 input-output examples in prompt to help model understand task format and expected style. More accurate than plain text description, suitable for formatted and structured output.

What is Chain of Thought (CoT)?

CoT requires model to show reasoning process first before giving answer. For complex problems, CoT can significantly improve accuracy. Enable by adding 'Please reason step by step before answering' in prompt.

What are common causes of 401 error?

1) API Key doesn't exist or was deleted; 2) API Key format wrong (should be Bearer xxx); 3) API Key expired; 4) Malformed request header (the header name is Authorization and its value is Bearer xxx); 5) Wrong key used (test vs production).

What are common causes of 429 error?

1) Too many requests in short time; 2) Concurrent requests exceed plan limit; 3) Model TPM (tokens per minute) limit triggered. Add backoff retry, reduce concurrency, or upgrade plan.

How to handle request timeout?

Recommend setting a 60-120 second request timeout. On timeout: 1) check the network connection; 2) consider whether the model is processing a long context, which slows the response; 3) add retry logic (max 3 times); 4) consider switching to a faster model.

What to do when encountering 'model not found' error?

Check if model name in request exactly matches model name shown in ChinaWHAPI console, including case. For example deepseek-v4-flash, not DeepSeek-V4-Flash.

How to fix 'invalid JSON' error?

1) Check if request body is valid JSON; 2) ensure double quotes not single quotes; 3) ensure no trailing commas; 4) ensure messages array format is correct.

How to design API retry strategy?

Recommended exponential backoff: wait 1s after the first failure, 2s after the second, and 4s after the third, then give up after 3 retries. Differentiate error types: retry 429/500/503; do not retry 400/401/403, return the error immediately.
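
The strategy can be sketched as a small wrapper. ApiError is a hypothetical exception type standing in for whatever your HTTP client or SDK raises, and request_fn is any zero-argument callable that performs one API call.

```python
import time

RETRYABLE_STATUS = {429, 500, 503}


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code of a failed call."""

    def __init__(self, status):
        super().__init__(f"API request failed with status {status}")
        self.status = status


def call_with_backoff(request_fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, retrying retryable errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except ApiError as error:
            # 400/401/403 and similar are not retried; neither is the final attempt.
            if error.status not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s with the defaults
```

The sleep parameter is injectable so the wrapper can be unit-tested without real delays. Adding random jitter to each wait is a common refinement when many clients retry at once.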

How to log and troubleshoot API requests?

Log at API call location: timestamp, model, input_tokens, output_tokens, latency, error_type. Recommend structured logging (JSON format) for easier analysis and alerting.
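
A sketch of one structured log record per call, emitted as a single JSON line. The field names follow the list above, except that latency is recorded as latency_ms. Milliseconds as the unit and the build_log_record name are assumptions.

```python
import json
import time


def build_log_record(model, input_tokens, output_tokens, latency_ms,
                     error_type=None, now=time.time):
    """Return one JSON-formatted log line describing a single API call."""
    return json.dumps(
        {
            "timestamp": now(),        # Unix time in seconds
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "latency_ms": latency_ms,
            "error_type": error_type,  # None (JSON null) for successful calls
        },
        sort_keys=True,
    )
```

Each line can be passed to the standard logging module or written to stdout for a log collector to pick up.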

What to check before going to production?

1) API Key stored in environment variables or secret management service; 2) implement retry and fallback; 3) set request timeout; 4) configure usage alerts; 5) implement logging and monitoring; 6) permission-based key management.

How to securely manage API Keys?

Never hardcode in code or put in frontend. Use environment variables (.env file, .gitignore excludes), AWS Secrets Manager, HashiCorp Vault, etc. Frontend calls AI API via backend proxy.
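
A minimal sketch of reading the key from the environment and failing early when it is missing. The variable name CHINAWHAPI_API_KEY is a hypothetical choice; use whatever name your deployment defines.

```python
import os


def load_api_key(env=None, name="CHINAWHAPI_API_KEY"):
    """Read the API key from the environment and fail fast if it is absent."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or load it from your secret manager")
    return key
```

Raising at startup surfaces a missing key immediately instead of as a 401 on the first request.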

Why need backend proxy for AI API calls?

Backend proxy can: 1) hide real API Key; 2) implement request caching; 3) control permissions and rate; 4) filter malicious requests; 5) add logging and monitoring. Strongly recommended for production.

How to design API rate limiting?

Implement rate limiting at application layer using token bucket or sliding window algorithm. Set different limits per user/project/model. ChinaWHAPI plans also have rate limits, ensure not exceeding.
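
A minimal token bucket sketch for application-layer limiting. The rate and capacity values are yours to choose, and one bucket per user, project, or model gives the per-dimension limits described above. This version is not thread-safe; add a lock if it is shared across threads.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

When allow returns False, the application can reject the request with its own 429 or queue it, before it ever reaches the upstream API.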

What AI API metrics to monitor in production?

Core metrics: 1) call count and token consumption; 2) daily/monthly cost; 3) error rate (by error type); 4) P99 response time; 5) call distribution by model. Set alert thresholds for automatic notification.

How to do A/B testing for models?

Randomly assign same user requests to different models (keep seed consistent for reproducibility), record answer quality and response time. Evaluation dimensions: accuracy, speed, cost, user satisfaction.

How to configure model fallback in production?

Configure 2-3 candidate model priority list per task type. Example: code tasks [Qwen Coder Plus → DeepSeek V4 Pro → Doubao Code]. When primary model fails (429/500), automatically try next.
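
The priority-list idea can be sketched as follows. ModelError is a hypothetical exception type, call_model is any callable that sends one request to the named model, and the model names in the usage comment are placeholders rather than real ChinaWHAPI model IDs.

```python
FALLBACK_STATUS = {429, 500}  # failures that justify trying the next model


class ModelError(Exception):
    """Hypothetical error raised by call_model with the HTTP status code."""

    def __init__(self, model, status):
        super().__init__(f"{model} failed with status {status}")
        self.model = model
        self.status = status


def call_with_fallback(models, call_model):
    """Try each model in priority order; return (model_used, result)."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model)
        except ModelError as error:
            if error.status not in FALLBACK_STATUS:
                raise  # e.g. 400/401: another model will not help
            last_error = error
    if last_error is None:
        raise ValueError("models must not be empty")
    raise last_error


# Usage (placeholder names):
# model_used, reply = call_with_fallback(["primary-model", "backup-model"], call_model)
```

Returning the model that actually answered makes it possible to log how often the fallback path is taken.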

How to implement API Key hot reload?

Store API Key in config center (etcd, Consul, database), service loads on startup and refreshes periodically. Avoid rolling restart due to key rotation.

How to do canary deployment for models?

When launching new model, route 5% traffic first, monitor error rate, response time, user feedback. If OK, gradually expand (10% → 25% → 50% → 100%). Set auto-rollback condition (error rate > 1%).
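
A sketch of the two decisions involved: routing a percentage of traffic to the new model, and checking the rollback condition. The rollout steps and the 1% threshold come from the text above; the function names and the roll parameter are illustrative.

```python
import random

ROLLOUT_STEPS = [5, 10, 25, 50, 100]  # canary traffic percentages, in order


def choose_model(stable_model, canary_model, canary_percent, roll=None):
    """Route roughly canary_percent of requests to the canary model."""
    if roll is None:
        roll = random.random()  # uniform in [0, 1)
    return canary_model if roll * 100 < canary_percent else stable_model


def should_rollback(error_count, request_count, max_error_rate=0.01):
    """Return True when the canary's error rate exceeds the threshold."""
    if request_count == 0:
        return False
    return error_count / request_count > max_error_rate
```

Advance to the next entry in ROLLOUT_STEPS only after should_rollback has stayed False over a meaningful number of canary requests.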

Does ChinaWHAPI support image understanding models?

Yes. Vision models include Qwen3 VL Plus (image Q&A, screenshot analysis), GLM-5V Turbo (chart understanding), Hunyuan Vision 1.5 (OCR, image reasoning). Pass base64-encoded images or image URLs.

Is Function Calling / Tool Use supported?

Yes. DeepSeek V4 series, Qwen3.6 Plus, Kimi and other mainstream models support Function Calling tool use protocol. Define tools array in request, model selectively calls.

What's the maximum output tokens per call?

Depends on specific model, output limit typically 4K-32K tokens. Control output length via max_tokens parameter to prevent cost overruns.

Is Streaming (Server-Sent Events) supported?

Yes. Set stream: true, server pushes content chunks via SSE. Python: for chunk in stream; Node.js: for await (const chunk of stream).
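
When not using an SDK, the SSE stream can be decoded by hand. The sketch below assumes the stream follows the OpenAI Chat Completions chunk format (data: lines carrying JSON with choices[].delta.content, terminated by data: [DONE]), which is what OpenAI compatibility implies. The function name iter_stream_text is illustrative.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Feeding it the decoded lines of a streaming HTTP response and printing each fragment as it arrives produces the typewriter effect.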

Is Batch request supported?

ChinaWHAPI doesn't have dedicated Batch API, but batch processing achievable via async concurrent calls. Recommend asyncio (Python) or Promise.all (Node.js) with reasonable concurrency (5-20) to avoid rate limiting.
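
A sketch of bounded-concurrency batching with asyncio. call_model stands for any async function that sends one request, and the default of 10 is simply a value inside the 5-20 range suggested above.

```python
import asyncio


async def run_batch(prompts, call_model, max_concurrency=10):
    """Run call_model over all prompts with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(prompt):
        async with semaphore:  # wait for a free slot before calling
            return await call_model(prompt)

    # gather returns results in the same order as the prompts.
    return await asyncio.gather(*(worker(prompt) for prompt in prompts))
```

Run it with asyncio.run(run_batch(prompts, call_model)). Lower max_concurrency if 429 responses start to appear.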

Are GPT-4, Claude and other overseas models supported?

ChinaWHAPI primarily aggregates Chinese LLM models. GPT-4 and Claude require OpenAI/Anthropic official APIs. ChinaWHAPI provides unified entry for Chinese models.

Will ChinaWHAPI sync latest models?

Yes, ChinaWHAPI continuously follows latest models from DeepSeek, Qwen, Kimi and other vendors. Follow console announcements for first access to new models.

Which is better for my scenario, RAG or Fine-tuning?

Frequent knowledge updates, need source citations → RAG (low cost, real-time, traceable). Need to change model behavior/style, have large labeled data → Fine-tuning (stable but expensive). Both can be combined.

Does ChinaWHAPI support Embeddings?

ChinaWHAPI provides Embedding capability through supported models (like Qwen), which can be used to build vector databases and semantic search systems, implementing the retrieval part of RAG.

How to build AI Agent with ChinaWHAPI?

AI Agent core components: planning (DeepSeek R1), tool calling (Qwen3.6 Plus Function Calling), memory (vector DB for conversation history), execution loop (iterate until complete).

Is multi-Agent collaboration supported?

ChinaWHAPI itself is an API gateway, so multi-Agent collaboration must be implemented at the application layer. Frameworks such as LangChain Agents, CrewAI, and AutoGen can be used; all support OpenAI-compatible interfaces.

Is voice/speech recognition integration supported?

ChinaWHAPI focuses on text models. Speech recognition (ASR) and speech synthesis (TTS) require specialized voice services. Convert speech to text via ASR first, then process with ChinaWHAPI, finally convert reply to speech via TTS.

How to reset API Key?

On API Keys page in console, click delete button next to existing key, then regenerate new key. Old key becomes invalid immediately after deletion, ensure all apps using it are updated.

Can I create multiple API Keys?

Yes. Recommend creating different keys for different projects, environments (dev/test/prod) for easier permission management and usage tracking. Console supports enable/disable operations.

How to delete account?

Contact ChinaWHAPI support team to request account cancellation. Before cancellation: ensure account balance is zero, all subscriptions cancelled, data to keep backed up.

Where to view API call statistics?

Log in to ChinaWHAPI console, go to usage statistics page. View call count, token consumption, cost breakdown by day/week/month, also breakdown by model and project.

How to download invoice?

View and download monthly invoices on billing page in console. Invoices typically available for download after monthly bill generated at start of month.

Is there a referral/rebate program?

ChinaWHAPI offers distribution/referral program. Earn commission for successfully referring new users who register and recharge. See distribution page in console or contact support for details.

Does ChinaWHAPI have official SDK?

ChinaWHAPI is fully OpenAI SDK compatible. Can directly use openai Python SDK and openai JS SDK, no need for dedicated ChinaWHAPI SDK.

Is there OpenAPI specification?

ChinaWHAPI API is consistent with OpenAI Chat Completions specification. Refer to OpenAI official docs. Console also provides detailed endpoint descriptions and request examples.

Is there online API testing console?

ChinaWHAPI console provides basic API testing. Can also use Postman, Bruno, Insomnia to import API spec for more comprehensive testing.

Are there code examples in different languages?

ChinaWHAPI docs page provides complete examples in Python, Node.js, curl and other common languages. Also refer to OpenAI official docs, just replace baseURL with ChinaWHAPI address.

Is there a Postman Collection?

ChinaWHAPI API is OpenAI-compatible, can directly use OpenAI Postman Collection, just replace base URL with https://chinawhapi.com/v1.

What's the difference between ChinaWHAPI and using vendor APIs directly?

ChinaWHAPI = unified entry + multi-model aggregation + unified billing + simplified integration. Direct vendor APIs = separate registrations, separate key management, separate billing, multiple codebases. ChinaWHAPI significantly reduces management overhead.

What are ChinaWHAPI advantages over other API aggregation platforms?

ChinaWHAPI focuses on Chinese LLMs with more comprehensive coverage; transparent real-time pricing; excellent OpenAI compatibility with zero migration cost; unified billing and management with consistent experience.

I'm unsure which model to choose, what should I do?

Recommend starting with Qwen3.6 Plus (balanced choice) to get started, then A/B test based on specific tasks. ChinaWHAPI console provides model quality comparison and price comparison tools.

Do I need to change code to switch models?

Only need to change the model field value in request, no changes to business logic code. ChinaWHAPI unified interface makes model switching zero-cost.

How do Chinese models compare to GPT-4/Claude?

Chinese models have narrowed the gap in Chinese/English bilingual and Chinese tasks, priced at approximately 10-30% of GPT-4. For code tasks, Qwen3 Coder Plus and DeepSeek V4 Pro approach GPT-4 Turbo level.

What's the best cost-performance model combination?

Daily conversation/customer service → Doubao Seed 1.6 Flash; code tasks → Qwen3 Coder Plus; complex tasks → DeepSeek V4 Flash or Qwen3.6 Plus; reasoning → DeepSeek R1. This combination covers 95% scenarios at far lower cost than GPT-4.

What to do when API response latency is high?

High latency causes: 1) model processing time (normal for reasoning models); 2) network issues; 3) high concurrency causing queue. Confirm which stage is slow; use streaming for reasoning tasks to improve perceived latency; switch to faster model; reduce concurrency.