We have just added DeepSeek V4 Pro and DeepSeek V4 Flash to the YepAPI chat catalog. These are DeepSeek's newest mixture-of-experts models — both ship with a 1M token context window and 384K max output, and both are callable from the same `/v1/ai/chat` endpoint you already use for every other LLM under your key, with pay-per-token pricing and no waitlist.
What's new
- Two new DeepSeek models: V4 Pro (1.6T params, 49B active — flagship reasoning) and V4 Flash (284B params, 13B active — speed-optimized)
- Full 1,048,576 token context window on both models
- Up to 384,000 output tokens per request — long-form code, papers, transcripts in one shot
- MoE architecture: only the activated parameters fire per token, so latency stays low and cost stays down
- Drop-in on the unified `/v1/ai/chat` endpoint — same envelope, same auth, streaming via `stream: true`
- Short aliases: `deepseek-v4-pro`, `deepseek-v4-flash`, and `deepseek-v4` (resolves to Pro)
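As a quick sketch of the unified envelope, here's how a request body for the new models might be assembled in Python. This is illustrative only: the `build_chat_request` helper and the alias table are ours, not part of an official SDK; the field names mirror the `/v1/ai/chat` examples in this post.

```python
# Illustrative sketch: builds the request envelope without sending it.
# build_chat_request and ALIASES are hypothetical, not an official SDK.
import json

ALIASES = {
    "deepseek-v4": "deepseek/deepseek-v4-pro",        # bare alias resolves to Pro
    "deepseek-v4-pro": "deepseek/deepseek-v4-pro",
    "deepseek-v4-flash": "deepseek/deepseek-v4-flash",
}

def build_chat_request(model: str, prompt: str, stream: bool = False) -> str:
    body = {
        "model": ALIASES.get(model, model),
        "messages": [{"role": "user", "content": prompt}],
    }
    if stream:
        body["stream"] = True  # streaming flag, per the announcement
    return json.dumps(body)

payload = build_chat_request("deepseek-v4", "Explain MoE routing.", stream=True)
print(payload)
```

Send the resulting JSON string as the POST body with your `x-api-key` header, exactly as in the curl example further down.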
Which one to use
V4 Pro is DeepSeek's new top of the line — 1.6T total parameters with 49B activated per token. It's built for advanced reasoning, agentic workflows, deep code review, and tasks where you'd reach for a frontier model. V4 Flash is the efficiency-tuned sibling — 284B params with 13B activated — designed for high-throughput workloads where you want strong DeepSeek quality at sub-cent prices: classification, summarization, retrieval-augmented generation, draft-quality coding, and chat assistants. Both share the same 1M token context window so you can drop entire codebases into either.
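If you route traffic between the two programmatically, the guidance above can be captured in a tiny dispatcher. A sketch under this post's assumptions: the workload category names and the `pick_model` helper are entirely illustrative, not part of any YepAPI API.

```python
# Hypothetical routing helper based on the guidance above; category names
# and pick_model are illustrative, not part of the YepAPI API.
FLASH_WORKLOADS = {
    "classification", "summarization", "rag",
    "draft-coding", "chat-assistant",
}

def pick_model(workload: str) -> str:
    """Route high-throughput tasks to Flash, reasoning-heavy tasks to Pro."""
    if workload in FLASH_WORKLOADS:
        return "deepseek/deepseek-v4-flash"
    # reasoning, agentic workflows, deep code review, frontier-grade tasks
    return "deepseek/deepseek-v4-pro"

print(pick_model("summarization"))  # high-throughput -> Flash
print(pick_model("code-review"))    # reasoning-heavy -> Pro
```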
How to call them
Pass `deepseek/deepseek-v4-pro` or `deepseek/deepseek-v4-flash` as the `model` on the `/v1/ai/chat` endpoint. Streaming is supported via `stream: true`. Everything else — auth, billing, retries, rate limits — works exactly like every other model on YepAPI.
curl -X POST https://api.yepapi.com/v1/ai/chat \
  -H "x-api-key: $YEPAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Explain MoE routing in plain English."}]
  }'
Pricing
V4 Pro is priced at $0.61 per 1M input tokens and $1.22 per 1M output tokens. V4 Flash is $0.20 per 1M input and $0.40 per 1M output. Both have a $0.01 minimum per request and are billed only on successful completions, deducted from your prepaid balance.
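The published rates reduce to simple arithmetic when estimating spend. A sketch (the `estimate_cost` helper is ours, not an SDK function; the rates are the per-million figures quoted above, with the $0.01 per-request minimum applied):

```python
# Per-1M-token rates from this announcement (USD); estimate_cost is an
# illustrative helper, not part of any SDK.
RATES = {
    "deepseek/deepseek-v4-pro":   {"input": 0.61, "output": 1.22},
    "deepseek/deepseek-v4-flash": {"input": 0.20, "output": 0.40},
}
MIN_CHARGE = 0.01  # per-request minimum

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated charge in USD for one successful request."""
    r = RATES[model]
    cost = (input_tokens / 1_000_000) * r["input"] \
         + (output_tokens / 1_000_000) * r["output"]
    return round(max(cost, MIN_CHARGE), 6)

# 200K tokens in, 50K out on Flash:
print(estimate_cost("deepseek/deepseek-v4-flash", 200_000, 50_000))  # -> 0.06
```

Note the minimum: a tiny request (say 1K in, 1K out on Flash) works out to $0.0006 but still bills at $0.01.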
Try DeepSeek V4 today
Sign up, grab an API key, and spend your free starter credits on the new V4 models.
Open the V4 Pro playground