What You’ll Learn
- Extracting clean content from one or many URLs
- Basic vs advanced extraction depth
- Query-focused extraction for targeted content retrieval
- Batch extraction (up to 20 URLs in a single call)
How Does It Work?
Tavily Extract takes a URL (or list of URLs) and returns the page content as clean markdown or plain text. It removes boilerplate (ads, navigation, footers), can handle JavaScript-rendered pages when run with advanced depth, and returns structured content ready for LLM consumption. Two extraction depths are available:
| Depth | Speed | Success rate | Content | Cost |
|---|---|---|---|---|
| basic | Fast | Good | Standard page content | 1 credit per 5 URLs |
| advanced | Slower | Higher | Tables, embedded content, JS-rendered pages | 2 credits per 5 URLs |
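The Cost column can be turned into a quick budget check before a large job. The sketch below is a hypothetical helper, `estimate_credits`, and it assumes a partial group of fewer than 5 URLs is billed as a whole group (rounded up). This page does not say how partial groups are billed, so treat the result as an estimate and confirm against your usage dashboard or the Extract API Reference.

```python
import math

# Credits charged per group of 5 URLs, taken from the depth/cost table.
CREDITS_PER_5_URLS = {"basic": 1, "advanced": 2}


def estimate_credits(num_urls: int, extract_depth: str = "basic") -> int:
    """Estimate Extract credits for num_urls URLs at the given depth.

    Assumption: a partial group (fewer than 5 URLs) is rounded up to a full group.
    """
    if extract_depth not in CREDITS_PER_5_URLS:
        raise ValueError(f"unknown extract_depth: {extract_depth!r}")
    if num_urls < 0:
        raise ValueError("num_urls must be non-negative")
    groups = math.ceil(num_urls / 5)  # round partial groups up (assumed)
    return groups * CREDITS_PER_5_URLS[extract_depth]


print(estimate_credits(20, "basic"))     # 4
print(estimate_credits(20, "advanced"))  # 8
print(estimate_credits(7, "basic"))      # 2 under the rounding assumption
```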
Getting Started
Get your Tavily API key
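Once you have a key, one way to call Extract without an SDK is a plain HTTPS POST. The sketch below uses only the Python standard library and assumes three things you should confirm against the Extract API Reference: the endpoint `https://api.tavily.com/extract`, a Bearer token in the `Authorization` header, and a JSON body/response using the field names `urls`, `results`, and `failed_results`. The helper names `build_extract_request` and `extract_all` are hypothetical. Because a single call accepts at most 20 URLs, `extract_all` splits longer lists into separate calls and merges their successes and failures.

```python
import json
import os
import urllib.request
from typing import Any

# Assumed endpoint; check the Extract API Reference before relying on it.
TAVILY_EXTRACT_URL = "https://api.tavily.com/extract"
MAX_URLS_PER_CALL = 20  # batch limit stated on this page


def build_extract_request(
    urls: list[str],
    api_key: str,
    extract_depth: str = "basic",
    fmt: str = "markdown",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single Extract call."""
    if not 1 <= len(urls) <= MAX_URLS_PER_CALL:
        raise ValueError(f"pass between 1 and {MAX_URLS_PER_CALL} URLs per call")
    body = {"urls": urls, "extract_depth": extract_depth, "format": fmt}
    return urllib.request.Request(
        TAVILY_EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def extract_all(urls: list[str], api_key: str, **options: Any) -> dict[str, list]:
    """Send URLs in chunks of 20 and merge successful and failed results."""
    merged: dict[str, list] = {"results": [], "failed_results": []}
    for start in range(0, len(urls), MAX_URLS_PER_CALL):
        chunk = urls[start : start + MAX_URLS_PER_CALL]
        request = build_extract_request(chunk, api_key, **options)
        with urllib.request.urlopen(request, timeout=60) as response:
            payload = json.load(response)
        # Assumed response fields: failures are listed apart from successes.
        merged["results"].extend(payload.get("results", []))
        merged["failed_results"].extend(payload.get("failed_results", []))
    return merged


if __name__ == "__main__" and "TAVILY_API_KEY" in os.environ:
    out = extract_all(["https://example.com"], os.environ["TAVILY_API_KEY"])
    for item in out["results"]:
        print(item["url"], len(item.get("raw_content") or ""))
    for item in out["failed_results"]:
        print("failed:", item)
```

Set `TAVILY_API_KEY` in the environment to run the sample call at the bottom; without it, the script only defines the helpers. The Python SDK linked under Next Steps wraps the same request if you prefer a client library.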
Batch Extraction
Extract content from up to 20 URLs in a single call. Failed URLs are reported separately without blocking successful ones.
Query-Focused Extraction
When you pass a query parameter, Extract reranks the content chunks by relevance to your question. Combined with chunks_per_source, this returns only the most relevant portions of each page.
For example, with chunks_per_source set to 3, each result's raw_content field contains the top 3 most relevant chunks separated by [...], rather than the full page content. This keeps LLM context windows small while preserving relevance.
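Because query-focused chunks in raw_content are delimited by [...], a pipeline can split them back apart, for example to label each chunk with its source before passing it to an LLM. `split_chunks` and `to_context` are hypothetical helpers, the sample response is made up for illustration, and the sketch assumes each result carries `url` and `raw_content` keys.

```python
def split_chunks(raw_content: str, separator: str = "[...]") -> list[str]:
    """Split a query-focused raw_content string into its individual chunks."""
    # Drop empty pieces so leading or trailing separators don't yield blanks.
    return [piece.strip() for piece in raw_content.split(separator) if piece.strip()]


def to_context(results: list[dict]) -> str:
    """Format each result's chunks as a compact, source-labelled LLM context."""
    blocks = []
    for result in results:
        chunks = split_chunks(result.get("raw_content") or "")
        lines = [f"Source: {result['url']}"] + [f"- {chunk}" for chunk in chunks]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical response fragment, for illustration only.
sample = [
    {
        "url": "https://example.com/a",
        "raw_content": "First chunk. [...] Second chunk. [...] Third chunk.",
    }
]
print(to_context(sample))
```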
Choosing the Right Extraction Depth
When to use basic extraction
- Static HTML pages (blogs, articles, documentation)
- When speed matters more than completeness
- High-volume batch jobs where cost is a concern
- Pages with straightforward content structure
When to use advanced extraction
- JavaScript-rendered single-page applications
- Pages with tables, charts, or embedded content
- When you need the highest success rate
- Complex pages where basic extraction misses content
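The two lists suggest a cost-saving pattern: start with basic and escalate to advanced only for URLs where basic falls short. The sketch below is hypothetical. `extract_with_fallback` accepts any function that performs one Extract call (for instance, a wrapper around your SDK client or a raw HTTP request) and returns a dict with `results` and `failed_results` lists; those field names are assumptions. The `min_chars` threshold that flags a "thin" basic result is an arbitrary heuristic of this sketch, not a Tavily feature.

```python
from typing import Callable

# Hypothetical: performs one Extract call for (urls, extract_depth) and returns
# a dict with "results" and "failed_results" lists (assumed field names).
ExtractFn = Callable[[list[str], str], dict]


def extract_with_fallback(
    urls: list[str], extract_fn: ExtractFn, min_chars: int = 200
) -> dict[str, list]:
    """Extract with basic depth, then retry thin or failed URLs with advanced."""
    first = extract_fn(urls, "basic")
    kept: list = []
    retry_urls: list[str] = []
    for item in first.get("results", []):
        if len(item.get("raw_content") or "") >= min_chars:
            kept.append(item)
        else:
            # Suspiciously little text: possibly JS-rendered or table-heavy.
            retry_urls.append(item["url"])
    retry_urls.extend(item["url"] for item in first.get("failed_results", []))

    failed: list = []
    if retry_urls:
        second = extract_fn(retry_urls, "advanced")
        kept.extend(second.get("results", []))
        # URLs that advanced also fails on are reported as failed.
        failed = list(second.get("failed_results", []))
    return {"results": kept, "failed_results": failed}
```

Injecting `extract_fn` keeps the retry logic testable without network access. Going by the depth table, a retried URL is billed once at basic and again at the advanced rate, so this pattern pays off when most pages extract fine at basic depth.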
Critical Knobs
extract_depth
- "basic" (default): standard HTML pages, 1 credit per 5 URLs
- "advanced": JS-rendered pages, tables, embedded content, 2 credits per 5 URLs
query + chunks_per_source
- Pass a query to rerank content by relevance to your question
- Pair it with chunks_per_source (1–5) to return only the top snippets
- Without a query, the full page content is returned
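These rules can be checked client-side before a request is sent. `query_options` is a hypothetical helper that returns the query-related fields to merge into an Extract request body. Treating chunks_per_source without a query as an error is this sketch's own choice, based on the pairing described above, not documented API behavior.

```python
from typing import Optional


def query_options(
    query: Optional[str] = None, chunks_per_source: Optional[int] = None
) -> dict:
    """Return query-related fields to merge into an Extract request body."""
    options: dict = {}
    if query is not None:
        options["query"] = query
    if chunks_per_source is not None:
        if query is None:
            # Assumption: chunks_per_source is only meaningful alongside query.
            raise ValueError("chunks_per_source requires a query")
        if not 1 <= chunks_per_source <= 5:
            raise ValueError("chunks_per_source must be between 1 and 5")
        options["chunks_per_source"] = chunks_per_source
    return options


# Without query, no reranking fields are added and full pages come back.
body = {"urls": ["https://example.com/pricing"], **query_options("What does the Pro plan cost?", 3)}
print(body)
```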
format
- "markdown" (default): preserves headings, links, and structure
- "text": plain text, lighter for simple pipelines
Next Steps
Extract API Reference
Full parameter list, response schema, and interactive playground.
Extract Best Practices
Depth selection, two-step search-then-extract, and optimization tips.
Python SDK Reference
Python client methods, async support, and type details.
JavaScript SDK Reference
JavaScript/TypeScript client methods and usage.