Extract Parameters
Query
Use query to rerank extracted content chunks based on relevance:
- To extract only relevant portions of long documents
- When you need focused content instead of full page extraction
- For targeted information retrieval from specific URLs
When query is provided, chunks are reranked based on relevance to the query.
Chunks Per Source
Control the amount of content returned per URL to prevent context window explosion:
- Returns only relevant content snippets (max 500 characters each) instead of full page content
- Prevents the context window from exploding
- Chunks appear in raw_content as: <chunk 1> [...] <chunk 2> [...] <chunk 3>
- Must be between 1 and 5 chunks per source
chunks_per_source is only available when query is provided.
Example with multiple URLs:
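A minimal sketch of the rules above, in Python. The helper names build_extract_params and split_chunks are hypothetical. The keyword names urls, query, and chunks_per_source mirror the parameters described in this section, and the resulting dictionary is assumed to be passed to an Extract call (for instance the extract method of the tavily-python client, if your installed version accepts these keywords). The splitter assumes chunks are separated by the literal [...] marker shown above.

```python
from typing import Any, Dict, List, Optional


def build_extract_params(
    urls: List[str],
    query: Optional[str] = None,
    chunks_per_source: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an Extract call (hypothetical helper)."""
    params: Dict[str, Any] = {"urls": list(urls)}
    if query is not None:
        params["query"] = query
    if chunks_per_source is not None:
        # Rule from the text: chunks_per_source requires query.
        if query is None:
            raise ValueError("chunks_per_source is only available when query is provided")
        # Rule from the text: between 1 and 5 chunks per source.
        if not 1 <= chunks_per_source <= 5:
            raise ValueError("chunks_per_source must be between 1 and 5")
        params["chunks_per_source"] = chunks_per_source
    return params


def split_chunks(raw_content: str) -> List[str]:
    """Split raw_content into individual chunks on the [...] marker."""
    return [part.strip() for part in raw_content.split("[...]") if part.strip()]


params = build_extract_params(
    ["https://example.com/pricing", "https://example.com/docs"],
    query="rate limits",
    chunks_per_source=3,
)
# With a real client this would be something like: response = client.extract(**params)
```

Each entry in the response's results is then expected to carry a raw_content string that split_chunks can break back into separate snippets.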
Extraction Approaches
Search with include_raw_content
Enable include_raw_content=true in Search API calls to retrieve both search results and extracted content simultaneously. Best for:
- Quick prototyping
- Simple queries where search results are likely relevant
- Single API call convenience
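The single-call approach might look like the sketch below. It assumes client behaves like the tavily-python TavilyClient, whose search method is assumed to accept include_raw_content and max_results and to return a dictionary with a results list. The function name search_with_content is hypothetical.

```python
from typing import Any, Dict, List


def search_with_content(client: Any, query: str, max_results: int = 5) -> List[Dict[str, Any]]:
    """Run one search call and keep the results that include page content."""
    # Assumption: search() accepts these keywords and returns {"results": [...]}.
    response = client.search(
        query=query,
        include_raw_content=True,
        max_results=max_results,
    )
    pages = []
    for result in response.get("results", []):
        # raw_content can be missing or empty when a page could not be extracted.
        if result.get("raw_content"):
            pages.append({"url": result["url"], "raw_content": result["raw_content"]})
    return pages
```

With the tavily-python package installed, client could be created as TavilyClient(api_key=...) and passed in unchanged.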
Direct Extract API
Use the Extract API when you want control over which specific URLs to extract from. Use it when:
- You already have specific URLs to extract from
- You want to filter or curate URLs before extraction
- You need targeted extraction with query and chunks_per_source
include_raw_content extracts from all search results.
Extract Depth
The extract_depth parameter controls extraction comprehensiveness:
| Depth | Use case |
|---|---|
| basic (default) | Simple text extraction, faster processing |
| advanced | Complex pages, tables, structured data, media |
Using extract_depth=advanced
Best for content requiring detailed extraction:
- Dynamic content or JavaScript-rendered pages
- Tables and structured information
- Embedded media and rich content
- Higher extraction success rates needed
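One possible way to balance the two depths is to try basic first and retry only the failures with advanced. This is an illustrative strategy, not something the API does for you. The sketch assumes client.extract accepts urls and extract_depth and returns a dictionary with results and failed_results lists, where each failed entry has a url key. The function name extract_with_fallback is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def extract_with_fallback(client: Any, urls: List[str]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Extract with basic depth, then retry failed URLs with advanced depth."""
    # First pass: basic depth for every URL.
    first = client.extract(urls=urls, extract_depth="basic")
    results = list(first.get("results", []))
    failed = [item["url"] for item in first.get("failed_results", [])]

    still_failed: List[str] = []
    if failed:
        # Second pass: advanced depth only where basic did not succeed.
        retry = client.extract(urls=failed, extract_depth="advanced")
        results.extend(retry.get("results", []))
        still_failed = [item["url"] for item in retry.get("failed_results", [])]
    return results, still_failed
```

The function returns the successful results plus any URLs that failed at both depths, so the caller can log or skip them.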
extract_depth=advanced provides better accuracy but increases latency and cost. Use basic for simple content.
Advanced Filtering Strategies
Beyond query-based filtering, consider these approaches for curating URLs before extraction:
| Strategy | When to use |
|---|---|
| Re-ranking | Use dedicated re-ranking models for precision |
| LLM-based | Let an LLM assess relevance before extraction |
| Clustering | Group similar documents, extract from clusters |
| Domain-based | Filter by trusted domains before extracting |
| Score-based | Filter search results by relevance score |
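The domain-based strategy in the table can be sketched with the standard library alone. The function name filter_by_domain and the trusted-domain list are hypothetical. A URL is kept when its host equals a trusted domain or is a subdomain of one.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def filter_by_domain(urls: Iterable[str], trusted_domains: Iterable[str]) -> List[str]:
    """Keep only URLs whose host is a trusted domain or one of its subdomains."""
    trusted = {domain.lower() for domain in trusted_domains}
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Match "python.org" and "docs.python.org", but not "evil-python.org".
        if any(host == d or host.endswith("." + d) for d in trusted):
            kept.append(url)
    return kept
```

Matching on a leading dot avoids accepting look-alike hosts that merely end with the same characters.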
Example: Score-based filtering
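A minimal sketch of score-based filtering, assuming each search result is a dictionary with url and score keys, as in the results list of a Search response. The function name filter_by_score and the 0.5 threshold are illustrative assumptions. Tune the threshold for your own data.

```python
from typing import Any, Dict, List, Optional


def filter_by_score(
    results: List[Dict[str, Any]],
    min_score: float = 0.5,
    top_k: Optional[int] = None,
) -> List[str]:
    """Return URLs of results at or above min_score, highest score first."""
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    urls = [r["url"] for r in kept]
    # Optionally cap how many URLs are passed on to extraction.
    return urls if top_k is None else urls[:top_k]
```

The returned list can be passed directly as the urls argument of an Extract call.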
Integration with Search
Optimal workflow
- Search to discover relevant URLs
- Filter by relevance score, domain, or content snippet
- Re-rank if needed using specialized models
- Extract from top-ranked sources with query and chunks_per_source
- Validate extracted content quality
- Process for your RAG or AI application
Example end-to-end pipeline
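The workflow above could be wired together roughly as follows. This is a sketch under stated assumptions, not a reference implementation. It assumes client exposes search and extract methods shaped like those of the tavily-python client, that search results carry url and score, and that extract results carry url and raw_content. The function name run_pipeline and the default thresholds are hypothetical, and the optional re-ranking step is omitted.

```python
from typing import Any, Dict, List


def run_pipeline(
    client: Any,
    question: str,
    min_score: float = 0.5,
    max_urls: int = 5,
    chunks_per_source: int = 3,
) -> List[Dict[str, str]]:
    """Search, filter by score, extract focused chunks, and validate content."""
    # 1. Search to discover candidate URLs.
    search = client.search(query=question, max_results=10)

    # 2. Filter by relevance score and keep the top-ranked URLs.
    ranked = sorted(
        search.get("results", []),
        key=lambda r: r.get("score", 0.0),
        reverse=True,
    )
    urls = [r["url"] for r in ranked if r.get("score", 0.0) >= min_score][:max_urls]
    if not urls:
        return []

    # 3. Extract only the chunks relevant to the question.
    extracted = client.extract(
        urls=urls,
        query=question,
        chunks_per_source=chunks_per_source,
    )

    # 4. Validate: drop entries with missing or whitespace-only content.
    documents = []
    for item in extracted.get("results", []):
        content = (item.get("raw_content") or "").strip()
        if content:
            documents.append({"url": item["url"], "content": content})
    return documents
```

The returned documents are ready to be chunked, embedded, or passed as context to a RAG or other AI application.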
Summary
- Use query and chunks_per_source for targeted, focused extraction
- Choose Extract API when you need control over which URLs to extract from
- Filter URLs before extraction using scores, re-ranking, or domain trust
- Choose appropriate extract_depth based on content complexity
- Process URLs concurrently with async operations for better performance
- Implement error handling to manage failed extractions gracefully
- Validate extracted content before downstream processing
- Optimize costs by extracting only necessary content with chunks_per_source
Start with query and chunks_per_source for targeted extraction. Filter URLs strategically, extract with appropriate depth, and handle errors gracefully for production-ready pipelines.