Learn the best practices for extracting web content.
To retrieve raw_content directly, set include_raw_content = true when making a Tavily Search API call. This lets you retrieve both the search results and their extracted content in a single step.
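A minimal sketch of the one-step approach follows. It assumes the tavily-python SDK, where TavilyClient.search accepts include_raw_content and each entry in the response's "results" list carries "url" and "raw_content" keys. The client argument is duck-typed, so any object with a compatible search method works. one_step_search is a hypothetical helper name.

```python
from typing import Any, Dict


def one_step_search(client: Any, query: str) -> Dict[str, str]:
    """Search and extract in a single call.

    Assumes client.search(query, include_raw_content=True) returns a dict
    with a "results" list whose items have "url" and "raw_content" keys,
    as in the tavily-python SDK (an assumption, not verified here).
    """
    response = client.search(query, include_raw_content=True)
    pages: Dict[str, str] = {}
    for result in response.get("results", []):
        raw = result.get("raw_content")
        # raw_content may be missing or None when a page could not be parsed,
        # so keep only results that actually carry extracted text.
        if raw:
            pages[result["url"]] = raw
    return pages
```

With the real SDK, the client would be TavilyClient(api_key="tvly-YOUR_API_KEY"). Raw content is fetched for every result whether or not it turns out to be relevant, which causes the latency cost discussed in the next paragraph.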
However, this can increase latency, because raw content is extracted even from sources that turn out to be irrelevant. It is recommended to split the process into two steps instead. First, run multiple sub-queries to expand the pool of sources. Then, curate the most relevant documents based on their content snippets or scores. Extracting raw content only from the most relevant sources gives you high-quality documents for RAG.
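The curation step can be sketched as a pure function over results pooled from several sub-queries. This sketch assumes each result is a dict with "url" and "score" keys, as in Tavily search results. It deduplicates by URL and keeps the highest score for each URL. It then drops results below a score threshold and keeps the top k. The function name curate_results, the min_score default of 0.5, and k are illustrative assumptions, not values recommended by Tavily.

```python
from typing import Dict, Iterable, List


def curate_results(
    result_lists: Iterable[List[dict]],
    min_score: float = 0.5,
    k: int = 5,
) -> List[dict]:
    """Pool results from several sub-queries and keep the most relevant ones.

    Each result is assumed to be a dict with at least "url" and "score" keys.
    """
    best_by_url: Dict[str, dict] = {}
    for results in result_lists:
        for result in results:
            url = result["url"]
            # The same page often appears for several sub-queries; keep the
            # copy with the highest relevance score.
            if url not in best_by_url or result["score"] > best_by_url[url]["score"]:
                best_by_url[url] = result
    # Drop weak matches, then rank the remainder by score, highest first.
    kept = [r for r in best_by_url.values() if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:k]
```

Filtering on the "content" snippet (for example, with a keyword or LLM relevance check) would fit in the same place as the score filter.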
Step 1: Search
Use the Tavily Search API to retrieve relevant web pages and their URLs.
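Here is a sketch of Step 1. It assumes the tavily-python SDK's TavilyClient.search method, which accepts search_depth and max_results keyword arguments. The sketch runs each sub-query and returns one result list per sub-query, ready for curation. The helper name search_sub_queries and its defaults are hypothetical.

```python
from typing import Any, List, Sequence


def search_sub_queries(
    client: Any,
    sub_queries: Sequence[str],
    search_depth: str = "advanced",
    max_results: int = 5,
) -> List[List[dict]]:
    """Run several related queries to widen the pool of candidate sources.

    Raw content is deliberately not requested here: only URLs, snippets
    ("content") and scores are needed to decide what to extract later.
    """
    result_lists: List[List[dict]] = []
    for query in sub_queries:
        response = client.search(
            query,
            search_depth=search_depth,
            max_results=max_results,
        )
        result_lists.append(response.get("results", []))
    return result_lists
```

For many sub-queries, the SDK also provides AsyncTavilyClient, whose search calls can be run concurrently with asyncio.gather instead of this sequential loop.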
Step 2: Extract
Use the Tavily Extract API to fetch the full content from the most relevant URLs.
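Here is a sketch of Step 2. It assumes the tavily-python SDK's TavilyClient.extract method, which takes urls and extract_depth. The method is assumed to return a dict with "results" (each carrying "url" and "raw_content") and "failed_results" (each carrying "url"). URLs are sent in batches because the Extract API limits how many URLs one request may contain. The batch size of 20 used here is an assumption to check against the current API reference. extract_urls is a hypothetical helper name.

```python
from typing import Any, Dict, List, Sequence, Tuple


def extract_urls(
    client: Any,
    urls: Sequence[str],
    extract_depth: str = "basic",
    batch_size: int = 20,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch full page content for curated URLs.

    Returns (pages, failed): pages maps URL -> raw_content, and failed
    lists the URLs the API reported it could not extract.
    """
    pages: Dict[str, str] = {}
    failed: List[str] = []
    # Send URLs in fixed-size batches to stay under the per-request limit
    # (batch_size=20 is an assumed limit).
    for start in range(0, len(urls), batch_size):
        batch = list(urls[start:start + batch_size])
        response = client.extract(urls=batch, extract_depth=extract_depth)
        for result in response.get("results", []):
            pages[result["url"]] = result["raw_content"]
        for failure in response.get("failed_results", []):
            failed.append(failure["url"])
    return pages, failed
```

In a full pipeline, the URLs passed here would be the few that survived curation, not every search hit. This keeps extraction latency proportional to the number of documents you actually use.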
search_depth = "advanced".
Setting extract_depth = "advanced" in the Extract API enables more comprehensive content retrieval. This mode is particularly useful when dealing with:
If precision and depth are priorities for your application, extract_depth = "advanced" is the recommended choice.
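If you are not using the SDK, the same setting can be passed to the Extract REST endpoint. This sketch only builds the request with the Python standard library and does not send it. Several details are assumptions based on Tavily's public API reference and should be checked before use: the endpoint https://api.tavily.com/extract, the Bearer-token Authorization header, and the JSON field names. build_extract_request is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Sequence

TAVILY_EXTRACT_URL = "https://api.tavily.com/extract"  # assumed endpoint


def build_extract_request(
    urls: Sequence[str],
    api_key: str,
    extract_depth: str = "advanced",
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Extract API."""
    if extract_depth not in ("basic", "advanced"):
        raise ValueError("extract_depth must be 'basic' or 'advanced'")
    body = json.dumps({"urls": list(urls), "extract_depth": extract_depth})
    return urllib.request.Request(
        TAVILY_EXTRACT_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send it. Tavily's API reference notes that advanced extraction can increase latency and is billed at a higher credit rate than basic, so reserve it for pages where precision matters.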