Instantiating a client

To interact with Tavily in Python, you must instantiate a client with your API key. For greater flexibility, we provide both a synchronous and an asynchronous client class.

Once you have instantiated a client, call one of our supported methods (detailed below) to access the API.

Synchronous Client

from tavily import TavilyClient

client = TavilyClient("tvly-YOUR_API_KEY")

Asynchronous Client

from tavily import AsyncTavilyClient

client = AsyncTavilyClient("tvly-YOUR_API_KEY")
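
The asynchronous client is useful when several requests should run concurrently. The following is a minimal sketch, assuming the client's search method is awaited as a coroutine; search_many and main are hypothetical helpers, not part of the SDK.

```python
import asyncio


async def search_many(client, queries):
    """Run one search per query concurrently and return the responses in order.

    `client` is any object with an awaitable `search(query)` method,
    such as an AsyncTavilyClient instance.
    """
    return await asyncio.gather(*(client.search(query) for query in queries))


async def main():
    # Imported here so search_many can be reused without the SDK installed.
    from tavily import AsyncTavilyClient

    client = AsyncTavilyClient("tvly-YOUR_API_KEY")
    responses = await search_many(client, ["Who is Leo Messi?", "What is Tavily?"])
    for response in responses:
        print(response["query"], len(response["results"]))


# To run against the live API: asyncio.run(main())
```

asyncio.gather preserves the order of its inputs, so each response lines up with the query at the same position.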

Tavily Search

You can access Tavily Search in Python through the client’s search function.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| query (required) | str | The query to run a search on. | |
| search_depth | str | The depth of the search. It can be "basic" or "advanced". | "basic" |
| topic | str | The category of the search. Determines which agent will be used. Supported values are "general" and "news". | "general" |
| days | int | The number of days back from the current date to include in the results. Available only when using the "news" topic. | 3 |
| time_range | str | The time range back from the current date. Accepted values include "day", "week", "month", "year" or shorthand values "d", "w", "m", "y". | |
| max_results | int | The maximum number of search results to return. It must be between 0 and 20. | 5 |
| include_images | bool | Include a list of query-related images in the response. | False |
| include_image_descriptions | bool | Include a list of query-related images and their descriptions in the response. | False |
| include_answer | bool or str | Include an answer to the query generated by an LLM based on search results. A "basic" (or True) answer is quick but less detailed; an "advanced" answer is more detailed. | False |
| include_raw_content | bool | Include the cleaned and parsed HTML content of each search result. | False |
| include_domains | list[str] | A list of domains to specifically include in the search results. | [] |
| exclude_domains | list[str] | A list of domains to specifically exclude from the search results. | [] |

Response format

The response object you receive will be in the following format:

| Key | Type | Description |
| --- | --- | --- |
| results | list[Result] | A list of sorted search results ranked by relevancy. |
| query | str | Your search query. |
| response_time | float | Your search result response time. |
| answer (optional) | str | The answer to your search query, generated by an LLM based on Tavily’s search results. This is only available if include_answer is set to True. |
| images (optional) | list[str] or list[ImageResult] | This is only available if include_images is set to True. A list of query-related image URLs. If include_image_descriptions is set to True, each entry will be an ImageResult. |

Results

| Key | Type | Description |
| --- | --- | --- |
| title | str | The title of the search result. |
| url | str | The URL of the search result. |
| content | str | The most query-related content from the scraped URL. Tavily uses proprietary AI to extract the most relevant content based on context quality and size. |
| score | float | The relevance score of the search result. |
| raw_content (optional) | str | The parsed and cleaned HTML content of the site. This is only available if include_raw_content is set to True. |
| published_date (optional) | str | The publication date of the source. This is only available if the search topic is set to "news". |

Image Results

If include_image_descriptions is set to True, each image in the images list will be in the following ImageResult format:

| Key | Type | Description |
| --- | --- | --- |
| url | str | The URL of the image. |
| description | str | An LLM-generated description of the image. |

Example
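
A minimal sketch of a search call and of reading the response fields listed above. The query, the parameter values, the 0.5 score threshold, and the relevant_results helper are illustrative assumptions, not part of the SDK.

```python
def relevant_results(response, min_score=0.5):
    """Return (title, url) pairs for results scoring at least min_score."""
    return [
        (result["title"], result["url"])
        for result in response["results"]
        if result["score"] >= min_score
    ]


def main():
    # Imported here so relevant_results can be reused without the SDK installed.
    from tavily import TavilyClient

    client = TavilyClient("tvly-YOUR_API_KEY")
    response = client.search(
        "Who is Leo Messi?",
        search_depth="advanced",
        max_results=5,
        include_answer=True,
    )
    # "answer" is only present because include_answer was set.
    print(response.get("answer"))
    for title, url in relevant_results(response):
        print(f"{title}: {url}")


# To run against the live API: main()
```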

Tavily Extract

You can access Tavily Extract in Python through the client’s extract function.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| urls (required) | str or list[str] | The URL (or URLs) you want to extract. If a list is provided, it must not contain more than 20 URLs. | |
| include_images | bool | Include a list of images extracted from the URLs in the response. | False |
| extract_depth | str | The depth of the extraction process. You may experience higher latency with "advanced" extraction, but it offers a higher success rate and retrieves more data from the URL (e.g., tables, embedded content). "basic" extraction costs 1 API Credit per 5 successful URL extractions, while "advanced" extraction costs 2 API Credits per 5 successful URL extractions. | "basic" |

Response format

The response object you receive will be in the following format:

| Key | Type | Description |
| --- | --- | --- |
| results | list[SuccessfulResult] | A list of extracted content. |
| failed_results | list[FailedResult] | A list of URLs that could not be processed. |
| response_time | float | The search result response time. |

Successful Results

Each successful result in the results list will be in the following SuccessfulResult format:

| Key | Type | Description |
| --- | --- | --- |
| url | str | The URL of the webpage. |
| raw_content | str | The raw content extracted. |
| images (optional) | list[str] | This is only available if include_images is set to True. A list of extracted image URLs. |

Failed Results

Each failed result in the failed_results list will be in the following FailedResult format:

| Key | Type | Description |
| --- | --- | --- |
| url | str | The URL that failed. |
| error | str | An error message describing why it could not be processed. |

Example
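
A minimal sketch of an extract call that respects the 20-URL limit and separates successful from failed results. The example URL and the batch_urls and split_outcomes helpers are illustrative assumptions, not part of the SDK.

```python
def batch_urls(urls, size=20):
    """Split a list of URLs into chunks of at most `size` (the per-call limit)."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def split_outcomes(response):
    """Return ({url: raw_content}, {url: error}) from an extract response."""
    extracted = {item["url"]: item["raw_content"] for item in response["results"]}
    failed = {item["url"]: item["error"] for item in response["failed_results"]}
    return extracted, failed


def main():
    # Imported here so the helpers can be reused without the SDK installed.
    from tavily import TavilyClient

    client = TavilyClient("tvly-YOUR_API_KEY")
    urls = ["https://en.wikipedia.org/wiki/Lionel_Messi"]  # example URL
    for batch in batch_urls(urls):
        response = client.extract(urls=batch, extract_depth="basic")
        extracted, failed = split_outcomes(response)
        for url, content in extracted.items():
            print(url, len(content))
        for url, error in failed.items():
            print("failed:", url, error)


# To run against the live API: main()
```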

Tavily Hybrid RAG

Tavily Hybrid RAG is an extension of the Tavily Search API built to retrieve relevant data from both the web and an existing database collection. This way, a RAG agent can combine web sources and locally available data to perform its tasks. Additionally, data queried from the web that is not yet in the database can optionally be inserted into it. This will allow similar searches in the future to be answered faster, without the need to query the web again.

Parameters

The TavilyHybridClient class is your gateway to Tavily Hybrid RAG. There are a few important parameters to keep in mind when you are instantiating a Tavily Hybrid Client.

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| api_key | str | Your Tavily API Key. | |
| db_provider | str | Your database provider. Currently, only "mongodb" is supported. | |
| collection | str | A reference to the MongoDB collection that will be used for local search. | |
| index | str | The name of the collection’s vector search index. | |
| embeddings_field (optional) | str | The name of the field that stores the embeddings in the specified collection. This field MUST be the same one used in the specified index. This will also be used when inserting web search results in the database using our default function. | "embeddings" |
| content_field (optional) | str | The name of the field that stores the text content in the specified collection. This will also be used when inserting web search results in the database using our default function. | "content" |
| embedding_function (optional) | function | A custom embedding function (if you want to use one). The function must take in a list[str] corresponding to the list of strings to be embedded, as well as an additional string defining the type of document. It must return a list[list[float]], one embedding per input string. If no function is provided, defaults to Cohere’s Embed. Keep in mind that you shouldn’t mix different embeddings in the same database collection. | |
| ranking_function (optional) | function | A custom ranking function (if you want to use one). If no function is provided, defaults to Cohere’s Rerank. It should return an ordered list[dict] where the documents are sorted by decreasing relevancy to your query. Each returned document will have two properties: content, which is a str, and score, which is a float. The function MUST accept the following parameters: query (str), the query you are executing, which receives the query parameter of your search call; documents (list[dict]), the documents returned by your Hybrid RAG call that you want to sort, each with a content (str) and a score (float) property; and top_n (int), the number of results to return after ranking, which receives the max_results value. | |

Methods

search(query, max_results=10, max_local=None, max_foreign=None, save_foreign=False, **kwargs)

Performs a Tavily Hybrid RAG query and returns the retrieved documents as a list[dict] where the documents are sorted by decreasing relevancy to your query. Each returned document will have three properties - content (str), score (float), and origin, which is either local or foreign.

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| query | str | The query you want to search for. | |
| max_results | int | The maximum number of total search results to return. | 10 |
| max_local | int | The maximum number of local search results to return. | None, which defaults to max_results. |
| max_foreign | int | The maximum number of web search results to return. | None, which defaults to max_results. |
| save_foreign | Union[bool, function] | Save documents from the web search in the local database. If True is passed, our default saving function (which only saves the content str and the embedding list[float]) will be used. If False is passed, no web search result documents will be saved in the local database. If a function is passed, that function MUST take in a dict as a parameter and return another dict. The input dict contains all properties of the returned Tavily result object. The output dict is the final document that will be inserted in the database. You are free to add to it any fields that are supported by the database, as well as remove any of the default ones. If this function returns None, the document will not be saved in the database. | False |

Additional parameters can be provided as keyword arguments. The keyword arguments supported by this method are search_depth, topic, include_raw_content, include_domains, and exclude_domains; they behave as described in the search parameters above.

Setup

MongoDB setup

You will need to have a MongoDB collection with a vector search index. You can follow the MongoDB Documentation to learn how to set this up.

Cohere API Key

By default, embedding and ranking use the Cohere API, our recommended option. Unless you want to provide a custom embedding and ranking function, you’ll need to get an API key from Cohere and set it as an environment variable named CO_API_KEY.

If you decide to stick with Cohere, please note that you’ll need to install the Cohere Python package as well:

pip install cohere
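
If you rely on the Cohere defaults, checking for the environment variable before creating the client can surface a missing key early. A minimal sketch; cohere_key_available is a hypothetical helper, not part of either SDK.

```python
import os


def cohere_key_available(env=os.environ):
    """Return True if CO_API_KEY is set to a non-empty value."""
    return bool(env.get("CO_API_KEY"))


if not cohere_key_available():
    print("CO_API_KEY is not set; provide it or pass custom embedding and ranking functions.")
```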

Tavily Hybrid RAG Client setup

Once you are done setting up your database, you’ll need to create a MongoDB Client as well as a Tavily Hybrid RAG Client. A minimal setup would look like this:

from pymongo import MongoClient
from tavily import TavilyHybridClient

db = MongoClient("mongodb+srv://YOUR_MONGO_URI")["YOUR_DB"]

hybrid_rag = TavilyHybridClient(
    api_key="tvly-YOUR_API_KEY",
    db_provider="mongodb",
    collection=db.get_collection("YOUR_COLLECTION"),
    index="YOUR_VECTOR_SEARCH_INDEX",
    embeddings_field="YOUR_EMBEDDINGS_FIELD",
    content_field="YOUR_CONTENT_FIELD"
)

Usage

Once you create the proper clients, you can easily start searching. A few simple examples are shown below. They assume you’ve followed earlier steps. You can use most of the Tavily Search parameters with Tavily Hybrid RAG as well.

Simple Tavily Hybrid RAG example

This example will look for context about Leo Messi on the web and in the local database. Here, we get 5 sources, both from our database and from the web, but we want to exclude unwanted-domain.com from our web search results:

results = hybrid_rag.search("Who is Leo Messi?", max_results=5, exclude_domains=['unwanted-domain.com'])

Here, we want to prioritize the number of local sources, so we will get 2 foreign (web) sources, and 5 sources from our database:

results = hybrid_rag.search("Who is Leo Messi?", max_local=5, max_foreign=2)

Note: The sum of max_local and max_foreign can exceed max_results, but only the top max_results results will be returned.
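
The truncation described in the note can be illustrated with a small simulation. This is only a sketch of the behavior, assuming documents are ordered by their score field; it is not the library's implementation, which ranks with the configured ranking function, and merge_results is a hypothetical stand-in.

```python
def merge_results(local, foreign, max_results):
    """Combine local and foreign documents and keep the best max_results by score."""
    tagged = [dict(doc, origin="local") for doc in local]
    tagged += [dict(doc, origin="foreign") for doc in foreign]
    tagged.sort(key=lambda doc: doc["score"], reverse=True)
    return tagged[:max_results]


# Five local and two foreign candidates, but only five results are kept.
local_docs = [{"content": f"local {i}", "score": s} for i, s in enumerate([0.9, 0.8, 0.4, 0.3, 0.2])]
foreign_docs = [{"content": "web 0", "score": 0.95}, {"content": "web 1", "score": 0.5}]

for doc in merge_results(local_docs, foreign_docs, max_results=5):
    print(doc["origin"], doc["score"], doc["content"])
```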

Adding retrieved data to the database

If you want to add the retrieved data to the database, you can do so by setting the save_foreign parameter to True:

results = hybrid_rag.search("Who is Leo Messi?", save_foreign=True)

This will use our default saving function, which stores the content and its embedding.

Examples

Sample 1: Using a custom saving function

You might want to add some extra properties to documents you’re inserting or even discard some of them based on custom criteria. This can be done by passing a function to the save_foreign parameter:

from datetime import datetime

def save_document(document):
    if document['score'] < 0.5:
        return None  # Do not save documents with low scores

    return {
        'content': document['content'],

        # Save the title and URL in the database
        'site_title': document['title'],
        'site_url': document['url'],

        # Add a new field
        'added_at': datetime.now()
    }

results = hybrid_rag.search("Who is Leo Messi?", save_foreign=save_document)

Sample 2: Using a custom embedding function

By default, we use Cohere for our embeddings. If you want to use your own embeddings, you can pass a custom embedding function to the TavilyHybridClient:

def my_embedding_function(texts, doc_type): # doc_type will be either 'search_query' or 'search_document'
    return my_embedding_model.encode(texts)

hybrid_rag = TavilyHybridClient(
    # ...
    embedding_function=my_embedding_function
)
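
For a function that runs without any external model, the sketch below follows the same contract: it takes a list[str] plus a document type string and returns one list[float] per input. The hashed bag-of-words scheme and the 64-dimension size are illustrative assumptions, not a recommended embedding; the dimension must match your vector search index.

```python
import hashlib
import math

DIMENSIONS = 64  # assumption: must match the dimensions of your vector search index


def toy_embedding_function(texts, doc_type):
    """Embed each string as a normalized bag-of-words hash vector.

    doc_type is accepted to satisfy the expected signature but is ignored here.
    """
    embeddings = []
    for text in texts:
        vector = [0.0] * DIMENSIONS
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % DIMENSIONS
            vector[bucket] += 1.0
        # Scale to unit length; an empty string stays an all-zero vector.
        norm = math.sqrt(sum(value * value for value in vector)) or 1.0
        embeddings.append([value / norm for value in vector])
    return embeddings
```

It would be passed as embedding_function=toy_embedding_function when creating the client.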

Sample 3: Using a custom ranking function

Cohere’s rerank model is used by default, but you can pass your own function to the ranking_function parameter:

def my_ranking_function(query, documents, top_n):
    return my_ranking_model.rank(query, documents, top_n)

hybrid_rag = TavilyHybridClient(
    # ...
    ranking_function=my_ranking_function
)
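
A self-contained function that satisfies the same contract is sketched below. It scores each document by the fraction of query words it contains, which is an illustrative assumption rather than a recommended ranking method.

```python
def keyword_ranking_function(query, documents, top_n):
    """Rank documents by the fraction of query words they contain.

    Returns at most top_n dicts with "content" and "score", best first.
    """
    query_words = set(query.lower().split())
    ranked = []
    for document in documents:
        words = set(document["content"].lower().split())
        overlap = len(query_words & words) / len(query_words) if query_words else 0.0
        ranked.append({"content": document["content"], "score": overlap})
    ranked.sort(key=lambda doc: doc["score"], reverse=True)
    return ranked[:top_n]
```

It would be passed as ranking_function=keyword_ranking_function when creating the client.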