# RAG Evaluation

> Effortless Web-Based RAG Evaluation Using Tavily and LangGraph

# Introduction

Every data science enthusiast knows that a vital first step in building a successful model or algorithm is having a reliable evaluation set to measure against. In the rapidly evolving landscape of **Retrieval-Augmented Generation (RAG)** and AI-driven search systems, high-quality evaluation datasets are crucial.

In this article, we introduce an agentic workflow designed to **generate** subject-specific, dynamic **evaluation datasets**, enabling precise validation of the performance of web-search-augmented agents.

**Known RAG evaluation datasets**, such as [HotPotQA](https://hotpotqa.github.io), [CRAG](https://github.com/facebookresearch/CRAG), and [MultiHop-RAG](https://github.com/yixuantt/MultiHop-RAG), have been pivotal in benchmarking and fine-tuning models. However, these datasets primarily focus on evaluating performance with **static, pre-defined document sets**. As a result, they fall short when it comes to evaluating **web-based RAG systems**, where data is dynamic, contextual, and ever-changing.

This gap presents a significant challenge: how do we effectively test and refine RAG systems designed for real-world web search scenarios? **Enter the Real-Time Dataset Generator for RAG Evals**, an agentic tool that uses [Tavily’s Search Layer](https://tavily.com) and the **LangGraph framework** to create diverse, relevant, and dynamic datasets tailored specifically to web-based RAG agents.

# How does it work?

*[Figure: Web Evaluation Graph]*

The Real-Time Dataset Generator follows a systematic workflow to create high-quality evaluation datasets:

1. **Input.** The workflow begins with user-provided inputs.
2. **Search query generation.** If a subject is provided (e.g., “NBA Basketball”), the system **generates a set of search queries**. This ensures the queries are tailored to gather high-quality, recent, and subject-specific information.
3. **Web search with Tavily.** This step keeps the dataset grounded in **current and relevant information**, which matters most for web search RAG evaluation, where up-to-date data is crucial. This is the **heart of the RAG Dataset Generator**: it transforms queries into actionable, high-quality data that forms the foundation of the evaluation set.
4. **Q&A pair generation.** For each website returned by Tavily, the system generates question-answer pairs using a **map-reduce paradigm** to process multiple sources efficiently. This step is implemented using LangGraph’s Send API.
5. **Saving the evaluation set.** Finally, the generated dataset is saved either **locally** or to **LangSmith**, based on the input configuration.

The result is a well-structured, subject-specific evaluation dataset, ready for use in advanced evaluation methods like **LLM-as-a-Judge**.

# Learn More

Want to dive deeper into web-based RAG evaluation? Check out these resources:

- Read our detailed blog post about generating dynamic RAG evaluation datasets `/Eyalbenba/tavily-web-eval-generator`
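
# Example: a map-reduce sketch

To make the question-answer generation step of the workflow more concrete, here is a minimal sketch of the map-reduce shape it describes: one task per page returned by the search step (map), then a merge into a single dataset (reduce). It is an illustration only, not the generator's actual code. It uses Python's standard library thread pool as a stand-in for LangGraph's Send API, and the names `Page`, `QAPair`, `generate_qa`, and `build_dataset` are hypothetical. The Q&A logic is a placeholder where a real pipeline would presumably call an LLM.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in for one page returned by the search step."""
    url: str
    content: str


@dataclass(frozen=True)
class QAPair:
    """One evaluation example, tagged with the page it came from."""
    question: str
    answer: str
    source_url: str


def generate_qa(page: Page) -> list[QAPair]:
    """Map step: produce Q&A pairs for a single page.

    Placeholder logic (assumption): the first sentence becomes the answer.
    A real generator would prompt an LLM with the page content instead.
    """
    first_sentence = page.content.split(".")[0].strip()
    if not first_sentence:
        return []  # nothing usable on this page
    return [
        QAPair(
            question=f"What does {page.url} report?",
            answer=first_sentence,
            source_url=page.url,
        )
    ]


def build_dataset(pages: list[Page], max_workers: int = 4) -> list[QAPair]:
    """Fan out one task per page (map), then flatten the results (reduce)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `pages` in its results
        per_page = list(pool.map(generate_qa, pages))
    return [pair for pairs in per_page for pair in pairs]


# Made-up sample pages for illustration
pages = [
    Page("https://example.com/a", "Team A won the final. More text."),
    Page("https://example.com/b", ""),
    Page("https://example.com/c", "Player C signed a new contract."),
]
dataset = build_dataset(pages)
```

Because each page is processed independently, the map step parallelizes naturally, and a page that yields nothing simply contributes an empty list to the reduce step. In a LangGraph implementation the fan-out would be expressed through the graph rather than a thread pool, but the overall data flow should look similar.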