Extract page content tool

ExtractPageContentTool

Type: Auxiliary Tool Source: sgr_agent_core/tools/extract_page_content_tool.py

Extracts full detailed content from specific web pages using Tavily Extract API.

Parameters

reasoning (str) - why extract these specific pages
urls (list[str], 1-5 items) - list of URLs to extract full content from

Behavior

Extracts full content from specified URLs via TavilySearchService
Updates existing sources in context.sources with full content
For new URLs, adds them with sequential numbering
Returns formatted string with extracted content preview (limited by content_limit)

Usage

Call after web_search_tool to get detailed information from promising URLs found in search results.

Important warnings

Extracted pages may show data from different years or time periods than requested
Always verify that extracted content matches the temporal context of the question
If extracted content contradicts search snippet, prefer snippet for factual questions
For date or number questions, cross-check extracted values with search snippets

Configuration

search:
  tavily_api_key: "your-tavily-api-key"  # Required: Tavily API key
  tavily_api_base_url: "https://api.tavily.com"  # Tavily API URL
  content_limit: 1500  # Content character limit per source (truncates extracted content)

Example

agents:
  research_agent:
    search:
      content_limit: 2000  # Increase content limit for more detailed extraction
    tools:
      - "web_search_tool"
      - "extract_page_content_tool"