
Learning Agent Development with Google Gemini CLI (Part 4): How Does Gemini CLI Scrape Data?

In the first three articles, my partner Tam took us through Gemini CLI’s “security moat” (the sandbox), “perception scalpel” (the file system), and “layered memory brain” (the memory system). Our AI agent is now secure, precise, and memory-rich. But that’s not enough.

It’s still like a scholar cut off from the world: its knowledge is frozen in the past, and it cannot perceive the present. An AI that can’t connect to the internet can’t track the latest tech frameworks, can’t find the newest solutions on Stack Overflow, and can’t read up-to-date API documentation.

To break through this “information wall,” Tam will today reveal the AI’s “far-seeing eyes”: how the two pillar tools, google_web_search and web_fetch, empower Gemini models to perceive and process real-time web data.

Over to him.


  • by Tam

Chapter 1: Two Pillars of Web Access: Discovery and Extraction

Gemini CLI deconstructs complex web interaction tasks into two basic, complementary primitives:

  1. Discovery: google_web_search plays the role of an “AI Research Analyst,” responsible for finding answers across the entire internet and returning not a list of links but a summary report with cited sources.

  2. Extraction: web_fetch plays the role of a “Content Processing Assistant,” responsible for the deep processing of user-specified URLs: summarizing, comparing, and extracting specific information.

Combined, they are powerful. For example, first use google_web_search to find material on “Rust ownership model vs. C++ RAII”; then, once the AI returns several high-quality article links, use web_fetch to have it read those links in depth and generate a detailed comparison report for you, as in the example session below.
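As an illustration, such a two-step session might look like this (the prompts are examples and the URLs are placeholders, not real CLI output):

```
> Find recent, high-quality articles comparing Rust's ownership model with C++ RAII.
  [google_web_search runs; the model replies with a cited summary and several article URLs]

> Now fetch https://example.com/rust-ownership and https://example.com/cpp-raii and
  write a detailed comparison report of the two approaches.
  [web_fetch reads both pages; the model synthesizes the report]
```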

Chapter 2: Deep Dive into google_web_search: The Intelligent Leap from Search to Answer

Behind this tool lies one of the core technologies of modern LLM applications: Retrieval-Augmented Generation (RAG). The key point: retrieval is performed entirely by the Gemini API backend, not locally by the CLI.

The internal flow roughly works like this:

  1. The model receives a query that requires real-time information and decides to call the google_web_search tool.

  2. The CLI sends a request to the Gemini API with the “enable search” flag set (see the sketch after this list).

  3. The API backend triggers the RAG flow: query optimization, parallel searches, crawling and filtering of webpage content, chunking, and relevance ranking, ultimately building a temporary “knowledge base” that serves this one query.

  4. The model takes the user’s question as the instruction and this “knowledge base” as the context, generating a comprehensive answer with cited sources.
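To make step 2 concrete, here is a minimal sketch of the kind of request involved, written against the public @google/genai TypeScript SDK; the model name is illustrative, and this simplifies the CLI’s internals rather than reproducing its actual code:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function groundedSearch(query: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash", // illustrative model name
    contents: query,
    // Declaring the Google Search tool is the "enable search" flag:
    // the API backend then runs the entire RAG pipeline remotely.
    config: { tools: [{ googleSearch: {} }] },
  });

  // The answer comes back already grounded; the cited sources live in
  // the candidate's groundingMetadata.
  console.log(response.text);
  console.log(response.candidates?.[0]?.groundingMetadata?.groundingChunks);
}

groundedSearch("Compare the SSR performance of Next.js 14 and Nuxt 3").catch(console.error);
```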

This API-side RAG design delivers high-quality, efficient retrieval while sidestepping anti-crawler restrictions and protecting user privacy.

Chapter 3: Deep Dive into web_fetch: Flexibility and Robustness Under Dual-Mode Execution

Unlike google_web_search, web_fetch lets users specify the information sources themselves. Its core design is an “API first, local fallback” dual-mode execution strategy.

  1. Security confirmation: Before execution, the CLI extracts every URL and asks the user to confirm, ensuring full awareness of any web access.

  2. Mode 1 (API first): The CLI first tries to let the Gemini API backend fetch and process the specified URLs directly. This is the most efficient and reliable path.

  3. Mode 2 (local fallback): If the API cannot reach a URL because of firewalls, intranet addresses, and the like, the CLI triggers its local fallback: it retries the fetch from the user’s own machine using an HTTP library such as axios.

  4. Second call: After a successful local fetch, the CLI submits the crawled text as the tool’s execution result (a FunctionResponse), together with the user’s original instruction, back to the model, which then generates the final answer. A sketch of this path follows the list.
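Here is a minimal sketch of the fallback path under stated assumptions: the HTML stripping is deliberately crude, the model name and intranet URL are made up, and where the real CLI returns the text through a FunctionResponse inside the ongoing tool-call turn, this sketch simply inlines it as context:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Mode 2: fetch the URL from the local machine (e.g. an intranet address
// the API backend cannot reach) and reduce the HTML to plain text.
async function fetchLocally(url: string): Promise<string> {
  const res = await fetch(url);
  const html = await res.text();
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

// Second call: hand the crawled text back to the model together with the
// user's original instruction so it can generate the final answer.
async function answerFromUrl(url: string, instruction: string) {
  const pageText = await fetchLocally(url);
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash", // illustrative model name
    contents: `${instruction}\n\n--- Fetched page content ---\n${pageText}`,
  });
  console.log(response.text);
}

answerFromUrl("http://intranet.local/design-doc", "Summarize this page.").catch(console.error);
```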

This dual-mode design strikes an excellent balance between efficiency (the API mode) and robustness (the local fallback, which can reach intranet and other restricted resources).

Chapter 4: Advanced Usage and Prompt Engineering

To become an advanced user, you need to master the art of commanding these two web tools precisely through prompt engineering.

google_web_search Prompting Tips

Query quality determines answer quality. Make your intent and scope explicit: for example, ask “Compare the server-side rendering (SSR) performance of Next.js 14 and Nuxt 3 in 2024” instead of the vague “web frameworks.”

web_fetch Refined Instructions

Its capabilities go far beyond summarizing. You can instruct it to play specific roles, require specific output formats, and even perform complex cross-document synthesis. For example:

“Extract the name, company, and talk title of all speakers from this event page URL and format the output as a Markdown table.”

Conclusion: From “Reader” to Future Web “Actor”

Through explicit confirmation, privacy isolation, and sandbox integration, Gemini CLI’s web tools lay a solid foundation for security and privacy. Today, the CLI is an excellent “reader” and “analyzer” of web information.

Its future lies in becoming a web “actor.” We can foresee future tools that handle JavaScript-rendered pages and interact with forms and buttons, eventually evolving into autonomous web agents: given only a high-level goal, they decompose the task, visit multiple websites, synthesize the data, and deliver a complete solution.

All of this begins with two seemingly simple yet thoughtfully designed tools: google_web_search and web_fetch.

Found Tam’s analysis insightful? Give it a thumbs up and share it with friends who need it!

Follow my channel to explore the infinite possibilities of AI, going global, and digital marketing together.

Connecting AI to the internet is like wiring real-time senses into a wise brain.
