Building Tools for Agents: More Like Next-Gen SaaS Than Building for Humans

A Screenshot Tool Suddenly Made Me Re-understand SaaS

I’ve been increasingly feeling that CLIs might be making a sexy comeback.

A few years ago, this statement might have sounded like an engineer’s self-indulgent fantasy. Black windows, commands, parameters, errors – none of it sounds like the beginning of a “next-gen SaaS.” In recent years, when we talked about SaaS, the focus was mostly on interfaces, conversion, onboarding, dashboards, templates, collaboration, and retention. Once users came in, how could we make them think less, click more, and ideally, return the next day?

This logic is, of course, sound. User experience is crucial when people use software. Yet the command line never really went away, and the fact that even I can’t go a day without various CLIs is the best proof.

But recently, seeing the update for Apple Frames 4, it suddenly struck me that the users of some tools might no longer be just humans. Apple Frames is a very specific product: it adds official-style device frames to screenshots of Apple devices. Designers, writers, and developers use it, but you’d hardly call it a world-changing platform.

Interestingly, this time it didn’t just update its Shortcut, or add multi-color device frames and scaling; it also released an open-source Apple Frames CLI. More crucially, I checked the original MacStories article and confirmed this wasn’t a misinterpretation in a Redol summary. The original article explicitly states that Apple Frames CLI is for AI agents, and comes with Claude Code/Codex skills, allowing coding agents to batch process dozens or even hundreds of screenshots from any folder on a Mac. Author Federico Viticci also mentioned that the CLI itself was built by him using Claude Code and Codex, and can already be integrated into the same agentic workflow in Claude Code for testing iOS apps, taking screenshots, and adding frames.

A screenshot framing tool, developing a CLI, creating agent skills, and making it directly callable by Claude Code and Codex.

Is this a small thing? Yes, it is.

But it’s highly representative. Because it shows that software tools are starting to seriously serve a new type of user. This user isn’t a human sitting in front of a screen clicking buttons, but an Agent. Not the “intelligent agents” walking around in suits in promotional videos, nor the fantastical platform concepts drawn in PPTs. It’s the very concrete executor within Claude Code, Codex, Cursor, various automation scripts, and workflows.

It won’t appreciate how pretty your buttons are, doesn’t care if your empty state pages have warmth, and won’t think your product is more worth paying for just because you used a sophisticated gradient. What it cares about is whether the tool can be called, if the input is stable, if the output is clear, if it can determine what went wrong after an error, if there are permission boundaries, if it can be executed repeatedly, if it can recover after failure, and if the entire process can leave a record.

Honestly, if this trend continues, a portion of SaaS’s value propositions will be completely re-evaluated.

Humans Using Software vs. Agents Using Software: Not the Same Thing

Previously, when we built tools, the default assumption was that they were for humans.

Building tools for humans is fundamentally about reducing human operational costs. The interface needs to be clear, paths short, buttons easy to find, feedback timely, aesthetics not too shabby, and ideally, the product should have an “I understand you” vibe. Users come in from the homepage, register, log in, navigate, click menus, try features, and then ask customer service if they encounter issues. In this process, product managers have a lot of room to maneuver. You can use interactive guides, create templates, build case studies, visualize data, or hide complex workflows.

But when an Agent uses your tool, it’s usually not like that.

It might be called in the middle of a long task. Dozens of steps might have already run, with dozens more to go. It just needs to process an image, clean a batch of data, check a page, convert Markdown to HTML, generate a JSON report, or upload an article to a draft folder. It’s not in the mood to appreciate you; it just wants to know one thing: if this task is handed to you, will it reliably come back completed?

This is why I believe “building tools for Agents might be more like the next generation of SaaS than building tools for humans.”

This isn’t to say that UIs will be unnecessary in the future. Don’t misunderstand. Humans will always need to configure goals, approve permissions, accept results, handle exceptions, and pay for renewals. Without humans, software businesses would cease to exist. At least for now, we’re not at the stage where AI itself is swiping credit cards to buy SaaS; if that day comes, Stripe will probably need to hold a meeting first.

What I’m saying is that a new entry point for SaaS is emerging. In the past, SaaS competed on whether users would stay after logging in. In the future, a category of SaaS might compete not on whether users log in at all, but on whether their Agent will repeatedly hand tasks over to you.

This change is somewhat counter-intuitive.

Previously, product managers would worry about DAU, retention, page dwell time, and feature click-through rates. But in an Agent workflow, the more a user “likes” you, the less they might want to open you. Because they want you to quietly get the job done. A truly useful piece of infrastructure doesn’t need you to open it daily and praise its beautiful interface. You just want it to not break down.

For those building SaaS, this is both quite harsh and quite an opportunity.

The harsh part is that in the past, you relied on interfaces to capture user attention; in the future, users might bypass the interface entirely. The opportunity is that as long as you can become a reliable node in the task chain, users might not open you daily, but they might rely on you daily.

You might not capture attention, but you capture the task chain.

These two types of value are different. Attention value is suitable for content, communities, platforms, and consumer products. Task chain value is more suitable for tools, infrastructure, vertical SaaS, and automation nodes. What many ordinary entrepreneurs should really be thinking about might not be “Should I build the next Notion?”, but rather “Can I become the most reliable step in a certain type of Agent workflow?”

My Own Bottleneck: SERP API – Expensive, Yet Indispensable

I’ve felt this particularly strongly recently while planning my own automated daily website blog post update tasks.

If it’s just a human doing content planning, they can open Google, search a few keywords, look at a few pages, and roughly judge search intent, competitor titles, content depth, and page type. This process is slow, but humans can tolerate it. Browsers have ads, redirects, regional differences, and all sorts of messy SERP components, but humans can still make do with experience.

But once you want an Agent to do this reliably, the problem becomes entirely different.

An Agent can’t just rely on “I feel like Google results are roughly like this.” It needs stable SERP data, to know who the top ten pages are, what their titles are, what their snippets are, if there are ads, if there’s an AI Overview, and if there are modules like Reddit, YouTube, Shopping, People Also Ask. It also needs to be able to run repeatedly: run today, run tomorrow, run with different keywords, different countries, different languages.
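To make “stable SERP data” concrete, here is a minimal Python sketch of the kind of structured record an Agent might consume. Every name in it (`SerpResult`, `SerpSnapshot`, `has_ai_overview`, the module labels) is a hypothetical illustration, not the schema of any real SERP API.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class SerpResult:
    position: int
    url: str
    title: str
    snippet: str


@dataclass
class SerpSnapshot:
    # All field names are illustrative assumptions, not a real provider's schema.
    keyword: str
    country: str
    language: str
    results: list[SerpResult] = field(default_factory=list)
    has_ads: bool = False
    has_ai_overview: bool = False
    modules: list[str] = field(default_factory=list)  # e.g. "reddit", "people_also_ask"

    def top(self, n: int = 10) -> list[SerpResult]:
        """Return the first n results, ordered by position."""
        return sorted(self.results, key=lambda r: r.position)[:n]

    def to_json(self) -> str:
        """Serialize with sorted keys so repeated runs are easy to diff."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


snapshot = SerpSnapshot(
    keyword="screenshot framing tool",
    country="us",
    language="en",
    results=[
        SerpResult(2, "https://example.com/b", "Title B", "Snippet B"),
        SerpResult(1, "https://example.com/a", "Title A", "Snippet A"),
    ],
    modules=["people_also_ask"],
)
print([r.title for r in snapshot.top()])
```

The point of a fixed shape like this is that the same keyword, country, and language can be run today and tomorrow, and the Agent can compare the two snapshots field by field instead of guessing.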

This is where the most annoying thing comes in: SERP API.

I really hate it. To get stable SERP data, I have no choice but to use it. But after using it, I find the price incredibly expensive. 1000 searches cost $25. A human manually checking 20 keywords is just a bit tiring; if an Agent is let loose, 20 topics, each broken down into 5 search variations, followed by several rounds of validation, the costs quickly start to skyrocket.
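A rough back-of-the-envelope calculation shows how quickly this adds up. The price and the 20 topics with 5 variations come from the numbers above; the 3 validation rounds, the 30-day month, and the idea that each round re-runs every query are my own assumptions for illustration.

```python
PRICE_PER_1000_SEARCHES = 25.0  # USD, the price quoted above


def serp_cost(topics: int, variations: int, rounds: int, days: int = 1) -> float:
    """Estimate SERP API spend, assuming every round re-runs every query."""
    searches = topics * variations * rounds * days
    return searches * PRICE_PER_1000_SEARCHES / 1000


# rounds=3 and days=30 are assumed values, not figures from a real bill.
daily = serp_cost(topics=20, variations=5, rounds=3)
monthly = serp_cost(topics=20, variations=5, rounds=3, days=30)
print(f"daily: ${daily:.2f}, monthly: ${monthly:.2f}")
# prints "daily: $7.50, monthly: $225.00"
```

Under those assumptions, a single daily planning task already costs a few hundred dollars a month before any article has been written.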

This is a very typical Agent tool problem.

From a human perspective, search engine pages are free. You open a browser, enter keywords, see results, at most it costs you time. From an Agent’s perspective, stable, parseable, repeatable SERP data is not free; it’s infrastructure. You can hate that it’s expensive, but as long as you want your tasks to run reliably, you’ll find you can’t do without it.

This example is more solid than many grand narratives.

Because it clarifies the problem: the value of tools in the Agent era is often not about “having a beautiful backend,” but about “transforming an inherently unstable, uncontrollable action, cobbled together by human experience, into a stable, callable, billable, and auditable capability.”

Why can SERP API charge money? Not because its interface is pretty. It charges because it transforms chaotic search results into structured data that Agents can consume. If you think it’s expensive, it means the cost is real; if you have no choice but to use it, it means the value is also real.

This is where the next generation of SaaS is likely to emerge.

Not all opportunities look like a new platform. Some opportunities exist in these annoying, expensive, fundamental areas that become indispensable once automated.

Old Things Re-sequenced: API, CLI, Permissions, and Logs

So I’m reluctant to make “Agent-native tools” sound too mystical.

This isn’t some new magic conjured out of thin air. Many so-called new things in the Agent era are actually old things re-sequenced. APIs aren’t new, CLIs aren’t new, webhooks aren’t new, automation scripts aren’t new, plugin ecosystems aren’t new.

What’s new is that these things, which previously mainly served engineers and advanced users, are now starting to serve AI. Or rather, AI is allowing more ordinary users to indirectly leverage these previously engineering-centric capabilities.

Previously, a non-technical user who couldn’t write scripts would find it difficult to chain ten tools together. Now, they can say, “Help me plan today’s website blog topics, check SERP, filter keywords, generate an article outline, and then write it into my content spreadsheet,” and the Agent will attempt to call a bunch of tools behind the scenes.

But whether it can successfully make those calls depends on whether these tools have paved the way for “non-human callers.”

A SaaS that only has web buttons, no API, no stable export, no clear error feedback, no granular permissions, and no batch processing capability might be easy for humans to use, but very difficult for Agents. Conversely, even if a tool has a very simple interface, as long as its capabilities are clear enough, its interface stable enough, and its output clean enough, it can easily be integrated into various Agent workflows.

This is also why the small example of Apple Frames CLI caught my attention.

It’s not because the action of “screenshot framing” is particularly grand, but because it transformed an action that could originally be done manually by a human into a capability that an Agent can call. Where the screenshot is, how to identify device dimensions, where to output, how to batch process, how to be understood by Claude Code/Codex’s skills – once these things are clear, it can enter someone else’s workflow.

That’s the key.

When building tools for humans, you want to make people feel comfortable. When building tools for Agents, you want to make the system feel confident. Comfort comes from interface and experience; confidence comes from structure and boundaries.

This involves many things that previously seemed unsexy: stable input, structured output, exit codes, error codes, logs, idempotency, permissions, auditing, task status, recoverable processes. Before, they were quietly important in the backend, with the frontend handling the appearance and the backend providing the safety net. In the Agent era, they might become part of the product itself.
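Here is a minimal Python sketch of what a few of these look like in practice: structured output, explicit error codes, an exit code, and idempotency. The names (`run_task`, `idempotency_key`, `BAD_INPUT`) and the in-memory store are hypothetical; a real tool would persist its task records and document its own codes.

```python
import hashlib
import json

# Hypothetical exit codes; a real tool would document its own.
EXIT_OK = 0
EXIT_BAD_INPUT = 2

_completed: dict[str, dict] = {}  # in-memory stand-in for a persistent task store


def idempotency_key(payload: dict) -> str:
    """Derive a stable key from the input so the same request maps to the same task."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def run_task(payload: dict) -> tuple[int, dict]:
    """Return (exit_code, structured_result) instead of raising on expected failures."""
    if not isinstance(payload.get("text"), str):
        return EXIT_BAD_INPUT, {
            "status": "error",
            "error_code": "BAD_INPUT",
            "message": "payload must contain a string field 'text'",
        }
    key = idempotency_key(payload)
    if key in _completed:
        # Same input seen before: return the recorded result instead of redoing work.
        return EXIT_OK, {**_completed[key], "replayed": True}
    result = {
        "status": "ok",
        "task_id": key,
        "output": payload["text"].upper(),  # placeholder for the real capability
        "replayed": False,
    }
    _completed[key] = result
    return EXIT_OK, result


code, result = run_task({"text": "hello"})
print(code, json.dumps(result))
```

In a command-line wrapper, the JSON would go to stdout, logs to stderr, and `code` to `sys.exit`, so the calling Agent can tell success from failure without parsing prose.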

More accurately, they will become the reasons why a product can or cannot be trusted by an Agent.

Opportunities for Ordinary Developers Are Not in “All-in-One”

This leads to a very practical startup question: Do ordinary developers still have opportunities?

I think yes, but the opportunity is likely not in “building another comprehensive AI SaaS.” That path is too heavy. You’d need to build an account system, billing, team collaboration, permissions, dashboards, integrate a bunch of models, provide customer service, and compete with giants for distribution channels. In the end, the product looks complete, but then a user comes in and asks: “What’s the difference between you and ChatGPT Plus?”

That question stings.

But if you look at it from a different angle, the opportunities become much clearer. Ordinary developers can build a very narrow, stable, and easily callable small capability for Agents. Don’t immediately aim to build an “AI content platform”; you could start by creating a tool that reliably converts Markdown to WeChat Official Account HTML. Don’t immediately aim to build an “AI SEO toolkit”; you could start by creating a tool that reliably outputs page scraping, title/description checks, schema checks, and image alt checks as JSON. Don’t immediately aim to build an “AI e-commerce operations platform”; you could start by creating a small tool that reliably handles product image processing, file naming, alt text generation, and Shopify draft creation.
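As an illustration of how small such a capability can start, here is a sketch of the page-check idea using only Python’s standard library. It covers the title, meta description, and image alt checks and leaves out page scraping and schema checks; the function name `audit_html` and the output field names are my own assumptions.

```python
import json
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect the title, meta description, and images without alt text."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = None
        self.images_missing_alt = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt.append(attrs.get("src") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_html(html: str) -> dict:
    """Return a JSON-serializable report; field names are illustrative."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    return {
        "title": title,
        "has_title": bool(title),
        "has_description": bool(parser.description),
        "images_missing_alt": parser.images_missing_alt,
    }


sample = """<html><head><title>Demo page</title>
<meta name="description" content="A short demo."></head>
<body><img src="a.png" alt="Logo"><img src="b.png"></body></html>"""
print(json.dumps(audit_html(sample), indent=2))
```

The value is not in the parsing, which is trivial, but in the fact that the report has the same keys every time, so an Agent can act on it without reading a web page.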

Taken individually, these things don’t look like huge SaaS products; they might even seem a bit mundane. But they all share one commonality: they are callable capabilities within a workflow.

And callable capabilities, in the Agent era, might be more valuable than a beautiful backend.

In the past, much of SaaS’s business value came from “keeping people within my system.” You log into my backend, upload your data, collaborate on my pages, view results in my dashboard, and complete tasks within my ecosystem. This is a very typical platform mindset.

But Agent workflows will challenge this. Users might not want to enter your system; they just want to hand a task to an Agent, and then the Agent calls a dozen systems behind the scenes. Your product is just one node among them. This might sound like reduced visibility, but the business value isn’t necessarily lower.

Because once you become a critical node, users rely not on your interface, but on the stability of your capability. They don’t open you daily, but their processes call you daily.

This is also why I believe a very important form will emerge in the future, which we can temporarily call “capability SaaS.”

It doesn’t sell a complete system, but rather a clear, stable, callable, and billable small capability. This capability can be used by humans or by Agents. When humans use it, it has a simple interface; when Agents use it, it has an API, CLI, MCP server, schema, logs, permissions, and status callbacks. Humans see the results, Agents see the interfaces, and companies pay for whether this capability can be embedded in their processes long-term.

This is actually good news for independent developers.

Not everyone has the resources to build a large platform, but many can build a narrow capability. In fact, the narrower, the better. If it’s broad, you can’t compete with big tech; if it’s narrow, you can better understand the scenario, the failure points, and what users are willing to pay for.

AI-readable page auditing for cross-border independent websites, stable Markdown to WeChat HTML typesetting for Official Account authors, product image processing and pre-listing checks for Shopify sellers, SERP data cleaning and AI search citation monitoring for SEO teams, automated screenshot, screen recording, release note, and documentation image processing for developers – none of these are “cosmic-level platforms.” But they share a common characteristic: humans don’t want to do them repeatedly, Agents are well-suited to do them, and tools are needed in between.

That’s the opportunity.

But Don’t Overhype It as a Universal Opportunity

Of course, let’s not overhype this.

Building tools for Agents doesn’t mean you can just write any API and make money. There are several very real pitfalls.

Distribution is the first hurdle. Just because an Agent can call you doesn’t mean users know about you. You still need to solve customer acquisition. This could be through SEO, developer communities, template ecosystems, being listed on mainstream Agent platforms, or by breaking in with open-source tools. Standards are the second hurdle. Today everyone talks about MCPs, tools, skills, APIs, workflows, but the standards are still evolving. If you build too early, you might get caught by interface changes; if you build too late, the entry point might already be occupied by others.

Trust is the third hurdle. When an Agent calls a tool, it’s essentially delegating a portion of its execution authority. If your tool handles files, accounts, customer data, publishing permissions, or payment actions, users will be extremely cautious. Billing is the fourth hurdle. Previously, SaaS could charge per seat; Agent tools might be better suited for charging per task, call volume, processing volume, result quality, or workflow node. But how to make this billing model feel reasonable to users still needs to be figured out.

Then there’s substitutability. If your capability is too simple, and large models can do it themselves, or users can replace it with a few lines of script, then it will be difficult to charge. You must have some barrier to entry in terms of stability, scenario understanding, compliance, integration, and delivery quality.

So this direction isn’t an “everyone can get rich” scenario. I hate that kind of talk.

It’s more like a reminder: if you’re thinking of building AI tools now, don’t just focus on chat interfaces and comprehensive platforms. You can look back at those “unsexy” parts that people used to disdain. File processing, format conversion, pre-publication checks, screenshot generation, data cleaning, page auditing, permission control, log organization, report output – these things used to seem like scraps, but in Agent workflows, scraps might become essential junctions.

Whoever guards the junction has the opportunity to collect a toll. Of course, this “toll” cannot be collected by holding people hostage. If you truly rely on holding people hostage, users will eventually find a way around. It needs to rely on stability, affordability, ease of use, and control.

This is the old logic of tool-based SaaS.

AI hasn’t changed this old logic. It has merely made it prominent again.

Don’t Just Build Buttons for Humans, Also Pave a Path for Agents

So, back to the title.

Why do I say that building tools for Agents might be more like the next generation of SaaS than building tools for humans?

Because when Agents become the executors, the value of SaaS will expand from “making humans comfortable while they operate” to “making sure tasks are reliably completed.” When users no longer personally click every step, a product’s callability, recoverability, and auditability will become part of the experience. As more and more workflows are chained together by Agents, a small tool, if reliable enough, could become a fixed node in many processes.

This isn’t a particularly romantic future; it’s even a bit plain. Not every next-gen SaaS will look like something out of a sci-fi movie. Many might just be a stable interface, a clean JSON, a reliable command, a small tool with clearly defined permission boundaries.

But business opportunities are often like this.

The most money isn’t always where the most buzz is. Sometimes, truly valuable areas are those others deem too small, too fragmented, too unsexy, and thus don’t bother to take seriously. Once Agents truly start working, these small areas will suddenly become infrastructure.

I’m increasingly convinced of one thing. In the future, when building tools, you shouldn’t just ask one question: Will humans use it?

You also need to ask: Will Agents dare to hand the job over to you?

This question might determine many new opportunities for small SaaS.

As for how exactly to do it, which parts are worth doing, which are just pseudo-needs, which can generate revenue, and which are only suitable for open-source lead generation – that’s another topic, and one I still have some reservations about. But at least for today, let’s put this statement out there.

Don’t just build buttons for humans.

Also, pave a path for Agents.


© 2026 Mr'Guo
