In Setting Destination, we pictured Zedge‑style “smart sidekicks” co‑creating, curating, and enriching.(Zedge Blog) This follow‑up is the pragmatic sequel: how agentic systems are actually built, where they shine, where they stall, and how they’re evolving - without losing the curiosity that got us here.

Agentic AI = systems that plan, act with tools, remember, and adapt. The ecosystem now ships this out of the box: OpenAI’s Responses API + Agents SDK add first‑party tools (web search, file search, computer use), session state, handoffs, and tracing; Anthropic’s Computer Use lets Claude operate the desktop when APIs don’t exist. These aren’t just chatbots; they’re action systems with audit trails.

The Evolution of Agents (2023 → 2025)

That arc echoes the design framing in Stop Prompting, Start Designing: 5 Agentic AI Patterns That Actually Work - but below we go deeper on how to ship each pattern. DEV Community

Patterns that actually work (flows + build tips)

1) ReAct — Interleave reasoning with tool‑use

Default for retrieval/browsing/short, multi‑step tasks; reduces hallucinations by forcing evidence.

Implementation tips

Use typed tool schemas (reject malformed URLs/IDs); allow‑list tools and params.
Log every think/act/observe step; keep tool I/O in your trace for audit/RCA.
ReAct remains a strong baseline for knowledge tasks and simple web flows. arXiv

2) Reflexion - Self‑critique, then retry

Cheap quality lift when you can validate an answer (tests, heuristics, rules).

Implementation tips

Cap to 1-2 retries; require a pass/fail predicate (unit test, regex, score).
Store reflections as episodic notes (short‑term memory), distinct from long‑term.
Expose reflections in traces so humans learn too. arXiv GitHub

3) Plan‑then‑Act (ReWOO) - Reason first, fetch later

Cuts token cost/latency by separating “thinking” from “gathering.”

Implementation tips

Treat evidence as a typed artifact (JSON/Markdown) with size caps; cache by (tool, query, params).
Fail fast if a planned tool isn’t reachable; re‑plan with cheaper backoff.
Reported 5× token efficiency on HotpotQA in the paper’s evaluations. arXiv OpenReview

4) Tree‑of‑Thoughts / LATS - Explore multiple paths

For thorny planning/coding/web tasks: generate candidate “thoughts,” prune, expand; pick the best path.

Implementation tips

Bound breadth/depth; use a value function (LLM self‑rating or heuristics).
Persist each branch’s artifacts to tracing for RCA.
LATS unifies reasoning + acting + planning via MCTS; effective for programming/web. arXiv+1

5) Multi‑agent “teams” - Router → specialists → critic/approver

Compose skills, isolate risk, keep approvals where it matters.

Implementation tips

Keep each specialist’s tool surface area tiny; typed contracts only.
Use handoffs for clean delegation; approvals appear in the trace by default.
Track per‑role SLAs (latency, quality, intervention rate). OpenAI Platform OpenAI GitHub

Toolbox (single glance)

PydanticAI - “FastAPI‑for‑agents” DX: typed tools & strict I/O validation, generic type‑safe agents, and pydantic‑graph under the hood. Great for contracts and guardrails.
OpenAI Responses API + Agents SDK - stateful runs, first‑party tools (web/file search, computer use), handoffs, guardrails, and built‑in tracing.
Anthropic Computer Use - desktop control via screenshot/mouse/keyboard for legacy UI flows (beta). Anthropic

Where it shines (public wins to learn from)

Consumer support at scale - Klarna: assistant handled ~two‑thirds of chats in month one (≈2.3M conversations), with major time/cost gains. Narrow scope + approvals = speed + trust. PR Newswire
Enterprise coding - GitHub Copilot Coding Agent: assign issues, open PRs, and tag you for review; available across Business/Enterprise plans; runs in a secure, cloud‑hosted dev environment. GitHub Docs The GitHub Bl
E‑commerce copilot - Shopify Sidekick: merchant assistant launched via early access, now sharing production lessons on evals and GRPO training. The Verge Shopify
Cloud IDE autonomy - Amazon Q Developer: agentic coding inside VS Code (actions on your behalf), with iterative updates across code review, tests, and docs. Amazon Web Services, Inc.+1

Pattern fit: ReAct/ReWOO for support & analytics; multi‑agent + approvals for coding & enterprise workflows; computer‑use when you must cross legacy UI.

Reality checks (a/k/a the limits)

Benchmarks that simulate “real work” remain sobering:

WebArena & VisualWebArena: realistic websites + visually grounded tasks expose brittleness in multi‑step UI flows and visual grounding. arXiv+1
AgentBench: across 8 environments, failures cluster in long‑term reasoning/decision‑making - agents need tighter scaffolding and evaluation. arXiv
SWE‑bench: real GitHub issues remain hard; naive autonomy disappoints without tests, sandboxes, and review gates. arXiv

Design consequences

Stay narrow: crisp goals, bounded steps, explicit success checks.
Prefer short horizons: many simple plans >> one heroic chain.
Trace everything: if you can’t inspect plan/tool calls, you can’t improve them.

Memory that lasts (without monstrous prompts)

Long‑lived assistants need OS‑style virtual memory: short‑term working context ↔ recall store ↔ long‑term archive. MemGPT formalizes this “paging” model so agents remember across sessions—without stuffing the prompt. arXiv

Security & governance (practical, not preachy)

Least‑privilege tools, time‑boxed creds; allow‑list domains/APIs for browsing.
Typed tool I/O and schema validation on every hop (your easiest win).
Approval gates for state‑changing actions; immutable traces for audit/RCA.

So… what does “limits of unlimited” really mean?

The promise is huge - but unbounded autonomy is a myth (for now). Agents don’t win because they do everything. They win because they do some things extremely well:

Tight scopes with observable steps.
Patterns that balance planning with evidence.
Typed contracts that make tool‑use safe and predictable.
Human approvals where stakes are high.

Where we go next (optimistic, and specific)

UI‑native agents become normal: “computer‑use” grows up, pairing with design systems so agents reliably click/type/verify across apps. Anthropic
Memory becomes skill: OS‑like paging (MemGPT) merges with reusable “procedures” learned from past runs-less prompting, more doing. arXiv
Evaluation goes CI: web/visual/coding benchmarks turn into continuous gates; we ship behind tests, not vibes. arXiv+2arXiv+2
Governance productizes: tracing dashboards + NIST profiles + OWASP checks feel like linting for autonomy - limits as a feature, not a bug. OpenAI GitHub

We can still be dreamers. The trick is dreaming with guardrails - so the horizon keeps getting closer, one reliable handoff at a time.

From Prompts to Partners: The Limits of Unlimited Agentic AI

Patterns that actually work (flows + build tips)

1) ReAct — Interleave reasoning with tool‑use

2) Reflexion - Self‑critique, then retry

3) Plan‑then‑Act (ReWOO) - Reason first, fetch later

4) Tree‑of‑Thoughts / LATS - Explore multiple paths

5) Multi‑agent “teams” - Router → specialists → critic/approver

Toolbox (single glance)

Where it shines (public wins to learn from)

Reality checks (a/k/a the limits)

Memory that lasts (without monstrous prompts)

Security & governance (practical, not preachy)

So… what does “limits of unlimited” really mean?

Where we go next (optimistic, and specific)

Posted by Gediminas Sadaunykas

Patterns that actually work (flows + build tips)

1) ReAct — Interleave reasoning with tool‑use

2) Reflexion - Self‑critique, then retry

3) Plan‑then‑Act (ReWOO) - Reason first, fetch later

4) Tree‑of‑Thoughts / LATS - Explore multiple paths

5) Multi‑agent “teams” - Router → specialists → critic/approver

Toolbox (single glance)

Where it shines (public wins to learn from)

Reality checks (a/k/a the limits)

Memory that lasts (without monstrous prompts)

Security & governance (practical, not preachy)

So… what does “limits of unlimited” really mean?

Where we go next (optimistic, and specific)

Share with friends

Posted by Gediminas Sadaunykas

Stay in the loop