Categories
ai programming

Unrolling the Codex agent loop | OpenAI

This article describes how Codex is implemented behind the scenes. Codex CLI is an open-source software engineering agent that helps developers read, write, and reason about code by combining an LLM with tooling. Unlike Claude Code, another popular agentic tool, Codex is open source, which makes its design choices and internal mechanics transparent and easier to learn from.

As of now, I still prefer Claude Code for day-to-day usage, but Codex is catching up fast.

We’ve introduced the Codex agent loop and walked through how Codex crafts and manages its context when querying a model. Along the way, we highlighted practical considerations and best practices that apply to anyone building an agent loop on top of the Responses API.
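For readers who want to experiment, here is a minimal, unofficial sketch of such a loop in Python. This is not Codex's actual implementation, just the general shape the article describes: call the model via the Responses API, execute any tool calls, feed the outputs back, and repeat until the model stops calling tools. The single `run_shell` tool is my own simplification; Codex's real tool set is richer.

```python
import json
import subprocess
from openai import OpenAI

client = OpenAI()

# One hypothetical tool for illustration; Codex exposes more than this.
TOOLS = [{
    "type": "function",
    "name": "run_shell",
    "description": "Run a shell command and return its combined output.",
    "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

def run_shell(command: str) -> str:
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=30)
    return result.stdout + result.stderr

def agent_loop(task: str) -> str:
    context = [{"role": "user", "content": task}]
    while True:
        response = client.responses.create(
            model="gpt-4.1",  # any tool-capable model works here
            input=context,
            tools=TOOLS,
        )
        # Carry the model's turn forward so the next call sees it.
        context += response.output
        calls = [item for item in response.output
                 if item.type == "function_call"]
        if not calls:
            return response.output_text  # no more tool calls: we're done
        for call in calls:
            args = json.loads(call.arguments)
            context.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": run_shell(args["command"]),
            })
```

The key context-management detail is that every model output item and every tool result gets appended to the same `input` list, so the model always sees the full trajectory on the next turn.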

Source: Unrolling the Codex agent loop | OpenAI

Categories
ai programming

MCP Apps: Extending servers with interactive user interfaces | mcp blog

A new extension to the MCP protocol enables support for interactive UIs in MCP hosts. The initial extension specification supports only text/html content. Instead of just returning raw text, MCP servers can now deliver UI in the form it is intended to be visualized in, e.g. as an HTML chart.

Today we’re introducing the proposal for the MCP Apps Extension (SEP-1865) to standardize support for interactive user interfaces in the Model Context Protocol. This extension addresses one of the most requested features from the MCP community and builds on proven work from MCP-UI and OpenAI Apps SDK – the ability for MCP servers to deliver interactive user interfaces to hosts. MCP Apps Extension introduces a standardized pattern for declaring UI resources, linking them to tools, and enabling bidirectional communication between embedded interfaces and the host application.
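As a concrete illustration of that pattern, here is a sketch in plain Python dicts rather than any specific SDK. The field names are my reading of the proposal and should be treated as assumptions, not the final specification:

```python
# Sketch of the shapes involved, not a working MCP server. Field names are
# my interpretation of SEP-1865 and may differ from the final spec.

# 1. A UI resource declared by the server. The initial spec supports only
#    text/html content, served under a ui:// scheme URI.
ui_resource = {
    "uri": "ui://charts/sales",
    "mimeType": "text/html",
    "text": "<!doctype html><html><body><canvas id='chart'></canvas></body></html>",
}

# 2. A tool that links to the UI resource, so the host knows to render the
#    embedded interface instead of (or alongside) raw text output.
tool_declaration = {
    "name": "show_sales_chart",
    "description": "Render the quarterly sales chart.",
    "inputSchema": {"type": "object", "properties": {}},
    "_meta": {"ui": {"resourceUri": "ui://charts/sales"}},  # assumed linkage key
}
```

The bidirectional-communication part of the extension would then let the embedded HTML talk back to the host, e.g. to trigger further tool calls, which is what distinguishes this from simply returning a static HTML blob.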

Source: MCP Apps: Extending servers with interactive user interfaces | mcp blog

Categories
ai

Announcing Imagen 4 Fast and the general availability of the Imagen 4 family in the Gemini API – Google Developers Blog

Google’s Imagen 4 Fast is now generally available. I tested it to generate a vintage-style stamp featuring Singapore in the 60s – inspired by one of the sample prompts.

Not bad, except that a Satay stall would not normally look like a pushcart. Try it in Google AI Studio.

Discover Imagen 4 Fast, Google’s new speed-optimized text-to-image model, now generally available with Imagen 4 and 4 Ultra in the Gemini API.
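If you prefer the API to AI Studio, here is a minimal sketch using the google-genai Python SDK. The model ID is my best guess at the Imagen 4 Fast identifier, so verify it against the current model list:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_images(
    model="imagen-4.0-fast-generate-001",  # assumed ID for Imagen 4 Fast
    prompt="A vintage postage stamp of 1960s Singapore, featuring a satay stall",
    config=types.GenerateImagesConfig(number_of_images=1),
)

# Write the first generated image to disk.
with open("stamp.png", "wb") as f:
    f.write(response.generated_images[0].image.image_bytes)
```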

Source: Announcing Imagen 4 Fast and the general availability of the Imagen 4 family in the Gemini API – Google Developers Blog

Categories
ai programming

I vibe-coded a new UI for ERP 2.0

In preparation for ERP 2.0, I recently changed my IU to the new OBU, but opted out of the touchscreen installation. LTA provides an option to download their ERP 2.0 app, which is supposed to function like the OBU screen to some extent. It includes functions such as viewing ERP charging details, receiving traffic alerts, and managing payment options from your smartphone.

The ERP 2.0 app only runs in landscape mode and requires pairing with the OBU.

The app displays pretty much the same info as the physical touchscreen, which I thought was quite basic. I understand the need not to distract drivers with too much info, but surely it can provide more useful content than a glorified digital clock?

As I had a bit of time on a Saturday afternoon, I decided to see what I could prototype using v0.dev, the AI-powered tool from Vercel. This was my first time using v0.dev, and this is the result:

You can play around with the live app deployed to Vercel. This was done entirely in v0.dev in an hour, with default settings on the free plan. It even works in portrait mode 🙂

You can see how I built the app by looking at the chat log. The app pretty much worked without errors at each generation step. This is definitely a more polished way to develop quick frontend prototypes, compared to the early days of copy-pasting code and asking the model to make fixes.

Having said that, I did have to give quite specific instructions at some points to guide the model to make changes exactly the way I wanted, for example to get the progress ring to start at the bottom instead of the top.

You can start building in Vercel for free, with USD5 of monthly credits. The app that I created cost me USD0.62, so I still have enough credit to play around with before it runs out.

Categories
ai

The Future of Work with AI Agents — Insights from a Stanford Study | by Cobus Greyling | Jun, 2025 | Medium

The original article from Stanford’s SALT Lab explores how AI agents are rapidly transforming the workplace. This Medium post highlights some key points from the study.

The research suggests that AI Agents could fundamentally reshape core human competencies, shifting the focus from information management to interpersonal strengths.

Training and Teaching Others ranks highly in human agency, which, according to the study, implies that human involvement is required and AI should only be used to augment the task. My job is safe for now 🙂

Source: The Future of Work with AI Agents — Insights from a Stanford Study | by Cobus Greyling | Jun, 2025 | Medium

Categories
ai

Building Effective AI Agents \ Anthropic

This article gives a good definition of AI agents vs workflows, since the two terms are commonly conflated.

Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
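To make the distinction concrete, here is a minimal sketch of my own (not from the article), with a hypothetical call_llm stub standing in for any model API. In the workflow, the code path is fixed in advance; in the agent, the model decides what happens next:

```python
def call_llm(prompt: str) -> str:
    # Stub: wire up your model API of choice here.
    raise NotImplementedError("replace with a real LLM call")

# Workflow: LLM calls orchestrated through a predefined code path.
def summarize_then_translate(text: str) -> str:
    summary = call_llm(f"Summarize: {text}")
    return call_llm(f"Translate to French: {summary}")

# Agent: the LLM dynamically directs its own process and tool usage,
# looping until it decides the task is done.
def agent(task: str, tools: dict) -> str:
    context = task
    while True:
        decision = call_llm(
            f"Task so far: {context}\nTools: {list(tools)}\n"
            "Reply with 'tool: <name> <input>' or 'done: <answer>'"
        )
        if decision.startswith("done:"):
            return decision.removeprefix("done:").strip()
        _, name, arg = decision.split(maxsplit=2)
        context += f"\n{name} returned: {tools[name](arg)}"
```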

Source: Building Effective AI Agents \ Anthropic

Categories
ai programming

Coding agents have crossed a chasm // flurries of latent creativity

Coding agents or helpers have improved by leaps and bounds in the past few months. However, when people ask me whether they still need to learn coding, my answer is an unequivocal yes. Coding in the future – or even right now – is going to look very different from coding in the past. But it is still a skill that needs to be learnt. Think of it like photography: almost everyone carries a camera in their pocket today, but that doesn’t automatically make them a skilled photographer.

Everyone can generate a snake game in an instant nowadays. These are the low-hanging fruit – much like taking a photo with your phone doesn’t require much skill. But as soon as you move up to something more complex, an untrained person’s ability to do so breaks down rather quickly.

This quote from the article is particularly relevant today:

Without a solid foundation, you can’t distinguish between good AI suggestions and plausible-sounding nonsense. Therefore, a solid understanding of how to write and structure code remains really important to use this technology best.

Source: Coding agents have crossed a chasm // flurries of latent creativity

Categories
3D ai

HoloPart: Generative 3D Part Amodal Segmentation

This research is somewhat related to my early-day work on 3D model segmentation. The difference in approach is that while I was developing an algorithmic method to achieve segmentation, they tap newer generative AI techniques such as attention mechanisms and diffusion models. The results are impressive, as shown in the examples.


generative 3D part amodal segmentation – decomposing a 3D shape into complete, semantically meaningful parts.

Source: HoloPart: Generative 3D Part Amodal Segmentation

Categories
ai programming

Amazon introduces SWE-PolyBench, a multilingual benchmark for AI Coding Agents | AWS DevOps & Developer Productivity Blog

It is always good to have diversity in benchmarks, to avoid over-reliance on and overfitting to a single benchmark suite. AWS just released SWE-PolyBench, their benchmark for evaluating AI coding agents’ ability to navigate and understand complex codebases.

Unlike SWE-Bench, which covers only Python code, SWE-PolyBench is designed to also cover languages like Java, JavaScript, and TypeScript.

Today, Amazon introduces SWE-PolyBench, the first industry benchmark to evaluate AI coding agents’ ability to navigate and understand complex codebases, introducing rich metrics to advance AI performance in real-world scenarios. SWE-PolyBench contains over 2,000 curated issues in four languages. In addition, it contains a stratified subset of 500 issues (SWE-PolyBench500) for the purpose of rapid experimentation. SWE-PolyBench evaluates the performance of AI coding agents through a comprehensive set of metrics: pass rates across different programming languages and task complexity levels, along with precision and recall measurements for code/file context identification. These evaluation metrics can help the community address challenges in understanding how well AI coding agents can navigate through and comprehend complex codebases.
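As an illustration of the retrieval-style metrics mentioned above (my own sketch, not code from the benchmark), file-context precision and recall reduce to simple set operations over the files an agent touched versus the files changed in the ground-truth fix:

```python
def file_context_metrics(retrieved: set[str],
                         relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of the files an agent identified/edited
    versus the files actually changed in the ground-truth patch."""
    if not retrieved or not relevant:
        return 0.0, 0.0
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

# Example: the agent edited 3 files, 2 of which match the 2-file ground truth.
precision, recall = file_context_metrics(
    {"src/app.py", "src/db.py", "README.md"},
    {"src/app.py", "src/db.py"},
)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.67 recall=1.00
```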

Perhaps unsurprisingly, Amazon Q Developer Agent currently leads on this benchmark’s leaderboard. It remains to be seen how widely adopted this new benchmark will be.

Categories
ai cloud

DeepSeek-R1 now available as a fully managed serverless model in Amazon Bedrock

In my previous writeup, I wrote that you had to spend a lot (using GPUs) or put up with very slow performance (using CPUs) if you wanted to use DeepSeek-R1 on AWS. Not anymore. AWS now offers DeepSeek-R1 as a base model starting from 10 Mar 2025 (in selected regions). Check out the AWS blog for a demo walkthrough.

Just take note that you may have to increase the maximum output length in order to complete your request – this applies to most reasoning models. In my test, the output stopped abruptly halfway, as the default output token length is only 4096. Extending the output length solves the problem.
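Here is a minimal sketch of that fix using boto3’s Converse API, raising maxTokens well above the 4096 default. The model ID shown is the cross-region inference profile as I recall it, so verify it against your Bedrock console:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.deepseek.r1-v1:0",  # assumed DeepSeek-R1 inference profile ID
    messages=[{
        "role": "user",
        "content": [{"text": "Prove that the square root of 2 is irrational."}],
    }],
    # Reasoning models emit long chains of thought before the final answer;
    # the default 4096 output tokens is often not enough, so raise the cap.
    inferenceConfig={"maxTokens": 16384},
)

# The response may interleave reasoning blocks with text blocks,
# so print only the blocks that carry final text.
for block in response["output"]["message"]["content"]:
    if "text" in block:
        print(block["text"])
```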