keep moving – a blog on current news and trends in software, hardware and technology

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Post author By tongwing
Post date February 17, 2026

A seemingly trivial question has been making the rounds recently:

“I want to wash my car. The car wash is 50 meters away. Should I walk or drive?”

Several leading LLMs were asked this question and their responses were confident and polished, but wrong.

At first glance, the correct answer seems obvious. If you want to wash your car and the car wash is 50 meters away, you must drive the car there. Walking defeats the purpose. Yet some models overcomplicate the scenario, interpret it abstractly, or default to generic health or environmental advice (walking is healthier, driving such a short distance is inefficient, etc). The reasoning drifts away from the core constraint: the car needs to be at the car wash.

This is the new strawberry test for LLMs.

We’ve seen similar examples before. Asked whether there is a seahorse emoji – there isn’t – models sometimes confidently assert that there is, even describing what it looks like. This isn’t a hallucination in the dramatic sense; it’s a failure of calibration. The model generates what is statistically plausible rather than what is true.

These cases are interesting precisely because they are not edge cases. They are mundane, everyday scenarios. They don’t require advanced logic or domain expertise. They require grounding – an alignment between language generation and basic world constraints.

What’s happening?

Large Language Models do not “understand” situations in the way humans do. They operate by predicting likely token sequences given prior context. When a question resembles a familiar pattern (“Should I walk or drive?”), the model retrieves the statistical shadow of similar discussions – health, carbon footprint, convenience – rather than reconstructing the physical constraints of the specific scenario.

These examples highlight something deeper than hallucination. They expose architectural limitations:

LLMs optimize for linguistic plausibility, not truth.
They lack persistent grounding in physical reality.
They may substitute culturally common advice for situational reasoning.
Their “world model” is probabilistic, not causal.

None of this makes LLMs useless – far from it. But it does clarify their boundaries. The more a task depends on implicit physical constraints or unstated but obvious real-world logic, the more brittle purely statistical reasoning becomes.

Examples like these reminds us of an important principle: fluency is not understanding. Confidence is not correctness. And plausibility is not the same as truth.

Incidentally, on the original car washing problem, some models like Claude Opus 4.6 and Gemini 3 do gave the correct response.

ai programming

Unrolling the Codex agent loop | OpenAI

Post author By tongwing
Post date January 24, 2026

This article describes the implementation of Codex behind the scenes. Codex CLI is an open-source software engineering agent that helps developers read, write, and reason about code by combining LLM with tooling. Unlike Claude Code, another popular agentic tool, Codex is open source, which makes its design choices and internal mechanics transparent and easier to learn from.

As of now, I still prefer Claude Code for day-to-day usage, but Codex is catching up fast.

We’ve introduced the Codex agent loop and walked through how Codex crafts and manages its context when querying a model. Along the way, we highlighted practical considerations and best practices that apply to anyone building an agent loop on top of the Responses API.

Source: Unrolling the Codex agent loop | OpenAI

privacy programming security

Threat Actors Expand Abuse of Microsoft Visual Studio Code

Post author By tongwing
Post date January 22, 2026

This is a relatively new attack technique that specifically targets developers. It typically begins under the guise of a technical interview, where candidates are asked to review a codebase by cloning a Git repository. Unknown to the victim, the repository is malicious.

The attack leverages a lesser-known feature of Visual Studio Code called Tasks. When a developer opens the cloned project and trusts the workspace, VS Code can automatically interpret and execute configurations defined in tasks.json. This behavior allows a backdoor or malicious command to run without the developer explicitly initiating it.

Notably, many developers – including myself – are unaware of how powerful and potentially dangerous this feature can be when abused. This makes the attack particularly effective, as it exploits implicit trust in development tools rather than traditional software vulnerabilities.

One variant of the malware deployed by this technique targets the crypto wallets on the developer’s machine.

Jamf Threat Labs uncovers North Korean hackers exploiting VS Code to deploy backdoor malware via malicious Git repositories in the Contagious Interview campaign

Source: Threat Actors Expand Abuse of Microsoft Visual Studio Code

cloud legal

Build production-ready applications without infrastructure complexity using Amazon ECS Express Mode | AWS News Blog

Post author By tongwing
Post date November 30, 2025

AWS just introduced a new wizard UI called Amazon ECS Express Mode. Amazon ECS is Amazon’s own container orchestration service. Compared to EKS, it is much simpler. However there are still quite a number of concepts (eg. task definition, task, service etc) to learn and steps to perform. ECS Express Mode simplifies it to just a single page or single CLI command. It’s like the equivalent of LightSail for EC2.

Honestly, I am not sure who the intended audience is for ECS Express Mode. It is ok for spinning up something quickly for demonstrations or testing purposes. However, for teams operating production-ready ECS clusters, the required configurations are typically more complex, eg. multiple container images, more sophisticated autoscaling strategies, and various networking and storage requirements etc. A more useful feature might be to import docker compose configuration directly.

Amazon ECS Express Mode provides a simplified interface to the Amazon ECS service resource with new integrations for creating commonly used resources across AWS. ECS Express Mode automatically provisions and configures ECS clusters, task definitions, Application Load Balancers, auto scaling policies, and Amazon Route 53 domains from a single entry point.

programming

Software Failures and IT Management’s Repeated Mistakes – IEEE Spectrum

Post author By tongwing
Post date November 26, 2025

This IEEE article talks about the failures of software projects. It is timely, as the world looks to AI as the panacea to all problems. Arguably, talent problems might be alleviated by proper use of AI tools – though even that is not a guarantee.

Cautionary tales like the Phoenix’s payroll meltdown reminds us that software project failures can have real-world consequences. It is a sombre reminder that complex software projects still need to be properly planned, managed and executed, even if we have much better tools today and beyond.

As I heard from a wise man, it’s a people, not a software problem.

Why do software failures persist despite soaring IT budgets? Dive into the complexities that keep success elusive.

Source: Software Failures and IT Management’s Repeated Mistakes – IEEE Spectrum

ai programming

MCP Apps: Extending servers with interactive user interfaces | mcp blog

Post author By tongwing
Post date November 23, 2025

New extension to MCP protocol enables support for interactive UI for MCP hosts. The initial extension specification supports only text/html content. Instead of just returning raw text, MCP servers can now deliver UI in the way that is intended to be visualized, eg. in the form of a HTML chart.

Today we’re introducing the proposal for the MCP Apps Extension (SEP-1865) to standardize support for interactive user interfaces in the Model Context Protocol.This extension addresses one of the most requested features from the MCP community and builds on proven work from MCP-UI and OpenAI Apps SDK – the ability for MCP servers to deliver interactive user interfaces to hosts.MCP Apps Extension introduces a standardized pattern for declaring UI resources, linking them to tools, and enabling bidirectional communication between embedded interfaces and the host application.

Source: MCP Apps: Extending servers with interactive user interfaces | mcp blog

Announcing Imagen 4 Fast and the general availability of the Imagen 4 family in the Gemini API – Google Developers Blog

Post author By tongwing
Post date August 16, 2025

Google’s Imagen 4 Fast is now generally available. I tested it to generate a vintage-style stamp featuring Singapore in the 60s – inspired by one of the sample prompts.

Not bad, except that a Satay stall would not normally look like a pushcart. Try it in Google AI Studio.

Discover Imagen 4 Fast, Google’s new speed-optimized text-to-image model, now generally available with Imagen 4 and 4 Ultra in the Gemini API.

Source: Announcing Imagen 4 Fast and the general availability of the Imagen 4 family in the Gemini API – Google Developers Blog

ai programming

I vibe-coded a new UI for ERP 2.0

Post author By tongwing
Post date July 6, 2025

In preparation for ERP 2.0, I recently changed my IU to the new OBU, but opted out of the touchscreen installation. LTA provides an option to download their ERP 2.0 app, which is supposed to function like the OBU screen to some extent. It includes functions like ability to view ERP charging details, traffic alerts, and manage payment options via your smartphone.

The ERP 2.0 app only runs in landscape mode and requires pairing with the OBU.

The app displays pretty much the same info as the physical touchscreen, which I thought is quite basic. I understand the need to not distract drivers with too much info, but surely it can provide more useful content than function as a glorified digital clock?

As I had a bit of time on a Saturday afternoon I decided to see what I can prototype using v0.dev, the AI-powered tool from Vercel. This is my first time using v0.dev and this is the result:

You can play around with the live app deployed to Vercel. This was done entirely in v0.dev in an hour with default settings with the free plan. It even works in portrait mode 🙂

You can see how I build the app by looking at the chat log. The app pretty much works without errors in each generation step. This is definitely a more polished way to develop quick frontend prototypes, compared to the early days of copy-pasting code, and asking it to make fixes.

Having said that, I do have to give quite specific instructions at some point to guide the model to make changes in the exact way that I want, for example to get the progress ring to start at the bottom instead of the top.

You can start building in Vercel for free, with USD5 of monthly credits. The app that I created cost me USD0.62. So I still have enough credit to play around before it runs out.

The Future of Work with AI Agents — Insights from a Stanford Study | by Cobus Greyling | Jun, 2025 | Medium

Post author By tongwing
Post date June 27, 2025

The original article from Stanford’s SALT Lab explores how AI agents are rapidly transforming the workplace. This medium post highlights some key points from the study.

The research suggests that AI Agents could fundamentally reshape core human competencies, shifting the focus from information management to interpersonal strengths.

Training and Teaching Others ranks highly in human agency, which accordingly to the study, implies that human involvement is required and AI should only be used to augment the task. My job is safe for now 🙂

Source: The Future of Work with AI Agents — Insights from a Stanford Study | by Cobus Greyling | Jun, 2025 | Medium

Building Effective AI Agents \ Anthropic

Post author By tongwing
Post date June 18, 2025

This article gives a good definition of AI agents vs workflows, since many use commonly conflate the 2 terms.

Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

Source: Building Effective AI Agents \ Anthropic