Field notes / 2026

The log/

What we read while we build.

Short notes from a small Paris studio. Model choice, open-source infrastructure, product taste, benchmarks, systems that run, and the strange new shape of companies built around software.

Open-source when the system needs ownership/

The question is not ideology. The question is where control matters.

For some builds, managed platforms are the right move. They let you ship faster and remove noise. For other builds, the core system should be close to the metal: model gateway, data store, queue, logs, evals, deployment.

We like open-source when it makes the product more understandable. When the system is central to the business, ownership is not a luxury. It is how you debug, adapt, and keep moving.

Infra · Self-hosted · Control

How we choose a model for a build/

The best model is the one that fits the system, not the leaderboard.

We start with the job: classify, write, search, reason, extract, call tools, or decide. Then we test the failure modes. Latency, price, context, privacy, deployment, and eval quality matter as much as raw intelligence.

A small local model can be perfect for one step. A frontier model can be worth it for a hard decision. Most real products need routing, not model worship.
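One way to read "routing, not model worship" is as a small selection function: pick the cheapest model that clears your own eval bar and latency budget for the job at hand. The sketch below is only an illustration under assumptions. The model names, costs, latencies, and eval scores are all hypothetical placeholders, and `route` is not taken from any particular gateway or library.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str                     # hypothetical label, not a real model ID
    cost_per_call: float          # illustrative units
    p95_latency_ms: int           # illustrative measurement
    eval_score: dict[str, float]  # score per job type, from your own evals


# Hypothetical catalogue: every number here is made up for illustration.
CATALOGUE = [
    ModelOption("small-local", 0.001, 80,
                {"classify": 0.94, "extract": 0.90, "decide": 0.61}),
    ModelOption("frontier", 0.050, 1200,
                {"classify": 0.96, "extract": 0.95, "decide": 0.91}),
]


def route(job: str, min_score: float, max_latency_ms: int,
          catalogue: list[ModelOption] = CATALOGUE) -> ModelOption:
    """Return the cheapest model that meets the eval bar and latency budget."""
    candidates = [
        m for m in catalogue
        if m.eval_score.get(job, 0.0) >= min_score
        and m.p95_latency_ms <= max_latency_ms
    ]
    if not candidates:
        # No model fits: surface it instead of silently picking the biggest one.
        raise LookupError(f"no model fits job={job!r} within the constraints")
    return min(candidates, key=lambda m: m.cost_per_call)


print(route("classify", min_score=0.90, max_latency_ms=200).name)
print(route("decide", min_score=0.90, max_latency_ms=2000).name)
```

With these placeholder numbers, the classification step lands on the small local model and the hard decision lands on the frontier model. The point of the design is that the constraints, not the leaderboard, make the choice.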

Cost · Evals · Routing · Latency

Benchmarks are inputs, not decisions/

MLPerf Inference v6.0 is useful because it looks more like deployment.

MLCommons released MLPerf Inference v6.0 with new and updated datacenter tests, including large-language-model and reasoning workloads. The interesting signal is not a single winner. It is the direction: inference is becoming a systems problem.

For builders, the useful question is practical. Can this stack serve the workload at the right latency, price, observability, and failure rate? A benchmark can start the conversation. It cannot finish it.
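The practical question above can be turned into a check against your own load-test numbers rather than published benchmark results. The following is a minimal sketch, assuming you already have per-request latencies, a failure count, and a total cost from a trial run. `ServingTarget` and `check_stack` are hypothetical names, and observability is left out because it does not reduce to a single number.

```python
from dataclasses import dataclass


@dataclass
class ServingTarget:
    p95_latency_ms: float
    cost_per_1k_requests: float
    max_failure_rate: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def check_stack(latencies_ms: list[float], failures: int,
                total_cost: float, target: ServingTarget) -> dict[str, bool]:
    """Compare one trial run against the target; True means the target is met."""
    n = len(latencies_ms)
    if n == 0:
        raise ValueError("need at least one request to evaluate")
    return {
        "latency": p95(latencies_ms) <= target.p95_latency_ms,
        "price": 1000 * total_cost / n <= target.cost_per_1k_requests,
        "failure_rate": failures / n <= target.max_failure_rate,
    }


# Illustrative numbers only: 100 requests, 5 of them slow, 1 failed.
run = [100.0] * 95 + [900.0] * 5
target = ServingTarget(p95_latency_ms=200, cost_per_1k_requests=25.0,
                       max_failure_rate=0.02)
print(check_stack(run, failures=1, total_cost=2.0, target=target))
```

A benchmark can suggest which stacks are worth putting through a check like this. The verdict still has to come from your own workload.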

"real-world scenarios for AI deployments"

Source: MLCommons. Published: April 1, 2026.
Benchmarks · Inference · Reasoning · Systems

Agents need supervision by design/

Autonomy is useful when the boundaries are clear.

A good agent does not pretend to be magic. It knows its tools, its budget, its confidence, and the moments where it should stop and ask.

The product work is in the rails: approvals, logs, replay, escalation, memory, rollback. That is where trust is built: not in the prompt, but in the system around it.
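Some of these rails can be sketched as a gate that every proposed action passes through before it runs. The example below is an assumption-laden illustration, not a reference design. `Rails`, `Verdict`, the tool names, and the thresholds are all hypothetical, and it covers only tool allowlists, budget, confidence, approvals, and logging. Replay, memory, and rollback are out of scope here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    RUN = "run"
    ASK_HUMAN = "ask_human"
    STOP = "stop"


@dataclass
class Rails:
    allowed_tools: set[str]
    budget: float                      # total spend allowed for this task
    min_confidence: float              # below this, escalate to a person
    needs_approval: set[str] = field(default_factory=set)
    spent: float = 0.0
    log: list[dict] = field(default_factory=list)

    def review(self, tool: str, cost: float, confidence: float) -> Verdict:
        """Decide whether a proposed action runs, escalates, or is refused."""
        if tool not in self.allowed_tools:
            verdict = Verdict.STOP           # unknown tool: never run it
        elif self.spent + cost > self.budget:
            verdict = Verdict.STOP           # would exceed the budget
        elif tool in self.needs_approval or confidence < self.min_confidence:
            verdict = Verdict.ASK_HUMAN      # stop and ask
        else:
            verdict = Verdict.RUN
            self.spent += cost               # only approved runs consume budget
        # Every decision is logged, including refusals, so it can be audited.
        self.log.append({"tool": tool, "cost": cost,
                         "confidence": confidence, "verdict": verdict.value})
        return verdict


rails = Rails(allowed_tools={"search", "send_email"}, budget=1.0,
              min_confidence=0.7, needs_approval={"send_email"})
print(rails.review("search", cost=0.4, confidence=0.9))
print(rails.review("send_email", cost=0.1, confidence=0.99))
print(rails.review("delete_db", cost=0.0, confidence=1.0))
```

The gate lives outside the model on purpose: the agent proposes, the system decides, and the log records both.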

Agents · Human in the loop · Trust
