I attended AI Engineer Paris 2025, and here are the key takeaways, tools, and concepts I picked up, in no particular order. The event attracted lots of great startups and big companies from the EU and US, so the talent density was high!

Current state:

Observability all day, every day

The agentic world seems to be maturing. Multiple companies are building observability features into their existing products; it feels like table stakes at this point. Take Sentry, for example: their logic is that if you are already a customer using their product for bug tracking and web-app observability, it's only natural to add features that let you debug and observe your AI agents too. It almost feels like this has already been commoditized. In reality, it is very similar to web-app analytics: it's difficult to innovate when what you provide is graphs, error codes, execution-time stats, etc. Those are super important and much needed, of course, but so are the analytics in a simple web app.

The question is: which product will you use for Agent Observability? The answer will differ depending on your team/company size, of course, but it's easy to see the observability space for your agents becoming fragmented, with your team using 5+ products. It almost feels like the perfect time for someone to build the “Segment” of Agent Observability: send your data to one provider to guarantee capture, then plug in whatever visualizer or end product you want. [inner_monologue_starts] Maybe I should build that product!? [inner_monologue_ends]
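To make that “one pipe, many sinks” idea concrete, here is a minimal sketch (not any vendor's actual product) that instruments agent steps with OpenTelemetry and ships every span to a single OTLP collector, which can then fan the data out to whichever backends you like. The endpoint, service name, and attribute keys are assumptions for illustration.

```python
# Minimal sketch: send all agent telemetry to one OTLP collector, then fan out
# to any backend (Sentry, Datadog, a custom visualizer) from the collector.
# The endpoint URL, service name, and attribute keys are illustrative assumptions.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "my-agent"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent")

def run_agent(task: str) -> str:
    # One span per agent run, child spans per step/tool call.
    with tracer.start_as_current_span("agent.run") as run_span:
        run_span.set_attribute("agent.task", task)
        with tracer.start_as_current_span("agent.tool_call") as tool_span:
            tool_span.set_attribute("tool.name", "web_search")
            result = "tool output placeholder"
        return result

print(run_agent("summarize yesterday's errors"))
```

Swapping visualizers then becomes a collector-config change rather than a re-instrumentation project.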

DIY Vertical Cloud

I was amazed by how many companies are selling their own bare-metal cloud for specialized workloads. It seems that in the last 5 years so many projects were open-sourced that you can basically recreate a mini version of AWS that specializes in, e.g., AI coding workloads or running GitHub Actions, and get >50% better execution times. Of course, it's much harder than that, but seeing the pitches from some startups, it almost feels approachable to rent bare-metal servers from a German provider, run microVMs on them, and build a product on top.

MCP Ecosystem Maturity and Monetization

Really happy to see that the MCP side of things is maturing as well. A lot of solutions tackle fundamental problems like discovering a server's available actions and, most importantly, authorizing MCP servers against third-party services. The best one I saw was Apify's product, where they are trying to build the Substack/Shopify of MCP servers: giving creators monetization tools and giving developers/users an easy way to pay for and use their MCP servers.
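For context on what the servers being monetized actually look like, here is a minimal sketch using the official MCP Python SDK's FastMCP helper. The server name and tool are made up; discovery comes from the protocol itself, since MCP clients list a server's tools (names, descriptions, input schemas), and the auth/monetization layer that products like Apify's provide would sit in front of something like this.

```python
# Minimal MCP server sketch using the official Python SDK (package: "mcp").
# Server name and tool are made-up examples; clients discover the tool
# (name, description, input schema) via MCP's tool-listing handshake.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-weather")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a (fake) weather forecast for a city."""
    return f"Forecast for {city}: sunny, 24°C"

if __name__ == "__main__":
    # Serves over stdio by default, so an MCP client (e.g. an agent host)
    # can spawn it, list its tools, and call them.
    mcp.run()
```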

The Future/Trends to watch

I often come back to William Gibson's line: “The future is already here – it's just not very evenly distributed.”

  • Observability for Agents in every product that has a dashboard
  • New tooling for MCPs (from deploying to maintaining to making money)
  • Purpose-built agents for existing products (e.g., Datadog agent for taking action on an error threshold)

In short, I am feeling really optimistic about AI Engineering tooling over the next year. After the disappointment that was GPT-5, people seem to have gone back to focusing on moving the needle on the fundamental building blocks.

Companies & Tools that got me interested

Docker Hub MCP Server

A New Way to Discover, Inspect, and Manage Container Images

MicroVM

A microVM is a lightweight virtual machine that sits between containers and full VMs: it provides VM-like security and isolation with near-container efficiency. Firecracker, built on top of KVM, is a leading implementation.
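Tying this back to the DIY vertical cloud idea above, here is a rough sketch of booting a Firecracker microVM by talking to its HTTP API over a local Unix socket. It assumes a firecracker process is already listening on the socket, and the kernel image, rootfs path, and boot args are placeholders.

```python
# Rough sketch: boot a Firecracker microVM via its HTTP API on a Unix socket.
# Assumes `firecracker --api-sock /tmp/firecracker.socket` is already running;
# the kernel/rootfs paths and boot args below are placeholders.
import http.client
import json
import socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that speaks to Firecracker's API over its Unix socket."""
    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def put(conn: UnixHTTPConnection, path: str, body: dict) -> int:
    conn.request("PUT", path, json.dumps(body), {"Content-Type": "application/json"})
    resp = conn.getresponse()
    resp.read()
    return resp.status  # 204 means the call succeeded

conn = UnixHTTPConnection("/tmp/firecracker.socket")
put(conn, "/boot-source", {
    "kernel_image_path": "./vmlinux.bin",
    "boot_args": "console=ttyS0 reboot=k panic=1",
})
put(conn, "/drives/rootfs", {
    "drive_id": "rootfs",
    "path_on_host": "./rootfs.ext4",
    "is_root_device": True,
    "is_read_only": False,
})
put(conn, "/actions", {"action_type": "InstanceStart"})  # actually boots the VM
```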

Roo Code

Roo’s specialized modes stay on task and ship great code. Open source and works with any model.

🤖 cagent 🤖

A powerful, easy-to-use, customizable multi-agent runtime that orchestrates AI agents with specialized capabilities and tools, as well as the interactions between them.

LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

daytona.io

Secure and Elastic Infrastructure for Running Your AI-Generated Code.

Concepts I learned

Thinking traces

Thinking traces are the hidden reasoning steps an LLM takes before producing an output. OpenAI's new Responses API gives structured access to these traces (as reasoning items with summaries), which makes debugging, auditing, and model interpretability easier.
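As a minimal sketch with the OpenAI Python SDK: for reasoning models the Responses API returns structured reasoning items with summaries rather than the full raw trace, and the model name and exact response fields below are assumptions worth checking against the current docs.

```python
# Sketch: request reasoning summaries via the Responses API (OpenAI Python SDK).
# Model name and the exact shape of reasoning items are assumptions; reasoning
# models return summarized reasoning items, not the raw chain of thought.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="o4-mini",
    input="A train leaves at 9:40 and arrives at 11:05. How long is the trip?",
    reasoning={"effort": "medium", "summary": "auto"},
)

for item in resp.output:
    if item.type == "reasoning":
        for s in item.summary:
            print("reasoning summary:", s.text)
    elif item.type == "message":
        print("answer:", item.content[0].text)
```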

Context rot

Context rot refers to the drop in LLM performance as context inputs get longer. Even if the task stays constant, accuracy deteriorates as input length increases. Research highlights this as a key challenge for scaling context windows.

Needle in a haystack

The test involves hiding a small fact (the “needle”) inside a very long passage (the “haystack”). Then the model is asked to retrieve that fact. It benchmarks how effectively models use their context windows.
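A toy version of the test is easy to run yourself. The sketch below uses the OpenAI chat completions API, but any model client would do; the filler sentence, needle, depth, and model are arbitrary choices, and real benchmarks sweep many context lengths and needle positions.

```python
# Toy needle-in-a-haystack probe: bury one fact at a chosen depth inside filler
# text and check whether the model retrieves it. Filler, needle, depth, and
# model are arbitrary; real benchmarks sweep many lengths and depths.
from openai import OpenAI

client = OpenAI()

filler = "The sky was grey and the streets were quiet that morning. " * 2000
needle = "The secret passcode for the vault is 4172. "
depth = 0.5  # 0.0 = start of the haystack, 1.0 = end

pos = int(len(filler) * depth)
haystack = filler[:pos] + needle + filler[pos:]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": haystack + "\n\nWhat is the secret passcode for the vault?",
    }],
)
answer = resp.choices[0].message.content
print(answer)
print("needle found:", "4172" in answer)
```

Sweeping the multiplier on `filler` while keeping the question fixed is also a crude way to watch context rot set in.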

Speech diarization

Speech diarization is the process of splitting audio into segments by speaker. It answers the question “who spoke when?” in a conversation. This is key for meetings, transcription, and smart assistants.
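As a concrete example, here is a sketch using pyannote.audio's pretrained pipeline, one common open-source option; the pipeline name, audio file, and Hugging Face token requirement are assumptions to verify against the pyannote docs.

```python
# Sketch: "who spoke when" with pyannote.audio's pretrained diarization pipeline.
# Pipeline name, audio path, and the Hugging Face token requirement are
# assumptions; check the pyannote docs for current model names and gating.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_xxx",  # gated model: needs a Hugging Face access token
)

diarization = pipeline("meeting.wav")

# Each track is a (start, end) segment attributed to an anonymous speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s - {turn.end:6.1f}s  {speaker}")
```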