Most AI work is still presentation. I am interested in systems that hold up under real use.
I work on applied AI systems, research automation, and production-oriented infrastructure:
- systems that evaluate, compare, and validate models
- workflows that turn research into usable decisions
- product and infrastructure patterns that survive contact with reality
Focus areas:
- AI evaluation and benchmarking
- agentic and research workflows
- realtime AI applications
- edge-native application architecture
- VPS-based infrastructure for deployed systems
- practical validation of emerging AI capabilities
I am most interested in questions like:
- How do you evaluate whether a model or workflow is actually useful?
- What makes an AI system robust enough to operate beyond a prototype?
- Which parts of an AI workflow should be automated, and which should remain human-led?
- How do infrastructure choices shape product quality, speed, and reliability?
- aa-llm-benchmarking: Cloudflare Worker app for exploring and visualizing Artificial Analysis benchmark data.
- realtime-voice-assistant: Lightweight realtime voice assistant prototype focused on practical voice AI interaction patterns.
- cb-api-chatgpt-plugin: Early example of connecting conversational interfaces to real APIs.
- The New Coder Class: Gen X, Rebooted. Fast Builders.
A book about how AI shortens the distance between judgment and execution, why value is moving toward problem framing and orchestration, and why the curious operator is becoming a central figure in modern knowledge work.
Right now I am focused on:
- evaluation and benchmarking as a product capability
- research automation as a practical operating layer
- AI systems that combine product thinking with real infrastructure
- building with enough rigor to separate signal from hype
- Website: diegoromero.es
- Book: The New Coder Class
- LinkedIn: diegoromerosm
- Hugging Face: dromerosm
- X: @diegoromerosm



