How to Use AI Coding Agents for Exhaustive Infrastructure Benchmarking

Use AI coding agents to benchmark complex infrastructure problems exhaustively. This workflow automates the testing of many candidate solutions, such as different database engines or index types, so you can find the best-performing configuration with little manual effort.

Tools Used

Codex

OpenAI's cloud-based AI software engineering agent that can execute code, run tests, and handle complex multi-file tasks autonomously.

Step-by-Step Guide
1. Identify the Problem and Define Success

Start with a challenging engineering problem, such as slow database queries. Define a clear, open-ended goal for the AI agent (e.g., 'make these queries faster') without prescribing the solution.

Pro Tip: Frame the problem as a hard evaluation for the model. Instead of asking for a specific solution, create the right tests and success criteria and let the agent creatively explore the solution space.
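
One way to turn "make these queries faster" into a test the agent can run is a small timing harness with an explicit pass condition. The sketch below is only an illustration: the function names, the use of median wall-clock time, and the default 2x speedup threshold are assumptions, not part of the original workflow.

```python
import statistics
import time
from typing import Callable


def median_latency(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time, in seconds, of `run_query` over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def meets_goal(baseline_s: float, candidate_s: float, min_speedup: float = 2.0) -> bool:
    """Hypothetical success criterion: candidate is at least `min_speedup` times faster."""
    if candidate_s <= 0:
        raise ValueError("candidate latency must be positive")
    return baseline_s / candidate_s >= min_speedup
```

The agent is then free to change anything behind `run_query`, as long as `meets_goal` passes against the recorded baseline.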

2. Set Up the Agent Environment

Provide a powerful coding agent like Codex with access to a realistic testing environment. This involves using production-like data and running the agent on a high-powered remote machine (e.g., AWS EC2) to handle the intense compute load. Use tools like tmux to manage multiple, persistent agent sessions.

Pro Tip: Ensure the agent's environment is representative of your production stack so the results reflect real-world performance. If you use actual production data, exercise extreme care and read it from object storage rather than from live systems, so the benchmarks do not impact live services.
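
The tmux part of this step can be sketched as a small launcher that starts one detached, named session per agent on the remote machine. This is a minimal sketch under stated assumptions: the `bench-N` session names and the `agent_command` string are hypothetical placeholders, and you would substitute whatever command starts your agent. By default it only returns the commands it would run.

```python
import shlex
import subprocess


def tmux_new_session_cmd(session_name: str, agent_command: str) -> list[str]:
    """Build the argv for a detached tmux session that runs `agent_command`."""
    # -d starts the session detached, so it keeps running after an SSH disconnect.
    # -s names the session, so it can be reattached later.
    return ["tmux", "new-session", "-d", "-s", session_name, agent_command]


def launch_sessions(agent_command: str, count: int, dry_run: bool = True) -> list[str]:
    """Start `count` sessions named bench-1..bench-N; with dry_run, only return the commands."""
    printable = []
    for i in range(1, count + 1):
        argv = tmux_new_session_cmd(f"bench-{i}", agent_command)
        printable.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return printable
```

A session started this way can be inspected later with `tmux attach -t bench-1`.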

3. Run Exhaustive Experiments

Task the agent with exploring the entire solution space by running experiments continuously for hours or even days. For a slow query problem, this could involve instructing the agent to test every available open-source column store format, benchmark every compatible execution engine against those formats, and test different index types like bloom filters.

Pro Tip: The power of this approach lies in the agent's ability to tirelessly benchmark combinations that a human engineer might dismiss or only test superficially. This rigor can uncover non-obvious solutions that provide significant performance gains.
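
The experiment space described in this step can be thought of as a grid of format, engine, and index combinations. The sketch below shows one way to enumerate and rank that grid. It is illustrative only: the `Experiment` type, the function names, and the convention that `benchmark` returns `None` for an incompatible combination are assumptions, and the actual measurement is left to whatever harness the agent builds.

```python
import itertools
from typing import Callable, NamedTuple, Optional


class Experiment(NamedTuple):
    storage_format: str
    engine: str
    index: str


def build_matrix(formats, engines, indexes) -> list[Experiment]:
    """Enumerate every combination of storage format, execution engine, and index type."""
    return [Experiment(f, e, i) for f, e, i in itertools.product(formats, engines, indexes)]


def run_all(
    matrix: list[Experiment],
    benchmark: Callable[[Experiment], Optional[float]],
) -> list[tuple[Experiment, float]]:
    """Run `benchmark` on each experiment and return results sorted fastest first.

    `benchmark` is a caller-supplied function returning a latency in seconds,
    or None when the combination is not supported and should be skipped.
    """
    results = []
    for experiment in matrix:
        latency = benchmark(experiment)
        if latency is not None:
            results.append((experiment, latency))
    return sorted(results, key=lambda item: item[1])
```

For example, `build_matrix(["parquet", "orc"], ["duckdb"], ["none", "bloom"])` yields four experiments. Even a modest list of formats, engines, and index types multiplies into a grid that is tedious to cover by hand, which is the case for delegating it to an agent.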
