Designing Code AI Can Actually Use
AI can't click buttons, navigate visual interfaces, or see what your app looks like without special tooling. Code that enables an autonomous AI development loop is designed around what AI can actually do: run commands, read structured output, and observe results. Design for the tool, not around it.
CLI-First Design
If AI can run a command, AI can test your app. If it can't, you're the bottleneck in every loop.
Four rules for CLI surfaces that enable autonomous loops: JSON on stdout so AI can parse results, plain-text errors on stderr for separation of concerns, proper exit codes (0 for success, nonzero for failure) so the loop can gate on outcome, and a --help flag so the AI can orient itself without asking you. Break one rule and the loop stalls.
Test credentials go in .testEnvVars, not .env. The separation isn't cosmetic — AI sources .testEnvVars on its own as part of the loop. You don't want that mixed with your production environment file.
| Rule | Why |
|---|---|
| JSON on stdout | AI parses it directly |
| Plain-text errors on stderr | Separation of concerns |
| 0 for success, nonzero for failure | The loop gates on exit codes |
| --help flag | Self-documenting; AI can orient itself |
| .testEnvVars for test creds | Separate from app .env |
Structured Logging
'Error occurred in user service' is useless to an AI. A JSON log line with level, service, action, error, and timestamp is searchable, parseable, and understandable by a model that has never touched your codebase before.
Always log: function entry with inputs, function exit with results, errors with full context, external API calls, and database queries in dev and test. Every gap in the log is a place where AI can't reconstruct what happened — and a place where the fix loop stalls.
Use Pino, structlog, or your language's equivalent. Don't reach for console.log or print for anything the AI might need to reason about.
Test Design
TDD for units: define the contract first, write failing tests, implement to pass. Always verify the Red phase is actually red — AI will sometimes write tests that accidentally pass because of existing code. A test that can't fail is a test that can't catch anything.
Explore-then-codify for integration: let AI probe the running system dynamically, then capture the discoveries as repeatable test scripts. Your job during the Explore phase is to suggest areas to probe deeper and notice anything surprising — not to drive.
Ask the AI to review its own tests before implementation: 'What cases are missing? What assumptions did you make?' AI is excellent at identifying gaps in a test suite it just produced, as long as you ask.
Security Fundamentals
Never paste API keys or secrets into a prompt. Prompts can be logged, cached, and sent to third parties. Reference the environment variable by name instead: 'Use the API key from .testEnvVars (OPENAI_API_KEY) to call the service.' Once a secret is in a prompt, it's out of your control.
Verify .gitignore before the first commit, not after. Secrets leak on commit one, not commit ten. .env, .testEnvVars, *.key, *.pem, and the ai/ folder all belong in .gitignore before the repo touches any remote.
Three patterns to catch in AI-generated code: SQL injection through string interpolation, hardcoded API keys in source, and user input treated as prompt instructions. AI generates these patterns confidently. Your job is to catch them.
The confidence trap: AI makes you faster and more confident, and the Stanford finding is that AI-assisted developers produce more security vulnerabilities while reporting higher confidence in their security. The fix is process: sub-agent review, fresh-session PR review, and catching the three patterns before they ship.
Want to apply these frameworks to your business?