Steeping thought

This is something I’m still not sure about. I’m mulling the idea over; please reach out with your perspective.

Since LLMs can generate an absurd amount of code in a relatively short time, the workload of reviewing the deliverables (the code) increases drastically.

When you have true observability, you gain more insight and can spot regressions or missed features more quickly.

What about tests?

Historically, regressions and feature completeness were managed by tests, be it unit tests or end-to-end tests. These, however, live close to the codebase, which the AI has access to: it can edit or even remove failing tests to fit its narrative when it is wrong.

The issue is that even when you add limitations, permissions, or scopes the LLM must adhere to, you become a bottleneck real quick. Either you review every addition, modification, and deletion, creating extra review glue work and administrative friction, or you write the tests yourself, which slows the implementation down.
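As one sketch of such a guardrail: GitHub's CODEOWNERS mechanism can require an approving review from designated owners on any change to test files, which keeps the AI from silently deleting them, but it also formalizes exactly the review bottleneck described above. The paths and team names here are hypothetical:

```
# .github/CODEOWNERS — hypothetical paths and team names.
# Any PR touching these paths requires an approving review
# from the listed owners before it can be merged.
/tests/    @your-org/test-reviewers
/e2e/      @your-org/test-reviewers
```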

Gherkin specifications, or something of that sort, might alleviate the pain, but they can still drift if the AI can touch them.
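For illustration, a Gherkin spec describes the intended behavior in plain language, separate from the implementation, so a deleted or weakened test is easier for a human to spot. The feature and values below are hypothetical:

```gherkin
# checkout.feature — hypothetical example
Feature: Discount codes at checkout

  Scenario: Applying a valid discount code
    Given a cart with a total of 50 euros
    When the customer applies the discount code "SAVE10"
    Then the total should be 45 euros
```

The upside is that intent lives in a human-readable artifact; the downside, as noted, is that it only helps if the AI cannot quietly rewrite that artifact too.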