Prompts: The Ledger of Mistakes
The prompts below are the human-side contribution to “The Ledger of Mistakes” — the questions, framing, constraints, and course-corrections that steered the writing. They are reproduced here to delineate which part of the work is Łukasz Stafiniak’s.
Do you see here material for another AI co-authored blog article, in the spirit of the two “The Furnace” articles I attached? These are documentation from the Ludics project, generated by the agents either when working on Ludics or when participating in Ludics for tasks on other projects.
Your preferred title is a nice callback to philosophy of science, but it’s rather cryptic if someone misses that connection. I prefer one of “The Ledger of Mistakes”, “When Agents Write Their Own Engineering Culture”, “The Bureaucracy That Learns”. The concrete image is not how things happened: the starting point was workflow retrospectives, which I requested for “quality improvements until fixpoint”, and retrospectives have both project-specific and workflow-specific scopes. But for workflow-specific improvements, agents were overly eager to ask for large additions to prompts. Since prompts consume context and can introduce distractions from the task goals, I decided to give an outlet for such improvement considerations: a document for “what a competent SWE should know – don’t prompt for things obvious to an expert”.
Which concrete findings or “teachings” would we highlight?
Curious about “Do not weaken the invariant to make the test easier”. Is this just about cheating, or is it about balancing complexity against idealized goals?
Would you like to draft the article? For the blog, we aim for between 4000 and 7500 words (soft limit, and if we cross 9000 words of good material it becomes two articles).
I attach a couple of relevant Ludics skills for inspection, maybe taking a look can shape or be incorporated into the article. Let’s also keep in mind Ludics is vibe-coded, it’s also a reflection of a moment-in-time of a generation of Claude (which I was using more at the time) and GPT Codex models. I like your writing. Let’s put it in a separate Markdown file if you can. Remember to keep it tight.
Given that we saved some space, maybe we can add a section cataloguing the more interesting of the examples from swe-textbook.md that haven’t landed in the main text? (and maybe ac-rigor-reference.md too if there are some anecdote-worthy ones)
I published, but on reviewing, the section “A moment in time” didn’t land well. I intended it more to indicate that some of the bureaucracy (in prompting) is the AI agent not trusting another agent with execution on a less-defined target. Can I remove this section?