Building dbRosetta Using AI: Part 2 of Many
This is the next installment in the series on building a database and an application, dbRosetta, using an AI/LLM. Part 1 introduces the concept.
THE AI PICKED DATABASE FIRST!
Look, I talk databases at this thing a lot, so it probably knows my own preference, but when I asked it, it chose to build a database separate from the code. Let’s get into it.
Defining the Project
To start with, I set up the code repo and wrote the intro article (link above), all with some input from the AI. I knew a bunch of what I wanted out of the project, but I want to let the AI make a lot of the decisions here so we can see how all that works out. My initial prompt was:
Right, so let’s get started. I’ve created a repository on github. I’ve initialized the series of blog posts with an introduction. Now, we need to do three things. First, establish our best approach for outlining the behavior of the application, which I assume means prompt outlines? Then, we have to establish how we’re building the app, whether it’s code first or database first, which language to use, etc. Finally, any kind of plan around what to publish when.
In this case, I won’t publish the whole, lengthy response. Just this bit:
- Short recommendation — Hybrid, start database-first for examples, support code-first for tooling
- Start with a database-first approach for the canonical reproducible examples: author baseline schemas and migrations as SQL artifacts that are authoritative. This keeps examples portable, platform-focused, and easy to run in CI using Dockerized DBs.
- Build tooling (generators, validators, orchestration) as code-first modules so contributors can extend transformations programmatically (language-agnostic CLI, SDKs later).
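To ground that a bit: dbRosetta’s whole job is translating terms between database platforms (the full definition is below), so one of those “authoritative SQL artifacts” could start as small as a baseline schema file. This sketch is mine, not the AI’s, and every name in it is a placeholder:

```sql
-- Hypothetical baseline artifact, e.g. sql/baseline/001_core_schema.sql.
-- Column names and types are my guesses; the real design comes later in the series.
CREATE TABLE dialect (
    dialect_id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    dialect_name text NOT NULL UNIQUE  -- e.g. 'PostgreSQL', 'Microsoft SQL Server'
);

CREATE TABLE term_translation (
    translation_id    integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    from_dialect_id   integer NOT NULL REFERENCES dialect (dialect_id),
    to_dialect_id     integer NOT NULL REFERENCES dialect (dialect_id),
    from_term         text    NOT NULL,  -- e.g. 'Function'
    to_term           text    NOT NULL,  -- e.g. 'Procedure'
    description       text,              -- short explanation of the mapping
    documentation_url text               -- link to the target platform's docs
);
```

Something that small runs happily in a Dockerized PostgreSQL in CI, which is the whole point of the recommendation.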
The rest was a pretty clear set of answers around the prompt, nicely broken down into three sections: Prompt outlines, How we build the app, and Publication plan. Suffice it to say, its publication plan and mine do not coincide at this point.
Silly me, my next prompt took things seriously off the rails:
With all that as written, in order to first design the database, I need to go through the design phase. Can you provide the structure needed for the initial set of prompts?
I’m not going to provide the response, because it was way, way off track. CoPilot thought we were ready to start coding. I kept trying to get it back on track, to no avail. These things can get pretty persnickety. I was forced to do this:
No. Forget all of the previous conversation.
Allow me to first define the project:
dbRosetta is an example project created in order to write a series of articles, illustrating how a person could use an LLM to create a complete application and database. The core behavior of dbRosetta is simple enough in its first iteration. A person selects the database to translate from, and the database to translate to, for example, FROM: PostgreSQL TO: Microsoft SQL Server. They then provide a term, for example ‘Function’. Then that term is translated from the PostgreSQL term to the equivalent SQL Server term, ‘Procedure’. In addition, a link to the appropriate documentation and a short description of the term are provided. The application will run on WordPress. The database will be hosted on Azure PostgreSQL. The app will be built within vsCode using GitHub Copilot driven by inputs constructed here using you.
Is that an adequate starting point?
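Before we get to how the AI handled that, the behavior it describes boils down to a single lookup. Against the placeholder schema I sketched earlier, it would look roughly like this (again, mine, not anything the AI has produced):

```sql
-- FROM: PostgreSQL, TO: Microsoft SQL Server, term: 'Function'.
-- Expected result: 'Procedure', plus the documentation link and short description.
SELECT tt.to_term,
       tt.description,
       tt.documentation_url
FROM   term_translation AS tt
JOIN   dialect AS d_from ON d_from.dialect_id = tt.from_dialect_id
JOIN   dialect AS d_to   ON d_to.dialect_id   = tt.to_dialect_id
WHERE  d_from.dialect_name = 'PostgreSQL'
  AND  d_to.dialect_name   = 'Microsoft SQL Server'
  AND  tt.from_term        = 'Function';
```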
It mostly got it. However, it tried to turn that into article #1. <sigh> So…
| Prompt | Response |
| --- | --- |
| Let’s be clear, that’s the initial definition of the whole project, not just the first article. I’d like to break down the process even more. For example, all of this, to arrive at your understanding of the scope of the project may be the next article. Once I feel that you have a good understanding of the scope of the whole project, we can start the next set of steps. So, the next article may be this discussion and any prompt outlines needed for the next step. The next step would be to design a database. Then, set up a deployment mechanism for the database. Then, build and test an app front end. Then set up a deployment mechanism for that with an MVP. Then, I’m not sure, but more, probably testing regimes and such along with more code and examples of updates and changes. Does that make more sense now? | Proposed article series mapped to project milestones<br>How each article ties to deliverables in the repo<br>Next-steps I will produce now for that next article (concrete deliverables)<br>These artifacts will let you publish the next article and give a runnable base for the DB design step.<br>Quick timeline and recommended immediate workflow<br>If that plan fits your intent, I will generate the Article Draft and the prompt-outline files and fixtures next and format them for direct commit. |
To quote Charlie Brown: THAT’S IT!!!
Now, I’m not letting it produce the Article Draft. As I said in the first article, I’m writing this, and where I let the AI speak, it will be clear. OK. Let’s see if we can get the prompt templates done.
Prompt Templates
Thinking we were on the same page, I asked for the prompt templates… and I didn’t specify that I just wanted that, a template. The darned thing started defining all sorts of behaviors again, including stuff I had already told it we were not going to do. Hoo boy. I’ll have to take this thing very directly in hand. However, we finally landed on the template outline:
Canonical prompt template (single file to follow)
Use this one Markdown file as the canonical template for every prompt in prompts/templates/<slug>.md. Fill the frontmatter and sections; keep examples and fixtures in prompts/examples/<slug>/. Replace placeholder text where shown.
Title: “<Short descriptive title>”
Version: “MAJOR.MINOR.PATCH”
Author: “@githubHandle”
Date: “YYYY-MM-DD”
ModelSettings: { temperature: 0.0, max_tokens: 512, top_p: 1.0, seed: <optional> }
Intent
One short sentence describing the goal of this prompt (what observable outcome it must produce).
Context
1–3 short paragraphs describing where this prompt fits, assumptions, and constraints relevant to the model.
Input Contract
- name: <input_name>; type: <file|string|json>; required: <true|false>; description: <short>
- name: <…>
Output Contract
- name: <output_name>; type: <file|string|json>; format: <sql|json|text>; validation: <schema path or rule>
- list all expected output artifacts and their exact shapes
Acceptance Criteria
- Short, testable sentence 1 (maps to a CI test)
- Short, testable sentence 2
- Short, testable sentence 3 (Include 3–6 criteria; each becomes an automated or human-checkable assertion)
Guardrails
- Hard rule 1 the model must never violate (e.g., “Do not fabricate authoritative URLs”)
- Ambiguity handling rule (e.g., “If uncertain, set preferred=null and populate mapping_report”)
- Output containment rule (e.g., “Return only the Output Contract; no extra commentary”)
Template Prompt (literal to send to model)
SYSTEM: <one-sentence system role>
USER: <literal prompt text with named placeholders, e.g.:>
Translate the term “{{term}}” from “{{from_dialect}}” to “{{to_dialect}}”. Options: {{options}}. Produce exactly the JSON described in the Output Contract. Follow Acceptance Criteria and Guardrails. Use model settings from frontmatter.
Examples (minimum required)
- fixtures/minimal/input.json -> expected/minimal/output.json (happy path)
- fixtures/edge/input.json -> expected/edge/output.json (edge case or ambiguous)
Post Processing
- Command or check 1 (e.g., validate JSON schema at schemas/<file>.json)
- Command or check 2 (e.g., run linter or apply to ephemeral DB)
- Artifact persistence (e.g., write artifacts/<run_id>/)
Telemetry keys to record
- prompt_version, model_settings, fixtures_hash, run_id, timestamp
Notes
- Implementation notes, caveats, or links to related templates (optional).
- Location for fixtures: prompts/examples/<slug>/
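One aside on the “Telemetry keys to record” section: the template doesn’t say where those keys live. Since this is a database project, I could see them landing in a run-log table, something like this (my assumption, with placeholder names):

```sql
-- Hypothetical run-log table for the telemetry keys named in the template.
CREATE TABLE prompt_run_log (
    run_id         uuid        PRIMARY KEY,            -- run_id
    prompt_version text        NOT NULL,               -- prompt_version, e.g. '1.0.0'
    model_settings jsonb       NOT NULL,               -- the ModelSettings frontmatter, stored verbatim
    fixtures_hash  text        NOT NULL,               -- fixtures_hash
    run_timestamp  timestamptz NOT NULL DEFAULT now()  -- timestamp
);
```

Whether that ends up in Azure PostgreSQL next to the application data, or just in the artifacts/<run_id>/ files the template already calls for, is a decision for a later article.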
OK. I’m happy with that template. Now we can keep going.
Conclusion
An AI is like a puppy that knows you have treats. They’re so damned eager to please, but man oh man, can they run away with themselves (and pee all over your shoes). Further, getting them to forget a concept once it’s in the mix, rough. It still thinks I want to do schema migrations. No idea why. I never mentioned that, but I did mention translation, so, I’ll bet the vector distance between ‘migration’ and ‘translation’ when associated with ‘database’ is pretty close. That’s what these things do.
Anyhow, I think we’re finally starting down the correct path. The next article will be on building an initial database, probably working within vsCode. As part of testing how both these AIs work, I’ll build a prompt for a database and I’ll let CoPilot build one. We’ll see who gets better results.
I’m going to let CoPilot read this and provide its own Conclusion:
dbRosetta shows how a tightly scoped, prompt‑driven workflow can make cross‑dialect schema translation reproducible, reviewable, and safe, turning an LLM from a black‑box assistant into an auditable part of the engineering toolchain. By codifying acceptance criteria, guardrails, and deterministic runs, the project makes it easy for others to reproduce results, contribute examples, and iterate with confidence.