Building dbRosetta Using AI: Part 3, Creating a Database

The AI said I had to do a database first, not code. Who am I to argue? So, with all the prompts outlining the goals of the project in hand, I’ve gone forward, and step one is creating a PostgreSQL database on Azure.

This is part three of a multi-part set of articles. I’ll move this list to the bottom of future articles:
Part 1: Introducing the Concept of dbRosetta
Part 2: Defining the Project & Prompt Templates

Generating the Database Prompt

At the end of the last article, I said I’d create two prompts, one by hand and one from CoPilot, to run in vsCode. However, on further cogitation, I decided I’d just run with CoPilot’s prompts. I’ll tell CoPilot what I want. I’ll let CoPilot interpret that for GitHub CoPilot. We’ll see what happens. Let me just say, so far, fascinating.

Prompt:

OK. Let’s start with the initial prompt for vsCode to create the database. Would you like to create an example of that prompt? Do you need more information before you create the database that I’ve roughly described?

Response:

VS Code prompt for Copilot (create initial Azure Postgres schema)

 

Use this prompt in VS Code Copilot (or your prompt-runner) to generate the initial SQL DDL and a tiny deployment checklist. It assumes an Azure Database for PostgreSQL instance and creates the minimal tables for dbRosetta v0: dialects, terms, translations, and a simple artifacts/metadata table. It also records the assumptions used so the output is deterministic.

Prompt

SYSTEM: You are a careful SQL developer and infra assistant. Produce a single SQL file (UTF-8) that can be applied to an Azure PostgreSQL instance to create a schema named dbrosetta and four tables: dialects, terms, translations, and artifacts. Include sensible indexes, constraints, and comments that reflect the acceptance criteria for term translation (no users/auth stored, cacheable translations, metadata JSON). Use standard PostgreSQL types and features compatible with Azure Database for PostgreSQL (single server/postgres-compatible). Do not include any destructive statements (DROP TABLE). At the top of the file, include a brief assumptions block (postgres version, encoding, owner placeholder). Provide example INSERTs for a few dialects (postgres, sqlserver, mysql).

USER: Generate the SQL file now, then provide a one-paragraph deployment checklist with the minimal Azure connection/env vars and a sample psql command to apply the file. Keep SQL and checklist separate and clearly labeled. Use low creativity and deterministic style.


Example SQL DDL result (what Copilot should produce)

```sql
-- Assumptions:
-- Postgres version: compatible with 12+ features
-- Encoding: UTF8
-- Schema: dbrosetta
-- Owner: replace with your DB user

CREATE SCHEMA IF NOT EXISTS dbrosetta AUTHORIZATION current_user;

-- dialects: supported DB dialects and a docs home URL
CREATE TABLE dbrosetta.dialects (
    id SERIAL PRIMARY KEY,
    slug TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL,
    documentation_home_url TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE dbrosetta.dialects IS 'Registry of supported dialects and their canonical documentation home page';

-- terms: canonical terms per dialect for potential seed data
CREATE TABLE dbrosetta.terms (
    id SERIAL PRIMARY KEY,
    dialect_id INT NOT NULL REFERENCES dbrosetta.dialects(id) ON DELETE RESTRICT,
    term TEXT NOT NULL,
    canonical_normalized_term TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (dialect_id, canonical_normalized_term)
);
COMMENT ON TABLE dbrosetta.terms IS 'Optional seeding of known terms per dialect; canonical_normalized_term is lowercased normalization';

-- translations: cached translation results and metadata from runs
CREATE TABLE dbrosetta.translations (
    id BIGSERIAL PRIMARY KEY,
    from_dialect_id INT NOT NULL REFERENCES dbrosetta.dialects(id) ON DELETE RESTRICT,
    to_dialect_id INT NOT NULL REFERENCES dbrosetta.dialects(id) ON DELETE RESTRICT,
    input_term TEXT NOT NULL,
    preferred_term TEXT,
    alternates JSONB DEFAULT '[]'::jsonb,
    description TEXT,
    doc_url TEXT,
    mapping_report JSONB DEFAULT '[]'::jsonb,
    metadata JSONB NOT NULL,
    cached BOOLEAN NOT NULL DEFAULT true,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE dbrosetta.translations IS 'Cached translations: metadata includes prompt_version, model_settings, run_id, timestamp';

-- Cache uniqueness: PostgreSQL table constraints cannot use expressions or a
-- WHERE predicate, so this must be a partial unique index rather than an
-- inline UNIQUE ... WHERE clause.
CREATE UNIQUE INDEX IF NOT EXISTS uq_translations_cached
    ON dbrosetta.translations (from_dialect_id, to_dialect_id, lower(input_term))
    WHERE cached;

-- artifacts: optional generic artifacts produced by runs
CREATE TABLE dbrosetta.artifacts (
    id BIGSERIAL PRIMARY KEY,
    run_id UUID NOT NULL,
    artifact_type TEXT NOT NULL, -- e.g., translation.json, metadata.json
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
COMMENT ON TABLE dbrosetta.artifacts IS 'Store run artifacts for auditing and reproducibility';

-- Example seeds
INSERT INTO dbrosetta.dialects (slug, name, documentation_home_url) VALUES
    ('postgres', 'PostgreSQL', 'https://www.postgresql.org/docs/'),
    ('sqlserver', 'Microsoft SQL Server', 'https://learn.microsoft.com/sql/'),
    ('mysql', 'MySQL', 'https://dev.mysql.com/doc/');

-- Optional: index to speed lookup by normalized input term in translations
CREATE INDEX IF NOT EXISTS idx_translations_from_to_input
    ON dbrosetta.translations (from_dialect_id, to_dialect_id, lower(input_term));
```
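The translations table exists to make results cacheable: the unique key on (from_dialect, to_dialect, lower(input_term)) means a given term pair is translated once and then reused. A rough sketch of the lookup pattern that schema supports, using an in-memory dict as a stand-in for the table (function and variable names here are illustrative, not project code):

```python
# Sketch of the cache lookup the translations table is designed for. The
# in-memory dict stands in for the table; the key mirrors the unique index
# on (from_dialect_id, to_dialect_id, lower(input_term)).
cache = {}

def get_translation(from_dialect, to_dialect, term, translate_fn):
    key = (from_dialect, to_dialect, term.lower())
    if key not in cache:  # cache miss: call the (expensive) model exactly once
        cache[key] = {"preferred_term": translate_fn(from_dialect, to_dialect, term),
                      "cached": True}
    return cache[key]

calls = []
def fake_translate(from_dialect, to_dialect, term):
    calls.append(term)  # record each model call so we can see cache hits
    return f"{term} (as understood in {to_dialect})"

get_translation("sqlserver", "postgres", "Clustered Index", fake_translate)
get_translation("sqlserver", "postgres", "clustered index", fake_translate)
assert len(calls) == 1  # the second, differently cased lookup was a cache hit
```

The lower() in the key is doing the same job as the expression index: two lookups that differ only in case resolve to the same cached row.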

 

Minimal deployment checklist (one-paragraph)

Assumptions: you have an Azure PostgreSQL server and a database created; export connection info as environment variables PGHOST, PGPORT, PGDATABASE, PGUSER, and PGPASSWORD. Apply the SQL file with: psql "host=$PGHOST port=$PGPORT dbname=$PGDATABASE user=$PGUSER password=$PGPASSWORD sslmode=require" -f dbrosetta-init.sql. In CI, store secrets in GitHub Actions secrets or Azure Key Vault and run the same psql command in a job step; verify success by querying SELECT count(*) FROM dbrosetta.dialects;.

 Assumptions I used (so you can edit them into the prompt-runner)

  • Azure Postgres supports standard PostgreSQL 12+ features and JSONB.
  • No row-level security or advanced role setup is included; owner replacement is manual.
  • Caching is enabled by default: translations are inserted with cached=true.
  • Metadata JSON must include prompt_version and model_settings as strings/objects during run-time (enforced by application logic, not DB constraints).
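That last assumption deserves a sketch: since the database stores metadata as JSONB without validating its keys, the guardrail has to live in application code. Here's a minimal version of what that check might look like (the function name and exact rules are my assumptions, not the project's actual code):

```python
# Application-level guardrail for the metadata JSONB column: the DB accepts
# any JSON, so the app verifies the required keys before caching a
# translation. Key names come from the assumptions block above.
REQUIRED_METADATA_KEYS = ("prompt_version", "model_settings")

def validate_translation_metadata(metadata):
    """Return a list of problems; an empty list means the metadata is usable."""
    problems = [f"missing required key: {key}"
                for key in REQUIRED_METADATA_KEYS if key not in metadata]
    if "prompt_version" in metadata and not isinstance(metadata["prompt_version"], str):
        problems.append("prompt_version must be a string")
    if "model_settings" in metadata and not isinstance(metadata["model_settings"], (str, dict)):
        problems.append("model_settings must be a string or an object")
    return problems

good = {"prompt_version": "0.1.0", "model_settings": {"temperature": 0.0}}
assert validate_translation_metadata(good) == []
assert validate_translation_metadata({"prompt_version": 3}) != []
```

Returning a list of problems rather than raising on the first one makes the check easy to surface in a run report alongside the other artifacts.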

If this meets your needs, I will produce a ready-to-save dbrosetta-init.sql file and a short GitHub Actions job snippet next.

I swear, this thing is EXCITED about this project. Ha! So anyway, you may notice what I did. That prompt deviates a bit from the template we outlined. So, I responded:

Prompt:

That’s great and I’ll use it as written within vsCode if it’s what you want. However, is that following the template that you defined for me earlier?

Response:

Assessment

Yes — the VS Code prompt and the example SQL DDL you accepted broadly follow the canonical prompt template you asked for, but they’re not a 1:1 fulfillment of every template section. The prompt covers Intent, Context, Template Prompt text, and a deterministic ModelSettings requirement, and the DDL output aligns with the Acceptance Criteria and Guardrails you described.

Where it already matches the canonical template

  • Intent: Clearly stated (create initial Azure Postgres schema for dbRosetta).
  • Context: Included assumptions about Azure Postgres, schema name, and non‑destructive output.
  • Template Prompt (literal): Provided a SYSTEM and USER instruction that can be pasted into Copilot or a prompt-runner.
  • Guardrails: Explicit “Do not include destructive statements” and determinism guidance (low creativity).
  • Post Processing expectations: You asked for a separate checklist and told Copilot to keep SQL and checklist separate.

Small gaps to make it fully conformant (suggested, optional)

  • Frontmatter / ModelSettings block: Add a short frontmatter header (Title, Version, ModelSettings) to the prompt file so it matches the template structure and can be parsed by your prompt-runner.
  • Input / Output Contracts: Explicitly state the Input Contract (e.g., target schema name, owner placeholder, desired seed dialects) and Output Contract (single SQL file path; checklist text file) inside the prompt text so CI and schema validators can assert them.
  • Acceptance Criteria: Convert the existing informal checks into 3–6 bullet acceptance sentences (e.g., “DDL creates schema and tables without DROP statements”; “translations table contains metadata JSON and cache uniqueness constraint”; “psql apply command succeeds in CI”).
  • Telemetry keys: Ask the run to emit a run_id and include metadata keys (prompt_version, model_settings) in an artifacts JSON when the prompt-runner executes.

Minimal actionable edit (one small snippet you can paste)

Add this frontmatter and two short contract lines at the top of your prompt before the SYSTEM line:

Title: "dbRosetta init schema"
Version: "0.1.0"
ModelSettings: { temperature: 0.0, max_tokens: 1024, top_p: 1.0 }

Input Contract: { schema_name: "dbrosetta", owner_placeholder: "<DB_USER>", seed_dialects: ["postgres", "sqlserver", "mysql"] }
Output Contract: { file: "dbrosetta-init.sql", checklist: "dbrosetta-deploy-checklist.txt" }

…<additional stuff snipped>

CoPilot can do no wrong according to CoPilot. Fine. We’re trying to let it run the show, so I’m adding the defined header. However, I’m going to change my prompts to be more specific from here on out. I do want to ensure that we follow the plan we said we were going to follow, even if the AI wants to hare off and do whatever (<insert joke about developers here>).

Over to vsCode. Let’s see what happens when we use this prompt.

Oh, but before I do that, I’m having side conversations occasionally with CoPilot. One of them was on branching. It wanted a branch for every prompt, which I thought was a bit much: lots and lots of merges that would sooner or later get messy. I suggested a branch for each article. The AI compromised: a branch for each article, but within each article, a branch for each feature/prompt.

Creating a Database

The amount of setup in vsCode is just about non-existent. I’ve got the GitHub project open in vsCode, and I’ve already installed extensions for GitHub and PostgreSQL (I’ll use vsCode to do manual work in Azure PostgreSQL, but I may occasionally revert to DBeaver for some work, just to be open and clear about it). Nothing to it but to paste in the prompt. Here’s the output:

I’ll generate a complete SQL schema file for the dbRosetta system and provide deployment instructions. Let me create this file in your workspace.

Created schema/init-dbrosetta-schema.sql

## SQL Schema File

The complete SQL schema file has been created at init-dbrosetta-schema.sql. This file creates the dbrosetta schema with four tables (dialects, terms, translations, and artifacts) along with appropriate indexes, constraints, foreign keys, and sample data for PostgreSQL, SQL Server, and MySQL dialects.

## Deployment Checklist

**Prerequisites and Environment Variables:** Ensure you have an active Azure Database for PostgreSQL instance with admin credentials. Set the following environment variables: PGHOST (your Azure PostgreSQL server name), PGPORT (typically 5432), PGDATABASE (target database name), PGUSER (admin username), and PGPASSWORD (admin password). Verify network connectivity by ensuring your client IP is allowed in the Azure PostgreSQL firewall rules.

**Deployment Command:** Execute the schema using psql with SSL required for Azure: psql "host=$PGHOST port=$PGPORT dbname=$PGDATABASE user=$PGUSER password=$PGPASSWORD sslmode=require" -f schema/init-dbrosetta-schema.sql. After successful execution, verify the schema creation by connecting to the database and running \dt dbrosetta.* to list the created tables and SELECT COUNT(*) FROM dbrosetta.dialects; to confirm sample data insertion.

It worked! I feel just a little like Victor Frankenstein. Actually, I feel a lot like Victor Frankenstein. Here’s the SQL code it generated, all of it, not hiding the monster from you at all:

I’m impressed, somewhat, that it decided to put some tests in. Yeah, OK, however ineffectual those tests may be, they’re there. Kind of cool. Overall, I’m not crazy about the script, especially the indexing. However, we’re taking this as written. I tested the script. It worked great. However, I also fed the script back to CoPilot so that it could validate if this met our criteria. CoPilot wasn’t completely happy with the results:

Issues and risks to address

  1. SERIAL usage and sequences
    • SERIAL works on PostgreSQL 11+, but for newer portability and explicit behavior consider IDENTITY (GENERATED ALWAYS AS IDENTITY) to avoid implicit sequence ownership issues in some managed platforms.
  2. updated_at not maintained automatically
    • updated_at defaults to CURRENT_TIMESTAMP but won’t change on updates. Consider adding triggers (or application-level updates) to maintain updated_at consistently.
  3. Case sensitivity / normalization for canonical_term and lookups
    • canonical_term is UNIQUE but case-sensitive. If lookups should be case-insensitive, add a unique index on lower(canonical_term) or store a normalized column.
  4. Unnamed constraints and FK clarity
    • FKs and UNIQUE constraints are fine, but naming constraints (e.g., fk_translations_term) improves clarity in migrations and error messages.
  5. Ambiguous data types and pattern length
    • translated_term VARCHAR(500) may be fine, but confirm max lengths vs expected needs. examples/text fields could be TEXT consistently to avoid arbitrary limits.
  6. Verification queries may fail in non-interactive CI
    • The SELECT checks at the end are useful locally, but in automated runs they may produce output that’s ignored; ensure CI treats success as exit code 0 and doesn’t break on non-empty results.
  7. Metadata key expectations not enforced
    • Your acceptance criteria mention prompt_version and model_settings in metadata; the DB doesn’t validate those keys. Consider JSON Schema validation at application level or a CHECK that metadata ? 'prompt_version' if you want DB-level guardrails.
  8. Search path and session settings
    • SET search_path TO dbrosetta, public is fine for an interactive run, but scripts applied in CI or by apps should avoid relying on session-specific search_path unless intentional.
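Item 3 is easy to make concrete in application code: if lookups should be case-insensitive, normalize the term once and store/compare the normalized form, which is what a unique index on lower(canonical_term) would mirror. A small sketch (the exact normalization rules here are my assumption, not what the generated script does):

```python
# Sketch of the normalization item 3 suggests: lowercase and collapse
# whitespace so "Clustered  Index" and "clustered index" compare equal.
# This mirrors a unique index on lower(canonical_term).
def normalize_term(term):
    return " ".join(term.lower().split())

assert normalize_term("Clustered  Index") == "clustered index"
assert normalize_term("HEAP") == normalize_term("heap")
```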

Except for #6 (I don’t like the tests, at all), and #2 (triggers? really?), I think this is a very good set of recommendations for improving the initial script. I’m a little shocked that it’s not put out by those index choices. However, I’m going to have CoPilot regenerate the script with these suggested changes except I’m going to have it just remove the existing tests and not generate triggers. While I’m letting the AI largely run things, I figure my input counts as well.

Although, that brings up a point of discussion. Should I? What I mean is, am I biased by having knowledge of what we’re building such that I’m pushing the decisions in a direction I’m more comfortable with? Yeah, yeah, I know I’m biased. Everyone is biased. I’m just wondering whether my bias is negatively affecting the experiment. I’m going forward with how I planned it, but I figured this was worth at least bringing up.

With that, mission accomplished. We have a database. And I have to say, I’m pretty excited about it.

Conclusion

I’m coming to believe more and more that being a beginner in IT has become much more challenging because of these new tools. Look, we can argue about the naming standards, indexing, data type choices, and general table structure all we want. The fact of the matter is that this is a functional database. I did next to nothing to arrive at it other than interrogate the LLM a bit. Heck, GitHub CoPilot generated files for me. And filled them in. I could easily have allowed it to create new resources within Azure, same as everything else we’re doing. So even the steps I did manually, I didn’t have to. The question isn’t whether a beginner can do this. The question is how a beginner actually learns how to do this: what’s right, what’s wrong, what’s simply problematic or a compromise. That’s all about experience, which this stuff is not giving me.

Anyway, we have a database. I’m honestly excited about how this is going. Next up, I’m going to create a Continuous Integration deployment process where we can begin to put testing and validation in place. I will be using Flyway and, after discussion with CoPilot, GitHub Actions.

As per usual, I’ll let CoPilot have the last word:

The database-creation step succeeded: the init script was produced, applied without errors, and the target schema and seed data are present in the test environment. Core guardrails (non‑destructive DDL, JSONB metadata, identity keys, and named constraints) are in place.