I let Copilot build my database. Here's everything I learned

Q: How does this affect people learning database design?

<div class="font-claude-response relative leading-[1.65rem] [&_pre>div]:bg-bg-000/50 [&_pre>div]:border-0.5 [&_pre>div]:border-border-400 [&_.ignore-pre-bg>div]:bg-transparent [&_.standard-markdown_:is(p,blockquote,h1,h2,h3,h4,h5,h6)]:pl-2 [&_.standard-markdown_:is(p,blockquote,ul,ol,h1,h2,h3,h4,h5,h6)]:pr-8 [&_.progressive-markdown_:is(p,blockquote,h1,h2,h3,h4,h5,h6)]:pl-2 [&_.progressive-markdown_:is(p,blockquote,ul,ol,h1,h2,h3,h4,h5,h6)]:pr-8"> A beginner can produce the same functional schema in an afternoon with no understanding of why the choices matter. The tools make it easy to arrive at a working database without the experience that tells you whether it’s a good one.

I handed Microsoft Copilot the keys to design a PostgreSQL schema for Azure Database for PostgreSQL Flexible Server, with as little intervention from me as I could stand. What came back was a working four-table schema (dialects, terms, translations, artifacts) with foreign keys, JSONB metadata, seed data for three SQL dialects, and a deployment checklist that actually ran.

But that’s just a small part of the story. In this article, I’ll shout about what it got right – but also emphasize what it got wrong, why it happened, and what I’ve learned. Includes tips, advice, and the full code repo for if you want to give it a go yourself.

This is the second part of Grant Fritchey’s series on building an app and database almost entirely through an LLM. The repo, including every prompt, lives on GitHub.

In part one, I established the rules of this experiment and got the AI to commit to a project plan. This time, we actually build something. The first thing on the list, per Copilot’s own recommendation in part one, is the database. So: a PostgreSQL schema for dbRosetta, hosted on Azure Flexible Server. And the most important bit: it’s all designed and generated by Copilot, with as little intervention from me as I can stand.

What follows is what happened, what I let slide, what I overrode, and a slightly uncomfortable question I haven’t fully answered for myself yet.

Generating the prompt

I’d originally said I would write two prompts in parallel — one by hand, one from Copilot — and compare them. On reflection I dropped that. The whole point of this series is to see what happens when the AI runs the show, and writing my own version in parallel would have biased everything that came after it.

So, I gave Copilot a one-liner asking for the Visual Studio Code (VSCode) prompt and got back a properly structured one: schema name, table list, constraints, JSONB metadata, deterministic style, “do not include destructive statements,” sample seed inserts for three dialects, and a deployment checklist.

Click here for the full prompt, and Copilot’s response

Prompt

Response

OK. Let’s start with the initial prompt for vsCode to create the database. Would you like to create an example of that prompt? Do you need more information before you create the database that I’ve roughly described?

VS Code prompt for Copilot (create initial Azure Postgres schema) Use this prompt in VS Code Copilot (or your prompt-runner) to generate the initial SQL DDL and a tiny deployment checklist. It assumes an Azure Database for PostgreSQL instance and creates the minimal tables for dbRosetta v0: dialects, terms, translations, and a simple artifacts/metadata table. It also records the assumptions used so the output is deterministic.Prompt SYSTEM: You are a careful SQL developer and infra assistant. Produce a single SQL file (UTF-8) that can be applied to an Azure PostgreSQL instance to create a schema named dbrosetta and four tables: dialects, terms, translations, and artifacts. Include sensible indexes, constraints, and comments that reflect the acceptance criteria for term translation (no users/auth stored, cacheable translations, metadata JSON). Use standard PostgreSQL types and features compatible with Azure Database for PostgreSQL (single server/postgres-compatible). Do not include any destructive statements (DROP TABLE). At the top of the file, include a brief assumptions block (postgres version, encoding, owner placeholder). Provide example INSERTs for a few dialects (postgres, sqlserver, mysql). USER: Generate the SQL file now, then provide a one-paragraph deployment checklist with the minimal Azure connection/env vars and a sample psql command to apply the file. Keep SQL and checklist separate and clearly labeled. Use low creativity and deterministic style.
Example SQL DDL result (what Copilot should produce)— Assumptions:
— Postgres version: compatible with 12+ features
— Encoding: UTF8
— Schema: dbrosetta
— Owner: replace with your DB user CREATE SCHEMA IF NOT EXISTS dbrosetta AUTHORIZATION current_user;– dialects: supported DB dialects and a docs home URL CREATE TABLE dbrosetta.dialects ( id SERIAL PRIMARY KEY, slug TEXT NOT NULL UNIQUE, name TEXT NOT NULL, documentation_home_url TEXT, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); COMMENT ON TABLE dbrosetta.dialects IS ‘Registry of supported dialects and their canonical documentation home page’;– terms: canonical terms per dialect for potential seed data CREATE TABLE dbrosetta.terms ( id SERIAL PRIMARY KEY, dialect_id INT NOT NULL REFERENCES dbrosetta.dialects(id) ON DELETE RESTRICT, term TEXT NOT NULL, canonical_normalized_term TEXT NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), UNIQUE (dialect_id, canonical_normalized_term) ); COMMENT ON TABLE dbrosetta.terms IS ‘Optional seeding of known terms per dialect; canonical_normalized_term is lowercased normalization’;– translations: cached translation results and metadata from runs CREATE TABLE dbrosetta.translations ( id BIGSERIAL PRIMARY KEY, from_dialect_id INT NOT NULL REFERENCES dbrosetta.dialects(id) ON DELETE RESTRICT, to_dialect_id INT NOT NULL REFERENCES dbrosetta.dialects(id) ON DELETE RESTRICT, input_term TEXT NOT NULL, preferred_term TEXT, alternates JSONB DEFAULT ‘[]’::jsonb, description TEXT, doc_url TEXT, mapping_report JSONB DEFAULT ‘[]’::jsonb, metadata JSONB NOT NULL, cached BOOLEAN NOT NULL DEFAULT true, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), UNIQUE (from_dialect_id, to_dialect_id, lower(input_term)) WHERE cached ); COMMENT ON TABLE dbrosetta.translations IS ‘Cached translations: metadata includes prompt_version, model_settings, run_id, timestamp’;– artifacts: optional generic artifacts produced by runs CREATE TABLE dbrosetta.artifacts ( id BIGSERIAL PRIMARY KEY, run_id UUID NOT NULL, artifact_type TEXT NOT NULL, — e.g., translation.json, metadata.json payload JSONB NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); COMMENT ON TABLE dbrosetta.artifacts IS ‘Store run artifacts for auditing and reproducibility’;– Example seeds INSERT INTO dbrosetta.dialects (slug, name, documentation_home_url) VALUES (‘postgres’, ‘PostgreSQL’, ‘https://www.postgresql.org/docs/’), (‘sqlserver’, ‘Microsoft SQL Server’, ‘https://learn.microsoft.com/sql/’), (‘mysql’, ‘MySQL’, ‘https://dev.mysql.com/doc/’);– Optional: index to speed lookup by normalized input term in translations CREATE INDEX IF NOT EXISTS idx_translations_from_to_input ON dbrosetta.translations(from_dialect_id, to_dialect_id, lower(input_term)); Minimal deployment checklist (one-paragraph)Assumptions: you have an Azure PostgreSQL server and a database created; export connection info as environment variables PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD. Apply the SQL file with: psql “host=$PGHOST port=$PGPORT dbname=$PGDATABASE user=$PGUSER password=$PGPASSWORD sslmode=require” -f dbrosetta-init.sql. . In CI, store secrets in GH Actions secrets or Azure Key Vault and run the same psql command in a job step; verify success by querying SELECT count(*) FROM dbrosetta.dialects;. Assumptions I used (so you can edit them into the prompt-runner)Azure Postgres supports standard PostgreSQL 12+ features and JSONB.No row-level security or advanced role setup is included; owner replacement is manual.Caching is enabled by default: translations are inserted with cached=true.Metadata JSON must include prompt_version and model_settings as strings/objects during run-time (enforced by application logic, not DB constraints).If this meets your needs, I will produce a ready-to-save dbrosetta-init.sql file and a short GitHub Actions job snippet next.

The interesting part was the deviation. Back in part one, Copilot and I agreed a canonical prompt template — frontmatter with version and model settings, explicit input and output contracts, acceptance criteria, telemetry keys. The prompt it produced for the database didn’t follow that template.

In response, here’s what I then asked it directly:

That’s great and I’ll use it as written within VSCode if it’s what you want. However, is that following the template that you defined for me earlier?

Click here for the full prompt, and Copilot’s response

Prompt Response
That’s great and I’ll use it as written within vsCode if it’s what you want. However, is that following the template that you defined for me earlier? AssessmentYes — the VS Code prompt and the example SQL DDL you accepted broadly follow the canonical prompt template you asked for, but they’re not a 1:1 fulfillment of every template section. The prompt covers Intent, Context, Template Prompt text, and a deterministic ModelSettings requirement, and the DDL output aligns with the Acceptance Criteria and Guardrails you described.Where it already matches the canonical templateIntent: Clearly stated (create initial Azure Postgres schema for dbRosetta).Context: Included assumptions about Azure Postgres, schema name, and non‑destructive output.Template Prompt (literal): Provided a SYSTEM and USER instruction that can be pasted into Copilot or a prompt-runner.Guardrails: Explicit “Do not include destructive statements” and determinism guidance (low creativity).Post Processing expectations: You asked for a separate checklist and told Copilot to keep SQL and checklist separate.Small gaps to make it fully conformant (suggested, optional)Frontmatter / ModelSettings block: Add a short frontmatter header (Title, Version, ModelSettings) to the prompt file so it matches the template structure and can be parsed by your prompt-runner.Input / Output Contracts: Explicitly state the Input Contract (e.g., target schema name, owner placeholder, desired seed dialects) and Output Contract (single SQL file path; checklist text file) inside the prompt text so CI and schema validators can assert them.Acceptance Criteria: Convert the existing informal checks into 3–6 bullet acceptance sentences (e.g., “DDL creates schema and tables without DROP statements”; “translations table contains metadata JSON and cache uniqueness constraint”; “psql apply command succeeds in CI”).Telemetry keys: Ask the run to emit a run_id and include metadata keys (prompt_version, model_settings) in an artifacts JSON when the prompt-runner executes.Minimal actionable edit (one small snippet you can paste)Add this frontmatter and two short contract lines at the top of your prompt before the SYSTEM line:Title: “dbRosetta init schema” Version: “0.1.0” ModelSettings: { temperature: 0.0, max_tokens: 1024, top_p: 1.0 }Input Contract: { schema_name: “dbrosetta”, owner_placeholder: “<DB_USER>”, seed_dialects: [“postgres”,”sqlserver”,”mysql”] } Output Contract: { file: “dbrosetta-init.sql”, checklist: “dbrosetta-deploy-checklist.txt” }…<additional stuff snipped>

Copilot acknowledged the gaps cheerfully, told me what was missing, and offered a small frontmatter snippet to bolt on. To paraphrase: yes, broadly, but here are five things you could add to make it fully conformant. Which is fine. But it would never have flagged any of this had I not asked.

An LLM will produce something that resembles what you asked for, much faster than it will produce something that conforms to a standard you defined three sessions ago. Bookmark that one.

I let it run with the corrected prompt. Over to VSCode.

Frankenstein, with relational integrity

I’m not going to lie. There is something deeply odd about watching an LLM create a file in your workspace, fill it with two hundred lines of DDL, and then write you a deployment checklist that actually works.

From the prompt, it generated init-dbrosetta-schema.sql, four tables, indexes, foreign keys, comments on every column, sample seed data for PostgreSQL, SQL Server and MySQL, and a tidy little psql command to apply it.

I tested it. It worked. Schema present, tables present, seed data queryable. I feel just a little like Victor Frankenstein. Actually, I feel a lot like Victor Frankenstein.

Here’s the interesting part of what it produced — the translations table, which is where the domain actually lives:

CREATE TABLE translations (

id SERIAL PRIMARY KEY,

term_id INTEGER NOT NULL REFERENCES terms(id) ON DELETE CASCADE,

dialect_id INTEGER NOT NULL REFERENCES dialects(id) ON DELETE CASCADE,

translated_term VARCHAR(500) NOT NULL,

syntax_pattern TEXT,

examples TEXT,

notes TEXT,

confidence_level INTEGER DEFAULT 100

CHECK (confidence_level >= 0 AND confidence_level <= 100),

is_active BOOLEAN NOT NULL DEFAULT true,

metadata JSONB,

created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,

updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,

UNIQUE(term_id, dialect_id)

);

Click here to see the full code

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

-- =============================================================================

-- dbRosetta Schema Initialization

-- =============================================================================

-- ASSUMPTIONS:

-- - PostgreSQL version 11+ (Azure Database for PostgreSQL compatible)

-- - UTF-8 encoding

-- - Database owner/admin privileges required for schema creation

-- - Target database already exists

-- - Schema owner placeholder: replace 'dbrosetta_owner' with actual role

-- =============================================================================

-- Create schema

CREATE SCHEMA IF NOT EXISTS dbrosetta;

-- Set search path for this session

SET search_path TO dbrosetta, public;

-- =============================================================================

-- TABLE: dialects

-- Stores SQL dialect definitions and metadata

-- =============================================================================

CREATE TABLE dialects (

id SERIAL PRIMARY KEY,

name VARCHAR(50) NOT NULL UNIQUE,

display_name VARCHAR(100) NOT NULL,

version VARCHAR(20),

description TEXT,

is_active BOOLEAN NOT NULL DEFAULT true,

metadata JSONB,

created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,

updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP

);

-- Indexes for dialects table

CREATE INDEX idx_dialects_name ON dialects (name);

CREATE INDEX idx_dialects_active ON dialects (is_active);

CREATE INDEX idx_dialects_metadata ON dialects USING GIN (metadata);

-- Comments for dialects table

COMMENT ON TABLE dialects IS 'SQL dialect definitions and metadata for translation system';

COMMENT ON COLUMN dialects.name IS 'Unique identifier for the dialect (lowercase, no spaces)';

COMMENT ON COLUMN dialects.display_name IS 'Human-readable name for the dialect';

COMMENT ON COLUMN dialects.version IS 'Version identifier for the dialect';

COMMENT ON COLUMN dialects.is_active IS 'Flag to enable/disable dialect for translations';

COMMENT ON COLUMN dialects.metadata IS 'Additional dialect configuration and features as JSON';

-- =============================================================================

-- TABLE: terms

-- Stores canonical SQL terms and their definitions

-- =============================================================================

CREATE TABLE terms (

id SERIAL PRIMARY KEY,

canonical_term VARCHAR(200) NOT NULL UNIQUE,

category VARCHAR(50) NOT NULL,

subcategory VARCHAR(50),

description TEXT NOT NULL,

usage_context TEXT,

is_active BOOLEAN NOT NULL DEFAULT true,

metadata JSONB,

created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,

updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP

);

-- Indexes for terms table

CREATE INDEX idx_terms_canonical ON terms (canonical_term);

CREATE INDEX idx_terms_category ON terms (category);

CREATE INDEX idx_terms_active ON terms (is_active);

CREATE INDEX idx_terms_category_sub ON terms (category, subcategory);

CREATE INDEX idx_terms_metadata ON terms USING GIN (metadata);

-- Comments for terms table

COMMENT ON TABLE terms IS 'Canonical SQL terms and definitions for cross-dialect translation';

COMMENT ON COLUMN terms.canonical_term IS 'Standardized term identifier (case-sensitive)';

COMMENT ON COLUMN terms.category IS 'Primary classification of the term (e.g., function, operator, keyword)';

COMMENT ON COLUMN terms.subcategory IS 'Secondary classification for more specific grouping';

COMMENT ON COLUMN terms.usage_context IS 'Additional context about when and how to use this term';

COMMENT ON COLUMN terms.metadata IS 'Additional term properties and configuration as JSON';

-- =============================================================================

-- TABLE: translations

-- Stores dialect-specific translations for canonical terms

-- =============================================================================

CREATE TABLE translations (

id SERIAL PRIMARY KEY,

term_id INTEGER NOT NULL REFERENCES terms(id) ON DELETE CASCADE,

dialect_id INTEGER NOT NULL REFERENCES dialects(id) ON DELETE CASCADE,

translated_term VARCHAR(500) NOT NULL,

syntax_pattern TEXT,

examples TEXT,

notes TEXT,

confidence_level INTEGER DEFAULT 100 CHECK (confidence_level >= 0 AND confidence_level <= 100),

is_active BOOLEAN NOT NULL DEFAULT true,

metadata JSONB,

created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,

updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,

UNIQUE(term_id, dialect_id)

);

-- Indexes for translations table

CREATE INDEX idx_translations_term ON translations (term_id);

CREATE INDEX idx_translations_dialect ON translations (dialect_id);

CREATE INDEX idx_translations_active ON translations (is_active);

CREATE INDEX idx_translations_confidence ON translations (confidence_level);

CREATE INDEX idx_translations_term_dialect ON translations (term_id, dialect_id);

CREATE INDEX idx_translations_metadata ON translations USING GIN (metadata);

-- Comments for translations table

COMMENT ON TABLE translations IS 'Dialect-specific translations for canonical terms with caching support';

COMMENT ON COLUMN translations.translated_term IS 'The dialect-specific equivalent of the canonical term';

COMMENT ON COLUMN translations.syntax_pattern IS 'Template or pattern showing how to use the translated term';

COMMENT ON COLUMN translations.examples IS 'Example usage in the target dialect';

COMMENT ON COLUMN translations.confidence_level IS 'Translation accuracy confidence (0-100)';

COMMENT ON COLUMN translations.metadata IS 'Additional translation properties and caching hints as JSON';

-- =============================================================================

-- TABLE: artifacts

-- Stores generated SQL artifacts and their metadata

-- =============================================================================

CREATE TABLE artifacts (

id SERIAL PRIMARY KEY,

name VARCHAR(200) NOT NULL,

artifact_type VARCHAR(50) NOT NULL,

source_dialect_id INTEGER REFERENCES dialects(id) ON DELETE SET NULL,

target_dialect_id INTEGER REFERENCES dialects(id) ON DELETE SET NULL,

original_sql TEXT,

translated_sql TEXT,

translation_summary TEXT,

status VARCHAR(20) NOT NULL DEFAULT 'draft',

metadata JSONB,

created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,

updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP

);

-- Indexes for artifacts table

CREATE INDEX idx_artifacts_name ON artifacts (name);

CREATE INDEX idx_artifacts_type ON artifacts (artifact_type);

CREATE INDEX idx_artifacts_source_dialect ON artifacts (source_dialect_id);

CREATE INDEX idx_artifacts_target_dialect ON artifacts (target_dialect_id);

CREATE INDEX idx_artifacts_status ON artifacts (status);

CREATE INDEX idx_artifacts_created ON artifacts (created_at);

CREATE INDEX idx_artifacts_metadata ON artifacts USING GIN (metadata);

-- Comments for artifacts table

COMMENT ON TABLE artifacts IS 'Generated SQL artifacts from translation processes';

COMMENT ON COLUMN artifacts.name IS 'User-friendly name for the artifact';

COMMENT ON COLUMN artifacts.artifact_type IS 'Type of artifact (e.g., script, procedure, schema)';

COMMENT ON COLUMN artifacts.original_sql IS 'Source SQL before translation';

COMMENT ON COLUMN artifacts.translated_sql IS 'SQL after dialect translation';

COMMENT ON COLUMN artifacts.translation_summary IS 'Summary of changes made during translation';

COMMENT ON COLUMN artifacts.status IS 'Artifact status (draft, validated, deployed, etc.)';

COMMENT ON COLUMN artifacts.metadata IS 'Additional artifact properties and processing metadata as JSON';

-- =============================================================================

-- EXAMPLE DATA

-- Sample dialects and basic term translations

-- =============================================================================

-- Insert sample dialects

INSERT INTO dialects (name, display_name, version, description, metadata) VALUES

('postgresql', 'PostgreSQL', '13+', 'PostgreSQL database system', '{"features": ["arrays", "jsonb", "cte"], "url": "https://postgresql.org"}'),

('sqlserver', 'Microsoft SQL Server', '2019+', 'Microsoft SQL Server database system', '{"features": ["cte", "windowing", "xml"], "url": "https://microsoft.com/sql"}'),

('mysql', 'MySQL', '8.0+', 'MySQL database system', '{"features": ["json", "cte"], "url": "https://mysql.com"}');

-- Insert sample terms

INSERT INTO terms (canonical_term, category, subcategory, description, usage_context) VALUES

('CONCAT', 'function', 'string', 'Concatenates two or more strings together', 'String manipulation and formatting'),

('ISNULL', 'function', 'conditional', 'Tests if a value is NULL and returns alternative', 'NULL handling and conditional logic'),

('LIMIT', 'clause', 'query', 'Limits the number of rows returned by a query', 'Result set pagination and performance');

-- Insert sample translations

INSERT INTO translations (term_id, dialect_id, translated_term, syntax_pattern, examples, confidence_level) VALUES

-- CONCAT translations

(1, 1, 'CONCAT', 'CONCAT(string1, string2, ...)', 'SELECT CONCAT(''Hello'', '' '', ''World'');', 100),

(1, 2, 'CONCAT', 'CONCAT(string1, string2, ...)', 'SELECT CONCAT(''Hello'', '' '', ''World'');', 100),

(1, 3, 'CONCAT', 'CONCAT(string1, string2, ...)', 'SELECT CONCAT(''Hello'', '' '', ''World'');', 100),

-- ISNULL translations

(2, 1, 'COALESCE', 'COALESCE(value, replacement)', 'SELECT COALESCE(column, ''N/A'') FROM table;', 95),

(2, 2, 'ISNULL', 'ISNULL(value, replacement)', 'SELECT ISNULL(column, ''N/A'') FROM table;', 100),

(2, 3, 'IFNULL', 'IFNULL(value, replacement)', 'SELECT IFNULL(column, ''N/A'') FROM table;', 100),

-- LIMIT translations

(3, 1, 'LIMIT', 'LIMIT count [OFFSET offset]', 'SELECT * FROM table LIMIT 10 OFFSET 20;', 100),

(3, 2, 'TOP', 'TOP (count)', 'SELECT TOP (10) * FROM table ORDER BY id OFFSET 20 ROWS;', 90),

(3, 3, 'LIMIT', 'LIMIT [offset,] count', 'SELECT * FROM table LIMIT 20, 10;', 100);

-- =============================================================================

-- GRANTS AND PERMISSIONS

-- =============================================================================

-- Grant schema usage (replace 'dbrosetta_owner' with actual role)

-- GRANT USAGE ON SCHEMA dbrosetta TO dbrosetta_owner;

-- GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA dbrosetta TO dbrosetta_owner;

-- GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA dbrosetta TO dbrosetta_owner;

-- =============================================================================

-- SCHEMA VALIDATION

-- =============================================================================

-- Verify schema creation

SELECT

schemaname,

tablename,

tableowner

FROM pg_tables

WHERE schemaname = 'dbrosetta'

ORDER BY tablename;

-- Verify sample data

SELECT

d.name as dialect,

t.canonical_term,

tr.translated_term,

tr.confidence_level

FROM dialects d

JOIN translations tr ON d.id = tr.dialect_id

JOIN terms t ON tr.term_id = t.id

ORDER BY t.canonical_term, d.name;

That’s a reasonable starting point. It also seeded sensible cross-dialect examples — ISNULL in SQL Server maps to COALESCE in PostgreSQL and IFNULL in MySQL, each with a confidence score and an example query. I didn’t ask for confidence scores. It added them anyway. I’m leaving them in for now because they’re useful, but it’s a small reminder that the AI is making product decisions whenever you give it room.

(The full schema, with all four tables, indexes and seeds, is in the repo.)

Accelerate and simplify database development with Redgate

Automate time-consuming tasks and support consistent workflows.

Learn more

What I’m less keen on is the indexing. There’s an index on practically every column that could be filtered or joined on, including separate single-column indexes for fields that only appear together in composite lookups. As a starting point on day one of a project with no production workload yet, it’s defensible.

However, as a pattern to ship, it’s the kind of thing you’d flag in code review: this is going to cost you on writes, and most of these will never be used. We’ll see whether it bites later.

Copilot critiques Copilot

This is where it got genuinely interesting. After confirming the script ran, I fed the whole DDL back to Copilot and asked it to validate the result against the acceptance criteria we’d set in part one. It came back with eight issues.

The eight issues it presented (edited for length):

SERIAL works on PostgreSQL 11+, but GENERATED ALWAYS AS IDENTITY is more portable and avoids implicit sequence ownership issues on managed platforms.
updated_at defaults to CURRENT_TIMESTAMP but never changes on updates. Add triggers or maintain it in application code.
canonical_term is unique but case-sensitive. If lookups should be case-insensitive, add a unique index on lower(canonical_term) or store a normalised column.
Foreign keys and unique constraints aren’t named. Naming them (fk_translations_term, etc.) improves clarity in migrations and error messages.
Pattern-length limits are arbitrary. Consider TEXT instead of VARCHAR(500) for fields like translated_term and examples.
The validation SELECT statements at the end of the script are fine locally but noisy in CI.
Acceptance criteria mention prompt_version and model_settings in metadata. The schema doesn’t enforce them. Consider JSON Schema validation in the app layer, or a CHECK constraint.
SET search_path TO dbrosetta, public is fine interactively but should be avoided in scripts applied by CI or applications.

Analyzing Copilot’s validation effort

Most of those are good calls. I accepted six of them. I overrode two: the updated_at trigger (overkill for v0, do it in app code when there’s an app) and the in-script validation SELECTs (I’d rather pull those out entirely than maintain them as test artifacts).

I’m a little shocked Copilot didn’t flag the indexing strategy — every other surface gets a recommendation, but the index-everything pattern got a pass. That tells me something about what’s overrepresented in the training data. Indexes that exist look correct. Indexes that shouldn’t exist look like prudence.

I asked Copilot to regenerate the script with the six accepted changes — and only those changes. It did. The revised script applied cleanly. Database confirmed.

Mission accomplished – we have a database. And I’m pretty excited about it.

The awkward question

Here’s the thing I haven’t fully resolved. I am, by background, a database person. I have opinions on schema design. When I accepted six of Copilot’s eight suggestions and rejected two, I did that on the basis of judgement formed over thirty-odd years of doing this work.

Is my intervention biasing the experiment towards a result I’m comfortable with?

Yes. Obviously yes. The bigger question is whether that’s a problem.

If the experiment is “can an LLM build a working database with no human intervention,” then every override is a contamination. If the experiment is “can an LLM build a working database faster than I could on my own, with me as a reviewer rather than an author,” then my overrides are exactly the point — they’re what makes the experiment honest.

I’m going with the second framing, because I think the first one is a fantasy that no real engineering organization should be running.

The harder question this raises…

Without my reviewing eye, what would have shipped? A database with an index on every column, no consistent foreign key naming, case-sensitive lookups in a field that almost certainly wants to be case-insensitive, and an updated_at field that lies about when rows were last updated.

None of those are bugs. They all work. But they’re also all things that somebody is going to swear at five years from now, when this database is full of real data and somebody is trying to figure out why a particular query is slow, or why two rows that look identical are being treated as different.

What I learned (and what I’d now do differently)

If you’re letting an LLM design schema, the moment you have something that runs is exactly the moment to feed it back to a different prompt and ask it to critique its own output against your acceptance criteria

This is the single most useful pattern I’ve found so far. Copilot was much more honest about the schema’s weaknesses when reviewing it as a reader than it was when generating it as an author. Two-pass generation, where pass two is find the problems with pass one, is worth the extra round trip every single time.
Watch for what the AI doesn’t flag

The eight issues Copilot raised were all real. The thing it didn’t raise — a schema indexing strategy that would make a senior DBA wince — is the more interesting data point. If you’re using AI to generate code in an area you’re not expert in, that’s where the risk is. Not in the things the model is uncertain about; in the things the model is confidently wrong about because the training corpus is full of the same mistake.
Make peace with the bias question and move on

You will override the AI’s choices. You should. Pretending otherwise is just performing rigour for the article. What matters is being honest, in writing, about which choices you overrode and why. This matters, so a reader without your background can see where your judgement entered the process and decide whether they’d have made the same call.

What’s next

We have a database. The next step is making it deployable. I’ll use a Redgate Flyway baseline against the existing schema, then a continuous integration (CI) pipeline that can apply migrations from GitHub to the live Azure instance. That part of the series turned out to be where the AI tried hardest to compromise on security — and it’s the article I’m most interested in writing. More on that next time.

One last thing, though, that has been nagging at me since this article finished generating itself…

Being a beginner in IT has become much more challenging because of these tools

Look, we can argue about the naming standards, indexing, data type choices, and general table structure all we want. The fact of the matter is, that is a functional database. I did next to nothing to arrive at it other than interrogate the LLM a bit. Heck, GitHub Copilot generated files for me. And filled them in. I could easily allow it to create new resources within Azure, same as everything else we’re doing.

So, even the steps I’ve done manually, I didn’t have to. The question isn’t can a beginner do this. The question is, how does a beginner actually learn how to do this. What’s right? What’s wrong? And what’s simply problematic or a compromise? That’s all about experience, which this stuff is not giving me.

“Everyone wants to move faster with AI, but few are truly ready for it.”

What does the AI landscape look like in 2026? Get the full overview in Redgate’s 2026 State of the Database Landscape AI mini report >>

Download the AI mini report

FAQs: Can Copilot build a database from the ground up?

1. Can Microsoft Copilot design a production-ready PostgreSQL schema?

It produces a functional schema that runs on Azure: tables, foreign keys, indexes, seed data. But the first pass had case-sensitive lookups that should have been case-insensitive, an updated_at column that never updated, unnamed constraints, and an index-on-everything pattern that hurts write performance. It works. It’s not what you’d ship.

2. What is the two-pass AI code review pattern?

Feed the AI’s output back to it in a fresh prompt and ask it to critique the result against your acceptance criteria. Copilot was far more honest about the schema’s weaknesses as a reviewer than as an author – flagging eight real issues on the second pass. The extra round trip is worth it every time.

3. What did Copilot fail to flag in its own schema?

The indexing strategy. The schema included an index on nearly every column, including redundant ones for fields only used in composite lookups. A senior DBA would flag this on sight. Copilot didn’t – suggesting the training corpus treats over-indexing as prudence.

4. Should you override the AI's choices when generating schemas?

Yes, and being honest about the overrides is the point. Six of Copilot’s eight self-critique suggestions were accepted here; two were overridden. The override list is part of the deliverable.

5. How does this affect people learning database design?

A beginner can produce the same functional schema in an afternoon with no understanding of why the choices matter. The tools make it easy to arrive at a working database without the experience that tells you whether it’s a good one.

How I built an app just by talking to an AI LLM

Part 2 of 2

Can you build an app, from the ground up, just by talking to an AI LLM? Find out in this four-part series.…

1. How I built an app just by talking to an AI LLM – the good, the bad, and what I’d do differently next time (part one)
2. I let Copilot build my database. Here’s what I learned – and everything I’d do differently next time (part two)

This document contains proprietary information and is protected by copyright law.

Register for Simple Talk

I let Copilot build my database. Here’s what I learned – and everything I’d do differently next time (part two)

Generating the prompt

In response, here’s what I then asked it directly:

Frankenstein, with relational integrity

Accelerate and simplify database development with Redgate

Copilot critiques Copilot

The eight issues it presented (edited for length):

Analyzing Copilot’s validation effort

The awkward question

The harder question this raises…

What I learned (and what I’d now do differently)

What’s next

Being a beginner in IT has become much more challenging because of these tools

“Everyone wants to move faster with AI, but few are truly ready for it.”

FAQs: Can Copilot build a database from the ground up?

1. Can Microsoft Copilot design a production-ready PostgreSQL schema?

2. What is the two-pass AI code review pattern?

3. What did Copilot fail to flag in its own schema?

4. Should you override the AI's choices when generating schemas?

5. How does this affect people learning database design?

Article tags

About the author

How I built an app just by talking to an AI LLM – the good, the bad, and what I’d do differently next time (part one)

Why even the best teams struggle with cloud migrations

Why database ownership is so fragmented in 2026 – and what you can do about it

I let Copilot build my database. Here’s what I learned – and everything I’d do differently next time (part two)

Generating the prompt

In response, here’s what I then asked it directly:

Frankenstein, with relational integrity

Accelerate and simplify database development with Redgate

Copilot critiques Copilot

The eight issues it presented (edited for length):

Analyzing Copilot’s validation effort

The awkward question

The harder question this raises…

What I learned (and what I’d now do differently)

What’s next

Being a beginner in IT has become much more challenging because of these tools

“Everyone wants to move faster with AI, but few are truly ready for it.”

FAQs: Can Copilot build a database from the ground up?

1. Can Microsoft Copilot design a production-ready PostgreSQL schema?

2. What is the two-pass AI code review pattern?

3. What did Copilot fail to flag in its own schema?

4. Should you override the AI's choices when generating schemas?

5. How does this affect people learning database design?

Article tags

Recommended

Subscribe for more

About the author

How I built an app just by talking to an AI LLM – the good, the bad, and what I’d do differently next time (part one)

Why even the best teams struggle with cloud migrations

Why database ownership is so fragmented in 2026 – and what you can do about it