
From one wall to many: migrating a multi-tenant SaaS schema without downtime

We launched with one Wall of Love per user, then customers started asking for multiple walls per product. Here's the exact migration we ran — two SQL files, a backfill, and a foreign key addition — and how we did it without dropping a table or taking the app offline.

TL;DR

Drop the unique constraint first (migration 1), add the new column and FK with a backfill in the same transaction (migration 2). The app runs continuously throughout because we never remove columns — only add them — and the Drizzle schema stays in sync with the DB at every step.

When we shipped Proofly, every user had exactly one Wall of Love. The schema enforced this with a unique constraint on wall_of_love_configs.user_id. It seemed like enough — a testimonial wall is a testimonial wall.

Within a month, customers were asking for more. An agency wanted one wall per client. A SaaS team wanted a separate wall for their enterprise testimonials and their self-serve testimonials. One customer wanted a wall for each product line.

The feature request was clear. The migration was the interesting part.

What the schema looked like before

CREATE TABLE wall_of_love_configs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES profiles(id) ON DELETE CASCADE,
  title TEXT,
  subtitle TEXT,
  theme TEXT DEFAULT 'light',
  embed_slug TEXT NOT NULL UNIQUE,
  created_at TIMESTAMPTZ DEFAULT now() NOT NULL,
  updated_at TIMESTAMPTZ DEFAULT now() NOT NULL
);

-- One wall per user
CREATE UNIQUE INDEX wall_of_love_configs_user_id_key ON wall_of_love_configs(user_id);

Testimonials linked to a wall indirectly: testimonial → testimonial_request → (nothing). There was no explicit wall_config_id anywhere in the request chain. The API just loaded all testimonials for a user when rendering a wall.

The two-migration plan

We broke this into two sequential migrations. The reason for two instead of one: the first change (removing the unique constraint) is immediately safe to deploy, and we wanted to verify the system worked correctly with multiple walls possible before adding the foreign key that codifies the relationship.

Migration 1: allow multiple walls

-- 0006_multi_walls.sql
DROP INDEX IF EXISTS "wall_of_love_configs_user_id_key";

ALTER TABLE "wall_of_love_configs" ADD COLUMN "name" text;

CREATE INDEX "wall_of_love_configs_user_id_idx"
  ON "wall_of_love_configs" USING btree ("user_id");

Three operations:

  1. Drop the unique index — allows more than one wall per user immediately. The _key suffix follows Postgres's auto-naming convention for unique constraints and their backing indexes; we replace it with a plain _idx for the non-unique version.
  2. Add the name column — so users can label their walls ("Enterprise Customers", "Self-serve"). Nullable, because existing rows don't need it immediately.
  3. Add a btree index on user_id — since we lost the unique index (which was also doing index duty for queries filtering by user_id), we add a plain index to keep those queries fast.

No rows are deleted, no data changes, no downtime. The app continues running; the only visible change is that the dashboard's "create wall" button no longer errors after the first wall.

Migration 2: connect requests to walls

The harder migration. Testimonial requests needed an explicit wall association so we could filter testimonials by wall rather than loading everything for a user and filtering in application code.

-- 0007_request_wall_id.sql

-- Add the nullable column first
ALTER TABLE "testimonial_requests"
  ADD COLUMN "wall_config_id" uuid;

-- Add the foreign key
ALTER TABLE "testimonial_requests"
  ADD CONSTRAINT "testimonial_requests_wall_config_id_wall_of_love_configs_id_fk"
  FOREIGN KEY ("wall_config_id")
  REFERENCES "public"."wall_of_love_configs"("id")
  ON DELETE set null
  ON UPDATE no action;

-- Index for the FK (Postgres doesn't auto-create these)
CREATE INDEX "testimonial_requests_wall_config_id_idx"
  ON "testimonial_requests" USING btree ("wall_config_id");

-- Backfill: associate every existing request with the user's oldest wall.
-- DISTINCT ON picks exactly one wall per user (the first by created_at).
UPDATE "testimonial_requests" r
SET "wall_config_id" = w.id
FROM (
  SELECT DISTINCT ON ("user_id") "id", "user_id"
  FROM "wall_of_love_configs"
  ORDER BY "user_id", "created_at" ASC
) w
WHERE w."user_id" = r."user_id"
  AND r."wall_config_id" IS NULL;

The column is nullable. This is intentional — it lets the migration run without requiring every existing row to have a value before the backfill. The ON DELETE set null means if a user deletes a wall, their requests don't cascade-delete; they just lose their wall association and can be reassigned.

The backfill uses DISTINCT ON ("user_id") with ORDER BY "user_id", "created_at" ASC to pick one wall per user — specifically the oldest one, which in most cases was the only wall they had. This runs as a single UPDATE statement, which Postgres executes atomically.
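For readers less familiar with DISTINCT ON, the pick-one-wall-per-user semantics can be sketched in plain TypeScript over in-memory rows (illustrative types and sample data, not our production schema — the real work is the single SQL UPDATE above):

```typescript
// In-memory sketch of the backfill's DISTINCT ON semantics.
type Wall = { id: string; userId: string; createdAt: number };
type Request = { id: string; userId: string; wallConfigId: string | null };

function backfill(requests: Request[], walls: Wall[]): Request[] {
  // DISTINCT ON ("user_id") ... ORDER BY "user_id", "created_at" ASC:
  // keep only the oldest wall per user.
  const oldestWall = new Map<string, Wall>();
  for (const w of [...walls].sort((a, b) => a.createdAt - b.createdAt)) {
    if (!oldestWall.has(w.userId)) oldestWall.set(w.userId, w);
  }
  // WHERE r."wall_config_id" IS NULL makes the update idempotent.
  return requests.map((r) =>
    r.wallConfigId === null && oldestWall.has(r.userId)
      ? { ...r, wallConfigId: oldestWall.get(r.userId)!.id }
      : r,
  );
}

const walls: Wall[] = [
  { id: "w1", userId: "u1", createdAt: 1 },
  { id: "w2", userId: "u1", createdAt: 2 }, // newer wall, ignored
];
const requests: Request[] = [
  { id: "r1", userId: "u1", wallConfigId: null },
  { id: "r2", userId: "u2", wallConfigId: null }, // user with zero walls: untouched
];

const once = backfill(requests, walls);
const twice = backfill(once, walls); // second run changes nothing
```

Note how the user with zero walls keeps a null wall_config_id — the same edge case the real backfill leaves for the nullable column to absorb.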

The expand-contract pattern

What made this safe to run on a live app is the expand-contract pattern:

Expand: add the nullable column. The app keeps working. New writes from updated application code include the wall_config_id. Old code paths (if any were still running) write null, which is fine.

Backfill: fill in the nulls for existing rows. This is idempotent — the AND r.wall_config_id IS NULL clause means running it twice is harmless.

Contract: (future) once we're confident all rows have a value, add NOT NULL. We haven't done this yet — the column is still nullable for legacy rows created before the migration, and that's fine for our read queries, which join against the column.

The key constraint throughout: never remove a column that the running application code is reading from. Only add, only relax, until both the old and new code paths are gone.

What changed in the application queries

Before this migration, loadApprovedTestimonials loaded all approved testimonials for a user:

// Before
await db.select().from(testimonials)
  .where(eq(testimonials.userId, userId))

After, wall queries filter by wallConfigId via a join on testimonialRequests:

// After
await db.select({ ...fields })
  .from(testimonials)
  .innerJoin(
    testimonialRequests,
    eq(testimonials.requestId, testimonialRequests.id),
  )
  .where(
    and(
      eq(testimonials.userId, userId),
      eq(testimonialRequests.wallConfigId, wallConfigId),
    ),
  )

The function signature changed to require { wallConfigId }. Every call site — the public wall feed API, the dashboard, the embed — was updated before we ran the migration, so by the time the column existed, the code was already using it.

Lessons from the migration#

Two small migrations beat one big one. Each migration is its own checkpoint. If something goes wrong after migration 1, you know exactly which change caused it. If we'd done both in one file, diagnosing a failure would be slower.

Nullable first, backfill second, not-null later (maybe). Don't try to add a NOT NULL column in a single statement on a large table: before Postgres 11, adding a column with a default rewrote the entire table, and even on current versions a later SET NOT NULL scans the whole table under an exclusive lock. Add nullable, backfill, then add the constraint. For us, the column is fine as nullable since our query code always joins through the request, and requests always have a wall ID for any request created after the migration.
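If we ever do tighten the column, a lock-friendly sequence could look like this (a sketch of the hypothetical future contract step, not a migration we've run):

```sql
-- A plain SET NOT NULL scans the whole table under an ACCESS EXCLUSIVE lock.
-- A NOT VALID check constraint lets Postgres split that into two cheaper steps.
ALTER TABLE testimonial_requests
  ADD CONSTRAINT wall_config_id_not_null
  CHECK (wall_config_id IS NOT NULL) NOT VALID;

-- Validates existing rows without blocking concurrent writes.
ALTER TABLE testimonial_requests
  VALIDATE CONSTRAINT wall_config_id_not_null;

-- On Postgres 12+, SET NOT NULL sees the validated constraint
-- and skips the full-table scan.
ALTER TABLE testimonial_requests
  ALTER COLUMN wall_config_id SET NOT NULL;

-- The check constraint is now redundant.
ALTER TABLE testimonial_requests
  DROP CONSTRAINT wall_config_id_not_null;
```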

Write the backfill before you need it. We wrote and tested the backfill SQL locally against a copy of production data before running it. The edge cases — users with zero walls, users who deleted all their requests — showed up in local testing, not in production.

Index your foreign keys. Postgres does not automatically create an index on the referencing column of a foreign key. We added testimonial_requests_wall_config_id_idx explicitly. Without it, WHERE testimonial_requests.wall_config_id = ? would do a sequential scan on the requests table.
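You can confirm the planner actually uses the FK index with EXPLAIN (illustrative; the exact plan depends on table size and statistics):

```sql
EXPLAIN SELECT * FROM testimonial_requests
WHERE wall_config_id = '00000000-0000-0000-0000-000000000000';
-- On a table of meaningful size, expect something like:
--   Index Scan using testimonial_requests_wall_config_id_idx on testimonial_requests
-- Without the index, the same query falls back to:
--   Seq Scan on testimonial_requests
```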

Frequently asked


Why did you separate this into two migration files instead of one?

Postgres lets you drop a constraint and add a column in the same transaction, so technically one migration would work. We split it because the first change (dropping the unique constraint) is a schema change that immediately affects app behavior, and we wanted to deploy and verify it independently before adding the new column. Smaller, reversible migrations are easier to reason about under stress.

What happened to existing requests that didn't have a wall_config_id?

The backfill in migration 0007 associated every existing request with the user's oldest wall (the first one they created, ordered by created_at ascending). DISTINCT ON user_id picks one wall per user, and the UPDATE sets wall_config_id on all that user's requests. It runs in a single statement, so there's no window where some requests are associated and others aren't.

Did you need any application code changes before running the migrations?

Yes. We updated the Zod input schema for creating testimonial requests to require wall_config_id as a non-nullable field before the DB migration added the column. That way, new requests created after the migration would always have a wall ID. The DB column itself was nullable to support the legacy rows until the backfill ran — a classic expand-contract pattern.