Overview
When building a SaaS application, one of the most critical aspects is designing a robust and extensible user data model. In this post, we'll explore how to build such a model, starting from basic requirements and gradually adding complexity to support real-world needs.
Starting Simple: The Core User Table
Let's start with the most basic version of a user table:
CREATE TABLE users (
user_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
first_name TEXT NULL,
last_name TEXT NULL,
email TEXT NULL UNIQUE,
password_hash TEXT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
This simple model captures the essential information needed for user authentication and basic personalization. However, modern SaaS applications need much more.
Adding Multi-tenancy Support
Most SaaS applications serve multiple organizations. We need to add support for this:
CREATE TABLE organizations (
organization_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
ALTER TABLE users ADD COLUMN
organization_id UUID NOT NULL REFERENCES organizations(organization_id);
Now each user belongs to an organization, enabling proper data isolation and multi-tenant features.
User Types and Role Management
Rather than using an ENUM for user types, we create a separate table. This gives us flexibility to add new user types without schema changes:
CREATE TABLE user_types (
user_type_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
name TEXT NOT NULL UNIQUE,
description TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
ALTER TABLE users ADD COLUMN
user_type_id UUID NOT NULL REFERENCES user_types(user_type_id);
This approach allows us to define different types of users (admin, regular user, guest, etc.) and their capabilities. It also provides a foundation for future RBAC (Role-Based Access Control) implementation.
Tracking User State
Modern applications need to track various user states and important timestamps:
ALTER TABLE users ADD COLUMN
locked_at TIMESTAMPTZ NULL,
deactivated_at TIMESTAMPTZ NULL,
email_verified_at TIMESTAMPTZ NULL,
phone_verified_at TIMESTAMPTZ NULL,
first_login_at TIMESTAMPTZ NULL,
last_login_at TIMESTAMPTZ NULL;
-- Add a derived status column
ALTER TABLE users ADD COLUMN
status TEXT GENERATED ALWAYS AS (
CASE
-- Terminal States
WHEN deactivated_at IS NOT NULL THEN 'DEACTIVATED'
WHEN locked_at IS NOT NULL THEN 'LOCKED'
-- Active States
WHEN first_login_at IS NOT NULL THEN
CASE
WHEN email_verified_at IS NOT NULL
OR phone_verified_at IS NOT NULL THEN 'ACTIVE'
ELSE 'PENDING_VERIFICATION'
END
WHEN email_verified_at IS NOT NULL
OR phone_verified_at IS NOT NULL THEN 'PENDING_FIRST_LOGIN'
ELSE 'PENDING_VERIFICATION'
END
) STORED;
Instead of storing status as a simple enum, we store the actual events (locked_at, deactivated_at, etc.) and derive the status. This gives us a complete audit trail and allows us to answer questions like "how long was this account locked?"
Handling User Attributes
User attributes come in different flavors:
- Static attributes that rarely change
- Calculated attributes that depend on user behavior
- Feature flags and other dynamic attributes
Let's handle these separately:
-- Static attributes
CREATE TABLE user_attributes (
user_attributes_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
user_id UUID NOT NULL UNIQUE REFERENCES users(user_id),
timezone TEXT NULL CHECK (timezone ~* '^[a-zA-Z0-9\/_-]+$'),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Calculated attributes (refreshed periodically)
CREATE MATERIALIZED VIEW user_attributes_calculated_view AS (
SELECT
u.user_id,
u.organization_id,
DATE_PART('day', NOW() - u.created_at) AS days_since_created,
TO_CHAR(u.created_at, 'YYYY-MM') AS cohort_created,
TO_CHAR(u.first_login_at, 'YYYY-MM') AS cohort_first_active
FROM users AS u
);
The materialized view is particularly powerful as it can incorporate data from multiple sources, including:
- Usage analytics from a data warehouse
- Billing information from your payment processor
- Integration status from third-party services
Creating a Clean Interface
Finally, we create views that provide a clean interface to all this data:
CREATE OR REPLACE VIEW user_attributes_view AS (
SELECT
u.user_id,
u.organization_id,
-- Static attributes
ua.timezone,
-- Calculated attributes
uac.cohort_created,
uac.cohort_first_active,
-- Real-time calculated flags
CASE
WHEN u.email_verified_at IS NOT NULL
OR u.phone_verified_at IS NOT NULL THEN TRUE
ELSE FALSE
END AS flag_is_verified,
CASE
WHEN u.last_login_at >= NOW() - INTERVAL '30 days'
THEN TRUE
ELSE FALSE
END AS flag_is_active
FROM users AS u
LEFT JOIN user_attributes AS ua ON u.user_id = ua.user_id
LEFT JOIN user_attributes_calculated_view AS uac
ON u.user_id = uac.user_id
);
CREATE OR REPLACE VIEW user_view AS (
SELECT *
FROM users AS u
INNER JOIN user_attributes_view AS v ON u.user_id = v.user_id
);
This gives us a single, clean interface that:
- Combines all user data in one place
- Calculates derived attributes efficiently
- Separates concerns (static vs. calculated attributes)
- Provides a foundation for future extensions
Future Extensions
This data model provides a solid foundation that can be extended to support:
-
Role-Based Access Control (RBAC)
- Add roles and permissions tables
- Create user-role mappings
- Define permission hierarchies
-
Feature Flags
- Add a
feature_flags
table - Create organization and user level overrides
- Track feature usage and adoption
- Add a
-
User Preferences
- Add user-specific settings
- Support organization-wide defaults
- Enable preference inheritance
-
Audit Logging
- Track changes to user data
- Monitor security events
- Support compliance requirements
-
Triggers
- We could add triggers on inserts/updates in specific tables to trigger refreshes to materialized view
Conclusion
Building a robust user data model requires careful consideration of current and future needs. By separating concerns (core user data, attributes, calculated values) and providing clean interfaces, we create a foundation that can grow with our application.
Key takeaways:
- Store events and derive state instead of storing state directly
- Separate static and calculated attributes
- Use materialized views for expensive calculations
- Provide clean interfaces through views
- Design for extensibility from the start
This model has served as a solid foundation for many SaaS applications, providing the flexibility needed to grow while maintaining data integrity and performance.
Appendix
Final Data Models
Organizations
CREATE TABLE IF NOT EXISTS organizations (
organization_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
name TEXT NOT NULL,
-- Dates
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
User Types
CREATE TABLE IF NOT EXISTS user_types (
user_type_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
name TEXT NOT NULL UNIQUE,
description TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Users
CREATE TABLE IF NOT EXISTS users (
user_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
organization_id UUID NOT NULL,
user_type_id UUID NOT NULL,
first_name TEXT NULL,
last_name TEXT NULL,
full_name TEXT GENERATED ALWAYS AS (TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))) STORED,
email TEXT NULL UNIQUE,
phone TEXT NULL,
avatar_url TEXT NULL,
password_hash TEXT NULL,
-- Dates
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
locked_at TIMESTAMPTZ NULL,
deactivated_at TIMESTAMPTZ NULL,
email_verified_at TIMESTAMPTZ NULL,
phone_verified_at TIMESTAMPTZ NULL,
first_login_at TIMESTAMPTZ NULL,
last_login_at TIMESTAMPTZ NULL,
-- Derived Status Column
status TEXT GENERATED ALWAYS AS (
CASE
-- Terminal States
WHEN deactivated_at IS NOT NULL THEN 'DEACTIVATED'
WHEN locked_at IS NOT NULL THEN 'LOCKED'
-- Active States
WHEN first_login_at IS NOT NULL
THEN
CASE
WHEN email_verified_at IS NOT NULL OR phone_verified_at IS NOT NULL THEN 'ACTIVE'
ELSE 'PENDING_VERIFICATION'
END
WHEN email_verified_at IS NOT NULL OR phone_verified_at IS NOT NULL THEN 'PENDING_FIRST_LOGIN'
ELSE 'PENDING_VERIFICATION'
END
) STORED,
-- Foreign keys
CONSTRAINT fk_organization_id FOREIGN KEY (organization_id) REFERENCES organizations (organization_id) ON DELETE CASCADE,
CONSTRAINT fk_user_type_id FOREIGN KEY (user_type_id) REFERENCES user_types (user_type_id) ON DELETE RESTRICT,
-- Checks
CONSTRAINT chk_email CHECK (email ~* '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'),
CONSTRAINT chk_phone CHECK (phone ~* '^\+?[0-9]{1,3}[0-9]{3,14}$' AND LENGTH(phone) >= 5 AND LENGTH(phone) <= 15),
CONSTRAINT chk_password_hash_length CHECK (LENGTH(password_hash) = 60)
);
-- Create indices for `users` table
CREATE INDEX IF NOT EXISTS idx_users_organization ON users (organization_id);
CREATE INDEX IF NOT EXISTS idx_users_type ON users (user_type_id);
CREATE INDEX IF NOT EXISTS idx_users_email ON users (email);
CREATE INDEX IF NOT EXISTS idx_users_phone ON users (phone);
User Attributes
CREATE TABLE IF NOT EXISTS user_attributes (
user_attributes_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
user_id UUID NOT NULL UNIQUE,
timezone TEXT NULL,
-- Dates
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Foreign keys
CONSTRAINT fk_user_id FOREIGN KEY (user_id) REFERENCES users (user_id) ON DELETE CASCADE,
-- Checks
CONSTRAINT chk_timezone CHECK (timezone ~* '^[a-zA-Z0-9\/_-]+$')
);
User Attributes Calculated View
CREATE MATERIALIZED VIEW IF NOT EXISTS user_attributes_calculated_view AS (
SELECT
u.user_id,
u.organization_id,
u.created_at,
DATE_PART('day', NOW() - u.created_at) AS days_since_created,
TO_CHAR(u.created_at, 'YYYY-MM') AS cohort_created,
TO_CHAR(u.first_login_at, 'YYYY-MM') AS cohort_first_active
FROM users AS u
);
User Attributes View
CREATE OR REPLACE VIEW user_attributes_view AS (
SELECT
u.user_id,
u.organization_id,
-- Static Attributes from `user_attributes` Table
ua.timezone,
-- Attributes Calculated via Materialized View (refreshed on some cadence; Can add additional statistics, days_since_blah, averages, etc...)
uac.cohort_created,
uac.cohort_first_active,
-- Attributes Calculated at Query Time
CASE
WHEN u.email_verified_at IS NOT NULL OR u.phone_verified_at IS NOT NULL THEN TRUE
ELSE FALSE
END AS flag_is_verified,
CASE
WHEN u.last_login_at >= NOW() - INTERVAL '30 days' THEN TRUE
ELSE FALSE
END AS flag_is_active,
CASE
WHEN u.last_login_at IS NOT NULL THEN TRUE
ELSE FALSE
END AS flag_ever_logged_in
FROM users AS u
LEFT JOIN user_attributes AS ua ON (u.user_id = ua.user_id)
LEFT JOIN user_attributes_calculated_view AS uac ON (u.user_id = uac.user_id)
);
User View
CREATE OR REPLACE VIEW user_view AS (
SELECT
*
FROM users AS u
INNER JOIN user_attributes_view AS v ON (u.user_id = v.user_id)
)