Public Daniel

Building a Scalable User Data Model for Modern SaaS Applications

Overview

When building a SaaS application, one of the most critical aspects is designing a robust and extensible user data model. In this post, we'll explore how to build such a model, starting from basic requirements and gradually adding complexity to support real-world needs.

Starting Simple: The Core User Table

Let's start with the most basic version of a user table:

CREATE TABLE users (
    user_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
    first_name TEXT NULL,
    last_name TEXT NULL,
    email TEXT NULL UNIQUE,
    password_hash TEXT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

This simple model captures the essential information needed for user authentication and basic personalization. However, modern SaaS applications need much more.

Adding Multi-tenancy Support

Most SaaS applications serve multiple organizations. We need to add support for this:

CREATE TABLE organizations (
    organization_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
    name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

ALTER TABLE users ADD COLUMN 
    organization_id UUID NOT NULL REFERENCES organizations(organization_id);

Now each user belongs to an organization, enabling proper data isolation and multi-tenant features.

User Types and Role Management

Rather than using an ENUM for user types, we create a separate table. This gives us flexibility to add new user types without schema changes:

CREATE TABLE user_types (
    user_type_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
    name TEXT NOT NULL UNIQUE,
    description TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

ALTER TABLE users ADD COLUMN 
    user_type_id UUID NOT NULL REFERENCES user_types(user_type_id);

This approach allows us to define different types of users (admin, regular user, guest, etc.) and their capabilities. It also provides a foundation for future RBAC (Role-Based Access Control) implementation.

Tracking User State

Modern applications need to track various user states and important timestamps:

ALTER TABLE users ADD COLUMN
    locked_at TIMESTAMPTZ NULL,
    deactivated_at TIMESTAMPTZ NULL,
    email_verified_at TIMESTAMPTZ NULL,
    phone_verified_at TIMESTAMPTZ NULL,
    first_login_at TIMESTAMPTZ NULL,
    last_login_at TIMESTAMPTZ NULL;

-- Add a derived status column
ALTER TABLE users ADD COLUMN
    status TEXT GENERATED ALWAYS AS (
        CASE
            -- Terminal States
            WHEN deactivated_at IS NOT NULL THEN 'DEACTIVATED'
            WHEN locked_at IS NOT NULL THEN 'LOCKED'
            -- Active States
            WHEN first_login_at IS NOT NULL THEN
                CASE
                    WHEN email_verified_at IS NOT NULL 
                         OR phone_verified_at IS NOT NULL THEN 'ACTIVE'
                    ELSE 'PENDING_VERIFICATION'
                END
            WHEN email_verified_at IS NOT NULL 
                 OR phone_verified_at IS NOT NULL THEN 'PENDING_FIRST_LOGIN'
            ELSE 'PENDING_VERIFICATION'
        END
    ) STORED;

Instead of storing status as a simple enum, we store the actual events (locked_at, deactivated_at, etc.) and derive the status. This gives us a complete audit trail and allows us to answer questions like "how long was this account locked?"

Handling User Attributes

User attributes come in different flavors:

  1. Static attributes that rarely change
  2. Calculated attributes that depend on user behavior
  3. Feature flags and other dynamic attributes

Let's handle these separately:

-- Static attributes
CREATE TABLE user_attributes (
    user_attributes_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
    user_id UUID NOT NULL UNIQUE REFERENCES users(user_id),
    timezone TEXT NULL CHECK (timezone ~* '^[a-zA-Z0-9\/_-]+$'),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Calculated attributes (refreshed periodically)
CREATE MATERIALIZED VIEW user_attributes_calculated_view AS (
    SELECT
        u.user_id,
        u.organization_id,
        DATE_PART('day', NOW() - u.created_at) AS days_since_created,
        TO_CHAR(u.created_at, 'YYYY-MM') AS cohort_created,
        TO_CHAR(u.first_login_at, 'YYYY-MM') AS cohort_first_active
    FROM users AS u
);

The materialized view is particularly powerful as it can incorporate data from multiple sources, including:

Creating a Clean Interface

Finally, we create views that provide a clean interface to all this data:

CREATE OR REPLACE VIEW user_attributes_view AS (
    SELECT
        u.user_id,
        u.organization_id,
        -- Static attributes
        ua.timezone,
        -- Calculated attributes
        uac.cohort_created,
        uac.cohort_first_active,
        -- Real-time calculated flags
        CASE
            WHEN u.email_verified_at IS NOT NULL 
                 OR u.phone_verified_at IS NOT NULL THEN TRUE
            ELSE FALSE
        END AS flag_is_verified,
        CASE
            WHEN u.last_login_at >= NOW() - INTERVAL '30 days' 
                 THEN TRUE
            ELSE FALSE
        END AS flag_is_active
    FROM users AS u
    LEFT JOIN user_attributes AS ua ON u.user_id = ua.user_id
    LEFT JOIN user_attributes_calculated_view AS uac 
        ON u.user_id = uac.user_id
);

CREATE OR REPLACE VIEW user_view AS (
    SELECT *
    FROM users AS u
    INNER JOIN user_attributes_view AS v ON u.user_id = v.user_id
);

This gives us a single, clean interface that:

Future Extensions

This data model provides a solid foundation that can be extended to support:

  1. Role-Based Access Control (RBAC)

    • Add roles and permissions tables
    • Create user-role mappings
    • Define permission hierarchies
  2. Feature Flags

    • Add a feature_flags table
    • Create organization and user level overrides
    • Track feature usage and adoption
  3. User Preferences

    • Add user-specific settings
    • Support organization-wide defaults
    • Enable preference inheritance
  4. Audit Logging

    • Track changes to user data
    • Monitor security events
    • Support compliance requirements
  5. Triggers

    • We could add triggers on inserts/updates in specific tables to trigger refreshes to materialized view

Conclusion

Building a robust user data model requires careful consideration of current and future needs. By separating concerns (core user data, attributes, calculated values) and providing clean interfaces, we create a foundation that can grow with our application.

Key takeaways:

This model has served as a solid foundation for many SaaS applications, providing the flexibility needed to grow while maintaining data integrity and performance.

Appendix

Final Data Models

Organizations

CREATE TABLE IF NOT EXISTS organizations (
    organization_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
    name TEXT NOT NULL,

    -- Dates
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

User Types

CREATE TABLE IF NOT EXISTS user_types (
    user_type_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
    name TEXT NOT NULL UNIQUE,
    description TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Users

CREATE TABLE IF NOT EXISTS users (
    user_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
    organization_id UUID NOT NULL,
    user_type_id UUID NOT NULL,
    first_name TEXT NULL,
    last_name TEXT NULL,
    full_name TEXT GENERATED ALWAYS AS (TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))) STORED,
    email TEXT NULL UNIQUE,
    phone TEXT NULL,
    avatar_url TEXT NULL,
    password_hash TEXT NULL,

    -- Dates
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    locked_at TIMESTAMPTZ NULL,
    deactivated_at TIMESTAMPTZ NULL,
    email_verified_at TIMESTAMPTZ NULL,
    phone_verified_at TIMESTAMPTZ NULL,
    first_login_at TIMESTAMPTZ NULL,
    last_login_at TIMESTAMPTZ NULL,

    -- Derived Status Column
    status TEXT GENERATED ALWAYS AS (
        CASE
            -- Terminal States
            WHEN deactivated_at IS NOT NULL THEN 'DEACTIVATED'
            WHEN locked_at IS NOT NULL THEN 'LOCKED'

            -- Active States
            WHEN first_login_at IS NOT NULL
                THEN
                    CASE
                        WHEN email_verified_at IS NOT NULL OR phone_verified_at IS NOT NULL THEN 'ACTIVE'
                        ELSE 'PENDING_VERIFICATION'
                    END
            WHEN email_verified_at IS NOT NULL OR phone_verified_at IS NOT NULL THEN 'PENDING_FIRST_LOGIN'
            ELSE 'PENDING_VERIFICATION'
        END
    ) STORED,

    -- Foreign keys
    CONSTRAINT fk_organization_id FOREIGN KEY (organization_id) REFERENCES organizations (organization_id) ON DELETE CASCADE,
    CONSTRAINT fk_user_type_id FOREIGN KEY (user_type_id) REFERENCES user_types (user_type_id) ON DELETE RESTRICT,

    -- Checks
    CONSTRAINT chk_email CHECK (email ~* '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'),
    CONSTRAINT chk_phone CHECK (phone ~* '^\+?[0-9]{1,3}[0-9]{3,14}$' AND LENGTH(phone) >= 5 AND LENGTH(phone) <= 15),
    CONSTRAINT chk_password_hash_length CHECK (LENGTH(password_hash) = 60)
);

-- Create indices for `users` table
CREATE INDEX IF NOT EXISTS idx_users_organization ON users (organization_id);
CREATE INDEX IF NOT EXISTS idx_users_type ON users (user_type_id);
CREATE INDEX IF NOT EXISTS idx_users_email ON users (email);
CREATE INDEX IF NOT EXISTS idx_users_phone ON users (phone);

User Attributes

CREATE TABLE IF NOT EXISTS user_attributes (
    user_attributes_id UUID PRIMARY KEY DEFAULT GEN_RANDOM_UUID(),
    user_id UUID NOT NULL UNIQUE,
    timezone TEXT NULL,

    -- Dates
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    -- Foreign keys
    CONSTRAINT fk_user_id FOREIGN KEY (user_id) REFERENCES users (user_id) ON DELETE CASCADE,

    -- Checks
    CONSTRAINT chk_timezone CHECK (timezone ~* '^[a-zA-Z0-9\/_-]+$')
);

User Attributes Calculated View

CREATE MATERIALIZED VIEW IF NOT EXISTS user_attributes_calculated_view AS (
    SELECT
        u.user_id,
        u.organization_id,
        u.created_at,
        DATE_PART('day', NOW() - u.created_at) AS days_since_created,
        TO_CHAR(u.created_at, 'YYYY-MM') AS cohort_created,
        TO_CHAR(u.first_login_at, 'YYYY-MM') AS cohort_first_active
    FROM users AS u
);

User Attributes View

CREATE OR REPLACE VIEW user_attributes_view AS (
    SELECT
        u.user_id,
        u.organization_id,
        -- Static Attributes from `user_attributes` Table
        ua.timezone,
        -- Attributes Calculated via Materialized View (refreshed on some cadence; Can add additional statistics, days_since_blah, averages, etc...)
        uac.cohort_created,
        uac.cohort_first_active,
        -- Attributes Calculated at Query Time
        CASE
            WHEN u.email_verified_at IS NOT NULL OR u.phone_verified_at IS NOT NULL THEN TRUE
            ELSE FALSE
        END AS flag_is_verified,
        CASE
            WHEN u.last_login_at >= NOW() - INTERVAL '30 days' THEN TRUE
            ELSE FALSE
        END AS flag_is_active,
        CASE
            WHEN u.last_login_at IS NOT NULL THEN TRUE
            ELSE FALSE
        END AS flag_ever_logged_in
    FROM users AS u
    LEFT JOIN user_attributes AS ua ON (u.user_id = ua.user_id)
    LEFT JOIN user_attributes_calculated_view AS uac ON (u.user_id = uac.user_id)
);

User View

CREATE OR REPLACE VIEW user_view AS (
    SELECT
        *
    FROM users AS u
    INNER JOIN user_attributes_view AS v ON (u.user_id = v.user_id)
)