Database Design Principles

October 9, 2025 32174b4 Edit this page

✔️ Current
39 day(s) old

Database Design Principles

Essential principles and best practices for designing robust databases

Database Design Principles

Master the fundamentals of database design to create efficient, scalable, and maintainable database systems.

Database Design Process

1. Requirements Analysis

Gather Information:

What data needs to be stored?
Who will use the system?
What operations are most common?
What are the performance requirements?
What are the security requirements?

Output: List of entities, attributes, and relationships

2. Conceptual Design

Create Entity-Relationship (ER) diagram showing:

Entities (tables)
Attributes (columns)
Relationships
Cardinality

3. Logical Design

Normalize to appropriate level
Define data types
Establish constraints
Plan indexes

4. Physical Design

Choose storage engine
Optimize for specific DBMS
Plan partitioning
Configure caching

Normalization

First Normal Form (1NF)

Rules:

Each column contains atomic (indivisible) values
No repeating groups
Each row is unique

Bad (Violates 1NF):

CREATE TABLE orders (
    order_id INT,
    customer_name VARCHAR(100),
    products VARCHAR(500)  -- "Widget,Gadget,Tool"
);

Good (Follows 1NF):

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_name VARCHAR(100)
);

CREATE TABLE order_items (
    order_id INT,
    product_name VARCHAR(100),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

Second Normal Form (2NF)

Rules:

Must be in 1NF
All non-key attributes depend on the entire primary key
Remove partial dependencies

Bad (Violates 2NF):

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    product_name VARCHAR(100),  -- Depends only on product_id
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

Good (Follows 2NF):

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100)
);

Third Normal Form (3NF)

Rules:

Must be in 2NF
No transitive dependencies
Non-key attributes depend only on primary key

Bad (Violates 3NF):

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    department_name VARCHAR(100),  -- Depends on department_id
    department_location VARCHAR(100)  -- Depends on department_id
);

Good (Follows 3NF):

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    department_location VARCHAR(100)
);

When to Denormalize

Reasons:

Read-heavy applications
Complex joins are too slow
Reporting/analytics needs
Caching aggregated data

Example:

-- Store calculated totals for faster reporting
CREATE TABLE order_summary (
    order_id INT PRIMARY KEY,
    total_items INT,
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP,
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

Data Types

Choosing the Right Type

Integers:

TINYINT    -- -128 to 127 (1 byte)
SMALLINT   -- -32,768 to 32,767 (2 bytes)
INT        -- -2B to 2B (4 bytes)
BIGINT     -- Very large numbers (8 bytes)

Decimals:

-- For money, measurements (precise)
DECIMAL(10,2)  -- 10 digits total, 2 after decimal

-- For scientific data (approximate)
FLOAT
DOUBLE

Strings:

CHAR(10)       -- Fixed length, faster for known sizes
VARCHAR(100)   -- Variable length, saves space
TEXT           -- Large text blocks

Dates and Times:

DATE           -- YYYY-MM-DD
TIME           -- HH:MM:SS
DATETIME       -- YYYY-MM-DD HH:MM:SS
TIMESTAMP      -- Auto-updating timestamp

Boolean:

BOOLEAN        -- TRUE/FALSE
-- Often stored as TINYINT(1) in MySQL

Best Practices

✅ Use smallest data type that fits data
✅ Use VARCHAR instead of CHAR for variable-length data
✅ Use DECIMAL for money (never FLOAT)
✅ Use TIMESTAMP for audit fields
✅ Use ENUM for fixed list of values

❌ Don’t use TEXT for short strings
❌ Don’t use VARCHAR(255) by default
❌ Don’t store dates as strings
❌ Don’t use FLOAT/DOUBLE for money

Primary Keys

Natural vs Surrogate Keys

Natural Key:

Business-meaningful (email, SSN)
Pros: Self-documenting
Cons: Can change, complex

CREATE TABLE users (
    email VARCHAR(255) PRIMARY KEY,
    name VARCHAR(100)
);

Surrogate Key:

System-generated (auto-increment, UUID)
Pros: Never changes, simple
Cons: Extra storage, meaningless

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(100)
);

Recommendation: Use surrogate keys for most tables

Composite Primary Keys

Used when relationship between two entities is unique:

CREATE TABLE student_courses (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

Relationships

One-to-One (1:1)

Each record in Table A relates to one record in Table B.

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE user_profiles (
    user_id INT PRIMARY KEY,
    bio TEXT,
    avatar_url VARCHAR(500),
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);

When to Use:

Separate sensitive data
Large optional fields
Different access patterns

One-to-Many (1:N)

Most common relationship. One record in Table A relates to many in Table B.

CREATE TABLE customers (
    customer_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255)
);

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Many-to-Many (M:N)

Requires junction/bridge table.

CREATE TABLE students (
    student_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE courses (
    course_id INT AUTO_INCREMENT PRIMARY KEY,
    course_name VARCHAR(100) NOT NULL
);

-- Junction table
CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

Indexes

When to Index

Good Candidates:

Primary keys (auto-indexed)
Foreign keys
Frequently searched columns
Frequently used in WHERE clauses
JOIN conditions
ORDER BY columns

Poor Candidates:

Small tables (< 1000 rows)
Columns with few distinct values
Frequently updated columns
Large text fields

Index Types

Single Column:

CREATE INDEX idx_email ON users(email);

Composite Index:

-- Order matters! Good for (last_name) and (last_name, first_name)
-- Not good for just (first_name)
CREATE INDEX idx_name ON users(last_name, first_name);

Unique Index:

CREATE UNIQUE INDEX idx_unique_email ON users(email);

Full-Text Index:

CREATE FULLTEXT INDEX idx_content ON articles(title, body);

Index Best Practices

-- Good: Specific, targeted
SELECT * FROM orders 
WHERE customer_id = 123 AND status = 'pending';
-- Index: (customer_id, status)

-- Bad: Leading wildcard prevents index use
SELECT * FROM users WHERE email LIKE '%@gmail.com';

-- Good: Index can be used
SELECT * FROM users WHERE email LIKE 'john%';

Constraints

NOT NULL

Ensures column always has a value:

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    description TEXT,  -- Can be NULL
    price DECIMAL(10,2) NOT NULL
);

UNIQUE

Ensures no duplicate values:

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);

CHECK

Validates data against condition:

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    stock_quantity INT CHECK (stock_quantity >= 0)
);

DEFAULT

Provides default value:

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    status VARCHAR(20) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

Foreign Key Constraints

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
        ON DELETE CASCADE  -- Delete orders when customer deleted
        ON UPDATE CASCADE  -- Update orders when customer ID changes
);

Options:

CASCADE: Propagate change
SET NULL: Set to NULL
RESTRICT: Prevent change
NO ACTION: Similar to RESTRICT

Common Patterns

Audit Columns

Track who changed what and when:

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by INT,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    updated_by INT,
    FOREIGN KEY (created_by) REFERENCES users(user_id),
    FOREIGN KEY (updated_by) REFERENCES users(user_id)
);

Soft Delete

Keep deleted records for audit/recovery:

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    deleted_at TIMESTAMP NULL DEFAULT NULL
);

-- Query only active users
SELECT * FROM users WHERE deleted_at IS NULL;

-- "Delete" user (soft delete)
UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE user_id = 123;

Versioning

Track changes to records:

CREATE TABLE product_versions (
    version_id INT AUTO_INCREMENT PRIMARY KEY,
    product_id INT NOT NULL,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    version_number INT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by INT,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Hierarchical Data

Adjacency List (Simple, Limited):

CREATE TABLE categories (
    category_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INT NULL,
    FOREIGN KEY (parent_id) REFERENCES categories(category_id)
);

Nested Set (Complex, Efficient):

CREATE TABLE categories (
    category_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    lft INT NOT NULL,
    rgt INT NOT NULL
);

Performance Optimization

Query Optimization

Use EXPLAIN:

EXPLAIN SELECT * FROM orders 
WHERE customer_id = 123 AND status = 'pending';

*Avoid SELECT :

-- Bad: Returns unnecessary data
SELECT * FROM users WHERE user_id = 123;

-- Good: Specific columns
SELECT user_id, name, email FROM users WHERE user_id = 123;

Limit Results:

SELECT * FROM products ORDER BY created_at DESC LIMIT 20;

Partitioning

Split large tables for better performance:

-- Range partitioning by date
CREATE TABLE orders (
    order_id INT,
    order_date DATE,
    total DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026)
);

Caching Strategies

Query Cache:

Enable for repeated identical queries
Useful for read-heavy applications

Application-Level Cache:

Redis, Memcached
Cache common queries
Invalidate on updates

Security

SQL Injection Prevention

Bad (Vulnerable):

# Never do this!
query = f"SELECT * FROM users WHERE username = '{username}'"

Good (Safe):

# Use parameterized queries
query = "SELECT * FROM users WHERE username = ?"
cursor.execute(query, (username,))

Principle of Least Privilege

-- Create read-only user
CREATE USER 'app_reader'@'localhost' IDENTIFIED BY 'password';
GRANT SELECT ON mydb.* TO 'app_reader'@'localhost';

-- Create write user with limited permissions
CREATE USER 'app_writer'@'localhost' IDENTIFIED BY 'password';
GRANT SELECT, INSERT, UPDATE ON mydb.* TO 'app_writer'@'localhost';

Encryption

At Rest:

Encrypt sensitive columns (credit cards, SSN)
Use database encryption features
Encrypt backups

In Transit:

Always use SSL/TLS
Never send passwords in plain text

Schema Migrations

Version Control

Track schema changes like code:

-- migrations/001_create_users.sql
CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- migrations/002_add_username.sql
ALTER TABLE users ADD COLUMN username VARCHAR(50) UNIQUE;

Migration Tools

Flyway (Java)
Liquibase (XML/YAML)
Alembic (Python)
Django Migrations (Python)
Rails Migrations (Ruby)

Best Practices

✅ Never modify committed migrations
✅ Test migrations on copy of production
✅ Plan rollback strategy
✅ Make migrations idempotent
✅ Keep migrations small and focused

Testing

Test Data

-- Use transactions for test isolation
BEGIN;
INSERT INTO users (email) VALUES ('test@example.com');
-- Run tests
ROLLBACK;

Fixtures

Maintain consistent test data:

# users.yml
- id: 1
  email: 'user1@example.com'
  username: 'user1'
  
- id: 2
  email: 'user2@example.com'
  username: 'user2'

Documentation

Table Documentation

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY COMMENT 'Unique user identifier',
    email VARCHAR(255) UNIQUE NOT NULL COMMENT 'User email address (login)',
    username VARCHAR(50) UNIQUE NOT NULL COMMENT 'Display name',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP COMMENT 'Account creation time'
) COMMENT='User accounts table';

ER Diagrams

Use tools to visualize:

dbdiagram.io
MySQL Workbench
pgAdmin (PostgreSQL)
DBeaver

Common Mistakes

❌ Not using foreign keys
❌ Over-normalizing (too many joins)
❌ Under-indexing (slow queries)
❌ Over-indexing (slow writes)
❌ Using wrong data types
❌ No backup strategy
❌ Storing files in database
❌ No documentation

Resources

JavaScript Modern Patterns