Database Design Principles
October 9, 2025 32174b4 Edit this page
✔️ Current
39 day(s) old

Database Design Principles

Essential principles and best practices for designing robust databases

Database Design Principles

Master the fundamentals of database design to create efficient, scalable, and maintainable database systems.

Database Design Process

1. Requirements Analysis

Gather Information:

  • What data needs to be stored?
  • Who will use the system?
  • What operations are most common?
  • What are the performance requirements?
  • What are the security requirements?

Output: List of entities, attributes, and relationships

2. Conceptual Design

Create Entity-Relationship (ER) diagram showing:

  • Entities (tables)
  • Attributes (columns)
  • Relationships
  • Cardinality

3. Logical Design

  • Normalize to appropriate level
  • Define data types
  • Establish constraints
  • Plan indexes

4. Physical Design

  • Choose storage engine
  • Optimize for specific DBMS
  • Plan partitioning
  • Configure caching

Normalization

First Normal Form (1NF)

Rules:

  • Each column contains atomic (indivisible) values
  • No repeating groups
  • Each row is unique

Bad (Violates 1NF):

CREATE TABLE orders (
    order_id INT,
    customer_name VARCHAR(100),
    products VARCHAR(500)  -- "Widget,Gadget,Tool"
);

Good (Follows 1NF):

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_name VARCHAR(100)
);

CREATE TABLE order_items (
    order_id INT,
    product_name VARCHAR(100),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

Second Normal Form (2NF)

Rules:

  • Must be in 1NF
  • All non-key attributes depend on the entire primary key
  • Remove partial dependencies

Bad (Violates 2NF):

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    product_name VARCHAR(100),  -- Depends only on product_id
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

Good (Follows 2NF):

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100)
);

Third Normal Form (3NF)

Rules:

  • Must be in 2NF
  • No transitive dependencies
  • Non-key attributes depend only on primary key

Bad (Violates 3NF):

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    department_name VARCHAR(100),  -- Depends on department_id
    department_location VARCHAR(100)  -- Depends on department_id
);

Good (Follows 3NF):

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    department_location VARCHAR(100)
);

When to Denormalize

Reasons:

  • Read-heavy applications
  • Complex joins are too slow
  • Reporting/analytics needs
  • Caching aggregated data

Example:

-- Store calculated totals for faster reporting
CREATE TABLE order_summary (
    order_id INT PRIMARY KEY,
    total_items INT,
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP,
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

Data Types

Choosing the Right Type

Integers:

TINYINT    -- -128 to 127 (1 byte)
SMALLINT   -- -32,768 to 32,767 (2 bytes)
INT        -- -2B to 2B (4 bytes)
BIGINT     -- Very large numbers (8 bytes)

Decimals:

-- For money, measurements (precise)
DECIMAL(10,2)  -- 10 digits total, 2 after decimal

-- For scientific data (approximate)
FLOAT
DOUBLE

Strings:

CHAR(10)       -- Fixed length, faster for known sizes
VARCHAR(100)   -- Variable length, saves space
TEXT           -- Large text blocks

Dates and Times:

DATE           -- YYYY-MM-DD
TIME           -- HH:MM:SS
DATETIME       -- YYYY-MM-DD HH:MM:SS
TIMESTAMP      -- Auto-updating timestamp

Boolean:

BOOLEAN        -- TRUE/FALSE
-- Often stored as TINYINT(1) in MySQL

Best Practices

✅ Use smallest data type that fits data
✅ Use VARCHAR instead of CHAR for variable-length data
✅ Use DECIMAL for money (never FLOAT)
✅ Use TIMESTAMP for audit fields
✅ Use ENUM for fixed list of values

❌ Don’t use TEXT for short strings
❌ Don’t use VARCHAR(255) by default
❌ Don’t store dates as strings
❌ Don’t use FLOAT/DOUBLE for money

Primary Keys

Natural vs Surrogate Keys

Natural Key:

  • Business-meaningful (email, SSN)
  • Pros: Self-documenting
  • Cons: Can change, complex
CREATE TABLE users (
    email VARCHAR(255) PRIMARY KEY,
    name VARCHAR(100)
);

Surrogate Key:

  • System-generated (auto-increment, UUID)
  • Pros: Never changes, simple
  • Cons: Extra storage, meaningless
CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(100)
);

Recommendation: Use surrogate keys for most tables

Composite Primary Keys

Used when relationship between two entities is unique:

CREATE TABLE student_courses (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

Relationships

One-to-One (1:1)

Each record in Table A relates to one record in Table B.

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE user_profiles (
    user_id INT PRIMARY KEY,
    bio TEXT,
    avatar_url VARCHAR(500),
    FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);

When to Use:

  • Separate sensitive data
  • Large optional fields
  • Different access patterns

One-to-Many (1:N)

Most common relationship. One record in Table A relates to many in Table B.

CREATE TABLE customers (
    customer_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255)
);

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Many-to-Many (M:N)

Requires junction/bridge table.

CREATE TABLE students (
    student_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE courses (
    course_id INT AUTO_INCREMENT PRIMARY KEY,
    course_name VARCHAR(100) NOT NULL
);

-- Junction table
CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

Indexes

When to Index

Good Candidates:

  • Primary keys (auto-indexed)
  • Foreign keys
  • Frequently searched columns
  • Frequently used in WHERE clauses
  • JOIN conditions
  • ORDER BY columns

Poor Candidates:

  • Small tables (< 1000 rows)
  • Columns with few distinct values
  • Frequently updated columns
  • Large text fields

Index Types

Single Column:

CREATE INDEX idx_email ON users(email);

Composite Index:

-- Order matters! Good for (last_name) and (last_name, first_name)
-- Not good for just (first_name)
CREATE INDEX idx_name ON users(last_name, first_name);

Unique Index:

CREATE UNIQUE INDEX idx_unique_email ON users(email);

Full-Text Index:

CREATE FULLTEXT INDEX idx_content ON articles(title, body);

Index Best Practices

-- Good: Specific, targeted
SELECT * FROM orders 
WHERE customer_id = 123 AND status = 'pending';
-- Index: (customer_id, status)

-- Bad: Leading wildcard prevents index use
SELECT * FROM users WHERE email LIKE '%@gmail.com';

-- Good: Index can be used
SELECT * FROM users WHERE email LIKE 'john%';

Constraints

NOT NULL

Ensures column always has a value:

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    description TEXT,  -- Can be NULL
    price DECIMAL(10,2) NOT NULL
);

UNIQUE

Ensures no duplicate values:

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);

CHECK

Validates data against condition:

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    stock_quantity INT CHECK (stock_quantity >= 0)
);

DEFAULT

Provides default value:

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    status VARCHAR(20) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

Foreign Key Constraints

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
        ON DELETE CASCADE  -- Delete orders when customer deleted
        ON UPDATE CASCADE  -- Update orders when customer ID changes
);

Options:

  • CASCADE: Propagate change
  • SET NULL: Set to NULL
  • RESTRICT: Prevent change
  • NO ACTION: Similar to RESTRICT

Common Patterns

Audit Columns

Track who changed what and when:

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by INT,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    updated_by INT,
    FOREIGN KEY (created_by) REFERENCES users(user_id),
    FOREIGN KEY (updated_by) REFERENCES users(user_id)
);

Soft Delete

Keep deleted records for audit/recovery:

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    deleted_at TIMESTAMP NULL DEFAULT NULL
);

-- Query only active users
SELECT * FROM users WHERE deleted_at IS NULL;

-- "Delete" user (soft delete)
UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE user_id = 123;

Versioning

Track changes to records:

CREATE TABLE product_versions (
    version_id INT AUTO_INCREMENT PRIMARY KEY,
    product_id INT NOT NULL,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    version_number INT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by INT,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Hierarchical Data

Adjacency List (Simple, Limited):

CREATE TABLE categories (
    category_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INT NULL,
    FOREIGN KEY (parent_id) REFERENCES categories(category_id)
);

Nested Set (Complex, Efficient):

CREATE TABLE categories (
    category_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    lft INT NOT NULL,
    rgt INT NOT NULL
);

Performance Optimization

Query Optimization

Use EXPLAIN:

EXPLAIN SELECT * FROM orders 
WHERE customer_id = 123 AND status = 'pending';

*Avoid SELECT :

-- Bad: Returns unnecessary data
SELECT * FROM users WHERE user_id = 123;

-- Good: Specific columns
SELECT user_id, name, email FROM users WHERE user_id = 123;

Limit Results:

SELECT * FROM products ORDER BY created_at DESC LIMIT 20;

Partitioning

Split large tables for better performance:

-- Range partitioning by date
CREATE TABLE orders (
    order_id INT,
    order_date DATE,
    total DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026)
);

Caching Strategies

Query Cache:

  • Enable for repeated identical queries
  • Useful for read-heavy applications

Application-Level Cache:

  • Redis, Memcached
  • Cache common queries
  • Invalidate on updates

Security

SQL Injection Prevention

Bad (Vulnerable):

# Never do this!
query = f"SELECT * FROM users WHERE username = '{username}'"

Good (Safe):

# Use parameterized queries
query = "SELECT * FROM users WHERE username = ?"
cursor.execute(query, (username,))

Principle of Least Privilege

-- Create read-only user
CREATE USER 'app_reader'@'localhost' IDENTIFIED BY 'password';
GRANT SELECT ON mydb.* TO 'app_reader'@'localhost';

-- Create write user with limited permissions
CREATE USER 'app_writer'@'localhost' IDENTIFIED BY 'password';
GRANT SELECT, INSERT, UPDATE ON mydb.* TO 'app_writer'@'localhost';

Encryption

At Rest:

  • Encrypt sensitive columns (credit cards, SSN)
  • Use database encryption features
  • Encrypt backups

In Transit:

  • Always use SSL/TLS
  • Never send passwords in plain text

Schema Migrations

Version Control

Track schema changes like code:

-- migrations/001_create_users.sql
CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- migrations/002_add_username.sql
ALTER TABLE users ADD COLUMN username VARCHAR(50) UNIQUE;

Migration Tools

  • Flyway (Java)
  • Liquibase (XML/YAML)
  • Alembic (Python)
  • Django Migrations (Python)
  • Rails Migrations (Ruby)

Best Practices

✅ Never modify committed migrations
✅ Test migrations on copy of production
✅ Plan rollback strategy
✅ Make migrations idempotent
✅ Keep migrations small and focused

Testing

Test Data

-- Use transactions for test isolation
BEGIN;
INSERT INTO users (email) VALUES ('test@example.com');
-- Run tests
ROLLBACK;

Fixtures

Maintain consistent test data:

# users.yml
- id: 1
  email: 'user1@example.com'
  username: 'user1'
  
- id: 2
  email: 'user2@example.com'
  username: 'user2'

Documentation

Table Documentation

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY COMMENT 'Unique user identifier',
    email VARCHAR(255) UNIQUE NOT NULL COMMENT 'User email address (login)',
    username VARCHAR(50) UNIQUE NOT NULL COMMENT 'Display name',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP COMMENT 'Account creation time'
) COMMENT='User accounts table';

ER Diagrams

Use tools to visualize:

  • dbdiagram.io
  • MySQL Workbench
  • pgAdmin (PostgreSQL)
  • DBeaver

Common Mistakes

❌ Not using foreign keys
❌ Over-normalizing (too many joins)
❌ Under-indexing (slow queries)
❌ Over-indexing (slow writes)
❌ Using wrong data types
❌ No backup strategy
❌ Storing files in database
❌ No documentation

Resources