Database Design Principles
Essential principles and best practices for designing robust databases
Database Design Principles
Master the fundamentals of database design to create efficient, scalable, and maintainable database systems.
Database Design Process
1. Requirements Analysis
Gather Information:
- What data needs to be stored?
- Who will use the system?
- What operations are most common?
- What are the performance requirements?
- What are the security requirements?
Output: List of entities, attributes, and relationships
2. Conceptual Design
Create Entity-Relationship (ER) diagram showing:
- Entities (tables)
- Attributes (columns)
- Relationships
- Cardinality
3. Logical Design
- Normalize to appropriate level
- Define data types
- Establish constraints
- Plan indexes
4. Physical Design
- Choose storage engine
- Optimize for specific DBMS
- Plan partitioning
- Configure caching
Normalization
First Normal Form (1NF)
Rules:
- Each column contains atomic (indivisible) values
- No repeating groups
- Each row is unique
Bad (Violates 1NF):
CREATE TABLE orders (
order_id INT,
customer_name VARCHAR(100),
products VARCHAR(500) -- "Widget,Gadget,Tool"
);Good (Follows 1NF):
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);
CREATE TABLE order_items (
order_id INT,
product_name VARCHAR(100),
FOREIGN KEY (order_id) REFERENCES orders(order_id)
);Second Normal Form (2NF)
Rules:
- Must be in 1NF
- All non-key attributes depend on the entire primary key
- Remove partial dependencies
Bad (Violates 2NF):
CREATE TABLE order_items (
order_id INT,
product_id INT,
product_name VARCHAR(100), -- Depends only on product_id
quantity INT,
PRIMARY KEY (order_id, product_id)
);Good (Follows 2NF):
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);Third Normal Form (3NF)
Rules:
- Must be in 2NF
- No transitive dependencies
- Non-key attributes depend only on primary key
Bad (Violates 3NF):
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT,
department_name VARCHAR(100), -- Depends on department_id
department_location VARCHAR(100) -- Depends on department_id
);Good (Follows 3NF):
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100),
department_location VARCHAR(100)
);When to Denormalize
Reasons:
- Read-heavy applications
- Complex joins are too slow
- Reporting/analytics needs
- Caching aggregated data
Example:
-- Store calculated totals for faster reporting
CREATE TABLE order_summary (
order_id INT PRIMARY KEY,
total_items INT,
total_amount DECIMAL(10,2),
created_at TIMESTAMP,
FOREIGN KEY (order_id) REFERENCES orders(order_id)
);Data Types
Choosing the Right Type
Integers:
TINYINT -- -128 to 127 (1 byte)
SMALLINT -- -32,768 to 32,767 (2 bytes)
INT -- -2B to 2B (4 bytes)
BIGINT -- Very large numbers (8 bytes)Decimals:
-- For money, measurements (precise)
DECIMAL(10,2) -- 10 digits total, 2 after decimal
-- For scientific data (approximate)
FLOAT
DOUBLEStrings:
CHAR(10) -- Fixed length, faster for known sizes
VARCHAR(100) -- Variable length, saves space
TEXT -- Large text blocksDates and Times:
DATE -- YYYY-MM-DD
TIME -- HH:MM:SS
DATETIME -- YYYY-MM-DD HH:MM:SS
TIMESTAMP -- Auto-updating timestampBoolean:
BOOLEAN -- TRUE/FALSE
-- Often stored as TINYINT(1) in MySQLBest Practices
✅ Use smallest data type that fits data
✅ Use VARCHAR instead of CHAR for variable-length data
✅ Use DECIMAL for money (never FLOAT)
✅ Use TIMESTAMP for audit fields
✅ Use ENUM for fixed list of values
❌ Don’t use TEXT for short strings
❌ Don’t use VARCHAR(255) by default
❌ Don’t store dates as strings
❌ Don’t use FLOAT/DOUBLE for money
Primary Keys
Natural vs Surrogate Keys
Natural Key:
- Business-meaningful (email, SSN)
- Pros: Self-documenting
- Cons: Can change, complex
CREATE TABLE users (
email VARCHAR(255) PRIMARY KEY,
name VARCHAR(100)
);Surrogate Key:
- System-generated (auto-increment, UUID)
- Pros: Never changes, simple
- Cons: Extra storage, meaningless
CREATE TABLE users (
user_id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(100)
);Recommendation: Use surrogate keys for most tables
Composite Primary Keys
Used when relationship between two entities is unique:
CREATE TABLE student_courses (
student_id INT,
course_id INT,
enrollment_date DATE,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);Relationships
One-to-One (1:1)
Each record in Table A relates to one record in Table B.
CREATE TABLE users (
user_id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE user_profiles (
user_id INT PRIMARY KEY,
bio TEXT,
avatar_url VARCHAR(500),
FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);When to Use:
- Separate sensitive data
- Large optional fields
- Different access patterns
One-to-Many (1:N)
Most common relationship. One record in Table A relates to many in Table B.
CREATE TABLE customers (
customer_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255)
);
CREATE TABLE orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
total DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);Many-to-Many (M:N)
Requires junction/bridge table.
CREATE TABLE students (
student_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE courses (
course_id INT AUTO_INCREMENT PRIMARY KEY,
course_name VARCHAR(100) NOT NULL
);
-- Junction table
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrollment_date DATE,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);Indexes
When to Index
Good Candidates:
- Primary keys (auto-indexed)
- Foreign keys
- Frequently searched columns
- Frequently used in WHERE clauses
- JOIN conditions
- ORDER BY columns
Poor Candidates:
- Small tables (< 1000 rows)
- Columns with few distinct values
- Frequently updated columns
- Large text fields
Index Types
Single Column:
CREATE INDEX idx_email ON users(email);Composite Index:
-- Order matters! Good for (last_name) and (last_name, first_name)
-- Not good for just (first_name)
CREATE INDEX idx_name ON users(last_name, first_name);Unique Index:
CREATE UNIQUE INDEX idx_unique_email ON users(email);Full-Text Index:
CREATE FULLTEXT INDEX idx_content ON articles(title, body);Index Best Practices
-- Good: Specific, targeted
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';
-- Index: (customer_id, status)
-- Bad: Leading wildcard prevents index use
SELECT * FROM users WHERE email LIKE '%@gmail.com';
-- Good: Index can be used
SELECT * FROM users WHERE email LIKE 'john%';Constraints
NOT NULL
Ensures column always has a value:
CREATE TABLE products (
product_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
description TEXT, -- Can be NULL
price DECIMAL(10,2) NOT NULL
);UNIQUE
Ensures no duplicate values:
CREATE TABLE users (
user_id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL
);CHECK
Validates data against condition:
CREATE TABLE products (
product_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
stock_quantity INT CHECK (stock_quantity >= 0)
);DEFAULT
Provides default value:
CREATE TABLE orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);Foreign Key Constraints
CREATE TABLE orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE CASCADE -- Delete orders when customer deleted
ON UPDATE CASCADE -- Update orders when customer ID changes
);Options:
CASCADE: Propagate changeSET NULL: Set to NULLRESTRICT: Prevent changeNO ACTION: Similar to RESTRICT
Common Patterns
Audit Columns
Track who changed what and when:
CREATE TABLE products (
product_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10,2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by INT,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
updated_by INT,
FOREIGN KEY (created_by) REFERENCES users(user_id),
FOREIGN KEY (updated_by) REFERENCES users(user_id)
);Soft Delete
Keep deleted records for audit/recovery:
CREATE TABLE users (
user_id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
deleted_at TIMESTAMP NULL DEFAULT NULL
);
-- Query only active users
SELECT * FROM users WHERE deleted_at IS NULL;
-- "Delete" user (soft delete)
UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE user_id = 123;Versioning
Track changes to records:
CREATE TABLE product_versions (
version_id INT AUTO_INCREMENT PRIMARY KEY,
product_id INT NOT NULL,
name VARCHAR(100) NOT NULL,
price DECIMAL(10,2) NOT NULL,
version_number INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by INT,
FOREIGN KEY (product_id) REFERENCES products(product_id)
);Hierarchical Data
Adjacency List (Simple, Limited):
CREATE TABLE categories (
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INT NULL,
FOREIGN KEY (parent_id) REFERENCES categories(category_id)
);Nested Set (Complex, Efficient):
CREATE TABLE categories (
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL
);Performance Optimization
Query Optimization
Use EXPLAIN:
EXPLAIN SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';*Avoid SELECT :
-- Bad: Returns unnecessary data
SELECT * FROM users WHERE user_id = 123;
-- Good: Specific columns
SELECT user_id, name, email FROM users WHERE user_id = 123;Limit Results:
SELECT * FROM products ORDER BY created_at DESC LIMIT 20;Partitioning
Split large tables for better performance:
-- Range partitioning by date
CREATE TABLE orders (
order_id INT,
order_date DATE,
total DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(order_date)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026)
);Caching Strategies
Query Cache:
- Enable for repeated identical queries
- Useful for read-heavy applications
Application-Level Cache:
- Redis, Memcached
- Cache common queries
- Invalidate on updates
Security
SQL Injection Prevention
Bad (Vulnerable):
# Never do this!
query = f"SELECT * FROM users WHERE username = '{username}'"Good (Safe):
# Use parameterized queries
query = "SELECT * FROM users WHERE username = ?"
cursor.execute(query, (username,))Principle of Least Privilege
-- Create read-only user
CREATE USER 'app_reader'@'localhost' IDENTIFIED BY 'password';
GRANT SELECT ON mydb.* TO 'app_reader'@'localhost';
-- Create write user with limited permissions
CREATE USER 'app_writer'@'localhost' IDENTIFIED BY 'password';
GRANT SELECT, INSERT, UPDATE ON mydb.* TO 'app_writer'@'localhost';Encryption
At Rest:
- Encrypt sensitive columns (credit cards, SSN)
- Use database encryption features
- Encrypt backups
In Transit:
- Always use SSL/TLS
- Never send passwords in plain text
Schema Migrations
Version Control
Track schema changes like code:
-- migrations/001_create_users.sql
CREATE TABLE users (
user_id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- migrations/002_add_username.sql
ALTER TABLE users ADD COLUMN username VARCHAR(50) UNIQUE;Migration Tools
- Flyway (Java)
- Liquibase (XML/YAML)
- Alembic (Python)
- Django Migrations (Python)
- Rails Migrations (Ruby)
Best Practices
✅ Never modify committed migrations
✅ Test migrations on copy of production
✅ Plan rollback strategy
✅ Make migrations idempotent
✅ Keep migrations small and focused
Testing
Test Data
-- Use transactions for test isolation
BEGIN;
INSERT INTO users (email) VALUES ('test@example.com');
-- Run tests
ROLLBACK;Fixtures
Maintain consistent test data:
# users.yml
- id: 1
email: 'user1@example.com'
username: 'user1'
- id: 2
email: 'user2@example.com'
username: 'user2'Documentation
Table Documentation
CREATE TABLE users (
user_id INT AUTO_INCREMENT PRIMARY KEY COMMENT 'Unique user identifier',
email VARCHAR(255) UNIQUE NOT NULL COMMENT 'User email address (login)',
username VARCHAR(50) UNIQUE NOT NULL COMMENT 'Display name',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP COMMENT 'Account creation time'
) COMMENT='User accounts table';ER Diagrams
Use tools to visualize:
- dbdiagram.io
- MySQL Workbench
- pgAdmin (PostgreSQL)
- DBeaver
Common Mistakes
❌ Not using foreign keys
❌ Over-normalizing (too many joins)
❌ Under-indexing (slow queries)
❌ Over-indexing (slow writes)
❌ Using wrong data types
❌ No backup strategy
❌ Storing files in database
❌ No documentation