Database Systems and SQL
Database Systems
A database system consists of:
- Database: A collection of interrelated data. A database includes:
- Data Dictionary: Contains a description of the database structure (metadata).
- Data: The actual information stored in the database.
- Database Management System (DBMS): A set of programs that allows you to define, create, manipulate, and control access to the database.
- Users: Administrators, designers, end-users, etc.
Advantages of Using a DBMS
- Reduces and controls data redundancy.
- Avoids inconsistencies (different copies of the original data).
- Maintains data integrity (ensures stored information is correct) using integrity constraints (data type, length, etc.).
- Simplifies creating, developing, and maintaining relationships between data.
- Provides access control, security, and concurrency management.
- Facilitates data backups and recovery from errors.
- Offers flexibility for structural changes without affecting stored data.
Data Models
A database data model is a set of concepts that describe data types, relationships, and constraints, as well as query and modification operations. Types:
- Conceptual: High-level, close to the real world, based on entities with defining attributes and relationships. Example: UML.
- Logical: Describes the global structure, understandable by users but close to the physical data organization. Examples: Relational, object-oriented.
- Physical: Describes the physical database structure, how records, blocks, etc., are stored and accessed.
Schemas and Instances
Schema: Specifies the database design (metadata).
Instance: The dataset contained in the database at a specific time. The DBMS ensures consistency.
Database Languages and Interfaces
DDL (Data Definition Language): Specifies the conceptual database structure.
SDL (Storage Definition Language): Defines the physical storage structure.
VDL (View Definition Language): Allows users to define views. Most DBMSs use DDL for views.
DML (Data Manipulation Language): Enables data access, insertion, deletion, and modification. Can be procedural (how to get data) or declarative (what data to get).
Relational Data Model
The most widely used model, based on records (tables).
Formal | SQL | Description |
---|---|---|
Relation | Table | Represents a generic entity. |
Tuple | Row | Represents a specific entity. |
Attribute | Column | A property of an entity. |
Domain | Domain | Atomic values attributes can take. |
Formal relational properties:
- No duplicate tuples.
- Unordered tuples.
- Unordered attributes.
- Atomic attribute values.
Integrity Rules
Define constraints on tuples. The relational model specifies two types:
- Candidate, Primary, and Alternate Keys: A subset of attributes enforcing uniqueness (no two tuples with the same values). Cannot contain null.
- Foreign Keys: Attributes referencing another relation’s primary key (or itself). Links relations. The DBMS ensures foreign key correspondence (unless null).
Null Values
Represent unknown, missing, or inapplicable attributes. Primary keys cannot contain nulls. Foreign keys may allow nulls.
Database Languages: Relational Algebra
Operators using relations as operands and returning relations. Compatible relations have the same attribute count and domains.
Operations
Operation | Symbol | Compatible | Result |
---|---|---|---|
Union | ∪ | Yes | Tuples in R, S, or both. |
Intersection | ∩ | Yes | Tuples in both R and S. |
Difference | – | Yes | Tuples in R but not S. |
Cartesian Product | × | No | All combinations of R and S tuples. |
Restriction | σ | No | Tuples satisfying a condition. |
Projection | π | No | Selected attributes of a relation. |
Join | ⋈ | No | Combines related tuples. |
Division | ÷ | No | Tuples of R associated with all tuples of S. |
Aggregate Functions | No | Grouping and function application. |
Relational Calculus
Expresses conditions. Result: tuples satisfying the condition. Examples:
- {t | ACTOR(t) and t.cache > 2000}
- {t.name, t.nationality | ACTOR(t) and t.cache > 5000}
Data Integrity
Integrity Rules
- Name
- Constraint
- Violation Response:
- Reject operation
- Alternative procedure
Types:
- Domain: Defines a domain. Example:
CREATE DOMAIN Eye_Color AS VARCHAR(10) DEFAULT 'BROWN' CONSTRAINT valid_color CHECK (VALUE IN ('BROWN', 'GREY', 'BLUE', 'GREEN', 'BLACK'));
- Table: Defined with the table, applying to:
- Columns: Data type, domain, nulls allowed, etc.
CREATE TABLE Actor (Name VARCHAR(30) NOT NULL);
- Candidate and foreign keys.
- Table checks (CHECK).
CREATE TABLE Films (CONSTRAINT film_dates_ok CHECK (shooting_end_date < release_date) ...);
- Columns: Data type, domain, nulls allowed, etc.
- General (Assertions): Not table-specific, involving multiple tables.
CREATE ASSERTION RI1_agency1_cache CHECK (NOT EXISTS (SELECT * FROM Actor WHERE agency=1 AND cache<300));
Constraint Checking
- IMMEDIATE: After each SQL statement.
- DEFERRED: At the transaction’s end.
Triggers
Actions executed after constraint violations. Design requires:
- Triggering event.
- Execution condition.
- Action.
Transactions
Actions performed by a user or program, reading and updating the database. Properties:
- Atomicity: All or nothing.
- Consistency: From one consistent state to another.
- Isolation: Changes invisible until committed.
- Durability: Changes persist after successful completion.
Completed with COMMIT or ROLLBACK.
Concurrency Control
Concurrency Issues
- Lost Update: Two transactions read, one writes, the other overwrites.
- Dirty Read: Reading uncommitted data.
- Incorrect Summary: Aggregates on changing data.
- Non-Repeatable Read: Value changes between reads.
Serializability
Ensures transactions don’t interfere. Serial schedule: Consecutive execution. Inefficient. Non-serial schedule: Interleaved operations. Serializable: Equivalent to a serial schedule. Conflict equivalence: Order matters for conflicting read/write operations.
Precedence graph: Detects non-serializable schedules. Cycles indicate non-serializability. Topological sort for serializability.
Locking Methods
Rules and protocols for serializability. Locks don’t guarantee serializability.
- Shared Lock: Read but not write.
- Exclusive Lock: Read and write.
Two-Phase Locking (2PL)
All locks before unlocks. Phases: Expansion (acquire locks), Downturn (release locks). Ensures serializability if applied by all, but reduces concurrency and can cause deadlocks.
2PL variations:
- Conservative 2PL: Lock everything upfront.
- Strict 2PL: Hold exclusive locks until the end.
- Rigorous 2PL: Hold all locks until the end.
Deadlocks
Two or more transactions waiting for each other. DBMS must detect and resolve.
Deadlock management:
- Timeouts: Abort after waiting too long.
- Prevention: Timestamps and wait-die or wound-wait algorithms.
- Detection: Dependency graph and cycle detection.
Starvation
Repeatedly aborted transactions. Solution: Prioritize aborted transactions.
Indefinite Blocking
Lower priority transactions never get locks. Solutions: FIFO, priority based on wait time.
Failure Recovery
Ensures transaction atomicity and durability.
Failure Types
- Local: Affects only one transaction.
- Global: Affects all transactions.
- System (Soft Crash): Hardware or network malfunction.
- Hardware (Hard Crash): Corrupted database.
- Physical: Disasters, fire, theft.
Recovery Mechanisms
Restore a consistent state after failure. Log file stores transaction details.
Transaction Recovery
- Undo uncommitted transactions by reversing log entries.
- Redo committed transactions by replaying log entries.
Write-Ahead Logging (WAL)
Changes persisted to disk after log entries. Ensures recoverability.
Checkpoints
Log markers indicating disk writes of log buffer and database changes. Speeds up recovery.
Recoverable Schedules
No transaction commits before dependent transactions commit. Prevents lost updates.
Strict Schedules
No reading or writing of uncommitted data. Prevents cascading rollbacks.
Recovery Strategies
- Restore backup and redo log entries for physical failures.
- Use log for redo/undo after other failures.
Update-Based Technique
Changes written to disk before commit. Redo or undo based on transaction state.
Deferred Update Technique
Changes written after commit. No action needed for uncommitted transactions.
Variation of Update
Changes written before commit, but commit considered complete after disk write.
Transaction Isolation Levels
SET TRANSACTION ISOLATION LEVEL ;
Access Modes:
- READ ONLY
- READ WRITE (default)
SQL
Data Query: SELECT
SELECT FROM WHERE GROUP BY HAVING ORDER BY ;
Set Operations
UNION, INTERSECT, EXCEPT (MINUS in Oracle). Compatible tables required. Use ALL variants to keep duplicates.
Explicit Sets
IN
, ANY
(SOME
), ALL
operators for set comparisons.
Nested Queries
EXISTS
, UNIQUE
operators for subquery checks.
Inline Views
Subqueries in the FROM
clause.
Joins
INNER JOIN
(default)NATURAL JOIN
LEFT
,RIGHT
,FULL [OUTER] JOIN
Data Modification
INSERT INTO () VALUES ();
UPDATE SET = WHERE ;
DELETE FROM WHERE ;
Data Definition Language (DDL)
Schemas
CREATE SCHEMA AUTHORIZATION ;
DROP SCHEMA ;
Tables
CREATE TABLE (, );
Data Types
- Numeric:
INTEGER
,SMALLINT
,REAL
,FLOAT
,NUMERIC
,DECIMAL
(NUMBER
in Oracle). - Character:
CHAR
,VARCHAR
. - Bit String:
BIT
,BIT VARYING
. - Temporal:
DATE
,TIME
,TIMESTAMP
,INTERVAL
. - Custom Domains:
CREATE DOMAIN
,DROP DOMAIN
.
Altering Tables
ALTER TABLE
commands for adding, dropping, and modifying columns and constraints.
Views
CREATE VIEW AS ;
DROP VIEW ;
Indexes
CREATE INDEX ON ();
DROP INDEX ;