Database Systems and SQL

Database Systems

A database system consists of:

  • Database: A collection of interrelated data. A database includes:
    • Data Dictionary: Contains a description of the database structure (metadata).
    • Data: The actual information stored in the database.
  • Database Management System (DBMS): A set of programs that allows you to define, create, manipulate, and control access to the database.
  • Users: Administrators, designers, end-users, etc.

Advantages of Using a DBMS

  • Reduces and controls data redundancy.
  • Avoids inconsistencies (different copies of the original data).
  • Maintains data integrity (ensures stored information is correct) using integrity constraints (data type, length, etc.).
  • Simplifies creating, developing, and maintaining relationships between data.
  • Provides access control, security, and concurrency management.
  • Facilitates data backups and recovery from errors.
  • Offers flexibility for structural changes without affecting stored data.

Data Models

A database data model is a set of concepts that describe data types, relationships, and constraints, as well as query and modification operations. Types:

  • Conceptual: High-level, close to the real world, based on entities with defining attributes and relationships. Example: UML.
  • Logical: Describes the global structure, understandable by users but close to the physical data organization. Examples: Relational, object-oriented.
  • Physical: Describes the physical database structure, how records, blocks, etc., are stored and accessed.

Schemas and Instances

Schema: Specifies the database design (metadata).

Instance: The dataset contained in the database at a specific time. The DBMS ensures consistency.

Database Languages and Interfaces

DDL (Data Definition Language): Specifies the conceptual database structure.

SDL (Storage Definition Language): Defines the physical storage structure.

VDL (View Definition Language): Allows users to define views. Most DBMSs use DDL for views.

DML (Data Manipulation Language): Enables data access, insertion, deletion, and modification. Can be procedural (how to get data) or declarative (what data to get).

Relational Data Model

The most widely used model, based on records (tables).

FormalSQLDescription
RelationTableRepresents a generic entity.
TupleRowRepresents a specific entity.
AttributeColumnA property of an entity.
DomainDomainAtomic values attributes can take.

Formal relational properties:

  • No duplicate tuples.
  • Unordered tuples.
  • Unordered attributes.
  • Atomic attribute values.

Integrity Rules

Define constraints on tuples. The relational model specifies two types:

  • Candidate, Primary, and Alternate Keys: A subset of attributes enforcing uniqueness (no two tuples with the same values). Cannot contain null.
  • Foreign Keys: Attributes referencing another relation’s primary key (or itself). Links relations. The DBMS ensures foreign key correspondence (unless null).

Null Values

Represent unknown, missing, or inapplicable attributes. Primary keys cannot contain nulls. Foreign keys may allow nulls.

Database Languages: Relational Algebra

Operators using relations as operands and returning relations. Compatible relations have the same attribute count and domains.

Operations

OperationSymbolCompatibleResult
UnionYesTuples in R, S, or both.
IntersectionYesTuples in both R and S.
DifferenceYesTuples in R but not S.
Cartesian Product×NoAll combinations of R and S tuples.
RestrictionσNoTuples satisfying a condition.
ProjectionπNoSelected attributes of a relation.
JoinNoCombines related tuples.
Division÷NoTuples of R associated with all tuples of S.
Aggregate FunctionsNoGrouping and function application.

Relational Calculus

Expresses conditions. Result: tuples satisfying the condition. Examples:

  • {t | ACTOR(t) and t.cache > 2000}
  • {t.name, t.nationality | ACTOR(t) and t.cache > 5000}

Data Integrity

Integrity Rules

  • Name
  • Constraint
  • Violation Response:
    • Reject operation
    • Alternative procedure

Types:

  • Domain: Defines a domain. Example:
    CREATE DOMAIN Eye_Color AS VARCHAR(10) DEFAULT 'BROWN' CONSTRAINT valid_color CHECK (VALUE IN ('BROWN', 'GREY', 'BLUE', 'GREEN', 'BLACK'));
  • Table: Defined with the table, applying to:
    • Columns: Data type, domain, nulls allowed, etc.
      CREATE TABLE Actor (Name VARCHAR(30) NOT NULL);
    • Candidate and foreign keys.
    • Table checks (CHECK).
      CREATE TABLE Films (CONSTRAINT film_dates_ok CHECK (shooting_end_date < release_date) ...);
  • General (Assertions): Not table-specific, involving multiple tables.
    CREATE ASSERTION RI1_agency1_cache CHECK (NOT EXISTS (SELECT * FROM Actor WHERE agency=1 AND cache<300));

Constraint Checking

  • IMMEDIATE: After each SQL statement.
  • DEFERRED: At the transaction’s end.

Triggers

Actions executed after constraint violations. Design requires:

  • Triggering event.
  • Execution condition.
  • Action.

Transactions

Actions performed by a user or program, reading and updating the database. Properties:

  • Atomicity: All or nothing.
  • Consistency: From one consistent state to another.
  • Isolation: Changes invisible until committed.
  • Durability: Changes persist after successful completion.

Completed with COMMIT or ROLLBACK.

Concurrency Control

Concurrency Issues

  • Lost Update: Two transactions read, one writes, the other overwrites.
  • Dirty Read: Reading uncommitted data.
  • Incorrect Summary: Aggregates on changing data.
  • Non-Repeatable Read: Value changes between reads.

Serializability

Ensures transactions don’t interfere. Serial schedule: Consecutive execution. Inefficient. Non-serial schedule: Interleaved operations. Serializable: Equivalent to a serial schedule. Conflict equivalence: Order matters for conflicting read/write operations.

Precedence graph: Detects non-serializable schedules. Cycles indicate non-serializability. Topological sort for serializability.

Locking Methods

Rules and protocols for serializability. Locks don’t guarantee serializability.

  • Shared Lock: Read but not write.
  • Exclusive Lock: Read and write.

Two-Phase Locking (2PL)

All locks before unlocks. Phases: Expansion (acquire locks), Downturn (release locks). Ensures serializability if applied by all, but reduces concurrency and can cause deadlocks.

2PL variations:

  • Conservative 2PL: Lock everything upfront.
  • Strict 2PL: Hold exclusive locks until the end.
  • Rigorous 2PL: Hold all locks until the end.

Deadlocks

Two or more transactions waiting for each other. DBMS must detect and resolve.

Deadlock management:

  • Timeouts: Abort after waiting too long.
  • Prevention: Timestamps and wait-die or wound-wait algorithms.
  • Detection: Dependency graph and cycle detection.

Starvation

Repeatedly aborted transactions. Solution: Prioritize aborted transactions.

Indefinite Blocking

Lower priority transactions never get locks. Solutions: FIFO, priority based on wait time.

Failure Recovery

Ensures transaction atomicity and durability.

Failure Types

  • Local: Affects only one transaction.
  • Global: Affects all transactions.
  • System (Soft Crash): Hardware or network malfunction.
  • Hardware (Hard Crash): Corrupted database.
  • Physical: Disasters, fire, theft.

Recovery Mechanisms

Restore a consistent state after failure. Log file stores transaction details.

Transaction Recovery

  • Undo uncommitted transactions by reversing log entries.
  • Redo committed transactions by replaying log entries.

Write-Ahead Logging (WAL)

Changes persisted to disk after log entries. Ensures recoverability.

Checkpoints

Log markers indicating disk writes of log buffer and database changes. Speeds up recovery.

Recoverable Schedules

No transaction commits before dependent transactions commit. Prevents lost updates.

Strict Schedules

No reading or writing of uncommitted data. Prevents cascading rollbacks.

Recovery Strategies

  • Restore backup and redo log entries for physical failures.
  • Use log for redo/undo after other failures.

Update-Based Technique

Changes written to disk before commit. Redo or undo based on transaction state.

Deferred Update Technique

Changes written after commit. No action needed for uncommitted transactions.

Variation of Update

Changes written before commit, but commit considered complete after disk write.

Transaction Isolation Levels

SET TRANSACTION ISOLATION LEVEL ;

Access Modes:

  • READ ONLY
  • READ WRITE (default)

SQL

Data Query: SELECT

SELECT FROM WHERE GROUP BY HAVING ORDER BY ;

Set Operations

UNION, INTERSECT, EXCEPT (MINUS in Oracle). Compatible tables required. Use ALL variants to keep duplicates.

Explicit Sets

IN, ANY (SOME), ALL operators for set comparisons.

Nested Queries

EXISTS, UNIQUE operators for subquery checks.

Inline Views

Subqueries in the FROM clause.

Joins

  • INNER JOIN (default)
  • NATURAL JOIN
  • LEFT, RIGHT, FULL [OUTER] JOIN

Data Modification

INSERT INTO () VALUES ();

UPDATE SET = WHERE ;

DELETE FROM WHERE ;

Data Definition Language (DDL)

Schemas

CREATE SCHEMA AUTHORIZATION ;

DROP SCHEMA ;

Tables

CREATE TABLE (, );

Data Types

  • Numeric: INTEGER, SMALLINT, REAL, FLOAT, NUMERIC, DECIMAL (NUMBER in Oracle).
  • Character: CHAR, VARCHAR.
  • Bit String: BIT, BIT VARYING.
  • Temporal: DATE, TIME, TIMESTAMP, INTERVAL.
  • Custom Domains: CREATE DOMAIN, DROP DOMAIN.

Altering Tables

ALTER TABLE commands for adding, dropping, and modifying columns and constraints.

Views

CREATE VIEW AS ;

DROP VIEW ;

Indexes

CREATE INDEX ON ();

DROP INDEX ;