Database Concepts and Design Principles

What is a Database?

  • A database is a logically organized collection of data with inherent meaning, representing some aspect of the real world and designed for a specific purpose.
  • It is designed, built, and populated with data for a specific purpose and has an intended group of users and applications.

Early Databases

  • Record-keeping predates computers, with early forms like clay tablets and manual filing systems.
  • Computerized file-based systems initially mimicked manual filing systems, storing related data together in files.
  • This approach led to issues such as data separation, redundancy, and dependence.

Database Approach

  • In the database approach, data is defined in a data dictionary and managed independently of applications, enabling data independence.
  • This approach offers benefits like reduced redundancy, improved data integrity, and better security.

Data Models

  • A data model is an abstract concept with structure, operators, and constraints, providing a way to organize and manipulate data.
  • Types of data models include hierarchical, network, relational, object-oriented, and non-relational (NoSQL).

Three-Level Architecture

  • The ANSI-SPARC three-level architecture separates user views from physical data representation.
  • It consists of the external level (user view), conceptual level (logical structure), and internal level (physical implementation).

Functions of a DBMS

  • A database management system (DBMS) is software that allows users to define, create, manipulate, and maintain a database.
  • A DBMS provides functions like data storage and retrieval, a user-accessible catalog, transaction support, concurrency control, recovery services, authorization services, data communications support, and integrity services.

Relational Databases

  • In a relational database, data is organized into tables with rows (tuples) and columns (attributes).
  • Labels are provided by attribute names, measures or domains by attribute definitions, and values in table cells.

Conclusion

  • A database is a self-describing collection of logically related records.
  • The database approach enables data sharing, reducing problems with data separation, isolation, and duplication.
  • The three-level architecture provides data independence.

Relational Model Characteristics

  • A relation is a table of values, with each row (tuple) representing related data.
  • Each column (attribute) specifies how to interpret data values in each row.
  • The domain defines the type of data allowed in each attribute.
  • Each tuple is unique, identified by a key.

Keys

  • Superkey: Any attribute or combination with unique values for each tuple.
  • Candidate Key: A minimal superkey; removing any attribute loses uniqueness.
  • Primary Key: The chosen key for the relation; cannot be NULL.
  • Alternate Key: Any candidate key not chosen as the primary key.
  • Foreign Key: An attribute referencing a primary key in another relation.
  • Secondary Key: Non-unique, used for data retrieval.
  • Surrogate Key: Artificial primary key, often for implementation.

Integrity Constraints

  • Domain Constraint: Attribute values must come from the same specified domain.
  • Entity Integrity Constraint: Primary key must be unique and not null.
  • Referential Integrity Constraint: Foreign key must match a primary key or be null.
  • Enterprise Constraints: Business rules that must hold true.

Relational Algebra

  • Restrict (σ): Subset of tuples based on a condition.
  • Project (∏): Subset of attributes.
  • Cartesian Product (X): All possible tuple combinations from two relations.
  • Join (⋈): Product followed by restrict, usually on related records.
  • Outer Join: Preserves non-matching tuples.
  • Division (÷): Matching on a subset of values.
  • Union (U): All tuples from either relation.
  • Intersection (∩): Tuples present in both relations.
  • Difference (-): Tuples in the first relation but not the second.

SQL (Structured Query Language)

Ease of programming, data independence, code optimization, and portability are advantages of using a database-specific language like SQL.

SQL is the industry standard language for relational databases.

A Brief History of SQL

SQL has evolved through various versions and standards, with the first commercial implementation in 1978 by Oracle Corp.

Most vendors support most of the standard, but there are differences in support levels and feature names.

Some SQL Features

  • Data Definition Language (DDL): Used for creating, altering, and dropping tables, views, indexes, etc. (e.g., CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE VIEW, DROP VIEW, CREATE INDEX).
  • Data Manipulation Language (DML): Used for inserting, deleting, updating, and querying data (e.g., INSERT, DELETE, UPDATE, SELECT).
  • Data Control Language (DCL): Used for controlling access to database objects (e.g., GRANT, REVOKE).
  • Transaction Control Language (TCL): Used for managing transactions (e.g., COMMIT, ROLLBACK).
  • SQL/Persistent Stored Modules (SQL/PSM): Extensions for procedural programming concepts (e.g., functions, triggers, procedures).

SQL in Labs

SQL is used in labs for various purposes, including data definition, data manipulation, data control, transaction control, and procedural language constructs.

SQL for Data Manipulation: SELECT

  • SELECT queries retrieve data from tables.
  • Basic syntax includes SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses.
  • SELECT specifies columns to retrieve.
  • WHERE filters rows based on conditions.
  • Comparison operators, logical operators, range search, pattern matching, and IS NULL/IS NOT NULL are used in the WHERE clause.
  • JOIN operator or matching values in WHERE clause are used for joins.
  • Aggregate functions (COUNT, MIN, MAX, SUM, AVG) operate on columns.
  • GROUP BY groups rows based on attribute values.
  • HAVING filters groups.
  • Subqueries are queries within queries.
  • Set operators (UNION, INTERSECT, MINUS) combine query results.

SQL for Data Definition

  • CREATE TABLE defines a new table with columns, data types, and constraints.
  • CREATE VIEW creates virtual tables from base tables.
  • Constraints include NOT NULL, UNIQUE, primary key, foreign key, and CHECK.
  • DROP TABLE removes a table.
  • ALTER TABLE modifies table structure.
  • CREATE INDEX and DROP INDEX manage indexes.

Other SQL Features

  • GRANT and REVOKE control privileges on tables.
  • Transactions ensure atomicity of operations.
  • SQL/PSM extends SQL with procedural capabilities.

Keys

  • Candidate Key: A minimal superkey; removing any attribute loses uniqueness.
  • Primary Key: The chosen key for the relation; cannot be NULL.
  • Foreign Key: An attribute referencing a primary key in another relation.

Relational Design Guidelines

  • Relations should be simple, correspond to a single concept, and free of modification anomalies.
  • Normalization removes anomalies and ensures lossless joins.
  • Functional dependencies should be preserved.

Modification Anomalies

  • Occur when changes to data with redundancy lead to inconsistencies.
  • Insertion Anomalies: Entering more data than needed.
  • Update Anomalies: Inconsistent updates.
  • Deletion Anomalies: Unintentional data loss.

Functional Dependencies (FDs)

  • A relationship where one attribute’s value determines another’s.
  • FDs represent real-world relationships between attributes.
  • Normal forms are based on FDs.
  • Dependency preservation is crucial for data correctness.

Normal Forms

  • 1NF: All valid relations are in 1NF.
  • 2NF: 1NF plus no partial FDs.
  • 3NF: 2NF plus no transitive FDs.
  • BCNF: 3NF plus every determinant is a candidate key.
  • 4NF: 3NF plus no multi-valued dependencies.

Normalization Process

  • Transforms unsatisfactory relation schemas into better designs.
  • Uses FDs to identify candidate keys and examine attribute relationships.
  • Normalize to a higher normal form if the design is not optimal.

Normalization in Practice

  • Normalization creates good designs, but consider processing needs.
  • Denormalization can be used for performance, but start with a normalized design.

Conclusion

  • Normalization helps create flexible database designs without modification anomalies.
  • Consider denormalization for performance, but start with a normalized design.

Logical Database Design

  • It’s the process of creating a data model based on a specific data model (e.g., relational) but independent of a particular DBMS.
  • It involves refining the conceptual data model (ERD) and mapping it to a logical data model.
  • Good logical design is crucial for good database performance.

Logical Design for the Relational Model

  • The vast majority of new databases are implemented in DBMS based on the relational model.
  • All features of the ERD must be represented in tables (relations).
  • Business requirements should be represented in the logical design.

Logical Database Design Methodology

  • Represent entities as tables.
  • Represent relationships using foreign keys.
  • Validate the model using normalization.
  • Validate the model against user transactions.
  • Define and check integrity constraints.
  • Review the logical model with users.
  • Merge local models into a global model.
  • Check for future growth.
  • Document the logical design.

Representing Entities and Relationships

  • Create a table for each entity, including all attributes.
  • Normalize tables to 3NF.
  • Represent 1:N relationships by including the primary key of the 1-side as a foreign key in the N-side table.
  • For ID-dependent entities, the foreign key is already in the child table as part of the PK.
  • Recursive 1:N relationships can be represented by putting the foreign key in the same table or creating a new table.
  • For 1:1 relationships, the placement of the foreign key depends on mandatory/optional participation.
  • M:N relationships are replaced with two 1:M relationships and an intersection table.
  • Subtypes and supertypes can be represented in various ways, depending on the relationship and constraints.

Integrity Constraints

  • Domain, entity integrity, referential integrity, and enterprise constraints should be defined and checked.
  • Referential actions (foreign key rules) should be specified for INSERT, UPDATE, and DELETE to maintain referential integrity.
  • Cardinality constraints should be enforced.

Confirming and Documenting the Logical Design

  • Review the logical data model with users.
  • Check for future growth and extensibility.
  • Merge local data models into a global model if necessary.
  • Document the logical design in a data dictionary, including the ERD, relational schema, attribute descriptions, primary keys, foreign keys, referential actions, and any additional constraints.