Database Concepts and Design Principles

Posted on Mar 14, 2025 in Commercialization and Logistics

What is a Database?

A database is a logically organized collection of data with inherent meaning, representing some aspect of the real world and designed for a specific purpose.
It is designed, built, and populated with data for a specific purpose and has an intended group of users and applications.

Early Databases

Record-keeping predates computers, with early forms like clay tablets and manual filing systems.
Computerized file-based systems initially mimicked manual filing systems, storing related data together in files.
This approach led to issues such as data separation, redundancy, and dependence.

Database Approach

In the database approach, data is defined in a data dictionary and managed independently of applications, enabling data independence.
This approach offers benefits like reduced redundancy, improved data integrity, and better security.

Data Models

A data model is an abstract concept with structure, operators, and constraints, providing a way to organize and manipulate data.
Types of data models include hierarchical, network, relational, object-oriented, and non-relational (NoSQL).

Three-Level Architecture

The ANSI-SPARC three-level architecture separates user views from physical data representation.
It consists of the external level (user view), conceptual level (logical structure), and internal level (physical implementation).

Functions of a DBMS

A database management system (DBMS) is software that allows users to define, create, manipulate, and maintain a database.
A DBMS provides functions like data storage and retrieval, a user-accessible catalog, transaction support, concurrency control, recovery services, authorization services, data communications support, and integrity services.

Relational Databases

In a relational database, data is organized into tables with rows (tuples) and columns (attributes).
Labels are provided by attribute names, measures or domains by attribute definitions, and values in table cells.

Conclusion

A database is a self-describing collection of logically related records.
The database approach enables data sharing, reducing problems with data separation, isolation, and duplication.
The three-level architecture provides data independence.

Relational Model Characteristics

A relation is a table of values, with each row (tuple) representing related data.
Each column (attribute) specifies how to interpret data values in each row.
The domain defines the type of data allowed in each attribute.
Each tuple is unique, identified by a key.

Keys

Superkey: Any attribute or combination with unique values for each tuple.
Candidate Key: A minimal superkey; removing any attribute loses uniqueness.
Primary Key: The chosen key for the relation; cannot be NULL.
Alternate Key: Any candidate key not chosen as the primary key.
Foreign Key: An attribute referencing a primary key in another relation.
Secondary Key: Non-unique, used for data retrieval.
Surrogate Key: Artificial primary key, often for implementation.

Integrity Constraints

Domain Constraint: Attribute values must come from the same specified domain.
Entity Integrity Constraint: Primary key must be unique and not null.
Referential Integrity Constraint: Foreign key must match a primary key or be null.
Enterprise Constraints: Business rules that must hold true.

Relational Algebra

Restrict (σ): Subset of tuples based on a condition.
Project (∏): Subset of attributes.
Cartesian Product (X): All possible tuple combinations from two relations.
Join (⋈): Product followed by restrict, usually on related records.
Outer Join: Preserves non-matching tuples.
Division (÷): Matching on a subset of values.
Union (U): All tuples from either relation.
Intersection (∩): Tuples present in both relations.
Difference (-): Tuples in the first relation but not the second.

SQL (Structured Query Language)

Ease of programming, data independence, code optimization, and portability are advantages of using a database-specific language like SQL.

SQL is the industry standard language for relational databases.

A Brief History of SQL

SQL has evolved through various versions and standards, with the first commercial implementation in 1978 by Oracle Corp.

Most vendors support most of the standard, but there are differences in support levels and feature names.

Some SQL Features

Data Definition Language (DDL): Used for creating, altering, and dropping tables, views, indexes, etc. (e.g., CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE VIEW, DROP VIEW, CREATE INDEX).
Data Manipulation Language (DML): Used for inserting, deleting, updating, and querying data (e.g., INSERT, DELETE, UPDATE, SELECT).
Data Control Language (DCL): Used for controlling access to database objects (e.g., GRANT, REVOKE).
Transaction Control Language (TCL): Used for managing transactions (e.g., COMMIT, ROLLBACK).
SQL/Persistent Stored Modules (SQL/PSM): Extensions for procedural programming concepts (e.g., functions, triggers, procedures).

SQL in Labs

SQL is used in labs for various purposes, including data definition, data manipulation, data control, transaction control, and procedural language constructs.

SQL for Data Manipulation: `SELECT`

SELECT queries retrieve data from tables.
Basic syntax includes SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses.
SELECT specifies columns to retrieve.
WHERE filters rows based on conditions.
Comparison operators, logical operators, range search, pattern matching, and IS NULL/IS NOT NULL are used in the WHERE clause.
JOIN operator or matching values in WHERE clause are used for joins.
Aggregate functions (COUNT, MIN, MAX, SUM, AVG) operate on columns.
GROUP BY groups rows based on attribute values.
HAVING filters groups.
Subqueries are queries within queries.
Set operators (UNION, INTERSECT, MINUS) combine query results.

SQL for Data Definition

CREATE TABLE defines a new table with columns, data types, and constraints.
CREATE VIEW creates virtual tables from base tables.
Constraints include NOT NULL, UNIQUE, primary key, foreign key, and CHECK.
DROP TABLE removes a table.
ALTER TABLE modifies table structure.
CREATE INDEX and DROP INDEX manage indexes.

Other SQL Features

GRANT and REVOKE control privileges on tables.
Transactions ensure atomicity of operations.
SQL/PSM extends SQL with procedural capabilities.

Keys

Candidate Key: A minimal superkey; removing any attribute loses uniqueness.
Primary Key: The chosen key for the relation; cannot be NULL.
Foreign Key: An attribute referencing a primary key in another relation.

Relational Design Guidelines

Relations should be simple, correspond to a single concept, and free of modification anomalies.
Normalization removes anomalies and ensures lossless joins.
Functional dependencies should be preserved.

Modification Anomalies

Occur when changes to data with redundancy lead to inconsistencies.
Insertion Anomalies: Entering more data than needed.
Update Anomalies: Inconsistent updates.
Deletion Anomalies: Unintentional data loss.

Functional Dependencies (FDs)

A relationship where one attribute’s value determines another’s.
FDs represent real-world relationships between attributes.
Normal forms are based on FDs.
Dependency preservation is crucial for data correctness.

Normal Forms

1NF: All valid relations are in 1NF.
2NF: 1NF plus no partial FDs.
3NF: 2NF plus no transitive FDs.
BCNF: 3NF plus every determinant is a candidate key.
4NF: 3NF plus no multi-valued dependencies.

Normalization Process

Transforms unsatisfactory relation schemas into better designs.
Uses FDs to identify candidate keys and examine attribute relationships.
Normalize to a higher normal form if the design is not optimal.

Normalization in Practice

Normalization creates good designs, but consider processing needs.
Denormalization can be used for performance, but start with a normalized design.

Conclusion

Normalization helps create flexible database designs without modification anomalies.
Consider denormalization for performance, but start with a normalized design.

Logical Database Design

It’s the process of creating a data model based on a specific data model (e.g., relational) but independent of a particular DBMS.
It involves refining the conceptual data model (ERD) and mapping it to a logical data model.
Good logical design is crucial for good database performance.

Logical Design for the Relational Model

The vast majority of new databases are implemented in DBMS based on the relational model.
All features of the ERD must be represented in tables (relations).
Business requirements should be represented in the logical design.

Logical Database Design Methodology

Represent entities as tables.
Represent relationships using foreign keys.
Validate the model using normalization.
Validate the model against user transactions.
Define and check integrity constraints.
Review the logical model with users.
Merge local models into a global model.
Check for future growth.
Document the logical design.

Representing Entities and Relationships

Create a table for each entity, including all attributes.
Normalize tables to 3NF.
Represent 1:N relationships by including the primary key of the 1-side as a foreign key in the N-side table.
For ID-dependent entities, the foreign key is already in the child table as part of the PK.
Recursive 1:N relationships can be represented by putting the foreign key in the same table or creating a new table.
For 1:1 relationships, the placement of the foreign key depends on mandatory/optional participation.
M:N relationships are replaced with two 1:M relationships and an intersection table.
Subtypes and supertypes can be represented in various ways, depending on the relationship and constraints.

Integrity Constraints

Domain, entity integrity, referential integrity, and enterprise constraints should be defined and checked.
Referential actions (foreign key rules) should be specified for INSERT, UPDATE, and DELETE to maintain referential integrity.
Cardinality constraints should be enforced.

Confirming and Documenting the Logical Design

Review the logical data model with users.
Check for future growth and extensibility.
Merge local data models into a global model if necessary.
Document the logical design in a data dictionary, including the ERD, relational schema, attribute descriptions, primary keys, foreign keys, referential actions, and any additional constraints.

Database Concepts and Design Principles

What is a Database?

Early Databases

Database Approach

Data Models

Three-Level Architecture

Functions of a DBMS

Relational Databases

Conclusion

Relational Model Characteristics

Keys

Integrity Constraints

Relational Algebra

SQL (Structured Query Language)

A Brief History of SQL

Some SQL Features

SQL in Labs

SQL for Data Manipulation: `SELECT`

SQL for Data Definition

Other SQL Features

Keys

Relational Design Guidelines

Modification Anomalies

Functional Dependencies (FDs)

Normal Forms

Normalization Process

Normalization in Practice

Conclusion

Logical Database Design

Logical Design for the Relational Model

Logical Database Design Methodology

Representing Entities and Relationships

Integrity Constraints

Confirming and Documenting the Logical Design

Recent Notes

Subjects

Publicidad

Database Concepts and Design Principles

What is a Database?

Early Databases

Database Approach

Data Models

Three-Level Architecture

Functions of a DBMS

Relational Databases

Conclusion

Relational Model Characteristics

Keys

Integrity Constraints

Relational Algebra

SQL (Structured Query Language)

A Brief History of SQL

Some SQL Features

SQL in Labs

SQL for Data Manipulation: SELECT

SQL for Data Definition

Other SQL Features

Keys

Relational Design Guidelines

Modification Anomalies

Functional Dependencies (FDs)

Normal Forms

Normalization Process

Normalization in Practice

Conclusion

Logical Database Design

Logical Design for the Relational Model

Logical Database Design Methodology

Representing Entities and Relationships

Integrity Constraints

Confirming and Documenting the Logical Design

Recent Notes

Subjects

Publicidad

SQL for Data Manipulation: `SELECT`