Database Concepts: Structure, Management, and SQL Queries

Posted on Mar 31, 2025 in Computers

Understanding Databases

A database is an organized collection of data that is stored, managed, and accessed electronically. It allows users to efficiently store, retrieve, update, and manipulate data. Databases are commonly used in applications, websites, and businesses to manage structured information.

Database Types

Relational Databases (RDBMS): Use structured tables with rows and columns (e.g., MySQL, PostgreSQL, SQL Server).
NoSQL Databases: Designed for unstructured or semi-structured data (e.g., MongoDB, Cassandra, Redis).
Cloud Databases: Hosted on cloud platforms for scalability (e.g., Amazon RDS, Google Firebase).
Graph Databases: Store relationships between entities efficiently (e.g., Neo4j).
Time-Series Databases: Optimized for time-stamped data (e.g., InfluxDB).
Key-Value Databases: Store data as key-value pairs (e.g., Redis, DynamoDB).

Key Database Components

Tables (for relational databases)
Records (rows) and Fields (columns)
Indexes for faster searches
Queries for data retrieval
Transactions to ensure data integrity

Database Management Systems (DBMS)

A DBMS (Database Management System) is software that allows users to create, manage, and interact with databases. It provides tools for storing, retrieving, updating, and managing data efficiently while ensuring security, integrity, and consistency.

DBMS Functions

Data Storage & Retrieval: Efficiently stores and retrieves data.
Data Security: Protects data with authentication and authorization.
Data Integrity: Ensures accuracy and consistency of data.
Concurrency Control: Manages multiple users accessing data simultaneously.
Backup & Recovery: Prevents data loss with regular backups.

DBMS Types

Relational DBMS (RDBMS): Uses tables (e.g., MySQL, PostgreSQL, Oracle).
NoSQL DBMS: Handles unstructured data (e.g., MongoDB, Cassandra).
Hierarchical DBMS: Uses a tree-like structure (e.g., IBM IMS).
Network DBMS: Uses a graph-like structure for relationships (e.g., IDMS).

Database Normalization Explained

Normalization is the process of organizing data in a database to minimize redundancy and improve data integrity. It involves breaking down large tables into smaller, related tables and establishing relationships between them.

Goals of Normalization

Eliminate Data Redundancy: Avoid storing duplicate data.
Ensure Data Integrity: Prevent anomalies during insert, update, or delete operations.
Improve Data Consistency: Keep data accurate and consistent across tables.
Optimize Query Performance: Reduce the complexity of data retrieval.

Normal Forms (NF)

Normalization is achieved through a series of stages called Normal Forms:

First Normal Form (1NF): Eliminates duplicate columns and ensures each column contains atomic (indivisible) values.
Second Normal Form (2NF): Ensures 1NF and removes partial dependencies (i.e., all non-key attributes depend entirely on the primary key).
Third Normal Form (3NF): Ensures 2NF and removes transitive dependencies (i.e., non-key attributes must not depend on other non-key attributes).
Boyce-Codd Normal Form (BCNF): A stricter version of 3NF, ensuring that every determinant is a candidate key.
Fourth Normal Form (4NF): Ensures BCNF and removes multi-valued dependencies.
Fifth Normal Form (5NF): Ensures 4NF and resolves join dependencies.

First Normal Form (1NF)

Rule:

Eliminate duplicate columns.
Ensure each column contains atomic (indivisible) values.
Each row must have a unique identifier (Primary Key).

Example (Before 1NF – Unnormalized Table):

Student ID	Name	Subject
101	Farheed	English, Math
102	Saroj	English, Nepali

Problem: The “Subject” column contains multiple values (not atomic).

After applying 1NF:

Student ID	Name	Subject
101	Farheed	English
101	Farheed	Math
102	Saroj	English
102	Saroj	Nepali

Now, the table follows 1NF since each column contains atomic values.

Second Normal Form (2NF)

Rule:

Must be in 1NF.
Remove partial dependencies (a column should depend on the whole primary key, not part of it).
If a composite primary key exists, all non-key attributes must depend on the entire key.

Example (Before 2NF – Violating Partial Dependency):
(Assuming composite key: Student_ID, Course)

Student ID	Course	Instructor	Instructor Phone No
101	Math	Mr. Saroj	9803252836
101	Science	Mr. Farheed	9818471478

Problem: “Instructor” and “Instructor Phone No” depend only on “Course”, which is part of the composite key, not the entire key.

After applying 2NF:

Student-Course Table:

Student ID	Course
101	Math
101	Science

Course-Instructor Table:

Course	Instructor	Instructor Phone No
Math	Mr. Saroj	9803252836
Science	Mr. Farheed	9818471478

Now, every non-key column depends on the whole primary key of its respective table, ensuring 2NF.

Third Normal Form (3NF)

Rule:

Must be in 2NF.
Remove transitive dependencies (when a non-key column depends on another non-key column rather than the primary key).

Example (Before 3NF – Transitive Dependency):

Student ID	Name	Course	Department	Dept Head
101	Kamal	Math	Science Dept	Mr. Saroj
102	Lalit	English	Arts Dept	Mr. Farheed

Problem: “Dept Head” depends on “Department” (a non-key attribute), which in turn depends on “Student ID” (the primary key). This is a transitive dependency.

After applying 3NF:

Student-Course Table:

Student ID	Name	Course	Department
101	Kamal	Math	Science Dept
102	Lalit	English	Arts Dept

Department Table:

Department	Dept Head
Science Dept	Mr. Saroj
Arts Dept	Mr. Farheed

Now, there are no transitive dependencies, ensuring 3NF.

Fourth Normal Form (4NF)

Rule:

Must be in 3NF (or BCNF).
Eliminate multi-valued dependencies (when two or more independent attributes depend on the same primary key).

Example (Before 4NF – Multi-Valued Dependency):

Student ID	Course	Hobby
101	Math	Football
101	Math	Cricket
101	English	Football
101	English	Cricket

Problem: “Course” and “Hobby” are independent multi-valued attributes that both depend on “Student ID,” causing redundancy (e.g., the fact that student 101 takes Math is listed twice, once for each hobby).

After applying 4NF:

Student-Course Table:

Student ID	Course
101	Math
101	English

Student-Hobby Table:

Student ID	Hobby
101	Football
101	Cricket

Now, the multi-valued dependency is removed by splitting the table, ensuring 4NF.

Why is Normalization Important?

Eliminates Data Redundancy:
- Redundant data wastes storage space and makes updates more complex.
- Example: Without normalization, a Customer table might store the same address multiple times if the customer places multiple orders.
- Solution: Split into separate Customer and Address tables, linked appropriately.
Improves Data Integrity and Consistency:
- Ensures that changes in one place are reflected consistently.
- Example: If a customer’s phone number is stored in multiple places, an update might be missed in some locations, leading to inconsistency.
- Solution: Store the phone number in one central customer table and reference it where needed.
Simplifies Data Modification (Insert, Update, Delete):
- Without normalization: Updating a value (like a department head’s name) might require changes in multiple rows where that department appears.
- With normalization: Data is updated in one place (e.g., the Department table), reducing errors and effort.
- Example: If “Publisher Name” is repeated in a Books table, a name change requires modifying multiple book rows.
- Solution: Create a separate Publisher table and link it via a foreign key in the Books table.
Improves Query Performance:
- Smaller, well-structured tables generally allow for faster searches, indexing, and joins.
- Example: Searching for all books by “Pearson Publishers” is quicker if there’s an indexed Publisher table linked to the Books table, rather than scanning a potentially huge Books table containing repeated publisher names.
Maintains Data Integrity with Relationships:
- Enforces Referential Integrity using Foreign Keys.
- Ensures that relationships between tables are valid and prevents orphaned records (e.g., an order record referring to a customer ID that no longer exists).
- Example: A BookIssue table should not allow an entry for a BookId that does not exist in the Book table.

What is a Data Model?

A data model is a conceptual framework that defines how data is structured, stored, and manipulated in a database. It provides a blueprint for designing a database by specifying entities, attributes, relationships, constraints, and rules for organizing data.

Understanding Stored Procedures

A Stored Procedure is a precompiled set of SQL statements that can be stored in the database and executed multiple times by name. It helps improve performance (by reducing network traffic and precompiling execution plans), maintainability (by centralizing logic), and security (by granting execute permissions instead of direct table access) in database operations.

Database Triggers Explained

A Trigger is a special type of stored procedure that automatically executes (or “fires”) when a specified event (INSERT, UPDATE, DELETE) occurs on a specific table or view. Triggers are often used to enforce complex business rules, maintain audit logs, ensure data consistency across related tables, or automate actions based on data modifications.

Union Compatibility in Databases

Union Compatibility refers to a condition in relational algebra and SQL where two tables (or the results of two queries) can be combined using set operations like UNION, INTERSECT, and EXCEPT (or MINUS). For two relations to be union-compatible, they must satisfy two conditions:

They must have the same number of attributes (columns).
The data types of corresponding attributes must be compatible or implicitly convertible.

Data Definition Language (DDL)

DDL (Data Definition Language) refers to a subset of SQL commands used to define, modify, and manage the structure of database objects, such as tables, indexes, views, schemas, users, and permissions. DDL commands do not manipulate the data itself but instead focus on the database schema or structure. Common DDL commands include CREATE, ALTER, and DROP.

Data Manipulation Language (DML)

DML (Data Manipulation Language) refers to a subset of SQL commands used to manipulate and manage the data stored within database tables. Unlike DDL, which deals with the structure, DML allows you to insert, update, delete, and retrieve data from tables. Common DML commands include SELECT, INSERT, UPDATE, and DELETE.

Database Security Fundamentals

Database Security refers to the collective measures, controls, and practices used to protect a database from unauthorized access, misuse, modification, disclosure, or destruction. It aims to ensure the confidentiality (preventing unauthorized disclosure), integrity (preventing unauthorized modification or deletion), and availability (ensuring authorized users can access data when needed) of the data stored in the database system. Key aspects include authentication, authorization (permissions/privileges), encryption, auditing, and backup/recovery strategies.

Relational Algebra Concepts

Relational Algebra is a procedural query language based on set theory and algebra. It provides a formal foundation for relational database operations and is used to specify how to retrieve and manipulate data stored in relations (tables). It defines a set of operations that take one or more relations as input and produce a new relation as output. Key operations include Selection (σ), Projection (π), Union (∪), Set Difference (-), Cartesian Product (×), Rename (ρ), and Join (⋈).

Functional Dependencies in Databases

A Functional Dependency (FD) is a constraint between two sets of attributes in a relation (table) from a database. For a given relation R, an attribute Y is functionally dependent on an attribute X (denoted as X → Y) if and only if each X value in R is associated with precisely one Y value in R. In simpler terms, the value of X uniquely determines the value of Y.

Notation:
If attribute set X determines attribute set Y, it is written as:

X → Y

This means that for each unique value (or combination of values) of X, there is only one corresponding value (or combination of values) of Y.

Types of Functional Dependencies

Trivial Functional Dependency:
A functional dependency X → Y is trivial if Y is a subset of X (Y ⊆ X).
Example: {A, B} → A
Since A is already part of the determinant {A, B}, knowing {A, B} trivially determines A.
Non-Trivial Functional Dependency:
A functional dependency X → Y is non-trivial if Y is not a subset of X (Y <0xE2><0x8A><0x88> X).
Example: Student_ID → Student_Name
Here, Student_ID determines Student_Name, and “Student_Name” is not part of the determinant “Student_ID.” This is the most common and useful type of FD.
Partial Functional Dependency:
A partial dependency occurs when a non-prime attribute (an attribute not part of any candidate key) is functionally dependent on only a part of a composite candidate key.
Example: Consider a table Enrollment(Student_ID, Course_ID, Course_Name) where {Student_ID, Course_ID} is the primary key.
Student ID Course ID Course Name
101 CSE101 Database
102 CSE102 Network
Here, we have the FD: {Student_ID, Course_ID} → Course_Name (as it’s the primary key).
However, we also likely have: Course_ID → Course_Name.
Since Course_Name (a non-prime attribute) depends only on Course_ID (part of the primary key), this is a partial dependency.
Problem: Leads to data redundancy and violates 2NF.
Transitive Functional Dependency:
A transitive dependency occurs when there is an indirect relationship that causes a functional dependency. If we have X → Y and Y → Z, then we also have X → Z transitively, where X is a candidate key (or superkey), and Y is not a candidate key, and Z is a non-prime attribute.
Example: Consider a table StudentDept(Student_ID, Department, Dept_Head)
Student ID Department Dept Head
101 Science Dr. Saroj
102 Arts Dr. Farheed
Here: Student_ID → Department (A student belongs to one department).
And: Department → Dept_Head (A department has one head).
Thus, Student_ID → Dept_Head exists transitively via Department.
Problem: Leads to data redundancy and update anomalies, violating 3NF.
Multivalued Dependency (MVD):
A multivalued dependency exists when the presence of one row in a table implies the presence of one or more other rows in that same table. It occurs when two or more independent, multi-valued attributes are dependent on a single key attribute. Denoted as X →→ Y.
Example: Consider EmployeeSkills(Employee, Skill, Hobby)
Employee Skill Hobby
John SQL Chess
John Python Chess
John SQL Football
John Python Football
Here, for employee ‘John’, the skills {SQL, Python} and hobbies {Chess, Football} are independent. This leads to MVDs: Employee →→ Skill and Employee →→ Hobby. To represent all combinations, redundancy is introduced.
Problem: Leads to data anomalies and violates 4NF.

Student ID	Course ID	Course Name
101	CSE101	Database
102	CSE102	Network

Student ID	Department	Dept Head
101	Science	Dr. Saroj
102	Arts	Dr. Farheed

Employee	Skill	Hobby
John	SQL	Chess
John	Python	Chess
John	SQL	Football
John	Python	Football

Relational Algebra and SQL Query Examples

Given the schema as below:

Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, reserve_date)

A. Find name, rating, and age of all sailors

Relational Algebra:

π sname, rating, age (Sailors)

SQL Query:

SELECT sname, rating, age
FROM Sailors;

B. Find the name and color of boats having color ‘blue’

Relational Algebra:

π bname, color (σ color='blue' (Boats))

SQL Query:

SELECT bname, color
FROM Boats
WHERE color = 'blue';

C. Find the name of the boats reserved on date ‘2021-07-20’

Relational Algebra:

π bname (σ reserve_date='2021-07-20' (Reserves ⋈ Boats))

SQL Query:

SELECT b.bname
FROM Boats b
JOIN Reserves r ON b.bid = r.bid
WHERE r.reserve_date = '2021-07-20';

D. Find the names of boats having color ‘green’

Relational Algebra:

π bname (σ color='green' (Boats))

SQL Query:

SELECT bname
FROM Boats
WHERE color = 'green';

E. Use LEFT and RIGHT OUTER JOIN between Sailors and Reserves

LEFT OUTER JOIN (Show all sailors, and their reservations if any):

SELECT s.sid, s.sname, r.bid, r.reserve_date
FROM Sailors s
LEFT OUTER JOIN Reserves r ON s.sid = r.sid;

RIGHT OUTER JOIN (Show all reservations, and the sailor details if available):

SELECT s.sid, s.sname, r.bid, r.reserve_date
FROM Sailors s
RIGHT OUTER JOIN Reserves r ON s.sid = r.sid;

SQL Query Examples: Book Database

Consider the following table schemas:

Book(BookId, Title, Publisher, PublishedDate, Price)
Book_Authors(BookId, AuthorName)
Borrower(CardNo, Name, Address, Contact)
BookIssue(BookId, CardNo, IssuedDate, DueDate)

A. Find the name of the books taken by “Ramesh Sharma”

SQL Query:

SELECT B.Title
FROM Book B
JOIN BookIssue BI ON B.BookId = BI.BookId
JOIN Borrower BR ON BI.CardNo = BR.CardNo
WHERE BR.Name = 'Ramesh Sharma';

B. Find the issued book name and the borrower’s name who have issued date before 15th May 2023 and due date after 15th June 2023

SQL Query:

SELECT B.Title, BR.Name
FROM Book B
JOIN BookIssue BI ON B.BookId = BI.BookId
JOIN Borrower BR ON BI.CardNo = BR.CardNo
WHERE BI.IssuedDate < '2023-05-15'
  AND BI.DueDate > '2023-06-15';

C. Find the borrower’s name who has taken the book written by “Laxmi Prashad Devkota”

SQL Query:

SELECT DISTINCT BR.Name
FROM Borrower BR
JOIN BookIssue BI ON BR.CardNo = BI.CardNo
JOIN Book_Authors BA ON BI.BookId = BA.BookId
WHERE BA.AuthorName = 'Laxmi Prashad Devkota';

D. Find the number of books issued on the date 25th April 2023

SQL Query:

SELECT COUNT(*) AS Number_of_Books
FROM BookIssue
WHERE IssuedDate = '2023-04-25';

E. Find the borrower’s name who has taken a book having a price greater than the average price.

SQL Query:

SELECT DISTINCT BR.Name
FROM Borrower BR
JOIN BookIssue BI ON BR.CardNo = BI.CardNo
JOIN Book B ON BI.BookId = B.BookId
WHERE B.Price > (SELECT AVG(Price) FROM Book);

Database Concepts: Structure, Management, and SQL Queries

Understanding Databases

Database Types

Key Database Components

Database Management Systems (DBMS)

DBMS Functions

DBMS Types

Database Normalization Explained

Goals of Normalization

Normal Forms (NF)

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Fourth Normal Form (4NF)

Why is Normalization Important?

What is a Data Model?

Understanding Stored Procedures

Database Triggers Explained

Union Compatibility in Databases

Data Definition Language (DDL)

Data Manipulation Language (DML)

Database Security Fundamentals

Relational Algebra Concepts

Functional Dependencies in Databases

Types of Functional Dependencies

Relational Algebra and SQL Query Examples

SQL Query Examples: Book Database

Recent Notes

Subjects

Publicidad