Database Concepts and Architectures
Tables and Keys
A table (or relation) represents a specific entity of data. It’s composed of:
- Primary Key: A unique value identifying each tuple (row).
- Tuples (Rows): Instances of the entity.
- Attributes (Columns): Characteristics of the entity.
Primary Key: Unique across all tuples, used for identification and querying.
Foreign Key: Links to the primary key of another table.
Data Integrity
Ensures data accuracy, consistency, and validity through rules and constraints.
- Entity Integrity: No primary key attribute can be null.
- Referential Integrity: Non-null foreign keys must match a related table’s primary key.
NoSQL DBMS in PostgreSQL
PostgreSQL, while primarily an RDBMS, can handle NoSQL data models.
Hierarchical Data Models
- Tree-like parent-child relationships.
- Simple and efficient.
- Limited flexibility and querying difficulty.
- Used in IBM’s Information Management System.
Network Data Model
- Addresses hierarchical model limitations.
- Child nodes can have multiple parent nodes.
- More complex to manage.
SQL Commands
WHERE: Filters data based on conditions.
ORDER BY: Sorts data (DESC for descending).
SET: Specifies update arguments.
Joins
Combine columns from multiple tables.
Inner Joins
- Return only rows matching the join condition.
- Explicit Method (SQL-92): Uses the
JOIN
keyword. - Implicit Method: Older method without the
JOIN
keyword.
A join condition specifies columns and comparison methods.
Cells, Rows, and Columns
A column represents an attribute, a row represents an instance, and a cell (intersection) stores a single value.
Self-Joins
Joins a table to itself (rare).
Outer Joins
- Return all rows from one table, regardless of the join condition.
- Use
RIGHT
orLEFT
keywords (OUTER
is optional). - Unmatched columns receive
NULL
values.
Equijoins
Use the =
operator to join on matching columns.
The USING
keyword simplifies joins on common columns.
Natural Joins
Automatically joins on columns with the same name.
Cross Joins
Combines every row from one table with every row from another, creating a Cartesian product.
Data Warehousing
Specialized for large datasets, analytics, and reporting.
Uses denormalized models (star, snowflake).
Star Schema
Efficient for storage, history, and updates.
OLTP vs. OLAP
OLTP (Online Transaction Processing): Handles numerous short operations (updates). Common in retail, inventory, and finance.
OLAP (Online Analytical Processing): Used for data mining and business logic.
ETL (Extract, Transform, Load)
Transforms OLTP data into OLAP data.
SQL Command Categories
- DML (Data Manipulation Language):
SELECT
,INSERT
,UPDATE
,DELETE
. - DDL (Data Definition Language):
CREATE
,DROP
. - DCL (Data Control Language):
GRANT
.
Aggregate Functions
Aggregate numerical results (COUNT
, SUM
, AVG
).
Nested Functions (Subqueries)
Queries within queries.
Window Functions
Advanced aggregate functions returning a value for each row based on a specified window.
NULL Values
Represent missing data.
String Operations
Manipulate string data (CONCAT
, LENGTH
, UPPER
, LOWER
, TRIM
).
Date Time Operations
Format and manipulate date-time values (NOW
, DATE
, TIME
).
Views
Virtual tables based on SQL statements.
- Simplify queries.
- Restrict data access.
- Ensure consistency.
- Provide abstraction.
Indexes
Improve data retrieval speed.
ALL and DISTINCT
ALL
(default) includes all values, DISTINCT
excludes duplicates.
Database Architecture
- Single-tier: Application and data on one machine.
- Two-tier: Client-server model.
- Three-tier: UI, application logic, and data layer separated.
Database Schema Layers
- External Schema (User View): Customized view for each user.
- Conceptual Schema: Logical view of the entire database.
- Internal Schema: Physical storage method.
SMP (Symmetric Multiprocessing)
Processors share resources.
MPP (Massively Parallel Processing)
Processors have local resources.
Data Warehouse Layers
- Bronze: Raw data.
- Silver: Cleaned and transformed data.
- Gold: Business-ready data.
Data Warehouse vs. Operational Database
Different purposes, schema designs, and data structures.
Data Structures
- Structured: Highly organized.
- Semi-structured: Some organization (JSON, XML).
- Unstructured: No pre-defined format.
ACID Model
Ensures reliable transaction processing (Atomicity, Consistency, Isolation, Durability).
Normalization
Reduces redundancy and ensures data integrity.
Referential Integrity
Maintains relationships between tables.
Data Governance
Manages data availability, usability, integrity, security, and consistency.
ETL vs. ELT
Different data transformation approaches.
Entity-Relationship (ER) Model
Conceptual tool for database design.
External Tables
Reference data stored outside the database.
Partitioning
Divides a database into smaller pieces.
Hash Function
Creates a fixed-size string from an input.
Cloud Database Security and Privacy
High priority for confidential data.
Cloud Database Recovery Management
Protects against data loss and failures.
Cloud Database Auto Scaling
Automatically adjusts resources.
Cloud Database Mirroring
Creates real-time copies for availability.
Failover Clusters
Maintain high availability through redundant servers.