Database Concepts and Architectures

Posted on Oct 24, 2024 in Technology

Tables and Keys

A table (or relation) represents a specific entity of data. It’s composed of:

Primary Key: A unique value identifying each tuple (row).
Tuples (Rows): Instances of the entity.
Attributes (Columns): Characteristics of the entity.

Primary Key: Unique across all tuples, used for identification and querying.

Foreign Key: Links to the primary key of another table.

Data Integrity

Ensures data accuracy, consistency, and validity through rules and constraints.

Entity Integrity: No primary key attribute can be null.
Referential Integrity: Non-null foreign keys must match a related table’s primary key.

NoSQL DBMS in PostgreSQL

PostgreSQL, while primarily an RDBMS, can handle NoSQL data models.

Hierarchical Data Models

Tree-like parent-child relationships.
Simple and efficient.
Limited flexibility and querying difficulty.
Used in IBM’s Information Management System.

Network Data Model

Addresses hierarchical model limitations.
Child nodes can have multiple parent nodes.
More complex to manage.

SQL Commands

WHERE: Filters data based on conditions.

ORDER BY: Sorts data (DESC for descending).

SET: Specifies update arguments.

Joins

Combine columns from multiple tables.

Inner Joins

Return only rows matching the join condition.
Explicit Method (SQL-92): Uses the JOIN keyword.
Implicit Method: Older method without the JOIN keyword.

A join condition specifies columns and comparison methods.

Cells, Rows, and Columns

A column represents an attribute, a row represents an instance, and a cell (intersection) stores a single value.

Self-Joins

Joins a table to itself (rare).

Outer Joins

Return all rows from one table, regardless of the join condition.
Use RIGHT or LEFT keywords (OUTER is optional).
Unmatched columns receive NULL values.

Equijoins

Use the = operator to join on matching columns.

The USING keyword simplifies joins on common columns.

Natural Joins

Automatically joins on columns with the same name.

Cross Joins

Combines every row from one table with every row from another, creating a Cartesian product.

Data Warehousing

Specialized for large datasets, analytics, and reporting.

Uses denormalized models (star, snowflake).

Star Schema

Efficient for storage, history, and updates.

OLTP vs. OLAP

OLTP (Online Transaction Processing): Handles numerous short operations (updates). Common in retail, inventory, and finance.

OLAP (Online Analytical Processing): Used for data mining and business logic.

ETL (Extract, Transform, Load)

Transforms OLTP data into OLAP data.

SQL Command Categories

DML (Data Manipulation Language): SELECT, INSERT, UPDATE, DELETE.
DDL (Data Definition Language): CREATE, DROP.
DCL (Data Control Language): GRANT.

Aggregate Functions

Aggregate numerical results (COUNT, SUM, AVG).

Nested Functions (Subqueries)

Queries within queries.

Window Functions

Advanced aggregate functions returning a value for each row based on a specified window.

NULL Values

Represent missing data.

String Operations

Manipulate string data (CONCAT, LENGTH, UPPER, LOWER, TRIM).

Date Time Operations

Format and manipulate date-time values (NOW, DATE, TIME).

Views

Virtual tables based on SQL statements.

Simplify queries.
Restrict data access.
Ensure consistency.
Provide abstraction.

Indexes

Improve data retrieval speed.

ALL and DISTINCT

ALL (default) includes all values, DISTINCT excludes duplicates.

Database Architecture

Single-tier: Application and data on one machine.
Two-tier: Client-server model.
Three-tier: UI, application logic, and data layer separated.

Database Schema Layers

External Schema (User View): Customized view for each user.
Conceptual Schema: Logical view of the entire database.
Internal Schema: Physical storage method.

SMP (Symmetric Multiprocessing)

Processors share resources.

MPP (Massively Parallel Processing)

Processors have local resources.

Data Warehouse Layers

Bronze: Raw data.
Silver: Cleaned and transformed data.
Gold: Business-ready data.

Data Warehouse vs. Operational Database

Different purposes, schema designs, and data structures.

Data Structures

Structured: Highly organized.
Semi-structured: Some organization (JSON, XML).
Unstructured: No pre-defined format.

ACID Model

Ensures reliable transaction processing (Atomicity, Consistency, Isolation, Durability).

Normalization

Reduces redundancy and ensures data integrity.

Referential Integrity

Maintains relationships between tables.

Data Governance

Manages data availability, usability, integrity, security, and consistency.

ETL vs. ELT

Different data transformation approaches.

Entity-Relationship (ER) Model

Conceptual tool for database design.

External Tables

Reference data stored outside the database.

Partitioning

Divides a database into smaller pieces.

Hash Function

Creates a fixed-size string from an input.

Cloud Database Security and Privacy

High priority for confidential data.

Cloud Database Recovery Management

Protects against data loss and failures.

Cloud Database Auto Scaling

Automatically adjusts resources.

Cloud Database Mirroring

Creates real-time copies for availability.

Failover Clusters

Maintain high availability through redundant servers.