Star vs. Snowflake Schema: SQL Examples & Python Clustering

Posted on Mar 30, 2025 in Naval Architecture

Snowflake Schema Example

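The SQL below sketches a snowflake schema: a central fact table, its dimension tables, and sub-dimension tables (city_dim, model_dim) that normalize attributes out of the dimensions: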

CREATE TABLE sales_fact (
    customer_id INT,  -- matches customer_dim.customer_id
    car_id INT,       -- matches car_dim.car_id
    date_id INT       -- matches date_dim.date_id
);

CREATE TABLE customer_dim (
    customer_id INT PRIMARY KEY,
    name VARCHAR(20),
    address VARCHAR(25),
    phone_no INT,
    city VARCHAR(10)
);

CREATE TABLE car_dim (
    car_id INT PRIMARY KEY,
    name VARCHAR(10),
    year INT,
    model VARCHAR(10)
);

CREATE TABLE date_dim (
    date_id INT PRIMARY KEY,
    day_of_week VARCHAR(10),
    month VARCHAR(10),
    year INT
);

CREATE TABLE city_dim (
    city VARCHAR(10) PRIMARY KEY,  -- joins to customer_dim.city
    state VARCHAR(10),
    zipcode INT
);

CREATE TABLE model_dim (
    model_id INT,
    year INT,
    items_sold INT
);
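
To see what the extra normalization costs at query time, here is a minimal sketch that loads a trimmed-down version of the snowflake tables into an in-memory SQLite database and counts sales per state. It assumes city_dim can be joined on a city column and that the fact table keys are integers; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim (
    city VARCHAR(10) PRIMARY KEY,  -- assumed join key for this sketch
    state VARCHAR(10),
    zipcode INT
);
CREATE TABLE customer_dim (
    customer_id INT PRIMARY KEY,
    name VARCHAR(20),
    city VARCHAR(10) REFERENCES city_dim(city)
);
CREATE TABLE sales_fact (
    customer_id INT REFERENCES customer_dim(customer_id),
    car_id INT,
    date_id INT
);
-- Made-up sample rows, for illustration only
INSERT INTO city_dim VALUES ('Austin', 'TX', 73301), ('Boston', 'MA', 2108);
INSERT INTO customer_dim VALUES (1, 'Ana', 'Austin'), (2, 'Ben', 'Boston');
INSERT INTO sales_fact VALUES (1, 10, 100), (1, 11, 101), (2, 10, 102);
""")

# Two hops are needed to reach the state: sales_fact -> customer_dim -> city_dim
sales_by_state = conn.execute("""
    SELECT ci.state, COUNT(*) AS sales
    FROM sales_fact AS f
    JOIN customer_dim AS c ON c.customer_id = f.customer_id
    JOIN city_dim AS ci ON ci.city = c.city
    GROUP BY ci.state
    ORDER BY ci.state
""").fetchall()
print(sales_by_state)
```

The second JOIN is the price of the snowflake layout: the state lives one table further away from the fact table than it would in a star schema.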

Star Schema Example

The SQL below sketches a star schema: one fact table joined directly to denormalized dimension tables, with no sub-dimensions:

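CREATE TABLE sales_fact (
    customer_id INT,       -- matches customer_dim.customer_id
    car_id INT,            -- matches car_dim.car_id
    date_id INT,           -- matches date_dim.date_id
    amount DECIMAL(10, 2)  -- portable alternative to the vendor-specific MONEY type
);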

CREATE TABLE car_dim (
    car_id INT PRIMARY KEY,
    name VARCHAR(10),
    year INT,
    model VARCHAR(10)
);

CREATE TABLE customer_dim (
    customer_id INT PRIMARY KEY,
    name VARCHAR(20),
    address VARCHAR(25),
    phone_no INT
);

CREATE TABLE date_dim (
    date_id INT PRIMARY KEY,
    date DATE,
    day_of_week INT,
    year INT,
    month VARCHAR(10)
);
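
A typical star-schema query needs only one join per dimension. The sketch below is a rough illustration using an in-memory SQLite database and made-up rows: it totals the sale amount per car. It assumes integer keys in the fact table and uses NUMERIC for the amount column, since MONEY is not standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_dim (
    car_id INT PRIMARY KEY,
    name VARCHAR(10),
    year INT,
    model VARCHAR(10)
);
CREATE TABLE sales_fact (
    customer_id INT,
    car_id INT REFERENCES car_dim(car_id),
    date_id INT,
    amount NUMERIC  -- used here in place of MONEY
);
-- Made-up sample rows, for illustration only
INSERT INTO car_dim VALUES (10, 'Civic', 2022, 'LX'), (11, 'Accord', 2023, 'EX');
INSERT INTO sales_fact VALUES
    (1, 10, 100, 20000),
    (2, 10, 101, 21000),
    (1, 11, 102, 30000);
""")

# One hop from the fact table to the dimension is enough
revenue_by_car = conn.execute("""
    SELECT c.name, SUM(f.amount) AS revenue
    FROM sales_fact AS f
    JOIN car_dim AS c ON c.car_id = f.car_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(revenue_by_car)
```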

K-Means Clustering with Python

This Python script demonstrates K-Means clustering on the Iris dataset using scikit-learn.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Apply k-means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # explicit n_init keeps results stable across scikit-learn versions
kmeans.fit(X)

# Get cluster labels and cluster centers
labels = kmeans.labels_
centers = kmeans.cluster_centers_

# Reduce the dimensionality for visualization (using PCA)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Plot the clusters
plt.figure(figsize=(8, 6))

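plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='viridis', edgecolor='k', s=80)
# The centers live in the original 4-D feature space, so project them into the same PCA space as the points
centers_pca = pca.transform(centers)
plt.scatter(centers_pca[:, 0], centers_pca[:, 1], c='red', marker='X', s=200, label='Cluster Centers')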

plt.title('K-Means Clustering of Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()
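
For intuition about what the fit step in the script above is doing, here is a minimal pure-Python sketch of the basic k-means loop (Lloyd's algorithm). It is not scikit-learn's implementation, which adds smarter initialization and multiple restarts; the function name kmeans_sketch, the hand-picked starting centers, and the toy points are all assumptions made for illustration.

```python
import math

def kmeans_sketch(points, centers, max_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and centroid updates."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest center
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return labels, centers

# Made-up 2-D points forming two obvious groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centers = kmeans_sketch(points, centers=[points[0], points[3]])
print(labels)
print(centers)
```

On the toy data the loop settles after two passes, with the first three points in one cluster and the last three in the other.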

Installation

Install the necessary Python libraries using pip:

pip install numpy matplotlib scikit-learn

Data Warehouse Table Examples

Dimension Table: Time

An example of a time dimension table:

time_id	date	day_of_week	month	quarter	year
1	2023-01-01	Sunday	January	Q1	2023
2	2023-01-02	Monday	January	Q1	2023
…	…	…	…	…	…

Column Descriptions:

time_id: Primary key for the time dimension.
date: The specific date.
day_of_week: Day of the week.
month: Month of the year.
quarter: Quarter of the year.
year: Year.
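
Rows like these are usually generated rather than typed by hand. The sketch below shows one possible way to do that with Python's standard library; the helper name build_time_dim and the sequential time_id numbering starting at 1 are assumptions, chosen to match the sample rows above.

```python
from datetime import date, timedelta

# Explicit name lists avoid locale-dependent formatting
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

def build_time_dim(start, days):
    """Return (time_id, date, day_of_week, month, quarter, year) tuples."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append((
            offset + 1,                    # time_id: sequential surrogate key
            d.isoformat(),                 # date as YYYY-MM-DD
            DAY_NAMES[d.weekday()],        # weekday() returns 0 for Monday
            MONTH_NAMES[d.month - 1],
            f"Q{(d.month - 1) // 3 + 1}",  # months 1-3 -> Q1, 4-6 -> Q2, ...
            d.year,
        ))
    return rows

rows = build_time_dim(date(2023, 1, 1), 2)
for row in rows:
    print(row)
```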

Fact Table: HospitalFacts

An example of a fact table for hospital data:

patient_id	doctor_id	time_id	admission_count	diagnosis_code	treatment_code	cost
101	201	1	2	A	X	500
102	202	2	1	B	Y	300
…	…	…	…	…	…	…

Column Descriptions:

patient_id: Foreign key to the patient dimension.
doctor_id: Foreign key to the doctor dimension.
time_id: Foreign key to the time dimension.
admission_count: Number of times a patient is admitted.
diagnosis_code: Code representing the diagnosis.
treatment_code: Code representing the treatment.
cost: Cost associated with the admission, diagnosis, and treatment.
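
To show how the fact table and the time dimension work together, here is a small sketch that loads the two sample rows from each table into an in-memory SQLite database and totals admissions and cost per month. The table name time_dim and the column types are assumptions; the patient and doctor dimensions are left out because the post does not define them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_id INTEGER PRIMARY KEY,
    date TEXT,
    day_of_week TEXT,
    month TEXT,
    quarter TEXT,
    year INTEGER
);
CREATE TABLE HospitalFacts (
    patient_id INTEGER,
    doctor_id INTEGER,
    time_id INTEGER REFERENCES time_dim(time_id),
    admission_count INTEGER,
    diagnosis_code TEXT,
    treatment_code TEXT,
    cost INTEGER
);
-- The two sample rows shown in each table above
INSERT INTO time_dim VALUES
    (1, '2023-01-01', 'Sunday', 'January', 'Q1', 2023),
    (2, '2023-01-02', 'Monday', 'January', 'Q1', 2023);
INSERT INTO HospitalFacts VALUES
    (101, 201, 1, 2, 'A', 'X', 500),
    (102, 202, 2, 1, 'B', 'Y', 300);
""")

# Roll the facts up to month level through the time dimension
monthly = conn.execute("""
    SELECT t.year, t.month,
           SUM(f.admission_count) AS admissions,
           SUM(f.cost) AS total_cost
    FROM HospitalFacts AS f
    JOIN time_dim AS t ON t.time_id = f.time_id
    GROUP BY t.year, t.month
""").fetchall()
print(monthly)
```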
