Clustering Algorithms: Applications and Methods

1. What are Sample Applications of Clustering in Which the Objective is to Cluster the Samples/Observations?

  • This is the most common use of clustering.
  • Examples include:
    • Machine learning
    • Data mining
    • Information retrieval
    • Bioinformatics
    • Data compression

2. What are Sample Applications of Clustering in Which the Objective is to Cluster the Variables/Features?

  • Market segmentation, where we try to find customers who are similar to each other, whether in terms of behaviors or attributes.
  • Image segmentation/compression, where we try to group similar regions together.
  • Document clustering based on topics.

3. Clustering Can Be Loosely Defined as “Grouping Items That are Similar/Close to Each Other”. What Components Do We Need to Formulate to Have a Well-Defined Computational Framework for Clustering?

  • A proximity measure: how similar or close two items are.
  • A criterion function: how we judge whether a given clustering is good or bad.
  • An algorithm: how we actually compute a clustering that scores well on the criterion.

4. What are Possible Ways of Choosing the Number of Clusters?

  • K-means:
    • Assume we know in advance how many clusters there are.
    • Or try different values of K and look at the SSE (sum of squared errors): SSE always decreases as K grows, so look for the point of diminishing returns (the "elbow") rather than the raw minimum (see the sketch below).
  • Incorporate the trade-off between complexity and fit into the objective function, or evaluate that trade-off after computing clusterings for different values of K.
  • Hierarchical clustering:
    • Pick the number of clusters after the distance tree (dendrogram) has been built.
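
One way to carry out the "try different values of K" strategy is to run K-means for a range of K values and watch how the SSE falls. A rough sketch, assuming scikit-learn's KMeans and some illustrative synthetic data:

```python
# Sketch: evaluate the fit/complexity trade-off by trying several values of K.
# The data below is an illustrative assumption (three well-separated 2-D blobs).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (4, 0), (2, 4)]])

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))   # inertia_ is the within-cluster SSE
# SSE drops sharply up to K = 3 and only slightly afterwards; that "elbow"
# suggests K = 3. AIC/BIC-style criteria make the same trade-off explicit.
```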

5. What is the Objective Function of K-Means Clustering? Can K-Means Still Be Applied if Our Notion of Distance is Not Euclidean Distance?

  • The objective is the total within-cluster scatter: K-means minimizes the sum of squared distances between each point and the centroid of the cluster it is assigned to (see the formulation below).
  • K-means finds a locally optimal solution and is not guaranteed to find the (globally) optimal solution.
  • Yes, the distance function can be changed depending on the circumstances; note, however, that the mean-based centroid update is tied to squared Euclidean distance, so other distances typically call for variants such as K-medoids.
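
For reference, the K-means objective can be written as the total within-cluster sum of squared (Euclidean) distances, where μ_k denotes the centroid (mean) of cluster C_k:

```latex
J(C_1,\dots,C_K) \;=\; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^{2}
```

K-means alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points; each step can only decrease (or leave unchanged) this value.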

6. Is K-Means Guaranteed to Identify the Optimal Clustering According to its Objective Function? If Not, What Can We Say About the Optimality of the Clustering it Identifies?

  • No, it is not guaranteed to find the globally optimal clustering (the clustering that minimizes the objective function over all possible assignments).
  • What we can say is that the clustering it identifies is locally optimal: the objective never increases from one iteration to the next, and the final solution depends on the initialization.

7. Considering an Instance of Clustering Where We are Trying to Cluster Points/Observations in 2-Dimensional Space, Can You Give a Sample Instance and an Initialization That Will Make K-Means Converge to an Undesirable Solution?

  • An instance whose clusters are not linearly separable (e.g., elongated or otherwise non-spherical clusters): the boundary K-means draws between two centroids is a straight line, so such clusters cannot be recovered.
  • A poor initialization can also cause this: for example, if two initial centroids fall inside the same true cluster, K-means can converge with that cluster split in two and two other true clusters merged (see the sketch below).
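
A minimal sketch of such an instance, assuming plain NumPy; the data and the centroid positions are illustrative assumptions:

```python
# Sketch: a 2-D instance plus an initialization that traps K-means in a poor
# local optimum.
import numpy as np

def kmeans(X, centroids, n_iter=50):
    """Plain Lloyd's algorithm; returns final centroids, labels, and SSE."""
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                        # assign each point to nearest centroid
        centroids = np.array([X[labels == k].mean(axis=0)   # move centroids to cluster means
                              for k in range(len(centroids))])
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    sse = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, sse

rng = np.random.default_rng(0)
# Three compact groups of points.
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in [(0, 0), (5, 0), (5, 5)]])

bad_init  = np.array([[0.0, -0.1], [0.0, 0.1], [5.0, 2.5]])   # two centroids in the same group
good_init = np.array([[0.0,  0.0], [5.0, 0.0], [5.0, 5.0]])

for name, init in [("bad init ", bad_init), ("good init", good_init)]:
    _, _, sse = kmeans(X, init.copy())
    print(name, "SSE =", round(float(sse), 2))
# With the bad initialization, K-means typically converges with the left group
# split in two and the two right-hand groups merged, giving a much larger SSE.
```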

8. Consider a Run of K-Means in 1-Dimensional Space, with Say K=2 Clusters. If You are Shown the Configurations of the Cluster Centroids and Assignment of Data Points to Clusters at the End of Each Iteration, Can You Order Those Iterations?

  • Yes: the K-means objective (SSE) never increases from one iteration to the next, so the shown configurations can be ordered by decreasing SSE (the centroids drift toward the means of their assigned points as the run progresses). Refer to the slides for a visual example.

9. What are the Weaknesses of K-Means?

  • Sensitive to outliers: a single extreme point can pull a centroid far away from the rest of its cluster.
  • The user needs to specify K in advance.
  • Only applicable when a mean can be computed for the data, so it does not directly handle, e.g., categorical attributes.

10. What are Possible Strategies for Making Algorithms More Robust to Outliers in the Application of K-Means?

  • Remove outliers (e.g., points that remain far from any centroid), either before clustering or during it.
  • Cluster a random sample (a small subset) of the data points: because outliers are rare, few of them are likely to be sampled, and the remaining points can be assigned to the resulting clusters afterwards.

11. What are the Main Components of an Agglomerative Hierarchical Clustering Algorithm?

  • A distance (proximity) matrix specifying the pairwise distances between items.
  • A rule for measuring the distance between groups of items (clusters), used to decide which pair of clusters to merge next.
  • The algorithm works bottom-up, repeatedly merging the two closest clusters (top-down is the divisive alternative); see the sketch below.
  • Merge decisions are greedy and cannot be undone.
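
A bare-bones sketch of the agglomerative loop built from these components, using single linkage as the between-group rule; the data at the end is an illustrative assumption:

```python
# Sketch: agglomerative clustering from a pairwise distance matrix D.
import numpy as np

def agglomerate(D):
    """Repeatedly merge the two closest clusters (single linkage); return the merges."""
    clusters = [[i] for i in range(len(D))]           # every item starts in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):                # scan all pairs of current clusters
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))  # greedy: this merge is never undone
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges

pts = np.array([0.0, 1.0, 5.0, 6.0])                  # four 1-D points
D = np.abs(pts[:, None] - pts[None, :])
for a, b, d in agglomerate(D):
    print(a, "+", b, "merged at distance", d)
```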

12. What are the Possible Ways of Assessing the Distance Between Groups of Items Given the Distances Between Pairs of Items?

  • Closest pair (single linkage)
  • Farthest pair (complete linkage)
  • Average of all pairs (average linkage); a small sketch of all three follows below.
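
The three rules correspond to the following between-group distance functions, sketched here for a pairwise distance matrix D and two groups of item indices (the helper names are my own):

```python
import numpy as np

def single_linkage(D, group_a, group_b):
    """Closest pair: minimum over all cross-group pairwise distances."""
    return D[np.ix_(group_a, group_b)].min()

def complete_linkage(D, group_a, group_b):
    """Farthest pair: maximum over all cross-group pairwise distances."""
    return D[np.ix_(group_a, group_b)].max()

def average_linkage(D, group_a, group_b):
    """Average of all cross-group pairwise distances."""
    return D[np.ix_(group_a, group_b)].mean()

# Example: four 1-D points at 0, 1, 5, 6, comparing group {0, 1} with group {2, 3}.
pts = np.array([0.0, 1.0, 5.0, 6.0])
D = np.abs(pts[:, None] - pts[None, :])
print(single_linkage(D, [0, 1], [2, 3]))    # 4.0  (point 1 vs. point 5)
print(complete_linkage(D, [0, 1], [2, 3]))  # 6.0  (point 0 vs. point 6)
print(average_linkage(D, [0, 1], [2, 3]))   # 5.0
```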

13. Can You Give an Example of Hierarchical Clustering in 2-Dimensional Space with Euclidean Distance as the Distance Function, Such That Single-Linkage and Complete-Linkage Will Produce the Same Dendrogram?

14. Can You Give an Example of Hierarchical Clustering in 2-Dimensional Space with Euclidean Distance as the Distance Function, Such That Single-Linkage and Complete-Linkage Will Produce Different Dendrograms?

15. What is the Key Idea of Ward’s Method?

  • When considering joining two clusters, ask: by how much would the total within-cluster variation (the sum of squared distances of points to their cluster centroid) increase if we merged them?
  • At each step, merge the pair of clusters for which this increase is smallest, i.e., the merge that is most "worth it" (see the formulation below).
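
One common way to write the cost Ward's method assigns to merging clusters A and B, i.e., the increase in total within-cluster SSE caused by the merge (with μ_A, μ_B the cluster centroids and |A|, |B| the cluster sizes):

```latex
\Delta(A, B) \;=\; \frac{|A|\,|B|}{|A| + |B|}\,\lVert \mu_A - \mu_B \rVert^{2}
```

At each step, the pair of clusters with the smallest Δ is merged.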

16. What is the Common Requirement of K-Means and Hierarchical Clustering Using Ward’s Method?

  • Both require that a mean (centroid) can be computed for a group of data points, i.e., the data must live in a space where averaging is meaningful.

17. What are the Pros and Cons of Applying a Divisive Hierarchical Clustering Algorithm That, for Example, Uses K-Means with K=2 to Split the Points at Each Node of the Dendrogram (as Compared to Agglomerative Clustering)?

  • Agglomerative clustering is faster to compute, but its merge decisions look only at pairwise distances, so it is greedy and somewhat "blind" to the overall structure of the data.
  • Divisive clustering is "less blind": each split (here, running K-means with K=2 at a node) considers all the points at that node, so it can find a better overall split into two groups than greedy pairwise merges would.

18. What is the (Common) Key Idea Behind Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) in Formulating a Measure to Guide the Selection of the Number of Clusters?

  • Both measure how well a model fits the data in a maximum-likelihood sense while penalizing the number of parameters used.
  • The guiding question: how many parameters (here, how many clusters) are really needed to obtain a good fit? (See the formulas below.)
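
For reference, the standard forms are:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

where \hat{L} is the maximized likelihood of the model, k the number of estimated parameters, and n the number of observations; lower values are better for both, and the two differ only in the complexity penalty (2k versus k ln n).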

19. What is the Key Difference Between AIC and BIC?

  • AIC (Akaike Information Criterion) tries to select the model that most adequately describes an unknown, possibly high-dimensional, reality; it does not assume that the true model is among the candidates.
  • BIC (Bayesian Information Criterion) tries to find the TRUE model among the set of candidates.
  • As a result, AIC tends to offer better predictive performance.
  • BIC penalizes complexity more heavily (its penalty grows with the sample size), so it tends to select smaller, more parsimonious models.

20. Assume That You are Given a Dendrogram and You are Trying to Translate This into a Disjoint Set of Clusters Such That the Model Complexity and Model Fit are (Locally) Optimized Together. You are Also Given a Procedure That Computes AIC for a Given Clustering. Describe an Algorithm/Method You Would Use to Identify the Final Set of Clusters.

  • Move through the dendrogram cuts (i.e., the candidate numbers of clusters) and compute AIC (or the change in SSE) for each resulting set of clusters.
  • Look for the largest ΔSSE (or ΔAIC) between neighboring candidates: a large drop between X clusters and Y clusters indicates that X clusters divide the points into much more homogeneous groups than Y clusters do. From this reasoning, X clusters can be picked as the final solution, stopping once adding further clusters no longer improves the criterion (see the sketch below).
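
A sketch of one way this could be implemented, assuming SciPy's hierarchical-clustering utilities; the simple Gaussian-style AIC below merely stands in for the "given procedure that computes AIC" and is an illustrative assumption:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def compute_aic(X, labels):
    """Toy AIC: model each cluster as a spherical Gaussian around its centroid."""
    n, d = X.shape
    ks = np.unique(labels)
    sse = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum() for k in ks)
    variance = sse / (n * d) + 1e-12
    log_lik = -0.5 * n * d * (np.log(2 * np.pi * variance) + 1)
    n_params = len(ks) * d                               # one centroid per cluster (simplified)
    return 2 * n_params - 2 * log_lik

def pick_k_from_dendrogram(X, max_k=10):
    Z = linkage(X, method="ward")                        # build the dendrogram
    best_k, best_aic = 1, float("inf")
    for k in range(1, max_k + 1):
        labels = fcluster(Z, t=k, criterion="maxclust")  # cut the tree into k clusters
        aic = compute_aic(X, labels)
        if aic < best_aic:                               # fit improved enough to justify the extra cluster
            best_k, best_aic = k, aic
        else:                                            # AIC got worse: keep the previous (locally optimal) cut
            break
    return best_k
```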