Clustering Algorithms: Applications and Methods
Posted on Dec 12, 2024 in Computers
1. What are Sample Applications of Clustering in Which the Objective is to Cluster the Samples/Observations?
- This is the most common use of clustering; it appears across many fields, for example:
- Machine learning
- Data mining
- Information retrieval
- Bioinformatics
- Data compression
2. What are Sample Applications of Clustering in Which the Objective is to Cluster the Variables/Features?
- Market segmentation, where we try to find customers who are similar to each other in terms of behaviors or attributes.
- Image segmentation/compression, where we try to group similar regions together.
- Document clustering based on topics.
- Each of these can also be posed in the feature direction, i.e., clustering the variables rather than the observations: for example, grouping redundant customer attributes, or clustering words/terms rather than documents.
3. Clustering Can Be Loosely Defined as “Grouping Items That are Similar/Close to Each Other”. What Components Do We Need to Formulate to Have a Well-Defined Computational Framework for Clustering?
- A proximity measure that quantifies how similar or close two items are.
- A criterion function that scores a clustering, so good and bad clusterings can be distinguished.
- An algorithm that computes a clustering by optimizing the criterion function (a minimal sketch of all three components follows this list).
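To make these components concrete, here is a minimal sketch assuming the common Euclidean/SSE/K-means combination (the function names are our own, for illustration only):

```python
import numpy as np

# Component 1: a proximity measure (here, squared Euclidean distance).
def proximity(a, b):
    return np.sum((a - b) ** 2)

# Component 2: a criterion function scoring a whole clustering
# (here, SSE: the lower the score, the better the clustering).
def criterion(X, labels, centroids):
    return sum(proximity(x, centroids[l]) for x, l in zip(X, labels))

# Component 3: an algorithm that searches for a clustering optimizing the
# criterion -- e.g., K-means, which is sketched under question 5 below.
```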
4. What are Possible Ways of Choosing the Number of Clusters?
- K-means:
  - Assume we already know how many clusters there are (K is fixed by the domain).
  - Examine how the SSE (sum of squared errors) falls as K grows and pick the K beyond which extra clusters bring little improvement; note that SSE itself always decreases as K grows, so we look for an "elbow" rather than a minimum (see the sketch below).
  - Incorporate the trade-off between complexity and fit into the objective function itself.
  - Evaluate the trade-off between complexity and fit after computing clusterings for several different values of K.
- Hierarchical clustering:
  - Pick the number of clusters after the distance tree (dendrogram) has been built, by choosing where to cut it.
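To illustrate the SSE-based strategy, here is a sketch of the "elbow" heuristic using scikit-learn (the synthetic data set is our own choice for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# SSE (inertia) always decreases as K grows, so instead of its minimum we
# look for the "elbow" where adding another cluster stops paying off.
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
```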
5. What is the Objective Function of K-Means Clustering? Can K-Means Still Be Applied if Our Notion of Distance is Not Euclidean Distance?
- The objective function is the sum of squared errors (SSE): for clusters C_1, …, C_K with centroids μ_1, …, μ_K, K-means minimizes SSE = Σ_j Σ_{x ∈ C_j} ‖x − μ_j‖².
- Yes, the distance function can be changed depending on the circumstances, but the centroid-update step must be adapted to match: the mean is only guaranteed to minimize squared Euclidean distance, which is why variants such as K-medoids are used for arbitrary dissimilarities (a sketch of the standard update and objective follows).
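A compact NumPy sketch of one K-means (Lloyd) iteration and the SSE objective it works on (our own illustration, not a production implementation):

```python
import numpy as np

def kmeans_step(X, centroids):
    """One K-means iteration; assumes no cluster ends up empty."""
    # Assignment step: each point goes to its nearest centroid
    # (squared Euclidean distance).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Update step: each centroid becomes the mean of its assigned points.
    centroids = np.array([X[labels == j].mean(axis=0)
                          for j in range(len(centroids))])
    # The SSE objective; iterating this step never increases it.
    sse = d2[np.arange(len(X)), labels].sum()
    return labels, centroids, sse
```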
6. Is K-Means Guaranteed to Identify the Optimal Clustering According to its Objective Function? If Not, What Can We Say About the Optimality of the Clustering it Identifies?
- No, it is not guaranteed: K-means converges, but only to a local optimum of its objective.
- The globally optimal solution is the clustering that minimizes the objective function over all possible assignments; which local optimum K-means reaches depends on the initial centroids, so implementations typically run several random restarts and keep the best result (a quick demonstration follows).
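A quick way to see the dependence on initialization empirically: run K-means with a single random initialization per run and compare the final SSE values across seeds (a sketch; on easy data the runs may coincide):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, cluster_std=2.0, random_state=1)

# One random initialization per run; different seeds can land in
# different local optima with different final SSE (inertia).
for seed in range(5):
    km = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    print(seed, round(km.inertia_, 1))
```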
7. Considering an Instance of Clustering Where We are Trying to Cluster Points/Observations in 2-Dimensional Space, Can You Give a Sample Instance and an Initialization That Will Make K-Means Converge to an Undesirable Solution?
- K-means implicitly assumes compact, roughly spherical clusters: the boundary between two centroids is always a straight line (a Voronoi boundary), so it cannot carve out non-convex shapes such as concentric rings no matter how it is initialized.
- Even with well-separated spherical clusters, a bad initialization can make it converge to an undesirable fixed point; see the sketch below.
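A minimal, fully deterministic instance (our own construction): four points at the corners of a wide rectangle, with both initial centroids placed on the vertical mid-line. The very first assignment is already a fixed point, so K-means converges immediately to the poor top/bottom split:

```python
import numpy as np
from sklearn.cluster import KMeans

# Four points at the corners of a 4 x 1 rectangle.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

# Bad initialization: both centroids on the vertical mid-line.
bad_init = np.array([[2.0, 0.0], [2.0, 1.0]])
km = KMeans(n_clusters=2, init=bad_init, n_init=1).fit(X)
print(km.labels_, km.inertia_)   # top/bottom split, SSE = 16.0

# A sensible initialization recovers the desirable left/right split.
km = KMeans(n_clusters=2, init=X[[0, 2]], n_init=1).fit(X)
print(km.labels_, km.inertia_)   # left/right split, SSE = 1.0
```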
8. Consider a Run of K-Means in 1-Dimensional Space, with Say K=2 Clusters. If You are Shown the Configurations of the Cluster Centroids and Assignment of Data Points to Clusters at the End of Each Iteration, Can You Order Those Iterations?
- Yes: each K-means iteration never increases the SSE objective, so the shown configurations can be ordered by decreasing SSE (refer to the slides for a visual example).
9. What are the Weaknesses of K-Means?
- Sensitive to outliers: a single extreme point can pull a centroid far away from the rest of its cluster.
- The user needs to specify K in advance.
- Only applicable when a mean can be computed for the data, i.e., when points live in a space where averaging makes sense (it is not directly applicable to, say, categorical data).
10. What are Possible Strategies for Making Algorithms More Robust to Outliers in the Application of K-Means?
- Remove outliers, e.g., points that are much farther from their centroid than the other members of their cluster, taking care not to discard small legitimate clusters (a sketch follows).
- Perform random sampling: cluster a small random subset of the data, which is unlikely to contain many outliers, and then assign the remaining points to the resulting clusters.
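A minimal sketch of the outlier-removal strategy (the 95th-percentile cutoff is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

def trimmed_kmeans(X, k, quantile=0.95, seed=0):
    """Fit K-means, drop the points farthest from their centroid, refit."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    # Distance of each point to the centroid of its own cluster.
    d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    keep = d <= np.quantile(d, quantile)   # discard the most extreme points
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[keep])
```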
11. What are the Main Components of an Agglomerative Hierarchical Clustering Algorithm?
- A distance matrix specifying the pairwise distance between items, together with a linkage rule that extends it to distances between clusters (see the next question).
- A bottom-up procedure: agglomerative clustering starts with every item in its own cluster and repeatedly merges the two closest clusters (the top-down counterpart is called divisive clustering).
- A consequence of this greedy procedure: merge decisions cannot be undone.
12. What are the Possible Ways of Assessing the Distance Between Groups of Items Given the Distances Between Pairs of Items?
- Closest pair (single linkage)
- Farthest pair (complete linkage)
- Average of all pairs (average linkage); a small illustration of all three follows.
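The three rules computed directly from the pairwise distances between two small clusters (a plain illustration, not a library API):

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])

D = cdist(A, B)   # all pairwise distances between the two groups
print(D.min())    # single linkage: closest pair         -> 3.0
print(D.max())    # complete linkage: farthest pair      -> 6.0
print(D.mean())   # average linkage: mean over all pairs -> 4.5
```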
13. Can You Give an Example of Hierarchical Clustering in 2-Dimensional Space with Euclidean Distance as the Distance Function, Such That Single-Linkage and Complete-Linkage Will Produce the Same Dendrogram?
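- Yes (an example of our own construction): three points forming a tall isosceles triangle, say a = (0, 0), b = (2, 0), c = (1, 3). Both linkages merge {a, b} first, at height 2, and because c is equidistant (√10) from a and b, the minimum and maximum distances from {a, b} to c coincide, so single and complete linkage join c at exactly the same height: the dendrograms are identical. (A SciPy check appears after the next question.)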
14. Can You Give an Example of Hierarchical Clustering in 2-Dimensional Space with Euclidean Distance as the Distance Function, Such That Single-Linkage and Complete-Linkage Will Produce Different Dendrograms?
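- Yes (again our own construction): four points on a line at x = 0, 1, 2.4, 4. Single linkage chains: it merges {0, 1} at height 1, then pulls 2.4 into that cluster (min-distance 1.4), and finally adds 4. Complete linkage merges {0, 1} at 1, then pairs {2.4, 4} at 1.6 (the max-distance from {0, 1} to 2.4 is 2.4 > 1.6), and joins the two pairs last, so the tree shapes differ.

A sketch verifying both constructions with SciPy:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Q13: tall isosceles triangle -> identical dendrograms for both linkages.
tri = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
print(linkage(tri, method="single"))
print(linkage(tri, method="complete"))   # same merge order and heights

# Q14: chained points on a line -> different tree shapes.
line = np.array([[0.0], [1.0], [2.4], [4.0]])
print(linkage(line, method="single"))    # (((0,1), 2.4), 4): chaining
print(linkage(line, method="complete"))  # ((0,1), (2.4, 4)): balanced
```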
15. What is the Key Idea of Ward’s Method?
- When considering joining two clusters A and B, Ward's method asks how much the merge would increase the total within-cluster variation around the centroids; this increase works out to Δ(A, B) = |A||B| / (|A| + |B|) · ‖μ_A − μ_B‖².
- At each step it performs the merge that is "most worth it", i.e., the pair of clusters with the smallest such increase (a small numerical check follows).
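A small numerical check of the merge-cost formula against a direct SSE computation (our own illustration):

```python
import numpy as np

def sse(X):
    """Total squared distance of the rows of X to their mean."""
    return ((X - X.mean(axis=0)) ** 2).sum()

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[5.0, 5.0], [6.0, 5.0]])

# Increase in total within-cluster SSE caused by merging A and B...
direct = sse(np.vstack([A, B])) - sse(A) - sse(B)
# ...equals Ward's closed-form merge cost.
mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
formula = len(A) * len(B) / (len(A) + len(B)) * ((mu_a - mu_b) ** 2).sum()
print(direct, formula)   # the two values agree
```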
16. What is the Common Requirement of K-Means and Hierarchical Clustering Using Ward’s Method?
- Both require that cluster means (centroids) be computable: the data must consist of numeric vectors in a space where averaging makes sense, so neither applies directly when only a pairwise-dissimilarity matrix is available.
17. What are the Pros and Cons of Applying a Divisive Hierarchical Clustering Algorithm That, for Example, Uses K-Means with K=2 to Split the Points at Each Node of the Dendrogram (as Compared to Agglomerative Clustering)?
- Divisive clustering is "less blind": each split is made while looking at all of the data in that node, whereas early agglomerative merges are driven purely by local pairwise distances.
- The K-means subroutine can find a good (locally optimal) split of a node into two groups.
- The costs differ: agglomerative needs all pairwise distances (at least quadratic in the number of points), while K-means splits scale roughly linearly per level; on the other hand, each split is itself only a local optimum that depends on initialization, and the split decisions still cannot be undone. A sketch of the divisive procedure follows.
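A minimal sketch of such a divisive procedure (our own illustration; recent versions of scikit-learn also ship a BisectingKMeans estimator built on the same idea):

```python
import numpy as np
from sklearn.cluster import KMeans

def bisect(X, depth=0, max_depth=3, min_size=5):
    """Recursively split X with K-means (K=2), building a tree top-down."""
    if depth == max_depth or len(X) < 2 * min_size:
        return [X]                       # leaf node of the dendrogram
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    left, right = X[km.labels_ == 0], X[km.labels_ == 1]
    return (bisect(left, depth + 1, max_depth, min_size)
            + bisect(right, depth + 1, max_depth, min_size))
```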
18. What is the (Common) Key Idea Behind Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) in Formulating a Measure to Guide the Selection of the Number of Clusters?
- Both score a model by how well it fits the data in a maximum-likelihood sense, penalized by the number of parameters used to achieve that fit.
- The guiding question is: how few parameters (here, clusters) are needed to get a good fit? Adding clusters always improves the raw fit, so the complexity penalty is what keeps the criterion from always choosing the largest K.
19. What is the Key Difference Between AIC and BIC?
- AIC (Akaike Information Criterion), AIC = 2k − 2 ln L̂ (with k parameters and maximized likelihood L̂), tries to select the model that most adequately describes an unknown, high-dimensional reality.
- BIC (Bayesian Information Criterion), BIC = k ln n − 2 ln L̂ (with n observations), tries to find the TRUE model among the set of candidates.
- AIC tends to offer better predictive performance.
- BIC penalizes parameters more heavily as n grows, so it tends to select a more parsimonious model. (A sketch of both criteria in action follows.)
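A sketch of using both criteria to choose the number of clusters with a Gaussian mixture model (scikit-learn's GaussianMixture exposes aic() and bic() directly; the data set is our own choice):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=400, centers=4, random_state=0)

# Lower is better for both criteria; BIC's penalty (k ln n) grows with the
# sample size, so it tends to choose fewer components than AIC does.
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, round(gmm.aic(X), 1), round(gmm.bic(X), 1))
```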
20. Assume That You are Given a Dendrogram and You are Trying to Translate This into a Disjoint Set of Clusters Such That the Model Complexity and Model Fit are (Locally) Optimized Together. You are Also Given a Procedure That Computes AIC for a Given Clustering. Describe an Algorithm/Method You Would Use to Identify the Final Set of Clusters.
- Cut the dendrogram at each candidate number of clusters K (each cut yields a disjoint set of clusters), call the given procedure to compute the AIC of that clustering, and keep the K that (locally) minimizes AIC.
- The intuition mirrors the ΔSSE heuristic: a large drop in SSE between Y clusters and X clusters indicates that X clusters divide the data into much more homogeneous groups than Y clusters do, so the chosen K sits where the fit stops improving enough to justify the added complexity. A sketch of the sweep follows.
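A sketch of the sweep with SciPy's dendrogram utilities; compute_aic stands in for the given AIC procedure (its exact signature is hypothetical, since the question supplies it as a black box):

```python
from scipy.cluster.hierarchy import linkage, fcluster

def choose_k(X, compute_aic, max_k=10):
    """Cut the dendrogram at each K and keep the K with the lowest AIC."""
    Z = linkage(X, method="ward")        # build the dendrogram once
    scores = {}
    for k in range(1, max_k + 1):
        labels = fcluster(Z, t=k, criterion="maxclust")  # cut into k clusters
        scores[k] = compute_aic(X, labels)
    # A simple variant would stop at the first local minimum instead.
    return min(scores, key=scores.get), scores
```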