Cluster Analysis, Association, and Experimental Design
Cluster Analysis
Cluster analysis is a segmentation technique, a process by which we identify groups of consumers according to particular characteristics, with the aim of creating differentiated offers for each. Key objectives:
- Homogeneity Within Groups: Maximize the similarity within groups.
- Heterogeneity Between Groups: Maximize the difference between groups.
Conclusions: Give each group a name that reflects its profile. If the p-value is > 0.05, accept H0 (null hypothesis), which means there are no significant differences between groups on that variable. If there are significant differences, use socio-demographic information to analyze the groups further.
Association
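Measuring Association (Eexp is the chi-squared statistic, n the sample size, r and c the numbers of rows and columns of the contingency table):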
- Coefficient PHI: φ = √(Eexp / n)
- Contingency Coefficient: C = √(Eexp / (Eexp + n))
- Cramer’s V: V = √((Eexp / n) / min(r-1, c-1))
Note: For all three coefficients, a value below 0.6 indicates a weak association between the variables.
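As a quick check of the three formulas above, the following minimal sketch computes them from an already known chi-squared statistic. The function name and the input values (Eexp = 16, n = 100, a 2x2 table) are hypothetical and chosen only for illustration.

```python
import math

def association_measures(chi2, n, rows, cols):
    """Return (phi, contingency coefficient C, Cramer's V).

    chi2 is the chi-squared statistic (Eexp in the notes), n the sample
    size, rows and cols the dimensions of the contingency table.
    """
    phi = math.sqrt(chi2 / n)
    c = math.sqrt(chi2 / (chi2 + n))
    v = math.sqrt((chi2 / n) / min(rows - 1, cols - 1))
    return phi, c, v

# Hypothetical values: Eexp = 16 from a 2x2 table with n = 100 cases.
phi, c, v = association_measures(16.0, 100, 2, 2)
print(round(phi, 3), round(c, 3), round(v, 3))
```

For a 2x2 table, min(r-1, c-1) = 1, so Cramer's V coincides with PHI. With these illustrative numbers all three coefficients fall below 0.6, which the note above would read as a weak association.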
Hypothesis Testing for Association:
- H0 (Null Hypothesis): X and Y are independent (not associated).
- H1 (Alternative Hypothesis): X and Y are associated.
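Conclusion: Create a contingency table (frequency table). Calculate Eexp = Σ (Vo − Ve)² / Ve, summing over all cells, where Vo is the observed frequency and Ve the expected frequency of each cell. Compare Eexp with the critical value of the Chi-squared distribution with (r-1)(c-1) degrees of freedom: if Eexp is larger, reject H0; otherwise, accept H0.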
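The procedure can be sketched in a few lines of Python. This is only an illustration: the 2x2 table is made up, the function name is hypothetical, and the critical value 3.841 is the tabulated Chi-squared value for 1 degree of freedom at the 0.05 level.

```python
def chi_squared_statistic(observed):
    """Sum (Vo - Ve)^2 / Ve over all cells of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, vo in enumerate(row):
            # Expected frequency of the cell if X and Y were independent.
            ve = row_totals[i] * col_totals[j] / n
            stat += (vo - ve) ** 2 / ve
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 2x2 table (rows: categories of X, columns: categories of Y).
table = [[30, 20],
         [20, 30]]
stat, df = chi_squared_statistic(table)

# Tabulated critical value for df = 1 at the 0.05 significance level.
CRITICAL_DF1_005 = 3.841
reject_h0 = stat > CRITICAL_DF1_005
print(stat, df, reject_h0)
```

Here every expected frequency is 25, so each cell contributes 1 and the statistic is 4.0, which exceeds 3.841. In this made-up case H0 (independence) would be rejected.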
Steps in Cluster Analysis
- Variable Selection: Variables included in the analysis should be quantitative (continuous data).
- Distance Calculation: Calculate the Euclidean distance between cases.
- Classification Method: Choose one of two types, depending on the sample size:
  - Hierarchical Cluster: For samples < 100.
  - Non-Hierarchical Cluster: For samples > 100.
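To illustrate the distance and classification steps, here is a minimal sketch of one non-hierarchical method. It assumes k-means, which the notes do not name, and uses six made-up cases measured on two quantitative variables; the starting centroids are picked by hand.

```python
import math

def k_means(points, centroids, iterations=10):
    """Tiny k-means: assign each case to its nearest centroid (Euclidean
    distance), then move each centroid to the mean of its cases."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical cases on two quantitative variables.
cases = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (9.0, 8.2)]
labels, centers = k_means(cases, centroids=[cases[0], cases[3]])
print(labels)
```

The number of clusters is fixed in advance through the list of starting centroids. In practice the variables are often standardized first so that no single scale dominates the Euclidean distance.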
Timeline for Cluster Analysis
- Define the quantitative scale variables used in the cluster analysis.
- Determine the number of clusters.
- Identify the variables where there are significant differences.
- Describe the sociodemographic characteristics of each cluster.
- Apply the most suitable strategy for each identified cluster.
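The third step of the timeline, checking whether a variable differs significantly between clusters, could be sketched as follows. It assumes a one-way analysis of variance is used for the comparison, which the notes do not state explicitly; the scores and function name are hypothetical.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic comparing the mean of one variable across clusters."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores on one scale variable for two clusters.
cluster_a = [2.0, 3.0, 4.0]
cluster_b = [6.0, 7.0, 8.0]
f, df1, df2 = one_way_f([cluster_a, cluster_b])
print(f, df1, df2)
```

With these numbers F = 24 on (1, 4) degrees of freedom, above the tabulated F0.05(1, 4) = 7.71, so this variable would count as one where the clusters differ significantly.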
Important: The smaller the number of clusters fixed a priori, the more reliable the result in obtaining significant differences between groups.
Experimental Design
Latin Square Design
Question: What type of experimental design should you use and why?
Answer: A Latin Square design is best suited because of its characteristics:
- Two external variables and one treatment variable.
- Test variables are not all homogeneous.
(Create a table with total treatments and averages)
Dtotal = Dtreatment + Dmonth + Destablishment + Derror, where the error term is obtained by difference: Derror = Dtotal − Dtreatment − Dmonth − Destablishment.
(Create a table!)
- Rows: Price, Month, Provider, Error, Total.
- Columns: Sources of Dispersion, Sum of Squares (D), Degrees of Freedom, Variance (Sum of Squares / Degrees of Freedom), Snedecor's F (Variance / Error Variance).
Compare the calculated F statistic (RSTD) with the critical value from the F-distribution table (e.g., at the 0.05 significance level, often denoted as F0.05). If the calculated F > critical F, reject H0; if calculated F < critical F, accept H0.
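The decomposition and the F column of the table can be sketched for a 3x3 Latin square as follows. Everything in the example is hypothetical: the layout, the responses, and the reading of rows as months, columns as establishments, and letters A, B, C as price levels.

```python
def latin_square_anova(layout, y):
    """Decompose total dispersion for a k x k Latin square.

    layout[i][j] is the treatment label in row i, column j;
    y[i][j] is the response observed in that cell.
    """
    k = len(y)
    grand_total = sum(sum(row) for row in y)
    correction = grand_total ** 2 / (k * k)

    d_total = sum(v ** 2 for row in y for v in row) - correction
    d_rows = sum(sum(row) ** 2 for row in y) / k - correction
    d_cols = sum(sum(col) ** 2 for col in zip(*y)) / k - correction

    totals = {}
    for i in range(k):
        for j in range(k):
            totals[layout[i][j]] = totals.get(layout[i][j], 0.0) + y[i][j]
    d_treat = sum(t ** 2 for t in totals.values()) / k - correction

    d_error = d_total - d_treat - d_rows - d_cols  # obtained by difference
    df_factor = k - 1
    df_error = (k - 1) * (k - 2)
    error_variance = d_error / df_error
    sources = [("treatment", d_treat), ("rows", d_rows), ("columns", d_cols)]
    f = {name: (d / df_factor) / error_variance for name, d in sources}
    return {"D_total": d_total, "D_error": d_error, "df_error": df_error, "F": f}

# Hypothetical design: rows = months, columns = establishments, letters = prices.
layout = [["A", "B", "C"],
          ["B", "C", "A"],
          ["C", "A", "B"]]
sales = [[10, 12, 15],
         [13, 16, 11],
         [17, 10, 13]]
result = latin_square_anova(layout, sales)
print(result["F"])
```

Each F here has (2, 2) degrees of freedom, and the tabulated F0.05(2, 2) is 19.00. In this made-up data only the treatment F (about 73) exceeds it, so H0 would be rejected for price but not for month or establishment.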
Key Question: Which variables are determinants in the recommendation of “what is q?”
Answer: Those variables for which the null hypothesis (H0) is rejected.
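Example: If the price significantly influences sales (H0 is rejected), demand is sensitive to price, so we should set the price that maximizes profit. If H0 is not rejected, sales do not respond to price within the tested range (inelastic demand).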
T-Test
Objective: Draw conclusions about the importance of economic management and other factors, depending on whether individuals are paid or not.
Conclusions (indicating mean values):
- Importance of “1”: The second chart shows Levene’s test for equality of variances with a significance of 0.654 > 0.05. Therefore, we accept H0 and assume equal variances. The two-tailed (bilateral) t-test for equality of means has a significance of 0.436 > 0.05. We accept H0: μ1 = μ2, meaning there is no statistically significant difference between the means of the two groups.
- Importance of “2”: (Follow the same procedure as above).
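For reference, the t statistic behind the equal-variances row of the output can be sketched as follows. The ratings are hypothetical, the function name is illustrative, and Levene's test itself is not reproduced here: the sketch simply assumes equal variances, as the Levene result above would justify.

```python
import math
from statistics import mean, variance

def pooled_t(sample_1, sample_2):
    """Student's t for two independent samples, equal variances assumed."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_var = (
        (n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample_1) - mean(sample_2)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical importance ratings for paid and unpaid individuals.
paid = [4.0, 5.0, 6.0, 5.0]
unpaid = [5.0, 6.0, 6.0, 7.0]
t, df = pooled_t(paid, unpaid)
print(round(t, 3), df)
```

With these numbers t is about -1.732 on 6 degrees of freedom. Its absolute value is below the tabulated two-tailed critical value of 2.447 at the 0.05 level, so H0: μ1 = μ2 would be accepted for this made-up sample.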
Variable Types Required for the Analysis:
- Dependent Variables (Continuous): Importance of economic management and importance of “other factors.”
- Independent Variable (Categorical): Whether individuals are paid (yes/no).