Cluster Analysis, Association, and Experimental Design

Cluster Analysis

Cluster analysis is a segmentation technique, a process by which we identify groups of consumers according to particular characteristics, with the aim of creating differentiated offers for each. Key objectives:

  • Homogeneity Within Groups: Maximize the similarity of the members of each group.
  • Heterogeneity Between Groups: Maximize the difference between groups.

Conclusions: Identify each group and give it a name. For each variable, if the p-value is > 0.05, do not reject H0 (the null hypothesis): there are no significant differences between groups on that variable. Where there are significant differences, use socio-demographic information to analyze the groups further.

Association

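Measuring Association (Eexp is the experimental chi-squared statistic, n is the sample size, and r and c are the numbers of rows and columns in the contingency table):

  • Phi Coefficient: φ = √(Eexp / n)
  • Contingency Coefficient: C = √(Eexp / (Eexp + n))
  • Cramer’s V: V = √((Eexp / n) / min(r - 1, c - 1))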

Note: As a rule of thumb, any value below 0.6 indicates a weak association between the variables.
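
The three coefficients above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name and the input values (a chi-squared statistic of 16.0 from a 2 x 2 table with 100 cases) are hypothetical and chosen only to show the arithmetic.

```python
import math

def association_coefficients(chi2, n, n_rows, n_cols):
    """Return (phi, C, V) for a chi-squared statistic from an r x c table with n cases."""
    phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt((chi2 / n) / min(n_rows - 1, n_cols - 1))
    return phi, contingency, cramers_v

# Hypothetical inputs: chi-squared = 16.0, 100 cases, 2 x 2 table.
phi, c, v = association_coefficients(16.0, 100, 2, 2)
print(f"phi = {phi:.3f}, C = {c:.3f}, V = {v:.3f}")

# Apply the 0.6 rule of thumb from the note above.
print("weak association" if v < 0.6 else "not weak by the 0.6 rule")
```

For a 2 x 2 table, min(r - 1, c - 1) = 1, so Cramer’s V coincides with the phi coefficient.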

Hypothesis Testing for Association:

  • H0 (Null Hypothesis): X and Y are independent (not associated).
  • H1 (Alternative Hypothesis): X and Y are associated.

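Conclusion: Create a contingency table (frequency table). Calculate the experimental chi-squared statistic Eexp = (Vo - Ve)² / Ve + …, with one term per cell of the table, where Vo is the observed frequency and Ve is the expected frequency under independence. Compare it to the critical value of the Chi-squared distribution: if Eexp is greater, reject H0; otherwise, do not reject H0.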
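
As a rough sketch of the test above, the code below computes the statistic from a contingency table, taking each expected frequency as (row total x column total) / n. The table values and the function name are hypothetical. The critical value 3.841 is the standard chi-squared table entry for 1 degree of freedom at the 0.05 level.

```python
def chi_squared(observed):
    """Return (statistic, degrees of freedom) for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, vo in enumerate(row):
            # Expected frequency under independence of X and Y.
            ve = row_totals[i] * col_totals[j] / n
            statistic += (vo - ve) ** 2 / ve
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical 2 x 2 contingency table (rows: categories of X, columns: categories of Y).
table = [[30, 20],
         [20, 30]]
chi2, dof = chi_squared(table)

CRITICAL_0_05_DF1 = 3.841  # chi-squared table value, 1 degree of freedom, alpha = 0.05
reject_h0 = chi2 > CRITICAL_0_05_DF1
print(f"chi-squared = {chi2:.3f}, df = {dof}, reject H0: {reject_h0}")
```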

Steps in Cluster Analysis

  1. Variable Selection: Variables included in the analysis should be quantitative (continuous data).
  2. Distance Calculation: Calculate the Euclidean distance between cases.
  3. Classification Method: Choose one of two types of classification method:
    • Hierarchical Cluster: For samples of fewer than 100 cases.
    • Non-Hierarchical Cluster: For samples of more than 100 cases.
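
The distance and classification steps above can be illustrated with a small sketch. It assumes k-means as the non-hierarchical method (the notes do not name a specific algorithm), uses hypothetical two-variable cases, and fixes the starting centroids by hand so that the run is deterministic.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two cases measured on the same variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(cases, centroids, iterations=10):
    """Assign each case to its nearest centroid, then recompute the centroids."""
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(case, centroids[k]))
            for case in cases
        ]
        new_centroids = []
        for k in range(len(centroids)):
            members = [case for case, label in zip(cases, labels) if label == k]
            if members:
                # New centroid: mean of each variable over the cluster members.
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep the old centroid if no case was assigned to it.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical cases scored on two quantitative variables.
cases = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (8.0, 9.0), (9.0, 8.0), (8.5, 8.5)]
labels, centroids = k_means(cases, centroids=[cases[0], cases[3]])
print(labels, centroids)
```

The number of clusters is set by the number of starting centroids, which matches the idea of fixing the cluster count a priori.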

Timeline for Cluster Analysis

  1. Define the quantitative scale variables used in the cluster analysis.
  2. Determine the number of clusters.
  3. Identify the variables where there are significant differences.
  4. Describe the sociodemographic characteristics of each cluster.
  5. Apply the most suitable strategy for each identified cluster.

Important: The fewer clusters fixed a priori, the more reliably the analysis yields significant differences between groups.

Experimental Design

Latin Square Design

Question: What type of experimental design should you use and why?

Answer: A Latin Square design is best suited because of its characteristics:

  • There are two external (blocking) variables and one treatment variable.
  • The test units are not all homogeneous.

(Create a table with the treatment totals and averages.)

Dtotal = Dtreatment + Dmonth + Destablishment + ERROR (the difference).

(Create the analysis of variance table.)

  • Rows: Price, Month, Provider, Error, Total.
  • Columns: Source of Dispersion, Sum of Squares (D), Degrees of Freedom, Variance (Sum of Squares / Degrees of Freedom), Snedecor’s F (Variance / Error Variance).

Compare the calculated F statistic (RSTD) with the critical value from the F-distribution table (e.g., at the 0.05 significance level, often denoted F0.05). If the calculated F > critical F, reject H0; otherwise, do not reject H0.
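
The decomposition and F comparison above can be sketched for a 3 x 3 Latin square. Everything in this example is hypothetical: the sales figures, the assignment of months to rows and establishments to columns, and the price labels A, B, C. The critical value 19.00 is the standard F-table entry for (2, 2) degrees of freedom at the 0.05 level.

```python
def latin_square_anova(values, treatments):
    """Sums of squares, degrees of freedom, variances and F for a p x p Latin square."""
    p = len(values)
    n = p * p
    grand_total = sum(sum(row) for row in values)
    correction = grand_total ** 2 / n

    ss_total = sum(v ** 2 for row in values for v in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in values) / p - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*values)) / p - correction

    # Total response per treatment label.
    treatment_totals = {}
    for value_row, label_row in zip(values, treatments):
        for v, label in zip(value_row, label_row):
            treatment_totals[label] = treatment_totals.get(label, 0.0) + v
    ss_treat = sum(t ** 2 for t in treatment_totals.values()) / p - correction

    # The error is the difference: total minus the three explained sources.
    ss_error = ss_total - ss_treat - ss_rows - ss_cols
    df_factor = p - 1
    df_error = (p - 1) * (p - 2)
    ms_error = ss_error / df_error

    table = {}
    for name, ss in (("treatment", ss_treat), ("rows", ss_rows), ("columns", ss_cols)):
        ms = ss / df_factor
        table[name] = {"ss": ss, "df": df_factor, "ms": ms, "f": ms / ms_error}
    table["error"] = {"ss": ss_error, "df": df_error, "ms": ms_error}
    table["total"] = {"ss": ss_total, "df": n - 1}
    return table

# Hypothetical sales: rows = months, columns = establishments, letters = price levels.
sales = [[20, 24, 28],
         [25, 30, 21],
         [31, 22, 26]]
prices = [["A", "B", "C"],
          ["B", "C", "A"],
          ["C", "A", "B"]]

anova = latin_square_anova(sales, prices)
CRITICAL_F_0_05 = 19.00  # F table value for (2, 2) degrees of freedom, alpha = 0.05
for source in ("treatment", "rows", "columns"):
    f_value = anova[source]["f"]
    print(source, round(f_value, 2), "reject H0" if f_value > CRITICAL_F_0_05 else "do not reject H0")
```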

Key Question: Which variables are determinants in the recommendation of “what is q?”

Answer: Those variables for which the null hypothesis (H0) is rejected.

Example: If price significantly influences sales (H0 is rejected), demand is sensitive to price (it is not inelastic), so we should set the price that maximizes profit.

T-Test

Objective: Draw conclusions about the importance of economic management and other factors, depending on whether individuals are paid or not.

Conclusions (indicating mean values):

  1. Importance of “1”: In the second table, Levene’s test for equality of variances has a significance of 0.654 > 0.05, so we do not reject H0 and assume equal variances. The two-tailed t-test for equality of means has a significance of 0.436 > 0.05, so we do not reject H0: μ1 = μ2; there is no statistically significant difference between the means.
  2. Importance of “2”: (Follow the same procedure as above.)
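
For readers who want to see what lies behind the equal-variances row of the output, here is a minimal sketch of the pooled two-sample t statistic. The ratings and group names are hypothetical. Levene's test itself is not reproduced here, so equal variances are simply assumed. The critical value 2.306 is the standard two-tailed t-table entry for 8 degrees of freedom at the 0.05 level.

```python
import math
from statistics import mean, variance

def pooled_t(sample_1, sample_2):
    """Two-sample t statistic assuming equal variances, with its degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    dof = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / dof
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / standard_error, dof

# Hypothetical importance ratings given by paid and unpaid individuals.
paid = [7, 8, 6, 9, 7]
unpaid = [6, 8, 7, 7, 6]

t_stat, dof = pooled_t(paid, unpaid)
CRITICAL_T_0_05_DF8 = 2.306  # two-tailed t table value, 8 degrees of freedom, alpha = 0.05
reject_h0 = abs(t_stat) > CRITICAL_T_0_05_DF8
print(f"t = {t_stat:.3f}, df = {dof}, reject H0: {reject_h0}")
```

Comparing |t| with the critical value is equivalent to comparing the two-tailed significance with 0.05, which is how the conclusions above are worded.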

Variable Types Required for the Analysis:

  • Dependent Variables (Continuous): Importance of economic management and importance of “other factors.”
  • Independent Variable (Categorical): Whether individuals are paid (yes/no).