DA321: Multimodal Data Analysis and Learning-I Exam Solutions

Question 1: Image Intensity Histogram and Convolution (4 Points)

Approach:

  1. Constructing the Histogram:

    • Create an 8 × 8 matrix with alternating black (0) and white (255) pixels.
    • Count the frequency of pixel values (0 and 255) to plot the histogram.
    • Resulting histogram:
      • 32 pixels with intensity 0.
      • 32 pixels with intensity 255.
  2. Applying the 2 × 2 Smoothing Kernel:

    • Kernel: a 2 × 2 matrix with each of its four weights equal to 1/4.
    • Apply convolution by moving the kernel over the image, assuming mirrored borders for boundary pixels.
    • Show a few worked convolution outputs. For example, a 2 × 2 block with values (0, 255, 255, 0) averages to (0 + 255 + 255 + 0)/4 = 127.5.
  3. Normalization Explanation:

    • Dividing by 4 ensures the output values stay within the original intensity range (0 to 255).
  4. Resulting Histogram After Convolution:

    • Draw the new histogram, which shows the values pulled toward the mid-grey level (a dominant peak near 127.5 ≈ 128); see the sketch below.
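
A minimal NumPy/SciPy sketch of the steps above, assuming the convolution output is kept at the original 8 × 8 size with symmetric (mirrored) boundary handling; the variable names are illustrative only:

```python
import numpy as np
from scipy.signal import convolve2d

# 8 x 8 checkerboard: alternating 0 (black) and 255 (white) pixels
img = (np.indices((8, 8)).sum(axis=0) % 2) * 255.0

# Histogram before smoothing: 32 pixels at 0 and 32 pixels at 255
print(dict(zip(*np.unique(img, return_counts=True))))

# 2 x 2 smoothing kernel: every weight is 1/4, so the output stays within 0-255
kernel = np.full((2, 2), 0.25)

# Same-size output with symmetric (mirrored) boundary handling
smoothed = convolve2d(img, kernel, mode="same", boundary="symm")

# Histogram after smoothing: values are pulled toward 127.5
print(dict(zip(*np.unique(smoothed, return_counts=True))))
```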

Question 2: RGB Color Space and Fourier Spectrum (6 Points)

(a) Total Number of Colors:

  • Calculate using: 2^24 = 256 × 256 × 256 = 16,777,216 colors.
  • Each color component (R, G, B) is represented by an 8-bit number, allowing for 256 values (0-255) per component.

(b) Fourier Spectrum for Each Channel:

  • Compute the 2-D Fourier transform for each channel (R, G, B) separately.
  • Due to different color distributions, the spectra for R, G, and B may vary.
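
A minimal sketch of the per-channel spectra, assuming the image is a NumPy H × W × 3 array; the function name is illustrative:

```python
import numpy as np

def channel_spectra(rgb):
    """Centred log-magnitude 2-D Fourier spectrum of each colour channel."""
    spectra = []
    for c in range(3):                         # R, G, B in turn
        F = np.fft.fftshift(np.fft.fft2(rgb[..., c]))   # 2-D DFT, DC moved to the centre
        spectra.append(np.log1p(np.abs(F)))    # log scale for easier viewing
    return np.stack(spectra, axis=-1)          # H x W x 3 stack of spectra
```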

(c) Edge Enhancement Approach:

  • Steps:
    • Convert the image to grayscale (optional).
    • Apply an edge detection kernel (e.g., Sobel operator).
  • Pre-processing: Apply Gaussian smoothing.
  • Post-processing: Normalize the output to highlight edges.
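
A possible sketch of this pipeline with SciPy (Gaussian pre-smoothing, Sobel gradients, normalisation); the parameter choices such as sigma=1.0 are assumptions, not part of the original solution:

```python
import numpy as np
from scipy import ndimage

def enhance_edges(gray, sigma=1.0):
    """Gaussian pre-smoothing, Sobel gradients, then normalisation to 0-255."""
    smooth = ndimage.gaussian_filter(gray.astype(float), sigma=sigma)
    gx = ndimage.sobel(smooth, axis=1)   # horizontal gradient
    gy = ndimage.sobel(smooth, axis=0)   # vertical gradient
    mag = np.hypot(gx, gy)               # gradient magnitude (edge strength)
    return 255.0 * mag / (mag.max() + 1e-12)
```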

(d) 2-D Pattern with Increasing Frequency:

  • Draw a pattern matrix whose values follow a sinusoidal grating oriented at 45°, with the spatial frequency increasing along the diagonal (a 2-D chirp); see the sketch below.
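
One way to generate such a pattern, as a hedged sketch: a linear chirp whose phase grows along the 45° diagonal (the start/end frequencies f0 and f1 are illustrative):

```python
import numpy as np

N = 256
y, x = np.mgrid[0:N, 0:N]
d = (x + y) / np.sqrt(2.0)            # distance along the 45° diagonal
f0, f1 = 0.005, 0.05                  # start / end spatial frequency (cycles per pixel)
k = (f1 - f0) / d.max()               # linear sweep rate of the frequency
phase = 2.0 * np.pi * (f0 * d + 0.5 * k * d ** 2)
pattern = 127.5 * (1.0 + np.sin(phase))   # sinusoid mapped to the 0-255 range
```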

Question 3: Spectrogram Analysis (12 Points)

Approach:

  • Given temporal segments (T1 to T6), map each to sound categories:
    • T1 (Fricative): High frequency, broad spectrum.
    • T2 (Male Speech): Low to mid-range frequencies.
    • T3 (Female Speech): Higher average frequency compared to male speech.
    • T4 (Bird Chirp): Sharp, high-frequency components.
    • T5 (DTMF Tone): Specific dual frequencies.
    • T6 (Pause): Low energy, minimal spectral content.
  • Describe the typical spectral characteristics of each category.
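
To support this mapping numerically, one could summarise each segment of the spectrogram; the sketch below is a possible approach with scipy.signal (the segment boundaries and FFT length are assumptions):

```python
import numpy as np
from scipy.signal import spectrogram

def segment_features(x, fs, segments):
    """Mean spectral centroid and total energy for each (start_s, end_s) segment."""
    f, t, S = spectrogram(x, fs=fs, nperseg=1024, noverlap=512)
    feats = []
    for start, end in segments:
        cols = (t >= start) & (t < end)                 # spectrogram frames in this segment
        P = S[:, cols].mean(axis=1)                     # average power spectrum
        centroid = (f * P).sum() / (P.sum() + 1e-12)    # spectral centroid in Hz
        feats.append((centroid, S[:, cols].sum()))      # (centroid, energy)
    return feats
```

High centroid with a broad spectrum suggests a fricative or bird chirp, a low-to-mid centroid suggests male speech, a higher centroid female speech, two narrow peaks a DTMF tone, and near-zero energy a pause.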

Question 4: Fundamental Frequency Estimation (4 Points)

Approach:

  1. 2-D Fourier Transform:

    • Treat the 2-D spectrogram as an input matrix.
    • Apply the 2-D Fourier transform to extract the frequency domain representation.
  2. Identify Harmonics:

    • Locate the peak representing the fundamental frequency.
  3. Pre/Post-Processing:

    • Use a windowing function before applying the transform.
    • Perform peak detection to identify the fundamental frequency accurately.
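
A minimal 1-D sketch of the windowing, FFT, and peak-detection steps, applied to a single frame rather than the full 2-D spectrogram; the 50–500 Hz search band is an assumption:

```python
import numpy as np

def estimate_f0(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency of one windowed frame from its FFT peak."""
    windowed = frame * np.hanning(len(frame))     # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)      # plausible f0 search range
    return freqs[band][np.argmax(spectrum[band])]
```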

Question 5: Low-Frequency Spectrum Transmission (6 Points)

Approach:

  1. Design High-Pass Filter:

    • Specify a filter with a cutoff frequency > 1000 Hz.
  2. Transmission Strategy:

    • Explain how the high-frequency components can be transmitted.
  3. Reconstruction:

    • At the receiver, use an inverse transform to reconstruct the signal.
  • Consider pre-filtering for noise reduction.
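
A minimal high-pass filter sketch with SciPy, assuming a Butterworth design (the order and zero-phase filtering are illustrative choices, not specified in the question):

```python
from scipy.signal import butter, sosfiltfilt

def highpass(x, fs, cutoff=1000.0, order=4):
    """Butterworth high-pass filter with a cutoff at 1000 Hz."""
    sos = butter(order, cutoff, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)   # zero-phase filtering of the input signal
```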

Question 6: Image Conversion to Binary (6 Points)

(a) Thresholding:

  • Apply a threshold to each channel separately (chosen, e.g., by Otsu’s method).
  • Choose thresholds based on maximizing inter-class variance.
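
A self-contained sketch of Otsu's criterion (maximising the between-class variance) for a single 8-bit channel; applying it per channel is shown in the trailing comment:

```python
import numpy as np

def otsu_threshold(channel):
    """Otsu's threshold: maximise the between-class (inter-class) variance."""
    hist, _ = np.histogram(channel, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()        # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0         # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Per-channel binarisation of an H x W x 3 uint8 image `img`:
# binary = np.stack([img[..., c] >= otsu_threshold(img[..., c]) for c in range(3)], axis=-1)
```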

(b) k-Means Clustering:

  • Steps:
    • Convert each pixel’s RGB values into feature space.
    • Apply k-means with k=2.
    • Assign binary values to clustered pixels.
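
A minimal sketch using scikit-learn's KMeans with k = 2 on the RGB feature space; mapping the brighter cluster to white is an illustrative convention:

```python
import numpy as np
from sklearn.cluster import KMeans

def binarize_kmeans(img):
    """Cluster the RGB pixels into two groups and map them to 0 / 255."""
    pixels = img.reshape(-1, 3).astype(float)            # one 3-D feature per pixel
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pixels)
    brighter = int(np.argmax([pixels[labels == k].mean() for k in (0, 1)]))
    return (labels.reshape(img.shape[:2]) == brighter).astype(np.uint8) * 255
```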

Question 7: EEG Relaxation Detection (4 Points)

(a) Band Identification:

  • Choose the alpha band (8-13 Hz) for relaxation monitoring.

(b) Power Calculation Approach:

  • Compute the FFT of the EEG signal.
  • Extract the power of the alpha band.
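
A minimal sketch of the alpha-band power computation via the FFT; returning the relative (fraction-of-total) power is an assumption, made so that the feedback threshold below depends less on overall signal amplitude:

```python
import numpy as np

def alpha_band_power(eeg, fs, band=(8.0, 13.0)):
    """Relative power of the alpha band (8-13 Hz) in one EEG window."""
    windowed = eeg * np.hanning(len(eeg))            # taper the analysis window
    psd = np.abs(np.fft.rfft(windowed)) ** 2         # power spectrum via the FFT
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return psd[in_band].sum() / (psd.sum() + 1e-12)  # fraction of total power
```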

Feedback System Implementation:

  • Use a visual or auditory alert when alpha power crosses a defined threshold.
  • Suggest real-time tracking software implementation.
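
A possible sliding-window monitor reusing alpha_band_power from the sketch above; the window length and the 0.4 threshold are hypothetical placeholders:

```python
def monitor_relaxation(eeg, fs, win_s=2.0, threshold=0.4):
    """Sliding-window check: alert whenever relative alpha power crosses the threshold."""
    n = int(win_s * fs)
    for start in range(0, len(eeg) - n + 1, n):
        power = alpha_band_power(eeg[start:start + n], fs)   # defined in the sketch above
        if power > threshold:
            print(f"t = {start / fs:.1f} s: alpha power {power:.2f} -> relaxed")
```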

Question 8: AR Model and Linear Prediction (3 Points)

(a) Stating the AR Model and Parameter Estimation Methodology:

  1. AR Model Definition:

    • An Autoregressive (AR) model of order p is defined as:

      $x_t = a_1 x_{t-1} + a_2 x_{t-2} + \dots + a_p x_{t-p} + \varepsilon_t$

      where $a_i$ are the model parameters and $\varepsilon_t$ is the error term.

  2. Parameter Estimation Methodology:

    • Least Squares Estimation:
      • Construct a system of equations using observed data samples.
      • Solve for the parameters using linear algebra techniques (e.g., solving y = Xa where X is the lagged data matrix).
    • Yule-Walker Equations:
      • Alternatively, derive parameters using the autocorrelation function and solve the Yule-Walker equations.
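
A minimal least-squares sketch for estimating the AR(p) coefficients (the Yule-Walker route would instead solve the normal equations built from autocorrelations):

```python
import numpy as np

def fit_ar_least_squares(x, p):
    """Least-squares estimate of a_1..a_p in x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + e_t."""
    x = np.asarray(x, dtype=float)
    # Column i of the lagged design matrix holds x_{t-(i+1)} for t = p .. N-1
    X = np.column_stack([x[p - i - 1:len(x) - i - 1] for i in range(p)])
    y = x[p:]                                        # targets x_t
    a, *_ = np.linalg.lstsq(X, y, rcond=None)        # solve y = X a in the least-squares sense
    return a
```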

(b) Poor Predictions for AR(3) on Validation Set:

  • Potential reasons for poor performance:
    1. Model Order Mismatch: The actual signal might require a higher or lower order for accurate prediction.
    2. Non-stationarity: The signal might be non-stationary, violating the AR model assumptions.
    3. Overfitting: The model may fit the training data well but fail to generalize to the validation data.
    4. Noise Sensitivity: High noise levels might affect model accuracy.

Question 9: Crossword Puzzle Solutions (10 Points)

Across:

  1. Echo (Persistence of sound as it reflects off surfaces in an enclosed space)
  2. Gaussian (Type of blur filter that uses a bell-shaped curve)
  3. Microphone (Device that converts sound waves to electrical signals)
  4. Variance (Statistical measure of the spread in a dataset)

Down:

  1. Speaker (Output device that converts electrical signals to sound waves)
  2. Neuron (Basic unit of the nervous system for transmitting signals)
  3. Median (Noise-reducing filter that replaces pixel values with the middle value)
  4. Cones (Photoreceptor cells in the eye for detecting color)
  5. Bayer (Pattern of colored filters on sensors for color images)
  6. PCA (Dimensionality reduction technique, abbreviated)