
Q. Explain non-parametric methods

Non-parametric methods are statistical techniques that do not assume a specific probability distribution (such as the normal distribution) for the data; they typically work with ranks or medians rather than means. Commonly used non-parametric tests include:

1. Mann-Whitney U Test (Wilcoxon Rank-Sum Test): Used to compare the distributions of two independent groups or samples to determine if they have different medians. It is an alternative to the independent samples t-test.

2. Wilcoxon Signed-Rank Test: Used to compare the medians of paired or matched samples to determine if there is a significant difference. It is an alternative to the paired samples t-test.

3. Kruskal-Wallis Test: Used to compare the distributions of three or more independent groups or samples to determine if they have different medians. It is an alternative to the one-way ANOVA.

4. Friedman Test: Used to compare three or more related samples or repeated measures to determine if there are significant differences among the medians. It is an alternative to the repeated measures ANOVA.

5. Spearman’s Rank-Order Correlation: Used to measure the strength and direction of the monotonic relationship between two variables when the data are ranked. It is an alternative to Pearson’s correlation coefficient.

6. Kendall’s Rank Correlation: Used to measure the strength and direction of the monotonic relationship between two ranked variables, based on the number of concordant and discordant pairs of observations. It is another alternative to Pearson’s correlation coefficient.

7. Sign Test: Used to determine if the median of a paired sample differs significantly from a specified value. It is a simple nonparametric test that compares the signs of the differences.
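
A minimal sketch of running a few of these tests with SciPy is given below; the sample values are made up purely for illustration, and only a subset of the tests listed above is shown.

    # Illustrative data only; any two small numeric samples would work the same way.
    from scipy import stats

    group_a = [12.1, 14.3, 11.8, 15.0, 13.2, 12.7]
    group_b = [10.5, 11.2, 12.9, 10.8, 11.7, 12.0]
    group_c = [13.5, 14.1, 12.6, 13.9, 14.4, 13.0]

    # Mann-Whitney U test: two independent samples
    u_stat, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

    # Wilcoxon signed-rank test: paired samples (here the two groups are treated as pairs)
    w_stat, p_w = stats.wilcoxon(group_a, group_b)

    # Kruskal-Wallis test: three or more independent groups
    h_stat, p_h = stats.kruskal(group_a, group_b, group_c)

    # Spearman and Kendall rank correlations
    rho, p_rho = stats.spearmanr(group_a, group_b)
    tau, p_tau = stats.kendalltau(group_a, group_b)

    print(f"Mann-Whitney U: U={u_stat:.2f}, p={p_u:.3f}")
    print(f"Kruskal-Wallis: H={h_stat:.2f}, p={p_h:.3f}")
    print(f"Spearman rho={rho:.2f}, Kendall tau={tau:.2f}")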



Exponential Smoothing:

->Exponential smoothing is another widely used technique for time series analysis and forecasting.

->It assigns exponentially decreasing weights to past observations, giving more importance to recent data points while gradually diminishing the impact of older ones.

->The exponential smoothing method calculates the forecast as a weighted sum of past observations.

Here’s how exponential smoothing works:

1. Define the smoothing factor or coefficient, denoted as α (alpha), which controls the rate at which weights decrease. The value of α lies between 0 and 1, with higher values giving more weight to recent observations.

2. Initialize the forecast for the first time period as the value of the initial observation.

3. For each subsequent time period, update the forecast using the formula:

   Forecast(t) = α * Observation(t) + (1 – α) * Forecast(t-1)

   Here, Observation(t) represents the actual value at time t, and Forecast(t-1) is the forecasted value at the previous time period.

4. Repeat this process for all time periods in the series, updating the forecast at each step.

The exponential smoothing technique adapts to changes in the time series, reacting more strongly to recent observations while still considering past values. It provides a weighted average of previous data points and can be useful for short-term forecasting when there is no clear trend or seasonality in the data.
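
A short Python sketch of the steps above, using plain lists and no external libraries; the demand values and the choice of α = 0.3 are assumed purely for illustration.

    def exponential_smoothing(series, alpha):
        """Apply the update formula above:
        Forecast(t) = alpha * Observation(t) + (1 - alpha) * Forecast(t-1)."""
        forecasts = [series[0]]          # step 2: initialize with the first observation
        for t in range(1, len(series)):  # steps 3-4: update for each later period
            forecasts.append(alpha * series[t] + (1 - alpha) * forecasts[t - 1])
        return forecasts

    # Illustrative data and smoothing factor (assumed values)
    demand = [120, 132, 125, 140, 138, 150, 147]
    print(exponential_smoothing(demand, alpha=0.3))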



Moving Average:

  • Moving average is a popular method used in time series analysis and forecasting to smooth out fluctuations and identify underlying trends.
  • It calculates the average of a fixed number of consecutive data points within a given window and uses that average as the smoothed value.
  • The window size determines the number of data points considered for the calculation.

1. Select a window size (e.g., 5, 10, or 20 time periods) depending on the frequency and characteristics of your data. A larger window size captures long-term trends but may overlook short-term fluctuations, while a smaller window size captures more immediate changes but may introduce more noise.

2. Compute the average of the data points within the window by summing them up and dividing by the window size.

3. Slide the window by one time period and recalculate the average based on the new set of data points within the window.

4. Repeat this process until you have calculated the moving average for all relevant time periods.

  • The moving average smooths out the data by reducing the impact of random fluctuations and highlighting the underlying trend.
  • It can be useful for
    • detecting patterns,
    • identifying turning points, and
    • producing a more stable series for forecasting.
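
A minimal sketch of a simple moving average using pandas is given below; the window size of 3 and the sample sales figures are illustrative assumptions.

    import pandas as pd

    # Illustrative monthly values (assumed data)
    sales = pd.Series([23, 25, 22, 28, 30, 27, 31, 29, 33])

    # Simple moving average with a 3-period window; the first two entries are NaN
    # because a full window of 3 observations is not yet available there.
    sma_3 = sales.rolling(window=3).mean()
    print(sma_3)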


Cluster Sampling
Cluster sampling is a probability sampling technique in which researchers divide the population into multiple groups (clusters) and then select clusters at random, using a simple random or systematic random sampling technique, for data collection and analysis.
• In cluster sampling, the elements in the population are first divided into separate groups called clusters. Each element of the population belongs to one and only one cluster. Suppose we want to study the performance of second-year students in Mathematics in our Institute; we can treat the four branches as clusters, randomly select some of them, and then collect data from the students in the selected branches (either all of them or a random sample within each selected branch).
• One of the primary applications of cluster sampling is area sampling, where clusters are city blocks or other well-defined areas.
• Cluster sampling generally requires a larger total sample size than either simple random sampling or stratified random sampling.
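A small sketch of the idea in Python: randomly select whole clusters, then take every element in the selected clusters. The cluster names and member lists are made up for illustration.

    import random

    # Hypothetical population grouped into clusters (e.g., branches or city blocks)
    clusters = {
        "Mechanical": ["M1", "M2", "M3", "M4"],
        "Civil": ["C1", "C2", "C3"],
        "Electrical": ["E1", "E2", "E3"],
        "Computer": ["CS1", "CS2", "CS3"],
    }

    # Stage 1: randomly pick 2 of the 4 clusters
    chosen = random.sample(list(clusters), k=2)

    # Stage 2: include every element of the selected clusters in the sample
    sample = [element for name in chosen for element in clusters[name]]
    print(chosen, sample)
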
Stratified Random Sampling
• In stratified random sampling, the elements in the population are first divided into groups called strata, such that each element in the population belongs to one and only one stratum.
• A stratified sample should ensure that each subgroup (stratum) of the population is adequately represented within the overall sample of the research study.
• For example, our Institute has four engineering branches, and the Mechanical branch has twice the number of students as any other branch. Therefore, if I select a stratified random sample of size 10 with proportional allocation, I should take a sample of size 4 from Mechanical and samples of size 2 each from the other three branches.
• The basis for forming the strata, such as department, location, age, industry type, and so on, is at the discretion of the designer of the sample. However, the best results are obtained when the elements within each stratum are as much alike as possible.
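
A brief sketch of proportional allocation for the Institute example above; the branch sizes below are assumed so that Mechanical has twice as many students as each other branch.

    import random

    # Assumed stratum sizes: Mechanical is twice the size of each other branch
    strata_sizes = {"Mechanical": 200, "Civil": 100, "Electrical": 100, "Computer": 100}
    total = sum(strata_sizes.values())
    sample_size = 10

    # Proportional allocation: each stratum contributes in proportion to its size
    allocation = {name: round(sample_size * size / total) for name, size in strata_sizes.items()}
    print(allocation)  # {'Mechanical': 4, 'Civil': 2, 'Electrical': 2, 'Computer': 2}

    # Within each stratum, draw the allocated number of students by simple random sampling
    students = {name: [f"{name[:3]}-{i}" for i in range(size)] for name, size in strata_sizes.items()}
    sample = {name: random.sample(students[name], allocation[name]) for name in strata_sizes}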


Time series patterns refer to recurring behaviors or structures observed in a time series data set. These patterns provide insights into the underlying dynamics and can be used to make predictions and informed decisions. Here are some common time series patterns:

1. Trend: A trend refers to a long-term increase or decrease in the data over time. It indicates the overall direction of the series and can be linear (straight line) or nonlinear (curved). Trends can be ascending (upward), descending (downward), or horizontal (no significant change).

2. Seasonality: Seasonality represents patterns that repeat at regular intervals within a time series. These patterns could be daily, weekly, monthly, or yearly. Seasonal effects often occur due to factors such as holidays, weather conditions, or cultural events. Identifying and accounting for seasonality is crucial for accurate forecasting.

3. Cyclical: Cyclical patterns are longer-term oscillations that are not necessarily fixed to specific time periods like seasonality. These cycles can extend over several years and are usually influenced by economic, political, or natural factors. Cyclical patterns do not have the same regularity as seasonality and can have varying lengths and amplitudes.

4. Irregular/Random: Irregular or random patterns represent unpredictable fluctuations and noise within a time series. These could be caused by random events, outliers, measurement errors, or other unforeseen factors. Irregular patterns do not follow any specific trend, seasonality, or cyclical behavior.

5. Autocorrelation: Autocorrelation refers to the relationship between observations at different time points within a time series. Positive autocorrelation indicates that high values tend to be followed by high values and low values by low values, while negative autocorrelation suggests an inverse relationship. Autocorrelation is important for identifying patterns and choosing appropriate forecasting models (a short sketch of measuring it appears at the end of this section).



6. Level Shifts: Level shifts occur when there is a sudden and permanent change in the mean value of the time series. These shifts can be caused by structural changes in the underlying process, such as policy changes, economic shifts, or external events. Detecting level shifts is crucial for understanding changes in the series’ behavior.

7. Outliers: Outliers are extreme values that deviate significantly from the overall pattern of the time series. They can be caused by measurement errors, anomalies, or rare events. Outliers can distort the analysis and forecasting process, so identifying and handling them appropriately is essential.

Understanding these time series patterns helps analysts select appropriate modeling techniques and develop accurate forecasts. Various statistical methods, such as ARIMA, seasonal decomposition of time series, and machine learning algorithms, can be applied to capture and account for these patterns in time series analysis and forecasting.
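
As noted under autocorrelation above, here is a minimal sketch of measuring it with pandas; the series values are made up, and Series.autocorr simply computes the Pearson correlation between the series and a lagged copy of itself.

    import pandas as pd

    # Illustrative series (assumed values)
    y = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119])

    # Lag-1 and lag-2 autocorrelation: correlation of the series with a shifted copy of itself
    print(y.autocorr(lag=1))
    print(y.autocorr(lag=2))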



Q. Time series decomposition

Time series decomposition is a statistical technique used to break down a time series into its underlying components: trend, seasonality, and residual (or error). This decomposition helps in understanding the underlying patterns and structures within the time series data, which can be useful for forecasting, analysis, and anomaly detection. The three main components are:

1. Trend: The trend component represents the long-term movement or directionality of the time series. It captures the overall pattern or tendency of the data over an extended period. The trend can be increasing, decreasing, or stationary.

2. Seasonality: The seasonality component represents the regular, repeating patterns or fluctuations within the time series that occur at fixed intervals. Seasonality can be observed in various time frames, such as daily, weekly, monthly, or yearly. It often arises due to factors like weather, holidays, or business cycles.

3. Residual (Error): The residual component, also known as the error or remainder, represents the random or irregular fluctuations that cannot be explained by the trend or seasonality. It consists of the unexplained variability in the time series data and is often assumed to follow a random distribution with a mean of zero.
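
A short sketch of decomposing a series into these three components with statsmodels; the synthetic monthly data and the choice of period=12 are illustrative assumptions.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import seasonal_decompose

    # Synthetic monthly series: upward trend + yearly seasonal cycle + random noise
    idx = pd.date_range("2020-01-01", periods=48, freq="MS")
    values = (np.linspace(100, 160, 48)
              + 10 * np.sin(2 * np.pi * np.arange(48) / 12)
              + np.random.normal(0, 2, 48))
    series = pd.Series(values, index=idx)

    # Additive decomposition: series = trend + seasonal + residual
    result = seasonal_decompose(series, model="additive", period=12)
    print(result.trend.dropna().head())
    print(result.seasonal.head())
    print(result.resid.dropna().head())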