Statistics: Regression Analysis, Sampling Methods, and Quantitative Research
Regression Analysis
Regression table:
Multiple R: coefficient of correlation (0,0995) → a very weak positive linear relationship between age (X) and BMI (Y).
R square: coefficient of determination (0,0099) → 0,99% of variance in BMI (Y) is explained by our regression model.
St error in kg/m2: predictions of BMI (Y) made using our model will typically differ from the actual values by around 5,677 kg/m2.
Obs: we have a total of 400 obs in our model.
Intercept (B0)
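Coefficients: b0 = 29,0653745 is the predicted BMI (29,07 kg/m2) for an age (X) of 0, i.e. the point where the regression line crosses the Y axis; the change in BMI per extra year of age is given by b1, not by b0. Slope example from another model: if our admission rate (X) increases by 1 percentage point (0,01), our tuition fees (Y) decrease by 58308,25 × 0,01 ≈ 583,08 $.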
T-Stat: (coeff/st error) 35,2488498
P-value: almost 0 (far below 0,05), so b0 is statistically significant: we reject the hypothesis that the intercept equals 0.
Lower 95%/Upper 95%: we are 95% confident that the true coefficient b0 lies between 27,4443 and 30,6865.
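The t Stat and the 95% interval are linked by simple arithmetic: the standard error is the coefficient divided by the t Stat, and the interval is the coefficient ± (critical t × standard error). A minimal Python sketch that rebuilds them from the table values, assuming a critical t value of about 1,966 for 400 − 2 = 398 degrees of freedom (hard-coded here; it is not part of the original table):

```python
b0 = 29.0653745        # intercept coefficient from the regression table
t_stat = 35.2488498    # t Stat reported for the intercept
T_CRIT_398 = 1.966     # ASSUMPTION: approximate two-tailed 5% critical t, df = 398

# t Stat = coefficient / standard error, so the standard error can be recovered
se_b0 = b0 / t_stat

# 95% confidence interval: coefficient +/- critical t * standard error
lower = b0 - T_CRIT_398 * se_b0
upper = b0 + T_CRIT_398 * se_b0

print(f"SE(b0) = {se_b0:.4f}")
print(f"95% CI for b0: {lower:.4f} to {upper:.4f}")
```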
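Age (B1)
Coefficient: b1 = 0,03919041, so if age increases by 1 year, the predicted BMI increases by about 0,039 kg/m2.
P-value: about 0,0467 (1 − p = 95,33%); since it is below 0,05, the relationship between age and BMI is statistically significant at the 5% level.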
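For a simple regression, the t Stat of the slope can be recovered from Multiple R and the number of observations, t = r·√(n − 2) / √(1 − r²), and then turned into a two-tailed p-value. This sketch assumes the standard normal as an approximation to the t distribution (reasonable with 398 degrees of freedom) and uses the rounded R of 0,0995, so it only lands close to the 1 − p of 95,33% in the table, not exactly on it:

```python
from math import sqrt
from statistics import NormalDist

r = 0.0995   # Multiple R from the table (rounded to 4 decimals)
n = 400      # number of observations

# In simple regression, t Stat of the slope = r * sqrt(n - 2) / sqrt(1 - r^2)
t_b1 = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Two-tailed p-value; ASSUMPTION: standard normal used in place of t with 398 df
p_value = 2 * (1 - NormalDist().cdf(abs(t_b1)))

print(f"t(b1) = {t_b1:.3f}, p = {p_value:.4f}, 1 - p = {1 - p_value:.2%}")
```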
Regression equation: Ŷ = 29,0653745 (b0) + 0,03919041 (b1)·X, i.e. predicted BMI = 29,0654 + 0,0392 × age.
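Using the equation for prediction is a direct substitution of X. A short sketch with the two coefficients from the table; the function name and the ages 20, 40 and 60 are arbitrary illustrative choices:

```python
B0 = 29.0653745   # intercept (b0) from the regression table
B1 = 0.03919041   # slope for age (b1) from the regression table

def predict_bmi(age_years: float) -> float:
    """Predicted BMI (kg/m2) from the fitted line Y-hat = b0 + b1 * X."""
    return B0 + B1 * age_years

# Each extra year of age adds b1 kg/m2 to the prediction
for age in (20, 40, 60):
    print(age, round(predict_bmi(age), 2))
```

With R square around 1% and a standard error of about 5,7 kg/m2, these predictions are very imprecise; the line mainly shows the small average increase of about 0,04 kg/m2 per year.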
Sampling Methods
BOXPLOT (from bottom to top):
The lowest point: the lowest admission rate among the US universities in our sample is 3,39%.
First quartile: 25% of the universities in our sample have an admission rate below 12,11%; 75% of the universities in our sample accept more than 12,11% of the first-year candidates.
Second quartile (median): 50% of the universities in our sample accept less than 18,2% of the first-year candidates; 50% of the universities in our sample accept more than 18,2% of the first-year candidates.
Third quartile: 75% of the universities in our sample accept less than 25,7% of the first-year candidates; 25% of the universities in our sample accept more than 25,7% of the first-year candidates.
The value shown inside the box is the mean (average): the average university in our sample accepts 20% of the first-year candidates. Since the mean (20%) is above the median (18,2%), the distribution is slightly positively skewed.
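A boxplot is drawn from a few summary values (minimum, quartiles, maximum) plus the mean. A sketch using Python's `statistics` module on a small hypothetical list of admission rates (not the course dataset). Tools compute quartiles slightly differently: `method="inclusive"` is one common convention, while the default of `statistics.quantiles` is `"exclusive"`:

```python
from statistics import mean, quantiles

# Hypothetical admission rates (%), for illustration only
rates = [4, 8, 11, 13, 16, 18, 21, 24, 27, 33, 45]

# n=4 returns the three cut points Q1, Q2 (median) and Q3
q1, q2, q3 = quantiles(rates, n=4, method="inclusive")

print("lowest point:", min(rates))
print("Q1:", q1, "median:", q2, "Q3:", q3)
print("highest point:", max(rates))
print("mean:", mean(rates))
```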
Stats: the body of techniques used for collecting, organizing, analyzing, and interpreting data.
Quantitative variables: values expressed numerically (continuous: can take any value within a range, e.g. exam grade; discrete: values expressed as whole numbers, e.g. number of people in a restaurant).
Qualitative variables: values expressed in words (nominal: cannot be ordered, e.g. gender; ordinal: can be ranked or ordered, e.g. educational level).
Descriptive stats: describe the characteristics of the data (e.g. average height of the players on a soccer team).
Inferential stats: techniques by which decisions about a statistical population or process are made based on a sample (e.g. a questionnaire at ESIC: is everyone answering, or only a sample?).
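Random sampling: everyone has the same chance of being selected (simple: each member gets a number and, e.g., 10 numbers are drawn at random; systematic: e.g. the teacher enters the class and selects every 3rd person; stratified: the population is divided into groups, or strata, that share a given characteristic, and members are drawn at random from each group; cluster: the population is divided into groups, or clusters, and whole clusters are selected at random). Non-random sampling: the researcher selects the members directly (e.g. by convenience).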
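The sampling methods can be contrasted on a toy population. A minimal sketch with Python's `random` module, assuming a hypothetical population of 30 students in three class groups (A, B, C) of 10 each; the group labels, sample sizes and seed are illustrative choices:

```python
import random

# Hypothetical population: 30 students split into three class groups (A, B, C)
population = [{"id": i, "group": "ABC"[(i - 1) // 10]} for i in range(1, 31)]
rng = random.Random(42)  # fixed seed so the draws are reproducible

# Simple random sampling: every student has the same chance; draw 10 without replacement
simple = rng.sample(population, k=10)

# Systematic sampling: random starting point, then every 3rd student in the list
start = rng.randrange(3)
systematic = population[start::3]

# Stratified sampling: split into strata by group, then draw at random within each stratum
strata = {}
for student in population:
    strata.setdefault(student["group"], []).append(student)
stratified = [s for members in strata.values() for s in rng.sample(members, k=3)]

# Cluster sampling: randomly choose whole groups and keep every student in them
chosen = rng.sample(sorted(strata), k=1)
cluster = [s for s in population if s["group"] in chosen]

# Non-random (convenience) sampling: the researcher simply takes the first 10 listed
convenience = population[:10]

for name, sample in [("simple", simple), ("systematic", systematic),
                     ("stratified", stratified), ("cluster", cluster),
                     ("convenience", convenience)]:
    print(name, sorted(s["id"] for s in sample))
```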
Quantitative Research
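Quantitative Research: objective: to develop formal mathematical models, theories and hypotheses on the basis of existing studies (exploratory: we explore a problem that is not yet clearly defined; descriptive: describes a given phenomenon; causal: studies cause-and-effect relationships).
Levels of measurement: nominal (categories that cannot be ordered), ordinal (can be ordered), interval (equal distances between scores, no true zero), ratio (interval plus a true zero); numerical data can also be continuous or discrete.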
Internal Validity: does the study really show a cause-and-effect relationship between the selected variables (cause → effect, X → Y, e.g. smoking → cancer)?
External Validity: how well the results obtained from the study's sample generalize to the general population.
Mean: average; Median: middle value in the dataset when the values are arranged in ascending or descending order; Mode: value that occurs most frequently in our sample or population.
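A quick check of the three measures of central tendency with Python's `statistics` module, on a hypothetical sample of people counted in a restaurant over seven evenings:

```python
from statistics import mean, median, mode

# Hypothetical sample: number of people in a restaurant on 7 evenings
people = [12, 15, 15, 18, 20, 22, 45]

print("mean:", mean(people))      # sum of values / number of values
print("median:", median(people))  # middle value of the sorted data
print("mode:", mode(people))      # most frequent value
```

The mean (21) sits above the median (18) because of the single large value 45, while the mode is the repeated value 15.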
Measures of dispersion: Deviation (individual difference between each data point and the mean, the central value); Variance (average squared deviation from the mean); St Dev (square root of the variance: a statistical measure that summarizes these deviations into a single value, providing a standardized measure of variability in the dataset); Coefficient of variation (St Dev divided by the mean: spread relative to the mean). Shape of the distribution: Normal distribution (symmetrical, bell-shaped); Pearson's coefficient of skewness (0 = symmetrical, positive = positively skewed, negative = negatively skewed).
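The dispersion measures and Pearson's skewness coefficient on a small hypothetical sample. This sketch uses the sample versions (`statistics.variance` and `statistics.stdev` divide by n − 1; `pvariance` and `pstdev` are the population versions) and the median-based form of Pearson's coefficient, 3 × (mean − median) / St Dev, which is an assumption about the variant the course uses:

```python
from statistics import mean, median, stdev, variance

data = [12, 15, 15, 18, 20, 22, 45]  # hypothetical sample

m = mean(data)
deviations = [x - m for x in data]        # individual distance of each point from the mean
var_s = variance(data)                    # sample variance (sum of squared deviations / (n - 1))
sd_s = stdev(data)                        # sample standard deviation = sqrt(variance)
cv = sd_s / m                             # coefficient of variation (spread relative to the mean)
skew = 3 * (m - median(data)) / sd_s      # ASSUMPTION: median-based Pearson skewness

print("deviations:", deviations)
print("variance:", round(var_s, 2), "st dev:", round(sd_s, 2))
print(f"CV: {cv:.1%}", "skewness:", round(skew, 2))
```

Here the skewness coefficient is positive, matching the long right tail created by the value 45.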
Regression analysis: the objective is to estimate the value of a random variable (Y) given that the value of an associated variable (X) is known. Regression eq: algebraic formula by which the estimated value of the dependent variable is determined. Regression line on a scatter plot: positive slope (direct relationship between the variables); negative slope (inverse relationship); zero slope (no relationship). The closer the points are to the line, the stronger the relationship; the more spread out they are, the weaker the relationship.
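The regression line is usually found by least squares: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. A self-contained sketch (the function name and the advertising/sales numbers are hypothetical) that also returns the quantities read in a regression table: the correlation r (Multiple R is its absolute value), R square, and the standard error of the estimate:

```python
from math import sqrt

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns b0, b1, r, R^2 and standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # slope
    b0 = my - b1 * mx               # intercept: the line passes through (mean x, mean y)
    r = sxy / sqrt(sxx * syy)       # correlation coefficient (Multiple R = |r|)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (n - 2))        # standard error of the estimate
    return b0, b1, r, r ** 2, se

# Hypothetical data: advertising expenses (X) vs sales (Y)
adv = [1, 2, 3, 4, 5]
sales = [3, 5, 6, 8, 11]
print(simple_regression(adv, sales))
```

For this toy data b1 = 1,9 and b0 = 0,9, so each extra unit of advertising adds about 1,9 units of predicted sales.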
Scatter plot: graph in which each plotted point represents an observed pair of values for the indep and dep variables.
Dep Y: stock returns, cultural expenditures/press expenses per month, exchange rates, delivery time, sales, ice-cream orders.
Indep X: interest rates, income per month, trade balance, distance, advertising expenses, temperature.
Correlation coefficient graphs: a (upward line, points close together on the line: perfect positive correlation), b (flat line in the middle, points scattered: no correlation, no relationship), c (downward line, points close together on the line: perfect negative correlation), d (upward line, points scattered: positive correlation, points more scattered), e (downward line, points scattered: negative correlation, inverse relationship).
The closer the correlation coefficient is to +1 or −1, the stronger the linear relationship; values near 0 indicate little or no linear relationship.