Statistics Concepts: Observational Studies, Bias, and Inference
Statistics Concepts and Definitions
Ex. 1:
Observational study: No cause-effect; just associations. Five Number Summary = Min, Q1, Median, Q3, Max
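A minimal sketch of the five-number summary in Python, using only the standard library. The data values are made up for illustration, and the "inclusive" quartile method is an assumption: textbooks and calculators use slightly different quartile conventions, so Q1 and Q3 may differ a little from a hand calculation.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(data)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    # method="inclusive" is one convention among several (assumption).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

# Made-up data, for illustration only.
summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)
```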
Factors: Explanatory variables (x). Covariance: Gives the direction (+ or –) of a linear relationship, but not its strength.
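To see why covariance gives direction but not strength, the sketch below compares it with the correlation coefficient r, which is not named in the note and is brought in here only for contrast. The heights and weights are made-up numbers. Changing the units of x rescales the covariance, while r stays the same.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by both standard deviations (unit-free)."""
    sx = math.sqrt(sample_covariance(xs, xs))
    sy = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)

# Made-up data, for illustration only.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [52, 58, 69, 75, 86]
heights_cm = [h * 100 for h in heights_m]

cov_m = sample_covariance(heights_m, weights_kg)    # positive: + relation
cov_cm = sample_covariance(heights_cm, weights_kg)  # 100 times larger
r_m = correlation(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)          # same as r_m
print(cov_m, cov_cm, r_m, r_cm)
```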
Block design: Individuals sharing the same characteristic are grouped into blocks, and treatments are randomized within each block.
SRS (Simple Random Sample); Stratified: Sample distinct groups separately then combine them. Sample survey: Cross-sectional; collect data of a population at one point in time.
Multistage: Using SRS within SRSs.
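A rough sketch of an SRS versus a stratified sample, using Python's `random` module. The strata names, sizes, and the fixed seed are all hypothetical choices made for the example.

```python
import random

def simple_random_sample(population, k, rng):
    """SRS: every set of k individuals is equally likely."""
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, rng):
    """Take an SRS inside each stratum, then combine the pieces."""
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, k_per_stratum))
    return combined

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population split into two distinct groups.
strata = {
    "freshmen": [f"F{i}" for i in range(50)],
    "seniors": [f"S{i}" for i in range(30)],
}
everyone = strata["freshmen"] + strata["seniors"]

srs = simple_random_sample(everyone, 10, rng)
strat = stratified_sample(strata, 5, rng)
print(srs)
print(strat)
```

The SRS may over- or under-represent a group by chance; the stratified sample always contains exactly 5 from each group.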
Bias: Undercoverage (selection bias – some groups are left out of the selection process); nonresponse and response (behavior of the respondent or interviewer can cause it – wording).
Case control: Retrospective: looking back into the past for exposure factors.
Cohort: Subjects sharing a defining characteristic are observed at regular intervals over an extended period of time. Ex: births. Longitudinal: involves comparison (exposed vs non) – variables over time.
– Prospective: subjects are checked at regular intervals going forward.
Quantitative data: Histograms and Dot Plots; Categorical data: Bar graphs and pie charts; For 2 related variables: scatterplot
Discrete variables: Take only specific, countable values. Continuous: Can take any value in an interval (an uncountable set of values). Ex: histograms – age, height, weight, temperature…
s (sample) or σ (population) = standard dev.
= variance = % of variation
= Left
= right
Ex.2:
Note: if a table gives the probabilities (or counts) and the question asks for the probability of A given B, divide the A-and-B cell by the B total: P(A | B) = P(A and B) / P(B total).
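The conditional-probability note can be sketched with a small two-way table. The table below is entirely made up, and the function name `conditional_probability` is just an illustrative helper.

```python
# Hypothetical two-way table of counts (made-up numbers).
# Keys are (row label, column label).
table = {
    ("smoker", "disease"): 30,
    ("smoker", "no disease"): 70,
    ("non-smoker", "disease"): 20,
    ("non-smoker", "no disease"): 180,
}

def conditional_probability(table, event, given):
    """P(event | given): the joint cell divided by the total of the 'given' row."""
    joint = table[(given, event)]
    given_total = sum(count for (row, _), count in table.items() if row == given)
    return joint / given_total

p_disease_given_smoker = conditional_probability(table, "disease", "smoker")
print(p_disease_given_smoker)  # 30 / (30 + 70)
```

The denominator is the row total for the "given" group, not the grand total of the table.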
Discrete/Binomial Prob. Distributions: X = number of successes, fixed n, independent trials, yes/no outcomes. When np >= 10 and n(1-p) >= 10, the binomial distribution is approximately normal.
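A small check of the np >= 10 and n(1-p) >= 10 rule, comparing an exact binomial probability with its normal approximation. The values n = 100 and p = 0.3 are made up, and the continuity correction (using 35.5 instead of 35) is an addition that the note does not mention.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_approx_ok(n, p):
    """Rule of thumb from the note: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

n, p = 100, 0.3  # made-up example values

# Exact P(X <= 35), summing the binomial probabilities.
exact = sum(binomial_pmf(k, n, p) for k in range(0, 36))

# Normal approximation with mean np and standard dev. sqrt(np(1-p)),
# evaluated at 35.5 (continuity correction, an assumption added here).
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p))).cdf(35.5)

print(normal_approx_ok(n, p), exact, approx)
```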
Sampling distributions:
Numerical: x̄ has mean μ and standard dev. σ/√n
Categorical: p̂ has mean p and standard dev. √(p(1-p)/n)
CLT (Central Limit Theorem): If n is large enough, the sampling distribution of the sample mean will be approximately normal, whatever the shape of the population.
Law of large numbers: As n increases, the sample stats get closer to true parameters.
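Both ideas can be checked with a quick simulation. This is only a sketch: the exponential population (mean 1, standard dev. 1, strongly right-skewed), the sample sizes, and the seed are arbitrary choices, not something from the notes.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

# Law of large numbers: a bigger sample gives a mean closer to 1.
small = sample_mean(10)
large = sample_mean(100_000)

# CLT: many sample means of size 50 pile up around 1,
# with spread close to sigma / sqrt(n) = 1 / sqrt(50).
means = [sample_mean(50) for _ in range(2000)]
center = statistics.fmean(means)
spread = statistics.stdev(means)

print(small, large)
print(center, spread, 1 / 50 ** 0.5)
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is skewed.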
Inference:
If the confidence level goes 90% -> 95% -> 99%, M.E. (margin of error) increases
If n increases, M.E decreases
If n increases, the p-value decreases (for the same observed effect)
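The effect of confidence level and n on the margin of error can be sketched for a proportion. Using the formula M.E. = z* · √(p̂(1-p̂)/n) is an assumption here (the notes do not say which M.E. formula they mean), and p̂ = 0.5 with n = 400 are made-up values.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """z* times the standard error of a sample proportion."""
    # z* is the critical value leaving (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Same sample, higher confidence -> wider interval.
me_90 = margin_of_error(0.5, 400, 0.90)
me_95 = margin_of_error(0.5, 400, 0.95)
me_99 = margin_of_error(0.5, 400, 0.99)

# Same confidence, 4 times the sample size -> half the margin of error.
me_95_big_n = margin_of_error(0.5, 1600, 0.95)

print(me_90, me_95, me_99, me_95_big_n)
```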
Type I error: Rejecting Ho when it’s true (probability = alpha)
Type II error: Failing to reject Ho when it’s false (probability = beta)
Significance level (alpha): Probability threshold; a result with probability below this value is considered unusual.
p-value: Probability, assuming Ho is true, of seeing evidence this strong or stronger against Ho by chance alone.
Inference (1 mean population):
Requirements: random, representative sample; population normally distributed, or a large sample (n > 40)
Inference (2 mean population):
Population proportion (p): Requirements
-Large sample (z): random samples and normally distributed (n ≥ 30)
– Inference (1-sample proportion):
-C.I: if np̂ (successes) and n(1-p̂) (failures) are both ≥ 15 (if n ≥ 10 with a 90% CI, use the “+4 method” for 1-sample proportion)
-Hypothesis test: if np0 (expected successes) and n(1-p0) (expected failures) are both ≥ 10, where p0 is the value of p under Ho.
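A sketch of one-sample proportion inference under the usual textbook formulas, which the notes do not spell out, so treat them as assumptions: the large-sample interval p̂ ± z*·√(p̂(1-p̂)/n), the "+4" interval (add 2 successes and 2 failures first), and the z test with standard error √(p0(1-p0)/n). The counts are made up.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def one_prop_ci(successes, n, confidence=0.95, plus_four=False):
    """Confidence interval for p; plus_four adds 2 successes and 2 failures."""
    if plus_four:
        successes, n = successes + 2, n + 4
    p_hat = successes / n
    z_star = Z.inv_cdf(1 - (1 - confidence) / 2)
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - me, p_hat + me)

def one_prop_z_test(successes, n, p0):
    """Two-sided z test of Ho: p = p0. Returns (z, p_value)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # uses p0 because Ho is assumed true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

# Made-up example: 60 successes in 100 trials, Ho: p = 0.5.
low, high = one_prop_ci(60, 100)
z, p_value = one_prop_z_test(60, 100, 0.5)

# Small made-up sample where the "+4" interval would be used instead.
pf_low, pf_high = one_prop_ci(3, 12, plus_four=True)

print(low, high, z, p_value)
print(pf_low, pf_high)
```

With these numbers the p-value falls below alpha = 0.05, so the conclusion would be that there is sufficient evidence against Ho at the 5% level.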
– Inference (2-sample proportion):
-C.I: if np̂ and n(1-p̂) are both ≥ 10 for each sample (if n ≥ 5 in each sample, use the “+4 method” for 2-sample proportion)
-Hypothesis test: if np̂ and n(1-p̂) are both ≥ 5 in each sample
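A sketch of two-sample proportion inference, again under textbook formulas that are assumed rather than taken from the notes: the test uses the pooled proportion in its standard error, and the interval uses the unpooled standard error. The counts are made up.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z test of Ho: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

def two_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = Z.inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return (diff - z_star * se, diff + z_star * se)

# Made-up example: 45 of 100 in group 1 versus 30 of 100 in group 2.
z, p_value = two_prop_z_test(45, 100, 30, 100)
low, high = two_prop_ci(45, 100, 30, 100)
print(z, p_value)
print(low, high)
```

Here the interval for p1 - p2 does not contain 0, which agrees with the small p-value from the test.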
Population proportion – 1 sample:
Parameter: p (0 ≤ p ≤ 1); Statistic: p̂
Population proportion – 2 samples:
Parameter: p1 – p2; Statistic: p̂1 – p̂2
Notes:
– With a (%) confidence level, we can state that the (population parameter) is between (lower limit) and (upper limit).
– There (is/isn’t) sufficient evidence at the ___ level of significance to conclude (translate Ha)
* Make a frequency table for proportions.
– There (is/isn’t) statistically significant evidence of a difference in the proportion of ___ compared to ____