Statistics Concepts: Observational Studies, Bias, and Inference

Statistics Concepts and Definitions

Ex. 1:

Observational study: No cause-effect; just associations. Five Number Summary = Min, Q1, Median, Q3, Max
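A quick illustration of the five-number summary. The data values and the function name `five_number_summary` are made up for this sketch. Quartile conventions differ between textbooks and software; this version assumes the "median of each half" rule.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data, chosen only for illustration
scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]
summary = five_number_summary(scores)
print(summary)
```

Other quartile rules (for example, interpolation as used by many software packages) can give slightly different Q1 and Q3 on the same data.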

Factors: Explanatory variables (x). Covariance: gives the direction (+ or –) of a linear relationship but not its strength.
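To see why covariance shows direction but not strength, the sketch below measures the same made-up relationship in two different units. The data and the helper names `covariance` and `correlation` are assumptions for illustration. The covariance changes size when the units change, while the correlation r does not.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation r: covariance rescaled to lie between -1 and 1."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: study time in hours, then the same times in minutes
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
minutes = [h * 60 for h in hours]

# Same sign, but the size depends on the units used
print(covariance(hours, score), covariance(minutes, score))
# r is unchanged by the change of units
print(correlation(hours, score), correlation(minutes, score))
```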

Block design: Individuals sharing the same characteristic are grouped into blocks, and treatments are assigned at random within each block.

SRS (Simple Random Sample); Stratified: Sample distinct groups separately then combine them. Sample survey: Cross-sectional; collect data of a population at one point in time.

Multistage: Using SRS within SRSs.
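The difference between an SRS and a stratified sample can be sketched as follows. The population, the strata names, and both function names are hypothetical; the only library call used is `random.Random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every set of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS inside each stratum, then combine the results."""
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, n_per_stratum))
    return combined

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical population of 60 students split into three class years
strata = {
    "first_year": [f"F{i}" for i in range(20)],
    "second_year": [f"S{i}" for i in range(20)],
    "third_year": [f"T{i}" for i in range(20)],
}
population = [person for members in strata.values() for person in members]

srs = simple_random_sample(population, 6, rng)
stratified = stratified_sample(strata, 2, rng)
print(srs)         # may over- or under-represent a class year by chance
print(stratified)  # always exactly 2 from each class year
```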

Bias: Undercoverage (selection bias – some groups are left out of the selection process); nonresponse (selected individuals cannot be reached or refuse to answer); response bias (caused by the behavior of the respondent or interviewer, or by question wording).

Case control: Retrospective: looking back into the past for exposure factors.

Cohort: Subjects sharing a defining characteristic are observed at regular intervals over an extended period of time. Ex: births. Longitudinal: involves comparison (exposed vs non) – variables over time.

– Prospective: subjects are checked at regular intervals going forward.

Quantitative data: Histograms and Dot Plots; Categorical data: Bar graph and pie charts; For 2 related variables: scatterplot

Discrete variables: Only specific values, countable in a specific amount of time. Continuous: Can take on an uncountable set of values. Ex: histograms – age, height, weight, temperature…

[symbol image] = standard dev.

[symbol image] = variance

[symbol image] = % of variation

[histogram image] = Left-skewed

[histogram image] = Right-skewed

Ex.2:

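Note: if a table gives the probability (or count) for A and the question asks for the probability of something given B, divide the table entry for A within B by the total for B: P(A | B) = P(A and B) / P(B).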
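A small worked example of reading a conditional probability off a two-way table. The counts, the category names, and the function `conditional_probability` are all invented for illustration.

```python
# Hypothetical two-way table of counts (rows: group, columns: outcome)
table = {
    "smoker":     {"disease": 30, "no_disease": 70},
    "non_smoker": {"disease": 20, "no_disease": 180},
}

def conditional_probability(table, row, column):
    """P(column | row) = count in the (row, column) cell / row total."""
    row_total = sum(table[row].values())
    return table[row][column] / row_total

p_disease_given_smoker = conditional_probability(table, "smoker", "disease")
p_disease_given_non_smoker = conditional_probability(table, "non_smoker", "disease")
print(p_disease_given_smoker, p_disease_given_non_smoker)
```

The denominator is the total of the "given" group only, not the grand total of the table.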


Discrete/Binomial Prob. Distributions: X = number of successes, fixed n, independent trials, yes/no outcomes, same probability p on each trial. When np >= 10 and n(1-p) >= 10, the binomial distribution is approximately normal.
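The rule of thumb above can be checked numerically. This sketch compares an exact binomial probability with its normal approximation; the values of n, p, and k are hypothetical, and the 0.5 continuity correction is an addition not mentioned in the notes.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k), summed exactly from the pmf."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def normal_approx_ok(n, p):
    """Rule of thumb from the notes: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

n, p, k = 100, 0.3, 25   # hypothetical values
exact = binomial_cdf(k, n, p)
# Normal approximation with mean np and sd sqrt(np(1-p)); 0.5 is a continuity correction
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k + 0.5)
print(normal_approx_ok(n, p), exact, approx)
```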


Sampling distributions:

Numerical (sample mean): [formula images]

Categorical (sample proportion): [formula images]

CLT (Central Limit Theorem): If n is large enough, the sampling distribution of the sample mean will be approximately normal, even when the population is not.

Law of large numbers: As n increases, the sample stats get closer to true parameters.
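Both ideas can be seen in a small simulation. This is only a sketch: the exponential population (mean 1, sd 1), the seed, and the sample sizes are arbitrary choices. It relies on the standard result that the sample mean has mean mu and standard deviation sigma / sqrt(n).

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed population (exponential, mean 1, sd 1)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

# Law of large numbers: a single sample mean drifts toward the true mean (1.0) as n grows
lln_means = {n: sample_mean(n) for n in (10, 1000, 100000)}
print(lln_means)

# Central limit theorem: many sample means of size n pile up around 1.0
# with spread close to sigma / sqrt(n), although the population is skewed
n = 40
means = [sample_mean(n) for _ in range(5000)]
print(mean(means), stdev(means), 1 / n**0.5)
```

Plotting `means` as a histogram would show a roughly bell-shaped curve.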


Inference:


If the confidence level rises (90% -> 95% -> 99%), M.E. (margin of error) increases

If n increases, M.E decreases
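Both margin-of-error rules can be checked with a short calculation. The sketch assumes a z interval for a mean with known population standard deviation, M.E. = z* · sigma / sqrt(n). The value sigma = 15 and the function name `margin_of_error` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """z* times sigma / sqrt(n) for a mean with known population sd sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)

sigma = 15  # hypothetical population standard deviation

# Higher confidence level -> larger margin of error (n fixed at 100)
for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(margin_of_error(sigma, 100, confidence), 2))

# Larger sample -> smaller margin of error (confidence fixed at 95%)
for n in (25, 100, 400):
    print(n, round(margin_of_error(sigma, n, 0.95), 2))
```

Because of the square root, the sample size must be quadrupled to cut the margin of error in half.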


If n increases, the p-value decreases (for the same observed difference)


Type I error: Rejecting H0 when it is true (probability = alpha)

Type II error: Failing to reject H0 when it is false (probability = beta)


Significance level (alpha): the probability threshold below which a result is considered unusual.

p-value: the probability, assuming H0 is true, of seeing evidence at least this strong against H0 by chance.
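The link between the p-value, alpha, and the decision can be sketched with a one-sample z test. This assumes the population standard deviation is known, which is a simplification. All numbers and the function name `z_test_p_value` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """p-value for a one-sample z test of H0: mu = mu0 (sigma assumed known)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if alternative == "greater":
        return 1 - cdf(z)
    if alternative == "less":
        return cdf(z)
    return 2 * (1 - cdf(abs(z)))

alpha = 0.05  # significance level = Type I error rate we are willing to accept

# Hypothetical numbers: the same observed mean with two different sample sizes
for n in (25, 100):
    p_value = z_test_p_value(sample_mean=103, mu0=100, sigma=15, n=n)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(n, round(p_value, 4), decision)
```

With the same observed difference, the larger sample gives the smaller p-value, so only the n = 100 case rejects H0 at alpha = 0.05.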

Inference (1 mean population):

Requirements: population normally distributed, random sample, representative sample (n > 40)
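For one mean with unknown population standard deviation, the usual tools are the t statistic and the t interval. The sketch below uses the standard formulas t = (x̄ − mu0) / (s / sqrt(n)) and x̄ ± t* · s / sqrt(n); the notes' own formula images were not recoverable. The data are made up, and t* is assumed to be looked up in a t table by the user (2.776 is the 95% value for 4 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)   # s / sqrt(n)
    return (mean(data) - mu0) / standard_error, n - 1

def t_interval(data, t_star):
    """Confidence interval x-bar +/- t* . s / sqrt(n); t* comes from a t table."""
    margin = t_star * stdev(data) / sqrt(len(data))
    return mean(data) - margin, mean(data) + margin

data = [1, 2, 3, 4, 5]               # hypothetical measurements
t_stat, df = one_sample_t(data, mu0=2)
low, high = t_interval(data, t_star=2.776)  # 95% confidence, df = 4
print(t_stat, df, low, high)
```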


Inference (2 mean population):



Population proportion (p): Requirements

-Large sample (z): random samples and normally distributed (n ≥ 30)

Inference (1-sample proportion):

-C.I: np̂ (successes) and n(1 - p̂) (failures) are both ≥ 15 (if n ≥ 10 with a 90% CI, use the "+4 method" for a 1-sample proportion)

-Hypothesis test: np̂ (successes) and n(1 - p̂) (failures) are both ≥ 10.
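A minimal sketch of choosing between the large-sample interval and the "+4 method" for one proportion. It assumes the standard interval p̂ ± z* · sqrt(p̂(1 − p̂)/n) and the standard plus-four adjustment (add 2 successes and 2 failures). It also reads the "+4" condition as "n at least 10 and confidence level at least 90%", which is an interpretation of the notes. The function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval for p, picking the method from the counts."""
    failures = n - successes
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    if successes >= 15 and failures >= 15:
        method, p_hat, size = "large-sample", successes / n, n
    elif n >= 10 and confidence >= 0.90:
        # plus four: add 2 imaginary successes and 2 imaginary failures
        method, p_hat, size = "plus-four", (successes + 2) / (n + 4), n + 4
    else:
        raise ValueError("conditions for both methods fail")
    margin = z_star * sqrt(p_hat * (1 - p_hat) / size)
    return method, p_hat - margin, p_hat + margin

large = proportion_ci(60, 200)   # counts large enough for the large-sample interval
small = proportion_ci(3, 20)     # too few successes, falls back to plus four
print(large)
print(small)
```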

Inference (2-sample proportion):

-C.I: np̂ and n(1 - p̂) are both ≥ 10 for each sample (if n ≥ 5 in each sample, use the "+4 method" for a 2-sample proportion)

-Hypothesis test: np̂ and n(1 - p̂) are both ≥ 5 in each sample
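For two proportions, the usual large-sample procedures are a z test with a pooled proportion and a confidence interval with an unpooled standard error. This sketch uses those standard formulas, since the notes' own formula images were not recoverable. The counts and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

# Hypothetical counts: 45 of 100 successes in group 1, 30 of 100 in group 2
z, p_value = two_proportion_z_test(45, 100, 30, 100)
low, high = two_proportion_ci(45, 100, 30, 100)
print(z, p_value)
print(low, high)
```

The test pools the two samples because H0 assumes a single common proportion; the interval does not make that assumption, so it keeps the two sample proportions separate.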

Population proportion – 1 sample:

Parameter: p (0 ≤ p̂ ≤ 1); Statistic: p̂

Population proportion – 2 samples:


Notes:

– With a (%) confidence level, we can state that the (population parameter) is between (lower limit) and (upper limit).

– There (is/isn’t) sufficient evidence at the ___ level of significance to conclude (translate Ha)

* Create a frequency table for proportions.

– There (is/isn’t) statistically significant evidence of a difference in the proportion of ___ compared to ____