Statistical Methods for Economists
Lecture 2 & 3
Hypothesis testing: Parametric and Non-parametric tests (in marketing and elsewhere)
•David Bartl
•Statistical Methods for Economists
•INM/BASTE

Outline of the lecture
•About statistical hypothesis testing
•PARAMETRIC TESTS
•t-tests for the means
•Two-sample F-test for the equality of variances
•NON-PARAMETRIC TESTS
•Sign test for the median
•Pearson’s χ²-test for the goodness of fit
•χ²-test of independence of qualitative data items

About statistical hypothesis testing
•The general outline of a statistical hypothesis test
•The p-value of a test
•Parametric and Non-parametric tests

The general outline of a statistical hypothesis test

The p-value of the test

Parametric and Non-parametric tests
•There are two large classes of statistical tests: parametric and non-parametric.
•The parametric tests make assumptions about the probability distributions of the random variables that are subject to the test. It is often assumed that the underlying distribution is normal (Gaussian).
•The non-parametric tests do not make such assumptions. They can be used if the random variables are not normally distributed.
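As a minimal illustration of the outline of a test and of its p-value, the sketch below runs an exact binomial test of H0: "the success probability is 0.5", which makes no normality assumption. The data (15 successes in 20 trials), the level alpha = 0.05, the rule "reject when p-value < alpha" and the function name are assumptions chosen for the example, not taken from the lecture.

```python
from math import comb

def binomial_two_sided_p_value(successes: int, trials: int) -> float:
    """Exact two-sided p-value for H0: success probability = 0.5."""
    # Under H0 the count is Binomial(trials, 0.5), which is symmetric,
    # so the two-sided p-value is twice the smaller tail (capped at 1).
    k = min(successes, trials - successes)
    smaller_tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    return min(1.0, 2.0 * smaller_tail)

# Hypothetical data: 15 "successes" out of 20 observations.
alpha = 0.05                      # chosen significance level (assumption)
p_value = binomial_two_sided_p_value(successes=15, trials=20)
reject_h0 = p_value < alpha       # decision rule used in this sketch

print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
# prints: p-value = 0.0414, reject H0: True
```

The same calculation underlies the sign (binomial) test for the median listed in the outline, where a "success" would be an observation above the hypothesized median.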
PARAMETRIC TESTS
•t-tests for the means
•Two-sample F-test for the equality of variances

t-tests for the means
•One-sample t-test for the population mean
•Paired-sample t-test for the difference of the population means
•Two-sample t-test for the difference of the population means

One-sample t-test for the population mean
Theorem – Corollary – Proof:

The gamma function
The gamma function – another definition (due to Euler)

Type I and Type II error
Type I and Type II error: Summary

Paired-sample t-test for the difference of the population means
•We have thus reduced the paired-sample t-test for the difference of the population means to the one-sample t-test for the population mean, which we already know.
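The reduction described above can be sketched as follows: form the pairwise differences and run a one-sample t-test on them. This is a minimal sketch using only the standard library. The data, the function names and the numerical integration (Simpson's rule) used for the p-value are assumptions for illustration. The density of Student's t-distribution is written with the gamma function, and because `math.gamma` overflows for large arguments, the sketch is meant for small samples only.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p_value(t: float, df: float, steps: int = 2000) -> float:
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total          # approximates P(-a < T < a)
    return max(0.0, 1.0 - central)

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1

def paired_t(x, y, delta0=0.0):
    """Paired-sample t-test = one-sample t-test on the differences x_i - y_i."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    return one_sample_t(diffs, delta0)

# Hypothetical paired measurements (e.g. before / after on the same units).
before = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
after = [11.6, 11.0, 12.8, 12.0, 11.5, 12.4]

t_stat, df = paired_t(before, after)
p_value = t_two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p-value = {p_value:.4f}")
```

For a one-sided alternative, half of the two-sided p-value would be used when the sign of the statistic agrees with the alternative.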
Two-sample t-test for the difference of the population means // σX=σY

Two-sample t-test for the difference of the population means // σX≠σY
Theorem (Satterthwaite’s approximation):

•Exercise:
•Use the last Theorem (Satterthwaite’s approximation) to formulate a statistical two-sample t-test for the difference of the population means with a two-sided / one-sided alternative hypothesis (not assuming the same variance).
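One possible numerical starting point for the exercise is sketched below. It computes the usual unequal-variance (Welch) statistic together with the degrees of freedom from the commonly stated Welch-Satterthwaite formula. The formula is the standard textbook version and is assumed to match the Theorem above. The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y, delta0=0.0):
    """Two-sample t statistic without assuming equal variances.

    Returns (t, df), where df is the Welch-Satterthwaite approximation
    to the degrees of freedom (generally not an integer).
    """
    n, m = len(x), len(y)
    vx = variance(x) / n            # estimated variance of the mean of x
    vy = variance(y) / m            # estimated variance of the mean of y
    t = (mean(x) - mean(y) - delta0) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, df

# Hypothetical samples with visibly different spread.
x = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3]
y = [18.0, 21.5, 19.2, 22.4, 17.6]

t_stat, df = welch_t(x, y)
print(f"t = {t_stat:.3f}, approximate df = {df:.2f}")
```

The statistic would then be compared with quantiles of the t-distribution with the approximate degrees of freedom: both tails for the two-sided alternative, one tail for a one-sided alternative.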
Two-sample F-test for the equality of variances
Motivation
Theorem

NON-PARAMETRIC TESTS
•Sign test for the median
•Pearson’s χ²-test for the goodness of fit
•χ²-test of independence of qualitative data items

Sign test for the median
•Sign test for the median
•Paired sign test for the difference of the medians

Sign test for the median
Sign (binomial) test for the median
Sign (z-) test for the median
Paired sign test for the difference of the medians

χ²-test for goodness of fit
•Pearson’s χ²-test for the goodness of fit

Pearson’s χ²-test for the goodness of fit
Example: Tests for population proportion

χ²-test of independence of qualitative data items
•χ²-test of independence of qualitative data items

Contingency table
•the observed counts of the combinations of the categories Ai & Bj for i=1,…,r and j=1,…,s
•marginal totals (row totals and column totals)
•the grand total

2 × 2 contingency table
•The 2 × 2 contingency table is popular.
•It is a contingency table with r=2 rows and s=2 columns.
•the observed counts of the combinations of the categories Ai & Bj for i=1,2 and j=1,2
•marginal totals (row totals and column totals)
•the grand total

χ²-test of independence of qualitative data items
•Having all the observed counts of the combinations of the categories Ai & Bj summarized in the contingency table for i=1,…,r and for j=1,…,s, we ask whether the category of the data item (variable) B depends upon the category of the data item (variable) A, or whether the categories of both data items (variables) A and B are independent of each other.
•Assume therefore the null hypothesis H0:
•the categories of both data items (variables) A and B are independent of each other

χ²-test of independence of qualitative data items
•Having all the observed counts of the combinations of the categories Ai & Bj summarized in the contingency table for i=1,…,r and for j=1,…,s, assume the null hypothesis H0 that the categories of both data items (variables) A and B are independent of each other.
•Now, if we choose a data unit randomly:
•What is the probability that the data item A of the chosen data unit is of category Ai for some i=1,…,r?
•What is the probability that the data item B of the chosen data unit is of category Bj for some j=1,…,s?
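A common way to answer the two questions above is to estimate each probability by the corresponding marginal total divided by the grand total. Under H0 the expected count of the combination Ai & Bj is then (row total) × (column total) / (grand total), and the observed counts are compared with the expected ones by Pearson's statistic. The sketch below follows this standard construction. The 2 × 2 table, the variable names and the level 0.05 are assumptions made for illustration.

```python
def chi2_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x s table.

    `table` is a list of r rows, each a list of s observed counts.
    """
    r, s = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(s)]
    n = sum(row_totals)                      # the grand total
    stat = 0.0
    for i in range(r):
        for j in range(s):
            # Expected count under independence: n * (row/n) * (col/n).
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, (r - 1) * (s - 1)

# Hypothetical 2 x 2 contingency table of observed counts.
observed = [[30, 20],
            [20, 30]]

stat, df = chi2_independence(observed)
critical_value = 3.841       # 0.95 quantile of chi-square with 1 degree of freedom
reject_h0 = stat > critical_value

print(f"chi2 = {stat:.3f}, df = {df}, reject H0 at 0.05: {reject_h0}")
# prints: chi2 = 4.000, df = 1, reject H0 at 0.05: True
```

The sketch assumes every expected count is positive. The chi-square approximation is usually considered reliable only when the expected counts are not too small (a frequently quoted rule of thumb is at least 5 in each cell).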