Chi-squared goodness of fit test We want to test whether there's a significant difference in the preferences of people for three different flavours of ice cream: chocolate, vanilla, and strawberry. We'll collect data from a sample of 200 individuals and record their preferences. Here's hypothetical data: Chocolate: 80 people; Vanilla: 60 people; Strawberry: 60 people Now, we want to test whether these preferences are significantly different from what we would expect if there were no preference (i.e., if people were equally likely to choose any flavour). Null hypothesis (H0): there is no difference in preference, meaning each flavour is equally likely to be chosen. Alternative hypothesis (H1): there is a difference in preference, meaning at least one flavour is more preferred than the others. We'll use the chi-squared test to analyse this data. First, we need to calculate the expected frequencies under the assumption of no preference. Since there are three flavours and 200 people, the expected frequency for each flavour is 200/3 = 66.67. Expected frequencies: Chocolate: 66.67; Vanilla: 66.67; Strawberry: 66.67 Now, we calculate the chi-squared statistic: χ² = Σ ((Observed frequency - Expected frequency)² / Expected frequency) For each flavour: Chocolate: (80 - 66.67)² / 66.67 ≈ 2.66; Vanilla: (60 - 66.67)² / 66.67 ≈ 0.67; Strawberry: (60 - 66.67)² / 66.67 ≈ 0.67 Summing these values, we get: χ² ≈ 2.66 + 0.67 + 0.67 ≈ 4 Now, we need to compare this value to the critical value from the chi-squared distribution table with (3-1) = 2 degrees of freedom (since there are 3 categories). Assuming a significance level (α) of 0.05, we find that the critical value is approximately 5.99. CHISQ.INV(0.95,2) Since our calculated χ² value (4) is less than the critical value (5.99), we fail to reject the null hypothesis. This means that we do not have sufficient evidence to conclude that there is a significant difference in preferences for the three flavours of ice cream. Here are some examples where you can use the chi-squared goodness of fit test to test hypotheses: 1. Dice Fairness: Hypothesis: Are the outcomes of a fair six-sided die statistically consistent with the expected probabilities? Assume a significance level (α) of 0.05. Data Collection: Roll a fair six-sided die a large number of times and record the frequencies of each outcome (1 through 6). Number Frequency 1 80 2 120 3 70 4 130 5 110 6 90 Null Hypothesis (H0): The observed frequencies match the expected probabilities of each outcome (1/6 for each). Alternative Hypothesis (H1): The observed frequencies do not match the expected probabilities. 2. Marbles in a Bag: Hypothesis: Do the observed frequencies of different coloured marbles in a bag match the expected frequencies based on a specified distribution? Assume a significance level (α) of 0.01. Data Collection: Randomly sample a large number of marbles from a bag and record the frequencies of each colour. Colour Observed frequency Expected frequency Red 30 35% Green 20 20% Blue 50 35% Black 10 10% Null Hypothesis (H0): The observed frequencies match the expected frequencies based on the specified distribution. Alternative Hypothesis (H1): The observed frequencies do not match the expected frequencies. In each of these examples, you would define the expected frequencies based on the null hypothesis, calculate the chi-squared statistic, and compare it to the critical value from the chi-squared distribution to make a conclusion about the goodness of fit of the observed data to the expected distribution. Chi-squared test for independence Consider an example where we want to determine if there's an association between smoking habits and gender among a group of individuals. We'll collect data from a sample of 500 people and record whether they are smokers or non-smokers and their gender. Here's hypothetical data: Among 250 males, 100 are smokers and 150 are non-smokers. Among 250 females, 50 are smokers and 200 are non-smokers. We want to test whether smoking habits are independent of gender. Null hypothesis (H0): Smoking habits are independent of gender. Alternative hypothesis (H1): Smoking habits are dependent on gender. We'll use the chi-squared test for independence to analyse this data. First, let's create a contingency table: SMOKERS NON-SMOKERS TOTAL MALE 100 150 FEMALE 50 200 TOTAL Now, we'll calculate the expected frequencies assuming independence: - Expected frequency for male smokers: (250 * 150) / 500 = 75 - Expected frequency for male non-smokers: (250 * 350) / 500 = 175 - Expected frequency for female smokers: (250 * 150) / 500 = 75 - Expected frequency for female non-smokers: (250 * 350) / 500 = 175 Next, we'll calculate the chi-squared statistic: χ² = Σ ((Observed frequency - Expected frequency)² / Expected frequency) For each cell: (100 - 75)² / 75 ≈ 8.33; (150 - 175)² / 175 ≈ 3.57; (50 - 75)² / 75 ≈ 8.33; (200 - 175)² / 175 ≈ 3.57 Summing these values, we get: χ² ≈ 8.33 + 3.57 + 8.33 + 3.57 ≈ 23.8 Now, we need to compare this value to the critical value from the chi-squared distribution table with (2-1)(2-1) = 1 degree of freedom (since there are 2 categories for both smoking habit and gender). Assuming a significance level (α) of 0.05, the critical value is approximately 3.84. CHISQ.INV(0.95, 1) = 3.84 Since our calculated χ² value (23.8) is greater than the critical value (3.84), we reject the null hypothesis. This indicates that there is a significant association between smoking habits and gender among the population. Here are a few more examples where you can apply the chi-squared test for independence to test hypotheses: 1. Educational Attainment and Employment Status: Hypothesis: Is there a relationship between educational attainment (e.g., high school diploma, bachelor's degree, master's degree) and employment status (e.g., employed, unemployed, student)? Assume a significance level (α) of 0.05. Data Collection: Survey a sample of individuals and record their educational attainment and current employment status. EMPLOYED UNEMPLOYED STUDENT TOTAL HIGH 100 50 70 BACHELOR 120 40 50 MASTER 80 20 30 TOTAL Null Hypothesis (H0): Educational attainment and employment status are independent. Alternative Hypothesis (H1): Educational attainment and employment status are dependent. 2. Customer Satisfaction and Product Type: Hypothesis: Is there an association between customer satisfaction (e.g., satisfied, neutral, dissatisfied) and the type of product purchased (e.g., electronics, clothing, food)? Assume a significance level (α) of 0.01. Data Collection: Gather feedback from customers who purchased different types of products and record their satisfaction levels. ELECTRONICS CLOTHING FOOD TOTAL SATISFIED 50 40 30 NEUTRAL 40 30 10 DISSATISFIED 30 20 10 TOTAL Null Hypothesis (H0): Customer satisfaction and product type are independent. Alternative Hypothesis (H1): Customer satisfaction and product type are dependent. 3. Preferred Social Media Platform and Age Group: Hypothesis: Is there an association between preferred social media platform (e.g., Facebook, Instagram) and age group (e.g., teenagers, young adults, middle-aged adults)? Assume a significance level (α) of 0.05. Data Collection: Survey a sample of individuals across different age groups and record their preferred social media platforms. TEENAGERS YOUNG ADULTS OTHERS TOTAL FACEBOOK 25 50 60 INSTAGRAM 60 30 35 TOTAL Null Hypothesis (H0): Preferred social media platform and age group are independent. Alternative Hypothesis (H1): Preferred social media platform and age group are dependent. In each of these examples, you would collect data, create a contingency table, calculate expected frequencies assuming independence, compute the chi-squared statistic, and compare it to the critical value from the chi-squared distribution to make a conclusion about the relationship between the variables.