Statistical Methods for Economists
Lecture 1
Basic Statistical Concepts and Data Characteristics
• David Bartl
• Statistical Methods for Economists
• INM/BASTE

Outline of the lecture
• Reading list
• Measures of central tendency (arithmetic mean, mode, median)
• Measures of variability (range, variance, coefficient of variation)
• Measures of data concentration (skewness, kurtosis)
• Moment characteristics
• Two statistical variables
• SUPPLEMENT: The expected values of the functions of random variables

Reading list
• Compulsory:
• TOŠENOVSKÝ, Filip: Statistical Methods for Economists. Karviná: SU OPF, 2014. ISBN 978-80-7510-033-7
• ANDERSON, David R., SWEENEY, Dennis J., WILLIAMS, Thomas A., FREEMAN, James, SHOESMITH, Eddie: Statistics for Business and Economics. 4th Edition. Cengage Learning, 2017. ISBN 978-1-4737-2656-7
• KELLER, Gerald: Statistics for Management and Economics. 11th Edition. Cengage Learning, 2017. ISBN 978-1-337-09345-3

Reading list
• Free Online Textbooks:
• Many textbooks on statistics and other disciplines can be found at https://freetextbook.org/
• Online Statistics Education: An Interactive Multimedia Course of Study http://onlinestatbook.com/
• The Electronic Statistics Textbook by StatSoft, Inc. (2013) www.statsoft.com/textbook
• The printed version of the latter textbook: HILL, T. & LEWICKI, P. (2007). STATISTICS: Methods and Applications. StatSoft, Tulsa, OK.

Reading list
• Recommended I:
• SIEGEL, Andrew: Practical Business Statistics. 7th Edition. Academic Press, 2016. ISBN 978-0-12-804250-2
• ÖZDEMIR, Durmuş: Applied Statistics for Economics and Business. 2nd Edition. Springer, 2016. ISBN 978-3-319-26495-0 (hardcover). ISBN 978-3-319-79962-9 (softcover).
• UBØE, Jan: Introductory Statistics for Business and Economics: Theory, Exercises and Solutions. 1st Edition. Springer, 2017. ISBN 978-3-319-70935-2 (hardcover). ISBN 978-3-319-89016-6 (softcover).

Reading list
• Recommended II:
• QUIRK, Thomas: Excel 2016 for Business Statistics: A Guide to Solving Practical Problems. 1st Edition. Springer, 2016. ISBN 978-3-319-38958-5 (softcover).
• HERKENHOFF, Linda, FOGLI, John: Applied Statistics for Business and Management using Microsoft Excel. 1st Edition. Springer, 2013. ISBN 978-1-4614-8422-6 (softcover).

Reading list
• Optional:
• DANIEL, W. W., TERREL, J.: Business Statistics for Management and Economics. Houghton Mifflin, 1995. ISBN 0-395-73717-6
• WOOLDRIDGE, J. M.: Introductory Econometrics: A Modern Approach. Mason, OH: Thomson/South-Western, 2006. ISBN 0-324-28978-2
• VAN MATRE, J. G., GILBREATH, G. H.: Statistics for Business and Economics. BPI/IRWIN, Homewood, 1997. ISBN 0-256-03719-1

Basic Statistical Concepts
• Data, data unit, data item, observation, dataset
• Population, sample, data item
• Population & Sample

Data, data unit, data item, observation, dataset
• Data (plural): measurements and observations
• Data unit: one entity (e.g. a person) in the population under study, about which the data are collected
• Data item: a characteristic (an attribute) of a data unit (e.g. the date of birth, gender, income, …), also called a variable
• Observation: an occurrence of a specific data item recorded about a data unit, also called a datum (singular of “data”)
• Dataset: a complete collection of all observations

Population, sample, data item
• Population: a collection of all data units of the same specification
• Sample: a selected subset of the population
• Data item: a property or an attribute of a data unit of the population
• Data items (statistical variables) are:
  - qualitative (categorical), such as gender, colour, taste, satisfaction
  - quantitative (numerical), such as revenue, price, number of customers

Population & Sample
• Assume that we have a set (i.e. a “population”) of values of some phenomenon that we observe, measure, or study.
In practice, this set may be very large (e.g. some data item whose data units are all the people living on Earth) and thus unknown to us. Another example is the set of all results of some experiment, including the runs that have not been carried out yet.
• Assume, however, that the set exists (at least in theory) and that it is finite (for simplicity).

Measures of central tendency
• Arithmetic mean
• Mode
• Median
• Frequencies of occurrence
• Weighted arithmetic mean

Measures of central tendency
• Assume that a variable (data item) is numerical, i.e. quantitative, whether discrete or continuous. We then consider several measures of central tendency of the variable:
  - Arithmetic mean
  - Mode
  - Median

Arithmetic mean

Median & Mode

Sample Mean / Median / Mode in Excel
• In Excel, use the functions:
• =AVERAGEA() to calculate the sample arithmetic mean
• =MEDIAN() to find the sample median
• =MODE.SNGL() to find one of the sample modes
• =MODE.MULT() to find all of the sample modes (array function; press Ctrl+Shift+Enter)
• =MODE() to find one of the sample modes (the same as =MODE.SNGL(), deprecated)

Frequencies of occurrence

Arithmetic mean

Example: Employees (a sample of the Dataset)

ID    Gender  Age  Marital Status  Education           Position                Salary per Year  Evaluation
5060  M       65   divorced        secondary           worker                  258800           4
1030  M       60   divorced        university          manager                 630000           2
3049  M       60   married         primary             operator                436600           5
5047  M       60   widowed         primary+vocational  worker                  240600           3
5061  M       60   widowed         primary+vocational  worker                  241800           1
5087  M       60   widowed         secondary           worker                  239500           -
5133  F       60   married         secondary           worker                  241100           4
5177  F       60   widowed         secondary           worker                  239600           4
3030  F       58   widowed         primary             operator                422600           1
3014  F       56   widowed         university          operator                303600           3
5012  F       56   widowed         primary+vocational  worker                  223100           4
5056  M       56   divorced        primary             worker                  225200           5
5101  M       56   unmarried       primary+vocational  worker                  224600           4
5106  M       56   married         primary+vocational  worker                  226100           7
5146  F       56   married         primary+vocational  worker                  224900           3
5153  M       56   divorced        secondary           worker                  224500           4
5189  M       56   married         primary+vocational  worker                  224600           1
5196  M       56   widowed         primary+vocational  worker                  222800           3
1031  M       55   married         university          manager                 429000           -
5016  M       55   divorced        secondary           administrative officer  259000           5
5021  F       55   married         primary+vocational  worker                  220200           -
5062  F       55   widowed         primary+vocational  worker                  221400           5
5107  M       55   divorced        primary+vocational  worker                  220500           4
5154  F       55   widowed         primary+vocational  worker                  219200           5
5195  M       55   married         primary+vocational  worker                  219400           6

Example: Employees, data item “Age”

Measures of variability
• Range
• Variance (dispersion)
• Coefficient of variation

Measures of variability
• Assume that a variable (data item) is numerical, i.e. quantitative, whether discrete or continuous. We then consider several measures of variability of the variable:
  - Range
  - Variance (dispersion)
  - Coefficient of variation

Range

Variance (dispersion)

Standard deviation

Variance (dispersion) & Standard deviation

Sample Variance / Standard deviation
• In Excel, use the functions:
• =VARA() to calculate the sample variance
• =STDEVA() to calculate the sample standard deviation
• =VAR.S() to calculate the sample variance (skipping text values)
• =VAR() to calculate the sample variance (skipping text values; the same as =VAR.S(), deprecated)

Population Variance / Standard deviation
• In Excel, use the functions:
• =VARPA() to calculate the population variance
• =STDEVPA() to calculate the population standard deviation
• =VAR.P() to calculate the population variance (skipping text values)

Coefficient of variation

Example

Measures of data concentration
• Skewness
• Kurtosis

Measures of data concentration
• Assume that a variable (data item) is numerical, i.e. quantitative, whether discrete or continuous. We then consider several measures of data concentration of the variable:
  - Skewness
  - Kurtosis

Skewness: Pearson’s moment coefficient of skewness

Skewness: Properties and interpretation

Skewness in Excel
• In Excel, use the functions:
• =SKEW.P() to calculate the population skewness
• =SKEW() to calculate the sample skewness

Kurtosis: Pearson’s moment coefficient of kurtosis

Kurtosis: Properties and interpretation

Excess kurtosis

Kurtosis in Excel
• In Excel, use the function:
• =KURT() to calculate the sample excess kurtosis

Moment characteristics
• Raw moments
• Central moments
• Standardized moments

The moments

Raw moment

Central moment

Central moment and raw moments

Central moment

Two statistical variables
• Two populations
• Contingency table
• Covariance
• Correlation coefficient

Two populations

Two populations: Joint frequencies

Two populations: Marginal frequencies

Contingency table for the population
[Table labels: marginal frequencies (rows and columns), the population size]

Two samples

Two samples: Joint frequencies

Two samples: Marginal frequencies

Contingency table for the sample
[Table labels: marginal frequencies (rows and columns), the sample size]

Arithmetic means

Variances

Covariance

Pearson paired correlation coefficient

The expected values of the functions of random variables
• The expected value of the sample mean
• Independent events
• Independent random variables
• The variance of the sample mean
• The expected value of the sample variance

The expected value of the sample mean

Independent events

Independent random variables

Independent random variables: Theorem

Independent random variables: E(XY) = E(X) E(Y)

Independent random variables: Theorem II

The variance of the sample mean

The expected value of the sample variance

Alternative formula for sample variance
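
The measures named in this lecture can be tried out on the data item “Age” from the Employees sample. The following is a minimal sketch in Python, not part of the original slides; it assumes the textbook definitions (population variance divides by n, sample variance by n - 1, coefficient of variation is the standard deviation over the mean, skewness is the third central moment over the 3/2 power of the second). All variable names are illustrative.

```python
import statistics
from collections import Counter

# Ages of the 25 employees in the sample table (data item "Age").
ages = [65] + [60] * 7 + [58] + [56] * 9 + [55] * 7
n = len(ages)

# Measures of central tendency.
mean = statistics.fmean(ages)
median = statistics.median(ages)
modes = statistics.multimode(ages)  # all most frequent values

# Weighted arithmetic mean computed from the frequencies of occurrence.
freq = Counter(ages)
weighted_mean = sum(value * count for value, count in freq.items()) / n

# Measures of variability.
data_range = max(ages) - min(ages)
pop_var = statistics.pvariance(ages)    # divides by n
sample_var = statistics.variance(ages)  # divides by n - 1
sample_sd = statistics.stdev(ages)
coef_var = sample_sd / mean             # assumed definition: s / mean

# Alternative formula for the sample variance:
# (sum of squares - n * mean^2) / (n - 1).
alt_sample_var = (sum(x * x for x in ages) - n * mean ** 2) / (n - 1)

# Population moment coefficient of skewness from central moments.
m2 = sum((x - mean) ** 2 for x in ages) / n
m3 = sum((x - mean) ** 3 for x in ages) / n
skew_pop = m3 / m2 ** 1.5

print("n =", n)
print("mean =", mean, "median =", median, "modes =", modes)
print("range =", data_range)
print("population variance =", pop_var)
print("sample variance =", sample_var)
print("coefficient of variation =", coef_var)
print("population skewness =", skew_pop)
```

For this sample the mean is 57.28 and both the median and the mode are 56; the mean lying above the median agrees with the positive skewness. Because the Age column contains no text values, the Excel functions that count text (such as =VARA()) and those that skip it (such as =VAR.S()) should give the same result here.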