Linear Regression Analysis

Assumptions and Applications

John P. Hoffmann and Kevin Shafer

978-0-87101-457-3. 2015. Item #4573. 240 pages.

*Linear Regression Analysis: Assumptions and Applications* is designed to provide students with a straightforward introduction to a commonly used statistical model that is appropriate for making sense of data with a continuous dependent variable and multiple explanatory variables. Using a relatively simple approach that has been proven through several years of classroom use, this text will allow students with little mathematical background to understand and apply the most commonly used quantitative regression model in a wide variety of research settings. Instructors will find that its well-written and engaging style, numerous examples, and chapter exercises provide essential material that complements classroom work.

*Linear Regression Analysis* may also be used as a self-teaching guide by researchers who require general guidance or specific advice regarding regression models, by policymakers who are tasked with interpreting and applying research findings that are derived from regression models, and by those who need a quick reference or a handy guide to linear regression analysis.

Social work and other social and behavioral science students and researchers need to have a suite of research tools to conduct studies. Regression analysis is a popular tool that is used in numerous studies to examine statistical relationships among variables. Yet there are few books that offer straightforward and easy-to-follow instruction regarding this type of analysis. Most books rely too much on mathematical and symbolic representations of regression analysis, even though many students do not have a sufficient background in mathematics and are often put off by the high level of sophistication required to master these techniques. This book offers a conceptual and software-driven approach to understanding linear regression analysis, with only a slight familiarity with algebra required even for self-study. Students and researchers will find this to be an accessible, yet thorough, introduction to the linear regression model.

Note: eBooks may be purchased online in single quantities only. To purchase multiple eBook copies, please contact naswpress@brightkey.net.

About the Authors

Preface

Acknowledgments

**Chapter 1:** A Review of Some Elementary Statistical Concepts

**Chapter 2:** Simple Linear Regression

**Chapter 3:** Multiple Linear Regression Analysis

**Chapter 4:** The ANOVA Table and Goodness-of-Fit

**Chapter 5:** Comparing Linear Regression Models

**Chapter 6:** Dummy Variables in Linear Regression Models

**Chapter 7:** Specification Errors in Linear Regression Models

**Chapter 8:** Measurement Errors in Linear Regression Models

**Chapter 9:** Collinearity and Multicollinearity

**Chapter 10:** Nonlinear Associations and Interaction Terms

**Chapter 11:** Heteroscedasticity and Autocorrelation

**Chapter 12:** Influential Observations: Leverage Points and Outliers

**Chapter 13:** An Introduction to Logistic Regression

**Chapter 14:** An Introduction to Multilevel Models

**Chapter 15:** Conclusion

References

Index

We live in a world where we are surrounded by data. Studies are highlighted daily in newspapers and magazines, on television, and online. We are constantly shown graphs, charts, and statistics about life expectancy, crime, pollution, unemployment, life satisfaction, elections, and other phenomena. As a social worker, you will likely encounter data frequently – standardized assessment scores, research studies, and new information as you obtain continuing education units. Understanding statistics, or at least speaking intelligently about them, is practically mandatory for well-educated people and good social workers. Yet very few people have a good grasp of the strengths and weaknesses of data collection and analysis. What does it mean to say that life expectancy in the United States is 75.8 years? Should we trust exit polls in the latest elections? When someone claims that "taking calcium supplements is not associated with lower risk of bone fractures in elderly women," what does this actually mean? Is it really meaningful for a social work researcher to suggest that a one-year increase in education is associated with a half-point decrease on a depression scale? These questions, and many others, are common in contemporary statistical analysis.

For budding social scientists, whether they are sociologists, psychologists, or social work researchers, it is virtually impossible to avoid sophisticated quantitative analyses. Such research moves well beyond simple statistics such as means, medians, modes, and standard deviations. In fact, the majority of studies found in social science (including social work) journals use sophisticated statistical models designed to predict one variable from information about another. The most common type of model designed to make such a prediction is the linear regression model – which is the focus of this book.

One can make a quick search anywhere and find numerous books and articles discussing the linear regression model. Students are usually exposed to this model in their second statistics course for several reasons: (1) it is relatively easy to understand; (2) statistical software designed to estimate it is widely available; (3) it is a highly flexible method; and (4) it really is the backbone of virtually all statistical analysis. Despite these very positive aspects, there are some weaknesses as well – particularly related to the misuse of the model. For instance, linear regression does not work well when two or more of the explanatory variables have high correlations (chapter 9) or if one of the data points is different from all others (chapter 12). Therefore, one of the goals of this book is to provide a relatively painless overview of the assumptions researchers and statisticians make when using the linear regression model.
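The second weakness mentioned above, a single data point that differs from all others, can be illustrated with a small sketch. The code below is not taken from the book, which works in Stata and SPSS; it is a minimal Python illustration, and the data, variable names, and the helper `ols_slope_intercept` are all hypothetical.

```python
def ols_slope_intercept(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of education and a depression-scale score.
# The scores fall exactly half a point for each additional year of education.
education = [8, 10, 12, 14, 16]
depression = [14.0, 13.0, 12.0, 11.0, 10.0]
slope_clean, intercept_clean = ols_slope_intercept(education, depression)

# Add one made-up observation that is very different from the others.
education_out = education + [20]
depression_out = depression + [20.0]
slope_out, intercept_out = ols_slope_intercept(education_out, depression_out)

print(f"slope without the unusual point: {slope_clean:.3f}")
print(f"slope with the unusual point:    {slope_out:.3f}")
```

With these made-up numbers, one added observation changes the estimated slope from -0.5 to roughly 0.36, so the apparent direction of the association reverses. Real data are rarely this extreme, but the sketch suggests why influential observations deserve attention.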

However, the main purpose of this book is to show the reader how to use linear regression models in studies using quantitative data. We do so by discussing (1) why regression models are used; (2) what they tell us about the relationships between two or more variables; (3) what the assumptions of the model are and how to determine whether they are satisfied; (4) what to do when assumptions are not satisfied; and (5) what to do when the outcome variable is measured using only two categories. In chapter 13, we explain that linear regression models are poorly suited to this last situation, which leads us to the logistic regression model.
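One reason a linear regression model can struggle with a two-category outcome is that its predictions are not confined to the range between 0 and 1. The following Python sketch is an illustration under assumed, made-up data rather than an example from the book; the helper `fit_line` and all variable names are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: an explanatory variable and an outcome coded 0 or 1.
x = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_line(x, outcome)

# Predicted values at the smallest and largest observed x.
predicted_low = intercept + slope * min(x)
predicted_high = intercept + slope * max(x)

print(f"prediction at x = {min(x)}: {predicted_low:.3f}")
print(f"prediction at x = {max(x)}: {predicted_high:.3f}")
```

In this sketch the fitted line predicts a value below 0 at the smallest x and above 1 at the largest, neither of which can be read as a probability. A logistic regression model avoids this by keeping predicted probabilities between 0 and 1.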

The coauthors of this book have substantial experience teaching statistics and have seen some students struggle with these methods, but many more succeed. We know that having a basic foundation of elementary statistics is critical to understanding the linear regression model. As a result, the first chapter of the book covers some elementary statistics. However, this chapter is meant as a review, not as a substitute for a full class on such methods. We recommend that you review a basic statistics textbook if you are unfamiliar with means, medians, standard deviations, z scores, t tests, correlation, or analysis of variance. We also suggest that you take some time to learn either Stata or SPSS, statistical software packages designed to carry out the analyses covered here. Although SPSS may be the more commonly used of these programs, you will notice that Stata is often more flexible for the methods we show you in this book. If you are an SPSS user, we have noted throughout the book how to use the program to analyze the data using the drop-down menu approach. SPSS also uses syntax – however, this method is not very efficient. In contrast, we highly recommend using syntax in Stata and do-files, which are simple programs of instructions for Stata. Many books and websites will introduce you to both programs, although we strongly recommend the website maintained by UCLA at www.ats.ucla.edu/stat/. One further point for Stata users is that we rely on the command line approach in this book, but we highly encourage you to write out these commands in a do-file so that you have a record of the commands used. Using Stata’s log files is also strongly recommended.

The chapters follow the typical format for books on linear regression. We begin with elementary statistics, followed by a discussion of the simple linear regression model and regressions with multiple variables. Next, we will learn about goodness-of-fit, comparison of models, and dummy variables. This will be followed by a discussion of linear regression assumptions, such as multicollinearity, heteroscedasticity, autocorrelation, and influential observations. Finally, we provide a brief overview of the logistic regression model.

We hope that you, the reader, will keep one thing in mind as you read this book. Statistics are often maligned by observers. Consider, for example, the book *How to Lie with Statistics* (Huff, 1954), which fosters distrust of statistical analyses. Sometimes social workers feel that social work is more art than science. Although we believe that social work is an art, it is also a science. It is important for social work researchers and clinicians to acknowledge that knowing what the data do and do not support is important for the field. Of course, you should have a healthy dose of skepticism and use some common sense and imagination when doing statistical analysis. We hope that you will feel comfortable using your imagination and reasoning skills as you cover the material and do your own quantitative work. Statistical analysis has led to many important discoveries in medicine, social work, and other social sciences. It has also informed policy decisions. Both clinical and macro social workers should be keenly aware of the power of statistical methods while also acknowledging their weaknesses.

We use both Stata and SPSS in our presentation. In the first few chapters, we present the output from both programs to increase your comfort level with what the programs look like. In the latter chapters, we give one set of results. We list Stata commands and, in a few instances, SPSS syntax, in Courier New font. When we discuss using the drop-down menus in SPSS, we provide those directions by using italicized font. Key words and phrases are also italicized for emphasis, and we provide a list of key terms and concepts at the end of each chapter. We provide you with a few exercises to do on your own at the end of each chapter. We also highly encourage you to follow along with the examples we provide in the text.

**John P. Hoffmann, PhD,** is a professor in the Department of Sociology at Brigham Young University, where he teaches courses on linear regression, generalized linear models, and research methods. He received a BS in political science from James Madison University, an MS in justice studies from American University, a PhD in criminology from SUNY-Albany, and an MPH with emphases in epidemiology and behavioral science from Emory University. He is the author of several books and articles that address applied statistical methods, juvenile delinquency, and adolescent health behaviors. His current research interests include trends in adolescent and adult substance use, innovative approaches to the measurement of adolescent substance use, and the effects of stress on adolescent behaviors. He is married to Lynn Hoffmann, and they are the parents of four children: Brian, Christopher, Brandon, and Curtis.

**Kevin Shafer, PhD,** is an assistant professor of social work at Brigham Young University in Provo, Utah, where he teaches MSW-level courses on research methods, statistics, and community organization. Originally from Columbus, Ohio, he received a PhD in 2009, an MA in 2005, and a BA in 2002 from The Ohio State University. His research addresses stepfamily functioning, fathering in the United States and Brazil, and men’s mental health domestically and abroad. He is the author of numerous research articles on these topics, which have appeared in journals such as *Social Work*, the *Journal of Marriage & Family*, the *Journal of Family Issues*, and *Family Relations*. He is married to Melissa Randall Shafer, and together they have four children – three boys and one girl.