The availability of a variety of free, user-friendly statistical software packages has enabled many scientists to perform their own analyses and generate results without a formal knowledge of programming languages. This democratization of data science can be hugely important to researchers wishing to analyze and interpret their research data, but can be risky without sufficient background knowledge in statistics. To generate valid results and interpret them correctly, researchers need an understanding of key statistical principles and applications.
With the goal of enabling the informed use of statistical software, Ragon Institute faculty statistician Dr. Musie Ghebremichael designed an introductory Biostatistics course, mixing lecture with experiential learning to provide the research community with the tools they need to effectively perform their own statistical analyses. “It’s important not just to know which statistical software package to use,” says Dr. Ghebremichael, “but to also know which statistical test or method to use, the assumptions on which it is based, and how to evaluate whether that particular test or method is best suited for the data under consideration.”
Students attending lecture on 3/11/19
Statistics is a data science with the ultimate goal of drawing conclusions about a population using the information generated from a sample. Unlike many other statistics courses, this course relies heavily on the use of graphics, data sets, and examples from research on HIV/AIDS and other diseases to elucidate abstract statistical concepts. With weekly Monday lectures and Friday labs, participants in the course have a built-in structure for applying their newly acquired knowledge.
The format of the course works well for students with different backgrounds. “This topic is really new to me, but I think I am starting to make progress after attending the practice sessions following each lecture,” comments Radiana Trifonova, a Research Specialist in the Allen Lab. “It is useful to get examples about proper graphical presentation of data and learn how data can sometimes be presented in ways that are misleading.” In addition to the concrete examples, the theoretical information is proving to be invaluable. “There are several important concepts, particularly the right interpretation of the meaning of specific terms,” says Jun-Rong Wei, a Research Scientist in the Fortune Lab. “Now I understand specificity and sensitivity correctly, and how the disease population affects these numbers.”
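As a rough illustration of why the makeup of the tested population matters, the short Python sketch below computes positive and negative predictive values from a test's sensitivity and specificity at several disease prevalences. It is not taken from the course materials: the function name `predictive_values` and the numbers used are hypothetical, chosen only for illustration. In this standard calculation, sensitivity and specificity are treated as fixed properties of the test, while the predictive values shift with prevalence.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a diagnostic test applied to a population.

    All three inputs are proportions between 0 and 1 (hypothetical values here).
    """
    # Expected fraction of the population falling in each cell of the 2x2 table
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | negative test)
    return ppv, npv


# Illustrative values only: same test, populations with different prevalence
for prevalence in (0.01, 0.05, 0.20):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same hypothetical test, a positive result is far more likely to reflect true disease in a high-prevalence population than in a low-prevalence one, which is one reason the choice of study population matters when interpreting these measures.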
Graphs used by Dr. Ghebremichael to illustrate course concepts. Figure 1 shows Type I and Type II errors and their corresponding probabilities, and Figure 2 shows interpretations of confidence intervals.
Although the class is still in its early stages, participants are noticing a marked difference in their ability to understand and apply statistical methods and tests, and are feeling more confident about using the software. “I used R at my previous internships but I had several issues; I always needed help,” says Christina Tsekeri, a Master’s Student in the Pillai Lab. “With Dr. Ghebremichael’s help to understand, I am feeling much more comfortable with the software and I’ll definitely use it in the future.”
A detailed description of the course, including slides and exercises, can be found on the Ragon Biostatistics webpage. Lectures and labs will continue to be held weekly on Mondays and Fridays through the end of July.