Introduction to Statistical Methods

M1 International, Université Paris-Saclay - Winter 2017-18

Description

The course is the second part of the module Probabilities and Statistics of the international Computer-Science Master (M1 IIT) program. It mainly targets students and researchers who are interested in experimental research methods and often need to deal with small samples and messy data. Previous knowledge of statistics or probability theory is not required, but a basic understanding of probability will help.

The course will introduce fundamental concepts of descriptive and inferential statistics. The goal of the course is NOT to provide a set of statistical recipes or step-by-step instructions. Instead, it places particular emphasis on understanding key principles, thinking about underlying assumptions, and recognizing the limitations of statistical methods.

Students will also learn how to use the R statistical software to analyze real datasets and how to apply computational methods to estimate parameters or evaluate statistical procedures.

Lectures

Lectures take place on Thursdays from 2 to 5pm at PUIO (Bat. 640) in room E213. For more information about the content and schedule of the course, see the course syllabus.

Classes are given by Theophanis Tsandilas.

News

I have posted a set of home exercises to help you prepare for the final exam.

The assignment is out!

Classes

Nov 23. Basic concepts: data, populations, samples. Why learn statistics? Types of data and descriptive statistics. Starting with R.
[Lecture 1: Slides] [Lecture 1: R code]

- You can read Chapter 1 of Baguley's book, available online.

Nov 30. Discrete and continuous probability distributions: binomial, normal, log-normal, chi-square, and t distributions. The sampling distribution of a statistic. The Central Limit Theorem. A brief introduction to confidence intervals.
[Lecture 2: Slides] [Lecture 2: R code]

- Lecture notes on the normal distribution and the Central Limit Theorem from MIT. The notes also explain how to use histograms to plot distributions.
- Interactively generate sampling distributions of various statistics from various population distributions.

Dec 7. Confidence intervals. Monte Carlo simulations. Experimental designs: independent groups, repeated measures.
[Lecture 3: Slides] [Lecture 3: R code] [R skeleton code for exercise]

Dec 14. Confidence intervals for non-normal distributions. Bootstrapping. Introduction to Null Hypothesis Significance Testing.
[Lecture 4: Slides] [Lecture 4: R code]

- Working with data frames in R
- A primer on bootstrapping. The report starts with a discussion of the use of Null Hypothesis Significance Testing, but (unless you are curious) you can ignore it for now.

Dec 21. Significance tests and p-values. Type I and Type II errors. Statistical power. Publication bias. P-hacking and criticisms of NHST.
[Lecture 5: Slides] [Lecture 5: R code]

- An article discussing several p-hacking methods and ways to avoid them.

Jan 11. Covariance and correlation. Simple linear regression.
[Lecture 6: Slides] [Lecture 6: R code]

Jan 18. Preparation for the final exam.
[Lecture 7: Home exercises]

R Tutorials

There is a large collection of online tutorials for learning R. The following list is not exhaustive and will be updated during the course:

Textbooks

Part of the course content has been inspired by Thom Baguley's book:

Buying the book for the purposes of the course is not required, but I recommend it to students who are interested in deepening their understanding of statistics and would like to have a reference for their data analyses. There are many other textbooks on statistics, but unfortunately, I cannot express a personal opinion on their content or teaching approach.

During the course, I will add links to various online readings to help students better understand the material.