IB/NRES 509: Statistical Modeling

SPRING 2010
Lecture: MWF 10:00-10:50, location TBD
Lab: W 2:00-4:50, IGB 0607
4 Credit Hours

Instructor: Dr. Dietze

mdietze at illinois.edu
Morrill 183 / IGB 1405
Office hours by appointment

Teaching Assistant:

TBD

Course Format:

4 credit hours -- three 50-minute lectures and one 3-hour computer lab

Course Description:

Researchers in the biological and environmental sciences are often confronted with data that are complex in nature and beyond the assumptions of classical statistical tests. The goal of confronting our scientific theories with data often requires us to embrace the complexity of our data -- to make inference on indirectly observed quantities, to bring multiple types of data to bear on a single question, to synthesize past observations with new data, or to separate the effects of different error processes (e.g. observation error vs. inherent variability in the process). This class provides an introduction to modern statistical modeling from both likelihood and Bayesian perspectives. The focus is on science-driven, problem-specific design of statistical analyses for complex data. Topics include point estimation, interval estimation, model selection, regression, non-linear models, non-Gaussian models & GLMs, hierarchical models, time-series analysis, spatial models, data assimilation, and statistical forecasting. Computational methods such as numerical optimization and Markov chain Monte Carlo simulation are covered with a focus on hands-on application to real data. The course is designed around case-study problem sets using R and BUGS. Advanced graduate students are encouraged to use their own datasets for class projects.

Prerequisites:

Calculus (Math 220); CPSC 440 or Stats 400 or equivalent or consent of instructor
In practice, a basic familiarity with math and statistics is what is required. You should know what a derivative, integral, and sum are; you should have heard of ANOVA and regression; and you should have a general understanding of experimental design, randomization, and exploratory data analysis.

Text:

Models for Ecological Data by James S. Clark



While the primary text takes an ecological perspective, the methods are applicable to all aspects of research in the biological and environmental sciences. The primary text will be supplemented with select readings from additional textbooks and the primary literature. Literature readings will focus on examples of the application of statistical models in the biological and environmental literature rather than on methods papers. These “case studies” will also serve as the focus for the analysis problems in the lab component.

Software:

The R project for statistical computing
The BUGS project - Bayesian inference Using Gibbs Sampling

Grading Policy:

Grading will be based on lab reports/problem sets, a semester-long project, and three exams. Students are encouraged to make use of their own data sets for the semester project. Total = 350

Lecture Schedule

Date | Topics | Reading | Assignments
1/20 | Introduction to model-based inference | Chapter 1 |
1/22 | Probability theory: discrete and continuous distributions | |
1/25 | Probability theory: joint, conditional, and marginal distributions | |
1/27 | Maximum Likelihood | |
1/29 | Point estimation by MLE | |
2/1 | Analytically tractable MLEs | |
2/3 | Intractable MLEs and basic numerical optimization | |
2/5 | Bayes Theorem | |
2/8 | Point estimation using Bayes | |
2/10 | Analytically-tractable Bayes: conjugacy and priors | |
2/12 | Numerical methods for Bayes: MCMC | |
2/15 | MCMC: Metropolis-Hastings | |
2/17 | MCMC: Gibbs sampler | | Project Proposals
2/19 | MCMC: Importance sampling | |
2/22 | EXAM 1 | |
2/24 | Interval Estimation: theory | |
2/26 | Frequentist confidence intervals | |
3/1 | Bayesian credible intervals | |
3/3 | Model Selection: Likelihood ratio test, AIC | |
3/5 | Model Selection: DIC, predictive loss, model averaging | |
3/8 | Regression: likelihood derivation | |
3/10 | Bayesian linear regression | |
3/12 | Logistic regression | |
3/15 | GLMs | |
3/17 | Nonlinear models | | Model Description
3/19 | Hierarchical Bayes | |
3/29 | Random effects models | |
3/31 | Measurement error and missing data models | |
4/2 | EXAM 2 | |
4/5 | Time series: Basics and diagnostics | |
4/7 | Time series: ARMA | |
4/9 | Time series: spectral techniques | |
4/12 | Time series: Bayesian state space model | |
4/14 | Spatial: point pattern data | | Preliminary Analysis
4/16 | Spatial: point-referenced (geostatistical) data and Kriging | |
4/19 | Spatial: block-referenced data and misalignment | |
4/21 | Spatial: conditional autoregressive models (CAR) | |
4/23 | Data assimilation: classic Kalman filter | |
4/26 | Data assimilation: Kalman variants | |
4/28 | Data assimilation: Bayesian state-space revisited | |
5/3 | Forecasting: posterior predictive distributions | |
5/5 | Forecasting: Ensemble analysis | |
TBD | FINAL EXAM | | FINAL PROJECT

Lab Syllabus

Week | Topics | Software
1 | Introduction to R | R
2 | Probability distributions and sampling | R
3 | Maximum likelihood - basics | R
4 | Maximum likelihood - numerical optimization | R
5 | Metropolis Algorithm and importance sampling | R
6 | Gibbs sampler | R
7 | Interval estimation and model selection | R
8 | Introduction to BUGS | WinBUGS
9 | Regression | Both
10 | Hierarchical modeling | WinBUGS
11 | Exploratory data analysis: space and time | R
12 | Time series | WinBUGS
13 | Spatial CAR and Kriging | WinBUGS
14 | Data Assimilation | R
15 | Forecasting | WinBUGS

Additional Resources

Books

General review articles
