CMSC320 Intro. Data Science
Hector Corrada Bravo
1 Preamble
2 Introduction and Overview
3 An Illustrative Analysis
4 Setting up the Data Science Toolbox
- 4.1 R/Rstudio
- 4.2 Python/Jupyter
(Part) Data representation modeling, ingestion and cleaning
5 Measurements and Data Types
6 Principles: Basic Operations
7 Principles: More Operations
8 Basic plotting with ggplot
- 8.1 Plot Construction Details
  - 8.1.1 Mappings
  - 8.1.2 Representations
- 8.2 Frequently Used Plots
9 Brief Introduction to Rmarkdown
10 Best Practices for Data Science Projects
11 Tidy Data I: The ER Model
12 SQL I: Single Table Queries
- 12.1 Group-by and summarize
- 12.2 Subqueries
13 Two-table operations
14 SQL System Constructs
15 DB Parting Shots
- 15.1 Database Query Optimization
- 15.2 JSON Data Model
16 Ingesting data
- 16.1 Structured ingestion
  - 16.1.1 CSV files (and similar)
  - 16.1.2 Excel spreadsheets
- 16.2 Scraping
  - 16.2.1 Scraping from dirty HTML tables
17 Tidying data
- 17.1 Tidy Data
- 17.2 Common problems in messy data
18 Text and Dates
- 18.1 Text
- 18.2 Handling dates
19 Entity Resolution and Record Linkage
(Part) Exploratory Data Analysis
20 Exploratory Data Analysis: Visualization
21 Exploratory Data Analysis: Summary Statistics
22 EDA: Data Transformations
23 EDA: Handling Missing Data
(Part) Statistical Learning
24 Univariate distributions and statistics
25 Experiment design and hypothesis testing
26 Multivariate probability
(Part) Machine Learning
27 Data Analysis with Geometry
28 Linear Regression
29 Linear models for classification
30 Solving linear ML problems
31 Tree-Based Methods
32 Model Selection and Evaluation
33 Unsupervised Learning: Clustering
34 Unsupervised Learning: Dimensionality Reduction

Lecture Notes: Introduction to Data Science

(Part) Exploratory Data Analysis