CMSC320 Resources
R
- R is an open-source environment for data analysis.
- The RStudio IDE is highly recommended. The Revolution IDE is also very good, but it is available only for Linux and Windows.
- tidyverse is a collection of data science packages designed for consistency and interoperability.
- swirl is an interactive R (and general data analysis) tutorial.
- DataCamp has a nice short online course introducing R.
- R Task Views: The Machine Learning and Optimization Task Views list useful packages in R we may use.
- R/Matlab references: A short R guide for Matlab users. A longer one.
- R/Python references: A short R guide for Python users.
Python
- Python Tutorial
- Python Docs
- DataCamp Intro to Python
- Jupyter notebooks
- Google Colab notebooks
- Short Python tutorial by Hal Daume
- scikit-learn
- NumPy Tutorial (tentative)
- NumPy User’s Guide
- SciPy docs
- Keras documentation
- TensorFlow documentation
Other Resources
RStudio has made some very nice cheatsheets for a number of workflows and tools we’ll look at this semester. You can find them here: https://www.rstudio.com/resources/cheatsheets/
You can find a nice list of free data science books here: http://www.wzchen.com/data-science-books/
Data Repositories
- Kaggle: a site hosting data competitions. It’s a great source of datasets, questions, and tutorials.
- Kaggle Datasets: a new repository within Kaggle specifically for datasets, including code and scripts contributed by users to help get analyses of these datasets started.
- data.world: another new repository of public datasets.
- data.gov: The U.S. government’s open data portal.
- Gapminder
- Global Health Observatory: World Health Organization’s data repository.
- UCI Machine Learning Repository: contains many datasets useful for testing and benchmarking learning algorithms.
- StatLib: Statistical software and dataset portal maintained by CMU.
- Yet another list of public datasets: https://github.com/caesar0301/awesome-public-datasets
- And yet another list of public datasets: http://blog.bigml.com/list-of-public-data-sources-fit-for-machine-learning/#national_governments
- Airbnb data
- NYC Taxi ride data
- Resources for data journalism with R
- Algorithms and datasets for computational social science and digital humanities
- Open population datasets
- Google BigQuery Public Datasets
- AWS Public Datasets
- Yelp provides a dataset for use if you give them your email address here