- R is an open-source environment for data analysis.
- The RStudio IDE is highly recommended. The Revolution IDE is also very good, but it is available only for Linux and Windows.
- tidyverse is a collection of data science packages designed for consistency and interoperability.
- swirl is an interactive R (and general data analysis) tutorial.
- DataCamp has a nice short online course introducing R.
- R Task Views: The Machine Learning and Optimization Task Views list useful R packages that we may use.
- R/Matlab references: A short R guide for Matlab users. A longer one.
- R/Python references: A short R guide for Python users.
You can find a nice list of free data science books here: http://www.wzchen.com/data-science-books/
- Kaggle: a site hosting data competitions. It’s a great source of datasets, questions, and tutorials.
- Kaggle Datasets: a new repository in Kaggle specifically for datasets, including user-contributed code and scripts to help get analyses of these datasets started.
- data.world: another new repository of public datasets.
- data.gov: The U.S. government’s open data portal.
- Global Health Observatory: World Health Organization’s data repository.
- UCI Machine Learning Repository: contains many datasets useful for testing and benchmarking learning algorithms.
- StatLib: Statistical software and dataset portal maintained by CMU.
- Another large list of repositories: http://www.inside-r.org/howto/finding-data-internet
- Yet another list of public datasets: https://github.com/caesar0301/awesome-public-datasets
- And yet another list of public datasets: http://blog.bigml.com/list-of-public-data-sources-fit-for-machine-learning/#national_governments
- Airbnb data
- NYC Taxi ride data
- Resources for data journalism with R
- Algorithms and datasets for computational social science and digital humanities
- Open population datasets
- Google BigQuery Public Datasets
- AWS Public Datasets