Python and R for the Modern Data Scientist Code

Welcome

Welcome to the companion code repository for the O'Reilly book Python and R for the Modern Data Scientist. You can also access this repository as an RStudio Cloud project (account required).

Success in data science depends on the flexible and appropriate use of tools. That includes Python and R, two of the foundational programming languages in the field. With this book, data scientists from the Python and R communities will learn how to speak the dialects of each language. By recognizing the strengths of working with both, you'll discover new ways to accomplish data science tasks and expand your skill set.

Authors Rick J Scavetta and Boyan Angelov explain the fundamentals of these languages and highlight where each one excels over the other, whether it's their linguistic features or the power of their open source ecosystems. Not only will you learn how to use Python and R together in real-world settings, but you'll also broaden your knowledge and job opportunities by working as a bilingual data scientist.

Learn Python and R from the perspective of your current language
Understand the strengths and weaknesses of each language
Identify use cases where one language is better suited than the other
Understand the modern open source ecosystem available for both, including packages, frameworks, and workflows
Learn how to integrate R and Python in a single workflow
Follow a real-world case study that demonstrates ways to use these languages together

Repository structure

When available, companion scripts to the book are found in their respective chapter directories.

Part II. Levels of working together I: Bilingual

Part III. Modern Context

Part IV. Levels of working together II: Synergy

Appendix A. Bilingual Dictionary

Available here.

Datasets

Datasets used in the book can be found as follows.

Diamonds

This dataset is from the R ggplot2 package:

library(ggplot2)
data(diamonds)

Iris & Plant Growth

These are available in base R:

data(PlantGrowth)
data(iris)

Boston housing

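This dataset is available using the Python scikit-learn package. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below requires an earlier version: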

from sklearn.datasets import load_boston
boston_data = load_boston()
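Because load_boston is no longer present in recent scikit-learn releases, a script that depends on it may fail at import time. The following is a minimal sketch of a guarded import, not part of the book's code; the helper name load_boston_if_available is hypothetical and only illustrates the idea.

```python
def load_boston_if_available():
    """Return the Boston housing dataset, or None if it cannot be imported.

    Hypothetical helper: load_boston exists only in scikit-learn < 1.2.
    In later versions (or when scikit-learn is not installed) the import
    raises ImportError, which is caught here.
    """
    try:
        from sklearn.datasets import load_boston
    except ImportError:
        return None
    return load_boston()
```

A caller can check the result for None and print a hint about installing an older scikit-learn version instead of stopping with a traceback.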

Amazon music reviews

The Amazon music review data can be downloaded here. We use the "digital music" subset.

Swimming pool and car detection

This dataset on swimming pool and car detection using satellite imagery is available on Kaggle.

Daily Australian Temperatures

The daily Australian temperatures dataset can be downloaded directly from GitHub.

Loxodonta africana species occurrence data

Obtain these data and the spatial raster (the bioclimatic variables) using the R sdmbench package:

library(sdmbench)
data <- get_benchmarking_data("Loxodonta africana")

This object is a list that contains the occurrence data in data$df_data and the raster layers in data$raster_data.

Shared car locations data

These data can be downloaded from Kaggle.

Wildfires

The wildfires data can be downloaded from the USDA website directly or from Kaggle. To run the case study, add the file FPA_FOD_20170508.sqlite to the ch07-case-study/data/ folder.
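Before running the case study, it can help to confirm that the SQLite file is in the expected place and can be opened. The sketch below uses only the Python standard library; the function name list_tables and the DB_PATH constant are illustrative assumptions, and the path assumes the script is run from the repository root.

```python
import sqlite3
from pathlib import Path

# Location taken from the README; adjust if your checkout lives elsewhere.
DB_PATH = Path("ch07-case-study") / "data" / "FPA_FOD_20170508.sqlite"


def list_tables(db_path):
    """Return the sorted names of the tables in the SQLite file at db_path."""
    db_path = Path(db_path)
    if not db_path.is_file():
        # Fail early with a clear message instead of letting sqlite3
        # silently create an empty database at this path.
        raise FileNotFoundError(f"Expected the wildfires database at {db_path}")
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables(DB_PATH) from the repository root should return the table names if the file was copied correctly, and raise FileNotFoundError otherwise.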

Star Wars

This dataset is from the R dplyr package:

library(dplyr)
data(starwars)

