Instructor
Erol Taymaz
Room A216
Lecture hours: Wed 14:40-17:30
Office hours: Wed 10:00-12:00
Course prerequisites: IS 100, ECON 206
Course credit: 3
Course description
Data science is an interdisciplinary field that uses scientific processes and systems to extract knowledge or insights from data in various forms. With substantial amounts of data now available in various forms and from various sources, it has become essential for economists to have the skills needed to collect, process, analyze, and present data. The course will be taught as a series of workshops. The main topics and methods will be summarized and discussed in each lecture, and the students will write code to perform the tasks assigned to them during the lecture. The students will learn how to write basic programs in R, which is one of the most popular open-source programming languages currently in use by data scientists.
Course objectives
By the end of the course, the students will know how to use data science for economic analysis and will have learned the basic tools they need for data analysis. At the end of the course, the students will apply these tools and techniques to analyze a real-world problem, using R in all stages of the research process.
Learning outcomes
By the end of the semester, the students will be able to:
- Write basic programs in R
- Access the data from various sources and formats
- Reshape and clean the data for reporting and further analysis
- Explore and visualize the data
- Conduct statistical analysis by using R
- Perform reproducible research
Grading
The course consists of lectures, practices (assignments), and a project. Each practice involves performing a specific task in collecting, cleaning, analyzing, visualizing, and presenting data on a topic relevant to economists. Practices will be submitted individually. The project will involve all components of the data analysis process, and students are encouraged to work on it in teams of two or three. The project will seek to answer an important real-world problem. The students will collect and clean the data, model the problem, visualize the data and their analysis, and present their findings by using R.
Course grades will be based on six practices (60 % in total) and a (group) project (40 %).
Textbooks
Everitt, Brian S. and Hothorn, Torsten (2009), A Handbook of Statistical Analyses Using R, Chapman and Hall/CRC.
Peng, Roger D. (2015), R Programming for Data Science, Leanpub.
Venables, W. N., Smith, D. M. and the R Core Team (2015), An Introduction to R, R Core Team.
Wickham, Hadley (2014), Advanced R, Chapman & Hall/CRC.
Wickham, Hadley (2016), ggplot2: Elegant Graphics for Data Analysis, Springer.
Zumel, N. and Mount, J. (2014), Practical Data Science with R, Manning Publications.
Outline of topics
- Introduction to Data Science
  - What is data science?
  - Why is data science important?
  - The data science process
- Introduction to R
  - What is R?
  - R language basics
  - RStudio basics
- Data visualization
  - Data structures
  - File types
  - ggplot basics
  - Animation
  - Data exploration
  - Data presentation
- Data structures
  - Data structures in R
  - Matrix
  - Data frame
  - Data table
- Functions
  - Function components
  - Function arguments
  - Special functions
- Loops and loop functions
  - Looping in R
  - Loop functions
- Transforming and cleaning data
  - Data transformation
  - Data cleaning
  - Data merging
  - Creating new variables
  - Missing observations
- Reading and collecting data
  - Reading data files
  - Web sources
- Descriptive statistics
  - Univariate descriptive statistics
  - Bivariate descriptive statistics
- Statistical modeling
  - Statistical models
  - Linear regression models
  - Panel data models
- Text mining
  - Reading text data
  - Analyzing text data
- Maps
  - Map functions in R
  - Map visualizations
- Networks
  - Network analysis basics
  - Network visualizations
  - International trade networks
- Reproducible research
  - R Markdown basics
  - Presentation basics