Instructor: Michael L. Nelson mln@cs.odu.edu
Office Hours: Wednesdays 2-4 and by appointment
Time: Wednesdays 4:20pm - 7pm
Place: Dragas, r. 1102
Pre-/Co-requisites: CS 330, CS 300T
This class is intended for academic juniors (or higher) considering a career in research in data and web science. Since research is not a solitary activity, class work will occur in groups. To prepare students for a research career, we will cover a range of topics including:
- Data management and software engineering tools: Git/GitHub, Docker, Travis CI, cloud computing, virtual machines, etc.
- Languages and Environments: Python, R, Unix CLI, LaTeX, Overleaf, REST APIs, etc.
- Participating in the scholarly communication process: reading and summarizing papers, preparing and giving presentations, documenting your own research findings, the spectrum of scholarly communication, etc.
- Preparing proposals: students will read, review, and finally prepare their own actual research proposals to be submitted to places like the National Science Foundation (NSF), Virginia Space Grant Consortium (VSGC), National Aeronautics and Space Administration (NASA), and others as identified (and pending the student's eligibility).
- Reproducibility and Replicability: students will work in teams to identify published data and web science studies that they will reproduce and/or replicate.
Note: this class schedule is subject to change. Watch the class repo and monitor the class email list for updates.
- 2020-01-15: Administrivia, scholarly communication, web science, research and grad school
- 2020-01-22: MLN research retrospective, Python, R 1, 2, 3, Git/GitHub 1, 2, 3
- 2020-01-29: HTTP mechanics, web archiving, "Why Care About The Past?"
- 2020-02-05: Guest lecture: Sawood Alam (MLN traveling): AWS 1, 2, Docker 1, 2, LaTeX/Overleaf 1, 2, 3, CLI 1
- 2020-02-12: Web archiving wrap-up, "How to Read a Paper", "Academic Communication (presenting)"
- 2020-02-19: Student presentations, Reproducibility 1, 2
- 2020-02-26: Guest lecture: Alexander Nwala and "Micro-collections", Student presentations
- 2020-03-04: Guest lecture: Kritika Garg and "Topic Lifecycle on Social Networks", "Academic Communication (writing)" (slides 30--78), NSF GRFP
- 2020-03-11: Spring Break -- no class
- 2020-03-18: Second Spring Break -- Coronavirus
- 2020-03-25: Guest lecture: Gavindya Jayawardena and "Eye-Tracking for Assessing Speech-In-Noise Performance of Adults with ADHD"
- 2020-04-01: Guest lecture: Bhanuka Mahanama and "MGaze: Multi-Gaze Interactions"
  - Branches: "Don’t Mess with the Master: Working with Branches in Git and GitHub", "Git Branching - Branches in a Nutshell", "Git Branching - Basic Branching and Merging", "Git & GitHub Tutorial for Beginners #8 - Branches", "Git & GitHub Tutorial for Beginners #9 - Merging Branches (& conflicts)"
  - Travis CI & pytest: "Core Concepts for Beginners", "Testing Python with Travis CI in Just 3 Steps", "pytest: Installation and Getting Started"
- 2020-04-08: Guest lecture: Yasith Jayawardana and "Diagnosing ASD from Brain Activity", evaluation background
  - "Introduction to Statistics: Levels of Measurement"
  - "Mode, Median, Mean, Range, and Standard Deviation"
  - "The Five Number Summary, Boxplots, and Outliers"
  - "Symmetry and Skewness"
  - "Density Curves and their Properties"
  - "The Normal Distribution and the 68-95-99.7 Rule"
  - assignment 8: NSF GRFP status update (now optional)
- 2020-04-15: Guest lecture: Shawn Jones and "Storytelling with web archives", "Social Cards Probably Provide For Better Understanding Of Web Archive Collections"
- 2020-04-22: Guest lecture: Abigail Mabe and Dhruv Patel and "TimeMap Visualization: An archival thumbnail visualization server"
  - "Types of Sampling Methods"
  - "How Much of the Web is Archived?"
  - "Why We Need Private Web Archives: Almost Two-Thirds of Web Traffic IS NOT Publicly Archivable"
  - assignment 10: ready to submit: NSF GRFP
- 2020-04-29: Exams -- no class
  - assignment 11: ready to submit: research paper
At the end of the class, each student will have:
- A GitHub account with software and data repos resulting from class work
- A professional identity, via a Google Scholar account, Twitter account, LinkedIn account, etc.
- Two submitted (or ready to submit, pending the solicitation's availability) proposals: Virginia Space Grant Consortium (VSGC) and the National Science Foundation (NSF) Graduate Research Fellowship Program (GRFP)
- One published technical report at arXiv.org, co-authored with Dr. Nelson and possibly other faculty and students from the Web Sciences and Digital Libraries Research Group.
Grades are based on a combination of 11 assignments (written and/or oral) and class participation. Note that the class will not have conventional mid-term and final exams, tests, quizzes, etc.
Instead, the grades will be determined by class participation and the quality of the aforementioned research deliverables.