
Homework 1

Jinho D. Choi edited this page Jan 29, 2018 · 11 revisions

Preparation

Download hw1.py.

Regular Expressions

Write regular expressions matching the following cases (one expression per case):

  • Numbers (e.g., two thousands eighteen).
  • Dates (e.g., Dec. 25, 2018).

Use the template in hw1.py to write your expressions:

RE_NUMBER = re.compile('your expression')
RE_DATE = re.compile('your expression')

Try to cover as many patterns as possible. Be aware that these cases can occur anywhere in a document. In the report, describe the kinds of groups you define in your expressions.
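As an illustration of what grouping can look like, here is a minimal sketch of a date pattern with named groups. The names MONTH_NAMES and RE_DATE_SKETCH are hypothetical and not part of hw1.py, and the pattern covers only the "Month day, year" layout, so treat it as a starting point rather than a complete answer.

```python
import re

# Hypothetical alternation of month names and common abbreviations.
MONTH_NAMES = (r'Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|'
               r'Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|'
               r'Nov(?:ember)?|Dec(?:ember)?')

# Named groups capture the parts that matter; (?:...) groups only structure
# the pattern (optional period, ordinal suffix) without capturing anything.
RE_DATE_SKETCH = re.compile(
    r'\b(?P<month>' + MONTH_NAMES + r')\.?\s+'
    r'(?P<day>\d{1,2})(?:st|nd|rd|th)?,?\s+'
    r'(?P<year>\d{4})\b'
)

# search() finds the date anywhere in the text, not only at the start.
m = RE_DATE_SKETCH.search('The party is on Dec. 25, 2018 at noon.')
print(m.group('month'), m.group('day'), m.group('year'))  # prints: Dec 25 2018
```

Named groups make the later normalization step easier to read, since each captured piece can be looked up by name instead of by position.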

Normalization

Normalize numbers into digits:

  • two thousands eighteen → 2018.
  • 3 hundreds twenty one → 321.
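One possible way to turn an already-matched number phrase into digits is to keep a running group value and a running total, as sketched below. The function words_to_int and the tables UNITS, TENS, and SCALES are hypothetical helpers, not part of hw1.py; the sketch assumes the phrase has already been isolated (for example by RE_NUMBER) and handles only non-negative integers.

```python
# Hypothetical lookup tables for number words.
UNITS = {w: i for i, w in enumerate(
    'zero one two three four five six seven eight nine ten eleven twelve '
    'thirteen fourteen fifteen sixteen seventeen eighteen nineteen'.split())}
TENS = {w: 10 * (i + 2) for i, w in enumerate(
    'twenty thirty forty fifty sixty seventy eighty ninety'.split())}
SCALES = {'hundred': 100, 'thousand': 1000, 'million': 1000000}


def words_to_int(phrase):
    """Convert a phrase such as 'two thousands eighteen' into an int."""
    total, current = 0, 0
    for token in phrase.lower().replace('-', ' ').split():
        if token == 'and':
            continue  # 'one hundred and five'
        if token.isdigit():
            current += int(token)  # mixed forms such as '3 hundreds'
        elif token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        else:
            # Accept plural scale words ('hundreds', 'thousands') as well.
            scale = SCALES.get(token.rstrip('s'))
            if scale is None:
                raise ValueError('unknown number word: ' + token)
            if scale == 100:
                current = max(current, 1) * 100
            else:
                # Larger scales close the current group.
                total += max(current, 1) * scale
                current = 0
    return total + current


print(words_to_int('two thousands eighteen'))  # prints: 2018
print(words_to_int('3 hundreds twenty one'))   # prints: 321
```

A complete norm_number would still need to locate such phrases inside a longer string and substitute the result back in place.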

Normalize dates into a standardized format:

  • April 1st, 2018 → 2018/04/01.
  • Dec. 25 2018 → 2018/12/25.
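The conversion above could be sketched with re.sub and a replacement function, as below. The names MONTHS, RE_DATE_SKETCH, and norm_date_sketch are hypothetical, and the month lookup by the first three letters is a deliberate simplification: it would also accept any capitalized word that merely starts like a month name, and it does not check that the day is valid for the month.

```python
import re

# Hypothetical lookup: first three letters of the month name -> month number.
MONTHS = {'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6,
          'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12}

# Capitalized word, optional period, day with optional ordinal suffix,
# optional comma, four-digit year.
RE_DATE_SKETCH = re.compile(
    r'\b([A-Z][a-z]+)\.?\s+(\d{1,2})(?:st|nd|rd|th)?,?\s+(\d{4})\b')


def _to_standard(match):
    """Build YYYY/MM/DD from a match, or leave the text unchanged."""
    month = MONTHS.get(match.group(1)[:3].lower())
    if month is None:
        return match.group(0)  # not a month name: keep the original text
    return '{}/{:02d}/{:02d}'.format(match.group(3), month, int(match.group(2)))


def norm_date_sketch(s):
    # re.sub calls _to_standard once for every match found in s.
    return RE_DATE_SKETCH.sub(_to_standard, s)


print(norm_date_sketch('Due April 1st, 2018.'))  # prints: Due 2018/04/01.
print(norm_date_sketch('Xmas is Dec. 25 2018'))  # prints: Xmas is 2018/12/25
```

Passing a function to re.sub keeps the surrounding text untouched, which matches the requirement that the rest of the input string be returned as is.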

Complete the norm_number and norm_date functions in hw1.py, which take a string and convert all numbers and dates in the string into the standardized formats, respectively:

def norm_number(s):
    """
    :param s: the input string
    :return: the input string where all numbers are converted into their digit-forms.
    """
    return s


def norm_date(s):
    """
    :param s: the input string
    :return: the input string where all dates are standardized.
    """
    return s

In the report, provide interesting test cases for these functions.

Exam Schedule Extraction

Use any code in data_aggregation.py to extract the information for each course together with its final exam schedule, and save the result to a JSON file called course_exam_spring_2018.json. Make sure to include information from all departments in the Course Atlas. In the report, describe any challenges that were not discussed in class.
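The saving step might look like the sketch below, which uses only the standard json module. The function save_courses and the record fields (course, title, exam_day, exam_time) are assumptions made for illustration; the assignment does not prescribe a schema, the values shown are placeholders rather than real schedule data, and nothing here reflects what data_aggregation.py actually provides.

```python
import json


def save_courses(courses, path):
    """Write a list of course records (dicts) to a JSON file."""
    with open(path, 'w', encoding='utf-8') as fout:
        # indent makes the file readable; ensure_ascii=False keeps any
        # non-ASCII characters in titles as they are.
        json.dump(courses, fout, indent=2, ensure_ascii=False)


# Placeholder record with a hypothetical schema, for illustration only.
example_courses = [
    {'course': 'DEPT 000', 'title': 'Placeholder Course',
     'exam_day': 'PLACEHOLDER', 'exam_time': 'PLACEHOLDER'},
]

# Usage (writes the file into the current directory):
# save_courses(example_courses, 'course_exam_spring_2018.json')
```

Loading the file back with json.load and comparing it to the in-memory list is a quick way to confirm that nothing was lost when writing.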

Submission

Submit the following to: https://canvas.emory.edu/courses/41979/assignments/115326

  • hw1.py: including the code assigned above.
  • hw1.pdf: the report describing your approaches.
  • course_exam_spring_2018.json: the output of the schedule extraction.

Practical Approaches to Data Science with Text

Instructor


Emory University
