Textbook:
- Topics: course overview, Git Bash, Python config.ini files, conda virtual environments
- Technology: Git Bash, configparser, conda
- Homework: use the command line to search for data across thousands of server configuration files
- Topics: Extract text from docx, pdf, and image files
- Technology: docx, PyPDF2, pdfminer.six, subprocess, pytesseract
- Homework: structure the annual reports into sections
- Supplementary Material: watch lesson_databases videos
- Topics: POS tagging, dependency parsing, rule-based matching, phrase detection
- Technology: spaCy, gensim
- Prework: Read SLP sections 2.1-2.4 and/or watch the SLP 2.1-2.5 videos; read SLP sections 8.1-8.3 and chapter 5 (Collocations)
- Supplementary Material: watch lesson_automation videos
- Topics: vector space model, TF-IDF, BM25, co-occurrence matrix
- Technology: scikit-learn
- Prework: Read SLP sections 6.1-6.6
- Supplementary Material: watch lesson_object_oriented_python videos
- Topics: PCA, latent semantic indexing (LSI), latent Dirichlet allocation (LDA), topic coherence metrics
- Technology: scikit-learn, gensim
- Prework: Read "Taming Text with the SVD"
- Topics: Word2Vec, GloVe, FastText
- Technology: scikit-learn, gensim
- Prework: Read SLP sections 6.8-6.13, "Efficient Estimation of Word Representations in Vector Space", and "Distributed Representations of Words and Phrases and their Compositionality"
- Topics: Neural networks with word embeddings
- Technology: Keras, gensim, Flair (PyTorch)
- Prework: Read "Natural Language Processing (Almost) from Scratch" and "A Primer on Neural Network Models for Natural Language Processing"
- Topics: Neural networks with word embeddings, contextual word embeddings
- Technology: Keras, gensim, Flair (PyTorch)
- Prework: Read "Natural Language Processing (Almost) from Scratch" and "A Primer on Neural Network Models for Natural Language Processing"
- Topics: cosine similarity, distance metrics, L1 and L2 norms, recommendation engines
- Technology: scikit-learn, spaCy, gensim
- Prework: Read SLP section 2.5 and/or watch the SLP 2.1-2.5 videos
- Topics: automate data collection from https://www.annualreports.com
- Technology: requests, Jupyter Notebooks, BeautifulSoup, Scrapy
- Homework: automate identifying and downloading companies' 10-K annual reports
- Topics: use SQLAlchemy to create and populate a database, both locally and on AWS
- Technology: SQLAlchemy, SQLite, AWS RDS (MySQL)
- Homework: create and populate a database with SQLAlchemy
- Topics: reconstruct scikit-learn's CountVectorizer codebase
- Technology: scikit-learn, object-oriented Python
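For the first week's config.ini and configparser topic, here is a minimal sketch of searching many server configuration files with Python's standard-library configparser. The file names, sections, and keys (`web01.ini`, `[database]`, `engine`) are hypothetical; the syllabus does not specify the homework's actual files. To keep the example self-contained, the configs are held in strings and loaded with `read_string`; for files on disk you would call `parser.read(path)` instead.

```python
import configparser

# Hypothetical server configs; the real homework files are not described in the syllabus.
SAMPLE_CONFIGS = {
    "web01.ini": "[server]\nhost = web01\nport = 8080\n\n[database]\nengine = mysql\n",
    "web02.ini": "[server]\nhost = web02\nport = 9090\n\n[database]\nengine = sqlite\n",
}


def find_servers(configs, section, key, value):
    """Return the names of configs whose [section] key equals value."""
    matches = []
    for name, text in configs.items():
        parser = configparser.ConfigParser()
        parser.read_string(text)
        # fallback=None returns None instead of raising when the section or key is missing
        if parser.get(section, key, fallback=None) == value:
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    print(find_servers(SAMPLE_CONFIGS, "database", "engine", "mysql"))
```

The same lookup from Git Bash would typically be a `grep` over the files; configparser is more robust once you care about which section a key belongs to.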
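For the vector space model and TF-IDF topic, a from-scratch sketch may help show what scikit-learn computes. It assumes a naive lowercase whitespace tokenizer and uses the smoothed idf, idf(t) = ln((1 + n) / (1 + df(t))) + 1, which is the default formula documented for scikit-learn's TfidfVectorizer. Each document vector is then L2-normalized. Because the tokenizer differs from scikit-learn's, the numbers will not match TfidfVectorizer exactly.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption for illustration): lowercase, split on whitespace
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalized TF-IDF vectors), one per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)
    # Smoothed idf: terms appearing in every document get the minimum weight of 1.0
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm else vec)
    return vocab, vectors


if __name__ == "__main__":
    vocab, vecs = tfidf_vectors(["the cat sat", "the dog sat", "the cat ran"])
    print(vocab)
    for v in vecs:
        print([round(x, 3) for x in v])
```

Note how "the", which appears in every document, receives the smallest weight in each vector.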
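BM25, also listed in the vector space model week, can be sketched in the same style. This follows the Okapi BM25 scoring function with the non-negative idf variant ln((N - df + 0.5) / (df + 0.5) + 1); the parameter values k1 = 1.5 and b = 0.75 are common choices, not values taken from the course. The whitespace tokenizer is again an assumption.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs  # average document length
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for q in query.lower().split():
            if q not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            # Term frequency saturates via k1; b controls document-length normalization
            num = tf[q] * (k1 + 1)
            den = tf[q] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = ["annual report revenue", "revenue revenue growth", "board of directors"]
    print(bm25_scores("revenue", docs))
```

Unlike raw TF-IDF, a second occurrence of "revenue" raises the score by less than the first, which is the saturation effect BM25 is designed to capture.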
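For the cosine similarity, distance metrics, and L1/L2 norms topic, the following sketch implements the basic quantities on plain Python lists and uses cosine similarity for a toy nearest-neighbour recommender. The `recommend` helper and its item vectors are hypothetical illustrations, not part of any library; in practice you would use scikit-learn or NumPy on real embedding vectors.

```python
import math


def l1_norm(v):
    # Sum of absolute values
    return sum(abs(x) for x in v)


def l2_norm(v):
    # Euclidean length
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    # Dot product divided by the product of L2 norms; 0.0 if either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    denom = l2_norm(a) * l2_norm(b)
    return dot / denom if denom else 0.0


def euclidean_distance(a, b):
    # L2 norm of the difference vector
    return l2_norm([x - y for x, y in zip(a, b)])


def manhattan_distance(a, b):
    # L1 norm of the difference vector
    return l1_norm([x - y for x, y in zip(a, b)])


def recommend(query, items, k=2):
    """Hypothetical helper: return the k item names most cosine-similar to query."""
    ranked = sorted(items.items(), key=lambda kv: cosine_similarity(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    items = {"a": [1, 0.1], "b": [0, 1], "c": [0.9, 0]}
    print(recommend([1, 0], items))
```

Cosine similarity ignores vector length, so `[1, 2]` and `[2, 4]` are maximally similar even though their Euclidean distance is nonzero.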
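As a starting point for the final topic, reconstructing CountVectorizer, here is a minimal object-oriented sketch. The method names (`fit`, `transform`, `fit_transform`, `build_analyzer`, `get_feature_names_out`), the `vocabulary_` attribute, and the default token pattern `(?u)\b\w\w+\b` mirror scikit-learn's public interface, but the implementation is our own simplification: it supports only lowercasing and regex tokenization, and it returns dense lists instead of a SciPy sparse matrix.

```python
import re
from collections import Counter


class SimpleCountVectorizer:
    """Toy re-implementation of CountVectorizer's core; returns dense lists, not sparse matrices."""

    def __init__(self, lowercase=True, token_pattern=r"(?u)\b\w\w+\b"):
        self.lowercase = lowercase
        self.token_pattern = token_pattern

    def build_analyzer(self):
        # Return a callable that turns one document into a list of tokens
        pattern = re.compile(self.token_pattern)

        def analyze(doc):
            if self.lowercase:
                doc = doc.lower()
            return pattern.findall(doc)

        return analyze

    def fit(self, raw_documents):
        # Learn the vocabulary: each distinct token gets a column index in sorted order
        analyze = self.build_analyzer()
        terms = {tok for doc in raw_documents for tok in analyze(doc)}
        self.vocabulary_ = {term: idx for idx, term in enumerate(sorted(terms))}
        return self

    def transform(self, raw_documents):
        # Count tokens per document; tokens not seen during fit are ignored
        analyze = self.build_analyzer()
        rows = []
        for doc in raw_documents:
            row = [0] * len(self.vocabulary_)
            for tok, count in Counter(analyze(doc)).items():
                idx = self.vocabulary_.get(tok)
                if idx is not None:
                    row[idx] += count
            rows.append(row)
        return rows

    def fit_transform(self, raw_documents):
        docs = list(raw_documents)  # materialize so generators can be iterated twice
        return self.fit(docs).transform(docs)

    def get_feature_names_out(self):
        # Feature names ordered by column index
        return sorted(self.vocabulary_, key=self.vocabulary_.get)


if __name__ == "__main__":
    vec = SimpleCountVectorizer()
    print(vec.fit_transform(["The cat sat", "the cat and the hat"]))
    print(vec.get_feature_names_out())
```

A natural next step for the exercise is to add options such as `stop_words`, `ngram_range`, or sparse output and compare results against scikit-learn's own class.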