Explain the differences between the loss functions used for logistic regression and the perceptron.
Explain how the chain rule and the Markov assumption are used to estimate the maximum likelihood of a word sequence.
Explain the differences between Laplace smoothing and discount smoothing.
Submit quiz6.pdf including your answers to https://canvas.emory.edu/courses/57068/assignments/218227