EECE703A Machine Learning for Natural Language Processing

 

1. Course Objectives

 

 

This course covers the basic theory and practice of machine learning techniques suited to natural language processing. The first half of the lectures is devoted to machine learning techniques such as decision trees, Bayesian learning, neural networks, instance-based learning, learning sets of rules, support vector machines, maximum entropy learning, and other hybrid learning methods. The second half is devoted to applying these techniques to diverse natural language processing tasks such as POS tagging, parsing, information extraction, web mining, speech processing, bio-text mining, and question-answering systems.

 

2. Prerequisites

No prerequisites, but the instructor's consent is recommended

 

3. Grading

midterm 50%

assignments 20%

class presentation 30%

 

4. Text and references

Tom Mitchell, Machine Learning, WCB/McGraw-Hill, 1997 (first half)

Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999 (second half)

Related proceedings (ACL/COLING, NAACL, ICSLP, ASRU, ICASSP, etc.) and some selected papers (second half)

 

5. Course Schedule

1st week: Introduction to machine learning/NLP

2nd week: Decision tree learning - 1st homework

3rd week: Artificial neural network learning

4th week: Bayesian learning (MLE, MAP)

5th week: Instance-based learning - 2nd homework

6th week: Learning sets of rules/first-order logic (FOL) learning

7th week: Reinforcement learning

8th week: Support vector machines (SVM) - 3rd homework

9th week: Hidden Markov Models (HMM)

10th week: Log-linear models/Maximum entropy models

11th week: Conditional random fields (CRF) - last homework

12th week: Machine translation applications

13th week: Dialog system applications

14th week: Student presentations (NLP/speech applications)

15th week: Student presentations (NLP/speech applications)

16th week: Student presentations (NLP/speech applications)

 

6. Notes

- Students will complete hands-on exercises on learning-based NLP tasks such as POS tagging and parsing, for both Korean and English

- All lectures, presentations, and discussions are conducted in English