CMP711 Natural Language Processing

Semester: Fall 2024

Instructor: Prof. Dr. İlyas Çiçekli

Email: ilyas@cs.hacettepe.edu.tr

Class Hours: Wednesday 13:30-16:30

Classroom: D6


Textbook

1. Daniel Jurafsky and James H. Martin, "Speech and Language Processing", Third Edition, Prentice Hall, 2024.

Other References

1. Christopher D. Manning and Hinrich Schütze, "Foundations of Statistical Natural Language Processing", The MIT Press, 1999.

2. Steven Bird, Ewan Klein, and Edward Loper, "Natural Language Processing with Python", O'Reilly Media Inc., 2009.

Grading

Project: 45%

Final Exam: 40%

Homework: 15%

Project

Each student will carry out a survey of an advanced topic in the NLP field together with a computational project. You should read at least two or three major papers in that field and prepare a professionally written paper (in the format of a conference or journal paper) for your project. At the end of the semester, you will submit your paper together with copies of the major papers you read, and you will give a demo of your project.

Possible Project Topics

You should write a one-page project proposal that includes your project title, references to the major papers related to your project, and a description of your project.

Project Proposal Due Date: 21 October 2024

You should submit your project proposal before the due date. Your project proposal (a single one-page PDF file) should include your project title, references to the major papers related to your project, and a short description of your project.

Midway Project Report Due Date: 6 December 2024

You should submit your midway project report as a single PDF file before the due date, which means you should complete part of your project work by the midway point. The midway project report uses the same format as the final project report but is a shorter version of it. It should include the details of your project, such as a related work section, the problem description, and what you have done up to the midway point.

Project Demo Date: 15 January 2025 (or before) (HARD DEADLINE)

You must demo your project to me by this date. You should submit all of your source files and executable files before your demo day.

Final Project Report Due Date: 15 January 2025 (or before) (HARD DEADLINE)

Your project is NOT complete until you submit all of the following. You should send them as a single zipped file before your demo.

1. A soft copy of your final project report. Your final project report must be in journal article format. It should include a title, an abstract, an introduction section, an evaluation section, a conclusion section, references, and other relevant sections. You should use the IEEE style file (IEEEFormat) for your report. Your final project report should be a single PDF file.

2. All of your source code files, executable files, and any other files related to your project (including sample input-output files and a readme file explaining how to execute your project).

3. Soft copies of the papers in your survey.


Course Outline:

Chapter references are to the 3rd edition of the textbook unless otherwise noted.

Week 1: Introduction/Overview of NLP (Ch. 1)

Week 2: Regular Expressions, Text Normalization, Edit Distance (Ch. 2)

Week 3: N-gram Language Models (Ch. 3)

Week 4: Spelling Correction, Part-of-Speech Tagging (Ch. 8 and Appendix B)

Week 5: Text Classification: Naive Bayes (Ch. 4)

Week 6: Text Classification: Logistic Regression (Ch. 5)

Week 7: Vector Semantics (Ch. 6)

Week 8: Neural Networks and Neural Language Models (Ch. 7)

Week 9: RNNs and LSTMs (Ch. 9)

Week 10: Transformers and Large Language Models (Ch. 10)

Week 11: Fine-Tuning and Masked Language Models (Ch. 11)

Week 12: Morphological Processing (Ch. 3 of the 2nd edition)

Week 13: Context-Free Grammars and Syntactic Parsing (Ch. 17 and other sources)

Week 14: Statistical Parsing (Ch. 18 and other sources)

Lecture Notes:

· lec01-introduction.pdf

· lec02-1-BasicTextProcessing.pdf

· lec02-2-MinimumEditDistance.pdf

· lec03-LanguageModels.pdf

· lec04-1-SpellingCorrection.pdf

· lec04-2-PartOfSpeechTagging.pdf

· lec05-TextClassificationNaiveBayes.pdf

· lec06-LogisticRegression.pdf

· lec07-VectorSemantics_Word2vec.pdf

· lec08-NN_NeuralLanguageModels.pdf

· lec09-RNNs_LSTMs.pdf

· lec10-Transformers_LLMs.pdf

· lec11-BidirectionalTransformerEncoders.pdf

· lec12-MorphologicalProcessing.pdf

· lec13-1-SyntacticParsing.pdf

· lec13-2-StatisticalParsing.pdf

Announcements:

· I will use the HADI system (https://hadi.hacettepe.edu.tr/login/) for all course announcements. All course materials, including your grades, will be available on the HADI system, and you will submit your assignments through the HADI system.