BIL 722: Advanced Topics in Computer Vision
(Deep Learning for Computer Vision)
Google's Deep Dream re-interprets Georges Seurat's A Sunday Afternoon on the Island of La Grande Jatte (1884). Generated image courtesy of Alex Korbonits.
This is a graduate seminar course exploring recent advances in computer vision, with a special focus on deep learning. In particular, the class will take an in-depth look at common deep architectures and their applications to various problems in computer vision. The topics include image/scene/video classification, object detection, segmentation, action/activity recognition, image captioning and visual question answering.
Time and Location
Lectures: Tuesday at 13:30-16:30 (Room D9)
The course webpage will be updated regularly throughout the semester with lecture notes, presentations, assignments and important deadlines. All other course-related communications will be carried out through Piazza. Please enroll by following the link https://piazza.com/hacettepe.edu.tr/spring2016/bil722.
Students are expected to have taken courses in computer vision and/or machine learning (e.g. BBM 406, BBM 416, BIL 712, BIL 719) and to have good programming skills for the assignment(s) and the course project.
Course Requirements and Grading
Grading for BIL 722 will be based on:
- homework (20%),
- course project, done in pairs (presentation and reports) (40%),
- paper presentations (25%),
- class participation (attendance, participation in discussions, response papers) (15%).
| Date | Topic | Presenters |
|---|---|---|
| Feb 9 | Background and Basics [slides]: course information, what is deep learning, linear classification, nearest neighbor classifiers, hyperparameter search, cross-validation, loss functions, stochastic gradient descent | |
| Feb 16 | Training Neural Networks [slides]: feedforward neural networks, activation functions, backpropagation, hyperparameter optimization, weight initialization, batch normalization, dropout | |
| Feb 23 | Convolutional Neural Networks (ConvNets) [slides]; Caffe Tutorial [slides] | Aykut Erdem, Cagdas Bak |
| Mar 1 | Course Project [slides]; ConvNets In Practice: Image Classification | Hilal Ergun Akyuz, Mehmet Gunel, Semih Yagcioglu |
| Mar 8 | ConvNets In Practice: Scene Classification and Object Detection; TensorFlow Tutorial [slides] | Bora Celikkale, Kemal Cizmeciler, Goksu Erdogan, M. Kerim Yucel |
| Mar 15 | ConvNets In Practice: Segmentation | Mehmet Gunel, Goksu Erdogan |
| Mar 22 | ConvNets In Practice: Video Classification; Theano Tutorial [slides] | Cemil Zalluhoglu, Iman Rezazadeh, Cagdas Bak, Semih Yagcioglu |
| Mar 29 | ConvNets In Practice: Misc | Aysun Kocak, Okay Arik, Berkan Demirel |
| Apr 5 | Recurrent Neural Networks (RNNs) [slides]: backpropagation through time (BPTT), memory units, LSTMs | Nazli Ikizler Cinbis |
| Apr 12 | RNNs In Practice: Language and Vision; Keras Tutorial [slides] | Berkan Demirel, Mert Kilickaya, Muhammet Ali Asan |
| Apr 19 | Progress Presentations | |
| Apr 26 | RNNs In Practice: Video Classification | Ozge Yalcinkaya, Mehmet Kerim Yucel, Ezgi Peksen Soysal |
| May 3 | RNNs In Practice: Object Recognition and Segmentation | Ceren Guzel Turhan, Semih Yagcioglu, Okay Arik |
| May 10 | Unsupervised Deep Learning: Boltzmann machines and log-bilinear models, autoencoders | Hilal Ergun Akyuz, Levent Karacan |
Depending on the class enrollment, each student is required to present one or two papers over the course of the semester. Each presentation should be clear, well organized and very technical, and roughly 30 minutes long. The presenter should read the assigned paper in detail and be prepared to effectively lead the class discussion on the paper.
To prepare your presentation, you can use any presentation tool (e.g., PowerPoint, Keynote, LaTeX) provided that the tool can export the slides to PDF. You are allowed to reuse material that already exists on the web as long as you clearly cite the source of any media used in your presentation. Extra credit will be awarded to students who also conduct experiments demonstrating how the method works in practice.
Deadline: You should meet with the instructor 3-4 days before the presentation date to discuss your slides, and the presentation should be submitted by the night before the class.
- High-level overview of the paper (main contributions)
- Problem statement and motivation (clear definition of the problem, why it is interesting and important)
- Key technical ideas (overview of the approach)
- Experimental set-up (datasets, evaluation metrics, applications)
- Strengths and weaknesses (discussion of the results obtained)
- Connections with other work (how it relates to other approaches, its similarities and differences)
- Future directions (open research questions)
The presentations will be graded according to this rubric.
Due: March 15, 2016 (12:30pm)
In this homework, you will learn, through first-hand experience, how to fine-tune a pre-trained model to classify cultural events using the image data from the ChaLearn Looking at People 2015 Challenge (CVPR 2015).
In particular, the purpose of this homework is to familiarize you with the fundamentals of training and understanding convolutional networks, namely
- applying dropout, batch normalization and data augmentation to reduce overfitting,
- combining models into ensembles to improve the performance,
- using transfer learning to adapt a pre-trained model to a new dataset,
- using data gradients to visualize saliency maps.
You can use the deep learning framework of your choice (e.g. Caffe, Torch, Theano, Keras, etc.) as long as your implementation meets the requirements stated above.
For more details on the homework, see this page.
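As a rough illustration of two items from the homework list (combining models into an ensemble, and using data gradients as a saliency map), here is a minimal, framework-free sketch. It is not the homework solution. The toy linear classifiers and the helper names (`make_linear_model`, `ensemble_predict`, `saliency`) are hypothetical, and central finite differences stand in for the backpropagated gradients that a deep learning framework would compute.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(models, x):
    """Average the class probabilities of several models.

    Returns the predicted label and the averaged probabilities.
    """
    all_probs = [softmax(model(x)) for model in models]
    n_classes = len(all_probs[0])
    mean_probs = [
        sum(p[c] for p in all_probs) / len(all_probs) for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: mean_probs[c])
    return label, mean_probs


def saliency(model, x, target, eps=1e-4):
    """Absolute gradient of the target class score w.r.t. each input value.

    Assumption: central finite differences replace backpropagation here.
    """
    grads = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += eps
        minus = list(x)
        minus[i] -= eps
        g = (model(plus)[target] - model(minus)[target]) / (2 * eps)
        grads.append(abs(g))
    return grads


def make_linear_model(weights):
    """Hypothetical toy classifier: one weight row per class, no bias."""
    def model(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return model


# Two toy "networks" and a flattened three-value input (illustrative values).
model_a = make_linear_model([[2.0, 0.0, -1.0], [0.0, 1.0, 0.5]])
model_b = make_linear_model([[1.0, 0.5, 0.0], [0.0, 2.0, 1.0]])
x = [1.0, 0.2, 0.3]

label, probs = ensemble_predict([model_a, model_b], x)
sal = saliency(model_a, x, target=label)
print("ensemble label:", label)
print("saliency of model_a:", [round(s, 3) for s in sal])
```

For a linear model the saliency values are simply the absolute weights of the target class, which makes the toy case easy to check by hand. With a real ConvNet, the same quantity would be obtained by backpropagating the class score to the input image.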
Students taking the course are required to complete a research-oriented project. Students can work individually or in pairs. The course project may involve
- Design of a novel approach and its experimental analysis, or
- An extension to a recent study of non-trivial complexity and its experimental analysis.
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press (in preparation) (draft available online)
- CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Justin Johnson, Stanford University
- CSC2523: Deep Learning in Computer Vision, Sanja Fidler, University of Toronto
- CSC321: Introduction to Neural Networks and Machine Learning, Tijmen Tieleman, University of Toronto
- Deep Learning, Yann LeCun, New York University
- ECE 6504 Deep Learning for Perception, Dhruv Batra, Virginia Tech
- Machine Learning, Nando de Freitas, Oxford University