Detailed Syllabus and Lectures


Lecture 10: Modeling the Physical World (slides)

physical scene understanding, intuitive physics, interaction networks, relation networks, visual interaction networks, learning physics engines via graph networks

Please study the following material in preparation for the class:

Required Reading:



Lecture 9: Graph Networks (slides)

graph structured data, graph neural nets (GNNs), GNNs for "classical" network problems

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:



Lecture 8: Image Synthesis (slides)

image synthesis via generative models, conditional generative models, structured vs unstructured prediction, image-to-image translation, generative adversarial networks, cycle-consistent adversarial networks

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:



Lecture 7: Embodied Vision (slides)

imitation learning, reinforcement learning, why vision?, connecting language and vision to actions, case study: embodied QA

Please study the following material in preparation for the class:

Required Reading:

Environments:



Lecture 6: Deep Reinforcement Learning (slides)

case studies (and a bit of history), formalizing reinforcement learning, policy gradient methods, temporal differences, Q-learning

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:

Additional Materials:



Lecture 5: Language and Vision (slides)

image captioning, visual question answering, neural module networks

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:



Lecture 4: Multimodality (slides)

what is multimodality, core technical challenges (representation learning, translation, alignment, fusion and co-learning), multimodal representation learning (joint representations, coordinated representations), multimodal fusion

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:



Lecture 3: Sequential Processing with NNs, Attention (slides)

sequential data, convolutions in time, recurrent neural networks (RNNs), autoregressive generative models, attention models, transformer

Please study the following material in preparation for the class:

Required Reading:

Suggested Video Material:



Lecture 2: Neural Networks Basics, Spatial Processing with CNNs (slides)

deep learning, computation in a neural net, optimization, backpropagation, training tricks, convolutional neural networks

Please study the following material in preparation for the class:

Required Reading:



Lecture 1: Introduction to the course (slides)

course information, can machines see like humans?, the current state of the art in computer vision

Please study the following material in preparation for the class:

Required Reading: