Deep Learning

Introduction to learning: linear classifiers

S. Lazebnik - presentation No. 3.
Slides to study: 1-20, (24-28), 30-31, 34-36, 39-47
Further study materials

F. Chollet: Deep Learning with Python, Chapter 2.4.

R. Neruda, J. Šíma: Teoretické otázky neuronových sítí (Theoretical Questions of Neural Networks), Chapter 2.1 (optional material).

P. Sosík: Skripta z neuronových sítí (Lecture Notes on Neural Networks), Chapter 2.1 (optional material).

Mean squared error (3 points)
Consider a training set with 30 samples (x_i, y_i) and the mean squared error (MSE) loss shown on slide 18 of presentation No. 3 - Linear Classifiers. When will the total loss be lower: (a) when the difference between the neuron output w * x_i and the desired value y_i is 1 for every training sample, or (b) when ten samples have a difference of 2 and the remaining samples a difference of 0? Justify your answer. Hint: just use the formula for MSE.
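
For reference, a minimal worked check, assuming the usual definition of MSE over N samples (slide 18 may include an extra constant factor, which would not change the comparison):

\[
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( w x_i - y_i \right)^2
\]

With N = 30, case (a) gives \frac{1}{30} \cdot 30 \cdot 1^2 = 1, while case (b) gives \frac{1}{30} \cdot 10 \cdot 2^2 = \frac{4}{3}.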
Gradient descent (3 points)
Explain what the gradient of a loss function is and what its geometric meaning is when we view the loss function as a hypersurface. How is the gradient used to minimize the loss in stochastic gradient descent (SGD)? What does the word "stochastic" refer to in this method?
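
As an illustration, a minimal numpy sketch of SGD for a single linear neuron with squared-error loss; the synthetic data, learning rate, and epoch count are illustrative assumptions, not taken from the slides. Each weight update uses the gradient of the loss for one randomly chosen sample, which is what "stochastic" refers to.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic 1-D data: targets follow y ≈ 3*x plus noise (illustrative only).
    x = rng.uniform(-1.0, 1.0, size=30)
    y = 3.0 * x + rng.normal(scale=0.1, size=30)

    w = 0.0                 # weight of the linear neuron, output = w * x_i
    learning_rate = 0.1

    for epoch in range(20):
        for i in rng.permutation(len(x)):   # "stochastic": samples visited in random order
            error = w * x[i] - y[i]
            grad = 2.0 * error * x[i]       # d/dw of the per-sample loss (w*x_i - y_i)^2
            w -= learning_rate * grad       # step against the gradient (steepest descent)

    print(f"learned w = {w:.3f}")

Because the gradient points in the direction of steepest ascent of the loss hypersurface, stepping in the opposite direction decreases the loss locally; using one sample (or a small batch) per step makes each update a noisy but cheap estimate of the full gradient.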