Summer Session 2024 Midterm Exam



Instructor(s): Nishant Kheterpal

This exam was administered in person. Students had 90 minutes to take this exam.


Problem 1

Source: Summer Session 2 2024 Midterm, Problem 1a-d

Consider a dataset of n values, y_1, y_2, \dots, y_n, all of which are non-negative. We are interested in fitting a constant model, H(x) = h, to the data using the "Jack" loss function, defined as:

L_{\text{Jack}}(y_i, h) = \begin{cases} \alpha \cdot (y_i - h)^2 & \text{if } y_i \geq h, \\ \beta \cdot |y_i - h|^3 & \text{if } y_i < h, \end{cases}

where \alpha and \beta are positive constants that weight the squared and cubic loss components differently depending on whether the prediction h underestimates or overestimates the true value y_i.


Problem 1.1

Find \frac{d L_\text{Jack}}{d h}, the derivative of the Jack loss function with respect to h. Show your work, and put a \boxed{\text{box}} around your final answer.

\frac{d L_{\text{Jack}}}{d h} = \begin{cases} -2 \alpha(y_i - h) & \text{if } y_i \geq h, \\ 3 \beta |h - y_i|^2 & \text{if } y_i < h, \end{cases}

Looking at the loss function (displayed again for your convenience), we can start by taking the derivative of the first case, when y_i \geq h:

L_{\text{Jack}}(y_i, h) = \begin{cases} \alpha \cdot (y_i - h)^2 & \text{if } y_i \geq h, \\ \beta \cdot |y_i - h|^3 & \text{if } y_i < h, \end{cases}

To find the derivative of the first case we will use the chain rule. The chain rule states that if F(h) = f(g(h)), then F'(h) = f'(g(h)) \cdot g'(h). We start with \frac{\partial}{\partial h} \alpha(y_i - h)^2 and treat f(h) = h^2 and g(h) = y_i - h, so that f'(h) = 2h and g'(h) = -1. Combining all of these parts we get

\frac{\partial}{\partial h} \alpha(y_i - h)^2 = 2\alpha(y_i - h) \cdot (-1) = -2 \alpha(y_i - h)

We can now look at the second case, when y_i < h. Once again we will use the chain rule. We start with \frac{\partial}{\partial h} \beta \cdot |y_i - h|^3 and treat f(h) = h^3 and g(h) = |y_i - h|. We find f'(h) = 3h^2, but g'(h) is a bit trickier.

The derivative of |y_i - h| with respect to h does not exist at h = y_i, so g'(h) is a piecewise function:

g'(h) = \begin{cases} -1 & \text{if } y_i > h, \\ \text{undefined} & \text{if } y_i = h,\\ 1 & \text{if } y_i < h. \end{cases}

Since we only care about the case y_i < h, we can replace g'(h) with 1. This means we have

\frac{\partial}{\partial h} \beta \cdot |y_i - h|^3 = 3 \beta |y_i - h|^2 \cdot 1 = 3 \beta |h - y_i|^2

\boxed{\frac{d L_{\text{Jack}}}{d h} = \begin{cases} -2 \alpha(y_i - h) & \text{if } y_i \geq h, \\ 3 \beta |h - y_i|^2 & \text{if } y_i < h, \end{cases}}
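
If you'd like to sanity-check the boxed derivative numerically, here is a minimal Python sketch that compares it against a centered finite difference, using arbitrary made-up values for \alpha, \beta, y_i, and h (none of these numbers come from the exam):

```python
def jack_loss(y, h, alpha, beta):
    # piecewise "Jack" loss for a single data point
    return alpha * (y - h) ** 2 if y >= h else beta * abs(y - h) ** 3

def jack_loss_deriv(y, h, alpha, beta):
    # derivative from the boxed answer above
    return -2 * alpha * (y - h) if y >= h else 3 * beta * abs(h - y) ** 2

alpha, beta, eps = 2.0, 0.5, 1e-6
for y, h in [(5.0, 3.0), (5.0, 7.0), (1.0, 4.0)]:
    # centered finite difference approximation of dL/dh
    numeric = (jack_loss(y, h + eps, alpha, beta) - jack_loss(y, h - eps, alpha, beta)) / (2 * eps)
    print(round(numeric, 4), jack_loss_deriv(y, h, alpha, beta))  # the two values should match closely
```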


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 66%.



Problem 1.2

Prove that for the constant prediction h^* minimizing empirical risk for the Jack loss function, the following quantities sum to 0:

  1. the sum of the derivatives of the losses for the overestimated values (when h^* > y_i), and
  2. the sum of the derivatives of the losses for the underestimated values (when h^* \leq y_i).

Hint: You do not need to solve for the optimal value of h^*. Instead, walk through the general process of minimizing a risk function, and think about the equations you come across.

Once again, recall Jack loss:

L_{\text{Jack}}(y_i, h) = \begin{cases} \alpha \cdot (y_i - h)^2 & \text{if } y_i \geq h, \\ \beta \cdot |y_i - h|^3 & \text{if } y_i < h, \end{cases}

Each data point falls into exactly one of the two cases, so when we build the empirical risk we can split the sum over the data into the points with y_i \geq h and the points with y_i < h.

Recall that empirical risk averages the loss over the dataset: R(h) = \frac{1}{n} \sum_{i = 1}^{n} L(y_i, h). The hint tells us to go through the motions of finding h^*: write down the risk, take its derivative with respect to h, and set the derivative equal to zero.

\begin{align*} R_{L_{\text{Jack}}}(h) &= \frac{1}{n} \left(\sum_{y_i \geq h} \alpha \cdot (y_i - h)^2 + \sum_{y_i < h} \beta \cdot |y_i - h|^3\right)\\ &\text{To get the derivative, use the result from the previous part:}\\ \frac{d R_{L_{\text{Jack}}}}{d h} &= \frac{1}{n} \left(\sum_{y_i \geq h} -2 \alpha(y_i - h) + \sum_{y_i < h} 3 \beta |h - y_i|^2\right) \end{align*}

From here we set the derivative \frac{d R_{L_{\text{Jack}}}}{d h} equal to zero.

\begin{align*} 0 &= \frac{d R_{L_{\text{Jack}}}}{d h} \\ 0 &= \frac{1}{n} \left(\sum_{y_i \geq h} -2 \alpha(y_i - h) + \sum_{y_i < h} 3 \beta |h - y_i|^2\right) \\ 0 &= -2 \alpha \sum_{y_i \geq h} (y_i - h) + 3 \beta \sum_{y_i < h} |h - y_i|^2 \end{align*}

The minimizer h^* satisfies this equation. The first sum is exactly the sum of the derivatives of the losses for the underestimated values (h^* \leq y_i), and the second sum is the sum of the derivatives of the losses for the overestimated values (h^* > y_i). Since the derivative of the risk is zero at h^*, these two quantities sum to 0, as required.
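
As a numerical illustration (using a made-up dataset and arbitrary positive \alpha and \beta), the sketch below minimizes the empirical Jack risk over a fine grid of candidate h values and checks that the two derivative sums approximately cancel at the minimizer:

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical data, not from the exam
alpha, beta = 1.0, 0.3

def risk(h):
    under = y >= h   # first case: h underestimates (or equals) y_i
    over = y < h     # second case: h overestimates y_i
    return (alpha * (y[under] - h) ** 2).sum() + (beta * np.abs(y[over] - h) ** 3).sum()

# minimize the empirical risk over a fine grid of candidate h values
hs = np.linspace(y.min(), y.max(), 20001)
h_star = hs[np.argmin([risk(h) for h in hs])]

# the two derivative sums from the previous part, evaluated at h_star
d_under = (-2 * alpha * (y[y >= h_star] - h_star)).sum()
d_over = (3 * beta * np.abs(h_star - y[y < h_star]) ** 2).sum()
print(h_star, d_under + d_over)   # the sum is close to zero (up to the grid resolution)
```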


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 34%.


Problem 1.3

For any set of values y_1, y_2, ..., y_n in sorted order y_1 \leq y_2 \leq ... \leq y_n, evaluate h^* when \alpha = 0.

Any number \leq y_1

When \alpha = 0 the loss function becomes:

L_{\text{Jack}}(y_i, h) = \begin{cases} 0 & \text{if } y_i \geq h, \\ \beta \cdot |y_i - h|^3 & \text{if } y_i < h, \end{cases}

This means if y_i \geq h then the loss for that data point is 0. In contrast, if y_i < h then the point contributes \beta \cdot |y_i - h|^3 to the loss. We want the total loss to be as small as possible, so we want an h^* that avoids any positive contribution to the loss. If we choose h^* to be at most the smallest value in the dataset, then every y_i \geq h^*, every point falls into the first case, and the total loss is 0. This means any number \leq y_1 works.

For example, with the dataset 2, 4, 6, 8, 10, choosing h^* = 1 means every point falls into the first case, so the total loss is 0.
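
A quick check of this example in Python, with \alpha = 0 and an arbitrary positive value chosen for \beta:

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
alpha, beta = 0.0, 0.3          # alpha = 0; beta is an arbitrary positive value

def jack_risk(h):
    under = y >= h              # squared-loss case
    over = y < h                # cubic-loss case
    return (alpha * (y[under] - h) ** 2).sum() + (beta * np.abs(y[over] - h) ** 3).sum()

print(jack_risk(1.0))           # 0.0: every y_i >= 1, so no point incurs loss
print(jack_risk(3.0))           # 0.3: y_1 = 2 < 3 now falls into the cubic case
```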


Difficulty: ⭐️⭐️⭐️⭐️⭐️

The average score on this problem was 6%.


Problem 1.4

For any set of values y_1, y_2, ..., y_n in sorted order y_1 \leq y_2 \leq ... \leq y_n, evaluate h^* when \beta = 0.

Any number > y_n

When \beta = 0 the loss function becomes:

L_{\text{Jack}}(y_i, h) = \begin{cases} \alpha \cdot (y_i - h)^2 & \text{if } y_i \geq h, \\ 0 & \text{if } y_i < h, \end{cases}

This means if y_i < h then the loss for that data point is 0. In contrast, if y_i \geq h then the point contributes \alpha \cdot (y_i - h)^2 to the loss. We want the total loss to be as small as possible, so we want an h^* that avoids any positive contribution to the loss. If we choose h^* to be greater than every value in the dataset, then every y_i < h^*, every point falls into the second case, and the total loss is 0. This means any number > y_n works.

For example, with the dataset 2, 4, 6, 8, 10, choosing h^* = 12 means every point falls into the second case, so the total loss is 0.
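
A symmetric check in Python, with \beta = 0 and an arbitrary positive value chosen for \alpha:

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
alpha, beta = 0.5, 0.0          # beta = 0; alpha is an arbitrary positive value

def jack_risk(h):
    under = y >= h              # squared-loss case
    over = y < h                # cubic-loss case
    return (alpha * (y[under] - h) ** 2).sum() + (beta * np.abs(y[over] - h) ** 3).sum()

print(jack_risk(12.0))          # 0.0: every y_i < 12, so no point incurs loss
print(jack_risk(9.0))           # 0.5: y_5 = 10 >= 9 now falls into the squared case
```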


Difficulty: ⭐️⭐️⭐️⭐️⭐️

The average score on this problem was 6%.



Problem 2

Source: Summer Session 2 2024 Midterm, Problem 2a-b

Prove the following statement or provide a counterexample:


Problem 2.1

If a vector \vec{v} is orthogonal to a vector \vec{w}, then \vec{v} is also orthogonal to the projection of an arbitrary vector \vec{u} onto \vec{w}.

The statement above is true.

Let the following information hold:

  • c is a scalar.
  • c\vec{w} is the projection of \vec{u} onto \vec{w}.

When a vector is orthogonal to another their dot product will equal zero. This means \vec{v} \cdot \vec{w} = 0.

From lecture we know: c\vec{w} = \frac{\vec{w} \cdot \vec{u}}{\vec{w} \cdot \vec{w}} \cdot \vec{w}

So now we can compute (c\vec{w}) \cdot \vec{v} = \frac{\vec{w} \cdot \vec{u}}{\vec{w} \cdot \vec{w}} (\vec{w} \cdot \vec{v}). The factor \vec{w} \cdot \vec{v} is zero, and anything multiplied by zero is also zero, so (c\vec{w}) \cdot \vec{v} = 0. This means the projection of \vec{u} onto \vec{w} is orthogonal to \vec v.
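
A quick numerical illustration of this argument, using randomly generated vectors (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)
u = rng.normal(size=3)

# construct a v orthogonal to w by removing w's component from a random vector
v = rng.normal(size=3)
v = v - (v @ w) / (w @ w) * w
print(v @ w)                          # ~0 (floating-point noise)

proj_u_onto_w = (w @ u) / (w @ w) * w
print(v @ proj_u_onto_w)              # ~0: v is orthogonal to the projection too
```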


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 52%.



Problem 2.2

If a vector \vec{v} is orthogonal to a vector \vec{w}, then \vec{v} is also orthogonal to the projection of \vec{w} onto an arbitrary vector \vec{u}.

The statement above is false.

Let the following hold:

  • \vec v = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
  • \vec w = \begin{bmatrix} -2 \\ 1 \end{bmatrix}
  • \vec u = \begin{bmatrix} 0 \\ 1 \end{bmatrix}

Let’s check that \vec v \cdot \vec w = 0: (1)(-2)+(2)(1) = 0. That worked! Now we can calculate the projection of \vec w onto \vec u.

\begin{align*} \frac{\vec w \cdot \vec u}{\vec u \cdot \vec u}\vec u &= \frac{(-2)(0)+(1)(1)}{(0)(0)+(1)(1)} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix}\\ &= 1 \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix}\\ &= \begin{bmatrix} 0 \\ 1 \end{bmatrix} \end{align*}

Now we can check whether \vec v dotted with the projection of \vec w onto \vec u is equal to zero.

\begin{align*} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} &= (0)(1) + (1)(2) = 2\\ &\text{and } 2 \neq 0 \end{align*}

We can see that \vec v dotted with the projection of \vec w onto \vec u is not equal to zero, so \vec v is not orthogonal to the projection and the statement is false.
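
We can confirm this counterexample numerically with a quick NumPy sketch:

```python
import numpy as np

v = np.array([1, 2])
w = np.array([-2, 1])
u = np.array([0, 1])

print(v @ w)                          # 0: v and w are orthogonal
proj_w_onto_u = (w @ u) / (u @ u) * u
print(proj_w_onto_u)                  # [0. 1.]
print(v @ proj_w_onto_u)              # 2.0, which is not 0
```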


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 50%.



Problem 3

Source: Summer Session 2 2024 Midterm, Problem 2c

Consider the following information:

  • M is an m \times n matrix.
  • N is an n \times n matrix.
  • \vec{v} is an n-dimensional vector.
  • s is a scalar.

Select the dimensionality of each of the objects below:


Problem 3.1

M \vec{v}

An m-dimensional vector.

We know the following information:

  • M is an m \times n matrix.
  • \vec{v} is an n-dimensional vector.

This means:

M = \begin{bmatrix} & & \\ & & \\ & & \\ \end{bmatrix}_{m \times n} \text{ and } \vec{v} = \begin{bmatrix} & \\ & \\ \end{bmatrix}_{n \times 1}

When you multiply M \vec{v}, the inner dimension n cancels, leaving you with an object of size \begin{bmatrix}& \\& \\& \\\end{bmatrix}_{m \times 1}, which is an m-dimensional vector.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.


Problem 3.2

M^T M

An n \times n matrix.

We know the following information:

  • M is an m \times n matrix.

This means:

M^T = \begin{bmatrix} & & & \\ & & & \\ \end{bmatrix}_{n \times m} \text{ and } M = \begin{bmatrix} & & \\ & & \\ & & \\ \end{bmatrix}_{m \times n}

When you multiply M^T M, the inner dimension m cancels, leaving you with an object of size \begin{bmatrix}& & \\& & \\\end{bmatrix}_{n \times n}, which is an n \times n matrix.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.



Problem 3.3

\vec{v}^T N \vec{v}

A scalar.

We know the following information:

  • \vec{v} is an n-dimensional vector.
  • N is an n \times n matrix.

This means:

\vec{v}^T = \begin{bmatrix} & & \end{bmatrix}_{1 \times n} \text{, } N = \begin{bmatrix} & & \\ & & \\ \end{bmatrix}_{n \times n} \text{, and } \vec{v} = \begin{bmatrix} & \\ & \\ \end{bmatrix}_{n \times 1}

When you multiply \vec{v}^T N, the inner dimension n cancels, leaving you with an object of size \begin{bmatrix}& &\end{bmatrix}_{1 \times n}. When you multiply this object by \vec{v}, the inner dimension n cancels again, leaving you with an object of size \begin{bmatrix}&\end{bmatrix}_{1 \times 1}, which is a scalar.


Difficulty: ⭐️

The average score on this problem was 93%.


Problem 3.4

N \vec{v} + s \vec{v}

An n-dimensional vector.

We know the following information:

  • \vec{v} is an n-dimensional vector.
  • N is an n \times n matrix.
  • s is a scalar.

This means:

\vec{v} = \begin{bmatrix} & \\ & \\ \end{bmatrix}_{n \times 1} \text{, } N = \begin{bmatrix} & & \\ & & \\ \end{bmatrix}_{n \times n} \text{, and } s = \begin{bmatrix} & \end{bmatrix}_{1 \times 1}

When you multiply N \vec{v}, the inner dimension n cancels, leaving you with an object of size \begin{bmatrix}& \\& \\\end{bmatrix}_{n \times 1}. When you multiply s \vec{v}, the dimensions of \vec{v} do not change, so you will also have an object of size \begin{bmatrix}& \\& \\\end{bmatrix}_{n \times 1}. When you add these vectors the dimension does not change, so you are left with an n-dimensional vector.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.


Problem 3.5

M^T M \vec{v}

An n-dimensional vector.

We know the following information:

  • \vec{v} is an n-dimensional vector.
  • M is an m \times n matrix.
  • M^T is an n \times m matrix.

This means:

M^T = \begin{bmatrix} & & & \\ & & & \\ \end{bmatrix}_{n \times m} \text{, } M = \begin{bmatrix} & & \\ & & \\ & & \\ \end{bmatrix}_{m \times n} \text{, and } \vec{v} = \begin{bmatrix} & \\ & \\ \end{bmatrix}_{n \times 1}

When you multiply M^T M, the inner dimension m cancels, leaving you with an object of size \begin{bmatrix}& & \\& & \\\end{bmatrix}_{n \times n}, which is an n \times n matrix.

When you multiply \begin{bmatrix}& & \\& & \\\end{bmatrix}_{n \times n} \times \begin{bmatrix}& \\&\\\end{bmatrix}_{n \times 1}, the inner dimension n cancels, leaving you with an n-dimensional vector.
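
All five shapes in this problem can be confirmed with a quick NumPy check, using arbitrary example sizes m = 4 and n = 3:

```python
import numpy as np

m, n = 4, 3                           # example sizes (any m, n work)
M = np.ones((m, n))
N = np.ones((n, n))
v = np.ones(n)
s = 2.0

print((M @ v).shape)                  # (4,)    -> an m-dimensional vector
print((M.T @ M).shape)                # (3, 3)  -> an n x n matrix
print((v @ N @ v).shape)              # ()      -> a scalar
print((N @ v + s * v).shape)          # (3,)    -> an n-dimensional vector
print((M.T @ M @ v).shape)            # (3,)    -> an n-dimensional vector
```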


Difficulty: ⭐️⭐️

The average score on this problem was 81%.



Problem 4

Source: Summer Session 2 2024 Midterm, Problem 2d

Consider the following information:

  • A is a matrix, and \vec{b}, \vec{c}, \vec{d}, and \vec{e} are vectors.
  • \vec{e} = \vec{d} - \vec{c}.
  • \vec{c} = A\vec{b}.

If \vec{e} is orthogonal to every column in A, prove that \vec{b} must satisfy the following equation:

A^T A \vec{b} = A^T \vec{d}.

Hint: If the columns of a matrix A are orthogonal to a vector \vec{v}, then A^T \vec{v} = 0.

To begin with, the hint tells us that because \vec{e} is orthogonal to every column in A, we have A^T \vec e = 0. Interpreting A^T \vec e = 0 using \vec e’s other form, \vec{d} - \vec{c}, we get A^T(\vec d - \vec c) = 0.

We then distribute A^T to get A^T \vec d - A^T \vec c = 0. From here we can move A^T \vec c to the right-hand side to get A^T \vec d = A^T \vec c.

From the third bullet point we know \vec c = A\vec{b}, so we can write A^T \vec d = A^T A \vec b. Thus we have proved that if \vec{e} is orthogonal to every column in A, then \vec{b} must satisfy the equation A^T A \vec{b} = A^T \vec{d}.
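
A quick numerical illustration of the result (a sketch, not a proof), using a randomly generated A and \vec{b} and constructing an \vec{e} orthogonal to the columns of A:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 2))
b = rng.normal(size=2)
c = A @ b

# build an e orthogonal to every column of A: take a random vector and
# subtract its projection onto the column space of A (via least squares)
z = rng.normal(size=5)
e = z - A @ np.linalg.lstsq(A, z, rcond=None)[0]
print(A.T @ e)                        # ~[0, 0]

d = c + e
print(A.T @ A @ b - A.T @ d)          # ~[0, 0]: the normal equations hold
```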


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 58%.


Problem 5

Source: Summer Session 2 2024 Midterm, Problem 3a

Fill in the blanks for each set of vectors below to accurately describe their relationship and span.

\vec{a} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \qquad \vec{b} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}

\vec{a} and \vec{b} are __(i)__, meaning they span a __(ii)__. The vector \vec{c} = \begin{bmatrix} -6 \\ 11 \end{bmatrix} __(iii)__ in the span of \vec{a} and \vec{b}. \vec{a} and \vec{b} are __(iv)__, meaning the angle between them is __(v)__.


Problem 5.1

What goes in __(i)__?

Linearly Independent

Vectors \vec{a} and \vec{b} are linearly independent because they aren’t multiples of each other. They each point in different (but not opposite) directions, so we can’t express either vector as a scalar multiple of the other.


Difficulty: ⭐️⭐️

The average score on this problem was 81%.


Problem 5.2

What goes in __(ii)__?

Plane

Any two linearly independent vectors must span a plane. Geometrically, we can think of all scalar multiples of a single vector as lying along a single line. Then, linear combinations of two vectors pointing in different directions sweep out an entire plane.


Difficulty: ⭐️⭐️

The average score on this problem was 81%.


Problem 5.3

What goes in __(iii)__?

is

A quick way to know that \vec{c} lies in the span of \vec{a} and \vec{b} is because we are dealing with 2-dimensional vectors and we know that \vec{a} and \vec{b} span a plane (so they must span all of \mathbb{R}^2). \vec{c} exists in 2-dimensional space, so it must exist in the span of \vec{a} and \vec{b}.

We can also directly check whether \vec{c} lies in the span of \vec{a} and \vec{b} by asking whether there exist some constants c_1 and c_2 such that c_1 \vec{a} + c_2 \vec{b} = \vec{c}. Plugging in what we know:

c_1 \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_2 \begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} -6 \\ 11 \end{bmatrix}

This gives us the system of equations:

c_1 (1) + c_2 (-2) = -6
c_1 (3) + c_2 (1) = 11

Solving this system gives us c_1 = \frac{16}{7}, c_2 = \frac{29}{7}, which means \vec{c} does lie in the span of \vec{a} and \vec{b}.
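
The same system can also be solved numerically:

```python
import numpy as np

a = np.array([1, 3])
b = np.array([-2, 1])
c = np.array([-6, 11])

# solve c1 * a + c2 * b = c by putting a and b as the columns of a matrix
coeffs = np.linalg.solve(np.column_stack([a, b]), c)
print(coeffs)                          # [2.2857..., 4.1428...] = [16/7, 29/7]
print(coeffs[0] * a + coeffs[1] * b)   # recovers [-6, 11]
```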


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 68%.


Problem 5.4

What goes in __(iv)__?

neither orthogonal nor collinear

The vectors \vec{a} and \vec{b} are orthogonal if \vec{a} \cdot \vec{b} = 0. We can compute \vec{a} \cdot \vec{b} = (1)(-2) + (3)(1) = 1 \neq 0, so \vec{a} and \vec{b} are not orthogonal.

The vectors \vec{a} and \vec{b} are collinear if we can write one vector as a scalar multiple of the other. But we already know \vec{a} and \vec{b} are linearly independent, so they don’t lie along the same line and aren’t collinear.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 68%.


Problem 5.5

What goes in __(v)__?

something else

If two vectors are orthogonal, the angle between them is 90 degrees. If two vectors are collinear, the angle between them is 0 or 180 degrees. Since \vec{a} and \vec{b} are neither of these, then the angle between them must be something else.


Difficulty: ⭐️⭐️

The average score on this problem was 81%.



Problem 6

Source: Summer Session 2 2024 Midterm, Problem 3b

Fill in the blanks for each set of vectors below to accurately describe their relationship and span.

\vec{a} = \begin{bmatrix} -2 \\ 1.5 \\ 4 \end{bmatrix} \qquad \vec{b} = \begin{bmatrix} 1 \\ 2 \\ -2.5 \end{bmatrix} \qquad \vec{c} = \begin{bmatrix} -6 \\ 10 \\ 11 \end{bmatrix}

\vec{a}, \vec{b}, and \vec{c} are __(i)__, meaning they span a __(ii)__. The vector \vec{d} = \begin{bmatrix} -2 \\ 15 \\ -7 \end{bmatrix} __(iii)__ in the span of \vec{a}, \vec{b}, and \vec{c}. \vec{a}, \vec{b}, and \vec{c} are __(iv)__, meaning the angle between them is __(v)__.


Problem 6.1

What goes in __(i)__?

Linearly Dependent

To check whether \vec{a}, \vec{b}, \vec{c} are linearly independent or dependent, it is possible to try a few different linear combinations of two vectors and see if any equals the third vector. In this case, we can see that 4\vec{a} + 2\vec{b} = \vec{c}, so they are linearly dependent. For a more straightforward computation, we can arrange these vectors as the columns of a matrix and compute its rank:

A = \begin{bmatrix} -2 & 1 & -6 \\ 1.5 & 2 & 10 \\ 4 & -2.5 & 11\end{bmatrix}

Then, we can perform row operations to reduce the matrix A. This gives us

RREF(A) = \begin{bmatrix} 1 & 0 & 4 \\ 0 & 1 & 2 \\ 0 & 0 & 0\end{bmatrix}

From here, we see that there are only two pivot columns (leading 1s), so the third column is a linear combination of the first two, which implies the vectors are linearly dependent.
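
Both checks can be reproduced numerically:

```python
import numpy as np

# a, b, c as the columns of a matrix
A = np.column_stack([[-2, 1.5, 4], [1, 2, -2.5], [-6, 10, 11]])
print(np.linalg.matrix_rank(A))        # 2 -> the three columns are linearly dependent
print(4 * A[:, 0] + 2 * A[:, 1])       # [-6. 10. 11.], exactly the third column
```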


Difficulty: ⭐️

The average score on this problem was 94%.


Problem 6.2

What goes in __(ii)__?

Plane

We know from above that the three vectors are linearly dependent. However, they do not all lie on the same line, since we can’t multiply any one of them by a constant to obtain another. If all three vectors were linearly independent, they would span all of \mathbb{R}^3, or 3-dimensional space. Since neither extreme is the case, they must span a plane.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 63%.


Problem 6.3

What goes in __(iii)__?

is not

If \vec{d} were in the span of \vec{a}, \vec{b}, \vec{c}, then there would exist constants that fulfill the equation c_1 \vec{a} + c_2 \vec{b} + c_3 \vec{c} = \vec{d}. We can set this up as an augmented matrix:

A = \begin{bmatrix} -2 & 1 & -6 & -2 \\ 1.5 & 2 & 10 & 15 \\ 4 & -2.5 & 11 & -7 \end{bmatrix}

After performing row operations, we obtain:

RREF(A) = \begin{bmatrix} 1 & 0 & 4 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1\end{bmatrix}

This tells us our system of equations is inconsistent, because the last row implies 0 = 1. Therefore, there does not exist a solution, and \vec{d} is not in the span of \vec{a}, \vec{b}, \vec{c}.
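
A numerical version of the same check, using the least-squares residual instead of row reduction:

```python
import numpy as np

A = np.column_stack([[-2, 1.5, 4], [1, 2, -2.5], [-6, 10, 11]])
d = np.array([-2, 15, -7])

# if d were in the span of the columns, the least-squares residual would be zero
coeffs, _, rank, _ = np.linalg.lstsq(A, d, rcond=None)
print(rank)                            # 2
print(np.linalg.norm(A @ coeffs - d))  # clearly nonzero -> d is not in the span
```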


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 50%.


Problem 6.4

What goes in __(iv)__?

neither orthogonal nor collinear

We can tell that \vec{a}, \vec{b}, \vec{c} are not collinear because their span isn’t a line. In order to check if these vectors are orthogonal, we can compute the dot product between each pair of vectors; if any of the three dot products is not equal to 0, then the vectors are not orthogonal.

We see that \vec{a} \cdot \vec{b} = (-2)(1) + (1.5)(2) + (4)(-2.5) = -9 \neq 0, so the three vectors together are not orthogonal.


Difficulty: ⭐️⭐️

The average score on this problem was 81%.


Problem 6.5

What goes in __(v)__?

something else

If two vectors are orthogonal, the angle between them is 90 degrees. If two vectors are collinear, the angle between them is 0 or 180 degrees. Since each pair of the vectors \vec{a}, \vec{b}, \vec{c} are neither of these, then the angles between them must be something else.


Difficulty: ⭐️⭐️

The average score on this problem was 81%.



Problem 7

Source: Summer Session 2 2024 Midterm, Problem 3c

Fill in the blanks for each set of vectors below to accurately describe their relationship and span.

\vec{a} = \begin{bmatrix} -1 \\ 4 \end{bmatrix} \qquad \vec{b} = \begin{bmatrix} 0.5 \\ -2 \end{bmatrix}

\vec{a} and \vec{b} are __(i)__, meaning they span a __(ii)__. The vector \vec{c} = \begin{bmatrix} -3 \\ 4 \end{bmatrix} __(iii)__ in the span of \vec{a} and \vec{b}. \vec{a} and \vec{b} are __(iv)__, meaning the angle between them is __(v)__.


Problem 7.1

What goes in __(i)__?

Linearly Dependent

Upon inspection, we can see that \vec{b} = -\frac{1}{2}\vec{a}. Therefore, these vectors are linearly dependent.


Difficulty: ⭐️⭐️

The average score on this problem was 81%.


Problem 7.2

What goes in __(ii)__?

Line

Since we found that \vec{a} and \vec{b} are linearly dependent, they must span a space of only one dimension. This means that \vec{a} and \vec{b} lie on the same line.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.


Problem 7.3

What goes in __(iii)__?

is not

We know that \vec{a} and \vec{b} span a line consisting only of vectors that are scalar multiples of \vec{a} or \vec{b}. We can see that the first component of \vec{c} is three times larger than the first component of \vec{a}, but they both share the same second component. This means it is not possible to scale \vec{a} by a constant to obtain \vec{c}, and so \vec{c} does not lie on the same line spanned by \vec{a} and \vec{b}.


Difficulty: ⭐️

The average score on this problem was 93%.


Problem 7.4

What goes in __(iv)__?

Collinear

The vectors \vec{a} and \vec{b} are collinear if we can write one vector as a scalar multiple of the other. We saw that \vec{b} = -\frac{1}{2}\vec{a}, so they are collinear.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 62%.


Problem 7.5

What goes in __(v)__?

0 or 180 degrees

If two vectors are collinear, the angle between them is 0 or 180 degrees. This is because they lie on the same line and either point in the same or opposite directions from the origin.


Difficulty: ⭐️⭐️

The average score on this problem was 81%.



Problem 8

Source: Summer Session 2 2024 Midterm, Problem 3d

Fill in the blanks for each set of vectors below to accurately describe their relationship and span.

\vec{a} = \begin{bmatrix} -3 \\ -2 \end{bmatrix} \qquad \vec{b} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}

\vec{a} and \vec{b} are __(i)__, meaning they span a __(ii)__. The vector \vec{c} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} __(iii)__ in the span of \vec{a} and \vec{b}. \vec{a} and \vec{b} are __(iv)__, meaning the angle between them is __(v)__.


Problem 8.1

What goes in __(i)__?

Linearly Independent

We can see that one vector is not a scalar multiple of the other. \vec{a} has both negative components, so we would only be able to obtain scalar multiples with both negative or both positive components. On the other hand, \vec{b} has one negative and one positive component. So they must be linearly independent.


Difficulty: ⭐️⭐️

The average score on this problem was 75%.


Problem 8.2

What goes in __(ii)__?

Plane

Any two linearly independent vectors must span a plane. Geometrically, we can think of all scalar multiples of a single vector as lying along a single line. Then, linear combinations of two vectors pointing in different directions sweep out an entire plane.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.


Problem 8.3

What goes in __(iii)__?

is

Similar to Q5, we can tell that \vec{c} lies in the span of \vec{a} and \vec{b} because we are dealing with 2-dimensional vectors, and we know that \vec{a} and \vec{b} span a plane (so they must span all of \mathbb{R}^2). \vec{c} exists in 2-dimensional space, so it must exist in the span of \vec{a} and \vec{b}.

We can also directly check whether \vec{c} lies in the span of \vec{a} and \vec{b} by asking whether there exist some constants c_1 and c_2 such that c_1 \vec{a} + c_2 \vec{b} = \vec{c}. Plugging in what we know:

c_1 \begin{bmatrix} -3 \\ -2 \end{bmatrix} + c_2 \begin{bmatrix} -2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

This gives us the system of equations:

c_1 (-3) + c_2 (-2) = 1
c_1 (-2) + c_2 (3) = 1

Solving this system gives us c_1 = -\frac{5}{13}, c_2 = \frac{1}{13}, which means \vec{c} does lie in the span of \vec{a} and \vec{b}.


Difficulty: ⭐️⭐️

The average score on this problem was 75%.


Problem 8.4

What goes in __(iv)__?

Orthogonal

The vectors \vec{a} and \vec{b} are orthogonal if \vec{a} \cdot \vec{b} = 0. We can compute \vec{a} \cdot \vec{b} = (-3)(-2) + (-2)(3) = 6 - 6 = 0, so \vec{a} and \vec{b} are orthogonal.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.


Problem 8.5

What goes in __(v)__?

90 degrees

Since \vec{a} and \vec{b} are orthogonal, the angle between them is 90 degrees.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.



Problem 9

Source: Summer Session 2 2024 Midterm, Problem 4a-d

Suppose we want to fit a hypothesis function of the form: H(x) = w_0 + w_1 x^{(1)} + w_2 x^{(2)} + w_3 (x^{(1)})^2 + w_4 (x^{(1)} x^{(2)})^2 Our dataset looks like this:

x^{(1)} x^{(2)} y
1 6 7
-3 8 2
4 1 9
-2 7 5
0 4 6


Problem 9.1

Suppose we found w_2^* = 15.6 using multiple linear regression. Now, suppose we rescaled our data so feature vector \vec{x^{(2)}} became \left[60,\ 80,\ 10,\ 70,\ 40\right]^T, and performed multiple linear regression in the same setting. What would the new value of w_2^* (the weight on feature x^{(2)} in H(x)) be? You do not need to simplify your answer.

w_2^* = 1.56

Recall in linear regression, when a feature is rescaled by a factor of c, the corresponding weight is inversely scaled by \frac{1}{c}. This is because the coefficient adjusts to maintain the overall contribution of the feature to the prediction. In this problem we can see that x^{(2)} has been scaled by a factor of 10! It changed from \left[6,\ 8,\ 1,\ 7,\ 4\right]^T to \left[60,\ 80,\ 10,\ 70,\ 40\right]^T. This means our w_2 will be scaled by \frac{1}{10}. We can easily calculate \frac{1}{10}w_2^* = \frac{1}{10}\cdot 15.6 = 1.56.
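
This rescaling rule can be illustrated numerically with the dataset above, by fitting the design matrix before and after scaling \vec{x^{(2)}} by 10. The actual fitted weights will not match the hypothetical 15.6 from the problem; the point is only that the ratio between the old and new w_2^* is 10:

```python
import numpy as np

x1 = np.array([1, -3, 4, -2, 0], dtype=float)
x2 = np.array([6, 8, 1, 7, 4], dtype=float)
y = np.array([7, 2, 9, 5, 6], dtype=float)

def design(x1, x2):
    # columns: 1, x1, x2, x1^2, (x1 * x2)^2, matching H(x)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, (x1 * x2) ** 2])

w_old, *_ = np.linalg.lstsq(design(x1, x2), y, rcond=None)
w_new, *_ = np.linalg.lstsq(design(x1, 10 * x2), y, rcond=None)

print(w_old[2] / w_new[2])             # ~10: the weight on x^(2) shrinks by the scale factor
```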


Difficulty: ⭐️⭐️

The average score on this problem was 81%.



Problem 9.2

Suppose we found w_4^* = 72 using multiple linear regression. Now, suppose we rescaled our data so feature vector \vec{x^{(1)}} became \left[ \frac{1}{2},\ -\frac{3}{2},\ 2,\ -1,\ 0 \right] and \vec{x^{(2)}} now became \left[36, \ 48, \ 6, \ 42, \ 24\right]^T, and performed multiple linear regression in the same setting. What would the new value of w_4^* (the weight on feature (x^{(1)} x^{(2)})^2 in H(x)) be? You do not need to simplify your answer.

w_4^* = \frac{72}{(\frac{6}{2})^2}

Similar to the first part of the problem we need to find how \vec{x^{(1)}} and \vec{x^{(2)}} changed. We can then inversely scale w_4^* by those values.

Let’s start with \vec{x^{(1)}}. Originally \vec{x^{(1)}} was \left[1,\ -3,\ 4,\ -2,\ 0\right]^T, but it becomes \left[ \frac{1}{2},\ -\frac{3}{2},\ 2,\ -1,\ 0 \right]. We can see the values were scaled by \frac{1}{2}.

We can now look at \vec{x^{(2)}}. Originally \vec{x^{(2)}} was \left[6,\ 8,\ 1,\ 7,\ 4\right]^T, but it becomes \left[36, \ 48, \ 6, \ 42, \ 24\right]^T. We can see the values were scaled by 6.

We know w_4^* is attached to the variable (x^{(1)} x^{(2)})^2. This means we need to multiply the scaled values we found and then square it. (\frac{1}{2} \cdot 6)^2 = (3)^2 = 9.

From here we simply inversely scale w_4^*. Originally w_4^* = 72, so we multiply by \frac{1}{9} to get 72 \cdot \frac{1}{9} = \frac{72}{9} (which is equal to \frac{72}{(\frac{6}{2})^2}).


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 35%.


Problem 9.3

Suppose we found w_0^* = 4.47 using multiple linear regression. Now, suppose our observation vector \vec{y} became \left[12, \ 7, \ 14, \ 10, \ 11\right]^T, and we performed multiple linear regression in the same setting. What would the new value of w_0^* be? You do not need to simplify your answer.

w_0^* = 9.47

Recall w_0^* is the intercept term and represents the value of the prediction H(x) when all the features (\vec{x^{(1)}} and \vec{x^{(2)}}) are zero. If all of the other weights (w_1^* through w_4^*) remain unchanged, then the intercept adjusts to reflect the shift in the mean of the observed \vec y.

Our original \vec y was \left[7, \ 2, \ 9, \ 5, \ 6\right]^T, but became \left[12, \ 7, \ 14, \ 10, \ 11\right]^T. We need to calculate the original \vec y’s mean and the new \vec y’s mean.

Old \vec y’s mean: \frac{7 + 2 + 9 + 5 + 6}{5} = \frac{29}{5} = 5.8

New \vec y’s mean: \frac{12 + 7 + 14 + 10 + 11}{5} = \frac{54}{5} = 10.8

Our new w_0^* is found by taking the old one and adding the difference of 10.8 - 5.8. This means: w_0^* = 4.47 + (10.8 - 5.8) = 4.47 + 5 = 9.47.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 62%.


Problem 9.4

Suppose we found w_1^* = 0.428 using multiple linear regression. Now, suppose our observation vector \vec{y} became \left[12, \ 7, \ 14, \ 10, \ 11\right]^T, and we performed multiple linear regression in the same setting. What would the new value of w_1^* (the weight on feature x^{(1)} in H(x)) be? You do not need to simplify your answer.

w_1^* = 0.428

Our old \vec y was \left[7, \ 2, \ 9, \ 5, \ 6\right]^T and our new \vec y is \left[12, \ 7, \ 14, \ 10, \ 11\right]^T. We can see that the new \vec y is obtained by adding 5 to each value of the old \vec y.

Recall the slope (w_1^*) does not change if our y_i shifts by a constant amount! This means w_1^* = 0.428.
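
Both of the last two answers (the intercept shifts by 5, and every other weight stays the same) can be illustrated numerically with the dataset above; the actual fitted values differ from the hypothetical 4.47 and 0.428 in the problem, but the pattern of changes is the point:

```python
import numpy as np

x1 = np.array([1, -3, 4, -2, 0], dtype=float)
x2 = np.array([6, 8, 1, 7, 4], dtype=float)
y_old = np.array([7, 2, 9, 5, 6], dtype=float)
y_new = y_old + 5                      # the new observation vector [12, 7, 14, 10, 11]

X = np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, (x1 * x2) ** 2])
w_old, *_ = np.linalg.lstsq(X, y_old, rcond=None)
w_new, *_ = np.linalg.lstsq(X, y_new, rcond=None)

print(w_new - w_old)                   # ~[5, 0, 0, 0, 0]: only the intercept changes
```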


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 50%.



Problem 10

Source: Summer Session 2 2024 Midterm, Problem 5a-e

Suppose we want to fit a hypothesis function of the form: H(x) = w_0 + w_1 x^{(1)} + w_2 x^{(2)} + w_3 (x^{(1)})^2 + w_4 (x^{(1)} x^{(2)})^2

Our dataset looks like this:

x^{(1)} x^{(2)} y
1 6 7
-3 8 2
4 1 9
-2 7 5
0 4 6

We know we need to find an optimal parameter vector \vec{w}^* = \left[w_0^* \ \ w_1^* \ \ w_2^* \ \ w_3^* \ \ w_4^* \right]^T that satisfies the normal equations. To do so, we build a design matrix, but our columns get all shuffled due to an error with our computer! Our resulting design matrix is

X_\text{shuffled} = \begin{bmatrix} 36 & 6 & 1 & 1 & 1 \\ 576 & 8 & -3& 1 & 9 \\ 16 & 1 & 4 & 1 & 16 \\ 196 & 7 & -2& 1 & 4 \\ 0 & 4 & 0 & 1 & 0 \end{bmatrix}

If we solved the normal equations using this shuffled design matrix X_\text{shuffled}, we would not get our parameter vector \vec{w}^* = \left[w_0^* \ \ w_1^* \ \ w_2^* \ \ w_3^* \ \ w_4^* \right]^T in the correct order. Let \vec{s} = \left[ s_0 \ \ s_1 \ \ s_2 \ \ s_3 \ \ s_4 \right] be the parameter vector we find instead. Let’s figure out which features correspond to the weight vector \vec{s} that we found using the shuffled design matrix X_\text{shuffled}. Fill in the bubbles below.


Problem 10.1

First weight s_0 after solving normal equations corresponds to the term in H(x):

(x^{(1)} x^{(2)})^2

The first column inside of our X_\text{shuffled} represents s_0, so we want to figure out how to create these values. We can easily eliminate intercept, x^{(1)}, and x^{(2)} because none of these numbers match. From here we can calculate (x^{(1)})^2 and (x^{(1)} x^{(2)})^2 to determine which element creates s_0.

\begin{align*} (x^{(1)})^2 &= \begin{bmatrix} 1^2 = 1 \\ (-3)^2 = 9\\ 4^2 = 16\\ (-2)^2 = 4\\ 0^2 = 0 \end{bmatrix} \\ &\text{and}\\ (x^{(1)} x^{(2)})^2 &= \begin{bmatrix} (1 \times 6)^2 = 36 \\ (-3 \times 8)^2 = 576 \\ (4 \times 1)^2 = 16 \\ (-2 \times 7 )^2 = 196 \\ (0 \times 4)^2 = 0 \end{bmatrix} \end{align*}

From this we can see the answer is clearly (x^{(1)} x^{(2)})^2.
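
For this and the remaining subparts, the matching can also be done programmatically by comparing each shuffled column against the candidate feature columns:

```python
import numpy as np

x1 = np.array([1, -3, 4, -2, 0], dtype=float)
x2 = np.array([6, 8, 1, 7, 4], dtype=float)

features = {
    "(x1 * x2)^2": (x1 * x2) ** 2,
    "x2": x2,
    "x1": x1,
    "intercept": np.ones_like(x1),
    "x1^2": x1 ** 2,
}

X_shuffled = np.array([
    [36, 6, 1, 1, 1],
    [576, 8, -3, 1, 9],
    [16, 1, 4, 1, 16],
    [196, 7, -2, 1, 4],
    [0, 4, 0, 1, 0],
], dtype=float)

# match each shuffled column to the feature whose values it contains
for j in range(X_shuffled.shape[1]):
    for name, col in features.items():
        if np.array_equal(X_shuffled[:, j], col):
            print(f"s_{j} multiplies {name}")
```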


Difficulty: ⭐️⭐️

The average score on this problem was 87%.



Problem 10.2

Second weight s_1 after solving normal equations corresponds to the term in H(x):

x^{(2)}

The second column inside of our X_\text{shuffled} represents s_1, so we want to figure out how to create these values. We can see this is the same as x^{(2)}.


Difficulty: ⭐️

The average score on this problem was 93%.


Problem 10.3

Third weight s_2 after solving normal equations corresponds to the term in H(x):

x^{(1)}

The third column inside of our X_\text{shuffled} represents s_2, so we want to figure out how to create these values. We can see this is the same as x^{(1)}.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.


Problem 10.4

Fourth weight s_3 after solving normal equations corresponds to the term in H(x):

intercept

The fourth column inside of our X_\text{shuffled} represents s_3, so we want to figure out how to create these values. We know the intercept is a vector of ones, which matches!


Difficulty: ⭐️

The average score on this problem was 93%.


Problem 10.5

Fifth weight s_4 after solving normal equations corresponds to the term in H(x):

(x^{(1)})^2

We can find the answer by process of elimination, or directly from our earlier calculation of (x^{(1)})^2:

(x^{(1)})^2 = \begin{bmatrix} 1^2 = 1 \\ (-3)^2 = 9\\ 4^2 = 16\\ (-2)^2 = 4\\ 0^2 = 0 \end{bmatrix}


Difficulty: ⭐️⭐️

The average score on this problem was 87%.



Problem 11

Source: Summer Session 2 2024 Midterm, Problem 6a-c

Suppose we have already fit a multiple regression hypothesis function of the form: H(x) = w_0 + w_1 x^{(1)} + w_2 x^{(2)}

Now, suppose we add the feature (x^{(1)} + x^{(2)}) when performing multiple regression. Below, answer "Yes/No" to the following questions and rigorously justify why certain behavior will or will not occur. Your answer must mention linear algebra concepts such as rank and linear independence in relation to the design matrix, weight vector \vec{w^*}, and hypothesis function H(x).


Problem 11.1

Which of the following are true about the new design matrix X_\text{new} with our added feature (x^{(1)} + x^{(2)})?

The columns of X_\text{new} are linearly dependent. The columns of X_\text{new} have the same span as the original design matrix X. X_\text{new}^TX_\text{new} is not a full-rank matrix.

Let’s go through each of the options and determine if they are true or false.

The columns of X_\text{new} are linearly independent.

This statement is false because (x^{(1)} + x^{(2)}) is a linear combination of the original features (linearly dependent). This means the added feature does not provide any new, independent information to the model.

The columns of X_\text{new} are linearly dependent.

This statement is true because (x^{(1)} + x^{(2)}) is a linear combination of the original features.

\vec{y} is orthogonal to all the columns of X_\text{new}.

This statement is false because there is no justification for orthogonality. It is usually not the case that \vec y is orthogonal to the columns of X_\text{new}, because the goal of regression is to find a linear relationship between the predictors and the response variable. Since we have already fit regression coefficients (w_0, w_1, w_2), there is a relationship between \vec y and the columns of X_\text{new}.

\vec{y} is orthogonal to all the columns of the original design matrix X.

This statement is false because there is no justification for orthogonality. It is usually not the case that \vec y is orthogonal to the columns of X, because the goal of regression is to find a linear relationship between the predictors and the response variable. Since we have already fit regression coefficients (w_0, w_1, w_2), there is a relationship between \vec y and the columns of X.

The columns of X_\text{new} have the same span as the original design matrix X.

This statement is true: because (x^{(1)} + x^{(2)}) is a linear combination of the original features, the span does not change.

X_\text{new}^TX_\text{new} is a full-rank matrix.

This statement is false because there is a linearly dependent column!

X_\text{new}^TX_\text{new} is not a full-rank matrix.

This statement is true because of linear dependence.
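
A quick numerical illustration of the rank and span claims, using a small randomly generated dataset (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
x1, x2 = rng.normal(size=n), rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])        # original design matrix
X_new = np.column_stack([X, x1 + x2])            # add the redundant feature

print(np.linalg.matrix_rank(X))                  # 3
print(np.linalg.matrix_rank(X_new))              # still 3: the span did not grow
print(np.linalg.matrix_rank(X_new.T @ X_new))    # 3 < 4: X_new^T X_new is not full rank
```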


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 70%.



Problem 11.2

Is there more than one optimal weight vector \vec{w^*} that produces the lowest mean squared error hypothesis function H(x) = w_0^* + w_1^* x^{(1)} + w_2^* x^{(2)} + w_4^*(x^{(1)} + x^{(2)})?

Yes


Difficulty: ⭐️⭐️

The average score on this problem was 81%.


Problem 11.3

Explain your answer above.

There can be multiple optimal weight vectors \vec w^* that achieve the lowest mean squared error for the hypothesis function because of the linear dependence between the columns of the design matrix. The matrix X_\text{new}^TX_\text{new} is not full rank, so the normal equations have infinitely many solutions. This results in non-unique weight coefficients: many different combinations of weights produce the same optimal predictions.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 62%.


Problem 11.4

Does the best possible mean squared error of the new hypothesis function differ from that of the previous hypothesis function?

No


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 37%.


Problem 11.5

Explain your answer above.

When we add the linear combination (x^{(1)} + x^{(2)}), we are not enhancing the model’s capability to fit the data in a way that would lower the best possible mean squared error: the new feature adds nothing to the span of the design matrix, so both models can produce exactly the same set of predictions and capture the same underlying relationship between the predictors and the response variable. As a result, the best possible mean squared error of the new hypothesis function does not differ from that of the previous hypothesis function.
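
A numerical illustration of both answers (the best MSE is unchanged, and multiple weight vectors achieve it), again using a small randomly generated dataset:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
X_new = np.column_stack([X, x1 + x2])

w, *_ = np.linalg.lstsq(X, y, rcond=None)
w_new, *_ = np.linalg.lstsq(X_new, y, rcond=None)

mse = lambda D, wt: np.mean((D @ wt - y) ** 2)
print(mse(X, w), mse(X_new, w_new))    # the same best possible mean squared error

# shifting weight among x^(1), x^(2), and (x^(1) + x^(2)) changes the predictions by nothing
w_alt = w_new + np.array([0.0, 1.0, 1.0, -1.0])
print(mse(X_new, w_alt))               # same MSE with a different weight vector
```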


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 31%.


