Summer Session 2024 Final Exam



Instructor(s): Nishant Kheterpal

This exam was administered in person. Students had 180 minutes to take this exam.


Problem 1

Source: Summer Session 2 2024 Final, Problem 1a-g

Consider the dataset x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 4, x_5 = 5.


Problem 1.1

Which single element of the dataset would you change, and to what value, to make h^* = 4 the single minimizing value for the following loss function and corresponding empirical risk? Bubble in your choice of x_i, if possible, and provide the new value of that x_i. If no such x_i is possible, bubble in “Not possible" and explain why. L(y_i, h) = \begin{cases} 0 & y_i = h \\ 1 & y_i \neq h \end{cases}

Include an explanation.

x_1 = 4 or x_2 = 4 or x_3 = 4 or x_5 = 4

Recall from lecture that the 0-1 loss L(y_i, h) = \begin{cases} 0 & y_i = h \\ 1 & y_i \neq h \end{cases} is minimized by the mode. This means we want to maximize the number of matches between h and the y_i. If h^* = 4, then we want to change a value that is not equal to 4 into 4. Any x_i other than x_4 can be changed to 4: since x_4 = 4 already, the new dataset then has 4 appearing twice and every other value once, making 4 the unique mode.
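As a sanity check (not part of the required solution), a short Python sketch that computes the empirical risk under 0-1 loss after changing x_1 to 4:

```python
def zero_one_risk(h, data):
    """Empirical risk under 0-1 loss: fraction of points not equal to h."""
    return sum(0 if y == h else 1 for y in data) / len(data)

# Dataset after changing x_1 from 1 to 4 (any x_i other than x_4 works the same way).
data = [4, 2, 3, 4, 5]

risks = {h: zero_one_risk(h, data) for h in [1, 2, 3, 4, 5]}
best = min(risks, key=risks.get)  # h* is the mode of the dataset
```

Here h = 4 matches two points (risk 3/5), while every other candidate matches at most one.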


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 64%.


Problem 1.2

Suppose we must modify x_5. What should the new value of x_5 be to make h^* = 6 for the following loss function and corresponding empirical risk?

L(y_i, h) = (y_i - h)^2

x_5 should be modified to 20

We know that L(y_i, h) = (y_i - h)^2 is minimized by the mean. This means h^* = \frac{1}{n} \sum_{i = 1}^n x_i.

We can set h^* = 6 by writing 6 = \frac{1 + 2 + 3 + 4 + x_5'}{5}, where x_5' is the new value. From here we simply solve for x_5'.

\begin{align*} 6 &= \frac{1 + 2 + 3 + 4 + x_5'}{5}\\ 30 &= 1 + 2 + 3 + 4 + x_5'\\ 30 &= 10 + x_5' \\ 20 &= x_5' \end{align*}

Here we can see that x_5 should be changed to 20.
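The algebra can be double-checked numerically; a minimal sketch:

```python
data_without_x5 = [1, 2, 3, 4]
target_mean = 6

# The five values must sum to 5 * 6 = 30, so x_5' is whatever is left over.
x5_new = target_mean * 5 - sum(data_without_x5)
new_mean = (sum(data_without_x5) + x5_new) / 5
```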


Difficulty: ⭐️⭐️

The average score on this problem was 83%.


Problem 1.3

Suppose we must modify x_1. What should the new value of x_1 be to make h^* = 6 for the following loss function and corresponding empirical risk?

L(y_i, h) = (y_i - h)^2

x_1 should be modified to 16

Once again we know that L(y_i, h) = (y_i - h)^2 is minimized by the mean. This means h^* = \frac{1}{n} \sum_{i = 1}^n x_i.

We can set h^* = 6 by writing 6 = \frac{x_1' + 2 + 3 + 4 + 5}{5}, where x_1' is the new value. From here we simply solve for x_1'.

\begin{align*} 6 &= \frac{x_1' + 2 + 3 + 4 + 5}{5}\\ 30 &= x_1' + 2 + 3 + 4 + 5\\ 30 &= x_1' + 14 \\ 16 &= x_1' \end{align*}

Here we can see that x_1 should be changed to 16.


Difficulty: ⭐️⭐️

The average score on this problem was 83%.


Problem 1.4

Consider the dataset x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 4, x_5 = 5.

Which single element of the dataset would you change, and to what value, to make h^* = 4 for the following loss function and corresponding empirical risk? Bubble in your choice of x_i, if possible, and provide the new value of that x_i in the box below. If no such x_i is possible, bubble in “Not possible" and explain why in the box below. L(y_i, h) = |y_i - h|

Include an explanation.

x_3 = 4

This is absolute loss and is minimized by the median. This means we need to turn whatever the current median is into 4.

When we look at the dataset (x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 4, x_5 = 5) we can see the current median is x_3 = 3. This means all we need to do is change x_3 to 4; the sorted dataset becomes 1, 2, 4, 4, 5, whose median is 4.


Difficulty: ⭐️⭐️

The average score on this problem was 89%.


Problem 1.5

Which single element of the dataset would you change, and to what value, to make h^* = 5 for the following loss function and corresponding empirical risk? Bubble in your choice of x_i, if possible, and provide the new value of that x_i in the box below. If no such x_i is possible, bubble in “Not possible" and explain why in the box below. L(y_i, h) = |y_i - h|

Include an explanation.

Not possible

This is absolute loss, which is minimized by the median. This means we would need to turn the current median into 5.

When we look at the dataset (x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 4, x_5 = 5) we can see the current median is x_3 = 3. However, if we change x_3 to 5, the sorted dataset becomes 1, 2, 4, 5, 5, whose median is 4. In general, the median of five values is the third-smallest value, so h^* = 5 would require at least three values that are \geq 5. The original dataset has only one such value (x_5 = 5), and changing a single element can produce at most two. There is no way to make the median 5 by changing only a single value.
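The impossibility claim can also be verified by brute force. This sketch tries every element and every integer replacement in a wide range (integers suffice to illustrate the point, since pushing a value far above or below the other points stops changing the median):

```python
import statistics

data = [1, 2, 3, 4, 5]
achievable = False
for i in range(len(data)):
    for new_value in range(-100, 101):  # wide range of integer replacements
        trial = data.copy()
        trial[i] = new_value            # change a single element
        if statistics.median(trial) == 5:
            achievable = True
```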


Difficulty: ⭐️

The average score on this problem was 92%.


Problem 1.6

Suppose we delete x_2, so the dataset is x_1 = 1, x_3=3, x_4=4, x_5=5. What is the new value of h^* for following loss function and corresponding empirical risk? L(y_i, h) = (y_i - h)^2

The mean of the dataset: \frac{13}{4}

Once again we know that L(y_i, h) = (y_i - h)^2 is minimized by the mean. This means h^* = \frac{1}{n} \sum_{i = 1}^n x_i.

If we delete x_2 this means our equation becomes \frac{1 + 3 + 4 + 5}{4} = \frac{13}{4}.


Difficulty: ⭐️⭐️

The average score on this problem was 88%.


Problem 1.7

Suppose we delete x_3, so the dataset is x_1 = 1, x_2=2, x_4=4, x_5=5. Which of the following values of h^* minimize the following loss function and corresponding empirical risk? L(y_i, h) = |y_i - h|

2, 2.5, 3, 3.5, 4

Once again, recall L(y_i, h) = |y_i - h| is minimized by the median. The new dataset has an even number of values, so any h between (and including) the middle two values, 2 and 4, minimizes the empirical risk. This means every listed value in [2, 4] should be selected.
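A quick check (not part of the required solution) that the empirical risk is flat on [2, 4] and rises outside it:

```python
def mean_abs_error(h, data):
    """Mean absolute error of constant prediction h."""
    return sum(abs(y - h) for y in data) / len(data)

data = [1, 2, 4, 5]
inside = [mean_abs_error(h, data) for h in (2, 2.5, 3, 3.5, 4)]
outside = [mean_abs_error(h, data) for h in (1.5, 4.5)]
```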


Difficulty: ⭐️⭐️

The average score on this problem was 87%.



Problem 2

Source: Summer Session 2 2024 Final, Problem 2a-e

Consider a dataset of 4 values, y_1 < y_2 < y_3 < y_4, with a mean of 6.

Let Y_\text{abs}(h) = \frac{1}{4} \sum_{i = 1}^4 |y_i - h| represent the mean absolute error of a constant prediction h on this dataset of 4 values.

Similarly, consider another dataset of 3 values, x_1 < x_2 < x_3, that also has a mean of 6.

Let X_\text{abs}(h) = \frac{1}{3} \sum_{i = 1}^3 |x_i - h| represent the mean absolute error of a constant prediction h on this dataset of 3 values.

Suppose that x_1 < y_1, y_4 < x_2, and that T_\text{abs}(h) represents the mean absolute error of a constant prediction h on the combined dataset of 7 values, x_1, y_1, ..., y_4, x_2, x_3. We denote these 7 values as \{ z_1, z_2, z_3, z_4, z_5, z_6, z_7 \}.


Problem 2.1

What value of h minimizes the following empirical risk function? Z(h) = \frac{1}{7} \sum_{i = 1}^7 (h - z_i)

Z(h) = \frac{1}{7} \sum_{i = 1}^7 (h - z_i) = h - \frac{1}{7}\sum_{i = 1}^7 z_i, which is a linear function of h with slope 1. There is no absolute value or square here, so once h is smaller than the mean of the z_i the risk becomes negative, and it keeps decreasing as h decreases: the smaller h is, the smaller the risk!

Since Z(h) decreases without bound, there is no finite minimizer; among the available answer choices, h^* = -\infty.


Difficulty: ⭐️⭐️⭐️⭐️⭐️

The average score on this problem was 16%.


Problem 2.2

What value of h minimizes mean absolute error, T_\text{abs}(h)?

y_3, the median of the dataset.

Recall h^* for T_\text{abs}(h) is the median of the dataset!

Our dataset is: x_1, y_1, y_2, y_3, y_4, x_2, x_3. Our median is y_3, which means h^* = y_3.


Difficulty: ⭐️⭐️

The average score on this problem was 77%.


Problem 2.3

Suppose the slope of T_\text{abs}(h) is -\frac{1}{7} at some h_p. Hint: think about what values of h could have this slope.

Suppose the dataset is now modified by moving the \{x_i\} such that
y_1 < y_2 < y_3 < y_4 < x_1 < x_2 < x_3. What would the slope of T_\text{abs}(h) be at the point whose x-value is h_p, given this assumption?

-\frac{3}{7}

The slope of T_\text{abs}(h) is equal to \frac{1}{7} \cdot \text{(number of points to the left of } h - \text{number of points to the right of } h\text{)}. If the slope of T_\text{abs}(h) was originally -\frac{1}{7}, there must have been three points to the left of h and four points to the right, meaning y_2 < h < y_3. It follows that in the modified dataset there are two points to the left of h (y_1 and y_2) and five points to the right, so the slope of T_\text{abs}(h) must be \frac{1}{7}(2 - 5) = -\frac{3}{7}.
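The left-minus-right slope rule is easy to encode. The particular numbers below are hypothetical, chosen only to satisfy the orderings in the problem (x_1 < y_1 < y_2 < h_p < y_3 < y_4 < x_2 < x_3 before the move, all y's below all x's after):

```python
def mae_slope(h, data):
    """Slope of the mean absolute error at h (valid when h is not a data point)."""
    left = sum(1 for z in data if z < h)
    right = sum(1 for z in data if z > h)
    return (left - right) / len(data)

# Hypothetical datasets consistent with the problem's orderings.
original = [0, 2, 3, 7, 8, 10, 12]   # x_1, y_1, y_2, y_3, y_4, x_2, x_3
modified = [2, 3, 7, 8, 14, 16, 18]  # y_1, y_2, y_3, y_4, x_1, x_2, x_3
h_p = 5                              # between y_2 = 3 and y_3 = 7
```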


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 44%.


Problem 2.4

The following information is repeated from the previous page, for your convenience.

Consider a dataset of 4 values, y_1 < y_2 < y_3 < y_4, with a mean of 6. Let Y_\text{abs}(h) = \frac{1}{4} \sum_{i = 1}^4 |y_i - h| represent the mean absolute error of a constant prediction h on this dataset of 4 values.

Similarly, consider another dataset of 3 values, x_1 < x_2 < x_3, that also has a mean of 6. Let X_\text{abs}(h) = \frac{1}{3} \sum_{i = 1}^3 |x_i - h| represent the mean absolute error of a constant prediction h on this dataset of 3 values.

Suppose that x_1 < y_1, y_4 < x_2, and that T_\text{abs}(h) represents the mean absolute error of a constant prediction h on the combined dataset of 7 values, x_1, y_1, ..., y_4, x_2, x_3. We denote these 7 values as \{ z_1, z_2, z_3, z_4, z_5, z_6, z_7 \}.

Suppose the slope of T_\text{abs}(h) is -\frac{1}{7} at some h_p. Hint: think about what values of h could have this slope.

Suppose the dataset is now modified by repeating each value y_i such that it now contains x_1, y_1, y_1, y_2, y_2, y_3, y_3, y_4, y_4, in ascending order; the ordering of the points is the same as the beginning of this question. What would the slope of T_\text{abs}(h) be at the point whose x-value is h_p, given this assumption?

There are two accepted answers, depending on whether you assumed x_2 and x_3 were still in the dataset.

Recall the slope of T_\text{abs}(h) is equal to \frac{1}{n} \cdot \text{(number of points to the left of } h - \text{number of points to the right of } h\text{)}, where n is the number of points in the dataset (n = 7 originally).

Correct case 1: assumes that x_2 and x_3 are still in the dataset and finds the answer to be -\frac{1}{11}

  • In our original dataset the slope at h_p was - \frac{1}{7}, reflecting three points to the left of h_p and four points to the right of h_p
    • This means h_p lies between y_2 and y_3 of the original dataset (x_1, y_1, y_2, y_3, y_4, x_2, x_3)
  • Our new dataset becomes x_1, y_1, y_1, y_2, y_2, y_3, y_3, y_4, y_4, x_2, x_3, which has 11 points
  • Assuming h_p does not change, the ordering looks like: x_1, y_1, y_1, y_2, y_2, h_p, y_3, y_3, y_4, y_4, x_2, x_3
  • There are now five points to the left of h_p and six points to the right, so the slope is \frac{1}{11} (5 - 6) = -\frac{1}{11}

Correct case 2: does not assume that x_2 and x_3 are still in the dataset and finds the answer to be \frac{1}{9}

  • In our original dataset the slope at h_p was - \frac{1}{7}, reflecting three points to the left of h_p and four points to the right of h_p
    • This means h_p lies between y_2 and y_3 of the original dataset (x_1, y_1, y_2, y_3, y_4, x_2, x_3)
  • Our new dataset becomes x_1, y_1, y_1, y_2, y_2, y_3, y_3, y_4, y_4, which has 9 points
  • Assuming h_p does not change, the ordering looks like: x_1, y_1, y_1, y_2, y_2, h_p, y_3, y_3, y_4, y_4
  • There are now five points to the left of h_p and four points to the right, so the slope is \frac{1}{9} (5 - 4) = \frac{1}{9}

Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 31%.


Problem 2.5

At the point whose x-value is h_{p}, select the option below that correctly describes the relationship between the slopes of Y_\text{abs}(h), X_\text{abs}(h), and T_\text{abs}(h), respectively.

Hint: We already know the slope of T_\text{abs}(h) at h_p.

Slope of X_{abs}(h) < slope of T_{abs}(h) < slope of Y_{abs}(h)

There is a key insight to make here: the slope of the mean absolute error is influenced by the distribution of points above and below h.

T_\text{abs}(h) represents the combined dataset (both the x’s and the y’s), so its slope reflects the overall balance between the two datasets: the distribution of points above and below h is smoothed out, and the slope lands in between the slopes of X_\text{abs}(h) and Y_\text{abs}(h).

You can also think of it as: x_1, y_1, y_2, h_p, y_3, y_4, x_2, x_3. The slope of T_{abs}(h) is -\frac{1}{7}.

Y_\text{abs}(h) represents only the y’s. There are 4 y values, and since h_p lies between y_2 and y_3, exactly two y values sit on each side of h_p, so the slope is balanced at the largest (least negative) value of the three. (Recall that the mean for these 4 points is 6.)

You can also think of it as: y_1, y_2, h_p, y_3, y_4. The slope of Y_{abs}(h) is 0.

X_\text{abs}(h) represents only the x’s. There are 3 x values, and since x_1 < h_p < x_2, one x value lies below h_p and two lie above, so the slope is the steepest (most negative) of the three. (Recall that the mean for these 3 points is 6.)

You can also think of it as: x_1, h_p, x_2, x_3. The slope of X_{abs}(h) is -\frac{1}{3}.


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 38%.



Problem 3

Source: Summer Session 2 2024 Final, Problem 3a-g

Consider the following vectors in \mathbb{R}^3, where \alpha \in \mathbb{R} is a scalar:

\vec{v}_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} \alpha \\ 1 \\ 2 \end{bmatrix}


Problem 3.1

For what value(s) of \alpha are \vec{v}_1, \vec{v}_2, and \vec{v}_3 linearly independent? Show your work, and put your answer in a box.

The vectors are linearly independent for any \alpha \neq 1.

Vectors are linearly independent when none of them can be written as a linear combination of the others. Notice that \vec v_1 + \vec v_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, which equals \vec v_3 exactly when \alpha = 1. So as long as \alpha \neq 1, no such combination exists and the vectors are linearly independent.
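Another way to check: three vectors in \mathbb{R}^3 are linearly independent exactly when the determinant of the matrix they form is nonzero. A pure-Python sketch:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def det_for_alpha(alpha):
    # Matrix whose columns are v1, v2, v3.
    return det3([[0, 1, alpha],
                 [1, 0, 1],
                 [1, 1, 2]])
```

The determinant simplifies to \alpha - 1, which is zero only at \alpha = 1.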


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 69%.


Problem 3.2

For what value(s) of \alpha are \vec{v}_1 and \vec{v}_3 orthogonal? Show your work, and put your answer in a box.

We know for \vec v_1 and \vec v_3 to be orthogonal their dot product should equal zero.

\vec v_1 \cdot \vec v_3 = (0)(\alpha) + (1)(1) + (1)(2) = 3

There are no values of \alpha for which \vec{v}_1 and \vec{v}_3 are orthogonal. Since (0)(\alpha) = 0, the dot product equals 3 no matter what \alpha is, so we cannot manipulate \alpha in any way to make it zero.


Difficulty: ⭐️⭐️

The average score on this problem was 88%.


Problem 3.3

For what value(s) of \alpha are \vec{v}_2 and \vec{v}_3 orthogonal? Show your work, and put your answer in a box.

\alpha = -2

We know for \vec v_2 and \vec v_3 to be orthogonal their dot product should equal zero.

The dot product is:

\vec v_2 \cdot \vec v_3 = (1)(\alpha) + (0)(1) + (1)(2)

So we can do:

\begin{align*} 0 &= (1)(\alpha) + (0)(1) + (1)(2)\\ 0 &=\alpha + 0 + 2\\ -2 &= \alpha \end{align*}

We can clearly see when \alpha = -2 then the dot product will equal zero.


Difficulty: ⭐️

The average score on this problem was 100%.


Problem 3.4

Regardless of your answer above, assume in this part that \alpha = 3. Is the vector \begin{bmatrix} 3 \\ 5 \\ 8 \end{bmatrix} in \textbf{span}(\vec{v}_1, \vec{v}_2, \vec{v}_3)?

Yes

With \alpha = 3 the three vectors are linearly independent (from the first part, they are dependent only when \alpha = 1). Any vector in \mathbb{R}^3 is in the span of 3 linearly independent vectors in \mathbb{R}^3, so the answer is yes.


Difficulty: ⭐️

The average score on this problem was 94%.


The following information is repeated from the previous page, for your convenience.

Consider the following vectors in \mathbb{R}^3, where \alpha \in \mathbb{R} is some scalar:

\vec{v}_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} \alpha \\ 1 \\ 2 \end{bmatrix}


Problem 3.5

What is the projection of the vector \begin{bmatrix} 3 \\ 5 \\ 8 \end{bmatrix} onto \vec{v}_1? Provide your answer in the form of a vector. Show your work, and put your answer in a box.

\begin{bmatrix}0 \\ 6.5 \\ 6.5 \end{bmatrix}

We follow the equation \frac{\begin{bmatrix}3 \\ 5 \\ 8\end{bmatrix} \cdot \vec v_1}{\vec v_1 \cdot \vec v_1} \vec v_1 to find the projection of \begin{bmatrix}3 \\ 5 \\ 8\end{bmatrix} onto \vec{v_1}:

\begin{align*} \frac{\begin{bmatrix}3 \\ 5 \\ 8\end{bmatrix} \cdot \begin{bmatrix}0 \\ 1 \\ 1 \end{bmatrix}}{\begin{bmatrix}0 \\ 1 \\ 1 \end{bmatrix}\cdot \begin{bmatrix}0 \\ 1 \\ 1 \end{bmatrix}} \begin{bmatrix}0 \\ 1 \\ 1 \end{bmatrix} &= \frac{13}{2} \begin{bmatrix}0 \\ 1 \\ 1 \end{bmatrix}\\ &= \begin{bmatrix}0 \\ 6.5 \\ 6.5 \end{bmatrix} \end{align*}
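The same computation in a few lines of Python:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(y, v):
    """Orthogonal projection of y onto the line spanned by v."""
    c = dot(y, v) / dot(v, v)
    return [c * vi for vi in v]

p = project([3, 5, 8], [0, 1, 1])
```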


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 58%.


Problem 3.6

The following information is repeated from the previous page, for your convenience.

Consider the following vectors in \mathbb{R}^3, where \alpha \in \mathbb{R} is some scalar:

\vec{v}_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} \alpha \\ 1 \\ 2 \end{bmatrix}

What is the orthogonal projection of the vector \begin{bmatrix} 3 \\ 15 \\ 18 \end{bmatrix} onto \textbf{span}(\vec{v}_1, \vec{v}_2)? Note that for X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, we have

(X^T X) ^{-1}X^T = \begin{bmatrix} -\frac{1}{3} & \frac{2}{3} & \frac{1}{3} \\[6pt] \frac{2}{3} & -\frac{1}{3} & \frac{1}{3} \end{bmatrix} Write your answer in the form of coefficients that multiply \vec{v}_1 and \vec{v}_2 and yield a vector \vec{p} that is the orthogonal projection requested above.

\vec{p} = \_\_(a)\_\_ \cdot \vec{v}_1 + \_\_(b)\_\_ \cdot \vec{v}_2

What goes in \_\_(a)\_\_ and \_\_(b)\_\_?

To find what goes in \_\_(a)\_\_ and \_\_(b)\_\_ we need to multiply (X^T X)^{-1}X^T and \vec y.

\begin{align*} \begin{bmatrix} a \\ b \end{bmatrix} &= (X^T X)^{-1}X^T \begin{bmatrix}3 \\ 15 \\ 18\end{bmatrix}\\ &= \begin{bmatrix} -\frac{1}{3}(3) + \frac{2}{3}(15) + \frac{1}{3}(18) \\ \frac{2}{3}(3) - \frac{1}{3}(15) + \frac{1}{3}(18) \end{bmatrix}\\ &= \begin{bmatrix} -1 + 10 + 6 \\ 2 - 5 + 6 \end{bmatrix}\\ &= \begin{bmatrix} 15 \\ 3 \end{bmatrix} \end{align*}
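A quick check that these coefficients produce a projection whose error is orthogonal to both columns:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v1, v2 = [0, 1, 1], [1, 0, 1]
y = [3, 15, 18]
a, b = 15, 3  # coefficients found above

p = [a * c1 + b * c2 for c1, c2 in zip(v1, v2)]  # p = a*v1 + b*v2
e = [pi - yi for pi, yi in zip(p, y)]            # error vector
```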


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 73%.


Problem 3.7

Is it true that, for the orthogonal projection \vec{p} from above, the entries of \vec{e} = \vec{p} \ - \begin{bmatrix} 3 \\ 15 \\ 18 \end{bmatrix} sum to 0? Explain why or why not. Your answer must mention characteristics of \vec{v}_1,\vec{v}_2, and/or X to receive full credit.

Explain your answer

Yes

Note: the design matrix X lacks a column of all 1s (and the span of its columns does not include the all-ones vector), so the residuals are not guaranteed to sum to 0. This means we need more evidence!

\vec{p} is the orthogonal projection of \begin{bmatrix}3\\15\\18\end{bmatrix} onto \text{span}(\vec{v_1}, \vec{v_2}), where \vec{v_1} and \vec{v_2} are the columns of the matrix X. From the instructions we know \vec{e} = \vec{p} \ - \begin{bmatrix} 3 \\ 15 \\ 18 \end{bmatrix}, so if \begin{bmatrix}3\\15\\18\end{bmatrix} lies in \text{span}(\vec{v_1}, \vec{v_2}) then \vec{e} = \vec{0}. From the previous part, \vec{p} = 15\vec{v_1} + 3\vec{v_2} = \begin{bmatrix}3\\15\\18\end{bmatrix}, which means \vec{e} = \vec{0} and the sum of the residuals equals 0.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 69%.



Problem 4

Source: Summer Session 2 2024 Final, Problem 4a-f

You are given a dataset with the following data points and want to fit a variety of hypothesis functions to predict y from features u and v:

\begin{array}{|c|c|c|} \hline u & v & y \\ \hline 1 & 3 & 4 \\ 3 & 0 & 6 \\ 2 & 2 & 5 \\ 4 & -4 & 8 \\ 5 & -1 & 11 \\ \hline \end{array}

You are also provided with the following hypothesis functions, used to construct design matrices:

  1. H_A(u, v) = w_0 + w_1 u + w_2 u^2 + w_3 u^3

  2. H_B(u, v) = w_0 u + w_1 u^2 + w_2 u^3 + w_3 u v

  3. H_C(u, v) = w_0 + w_1 u + w_2 v + w_3 v^2

  4. H_D(u, v) = w_0 + w_1 u + w_2 u^3 + w_3 u v

X_1 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 3 & 9 & 27 \\ 1 & 2 & 4 & 8 \\ 1 & 4 & 16 & 64 \\ 1 & 5 & 25 & 125 \end{bmatrix} \ \ \ X_2 = \begin{bmatrix} 1 & 1 & 1 & 3 \\ 1 & 3 & 27 & 0 \\ 1 & 2 & 8 & 4 \\ 1 & 4 & 64 & -16 \\ 1 & 5 & 125 & -5 \end{bmatrix} X_3 = \begin{bmatrix} 1 & 1 & 1 & 3 \\ 3 & 9 & 27 & 0 \\ 2 & 4 & 8 & 4 \\ 4 & 16 & 64 & -16 \\ 5 & 25 & 125 & -5 \end{bmatrix} \ \ \ X_4 = \begin{bmatrix} 1 & 1 & 3 & 9 \\ 1 & 3 & 0 & 0 \\ 1 & 2 & 2 & 4 \\ 1 & 4 & -4 & 16 \\ 1 & 5 & -1 & 1 \end{bmatrix}


Problem 4.1

Which design matrix corresponds to H_A(u, v)?

X_1

We can easily create the design matrix from H_A(u, v) = w_0 + w_1 u + w_2 u^2 + w_3 u^3 by viewing the order of variables and how they are being manipulated. We can see there is a w_0 term, which means that the first column should be a vector of ones.

This means we know:

X_? = \begin{bmatrix} 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \end{bmatrix}

This immediately eliminates X_3. We now see that the second column should be \vec u, whose values are given in the dataset at the top of the problem, so we get:

X_? = \begin{bmatrix} 1 & 1 & ? & ? \\ 1 & 3 & ? & ? \\ 1 & 2 & ? & ? \\ 1 & 4 & ? & ? \\ 1 & 5 & ? & ? \end{bmatrix}

This does not eliminate any of the remaining matrices, so we look at the next column, which should be \vec u^2 (each u_i squared). This means:

X_? = \begin{bmatrix} 1 & 1 & (1)^2 & ? \\ 1 & 3 & (3)^2 & ? \\ 1 & 2 & (2)^2 & ? \\ 1 & 4 & (4)^2 & ? \\ 1 & 5 & (5)^2 & ? \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & ? \\ 1 & 3 & 9 & ? \\ 1 & 2 & 4 & ? \\ 1 & 4 & 16 & ? \\ 1 & 5 & 25 & ? \end{bmatrix}

Here we can now eliminate X_2 and X_4, so we know the answer must be X_1!
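Design matrices like these can be built mechanically from the feature columns; for example, for H_A:

```python
u = [1, 3, 2, 4, 5]
v = [3, 0, 2, -4, -1]

# One row per data point: [1, u, u^2, u^3] for H_A(u, v) = w0 + w1*u + w2*u^2 + w3*u^3.
X_A = [[1, ui, ui**2, ui**3] for ui in u]
```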


Difficulty: ⭐️

The average score on this problem was 100%.


Problem 4.2

Which design matrix corresponds to H_B(u, v)?

X_3

We can create the design matrix from H_B(u, v) = w_0 u + w_1 u^2 + w_2 u^3 + w_3 u v by viewing the order of variables and how they are being manipulated. Notice that w_0 multiplies u (there is no intercept term), which means the first column should be \vec u itself rather than a vector of ones.

This means we know:

X_? = \begin{bmatrix} 1 * 1 & ? & ? & ? \\ 1 * 3 & ? & ? & ? \\ 1 * 2 & ? & ? & ? \\ 1 *4 & ? & ? & ? \\ 1 * 5 & ? & ? & ? \end{bmatrix} = \begin{bmatrix} 1 & ? & ? & ? \\ 3 & ? & ? & ? \\ 2 & ? & ? & ? \\ 4 & ? & ? & ? \\ 5 & ? & ? & ? \end{bmatrix}

We can see the only design matrix with this first column is X_3.


Difficulty: ⭐️

The average score on this problem was 100%.


Problem 4.3

Which design matrix corresponds to H_C(u, v)?

X_4

We can easily create the design matrix from H_C(u, v) = w_0 + w_1 u + w_2 v + w_3 v^2 by viewing the order of variables and how they are being manipulated. We can see there is a w_0 term, which means that the first column should be a vector of ones.

This means we know:

X_? = \begin{bmatrix} 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \end{bmatrix}

This immediately eliminates X_3. We now see that the second column should be \vec u, whose values are given in the dataset at the top of the problem, so we get:

X_? = \begin{bmatrix} 1 & 1 & ? & ? \\ 1 & 3 & ? & ? \\ 1 & 2 & ? & ? \\ 1 & 4 & ? & ? \\ 1 & 5 & ? & ? \end{bmatrix}

This does not eliminate any of the remaining matrices, so we look at the next column, which should be \vec v. This means:

X_? = \begin{bmatrix} 1 & 1 & 3 & ? \\ 1 & 3 & 0 & ? \\ 1 & 2 & 2 & ? \\ 1 & 4 & -4 & ? \\ 1 & 5 & -1 & ? \end{bmatrix}

Here we can eliminate X_1 and X_2, which means our answer is X_4.


Difficulty: ⭐️

The average score on this problem was 100%.


Problem 4.4

Which design matrix corresponds to H_D(u, v)?

X_2

We can easily create the design matrix from H_D(u, v) = w_0 + w_1 u + w_2 u^3 + w_3 u v by viewing the order of variables and how they are being manipulated. We can see there is a w_0 term, which means that the first column should be a vector of ones.

This means we know:

X_? = \begin{bmatrix} 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \\ 1 & ? & ? & ? \end{bmatrix}

This immediately eliminates X_3. We now see that the second column should be \vec u, whose values are given in the dataset at the top of the problem, so we get:

X_? = \begin{bmatrix} 1 & 1 & ? & ? \\ 1 & 3 & ? & ? \\ 1 & 2 & ? & ? \\ 1 & 4 & ? & ? \\ 1 & 5 & ? & ? \end{bmatrix}

We cannot eliminate any of the design matrices, so we move to the next column, which is \vec u^3. This means:

X_? = \begin{bmatrix} 1 & 1 & (1)^3 & ? \\ 1 & 3 & (3)^3 & ? \\ 1 & 2 & (2)^3 & ? \\ 1 & 4 & (4)^3 & ? \\ 1 & 5 & (5)^3 & ? \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & ? \\ 1 & 3 & 27 & ? \\ 1 & 2 & 8 & ? \\ 1 & 4 & 64 & ? \\ 1 & 5 & 125 & ? \end{bmatrix}

Now we can eliminate the design matrices X_1 and X_4, which means the answer is X_2.


Difficulty: ⭐️

The average score on this problem was 100%.


The following hypothesis functions are repeated from the previous subparts, for your convenience, plus an additional hypothesis function H_E:

  1. H_A(u, v) = w_0 + w_1 u + w_2 u^2 + w_3 u^3

  2. H_B(u, v) = w_0 u + w_1 u^2 + w_2 u^3 + w_3 u v

  3. H_C(u, v) = w_0 + w_1 u + w_2 v + w_3 v^2

  4. H_D(u, v) = w_0 + w_1 u + w_2 u^3 + w_3 u v

  5. H_E(u, v) = w_0 + w_1 u + w_2 u^2 + w_3 u^3 + w_4 v + w_5 v^2 + w_6 u v


Problem 4.5

Which of the five hypothesis functions above has the lowest mean squared error on this dataset? Choose a hypothesis function H(\cdot) and briefly justify your answer in the space below.

H_E(u, v)

H_E(u, v) contains the most information: it uses both u and v, so we have information about both variables, and it includes every term present in the other functions (H_A, H_B, H_C, H_D). This makes H_E the most expressive hypothesis function, so its minimum mean squared error can be no larger than that of any of the others.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 56%.


Problem 4.6

Suppose we use the lowest-MSE hypothesis function chosen above to make a prediction for a new point (u_\text{new}, v_\text{new}) = (10, 15). Is this prediction likely to be accurate? Justify your answer.

No

This prediction is not likely to be accurate due to overfitting and extrapolation issues. When a hypothesis function has many terms, there is a higher chance it overfits the training data and will not generalize well to new points; think of a high-degree polynomial fit to a roughly linear trend. Moreover, (u_\text{new}, v_\text{new}) = (10, 15) lies well outside the range of the training data (u between 1 and 5, v between -4 and 3), so the prediction is an extrapolation.


Difficulty: ⭐️⭐️

The average score on this problem was 77%.



Problem 5

Source: Summer Session 2 2024 Final, Problem 5a-c

Let \vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. Consider the function g(\vec{x}) = x_1^2 + x_2^2 + x_1 x_2 - 4x_1 - 6x_2 + 8.


Problem 5.1

Find \nabla g(\vec{x}), the gradient of g(\vec{x}), and use it to show that \nabla g\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} 0 \\ -1 \end{bmatrix}.

\nabla g(\vec{x}) = \begin{bmatrix} 2x_1 + x_2 - 4 \\ 2x_2 + x_1 - 6 \end{bmatrix}

To calculate the gradient, we take the partial derivatives of g with respect to both x_1 and x_2, and arrange these partial derivatives as a vector. \frac{\partial g}{\partial x_1} = 2x_1 + x_2 - 4 \frac{\partial g}{\partial x_2} = 2x_2 + x_1 - 6 Writing these as a vector, we obtain the gradient above.

We can verify that \nabla g\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} 2(1) + (2) - 4 \\ 2(2) + (1) - 6 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}.


Difficulty: ⭐️⭐️

The average score on this problem was 88%.


Problem 5.2

We’d like to find the vector \vec{x}^* that minimizes g(\vec{x}) using gradient descent. Perform one iteration of gradient descent by hand, using the initial guess \vec{x}^{(0)} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} and the learning rate \alpha = \frac{1}{4}. Show your work, and put a \boxed{\text{box}} around your final answer for \vec{x}^{(1)}.

\vec x^{(1)} = \begin{bmatrix} 1 \\ \frac{9}{4}\end{bmatrix}

Our first step of gradient descent is given by the equation: \vec{x}^{(1)} = \vec{x}^{(0)} - \alpha \nabla g(\vec{x}^{(0)}).

From the previous problem, we found that \nabla g\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} 0 \\ -1 \end{bmatrix}.

Plugging in what we have: \vec{x}^{(1)} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \left(\frac{1}{4} \right) \begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 0 \\ - \frac{1}{4}\end{bmatrix} = \begin{bmatrix} 1 \\ \frac{9}{4}\end{bmatrix}
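The update can be coded directly; a minimal sketch of one gradient descent step for this g:

```python
def grad_g(x1, x2):
    """Gradient of g at (x1, x2), from the previous part."""
    return (2 * x1 + x2 - 4, 2 * x2 + x1 - 6)

def gradient_descent_step(x, alpha):
    """One step: x - alpha * grad g(x)."""
    g1, g2 = grad_g(*x)
    return (x[0] - alpha * g1, x[1] - alpha * g2)

x_next = gradient_descent_step((1, 2), 0.25)
```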


Difficulty: ⭐️⭐️

The average score on this problem was 87%.


Problem 5.3

Given some new function f(x) that is convex, prove that the function g(x) = a f(x) + b is also convex, where a and b are positive real constants.

Note: this function g(\cdot) is entirely different from g(\vec{x}) on the previous page.

Hint: Use the fact that we already know f(x) is convex and the formal definition of convexity: for all t\in[0, 1], \ (1-t) f(c) + t f(d) \geq f\left((1-t)c + td\right).

To show that g(x) is convex, we want to begin with the expression (1-t) g(c) + t g(d) and use algebra to show it’s \geq g\left((1-t)c + td\right).

Let t\in[0, 1], and let a and b be positive real constants as given. Note that a > 0 is what allows us to multiply both sides of the convexity inequality by a without flipping the inequality.

\begin{align*} (1-t) g(c) + t g(d) &= (1-t)(af(c) + b) + t(af(d) + b)\\ &= a\big((1-t)f(c) + tf(d)\big) + (1-t)b + tb\\ &= a\big((1-t)f(c) + tf(d)\big) + b\\ &\geq af\big((1-t)c + td\big) + b\\ &= g\big((1-t)c + td\big) \end{align*}


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 49%.



Problem 6

Source: Summer Session 2 2024 Final, Problem 6a-d

Euchre is a card game popular in the Midwest United States and is played with a partial deck of 24 cards: the 9, 10, Jack, Queen, King, and Ace of all four suits (Hearts, Diamonds, Spades, Clubs). A hand of cards in Euchre is 5 cards.

Assume cards are dealt by drawing at random without replacement from the set of 24 possible cards. The order of cards in a hand does not matter.

In this question, you may leave your answers unsimplified, in terms of fractions, exponents, factorials, the permutation formula P(n, k), and the binomial coefficient {n \choose k}.


Problem 6.1

How many possible hands of 5 cards are there in total?

\binom{24}{5}

We are sampling without replacement, and the order in which we select the cards does not matter. The formula to count this is given by the choose formula {n \choose k} = \frac{n!}{k!(n-k)!}, where n=24 is the total number of objects we can choose from, and k=5 is the size of the set of objects we’re sampling. Therefore, the total possible hands is \binom{24}{5}.


Difficulty: ⭐️

The average score on this problem was 100%.


Problem 6.2

What is the probability that all five cards in a hand of Euchre are from the same suit?

\frac{\binom{6}{5}\binom{4}{1}}{\binom{24}{5}}

First, we want to count the number of five card hands of the same suit. There are 4 suits, so there are \binom{4}{1} ways to choose a suit. We want to multiply this choice by the possible ways to choose a set of 5 distinct card “faces”. Since there are 6 card faces (the 9, 10, Jack, Queen, King, and Ace), we count this with \binom{6}{5}. From the previous part, we found the total 5 card hands is \binom{24}{5}. So the probability that we want is \frac{\binom{6}{5}\binom{4}{1}}{\binom{24}{5}}.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 71%.


Problem 6.3

What is the probability that exactly two cards in a hand of Euchre are from the same suit?

\frac{\binom{4}{1} \cdot \binom{6}{2} \cdot \binom{3}{3} \cdot \binom{6}{1}^3}{\binom{24}{5}}

First, we want to count the number of hands where exactly 2 cards are from the same suit. This means that 2 of the 5 cards are any of the 4 suits, and the remaining 3 cards are each a different suit. We begin by counting the ways to choose 1 suit that 2 of the cards will share: \binom{4}{1}. Then, we must multiply the ways to choose 2 card faces out of the 6 possibilities, which is \binom{6}{2}. The remaining 3 cards in our hand must each be a distinct suit out of the 3 remaining suits; there is exactly \binom{3}{3} = 1 way to choose these suits. Each of the remaining cards can be 1 of 6 card faces, so we choose these 3 cards with \binom{6}{1}^3. The number of hands we want to count is \binom{4}{1} \cdot \binom{6}{2} \cdot \binom{3}{3} \cdot \binom{6}{1}^3.

Similar to the previous problem, we divide by the total number of hands to obtain the probability: \frac{\binom{4}{1} \cdot \binom{6}{2} \cdot \binom{3}{3} \cdot \binom{6}{1}^3}{\binom{24}{5}}
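With only \binom{24}{5} = 42504 possible hands, the count can be verified exhaustively. Here "exactly two cards from the same suit" is read, as in the solution above, as a 2-1-1-1 suit distribution:

```python
from itertools import combinations
from math import comb

deck = [(suit, face) for suit in range(4) for face in range(6)]  # 24 cards

count = 0
for hand in combinations(deck, 5):
    # Sorted per-suit counts; a 2-1-1-1 split sorts to [1, 1, 1, 2].
    suit_counts = sorted(sum(1 for s, _ in hand if s == suit) for suit in range(4))
    if suit_counts == [1, 1, 1, 2]:
        count += 1

formula = comb(4, 1) * comb(6, 2) * comb(3, 3) * comb(6, 1) ** 3
```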


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 33%.


Problem 6.4

Euchre is played by four players, so when all cards are dealt, there are four remaining cards (24 - 4\cdot 5). What is the probability that all four remaining cards are Jacks (the strongest card in Euchre)?

\frac{4}{24} \cdot \frac{3}{23} \cdot \frac{2}{22} \cdot \frac{1}{21}

By symmetry, the four leftover cards form a uniformly random 4-card subset of the deck, so the probability that they are all Jacks is the same as the probability that the first four cards dealt are all Jacks. There are 4 Jacks among the 24 cards, so the probability that the first card dealt is a Jack is \frac{4}{24}. After the first card is dealt, 23 cards remain; if that first card was a Jack, the probability the second card is a Jack is \frac{3}{23}. Following the same logic, if the first two cards were Jacks, the probability for the third card becomes \frac{2}{22}, and for the fourth, \frac{1}{21}.

Since the cards are dealt without replacement, the number of available Jacks decreases with each deal, and so does the total number of remaining cards. This is why the probability for each successive card is the remaining number of Jacks over the remaining number of cards in the deck.
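To double-check, the sequential product agrees with the counting view that exactly one of the \binom{24}{4} equally likely leftover sets is the set of four Jacks (a quick sketch; variable names are ours):

```python
from fractions import Fraction
from math import comb

# sequential (without replacement) product from the solution
sequential = Fraction(4, 24) * Fraction(3, 23) * Fraction(2, 22) * Fraction(1, 21)

# counting view: one favorable leftover set out of C(24, 4)
assert sequential == Fraction(1, comb(24, 4))
print(sequential)  # 1/10626
```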


Difficulty: ⭐️⭐️⭐️⭐️⭐️

The average score on this problem was 22%.



Problem 7

Source: Summer Session 2 2024 Final, Problem 7a-e

You go to the (world-renowned) San Diego Safari Park. The animals you see there appear at random, depending on the exhibit. The probabilities of seeing an animal in a given exhibit (Plains, Forest, and Jungle) are provided in the table below. For example, the probability of seeing a Camel if you are in the African Plains is \frac{3}{5} = 60\%, and the probability you do not see a Camel if you are in the African Plains is the complement, \frac{2}{5} = 40\%. Note that you may see multiple animals in each exhibit, so the probabilities given in each row sum to well over 100%.

On the day you go to the Safari Park, the tram navigation system malfunctions and takes you to exhibits at random. It takes guests to the African Plains with probability \frac{3}{5}, the Gorilla Forest with probability \frac{1}{10}, and the Hidden Jungle with probability \frac{3}{10}.

Location Rhino Camel Gazelle Lemur Fruit Bat Gorilla
African Plains \frac{3}{10} \frac{3}{5} \frac{1}{2} \frac{1}{10} - \frac{1}{4}
Gorilla Forest \frac{3}{10} \frac{1}{10} \frac{1}{5} \frac{1}{4} \frac{7}{20} \frac{9}{20}
Hidden Jungle - - - \frac{1}{5} \frac{2}{5} -

In this question, you may leave your answers unsimplified, in terms of fractions, exponents, factorials, the permutation formula P(n, k), and the binomial coefficient {n \choose k}.


Problem 7.1

Assume you only go to one exhibit. What is the probability you see a Lemur at the Safari Park? Show your work, and put a box around your final answer.

\frac{1}{10}\cdot\frac{3}{5} + \frac{1}{4}\cdot\frac{1}{10} + \frac{1}{5}\cdot\frac{3}{10}

To solve this problem we need to use the law of total probability. Recall the law of total probability: P(A) = \sum_{i=1}^n P(A \mid B_i) P(B_i), where the events B_i form a partition of the sample space, meaning they are mutually exclusive and collectively exhaustive.

To use the law of total probability here we need to look at the three different instances one may see a lemur! We can see that the lemur is present in the Plains, Forest, and Jungle. This means our equation will look like:

P(\text{Lemur}) = P(\text{Lemur}|\text{Plains})P(\text{Plains}) + P(\text{Lemur}|\text{Forest})P(\text{Forest}) + P(\text{Lemur}|\text{Jungle})P(\text{Jungle})

We are given the conditional probabilities in the table:

  • P(\text{Lemur}|\text{Plains}) = \frac{1}{10}
  • P(\text{Lemur}|\text{Forest})= \frac{1}{4}
  • P(\text{Lemur}|\text{Jungle}) = \frac{1}{5}

In the directions we are given:

  • P(\text{Plains}) = \frac{3}{5}
  • P(\text{Forest}) = \frac{1}{10}
  • P(\text{Jungle}) = \frac{3}{10}

Now all we have to do is plug them into the equation!

\frac{1}{10}\cdot\frac{3}{5} + \frac{1}{4}\cdot\frac{1}{10} + \frac{1}{5}\cdot\frac{3}{10}

Recall we can leave our answer unsimplified!
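The law of total probability calculation can be reproduced with exact fractions (a short sketch; the dictionary layout is ours):

```python
from fractions import Fraction

# exhibit -> (P(exhibit), P(Lemur | exhibit)), from the problem statement
exhibits = {
    "Plains": (Fraction(3, 5), Fraction(1, 10)),
    "Forest": (Fraction(1, 10), Fraction(1, 4)),
    "Jungle": (Fraction(3, 10), Fraction(1, 5)),
}

# law of total probability: sum of P(Lemur | exhibit) * P(exhibit)
p_lemur = sum(p_ex * p_cond for p_ex, p_cond in exhibits.values())
print(p_lemur)  # 29/200
```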


Difficulty: ⭐️⭐️

The average score on this problem was 81%.


Problem 7.2

Assume you only go to one exhibit, and you see a Lemur. What is the probability you are at the African Plains? Show your work, and put a box around your final answer.

\frac{\frac{1}{10}\cdot\frac{3}{5}}{\frac{1}{10}\cdot\frac{3}{5} + \frac{1}{4}\cdot\frac{1}{10} + \frac{1}{5}\cdot\frac{3}{10}} = \frac{12}{29}

To solve this question we need to use Bayes’ Theorem and the law of total probability from part a.

\frac{P(\text{Lemur}|\text{Plains})P(\text{Plains})}{P(\text{Lemur})}

We know from part a that:

\begin{align*} P(\text{Lemur}) &= \frac{1}{10}\cdot\frac{3}{5} + \frac{1}{4}\cdot\frac{1}{10} + \frac{1}{5}\cdot\frac{3}{10}\\ &= \frac{3}{50}+\frac{1}{40}+\frac{3}{50}\\ &= \frac{29}{200} \end{align*}

and that P(\text{Lemur}|\text{Plains}) = \frac{1}{10}.

All we have to do is now combine them!

\frac{P(\text{Lemur} \mid \text{Plains})P(\text{Plains})}{P(\text{Lemur})} = \frac{\frac{3}{50}}{\frac{29}{200}} = \frac{12}{29}
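The Bayes' theorem step, as a quick exact-arithmetic check (illustrative only):

```python
from fractions import Fraction

p_lemur = Fraction(29, 200)              # P(Lemur), from part (a)
p_plains = Fraction(3, 5)                # P(Plains)
p_lemur_given_plains = Fraction(1, 10)   # P(Lemur | Plains)

# Bayes: P(Plains | Lemur) = P(Lemur | Plains) P(Plains) / P(Lemur)
p_plains_given_lemur = p_lemur_given_plains * p_plains / p_lemur
print(p_plains_given_lemur)  # 12/29
```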


Difficulty: ⭐️⭐️

The average score on this problem was 75%.


Problem 7.3

The following information is repeated from the previous page, for your convenience.

You go to the (world-renowned) San Diego Safari Park. The animals you see there appear at random, depending on the exhibit. The probabilities of seeing an animal in a given exhibit (Plains, Forest, and Jungle) are provided in the table below. For example, the probability of seeing a Camel if you are in the African Plains is \frac{3}{5} = 60\%, and the probability you do not see a Camel if you are in the African Plains is the complement, \frac{2}{5} = 40\%. Note that you may see multiple animals in each exhibit, so the probabilities given in each row sum to well over 100%.

On the day you go to the Safari Park, the tram navigation system malfunctions and takes you to exhibits at random. It takes guests to the African Plains with probability \frac{3}{5}, the Gorilla Forest with probability \frac{1}{10}, and the Hidden Jungle with probability \frac{3}{10}.

Location Rhino Camel Gazelle Lemur Fruit Bat Gorilla
African Plains \frac{3}{10} \frac{3}{5} \frac{1}{2} \frac{1}{10} - \frac{1}{4}
Gorilla Forest \frac{3}{10} \frac{1}{10} \frac{1}{5} \frac{1}{4} \frac{7}{20} \frac{9}{20}
Hidden Jungle - - - \frac{1}{5} \frac{2}{5} -

Suppose you visit the Hidden Jungle four times. What is the probability you see a Fruit Bat at least once? Show your work, and put a box around your final answer.

1 - (\frac{3}{5})^4

The best way to solve this problem is by using the complement rule. We can calculate the probability of not seeing a Fruit Bat in the Hidden Jungle: 1 - P(\text{Fruit Bat | Hidden Jungle}) = 1 - \frac{2}{5} = \frac{3}{5}. We then raise this fraction to the power of 4 for the four independent visits: \left(\frac{3}{5}\right)^4 is the probability we never see a Fruit Bat in our four trips to the Hidden Jungle. Now we use the complement rule again: 1 - \left(\frac{3}{5}\right)^4 gives us the probability of seeing at least one Fruit Bat in 4 visits.

Essentially, 1 - (1 - P(\text{Fruit Bat | Hidden Jungle}))^4 = P(\text{at least 1 Fruit Bat} | \text{Hidden Jungle}).
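The complement-rule computation, as a short exact-fraction sketch:

```python
from fractions import Fraction

p_bat = Fraction(2, 5)                 # P(Fruit Bat | Hidden Jungle)
p_at_least_one = 1 - (1 - p_bat) ** 4  # complement of "never in 4 visits"
print(p_at_least_one)  # 544/625
```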


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 63%.


Problem 7.4

Suppose you visit the African Plains. What is the probability you see a Rhino or a Gazelle? Assume the two events are independent. Show your work, and put a box around your final answer.

\frac{3}{10} + \frac{1}{2} - \frac{3}{10} \cdot \frac{1}{2} =\frac{6}{20} + \frac{10}{20} - \frac{3}{20} = \frac{13}{20}

To solve this problem we know we will need to use Inclusion-Exclusion Principle (P(\text{Rhino} \cup \text{Gazelle}) = P(\text{Rhino}) + P(\text{Gazelle}) - P(\text{Rhino} \cap \text{Gazelle})).

We are in the African Plains! This means:

  • P(\text{Rhino}) = \frac{3}{10}
  • P(\text{Gazelle}) = \frac{1}{2}

Since we are told the two events are independent we can do: P(\text{Rhino} \cap \text{Gazelle}) = P(\text{Rhino}) \cdot P(\text{Gazelle}) = \frac{3}{10} \cdot \frac{1}{2} = \frac{3}{20}.

Now we just plug into the Inclusion-Exclusion Principle:

\begin{align*} P(\text{Rhino} \cup \text{Gazelle}) &= P(\text{Rhino}) + P(\text{Gazelle}) - P(\text{Rhino} \cap \text{Gazelle})\\ &= \frac{3}{10} + \frac{1}{2} - \frac{3}{10} \cdot \frac{1}{2} \\ &= \frac{6}{20} + \frac{10}{20} - \frac{3}{20}\\ &= \frac{13}{20} \end{align*}
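The inclusion-exclusion arithmetic, checked with exact fractions (sketch):

```python
from fractions import Fraction

p_rhino, p_gazelle = Fraction(3, 10), Fraction(1, 2)

# inclusion-exclusion, using independence for the intersection term
p_union = p_rhino + p_gazelle - p_rhino * p_gazelle
print(p_union)  # 13/20
```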


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 65%.


Problem 7.5

In the African Plains, the probability you see a Rhino, given a Gazelle is present, is \frac{2}{5}, and the probability you see a Camel, given a Gazelle is present, is \frac{7}{10}. If the probability you see both a Rhino and Camel, given a Gazelle is present, is \frac{3}{5}, are the two events conditionally independent?

No

Two events are conditionally independent given a third event if: P(A \cap B | C) = P(A|C) \cdot P(B|C).

We can replace the variables for our scenario: P(\text{Rhino} \cap \text{Camel} \mid \text{Gazelle}) = P(\text{Rhino}|\text{Gazelle}) \cdot P(\text{Camel}|\text{Gazelle}).

We are given:

  • P(\text{Rhino}|\text{Gazelle}) = \frac{2}{5}
  • P(\text{Camel}|\text{Gazelle}) = \frac{7}{10}
  • P(\text{Rhino} \cap \text{Camel} \mid \text{Gazelle}) = \frac{3}{5}

We can simply plug the given values into the equation to see if it holds: \frac{2}{5} \cdot \frac{7}{10} = \frac{7}{25}

This means the answer is “No” because \frac{3}{5} \neq \frac{7}{25}.
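The independence test reduces to a single comparison (sketch; variable names are ours):

```python
from fractions import Fraction

p_rhino = Fraction(2, 5)    # P(Rhino | Gazelle)
p_camel = Fraction(7, 10)   # P(Camel | Gazelle)
p_both = Fraction(3, 5)     # P(Rhino and Camel | Gazelle)

# conditionally independent iff the joint equals the product
print(p_both == p_rhino * p_camel)  # False, since 3/5 != 7/25
```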


Difficulty: ⭐️⭐️

The average score on this problem was 77%.



Problem 8

Source: Summer Session 2 2024 Final, Problem 8

The San Diego Safari Park has one of the world’s most successful cheetah breeding programs. To support this, they keep track of their cheetahs’ characteristics. For each cheetah, the Safari Park keeps track of:

We’re given the following information about the cheetahs in the park:

A new cheetah is observed with dark fur and short claws. Assuming conditional independence of features (color, length) within a class (development), calculate the probability that this cheetah is young using Bayes’ Theorem. In this question, you may leave your answers unsimplified, in terms of fractions, exponents, factorials, the permutation formula P(n, k), and the binomial coefficient {n \choose k}.

\frac{7}{13}

To calculate the probability that a cheetah is young given that it has dark fur and short claws we can use Bayes’ Theorem!

\frac{P(\text{Dark Fur} \cap \text{Short Claws} | \text{Young}) \cdot P(\text{Young})}{P(\text{Dark Fur} \cap \text{Short Claws})}

We first need to calculate: P(\text{Dark Fur} \cap \text{Short Claws} | \text{Young}).

We can do this using the equation: P(\text{Dark Fur} \cap \text{Short Claws} | \text{Young}) = P(\text{Dark Fur} | \text{Young}) \cdot P(\text{Short Claws} | \text{Young}).

We are given:

  • P(\text{Dark Fur} \mid \text{Young}) = 0.80
  • P(\text{Short Claws} \mid \text{Young}) = 0.70

This means: P(\text{Dark Fur} \cap \text{Short Claws} | \text{Young}) = 0.8 \cdot 0.7 = 0.56

Notice the denominator of our Bayes' Theorem is P(\text{Dark Fur} \cap \text{Short Claws}); this means we need the probability of seeing dark fur and short claws regardless of the developmental stage.

This follows the equation for the law of total probability:

\begin{align*} P(\text{Dark Fur} \cap \text{Short Claws}) &= P(\text{Dark Fur} \cap \text{Short Claws} | \text{Young}) \cdot P(\text{Young})\\ &+ P(\text{Dark Fur} \cap \text{Short Claws} | \text{Adolescent}) \cdot P(\text{Adolescent})\\ &+ P(\text{Dark Fur} \cap \text{Short Claws} | \text{Mature}) \cdot P(\text{Mature}) \end{align*}

This means we need to calculate: P(\text{Dark Fur} \cap \text{Short Claws} | \text{Adolescent}) and P(\text{Dark Fur} \cap \text{Short Claws} | \text{Mature}).

For Adolescent we are given:

  • P(\text{Golden Fur} \mid \text{Adolescent}) = 0.50
  • P(\text{Long Claws} \mid \text{Adolescent}) = 0.40

We have to take the complements of these qualities to get dark fur and short claws!

  • P(\text{Dark Fur} \mid \text{Adolescent}) = 1 - P(\text{Golden Fur} \mid \text{Adolescent}) = 1 - 0.50 = 0.5
  • P(\text{Short Claws} \mid \text{Adolescent}) = 1 - P(\text{Long Claws} \mid \text{Adolescent}) = 1 - 0.40 = 0.6

This means: P(\text{Dark Fur} \cap \text{Short Claws} | \text{Adolescent}) = 0.5 \cdot 0.6 = 0.3

For Mature we are given:

  • P(\text{Long Claws} \mid \text{Mature}) = 1
  • P(\text{Dark Fur} \mid \text{Mature}) = 0.10

We have to take the complement of long claws to get short claws!

  • P(\text{Short Claws} \mid \text{Mature}) = 1 - P(\text{Long Claws} \mid \text{Mature}) = 1 - 1 = 0

This means: P(\text{Dark Fur} \cap \text{Short Claws} | \text{Mature}) = 0 \cdot 0.1 = 0

To finally calculate the denominator we need these probabilities:

  • P(\text{Young}) = 0.25
  • P(\text{Adolescent}) = 0.40
  • P(\text{Mature}) = 0.35

Now we can simply plug our numbers into the equation!

\begin{align*} P(\text{Dark Fur} \cap \text{Short Claws}) &= P(\text{Dark Fur} \cap \text{Short Claws} | \text{Young}) \cdot P(\text{Young})\\ &+ P(\text{Dark Fur} \cap \text{Short Claws} | \text{Adolescent}) \cdot P(\text{Adolescent})\\ &+ P(\text{Dark Fur} \cap \text{Short Claws} | \text{Mature}) \cdot P(\text{Mature})\\ &= 0.56 \cdot 0.25 + 0.3 \cdot 0.4 + 0 \cdot 0.35 \\ &= 0.14 + 0.12\\ &= 0.26 \end{align*}

Now we have all the components to answer the question:

\frac{P(\text{Dark Fur} \cap \text{Short Claws} | \text{Young}) \cdot P(\text{Young})}{P(\text{Dark Fur} \cap \text{Short Claws})} = \frac{0.56 \cdot 0.25}{0.26} = \frac{7}{13}
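The whole calculation fits in a few lines of exact arithmetic (a sketch; the dictionary names are ours, and P(\text{Short Claws} \mid \text{Mature}) = 0 is the complement derived above):

```python
from fractions import Fraction

# priors and per-class feature probabilities from the problem
priors = {"Young": Fraction(1, 4), "Adolescent": Fraction(2, 5), "Mature": Fraction(7, 20)}
p_dark = {"Young": Fraction(4, 5), "Adolescent": Fraction(1, 2), "Mature": Fraction(1, 10)}
p_short = {"Young": Fraction(7, 10), "Adolescent": Fraction(3, 5), "Mature": Fraction(0)}

# numerators, assuming conditional independence of features within a class
num = {c: priors[c] * p_dark[c] * p_short[c] for c in priors}
total = sum(num.values())  # law of total probability (the denominator)
print(num["Young"] / total)  # 7/13
```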


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 50%.


Problem 9

Source: Summer Session 2 2024 Final, Problem 9

Information about a sample of 50 cheetahs in the San Diego Safari Park is summarized in the table below.


               Golden Fur               Dark Fur
               Short Claws  Long Claws  Short Claws  Long Claws  Sum of Row
Young               2           0           10           4           16
Adolescent          6           3            2           6           17
Mature              3          12            0           2           17
Sum of Column      11          15           12          12           50


For instance, we’re told that 10 cheetahs with dark fur and short claws are young and that there were 11 cheetahs with golden fur and short claws.

Given its other characteristics, San Diego Safari Park would like to use this information to predict whether a new cheetah to the Park is young, adolescent, or mature.

A new cheetah is observed with golden fur and long claws. Using the data in the table and assuming conditional independence of features, use the Naive Bayes formula with smoothing to determine which developmental stage is most likely for the new cheetah. In this question, you may leave your answers unsimplified, in terms of fractions, exponents, factorials, the permutation formula P(n, k), and the binomial coefficient {n \choose k}.

Mature

To figure out if the new cheetah is young, adolescent, or mature we need to find the probabilities for each stage of development.

This means we need to find the probability of a young cheetah having golden fur and long claws, the probability of an adolescent cheetah having golden fur and long claws, and the probability of a mature cheetah having golden fur and long claws.

Let Age represent either Young, Adolescent, or Mature. For each age, we compute the Naive Bayes score P(\text{Age}|\text{Golden Fur} \cap \text{Long Claws}) \propto P(\text{Age}) \cdot P(\text{Golden Fur}|\text{Age}) \cdot P(\text{Long Claws}|\text{Age}). The denominator P(\text{Golden Fur} \cap \text{Long Claws}) is the same for every age, so we can simply compare these numerators.

Recall we will also be using smoothing! This means:

P(\text{Feature}|\text{Class}) = \frac{\text{Count of Features in Class} + 1}{\text{Total in Class} + \text{Number of Possible Feature Values}}

When Age = Young:

  • P(\text{Young}) = \frac{16+1}{50+3} = \frac{17}{53}
  • P(\text{Golden Fur}|\text{Young}) = \frac{2 + 0 + 1}{16 + 2} = \frac{3}{18}
  • P(\text{Long Claws}|\text{Young}) = \frac{0 + 4 + 1}{16 + 2} = \frac{5}{18}

\begin{align*} P(\text{Young}|\text{Golden Fur} \cap \text{Long Claws}) &\propto P(\text{Young}) \cdot P(\text{Golden Fur}|\text{Young}) \cdot P(\text{Long Claws}|\text{Young})\\ &= \frac{3}{18} \cdot \frac{5}{18} \cdot \frac{17}{53}\\ &= \frac{85}{5724} \end{align*}

When Age = Adolescent:

  • P(\text{Adolescent}) = \frac{17 + 1}{50 + 3} = \frac{18}{53}
  • P(\text{Golden Fur}|\text{Adolescent}) = \frac{6 + 3 + 1}{17 + 2} = \frac{10}{19}
  • P(\text{Long Claws}|\text{Adolescent}) = \frac{3 + 6 + 1}{17 + 2} = \frac{10}{19}

\begin{align*} P(\text{Adolescent}|\text{Golden Fur} \cap \text{Long Claws}) &\propto P(\text{Adolescent}) \cdot P(\text{Golden Fur}|\text{Adolescent}) \cdot P(\text{Long Claws}|\text{Adolescent})\\ &= \frac{10}{19} \cdot \frac{10}{19} \cdot \frac{18}{53}\\ &= \frac{1800}{19133} \end{align*}

When Age = Mature:

  • P(\text{Mature}) = \frac{17+1}{50+3} = \frac{18}{53}
  • P(\text{Golden Fur}|\text{Mature}) = \frac{3 + 12 + 1}{17 + 2} = \frac{16}{19}
  • P(\text{Long Claws}|\text{Mature}) = \frac{12 + 2 + 1}{17 + 2} = \frac{15}{19}

\begin{align*} P(\text{Mature}|\text{Golden Fur} \cap \text{Long Claws}) &\propto P(\text{Mature}) \cdot P(\text{Golden Fur}|\text{Mature}) \cdot P(\text{Long Claws}|\text{Mature})\\ &= \frac{16}{19} \cdot \frac{15}{19} \cdot \frac{18}{53}\\ &= \frac{4320}{19133} \end{align*}

Mature gives the highest score of the three, so we predict the new cheetah is mature.
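The smoothed Naive Bayes comparison can be reproduced directly from the table counts (a sketch; the count tuples and variable names are ours):

```python
from fractions import Fraction

# class -> (golden-fur count, long-claw count, class total), read off the table
counts = {"Young": (2, 4, 16), "Adolescent": (9, 9, 17), "Mature": (15, 14, 17)}
n = 50  # total cheetahs in the sample

scores = {}
for cls, (golden, long_claws, total) in counts.items():
    prior = Fraction(total + 1, n + 3)            # smoothed prior: 3 classes
    p_golden = Fraction(golden + 1, total + 2)    # smoothed: 2 fur colors
    p_long = Fraction(long_claws + 1, total + 2)  # smoothed: 2 claw lengths
    scores[cls] = prior * p_golden * p_long

print(max(scores, key=scores.get))  # Mature
```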


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 59%.


👋 Feedback: Find an error? Still confused? Have a suggestion? Let us know here.