Spring 2023 Final Exam Part 2

← return to practice.dsc40a.com


Instructor(s): Janine Tiefenbruck

This exam was administered in person. Students had 50 minutes to take this exam.


Problem 1

Source: Spring 2023 Final Part 2, Problem 1

You run the k-means clustering algorithm on a dataset and it converges to a certain clustering with associated inertia I. You then duplicate each data point in the dataset and run k-means again on this twice-as-big dataset, with the same initial centroids as before. Which of the following is true? Select all that apply.

Bubbles 1 and 3: “The centroids found will be the same as before” and “The inertia will be twice as much as before, 2I.”

The centroids found will be the same as before because the dataset has been duplicated, but they have not moved! This means k means clustering will find the same clusters as before. You can imagine it like points overlapping each other.

The inertia will be twice as much before 2I because of how inertia is calculated. Recall inertia measures how well a dataset is clustered and is calculated by measuring the distance between each data point and its centroid, squaring the distance, and summing these squares across a cluster. We have doubled our points! Since they did not move otherwise the difference does not change, but we have twice as many points, which will cause the inertia to double.


Problem 2

Source: Spring 2023 Final Part 2, Problem 2

You roll a 12-sided die 8 times.


Problem 2.1

What is the probability you roll 8 distinct values?

\dfrac{{12 \choose 8}\cdot8!}{12^8} = \dfrac{12}{12}\cdot\dfrac{11}{12}\cdot\dfrac{10}{12}\cdot\dots\cdot\dfrac{5}{12}

There’s two ways you can think of this problem.

The numerator {12 \choose 8}\cdot8! is the number of total possible permutations of 8 distinct values; {12 \choose 8} possible sets of 8 distinct values times 8! orderings, because the die could roll the same set of distinct values in 8! different ways. The denominator 12^8 is the number of total possible outcomes. Altogether, dividing the numerator by the denominator gives us the probability of rolling 8 distinct values.

We can also solve this problem differently by realizing that the rolls are independent. In the first roll the probability of getting a new number is 1 (since any number would be new), now in the second roll the probability of getting a new number is \frac{11}{12} since we’ve only rolled one number so rolling any of the other 11 numbers would be rolling a new number.If you keep doing this for each roll, and multiply the independent probabilities of getting a new number in each of the rolls you’ll get \dfrac{12}{12}\cdot\dfrac{11}{12}\cdot\dfrac{10}{12}\cdot\dots\cdot\dfrac{5}{12}


Problem 2.2

What is the probability you roll exactly 7 distinct values?

\dfrac{{8 \choose 2}\cdot{12 \choose 7}\cdot7!}{12^8}

Again, I’m going to show you two different ways you can think of this problem. We can get the total number of outcomes that get you exactly 7 distinct values by doing the following, first choose 7 from the 12 possible numbers (there’s {12 \choose 7} ways to do it), then choose a number to duplicate (there’s {7 \choose 1} ways to do that), lastly you need to order them. From the 8 possible places you have availible you need to choose 2 for the repeated value (there’s {8 \choose 2} ways to do that), then once you chose a placement for your reapeted value you need to choose one for each of the remaing 6 values(there’s 6! ways to do that). Finally just multiply the different ways you can perform each of those actions and you’ll get the number of ways you can roll 7 different values \begin{align*} {12 \choose 7}\cdot{7 \choose 1}\cdot{8 \choose 2}\cdot6! = {8 \choose 2}\cdot{12 \choose 7}\cdot7! \end{align*}

lastly, just divide by the total number of possible outcomes and you get \dfrac{{8 \choose 2}\cdot{12 \choose 7}\cdot7!}{12^8}

You could also find the number of rolls with exactly 7 different values by doing the following. First, select an ordered string of size 7 from the 12 possible values (there’s {12 \choose 7}\cdot7! ways to do that), then take the first value of the string as the duplicated value and keep the others as they are, now you only need to place 2 copies of the repeated value somewhere on your string of 6 numbers (there’s {8 \choose 2} ways to do that), multiply this the ways you can do these two actions and divide by total number of outcomes and you’ll get \dfrac{{8 \choose 2}\cdot{12 \choose 7}\cdot7!}{12^8} This solution is faster but less intuitive.



Problem 3

Source: Spring 2023 Final Part 2, Problem 3

Consider three events A, B, and C in the same sample space.


Problem 3.1

Which of the following is equivalent to P((A\cap B)|(B \cap C))? Select all that apply.

P(A|(B \cap C)) and P((A \cap C)|(B \cap C))

Recall the formula for conditional probability is P(A|B) = \frac{P(A \cap B)}{P(B)}. We are going to use this fact to figure out which of the following are the same as P((A\cap B)|(B \cap C)).

We should first expand P((A\cap B)|(B \cap C)) by doing: P((A\cap B)|(B \cap C)) = \frac{P(A \cap B) \cap P(B \cap C)}{P(B \cap C)}. We can further simplify the numerator to be (A \cap B) \cap (B \cap C) \rightarrow A \cap B \cap C. This means we get P((A\cap B)|(B \cap C)) by doing: P((A\cap B)|(B \cap C)) = \frac{P(A \cap B \cap C)}{P(B \cap C)}.

Now we can go through each of the options and figure out if, when put inside the formula for conditional probability, we get \frac{P(A \cap B \cap C)}{P(B \cap C)}.


P(A|(B \cap C)) P(A|(B \cap C)) = \frac{P(A)P(B \cap C)}{P(B \cap C)}= \frac{P(A \cap (B \cap C))}{P(B \cap C)} = \frac{P(A \cap B \cap C)}{P(B \cap C)} This matches \frac{P(A \cap B \cap C)}{P(B \cap C)}!


P(A \cap B \cap C)

We can see that there is no way for this to equal \frac{P(A \cap B \cap C)}{P(B \cap C)} because it is missing the denominator.


P((B \cap C)|(A \cap B))

P((B \cap C)|(A \cap B)) = \frac{P(B \cap C)P(A \cap B)}{P(A \cap B)}= \frac{P(A \cap B \cap C)}{P(A \cap B)} This does not match \frac{P(A \cap B \cap C)}{P(B \cap C)}.


P((A \cap C)|(B \cap C)) P((A \cap C)|(B \cap C)) = \frac{P(A \cap C)P(B \cap C)}{P(B \cap C)} = \frac{P(A \cap B \cap C)}{P(B \cap C)} This matches \frac{P(A \cap B \cap C)}{P(B \cap C)}!


Since options 1 and 4 are correct the answer cannot be None of the Above!


Problem 3.2

Suppose P((A \cap B)|C) = P(A|(B \cap C))*P(B). Which of the following pairs of events must be independent?

B, C

We are going to rewrite the following in the hopes of being able to simplify things later. To do this we will once again use the formula for conditional probability: P(A|B) = \frac{P(A \cap B)}{P(B)}

Let’s look at the left side of P((A \cap B)|C) = P(A|(B \cap C))*P(B).

P((A \cap B)|C) = \frac{P(A \cap B)P(C)}{P(C)} = \frac{P(A \cap B \cap C)}{P(C)}

Let’s now look at the right side of P((A \cap B)|C) = P(A|(B \cap C))*P(B).

P(A|(B \cap C))*P(B) = \frac{P(A)P(B \cap C)}{P(B \cap C)} * P(B) = \frac{P(A \cap B \cap C)}{P(B \cap C)}* P(B) = \frac{P(A \cap B \cap C) * P(B)}{P(B \cap C)}

Now we can look at them together! \frac{P(A \cap B \cap C)}{P(C)} = \frac{P(A \cap B \cap C) * P(B)}{P(B \cap C)}. We can cross multiply to clear the fractions: P(A \cap B \cap C) * P(B \cap C) = P(A \cap B \cap C) * P(B) * P(C). Assuming that P(A \cap B \cap C) \neq 0 we can divide both sides by P(A \cap B \cap C). We end up with P(B \cap C) = P(B) * P(C), which demonstrates to us that B and C are independent of one another.



Problem 4

Source: Spring 2023 Final Part 2, Problem 4

At Panda Express, there are 18 items available on the menu. Food is served on plates that are divided into three equal sections, and all sections of the plate must contain exactly one menu item.


Problem 4.1

Suppose each section of the plate is a different color (in a clockwise order around the plate, the sections are red, yellow, blue). How many different-looking plates of food can be created, if we must have three different menu items on the plate?

One example plate of food has Honey Walnut Shrimp in the red section, Kung Pao Chicken in the yellow section, and brown rice in the blue section.

{18 \choose 3} \cdot 3!

This problem is exactly asking for the number of permutations of three elements from a pool of 18 elements, which is \begin{align*} 18\cdot17\cdot16 = \frac{18!}{15!} = {18 \choose 3} \cdot 3! \end{align*}

You should remember that whenever you draw items without replacement and the order matters then you are calculating a permutation. Alternatively, you can reason that in the red segment of the plate, you can have any of the 18 different menu items, then for the yellow segment you only have 17 menu items available, since the red already has one, lastly you have 16 items left for the blue part of the plate. Then just multiply all the ways you can choose menu items for each segment getting a total of \begin{align*} 18\cdot17\cdot16 = {18 \choose 3} \cdot 3! \end{align*}


Problem 4.2

Suppose each section of the plate is a different color (in a clockwise order around the plate, the sections are red, yellow, blue). How many different-looking plates of food can be created, if we are allowed to have the same food in multiple sections?

18^3

In this question, since we are allowed to repeat menu items, we have 18 options for each of the sections of the plate. Then when you multiply all the ways you can choose items for a secyion of the plate you get 18\cdot18\cdot18 = 18^3. Similarly to how we explained the question before this one, aske your self if the drawing process has replacement and if the order matters. In this case you’ll find that we have replacement and we also have ordeirng, so this is a sequence, and the number of sequence’s of lenght k from a pool of n elements is n^k.


Problem 4.3

Suppose each section of the plate is the same color (white). How many different-looking plates of food can be created, if we must have three different menu items on the plate?

One example plate of food has, in a clockwise order around the plate, Eggplant Tofu, Broccoli Beef, and fried rice. This is the same plate as the one that has, in a clockwise order, Broccoli Beef, fried rice, and Eggplant Tofu, because one plate can be rotated to look like the other (this counts as the same.)

2 \cdot {18 \choose 3}

Similarly to how we solved the previous problems, we are going to deconstruct the process of building a plate with the given characteristics. First, we know that any plate must have three items, so let’s start by choosing three items from the 18 menu items we have available (we have {18 \choose 3} ways to do that).

Now once you chose the menu items that go on your plate, you need to arange them on your plate, normally if we are aranging three distinct elements on a line we know that there’s 3! ways to do that, unfortunately like we saw in the prompt of this problem we need to account for when we rotate the plate. In the prompt it was signaled that the plate with Eggplant Tofu, Broccoli Beef, and fried rice in clockwise order around the plate, is the same as the plate with Broccoli Beef, fried rice, and Eggplant Tofu (also in clockwise order), this happens because if you rotate your fisrt plate in counter clockwise direction then you would get the second plate, and in fact we could do this again to show those two plates are the same as the plate with fried rice, Eggplant Tofu, and Broccoli Beef. If you were to do the rotation again you would get the original plate, so for any specific plate we can find three different lines that describe the same plate, thus if we divide the number of ways to order our 3 menu items on a line(3!) by 3, we should get the number of different plates we can build with any three items, so \frac{3!}{3} = 2

Finally we only need to mutiply the number of ways we can choose three menu items by the number of plates we can build with those three items getting 2 \cdot {18 \choose 3}.


Problem 4.4

Suppose each section of the plate is the same color (white). Now we’d like to consider what happens if we can have the same food in multiple sections. What do we need to add to 4.3’s answer to get the number of different-looking plates of food that can be created, if we are allowed to have the same food in multiple sections?

As before, two plates that can be rotated to look like one another count as the same.

2 \cdot {18 \choose 2} + {18 \choose 1}

Now that we are allowed to repeat menu items we need to consider the 2 new types of plates, type 1 a plate with the same menu item 3 times, type 2 a plate with a repeated menu item and an additional menu item. Then the total number of plates should be our answer from 4.3 plus the number of plates with type 1 and number of plates with type 2.

Type 1 Counting the number of type 1 plates should be fairly simple, if your plate only contains one menu item then there should be one plate of this type for each of the menu items, therefore there’s 18 plates like this.

Type 2 Now for type 2, we can start by chossing two items from the menu (there’s 18 \choose 2 ways to do that) then from here note that you need to choose which item you are going to repeat because depending on which one you choose to double you get a different plate (there’s 2 options to double), then you should also consider the order but you should realize that no matter how you arange the item on your plate you are just getting rotations of the same plate, therefore your total number of plates of type 2 is 2 \cdot {18 \choose 2}

Lastly, just add both the plates of type 1 and type 2, and that’s the number you need to add to your answer from 4.3. Answer: 2 \cdot {18 \choose 2} + {18 \choose 1}



Problem 5

Source: Spring 2023 Final Part 2, Problem 5

You’re interested in seeing the Milky Way in the night sky, which you have sometimes been able to do when conditions are right. For each night that you’ve attempted to see the Milky Way, you have a record of:

These 12 attempts are recorded in the table below.

setting season time       outcome
suburban winter late night successful
rural spring late night successful
urban winter early morning successful
rural winter early morning successful
rural summer late night unsuccessful
urban winter late night unsuccessful
rural winter early morning unsuccessful
rural spring late night unsuccessful
urban fall early morning unsuccessful
suburban summer late night unsuccessful
urban winter late night unsuccessful
rural winter late night unsuccessful

You want to use a Naive Bayes classifier to predict whether you’ll be successful in seeing the Milky Way under the following conditions:

Naive Bayes predicts that, under these conditions, the probability you are unsuccessful is k times the probability you are successful, for an integer value of k. What is k?

k=3

According to the Naive Bayes formula we will be solving these equations: \begin{align*} &P(\text{successful}|\text{urban, winter, late night})\\ &= P(\text{successful}) * P(\text{urban}|\text{successful}) * P(\text{winter}|\text{successful}) * P(\text{late night}|\text{successful}) \end{align*} and \begin{align*} &P(\text{unsuccessful}|\text{urban, winter, late night}) \\ &= P(\text{unsuccessful}) * P(\text{urban}|\text{unsuccessful}) * P(\text{winter}|\text{unsuccessful}) * P(\text{late night}|\text{unsuccessful}) \end{align*}

This means we need to calculate our prior probabilities P(\text{successful}), P(\text{unsucessful}), P(\text{urban}|\text{successful}), P(\text{urban}|\text{usuccessful}), P(\text{winter}|\text{successful}), P(\text{winter}|\text{unsuccessful}), P(\text{late night}|\text{successful}), and P(\text{late night}|\text{unsuccessful}).

We can calculate P(\text{successful}) by looking at the number of successful outcomes from all possible outcomes. We find P(\text{successful}) = \frac{4}{12} = \frac{1}{3}.

Similarily we can calculate P(\text{unsucessful}). P(\text{unsucessful}) = \frac{8}{12} = \frac{2}{3}.

We can now calculate P(\text{urban}|\text{successful}) by looking at how many settings are urban when the outcome is successful. P(\text{urban}|\text{successful}) = \frac{1}{4}.

Similarily we can calculate P(\text{urban}|\text{unsuccessful}). P(\text{urban}|\text{unsuccessful}) = \frac{3}{8}.

Following this same pattern we will find:

P(\text{winter}|\text{successful}) = \frac{3}{4}

P(\text{winter}|\text{unsuccessful}) = \frac{4}{8} = \frac{1}{2}

P(\text{late night}|\text{successful}) = \frac{2}{4} = \frac{1}{2}

P(\text{late night}|\text{unsuccessful}) = \frac{6}{8} = \frac{3}{4}

From here all we need to do is plug our prior probnabilities into the equations we made earlier and solve!

\begin{align*} & P(\text{successful}|\text{urban, winter, late night})\\&= P(\text{successful}) * P(\text{urban}|\text{successful}) * P(\text{winter}|\text{successful}) * P(\text{late night}|\text{successful})\\ &= \frac{1}{3} * \frac{1}{4} * \frac{3}{4} * \frac{1}{2}\\ &= \frac{1}{32} \end{align*}

and

\begin{align*} & P(\text{unsuccessful}|\text{urban, winter, late night}) \\ &= P(\text{unsuccessful}) * P(\text{urban}|\text{unsuccessful}) * P(\text{winter}|\text{unsuccessful}) * P(\text{late night}|\text{unsuccessful})\\ &= \frac{2}{3} * \frac{3}{8} * \frac{1}{2} * \frac{3}{4}\\ &= \frac{3}{32} \end{align*}

We are told in the problem that “the probability you are unsuccessful is k times the probability you are successful.”

This means k * \frac{1}{32} = \frac{3}{32}, which we can easily see shows k = 3.


Problem 6

Source: Spring 2023 Final Part 2, Problem 6

Suppose UCSD implements a new graduation requirement for data science majors. All data science majors must submit a list of English words they can make using the letters of “Halicioglu”. When these lists are collected, it is determined that:

How many data science majors submitted a list?

280

This problem requires us to the the principle of inclusion-exclusion for three sets. For this problem lets define the following sets:

  • Let H be the set of people who included “hall” on their list
  • Let C be the set of people who included “chill” on their list
  • Let L be the set of people who included “logical” on their list

We are told:

  • |H| = 200
  • |C| = 100
  • |L| = 25
  • |H \cap C| = 80
  • |C \cap L| = 15
  • |H \cap L| = 20
  • |H \cap C \cap L| = 10
  • |H' \cap C' \cap L'| = 60
    • H' means “not H

Recall the principle of inclusion-exclusion for three sets follows: |H \cup C \cup L| = |H| + |C| + |L| - |H \cap C| - |C \cap L| - |H \cap L| + |H \cap C \cap L|

We can simply plug and chug into the equation: |H \cup C \cup L| = 200 + 100 + 25 - 80 - 15 - 20 + 10 = 220. This means the number of data science majors who submited words “hall,”chill,” and “logical” is 220. But this is not all data science majors!

We need to add in the number of data science majors who did not submit any of those words in their list, which we are told is 60 students. This means the number of data science majors is 220 + 60 = 280.


👋 Feedback: Find an error? Still confused? Have a suggestion? Let us know here.