Winter 2024 Final Exam Part 2

← return to practice.dsc40a.com


Instructor(s): Aobo Li

This exam was administered in person. Students had 50 minutes to take this exam.


Problem 1

Source: Winter 2024 Final Part 2, Problem 1

Consider an experiment where each of n people selects one piece from their own set of chess pieces, uniformly at random. A chess set has 32 pieces. 16 of these are black and 16 of these are white. In each color, the 16 pieces are:

The result of the experiment is a description of the colors and types of the pieces each person selected. For example, if n=3, one possible result is:


Problem 1.1

How many results are possible for this experiment with n people?

12^n

To answer this question we must determine how many different pieces each person can choose. In a complete chess set we know that there’s pawns, knights, bishops, rooks, queens, and kings. Then we also know that each of these pieces can be aither black or white, therefore there’s a totoal of 6 \cdot 2 = 12 different pieces. Notice that a complete chess set has 32 total pieces, but there’s multiple duplicates in those 32 pieces that is why we say there’s only 12 different pieces on a chess set.

Now since every person has their own chess set, we know that each of them can get any of the 12 pieces, therefore if we do this with one person there would be 12 possible outcomes, now if we do it with two persons the first person can get any of the 12 pieces, but then for each piece that person 1 can choose person 2 can choose from their 12 pieces, which means that in this case there’s 12 \cdot 12 outcomes. if we keep doing this for n persons we’ll get that there’s 12 \cdot \dots \cdot 12 = 12^n.


Problem 1.2

As n becomes large, what fraction of n people has selected a black pawn? Choose the answer that’s closest to the actual fraction.

\frac{1}{4}

First, let’s calculate the probability of one person choosing a black pawn. We know that the complete chess set has 32 pieces, and out of those 32 there are 8 black pawns, which gives us a probability of \frac{8}{32} = \frac{1}{4} of selecting a black pawn. Now, think about how we interpret probabilities; for example, if we say that the probability of getting heads when flipping a coin is \frac{1}{2} that just means that if we flip a coin over and over, we expect to see heads \frac{1}{2} of the time. Similarly, when we say that the probability of choosing a black pawn is \frac{1}{4}, that means that if we were to choose a chess piece over and over, we expect to choose a black pawn \frac{1}{4} of the time.

As n becomes large, we choose chess pieces over and over n times. Therefore, you still expect to see the black pawn \frac{1}{4} of the time.



Problem 1.3

True or False: Having two pawns is independent of having two white pieces.

False.

We know that two probabilities are independent of each other if P(A \cap B) = P(A) \cdot P(B).

  • Let P(A) be the probability of getting a white piece twice in a row.
  • Let P(B) be the probability of getting a pawn twice in a row.

We can calculate the probability of getting a white piece twice in a row P(A) = \frac{16}{32} \cdot \frac{15}{31} = \frac{15}{62}. We get \frac{16}{32} from the fact there are 16 white pieces out of the 32 total pieces. We get \frac{15}{31} from the fact there are now 15 white pieces out of the 31 pieces left after taking out a white piece.

We can also calculate the probability of getting a pawn twice in a row. Note that the color of the pawn does not matter! P(B) = \frac{16}{32} \cdot \frac{15}{31} = \frac{15}{62}. We get \frac{16}{32} from the fact there are 16 pawns (8 black pawns plus 8 white pawns) out of the 32 total pieces. We get \frac{15}{31} from the fact there are now 15 pawns out of the 31 pieces left after taking out a pawn.

We can now calculate P(A) \cdot P(B): \frac{15}{62} \cdot \frac{15}{62} = \frac{225}{3844}

Recall we calculate P(A \cap B) = P(A) \cdot P(B|A).

To find P(B|A), we assume that both pieces are white and imagine the probabilities of getting two pawns from that batch of white pieces. To get the first pawn, the probability is \frac{8}{16} (8 pawns out of 16 white pieces). To get the second pawn, the probability is: \frac{7}{15} (7 remaining pawns out of 15 remaining white pieces). So, the probabilitiy of getting two pawns given we have two white pieces is P(B|A) = \frac{8}{16} \cdot \frac{7}{15} = \frac{7}{30}. We also already solved for P(A) above: \frac{15}{62}.

\frac{15}{62} \cdot \frac{7}{30} = \frac{7}{124}

We can see that P(A \cap B) \neq P(A) \cdot P(B) because \frac{7}{124} \neq \frac{225}{3844}, so the answer is “False.”



Problem 2

Source: Winter 2024 Final Part 2, Problem 2

You run the k-means clustering algorithm on a dataset and it converges to a certain clustering with associated inertia I. You then duplicate each data point in the dataset and run k-means again on this twice-as-big dataset, with the same initial centroids as before. Which of the following is true? Select all that apply.

Bubbles 1 and 3: “The centroids found will be the same as before” and “The inertia will be twice as much as before, 2I.”

The centroids found will be the same as before because the dataset has been duplicated, but they have not moved! This means k means clustering will find the same clusters as before. You can imagine it like points overlapping each other.

The inertia will be twice as much before 2I because of how inertia is calculated. Recall inertia measures how well a dataset is clustered and is calculated by measuring the distance between each data point and its centroid, squaring the distance, and summing these squares across a cluster. We have doubled our points! Since they did not move otherwise the difference does not change, but we have twice as many points, which will cause the inertia to double.


Problem 3

Source: Winter 2024 Final Part 2, Problem 3

With the help of the Alien from Bayesian Galaxy, Issac built a Little Hadron Collider in his garage to continue testing some fundamental principles of nature. The alien set up a fixed target at one end of the collider, and ask Issac to shoot quarks from the other end. Everytime the quark hit the target, the Alien will tell Issac whether a new hadron is formed or not. Due to the energy limitation in Issac’s garage, he could only generate up, down, top, and bottom quarks (Strange and Charm quark would have consumed too much energy). For each quark Issac created, he has a record of:

Issac created 10 quarks, and these 10 quarks are recorded in the table below.

type state color Formed Hadron
down particle green formed
top particle red formed
top antiparticle blue formed
up particle red formed
up antiparticle blue formed
bottom antiparticle red not formed
down antiparticle blue not formed
down particle blue not formed
top particle green not formed
up antiparticle green not formed

Since the alien is from Bayesian galaxy, they want Issac to develop a Naive Bayes classifier to predict whether he’ll be successful in forming a hadron under the following quark conditions:


Problem 3.1

Naive Bayes predicts that, given a up-particle-blue quark, the probability a hadron formed is k times the probability a hadron is not formed, for an integer value of k. What is k?

k = 3

Following the equation for Naive Bayes we will create the folllowing two formulas: \begin{align*} &P(\text{formed hadron}|\text{up, particle, blue})\\ &=P(\text{formed hadron}) * P(\text{up}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron}) \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{up, particle, blue})\\ &=P(\text{not formed hadron}) * P(\text{up}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron}) \end{align*}

Now all we have to do is calculate the prior probabilities!

We can calculate P(\text{formed hadron}) by looking at the number of times a formed hadron happens out of the total number of outcomes. We can see this probability is P(\text{formed hadron}) = \frac{5}{10} = \frac{1}{2}.

Similarily we can calculate for a not formed hadron P(\text{not formed hadron}) = \frac{5}{10} = \frac{1}{2}.

We can calculate P(\text{up}|\text{formed hadron}) by looking at the number of times up appears out of all formed hadrons. This will give us something like: P(\text{up}|\text{formed hadron}) = \frac{2}{5}.

We can use this same method to calculate P(\text{up}|\text{not formed hadron}) by looking at the number of times up appears out of all not formed hadrons. P(\text{up}|\text{not formed hadron}) = \frac{1}{5}

If you continue finding the conditional probabilities you will find:

  • P(\text{particle}|\text{formed hadron}) = \frac{3}{5}
  • P(\text{particle}|\text{not formed hadron}) = \frac{2}{5}
  • P(\text{blue}|\text{formed hadron}) = \frac{2}{5}
  • P(\text{blue}|\text{not formed hadron}) = \frac{2}{5}

Now we simply plug and chug using the equations we had before! \begin{align*} &P(\text{formed hadron}|\text{up, particle, blue})\\ &= P(\text{formed hadron}) * P(\text{up}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron})\\ &= \frac{1}{2} * \frac{2}{5} * \frac{3}{5} * \frac{2}{5}\\ &= \frac{6}{125} \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{up, particle, blue})\\ &= P(\text{not formed hadron}) * P(\text{up}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron})\\ &= \frac{1}{2} * \frac{1}{5} * \frac{2}{5} * \frac{2}{5}\\ &= \frac{2}{125} \end{align*}

Now the only thing left to do is calculate k. We can do this by solving the equation k * \frac{2}{125} = \frac{6}{125}. You should find k=3.


Problem 3.2

What would be the value of k if you change up quark in the previous collision to top quark, but keep everything else the same (i.e. top-particle-blue quark)?

k = 3

All we have to do for this problem, after completing the first part, is to see how P(\text{top}|\text{formed hadron}) and P(\text{top}|\text{not formed hadron}) changes our previous equations.

We calculate these two probabilites by looking at the number of formed/not formed hadrons respectively and counting how many of those are in the top position. You will find P(\text{top}|\text{formed hadron}) = \frac{2}{5} and P(\text{top}|\text{not formed hadron}) = \frac{1}{5}. These are the same as our conditional probabilities when in the up position! Which means k=3 again.

If you want to go through the entire calculation again it can be found below: \begin{align*} &P(\text{formed hadron}|\text{top, particle, blue})\\ &= P(\text{formed hadron}) * P(\text{top}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron})\\ &= \frac{1}{2} * \frac{2}{5} * \frac{3}{5} * \frac{2}{5}\\ &= \frac{6}{125} \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{top, particle, blue})\\ &= P(\text{not formed hadron}) * P(\text{top}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron})\\ &= \frac{1}{2} * \frac{1}{5} * \frac{2}{5} * \frac{2}{5}\\ &= \frac{2}{125} \end{align*}

We can do this by solving the equation k * \frac{2}{125} = \frac{6}{125}. You should find k=3.



Problem 4

Source: Winter 2024 Final Part 2, Problem 4

In hospital, there are 10 patients who have recently been exposed to COVID. The hospital purchased a new COVID testing machine. The machine could comprehensively analyze multiple body measurements to produce a COVID score for each patient, ranging from 1 to 10. According to the instruction manual of the machine, a higher COVID score means the patient is more likely to be infected by COVID, while a lower COVID score means the patient is less likely to be infected. The COVID score of each patient is listed below:

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2

Suppose Patient 1, Patient 2, Patient 3, and Patient 4 are actually infected by COVID, while all other patients are not infected.


Problem 4.1

Based on the patient’s COVID score, Dr. Issac uses a threshold of 2.5 to diagnose COVID. That is, he diagnoses all patients with a COVID Score above 2.5 to be COVID-infected, and all patients with a COVID Score below 2.5 to be non-infected. What is the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) of Dr. Issac’s diagnosis? Each block should be filled with an integer.

TP = ?

TP = 4

True positives are when the patient has COVID and was predicted to have COVID. It can be helpful to expand on the table from above to visualize how many patients have COVID.

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2
Has COVID True True True True False False False False False False
Predicted COVID True True True True True True True False True False

We can now easily see Dr. Issac’s threshold would correctly diagnose 4 patients with COVID.


Problem 4.2

TN = ?

TN = 2

A true negative is when the patient does not have COVID and is predicted to not have COVID.

Using the same table as the part before:

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2
Has COVID True True True True False False False False False False
Predicted COVID True True True True True True True False True False

We can see Dr. Issac’s threshold would correctly diagnose 2 patients to not have COVID.


Problem 4.3

FP = ?

FP = 4

A false positive is when the patient is said to have COVID, but does not have COVID.

Using the same table as the part before:

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2
Has COVID True True True True False False False False False False
Predicted COVID True True True True True True True False True False

We can see Dr. Issac’s threshold would incorrectly diagnose 4 patients with COVID.


Problem 4.4

FN = ?

FN = 0

A false negative is when the patient is said to not have COVID, but has COVID.

Using the same table as the part before:

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2
Has COVID True True True True False False False False False False
Predicted COVID True True True True True True True False True False

We can see Dr. Issac’s threshold would incorrectly diagnose 0 patients.


Problem 4.5

Dr. Albert is somewhat skeptical about technology, leading him to adopt a cautious approach in diagnosing patients. Initially, he sets a diagnostic threshold at 5.5. Patients scoring above this threshold are initially diagnosed as positive. However, Dr. Albert conducts a secondary diagnosis for all patients who received a positive diagnosis initially. During this secondary diagnosis, he reclassifies Patient 9’s positive diagnoses as negative. What is the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) of Dr. Albert’s diagnosis? Each block should be filled with an integer.

TP = ?

TP = 3

True positives are when the patient has COVID and was predicted to have COVID.

Once again it can be beneficial to look at the table visually with added rows for how Dr. Albert predicted each patient’s COVID status.

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2
Has COVID True True True True False False False False False False
Predicted COVID True True True False False False True False False False

Note that Patient 9 was classified as False after a secondary diagnosis!

Here we can see that Dr. Albert’s threshold and diagnosises would correctly identify 3 of the 4 patients have COVID, making TP = 3.


Problem 4.6

TN = ?

TN = 5

A true negative is when the patient does not have COVID and is predicted to not have COVID.

Using the same table as the part before:

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2
Has COVID True True True True False False False False False False
Predicted COVID True True True False False False True False False False

Note that Patient 9 was classified as False after a secondary diagnosis!

Here we can see that Dr. Albert’s threshold and diagnosises would correctly identify 5 of the 6 not having COVID, making TN = 5.


Problem 4.7

FP = ?

FP = 1

A false positive is when the patient is said to have COVID, but does not have COVID.

Using the same table as the part before:

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2
Has COVID True True True True False False False False False False
Predicted COVID True True True False False False True False False False

Note that Patient 9 was classified as False after a secondary diagnosis!

We can see that Dr. Albert’s threshold and diagnosises would incorrectly identify 1 patient as having COVID when they do not. Making FP = 1.


Problem 4.8

FN = ?

FN = 1

A false negative is when the patient is said to not have COVID, but has COVID.

Using the same table as the part before:

Patient Number 1 2 3 4 5 6 7 8 9 10
COVID Score 7 9 9 3 5 4 6 1 8 2
Has COVID True True True True False False False False False False
Predicted COVID True True True False False False True False False False

Note that Patient 9 was classified as False after a secondary diagnosis!

We can see Dr. Albert’s threshold would incorrectly diagnose 1 patient, making FN = 1.


Problem 4.9

Which Doctor’s diagnoses is better?

Albert

Recall the equation for accuracy is \frac{\text{TP}+\text{TN}}{\text{TP} + \text{FP} + \text{FN} + \text{TN}}.

We can figure out which Doctor’s diagnosis was more accurate, thus better.

Dr. Isaac: \frac{4 + 2}{4 + 4 + 0 + 2} = \frac{6}{10}

Dr. Albert: \frac{3 + 5}{3 + 1 + 1 + 5} = \frac{8}{10}

We can see that Dr. Albert’s accuracy is greater than Dr Isaac’s accuracy, which means Dr. ALbert’s diagnosis is better.



👋 Feedback: Find an error? Still confused? Have a suggestion? Let us know here.