Winter 2024 Final Exam Part 2

This exam was administered in person. Students had 50 minutes to take this exam.

Problem 1

Consider an experiment where each of n people selects one piece from their own set of chess pieces, uniformly at random. A chess set has 32 pieces. 16 of these are black and 16 of these are white. In each color, the 16 pieces are:

The result of the experiment is a description of the colors and types of the pieces each person selected. For example, if n=3, one possible result is:

Problem 1.1

12^n

To answer this question we must determine how many different pieces each person can choose. In a complete chess set we know that there’s pawns, knights, bishops, rooks, queens, and kings. Then we also know that each of these pieces can be aither black or white, therefore there’s a totoal of 6 \cdot 2 = 12 different pieces. Notice that a complete chess set has 32 total pieces, but there’s multiple duplicates in those 32 pieces that is why we say there’s only 12 different pieces on a chess set.

Now since every person has their own chess set, we know that each of them can get any of the 12 pieces, therefore if we do this with one person there would be 12 possible outcomes, now if we do it with two persons the first person can get any of the 12 pieces, but then for each piece that person 1 can choose person 2 can choose from their 12 pieces, which means that in this case there’s 12 \cdot 12 outcomes. if we keep doing this for n persons we’ll get that there’s 12 \cdot \dots \cdot 12 = 12^n.

Problem 1.2

As n becomes large, what fraction of n people has selected a black pawn? Choose the answer that’s closest to the actual fraction.

\frac{1}{4}

First, let’s calculate the probability of one person choosing a black pawn. We know that the complete chess set has 32 pieces, and out of those 32 there are 8 black pawns, which gives us a probability of \frac{8}{32} = \frac{1}{4} of selecting a black pawn. Now, think about how we interpret probabilities; for example, if we say that the probability of getting heads when flipping a coin is \frac{1}{2} that just means that if we flip a coin over and over, we expect to see heads \frac{1}{2} of the time. Similarly, when we say that the probability of choosing a black pawn is \frac{1}{4}, that means that if we were to choose a chess piece over and over, we expect to choose a black pawn \frac{1}{4} of the time.

As n becomes large, we choose chess pieces over and over n times. Therefore, you still expect to see the black pawn \frac{1}{4} of the time.

Problem 1.3

False.

We know that two probabilities are independent of each other if P(A \cap B) = P(A) \cdot P(B).

Let P(A) be the probability of getting a white piece twice in a row.
Let P(B) be the probability of getting a pawn twice in a row.

We can calculate the probability of getting a white piece twice in a row P(A) = \frac{16}{32} \cdot \frac{15}{31} = \frac{15}{62}. We get \frac{16}{32} from the fact there are 16 white pieces out of the 32 total pieces. We get \frac{15}{31} from the fact there are now 15 white pieces out of the 31 pieces left after taking out a white piece.

We can also calculate the probability of getting a pawn twice in a row. Note that the color of the pawn does not matter! P(B) = \frac{16}{32} \cdot \frac{15}{31} = \frac{15}{62}. We get \frac{16}{32} from the fact there are 16 pawns (8 black pawns plus 8 white pawns) out of the 32 total pieces. We get \frac{15}{31} from the fact there are now 15 pawns out of the 31 pieces left after taking out a pawn.

We can now calculate P(A) \cdot P(B): \frac{15}{62} \cdot \frac{15}{62} = \frac{225}{3844}

Recall we calculate P(A \cap B) = P(A) \cdot P(B|A).

To find P(B|A), we assume that both pieces are white and imagine the probabilities of getting two pawns from that batch of white pieces. To get the first pawn, the probability is \frac{8}{16} (8 pawns out of 16 white pieces). To get the second pawn, the probability is: \frac{7}{15} (7 remaining pawns out of 15 remaining white pieces). So, the probabilitiy of getting two pawns given we have two white pieces is P(B|A) = \frac{8}{16} \cdot \frac{7}{15} = \frac{7}{30}. We also already solved for P(A) above: \frac{15}{62}.

\frac{15}{62} \cdot \frac{7}{30} = \frac{7}{124}

We can see that P(A \cap B) \neq P(A) \cdot P(B) because \frac{7}{124} \neq \frac{225}{3844}, so the answer is “False.”

Problem 2

You run the k-means clustering algorithm on a dataset and it converges to a certain clustering with associated inertia I. You then duplicate each data point in the dataset and run k-means again on this twice-as-big dataset, with the same initial centroids as before. Which of the following is true? Select all that apply.

Bubbles 1 and 3: “The centroids found will be the same as before” and “The inertia will be twice as much as before, 2I.”

The centroids found will be the same as before because the dataset has been duplicated, but they have not moved! This means k means clustering will find the same clusters as before. You can imagine it like points overlapping each other.

The inertia will be twice as much before 2I because of how inertia is calculated. Recall inertia measures how well a dataset is clustered and is calculated by measuring the distance between each data point and its centroid, squaring the distance, and summing these squares across a cluster. We have doubled our points! Since they did not move otherwise the difference does not change, but we have twice as many points, which will cause the inertia to double.

Problem 3

With the help of the Alien from Bayesian Galaxy, Issac built a Little Hadron Collider in his garage to continue testing some fundamental principles of nature. The alien set up a fixed target at one end of the collider, and ask Issac to shoot quarks from the other end. Everytime the quark hit the target, the Alien will tell Issac whether a new hadron is formed or not. Due to the energy limitation in Issac’s garage, he could only generate up, down, top, and bottom quarks (Strange and Charm quark would have consumed too much energy). For each quark Issac created, he has a record of:

Since the alien is from Bayesian galaxy, they want Issac to develop a Naive Bayes classifier to predict whether he’ll be successful in forming a hadron under the following quark conditions:

Problem 3.1

type	state	color	Formed Hadron
down	particle	green	formed
top	particle	red	formed
top	antiparticle	blue	formed
up	particle	red	formed
up	antiparticle	blue	formed
bottom	antiparticle	red	not formed
down	antiparticle	blue	not formed
down	particle	blue	not formed
top	particle	green	not formed
up	antiparticle	green	not formed

Naive Bayes predicts that, given a up-particle-blue quark, the probability a hadron formed is k times the probability a hadron is not formed, for an integer value of k. What is k?

k = 3

Following the equation for Naive Bayes we will create the folllowing two formulas: \begin{align*} &P(\text{formed hadron}|\text{up, particle, blue})\\ &=P(\text{formed hadron}) * P(\text{up}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron}) \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{up, particle, blue})\\ &=P(\text{not formed hadron}) * P(\text{up}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron}) \end{align*}

Now all we have to do is calculate the prior probabilities!

We can calculate P(\text{formed hadron}) by looking at the number of times a formed hadron happens out of the total number of outcomes. We can see this probability is P(\text{formed hadron}) = \frac{5}{10} = \frac{1}{2}.

Similarily we can calculate for a not formed hadron P(\text{not formed hadron}) = \frac{5}{10} = \frac{1}{2}.

We can calculate P(\text{up}|\text{formed hadron}) by looking at the number of times up appears out of all formed hadrons. This will give us something like: P(\text{up}|\text{formed hadron}) = \frac{2}{5}.

We can use this same method to calculate P(\text{up}|\text{not formed hadron}) by looking at the number of times up appears out of all not formed hadrons. P(\text{up}|\text{not formed hadron}) = \frac{1}{5}

If you continue finding the conditional probabilities you will find:

P(\text{particle}|\text{formed hadron}) = \frac{3}{5}
P(\text{particle}|\text{not formed hadron}) = \frac{2}{5}
P(\text{blue}|\text{formed hadron}) = \frac{2}{5}
P(\text{blue}|\text{not formed hadron}) = \frac{2}{5}

Now we simply plug and chug using the equations we had before! \begin{align*} &P(\text{formed hadron}|\text{up, particle, blue})\\ &= P(\text{formed hadron}) * P(\text{up}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron})\\ &= \frac{1}{2} * \frac{2}{5} * \frac{3}{5} * \frac{2}{5}\\ &= \frac{6}{125} \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{up, particle, blue})\\ &= P(\text{not formed hadron}) * P(\text{up}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron})\\ &= \frac{1}{2} * \frac{1}{5} * \frac{2}{5} * \frac{2}{5}\\ &= \frac{2}{125} \end{align*}

Now the only thing left to do is calculate k. We can do this by solving the equation k * \frac{2}{125} = \frac{6}{125}. You should find k=3.

Problem 3.2

What would be the value of k if you change up quark in the previous collision to top quark, but keep everything else the same (i.e. top-particle-blue quark)?

k = 3

All we have to do for this problem, after completing the first part, is to see how P(\text{top}|\text{formed hadron}) and P(\text{top}|\text{not formed hadron}) changes our previous equations.

We calculate these two probabilites by looking at the number of formed/not formed hadrons respectively and counting how many of those are in the top position. You will find P(\text{top}|\text{formed hadron}) = \frac{2}{5} and P(\text{top}|\text{not formed hadron}) = \frac{1}{5}. These are the same as our conditional probabilities when in the up position! Which means k=3 again.

If you want to go through the entire calculation again it can be found below: \begin{align*} &P(\text{formed hadron}|\text{top, particle, blue})\\ &= P(\text{formed hadron}) * P(\text{top}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron})\\ &= \frac{1}{2} * \frac{2}{5} * \frac{3}{5} * \frac{2}{5}\\ &= \frac{6}{125} \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{top, particle, blue})\\ &= P(\text{not formed hadron}) * P(\text{top}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron})\\ &= \frac{1}{2} * \frac{1}{5} * \frac{2}{5} * \frac{2}{5}\\ &= \frac{2}{125} \end{align*}

We can do this by solving the equation k * \frac{2}{125} = \frac{6}{125}. You should find k=3.

Problem 4

In hospital, there are 10 patients who have recently been exposed to COVID. The hospital purchased a new COVID testing machine. The machine could comprehensively analyze multiple body measurements to produce a COVID score for each patient, ranging from 1 to 10. According to the instruction manual of the machine, a higher COVID score means the patient is more likely to be infected by COVID, while a lower COVID score means the patient is less likely to be infected. The COVID score of each patient is listed below:

Suppose Patient 1, Patient 2, Patient 3, and Patient 4 are actually infected by COVID, while all other patients are not infected.

Problem 4.1

Based on the patient’s COVID score, Dr. Issac uses a threshold of 2.5 to diagnose COVID. That is, he diagnoses all patients with a COVID Score above 2.5 to be COVID-infected, and all patients with a COVID Score below 2.5 to be non-infected. What is the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) of Dr. Issac’s diagnosis? Each block should be filled with an integer.

TP = 4

True positives are when the patient has COVID and was predicted to have COVID. It can be helpful to expand on the table from above to visualize how many patients have COVID.

Patient Number	1	2	3	4	5	6	7	8	9	10
COVID Score	7	9	9	3	5	4	6	1	8	2
Has COVID	True	True	True	True	False	False	False	False	False	False
Predicted COVID	True	True	True	True	True	True	True	False	True	False

We can now easily see Dr. Issac’s threshold would correctly diagnose 4 patients with COVID.

Problem 4.2

TN = 2

A true negative is when the patient does not have COVID and is predicted to not have COVID.

Using the same table as the part before:

Patient Number	1	2	3	4	5	6	7	8	9	10
COVID Score	7	9	9	3	5	4	6	1	8	2
Has COVID	True	True	True	True	False	False	False	False	False	False
Predicted COVID	True	True	True	True	True	True	True	False	True	False

We can see Dr. Issac’s threshold would correctly diagnose 2 patients to not have COVID.

Problem 4.3

FP = 4

A false positive is when the patient is said to have COVID, but does not have COVID.

Using the same table as the part before:

Patient Number	1	2	3	4	5	6	7	8	9	10
COVID Score	7	9	9	3	5	4	6	1	8	2
Has COVID	True	True	True	True	False	False	False	False	False	False
Predicted COVID	True	True	True	True	True	True	True	False	True	False

We can see Dr. Issac’s threshold would incorrectly diagnose 4 patients with COVID.

Problem 4.4

FN = 0

A false negative is when the patient is said to not have COVID, but has COVID.

Using the same table as the part before:

Patient Number	1	2	3	4	5	6	7	8	9	10
COVID Score	7	9	9	3	5	4	6	1	8	2
Has COVID	True	True	True	True	False	False	False	False	False	False
Predicted COVID	True	True	True	True	True	True	True	False	True	False

We can see Dr. Issac’s threshold would incorrectly diagnose 0 patients.

Problem 4.5

Dr. Albert is somewhat skeptical about technology, leading him to adopt a cautious approach in diagnosing patients. Initially, he sets a diagnostic threshold at 5.5. Patients scoring above this threshold are initially diagnosed as positive. However, Dr. Albert conducts a secondary diagnosis for all patients who received a positive diagnosis initially. During this secondary diagnosis, he reclassifies Patient 9’s positive diagnoses as negative. What is the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) of Dr. Albert’s diagnosis? Each block should be filled with an integer.

TP = 3

True positives are when the patient has COVID and was predicted to have COVID.

Once again it can be beneficial to look at the table visually with added rows for how Dr. Albert predicted each patient’s COVID status.

Patient Number	1	2	3	4	5	6	7	8	9	10
COVID Score	7	9	9	3	5	4	6	1	8	2
Has COVID	True	True	True	True	False	False	False	False	False	False
Predicted COVID	True	True	True	False	False	False	True	False	False	False

Note that Patient 9 was classified as False after a secondary diagnosis!

Here we can see that Dr. Albert’s threshold and diagnosises would correctly identify 3 of the 4 patients have COVID, making TP = 3.

Problem 4.6

TN = 5

A true negative is when the patient does not have COVID and is predicted to not have COVID.

Using the same table as the part before:

Patient Number	1	2	3	4	5	6	7	8	9	10
COVID Score	7	9	9	3	5	4	6	1	8	2
Has COVID	True	True	True	True	False	False	False	False	False	False
Predicted COVID	True	True	True	False	False	False	True	False	False	False

Note that Patient 9 was classified as False after a secondary diagnosis!

Here we can see that Dr. Albert’s threshold and diagnosises would correctly identify 5 of the 6 not having COVID, making TN = 5.

Problem 4.7

FP = 1

A false positive is when the patient is said to have COVID, but does not have COVID.

Using the same table as the part before:

Patient Number	1	2	3	4	5	6	7	8	9	10
COVID Score	7	9	9	3	5	4	6	1	8	2
Has COVID	True	True	True	True	False	False	False	False	False	False
Predicted COVID	True	True	True	False	False	False	True	False	False	False

Note that Patient 9 was classified as False after a secondary diagnosis!

We can see that Dr. Albert’s threshold and diagnosises would incorrectly identify 1 patient as having COVID when they do not. Making FP = 1.

Problem 4.8

FN = 1

A false negative is when the patient is said to not have COVID, but has COVID.

Using the same table as the part before:

Patient Number	1	2	3	4	5	6	7	8	9	10
COVID Score	7	9	9	3	5	4	6	1	8	2
Has COVID	True	True	True	True	False	False	False	False	False	False
Predicted COVID	True	True	True	False	False	False	True	False	False	False

Note that Patient 9 was classified as False after a secondary diagnosis!

We can see Dr. Albert’s threshold would incorrectly diagnose 1 patient, making FN = 1.

Problem 4.9

Albert

Recall the equation for accuracy is \frac{\text{TP}+\text{TN}}{\text{TP} + \text{FP} + \text{FN} + \text{TN}}.

We can figure out which Doctor’s diagnosis was more accurate, thus better.

Dr. Isaac: \frac{4 + 2}{4 + 4 + 0 + 2} = \frac{6}{10}

Dr. Albert: \frac{3 + 5}{3 + 1 + 1 + 5} = \frac{8}{10}

We can see that Dr. Albert’s accuracy is greater than Dr Isaac’s accuracy, which means Dr. ALbert’s diagnosis is better.

Problem 1

Problem 1.1

Click to view the solution.

Problem 1.2

Click to view the solution.

Problem 1.3

Click to view the solution.

Problem 2

Click to view the solution.

Problem 3

Problem 3.1

Click to view the solution.

Problem 3.2

Click to view the solution.

Problem 4

Problem 4.1

Click to view the solution.

Problem 4.2

Click to view the solution.

Problem 4.3

Click to view the solution.

Problem 4.4

Click to view the solution.

Problem 4.5

Click to view the solution.

Problem 4.6

Click to view the solution.

Problem 4.7

Click to view the solution.

Problem 4.8

Click to view the solution.

Problem 4.9

Click to view the solution.

👋 Feedback: Find an error? Still confused? Have a suggestion? Let us know here.