← return to practice.dsc40a.com

**Instructor(s):** Aobo Li

This exam was administered in person. Students had **50
minutes** to take this exam.

*Source: Winter 2024 Final
Part 2, Problem 1*

Consider an experiment where each of n people selects one piece from their own set
of chess pieces, uniformly at random. A chess set has 32 pieces. 16 of these are black and 16 of these are white. **In each
color**, the 16 pieces are:

- 8 pawns,
- 2 bishops,
- 2 knights,
- 2 rooks,
- 1 queen, and
- 1 king.

The **result** of the experiment is a description of the
**colors and types** of the pieces each person selected.
For example, if n=3, one possible
result is:

Person 1 selected a white knight.

Person 2 selected a black queen.

Person 3 selected a black pawn.

How many results are possible for this experiment with n people?

2^n

6^n

12^n

16^n

32^n

None of the above.

12^n

To answer this question we must determine how many
**different** pieces each person can choose. In a complete
chess set we know that there’s pawns, knights, bishops, rooks, queens,
and kings. Then we also know that each of these pieces can be aither
black or white, therefore there’s a totoal of 6 \cdot 2 = 12 different pieces. Notice that
a complete chess set has 32 total pieces, but there’s multiple
duplicates in those 32 pieces that is why we say there’s only 12
different pieces on a chess set.

Now since every person has their own chess set, we know that each of them can get any of the 12 pieces, therefore if we do this with one person there would be 12 possible outcomes, now if we do it with two persons the first person can get any of the 12 pieces, but then for each piece that person 1 can choose person 2 can choose from their 12 pieces, which means that in this case there’s 12 \cdot 12 outcomes. if we keep doing this for n persons we’ll get that there’s 12 \cdot \dots \cdot 12 = 12^n.

As n becomes large, what fraction of n people has selected a black pawn? Choose the answer that’s closest to the actual fraction.

1/32

1/12

1/4

1/2

None of the above.

\frac{1}{4}

First, let’s calculate the probability of one person choosing a black
pawn. We know that the complete chess set has 32 pieces, and out of
those 32 there are 8 black pawns, which gives us a probability of \frac{8}{32} = \frac{1}{4} of selecting a
black pawn. Now, think about how we interpret probabilities; for
example, if we say that the probability of getting heads when flipping a
coin is \frac{1}{2} that just means
that if we flip a coin over and over, we expect to see heads \frac{1}{2} of the time. Similarly, when we
say that the probability of choosing a black pawn is \frac{1}{4}, that means that if we were to
__choose a chess piece over and over__, we expect to choose a black
pawn \frac{1}{4} of the time.

As n becomes large, we __choose
chess pieces over and over n times__.
Therefore, you still expect to see the black pawn \frac{1}{4} of the time.

True or False: Having two pawns is **independent** of
having two white pieces.

True

False

False.

We know that two probabilities are independent of each other if P(A \cap B) = P(A) \cdot P(B).

- Let P(A) be the probability of getting a white piece twice in a row.
- Let P(B) be the probability of getting a pawn twice in a row.

We can calculate the probability of getting a white piece twice in a row P(A) = \frac{16}{32} \cdot \frac{15}{31} = \frac{15}{62}. We get \frac{16}{32} from the fact there are 16 white pieces out of the 32 total pieces. We get \frac{15}{31} from the fact there are now 15 white pieces out of the 31 pieces left after taking out a white piece.

We can also calculate the probability of getting a pawn twice in a row. Note that the color of the pawn does not matter! P(B) = \frac{16}{32} \cdot \frac{15}{31} = \frac{15}{62}. We get \frac{16}{32} from the fact there are 16 pawns (8 black pawns plus 8 white pawns) out of the 32 total pieces. We get \frac{15}{31} from the fact there are now 15 pawns out of the 31 pieces left after taking out a pawn.

We can now calculate P(A) \cdot P(B): \frac{15}{62} \cdot \frac{15}{62} = \frac{225}{3844}

Recall we calculate P(A \cap B) = P(A) \cdot P(B|A).

To find P(B|A), we assume that both pieces are white and imagine the probabilities of getting two pawns from that batch of white pieces. To get the first pawn, the probability is \frac{8}{16} (8 pawns out of 16 white pieces). To get the second pawn, the probability is: \frac{7}{15} (7 remaining pawns out of 15 remaining white pieces). So, the probabilitiy of getting two pawns given we have two white pieces is P(B|A) = \frac{8}{16} \cdot \frac{7}{15} = \frac{7}{30}. We also already solved for P(A) above: \frac{15}{62}.

\frac{15}{62} \cdot \frac{7}{30} = \frac{7}{124}

We can see that P(A \cap B) \neq P(A) \cdot P(B) because \frac{7}{124} \neq \frac{225}{3844}, so the answer is “False.”

*Source: Winter 2024 Final
Part 2, Problem 2*

You run the k-means clustering
algorithm on a dataset and it converges to a certain clustering with
associated inertia I. You then
duplicate each data point in the dataset and run k-means again on this twice-as-big dataset,
with the same initial centroids as before. Which of the following is
true? **Select all that apply.**

The centroids found will be the same as before.

The inertia will be the same as before, I.

The inertia will be twice as much as before, 2I.

None of the above.

Bubbles 1 and 3: “The centroids found will be the same as before” and “The inertia will be twice as much as before, 2I.”

The centroids found will be the same as before because the dataset has been duplicated, but they have not moved! This means k means clustering will find the same clusters as before. You can imagine it like points overlapping each other.

The inertia will be twice as much before 2I because of how inertia is calculated. Recall inertia measures how well a dataset is clustered and is calculated by measuring the distance between each data point and its centroid, squaring the distance, and summing these squares across a cluster. We have doubled our points! Since they did not move otherwise the difference does not change, but we have twice as many points, which will cause the inertia to double.

*Source: Winter 2024 Final
Part 2, Problem 3*

With the help of the Alien from Bayesian Galaxy, Issac built a Little Hadron Collider in his garage to continue testing some fundamental principles of nature. The alien set up a fixed target at one end of the collider, and ask Issac to shoot quarks from the other end. Everytime the quark hit the target, the Alien will tell Issac whether a new hadron is formed or not. Due to the energy limitation in Issac’s garage, he could only generate up, down, top, and bottom quarks (Strange and Charm quark would have consumed too much energy). For each quark Issac created, he has a record of:

the type

`up`

,`down`

,`top`

,`bottom`

the state

`particle`

,`antiparticle`

the color charge

`red`

,`green`

,`blue`

, andthe outcome

`hadron formed`

,`hadron not formed`

Issac created 10 quarks, and these 10 quarks are recorded in the table below.

type |
state |
color |
Formed Hadron |
---|---|---|---|

down | particle | green | formed |

top | particle | red | formed |

top | antiparticle | blue | formed |

up | particle | red | formed |

up | antiparticle | blue | formed |

bottom | antiparticle | red | not formed |

down | antiparticle | blue | not formed |

down | particle | blue | not formed |

top | particle | green | not formed |

up | antiparticle | green | not formed |

Since the alien is from Bayesian galaxy, they want Issac to develop a Naive Bayes classifier to predict whether he’ll be successful in forming a hadron under the following quark conditions:

- the type is
`up`

- the state is
`particle`

- the color is
`blue`

Naive Bayes predicts that, given a `up-particle-blue`

quark, the probability a hadron formed is k times the probability a hadron is not
formed, for an integer value of k. What
is k?

1

2

3

4

k = 3

Following the equation for Naive Bayes we will create the folllowing two formulas: \begin{align*} &P(\text{formed hadron}|\text{up, particle, blue})\\ &=P(\text{formed hadron}) * P(\text{up}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron}) \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{up, particle, blue})\\ &=P(\text{not formed hadron}) * P(\text{up}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron}) \end{align*}

Now all we have to do is calculate the prior probabilities!

We can calculate P(\text{formed
hadron}) by looking at the number of times a
`formed hadron`

happens out of the total number of outcomes.
We can see this probability is P(\text{formed
hadron}) = \frac{5}{10} = \frac{1}{2}.

Similarily we can calculate for a `not formed hadron`

P(\text{not formed hadron}) = \frac{5}{10} =
\frac{1}{2}.

We can calculate P(\text{up}|\text{formed
hadron}) by looking at the number of times `up`

appears out of all `formed hadron`

s. This will give us
something like: P(\text{up}|\text{formed
hadron}) = \frac{2}{5}.

We can use this same method to calculate P(\text{up}|\text{not formed hadron}) by
looking at the number of times `up`

appears out of all
`not formed hadron`

s. P(\text{up}|\text{not formed hadron}) =
\frac{1}{5}

If you continue finding the conditional probabilities you will find:

- P(\text{particle}|\text{formed hadron}) = \frac{3}{5}
- P(\text{particle}|\text{not formed hadron}) = \frac{2}{5}
- P(\text{blue}|\text{formed hadron}) = \frac{2}{5}
- P(\text{blue}|\text{not formed hadron}) = \frac{2}{5}

Now we simply plug and chug using the equations we had before! \begin{align*} &P(\text{formed hadron}|\text{up, particle, blue})\\ &= P(\text{formed hadron}) * P(\text{up}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron})\\ &= \frac{1}{2} * \frac{2}{5} * \frac{3}{5} * \frac{2}{5}\\ &= \frac{6}{125} \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{up, particle, blue})\\ &= P(\text{not formed hadron}) * P(\text{up}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron})\\ &= \frac{1}{2} * \frac{1}{5} * \frac{2}{5} * \frac{2}{5}\\ &= \frac{2}{125} \end{align*}

Now the only thing left to do is calculate k. We can do this by solving the equation k * \frac{2}{125} = \frac{6}{125}. You should find k=3.

What would be the value of k if you
change `up`

quark in the previous collision to
`top`

quark, but keep everything else the same (i.e.
`top-particle-blue`

quark)?

1

2

3

4

k = 3

All we have to do for this problem, after completing the first part, is to see how P(\text{top}|\text{formed hadron}) and P(\text{top}|\text{not formed hadron}) changes our previous equations.

We calculate these two probabilites by looking at the number of
formed/not formed hadrons respectively and counting how many of those
are in the `top`

position. You will find P(\text{top}|\text{formed hadron}) =
\frac{2}{5} and P(\text{top}|\text{not
formed hadron}) = \frac{1}{5}. These are the same as our
conditional probabilities when in the `up`

position! Which
means k=3 again.

If you want to go through the entire calculation again it can be found below: \begin{align*} &P(\text{formed hadron}|\text{top, particle, blue})\\ &= P(\text{formed hadron}) * P(\text{top}|\text{formed hadron}) * P(\text{particle}|\text{formed hadron}) * P(\text{blue}|\text{formed hadron})\\ &= \frac{1}{2} * \frac{2}{5} * \frac{3}{5} * \frac{2}{5}\\ &= \frac{6}{125} \end{align*} and \begin{align*} &P(\text{not formed hadron}|\text{top, particle, blue})\\ &= P(\text{not formed hadron}) * P(\text{top}|\text{not formed hadron}) * P(\text{particle}|\text{not formed hadron}) * P(\text{blue}|\text{not formed hadron})\\ &= \frac{1}{2} * \frac{1}{5} * \frac{2}{5} * \frac{2}{5}\\ &= \frac{2}{125} \end{align*}

We can do this by solving the equation k * \frac{2}{125} = \frac{6}{125}. You should find k=3.

*Source: Winter 2024 Final
Part 2, Problem 4*

In hospital, there are 10 patients who have recently been exposed to COVID. The hospital purchased a new COVID testing machine. The machine could comprehensively analyze multiple body measurements to produce a COVID score for each patient, ranging from 1 to 10. According to the instruction manual of the machine, a higher COVID score means the patient is more likely to be infected by COVID, while a lower COVID score means the patient is less likely to be infected. The COVID score of each patient is listed below:

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Suppose Patient 1, Patient 2, Patient 3, and Patient 4 are actually infected by COVID, while all other patients are not infected.

Based on the patient’s COVID score, Dr. Issac uses a threshold of 2.5 to diagnose COVID. That is, he diagnoses all patients with a COVID Score above 2.5 to be COVID-infected, and all patients with a COVID Score below 2.5 to be non-infected. What is the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) of Dr. Issac’s diagnosis? Each block should be filled with an integer.

TP = ?

TP = 4

True positives are when the patient has COVID and was predicted to have COVID. It can be helpful to expand on the table from above to visualize how many patients have COVID.

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Has COVID |
True | True | True | True | False | False | False | False | False | False |

Predicted COVID |
True | True | True | True | True | True | True | False | True | False |

We can now easily see Dr. Issac’s threshold would correctly diagnose 4 patients with COVID.

TN = ?

TN = 2

A true negative is when the patient does not have COVID and is predicted to not have COVID.

Using the same table as the part before:

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Has COVID |
True | True | True | True | False | False | False | False | False | False |

Predicted COVID |
True | True | True | True | True | True | True | False | True | False |

We can see Dr. Issac’s threshold would correctly diagnose 2 patients to not have COVID.

FP = ?

FP = 4

A false positive is when the patient is said to have COVID, but does not have COVID.

Using the same table as the part before:

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Has COVID |
True | True | True | True | False | False | False | False | False | False |

Predicted COVID |
True | True | True | True | True | True | True | False | True | False |

We can see Dr. Issac’s threshold would incorrectly diagnose 4 patients with COVID.

FN = ?

FN = 0

A false negative is when the patient is said to not have COVID, but has COVID.

Using the same table as the part before:

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Has COVID |
True | True | True | True | False | False | False | False | False | False |

Predicted COVID |
True | True | True | True | True | True | True | False | True | False |

We can see Dr. Issac’s threshold would incorrectly diagnose 0 patients.

Dr. Albert is somewhat skeptical about technology, leading him to adopt a cautious approach in diagnosing patients. Initially, he sets a diagnostic threshold at 5.5. Patients scoring above this threshold are initially diagnosed as positive. However, Dr. Albert conducts a secondary diagnosis for all patients who received a positive diagnosis initially. During this secondary diagnosis, he reclassifies Patient 9’s positive diagnoses as negative. What is the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) of Dr. Albert’s diagnosis? Each block should be filled with an integer.

TP = ?

TP = 3

True positives are when the patient has COVID and was predicted to have COVID.

Once again it can be beneficial to look at the table visually with added rows for how Dr. Albert predicted each patient’s COVID status.

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Has COVID |
True | True | True | True | False | False | False | False | False | False |

Predicted COVID |
True | True | True | False | False | False | True | False | False | False |

* Note that Patient 9 was classified as False after a secondary
diagnosis! *

Here we can see that Dr. Albert’s threshold and diagnosises would correctly identify 3 of the 4 patients have COVID, making TP = 3.

TN = ?

TN = 5

A true negative is when the patient does not have COVID and is predicted to not have COVID.

Using the same table as the part before:

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Has COVID |
True | True | True | True | False | False | False | False | False | False |

Predicted COVID |
True | True | True | False | False | False | True | False | False | False |

* Note that Patient 9 was classified as False after a secondary
diagnosis! *

Here we can see that Dr. Albert’s threshold and diagnosises would correctly identify 5 of the 6 not having COVID, making TN = 5.

FP = ?

FP = 1

A false positive is when the patient is said to have COVID, but does not have COVID.

Using the same table as the part before:

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Has COVID |
True | True | True | True | False | False | False | False | False | False |

Predicted COVID |
True | True | True | False | False | False | True | False | False | False |

* Note that Patient 9 was classified as False after a secondary
diagnosis! *

We can see that Dr. Albert’s threshold and diagnosises would incorrectly identify 1 patient as having COVID when they do not. Making FP = 1.

FN = ?

FN = 1

A false negative is when the patient is said to not have COVID, but has COVID.

Using the same table as the part before:

Patient Number |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|

COVID Score |
7 | 9 | 9 | 3 | 5 | 4 | 6 | 1 | 8 | 2 |

Has COVID |
True | True | True | True | False | False | False | False | False | False |

Predicted COVID |
True | True | True | False | False | False | True | False | False | False |

* Note that Patient 9 was classified as False after a secondary
diagnosis! *

We can see Dr. Albert’s threshold would incorrectly diagnose 1 patient, making FN = 1.

Which Doctor’s diagnoses is better?

Albert

Issac

They are equally good

Albert

Recall the equation for accuracy is \frac{\text{TP}+\text{TN}}{\text{TP} + \text{FP} + \text{FN} + \text{TN}}.

We can figure out which Doctor’s diagnosis was more accurate, thus better.

Dr. Isaac: \frac{4 + 2}{4 + 4 + 0 + 2} = \frac{6}{10}

Dr. Albert: \frac{3 + 5}{3 + 1 + 1 + 5} = \frac{8}{10}

We can see that Dr. Albert’s accuracy is greater than Dr Isaac’s accuracy, which means Dr. ALbert’s diagnosis is better.