---
title: "Strategizing Jeop-R-dy!"
author: "Luce"
date: "20 September, 2018"
output:
  html_document:
    toc: yes
  pdf_document:
    toc: yes
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Parameters

We have a few parameters that we can play with:

* the prize money multiplier:

  base | double | triple
  ---- | ------ | ------
  1x   | 2x     | 3x

* the level of difficulty:
    + low ($100, $200 questions)
    + medium ($300 questions)
    + high ($400, $500 questions)

## Preliminary data

Let's start by figuring out the following: for a question of medium difficulty, what is the probability that at least one person in a team knows the answer?

First, we have to estimate the probability that a given person knows the answer. Let's say it's 0.6.

```{r}
p.know <- 0.6
```

So then, if there were only one person on a team, what would the probability of them getting the right answer be? Trivially, it is 0.6!

How about if there were two people on a team? What is the probability that at least one of them knows the right answer (and can convince her team-mate of that)? That's just the binomial distribution:

```{r}
1 - dbinom(0, size = 2, prob = 0.6)
```

... where `dbinom(0, 2, 0.6)` is the probability that neither of the two people knows the answer, and 1 minus that is the probability that at least one person knows.

Can we frame the probability of one person knowing the answer in this form too?

```{r}
1 - dbinom(0, size = 1, prob = 0.6)
```

Yep! That's reassuring!

Similarly, we can use the binomial distribution to estimate the probability that, in a team of three people, or six people, at least one person will know the right answer (and be able to convince their team-mates):

```{r}
1 - dbinom(0, size = 3, prob = 0.6)
1 - dbinom(0, size = 6, prob = 0.6)
```

We should probably plot that to see if there is a point of diminishing returns.
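Before plotting, it may help to see the numbers behind the curve. Since `dbinom(0, n, p)` is simply $(1-p)^n$, the quantity above is $1 - (1-p)^n$, and adding the $n$-th team-mate raises it by $p(1-p)^{n-1}$, so each extra person helps by a factor of $1 - p$ less than the previous one. The sketch below is a check of that algebra rather than part of the original analysis; `p`, `sizes`, `p.team` and `gain` are illustrative names.

```{r}
p <- 0.6        # probability that a given person knows (medium question)
sizes <- 1:6    # team sizes from 1 to 6
p.team <- 1 - dbinom(0, size = sizes, prob = p)

# dbinom(0, n, p) equals (1 - p)^n, so the closed form gives the same numbers
stopifnot(isTRUE(all.equal(p.team, 1 - (1 - p)^sizes)))

# Extra probability gained by adding the 2nd, 3rd, ..., 6th person
gain <- diff(p.team)
names(gain) <- paste0("person ", sizes[-1])
round(gain, 4)
```

Each gain is 0.4 times the previous one (0.24, 0.096, 0.0384, ...), which is the diminishing return the plot should reveal.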
```{r}
n <- 1:6
barplot(1 - dbinom(0, n, p.know), names.arg = n, ylim = c(0, 1.0))
```

It looks like 3 people have practically as much chance of getting a medium difficulty question right as a full team of 6 (about 0.94 versus 0.996). Good to know!

What if the question is from one of the harder brackets? In this case, the probability of a given person knowing the answer would be much lower. Let's estimate it at 0.2.

```{r}
p.know <- 0.2
barplot(1 - dbinom(0, n, p.know), names.arg = n, ylim = c(0, 1.0))
abline(h = 0.5)  # reference line at a 50% chance
```

From the graph, it looks like we would need at least 4 people trying to answer the question to have even a 50% chance of getting it right (3 people only get us to about 0.49).

Let's make a similar graph for an easy question. Say the probability of a given person knowing the answer here is high, i.e., 0.85.

```{r}
p.know <- 0.85
barplot(1 - dbinom(0, n, p.know), names.arg = n, ylim = c(0, 1.0))
```

Perhaps we'd like to plot these figures side by side:

```{r, echo = FALSE}
# one row of three panels; save the old settings so they can be restored
op <- par(mfrow = c(1, 3))
n <- 1:6

p.know <- 0.6
barplot(1 - dbinom(0, n, p.know), names.arg = n, ylim = c(0, 1.0),
        main = "Medium questions \n p.know = 0.6")

p.know <- 0.2
barplot(1 - dbinom(0, n, p.know), names.arg = n, ylim = c(0, 1.0),
        main = "Hard questions \n p.know = 0.2")
abline(h = 0.5)

p.know <- 0.85
barplot(1 - dbinom(0, n, p.know), names.arg = n, ylim = c(0, 1.0),
        main = "Easy questions \n p.know = 0.85")

par(op)  # restore the single-panel layout
```

## Incorporating multipliers

Each team can choose to multiply their winning (or losing) stake by 1x, 2x or 3x, with the added stipulation that a higher multiplier reduces the number of people who can answer the question: the full team of 6 at 1x, a randomly selected pair at 2x, and a single randomly selected person at 3x. We have already seen that increasing the number of people answering the question increases the probability of getting it right.

For a given question, can we calculate the expected **reward** that a team would win, given:

* the question's difficulty level, and
* the number of people who can answer?
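As a sketch of the answer, assume (as the calculations that follow do) that a team must answer, so it either wins or loses the multiplied stake. Writing $p$ for `p.know`, $n$ for the number of people answering, $m$ for the multiplier and $V$ for the value of the question (shorthand symbols introduced here), the expected reward is

$$
E[\text{reward}] = mV\left[1 - (1-p)^n\right] - mV(1-p)^n = mV\left[1 - 2(1-p)^n\right].
$$

So the expected reward is positive only when $(1-p)^n < 1/2$, i.e. when the answering group is more likely than not to get the question right. For example, with $p = 0.85$, $n = 6$, $m = 1$ and $V = 100$, it is $100(1 - 2 \times 0.15^6) \approx 99.998$.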
For the moment, let's imagine that all questions are worth $100, and let's also ignore the fact that a team can pass on answering.

Let's start with an easy question (the probability of knowing the answer is 0.85), with a full team of 6 people, and a multiplier of 1x.

```{r}
p.know <- 0.85
team.size <- 6
money <- 100
multiplied.money <- 1 * money

# add the money if the team answers correctly (at least one person knows)
# and subtract it if the team answers incorrectly (nobody knows)
((1 - dbinom(0, team.size, p.know)) * multiplied.money) +
  (dbinom(0, team.size, p.know) * (-multiplied.money))
```

```{r echo = FALSE}
options(digits = 2)
```

Okay, the expected prize money rounds up to $`r ((1 - dbinom(0, team.size, p.know)) * money) + (dbinom(0, team.size, p.know) * (-money))`.

If the printed results carry too many digits for a currency, you can include `options(digits = 2)` in a code chunk, as this document does in a hidden chunk. Note that `digits` sets the number of *significant* digits rather than decimal places, and that the setting stays in effect for the rest of the document.

What happens if we increase the multiplier?

```{r}
# multiplier of 2x means a randomly selected team of 2
team.size <- 2
multiplied.money <- 2 * money
((1 - dbinom(0, team.size, p.know)) * multiplied.money) +
  (dbinom(0, team.size, p.know) * (-multiplied.money))

# multiplier of 3x means a single randomly selected person
team.size <- 1
multiplied.money <- 3 * money
((1 - dbinom(0, team.size, p.know)) * multiplied.money) +
  (dbinom(0, team.size, p.know) * (-multiplied.money))
```

With an easy question, the expected reward increases with the multiplier, even though fewer people can take part in answering.

Let's try the same analysis for the medium and hard questions. But first... we are writing the same code over and over again, so it might be time to write a function! Wouldn't it be nice if all we had to pass in to the function was the probability of knowing the answer and the money multiplier, and it figured out the rest for us?

```{r echo = FALSE}
# reward.calc(): expected reward for a given p.know and multiplier
reward.calc <- function(p.know, multiplier = 1) {
  # team sizes for the 1x, 2x and 3x multipliers
  team.size <- c(6, 2, 1)
  money <- 100
  multiplied.money <- multiplier * money
  # works element-wise, so `multiplier` can be a vector such as 1:3
  ((1 - dbinom(0, team.size[multiplier], p.know)) * multiplied.money) +
    (dbinom(0, team.size[multiplier], p.know) * (-multiplied.money))
}
```

```{r}
# Medium difficulty question
p.know <- 0.6
# full team; randomly selected pair; randomly selected single person
multiplier <- 1:3
reward.calc(p.know, multiplier)

# High difficulty question
p.know <- 0.2
reward.calc(p.know, multiplier)
```

These results suggest that on medium difficulty questions, doubling the reward/risk pays off, but on hard questions, you want the wisdom of the crowd (the full team at 1x).

## Getting more realistic

```{r echo = FALSE}
# restore more printed digits for the results that follow (R's default is 7)
options(digits = 4)
```

In Jeop-R-dy, a team can choose, after seeing the question, to pass, thereby forfeiting any chance at prize money but also protecting themselves against loss. Let's see if we can build a model with two probability parameters:

* the probability of a given person knowing the right answer
* the probability of a given person passing because they know they don't know the right answer (a known unknown!)

```{r}
p.know <- 0.6
p.pass <- 0.3
```

So, the probability of at least one person on a team of 6 knowing the right answer is ...

```{r}
team <- 6
1 - dbinom(0, team, p.know)
```

and the probability of the team passing is ...

```{r}
dbinom(team, team, p.pass)
```

Note that everybody in the team has to pass.

The third scenario is that nobody actually knows the right answer, but somebody thinks they do, so the team answers and is wrong.

```{r}
1 - ((1 - dbinom(0, team, p.know)) + dbinom(team, team, p.pass))
# is the same as
dbinom(0, team, p.know) - dbinom(team, team, p.pass)
```

Let's add money into the equation.
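In symbols, writing $p$ for `p.know`, $q$ for `p.pass`, $n$ for the number of people answering and $M$ for the multiplied prize money (`multiplied.money`), the three outcomes and the expected reward are (a sketch of the algebra; the symbols are shorthand introduced here):

$$
\begin{aligned}
P(\text{right}) &= 1 - (1-p)^n \\
P(\text{pass}) &= q^n \\
P(\text{wrong}) &= (1-p)^n - q^n \\
E[\text{reward}] &= M\,P(\text{right}) - M\,P(\text{wrong}) = M\left[1 - 2(1-p)^n + q^n\right]
\end{aligned}
$$

Setting $q = 0$ recovers the case without passing. Because $P(\text{wrong})$ cannot be negative, the model implicitly assumes $q \le 1 - p$, which matches the idea that a person passes only when they don't know the answer; every $(p, q)$ pair used in this document satisfies it. For instance, the medium question answered by the full team ($p = 0.6$, $q = 0.3$, $n = 6$, $M = 100$) gives $100(1 - 2 \times 0.4^6 + 0.3^6) \approx 99.25$.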
If a team passes, there is no money term, so we only have to think about when a team is right, `1 - dbinom(0, team, p.know)`, or when a team is wrong, `dbinom(0, team, p.know) - dbinom(team, team, p.pass)`.

```{r}
multiplied.money <- 1 * money
((1 - dbinom(0, team, p.know)) * multiplied.money) +
  ((dbinom(0, team, p.know) - dbinom(team, team, p.pass)) * (-multiplied.money))
```

We can update our function to incorporate the probability of passing.

```{r echo = FALSE}
reward.calc <- function(p.know, multiplier = 1, p.pass = 0) {
  team.size <- c(6, 2, 1)
  money <- 100
  multiplied.money <- multiplier * money
  # right: at least one person knows; wrong: nobody knows and the team does not pass
  ((1 - dbinom(0, team.size[multiplier], p.know)) * multiplied.money) +
    ((dbinom(0, team.size[multiplier], p.know) -
        dbinom(team.size[multiplier], team.size[multiplier], p.pass)) *
       (-multiplied.money))
}
```

How much money can we expect to make with a medium difficulty question, for each of the three multiplier scenarios?

```{r}
# Medium difficulty questions
p.know <- 0.6; p.pass <- 0.3
reward.calc(p.know, multiplier, p.pass)
```

What if the question is one of the harder ones?

```{r}
# Hard (but not tricky) questions: if you don't know, you know you don't know
p.know <- 0.2; p.pass <- 0.7
reward.calc(p.know, multiplier, p.pass)
```

How does this change if the question is of medium difficulty but "tricky", i.e., people think they know the answer, but they are wrong? In "tricky" cases like this, the probability of passing is low.

```{r}
# Medium but tricky
p.know <- 0.6; p.pass <- 0.1
reward.calc(p.know, multiplier, p.pass)
```

And what if the question is hard *and* tricky?

```{r}
# Hard and tricky
p.know <- 0.2; p.pass <- 0.1
reward.calc(p.know, multiplier, p.pass)
```

## Conclusion

Given the above analysis, the strategy for maximizing your Jeop-R-dy prize money is:

* for easy questions, pick one person (3x) and cash in
* for medium questions, two people (2x) maximize returns
* for hard questions, you want the wisdom of the crowd (the full team at 1x).
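The three bullet points can be cross-checked in one go. The sketch below restates the pass-aware model as a self-contained function (`expected.reward` is a hypothetical name that mirrors the hidden `reward.calc`, with the same team sizes of 6, 2 and 1 and a question value of 100) and picks the multiplier with the highest expected reward for each scenario used above. The easy question is assumed to have no passing, as in its original calculation.

```{r}
# Self-contained restatement of the pass-aware model (mirrors the hidden reward.calc):
# team sizes 6, 2, 1 for the 1x, 2x, 3x multipliers; each question worth 100
expected.reward <- function(p.know, multiplier, p.pass = 0, money = 100) {
  n <- c(6, 2, 1)[multiplier]
  p.right <- 1 - dbinom(0, n, p.know)                     # at least one person knows
  p.wrong <- dbinom(0, n, p.know) - dbinom(n, n, p.pass)  # nobody knows, team still answers
  multiplier * money * (p.right - p.wrong)
}

# The (p.know, p.pass) pairs used in this document; the easy case assumes no passing
scenarios <- data.frame(
  label  = c("easy", "medium", "hard", "medium, tricky", "hard, tricky"),
  p.know = c(0.85, 0.6, 0.2, 0.6, 0.2),
  p.pass = c(0, 0.3, 0.7, 0.1, 0.1),
  stringsAsFactors = FALSE
)

# Expected reward: one row per scenario, one column per multiplier
rewards <- t(mapply(function(pk, pp) expected.reward(pk, 1:3, pp),
                    scenarios$p.know, scenarios$p.pass))
dimnames(rewards) <- list(scenarios$label, c("1x", "2x", "3x"))
round(rewards, 1)

# Multiplier with the highest expected reward in each scenario
best <- colnames(rewards)[apply(rewards, 1, which.max)]
names(best) <- scenarios$label
best
```

Under these assumptions the best multipliers come out as 3x for the easy question, 2x for both medium questions and 1x for both hard ones, matching the bullets above.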
If you knew that a hard question was also tricky, you should pass, but of course, then it wouldn't be tricky!

## Future work

While this analysis suggests a sensible qualitative strategy, it also assumes that everybody in the group is equally adept at R, and that team-mates know (or don't know) the answer independently of one another. How would the model (and strategy) change if there were one person on the team who is amazing at R? How about if one person came unprepared?

Would this strategy change if we correctly scaled the prize money for questions of different difficulty? Why, or why not?
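As a possible starting point for the first question, here is a sketch under the same independence assumption: if team-mate $i$ knows the answer with their own probability $p_i$, the chance that at least one of them knows is $1 - \prod_i (1 - p_i)$, which reduces to the binomial formula when everyone shares the same $p$. The helper `p.team.knows` and the example teams are hypothetical, chosen only for illustration.

```{r}
# Hypothetical helper: probability that at least one team-mate knows the answer,
# assuming each person knows independently with their own probability
p.team.knows <- function(p.individual) {
  1 - prod(1 - p.individual)
}

# With equal abilities this matches the binomial calculation used above
p.team.knows(rep(0.6, 6))
1 - dbinom(0, size = 6, prob = 0.6)

# Illustrative (made-up) teams facing a hard question (baseline p = 0.2)
expert     <- c(0.9, rep(0.2, 5))   # one person who is amazing at R
unprepared <- c(0.0, rep(0.2, 5))   # one person who came unprepared
p.team.knows(expert)
p.team.knows(unprepared)

# At 2x a random pair answers, so average over all 15 possible pairs
pairs <- combn(6, 2)
mean(apply(pairs, 2, function(idx) p.team.knows(expert[idx])))
```

Compared with a uniform hard-question team (about 0.74 for the full team), one expert pushes the full team to about 0.97, while one unprepared member drops it to about 0.67; plugging such probabilities into the reward formula would show how the best multiplier shifts.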