On September 15, 2024, Craig Sexauer posted “CUPED Explained,” where CUPED stands for Controlled-experiment Using Pre-Experiment Data, and billed it as “one of the most powerful algorithmic tools for increasing the speed and accuracy of experimentation programs.”
Here is a list of some of the things that just make me want to vomit about this explanation of the tool (if not the actual tool itself):
As an experiment matures and hits its target date for readout, it’s not uncommon to see a result that seems to be only barely outside the range where it would be treated as statistically significant.
“Only barely.” As in: the 5% significance threshold already gives you a 1-in-20 chance of calling a nonexistent effect real, and now you want a little extra slack on top of that. This is the XKCD jelly bean comic all over again: “green jelly beans linked to acne, only 5% chance of coincidence!” gets the headline, while the other 19 colors (red, turquoise, magenta, yellow, grey, and so on) that showed nothing are buried. Any study that admits to using CUPED would instantly get a “what are you not telling me?” rejection from me.
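To put a number on the jelly-bean problem (back-of-the-envelope arithmetic, nothing taken from the post itself): at a 5% threshold per test, the chance of at least one spurious “significant” result grows quickly with the number of comparisons you peek at.

```python
# Chance of at least one false positive when running several tests at alpha = 0.05.
alpha = 0.05
for n_tests in (1, 5, 20):
    p_any_false_positive = 1 - (1 - alpha) ** n_tests
    print(f"{n_tests:2d} tests: P(at least one false positive) = {p_any_false_positive:.2f}")
# Prints roughly 0.05, 0.23, and 0.64; twenty jelly bean colors gets you about
# a two-in-three chance of a headline.
```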
Waiting for more samples delays your ability to make an informed decision, and it doesn’t guarantee you’ll observe a statistically significant result when there is a real effect.
See above: you have a weak hypothesis, and adding more samples doesn’t guarantee the result you want. All of this is true; it’s just the setup for the “use CUPED to solve this problem” conclusion.
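To illustrate how little “wait for more samples” actually promises, here is a minimal sketch using statsmodels’ power calculator; the effect size and sample counts are made up for illustration, not taken from the post.

```python
# How often does a two-sample t-test reach significance for a small real effect?
# The effect size and sample sizes below are made-up illustrative numbers.
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()
for n_per_group in (1_000, 5_000, 20_000):
    power = power_calc.power(effect_size=0.03, nobs1=n_per_group, alpha=0.05, ratio=1.0)
    print(f"n={n_per_group:>6} per group -> power ~= {power:.2f}")
# Power climbs with sample size but never reaches 1; a small real effect can
# easily sit on the wrong side of "significant" at realistic sample sizes.
```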
Conceptually, if one group has a faster average baseline, their experiment results will also be faster. When we apply a CUPED correction, the faster group’s metric will be adjusted downwards relative to the slower group.
Before you just pick a random variable and “correct” for it (like, by using CUPED), I would recommend you read (and understand) The Book of Why by Judea Pearl. This book has tons of examples where a “confounding variable” is corrected for incorrectly, and it describes do-calculus as a way to model causality. [All of this is not to say I understand do-calculus; I only understand enough to tell when other people are wrong.]
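For reference, the adjustment the quote above is describing is, as usually presented (CUPED originates in a 2013 paper by Deng, Xu, Kohavi, and Walker), a covariate shift using the pre-experiment value of the metric. A minimal sketch with simulated numbers, and no claim that this matches the post’s exact implementation:

```python
# Minimal sketch of the textbook CUPED adjustment: shift each unit's in-experiment
# metric by how far its pre-experiment value sits from the overall pre-experiment
# mean, with theta chosen to minimize the variance of the adjusted metric.
# The simulated numbers and the pooled-theta choice are illustrative assumptions.
import numpy as np

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """Return the CUPED-adjusted metric: y - theta * (x_pre - mean(x_pre))."""
    theta = np.cov(x_pre, y, ddof=1)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())

rng = np.random.default_rng(0)
x_pre = rng.normal(10.0, 2.0, size=10_000)            # pre-experiment metric
y = 0.8 * x_pre + rng.normal(0.0, 1.0, size=10_000)   # in-experiment metric
y_adj = cuped_adjust(y, x_pre)
print(np.var(y, ddof=1), np.var(y_adj, ddof=1))       # adjusted variance is smaller
```

The variance drop is the whole sales pitch; the catch, and the reason to read Pearl first, is that the adjustment is only as trustworthy as the assumption that the chosen covariate is a clean pre-treatment predictor and not something tangled up with the effect you are trying to measure.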
Python libraries on causality:
- gCastle – https://github.com/huawei-noah/trustworthyAI/tree/master/gcastle
- DoWhy – https://www.pywhy.org/dowhy/v0.11.1/
- EconML – https://github.com/py-why/EconML
- causal-learn – https://github.com/py-why/causal-learn (plus Tetrad, a related Java toolkit)
- PyWhy – https://www.pywhy.org/
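To make the contrast concrete: with a library like DoWhy, the assumed causal structure has to be written down before anything gets “corrected.” A minimal sketch using DoWhy’s CausalModel API, with invented data, column names, and effect sizes:

```python
# Minimal DoWhy sketch: state the assumed causal structure explicitly, identify
# the estimand, then estimate the effect. Data, columns, and effect sizes are
# invented for illustration.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(42)
n = 5_000
baseline = rng.normal(0.0, 1.0, size=n)                  # pre-experiment covariate
treatment = rng.binomial(1, 0.5, size=n).astype(bool)    # randomized assignment
outcome = 0.3 * treatment + 0.8 * baseline + rng.normal(0.0, 1.0, size=n)
df = pd.DataFrame({"baseline": baseline, "treatment": treatment, "outcome": outcome})

model = CausalModel(
    data=df,
    treatment="treatment",
    outcome="outcome",
    common_causes=["baseline"],  # the causal assumption is explicit, not implicit
)
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # should land near the simulated effect of 0.3
```

The point is not that DoWhy is magic; it is that the adjustment only happens after you state, in a form someone can argue with, why the covariate deserves to be corrected for.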