Reaction to CUPED article

On September 15, 2024, Craig Sexauer posted “CUPED Explained.” CUPED stands for Controlled-experiment Using Pre-Experiment Data, and the article bills it as “one of the most powerful algorithmic tools for increasing the speed and accuracy of experimentation programs.”
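
For reference, here is a minimal sketch of the standard CUPED adjustment (the formulation from Deng et al. 2013, applied to synthetic data of my own invention): each unit’s metric Y is shifted by theta times (X minus mean(X)), where X is the same metric measured before the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
x = rng.normal(100, 20, n)          # pre-experiment metric for each unit
y = 0.8 * x + rng.normal(0, 10, n)  # in-experiment metric, correlated with x

theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # OLS slope of y on x
y_cuped = y - theta * (x - x.mean())            # variance-reduced metric

# Same mean, lower variance => tighter confidence intervals on the delta.
print(y.mean(), y_cuped.mean())
print(y.var(), y_cuped.var())
```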

Here are some of the things that just make me want to vomit about this explanation of the tool (if not the actual tool itself):

“As an experiment matures and hits its target date for readout, it’s not uncommon to see a result that seems to be only barely outside the range where it would be treated as statistically significant.”

“Only barely.” As in: we already gave you a 1-in-20 chance that a pure fluke would confirm your hypothesis, and you just need a little bit more. This is very reminiscent of the XKCD comic “Significant,” in which green jelly beans are linked to acne, with “only 5% chance of coincidence!” prominently displayed, while the 19 other colors tested (red, turquoise, magenta, yellow, grey, and so on) that showed nothing are buried. Any study that admits to using CUPED would instantly get a “what are you not telling me?” rejection.
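
The jelly-bean problem is easy to simulate. The sketch below (my own illustration, not from the article) runs 20 t-tests on pure noise, with no real effect anywhere, and counts how many come out “significant” at p < 0.05. On average, about one does.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

significant = 0
for _ in range(20):  # one test per jelly-bean color
    a = rng.normal(0, 1, 200)  # control: pure noise
    b = rng.normal(0, 1, 200)  # treatment: same distribution, no real effect
    if stats.ttest_ind(a, b).pvalue < 0.05:
        significant += 1

print(f"{significant} of 20 null tests came out 'significant' at 5%")
```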

“Waiting for more samples delays your ability to make an informed decision, and it doesn’t guarantee you’ll observe a statistically significant result when there is a real effect.”

See above – you have a weak hypothesis, and adding more samples doesn’t guarantee the result you want. This is all true – it just hides the leap to the “use CUPED to solve this problem” conclusion.
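
To be fair to the quoted claim, the underlying statistics are real: even when a genuine effect exists, a test only detects it some fraction of the time (its power). A small simulation of my own makes the point – more samples raise power, but never to 100%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def power(n, effect=0.1, trials=2_000, alpha=0.05):
    """Fraction of trials in which a real (small) effect reaches p < alpha."""
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)     # control
        b = rng.normal(effect, 1.0, n)  # treatment with a real, small effect
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / trials

for n in (100, 1_000, 10_000):
    print(n, power(n))
```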

“Conceptually, if one group has a faster average baseline, their experiment results will also be faster. When we apply a CUPED correction, the faster group’s metric will be adjusted downwards relative to the slower group.”

Before you just pick a random variable and “correct” for it (say, by using CUPED), I would recommend you read (and understand) The Book of Why by Judea Pearl. This book has tons of examples where a “confounding variable” is corrected for incorrectly. It describes do-calculus as a way to model causality. [All of this is not to say I understand do-calculus; I only understand enough to tell when other people are wrong.]
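
To make that concrete, here is a toy collider example in the spirit of the book (the simulation and variable names are mine, not Pearl’s): X and Y are independent by construction, but “correcting” for a common effect of both manufactures a correlation that was never there.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)          # independent of x by construction
c = x + y + rng.normal(size=n)  # collider: a common *effect* of x and y

print(np.corrcoef(x, y)[0, 1])              # ~0: no real relationship
mask = c > 1.0                              # "correct" by conditioning on c
print(np.corrcoef(x[mask], y[mask])[0, 1])  # clearly negative: spurious
```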

Python libraries on causality:
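
DoWhy is one well-known example. A minimal sketch of its basic workflow, assuming its CausalModel API and using a toy dataset and column names of my own:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)

n = 5_000
w = rng.normal(size=n)                        # observed confounder
t = (w + rng.normal(size=n) > 0).astype(int)  # treatment influenced by w
y = 2.0 * t + w + rng.normal(size=n)          # true effect of t on y is 2.0
df = pd.DataFrame({"w": w, "t": t, "y": y})

model = CausalModel(data=df, treatment="t", outcome="y", common_causes=["w"])
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # should land near 2.0 after adjusting for w
```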
