Reaction to CUPED article

On September 15, 2024, Craig Sexauer posted “CUPED Explained.” CUPED stands for Controlled-experiment Using Pre-Experiment Data, and the article bills it as “one of the most powerful algorithmic tools for increasing the speed and accuracy of experimentation programs.”
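
For reference, here is a minimal sketch of the standard CUPED adjustment (the formulation from Deng et al. 2013, applied to synthetic data of my own invention): each unit’s metric Y is shifted by theta times (X minus mean(X)), where X is the same metric measured before the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
x = rng.normal(100, 20, n)          # pre-experiment metric for each unit
y = 0.8 * x + rng.normal(0, 10, n)  # in-experiment metric, correlated with x

theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # OLS slope of y on x
y_cuped = y - theta * (x - x.mean())            # variance-reduced metric

# Same mean, lower variance => tighter confidence intervals on the delta.
print(y.mean(), y_cuped.mean())
print(y.var(), y_cuped.var())
```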

Here are some of the things that just make me want to vomit about this explanation of the tool (if not the actual tool itself):

“As an experiment matures and hits its target date for readout, it’s not uncommon to see a result that seems to be only barely outside the range where it would be treated as statistically significant.”

“Only barely.” As in: we already gave you a 1-in-20 chance that a pure fluke would confirm your hypothesis, and you just need a little bit more. This is very reminiscent of the XKCD comic “Significant,” in which green jelly beans are linked to acne, with “only 5% chance of coincidence!” prominently displayed, while the 19 other colors tested (red, turquoise, magenta, yellow, grey, and so on) that showed nothing are buried. Any study that admits to using CUPED would instantly get a “what are you not telling me?” rejection.
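
The jelly-bean problem is easy to simulate. The sketch below (my own illustration, not from the article) runs 20 t-tests on pure noise, with no real effect anywhere, and counts how many come out “significant” at p < 0.05. On average, about one does.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

significant = 0
for _ in range(20):  # one test per jelly-bean color
    a = rng.normal(0, 1, 200)  # control: pure noise
    b = rng.normal(0, 1, 200)  # treatment: same distribution, no real effect
    if stats.ttest_ind(a, b).pvalue < 0.05:
        significant += 1

print(f"{significant} of 20 null tests came out 'significant' at 5%")
```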

“Waiting for more samples delays your ability to make an informed decision, and it doesn’t guarantee you’ll observe a statistically significant result when there is a real effect.”

See above – you have a weak hypothesis, and adding more samples doesn’t guarantee the result you want. This is all true – it just hides the leap to the “use CUPED to solve this problem” conclusion.
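
To be fair to the quoted claim, the underlying statistics are real: even when a genuine effect exists, a test only detects it some fraction of the time (its power). A small simulation of my own makes the point – more samples raise power, but never to 100%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def power(n, effect=0.1, trials=2_000, alpha=0.05):
    """Fraction of trials in which a real (small) effect reaches p < alpha."""
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)     # control
        b = rng.normal(effect, 1.0, n)  # treatment with a real, small effect
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / trials

for n in (100, 1_000, 10_000):
    print(n, power(n))
```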

“Conceptually, if one group has a faster average baseline, their experiment results will also be faster. When we apply a CUPED correction, the faster group’s metric will be adjusted downwards relative to the slower group.”

Before you just pick a random variable and “correct” for it (say, by using CUPED), I would recommend you read (and understand) The Book of Why by Judea Pearl. This book has tons of examples where a “confounding variable” is corrected for incorrectly. It describes do-calculus as a way to model causality. [All of this is not to say I understand do-calculus; I only understand enough to tell when other people are wrong.]
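
To make that concrete, here is a toy collider example in the spirit of the book (the simulation and variable names are mine, not Pearl’s): X and Y are independent by construction, but “correcting” for a common effect of both manufactures a correlation that was never there.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)          # independent of x by construction
c = x + y + rng.normal(size=n)  # collider: a common *effect* of x and y

print(np.corrcoef(x, y)[0, 1])              # ~0: no real relationship
mask = c > 1.0                              # "correct" by conditioning on c
print(np.corrcoef(x[mask], y[mask])[0, 1])  # clearly negative: spurious
```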

Python libraries on causality:
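
DoWhy is one well-known example. A minimal sketch of its basic workflow, assuming its CausalModel API and using a toy dataset and column names of my own:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)

n = 5_000
w = rng.normal(size=n)                        # observed confounder
t = (w + rng.normal(size=n) > 0).astype(int)  # treatment influenced by w
y = 2.0 * t + w + rng.normal(size=n)          # true effect of t on y is 2.0
df = pd.DataFrame({"w": w, "t": t, "y": y})

model = CausalModel(data=df, treatment="t", outcome="y", common_causes=["w"])
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # should land near 2.0 after adjusting for w
```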
