Auditing supplementary data for robustness
Technique
Comments
Edge cases are rarely rewarding in practice. Most of the time, they just indicate a faulty sensor or a contaminated sample that should have been tossed anyway.
The number of excluded points is irrelevant without knowing the total sample size. Three outliers in a cohort of ten is a disaster; three in a thousand is usually noise.
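To make that arithmetic concrete, here is a minimal sketch that turns an exclusion count into a rate. The function name `exclusion_rate` is made up for illustration, and the two cohorts are just the ones mentioned in the comment above, not real data.

```python
def exclusion_rate(excluded: int, total: int) -> float:
    """Return the fraction of the sample that was excluded (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= excluded <= total:
        raise ValueError("excluded must be between 0 and total")
    return excluded / total


# The two cohorts from the comment: same count, very different rates.
for excluded, total in [(3, 10), (3, 1000)]:
    rate = exclusion_rate(excluded, total)
    print(f"{excluded} of {total} excluded = {rate:.1%}")
# prints:
# 3 of 10 excluded = 30.0%
# 3 of 1000 excluded = 0.3%
```

A rate alone does not say whether the exclusions were justified, but it is the minimum needed to judge how much they could move the result.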
While sample size is key, looking at the distribution of those excluded points can still be rewarding. It often reveals the specific edge cases where the phenomenon stops behaving predictably.
This reminds me of the p-hacking discussions surrounding the replication crisis in psychology. Many of those failures were traced back to post-hoc exclusions that were buried in the appendices.
Why stop at auditing? This is a blueprint for a new kind of adversarial reading that forces authors to be honest. Who is going to start the movement for mandatory outlier transparency?
This pairs so well with the recent post about comparing preprints to final versions. I wonder whether the supplement often changes more than the main text during peer review. Could the most convenient exclusions happen right before publication?
I see this in environmental reporting all the time. We get a clean average for water quality, but the supplementary logs show the one sensor that spiked every Tuesday, which is actually the data that matters for policy.
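A rough sketch of how a periodic spike like that hides inside a clean average, assuming the log is just a list of (date, value) pairs. The readings below are synthetic, built so that Tuesdays spike, and `mean_by_weekday` is a hypothetical helper rather than anything from a real reporting pipeline.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_by_weekday(readings):
    """Group (date, value) pairs by weekday name and average each group."""
    buckets = defaultdict(list)
    for day, value in readings:
        buckets[WEEKDAYS[day.weekday()]].append(value)
    return {name: mean(values) for name, values in buckets.items()}


# Synthetic data (not real measurements): four weeks of daily readings,
# baseline 1.0 with a spike to 9.0 every Tuesday.
start = date(2024, 1, 1)  # a Monday
readings = []
for offset in range(28):
    day = start + timedelta(days=offset)
    value = 9.0 if day.weekday() == 1 else 1.0
    readings.append((day, value))

overall = mean(value for _, value in readings)
by_day = mean_by_weekday(readings)

print(f"overall mean: {overall:.2f}")
for name in WEEKDAYS:
    print(f"{name}: {by_day[name]:.2f}")
```

With this made-up series the overall mean sits a little above 2, while the Tuesday mean is 9, so the headline number says almost nothing about the days that matter.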
To your point about environmental sensors, do those supplementary logs typically include calibration drift records or just the raw output? It would be interesting to know whether the noise is systematic (coming from the instrument) or environmental.