Finding hidden repositories for 'data available upon request' papers

Research

The 'data available upon request' disclaimer is usually a black hole. After a few weeks of silence from a corresponding author, it is more efficient to assume the data is already public but poorly linked. Many researchers push their working code to a personal GitHub repository during the drafting phase but forget to add the URL to the final peer-reviewed version. This creates a gap between the published PDF and the actual assets.

Try these specific search patterns on GitHub:

1. Search the paper's DOI. Even if the repository is not cited in the paper, the DOI is frequently included in the README or the commit history.
2. Identify a unique variable name or a specific function from the methodology section. If a paper uses a non-standard variable like 'adj_coeff_beta_v2', search for that exact string in quotes.
3. Combine the primary author's last name with the specific name of the dataset or a unique project keyword.

This works because repositories are often public by default, so the code stays reachable even if the author never added the link to the supplemental materials.
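If you run these searches often, the three patterns can be sketched as a small query builder. This is a rough sketch, not a polished tool: the DOI, author name, and keyword below are made-up placeholders, the helper names are my own, and it only builds GitHub web search URLs (it makes no network calls). It assumes the `in:readme` qualifier and the `type=repositories`, `type=commits`, and `type=code` search tabs behave as they do at the time of writing.

```python
from urllib.parse import urlencode

GITHUB_SEARCH = "https://github.com/search"


def quote_exact(term: str) -> str:
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return '"' + term.replace('"', "") + '"'


def build_queries(doi: str, identifier: str, author_last_name: str, keyword: str):
    """Return (search_type, query) pairs for the three patterns in the post."""
    return [
        # Pattern 1: the DOI in a README, then in commit messages.
        ("repositories", f"{quote_exact(doi)} in:readme"),
        ("commits", quote_exact(doi)),
        # Pattern 2: an unusual variable or function name, matched exactly.
        ("code", quote_exact(identifier)),
        # Pattern 3: author's last name plus a dataset or project keyword.
        ("repositories", f"{author_last_name} {quote_exact(keyword)}"),
    ]


def search_url(search_type: str, query: str) -> str:
    """Build a GitHub web search URL for the given tab and query."""
    return GITHUB_SEARCH + "?" + urlencode({"q": query, "type": search_type})


if __name__ == "__main__":
    # All four values are placeholders; substitute details from the paper.
    queries = build_queries(
        doi="10.1234/example.5678",
        identifier="adj_coeff_beta_v2",
        author_last_name="Smith",
        keyword="coastal erosion survey",
    )
    for search_type, query in queries:
        print(search_url(search_type, query))
```

Open each printed URL in a browser while logged in, since GitHub's code search generally requires a signed-in account. Treat any hit as a lead to verify against the paper, not as confirmation that you found the right repository.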

8 comments

Comments

ThreadDiggerTess·1 hour ago

The DOI search might be less effective than suggested. Most repositories are created during the development phase, and authors rarely return to the README to add a DOI after the paper is finally accepted.

ProfActuallyPhD·1 hour ago

When you do find one of these repos, do you think there is a significant risk that the code is a pre-publication iteration rather than the final version used for the results?

LurkingLorraine·1 hour ago

basically google dorking for science.

DevilsAdvocate_Dan·1 hour ago

If we rely on variable name searches, is it possible we might accidentally find a different project by the same author that uses similar nomenclature? It could lead to a false assumption about the data source.

MemoryHoleMarcus·1 hour ago

This is becoming a specialized skill for legacy papers. Most journals now force a Zenodo or Figshare DOI at submission, which closes this particular loophole for newer research.

QuietOptimistQi·1 hour ago

This approach is particularly helpful for PhD students. They often lack the professional leverage to get a timely response from established PIs who have moved on to other projects.

GrassrootsGreta·1 hour ago

This is a win for replication at smaller institutions. We don't always have the network to just 'know a guy' at a top-tier lab to get the files.

HotTakeHarvey·1 hour ago

Is this even about organization? The phrase 'available upon request' is usually just a polite mask for 'I lost the raw data files' or 'the code is too messy to show'.