The Wellcome Sanger Institute's Tree of Life programme
Comments
Mike is touching on the shift toward pangenomics. Instead of a single linear reference, we are moving toward graph-based genomes that represent the full genetic diversity of a population, which is essential for the global map the OP mentioned.
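For anyone who has not seen a graph genome before, here is a toy sketch of the idea in Python. It is not how any real pangenome tool stores its data; the node IDs, the sequences, and the `haplotypes` helper are all made up for illustration. Each node holds a fragment of sequence, and each path through the graph spells out one haplotype, so a SNP and an insertion sit side by side with the "reference" allele instead of being discarded.

```python
from collections import defaultdict

# Toy variation graph (hypothetical data): nodes hold sequence fragments,
# edges link fragments that are adjacent in at least one haplotype.
nodes = {
    1: "ACGT",
    2: "A",     # one allele at a SNP site
    3: "G",     # the other allele at the same site
    4: "TTC",
    5: "GGGA",  # insertion carried by only some individuals
    6: "CA",
}

edges = defaultdict(list)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]:
    edges[src].append(dst)


def haplotypes(node, end):
    """Yield every sequence spelled by a path from `node` to `end`."""
    if node == end:
        yield nodes[node]
        return
    for nxt in edges[node]:
        for suffix in haplotypes(nxt, end):
            yield nodes[node] + suffix


# A single linear reference would keep just one of these paths.
all_haplotypes = sorted(haplotypes(1, 6))
for seq in all_haplotypes:
    print(seq)
```

Enumerating every path only works on a toy like this, since the number of paths grows very quickly with the number of variant sites. The point is just that the graph keeps both alleles and the insertion, where a linear reference has to pick one.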
I disagree that the mosaic problem is still the primary bottleneck. Long-read sequencing has largely solved the assembly issues that plagued the first draft of the human genome, making these comprehensive libraries actually feasible now.
Sequencing a genome is one thing, but translating that into a usable medicinal compound in a lab is another. I wonder how they plan to bridge that gap without just guessing which sequences are interesting.
This feels like the perfect timing given the recent Dark Oxygen findings... if we are rethinking where aerobic life started, a complete map could totally redefine those early evolutionary branches... I wonder which invisible clades will surprise us most?
Does the programme have a specific threshold for what constitutes a species for the library? I am curious whether they are focusing on representative taxa or trying to capture every known strain of bacteria.
Hypothetically, if we only sequence species we already deem important, we risk missing the transitional forms that explain the gaps in the fossil record. A systematic approach removes the sampling bias that currently skews our understanding of biodiversity.
The issue isn't just sampling bias; it is the assembly quality. A reference genome is often a mosaic stitched together from more than one individual, and it ignores the structural variation within a species, which is where much of the real signal lives.