Analysis of the genomes of hundreds of people from across Africa has shed light on ancient migrations and modern susceptibility and resistance to disease, revealing unexpected genetic diversity.
The genome is often described as the body's instruction manual, and whole-genome sequencing involves effectively reading the billions of DNA bases in an individual's genome.
New research on sequences from more than 400 people from the African continent is part of an attempt to address an imbalance in the genetic data available worldwide, most of which comes from people of European ancestry.
The work, published Wednesday in the journal Nature, suggests Zambia may have been a key stopover point in the expansion of Bantu-speaking groups.
And it finds that populations living in the same area may have significantly different genetic vulnerabilities to disease.
"These are population groups that for the most part have not had their genomes sequenced before," said Zane Lombard, an associate professor of human genetics at Johannesburg's University of the Witwatersrand, who helped lead the study.
"So the addition to our knowledge base is unique, significant, and more than would be expected from a similar sample size in other populations," she told AFP.
In many ways, we live in the age of the gene, with better understanding of our body's basic building blocks helping advance medical research and treatment, and better explain how humans evolved and migrated around the world.
But the picture of human genetic variation remains frustratingly incomplete, with the world's diversity far from represented.
Just 22 percent of participants in genomics research are of non-European heritage, with the majority of available genetic data coming from just three countries -- Britain, the United States and Iceland.
Efforts to correct that have been under way for years in Africa. Lombard's team worked with a consortium, established to improve genetic research on the continent, to access whole-genome sequences from 426 people enrolled in ongoing studies.
The sample might seem comparatively small, but it represents 50 ethnolinguistic groups from 13 countries, including some being studied for the first time.
'Scratching the surface'
Among the discoveries revealed by the sequencing was an unexpectedly large number of previously unidentified single-nucleotide variants (SNVs), positions where a single DNA base differs from a reference genome. These variants did not appear in publicly available genome sequence data.
"This is significant because we are learning more about human genetic variation in general, discovering more differences that could be linked to disease or traits in the future, and that can inform what we know about genetic diversity across the globe," Lombard said.
Uncovering the variants is just a first step, but it will help identify which ones may be important in health outcomes, and provide key data on the different vulnerabilities of populations.
"Africans are (often) presumed to have the same disease susceptibility or incidence where that may not be a useful framework for specific groups," Lombard said.
For example, the data showed members of one group in Uganda had variants that are protective against severe malaria, while other groups living in the same country lacked them.
This could be the result, the study says, of the relatively recent migration of members of the unprotected group into parts of Uganda where malaria is endemic.
The sequencing also offers insights into the Bantu expansions -- key migrations of Bantu-speaking people that happened several times over thousands of years.
The sequencing showed Bantu groups in eastern and southern Africa had genetic similarities to Bantu speakers in Zambia.
So while recent work has theorised at least one Bantu migration originated in Angola, the genetic data suggests Zambia may have been a key stopover point.
For all the sequencing reveals, Lombard acknowledges the work "is really just scratching the surface of the more than 2,000 ethnolinguistic groups represented in Africa".
Going forward, the researchers hope to look at other types of variations in the sequences, and to add information from unstudied populations.