COVID lessons: scientists without quality data are like unarmed soldiers in a war zone

Some of the most important public health lessons from the COVID pandemic concern how the government should share data with the public, how updates about the response should be clearly articulated, and how much it matters to the public that insights, information and the full relevant data are shared.

The pandemic brought these issues into focus. But the challenge extends beyond the boundaries of COVID-19 to all diseases.

Mistakes made during the pandemic when it came to the collection, management and distribution of data must be recognised. And lessons must be learned and shared about efficiently navigating public health data.

We looked at the effectiveness of lockdown in South Africa and how data were used during the pandemic. We concluded that data collection and dissemination could have been much more efficient, and that greater efficiency would have led to better outcomes.

For example, if more detailed, localised data had been publicly available throughout the country it would have been possible to quantify and contrast the spread of the disease between cities, towns and rural areas. In turn, this would have meant that those making policy decisions were better informed.

Our analysis and findings underscored that quality data is the cornerstone of good science. Without it, scientists given the job of informing the public about vital public health issues are like unarmed soldiers in a war zone.

We cannot emphasise enough the importance of epidemiological data, and how relevant it is in managing the early stages of a disease outbreak. However, as a disease progresses, so too must the underlying data and reporting improve to manage the progression of the outbreak.

About more than just data

Sharing information is not just about sharing any data with the public.

Take the issue of aggregate reporting. Aggregate reporting presents data as a cumulative number or as a time series of the running total. The limited inferences this allows can perpetuate public bias. These World Health Organisation graphs show both good and bad practice: good because data are shared, bad because only one variable is shown at a time.

Another challenge is that underlying data aren't made available for other scientists to use easily. So even though South Africa's National Institute for Communicable Diseases (NICD) releases comprehensive and well-presented epidemiological reports and now has a very usable dashboard, the underlying data are not available to others for further visualisation or analysis.

A further problem with aggregate reporting is that it abstracts away the nuances of public healthcare interventions and how they change over time. These include modifications to patient follow-up guidelines, the introduction of a new treatment regimen (as was the case with HIV/TB) and innovative clinical surveillance strategies.

Members of the public need to be able to compare the current outbreak with previous outbreaks of a similar nature. This provides context and can help people to assess the insight as well as the data, and to move toward evidence-based decision making.

The time frames can be adjusted from these dashboards. But the way the data are presented means that it's hard to contextually compare different infectious disease surges (or clusters of outbreaks of a specific disease) and the impact on the healthcare system.

Reflecting changing realities

Disease outbreaks aren't static. A disease can lose epidemic status and become endemic, as it becomes a constant and more predictable presence at a particular location. Both the contagiousness and harmfulness of a disease can also change as a result of an intervention, such as an effective vaccine or effective non-pharmaceutical interventions.

In the initial stages of an outbreak, three primary data points are useful to everyone and should be shared routinely: time, place, and number.

Typically, after any outbreak, government or health authorities take steps to share basic data and infographics with the public that purport to justify any interventions they may recommend.

This was the case during COVID.

But we identified some immediate problems with this approach.

Firstly, much of the information is released only in formats like infographics that are not computer readable. This makes further analysis impossible without research groups and members of society manually transcribing, collecting, and sharing data. This causes a trust issue with the data: there might be multiple sources of the same information and the process is error-prone.

Secondly, data were shared less frequently as time went on, and the dashboards and infographics that were published remained aggregate. An unfortunate consequence was a lack of transparency, and no clear link between the underlying evidence and the decisions being taken.

So how can public health decision making stop being treated like a state secret? Aren't there simply ways for the required data to be openly shared, and platforms created that enable engagement with the numbers?

We think it is indeed possible.

The way forward

Disaggregated data. In a country with inequities such as South Africa, aggregated data can hide disproportionate effects of an event on specific communities. Making the raw disaggregated data available can enable evidence-based advocacy and interventions to meet the needs of marginalised communities more effectively.
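To illustrate how an aggregate can hide what is happening in a specific community, here is a minimal Python sketch. The area names, week labels and case counts are invented placeholders, not real surveillance data, and the two helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical weekly case counts; the areas and numbers are invented
# purely for illustration and are not real surveillance data.
records = [
    {"week": "2021-W01", "area": "Area A", "cases": 120},
    {"week": "2021-W01", "area": "Area B", "cases": 15},
    {"week": "2021-W02", "area": "Area A", "cases": 110},
    {"week": "2021-W02", "area": "Area B", "cases": 45},
]

def aggregate_by_week(rows):
    """Sum cases across all areas: the view an aggregate report gives."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["week"]] += row["cases"]
    return dict(totals)

def disaggregate_by_area(rows):
    """Keep one weekly series per area: the view raw data allows."""
    series = defaultdict(dict)
    for row in rows:
        series[row["area"]][row["week"]] = row["cases"]
    return {area: dict(weeks) for area, weeks in series.items()}

national = aggregate_by_week(records)
by_area = disaggregate_by_area(records)
print(national)
print(by_area)
```

In this made-up example the overall total rises modestly, from 135 to 155, while cases in Area B triple and cases in Area A fall. Only the disaggregated view shows that.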

Accessible data. Information should be shared with the inclusion of indices, metrics, and simplified computer readable data types. This would allow wider use and add a layer of transparency. It would also create an opportunity for community-led monitoring and evaluation outside the government.
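As a rough sketch of what "computer readable" could mean in practice, the Python below writes the time, place and number data points to plain CSV text and reads them back. The column names (date, place, cases) and the rows are assumptions made for illustration; they do not describe any existing government feed.

```python
import csv
import io

# Hypothetical column names chosen for illustration; they mirror the
# "time, place, and number" data points, not any real data feed.
FIELDS = ["date", "place", "cases"]

def to_csv(rows):
    """Serialise a list of dicts to CSV text that common tools can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def from_csv(text):
    """Parse the CSV text back into dicts, converting counts to integers."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"date": r["date"], "place": r["place"], "cases": int(r["cases"])}
        for r in reader
    ]

# Invented placeholder rows, not real surveillance data.
rows = [
    {"date": "2021-01-04", "place": "Area A", "cases": 120},
    {"date": "2021-01-04", "place": "Area B", "cases": 15},
]

text = to_csv(rows)
assert from_csv(text) == rows  # the round trip loses nothing
print(text)
```

Unlike an infographic, a file like this can be loaded directly by anyone, with no manual transcription step in which errors can creep in.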

Choosing appropriate visualisations. We strongly recommend representing the data as relative numbers (in other words, as percentages or as rates per population size) in addition to absolute numbers. This would make the data more accessible. Ordinary citizens would be able to understand better where things stand and how they are changing. It would also help inform changes they may choose to implement to keep themselves safe.
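A short Python sketch shows why relative numbers matter. The figures for the two areas are invented placeholders, and rate_per_100k is a hypothetical helper; cases per 100,000 residents is one common way to express a rate per population size.

```python
def rate_per_100k(cases, population):
    """Convert an absolute count into cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population

# Invented placeholder figures for two hypothetical areas.
areas = {
    "Area A": {"cases": 500, "population": 1_000_000},
    "Area B": {"cases": 100, "population": 50_000},
}

for name, figures in areas.items():
    rate = rate_per_100k(figures["cases"], figures["population"])
    print(f"{name}: {figures['cases']} cases, {rate:.0f} per 100,000")
```

In this made-up example Area A has five times as many cases in absolute terms, yet the rate in Area B is four times higher. Showing both numbers side by side gives a fuller picture than either alone.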

Previous outbreaks of the same or a similar pathogen should also be displayed. This would enable people to assess the similarities and differences in context at a glance. Here's a good example.

Flaws to overcome

COVID-19 exposed the fragmented way in which data is released, and how insufficient data sharing can be if it's not done at a local area level.

In some instances, data quality issues also compromise the trust the public has in the system. Trust is also affected by the frequency with which data are shared. Inconsistencies in the time and date of data sharing seem to be a universal problem. This breeds public distrust.

Finally, information shared should not only support “good news”. Negative data – such as side effects of a particular treatment regimen or medical intervention – should also be shared.

From COVID-19 we learned that there are multiple opinions around a specific issue. Some of these opinions have been badly informed. But one cannot blame people who are uninformed when important decision-making information is not freely and easily accessible. Without the required supporting information, citizens will continue to make assumptions or believe misinformation and disinformation that are not evidence-based. Their spread may be unavoidable. But the lack of access to quality data is not.

Nompumelelo Mtsweni, data visualisation developer, Elizabeth Cornelia Greyling, strategy manager at Columbus Stainless, and Emmanuel A. Simon, digital strategic consultant, also contributed to this article.

Herkulaas MvE Combrink is affiliated with the Interdisciplinary Centre For Digital Futures at the University of the Free State.

Benjamin T H Smart receives funding from The National Research Foundation.

Scott Hazelhurst receives funding from the National Institutes of Health and GlaxoSmithKline Research & Development. The opinions expressed are his responsibility and not of the funders.

Vukosi Marivate receives funding from ABSA and National Research Foundation. He is affiliated with Deep Learning Indaba and Masakhane NLP organisations.

By Herkulaas MvE Combrink, Lecturer, University of the Free State;

Benjamin T H Smart, Associate Professor, University of Johannesburg;

Scott Hazelhurst, Professor of Bioinformatics, University of the Witwatersrand; and

Vukosi Marivate, Chair of Data Science, University of Pretoria