Data discrepancy

A couple of studies use naturally occurring control groups, namely the passengers aboard the Diamond Princess and people flown out of Wuhan on international repatriation flights, together with COVID-19 mortality data to calculate the Infection Fatality Ratio, and with it work backwards to the true caseload in a country, which turns out to be far higher than the official figures put out by governments.

Published : Apr 28, 2020 07:00 IST

The Diamond Princess cruise ship, in which dozens of passengers tested positive for COVID-19, arrives in Yokohama near Tokyo, on February 16.

The lack of knowledge about the true number of COVID-19 cases in any country or region seems to be the elephant in the room in the ongoing pandemic. The severity of any epidemic has two aspects: first, how infectious or contagious the causative pathogen (in this case, the coronavirus SARS-CoV-2) is; and second, how dangerous it is, that is, what proportion of a cohort of COVID-19 patients the disease will kill or, equivalently, how likely it is that a person who becomes infected with the virus will eventually die.

A measure of the first is a parameter called the Basic Reproduction Number, denoted R0 (pronounced “R-nought”), which is the average number of people to whom an infected person is likely to transmit the infection. In a new and growing epidemic, R0 is greater than 1; that is, one infected person is likely to infect more than one other, and each of these secondary cases will in turn go on to infect R0 others, leading to exponential growth in the number of infected persons.

The reproduction number is not fixed, however. In practice, its effective value (often written Rt) is dynamic and can be brought down by reducing the chances of person-to-person transmission through measures such as physical distancing, hygienic practices like regular hand-washing, isolation of infected persons and self-quarantine. Once the reproduction number falls below 1, the spread of infection will decline and eventually die out. Scientists estimate these quantities by constructing models of how infection is transmitted in a given social setting.
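The effect of the reproduction number on an outbreak can be illustrated with a minimal sketch that tracks the expected number of new cases in each generation of transmission. It assumes every case infects, on average, exactly R others and ignores the depletion of susceptible people; the values R = 2.5 and R = 0.8 are hypothetical, chosen only for illustration, and are not estimates for SARS-CoV-2.

```python
def generation_sizes(r: float, initial: int, generations: int) -> list[float]:
    """Expected new infections per transmission generation, assuming each
    case infects on average r others (the mean of a simple branching process)."""
    sizes = [float(initial)]
    for _ in range(generations):
        sizes.append(sizes[-1] * r)  # every case in this generation seeds r new cases
    return sizes


growing = generation_sizes(2.5, 1, 5)      # hypothetical R above 1
shrinking = generation_sizes(0.8, 100, 5)  # hypothetical R below 1

for g, (a, b) in enumerate(zip(growing, shrinking)):
    print(f"generation {g}: R=2.5 -> {a:8.1f}   R=0.8 -> {b:6.1f}")
```

With R above 1 each generation is larger than the last; with R below 1 each is smaller, so the outbreak fades out.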

The second aspect is intrinsic to the pathogen when there is no treatment for the disease and no innate immunity to it in populations (as with any new epidemic or pandemic such as COVID-19), and it can be taken as roughly constant across populations, provided country-to-country differences in health care are not very significant. A natural measure of it is the number of deaths that occur for a given number of infections. This is called the Case Fatality Ratio or Rate (CFR): the number of deaths divided by the total number of cases, generally given as a percentage. Since the severity of the disease has been found to depend on age, with the elderly more susceptible, an age-stratified CFR is of greater epidemiological significance. CFR will also be higher where the health care system is relatively poor, and lower where it is better.
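The arithmetic of a fatality ratio, and why an age-stratified figure tells more than a single pooled one, can be shown in a short sketch. All the counts below are hypothetical, invented purely for illustration; the helper name `fatality_ratio_pct` is likewise an assumption of this example.

```python
# Hypothetical (deaths, cases) per age band; illustrative numbers only.
cases_by_age = {
    "0-39": (2, 4000),
    "40-69": (30, 3000),
    "70+": (120, 1000),
}


def fatality_ratio_pct(deaths: int, cases: int) -> float:
    """Deaths divided by cases, expressed as a percentage."""
    return 100.0 * deaths / cases


# Age-specific ratios differ by orders of magnitude...
for band, (d, c) in cases_by_age.items():
    print(f"{band:>6}: {fatality_ratio_pct(d, c):.2f}%")

# ...while the pooled ratio hides that variation behind one number.
total_deaths = sum(d for d, _ in cases_by_age.values())
total_cases = sum(c for _, c in cases_by_age.values())
print(f"overall: {fatality_ratio_pct(total_deaths, total_cases):.2f}%")
```

The pooled figure depends on how cases are spread across age bands, which is why it can differ between populations even when the age-specific ratios are the same.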

As mentioned earlier, the major epidemiological problem in any growing epidemic is that one does not have a handle on the denominator, the true caseload. Early in an epidemic, the more severe cases tend to be detected first, so the CFR will be high; it declines as less severe cases also begin to be captured by the diagnostic testing network of the local health care system. But even with expanded testing, many cases could go undetected, because people with mild or no symptoms are unlikely to turn up at hospitals and clinics for testing. That asymptomatic cases occur, and that they transmit the infection, is now an accepted fact.

While earlier evidence may have been clouded by confounding factors, there is now clear evidence from data on the people aboard the cruise ship Diamond Princess and, more recently, the aircraft carrier USS Theodore Roosevelt. These captive populations were equivalent to control groups in which everyone was tested for SARS-CoV-2 irrespective of symptoms, and these fortuitous control groups have revealed significant numbers of asymptomatic cases.

So the naive or crude CFR does not reflect the true situation on the ground. How close the detected number of cases is to the true caseload depends on the adequacy of the testing strategy, in particular the criteria adopted for testing the population, that is, how completely it picks up infections, including undiagnosed and asymptomatic cases.

In such a situation, the criteria for testing need to be much broader than they are at present in many countries, notably India. The narrower a country’s testing strategy, the lower its Detection Rate (DR) and, consequently, the higher its crude CFR (the number of deaths divided by the detected, rather than the actual, number of cases).
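The link between crude CFR, DR and the fatality ratio among all infections (the IFR) can be written out as a short derivation. The symbols D (deaths), C (detected cases) and N (true infections) are introduced here for illustration, and the identity assumes that all three counts refer to the same cohort of infections; in practice deaths lag behind cases by a couple of weeks, so real calculations have to line up the dates.

```latex
\begin{aligned}
\mathrm{IFR} &= \frac{D}{N}, \qquad \mathrm{DR} = \frac{C}{N}, \\[4pt]
\text{crude CFR} &= \frac{D}{C} = \frac{D/N}{C/N} = \frac{\mathrm{IFR}}{\mathrm{DR}}.
\end{aligned}
```

For instance, with an IFR of 1 per cent and a DR of 10 per cent, the crude CFR would be 10 per cent; halving the DR doubles the crude CFR.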

Infection fatality ratio

Although the sample sizes of such naturally occurring control groups are relatively small, the number of infections in them is accurately known and the infected can be closely monitored, so they can provide a fairly good estimate of the real fatality ratio, called the Infection Fatality Ratio (IFR), which is a true measure of the intrinsic severity of the pathogen. So if one knows the exact number of COVID-19 deaths, which is likely in most contexts, the IFR determined from control groups can be used to estimate the true number of infections in real-world situations.

Strictly, the number of deaths at a given time reflects the total number of infections at an earlier date, because there is a lag between a person testing positive for COVID-19 and death, which has been found to be a little over two weeks.
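The back-calculation itself is a single division, N = D / IFR. The sketch below expresses it as a small function; the name `estimate_infections` and the example figures (200 deaths, an IFR of 0.5 per cent) are hypothetical, chosen only to show the arithmetic.

```python
def estimate_infections(deaths: int, ifr: float) -> float:
    """Estimated true infections, N = D / IFR, with the IFR given as a
    fraction (0.005 means 0.5 per cent).

    The estimate refers to an earlier date than the deaths, because of the
    interval between testing positive and dying."""
    if not 0 < ifr < 1:
        raise ValueError("IFR must be a fraction between 0 and 1")
    return deaths / ifr


# Hypothetical example: 200 deaths at an IFR of 0.5 per cent.
print(estimate_infections(200, 0.005))
```

Because the IFR sits in the denominator, a small error in it translates into a proportionally large error in the estimated caseload, which is why the reliability of the IFR matters so much.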

The question, however, is whether such an IFR is good enough to be applied across populations and typical health care settings to work backwards from the number of deaths to the true caseload.

Settings such as the Diamond Princess and the USS Theodore Roosevelt are unlikely to reflect real-world conditions, because the people on board probably received better-than-normal health care, both on board and afterwards.

However, a recent study by a British team of scientists led by Robert Verity, published in the journal Lancet Infectious Diseases, has provided (age-stratified) data from another similar control group in which everyone was tested for the virus: people flown out of Wuhan, the Ground Zero of COVID-19, on international repatriation flights. They were all tested for the virus on arrival, and those who tested positive were quarantined and their condition monitored.

There were six such repatriation flights, and by combining age-stratified infection and disease data from mainland China with data on the people on these flights, the group presented age-specific IFRs for the first time. Combining data from mainland China and elsewhere, the study also estimated the average time from the onset of COVID-19 symptoms to death at approximately 18 days.

Arguing that the age-structured IFR data of Robert Verity and his team are likely to be more reliable than crude CFRs, and that they can be treated as benchmarks for projecting to other settings after country-specific demographic corrections, the researchers Sebastian Vollmer and Christian Bommer of Gottingen University have set out a method for estimating the total caseload two weeks before a given date from the mortality data on that date. They allowed an average of four days to pass from the onset of symptoms to testing.
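One way to read the two-week back-dating is as the 18-day onset-to-death interval minus the four-day onset-to-test allowance. That reading is an assumption of this sketch, as are the helper names `lag_days` and `infection_reference_date`; the example dates are the ones used in this article.

```python
from datetime import date, timedelta

ONSET_TO_DEATH_DAYS = 18  # approximate average reported by the British study
ONSET_TO_TEST_DAYS = 4    # allowance used by the Gottingen researchers


def lag_days(onset_to_death: int = ONSET_TO_DEATH_DAYS,
             onset_to_test: int = ONSET_TO_TEST_DAYS) -> int:
    """Assumed days between testing positive and death."""
    return onset_to_death - onset_to_test


def infection_reference_date(death_date: date) -> date:
    """Date to which infections inferred from deaths on death_date are assigned."""
    return death_date - timedelta(days=lag_days())


print(lag_days())                                   # 14
print(infection_reference_date(date(2020, 4, 14)))  # 2020-03-31
print(infection_reference_date(date(2020, 4, 19)))  # 2020-04-05
```

Keeping the two intervals as separate parameters makes it easy to see how the back-dated date would shift if either average were revised.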

The Gottingen University researchers used the demographic data for each country from the United Nations population database to yield country-specific IFRs. 

This country-specific IFR is calculated as a weighted sum of the age-stratified IFRs from the British study, the weights being the shares of each age stratum in the country’s population as given in the U.N. data. In this manner, they worked out IFRs for 40 of the most-affected countries, including India.
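The weighting step can be sketched as a population-weighted average. The three age bands, their IFRs and both sets of population shares below are hypothetical, not the values used by either study (which use finer age bands); the function name `country_ifr` is also an assumption of this example.

```python
# Hypothetical age-specific IFRs, as fractions (0.04 means 4 per cent).
AGE_IFR = {"0-29": 0.0002, "30-59": 0.003, "60+": 0.04}


def country_ifr(age_ifr: dict[str, float], pop_shares: dict[str, float]) -> float:
    """Weighted sum of age-specific IFRs, weighted by each band's share of the population."""
    if abs(sum(pop_shares.values()) - 1.0) > 1e-6:
        raise ValueError("population shares must sum to 1")
    return sum(age_ifr[band] * share for band, share in pop_shares.items())


# Two hypothetical populations with the same age-specific IFRs.
younger = {"0-29": 0.55, "30-59": 0.35, "60+": 0.10}
older = {"0-29": 0.30, "30-59": 0.40, "60+": 0.30}

print(f"younger population: {country_ifr(AGE_IFR, younger):.3%}")
print(f"older population:   {country_ifr(AGE_IFR, older):.3%}")
```

The same age-specific IFRs give an overall IFR more than twice as high for the older population, which is the mechanism behind the spread of country-level IFRs.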

For example, dividing a country’s COVID-19 death toll on April 14 by its IFR yields the real number of infections on March 31. Their findings show that the vast majority of infections go undetected in most countries. According to them, on average, the official numbers of confirmed cases put out by governments on a given date represent less than a tenth of the actual number of infections.

For China as a whole, the IFR determined by the British team was 0.66 per cent. Based on China’s age-stratified data, the IFR projected in the Gottingen University study for India as a whole is 0.41 per cent.

The corresponding IFRs are: Germany 1.3 per cent, Iran 0.43 per cent, Italy 1.38 per cent, Japan 1.6 per cent, Pakistan 0.29 per cent, Spain 1.21 per cent, South Korea 0.96 per cent, the U.K. 1.09 per cent and the U.S. 0.96 per cent. The wide differences in these overall (unstratified) country-specific IFRs reflect the very different demographic structures of the countries.

For India, this means that, given a COVID-19 death toll of 507 on April 19, the total number of infections two weeks earlier, on April 5, was 1,23,659 (507 divided by 0.41 per cent), against the official figure of 3,577. Official figures on that day thus represented only about 2.9 per cent, or 1/35th, of the real caseload. That is, the DR of infections was only about 2.9 per cent, indicative of grossly inadequate testing. This was, in fact, a significant improvement over the DR the researchers had found when they wrote their paper.
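The India figures above can be reproduced in a few lines. The inputs are the numbers quoted in this article; the variable names are illustrative, and the calculation simply applies N = D / IFR followed by DR = confirmed cases / N.

```python
deaths_apr19 = 507       # India's COVID-19 death toll on April 19
ifr_india = 0.0041       # 0.41 per cent, the Gottingen estimate for India
confirmed_apr5 = 3577    # official confirmed cases on April 5

# Deaths on April 19 are attributed to infections two weeks earlier, on April 5.
true_infections_apr5 = deaths_apr19 / ifr_india
detection_rate = confirmed_apr5 / true_infections_apr5

print(f"estimated infections on April 5: {true_infections_apr5:,.0f}")
print(f"detection rate: {detection_rate:.1%}")
print(f"about 1 in {true_infections_apr5 / confirmed_apr5:.0f} infections detected")
```

Python's `,` format specifier groups digits in thousands (123,659), whereas the article uses Indian digit grouping (1,23,659); the number is the same.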

The Gottingen University researchers published their work on April 6, calculating country-specific infection figures for March 17 from mortality figures of March 30. They arrived at country-specific DRs for March 17 by dividing the officially confirmed cases on that date by the real number of infections they had calculated for it using the respective IFRs. According to those figures (Table 1), India’s DR on March 17 was only 1.68 per cent.

The corresponding DRs were: Germany 15.8 per cent, Iran 2.4 per cent, Italy 3.5 per cent, Japan 25.16 per cent, Pakistan 2.65 per cent, Spain 1.7 per cent, South Korea 49.47 per cent, the U.K. 1.2 per cent and the U.S. 1.6 per cent. According to the researchers, these huge differences in DRs may explain the vast differences in crude CFRs across countries, in particular the low CFR of South Korea, which corresponds to its significantly high DR.

Of course, DRs are bound to change as each country responds to the progression of the epidemic. A country’s DR could drop if its testing network cannot keep pace with the growing number of infections, or it could rise if testing is ramped up or if the country has crossed the peak of the epidemic and infections have begun to decline. Accordingly, the Gottingen University team updated its work on April 14, calculating country-specific DRs as on March 30.

Interestingly, India’s DR on March 30, based on a death toll of 358 on April 13, had dropped to 1.45 per cent from the earlier 1.68. By comparison, the DRs on March 30 were: Germany 27.32 per cent, Iran 3.9 per cent, Italy 6.87 per cent, Japan 24.34 per cent, Pakistan 5.39 per cent, Spain 5.98 per cent, South Korea 42.88 per cent, the United Kingdom 2.15 per cent and the United States 6.59 per cent (Table 1). The subsequent rise in India’s DR to 2.9 per cent indicates that testing has been ramped up in the country, but the rate is still quite low.
