ARTICLE OWN VOICE, OWN WORDS.
When you first heard about COVID-19, your reaction was likely “Am I going to get sick and/or DIE from this thing?” Followed closely by: “Is someone I CARE ABOUT going to get sick and/or DIE from this thing?” Followed, with some delay, by financial, economic, social, and cultural concerns. All that is completely understandable. Less understandable is why you can’t get answers to your first question.
In March, when the media were reporting that the “death rate” in Italy was running at 9%-10%, people became very concerned. Death statistics from the 1918 flu pandemic only fanned the flames. Fear and uncertainty started to take root. And the closed Case Fatality Rate in the U.S. was up to 69% before data on patient recoveries started coming in; it has now dropped to 37%, and is still trending down.
So how worried should we be, given that we are in the Bay Area?
Local county data are incomplete, but we estimate that if you are hospitalized with COVID-19 you face between a 3.3% and 5.5% risk of mortality. However, disease severity varies greatly by age: patients over 60 face at least 8.8% mortality, while the rest face risks of over 1.2%. There are also important considerations related to adequate distancing, and to intubation, so please read on…
Two weeks ago when I looked at the COVID dashboard for the San Mateo County (SMC) Dept. of Public Health, it made me weep. Not because the numbers of cases and deaths were scary (at least not Italy or New York scary); but because it left me with more questions than answers. Since then the site has gotten steadily better, in terms of data completeness and ease of understanding with good graphics.
The current SMC COVID-19 dashboard is a good start, but there are some important improvements that could increase its value to the public. More information, even if not all of it is good news, can calm fears and guide people to act to reduce their personal risk, and the overall community risk.
Ideally, a county’s published COVID-19 data should address three broad areas of public concern:
1. How is the disease advancing, locally? Are non-therapeutic mechanisms (stay at home, physical distancing, face covering, hygiene) working to limit the spread? Are we flattening the curve, or seeing exponential growth? To answer this, the site should display:
– time series for all statistics (so we can see trends)
– onset and outcomes by age group (tests, pos. & neg; hospitalizations, acute and ICU; recoveries; deaths)
– cumulative per capita ratios, e.g., cases per thousand residents
– comparisons to other localities (in the Bay Area, state-wide), e.g. Sonoma Co.’s cases graph
2. Will the growing burden on hospitals, health professionals and other Public Health resources overwhelm the treatment system?
– Capacity utilization over time (acute beds, surge beds, ICU beds, ventilators)
– Is the trend suggesting we are in danger of overwhelming the system?
3. More selfishly, what is the personal risk to each of us, of infection and death? What can the statistics for testing, confirmed cases, recoveries, and deaths tell us about our chances, given our age, sex, and pre-existing conditions? How is that risk changing over time?
– This is why it’s important to have time series by demographics, and we need:
– Estimates by age group for the Closed Case Fatality Rate (cCFR)
– Estimates by age group for Infection Fatality Rate (IFR) – this is the percentage of all infections (both diagnosed and undiagnosed) resulting in death. This is much harder to calculate, because without widespread testing before and after (with antibody tests) we don’t know the true rate of infection in the population.
Death Rates explained
What is the real underlying mortality rate for COVID-19 victims here, and why does it appear to be so different from region to region? Some variations arise because there are different metrics that can be calculated from the numbers for testing, cases, deaths, and recoveries. Each of these metrics has advantages, but all are just estimates of the true mortality rate in the underlying epidemic process. They each have inaccuracies due to limitations of observability, measurement and reporting – and you’ve probably heard there are inaccuracies in reported deaths, cases, tests given, and recoveries reported. But we do the best we can, and hope that as we get larger numbers and more practice, we’ll get more meaningful statistics.
One estimate the press often reports for the underlying mortality rate is the simple ratio of cumulative confirmed deaths to cumulative confirmed cases, because this data is more readily available. This number is referred to by epidemiologists as the naïve Case Fatality Rate (nCFR).
naïve CFR = (Confirmed Cumulative Deaths)/(Confirmed Cumulative Cases)
When more complete data on patient outcomes are available (the number of recovered, as well as deaths), doctors and hospitals calculate the Closed Case Fatality Rate (cCFR):
closed CFR = Deaths/(Deaths+Recoveries).
For example, in Sonoma County, as of 8 Apr, there were 123 confirmed cases, and only 1 death. So the naïve CFR at that point was 0.81%, and the closed CFR was 2.27%. With only one death, and under 130 cases, it’s dangerous to conclude much about the underlying process. For SMC, because we lack data on recoveries, we can only calculate the naïve Case Fatality Rate, which as of 6 Apr, was 3.57%.
So what is wrong with using the naïve CFR as an estimate of the underlying mortality rate? Well, the naïve CFR has multiple sources of error. It may turn out to be higher or lower than the closed CFR once all patient outcomes are recorded. Here are some sources of error:
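To make the two definitions concrete, here is a minimal Python sketch that applies them to the Sonoma County figures quoted in this article (123 confirmed cases, 43 recovered, 1 death). The function and variable names are our own illustrative choices.

```python
def naive_cfr(deaths: int, cases: int) -> float:
    """Cumulative confirmed deaths divided by cumulative confirmed cases."""
    return deaths / cases


def closed_cfr(deaths: int, recoveries: int) -> float:
    """Deaths divided by all closed cases (deaths plus recoveries)."""
    return deaths / (deaths + recoveries)


# Sonoma County figures quoted in this article (8 Apr 2020).
sonoma_cases, sonoma_recovered, sonoma_deaths = 123, 43, 1

sonoma_ncfr = naive_cfr(sonoma_deaths, sonoma_cases)
sonoma_ccfr = closed_cfr(sonoma_deaths, sonoma_recovered)

print(f"naive CFR:  {sonoma_ncfr:.2%}")   # 0.81%
print(f"closed CFR: {sonoma_ccfr:.2%}")   # 2.27%
```

The closed CFR only needs deaths and recoveries, which is why it cannot be computed for counties that do not publish recovery counts.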
– The numerator only takes into account Confirmed Deaths. This may sound ok, but that number tends to be an underestimate, because of the reporting challenge, esp. if someone dies outside a hospital, but even if in hospital. Did the patient die from organ failure, or from COVID-19? What gets recorded? Were test kits used to test a fatality, or was there a test kit shortage requiring them to be conserved for the living?
– The denominator only takes into account Confirmed Cases. If a significant proportion of those infected remain asymptomatic and recover without ever being tested, they are never counted as cases, and the naïve CFR will be inaccurately high. (Those who test positive but choose to recover at home often won’t be reported as recovered, either.) Recently the CDC estimated that the proportion of asymptomatic infections is in the range of 25% – 40%. Also, as a local data point, Sonoma Co. is showing that (as of 2:30pm, Wed, 8 Apr) of 123 confirmed cases, 20 were hospitalized, and 103 were not.
– The denominator counts cases that were diagnosed too recently to have contributed to the deaths in the numerator, and it ignores that different fatalities have different time courses from diagnosis to death. Imagine for a minute that the disease duration is identical – from the day of testing positive, to the day of death – for all patients who die. Then the patients who die on day N would all have tested positive exactly T days earlier. So if we use the cumulative case count on day N in the denominator, we are using an inflated number of cases, some of which were diagnosed after the group of patients to which the day N deaths belong. When daily new cases are rising sharply, the denominator runs ahead of the deaths; this can introduce a big error, and the naïve CFR will be much lower than the true cCFR.
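The timing error in the last point can be illustrated with a toy simulation. Everything in this Python sketch is hypothetical: the true fatality rate, the fixed delay T, and the daily growth factor are made-up numbers chosen only to show the direction of the bias, not estimates for any county.

```python
def cumulative_cases(day: int, first_day_cases: float, daily_growth: float) -> float:
    """Cumulative confirmed cases through `day`, with new cases growing geometrically."""
    return sum(first_day_cases * daily_growth ** t for t in range(day + 1))


TRUE_RATE = 0.05   # hypothetical true fatality rate among confirmed cases
LAG_DAYS = 8       # hypothetical fixed delay T from positive test to death
GROWTH = 1.15      # hypothetical 15% daily growth in new cases
DAY = 30           # the day on which we compute the statistics

cases_today = cumulative_cases(DAY, 10, GROWTH)
cases_lagged = cumulative_cases(DAY - LAG_DAYS, 10, GROWTH)

# Only cases diagnosed at least T days ago have had time to end in death.
deaths_today = TRUE_RATE * cases_lagged

naive = deaths_today / cases_today          # falls well below TRUE_RATE
lag_corrected = deaths_today / cases_lagged  # recovers TRUE_RATE in this toy model

print(f"true rate:     {TRUE_RATE:.1%}")
print(f"naive CFR:     {naive:.1%}")
print(f"lag-corrected: {lag_corrected:.1%}")
```

In this idealized setup the lag-corrected ratio recovers the true rate exactly, because every death occurs exactly T days after diagnosis. Real disease courses vary, which is why an average over a window of lags is used instead of a single lag.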
Of course in reality people’s disease courses are all different. Ideally we would know the distribution of those courses, and could then apply an appropriate weighted average of case counts across prior days. But we don’t know the real distribution, so we propose a simple average of cases taken over a particular lagged window of prior days. Averaging over multiple days also helps to smooth the sometimes lumpy reporting of new cases. (For example: on 30 Mar, Santa Clara Co. (SCC) reported 202 new confirmed cases, with the note that these were confirmed at unknown points on the three days from 28-30 Mar.)
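We’re using here a lagged 4-day average in the denominator, averaging the cumulative Cases from 7, 8, 9, and 10 days prior. In the notation used above, the estimate for the cCFR as of day d is:
estimated cCFR(d) = (Cumulative Deaths on day d)/(average of Cumulative Cases on days d-7, d-8, d-9, and d-10)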
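The lagged average described here can be sketched in Python as follows. This is an illustration of the calculation as we describe it (cumulative deaths to date over the mean of cumulative cases 7 to 10 days earlier), not a copy of our spreadsheet; the function name and the sample series are hypothetical.

```python
from statistics import mean


def lagged_cfr(cum_deaths, cum_cases, day, lags=(7, 8, 9, 10)):
    """Cumulative deaths on `day` over the mean cumulative case count `lags` days earlier."""
    if day < max(lags):
        raise ValueError("not enough history for the lag window")
    lagged_cases = mean(cum_cases[day - lag] for lag in lags)
    return cum_deaths[day] / lagged_cases


# Hypothetical cumulative series, indexed by day 0..10 (not real county data).
sample_cases = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
sample_deaths = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

estimate = lagged_cfr(sample_deaths, sample_cases, day=10)
naive = sample_deaths[10] / sample_cases[10]

print(f"lagged estimate: {estimate:.1%}")  # 1 death / mean(40, 30, 20, 10) cases
print(f"naive CFR:       {naive:.1%}")     # 1 death / 110 cases
```

Averaging over four lagged days rather than one is a simple stand-in for the unknown distribution of disease courses, and it also smooths lumpy daily reporting.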
And the closed Case Fatality Rate also has problems. First, the same errors can exist in the count of fatalities. Second, it also only includes confirmed cases, and the true number is likely much larger. Third, the number of recoveries could be lagging in our collection of data. Bad news travels faster.
So what does this new cCFR calculation tell us about the mortality rate? Here is a graph of the case data from Santa Clara County (SCC) for the last month (where full time series data are available), showing the naïve CFR and our smoothed estimate for the cCFR using the lagged 7-10 day case average: … And here are similar data for the state of California. Note the recent surge in negative test results, as a slug of unprocessed tests was finished and as testing began to ramp up here.
Analysis of Findings
The smoothed and lagged estimate takes a while to settle down, due to the initial small sample size. The SCC naïve CFR as of 8 Apr is 3.3%. This is probably a low estimate of the closed CFR, while our current lagged estimate of 5.5% is probably a high estimate. Thus, the current underlying mortality rate in SCC is likely to be between 3.3% and 5.5%. In California as a whole, the nCFR is currently 2.6% and the cCFR is 7.3%. But remember these ranges are estimates, for people of all age groups, who were sick enough to need to go to the hospital. And there are marked variations in fatality by age group.
Below is a chart that illustrates the big difference in outcomes for different age groups that has been observed in SCC. As of 6 Apr 2020, patients over age 60 accounted for 28.4% of cases, but 74.4% of deaths. Conversely, patients age 60 and under accounted for 70.7% of cases, but only 25.6% of deaths. Put another way, patients over age 60 in the first month of the epidemic have a naïve CFR of 8.8%, while those 60 and under have a naïve CFR of just 1.2%. So the danger for older folks in the Bay Area is significantly greater, as was the case in China and Italy.
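The age comparison above follows from the shares alone: a group's naïve CFR equals the overall naïve CFR multiplied by (the group's share of deaths)/(the group's share of cases). A short Python sketch of that arithmetic, using the SCC shares quoted above; the function name is our own, and the result is a relative risk, not an absolute rate.

```python
def relative_cfr(case_share: float, death_share: float) -> float:
    """A group's naive CFR expressed as a multiple of the overall naive CFR."""
    return death_share / case_share


# Shares reported for Santa Clara County as of 6 Apr 2020.
over_60 = relative_cfr(case_share=0.284, death_share=0.744)
under_60 = relative_cfr(case_share=0.707, death_share=0.256)

# How many times higher the naive CFR is for the over-60 group.
risk_ratio = over_60 / under_60

print(f"over 60:      {over_60:.2f}x the overall naive CFR")
print(f"60 and under: {under_60:.2f}x the overall naive CFR")
print(f"ratio between the groups: about {risk_ratio:.1f}")
```

The ratio comes out at roughly seven, in line with the gap between the 8.8% and 1.2% figures.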
So What? – or – Implications:
Going back to the important question, “What are the chances I’ll die from this?”
Answer: they’re much better if you stay out of the hospital. Internal NYC hospital memos document “… the likely mortality is 50 to 80 percent for patients needing intubation, especially those with pre-existing lung or cardiac comorbidities.” There is also recent evidence that intubation may not be the best treatment for all oxygen-depleted patients.
What To Do:
- Stay home so we can get this pandemic over sooner;
- Wear a mask when making essential trips to enclosed spaces;
- Practice physical distancing (don’t believe that cough droplets only travel less than 6 ft; the distance can be over 25 feet, per an MIT study; see links below);
- Have a decontamination procedure for bringing groceries, sundries, yourself, your clothes and your shoes – back into your home.
- Last thought: if you do test positive, and have to go into the hospital, remember, this isn’t Italy. In the Bay Area at present, we DO have sufficient beds and ventilators because of the good work all of us have been doing so far. So, think positive if you test positive.
Why don’t the published COVID-19 data for San Mateo County, and most of the other Bay Area counties, include testing or recovery stats? One can debate whether the smoothing technique we use herein is the best, but without data on recoveries, how are we ever really going to know the mortality rate?
Bruce and Gregg were housemates at MIT, and know just enough statistics to be dangerous.
Technical Explanation, Footnotes, and Sources:
- Marin: https://www.marinmommies.com/keep-track-covid-19-county-marin-hhs
- Contra Costa: https://www.coronavirus.cchealth.org/
- plus Dashboard: https://www.coronavirus.cchealth.org/dashboard
- Alameda County: http://www.acphd.org/2019-ncov.aspx
- San Francisco: https://data.sfgov.org/stories/s/San-Francisco-COVID-19-Data-Tracker/fjki-2fab/
- San Mateo: https://www.smchealth.org/post/san-mateo-county-covid-19-data-1
- Santa Clara: https://www.sccgov.org/sites/phd/DiseaseInformation/novel-coronavirus/Pages/dashboard.aspx
- The Following Counties were NOT part of the initial shelter in place:
- Sonoma: https://en.wikipedia.org/wiki/Portal:San_Francisco_Bay_Area
- also: https://norcalpublicmedia.org/track-sonoma-county-coronavirus-cases-and-test-results
- Solano: http://solanocounty.com/depts/ph/coronavirus.asp
- and https://doitgis.maps.arcgis.com/apps/opsdashboard/index.html#/6c83d8b0a564467a829bfa875e7437d8
- Napa: https://www.countyofnapa.org/2739/Coronavirus
- and https://legacy.livestories.com/s/v2/coronavirus-report-for-napa-county-ca/9065d62d-f5a6-445f-b2a9-b7cf30b846dd/
- California: https://norcalpublicmedia.org/2020040642477/news-feed/california-covid-19-tracker-gives-county-by-county-look-at-cases-deaths
There have been various estimates for T, including 7 days, 14 days, and even higher numbers from WHO. Recently, WHO reported that the time between symptom onset and death ranged from about 2 weeks to 8 weeks. And, of course, all these estimates come with a probability distribution of numbers and the confidence thereof. See:
Given the average incubation period for the virus, which is 5 to 7 days, testing could not detect the virus until then. Subtracting that period from an estimated disease course of 2 weeks, or 14 days, leaves 7 to 9 days for a detected case to progress to mortality. We’re using a tenth day to account for the few U.S. cases which take longer.
How we calculate these two metrics for Sonoma Co. The data as of 3:30pm, Wed, 8 April 2020:
- Confirmed Cases: 123
- Active: 79
- Recovered: 43
- Deaths: 1
which gives:
- Naïve CFR: 0.81% (1/123)
- Closed CFR: 2.27% (1/(43+1))
For comparison, what is the Naïve CFR for San Mateo Co.? Data last updated Mon, 6 Apr 2020. Last checked 3:40pm, Wed, 8 Apr. The data:
- Confirmed Cases: 589
- Active: ??
- Recovered: ??
- Deaths: 13
which gives:
- Naïve CFR: 2.21% (13/589)
- JAMA Intern Med. 2020. doi: 10.1001/jamainternmed.2020.0994. [Epub ahead of print]; https://bit.ly/2V3vFTZ;
- N Engl J Med. 2020. doi: 10.1056/NEJMoa2002032. [Epub ahead of print]; https://bit.ly/3bDobxu;
- China CDC Weekly. 2020;2:113; https://bit.ly/2wHYAVv;
- N Engl J Med. 2020. doi: 10.1056/NEJMoa2004500. [Epub ahead of print]; https://bit.ly/2WT5LES;
- JAMA. 2020. doi: 10.1001/jama.2020.4326. [Epub ahead of print]; https://bit.ly/2UO0EDm.
Note that these studies are being released before full peer review, because of the critical need for information to fight COVID-19.
MIT Prof. Lydia Bourouiba, JAMA article (w/video of cough cloud) “Turbulent Gas Clouds and Respiratory Pathogen Emissions: Potential Implications for Reducing Transmission of COVID-19”
Video from NHK World, Japan. “Beautiful demonstration of micro-droplet i.e. airborne virus : Coronavirus.” https://www.reddit.com/r/Coronavirus/comments/fu7c1u/beautiful_demonstration_of_microdroplet_ie/?utm_source=share&utm_medium=web2x