Predicting Emerging COVID-19 Hotspots...Without Asking

Predicting catastrophic events has been an ongoing obsession for big-data scientists. However, the prediction of emerging COVID-19 hotspots has been a struggle during this current pandemic.

The ability to accurately pinpoint areas where new cases of COVID-19 are rising would help to allocate resources and shape local policy more effectively.

Most attempts made at monitoring COVID symptoms in the population have been conducted via online surveys or ‘apps.’ While these are tried and tested methods used to gather information, we know they have many inherent biases and flaws. People still need to “choose” to answer a survey… and many don’t. We also know people often give false information for a variety of reasons.

With all these deficiencies in survey questionnaires, how can we predict where to expect the next big COVID outbreak?

Simple, don’t ask.

In his book “Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are,” Seth Stephens-Davidowitz uses Google Trends to dive deep into the human psyche; he attempts to discover what we really think and who we really are by looking at what people divulge through their Google searches. Dr. Stephens-Davidowitz further elaborates on this method in his New York Times article “Google Searches Can Help Us Find Emerging COVID-19 Outbreaks.”

Stephens-Davidowitz looked at Google searches such as “I can’t smell” in order to predict case prevalence in the USA and in other countries, such as Ecuador, where testing may be less prevalent.

Inspired by his work, I decided to use Google Trends to look at some pretty standard questions the non-medical population may ask when experiencing early COVID symptoms.

I decided to limit my search to the United States for the last seven days.

In the tables below, the numbers represent how frequently that specific question was searched relative to all other searches.
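As a rough illustration of what “relative” means here, the sketch below turns raw counts into a 0-100 index in which the region with the highest share of searches scores 100. This is only my understanding of how Google Trends presents its numbers, not its actual implementation; the function name `relative_interest` is hypothetical, and every count is invented purely for the example.

```python
def relative_interest(term_searches, total_searches):
    """Scale each region's share of searches for a term to a 0-100 index.

    Assumption: the region with the largest share is set to 100 and the
    others are scaled against it.
    """
    shares = {
        region: term_searches[region] / total_searches[region]
        for region in term_searches
    }
    top = max(shares.values())
    return {region: round(100 * share / top) for region, share in shares.items()}


# Hypothetical counts, invented purely for illustration (not real data).
term_searches = {"Region A": 50, "Region B": 30, "Region C": 10}
total_searches = {"Region A": 10_000, "Region B": 3_000, "Region C": 4_000}

scores = relative_interest(term_searches, total_searches)
print(scores)
```

In this made-up example, Region A has the most raw searches for the question, yet Region B comes out on top because the question makes up a larger share of everything searched there.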

My first search: “Do I have Corona?”

From there, I decided to cross-reference a few other searches to try and get a more accurate picture.

My next search: “Should I get tested for Corona?”

Finally, I added a search for one of the most common symptoms of COVID-19: a dry cough.

Cross-referencing all three searches resulted in the following:

Interestingly, searches for getting tested had very little influence on the data, and other permutations of the search did not have enough hits to produce any significant data points.

Looking at the above list, I would presume Rhode Island, Delaware, and Massachusetts to be likely candidates as emerging COVID-19 hotspots.

Substituting “should I get tested for Corona” with “lost smell” produced the following table:

Given the above, it would appear that layering our searches might give us more useful information than looking at just one factor at a time.
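One possible way to layer several searches is to average each region’s relative score across the search terms and rank the result. The sketch below does exactly that. Averaging is my own assumption for illustration, not necessarily how the tables in this article were produced; the function name `layered_ranking`, the region names, and all scores are hypothetical.

```python
def layered_ranking(scores_by_term):
    """Rank regions by their average score across all search terms.

    Assumption: a region with no data for a term counts as 0 for that term.
    """
    regions = set()
    for scores in scores_by_term.values():
        regions.update(scores)

    n_terms = len(scores_by_term)
    combined = {
        region: sum(scores.get(region, 0) for scores in scores_by_term.values()) / n_terms
        for region in regions
    }
    # Highest average first; ties are broken alphabetically by region name.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical relative scores, invented purely for illustration (not real data).
scores_by_term = {
    "do i have corona": {"Region A": 100, "Region B": 60, "Region C": 40},
    "dry cough": {"Region A": 50, "Region B": 100, "Region C": 70},
    "lost smell": {"Region A": 30, "Region B": 80},
}

ranking = layered_ranking(scores_by_term)
for region, score in ranking:
    print(f"{region}: {score:.1f}")
```

Here Region A tops the first search on its own, but Region B ranks first once all three searches are layered, which is the kind of shift a single-term table would hide.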

Interestingly, as of today – May 24, 2020 – none of the states mentioned directly above are in the group of states where new cases are said to be increasing.

Delaware and New Hampshire are in the group showing a stable number of cases.

New Jersey, Rhode Island, and Massachusetts are all showing a decreasing number of cases.

While the results of the Google Trends searches might not appear to correlate with an increase in COVID cases in the areas we might have expected, we should remember that these searches are meant to predict future cases. The current numbers might indicate a decrease in cases in certain areas because of insufficient testing, or because a re-emergence has yet to form part of the statistics.
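If searches really do lead reported cases, one way to check after the fact is to compare search interest on a given day with case counts some days later, and see which delay lines up best. The sketch below assumes daily values for a single region and uses a plain Pearson correlation. The function names are hypothetical and both series are invented for illustration, with the case series deliberately built to trail the searches by three days.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(searches, cases, lag):
    """Correlate search interest on day t with case counts on day t + lag."""
    if lag == 0:
        return pearson(searches, cases)
    return pearson(searches[:-lag], cases[lag:])


def best_lag(searches, cases, max_lag):
    """Return the lag (in days) with the highest correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: lagged_correlation(searches, cases, lag),
    )


# Hypothetical daily series, invented purely for illustration (not real data).
searches = [10, 20, 40, 80, 60, 30, 20, 10, 10, 10]
cases = [5, 5, 5, 10, 20, 40, 80, 60, 30, 20]

print(best_lag(searches, cases, max_lag=5))
```

A strong correlation at some lag would not prove that searches predict outbreaks, but a consistently weak one across regions would be a good reason to doubt the approach.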

Though this novel method of predicting possible outbreaks hasn’t yet proven to be effective, it may very well provide a more accurate depiction of resurgences of COVID-19, or of non-tested cases, especially as compared to the survey model.

Should we consider increasing testing in the locations where people are searching for keywords indicative of early common symptoms?

Should we expect a ‘second wave’ in these states?

Time will tell, though not too much time… We should see it in the next 7-14 days.

