Dr. Farzad Mostashari on syndromic surveillance.
When Carnegie Mellon researchers came up with the idea of putting together a survey asking the public about their coronavirus symptoms, the scientists knew they had to collect millions of data points in order to learn something meaningful.
So they turned to Facebook for help, specifically its public-facing "Data for Good" team, which specializes in using analytics for humanitarian purposes.
The survey went live to Facebook's billions of users about six months ago and has so far collected data from more than 30 million people around the world. It asks respondents whether they have tested positive for the virus, whether they wear masks and practice social distancing, and whether they currently have symptoms. Respondents also share demographic and health information, such as age, mental health, and pre-existing medical conditions.
More than 1.5 million people complete the survey every week. To protect privacy, Facebook does not have direct access to the answers. Carnegie Mellon has published aggregated data through its COVIDcast API, along with real-time visualizations.
However, there are still some important questions to answer: Will this data really be useful? And can it predict the next Covid-19 outbreak before it happens?
To find out, a group of epidemiologists and infectious disease experts from Carnegie Mellon, the University of Maryland, the Duke Margolis Center for Health Policy, and Resolve to Save Lives, a nonprofit led by former CDC director Tom Frieden, has launched a challenge open to any data scientist or researcher. Facebook is funding the prize money. The ultimate goal is to see whether the dataset can be used to spot the next Covid-19 surge, so that public health officials can deploy scarce resources accordingly.
"It's a wealth of information, and I've been amazed that it isn't being used more widely," said Dr. Farzad Mostashari, former National Coordinator for Health Information Technology at the Department of Health and Human Services, in a telephone interview. Mostashari also helped create the challenge. "If it is better understood, this could be a big step forward."
Once submissions are received (the first deadline is September 29), they will be reviewed by a scientific committee of epidemiologists and data scientists. Mostashari; John Brownstein of Boston Children's Hospital; and Alex Reinhart, assistant professor of statistics and data science at Carnegie Mellon, sit on the committee, along with about a dozen others working on the frontlines of the pandemic.
The idea of using consumer technology tools like Facebook and Google to gather information about diseases is nothing new.
In the mid-2000s, a group of epidemiologists, including Brownstein, began working with technology companies to see if their data could be used to advance public health programs. This led to projects like Google Flu Trends, launched in 2008, which aimed to use search trends to determine the prevalence of influenza in specific regions.
Google Flu Trends didn't end up being a great success story, partly because Google learned too late that it needed to combine its search data with information gathered by public health agencies such as the CDC. The project was shut down in the summer of 2015.
Still, researchers see value in gathering information about people's symptoms, whether from search terms, online surveys, or wearable devices. Combined with other so-called "syndromic surveillance" data, such as the number of emergency room patients reporting flu-like illness, the data collected by technology companies can help predict outbreaks, researchers say.
Covid-19 has now spurred many of the largest tech companies to renew these efforts and to work more closely with health agencies, having learned from their past mistakes.
"We have validated this type of data over time," said Brownstein, who continues to work with tech giants like Facebook, Google and Uber. "And now the tech companies are investing significant resources."
Also this week, Google announced that it is investigating whether trends in symptom searches, such as searches for fever, can predict a Covid-19 outbreak and help researchers map the spread of the virus. The approach is similar to Google Flu Trends, but this time the company is seeking feedback from public health researchers to make the dataset more useful over time.
Mostashari believes these new kinds of datasets are needed because current methods are far from perfect. Because of inadequate testing in countries like the US, health authorities struggle to determine the true number of cases. Deaths, of course, are a lagging indicator. Monitoring hospital emergency rooms alone can also fall short, because the way people seek care changes. In a pandemic, for example, fewer people than usual generally go to emergency rooms, and that can skew the data.
"The cat's pajamas"
Mostashari said the survey data might have helped researchers anticipate the recent surge in cases in Florida. "There is enough evidence that it could be a big deal," he said.
Other researchers agree. "It was clear within a few months of collecting the data that the signal appeared to have some correlation with confirmed case numbers," said Reinhart of Carnegie Mellon, referring to his group's initial efforts to determine whether the symptom data correlated with state reports of case counts. "However, given the sample size, it took us longer to do a deeper analysis."
But Reinhart and Mostashari say they are open to being proven wrong. They hope that the researchers who join the challenge will test their assumptions and learn more in the process.
"We want it to be torn apart," said Mostashari. "And we want those who take on the challenge to ask whether this (dataset) is really the cat's pajamas, or whether we are seeing correlation without causation."