UCSD researchers have developed a model that predicts the spread of influenza up to a week in advance with as much accuracy as Google Flu Trends (GFT) estimates current infections.
The study, published last Thursday, Jan. 29, combined "big data" compiled from GFT with traditional data from the Centers for Disease Control and Prevention (CDC) to assess real-time flu activity more accurately and to make valid predictions about its future spread.
GFT operates on the assumption that a spike in flu-related Google web searches in a geographical region is directly correlated with a spike in that region's flu activity, since people who have the flu are expected to search the web for their symptoms and possible remedies. While the system has been successful in estimating levels of flu infection in real time (and historically), GFT alone does not claim to make reliable estimates about the future.
Michael Davidson, the study's first author and a doctoral student in political science, told the UCSD Guardian that the flu prediction model improves on GFT's data, which itself was an improvement over the CDC's two-week delay in reporting current flu trends.
“One of the reasons Google is so great is because it can estimate how many people are sick [with influenza] right now,” Davidson said. “The CDC can [also] estimate it, but it takes about two weeks to figure out how many people are sick. In our model, we can combine those two.”
To predict future flu trends, UCSD researchers set out to map a nationwide "social network" that could show which regions of the country were likely to be struck by an outbreak next, based on infection activity in the regions to which they are connected.
“Social networks can teach us a lot about an individual based on the people whom they are connected to,” Davidson said. “We thought we could improve this really important tool by doing the same thing in the context of the flu.”
The Department of Health and Human Services divides the United States into 10 regions. Using CDC data on which regions experienced similar levels of infection at the same time over the past year, Davidson and his team developed a method to predict the movement of influenza across those regions. They checked their predictions, one week in advance at a time, against historical records and found the model to be valid.
The ability to predict influenza movement could help doctors allocate resources, for example by gauging how many vaccines a given flu season will require. More broadly, Davidson said, the method shows the importance of incorporating multiple sources of information into data models.
“The basic principle behind it all is that big data doesn’t work the best in a vacuum,” Davidson said. “It actually does better when we combine it with these traditional sources of data [such as those from the CDC].”
The researchers hope eventually to expand the model to predict trends within specific states, counties and cities; doing so, however, would require analyzing GFT's raw data for those places, which the team did not have access to.
"That would be great if we had more access to data like that," Davidson said. "I'm confident that Google has the ability to make predictions [about flu trends] like that, but as of yet if you go on the GFT website they say that it's just experimental at the city level."