This past winter, Sabine Brunswicker, professor of technology leadership and innovation, led nearly 80 fellow data science enthusiasts in the IronHacks COVID-19 Data Science Challenge: Protect Purdue. The hacking competition, which was designed to use data to solve societal problems, was powered by the Purdue IronHacks platform.
The Protect Purdue Challenge encouraged participants to explore a variety of data science models that use data from past human actions to predict future human behavior. During the competition, participants used their programming skills to build statistical models that monitored and then predicted anonymized foot traffic across major places of interest in Tippecanoe County in November and December of 2020, when the holidays coincided with the COVID-19 pandemic.
“We gave participants historical data about foot traffic, social distancing, mobility graphs mapping origins to target locations and COVID instances and then asked them to predict the foot traffic in places such as restaurants and shopping centers, to give an indication of social concentration, where people go and spend time or money,” said Brunswicker. “They gave us their predictions and we used ground truth data (actual information that is collected on location by direct observation) to compare. So as they’re making their predictions, we can match it against the ground truth. It’s an excellent example of active learning.”
The data from the IronHacks COVID-19 Data Science Challenge was aggregated into a map of Tippecanoe County to illustrate where people congregated during the months of the hackathon. The information will be shared with the local government and Protect Purdue.
“The best model submitted was very close (to the actual data),” said Brunswicker. “When we asked, ‘How many people go there,’ the best model had an error of less than 14 visits, so it was very close to the ground truth data.”
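The article later identifies this error as a mean absolute error: the average gap, in visits, between each prediction and the ground truth. The Python sketch below shows how such a score could be computed. The function name, variable names and visit counts are made up for illustration; they are not the challenge's data or its actual scoring code.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute gap between predicted and observed visit counts."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    # Sum the absolute differences pairwise, then divide by the number of places.
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical weekly visit counts for three places (illustrative values only).
predicted_visits = [120, 45, 300]
observed_visits = [110, 50, 290]

mae = mean_absolute_error(predicted_visits, observed_visits)
print(f"MAE: {mae:.2f} visits")
```

A lower value means the predictions sit closer to the observed foot traffic, which is how a result such as "an error of less than 14 visits" would be read.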
Brunswicker reported three impacts from the challenge for Protect Purdue’s COVID-19 efforts.
“First, the data science models and the reports produced by our data scientists can inform the decision makers, like the Protect Purdue team, about areas of social concentration in Tippecanoe County. There are certain places that showed a significant increase in foot traffic in November 2020. There is an opportunity to build upon the models and put them in production so that they are updated automatically on a weekly basis to make weekly and monthly predictions about ‘hot spots’ of social crowding. The best solution could be used as a publicly accessible dashboard to report the foot traffic in Tippecanoe County.
“Second, we have built a unique data science community at Purdue and other locations, comprised of people who are eager to continue to participate in COVID-19 data science challenges. This offers the opportunity to engage students in COVID-19-related activities and sharpen their skills in working with and analyzing COVID-19 data. Such informal learning is essential for developing data science skills.
“Third, we increased the awareness of participants about COVID-19 policies, such as social distancing, via surveys, forum posts, interviews, and their actual data science tasks. They worked on social distancing and foot traffic and gained first-hand experiences in the social behaviors and risk associated with COVID-19.”
Participants in the challenge spoke highly of the event and touted the importance of being able to communicate and explain data science results to professionals who require the information.
“I have been learning about machine learning and all of the theories for a year and a half and haven’t had a real-life opportunity to practice my theory or knowledge,” said Zhiwei Chu, a student at Purdue Northwest. “This was a good opportunity for me to practice and see what I have learned.”
Senthamizhan, a research assistant from the Indian Institute of Technology Madras, was the winner of the data science challenge, with the aforementioned mean absolute error of fewer than eight people. Speaking about the insights gained from the competition, he said, “We, in data science, should focus more on explainability of machine learning models than going for better and better accuracy. Public health officials won’t ask us for 90% accuracy, but they will ask how this works, so we should be able to explain it to them.”
The Challenge Continues: 2021 COVID-19 Data Science Challenge launching soon
Brunswicker will launch the 2021 stage of the IronHacks COVID-19 Data Science Challenge at the end of April. Participants will predict which places are expected to see the highest social concentration this spring as COVID-19 restrictions ease. The challenge will also encourage participants to build explanatory models that help decision makers better understand why movements are happening. The upcoming challenge is expected to last two months.
“We will use new real-time data that we streamed from our data providers directly into the BigQuery database that the participants are using. The models predicted can then also give an indication about whether social activity has moved back to level prior to the COVID-19 shutdowns in 2019,” Brunswicker added.
The challenge is not restricted to Purdue students or personnel.
“Anyone with programming skills in Python can be involved,” Brunswicker said. “Indeed, we hope that students from K-12 to professionals in industry engage in the challenge. Sometimes the best solutions come from unexpected places.”
To participate, visit ironhacks.com.