This is Phase 2 of the COVID-19 Data Hackathon Competitions! The COVID-19 pandemic continues to impact all our lives, and understanding and evaluating data related to the pandemic remains critical.

We are pleased to announce a follow-up to our 2021 COVID-19 Data Hackathon! The competitions are now live, so don’t wait!

What: We have five challenges to choose from – details for each project are provided below.

When: Submissions are due Monday, December 5, 2022!


  1. Register for an account at!
  2. Join a hackathon challenge by registering directly at the Kaggle link listed below (if provided), or email the project hosts to have your team added.
  3. As a team, download the data, code your solutions, and upload them to Kaggle for scoring.
  4. Submit your files for evaluation and scoring. It is that simple!

If you have any difficulty, feel free to email Bobak Mortazavi, Ryan King, or Yufei Huang.

Public Health Informatics Challenge:

The goal of this data challenge is to predict the 7-day average of new COVID-19 cases and the positivity rate based on historical public health data. Accurate prediction of such epidemiological trends can provide useful insights for the public, helping them make informed decisions regarding protection/mitigation measures, travel planning, and more.
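For readers new to these two quantities, here is a minimal sketch of how they are commonly computed. The announcement does not give the challenge's exact definitions, so the trailing 7-day window, the ratio of positive tests to total tests, the function names, and the sample numbers are all assumptions made for illustration.

```python
from typing import List, Optional


def seven_day_average(daily_new_cases: List[int]) -> List[Optional[float]]:
    """Trailing 7-day mean of daily new cases (assumed definition).

    Returns None for the first six days, where a full window is not available.
    """
    averages: List[Optional[float]] = []
    for day in range(len(daily_new_cases)):
        if day < 6:
            averages.append(None)
        else:
            window = daily_new_cases[day - 6 : day + 1]  # current day plus the six before it
            averages.append(sum(window) / 7)
    return averages


def positivity_rate(positive_tests: int, total_tests: int) -> Optional[float]:
    """Fraction of tests that were positive (assumed definition); None if no tests were run."""
    if total_tests == 0:
        return None
    return positive_tests / total_tests


# Made-up numbers for illustration only, not challenge data.
cases = [10, 12, 8, 15, 20, 18, 22, 25]
print(seven_day_average(cases))
print(positivity_rate(30, 400))
```

A forecasting model for this challenge would predict future values of these series; the official data description on the challenge page takes precedence over the definitions assumed here.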

Natural Language Processing Challenge:

Applying NLP models to COVID-related papers could enable automated information extraction from the literature to support drug discovery efforts. One crucial input to these efforts is knowledge about viral proteins. The goal of this data challenge is to build an NLP model that identifies answers to protein-related questions in scientific papers.

Sensor Informatics Challenge:

Evidence on the pathogenesis of COVID-19 increasingly points to impairment of the respiratory system. In this light, it is natural to ask: can sound samples serve as acoustic biomarkers of COVID-19? If so, acoustics-based COVID-19 diagnosis could provide a fast, contactless, and inexpensive testing scheme with the potential to supplement existing testing methods such as RT-PCR and rapid antigen tests (RAT). This challenge is an exploration of ideas for answering that question.

Bioinformatics Drug Target Challenge:

The goal of this challenge is to build effective ML/AI-based surrogate models that can accurately predict the docking scores of candidate drug molecules on SARS-CoV-2 protein targets.

Bioinformatics scRNA-Seq Challenge:

Identifying molecular signatures of COVID-19 severity has become critically important for early treatment. Single-cell RNA sequencing (scRNA-Seq) makes it possible to identify and quantify thousands of genes across thousands of cells, and the technology has in turn called for novel artificial intelligence (AI) solutions for data analysis and medical applications. This challenge asks participants to apply an AI algorithm to predict the severity of COVID-19 infection from an scRNA-Seq dataset. Such a model could be of significant practical value for further study of the signatures of COVID-19 severity.