The CAMDA Contest Challenges

For 2019, we present

CAMDA encourages an open contest, where all analyses of the contest data sets are of interest, not limited to the questions suggested here. There is an online forum for the free discussion of the contest data sets and their analysis, in which you are encouraged to participate.

We look forward to a lively contest!

Metagenomic Forensics Challenge

MetaSUB is building an international metagenomic map of urban spaces, based on extensive sampling of mass-transit systems and other public areas across the globe. In a strategic partnership, new data from global City Sampling Days are first introduced through the annual CAMDA contests. CAMDA delegates thus receive access to hundreds of novel MetaSUB samples, comprising several gigabases of whole genome shotgun (WGS) metagenomics data. The primary data set covers multiple cities around the world, with tens of samples per city, providing a unique resource for the study of biodiversity within and across geographic locations.

Global coverage is extended further by two complementary 16S rRNA studies contributing thousands of samples: the Earth Microbiome Project and "A global atlas of the dominant bacteria found in soil". For a range of MetaSUB Boston reference samples we now provide both WGS and 16S profiles, allowing a first systematic link of WGS and 16S resources.

Together, these unique resources support novel approaches to identify sample origin locations for cities seen for the very first time. Performance can be tested on an independent test set of over 50 samples from multiple 'mystery' locations from cities not sampled before.

Please visit and participate in the open CAMDA meta-genomics forum for free discussion related to this contest.

Analysis suggestions:
A key challenge in metagenomic forensics is the construction of a microbiome fingerprint that allows the geographical origin of a sample to be predicted even when no reference samples from that location are known.

Typical considerations include:

  • How can we exploit metagenomic fingerprints for identifying the origin of a sample?
  • How reliable are such predictions of sample origins?
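As a starting point for both questions, here is a minimal sketch, not a recommended contest method. It assumes each sample has already been reduced to a taxon abundance profile, represents each city by its mean profile, assigns a new sample to the city with the smallest Bray-Curtis dissimilarity, and estimates reliability by leave-one-out accuracy. All function names, city labels, taxon names, and counts are hypothetical toy values, not contest data.

```python
from collections import defaultdict


def normalize(counts):
    """Turn raw taxon counts into relative abundances that sum to 1."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two profiles (dict: taxon -> abundance)."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def centroids(samples):
    """Mean profile per city; samples is a list of (city, profile) pairs."""
    sums = defaultdict(lambda: defaultdict(float))
    n = defaultdict(int)
    for city, profile in samples:
        n[city] += 1
        for taxon, value in profile.items():
            sums[city][taxon] += value
    return {c: {t: v / n[c] for t, v in tv.items()} for c, tv in sums.items()}


def predict(profile, cents):
    """Return the city whose centroid is closest to the given profile."""
    return min(cents, key=lambda c: bray_curtis(profile, cents[c]))


def leave_one_out_accuracy(samples):
    """Hold out each sample in turn, refit the centroids, and score the prediction."""
    hits = 0
    for i, (city, profile) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += predict(profile, centroids(rest)) == city
    return hits / len(samples)


# Hypothetical toy counts, for illustration only.
RAW = [
    ("city_A", {"taxon1": 80, "taxon2": 15, "taxon3": 5}),
    ("city_A", {"taxon1": 70, "taxon2": 20, "taxon3": 10}),
    ("city_A", {"taxon1": 75, "taxon2": 20, "taxon3": 5}),
    ("city_B", {"taxon1": 10, "taxon2": 30, "taxon3": 60}),
    ("city_B", {"taxon1": 5, "taxon2": 25, "taxon3": 70}),
    ("city_B", {"taxon1": 15, "taxon2": 25, "taxon3": 60}),
]
SAMPLES = [(city, normalize(counts)) for city, counts in RAW]

if __name__ == "__main__":
    print("leave-one-out accuracy:", leave_one_out_accuracy(SAMPLES))
```

Leave-one-out over samples only measures how well known cities are told apart. For the mystery-location setting, holding out entire cities would be a closer match, and that requires a model that does not depend on a per-city centroid.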

The primary data set is now available. It contains: i) 16S sequencing-based OTUs for thousands of soil samples from all over the world, drawn from the two projects mentioned above, and ii) hundreds of samples with WGS raw reads from urban locations, provided by the MetaSUB Consortium.

The set of mystery samples is now available and consists of tens of further samples with WGS raw reads.

Please sign up to announcements from the CAMDA meta-genomics forum for alerts.

For a copy of our data, please accept the data download agreement for access.

Hi-Res Cancer Data Integration Challenge

Building on the comprehensive description of genomic, transcriptomic and epigenomic changes in cancers provided by the Genomic Data Commons (GDC, formerly at TCGA), the main goal of this challenge is to develop and demonstrate new methods for gaining novel biological insights or improving support for Precision Medicine. Innovation can come from

Examine algorithm performance in real-world clinical settings! We know that many approaches work well on some data sets yet not on others. We here challenge you to demonstrate a unified single approach that matches or outperforms the current state-of-the-art for

and for at least one of the less well studied

Please visit and participate in the open CAMDA data integration forum for free discussion related to this contest.

Analysis suggestions:
Biological:

  • What known and new disease mechanisms can you identify?
  • How can the integration of matched molecular profiles and patient data yield a more meaningful readout, including likely causal changes?
  • What can we learn about the role of aberrant splicing and regulation of alternative gene transcripts in cancer?
  • How can individual human genomic sequences aid Precision Medicine and the development of personalized rational drug treatment plans?

Technical:

  • Can we apply approaches and insights developed from one type of cancer (e.g., a common, well studied cancer) to other diseases (e.g., less-well studied cancers)?
  • How large a distortion is observed from restriction of gene expression readout to the standard human reference sequence (vs mapping to individual human genome sequences)?

Contest data comprise raw and pre-processed data from matched molecular profiles with complementary clinical information.

For convenience, we provide a local copy of the data. In addition, anonymized RNA-seq read level data are now available.

Please sign up to announcements from the CAMDA data integration forum for alerts.

Please read and accept the data download agreement for access.

CMap Drug Safety Challenge

Attrition in drug discovery and development due to safety / toxicity issues remains a significant concern, and there are strong efforts to identify and mitigate risk as early as possible. Drug-induced liver injury (DILI) is one of the primary problems in drug development and regulatory clearance due to the poor performance of existing preclinical models. There is a pressing need to evaluate alternative methods for predicting DILI, with great hopes being placed in modern approaches from statistics and machine learning applied to genome-scale profiling data. A critical question thus is whether we can better integrate, understand, and exploit information from cell-based screens like the Broad Institute Connectivity Map (CMap, Science 313, Nature Reviews Cancer 7, Cell 171).

This CAMDA challenge focuses on understanding or predicting drug-induced liver injury in humans from cell-based screens, specifically the CMap L1000 gene expression responses of thirteen different cancer cell lines to 1,314 drug compounds, greatly extending last year's challenge data set (common cell lines: MCF7 & PC3; new cell lines: A375, A549, ASC, HA1E, HCC515, HEPG2, HT29, NPC, PHH, SKB, VCAP). To also support supervised approaches, we provide clinical DILI results as training labels for 233 drugs (21 high concern, 62 less concern, 35 no concern, and 55 ambiguous).

In addition, we now provide chemical structures (SMILES codes) for all drugs, and annotated images from cellular assays (Journal of chemical information and modeling (to appear), Nature Protocols 11, GigaScience 6) for a subset of drugs and cell lines. Please read and accept the data download agreement for access. Please visit and participate in the open CAMDA toxicogenomics forum for free discussion related to this contest.

Analysis suggestions:

  • Identification and interpretation of differences in cell-line response across drugs and across cell-line types
  • Prediction of human clinical DILI results from cell-line responses
  • Integration of molecular and cellular assays. Assessment of the relative values of the complementary data types for prediction.
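One way to compare the predictive value of complementary data types is to score each set of predictions with a single metric that tolerates class imbalance. The sketch below uses the Matthews correlation coefficient (MCC); this metric is our illustrative choice, not one prescribed by the contest. It assumes the DILI labels have been collapsed to binary (1 = DILI concern, 0 = no concern), and the label and prediction vectors are hypothetical toy values.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels and predictions, for illustration only.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 0]
PRED_FROM_EXPRESSION = [1, 1, 0, 0, 0, 0, 0, 1]  # e.g. a model on expression profiles
PRED_FROM_STRUCTURE = [1, 0, 0, 0, 0, 0, 0, 0]   # e.g. a model on chemical structures

if __name__ == "__main__":
    print("expression MCC:", round(mcc(Y_TRUE, PRED_FROM_EXPRESSION), 3))
    print("structure MCC:", round(mcc(Y_TRUE, PRED_FROM_STRUCTURE), 3))
```

In practice the predictions being compared should come from held-out data, for example cross-validation folds, so that the scores reflect generalization rather than fit to the training labels.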

Contest data comprise raw and processed expression profiles from the Broad Institute Human L1000 epsilon platform. Complementary information includes cell line names, drug concentrations, etc., as well as the chemical structures (SMILES codes and PubChem IDs) and Cell Painting images. Toxicity labels were compiled by the US FDA.

A local copy of relevant subsets of the data is now available (images will be added from mid February).

Independent validation labels for selected chemical compounds are now available.

Please sign up to announcements from the CAMDA toxicogenomics forum for alerts.

Please read and accept the data download agreement for access.