We are excited to share that the JUMP Cell Painting Datasets now include most data components from all but one of the sources for the principal dataset (cpg0016), which comprises 116,000 chemical and over 20,000 genetic perturbations (covering ~15,000 genes). A first draft of the metadata files is available, along with a notebook to load and inspect the data currently available in the principal dataset. Please note that full quality control of the dataset is still pending and that some replicates are missing. All data components of the three pilot datasets (cpg0000, cpg0001, and cpg0002) are available.
Looking ahead, we will extend the metadata and notebooks to the three pilot datasets so that they integrate seamlessly with the principal dataset. We will also provide curated annotations for the compounds from ChEMBL and other sources. In addition, we plan to release the remaining data components and the one remaining source for the principal dataset, along with deep learning embeddings for all four datasets computed with a pre-trained neural network. Finally, image-level quality control results for the principal dataset will be made available so that bad images can be removed.
Stay tuned for these exciting developments, and don't hesitate to reach out here if you have any questions or suggestions!
JUMP Cell Painting dataset: morphological impact of 136,000 chemical and genetic perturbations
We posted a minimalist preprint on bioRxiv describing the basics of the dataset. Please note that this is just a placeholder: we plan to carry out further analyses of the data and submit the work for formal publication, but we wanted to share details about the dataset with the community as soon as possible.