MacCoss - Casanovo 5.0

Improvements to Casanovo, a deep learning de novo peptide sequencer
Data License: CC BY 4.0 | ProteomeXchange: PXD066485 | doi: https://doi.org/10.6069/96jr-2s62
  • Organism: Homo sapiens, Mus musculus, Saccharomyces cerevisiae
  • Instrument: Orbitrap Fusion Lumos
  • SpikeIn: No
  • Keywords: Casanovo, de novo, DDA
  • Lab head: William Noble
  • Submitter: Chris Hsu
Abstract
Casanovo is a state-of-the-art deep learning model for de novo peptide sequencing from mass spectrometry proteomics data. Here we report on a series of enhancements to Casanovo, aimed at improving the interpretability of the scores assigned to predicted peptides, generalizing the software for use in database search, speeding up training and prediction runtimes, and providing workflows and visualization tools to facilitate adoption of Casanovo and interpretation of its results. Our goal is to make Casanovo accurate and easy to use for applications such as metaproteomics, antibody sequencing, immunopeptidomics, and discovery of novel peptide sequences in standard proteomics analyses. Casanovo is available as open source at https://github.com/Noble-Lab/casanovo. The raw data and search results can also be viewed at: https://limelight.yeastrc.org/limelight/d/pg/project/165
Experiment Description
To demonstrate and benchmark Casanovo’s database search functionality, we used three tandem mass spectrometry runs, one each from human, mouse, and yeast samples. The samples were prepared using a protein aggregate method as previously described by Wen et al. [26] and acquired on an Orbitrap Fusion Lumos mass spectrometer coupled to an EvoSep One liquid chromatography system using an 88-minute extended gradient method. For all samples, the data-dependent acquisition method contained one cycle of MS1 (60,000 Orbitrap resolving power, 118 milliseconds max injection time, and 100% AGC target) and MS2 (15,000 Orbitrap resolving power, 1.6 m/z isolation window, 27% HCD collision energy, 22 milliseconds max injection time, and 100% AGC target). The MS1 peaks were filtered by an intensity threshold of greater than 2E4, charge states of 2 to 5, and dynamic exclusion (with an exclusion duration of 15 seconds and a repeat count of 1). Raw MS/MS data were converted to mzML files using MSConvert with peak picking enabled in ProteoWizard (version 3.0.24031) [27]. Reference databases for each experiment were downloaded from UniProt [28] (proteome identifiers UP000005640, UP000000589, and UP001165141) on 18 April 2025.
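The MS1 precursor filters described above (intensity threshold, charge range, and dynamic exclusion) can be illustrated with a minimal Python sketch. This is not the instrument's firmware logic or anything shipped with Casanovo. The `Peak` class, the `select_precursors` function, and the `MZ_TOLERANCE` matching window are hypothetical names and values introduced only for illustration, and the sketch ignores how many precursors are picked per cycle.

```python
from dataclasses import dataclass

INTENSITY_THRESHOLD = 2e4   # from the text: intensity must be greater than 2E4
MIN_CHARGE, MAX_CHARGE = 2, 5   # from the text: charge states 2 to 5
EXCLUSION_SECONDS = 15.0    # from the text: dynamic exclusion duration
MZ_TOLERANCE = 0.01         # ASSUMPTION: matching window in m/z, not given in the text


@dataclass
class Peak:
    """Hypothetical MS1 peak record used only for this illustration."""
    mz: float
    intensity: float
    charge: int
    time: float  # retention time in seconds


def select_precursors(peaks):
    """Return the peaks that pass all three filters.

    Peaks are assumed to be sorted by time. With a repeat count of 1,
    an m/z is excluded for EXCLUSION_SECONDS after its first selection.
    """
    excluded = []  # (mz, expiry_time) pairs currently on the exclusion list
    selected = []
    for peak in peaks:
        if peak.intensity <= INTENSITY_THRESHOLD:
            continue  # at or below the intensity threshold
        if not MIN_CHARGE <= peak.charge <= MAX_CHARGE:
            continue  # charge state outside 2 to 5
        # Drop exclusion entries that have expired by this peak's time.
        excluded = [(mz, expiry) for mz, expiry in excluded if expiry > peak.time]
        if any(abs(mz - peak.mz) <= MZ_TOLERANCE for mz, _ in excluded):
            continue  # still dynamically excluded
        selected.append(peak)
        excluded.append((peak.mz, peak.time + EXCLUSION_SECONDS))
    return selected


peaks = [
    Peak(500.25, 5e4, 2, 0.0),   # passes all filters
    Peak(600.30, 1e4, 2, 1.0),   # intensity too low
    Peak(700.40, 5e4, 1, 2.0),   # charge 1 is out of range
    Peak(800.50, 2e4, 3, 3.0),   # equal to the threshold, not greater
    Peak(500.25, 6e4, 2, 5.0),   # within 15 s of the first selection
    Peak(500.25, 6e4, 2, 16.0),  # exclusion window has expired
]
selected = select_precursors(peaks)
```

In this toy input, only the first and last peaks are selected: the others fail the intensity or charge filter, or fall inside the 15-second exclusion window.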
Created on 7/23/25, 10:28 AM
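File | Created | Proteins | Peptides | Precursors | Transitions | Replicates
hela_2025-07-21_15-14-01.sky.zip | 2025-07-23 10:27:10 | 1,837 | 9,182 | 9,649 | 86,736 | 1
mouse_2025-07-21_15-13-32.sky.zip | 2025-07-23 10:27:10 | 1,414 | 7,653 | 8,398 | 75,507 | 1
yeast_2025-07-21_15-12-08.sky.zip | 2025-07-23 10:27:10 | 175 | 601 | 652 | 5,866 | 1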