MacCoss - 2024-6 Cascadia

A transformer model for de novo sequencing of data-independent acquisition mass spectrometry data
Data License: CC BY 4.0 | ProteomeXchange: PXD053291 | doi: https://doi.org/10.6069/68hb-wh73
  • Organism: Homo sapiens, Mus musculus
  • Instrument: Orbitrap Astral, Q Exactive HF-X
  • SpikeIn: No
  • Keywords: de novo, data independent acquisition, sequence variant, extracellular vesicles, Mag-Net
  • Lab head: Michael MacCoss | Submitter: Michael MacCoss
Abstract
A core computational challenge in the analysis of mass spectrometry data is the de novo sequencing problem, in which the generating amino acid sequence is inferred directly from an observed fragmentation spectrum without the use of a sequence database. Recently, deep learning models have made significant advances in de novo sequencing by learning from massive datasets of high confidence labeled mass spectra. However, these methods are primarily designed for data-dependent acquisition (DDA) experiments. Over the past decade, the field of mass spectrometry has been moving toward using data-independent acquisition (DIA) protocols for the analysis of complex proteomic samples due to their superior specificity and reproducibility. Hence, we present a new de novo sequencing model called Cascadia, which uses a transformer architecture to handle the more complex data generated by DIA protocols. In comparisons with existing approaches for de novo sequencing of DIA data, Cascadia achieves improved performance across a range of instruments and experimental protocols. Additionally, we demonstrate Cascadia’s ability to accurately discover de novo coding variants and peptides from the variable region of antibodies.
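To make the de novo sequencing problem concrete, the following minimal sketch computes the singly charged b- and y-ion m/z values that a candidate peptide would produce. A de novo model has to work in the opposite direction, from observed peaks back to a sequence. This is an illustration only and is not part of Cascadia: the function names are hypothetical, the residue masses are standard monoisotopic values, and modifications and higher charge states are ignored.

```python
# Standard monoisotopic residue masses in Da (unmodified amino acids only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z for b1..b(n-1), y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:            # N-terminal prefixes
        prefix += m
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # C-terminal suffixes
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

Because leucine and isoleucine have the same residue mass, the two are indistinguishable from fragment masses alone, which is one reason de novo predictions are often compared with I and L treated as equivalent.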
Experiment Description
For our experiments with narrow-window DIA data, we use a training dataset of 878,217 labeled augmented spectra derived from 77 mouse plasma DIA mass spectrometry runs with 4 Th isolation windows collected on the Orbitrap Astral. Peptide detections for training were generated using the DIA_Speclib_Quant workflow in MSFragger-DIA, and precursor features for DeepNovo-DIA and Casanovo were selected using DIA-Umpire. To demonstrate Cascadia’s ability to discover de novo peptides, we test it in a setting where ground truth labels are available through an orthogonal sequencing modality. We generated 40 DIA runs from samples of the human skin surface collected with D100 Squame sampling disks, and used three of these runs, each derived from a different individual. Targeted exome sequencing of 549 genes was then performed on these same individuals, yielding lists of 357, 368, and 595 ground truth single-nucleotide variants (SNVs) for the three samples, respectively.
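One way to picture how a ground truth SNV becomes something a de novo model can be scored against: a missense variant changes one residue of the protein, which in turn changes one or more tryptic peptides. The sketch below derives those variant peptides from a single amino acid substitution. It is an assumption-laden illustration rather than the authors' pipeline. The protein sequence is a made-up toy, all function names are hypothetical, and digestion uses the simple rule of cleaving after K or R unless followed by P, with no missed cleavages.

```python
def tryptic_digest(protein):
    """Split a protein after K/R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def apply_substitution(protein, position, ref, alt):
    """Apply a single amino acid substitution at a 1-based position."""
    if protein[position - 1] != ref:
        raise ValueError("reference residue does not match the protein sequence")
    return protein[:position - 1] + alt + protein[position:]


def variant_peptides(protein, position, ref, alt):
    """Tryptic peptides present only in the variant form of the protein."""
    variant = apply_substitution(protein, position, ref, alt)
    return set(tryptic_digest(variant)) - set(tryptic_digest(protein))


def same_up_to_il(a, b):
    """Compare two peptides with isoleucine and leucine treated as equivalent."""
    return a.replace("I", "L") == b.replace("I", "L")


if __name__ == "__main__":
    toy_protein = "MKTAYIAKQRQISFVK"  # hypothetical sequence, for illustration only
    expected = variant_peptides(toy_protein, 5, "Y", "C")
    predicted = "TACLAK"              # hypothetical de novo prediction
    print(expected, any(same_up_to_il(predicted, p) for p in expected))
```

A substitution that introduces or removes a K or R changes the cleavage pattern, so a single SNV can yield more than one new peptide. That is why the sketch returns a set rather than a single sequence.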
Sample Description
Mouse plasma samples were prepared from 50 µl of EDTA plasma using the Mag-Net protocol for enrichment of extracellular vesicles, as described previously (https://doi.org/10.1101/2023.06.10.544439). The surface of the skin was sampled using a D100 Squame disk, which was placed in an Eppendorf tube and vortexed in the presence of 2% SDS buffer. The proteins were digested to peptides using S-Traps according to the manufacturer's instructions (https://files.protifi.com/protocols/s-trap-micro-long-4-7.pdf).
Created on 6/21/24, 4:53 PM