Data independent acquisition-based proteomics reveals a multi-scale picture of gene expression in Escherichia coli
- Organism: Escherichia coli
- Instrument: TripleTOF 5600
- Keywords: Escherichia coli, absolute protein quantification, AQUA, xTop, DIA, SWATH, targeted proteomics
- Lab head:
Reproducible and accurate measurements of protein concentrations are increasingly valuable for the quantitative study of gene expression. Here, we provide high-quality absolute abundances of E. coli proteins, down to 50 copies per cell, using DIA/SWATH mass spectrometry. To do so, we generated a comprehensive E. coli spectral library containing information for more than 60% of all E. coli proteins, and we developed a novel quantitative protein inference algorithm, termed xTop, which outperforms other commonly applied methods in terms of reproducibility, accuracy and sampling depth. We applied our method to study carbon and nitrogen starvation, translational limitation, as well as a variety of growth conditions and stresses, allowing us to explore the hierarchical nature of regulation of the E. coli proteome across different scales: from large-scale “sectors” of proteins with similar responses to nutritional or translational stresses, to intermediate, functionally coherent clusters of tightly coregulated proteins, down to the expression of individual genes.
Most of this work is based on E. coli K-12 strain NCM3722, whose growth physiology has been extensively characterized. A few derivatives of NCM3722 were also used: EQ59, which constitutively expresses GFP from the chromosome; NQ359, which expresses GOGAT from a titratable promoter in a GDH-null background; NQ1431, which harbors the phnE+ allele and can utilize phosphonate as a sole phosphorus source; and NQ1527, in which a mutation in the rpoS gene is restored. Construction of NQ1431 and NQ1527 is described below. For the calibration samples A1, C1 and F1, we used strain EQ353, which is the specific MG1655 strain used in Li et al. 2014. Additionally, we used E. coli K-12 strain MG1655 obtained from the Coli Genetic Stock Center (CGSC#6300) and E. coli Nissle 1917 isolated from a Mutaflor capsule (Pharma-Zentrale, Germany).
Unless otherwise indicated, the growth media used were based on one of the following base media: modified Record’s MOPS medium, phosphate-buffered “N-C-” medium, M9 medium and Luria-Bertani (LB) medium.
E. coli cells collected for the library samples were grown in batch culture as described below. Batch cultures were grown in a 37°C water bath shaker at 250 rpm for aeration. Each growth experiment was carried out in three steps: a seed culture in LB broth, followed by a pre-culture and an experimental culture in an identical growth medium. For the seed culture, cells from a single colony grown on an LB agar plate were inoculated into liquid LB and grown at 37°C with shaking. Cells from the seed culture were then transferred into the growth medium with proper dilution and grown at 37°C overnight (pre-culture). Cells from the overnight pre-culture were then transferred into the growth medium with proper dilution and grown at 37°C until harvested for proteomic analysis as described below. Optical densities at 600 nm were measured with a Genesys 20 spectrophotometer (Thermo Scientific).
Proteomic sample preparation
Proteomic sample preparation was performed using an optimized E. coli protocol described previously by Schmidt et al. Briefly, E. coli cell pellets were lysed with 2% sodium deoxycholate, ultrasonicated and heated to 95°C. Proteins were reduced, alkylated and digested with LysC and trypsin. The peptide mixtures were desalted, dried and resuspended to a concentration of 0.5 µg/µl. The iRT peptide mix (Biognosys) was added to all peptide mixtures directly before the MS measurement. To increase proteome coverage, 33 µg of peptides from each of the samples Lib1 to Lib30 were pooled and fractionated by off-gel electrophoresis (OGE) into 13 fractions.
A set of 29 stable isotope-labeled peptides (AQUA peptides) was spiked exclusively into the three biological replicates of the calibration sample (E. coli strain K-12 MG1655 grown in glucose minimal medium and harvested in exponential growth phase), after digestion and before C18 purification. Depending on the previously determined endogenous peptide intensities, the peptides were spiked at a concentration of either 10 fmol/µl or 100 fmol/µl. These 29 AQUA peptides were used to absolutely quantify 29 anchor proteins and thereby to confirm the high proteome similarity (also in absolute terms) between the calibration sample generated in-house and the sample studied and published by Li et al. using ribosome profiling.
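The principle behind quantification with spiked heavy peptides can be sketched in a few lines of Python. This is a minimal illustration of the general light-to-heavy ratio calculation, not the processing pipeline used for this dataset; the function name `aqua_endogenous_fmol` and the intensity values are hypothetical, and the sketch assumes that the endogenous (light) and AQUA (heavy) forms of a peptide respond identically in the mass spectrometer.

```python
def aqua_endogenous_fmol(light_intensity, heavy_intensity, spiked_fmol):
    """Estimate the endogenous peptide amount from the light/heavy ratio.

    Assumes equal MS response for the light (endogenous) and heavy
    (AQUA) peptide, so the intensity ratio equals the molar ratio.
    """
    if heavy_intensity <= 0:
        raise ValueError("heavy (AQUA) intensity must be positive")
    return light_intensity / heavy_intensity * spiked_fmol


# Hypothetical intensities and a hypothetical 10 fmol of heavy peptide on column.
amount = aqua_endogenous_fmol(light_intensity=4.0e5,
                              heavy_intensity=2.0e5,
                              spiked_fmol=10.0)
print(amount)  # 20.0
```

Choosing between the lower and the higher spike level, as described above, keeps the light-to-heavy ratio close to one, where such a ratio estimate is usually most reliable.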
DDA mass spectrometry
LC-MS/MS runs in DDA mode were performed on a TripleTOF 5600 mass spectrometer (SCIEX) interfaced with a NanoLC Ultra 2D Plus HPLC system (Eksigent). Peptides were separated using a 120 min gradient from 2–35% buffer B (0.1% v/v formic acid, 90% v/v acetonitrile). The 20 most intense precursors were selected for fragmentation. In total, 53 DDA-based proteomic measurements were performed to generate the E. coli spectral library.
Generation of spectral library and peptide query parameters
A non-redundant consensus spectral library was generated with SpectraST. The Python script “spectrast2tsv” (https://pypi.python.org/pypi/msproteomicstools) was used to extract peptide query parameters from the spectral library. This script automatically extracted the six most abundant singly or doubly charged b- and y-ion fragments for each peptide precursor in the range between 350 and 2,000 m/z, excluding the precursor isolation window region. iRT peptides were used to generate normalized retention times for all peptides.
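The fragment selection rule described above can be illustrated with a short Python sketch. It is not the spectrast2tsv implementation; the `Fragment` class, the `select_transitions` function and the example values (including the 600–625 m/z isolation window) are hypothetical and only restate the filtering criteria given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    ion_type: str     # fragment ion series, e.g. "b" or "y"
    charge: int
    mz: float
    intensity: float


def select_transitions(fragments, isolation_window,
                       n_top=6, mz_min=350.0, mz_max=2000.0):
    """Keep the n_top most intense fragments that satisfy the criteria."""
    lo, hi = isolation_window
    kept = [
        f for f in fragments
        if f.ion_type in ("b", "y")          # b- and y-ions only
        and f.charge in (1, 2)               # singly or doubly charged
        and mz_min <= f.mz <= mz_max         # within the fragment m/z range
        and not (lo <= f.mz <= hi)           # outside the isolation window
    ]
    return sorted(kept, key=lambda f: f.intensity, reverse=True)[:n_top]


# Hypothetical fragments for one precursor isolated in a 600-625 m/z window.
fragments = [
    Fragment("y", 1, 500.3, 900.0),
    Fragment("b", 1, 300.2, 1000.0),   # rejected: below 350 m/z
    Fragment("y", 2, 610.0, 800.0),    # rejected: inside isolation window
    Fragment("a", 1, 700.4, 950.0),    # rejected: not a b- or y-ion
    Fragment("y", 3, 450.1, 990.0),    # rejected: charge 3
    Fragment("b", 2, 820.5, 400.0),
]
selected = select_transitions(fragments, isolation_window=(600.0, 625.0))
print([(f.ion_type, f.charge, f.mz) for f in selected])
```

Fragments inside the precursor isolation window are excluded because unfragmented precursor signal in that region can interfere with fragment ion quantification.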
DIA/SWATH mass spectrometry
Tryptic peptides were measured in SWATH mode on two TripleTOF 5600 mass spectrometers (SCIEX), both interfaced with an Eksigent NanoLC Ultra 2D Plus HPLC system. Peptides were separated using a 60 min gradient from 2–35% buffer B (0.1% (v/v) formic acid, 90% (v/v) acetonitrile). A 64-variable-window DIA scheme was applied, covering the precursor mass range of 400–1,200 m/z, with a total cycle time of ~3.45 s. For each MS injection, 2 μg of protein was loaded onto the HPLC column.
DIA/SWATH data analysis
The DIA/SWATH data were analysed using OpenSWATH (www.openswath.org). The only parameter changed from its default was the m/z extraction window, which was set to 50 ppm. To extract the data, we used the E. coli spectral library described above. PyProphet-cli, an extended version of PyProphet, optimally combined peptide query scores into a single discriminative score and estimated q-values using a semi-supervised algorithm. To assign the weight of each OpenSWATH subscore, we used a set of peptide peak groups subsampled from every run with a subsampling ratio of 0.07. The software was run in both the experiment-wide and the global context with a fixed lambda of 0.8, and the results of the experiment-wide mode were filtered at a 1% protein and peptide false discovery rate according to the global mode analysis. TRIC was applied to align the extracted and scored peak groups across all runs after these filtering steps. The resulting peptide-level and protein-level quantitative data matrices are available in Supporting Table S1.
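To make the role of a fixed lambda and of the 1% threshold more concrete, the following Python sketch computes q-values from a list of p-values using Storey's estimator of the null fraction. It is not PyProphet's code: it assumes that p-values for the peak groups are already available and that the fixed lambda acts as the tuning parameter of the null-fraction estimate; the function names and example p-values are hypothetical.

```python
def storey_pi0(p_values, lam=0.8):
    """Estimate the fraction of true nulls from p-values above lambda."""
    n_above = sum(1 for p in p_values if p > lam)
    return min(1.0, n_above / (len(p_values) * (1.0 - lam)))


def q_values(p_values, lam=0.8):
    """Return q-values in the same order as the input p-values."""
    m = len(p_values)
    pi0 = storey_pi0(p_values, lam)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each q-value is the
    # smallest FDR estimate over all thresholds at or above its p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for ten peak groups.
p = [0.001, 0.002, 0.5, 0.9, 0.95, 0.3, 0.004, 0.7, 0.85, 0.01]
q = q_values(p, lam=0.8)
accepted = [i for i, qi in enumerate(q) if qi < 0.05]  # illustrative cutoff
print(accepted)
```

With real data the cutoff would be 0.01, matching the 1% false discovery rate applied above; the larger cutoff here only keeps the toy example informative.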
Created on 1/22/20, 3:27 PM