Spotlight on alternative frame coding: two long overlapping genes in Pseudomonas aeruginosa are translated and under purifying selection
The existence of overlapping genes (OLGs) with significant coding overlaps revolutionises our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the medically important Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes, as well as transcriptional signals in RNA sequencing data. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically highly restricted and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes.
Oligonucleotides and synthetic peptides. In total, eighteen optimal peptides for Olg1, Olg2, Tle3, and PA1383 were selected and purchased as isotopically labeled reference peptides (SpikeTidesL) from JPT Peptide Technologies. In each peptide, the C-terminal lysine (Lys8) or arginine (Arg10) residue was ¹³C- and ¹⁵N-labeled. The isotope-labeled peptides were not purified; their concentrations therefore represent only estimates.
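Because the heavy label sits only on the C-terminal lysine (Lys8: six ¹³C and two ¹⁵N) or arginine (Arg10: six ¹³C and four ¹⁵N), each reference peptide co-elutes with its endogenous counterpart but appears at a fixed mass offset. The sketch below computes that offset from standard isotope mass differences and monoisotopic residue masses. It assumes cysteines are carbamidomethylated and no other modifications are present; the peptide "PEPTIDEK" is a placeholder, not one of the eighteen selected peptides.

```python
# Heavy-minus-light isotope mass differences (Da).
C13_MINUS_C12 = 1.0033548
N15_MINUS_N14 = 0.9970349

# Label shift on the C-terminal residue: Lys8 = 13C6 15N2, Arg10 = 13C6 15N4.
LABEL_SHIFT = {
    "K": 6 * C13_MINUS_C12 + 2 * N15_MINUS_N14,  # about +8.0142 Da
    "R": 6 * C13_MINUS_C12 + 4 * N15_MINUS_N14,  # about +10.0083 Da
}

# Monoisotopic residue masses (Da); cysteine assumed carbamidomethylated.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.0105647
PROTON = 1.00727646688


def peptide_mz(sequence: str, charge: int, heavy: bool = False) -> float:
    """Precursor m/z of a peptide without variable modifications, optionally heavy-labeled."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if heavy:
        # The label sits on the C-terminal lysine or arginine only.
        mass += LABEL_SHIFT[sequence[-1]]
    return (mass + charge * PROTON) / charge


light = peptide_mz("PEPTIDEK", 2)
heavy = peptide_mz("PEPTIDEK", 2, heavy=True)
print(f"light {light:.4f}  heavy {heavy:.4f}  shift {heavy - light:.4f}")
```

At charge 2+, the ~8.014 Da Lys8 shift appears as a ~4.007 m/z offset between the light and heavy precursors.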
Cell lysis and protein digest for mass spectrometry. Cells were lysed in 100 µL absolute TFA (Sigma-Aldrich; 5 min, 55°C, shaking at 1,000 rpm) and neutralized with 900 µL 2 M Tris. Protein concentration was determined using Bradford reagent (B6916, Sigma-Aldrich). For offline high-pH reversed-phase (hpH RP) fractionation and for targeted proteomics, 75 µg and 20 µg of total protein, respectively, were reduced and alkylated (10 mM TCEP, 55 mM CAA; 5 min, 95°C). Samples were diluted 1:1 with water and digested with trypsin (enzyme-to-protein ratio 1:50, 30°C, overnight, shaking at 400 rpm); digestion was stopped with 3% formic acid (FA).
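Which peptides can serve as targets depends on how trypsin cuts each protein. The sketch below performs an in silico digest using the classic rule (cleave after K or R unless the next residue is P) and computes the trypsin amount implied by a 1:50 ratio, assumed here to be w/w. The length limits and missed-cleavage setting are illustrative defaults, not parameters reported for this study, and the test sequence is made up.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 0,
                   min_len: int = 6, max_len: int = 30) -> list[str]:
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Allow up to `missed_cleavages` internal cleavage sites to be skipped.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = sequence[bounds[i]:bounds[j]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


def trypsin_amount_ug(protein_ug: float, ratio: float = 50.0) -> float:
    """Enzyme amount (µg) for a 1:ratio enzyme-to-protein ratio (w/w assumed)."""
    return protein_ug / ratio


print(tryptic_digest("MKPAGRSTEKLLR", min_len=1))
print(trypsin_amount_ug(75), trypsin_amount_ug(20))
```

Under the w/w assumption, the 75 µg and 20 µg samples correspond to 1.5 µg and 0.4 µg of trypsin.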
High pH reversed-phase fractionation for targeted proteomics. Peptides from the 20 µg digest were loaded onto C18-packed 200 µL tips. A pH switch was performed using 25 mM ammonium formate (pH 10), and peptides were eluted in six fractions containing 0, 5, 10, 15, 25, and 50% acetonitrile (ACN), respectively. Fractions 1 and 5 and fractions 2 and 6 were combined, the solvent was evaporated from each of the resulting four samples (1+5, 2+6, 3, and 4), and peptides were dissolved in 2% ACN/0.1% FA.
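The fractionation and pooling scheme can be written down as a small data structure, which makes it easy to check that each of the six fractions ends up in exactly one of the four measured samples. The sketch below encodes the ACN steps and pools exactly as listed in the text; the helper names are hypothetical.

```python
# ACN content (%) of the six high-pH step-elution fractions, as listed in the text.
FRACTION_ACN = {1: 0, 2: 5, 3: 10, 4: 15, 5: 25, 6: 50}

# Concatenation scheme from the text: 1+5, 2+6, 3 and 4 are measured as four samples.
POOLS = [(1, 5), (2, 6), (3,), (4,)]


def each_fraction_used_once(fractions: dict[int, int], pools: list[tuple[int, ...]]) -> bool:
    """True if every fraction appears in exactly one pool."""
    used = [f for pool in pools for f in pool]
    return sorted(used) == sorted(fractions)


def describe_pools(fractions: dict[int, int], pools: list[tuple[int, ...]]) -> list[str]:
    """Human-readable label for each pool, listing the ACN steps it contains."""
    return [
        "+".join(str(f) for f in pool)
        + " (" + ", ".join(f"{fractions[f]}% ACN" for f in pool) + ")"
        for pool in pools
    ]


assert each_fraction_used_once(FRACTION_ACN, POOLS)
for label in describe_pools(FRACTION_ACN, POOLS):
    print(label)
```

Pairing an early with a late fraction is a common concatenation strategy in high-pH fractionation, intended to spread peptides more evenly across the pooled samples; the text does not state the authors' rationale.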
Targeted LC-MS/MS measurements. Targeted measurements using Parallel Reaction Monitoring (PRM) were performed with a 50-min linear gradient on a Dionex Ultimate 3000 RSLCnano system coupled to a Q-Exactive HF-X mass spectrometer (Thermo Fisher Scientific). The spectrometer was operated in PRM mode with positive ionization. MS1 spectra (360–1300 m/z) were recorded at a resolution of 60,000 using an AGC target value of 3×10⁶ and a MaxIT of 100 ms. Targeted MS2 spectra were acquired at a resolution of 60,000 with a fixed first mass of 100 m/z after HCD at 26% NCE, using an AGC target value of 1×10⁶, a MaxIT of 118 ms, and an isolation window of 0.7 m/z. For the PRM analysis of the growth phase samples, 18 OLG and mother gene peptides plus 12 retention time reference peptides (a subset of the Procal peptides synthesized by JPT) were targeted within a single PRM run using a 5 min scheduled retention time window. The cycle time was ~2.1 s, yielding ~10 data points per chromatographic peak.
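Scheduling matters because the cycle time grows with the number of targets whose retention time windows are open simultaneously, and the cycle time in turn sets how many points sample each peak. The sketch below finds the maximum number of overlapping scheduled windows with a sweep over window open/close events and converts a cycle time into points per peak. It assumes each 5 min window is centred on the expected retention time; the retention times in the test are invented. The ~21 s peak width implied by ~10 points at ~2.1 s is an inference, not a reported value.

```python
def max_concurrent_targets(scheduled_rts_min: list[float], window_min: float = 5.0) -> int:
    """Largest number of scheduled PRM windows open at the same time (sweep line)."""
    events = []
    for rt in scheduled_rts_min:
        events.append((rt - window_min / 2, 1))   # window opens
        events.append((rt + window_min / 2, -1))  # window closes
    # Tuples sort by time, then delta: at equal times closings (-1) come before
    # openings (+1), so windows that merely touch are not counted as overlapping.
    events.sort()
    open_now = best = 0
    for _, delta in events:
        open_now += delta
        best = max(best, open_now)
    return best


def points_per_peak(peak_width_s: float, cycle_time_s: float) -> float:
    """Number of MS2 data points sampled across one chromatographic peak."""
    return peak_width_s / cycle_time_s


print(max_concurrent_targets([10.0, 12.0, 20.0]))
print(round(points_per_peak(21.0, 2.1), 1))
```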
Selection and validation of target peptides. Isotope-labelled internal reference peptides were used for confident identification and quantification. Peptide selection was based on data-dependent acquisition (DDA) measurements of the deep proteome at OD600nm = 1. Peptides were selected according to intensity, location within the protein, Andromeda score, absence of modifications, and charge state. All isotopically labeled synthetic peptides were pooled, and targeted proteomic measurements (PRM) showed confident detection of all 18 peptides (MaxQuant score >90). Skyline-daily was used to build an experimental spectral library from the generated PRM data.
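The selection criteria can be sketched as a filter-then-rank step over DDA identifications. In the sketch below, the `Candidate` record, the score threshold of 90, the allowed charge states (2+ and 3+), and the cap of five peptides per protein are all illustrative assumptions rather than the study's exact rules. Location within the protein, which the text also lists as a criterion, is not modelled, and the peptide sequences in the test are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    protein: str
    sequence: str
    intensity: float
    andromeda_score: float
    modified: bool
    charge: int


def select_peptides(candidates: list[Candidate], per_protein: int = 5,
                    min_score: float = 90.0,
                    allowed_charges: tuple[int, ...] = (2, 3)) -> dict[str, list[Candidate]]:
    """Return up to `per_protein` peptides per protein, most intense first."""
    by_protein: dict[str, list[Candidate]] = defaultdict(list)
    for c in candidates:
        # Drop modified peptides, disallowed charge states and low-scoring identifications.
        if c.modified or c.charge not in allowed_charges or c.andromeda_score < min_score:
            continue
        by_protein[c.protein].append(c)
    return {
        protein: sorted(cs, key=lambda c: c.intensity, reverse=True)[:per_protein]
        for protein, cs in by_protein.items()
    }
```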
Targeted mass spectrometric data analysis. PRM data were analysed using Skyline-daily. Peak integration, transition interferences, and integration boundaries were reviewed manually, considering four to six transitions per peptide. To discriminate between positive and negative peptide detection, peptides were filtered by the similarity of the endogenous (light) fragment ion intensities to the experimental spectral library generated from the spike-in (heavy) peptides (“Library Dot Product” ≥0.8). Additionally, a correlation of fragment ion intensities between the light and heavy peptide (“DotProductLightToHeavy” >0.9) and a mass accuracy within ±20 ppm (“Average Mass Error PPM”) were required. Total protein intensity was computed for each sample by summing the intensities of all light peptides of that protein that were detected positive. Uniqueness of the peptides was assessed against the RefSeq database for P. aeruginosa PAO1.
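The detection criteria and the summation can be sketched as follows. Plain cosine similarity (normalized dot product) of transition intensity vectors stands in for Skyline's "Library Dot Product" and "DotProductLightToHeavy" scores, whose internal computation may differ. Treating a peptide's light intensity as the sum of its transition areas is also an assumption. The thresholds follow the text, while the dictionary layout and example values are hypothetical.

```python
import math


def normalized_dot_product(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two fragment-ion intensity vectors in the same transition order."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def detected_positive(light, heavy, library, ppm_errors,
                      min_library_dotp=0.8, min_light_heavy_dotp=0.9, max_abs_ppm=20.0) -> bool:
    """Apply the three detection filters to one light peptide."""
    mean_ppm = sum(ppm_errors) / len(ppm_errors)
    return (normalized_dot_product(light, library) >= min_library_dotp
            and normalized_dot_product(light, heavy) > min_light_heavy_dotp
            and abs(mean_ppm) < max_abs_ppm)


def total_protein_intensity(peptides: list[dict]) -> float:
    """Sum of light intensities over the protein's peptides that pass all filters."""
    return sum(
        sum(p["light"]) for p in peptides
        if detected_positive(p["light"], p["heavy"], p["library"], p["ppm_errors"])
    )
```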
We performed targeted proteomic measurements using isotopically labelled reference peptides. Based on our initial mass spectrometric data, we selected four to five peptides per target protein and purchased them in synthetic, stable isotope-labelled form. These heavy reference peptides were spiked into P. aeruginosa PAO1 samples taken at various growth time points (1 h, 2 h, OD600nm = 1, 4 h, 6 h, 8 h, and 24 h) and measured by the targeted proteomic method Parallel Reaction Monitoring (PRM).
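One way to inspect regulation across the growth time course is to express each protein's summed light intensity relative to a reference time point on a log2 scale. The sketch below does this. The choice of the 1 h sample as reference and the log2 fold-change representation are illustrative assumptions, not necessarily the authors' analysis. Time points without a positive intensity (for example, no positive detection) are skipped, and the intensities in the test are invented.

```python
import math

# Sampling points in the order given in the text ("OD1" = OD600nm of 1).
TIME_POINTS = ["1h", "2h", "OD1", "4h", "6h", "8h", "24h"]


def log2_fold_changes(intensity_by_time: dict[str, float],
                      reference: str = "1h") -> dict[str, float]:
    """log2(intensity / reference intensity) for each time point with a positive intensity."""
    ref = intensity_by_time[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {
        t: math.log2(intensity_by_time[t] / ref)
        for t in TIME_POINTS
        if intensity_by_time.get(t, 0.0) > 0
    }


example = {"1h": 1.0e6, "2h": 2.0e6, "OD1": 0.0, "24h": 2.5e5}
print(log2_fold_changes(example))
```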