FragPipe can be downloaded here. Follow the instructions on that same Releases page to launch the program.
FragPipe runs on Windows and Linux operating systems. While very simple analyses may only require 8 GB RAM, large scale/complex analyses or timsTOF data will likely need 24 GB memory or more. Free disk space is needed to run FragPipe analyses and save reports, typically 20-50% of spectral file size for non-ion mobility data. Disk space requirements for quantification of timsTOF data are greater, ~60% spectral file size if .d files are uncompressed, but up to 250% if Bruker’s compression function has been used.
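As a rough planning aid, the disk space figures above can be turned into a quick upper-bound estimate. This is only a sketch: the function name and the data-type keys are hypothetical, and the percentages are the approximate values quoted in the text, not limits enforced by FragPipe.

```python
# Upper-bound fractions of spectral file size, taken from the paragraph above.
# The dictionary keys are illustrative names, not FragPipe settings.
DISK_FRACTION_UPPER = {
    "non_ion_mobility": 0.50,      # typically 20-50% of spectral file size
    "timstof_uncompressed": 0.60,  # ~60% when .d files are uncompressed
    "timstof_compressed": 2.50,    # up to 250% with Bruker compression
}


def estimate_free_disk_gb(spectral_size_gb, data_type):
    """Return an approximate upper bound (in GB) on free disk space needed."""
    return spectral_size_gb * DISK_FRACTION_UPPER[data_type]
```

For example, `estimate_free_disk_gb(40, "timstof_compressed")` suggests reserving about 100 GB for 40 GB of compressed .d files.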
Complete workflows are available for a variety of experiment types. We recommend starting your analysis with a built-in workflow, which can then be customized and saved for future use. For partial processing (e.g. to save time upon reanalysis), steps can be skipped by unchecking the corresponding boxes. Before you get started, make sure your LC-MS file format is compatible with the workflows you want to perform (for Thermo data, we recommend converting .raw files to mzML):
Linux users: please note that Mono must be installed to directly read Thermo .raw files.
- Configure FragPipe
- Setting up FragPipe on remote Linux server (with X forwarding)
- Select workflow and add spectral files
- DIA-Umpire SE
- Specify a sequence database
- Configure MSFragger search
- Label-free quantification
- Isobaric quantification
- Post-translational modifications
- Spectral library generation
- Run FragPipe
When FragPipe launches, the first tab in the window (‘Config’) will be used to configure the program.
- Connect FragPipe to a MSFragger .jar program file. If you already have such a file downloaded, use the ‘Browse’ button to select it or ‘Update’ to upgrade to the latest version. If you have not downloaded MSFragger before, use the ‘Download’ button. (MSFragger installation help)
- Connect FragPipe to a Philosopher program file. If you already have it downloaded, select ‘Browse’, otherwise select ‘Download’. (Philosopher installation help)
- Optional: Python is needed to perform database splitting (necessary in complex searches/low memory situations) and spectral library generation. If you already have Python 3 or greater plus a few additional packages installed (numpy, pandas, Cython, and msproteomicstools), use ‘Browse’ to locate your python.exe file. (Python installation help)
For more help, see the full tutorial on FragPipe configuration.
Select workflow and add spectral files
In the ‘Workflow’ tab:
Choose the workflow you want to use and press ‘Load’. Use Default workflow for simple conventional (closed) searches. A number of common workflows (including glyco) are provided. You can also customize and save workflows for future use (all workflows are stored in the FragPipe ‘workflows’ folder). You can even share your workflow files with other FragPipe users.
Set the amount of memory & number of logical cores to use.
Set ‘Regular MS’ for non-ion mobility data, and ‘IM-MS’ for Bruker timsTOF PASEF data.
Drag & drop LC/MS files into the window or select ‘Add files’ or ‘Add Folder Recursively’ (to add all files in a folder, including those in subfolders). Specify the appropriate labels for the replicates/fractions in your experiment.
Notes about timsTOF data: With timsTOF PASEF data we recommend using raw PASEF files (.d files). You can simply select the .d folder as the raw file. If you have already run MSFragger on the .d files, make sure the .mzBIN files resulting from that analysis are in the same directory as the .d files to speed up the analysis. If you don’t need to perform quantification, you can use .mgf files (which can optionally be generated by Bruker’s software immediately after data acquisition is completed) instead of raw .d files.
Once you’ve loaded your spectral files, annotate your data to specify Experiments and Replicates, which determines how your PSM/peptide/protein etc. reports will be generated:
Leave the ‘Experiment’ and ‘Replicate’ fields blank. Use this option if you want to analyze all input files together and generate a single merged report (including building a combined spectral library from all input data).
Indicate the ‘Experiment’ and ‘Replicate’ for each input file as shown below, where each replicate of two experimental conditions is composed of two fractions. Different fractions from the same sample should have the same ‘Experiment’/‘Replicate’ name.
Note: If you would like to use MSstats for downstream statistical analysis of FragPipe-generated reports, the ‘Replicate’ ID (e.g., 1, 2, 3, and 4 in the above table) should not be reused by different replicates from different experiments. However, if each pair of ‘Control’ and ‘Treatment’ is from the same study subject, you should use the same ‘Replicate’ ID for the corresponding ‘Control’ runs and ‘Treatment’ runs (a detailed discussion can be found here):
where ‘run_name_1.mzML’, ‘run_name_2.mzML’, ‘run_name_5.mzML’, and ‘run_name_6.mzML’ are controls and treatments from the same study subject; ‘run_name_3.mzML’, ‘run_name_4.mzML’, ‘run_name_7.mzML’, and ‘run_name_8.mzML’ are controls and treatments from another study subject.
When analyzing AP-MS and related data (e.g. BioID) for compatibility with the Resource for Evaluation of Protein Interaction Networks (REPRINT), ‘Experiment’ names should be written as follows:
Negative controls: Put Control (or CONTROL) in the Experiment column, and label each biological replicate with a different replicate number.
Bait IPs: Use [GENE]_[condition] format to describe the experiments, where [GENE] is the official gene symbol of the bait protein, e.g. HDAC5. If there are multiple conditions for the same bait protein (e.g. mutant and wt), you can add the ‘condition’ part to tell them apart.
Note: All negative controls should be labeled the same, as ‘Control’, even if you have negative controls generated under different conditions or in different cell lines.
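The naming rules above can be checked before loading files into FragPipe. The following is a minimal sketch, not part of FragPipe or REPRINT: the function name is hypothetical, and the regular expression assumes gene symbols and conditions consist only of letters, digits, and hyphens.

```python
import re

# Assumed shape of a bait name: GENE or GENE_condition (letters, digits, hyphens).
BAIT_PATTERN = re.compile(r"^[A-Za-z0-9-]+(_[A-Za-z0-9-]+)?$")


def classify_experiment_name(name):
    """Classify an 'Experiment' value as 'control', 'bait', or 'invalid'."""
    if name in ("Control", "CONTROL"):
        return "control"
    # Reject other spellings of "control" so they are not mistaken for a bait.
    if name.lower() == "control":
        return "invalid"
    if BAIT_PATTERN.match(name):
        return "bait"
    return "invalid"
```

Running this over the ‘Experiment’ column and reviewing anything reported as ‘invalid’ can catch typos such as a lowercase ‘control’ or a space in place of the underscore.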
Note: When the files are annotated with a non-empty ‘Experiment’ and/or ‘Replicate’ field (as described above), the FragPipe multi-experiment workflow is used, which includes running the Philosopher Abacus command to generate combined summary reports at the protein and (optionally) peptide levels. Abacus is run with the ‘--reprint’ option, generating reprint-spc.tsv (spectral count-based) and reprint-int.tsv (intensity-based) files. These files can be uploaded to REPRINT for interaction scoring using SAINT or SAINTexpress and downstream visualization of the resulting interaction network.
For TMT/iTRAQ analysis, spectral files should be in mzML format. Raw format is currently not supported.
TMT/iTRAQ experiments typically consist of one or more ‘plexes’ (multiplexed samples), each composed of multiple spectral files (from prefractionation). Use ‘Experiment’ to denote spectral files from the same plex while leaving the ‘Replicate’ column empty. We recommend organizing data in folders, one for each plex. E.g. if you have 2 TMT plexes, with 2 spectral files (peptide fractions) in each, you can create a folder (e.g. named ‘MyData’) containing two subfolders (e.g. ‘TMT1’ and ‘TMT2’), each containing the corresponding mzML files. Load data by clicking Add folder recursively and selecting the ‘MyData’ folder, then assign files to Experiments/Groups By parent directory, resulting in the following spectral file annotation:
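The effect of assigning experiments ‘By parent directory’ can be pictured with a short sketch. This is an illustration of the idea only, not FragPipe's own code; the function name and the file names in the usage note are hypothetical.

```python
from pathlib import PurePath


def annotate_by_parent_directory(paths):
    """Return (path, experiment, replicate) rows for each spectral file.

    The experiment is the name of the folder that directly contains the file;
    the replicate is left empty, as recommended above for TMT/iTRAQ data.
    """
    return [(p, PurePath(p).parent.name, "") for p in paths]
```

With the folder layout described above, `annotate_by_parent_directory(["MyData/TMT1/f1.mzML", "MyData/TMT2/f1.mzML"])` assigns the first file to experiment ‘TMT1’ and the second to ‘TMT2’.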
Run DIA-Umpire SE
DIA-Umpire’s signal extraction module can now be used through FragPipe. Please note that this tool only accepts the mzXML file format. To use the SE module, select the ‘Enable DIA-Umpire’ checkbox on the Config tab, then load the ‘DIA-Umpire_SpecLib’ workflow from the Workflow tab. For more information, see the Signal Extraction Module section of the DIA-Umpire manual. Specify the path to the MSConvert binary file (can be downloaded here). The ‘Default config file’ path can be left blank; only provide this configuration file if you are using advanced parameters not shown in the GUI.
Specify a protein sequence database
Protein database files need to follow specific formatting. If you haven’t made a database file using FragPipe/Philosopher before, select ‘Download’ to fetch one from UniProt. Specify the download location, then choose your options and select an organism (use the uniprot proteome ID to specify your own, e.g. ‘UP000000625’ for E. coli). We generally recommend using ‘Reviewed’ subset of UniProt. If needed, add iRT sequences (e.g. if you are building a spectral library for DIA analysis and added iRT peptides to your samples).
You can use ‘Browse’ to select a FASTA file from a previous FragPipe/Philosopher analysis.
If you need to use a custom FASTA database, use the ‘Add decoys’ button to add decoys (common contaminants can also be added). Then click ‘Browse’ to navigate to your updated custom FASTA file. After decoys are added, 50% of the entries should contain the decoy tag. Some additional information is also provided in the ‘Quick start with protein sequence databases’ section.
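The 50% decoy expectation can be verified with a few lines of code before starting a search. This sketch assumes the decoy tag is the prefix `rev_` at the start of the header; check the tag shown in your FragPipe database tab and pass it explicitly if it differs. The function name is hypothetical.

```python
def decoy_fraction(fasta_lines, decoy_tag="rev_"):
    """Return the fraction of FASTA entries whose header starts with decoy_tag.

    fasta_lines is any iterable of lines, e.g. an open file handle.
    The default tag 'rev_' is an assumption; adjust it to match your database.
    """
    headers = [line for line in fasta_lines if line.startswith(">")]
    if not headers:
        return 0.0
    decoys = sum(1 for header in headers if header[1:].startswith(decoy_tag))
    return decoys / len(headers)
```

A result close to 0.5 is consistent with a database that has one decoy per target entry; a result of 0.0 suggests decoys have not been added yet.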
Configure MSFragger search
In the ‘MSFragger’ tab, check that the search parameters are suitable for your analysis. You can choose to save a customized parameter file to load for future use, or save the entire workflow (from either the ‘Workflow’ or the ‘Run’ tab).
Note about calibration and optimization: The ‘Calibration and Optimization’ option is set, by default, to “Mass Calibration, Parameter Optimization”. It will effectively perform multiple MSFragger searches with different parameters, selecting the optimal settings. In practice, it results in a 5-10% improvement in the number of identified PSMs, at the expense of increased search time. Consider changing this option to “Mass Calibration” or even “None”, especially if you already know your data (e.g. from previous searches of the same or similar files) and can adjust the corresponding MSFragger parameters (fragment tolerance, number of peaks used, intensity transformation) manually, if needed.
Note about highly complex searches: For non-specific searches or for searches with many variable modifications, you may need to use the database splitting option, which requires an installation of Python.
Note about custom enzymes: To use an enzyme not listed in the digestion rules drop-down menu, enter the custom cleavage rules but be sure to set the ‘Enzyme name’ field to ‘nonspecific’. Select ‘ENZYMATIC’ from the ‘Cleavage’ drop-down menu.
If you loaded one of the common workflow files provided with FragPipe, or previously updated the downstream parameters when setting MSFragger search parameters, you can skip to the next section. You can also load default downstream processing parameters by selecting the appropriate ‘Load defaults’ button. In most cases, search results from MSFragger must be filtered by PeptideProphet and ProteinProphet.
For open search workflows, select Crystal-C to remove open search artifacts and improve the interpretability of your results (Note: at present, Crystal-C does not support .d files, and will be disabled by FragPipe when using .d as input).
LFQ: Label-free Quantification
To perform label-free quantification, make sure Label-Free Quantification is selected (note that analyses can be performed without any quantification, in which case spectral counts will be reported).
IonQuant is the default LFQ tool in FragPipe. A match-between-runs (MBR) option can be turned on for closed search workflows. Min ions (default = 2) controls how many quantifiable ions are required for protein-level quantification. Select the Protein quant option: either top-N (e.g. summation of the top 3 most intense peptides) or the MaxLFQ approach. The latter is recommended.
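The top-N option mentioned above amounts to a simple summation. The sketch below illustrates that arithmetic only; it is not IonQuant's implementation, and the function name is hypothetical.

```python
def top_n_protein_intensity(peptide_intensities, n=3):
    """Sum the n most intense peptide intensities for one protein."""
    return sum(sorted(peptide_intensities, reverse=True)[:n])
```

For a protein with peptide intensities 5, 1, 9, and 3, the top-3 value is 9 + 5 + 3 = 17. MaxLFQ uses a different, ratio-based procedure that is not shown here.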
Note: IonQuant provides a lot of flexibility in how identifications are transferred between runs with MBR. Two of the key parameters controlling MBR are ‘Min MBR correlation’ and ‘MBR top runs’. The Min MBR correlation parameter allows MBR only between runs with an overlap-weighted correlation (Spearman correlation of (retention time, intensity, and ion mobility) * overlap in IDs) above the specified threshold (0.5 by default). In addition, MBR top runs is applied to allow transfer of ions only from the N most highly correlated runs (3 by default) that are above the ‘Min MBR correlation’.
The optimal choice of MBR parameters depends on the experimental design. For example, in an AP-MS experiment with three replicates of Bait protein and three replicates of Negative Controls, one may want to set the ‘MBR top runs’ parameter to 2, so only runs of the same kind can be used as donor runs for MBR. As a result, MBR will be performed only between Bait IP runs, or between the Control runs, but not between the two groups.
If you want to allow transfer between all runs in the dataset, set ‘MBR top runs’ to a large value (larger than the number of runs in the dataset) and set ‘Min MBR correlation’ to 0.
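How the two parameters interact can be sketched as a filter followed by a ranking. This is an illustration under stated assumptions, not IonQuant's code: the function name and run names are hypothetical, the correlations are taken as already computed, and a run exactly at the threshold is treated as eligible.

```python
def select_donor_runs(correlations, min_correlation=0.5, top_runs=3):
    """Pick donor runs for one acceptor run.

    correlations maps each candidate donor run to its overlap-weighted
    correlation with the acceptor. Runs below min_correlation are dropped,
    then the top_runs most correlated of the remainder are kept.
    """
    eligible = [(run, c) for run, c in correlations.items() if c >= min_correlation]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return [run for run, _ in eligible[:top_runs]]
```

In the AP-MS example above, a Bait run whose two sibling replicates correlate best with it would, with ‘MBR top runs’ set to 2, receive transfers only from those two runs, even if a Control run also passes the correlation threshold.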
MBR is also controlled using FDR. We recommend 0.01 (i.e. 1%) ion-level FDR (default value). However, to allow more transfers (at the risk of introducing more quantification errors), the FDR threshold can be relaxed, e.g. to 0.05 (5%).
Isobaric Labeling-Based Quantification
To perform isobaric labeling-based quantification (TMT/iTRAQ), make sure Label-Free Quantification is selected.
- Select a labeling reagent (e.g. TMT10, TMT6, iTRAQ4, etc).
- For each experiment as set in the ‘Workflow’ tab, select ‘Edit/Create’ Sample/Channel Annotation to assign sample information to each TMT/iTRAQ channel, or ‘Browse’ to load an existing ‘annotation.txt’ file.
In the annotation pop-up window:
- Load the selected TMT/iTRAQ channels.
- Provide the experiment/replicate information for each channel.
Annotation files will be named ‘annotation.txt’ and saved in each folder.
Note: Instead of naming samples/channels in FragPipe using Edit/Create, you can make ‘annotation.txt’ files in advance, and FragPipe will load them automatically if they are in the same folder as the corresponding mzML files. When creating these files, make sure the values in the first column (channel) and the second column (sample) are separated by a space, not a tab or any other character.
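A pre-made annotation file can be produced with a small helper like the one below. It is a sketch only: the function name, the channel labels, and the sample names in the usage note are illustrative, and rejecting whitespace inside names is an assumption made here because the space is the column separator.

```python
def format_annotation(channel_to_sample):
    """Build the text of an annotation.txt file: one 'channel sample' pair per line.

    Columns are separated by a single space, as required above.
    """
    lines = []
    for channel, sample in channel_to_sample.items():
        for value in (channel, sample):
            # Assumption: whitespace inside a name would make the columns ambiguous.
            if any(ch.isspace() for ch in value):
                raise ValueError(f"whitespace not allowed in {value!r}")
        lines.append(f"{channel} {sample}")
    return "\n".join(lines) + "\n"
```

For example, `format_annotation({"126": "pool1", "127N": "sample_A"})` yields two lines, `126 pool1` and `127N sample_A`, which can be written to ‘annotation.txt’ in the folder holding that plex's mzML files.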
Note: If you have multiple plexes and added a common reference sample to each plex for bridging purposes, label these common reference samples as commonprefix_plexnumber (e.g. pool1, pool2, etc). If you want to use this common reference as the basis for computing the TMT/iTRAQ ratios for each PSM (within TMT-Integrator), select ‘Define reference: Reference sample’, and enter the text keyword describing the common reference channel (e.g. ‘pool’) that matches your naming scheme. Alternatively, select the Virtual Reference approach if you do not have a reference sample. With the virtual reference approach, individual channel intensities for each PSM will be converted to ratios by dividing each channel intensity by the average intensity across all channels in that PSM.
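The virtual reference calculation described above is simple arithmetic and can be sketched for a single PSM as follows. This illustrates the description in the text only, not TMT-Integrator's actual code (which may, for instance, work on log-transformed values); the function name is hypothetical.

```python
def virtual_reference_ratios(channel_intensities):
    """Divide each channel intensity of one PSM by the mean across its channels."""
    if not channel_intensities:
        raise ValueError("no channel intensities given")
    mean_intensity = sum(channel_intensities) / len(channel_intensities)
    if mean_intensity == 0:
        raise ValueError("all channel intensities are zero")
    return [value / mean_intensity for value in channel_intensities]
```

For a PSM with channel intensities 10, 20, and 30, the mean is 20, so the ratios are 0.5, 1.0, and 1.5.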
For open search-based workflows, PTM-Shepherd summarizes delta masses and provides reports on residue localization, retention time similarity, and more.
Spectral library generation
Spectral libraries can be generated within closed search-based workflows. A library will be generated for each experiment specified in the ‘Workflow’ tab. Experiments must contain more than one spectral file.
When building a library from fractionated data, using one of the fractions for reference retention time (RT) calibration is not recommended. Instead, select ciRT for human samples or iRT spike-in peptides for other organisms if possible.
Note: To use EasyPQP, you will need to 1) install Git if you don’t already have it, then 2) open an Anaconda Prompt command line window and run these two commands:
pip uninstall --yes easypqp
pip install git+https://github.com/grosenberger/easypqp.git@master
- Browse for the folder where you would like the search results to be written.
- Press ‘RUN’ to begin the analysis! See the MSFragger wiki, FAQ, and previous questions on GitHub for more help.