Running MSstats using MSstats.csv from IonQuant
Using the .d folders (raw TIMS-TOF PASEF data) as input according to the FragPipe tutorial, IonQuant can generate a MSstats compatible file MSstats.csv
.
Given an experimental setup that looks like this in the ‘Select LC/MS Files’ tab of FragPipe:
The MSstats.csv
file output from IonQuant (via FragPipe) will look something like this:
This MSstats.csv
file can be read by MSstats without any conversion. The R command for installing MSstats
is:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MSstats")
Below is an example R script that reads and analyzes MSstats.csv
using MSstats
(make sure to set rootDir
to the directory containing your MSstats.csv file):
rm(list = ls())
library(stringr)
library(readr)
library(MSstats)
rootDir <- "folder_with_MSstats.csv" # Specify the path of the directory containing MSstats.csv.
print(str_c("Using IonQuant's result from ", rootDir))
# Read MSstats.csv file.
raw <- read_csv(str_c(rootDir, "MSstats.csv"), na = c("", "NA", "0"))
raw$ProteinName <- factor(raw$ProteinName)
raw$PeptideSequence <- factor(raw$PeptideSequence)
# Change root directory for MSstats
print(str_c("Root DIR: ", rootDir))
setwd(rootDir)
# Processing the data using MSstats
processedData <- dataProcess(raw, logTrans = 10)
# Downstream analysis...
Details of the functions and options can be found here.
With processedData
, users can perform various downstream analyses described in the MSstats user manual.
Convert processedData
into a protein intensity table (optional)
Although processedData
generated by MSstats is suitable for data analysis in R, it is not reader-friendly. Here we provide a R script that converts this data structure into a protein intensity table after running MSstats:
rm(list = ls())
library(stringr)
library(readr)
library(MSstats)
rootDir <- "folder_with_MSstats.csv" # Specify the path of the directory containing MSstats.csv.
print(str_c("Using IonQuant's result from ", rootDir))
# Read MSstats.csv file.
raw <- read_csv(str_c(rootDir, "MSstats.csv"), na = c("", "NA", "0"))
raw$ProteinName <- factor(raw$ProteinName)
raw$PeptideSequence <- factor(raw$PeptideSequence)
# Change root directory for MSstats
print(str_c("Root DIR: ", rootDir))
setwd(rootDir)
# Processing the data using MSstats
processedData <- dataProcess(raw, logTrans = 10)
print("Preparing a protein-intensity table.")
x <- processedData$RunlevelData
y <- data.frame(matrix(nrow=length(unique(x$Protein)), ncol=length(unique(x$SUBJECT_NESTED)) + 1))
colnames(y) <- c("Protein", unique(as.character(x$SUBJECT_NESTED)))
y$Protein <- unique(x$Protein)
for (protein in y$Protein) {
for (subject in unique(x$SUBJECT_NESTED)) {
if (length(x[x$Protein == protein & x$SUBJECT_NESTED == subject,]$LogIntensities) > 0) {
y[y$Protein == protein, match(subject, colnames(y))] <- x[x$Protein == protein & x$SUBJECT_NESTED == subject,]$LogIntensities
}
}
}
write_csv(y, str_c(rootDir, "MSstats_protein.csv"), na = "")
The protein intensity table (MSstats_protein.csv
) will then look like the following:
Protein | 1.1 | 1.2 | 1.3 | 1.4 |
---|---|---|---|---|
A0A024RBG1 | 2.268105 | 2.194517 | 2.199005 | 2.251258 |
A0A0B4J2D5 | 3.015404 | 2.995607 | 3.01179 | 2.984934 |
A0A0B4J2F0 | 3.068368 | 3.069178 | 3.005508 | |
A0A0B4J2F2 | 2.206007 | 2.215123 | 2.257591 | 2.185901 |
A0A0U1RRE5 | 2.26019 | 2.204316 | 2.305125 | 2.288413 |
A0A0U1RRL7 | 2.080483 | 2.135092 | 2.115185 | |
A0A140G945 | 3.064431 | 3.048989 | ||
A0AV96 | 2.540166 | 2.579872 | 2.566627 | 2.606213 |
A0AVF1 | 2.435175 | 2.507841 | 2.335089 | 2.35696 |
A0AVT1 | 2.59594 | 2.590422 | 2.586453 | 2.58392 |
A0FGR8 | 2.685545 | 2.636278 | 2.696507 | 2.720165 |
A0JLT2 | 2.173107 | 2.17204 | 2.197647 | 2.164165 |
A0JNW5 | 3.07027 | 2.975922 | 3.023312 | 3.014954 |
A0MZ66 | 2.868374 | 2.818923 | 2.861887 | 2.846367 |
A0PJW6 | 2.553568 | 2.534552 | 2.685835 | 2.567018 |
A1L0T0 | 2.751461 | 2.737689 | 2.780147 | 2.776778 |
A1L170 | 2.395757 | 2.386972 | 2.39856 | 2.447601 |
A1X283 | 2.561679 | 2.536692 | 2.57959 | 2.587461 |
A2RRP1 | 2.462482 | 2.431679 | 2.47807 | 2.450703 |
A2RTX5 | 2.257712 | 2.174976 | ||
A2RU67 | 3.001728 | 2.402563 | ||
A2VDF0 | 2.207056 | 2.295692 | 2.374141 | 2.262893 |
A3KMH1 | 2.446569 | 2.461708 | 2.471607 | 2.509326 |
A3KN83 | 2.62989 | 2.644234 | 2.6331 | 2.58684 |
A4D1E9 | 2.58176 | 2.537538 | 2.564693 | 2.571433 |
A4D1P6 | 2.293137 | 2.237853 | ||
A5A3E0 | 2.407162 | 2.3797 | 2.396964 | 2.407677 |
A5D8V6 | 2.334407 | 2.261135 | 2.269774 | 2.258641 |
A5PLN9 | 2.55918 | 2.881768 | 2.65032 | 3.034141 |
A5YKK6 | 2.62894 | 2.654115 | 2.620041 | 2.628946 |