The Experiment
Mass spectrometry has become an integral analytical technique in natural product discovery, both in measuring accurate mass for chemical formula determination, to analyzing molecule fragmentation for structure elucidation and library searches.
I won’t go in to too much detail here but there are some important experimental considerations when approaching the analysis of mass spectrometry data.
For more information I’d recommended looking at learning material the big vendors usually have, such as this.
The Instrument
Ionizer
There are a number of ionizers that are in use in modern mass spectrometers, with electrospray ionization (ESI) being the most common in our field. Nethod of ionization is important to consider because it will effect which types molecules are ionized, how they are ionized, whether the molecules will remain largely intact or fragmented, and what types of adducts you can expect. For ESI instruments are often in run in positive (most often) or negative mode, and sometimes both (polarity switching) and you should be aware of this going into your analysis (or be prepared to extract the relevant metadata from the raw data).
Analyzer
“Mass spectrometry” is ultimately performed by the mass analyzer, the part that separates, filters, differentiates molecules based on mass to charge (and sometimes size/shape via ion-mobility). There are a number of analyzers on the market, with the most popular being quadrupoles, ion traps, orbitraps, time-of-flight, and combinations thereof. The type of analyzer is important to consider in the analysis as well, and the following should be thought about when approaching a new analysis.
- What is the resolving power of the anlyzer(s)? Plural because, for example, the m/z filter window of a quadrupole that filters into a Time-Of-Flight (TOF) may or may not impact your analysis.
- What mode was the instrument run in (e.g. for triple quads was it run in precursor ion scan, neutral loss scan, product ion scan and MRM/SRM mode?)
- Often analyzers will have their efficacy rated in FWHM (full width at half maximum) which is a measure of resolving power
- If you confuse resolving power with mass resolution you aren’t alone, there’s been much controversy over the years as they somewhat related. See this whitepaper by Agilent. Simply stated, resolving power measures how well you can separate two mass peaks in a spectrum and resolution is a measure of how “wide” your peaks are.
- What is the scan speed of your analyzer(s)?
- Some analyzers are very fast (i.e. they can analyze/separate many m/z per second), while others are slower. Some sacrifice sensitivity, accuracy, resolution, etc for higher scan speed. All things to be aware of.
Detector
While there are different detectors, this isn’t usually a concern during modern analyses. However, if you notice things like sensitivity being too high or low it could be good feedback to give to the instrument operator as it could be detector settings (though llikely sample concentration or ionization efficiency).
Another thing to note is that some instruments may have more than one detector and they may serve different purposes. For example, some Time-Of-Flight (TOF) instruments have both a “linear” detector and a “reflectron” detector that elongates the flight path allowing higher resolving power but lower m/z ceiling than the “linear” detector.
The Data
Raw data formats
Unfortunately different instrument vendors, and even different instruments from the same vendor, have their own unique data storage format. This is for a variety of good and bad reasons, the most convincing to me being that instruments with ever-increasing acquisition speeds and ever-increasing data size need faster/better software/hardware strategies to store data, which can provide a competitive advantage.
Raw data is llikely to come with file extensions (.wiff
, .d
, .raw/.RAW
, .lcd
, etc.) and some are locked in so that only the instrument vendor’s software can read the data.
Open-source data formats
Fortunately there are widely used, open, standard formats available. You will proabably encounter mzXML
and/or its newer version, mzML
; so, go with mzML
if you have the option. mzXML files have the file extension mzXML
and mzML
files have the file extension .mzXML
and .mzML
.
Some vendor software allows converting a file in proprietary data format to mzML, otherwise your best bet is llikely the program msconvert
available as part of the ProteoWizard software library. Unfortunately some vendor formats can only be converted on a Windows computer, a limitation of vendors only providing Windows-based DLLs (i.e. don’t complain to the ProteoWizard team about this).
msconvert
can be used from both its GUI or at the command line
For an example of how to use the command line you can take a look at this zip of a directory that contains a batch file that converts a large number of files at once https://ccms-ucsd.github.io/GNPSDocumentation/fileconversion/#data-conversion-easy
I haven’t had the chance to try them but supposedly there are some relatively new Docker containers that can successfully run msconvert. If you know how badly this was needed then you know how exciting this would be/is.
Going forware I will only cover mzML/mzXML as they are by far the most commonly encountered open formats in our field. Other formats can be seen at https://www.psidev.info/specifications; and MGF at http://www.matrixscience.com/help/data_file_help.html
Next
In the next post we will dive into what mzML actually looks like, what spectra look like, etc.