Scans alignment

Introduction

When initially recorded, the mz axis may differ from one profile MS scan to the next. The first step in the transformation pipeline is to generate a common axis for all scans. This is needed by Finnee but, as will be shown, it also enables many interesting chemometric transformations. Every example on this page uses the CE-TOFMS data from a urine sample (mzML files with scans in profile or centroid spectrum format). The Finnee object was created using the following options:

myFinnee = Finnee('overwrite', 'tLim', [1 15])

which, after creation, gives the following information:

myFinnee = 

  Finnee with properties:

            FileID: 'ACID-198-rep1'
    DateOfCreation: 15-Dec-2017 11:32:02
            FileIn: 'C:\iBET\Data4Wiki\ACID-198-rep1.mzml'
          Datasets: {[1x1 Dataset]}
           Options: [1x1 struct]
          Path2Fin: 'C:\iBET\Data4Wiki\ACID-198-rep1.fin'
          MZMLDump: {1x70 cell}

The total ion profile (TIP) can be visualised using

myFinnee.Datasets{1}.TIP.plot

Figure: total ion profile (TIP)

Generating common mz axis

Dealing with data reduction

Ideally, the mz axis of any scan could be used as a template. However, to reduce the size of the original file, trailing zeros are often removed from the data, so the full axis has to be extrapolated from the data that remain. In Finnee this is done with the extrapolMZ function. It works with any scan, but it is recommended to use it on the most intense MS spectrum. With the data above, this is the spectrum recorded at 8.54 min, which can be obtained using

cSptr = myFinnee.Datasets{1}.getSpectra(8.54);

From this spectrum the axis can be extrapolated using

[mzAxis, r2, data4axis] = obj.extrapolMZ(n, Lim);

where n is the order of the polynomial (default 2) used to estimate the mz axis and Lim = [mzMin, mzMax] sets the limits of the new axis. The outputs are mzAxis, the extrapolated axis; r2, the goodness of fit of the polynomial; and data4axis, the data that were used to estimate the axis. The true limits of the mz interval are recorded in the Dataset object:

Lim = myFinnee.Datasets{1, 1}.MZlim

Lim =

   48.0020  686.5568

While the interval can be changed, it should stay within the dataset limits, especially when using a high-order polynomial. For example,

[mzAxis, r2, data4axis] = cSptr.extrapolMZ(3, [50 500]);

Here r2 is equal to 0.9999. The quality of the extrapolated axis can be verified by plotting data4axis:

scatter(data4axis(:,1), data4axis(:,2))

Figure: scatter plot of data4axis
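The idea of rebuilding a full axis from reduced data can be sketched with synthetic values. This is only an illustration under stated assumptions, not necessarily how extrapolMZ is implemented: it assumes that the step between neighbouring mz points varies smoothly with mz, fits that step with a polynomial, and accumulates the fitted steps. The calibration constants and the variable names (fullMZ, partialMZ, newAxis) are hypothetical.

```matlab
% Synthetic TOF-like axis (hypothetical calibration): mz is quadratic in the
% sampling index, so the step between points grows smoothly with mz.
idx    = (0:9999)';
fullMZ = (10 + 0.0012*idx).^2;          % spans roughly 100 to 484

% Simulate data reduction: only short runs of consecutive points survive.
keep      = mod(idx, 500) < 60;
partialMZ = fullMZ(keep);

% Step size versus mz, ignoring the large jumps across removed regions.
step  = diff(partialMZ);
mzLow = partialMZ(1:end-1);
isAdj = step < 3*median(step);
x = mzLow(isAdj);
y = step(isAdj);

% Fit the step size with a polynomial of order n and compute r2.
n  = 2;
p  = polyfit(x, y, n);
r2 = 1 - sum((y - polyval(p, x)).^2) / sum((y - mean(y)).^2);

% Rebuild a gap-free axis between the chosen limits by accumulating steps.
Lim     = [110 450];
newAxis = Lim(1);
while newAxis(end) < Lim(2)
    newAxis(end+1, 1) = newAxis(end) + polyval(p, newAxis(end));
end
```

Plotting scatter(x, y) together with polyval(p, x) gives a visual check of the fit, in the same spirit as the data4axis plot above. Keeping Lim inside the range covered by the data avoids evaluating the polynomial where it was never constrained.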

Adjusting all scans to the common axis

The Matlab interp1 function is used to align all spectra to the common axis. Several interpolation methods are available; with these data the 'linear' method gave the best results, although other approaches may be tested in the future. The function align2newMZ can be used to correct every spectrum and save the results in a new dataset:

myFinnee = myFinnee.align2newMZ(Id, newAxis)

where Id is the index of the dataset and newAxis is the new axis. newAxis can also be [], in which case the axis is estimated from the most intense spectrum with a polynomial of degree 2.

myFinnee = myFinnee.align2newMZ(1, [])

myFinnee = 

  Finnee with properties:

            FileID: 'ACID-198-rep1'
    DateOfCreation: 15-Dec-2017 11:32:02
            FileIn: 'C:\iBET\Data4Wiki\ACID-198-rep1.mzml'
          Datasets: {[1x1 Dataset]  [1x1 Dataset]}
           Options: [1x1 struct]
          Path2Fin: 'C:\iBET\Data4Wiki\ACID-198-rep1.fin'
          MZMLDump: {1x70 cell}
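As a minimal illustration of the interpolation step, the sketch below resamples two synthetic scans, recorded on slightly shifted axes, onto one common axis with interp1 and the 'linear' method. The axes, the Gaussian peak and the choice of 0 for points outside the recorded range are assumptions made for the example; they are not taken from the Finnee code.

```matlab
% Common axis and two scans recorded on slightly shifted axes (synthetic).
commonAxis = (100:0.01:101)';
axis1 = commonAxis + 0.003;
axis2 = commonAxis - 0.004;

% The same Gaussian peak, sampled on each scan's own axis.
peak  = @(mz) exp(-((mz - 100.5)/0.05).^2);
scan1 = peak(axis1);
scan2 = peak(axis2);

% Linear interpolation onto the common axis; points outside the recorded
% range are set to 0 instead of NaN.
aligned1 = interp1(axis1, scan1, commonAxis, 'linear', 0);
aligned2 = interp1(axis2, scan2, commonAxis, 'linear', 0);

% After alignment both scans share one axis and can be compared point by point.
[~, i1] = max(aligned1);
[~, i2] = max(aligned2);
maxDiff = max(abs(aligned1 - aligned2));
```

Once aligned, the two intensity vectors have the same length and the same mz value at each index, which is what makes point-by-point operations on spectra possible.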

Using the new dataset

This transformation yields MS spectra that all share the same mz axis. This will be used later to correct the whole dataset for baseline drift and background noise, but the aligned spectra can also be used, for example, to detect co-migration. The Pearson correlation coefficient is a single measure that quantifies the similarity between two series of points. It is central to many chromatographic techniques and is often used to estimate peak purity. In Matlab the Pearson correlation coefficient can be calculated using corrcoef. For example, to determine whether the peak at 8.54 min corresponds to one or more compounds, one can compare the spectra at the beginning and at the end of the peak:

spc1 = myFinnee.Datasets{2}.getSpectra(8.40);
spc2 = myFinnee.Datasets{2}.getSpectra(8.72);

It should be emphasised that we are now using Datasets{2}, where all scans have the same axis. However, to save space, the trailing zeros have been removed. To expand those spectra to the full axis the following commands should be used:

spc1 = myFinnee.Datasets{2}.xpend(myFinnee.Datasets{2}.getSpectra(8.40));
spc2 = myFinnee.Datasets{2}.xpend(myFinnee.Datasets{2}.getSpectra(8.72));

spc1 and spc2 are Trace objects:

spc1 = 

  Trace with properties:

          Title: 'XMS- MS scan from 8.40 to 8.40 min'
    FigureTitle: 'PRF=2 MMZ=true SPR=2'
      TraceType: 'PRF'
          AxisX: [1x1 Axis]
          AxisY: [1x1 Axis]
       Path2Fin: 'C:\iBET\Data4Wiki\ACID-198-rep1.fin'
      Precision: ''
       Path2Dat: ''
          Index: [0 0]
     StoredData: [60065x2 double]
    DataStorage: 'inTrace'
       AdiParam: {}
           Data: [60065x2 double]
        InfoTrc: [1x1 struct]
             bz: 0

The spectrum is stored in obj.Data, with the axis in the first column and the intensity in the second. Thus, to compare the two spectra we would use

corrcoef(spc1.Data(:,2), spc2.Data(:,2))

ans =

   1.0000   0.4490
   0.4490   1.0000
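To see how the off-diagonal value of corrcoef behaves, the following sketch compares synthetic spectra on a shared axis. The two compounds, their peak positions and their intensities are invented for the illustration. A spectrum that differs only in intensity gives a coefficient of 1, while a spectrum that also contains a second compound gives a clearly lower value.

```matlab
% Shared mz axis and two hypothetical compounds with different peak patterns.
mz     = (100:0.01:110)';
peakAt = @(c) exp(-((mz - c)/0.05).^2);
specA  = 5*peakAt(102) + 2*peakAt(105);    % compound A
specB  = 3*peakAt(103) + 4*peakAt(108);    % compound B

% Spectrum at the start of the peak, and two possible spectra at its end.
startOfPeak = 10*specA;
endPure     = 4*specA;                     % same compound, lower intensity
endMixed    = 4*specA + 6*specB;           % a second compound co-migrates

% corrcoef returns a 2x2 matrix; the off-diagonal term is the Pearson coefficient.
ccPure  = corrcoef(startOfPeak, endPure);
ccMixed = corrcoef(startOfPeak, endMixed);
rPure   = ccPure(1, 2);
rMixed  = ccMixed(1, 2);
```

Because the coefficient is insensitive to scaling, a change in concentration along the peak does not lower it; only a change in the spectral pattern does.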

A new representation can easily be built to check for peak purity:

PP = myFinnee.Datasets{1, 2}.AxisX.Data; % Time axis
for ii = 2:size(PP, 1)
    % Expand two consecutive scans to the full mz axis before comparing them
    spc1 = myFinnee.Datasets{2}.xpend(myFinnee.Datasets{2}.getSpectra(PP(ii-1, 1)));
    spc2 = myFinnee.Datasets{2}.xpend(myFinnee.Datasets{2}.getSpectra(PP(ii, 1)));
    cc = corrcoef(spc1.Data(:,2), spc2.Data(:,2));
    PP(ii,2) = cc(1,2); % Pearson correlation between scans ii-1 and ii
end

To superimpose the peak purity on the TIP:

TIP = myFinnee.Datasets{2}.TIP;
plotyy(TIP.Data(:,1), TIP.Data(:,2), PP(:,1), PP(:,2))
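The same scan-to-scan correlation can be tried without a Finnee object. The sketch below builds a synthetic data matrix (one column per scan, all on the same mz axis) containing a single compound plus random noise, then correlates each scan with the previous one. All values are invented for the illustration, and the first entry is left as NaN because the first scan has no predecessor.

```matlab
rng(1);                                  % reproducible noise
mz = (100:0.05:110)';                    % common mz axis
t  = (0:0.05:10)';                       % time axis, one scan per point

% One hypothetical compound: fixed spectral pattern, Gaussian elution at 3 min.
spec    = exp(-((mz - 102)/0.1).^2) + 0.5*exp(-((mz - 105)/0.1).^2);
elution = exp(-((t - 3)/0.3).^2);
scans   = 100*spec*elution' + randn(numel(mz), numel(t));

% Pearson correlation between each scan and the previous one.
purity = nan(numel(t), 1);
for ii = 2:numel(t)
    cc = corrcoef(scans(:, ii-1), scans(:, ii));
    purity(ii) = cc(1, 2);
end

% Indices of the peak apex and of a region that contains only noise.
[~, iPeak] = min(abs(t - 3));
[~, iBase] = min(abs(t - 8));
```

Calling plot(t, purity) shows values close to 1 where consecutive scans share the same spectral pattern and values scattered around 0 where only noise is present.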

