Baseline correction

Here is a classic problem when acquiring spectrum-like data: the baseline.

The baseline can arise from various phenomena, but it generally carries no directly usable information, so many data analysts remove it. Unfortunately, their baseline corrections are usually tailored to their own data and cannot be applied directly to our case.

 

On Raman spectra

Using Python, I developed a very simple function that removes the baseline from spectrum-like data, meaning data with many points in which the signal appears as peaks (see the 'raw data' image). Here we have two spectra, one from a cell and the other from the background, and we want to remove the baseline to allow a better comparison between them.

The y-axis is the signal intensity while the x-axis is the Raman shift.

The baseline correction first performs a modified rolling mean: more specifically, the rolling mean is computed only on the values considered as background. To separate the background from the signal, an Otsu threshold is applied to the current window, and all values above the threshold (i.e., the peaks) are left out of the calculation. The baseline is then adjusted so that its minimum intensity is 0. This gives baseline 1, which is further processed by Savitzky-Golay smoothing, also performed with a rolling window. This step removes any residual spikes and yields baseline 2. Finally, the baseline is subtracted from the raw data.
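The steps above could be sketched roughly as follows. This is a minimal illustration and not the function from the GitHub repository: the names `otsu_threshold`, `rolling_background_mean` and `correct_baseline`, the 64-bin histogram, the default window of 101 points and the polynomial order of 3 are all assumptions made for the example. The step that sets the baseline minimum to 0 is left out because the text does not describe it in enough detail, and the edges are handled by simply truncating the window.

```python
import numpy as np
from scipy.signal import savgol_filter


def otsu_threshold(values, bins=64):
    """Otsu threshold: the cut that maximizes the between-class variance."""
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts)                      # points up to and including each bin
    w1 = w0[-1] - w0                            # points in the bins above
    s0 = np.cumsum(counts * centers)
    m0 = s0 / np.maximum(w0, 1)                 # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)      # mean of the upper class
    best = np.argmax(w0 * w1 * (m0 - m1) ** 2)
    return edges[best + 1]                      # upper edge of the last "background" bin


def rolling_background_mean(y, window):
    """Rolling mean that ignores values above the Otsu threshold of each window."""
    half = window // 2
    baseline = np.empty(len(y))
    for i in range(len(y)):
        chunk = y[max(0, i - half): i + half + 1]   # truncated at the edges
        background = chunk[chunk <= otsu_threshold(chunk)]
        # Fall back to the plain mean if the threshold excludes everything.
        baseline[i] = background.mean() if background.size else chunk.mean()
    return baseline


def correct_baseline(y, window=101, polyorder=3):
    """Return (corrected signal, smoothed baseline). Needs len(y) >= window."""
    y = np.asarray(y, dtype=float)
    window = window if window % 2 else window + 1   # keep the window odd
    baseline_1 = rolling_background_mean(y, window)
    baseline_2 = savgol_filter(baseline_1, window_length=window, polyorder=polyorder)
    return y - baseline_2, baseline_2
```

A call such as `corrected, baseline = correct_baseline(intensity, window=101)` returns both the corrected spectrum and the smoothed baseline, so the two can be plotted on top of the raw data for a visual check.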

The correction can be tuned through the window size used for the rolling mean: the window needs to be larger than the width of the peaks and smaller than the width of the baseline variations.

With both spectra corrected, we can more easily distinguish the peaks specific to the cell, while the background signal is mainly composed of the glass and water spectra.

This example can be processed further by subtracting the background spectrum from the cell spectrum.
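As a small illustration of that last step, here is one way the subtraction could look. The function name `subtract_background` and its arguments are hypothetical, and the sketch assumes both spectra are already baseline-corrected and that their Raman-shift axes are sorted in increasing order. The background is resampled onto the cell axis first, in case the two spectra were not sampled at exactly the same shifts.

```python
import numpy as np


def subtract_background(shift_cell, cell, shift_bg, background):
    """Subtract a background spectrum from a cell spectrum (both baseline-corrected)."""
    # Linear interpolation of the background onto the cell's Raman-shift axis.
    bg_on_cell_axis = np.interp(shift_cell, shift_bg, background)
    return np.asarray(cell, dtype=float) - bg_on_cell_axis
```

If both spectra share the same axis, the interpolation leaves the background unchanged and the result is a plain point-by-point difference.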

On proteome profile

It is also possible to use this baseline correction on other data, such as a proteomic profile from an SDS-PAGE gel.

A cell lysate was denatured and labeled with Cy3-NHS ester, then compared with whole-protein staining by Coomassie Brilliant Blue (CBB). These data come from my publication in Bioconjugate Chemistry.

We can see by eye that the profiles differ, especially in band intensity.

By subtracting the baseline and normalizing the signal intensity, it becomes possible to directly compare the two conditions. In the graph, the blue line is the Cy3, while the green line corresponds to the CBB.
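The text does not say which normalization was used, so the following is only one plausible choice: dividing each baseline-corrected lane profile by its own maximum, so that the strongest band of each lane is at 1. The name `normalize_to_max` is hypothetical.

```python
import numpy as np


def normalize_to_max(profile):
    """Scale a baseline-corrected profile so that its highest point equals 1."""
    profile = np.asarray(profile, dtype=float)
    peak = profile.max()
    if peak <= 0:
        raise ValueError("profile has no positive signal to normalize")
    return profile / peak
```

With both lanes scaled this way, differences in overall labeling intensity no longer dominate the plot, and only the relative band pattern is compared.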

In the end, there is little difference between the two conditions, which suggests that fluorescent labeling with Cy3 reveals a classical proteome profile.

Possible improvement and code

It is still possible to improve this function, mainly through better handling of the edges of the spectra. Currently, the edges are simply dealt with by shrinking the window. One possible improvement would be to complete the window on each side with the last value or, with more difficulty, to extrapolate the spectrum.
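The first option, completing the window with the last value, could be sketched with `numpy.pad` in `"edge"` mode. This is only an illustration of the idea, not code from the repository, and the helper name `pad_edges` is hypothetical.

```python
import numpy as np


def pad_edges(y, window):
    """Repeat the first and last values so every point gets a full-size window."""
    half = window // 2
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return padded, half


# The full window centered on point i of the original data is
# padded[i : i + 2 * half + 1].
padded, half = pad_edges([1.0, 2.0, 3.0], window=5)
first_window = padded[0: 2 * half + 1]
```

Repeating the edge value is simple, but it flattens the baseline estimate near the ends of the spectrum; extrapolation would avoid that at the cost of more assumptions about the shape of the data.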

In any case, the following link gives access to my GitHub, with the baseline function and the Raman data. These data were acquired by Arnaud Germond, RIKEN, in 2018.

Do not hesitate to try it and test it; if any trouble arises, contact me.

If you use this for a publication or anything else, just put my name in the acknowledgments. You can also contact me if you would like me to adapt this function to a specific case.
