MSL-ST: Development of Mass Spectral Library Search Tool to Enhance

Compound Identiﬁcation

Teodora Gerasimoska

1 a

, Milka Ljoncheva

2,3 b

and Monika Simjanoska

1 c

Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University,

Rugjer Boshkovikj 16, 1000 Skopje, N. Macedonia

zef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia

International Postgraduate School Jo

zef Stefan, Jamova Cesta 39, 1000 Ljubljana, Slovenia

Keywords:

Mass Spectral Library, Batch Search Tool, Mass Spectrometry.

Abstract:

Identiﬁcation of new organic compounds through suspect screening (SS) and non-targeted analysis (NTA)

became the most challenging task in environmental and metabolomics research in the recent two decades.

Identiﬁcation of thousands of organic compounds is performed using the recent technology advancements in

chromatography-mass spectrometry as the core analytical platform, assisted by multitude of cheminformatics-

assisted approaches. As many of those approaches rely on mass spectral libraries (MSLs) search, the avail-

ability of comprehensive MSLs with engines for batch search and export of MS data and batch search engines

for simultaneous search and export of MS data from multiple MSLs is of crucial importance. In lack of such,

analysts perform this step in a laborious, time-consuming manual manner, importing signiﬁcant risk of com-

pound misidentiﬁcation.

This paper presents MSL-ST, the ﬁrst tool for automated batch search and storage of MS spectra that uses

two of the largest publicly available MSLs as data source, the MassBank of North America (MoNa) and the

MassBank of Europe. MSL-ST assembles large amount of MS data in an automated, time- and cost-effective

manner in a format which allows its further processing, especially for the purpose of compound identiﬁcation.

The tool, accompanied with user manual, is publicly available on GitHub.

1 INTRODUCTION

The typical targeted analysis of organic compounds,

based on the identiﬁcation of the presence and de-

termination of the concentration of a predetermined

list of organic compounds is the golden standard in

all monitoring analyses of organic compounds. De-

spite it, over the last two decades, SS and NTA

have evolved into effective screening strategies for

“known unknowns” and ”unknown unknowns” com-

pounds in complex environmental samples, respec-

tively. In addition to the challenge of developing sen-

sitive generic analytical methods able to identify as

many analytical signals as possible that correspond

to organic compounds, the next major challenge is

to identify the compounds to which these analytical

signals correspond. Appropriate data processing and

https://orcid.org/0000-0001-9923-6483

https://orcid.org/0000-0001-6664-0287

https://orcid.org/0000-0002-5028-3841

compound identiﬁcation tools have to be employed

for this task in order to avoid false positive or false

negative identiﬁcations, most frequently due to the

presence of interfering singals. Acquisition of sig-

nals of thousands of compounds in a single environ-

mental or biological sample is achieved by employ-

ing two analytical platforms: nuclear magnetic res-

onance (NMR) and gas chromatography (GC) or liq-

uid chromatography (LC) coupled to mass spectrome-

try (MS). While the ﬁrst analytical platform performs

nondestructive and noninvasive sample analysis with

concentration-dependent sensitivity, GC/LC-MS an-

alytical platforms offer sensitive detection of thou-

sands of analyte signals, accompanied with various

established cheminformatics-assisted approaches for

their annotation and identiﬁcation. High resolution-

accurate mass MS (HR/AM-MS) became the indis-

pensable analytical platform for accurate and reliable

compound identiﬁcation. GC and LC are the analyt-

ical techniques used for selective compound separa-

tion, that, when coupled to various HR/AM-MS ana-

Gerasimoska, T., Ljoncheva, M. and Simjanoska, M.

MSL-ST: Development of Mass Spectral Library Search Tool to Enhance Compound Identiﬁcation.

DOI: 10.5220/0010424101950203

In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS, pages 195-203

ISBN: 978-989-758-490-9

195

lyzers with high mass accuracy, resolving power and

sensitivity and wide linear range ensure reliable com-

pound identiﬁcation across a wide mass and concen-

tration range.

One of the main applications of cheminformat-

ics is generating methods for storage and retrieval of

chemical structures of compounds. Effective search

through structural information for compounds in-

volves the use of information tools (data search and

machine learning), as well as knowledge in the ﬁeld

of graph theory (or more precisely, chemical graph

theory). The automated retrieval and storage of chem-

ical, structural compound information is an important

step that would facilitate and speed up the process

that is currently being performed manually. In recent

decade, multitude of cheminformatics approaches are

developed that perform compound identiﬁcation us-

ing MS data as initial input data. Numerous tools,

such as MOLGEN-MS (Schymanski et al., 2008), the

GMD algorithm (Hummel et al., 2010), FT-BLAST

(Rasche et al., 2012) and NEIMS (Wei et al., 2019)

include MSL(s) search of the MS spectra of the candi-

date compound, followed by comparison of the mea-

sured MS spectra of the unknown compound with

the MS spectra of the candidate compounds con-

tained in the searched MSLs. As NTA often results

with identiﬁcation of hundreds to thousands of an-

alyte signals, identical is the number of candidate

compounds for which MS data should be searched

across one or multiple MSLs. The automation of

the MSL(s) search, as well as improvements in spec-

trum search tools in terms of user interface, search

algorithms, speed and IT support are the future chal-

lenges addressed in this research. Literature reviews

from the last decade indicate difﬁculties that persist

to this day. Namely, in order to enable the develop-

ment of data-driven algorithms for identiﬁcation of

small molecules, it is necessary that all sets of MS

spectra are available for public use through appro-

priate MSLs. The fact that most MSLs do not al-

low data download, ie. batch data download suggests

that for optimal use of MSL data, research commu-

nities should embrace new technologies and mecha-

nisms that support interoperability, and promote the

advancement of an ”open data sharing” culture. They

would ﬁnd even more frequent use as input data on

several ML models (Scheubert et al., 2013). The en-

tries of spectral data in MSLs is often accompanied by

rich metadata annotations, including compound struc-

ture, name, molecular descriptors (InChI, InChIKey,

SMILES), instrument type, ionization mode, opera-

tion conditions (e.g. GC oven temperature program,

LC method, collision energy etc.), adduct ion type and

product ion annotations. An extensive overview of the

literature on MSLs and the identiﬁcation of peptides

and small molecules using cheminformatics-assisted

approaches indicates that the lack of ability to search

and retrieve batch mass spectra from publicly avail-

able MSLs signiﬁcantly impedes the development of

new ML models for predicting molecular mass, MF

and / or the structure of small organic compounds,

thus preventing the identiﬁcation of new organic com-

pounds. Hence the main impetus for the development

of an MSLs search tool, used for the identiﬁcation

of organic compounds, which has the option of batch

search of mass spectra for more data and their auto-

mated retrieval.

The paper ﬁrst discusses some of the notable tools

related to the problem at hand (section 2), moves for-

ward to the presentation of the main data sources and

methods used for the tool development (Section 3)

and of the comprehensive functionality of the tool

(Section 4) and concludes the directions of further de-

velopment (section 5).

2 RELATED WORK

Searching and providing high quality chemical infor-

mation from multiple data sources is an extreme chal-

lenge, but also essential before performing SS and

NTA, given that the direct approach to identifying

compounds is to search for one or more MSLs. Chal-

lenges for chemical DBs of compounds include at-

taching high quality data and thus providing access to

chemical structures and related metadata (experimen-

tal/predicted properties, toxicity data, literature data,

etc.). The publication of chemical metadata with an

appropriate molecular descriptor (eg SMILES, InChI,

InChI Key, PubChem ID, ChemSpider ID, Dashboard

DTXSID, CAS numbers or MOL ﬁles) is extremely

important for unique self-identiﬁcation display, but

also for in the database.

Most MSLs are publicly available, including

METLIN (Guijas et al., 2018), MassBank, GMD

(Hummel et al., 2007), although many do not allow

batch download. Some MSLs are speciﬁc to a speciﬁc

research area, such as Human Metabolome Database

(HMDB) (Wishart et al., 2007), which includes the

MS / MS spectrum of human metabolites, and Re-

Spect (Sawada et al., 2012) contains mass spectra

of plant metabolites. Also, depending on the type

of chromatography, for GH-MS large MSLs are rou-

tinely used, while for LC-MS MSLs are used which

usually contain a smaller number of mass spectra for

a smaller number of compounds.

MSLs can be classiﬁed according to various crite-

ria, such as resolution (low/high), number and type of

BIOINFORMATICS 2021 - 12th International Conference on Bioinformatics Models, Methods and Algorithms

196

mass analyzers used to generate mass spectra (MS

, ...., MS

) (method type of ionization, type of

chromatography (GH, LH), number and type of com-

pounds with mass spectra included in MSLs, tak-

ing into account their physicochemical properties and

source (human, animal, plant, synthetic compound,

environmental pollutant).

Statistical processing of the information extracted

from the relevant scientiﬁc literature in 2015 shows

that EI-MS libraries for volatile compounds (such

as NIST (NIST: National Institue of Standard and

Technology, 2020) / EPA / NIH, Wiley (McLafferty

and Stauffer, 2009), GMD (Hummel et al., 2007),

MassBank (Horai et al., 2020) ) are most commonly

cited and used MSLs (82%), followed by ESI-MS /

MS MSLs (such as METLIN (Guijas et al., 2018)

and HMDB (Wishart et al., 2007) ) for non-volatile

compounds (16%) (Ljoncheva, Milka and Stepi

snik,

Toma

z and D

zeroski, Sa

so and Kosjek, Tina, 2020).

GC-MS spectra have a great advantage over LC-

MS/MS spectra due to their higher reproducibility,

which makes the generated mass spectra comparable,

regardless of whether they are generated using mass

analyzers of different conﬁguration and/or manufac-

turer, which in turn allows the application of com-

prehensive MBSs. The limited use of the GC-MS

analytical platform for the analysis of volatile com-

pounds leads to a reduced number of GH-MS spectra

in MSLs. Namely, Mass Bank of Europe (Horai et al.,

2020) contains mass spectra of 15000 non-volatile

organic compounds out of a total of 41,092 com-

pounds, METLIN (Guijas et al., 2018) only 12,057

out of 242,032 compounds, compared to libraries with

volatile compounds such as NIST (NIST: National

Institue of Standard and Technology, 2020), Wiley

(McLafferty and Stauffer, 2009), MassBank of Eu-

rope (Horai et al., 2020), Fiehn Lib (Kind et al.,

2009), and the Golm Metabolome Database (Hummel

et al., 2007), containing a signiﬁcantly larger number

of compounds.

3 MATERIALS AND METHODS

3.1 Materials

3.1.1 MassBank of Europe

MassBank of Europe (Horai et al., 2020) is the

ﬁrst publicly available MSL containing comprehen-

sive corpus of MS data generated using versatile

analytical platforms, such as LC-ESI-QTOF, LC-

ESI-ITFT, LC-ESI-QFT, EI-B, LC-ESI-QQ, LC-ESI-

Q, LC-ESI-QQQ and GC-EI-TOF. As of December

2020, electrospray ionization (ESI) (60.0%), elec-

tron ionization (EI) (30.0%), chemical ionization

(CI) (2.0%), atmospheric-pressure chemical ioniza-

tion (APCI) (1.6%) and matrix-assisted laser desorp-

tion/ionization (MALDI) (1.5%) mass spectra rep-

resent the most valuable MS data from the 88 168

unique spectra (19 981 MS

, 67 188 MS

, 929 MS

and 70 MS

spectra) of 14 838 unique compounds this

MSL includes, making it one of the most comprehen-

sive MS data repository available MassBank (Horai

et al., 2020). MassBank of Europe’s unique feature

is the inclusion of merged spectra, that are spectra

containing MS data from numerous spectra of a sin-

gle compound generated under different instrumental

conditions (usually at different ionization energies).

MassBank of Europe is also repository for trial and

unknown MS data, that are of great importance for SS

and NTA. Despite being especially user-friendly and

with versatile search options, including name, exact

mass, molecular formula, peak list, peak differences,

InChIKey and SPLASH search, the search engine of

this MSL does not offer batch data search and export.

3.1.2 MoNa

The MassBank’s version of North America,MoNa

(Mehta, 2020) is the most exhaustive readily-

downloadable MS data repository. As of December

2020, it contains 672 751 MS spectra of more than

90 300 organic compounds (20 756 MS

, 635 129

, 929 MS

and 70 MS

spectra), 490 087 of which

in silico and 179 535 experimental spectra. MoNa’s

search engine apart from offering quick spectra search

by compound name, compound class, molecular for-

mula and/or exact mass includes the unique feature of

searching for spectra similar to the one of the user, but

lacks to offer batch search and download of spectral

data and accompanying metadata.

3.2 Methods

MSL-ST relies on both Python and Django frame-

work, and the concept of simulating human approach

for data assessment. This approach saves signiﬁcant

amount of time otherwise used for manual spectral

data download. The process itself involves fetch-

ing the web page and downloading data by utilising

Python libraries, which is automating the scenario

when the web page would have been displayed to

the user through a web browser. Once data is down-

loaded, the extraction begins. The content of the page

can be parsed, searched, and reformatted.

Python is often described as the ”batteries in-

cluded” language because of its comprehensive stan-

dard library and modules for variety of tasks (DeBill,

MSL-ST: Development of Mass Spectral Library Search Tool to Enhance Compound Identiﬁcation

197

2010). Besides being the main tool for the automated

data extraction functionality, it was as well selected

as the most appropriate programming language for

developing MSL-ST by using the Django free open

source framework, which follows the model-view-

template (MVT) architectural pattern (Holovaty and

Kaplan-Moss, 2009), including usage of Python even

for the conﬁguration ﬁles and data models. Django

was selected as web framework due to the emphasiz-

ing reusable components, shorter codes, rapid devel-

opment and principle of non-repetition, which lead to

facilitated creation of complex data-driven web pages.

The source code of MSL-ST is available at GitHub

htt ps : //github.com/gerasimoska/mass spectra,

and is also hosted and can be accessed at the

following link htt ps : //massspectra.dev/.

4 MSL-ST FUNCTIONALITY

In this section, the functionality of MSL-ST is pre-

sented by depicturing two experimental scenarios of

obtaining data from the two possible sources: the ﬁrst

one represents batch search and export from MoNa

and the second one from MassBank of Europe.

The web interface of MSL-ST (Fig. 1) consists of:

(1) Mass spectral library section, in which the user

selects the MSL to be searched against. Selecting one

among the two MSLs (one-at-a-time) enables parts of

the interface that correspond with the selected library,

that are searching parameters valid for the library at

hand. Selecting MoNa as source MSL enables the Li-

brary ﬁlter section, where the user is allowed to select

one or more of the sub-MSLs included in MoNa (Fig.

2). Selecting MassBank of Europe as source MSL, on

the other hand, enables the mass Spectra Library ﬁl-

ter section, the Other Instrument Types ﬁlter, which is

only available when searching the Mass Bank, is no

longer blurred and disabled (Fig. 3);

(2) Search parameter ﬁlter sections, used to cus-

tom the search parameters according to the user’s re-

quirements. Apart from the library-speciﬁc parame-

ters made available after MSL selection, other ﬁlters

are available for both MSLs, including:

• The ion mode ﬁlter, that allows the selection of

the type of ionization in MS, positive or negative;

• The MS Type ﬁlter which indicates the number

and type of mass spectrometers used to generate

MS and

• The Source Introduction ﬁlter that allows the se-

lection of the type of chromatography, whether it

is gas chromatography, liquid chromatography, or

capillary electrophoresis.

offering selection of ion mode (positive/negative), MS

type (MS

, MS

) and source introduction

(LC, GC, capillary electrophoresis (CE)). The search

ﬁlters are represented as tabs where the header is the

name of the ﬁlter, and the available options that can

be selected are radio buttons (only one of the offered

options is allowed), or, checkboxes (one or more of

the offered options are allowed). Since we already

described the different contents the .csv ﬁle might

contain, the user has to select the type of input data:

InChiKey, molecular formula or exact mass. Selec-

tion of exact mass as input data triggers another input

number ﬁeld requiring the user to deﬁne the mass dif-

ference tolerance level of the exact mass variation (in

Da);

(3) Upload ﬁle section, which allows attachment of

a .csv ﬁle with a list of compound names, molecu-

lar formulae, InChIKeys or molecular masses of com-

pounds for which MS data is required. The ﬁle upload

is performed by clicking the Browse button, which al-

lows attachment of ﬁles locally stored on the device.

After selecting the ﬁle, the window closes automati-

cally and the name of the selected ﬁle is written in the

Upload File tab and thus is visible for the user.

Furthermore, MSL-ST implements validation of

the attached ﬁle, approving attachment only of ﬁles

with .csv or .txt extension. This feature warns the user

fo attaching inappropriate ﬁle format by displaying an

error message in a warning window.

After uploading and selecting the desired ﬁlter op-

tions, by clicking the Search button, the request is

submitted and the data retrieval process begins, af-

ter which a ﬁle is automatically generated and down-

loaded (Fig. 8). In case the ﬁle is not in the appro-

priate format, and the source is not selected from the

Mass Spectral Library ﬁlter, this button remains dis-

abled.

After clicking Search, a POST request is sent and

a message for successful processing is displayed in

a warning window. This means that an HttpRequest

object is passed from the displayed template as an ar-

gument in the function of the view pointed from the

URL. The view expects an HttpResponse that will be

returned after executing the function for data retrieval.

As soon as the OK button is clicked, all ﬁlters are

disabled and a spinner is displayed, while the request

is processed and an output ﬁle is generated.

The data retrieval process, as shown in Fig. 5

starts by scrolling through all the input items, and

invoking the corresponding function from additional

helpers ﬁle in the system. Input arguments are the

values of selected ﬁlters from the application (ION

MODE, MS TYPE, SOURCE INTRODUCTION, LI-

BRARY), while the other arguments are initialized

BIOINFORMATICS 2021 - 12th International Conference on Bioinformatics Models, Methods and Algorithms

198

Figure 1: MSL-ST web interface.

Figure 2: MSL-ST with MoNa selected as source MSL.

to an empty list. Using the selected ﬁlters, a URL

where querystring passes the values, is generated and

a GET request is sent to it. The response of the request

contains list of ids from all of the compounds whose

metadata is contained in the selected MSL, that fulﬁl

the selected parameters.

Using the output arguments of the function, it is

iterated through each of the received IDs, and a GET

request is sent to each of the generated URLs from

the ID. The response contains all the data for the

given compound (name, exact mass, molecular for-

mula, molecular descriptors etc.) and the metadata

corresponding to instrument part and conditions, as

shown in Fig. 5. Since the response is in JSON for-

mat, using build-in functions of data structure objects

eases the process of parsing the data. Additionally,

formatting of the retrieved data in an easy-to-use out-

put .csv format is done before download (Fig. 8).

The output .csv ﬁle, where the header contains

the metadata/parameters name, molecular formula,

ions, etc.). The described data retrieval procedure is

repeated for all compounds from the input ﬁle and the

results are added to the output ﬁle accordingly. As

soon as the data retrieval process is ﬁnished for all

input items that have been iterated, the Django view

that has received the POST HttpRequest as an input

argument, returns the response as an output argument

of the function. After receiving the response from

MSL-ST: Development of Mass Spectral Library Search Tool to Enhance Compound Identiﬁcation

199

Figure 3: MSL-ST with MassBank of Europe selected as source MSL.

Figure 4: Obtained metadata for a given compound.

the POST request, when the output ﬁle is ready to be

downloaded, this option becomes available and easily

visible to the user as shown in Fig. 6. In real time,

the MS data for all MS entries of the selected MSL

will be downloaded and written in the output ﬁle. As

shown in Fig.7, it contains all the metadata available

in both MSLs.

Figure 5: UML diagram for the POST request.

Apart from the MSLs search engine, the navi-

gation menu of MSL-ST’s interface provides about

page with introduction to the project aim, mass spec-

trometry and organic compound identiﬁcation, a con-

tent aimed to aid the user’s experience while utilizing

MSL-ST. A link to the GitHub repository and author’s

contact information are also provided (Fig. 8).

In this way, MSL-ST allows time-effective re-

trieval of large data quantities, replacing multiple

weeks of manual MS data search and export with few

minutes of automated gathering of MS data for thou-

BIOINFORMATICS 2021 - 12th International Conference on Bioinformatics Models, Methods and Algorithms

200

Figure 6: Data ready to be downloaded.

Figure 7: Several columns from the generated output ﬁle containing retrieved metadata for the retrieved spectra.

Figure 8: Path diagram of MSL-ST.

sands of compounds. As MSL-ST is the ﬁrst batch

MSLs search and export engine available, no com-

parative analysis with similar tools can be performed.

Due to the lack of batch search option in MassBank

of Europe and MoNa, comparison with single MSL

batch search and export is also not possible.

MSL-ST: Development of Mass Spectral Library Search Tool to Enhance Compound Identiﬁcation

201

5 CONCLUSION

The development of MSL-ST, the ﬁrst publicly avail-

able MS data batch search and export engine using

two of the most comprehensive MSLs is pivotal step

in MSL-chemoinformatics-based compound identiﬁ-

cation. By offering a time-, cost- and labour-effective

solution for data extraction that can be easily im-

plemented in custom-made workﬂows, MSL-ST is

clearly addressing three of the current challenges of

chemoinformatics, that are: (1) centralization of mul-

tiple MSLs and uniformation of searching through the

empirical data they contain; (2) enabling search and

retrieval of batch data, instead of manually repeat-

ing the process, and (3) automated download of com-

pound data in a structured tabular format, instead of

time-consuming manual storage.

Since MSL-ST is the ﬁrst MSLs batch search

and export engine, it can be easily assumed that

the idea of developing such tools in order to aim

chemoinformatics-assisted compound identiﬁcation

is in its earliest infancy and thus, many more advance-

ments are to be added to the basic functionalities of

MSL-ST available in its ﬁrst version, presented in this

paper. Consequently, its further upgrades would in-

clude:

• Addition of more publicly available MSLs, that

would allow access to larger amount of exepri-

mental and metadata, thus spreading the capabili-

ties of the MSL(s)-based chemoinformatics tools;

• The MS Type ﬁlter which indicates the number

and type of mass spectrometers used to generate

MS, and

• The Source Introduction ﬁlter that allows the se-

lection of the type of chromatography, whether it

is gas chromatography, liquid chromatography, or

capillary electrophoresis.

This would allow access to a large number of ex-

perimental data on compounds of different species

(metabolites, peptides, etc.). The process of real-

time data extraction would be expanded by searching

through other structured databases, using them as a

”living resource” that is updated daily.

The web application provides an easy way to se-

lect the characteristics of the mass spectrometry of the

whole pile of input compounds, which will be applied

in the search in the selected MSL. There is still room

for future work in improving the interface of the web

application, which would result in a better user expe-

rience and easier use of the application.

REFERENCES

DeBill, E. (2010). Module counts. WWW], http://www.

modulecounts. com/.[Haettu 1.11. 2016.].

Guijas, C., Montenegro-Burke, J. R., Domingo-Almenara,

X., Palermo, A., Warth, B., Hermann, G., Koel-

lensperger, G., Huan, T., Uritboonthai, W., Aisporna,

A. E., et al. (2018). Metlin: a technology platform for

identifying knowns and unknowns. Analytical chem-

istry, 90(5):3156–3164.

Holovaty, A. and Kaplan-Moss, J. (2009). The deﬁni-

tive guide to Django: Web development done right.

Apress.

Horai, H., Suwa, K., Arita, M., Nihei, Y., and Nishioka,

T. (2020 (accessed November 16, 2020)). Massbank:

Mass spectral database for metabolome analysis. In

The 56th ASMS Conference on Mass Spectrometry

and Allied Topics, Denver, CO.

Hummel, J., Selbig, J., Walther, D., and Kopka, J. (2007).

The golm metabolome database: a database for gc-ms

based metabolite proﬁling. In Metabolomics, pages

75–95. Springer.

Hummel, J., Strehmel, N., Selbig, J., Walther, D., and

Kopka, J. (2010). Decision tree supported substruc-

ture prediction of metabolites from gc-ms proﬁles.

Metabolomics, 6(2):322–333.

Kind, T., Wohlgemuth, G., Lee, D. Y., Lu, Y., Pala-

zoglu, M., Shahbaz, S., and Fiehn, O. (2009). Fiehn-

lib: mass spectral and retention index libraries for

metabolomics based on quadrupole and time-of-ﬂight

gas chromatography/mass spectrometry. Analytical

chemistry, 81(24):10038–10048.

Ljoncheva, Milka and Stepi

snik, Toma

z and D

zeroski, Sa

and Kosjek, Tina (2020). Cheminformatics in MS-

based environmental exposomics: Current achieve-

ments and future directions. Trends in Environmental

Analytical Chemistry, page e00099.

McLafferty, F. W. and Stauffer, D. (2009). Wiley registry of

mass spectral data, volume 662. John Wiley Hobo-

ken, NJ.

Mehta, S. (2020 (accessed November 16, 2020)). Mass-

bank of north america (mona): An open-access, auto-

curating mass spectral database for compound identi-

ﬁcation in metabolomics presentation.

NIST: National Institue of Standard and Technology (2020

(accessed November 16, 2020)). The nist mass spec-

trometry data center.

Rasche, F., Scheubert, K., Hufsky, F., Zichner, T., Kai, M.,

Svato

s, A., and B

ocker, S. (2012). Identifying the un-

knowns by aligning fragmentation trees. Analytical

chemistry, 84(7):3417–3426.

Sawada, Y., Nakabayashi, R., Yamada, Y., Suzuki, M.,

Sato, M., Sakata, A., Akiyama, K., Sakurai, T.,

Matsuda, F., Aoki, T., et al. (2012). Riken tan-

dem mass spectral database (respect) for phytochem-

icals: a plant-speciﬁc ms/ms-based data resource and

database. Phytochemistry, 82:38–45.

Scheubert, K., Hufsky, F., and B

ocker, S. (2013). Computa-

tional mass spectrometry for small molecules. Journal

of cheminformatics, 5(1):12.

BIOINFORMATICS 2021 - 12th International Conference on Bioinformatics Models, Methods and Algorithms

202

Schymanski, E. L., Meinert, C., Meringer, M., and Brack,

W. (2008). The use of ms classiﬁers and structure

generation to assist in the identiﬁcation of unknowns

in effect-directed analysis. Analytica chimica acta,

615(2):136–147.

Wei, J. N., Belanger, D., Adams, R. P., and Sculley, D.

(2019). Rapid prediction of electron–ionization mass

spectrometry using neural networks. ACS central sci-

ence, 5(4):700–708.

Wishart, D. S., Tzur, D., Knox, C., Eisner, R., Guo, A. C.,

Young, N., Cheng, D., Jewell, K., Arndt, D., Sawh-

ney, S., et al. (2007). Hmdb: the human metabolome

database. Nucleic acids research, 35(suppl 1):D521–

D526.

MSL-ST: Development of Mass Spectral Library Search Tool to Enhance Compound Identiﬁcation

203