End-to-End Multi-channel Neural Networks for Predicting Influenza A Virus Hosts and Antigenic Types

Yanhua Xu

and Dominik Wojtczak

Department of Computer Science, University of Liverpool, U.K.

Keywords:

Machine Learning, Inﬂuenza Virus, Long Short-term Network, Convolutional Neural Network, Transformer.

Abstract:

Influenza occurs every season and occasionally causes pandemics. Despite its low mortality rate, influenza is a major public health concern, as it can be complicated by severe diseases like pneumonia. An accurate and low-cost method to predict the origin host and subtype of influenza viruses could help reduce virus transmission and benefit resource-poor areas. In this work, we propose multi-channel neural networks to predict antigenic types and hosts of influenza A viruses from hemagglutinin and neuraminidase protein sequences. An integrated data set containing complete protein sequences was used to produce a pre-trained model, and two other data sets were used for testing the model's performance. One test set contained complete protein sequences, and the other contained incomplete protein sequences. The results suggest that multi-channel neural networks are applicable and promising for predicting influenza A virus hosts and antigenic subtypes with complete and partial protein sequences.

1 INTRODUCTION

Inﬂuenza is a highly contagious respiratory illness

that results in as many as 650,000 respiratory deaths

globally per year (Iuliano et al., 2018). Inﬂuenza

spreads mainly through droplets, aerosols, or by di-

rect contact (Lau et al., 2010), and up to 50% of infec-

tions are asymptomatic (Wilde et al., 1999). Inﬂuenza

can complicate a range of clinical problems associ-

ated with high fatality rates, including secondary bac-

terial pneumonia, primary viral pneumonia, chronic

kidney disease, acute renal failure, and heart fail-

ure (Watanabe, 2013), (Casas-Aparicio et al., 2018),

(England, 2020).

The influenza virus genome comprises several segments of single-stranded ribonucleic acid (RNA). The virus has four genera, differentiated mainly by the antigenic properties of the nucleocapsid (NP) and matrix (M) proteins (Shaw and Palese, 2013). At present, influenza virus has four types: influenza A virus (IAV), influenza B virus (IBV), influenza C virus (ICV) and influenza D virus (IDV). IAV is widespread in a variety of species, causes the most serious diseases, and is the most capable of unleashing a pandemic, while the others are less virulent. IAV could trigger major public health disruption by evolving for efficient human transmission, as it did with the 'Spanish Flu' during 1918–1919, which is estimated to have killed 20 to 100 million people (Mills et al., 2004).

https://orcid.org/0000-0003-1028-9023

https://orcid.org/0000-0001-5560-0546

IAV is further subtyped by the antigenic properties

of its two surface glycoproteins, hemagglutinin (HA)

and neuraminidase (NA). There are presently 18 HA

subtypes and 11 NA subtypes known (Asha and Ku-

mar, 2019), of which only H1, H2, H3 and N1, N2

spread among humans. The avian inﬂuenza viruses

(H5N1, H5N2, H5N8, H7N7, and H9N2) may spread

from birds to humans; this occurs rarely but can be

deadly: all avian inﬂuenza A viruses have the poten-

tial to cause pandemics.

The virus uses HA and NA to bind to its host cells

(James and Whitley, 2017). HA allows the virus to

recognise and attach to speciﬁc receptors on host ep-

ithelial cells. Upon entering the host cell, the virus

replicates and is released by NA, thence infecting

more cells. The immune system can be triggered to

attack viruses and destroy infected tissue throughout

the respiratory system, but death can result through

organ failure or secondary infections.

Viruses undergo continuous evolution. Point mutations in the genes that encode HA and NA can render the virus able to escape the immune system. Such change is described as antigenic drift and leads to seasonal influenza. The other change, antigenic shift, occurs more rarely and results in a major change that produces a new virus that cannot be completely handled by the existing immune response, and may lead to pandemics (Clayville, 2011).

Xu, Y. and Wojtczak, D.

End-to-End Multi-channel Neural Networks for Predicting Influenza A Virus Hosts and Antigenic Types.

DOI: 10.5220/0011526300003335

In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 1: KDIR, pages 40-50

ISBN: 978-989-758-614-9; ISSN: 2184-3228

In this paper, we propose multi-channel neural

networks (CNN, bidirectional long short-term mem-

ory, bidirectional gated recurrent unit and trans-

former) to predict the subtypes and hosts of IAV.

The models were trained on an integrated protein se-

quence data set collected prior to 2019 (named pre-19

set) and tested both on an integrated data set collected

from 2019 to 2021 (named post-19 set), and a data set

containing incomplete sequences. We use Basic Lo-

cal Alignment Search Tool (BLAST) as the baseline

model and all models yield better performance than

the baseline model, especially multi-channel BiGRU.

Tested on the post-19 set, this model reaches 94.73% (94.58%, 94.87%), 99.86% (99.82%, 99.89%) and 99.81% (99.74%, 99.89%) F1 score for host, HA subtype and NA subtype prediction, respectively. The performance on incomplete sequences reaches approximately 81.36% (80.35%, 82.37%), 96.86% (96.50%, 97.21%) and 98.18% (97.80%, 98.56%) F1 score for host, HA subtype and NA subtype prediction, respectively.

2 RELATED WORK

Rapid and accurate detection of IAV hosts and sub-

types can improve inﬂuenza surveillance and re-

duce spread. The traditional methods for virus

subtyping, such as nucleic acid-based tests (NATs),

are labour intensive and time-consuming (Vemula

et al., 2016). Therefore, various supervised machine

learning-based methods have been developed to pre-

dict the hosts or subtypes of inﬂuenza viruses, based

on convolutional neural network (CNN) (Clayville,

2011), (Fabijańska and Grabowski, 2019), (Scarafoni

et al., 2019), support vector machines (SVM) (Ah-

san and Ebrahimi, 2018), (Xu et al., 2017), (Kincaid,

2018), decision trees (DT) (Ahsan and Ebrahimi,

2018), (Attaluri et al., 2009), random forests (RF)

(Kincaid, 2018), (Eng et al., 2014), (Kwon et al.,

2020), etc.

A protein sequence is of variable length and needs to be encoded as a numerical vector. Previous studies have done so using simple one-hot encoding (Clayville, 2011), (Eng et al., 2014), (Mock et al., 2021), pre-defined binary encoding schemes (Attaluri et al., 2010), pre-defined ASCII codes (Fabijańska and Grabowski, 2019), Word2Vec (Xu et al., 2017), and physicochemical features (Chrysostomou et al., 2021), (Sherif et al., 2017), (Kwon et al., 2020), (Yin et al., 2020). One drawback of handcrafted feature sets or physicochemical features is that they require a feature selection or extraction process before training. Therefore, we apply word embedding so that the models learn the features from the given training data, which is more convenient, straightforward and lightweight. Most work focuses on three classes (i.e. avian, swine and human) or a single class of hosts from a single database. In contrast to previous work, we collect data from multiple databases and focus on more refined classes.

Multi-channel neural networks have been used in relation extraction (Chen et al., 2020), emotion recognition (Yang et al., 2018), face detection (George et al., 2019), entity alignment (Cao et al., 2019), haptic material classification (Kerzel et al., 2017), etc. Few studies use multi-channel neural networks for infectious diseases. We therefore propose three kinds of multi-channel neural network architectures that predict influenza A virus hosts and subtypes simultaneously, instead of training single task-specific models.

3 MATERIALS AND METHODS

3.1 Data Preparation

3.1.1 Protein Sequences

The complete hemagglutinin (HA) and neuraminidase (NA) sequences were collected from the Influenza Research Database (IRD) (Squires et al., 2012) and the Global Initiative on Sharing Avian Influenza Data (GISAID) (Shu and McCauley, 2017) (status 16th August 2021). The originally retrieved data set contains 157,119 HA sequences and 156,925 NA sequences from GISAID, and 96,412 HA sequences and 84,186 NA sequences from IRD. Redundant and multi-label sequences were filtered out, and only one HA sequence and one NA sequence for each strain were included in the data set. Therefore, each strain has a unique pair of HA and NA sequences and belongs to one host and one subtype. Because our data come from different sources, we removed sequences from GISAID if they were already in IRD before integration. Some strains in GISAID labelled as H0N0 were also removed (HA0 is an uncleaved protein that is not infectious). The strains isolated prior to 2019 are used to produce the pre-trained model, and the strains isolated from 2019 to 2021 are used only for testing the performance of the models.

The incomplete HA and NA sequences were collected from IRD (status 16th August 2021). A sequence is considered complete if its length is the same as the length of the actual genomic sequence (Shu and McCauley, 2017). We download the database and then filter out the complete sequences to obtain the incomplete sequences, as complete and incomplete sequences together form the influenza database (all sequences = complete sequences ∪ incomplete sequences). Incomplete sequences are only used for testing the performance of the models. The details of the data sets are summarized in Table 1.

Table 1: Summary statistics of data sets.

Data Set (alias)          # Total   # IRD    # GISAID
< 2019 (pre-19)           27,884    26,704   1,108
2019 - 2021 (post-19)     2,716     2,206    510
Incomplete (incomplete)   8,325     8,325    /

3.1.2 Label Reassignment

IRD and GISAID record 45 and 33 hosts, respectively, of which only 6 are consistent in both databases, as shown in Fig. 1. We regroup the host labels into 44 categories; the distribution of the regrouped host labels is shown in Fig. 2. To date, 18 HA subtypes (numbered H1 - H18) and 11 NA subtypes (numbered N1 - N11) have been discovered. We also regroup the subtypes with very few samples in the data set (i.e. H15, H17, H18, N10 and N11) into other subtypes, as shown in Fig. 3 and Fig. 4.
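The regrouping of rare subtypes can be illustrated with a short sketch. It assumes that the combined subtype is available as a string such as "H5N8"; the function name `split_and_regroup` and the label `"other"` are hypothetical and are not taken from the authors' code.

```python
import re

# Subtypes that the text lists as regrouped because they are very rare.
RARE_SUBTYPES = {"H15", "H17", "H18", "N10", "N11"}

# Hypothetical name for the regrouped class (the paper does not name it).
OTHER_LABEL = "other"


def split_and_regroup(subtype: str) -> tuple[str, str]:
    """Split e.g. 'H5N8' into ('H5', 'N8') and map rare subtypes to OTHER_LABEL."""
    match = re.fullmatch(r"(H\d+)(N\d+)", subtype)
    if match is None:
        raise ValueError(f"unrecognised subtype string: {subtype!r}")
    ha, na = match.groups()
    ha = OTHER_LABEL if ha in RARE_SUBTYPES else ha
    na = OTHER_LABEL if na in RARE_SUBTYPES else na
    return ha, na


if __name__ == "__main__":
    for label in ("H5N8", "H10N3", "H17N10"):
        print(label, "->", split_and_regroup(label))
```

Splitting the label into an HA part and an NA part mirrors the two separate subtype outputs of the network.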

3.2 Sequence Representation

Neural networks are functional operators that perform

mathematical operations on inputs and generate nu-

merical outputs. A neural network cannot interpret

the raw sequences and needs them to be represented

as numerical vectors before feeding them to the neu-

ral network. The most intuitive and simple strategy

to vectorise the sequence is called one-hot encoding.

In natural language processing (NLP), the length of

the one-hot vector for each word is equal to the size

of vocabulary. The vocabulary consists of all unique

words (or tokens) in the data. If each amino acid is

represented as one “word”, then the length of the one-

hot vector for each amino acid depends on the num-

ber of unique amino acids. Therefore, one-hot encod-

ing results in a sparse matrix for large vocabularies,

which is very inefﬁcient. A more powerful approach

is to represent each word as a distributed dense vector

by word embedding, which learns the word represen-

tation by looking at its surroundings, so that similar

words are given similar embeddings. Word embed-

ding has been successfully used to extract features of

biological sequences (Asgari and Mofrad, 2015).

A protein sequence can be represented as a set of

3-grams. In NLP, N-grams are N consecutive words

in the text, and N-grams of a protein sequence are N

consecutive amino acids. For example, the 3-grams

of sequence MENIVLLLAI is MEN ENI NIV IVL

VLL LLL LLA LAI. We set N as 3 as suggested by

previous research (Xu and Wojtczak, 2022).
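The trigram representation can be sketched in a few lines of Python. The helpers `build_vocab` and `encode`, the padding id 0 and the choice to map unseen trigrams to the padding id are illustrative assumptions, not details given in the paper.

```python
def ngrams(sequence, n=3):
    """Return the overlapping n-grams of a protein sequence, left to right."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_vocab(sequences, n=3):
    """Map every n-gram seen in `sequences` to an integer id; 0 is reserved for padding."""
    vocab = {}
    for seq in sequences:
        for gram in ngrams(seq, n):
            vocab.setdefault(gram, len(vocab) + 1)
    return vocab


def encode(sequence, vocab, max_len, n=3):
    """Turn a sequence into a fixed-length list of ids.

    Simplification: n-grams missing from the vocabulary share id 0 with padding.
    """
    ids = [vocab.get(gram, 0) for gram in ngrams(sequence, n)][:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    print(" ".join(ngrams("MENIVLLLAI")))
    vocab = build_vocab(["MENIVLLLAI"])
    print(encode("MENIV", vocab, max_len=5))
```

The integer ids produced by `encode` are the kind of input an embedding layer expects: each id selects one dense vector that is learned during training.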

3.3 Neural Network Architecture

We propose a multi-channel neural network architec-

ture that takes two inputs (HA trigrams and NA tri-

grams) and generates three outputs (host, HA sub-

types and NA subtypes). The neural networks applied

in this study include bidirectional long short-term

memory (BiLSTM), bidirectional gated recurrent unit

(BiGRU), convolutional neural network (CNN) and

Transformer.

3.3.1 Bidirectional Recurrent Neural Networks

We use two kinds of bidirectional recurrent neural networks in this study: bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent unit (BiGRU). LSTM is an extension of the recurrent neural network (RNN). It uses gates to regulate the flow of information to tackle the vanishing gradient problem of standard RNNs (Hochreiter and Schmidhuber, 1997b), (Hochreiter and Schmidhuber, 1997a). A common LSTM has three gates: an input gate, a forget gate and an output gate. The input gate stores new information from the current input and selectively updates the cell state, the forget gate discards irrelevant information, and the output gate determines which information is passed to the next hidden state. Bidirectional LSTM (BiLSTM) (Graves et al., 2005), (Thireou and Reczko, 2007) comprises a forward LSTM and a backward LSTM that process the data in both directions, leading to better context understanding, and is more effective than unidirectional LSTM (Graves and Schmidhuber, 2005).

The gated recurrent unit (GRU) is similar to LSTM but only has a reset gate and an update gate (Cho et al., 2014). The reset gate decides how much previous information needs to be forgotten, and the update gate decides how much information to discard and how much new information to add. GRUs have fewer tensor operations and are therefore faster to train than LSTMs. Bidirectional GRU (BiGRU) likewise comprises a forward GRU and a backward GRU.
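To make the two gates concrete, the following sketch implements a single-unit GRU with scalar input and scalar hidden state, plus a bidirectional pass. It is a toy illustration under stated assumptions: the parameter names (`w_z`, `u_z`, and so on) are hypothetical, real layers use weight matrices, and the update convention shown (the update gate weights the candidate state) is one of two equivalent conventions found in the literature.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One step of a single-unit GRU with scalar input and scalar hidden state."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate scales how much of h_prev is used.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # The update gate mixes the old state with the candidate state.
    return (1.0 - z) * h_prev + z * h_cand


def run_gru(xs, p, h0=0.0):
    """Run the GRU over a sequence and return the hidden state at every step."""
    h, states = h0, []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return states


def run_bigru(xs, p_fwd, p_bwd):
    """Pair forward and backward hidden states position by position."""
    forward = run_gru(xs, p_fwd)
    backward = run_gru(xs[::-1], p_bwd)[::-1]
    return list(zip(forward, backward))
```

A GRU step computes three weighted sums (update gate, reset gate, candidate), whereas an LSTM step computes four, which is one way to see why GRUs are cheaper per step.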

KDIR 2022 - 14th International Conference on Knowledge Discovery and Information Retrieval

IRD only: anteater, avian, bat, beetle, bovine, camel, caprine, civet, civet cat, crane, dog, domestic cat, donkey, ferret, flat-faced bat, fowl, fox, horse, insect, large cat, lion, marten, meerkat, mink, monkey, muskrat, panda, pika, plateau pika, raccoon, raccoon dog, rat, reassortant, sea mammal, seal, skunk, weasel, wildebeest, yak

GISAID only: chicken, curlew, duck, eagle, falcon, goose, grouse, guinea fowl, gull, ostrich, other avian, partridge, passerine, penguin, pheasant, pigeon, rails, sandpiper, shearwater, swan, turkey, turnstone, US quail, canine, equine, feline, other mammals

Both IRD and GISAID: human, laboratory derived, unknown, swine, environment, equine

Figure 1: Inconsistent host labels between IRD and GISAID databases: the intersection of hosts in the IRD and GISAID databases is indicated in light green.

Figure 2: Data distribution (hosts).

3.3.2 Transformer

Transformer is an impactful neural network architecture developed in 2017 (Vaswani et al., 2017). It was originally designed for machine translation, but can be extended to other domains, such as solving protein folding problems (Grechishnikova, 2021). Transformer lays the foundation for the development of some state-of-the-art natural language processing models, such as BERT (Devlin et al., 2018), T5 (Raffel et al., 2019), and GPT-3 (Brown et al., 2020). One of the biggest advantages of Transformer over traditional RNNs is that it can process data in parallel. Therefore, Transformer models can use GPUs to speed up training and handle long text sequences well.

The innovations of the Transformer neural network include positional encoding and the self-attention mechanism. Positional encoding stores the word order in the data and helps the neural network to learn the order information. The attention mechanism allows the model to decide how to translate a word from the original text to the output text. The self-attention mechanism, as the name suggests, lets a sequence attend to itself: it allows the neural network to understand the underlying meaning of words in context by looking at the words around them. With self-attention, neural networks can not only distinguish words but also reduce the amount of computation.
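Both ideas can be sketched in plain Python. The sketch below uses the sinusoidal positional encoding and scaled dot-product attention described by Vaswani et al. (2017); as simplifying assumptions, it omits the learned query, key and value projections and uses a single head, and the paper does not state whether the authors used sinusoidal or learned positional encodings.

```python
import math


def positional_encoding(length, d_model):
    """Sinusoidal positional encoding: sin on even dimensions, cos on odd ones."""
    table = []
    for pos in range(length):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `x` is a list of token vectors; every token attends to every token.
    Returns the attended vectors and the attention weight matrix.
    """
    d = len(x[0])
    outputs, weights = [], []
    for query in x:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in x]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[c] for w, value in zip(row, x)) for c in range(d)])
    return outputs, weights
```

Because each output row depends only on the input matrix and not on the previous output row, all positions can be computed in parallel, unlike the step-by-step recurrence of an RNN.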

3.3.3 Convolutional Neural Network

A convolutional neural network (CNN) is typically used to process images, where it has achieved great success. The idea of CNN is inspired by the visual processing mechanism of the human brain, that is, neurons are only activated by specific features of the image, such as edges. Two kinds of layers are often used in CNNs: convolution layers and pooling layers. Convolution layers are the heart of CNNs; they apply convolution operators between the input image and a set of filters. Pooling layers downsample the image to reduce the number of learnable parameters. In this study, we use one-dimensional convolution layers to process sequence data.
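A one-dimensional convolution followed by max pooling can be sketched without any library. The zero ("same") padding and the pool size of 3 are assumptions: they are inferred from the sequence lengths shown in Fig. 5 (500, 166, 55, 18) and are not stated in the text.

```python
def conv1d_same(x, kernel):
    """1-D convolution (cross-correlation) with zero padding so len(out) == len(x)."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    padded = [0.0] * pad_left + list(x) + [0.0] * (k - 1 - pad_left)
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(x))]


def max_pool1d(x, pool_size):
    """Non-overlapping max pooling; trailing items that do not fill a window are dropped."""
    return [max(x[i:i + pool_size]) for i in range(0, len(x) - pool_size + 1, pool_size)]


if __name__ == "__main__":
    signal = [1, 2, 3, 4]
    print(conv1d_same(signal, [1, 1, 1]))
    length = 500
    for _ in range(3):
        length = len(max_pool1d([0.0] * length, 3))
        print(length)
```

With these assumptions, three rounds of pooling shrink a sequence of 500 positions to 166, 55 and 18, which is why the flattened feature vector fed to the dense layers stays small.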

3.4 Implementation and Evaluation Methods

All models are built with Keras, trained on the pre-19 data set, and tested on the post-19 and incomplete data sets. We apply transfer learning for the incomplete data set. The multi-channel neural network architecture is shown in Fig. 5. The Transformer architecture used in this study is the encoder described in (Vaswani et al., 2017); we use 3 heads and an input embedding with 32 dimensions.


Figure 3: Data distribution (HA subtypes).

Figure 4: Data distribution (NA subtypes).

Some studies confuse the roles of validation and test sets, tuning the model's hyperparameters on the testing set instead of a separate validation set. This involves the risk of data leakage and undermines the credibility of the results. Therefore, we use nested cross-validation (CV). In contrast to classic K-fold CV, which splits the data into training and testing sets, nested CV uses an outer CV to estimate the unbiased generalisation error of the model and an inner CV for model selection or hyperparameter tuning. The outer CV splits the data into a training_outer set and a testing set, and the inner CV splits the training_outer set into a training_inner set and a validation set. The model is trained only on the training_inner set, its hyperparameters are tuned based on its performance on the validation set, and its general performance is tested on the testing set. In this study, the number of outer folds k_outer is 5 and the number of inner folds k_inner is 4. Fig. 6 shows the process of building CV ensemble models.
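The nested splitting scheme can be sketched as index bookkeeping. This is a minimal sketch under stated assumptions: folds are contiguous and unshuffled, whereas a real experiment would typically shuffle and possibly stratify the data, and the function names are hypothetical.

```python
def k_fold_indices(indices, k):
    """Split `indices` into k contiguous folds; yield (train, held_out) pairs."""
    n = len(indices)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size


def nested_cv_splits(n_samples, k_outer=5, k_inner=4):
    """Yield (training_inner, validation, testing) index lists for nested CV."""
    all_indices = list(range(n_samples))
    for training_outer, testing in k_fold_indices(all_indices, k_outer):
        for training_inner, validation in k_fold_indices(training_outer, k_inner):
            yield training_inner, validation, testing


if __name__ == "__main__":
    splits = list(nested_cv_splits(100))
    train, val, test = splits[0]
    print(len(splits), len(train), len(val), len(test))
```

With k_outer = 5 and k_inner = 4 this produces 20 train/validation/test combinations, and in every combination the testing indices are never seen during training or tuning.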

The data sets used in this study are highly imbalanced, and common evaluation measurements, such as accuracy and the receiver operating characteristic (ROC) curve, can be misleading (Akosa, 2017), (Davis and Goadrich, 2006). The precision-recall curve (PRC), on the other hand, is more informative when dealing with a highly skewed data set (Saito and Rehmsmeier, 2015), and has been widely used in research (Bunescu et al., 2005), (Bockhorst and Craven, 2005), (Goadrich et al., 2004), (Davis et al., 2005). However, linear interpolation is unsuitable for calculating the area under the precision-recall curve (AUPRC) (Davis and Goadrich, 2006). A better alternative in this case is the average precision (AP) score (Su et al., 2015). In addition, we apply common evaluation metrics, i.e. precision, recall and F1 score. The formulas of these evaluation metrics are shown below:

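$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

$$AP = \sum_{n} \left(\mathrm{Recall}_n - \mathrm{Recall}_{n-1}\right) \mathrm{Precision}_n \tag{4}$$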

where TP, FP, TN, FN stand for true positive, false

positive, true negative and false negative. If positive

data is predicted as negative, then it counts as FN, and

so on for TN, TP and FP.
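The four metrics can be computed directly from counts and ranked scores. The sketch below is illustrative: the function names are hypothetical, and as a simplification tied scores are treated as separate thresholds, which library implementations may handle differently.

```python
def precision_recall_f1(tp, fp, fn):
    """Equations (1) to (3); a metric is defined as 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Equation (4): sum over thresholds of (Recall_n - Recall_{n-1}) * Precision_n.

    `y_true` holds 0/1 labels and `scores` the predicted scores for the positive class.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp, ap, prev_recall = 0, 0.0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        tp += label
        precision = tp / rank
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    print(precision_recall_f1(tp=8, fp=2, fn=4))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Because AP sums rectangles rather than interpolating linearly between points of the precision-recall curve, it avoids the overly optimistic estimates that linear interpolation can give.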


[Figure 5 diagram. Layer labels recovered from the figure, with output shapes in parentheses; the connections between blocks are not recoverable from the extracted text.]

Inputs: NA seq (1, 200); HA seq (1, 300)

Embedding: (1, 200, 100) for NA; (1, 300, 100) for HA

Concatenate: (1, 500, 100)

Recurrent layers: BiLSTM (1, 500, 64); BiLSTM (1, 500, 32); GRU (1, 500, 64); GRU (1, 500, 32); Flatten

Convolutional layers: Conv (1, 500, 256); MaxPooling (1, 166, 256); Conv (1, 166, 128); MaxPooling (1, 55, 128); Conv (1, 55, 64); MaxPooling (1, 18, 64); Dropout (1, 18, 64); Flatten

Dense+Relu layers: (1, 512); (1, 512); (1, 256); (1, 256); (1, 128); (1, 32); (1, 32)

Outputs: Dense+Softmax (1, 44) for Hosts; Dense+Softmax (1, 17) for HA Subtypes; Dense+Softmax (1, 11) for NA Subtypes

Figure 5: The multi-channel neural network architecture.

[Figure 6 diagram. Labels recovered from the figure: Protein Sequences; split data into k_outer folds (Training, Testing); split data into k_inner folds (Training, Validation); Model 1, Model 2, Model 3, Model 4; Out-of-Fold Predictions; soft voting; Final Predictions.]

Figure 6: The process of building a CV ensemble.
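The soft-voting step named in Fig. 6 can be sketched as averaging the class probabilities of the fold models. This is a minimal sketch assuming equal model weights; the function name and the example probabilities are hypothetical.

```python
def soft_vote(prob_lists):
    """Average class-probability vectors from several models and pick the argmax.

    `prob_lists` holds one probability vector per model, all over the same classes.
    Returns the index of the winning class and the averaged probabilities.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    mean = [sum(probs[c] for probs in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean[c])
    return best, mean


if __name__ == "__main__":
    # Two of three models prefer class 1, but the first model is very confident in class 0.
    print(soft_vote([[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]))
```

Unlike majority (hard) voting, soft voting takes each model's confidence into account, so a single confident model can outweigh two uncertain ones, as in the usage example.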

4 RESULTS

The overall performance of the models tested on each data set is shown in Fig. 7 to Fig. 9. Metrics like AP are designed for binary classification but can be extended to multi-class classification by applying a one-vs-all strategy, which takes one class as positive and the remaining classes as negative. We compare each model with a baseline model, the Basic Local Alignment Search Tool (BLAST) with default parameters, in terms of AP, F1 score, precision and recall values. Five-fold cross-validation is also applied to BLAST. The results of BLAST are framed by the solid black line. All models outperform the baseline, especially multi-channel BiGRU and multi-channel CNN, and the host classification task is harder than the subtype classification task for all models.
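The one-vs-all strategy can be sketched for the F1 score as follows. The way per-class scores are combined is an assumption: the sketch uses an unweighted (macro) average, and the paper does not state which averaging was used.

```python
def one_vs_rest_f1(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the one-vs-rest F1 scores over the classes in `y_true`."""
    classes = sorted(set(y_true))
    return sum(one_vs_rest_f1(y_true, y_pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    truth = ["avian", "avian", "swine", "human"]
    predicted = ["avian", "swine", "swine", "human"]
    print(macro_f1(truth, predicted))
```

With a macro average every class counts equally, so rare hosts influence the score as much as common ones, which matters for highly imbalanced data.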

All models are trained only on the pre-19 data set and tested on the post-19 and incomplete data sets. The pre-19 data set includes 44 hosts, 17 HA subtypes and 11 NA subtypes, which makes it more diverse than the post-19 set (15 hosts, 15 HA and 10 NA) and the incomplete set (30 hosts, 16 HA and 10 NA). The pre-19 and post-19 data sets contain only complete sequences, as opposed to the incomplete data set. The post-19 data set is thus the least diverse, and all models performed better on the post-19 data set than on the pre-19 and incomplete data sets, with the best model being the multi-channel BiGRU. Multi-channel BiGRU achieves 98.92% (98.88%, 98.97%) AP, 98.33% (98.22%, 98.44%) precision, 98.13% (98.05%, 98.22%) F1 and 98.08% (97.98%, 98.18%) recall on the post-19 set.

On the pre-19 and incomplete data sets, multi-channel CNN yields the best results, with an AP of 93.38% (93.04%, 93.72%), a precision of 92.40% (91.99%, 92.81%), an F1 of 92.00% (91.57%, 92.44%) and a recall of 93.01% (92.63%, 93.38%) on the pre-19 data set; and an AP of 96.41% (96.08%, 96.74%), a precision of 93.65% (93.25%, 94.05%), an F1 of 93.42% (93.04%, 93.81%) and a recall of 94.08% (93.70%, 94.46%) on the incomplete data set.

We further select two pairs of HA and NA sequences from the two strains isolated from the first human cases of H5N8 and H10N3 infection. A male patient was diagnosed with an A/H10N3 infection on 28 May 2021, and the isolated virus strain was named A/Jiangsu/428/2021. Whole-genome sequencing analysis and phylogenetic analysis demonstrated that this strain is of avian origin. More specifically, the HA, NA, PB2, NS, PB1, MP, PA and NP genes of this strain were closely related to some strains isolated from chicken (Wang et al., 2021), which aligns with our model's prediction, as shown in Table 2. The second strain was isolated from poultry farm workers in Russia during a large-scale avian virus outbreak and was named A/Astrakhan/3212/2020. Phylogenetic analysis shows that this strain has high similarity with some avian strains at the amino acid level (Pyankova et al., 2021), which also matches our findings.

Figure 7: Comparison of Overall Performance Between Models (Hosts): the baseline results with BLAST are framed by the black solid line.

Figure 8: Comparison of Overall Performance Between Models (HA subtypes): the baseline results with BLAST are framed by the black solid line.

5 CONCLUSION AND DISCUSSION

Influenza viruses mutate rapidly, leading to seasonal epidemics, but they rarely cause pandemics. However, influenza viruses can exacerbate underlying diseases, which increases the mortality risk. In this paper, we have proposed multi-channel neural networks that can rapidly and accurately predict viral hosts at a lower taxonomical level as well as predict subtypes of IAV given the HA and NA sequences. In contrast to handcrafting an encoding scheme for transforming the protein sequences into numerical vectors, our network can learn the embedding of protein trigrams (three consecutive amino acids in the sequence), which transforms a protein sequence into dense vectors. The neural network architecture is designed to be multi-channel: it takes multiple inputs and generates multiple outputs, eliminating the need to train separate models for similar tasks.

Figure 9: Comparison of Overall Performance Between Models (NA Subtypes): the baseline results with BLAST are framed by the black solid line.

Table 2: Case Study.

A/Jiangsu/428/2021 (human; H10N3)

Algorithms    Predicted Hosts                                              Predicted HA               Predicted NA
BiGRU         chicken (0.7), duck (0.3)                                    H10                        N3 (0.95), mixed (0.05)
BiLSTM        chicken (0.65), duck (0.35)                                  H10                        N3
CNN           chicken (0.6), duck (0.4)                                    H10 (0.95), mixed (0.05)   N3 (0.95), mixed (0.05)
Transformer   duck (0.7), mallard (0.25), chicken (0.05)                   H10                        N3 (0.95), mixed (0.05)

A/Astrakhan/3212/2020 (human; H5N8)

Algorithms    Predicted Hosts                                              Predicted HA               Predicted NA
BiGRU         chicken (0.5), duck (0.35), goose (0.1), environment (0.05)  H5                         N8
BiLSTM        chicken (0.75), duck (0.2), swan (0.05)                      H5                         N8
CNN           chicken (0.6), duck (0.2), goose (0.2)                       H5                         N8
Transformer   duck (0.65), chicken (0.25), goose (0.1)                     H5                         N8


We incorporate CNN, BiLSTM, BiGRU, and Transformer algorithms as part of our multi-channel neural network architecture, and we find that BiGRU produces better results than the other algorithms. A simple case study showed that our results matched amino acid-level phylogenetic analysis in predicting the host and subtype of origin for the first human cases of infection with H5N8 and H10N3. Our approach enables accurate prediction of potential host origins and subtypes for these strains and could benefit many resource-poor regions where expensive laboratory experiments are economically difficult to conduct. However, as we only utilized protein sequence data, our models cannot predict the type of receptor that the virus may be compatible with. Therefore, further research is needed to predict potential viruses that are cross-species transmissible.

Furthermore, we only apply supervised learning algorithms in this study, which rely on correctly labelled data and favour the majority classes, resulting in poor predictive ability for labels with insufficient data. Therefore, making better use of such scarce data is also a goal of future research.

ACKNOWLEDGMENTS

This work is supported by the University of Liverpool.

REFERENCES

Ahsan, R. and Ebrahimi, M. (2018). The ﬁrst implication of

image processing techniques on inﬂuenza a virus sub-

typing based on ha/na protein sequences, using convo-

lutional deep neural network. bioRxiv, page 448159.

Akosa, J. (2017). Predictive accuracy: A misleading perfor-

mance measure for highly imbalanced data. In Pro-

ceedings of the SAS Global Forum, volume 12. 3.4

Asgari, E. and Mofrad, M. R. (2015). Continuous

distributed representation of biological sequences

for deep proteomics and genomics. PloS one,

10(11):e0141287. 3.2

Asha, K. and Kumar, B. (2019). Emerging inﬂuenza d virus

threat: what we know so far! Journal of Clinical

Medicine, 8(2):192. 1

Attaluri, P. K., Chen, Z., and Lu, G. (2010). Applying

neural networks to classify inﬂuenza virus antigenic

types and hosts. In 2010 IEEE Symposium on Com-

putational Intelligence in Bioinformatics and Compu-

tational Biology, pages 1–6. IEEE. 2

Attaluri, P. K., Chen, Z., Weerakoon, A. M., and Lu, G.

(2009). Integrating decision tree and hidden markov

model (hmm) for subtype prediction of human in-

ﬂuenza a virus. In International Conference on Multi-

ple Criteria Decision Making, pages 52–58. Springer.

Bockhorst, J. and Craven, M. (2005). Markov networks

for detecting overlapping elements in sequence data.

Advances in Neural Information Processing Systems,

17:193–200. 3.4

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D.,

Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,

Askell, A., et al. (2020). Language models are few-

shot learners. Advances in neural information pro-

cessing systems, 33:1877–1901. 3.3.2

Bunescu, R., Ge, R., Kate, R. J., Marcotte, E. M., Mooney,

R. J., Ramani, A. K., and Wong, Y. W. (2005). Com-

parative experiments on learning information extrac-

tors for proteins and their interactions. Artiﬁcial intel-

ligence in medicine, 33(2):139–155. 3.4

Cao, Y., Liu, Z., Li, C., Li, J., and Chua, T.-S. (2019).

Multi-channel graph neural network for entity align-

ment. arXiv preprint arXiv:1908.09898. 2

Casas-Aparicio, G. A., León-Rodríguez, I., Hernández-Zenteno, R. d. J., Castillejos-López, M., Alvarado-de la Barrera, C., Ormsby, C. E., and Reyes-Terán, G. (2018). Aggressive fluid accumulation is associated with acute kidney injury and mortality in a cohort of patients with severe pneumonia caused by influenza A H1N1 virus. PLoS One, 13(2):e0192592. 1

Chen, Y., Wang, K., Yang, W., Qing, Y., Huang, R., and

Chen, P. (2020). A multi-channel deep neural network

for relation extraction. IEEE Access, 8:13195–13203.

Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio,

Y. (2014). On the properties of neural machine trans-

lation: Encoder-decoder approaches. 3.3.1

Chrysostomou, C., Alexandrou, F., Nicolaou, M. A., and

Seker, H. (2021). Classiﬁcation of inﬂuenza hemag-

glutinin protein sequences using convolutional neural

networks. arXiv preprint arXiv:2108.04240. 2

Clayville, L. R. (2011). Inﬂuenza update: a review of cur-

rently available vaccines. Pharmacy and Therapeu-

tics, 36(10):659. 1, 2

Davis, J., Burnside, E. S., de Castro Dutra, I., Page, D.,

Ramakrishnan, R., Costa, V. S., and Shavlik, J. W.

(2005). View learning for statistical relational learn-

ing: With an application to mammography. In IJCAI,

pages 677–683. Citeseer. 3.4

Davis, J. and Goadrich, M. (2006). The relationship be-

tween precision-recall and roc curves. In Proceed-

ings of the 23rd international conference on Machine

learning, pages 233–240. 3.4

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.

(2018). Bert: Pre-training of deep bidirectional trans-

formers for language understanding. arXiv preprint

arXiv:1810.04805. 3.3.2

Eng, C. L., Tong, J. C., and Tan, T. W. (2014). Predicting

host tropism of inﬂuenza a virus proteins using ran-

dom forest. BMC medical genomics, 7(3):1–11. 2

England, P. H. (2020). Inﬂuenza: the green book, chapter

19. 1

Fabijańska, A. and Grabowski, S. (2019). Viral genome deep classifier. IEEE Access, 7:81297–81307. 2


George, A., Mostaani, Z., Geissenbuhler, D., Nikisins, O.,

Anjos, A., and Marcel, S. (2019). Biometric face pre-

sentation attack detection with multi-channel convo-

lutional neural network. IEEE Transactions on Infor-

mation Forensics and Security, 15:42–55. 2

Goadrich, M., Oliphant, L., and Shavlik, J. (2004).

Learning ensembles of ﬁrst-order clauses for recall-

precision curves: A case study in biomedical informa-

tion extraction. In International Conference on Induc-

tive Logic Programming, pages 98–115. Springer. 3.4

Graves, A., Beringer, N., and Schmidhuber, J. (2005).

Rapid retraining on speech data with LSTM recurrent

networks. Technical Report IDSIA-09–05, IDSIA.

3.3.1

Graves, A. and Schmidhuber, J. (2005). Framewise

phoneme classiﬁcation with bidirectional LSTM and

other neural network architectures. Neural networks,

18(5-6):602–610. 3.3.1

Grechishnikova, D. (2021). Transformer neural network for

protein-speciﬁc de novo drug generation as a machine

translation problem. Scientiﬁc reports, 11(1):1–13.

3.3.2

Hochreiter, S. and Schmidhuber, J. (1997a). Long short-

term memory. Neural computation, 9(8):1735–1780.

3.3.1

Hochreiter, S. and Schmidhuber, J. (1997b). LSTM can solve

hard long time lag problems. Advances in neural in-

formation processing systems, pages 473–479. 3.3.1

Iuliano, A. D., Roguski, K. M., Chang, H. H., Muscatello,

D. J., Palekar, R., Tempia, S., Cohen, C., Gran, J. M.,

Schanzer, D., Cowling, B. J., et al. (2018). Esti-

mates of global seasonal inﬂuenza-associated respi-

ratory mortality: a modelling study. The Lancet,

391(10127):1285–1300. 1

James, S. H. and Whitley, R. J. (2017). Inﬂuenza viruses.

In Infectious diseases, pages 1465–1471. Elsevier. 1

Kerzel, M., Ali, M., Ng, H. G., and Wermter, S. (2017).

Haptic material classiﬁcation with a multi-channel

neural network. In 2017 International Joint Confer-

ence on Neural Networks (IJCNN), pages 439–446.

IEEE. 2

Kincaid, C. (2018). N-gram methods for inﬂuenza host

classiﬁcation. In Proceedings of the International

Conference on Bioinformatics & Computational Biol-

ogy (BIOCOMP), pages 105–107. The Steering Com-

mittee of The World Congress in Computer Science,

Computer . . . . 2

Kwon, E., Cho, M., Kim, H., and Son, H. S. (2020).

A study on host tropism determinants of inﬂuenza

virus using machine learning. Current Bioinformat-

ics, 15(2):121–134. 2

Lau, L. L., Cowling, B. J., Fang, V. J., Chan, K.-H., Lau,

E. H., Lipsitch, M., Cheng, C. K., Houck, P. M.,

Uyeki, T. M., Peiris, J. M., et al. (2010). Viral

shedding and clinical illness in naturally acquired in-

ﬂuenza virus infections. The Journal of infectious dis-

eases, 201(10):1509–1516. 1

Mills, C. E., Robins, J. M., and Lipsitch, M. (2004).

Transmissibility of 1918 pandemic inﬂuenza. Nature,

432(7019):904–906. 1

Mock, F., Viehweger, A., Barth, E., and Marz, M. (2021).

VIDHOP, viral host prediction with deep learning.

Bioinformatics, 37(3):318–325. 2

Pyankova, O. G., Susloparov, I. M., Moiseeva, A. A.,

Kolosova, N. P., Onkhonova, G. S., Danilenko, A. V.,

Vakalova, E. V., Shendo, G. L., Nekeshina, N. N.,

Noskova, L. N., et al. (2021). Isolation of clade

2.3.4.4b A(H5N8), a highly pathogenic avian inﬂuenza

virus, from a worker during an outbreak on a poultry

farm, Russia, December 2020. Eurosurveillance,

26(24):2100439. 4

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang,

S., Matena, M., Zhou, Y., Li, W., and Liu, P. J.

(2019). Exploring the limits of transfer learning

with a uniﬁed text-to-text transformer. arXiv preprint

arXiv:1910.10683. 3.3.2

Saito, T. and Rehmsmeier, M. (2015). The precision-recall

plot is more informative than the ROC plot when eval-

uating binary classiﬁers on imbalanced datasets. PLoS

One, 10(3):e0118432. 3.4

Scarafoni, D., Telfer, B. A., Ricke, D. O., Thornton, J. R.,

and Comolli, J. (2019). Predicting inﬂuenza A tropism

with end-to-end learning of deep networks. Health

security, 17(6):468–476. 2

Shaw, M. and Palese, P. (2013). Orthomyxoviridae. In

Fields Virology, pages 1151–1185. 1

Sherif, F. F., Zayed, N., and Fakhr, M. (2017). Classiﬁca-

tion of host origin in inﬂuenza A virus by transferring

protein sequences into numerical feature vectors. Int

J Biol Biomed Eng, 11. 2

Shu, Y. and McCauley, J. (2017). GISAID: Global initia-

tive on sharing all inﬂuenza data–from vision to real-

ity. Eurosurveillance, 22(13):30494. 3.1.1

Squires, R. B., Noronha, J., Hunt, V., García-Sastre, A.,

Macken, C., Baumgarth, N., Suarez, D., Pickett, B. E.,

Zhang, Y., Larsen, C. N., et al. (2012). Inﬂuenza re-

search database: an integrated bioinformatics resource

for inﬂuenza research and surveillance. Inﬂuenza and

other respiratory viruses, 6(6):404–416. 3.1.1

Su, W., Yuan, Y., and Zhu, M. (2015). A relationship be-

tween the average precision and the area under the ROC

curve. In Proceedings of the 2015 International Con-

ference on The Theory of Information Retrieval, pages

349–352. 3.4

Thireou, T. and Reczko, M. (2007). Bidirectional

long short-term memory networks for predicting

the subcellular localization of eukaryotic proteins.

IEEE/ACM transactions on computational biology

and bioinformatics, 4(3):441–446. 3.3.1

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,

L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.

(2017). Attention is all you need. Advances in neural

information processing systems, 30. 3.3.2, 3.4

Vemula, S. V., Zhao, J., Liu, J., Wang, X., Biswas, S., and

Hewlett, I. (2016). Current approaches for diagno-

sis of inﬂuenza virus infections in humans. Viruses,

8(4):96. 2

Wang, Y., Niu, S., Zhang, B., Yang, C., and Zhou, Z. (2021).

Withdrawn: The whole genome analysis for the ﬁrst

human infection with H10N3 inﬂuenza virus in China.

Watanabe, T. (2013). Renal complications of seasonal and

pandemic inﬂuenza A virus infections. European jour-

nal of pediatrics, 172(1):15–22. 1

Wilde, J. A., McMillan, J. A., Serwint, J., Butta, J.,

O’Riordan, M. A., and Steinhoff, M. C. (1999). Ef-

fectiveness of inﬂuenza vaccine in health care profes-

sionals: a randomized trial. JAMA, 281(10):908–913.

Xu, B., Tan, Z., Li, K., Jiang, T., and Peng, Y. (2017). Pre-

dicting the host of inﬂuenza viruses based on the word

vector. PeerJ, 5:e3579. 2

Xu, Y. and Wojtczak, D. (2022). Dive into machine

learning algorithms for inﬂuenza virus host predic-

tion with hemagglutinin sequences. arXiv preprint

arXiv:2207.13842. 3.2

Yang, Y., Wu, Q., Qiu, M., Wang, Y., and Chen, X. (2018).

Emotion recognition from multi-channel EEG through

parallel convolutional recurrent neural network. In

2018 international joint conference on neural net-

works (IJCNN), pages 1–7. IEEE. 2

Yin, R., Zhou, X., Rashid, S., and Kwoh, C. K. (2020).

HopPER: an adaptive model for probability estima-

tion of inﬂuenza reassortment through host prediction.

BMC medical genomics, 13(1):1–13. 2
