Aircraft Type Recognition in Remote Sensing Images using Mean Interval Kernel

Jaya Sharma^1, Rajeshreddy Datla^1,2, Yenduri Sravani^1, Vishnu Chalavadi^1 and Krishna Mohan C.^1

^1 Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India
^2 Advanced Data Processing Research Institute (ADRIN), Department of Space, Secunderabad, India
Keywords:
Remote Sensing Images, Aircraft Type Recognition, Structural Information Model, Scale-invariant Feature
Transform (SIFT), Dynamic Kernels.
Abstract:
The representation of structural characteristics and their fine variations is crucial for the recognition of different types of aircrafts in remote sensing images. Aircraft type classification across remote sensing images from different sensors, with differing spectral and spatial resolutions of the objects in an image, involves identifying variable-length spatial patterns. In our proposed approach, we explore dynamic kernels to deal with the variable-length spatial patterns of aircrafts in remote sensing images. A Gaussian mixture model (GMM), namely, the structure model (SM), is trained over aircraft scenes to implicitly learn the local structures using spatial scale-invariant feature transform (SIFT) features. The statistics of the SM are used to design a dynamic kernel, namely, the mean interval kernel (MIK), to deal with the spatial changes globally in the identical scene and preserve the similarities in local spatial structures. The efficacy of the proposed method is demonstrated on the multi-type aircraft remote sensing images (MTARSI) benchmark dataset (20 distinct kinds of aircraft) using MIK. Also, we compare the performance of the proposed approach with other dynamic kernels, such as the supervector kernel (SVK) and the intermediate matching kernel (IMK).
1 INTRODUCTION
With the current earth observation capabilities at the pixel level, fine details of the ground are revealed and rich spatial content can be gathered from high-resolution remote sensing images. These details facilitate the computer vision community in exploring even small man-made objects, along with their sub-parts. The inherent fine variations within these sub-parts reveal unique characteristics, which are helpful for object recognition tasks. In remote sensing images, aircraft type recognition is one such task, in which distinct characteristics decide the aircraft type. Aircraft type recognition supports various applications such as status monitoring (Zhong et al., 2018), airport surveillance analysis (Chen et al., 2014), and aircraft identification (He et al., 2018), (Chen et al., 2014). In particular, an aircraft type and its dynamics remarkably help in examining battlefields, thereby supporting rapid strategic military decisions (Liu et al., 2012), (Zhong et al., 2018).
Typically, the background occupies far more pixels in a remote sensing aircraft scene than the aircraft itself. The fine variations among the sub-parts of different aircraft types, such as the nose, empennage, fuselage, wings, and engines, are difficult to perceive because they occupy few pixels in the images, resulting in low classification performance. Consequently, different aircraft types with identical backgrounds, as shown in Fig. 1, exhibit low inter-class variation. Also, various types of aircrafts, such as B-52 & C-135, KC-10 & Boeing, and A-26 & P-63, are visually similar to each other. Along with the distinct background of an aircraft, image acquisition factors such as spatial resolution, shadows caused by illumination conditions, variations in scale and view-angle, and occlusion of sub-parts also cause high intra-class variation (e.g., the aircraft of type Boeing, P-63, C-5, and C-135 in Fig. 1).
In addition to the remote sensing image characteristics, ground handling equipment, e.g., refuelers, aircraft service stairs, dollies, aero bridges, and ground power units located near the aircraft disrupts the aircraft structure. Moreover, such equipment introduces an addition or a change to the existing structure of an aircraft. These changes in the structure further pose challenges in demonstrating the fine varia-
tions among the different aircraft types.
Figure 1: Typical characteristics of the multi-type aircraft remote sensing images (MTARSI) benchmark dataset (Wu et al., 2020). (a) Intra-class diversity: the first, second, third, and fourth columns show the aircraft types C-5, Boeing, C-135, and P-63, respectively. (b) Inter-class similarity: the first row shows the similarity of Boeing vs. U-2 and KC-10 vs. Boeing; the second row shows C-135 vs. B-52 and P-63 vs. A-26 (best viewed in color).
In the literature on recognizing aircraft types in remote sensing images, methods based on hand-engineered features (Cheng et al., 2017) are not robust, as their features are designed specifically for certain attributes of an image. The template matching-based methods (Wu et al., 2014) (Xu and Duan, 2010) (Liu et al., 2012) apply predefined templates in the recognition process. These methods have limited ability to generalize to aircraft of the same type but of different sizes, if such variations are not incorporated in the templates. Though high-level features are automatically learned from the data by CNNs (Zhao et al., 2017), they do not focus on the finer variations among different aircrafts. Hence, the performance of CNN-based approaches is not satisfactory in recognising various types of aircrafts in remote sensing images.
In this work, we propose an approach to obtain an
efficient representation for the classification of differ-
ent types of aircrafts in remote sensing images. We
employ scale-invariant feature transform (SIFT) (Ha
and Moon, 2011) to capture the local structures such
as nose, empennage, fuselage, wings, engines, etc., of
aircrafts in remote sensing images. To encode these
variable length local SIFT features, a single Gaussian
mixture model (GMM) also known as structure model
(SM) is trained. The statistics of SM are employed
to estimate the likeness between any two images by
computing the distance between them. Better separability between the various image classes is achieved by kernel methods, which map these distances to a different space (Smola and Schölkopf, 2004). Although most such approaches are suited to fixed-length patterns, they are limited when the number of local features varies between two images. So, to handle patterns of variable length, we use dynamic kernels, which either select the best local features or project them into fixed-length patterns.
In dynamic kernels, base kernels are used to measure the likeness between two images by computing the distance between their local features. In probability-based kernels, the base kernel is computed using the posterior probabilities of the GMM, whereas in matching-based kernels, the local features that are closest to the GMM means are included in the kernel computation. This helps in retaining key local structures, including the spatial patterns, during the computation of the base kernel. Some scene elements, such as rounded road connections or rows of houses in dense residential areas, are considered significant spatial patterns. Hence, dynamic kernels become a suitable option for representing the likeness among images.
The contributions of this paper can be summarized as follows:
- The similarities in the spatial patterns of local structures of the aircraft are retained by training a structure model (SM).
- To deal with variable-length spatial features of images, dynamic kernels are explored to retain the local structures and capture the global variations in an image.
- The efficacy of our approach is demonstrated on a large aircraft recognition dataset: multi-type aircraft remote sensing images (MTARSI), a benchmark dataset consisting of 20 different types of aircrafts.
2 RELATED WORK
The existing approaches related to the image classifi-
cation in remote sensing images are discussed in this
section. Also, we summarize the techniques that han-
dle the variable length patterns based on the dynamic
kernels.
The low-level, mid-level, and high-level features
are explored in the existing methods for classification
of different objects in remote sensing images (Cheng
et al., 2017) (Azam et al., 2021) (Lin et al., 2018).
Representations encoded from low-level features rely heavily on hand-crafted features and primarily focus on precise attributes of the images. In their design, the most prevalent features used are shape, color, structural details, spatial layout, texture, etc. However, due to the characteristics of remote sensing images, such combinations of spatial features are usually difficult to achieve. Some global attributes
like texture descriptors and color histograms are used
in object classification tasks (Cheng et al., 2017)– (Li
et al., 2018). But, for encoding the local properties,
we need an extra mechanism to describe an entire im-
age. Hence, the mid-level features are used by trans-
forming the local features into the global features to
describe an image completely. Among the mid-level approaches, bag-of-visual-words (BOVW) with scale-invariant feature transform (SIFT) features (Lowe, 2004), the combination of BOVW with spatial pyramid matching (SPM) (Cheng et al., 2017), (Yang and Newsam, 2008), and locality-constrained linear coding (LLC) methods are used for the classification of remote sensing images. To obtain effective sparselets (Cheng et al., 2015a)– (Cheng et al., 2015b), part detectors are explored by employing histogram of oriented gradients (HOG) feature descriptors for image classification. A consolidated framework for joint super-resolution and aircraft recognition (Joint-SRARNet) is proposed by Tang et al. (Tang et al., 2020), which tries to enhance recognition performance by generating discriminative, high-resolution aircraft images from low-resolution remote sensing images. Technically, this network integrates the super-resolution and recognition tasks into a generative adversarial network (GAN) framework through a joint loss function.
The methods employing convolutional features
have demonstrated better classification performance
when compared to mid-level or hand-crafted feature
based methods. This is because the convolutional fea-
tures are able to provide good discrimination along
with better generalization. Also, the effectiveness of fine-tuned and pre-trained versions of GoogLeNet, VGGNet16, and AlexNet (Cheng et al., 2018), (Wu et al., 2020), (He et al., 2018) in the classification of objects in remote sensing images was demonstrated.
The ensemble of CNNs is utilized to improve the clas-
sification performance over pre-trained CNN models
(Zhao et al., 2017), (Chang and Lin, 2011). The
object classification performance is further improved
by accumulating, integrating, or fusing numerous at-
tributes of CNN (Zhong et al., 2018)–(Chaib et al.,
2017). In (Sitaula et al., 2020), the combination of
object-based and scene-based attributes from both re-
gion level as well as scene level are used for image
depiction. To address the problem of inter and intra
class dissimilarities in object classification, an objec-
tive function is augmented along with the features of
CNN (Cheng et al., 2018).
Later, a key filter bank based CNN (KFBNet) (Li et al., 2020) is used to assimilate class-specific features from the key locations of each image in order to preserve global information for image classification. Another approach (Zhao et al., 2017) models the hidden ontological structure of remote sensing images using multi-granularity canonical appearance pooling. A Siamese network is explored to obtain CNN attributes and to determine the structure at the granule level. Gaussian covariance matrices are derived by calculating second-order statistics from the obtained CNN attributes, and better classification performance is obtained by suitably normalizing the covariance matrices during training.
The combination of hidden Markov model (HMM) and Gaussian mixture model (GMM) for variable-length pattern representation is explored in various application domains such as image, speech, video, and music analysis. Dynamic kernels (Dileep and Sekhar, 2013) (Perveen et al., 2020) (Boumeddane et al., 2019) are important for obtaining a fixed-length feature vector from patterns of variable length. Lee et al. (Lee et al., 2007) estimated Gaussian densities to construct a probabilistic sequence kernel (PSK) that produces discriminative rather than generative features. A Bhattacharyya distance-based calculation between the GMM mixtures is employed to incorporate both the first- and second-order GMM statistics (You et al., 2009) in order to boost the computational performance of PSK. A single universal background model (UBM) is constructed to model the features from multiple speakers. The means and covariances of the UBM are then adapted for each speaker to obtain mean interval supervectors. The kernel resulting from such a supervector is specified as the Gaussian mean interval kernel for categorization using a support vector machine (SVM). Rather than transforming means or covariances, intermediate matching kernels (IMK) (Boughorbel et al., 2005) use virtual feature vector sets based on GMM mixtures to select the nearest local feature vectors from each image. As the selected features acquired from a clip are fewer than the local features used in
the probabilistic sequence kernel (PSK) (Dileep and Sekhar, 2013) and the Gaussian mean interval kernel, IMK is computationally more efficient. It was also demonstrated that a further reduction in computation time is possible through optimal selection of the virtual features.
3 PROPOSED METHOD
A dynamic kernel, namely, the mean interval kernel (MIK), for aircraft type recognition in remote sensing images is discussed in this section. Figure 2 shows the proposed approach, which consists of three modules: feature extraction, structure model (SM), and construction of dynamic kernels for classification in a better kernel space.
3.1 Feature Extraction
The scale-invariant feature transform (SIFT) (Lowe, 2004) features are extracted to represent local structural information such as engines, wings, nose, etc. For this, key points are sampled in linear scale space to identify locations that are invariant to occlusion, view-point, and scale. A 16×16 region is extracted around each key location to compute the gradient magnitudes and orientations. This region is divided into 4×4 sub-regions, and an 8-bin orientation histogram is computed for each sub-region, resulting in a 128-dimensional descriptor. The N×128 features around the N key points describe the local structural information for discriminating different aircrafts. These SIFT features from all the remote sensing aircraft scenes are used to construct the structure model, which is explained in the next sub-section.
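For illustration, a minimal sketch of this feature extraction step is given below, assuming OpenCV's SIFT implementation (the paper does not prescribe a particular library; the helper name extract_sift is ours):

# A minimal sketch of the local feature extraction step, assuming OpenCV's
# SIFT implementation; the helper name extract_sift is not part of the paper.
import cv2
import numpy as np

def extract_sift(image_path):
    """Return an (N x 128) array of SIFT descriptors for one aircraft scene."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()                          # available in opencv-python >= 4.4
    keypoints, descriptors = sift.detectAndCompute(img, None)
    if descriptors is None:                           # scenes where no keypoints are found
        return np.empty((0, 128), dtype=np.float32)
    return descriptors                                # N varies from image to image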
3.2 Structure Model (SM)
The structure model (SM) is a Gaussian mixture
model (GMM) that encodes the SIFT features in order
to capture the local structures responsible for differ-
entiating the types of aircrafts. These local structures,
also known as attributes like nose, engine, fuselage,
etc., collectively form an aircraft. The SM is trained
on SIFT features of all aircraft remote sensing im-
ages to model these local structures of aircrafts. The
SM with weights w_q, means µ_q, and covariances Σ_q is given by

p(x_k \mid \{w_q, \mu_q, \Sigma_q\}) = \sum_{q=1}^{Q} w_q \, \mathcal{N}(x_k \mid \mu_q, \Sigma_q).   (1)
The SM is trained on the SIFT feature descriptors x_k using the expectation maximization (EM) algorithm. We assume that each GMM component captures an attribute of the aircraft and that the variance of each mixture determines the variations of spatial patterns in different aircrafts. The parameters of SM are adapted using maximum a posteriori (MAP) adaptation to enhance the contribution of each component. These parameters of SM are computed by
n_q(x) = \sum_{k=1}^{K} p(q \mid x_k),   (2)

F_q(x) = \frac{1}{n_q(x)} \sum_{k=1}^{K} p(q \mid x_k) \, x_k,   (3)

and

S_q(x) = \frac{1}{n_q(x)} \sum_{k=1}^{K} p(q \mid x_k) \, x_k^2,   (4)

where p(q \mid x_k) = \frac{w_q \, p(x_k \mid q)}{\sum_{q'=1}^{Q} w_{q'} \, p(x_k \mid q')} is the posterior probability of SM component q and p(x_k \mid q) is the likelihood of feature vector x_k. These SM parameters are used to obtain a compact representation for handling variable-length patterns using various dynamic kernels.
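As an illustration, the following minimal sketch trains the SM and computes the per-image statistics of Eqs. (2)-(4), assuming scikit-learn's GaussianMixture with diagonal covariances; the function names are ours:

# A minimal sketch of SM training and of the statistics of Eqs. (2)-(4),
# assuming scikit-learn's GaussianMixture; names follow the paper's notation.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_sm(all_descriptors, Q=128, seed=0):
    """Fit a Q-component GMM (the SM) on SIFT descriptors pooled over all training images."""
    # Q = 128 components performed best in the paper's experiments (Table 1).
    sm = GaussianMixture(n_components=Q, covariance_type='diag', random_state=seed)
    sm.fit(all_descriptors)                          # all_descriptors: (total_keypoints, 128)
    return sm

def sm_statistics(sm, X, eps=1e-10):
    """Zeroth/first/second-order statistics n_q, F_q, S_q for one image's descriptors X (K x 128)."""
    post = sm.predict_proba(X)                       # posterior p(q | x_k), shape (K, Q)
    n_q = post.sum(axis=0)                           # Eq. (2)
    F_q = (post.T @ X) / (n_q[:, None] + eps)        # Eq. (3)
    S_q = (post.T @ X ** 2) / (n_q[:, None] + eps)   # Eq. (4), element-wise square
    return n_q, F_q, S_q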
3.3 SM-Mean Interval Kernel
(SM-MIK)
Dynamic kernels are kernel functions that either map variable-length spatial features to constant-length feature vectors (probability-based kernels) or choose the optimal features (matching-based kernels). SM-MIK is a probability-based kernel that incorporates the additional information captured by second-order statistics, along with the means of SM. The adapted weights, means, and covariances are given by
\hat{w}_q = \alpha \, n_q(x)/K + (1 - \alpha) \, w_q,   (5)

\hat{\mu}_q(x) = \alpha \, F_q(x) + (1 - \alpha) \, \mu_q,   (6)

\hat{\Sigma}_q(x) = \alpha \, S_q(x) + (1 - \alpha)(\Sigma_q + \mu_q^2) - \hat{\mu}_q^2(x).   (7)

The mean supervector using the adapted means and covariances is given by

\varphi_q(x) = \left( \frac{\hat{\Sigma}_q(x) + \Sigma_q}{2} \right)^{-\frac{1}{2}} \left( \hat{\mu}_q(x) - \mu_q \right).   (8)

Later, the SM-MIK is calculated to measure the similarity between two remote sensing images x_m and x_n by

K_{mv}(x_m, x_n) = \Phi_{mv}(x_m)^T \, \Phi_{mv}(x_n).   (9)

The computational time of SM-MIK is O(QL + Q(K_l^2 + K_l) + K_s^2). The time complexity of SM-MIK is high due to the calculation of the first- and second-order statistics of SM.
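A minimal sketch of the SM-MIK computation of Eqs. (5)-(9) is given below, assuming a diagonal-covariance SM fitted with scikit-learn's GaussianMixture; the relevance factor alpha and the numerical floor are illustrative choices, not values prescribed by the paper:

# A minimal sketch of the SM-MIK representation of Eqs. (5)-(9); alpha is the
# MAP relevance factor, a hyper-parameter not fixed in the paper.
import numpy as np

def mik_supervector(sm, X, alpha=0.5, eps=1e-10):
    """Mean interval supervector Phi_mv(x) for one image's descriptors X (K x 128)."""
    post = sm.predict_proba(X)                               # p(q | x_k), shape (K, Q)
    n_q = post.sum(axis=0) + eps                             # Eq. (2)
    F_q = (post.T @ X) / n_q[:, None]                        # Eq. (3)
    S_q = (post.T @ X ** 2) / n_q[:, None]                   # Eq. (4)
    mu, sigma = sm.means_, sm.covariances_                   # (Q, 128) each for 'diag'
    mu_hat = alpha * F_q + (1 - alpha) * mu                                  # Eq. (6)
    sigma_hat = alpha * S_q + (1 - alpha) * (sigma + mu ** 2) - mu_hat ** 2  # Eq. (7)
    sigma_hat = np.maximum(sigma_hat, eps)                   # numerical floor (illustrative)
    phi = ((sigma_hat + sigma) / 2.0) ** -0.5 * (mu_hat - mu)                # Eq. (8)
    return phi.ravel()                                       # concatenation over all Q components

def mik_kernel(phi_m, phi_n):
    """SM-MIK value between two images, Eq. (9): inner product of supervectors."""
    return float(phi_m @ phi_n)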
Figure 2: Block diagram of the proposed method for aircraft type recognition.
3.4 SM-Supervector Kernel (SM-SVK)
The SM-SVK computes the similarity between two aircraft scenes by comparing the adapted means of each image with the means of SM. The adapted weights and means are given by
\hat{w}_q = \alpha \, n_q(x)/K + (1 - \alpha) \, w_q,   (10)

\hat{\mu}_q(x) = \alpha \, F_q(x) + (1 - \alpha) \, \mu_q.   (11)

The SM supervector \varphi_q(x) = \left[ \sqrt{w_q} \, \Sigma_q^{-\frac{1}{2}} \, \hat{\mu}_q(x) \right]^T is obtained by concatenating the adapted means of all mixtures of SM. The SM-SVK constructed using \varphi_q(x) is given by

K_{sv}(x_m, x_n) = \Phi_{sv}(x_m)^T \, \Phi_{sv}(x_n).   (12)

The computation time of SM-SVK is O(QL + QK_l^2 + K_s^2), where Q is the number of mixtures in SM, L represents the number of local features, K_l is the dimension of the SIFT feature vector, and K_s denotes the supervector dimension.
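A corresponding sketch of the SM-SVK supervector of Eqs. (10)-(12), under the same diagonal-covariance assumption, could look as follows:

# A minimal sketch of the SM-SVK supervector of Eqs. (10)-(12); only the
# adapted means enter this kernel.
import numpy as np

def svk_supervector(sm, X, alpha=0.5, eps=1e-10):
    """GMM supervector Phi_sv(x): weight- and variance-normalized adapted means."""
    post = sm.predict_proba(X)                           # p(q | x_k)
    n_q = post.sum(axis=0) + eps
    F_q = (post.T @ X) / n_q[:, None]                    # first-order statistics, Eq. (3)
    w, mu, sigma = sm.weights_, sm.means_, sm.covariances_
    mu_hat = alpha * F_q + (1 - alpha) * mu              # adapted means, Eq. (11)
    phi = np.sqrt(w)[:, None] * sigma ** -0.5 * mu_hat   # sqrt(w_q) * Sigma_q^(-1/2) * mu_hat_q
    return phi.ravel()                                   # Eq. (12) is then a plain dot product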
3.5 Intermediate Matching Kernel
(IMK)
The IMK is a matching-based kernel that calculates the similarity by choosing the nearest local features. The IMK matches the local features from x_m and x_n with a virtual feature set V = {v_1, v_2, ..., v_Q} by calculating
x_{mq} = \arg\min_{x \in x_m} D(x, v_q),   (13)

and

x_{nq} = \arg\min_{x \in x_n} D(x, v_q).   (14)
Here, D(\cdot) is the distance function that finds local features whose spatial patterns are similar to the patterns learnt by a particular SM mixture. Representing the virtual features with the posterior probability effectively incorporates information about the component coefficients, means, and covariances. It is given by
x_{mq} = \arg\max_{x \in x_m} p(q \mid x),   (15)

and

x_{nq} = \arg\max_{x \in x_n} p(q \mid x).   (16)
The computational time of IMK is O(QL). This complexity is lower than that of the other dynamic kernels due to the selection of only the best feature vectors.
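For illustration, a minimal sketch of the IMK computation via Eqs. (15)-(16) is given below; the Gaussian base kernel is one common choice and is assumed here rather than specified by the paper:

# A minimal sketch of IMK matching via Eqs. (15)-(16): for every SM component,
# the local feature with the highest posterior is selected from each image and
# a base kernel (a Gaussian kernel here, an assumed choice) is accumulated.
import numpy as np

def imk_kernel(sm, X_m, X_n, gamma=1e-3):
    """Intermediate matching kernel between two images' SIFT descriptor sets."""
    sel_m = X_m[np.argmax(sm.predict_proba(X_m), axis=0)]   # x_mq for q = 1, ..., Q
    sel_n = X_n[np.argmax(sm.predict_proba(X_n), axis=0)]   # x_nq for q = 1, ..., Q
    sq_dist = np.sum((sel_m - sel_n) ** 2, axis=1)          # per-component squared distance
    return float(np.sum(np.exp(-gamma * sq_dist)))          # sum of Q base-kernel values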
4 EXPERIMENTAL RESULTS
The effectiveness of the proposed SM-MIK based method is evaluated on the multi-type aircraft remote sensing images (MTARSI) dataset. We also compare SM-MIK with other dynamic kernels, namely, SM-SVK and IMK, in this section.
4.0.1 Dataset
We consider a challenging dataset, multi-type aircraft remote sensing images (MTARSI) (Wu et al., 2020), to demonstrate the effectiveness of the proposed method in recognizing different aircraft types. This dataset consists of 9,385 aircraft scenes of 20 different aircraft types collected from Google Earth satellite imagery. Each scene contains exactly one aircraft, and the number of scenes per aircraft type varies from 230 to 846. The scenes of a specific aircraft type are obtained from different airports around the world, with spatial resolution between 0.3 m and 1.0 m. In addition to high intra-class variation and inter-class similarity, this dataset exhibits rich image variations. In the experimental settings, the dataset is randomly split per class into an 80%-20% training-testing ratio.
For training of the GMM, SIFT feature vectors, which describe both global and local semantics, are extracted from each image. A single GMM is trained with five different numbers of mixture components on the MTARSI dataset.
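For reference, a minimal sketch of the per-class 80%-20% split described above is shown below, assuming a stratified split with scikit-learn; paths and labels are hypothetical lists of image paths and aircraft-type labels for MTARSI:

# A minimal sketch of the stratified 80%-20% train/test split; `paths` and
# `labels` are hypothetical inputs, not names used by the paper.
from sklearn.model_selection import train_test_split

def split_dataset(paths, labels, seed=0):
    # stratify=labels keeps the 80/20 ratio within every aircraft class
    return train_test_split(paths, labels, test_size=0.2,
                            stratify=labels, random_state=seed)

# Usage: train_paths, test_paths, y_train, y_test = split_dataset(paths, labels)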
4.0.2 Evaluation of Dynamic Kernels
In Table 1, the performance of various kernels on the MTARSI dataset is presented using kernel-based SVM classifiers built with LibSVM (Chang and Lin, 2011). The dynamic kernel performance improves with the SIFT features, and it is observed that beyond 128 SM components there is no further improvement in classification performance. Also, it is noticed that SM-SVK and SM-MIK provide better classification performance than IMK. This is due to the incorporation of the first- and second-order SM statistics in SM-MIK and SM-SVK, which can efficiently model the crucial information across spatial patterns of variable-length entities. Though MIK is computationally less efficient than IMK, a suitable kernel can be selected based on the use-case.
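For reference, a minimal sketch of this classification step with a precomputed dynamic kernel is shown below, using scikit-learn's SVC (which wraps LIBSVM); the supervector matrices and the value of C are assumptions for illustration:

# A minimal sketch of kernel-SVM classification with a precomputed dynamic
# kernel; train_phis/test_phis are supervector matrices built with one of the
# sketches above, and C is a tunable hyper-parameter not fixed in the paper.
import numpy as np
from sklearn.svm import SVC

def train_and_score(train_phis, y_train, test_phis, y_test, C=10.0):
    K_train = train_phis @ train_phis.T      # Gram matrix between training images
    K_test = test_phis @ train_phis.T        # kernel values of test images vs. training images
    clf = SVC(kernel='precomputed', C=C)
    clf.fit(K_train, y_train)
    return clf.score(K_test, y_test)         # classification accuracy on the test split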
Table 1: Comparison of classification performance (accuracy, %) for various dynamic kernels with structural information model mixtures of {2^l : l = 5, ..., 9} on the MTARSI benchmark dataset.

SM mixtures   SM-SVK   SM-MIK   IMK
32            75.32    88.64    73.59
64            83.54    89.79    82.13
128           89.52    90.87    89.79
256           86.89    90.54    85.76
512           86.21    89.32    84.43
4.0.3 Comparison with State-of-the-Art Methods

Table 2 compares the existing approaches with the proposed method on the MTARSI benchmark dataset.
Table 2: Comparison of the proposed method with existing
approaches on MTARSI benchmark dataset.
Method Accuracy(%)
SIFT (Ha and Moon, 2011)+BOVW 59.02
HOG (Dalal and Triggs, 2005)+SVM 61.34
ScSPM (Yang et al., 2009) 60.61
LLC (Yu et al., 2009) 64.93
AlexNet (Krizhevsky et al., 2012) 85.61
GoogleNet (Szegedy et al., 2015) 85.61
VGG (Simonyan and Zisserman, 2014) 87.56
ResNet (He et al., 2016) 89.61
DenseNet (Huang et al., 2017) 89.15
EfficientNet (Tan and Le, 2019) 89.79
SM-SVK 89.52
IMK 89.79
SM-MIK 90.87
The proposed SM-MIK based approach outperforms the current state-of-the-art methods on the MTARSI data, achieving an accuracy of 90.87%. This is because SM-MIK can efficiently capture global dissimilarities by modelling the variable-length spatial patterns of aircrafts in the images while preserving local structures. Thus, using the means and covariances of SM, we are able to capture the global spatial characteristics for aircraft type recognition more effectively than with the local spatial features modelled by SIFT alone.
5 CONCLUSION
In this paper, a mean interval kernel (SM-MIK) based method is presented to obtain an effective representation for recognizing different aircraft types from remote sensing images. To model the varying-length spatial features of aircrafts while capturing global variations, we construct the SM-MIK over a trained Gaussian mixture model. In dealing with the varying-length spatial features of objects in remote sensing scenes, the SM-MIK is demonstrated to be better than the other kernels. In the calculation of SM-MIK, the use of the first-order and second-order statistics of the Gaussian mixture model contributes useful information for the aircraft type classification task. Though IMK is less discriminative than SM-MIK, it has better computational time complexity. The effectiveness of the proposed approach is demonstrated on the challenging large-scale MTARSI benchmark dataset for aircraft type recognition.
REFERENCES
Azam, F., Rizvi, A., Khan, W. Z., Aalsalem, M. Y., Yu, H.,
and Zikria, Y. B. (2021). Aircraft classification based
on pca and feature fusion techniques in convolutional
neural network. IEEE Access, 9:161683–161694.
Boughorbel, S., Tarel, J. P., and Boujemaa, N. (2005). The
intermediate matching kernel for image local features.
In Proceedings. 2005 IEEE International Joint Con-
ference on Neural Networks, 2005., volume 2, pages
889–894. IEEE.
Boumeddane, S., Hamdad, L., Dabo-Niang, S., and Had-
dadou, H. (2019). Spatial kernel discriminant analy-
sis: Applied for hyperspectral image classification. In
ICAART (2), pages 184–191.
Chaib, S., Liu, H., Gu, Y., and Yao, H. (2017). Deep fea-
ture fusion for vhr remote sensing scene classification.
IEEE Transactions on Geoscience and Remote Sens-
ing, 55(8):4775–4784.
Chang, C.-C. and Lin, C.-J. (2011). Libsvm: a library for
support vector machines. ACM transactions on intel-
ligent systems and technology (TIST), 2(3):1–27.
Chen, J., Zhang, B., and Wang, C. (2014). Backscatter-
ing feature analysis and recognition of civilian aircraft
in terrasar-x images. IEEE Geoscience and Remote
Sensing Letters, 12(4):796–800.
Cheng, G., Han, J., Guo, L., and Liu, T. (2015a). Learning
coarse-to-fine sparselets for efficient object detection
and scene classification. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 1173–1181.
Cheng, G., Han, J., and Lu, X. (2017). Remote sensing
image scene classification: Benchmark and state of
the art. Proceedings of the IEEE, 105(10):1865–1883.
Cheng, G., Yang, C., Yao, X., Guo, L., and Han, J. (2018).
When deep learning meets metric learning: Remote
sensing image scene classification via learning dis-
criminative cnns. IEEE transactions on geoscience
and remote sensing, 56(5):2811–2821.
Cheng, G., Zhou, P., Han, J., Guo, L., and Han, J. (2015b).
Auto-encoder-based shared mid-level visual dictio-
nary learning for scene classification using very high
resolution remote sensing images. IET Computer Vi-
sion, 9(5):639–647.
Dalal, N. and Triggs, B. (2005). Histograms of oriented
gradients for human detection. In 2005 IEEE com-
puter society conference on computer vision and pat-
tern recognition (CVPR’05), volume 1, pages 886–
893. Ieee.
Dileep, A. D. and Sekhar, C. C. (2013). Gmm-based inter-
mediate matching kernel for classification of varying
length patterns of long duration speech using support
vector machines. IEEE Transactions on Neural Net-
works and Learning Systems, 25(8):1421–1432.
Ha, S.-W. and Moon, Y.-H. (2011). Multiple object track-
ing using sift features and location matching. Interna-
tional Journal of Smart Home, 5(4):17–26.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Iden-
tity mappings in deep residual networks. In Euro-
pean conference on computer vision, pages 630–645.
Springer.
He, N., Fang, L., Li, S., Plaza, A., and Plaza, J. (2018).
Remote sensing scene classification using multilayer
stacked covariance pooling. IEEE Transactions on
Geoscience and Remote Sensing, 56(12):6899–6910.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger,
K. Q. (2017). Densely connected convolutional net-
works. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 4700–
4708.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. Advances in neural information processing
systems, 25:1097–1105.
Lee, K.-A., You, C., Li, H., and Kinnunen, T. (2007). A
gmm-based probabilistic sequence kernel for speaker
verification. In Eighth Annual Conference of the In-
ternational Speech Communication Association. Cite-
seer.
Li, F., Feng, R., Han, W., and Wang, L. (2020). High-
resolution remote sensing image scene classification
via key filter bank based on convolutional neural net-
work. IEEE Transactions on Geoscience and Remote
Sensing, 58(11):8077–8092.
Li, P., Ren, P., Zhang, X., Wang, Q., Zhu, X., and Wang, L.
(2018). Region-wise deep feature representation for
remote sensing images. Remote Sensing, 10(6):871.
Lin, J., Li, X., and Pan, H. (2018). Aircraft recognition
in remote sensing images based on deep learning. In
2018 33rd Youth Academic Annual Conference of Chi-
nese Association of Automation (YAC), pages 895–
899. IEEE.
Liu, G., Sun, X., Fu, K., and Wang, H. (2012). Aircraft
recognition in high-resolution satellite images using
coarse-to-fine shape prior. IEEE Geoscience and Re-
mote Sensing Letters, 10(3):573–577.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. International journal of computer
vision, 60(2):91–110.
Perveen, N., Roy, D., and Chalavadi, K. M. (2020). Fa-
cial expression recognition in videos using dynamic
kernels. IEEE Transactions on Image Processing,
29:8316–8325.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Sitaula, C., Xiang, Y., Basnet, A., Aryal, S., and Lu, X.
(2020). Hdf: hybrid deep features for scene image
representation. In 2020 International Joint Confer-
ence on Neural Networks (IJCNN), pages 1–8. IEEE.
Smola, A. J. and Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and computing, 14(3):199–222.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2015). Going deeper with convolutions.
In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 1–9.
Tan, M. and Le, Q. (2019). Efficientnet: Rethinking model
scaling for convolutional neural networks. In Interna-
tional Conference on Machine Learning, pages 6105–
6114. PMLR.
Tang, W., Deng, C., Han, Y., Huang, Y., and Zhao, B.
(2020). Srarnet: A unified framework for joint su-
perresolution and aircraft recognition. IEEE Journal
of Selected Topics in Applied Earth Observations and
Remote Sensing, 14:327–336.
Wu, Q., Sun, H., Sun, X., Zhang, D., Fu, K., and Wang, H.
(2014). Aircraft recognition in high-resolution optical
satellite remote sensing images. IEEE Geoscience and
Remote Sensing Letters, 12(1):112–116.
Wu, Z.-Z., Wan, S.-H., Wang, X.-F., Tan, M., Zou, L., Li,
X.-L., and Chen, Y. (2020). A benchmark data set for
aircraft type recognition from remote sensing images.
Applied Soft Computing, 89:106132.
Xu, C. and Duan, H. (2010). Artificial bee colony (abc)
optimized edge potential function (epf) approach to
target recognition for low-altitude aircraft. Pattern
Recognition Letters, 31(13):1759–1772.
Yang, J., Yu, K., Gong, Y., and Huang, T. (2009). Lin-
ear spatial pyramid matching using sparse coding for
image classification. In 2009 IEEE Conference on
computer vision and pattern recognition, pages 1794–
1801. IEEE.
Yang, Y. and Newsam, S. (2008). Comparing sift descrip-
tors and gabor texture features for classification of re-
mote sensed imagery. In 2008 15th IEEE interna-
tional conference on image processing, pages 1852–
1855. IEEE.
You, C. H., Lee, K. A., and Li, H. (2009). Gmm-svm ker-
nel with a bhattacharyya-based distance for speaker
recognition. IEEE Transactions on Audio, Speech,
and Language Processing, 18(6):1300–1312.
Yu, K., Zhang, T., Gong, Y., et al. (2009). Nonlinear
learning using local coordinate coding. In NIPS, vol-
ume 22, pages 2223–2231. Citeseer.
Zhao, A., Fu, K., Wang, S., Zuo, J., Zhang, Y., Hu, Y.,
and Wang, H. (2017). Aircraft recognition based on
landmark detection in remote sensing images. IEEE
Geoscience and Remote Sensing Letters, 14(8):1413–
1417.
Zhong, Y., Ma, A., soon Ong, Y., Zhu, Z., and Zhang, L.
(2018). Computational intelligence in optical remote
sensing image processing. Applied Soft Computing,
64:75–93.