Deep Learning Semi-Supervised Strategy for Gamma/Hadron

Classiﬁcation of Imaging Atmospheric Cherenkov Telescope Events

∗

Diego Riquelme

1 a

, Mauricio Araya

, Sebastian Borquez

, Boris Panes

and Edson Carquin

Universidad T

ecnica Federico Santa Mar

ıa (for the CTA Consortium), Av. Espa

na 1680, Valpara

ıso, Chile

Instituto de Astrof

ısica, Pontiﬁcia Universidad Cat

olica de Chile, Avenida Vicu

na Mackenna 4860, Santiago, Chile

Keywords:

Deep Learning, Self-Supervised, Cherenkov, Classiﬁcation, High-Energy Astronomy, Convolutional.

Abstract:

The new Cherenkov Telescope Array (CTA) will record astrophysical gamma-ray events with an energy cov-

erage range, angular resolution, and ﬂux sensitivity never achieved before. The Earth’s atmosphere produces

Cherenkov’s light when a shower of particles is induced by a high-energy particle of astrophysical origin (gam-

mas, hadrons, electrons, etc.). The energy and direction of these gamma air shower events can be reconstructed

stereoscopically using imaging atmospheric Cherenkov detectors. Since most of CTA’s scientiﬁc goals focus

on identifying and studying Gamma-Ray sources, it is imperative to distinguish this speciﬁc type of event

from the hadronic cosmic ray background with the highest possible efﬁciency. Following this objective, we

designed a competitive deep-learning-based approach for gamma/background classiﬁcation. First, we train

the model with simulated images in a standard supervised fashion. Then, we explore a novel self-supervised

approach that allows the use of new unlabeled images towards a method for reﬁning the classiﬁer using real

images captured by the telescopes. Our results show that one can use unlabeled observed data to increase

the accuracy and general performance of current simulation-based classiﬁers, which suggests that continuous

improvement of the learning model could be possible under real data conditions.

1 INTRODUCTION

The Cherenkov Telescope Array (The CTA Consor-

tium, 2019)

(CTA) is a ground-based assemblage

of tens of Imaging Atmospheric Cherenkov Tele-

scopes (IACT). These telescopes were designed to

study high and very high energy (>20 GeV up to

300 TeV) gamma rays. Since gamma rays are pro-

duced in violent, highly active regions of the uni-

verse, such as supernovae remnants, pulsar wind neb-

ulae, and supermassive black holes in distant galaxies,

their study could provide important information about

these sources. Gamma rays interact with the Earth’s

atmosphere, generating a cascade of secondary parti-

cles called an extensive air shower. The highly rela-

tivistic process stimulates the emission of Cherenkov

light in the atmosphere, which is reﬂected by the tele-

scope’s mirrors into a high-speed camera sensor, ef-

fectively imaging the shower of particles.

But Cherenkov light is not exclusive to gamma-

ray showers. Cherenkov light events observed by

the telescopes are dominated by extensive air show-

https://orcid.org/0000-0003-0363-7720

∗

For the CTA Consortium

https://www.cta-observatory.org/

ers produced by cosmic rays, whereas gamma-ray de-

tections are only a fraction. CTA will provide a fac-

tor of 5 to 10 improvement in sensitivity compared to

current IACT telescopes, which requires an efﬁcient

rejection of cosmic rays in a highly imbalanced clas-

siﬁcation scenario. Since the particle level processes

underlying the generation of both types of extended

showers are not identical, the morphology of both ex-

tensive air showers is also different. Existing IACTs

can reject the background noise by reducing the im-

age into geometrical parameters that are handed to

classical classiﬁers, such as Random Forest (RF) (Al-

bert, 2007) or Boosted Decision Trees (BDT) (Krause

et al., 2017; Becherini et al., 2011; Ohm et al., 2009).

Following the current trends in digital image pro-

cessing, deep-learning techniques have also been ex-

plored in high-energy physics. In particular, convolu-

tional neural networks (CNN) use the information of

raw images, which potentially provides an advantage

to image parameterization as it allows for process-

ing whole event images at high speed. This line of

work has been studied, and promising results are al-

ready available (Grespan et al., 2021; Mangano et al.,

2018; Miener et al., 2021b; Nieto et al., 2019; Nieto

et al., 2017; Shilon et al., 2019; Jacquemont et al.,

Riquelme, D., Araya, M., Borquez, S., Panes, B. and Carquin, E.

Deep Learning Semi-Supervised Strategy for Gamma/Hadron Classiﬁcation of Imaging Atmospheric Cherenkov Telescope Events.

DOI: 10.5220/0011611500003411

In Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2023), pages 725-732

ISBN: 978-989-758-626-2; ISSN: 2184-4313

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

725

2019; Jury

sek et al., 2021). Most of these advances

use Monte Carlo simulations of the showers to train

and assess supervised models. Simulations are useful

since they can be generated in any desired proportion,

number of events, and different conﬁgurations. But

once real data becomes available, an adaptive strategy

is needed to train the models further.

In this work, we present a proof of concept for

allowing semi-supervised training once real data be-

come available in CTA, in quantities that will enable

proper training and testing. This is relevant because

real observations are naturally unlabeled in the con-

text of IACTs, as there is no other independent nor

systematic way to conﬁrm the type of the detected

particle. Moreover, several deep learning classiﬁca-

tion approaches proposed so far could beneﬁt from

our strategy of further training with unlabeled real

data. So our contribution is complementary to pre-

vious efforts in the ﬁeld. Also, we propose analyzing

the latent space of the deep learning model to under-

stand the properties of the input data parameters in

a lower dimensional representation. This may give us

insights into the connection to parametric descriptions

of the input data (such as Hillas parameters) and may

impact the speed of effective models for rapid classi-

ﬁcation tasks.

This paper is organized as follows. First, a general

idea of the current research on gamma/hadron classi-

ﬁcation is given. Then, we discuss the data features

and introduce the real data problem. Subsequently,

we describe our CNN architecture and its features.

Later, the experiments are described. Ensuing, the re-

sults are shown. Finally, we brieﬂy explore the Latent

Space and propose some interesting lines of work.

2 DEEP LEARNING FOR

GAMMA/HADRON

CLASSIFICATION

2.1 Related Work

As mentioned, previous existing Cherenkov telescope

arrays have developed techniques based on machine

learning: VERITAS and H.E.S.S use Boosted De-

cision Trees (BDT) while MAGIC uses Random

Forests (RF) (Becherini et al., 2011; Ohm et al., 2009;

Krause et al., 2017; Albert, 2007), both based on

Hillas parameterization (Hillas, 1985). These tech-

niques have reasonable performance, but new ap-

proaches based on artiﬁcial neural networks, such

as CNN, promise a leap forward by directly using

the charge and arrival time of pixels of the images

captured by the telescope. In fact, some authors

have already studied the effectiveness of deep learn-

ing approaches tested on real data. Pipelines that

include a framework for deep learning techniques

are being designed and used for stereoscopic recon-

struction (Miener et al., 2021b). While most studies

were conducted with simulated data, there are some

of them that compares with real data (Shilon et al.,

2019; Miener et al., 2021a; Vuillaume et al., 2021)

or trained/tested with a combination of both (Albert,

2007; Lu, 2013).

Recently, some preliminary efforts have been

made in the domain adaptation ﬁeld in the context of

gamma/hadron separation (Drew, 2021). They con-

front the problem with real data available from differ-

ent observatories.

2.2 Convolutional Neural Networks

with CTA Data

For this work we used simulated data (Bernlohr, 2008;

Heck et al., 1998) for the CTA south site(The CTA

Consortium, 2019). In particular, we used the Prod5

simulation dataset to enable reproducible research;

further information can be found on CTA’s website.

The dataset was subdivided into training and test-

ing. Training classes are balanced, while testing has

twice the number of protons than gamma events. This

was done to emulate class imbalance in real scenar-

ios, even though real class imbalance is much larger.

For instance, the event acquisition frequency in VER-

ITAS is mainly dominated by protons (350 Hz), while

gamma occurrence is 1 Hz (Krause et al., 2017). For

the classiﬁcation task, protons and diffuse gamma

sources are commonly used. Ideally, every other par-

ticle could be used for training, but since protons are

the dominant source, is common to use them as the

negative class in the classiﬁcation task.

In (Grespan et al., 2021), a CNN was trained

for reconstruction, obtaining state-of-the-art perfor-

mance. Similarly, in (Shilon et al., 2019), the au-

thors validate deep learning techniques for classiﬁca-

tion. Also, the authors explore the impact of feeding

real data to a model trained with simulated data. They

concluded that there is a signiﬁcant performance loss,

which motivates a more detailed study on using real

data when available.

CTA models used for event reconstruction are ex-

pected to be enhanced thanks to information from real

data. Still, since we cannot be truly sure about the ac-

tual source of the shower, we can’t rely on the exis-

tence of labelled real data. This is why we propose a

self-supervised step for training a CNN model. Since

it can only be trained with simulated data, a semi-

ICPRAM 2023 - 12th International Conference on Pattern Recognition Applications and Methods

726

supervised method could enable a model to use real

data further when they become available to improve

the latent space representation of the events.

We used two quality cuts to limit the number of

events and ﬁlter poor-quality images. These qual-

ity cuts are based on the Hillas intensity parameter,

which comprises the total charge in photo-electrons

(phe) captured by the sensors of the telescope (Gres-

pan et al., 2021). The two cuts are (1) events with

a Hillas intensity parameter higher than 1000 phe (re-

constructed photo electron) and (2) events higher than

300 phe. This allows us to learn a model with ﬁltered

high-intensity images and then expose the model to

lower-intensity images unseen in the initial training

for a self-supervised reﬁnement of the networks.

Even though there are recent advances on how

to perform stereoscopic reconstruction with hetero-

geneous telescopes, (Brill et al., 2019; Miener et al.,

2021b; Shilon et al., 2019), our proposal focuses on

single-telescope images from LST telescopes. How-

ever, since our proof of concept uses the available data

rather than a novel architecture, this technique could

also be used over heterogeneous stereoscopic classiﬁ-

cation techniques.

Speciﬁcally, we used a small and straightforward

CNN model. Activations were ReLu, and hyperpa-

rameters were tuned with a grid search methodol-

ogy. We also use Cyclical Learning Rate (CLR),

which leads to faster convergence, and better metrics

(Smith, 2017). All models were trained on Wilkes

3 HPC, with A100 GPU, using the Tensorﬂow pack-

age (Developers, 2022). We mounted the model over

GERUMO

pipeline, and thus, all the hyperparam-

eters and training information are available in the

repository. Layer shapes and dimensions of the CNN

architecture employed are shown in Table 1. As for

preprocessing, we only normalized the images.

Table 1: Summary of CNN Architecture.

Layer Shape

Input (55,47,3)

ConvLayer (53,45,64)

ConvBock (26,22,128)

ConvBlock (13,11,256)

ConvBlock (6,5,512)

ConvLayer (6,5,512)

Flatten (15360)

Dense 128

Dense 64

Output 2

https://github.com/sborquez/gerumo

Table 2: ConvBlock Layer.

ConvBlock Kernel Filters

ConvLayer

128

256

512

5x5

3x3

Dropout 0.25

ConvLayer

256

512

1024

5x5

3x3

BatchNormalization

MaxPool 2x2

2.3 Pseudo-Labeling Strategy

Pseudo-labeling is a semi-supervised strategy that al-

lows supervised training with labelled and unlabelled

data. It produces a hybrid loss between the supervised

and self-supervised labels, balanced by a mixing co-

efﬁcient (Lee, 2013). Throughout the training, the

balancing coefﬁcient (weighting) changes in favour

of the unbalanced data, which usually is available in

more signiﬁcant quantities.

The self-supervised loss uses predicted classes for

unlabeled data as if they were true labels. This is

done by assigning labels with the predictions from the

model, by equation 1:

(

1 if i = argmax f

(x),

0 otherwise

(1)

i ∈ gamma,other

where y

is the pseudo label generated for the unla-

beled data x, and f

(x) is the model’s output for the

th label. This means that given unlabeled data x, the

algorithm will use the prediction of labels as if they

were the true labels.

Initially, the model does not predict the classes ac-

curately, which is why the balancing coefﬁcient starts

in favour of supervised learning. Once the model’s

representation of the classes becomes more precise

in the network’s latent space, the predictions become

more accurate. Therefore the balancing coefﬁcient

starts moving toward the self-supervised data until

ending with nothing but an unlabeled loss.

Pseudo labelling makes two assumptions on the

latent space (Chapelle et al., 2009): Continuity As-

sumption which states that points that are close to

each other are more likely to share a label, and Clus-

ter Assumption which asserts that data tends to form

discrete clusters, and points that belong to the same

cluster, are likely to share a label.

Pseudo-labeling is a sound strategy for tackling

the CTA gamma/hadron classiﬁcation problem, but

it requires a novel approach in our context. Since

Deep Learning Semi-Supervised Strategy for Gamma/Hadron Classiﬁcation of Imaging Atmospheric Cherenkov Telescope Events

727

labelled data is only available through simulations,

once the observatory begins operations, the mayority

of available data is going to be real unlabeled data.

With this in mind, we propose the following idea as a

proof of concept to be used by CTA.

First, we train a supervised model with high-

intensity simulated data. Then, we initiate the self-

supervised training with an uncertainty threshold as

shown in Equation 2, where α is the uncertainty

threshold. Ideally, we would use real images here, but

since no real data is available, we use lower-intensity

images as proof of concept. The uncertainty threshold

is to avoid very complex examples that, even though

they contain very relevant information for the network

to learn, could mislead the training path of the net-

work. But once the network gets modiﬁed through

learning, those examples could meet the threshold cut

in another round.

sel f supervised data =

(

x ∈ dataset if f

(x) > α,

x /∈ dataset otherwise

(2)

i ∈ gamma,other

3 EXPERIMENTS

To test our proof of concept, we ﬁrst need to validate

that we can train and further improve the model with

labels created by the same model and tune the α pa-

rameter. This is done before taking the second step,

by emulating the context of new images. Next, a de-

scription of the performed experiments, metrics and

datasets is given.

3.1 Validation Experiment

The ﬁrst experiment assesses our variation of the

pseudo-labelling technique, which will be referred to

as the Validation Experiment. It begins by training

a supervised model with a small dataset. The per-

formance reached by this model should be enough to

generate distinguishable clusters in the latent space.

Then, the same model is trained in a self-supervised

fashion by generating the labels with its prediction

and applying the uncertainty threshold on a single

epoch. The details of the training events are shown

in Table 3.

For the Validation Experiment, both supervised

and self-supervised datasets have a quality cut on

Hillas intensity over 1000 phe. The training datasets

are balanced and evenly distributed for both particles.

This is done to train the CNN only using the morphol-

ogy of the image. The two models were tested over

the same testing dataset. The testing dataset also had

a quality cut of 1000 phe on Hillas intensity.

3.2 Application Experiment

The second experiment, namely the Application Ex-

periment, consists of training a model in a supervised

manner, just like in the ﬁrst experiment, but with a

larger subset to improve training. Then, this model

is further trained in a self-supervised fashion with the

same strategy as in the previous experiment but with

different quality events. In this experiment, the self-

supervised step is done with 300 phe cuts on Hillas

intensity instead of 1000 phe. Also, the second used

dataset overlaps with the ﬁrst to soften the training

gradients. Since the model was already trained with

events from the 1000 phe quality cut, the performance

improvement comes from the second quality cut (300

phe). The details of the training events are shown in

Table 3.

Both models were tested over the same testing

dataset, which also has the 300 phe quality cut on

Hillas intensity. All training and validation datasets

were balanced in particle class, i.e., proton and

gamma, but the testing dataset is unbalanced in a 2:1

proportion, respectively. This is to simulate a more re-

alistic proportion of the classes. A general overview

of the experiments and datasets is shown in Table 3.

4 RESULTS

In this section, we present the main results of both ex-

periments. The metrics considered for the analysis are

accuracy, recall, precision, and f1-score (Zeugmann

et al., 2011).

4.1 Validation Experiment

In the validation experiment, the model was trained

using an early stopping mechanism until a plateau was

reached in the loss. Since we used CLR, the network

converges faster than conventional methods.

For the Supervised step, we trained for 23 epochs,

but the minimum loss was reached on epoch 17. The

joint results for testing are shown in table 4.

While for the self-supervised method, all the train-

ing was done on a single epoch to avoid overﬁtting.

Since applying an uncertainty threshold to the net-

work could easily overﬁt the examples that the model

already classiﬁes with low uncertainty.

ICPRAM 2023 - 12th International Conference on Pattern Recognition Applications and Methods

728

Table 3: Summary of data used for the experiments.

Experiments Training (strategy: #events) Validation Testing

Validation

supervised: 74945

self-supervised: 541911

7898

234276

Application

supervised: 541911

self-supervised: 1519741

108945

605424

greater than 1000 phe,

greater than 300 phe

The α parameter was found by grid search as

stated in Equation 2. The model’s learning capacity

seems sensitive to this parameter, but our best per-

formance was reached with α = 0.9, thus applied for

both experiments.

As shown in table 4, the self-supervised strategy

improved the model’s performance in all metrics. In

other words, for the same image quality (same Hillas

intensity cut), the model learned a better representa-

tion using the predicted labels as if they were true.

Consequently, we can conclude that the general semi-

supervised strategy is an excellent candidate to further

study with different image qualities.

4.2 Application Experiment

For the Application experiment, it can be seen that

performance also improved for all metrics except for

precision, which decreased by 1%. However, the in-

crease in recall and accuracy tell us that the improve-

ment in correctly identifying gamma comes at the

price of a slight rise in the type-I error.

Table 4: Summary of the results of both experiments.

Experiments Accuracy Recall Precision f1-score

Validation Experiment

Supervised 0.88 0.85 0.82 0.82

self-supervised 0.90 0.92 0.85 0.85

Control (1000 phe) 0.91 0.90 0.83 0.87

Application Experiment

Supervised 0.81 0.76 0.70 0.73

self-supervised 0.83 0.87 0.69 0.77

Control (300 phe) 0.85 0.89 0.73 0.80

For all experiments, Recall and Precision are fo-

cused on gammas. Also, a control experiment was

done by training the model in a supervised fashion

on all the training datasets (complete information sce-

nario). The idea is to have a broader vision of where

the semi-supervised strategy lies.

4.3 Considerations

Our experimental design used different-quality un-

labeled data, which might be very different from

real event data. Real data can diverge signiﬁcantly

from simulated data (Shilon et al., 2019), consider-

ing that our cuts are different instances of the same

larger dataset and not an independent data source,

further experiments must be done on the difference

of real/simulated data (Jacquemont et al., 2021).

However, we assume that a trained classiﬁer with

carefully-crafted simulated data will have a reason-

able performance on real data (Miener et al., 2021a;

Vuillaume et al., 2021). This assumption must be

veriﬁed with real data; therefore, we suggest a latent

space tool to further help with this purpose.

The difference between real/simulated data is a

matter of constant study. Some of the studied differ-

ences consider dead/bright pixels and camera mod-

ules, imperfect calibration, imperfect NSB/moonlight

modelling, atmospheric modelling, ageing of the tele-

scopes, and the difﬁculty in reliable modelling of

hadronic interactions. Our strategy studies a possi-

ble approach to take advantage of using such com-

plex real images, but further research should be done

speciﬁcally on real/simulated data integration.

5 LATENT SPACE

We assume that a model that reaches a good perfor-

mance also learns a good representation of the phe-

nomenon, meaning that the images generate a non-

linear transformation which leads to separable classes

to predict in a latent space. Since CNN reduces the

dimension with depth, it brings an opportunity to ex-

plore this latent space representation of the network.

While convolutional layers extract certain features

that become more complex with depth, logic layers

combine these features to fulﬁl the task, in this case,

class prediction. We could extract the result of any in-

termediate layers to explore the latent space. Then, to

visualize, we can apply dimensional reduction tech-

niques, such as UMAP (McInnes et al., 2020), t-SNE

(Laurens Van der Maaten and Hinton, 2008) or PCA.

We applied PCA to show how the network distin-

guishes between classes and conﬁrm our assumptions

on pseudo-labelling.

As shown in Figure 1, classes are separable and

misclassiﬁed cases are accumulated on the edges of

the clusters, as was expected. We also conﬁrm that

the Cluster assumption and Continuity assumption for

the pseudo labelling hold.

Deep Learning Semi-Supervised Strategy for Gamma/Hadron Classiﬁcation of Imaging Atmospheric Cherenkov Telescope Events

729

Figure 1: PCA applied to the ﬁrst logic Layer (1st Dense

128).

Figure 2: Correlation between PCA and Hillas Parameters.

Additionally for PCA, we set a threshold of 70%

of the explained variance to get the number of compo-

nents for representation on the ﬁrst logic layer to ﬁnd

common ground between the CNN representation and

Hillas parameters. In Figure 2, we show the correla-

tion between the Hillas parameters and the PCA com-

ponents for the testing set from the Validation Exper-

iment.

As shown in Figure 2, the ﬁrst PCA component P1

is inversely correlated with Hillas length and width.

This means that the features learned by the CNN cor-

relate with Hillas parameters that are known to be es-

sential to discriminate whether it is gamma or proton

(Albert, 2007; Krause et al., 2017).

Further investigations need to be done regarding

these latent spaces. It also could be used to diagnose

and study the structural differences between real and

simulated data (Shilon et al., 2019).

6 CONCLUSION

We proposed a proof of concept for self-supervised

training for CTA. We validated our modiﬁcation of

the pseudo-labelling strategy and then retrained the

model with images with different quality cuts. The

model learned from fainter pseudo-labelled images

and improved the identiﬁcation of gamma-ray show-

ers in a statistically relevant amount.

These results suggest that our proposal is a suit-

able candidate to stimulate this line of research and a

potential approach for the actual operation of IACTs.

The potential for classiﬁcation performance on real

data could be enhanced by augmenting simulated

training data with real event images.

Our brief exploration of the latent space could

be used as common ground to connect Convolu-

tional Neural Networks and the Hillas parametriza-

tion. With a deeper exploration, we expect to under-

stand how the networks classify the data, contributing

to further improvements in simulation. For example,

training a CNN to classify real and simulated images

as in (Shilon et al., 2019) and then studying its latent

space could reveal a path to improve simulations fur-

ther.

ACKNOWLEDGEMENTS

This work was performed using resources provided

by the Cambridge Service for Data Driven Discov-

ery (CSD3) operated by the University of Cambridge

Research Computing Service (www.csd3.cam.ac.uk),

provided by Dell EMC and Intel using Tier-2 fund-

ing from the Engineering and Physical Sciences Re-

search Council (capital grant EP/T022159/1), and

DiRAC funding from the Science and Technology

Facilities Council (www.dirac.ac.uk).This work used

IRIS computing resources funded by the STFC.

This work was conducted in the context of

the CTA Analysis and Simulations Working Group,

and this paper has been through internal re-

view by the CTA Consortium. We gratefully

acknowledge ﬁnancial support from the agen-

cies and organizations listed here: http://www.cta-

observatory.org/consortium acknowledgments.

The work was also support by ANID PIA/APOYO

AFB180002, ANID-Basal Project FB0008 and

project FONDECYT 1190886.

ICPRAM 2023 - 12th International Conference on Pattern Recognition Applications and Methods

730

REFERENCES

Albert, J. (2007). Implementation of the Random For-

est Method for the Imaging Atmospheric Cherenkov

Telescope MAGIC. Publisher: arXiv Version Num-

ber: 2.

Becherini, Y., Djannati-Ata

ı, A., Marandon, V., Punch, M.,

and Pita, S. (2011). A new analysis strategy for de-

tection of faint gamma-ray sources with Imaging At-

mospheric Cherenkov Telescopes. Publisher: arXiv

Version Number: 1.

Bernlohr, K. (2008). Simulation of Imaging Atmo-

spheric Cherenkov Telescopes with CORSIKA and

sim telarray. Publisher: arXiv Version Number: 1.

Brill, A., Feng, Q., Humensky, T. B., Kim, B., Nieto, D.,

and Miener, T. (2019). Investigating a Deep Learning

Method to Analyze Images from Multiple Gamma-ray

Telescopes. In 2019 New York Scientiﬁc Data Summit

(NYSDS), pages 1–4. arXiv:2001.03602 [astro-ph].

Chapelle, O., Scholkopf, B., and Zien, Eds., A. (2009).

Semi-Supervised Learning (Chapelle, O. et al., Eds.;

2006) [Book reviews]. IEEE Transactions on Neural

Networks, 20(3):542–542.

Developers, T. (2022). TensorFlow.

Drew, R. D. (2021). Deep unsupervised domain adaptation

for gamma-hadron separation.

Grespan, P., Jacquemont, M., L

opez-Coto, R., Miener, T.,

Nieto-Casta

no, D., and Vuillaume, T. (2021). Deep-

learning-driven event reconstruction applied to sim-

ulated data from a single Large-Sized Telescope of

CTA. arXiv:2109.14262 [astro-ph].

Heck, D., Knapp, J., Capdevielle, J. N., Schatz, G., and

Thouw, T. (1998). CORSIKA: a Monte Carlo code

to simulate extensive air showers. Publication Title:

CORSIKA: a Monte Carlo code to simulate extensive

air showers ADS Bibcode: 1998cmcc.book.....H.

Hillas, A. M. (1985). Cerenkov light images of EAS pro-

duced by primary gamma. 3(OG-9.5-3):4.

Jacquemont, M., Vuillaume, T., Benoit, A., Maurin, G.,

Lambert, P., and Lamanna, G. (2021). First Full-

Event Reconstruction from Imaging Atmospheric

Cherenkov Telescope Real Data with Deep Learn-

ing. In 2021 International Conference on Content-

Based Multimedia Indexing (CBMI), pages 1–6, Lille,

France. IEEE.

Jacquemont, M., Vuillaume, T., Benoit, A., Maurin, G.,

Lambert, P., Lamanna, G., and Brill, A. (2019). Gam-

maLearn: A Deep Learning Framework for IACT

Data. In Proceedings of 36th International Cosmic

Ray Conference — PoS(ICRC2019), page 705, Madi-

son, WI, U.S.A. Sissa Medialab.

Jury

sek, J., Lyard, E., and Walter, R. (2021). Full LST-1

data reconstruction with the use of convolutional neu-

ral networks. arXiv:2111.14478 [astro-ph].

Krause, M., Pueschel, E., and Maier, G. (2017). Improved

$\gamma$/hadron separation for the detection of faint

gamma-ray sources using boosted decision trees. As-

troparticle Physics, 89:1–9. arXiv:1701.06928 [astro-

ph].

Laurens Van der Maaten and Hinton, G. (2008). Visualizing

data using t-SNE. 9(11).

Lee, D.-H. (2013). Pseudo-Label : The Simple and Efﬁcient

Semi-Supervised Learning Method for Deep Neural

Networks. Technical report.

Lu, C.-C. (2013). Improving the H.E.S.S. angular resolution

using the Disp method. arXiv:1310.1200 [astro-ph].

Mangano, S., Delgado, C., Bernardos, M., Lallena, M., and

azquez, J. J. R. (2018). Extracting gamma-ray in-

formation from images with convolutional neural net-

work methods on simulated Cherenkov Telescope Ar-

ray data, volume 11081. arXiv:1810.00592 [astro-

ph].

McInnes, L., Healy, J., and Melville, J. (2020). UMAP:

Uniform Manifold Approximation and Projection for

Dimension Reduction. arXiv:1802.03426 [cs, stat].

Miener, T., L

opez-Coto, R., Contreras, J. L., Green, J. G.,

Green, D., Mariotti, E., Nieto, D., Romanato, L., and

Yadav, S. (2021a). IACT event analysis with the

MAGIC telescopes using deep convolutional neural

networks with CTLearn. arXiv:2112.01828 [astro-

ph].

Miener, T., Nieto, D., Brill, A., Spencer, S., and Con-

treras, J. L. (2021b). Reconstruction of stereo-

scopic CTA events using deep learning with CTLearn.

arXiv:2109.05809 [astro-ph].

Nieto, D., Brill, A., Feng, Q., Humensky, T. B., Kim, B.,

Miener, T., Mukherjee, R., and Sevilla, J. (2019).

CTLearn: Deep Learning for Gamma-ray Astronomy.

arXiv:1912.09877 [astro-ph].

Nieto, D., Brill, A., Kim, B., and Humensky, T. B.

(2017). Exploring deep learning as an event classi-

ﬁcation method for the Cherenkov Telescope Array.

arXiv:1709.05889 [astro-ph].

Ohm, S., van Eldik, C., and Egberts, K. (2009). Gamma-

Hadron Separation in Very-High-Energy gamma-ray

astronomy using a multivariate analysis method. Pub-

lisher: arXiv Version Number: 1.

Shilon, I., Kraus, M., B

uchele, M., Egberts, K., Fischer, T.,

Holch, T., Lohse, T., Schwanke, U., Steppa, C., and

Funk, S. (2019). Application of deep learning meth-

ods to analysis of imaging atmospheric Cherenkov

telescopes data. Astroparticle Physics, 105:44–53.

Smith, L. N. (2017). Cyclical Learning Rates for Training

Neural Networks. In 2017 IEEE Winter Conference

on Applications of Computer Vision (WACV), pages

464–472, Santa Rosa, CA, USA. IEEE.

The CTA Consortium (2019). Science with the Cherenkov

Telescope Array. WORLD SCIENTIFIC.

Vuillaume, T., Jacquemont, M., de Lavergne, M. d. B.,

Sanchez, D. A., Poireau, V., Maurin, G., Benoit,

A., Lambert, P., Lamanna, G., and Project, C.-L.

(2021). Analysis of the Cherenkov Telescope Array

ﬁrst Large-Sized Telescope real data using convolu-

tional neural networks. arXiv:2108.04130 [astro-ph].

Zeugmann, T., Poupart, P., Kennedy, J., Jin, X., Han,

J., Saitta, L., Sebag, M., Peters, J., Bagnell, J. A.,

Daelemans, W., Webb, G. I., Ting, K. M., Ting,

K. M., Webb, G. I., Shirabad, J. S., F

urnkranz, J.,

ullermeier, E., Matwin, S., Sakakibara, Y., Flener,

Deep Learning Semi-Supervised Strategy for Gamma/Hadron Classiﬁcation of Imaging Atmospheric Cherenkov Telescope Events

731

P., Schmid, U., Procopiuc, C. M., Lachiche, N., and

urnkranz, J. (2011). Precision and Recall. In Sam-

mut, C. and Webb, G. I., editors, Encyclopedia of Ma-

chine Learning, pages 781–781. Springer US, Boston,

MA.

ICPRAM 2023 - 12th International Conference on Pattern Recognition Applications and Methods

732