Performance could be improved by introducing
additional regularization loss terms (Murdock and
Lucey, 2021). We could improve our results by adding
a loss term aiming to increase the gap between the
groups that will become active and the groups that
will be inactive after soft thresholding. Our results are
promising, and the present loss term (Eq. (3.1.2)) may
be too strict. Another interesting loss term could be the
minimization of the mutual coherence of D (Murdock
and Lucey, 2020); we leave this examination for
future work.
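As a rough sketch of the two candidate regularizers mentioned above, the following NumPy code computes the mutual coherence of a dictionary and a possible gap-style penalty. The function names, the `margin` parameter, and the hinge form of the gap loss are our own illustrative assumptions, not definitions from the paper.

```python
import numpy as np

def mutual_coherence(D):
    """Mutual coherence of dictionary D (columns = atoms): the
    largest absolute inner product between distinct normalized
    atoms. Lower coherence generally favors sparse recovery."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    G = np.abs(Dn.T @ Dn)                              # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore self-products
    return G.max()

def gap_loss(group_norms, bias, margin=0.1):
    """Hypothetical gap regularizer: penalize group norms lying
    within `margin` of the soft-thresholding bias, pushing each
    group to be clearly active (norm well above the bias) or
    clearly inactive (norm well below it)."""
    return np.maximum(0.0, margin - np.abs(group_norms - bias)).sum()
```

For an orthonormal dictionary the coherence is zero, and the gap loss vanishes once every group norm is at least `margin` away from the bias.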
Our experimental studies can be generalized in several
ways. Firstly, a single layer cannot be perfect for
all problems. A hierarchy of layers is the most promising
direction for searching for groups of different sizes. As an
example, edge detectors can be built hierarchically
using CNNs, see, e.g., (Poma et al., 2020).
Further, we restricted the investigations to groups
of the same size and the same bias, even though
inputs may be best fit by groups of different sizes, or
even by including a subset of single elements, and the
bias may also differ. This is an architecture optimization
problem whose solution is unknown. Learning
of the sparse representation is, however, promising,
since under rather strict conditions, high-quality sparse
dictionaries can be found (Arora et al., 2015). The
step to search for groups is still desired, since (a) the
search space may become smaller due to the groups
and (b) the presence of the active groups may be estimated
quickly and accurately using feedforward methods, especially
transformers (in the absence of attacks). In
turn, feedforward estimation of the groups followed
by (P)GBP with different group sizes, including single
atoms, seems worth studying.
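The extension to mixed group sizes can be illustrated with a minimal NumPy sketch of a group soft-thresholding operator that accepts an arbitrary partition of the coefficients, singletons included. The partition format and the operator's exact form are assumptions for illustration, not the paper's (P)GBP implementation.

```python
import numpy as np

def group_soft_threshold(z, groups, bias):
    """Hypothetical group soft-thresholding for groups of varying
    sizes. `groups` is a list of index arrays partitioning z; each
    group's Euclidean norm is shrunk by `bias`, and the whole group
    is zeroed if its norm does not exceed the bias."""
    out = np.zeros_like(z)
    for idx in groups:
        g = z[idx]
        norm = np.linalg.norm(g)
        if norm > bias:
            out[idx] = g * (1.0 - bias / norm)  # shrink toward zero
    return out
```

A singleton group reduces to the ordinary scalar soft-thresholding used with the ℓ1 norm, so both cases coexist in one partition.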
5 CONCLUSIONS
We studied the adversarial robustness of sparse coding.
We proved theorems for a large variety of structural
generalizations, including: groups within layers, di-
verse connectivities between the layers and versions
of optimization costs related to the ℓ1 norm. We also
studied group sparse networks experimentally. We
demonstrated that our GBP can outperform BP, and
that our PGBP works better than both using an 8 times
smaller representation. We found that PGBP offers
fast feedforward estimations and the transformer ver-
sion shows considerable robustness for the datasets we
studied. Finally, we showed that gap regularization
can improve robustness even further, as suggested by
condition 4) of Theorem 3.
Yet, the scope of our studies is limited from multiple
perspectives. First, the surprisingly strong performance
of our PGBP despite its small representation
calls for further investigations using more complex
datasets and attacks, as MNIST and IFGSM are too
simple and specialized compared to real-world
scenarios. Second, we believe that theoretical extensions
to PGBP are possible, and that varying group sizes
and other loss functions may provide performance im-
provements.
Defenses against noise, novelties, anomalies and,
in particular, against adversarial attacks may be solved
by combining our robust, structured sparse networks
with out-of-distribution detection methods.
ACKNOWLEDGEMENTS
The research was supported by (a) the Ministry of Innovation
and Technology NRDI Office within the framework
of the Artificial Intelligence National Laboratory
Program, (b) the Application Domain Specific Highly Reliable
IT Solutions project of the National Research,
Development and Innovation Fund of Hungary, financed
under the Thematic Excellence Programme
no. 2020-4.1.1.-TKP2020 (National Challenges Subprogramme)
funding scheme, and (c) D. Szeghy was
partially supported by the NKFIH Grant K128862.
REFERENCES
Akhtar, N., Mian, A., Kardan, N., and Shah, M. (2021). Advances
in adversarial attacks and defenses in computer
vision: A survey. IEEE Access, 9:155161–155196.
Arora, S., Ge, R., Ma, T., and Moitra, A. (2015). Simple,
efficient, and neural algorithms for sparse coding. In
Conf. on Learn. Theo., pages 113–149. PMLR.
Bach, F., Jenatton, R., Mairal, J., and Obozinski, G.
(2011). Optimization with sparsity-inducing penalties.
arXiv:1108.0775.
Bai, Y., Mei, J., Yuille, A. L., and Xie, C. (2021). Are
transformers more robust than CNNs? Adv. in Neural
Inf. Proc. Syst., 34.
Bottou, L., Curtis, F. E., and Nocedal, J. (2018). Optimization
methods for large-scale machine learning. SIAM
Review, 60(2):223–311.
Cazenavette, G., Murdock, C., and Lucey, S. (2021). Architectural
adversarial robustness: The case for deep
pursuit. In IEEE/CVF Conf. on Comp. Vis. and Patt.
Recogn., pages 7150–7158.
Chen, S. S., Donoho, D. L., and Saunders, M. A. (2001).
Atomic decomposition by basis pursuit. SIAM Review,
43(1):129–159.
Dhillon, I. S., Heath, J. R., Strohmer, T., and Tropp, J. A.
(2008). Constructing packings in Grassmannian manifolds
via alternating projection. Exp. Math., 17(1):9–35.
DeLTA 2022 - 3rd International Conference on Deep Learning Theory and Applications