analysis across examples retrieved by all our models for any IM. The user interface is used to assess the ability of various explainable CBR models to retrieve useful instances and to guide future work on the strengths and limitations of these approaches. The neuro-symbolic approach is employed to produce explainable example- and prototype-based predictions for IM classification tasks. A frozen pre-trained model extracts features from a meme in a transfer-learning configuration, and a distinct downstream classification model makes the final decision based on the extracted features. The modularity of the framework makes it easy to compare pairwise combinations of explanation method and feature extraction model.
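This modular configuration can be sketched as follows; the fixed random projection below is only a stand-in for a real frozen backbone (e.g., a CLIP or ResNet encoder), and all names and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone: a fixed projection whose
# weights are never updated during training.
W_frozen = rng.normal(size=(512, 64))

def extract_features(meme_embeddings):
    """Frozen feature extraction: W_frozen is never modified."""
    return np.tanh(meme_embeddings @ W_frozen)

# Distinct downstream classifier trained on the extracted features.
def train_logistic_head(X, y, lr=0.1, epochs=200):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        grad = p - y                            # gradient of log loss w.r.t. logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy data: 100 "memes" as 512-d embeddings with binary harm labels.
memes = rng.normal(size=(100, 512))
labels = (memes[:, 0] > 0).astype(float)

feats = extract_features(memes)            # backbone stays frozen
w, b = train_logistic_head(feats, labels)  # only the head is trained
preds = (feats @ w + b > 0).astype(float)
print("train accuracy:", (preds == labels).mean())
```

Swapping either the backbone or the downstream head is a one-line change, which is exactly the pairwise comparison the framework's modularity is meant to support.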
(Grasso et al., 2024) demonstrate that KERMIT's capacity to learn context-dependent information from ConceptNet and apply it to the classification task improves its performance. Specifically, the proposed system achieves state-of-the-art performance on the Facebook Hateful Memes dataset and is comparable with the latest approaches on all other datasets. Overall, the paper showcases the value of injecting external knowledge into the classification process and opens the door to future studies on meme harm detection, exemplifying the broad potential of AI and knowledge discovery to aid moderation. The KERMIT framework uses a memory-augmented neural network (MANN) to store the knowledge-enriched information graph representing the entities of the meme and their associated commonsense knowledge from ConceptNet. KERMIT's memory block consists of several buckets, each holding one knowledge-enhanced piece of the information graph. In addition, KERMIT is equipped with a learnable attention mechanism that selects, depending on the present input, the bucket(s) with enough information to support accurate classification of hazardous memes. This enables the model to bring contextual information and applicable external knowledge into the decision-making process, helping to flag harmful posts that would otherwise slip past content moderation.
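The bucket-selection step can be sketched as content-based attention over memory slots. The following is a deliberate simplification; the dimensions, the single projection matrix, and the single-query setting are expository assumptions, not KERMIT's actual parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)

# Memory block: each row ("bucket") stores the encoding of one piece of
# the knowledge-enriched information graph built from ConceptNet.
n_buckets, dim = 8, 32
memory = rng.normal(size=(n_buckets, dim))

# Current input: encoding of the meme under classification.
query = rng.normal(size=dim)

# Learnable attention: a single projection matrix stands in here for
# whatever parameterization the real model uses.
W_attn = rng.normal(size=(dim, dim))

scores = memory @ (W_attn @ query)  # relevance of each bucket to this input
weights = softmax(scores)           # soft selection over buckets
read_vector = weights @ memory      # knowledge read out for classification

print("bucket weights:", np.round(weights, 3))
print("read vector shape:", read_vector.shape)
```

The read-out vector would then be combined with the meme representation before the final classification layer, so the decision is conditioned on the retrieved commonsense knowledge.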
(Yang et al., 2025) propose ISM, which learns modality-invariant and modality-specific representations using graph neural networks, complementing dual-stream models by bridging the gap between modalities: ISM projects each modality into two different spaces, one retaining modality-specific features and one merging the modalities to address the modality gap. ISM uses state-of-the-art multi-modal dual-stream models (e.g., CLIP, ALBEF, and BLIP) as its backbone, a testament to its scalability. Experimental results show that ISM outperforms baselines and achieves competitive performance compared with state-of-the-art methods for toxic meme detection. The functionality and effectiveness of its components are further demonstrated through ablation and case studies. By filling the modality gap and aligning image-text pairs, ISM offers a scalable malicious meme detection framework that learns modality-specific and modality-invariant representations through graph neural networks, ultimately reaching a holistic and disentangled understanding of memes.
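The two-space projection idea can be illustrated as below; the linear projections and the cosine-similarity alignment score are assumptions made for exposition, not ISM's actual graph-neural-network architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
d_img, d_txt, d = 128, 96, 48

# One meme: image and text features from a dual-stream backbone
# (e.g., CLIP); random vectors stand in for real encoder outputs.
img = rng.normal(size=d_img)
txt = rng.normal(size=d_txt)

# Each modality gets two projections: a shared ("invariant") space that
# merges modalities, and a private ("specific") space that retains
# modality-unique features.
P_img_shared, P_img_private = rng.normal(size=(2, d, d_img))
P_txt_shared, P_txt_private = rng.normal(size=(2, d, d_txt))

img_inv, img_spec = P_img_shared @ img, P_img_private @ img
txt_inv, txt_spec = P_txt_shared @ txt, P_txt_private @ txt

# Training would pull the invariant projections of an image-text pair
# together (bridging the modality gap) while keeping the specific ones
# informative; cosine similarity is one possible alignment objective.
align = img_inv @ txt_inv / (np.linalg.norm(img_inv) * np.linalg.norm(txt_inv))
fused = np.concatenate([img_inv, txt_inv, img_spec, txt_spec])  # classifier input
print("alignment:", round(float(align), 3), "fused dim:", fused.shape[0])
```

Concatenating both invariant and specific views gives the classifier the disentangled yet holistic representation the paper aims for.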
3 BACKGROUND OF THE
WORK
Two fundamental natural language processing methods, Count Vectorizer and TF-IDF, are essential for extracting meaningful features from textual data in meme text classification. Count Vectorizer maps a collection of text documents into a matrix in which each row represents a document and each column represents a vocabulary term; each entry records how often that term occurs in that document. As a result, every document is mapped to a vector whose entries are the occurrence counts of each word. However, Count Vectorizer cannot judge the importance of words beyond raw frequency. TF-IDF, by contrast, extends this by combining the proportion of a term in a document (Term Frequency) with how rare the term is in the full corpus (Inverse Document Frequency). This approach weights more heavily the terms that occur frequently within a document but rarely across the corpus, thus capturing their discriminative power.
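As a toy illustration of the difference (using the plain formulas only; production implementations such as scikit-learn's TfidfVectorizer add smoothing and normalization on top of this):

```python
import math
from collections import Counter

docs = [
    "funny cat meme",
    "funny dog meme",
    "offensive political meme",
]

vocab = sorted({w for doc in docs for w in doc.split()})

# Count Vectorizer: raw term frequencies, one row per document.
counts = [[Counter(doc.split())[w] for w in vocab] for doc in docs]

# TF-IDF: term frequency scaled by inverse document frequency.
def idf(word):
    df = sum(word in doc.split() for doc in docs)  # document frequency
    return math.log(len(docs) / df)

tfidf = [[c * idf(w) for w, c in zip(vocab, row)] for row in counts]

# "meme" appears in every document, so its idf is log(3/3) = 0 and its
# tf-idf weight vanishes; rarer, discriminative words keep their weight.
i = vocab.index("meme")
print("count of 'meme' in doc 0:", counts[0][i])   # 1
print("tf-idf of 'meme' in doc 0:", tfidf[0][i])   # 0.0
```

The count representation treats "meme" and "offensive" alike, while TF-IDF zeroes out the corpus-wide word and keeps weight on the terms that actually distinguish documents.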
Fundamentally, while Count Vectorizer offers a straightforward representation based on term frequencies, TF-IDF provides a richer interpretation by weighing the importance of words within the overall context of the corpus, making it very valuable in meme text classification, where salient features must be identified for accurate classification. Graph Neural Networks (GNNs) offer a robust architecture for meme classification by exploiting the inherent structure and relationships in meme data. In meme classification, the data can be modeled as a graph in which nodes denote items such as images, text, or users, and edges denote relationships or interactions among those items. GNNs can effectively