A Brief Review of Basic Deep Learning Models for Recommendation

Systems

Xitong Zhou

Department of Earth Science and Engineering, Imperial College London,

Exhibition Rd, South Kensington, London, U.K.

Keywords: Deep Learning, Recommendation System, Deep Neural Network.

Abstract: Recommendation systems are essential for delivering personalized content to users across various platforms,

enhancing user experience and engagement. Traditional filtering methods, including content-based filtering

and collaborative filtering, have been widely applied to recommend items based on user preferences or

similarities between users and items. However, these methods still face challenges such as data sparsity,

computational complexity, and the cold-start problem, which limit their effectiveness and scalability. This

paper provides an overview of these traditional recommendation techniques, their limitations, and how deep

learning approaches are transforming the field by addressing these issues. The discussion focuses on several

deep learning models, including Multi-Layer Perceptrons (MLP), Autoencoders, Convolutional Neural

Networks (CNN), Recurrent Neural Networks (RNN), and Generative Adversarial Networks (GAN). These

models enhance recommendation systems by capturing complex, non-linear interactions between users and

items, thereby significantly improving personalization, scalability, and robustness in cold-start scenarios. By

leveraging the power of neural networks, deep learning is ushering in a new era for recommendation systems,

offering more accurate, dynamic, and adaptive recommendations.

1 INTRODUCTION

The advent and widespread adoption of the internet

have provided users with an abundance of

information, meeting their needs in the information

age. However, as the internet rapidly expands, the

sheer volume of online information has grown

substantially, which has made it increasingly difficult

for users to identify the information that is truly useful

to them amidst the vast quantities available,

ultimately reducing the efficiency with which they

can utilize this information.

Recommendation systems play an important role

in improving user experience on diverse platforms by

offering tailored suggestions. Over time, these

systems have evolved from basic algorithms to more

advanced data-driven approaches, enabling more

accurate and relevant recommendations.

Conventional recommendation systems, including

content-based filtering and collaborative filtering, are

prevalent yet encounter substantial obstacles, such as

data sparsity, cold-start problems, and scalability

limitations.

Deep learning leverages multi-layer deep neural

networks to automatically learn from large datasets,

making it particularly useful in recommendation

systems. With increasing application of deep

learning, new models have been continuously

developed to overcome existing limitations. These

models employ neural networks to replicate user-item

interactions and to represent intricate patterns and

nonlinear correlations in the data, providing more

resilient solutions.

A Deep Neural Network (DNN) is an intricate

artificial neural network architecture consisting of

several layers of neurons. Each neuron is responsible

for receiving input, processing it, and producing an

output. The existence of several hidden layers

positioned between the input and output layers is a

DNN's defining feature (Samek, 2021).

- The input layer is the network's initial layer,

responsible for accepting input data.

- The hidden layers form the core of the network,

with each layer containing multiple neurons that

process the data received from a previous layer and

forward the results to the layer above.

Zhou and X.

A Brief Review of Basic Deep Learning Models for Recommendation Systems.

DOI: 10.5220/0013526300004619

In Proceedings of the 2nd International Conference on Data Analysis and Machine Learning (DAML 2024), pages 473-482

ISBN: 978-989-758-754-2

473

- The output layer, which is the network's last layer,

has as many neurons as necessary to complete a

certain job, like regression or classification,

depending on its particular needs.

This multi-layer architecture allows DNNs to

model complex relationships and extract high-level

features from data, making them very effective for a

variety of uses. A deep neural network framework for

recommendation systems (Figure 1) can be illustrated

with the following structure:

Figure 1: A General Architecture of DNNs in

Recommendation Systems

This paper gives a brief summary of various types

of recommendation systems and the difficulties that

traditional approaches encounter. It then introduces

how deep learning techniques, such as Multi-Layer

Perceptron (MLP), Autoencoders, Convolutional

Neural Networks (CNN), Recurrent Neural Networks

(RNN), and Generative Adversarial Networks

(GAN), have been applied to recommendation

systems and how they improve upon traditional

models. The paper concludes by outlining the

problems and difficulties that deep learning models in

recommendation systems are now facing and making

recommendations for future study.

2 BACKGROUND

2.1 Types of Recommendation Systems

The primary types of recommendation systems

include content-based, collaborative filtering, hybrid,

social network-based, and knowledge graph-based

approaches (Fayyaz, 2020). By examining a user's

past behavior and personal data, content-based

recommendation systems make recommendations

relevant to their interests and preferences. For

example, recommendations may include movies,

music, or books that align with a user's tastes.

Collaborative filtering systems match users or items

with similar interests based on historical behavior,

recommending content that similar users or items

have interacted with.

Because of their unique advantages and

disadvantages, each of these kinds is appropriate in

certain situations. In practical applications,

recommendation systems typically combine one or

more methods depending on specific requirements to

achieve better recommendation outcomes.

2.2 Challenges of Traditional Methods

The data sparsity, high processing complexity, and

cold-start problem are the main problems that

traditional recommendation systems confront. The

restricted availability of user-item interaction data is

referred to as data sparsity. Collecting enough rating

data is challenging since consumers engage with

recommendation systems in a variety of ways. For

instance, users typically rate only items they like,

while leaving many others unrated, resulting in sparse

data.

Additionally, due to the long-tail effect, niche

items can collectively make up a significant portion

of the market. Therefore, it is particularly crucial for

recommendation systems to provide a wide variety of

niche or less popular items to the right users.

However, in recommendation systems, a select few

popular items tend to receive the majority of user

ratings, while most niche items have very few ratings,

making it challenging to model relationships between

items. With more users and items added to the system,

the computational complexity rises significantly.

Traditional algorithms, such as matrix factorization,

require substantial computational resources to handle

large-scale data, making real-time recommendation

extremely difficult.

Moreover, recommendation systems often

encounter the cold-start issues. There are three

primary forms of cold start issues: first, the lack of

historical data for new users prevents the system from

identifying their preferences, making personalized

recommendations difficult; second, it is challenging

to determine which consumers would be interested in

new products uploaded to the system as there is

insufficient engagement data available.; third, when

the system is newly launched, the absence of

sufficient interaction data between users and items

makes it hard to provide any meaningful

recommendations (Ko, 2022).

DAML 2024 - International Conference on Data Analysis and Machine Learning

474

3 DEEP LEARNING MODELS IN

RECOMMENDATION

SYSTEMS

3.1 Multi-Layer Perceptron-based

Multi-Layer Perceptron (MLP) is an artificial neural

network that simulates the connectivity of neurons in

the human brain, enabling it to learn and extract

complex features from massive datasets. By

comparing the similarities between user preferences

and item features, MLP may be used to build user

profiles and provide accurate lists of

recommendations.

The construction of user profiles involves

collecting and analyzing behavioral data, personal

information, and other relevant data to create

personalized features for each user. These features

may encompass various aspects such as interests,

consumption habits, and geographical location. MLP

can learn and model these features, allowing for more

precise predictions of user preferences. Modeling

item characteristics includes information about the

content, attributes, and categories of items. By using

MLP to learn and process item features, the system

can uncover associations and similarities among

items, ultimately providing recommendations aligned

with user preferences (Zhou, 2022).

3.1.1 Neural Collaborative Filtering-based

The Neural Collaborative Filtering (Neural CF)

model is an improvement over conventional

collaborative filtering techniques (Matrix

Factorization, MF) in that it makes use of a multi-

layer neural network instead of using the dot product

operation between the user and item vectors. This

allows for more comprehensive interactions between

the two vectors, yielding richer combinations of

valuable feature information. To enable the neural

network to fully perform collaborative filtering, a

multi-layer perceptron is employed to mimic user-

item interactions. The output from one layer is used

as the input for the succeeding layer. Neural CF

significantly improves the generalization and fitting

capabilities of collaborative filtering algorithms (He,

2017).

To increase the model's functionality for both

linear and nonlinear combinations, He et al. (2017)

proposed a hybrid version of Neural Collaborative

Filtering that integrates the original Neural CF model

with a Multi-Layer Perceptron (MLP) and the

element-wise product-based Generalized Matrix

Factorization (GMF). In this model, the embedding

vectors are learned separately rather than shared,

providing greater flexibility—allowing for different

dimensions of latent vectors to be determined based

on the complexity of the model, and computed

individually within their respective models. The

results from these computations are then merged and

passed through a further fully linked layer to generate

the output (He, 2017).

The Joint Neural Collaborative Filtering (J-NCF)

approach optimizes user-item ratings using a user-

item rating matrix. This method combines deep

feature learning with deep interaction modeling,

enhancing recommendation performance by

capturing nonlinear user-item interactions. The

model's loss function considers pointwise and

pairwise loss, as well as implicit and explicit

feedback. J-NCF is skilled in scalability and

sensitivity under varying data sparsity and user

activity levels, especially addressing "inactive users"

(Chen, 2019).

3.1.2 AutoMLP-based

Recently, Li proposed a long-short term sequence

recommendation system named AutoMLP, designed

to more accurately represent users' short-term and

long-term interests in their past encounters. AutoMLP

consists solely of Multi-Layer Perceptrons (MLPs),

keeping complexity in time and space linear. Both

long-term and short-term dependencies are captured

by the model's long-short term interest module.

Utilizing automation techniques, AutoMLP employs

continuous relaxation to transform discrete sequence

lengths into continuously differentiable

representations, thereby adaptively optimizing the

window of short-term interest for various tasks (Li,

2024).

3.2 Autoencoder-based

An autoencoder is a neural network model whose

core idea is to use an encoder to convert input data

into a low-dimensional feature representation, and a

decoder to either produce new data or decode the old

data back. Theoretically, an autoencoder can be

considered a generative model, as it learns the data

distribution and may produce fresh data samples. In

recommendation systems, autoencoders are a useful

tool for learning users' latent features to generate

content that aligns with their preferences, thereby

enabling more accurate personalized

recommendations. From an application perspective,

autoencoders can be employed to handle data in

A Brief Review of Basic Deep Learning Models for Recommendation Systems

475

recommendation systems, such as for dimensionality

reduction and data compression.

3.2.1 Denoising Autoencoder-based

Denoising Autoencoders (DAEs) are a type of neural

network model used for unsupervised learning

(Vincent, 2008). Unlike standard autoencoders,

which may simply replicate the input or extract trivial

features, DAEs intentionally introduce noise into the

input data and then attempt to recreate the initial,

noise-free data. This approach enhances the

constraints on the data, encouraging the model to

acquire more practical feature representations.

By reconstructing the noisy input, DAEs achieve

more robust feature representations, thus avoiding the

issue of merely copying the original input. Overall,

denoising autoencoders not only learn features

similar to the original data but also demonstrate

improved performance and stability in the presence of

noise and uncertainty.

Stacked Denoising Autoencoders (SDAEs) were

introduced by Vincent et al. as a deep network

construction strategy, where these autoencoders are

trained locally to denoise corrupted input versions

(Vincent, 2010). Compared to standard autoencoders,

this approach has shown a significant reduction in

classification errors in benchmark tests. Building on

the SDAE framework, Wang et al. proposed a

probabilistic version of SDAE that integrates it with

Probabilistic Matrix Factorization (PMF), leading to

the development of the Relational SDAE (RSDAE)

model. This model systematically combines deep and

relational learning and can naturally extend to handle

multi-relational data, effectively enhancing the

performance and broad applicability of label

recommendation tasks (Wang, 2015).

Wu et al. suggested the Collaborative Denoising

Autoencoder (CDAE) for top-N recommendation.

CDAE serves as a generalization of various existing

collaborative filtering models, featuring a more

flexible structure that effectively incorporates

nonlinear components to enhance recommendation

performance (Wu, 2016). Subsequently, Khan et al.

proposed the User-Tracking Collaborative Denoising

Autoencoder (UT-CDAE). This model evaluates user

rating trends (high or low) across a group of items to

determine user-item associations. The introduction of

rating trends provides the model with additional

regularization flexibility. By incorporating trend-

based weighting, UT-CDAE is able to learn more

robust and nonlinear latent representations,

improving the ranking prediction capability of the

output layer and thereby enhancing the accuracy of

top-N recommendation predictions (Khan, 2019).

Considering an infinite number of copies of the

training data that are corrupted, Chen et al. proposed

the Marginal Denoising Autoencoder (MDAE),

which addresses the issue of corruption by

(approximately) marginalizing it during the training

process (Chen, 2014). MDAE implicitly marginalizes

all possible corrupted samples to reconstruct the

error, thus avoiding additional computational costs.

This enables MDAE to achieve or exceed the

performance of traditional denoising autoencoders

within fewer training iterations. Compared to other

related works, MDAE not only supports nonlinear

encoding and decoding but also excels in learning

latent representations. Additionally, Marginalized

Stacked Denoising Autoencoder (MSDAE) is a deep

structure that can be created by stacking MDAE

(Zhang, 2020).

3.2.2 Variational Autoencoder-based

Variational Autoencoders (VAEs) are a kind of neural

network-based generative model that perform

exceptionally well in unsupervised learning. Using

random sampling, they produce new samples that are

comparable to the original data but not exact replicas

of it after discovering the latent distribution of the

data samples (Kingma, 2013). This makes VAEs a

promising approach for personalized

recommendations in recommendation systems. To

become familiar with the model parameters, VAEs

maximize the marginal probability of the observed

data during training.

The impact of VAEs on the development of

recommendation systems is significant. They

discover how users' past actions and interests relate to

one another, generating new samples that align with

user preferences but differ in specific ways.

Additionally, VAEs can be used to discover hidden

features and calculate similarities. By learning the

similarities between users and applying this in

recommendation systems, VAEs enable more

accurate similarity computations by measuring

distances in the latent space, thus enhancing the

recommendation's accuracy. Furthermore, VAEs

learn the relationships between users and items in the

latent space, which is beneficial for addressing cold-

start problems. By mapping new items or users into

the latent space and comparing them with existing

data, VAEs can provide personalized

recommendations.

To further enhance collaborative filtering

performance, Shenbin et al. (Shenbin, 2020) proposed

DAML 2024 - International Conference on Data Analysis and Machine Learning

476

the Recommender VAE (RecVAE), based on VAE

and Mult-VAE (Zhao, 2017). RecVAE presents a

composite prior combining standard Gaussian priors

with the latent code distribution from the previous

iteration, improving training stability and

performance. Additionally, RecVAE employs an

alternating update training method, allowing the more

complex encoder to be updated multiple times during

each decoder update, using corrupted inputs for

encoder training while the decoder uses clean inputs.

These innovations significantly enhance the model's

recommendation performance under implicit

feedback (Shenbin, 2020).

Zhu et al. presented the Mutually-regularized

Dual Collaborative Variational Autoencoder (MD-

CVAE) to address sparsity and cold-start problems in

collaborative filtering by using stacked latent item

embeddings in place of the standard User

Autoencoder (UAE)'s randomly initialized weights in

the final layer, integrating user ratings with item

content information within a unified variational

framework. This prevents the model from converging

to suboptimal solutions in sparse data conditions.

Additionally, user ratings facilitate the learning of

item content representations that better meet

recommendation needs. MD-CVAE also incorporates

a symmetric inference strategy, connecting the latent

item embeddings in the decoder to the UAE encoder's

first-layer weights, enabling the recommendation of

new items without retraining (Zhu, 2022).

Li et al. proposed a Distributed Variational

Autoencoder (DistVAE) for sequential

recommendations, aiming to address data privacy

concerns while achieving effective model training.

DistVAE employs a client-server architecture,

coordinating thousands of clients for training without

requiring the aggregation of their raw data. DistVAE

combines pseudo-batching for global model updates

and a Gaussian Mixture Model (GMM) to

dynamically cluster clients into virtual groups to

improve the stability of global model training. Clients

within each virtual group sequentially train “local”

models, sharing training experiences to improve

recommendation outcomes (Li, 2023).

3.3 Convolutional Neural

Network-based

Convolutional Neural Networks (CNNs) are a type of

feedforward neural network characterized by deep

structures and convolutional computations, gaining

significant attention in recent years for their

efficiency in recognition tasks. The design of CNNs

is inspired by the animal visual system, which

processes information hierarchically to extract image

features. An input layer, convolutional layers, pooling

layers, and fully linked layers make up a CNN's

conventional architecture. Local feature extraction is

performed by the convolutional layers, feature

dimensionality reduction is handled by the pooling

layers, and classification and regression are managed

by the fully connected layers (Shiri, 2023).

In recommendation systems, CNNs can achieve

personalized recommendations by learning features

from user behavior data. Their exceptional scalability

allows them to efficiently handle sparse data that is

high dimensional and on a wide scale, which helps to

solve data sparsity and cold-start circumstances. The

automatic feature learning capability of CNNs allows

them to excel in multimedia tasks such as fashion

recommendations, music streaming, and video

content recommendations. Their hierarchical learning

mechanism enables CNNs to capture a rich spectrum

of information, from low-level to high-level features,

enhancing robustness against variations in input data

and further enhancing strength of recommendations.

Furthermore, Wang et al. suggested an automatic

CNN recommendation system to create a system

tailored for image classification tasks. This system

analyzes the training data of classification tasks,

evaluates the complexity of the task, and recommends

the optimal CNN model based on the complexity

score. This approach eliminates the need for

extensive model training typically required in

traditional model selection processes, thus saving

time. The system also introduces a "capability score"

to measure the classification capability of CNN

models, taking into account factors such as

computational cost, model depth and width, and the

vanishing gradient problem (Wang, 2017).

To enhance recommendation performance by

leveraging user reviews, Zheng et al. proposed a

model called Deep Collaborative Neural Network

(DeepCoNN). DeepCoNN is made up of two

simultaneous neural networks: one learns properties

connected to items based on their reviews, while the

other utilizes user reviews to focus on user behavior.

A shared layer connects these two networks, allowing

latent factors to interact with each other. This joint

modeling approach improves the accuracy of

prediction, especially for users and items with limited

ratings, effectively addressing the sparsity problem

(Zheng, 2017).

In recent years, a novel neural network model

called CoCNN has been developed for collaborative

filtering (CF) and implicit feedback recommendation.

CoCNN combines co-occurrence patterns with

Convolutional Neural Networks (CNNs). It employs

A Brief Review of Basic Deep Learning Models for Recommendation Systems

477

a multi-task neural network structure to bridge user-

item pairs and item-item pairs through co-occurrence

relationships, capturing more useful information.

Additionally, CoCNN operates directly on the

embedding layer using a CNN architecture, rather

than employing outer product operations, thereby

addressing the data and space complexity issues

associated with outer products (Chen, 2022).

To deal with data sparsity in recommendation

systems and enhance reliability, Li et al. proposed a

model called Auxiliary Review-based Personalized

Attention Convolutional Neural Network

(ARPCNN). ARPCNN employs a parallel CNN

structure to process item reviews and user reviews

separately. By using customized word- and review-

level attention processes, it gives keywords and

significant reviews a larger attention weight.

Additionally, ARPCNN introduces a user auxiliary

network (Aux-Net), which leverages reviews from

similar users in trust relationships as auxiliary

information. This helps extract features more

accurately for user modeling, thereby improving

recommendation performance (Li, 2022).

3.4 Recurrent Neural Network-based

Recurrent Neural Networks (RNNs) are a particular

kind of recursive neural network that accepts

sequence data as input. In contrast to standard

Feedforward Neural Networks (FNNs), all of the

nodes (recurrent units) in an RNN are connected in a

structure resembling a chain, and recursion happens

in the direction that the sequence is evolving. RNNs

are predicated on the idea that human cognition is

dependent on memory and prior experiences, giving

the network the ability to "remember" previous

information. This makes RNNs particularly well-

suited for handling and predicting time dependencies

and temporal information in sequence data, where

there is an inherent order and dependency between

data points (Shiri, 2023).

In many recommendation scenarios, user

behaviors are sequential or session-based, implying

that a user's past interactions with certain goods (such

music, movies, or products) have an impact on the

items they may interact with later. This recurrent

structure allows RNNs to maintain and pass data from

earlier time steps to the present one. As RNNs retain

a hidden state that evolves as new inputs (user

interactions) are processed, they can capture the

temporal dependencies between a user's past

behaviors and future preferences. This makes RNNs

an ideal choice for modeling the dynamic

characteristics of user sessions in recommendation

systems.

3.4.1 Gated Recurrent Unit-based

An RNN variant called the Gated Recurrent Unit

(GRU) adds gating techniques to regulate information

flow, allowing it to more effectively process

sequential data. The two gates in GRU are the update

gate and the reset gate. The update gate identified

which information needs to be updated, while the

reset gate decides which information should be

forgotten. This gating mechanism enables GRUs to

recognize distant dependencies in sequential data

more effectively, which has led to significant success

in recommendation systems (Shiri, 2023).

By controlling the information flow, GRUs solve

the issue of the vanishing gradient often encountered

in traditional RNNs, permitting them to retain

important information over longer sequences. This

makes GRUs particularly well-suited for applications

such as personalized recommendations, where

understanding long-term user preferences or behavior

patterns is crucial for generating accurate predictions

(Yang, 2020).

In collaborative filtering tasks, Bansal et al.

encoded text sequences into latent vectors,

specifically using GRU trained end-to-end. In the

scientific paper recommendation task, this approach

significantly improved recommendation accuracy,

particularly outperforming methods that ignore word

order in cold-start situations. By leveraging the

regularization effect of multi-task learning

(combining metadata prediction and content

recommendation), the network shared the text

encoder between the recommendation and metadata

prediction tasks, preventing overfitting in deep

models. This approach further enhanced performance

and effectively alleviated the sparsity issue in the

rating matrix (Bansal, 2016).

To tackle the sequential recommendation

challenge, Donkers et al. suggested to modeling the

temporal dynamics of consumption sequences using

GRU. Additionally, they introduced a novel gated

structure with an extra input layer to explicitly

represent each user's personalized information. This

led to the design of a user-level GRU specifically for

generating personalized next-item recommendations

(Donkers, 2017). With this method, the model can

represent the preferences of each unique user more

accurately and more successfully customize

recommendations depending on user activity

patterns.

DAML 2024 - International Conference on Data Analysis and Machine Learning

478

To enhance personalization in recommendation

systems, Zeng et al. proposed an algorithm that

utilizes a GRU network as the primary model to

reduce the effects of multi-layer networks'

overfitting. Additionally, they introduced an attention

mechanism, allowing the recommendation model to

more precisely extract key information from user data

while minimizing interference from irrelevant data.

The model also employs a variable-length mini-batch

allocation technique to guarantee more

comprehensive and dependable training data, which

improves the accuracy of personalized

recommendations (Zeng, 2022).

3.4.2 Long Short-Term Memory-based

Another specialized version of RNNs, Long Short-

Term Memory (LSTM), was designed to solve the

disappearing and expanding gradient issues that arise

when processing lengthy data sequences. In

recommendation systems, user behavior may span

long sessions, necessitating the consideration of early

interactions. User preferences may also evolve

slowly, meaning that older interactions could still be

relevant for recommendations. In such cases, LSTMs

are particularly useful for modeling these dynamics

(Yang, 2020).

The primary features of LSTM are its gating

mechanisms and parameter sharing. The three gates

in LSTM are the forget gate, input gate, and output

gate, which allow the network to dynamically decide

what information to retain or forget. These gates

enable LSTMs to effectively handle long-term

dependencies, allowing the network to remember

information from much earlier inputs and use that

information in the current output. Additionally,

LSTM shares the same weights at each time step of

the sequence, enabling the model to process

sequences of arbitrary lengths (Smagulova, 2019).

Time-LSTM is a frequently used variant of LSTM

designed for modeling user sequential behavior by

introducing time gates to handle time intervals, which

helps in better capturing both long-term and short-

term user interests, thereby enhancing

recommendation performance. By explicitly

modeling the time intervals between user actions,

Time-LSTM significantly improves the utilization of

user behavioral information, leading to better

recommendations (Zhu, 2017).

3.5 Generative Adversarial Network-

based

Generative Adversarial Networks (GANs) achieve

data generation and recognition through the

competition and cooperation of two neural network

models. The generator and the discriminator make up

the fundamental structures.

- The generator's job is to generate realistic data

samples from random noise, typically Gaussian noise.

A neural network generates samples that approximate

data distribution from a random vector, aiming to

confuse the discriminator into perceiving the

generated data is authentic.

- The discriminator is another neural network in

charge of differentiating between the generator's

fictitious data and actual data. The input is a data

sample, and it generates a scalar result indicating the

probability of the input being real. The aim of the

discriminator is to increase its accuracy in

distinguishing real data from generated data (Gui,

2021).

3.5.1 GANs for Personalized

Recommendations

Generative Adversarial Networks (GANs) are often

used to construct representations of user interest and

preference features in personalized recommendation

systems. By training the generator model, it can

produce feature vectors that align with users'

interests, while the discriminator model helps

distinguish the user's true interest features. The

system can precisely record user preferences due to

this approach, which produces more personalized

recommendationsl.

The core of a personalized recommendation

system lies in precisely understanding user interests.

GANs can leverage historical user behavior data to

generate a user interest profile, which includes

features such as the user's areas of interest, preference

types, and behavioral habits. The capacity of the

recommendation engine to match content with the

user's tastes can be greatly improved by these profiles

(Gao, 2021).

Moreover, personalized recommendation systems

often require large datasets, but real-world data may

sometimes be insufficient. GANs are able to produce

artificial data that replicates the distribution of real

data, effectively performing data augmentation. By

incorporating superior synthetic data into the current

dataset, this method enhances the recommendation

system's accuracy and performance (Wu, 2019).

A Brief Review of Basic Deep Learning Models for Recommendation Systems

479

3.5.2 GANs for Cold-Start Problems

For new users and products with insufficient

interaction data (e.g., ratings or clicks), GANs can

generate synthetic preferences based on partial

information, such as demographic data or initial

behaviors. These synthetic preferences simulate how

users might interact with different items, allowing the

system to provide appropriate recommendations

immediately, thereby reducing the negative effects of

data sparsity on fresh users and items.

In recommendation systems designed for new

users, the generator produces synthetic user

preferences for the cold-start users or items, while the

discriminator assesses the reliability of these artificial

preferences by comparing them with real user

interaction data. The interaction between these two

networks helps generate more realistic composite

ratings for cold-start users, allowing the

recommendation system to make better initial

recommendations. By providing accurate suggestions

even in the lack of adequate interaction data, this

dynamic method greatly enhances the system's

capacity to address cold-start issues (Chen, 2023).

4 CHALLENGES AND FUTURE

DIRECTIONS

As data volumes continue to grow, more efficient and

accurate recommendation algorithms are constantly

required to meet business needs. However, the large-

scale application of deep learning in recommendation

systems still faces several challenges.

Firstly, deep learning models possess strong

representational power, but this also means they

require substantial amounts of training data to fully

learn features and achieve high accuracy. For small-

scale user bases or new products, insufficient data can

lead to ineffective model training and unreliable

recommendation outcomes. Due to their data

dependence, deep learning models are

computationally intensive, often requiring expensive

GPU servers to support the necessary calculations.

Without adequate hardware resources, training times

can become prohibitively long, or the training may

not complete at all, imposing significant financial

demands on organizations.

Secondly, compared to traditional machine

learning algorithms, deep learning is more complex

and requires advanced skills in areas like deep

learning architectures and optimization algorithms.

At present, professionals with these skills are scarce,

and recruiting and training costs are high. A key

challenge for implementing deep learning projects is

finding and retaining the right talent. Many

organizations already have established big data

infrastructures, and integrating deep learning requires

seamless incorporation with these existing systems

for tasks like data analysis and feature engineering.

This often involves the development of additional

tools or components to bridge the gap between the

existing architecture and the deep learning

framework, allowing for efficient integration that can

support recommendation tasks.

Thirdly, the internal decision-making process of

deep learning models is not transparent, making it

difficult to explain how inputs affect outputs. This

"black-box" nature can make it challenging to provide

users with reasons for recommendations, potentially

undermining trust and impacting user experience and

satisfaction — particularly in scenarios where

transparency is important. In addition, deep learning

models involve numerous parameters and

hyperparameters, which need to be continuously

tuned throughout training to optimize performance.

This process is not only time-consuming but also

requires significant experience, as different

hyperparameter choices can greatly affect outcomes.

As such, tuning can be highly complex and resource-

intensive, adding to the challenges of applying deep

learning effectively.

While these obstacles together prevent deep

learning from being widely used in recommendation

systems, they also show that there is still a great deal

of room for advancement and study in this area.

5 CONCLUSION

In conclusion, the application of deep learning

technology has greatly enhanced recommendation

systems. Although conventional techniques like

collaborative filtering and content-based filtering

have their advantages, they face difficulties with data

sparsity, computational complexity, and the cold-start

issue. Foundational deep learning models, including

MLP, autoencoders, CNNs, RNNs, and GANs, offer

powerful optimization capabilities that improve

personalization, address cold-start scenarios, and

enhance scalability. As deep learning continues to

evolve, its combination with recommendation

systems will result in even higher accuracy and

efficiency, transforming the way users interact with

digital platforms.

DAML 2024 - International Conference on Data Analysis and Machine Learning

480

REFERENCES

Bansal, T., Belanger, D., & McCallum, A. (2016,

September). Ask the gru: Multi-task learning for deep

text recommendations. In Proceedings of the 10th ACM

Conference on Recommender Systems (pp. 107-114).

Chen, C. C., Lai, P. L., & Chen, C. Y. (2023). ColdGAN:

An effective cold-start recommendation system for new

users based on generative adversarial networks. Applied

Intelligence, 53(7), 8302-8317.

Chen, M., Ma, T., & Zhou, X. (2022). CoCNN: Co-

occurrence CNN for recommendation. Expert Systems

with Applications, 195, 116595.

Chen, M., Weinberger, K., Sha, F., & Bengio, Y. (2014,

June). Marginalized denoising auto-encoders for

nonlinear representations. In International Conference

on Machine Learning (pp. 1476-1484). PMLR.

Chen, W., Cai, F., Chen, H., & Rijke, M. D. (2019). Joint

neural collaborative filtering for recommender systems.

ACM Transactions on Information Systems (TOIS),

37(4), 1-30.

Donkers, T., Loepp, B., & Ziegler, J. (2017, August).

Sequential user-based recurrent neural network

recommendations. In Proceedings of the Eleventh ACM

Conference on Recommender Systems (pp. 152-160).

Fayyaz, Z., Ebrahimian, M., Nawara, D., Ibrahim, A., &

Kashef, R. (2020). Recommendation systems:

Algorithms, challenges, metrics, and business

opportunities. Applied Sciences, 10(21), 7748.

Gao, M., Zhang, J., Yu, J., Li, J., Wen, J., & Xiong, Q.

(2021). Recommender systems based on generative

adversarial networks: A problem-driven perspective.

Information Sciences, 546, 1166-1185.

Gui, J., Sun, Z., Wen, Y., Tao, D., & Ye, J. (2021). A review

on generative adversarial networks: Algorithms, theory,

and applications. IEEE Transactions on Knowledge and

Data Engineering, 35(4), 3313-3332.

He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T. S.

(2017, April). Neural collaborative filtering. In

Proceedings of the 26th International Conference on

World Wide Web (pp. 173-182).

Ko, H., Lee, S., Park, Y., & Choi, A. (2022). A survey of

recommendation systems: Recommendation models,

techniques, and application fields. Electronics, 11(1),

141.

Khan, Z. A., Zubair, S., Imran, K., Ahmad, R., Butt, S. A.,

& Chaudhary, N. I. (2019). A new users rating-trend

based collaborative denoising auto-encoder for top-N

recommender systems. IEEE Access, 7, 141287-

141310.

Kingma, D. P. (2013). Auto-encoding variational bayes.

arXiv preprint arXiv:1312.6114.

Li, L., Xiahou, J., Lin, F., & Su, S. (2023). DistVAE:

Distributed variational autoencoder for sequential

recommendation. Knowledge-Based Systems, 264,

110313.

Li, M., Zhang, Z., Zhao, X., Wang, W., Zhao, M., Wu, R.,

& Guo, R. (2023, April). AutoMLP: Automated MLP

for sequential recommendations. In Proceedings of the

ACM Web Conference 2023 (pp. 1190-1198).

Li, Z., Chen, H., Ni, Z., Deng, X., Liu, B., & Liu, W. (2022).

ARPCNN: Auxiliary review-based personalized

attentional CNN for trustworthy recommendation.

IEEE Transactions on Industrial Informatics, 19(1),

1018-1029.

Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., &

Müller, K. R. (2021). Explaining deep neural networks

and beyond: A review of methods and applications.

Proceedings of the IEEE, 109(3), 247-278.

Shenbin, I., Alekseev, A., Tutubalina, E., Malykh, V., &

Nikolenko, S. I. (2020, January). RecVAE: A new

variational autoencoder for top-n recommendations

with implicit feedback. In Proceedings of the 13th

International Conference on Web Search and Data

Mining (pp. 528-536).

Shiri, F. M., Perumal, T., Mustapha, N., & Mohamed, R.

(2023). A comprehensive overview and comparative

analysis on deep learning models: CNN, RNN, LSTM,

GRU. arXiv preprint arXiv:2305.17473.

Smagulova, K., & James, A. P. (2019). A survey on LSTM

memristive neural network architectures and

applications. The European Physical Journal Special

Topics, 228(10), 2313-2324.

Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P. A.

(2008, July). Extracting and composing robust features

with denoising autoencoders. In Proceedings of the

25th International Conference on Machine Learning

(pp. 1096-1103).

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y.,

Manzagol, P. A., & Bottou, L. (2010). Stacked

denoising autoencoders: Learning useful

representations in a deep network with a local denoising

criterion. Journal of Machine Learning Research,

11(12), 1103-1127.

Wang, H., Shi, X., & Yeung, D. Y. (2015, February).

Relational stacked denoising autoencoder for tag

recommendation. In Proceedings of the AAAI

Conference on Artificial Intelligence (Vol. 29, No. 1).

Wang, S., Sun, L., Fan, W., Sun, J., Naoi, S., Shirahata,

K., ... & Hashimoto, T. (2017, July). An automated

CNN recommendation system for image classification

tasks. In 2017 IEEE International Conference on

Multimedia and Expo (ICME) (pp. 283-288). IEEE.

Wu, Q., Liu, Y., Miao, C., Zhao, B., Zhao, Y., & Guan, L.

(2019, August). PD-GAN: Adversarial learning for

personalized diversity-promoting recommendation. In

IJCAI (Vol. 19, pp. 3870-3876).

Wu, Y., DuBois, C., Zheng, A. X., & Ester, M. (2016,

February). Collaborative denoising auto-encoders for

top-n recommender systems. In Proceedings of the

Ninth ACM International Conference on Web Search

and Data Mining (pp. 153-162).

Yang, S., Yu, X., & Zhou, Y. (2020, June). LSTM and GRU

neural network performance comparison study: Taking

Yelp review dataset as an example. In 2020

International Workshop on Electronic Communication

and Artificial Intelligence (IWECAI) (pp. 98-101).

IEEE.

Zeng, F., Tang, R., & Wang, Y. (2022). User personalized

recommendation algorithm based on GRU network

A Brief Review of Basic Deep Learning Models for Recommendation Systems

481

model in social networks. Mobile Information Systems,

2022(1), 1487586.

Zhang, G., Liu, Y., & Jin, X. (2020). A survey of

autoencoder-based recommender systems. Frontiers of

Computer Science, 14, 430-450.

Zhao, C. M. K. M., & et al. (2017). Multivariate variational

autoencoder for learning disentangled representations.

In Proceedings of the 34th International Conference on

Machine Learning, 70, 403-412.

Zheng, L., Noroozi, V., & Yu, P. S. (2017, February). Joint

deep modeling of users and items using reviews for

recommendation. In Proceedings of the Tenth ACM

International Conference on Web Search and Data

Mining (pp. 425-434).

Zhou, K., Yu, H., Zhao, W. X., & Wen, J. R. (2022, April).

Filter-enhanced MLP is all you need for sequential

recommendation. In Proceedings of the ACM Web

Conference 2022 (pp. 2388-2399).

Zhu, Y., & Chen, Z. (2022, April). Mutually-regularized

dual collaborative variational auto-encoder for

recommendation systems. In Proceedings of The ACM

Web Conference 2022 (pp. 2379-2387).

Zhu, Y., Li, H., Liao, Y., Wang, B., Guan, Z., Liu, H., &

Cai, D. (2017, August). What to do next: Modeling user

behaviors by time-LSTM. In IJCAI (Vol. 17, pp. 3602-

3608).

DAML 2024 - International Conference on Data Analysis and Machine Learning

482