Proposal of a Cosmetic Product Recommendation Method with

Review Text that is Predicted to Be Write by Users

Natsumi Baba, Yuichi Sei, Yasuyuki Tahara and Akihiko Ohsuga

The University of Electro-Communications, Tokyo, Japan

Keywords: Collaborative Filtering, Morphological Analysis, Review Analysis, Product Recommendation.

Abstract: There are a variety of product introduction sites on the Internet, and many of these usually provide a

combination of product composition information and user review text. It is difficult to understand the features

of a product in detail from the information on these sites. Furthermore, these review sites often include product

recommendations such as "recommended for you," but often lack an explanation of why the product is

recommended. Therefore, this study proposes an approach that provides both the user's opinion of the product

and the reason for recommending the product in a simplified manner. Using cosmetics as a case study, where

the user's actual experience is important, we scored product features on a 5-point scale based on reviews
submitted by users. These data were used for collaborative filtering to determine product recommendations and
to generate the review sentences that target users are expected to write after using the product. The generated
reviews help users understand the details of a product and compare products before purchase.
To verify the usefulness of the proposed method, we conducted a

questionnaire comparing it with existing methods. The proposed method aims to improve user satisfaction in

product recommendations.

1 INTRODUCTION

In recent years, many product reviews have been posted on
social networking services and video sharing sites, and many
people refer to review information on the Internet when
purchasing products.

Cosmetics are goods for which reviews are

particularly important, since there is a large

difference in suitability for different people, and

information on the type of person for whom the

product is suitable is important. On many review sites,

product reviews are composed of a score that

represents the overall evaluation of the product and a

free description by the reviewer. Although there are

many sites where product reviews are posted, most of

them consist only of information on the product

components, the average value of the overall rating

obtained for the product, and the actual product

review. Therefore, it is difficult to judge whether a

product is worth buying or not before purchasing it.

https://orcid.org/0000-0001-6717-7028

https://orcid.org/0000-0002-1939-4455

https://orcid.org/0000-0002-2552-6717

Figure 1: General process of product selection. It is necessary to carefully check numerous reviews of many recommended products.

[Figure 1 label: Numerous reviews written by others about the recommended product]

Baba, N., Sei, Y., Tahara, Y. and Ohsuga, A.

Proposal of a Cosmetic Product Recommendation Method with Review Text that is Predicted to Be Write by Users.

DOI: 10.5220/0012377500003636

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2024) - Volume 3, pages 609-616

ISBN: 978-989-758-680-4; ISSN: 2184-433X


Figure 2: Proposed process of product selection. The user can understand in advance how she will feel by using the recommended product.

[Figure 2 labels: Dataset of past review texts from the target user and other users; Proposed model; Generation; The review text that the target user is expected to write; Numerous reviews about the recommended product]


review sentences are obtained, and the top 1% of the

TF-IDF values are selected as candidate keywords.

About 30% of the automatically extracted words were
actually usable as keywords. Based on these
candidate keywords, they created a dictionary of
evaluative expressions using the same method as
Matsunami et al. They concluded that this method
obtained a higher evaluation rate than existing
methods, even though it registered far fewer
evaluative expressions. Another study that automatically

constructs a lexicon of evaluation expressions is that

of Sakai et al. (Sakai et al., 2019). They used a TF-

IDF method to determine keywords, then used a

dependency analyzer to collect adjectives, adverbs,

verbs, nouns, and auxiliary verbs that modify the

keywords, and created a dictionary of evaluation

expressions. Sakai et al. showed that, for expressions
such as "Lame is amazing" that different people
perceive in different ways, the scores produced by
their method differ significantly from the results
obtained by manual scoring.
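The TF-IDF keyword-candidate step mentioned in the studies above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the tokenization, the exact TF-IDF variant, and the name `tfidf_candidates` are assumptions.

```python
import math
from collections import Counter

def tfidf_candidates(docs, top_fraction=0.01):
    """Return the words whose best TF-IDF value falls in the top fraction.

    docs: list of token lists (tokenization is assumed to be done upstream,
    for example by a morphological analyzer).
    """
    n_docs = len(docs)
    # Document frequency: number of reviews that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        counts = Counter(doc)
        for word, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[word])
            best[word] = max(best.get(word, 0.0), tf * idf)
    # Rank by descending TF-IDF, breaking ties alphabetically.
    ranked = sorted(best, key=lambda w: (-best[w], w))
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:keep]
```

With `top_fraction=0.01` the function mirrors the "top 1%" selection; the candidates would still need the manual check described above, since only a part of the extracted words turned out to be usable.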

Okuda et al. (2020) studied product

recommendation using this evaluation expression

dictionary. When “User A” gives a high rating to

“Product 1”, they score the reviews that “User A”

gave to “Product 1”, and recommend products that

have similar ratings to “Product 1”. They concluded

that determining similar users and recommending

products based solely on scoring results is not an

optimal method, and that future work should take into

account the bipolarity of scores by evaluation

features, user attribute information, and other factors

to calculate user similarity and improve accuracy.

Yabe et al. (2021) proposed a method in which the

user sets a reference product, and the system adjusts

the score of each item based on the scoring results of

that product to recommend products that are closer to

the user's ideal. The system proposed by Yabe et al.

showed better results than conventional systems in

various items such as the ease of obtaining product

information and the ease of reflecting user

preferences.

As an example of a product recommendation

method, Hara et al. (2021) proposed an experiment in

which the product itself is turned into an agent and

product recommendations are made. Through the

agentization of the product, some people felt as if they

were being recommended by a store clerk, and

changed their selection.

In addition, a method has been proposed to

construct a product recommendation system using

deep learning, and to provide the basis for the output

of the recommendation system at the time of

recommendation. For example, Onogawa et al.

(2020) proposed a method that uses LIME to identify

words that characterize each product when comparing

products. Imafuku et al. (2021) proposed a similar

method to improve recommendation effectiveness

and satisfaction. Afchar et al. (2020) proposed a

method that visualizes which features, which are

input to deep learning, contribute to the

recommendation results. These methods show the

basis of recommendation by highlighting words or

features, and do not infer how the user actually

perceives the product to be recommended. In

addition, LIME assumes that all observations are

independent, so it cannot account for nonindependent

observations with high fidelity (Matsushima et al.,

2021). This study targets users who write many

review sentences after purchasing and using a

product, and although a product has various features,

it is assumed that users will mainly write about

features that are of particular interest to them.

Therefore, the review text that is predicted to be

written by the user is considered to be highly useful

as a reason for product recommendation.

Based on the results of these studies, scoring user

reviews, predicting recommended products from

them, and generating the reviews that the users are

predicted to write when purchasing this product

would allow users to easily and accurately review

products before purchasing them, thus enabling them

to make low-risk purchases.

3 METHOD

The proposed method mainly consists of the

following 4 steps. It is structured as shown in Figure

3. First, the evaluation expression dictionary is

created based on the review ratings given to the

products. Next, scoring is performed for each

evaluation product. Then, based on the scoring results

and the user's past review data, the system decides

which products to recommend. Finally, the system

presents review sentences that predict how the user

will evaluate the product when he or she purchases it,

along with the recommended product.


Figure 3: Schematic diagram of the proposed method.

To score review texts for each evaluation product, we
used an evaluation expression dictionary based on
co-occurrence expressions, as in previous studies.
The evaluation expression

dictionary is created by first extracting evaluation

expression phrases with reference to, performing

scoring, and then dividing them to create an

evaluation expression dictionary consisting of

keywords, feature words, and words expressing

degree.

Scoring of review texts is performed by

morphological analysis of the review texts and

checking whether feature words and words

expressing degree appear in sentences containing

words corresponding to the keywords in the

evaluation expression dictionary. If a negative word
is included in a sentence containing the keyword, the
score of that feature is inverted: for example, the
second-highest value is changed to the second-lowest
value. All evaluation features that were not
mentioned in the review text were considered to be
of standard satisfaction that did not need to be
mentioned, and were given the median
score. Based on the feature scores obtained,

collaborative filtering was used to determine which

products to recommend to the user.
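The scoring rules described above can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary layout (keyword mapped to a feature and its scored expression words), the negation word list, and the name `score_review` are hypothetical, and mirroring every score around the median (so that 5 becomes 1 as well as 4 becoming 2) generalizes the single example given in the text.

```python
MEDIAN_SCORE = 3  # midpoint of the 5-point scale

def score_review(sentences, dictionary, features, negations=("not", "no")):
    """Score each feature of one review on a 1-5 scale.

    sentences: list of token lists, one per sentence (already analyzed).
    dictionary: keyword -> (feature, {expression word: score}); hypothetical layout.
    """
    scores = {}
    for tokens in sentences:
        for keyword, (feature, expressions) in dictionary.items():
            if keyword not in tokens:
                continue
            for word, score in expressions.items():
                if word in tokens:
                    # A negation in the same sentence mirrors the score
                    # around the median: 4 becomes 2 (assumed: 5 becomes 1).
                    if any(neg in tokens for neg in negations):
                        score = 6 - score
                    # If several expressions match, the last one wins.
                    scores[feature] = score
    # Features never mentioned receive the median score.
    return {f: scores.get(f, MEDIAN_SCORE) for f in features}
```

For example, a sentence tokenized as `["the", "smell", "is", "not", "nice"]` with "nice" registered at 4 for the "Smell" feature would yield 2 for that feature, while a feature the review never mentions would stay at 3.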

The system estimates how users feel about the

recommended product when they actually use it,

based on other users' reviews, and generates the

review sentences that users are expected to give based

on the predicted scores. When generating review
sentences, information on the features that users consider
important when making a purchase, such as the
number of times a product has been mentioned in past
reviews, should be made easy to understand.
Specifically, for a product that has a high overall
satisfaction level but scores only 2 stars on the
ease of use that the user values, this method generates
a review sentence such as "Overall, I was very satisfied with
the product, but I was disappointed that it was a little
difficult to handle". The review sentence is generated

as follows.

First, the past product review sentences of the target user are scored for each feature, and the features that the user pays particular attention to are identified. Then, based on the scoring results, users who are assumed to be similar to the target user are identified by collaborative filtering. Next, based on the ratings of similar users, the scores for each feature are calculated for the product to be recommended and for the review that the user is expected to give to that product. Finally, the review text that the user is expected to write after purchasing the recommended product is generated by inputting the predicted score of each feature, information on the features that the user is interested in, and several past user reviews as prompts. See Section 4 for a concrete example of the proposed method.
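The last two steps above can be sketched as follows. This is an illustrative sketch only: the similarity-weighted mean, the prompt wording, and the names `predict_scores` and `build_prompt` are assumptions, and the text-generation model that would receive the prompt is not specified here.

```python
def predict_scores(neighbor_scores, similarities):
    """Similarity-weighted mean of similar users' feature scores for one product.

    neighbor_scores: list of {feature: score} dicts, one per similar user.
    similarities: matching list of positive similarity weights.
    """
    total = sum(similarities)
    features = neighbor_scores[0].keys()
    return {
        f: sum(s[f] * w for s, w in zip(neighbor_scores, similarities)) / total
        for f in features
    }

def build_prompt(predicted, focus_features, past_reviews):
    """Assemble a text prompt from scores, focus features, and past reviews."""
    lines = ["Write a product review in the style of the past reviews below."]
    lines.append("Predicted scores (1-5):")
    lines += [f"- {feature}: {score:.1f}" for feature, score in predicted.items()]
    lines.append("Mention these features in particular: " + ", ".join(focus_features))
    lines.append("Past reviews by this user:")
    lines += [f"- {review}" for review in past_reviews]
    return "\n".join(lines)
```

Listing the focus features on their own line is one simple way to express the weighting toward features the user cares about; how strongly a given model follows such an instruction would need to be checked empirically.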

4 EXPERIMENTS

We conducted a user study evaluation comparing a

baseline method, which directly presents users with

numerous reviews from other users, with our

proposed method. It should be noted that the output

of the proposed method consists solely of a single

review that appears as if written by the user being

recommended to. Thus, it is feasible to employ both

the baseline and proposed methods concurrently.

Consequently, even if fewer participants favored the

proposed method over the baseline, it doesn't

necessarily indicate the ineffectiveness of the

proposed method. If even a modest number of

participants find the proposed method preferable, it

underscores the merit of appending the output of the

proposed method alongside the baseline's output.

4.1 Experimental Setting

To create a dictionary of evaluation expressions for cosmetic items, this study focused on eyelash serum and conducted the following experiment using about 10,000 reviews of "Eyelash Serum" from Rakuten Group, Inc.’s review data (Rakuten Group, Inc., 2014).

First, we obtained 316 evaluation phrases from 250 randomly selected reviews. Based on these, we obtained 678 co-occurrence expressions for the evaluation expression dictionary. In this experiment, 8 features were set as evaluation features for eyelash serum: "Cost-Effectiveness", "Growth", "Volume", "Hypoallergenic", "Treatment", "Pigmentation", "Ease of Application" and "Smell". The score given to each feature in this evaluation expression dictionary was also set from 1 to 5, in accordance with the scores given to Rakuten's reviews.

Scoring of product evaluation features was

performed on morphologically analyzed text reviews

using the created evaluation expression dictionary.

User-based collaborative filtering was then performed on the scored ratings of each item and the user information. Given data on the product reviews that a user had written in the past, the system presented products that were highly rated by similar users and that the user was also likely to like.
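A minimal sketch of user-based collaborative filtering over feature scores is given below. The use of cosine similarity, the per-user score vectors, and the names `cosine` and `recommend` are assumptions made for illustration; the similarity measure actually used is not stated in the text.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(target, profiles, liked, top_k=1):
    """Recommend products liked by the users most similar to the target.

    profiles: user -> feature-score vector (hypothetical summary of the
    user's scored reviews); liked: user -> set of highly rated products.
    """
    others = [u for u in profiles if u != target]
    others.sort(key=lambda u: cosine(profiles[target], profiles[u]), reverse=True)
    seen = liked.get(target, set())
    recs = []
    for user in others[:top_k]:
        # Skip products the target user has already reviewed.
        for product in sorted(liked.get(user, set()) - seen):
            if product not in recs:
                recs.append(product)
    return recs
```

Raising `top_k` widens the neighborhood of similar users, which trades some personalization for a larger pool of candidate products.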

To compare the proposed method with existing methods, such as the display method on actual product review sites, a questionnaire was conducted with 19 men and women in their teens to 30s who often refer to reviews on online shopping sites. The questionnaire was conducted online, and respondents rated on a 5-point scale which way of presenting they thought was better for each of the questions and images that followed. The questionnaire consists of 3 parts. The first part is Question 1, which compared the method of listing other people's reviews, commonly found on existing sites, with the method of displaying the review that the user is expected to write, generated by the proposed method. The second part is Question 2, which compared displaying only the score for each feature calculated by the proposed method with displaying the generated review that the user is expected to write. The final part is Question 3, which tested whether weighting the review sentences to mention more of the features that the user is expected to place importance on would convey more details about the product.

4.2 Comparison of Others' Reviews and Your Own Style of Review

First, to compare the method of listing other people's reviews, commonly found on existing sites, with the method of displaying the review that the user is expected to write, we compared the display shown in Figure 4, which simply lists other people's reviews, with the display shown in Figure 5, which presents only the review text generated by the proposed method. In this question, we refer to the former as "A" and the latter as "B". The results are shown in Figure 6. Those who answered that "A was easier to understand" were asked for additional reasons for their choice, and gave responses such as "B is easier to understand if you compare per review" and "I prefer to have reviews from various people".

Figure 4: Displayed reviews in existing method.

Figure 5: Displayed reviews in proposed method.

[Figure 5 content]
YOU ★★★★☆
"The results for increasing volume and length of my eyelashes met my expectations. It was easy to apply, and I didn't encounter many issues. However, there was a slight sensation of irritation on the skin, and the scent was a bit bothersome. Overall, it felt good."

4.3 Comparison of Features’ Score and Review Text Displays

To compare displaying only the score for each feature calculated by the proposed method with displaying the generated review that the user is expected to write, we asked three questions to determine which method is better for understanding the features of a product when purchasing it. There are two displays: one shows only the score of each item obtained in the process of the proposed method, as shown in Figure 7, and the other shows the review text, as shown in Figure 5. In this question, the former is referred to as "A" and the latter as "B".

Figure 6: Result of Question 1: “Which was easier to understand product’s information?”.

Figure 7: Displayed score of each feature in proposed

method.

Figure 8: Result of Question 2: “Which can help us

understand the product's features more clearly?”.

The results are shown in Figure 8. In all questions,

the format displaying the numerical score was

superior, but it was found that there were a certain

number of people for whom the review text was easier

to understand.

4.4 Weighting of Features

Using the review format shown in Figure 5, we displayed 11 review texts of two kinds: one generated using only the score information of each item, and another weighted to mention more of the items that the user considered important. For three items, we asked whether the information on each item was sufficiently conveyed by each review text.

Table 1 shows the percentage of each item's

characteristic information conveyed in the

unweighted and weighted review sentences,

respectively. In the case of the unweighted review
sentences, the most salient features of the product
were often mentioned and some information was not
conveyed, whereas the weighted review sentences conveyed
far more: for example, the rate for Cost-Effectiveness
rose from 0.00% to 94.74%.
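As a consistency check on Table 1, the reported percentages match whole-number counts out of the 19 questionnaire respondents. The counts used below (10, 15, 18, and 2) are inferred from the percentages under the assumption that all 19 respondents answered every item; they are not reported figures.

```python
RESPONDENTS = 19  # participants stated in Section 4.1

def percentage(count, total=RESPONDENTS):
    """Share of respondents as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)

# Inferred counts for the weighted reviews (an assumption, not reported data).
inferred = {"Hypoallergenic": 10, "Growth": 15, "Cost-Effectiveness": 18}
weighted = {feature: percentage(n) for feature, n in inferred.items()}
```

Under the same assumption, the 10.53% reported for Growth in the unweighted case corresponds to 2 of 19 respondents.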

Table 1: Result of Question 3: Percentage of information conveyed for each feature (%).

              Hypoallergenic   Growth   Cost-Effectiveness
Weighted               52.63    78.95                94.74
Unweighted              0.00    10.53                 0.00

5 DISCUSSION

The proposed method aims to improve user satisfaction with product recommendations by generating product recommendations using automatic scoring of review sentences. Based on this aim, we discuss the results of the questionnaire survey.

5.1 Comparison of Others' Reviews and Your Own Style of Review

This part, which compares the existing review sites with the proposed method, showed that the existing method is easier to understand. In this question, the number of reviews by others used to represent the existing method was 6 for one product, which is easily readable, whereas actual review sites have hundreds of reviews for a single product. This is thought to be one of the reasons for A's superiority. As shown in the additional questionnaire, personal preference is also considered to be a factor.

In addition, in Question 1-3, considerably more respondents answered that B was easier or slightly easier to understand than in Questions 1-1 and 1-2, suggesting that the content of the displayed reviews may also be a factor. We would like to continue to examine the comparison with the existing method under different conditions.

5.2 Comparison of Score and Text Displays

This part, which compared displaying only the scores with displaying the review text, showed that the score display was easier to understand. However, in all questions, some respondents answered that the review text (B) conveyed the information more easily. Therefore, we believe that the information may be more easily conveyed to users if the obtained scores and the predicted review sentences are listed together.

5.3 Weighting of Features

This part, which compares weighted and unweighted

reviews, shows that weighting makes it easier to

convey information about specific items. This

weighting is thought to make it easier to convey

information that is not often mentioned in ordinary

reviews, but that the user cares about.

6 CONCLUSION

Based on the experimental results, the efficacy of

generating and presenting a single review that appears

as if written by the user being recommended to was

confirmed. However, when compared to a vast

number of reviews written by other users, its utility

was found to be relatively lower. We believe that it

can contribute to users' easy understanding of

information when deciding whether to purchase a

product by combining the scoring given when a

product is purchased with review text tailored to the

user's preferences.

As future prospects for this research, we would

like to improve the usefulness of this method in

comparison with existing methods, the accuracy of

product recommendations, and the accuracy of

predicted review sentences, which were the issues in

this experiment.

ACKNOWLEDGEMENTS

This research was supported by JSPS Grants-in-Aid

for Scientific Research JP21H03496, JP22K12157,

JP23H03688 and Sumitomo Electric Industries

Group Social Contribution Fund.

In this paper, we used "Rakuten Dataset"

(https://rit.rakuten.com/data_release/) provided by

Rakuten Group, Inc. via IDR Dataset Service of

National Institute of Informatics.

REFERENCES

Darius Afchar, Romain Hennequin. (2020). Making neural

networks interpretable with attribution: application to

implicit signals prediction. In 14th ACM Conference on

Recommender Systems, pp.220-229.

Riya Ambolkar, Arpita Bhagat, Bhakti Buga, and Swapnil

Gharat. (2022). Hotel Recommendation System using

advanced efficiency and accuracy with modified BERT

technique. In 2022 Second International Conference on

Artificial Intelligence and Smart Energy (ICAIS)

Dimple Chehal, Parul Gupta, and Payal Gulati. (2022). An

Approach to Utilize E-commerce Product Reviews to

Remove Irrelevant Recommendations. In 2022 IEEE

Delhi Section Conference (DELCON)

Taluya Hara, Jun Baba, and Takuya Iwamoto. (2021).
Item-Driven Recommendation: A Recommendation System

for Other Items Based on the Users. In The 35th Annual

Conference of the Japanese Socie 2021

Taichi Imafuku, Tatsuya Kawakami, Tianxiang Yang, and

Masayuki Goto. (2021). A Study on Item

Recommendation Model by Evaluating the Effect of

Individual Intervention. In The 35th Annual

Conference of the Japanese Society for Artificial

Intelligence 2021.

Rio Iwabuchi, Yoko Nakajima, Hirotoshi Honma, Haruka

Aoshima, Akio Kobayashi, Tomoyoshi Akiba, and

Shigeru Masuyama. (2017). Proposal of recommender

system based on user evaluation and cosmetic

ingredients. In 2019 4th International Conference on

Information Technology (InCIT)

P. Kirubanantham, A. Saranya, and D. Senthil Kumar.
(2022). Convolutional Recommended Neural Network
system based on user reviews for movies. In 2021 4th
International Conference on Computing and
Communications Technologies (ICCCT).

Yuki Matsunami, Mayumi Ueda, Shinsuke Nakajima,

Takeru Hashikami, Sunao Iwasaki, John O'Donovan,

and Byungkyu Kang. (2016). An automatic scoring

method for review by evaluation item using a cosmetic

item evaluation expression dictionary. In 8th Forum on

Data Engineering and Information Management

(DEIM Forum 2016) B1-1.

Hiromu Matsushima, Shun Morisawa, Takumi Ishikawa,
and Hayato Yamana. (2021). A Survey of Explainable
Recommender System. In 20th Information Science
and Technology Forum (FIT2021).

Yoko Nakajima, Hirotoshi Honma, Haruka Aoshima, Akio

Kobayashi, Tomoyoshi Akiba, and Shigeru Masuyama.

(2019). Recommender System Based on User

Evaluations and Cosmetic Ingredients. In Pakistan

Journal of Engineering and Technology, PakJET

Volume: 5, Number: 3, Pages: 38- 43.

Asami Okuda, Mayumi Ueda, and Shinsuke Nakajima.

(2020). A cosmetic product recommendation method

using scores by evaluation item. In 12th Forum on Data

Engineering and Information Management, B1-3.

Takayuki Onogawa, Ryohei Orihara, Yuichi Sei, Yasuyuki

Tahara, and Akihiko Ohsuga. (2020). Why Do Users

Choose a Hotel over Others? Review Analysis Using

Interpretation Method of Machine Learning Models. In

IEEE International Conference Big Data Analytics

(ICBDA), pp.354-362.

Rakuten Group, Inc. (2014). Rakuten Dataset. Informatics

Research Data Repository, National Institute of

Informatics. (dataset). https://doi.org/10.32130/idr.2.0

Ashra Sahar, Muhammad Ayoub, Shabir Hussain, Yang

Yu, and Akmal Khan. (2022). Transfer Learning-Based

Framework for Sentiment Classification of Cosmetics

Products Reviews. Pakistan Journal of Engineering and

Technology, PakJET

Miharu Sakai, Mitsunori Matsushita, and Mayumi Ueda.

(2019). Automatic Construction of Evaluation

Expression Dictionary for Generating Scores of

Cosmetics by Evaluation Item. In 11th Forum on Data

Engineering and Information Management, B6-2.

Yuna Taniguchi, Asami Okuda, Mayumi Ueda, Panote

Siriaraya, and Shinsuke Nakajima. (2019). Efficient

method for creating evaluation expression dictionaries

for each cosmetic category. In Proceedings of the 35th

Symposium on Fuzzy Systems (FSS2019 Osaka

University).

Sayaka Yabe, Mayumi Ueda, and Shinsuke Nakajima.
(2021). Visualization system of differences among
cosmetic items using scores by evaluation items. In
13th Forum on Data Engineering and Information
Management, C14-2.

Hafed Zarzour, Mohammad Alsmirat, and Yaser Jararweh.

(2022). Using Deep Learning for Positive Reviews

Prediction in Explainable Recommendation Systems.

In 2022 13th International Conference on Information

and Communication Systems (ICICS).
