Authors: Yaakov HaCohen-Kerner, Yarden Tzach and Ori Asis

Affiliation: Jerusalem College of Technology (Machon Lev), Israel

Keyword(s): Blog Posts, Distinguishable Features, Gender Clustering.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Clustering and Classification Methods; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Symbolic Systems

Abstract: This research aims to find out how to cluster unlabeled personal blog posts written in English effectively by gender. Given a gender-labeled blog corpus and a blog corpus that is not gender-labeled, we extracted distinguishable unigrams for both males and females from the labeled corpus. We then defined two general features that represent the relative frequencies of the distinguishable males’ unigrams and females’ unigrams (males’ frequency and females’ frequency). The best distinguishable feature was found to be the males’ frequency feature with a ratio factor of at least 1.4 times that of females. This feature leads to an accuracy rate of 83.7% for gender clustering of the unlabeled blog corpus. To the best of our knowledge, this study presents two novelties: (1) it is the first study to cluster blog posts by gender, and (2) it clusters an unlabeled corpus using distinguishable features that were extracted from a labeled corpus.
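
The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the function names, and the reading of the 1.4 ratio factor as a threshold on corpus-level relative frequencies are all assumptions made for illustration, and the clustering step itself is left out because the abstract does not specify it.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; the abstract does not specify a tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(posts):
    # Relative frequency of each unigram over all tokens in a list of posts.
    counts = Counter(tok for post in posts for tok in tokenize(post))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def distinguishable_unigrams(own_posts, other_posts, ratio=1.4):
    # Assumption: a unigram is "distinguishable" for one gender when its relative
    # frequency in that gender's labeled posts is at least `ratio` times its
    # relative frequency in the other gender's labeled posts.
    own = relative_frequencies(own_posts)
    other = relative_frequencies(other_posts)
    return {w for w, f in own.items() if f >= ratio * other.get(w, 0.0)}


def gender_features(post, male_words, female_words):
    # The two general features: the share of a post's tokens that are
    # distinguishable male unigrams and distinguishable female unigrams.
    tokens = tokenize(post)
    if not tokens:
        return 0.0, 0.0
    male_freq = sum(t in male_words for t in tokens) / len(tokens)
    female_freq = sum(t in female_words for t in tokens) / len(tokens)
    return male_freq, female_freq


# Hypothetical toy corpora, only to show how the functions fit together.
labeled_male = ["the game and the engine", "game code engine"]
labeled_female = ["the lovely dinner and family", "family dinner lovely"]

male_words = distinguishable_unigrams(labeled_male, labeled_female)
female_words = distinguishable_unigrams(labeled_female, labeled_male)
features = gender_features("game engine dinner", male_words, female_words)
```

Each unlabeled post is thus reduced to a pair (males' frequency, females' frequency), which could then be passed to any clustering method or thresholding rule.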

CC BY-NC-ND 4.0

Paper citation in several formats:
HaCohen-Kerner, Y.; Tzach, Y. and Asis, O. (2016). Gender Clustering of Blog Posts using Distinguishable Features. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2016) - KDIR; ISBN 978-989-758-203-5; ISSN 2184-3228, SciTePress, pages 384-391. DOI: 10.5220/0006077403840391

@conference{kdir16,
author={Yaakov HaCohen{-}Kerner and Yarden Tzach and Ori Asis},
title={Gender Clustering of Blog Posts using Distinguishable Features},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2016) - KDIR},
year={2016},
pages={384--391},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006077403840391},
isbn={978-989-758-203-5},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2016) - KDIR
TI - Gender Clustering of Blog Posts using Distinguishable Features
SN - 978-989-758-203-5
IS - 2184-3228
AU - HaCohen-Kerner, Y.
AU - Tzach, Y.
AU - Asis, O.
PY - 2016
SP - 384
EP - 391
DO - 10.5220/0006077403840391
PB - SciTePress