Association Rule Analysis using CT-Pro and Hash-based Algorithm in Violence Case of Children

Amir Hamzah Siregar¹, Maya Silvi Lydia¹ and Sutarman Wage²

¹ Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, Indonesia

² Faculty of Mathematics and Natural Sciences, Universitas Sumatera Utara, Medan, Indonesia

Keywords: Association Rule Mining, CT-Pro, Hash-Based, Frequent Itemset.

Abstract: Searching for frequent itemset patterns to obtain support and confidence values with the Apriori association rule algorithm performs poorly because the database must be read repeatedly to determine the frequent itemsets. This becomes a serious problem when the database is large, since repeated reads lead to very long processing times before the support and confidence values are obtained. A special approach to association rule analysis using CT-Pro and Hash-Based is therefore needed. CT-Pro uses a CFP-Tree data structure that allows a faster search for frequent itemsets because the number of paths or trees that are built is compressed. Hash-Based works with a hashing technique in which the database is read only in the first iteration, when the candidate itemsets are entered into the hash table. In tests with 3% support and 15% confidence, CT-Pro formed 22 rules with an execution time of 0.25 seconds, while Hash-Based formed 22 rules with an execution time of 0.73 seconds. The new crime pattern with the highest confidence and support was that an act of sexual harassment is accompanied by physical torture, with a confidence of 59%, a support count of 34, and a lift ratio of 1.29.

1 INTRODUCTION

Law Number 23 of 2002 regulates the protection of children (persons under 18 years of age). Violence against children is abusive behavior committed by parents or other adults. Based on data from the Office of Women's Empowerment and Child Protection of North Sumatra Province, the P2TP2A Unit (Integrated Service Center for the Empowerment of Women and Children) recorded 991 cases of violence against children in 2018 and 587 cases from 33 districts in 2019. The police, whose function is to safeguard the public, are expected to respond to this phenomenon and to act on and uncover crimes against children by analyzing behaviors that often occur together with such crimes. This analysis can be performed using the association rule technique.

Association rule mining is a data mining method that looks for sets of items that often appear together (Si et al., 2019), (Shaban et al., 2018), (Muhajir et al., 2020). The algorithm most often used for association rules is Apriori. The Apriori algorithm extracts information from the database in order to generate association rules (Ali et al., 2019). It does so by processing the frequent itemsets to obtain support and confidence values. Support is the level of dominance of an item or itemset in the database, while confidence is the conditional relationship between two items (Sitnikov et al., 2018). When searching for patterns of crimes against children, support is used to count how often each type of crime is committed, and confidence is used to find the relationship between the types of crime committed over a period of time. The results are expected to reveal patterns of crime against children based on previous patterns. To generate support and confidence values, Apriori must read the database repeatedly, and it generates a large number of frequent itemsets and association rules. This results in a very high processing load, so obtaining the support and confidence values takes a long time (Naresh and Suguna, 2019). Apart from Apriori, there are several other algorithms for finding frequent itemsets, including FP-Growth, CT-Pro, Hash-Based, and Christian Borgelt's Apriori.

Dhivya and Kalpana (2010) studied the performance of CT-Apriori and CT-Pro, presenting execution speed in the form of performance curves. Using a retail sales transaction dataset, the study found that CT-Pro was superior to CT-Apriori. Both CT-Pro and CT-Apriori performed better than the basic algorithms, namely FP-Growth and Apriori. The performance difference between CT-Pro and CT-Apriori was more pronounced at lower thresholds.

Siregar, A., Lydia, M. and Wage, S.
Association Rule Analysis using CT-Pro and Hash-based Algorithm in Violence Case of Children.
DOI: 10.5220/0010338805650573
In Proceedings of the International Conference on Culture Heritage, Education, Sustainable Tourism, and Innovation Technologies (CESIT 2020), pages 565-573
ISBN: 978-989-758-501-2
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

Gupta and Garg (2011) compared FP-Tree based algorithms, including COFI-Tree, CT-Pro and FP-Growth. FP-Growth takes a recursive approach, while COFI-Tree and CT-Pro take a non-recursive approach. In terms of data structure, FP-Growth builds an FP-Tree, COFI-Tree uses a two-way FP-Tree structure, and CT-Pro forms a Compressed FP-Tree (CFP-Tree). In terms of execution speed, CT-Pro is better than FP-Growth and COFI-Tree.

Aguru and Rao (2014) studied Hash-Based mining with a rehashing technique on retail sales transaction data. When a collision occurs while finding an address in the hash table (more than one itemset has the same hash address), a rehashing function is used to resolve it. At the end of their study, Aguru and Rao compared the execution time of Apriori and Hash-Based with rehashing and found that Hash-Based with rehashing was faster than Apriori: with a support of 20, Hash-Based with rehashing had an execution time of 22, while Apriori had an execution time of 53.

Based on the previous discussion, the CT-Pro and Hash-Based algorithms are able to shorten execution time, in this case for the frequent itemset search. In this study, the search for patterns of crime against children using the CT-Pro and Hash-Based algorithms is expected to perform better, so that obtaining the support and confidence values does not take long and not too many association rules are formed. This study also analyzes the performance of the CT-Pro and Hash-Based algorithms in searching for frequent itemsets and generating association rules, in order to compare the two methods and identify the better performer.

2 METHODS

In this study, a method for finding new patterns of crime against children was developed. The CT-Pro and Hash-Based association rule algorithms are compared on the number of association rules produced and on execution time. The prepared dataset consists of 150 records of crimes against children obtained from the P2TP2A unit (Integrated Service Center for Women and Children Empowerment) of the Office of Women Empowerment and Child Protection of North Sumatra Province. The data is converted into binary form, that is, values of 0 and 1. The data is then processed with the CT-Pro and Hash-Based algorithms. The results are used to find new patterns of crime against children and to compare the two algorithms on the time needed to find the association rules and on the number of rules generated.

2.1 Association Rule

Association rule mining is a data mining technique for identifying relationships between multiple items in a dataset (Siswanto and Thariqa, 2018). Association rules generally take the form "if - then", where the "if" part is the antecedent and the "then" part is the consequent (Shaban et al., 2018). The importance of an association rule can be determined by two parameters, namely support and confidence (Segatori et al., 2018). Support measures how often the items occur together. Confidence is a percentage that expresses the strength of the relationship between two items (Nomura et al., 2020).

The steps for finding association rules are divided into three stages (Ghazanfari et al., 2020).

1) Frequent itemset analysis

This stage searches the database for frequent itemsets, that is, itemsets whose support is greater than or equal to the minimum support (minsupport) (Han et al., 2019). The support value is calculated as follows:

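$$\text{Support}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}} \times 100$$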

2) The formation of association rules

Association rules are formed from the frequent itemsets generated in the previous stage, provided that the confidence of the rule meets the minimum confidence (minconfidence) (Ren et al., 2019). The confidence value is calculated as follows:

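$$\text{Confidence}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A} \times 100$$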

3) The search for lift ratio

The lift ratio is a measure of how strong an association rule is. The value obtained from the lift ratio calculation is used to determine whether a rule is valid or not (Li et al., 2019). The lift ratio ranges from 0 to infinity (Zahrotun et al., 2018). The lift ratio is calculated as follows:

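$$\text{Lift Ratio}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Benchmark Confidence}(A \Rightarrow B)}$$

The benchmark confidence is calculated with the formula:

$$\text{Benchmark Confidence} = \frac{N_c}{N}$$

Notes:

N_c = total number of transactions containing the items in the consequent

N = total number of transactions in the dataset.

Confidence and benchmark confidence must be expressed on the same scale (both as fractions or both as percentages) before they are divided.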
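As a minimal sketch of the three measures above, the following Python functions compute support, confidence and lift ratio from raw counts. The function names and the example counts are illustrative assumptions, not taken from the study. Confidence is kept as a fraction here so that it can be divided directly by the benchmark confidence.

```python
def support(count_itemset: int, n_transactions: int) -> float:
    """Support as a percentage of all transactions."""
    return count_itemset * 100 / n_transactions


def confidence(count_a_and_b: int, count_a: int) -> float:
    """Confidence of A => B as a fraction between 0 and 1."""
    return count_a_and_b / count_a


def lift_ratio(conf: float, count_consequent: int, n_transactions: int) -> float:
    """Confidence divided by the benchmark confidence Nc / N."""
    benchmark = count_consequent / n_transactions
    return conf / benchmark


# Hypothetical counts: 100 transactions, 40 contain A, 50 contain B, 30 contain both.
sup_ab = support(30, 100)
conf_ab = confidence(30, 40)
lift_ab = lift_ratio(conf_ab, 50, 100)
print(sup_ab, conf_ab, lift_ab)
```

With these made-up counts the rule A => B has a lift ratio above 1, which is the condition the paper uses to call a rule strong.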

2.2 CT-Pro

The stages of the CT-Pro algorithm are:

1) Find the frequent items: the database is filtered against the predetermined minsupport limit, and the frequency of each item is counted to produce the Global item table.

2) Build the CFP-Tree: the frequent items are sorted in descending order of frequency according to the Global item table, and the Global CFP-Tree is formed.

3) Mine the frequent itemsets: for each item in the ordered Global item table, the nodes associated with that item are located in the Global CFP-Tree. The local frequent items found there are used to build a Local item table. A Local CFP-Tree is then built from the Local item table, and the frequent itemsets are formed from the items mined from the Local CFP-Tree.

2.3 Hash-based

The stages of the Hash-based algorithm are:

1) Determine the minsupport value as the threshold for generating frequent itemsets, and the minconfidence value as the threshold for generating association rules.

2) Generate C1 (Candidate 1) based on the support calculation. Before each itemset is entered into a bucket of the hash table, the candidate 1-itemsets must be hashed. The hashing formula is

h(x) = (order of item x) mod n

where h is the bucket address in the hash table, n is the number of addresses (n = 2m + 1), and m is the total number of items.

3) The hash calculation gives each itemset in C1 its hash address. Each itemset occupies its hash address and becomes a node, and links are then built that point, in sequence, to the records containing the itemset to form a linked list. The itemsets are then filtered by the minsupport value to produce L1 (Large 1).

4) The itemsets in L1 are combined into pairs and hashed into a hash table with the formula

h(k) = ((order of x) * 10 + order of y) mod n.

A collision means that more than one itemset has the same hash address. In that case the itemsets are rehashed into a table with about twice as many addresses, using the formula

h(k) = ((order of x) * 10 + order of y) mod j

where j = 2m + 1 is the number of addresses after enlargement and m here is the number of addresses in the hash table before enlargement. Addresses are added until no collision between itemsets remains. If the count in a bucket is greater than or equal to the minsupport value, the combination qualifies as a candidate 2-itemset (C2). Table L2 is then built from table C2 in the same way that L1 is built from table C1. A different formula is used for 3-itemsets:

h(k) = ((order of x) * 100 + (order of y) * 10 + order of z) mod j

where order of z is the order of the third item.
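The three hashing formulas share one pattern: the item orders are read as decimal digits and the result is reduced modulo the table size. A minimal Python sketch of that pattern follows. The names `table_size` and `hash_itemset` are hypothetical, the value of 9 items and the 79-address table are example inputs, and the digit encoding only stays unambiguous while every order is below 10.

```python
def table_size(m: int) -> int:
    """Number of addresses n = 2m + 1."""
    return 2 * m + 1


def hash_itemset(orders: tuple[int, ...], n: int) -> int:
    """Read the item orders as decimal digits, then take the remainder mod n.

    (x,) -> x mod n; (x, y) -> (x*10 + y) mod n; (x, y, z) -> (x*100 + y*10 + z) mod n.
    """
    key = 0
    for order in orders:
        key = key * 10 + order
    return key % n


n = table_size(9)                          # assuming 9 items -> 19 addresses
one_item = hash_itemset((3,), n)           # 1-itemset with order 3
two_items = hash_itemset((2, 3), n)        # 2-itemset with orders 2 and 3
three_items = hash_itemset((1, 2, 3), 79)  # 3-itemset in a 79-address table
print(n, one_item, two_items, three_items)
```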

3 RESULT AND DISCUSSION

3.1 CT-Pro

In this study, 150 records of crimes against children from the P2TP2A unit of the Office of Women's Empowerment and Child Protection of North Sumatra Province were used. The data was converted into binary form, with values of 1 and 0: the value is 1 if a crime criterion is present in the case and 0 if it is not. For example, the first case involved the crimes PF, PE, PN and TR. The worked calculations in this section use the 20 cases shown in Table 1.


Table 1: Data Conversion.

NO PF PS PE PP PN TR MA PB EP

1 1 0 1 0 1 1 0 0 0

2 1 0 1 0 0 1 0 0 0

3 1 1 1 0 0 0 0 0 0

4 0 1 1 0 0 0 0 1 0

5 1 0 1 1 0 0 0 0 0

6 0 1 1 0 0 1 0 0 0

7 1 0 1 1 0 0 1 0 0

8 0 0 1 1 1 0 0 0 0

9 1 0 0 0 0 0 0 0 0

10 0 1 0 0 0 1 0 0 0

11 1 0 1 1 0 0 0 0 0

12 1 0 1 0 0 0 0 1 0

13 1 1 0 0 0 0 0 0 0

14 1 1 1 0 0 1 0 0 0

15 1 1 1 0 0 0 0 0 0

16 0 0 0 1 0 0 1 0 0

17 1 0 1 0 0 0 0 0 0

18 0 0 0 0 0 0 0 0 1

19 0 1 1 0 0 1 0 0 0

20 0 1 1 0 1 0 0 0 0

Notes:

Physical Torture = PF, Sexual harassment = PS,

Emotional Torture = PE, Abandonment and Neglect

= PP, Rejection = PN, Giving Terror to Children =

TR, Isolating Children = MA, Giving Bad Influence

to Children = PB, Exploitation = EP.
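The conversion in Table 1 can be sketched in a few lines of Python. This is an illustration only: `encode_case` is a hypothetical helper, and the column order is copied from the header of Table 1.

```python
COLUMNS = ["PF", "PS", "PE", "PP", "PN", "TR", "MA", "PB", "EP"]


def encode_case(crimes: set[str]) -> list[int]:
    """Return one 0/1 flag per column, in Table 1 column order."""
    unknown = crimes - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown crime codes: {sorted(unknown)}")
    return [1 if code in crimes else 0 for code in COLUMNS]


# The first case involved PF, PE, PN and TR.
first_case = encode_case({"PF", "PE", "PN", "TR"})
print(first_case)
```

The printed list can be compared against row 1 of Table 1.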

The next step was to create the Global item table, in which each item was filtered with a predetermined minsupport value of 10%. The items were then sorted from the largest to the smallest frequency (descending) to form the Global item table. The PE item, with the largest support count of 15, gets global ID 1, and the PB item, with the smallest support count of 2, gets global ID 8.

Table 2: Global item.

Global ID   Itemset   Support
1           PE        15
2           PF        12
3           PS        9
4           TR        6
5           PP        5
6           PN        3
7           MA        2
8           PB        2

The data is then mapped against the global IDs in Table 2. The first case contains PF, PE, PN and TR, so its global IDs are 1, 2, 4 and 6.
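The Global item table and the mapping step can be reproduced from the 20 rows of Table 1 with the sketch below. It is not the authors' implementation: the function names are hypothetical, and breaking ties between equal counts by Table 1 column order is an assumption chosen because it reproduces the order shown in Table 2.

```python
COLUMNS = ["PF", "PS", "PE", "PP", "PN", "TR", "MA", "PB", "EP"]
# The 20 rows of Table 1, one character per column.
ROWS = [
    "101011000", "101001000", "111000000", "011000010", "101100000",
    "011001000", "101100100", "001110000", "100000000", "010001000",
    "101100000", "101000010", "110000000", "111001000", "111000000",
    "000100100", "101000000", "000000001", "011001000", "011010000",
]
MIN_SUPPORT_PCT = 10


def global_item_table(rows, columns, min_support_pct):
    """Return (global ID, item, count) for items meeting the minimum support."""
    n = len(rows)
    counts = [(code, sum(int(row[i]) for row in rows)) for i, code in enumerate(columns)]
    # Integer comparison avoids floating-point issues at exactly 10 %.
    frequent = [(code, c) for code, c in counts if c * 100 >= min_support_pct * n]
    # list.sort is stable, so ties keep Table 1 column order (an assumption).
    frequent.sort(key=lambda pair: -pair[1])
    return [(gid, code, c) for gid, (code, c) in enumerate(frequent, start=1)]


def map_case(row, columns, table):
    """Replace the items of one case by their global IDs, in ascending order."""
    ids = {code: gid for gid, code, _ in table}
    return sorted(ids[code] for i, code in enumerate(columns) if row[i] == "1" and code in ids)


table = global_item_table(ROWS, COLUMNS, MIN_SUPPORT_PCT)
first_case_ids = map_case(ROWS[0], COLUMNS, table)
print(table)
print(first_case_ids)
```

EP occurs in only one of the 20 cases (5%), so it falls below the threshold and does not receive a global ID.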

The next step is to build the Global CFP-Tree through the following process: (i) a new node is formed for each item in the Global item table; (ii) each item in an itemset is accessed in turn: if the item matches the current node, the count of the current node is incremented by one, but if the item differs from the current node, a new node is created for the item; (iii) each time a new node is created, its next and prev attributes are set; (iv) the process continues until all items have been accessed.
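The count-or-create logic of step (ii) can be illustrated with an ordinary prefix tree. The sketch below is a deliberate simplification and not the CFP-Tree of CT-Pro: it omits the compression and the next/prev links of step (iii), and the `Node` class and `insert` function are hypothetical names. The input is the first five cases of Table 1, already mapped to global IDs from Table 2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    item: int                      # global ID stored at this node
    count: int = 0                 # number of transactions passing through
    children: dict[int, "Node"] = field(default_factory=dict)


def insert(root: Node, transaction: list[int]) -> None:
    """Walk down from the root, reusing matching nodes and creating missing ones."""
    current = root
    for item in sorted(transaction):       # ascending global ID = descending frequency
        child = current.children.get(item)
        if child is None:                  # no matching node yet: create one
            child = Node(item)
            current.children[item] = child
        child.count += 1                   # count this transaction on the node
        current = child


# Cases 1-5 of Table 1 as global IDs (PE=1, PF=2, PS=3, TR=4, PP=5, PN=6, PB=8).
mapped_cases = [[1, 2, 4, 6], [1, 2, 4], [1, 2, 3], [1, 3, 8], [1, 2, 5]]
root = Node(item=0)                        # item 0 is a placeholder for the root
for case in mapped_cases:
    insert(root, case)

pe = root.children[1]
print(pe.count, pe.children[2].count, pe.children[3].count)
```

Because every transaction is sorted by global ID, cases that share their most frequent items share a path, which is what keeps the number of nodes small.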

After the Global CFP-Tree was formed, the mining process was carried out. For mining, the Global item table is traversed from the smallest to the largest frequency. Take, for example, PS (Sexual Harassment), with a support count of 9, the sixth smallest item in the Global item table. The next step was to find the nodes linked to PS in the Global CFP-Tree; these are the Local frequent items, which were used to build a Local item table, from which a Local CFP-Tree was built as shown in Figure 1:

Figure 1: Local CFP-Tree.

Then, from the Local CFP-Tree, the following frequent itemsets for PS were obtained:

1-itemset: Sexual harassment (PS).

2-itemsets: Physical torture (PF) - Sexual harassment (PS); Emotional torture (PE) - Physical torture (PF); Emotional torture (PE) - Sexual harassment (PS).

3-itemset: Emotional torture (PE) - Physical torture (PF) - Sexual harassment (PS).

Based on the frequent itemsets, confidence values were calculated against a minconfidence of 60%. For example, the frequent itemset (PS-PF-PE) is broken down into its combinations and the confidence of each is calculated. The following are the confidence values for several rules:

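1. Confidence (Sexual harassment => Physical torture) = $\frac{\sum(\mathrm{PS}, \mathrm{PF})}{\sum(\mathrm{PS})}$ = 4 / 9 = 0.44 * 100 % = 44 %

2. Confidence (Physical torture => Sexual harassment) = $\frac{\sum(\mathrm{PF}, \mathrm{PS})}{\sum(\mathrm{PF})}$ = 4 / 12 = 0.33 * 100 % = 33 %

3. Confidence (Sexual harassment => Emotional torture) = $\frac{\sum(\mathrm{PS}, \mathrm{PE})}{\sum(\mathrm{PS})}$ = 7 / 9 = 0.77 * 100 % = 77 %

4. Confidence (Emotional torture => Sexual harassment) = $\frac{\sum(\mathrm{PE}, \mathrm{PS})}{\sum(\mathrm{PE})}$ = 7 / 15 = 0.46 * 100 % = 46 %

5. Confidence (Physical torture => Emotional torture) = $\frac{\sum(\mathrm{PF}, \mathrm{PE})}{\sum(\mathrm{PF})}$ = 10 / 12 = 0.83 * 100 % = 83 %

6. Confidence (Emotional torture => Physical torture) = $\frac{\sum(\mathrm{PE}, \mathrm{PF})}{\sum(\mathrm{PE})}$ = 10 / 15 = 0.66 * 100 % = 66 %

7. Confidence (Sexual harassment => Physical torture, Emotional torture) = $\frac{\sum(\mathrm{PS}, \mathrm{PF}, \mathrm{PE})}{\sum(\mathrm{PS})}$ = 3 / 9 = 0.33 * 100 % = 33 %

8. Confidence (Physical torture => Emotional torture, Sexual harassment) = $\frac{\sum(\mathrm{PF}, \mathrm{PE}, \mathrm{PS})}{\sum(\mathrm{PF})}$ = 3 / 12 = 0.25 * 100 % = 25 %

9. Confidence (Emotional torture => Sexual harassment, Physical torture) = $\frac{\sum(\mathrm{PE}, \mathrm{PS}, \mathrm{PF})}{\sum(\mathrm{PE})}$ = 3 / 15 = 0.2 * 100 % = 20 %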

The rules were then filtered to those with minsupport ≥ 10% and minconfidence ≥ 60%. Three rules meet these thresholds, namely PE => PF, PS => PE and PF => PE. Next, the benchmark confidence (BC) was calculated to obtain the lift ratio. The benchmark confidence is the number of transactions containing the consequent (Nc) divided by the total number of transactions (N). The lift ratio is then the confidence divided by the benchmark confidence. The results are as follows:

If an act of emotional torture is committed, then physical torture also occurs. Confidence: 66%, support count: 10, lift ratio: 1.1.

If sexual harassment is committed, then emotional torture also occurs. Confidence: 77%, support count: 7, lift ratio: 1.02.

If physical torture is committed, then emotional torture also occurs. Confidence: 83%, support count: 10, lift ratio: 1.10.

A lift ratio greater than one (lift ratio > 1) indicates that the rule is strong and valid. Conversely, a lift ratio less than one (lift ratio < 1) indicates that the rule is not strong, or invalid.
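The rule filtering above can be checked against the 20 cases of Table 1 with a short Python sketch. The helper names are hypothetical, and only the two-item rules among PE, PF and PS are evaluated. The script keeps full floating-point precision, so its lift ratios can differ in the second decimal from the figures in the text, which are computed from truncated confidence values.

```python
from itertools import permutations

# The 20 cases of Table 1, written as the crime codes present in each case.
CASES = [
    "PF PE PN TR", "PF PE TR", "PF PS PE", "PS PE PB", "PF PE PP",
    "PS PE TR", "PF PE PP MA", "PE PP PN", "PF", "PS TR",
    "PF PE PP", "PF PE PB", "PF PS", "PF PS PE TR", "PF PS PE",
    "PP MA", "PF PE", "EP", "PS PE TR", "PS PE PN",
]
TRANSACTIONS = [set(case.split()) for case in CASES]
N = len(TRANSACTIONS)


def count(items: set[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in TRANSACTIONS if items <= t)


def evaluate(antecedent: str, consequent: str) -> tuple[float, float]:
    """Return (confidence, lift ratio) for the rule antecedent => consequent."""
    conf = count({antecedent, consequent}) / count({antecedent})
    benchmark = count({consequent}) / N
    return conf, conf / benchmark


MIN_CONFIDENCE = 0.60
rules = {}
for a, b in permutations(["PE", "PF", "PS"], 2):
    conf, lift = evaluate(a, b)
    if conf >= MIN_CONFIDENCE:
        rules[(a, b)] = (round(conf, 3), round(lift, 3))

for rule, (conf, lift) in rules.items():
    print(rule, conf, lift)
```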

3.2 Hash-based

The Hash-based process was tested on the same data, namely 150 records of crimes against children. The first stage of the Hash-based algorithm is to set the minsupport and minconfidence values as threshold conditions, here minsupport ≥ 10% and minconfidence ≥ 60%. To simplify the hash table calculation, each item is assigned an order number that represents it in the calculation. For example, the Emotional Torture item, with initial PE, has order 1. The order assigned to each item is shown in Table 3.

Table 3: Order of item.

Initial   Itemset                             Order
PE        Emotional Torture                   1
PF        Physical Torture                    2
PS        Sexual Harassment                   3
TR        Giving Terror to Children           4
PP        Abandonment and Neglect             5
PN        Rejection                           6
MA        Isolating Children                  7
PB        Giving Bad Influence to Children    8
EP        Exploitation                        9

C1 was generated based on the support count calculation. Before each itemset is entered into a bucket of the hash table, the candidate 1-itemsets must be hashed with the formula h(x) = (order of item x) mod n. With m = 9 items, the table has n = 2 * 9 + 1 = 19 addresses. Address lookup in the hash table for 1-itemsets:

h (Emotional Torture) = (1) mod 19 = 1
h (Physical Torture) = (2) mod 19 = 2
h (Sexual Harassment) = (3) mod 19 = 3
h (Giving Terror to Children) = (4) mod 19 = 4
h (Abandonment and Neglect) = (5) mod 19 = 5
h (Rejection) = (6) mod 19 = 6
h (Isolating Children) = (7) mod 19 = 7
h (Giving Bad Influence to Children) = (8) mod 19 = 8
h (Exploitation) = (9) mod 19 = 9

After the hash calculation, each itemset has its hash address. Each itemset occupies its hash address and becomes a node, and links are then built that point, in sequence, to the records containing the itemset until the linked list is formed. The itemsets were then filtered by the minsupport value of 10%: itemsets with a support of at least 10% form L1 (Large 1). The itemset with the highest support was PE, with a count of 15 at index 1, and the itemset with the lowest support was PB, with a count of 2 at index 8. The Large 1 itemsets are shown in Table 4.

Table 4: L1 (Large 1).

Index Itemset Support

1 PE 15

2 PF 12

3 PS 9

4 TR 6

5 PP 5

6 PN 3

7 MA 2

8 PB 2

The Large 1 table is sorted from the largest to the smallest frequency (descending) after the selection process on C1 (Candidate 1). The itemsets in L1 are then combined into pairs and hashed into the hash table with the formula h(k) = ((order of x) * 10 + order of y) mod n.

Address lookup in the hash table for 2-itemsets (an asterisk marks an address involved in a collision):

h (PE, PF) = ((1) * 10 + 2) mod 19 = 12

h (PE, PS) = ((1) * 10 + 3) mod 19 = 13

h (PE, TR) = ((1) * 10 + 4) mod 19 = 14

h (PE, PP) = ((1) * 10 + 5) mod 19 = 15 *

h (PE, PN) = ((1) * 10 + 6) mod 19 = 16

h (PE, MA) = ((1) * 10 + 7) mod 19 = 17*

h (PE, PB) = ((1) * 10 + 8) mod 19 = 18*

h (PF, PS) = ((2) * 10 + 3) mod 19 = 4

h (PF, TR) = ((2) * 10 + 4) mod 19 = 5

h (PF, PP) = ((2) * 10 + 5) mod 19 = 6

h (PF, PN) = ((2) * 10 + 6) mod 19 = 7

h (PF, MA) = ((2) * 10 + 7) mod 19 = 8*

h (PF, PB) = ((2) * 10 + 8) mod 19 = 9

h (PS, TR) = ((3) * 10 + 4) mod 19 = 15*

h (PS, PN) = ((3) * 10 + 6) mod 19 = 17 *

h (PS, PB) = ((3) * 10 + 8) mod 19 = 0*

h (TR, PN) = ((4) * 10 + 6) mod 19 = 8*

h (PP, PN) = ((5) * 10 + 6) mod 19 = 18*

h (PP, MA) = ((5) * 10 + 7) mod 19 = 0 *

The calculation above produces collisions, meaning that more than one itemset has the same hash address. Here the collisions are at address 0, (PS, PB) with (PP, MA); address 8, (PF, MA) with (TR, PN); address 15, (PE, PP) with (PS, TR); address 17, (PE, MA) with (PS, PN); and address 18, (PE, PB) with (PP, PN). When a collision occurs, the first step is to check for an available bucket address. If the check shows that the hash table is full, the itemsets are rehashed into a table with about twice the previous number of addresses, using the formula

h(k) = ((order of x) * 10 + order of y) mod j

where j = 2m + 1 is the number of addresses after enlargement and m is the number of addresses in the hash table before enlargement. Here j = 2 * 19 + 1 = 39.

h (PE, PF) = ((1) * 10 + 2) mod 39 = 12

h (PE, PS) = ((1) * 10 + 3) mod 39 = 13

h (PE, TR) = ((1) * 10 + 4) mod 39 = 14

h (PE, PP) = ((1) * 10 + 5) mod 39 = 15

h (PE, PN) = ((1) * 10 + 6) mod 39 = 16

h (PE, MA) = ((1) * 10 + 7) mod 39 = 17*

h (PE, PB) = ((1) * 10 + 8) mod 39 = 18*

h (PF, PS) = ((2) * 10 + 3) mod 39 = 23

h (PF, TR) = ((2) * 10 + 4) mod 39 = 24

h (PF, PP) = ((2) * 10 + 5) mod 39 = 25

h(PF, PN) = ((2) * 10 + 6) mod 39 = 26

h (PF, MA) = ((2) * 10 + 7) mod 39 = 27

h (PF, PB) = ((2) * 10 + 8) mod 39 = 28

h (PS, TR) = ((3) * 10 + 4) mod 39 = 34

h (PS, PN) = ((3) * 10 + 6) mod 39 = 36

h (PS, PB) = ((3) * 10 + 8) mod 39 = 38

h (TR, PN) = ((4) * 10 + 6) mod 39 = 7

h (PP, PN) = ((5) * 10 + 6) mod 39 = 17*

h (PP, MA) = ((5) * 10 + 7) mod 39 = 18*

Collisions still occurred at address 17, (PP, PN) with (PE, MA), and at address 18, (PP, MA) with (PE, PB). To resolve them, the same formula is applied again, now with j = 2 * 39 + 1 = 79 addresses.

h (PE, PF) = ((1) * 10 + 2) mod 79 = 12

h (PE, PS) = ((1) * 10 + 3) mod 79 = 13


h (PE, TR) = ((1) * 10 + 4) mod 79 = 14

h (PE, PP) = ((1) * 10 + 5) mod 79 = 15

h (PE, PN) = ((1) * 10 + 6) mod 79 = 16

h (PE, MA) = ((1) * 10 + 7) mod 79 = 17

h (PE, PB) = ((1) * 10 + 8) mod 79 = 18

h (PF, PS) = ((2) * 10 + 3) mod 79 = 23

h (PF, TR) = ((2) * 10 + 4) mod 79 = 24

h (PF, PP) = ((2) * 10 + 5) mod 79 = 25

h (PF, PN) = ((2) * 10 + 6) mod 79 = 26

h (PF, MA) = ((2) * 10 + 7) mod 79 = 27

h (PF, PB) = ((2) * 10 + 8) mod 79 = 28

h (PS, TR) = ((3) * 10 + 4) mod 79 = 34

h (PS, PN) = ((3) * 10 + 6) mod 79 = 36

h (PS, PB) = ((3) * 10 + 8) mod 79 = 38

h (TR, PN) = ((4) * 10 + 6) mod 79 = 46

h (PP, PN) = ((5) * 10 + 6) mod 79 = 56

h (PP, MA) = ((5) * 10 + 7) mod 79 = 57
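The two rehashing rounds can be replayed with the Python sketch below, which hashes the same 19 candidate pairs and grows the table from n to 2n + 1 addresses until no two pairs share an address. It is a simplified illustration with hypothetical function names: it rehashes on any collision and omits the check for an available bucket described in the text.

```python
ORDER = {"PE": 1, "PF": 2, "PS": 3, "TR": 4, "PP": 5, "PN": 6, "MA": 7, "PB": 8}
PAIRS = [
    ("PE", "PF"), ("PE", "PS"), ("PE", "TR"), ("PE", "PP"), ("PE", "PN"),
    ("PE", "MA"), ("PE", "PB"), ("PF", "PS"), ("PF", "TR"), ("PF", "PP"),
    ("PF", "PN"), ("PF", "MA"), ("PF", "PB"), ("PS", "TR"), ("PS", "PN"),
    ("PS", "PB"), ("TR", "PN"), ("PP", "PN"), ("PP", "MA"),
]


def hash_pair(pair, n):
    """h(k) = ((order of x) * 10 + order of y) mod n."""
    x, y = pair
    return (ORDER[x] * 10 + ORDER[y]) % n


def build_table(pairs, n):
    """Rehash with n -> 2n + 1 until every pair has its own address."""
    sizes_tried = []
    while True:
        sizes_tried.append(n)
        buckets = {}
        for pair in pairs:
            buckets.setdefault(hash_pair(pair, n), []).append(pair)
        collisions = {addr: ps for addr, ps in buckets.items() if len(ps) > 1}
        if not collisions:
            return {addr: ps[0] for addr, ps in buckets.items()}, sizes_tried
        n = 2 * n + 1


table, sizes_tried = build_table(PAIRS, 19)
print(sizes_tried)
print(table[12], table[57])
```

The loop always terminates for this encoding because the keys themselves are distinct, so once the table size exceeds the largest key every pair lands on its own address.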

Addresses are added to the hash table until no collision between itemsets remains. Each address then holds one itemset, and the combined L1 results (L1 * L1) are distributed into the address buckets. From the hash table, the support of each candidate 2-itemset is calculated using the support formula. For example, the itemset (PE, PF) at address 12 has a support count of 10 out of 20 cases, a support of 50%. The complete results for the candidate 2-itemsets (C2) are shown in Table 5.

Table 5: Frequent 2-Itemset (Table C2).

Address   Itemset    Count   N    Support
12        (PE, PF)   10      20   50 %
13        (PE, PS)   7       20   35 %
14        (PE, TR)   5       20   25 %
15        (PE, PP)   4       20   20 %
16        (PE, PN)   3       20   15 %
17        (PE, MA)   1       20   5 %
18        (PE, PB)   2       20   10 %
23        (PF, PS)   4       20   20 %
24        (PF, TR)   4       20   20 %
25        (PF, PP)   3       20   15 %
26        (PF, PN)   1       20   5 %
27        (PF, MA)   1       20   5 %
28        (PF, PB)   1       20   5 %
34        (PS, TR)   5       20   25 %
36        (PS, PN)   1       20   5 %
38        (PS, PB)   1       20   5 %
46        (TR, PN)   1       20   5 %
56        (PP, PN)   1       20   5 %
57        (PP, MA)   2       20   10 %

From Table 5, the itemsets with a support of at least 10% are selected to produce the frequent 2-itemsets, or L2. The confidence of each is then calculated with the formula:

$$\text{Confidence}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A} \times 100$$

With minconfidence set to 60%, rules whose confidence is below that value are eliminated. In this calculation only one itemset reaches 60%, namely (PE, PF), with a support count for A and B of 10 and a support count for A of 15. The benchmark confidence and lift ratio are then calculated to find out whether the rule is valid. Based on these calculations, the rule that meets minsupport ≥ 10%, minconfidence ≥ 60% and lift ratio > 1 is: if an act of emotional torture (PE) is committed, then physical torture (PF) also occurs. Confidence: 66%, support count: 10, lift ratio: 1.1.
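The selection of L2 and the confidence filter can be sketched in Python using the counts copied from Table 4 and Table 5. The variable names are hypothetical, and the sketch assumes that only the first item of each pair is tried as the antecedent, which matches the example of A = PE and B = PF in the text.

```python
N = 20
# Support counts of single items (Table 4) and of candidate pairs (Table 5).
L1 = {"PE": 15, "PF": 12, "PS": 9, "TR": 6, "PP": 5, "PN": 3, "MA": 2, "PB": 2}
C2 = {
    ("PE", "PF"): 10, ("PE", "PS"): 7, ("PE", "TR"): 5, ("PE", "PP"): 4,
    ("PE", "PN"): 3, ("PE", "MA"): 1, ("PE", "PB"): 2, ("PF", "PS"): 4,
    ("PF", "TR"): 4, ("PF", "PP"): 3, ("PF", "PN"): 1, ("PF", "MA"): 1,
    ("PF", "PB"): 1, ("PS", "TR"): 5, ("PS", "PN"): 1, ("PS", "PB"): 1,
    ("TR", "PN"): 1, ("PP", "PN"): 1, ("PP", "MA"): 2,
}
MIN_SUPPORT_PCT = 10
MIN_CONFIDENCE_PCT = 60

# L2: candidate pairs whose support reaches the threshold (integer arithmetic avoids rounding).
L2 = {pair: c for pair, c in C2.items() if c * 100 >= MIN_SUPPORT_PCT * N}

rules = []
for (a, b), c in L2.items():
    # Rule a => b, taking the first item of the pair as the antecedent (an assumption).
    if c * 100 >= MIN_CONFIDENCE_PCT * L1[a]:
        confidence = c / L1[a]
        lift = confidence / (L1[b] / N)
        rules.append((a, b, round(confidence, 3), round(lift, 3)))

print(len(L2), rules)
```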

Next, to find the frequent 3-itemsets, the L2 results are combined and hashed into a hash table with the formula: h(k) = ((order of x) * 100 + (order of y) * 10 + order of z) mod j.

In the first test, on the data from 150 cases with minsupport = 15% and minconfidence = 50%, the CT-Pro algorithm generated 2 rules with an execution time of 0.06 seconds, while Hash-Based generated 2 rules with an execution time of 0.41 seconds. In the second test, with minsupport = 10% and minconfidence = 40%, the CT-Pro algorithm generated 8 rules with an execution time of 0.07 seconds, while Hash-Based generated 8 rules with an execution time of 0.43 seconds.

The complete results of the comparison between the CT-Pro algorithm and the Hash-Based algorithm are as follows:

Table 6: Comparison Results.

No   Min supp (%)   Min conf (%)   CT-Pro rules   CT-Pro time (sec)   Hash-Based rules   Hash-Based time (sec)
1    15             50             2              0.06                2                  0.41
2    10             40             8              0.07                8                  0.43
3    7              30             13             0.11                13                 0.48
4    5              20             20             0.16                20                 0.58
5    3              15             22             0.25                22                 0.73
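To make the gap in Table 6 easier to read, the short Python sketch below divides the Hash-Based time by the CT-Pro time for each test. The ratio is a derived figure that the paper does not report, and the variable names are hypothetical. The timings are copied from Table 6.

```python
RESULTS = [
    # (min support %, min confidence %, rules, CT-Pro seconds, Hash-Based seconds)
    (15, 50, 2, 0.06, 0.41),
    (10, 40, 8, 0.07, 0.43),
    (7, 30, 13, 0.11, 0.48),
    (5, 20, 20, 0.16, 0.58),
    (3, 15, 22, 0.25, 0.73),
]

# How many times longer Hash-Based ran than CT-Pro in each test.
ratios = [round(hash_t / ct_t, 2) for *_, ct_t, hash_t in RESULTS]
for (supp, conf, rules, ct_t, hash_t), ratio in zip(RESULTS, ratios):
    print(f"minsupp={supp}% minconf={conf}% rules={rules} Hash-Based/CT-Pro={ratio}")
```

On these figures CT-Pro is faster in every test, and the ratio narrows as the thresholds are lowered and more rules are generated.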

Execution time comparison chart:


Figure 2: Execution Time Comparison Result.

The test results show that the smaller the given minsupport and minconfidence values, the longer the execution time (since more association rules are formed). Conversely, the higher the given minsupport and minconfidence values, the shorter the execution time (since fewer association rules are formed). In this study, the CT-Pro algorithm performed well. This can be attributed to the CFP-Tree data structure, in which the number of nodes built is very limited, so execution is faster. The Hash-Based algorithm, meanwhile, selects data while generating C1 (Candidate 1), L1 (Large 1) and so on, using the hashing formula. In the hashing calculation, each itemset must have a different address. If two itemsets share an address (a collision), rehashing is done with a larger number of addresses, namely 2 times the previous number plus 1. In the dataset above there were several collisions, so addresses had to be added. This causes the Hash-Based process to take longer to execute.

4 CONCLUSION

From the comparison tests between the CT-Pro algorithm and the Hash-Based algorithm, it can be concluded that the CT-Pro algorithm has a shorter processing time than the Hash-Based algorithm. With a minimum support of 3% and a minimum confidence of 15%, CT-Pro produced 22 rules with an execution time of 0.25 seconds, faster than the Hash-Based algorithm, which generated 22 rules with an execution time of 0.73 seconds. This difference arises from collisions, which increase the number of addresses in the hashing process. The new crime pattern with the highest support and confidence was that an act of sexual harassment is accompanied by physical torture, with a confidence of 59%, a support count of 34 and a lift ratio of 1.29.

REFERENCES

Aguru, S. and Rao, B.M., 2014. A Hash Based Frequent Itemset Mining using Rehashing, International Journal on Recent and Innovation Trends in Computing and Communication, Volume: 2 Issue: 12.

Ali, Y., Farooq, A., Alam, T.M., Farooq, M.S., Awan, M.J. and Baig, T.I., 2019. Detection of Schistosomiasis Factors using Association Rule Mining, IEEE Access, 2019.2956020:1-2019.2956020:8.

Atmaja, E.H.S., Simaremare, R. and Rosa, P.H.P., 2019. Aplication of CT-Pro Algorithm For Crime Analysis, Conference SENATIK STT Adisutjipto Yogyakarta. pp. 435-444.

Dhivya, A.B. and Kalpana, B., 2010. A study on the Performance of CT-APRIORI and CT-PRO Algorithms using Compressed Structures for Pattern Mining, Journal of Global Research in Computer Science, 1(2), pp. 8-15.

Ghazanfari, B., Afghah, F. and Taylor, M.E., 2020. Sequential Association Rule Mining for Autonomously Extracting Hierarchical Task Structures in Reinforcement Learning, IEEE Access. 2020:2965930:1-2020:2965930:18.

Gupta, B. and Garg, D., 2011. FP-Tree Based Algorithms Analysis: FP-Growth, COFI-Tree and CT-PRO, International Journal on Computer Science and Engineering (IJCSE). 3(7), pp. 2691-2699.

Han, Q., Lu, D., Zhang, K., Song, H. and Zhang, H., 2019. Secure Mining Of Association Rules In Distributed Datasets, IEEE Access. 2019:2948033:1-2019:2948033:10.

Hossain, M., Sattar, A.H.M. and Paul, M.K., 2019. Market Basket Analysis Using Apriori and FP Growth Algorithm, International Conference on Computer and Information Technology (ICCIT).

Islamiyah, Ginting, P.L., Dengen, N. and Taruk, M., 2019. Comparison of Apriori and FP-Growth Algorithms in Determining Association Rule, International Conference on Electrical, Electronics and Information Engineering (ICEEIE 2019). pp. 320-323.

Law of the Republic of Indonesia No. 23. 2002. Concerning Child Protection. State institutions Republic of Indonesia. 2002; 109: 1-14.

Li, A., Liu, L., Ullah, A., Wang, R., Ma, J., Huang, R., Yu, H. and Ning, H., 2019. Association Rule-Based Breast Cancer Prevention and Control System, IEEE Transactions on Computational Social Systems. pp. 1106-1114.

Muhajir, M., Kusumawati, A. and Mulyadi, S., 2020. Apriori Algorithm for Frequent Pattern Mining for Public Libraries in United States, Proceedings of the International Conference on Mathematics and Islam (ICMIs 2018). pp. 60-64.

Naresh, P. and Suguna, R., 2019. Association Rule Mining Algorithms on Large and Small Datasets: A Comparative Study, Proceedings of the International Conference on Intelligent Computing and Control Systems (ICICCS 2019). pp. 587-592.

Nomura, K., Shiraishi, Y., Mohri, M. and Morii, M., 2020. Secure Association Rule Mining on Vertically Partitioned Data Using Private-Set Intersection, IEEE Access. 2020:3014330:1-2020:3014330:10.

Rao, S. and Gupta, P., 2012. Implementing Improved Algorithm Over Apriori Data Mining Association Rule Algorithm, IJCST. Vol. 3. pp. 489-493. Jan-Mar 2012.

Ren, F., Pei, Z. and Wu, K., 2019. Selection of Satisfied Association Rules via Aggregation of Linguistic Satisfied Degrees, IEEE Access. 2019:2926735:1-2019:2926735:17.

Segatori, A., Bechini, A., Ducange, P. and Marcelloni, F., 2018. A Distributed Fuzzy Associative Classifier for Big Data, IEEE Transactions on Cybernetics. pp. 2656-2669.

Shaban, A., Almasalha, F. and Qutqut, M.H., 2018. Hybrid user action prediction system for automated home using association rules and ontology, IET Wireless Sensor Systems. Vol. 9 Iss. 2. pp. 85-93.

Si, H., Zhou, J., Chen, Z., Wan, J., Xiong, N.N. and Zhang, W., 2019. Association Rules Mining among Interests and Applications for Users on Social Networks, IEEE Access. 2019:2925819:1-2019:2925819:13.

Siswanto, B. and Thariqa, P., 2018. Association Rules Mining for Identifying Popular Ingredients on YouTube Cooking Recipes Videos, INAPR International Conference. pp. 95-98.

Sitnikov, D., Titova, O., Minukhin, S., Kovalenko, A. and Titov, S., 2018. Informativity of Association Rules from the Viewpoint of Information Theory, International Scientific-Practical Conference. pp. 595-598.

Zahrotun, L., Soyusiawaty, D. and Pattihua, R.S., 2018. The Implementation of Data Mining for Association Patterns Determination Using Temporal Association Methods in Medicine Data, International Seminar on Research of Information Technology and Intelligent Systems (ISRITI). pp. 668-673.
