union does not exist, then their union is encoded 
and inserted into the index. The compressor 
outputs the encoded value of the DocId that 
already exists in the index, and then checks 
whether the next DocId is in the index. If it is 
not, it is inserted into the index. If it is, we 
proceed with the next element in the list. 
o Sub Case 2: The union of the current DocId 
with the next DocId in the list is already stored 
in the index. In this sub case the algorithm 
iteratively checks whether the union from the 
previous step, extended with the next DocId in 
the list, exists in the index. It continues until 
the list ends or the extended union is not 
stored in the index. If we reach the end of the 
list, the compressor just outputs the encoded 
value of the union, which is already stored in 
the index. If the extended union does not 
exist, Sub Case 1 is executed. 
So, for each term, we build a list that contains its 
document identifiers and check whether their unions 
exist in the index. 
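The compression steps can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the names compress_lists, index and example_lists are hypothetical, patterns are stored as tuples of DocIds, and the handling of a DocId that is not yet in the index (insert it, then write it out unchanged) is an assumption inferred from the worked example in Section 4.4.

```python
def compress_lists(lists, bound):
    """Sketch of the Modified LZW compressor described above.

    lists: one list of DocIds per term.
    bound: largest DocId; codes are assigned from bound + 1 upward.
    Returns (compressed lists, index mapping pattern tuple -> code).
    """
    index = {}
    next_code = bound + 1

    def add(pattern):
        # Assign the next free code to a new pattern.
        nonlocal next_code
        index[pattern] = next_code
        next_code += 1

    compressed = []
    for docids in lists:
        out = []
        i, n = 0, len(docids)
        while i < n:
            current = (docids[i],)
            if current not in index:
                # Assumed Case 1: unseen DocId, index it and emit it as-is.
                add(current)
                out.append(docids[i])
                i += 1
                continue
            # Case 2: extend the union while it is still in the index
            # (Sub Case 2), stopping at the end of the list or at the
            # first union that is not indexed (Sub Case 1).
            pattern = current
            j = i + 1
            while j < n and pattern + (docids[j],) in index:
                pattern += (docids[j],)
                j += 1
            out.append(index[pattern])
            if j < n:
                # Sub Case 1: index the new union, then handle the
                # next DocId on its own.
                add(pattern + (docids[j],))
                nxt = (docids[j],)
                if nxt in index:
                    out.append(index[nxt])
                else:
                    add(nxt)
                    out.append(docids[j])
                j += 1
            i = j
        compressed.append(out)
    return compressed, index


# Input lists T1..T5 from the worked example in Section 4.4.
example_lists = [
    [1, 2, 3, 4, 5, 9, 10],
    [1, 2, 3, 4, 5, 9, 10, 14, 17],
    [1, 2, 3, 4, 5, 9, 10, 17],
    [1, 2, 3, 4, 5, 6, 7, 8, 21, 23],
    [1, 2, 3, 4, 5, 6, 7, 8, 21, 23, 29],
]
coded, index = compress_lists(example_lists, bound=29)
```

With bound 29, this sketch yields the compressed lists and index entries listed in Section 4.4, which is the only check made here.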
4.3  Decompression with Modified LZW 
Decompression works the same way as 
compression, by building the index. The encoded 
values begin right after the maximum value of the 
re-enumerate method. The modified LZW decompressor 
creates a list for every term, storing the DocIds or 
the encoded values of patterns. For each element 
in the list it checks whether the element is in the 
index. Again there are two cases: 
Case 1: The element does not exist in the 
index and its value is not larger than the bound that 
separates DocIds from encoded values. The 
decompressor therefore processes the element as a 
DocId. It encodes the element and stores it in the 
index. After the insertion the decompressor outputs 
the current list element and continues with the next 
element in the list. 
Case 2: The element exists in the index and 
its value is larger than the bound. In this 
case the decompressor knows that the element is 
the encoded value of a DocId or of a union of DocIds. 
The decompressor gets the DocId or the union of 
DocIds from the index and outputs it to the file. But 
the algorithm does not stop here. The decompressor 
knows that the compressor emitted this encoded 
value because its union with the next element of the 
list did not exist in the index. So the decoded 
value is united with the next element in the list, 
and the union is encoded and stored in the index. 
After that, the decompressor continues with the next 
element in the list. 
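The two decompression cases can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names decompress_lists, patterns and example_coded are hypothetical, and the code assumes that the element following an encoded pattern is always a single DocId, either raw or encoded, as in the worked example in Section 4.4.

```python
def decompress_lists(coded_lists, bound):
    """Sketch of the Modified LZW decompressor described above.

    coded_lists: one compressed list per term.
    bound: largest DocId; values above it are encoded patterns.
    Returns the restored lists of DocIds.
    """
    patterns = {}          # code -> tuple of DocIds
    next_code = bound + 1

    def add(pattern):
        # Rebuild the index in the same order as the compressor.
        nonlocal next_code
        patterns[next_code] = pattern
        next_code += 1

    restored = []
    for coded in coded_lists:
        out = []
        i, n = 0, len(coded)
        while i < n:
            value = coded[i]
            i += 1
            if value <= bound:
                # Case 1: a raw DocId seen for the first time.
                add((value,))
                out.append(value)
                continue
            # Case 2: an encoded DocId or union of DocIds.
            pattern = patterns[value]
            out.extend(pattern)
            if i < n:
                # The compressor stopped here because the union with
                # the next DocId was missing, so index that union now.
                nxt = coded[i]
                i += 1
                if nxt <= bound:
                    docid = nxt
                    add(pattern + (docid,))
                    add((docid,))  # the DocId itself was new as well
                else:
                    docid = patterns[nxt][0]  # assumed single DocId
                    add(pattern + (docid,))
                out.append(docid)
        restored.append(out)
    return restored


# Compressed lists T1..T5 from the worked example in Section 4.4.
example_coded = [
    [1, 2, 3, 4, 5, 9, 10],
    [30, 31, 32, 33, 34, 35, 36, 14, 17],
    [37, 32, 33, 34, 35, 36, 42],
    [43, 33, 34, 6, 7, 8, 21, 23],
    [46, 34, 48, 49, 50, 51, 52, 29],
]
restored = decompress_lists(example_coded, bound=29)
```

Because the decompressor assigns codes in the same order as the compressor, no index has to be stored with the compressed file. Only the bound must be known to both sides.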
4.4. Index Creation 
As described in Section 4.1, the pattern 
matching method we applied is based on building an 
index. We scan the list of document identifiers of 
each term, and for each element we check whether it 
exists in the index. We then either encode it or 
search for DocId unions that are not yet encoded.  
The following example shows exactly how 
the compression and decompression algorithms 
work. Assume we have five terms T1, T2, T3, T4, 
and T5, whose lists contain the following DocIds: 
T1: < 1, 2, 3, 4, 5, 9, 10 >  
T2: < 1, 2, 3, 4, 5, 9, 10, 14, 17 > 
T3: < 1, 2, 3, 4, 5, 9, 10, 17 > 
T4: < 1, 2, 3, 4, 5, 6, 7, 8, 21, 23 > 
T5: < 1, 2, 3, 4, 5, 6, 7, 8, 21, 23, 29 > 
The bound is 29, so the encoded values 
begin at 30. Running the Modified LZW gives: 
T1: < 1, 2, 3, 4, 5, 9, 10 >  
T2: < 30, 31, 32, 33, 34, 35, 36, 14, 17 > 
T3: < 37, 32, 33, 34, 35, 36, 42 > 
T4: < 43, 33, 34, 6, 7, 8, 21, 23 > 
T5: < 46, 34, 48, 49, 50, 51, 52, 29 > 
The encoded values of DocIds and unions: 
First list 
'1': 30, '2': 31, '3': 32, '4': 33, '5': 34, '9': 35, '10': 36 
Second list 
'1 2': 37, '3 4': 38, '5 9': 39, '10 14': 40, '14': 41, '17': 42 
Third list 
'1 2 3': 43, '4 5': 44, '9 10': 45 
Fourth list 
'1 2 3 4': 46, '5 6': 47, '6': 48, '7': 49, '8': 50, '21': 51, '23': 52 
Fifth list 
'1 2 3 4 5': 53, '6 7': 54, '8 21': 55, '23 29': 56, '29': 57 
In this case the data do not appear to be much 
compressed, because the input is small. With an 
input of gigabytes of DocIds we would expect to see 
a larger difference. 
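To put a number on this small example, the entries of the original and compressed lists can simply be counted. The sketch below uses the lists given above; the variable names are hypothetical, and counting entries ignores how many bits each entry needs, so it is only a rough proxy for compressed size.

```python
# Original and compressed lists T1..T5, copied from the example above.
original = [
    [1, 2, 3, 4, 5, 9, 10],
    [1, 2, 3, 4, 5, 9, 10, 14, 17],
    [1, 2, 3, 4, 5, 9, 10, 17],
    [1, 2, 3, 4, 5, 6, 7, 8, 21, 23],
    [1, 2, 3, 4, 5, 6, 7, 8, 21, 23, 29],
]
compressed = [
    [1, 2, 3, 4, 5, 9, 10],
    [30, 31, 32, 33, 34, 35, 36, 14, 17],
    [37, 32, 33, 34, 35, 36, 42],
    [43, 33, 34, 6, 7, 8, 21, 23],
    [46, 34, 48, 49, 50, 51, 52, 29],
]

# Count list entries only; bit widths are deliberately ignored.
original_entries = sum(len(t) for t in original)      # 45
compressed_entries = sum(len(t) for t in compressed)  # 39
saved = original_entries - compressed_entries         # 6

print(original_entries, compressed_entries, saved)
```

The saving comes entirely from T3, T4 and T5, where unions indexed while processing earlier lists replace several DocIds with a single encoded value.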
Decompression takes the compressed inverted file 
as input and, using the same logic (reading the 
DocIds and building the index), restores the 
original inverted file.