REFERENCES
Adar, E., Teevan, J., Dumais, S. T., and Elsas, J. L. (2009). The web changes everything: understanding the dynamics of web content. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 282–291.
Audeh, B., Beigbeder, M., Zimmermann, A., Jaillon, P., and Bousquet, C. (2017). Vigi4med Scraper: a framework for web forum structured data extraction and semantic representation. PLoS ONE, 12(1):e0169658.
Bonifacio, C., Barchyn, T. E., Hugenholtz, C. H., and Kienzle, S. W. (2015). CCDST: A free Canadian climate data scraping tool. Computers & Geosciences, 75:13–16.
Dongo, I., Cardinale, Y., Aguilera, A., Martínez, F., Quintero, Y., and Barrios, S. (2020). Web scraping versus Twitter API: A comparison for a credibility analysis. In Proceedings of the 22nd International Conference on Information Integration and Web-Based Applications & Services, iiWAS ’20, pages 263–273, New York, NY, USA. Association for Computing Machinery.
Fetterly, D., Manasse, M., Najork, M., and Wiener, J. L. (2004). A large-scale study of the evolution of web pages. Software: Practice and Experience, 34(2):213–237.
Glez-Peña, D., Lourenço, A., López-Fernández, H., Reboiro-Jato, M., and Fdez-Riverola, F. (2014). Web scraping technologies in an API world. Briefings in Bioinformatics, 15(5):788–797.
Goh, A., Koh, Y.-K., and Domazet, D. S. (2001). ECA rule-based support for workflows. Artificial Intelligence in Engineering, 15(1):37–46.
Kappel, G., Rausch-Schott, S., and Retschitzegger, W. (2000). A framework for workflow management systems based on objects, rules and roles. ACM Computing Surveys (CSUR), 32(1es):27–es.
Khder, M. A. (2021). Web scraping or web crawling: State of art, techniques, approaches and application. International Journal of Advances in Soft Computing & Its Applications, 13(3).
Kunang, Y. N., Purnamasari, S. D., et al. (2018). Web scraping techniques to collect weather data in South Sumatera. In 2018 International Conference on Electrical Engineering and Computer Science (ICECOS), pages 385–390. IEEE.
Landers, R. N., Brusso, R. C., Cavanaugh, K. J., and Collmus, A. B. (2016). A primer on theory-driven web scraping: Automatic extraction of big data from the internet for use in psychological research. Psychological Methods, 21(4):475.
Leotta, M., Clerissi, D., Ricca, F., and Spadaro, C. (2013). Improving test suites maintainability with the page object pattern: An industrial case study. In 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation Workshops, pages 108–113. IEEE.
Lin, W., Qian, Z., Xu, J., Yang, S., Zhou, J., and Zhou, L. (2016). StreamScope: continuous reliable distributed processing of big data streams. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), pages 439–453.
Liu, B. and Menczer, F. (2011). Web Crawling, pages 311–362. Springer Berlin Heidelberg, Berlin, Heidelberg.
López, P. G., Arjona, A., Sampé, J., Slominski, A., and Villard, L. (2020). Triggerflow: trigger-based orchestration of serverless workflows. In Proceedings of the 14th ACM International Conference on Distributed and Event-based Systems, pages 3–14.
Meschenmoser, P., Meuschke, N., Hotz, M., and Gipp, B. (2016). Scraping scientific web repositories: Challenges and solutions for automated content extraction. D-Lib Magazine, 22(9/10):15.
Mitchell, R. (2018). Web scraping with Python: Collecting more data from the modern web. O’Reilly Media, Inc.
Molina, P. J., Meliá, S., and Pastor, O. (2002). User interface conceptual patterns. In International Workshop on Design, Specification, and Verification of Interactive Systems, pages 159–172. Springer.
Munappy, A. R., Bosch, J., and Olsson, H. H. (2020). Data pipeline management in practice: Challenges and opportunities. In International Conference on Product-Focused Software Process Improvement, pages 168–184. Springer.
Noghabi, S. A., Paramasivam, K., Pan, Y., Ramesh, N., Bringhurst, J., Gupta, I., and Campbell, R. H. (2017). Samza: stateful scalable stream processing at LinkedIn. Proceedings of the VLDB Endowment, 10(12):1634–1645.
Ntoulas, A., Cho, J., and Olston, C. (2004). What’s new on the web? The evolution of the web from a search engine perspective. In Proceedings of the 13th International Conference on World Wide Web, pages 1–12.
Pervaiz, F., Vashistha, A., and Anderson, R. (2019). Examining the challenges in development data pipeline. In Proceedings of the 2nd ACM SIGCAS Conference on Computing and Sustainable Societies, pages 13–21.
Saurkar, A. V., Pathare, K. G., and Gode, S. A. (2018). An overview on web scraping techniques and tools. International Journal on Future Revolution in Computer Science & Communication Engineering, 4(4):363–367.
Semeniuta, O. and Falkman, P. (2019). EPypes: a framework for building event-driven data processing pipelines. PeerJ Computer Science, 5:e176.
Simitsis, A., Wilkinson, K., Dayal, U., and Castellanos, M. (2010). Optimizing ETL workflows for fault-tolerance. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pages 385–396. IEEE.
ICSOFT 2022 - 17th International Conference on Software Technologies