
20 years was $2.7 million but 78% of incidents cost
under $10,000. Although some high-profile breaches
inflate the global average, the majority of data breaches
did not have a significant impact on the affected
organisation. While the IT industry is the most affected,
breach incidents are growing in sectors that handle highly
sensitive data, such as healthcare and finance. Email is the
most common type of data breached, while hacking is the
leading cause of data breaches. North America is the
most affected region. SSH, RDP, FTP, Intruder, and
Metasploit are the top five tools used in cyber attacks.
For future research, we will combine the extracted data
for deeper analysis, for example by studying cost trends
in data breaches by region and by cause, and by applying
statistical validation techniques such as chi-square tests
and correlation analysis. We can also expand the dataset
by incorporating multilingual sources to enhance regional
coverage. Finally, we plan to manually investigate data
breaches, summarise the responses of affected companies,
and provide solutions that business owners can use.
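As an illustration of the statistical validation mentioned above, the following is a minimal sketch of a chi-square test of independence and a Pearson correlation, written in plain Python. The function names, the contingency table, and the numeric series are hypothetical placeholders chosen for the example; they are not values from our dataset, and this is not the analysis pipeline used in this study.

```python
from math import sqrt


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical counts: rows are two regions, columns are two breach causes.
counts = [[30, 10], [20, 40]]
stat, dof = chi_square_statistic(counts)

# 3.841 is the chi-square critical value for 1 degree of freedom at the 5% level.
region_and_cause_related = dof == 1 and stat > 3.841

# Hypothetical paired series, e.g. records exposed vs. reported cost.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

In practice, a statistics library would also report the p-value; the sketch only compares the statistic against a fixed critical value for a 2 x 2 table.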
DATA 2025 - 14th International Conference on Data Science, Technology and Applications