IDIOLECT-BASED IDENTITY DISCLOSURE AND AUTHORSHIP ATTRIBUTION IN WEB-BASED SOCIAL SPACES

Natalie Ardet

Abstract

In this paper, we examine possible new methods of Web surveillance that combine web mining with sociolinguistic and semiotic knowledge of human discourse. We first give an overview of telecommunication surveillance methods and systems, with a focus on the Internet, and describe the legal issues involved in investigating Web and Internet communications. We place particular emphasis on identity disclosure and on the undermining of anonymity and pseudonymity in open web spaces. Next, we give an overview of new trends in Internet-mediated communication and examine the virtual social networks they create. Finally, we present the results of a new method that uses the semiotic features of web documents for authorship attribution and identity disclosure.



Paper Citation


in Harvard Style

Ardet N. (2005). IDIOLECT-BASED IDENTITY DISCLOSURE AND AUTHORSHIP ATTRIBUTION IN WEB-BASED SOCIAL SPACES. In Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 972-8865-20-1, pages 305-310. DOI: 10.5220/0001234803050310


in Bibtex Style

@conference{webist05,
author={Natalie Ardet},
title={IDIOLECT-BASED IDENTITY DISCLOSURE AND AUTHORSHIP ATTRIBUTION IN WEB-BASED SOCIAL SPACES},
booktitle={Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2005},
pages={305-310},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001234803050310},
isbn={972-8865-20-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - IDIOLECT-BASED IDENTITY DISCLOSURE AND AUTHORSHIP ATTRIBUTION IN WEB-BASED SOCIAL SPACES
SN - 972-8865-20-1
AU - Ardet N.
PY - 2005
SP - 305
EP - 310
DO - 10.5220/0001234803050310