Host Institution:
Virginia Commonwealth University (VCU)
Department of Computer Science,
College of Engineering,
Richmond, VA - 23284
Contact Us:
Please complete this Google Form: https://forms.gle/QDSxLw5NHKr9P9Wt6
Application Deadline: May 09, 2022
Workshop Date: 10 am - 12 pm on May 11, 2023
Room E4221, Computer Lab on the fourth floor of Engineering Building East
Introduction
Digital forensics investigations often involve the analysis of text, such as text messages, emails, and forum posts. Text analysis in digital forensics aims to uncover valuable information and previously undetected patterns in large volumes of digital text in order to assist investigations. This type of analysis can help investigators identify pertinent evidence, trace suspects, and build a case. It can also help uncover cyber threats and fraud by examining the emails, social media posts, and other digital communications involved in cyber-attacks and financial crimes.
Modern Natural Language Processing (NLP) techniques can make forensic text analysis considerably more efficient. For instance, NLP pre-processing techniques such as tokenization, stemming, and named entity recognition (NER) help extract relevant information from unstructured digital evidence. NLP analysis techniques such as clustering, text summarization, and categorization help identify patterns and relationships in text data that might otherwise be difficult to detect. Additionally, text visualization techniques such as word clouds, network visualizations, and topic modeling create meaningful visual representations of the text data, which make those patterns and relationships easier to see and the analysis easier to interpret.
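To make the pre-processing steps above concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the workshop's actual material: the sample sentence is invented, `naive_stem` is a crude suffix stripper standing in for a real stemmer (such as a Porter stemmer), and `extract_emails` uses a regular expression as a simplified stand-in for a trained NER model.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def naive_stem(token):
    """Strip a few common English suffixes.

    Assumption: this is a crude stand-in for a real stemming algorithm
    and will mis-stem many words.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Simplified pattern for e-mail addresses (a stand-in for entity recognition).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(text):
    """Return every e-mail address found in the text."""
    return EMAIL_RE.findall(text)

# Hypothetical evidence snippet, invented for illustration.
sample = ("Alice emailed bob@example.com about the transfers. "
          "Bob requested more transfers.")

tokens = tokenize(sample)
stems = [naive_stem(t) for t in tokens]
stem_counts = Counter(stems)      # term frequencies after stemming
addresses = extract_emails(sample)

print(stem_counts.most_common(3))
print(addresses)
```

In practice, a dedicated NLP library would usually replace each of these hand-written helpers; the sketch only shows how the steps fit together.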
This workshop on using NLP in forensic text analysis aims to improve participants' ability to extract valuable information from large amounts of digital forensics text data, which can be critical for investigations and decision-making.
Workshop Module Details
The workshop will start with an introductory session on Digital Forensics and NLP techniques (i.e., an NLP primer), followed by two scenarios in which a digital forensics investigation is augmented with NLP techniques. Each scenario will involve a set of forensic questions that will be answered using NLP techniques, along with interactive exercises to engage the audience.
Module 1: Digital Forensics Primer
The following topics will be covered in this module:
Module 2: NLP Primer
The audience will be introduced to the following topics:
Module 3: Enron Corpus Fraud Investigation
The Enron email dataset contains approximately 500,000 emails, obtained by the Federal Energy Regulatory Commission during its investigation of Enron's collapse. In this scenario, we will explore:
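As a rough illustration of the kind of processing an email corpus invites, the sketch below parses a raw message with Python's standard `email` package and counts sender-to-recipient pairs, which is a common first step toward a communication network. The message text, the addresses, and the helper name `sender_recipient_pairs` are all hypothetical and are not taken from the Enron corpus or from the workshop material.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical raw message in RFC 822 style, invented for illustration.
RAW = """\
Message-ID: <1.example@hypothetical>
Date: Mon, 14 May 2001 16:39:00 -0700
From: alice@example.com
To: bob@example.com, carol@example.com
Subject: Q2 forecast

Please review the attached numbers.
"""

def sender_recipient_pairs(raw_message):
    """Return (sender, recipient) address pairs for one raw message."""
    msg = message_from_string(raw_message)
    senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    recipient_fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _, addr in getaddresses(recipient_fields)]
    return [(s, r) for s in senders for r in recipients]

# Count how often each sender writes to each recipient.
edge_counts = Counter()
for raw in [RAW]:  # in a real analysis, iterate over every message file
    edge_counts.update(sender_recipient_pairs(raw))

for (sender, recipient), n in edge_counts.items():
    print(f"{sender} -> {recipient}: {n}")
```

The resulting counts could feed a network visualization, with addresses as nodes and message counts as edge weights.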
Module 4: Discord Chat Cyberbullying
Consider a scenario in which the digital investigation team is assigned to analyze a large dataset collected from a Discord chat for cyberbullying or other malicious activities. The following forensic questions will be addressed:
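One simple way to start triaging chat logs for abusive content is lexicon matching, sketched below. This is only a minimal illustration: the term list, the message records, and the field names `author` and `text` are assumptions made for the example and do not reflect the actual Discord dataset or the method used in the workshop. A real investigation would likely combine this with a trained classifier and manual review.

```python
from collections import Counter

# Hypothetical lexicon of abusive terms (an assumption for illustration).
ABUSIVE_TERMS = {"loser", "idiot", "stupid"}

# Hypothetical chat records; the field names are assumed, not from the dataset.
messages = [
    {"author": "user_a", "text": "You are such a loser."},
    {"author": "user_b", "text": "Anyone up for a game tonight?"},
    {"author": "user_a", "text": "Nobody likes you, idiot!"},
]

def flag_message(text, lexicon=ABUSIVE_TERMS):
    """Return the sorted lexicon terms that occur in the message."""
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    return sorted(words & lexicon)

# Count flagged messages per author to see who warrants a closer look.
flags_by_author = Counter()
for m in messages:
    if flag_message(m["text"]):
        flags_by_author[m["author"]] += 1

print(flags_by_author.most_common())
```

Keyword matching misses sarcasm, misspellings, and context, so flagged counts are best treated as leads for review rather than as findings.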