Evaluating and Extending Techniques for Fine-Grained Text-Topic Prediction for Digital Forensic Data
Abstract
In digital forensics, the search of personal devices by the police is constrained by judicial warrants and must adhere to constitutional privacy rights as prescribed by the Fourth Amendment. Determining which documents align closely with the topics specified in a warrant becomes challenging when a judge attempts to limit the search’s scope based on the content of extracted data. This paper focuses on identifying, evaluating, and extending topic classification techniques for this end application, which has not yet received attention from computer and data scientists. After analyzing the requirements of this domain and considering applicable techniques, we focus on a class of techniques known as zero-shot classifiers. To improve their effectiveness, we propose a method that clusters both the candidate topics and the documents. A detailed comparison on two datasets shows the effectiveness of combining clustering with zero-shot classifiers. This combined method outperforms supervised methods that require training data for specific topic inference.
Repository Citation
Thomas E. Kadri, Khan Mohammad Al Farabi, Gagan Agrawal, Gokila Dorai, Rajon Bardhan, and Hoda Maleki, Evaluating and Extending Techniques for Fine-Grained Text-Topic Prediction for Digital Forensic Data (2025).
Available at: https://digitalcommons.law.uga.edu/fac_artchop/1767
Previously posted to Springer Nature.