Publications & Patents
Peer-reviewed papers, workshops, and patents across NLP, speech, and information retrieval.
2024
Systems and Methods for Advanced Duplicate Image Search and Analysis
US Patent Issued · App. 18/652,500 · Publication No. US20240411724A1
Issued patent for a system that identifies duplicate documents using vector embeddings and similarity hashing, providing scalable, high-accuracy deduplication for enterprise-scale document repositories. At least three additional AI/ML patent filings are pending.
2022
BibleTTS & LiSTra: African Speech Corpora
Interspeech 2022 · NeurIPS 2022 Black in AI Workshop
Co-authored "BibleTTS", a high-fidelity multilingual speech corpus. Developed LiSTra, the first English-to-Lingala speech translation dataset, with baselines using both a traditional cascaded ASR+MT pipeline and a transformer-based end-to-end architecture.
2021
Automated Mining of Leaderboards for Empirical AI Research
ICADL 2021 · International Journal on Digital Libraries
Presents a comprehensive approach to generating leaderboards for knowledge-graph-based organization of scholarly information. Investigates automated leaderboard construction using BERT, SciBERT, and XLNet, achieving F1 > 90% and setting a new state of the art for leaderboard extraction.
LiSTra Automatic Speech Translation: English to Lingala Case Study
NeurIPS 2021 · Black in AI Workshop (Spotlight)
Presents the Lingala Speech Translation (LiSTra) dataset and releases a full pipeline for constructing such datasets in other low-resource languages. Reports baselines using both a cascaded ASR→MT pipeline and a transformer-based end-to-end architecture with customized interactive attention.
2020
Participatory Research for Low-Resourced Machine Translation
EMNLP Findings 2020 · AfricaNLP Workshop ICLR 2020
Contributor to the Masakhane NLP initiative. Describes a participatory methodology for building an African NLP research community and reports progress in addressing the scarcity of resources for African languages, offering a model for African-language NLP research.
An Empirical Investigation into the Properties of Standard Word Embeddings
MSc Thesis · University of the Western Cape / AIMS South Africa
Reviews mechanisms for computing word embeddings, investigates popular toolkits and embedding matrices, and experiments with selected implementations to better understand their characteristics and properties.