Publications & Patents

Peer-reviewed papers, workshops, and patents across NLP, speech, and information retrieval.

2024

Patent

Systems and Methods for Advanced Duplicate Image Search and Analysis

US Patent Issued · App. 18/652,500 · Publication No. US20240411724A1

Issued patent for a system that identifies duplicate documents using vector embeddings and similarity hashing, providing scalable, high-accuracy deduplication for enterprise-scale document repositories. Three or more additional AI/ML patent filings are pending.

Computer Vision · Vector Embeddings · Similarity Hashing · AI/ML

2022

Conference / Workshop

Bibletts & LiSTra: African Speech Corpora

Interspeech 2022 · NeurIPS 2022 Black in AI Workshop

Co-authored "Bibletts", a high-fidelity multilingual speech corpus. Developed LiSTra, the first English-to-Lingala speech translation dataset, with baselines using both a traditional cascade (ASR followed by MT) and a transformer-based end-to-end architecture.

ASR · Speech Translation · Lingala · Low-Resource NLP

2021

Journal / Conference · 🏆 ICADL 2021 Best Paper Award · 30+ citations

Automated Mining of Leaderboards for Empirical AI Research

ICADL 2021 · International Journal on Digital Libraries

Presents a comprehensive approach for generating leaderboards for knowledge-graph-based scholarly information organization. Investigates automated leaderboard construction using BERT, SciBERT, and XLNet, achieving F1 above 90% and setting a new state of the art for leaderboard extraction.

Knowledge Graphs · Information Extraction · NLP · Scholarly IE

Conference / Workshop

LiSTra Automatic Speech Translation: English to Lingala Case Study

NeurIPS 2021 · Black in AI Workshop (Spotlight)

Presents the Lingala Speech Translation (LiSTra) dataset and releases a full pipeline for constructing such datasets in other low-resource languages. Reports baselines using both a cascade ASR→MT pipeline and a transformer-based end-to-end architecture with customized interactive attention.

ASR · Machine Translation · Lingala · Transformers

2020

Conference / Workshop · 280+ citations

Participatory Research for Low-Resourced Machine Translation

Findings of EMNLP 2020 · AfricaNLP Workshop at ICLR 2020

Contributor to the Masakhane NLP initiative. Discusses a methodology for building an African NLP research community and outlines its success in addressing the lack of resources for African languages. Has become a frequently cited reference for African-language NLP.

Machine Translation · African Languages · Low-Resource NLP · Community

Thesis

An Empirical Investigation into the Properties of Standard Word Embeddings

MSc Thesis · University of the Western Cape / AIMS South Africa

Reviews mechanisms for computing word embeddings, investigates popular toolkits and embedding matrices, and experiments with selected implementations to better understand their characteristics and properties.

Word Embeddings · NLP · Deep Learning