Salomon Kabongo

PhD in NLP · Lead Software Engineer · Masakhane Board Member

Experience

Lead Software Engineer

State Farm — Innovation Group

Feb 2022 – Present

Bloomington-Normal, IL

4 Patent Filings
  • Designed automated pre-labeling pipelines using embedding-based retrieval to accelerate data annotation.
  • Architected a proprietary document deduplication system that combines visual similarity and hashing algorithms to identify near-duplicates, streamlining business workflows.
  • Led R&D initiatives on Synthetic Media (Deepfake) detection and Video Understanding; benchmarked Visual Language Models (VLMs) against vendor solutions.
  • Invented novel AI/ML computer vision applications for the insurance domain, resulting in 1 issued patent and 3 additional pending filings.

Board Member

Masakhane Research Foundation

2021 – May 2026

Global

~$9M Research Funding
  • Spearheaded the strategic formation of the Masakhane AI Hub, defining the 2025–2029 roadmap to build digital public infrastructure for 1 billion+ African language speakers.
  • Secured and oversaw the execution of ~$9M in research funding (including $5M from the Bill & Melinda Gates Foundation and $4M from IDRC) to democratize AI access.
  • Led high-level collaborations with strategic partners including Google.org, Lacuna Fund, and UNESCO, scaling the community's impact across 50+ African languages.

Research Assistant

L3S / Leibniz Information Center for Science & Technology (TIB)

Nov 2020 – Nov 2022

Hannover, Germany

  • Engineered the core "Leaderboards" feature for the Open Research Knowledge Graph (ORKG), utilizing Knowledge Graphs to automatically track and visualize state-of-the-art (SOTA) progress across scientific publications.
  • Collaborated with Hannover Medical School (MHH) on personalized medicine research, applying machine learning techniques to analyze large-scale genetic datasets for predictive healthcare outcomes.
  • Conducted research on Scholarly Information Extraction, developing novel NLP pipelines to extract metric data from unstructured text for knowledge graph construction.

Education

PhD in Computer Science — AI / Natural Language Processing (LLMs)

Leibniz Universität Hannover

Nov 2020 – Nov 2025

Hannover, Germany

Master's in Machine Intelligence

African Master's in Machine Intelligence (AMMI)

Sponsored by Google and Facebook through AIMS

Oct 2019 – Nov 2020

Accra, Ghana

Master's in Mathematical Sciences

University of the Western Cape

African Institute for Mathematical Sciences (AIMS South Africa)

Aug 2018 – Jun 2019

Cape Town, South Africa

BSc (Honours) in Mathematics & Computer Science

Université de Lubumbashi

Oct 2014 – Jul 2017

Lubumbashi, DRC

Selected Publications & Patents

Systems and Methods for Advanced Duplicate Image Search and Analysis

2024

US Patent

App. 18/652,500, No. US20240411724A1 · Assignee: State Farm

🏆 US Patent

BibleTTS & LiSTra: African Speech Corpora

2022

Interspeech, NeurIPS Workshops

High-fidelity multilingual speech corpus; first English-to-Lingala speech translation baseline

Automated Mining of Leaderboards for Empirical AI Research

2021

ICADL · International Journal on Digital Libraries

30+ citations. SOTA metric extraction from scientific text.

🏆 ICADL 2021 Best Paper Award

Participatory Research for Low-Resourced Machine Translation

2020

EMNLP Findings

280+ citations. Standard benchmark for African language NLP.

Technical Skills

Languages

Python · C/C++ · SQL · Bash

Deep Learning & AI

PyTorch · TensorFlow · Hugging Face Transformers · LLMs · RAG · LangChain · OpenCV

Cloud & MLOps

AWS (SageMaker, Lambda) · Docker · Kubernetes · Google Cloud Vertex AI · Linux · Git

Research Areas

NLP · Computer Vision · Knowledge Graphs · Speech Translation · Generative AI

Awards & Honors

2024

US Patent Issued (AI/ML) + 3 Pending

State Farm — Innovation Group

2021

ICADL Best Paper Award

International Conference on Asian Digital Libraries

2020

DLRL Summer School

CIFAR / Mila, Montreal

2020

Google Hash Code — Ranked 1747/10724

Google

2019

ACM Future of Computing Academy (FCA) Member

Association for Computing Machinery — 36 selected globally