
SAHARA AL-MADI
Founder, Linguistic Security Institute (LSI)
Computational Linguist
[🔗 linkedin.com/in/madisahara] | [🐙 github.com/bintdamiana]
PROFESSIONAL SUMMARY
I’m a computational linguist and founder of the Linguistic Security Institute (LSI). My work asks a simple question: What does technology owe the voices it learns from? I explore how linguistic nuance shapes both vulnerabilities and defenses in AI systems, and how these questions travel across domains, from security to data sovereignty to Earth observation.
I collaborate with open-source communities, including NAMAA (Arabic NLP), on peer-reviewed research (ACL 2026, forthcoming), consult for AI companies such as SoundHound AI, and speak at university symposia, security conferences, and international forums. Previously, I was Digital Strategy Lead at the National Museum of Language in Washington, DC. As a UCLA-trained linguist, I’m dedicated to building technology that is culturally coherent, resilient, and trustworthy.
CORE RESEARCH & INTELLECTUAL PROPERTY
Linguistic Security Institute (LSI) | Founder | 2025–Present
LSI helps organizations navigate the intersection of AI, language, and trust. We ask a simple question: What does technology owe the voices it learns from?
The answer shapes how companies audit models, govern data, and build systems that earn trust across markets and communities. Our frameworks for meaning transparency, data sovereignty, and cross-domain accountability give leaders practical tools to identify linguistic risk before it becomes reputational or regulatory harm.
- “Linguistic Security” Framework: An original contribution to AI governance, structured around three pillars: Meaning Transparency, Data Sovereignty, and Cross-Domain Accountability.
- Public Demos & Tools:
  - Mitote: Nahuatl-Aware Pronunciation Explorer: A RAG-powered TTS demo that addresses the mispronunciation of Indigenous words, demonstrating AI’s potential for cultural preservation, not extraction.
  - Linguistic Firewall: Polyglot Poisoning Probe: An open-source Python script exposing how linguistic nuances (Arabic, RTL scripts) can bypass LLM guardrails. Presented at BSides San Diego (2026).
SELECTED RESEARCH & PUBLICATIONS
| Publication | Venue | Status | Role |
|---|---|---|---|
| Arabic Polarization Detection Across 10+ LLMs | ACL 2026 (SemEval) | Pending | Co-author with NAMAA community, Error Analysis Lead |
| Linguistic Security: A Framework for Endangered Languages | ComputEL-9 (ACL Workshop) | Pending | Solo Author |
| What My Grandmothers’ Languages Taught Me About AI | TSLL 2026 | Pending | Solo Author |
Error Analysis Lead – ACL 2026 (NAMAA Community)
- Leading error analysis on Arabic polarization detection across 10+ LLMs, including DeepSeek, Qwen, Gemma, Llama, and Fanar.
- Documented patterns of dialectal bias, “Clever Hans” effects, hallucination, and gold label inconsistencies.
- Contributing to a system description paper comparing generative, retrieval-augmented, and discriminative methods.
SELECTED SPEAKING & ENGAGEMENTS
| Event | Title/Focus | Context |
|---|---|---|
| GEO Indigenous Summit (Space4Innovation, Prague 2026) | Who Speaks for the Data? A Linguist’s Lens on Listening to Land and Language | Explored parallels between human-language AI and non-human data ethics for space and conservation communities. |
| BSides San Diego (2026) | Polyglot Poisoning: Bypassing AI Guardrails via the Linguistic Supply Chain | Featured the Linguistic Firewall demo; examined security vulnerabilities and defensive frameworks. |
| Tech Intersections (Northeastern University Oakland, 2026) | Bridging the Cybersecurity Gap for Multilingual and Marginalized Communities | Addressed the intersection of language, security, and equity. |
| Grace Hopper Celebration (2026) | Proposal: The Linguist in the Machine | Submitted for the 2026 conference. |
PROFESSIONAL EXPERIENCE
SoundHound AI | Santa Clara, CA (Remote) | Language Consultant | 2024–Present
- Lead multilingual training data operations, managing contractor teams and ensuring quality milestones for AI training datasets.
- Design and implement QA frameworks with structured audits and dashboards, improving data integrity by 15%.
- Perform gold label audits and error analysis across thousands of language dataset entries, improving model response accuracy by 12%.
- Collaborate with engineering teams to analyze data inconsistencies and propose refinements for model reliability.
National Museum of Language | Washington, DC | Digital Strategy Lead | 2020–2025
- Participated in board meetings with educators and government-affiliated members, contributing to strategic discussions on organizational vision and community impact.
- Led grant writing initiatives and project management efforts, advancing museum objectives through collaboration with senior leadership.
- Directed multilingual content strategy, increasing organic traffic by 62% through targeted SEO and audience engagement.
- Led transcription, narration, and interpretation of English, Spanish, Italian, and Arabic literature for cultural exhibits.
COMMUNITY & COLLABORATION
NAMAA Arabic NLP Community | Research Collaborator | 2025–Present
- Co-author of ACL 2026 paper on Arabic polarization detection.
- Lead error analysis examining model failures across dialectal Arabic varieties.
- Contribute to collaborative research on low-resource language NLP and community-centered AI design.
TEACHING & MENTORSHIP
Private Writing & Communication Coach | 2020–Present
- Provide personalized editorial feedback to multilingual professionals and students.
- Develop curricula emphasizing clarity, audience adaptation, and linguistic precision.
Creative Writing Instructor | Ringtail Learning Inc. | 2020–2025
- Taught writing and communication skills to 300+ multilingual and ESL students in English and Spanish.
- Increased average student writing scores by 25% through constructive, personalized feedback.
EDUCATION
University of California, Los Angeles (UCLA) | B.A. in Applied Linguistics | 2020
- Coursework: Phonetics, Phonology, Syntax, Pragmatics, Morphology, Dialectology, Computational Linguistics.
TECHNICAL EXPERTISE
| Category | Skills |
|---|---|
| AI Governance | Data Sovereignty, Meaning Extraction Auditing, Accountability Frameworks, Stakeholder Engagement, Grant Writing |
| Error Analysis | Dialectal Bias Detection, “Clever Hans” Identification, Gold Label Audits, Hallucination Analysis |
| AI & NLP | LLM Evaluation, RAG, Zero-Shot Prompting, Fine-Tuning, Model Bias Detection, Adversarial Testing |
| Linguistics | Phonetics, Phonology, Syntax, Pragmatics, Dialectology, Transcription (IPA, SAMPA), Sociolinguistics |
| Technical | Python (Pandas, NumPy), SQL, Data Visualization, Git/GitHub, Open-Source Collaboration |
| Languages | English (Native), Spanish (Native), Arabic (Read/Write, Linguistic Analysis), Italian (Intermediate) |
CONTACT
- Email: thecyberlingo@gmail.com
- LinkedIn: linkedin.com/in/madisahara
