Sahara Al-Madi

SAHARA AL-MADI
Founder, Linguistic Security Institute (LSI)
Computational Linguist
[🔗 linkedin.com/in/madisahara] | [🐙 github.com/bintdamiana]

PROFESSIONAL SUMMARY

I’m a computational linguist and founder of the Linguistic Security Institute (LSI). My work asks a simple question: What does technology owe the voices it learns from? I explore how linguistic nuance shapes both vulnerabilities and defenses in AI systems, and how these questions travel across domains, from security to data sovereignty to Earth observation.

I collaborate with open-source communities, including NAMAA (Arabic NLP), on peer-reviewed research (ACL 2026, forthcoming); consult for AI companies such as SoundHound AI; and speak at university symposia, security conferences, and international forums. Previously, I was Digital Strategy Lead at the National Museum of Language in Washington, DC. As a UCLA-trained linguist, I’m dedicated to building technology that is culturally coherent, resilient, and trustworthy.

CORE RESEARCH & INTELLECTUAL PROPERTY

Linguistic Security Institute (LSI) | Founder | 2025–Present
LSI helps organizations navigate the intersection of AI, language, and trust. We ask a simple question: What does technology owe the voices it learns from?

The answer shapes how companies audit models, govern data, and build systems that earn trust across markets and communities. Our frameworks for meaning transparency, data sovereignty, and cross-domain accountability give leaders practical tools to identify linguistic risk before it becomes reputational or regulatory harm.

“Linguistic Security” Framework: An original contribution to AI governance, structured around three pillars: Meaning Transparency, Data Sovereignty, and Cross-Domain Accountability.
Public Demos & Tools:
- Mitote: Nahuatl-Aware Pronunciation Explorer: A RAG-powered TTS demo that addresses the mispronunciation of Indigenous words, demonstrating AI’s potential for cultural preservation, not extraction.
- Linguistic Firewall: Polyglot Poisoning Probe: An open-source Python script showing how linguistic nuances, such as Arabic and other right-to-left (RTL) scripts, can bypass LLM guardrails. Presented at BSides San Diego (2026).

SELECTED RESEARCH & PUBLICATIONS

Publication	Venue	Status	Role
Arabic Polarization Detection Across 10+ LLMs	ACL 2026 (SemEval)	Pending	Co-author (NAMAA community); Error Analysis Lead
Linguistic Security: A Framework for Endangered Languages	ComputEL-9 (ACL Workshop)	Pending	Solo Author
What My Grandmothers’ Languages Taught Me About AI	TSLL 2026	Pending	Solo Author

Error Analysis Lead – ACL 2026 (NAMAA Community)

Leading error analysis on Arabic polarization detection across 10+ LLMs (DeepSeek, Qwen, Gemma, Llama, Fanar).
Documented patterns of dialectal bias, “Clever Hans” effects, hallucination, and gold-label inconsistencies.
Contributing to a system description paper comparing generative, retrieval-augmented, and discriminative methods.

SELECTED SPEAKING & ENGAGEMENTS

Event	Title/Focus	Context
GEO Indigenous Summit (Space4Innovation, Prague 2026)	Who Speaks for the Data? A Linguist’s Lens on Listening to Land and Language	Explored parallels between human-language AI and non-human data ethics for the space and conservation communities.
BSides San Diego (2026)	Polyglot Poisoning: Bypassing AI Guardrails via the Linguistic Supply Chain	Featured the Linguistic Firewall demo; examined security vulnerabilities and defensive frameworks.
Tech Intersections (Northeastern University Oakland, 2026)	Bridging the Cybersecurity Gap for Multilingual and Marginalized Communities	Addressed the intersection of language, security, and equity.
Grace Hopper Celebration (2026)	Proposal: The Linguist in the Machine	Submitted for the 2026 conference.

PROFESSIONAL EXPERIENCE

SoundHound AI | Santa Clara, CA (Remote) | Language Consultant | 2024–Present

Lead multilingual training-data operations, managing contractor teams and ensuring AI training datasets meet quality milestones.
Design and implement QA frameworks with structured audits and dashboards, improving data integrity by 15%.
Perform gold-label audits and error analysis across thousands of entries in language datasets, improving model response accuracy by 12%.
Collaborate with engineering teams to analyze data inconsistencies and propose refinements for model reliability.

National Museum of Language | Washington, DC | Digital Strategy Lead | 2020–2025

Participated in board meetings with educators and government-affiliated members, contributing to strategic discussions on organizational vision and community impact.
Led grant-writing initiatives and project management efforts, advancing museum objectives in collaboration with senior leadership.
Directed multilingual content strategy, increasing organic traffic by 62% through targeted SEO and audience engagement.
Led transcription, narration, and interpretation of English, Spanish, Italian, and Arabic literature for cultural exhibits.

COMMUNITY & COLLABORATION

NAMAA Arabic NLP Community | Research Collaborator | 2025–Present

Co-author of ACL 2026 paper on Arabic polarization detection.
Lead error analysis examining model failures across dialectal Arabic varieties.
Contribute to collaborative research on low-resource language NLP and community-centered AI design.

TEACHING & MENTORSHIP

Private Writing & Communication Coach | 2020–Present

Provide personalized editorial feedback to multilingual professionals and students.
Develop curricula emphasizing clarity, audience adaptation, and linguistic precision.

Creative Writing Instructor | Ringtail Learning Inc. | 2020–2025

Taught writing and communication skills to 300+ multilingual and ESL students in English and Spanish.
Increased average student writing scores by 25% through constructive, personalized feedback.

EDUCATION

University of California, Los Angeles (UCLA) | B.A. Applied Linguistics | 2020

Coursework: Phonetics, Phonology, Syntax, Pragmatics, Morphology, Dialectology, Computational Linguistics.

TECHNICAL EXPERTISE

Category	Skills
AI Governance	Data Sovereignty, Meaning Extraction Auditing, Accountability Frameworks, Stakeholder Engagement, Grant Writing
Error Analysis	Dialectal Bias Detection, “Clever Hans” Identification, Gold Label Audits, Hallucination Analysis
AI & NLP	LLM Evaluation, RAG, Zero-Shot Prompting, Fine-Tuning, Model Bias Detection, Adversarial Testing
Linguistics	Phonetics, Phonology, Syntax, Pragmatics, Dialectology, Transcription (IPA, SAMPA), Sociolinguistics
Technical	Python (Pandas, NumPy), SQL, Data Visualization, Git/GitHub, Open-Source Collaboration
Languages	English (Native), Spanish (Native), Arabic (Read/Write, Linguistic Analysis), Italian (Intermediate)

Email: thecyberlingo@gmail.com
LinkedIn: linkedin.com/in/madisahara
