Case Study

How We Built a Multilingual Voice AI for a SAMA-Licensed Fintech

Building a voice assistant that understands Arabic, English, and French simultaneously — the technical challenges and how we solved them.

April 1, 2025 · 8 min read · Tessafold

The client and the challenge

EdfaPay is a payment company licensed by the Saudi Central Bank (SAMA) that serves customers across Saudi Arabia and neighboring Gulf countries. Their customer base includes Arabic speakers, English-speaking expatriates, and a significant French-speaking community from North and West Africa. They needed a single voice assistant that could handle all three languages without the customer needing to specify their language preference upfront — the system should detect and switch automatically.

The technical architecture

We built the system in three integrated layers. The speech recognition layer uses OpenAI Whisper as the base, enhanced with a custom post-processing module trained on Gulf Arabic business vocabulary and financial terminology — significantly improving accuracy on names, amounts, and product names specific to the payments domain. The language understanding layer uses GPT-4 with a carefully engineered system prompt that includes EdfaPay's product catalog, common queries, and response guidelines in all three languages. The speech synthesis layer uses ElevenLabs with three distinct voice profiles — one per language — to ensure natural-sounding responses that match each language's rhythm and intonation.
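The three layers described above can be pictured as one turn-handling pipeline. The following is a minimal sketch, not the production code: the names `Transcript`, `VoicePipeline`, and `handle_turn`, and the voice profile IDs, are hypothetical. The three layers are passed in as plain callables so that no vendor API signature is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transcript:
    text: str
    language: str  # detected language code: "ar", "en", or "fr"


@dataclass
class VoicePipeline:
    # Each layer is injected as a callable (hypothetical interfaces).
    transcribe: Callable[[bytes], Transcript]   # speech recognition + post-processing
    understand: Callable[[Transcript], str]     # language understanding
    synthesize: Callable[[str, str], bytes]     # (reply text, voice id) -> audio
    voices: dict[str, str]                      # language code -> voice profile id

    def handle_turn(self, audio: bytes) -> bytes:
        """Run one customer utterance through all three layers."""
        transcript = self.transcribe(audio)
        reply = self.understand(transcript)
        # Pick the voice profile that matches the detected language.
        voice_id = self.voices[transcript.language]
        return self.synthesize(reply, voice_id)


# Stub layers standing in for the real services, for illustration only.
pipeline = VoicePipeline(
    transcribe=lambda audio: Transcript(text="what is my balance", language="en"),
    understand=lambda t: f"reply to: {t.text}",
    synthesize=lambda text, voice_id: f"{voice_id}|{text}".encode("utf-8"),
    voices={"ar": "voice-ar", "en": "voice-en", "fr": "voice-fr"},
)
audio_out = pipeline.handle_turn(b"\x00")
```

Because the language is detected once per turn and carried on the transcript, the reply voice follows the customer without an explicit language menu.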

The hardest problem: code-switching

The most technically challenging aspect was handling code-switching: a customer speaks in Arabic but uses English or French technical terms mid-sentence. This is extremely common, for example 'أبغى أعرف الـ balance ديالي' (I want to know my balance). In standard speech recognition systems, the English word 'balance' embedded in the Arabic sentence triggers a language switch, which confuses the transcription. Our solution was to train the post-processing layer specifically on code-switched financial queries from the Gulf region, teaching it to treat these patterns as a single unified utterance rather than as two separate language segments.
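To make the idea of a "unified utterance" concrete, here is a deliberately simplified, rule-based sketch. It is not the trained post-processing layer described above: it swaps in a small hand-written lexicon (`DOMAIN_TERMS`, hypothetical) and a word count in place of a learned model. The function name `unify_utterance` and the segment format are also assumptions.

```python
from collections import Counter

# Hypothetical lexicon of financial loanwords that customers embed in
# sentences spoken in another language.
DOMAIN_TERMS = {"balance", "transfer", "refund", "solde", "virement"}


def unify_utterance(segments: list[tuple[str, str]]) -> tuple[str, str]:
    """Merge (text, language) segments into one utterance.

    Embedded domain terms do not count toward the language vote, so a
    single English or French term cannot flip the utterance's language.
    """
    if not segments:
        raise ValueError("segments must not be empty")

    votes: Counter[str] = Counter()
    for text, lang in segments:
        for word in text.split():
            if word.lower() not in DOMAIN_TERMS:
                votes[lang] += 1

    # If every word was a domain term, fall back to the first segment's language.
    dominant = votes.most_common(1)[0][0] if votes else segments[0][1]
    return " ".join(text for text, _ in segments), dominant


# The example sentence, as a recognizer might split it by language.
segments = [("أبغى أعرف الـ", "ar"), ("balance", "en"), ("ديالي", "ar")]
text, language = unify_utterance(segments)
```

Here `language` comes out as `"ar"`, so the downstream layers see a single Arabic utterance that happens to contain the term "balance".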

SAMA compliance and security requirements

Operating under SAMA regulation meant the system had to meet strict data handling requirements. No customer voice data could be stored beyond the duration of the call. Personally identifiable information detected in transcripts had to be masked before any processing. All communication had to be encrypted end-to-end. We designed the data architecture around these constraints from day one — not as an afterthought. The system was audited by EdfaPay's compliance team before going live and passed all requirements.
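The masking requirement can be illustrated with a small transcript filter that runs before the text reaches any other component. This is a sketch under stated assumptions: the patterns below (Saudi IBAN, card number, Saudi mobile number) are illustrative rather than the audited rule set, and `PII_PATTERNS` and `mask_pii` are hypothetical names.

```python
import re

# Illustrative patterns only; a production rule set would be broader.
PII_PATTERNS = [
    # Saudi IBAN: "SA", 2 check digits, 20 alphanumeric characters.
    ("IBAN", re.compile(r"\bSA\d{2}[A-Z0-9]{20}\b")),
    # Card number: 13 to 19 digits, optionally grouped by spaces or hyphens.
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),
    # Saudi mobile number: +9665XXXXXXXX or 05XXXXXXXX.
    ("PHONE", re.compile(r"(?:\+966|\b0)5\d{8}\b")),
]


def mask_pii(transcript: str) -> str:
    """Replace each detected PII span with a bracketed placeholder label."""
    for label, pattern in PII_PATTERNS:
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript


masked = mask_pii("My card is 4111 1111 1111 1111 and my phone is 0512345678")
```

In this sketch the masked string, not the raw transcript, is what would be handed to the language understanding layer.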

Results and what we learned

After three months in production, the system handles over 80% of routine customer queries without human agent escalation — across all three languages. Average handling time for common queries dropped from 4.2 minutes with a human agent to 45 seconds with the voice AI. Customer satisfaction scores for voice interactions are comparable to human agent interactions for transactional queries, and higher for availability (24/7 response with no wait time). The key learning: voice AI for multilingual markets requires significantly more investment in language-specific tuning than typical AI projects — but the ROI, when done correctly, is substantial.
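For a sense of scale, the handling-time figures above can be converted to a common unit and compared directly. The short calculation below uses only the two numbers reported in this section; the variable names are illustrative.

```python
human_seconds = 4.2 * 60  # 4.2 minutes per common query with a human agent
ai_seconds = 45.0         # 45 seconds with the voice AI

# Relative reduction in handling time, and the equivalent speed-up factor.
reduction_pct = (human_seconds - ai_seconds) / human_seconds * 100
speedup = human_seconds / ai_seconds

print(f"Handling time reduced by {reduction_pct:.1f}%")
print(f"Voice AI is {speedup:.1f}x faster on common queries")
```

That works out to a reduction of roughly 82%, or about 5.6 times faster for common queries.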

Ready to apply this to your company?