The Client and the Problem
This case study covers a regional NBFC based in eastern Uttar Pradesh, handling gold loan and MSME loan products across 400 branches. The contact centre operates 200 seats, handles approximately 12,000 inbound calls per day, and serves a predominantly rural and semi-urban customer base.
Customers communicate in a mix of Hinglish, Bhojpuri, Awadhi, and standard Hindi. The NBFC had been running Google Cloud Speech-to-Text for call transcription and QA, with English sentiment analysis downstream. Accuracy on their actual audio was poor: an estimated 61% WER on Bhojpuri-heavy interactions.
Why Existing Systems Were Failing
Google Cloud STT's Hindi model performs reasonably on standard newsreader Hindi. It performs poorly on Bhojpuri, which is linguistically distinct enough that standard Hindi models treat it as heavily accented degraded speech rather than a separate language.
The downstream consequence was a QA system that was effectively blind to 40% of the contact centre's interactions: precisely the rural borrowers with lower education levels and higher loan default risk. The customers most important to monitor for delinquency signals were the ones whose calls weren't being correctly transcribed.
The Migration Process
Migration from Google STT to Rama STT was completed in 18 days. Week one: integration setup. Rama STT connects via the same real-time streaming WebSocket API pattern as Google, so the existing call recording infrastructure required only endpoint configuration changes. No new hardware was provisioned.
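To make the "endpoint configuration changes only" claim concrete, here is a minimal sketch of what such a swap can look like. The endpoint URLs, field names, and audio settings below are illustrative assumptions, not the vendors' actual values:

```python
# Hypothetical streaming-STT client configuration. Endpoint URLs and
# field names are placeholders for illustration only.
GOOGLE_STT_CONFIG = {
    "ws_endpoint": "wss://speech.googleapis.example/v1/stream",  # placeholder
    "language_code": "hi-IN",
    "sample_rate_hz": 8000,   # typical telephony audio
    "encoding": "MULAW",
}

def migrate_config(cfg: dict, new_endpoint: str) -> dict:
    """Swap only the streaming endpoint; the audio pipeline settings
    (sample rate, encoding, language) stay untouched."""
    out = dict(cfg)
    out["ws_endpoint"] = new_endpoint
    return out

RAMA_STT_CONFIG = migrate_config(
    GOOGLE_STT_CONFIG,
    "wss://api.rama.example/stt/v1/stream",  # placeholder
)
```

Because both providers follow the same streaming pattern, the call recorder's audio capture path does not change; only where the audio frames are sent.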
Week two: parallel running, both systems transcribing all calls simultaneously with human spot-checking on a 5% random sample. Rama STT's WER on the client's actual audio was 9.2% overall versus Google's 23.4%. Full cutover happened at the start of week three.
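The parallel-run comparison rests on two mechanical pieces: computing WER and drawing the 5% review sample. A minimal sketch of both (standard word-level Levenshtein WER; the sampling seed and helper names are assumptions for illustration):

```python
import random

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def spot_check_sample(call_ids, fraction=0.05, seed=42):
    """Draw a reproducible random sample of calls for human review."""
    rng = random.Random(seed)
    k = max(1, round(len(list(call_ids)) * fraction))
    return rng.sample(list(call_ids), k)
```

At 12,000 calls per day, a 5% sample is 600 calls routed to human reviewers, who score both systems' transcripts against the same reference.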
Month-1 Results
Transcription cost reduced from approximately ₹24L/month to ₹6L/month, a reduction of ₹18L per month. The savings came from two sources: Rama STT's lower per-minute pricing, and a significant reduction in manual review and correction hours required to make transcripts usable for QA.
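The two savings sources can be expressed as a simple cost model. The per-minute rates, review hours, and working-day count below are made-up placeholders, not the client's actual contract terms; only the headline ₹24L/₹6L figures come from the case study:

```python
def monthly_cost_lakh(calls_per_day, avg_call_minutes, paise_per_minute,
                      review_hours, review_rate_rs_per_hour, days=26):
    """Monthly transcription spend in lakh rupees: per-minute STT
    charges plus manual review/correction labour. All parameter
    values are illustrative placeholders."""
    stt_rs = calls_per_day * days * avg_call_minutes * paise_per_minute / 100
    review_rs = review_hours * review_rate_rs_per_hour
    return (stt_rs + review_rs) / 1e5

# Headline figures from the case study (lakh rupees per month):
before, after = 24.0, 6.0
savings = before - after
```

Lower per-minute pricing shrinks the first term; accurate transcripts shrink the second, because fewer transcripts need human correction before QA can use them.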
QA coverage increased from an estimated 58% of calls to 94%. The sentiment analysis and keyword flagging systems, now working from accurate transcripts, flagged 23% more early delinquency indicators in month 1 versus the same period in the prior year.
What Didn't Go Perfectly
Transparency requires acknowledging what didn't work as planned. Rama STT's performance on Awadhi, which is distinct from both Hindi and Bhojpuri, was better than Google's but still showed an elevated WER of approximately 18% on the heaviest Awadhi speakers. This is an area of active model improvement.
The integration with the client's legacy QA workflow system required more custom development than anticipated: the existing system had undocumented assumptions about transcript format. The core STT migration was smooth; the downstream workflow integration was the variable.
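Undocumented format assumptions of this kind are typically bridged with an adapter layer. The field names on both sides below are hypothetical; the case study does not document either schema:

```python
# Hypothetical adapter between a segment-based STT result and the flat
# transcript shape a legacy QA system might assume. All field names
# are illustrative assumptions.
def to_legacy_transcript(stt_result: dict) -> dict:
    """Map a streaming-STT result (per-segment, with speaker labels)
    to one flat string per call plus a speaker-turn list."""
    turns = [
        {"speaker": seg.get("speaker", "unknown"),
         "start_s": seg["start"],
         "text": seg["text"]}
        for seg in stt_result.get("segments", [])
    ]
    return {
        "call_id": stt_result["call_id"],
        "full_text": " ".join(t["text"] for t in turns),
        "turns": turns,
    }
```

Isolating the mapping in one adapter function means future format drift on either side is a one-file change rather than a rework of the QA pipeline.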