Engagement Hour Pricing: Why Per-Minute and Token Models Are Wrong for Voice AI

Per-character, per-token, per-second. Here's why all of them misalign vendor and customer incentives.

The Problem With Per-Minute Pricing

Cloud speech APIs typically bill by audio duration: Google Cloud Speech-to-Text has billed in 15-second increments, and AWS Transcribe bills per second. The incentive these models create is perverse: to control costs, enterprises want shorter calls. But shorter calls mean fewer issues resolved, more repeat contacts, and worse customer outcomes. The pricing model actively works against good business results.

Per-minute pricing also creates billing unpredictability at scale. A contact centre handling 50,000 calls per month doesn't know whether those calls will average 3 minutes or 8 minutes until after the fact. Budgeting becomes a constant post-hoc exercise.
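To make the budgeting spread concrete, here is a minimal Python sketch. The per-minute rate is a hypothetical placeholder, not a quoted price from any provider; only the call volume and the 3-to-8-minute range come from the paragraph above.

```python
def monthly_per_minute_cost(calls: int, avg_minutes: float, rate_per_minute: float) -> float:
    """Monthly bill under simple per-minute pricing."""
    return calls * avg_minutes * rate_per_minute


CALLS_PER_MONTH = 50_000   # from the text
HYPOTHETICAL_RATE = 0.20   # currency units per minute; assumed for illustration only

# Same call volume, two plausible average handle times.
low = monthly_per_minute_cost(CALLS_PER_MONTH, 3, HYPOTHETICAL_RATE)
high = monthly_per_minute_cost(CALLS_PER_MONTH, 8, HYPOTHETICAL_RATE)

# The ratio does not depend on the assumed rate: it is always 8 / 3.
spread = high / low

print(f"3-minute average: {low:,.0f}")
print(f"8-minute average: {high:,.0f}")
print(f"spread: {spread:.2f}x")
```

Whatever the actual rate, the bill at an 8-minute average is roughly 2.7 times the bill at a 3-minute average, which is the range a budget has to absorb.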

The Token Pricing Trap

LLM providers charge per input and output token. The incentive this creates is equally problematic: enterprises are financially penalized for providing richer context to the model — the very thing that would improve accuracy. Engineering teams spend significant effort on prompt compression that actively trades away quality for cost savings.

Token pricing also scales poorly with Indian languages. With most tokenizers, the same semantic content requires 1.4× to 2.1× as many tokens in Hindi as in English. That creates a structural cost disadvantage for non-English deployments that comes from the tokenizer rather than from the content itself.
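A rough sketch of how that multiplier flows straight through to the bill. The price per 1,000 tokens and the token count are hypothetical; only the 1.4× and 2.1× multipliers come from the text.

```python
def token_bill(english_tokens: int, inflation: float, price_per_1k: float) -> float:
    """Cost of the same content once tokenizer inflation is applied."""
    return english_tokens * inflation / 1000 * price_per_1k


ENGLISH_TOKENS = 1_000   # assumed size of a prompt in English
PRICE_PER_1K = 1.0       # hypothetical price, in arbitrary currency units

english = token_bill(ENGLISH_TOKENS, 1.0, PRICE_PER_1K)
hindi_low = token_bill(ENGLISH_TOKENS, 1.4, PRICE_PER_1K)   # lower bound from the text
hindi_high = token_bill(ENGLISH_TOKENS, 2.1, PRICE_PER_1K)  # upper bound from the text

print(f"English: {english:.2f}")
print(f"Hindi:   {hindi_low:.2f} to {hindi_high:.2f}")
```

Because the cost is linear in token count, a Hindi deployment pays 40% to 110% more than an English one for the same conversation under these assumptions.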

What an Engagement Hour Actually Is

An engagement hour is one hour during which a user is actively interacting with an AI-powered product. For voice AI, that means time during which audio is being processed: transcribed, synthesized, or analyzed. For a voice agent, it is the actual conversation time from first word to call end.
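One way to read that definition for a voice agent is as a sum over call records. The sketch below assumes each record carries a first-word timestamp and a call-end timestamp; the `Call` type and its field names are hypothetical, not an EngineAI schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Call:
    first_word_at: datetime  # hypothetical field: when the first word was spoken
    ended_at: datetime       # hypothetical field: when the call ended


def engagement_hours(calls: list[Call]) -> float:
    """Total conversation time across calls, expressed in hours."""
    total = sum((c.ended_at - c.first_word_at for c in calls), timedelta())
    return total.total_seconds() / 3600


start = datetime(2024, 1, 1, 10, 0, 0)
calls = [
    Call(start, start + timedelta(minutes=4)),
    Call(start, start + timedelta(minutes=8)),
]
hours = engagement_hours(calls)  # 12 minutes of conversation in total
print(f"{hours:.2f} engagement hours")
```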

This unit aligns vendor and customer incentives naturally. EngineAI gets paid more when users engage more — which happens when the product is working well. The enterprise predicts costs from call volume and average handle time, both of which it already measures for existing contact centre operations.

The ₹5 Number and What It Includes

EngineAI's engagement hour pricing of ₹5 includes STT transcription, TTS synthesis, and Krishna LLM inference — the full stack required for a voice agent interaction. For a typical 4-minute call, that's approximately ₹0.33 per interaction.

At 50,000 calls per month, the monthly bill is approximately ₹16,700 (about 3,333 engagement hours at ₹5 each), compared to ₹45,000–₹80,000 for equivalent per-minute and per-token pricing from major cloud providers at the same volume.
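The arithmetic behind these figures can be reproduced in a few lines. The rate, call length, and call volume come from the text; the function names are illustrative only.

```python
RATE_PER_ENGAGEMENT_HOUR = 5.0  # ₹ per engagement hour, from the text


def cost_per_call(minutes: float) -> float:
    """Cost of one call of the given length at the engagement-hour rate."""
    return minutes / 60 * RATE_PER_ENGAGEMENT_HOUR


def monthly_bill(calls: int, avg_minutes: float) -> float:
    """Monthly cost predicted from call volume and average handle time."""
    return calls * cost_per_call(avg_minutes)


per_call = cost_per_call(4)       # typical 4-minute call
monthly = monthly_bill(50_000, 4)  # 50,000 calls per month

print(f"per call: ₹{per_call:.2f}")
print(f"monthly:  ₹{monthly:,.0f}")
```

Carried through without rounding the per-call figure, a 4-minute call costs about ₹0.33 and 50,000 such calls come to roughly ₹16,667 per month. The two inputs are call volume and average handle time, which a contact centre already tracks.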

When Engagement Hour Pricing Doesn't Work

Transparency requires acknowledging the limits. Engagement hour pricing works well when interactions have natural duration bounds — contact centre calls, IVR flows, customer support chats. It works poorly for open-ended research tasks or document processing pipelines where a single 'engagement' might involve hours of processing.

For document intelligence and batch processing workloads, EngineAI offers per-page and per-document pricing. The goal is always incentive alignment: we should profit when you get value, not when you fail to optimize your prompt length.
