Experiment Name | Agent Version | Scenarios | Avg MOS | Task Completion | Latency p95 | Pass Rate | Date | Compare
prod-agent-v2 preflight (Git a17c9f2) | prod-agent-v2 | 6 | 4.1 | 86% | 1180 ms | 83% | May 23 | pass
    angry_refund_call | 4.1 | Task seed locked for transcript diff
    plan_upgrade_hinglish | 3.9 | Task seed locked for transcript diff
    internet_outage_credit | 3.8 | Task seed locked for transcript diff
staging-agent-v3 regression (Git b8021ac) | staging-agent-v3 | 6 | 3.4 | 62% | 1840 ms | 50% | May 22 | fail
    MOS dropped 0.4 on hi-IN scenarios
retell-routing baseline (Git c91e0da) | retell-routing-v1 | 4 | 3.8 | 79% | 1360 ms | 75% | May 20 | pass