Experiments
Compare agent versions, inspect scenario regressions, and wire results into CI/CD.
| Experiment Name | Agent Version | Scenarios | Avg MOS | Task Completion | Latency p95 | Pass Rate | Date | Compare |
|---|---|---|---|---|---|---|---|---|
| prod-agent-v2 preflight (Git a17c9f2) | prod-agent-v2 | 6 | 4.1 | 86% | 1180 ms | 83% | May 23 | pass |
| angry_refund_call 4.1 (Task seed locked for transcript diff); plan_upgrade_hinglish 3.9 (Task seed locked for transcript diff); internet_outage_credit 3.8 (Task seed locked for transcript diff) | | | | | | | | |
| staging-agent-v3 regression (Git b8021ac); MOS dropped 0.4 on hi-IN scenarios | staging-agent-v3 | 6 | 3.4 | 62% | 1840 ms | 50% | May 22 | fail |
| retell-routing baseline (Git c91e0da) | retell-routing-v1 | 4 | 3.8 | 79% | 1360 ms | 75% | May 20 | pass |
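
One way to wire results like these into CI/CD is a small gate script that compares a candidate run against a baseline and fails the build on a regression. The sketch below is illustrative only: the `Experiment` class, the `find_regressions` and `main` functions, and all three thresholds are hypothetical and not part of any product API. It also assumes that prod-agent-v2 is a sensible baseline for staging-agent-v3. The metric values are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """One row of the experiments table (hypothetical structure)."""
    name: str
    avg_mos: float
    task_completion: float  # fraction between 0 and 1
    latency_p95_ms: int


# Metric values copied from the table above.
BASELINE = Experiment("prod-agent-v2 preflight", 4.1, 0.86, 1180)
CANDIDATE = Experiment("staging-agent-v3 regression", 3.4, 0.62, 1840)

# Hypothetical thresholds: tune these to your own quality bar.
MAX_MOS_DROP = 0.2
MAX_COMPLETION_DROP = 0.05
MAX_LATENCY_INCREASE_MS = 200


def find_regressions(baseline: Experiment, candidate: Experiment) -> list[str]:
    """Return one message per metric that regressed beyond its threshold."""
    problems = []

    mos_drop = baseline.avg_mos - candidate.avg_mos
    if mos_drop > MAX_MOS_DROP:
        problems.append(f"Avg MOS dropped by {mos_drop:.1f}")

    completion_drop = baseline.task_completion - candidate.task_completion
    if completion_drop > MAX_COMPLETION_DROP:
        problems.append(f"Task completion dropped by {completion_drop:.0%}")

    latency_increase = candidate.latency_p95_ms - baseline.latency_p95_ms
    if latency_increase > MAX_LATENCY_INCREASE_MS:
        problems.append(f"Latency p95 rose by {latency_increase} ms")

    return problems


def main() -> int:
    """Print any regressions and return a process exit code (0 means pass)."""
    problems = find_regressions(BASELINE, CANDIDATE)
    for problem in problems:
        print(f"REGRESSION {CANDIDATE.name}: {problem}")
    return 1 if problems else 0
```

A CI step could call `sys.exit(main())` so that a nonzero exit code blocks the deploy. In practice the metrics would be loaded from an exported results file rather than hard-coded.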