Voice experiment · stickiness read
Same framework as before: parts voiced per user (one synthesis event = one part), English only, ranked on robust central tendency. The ByteDance 2.0 cohort is measured against Campfire and Mr. Gray in the control arm — same new-user population, same window — so it is a like-for-like read, not a comparison with their mature lifetime numbers.
1 · Ranked by stickiness
Trim80 is the mean of the bottom 80% of users, dropping the heavy top-20% tail; it is the same headline number the framework has always used. ByteDance voices come from the treatment arm; Campfire, Mr. Gray and the other incumbents from the control arm. After Dark, Cuppa and Mindstream appear in both arms (see section 4); the ranking below uses their control-arm rows.
| Voice | Users | Parts | Mean | Median bucket | Trim80 |
|---|---|---|---|---|---|
| Ember | 2,430 | 72,337 | 29.8 | 6–10 | 9.5 |
| Campfire | 5,287 | 153,161 | 29.0 | 5 | 8.7 |
| Spotlight | 1,727 | 38,525 | 22.3 | 2 | 3.5 |
| Mr. Gray | 1,144 | 27,563 | 24.1 | 2 | 3.3 |
| Sin | 561 | 11,542 | 20.6 | 2 | 3.1 |
| Spark | 777 | 8,015 | 10.3 | 2 | 2.1 |
| Drowse | 560 | 7,022 | 12.5 | 2 | 2.1 |
| After Dark | 1,148 | 17,622 | 15.4 | 2 | 1.9 |
| Westminster | 1,337 | 14,596 | 10.9 | 2 | 1.9 |
| Cuppa | 1,157 | 11,159 | 9.6 | 2 | 1.6 |
| Mindstream | 1,334 | 10,709 | 8.0 | 2 | 1.6 |
| Blunt | 587 | 3,389 | 5.8 | 2 | 1.5 |
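The Trim80 definition used for this ranking (mean of the bottom 80% of users) can be sketched as follows, assuming raw per-user part counts are available. The `trim80` helper, the round-down cut and the ten-user cohort are illustrative assumptions, not the page's own code or data.

```python
def trim80(parts_per_user):
    """Mean parts per user after dropping the heaviest 20% of users."""
    ordered = sorted(parts_per_user)
    keep = len(ordered) * 4 // 5  # assumption: the 80% cut is rounded down
    if keep == 0:
        raise ValueError("need at least 2 users to trim")
    return sum(ordered[:keep]) / keep


# Hypothetical cohort: eight light users and two very heavy ones.
cohort = [1, 1, 1, 2, 2, 2, 3, 5, 120, 300]

plain_mean = sum(cohort) / len(cohort)
print(f"mean:   {plain_mean:.1f}")
print(f"trim80: {trim80(cohort):.3f}")
```

In this made-up cohort the two heavy users pull the plain mean far above what a typical user does, while Trim80 stays close to the bulk of the distribution. That is the gap the Mean and Trim80 columns show for voices such as Mr. Gray and Sin.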
2 · Reach vs depth
Horizontal = users who voiced with it. Vertical = mean parts per user. Bubble area = total parts voiced. Top-right is the prize (broad and deep); high-but-narrow is a niche voice with a loyal heavy tail.
3 · The shape of engagement
Each bar is the share of that voice's users who voiced a given number of parts. Ember and Campfire spread their weight into the heavy buckets (32.2% and 30.1% of their users voiced more than 20 parts). Mr. Gray and Sin are front-loaded (45.3% and 42.2% of their users voiced a single part) but keep a real heavy tail (7.2% and 5.5% above 100 parts), which is why their mean is high while their median stays at 2.
x: parts voiced per user, bucketed (1 · 2 · 3 · 4 · 5 · 6–10 · 11–20 · 21–50 · 51–100 · >100) · y: % of the voice's users
| Voice | 1 | 2 | 3 | 4 | 5 | 6–10 | 11–20 | 21–50 | 51–100 | >100 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ember | 15.9% | 14.3% | 8.8% | 5.9% | 3.7% | 9.8% | 9.4% | 14.0% | 9.2% | 9.0% |
| Campfire | 17.1% | 11.5% | 10.7% | 6.7% | 4.3% | 10.4% | 9.4% | 13.1% | 9.2% | 7.8% |
| Mr. Gray | 45.3% | 14.0% | 3.8% | 3.7% | 1.3% | 5.4% | 5.5% | 8.7% | 5.1% | 7.2% |
| Sin | 42.2% | 17.8% | 4.3% | 2.9% | 1.8% | 5.0% | 5.3% | 8.6% | 6.6% | 5.5% |
Share of each voice's users, by parts-per-user bucket.
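As a cross-check on the median column, the median bucket can be recovered from the shares above by walking the cumulative distribution until it passes half of the users. This is a minimal sketch: `median_bucket` is a hypothetical helper, and the shares are copied from the table.

```python
BUCKETS = ["1", "2", "3", "4", "5", "6–10", "11–20", "21–50", "51–100", ">100"]

# Share of each voice's users per bucket (%), copied from the table above.
SHARES = {
    "Ember":    [15.9, 14.3, 8.8, 5.9, 3.7, 9.8, 9.4, 14.0, 9.2, 9.0],
    "Campfire": [17.1, 11.5, 10.7, 6.7, 4.3, 10.4, 9.4, 13.1, 9.2, 7.8],
    "Mr. Gray": [45.3, 14.0, 3.8, 3.7, 1.3, 5.4, 5.5, 8.7, 5.1, 7.2],
    "Sin":      [42.2, 17.8, 4.3, 2.9, 1.8, 5.0, 5.3, 8.6, 6.6, 5.5],
}


def median_bucket(shares):
    """Return the first bucket where the cumulative share reaches half the users."""
    half = sum(shares) / 2  # rows can sum to slightly off 100 after rounding
    running = 0.0
    for label, share in zip(BUCKETS, shares):
        running += share
        if running >= half:
            return label
    return BUCKETS[-1]


for voice, shares in SHARES.items():
    print(voice, "->", median_bucket(shares))
```

For these four rows the sketch returns 6–10 for Ember, 5 for Campfire and 2 for both Mr. Gray and Sin, matching the medians reported in the tables.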
4 · Every number
Users, parts and the median bucket are read straight from Amplitude. Mean = parts ÷ users. Trim80 is computed on this page from the frequency histogram (bucket midpoints; ±1–2 on the two heaviest voices because of the open >100 bucket).
| Voice | Users | Parts | Mean | Median bucket | Trim80 |
|---|---|---|---|---|---|
| Treatment arm · ByteDance 2.0 cohort | | | | | |
| Ember | 2,430 | 72,337 | 29.8 | 6–10 | 9.5 |
| Sin | 561 | 11,542 | 20.6 | 2 | 3.1 |
| Spark | 777 | 8,015 | 10.3 | 2 | 2.1 |
| Drowse | 560 | 7,022 | 12.5 | 2 | 2.1 |
| Mindstream | 678 | 5,531 | 8.2 | 2 | 1.7 |
| Cuppa | 569 | 3,732 | 6.6 | 2 | 1.7 |
| After Dark | 489 | 3,322 | 6.8 | 2 | 1.5 |
| Smoke · disabled | 31 | 157 | 5.1 | 2 | 1.5 |
| Blunt | 587 | 3,389 | 5.8 | 2 | 1.5 |
| Control arm · benchmark + incumbents | | | | | |
| Campfire | 5,287 | 153,161 | 29.0 | 5 | 8.7 |
| Spotlight | 1,727 | 38,525 | 22.3 | 2 | 3.5 |
| Mr. Gray | 1,144 | 27,563 | 24.1 | 2 | 3.3 |
| After Dark | 1,148 | 17,622 | 15.4 | 2 | 1.9 |
| Westminster | 1,337 | 14,596 | 10.9 | 2 | 1.9 |
| Cuppa | 1,157 | 11,159 | 9.6 | 2 | 1.6 |
| Mindstream | 1,334 | 10,709 | 8.0 | 2 | 1.6 |
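The histogram-based Trim80 described at the top of this section can be approximated as below. This is a sketch under stated assumptions, not the page's actual code: the representative value per bucket, the placeholder for the open >100 bucket, and the proportional split of the bucket that contains the 80% cut are all guesses at the method. Shares are copied from the section 3 table.

```python
# Representative value per bucket: exact for 1-5, midpoints for closed ranges.
# The value for the open ">100" bucket is an arbitrary placeholder (assumption).
BUCKET_VALUES = [1, 2, 3, 4, 5, 8, 15.5, 35.5, 75.5, 150]

# Share of each voice's users per bucket (%), copied from the section 3 table.
SHARES = {
    "Ember":    [15.9, 14.3, 8.8, 5.9, 3.7, 9.8, 9.4, 14.0, 9.2, 9.0],
    "Campfire": [17.1, 11.5, 10.7, 6.7, 4.3, 10.4, 9.4, 13.1, 9.2, 7.8],
    "Mr. Gray": [45.3, 14.0, 3.8, 3.7, 1.3, 5.4, 5.5, 8.7, 5.1, 7.2],
    "Sin":      [42.2, 17.8, 4.3, 2.9, 1.8, 5.0, 5.3, 8.6, 6.6, 5.5],
}


def trim80_from_histogram(shares, values=BUCKET_VALUES, keep=0.8):
    """Approximate mean parts per user over the lightest `keep` share of users."""
    total = sum(shares)        # rows can sum to slightly off 100 after rounding
    budget = keep * total      # user share still to be included
    weighted = 0.0
    for share, value in zip(shares, values):
        take = min(share, budget)  # the bucket holding the cut is taken partially
        weighted += take * value
        budget -= take
        if budget <= 0:
            break
    return weighted / (keep * total)


for voice, shares in SHARES.items():
    print(f"{voice}: {trim80_from_histogram(shares):.1f}")
```

With these assumptions the sketch gives roughly 9.6, 8.8, 3.4 and 3.2 for Ember, Campfire, Mr. Gray and Sin, against 9.5, 8.7, 3.3 and 3.1 in the table. In this sketch the 80% cut falls inside the 21–50 bucket for all four voices, so the placeholder for >100 is never used; the page's own bucket handling may differ.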
5 · No signal
Declared default voice in byteDanceV2.ts (multilingual). 0 events across all languages in the treatment arm.
Declared secondary voice. Also 0 events everywhere. Ember appears to be absorbing the default slot instead.
Smoke · AAVE-prompt voice, disabled 24 Jun pending prompt work. 31 users / 157 parts before it was pulled, so no usable signal.
6 · The read
Ember is the stickiest voice in the whole test. Trim80 9.5 and a median of 6–10 parts beat Campfire (8.7, median 5) — the incumbent it appears to have replaced as the default.
Sin matches Mr. Gray (trim80 3.1 vs 3.3) and carries the second-highest mean of any ByteDance voice — a small, devoted heavy-user base.
Spark and Drowse (2.1) sit above the weak incumbents but below Spotlight and Mr. Gray.
Blunt (1.5) trails every incumbent and has almost no heavy tail, making it the only shipping voice that the numbers alone argue for cutting.
The plain timbre is the win, not the gimmicks. Ember carries no style prompt yet out-sticks Campfire. A ByteDance voice may be a better global default.
Mood voices are niche-loyal. Sin (sensual) and Drowse (sleep) show low medians but high means — judge them on subsegment retention, not reach.
Prompt-engineered accents underdeliver. Blunt (Yorkshire) is weakest live; Smoke (AAVE) was pulled.
7 · How this was measured
Source: Amplitude project 413385 (Peech Production) · event audio_stream_synthesizing_performed · group voicePresetName · segment gp:ab_bytedance_v2 · filter language = en. ByteDance 2.0 preset set from constants/byteDanceV2.ts. Reach chart: app.amplitude.com/…/g19o7ylr