ByteDance 2.0 Voice Stickiness — First Read

Voice experiment · stickiness read

ByteDance 2.0 voices, first look

Same framework as before: parts voiced per user (one synthesis event = one part), English only, ranked on robust central tendency. The ByteDance 2.0 cohort is measured against Campfire and Mr. Gray in the control arm — same new-user population, same window — so it is a like-for-like read, not a comparison with their mature lifetime numbers.

Cohort: gp:ab_bytedance_v2 = treatment vs control · Window: 23 Jun – 2 Jul 2026 (~9 days) · Event: audio_stream_synthesizing_performed · Amplitude-verified
Stickiest voice: Ember · 9.5 trim80 parts/user · beats Campfire 8.7
Ember median vs Campfire: 6–10 vs Campfire's 5 parts · stickier at the median
ByteDance parts voiced (en): 102,462 across 6 live voices, ~9 days
Voices with no signal: 3 · Vivi & Tim = 0 · Smoke pulled

1 · Ranked by stickiness

Trim80 parts per user, every voice

Trim80 is the mean of the bottom 80% of users, dropping the heavy top-20% tail — the same headline number the framework has always used. ByteDance voices come from the treatment arm; Campfire, Mr. Gray and the other incumbents from the control arm.
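
To make the definition concrete, here is a minimal sketch of trim80 on raw per-user counts. The function name, the rounding of the cut-off and the toy cohort are assumptions for illustration; the page itself works from a bucketed histogram rather than raw counts.

```python
from math import floor


def trim80(parts_per_user: list[int], keep: float = 0.8) -> float:
    """Mean of the lowest `keep` share of users, ranked by parts voiced."""
    if not parts_per_user:
        raise ValueError("need at least one user")
    ranked = sorted(parts_per_user)
    # Assumption: round the cut-off down, but always keep at least one user.
    n_keep = max(1, floor(len(ranked) * keep))
    kept = ranked[:n_keep]
    return sum(kept) / len(kept)


# Toy cohort of ten users (not real data): two heavy users dominate the mean.
toy = [1, 1, 2, 2, 3, 5, 8, 10, 150, 400]
plain_mean = sum(toy) / len(toy)  # 58.2, pulled up by the top two users
trimmed = trim80(toy)             # 4.0, the bottom eight users only
```

The gap between 58.2 and 4.0 in the toy cohort is the reason the ranking uses trim80 and not the mean: a handful of very heavy users can otherwise decide the order.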

[Bar chart: trim80 parts per user by voice, axis 0–10. Legend: ByteDance 2.0 · Campfire (incumbent default) · Mr. Gray (OpenAI) · other incumbents. Values as in the table below.]
Voice          Users    Parts      Mean    Median   Trim80
Ember          2,430    72,337     29.8    6–10     9.5
Campfire       5,287    153,161    29.0    5        8.7
Spotlight      1,727    38,525     22.3    2        3.5
Mr. Gray       1,144    27,563     24.1    2        3.3
Sin            561      11,542     20.6    2        3.1
Spark          777      8,015      10.3    2        2.1
Drowse         560      7,022      12.5    2        2.1
After Dark     1,148    17,622     15.4    2        1.9
Westminster    1,337    14,596     10.9    2        1.9
Cuppa          1,157    11,159     9.6     2        1.6
Mindstream     1,334    10,709     8.0     2        1.6
Blunt          587      3,389      5.8     2        1.5

2 · Reach vs depth

How many users, and how hard they lean in

Horizontal = users who voiced with it. Vertical = mean parts per user. Bubble area = total parts voiced. Top-right is the prize (broad and deep); high-but-narrow is a niche voice with a loyal heavy tail.
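
Because bubble area, not radius, encodes total parts, the radius has to scale with the square root of parts. A minimal sketch, assuming a hypothetical maximum radius of 40 px for the largest bubble (the page's real scale is not stated):

```python
from math import sqrt


def bubble_radius(parts: int, max_parts: int, max_radius_px: float = 40.0) -> float:
    """Radius such that bubble AREA is proportional to total parts voiced."""
    # Area grows with radius squared, so radius grows with sqrt(parts).
    return max_radius_px * sqrt(parts / max_parts)


# Parts totals taken from the ranking table.
campfire_r = bubble_radius(153_161, 153_161)  # largest bubble: 40.0
ember_r = bubble_radius(72_337, 153_161)      # smaller radius, area in proportion
```

Scaling the radius linearly with parts would exaggerate the large voices, since a bubble with twice the radius covers four times the area.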

[Bubble chart: x = users (0–5k), y = mean parts/user (0–32). Labelled bubbles: Ember, Campfire, Spotlight, Mr. Gray, Sin, Drowse, Blunt.]
Reach vs depth uses the same per-voice numbers as the ranking table above.

3 · The shape of engagement

Per-user parts distribution — the four that matter

Each bar is the share of that voice's users who voiced a given number of parts. Ember and Campfire spread their weight into the heavy buckets; Mr. Gray and Sin are front-loaded (many one-and-done) but keep a real heavy tail — which is why their mean is high while their median stays at 2.

[Histograms, one panel per voice: x = parts-per-user bucket (1 to >100), y = 0%–48% of the voice's users]
Ember · N 2,430 · med 6–10 · trim80 9.5
Campfire · N 5,287 · med 5 · trim80 8.7
Mr. Gray · N 1,144 · med 2 · trim80 3.3
Sin · N 561 · med 2 · trim80 3.1

x: parts voiced per user, bucketed (1 · 2 · 3 · 4 · 5 · 6–10 · 11–20 · 21–50 · 51–100 · >100)  ·  y: % of the voice's users

Voice       1        2        3        4       5       6–10     11–20    21–50    51–100   >100
Ember       15.9%    14.3%    8.8%     5.9%    3.7%    9.8%     9.4%     14.0%    9.2%     9.0%
Campfire    17.1%    11.5%    10.7%    6.7%    4.3%    10.4%    9.4%     13.1%    9.2%     7.8%
Mr. Gray    45.3%    14.0%    3.8%     3.7%    1.3%    5.4%     5.5%     8.7%     5.1%     7.2%
Sin         42.2%    17.8%    4.3%     2.9%    1.8%    5.0%     5.3%     8.6%     6.6%     5.5%

Share of each voice's users, by parts-per-user bucket.
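
The median bucket for each voice can be read off these shares by accumulating them until half of the users are covered. A minimal sketch, with the shares copied from the table; the function name is an assumption:

```python
BUCKETS = ["1", "2", "3", "4", "5", "6–10", "11–20", "21–50", "51–100", ">100"]

# Share of each voice's users per bucket, in percent (from the table).
SHARES = {
    "Ember": [15.9, 14.3, 8.8, 5.9, 3.7, 9.8, 9.4, 14.0, 9.2, 9.0],
    "Campfire": [17.1, 11.5, 10.7, 6.7, 4.3, 10.4, 9.4, 13.1, 9.2, 7.8],
    "Mr. Gray": [45.3, 14.0, 3.8, 3.7, 1.3, 5.4, 5.5, 8.7, 5.1, 7.2],
    "Sin": [42.2, 17.8, 4.3, 2.9, 1.8, 5.0, 5.3, 8.6, 6.6, 5.5],
}


def median_bucket(shares: list[float]) -> str:
    """First bucket at which the cumulative share reaches half of the total."""
    half = sum(shares) / 2  # use the actual total; rows may not sum to exactly 100
    running = 0.0
    for label, share in zip(BUCKETS, shares):
        running += share
        if running >= half:
            return label
    return BUCKETS[-1]


medians = {voice: median_bucket(row) for voice, row in SHARES.items()}
```

For Ember the cumulative share is 48.6% after the "5" bucket and 58.4% after "6–10", so the median user sits in 6–10. Campfire crosses the halfway mark inside the "5" bucket, and Mr. Gray and Sin cross it at "2".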

4 · Every number

Full table, both arms

Users, parts and the median bucket are read straight from Amplitude. Mean = parts ÷ users. Trim80 is computed in this page from the frequency histogram (bucket midpoints; ±1–2 on the two heaviest voices because of the open >100 bucket).
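
One way to compute trim80 from a bucketed histogram is sketched below. The representative value per bucket (range midpoints, plus a stand-in of 150 for the open >100 bucket) and the function name are assumptions; the page does not spell out its exact midpoint convention, so results may differ slightly from the published figures.

```python
def trim80_from_histogram(
    buckets: list[tuple[float, float]], keep: float = 0.8
) -> float:
    """buckets: (representative_parts, share_of_users), lightest bucket first."""
    total = sum(share for _, share in buckets)
    budget = keep * total  # share of users to include, counted from the bottom
    kept_share = 0.0
    weighted_parts = 0.0
    for value, share in buckets:
        # Take the whole bucket, or only the slice that still fits the budget.
        take = min(share, budget - kept_share)
        if take <= 0:
            break
        weighted_parts += value * take
        kept_share += take
    return weighted_parts / kept_share


# Toy histogram (not real data): the 80% cut falls inside the third bucket.
toy = [(1, 50.0), (3, 25.0), (8, 15.0), (150, 10.0)]
toy_trim80 = trim80_from_histogram(toy)  # (50*1 + 25*3 + 5*8) / 80 = 2.0625

# Ember's shares with assumed midpoints 8, 15.5, 35.5, 75.5 and 150 for >100.
ember = list(zip(
    [1, 2, 3, 4, 5, 8, 15.5, 35.5, 75.5, 150],
    [15.9, 14.3, 8.8, 5.9, 3.7, 9.8, 9.4, 14.0, 9.2, 9.0],
))
ember_trim80 = trim80_from_histogram(ember)  # roughly 9.6 under these assumptions
```

Under these assumed midpoints Ember comes out at roughly 9.6, close to the 9.5 reported on this page; the small gap presumably comes from a different midpoint or rounding convention.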

Voice               Users    Parts      Mean    Median   Trim80

Treatment arm · ByteDance 2.0 cohort
Ember               2,430    72,337     29.8    6–10     9.5
Sin                 561      11,542     20.6    2        3.1
Spark               777      8,015      10.3    2        2.1
Drowse              560      7,022      12.5    2        2.1
Mindstream          678      5,531      8.2     2        1.7
Cuppa               569      3,732      6.6     2        1.7
After Dark          489      3,322      6.8     2        1.5
Smoke · disabled    31       157        5.1     2        1.5
Blunt               587      3,389      5.8     2        1.5

Control arm · benchmark + incumbents
Campfire            5,287    153,161    29.0    5        8.7
Spotlight           1,727    38,525     22.3    2        3.5
Mr. Gray            1,144    27,563     24.1    2        3.3
After Dark          1,148    17,622     15.4    2        1.9
Westminster         1,337    14,596     10.9    2        1.9
Cuppa               1,157    11,159     9.6     2        1.6
Mindstream          1,334    10,709     8.0     2        1.6

5 · No signal

Voices that never showed up

Vivi

Declared default voice in byteDanceV2.ts (multilingual). 0 events across all languages in the treatment arm.

Tim

Declared secondary voice. Also 0 events everywhere. Ember appears to be absorbing the default slot instead.

Smoke

AAVE-prompt voice, disabled 24 Jun pending prompt work. 31 users / 157 parts before it was pulled — no usable signal.

6 · The read

Strongest, weakest, and what to cut

Ranked verdicts

Strongest

Ember is the stickiest voice in the whole test. Trim80 9.5 and a median of 6–10 parts beat Campfire (8.7, median 5) — the incumbent it appears to have replaced as the default.

Holds up

Sin matches Mr. Gray (trim80 3.1 vs 3.3) and carries the second-highest mean of any ByteDance voice — a small, devoted heavy-user base.

Mid-pack

Spark and Drowse (2.1) sit above the weak incumbents but below Spotlight and Mr. Gray.

Weakest live

Blunt (1.5) trails every incumbent, with almost no heavy tail — the one numbers-based cut among shipping voices.

Removal candidates

  • Smoke — already disabled 24 Jun; the AAVE-prompt version drew 31 users before it was pulled. Redesign the prompt or cut.
  • Vivi / Tim: do not remove yet. They are the declared primary/secondary voices but fire zero events across every language. That reads as an exposure or default-wiring bug, not user rejection. Fix first; removing would bury the problem.
  • Blunt — the only genuine metrics-based cut among live voices. Caveat: it is the sole British-native-accent ByteDance voice, so keeping one accent option is a product call, not a data call.

Hypotheses

The plain timbre is the win, not the gimmicks. Ember carries no style prompt yet out-sticks Campfire. A ByteDance voice may be a better global default.

Mood voices are niche-loyal. Sin (sensual) and Drowse (sleep) show low medians but high means — judge them on subsegment retention, not reach.

Prompt-engineered accents underdeliver. Blunt (Yorkshire) is weakest live; Smoke (AAVE) was pulled.

7 · How this was measured

Method, verification, caveats

Verification

  • Distinct users were cross-checked three ways and agree exactly. Ember: frequency-histogram sum 2,430 = sum of daily uniques (42+157+217+215+245+283+315+330+348+278 = 2,430) = daily-avg × days (243.2 × ~10). Campfire matches too.
  • Parts are Amplitude event totals. Mean = parts ÷ users. Median is the exact histogram bucket. Only trim80 is modeled (bucket midpoints).
  • is_default returns empty on treatment synthesis events, so Ember's default role is inferred from its 3× reach lead and Campfire-like curve, not proven by a property.
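
The arithmetic behind the first two checks can be replayed in a few lines. This is only a sketch using the figures quoted on this page; the variable names are assumptions.

```python
# Ember, treatment arm: daily unique users across the window (from the text).
daily_uniques = [42, 157, 217, 215, 245, 283, 315, 330, 348, 278]
histogram_users = 2_430  # sum of the frequency-histogram buckets
parts = 72_337           # Amplitude event total

# Check 1: the sum of daily uniques matches the histogram user count.
daily_sum = sum(daily_uniques)
counts_agree = daily_sum == histogram_users

# Check 2: mean = parts / users, rounded as in the tables.
mean_parts = round(parts / histogram_users, 1)
```

Here `counts_agree` is true and `mean_parts` is 29.8, matching the Ember row in the tables.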

Read it with these caveats

  • ~9 days only; cap moved 5k → 2,300 on 29 Jun; Smoke disabled mid-window. Everything except Ember vs Campfire is directional.
  • Sum-of-daily-uniques ≈ distinct users means nearly every user was active on a single day — so "sticky" here is depth-per-user within the window, not multi-day return (expected for a brand-new-user cohort).
  • This is parts voiced, not strictly listened — player events carry no preset id, same as the prior framework.
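
The single-day caveat follows from a counting identity: summing daily uniques counts each user once per active day, so dividing that sum by distinct users gives the average number of active days per user. A minimal sketch with toy user ids (not real data); the function name is an assumption:

```python
def mean_active_days(daily_user_ids: list[set[str]]) -> float:
    """Sum of daily uniques divided by distinct users over the whole window."""
    distinct = set().union(*daily_user_ids)
    if not distinct:
        raise ValueError("no users in the window")
    return sum(len(day) for day in daily_user_ids) / len(distinct)


# Every user appears on exactly one day: the ratio is exactly 1.0.
one_and_done = [{"a", "b"}, {"c"}, {"d", "e"}]
# Users come back on later days: the ratio rises above 1.0.
returning = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]

ratio_single = mean_active_days(one_and_done)  # 5 / 5 = 1.0
ratio_return = mean_active_days(returning)     # 7 / 3, about 2.33
```

With Ember's sum of daily uniques equal to its 2,430 distinct users, the ratio is 1.0, so the depth measured here was accumulated within a single active day per user.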

Source: Amplitude project 413385 (Peech Production) · event audio_stream_synthesizing_performed · group voicePresetName · segment gp:ab_bytedance_v2 · filter language = en. ByteDance 2.0 preset set from constants/byteDanceV2.ts. Reach chart: app.amplitude.com/…/g19o7ylr