Voice experiment · stickiness read

ByteDance 2.0 voices, first look

Same framework as before: parts voiced per user (one synthesis event = one part), English only, ranked on robust central tendency. The ByteDance 2.0 cohort is measured against Campfire and Mr. Gray in the control arm — same new-user population, same window — so it is a like-for-like read, not a comparison with their mature lifetime numbers.
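A minimal sketch of the counting rule, assuming each exported event row carries a user id, the voicePresetName group and a language property. The `SynthesisEvent` record and its field names are hypothetical, not Amplitude's export schema. Every audio_stream_synthesizing_performed event adds one part to that user's count for that voice, and non-English events are dropped.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class SynthesisEvent(NamedTuple):
    """One audio_stream_synthesizing_performed event (hypothetical field names)."""
    user_id: str
    voice_preset_name: str
    language: str


def parts_per_user(events: Iterable[SynthesisEvent], language: str = "en") -> dict[str, Counter]:
    """Map each voice to a Counter of user_id -> parts voiced (one event = one part)."""
    per_voice: dict[str, Counter] = defaultdict(Counter)
    for event in events:
        if event.language != language:
            continue  # English-only read
        per_voice[event.voice_preset_name][event.user_id] += 1
    return dict(per_voice)
```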

Cohort: gp:ab_bytedance_v2 = treatment vs control · Window: 23 Jun – 2 Jul 2026 (~9 days) · Event: audio_stream_synthesizing_performed · Amplitude-verified

Stickiest voice: Ember · 9.5 trim80 parts/user · beats Campfire's 8.7

Ember median vs Campfire: 6–10 vs Campfire's 5 parts · stickier at the median

ByteDance parts voiced (en): 102,462 across the 6 ByteDance voices that fired (Smoke included until it was pulled), ~9 days

Voices with no signal: 3 · Vivi & Tim = 0 · Smoke pulled

1 · Ranked by stickiness

Trim80 parts per user, every voice

Trim80 is the mean of the bottom 80% of users, dropping the heavy top-20% tail — the same headline number the framework has always used. ByteDance voices come from the treatment arm; Campfire, Mr. Gray and the other incumbents from the control arm.
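With per-user counts in hand, trim80 reduces to sorting and averaging the lightest 80% of users. A sketch in Python; the function name and the choice to round the 80% cutoff down to a whole user (minimum one) are assumptions, not the page's exact code.

```python
import math


def trim80(parts_per_user: list[int], keep: float = 0.8) -> float:
    """Mean parts per user over the lightest `keep` share of users (heavy tail dropped)."""
    if not parts_per_user:
        raise ValueError("no users")
    ranked = sorted(parts_per_user)
    # Assumption: the 80% cutoff is rounded down to a whole user, minimum one.
    n_keep = max(1, math.floor(len(ranked) * keep))
    return sum(ranked[:n_keep]) / n_keep
```

On a toy list of [1, 2, 3, 4, 100] the mean is 22 while trim80 is 2.5: a single heavy user moves the mean a lot and the headline number not at all.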

Legend: ByteDance 2.0 · Campfire (incumbent default) · Mr. Gray (OpenAI) · other incumbents

Voice	Users	Parts	Mean	Median	Trim80
Ember	2,430	72,337	29.8	6–10	9.5
Campfire	5,287	153,161	29.0	5	8.7
Spotlight	1,727	38,525	22.3	2	3.5
Mr. Gray	1,144	27,563	24.1	2	3.3
Sin	561	11,542	20.6	2	3.1
Spark	777	8,015	10.3	2	2.1
Drowse	560	7,022	12.5	2	2.1
After Dark	1,148	17,622	15.4	2	1.9
Westminster	1,337	14,596	10.9	2	1.9
Cuppa	1,157	11,159	9.6	2	1.6
Mindstream	1,334	10,709	8.0	2	1.6
Blunt	587	3,389	5.8	2	1.5

2 · Reach vs depth

How many users, and how hard they lean in

Horizontal = users who voiced at least one part with the voice. Vertical = mean parts per user. Bubble area = total parts voiced. Top-right is the prize (broad and deep); top-left (high but narrow) is a niche voice with a loyal heavy tail.

Reach vs depth uses the same per-voice numbers as the ranking table above.
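One detail worth getting right when redrawing this chart: because bubble area encodes total parts, the radius has to scale with the square root of parts, otherwise large voices look quadratically bigger than they are. A sketch with a hypothetical `bubble` helper and an assumed 40 px maximum radius; the inputs are Ember's and Campfire's users and parts from the ranking table.

```python
import math
from typing import NamedTuple


class Bubble(NamedTuple):
    x_users: int          # reach
    y_mean_parts: float   # depth
    radius_px: float      # chosen so that area is proportional to total parts


def bubble(users: int, parts: int, max_parts: int, max_radius_px: float = 40.0) -> Bubble:
    """Place one voice on the reach-vs-depth chart (hypothetical helper)."""
    # Area = pi * r^2, so r proportional to sqrt(parts) keeps area proportional to parts.
    radius = max_radius_px * math.sqrt(parts / max_parts)
    return Bubble(users, parts / users, radius)


MAX_PARTS = 153_161  # Campfire, the largest voice by total parts
ember = bubble(2_430, 72_337, MAX_PARTS)
campfire = bubble(5_287, 153_161, MAX_PARTS)
```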

3 · The shape of engagement

Per-user parts distribution — the four that matter

Each bar is the share of that voice's users who voiced a given number of parts. Ember and Campfire spread their weight into the heavy buckets (30–32% of users above 20 parts); Mr. Gray and Sin are front-loaded (42–45% are one-and-done) but keep a real heavy tail (5–7% above 100 parts), which is why their mean is high while their median stays at 2.

Ember · N 2,430 · med 6–10 · trim80 9.5

Campfire · N 5,287 · med 5 · trim80 8.7

Mr. Gray · N 1,144 · med 2 · trim80 3.3

Sin · N 561 · med 2 · trim80 3.1

x: parts voiced per user, bucketed (1 · 2 · 3 · 4 · 5 · 6–10 · 11–20 · 21–50 · 51–100 · >100) · y: % of the voice's users

Voice	1	2	3	4	5	6–10	11–20	21–50	51–100	>100
Ember	15.9%	14.3%	8.8%	5.9%	3.7%	9.8%	9.4%	14.0%	9.2%	9.0%
Campfire	17.1%	11.5%	10.7%	6.7%	4.3%	10.4%	9.4%	13.1%	9.2%	7.8%
Mr. Gray	45.3%	14.0%	3.8%	3.7%	1.3%	5.4%	5.5%	8.7%	5.1%	7.2%
Sin	42.2%	17.8%	4.3%	2.9%	1.8%	5.0%	5.3%	8.6%	6.6%	5.5%

Share of each voice's users, by parts-per-user bucket.
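The median column can be read straight off these shares: walk the buckets, accumulate, and stop at the first bucket where the running total reaches half the users. A sketch (function name hypothetical) that reproduces the page's medians of 6–10, 5, 2 and 2. It halves the row total rather than a fixed 50% to absorb rounding, since the Campfire row sums to 100.2%.

```python
BUCKETS = ["1", "2", "3", "4", "5", "6–10", "11–20", "21–50", "51–100", ">100"]

# Share of each voice's users per bucket (%), from the distribution table above.
SHARES = {
    "Ember":    [15.9, 14.3, 8.8, 5.9, 3.7, 9.8, 9.4, 14.0, 9.2, 9.0],
    "Campfire": [17.1, 11.5, 10.7, 6.7, 4.3, 10.4, 9.4, 13.1, 9.2, 7.8],
    "Mr. Gray": [45.3, 14.0, 3.8, 3.7, 1.3, 5.4, 5.5, 8.7, 5.1, 7.2],
    "Sin":      [42.2, 17.8, 4.3, 2.9, 1.8, 5.0, 5.3, 8.6, 6.6, 5.5],
}


def median_bucket(shares_pct: list[float]) -> str:
    """Return the first bucket at which the cumulative share of users reaches half."""
    half = sum(shares_pct) / 2  # rows sum to ~100 after rounding
    running = 0.0
    for label, share in zip(BUCKETS, shares_pct):
        running += share
        if running >= half:
            return label
    raise ValueError("empty histogram")
```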

4 · Every number

Full table, both arms

Users, parts and the median bucket are read straight from Amplitude. Mean = parts ÷ users. Trim80 is computed on this page from the frequency histogram (bucket midpoints; ±1–2 on the two heaviest voices because of the open >100 bucket).
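A sketch of the histogram approximation, assuming each bucket's users sit at the bucket midpoint, (low + high) / 2, and that the 80% budget is taken against the row total. The page's exact midpoint convention is not stated, so this lands about a tenth off the published figures (roughly 9.6 for Ember and 8.8 for Campfire against 9.5 and 8.7), with the ordering unchanged. With these midpoints, the 80% cut for each of the four voices in the distribution table falls inside the 21–50 bucket. The function name is hypothetical.

```python
# (low, high) parts per bucket; the open ">100" bucket has no upper bound.
BOUNDS = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5),
          (6, 10), (11, 20), (21, 50), (51, 100), (101, None)]

EMBER = [15.9, 14.3, 8.8, 5.9, 3.7, 9.8, 9.4, 14.0, 9.2, 9.0]
CAMPFIRE = [17.1, 11.5, 10.7, 6.7, 4.3, 10.4, 9.4, 13.1, 9.2, 7.8]


def trim80_from_histogram(shares_pct: list[float], keep: float = 0.8) -> float:
    """Approximate trim80: mean of the lightest `keep` share of users at bucket midpoints."""
    budget = keep * sum(shares_pct)  # share of users to keep, normalised by the row total
    kept = 0.0
    weighted = 0.0
    for (low, high), share in zip(BOUNDS, shares_pct):
        remaining = budget - kept
        if high is None:
            raise ValueError("80% cut reaches the open >100 bucket; midpoint undefined")
        take = min(share, remaining)  # only part of the bucket where the cut falls
        weighted += take * (low + high) / 2
        kept += take
        if share >= remaining:  # the cut lands in this bucket, so stop here
            break
    return weighted / kept
```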

Voice	Users	Parts	Mean	Median	Trim80
Treatment arm · ByteDance 2.0 cohort
Ember	2,430	72,337	29.8	6–10	9.5
Sin	561	11,542	20.6	2	3.1
Spark	777	8,015	10.3	2	2.1
Drowse	560	7,022	12.5	2	2.1
Mindstream	678	5,531	8.2	2	1.7
Cuppa	569	3,732	6.6	2	1.7
After Dark	489	3,322	6.8	2	1.5
Smoke · disabled	31	157	5.1	2	1.5
Blunt	587	3,389	5.8	2	1.5
Control arm · benchmark + incumbents
Campfire	5,287	153,161	29.0	5	8.7
Spotlight	1,727	38,525	22.3	2	3.5
Mr. Gray	1,144	27,563	24.1	2	3.3
After Dark	1,148	17,622	15.4	2	1.9
Westminster	1,337	14,596	10.9	2	1.9
Cuppa	1,157	11,159	9.6	2	1.6
Mindstream	1,334	10,709	8.0	2	1.6
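The arithmetic behind this table and the headline card can be re-run in a few lines. A sketch: it recomputes each treatment-arm mean as parts ÷ users and sums the ByteDance 2.0 parts. Which voices count as ByteDance is inferred here (Mindstream, Cuppa and After Dark also appear in the control arm, so they are treated as incumbents); with that split, the six remaining voices, Smoke included, sum to the 102,462 on the summary card.

```python
# voice: (users, parts, published mean), treatment arm of the table above
TREATMENT = {
    "Ember": (2_430, 72_337, 29.8),
    "Sin": (561, 11_542, 20.6),
    "Spark": (777, 8_015, 10.3),
    "Drowse": (560, 7_022, 12.5),
    "Mindstream": (678, 5_531, 8.2),
    "Cuppa": (569, 3_732, 6.6),
    "After Dark": (489, 3_322, 6.8),
    "Smoke": (31, 157, 5.1),
    "Blunt": (587, 3_389, 5.8),
}
# Assumption: these three are incumbents also served in the treatment arm,
# so they are excluded from the ByteDance 2.0 total.
INCUMBENTS_IN_TREATMENT = {"Mindstream", "Cuppa", "After Dark"}


def mean_parts(users: int, parts: int) -> float:
    """Mean = parts ÷ users, rounded to one decimal as on the page."""
    return round(parts / users, 1)


bytedance_parts = sum(
    parts
    for voice, (_, parts, _) in TREATMENT.items()
    if voice not in INCUMBENTS_IN_TREATMENT
)
```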

5 · No signal

Voices that never showed up

Vivi

Declared default voice in byteDanceV2.ts (multilingual). 0 events across all languages in the treatment arm.

Tim

Declared secondary voice. Also 0 events everywhere. Ember appears to be absorbing the default slot instead.

Smoke

AAVE-prompt voice, disabled 24 Jun pending prompt work. 31 users / 157 parts before it was pulled — no usable signal.

6 · The read

Strongest, weakest, and what to cut

Ranked verdicts

Strongest

Ember is the stickiest voice in the whole test. Trim80 9.5 and a median of 6–10 parts beat Campfire (8.7, median 5) — the incumbent it appears to have replaced as the default.

Holds up

Sin nearly matches Mr. Gray (trim80 3.1 vs 3.3) and carries the second-highest mean of any ByteDance voice, pointing to a small, devoted heavy-user base.

Mid-pack

Spark and Drowse (2.1) sit above the weak incumbents but below Spotlight and Mr. Gray.

Weakest live

Blunt (1.5) trails every incumbent, with almost no heavy tail — the one numbers-based cut among shipping voices.

Removal candidates

Smoke — already disabled 24 Jun; the AAVE-prompt version drew 31 users before it was pulled. Redesign the prompt or cut.
Vivi / Tim — do not remove yet. They are the declared primary/secondary voices but fire zero events across every language. That reads as an exposure or default-wiring bug, not user rejection. Fix first; removing would bury the problem.
Blunt — the only genuine metrics-based cut among live voices. Caveat: it is the only ByteDance voice with a native British accent, so keeping one accent option is a product call, not a data call.

Hypotheses

The plain timbre is the win, not the gimmicks. Ember carries no style prompt yet out-sticks Campfire. A ByteDance voice may be a better global default.

Mood voices are niche-loyal. Sin (sensual) and Drowse (sleep) show low medians but high means — judge them on subsegment retention, not reach.

Prompt-engineered accents underdeliver. Blunt (Yorkshire) is weakest live; Smoke (AAVE) was pulled.

7 · How this was measured

Method, verification, caveats

Verification

Distinct users were cross-checked three ways. Ember: the frequency-histogram sum 2,430 equals the sum of daily uniques exactly (42+157+217+215+245+283+315+330+348+278 = 2,430), and daily-avg × days (243.2 × ~10 ≈ 2,432) agrees within rounding. Campfire matches too.
Parts are Amplitude event totals. Mean = parts ÷ users. Median is the exact histogram bucket. Only trim80 is modeled (bucket midpoints).
is_default returns empty on treatment synthesis events, so Ember's default role is inferred from its 3× reach lead over the next ByteDance voice (2,430 users vs Spark's 777) and its Campfire-like curve, not proven by a property.
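The user cross-check, and the single-day reading in the caveats, rest on one identity: summing daily uniques counts each user once per day they were active. A sketch (function name hypothetical) using Ember's daily series from the verification note: the sum equals the histogram total exactly, so the average Ember user was active on 1.0 days in the window.

```python
# Ember's daily distinct users across the window, as listed in the verification note.
EMBER_DAILY_UNIQUES = [42, 157, 217, 215, 245, 283, 315, 330, 348, 278]
EMBER_DISTINCT_USERS = 2_430  # frequency-histogram total


def mean_active_days(daily_uniques: list[int], distinct_users: int) -> float:
    """Average number of active days per user in the window.

    Each user adds 1 to the daily-unique count on every day they voice a part,
    so sum(daily_uniques) equals the total of active days over users, which is
    >= distinct_users, with equality only if every user was active on one day.
    """
    return sum(daily_uniques) / distinct_users


days = mean_active_days(EMBER_DAILY_UNIQUES, EMBER_DISTINCT_USERS)
```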

Read it with these caveats

~9 days only; cap moved 5k → 2,300 on 29 Jun; Smoke disabled mid-window. Everything except Ember vs Campfire is directional.
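Sum of daily uniques ≈ distinct users means nearly every user was active on a single day: each user adds one to the daily count for every day they voice, so the two totals can only match if almost nobody returned. "Sticky" here is therefore depth per user within the window, not multi-day return (expected for a brand-new-user cohort).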
This counts parts voiced, not parts actually listened to: player events carry no preset id, the same limitation as the prior framework.

Source: Amplitude project 413385 (Peech Production) · event audio_stream_synthesizing_performed · group voicePresetName · segment gp:ab_bytedance_v2 · filter language = en. ByteDance 2.0 preset set from constants/byteDanceV2.ts. Reach chart: app.amplitude.com/…/g19o7ylr