Voice Cloning
Last updated: 2026-03-09
How Voice Cloning Works
Altrixy's voice cloning uses ElevenLabs' neural voice synthesis technology to create a digital replica of your voice:
- Sample Collection — You provide voice recordings that capture your vocal characteristics
- Feature Extraction — The AI analyzes pitch, tone, cadence, accent, and speaking patterns
- Model Training — A custom voice model is trained on your samples (takes 5-15 minutes)
- Synthesis — The trained model generates new speech in your voice from any text input
The resulting voice clone captures your unique vocal identity while producing natural-sounding speech on any topic.
Preparing Voice Samples
The quality of your voice clone depends heavily on the quality and variety of your samples.
What makes a good sample:
- Clear audio with minimal background noise
- Your natural speaking voice (not the voice you use when reading aloud)
- Varied sentence structures and emotions
- Mix of short sentences, long sentences, and questions
- At least some industry-specific vocabulary you commonly use
What to avoid:
- Whispering or shouting
- Background music or ambient noise
- Multiple speakers in the same recording
- Heavy audio processing or filters
- Reading in a monotone voice
Upload Requirements
Voice samples must meet the following specifications:
| Requirement | Specification |
|---|---|
| Format | WAV, MP3, M4A, or FLAC |
| Minimum duration | 30 seconds per sample |
| Maximum duration | 5 minutes per sample |
| Minimum samples | 3 samples |
| Recommended samples | 5-10 samples |
| Total audio | At least 3 minutes, ideally 10+ minutes |
| Sample rate | 16 kHz minimum (44.1 kHz recommended) |
| Channels | Mono preferred (stereo is accepted) |
| File size | Max 50MB per file |
More audio data generally produces a better clone. If you can provide 10+ minutes of varied speech, the AI can capture more nuances of your voice.
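The requirements in the table above can be checked locally before uploading. The following is a minimal sketch, not part of Altrixy itself: `SampleInfo`, `read_wav_info`, `check_sample`, and `check_sample_set` are hypothetical names, the 50MB limit is assumed to mean 50 MiB, and the metadata reader only handles uncompressed WAV files through Python's standard `wave` module.

```python
from __future__ import annotations

import wave
from dataclasses import dataclass
from pathlib import Path

# Limits copied from the Upload Requirements table.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}
MIN_SECONDS = 30
MAX_SECONDS = 5 * 60
MIN_SAMPLE_RATE_HZ = 16_000
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB
MIN_SAMPLE_COUNT = 3
MIN_TOTAL_SECONDS = 3 * 60


@dataclass
class SampleInfo:
    """Hypothetical container for the metadata of one voice sample."""
    filename: str
    duration_s: float
    sample_rate_hz: int
    channels: int
    size_bytes: int


def read_wav_info(path: str) -> SampleInfo:
    """Read metadata from an uncompressed WAV file (other formats need another reader)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return SampleInfo(
            filename=Path(path).name,
            duration_s=wav.getnframes() / rate,
            sample_rate_hz=rate,
            channels=wav.getnchannels(),
            size_bytes=Path(path).stat().st_size,
        )


def check_sample(info: SampleInfo) -> list[str]:
    """Return the per-file requirements that this sample fails (empty if none)."""
    problems = []
    if Path(info.filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    if info.duration_s < MIN_SECONDS:
        problems.append("shorter than 30 seconds")
    if info.duration_s > MAX_SECONDS:
        problems.append("longer than 5 minutes")
    if info.sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        problems.append("sample rate below 16 kHz")
    if info.size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 50 MB")
    # Stereo is accepted, so the channel count is not treated as an error.
    return problems


def check_sample_set(samples: list[SampleInfo]) -> list[str]:
    """Return the set-level requirements that the whole upload fails."""
    problems = []
    if len(samples) < MIN_SAMPLE_COUNT:
        problems.append("fewer than 3 samples")
    if sum(s.duration_s for s in samples) < MIN_TOTAL_SECONDS:
        problems.append("less than 3 minutes of total audio")
    return problems
```

An empty list from both functions means the samples meet the documented minimums; it says nothing about audio quality, which the platform's own validation step still checks.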
Cloning Process and Timeline
After uploading your samples:
- Validation (instant) — Samples are checked for quality, duration, and format compliance
- Preprocessing (1-2 minutes) — Audio is normalized, trimmed, and optimized
- Training (5-15 minutes) — The neural voice model is trained on your data
- Quality Check (1-2 minutes) — The system generates test phrases and verifies quality
- Ready — Your voice clone is available for use
You will receive an in-app notification and, optionally, an email when your clone is ready. The entire process typically takes 10-20 minutes from upload to completion.
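As a quick check on the timeline, the per-stage estimates listed above can be added up. This sketch only does that arithmetic; the `STAGES` dictionary and `total_range` helper are illustrative names, and the instant validation step is counted as zero minutes.

```python
# Stage estimates in minutes as (low, high), copied from the stage list.
STAGES = {
    "validation": (0, 0),      # described as instant
    "preprocessing": (1, 2),
    "training": (5, 15),
    "quality_check": (1, 2),
}


def total_range(stages):
    """Sum the low and high estimates across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


print(total_range(STAGES))  # (7, 19)
```

The stages alone add up to roughly 7-19 minutes, which is in line with the typical 10-20 minutes quoted once upload time is included.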
Testing Your Clone
After training completes, test your voice clone:
1. Go to Voice Studio > My Voices
2. Click Test on your new voice clone
3. Type a sentence or use one of the test prompts provided
4. Click Generate Preview
5. Listen to the output and compare it with your natural voice
If the clone does not sound right, you can:
- Add more samples — More data improves quality
- Replace low-quality samples — Remove noisy or unnatural recordings
- Retrain — Click "Retrain Model" to rebuild with updated samples
- Adjust settings — Tune stability and clarity sliders
Updating Voice Samples
Your voice clone can be updated at any time:
1. Go to Voice Studio > My Voices
2. Click the voice clone you want to update
3. In the Samples tab, you can:
   - Add new samples with the Upload button
   - Remove existing samples by clicking the X on each
   - Re-record samples using the built-in recorder
4. Click Retrain Model to rebuild the clone with updated samples
Retraining follows the same timeline as the initial training (5-15 minutes). Your previous voice clone remains available until the new training completes.
Consider re-recording your samples and retraining the clone every 6-12 months, or whenever you notice that your natural speaking style has changed.