We should initialize current_voice_id so that SSML processing knows which voice is currently active and does not try to change it unless really needed.
A corpus of SSML input files will also be needed for fuzzing; see https://github.com/espeak-ng/espeak-ng/issues/407