I designed a very similar system almost two years ago for VOIS. It worked
remarkably well, although it was not very scalable because of the price of
voice recognition back then. It used a variation of HTML called VML (Voice Markup
Language) which had voice-oriented tags for annotating text and specifying
input parameters. The company went bankrupt, so it never saw the light of
day. I am happy to see VoxML announced because there is a huge market for
such technology.
It will be very interesting to apply XSL to VoxML.
Don Park
Docuverse