Amazon Polly

Amazon Polly is a text-to-speech (TTS) service that, according to Amazon, uses deep learning to synthesize speech.

The console interface is straightforward: one first selects one of the two engines, Neural or Standard. The Neural engine sounds more human. The available languages and voices depend on the engine. Voices reflect the dialect, register, and jargon typical of a region or country. For example, under English, there are Australian, British, Indian, Irish, New Zealand, South African, US, and Welsh variants.

Once the language is selected, the Voice drop-down list is populated with the voices available for that language. As with languages, certain voices are only available for a particular engine. Voices vary in pitch, loudness, speed, and tonality to simulate different ages and sexes. This variety makes it possible to generate speech that sounds like, for example, a young girl or an old man.
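
The same engine and language filtering can also be done programmatically. The following is a minimal sketch, assuming the boto3 Polly client and its `describe_voices` call; the helper names `list_voices` and `summarize` are made up for this illustration. The client is passed in as a parameter so that the logic can be tried without AWS credentials.

```python
from typing import Any, Dict, List, Tuple


def list_voices(client: Any, engine: str, language_code: str) -> List[Dict[str, Any]]:
    """Collect every voice reported for one engine and language.

    `client` is assumed to be a boto3 Polly client (boto3.client("polly")).
    The loop follows NextToken in case the result spans several pages.
    """
    voices: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"Engine": engine, "LanguageCode": language_code}
    while True:
        page = client.describe_voices(**kwargs)
        voices.extend(page.get("Voices", []))
        token = page.get("NextToken")
        if not token:
            return voices
        kwargs["NextToken"] = token


def summarize(voices: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Reduce the voice records to sorted (voice id, gender) pairs."""
    return sorted((voice["Id"], voice["Gender"]) for voice in voices)


# Usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   polly = boto3.client("polly")
#   for voice_id, gender in summarize(list_voices(polly, "neural", "en-GB")):
#       print(voice_id, gender)
```

Running the helper once per engine for the same language code is a quick way to see which voices exist only under one of them.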

There is an option to enable Speech Synthesis Markup Language (SSML) to fine-tune the generated output. Amazon supports a subset of the W3C SSML v1.1 recommendation. The two engines differ in the tags they support.
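
Outside the console, SSML input is selected per request. Below is a minimal sketch, assuming the boto3 Polly client and its `synthesize_speech` call; the helper names `ssml_request` and `synthesize` are hypothetical, and the `<speak>` check is only a coarse sanity test, not SSML validation.

```python
from typing import Any, Dict


def ssml_request(ssml: str, voice_id: str, engine: str = "standard") -> Dict[str, str]:
    """Build keyword arguments for a synthesize_speech call with SSML input."""
    body = ssml.strip()
    # Polly expects SSML input to be wrapped in a <speak> element.
    if not (body.startswith("<speak") and body.endswith("</speak>")):
        raise ValueError("SSML input must be wrapped in a <speak> element")
    return {
        "Engine": engine,
        "VoiceId": voice_id,
        "TextType": "ssml",  # marks the input as SSML rather than plain text
        "Text": body,
        "OutputFormat": "mp3",
    }


def synthesize(client: Any, request: Dict[str, str]) -> bytes:
    """Send the request and return the audio bytes from the response stream.

    `client` is assumed to be a boto3 Polly client.
    """
    response = client.synthesize_speech(**request)
    return response["AudioStream"].read()


# Usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   polly = boto3.client("polly")
#   request = ssml_request("<speak>Hello <break strength='strong'/> world</speak>", "Joanna")
#   with open("hello.mp3", "wb") as handle:
#       handle.write(synthesize(polly, request))
```

Because the engines support different tags, a request that works with `engine="standard"` may be rejected when switched to `"neural"`.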

In the video below, we take a deep dive into the available engines and voices and walk through an end-to-end process for text-to-speech production.

Under Advanced Settings, one can control the sample rate, file format, and further customize pronunciation.
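
These settings have counterparts in the API. The sketch below assumes the `OutputFormat`, `SampleRate`, and `LexiconNames` parameters of boto3's `synthesize_speech`; the helper `advanced_request`, the lexicon name `acronyms`, and the short list of accepted formats are assumptions made for this illustration, not an exhaustive description of the service.

```python
from typing import Any, Dict, List, Optional

# Audio formats accepted by this sketch; the service may offer others.
AUDIO_FORMATS = ("mp3", "ogg_vorbis", "pcm")


def advanced_request(
    text: str,
    voice_id: str,
    output_format: str = "mp3",
    sample_rate: Optional[int] = None,
    lexicon_names: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build synthesize_speech keyword arguments with the optional settings."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    request: Dict[str, Any] = {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }
    if sample_rate is not None:
        # The API takes the sample rate in Hz as a string, not a number.
        request["SampleRate"] = str(sample_rate)
    if lexicon_names:
        # Names of pronunciation lexicons already uploaded to the account.
        request["LexiconNames"] = list(lexicon_names)
    return request


# Usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   polly = boto3.client("polly")
#   request = advanced_request("SAAS is popular.", "Joanna",
#                              output_format="pcm", sample_rate=16000,
#                              lexicon_names=["acronyms"])
#   audio = polly.synthesize_speech(**request)["AudioStream"].read()
```

Omitting `sample_rate` leaves the choice to the service, which is usually the simplest option unless a downstream tool needs a specific rate.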

Amazon Polly can integrate with other AWS services, which lets developers build end-to-end solutions that involve TTS.
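
One example of such an integration is an asynchronous synthesis task that writes its audio to Amazon S3. The following sketch assumes the boto3 calls `start_speech_synthesis_task` and `get_speech_synthesis_task`; the helper names, the bucket name, and the key prefix are hypothetical and only illustrate the flow.

```python
from typing import Any, Dict, Optional, Tuple


def start_s3_task(
    client: Any, text: str, voice_id: str, bucket: str, key_prefix: str = ""
) -> str:
    """Start an asynchronous synthesis task that stores its output in S3.

    `client` is assumed to be a boto3 Polly client. Returns the task id.
    """
    kwargs: Dict[str, Any] = {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,
    }
    if key_prefix:
        kwargs["OutputS3KeyPrefix"] = key_prefix
    response = client.start_speech_synthesis_task(**kwargs)
    return response["SynthesisTask"]["TaskId"]


def task_status(client: Any, task_id: str) -> Tuple[str, Optional[str]]:
    """Return the task's status and, if present, the URI of its output."""
    task = client.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
    return task["TaskStatus"], task.get("OutputUri")


# Usage (needs boto3, AWS credentials, and an existing bucket; commented out):
#   import boto3
#   polly = boto3.client("polly")
#   task_id = start_s3_task(polly, "Welcome to this demo.", "Joanna",
#                           "my-example-bucket", "polly/")
#   print(task_status(polly, task_id))
```

The task runs in the background, so the status has to be polled (or a notification configured) before the audio file can be read from the bucket.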

Code used in this How-To

<speak>
  Welcome to this Demo.
  The French word for cat is <lang xml:lang="fr-FR">chat</lang>.
  He prefers to eat pasta that is <lang xml:lang="it-IT">al dente</lang>.
</speak>

<speak>
  <p>
    <prosody rate="slow">
      <prosody volume="x-loud">To be, or not to be <break strength="x-strong" /></prosody>
      <prosody volume="loud"><amazon:breath duration="x-long" volume="medium" />that is the question</prosody>
    </prosody>
  </p>
  <prosody pitch="low">
    <p>
      Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune,<break strength="x-strong" /> <amazon:breath duration="x-long" volume="medium" /> or to take arms against a sea of troubles.
    </p>
    And by opposing end them, <prosody pitch="x-low"><amazon:effect vocal-tract-length="+10%">To die <break strength="x-strong" /> to sleep</amazon:effect></prosody>
  </prosody>
</speak>

<speak>
  <p>
    <say-as interpret-as="spell-out">SAAS</say-as> stands for <break strength="strong" />Software <break strength="weak" />As <break strength="none" />A <break strength="weak" /> Service.
  </p>
</speak>
