January 20, 2021

A Look into Speech Support in CSS3

  • By Vipul Patel
Editor's Note:

This article is forward-looking. Future browsers are likely to support speech synthesis that is configured and driven from a web page's style sheets. Currently, the specification is at the "Candidate Recommendation" stage. As such, most browsers don't support it today, but the expectation is that they will once the specification is fully approved.


The speech module of CSS, available at http://www.w3.org/TR/css3-speech, describes the properties that let web content developers declare how web documents are rendered via speech synthesis. The specification also defines optional audio cues that can be used in web documents.

The CSS3 speech properties control the pitch, rate, and volume of synthesized speech, and they let authors select text-to-speech voices. Working together with visual properties, these speech properties provide a rich presentation experience to the readers browsing the content.

The CSS3 Speech Module introduces a new "box model" for the aural dimension. User agents with text-to-speech capability can be targeted in the following ways:

  • By specifying the "speech" media type in the media attribute of the link element
  • By using the @media at-rule
  • Within an @import statement
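
The three targeting mechanisms above can be sketched as follows. This is an illustration only: the file name speech.css and the h1 rule are assumptions made for the example, and it presumes a user agent that recognizes the "speech" media type.

```css
/* Assumption: speech.css is a hypothetical style sheet holding speech rules.
   @import rules have to appear before ordinary style rules in a sheet. */
@import url("speech.css") speech;

/* Rules inside this block apply only when the "speech" media type matches. */
@media speech {
   h1 {
      voice-family: female;
   }
}

/* HTML equivalent, using the media attribute of the link element:
   <link rel="stylesheet" href="speech.css" media="speech"> */
```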

Let us take a look at the various speech-related CSS3 properties.


The voice-volume property controls the amplitude (volume) of the audio generated by the text-to-speech engine. It can also be used to adjust the relative volume level of the audio cues of a selected element.

Valid values are:

  • silent: This value means no sound is generated.
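  • x-soft/soft/medium/loud/x-loud: These values represent monotonically non-decreasing volume levels (the exact mapping depends on the user agent). The default value of voice-volume is "medium".
  • <decibel>: Any number followed by "dB". It expresses a positive or negative change relative to the keyword it accompanies or, when used alone, relative to the inherited volume level.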
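
As a rough sketch of how these values might be written, assuming the Candidate Recommendation grammar, in which a keyword and a decibel offset may be combined. The class names are hypothetical.

```css
/* Assumption: all selectors below are made up for illustration. */
.aside    { voice-volume: soft; }       /* keyword only */
.warning  { voice-volume: loud +3dB; }  /* keyword plus a decibel offset */
.footnote { voice-volume: -6dB; }       /* offset applied to the inherited level */
.decor    { voice-volume: silent; }     /* no sound is generated */
```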


The voice-balance property specifies the balance (spatial distribution) of the audio output between the left and right channels, relative to the listener's position. The valid range is from -100 to 100.

Valid values are:

  • <number>: Any number between -100 and 100 with 0 representing the center point.
  • left: This is equal to -100 and pushes the audio output to the left side only.
  • right: This is equal to 100 and pushes the audio output to the right side only.
  • center: This is equal to 0.
  • leftwards: This moves the sound to the left by subtracting 20 from the inherited "voice-balance" value, down to a minimum of -100.
  • rightwards: This moves the sound to the right by adding 20 to the inherited "voice-balance" value, up to a maximum of 100.
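
A short sketch of the values listed above, using hypothetical class names for a two-speaker dialogue. The value noted in the last comment follows from the step of 20 described above.

```css
/* Assumption: the class names are invented for this example. */
.speaker-a        { voice-balance: left; }        /* same as -100 */
.speaker-b        { voice-balance: 40; }          /* somewhat right of center */
.narrator         { voice-balance: center; }      /* same as 0 */
.speaker-a .aside { voice-balance: rightwards; }  /* inherited -100 becomes -80 */
```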


The speak property determines whether or not the text of an element is rendered aurally.

Valid values are:

  • auto: When the 'display' property is set to 'none', this resolves to 'none'; otherwise, it behaves as 'normal'.
  • none: This prevents the element from being rendered aurally, so it has no effect in the aural dimension.
  • normal: This causes the element to be rendered aurally.
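
A minimal sketch using the keywords as listed above. The class names are assumptions.

```css
/* Assumption: hypothetical class names. */
.hidden-note { display: none; speak: auto; }  /* auto follows 'display', so not spoken */
.decorative  { speak: none; }                 /* never rendered aurally */
.caption     { speak: normal; }               /* rendered aurally */
```

Note that, as the specification describes it, "speak: none" differs from "voice-volume: silent": a silent element still occupies the time it would take to speak it, whereas an element with "speak: none" is skipped altogether.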


The speak-as property determines how the text is rendered aurally.

Valid values are:

  • normal: This tells the user agent to use language-dependent pronunciation rules to render the content.
  • spell-out: This specifies that the text be spelled out one letter at a time.
  • digits: This specifies that numbers be spoken one digit at a time.
  • literal-punctuation: This specifies that punctuation be named aloud.
  • no-punctuation: This specifies that punctuation is not rendered.
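
The values above might be applied as in the following sketch. The class names are hypothetical, and the last rule assumes the Candidate Recommendation grammar, which allows "spell-out" and "digits" to be combined.

```css
/* Assumption: hypothetical class names. */
.acronym  { speak-as: spell-out; }            /* "CSS" read as C, S, S */
.phone    { speak-as: digits; }               /* "2014" read as two, zero, one, four */
.code     { speak-as: literal-punctuation; }  /* punctuation marks are named aloud */
.headline { speak-as: no-punctuation; }       /* punctuation is neither spoken nor paused on */
.part-no  { speak-as: spell-out digits; }     /* combined keywords */
```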


The pause-before and pause-after properties specify a silence before/after the speech synthesis of the selected element. These properties can be written in shorthand by using the "pause" property.

Valid values are:

  • <time>: A non-negative value that expresses the pause in absolute time units (seconds or milliseconds).
  • none: No pause is inserted.
  • x-weak/weak/medium/strong/x-strong: A pause of increasing strength whose exact duration is defined by the user agent.
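
A brief sketch of the longhand and shorthand forms. It assumes the shorthand takes the "before" value first and an optional "after" value second, as in the Candidate Recommendation, with a single value applying to both sides.

```css
/* Longhand: half a second of silence before, a quarter second after. */
h2 { pause-before: 500ms; pause-after: 250ms; }

/* Shorthand with the same effect (before value, then after value). */
h3 { pause: 500ms 250ms; }

/* Keyword strengths: the actual duration is left to the user agent. */
p  { pause-after: weak; }

/* A single shorthand value applies to both sides. */
li { pause: none; }
```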


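The rest-before and rest-after properties also specify a silence of a specific duration before/after the speech synthesis of the selected element. They differ from the pause properties in where the silence falls within the aural box model: a rest sits between the element's content and its cue, whereas a pause sits outside the cue. These properties can be written in shorthand by using the "rest" property. Valid values are the same as for the "pause-before" and "pause-after" properties.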
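
For illustration, a sketch of the rest properties on hypothetical quotation markup. It assumes the same value grammar as the pause properties.

```css
/* Silence placed next to the content itself, inside any cue. */
blockquote {
   rest-before: 300ms;
   rest-after: strong;   /* duration chosen by the user agent */
}

/* Shorthand: a single value applies before and after. */
cite { rest: 200ms; }
```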


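The cue-before and cue-after properties specify pre-recorded audio clips (auditory icons) to be played before/after the selected element. Each takes a <uri> and an optional <decibel> value that adjusts the volume of the clip, or the keyword "none". They can be written in shorthand by using the "cue" property.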
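
A sketch of the cue properties follows. The audio file paths are placeholders, not real assets, and the shorthand assumes the "before" cue is listed first.

```css
/* Assumption: the .wav paths are hypothetical. */

/* Play a clip before each link, 3 dB quieter than the surrounding voice volume. */
a  { cue-before: url(../audio/link.wav) -3dB; }

/* Shorthand: cue played before, then cue played after. */
h1 { cue: url(../audio/open.wav) url(../audio/close.wav); }

/* Suppress an inherited or previously declared cue. */
h1.plain { cue: none; }
```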


No browser today supports the CSS3 speech module. Once support arrives, we will be able to hear different voices as different parts of the page are read aloud.


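<!DOCTYPE html>
<html>
<head>
   <meta charset="utf-8">
   <title>CSS3 speech sample</title>
   <style>
      /* Heading: named voice, with an audio cue played before it */
      h1 {
         voice-family: paul;
         voice-stress: moderate;
         cue-before: url(../audio/ping.wav);
         voice-volume: medium 8dB;
      }

      /* Paragraph: female voice, panned left, higher pitch, quieter */
      p.helen {
         voice-family: female;
         voice-balance: left;
         voice-pitch: high;
         voice-volume: -6dB;
      }
   </style>
</head>
<body>
   <h1>CSS3 speech sample</h1>
   <p class="helen">Demo showing how speech
      will work with CSS3</p>
</body>
</html>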

In this article, we learned about the various speech-related properties that are specified in the CSS3 speech module. This module is currently at the "Candidate Recommendation" stage.



About the Author

Vipul Patel is a technology geek based in Seattle. He can be reached at vipul.patel@hotmail.com. You can visit his LinkedIn profile at https://www.linkedin.com/pub/vipul-patel/6/675/508.

This article was originally published on October 17, 2014
