
ssml-check-core's Introduction

SSML-Check-Core

SSML-Check-Core verifies that a given input is valid SSML.

Usage

This library exposes two functions that let you check, and optionally correct, a given SSML string.

Check

The first is check which verifies whether the given input is a valid SSML string on either the Amazon Alexa or Google Assistant platform (or both). This function returns a Promise with an array of errors indicating how the input fails validation, or a Promise of undefined if there are no errors.

check(ssml, options)

The arguments to this function are:

  • ssml - The SSML to check
  • options - Options for evaluating the SSML as noted below

The options structure is composed of the following fields with the following default values:

{
  platform: 'all',           // The voice platform to evaluate this SSML against.
                             // Valid values are "all", "amazon", or "google".
  locale:undefined,          // The locale you want to check against, used for certain
                             // locale-specific attributes like amazon:emotion
  unsupportedTags:undefined, // An array of tags that will be flagged as invalid
                             // For example, ['prosody']
  getPositions:false,        // If set, the index of the tag will be returned as the position
                             // field within the error object
}

The return value is a Promise resolving to an array of errors that were encountered in processing the SSML, or undefined if no errors were encountered. The format of each error object is as follows:

{
  type,       // The type of error encountered ("tag" or a specific error)
  tag,        // The tag that had an error (set if type is "tag")
  attribute,  // The attribute that had an error (set if type is "tag")
  value,      // The attribute value that was in error (set if type is "tag" or "audio")
  position,   // The position of the start of the tag within the input string (set if getPositions is true)
  message,    // A fully formed human readable string with details of the error. (set if error includes a message) 
}
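
As a minimal sketch of how the error objects above might be consumed, the helper below turns each one into a single log line. describeError and describeErrors are hypothetical names used for illustration only; they are not part of ssml-check-core.

```javascript
// Hypothetical helper (not part of ssml-check-core): builds one log line
// from an error object in the shape documented above, skipping unset fields.
function describeError(error) {
  const parts = [`type=${error.type}`];
  if (error.tag !== undefined) parts.push(`tag=<${error.tag}>`);
  if (error.attribute !== undefined) parts.push(`attribute=${error.attribute}`);
  if (error.value !== undefined) parts.push(`value="${error.value}"`);
  if (error.position !== undefined) parts.push(`position=${error.position}`);
  if (error.message) parts.push(`message=${error.message}`);
  return parts.join(' ');
}

// check() resolves to undefined when there are no errors, so fall back to [].
function describeErrors(errors) {
  return (errors || []).map(describeError);
}
```

Passing the resolved value of check straight to describeErrors avoids a separate undefined test at every call site.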

The current version of ssml-check-core will check for the following:

  • Valid XML format
  • All tags are valid tags for their platform with valid attributes and values
  • No more than five audio tags in the response
  • No invalid & characters (an & that does not begin an entity such as &amp;)
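
Two of the checks above are simple enough to sketch with plain string scanning. The code below is an illustration only, not the library's implementation; countAudioTags and findBareAmpersands are hypothetical names, and the regex-based scan is an assumption that ignores edge cases such as comments or CDATA sections.

```javascript
const MAX_AUDIO_TAGS = 5; // limit taken from the list above

// Count opening <audio> tags with a plain regex scan.
function countAudioTags(ssml) {
  const matches = ssml.match(/<audio\b/g);
  return matches ? matches.length : 0;
}

// Return the index of every "&" that does not start an entity
// such as &amp;, &#38; or &#x26;.
function findBareAmpersands(ssml) {
  const positions = [];
  const pattern = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/g;
  let match;
  while ((match = pattern.exec(ssml)) !== null) {
    positions.push(match.index);
  }
  return positions;
}

function hasTooManyAudioTags(ssml) {
  return countAudioTags(ssml) > MAX_AUDIO_TAGS;
}
```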

Example

const ssmlCheck = require('ssml-check-core');
ssmlCheck.check('<speak><prosody rate="5%">Hello world</prosody></speak>')
.then((errors) => {
  if (errors) {
    console.log(JSON.stringify(errors));
  } else {
    console.log('SSML is clean');
  }
});

will output [{"type":"tag","tag":"prosody","attribute":"rate","value":"5%"}]

verifyAndFix

The second function is verifyAndFix which returns a Promise of an object containing an array of caught SSML errors (similar to check) and, if possible, corrected SSML as noted below.

verifyAndFix(ssml, options)

The arguments to this function, including the options structure, are the same as for check.

The return value is a Promise resolving to an object with the following fields:

{
  fixedSSML,  // A fixed SSML string if errors are found that can be corrected for
              // This field will be undefined if the SSML cannot be corrected
  errors,     // An array of errors. The format of each object in this array is as
              // defined above for the check function. This field is undefined
              // if there are no errors.    
}

If there are no errors, then the Promise will contain an empty object.

The current version of ssml-check-core will correct the following errors:

  • If more than five audio tags are in the response, elements after the first five are removed
  • If an invalid tag is found, the tag will be removed but the contents of the element will remain
  • If an invalid attribute is found, it will be removed (in the case of the src attribute for audio, if this is missing or invalid the element will be removed)
  • If an invalid value is found for an attribute within a valid tag, the value will be corrected as best possible. For example, adding a leading + to values that require it like prosody's pitch attribute, adjusting the value to be within an acceptable range, or substituting a default value if necessary
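
To make the last kind of correction concrete, here is a rough sketch of two value-level fixes. It is not the library's code: escapeBareAmpersands and clampBreakTime are hypothetical names, and the 10-second cap is an assumption inferred from the verifyAndFix example in this README, where a 20000ms break comes back as 10s.

```javascript
const MAX_BREAK_MS = 10000; // assumed cap, inferred from the 20000ms -> 10s example

// Escape "&" characters that do not already begin an entity.
function escapeBareAmpersands(ssml) {
  return ssml.replace(/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/g, '&amp;');
}

// Clamp a break time such as "20000ms" or "12s" to the assumed maximum.
// Values in range, and values this sketch does not recognise, are returned unchanged.
function clampBreakTime(value) {
  const match = /^(\d+(?:\.\d+)?)(ms|s)$/.exec(value);
  if (!match) return value;
  const ms = parseFloat(match[1]) * (match[2] === 's' ? 1000 : 1);
  return ms > MAX_BREAK_MS ? `${MAX_BREAK_MS / 1000}s` : value;
}
```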

Examples

const ssmlCheck = require('ssml-check-core');
ssmlCheck.verifyAndFix('<speak><tag>What is this?</tag><break time="20000ms"/>This & that</speak>')
.then((result) => {
  if (result.fixedSSML) {
    console.log(result.fixedSSML);
  } else if (result.errors) {
    console.log(JSON.stringify(result.errors));
  } else {
    console.log('SSML is clean');
  }
});

will output <speak>What is this?<break time="10s"/>This &amp; that</speak>

const ssmlCheck = require('ssml-check-core');
ssmlCheck.verifyAndFix('<speak><prosody rate="60">Hello world</prosody></speak>')
.then((result) => {
  if (result.fixedSSML) {
    console.log(result.fixedSSML);
  } else if (result.errors) {
    console.log(JSON.stringify(result.errors));
  } else {
    console.log('SSML is clean');
  }
});

will output <speak><prosody rate="60%">Hello world</prosody></speak>

Contributions

We love your input! We want to make contributing to this project as easy and transparent as possible, whether it's:

  • Reporting a bug
  • Discussing the current state of the code
  • Submitting a fix
  • Proposing new features

When contributing to this repository, please first discuss the change you wish to make by raising an issue or sending an e-mail to the owners of this repository.

ssml-check-core's Issues

Problem processing media tag

Describe the bug
Media tag improperly reports errors with xml:id and soundLevel attributes

To Reproduce

<speak>
<media xml:id="crowd" soundLevel="5dB" fadeOutDur="1.0s">
  <audio src="https://actions.google.com/sounds/v1/crowds/battle_cry_high_pitch.ogg" clipEnd="3.0s">
    <desc>crowd cheering</desc>
    YEAH!
  </audio>
</media>
</speak>

Expected behavior
xml:id should allow matches of any letters or digits, along with -, _, and # characters (so "crowd" should be acceptable).
soundLevel should allow any integer (currently it requires a preceding + or -)
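
The expected behavior could be expressed with patterns along these lines. This is a sketch of the rules as worded in this issue, not the library's validation code; the names are hypothetical, and the dB suffix is assumed from the sample above.

```javascript
// xml:id: letters, digits, "-", "_" and "#", as requested in this issue.
const XML_ID = /^[A-Za-z0-9_#-]+$/;

// soundLevel: an integer with an optional sign, followed by "dB"
// (suffix assumed from the soundLevel="5dB" sample).
const SOUND_LEVEL = /^[+-]?\d+dB$/;

function isValidXmlId(value) {
  return XML_ID.test(value);
}

function isValidSoundLevel(value) {
  return SOUND_LEVEL.test(value);
}
```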

Environment (please complete the following information):

  • Platform: Google
  • Version 0.1.1

Add error message for parse issues

Describe the bug
When xml2json encounters a parse error, the library returns only its own generic error, without the parser's message:

  {
    type: "Can't parse SSML"
  }

To Reproduce
dummy ssml string

Expected behavior
Along with the user defined message, return the actual error message

  {
    type: "Can't parse SSML",
    message: 'Text data outside of root node.\nLine: 0\nColumn: 17\nChar: g'
  }
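
One way to get this shape is to copy the caught exception's message into the error object. The sketch below assumes the parser reports failures by throwing; parseOrDescribe and its parse argument are hypothetical and stand in for whatever XML parser is in use.

```javascript
// Hypothetical wrapper: `parse` is any function that throws on invalid input.
// On failure, the parser's own message is attached alongside the generic type.
function parseOrDescribe(parse, ssml) {
  try {
    return { parsed: parse(ssml) };
  } catch (err) {
    const message = err && err.message ? err.message : String(err);
    return { errors: [{ type: "Can't parse SSML", message }] };
  }
}
```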

Environment (please complete the following information):

  • Platform: All
  • Version latest

Incorrect position when same tags are present

Describe the bug
getPositions returns the position of the first matching tag when the same tag appears more than once.

To Reproduce
Provide a sample SSML response that exhibits the bug.

<speak>
  <p>
    This is an example of SSML text. Here's a sentence with a pause.
    <break time="2s"/>
    And here's another sentence.
  </p>
  <p>
    <prosody rate="slow">This sentence is spoken slowly.</prosody>
    <prosody volume="loud">This sentence is spoken loudly.</prosody>
    <prosody pitch="high">This sentence is spoken with a high pitch.</prosody>
  </p>
  <p>
    <emphasis level="strong">This sentence is emphasized.</emphasis>
    <emphasis level="moderate">This sentence is moderately emphasized.</emphasis>
    <emphasis level="reduced">This sentence is reduced in emphasis.</emphasis>
  </p>
  <p>
    <say-as interpret-as="cardinal">123</say-as>
    <say-as interpret-as="ordinal">1st</say-as>
    <say-as interpret-as="characters">Hello</say-as>
    <say-as interpret-as="spell-out">ABC</say-as>
    <say-as interpret-as="telephone">555-1234</say-as>
    <say-as interpret-as="date">2023-07-10</say-as>
    <say-as interpret-as="time">15:30:00</say-as>
    <say-as interpret-as="verbatim">Text that is read verbatim</say-as>
  </p>
</speak>

Expected behavior
getPositions should return the correct position for the error on "Text that is read verbatim". The position should be 984.

Current behavior
Position is 627 for the same error tag.
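
A possible direction, sketched under the assumption that the reported position comes from a search that always starts at the beginning of the string: collect the start index of every occurrence of the tag in document order, then pick the one that corresponds to the failing element. findTagPositions is a hypothetical helper, not the library's code.

```javascript
// Return the start index of every opening <tagName ...> in document order.
function findTagPositions(ssml, tagName) {
  // Escape regex metacharacters so names like "amazon:emotion" are matched literally.
  const escaped = tagName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // The lookahead keeps <p> from matching <prosody> and similar prefixes.
  const pattern = new RegExp('<' + escaped + '(?=[\\s/>])', 'g');
  const positions = [];
  let match;
  while ((match = pattern.exec(ssml)) !== null) {
    positions.push(match.index);
  }
  return positions;
}
```

With this, the error on the eighth say-as element in the sample would map to findTagPositions(ssml, 'say-as')[7] rather than to the first match.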

Environment (please complete the following information):

  • Platform: All
  • Version latest

Google added support for a lot of new tags and features

During 2021, Google introduced some new features:

  • <phoneme>: Customize the pronunciation of specific words.
  • <say-as interpret-as="duration">: Specify durations.
  • <voice>: Switch between voices in the same request.
  • <lang>: Use multiple languages in the same request.
  • <mark>: Return the timepoint of a specified point in your transcript.

(it seems that ssml-check-core already supports <say-as interpret-as="duration"> and <mark> just fine)

The solution I'd like:
ssml-check-core should probably support them in the same way the well-established tags are supported.

Describe alternatives you've considered
If some or all of those features are not supported, the documentation should state this.

Additional context
At first, those features were in beta phase (see this archived page) but I think they are out of beta now. I'm unsure if/how this relates to Google's v1 and v1beta1 APIs.

From my (limited) experience, the <phoneme> implementation is very picky and hard to debug, since every locale only supports a subset of the IPA phones, and the <phoneme> tag is completely ignored as soon as there's just one unsupported character in the ph attribute. The validation of ph would be a very useful feature, but also very complex. Maybe this should be a separate issue, or even be outsourced to a separate library?

media begin/end attributes don't handle syncbase values

Describe the bug
The begin and end attributes for Google's media tag don't recognize syncbase values

To Reproduce

<speak>
<media xml:id="words" begin="crowd.end-1.0s">
  <speak><emphasis level="strong">Great catch by Amendola! I can't believe he got both feet in bounds!</emphasis></speak>
</media>
</speak>

returns an error that crowd.end-1.0s is invalid

Expected behavior
This is valid SSML on Google
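
A sketch of a check that accepts both plain time offsets and syncbase values such as crowd.end-1.0s. The grammar here is an assumption based only on the sample in this issue (an id, then .begin or .end, then an optional signed offset); it should be compared against Google's SSML reference before being relied on.

```javascript
// Plain offset, for example "2.5s" or "-500ms".
const TIME_OFFSET = /^[+-]?\d+(\.\d+)?(s|ms)$/;

// Syncbase value: <id>.begin or <id>.end, optionally followed by a signed offset.
const SYNCBASE = /^[A-Za-z0-9_#-]+\.(begin|end)([+-]\d+(\.\d+)?(s|ms))?$/;

function isValidMediaTiming(value) {
  return TIME_OFFSET.test(value) || SYNCBASE.test(value);
}
```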

Environment (please complete the following information):

  • Platform: Google
  • Version: 0.1.2

Add support for Alexa emotions and speaking styles

Is your feature request related to a problem? Please describe.
Amazon recently released a new feature allowing for emotions and speaking styles as described here. Specifically, this looks like it adds support for two new SSML tags specific to the Amazon platform, amazon:emotion and amazon:domain.

Describe the solution you'd like
We should support these new SSML tags, specifically looking at the fields noted in the developer documentation located here and here

Check for incompatible tags

Is your feature request related to a problem? Please describe.
Amazon describes a set of incompatible tags - tags that cannot simultaneously be applied to the same text. It would be good to check for this and present an error in this case.

Describe the solution you'd like
We should have an array of incompatible tags in checkForValidTagsRecursive. If one of these is encountered, it can check that no children are also in the array of incompatible tags
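
The idea could look roughly like the sketch below. The tree shape ({ name, children }) and the function name are hypothetical, and the set of incompatible tag names is passed in by the caller because the actual list would have to come from Amazon's documentation.

```javascript
// Walk a tree of { name, children } nodes and report every listed tag
// that is nested anywhere inside another listed tag.
function findIncompatibleNesting(node, incompatible, activeAncestor = null, found = []) {
  const isListed = incompatible.has(node.name);
  if (isListed && activeAncestor) {
    found.push({ outer: activeAncestor, inner: node.name });
  }
  // Keep the outermost listed tag as the ancestor for all descendants.
  const nextAncestor = activeAncestor || (isListed ? node.name : null);
  for (const child of node.children || []) {
    findIncompatibleNesting(child, incompatible, nextAncestor, found);
  }
  return found;
}
```

Tracking the ancestor through intermediate elements means an incompatible pair is still caught when a neutral tag such as p sits between the two.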

Additional context
We should check Google Assistant to see if they have similar limitations

Add support for amazon:name

Is your feature request related to a problem? Please describe.
According to this blog post (https://developer.amazon.com/blogs/alexa/post/1ad16e9b-4f52-4e68-9187-ec2e93faae55/recognize-voices-and-personalize-your-skills) there is now support for a new amazon:name tag on Alexa which will return the user's name.

Describe the solution you'd like
This should be supported as a valid tag when platform is set to amazon.

Additional context
This tag is not documented in current SSML specs, so will need to be field tested to understand proper usage

Missing Tags

The following valid Amazon SSML tags and attributes are currently rejected:

  • mark
  • amazon:breath
  • amazon:auto-breaths
  • <prosody amazon:max-duration="2s">

Add support for ph attribute verification in phoneme tag

Splitting this off from #12

From my (limited) experience, the <phoneme> implementation is very picky and hard to debug, since every locale only supports a subset of the IPA phones, and the <phoneme> tag is completely ignored as soon as there's just one unsupported character in the ph attribute. The validation of ph would be a very useful feature, but also very complex. Maybe this should be a separate issue, or even be outsourced to a separate library?

See this archived page for more details.

more little things..

  1. In my opinion, the validation of the break tag's "time" attribute is wrong. I can't find this behavior anywhere in the documentation. The Google docs say:

time: Sets the length of the break by seconds or milliseconds (e.g. "3s" or "250ms").

  } else if ((platform === 'google') && text.match(/^[0-9]+(\.[0-9]+)?$/g)) {
    time = 1000 * parseInt(text);

Furthermore, where did you find that decimal places are allowed? I haven't tried it yet, but I couldn't find it in the Amazon or Google docs either.

  2. Copy-paste mistake: the commented line shouldn't be there:
case 'prosody':
// Attribute must be time or strength
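
For the first point, note that parseInt drops any fractional part, so 1000 * parseInt("1.5") yields 1000 rather than 1500 even though the regular expression accepts decimals. A unit-aware conversion might look like the sketch below; breakTimeToMs is a hypothetical name, and whether decimal values are actually valid on either platform remains the open question raised above.

```javascript
// Convert a break time such as "3s", "250ms" or "1.5s" to milliseconds.
// Returns undefined when the value has no recognised unit.
function breakTimeToMs(text) {
  const match = /^(\d+(?:\.\d+)?)(ms|s)$/.exec(text.trim());
  if (!match) return undefined;
  const amount = parseFloat(match[1]); // parseFloat keeps the fractional part
  return Math.round(match[2] === 's' ? amount * 1000 : amount);
}
```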

Some locales are missing from Amazon/Alexa "lang" tag

Describe the bug
Alexa SSML supports 15 different locales for the xml:lang attribute of the lang tag, 5 of which are missing.

To Reproduce

<speak>
  <lang xml:lang="es-MX">Hello</lang>
</speak>

Expected behavior
This should be valid SSML. See Alexa Documentation.

Environment (please complete the following information):

  • Platform: Alexa
  • Version Unknown

Additional context
I've already created a PR here for the fix. It includes the 5 missing locales.

prosody pitch on Google doesn't support semitones

Describe the bug
On Google, the pitch attribute of the prosody tag should support semitones

To Reproduce

<speak>
    <prosody rate="slow" pitch="-1st">Come in!<break time='0.5s'/>Welcome to the terrifying world of the imagination.</prosody>
</speak>

is valid on Google (but not on Amazon)

Expected behavior
Per documentation on Google for the pitch attribute of the prosody tag:

Semitones: Increase or decrease pitch by "N" semitones using "+Nst" or "-Nst" respectively. Note that "+/-" and "st" are required.
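
The quoted rule translates into a small pattern. This is only a sketch of the documented format; isSemitonePitch is a hypothetical name, and accepting a fractional N is an assumption, since the quoted text does not say whether N must be an integer.

```javascript
// "+Nst" or "-Nst": the sign and the "st" suffix are both required.
// Fractional values such as "+1.5st" are accepted here as an assumption.
const SEMITONES = /^[+-]\d+(\.\d+)?st$/;

function isSemitonePitch(value) {
  return SEMITONES.test(value);
}
```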

Environment (please complete the following information):

  • Platform: Google
  • Version: 0.1.2

Any way to get error position ?

Hi,

Firstly thank you for this package!

Is there any way to get the position of each error in the returned array?

Currently I have to search the whole text for a specific tag, then attributes, and then values to accomplish that.

Regards
