
ssml-check-core's Issues

Missing Tags

The following valid Amazon SSML tags are currently rejected as invalid:

  • mark
  • amazon:breath
  • amazon:auto-breaths
  • <prosody amazon:max-duration="2s">
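A minimal sketch of how an Amazon tag allowlist could accept the tags listed above. The base tag set, the `checkAmazonTag` function, and the error-object shape are hypothetical and are not taken from ssml-check-core's source. The accepted format for `amazon:max-duration` (a whole number followed by `s` or `ms`) is also an assumption.

```javascript
// Illustrative base allowlist (not the library's actual data),
// extended with the tags this issue reports as missing.
const AMAZON_TAGS = new Set([
  'speak', 'p', 's', 'break', 'emphasis', 'lang', 'phoneme', 'prosody',
  'say-as', 'sub', 'w', 'audio', 'amazon:effect',
  // Reported missing in this issue:
  'mark', 'amazon:breath', 'amazon:auto-breaths',
]);

// Assumption: amazon:max-duration takes whole seconds or milliseconds, e.g. "2s" or "500ms".
const MAX_DURATION = /^\d+(s|ms)$/;

// Returns a list of hypothetical error objects; an empty list means the tag passed.
function checkAmazonTag(name, attributes = {}) {
  const errors = [];
  if (!AMAZON_TAGS.has(name)) {
    errors.push({ type: 'Invalid tag', tag: name });
  }
  if (name === 'prosody' && 'amazon:max-duration' in attributes &&
      !MAX_DURATION.test(attributes['amazon:max-duration'])) {
    errors.push({ type: 'Invalid attribute value', tag: name, attribute: 'amazon:max-duration' });
  }
  return errors;
}
```

Keeping the allowlist as data rather than as branches makes each newly reported tag a one-line change.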

Some locales are missing from Amazon/Alexa "lang" tag

Describe the bug
Alexa SSML supports 15 different locales for the xml:lang attribute of the lang tag, 5 of which are missing from ssml-check-core.

To Reproduce

<speak>
  <lang xml:lang="es-MX">Hello</lang>
</speak>

Expected behavior
This should be valid SSML. See Alexa Documentation.

Environment (please complete the following information):

  • Platform: Alexa
  • Version: unknown

Additional context
I've already created a PR here for the fix. It includes the 5 missing locales.
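A minimal sketch of a set-based check for the lang tag's xml:lang attribute. The locale set below is an illustrative subset only (it includes es-MX from the reproduction above); the full list of 15 locales should be copied from the Alexa SSML reference. `checkLangTag` and its error shape are hypothetical, not the library's API.

```javascript
// Illustrative subset only; replace with the full list from the Alexa SSML reference.
const ALEXA_LANG_LOCALES = new Set(['en-US', 'en-GB', 'de-DE', 'es-ES', 'es-MX']);

// Returns hypothetical error objects for a <lang> element's attributes.
function checkLangTag(attributes, locales = ALEXA_LANG_LOCALES) {
  const value = attributes['xml:lang'];
  if (value === undefined) {
    return [{ type: 'Missing attribute', tag: 'lang', attribute: 'xml:lang' }];
  }
  return locales.has(value)
    ? []
    : [{ type: 'Invalid attribute value', tag: 'lang', attribute: 'xml:lang', value }];
}
```

Passing the locale set as a parameter keeps the check reusable if the supported list differs per platform.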

prosody pitch on Google doesn't support semitones

Describe the bug
On Google, the pitch attribute of the prosody tag should support semitones.

To Reproduce

<speak>
    <prosody rate="slow" pitch="-1st">Come in!<break time='0.5s'/>Welcome to the terrifying world of the imagination.</prosody>
</speak>

is valid on Google (but not on Amazon)

Expected behavior
Per documentation on Google for the pitch attribute of the prosody tag:

Semitones: Increase or decrease pitch by "N" semitones using "+Nst" or "-Nst" respectively. Note that "+/-" and "st" are required.

Environment (please complete the following information):

  • Platform: Google
  • Version: 0.1.2
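A minimal sketch of a semitone check based on the Google documentation quoted above (sign and `st` suffix both required). The function name is hypothetical, and accepting a fractional N such as `-0.5st` is an assumption the quoted docs do not confirm.

```javascript
// Google: both the sign and the "st" suffix are required, e.g. "+2st" or "-1st".
// Assumption: a fractional N such as "-0.5st" is also accepted.
const SEMITONE_PITCH = /^[+-]\d+(\.\d+)?st$/;

function isSemitonePitch(value, platform) {
  // Per this issue, semitone values are valid on Google but not on Amazon.
  return platform === 'google' && SEMITONE_PITCH.test(value);
}
```

Other pitch formats (percentages, named values) would still go through the existing checks; this only adds the semitone case.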

media begin/end attributes don't handle syncbase values

Describe the bug
The begin and end attributes for Google's media tag don't recognize syncbase values

To Reproduce

<speak>
<media xml:id="words" begin="crowd.end-1.0s">
  <speak><emphasis level="strong">Great catch by Amendola! I can't believe he got both feet in bounds!</emphasis></speak>
</media>
</speak>

returns an error saying that crowd.end-1.0s is invalid.

Expected behavior
This is valid SSML on Google

Environment (please complete the following information):

  • Platform: Google
  • Version: 0.1.2
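A minimal sketch of a begin/end validator that accepts both a plain time value and a syncbase value of the form `id.begin` or `id.end`, optionally followed by a signed offset, as in `crowd.end-1.0s`. The id character set follows the expected behavior described in the media-tag issue below (letters, digits, `-`, `_`, `#`); the exact grammar Google accepts is an assumption to verify against its SSML reference.

```javascript
// A time value in seconds or milliseconds, e.g. "1.0s" or "250ms".
const TIME_SPEC = '\\d+(\\.\\d+)?(s|ms)';
// Assumed syncbase form: <id>.begin or <id>.end, with an optional +/- offset.
const SYNCBASE = new RegExp(`^[A-Za-z0-9_#-]+\\.(begin|end)([+-]${TIME_SPEC})?$`);
const PLAIN_TIME = new RegExp(`^${TIME_SPEC}$`);

// Hypothetical helper for the media tag's begin and end attributes.
function isValidMediaTime(value) {
  return PLAIN_TIME.test(value) || SYNCBASE.test(value);
}
```

Example: `isValidMediaTime('crowd.end-1.0s')` is true, while `isValidMediaTime('crowd.middle')` is false.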

Add support for amazon:name

Is your feature request related to a problem? Please describe.
According to this blog post (https://developer.amazon.com/blogs/alexa/post/1ad16e9b-4f52-4e68-9187-ec2e93faae55/recognize-voices-and-personalize-your-skills) there is now support for a new amazon:name tag on Alexa which will return the user's name.

Describe the solution you'd like
This should be supported as a valid tag when platform is set to amazon.

Additional context
This tag is not documented in the current SSML specs, so it will need to be field-tested to understand its proper usage.

More little things

  1. In my opinion, the validation of the break tag's "time" attribute is wrong: I can't find this behavior anywhere in the documentation. Google's docs say:

time: Sets the length of the break by seconds or milliseconds (e.g. "3s" or "250ms").

  } else if ((platform === 'google') && text.match(/^[0-9]+(\.[0-9]+)?$/g)) {
    time = 1000 * parseInt(text);

Furthermore, where did you find that decimal places are allowed? I haven't tried it yet, but I couldn't find it in the Amazon or Google docs either.

  2. Copy-paste mistake: the commented line shouldn't be there:
case 'prosody':
  // Attribute must be time or strength
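Regarding the first point, a minimal sketch of a stricter break-time parser that only accepts the `"3s"` / `"250ms"` forms quoted from Google's docs and rejects a bare number such as `"3"`. Whether decimals are allowed is the open question raised above, so the sketch makes it an explicit option (`allowDecimal`, a hypothetical parameter). It also uses `parseFloat` rather than `parseInt`, since `parseInt("1.5")` returns 1 and would silently truncate a decimal value that the quoted regex accepts.

```javascript
// Returns the break length in milliseconds, or null if the value is not accepted.
function parseBreakTime(value, { allowDecimal = false } = {}) {
  const pattern = allowDecimal ? /^(\d+(?:\.\d+)?)(s|ms)$/ : /^(\d+)(s|ms)$/;
  const match = pattern.exec(value);
  if (!match) return null; // e.g. a bare "3" with no unit is rejected
  // parseFloat keeps "1.5"; parseInt would truncate it to 1.
  const amount = parseFloat(match[1]);
  return match[2] === 's' ? amount * 1000 : amount;
}
```

With the default options, `parseBreakTime('3s')` gives 3000 and `parseBreakTime('3')` gives null.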

Check for incompatible tags

Is your feature request related to a problem? Please describe.
Amazon describes a set of incompatible tags - tags that cannot simultaneously be applied to the same text. It would be good to check for this and present an error in this case.

Describe the solution you'd like
We should have an array of incompatible tags in checkForValidTagsRecursive. When one of these tags is encountered, the check can verify that none of its descendants is also in the array of incompatible tags.

Additional context
We should check Google Assistant to see whether it has similar limitations.
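A minimal sketch of the proposed recursive check. It assumes a hypothetical node shape `{ name, children }` (text nodes omitted) rather than the library's real parse tree, and it does not reproduce Amazon's actual list of incompatible tags: the set is passed in, and the one in the usage example is illustrative only.

```javascript
// Walks a hypothetical { name, children } tree and reports any incompatible tag
// nested inside another incompatible tag.
function findIncompatibleNesting(node, incompatible, openAncestor = null, errors = []) {
  const isIncompatible = incompatible.has(node.name);
  if (isIncompatible && openAncestor) {
    errors.push({ type: 'Incompatible tags', outer: openAncestor, inner: node.name });
  }
  // Remember the outermost incompatible tag seen so far on this path.
  const nextAncestor = openAncestor || (isIncompatible ? node.name : null);
  for (const child of node.children || []) {
    findIncompatibleNesting(child, incompatible, nextAncestor, errors);
  }
  return errors;
}

// Usage with an illustrative set (not Amazon's documented list):
const tree = {
  name: 'speak',
  children: [{ name: 'amazon:emotion', children: [{ name: 'amazon:domain', children: [] }] }],
};
const errors = findIncompatibleNesting(tree, new Set(['amazon:emotion', 'amazon:domain']));
```

Tracking only the outermost incompatible ancestor keeps the walk linear in the number of nodes.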

Add support for Alexa emotions and speaking styles

Is your feature request related to a problem? Please describe.
Amazon recently released a new feature allowing for emotions and speaking styles as described here. Specifically, this looks like it adds support for two new SSML tags specific to the Amazon platform, amazon:emotion and amazon:domain.

Describe the solution you'd like
We should support these new SSML tags, specifically looking at the fields noted in the developer documentation located here and here
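A minimal sketch of attribute-value tables for the two new tags. The values listed are assumptions recalled from the Alexa SSML reference and should be verified against the developer documentation before use; `checkStyleTag` and its error shape are hypothetical.

```javascript
// Assumed attribute values; verify against the Alexa SSML reference.
const AMAZON_STYLE_TAGS = {
  'amazon:emotion': { name: ['excited', 'disappointed'], intensity: ['low', 'medium', 'high'] },
  'amazon:domain': { name: ['conversational', 'long-form', 'music', 'news', 'fun'] },
};

function checkStyleTag(tag, attributes) {
  const spec = AMAZON_STYLE_TAGS[tag];
  if (!spec) return [{ type: 'Invalid tag', tag }];
  const errors = [];
  if (!('name' in attributes)) {
    errors.push({ type: 'Missing attribute', tag, attribute: 'name' });
  }
  for (const [attribute, value] of Object.entries(attributes)) {
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(spec, attribute)) {
      errors.push({ type: 'Invalid attribute', tag, attribute });
    } else if (!spec[attribute].includes(value)) {
      errors.push({ type: 'Invalid attribute value', tag, attribute, value });
    }
  }
  return errors;
}
```

Keeping the allowed values in a table means later additions to Amazon's styles only touch data.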

Incorrect position when same tags are present

Describe the bug
Get Positions returns the position of the first matching tag when several tags of the same type are present.

To Reproduce
Provide a sample SSML response that exhibits the bug.

<speak>
  <p>
    This is an example of SSML text. Here's a sentence with a pause.
    <break time="2s"/>
    And here's another sentence.
  </p>
  <p>
    <prosody rate="slow">This sentence is spoken slowly.</prosody>
    <prosody volume="loud">This sentence is spoken loudly.</prosody>
    <prosody pitch="high">This sentence is spoken with a high pitch.</prosody>
  </p>
  <p>
    <emphasis level="strong">This sentence is emphasized.</emphasis>
    <emphasis level="moderate">This sentence is moderately emphasized.</emphasis>
    <emphasis level="reduced">This sentence is reduced in emphasis.</emphasis>
  </p>
  <p>
    <say-as interpret-as="cardinal">123</say-as>
    <say-as interpret-as="ordinal">1st</say-as>
    <say-as interpret-as="characters">Hello</say-as>
    <say-as interpret-as="spell-out">ABC</say-as>
    <say-as interpret-as="telephone">555-1234</say-as>
    <say-as interpret-as="date">2023-07-10</say-as>
    <say-as interpret-as="time">15:30:00</say-as>
    <say-as interpret-as="verbatim">Text that is read verbatim</say-as>
  </p>
</speak>

Expected behavior
Get Position should return the correct position of the error for "Text that is read verbatim"; the position should be 984.

Current behavior
Position is 627 for the same error tag.

Environment (please complete the following information):

  • Platform: All
  • Version: latest
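A minimal sketch of one possible fix: instead of searching for the first occurrence, collect every opening tag of the same name in document order and pick the one matching the flagged element's ordinal. It assumes the validator can tell which occurrence (0-based) it flagged; that index, and both function names, are hypothetical. The plain regex scan also ignores the possibility of tags inside comments or CDATA.

```javascript
// Returns the character offset of every opening <tagName ...> in document order.
function findTagPositions(ssml, tagName) {
  const escaped = tagName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // The lookahead stops "<p" from matching "<prosody" or "<phoneme".
  const pattern = new RegExp(`<${escaped}(?=[\\s/>])`, 'g');
  return Array.from(ssml.matchAll(pattern), (m) => m.index);
}

// Position of the n-th (0-based) occurrence, or undefined if there is none.
function positionOfOccurrence(ssml, tagName, n) {
  return findTagPositions(ssml, tagName)[n];
}
```

For the sample above, the `verbatim` say-as is the eighth say-as element, so its position would be looked up with index 7 rather than by searching for the first say-as.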

Any way to get the error position?

Hi,

First, thank you for this package!

Is there any way to get the position of each error in the returned array?

Currently, I have to search the whole text for the specific tag, then its attributes, and then their values to accomplish that.

Regards
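If errors carried a character offset, callers could convert it into an editor-friendly line and column with a small helper like the one below. This is a hypothetical utility, not part of ssml-check-core; lines and columns are 1-based.

```javascript
// Converts a 0-based character offset into 1-based line and column numbers.
function offsetToLineColumn(text, offset) {
  let line = 1;
  let lastNewline = -1; // index of the most recent '\n' before offset
  for (let i = 0; i < offset && i < text.length; i++) {
    if (text[i] === '\n') {
      line++;
      lastNewline = i;
    }
  }
  return { line, column: offset - lastNewline };
}
```

For example, in `"ab\ncd"` the offset 3 (the `c`) maps to line 2, column 1.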

Google added support for a lot of new tags and features

During 2021, Google introduced some new features:

  • <phoneme>: Customize the pronunciation of specific words.
  • <say-as interpret-as="duration">: Specify durations.
  • <voice>: Switch between voices in the same request.
  • <lang>: Use multiple languages in the same request.
  • <mark>: Return the timepoint of a specified point in your transcript.

(it seems that ssml-check-core already supports <say-as interpret-as="duration"> and <mark> just fine)

The solution I'd like:
ssml-check-core should probably support them in the same way the well-established tags are supported.

Describe alternatives you've considered
If some or all of those features are not supported, the documentation should state this.

Additional context
At first, those features were in beta (see this archived page), but I think they are out of beta now. I'm unsure if/how this relates to Google's v1 and v1beta1 APIs.

From my (limited) experience, the <phoneme> implementation is very picky and hard to debug, since every locale only supports a subset of the IPA phones, and the <phoneme> tag is completely ignored as soon as there's just one unsupported character in the ph attribute. The validation of ph would be a very useful feature, but also very complex. Maybe this should be a separate issue, or even be outsourced to a separate library?

Add support for ph attribute verification in phoneme tag

Splitting this off from #12

From my (limited) experience, the <phoneme> implementation is very picky and hard to debug, since every locale only supports a subset of the IPA phones, and the <phoneme> tag is completely ignored as soon as there's just one unsupported character in the ph attribute. The validation of ph would be a very useful feature, but also very complex. Maybe this should be a separate issue, or even be outsourced to a separate library?

See this archived page for more details.
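A minimal sketch of the per-locale ph check described above: report every character in the ph attribute that is not in the locale's supported set. The supported-phone set is a hypothetical input that would have to be built from Google's per-locale tables. Checking one code point at a time is a simplification; multi-character phones (affricates, combining diacritics) would need a proper tokenizer.

```javascript
// Returns the distinct characters in `ph` that the locale does not support.
function findUnsupportedPhones(ph, supportedPhones) {
  const unsupported = new Set();
  // for...of iterates by code point, so IPA symbols outside the BMP stay intact.
  for (const ch of ph) {
    if (!supportedPhones.has(ch)) unsupported.add(ch);
  }
  return [...unsupported];
}
```

Returning the offending characters, not just a boolean, addresses the "hard to debug" complaint: the user sees exactly which symbol caused Google to ignore the tag.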

Add error message for parse issues

Describe the bug
When xml2json encounters a parse error, the library returns only a user-defined message, "Can't parse SSML":

  {
    type: "Can't parse SSML"
  }

To Reproduce
Pass a malformed SSML string (any dummy string will do).

Expected behavior
Along with the user defined message, return the actual error message

  {
    type: "Can't parse SSML",
    message: 'Text data outside of root node.\nLine: 0\nColumn: 17\nChar: g'
  }

Environment (please complete the following information):

  • Platform: All
  • Version: latest
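A minimal sketch of the requested behavior: keep the user-defined `type` and add the underlying parser's message. The `parse` argument stands in for whatever xml2json call the library makes; this sketch assumes that call throws on malformed input and does not rely on any particular xml2json API.

```javascript
// Wraps a parse function so failures keep the original error message.
function parseWithDetails(ssml, parse) {
  try {
    return { result: parse(ssml), errors: [] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { result: null, errors: [{ type: "Can't parse SSML", message }] };
  }
}
```

Because the original `type` string is unchanged, existing callers that match on it keep working; the extra `message` field is purely additive.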

Problem processing media tag

Describe the bug
Media tag improperly reports errors with xml:id and soundLevel attributes

To Reproduce

<speak>
<media xml:id="crowd" soundLevel="5dB" fadeOutDur="1.0s">
  <audio src="https://actions.google.com/sounds/v1/crowds/battle_cry_high_pitch.ogg" clipEnd="3.0s">
    <desc>crowd cheering</desc>
    YEAH!
  </audio>
</media>
</speak>

Expected behavior
xml:id should allow matches of any letters or digits, along with -, _, and # characters (so "crowd" should be acceptable).
soundLevel should allow any integer (currently it requires a preceding + or -).

Environment (please complete the following information):

  • Platform: Google
  • Version: 0.1.1
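A minimal sketch of the two attribute checks as described in "Expected behavior" above: xml:id made of letters, digits, `-`, `_` and `#`, and a soundLevel integer in dB with an optional sign. These patterns follow the issue's description rather than a quote from Google's reference, and `checkMediaAttributes` is a hypothetical helper.

```javascript
// Per this issue: letters, digits, '-', '_' and '#'.
const XML_ID = /^[A-Za-z0-9_#-]+$/;
// Per this issue: an integer dB value with an optional sign, e.g. "5dB", "+6dB", "-3dB".
const SOUND_LEVEL = /^[+-]?\d+dB$/;

function checkMediaAttributes(attributes) {
  const errors = [];
  const id = attributes['xml:id'];
  if (id !== undefined && !XML_ID.test(id)) {
    errors.push({ type: 'Invalid attribute value', attribute: 'xml:id', value: id });
  }
  const level = attributes.soundLevel;
  if (level !== undefined && !SOUND_LEVEL.test(level)) {
    errors.push({ type: 'Invalid attribute value', attribute: 'soundLevel', value: level });
  }
  return errors;
}
```

With these patterns, the reproduction's `xml:id="crowd"` and `soundLevel="5dB"` both pass.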
