scf's People

Contributors

dominikgarsche, pthopesch, rico-z, spoeschel, xchange11

scf's Issues

Change requirement for StartBox and EndBox mapping

Refers to module: STLXML2EBU-TT

Currently the processing is not consistent when handling teletext control codes (CC) in the 'TF' (text field of the TTI block). According to EBU Tech 3360, StartBox/EndBox should close the current span and open a new one. The same applies to color CCs (AlphaBlack, AlphaGreen, ...). For StartBox/EndBox the requirements reflect exactly that. However, for color CCs the requirements state that a new span is created only when the CC changes the subtitle style (foreground or background color). That means there will never be two consecutive spans with the same style.

For StartBox/EndBox the processing should be the same as for color CCs.

For most real world STLXML files this will not be an issue since every line uses only one box.
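
The "new span only on style change" rule could be sketched roughly as follows. This is a minimal illustration, not the scf implementation: the token format, the `Span` class and the style names are hypothetical, and control codes (including StartBox/EndBox) are assumed to have been resolved to a style id already.

```python
from dataclasses import dataclass


@dataclass
class Span:
    style: str  # hypothetical style id, e.g. "WhiteOnBlack"
    text: str


def build_spans(tokens, initial_style="WhiteOnBlack"):
    """Build spans from ("style", id) and ("text", str) tokens.

    A control code only records the style that is now active. A new span
    is opened only if that style differs from the previous span's style.
    """
    spans = []
    current = initial_style
    for kind, value in tokens:
        if kind == "style":
            current = value
        elif kind == "text" and value:
            if spans and spans[-1].style == current:
                spans[-1].text += value  # same style: extend the span
            else:
                spans.append(Span(current, value))
    return spans
```

With this approach a repeated control code that does not change the style (for example a second StartBox on the same line) never produces two consecutive spans with identical styling.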

STL2STLXML: mapping of TTI Blocks containing user data

In the current scf version 0.2.8, TTI blocks from an STL file that contain user data are ignored when transforming from STL to STLXML. To preserve user data (e.g. to allow for round-tripping scenarios), these TTI blocks should be mapped to the STLXML format as well. User data is signalled by an Extension Block Number (EBN) value of FEh. TTI blocks with an EBN set to a reserved value (F0h-FDh) should continue to be ignored, as described in requirement no. 208.

The content of the text field (TF) cannot be mapped (proprietary data) and thus should be encoded, e.g. using base64.

Extension Block Number (EBN): FEh --> TTI block contains user data
Extension Block Number (EBN): F0h-FDh --> Reserved Codes

The description of the requirement no. 208 should be updated accordingly.
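
A rough sketch of the proposed handling, assuming the EBN has already been read as an integer and the TF as raw bytes. The function names and the dictionary layout are hypothetical and do not reflect the actual STLXML schema.

```python
import base64

EBN_USER_DATA = 0xFE              # FEh: TTI block contains user data
EBN_RESERVED = range(0xF0, 0xFE)  # F0h-FDh: reserved codes


def classify_ebn(ebn):
    """Classify a TTI block by its Extension Block Number."""
    if ebn == EBN_USER_DATA:
        return "user_data"
    if ebn in EBN_RESERVED:
        return "reserved"
    return "subtitle"


def map_tti_block(ebn, tf):
    """Return a (hypothetical) mapped representation, or None if ignored."""
    kind = classify_ebn(ebn)
    if kind == "reserved":
        return None  # keep ignoring reserved blocks (requirement no. 208)
    if kind == "user_data":
        # Proprietary content cannot be interpreted, so keep it opaque.
        encoded = base64.b64encode(tf).decode("ascii")
        return {"EBN": "FE", "TF": encoded, "encoding": "base64"}
    return {"EBN": format(ebn, "02X"), "TF": tf}
```

Because base64 is reversible, a later STLXML to STL step could restore the original bytes with `base64.b64decode`.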

EBU-TT2EBU-TT-D Subtitle content conversion fails if tt:metadata present

The XSLT EBU-TT2EBU-TT-D calls only the first child of a tt:p element, and that child element then applies templates to its sibling. This works fine if the first child of the tt:p is a tt:span element (because the applied template follows this "sibling" strategy). But if the first child is, for example, tt:metadata, this fails and no further content is processed. As a quick fix, the next-sibling strategy needs to be added to the tt:metadata template. In the long run this sibling strategy needs to be factored out.

STLXML2EBU-TT: handling of content out of boxing

When a subtitle's TF contains content that is not enclosed by StartBox/EndBox element pairs, currently spaces (in the form of space elements) are discarded - but any text is copied. E.g. in case of This is a test. in STL (without enclosing boxing), this results in Thisisatest. in EBU-TT.

For consistency, space and text content outside of boxing should be treated equally, i.e. text should be discarded as well in that case.

An alternative way is to discard neither spaces nor text in case the TF does not use boxing at all (which is sometimes seen in STL files).
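
Both proposals can be combined in one small sketch: content outside a box is dropped entirely (text and spaces alike), unless the TF uses no boxing at all, in which case everything is kept. The token format below is hypothetical and only serves to illustrate the rule.

```python
def filter_boxed(tokens):
    """Return the text of a TF, honouring StartBox/EndBox.

    tokens is a list of (kind, value) pairs where kind is one of
    "startbox", "endbox", "space" or "text".
    """
    uses_boxing = any(kind in ("startbox", "endbox") for kind, _ in tokens)
    in_box = not uses_boxing  # no boxing at all: keep everything
    out = []
    for kind, value in tokens:
        if kind == "startbox":
            in_box = True
        elif kind == "endbox":
            in_box = False
        elif in_box:
            out.append(" " if kind == "space" else value)
    return "".join(out)
```

For the unboxed example above this yields "This is a test." instead of "Thisisatest.".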

Applying offset when 'subtitle zero' is present leads to termination

Looking at EBU-TT2EBU-TT-D.xslt line 771:

                <xsl:if test="$mediaHours &lt; 0 or $mediaMinutes &lt; 0 or $mediaSeconds &lt; 0 or $mediaFrames &lt; 0">
                    <xsl:message terminate="yes">
                        The chosen offset would result in a negative timestamp for a time value.
                    </xsl:message>
                </xsl:if>           

If the source STL file has subtitle zero and the relevant offset is applied this always results in a termination. The preferred behaviour here should be to omit the content elements with negative timestamps from the file and possibly issue a warning message.
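
The preferred behaviour could look roughly like this. It is a sketch in Python rather than XSLT, with times simplified to seconds and a hypothetical tuple layout for subtitles.

```python
import warnings


def apply_offset(subtitles, offset_seconds):
    """Subtract an offset and drop subtitles that would become negative.

    subtitles is a list of (begin, end, text) tuples with times in seconds.
    """
    kept = []
    for begin, end, text in subtitles:
        new_begin = begin - offset_seconds
        new_end = end - offset_seconds
        if new_begin < 0 or new_end < 0:
            # Warn instead of terminating the whole conversion.
            warnings.warn(f"Dropping subtitle with negative timestamp: {text!r}")
            continue
        kept.append((new_begin, new_end, text))
    return kept
```

A 'subtitle zero' at 00:00:00 is then simply omitted when a 10 hour offset is applied, while the remaining subtitles are shifted as usual.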

STL2STLXML: Mapping of TTI blocks containing comments

In the current scf version 0.2.8, TTI blocks from an STL file that contain comments are ignored when transforming from STL to STLXML. These comments should be mapped as well in order to achieve a "better" XML representation of the STL file.

CF set to 00h --> TTI Block contains subtitle data
CF set to 01h --> TTI Block contains comments

The text field (TF) of a comment should be mapped in the same way as that of a normal subtitle.

The description of the requirement no. 214 "Comment Flag mapping" should be updated accordingly.

EBU-TT to DFXP

An XSLT translation from EBU-TT to DFXP would be useful for deployments wishing to use Microsoft Smooth Streaming of subtitles.
We believe this would be a fairly simple edit of the EBU-TT to EBU-TT-D template: exchange the EBU-TT-D namespaces and take out a few extensions.

STL2STLXML: don't abort on CPNs not allowed by EBU STL

Currently, if a CPN value is set that is not supported by EBU STL, the behaviour of STL2STLXML depends on Python:

  • when Python supports the encoding, the conversion continues
  • when Python doesn't support it, an exception occurs

To unify the behaviour, the conversion shall fall back to CPN 850 if Python does not support the specified CPN (regardless of whether it is allowed by EBU STL or not).
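
A minimal sketch of such a fallback, assuming the CPN is available as a string such as "850" and that it maps to a Python codec named "cp" plus that number. The helper name `resolve_codec` is hypothetical.

```python
import codecs

FALLBACK_CPN = "850"


def resolve_codec(cpn):
    """Return a Python codec name for a CPN, falling back to cp850."""
    name = "cp" + cpn.strip()
    try:
        codecs.lookup(name)  # raises LookupError for unknown encodings
    except LookupError:
        name = "cp" + FALLBACK_CPN
    return name
```

The check is deliberately based only on what Python supports, so a CPN that EBU STL does not allow but Python knows is still used as specified.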

STL2STLXML: consider Teletext control codes/0x8F also for non-850 CPNs

When an STL file has a CPN other than 850, the conversion does not handle the Teletext control codes (0x00 to 0x1F) or the 0x8F code; instead they are mapped to STLXML without further processing.

This shall be fixed so that these characters get the same processing as with CPN 850.

STLXML2EBU-TT: consider also TCP when applying timecode offset

When the parameter offsetInSeconds is used to specify the used offset within the STLXML input file, that offset is subtracted from the TCI/TCO values when the EBU-TT file is written.

While the TCP relates to the TCI/TCO values, it currently is not modified during the conversion. This should be changed so that the offset parameter also affects the TCP field value.

STLXML2EBU-TT: map CD/RD/RN fields

Currently the fields CD/RD/RN are not mapped from STLXML to EBU-TT. They should be mapped according to EBU-TT Part 2 and EBU-TT Part 1 v1.0.

STLXML2EBU-TT: add offsetInFrames parameter

The STLXML2EBU-TT conversion currently supports the offsetInSeconds parameter, which specifies an offset in seconds that is applied to all TCI/TCO values (compare #32) of the input file. In the EBU-TT result, each of these values is written with the specified offset subtracted.

In addition, a similar parameter offsetInFrames shall be added, which has the same effect but is specified as an SMPTE timecode.
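
Subtracting an offset given as an SMPTE timecode could be sketched as follows. The example assumes a non-drop-frame timecode of the form HH:MM:SS:FF and a frame rate of 25 fps. The function names are hypothetical.

```python
def timecode_to_frames(tc, fps=25):
    """Convert an HH:MM:SS:FF timecode to a total frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def frames_to_timecode(frames, fps=25):
    """Convert a non-negative frame count back to HH:MM:SS:FF."""
    ss, ff = divmod(frames, fps)
    mm, ss = divmod(ss, 60)
    hh, mm = divmod(mm, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def subtract_offset(tc, offset_tc, fps=25):
    """Subtract offset_tc from tc, working in whole frames."""
    frames = timecode_to_frames(tc, fps) - timecode_to_frames(offset_tc, fps)
    if frames < 0:
        raise ValueError("offset would result in a negative timecode")
    return frames_to_timecode(frames, fps)
```

Working in frames avoids the rounding that a seconds-based offset would introduce for offsets that are not whole seconds.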

STLXML2EBU-TT: terminate on non-merged TTI blocks

As #31 proposes an option for STL2STLXML to not merge TTI blocks of the same subtitle, such a resulting file shall not be used with STLXML2EBU-TT.

To enforce this, STLXML2EBU-TT shall terminate when a TTI block is found with an EBN other than FE (user data) or FF (last TTI block of subtitle set).

STLXML2EBU-TT: chars before first space/control code

In certain cases, characters at the beginning of a text field are not processed by the transformation from STLXML to EBU-TT. This seems to affect characters that occur before the first space or control code of a subtitle.

This problem should only involve files with Open Subtitles. Teletext subtitles nowadays usually contain a Double Height control code before any subtitle text, so such files are not affected.

See also #46.

EBU-TT-D2EBU-TT-D-Basic-DE: Missing style reference for text not in a tt:span

When text nodes in an EBU-TT-D document are direct children of the tt:p element, they should be "wrapped" in a tt:span element by the transformation to EBU-TT-D-Basic-DE. Although this is done correctly, the resulting span has a style attribute with no value.

Example:

EBU-TT-D Source:

     <tt:p 
            xml:id="sub1"
            region="defaultRegion"
            begin="00:00:00.000"
            end="00:00:02.000">Test text<tt:br/><tt:span>Test text 2nd line</tt:span></tt:p>

Result

       <tt:p xml:id="sub1" style="textCenter" region="bottom" begin="00:00:00.000"
            end="00:00:02.000">
            <tt:span style="">Test text</tt:span>
            <tt:br/>
            <tt:span style="textWhite">Test text 2nd line</tt:span>

Should be

       <tt:p xml:id="sub1" style="textCenter" region="bottom" begin="00:00:00.000"
            end="00:00:02.000">
            <tt:span style="textWhite">Test text</tt:span>
            <tt:br/>
            <tt:span style="textWhite">Test text 2nd line</tt:span>

STL2STLXML: Empty subtitles get lost

EBU STL files containing teletext closed captions often contain empty subtitles on purpose.

a) Background
If there are longer periods without dialog or noises to be described, editors place an empty subtitle to signal to the receivers of viewers who might have switched to the channel in the meantime that closed captions are being transmitted. Otherwise the search for teletext page 888, 777, 150 etc. can run much longer.

b) Problem introduced by SCF
Omitting empty subtitles leads to two main problems:

  • Round tripping issues due to differences in total no. of subtitles, TC of first/last subtitle cross referenced in metadata workflows, etc.
  • Communication issues between editors creating/checking a file due to the different numbering in the source file and the converted/round tripped one.

c) Expected behavior
SCF should respect every single subtitle no matter whether it is empty or not.

STL2STLXML: Alternative option for handling TTI Blocks

The STLXML format may be used in different scenarios:
a) as an intermediate format when converting from STL to EBU-TT
b) as a human readable version of the STL file (e.g. for error checking)

According to EBU-TT Part 2, subtitles that span over several TTI blocks are merged into one p-element when an STL file is converted to EBU-TT. In scf, the merging process is done in the STL2STLXML module. That is fine for use case a), but the resulting STLXML file is not an exact XML-representation of the source STL file. This may be a problem, e.g. when comparing the metadata "Total Number of Subtitles" with the number of TTI blocks in the STLXML file.

To support various use cases it may be good to implement an option that allows for an exact 1:1 translation of the TTI blocks.

Calculate offset based on `ebuttm:documentStartOfProgramme`

When converting from timebase="smpte" to timebase="media" it would be ideal to offer a setting (or indeed make it the default) to use the value of TCP or equivalently ebuttm:documentStartOfProgramme as the offset and discard any content that falls before that timecode.

For example, an EBU-TT document is created from an STL document and has:

  • timebase="smpte",
  • a 'subtitle zero' at 00:00:00,
  • an ebuttm:documentStartOfProgramme="10:00:00" and
  • most of the content falls after 10:00:00.

This should generate an EBU-TT-D document with:

  • timebase="media",
  • no subtitle zero
  • the content is offset backwards by 10:00:00: an element whose begin="10:00:00" in the EBU-TT should have begin="00:00:00" in the EBU-TT-D.

STLXML2EBU-TT: PARAMETER IS CALLED timecodeFormat INSTEAD of timebase

The STLXML2EBU-TT XSLT accepts one parameter to set the timebase of the documents. Although EBU-TT is restricted in such a way that the timebase is equivalent to a specific timecode format (timebase "smpte" only uses frames and "media" only milliseconds), this is not the case for TTML in general. There, a subtitle document could have the timebase "media" and nonetheless use a time expression based on frames.

As the current naming of the parameter is misleading, it has to be changed.

STLXML2EBU-TT: NEEDLESS ALIGNMENT tt:span

The style creation for inline elements (tt:span) includes the style attribute tts:textAlign.

The attribute tts:textAlign applies only to tt:p elements but not to tt:span elements. Therefore no distinction of text alignment has to be made when creating style references for a tt:span. Consequently, styles that apply only to tt:span do not need any information about text alignment.

Although the current code does not result in incorrect rendering and still produces conformant XML, it needs to be refactored because it is misleading.

STLXML2EBU-TT: use separate tt:div per SGN value

Currently STLXML2EBU-TT ignores the SGN field and puts all subtitles into a single tt:div element.

Therefore the SGN field shall be processed and a separate tt:div element be used per SGN value.
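
The grouping step could be sketched as below. The dictionary layout for subtitles is hypothetical. The "SGN" plus value naming of the group ids follows the xml:id pattern used for tt:div in the stylesheet.

```python
def group_by_sgn(subtitles):
    """Group subtitle numbers by SGN, keeping first-seen group order.

    subtitles is a list of dicts with (at least) the keys "SGN" and "SN".
    Returns a dict mapping a div id such as "SGN0" to a list of SN values.
    """
    groups = {}
    for sub in subtitles:
        div_id = f"SGN{sub['SGN']}"
        groups.setdefault(div_id, []).append(sub["SN"])
    return groups
```

Each key of the result would correspond to one tt:div element, and each list to the subtitles placed inside it.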

STLXML2STL: composite sequences not correctly mapped

Composite sequences with diacritical characters are not correctly mapped from STLXML to STL. This includes e.g.:

J́
j́
J̃
L̃
M̃
R̃
j̃
l̃
m̃
r̃
E̊
e̊

The reason is the different order of the diacritical combining character. While in Unicode it is a suffix, in EBU STL it is a prefix. So the char order has to be switched in such a case.
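
The reordering itself can be illustrated at the Unicode level. This sketch only moves combining marks in front of their base character; the subsequent mapping of each character to its EBU STL byte value is not shown, and the function name is hypothetical.

```python
import unicodedata


def to_stl_order(text):
    """Move each combining mark in front of the base character it follows."""
    out = []
    for ch in text:
        if unicodedata.combining(ch) and out:
            # Insert before the most recent base character.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)
```

For example, "J" followed by U+0301 (combining acute accent) becomes U+0301 followed by "J". Precomposed characters are left untouched here; whether they need to be decomposed first is a separate question.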

xslt error

Hi,

Using scf version 0.9.2, I have a problem with an STL file:

  • conversion of STL to STLXML works fine
  • conversion of STLXML to TTML crashes with
scf-0.9.2/modules/STLXML2EBU-TT/STLXML2EBU-TT.xslt:615: validity error : xml:id : attribute value {concat('SGN', .)} is not an NCName
                <tt:div style="defaultStyle" xml:id="{concat('SGN', .)}">
                                                                        ^
scf-0.9.2/modules/STLXML2EBU-TT/STLXML2EBU-TT.xslt:892: validity error : xml:id : attribute value {concat('sub', $SN)} is not an NCName
            end="{$end}">
                        ^
xmlXPathCompOpEval: function current-date not found
XPath error : Unregistered function
xmlXPathCompiledEval: evaluation failed
runtime error: file scf-0.9.2/modules/STLXML2EBU-TT/STLXML2EBU-TT.xslt line 287 element value-of
XPath evaluation returned no result.

I am not familiar with xslt transformations so I am kind of lost here.

Using scf version 0.2.4 (the version I usually use), it produces only the following error :

xsltproc scf-0.2.4/modules/STLXML2EBU-TT/STLXML2EBU-TT.xslt 1491467708_36188FRA_ST.stlxml.xml
scf-0.2.4/modules/STLXML2EBU-TT/STLXML2EBU-TT.xslt:879: validity error : xml:id : attribute value {concat('sub', $SN)} is not an NCName
            end="{$end}">

The output TTML is much bigger but still far from usable (some text is missing).

The STL is here : https://drive.google.com/open?id=0B60JiOl5bvMNc09ISUd6em9jMTg
A shell script of what I am doing : https://drive.google.com/open?id=0B60JiOl5bvMNZXRYWmZNT1pqWE0
Regards.

Mapping of user data from the GSI block

The user-defined area in the GSI block of an STL file should be mapped to STLXML and further to EBU-TT. The information in this user data field may be worth preserving. Additionally, this is a requirement for lossless round-tripping.

STLXML2EBU-TT error with xsltproc

I'm trying to do the XSLT conversion step from STLXML -> EBU-TT using xsltproc under Ubuntu 16.04 by calling:

xsltproc STLXML2EBU-TT.xslt STLXML.xml > EBU-TT.xml

xsltproc --version
Using libxml 20903, libxslt 10128 and libexslt 817
xsltproc was compiled against libxml 20903, libxslt 10128 and libexslt 817
libxslt 10128 was compiled against libxml 20902
libexslt 817 was compiled against libxml 20902

However, the process terminates with the following error messages:

STLXML2EBU-TT.xslt:669: validity error : xml:id : attribute value {concat('SGN', .)} is not an NCName
        <tt:div style="defaultStyle" xml:id="{concat('SGN', .)}">
                                                                ^
STLXML2EBU-TT.xslt:969: validity error : xml:id : attribute value {concat('sub', $SN)} is not an NCName
            end="{$end}">
                        ^
XPath error : Invalid expression
number($offsetTCP) eq 1
                   ^
compilation error: file STLXML2EBU-TT.xslt line 874 element when
xsl:when : could not compile test expression 'number($offsetTCP) eq 1'
XPath error : Invalid expression
string-length($tcp) ne 8
                    ^
compilation error: file STLXML2EBU-TT.xslt line 876 element if
xsl:if : could not compile test expression 'string-length($tcp) ne 8'

Are there any suggestions on how to solve this issue?

THX!

Only output the styles that are actually used

The current STLXML2EBU-TT stylesheet always outputs the whole set of styles for all the different foreground and background colour combinations even if only a subset of those styles are used. Those are then copied across by EBU-TT2EBU-TT-D and are therefore preserved. It would be a good improvement to insert only those styles into the EBU-TT that are actually used.

STLXML2STL: conversion aborts if more than two subtitle lines are used

When a subtitle uses more than two lines and every line uses the full 40 bytes of a Teletext line (e.g. 3 lines resulting in 3*40=120 bytes, plus line breaks), the capacity of a single TTI block (112 bytes) is exceeded. The conversion then aborts, as the case of multiple TTI blocks is currently not covered. Instead the conversion tries to apply a TTI block padding with a negative length, e.g.:

Stopped at stlxml2stl.xqm, 176/20:
[bin:negative-size] Size '-9' is negative.
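
One way to cover the multi-block case is to split the text field into 112-byte chunks before padding. The sketch below is in Python rather than XQuery. It assumes that earlier blocks of a subtitle are numbered from 00h upwards, that the last block carries EBN FFh, and that unused space is filled with 8Fh. It does not guard against splitting in the middle of a multi-byte sequence, and a real implementation may need to reserve a terminating byte per block.

```python
TF_SIZE = 112  # capacity of the text field of one TTI block
FILLER = 0x8F  # assumed filler byte for unused space


def split_text_field(tf):
    """Split TF bytes into (ebn, 112-byte field) tuples."""
    chunks = [tf[i:i + TF_SIZE] for i in range(0, len(tf), TF_SIZE)] or [b""]
    if len(chunks) - 1 > 0xF0:
        raise ValueError("text field needs too many extension blocks")
    blocks = []
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        ebn = 0xFF if is_last else index
        padding = bytes([FILLER]) * (TF_SIZE - len(chunk))  # never negative
        blocks.append((ebn, chunk + padding))
    return blocks
```

A 121-byte text field, which currently leads to the padding size of -9, would then produce two blocks: one full block with EBN 00h and a final block with EBN FFh holding the remaining 9 bytes plus filler.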

STLXML2EBU-TT TYPO STYLE

When the option to not trim white space is chosen, the inserted XML contains a typo: the style attribute is called stype instead of style.
