silnrsi / teckit
A Text Encoding Conversion toolkit
License: Other
The teckit-2.5.9.tar.gz archive contains a zlib-1.2.3 directory in which many files carry this license declaration:
/* adler32.c -- compute the Adler-32 checksum of a data stream
* Copyright (C) 1995-2004 Mark Adler
* For conditions of distribution and use, see copyright notice in zlib.h
*/
But the archive contains no zlib.h file. In the original zlib-1.2.3 sources, zlib.h reads:
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
That means the teckit archive violates the zlib license, because the copyright notice has been removed from the source distribution. Please add zlib.h, or at least the portion containing the copyright notice, back to the teckit release archive. I can see it exists in your git repository.
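To confirm which files a release archive actually ships, a small script can list matching member names. This is a minimal sketch using Python's standard `tarfile` module; the function name `find_members` and the command-line usage shown in the comment are illustrative only and not part of TECkit.

```python
import sys
import tarfile
from pathlib import PurePosixPath
from typing import List


def find_members(archive: str, filename: str) -> List[str]:
    """Return member paths in `archive` whose last path component is `filename`."""
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects gzip/bzip2/xz
        return [m.name for m in tar.getmembers()
                if PurePosixPath(m.name).name == filename]


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python find_members.py teckit-2.5.9.tar.gz zlib.h
    hits = find_members(sys.argv[1], sys.argv[2])
    print("\n".join(hits) if hits else f"{sys.argv[2]}: not found")
```

An empty result for zlib.h alongside hits for adler32.c would match the situation described above.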
On macOS 10.14.6 with expat-2.4.1 installed locally:
g++ -DHAVE_CONFIG_H -I. -I.. -I../source/Public-headers -DXML_DTD -I/sw/include -std=c++11 -g -O2 -DNDEBUG -MT ../SFconv/sfconv-SFconv.o -MD -MP -MF ../SFconv/.deps/sfconv-SFconv.Tpo -c -o ../SFconv/sfconv-SFconv.o `test -f '../SFconv/SFconv.cpp' || echo './'`../SFconv/SFconv.cpp
../SFconv/SFconv.cpp:49:10: fatal error: 'expat/xmlparse/xmlparse.h' file not found
#include "expat/xmlparse/xmlparse.h"
^~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
In SFconv.cpp, the choice between the ancient expat/xmlparse/xmlparse.h and the current expat.h is controlled by HAVE_LIBEXPAT, which is defined in config.h: #define HAVE_LIBEXPAT 1
The problem is that config.h is only included in SFconv.cpp in certain cases (see lines 35 to 40 of SFconv.cpp at commit eda0d44).
Also, expat/xmlparse/xmlparse.h is not included in the tarball, in case anyone was trying to build without the system expat.
After investing a fair bit of time (and not reading all the documentation first), I came to realize that txtconv won't simply ignore characters for which no mapping is specified. Hence, if my source document is an unstructured mix of Unicode and legacy text, all of the Unicode text gets replaced with the default or specified replacement character.
Since txtconv has to replace characters lacking a mapping with the replacement character anyway, is there any reason not to allow replacing an unmapped character with itself? I understand that txtconv seeks to guarantee that the final output is uniformly in the designated encoding, but an option to bypass this constraint would make the tool even more flexible (and useful for many unstructured, mixed-encoding scenarios).
Adding a "-u 3" option could be one way to implement this. If specified, txtconv would pass unmapped characters through unchanged, i.e. replace each one with itself. A warning could be printed if the self-replaced character(s) do not match the desired final encoding.
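The proposed behaviour can be sketched as a per-character policy. This is a minimal illustration, not TECkit's engine: real TECkit mappings are rule-based and context-sensitive, whereas here `mapping` is a hypothetical one-to-one table, and `convert` and `pass_through` are made-up names standing in for the suggested "-u 3" option.

```python
import sys
from typing import Dict, List, Tuple

REPLACEMENT = "\ufffd"  # assumed replacement character for this sketch


def convert(text: str, mapping: Dict[str, str], pass_through: bool = False,
            replacement: str = REPLACEMENT) -> Tuple[str, List[str]]:
    """Convert `text` one character at a time using a toy one-to-one table.

    Unmapped characters become `replacement`, unless `pass_through` is True
    (the proposed "-u 3" behaviour), in which case they are copied unchanged
    and also returned so the caller can print a warning.
    """
    out: List[str] = []
    unmapped: List[str] = []
    for ch in text:
        if ch in mapping:
            out.append(mapping[ch])
        elif pass_through:
            out.append(ch)       # keep the unmapped character as-is
            unmapped.append(ch)  # remember it for the warning
        else:
            out.append(replacement)
    return "".join(out), unmapped


if __name__ == "__main__":
    table = {"k": "\u0915", "a": "\u093e"}  # hypothetical legacy-to-Unicode pairs
    result, skipped = convert("ka \u0915", table, pass_through=True)
    print(result)
    if skipped:
        print(f"warning: {len(skipped)} character(s) passed through unmapped",
              file=sys.stderr)
```

Returning the passed-through characters, rather than printing inside the loop, leaves the caller free to decide how loudly to warn.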
This issue was reported as bug #338110 for the Gentoo Linux app-text/teckit-2.5.1 package. I don't know if it was ever reported upstream to you, but I thought it would be a good idea to make a note of it here and see what you think.
The report was:
I found a mixing of FLAGS:
/bin/sh ../libtool --tag=CXX --mode=link x86_64-pc-linux-gnu-g++ -O2 -pipe -march=core2 -frecord-gcc-switches -mssse3 -mcx16 -mmmx -g -Wmissing-include-dirs -Wenum-compare -DNDEBUG -Wl,-O1 -Wl,--as-needed -Wl,-O1,--hash-style=gnu,--sort-common -o txtconv TxtConv.o ../lib/libTECkit.la -lz
libtool: link: x86_64-pc-linux-gnu-g++ -O2 -pipe -march=core2 -frecord-gcc-switches -mssse3 -mcx16 -mmmx -g -Wmissing-include-dirs -Wenum-compare -DNDEBUG -Wl,-O1 -Wl,-O1 -Wl,--hash-style=gnu -Wl,--sort-common -o .libs/txtconv TxtConv.o -Wl,--as-needed ../lib/.libs/libTECkit.so -lz
cc1plus: warning: command line option "-Wimplicit-function-declaration" is valid for C/ObjC but not for C++
The fix is this patch:
No need to pass CFLAGS twice, esp. if they are used to feed g++
Bug #338110
Index: TECkit_2_5_1/configure.ac
===================================================================
--- TECkit_2_5_1.orig/configure.ac
+++ TECkit_2_5_1/configure.ac
@@ -76,7 +76,7 @@ noexpat_CFLAGS="$CFLAGS"
noexpat_LIBS="$LIBS"
AC_CHECK_LIB(expat, XML_ExpatVersion)
AM_CONDITIONAL(SYSTEM_EXPAT, test x$ac_cv_lib_expat_XML_ExpatVersion = xyes)
-expat_CFLAGS="$CFLAGS"
+expat_CFLAGS=""
expat_LIBS="$LIBS"
CFLAGS="$noexpat_CFLAGS"
LIBS="$noexpat_LIBS"
The patch is still being applied in Gentoo's current package, app-text/teckit-2.5.6.
During #13, I discovered that there is both a README.md file and a README file. I think it would be a good idea to remove one, probably the README, since it is not nicely formatted by GitHub's markdown renderer. The information in the README should then be moved to the README.md and/or to NEWS. I think the information about changes would be most appropriate in NEWS.
Travis-CI is failing in the cross-compilation of Windows binaries. I believe the error is here:
make[2]: Entering directory '/home/travis/build/spl/teckit/windows-build32/lib'
CXX ../source/Compiler.lo
[...]
i686-w64-mingw32-windres -o Compiler_ver.o ../../source/Compiler_ver.rc
CXXLD TECkit_Compiler_x86.la
/usr/bin/ld: unrecognized option '--add-stdcall-alias'
/usr/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status
Makefile:687: recipe for target 'TECkit_Compiler_x86.la' failed
[...]
The command "./build-windows-binaries.sh" exited with 2.
I looked into the ld error on Travis-CI:
$ /usr/bin/ld --version
GNU ld (GNU Binutils for Ubuntu) 2.26.1
$ /usr/bin/ld --help
Usage: /usr/bin/ld [options] file...
Options:
[...]
--add-stdcall-alias Export symbols with and without @nn
[...]
I'm not sure what the error means, since it appears that ld does have the --add-stdcall-alias flag.
But is this the right ld executable here? Or is there a different one from the MinGW packages that should be used?
In the process of looking into this, I found this warning from ../configure:
configure: WARNING: using cross tools not prefixed with host triplet
Searching for the warning led me to a page in the autoconf manual on Specifying Target Triplets. I wonder if that could be related to the ld error.
There are a number of posts around the internet with similar ld error reports, but the solution never seems to have anything to do with the ld flag --add-stdcall-alias itself. Plus, I'm assuming the TECkit developers use this script locally, so it must work in some places.
I get an error when attempting to run ./configure from a fresh download of TECkit 2.5.10 (from https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=TECkitDownloads) on macOS Mojave (version 10.14.5):
./configure: line 2118: syntax error near unexpected token `config.h'
./configure: line 2118: `AM_CONFIG_HEADER(config.h)'
Any help would be appreciated; please let me know if I should provide any additional information.
During #13, I discovered that there is both a ChangeLog file (empty) and a NEWS file. Is the former left over from before git? Can it be removed?
Is it possible to build with the system expat library? Or is the bundled library always included?
TeX Live uses part of TECkit and has patches at
https://www.tug.org/svn/texlive/trunk/Build/source/libs/teckit/TLpatches/
to eliminate compiler warnings.
A low-priority task would be to update the TECkit source so that most of these patches are no longer needed.
For now, TeX Live is OK with the patches.
I had a font with characters assigned to the PUA in plane 0; later the script was accepted into the Unicode standard and is now encoded in plane 1. I made a TECkit map (by hand) to do the conversion, but couldn't get it to work in Ubuntu. So I booted into Windows and installed SIL Converters. When I opened the TECkit editing program there, I noticed that the font glyph preview windows only let you preview characters in plane 0. I copied the map I had made into the editor and tried it in the test area. It seemed to convert the original PUA characters to some other code points in plane 0, but not to the plane 1 code points specified in the map.
Here is the map I made:
EncodingName "SIL-Hispa-2018"
DescriptiveName "Hispa.ttf font makes use of the Private Use Space of unicode to represent the Toto characters. Now they have been accepted into Unicode proper."
Version "0"
Contact "mailto:[email protected]"
RegistrationAuthority "SIL International"
RegistrationName "Hispa-2018"
RHSFlags (ExpectsNFC) ;NFC means that when going from Unicode back to legacy, the incoming data will be NFC-normalized before the mapping rules are applied. You can't normalize the LHS legacy data.
;these lines should be included in all normal TECkit maps, for handling
;characters below 32.
ByteClass [CTL] = ( 0x00 .. 0x1f )
UniClass [CTL] = ( U+0000 .. U+001f )
[CTL] <> [CTL]
pass(Unicode)
U+e600 <> U+01e290 ; 𞊐
U+e601 <> U+01e291 ; 𞊑
U+e602 <> U+01e296 ; 𞊖
U+e603 <> U+01e292 ; 𞊒
U+e604 <> U+01e293 ; 𞊓
U+e605 <> U+01e297 ; 𞊗
U+e606 <> U+01e294 ; 𞊔
U+e607 <> U+01e295 ; 𞊕
U+e608 <> U+01e298 ; 𞊘
U+e609 <> U+01e299 ; 𞊙
U+e60a <> U+01e29c ; 𞊜
U+e60b <> U+01e29f ; 𞊟
U+e60c <> U+01e29a ; 𞊚
U+e60d <> U+01e29d ; 𞊝
U+e60e <> U+01e2a0 ; 𞊠
U+e60f <> U+01e29b ; 𞊛
U+e610 <> U+01e29e ; 𞊞
U+e611 <> U+01e2ae ; ◌𞊮
U+e612 <> U+01e2a1 ; 𞊡
U+e613 <> U+01e2a2 ; 𞊢
U+e614 <> U+01e2a3 ; 𞊣
U+e615 <> U+01e2a5 ; 𞊥
U+e616 <> U+01e2a6 ; 𞊦
U+e617 <> U+01e2a7 ; 𞊧
U+e618 <> U+01e2a8 ; 𞊨
U+e619 <> U+01e2aa ; 𞊪
U+e61a <> U+01e2ab ; 𞊫
U+e61b <> U+01e2ac ; 𞊬
U+e61c <> U+01e2aa U+01e2ae ; 𞊪𞊮
U+e61d <> U+01e2ad ; 𞊭
U+e61e <> U+01e2ab U+01e29b ; 𞊫𞊛
U+e61f <> U+01e2a6 U+01e298 ; 𞊦𞊘
U+e620 <> U+01e2a9 ; 𞊩
U+e622 <> U+0027 ; quotesingle
U+e623 <> U+01e2a4 ; 𞊤
U+e612 U+e621 <> U+01e2a2 ; 𞊢
U+e614 U+e621 <> U+01e2a4 ; 𞊤
U+e616 U+e621 <> U+01e2a7 ; 𞊧
U+e618 U+e621 <> U+01e2a9 ; 𞊩
U+e61a U+e621 <> U+01e2ac ; 𞊬
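One hypothesis worth checking (it is not confirmed in the report): if a tool handles text as UTF-16 and shows code units rather than code points, each plane 1 target in this map would appear as two plane 0 values, a surrogate pair. The sketch below implements the standard UTF-16 encoding so the expected pairs can be compared with what the test area displays; the function name `utf16_code_units` is hypothetical.

```python
from typing import List


def utf16_code_units(cp: int) -> List[int]:
    """Encode one Unicode scalar value as UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]                  # BMP: a single code unit
    v = cp - 0x10000                 # 20 bits remain
    return [0xD800 + (v >> 10),      # high (lead) surrogate
            0xDC00 + (v & 0x3FF)]    # low (trail) surrogate


if __name__ == "__main__":
    # A few rules from the map above: PUA source -> plane 1 target.
    for src, dst in [(0xE600, 0x1E290), (0xE612, 0x1E2A1), (0xE61A, 0x1E2AB)]:
        units = " ".join(f"0x{u:04X}" for u in utf16_code_units(dst))
        print(f"U+{src:04X} -> U+{dst:05X} = UTF-16 {units}")
```

If the values seen in the editor match these pairs, the map itself is probably fine and the display layer is the issue.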
The attached text file and .tec file make txtconv segfault.
Big_mouth_frog_story-kali.txt
kali-mymr.zip
The license/License_LGPLv21.txt file quotes a Free Software Foundation postal address that is no longer valid. The current one can be found at https://www.gnu.org/licenses/old-licenses/lgpl-2.1.txt. Please update the license wording to give your users an up-to-date address.
I have written a converter from the Devanagari New font encoding to Unicode. It works well for our purposes but requires the conversion tools. Is there a way to use the mapping file so that I can call it from a simple web interface? I'd like to paste some Devanagari New text into a text box and get back Unicode text in another text box within a browser.
Hello!
Thanks for your really helpful TECkit package!
The license conditions say that the package is licensed under the CPL or the GNU LGPL. However, Debian has recently noticed that the included ConvertUTF.c and ConvertUTF.h files from Unicode, Inc. have different license conditions, ones which Debian has decided do not fulfil the Debian Free Software Guidelines. See Debian bug #823100 for a discussion of this. It turns out that this code is embedded within TeX Live too, as TECkit is used by XeTeX! And it is also embedded in a variety of other pieces of software, too.
Would it be feasible to either ask Unicode, Inc. to relicense this code, or to write replacement code which is licensed under the conditions of the rest of this package?
With many thanks!
P.S. This issue is now also being tracked in Debian for the experimental TECkit package at Bug #850438.
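A clean-room replacement would mainly need strict conversion routines between UTF-8, UTF-16 and UTF-32. The decoding step can be outlined as below; this is an illustrative Python sketch of the algorithm only (a real replacement would be C/C++), the name `decode_utf8` is hypothetical, and it is not a drop-in for the ConvertUTF.c API. It rejects overlong forms, surrogates and values above U+10FFFF, and can be cross-checked against Python's built-in codec.

```python
from typing import List


def decode_utf8(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into code points, raising ValueError on malformed input."""
    out: List[int] = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            cp, length, minimum = b0, 1, 0
        elif 0xC0 <= b0 < 0xE0:
            cp, length, minimum = b0 & 0x1F, 2, 0x80
        elif 0xE0 <= b0 < 0xF0:
            cp, length, minimum = b0 & 0x0F, 3, 0x800
        elif 0xF0 <= b0 < 0xF8:
            cp, length, minimum = b0 & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid lead byte {b0:#04x} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, length):
            b = data[i + k]
            if b & 0xC0 != 0x80:  # continuation bytes must be 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong encodings, surrogates and out-of-range values.
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point {cp:#x} at offset {i}")
        out.append(cp)
        i += length
    return out


if __name__ == "__main__":
    sample = "a\u00e9\u20ac\U0001E290".encode("utf-8")
    print([f"U+{cp:04X}" for cp in decode_utf8(sample)])
```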
Building the TECkit-2.5.3 package as part of the xetex build fails during compilation:
In file included from ../../../source/libs/teckit/TECkit-2.5.3/source/Engine.cpp:120:0:
../../../source/libs/teckit/TECkit-2.5.3/source/NormalizationData.c:2575:1: error: narrowing conversion of ‘69786’ from ‘int’ to ‘UInt16 {aka short unsigned int}’ inside { } [-Wnarrowing]
};
^
../../../source/libs/teckit/TECkit-2.5.3/source/NormalizationData.c:2575:1: error: narrowing conversion of ‘69788’ from ‘int’ to ‘UInt16 {aka short unsigned int}’ inside { } [-Wnarrowing]
../../../source/libs/teckit/TECkit-2.5.3/source/NormalizationData.c:2575:1: error: narrowing conversion of ‘69803’ from ‘int’ to ‘UInt16 {aka short unsigned int}’ inside { } [-Wnarrowing]
I am using gcc 7.2.0
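All three rejected values exceed 0xFFFF, the maximum of a 16-bit `UInt16`. Since NormalizationData.c is included from Engine.cpp it is compiled as C++, and from C++11 on a brace-initialised constant that does not fit the element type is an ill-formed narrowing conversion. The quick check below is a sketch; the helper name `narrowing_report` is hypothetical.

```python
from array import array
from typing import List, Tuple

UINT16_MAX = 0xFFFF

# Values quoted in the gcc 7.2.0 -Wnarrowing errors above.
REPORTED = [69786, 69788, 69803]


def narrowing_report(values: List[int]) -> List[Tuple[int, str, bool, int]]:
    """For each value: (value, hex form, fits in 16 bits?, low 16 bits after truncation)."""
    return [(v, f"{v:#x}", v <= UINT16_MAX, v & UINT16_MAX) for v in values]


if __name__ == "__main__":
    for value, hexform, fits, truncated in narrowing_report(REPORTED):
        print(f"{value} = {hexform}: fits in UInt16: {fits}, "
              f"truncated to 16 bits: {truncated:#x}")
    # A 16-bit array rejects the values outright; a 32-bit-capable one keeps them.
    try:
        array("H", REPORTED)
    except OverflowError as exc:
        print("array('H'):", exc)
    print("array('L'):", list(array("L", REPORTED)))
```

In hex the values are 0x1109a, 0x1109c and 0x110ab, which look like supplementary-plane code points. If that is what the table holds, widening its element type to 32 bits is a likely fix, but that is an assumption to verify against the source.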