Comments (16)

JamesHeinrich avatar JamesHeinrich commented on August 23, 2024

A UTF-16 sample file would be a great start.

from getid3.

nicklan avatar nicklan commented on August 23, 2024

Sure. This file: http://datashat.net/music_for_programming_10-unity_gain_temple.mp3 (from http://musicforprogramming.net/) shows the problem.

Screenshot of what I'm seeing:
2015-11-23-221744_1066x675_scrot

and output of id3v2 -l:
2015-11-23-221905_893x102_scrot

JamesHeinrich avatar JamesHeinrich commented on August 23, 2024

Those non-displayable characters are indeed the byte order mark (BOM) from the UTF-16 text.

The ID3 documentation specifies this regarding text encodings:

Frames that allow different types of text encoding contains a text
encoding description byte. Possible encodings:

 $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
 $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
       strings in the same frame SHALL have the same byteorder.
       Terminated with $00 00.
 $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
       Terminated with $00 00.
 $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

Strings dependent on encoding are represented in frame descriptions
as <text string according to encoding>, or <full text string
according to encoding> if newlines are allowed. Any empty strings of
type $01 which are NULL-terminated may have the Unicode BOM followed
by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).

Your file is tagged with encoding 01 "UTF-16" which means the text could be either big-endian or little-endian, as determined by the BOM at the start of the string. Without the BOM it is unknown how to display (or convert) the text since it's not known what order the bytes come in. With encoding 02 "UTF-16BE" the order is known so the BOM is not needed.
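
To make the role of the BOM concrete, here is a minimal sketch of decoding an encoding-$01 string by hand. It is not getID3's code: `utf16_bom_to_utf8()` and `codepoint_to_utf8()` are hypothetical helpers written for illustration, they assume PHP 7.1 or later, and they expect the `$00 00` terminator to have been stripped already.

```php
<?php
// Hypothetical helper (not part of getID3): encode one Unicode code point as UTF-8.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: decode UTF-16 whose byte order is announced by a leading BOM.
// Returns null when the BOM is missing or the data is malformed.
function utf16_bom_to_utf8(string $bytes): ?string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        $format = 'v*'; // little-endian 16-bit units
    } elseif ($bom === "\xFE\xFF") {
        $format = 'n*'; // big-endian 16-bit units
    } else {
        return null;    // no BOM: the byte order cannot be determined
    }
    $body = substr($bytes, 2);
    if ($body === '') {
        return '';      // BOM only, i.e. an empty string
    }
    if (strlen($body) % 2 !== 0) {
        return null;    // truncated 16-bit unit
    }
    $units = array_values(unpack($format, $body));
    $out = '';
    for ($i = 0, $n = count($units); $i < $n; $i++) {
        $cp = $units[$i];
        if ($cp >= 0xD800 && $cp <= 0xDBFF) {
            // High surrogate: combine with the low surrogate that must follow.
            $low = $units[$i + 1] ?? 0;
            if ($low < 0xDC00 || $low > 0xDFFF) {
                return null;
            }
            $cp = 0x10000 + (($cp - 0xD800) << 10) + ($low - 0xDC00);
            $i++;
        } elseif ($cp >= 0xDC00 && $cp <= 0xDFFF) {
            return null; // low surrogate without a preceding high surrogate
        }
        $out .= codepoint_to_utf8($cp);
    }
    return $out;
}

echo utf16_bom_to_utf8("\xFF\xFE\x48\x00\x69\x00"), "\n"; // little-endian BOM + "Hi"
echo utf16_bom_to_utf8("\xFE\xFF\x00\x48\x00\x69"), "\n"; // big-endian BOM + "Hi"
```

Both calls yield the same two characters because the BOM tells the decoder which byte comes first. Given the same bytes without the BOM, the function returns null rather than guessing.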

I did make a small change to remove the BOM from empty frame description fields (the description is usually blank). The BOM will remain for non-empty descriptions as well as for the actual data.
88d284f

Normally you would pull the comment data you need from $info['comments']['title'] rather than $info['id3v2']['COMM'][0]['data'] and the data there is (by default) already converted to UTF-8 which intrinsically removes the BOM. If you do need to process your data directly in UTF-16 for whatever reason then you would need the BOM intact otherwise your string couldn't be handled.
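
As a rough sketch of reading from the merged comments array rather than the raw frame data: `first_comment()` is a hypothetical helper, and the `$info` array below is a hand-built stand-in with placeholder values, not real getID3 output (which contains many more keys).

```php
<?php
// Hypothetical helper: return the first value stored under a key in the
// merged comments array, or null if that key is absent.
function first_comment(array $info, string $key): ?string
{
    $values = $info['comments'][$key] ?? [];
    return (is_array($values) && isset($values[0])) ? (string) $values[0] : null;
}

// Stand-in for an analyzed file: the merged comment is already UTF-8,
// while the raw frame data still carries its UTF-16 BOM.
$info = [
    'comments' => ['title' => ['Example Title']],
    'id3v2'    => ['COMM' => [['data' => "\xFF\xFEH\x00i\x00"]]],
];

echo first_comment($info, 'title'), "\n";
var_dump(first_comment($info, 'artist')); // key not present in the stand-in
```

Each comments entry is an array because several tag formats can contribute a value for the same field, so code that wants a single title has to pick one element.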

nicklan avatar nicklan commented on August 23, 2024

Ahh yes, this makes sense. Can I ask, though, why $info['comments']['title'] seems to be an array of two elements: one without the BOM but shortened, and one still with the BOM (I assume) but with all the rest of the text? See below:
2015-11-24-150841_902x363_scrot

JamesHeinrich avatar JamesHeinrich commented on August 23, 2024

That shouldn't be. There should only be one instance of each title, without the BOM. Please check that you've mirrored all the changes from GitHub.

g61

nicklan avatar nicklan commented on August 23, 2024

I have the latest version and I'm still seeing the same as above. I made a fresh checkout of the repo, and at the bottom of the page I see "Powered by getID3() v1.9.10-201511241457" which seems to be the latest version. (Thanks very much for looking into this by the way!)

nicklan avatar nicklan commented on August 23, 2024

Well, I think I know why there are two entries: one seems to come from the id3v1 tag (the shortened one) and one from the id3v2 tag (with the BOM). You probably already figured that out :) But I'm still not sure why you're not seeing that behavior. Could there be something in my PHP settings? I'm on 5.6.4 64-bit.

JamesHeinrich avatar JamesHeinrich commented on August 23, 2024

My best guess would be that your PHP installation doesn't have native iconv() support and it's relying on getid3_lib::iconv_fallback() and there may be an issue in there.

Note that this is simply a guess at this point, I'll need to take a look at that tomorrow and see if I can find a problem. I'll let you know.

JamesHeinrich avatar JamesHeinrich commented on August 23, 2024

Can you save the entire output of demo.browse for that file to a .html file and attach it here please?

nicklan avatar nicklan commented on August 23, 2024

Sure, attached below (as .txt so github would let me). I'll have a look too and see if I can figure anything out with the iconv thing, thanks for the hint.

getID3() - _demo_demo.browse.php (sample script).txt

JamesHeinrich avatar JamesHeinrich commented on August 23, 2024

If I disable the built-in iconv and use getID3's version it still works correctly. Perhaps there is an issue with your built-in version of iconv?

First let's check if it's there, what version if available, and then try a very simple conversion using both PHP's iconv() function and getID3's version:

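require_once('N:/webroot/_github/getID3/getid3/getid3.lib.php'); // adjust to your own getID3 path
$string = "\xFF\xFE\x48\x00\x69\x00"; // little-endian BOM + "Hi" in UTF-16
echo '<pre>';
// Is iconv() available in PHP, and which iconv binary is installed on the system?
echo (function_exists('iconv') ? 'yes: '.`iconv --version` : 'no').'<hr>';
// Conversion with PHP's native iconv()
var_dump(iconv('UTF-16', 'UTF-8//TRANSLIT', $string));
// Same conversion with getID3's pure-PHP fallback
var_dump(getid3_lib::iconv_fallback('UTF-16', 'UTF-8//TRANSLIT', $string));
echo '</pre>';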

They should both just say "Hi" with no BOM, 2 chars long. I suspect one of them will be 4-chars with a BOM.

nicklan avatar nicklan commented on August 23, 2024

Yep, looks like PHP's iconv() is failing and getID3's fallback is leaving the BOM:

yes: iconv (Gentoo 2.21-r1 p5) 2.21
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Ulrich Drepper.
bool(false)
string(6) "��Hi"

nicklan avatar nicklan commented on August 23, 2024

ahh, and the iconv error is: "Notice: iconv(): Wrong charset, conversion from `UTF-16' to `UTF-8//TRANSLIT' is not allowed in [path_to_test].php on line 13" (the iconv line)

nicklan avatar nicklan commented on August 23, 2024

couple of other notes

  • on the command line, iconv seems to be able to convert from utf-16 to utf-8 without a problem (i.e. not going through php). not sure if that's at all relevant but I wanted to test.
  • i've tried UTF-8//IGNORE and UTF-8 with the same results

nicklan avatar nicklan commented on August 23, 2024

ohh, and if i run php at the command line, it works. outputting:

<pre>yes: iconv (Gentoo 2.21-r1 p5) 2.21
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Ulrich Drepper.
<hr>string(2) "Hi"
string(2) "Hi"

So it must be something with my nginx install. Yar. I will keep hunting.

nicklan avatar nicklan commented on August 23, 2024

Okay, turned out to be an issue with php-fpm which wasn't loading the iconv shared libraries properly. Thanks for the help pin-pointing it!
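
For anyone hitting the same symptom, a small probe along these lines, requested through the web server and also run from the command line, may show whether the two PHP environments differ. This is only a sketch: `probe_iconv()` is a hypothetical helper name, and the test string is the same BOM + "Hi" sample used earlier in the thread.

```php
<?php
// Hypothetical helper: report whether iconv is usable under the current SAPI.
function probe_iconv(): array
{
    $report = [
        'sapi'   => PHP_SAPI,                                   // e.g. "cli" or "fpm-fcgi"
        'loaded' => extension_loaded('iconv'),                  // is the extension present?
        'impl'   => defined('ICONV_IMPL') ? ICONV_IMPL : null,  // e.g. "glibc" or "libiconv"
        'works'  => false,
    ];
    if (function_exists('iconv')) {
        // Little-endian BOM + "Hi"; the notice/warning is suppressed so that a
        // failed conversion shows up as works === false instead of as noise.
        $out = @iconv('UTF-16', 'UTF-8', "\xFF\xFE\x48\x00\x69\x00");
        $report['works'] = ($out === 'Hi');
    }
    return $report;
}

var_export(probe_iconv());
echo "\n";
```

If 'loaded' is true but 'works' is false under one SAPI only, the extension is present but cannot perform the conversion in that environment, which matches what was seen here with php-fpm.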
