Git Product home page Git Product logo

Comments (11)

Stewori avatar Stewori commented on August 18, 2024 1

Oh, okay \x00 is really the plain 0, I overlooked that. This causes C string handling to terminate it. This makes sense now. Then I'd suggest to use the encoding UTF-8 which should be close to the original NewStringUTF-based implementation.
The lines

jclass strClass = (*env)->FindClass(env, "Ljava/lang/String;");
jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BLjava/lang/String;)V");

can be removed because in JyNI these IDs are already cached at startup. I suggest this implementation:

jstring charsetName;
jobject strByteArray, jstr;
jbyte* strBytes;
Py_ssize_t len = PyString_GET_SIZE(src);

env(NULL);
charsetName = (*env)->NewStringUTF(env, "UTF-8");
strByteArray = (*env)->NewByteArray(env, len);
strBytes = (*env)->GetByteArrayElements(env, strByteArray, NULL);
memcpy(strBytes, PyString_AS_STRING(src), len);
(*env)->ReleaseByteArrayElements(env, strByteArray, strBytes, 0); //copy back and free buffer
jstr = (*env)->NewObject(env, stringClass,
		string_fromBytesAndCharsetNameConstructor, strByteArray, charsetName);
//todo: check interned-regulations on jython-side
if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
	jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);

We should move charsetName = (*env)->NewStringUTF(env, "UTF-8"); to be a global constant since this here might be a heavily used function. Same with charsetName = (*env)->NewStringUTF(env, "UTF-16BE"); in the PyUnicode equivalent. But that's another todo.

from jyni.

Stewori avatar Stewori commented on August 18, 2024

Thanks for reporting this issue. However, a copy/paste-able form would have been convenient for reproduction, so I'll share it here:

import sys
sys.path.append('/usr/lib/python2.7/lib-dynload')
import mmap
print mmap
f = open('jython.jar', "rb")
mp = mmap.mmap(int(f.fileno()), 0, access=mmap.ACCESS_READ)
print repr(mp[:8])
print len(mp)

Added a length output, which at least yields the correct value.
Does the truncation always occur at a line break? That might be a starting point.
I cannot tell right now when I will find time to investigate this further.
Feel free to track down this issue in native code, then I would help as far as I can.

from jyni.

Stewori avatar Stewori commented on August 18, 2024

I just tried it with some text file and there it works fine. Seems to be unrelated to newline characters.
On the other hand I tried it on a distinct binary file, where I could even provoke a crash:

Traceback (most recent call last):
  File "/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py", line 17, in <module>
    print repr(mp[:60])
java.lang.NullPointerException
	at org.python.core.__builtin__.repr(__builtin__.java:1089)
	at org.python.core.BuiltinFunctions.__call__(__builtin__.java:105)
	at org.python.core.PyObject.__call__(PyObject.java:460)
	at org.python.pycode._pyx0.f$0(/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py:20)
	at org.python.pycode._pyx0.call_function(/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py)
	at org.python.core.PyTableCode.call(PyTableCode.java:171)
	at org.python.core.PyCode.call(PyCode.java:18)
	at org.python.core.Py.runCode(Py.java:1589)
	at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
	at org.python.util.jython.run(jython.java:366)
	at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException

So, this seems to depend somehow on file content. print repr(mp[:60]) and any higher value cause the crash for that file while e.g. print repr(mp[:50]) succeeds. There must be specific values in the data that blow up string conversion. Maybe this is a general issue with converting arbitrary data to strings or an encoding issue. Maybe it actually boils down to a Jython or even a Java thing. Does non-string access to the data work okay to some extend?

from jyni.

666asd avatar 666asd commented on August 18, 2024

Thanks for your help. I think the problem is due to \x00. i have changed the function JySync_Init_JyString_From_PyString in JyNI-C/JySync.c . it seems to work fine.

aa

this is my change.

jobject JySync_Init_JyString_From_PyString(PyObject* src, jclass subtype)
{
    jstring jstr;
    env(NULL);
    int size = ((PyStringObject *)(src))->ob_size;
    jclass strClass = (*env)->FindClass(env, "Ljava/lang/String;");
    jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BLjava/lang/String;)V");
    jbyteArray bytes = (*env)->NewByteArray(env, size);
    (*env)->SetByteArrayRegion(env, bytes, 0, size, (jbyte*) ((PyStringObject *)(src))->ob_sval);
    jstring encoding = (*env)->NewStringUTF(env, "GB2312");
    jstr = (*env)->NewObject(env, strClass, ctorID, bytes, encoding);
    if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
        jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
    return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);
}

this result
image

from jyni.

666asd avatar 666asd commented on August 18, 2024

But the crash still appears.

>>> mp[13:14]
'J'
>>> mp[12:14]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
java.lang.NullPointerException
	at org.python.core.PySystemState.displayhook(PySystemState.java:1489)
	at org.python.core.PySystemStateFunctions.__call__(PySystemState.java:1905)
	at org.python.core.PyObject.invoke(PyObject.java:3727)
	at org.python.core.Py.printResult(Py.java:2268)
	at org.python.pycode._pyx40.f$0(<stdin>:1)
	at org.python.pycode._pyx40.call_function(<stdin>)
	at org.python.core.PyTableCode.call(PyTableCode.java:171)
	at org.python.core.PyCode.call(PyCode.java:18)
	at org.python.core.Py.runCode(Py.java:1614)
	at org.python.core.Py.exec(Py.java:1658)
	at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:276)
	at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:131)
	at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:116)
	at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:62)
	at org.python.util.InteractiveConsole.push(InteractiveConsole.java:187)
	at org.python.util.InteractiveConsole._interact(InteractiveConsole.java:168)
	at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:126)
	at org.python.util.jython.run(jython.java:419)
	at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException
>>> mp[13:14]
'J'
>>> mp[10:11]
'n'
>>> mp[10:12]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
java.lang.NullPointerException
	at org.python.core.PySystemState.displayhook(PySystemState.java:1489)
	at org.python.core.PySystemStateFunctions.__call__(PySystemState.java:1905)
	at org.python.core.PyObject.invoke(PyObject.java:3727)
	at org.python.core.Py.printResult(Py.java:2268)
	at org.python.pycode._pyx43.f$0(<stdin>:1)
	at org.python.pycode._pyx43.call_function(<stdin>)
	at org.python.core.PyTableCode.call(PyTableCode.java:171)
	at org.python.core.PyCode.call(PyCode.java:18)
	at org.python.core.Py.runCode(Py.java:1614)
	at org.python.core.Py.exec(Py.java:1658)
	at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:276)
	at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:131)
	at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:116)
	at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:62)
	at org.python.util.InteractiveConsole.push(InteractiveConsole.java:187)
	at org.python.util.InteractiveConsole._interact(InteractiveConsole.java:168)
	at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:126)
	at org.python.util.jython.run(jython.java:419)
	at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException

from jyni.

Stewori avatar Stewori commented on August 18, 2024

Okay, these seem to be two separate things. The idea to fix it in JySync_Init_JyString_From_PyString is promising. However, I don't see why changing to another encoding would solve it. As far as I understand it, we get bytes from a binary file that cannot be encoded as a string properly, using utf8. How can you know that this wouldn't fail with some other input for GB2312 as well? Actually I would think that an exception on attempting to turn arbitrary bytes into a string is justified. However, it does not occur with CPython, so they have some way to resolve it. We must look how it is done there and port that solution to JyNI. Do they use GB2312 in CPython? (I doubt, but haven't looked it up so far).

For the NullPointerException we'll have to understand why PySystemState.displayHook gets a null-argument...

from jyni.

666asd avatar 666asd commented on August 18, 2024

I modified the way to get the length of C string when it was converted to jstring. In the previous way, when the string of C was converted to jstring, the string was truncated because of \x00. Modifying the encoding method to GB2312 does not solve the problem of string truncation. The code is copied from the network, and the original code is GB2312 encoded. I found that the code can take effect so it doesn't change. I have the same results for utf8.

from jyni.

666asd avatar 666asd commented on August 18, 2024

@Stewori Did you analyze the cause of NullPointerException? I couldn't find the reason.

from jyni.

Stewori avatar Stewori commented on August 18, 2024

Sorry, this is currently in conflict with my work on #22 in my local branch, causing additional errors. Sure, I could make a second clone for this, but anyway -- I prefer to work on one issue at a time. Given that #22 is tricky, it might take a while until I can follow up here. So far it's limited to rough comments, review and style hints, e.t.c. from my side.

from jyni.

666asd avatar 666asd commented on August 18, 2024

i find pyfile in jython use the function to convent byte[] to string.
image

then i change the function JySync_Init_JyString_From_PyString.

jobject JySync_Init_JyString_From_PyString(PyObject* src, jclass subtype)
{
        jstring charsetName;
        jobject strByteArray, jstr;
        jbyte* strBytes;
        Py_ssize_t len = PyString_GET_SIZE(src);

        env(NULL);
        charsetName = (*env)->NewStringUTF(env, "UTF-8");
        strByteArray = (*env)->NewByteArray(env, len);
        strBytes = (*env)->GetByteArrayElements(env, strByteArray, NULL);
        memcpy(strBytes, PyString_AS_STRING(src), len);
        (*env)->ReleaseByteArrayElements(env, strByteArray, strBytes, 0); //copy back and free buff$
        jclass strClass = (*env)->FindClass(env, "java/lang/String");
        jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BIII)V");

        jstr = (*env)->NewObject(env, strClass, ctorID, strByteArray, 0, 0, len);
        //todo: check interned-regulations on jython-side
        if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
                jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
                return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);
}

now it can work.
image

from jyni.

Stewori avatar Stewori commented on August 18, 2024

Did you check that the conversion matches CPython behavior?
The constructor you suggest now would fill the upper 8 bits with zeros. This should be the same like creating an array of jchars and assigning the bytes to it in a loop (goodbye memcopy). Based on this we should get the same result using the constructor String(char[] value, int offset, int count), but avoiding a deprecated constructor.

With that taken into account, would you turn your fix into a PR?

If you would turn it into a PR, these two lines should be adjusted to fit with JyNI's approach to handle JNI:

jclass strClass = (*env)->FindClass(env, "java/lang/String");
jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BIII)V");

The first lookup can be avoided by directly using stringClass, which is globally cached on JyNI startup. The second line should move to JyNI_JNI.c, see this post for a rough usage overview.

Otherwise I can refactor it into a commit as soon as I find time.
Thanks for finding this isue and thanks even more for solving it!

from jyni.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.