alibaba / pemja
License: Apache License 2.0
I am running Flink 1.16 on a Mac M1. Everything works as expected, apart from a few tweaks I had to make to get PyFlink 1.16 working on the M1. However, when I tested the job in thread mode, I got the following error:
2022-11-14 17:01:51
pemja.core.PythonException: <class 'TypeError'>: 'NoneType' object is not iterable
at /usr/local/lib/python3.8/site-packages/pyflink/fn_execution/embedded/operations.process_element2(operations.py:140)
at /usr/local/lib/python3.8/site-packages/pyflink/fn_execution/embedded/operations._output_elements(operations.py:57)
at /usr/local/lib/python3.8/site-packages/pyflink/fn_execution/embedded/operations._process_elements_on_operation(operations.py:48)
at /usr/local/lib/python3.8/site-packages/pyflink/fn_execution/datastream/embedded/operations.process_element_func2(operations.py:208)
at /usr/local/lib/python3.8/site-packages/pyflink/fn_execution/datastream/embedded/operations.process_func(operations.py:111)
at pemja.core.object.PyIterator.next(Native Method)
at pemja.core.object.PyIterator.hasNext(PyIterator.java:40)
at org.apache.flink.streaming.api.operators.python.embedded.AbstractTwoInputEmbeddedPythonFunctionOperator.processElement2(AbstractTwoInputEmbeddedPythonFunctionOperator.java:208)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.processRecord2(StreamTwoInputProcessorFactory.java:225)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.lambda$create$1(StreamTwoInputProcessorFactory.java:194)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:266)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:542)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:831)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:780)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:914)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
at java.base/java.lang.Thread.run(Thread.java:829)
These are the relevant settings in my job:
env = StreamExecutionEnvironment.get_execution_environment()
env.set_stream_time_characteristic(TimeCharacteristic.EventTime)
# Additional python settings
env_config = Configuration(
    j_configuration=get_j_env_configuration(env._j_stream_execution_environment)
)
env_config.set_string("python.execution-mode", "thread")
I am running the job with a parallelism of 2.
Hi, in a Java web application, is it possible to use PemJa to call a deep learning model written in Python in order to implement an AI service? Our scenario is this: there are two teams, one good at writing Java web applications, the other good at researching machine translation models in Python. We hope to use PemJa to connect these two teams and deliver a robust and efficient model service, so that no C++ or Python development team is needed to build the web service. Does PemJa's design support Java calls to Python instances that stay resident in RAM or GPU memory?
FLIP-206 compares PemJa to Jython, GraalVM, JPype and Jep
deeplearning4j's Python4J framework seems more comparable to PemJa than any of these
It depends on javacpp-presets, which bundles CPython into a JAR, and on JavaCPP to abstract away the JNI
API example:
try(PythonGIL gil = PythonGIL.lock()){
    try(PythonGC gc = PythonGC.watch()){
        List<PythonVariable> inputs = new ArrayList<>();
        inputs.add(new PythonVariable<>("x", PythonTypes.STR, "Hello "));
        inputs.add(new PythonVariable<>("y", PythonTypes.STR, "World"));
        PythonVariable out = new PythonVariable<>("z", PythonTypes.STR);
        String code = "z = x + y";
        PythonExecutioner.exec(code, inputs, Collections.singletonList(out));
        System.out.println(out.getValue());
    }
}catch (Throwable e){
    e.printStackTrace();
}
A comparison between this approach and that of PemJa would be very useful
Will PemJa support the Windows operating system?
I'm trying to build a docker image to host my app with both Java and Python components
FROM python:3.9
RUN apt-get update
RUN apt install default-jre -y
COPY myapp.jar ./
CMD java -classpath myapp.jar foo.Main
But I get the error
java.lang.RuntimeException: Failed to find libpython
at pemja.utils.CommonUtils.getPythonLibrary(CommonUtils.java:175)
at pemja.core.PythonInterpreter$MainInterpreter.initialize(PythonInterpreter.java:358)
at pemja.core.PythonInterpreter.initialize(PythonInterpreter.java:145)
at pemja.core.PythonInterpreter.<init>(PythonInterpreter.java:46)
This appears to be because the path pattern doesn't match what is in this Python image.
PemJa looks for ^libpython.*so$:
String libPythonPathPattern;
if (isLinuxOs()) {
    libPythonPathPattern = "^libpython.*so$";
} else if (isMacOs()) {
    libPythonPathPattern = "^libpython.*dylib$";
} else {
    throw new RuntimeException("Unsupported os ");
}
if (libFile.isDirectory()) {
    for (File f : Objects.requireNonNull(libFile.listFiles())) {
        if (f.isFile() && Pattern.matches(libPythonPathPattern, f.getName())) {
            return f.getAbsolutePath();
        }
    }
}
throw new RuntimeException("Failed to find libpython");
whereas the actual contents of /usr/local/lib are:
libpython3.9.so libpython3.9.so.1.0 libpython3.so pkgconfig python3.9
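To see which of these names the Linux pattern accepts, the quoted regular expression can be run against the directory listing from the report. This is only a minimal sketch: the class name LibPythonPatternCheck is made up for illustration, and only the pattern string and the file names come from the report.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LibPythonPatternCheck {
    // Linux pattern copied from the CommonUtils excerpt in the report.
    static final String LINUX_PATTERN = "^libpython.*so$";

    // Pattern.matches requires the whole file name to match the pattern.
    static boolean matches(String fileName) {
        return Pattern.matches(LINUX_PATTERN, fileName);
    }

    public static void main(String[] args) {
        // Entries of /usr/local/lib as listed in the report.
        List<String> names = Arrays.asList(
                "libpython3.9.so", "libpython3.9.so.1.0", "libpython3.so",
                "pkgconfig", "python3.9");
        for (String name : names) {
            System.out.println(name + " -> " + matches(name));
        }
    }
}
```

Under this check libpython3.9.so and libpython3.so match, while libpython3.9.so.1.0 does not. It may therefore be worth verifying which directory libFile actually points to in this image; that part is a guess and is not confirmed by the stack trace.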
PemJa has no Anaconda (conda) package. Providing one would make installation easier for those who want to use PemJa in their own packages.
In the PythonInterpreter code, MainInterpreter implements Serializable but has non-serializable CountDownLatch fields.
This may break the semantics of the Serializable interface.
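The concern can be reproduced in isolation with plain JDK classes. The sketch below uses two hypothetical stand-in classes (not PemJa's MainInterpreter) and assumes the latch is an instance field; a static field would not take part in serialization at all.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.CountDownLatch;

class LatchSerializationDemo {
    // Hypothetical stand-in: Serializable class with a plain CountDownLatch field.
    static class WithLatch implements Serializable {
        private static final long serialVersionUID = 1L;
        final CountDownLatch latch = new CountDownLatch(1);
    }

    // Same shape, but the latch is excluded from serialization.
    static class WithTransientLatch implements Serializable {
        private static final long serialVersionUID = 1L;
        final transient CountDownLatch latch = new CountDownLatch(1);
    }

    // Returns false if Java serialization rejects the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain field:     " + canSerialize(new WithLatch()));
        System.out.println("transient field: " + canSerialize(new WithTransientLatch()));
    }
}
```

Marking the field transient lets serialization succeed, but the field is then null after deserialization and would have to be recreated, so whether that is an acceptable fix depends on how the class is actually used.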
There is an open PR for building pemja on arm64 here: https://github.com/alibaba/pemja/pull/40/files
Requesting that this PR be finished and merged in to allow building on arm64 architectures.
python function:
train_data = np.array(df)
return train_data
java invoke return type:
pemja.core.object.PyObject
How can I return a Python null array to the Java side? The return type is java.util.List.
python setup.py egg_info
The error message is:
numpy not found
python -m pip install -r flink-python/dev/dev-requirements.txt
The error message is:
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting pemja==0.1.5
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/36/32/18615e64be80b70c4c95dc2d7d3b20a9d706dc5fcefc92ebafe0348ca3dc/pemja-0.1.5.tar.gz (32 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 255
╰─> [1 lines of output]
numpy not found
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
https://github.com/exaloop/codon
Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead.
If PemJa integrated Codon, I believe it would greatly increase the project's popularity.
Executing this example from the README fails:
interpreter.set("a", 12345);
interpreter.get("a"); // Object
interpreter.get("a", int.class);
With error: Exception in thread "main" pemja.core.PythonException: Unknown Number class int.
interpreter.get("a", Integer.class);
runs but returns a boxed integer instead of the primitive int
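With a signature of the form <T> T get(String name, Class<T> clazz), the static type of int.class is Class<Integer>, so the result is a boxed Integer either way and auto-unboxing yields the primitive. The sketch below illustrates this with a hypothetical stand-in lookup (not PemJa code), plus one possible way an API could normalize primitive class tokens before dispatching on them.

```java
import java.util.HashMap;
import java.util.Map;

class PrimitiveTokenDemo {
    // Hypothetical stand-in for an interpreter's variable table.
    static final Map<String, Object> VARIABLES = new HashMap<>();

    // Maps a primitive class token to its wrapper; other classes pass through.
    @SuppressWarnings("unchecked")
    static <T> Class<T> wrap(Class<T> clazz) {
        if (clazz == int.class) return (Class<T>) Integer.class;
        if (clazz == long.class) return (Class<T>) Long.class;
        if (clazz == double.class) return (Class<T>) Double.class;
        if (clazz == boolean.class) return (Class<T>) Boolean.class;
        return clazz;
    }

    // Typed lookup that accepts both int.class and Integer.class.
    static <T> T get(String name, Class<T> clazz) {
        return wrap(clazz).cast(VARIABLES.get(name));
    }

    public static void main(String[] args) {
        VARIABLES.put("a", 12345);
        int viaWrapper = get("a", Integer.class); // auto-unboxed to int
        int viaPrimitive = get("a", int.class);   // token normalized to Integer.class
        System.out.println(viaWrapper + " " + viaPrimitive);
    }
}
```

On the caller's side, assigning the result of get("a", Integer.class) to an int variable already gives a primitive through auto-unboxing.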
I've just looked into why pemja 0.4.0 is not found. The reason is:
0.3.0 has a pom inside: https://repo.maven.apache.org/maven2/com/alibaba/pemja/0.3.0/
0.4.0 has no pom inside: https://repo.maven.apache.org/maven2/com/alibaba/pemja/0.4.0/
In many scenarios, Python is used as the entry point to call Java code, but PemJa does not currently support the JVM being scheduled by the PVM (Python VM). We hope that the PemJa team can support this feature. Thank you!
I define a function with exec, but invoking that function fails, e.g.:
interpreter.exec("def f(a):return a");
interpreter.invoke("f", 1);
It then throws:
Exception in thread "main" pemja.core.PythonException: <class 'RuntimeError'>: Failed to find the function `f`
at pemja.core.PythonInterpreter.invokeOneArgInt(Native Method)
at pemja.core.PythonInterpreter.invokeOneArg(PythonInterpreter.java:184)
at pemja.core.PythonInterpreter.invoke(PythonInterpreter.java:93)
I want to use it on PySpark
It would be great if invokeMethod could take the return type, similar to public <T> T get(String name, Class<T> clazz).
The code below, which passes an integer to Python and expects an integer back, fails with the error class java.lang.Long cannot be cast to class java.lang.Integer:
var pyFnName = "add_one";
var pythonCode = pyFnName + " = lambda x: x + 1";
int input = 5;
try (var interpreter = new PythonInterpreter(config);) {
    interpreter.exec(pythonCode);
    var pyFn = (PyObject) interpreter.get(pyFnName);
    var out = (int) pyFn.invokeMethod("__call__", input);
    System.out.println(out);
}
An integer can be retrieved by using the generic get alongside exec, but this is fiddlier and less pretty:
var pyFnName = "add_one";
var pythonCode = pyFnName + " = lambda x: x + 1";
int input = 5;
try (var interpreter = new PythonInterpreter(config);) {
    interpreter.exec(pythonCode);
    var resultVarName = "result";
    interpreter.exec(resultVarName + "=" + pyFnName + "(" + input + ")");
    var out = (int) interpreter.get(resultVarName, Integer.class);
    System.out.println(out);
}
Alternatively, the value can be accepted back as a Long and converted to an int:
var out = (Long) pyFn.invokeMethod("__call__", input);
var intOut = out.intValue();
So functionally everything is already available; a public <T> T invokeMethod(String name, Class<T> clazz, Object... args) method would just provide syntactic sugar.
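Until such an overload exists, the conversion can live in a small helper on the caller's side. The following is a sketch only: TypedInvokeSketch and convert are hypothetical names, not part of PemJa, and a plain Long stands in for the value that invokeMethod returns in the snippets from the report.

```java
class TypedInvokeSketch {
    // Hypothetical helper: converts a raw result object to the requested wrapper type.
    static <T> T convert(Object value, Class<T> clazz) {
        if (value == null || clazz.isInstance(value)) {
            return clazz.cast(value);
        }
        if (value instanceof Number) {
            Number n = (Number) value;
            if (clazz == Integer.class) {
                // toIntExact throws ArithmeticException if the value overflows int.
                return clazz.cast(Math.toIntExact(n.longValue()));
            }
            if (clazz == Long.class) {
                return clazz.cast(n.longValue());
            }
            if (clazz == Double.class) {
                return clazz.cast(n.doubleValue());
            }
        }
        throw new ClassCastException(
                "Cannot convert " + value.getClass().getName() + " to " + clazz.getName());
    }

    public static void main(String[] args) {
        Object raw = Long.valueOf(6); // stand-in for the Long returned by invokeMethod
        int out = convert(raw, Integer.class);
        System.out.println(out);
        try {
            int direct = (int) raw; // same kind of cast as in the failing snippet
            System.out.println(direct);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed: " + e.getMessage());
        }
    }
}
```

A call site could then read convert(pyFn.invokeMethod("__call__", input), Integer.class), which keeps the narrowing explicit and fails loudly on overflow rather than truncating.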
Hi,
I am trying to use PEMJA to build an interface between python and scala code. This is working fine for simple cases, however I run into a deadlock in the following situation:
Things I have tried to work around this issue:
MULTI_THREAD: this still deadlocks.
SUB_INTERPRETER: this fails with a numpy import error.
C [pemja_core.cpython-310-darwin.so+0x7fc8] JcpPyJObject_New+0xf8 when trying to call non-trivial JVM methods from the new Python thread).
Do you have any guidance on how to work around this issue?
When I follow the documentation to call back into Java from Python:
from pemja import findClass
StringBuilder = findClass('java.lang.StringBuilder')
I get this error:
ImportError: cannot import name 'findClass' from 'pemja' (/root/miniconda3/envs/pemja/lib/python3.8/site-packages/pemja/__init__.py)
I created the pemja environment with conda; here are all my packages as listed by 'pip list':
Package Version
-------------- ---------
certifi 2022.9.24
find-libpython 0.3.0
numpy 1.21.4
pemja 0.2.6
pip 22.2.2
setuptools 65.5.0
wheel 0.37.1
python = 3.8
jdk = openjdk version "1.8.0_292"
pemja = 0.2.6
I can't seem to get the pip install to work with python 3.10.6 and pip 22.3.1
pip install requests pemja
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (2.25.1)
Collecting pemja
Using cached pemja-0.2.6.tar.gz (48 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [24 lines of output]
setup.py:34: RuntimeWarning: Pemja may not yet support Python 3.10.
warnings.warn(
Traceback (most recent call last):
File "/home/richard/.local/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 351, in
main()
File "/home/richard/.local/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 333, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/richard/.local/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-gt1_4mbr/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
return self._get_build_requires(
File "/tmp/pip-build-env-gt1_4mbr/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 143, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-gt1_4mbr/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 267, in run_setup
super(_BuildMetaLegacyBackend,
File "/tmp/pip-build-env-gt1_4mbr/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 158, in run_setup
exec(compile(code, file, 'exec'), locals())
File "setup.py", line 184, in
include_dirs=get_java_include() + ['src/main/c/pemja/core/include'] + get_numpy_include(),
File "setup.py", line 111, in get_java_include
inc = os.path.join(get_java_home(), inc_name)
File "/usr/lib/python3.10/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
When building from scratch I get:
python setup.py sdist
/home/richard/src/pemja/setup.py:23: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils.command.build_ext import build_ext as old_build_ext
/home/richard/src/pemja/setup.py:34: RuntimeWarning: Pemja may not yet support Python 3.10.
warnings.warn(
Traceback (most recent call last):
File "/home/richard/src/pemja/setup.py", line 184, in
include_dirs=get_java_include() + ['src/main/c/pemja/core/include'] + get_numpy_include(),
File "/home/richard/src/pemja/setup.py", line 111, in get_java_include
inc = os.path.join(get_java_home(), inc_name)
File "/usr/lib/python3.10/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
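Both tracebacks end with os.path.join(get_java_home(), inc_name) receiving None, which suggests that no JDK location was found at build time. Assuming (the traces do not confirm this) that setup.py derives the location from the JAVA_HOME environment variable and needs the JDK headers, a small preflight check might look like the sketch below; the class and method names are hypothetical.

```java
import java.io.File;

class JavaHomePreflight {
    // Returns a short diagnosis for the given JAVA_HOME value (may be null).
    static String diagnose(String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return "JAVA_HOME is not set";
        }
        // A JDK ships include/jni.h; a JRE-only install typically does not.
        File jniHeader = new File(new File(javaHome, "include"), "jni.h");
        if (!jniHeader.isFile()) {
            return "no include/jni.h under " + javaHome;
        }
        return "JDK headers found in " + jniHeader.getParent();
    }

    public static void main(String[] args) {
        System.out.println(diagnose(System.getenv("JAVA_HOME")));
    }
}
```

If the check reports that JAVA_HOME is not set, exporting it to a full JDK installation before running pip install would be the first thing to try under this assumption.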
I see that the documentation examples and unit tests are all triggered from Java. Is there a way to trigger them from Python?
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;
import pemja.core.PythonInterpreter;
import pemja.core.PythonInterpreterConfig;
public class PythonExecTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        tableEnv.executeSql("create table datagen (f1 string, f2 int) with ('connector' = 'datagen')");
        Table table = tableEnv.from("datagen");
        DataStream<Row> dataStream = tableEnv.toDataStream(table);
        System.out.println("======================================");
        dataStream.flatMap(new MyRichFlatMapFunction());
        System.out.println("================end======================");
    }
    static class MyRichFlatMapFunction extends RichFlatMapFunction<Row, Row> {
        private static PythonInterpreterConfig pythonInterpreterConfig;
        static {
            System.out.println("=================open==================");
            pythonInterpreterConfig = PythonInterpreterConfig
                    .newBuilder()
                    .setExcType(PythonInterpreterConfig.ExecType.SUB_INTERPRETER)
                    // .setExcType(PythonInterpreterConfig.ExecType.MULTI_THREAD)
                    // Set the path of the Python executable
                    .setPythonExec("/root/miniconda3/bin/python3")
                    // Set the Python execution path
                    .addPythonPaths("/root/pyAutoflow/python")
                    // Set the dependency path
                    .addPythonPaths("/root/miniconda3/lib/python3.8/site-packages")
                    .build();
        }
        @Override
        public void flatMap(Row row, Collector<Row> collector) throws Exception {
            System.out.println("=================flatMap==================");
            // Build the interpreter
            PythonInterpreter interpreter = new PythonInterpreter(pythonInterpreterConfig);
            interpreter.set("a", 12345);
            Integer a = interpreter.get("a", Integer.class);
            // Execute script content
            interpreter.exec("print(a)");
            // The file (module) to execute
            interpreter.exec("import funcs");
            for (int i = 0; i < 10; i++) {
                // Invoke the function
                Object result = interpreter.invoke("funcs.add", i, 2);
                System.out.println("result-------------->" + result);
            }
        }
    }
}
======================================
=================open==================
================end======================
[failed]
python -c "from pemja import findClass; Integer = findClass('java.lang.Integer'); print(Integer.toHexString(Integer.MAX_VALUE))"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: cannot import name 'findClass' from 'pemja' (/usr/local/lib/python3.8/site-packages/pemja/__init__.py)
@HuangXingBo I've seen your commit to add python 3.10 version here.
Are there plans to add 3.11? It's required for Flink python 3.11 support.
Unable to install pemja on raspberry pi, tried with python 3.7 and 3.9
Steps to reproduce:
$ virtualenv -p /usr/bin/python3.9 venv3.9
$ source venv3.9/bin/activate
$ python --version
Python 3.9.0
$ pip --version
pip 22.1.1 from /home/pi/venv3.9/lib/python3.9/site-packages/pip (python 3.9)
$ pip install pemja
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting pemja
Downloading pemja-0.1.5.tar.gz (32 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-iqprsatx/pemja_afa58ffaf082492bb729b5377e1491da/setup.py", line 185, in <module>
include_dirs=get_java_include() + ['src/main/c/pemja/core/include'] + get_numpy_include(),
File "/tmp/pip-install-iqprsatx/pemja_afa58ffaf082492bb729b5377e1491da/setup.py", line 112, in get_java_include
inc = os.path.join(get_java_home(), inc_name)
File "/home/pi/venv3.9/lib/python3.9/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed