leondavi / nerlnet
Nerlnet is a distributed machine learning platform for experiments and IoT deployment.
License: GNU General Public License v3.0
Hi, I am new to this and wondering whether there is any way to run NerlNet on a single PC.
I've been attempting to set it up on my PC by following the example in the documentation, changing only the address and port to my local ones.
./NerlNetInstall.sh
./NerlnetJupyterEnvGenerator.sh -j ./examples/
I ran example_run.ipynb up to the step api_server_instance.selectJsons()
with the following choices:
Architecture:
0. arch_1PCSIM6WorkerMNist.json
Connection Map Files
0. conn_1Router1Client1S.json
Experiments Flow Files
1. exp_1Worker1SourceHealth.json
I edited arch_*.json with only these changes:
"devices": [
{
"host": "0.0.0.0",
"entities": "mainServer,c1,c2,c3,c4,c5,c6,s1,r1,r2,r3,r4,r5,r6,apiServer"
}
],
"apiServer":
{
"host": "0.0.0.0",
"port": "8080",
"args": ""
}
,
"nerlGUI":
{
"host": "0.0.0.0",
"port": "8096",
"args": ""
}
,
"mainServer":
{
"host": "0.0.0.0",
"port": "8484",
"args": ""
}
I then ran api_server_instance.sendJsonsToDevices()
and received this log from apiServer:
Sending JSON paths to devices...
Init JSONs sent to devices
2023-08-19T19:36:32.616293+07:00 info: nerlNetServer_app/start@92: This device IP: "192.168.1.10"
2023-08-19T19:37:04.155398+07:00 info: nerlNetServer_app/start@98: ArchitectureAdderess: <<"arch.json">>, CommunicationMapAdderess : <<"conn.json">>
2023-08-19T19:37:04.177674+07:00 notice: Host IP="192.168.1.10"
2023-08-19T19:37:04.177983+07:00 error: crasher: initial call: application_master:init/4, pid: <0.186.0>, registered_name: [], exit: {{bad_return,{{nerlNetServer_app,start,[normal,[]]},{'EXIT',{{badkey,<<"192.168.1.10">>},[{erlang,map_get,[<<"192.168.1.10">>,#{<<"0.0.0.0">> => [mainServer,c1,c2,c3,c4,c5,c6,s1,r1,r2,r3,r4,r5,r6,apiServer]}],[{error_info,#{module => erl_erts_errors}}]},{jsonParser,json_to_ets,2,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},{line,136}]},{jsonParser,getHostEntities,3,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},{line,165}]},{nerlNetServer_app,parseJsonAndStartNerlnet,1,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},{line,144}]},{nerlNetServer_app,start,2,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},{line,100}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,142}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [<0.185.0>], message_queue_len: 1, messages: [{'EXIT',<0.187.0>,normal}], links: [<0.185.0>,<0.44.0>], dictionary: [], trap_exit: true, status: running, heap_size: 610, stack_size: 28, reductions: 216; neighbours:
2023-08-19T19:37:04.178554+07:00 notice: Application: nerlNetServer. Exited: {bad_return,{{nerlNetServer_app,start,[normal,[]]},{'EXIT',{{badkey,<<"192.168.1.10">>},[{erlang,map_get,[<<"192.168.1.10">>,#{<<"0.0.0.0">> => [mainServer,c1,c2,c3,c4,c5,c6,s1,r1,r2,r3,r4,r5,r6,apiServer]}],[{error_info,#{module => erl_erts_errors}}]},{jsonParser,json_to_ets,2,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},{line,136}]},{jsonParser,getHostEntities,3,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},{line,165}]},{nerlNetServer_app,parseJsonAndStartNerlnet,1,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},{line,144}]},{nerlNetServer_app,start,2,[{file,"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},{line,100}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}}. Type: temporary.
2023-08-19T19:37:04.179228+07:00 notice: Application: cowboy. Exited: stopped. Type: temporary.
2023-08-19T19:37:05.161489+07:00 notice: Application: ranch. Exited: stopped. Type: temporary.
2023-08-19T19:37:05.161603+07:00 notice: Application: cowlib. Exited: stopped. Type: temporary.
===> Failed to boot nerlNetServer for reason {bad_return,
{{nerlNetServer_app,
start,
[normal,[]]},
{'EXIT',
{{badkey,
<<"192.168.1.10">>},
[{erlang,map_get,
[<<"192.168.1.10">>,
#{<<"0.0.0.0">> =>
[mainServer,
c1,c2,c3,
c4,c5,c6,
s1,r1,r2,
r3,r4,r5,
r6,
apiServer]}],
[{error_info,
#{module =>
erl_erts_errors}}]},
{jsonParser,
json_to_ets,2,
[{file,
"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},
{line,136}]},
{jsonParser,
getHostEntities,
3,
[{file,
"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/init/jsonParser.erl"},
{line,165}]},
{nerlNetServer_app,
parseJsonAndStartNerlnet,
1,
[{file,
"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},
{line,144}]},
{nerlNetServer_app,
start,2,
[{file,
"/media/ubuntu_data/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},
{line,100}]},
{application_master,
start_it_old,4,
[{file,
"application_master.erl"},
{line,
293}]}]}}}}
I checked again in the directory /src_erl/Communication_Layer;
it seems the JSON files were transferred successfully but could not be parsed.
Can you help me get it up and running, please?
Thanks a lot!
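The crash in the log is a failed map lookup: the server looks up its own IP ("192.168.1.10") among the "host" values of "devices", where the only key is "0.0.0.0". The following is a minimal Python sketch of that lookup, assuming only the JSON structure quoted in this issue. The function name entities_for_host is hypothetical and is not part of NerlNet.

```python
import json

# Fragment of the architecture JSON as posted in this issue.
ARCH_FRAGMENT = """
{
  "devices": [
    {"host": "0.0.0.0",
     "entities": "mainServer,c1,c2,c3,c4,c5,c6,s1,r1,r2,r3,r4,r5,r6,apiServer"}
  ]
}
"""


def entities_for_host(arch, this_ip):
    """Return the entities assigned to this_ip (hypothetical helper).

    Mirrors the map lookup by host that fails with badkey in the log.
    """
    by_host = {d["host"]: d["entities"].split(",") for d in arch["devices"]}
    if this_ip not in by_host:
        raise KeyError(f"{this_ip} is not a device host; known hosts: {sorted(by_host)}")
    return by_host[this_ip]


arch = json.loads(ARCH_FRAGMENT)
try:
    entities_for_host(arch, "192.168.1.10")
except KeyError as err:
    print("lookup failed:", err)

# Assumed fix: use the address the server reports as "This device IP".
arch["devices"][0]["host"] = "192.168.1.10"
print(entities_for_host(arch, "192.168.1.10")[:3])
```

If this reading is right, setting "host" in "devices" to the address printed in the "This device IP" log line, rather than 0.0.0.0, would let the lookup succeed. Whether the other "host" fields need the same change is not confirmed here.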
@kapelnik Please review the following error message
===> Failed to boot nerlNetServer for reason {bad_return,
{{nerlNetServer_app,
start,
[normal,[]]},
{'EXIT',
{{badmatch,
{error,
{badmatch,
{error,
undef}}}},
[{clientStatem,
start_link,1,
[{file,
"/home/pi/workspace/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/Client/clientStatem.erl"},
{line,47}]},
{nerlNetServer_app,
createClientsAndWorkers,
2,
[{file,
"/home/pi/workspace/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},
{line,107}]},
{nerlNetServer_app,
start,2,
[{file,
"/home/pi/workspace/NErlNet/src_erl/Communication_Layer/http_Nerlserver/src/nerlNetServer_app.erl"},
{line,80}]},
{application_master,
start_it_old,4,
[{file,
"application_master.erl"},
{line,
277}]}]}}}}
NErlNet/src_py/apiServer/apiServer.py
Line 159 in 1581734
It appears that some directories are missing during installation:
chown: missing operand after '/usr/local/lib/nerlnet-lib/log'
Try 'chown --help' for more information.
chown: missing operand after '/usr/local/lib/nerlnet-lib/NErlNet'
Try 'chown --help' for more information.
chown: missing operand after '/usr/local/lib/nerlnet-lib/NErlNet/build'
Try 'chown --help' for more information.
enable and start nerlnet.service
Running on Ubuntu LTS.
Close workers and clean up their resources on the apiServer.close() command.
This command should be added to apiServer and should release all resources.
Implement epoch parameter per worker (passing it from json to worker).
The AEC's training function crashed on exit because of pointer problems in its arguments (data and autoencoder_data). We worked around this by copying the tensor data into a temporary tensor, data_temp. We need to find a way to make this function work without the inefficient copy.
The message from the client is sent to the main server through the router.
It is not clear what the main server does with the HTTP body of the incoming message from the router:
predict(cast, {predictRes,WorkerName,InputName,ResultID,PredictNerlTensor, Type, _TimeTook}, State = #client_statem_state{myName = MyName, msgCounter = Counter,nerlnetGraph = NerlnetGraph,timingMap = TimingMap}) ->
NewTimingMap = updateTimingMap(WorkerName,TimingMap),
{RouterHost,RouterPort} = nerl_tools:getShortPath(MyName,"mainServer",NerlnetGraph),
%io:format("Client got result from predict-~nInputName: ~p,ResultID: ~p, ~nResult:~p~n",[InputName,ResultID,Result]),
nerl_tools:http_request(RouterHost,RouterPort,"predictRes", term_to_binary({atom_to_list(WorkerName),InputName,ResultID,{PredictNerlTensor, Type}})),
{next_state, predict, State#client_statem_state{timingMap =NewTimingMap, msgCounter = Counter+1}};
NErlNet/src_cpp/opennnBridge/create.h
Line 98 in 90f26d6
The same for pooling layers.
The API Server help menu is outdated.
Remind this issue:
Similar issue:
ignatov/intellij-erlang#836 (comment)
Stackoverflow opened issues:
https://stackoverflow.com/questions/77078335/issue-with-running-a-compiled-rebar3-erlang-application-with-erl-pa
https://stackoverflow.com/questions/77078729/rebar3-release-cannot-find-target-build-when-rebar-config-base-dir-set
Solution:
grpc/grpc#24249 (comment)
Added to NerlnetRun script
Fix crashes in high-frequency scenarios.
@dolby360, could you please help us with this Node.js 16 deprecation warning?
https://github.com/leondavi/NErlNet/actions/runs/8241275327/attempts/1?pr=286
Add an exception if two experiment phase names are identical.
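The requested check could look roughly like the sketch below. It assumes the phase names are available as a plain list of strings; the names check_unique_phase_names and DuplicatePhaseNameError are hypothetical and not taken from the NerlNet code base.

```python
from collections import Counter


class DuplicatePhaseNameError(ValueError):
    """Raised when two experiment phases share the same name (hypothetical)."""


def check_unique_phase_names(phase_names):
    """Raise DuplicatePhaseNameError if any phase name appears more than once."""
    counts = Counter(phase_names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise DuplicatePhaseNameError(
            f"duplicate experiment phase names: {duplicates}"
        )


# Distinct names pass silently.
check_unique_phase_names(["training_phase", "prediction_phase"])
```

Raising at load time, before any phase runs, keeps results of two same-named phases from being silently merged or overwritten.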
With g++ 10 on Raspbian, loading the NIF results in:
{error,{load_failed,"Failed to load NIF library:
'/home/pi/workspace/NErlNet/build/release/libnerlnet.so: undefined
symbol: __atomic_compare_exchange_8'
release: 5.10.63-v7+
In file: src_erl/NerlnetApp/Source/sourceStatem.erl
The function spawnTransmitter receives 'Frequency' and computes the interval Ms as 1000*(1/Freq).
The function sendSamples calls left_print, which prints how many batches are left to send, but the observed rate does not match the Frequency (it is much slower).
We should check what happens between batch transmissions and why the desired frequency is not reached.
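One possible explanation, offered only as an assumption, is that the sender waits the full Ms interval after the per-batch work has already taken some time, so the real period is Ms plus that overhead. The Python sketch below illustrates the arithmetic. Only the formula 1000*(1/Freq) comes from the issue; the function names and the overhead model are hypothetical.

```python
def interval_ms(freq_hz):
    """Interval between batches in milliseconds: 1000 * (1 / Freq)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 * (1.0 / freq_hz)


def effective_frequency(freq_hz, overhead_ms):
    """Achieved rate if a full interval is slept after overhead_ms of work."""
    return 1000.0 / (interval_ms(freq_hz) + overhead_ms)


def remaining_sleep_ms(freq_hz, overhead_ms):
    """Sleep time that keeps the period at interval_ms by subtracting the work."""
    return max(0.0, interval_ms(freq_hz) - overhead_ms)


# With a requested 100 Hz and 10 ms of work per batch, sleeping the full
# interval halves the achieved rate; sleeping only the remainder would not.
print(effective_frequency(100, 10.0))
print(remaining_sleep_ms(100, 4.0))
```

Measuring the time spent between two consecutive sends would confirm or rule out this model. If the overhead alone exceeds Ms, no sleep adjustment can reach the requested frequency.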
We need a pre-step that performs Model<-->Data validation: the NIF methods should validate that the data size fits the network input size before train/predict is called.
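A rough Python sketch of such a pre-step follows. It assumes a batch is a list of flat samples and that a training sample holds the input features followed by the labels. The function name validate_batch and its parameters are hypothetical; the actual check would live in the NIF layer.

```python
def validate_batch(batch, input_size, label_size=0):
    """Check that every sample has input_size features plus label_size labels.

    Use label_size=0 for predict, where samples carry features only.
    Raises ValueError on the first sample whose length does not match.
    """
    expected = input_size + label_size
    for index, sample in enumerate(batch):
        if len(sample) != expected:
            raise ValueError(
                f"sample {index} has {len(sample)} values, expected {expected}"
            )
    return True


# A network with 4 inputs and 1 label accepts 5-value training samples.
validate_batch([[0.1, 0.2, 0.3, 0.4, 1.0]], input_size=4, label_size=1)
```

Failing before train/predict turns a size mismatch into a readable error instead of a crash inside the native code.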
@GuyPerets106
We should fix this right after FullFlowCI integration.
These terms are VERY misleading!
createClientsAndWorkers:
{"/weightsVector",clientStateHandler, [vector,ClientStatemPid]}
Critical mistake - weightsVector is actually batchHandler and vector atom should change to batch_handler
createRouters:
{"/weightsVector",routingHandler, [rout,RouterGenServerPid]},
Critical mistake - weightsVector is actually batchHandler
The rout atom should change to routing
init of client:
vector -> gen_statem:cast(Client_StateM_Pid,{sample,Body});
Change the atom vector to batch
Change the statem pattern to {batch,Body}
routerGenserver:
nerl_tools:sendHTTP(MyName, To, "weightsVector", Body),
Critical mistake: change weightsVector to atom_to_str(batchHandler)
sendBatch method in sourceStatem:
Change the HTTP request of weightsVector: it carries a batch, not weights
createRouters and weightsVector methods: change the hook name of weightsVector, and its function accordingly, to batch_handler
affected files: