multiverso's Issues

Docker file is not working when downloading JDK

Operating System: Ubuntu 16.04
VirtualBox Manager: Oracle VM Version 5.2.8 r121009
Docker version: 18.03.1-ce, build 9ee9f40

I am trying to build the Dockerfile to test Multiverso on my virtual machine.

docker build -f ./Dockerfile .

Everything works until the build reaches the Dockerfile step that installs the JDK:

RUN mkdir -p /usr/local/java/default && \
    curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u65-b17/jdk-8u65-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    tar --strip-components=1 -xz -C /usr/local/java/default/

The terminal then shows the following error:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
The command '/bin/sh -c mkdir -p /usr/local/java/default &&     
curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u65-b17/jdk-8u65-linux-x64.tar.gz' 
-H 'Cookie: oraclelicense=accept-securebackup-cookie' |    
 tar --strip-components=1 -xz -C /usr/local/java/default/' returned a non-zero code: 2

I think the problem is that the Oracle archive is not fetched correctly: whatever curl receives is not a gzip file.
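The gzip message means that the bytes piped into tar did not begin with the gzip magic number (0x1f 0x8b). The sketch below, in Python rather than shell, shows one way to inspect the first bytes of a saved download before extracting it. The helper names and the sample payloads are made up for illustration, and the report does not confirm what Oracle's server actually returned.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(first_bytes: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def describe_payload(first_bytes: bytes) -> str:
    """Give a rough, human-readable guess at what was downloaded."""
    if looks_like_gzip(first_bytes):
        return "gzip archive"
    head = first_bytes.lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML page, not an archive"
    return "unknown format"


if __name__ == "__main__":
    # Hypothetical payloads standing in for the first bytes of a download.
    real_archive = gzip.compress(b"pretend this is a JDK tarball")
    error_page = b"<html><body>Please sign in</body></html>"
    print(describe_payload(real_archive))
    print(describe_payload(error_page))
```

Saving the download to a file first, instead of piping curl straight into tar, makes it possible to run a check like this, or simply to open the file and read what the server sent.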

High cpu usage of communicator

To better tune my application, I named every thread by calling the pthread_setname_np API on Linux. I found high CPU usage in the thread of Multiverso's Communicator (a subclass of Actor). Here is a snapshot from htop.
[screenshot: selection_010]

I noticed the following code in src/communicator.cpp. According to this code, the communicator always calls non-blocking send and receive in a loop. I think this is the cause of the high CPU usage.


    MessagePtr msg;
    while (mailbox_->Alive()) {
      // Try pop and Send
      if (mailbox_->TryPop(msg)) {
        ProcessMessage(msg);
      }
      // Probe and Recv
      size_t size = net_util_->Recv(&msg);
      if (size > 0) LocalForward(msg);
      CHECK(msg.get() == nullptr);
      net_util_->Send(msg);
    }
    break;
  }

This "spin lock" design is great. But most applications won't call send/recv thousands or millions of times per second, which means CPU time is wasted on polling. Why not let the communicator sleep for a short while when there are no pending send/recv calls?
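One way to picture the suggestion is a polling loop that backs off when idle. The sketch below is Python, not the C++ in src/communicator.cpp, and every name in it (run_polling_loop, try_pop, try_recv, the sleep bounds) is hypothetical. It only shows the idea: stay on the fast path while messages arrive, and sleep for progressively longer intervals when both the mailbox and the network are empty.

```python
import time
from typing import Callable, Optional


def run_polling_loop(
    alive: Callable[[], bool],
    try_pop: Callable[[], Optional[object]],
    try_recv: Callable[[], Optional[object]],
    handle: Callable[[object], None],
    min_sleep: float = 1e-4,
    max_sleep: float = 1e-2,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll two non-blocking sources and back off exponentially when idle.

    Returns the number of idle sleeps that were performed.
    """
    delay = min_sleep
    idle_sleeps = 0
    while alive():
        did_work = False
        for source in (try_pop, try_recv):
            msg = source()
            if msg is not None:
                handle(msg)
                did_work = True
        if did_work:
            delay = min_sleep  # traffic seen: return to the fast path
        else:
            sleep(delay)  # idle: yield the CPU instead of spinning
            idle_sleeps += 1
            delay = min(delay * 2, max_sleep)
    return idle_sleeps
```

The trade-off is latency: a message that arrives during a sleep waits up to max_sleep before it is handled, so the bounds would need tuning against the application's traffic.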

Several questions about the communication between server and worker

Your work is really awesome. I have studied part of the project, but I may not understand it very well yet.
I have several questions about the communication between server and worker:

  1. It seems that servers do not communicate with each other. Do the workers determine which server to pull parameters from and push them to?
  2. Why did you design the "Table" structure? It seems that parameter data could simply consist of "Rows".
    • Must a whole table be stored on a single server? Is there no overlap between the data held by different servers?
    • When would we need to create multiple "Tables"?

Thank you so much.

Segmentation fault while running on cluster

I got the error below while running WordEmbedding on a cluster.

I am using StarCluster to create the cluster and MPI to run the jobs.

[node001:04462] *** Process received signal ***
[node001:04462] Signal: Segmentation fault (11)
[node001:04462] Signal code: Invalid permissions (2)
[node001:04462] Failing at address: 0x7f5232168010
[node001:04462] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5271b09390]
[node001:04462] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x9f840)[0x7f52717cd840]
[node001:04462] [ 2] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso7UpdaterIfE6AccessEmPfS2_mPNS_9AddOptionE+0x4e)[0x7f527290e050]
[node001:04462] [ 3] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZN10multiverso17MatrixServerTableIfE10ProcessGetERKSt6vectorINS_4BlobESaIS3_EEPS5+0x39e)[0x7f52728f81a0]
[node001:04462] [ 4] [node004:04351] *** Process received signal ***
/home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso6Server10ProcessGetERSt10unique_ptrINS_7MessageESt14default_deleteIS2_EE+0x1df)[0x7f52728e132d]
[node001:04462] [ 5] [node004:04351] Signal: Segmentation fault (11)
[node004:04351] Signal code: Invalid permissions (2)
[node004:04351] Failing at address: 0x7fce92be0810
/home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso6ServerEFvRSt10unique_ptrINS0_7MessageESt14default_deleteIS3_EEELb1EEclIIS7_EvEEvPS1_DpOT+0x7c)[0x7f52728e61c2]
[node001:04462] [ 6] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEE6__callIvJS8_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0x84)[0x7f52728e5bb4]
[node001:04462] [ 7] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEEclIJS8_EvEET0_DpOT+0x56)[0x7f52728e4dce]
[node001:04462] [ 8] [node004:04351] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt17_Function_handlerIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEESt5_BindIFSt7_Mem_fnIMNS1_6ServerEFvS6_EEPSA_St12_PlaceholderILi1EEEEE9_M_invokeERKSt9_Any_dataS6+0x37)[0x7f52728e3f55]
[node001:04462] [ 9] [ 0] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt8functionIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEEEclES6+0x49)[0x7f52728c3d8f]
[node001:04462] [10] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso5Actor4MainEv+0xf3)[0x7f52728c2153]
[node001:04462] [11] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fced2544390]
/home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso5ActorEFvvELb1EEclIJEvEEvPS1_DpOT+0x65)[0x7f52728c90e3]
[node001:04462] [12] [node004:04351] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EE9_M_invokeIILm0EEEEvSt12_Index_tupleIIXspT_EEE+0x43)[0x7f52728c9077]
[node001:04462] [13] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EEclEv+0x2c)[0x7f52728c8f7e]
[node001:04462] [14] [ 1] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS4_EEE6_M_runEv+0x1c)[0x7f52728c8f0e]
[node001:04462] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7f527250ec80]
[node001:04462] [16] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f5271aff6ba]
[node001:04462] [17] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f52718353dd]
[node001:04462] *** End of error message ***
/lib/x86_64-linux-gnu/libc.so.6(+0x9f840)[0x7fced2208840]
[node004:04351] [ 2] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso7UpdaterIfE6AccessEmPfS2_mPNS_9AddOptionE+0x4e)[0x7fced3349050]
[node004:04351] [ 3] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZN10multiverso17MatrixServerTableIfE10ProcessGetERKSt6vectorINS_4BlobESaIS3_EEPS5+0x39e)[0x7fced33331a0]
[node004:04351] [ 4] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso6Server10ProcessGetERSt10unique_ptrINS_7MessageESt14default_deleteIS2_EE+0x1df)[0x7fced331c32d]
[node004:04351] [ 5] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso6ServerEFvRSt10unique_ptrINS0_7MessageESt14default_deleteIS3_EEELb1EEclIIS7_EvEEvPS1_DpOT+0x7c)[0x7fced33211c2]
[node004:04351] [ 6] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEE6__callIvJS8_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0x84)[0x7fced3320bb4]
[node004:04351] [ 7] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEEclIJS8_EvEET0_DpOT+0x56)[0x7fced331fdce]
[node004:04351] [ 8] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt17_Function_handlerIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEESt5_BindIFSt7_Mem_fnIMNS1_6ServerEFvS6_EEPSA_St12_PlaceholderILi1EEEEE9_M_invokeERKSt9_Any_dataS6+0x37)[0x7fced331ef55]
[node004:04351] [ 9] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt8functionIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEEEclES6+0x49)[0x7fced32fed8f]
[node004:04351] [10] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso5Actor4MainEv+0xf3)[0x7fced32fd153]
[node004:04351] [11] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso5ActorEFvvELb1EEclIJEvEEvPS1_DpOT+0x65)[0x7fced33040e3]
[node004:04351] [12] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EE9_M_invokeIILm0EEEEvSt12_Index_tupleIIXspT_EEE+0x43)[0x7fced3304077]
[node004:04351] [13] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EEclEv+0x2c)[0x7fced3303f7e]
[node004:04351] [14] [node004:04355] *** Process received signal ***
[node004:04355] Signal: Segmentation fault (11)
[node004:04355] Signal code: Invalid permissions (2)
[node004:04355] Failing at address: 0x7f5b22ffe410
[node002:04280] *** Process received signal ***
[node002:04280] Signal: Segmentation fault (11)
[node002:04280] Signal code: Invalid permissions (2)
[node002:04280] Failing at address: 0x7f85b9bf8000
[node002:04280] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f85f955a390]
[node002:04280] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x9f84e)[0x7f85f921e84e]
[node002:04280] [ 2] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso7UpdaterIfE6AccessEmPfS2_mPNS_9AddOptionE+0x4e)[0x7f85fa35f050]
[node002:04280] [ 3] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZN10multiverso17MatrixServerTableIfE10ProcessGetERKSt6vectorINS_4BlobESaIS3_EEPS5+0x39e)[0x7f85fa3491a0]
[node002:04280] [ 4] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso6Server10ProcessGetERSt10unique_ptrINS_7MessageESt14default_deleteIS2_EE+0x1df)[0x7f85fa33232d]
[node002:04280] [ 5] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso6ServerEFvRSt10unique_ptrINS0_7MessageESt14default_deleteIS3_EEELb1EEclIIS7_EvEEvPS1_DpOT+0x7c)[0x7f85fa3371c2]
[node002:04280] [ 6] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEE6__callIvJS8_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0x84)[0x7f85fa336bb4]
[node002:04280] [ 7] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEEclIJS8_EvEET0_DpOT+0x56)[0x7f85fa335dce]
[node002:04280] [ 8] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt17_Function_handlerIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEESt5_BindIFSt7_Mem_fnIMNS1_6ServerEFvS6_EEPSA_St12_PlaceholderILi1EEEEE9_M_invokeERKSt9_Any_dataS6+0x37)[0x7f85fa334f55]
[node002:04280] [ 9] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt8functionIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEEEclES6+0x49)[0x7f85fa314d8f]
[node002:04280] [10] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso5Actor4MainEv+0xf3)[0x7f85fa313153]
[node002:04280] [11] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso5ActorEFvvELb1EEclIJEvEEvPS1_DpOT+0x65)[0x7f85fa31a0e3]
[node002:04280] [12] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EE9_M_invokeIILm0EEEEvSt12_Index_tupleIIXspT_EEE+0x43)[0x7f85fa31a077]
[node002:04280] [13] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EEclEv+0x2c)[0x7f85fa319f7e]
[node002:04280] [14] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS4_EEE6_M_runEv+0x1c)[0x7f85fa319f0e]
[node002:04280] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7f85f9f5fc80]
[node002:04280] [16] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f85f95506ba]
[node002:04280] [17] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f85f92863dd]
[node002:04280] *** End of error message ***
/home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS4_EEE6_M_runEv+0x1c)[0x7fced3303f0e]
[node004:04351] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7fced2f49c80]
[node004:04351] [16] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fced253a6ba]
[node004:04351] [17] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fced22703dd]
[node004:04351] *** End of error message ***
[node004:04355] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5b66ce1390]
[node004:04355] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x9f840)[0x7f5b669a5840]
[node004:04355] [ 2] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso7UpdaterIfE6AccessEmPfS2_mPNS_9AddOptionE+0x4e)[0x7f5b67ae6050]
[node004:04355] [ 3] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZN10multiverso17MatrixServerTableIfE10ProcessGetERKSt6vectorINS_4BlobESaIS3_EEPS5+0x39e)[0x7f5b67ad01a0]
[node004:04355] [ 4] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso6Server10ProcessGetERSt10unique_ptrINS_7MessageESt14default_deleteIS2_EE+0x1df)[0x7f5b67ab932d]
[node004:04355] [ 5] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso6ServerEFvRSt10unique_ptrINS0_7MessageESt14default_deleteIS3_EEELb1EEclIIS7_EvEEvPS1_DpOT+0x7c)[0x7f5b67abe1c2]
[node004:04355] [ 6] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEE6__callIvJS8_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0x84)[0x7f5b67abdbb4]
[node004:04355] [ 7] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEEclIJS8_EvEET0_DpOT+0x56)[0x7f5b67abcdce]
[node004:04355] [ 8] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt17_Function_handlerIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEESt5_BindIFSt7_Mem_fnIMNS1_6ServerEFvS6_EEPSA_St12_PlaceholderILi1EEEEE9_M_invokeERKSt9_Any_dataS6+0x37)[0x7f5b67abbf55]
[node004:04355] [ 9] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt8functionIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEEEclES6+0x49)[0x7f5b67a9bd8f]
[node004:04355] [10] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso5Actor4MainEv+0xf3)[0x7f5b67a9a153]
[node004:04355] [11] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso5ActorEFvvELb1EEclIJEvEEvPS1_DpOT+0x65)[0x7f5b67aa10e3]
[node004:04355] [12] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EE9_M_invokeIILm0EEEEvSt12_Index_tupleIIXspT_EEE+0x43)[0x7f5b67aa1077]
[node004:04355] [13] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EEclEv+0x2c)[0x7f5b67aa0f7e]
[node004:04355] [14] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS4_EEE6_M_runEv+0x1c)[0x7f5b67aa0f0e]
[node004:04355] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7f5b676e6c80]
[node004:04355] [16] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f5b66cd76ba]
[node004:04355] [17] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f5b66a0d3dd]
[node004:04355] *** End of error message ***
[node004:04346] *** Process received signal ***
[node004:04346] Signal: Segmentation fault (11)
[node004:04346] Signal code: Invalid permissions (2)
[node004:04346] Failing at address: 0x7f1ea6168000
[node004:04346] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f1ee59e9390]
[node004:04346] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x9f84e)[0x7f1ee56ad84e]
[node004:04346] [ 2] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso7UpdaterIfE6AccessEmPfS2_mPNS_9AddOptionE+0x4e)[0x7f1ee67ee050]
[node004:04346] [ 3] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZN10multiverso17MatrixServerTableIfE10ProcessGetERKSt6vectorINS_4BlobESaIS3_EEPS5+0x39e)[0x7f1ee67d81a0]
[node004:04346] [ 4] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso6Server10ProcessGetERSt10unique_ptrINS_7MessageESt14default_deleteIS2_EE+0x1df)[0x7f1ee67c132d]
[node004:04346] [ 5] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso6ServerEFvRSt10unique_ptrINS0_7MessageESt14default_deleteIS3_EEELb1EEclIIS7_EvEEvPS1_DpOT+0x7c)[0x7f1ee67c61c2]
[node004:04346] [ 6] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEE6__callIvJS8_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0x84)[0x7f1ee67c5bb4]
[node004:04346] [ 7] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEEclIJS8_EvEET0_DpOT+0x56)[0x7f1ee67c4dce]
[node004:04346] [ 8] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt17_Function_handlerIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEESt5_BindIFSt7_Mem_fnIMNS1_6ServerEFvS6_EEPSA_St12_PlaceholderILi1EEEEE9_M_invokeERKSt9_Any_dataS6+0x37)[0x7f1ee67c3f55]
[node004:04346] [ 9] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt8functionIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEEEclES6+0x49)[0x7f1ee67a3d8f]
[node004:04346] [10] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso5Actor4MainEv+0xf3)[0x7f1ee67a2153]
[node004:04346] [11] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso5ActorEFvvELb1EEclIJEvEEvPS1_DpOT+0x65)[0x7f1ee67a90e3]
[node004:04346] [12] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EE9_M_invokeIILm0EEEEvSt12_Index_tupleIIXspT_EEE+0x43)[0x7f1ee67a9077]
[node004:04346] [13] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EEclEv+0x2c)[0x7f1ee67a8f7e]
[node004:04346] [14] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS4_EEE6_M_runEv+0x1c)[0x7f1ee67a8f0e]
[node004:04346] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7f1ee63eec80]
[node004:04346] [16] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f1ee59df6ba]
[node004:04346] [17] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f1ee57153dd]
[node004:04346] *** End of error message ***
[node002:04281] *** Process received signal ***
[node002:04281] Signal: Segmentation fault (11)
[node002:04281] Signal code: Invalid permissions (2)
[node002:04281] Failing at address: 0x7f14cdbf8410
[node002:04281] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f150d4da390]
[node002:04281] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x9f840)[0x7f150d19e840]
[node002:04281] [ 2] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso7UpdaterIfE6AccessEmPfS2_mPNS_9AddOptionE+0x4e)[0x7f150e2df050]
[node002:04281] [ 3] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZN10multiverso17MatrixServerTableIfE10ProcessGetERKSt6vectorINS_4BlobESaIS3_EEPS5+0x39e)[0x7f150e2c91a0]
[node002:04281] [ 4] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso6Server10ProcessGetERSt10unique_ptrINS_7MessageESt14default_deleteIS2_EE+0x1df)[0x7f150e2b232d]
[node002:04281] [ 5] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso6ServerEFvRSt10unique_ptrINS0_7MessageESt14default_deleteIS3_EEELb1EEclIIS7_EvEEvPS1_DpOT+0x7c)[0x7f150e2b71c2]
[node002:04281] [ 6] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEE6__callIvJS8_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0x84)[0x7f150e2b6bb4]
[node002:04281] [ 7] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt5_BindIFSt7_Mem_fnIMN10multiverso6ServerEFvRSt10unique_ptrINS1_7MessageESt14default_deleteIS4_EEEEPS2_St12_PlaceholderILi1EEEEclIJS8_EvEET0_DpOT+0x56)[0x7f150e2b5dce]
[node002:04281] [ 8] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNSt17_Function_handlerIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEESt5_BindIFSt7_Mem_fnIMNS1_6ServerEFvS6_EEPSA_St12_PlaceholderILi1EEEEE9_M_invokeERKSt9_Any_dataS6+0x37)[0x7f150e2b4f55]
[node002:04281] [ 9] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt8functionIFvRSt10unique_ptrIN10multiverso7MessageESt14default_deleteIS2_EEEEclES6+0x49)[0x7f150e294d8f]
[node002:04281] [10] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZN10multiverso5Actor4MainEv+0xf3)[0x7f150e293153]
[node002:04281] [11] /home/ubuntu/multiverso/build/src/libmultiverso.so(ZNKSt12_Mem_fn_baseIMN10multiverso5ActorEFvvELb1EEclIJEvEEvPS1_DpOT+0x65)[0x7f150e29a0e3]
[node002:04281] [12] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EE9_M_invokeIILm0EEEEvSt12_Index_tupleIIXspT_EEE+0x43)[0x7f150e29a077]
[node002:04281] [13] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS2_EEclEv+0x2c)[0x7f150e299f7e]
[node002:04281] [14] /home/ubuntu/multiverso/build/src/libmultiverso.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN10multiverso5ActorEFvvEEPS4_EEE6_M_runEv+0x1c)[0x7f150e299f0e]
[node002:04281] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7f150dedfc80]
[node002:04281] [16] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f150d4d06ba]
[node002:04281] [17] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f150d2063dd]
[node002:04281] *** End of error message ***
[INFO] [2017-10-10 07:43:14] Rank 20 Prepare data time:44.863608s
[node004][[21269,1],55][btl_tcp_frag.c:237:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[node004:04348] [[21269,1],55] ORTE_ERROR_LOG: Unreachable in file pml_ob1_sendreq.c at line 1130
[INFO] [2017-10-10 07:43:14] Rank 26 Prepare data time:44.876433s

mpirun noticed that process rank 22 with PID 0 on node node002 exited on signal 11 (Segmentation fault).

Segmentation fault when increasing the number of blocks

When I increase the number of blocks, I get a segmentation fault. The gdb output is below.
Can anyone share any ideas?
Thank you all!

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffd46403700 (LWP 37256)]
0x000000000041e83c in multiverso::Aggregator::StartThread() ()
(gdb) bt
#0 0x000000000041e83c in multiverso::Aggregator::StartThread() ()
#1 0x00007ffff7708970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007ffff7965064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007ffff6e7862d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Sync server will block

The sync server occasionally blocks. This can be reproduced with the Matrix unit test.

What does "clock" mean in BSP (max_delay=0)?

The Multiverso documentation says:
BSP (max_delay=0): All worker processes are barriered forcibly at the end of a clock.

What does "clock" actually mean in Multiverso? A number of mini-batches, or a period of time?

Multiverso/Test not using cmake variables for MPI compiler

The build of the Multiverso/Test files hard-codes the "mpicxx" compiler instead of respecting the option cmake -D MPI_CXX_COMPILER=CC. This breaks builds in Cray Linux environments, and probably in any user environment where compiler wrappers are used. It needs to be fixed to support building in environments where MPI compiler wrappers are common (e.g. CC, cc, ftn).

For building on Cray XC systems I'm configuring multiverso with cmake/3.5.2 as follows:

cmake -D MPI_CXX_COMPILER=CC -D MPI_C_COMPILER=cc -D CMAKE_LINKER=CC -D CMAKE_CXX_COMPILER=CC -D CMAKE_C_COMPILER=cc -D CMAKE_SYSTEM_NAME=CrayLinuxEnvironment -D CMAKE_BUILD_TYPE="release" -D BUILD_SHARED_LIBS=ON -D TEST=OFF -D BOOST_ROOT=/path/to/Boost/boost-1.60.0 -D BOOST_INCLUDEDIR=/path/to/Boost/boost-1.60.0/include -D MPI_LIBRARY=${CRAY_MPICH_DIR}/lib/libmpich.so -D MPI_EXTRA_LIBRARY=${CRAY_MPICH_DIR}/lib -D CMAKE_VERBOSE_MAKEFILE=TRUE -D MPI_CXX_INCLUDE_PATH=${CRAY_MPICH_DIR}/include -D MPIEXEC=/path/to/srun -D MPIEXEC_NUMPROC_FLAG="-n" ../

-Jake

include path error

After building as the README describes, running ./multiverso_server fails with this error:
error while loading shared libraries: libzmq.so.5: cannot open shared object file: No such file or directory

install on OS X

make all -j4 produces the following:

find: /src/multiverso: No such file or directory
find: /src/multiverso_server: No such file or directory
find: /src/multiverso: No such file or directory
find: /src/multiverso_server: No such file or directory
mkdir -p /lib
mkdir: /lib: Permission denied
find: /src/multiverso: No such file or directory
make: *** [/lib] Error 1

I'm not an expert in Makefiles, but why does it try to install into /src/multiverso_server?

Why are workers blocked when I start 6 processes on three nodes, 2 processes per node?

# Python 2. The imports were not part of the posted excerpt (it starts at
# line 11 of the file); the four below are assumed.
import numpy as np
import theano
import multiverso as mv
from multiverso.theano_ext import sharedvar


class TestMultiversoSharedVariable:
    def _test_sharedvar(self, row, col):
        W = sharedvar.mv_shared(
            value=np.zeros(
                (row, col),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        delta = np.array(range(1, row * col + 1),
                         dtype=theano.config.floatX).reshape((row, col))
        train_model = theano.function([], updates=[(W, W + delta)])
        for i in xrange(10):
            train_model()
            train_model()
            sharedvar.sync_all_mv_shared_vars()  # send to server
            # mv.barrier()
            # to get the newest value, we must sync again
            mv.barrier()
            sharedvar.sync_all_mv_shared_vars()
            for j, actual in enumerate(W.get_value().reshape(-1)):
                print "[%d] %d %d %d" % (i, j, (j + 1) * (i + 1) * 2 * mv.workers_num(), actual)

    def test_sharedvar(self):
        self._test_sharedvar(10, 10)


if __name__ == '__main__':
    mv.init()
    test_shared = TestMultiversoSharedVariable()
    test_shared.test_sharedvar()
    mv.shutdown()

I ran this test and found that starting one worker per node works, but starting two workers per node blocks all workers. The first command below works; the second one hangs:
mpirun -hostfile alg_cluster.txt -npernode 1 python test_multi.py
mpirun -hostfile alg_cluster.txt -npernode 2 python test_multi.py

There are three IPs in my cluster.

build error

zmq.hpp:No such file or directory
#include "zmq.hpp"

cmake error

I want to install word-embedding on linux.

I entered the following commands:
cd multiverso/Applications/WordEmbedding

cmake CMakeLists.txt
and the following error occurred:
CMake Error at CMakeLists.txt:22 (add_executable):
Cannot find source file:

//src

Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
.hxx .in .txx

Any help would be appreciated.

Server::Process_EndTrain overflow?

void Server::Process_EndTrain(std::shared_ptr<MsgPack> msg_pack)
{
    MsgType msg_type;
    MsgArrow arrow;
    int src, dst;
    msg_pack->GetHeaderInfo(&msg_type, &arrow, &src, &dst);
    clocks_[src] = 1 << 31;

Here clocks_ is a vector, so after the shift clocks_[src] becomes INT_MIN (with a 32-bit int). After one worker ends training, the other workers will therefore hang whenever config.max_delay >= 0. Is this a problem?
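To make the concern concrete, the Python sketch below reproduces the 32-bit wrap-around and feeds the result into a waiting rule. The rule in can_advance is an assumption made for illustration (a generic bounded-delay check against the slowest clock), not code taken from Multiverso, and the clock values are invented.

```python
import ctypes

INT_MIN = -(2**31)
INT_MAX = 2**31 - 1


def as_int32(value: int) -> int:
    """Keep the low 32 bits of value and read them as a signed 32-bit int."""
    return ctypes.c_int32(value).value


def can_advance(worker_clock: int, clocks: list, max_delay: int) -> bool:
    """Assumed rule: a worker may lead the slowest clock by at most max_delay."""
    return worker_clock - min(clocks) <= max_delay


if __name__ == "__main__":
    finished = as_int32(1 << 31)  # the value a 32-bit signed int holds after the shift
    print(finished == INT_MIN)  # True
    clocks = [7, 7, finished]  # third worker has ended training
    print(can_advance(7, clocks, max_delay=0))  # False: finished worker looks slowest
    clocks[2] = INT_MAX  # alternative marker for "finished"
    print(can_advance(7, clocks, max_delay=0))  # True
```

Under this assumed rule, marking a finished worker with the largest representable clock rather than the smallest keeps it from holding back the workers that are still training.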

Build Multiverso with mvapich2 (replacement of OpenMPI) in CNTK

I would like to build Multiverso with mvapich2, and my CMake command looks like cmake -DCMAKE_VERBOSE_MAKEFILE=TRUE -DMPI_CXX_INCLUDE_PATH=/usr/local/mvapich2-2.2/include -DMPI_CXX_LIBRARIES=/usr/local/mvapich2-2.2 -DMPI_LIBRARY=/usr/local/mvapich2-2.2 -.... The source build passes, but the Test build fails with the following errors:
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Barrier'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Iprobe'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Get_count'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Isend'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Initialized'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Allreduce'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Comm_size'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Init_thread'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Query_thread'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Wait'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Recv'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Comm_rank'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Finalize'
/cntk/build/gpu/release/lib/libmultiverso.so: undefined reference to `MPI_Testall'

Any suggestions on this? Thank you.

about python version

Hi all,
I have a question about the Python version. If I use Python 2.7 rather than Python 3 or 3.5, what problems will I run into?

build failed on ubuntu16.04

andy@Andy-UB1604:~/prj/Multiverso/build$ cmake ..
OpenMP found
/usr/local/lib/libmpi.so
/usr/local/lib/libmpi.so
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
Imported targets not available for Boost version
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
Test/unittests/CMakeLists.txt:3 (find_package)

CMake Error at /usr/share/cmake-3.5/Modules/FindBoost.cmake:1677 (message):
Unable to find the requested Boost libraries.

Unable to find the Boost header files. Please set BOOST_ROOT to the root
directory containing Boost or BOOST_INCLUDEDIR to the directory containing
Boost's headers.
Call Stack (most recent call first):
Test/unittests/CMakeLists.txt:3 (find_package)

CMake Error at Test/unittests/CMakeLists.txt:9 (MESSAGE):
message called with incorrect number of arguments

Boost_INCLUDE_DIR-NOTFOUND
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
/home/andy/prj/Multiverso/Test/unittests/Boost_INCLUDE_DIR
used as include directory in directory /home/andy/prj/Multiverso/Test/unittests

-- Configuring incomplete, errors occurred!
See also "/home/andy/prj/Multiverso/build/CMakeFiles/CMakeOutput.log".
andy@Andy-UB1604:~/prj/Multiverso/build$ sudo apt-get install libopenmpi-dev openmpi-bin
Reading package lists... Done
Building dependency tree
Reading state information... Done
libopenmpi-dev is already the newest version (1.10.2-8ubuntu1).
openmpi-bin is already the newest version (1.10.2-8ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 354 not upgraded.
andy@Andy-UB1604:~/prj/Multiverso/build$

what does this running problem mean?

Hi all,
I have installed Multiverso on three Ubuntu 14.04 VMs (master0, slave1, slave2) in VirtualBox. When I run the LR example on a single VM, the program runs successfully. But when I run the LR example across all three VMs, the program gets stuck at the step "multiverso MPI-Net is initialized under MPI_THREAD_SERIALIZED mode", so I never see later output such as "All nodes registered...".
I also found that LR processes had been started on all three VMs.
In the end, the VM from which I started the LR program hangs and never finishes.
Could anyone help me with this problem? Thanks.

Could NOT find MPI_C (missing: MPI_C_LIBRARIES MPI_C_INCLUDE_PATH)

I ran "cmake .." in Multiverso/build/ and got the errors below.
I have installed openmpi-1.8-1.8.1-5.el6.x86_64 using yum.
How can I solve this MPI_C problem?

CMake Error at /usr/local/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
Could NOT find MPI_C (missing: MPI_C_LIBRARIES MPI_C_INCLUDE_PATH)
Call Stack (most recent call first):
/usr/local/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE)
/usr/local/share/cmake-2.8/Modules/FindMPI.cmake:587 (find_package_handle_standard_args)
CMakeLists.txt:11 (find_package)

-- Configuring incomplete, errors occurred!


Encountering a problem when using array table with small sizes

I'm using an array table with size == 1.
Because Multiverso checks that size_ > MV_NumServers(), I get the errors below.

[INFO] [2016-05-29 14:20:24] multiverso MPI-Net is initialized under MPI_THREAD_SERIALIZED mode.
[INFO] [2016-05-29 14:20:24] All nodes registered. System contains 1 nodes. num_worker = 1, num_server = 1
[INFO] [2016-05-29 14:20:24] Create a async server
[INFO] [2016-05-29 14:20:24] Rank 0: Zoo start sucessfully
[FATAL] [2016-05-29 14:20:25] Check failed: size_ > MV_NumServers() at /home/xiaoyang/repos/multiverso/src/table/array_table.cpp, line 14 .
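
The log above reports num_server = 1 and the table has size 1, so a strict comparison of the form size_ > MV_NumServers() evaluates 1 > 1 and fails. The snippet below is only a sketch of that arithmetic; PassesSizeCheck and MinAcceptedSize are hypothetical helper names, not functions from Multiverso.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the failing check; not Multiverso's actual code.
// It mirrors the condition printed in the log: size_ > MV_NumServers().
bool PassesSizeCheck(std::size_t table_size, int num_servers) {
  assert(num_servers > 0);
  return table_size > static_cast<std::size_t>(num_servers);
}

// Smallest table size that satisfies the strict inequality.
std::size_t MinAcceptedSize(int num_servers) {
  assert(num_servers > 0);
  return static_cast<std::size_t>(num_servers) + 1;
}
```

With one server, PassesSizeCheck(1, 1) is false and MinAcceptedSize(1) is 2. If the check stays as it is, one possible workaround would be to allocate a slightly larger table and ignore the padding elements.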

master branch does not compile on ubuntu 16.04 because pthread not linked in LogisticRegression Application

Hello,

Multiverso does not compile on my Ubuntu machine. I get the following error:
/usr/bin/ld: CMakeFiles/LogisticRegression.dir/src/reader.cpp.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line

I could fix the issue by adding the missing pthread in https://github.com/Microsoft/Multiverso/blob/master/Applications/LogisticRegression/CMakeLists.txt as follows:

target_link_libraries(LogisticRegression multiverso ${MPI_CXX_LIBRARIES} pthread)

I am working on a Linux machine with the following characteristics:
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

My gcc is gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609

Multiverso unable to learn on the Criteo Dataset

I can't seem to train a model using Multiverso on the Criteo dataset (L2-regularized logistic regression).

I can train a model successfully on the same data using Vowpal Wabbit and some simple Python code. I have tried multiple configurations of Multiverso but without much luck.

My config file is:

input_size=14
output_size=1
objective_type=logistic
regular_type=L2
updater_type=sgd
train_epoch=20
sparse=false
use_ps=false
minibatch_size=20
#train_file=/mnt/efs/criteo_derivatives/day_1.csv_parsed
test_file=/mnt/efs/test_file
learning_rate_coef=0.0001
regular_coef=0.0007
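
For reference when checking such a configuration by hand, here is a minimal sketch of one plain SGD step for L2-regularized logistic regression. It is the generic textbook update, not code taken from Multiverso, and it assumes that learning_rate_coef plays the role of the step size lr and regular_coef the role of the L2 weight lambda; that mapping of config keys is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic function.
double Sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One SGD step on a single example (x, y) with label y in {0, 1}:
//   w <- w - lr * ((sigmoid(w . x) - y) * x + lambda * w)
// Generic update for the loss  log-loss + (lambda / 2) * ||w||^2.
// lr and lambda are assumed to correspond to learning_rate_coef and
// regular_coef in the config above.
void SgdStep(std::vector<double>& w, const std::vector<double>& x,
             double y, double lr, double lambda) {
  assert(w.size() == x.size());
  double z = 0.0;
  for (std::size_t i = 0; i < w.size(); ++i) z += w[i] * x[i];
  const double err = Sigmoid(z) - y;  // gradient of the log-loss w.r.t. z
  for (std::size_t i = 0; i < w.size(); ++i) {
    w[i] -= lr * (err * x[i] + lambda * w[i]);
  }
}
```

Running a few steps of this on a handful of rows from the training file is one way to see whether a given step size and feature scale produce weight updates of a sensible magnitude.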

Does Multiverso support concurrent message handling in server table?

I implemented a new table and created a bunch of client threads to send GET and ADD operations. I expected the table to handle multiple GET and ADD ops concurrently. However, no matter how many threads I create, I only observe "200% CPU usage" in the Linux top command. I guess "100%" is on the client side and the other "100%" is on the server side.

I also noticed that the "opm_threads" keyword exists in the API document. But I failed to set this variable and got a fatal error in the FlagRegister::SetFlagIfFound method.

Does Multiverso support concurrent message handling in the server table? If it does, how do I enable concurrent server-side processing?
