Comments (15)
Hi probir, this is not a memory leak. According to the C++11 specification, an STL container's capacity does not change after clear() or pop_back() is invoked; see, e.g., https://en.cppreference.com/w/cpp/container/vector/clear.
AIFM's current implementation adopts the same convention here.
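To make the convention concrete, here is a minimal sketch using plain std::vector (not an AIFM container). The names CapacityReport and measure_capacity are made up for illustration. It records the capacity before and after clear(), and then after swapping with an empty temporary, which is the usual idiom for asking a std::vector to give its buffer back. Whether AIFM's far-memory containers offer an equivalent release path is not something this sketch establishes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type for this illustration only.
struct CapacityReport {
    std::size_t before_clear;
    std::size_t after_clear;
    std::size_t after_release;
};

// Fills a vector with n elements and reports its capacity at three points.
CapacityReport measure_capacity(std::size_t n) {
    std::vector<int> v(n, 42);
    CapacityReport r{};
    r.before_clear = v.capacity();
    v.clear();                    // size becomes 0; capacity is left unchanged
    r.after_clear = v.capacity();
    std::vector<int>().swap(v);   // swap with an empty temporary to drop the old buffer
    r.after_release = v.capacity();
    return r;
}
```

Calling measure_capacity(1000) should report the same capacity before and after clear(), and a smaller one (typically 0) after the swap.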
from aifm.
Thanks for pointing that out. I tried deleting the data frame vector pointer that allocate_dataframe_vector_heap returns. However, the free memory does not increase. What's your suggestion regarding this?
I have also rewritten "test_tcp_array_add" to call the do_work function iteratively. My understanding is that the allocated array is destroyed every time do_work exits. However, the measured free memory does not go up when the function exits, and the run ends in a segfault. Here is the code and the output of the execution:
for (int i = 0; i < 10; i++)
{
    do_work(manager.get());
    cout << "Free region ratio: " << manager.get()->get_free_mem_ratio() << "\n";
}
Running test test_tcp_array_add2...
CPU 14| <5> cpu: detected 20 cores, 1 nodes
CPU 14| <5> time: detected 2394 ticks / us
[ 0.000721] CPU 14| <5> loading configuration from '/users/probirr/AIFM/aifm/configs/client.config'
[ 0.000789] CPU 14| <3> < 1 guaranteed kthreads is not recommended for networked apps
[ 0.017684] CPU 14| <5> net: started network stack
[ 0.017700] CPU 14| <5> net: using the following configuration:
[ 0.017706] CPU 14| <5> addr: 18.18.1.2
[ 0.017711] CPU 14| <5> netmask: 255.255.255.0
[ 0.017718] CPU 14| <5> gateway: 18.8.1.1
[ 0.017725] CPU 14| <5> mac: AE:A3:59:C7:CF:B3
[ 0.388215] CPU 14| <5> thread: created thread 0
[ 0.388351] CPU 14| <5> spawning 18 kthreads
[ 0.388590] CPU 18| <5> thread: created thread 1
[ 0.388639] CPU 06| <5> thread: created thread 2
[ 0.388656] CPU 17| <5> thread: created thread 3
[ 0.388656] CPU 01| <5> thread: created thread 4
[ 0.388943] CPU 17| <5> thread: created thread 5
[ 0.389183] CPU 15| <5> thread: created thread 6
[ 0.389358] CPU 02| <5> thread: created thread 7
[ 0.389599] CPU 17| <5> thread: created thread 8
[ 0.389892] CPU 16| <5> thread: created thread 9
[ 0.390059] CPU 11| <5> thread: created thread 10
[ 0.390347] CPU 14| <5> thread: created thread 11
[ 0.390559] CPU 08| <5> thread: created thread 12
[ 0.390885] CPU 12| <5> thread: created thread 13
[ 0.391036] CPU 02| <5> thread: created thread 14
[ 0.391158] CPU 07| <5> thread: created thread 15
[ 0.391386] CPU 11| <5> thread: created thread 16
[ 0.391529] CPU 08| <5> thread: created thread 17
Passed
Free region ratio: 0.148438
Segmentation fault
Does AIFM destruct the allocated array when the function scope exits? Do I need to invoke the garbage collector explicitly?
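On the language side of this question, standard C++ runs a local object's destructor when the function returns, with no explicit call needed. The sketch below shows only that guarantee, using a made-up TrackedBuffer class and a do_work_sketch function that mirrors the shape of do_work. It does not model AIFM's allocator or GC, so it says nothing about when the freed space shows up in get_free_mem_ratio().

```cpp
#include <cstddef>

// Hypothetical stand-in for a locally allocated array; not an AIFM type.
class TrackedBuffer {
public:
    explicit TrackedBuffer(std::size_t n) : data_(new int[n]) { ++live_count(); }
    ~TrackedBuffer() { delete[] data_; --live_count(); }
    TrackedBuffer(const TrackedBuffer&) = delete;
    TrackedBuffer& operator=(const TrackedBuffer&) = delete;

    // Number of TrackedBuffer objects currently alive.
    static int& live_count() {
        static int count = 0;
        return count;
    }

private:
    int* data_;
};

// Allocates a local buffer and returns the live count observed while it is in scope.
int do_work_sketch() {
    TrackedBuffer buf(1024);
    return TrackedBuffer::live_count();  // buf is destroyed after this value is computed
}
```

Calling do_work_sketch() in a loop and reading TrackedBuffer::live_count() after each call shows the count returning to zero every iteration, which is the scope-exit behaviour the question assumes.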
from aifm.
Hi Zain,
This confuses me a bit since when you initialized the far memory manager, you provided the number of threads that will be used for garbage collection. The paper also mentions that AIFM's GC does collect dead objects.
I tried resizing kFarMemSize, but it did not help. My understanding is that, without GC running, AIFM will eventually run out of memory when running large applications that iteratively allocate and free memory. We want to test some real-world applications on AIFM. Can you please enable the GC or guide us on how to do so?
Thanks,
Probir
from aifm.
Hi Zain,
Sure. Here is the stack trace of the segfault:
(gdb) r /users/probirr/AIFM/aifm/configs/client.config 18.18.1.3:8000
Starting program: /users/probirr/AIFM/aifm/bin/test_tcp_array_add2 /users/probirr/AIFM/aifm/configs/client.config 18.18.1.3:8000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
CPU 11| <5> cpu: detected 20 cores, 1 nodes
CPU 11| <5> time: detected 2394 ticks / us
[ 0.000683] CPU 11| <5> loading configuration from '/users/probirr/AIFM/aifm/configs/client.config'
[ 0.000736] CPU 11| <3> < 1 guaranteed kthreads is not recommended for networked apps
[ 0.019572] CPU 11| <5> net: started network stack
[ 0.019587] CPU 11| <5> net: using the following configuration:
[ 0.019593] CPU 11| <5> addr: 18.18.1.2
[ 0.019597] CPU 11| <5> netmask: 255.255.255.0
[ 0.019603] CPU 11| <5> gateway: 18.8.1.1
[ 0.019608] CPU 11| <5> mac: 1E:0C:47:6A:1A:C8
[ 0.387087] CPU 11| <5> thread: created thread 0
[ 0.387219] CPU 11| <5> spawning 18 kthreads
[New Thread 0x7fff3cb7e700 (LWP 12414)]
[ 0.387912] CPU 02| <5> thread: created thread 1
[New Thread 0x7fff3c37d700 (LWP 12415)]
[ 0.388184] CPU 13| <5> thread: created thread 2
[New Thread 0x7fff3b77c700 (LWP 12416)]
[ 0.388483] CPU 04| <5> thread: created thread 3
[New Thread 0x7fff3ab7b700 (LWP 12417)]
[ 0.388750] CPU 16| <5> thread: created thread 4
[New Thread 0x7fff39f7a700 (LWP 12418)]
[ 0.389005] CPU 05| <5> thread: created thread 5
[New Thread 0x7fff39379700 (LWP 12419)]
[ 0.389271] CPU 07| <5> thread: created thread 6
[New Thread 0x7fff23fff700 (LWP 12420)]
[ 0.389548] CPU 09| <5> thread: created thread 7
[New Thread 0x7fff237fe700 (LWP 12421)]
[ 0.389828] CPU 19| <5> thread: created thread 8
[New Thread 0x7fff22efd700 (LWP 12422)]
[ 0.390096] CPU 01| <5> thread: created thread 9
[New Thread 0x7fff222fc700 (LWP 12423)]
[ 0.390345] CPU 04| <5> thread: created thread 10
[New Thread 0x7fff216fb700 (LWP 12424)]
[ 0.390665] CPU 18| <5> thread: created thread 11
[New Thread 0x7fff20afa700 (LWP 12425)]
[ 0.390892] CPU 08| <5> thread: created thread 12
[New Thread 0x7fff0be7f700 (LWP 12426)]
[ 0.391062] CPU 18| <5> thread: created thread 13
[New Thread 0x7fff0b27e700 (LWP 12427)]
[ 0.391276] CPU 17| <5> thread: created thread 14
[New Thread 0x7fff0a67d700 (LWP 12428)]
[ 0.391536] CPU 04| <5> thread: created thread 15
[New Thread 0x7fff09a7c700 (LWP 12429)]
[ 0.391717] CPU 03| <5> thread: created thread 16
[New Thread 0x7fff08e7b700 (LWP 12430)]
[ 0.391937] CPU 10| <5> thread: created thread 17
Thread 16 "test_tcp_array_" received signal SIGUSR2, User defined signal 2.
[Switching to Thread 0x7fff0a67d700 (LWP 12428)]
thread_yield () at runtime/sched.c:807
807 ACCESS_ONCE(start_schedule_us[get_core_num()].c) = curr_us;
(gdb) bt
#0 thread_yield () at runtime/sched.c:807
#1 0x0000555555563ea5 in far_memory::Parallelizer<std::pair<unsigned long, unsigned long> >::slave_can_exit (tid=1, this=0x7ffef8011b18) at /usr/include/c++/9/bits/unique_ptr.h:286
#2 far_memory::GCParallelMarker::slave_fn(unsigned int) () at src/manager.cpp:418
#3 0x00005555555a0376 in std::function<void ()>::operator()() const (this=0x7fff084bafe0) at /usr/include/c++/9/bits/std_function.h:683
#4 rt::thread_internal::ThreadTrampolineWithJoin (arg=0x7fff084bafd0) at thread.cc:15
#5 0x00005555555a3460 in ?? () at runtime/sched.c:128
#6 0x0000000000000000 in ?? ()
(gdb) bt full
#0 thread_yield () at runtime/sched.c:807
nextk = 122
r = <optimized out>
k = <optimized out>
myth = 0x100000009780
curr_us = 11524800
#1 0x0000555555563ea5 in far_memory::Parallelizer<std::pair<unsigned long, unsigned long> >::slave_can_exit (tid=1, this=0x7ffef8011b18) at /usr/include/c++/9/bits/unique_ptr.h:286
i = <optimized out>
#2 far_memory::GCParallelMarker::slave_fn(unsigned int) () at src/manager.cpp:418
No locals.
#3 0x00005555555a0376 in std::function<void ()>::operator()() const (this=0x7fff084bafe0) at /usr/include/c++/9/bits/std_function.h:683
No locals.
#4 rt::thread_internal::ThreadTrampolineWithJoin (arg=0x7fff084bafd0) at thread.cc:15
d = 0x7fff084bafd0
#5 0x00005555555a3460 in ?? () at runtime/sched.c:128
thread_slab = {name = 0x5555555e8aac "runtime_threads", size = 192, link = {next = 0x555561814e50 <smalloc_slabs+16>, prev = 0x555561907f50 <node_slab+16>}, nodes = {0x1000000080c0, 0x0, 0x0, 0x0}}
last_watchdog_tsc = 120123258731640552
last_tsc = 120123258731728240
runtime_stack = 0x7fff0923bff8
thread_tcache = 0x55556194d900
__perthread_thread_pt = {tc = 0x3010102464c457f, rounds = 0, capacity = 0, loaded = 0x1003e0003, previous = 0xb300}
last_pmc_tsc = <optimized out>
runtime_stack_base = <optimized out>
rr_idx = 12
start_schedule_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524799, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524583, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524799, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 11524799, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 9724764, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524800, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524798, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524801,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524800, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 11532420, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}} <repeats 237 times>}
__prioritized_status = 3
duration_gc_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4287, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 2805, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1502, pad = {0,
0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1151, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1791, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1539, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}} <repeats 246 times>}
__curr_cpu = 3
runtime_congestion = 0x7fffcd600d80
__self = 0x100000009780
__global_prioritizing = 1
start_gc_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524784, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524586, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524762, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 11524770, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524754, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524798, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524799,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524801, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 11524825, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524836, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524839, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524862,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}} <repeats 238 times>}
__status = 3
disable_watchdog = 0
---Type <return> to continue, or q <return> to quit---
duration_schedule_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 395001, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 400297, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 398424, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 392668, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 398601, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524657, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524665, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524676,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524706, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 11524825, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524836, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524839, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11524862,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 11528459, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}} <repeats 237 times>}
#6 0x0000000000000000 in ?? ()
No symbol table info available.
Thanks,
Probir
from aifm.
Thanks probir. Before typing "r" in gdb, could you please first filter out the irrelevant signals by typing "handle SIGUSR2 nostop noprint" and "handle SIGUSR1 nostop noprint"? Otherwise the reported stack trace won't be the fault site. Once I get some clues, I will work on the fix.
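For context on why the first trace stopped in thread_yield rather than at the fault: a program can use SIGUSR1 and SIGUSR2 as part of normal operation, and gdb stops on each delivery unless told otherwise. The sketch below is a generic POSIX illustration with made-up names (g_usr2_seen, on_sigusr2, raise_and_handle_sigusr2). It assumes, as the earlier trace suggests, that the runtime handles these signals itself, and it is not code from AIFM or its runtime.

```cpp
#include <csignal>

// Written by the handler; sig_atomic_t is the type a signal handler may safely write.
volatile std::sig_atomic_t g_usr2_seen = 0;

// Handler for SIGUSR2: records that the signal arrived and returns.
extern "C" void on_sigusr2(int) { g_usr2_seen = 1; }

// Installs the handler, raises SIGUSR2 once, and reports whether it was handled.
bool raise_and_handle_sigusr2() {
    g_usr2_seen = 0;
    std::signal(SIGUSR2, on_sigusr2);
    std::raise(SIGUSR2);  // the handler runs before raise() returns
    return g_usr2_seen == 1;
}
```

Run outside a debugger, the signal is handled and the program continues. Under gdb with default settings, the same delivery is reported as a stop, which is what the two "handle ... nostop noprint" commands suppress.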
from aifm.
Sure. I have filtered out the irrelevant signals. Here is the stack trace:
[New Thread 0x7fff08e7b700 (LWP 13490)]
[ 0.414626] CPU 04| <5> thread: created thread 17
Thread 17 "test_tcp_array_" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff09a7c700 (LWP 13489)]
far_memory::GCParallelMarker::slave_fn(unsigned int) () at src/manager.cpp:434
434 if (!ptr->meta().is_shared()) {
(gdb) bt full
#0 far_memory::GCParallelMarker::slave_fn(unsigned int) () at src/manager.cpp:434
ptr = 0x0
obj_id_len = <optimized out>
obj_id = 0x7ffecb805b28 "\003\276\376\177\b"
guard = <optimized out>
obj = {static kPtrAddrPos = 0, static kPtrAddrSize = 6, static kDataLenPos = 6, static kDSIDPos = 8, static kIDLenPos = 9, addr_ = 140732312607518, static kDSIDSize = 1, static kIDLenSize = 1,
static kDataLenSize = 2, static kHeaderSize = 10, static kMaxObjectSize = 65535, static kMaxObjectIDSize = 255, static kMaxObjectDataSize = 65270}
left = <optimized out>
right = <optimized out>
cur = 140732312607518
task = <optimized out>
#1 0x00005555555a0376 in std::function<void ()>::operator()() const (this=0x7ffeed63ffe0) at /usr/include/c++/9/bits/std_function.h:683
No locals.
#2 rt::thread_internal::ThreadTrampolineWithJoin (arg=0x7ffeed63ffd0) at thread.cc:15
d = 0x7ffeed63ffd0
#3 0x00005555555a3460 in ?? () at runtime/sched.c:128
thread_slab = {name = 0x5555555e8aac "runtime_threads", size = 192, link = {next = 0x555561814e50 <smalloc_slabs+16>, prev = 0x555561907f50 <node_slab+16>}, nodes = {0x1000000080c0, 0x0, 0x0, 0x0}}
last_watchdog_tsc = 120126583620015516
last_tsc = 120126583620015786
runtime_stack = 0x7fff0863aff8
thread_tcache = 0x55556194d900
__perthread_thread_pt = {tc = 0x3010102464c457f, rounds = 0, capacity = 0, loaded = 0x1003e0003, previous = 0xb300}
last_pmc_tsc = <optimized out>
runtime_stack_base = <optimized out>
rr_idx = 1557
start_schedule_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576873, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576877, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576882, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 7588616, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576889, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576894, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7588327, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7588663,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576907, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576912, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7588642, pad = {0, 0, 0, 0, 0,
0, 0}}, {c = 7576921, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576925, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7588326, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7586463, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 7576933, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576939, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576866, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}} <repeats 236 times>}
__prioritized_status = 3
duration_gc_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 773057, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1365922, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1031406, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 1131463, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1544494, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1639550, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1469808, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1326982,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1527556, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1435272, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1207706, pad = {0, 0, 0, 0, 0,
0, 0}}, {c = 1029356, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1175862, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 933133, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 976550, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 344530, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 213113, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1445023, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}} <repeats 236 times>}
__curr_cpu = 2
runtime_congestion = 0x7fffcd600d80
__self = 0x100000069600
__global_prioritizing = 1
start_gc_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576874, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576878, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576882, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 7445716, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576889, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576894, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7445645, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7414454,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576907, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576912, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7446230, pad = {0, 0, 0, 0, 0,
0, 0}}, {c = 7576921, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576926, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576766, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7398917, pad = {0, 0, 0, 0, 0, 0, 0}}, {
---Type <return> to continue, or q <return> to quit---
c = 7576934, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576938, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 7576866, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}} <repeats 236 times>}
__status = 3
disable_watchdog = 0
duration_schedule_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 694210, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 537958, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 650964, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 614338, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 819785, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4176384, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4102448, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4061523,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4078690, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4071345, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4280196, pad = {0, 0, 0, 0, 0,
0, 0}}, {c = 4128097, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4194133, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4167618, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 3981447, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 4122436, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 3954343, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 4138636, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}} <repeats 236 times>}
#4 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) bt
#0 far_memory::GCParallelMarker::slave_fn(unsigned int) () at src/manager.cpp:434
#1 0x00005555555a0376 in std::function<void ()>::operator()() const (this=0x7ffeed63ffe0) at /usr/include/c++/9/bits/std_function.h:683
#2 rt::thread_internal::ThreadTrampolineWithJoin (arg=0x7ffeed63ffd0) at thread.cc:15
#3 0x00005555555a3460 in ?? () at runtime/sched.c:128
#4 0x0000000000000000 in ?? ()
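The trace above shows ptr = 0x0 at the point where manager.cpp:434 evaluates ptr->meta().is_shared(), i.e. a null pointer dereference. The following is a minimal, hypothetical sketch of that failure pattern with a defensive null check. Meta, PtrSketch, and count_unshared are stand-in names, not AIFM's real classes, and this is not the fix that was pushed to the repository.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not AIFM's real classes.
struct Meta {
    bool shared;
    bool is_shared() const { return shared; }
};

struct PtrSketch {
    Meta m;
    const Meta& meta() const { return m; }
};

// Counts non-shared pointers, skipping null entries instead of dereferencing them.
std::size_t count_unshared(const std::vector<const PtrSketch*>& ptrs) {
    std::size_t n = 0;
    for (const PtrSketch* ptr : ptrs) {
        if (ptr == nullptr) {
            continue;  // without this check, the next line would fault as in the trace
        }
        if (!ptr->meta().is_shared()) {
            ++n;
        }
    }
    return n;
}
```

A null check like this only hides the symptom; the underlying question is why a marker thread sees a null pointer for an object it believes is live.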
from aifm.
Thank you probir, it's a bug. I've pushed the fix.
from aifm.
Hi Zain,
I tested the code today. I am still getting the segfault. Please find the stack trace below:
(gdb) handle SIGUSR1 nostop noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) handle SIGUSR2 nostop noprint
Signal Stop Print Pass to program Description
SIGUSR2 No No Yes User defined signal 2
(gdb) r /users/probirr/AIFM/aifm/configs/client.config 18.18.1.3:8000
Starting program: /users/probirr/AIFM/aifm/bin/test_tcp_array_add /users/probirr/AIFM/aifm/configs/client.config 18.18.1.3:8000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
CPU 18| <5> cpu: detected 20 cores, 1 nodes
CPU 18| <5> time: detected 2394 ticks / us
[ 0.000699] CPU 18| <5> loading configuration from '/users/probirr/AIFM/aifm/configs/client.config'
[ 0.000753] CPU 18| <3> < 1 guaranteed kthreads is not recommended for networked apps
[ 0.018529] CPU 18| <5> net: started network stack
[ 0.018546] CPU 18| <5> net: using the following configuration:
[ 0.018552] CPU 18| <5> addr: 18.18.1.2
[ 0.018556] CPU 18| <5> netmask: 255.255.255.0
[ 0.018562] CPU 18| <5> gateway: 18.8.1.1
[ 0.018567] CPU 18| <5> mac: 26:AC:74:AF:50:5C
[ 0.384120] CPU 18| <5> thread: created thread 0
[ 0.384247] CPU 18| <5> spawning 18 kthreads
[New Thread 0x7fff3cb7e700 (LWP 26448)]
[ 0.384847] CPU 19| <5> thread: created thread 1
[New Thread 0x7fff3c37d700 (LWP 26449)]
[ 0.385191] CPU 07| <5> thread: created thread 2
[New Thread 0x7fff3b77c700 (LWP 26450)]
[ 0.385452] CPU 09| <5> thread: created thread 3
[New Thread 0x7fff3ab7b700 (LWP 26451)]
[ 0.385700] CPU 01| <5> thread: created thread 4
[New Thread 0x7fff39f7a700 (LWP 26452)]
[ 0.385949] CPU 15| <5> thread: created thread 5
[New Thread 0x7fff39379700 (LWP 26453)]
[ 0.386182] CPU 16| <5> thread: created thread 6
[New Thread 0x7fff23fff700 (LWP 26454)]
[ 0.386427] CPU 02| <5> thread: created thread 7
[New Thread 0x7fff237fe700 (LWP 26455)]
[ 0.386762] CPU 04| <5> thread: created thread 8
[New Thread 0x7fff22efd700 (LWP 26456)]
[ 0.387233] CPU 13| <5> thread: created thread 9
[New Thread 0x7fff222fc700 (LWP 26457)]
[ 0.387570] CPU 13| <5> thread: created thread 10
[New Thread 0x7fff216fb700 (LWP 26458)]
[ 0.387961] CPU 08| <5> thread: created thread 11
[New Thread 0x7fff20afa700 (LWP 26459)]
[ 0.388225] CPU 05| <5> thread: created thread 12
[New Thread 0x7fff0be7f700 (LWP 26460)]
[ 0.388477] CPU 10| <5> thread: created thread 13
[New Thread 0x7fff0b27e700 (LWP 26461)]
[ 0.388655] CPU 18| <5> thread: created thread 14
[New Thread 0x7fff0a67d700 (LWP 26462)]
[ 0.388897] CPU 13| <5> thread: created thread 15
[New Thread 0x7fff09a7c700 (LWP 26463)]
[ 0.389130] CPU 06| <5> thread: created thread 16
[New Thread 0x7fff08e7b700 (LWP 26464)]
[ 0.389394] CPU 06| <5> thread: created thread 17
Thread 13 "test_tcp_array_" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff20afa700 (LWP 26459)]
far_memory::GCParallelMarker::slave_fn(unsigned int) () at src/manager.cpp:434
434 if (!ptr->meta().is_shared()) {
(gdb) bt full
#0 far_memory::GCParallelMarker::slave_fn(unsigned int) () at src/manager.cpp:434
ptr = 0x0
obj_id_len = <optimized out>
obj_id = 0x7ffee59e00e8 "\340\333\376\177"
guard = <optimized out>
obj = {static kPtrAddrPos = 0, static kPtrAddrSize = 6, static kDataLenPos = 6, static kDSIDPos = 8, static kIDLenPos = 9, addr_ = 140732750758110, static kDSIDSize = 1,
static kIDLenSize = 1, static kDataLenSize = 2, static kHeaderSize = 10, static kMaxObjectSize = 65535, static kMaxObjectIDSize = 255, static kMaxObjectDataSize = 65270}
left = <optimized out>
right = <optimized out>
cur = 140732750758110
task = <optimized out>
#1 0x00005555555a02b6 in std::function<void ()>::operator()() const (this=0x7fff3aebbfe0) at /usr/include/c++/9/bits/std_function.h:683
No locals.
#2 rt::thread_internal::ThreadTrampolineWithJoin (arg=0x7fff3aebbfd0) at thread.cc:15
d = 0x7fff3aebbfd0
#3 0x00005555555a33a0 in ?? () at runtime/sched.c:128
thread_slab = {name = 0x5555555e89ec "runtime_threads", size = 192, link = {next = 0x555561814e50 <smalloc_slabs+16>, prev = 0x555561907f50 <node_slab+16>}, nodes = {0x1000000080c0, 0x0, 0x0,
0x0}}
last_watchdog_tsc = 31902610368648492
last_tsc = 31902610368823485
runtime_stack = 0x7fff0b63eff8
thread_tcache = 0x55556194d900
__perthread_thread_pt = {tc = 0x3010102464c457f, rounds = 0, capacity = 0, loaded = 0x1003e0003, previous = 0xb2c0}
last_pmc_tsc = <optimized out>
runtime_stack_base = <optimized out>
rr_idx = 14407
start_schedule_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20707552, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20776849, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20707918, pad = {0, 0, 0, 0, 0, 0,
0}}, {c = 20708417, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20776838, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642360, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20707918, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 20703293, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20776825, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20707918, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20707919,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20703217, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20707919, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20776796, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20707937, pad = {
0, 0, 0, 0, 0, 0, 0}}, {c = 20708411, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642440, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20775572, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0,
0, 0, 0}} <repeats 236 times>}
__prioritized_status = 3
duration_gc_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 907167, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1487000, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1360770, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 1332090, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1684852, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1415205, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1480664, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 1424729, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1074331, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1920584, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1264151,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1862664, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1682386, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1171910, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1958608, pad = {0,
0, 0, 0, 0, 0, 0}}, {c = 1637780, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 1529236, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 299564, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0,
0}} <repeats 236 times>}
__curr_cpu = 6
runtime_congestion = 0x7fffcd600d80
__self = 0x10000001b840
__global_prioritizing = 1
start_gc_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642402, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642371, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20630830, pad = {0, 0, 0, 0, 0, 0, 0}}, {
---Type <return> to continue, or q <return> to quit---
c = 20624720, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20613993, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642361, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642411, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 20642415, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20648131, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642423, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642382,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642428, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642432, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20630878, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20648131, pad = {
0, 0, 0, 0, 0, 0, 0}}, {c = 20648130, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642440, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 20642349, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0,
0, 0, 0}} <repeats 236 times>}
__status = 3
disable_watchdog = 0
duration_schedule_us = {{c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 870747, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 806000, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 849502, pad = {0, 0, 0, 0, 0, 0, 0}}, {
c = 762007, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 562866, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 8657580, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 5275973, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 5307707,
pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 8400243, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 5345756, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 5430843, pad = {0, 0, 0,
0, 0, 0, 0}}, {c = 5297854, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 5397851, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 5395546, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 9403586, pad = {0, 0, 0, 0, 0, 0,
0}}, {c = 5345828, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 6739800, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 5021648, pad = {0, 0, 0, 0, 0, 0, 0}}, {c = 0, pad = {0, 0, 0, 0, 0, 0,
0}} <repeats 236 times>}
#4 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) bt
#0 far_memory::GCParallelMarker::slave_fn(unsigned int) () at src/manager.cpp:434
#1 0x00005555555a02b6 in std::function<void ()>::operator()() const (this=0x7fff3aebbfe0) at /usr/include/c++/9/bits/std_function.h:683
#2 rt::thread_internal::ThreadTrampolineWithJoin (arg=0x7fff3aebbfd0) at thread.cc:15
#3 0x00005555555a33a0 in ?? () at runtime/sched.c:128
#4 0x0000000000000000 in ?? ()
from aifm.
Hi probir, did you recompile the code? After applying the fix, I ran the same test you are using and no longer saw the segfault.
from aifm.
Yes. I downloaded and recompiled the code. While testing, did you modify the test case to call do_work() iteratively? I am using the following _main function:
void _main(void *arg) {
  char **argv = static_cast<char **>(arg);
  std::string ip_addr_port(argv[1]);
  auto raddr = helpers::str_to_netaddr(ip_addr_port);
  std::unique_ptr<FarMemManager> manager =
      std::unique_ptr<FarMemManager>(FarMemManagerFactory::build(
          kCacheSize, kNumGCThreads,
          new TCPDevice(raddr, kNumConnections, kFarMemSize)));
  for (int i = 0; i < 5; i++)  // new line added here
    do_work(manager.get());
}
from aifm.
Hi probir, I just created two new CloudLab instances and reran the test. I still didn't see any segfault. Below is the log from my side:
zainruan@node-0:~/AIFM/aifm$ sudo ./bin/test_tcp_array_add configs/client.config 18.18.1.3:8000
CPU 01| <5> cpu: detected 20 cores, 1 nodes
CPU 01| <5> time: detected 2394 ticks / us
[ 0.000441] CPU 01| <5> loading configuration from 'configs/client.config'
[ 0.000465] CPU 01| <3> < 1 guaranteed kthreads is not recommended for networked apps
[ 0.014083] CPU 01| <5> net: started network stack
[ 0.014097] CPU 01| <5> net: using the following configuration:
[ 0.014102] CPU 01| <5> addr: 18.18.1.2
[ 0.014107] CPU 01| <5> netmask: 255.255.255.0
[ 0.014111] CPU 01| <5> gateway: 18.8.1.1
[ 0.014114] CPU 01| <5> mac: 56:9B:B2:A1:C2:77
[ 0.379299] CPU 01| <5> thread: created thread 0
[ 0.379365] CPU 01| <5> spawning 18 kthreads
[ 0.379471] CPU 04| <5> thread: created thread 1
[ 0.379509] CPU 15| <5> thread: created thread 2
[ 0.379544] CPU 17| <5> thread: created thread 3
[ 0.379593] CPU 05| <5> thread: created thread 4
[ 0.379767] CPU 03| <5> thread: created thread 5
[ 0.379919] CPU 13| <5> thread: created thread 6
[ 0.380049] CPU 09| <5> thread: created thread 7
[ 0.380112] CPU 08| <5> thread: created thread 8
[ 0.380197] CPU 15| <5> thread: created thread 9
[ 0.380266] CPU 01| <5> thread: created thread 10
[ 0.380423] CPU 13| <5> thread: created thread 11
[ 0.380514] CPU 13| <5> thread: created thread 12
[ 0.380697] CPU 19| <5> thread: created thread 13
[ 0.380833] CPU 01| <5> thread: created thread 14
[ 0.380883] CPU 19| <5> thread: created thread 15
[ 0.381029] CPU 13| <5> thread: created thread 16
[ 0.381079] CPU 13| <5> thread: created thread 17
Passed
Passed
Passed
Passed
Passed
[275.964328] CPU 15| <5> init: shutting down -> SUCCESS
from aifm.
Could you kindly confirm that commit fe9b93b is included in your git repo, and rerun your test after typing make clean; make -j in the AIFM/aifm folder?
from aifm.
Resolved
from aifm.