google / tcmalloc
License: Apache License 2.0
I would love to use tcmalloc on an ARM64 system, but it does not seem to be officially supported, and when I try to run it I get the following:
external/com_google_tcmalloc/tcmalloc/system-alloc.cc:525] MmapAligned() failed (size, alignment) 33554432 33554432 @ 0x417480 0x416eb8 0x405528 0x415614 0x415410 0x427b4c 0x426bc8 0x7fa12d4144 0x 0x
external/com_google_tcmalloc/tcmalloc/arena.cc:31] FATAL ERROR: Out of memory trying to allocate internal tcmalloc data (bytes, object-size) 131072 48 @ 0x4055a8 0x415614 0x415410 0x427b4c 0x426bc8 0
After a bit of debugging, it looks like every time it calls mmap with a hint, it gets back the same address (which doesn't match the hint), for example (with some extra logging):
external/com_google_tcmalloc/tcmalloc/system-alloc.cc:507] mmap (result, hint, size) 0x7fb5bad000 0x1df184000000 33554432 @ 0x46e2cc 0x46f600 0x46d468 0x46cc48 0x4231d4 0x41d534 0x469364 0x469138 0x
external/com_google_tcmalloc/tcmalloc/system-alloc.cc:507] mmap (result, hint, size) 0x7fb5bad000 0x76520000000 33554432 @ 0x46e2cc 0x46f600 0x46d468 0x46cc48 0x4231d4 0x41d534 0x469364 0x469138 0x 0
external/com_google_tcmalloc/tcmalloc/system-alloc.cc:507] mmap (result, hint, size) 0x7fb5bad000 0x7c0a8000000 33554432 @ 0x46e2cc 0x46f600 0x46d468 0x46cc48 0x4231d4 0x41d534 0x469364 0x469138 0x 0
The system I'm testing this on has a 4.9 kernel, so I can't try with MAP_FIXED_NOREPLACE. I also don't understand why mmap always returns the same address; I'm not sure if this is an ARM64-specific thing or something about my particular platform (clang-9, Ubuntu 18.04, Kernel 4.9.140-tegra, running on a Jetson Nano).
In any case, are there plans to support ARM64, and if not, any thoughts on what may be going on here?
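One possible explanation, offered as an assumption rather than a diagnosis: the hints in the log (e.g. 0x1df184000000) lie above 2^39 = 0x8000000000, the top of the user address space on arm64 kernels configured with 39-bit virtual addresses (a common setup with 4 KiB pages), while the returned 0x7fb5bad000 lies just below it. Without MAP_FIXED, Linux treats the first argument of mmap() only as a hint and falls back to its normal placement when the hint cannot be used. The sketch below checks that arithmetic and probes whether the running kernel honors a given hint; FitsInVaBits and KernelHonorsHint are hypothetical helper names, not tcmalloc functions.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if [addr, addr + len) lies entirely below 2^va_bits, i.e. inside a
// user address space of that width. Assumes len > 0 and va_bits < 64.
bool FitsInVaBits(std::uintptr_t addr, std::size_t len, int va_bits) {
  const std::uintptr_t limit = std::uintptr_t{1} << va_bits;
  return addr < limit && len <= limit - addr;
}

// Maps `len` bytes using `hint` as a non-binding address hint (no MAP_FIXED)
// and reports whether the kernel placed the mapping exactly at the hint.
// Returns false if mmap fails. The mapping is released before returning.
bool KernelHonorsHint(void* hint, std::size_t len) {
  void* p = mmap(hint, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  const bool honored = (p == hint);
  munmap(p, len);
  return honored;
}
```

On the Jetson, calling KernelHonorsHint with one of the logged hints and the 33554432-byte size would show directly whether the hint is being ignored.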
I am attempting to create a static redistributable libtcmalloc.a with the Bazel build so that the library may be brought into a different build system via the -I and -L options to gcc, but I'm not having much luck.
I can coerce Bazel to produce static libraries for everything by running from the root of the repo directory (hacky, and probably slightly wrong):
find . -name BUILD -type f -exec sed -i -e '/alwayslink = 1/d' -e '/linkstatic = 1/d' {} +
find . -name BUILD -type f -exec sed -i -r 's/cc_library\(/&\n linkstatic = True,/' {} +
But when I run on the libtcmalloc.a that is produced:
ar x libtcmalloc.a
I only see tcmalloc.pic.o. If I compare this to the libtcmalloc_minimal.a produced by gperftools, I see a number of additional object files, which is more along the lines of what I was expecting.
I've also written a rough CMake build with the help of a bazel-to-cmake conversion tool, and I saw the same problem with the libtcmalloc.a it produced.
What would be the recommended way to handle this while also accounting for tcmalloc's new dependency on Abseil? Or is there maybe a Bazel option I'm missing? I'm interested in statically linking everything, and ideally libtcmalloc.a would include all the objects needed for it to be consumed by another program without getting, for example, undefined reference to 'tcmalloc::Static::transfer_cache_' during the linking stage.
I'm trying to get my head around Bazel: is there a way of building a tcmalloc.so shared object file from this project?
I know the recommendation is just to compile applications with tcmalloc directly but for my use case: https://github.com/SamSaffron/allocator_bench I would like to do a side by side comparison to perftools and jemalloc that are LD_PRELOADed.
I'm also not against experimenting with a statically compiled Ruby including tcmalloc; if we can prove it is faster/better, maybe the Ruby folks would be open to adding it.
I think disabling sampling should give a performance gain, but when I disable it, I find that the "pagehuge" free bytes grow!
Does anyone know the reason?
Thanks for addressing #18. Would it be possible to go a bit further and document all the settings in this directory?
/sys/kernel/mm/transparent_hugepage/khugepaged/
See: https://github.com/google/tcmalloc/blob/master/tcmalloc/internal/linked_list.h#L48
From Intel Manual:
The cache hierarchy of the Skylake microarchitecture has the following enhancements:
• Higher Cache bandwidth compared to previous generations.
• Simultaneous handling of more loads and stores enabled by enlarged buffers.
• Processor can do two page walks in parallel compared to one in Haswell microarchitecture and earlier generations.
• Page split load penalty down from 100 cycles in previous generation to 5 cycles.
• L3 write bandwidth increased from 4 cycles per line in previous generation to 2 per line.
• Support for the CLFLUSHOPT instruction to flush cache lines and manage memory ordering of flushed data using SFENCE.
• Reduced performance penalty for a software prefetch that specifies a NULL pointer.
• L2 associativity changed from 8 ways to 4 ways.
I did a quick test on my machine and am not seeing any DTLB_LOAD_MISSES on prefetch NULL (or on any address less than 4096, for that matter).
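The linked line is not quoted here, so as an assumption: the pattern at issue is prefetching the next element of a singly linked free list while popping, even when that element may be null, which is exactly the case the Skylake note about NULL software prefetches covers. A minimal sketch of the pattern, with hypothetical names Node, Push and PopAndPrefetch:

```cpp
#include <cassert>

// Hypothetical intrusive singly linked free-list node.
struct Node {
  Node* next;
};

void Push(Node** head, Node* n) {
  n->next = *head;
  *head = n;
}

// Pops the head and prefetches the new head. When the list becomes empty the
// prefetch target is nullptr; __builtin_prefetch (GCC/Clang) is only a hint
// and does not fault on an invalid address, so no branch is needed.
Node* PopAndPrefetch(Node** head) {
  Node* result = *head;
  if (result == nullptr) return nullptr;
  *head = result->next;
  __builtin_prefetch(*head);  // may be nullptr
  return result;
}
```

Avoiding the branch is only worthwhile if a NULL prefetch is cheap, which is what the DTLB_LOAD_MISSES measurement above is checking.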
TCMalloc attempts to read files in /sys/ (via Abseil, to get the CPU frequency and whatnot) while holding a spinlock in Static::SlowInitIfNecessary. If run with a preloaded library that intercepts glibc I/O calls and then calls dlsym, this can cause a deadlock, because the dlsym call will cause TCMalloc to try to re-enter its initialization routine while holding the spinlock.
Admittedly this is grotty behavior on the part of the intercepting library (used to implement the Ekam build system). I'm going to push a PR there too to work around this. However, a simple fix here on the part of TCMalloc would be to call the Abseil functions that read from /sys/ before entering the critical section. Since Abseil caches the values from /sys/, this would remove the need for the workaround. Avoiding I/O while holding critical sections is probably a good idea anyway.
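A minimal sketch of the suggested ordering: anything that may perform I/O (here a hypothetical CachedCpuFrequency(), standing in for the Abseil calls that read /sys/ and cache the result) runs before the spinlock is taken, so the critical section never calls into code that an I/O interposer could re-enter. All names are hypothetical; this is not TCMalloc's actual Static::SlowInitIfNecessary.

```cpp
#include <atomic>
#include <cassert>

int g_sysfs_reads = 0;  // counts simulated /sys/ reads, for illustration only

// Hypothetical stand-in for a cached Abseil value. The function-local static
// is initialized once, so only the first call performs the "I/O".
double CachedCpuFrequency() {
  static const double freq = [] {
    ++g_sysfs_reads;  // real code would read a file under /sys/ here
    return 2.0e9;
  }();
  return freq;
}

std::atomic_flag g_init_lock = ATOMIC_FLAG_INIT;
std::atomic<bool> g_initialized{false};
double g_freq = 0;

void SlowInitIfNecessary() {
  if (g_initialized.load(std::memory_order_acquire)) return;
  // Do the potentially re-entrant I/O first, while no lock is held.
  const double freq = CachedCpuFrequency();
  while (g_init_lock.test_and_set(std::memory_order_acquire)) {
    // spin
  }
  if (!g_initialized.load(std::memory_order_relaxed)) {
    g_freq = freq;  // only cheap, non-I/O work inside the critical section
    g_initialized.store(true, std::memory_order_release);
  }
  g_init_lock.clear(std::memory_order_release);
}
```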
JEMalloc declares sdallocx and nallocx as __attribute__((__nothrow__)), whereas TCMalloc uses noexcept. They should probably agree.
Do we have anything similar to https://gperftools.github.io/gperftools/heapprofile.html, or are you still working on it?
Thanks!
Hi,
I've been trying to build tcmalloc with clang 8 to no avail. The compilation error is:
tcmalloc/internal/percpu_tcmalloc.h:255:7: error: 'asm goto' constructs are not supported yet
asm goto(
The problem seems to be the following check
#ifdef PERCPU_USE_RSEQ_ASM_GOTO
asm goto(
#else
Specifically, PERCPU_USE_RSEQ_ASM_GOTO is always defined, with a value of either 1 or 0, so the #ifdef check is always true. In my case it is defined as 0 here:
#else
#define PERCPU_USE_RSEQ_ASM_GOTO 0  // <<<<<<<<<<<<<<<<
#endif
#else
#define PERCPU_USE_RSEQ_ASM_GOTO 0
#endif
You can easily reproduce the issue with:
#include <iostream>
#if 0
#define PERCPU_USE_RSEQ_ASM_GOTO 1
#else
#define PERCPU_USE_RSEQ_ASM_GOTO 0
#endif
int
main()
{
#ifdef PERCPU_USE_RSEQ_ASM_GOTO
std::cout << "defined, value: " << PERCPU_USE_RSEQ_ASM_GOTO << std::endl;
#endif
}
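A self-contained sketch of the difference: with the macro defined as 0, #ifdef still selects the asm goto branch, whereas #if selects the fallback. Switching the check to #if PERCPU_USE_RSEQ_ASM_GOTO looks like the natural fix, though that is an assumption about how upstream would resolve it; the helper function names are hypothetical.

```cpp
#include <cassert>

// Mirrors the reported configuration: the macro is always defined, as 0 or 1.
#define PERCPU_USE_RSEQ_ASM_GOTO 0

// #ifdef only asks whether the macro is defined, so it is true even for 0.
constexpr bool IfdefTakesAsmGotoPath() {
#ifdef PERCPU_USE_RSEQ_ASM_GOTO
  return true;
#else
  return false;
#endif
}

// #if evaluates the macro's value, which is what the 0/1 convention intends.
constexpr bool IfTakesAsmGotoPath() {
#if PERCPU_USE_RSEQ_ASM_GOTO
  return true;
#else
  return false;
#endif
}

static_assert(IfdefTakesAsmGotoPath(), "#ifdef ignores the value 0");
static_assert(!IfTakesAsmGotoPath(), "#if respects the value 0");
```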
It would be really great if a description could be added to the 'About' section, since this would make it easier to get a brief idea of the project.
Thanks!
Recently I've been playing with tcmalloc (this new version) and found that once an allocation is too large to have a size class, it always acquires the pageheap_lock (and indeed, this is described in the docs).
However, this becomes a bottleneck with multiple threads. Here is a sample that shows this; it is simply:
And the results (you can also find these numbers in the comments):
conf | real | user | sys |
---|---|---|---|
jemalloc | 0m10.816s | 2m24.375s | 0m0.230s |
tcmalloc | 0m19.837s | 4m32.754s | 0m3.329s |
jemalloc capped to 256K | 0m2.567s | 0m32.748s | 0m0.020s |
tcmalloc capped to 256K | 0m2.335s | 0m28.804s | 0m0.010s |
The sys time is mostly due to futex calls.
Plus some locking info (there's no need for anything better than strace; it shows the problem):
$ time strace -qq -fefutex -c allocator-perf-jemalloc
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.448936 11223 40 futex
------ ----------- ----------- --------- --------- ----------------
100.00 0.448936 40 total
real 0m10.851s
user 2m27.460s
sys 0m0.767s
$ time strace -qq -fefutex -c allocator-perf-tcmalloc
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 29.896237 18 1619220 629766 futex
------ ----------- ----------- --------- --------- ----------------
100.00 29.896237 1619220 629766 total
real 0m27.448s
user 3m44.494s
sys 0m33.782s
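The reporter's sample is not included in this text. As a hypothetical reconstruction of the kind of workload described, not the original code, the sketch below has several threads repeatedly allocate and free blocks larger than 256 KiB (the cap used for the faster rows in the table), so that, per the description above, each request would go through the page heap. Thread count, iteration count and block size are illustrative assumptions; link it against each allocator under test and time it externally, e.g. with time or strace -c as above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>

// Each thread performs `iterations` malloc/free pairs of `block_size` bytes.
// Returns the total number of successful pairs across all threads.
std::size_t RunLargeAllocBenchmark(int num_threads, int iterations,
                                   std::size_t block_size) {
  std::vector<std::size_t> ok(num_threads, 0);  // one counter per thread
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&, t] {
      for (int i = 0; i < iterations; ++i) {
        void* p = std::malloc(block_size);
        if (p == nullptr) continue;
        // Touch the block so the allocation cannot be optimized away.
        std::memset(p, 0, block_size < 64 ? block_size : 64);
        std::free(p);
        ++ok[t];
      }
    });
  }
  for (std::thread& th : threads) th.join();
  std::size_t total = 0;
  for (std::size_t n : ok) total += n;
  return total;
}
```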
Any plans on improving this?
Or maybe add support for custom size classes, by providing some helpers to generate them? (I can even generate them right now, with some small modifications.)
tcmalloc version: 8738f27
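As an illustration of what such a helper might look like, here is a hypothetical generator that spaces size classes geometrically up to a cap and rounds each one up to an alignment. It is not tcmalloc's size-class table format or its generator; the growth factor, alignment, and the assumption that the cap is itself a multiple of the alignment are all illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generates strictly increasing size classes from `min_size` to `max_size`,
// each about `growth` times the previous one, rounded up to `alignment`.
// Assumes max_size is a multiple of alignment and growth > 1.
std::vector<std::size_t> GenerateSizeClasses(std::size_t min_size,
                                             std::size_t max_size,
                                             std::size_t alignment,
                                             double growth) {
  auto round_up = [alignment](std::size_t n) {
    return (n + alignment - 1) / alignment * alignment;
  };
  std::vector<std::size_t> classes;
  std::size_t size = round_up(min_size);
  while (size < max_size) {
    classes.push_back(size);
    std::size_t next = round_up(static_cast<std::size_t>(size * growth));
    if (next <= size) next = size + alignment;  // guarantee progress
    size = next;
  }
  classes.push_back(max_size);  // always include the cap itself
  return classes;
}
```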
I want to inspect what is causing my program to use more memory than I expected. It seems gperftools is deprecated in favor of Abseil's tcmalloc and the new Go-based pprof.
Is there a recommended workflow for inspecting which function etc. is using the memory, like HeapProfilerStart() or env HEAPPROFILE=/tmp/mybin.hprof in gperftools, some flag, /heapz, etc.? It seems that I can call tcmalloc::MallocExtension::SnapshotCurrent(), but it returns tcmalloc::Profile. Is there a way to convert it to profile.proto, which I assume is what the new pprof needs?
This PR is to cover the initial implementation for RSEQ support for the AArch64 port.
The tcmalloc page (http://goog-perftools.sourceforge.net/doc/tcmalloc.html) mentions that running with LD_PRELOAD is tricky and something you don't necessarily recommend, but doesn't go into detail about why. AFAICT possible issues include:
- LD_PRELOAD is ignored if the setuid bit is set.
- LD_PRELOAD would be inherited by any forked children, which may not be desirable.
Anything else I'm missing?
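On the second point, LD_PRELOAD travels in the environment, so any program the process execs picks it up. A minimal sketch of building a child environment without it; WithoutLdPreload is a hypothetical helper, and the result would still need converting to a null-terminated char* array for execve().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns a copy of `env` (entries of the form "NAME=value") with any
// LD_PRELOAD entry removed, for use as a child process's environment.
std::vector<std::string> WithoutLdPreload(const std::vector<std::string>& env) {
  std::vector<std::string> out;
  for (const std::string& entry : env) {
    if (entry.rfind("LD_PRELOAD=", 0) == 0) continue;  // prefix match
    out.push_back(entry);
  }
  return out;
}
```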
See title.
Bazel is not very widely used.
Could you please tell me whether tcmalloc supports QNX?
This is related to Chromium's TCMalloc, but maybe somebody here can help.
I modified the Chromium TCMalloc allocator to expose pagemap_ as shown below:
third_party/tcmalloc/chromium/src/page_heap.h
class PERFTOOLS_DLL_DECL PageHeap {
public:
PageHeap();
typedef MapSelector<kAddressBits>::Type PageMap;
PageMap pagemap_;
third_party/tcmalloc/chromium/src/static_vars.cc
PageHeap Static::pageheap2_;
third_party/tcmalloc/chromium/src/static_vars.h
static PageHeap pageheap2_;
I exposed PageHeap via pageheap2_, but somehow the 3-level radix tree does not contain any data/pointers, only "0".
My script (https://github.com/marcinguy/tcmalloc-inspector) shows the same:
USED:
BLOCK SUMMARY
0 blocks, 0 total size
size frequencies:
FREE:
BLOCK SUMMARY
0 blocks, 0 total size
size frequencies:
LOST:
BLOCK SUMMARY
0 blocks, 0 total size
size frequencies:
More GDB output here: https://github.com/marcinguy/tcmalloc-chromium
Any idea why?
Did I modify TCMalloc incorrectly?
With Google's TCMalloc in another sample program, it works correctly.
Thanks,
shenderson-d3jgh5:workspace shenderson$ git clone https://github.com/google/tcmalloc.git
Cloning into 'tcmalloc'...
warning: templates not found in /Users/shenderson/.git-template
remote: Enumerating objects: 247, done.
remote: Counting objects: 100% (247/247), done.
remote: Compressing objects: 100% (197/197), done.
remote: Total 247 (delta 67), reused 230 (delta 50), pack-reused 0
Receiving objects: 100% (247/247), 573.68 KiB | 3.32 MiB/s, done.
Resolving deltas: 100% (67/67), done.
shenderson-d3jgh5:workspace shenderson$ cd tcmalloc/
shenderson-d3jgh5:tcmalloc shenderson$ bazel build //...
Starting local Bazel server and connecting to it...
WARNING: Download from https://mirror.bazel.build/github.com/bazelbuild/rules_cc/archive/7e650b11fe6d49f70f2ca7a1c4cb8bcc4a1fe239.zip failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
INFO: Analyzed 114 targets (45 packages loaded, 1105 targets configured).
INFO: Found 114 targets...
ERROR: /Users/shenderson/workspace/tcmalloc/tcmalloc/internal/BUILD:123:1: C++ compilation of rule '//tcmalloc/internal:logging' failed (Exit 1) wrapped_clang failed: error executing command external/local_config_cc/wrapped_clang '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -O0 -DDEBUG '-std=c++11' -iquote . -iquote ... (remaining 31 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
warning: unknown warning option '-Wno-attribute-alias'; did you mean '-Wno-attributes'? [-Wunknown-warning-option]
tcmalloc/internal/logging.cc:22:10: fatal error: 'syscall.h' file not found
#include <syscall.h>
^~~~~~~~~~~
1 warning and 1 error generated.
INFO: Elapsed time: 16.659s, Critical Path: 1.07s
INFO: 5 processes: 5 darwin-sandbox.
FAILED: Build did NOT complete successfully
commit information:
git log
commit 84819522941112e40c7018870fbe9a83287097f3 (HEAD -> master, origin/master, origin/HEAD)
Author: Martin Maas <[email protected]>
Date: Thu Feb 13 15:38:17 2020
Refactor time series tracking.
We are adding additional time series telemetry. To avoid duplication, this change factors out shared functionality from MinMaxTracker to be reused for the other time series trackers. It should not change any current behavior.
PiperOrigin-RevId: 294989313
Change-Id: I53e1329ef639aec9dde69d74b71e4279c76c58d8
Is there an equivalent in tcmalloc to gperftool's interfaces found in malloc_hook.h, such as AddNewHook and AddDeleteHook?
We use bazel as our build system.
How do I disable tcmalloc for asan/tsan runs when using the malloc = "@com_google_tcmalloc//tcmalloc", syntax?
To provide some context, we are building a memory management framework written in Rust. We are experimenting with different malloc/free implementations.
We tried the following two approaches of linking with tcmalloc.
First, we tried to statically link with tcmalloc. We used the libtcmalloc.lo produced by bazel build tcmalloc.
We got the following error.
Error: failed <executable> because /path/to/OurLibrary.so: undefined symbol: _ZN8tcmalloc17tcmalloc_internal10Parameters23per_cpu_caches_enabled_E
Second, we tried to dynamically link with tcmalloc. Specifically, we deleted linkstatic=1 (https://github.com/google/tcmalloc/blob/f4a573f/tcmalloc/BUILD#L91) and used the libtcmalloc.so produced by Bazel.
We got the following error when running our executable linked with tcmalloc.
Error: failed <executable> because /path/to/libtcmalloc.so: undefined symbol: _ZN4absl19str_format_internal13FormatArgImpl8DispatchINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEbNS1_4DataENS0_24FormatConversionSpecImplEPv
We are just wondering what the best practice of using tcmalloc standalone is (ODR is not a big issue for us as the project is in Rust #16 #48).
cc @caizixian
When realloc or a similar function is called on the same memory from two threads without synchronization, I assume that the program will crash.
But with TCMalloc it doesn't.
Minimal reproducible example:
Create a std::vector<double> and push elements to it from two different threads without any locks; the program then suddenly gets stuck in an infinite memory-allocating loop until it is killed by the OOM killer.
The infinite loop is here:
https://github.com/google/tcmalloc/blob/master/tcmalloc/peak_heap_tracker.cc#L63
because the node at Static::sampled_objects_.head_.next_ points to itself through both next_ and prev_.
It is caused by the fact that both threads receive the same Span object from HugePageAwareAllocator here:
https://github.com/google/tcmalloc/blob/master/tcmalloc/tcmalloc.cc#L1481
I am a little confused why, because HugePageAwareAllocator takes pageheap_lock before any allocations.
We are using TCMalloc from commit a643d89610317be1eff9f7298104eef4c987d8d5.
This bug is fairly critical in production, where such races can pop up, so I hope it can be fixed soon (or maybe it is fixed in newer versions?).
bt:
tcmalloc::PeakHeapTracker::MaybeSaveSample (this=0x4ff3c0 <tcmalloc::Static::peak_heap_tracker_>) at .../libs/tcmalloc/tcmalloc/peak_heap_tracker.cc:70
(anonymous namespace)::SampleifyAllocation (requested_size=0, requested_size@entry=2097152, weight=<optimized out>, requested_alignment=<optimized out>, requested_alignment@entry=1, cl=cl@entry=0, obj=obj@entry=0x0, span=span@entry=0x47447ec005e0, capacity=0x0) at .../libs/tcmalloc/tcmalloc/tcmalloc.cc:1469
(anonymous namespace)::do_malloc_pages (size=2097152, alignment=1) at .../libs/tcmalloc/tcmalloc/tcmalloc.cc:1529
slow_alloc<tcmalloc::TCMallocPolicy<tcmalloc::CppOomPolicy, tcmalloc::DefaultAlignPolicy, tcmalloc::InvokeHooksPolicy>, decltype(nullptr)>(tcmalloc::TCMallocPolicy<tcmalloc::CppOomPolicy, tcmalloc::DefaultAlignPolicy, tcmalloc::InvokeHooksPolicy>, unsigned long, decltype(nullptr)) (policy=..., size=2097152, capacity=<optimized out>) at .../libs/tcmalloc/tcmalloc/tcmalloc.cc:1826
std::__y1::__libcpp_allocate (__size=78420757185056, __align=8) at .../libs/cxxsupp/libcxx/include/new:261
std::__y1::allocator<double>::allocate (this=<optimized out>, __n=262144) at .../libs/cxxsupp/libcxx/include/memory:1869
std::__y1::allocator_traits<std::__y1::allocator<double> >::allocate (__a=..., __n=262144) at .../libs/cxxsupp/libcxx/include/memory:1585
std::__y1::__split_buffer<double, std::__y1::allocator<double>&>::__split_buffer (this=<optimized out>, __cap=262144, __start=131072, __a=...) at .../libs/cxxsupp/libcxx/include/__split_buffer:326
std::__y1::vector<double, std::__y1::allocator<double> >::__push_back_slow_path<double> (this=0x7ffd69d3c2e0, __x=<optimized out>) at .../libs/cxxsupp/libcxx/include/vector:1660
std::__y1::vector<double, std::__y1::allocator<double> >::push_back (this=0x7ffd69d3c2e0, __x=<optimized out>) at .../libs/cxxsupp/libcxx/include/vector:1692
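For contrast, a sketch of the same workload without the data race. Concurrent, unsynchronized push_back on one std::vector is undefined behavior in C++ (for example, both threads can read the same old buffer pointer and both free it after growing), so a hang, a crash or silent corruption are all possible outcomes. PushFromTwoThreads is a hypothetical name; the mutex serializes every push, including the reallocations.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Two threads push_back into one std::vector<double>, guarded by a mutex.
// Returns the final number of elements.
std::size_t PushFromTwoThreads(int per_thread) {
  std::vector<double> values;
  std::mutex mu;
  auto worker = [&](double v) {
    for (int i = 0; i < per_thread; ++i) {
      std::lock_guard<std::mutex> lock(mu);
      values.push_back(v);  // may reallocate; safe because the lock is held
    }
  };
  std::thread a(worker, 1.0);
  std::thread b(worker, 2.0);
  a.join();
  b.join();
  return values.size();
}
```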
We are wondering if Google might have some special internal implementation of kernel transparent huge pages (THP) that is somehow different from the upstream version. Could you please confirm or deny this?
I found this repository by following this link from gperftools, and now I don't really know what to do with my pull request that was sent to gperftools, which I thought was the public implementation of TCMalloc.
tl;dr - Safe-Linking is a security feature that I added on top of the Single-Linked-Lists (SLL). It is similar to the existing maskPtr()
functionality that was added to Chrome's TCMalloc implementation by Chrome's security team in 2012. Safe-Linking (as a concept) is now in the process of also being merged into GLIBC's ptmalloc implementation, and also uClibc's dlmalloc implementation.
I was hoping that this security enhancement feature could be integrated into TCMalloc to help prevent attacks, as detailed fully in my white paper, which can be found on the original pull request. The benchmarking results for all of the different heap implementations are very encouraging: according to gperftools' benchmarks, the overhead is 0.02% for the average test and 1.5% for the worst test case.
This feature was already implemented for gperftools and sent as a pull request, including the CLA and everything. Could you check whether you could integrate it into your implementation as well? I didn't send a new pull request, as the changes to my original pull request seem minor and I am having trouble setting up your dev environment on my computer...
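The core of Safe-Linking, as merged into glibc (where the macros are spelled PROTECT_PTR and REVEAL_PTR), is to store each singly linked list pointer XORed with the address of the slot that holds it, shifted right by 12 bits. A minimal sketch of that masking on a hypothetical freelist; it is not the code from the gperftools pull request, and FreeObject, Push and Pop are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Masks `ptr` with the ASLR-randomized bits of the slot address that stores
// it. The low 12 bits (page offset) carry no randomness, hence the shift.
inline void* ProtectPtr(void* const* slot, void* ptr) {
  return reinterpret_cast<void*>(
      (reinterpret_cast<std::uintptr_t>(slot) >> 12) ^
      reinterpret_cast<std::uintptr_t>(ptr));
}

// XOR is its own inverse, so revealing applies the same mask again.
inline void* RevealPtr(void* const* slot, void* stored) {
  return ProtectPtr(slot, stored);
}

// Hypothetical freelist node whose link is stored only in protected form.
struct FreeObject {
  void* next_protected;
};

void Push(FreeObject** head, FreeObject* obj) {
  obj->next_protected = ProtectPtr(&obj->next_protected, *head);
  *head = obj;
}

FreeObject* Pop(FreeObject** head) {
  FreeObject* obj = *head;
  if (obj == nullptr) return nullptr;
  *head = static_cast<FreeObject*>(
      RevealPtr(&obj->next_protected, obj->next_protected));
  return obj;
}
```

An attacker who overwrites next_protected without knowing the slot address ends up with a garbage pointer after RevealPtr, which is what makes simple freelist poisoning harder.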
I want a timer that checks every minute and executes MallocExtension::instance()->ReleaseFreeMemory(); when the free memory held by tcmalloc reaches 50%.
The API could be: void SetTimerRelease(1 minute, 50% usage)
or
int GetMemoryUsage() // return value like the 'top' command's %MEM on Linux
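A sketch of the requested policy under stated assumptions: "free memory usage" is interpreted as bytes tcmalloc holds free divided by bytes it has mapped, and the statistics and release calls are injected as callbacks rather than tied to a specific allocator API. ShouldRelease and CheckAndMaybeRelease are hypothetical names; a background thread would call CheckAndMaybeRelease once per minute.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// True when free bytes reach `threshold` (e.g. 0.5) of all mapped bytes.
bool ShouldRelease(std::size_t free_bytes, std::size_t mapped_bytes,
                   double threshold) {
  if (mapped_bytes == 0) return false;
  return static_cast<double>(free_bytes) >=
         threshold * static_cast<double>(mapped_bytes);
}

// One tick of the periodic check. With gperftools, `release` might wrap
// MallocExtension::instance()->ReleaseFreeMemory(). Returns true if it ran.
bool CheckAndMaybeRelease(const std::function<std::size_t()>& free_bytes,
                          const std::function<std::size_t()>& mapped_bytes,
                          const std::function<void()>& release,
                          double threshold) {
  if (!ShouldRelease(free_bytes(), mapped_bytes(), threshold)) return false;
  release();
  return true;
}
```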
I'm really excited to see this new version of tcmalloc becoming available. In particular, the per-cpu support has long been an idea of interest. However, it is currently quite unclear how this new project compares with the existing gperftools/gperftools project. I think it would be helpful if this project contained some documentation that provided a direct comparison with the other (soon to be legacy?) project. Some roadmap and future directions content would be welcome as well. In no particular order:
The old gperftools supported a wider array of platforms. On the OS side, Windows and macOS, at least to some degree. This project looks to currently be Linux only. Is support for those other operating systems planned? Explicitly out of scope? Similar questions regarding CPU. I note that ppc (presumably ppc64le?) is supported. But s390x (not surprising) and arm64 (quite surprising?) are absent. Are they on the horizon? Is work from the community to support those other platforms welcome?
What exactly has changed regarding support for CPU and heap profiling? It looks like they are more or less gone? Which is fine, at least for my use, I'd just like to know for sure either way.
Similar question regarding debugallocation. It seems that some of the classic debug allocator features that were part of gperftools may no longer be included. But at least use-after-free detection seems like it is still present, per some references to 0xcd? It probably makes sense to de-emphasize these sorts of features in a world with ASAN, but some more information here would be welcome. And are there new interesting debugging features added?
What previously offered tunings or configurations have been removed or added?
What is the degree of stability of the code at this point? Should projects that have longstanding integrations with gperftools be looking to switch now? If not, what are the gating changes?
Is there a release/tag/branch strategy? ABI stability goals? What should happen with packaging, especially for systems where the OS provides a "tcmalloc" package that derives from the old gperftools project?
What is the plan regarding synchronization between this project and the internal Google tcmalloc implementation? How open is the project to community contributions? Will those contributions be synced back to google, or will this eventually become another fork, as somewhat happened to gperftools?
I know that is a lot of questions, but I'm hopeful that putting some of the answers down in writing will help everyone who currently uses gperftools in their projects to understand how this new project should be approached.
I'd also like to thank you in advance for all the work that I am certain went into getting this new version of tcmalloc out into the world. Please don't take my long list of questions and concerns as anything other than deriving from a keen interest in the success of this new project.
For tcmalloc_huge_pages, some Linux settings are clearly required to enable transparent huge pages (THP). Can you please document the expected settings for:
OS: Ubuntu 18.04 x86_64
GCC 7.5.0
I followed the installation guide. When executing bazel test //tcmalloc/..., an error occurred:
In file included from ./tcmalloc/huge_page_aware_allocator.h:25:0,
from tcmalloc/huge_page_aware_allocator.cc:15:
./tcmalloc/huge_page_filler.h: In instantiation of 'void tcmalloc::SkippedSubreleaseCorrectnessTracker<kEpochs>::ReportUpdatedPeak(tcmalloc::Length) [with long unsigned int kEpochs = 600; tcmalloc::Length = long unsigned int]':
./tcmalloc/huge_page_filler.h:242:9: required from 'void tcmalloc::FillerStatsTracker<kEpochs>::Report(tcmalloc::FillerStatsTracker<kEpochs>::FillerStats) [with long unsigned int kEpochs = 600]'
./tcmalloc/huge_page_filler.h:1791:24: required from 'void tcmalloc::HugePageFiller<TrackerType>::UpdateFillerStatsTracker() [with TrackerType = tcmalloc::PageTracker<tcmalloc::SystemRelease>]'
./tcmalloc/huge_page_filler.h:1289:27: required from 'void tcmalloc::HugePageFiller<TrackerType>::Contribute(TrackerType*, bool) [with TrackerType = tcmalloc::PageTracker<tcmalloc::SystemRelease>]'
tcmalloc/huge_page_aware_allocator.cc:135:33: required from here
./tcmalloc/huge_page_filler.h:89:5: error: no matching function for call to 'tcmalloc::TimeSeriesTracker<tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseEntry, tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseUpdate, 600>::Report(<brace-enclosed initializer list>)'
if (tracker_.Report({.confirmed_peak = current_peak})) {
^~
Can anybody help?
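Designated initializers such as {.confirmed_peak = current_peak} were standardized only in C++20, and older GCC releases accept them in limited forms, so whether this line compiles depends on the compiler version and -std mode. A minimal sketch of a form that compiles in any standard mode, using a hypothetical, simplified stand-in for SkippedSubreleaseUpdate and Report() (the real tcmalloc types may have more members); this is a possible local workaround, not the upstream fix.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for the struct named in the error.
struct SkippedSubreleaseUpdate {
  std::size_t confirmed_peak = 0;
};

// Hypothetical stand-in for TimeSeriesTracker<...>::Report(S val).
std::size_t Report(SkippedSubreleaseUpdate update) {
  return update.confirmed_peak;
}

std::size_t ReportUpdatedPeak(std::size_t current_peak) {
  // Original form: Report({.confirmed_peak = current_peak});
  // Default-constructing and then setting the member by name avoids
  // designated initializers entirely.
  SkippedSubreleaseUpdate update;
  update.confirmed_peak = current_peak;
  return Report(update);
}
```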
TCmalloc Version:
commit df10c10548065948d91f2bbfe7caf73cd8bfae85
Author: Chris Kennelly <[email protected]>
Date: Thu Jun 18 09:43:17 2020 -0700
GCC version: gcc (Ubuntu 9.3.0-10ubuntu2) 9.3
bazel version: bazel 3.2.0
compile error:
ERROR: /home/dev/tcmalloc/tcmalloc/testing/BUILD:595:8: C++ compilation of rule '//tcmalloc/testing:limit_test' failed (Exit 1) gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -MD -MF ... (remaining 52 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
In file included from external/com_google_googletest/googletest/include/gtest/internal/gtest-death-test-internal.h:39,
from external/com_google_googletest/googletest/include/gtest/gtest-death-test.h:41,
from external/com_google_googletest/googletest/include/gtest/gtest.h:64,
from external/com_google_googletest/googlemock/include/gmock/internal/gmock-internal-utils.h:47,
from external/com_google_googletest/googlemock/include/gmock/gmock-actions.h:51,
from external/com_google_googletest/googlemock/include/gmock/gmock.h:59,
from tcmalloc/testing/limit_test.cc:25:
external/com_google_googletest/googletest/include/gtest/gtest-matchers.h: In instantiation of 'bool testing::internal::MatchesRegexMatcher::MatchAndExplain(const MatcheeStringType&, testing::MatchResultListener*) const [with MatcheeStringType = std::basic_string_view<char>]':
external/com_google_googletest/googletest/include/gtest/gtest-matchers.h:484:47: required from 'bool testing::PolymorphicMatcher<Impl>::MonomorphicImpl<T>::MatchAndExplain(T, testing::MatchResultListener*) const [with T = const std::basic_string_view<char>&; Impl = testing::internal::MatchesRegexMatcher]'
external/com_google_googletest/googletest/include/gtest/gtest-matchers.h:483:10: required from here
external/com_google_googletest/googletest/include/gtest/gtest-matchers.h:647:24: error: invalid initialization of reference of type 'const string&' {aka 'const std::__cxx11::basic_string<char>&'} from expression of type 'const std::basic_string_view<char>'
647 | const std::string& s2(s);
I'm trying to build on Debian 10, gcc 8.3.0
/usr/src/tcmalloc# bazel test //tcmalloc/...
INFO: Analyzed 134 targets (0 packages loaded, 0 targets configured).
INFO: Found 52 targets and 82 test targets...
ERROR: /usr/src/tcmalloc/tcmalloc/BUILD:300:11: C++ compilation of rule '//tcmalloc:common_deprecated_perthread' failed (Exit 1) gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -MD -MF ... (remaining 28 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
In file included from ./tcmalloc/huge_page_aware_allocator.h:25,
from tcmalloc/huge_page_aware_allocator.cc:15:
./tcmalloc/huge_page_filler.h: In instantiation of 'void tcmalloc::SkippedSubreleaseCorrectnessTracker<kEpochs>::ReportUpdatedPeak(tcmalloc::Length) [with long unsigned int kEpochs = 600; tcmalloc::Length = long unsigned int]':
./tcmalloc/huge_page_filler.h:270:9: required from 'void tcmalloc::FillerStatsTracker<kEpochs>::Report(tcmalloc::FillerStatsTracker<kEpochs>::FillerStats) [with long unsigned int kEpochs = 600]'
./tcmalloc/huge_page_filler.h:1884:24: required from 'void tcmalloc::HugePageFiller<TrackerType>::UpdateFillerStatsTracker() [with TrackerType = tcmalloc::PageTracker<tcmalloc::SystemRelease>]'
./tcmalloc/huge_page_filler.h:1345:3: required from 'void tcmalloc::HugePageFiller<TrackerType>::Contribute(TrackerType*, bool) [with TrackerType = tcmalloc::PageTracker<tcmalloc::SystemRelease>]'
tcmalloc/huge_page_aware_allocator.cc:135:33: required from here
./tcmalloc/huge_page_filler.h:89:5: error: no matching function for call to 'tcmalloc::TimeSeriesTracker<tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseEntry, tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseUpdate, 600>::Report(<brace-enclosed initializer list>)'
if (tracker_.Report({.confirmed_peak = current_peak})) {
^~
In file included from ./tcmalloc/huge_cache.h:32,
from ./tcmalloc/huge_page_aware_allocator.h:24,
from tcmalloc/huge_page_aware_allocator.cc:15:
./tcmalloc/internal/timeseries_tracker.h:153:6: note: candidate: 'bool tcmalloc::TimeSeriesTracker<T, S, kEpochs>::Report(S) [with T = tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseEntry; S = tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseUpdate; long unsigned int kEpochs = 600]'
bool TimeSeriesTracker<T, S, kEpochs>::Report(S val) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./tcmalloc/internal/timeseries_tracker.h:153:6: note: no known conversion for argument 1 from '<brace-enclosed initializer list>' to 'tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseUpdate'
In file included from ./tcmalloc/malloc_extension.h:39,
from ./tcmalloc/experiment.h:27,
from ./tcmalloc/huge_cache.h:27,
from ./tcmalloc/huge_page_aware_allocator.h:24,
from tcmalloc/huge_page_aware_allocator.cc:15:
external/com_google_absl/absl/functional/function_ref.h:101:3: error: 'absl::FunctionRef<R(Args ...)>::FunctionRef(const F&) [with F = tcmalloc::SkippedSubreleaseCorrectnessTracker<kEpochs>::ReportUpdatedPeak(tcmalloc::Length) [with long unsigned int kEpochs = 600; tcmalloc::Length = long unsigned int]::<lambda(size_t, int64_t, const tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseEntry&)>; <template-parameter-2-2> = void; R = void; Args = {long unsigned int, long int, const tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseEntry&}]', declared using local type 'const tcmalloc::SkippedSubreleaseCorrectnessTracker<kEpochs>::ReportUpdatedPeak(tcmalloc::Length) [with long unsigned int kEpochs = 600; tcmalloc::Length = long unsigned int]::<lambda(size_t, int64_t, const tcmalloc::SkippedSubreleaseCorrectnessTracker<600>::SkippedSubreleaseEntry&)>', is used but never defined [-fpermissive]
FunctionRef(const F& f) // NOLINT(runtime/explicit)
^~~~~~~~~~~
INFO: Elapsed time: 3.254s, Critical Path: 3.00s
INFO: 26 processes: 26 linux-sandbox.
FAILED: Build did NOT complete successfully
The .gitignore file should ignore bazel-* patterns.
The commit I use is 3dda5d0
I use gcc 7.3.1 to build and run the test:
CXX=/usr/bin/g++ CC=/usr/bin/gcc bazel --output_user_root=/data/bazel-cache test //tcmalloc/...
And there's one failure. Here's the log:
cat /data/bazel-cache/53be17487e92ab49f9a9a0a4d546d9a6/execroot/com_google_tcmalloc/bazel-out/k8-fastbuild/testlogs/tcmalloc/testing/releasing_test/test.log
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //tcmalloc/testing:releasing_test
-----------------------------------------------------------------------------
tcmalloc/testing/releasing_test.cc:142] Unmapped Memory [Before] 155189248
tcmalloc/testing/releasing_test.cc:144] Unmapped Memory [After ] 2535456768
tcmalloc/testing/releasing_test.cc:146] Unmapped Memory [Diff ] 2380267520
tcmalloc/testing/releasing_test.cc:148] Memory Usage [Before] 2358042624
tcmalloc/testing/releasing_test.cc:150] Memory Usage [After ] 51367936
tcmalloc/testing/releasing_test.cc:152] Memory Usage [Diff ] 2306674688
(after_unmapped - before_unmapped) != (before - after):18446744071794851840] 2306674688 @ 0x40e8da 0x7f94f4340c05
It appears that tcmalloc's build system is currently mostly written with static linking in mind. It would be great if there were a rule/target that produces a shared object of tcmalloc (potentially including dependencies like Abseil).
After changing linkstatic = 1 to 0 in the BUILD file and running bazel build //tcmalloc, the build seems to be OK.
ldd /usr/local/lib/libtcmalloc.so
linux-vdso.so.1 (0x00007fff104e5000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/debug/libstdc++.so.6 (0x00007f7ab7b10000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7ab7772000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7ab755a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7ab7169000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7ab7ec9000)
But attempts to use it, such as opening it with dlopen() from C++ code, fail with an undefined symbol:
./loader
C++ dlopen demo
Opening tcmalloc.so...
Cannot open library: /usr/local/lib/libtcmalloc.so: undefined symbol: _ZN4absl19str_format_internal13FormatArgImpl8DispatchINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEbNS1_4DataENS0_24FormatConversionSpecImplEPv
Is this caused by an old libstdc++ or an old GCC version (7.5.0)? Is there a list of requirements, or a document describing how to build it?
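Demangling the undefined symbol shows what is missing. A minimal sketch using the Itanium C++ ABI demangler (`abi::__cxa_demangle` from `<cxxabi.h>`, shipped with libstdc++ and libc++); `c++filt` does the same from the shell. The symbol turns out to be `bool absl::str_format_internal::FormatArgImpl::Dispatch<std::__cxx11::basic_string<...>>(...)`, an Abseil str_format function. Since the ldd output above lists no Abseil library, this suggests the Abseil code it comes from is neither linked into libtcmalloc.so nor recorded as one of its dependencies. `Demangle` is a hypothetical helper name.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangles an Itanium-ABI C++ symbol; returns the input unchanged on failure.
std::string Demangle(const char* mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with malloc.
  char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || raw == nullptr) {
    std::free(raw);
    return mangled;
  }
  std::string out(raw);
  std::free(raw);
  return out;
}
```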
I tried to compile sentencepiece locally and got the following error. Any help is appreciated:
[ 16%] Built target sentencepiece_train-static
[ 83%] Built target sentencepiece-static
make[2]: *** No rule to make target '/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so', needed by 'src/spm_normalize'. Stop.
make[1]: *** [CMakeFiles/Makefile2:104: src/CMakeFiles/spm_normalize.dir/all] Error 2
Of course, libtcmalloc_minimal.so is not in /usr/lib/x86_64-linux-gnu/. Any ideas on how to install it? Thanks.
I mean 1 GiB pages. It looks like tcmalloc can use transparent huge pages out of the box, but I didn't find any evidence that it can use 1 GiB huge pages. If that is not supported, is there a way to just hand it a starting pointer and the size of the memory I obtain from mmap?
I use gperftools/tcmalloc and I'm interested in trying this variant.
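On Linux, a region backed by 1 GiB pages can be requested directly with mmap. A minimal sketch, assuming a kernel with 1 GiB huge pages reserved (for example via the hugepagesz=1G hugepages=N boot parameters). `MapGigantic` is a hypothetical helper, and this only shows how to obtain the region; it does not show any tcmalloc API for handing such a region to the allocator.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Fallback definitions matching <linux/mman.h>, in case libc headers lack them.
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)  // log2(1 GiB) == 30
#endif

constexpr size_t kOneGiB = size_t{1} << 30;

// Maps `pages` anonymous 1 GiB huge pages. Returns nullptr if none are
// available (see /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages).
void* MapGigantic(size_t pages) {
  void* p = mmap(nullptr, pages * kOneGiB, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                 -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}
```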
It is unclear to me how, once I build tcmalloc using bazel, I can 'install' the build artifacts in a way I can consume them from my existing cmake project. The documentation assumes a bazel-build consumer.
struct stack_frame {
    struct stack_frame *prev;
    void *return_addr;
} __attribute__((packed));
typedef struct stack_frame stack_frame;
void backtrace_from_fp(void **buf, int size)
{
    /*
    int i;
    stack_frame *fp;
    __asm__ __volatile__("movq %%rbp, %[fp]" : [fp] "=r" (fp));
    for(i = 0; i < size && fp != NULL; fp = fp->prev, i++)
        buf[i] = fp->return_addr;
    */
    buf[0] = __builtin_return_address (0);
    buf[2] = __builtin_return_address (1);
}
My code, which uses malloc:
#include <malloc.h>
void f3()
{
    malloc(10);
}
void f2()
{
}
void f1()
{
}
void f()
{
}
int main()
{
    f();
}
g++ -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -lpthread pop.c .libs/libtcmalloc_and_profiler.a
That is how I built it against the static library. I get the crash below:
Program received signal SIGSEGV, Segmentation fault.
0x000000000041f417 in backtrace_from_fp (size=10, buf=<optimized out>) at src/tcmalloc.cc:1914
1914 buf[2] = __builtin_return_address (1);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64
(gdb) bt
#0 0x000000000041f417 in backtrace_from_fp (size=10, buf=<optimized out>) at src/tcmalloc.cc:1914
#1 tc_malloc (size=size@entry=1) at src/tcmalloc.cc:1924
#2 0x00000000004051d6 in TCMallocGuard::TCMallocGuard (this=<optimized out>) at src/tcmalloc.cc:1121
#3 0x0000000000403498 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at src/tcmalloc.cc:1153
#4 _GLOBAL__sub_I__ZN61FLAG__namespace_do_not_use_directly_use_DECLARE_int64_instead43FLAGS_tcmalloc_large_alloc_report_thresholdE ()
at src/tcmalloc.cc:2319
#5 0x000000000041eb5d in __libc_csu_init ()
#6 0x00007ffff6ffeb95 in __libc_start_main () from /lib64/libc.so.6
#7 0x0000000000403bf8 in _start ()
(gdb)
Any help with this?
rgds
Balaji Kamal Kannadassan
#if defined(OS_WINDOWS)
// We don't do any overriding on windows. Just provide a dummy function.
static void ReplaceSystemAlloc() { }
#elif defined(GLIBC)
#include "tcmalloc/libc_override_glibc.h"
#else
#error Need to add support for your libc/OS here
#endif
The code checks only two macros. Who defines OS_WINDOWS? It looks like a user-defined macro, but I can't find where it is set.
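`OS_WINDOWS` is not one of the macros compilers predefine; portable code usually keys off `_WIN32` (predefined by Windows compilers for both 32- and 64-bit targets) and `__GLIBC__` (defined by glibc's headers). Whether tcmalloc's own build defines `OS_WINDOWS` somewhere is not visible in this snippet. A minimal sketch of the same dispatch using only predefined macros; `DetectPlatform` is a hypothetical name.

```cpp
#include <cstdio>  // on glibc systems, libc headers pull in __GLIBC__

// Reports which branch a check based on predefined macros would take.
const char* DetectPlatform() {
#if defined(_WIN32)
  return "windows";  // MSVC, clang-cl and MinGW define this, also on 64-bit
#elif defined(__GLIBC__)
  return "glibc";    // set by glibc's <features.h>
#elif defined(__APPLE__)
  return "apple";
#else
  return "other";
#endif
}
```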
Hi,
I'm using Bazel to build tcmalloc on Mac OS X 10.15.6 but I get an error:
❯ bazel build "@com_google_tcmalloc//tcmalloc"
INFO: Analyzed target @com_google_tcmalloc//tcmalloc:tcmalloc (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
INFO: From Compiling external/com_google_tcmalloc/tcmalloc/internal/environment.cc:
warning: unknown warning option '-Wno-attribute-alias'; did you mean '-Wno-attributes'? [-Wunknown-warning-option]
1 warning generated.
ERROR: /private/var/tmp/_bazel_username/148f9f6ebca6e47e7d6d5ed427a82e62/external/com_google_tcmalloc/tcmalloc/internal/BUILD:200:11: C++ compilation of rule '@com_google_tcmalloc//tcmalloc/internal:mincore' failed (Exit 1): wrapped_clang failed: error executing command
(cd /private/var/tmp/_bazel_username/148f9f6ebca6e47e7d6d5ed427a82e62/sandbox/darwin-sandbox/59/execroot/__main__ && \
exec env - \
APPLE_SDK_PLATFORM=MacOSX \
APPLE_SDK_VERSION_OVERRIDE=10.15 \
PATH=/Users/username/.local/bin:/Users/username/go/bin:/Users/username/.cargo/bin:/usr/local/opt/llvm/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin \
XCODE_VERSION_OVERRIDE=11.6.0.11E708 \
external/local_config_cc/wrapped_clang '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -O0 -DDEBUG '-std=c++11' -iquote external/com_google_tcmalloc -iquote bazel-out/darwin-fastbuild/bin/external/com_google_tcmalloc -MD -MF bazel-out/darwin-fastbuild/bin/external/com_google_tcmalloc/tcmalloc/internal/_objs/mincore/mincore.d '-frandom-seed=bazel-out/darwin-fastbuild/bin/external/com_google_tcmalloc/tcmalloc/internal/_objs/mincore/mincore.o' -isysroot __BAZEL_XCODE_SDKROOT__ -F__BAZEL_XCODE_SDKROOT__/System/Library/Frameworks -F__BAZEL_XCODE_DEVELOPER_DIR__/Platforms/MacOSX.platform/Developer/Library/Frameworks '-mmacosx-version-min=10.15' -DHAVE_BAZEL_BUILD '-fdiagnostics-color=always' '-std=c++2a' -Wall -Wreturn-type -Wuninitialized -Wunused-result '-Werror=narrowing' '-Werror=reorder' -Wunused-local-typedefs '-Werror=conversion-null' '-Werror=overlength-strings' '-Werror=pointer-arith' '-Werror=varargs' '-Werror=vla' '-Werror=write-strings' -Wmissing-declarations -Wno-attribute-alias -Wno-sign-compare -Wno-uninitialized -Wno-unused-function -Wno-unused-result -Wno-unused-variable -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/com_google_tcmalloc/tcmalloc/internal/mincore.cc -o bazel-out/darwin-fastbuild/bin/external/com_google_tcmalloc/tcmalloc/internal/_objs/mincore/mincore.o)
Execution platform: @local_config_platform//:host
Use --sandbox_debug to see verbose messages from the sandbox wrapped_clang failed: error executing command
(cd /private/var/tmp/_bazel_username/148f9f6ebca6e47e7d6d5ed427a82e62/sandbox/darwin-sandbox/59/execroot/__main__ && \
exec env - \
APPLE_SDK_PLATFORM=MacOSX \
APPLE_SDK_VERSION_OVERRIDE=10.15 \
PATH=/Users/username/.local/bin:/Users/username/go/bin:/Users/username/.cargo/bin:/usr/local/opt/llvm/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin \
XCODE_VERSION_OVERRIDE=11.6.0.11E708 \
external/local_config_cc/wrapped_clang '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -O0 -DDEBUG '-std=c++11' -iquote external/com_google_tcmalloc -iquote bazel-out/darwin-fastbuild/bin/external/com_google_tcmalloc -MD -MF bazel-out/darwin-fastbuild/bin/external/com_google_tcmalloc/tcmalloc/internal/_objs/mincore/mincore.d '-frandom-seed=bazel-out/darwin-fastbuild/bin/external/com_google_tcmalloc/tcmalloc/internal/_objs/mincore/mincore.o' -isysroot __BAZEL_XCODE_SDKROOT__ -F__BAZEL_XCODE_SDKROOT__/System/Library/Frameworks -F__BAZEL_XCODE_DEVELOPER_DIR__/Platforms/MacOSX.platform/Developer/Library/Frameworks '-mmacosx-version-min=10.15' -DHAVE_BAZEL_BUILD '-fdiagnostics-color=always' '-std=c++2a' -Wall -Wreturn-type -Wuninitialized -Wunused-result '-Werror=narrowing' '-Werror=reorder' -Wunused-local-typedefs '-Werror=conversion-null' '-Werror=overlength-strings' '-Werror=pointer-arith' '-Werror=varargs' '-Werror=vla' '-Werror=write-strings' -Wmissing-declarations -Wno-attribute-alias -Wno-sign-compare -Wno-uninitialized -Wno-unused-function -Wno-unused-result -Wno-unused-variable -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/com_google_tcmalloc/tcmalloc/internal/mincore.cc -o bazel-out/darwin-fastbuild/bin/external/com_google_tcmalloc/tcmalloc/internal/_objs/mincore/mincore.o)
Execution platform: @local_config_platform//:host
Use --sandbox_debug to see verbose messages from the sandbox
warning: unknown warning option '-Wno-attribute-alias'; did you mean '-Wno-attributes'? [-Wunknown-warning-option]
external/com_google_tcmalloc/tcmalloc/internal/mincore.cc:28:36: error: cannot initialize a parameter of type 'char *' with an lvalue of type 'unsigned char *'
return ::mincore(addr, length, result);
^~~~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/sys/mman.h:243:45: note: passing argument to parameter here
int mincore(const void *, size_t, char *);
^
1 warning and 1 error generated.
Target @com_google_tcmalloc//tcmalloc:tcmalloc failed to build
INFO: Elapsed time: 0.658s, Critical Path: 0.53s
INFO: 1 process: 1 darwin-sandbox.
FAILED: Build did NOT complete successfully
Running the same command on Clear Linux 5.7.8-968 x86_64 with gcc compiles without errors.
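The error comes from a prototype mismatch: glibc declares `int mincore(void *addr, size_t length, unsigned char *vec)`, while the macOS SDK in the log declares `int mincore(const void *, size_t, char *)`. A minimal sketch of a wrapper that casts only on Apple platforms; `PortableMincore` is a hypothetical name, and this is not necessarily how tcmalloc resolved the issue.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Calls mincore with the vector type each platform's header expects.
int PortableMincore(void* addr, size_t length, unsigned char* vec) {
#if defined(__APPLE__)
  // macOS: mincore(const void*, size_t, char*)
  return ::mincore(addr, length, reinterpret_cast<char*>(vec));
#else
  // Linux/glibc: mincore(void*, size_t, unsigned char*)
  return ::mincore(addr, length, vec);
#endif
}
```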
Hi All,
I capture a backtrace whenever memory is allocated, but backtrace() itself calls malloc, so to avoid infinite recursion I set a flag. It works, but performance suffers badly. While looking for options I came across the code below:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <execinfo.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void *
malloc (size_t size)
{
  const char *message = "malloc called\n";
  write (STDOUT_FILENO, message, strlen (message));
  void *next = dlsym (RTLD_NEXT, "malloc");
  return ((__typeof__ (malloc) *) next) (size);
}

int
main (void)
{
  /* This calls malloc. */
  puts ("First call to backtrace.");
  void *buffer[10];
  backtrace (buffer, 10);
  backtrace (buffer, 10);
  backtrace (buffer, 10);

  /* This does not. */
  puts ("Second call to backtrace.");
  backtrace (buffer, 10);
}
Output:
----------
./a.out
First call to backtrace.
malloc called
Second call to backtrace.
Is there anything similar that can be done with tcmalloc so that performance doesn't take a hit?
rgds
Balaji Kamal Kannadassan
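The glibc backtrace(3) man page notes that backtrace() lives in libgcc, which is loaded dynamically on first use, and that this loading usually calls malloc; calling backtrace() once up front moves that cost out of the allocation path. A minimal sketch combining such a warm-up with a per-thread re-entrancy guard. `WarmUpBacktrace` and `CaptureAllocationStack` are hypothetical names, not a tcmalloc API. One caveat: if this code lives in a dlopen()ed library, glibc may allocate its thread_local storage with malloc on first access.

```cpp
#include <execinfo.h>

namespace {

// True while this thread is already capturing, so an allocation made by
// backtrace() itself is not traced again.
thread_local bool in_capture = false;

// Runs before main(): forces libgcc to load while no hook is active yet.
__attribute__((constructor)) void WarmUpBacktrace() {
  void* dummy[1];
  backtrace(dummy, 1);
}

}  // namespace

// Body a malloc hook could call. Returns the number of frames written to
// buf, or 0 when called re-entrantly from inside a capture.
int CaptureAllocationStack(void** buf, int size) {
  if (in_capture) return 0;
  in_capture = true;
  int n = backtrace(buf, size);
  in_capture = false;
  return n;
}
```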
I followed the guidance listed here (#27) on how to build a static library. In the various BUILD files I commented out the linkstatic = 1 lines; however, I can't find any .so files in the output:
find tcmalloc/ -name *.so
Since I'm on a SLES 15 HPC cluster, I can't install packages through zypper/apt/yum (no root access) so I am not sure how I can generate a .so file.
As title.
Hi, we are using tcmalloc inside Kubernetes, and we keep seeing kubelets exceed their memory limits even though they have no reason to.
I suspect tcmalloc is not releasing memory back to the OS. That is normally desirable, but not if it trips the Kubernetes OOM killer. Should we just use ulimit?
This re-released version of tcmalloc is missing the fix that's available in gperftools: gperftools/gperftools@06c9414
The O(n) search over large spans becomes very expensive for any long-running application with >1MB allocations, since over time the large span list can accumulate thousands of entries due to fragmentation.
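One standard way to remove the O(n) scan is to keep free large spans in an ordered set keyed by (length, start address), so a best-fit lookup is a single `lower_bound` in O(log n). As far as I know this is the idea behind the gperftools change, but the sketch below is not its code: `FreeSpan` and `LargeSpanIndex` are hypothetical, and returning the unused tail of a split span to the index is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical free-span record: length in pages and start address.
struct FreeSpan {
  size_t pages;
  uintptr_t start;
  // Order by length, then address: lower_bound yields the smallest span
  // that fits, preferring the lowest address among equal lengths.
  bool operator<(const FreeSpan& o) const {
    return pages != o.pages ? pages < o.pages : start < o.start;
  }
};

class LargeSpanIndex {
 public:
  void Insert(FreeSpan s) { spans_.insert(s); }

  // Best fit in O(log n) instead of scanning every free large span.
  // Removes the chosen span and stores it in *out; false if nothing fits.
  bool TakeBestFit(size_t pages, FreeSpan* out) {
    auto it = spans_.lower_bound(FreeSpan{pages, 0});
    if (it == spans_.end()) return false;
    *out = *it;
    spans_.erase(it);
    return true;
  }

  size_t size() const { return spans_.size(); }

 private:
  std::set<FreeSpan> spans_;
};
```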
I always get a segmentation fault when building with ASan enabled. Here is how to trigger the problem:
bazel run --copt=-fsanitize=address --linkopt=-fsanitize=address tcmalloc/testing:hello_main
It appears the segfault happens before main() is called: I added a quick print as the first statement in main and it was never printed. The segfault goes away if I remove malloc = "//tcmalloc" from the cc_binary target.
I use the LLVM 10 toolchain on Ubuntu 20.04. Bazel version 3.4.1. TCMalloc is at commit 65bf455.
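AddressSanitizer replaces malloc and free with its own runtime, so also setting `malloc = "//tcmalloc"` puts two allocators in one process, which is a plausible cause of the crash before main(). A common approach is to leave tcmalloc out of sanitizer builds, for example with a Bazel select() on a config_setting. Code that needs to know which kind of build it is in can test the compilers' predefined macros, as in this minimal sketch (`BUILT_WITH_ASAN` and `kBuiltWithAsan` are hypothetical names).

```cpp
// GCC defines __SANITIZE_ADDRESS__ under -fsanitize=address;
// Clang also exposes __has_feature(address_sanitizer).
#if defined(__SANITIZE_ADDRESS__)
#define BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define BUILT_WITH_ASAN 1
#endif
#endif
#ifndef BUILT_WITH_ASAN
#define BUILT_WITH_ASAN 0
#endif

// True when this translation unit was compiled with AddressSanitizer.
constexpr bool kBuiltWithAsan = BUILT_WITH_ASAN;
```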