Comments (23)
I've compiled a small program that just returns 0.
$ time ./t
real 0m8.253s
user 0m1.065s
sys 0m7.165s
Then I've injected lines printing time(NULL) into __asan_init:
$ ./t
line: 649, time: 1325147635
line: 654, time: 1325147635
line: 658, time: 1325147635
line: 692, time: 1325147635
line: 700, time: 1325147635
line: 703, time: 1325147640
line: 706, time: 1325147640
line: 733, time: 1325147643
line: 738, time: 1325147643
line: 763, time: 1325147643
line: 772, time: 1325147643
line: 786, time: 1325147643
line: 797, time: 1325147643
These results are quite rough, but it looks like ~5 seconds are spent in
InitializeAsanInterceptors() (which calls INTERCEPT_FUNCTION many times) and
another 3 seconds in multiple further calls to INTERCEPT_FUNCTION.
So, yes, the hypothesis about slow mach_override is correct.
I haven't got any valuable information from Shark yet. The top line (~18-20%)
is usually ml_set_interrupts_enabled (which means many profiler ticks occurred
while the program was in the kernel), and most of the other lines relate to
kernel code, too.
The most interesting part is vm_allocate(), which is called a number of times
for each interceptor -- this is most likely to be the culprit.
Original comment by [email protected]
on 29 Dec 2011 at 8:48
- Changed state: Started
from address-sanitizer.
Another interesting experiment was to count the number of vm_allocate calls in
mach_override_ptr().
================================
Index: projects/compiler-rt/lib/asan/mach_override/mach_override.c
===================================================================
--- projects/compiler-rt/lib/asan/mach_override/mach_override.c (revision
147308)
+++ projects/compiler-rt/lib/asan/mach_override/mach_override.c (working copy)
@@ -451,9 +451,11 @@
int allocated = 0;
vm_map_t task_self = mach_task_self();
+ fprintf(stderr, "vm_allocates follow\n");
while( !err && !allocated && page != last ) {
err = vm_allocate( task_self, &page, pageSize, 0 );
+ fprintf(stderr, "vm_allocate\n");
if( err == err_none )
allocated = 1;
else if( err == KERN_NO_SPACE ) {
================================
$ ./t > log 2>&1
$ cat log | grep "vm_allocates follow" | wc
48 96 960
$ cat log | grep "vm_allocate$" | wc
3146952 3146952 37763424
Original comment by [email protected]
on 29 Dec 2011 at 9:45
from address-sanitizer.
For 32 bits that's only 1176 calls to vm_allocate() -- no surprise everything
is ok.
Original comment by [email protected]
on 29 Dec 2011 at 9:47
from address-sanitizer.
Loop perforation in action: we can easily speed up this code by 4x (that's
427414 calls to vm_allocate, so it is not the bottleneck anymore):
Index: projects/compiler-rt/lib/asan/mach_override/mach_override.c
===================================================================
--- projects/compiler-rt/lib/asan/mach_override/mach_override.c (revision
147338)
+++ projects/compiler-rt/lib/asan/mach_override/mach_override.c (working copy)
@@ -451,16 +451,18 @@
int allocated = 0;
vm_map_t task_self = mach_task_self();
+ fprintf(stderr, "vm_allocates follow\n");
while( !err && !allocated && page != last ) {
err = vm_allocate( task_self, &page, pageSize, 0 );
+ fprintf(stderr, "vm_allocate\n");
if( err == err_none )
allocated = 1;
else if( err == KERN_NO_SPACE ) {
#if defined(__x86_64__)
- page -= pageSize;
+ page -= pageSize * 8;
#else
- page += pageSize;
+ page += pageSize * 8;
#endif
err = err_none;
=========================================
$ time ./t 2>/dev/null
real 0m2.129s
user 0m0.322s
sys 0m1.800s
Of course the fix should involve calling vm_allocate less often by grouping
several allocations together and/or caching the probe results for subsequent
calls to mach_override_ptr().
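The effect of the probe stride can be illustrated without Mach at all. The following is a minimal sketch, not the mach_override code: `fake_space` and `page_busy` are hypothetical stand-ins for the address space and for a vm_allocate() call that fails with KERN_NO_SPACE, and the page numbers are made up. It only counts how many probes a downward scan needs for a given stride.

```c
#include <stdint.h>

#define NO_PAGE UINT64_MAX

/* Hypothetical address-space model: every page at or above
   first_busy is taken, everything below it is free. */
typedef struct {
  uint64_t first_busy;
  uint64_t probes; /* number of simulated vm_allocate() calls */
} fake_space;

/* Simulated probe: counts the call and reports whether the page is taken. */
int page_busy(fake_space *s, uint64_t page) {
  s->probes++;
  return page >= s->first_busy;
}

/* Scan downwards from start in steps of stride pages (the direction the
   x86_64 branch uses) and return the first free page, or NO_PAGE. */
uint64_t probe_down(fake_space *s, uint64_t start, uint64_t stride) {
  uint64_t page = start;
  while (page_busy(s, page)) {
    if (page < stride) return NO_PAGE; /* would wrap below page 0 */
    page -= stride;
  }
  return page;
}
```

With first_busy = 1000 and start = 9000, a stride of 1 needs 8002 probes and a stride of 8 needs 1002, at the cost of possibly skipping up to seven free pages.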
Original comment by [email protected]
on 29 Dec 2011 at 10:30
from address-sanitizer.
I've made ASan pre-allocate memory for mach_override_ptr using mmap, but it
still takes 1.3 seconds to run an empty program (versus 13 milliseconds on
32-bit Mac OS).
I've instrumented the code with profiling printfs and here's what I got:
sec: 1326380400, msec: 319867 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_r
tl.cc:394
sec: 1326380401, msec: 42812 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_r
tl.cc:510
-- that's __asan_init(), which takes 723 milliseconds to run (I've also seen
450 ms sometimes)
Some 560 ms are spent in InitializeAsanInterceptors():
sec: 1326380400, msec: 354748 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_r
tl.cc:451
sec: 1326380400, msec: 911536 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_r
tl.cc:456
, which calls mach_override_ptr 26 times, that's 21 ms per call:
sec: 1326380400, msec: 366326 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_o
verride/mach_override.c:214
sec: 1326380400, msec: 483865 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_o
verride/mach_override.c:214
sec: 1326380400, msec: 507547 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_o
verride/mach_override.c:214
sec: 1326380400, msec: 531243 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_o
verride/mach_override.c:214
Each time some 12 ms are spent on something that looks like a COW in
atomic_mov64:
908 void atomic_mov64(
909 uint64_t *targetAddress,
910 uint64_t value )
911 {
912 PROFILE_TIME();
913 *targetAddress = value;
914 PROFILE_TIME();
915 *targetAddress = value;
916 PROFILE_TIME();
917 }
(I've inserted the second access to make sure it's faster than the first one)
sec: 1326380400, msec: 495752 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_o
verride/mach_override.c:913
sec: 1326380400, msec: 507510 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_o
verride/mach_override.c:915
sec: 1326380400, msec: 507542 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_o
verride/mach_override.c:917
Some other write accesses to the library code may also take up to 20 ms, as do
system calls like vm_protect() (the total depends on which library functions
are intercepted: further accesses to the same code pages may be faster).
It's still not evident why the empty program takes an additional 0.6 seconds
after __asan_init() has finished.
Dima suspects that this can be caused by delayed effects of copying or caching.
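The PROFILE_TIME() macro itself is not shown in this thread. Below is a hypothetical reconstruction, assuming it is built on gettimeofday(); the helper names `format_profile_line` and `elapsed_usec` are made up for illustration. Judging by values such as 911536, the field printed as "msec" appears to hold microseconds, so the sketch prints tv_usec under that label.

```c
#include <stdio.h>
#include <sys/time.h>

/* Format one timestamp line in the style of the logs in this thread.
   The second field is tv_usec (microseconds); it is labelled "msec"
   only to match the quoted output. */
int format_profile_line(char *buf, size_t size, const struct timeval *tv,
                        const char *file, int line) {
  return snprintf(buf, size, "sec: %ld, msec: %ld at %s:%d\n",
                  (long)tv->tv_sec, (long)tv->tv_usec, file, line);
}

/* Hypothetical PROFILE_TIME(): print the current wall-clock time and
   the source location to stderr. */
#define PROFILE_TIME()                                                  \
  do {                                                                  \
    struct timeval tv_;                                                 \
    char buf_[512];                                                     \
    gettimeofday(&tv_, NULL);                                           \
    format_profile_line(buf_, sizeof buf_, &tv_, __FILE__, __LINE__);   \
    fputs(buf_, stderr);                                                \
  } while (0)

/* Elapsed time from sample a to sample b, in microseconds. */
long long elapsed_usec(const struct timeval *a, const struct timeval *b) {
  return (long long)(b->tv_sec - a->tv_sec) * 1000000LL +
         (long long)(b->tv_usec - a->tv_usec);
}
```

Applied to the two __asan_init() samples (1326380400 s + 319867 and 1326380401 s + 42812), elapsed_usec gives 722945, i.e. the 723 milliseconds quoted above.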
Original comment by [email protected]
on 12 Jan 2012 at 3:14
from address-sanitizer.
Attached is the Shark profile for this program.
Most of the time is spent in vm_map_lookup_locked, which is invoked by
user_trap() (50.9%) and exit() (23.9%)
Original comment by [email protected]
on 13 Jan 2012 at 9:35
from address-sanitizer.
Okay, we have two problems here.
First, mach_override_ptr is slow because of the free memory lookups that do too
many vm_allocate calls. This used to take up to 8 seconds on our machine. My
solution is to externalize the branch island allocator so that it can pre-map
some memory and minimize the allocation cost. The draft implementation has sped
up an empty asan_test64 run to some 0.8 seconds.
Second, allocating the shadow memory bloats the virtual page table and slows
down the lookups and the shutdown process. For example, the following program:
==============
#include <sys/mman.h>
int main() {
  /* Map almost 16 TiB of address space; none of it is ever touched. */
  void *t = mmap(0, 0x00000fffffffffffUL, PROT_READ | PROT_WRITE,
                 MAP_ANON | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
  return t == MAP_FAILED;
}
==============
, which maps just under 16 TiB of virtual memory, runs for 0.55 seconds on our
machine without AddressSanitizer.
Most of this time is spent in the virtual page table lookups on shutdown.
We do not know how to get rid of this lookup overhead right now (it is in fact
greater, because the lookups are also performed as the program runs).
Mapping the shadow memory before mach_override_ptr() makes the performance
worse:
real 0m1.300s
user 0m0.012s
sys 0m1.277s
versus
real 0m0.842s
user 0m0.011s
sys 0m0.828s
if the shadow is mapped after mach_override_ptr() calls.
Original comment by [email protected]
on 13 Jan 2012 at 11:31
from address-sanitizer.
The last thing to mention is that my measurements of mach_override_ptr
performance were done with the shadow memory mapped at the beginning of
__asan_init, so they are a bit off. With my allocator patch, overriding
functions takes only 1 millisecond:
sec: 1326454369, msec: 633272 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_r
tl.cc:414
sec: 1326454369, msec: 634223 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_r
tl.cc:416
vs. 8-9 seconds without it:
sec: 1326454567, msec: 742921 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_r
tl.cc:414
sec: 1326454576, msec: 246927 at
/Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_r
tl.cc:416
Original comment by [email protected]
on 13 Jan 2012 at 11:37
from address-sanitizer.
As of r148116, the whole asan_test64 takes finite time (18 minutes) to pass
(32-bit tests run for 1 minute).
Further possible speed improvements will require mapping less virtual memory
(e.g. using half as much memory should make the shutdown twice as fast). This
can be accomplished in the following ways:
-- use a SEGV handler instead of pre-allocating all the shadow memory;
-- omit some of the shadow memory which is guaranteed to be not used by the tests;
-- use a greater shadow memory scale factor.
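To see why a greater scale factor maps less memory, here is a small sketch assuming the usual AddressSanitizer mapping shadow = (mem >> scale) + offset. The function names are hypothetical; the scale of 3 and the offset 0x100000000000 are assumptions chosen to agree with the low shadow range 0x100000000000..0x11ffffffffff quoted elsewhere in this thread.

```c
#include <stdint.h>

/* Assumed mapping from an application address to its shadow address. */
uint64_t mem_to_shadow(uint64_t mem, unsigned scale, uint64_t offset) {
  return (mem >> scale) + offset;
}

/* Shadow bytes needed to cover app_bytes of application memory. */
uint64_t shadow_bytes(uint64_t app_bytes, unsigned scale) {
  return app_bytes >> scale;
}
```

Under these assumptions a 47-bit address space needs 2^44 bytes (16 TiB) of shadow at scale 3 and 2^43 bytes (8 TiB) at scale 4, so each extra step of the scale halves the mapping.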
Original comment by [email protected]
on 13 Jan 2012 at 4:47
from address-sanitizer.
>> asan_test64 takes finite time (18 minutes)
Good!
>> -- use a SEGV handler instead of pre-allocating all the shadow memory;
>> -- omit some of the shadow memory which is guaranteed to be not used by the
tests;
>> -- use a greater shadow memory scale factor.
Any of the suggested solutions will end up testing something different from
what we ship to users.
Original comment by [email protected]
on 13 Jan 2012 at 6:46
from address-sanitizer.
Okay, because this is a test-only problem, let's fix the tests.
I'll make the heavy death tests run in parallel -- hope that helps.
Original comment by [email protected]
on 15 Jan 2012 at 6:43
from address-sanitizer.
Looks like EXPECT_DEATH can't be called from multiple threads, because it
shares the |g_captured_stdout| and |g_captured_stderr| global variables.
Putting each EXPECT_DEATH call under a lock will effectively kill the
performance gain :(
Original comment by [email protected]
on 16 Jan 2012 at 11:47
from address-sanitizer.
I've also tried to use multiple processes to run death tests in parallel, but
it seems to slow down the execution even more.
Original comment by [email protected]
on 16 Jan 2012 at 12:46
from address-sanitizer.
If nothing else works, we can try this...
But we will have to make sure that at least some tests (e.g. output tests) run
in regular mode.
>> -- omit some of the shadow memory which is guaranteed to be not used by the
tests;
Original comment by [email protected]
on 17 Jan 2012 at 7:30
from address-sanitizer.
Does asan on 64-bit Mac always have to run with ASLR off?
If yes, we can actually reduce the size of the shadow significantly.
I tried the patch below (not for commit!) and the 64-bit tests ran ~5x faster.
@@ -457,11 +458,22 @@
{
if (kLowShadowBeg != kLowShadowEnd) {
+ // 0x100000000000
+ // 0x11ffffffffff
// mmap the low shadow plus one page.
- ReserveShadowMemoryRange(kLowShadowBeg - kPageSize, kLowShadowEnd);
+ uintptr_t low_end = kLowShadowEnd;
+ if (1 && __WORDSIZE == 64) {
+ low_end = 0x101fffffffffULL;
+ }
+
+ ReserveShadowMemoryRange(kLowShadowBeg - kPageSize, low_end);
}
// mmap the high shadow.
- ReserveShadowMemoryRange(kHighShadowBeg, kHighShadowEnd);
+ uintptr_t high_shadow = kHighShadowBeg;
+ if (1 && __WORDSIZE == 64) {
+ high_shadow = 0x1f8000000000ULL;
+ }
+ ReserveShadowMemoryRange(high_shadow, kHighShadowEnd);
// protect the gap
void *prot = AsanMprotect(kShadowGapBeg, kShadowGapEnd - kShadowGapBeg + 1);
CHECK(prot == (void*)kShadowGapBeg);
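For a sense of scale, the following sketch computes how much the low shadow shrinks with this patch. It assumes that the two addresses in the patch comments (0x100000000000 and 0x11ffffffffff) are kLowShadowBeg and kLowShadowEnd, and it ignores the extra page reserved below kLowShadowBeg; `range_bytes` and `to_gib` are hypothetical helpers.

```c
#include <stdint.h>

/* Size in bytes of the inclusive address range [beg, end]. */
uint64_t range_bytes(uint64_t beg, uint64_t end) {
  return end - beg + 1;
}

/* Convert bytes to whole GiB (rounding down). */
uint64_t to_gib(uint64_t bytes) {
  return bytes >> 30;
}
```

With those assumptions the low shadow goes from 2048 GiB (end at 0x11ffffffffff) to 128 GiB (end at 0x101fffffffff).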
Original comment by [email protected]
on 31 Jan 2012 at 2:51
from address-sanitizer.
Yes, we strictly need ASLR off. Otherwise it's possible to have the code
segment overwritten.
Are you going to use this just for the tests?
Original comment by [email protected]
on 31 Jan 2012 at 8:09
from address-sanitizer.
I would highly prefer to have no difference between tests and non-tests.
Original comment by [email protected]
on 31 Jan 2012 at 5:47
from address-sanitizer.
The solution in #c15 is actually risky.
The ideal situation (which we have now) is when all memory is one of
- legal application memory
- legal shadow memory
- forbidden memory (mapped with PROT_NONE)
#c15 violates this.
As I just experimented, mmap with PROT_NONE is as expensive as mmap with
PROT_READ|PROT_WRITE
Original comment by [email protected]
on 24 Feb 2012 at 9:06
from address-sanitizer.
Yes, you're right. If any of the shadow memory pages is unmapped, someone may
accidentally mmap it from the client code.
We can hardly prevent it: the only solution I can think of is to wrap mmap and
manage the virtual memory table ourselves, which will probably be slower than
doing that in the kernel.
Original comment by [email protected]
on 27 Feb 2012 at 8:11
from address-sanitizer.
Original comment by [email protected]
on 22 May 2012 at 8:47
- Added labels: OpSys-OSX
from address-sanitizer.
btw, http://openradar.appspot.com/radar?id=1634406
Original comment by [email protected]
on 27 Jun 2012 at 7:02
from address-sanitizer.
The remaining performance issues are minor on 10.7 and 10.8, so reducing the
priority.
Original comment by [email protected]
on 29 Nov 2012 at 1:45
- Added labels: Priority-Low
- Removed labels: Priority-Medium
from address-sanitizer.
Current asan startup/shutdown time on Mac > 10.7 is ~0.3 seconds.
This is much worse than on Linux, but still tolerable.
I think we can close this issue.
Original comment by [email protected]
on 18 Feb 2013 at 6:49
- Changed state: Fixed
from address-sanitizer.