Comments (24)
I've compiled a small program that just returns 0.
$ time ./t
real 0m8.253s
user 0m1.065s
sys 0m7.165s
Then I've injected lines printing time(NULL) into __asan_init:
$ ./t
line: 649, time: 1325147635
line: 654, time: 1325147635
line: 658, time: 1325147635
line: 692, time: 1325147635
line: 700, time: 1325147635
line: 703, time: 1325147640
line: 706, time: 1325147640
line: 733, time: 1325147643
line: 738, time: 1325147643
line: 763, time: 1325147643
line: 772, time: 1325147643
line: 786, time: 1325147643
line: 797, time: 1325147643
These results are quite rough, but looks like ~5 seconds are spent in InitializeAsanInterceptors()
(which calls INTERCEPT_FUNCTION many times) and another 3 seconds are in multiple calls
to INTERCEPT_FUNCTION.
So, yes, the hypothesis about slow mach_override is correct.
I haven't got any valuable information from Shark yet. The top line (~18-20%) is usually
ml_set_interrupts_enabled (which means many profiler ticks occured while the program
was in the kernel), most of other lines relate to kernel code, too.
The most interesting part is vm_allocate(), which is called a number of times for each
interceptor -- this is most likely to be the culprit.
Reported by ramosian.glider
on 2011-12-29 08:48:21
- Status changed:
Started
from sanitizers.
Another interesting experiment was to count the number of vm_allocate calls in mach_override_ptr().
================================
Index: projects/compiler-rt/lib/asan/mach_override/mach_override.c
===================================================================
--- projects/compiler-rt/lib/asan/mach_override/mach_override.c (revision 147308)
+++ projects/compiler-rt/lib/asan/mach_override/mach_override.c (working copy)
@@ -451,9 +451,11 @@
int allocated = 0;
vm_map_t task_self = mach_task_self();
+ fprintf(stderr, "vm_allocates follow\n");
while( !err && !allocated && page != last ) {
err = vm_allocate( task_self, &page, pageSize, 0 );
+ fprintf(stderr, "vm_allocate\n");
if( err == err_none )
allocated = 1;
else if( err == KERN_NO_SPACE ) {
================================
$ ./t > log 2>&1
$ cat log | grep "vm_allocates follow" | wc
48 96 960
$ cat log | grep "vm_allocate$" | wc
3146952 3146952 37763424
Reported by ramosian.glider
on 2011-12-29 09:45:07
from sanitizers.
For 32 bits that's only 1176 calls to vm_allocate() -- no surprise everything is ok.
Reported by ramosian.glider
on 2011-12-29 09:47:14
from sanitizers.
Loop perforation in action: we can easily speed up this code by 4x (that's 427414 calls
to vm_allocate, so it is not the bottleneck anymore):
Index: projects/compiler-rt/lib/asan/mach_override/mach_override.c
===================================================================
--- projects/compiler-rt/lib/asan/mach_override/mach_override.c (revision 147338)
+++ projects/compiler-rt/lib/asan/mach_override/mach_override.c (working copy)
@@ -451,16 +451,18 @@
int allocated = 0;
vm_map_t task_self = mach_task_self();
+ fprintf(stderr, "vm_allocates follow\n");
while( !err && !allocated && page != last ) {
err = vm_allocate( task_self, &page, pageSize, 0 );
+ fprintf(stderr, "vm_allocate\n");
if( err == err_none )
allocated = 1;
else if( err == KERN_NO_SPACE ) {
#if defined(__x86_64__)
- page -= pageSize;
+ page -= pageSize * 8;
#else
- page += pageSize;
+ page += pageSize * 8;
#endif
err = err_none;
=========================================
$ time ./t 2>/dev/null
real 0m2.129s
user 0m0.322s
sys 0m1.800s
Of course the fix should involve calling vm_allocate less often by grouping several
allocations together and/or caching the probe results for subsequent calls to mach_override_ptr().
Reported by ramosian.glider
on 2011-12-29 10:30:46
from sanitizers.
I've made ASan pre-allocate memory for mach_override_ptr using mmap, but it still takes
1.3 seconds to run an empty program (versus 13 milliseconds on 32-bit Mac OS).
I've instrumented the code with profiling printfs and here's what I got:
sec: 1326380400, msec: 319867 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:394
sec: 1326380401, msec: 42812 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:510
-- that's __asan_init(), which takes 723 milliseconds to run (I've also seen 450 ms
sometimes)
Some 560 ms are spent in InitializeAsanInterceptors():
sec: 1326380400, msec: 354748 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:451
sec: 1326380400, msec: 911536 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:456
, which calls mach_override_ptr for 26 times, that's 21 ms per call:
sec: 1326380400, msec: 366326 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_override/mach_override.c:214
sec: 1326380400, msec: 483865 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_override/mach_override.c:214
sec: 1326380400, msec: 507547 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_override/mach_override.c:214
sec: 1326380400, msec: 531243 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_override/mach_override.c:214
Each time some 12 ms are spent on something that looks like a COW in atomic_mov64:
908 void atomic_mov64(
909 uint64_t *targetAddress,
910 uint64_t value )
911 {
912 PROFILE_TIME();
913 *targetAddress = value;
914 PROFILE_TIME();
915 *targetAddress = value;
916 PROFILE_TIME();
917 }
(I've inserted the second access to make sure it's faster than the first one)
sec: 1326380400, msec: 495752 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_override/mach_override.c:913
sec: 1326380400, msec: 507510 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_override/mach_override.c:915
sec: 1326380400, msec: 507542 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/mach_override/mach_override.c:917
Some other write accesses to the library code may also take up to 20 ms, so do system
calls like vm_protect() (the total result depends on which library functions are intercepted:
further accesses to the same code pages may be faster).
It's still not evident why the empty program takes additional 0.6 seconds after __asan_init()
has finished.
Dima suspects that this can be caused by delayed effects of copying or caching.
Reported by [email protected]
on 2012-01-12 15:14:49
from sanitizers.
Attached is the Shark profile for this program.
Most of the time is spent in vm_map_lookup_locked, which is invoked by user_trap()
(50.9%) and exit() (23.9%)
Reported by ramosian.glider
on 2012-01-13 09:35:01
- _Attachment: [shark-asan.txt](https://storage.googleapis.com/google-code-attachments/address-sanitizer/issue-24/comment-6/shark-asan.txt)_
from sanitizers.
Okay, we have two problems here.
First, mach_override_ptr is slow because of the free memory lookups that do too many
vm_allocate calls. This used to take up to 8 seconds on our machine. My solution is
to externalize the branch island allocator so that it can pre-map some memory and minimize
the allocation cost. The draft implementation has sped up an empty asan_test64 run
to some 0.8 seconds.
Second, allocating the shadow memory bloats the virtual page table and slows down the
lookups and the shutdown process. For example, the following program:
==============
#include <sys/mman.h>
int main() {
void *t = mmap(0, 0x00000fffffffffffUL, PROT_READ| PROT_WRITE,
MAP_ANON | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
}
==============
, which maps runs for 0.55 seconds on our machine without AddressSanitizer. Most of
this time is spent in the virtual page table lookups on shutdown.
We do not know how to get rid of this lookup overhead right now (it is in fact greater,
because the lookups are also performed as the program runs).
Mapping the shadow memory before mach_override_ptr() makes the performance worse:
real 0m1.300s
user 0m0.012s
sys 0m1.277s
versus
real 0m0.842s
user 0m0.011s
sys 0m0.828s
if the shadow is mapped after mach_override_ptr() calls.
Reported by ramosian.glider
on 2012-01-13 11:31:20
from sanitizers.
The last thing to mention is that my measurements of mach_override_ptr performance were
done for the case of shadow memory mapping at the beginning of __asan_init, so they
are off a bit. With my allocator patc overriding functions takes only 1 millisecond:
sec: 1326454369, msec: 633272 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:414
sec: 1326454369, msec: 634223 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:416
vs. 8-9 seconds without it:
sec: 1326454567, msec: 742921 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:414
sec: 1326454576, msec: 246927 at /Users/glider/src/asan/asan-llvm-trunk/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:416
Reported by ramosian.glider
on 2012-01-13 11:37:03
from sanitizers.
As of r148116, the whole asan_test64 takes finite time (18 minutes) to pass (32-bit
tests run for 1 minute)
Further possible speed improvements will require mapping less virtual memory.
(using e.g. twice less memory should speed up the shutdown twice). This can be accomplished
in the following ways:
-- use a SEGV handler instead of pre-allocating all the shadow memory;
-- omit some of the shadow memory which is guaranteed to be not used by the tests;
-- use a greater shadow memory scale factor.
Reported by ramosian.glider
on 2012-01-13 16:47:23
from sanitizers.
>> asan_test64 takes finite time (18 minutes)
Good!
>> -- use a SEGV handler instead of pre-allocating all the shadow memory;
>> -- omit some of the shadow memory which is guaranteed to be not used by the tests;
>> -- use a greater shadow memory scale factor.
Any of the suggested solutions will end up testing something different from what we
ship to users.
Reported by konstantin.s.serebryany
on 2012-01-13 18:46:44
from sanitizers.
Okay, because this is a test-only problem, let's fix the tests.
I'll make the heavy death run in parallel -- hope that helps.
Reported by ramosian.glider
on 2012-01-15 18:43:14
from sanitizers.
Looks like EXPECT_DEATH can't be called from multiple threads, because it shares the
|g_captured_stdout| and |g_captured_stderr| global variables. Putting each EXPECT_DEATH
call under a lock will effectively kill the performance gain :(
Reported by [email protected]
on 2012-01-16 11:47:35
from sanitizers.
I've also tried to use multiple processes to run death tests in parallel, but it seems
to slow down the execution even more.
Reported by ramosian.glider
on 2012-01-16 12:46:42
from sanitizers.
If nothing else works, we can try this...
But we will have to make sure that at least some tests (e.g. output tests) run in regular
mode.
>> -- omit some of the shadow memory which is guaranteed to be not used by the tests;
Reported by konstantin.s.serebryany
on 2012-01-17 19:30:51
from sanitizers.
Does asan on 64-bit Mac always have to run with ASLR off?
If yes, we can actually reduce the size of the shadow significantly.
I tried the patch below (not for commit!) and the 64-bit tests ran ~5x faster.
@@ -457,11 +458,22 @@
{
if (kLowShadowBeg != kLowShadowEnd) {
+ // 0x100000000000
+ // 0x11ffffffffff
// mmap the low shadow plus one page.
- ReserveShadowMemoryRange(kLowShadowBeg - kPageSize, kLowShadowEnd);
+ uintptr_t low_end = kLowShadowEnd;
+ if (1 && __WORDSIZE == 64) {
+ low_end = 0x101fffffffffULL;
+ }
+
+ ReserveShadowMemoryRange(kLowShadowBeg - kPageSize, low_end);
}
// mmap the high shadow.
- ReserveShadowMemoryRange(kHighShadowBeg, kHighShadowEnd);
+ uintptr_t high_shadow = kHighShadowBeg;
+ if (1 && __WORDSIZE == 64) {
+ high_shadow = 0x1f8000000000ULL;
+ }
+ ReserveShadowMemoryRange(high_shadow, kHighShadowEnd);
// protect the gap
void *prot = AsanMprotect(kShadowGapBeg, kShadowGapEnd - kShadowGapBeg + 1);
CHECK(prot == (void*)kShadowGapBeg);
Reported by konstantin.s.serebryany
on 2012-01-31 02:51:07
from sanitizers.
Yes, we strictly need ASLR off. Otherwise it's possible to have the code segment overwritten.
Are you going to use this just for the tests?
Reported by ramosian.glider
on 2012-01-31 08:09:48
from sanitizers.
I would highly prefer to have no difference between tests and non-tests.
Reported by konstantin.s.serebryany
on 2012-01-31 17:47:12
from sanitizers.
The solution in #c15 is actually risky.
The ideal situation (which we have now) is when all memory is one of
- legal application memory
- legal shadow memory
- forbidden memory (mapped with PROT_NONE)
#c15 violates this.
As I just experimented, mmap with PROT_NONE is as expensive as mmap with PROT_READ|PROT_WRITE
Reported by konstantin.s.serebryany
on 2012-02-24 21:06:36
from sanitizers.
Yes, you're right. If any of the shadow memory pages is unmapped, someone may occasionally
mmap it from the client code.
We can hardly prevent it: the only solution I can think of is to wrap mmap and manage
the virtual memory table ourselves, which will be probably slower than doing that in
the kernel.
Reported by ramosian.glider
on 2012-02-27 08:11:49
from sanitizers.
Reported by konstantin.s.serebryany
on 2012-05-22 08:47:47
- Labels added: OpSys-OSX
from sanitizers.
btw, http://openradar.appspot.com/radar?id=1634406
Reported by konstantin.s.serebryany
on 2012-06-27 07:02:13
from sanitizers.
The remaining performance issues are minor on 10.7 and 10.8, so reducing the priority.
Reported by ramosian.glider
on 2012-11-29 13:45:12
- Labels added: Priority-Low
- Labels removed: Priority-Medium
from sanitizers.
Current asan startup/shutdown on Mac > 10.7 time is ~0.3 seconds
This is much worse than on Linux, but still tolerable.
I think we can close this issue.
Reported by konstantin.s.serebryany
on 2013-02-18 06:49:57
- Status changed:
Fixed
from sanitizers.
Adding Project:AddressSanitizer as part of GitHub migration.
Reported by ramosian.glider
on 2015-07-30 09:12:58
- Labels added: ProjectAddressSanitizer
from sanitizers.
Related Issues (20)
- The program compiled and run with lsan alone reports an error. HOT 3
- The error content of `AddressSanitizerComparisonOfMemoryTools` page
- ASan: please add option to suppress details on indirect leaks HOT 3
- TSAN segmentation fault on AlpineLinux
- ASAN_OPTIONS=log_to_syslog=true leads to deadlock HOT 1
- How to deal with this problem? HOT 1
- Thread Sanitizer FATAL error on kernel version 6.6.6-x HOT 21
- TSAN: False positive lock-order-inversion from memory reuse
- TSAN: potential false positive in simple lockfree stack HOT 6
- LSan API doesn't allow control of child threads
- allocator_may_return_null=1 does not seem to work, hindering fuzzing for security issues
- AddressSanitizer: SEGV in malloc_trim
- can not be detected by asan HOT 5
- TestCases/stack-uas.c occasionally misclassified as stack-buffer-overflow
- Empty code randomly crashes with SEGV on unknown address HOT 5
- MSAN+libfuzzer reports error in libfuzzer code with '-jobs=X' (Uninitialized bytes in fputs) HOT 1
- Suppressing "heap-use-after-free" does not work in called functions
- What is -[ClassName objCMethodToSuppress:]??? HOT 1
- TSAN_OPTIONS="log_path=/var/log/xxxx_tsan" doesn't work, the logs still print on stdout
- [tsan] False positive when reading from a pipe in io_uring
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sanitizers.