Git Product home page Git Product logo

Comments (8)

tudorsuciu avatar tudorsuciu commented on May 7, 2024 1

I could attach gdb to sde but symbols are not recognized. The backtrace is not usefull.
in gdb info function gives:
0x0000000000031190 pik::ButteraugliDistance(pik::Image3 const&, pik::Image3 const&, pik::Image)
0x0000000000031830 pik::ButteraugliDistance(pik::Image3 const&, pik::Image3 const&, pik::Image
)
0x0000000000031910 pik::ButteraugliDistance(pik::MetaImage const&, pik::MetaImage const&, pik::Image)
0x0000000000031e90 pik::Zone::~Zone() [clone .constprop.225]
0x00000000000321d0 pik::Image3 pik::Window(pik::Image3
, unsigned long, unsigned long, unsigned long, unsigned long) [clone .constprop.219]
0x0000000000032860 pik::CenterOpsinValues(pik::Image3*)

So the invalid instruction happens here:
pik::Zone::~Zone() [clone .constprop.225]

I got a 1,6Gb file from sde with all instructions executed. This is the last page:
Write (UINT64)0x00007ffe4aca2090 = 0x7ffe4aca3200
INS 0x0000560e4f0d807d BASE nop dword ptr [rax], eax
INS 0x0000560e4f0d8080 SSE movaps xmmword ptr [rsp], xmm4
Write (UINT128)0x00007ffe4aca2080 = 00000000_00000000_00000000_00000001
INS 0x0000560e4f0d8084 BASE lea rdi, ptr [rip+0xc6d11] | rdi = 0x560e4f19ed9c
INS 0x0000560e4f0d808b BASE call 0x560e4f0d0ea0 | rsp = 0x7ffe4aca2078
Write (UINT64)0x00007ffe4aca2078 = 0x560e4f0d8090
INS 0x0000560e4f0d0ea0 BASE push r15 | rsp = 0x7ffe4aca2070
Write (UINT64)0x00007ffe4aca2070 = 0x50
INS 0x0000560e4f0d0ea2 BASE push r14 | rsp = 0x7ffe4aca2068
Write (UINT64)0x00007ffe4aca2068 = 0x7ffe4aca3670
INS 0x0000560e4f0d0ea4 BASE push r13 | rsp = 0x7ffe4aca2060
Write (UINT64)0x00007ffe4aca2060 = 0x560e4f423ea8
INS 0x0000560e4f0d0ea6 BASE push r12 | rsp = 0x7ffe4aca2058
Write (UINT64)0x00007ffe4aca2058 = 0x7ffe4aca2200
INS 0x0000560e4f0d0ea8 BASE push rbp | rsp = 0x7ffe4aca2050
Write (UINT64)0x00007ffe4aca2050 = 0x560e509a1d40
INS 0x0000560e4f0d0ea9 BASE push rbx | rsp = 0x7ffe4aca2048
Write (UINT64)0x00007ffe4aca2048 = 0x560e509a1880
INS 0x0000560e4f0d0eaa BASE mov rbx, rdi | rbx = 0x560e4f19ed9c
INS 0x0000560e4f0d0ead BASE sub rsp, 0x18 | rsp = 0x7ffe4aca2030, rflags = 0x206
Read 0x7f923f7f9080 = (UINT64)0x00007f923f7f9080
INS 0x0000560e4f0d0eb1 BASE mov rax, qword ptr fs:[0x0] | rax = 0x7f923f7f9080
Read 0x560e509a1880 = (UINT64)0x00007f923f7f9078
INS 0x0000560e4f0d0ebd BASE mov rsi, qword ptr [rax-0x8] | rsi = 0x560e509a1880
INS 0x0000560e4f0d0ec4 BASE test rsi, rsi | rflags = 0x202
INS 0x0000560e4f0d0ec7 BASE jz 0x560e4f0d11c0
INS 0x0000560e4f0d0ecd SSE2 lfence
INS 0x0000560e4f0d0ed0 BASE rdtsc | rax = 0x37307c1b, rdx = 0x9f7
INS 0x0000560e4f0d0ed2 BASE shl rdx, 0x20 | rdx = 0x9f700000000, rflags = 0x206
INS 0x0000560e4f0d0ed6 BASE or rax, rdx | rax = 0x9f737307c1b, rflags = 0x206
INS 0x0000560e4f0d0ed9 SSE2 lfence
Read 0x560e4b19ed4b = (UINT64)0x0000560e509a18e0
INS 0x0000560e4f0d0edc BASE sub rbx, qword ptr [rsi+0x60] | rbx = 0x4000051, rflags = 0x202
INS 0x0000560e4f0d0ee0 BASE mov rbp, 0x1fffffffff | rbp = 0x1fffffffff
INS 0x0000560e4f0d0eea BASE and rax, rbp | rax = 0x1737307c1b, rflags = 0x206
INS 0x0000560e4f0d0eed BASE shl rbx, 0x25 | rbx = 0x80000a2000000000, rflags = 0x286
INS 0x0000560e4f0d0ef1 BASE lea rdi, ptr [rbx+rax1] | rdi = 0x80000a3737307c1b
Read 0 = (UINT64)0x0000560e509a18c0
INS 0x0000560e4f0d0ef5 BASE mov rax, qword ptr [rsi+0x40] | rax = 0
INS 0x0000560e4f0d0ef9 BASE cmp rax, 0x8 | rflags = 0x293
INS 0x0000560e4f0d0efd BASE jz 0x560e4f0d0f28
INS 0x0000560e4f0d0eff BASE lea rdx, ptr [rax
8] | rdx = 0
INS 0x0000560e4f0d0f07 BASE add rax, 0x1 | rax = 0x1, rflags = 0x202
INS 0x0000560e4f0d0f0b BASE mov qword ptr [rsi+rdx*1], rdi
Write (UINT64)0x0000560e509a1880 = 0x80000a3737307c1b
INS 0x0000560e4f0d0f0f BASE mov qword ptr [rsi+0x40], rax
Write (UINT64)0x0000560e509a18c0 = 0x1
INS 0x0000560e4f0d0f13 BASE add rsp, 0x18 | rsp = 0x7ffe4aca2048, rflags = 0x206
Read 0x560e509a1880 = (UINT64)0x00007ffe4aca2048
INS 0x0000560e4f0d0f17 BASE pop rbx | rbx = 0x560e509a1880, rsp = 0x7ffe4aca2050
Read 0x560e509a1d40 = (UINT64)0x00007ffe4aca2050
INS 0x0000560e4f0d0f18 BASE pop rbp | rbp = 0x560e509a1d40, rsp = 0x7ffe4aca2058
Read 0x7ffe4aca2200 = (UINT64)0x00007ffe4aca2058
INS 0x0000560e4f0d0f19 BASE pop r12 | r12 = 0x7ffe4aca2200, rsp = 0x7ffe4aca2060
Read 0x560e4f423ea8 = (UINT64)0x00007ffe4aca2060
INS 0x0000560e4f0d0f1b BASE pop r13 | r13 = 0x560e4f423ea8, rsp = 0x7ffe4aca2068
Read 0x7ffe4aca3670 = (UINT64)0x00007ffe4aca2068
INS 0x0000560e4f0d0f1d BASE pop r14 | r14 = 0x7ffe4aca3670, rsp = 0x7ffe4aca2070
Read 0x50 = (UINT64)0x00007ffe4aca2070
INS 0x0000560e4f0d0f1f BASE pop r15 | r15 = 0x50, rsp = 0x7ffe4aca2078
Read 0x560e4f0d8090 = (UINT64)0x00007ffe4aca2078
INS 0x0000560e4f0d0f21 BASE ret | rsp = 0x7ffe4aca2080
INS 0x0000560e4f0d8090 BASE call 0x560e4f0cbe90 | rsp = 0x7ffe4aca2078
Write (UINT64)0x00007ffe4aca2078 = 0x560e4f0d8095
INS 0x0000560e4f0cbe90 BASE push r15 | rsp = 0x7ffe4aca2070
Write (UINT64)0x00007ffe4aca2070 = 0x50
INS 0x0000560e4f0cbe92 BASE push r14 | rsp = 0x7ffe4aca2068
Write (UINT64)0x00007ffe4aca2068 = 0x7ffe4aca3670
INS 0x0000560e4f0cbe94 BASE push r13 | rsp = 0x7ffe4aca2060
Write (UINT64)0x00007ffe4aca2060 = 0x560e4f423ea8
INS 0x0000560e4f0cbe96 BASE push r12 | rsp = 0x7ffe4aca2058
Write (UINT64)0x00007ffe4aca2058 = 0x7ffe4aca2200
INS 0x0000560e4f0cbe98 BASE push rbp | rsp = 0x7ffe4aca2050
Write (UINT64)0x00007ffe4aca2050 = 0x560e509a1d40
INS 0x0000560e4f0cbe99 BASE push rbx | rsp = 0x7ffe4aca2048
Write (UINT64)0x00007ffe4aca2048 = 0x560e509a1880
INS 0x0000560e4f0cbe9a BASE sub rsp, 0x18 | rsp = 0x7ffe4aca2030, rflags = 0x206
SIG signal=0x4 on thread 0 at address 0x560e4f0cbe9e FATALSIG4

Do this pieces of information help or more is needed?

from pik.

jan-wassenberg avatar jan-wassenberg commented on May 7, 2024

Hi, thanks for reporting this issue.
The next patch is unfortunately very large but convolve.h seems to be the most likely cause.
Would you kindly let us know which instruction is faulting (gdb) so we can narrow it down?

from pik.

tudorsuciu avatar tudorsuciu commented on May 7, 2024

Hello,

Thank you for looking at the issue. I cannot get a proper core file, compilation doesn't work with msse4.1. Intel sde gives this for 9849523 (no patches applied):
~/Downloads/sde-external-8.16.0-2018-01-30-lin/sde64 -- ./bin/cpik test3.png out3.pik
Compressing with maximum Butteraugli distance 1.000000
Illegal instruction at address = 55e95da9ee9e: 0f 01 f9 48 c1 e2 20 48 09 d0 0f ae e8 48 89
Image name: /home/tudor/pik/bin/cpik
Offset in image: 0x31e9e
If you believe your application should attempt to execute
this illegal instruction (and others that may be present),
Then use this knob: -emit-illegal-insts 0
and this error message will be avoided.

If this is not helpful to you, i will try to attach gdb as described in sde documentation to get a proper backtrace(i hope).

Regards,

from pik.

jan-wassenberg avatar jan-wassenberg commented on May 7, 2024

Thank you for this information, it is very helpful and points us straight to the problem and a solution!
The faulting instruction is RDTSCP, used for timing especially in profiler.h. I didn't realize this but Penryn indeed lacks that instruction.

A quick fix for the short term would be to change rdtscp to rdtsc in the inline asm in tsc_timer.h.
BTW we no longer use the 32-bit version (wraps around too quickly) so I can simplify the interface.
I'll also change cpik/dpik to use the OS timer instead. After that, you'll be able to change #if PROFILER_ENABLED in profiler.h to #if PROFILER_ENABLED && 0 and this problem should be resolved.

from pik.

tudorsuciu avatar tudorsuciu commented on May 7, 2024

With this patch:

diff --git a/tsc_timer.h b/tsc_timer.h
index b7783dd..36a2fee 100644
--- a/tsc_timer.h
+++ b/tsc_timer.h
@@ -125,13 +125,13 @@ inline uint64_t Stop<uint64_t>() {
#elif PIK_ARCH_X64 && PIK_COMPILER_MSVC
PIK_COMPILER_FENCE;
unsigned aux;
- t = __rdtscp(&aux);
+ t = __rdtsc(&aux);
SIMD_NAMESPACE::load_fence();
PIK_COMPILER_FENCE;
#elif PIK_ARCH_X64 && (PIK_COMPILER_CLANG || PIK_COMPILER_GCC)
// Use inline asm because __rdtscp generates code to store TSC_AUX (ecx).
asm volatile(
- "rdtscp\n\t"
+ "rdtsc\n\t"
"shl $32, %%rdx\n\t"
"or %%rdx, %0\n\t"
"lfence"

It compiles and executes correctly (at least on small test images).
With this code:
static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx)
{
/* ecx is often an input as well as an output. */
asm volatile("cpuid"
: "=a" (*eax),
"=b" (*ebx),
"=c" (*ecx),
"=d" (*edx)
: "0" (*eax), "2" (*ecx));
}
#include <stdio.h>
int main(int argc, char **argv)
{
unsigned eax, ebx, ecx, edx;
eax = 1; /* processor info and feature bits */
native_cpuid(&eax, &ebx, &ecx, &edx);
printf("stepping %d\n", eax & 0xF);
printf("model %d\n", (eax >> 4) & 0xF);
printf("family %d\n", (eax >> 8) & 0xF);
printf("processor type %d\n", (eax >> 12) & 0x3);
printf("extended model %d\n", (eax >> 16) & 0xF);
printf("extended family %d\n", (eax >> 20) & 0xFF);
eax = 0x80000001;
native_cpuid(&eax, &ebx, &ecx, &edx);
printf("rdtscp %d\n", (edx >> 27) & 0x1);
}
I get:
stepping 10
model 7
family 6
processor type 0
extended model 1
extended family 0
rdtscp 0

So rdtscp is not supported for penryn processor if this example code is ok. It means that Intel sde does not emulate/simulate correctly this instruction. It does seem to add the flag:
~/Downloads/sde-external-8.16.0-2018-01-30-lin/sde64 -- ./a.out
stepping 0
model 6
family 6
processor type 0
extended model 6
extended family 0
rdtscp 1

Would it be possible for you to fix the compilation errors that appear with a patched makefile like this?
diff --git a/Makefile b/Makefile
index 53ef001..3c58028 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,4 @@
-SIMD_FLAGS := -DSIMD_ENABLE=4 -msse4.2 -maes
+SIMD_FLAGS := -DSIMD_ENABLE=4 -msse4.1
override CXXFLAGS += -std=c++11 -Wall -O3 -fPIC -I. -I../ -Ithird_party/brotli/c/include/ -Wno-sign-compare
override LDFLAGS += -lpthread
A solution to compile without sse would be even better for generic debug and to run the code on other architectures (arm64).

from pik.

tudorsuciu avatar tudorsuciu commented on May 7, 2024

I wanted to see what sse4.2 instruction is used, so i deactivated the exit after "Cannot continue because CPU lacks SSE4 support". There is no instruction used in cpik/dpik that needs sse4.2. The only intrinsic that i found in simd library is:
mm_cmpgt_epi64
In my case the problem is fixed, i was able (finally) to check/evaluate the quality of pik on big images.
I don't know if the issue should be closed, it depends if there is somebody else impacted.

Regards,

from pik.

jan-wassenberg avatar jan-wassenberg commented on May 7, 2024

Congratulations, happy to hear you got it working!
The timers are fixed and will be in the next push.
Looks like reducing from SSE4.2 to SSE4.1 would cover about 1.4% more systems according to Steam Hardware survey. That's surprising and possibly worthwhile, I'll add it to the backlog. Thanks for suggesting and investigating this!

from pik.

jan-wassenberg avatar jan-wassenberg commented on May 7, 2024

Hello again, we've fixed this locally (removed cmpgt_epi64 and aes), so hopefully it would work in Penryn after the next push. Thanks again for suggesting and investigating this!

from pik.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.