Comments (18)

da-phil avatar da-phil commented on August 16, 2024 1

Got it. Just to clarify, I'm using ROCm since rusticl doesn't support my iGPU, but the message is the same of course.

@tomaz-suller
Did you try what @garrett proposed, namely setting the HSA_OVERRIDE_GFX_VERSION environment variable so that ROCm treats your iGPU as a supported GPU?

Another guess: does the "classic" OpenCL driver work for iGPUs?
E.g. by installing it with sudo amdgpu-install --usecase=graphics,opencl --opencl=rocr

I'm interested in this issue as I'm currently planning to buy a laptop (TUXEDO Pulse 14 Gen 4) featuring a Radeon 780M iGPU.

In this forum they claimed that ROCm should work for this iGPU:

Note that we use HSA_OVERRIDE_GFX_VERSION=11.0.0 because the 780m iGPU is gfx1103 (version 11.0.3) which ROCm does not support, but in my experience using the override to tell ROCm to pretend it is gfx1100 seems to work without issue.
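The naming pattern in the quote above (gfx1103 corresponds to 11.0.3, gfx1100 to 11.0.0) can be sketched in Python. This is only an illustration of that pattern: the function name gfx_to_version is hypothetical, and it assumes the last two digits are the minor version and stepping while everything before them is the major version. Target names containing letters are rejected rather than guessed at.

```python
import re


def gfx_to_version(target: str) -> str:
    """Convert a target name such as 'gfx1103' to a dotted version such as '11.0.3'.

    Assumption: the last two digits are minor and stepping, and the digits
    before them are the major version. Only purely decimal names are handled.
    """
    match = re.fullmatch(r"gfx(\d+)(\d)(\d)", target)
    if match is None:
        raise ValueError(f"unrecognized target name: {target!r}")
    major, minor, stepping = match.groups()
    return f"{int(major)}.{minor}.{stepping}"


print(gfx_to_version("gfx1103"))  # prints 11.0.3
print(gfx_to_version("gfx1100"))  # prints 11.0.0
```

Whether overriding to a neighbouring version actually works depends on the hardware and the ROCm release, so the result is a starting point for experiments rather than a guarantee.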

from darktable.

tomaz-suller avatar tomaz-suller commented on August 16, 2024

Forgot to add that the log resulted from me not only opening the picture, but also zooming it in until the glitch disappeared at 62% zoom.


gi-man avatar gi-man commented on August 16, 2024

Does ROCm 6.0 support iGPUs? I think APUs and iGPUs were not supported in previous versions.


tomaz-suller avatar tomaz-suller commented on August 16, 2024

To be entirely honest I'm not sure, and I'm not sure about how to check either; if you give any instructions I can follow them.

What I do know, as the logs and my GPU usage according to nvtop show, is that the GPU is detected and that darktable is using it.


gi-man avatar gi-man commented on August 16, 2024

I know the ROCm 5.7 drivers do load, and they produce images with errors when I use my APU on Linux. One of the ROCm websites lists the supported/tested systems for Linux. I do know the drivers work on Windows.


tomaz-suller avatar tomaz-suller commented on August 16, 2024

That seems to be precisely what is going on here. The drivers load and the device is detected by both darktable and clinfo, which is not the case with Rusticl (since apparently it doesn't support APUs at all).


jenshannoschwalm avatar jenshannoschwalm commented on August 16, 2024

@tomaz-suller would you be able to check with current master? I think the issue should be gone by now; if not, please provide a fresh log with -d opencl -d pipe so we can investigate further.


tomaz-suller avatar tomaz-suller commented on August 16, 2024

Just tested, same behaviour still. Just to be sure I didn't mess up during the installation, here's what I installed:

darktable d496f37
Copyright (C) 2012-2024 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.3.0
  Colord                 -> ENABLED
  gPhoto2                -> ENABLED
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> ENABLED
  libheif                -> ENABLED
  libjxl                 -> ENABLED
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED

And here are the logs.


tomaz-suller avatar tomaz-suller commented on August 16, 2024

The first output is just that of /opt/darktable-test/bin/darktable --version.

To produce the logs I ran darktable with /opt/darktable-test/bin/darktable --configdir "~/.config/darktable-test" -d pipe -d opencl, imported 3 NEF files, opened two of them and zoomed in and out.
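For anyone who wants to reproduce such a log run, here is a minimal Python sketch built around the command quoted above. The helper names build_darktable_command and run_and_log and the log file name are hypothetical; the binary path, --configdir, and the -d pipe -d opencl flags are taken from the comment. Note that a shell does not expand ~ inside double quotes, so the sketch expands it explicitly.

```python
import os
import subprocess


def build_darktable_command(binary, configdir, debug_flags=("pipe", "opencl")):
    """Return the argument list for a darktable debug run."""
    # expanduser turns a leading "~" into the home directory.
    cmd = [binary, "--configdir", os.path.expanduser(configdir)]
    for flag in debug_flags:
        cmd += ["-d", flag]
    return cmd


def run_and_log(cmd, log_path):
    """Run the command, write stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode


if __name__ == "__main__":
    cmd = build_darktable_command(
        "/opt/darktable-test/bin/darktable", "~/.config/darktable-test"
    )
    print(" ".join(cmd))
    # Uncomment to actually launch darktable and capture the log:
    # run_and_log(cmd, "darktable-debug.log")
```

Keeping the command construction separate from the launch makes it easy to check the arguments before starting the application.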


jenshannoschwalm avatar jenshannoschwalm commented on August 16, 2024

The device offers only 1 GB of RAM, so there is a huge amount of tiling. That makes it difficult to track down from here.

  1. Can you somehow control the size of the dedicated RAM, maybe via BIOS settings?

  2. Could you check with the resources=small setting? Also with logs as above?

  3. Are there any modules you can switch off so that the issue goes away?

  4. Can you please confirm that the issue goes away while zooming in?


tomaz-suller avatar tomaz-suller commented on August 16, 2024

  1. Didn't understand what you mean. I've never tried controlling it, but for sure I can't increase it since I'm running darktable on my laptop, which is the only computer I have, if that's what you're asking; frankly I don't know if it's possible to reduce it.

  2. Still same problem. Logs are here. Just to be 100% sure, this is what you mean by resources=small, right?
    Screenshot_20240320_114300

  3. I'm a beginner in darktable, and I have the default install from master, so I'm a bit clueless about what the "modules" would be. How could I go about disabling them?

  4. Yes, the issue goes away when zooming in, at roughly the same level as before (around 38%)


gi-man avatar gi-man commented on August 16, 2024

The rusticl developer pushed an update to the memory allocation, and I think it applies to all of Mesa (ROCm and rusticl). I'm not sure if it is merged. This should help with the 1 GB of memory.


garrett avatar garrett commented on August 16, 2024

In previous versions of darktable, it seemed to pick the correct device with ROCm.

At some point recently, ROCm and/or darktable changed, and I started seeing these glitches too.

I worked around this issue by adding this snippet to /etc/environment (and logged out and back in) and now darktable works very well again with my 7900 XTX:

HSA_OVERRIDE_GFX_VERSION=11.0.0

(It would be different for different video cards; this is for the 7000 series. For example, I think HSA_OVERRIDE_GFX_VERSION=10.3.0 would be needed for the 6000 series.)

Previously, setting this was unnecessary and darktable picked the correct GPU. In other words, I'm not suggesting this as a solution, only as a temporary workaround and perhaps as a hint as to what the problem might be.
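To try the override for a single run instead of system-wide via /etc/environment, the variable can be set only in the environment of the launched process. The following Python sketch assumes a darktable binary on the PATH; the helper name env_with_gfx_override is hypothetical, and the launch itself is left commented out.

```python
import os


def env_with_gfx_override(version, base_env=None):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return env


env = env_with_gfx_override("11.0.0")
print(env["HSA_OVERRIDE_GFX_VERSION"])  # prints 11.0.0

# Example launch (not executed here):
# import subprocess
# subprocess.run(["darktable", "-d", "opencl"], env=env)
```

Because only a copy of the environment is modified, other applications and later sessions are unaffected, which makes it easier to compare behaviour with and without the override.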

I did disable the iGPU on my AMD Ryzen 9 7950X3D, and rocminfo still shows the CPU as well as my discrete GPU. So I'm guessing that darktable isn't picking the correct GPU or trying to use them all.

After disabling the iGPU, I see this:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 7950X3D 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 7950X3D 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5759                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    65467276(0x3e6f38c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65467276(0x3e6f38c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65467276(0x3e6f38c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-16f2a3584821508f               
  Marketing Name:          AMD Radeon RX 7900 XTX             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      6144(0x1800) KB                    
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29772(0x744c)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2371                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            96                                 
  SIMDs per CU:            2                                  
  Shader Engines:          6                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 550                                
  SDMA engine uCode::      19                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

Note the 2 "agents" for ROCm; first is CPU, second is GPU. If I re-enable my iGPU, I'd probably have 3 listed.
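Counting the agents by eye in a long rocminfo dump is error-prone, so here is a small Python sketch that lists them. It relies only on the layout visible in the output above ("Agent N" header lines followed by "Key: value" lines); the function name parse_agents is hypothetical, and other rocminfo versions may format things differently.

```python
import re


def parse_agents(rocminfo_text):
    """Return a list of (name, device_type) tuples, one per HSA agent."""
    agents = []
    current = None
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"Agent \d+", stripped):
            current = {}
            agents.append(current)
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            key = key.strip()
            # Keep the first occurrence only: "Name:" appears again in the ISA section.
            if key in ("Name", "Marketing Name", "Device Type") and key not in current:
                current[key] = value.strip()
    return [
        (agent.get("Marketing Name", agent.get("Name")), agent.get("Device Type"))
        for agent in agents
    ]


# Shortened excerpt of the output above, used as sample input.
SAMPLE = """\
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 7950X3D 16-Core Processor
  Marketing Name:          AMD Ryzen 9 7950X3D 16-Core Processor
  Device Type:             CPU
*******
Agent 2
*******
  Name:                    gfx1100
  Marketing Name:          AMD Radeon RX 7900 XTX
  Device Type:             GPU
"""

for name, device_type in parse_agents(SAMPLE):
    print(f"{device_type}: {name}")
```

With the iGPU re-enabled, a third entry would presumably show up in the returned list.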


jenshannoschwalm avatar jenshannoschwalm commented on August 16, 2024

@tomaz-suller what you describe points to a memory allocation problem. I'm not sure yet whether it's a rusticl bug or dt doing something wrong, like overallocating memory. The result of running dt with the small resources setting makes a rusticl problem more likely.


tomaz-suller avatar tomaz-suller commented on August 16, 2024

Got it. Just to clarify, I'm using ROCm since rusticl doesn't support my iGPU, but the message is the same of course.


jenshannoschwalm avatar jenshannoschwalm commented on August 16, 2024

@tomaz-suller the question about modules was "which dt modules are you using?" They are listed on the right side in the darkroom. Could you set demosaic to LMMSE and check for the issue? Could you disable highlights and check? This would be a test on your side to find where the issue might be coming from. The log shows that OpenCL is running fine and didn't report an issue...


jenshannoschwalm avatar jenshannoschwalm commented on August 16, 2024

OK, the AMD driver is notorious for problems :-) Maybe you can identify the bad module and we can fix it in dt. :-) It might be worth testing whether OpenCL on such a small device helps performance at all...


pomoke avatar pomoke commented on August 16, 2024

I have this problem as well, with an integrated Vega 7 and lossy ARWs. If the interpolator is changed, the stripes have a different pattern.

Darktable: 4.6.1
Log from darktable -d pipe -d opencl:

darktable 4.6.1
Copyright (C) 2012-2024 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.2.0
  Colord                 -> ENABLED
  gPhoto2                -> ENABLED
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> ENABLED
  libheif                -> ENABLED
  libjxl                 -> ENABLED
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

     0.2291 [dt_get_sysresource_level] switched to 2 as `large'
     0.2291   total mem:       29803MB
     0.2291   mipmap cache:    3725MB
     0.2292   available mem:   20373MB
     0.2292   singlebuff:      465MB
     0.2573 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
     0.4118 [opencl_init] found 3 platforms
     0.4119 [check platform] platform 'rusticl' with key 'clplatform_rusticl' is NOT active
     0.4257 [check platform] platform 'Portable Computing Language' with key 'clplatform_portablecomputinglanguage' is NOT active
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'gfx900:xnack-'
   PLATFORM, VENDOR & ID:    AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
   CANONICAL NAME:           amdacceleratedparallelprocessinggfx900xnack
   DRIVER VERSION:           3590.0 (HSA1.1,LC)
   DEVICE VERSION:           OpenCL 2.0 
   DEVICE_TYPE:              GPU, dedicated mem
   GLOBAL MEM SIZE:          2048 MB
   MAX MEM ALLOC:            1741 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   USE HEADROOM:             400Mb
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH & HEIGHT    16x16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/<redacted>/.cache/darktable/cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx900xnack_35900HSA11LC
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   CL COMPILER COMMAND:      -w -cl-fast-relaxed-math  -DAMD=1 -I"/usr/share/darktable/kernels"
   KERNEL LOADING TIME:       0.0634 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init]		0	'AMD Accelerated Parallel Processing gfx900:xnack-'
     0.9185 [opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	-1	0	0	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	-1	0	0	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200
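Since the amount of device memory is central to this thread, here is a small Python sketch that pulls the memory figures out of a [dt_opencl_device_init] block like the one above. The function name parse_device_memory is hypothetical, and the regular expressions assume exactly the line layout shown in this log. The subtraction at the end is only a rough illustration of how little memory remains once the headroom is set aside; it is not darktable's actual tiling logic.

```python
import re


def parse_device_memory(log_text):
    """Extract memory figures (in MB) from a [dt_opencl_device_init] block."""
    patterns = {
        "global_mem_mb": r"GLOBAL MEM SIZE:\s+(\d+)\s*MB",
        "max_alloc_mb": r"MAX MEM ALLOC:\s+(\d+)\s*MB",
        "headroom_mb": r"USE HEADROOM:\s+(\d+)\s*Mb",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        if match:
            result[key] = int(match.group(1))
    return result


# Lines copied from the log above, used as sample input.
SAMPLE = """\
   GLOBAL MEM SIZE:          2048 MB
   MAX MEM ALLOC:            1741 MB
   USE HEADROOM:             400Mb
"""

info = parse_device_memory(SAMPLE)
print(info)
# Rough illustration only: global memory minus the configured headroom.
print(info["global_mem_mb"] - info["headroom_mb"])  # prints 1648
```

Comparing these figures across logs from different drivers or settings may help to see whether the reported device memory changes.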

This is how it looks:
image

With LMMSE demosaic, the fit view works.

image
