[Bug] docker image exited with code 132 (restarting) (illegal instruction) #1695

Description

@cs8425

Git commit

b12098f
Crashing Docker image: ghcr.io/leejet/stable-diffusion.cpp:master-vulkan@sha256:15df896405c93b49c7d16123bbda18906a287681a06d88ba92f260796c868f2c
Other backend versions have not been tested yet.

Operating System & Version

Ubuntu 24.04.4

GGML backends

Vulkan

Command-line arguments used

./sd-server --listen-ip 0.0.0.0 --listen-port 8081 -v --mmap --rng cpu --diffusion-model ./anima-base/waiANIMA_v10Base10.safetensors --vae ./anima-base/qwen_image_vae.safetensors --llm ./anima-base/qwen_3_06b_base.safetensors --cfg-scale 1 --steps 8 --sampling-method er_sde --offload-to-cpu --vae-tiling

Steps to reproduce

compose.yml:

services:
  sd-server:
    image: ghcr.io/leejet/stable-diffusion.cpp:master-vulkan@sha256:15df896405c93b49c7d16123bbda18906a287681a06d88ba92f260796c868f2c
    container_name: sd-vulkan-server
    ports:
      - "8081:8081"
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    volumes:
      - ./models:/app/models
      - ./lora:/app/lora
      - ./upscale:/app/upscale
      - ./anima-base:/app/anima-base
    user: 1000:1000
    group_add:
      - video
      - 110 #render # Required for GPU access
    working_dir: /app
    entrypoint: ["/sd-server"]
    command: >
      --listen-ip 0.0.0.0
      --listen-port 8081
      -v
      --mmap
      --rng cpu
      --diffusion-model /app/anima-base/waiANIMA_v10Base10.safetensors
      --vae /app/anima-base/qwen_image_vae.safetensors
      --llm /app/anima-base/qwen_3_06b_base.safetensors
      --cfg-scale 1
      --steps 8
      --sampling-method er_sde
      --offload-to-cpu
      --vae-tiling
    restart: unless-stopped

Then run: docker compose up

What you expected to happen

sd-server starts normally without crashing.

What actually happened

The container exits with code 132 (illegal instruction).
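
For reference, an exit status above 128 conventionally means the process was killed by a signal, with the signal number being the status minus 128. So 132 corresponds to signal 4, which is SIGILL on Linux. A minimal sketch of that arithmetic (the helper name `describe_exit_code` is only for illustration, it is not part of sd-server or Docker):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit status to a signal name when it is above 128."""
    if code > 128:
        try:
            # 128 + N is the usual encoding for "killed by signal N".
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"normal exit ({code})"


if __name__ == "__main__":
    print(describe_exit_code(132))
```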

Logs / error messages / stack trace

docker container log:

sd-vulkan-server  | [DEBUG] main.cpp:82   - version: stable-diffusion.cpp version unknown, commit b12098f
sd-vulkan-server  | WARNING: radv is not a conformant Vulkan implementation, testing use only.
sd-vulkan-server  | ggml_vulkan: Found 2 Vulkan devices:
sd-vulkan-server  | ggml_vulkan: 0 = AMD Radeon Graphics (RADV GFX1200) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: KHR_coopmat
sd-vulkan-server  | ggml_vulkan: 1 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
sd-vulkan-server exited with code 132 (restarting)

Even with both --offload-to-cpu and --rng cpu removed, it still crashes before the line [DEBUG] main.cpp:83 - System Info: is printed.

Additional context / environment details

Does not work:

  • Docker image ghcr.io/leejet/stable-diffusion.cpp:master-vulkan@sha256:15df896405c93b49c7d16123bbda18906a287681a06d88ba92f260796c868f2c (same as master-714-b12098f)

Works:

  • release download master-714-b12098f: sd-master-b12098f-bin-Linux-Ubuntu-24.04-x86_64-vulkan.zip
  • Docker image ghcr.io/leejet/stable-diffusion.cpp:master-vulkan@sha256:3723a778e62fbcc8a1085ce53b758cf332f7198278552010499fc89190257611 (same as master-713-2bd249c)

Other details

The sha256 of sd-server differs between the release zip and the Docker image:

4133873aac5ca062ef26fe6d85c190249bd5e22faba3af8aa2c3108f544877f7  sd-server-release-fine
3f9fd64ca151a8e1fb42f2c7a84dd9040a1dbde37c5ebbea982fda829c5d0aef  sd-server-docker-crash
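
If someone wants to repeat this comparison, a rough sketch is below. The file names passed on the command line are whatever local copies you extracted from the release zip and from the image, and the helper names `sha256_of` and `same_binary` are made up for this example.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a: str, path_b: str) -> bool:
    """True when both files have identical sha256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python compare.py sd-server-release-fine sd-server-docker-crash
    for p in sys.argv[1:]:
        print(sha256_of(p), p)
    print("identical" if same_binary(sys.argv[1], sys.argv[2]) else "different")
```

Different digests only show that the two binaries are separate builds. They do not say which compile flags changed.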

Checked with gdb: the faulting instruction looks like AVX512 (my CPU does not support AVX512):

multi-thre Thread 0x7ffff7e9ec (asm) In: ggml_cpu_init                                                             L??   PC: 0x555555afa3ec 
#0  0x0000555555afa3ec in ggml_cpu_init ()
#1  0x0000555555aa929d in ggml_backend_cpu_reg ()
#2  0x0000555555aa40e9 in get_reg() ()
#3  0x0000555555aa449d in ggml_backend_dev_count ()
#4  0x00005555557237dc in sd_get_system_info::{lambda()#1}::operator()() const [clone .isra.0] ()
#5  0x0000555555723d9a in sd_get_system_info ()
#6  0x00005555555fc28a in main ()
(gdb) x/i $pc
=> 0x555555afa3ec <ggml_cpu_init+572>:  vmovdqa32 0x30f728a(%rip),%zmm0        # 0x555558bf1680

I have no idea why AVX512 got enabled between the two builds: master-713-2bd249c...master-714-b12098f
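
The vmovdqa32 instruction on a %zmm register is an AVX-512 instruction, so it raises SIGILL on a CPU without AVX-512 support. To check what the host CPU advertises, a small sketch that parses the Linux /proc/cpuinfo format is below. The function names are hypothetical, and the check assumes the usual "flags" line with entries such as avx512f.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text: str) -> bool:
    """True when any avx512* feature flag is listed."""
    return any(flag.startswith("avx512") for flag in cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("AVX512 advertised:", has_avx512(text))
```

If this prints False on the host while the image still executes zmm instructions, the binary was probably built with AVX-512 enabled unconditionally. That is a guess from the symptoms, not something confirmed from the build scripts.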
