[QNN] InternVL3-1B fails to load on SA8295 (V68) due to insufficient HTP PD memory #18410

@fgsiveone


🐛 Describe the bug

Bug Report: InternVL3-1B vision encoder fails to load on SA8295 (V68) with "Failed to find available PD" error

Environment

  • Device: SA8295
  • QNN SDK Version: 2.44.0
  • ExecuTorch: 60d57e5
  • Model: InternVL3-1B (--decoder_model internvl3_1b)

Description

I followed issue #18280 to run the InternVL3-1B model on the SA8295 platform, but the vision encoder fails to load on device with the errors below. Does anyone have suggestions or insights on how to resolve this? My local changes are shown in the diff below.
CC @cccclai @quic-boyuc @haowhsu-quic @shewu-quic @chunit-quic @winskuo-quic

diff --git a/backends/qualcomm/quantizer/custom_annotation.py b/backends/qualcomm/quantizer/custom_annotation.py
index 54456877c1..825dec4dc6 100644
--- a/backends/qualcomm/quantizer/custom_annotation.py
+++ b/backends/qualcomm/quantizer/custom_annotation.py
@@ -10,6 +10,7 @@ import torch
 from executorch.backends.qualcomm.quantizer.quantizer import (
     get_16a8w_qnn_ptq_config,
     get_16a8w_qnn_qat_config,
+    get_16a4w_qnn_ptq_config,
     get_8a8w_qnn_ptq_config,
     get_8a8w_qnn_qat_config,
     get_ptq_per_channel_quant_config,
@@ -27,6 +28,7 @@ from torchao.quantization.pt2e.quantizer import (
     annotate_output_qspec,
     QuantizationAnnotation,
     SharedQuantizationSpec,
+    QuantizationSpec,
 )


@@ -464,3 +466,30 @@ def get_custom_quant_ios_dtype(
         u.target for u in list(node.users.keys())
     ] + [node.target]:
         return sharding_dtype
+
+def custom_annotation_16a4w_layer_norm(gm):
+    use_16a4w_config = get_16a4w_qnn_ptq_config()
+    use_16a4w_config.weight = QuantizationSpec(
+        dtype=torch.uint8,
+        quant_min=0,
+        quant_max=15,
+        qscheme=torch.per_tensor_symmetric,
+        ch_axis=0,
+        observer_or_fake_quant_ctr=use_16a4w_config.weight.observer_or_fake_quant_ctr,
+    )
+    for node in gm.graph.nodes:
+        if node.target != torch.ops.aten.layer_norm.default:
+            continue
+        # aten.layer_norm args: (input, normalized_shape, weight, bias, eps, ...)
+        act_node = node.args[0]
+        weight_node = node.args[2]
+        input_qspec_map = {
+            act_node: use_16a4w_config.input_activation,
+            weight_node: use_16a4w_config.weight,
+        }
+        # bias lives at args[3], so guard on len > 3 (not > 2, which is the weight slot)
+        if len(node.args) > 3 and node.args[3] is not None:
+            input_qspec_map[node.args[3]] = use_16a4w_config.bias
+
+        node.meta[Q_ANNOTATION_KEY] = QuantizationAnnotation(
+            input_qspec_map=input_qspec_map,
+            output_qspec=use_16a4w_config.output_activation,
+            _annotated=True,
+        )

diff --git a/examples/qualcomm/oss_scripts/llama/__init__.py b/examples/qualcomm/oss_scripts/llama/__init__.py
index b28b4752c1..adfca110b6 100644
--- a/examples/qualcomm/oss_scripts/llama/__init__.py
+++ b/examples/qualcomm/oss_scripts/llama/__init__.py
@@ -528,7 +528,7 @@ class InternVL3_1B(LLMModelConfig):
     convert_weights = convert_internvl3_weights
     transform_weight = False
     instruct_model = True
-    num_sharding = 1
+    num_sharding = 24
     masked_softmax = True
     seq_mse_candidates = 0
     r1 = False
diff --git a/examples/qualcomm/oss_scripts/llama/llama.py b/examples/qualcomm/oss_scripts/llama/llama.py
index 5449599acc..48ab2cfc7a 100755
--- a/examples/qualcomm/oss_scripts/llama/llama.py
+++ b/examples/qualcomm/oss_scripts/llama/llama.py
@@ -123,7 +123,8 @@ def compile(
             # In multi-image scenarios, we skip encoder quantization by default to preserve modality feature quality,
             # because the encoder is quite sensitive and quantization can make it harder for the model to distinguish
             # between images within the same conversation.
-            to_skip = len(args.image_path) > 1
+            # to_skip = len(args.image_path) > 1
+            to_skip = True
             backend_options = generate_htp_compiler_spec(
                 use_fp16=to_skip,
             )
diff --git a/examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py b/examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py
index ca06a89142..13c4374861 100644
--- a/examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py
+++ b/examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py
@@ -7,7 +7,7 @@
 from typing import Optional

 import torch
-from executorch.backends.qualcomm.quantizer.custom_annotation import annotate_kv_8bit
+from executorch.backends.qualcomm.quantizer.custom_annotation import annotate_kv_8bit, custom_annotate_matmul_16a8w, custom_annotation_16a4w_layer_norm
 from executorch.backends.qualcomm.quantizer.quant_recipe import (
     QuantGranularity,
     QuantRecipe,
@@ -419,15 +419,16 @@ class InternVL3_1B_QuantRecipe(StaticLLMQuantRecipe):
             act_observer=MinMaxObserver,
             granularity=QuantGranularity.PER_TENSOR,
             verbose=verbose,
-        ).add_node_target(
-            {
-                torch.ops.aten.conv2d.default,
-            },
-            QuantDtype.use_16a8w,
-            False,
-            act_observer=MinMaxObserver,
-            granularity=QuantGranularity.PER_CHANNEL,
+        # ).add_node_target(
+        #     {
+        #         torch.ops.aten.conv2d.default,
+        #     },
+        #     QuantDtype.use_16a8w,
+        #     False,
+        #     act_observer=MinMaxObserver,
+        #     granularity=QuantGranularity.PER_CHANNEL,
         )
+        self.recipe.custom_quant_annotations.extend([custom_annotation_16a4w_layer_norm, custom_annotate_matmul_16a8w, annotate_kv_8bit])
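For context on the `num_sharding` change: sharding splits the text decoder's transformer layers into multiple QNN context binaries so that each one is small enough to fit in a single HTP protection domain. A minimal sketch of the partitioning arithmetic (illustrative only, not the ExecuTorch implementation; the 24-layer count for the InternVL3-1B decoder is my assumption):

```python
def shard_layers(num_layers: int, num_sharding: int) -> list[list[int]]:
    """Split layer indices into num_sharding contiguous groups (last may be smaller)."""
    per_shard = -(-num_layers // num_sharding)  # ceiling division
    return [
        list(range(i, min(i + per_shard, num_layers)))
        for i in range(0, num_layers, per_shard)
    ]

# Assuming the InternVL3-1B text decoder has 24 transformer layers:
print(shard_layers(24, 1))   # one context holding all 24 layers
print(shard_layers(24, 24))  # 24 contexts, one layer each
```

With `num_sharding = 24` each decoder layer becomes its own context, which matches the long run of per-shard load messages in the runtime log.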

Conversion Command

python examples/qualcomm/oss_scripts/llama/llama.py \
  -b build-android \
  -m SA8295 \
  --decoder_model internvl3_1b \
  --model_mode hybrid \
  --prefill_ar_len 32 \
  --max_seq_len 1024 \
  --prompt "Can you describe this image?" \
  --image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
  --compile_only

Artifact

-rw-r--r-- 1 root root  481 Mar 23 12:38 chat_template.jinja
-rw-r--r-- 1 root root 2.4G Mar 23 12:44 decode_qdq.pt2
-rw-r--r-- 1 root root 481M Mar 23 12:59 hybrid_llama_qnn.pte
-rw-r--r-- 1 root root 260M Mar 23 12:49 tok_embedding_qnn.pte
-rw-r--r-- 1 root root  11M Mar 23 12:38 tokenizer.json
-rw-r--r-- 1 root root 1.6K Mar 23 12:38 tokenizer_config.json
-rw-r--r-- 1 root root 605M Mar 23 12:49 vision_encoder_qnn.pte

Execution Command

./qnn_multimodal_runner \
  -decoder_model_version internvl3 \
  -decoder_path hybrid_llama_qnn.pte \
  -encoder_path vision_encoder_qnn.pte \
  -tok_embedding_path tok_embedding_qnn.pte \
  -tokenizer_path tokenizer.json \
  -eval_mode 1 \
  -image_path ./pic1.jpg \
  -prompt "Describe this image:" \
  -seq_len 256

Error Log

I tokenizers:regex.cpp:27] Registering override fallback regex
I 00:00:00.000505 executorch:qnn_multimodal_runner.cpp:247] Load Encoder: vision_encoder_qnn.pte
I 00:00:00.000529 executorch:qnn_multimodal_runner.cpp:254] Load Token Embedding: tok_embedding_qnn.pte
I 00:00:00.000542 executorch:qnn_multimodal_runner.cpp:261] Load Text Decoder: hybrid_llama_qnn.pte
I 00:00:00.000635 executorch:qnn_multimodal_runner.cpp:139] Starting multimodal runner
I 00:00:00.000642 executorch:multimodal_runner.cpp:144] creating runner: tokenizer_path=tokenizer.json
I 00:00:00.000644 executorch:multimodal_runner.cpp:145] eval mode=1
I tokenizers:normalizer.cpp:102] Using NFC normalizer. Please notice that our implementation may not handle all edge cases.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1750716851.356408   25999 re2.cc:237] Error parsing '((?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s...': invalid perl operator: (?!
I tokenizers:re2_regex.cpp:27] Re2 failed to compile regex: ((?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+), error: invalid perl operator: (?!
This may be ok if a fallback regex is used.
I tokenizers:regex_lookahead.cpp:27] Creating PCRE2 regex
I 00:00:00.710134 executorch:llm_runner_helper.cpp:54] Loaded json tokenizer
[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: Creating new backend bundle.
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 1
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[INFO] [Qnn ExecuTorch]: QnnContextCustomProtocol expected magic number: 0x5678abcd but get: 0x2000000
[INFO] [Qnn ExecuTorch]: Running level=1 optimization.
[INFO] [Qnn ExecuTorch]: Running level=1 optimization.
[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: Use cached backend bundle for current backend: kHtpBackend
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[INFO] [Qnn ExecuTorch]: QnnContextCustomProtocol expected magic number: 0x5678abcd but get: 0x2000000
[INFO] [Qnn ExecuTorch]: Running level=1 optimization.
[INFO] [Qnn ExecuTorch]: Running level=1 optimization.
[... the 7-line block above repeats 22 more times, once per additional shard; identical lines omitted ...]
[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: Use cached delegate handle for current method: kv_forward
[... the two lines above repeat 23 more times, once per shard; identical lines omitted ...]
I 00:00:01.556410 executorch:multimodal_runner.cpp:219] Reading metadata from model
[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: Use cached backend bundle for current backend: kHtpBackend
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[INFO] [Qnn ExecuTorch]: QnnContextCustomProtocol expected magic number: 0x5678abcd but get: 0x2000000
[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Skel failed to process context binary.

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context create from binary failed for deviceId 0 coreId 0 pdId 0 for context 19, err 5005

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context 25 failed on pd 0

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Skel failed to process context binary.

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context create from binary failed for deviceId 0 coreId 0 pdId 2 for context 19, err 5005

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context 25 failed on pd 2

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Skel failed to process context binary.

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context create from binary failed for deviceId 0 coreId 0 pdId 3 for context 19, err 5005

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context 25 failed on pd 3

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Skel failed to process context binary.

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context create from binary failed for deviceId 0 coreId 0 pdId 1 for context 19, err 5005

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context 25 failed on pd 1

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Failed to find available PD for contextId 25 on deviceId 0 coreId 0with context size estimate 697858304

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> context create from binary failed on contextId 25, err = 1002

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> context create from binary failed for contextId 25, err = 1002

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Fail to create context from binary with err 1002

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Size Calculation encounter error! Doing Hard reset of reserved mem to 0.

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Failed to create context from binary with err 0x3ea

[ERROR] [Qnn ExecuTorch]: Can't create context from binary. Error 1002.
E 00:00:02.772253 executorch:QnnManager.cpp:268] Fail to configure Qnn context
E 00:00:02.772262 executorch:QnnExecuTorchBackend.cpp:99] Fail to initialize Qnn Manager
E 00:00:02.772269 executorch:method.cpp:114] Init failed for backend QnnBackend: 0x1
E 00:00:02.829077 executorch:encoder.cpp:36] Failed to load encoder method
F 00:00:02.829107 executorch:result.h:170] In function CheckOk(), assert failed: hasValue_
Aborted
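A back-of-envelope check of the size in the PD error (my interpretation of the log, not from QNN documentation): `num_sharding` only splits the text decoder, so with `to_skip = True` the fp16 vision encoder appears to compile into a single large context that no protection domain can hold.

```python
# Size taken verbatim from the "Failed to find available PD" error above.
context_size_estimate = 697_858_304  # bytes

mib = context_size_estimate / 2**20
print(f"encoder context estimate: {mib:.1f} MiB")  # ~665.5 MiB
# Every protection domain tried (pdId 0, 1, 2, 3) rejected a context this
# large, so context creation fails with err 1002 before the encoder method
# can be loaded.
```

That ~665 MiB estimate is the same order of magnitude as the 605M `vision_encoder_qnn.pte` artifact, which is consistent with the encoder being the context that fails to load.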

Versions

PyTorch version: 2.11.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.10
Libc version: glibc-2.35

Python version: 3.10.20 | packaged by conda-forge | (main, Mar 5 2026, 16:42:22) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-5.10.134-16.3.al8.x86_64-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz
CPU family: 6
Model: 106
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Stepping: 6
CPU max MHz: 3500.0000
CPU min MHz: 800.0000
BogoMIPS: 5800.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 3 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 80 MiB (64 instances)
L3 cache: 96 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] executorch==1.3.0a0+60d57e5
[pip3] numpy==2.2.6
[pip3] pytorch_tokenizers==1.1.0
[pip3] torch==2.11.0+cpu
[pip3] torchao==0.17.0+git42bcdc491
[pip3] torchaudio==2.11.0+cpu
[pip3] torchdata==0.11.0+cpu
[pip3] torchsr==1.0.4
[pip3] torchtune==0.0.0
[pip3] torchvision==0.26.0+cpu
[conda] executorch 1.3.0a0+60d57e5 pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] pytorch-tokenizers 1.1.0 pypi_0 pypi
[conda] torch 2.11.0+cpu pypi_0 pypi
[conda] torchao 0.17.0+git42bcdc491 pypi_0 pypi
[conda] torchaudio 2.11.0+cpu pypi_0 pypi
[conda] torchdata 0.11.0+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.0.0 pypi_0 pypi
[conda] torchvision 0.26.0+cpu pypi_0 pypi

cc @cccclai @cbilgin

Labels: module: qnn (Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/)