Pipeline Invocation Response is Different From Model Invocation Response #6357

@charleschangdp

Description

Describe the bug

My team is deploying this HuggingFace model on Seldon v2.8.5. I'm running into an issue where the results from invoking the model and invoking the pipeline are materially different.

When calling the model directly via http://localhost:1234/v2/models/cc-hf-test/infer, the data field of the output tensor contains the vector as expected (e.g. "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]").

When calling the pipeline, the tensor data comes back base64 encoded instead (e.g. "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OTI0ODk2LCAwLjAxOTk1NTk1OTE3MTA1Njc0Nywg....").

The Kafka messages show the response as "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]". My best guess right now is that modelgateway is doing this when returning the response from the Kafka message to Envoy.
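To check that the pipeline payload really is the same kind of data with a base64 layer on top, the truncated string quoted above can be decoded directly. This is a minimal sketch using only the Python standard library; the variable names are made up for illustration, and it only works on the quoted prefix because that prefix happens to be a whole number of 4-character base64 groups.

```python
import base64

# Truncated prefix of the pipeline output quoted in this issue. Its length is
# a multiple of 4, so it decodes cleanly without the rest of the payload.
pipeline_data_prefix = (
    "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OTI0ODk2"
    "LCAwLjAxOTk1NTk1OTE3MTA1Njc0Nywg"
)

# Decode base64 to bytes, then interpret the bytes as UTF-8 text.
decoded_prefix = base64.b64decode(pipeline_data_prefix).decode("utf-8")
print(decoded_prefix)
```

The decoded text starts with "[[-0.1002608984708786, 0.13002870976924896, ", i.e. the beginning of a JSON-style nested list of floats, the same format the direct model call returns as a plain string.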

To reproduce

model-settings.json

{
 "implementation": "mlserver_huggingface.HuggingFaceRuntime",
 "parameters": {
  "extra": {
   "optimum_model": "true",
   "pretrained_model": "optimum/all-MiniLM-L6-v2",
   "task": "feature-extraction"
  }
 }
}

manifests

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: cc-hf-test
  namespace: seldon
spec:
  requirements:
    - huggingface
  secretName: seldon-rclone-s3-secret
  storageUri: s3://seldonbucket/cc-hf-test/1/
  replicas: 2
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  annotations:
  name: cc-hf-test
  namespace: seldon
spec:
  capabilities:
  - huggingface
  podSpec:
    containers:
    - image: seldonio/mlserver:1.6.1-huggingface
      name: mlserver
      resources:
        requests:
          cpu:     "2"
          memory:  4Gi
        limits:
          cpu:     "2"
          memory:  4Gi
    serviceAccountName: sa
  replicas: 2
  serverConfig: mlserver
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: cc-hf-test-pipeline
  namespace: seldon
spec:
  output:
    steps:
    - cc-hf-test
    stepsJoin: inner
  steps:
  - inputsJoinType: inner
    name: cc-hf-test

Invocation

curl --location 'http://localhost:1234/v2/pipelines/cc-hf-test-pipeline/infer' \
--header 'Content-Type: application/json' \
--data '{
    "inputs": [
        {
            "name": "args",
            "shape": [
                2
            ],
            "datatype": "BYTES",
            "data": [
                "interesting issue"
            ]
        }      
    ]
}'

Result

{
  "model_name": "",
  "outputs": [
    {
      "data": [
        "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OT.... TRUNCATED"
      ],
      "name": "output",
      "shape": [
        1,
        1
      ],
      "datatype": "BYTES",
      "parameters": {
        "content_type": "hg_jsonlist"
      }
    }
  ]
}
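Until the root cause is found, a client could normalise the pipeline response itself. The following is only a sketch of such a workaround, not a fix in Seldon: `maybe_b64_decode` and `decode_bytes_outputs` are hypothetical helper names, and the approach assumes that any BYTES element which is strictly valid base64 and decodes to UTF-8 was base64 encoded in transit. That heuristic can misfire on short plain strings that happen to be valid base64.

```python
import base64
import binascii
import copy


def maybe_b64_decode(value):
    """Return the base64-decoded text if value is strict base64, else value unchanged."""
    if not isinstance(value, str):
        return value
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # so a plain string such as "[[-0.19, ...]]" is left untouched.
        return base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return value


def decode_bytes_outputs(response):
    """Return a copy of a V2 inference response with BYTES outputs decoded."""
    decoded = copy.deepcopy(response)
    for output in decoded.get("outputs", []):
        if output.get("datatype") == "BYTES":
            output["data"] = [maybe_b64_decode(item) for item in output.get("data", [])]
    return decoded


# Illustrative response shaped like the pipeline result above, built from a
# made-up short vector rather than the real (truncated) payload.
sample_response = {
    "model_name": "",
    "outputs": [
        {
            "name": "output",
            "shape": [1, 1],
            "datatype": "BYTES",
            "parameters": {"content_type": "hg_jsonlist"},
            "data": [base64.b64encode(b"[[-0.1, 0.2]]").decode("ascii")],
        }
    ],
}

normalised = decode_bytes_outputs(sample_response)
print(normalised["outputs"][0]["data"])
```

Applying the same function to a direct model response leaves it unchanged, because the plain "[[...]]" string is not valid base64, so both invocation paths can be passed through it before comparing.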

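Expected behaviour

The pipeline response should carry the same plain (not base64 encoded) tensor data as the direct model invocation: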

{
    "model_name": "cc-hf-test_1",
    "model_version": "1",
    "id": "261005c3-0721-4406-b7f5-1cab60744815",
    "parameters": {},
    "outputs": [
        {
            "name": "output",
            "shape": [
                1,
                1
            ],
            "datatype": "BYTES",
            "parameters": {
                "content_type": "hg_jsonlist"
            },
            "data": [
                "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628,...TRUNCATED]]"
            ]
        }
    ]
}

Environment

Model Details

  • Images of your model: seldonio/mlserver:1.6.1-huggingface
  • Logs of your model:
mlserver 2025-03-28 20:17:51,652 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
mlserver Ignoring args : ('',)
agent time="2025-03-28T20:17:51Z" level=debug msg="Extracted model name seldon-internal-model:cc-hf-test_1 seldon-model:cc-hf-test" Source=GRPCProxy
agent time="2025-03-28T20:17:51Z" level=debug msg="Ensure that model cc-hf-test_1 is loaded in memory" Source=StateManager
agent time="2025-03-28T20:17:51Z" level=debug msg="Model exists in cache cc-hf-test_1" Source=StateManager
agent time="2025-03-28T20:17:51Z" level=debug msg="Request ids from incoming meta [cvjg7rr51kgc73dm4prg]" Source=GRPCProxy