Pipeline Invocation Response is Different From Model Invocation Response #6357

## Describe the bug

My team is deploying this [HuggingFace model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on Seldon v2.8.5. I'm running into an issue where the results from invoking the model and invoking the pipeline are materially different.

When calling the model directly via `http://localhost:1234/v2/models/cc-hf-test/infer`, the data part of the tensor contains the vector as expected (e.g. `"[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]"`). When calling the pipeline, however, I get a base64-encoded response in the tensor data (e.g. `"W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OTI0ODk2LCAwLjAxOTk1NTk1OTE3MTA1Njc0Nywg.... "`).

The Kafka messages show the unencoded response `"[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]"`. My best guess right now is that modelgateway does the encoding when it returns the response from the Kafka message to Envoy.

## To reproduce

model-settings.json
```
{
 "implementation": "mlserver_huggingface.HuggingFaceRuntime",
 "parameters": {
  "extra": {
   "optimum_model": "true",
   "pretrained_model": "optimum/all-MiniLM-L6-v2",
   "task": "feature-extraction"
  }
 }
}
``` 
manifests
```
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: cc-hf-test
  namespace: seldon
spec:
  requirements:
    - huggingface
  secretName: seldon-rclone-s3-secret
  storageUri: s3://seldonbucket/cc-hf-test/1/
  replicas: 2
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  annotations:
  name: cc-hf-test
  namespace: seldon
spec:
  capabilities:
  - huggingface
  podSpec:
    containers:
    - image: seldonio/mlserver:1.6.1-huggingface
      name: mlserver
      resources:
        requests:
          cpu:     "2"
          memory:  4Gi
        limits:
          cpu:     "2"
          memory:  4Gi
    serviceAccountName: sa
  replicas: 2
  serverConfig: mlserver
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: cc-hf-test-pipeline
  namespace: seldon
spec:
  output:
    steps:
    - cc-hf-test
    stepsJoin: inner
  steps:
  - inputsJoinType: inner
    name: cc-hf-test
```
Invocation
```
curl --location 'http://localhost:1234/v2/pipelines/cc-hf-test-pipeline/infer' \
--header 'Content-Type: application/json' \
--data '{
    "inputs": [
        {
            "name": "args",
            "shape": [
                2
            ],
            "datatype": "BYTES",
            "data": [
                "interesting issue"
            ]
        }      
    ]
}'
```
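
To compare the two paths side by side, the same payload can be sent to both endpoints. The following is a minimal sketch using only the Python standard library. The helper names `build_payload`, `infer` and `compare_endpoints` are made up for illustration, the base URL and paths are the ones quoted in this issue, and the shape is set to the number of input strings (so `[1]` here, unlike the `[2]` in the curl call above).

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234"  # host and port taken from the curl call
MODEL_PATH = "/v2/models/cc-hf-test/infer"
PIPELINE_PATH = "/v2/pipelines/cc-hf-test-pipeline/infer"


def build_payload(texts):
    """Build a V2 inference request with a single BYTES input named "args"."""
    return {
        "inputs": [
            {
                "name": "args",
                "shape": [len(texts)],  # assumption: one element per string
                "datatype": "BYTES",
                "data": list(texts),
            }
        ]
    }


def infer(path, payload):
    """POST the payload to BASE_URL + path and return the parsed JSON body."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def compare_endpoints(texts):
    """Return the first output string from the model and from the pipeline."""
    payload = build_payload(texts)
    return {
        path: infer(path, payload)["outputs"][0]["data"][0]
        for path in (MODEL_PATH, PIPELINE_PATH)
    }
```

With the port-forward from the curl call in place, `compare_endpoints(["interesting issue"])` should return one string per endpoint, which makes the plain versus base64 difference easy to see.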

Result
```
{
  "model_name": "",
  "outputs": [
    {
      "data": [
        "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OT.... TRUNCATED"
      ],
      "name": "output",
      "shape": [
        1,
        1
      ],
      "datatype": "BYTES",
      "parameters": {
        "content_type": "hg_jsonlist"
      }
    }
  ]
}
```
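
As a client-side stopgap (not a fix for the underlying behaviour), the pipeline output can be decoded back into the string the model returns. This is a minimal sketch that assumes the pipeline response simply base64-encodes the UTF-8 string of each `BYTES` element; `decode_bytes_outputs` is a hypothetical helper, not part of Seldon or MLServer, and the sample value is the truncated prefix quoted in the bug description.

```python
import base64
import json


def _try_b64(item):
    """Decode one element if it is valid base64 holding UTF-8 text."""
    if not isinstance(item, str):
        return item
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # so a plain "[[-0.19, ...]]" string is returned unchanged.
        return base64.b64decode(item, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return item


def decode_bytes_outputs(response):
    """Return a copy of a V2 response with base64 BYTES data decoded.

    Heuristic only: a plain string that happens to be valid base64
    and decodes to valid UTF-8 would be decoded as well.
    """
    decoded = json.loads(json.dumps(response))  # simple deep copy
    for output in decoded.get("outputs", []):
        if output.get("datatype") == "BYTES":
            output["data"] = [_try_b64(x) for x in output.get("data", [])]
    return decoded


# Truncated base64 prefix copied from the description of this issue.
sample = {
    "outputs": [
        {
            "name": "output",
            "datatype": "BYTES",
            "data": [
                "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OTI0"
                "ODk2LCAwLjAxOTk1NTk1OTE3MTA1Njc0Nywg"
            ],
        }
    ]
}
print(decode_bytes_outputs(sample)["outputs"][0]["data"][0])
```

Decoding the quoted prefix yields the beginning of a JSON list of floats, which suggests the pipeline payload has the same structure as the direct model response and differs only in encoding.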
## Expected behaviour

The pipeline response should carry the same plain-text tensor data as the direct model call:


```
{
    "model_name": "cc-hf-test_1",
    "model_version": "1",
    "id": "261005c3-0721-4406-b7f5-1cab60744815",
    "parameters": {},
    "outputs": [
        {
            "name": "output",
            "shape": [
                1,
                1
            ],
            "datatype": "BYTES",
            "parameters": {
                "content_type": "hg_jsonlist"
            },
            "data": [
                "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628,...TRUNCATED]]"
            ]
        }
    ]
}
```


## Environment

* Seldon version: v2.8.5




## Model Details 
* Images of your model: `seldonio/mlserver:1.6.1-huggingface`
* Logs of your model: 
```
mlserver 2025-03-28 20:17:51,652 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
mlserver Ignoring args : ('',)
agent time="2025-03-28T20:17:51Z" level=debug msg="Extracted model name seldon-internal-model:cc-hf-test_1 seldon-model:cc-hf-test" Source=GRPCProxy
agent time="2025-03-28T20:17:51Z" level=debug msg="Ensure that model cc-hf-test_1 is loaded in memory" Source=StateManager
agent time="2025-03-28T20:17:51Z" level=debug msg="Model exists in cache cc-hf-test_1" Source=StateManager
agent time="2025-03-28T20:17:51Z" level=debug msg="Request ids from incoming meta [cvjg7rr51kgc73dm4prg]" Source=GRPCProxy
```
