Describe the bug
My team is deploying this HuggingFace model on Seldon v2.8.5, and the results from invoking the model and from invoking the pipeline are materially different. When I call the model directly via http://localhost:1234/v2/models/cc-hf-test/infer, the data field of the output tensor contains the vector as expected (e.g. "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]"). When I call the pipeline, however, the tensor data comes back base64-encoded (e.g. "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OTI0ODk2LCAwLjAxOTk1NTk1OTE3MTA1Njc0Nywg....").

The Kafka messages show the plain response ("[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]"), so my best guess is that the modelgateway applies this encoding when it returns the response from the Kafka message to Envoy.
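To check that the pipeline output is the same JSON-list text, only base64-encoded, the truncated prefix quoted above can be decoded locally. This is a minimal sketch, assuming a `base64` that accepts `-d` (GNU coreutils or a recent macOS); `decode_b64_prefix` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: base64-decode the string passed as the first argument.
decode_b64_prefix() {
  printf '%s' "$1" | base64 -d
}

# Prefix copied from the pipeline response. It is 88 characters long (a multiple
# of 4), so it decodes cleanly even though the full value was truncated.
decode_b64_prefix 'W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OTI0ODk2LCAwLjAxOTk1NTk1OTE3MTA1Njc0Nywg'
echo
```

This prints `[[-0.1002608984708786, 0.13002870976924896, 0.019955959171056747, `, which is the start of a JSON list of floats in the same format as the direct model response.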
To reproduce
model-settings.json
{
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "optimum_model": "true",
      "pretrained_model": "optimum/all-MiniLM-L6-v2",
      "task": "feature-extraction"
    }
  }
}
manifests
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: cc-hf-test
  namespace: seldon
spec:
  requirements:
  - huggingface
  secretName: seldon-rclone-s3-secret
  storageUri: s3://seldonbucket/cc-hf-test/1/
  replicas: 2
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  annotations:
  name: cc-hf-test
  namespace: seldon
spec:
  capabilities:
  - huggingface
  podSpec:
    containers:
    - image: seldonio/mlserver:1.6.1-huggingface
      name: mlserver
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          cpu: "2"
          memory: 4Gi
    serviceAccountName: sa
  replicas: 2
  serverConfig: mlserver
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: cc-hf-test-pipeline
  namespace: seldon
spec:
  output:
    steps:
    - cc-hf-test
    stepsJoin: inner
  steps:
  - inputsJoinType: inner
    name: cc-hf-test
Invocation
curl --location 'http://localhost:1234/v2/pipelines/cc-hf-test-pipeline/infer' \
  --header 'Content-Type: application/json' \
  --data '{
    "inputs": [
      {
        "name": "args",
        "shape": [2],
        "datatype": "BYTES",
        "data": ["interesting issue"]
      }
    ]
  }'
Result
{
  "model_name": "",
  "outputs": [
    {
      "data": [
        "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OT.... TRUNCATED"
      ],
      "name": "output",
      "shape": [1, 1],
      "datatype": "BYTES",
      "parameters": {
        "content_type": "hg_jsonlist"
      }
    }
  ]
}
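As a client-side stopgap, both response shapes can be normalised to the plain `[[...]]` text. This is a minimal sketch, assuming `jq` 1.6 or later (for `@base64d`) and a single output tensor holding a single string, as in the responses shown here; `normalize_output` is a hypothetical helper, and treating any value that does not start with `[` as base64 is a heuristic that only fits this `hg_jsonlist` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a V2 inference response (JSON) on stdin and print
# outputs[0].data[0] as plain text, base64-decoding it when it does not already
# start with "[". Heuristic only; assumes the hg_jsonlist output shown above.
normalize_output() {
  jq -r '.outputs[0].data[0] | if startswith("[") then . else @base64d end'
}

# Example: a pipeline-style response carrying a base64-encoded value.
printf '%s' '{"outputs":[{"name":"output","datatype":"BYTES","data":["W1stMC4x"]}]}' | normalize_output
```

Piping either the pipeline or the direct model response into `normalize_output` should yield the same kind of `[[...]]` text; this only masks the symptom on the client and does not change how the pipeline path encodes the tensor.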
Expected behaviour
{
  "model_name": "cc-hf-test_1",
  "model_version": "1",
  "id": "261005c3-0721-4406-b7f5-1cab60744815",
  "parameters": {},
  "outputs": [
    {
      "name": "output",
      "shape": [1, 1],
      "datatype": "BYTES",
      "parameters": {
        "content_type": "hg_jsonlist"
      },
      "data": [
        "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628,...TRUNCATED]]"
      ]
    }
  ]
}
Environment
Model Details
- Images of your model:
seldonio/mlserver:1.6.1-huggingface
- Logs of your model:
mlserver 2025-03-28 20:17:51,652 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
mlserver Ignoring args : ('',)
agent time="2025-03-28T20:17:51Z" level=debug msg="Extracted model name seldon-internal-model:cc-hf-test_1 seldon-model:cc-hf-test" Source=GRPCProxy
agent time="2025-03-28T20:17:51Z" level=debug msg="Ensure that model cc-hf-test_1 is loaded in memory" Source=StateManager
agent time="2025-03-28T20:17:51Z" level=debug msg="Model exists in cache cc-hf-test_1" Source=StateManager
agent time="2025-03-28T20:17:51Z" level=debug msg="Request ids from incoming meta [cvjg7rr51kgc73dm4prg]" Source=GRPCProxy