Integrate popular cloud and external storage systems with Label Studio to collect new items uploaded to the buckets, containers, databases, or directories and return the annotation results so that you can use them in your machine learning pipelines.
15
15
16
-
<div class="opensource-only">
17
-
18
-
| Storage | Community | Enterprise |
19
-
|---|---|---|
20
-
|[Amazon S3](#Amazon-S3)| ✅ | ✅ |
21
-
|[Amazon S3 with IAM role](https://docs.humansignal.com/guide/storage#Set-up-an-S3-connection-with-IAM-role-access)| ❌ | ✅ |
@@ -1294,7 +1270,7 @@ Complete the following fields and then click **Test connection**:
1294
1270
| | |
1295
1271
| --- | --- |
1296
1272
| Storage Title | Enter a name for the storage connection to appear in Label Studio. |
1297
-
| Storage Name | Enter the name of your Azure storage account. |
1273
+
| Storage Name | Enter the name of your Azure storage sccount. |
1298
1274
| Container Name | Enter the name of a container within the Azure storage account. |
1299
1275
| Tenant ID | Specify the **Directory (tenant) ID** from your App Registration. |
1300
1276
| Client ID | Specify the **Application (client) ID** from your App Registration. |
@@ -1516,92 +1492,60 @@ If you're using Label Studio in Docker, you need to mount the local directory th
1516
1492
1517
1493
<div class="enterprise-only">
1518
1494
1519
-
Connect Label Studio Enterprise to Databricks Unity Catalog (UC) Volumes to import files as tasks and export annotations as JSON back to your volumes. This connector uses the Databricks Files API and operates only in proxy mode (presigned URLs are not supported by Databricks).
1495
+
Connect Label Studio Enterprise to Databricks Unity Catalog (UC) Volumes to import files as tasks and export annotations as JSON back to your volumes. This connector uses the Databricks Files API and operates only in proxy mode (no presigned URLs are supported by Databricks).
1520
1496
1521
1497
### Prerequisites
1522
-
- A Databricks workspace URL (Workspace Host), for example `https://adb-12345678901234.1.databricks.com` (or Azure domain).
1523
-
1524
-
See [Create a workspace](https://docs.databricks.com/aws/en/admin/workspace/) and [Get identifiers for workspace objects](https://docs.databricks.com/aws/en/workspace/workspace-details#workspace-url).
1525
-
- A Databricks Personal Access Token (PAT) with permission to access the Files API.
1526
-
1527
-
You can generate tokens from **Settings > Developer**. See [Databricks personal access token authentication](https://docs.databricks.com/en/dev-tools/auth/pat.html).
1528
-
- A UC Volume path under `/Volumes/<catalog>/<schema>/<volume>` with files you want to label.
1529
-
1530
-
See [What are Unity Catalog volumes?](https://docs.databricks.com/aws/en/volumes/).
1531
-
1532
-
### Create a source storage connection in the Label Studio UI
1533
-
1534
-
From Label Studio, open your project and select **Settings > Cloud Storage > Add Source Storage**.
1535
-
1536
-
Select **Databricks Files (UC Volumes)** and click **Next**.
1537
-
1538
-
#### Configure Connection
1539
-
1540
-
Complete the following fields and then click **Test connection**:
1541
-
1542
-
<div class="noheader rowheader">
1543
-
1544
-
| | |
1545
-
| --- | --- |
1546
-
| Storage Title | Enter a name for the storage connection to appear in Label Studio. |
1547
-
| Workspace Host | Enter your workspace URL, for example `https://<workspace-identifier>.cloud.databricks.com` |
1548
-
| Access Token | Enter your personal access token that you generated in Databricks. |
1549
-
| Catalog <br> Schema <br> Volume | Specify your volume path (UC coordinates). You can find this from the **Catalog Explorer** in Databricks (see screenshot below). |
1550
-
1551
-
</div>
1552
-
1553
-

1498
+
- A Databricks workspace URL (Workspace Host), for example `https://adb-12345678901234.1.databricks.com` (or Azure domain)
1499
+
- A Databricks Personal Access Token (PAT) with permission to access the Files API
1500
+
- A UC Volume path under `/Volumes/<catalog>/<schema>/<volume>` with files you want to label
- Personal access tokens: https://docs.databricks.com/en/dev-tools/auth/pat.html
1505
+
- Unity Catalog and Volumes: https://docs.databricks.com/en/files/volumes.html
1556
1506
1557
-
Complete the following fields and then click **Load preview** to ensure you are syncing the correct data:
1558
-
1559
-
<div class="noheader rowheader">
1560
-
1561
-
| | |
1562
-
| --- | --- |
1563
-
| Bucket Prefix | Optionally, enter the directory name within the volume that you would like to use. For example, `data-set-1` or `data-set-1/subfolder-2`. |
1564
-
| Import Method | Select whether you want to create a task for each file in your container or whether you would like to use a JSON/JSONL/Parquet file to define the data for each task. |
1565
-
| File Name Filter | Specify a regular expression to filter bucket objects. Use `.*` to collect all objects. |
1566
-
| Scan all sub-folders | Enable this option to perform a recursive scan across subfolders within your container. |
1567
-
1568
-
</div>
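
To check a **File Name Filter** before syncing, you can try the regular expression against a few sample paths locally. The sketch below is illustrative only: `filter_objects` and the sample paths are hypothetical, and it assumes the pattern must match the whole object path, which the text above does not specify.

```python
import re


def filter_objects(paths, pattern=".*"):
    """Return the paths that the pattern matches in full.

    Assumption: full-path matching. The actual matching mode used by
    the connector is not stated in the documentation above.
    """
    regex = re.compile(pattern)
    return [p for p in paths if regex.fullmatch(p)]


# Hypothetical object paths under a volume prefix.
files = [
    "data-set-1/a.jpg",
    "data-set-1/notes.txt",
    "data-set-1/subfolder-2/b.png",
]

everything = filter_objects(files)  # `.*` collects all objects
images_only = filter_objects(files, r".*\.(jpg|png)")

print(everything)
print(images_only)
```

With the default `.*`, every path is kept. The second pattern keeps only the `.jpg` and `.png` entries.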
1569
-
1570
-
#### Review & Confirm
1571
-
1572
-
If everything looks correct, click **Save & Sync** to sync immediately, or click **Save** to save your settings and sync later.
1507
+
### Set up connection in the Label Studio UI
1508
+
1. Open Label Studio → project → **Settings > Cloud Storage**.
- If your file preview returns zero files, verify the path under `/Volumes/<catalog>/<schema>/<volume>/<prefix?>` and your PAT permissions.
1541
+
- If listing returns zero files, verify the path under `/Volumes/<catalog>/<schema>/<volume>/<prefix?>` and your PAT permissions.
1587
1542
- Ensure the Workspace Host has no trailing slash and matches your workspace domain.
1588
-
- If previews work but media fails to load, confirm proxy mode is allowed for your organization in Label Studio (**Organization > Usage & License > Features**) and network egress allows Label Studio to reach Databricks.
1543
+
- If previews work but media fails to load, confirm proxy mode is allowed for your organization in Label Studio and network egress allows Label Studio to reach Databricks.
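
The first two checks above can be run locally before saving the connection. This is a minimal sketch, not part of Label Studio: `normalize_workspace_host` and `build_volume_path` are hypothetical helpers, and the catalog, schema, and volume names are placeholders.

```python
from urllib.parse import urlparse


def normalize_workspace_host(host: str) -> str:
    """Strip whitespace and trailing slashes; require an https URL with a hostname."""
    cleaned = host.strip().rstrip("/")
    parsed = urlparse(cleaned)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Workspace Host must be an https URL, got: {host!r}")
    return cleaned


def build_volume_path(catalog: str, schema: str, volume: str, prefix: str = "") -> str:
    """Join UC coordinates and an optional prefix into
    /Volumes/<catalog>/<schema>/<volume>/<prefix?>."""
    parts = [catalog, schema, volume]
    for name, value in zip(("catalog", "schema", "volume"), parts):
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty name without slashes")
    path = "/Volumes/" + "/".join(parts)
    prefix = prefix.strip("/")
    return f"{path}/{prefix}" if prefix else path


# Placeholder values; substitute your own workspace and UC coordinates.
host = normalize_workspace_host("https://adb-12345678901234.1.databricks.com/")
path = build_volume_path("main", "labeling", "images", "data-set-1/subfolder-2")

print(host)  # trailing slash removed
print(path)
```

If the printed path differs from what the **Catalog Explorer** shows in Databricks, an empty listing is most likely caused by the path rather than by PAT permissions.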
1589
1544
1590
1545
1591
1546
!!! warning "Proxy and security"
1592
1547
This connector streams data **through the Label Studio backend** with HTTP Range support. Databricks does not support presigned URLs, so this option is also not available in Label Studio.
1593
1548
1594
-
### Create a target storage connection in the Label Studio UI
1595
-
1596
-
Repeat the steps from the previous section but using **Add Target Storage**. Use the same workspace host, token, and volume path (UC coordinates).
1597
-
1598
-
For your **Bucket Prefix**, set an export folder to use (e.g., `exports/${project_id}`) and determine whether you want to allow files to be deleted from target storage.
1599
-
1600
-
When file deletion is enabled, if you delete an annotation in Label Studio (via UI or API), Label Studio will also delete the corresponding exported JSON file from your target storage for this storage connection.
1601
-
1602
-
Note that this only affects files that were exported by that target storage, not your source media or tasks. Your PAT permissions must also allow deletion.
1603
-
1604
-
After adding, click **Sync** to export annotations as JSON files to your target volume.
1605
1549
1606
1550
</div>
1607
1551
@@ -1615,11 +1559,7 @@ Databricks Unity Catalog (UC) Volumes integration is available in Label Studio E
1615
1559
- Stream media securely via the platform proxy (no presigned URLs)
1616
1560
- Export annotations back to your Databricks Volume as JSON
1617
1561
1618
-
Learn more and see the full setup guide in the Enterprise documentation:
If your organization needs governed access to Databricks data with Unity Catalog, consider [Label Studio Enterprise](https://humansignal.com/).
1562
+
Learn more and see the full setup guide in the Enterprise documentation: [Databricks Files (UC Volumes)](https://docs.humansignal.com/guide/storage#Databricks-Files-UC-Volumes). If your organization needs governed access to Databricks data with Unity Catalog, consider [Label Studio Enterprise](https://humansignal.com/).