
Commit 74bc54d

docs: Agreement v2 docs (#9607)
1 parent 31b8c06 commit 74bc54d

36 files changed

Lines changed: 1337 additions & 429 deletions

docs/source/guide/agreement_metrics.md

Lines changed: 780 additions & 0 deletions
Large diffs are not rendered by default.

docs/source/guide/custom_metric.md

Lines changed: 4 additions & 3 deletions
@@ -1,17 +1,18 @@
 ---
 title: Add a custom agreement metric to Label Studio
-short: Custom agreement metric
+short: Custom metrics
 tier: enterprise
 type: guide
 order: 0
 order_enterprise: 310
 meta_title: Add a Custom Agreement Metric for Labeling
 meta_description: Label Studio Enterprise documentation about how to add a custom agreement metric to use for assessing annotator agreement or the quality of your annotation and prediction results for data labeling and machine learning projects.
 section: "Review & Measure Quality"
-
+parent: "stats"
+parent_enterprise: "stats"
 ---

-Write a custom agreement metric to assess the quality of the predictions and annotations in your Label Studio Enterprise project. Label Studio Enterprise contains a variety of [agreement metrics for your project](stats.html) but if you want to evaluate annotations using a custom metric or a standard metric not available in Label Studio, you can write your own.
+Write a custom agreement metric to assess the quality of the predictions and annotations in your Label Studio Enterprise project. Label Studio Enterprise contains a variety of [agreement metrics for your project](agreement_metrics) but if you want to evaluate annotations using a custom metric or a standard metric not available in Label Studio, you can write your own.

 !!! note
     This functionality is available out-of-the-box for Label Studio Enterprise Cloud users.
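
A custom metric is essentially a small Python function that scores how well two results for the same task match, on a 0 to 1 scale. As a minimal sketch of the idea only — the function name and signature here are assumptions for illustration, not the actual Label Studio Enterprise custom-metric interface — an exact-match comparison for a `Choices` control could look like this:

```python
# Illustrative sketch of a pairwise agreement function for a Choices control.
# The function name and signature are hypothetical; see the custom metric guide
# for the real interface. The payload shape follows Label Studio's annotation
# JSON ("result" items with type "choices").

def choices_agreement(annotation_1: dict, annotation_2: dict) -> float:
    """Return 1.0 if both annotations selected the same set of choices, else 0.0."""

    def selected_choices(annotation: dict) -> frozenset:
        # Gather every choice selected across the annotation's result items.
        choices = set()
        for item in annotation.get("result", []):
            if item.get("type") == "choices":
                choices.update(item["value"]["choices"])
        return frozenset(choices)

    return 1.0 if selected_choices(annotation_1) == selected_choices(annotation_2) else 0.0


a1 = {"result": [{"type": "choices", "value": {"choices": ["Cat"]}}]}
a2 = {"result": [{"type": "choices", "value": {"choices": ["Dog"]}}]}
print(choices_agreement(a1, a2))  # 0.0
```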

docs/source/guide/dashboard_members.md

Lines changed: 3 additions & 0 deletions
@@ -63,6 +63,9 @@ The Annotator Agreement Matrix helps you see how consistently different members
 - **Hover over any cell** to view more information including the number of tasks where both members made an annotation. If a member made more than one annotation in a task, the additional annotation(s) are also considered.
 - **Use the label dropdown** to filter and explore agreement when at least one annotation contains the specified label.

+!!! note
+    Agreement in the Members Dashboard reflects the [Pairwise agreement](stats#Pairwise) between annotators, regardless of what methodology you have selected for the project.
+
 ## Agreement Distribution

 The Agreement Distribution visualizes how agreement scores vary across tasks in your project. The bar chart displays the number of tasks at each agreement score range.

docs/source/guide/label_studio_compare.md

Lines changed: 1 addition & 1 deletion
@@ -303,7 +303,7 @@ Label Studio is available to everyone as open source software (Label Studio Comm
 <tr>
 <td><b>Agreement metrics</b><br/><a href="https://docs.humansignal.com/guide/stats.html">Define how annotator consensus is calculated using pre-defined agreement metrics.</a></td>
 <td style="text-align:center">❌</td>
-<td style="text-align:center"></td>
+<td style="text-align:center">Limited</td>
 <td style="text-align:center">✅</td>
 </tr>
 <tr>

docs/source/guide/manage_data.md

Lines changed: 26 additions & 108 deletions
@@ -29,7 +29,7 @@ For information on setting up a project, see [Create and configure projects](set

 </div>

-In Label Studio Community Edition, the data manager is the default view for your data. In Label Studio Enterprise, click **Data Manager** to open and view the data manager page. Every row in the data manager represents a labeling task in your dataset.
+Every row in the data manager represents a labeling task in your dataset.

 <div class="enterprise-only">

@@ -142,136 +142,54 @@ If you want to make changes to the labeling interface or perform a different typ

 <div class="enterprise-only">

-## Agreement and Agreement (Selected) columns
+## Agreement columns

-These two columns allow you to see agreement scores at a task level.
+The agreement columns in the Data Manager reflect consensus between annotators for a task. For more information on agreement and how it is calculated, see [Task agreement](stats).

-### Agreement
+You will see the following agreement columns in the Data Manager:

-The **Agreement** column displays the average agreement score between all annotators for a particular task.
+* **Agreement** - This is the overall agreement for the task.

-Each annotation pair's agreement score will be calculated as new annotations are submitted. For example, if there are three annotations for a task, there will be three unique annotation pairs, and the agreement column will show the average agreement score of those three pairs.
+This is calculated as the mean agreement score between all control tags for a particular task. See [Overall agreement](stats#Overall-agreement).
+* **[Control tag] agreement** - Each control tag has its own agreement score.

-Here is an example with a simple label config. Let's assume we are using ["Exact matching choices" agreement calculation](stats#Exact-matching-choices-example)
-```xml
-<View>
-<Image name="image_object" value="$image_url"/>
-<Choices name="image_classes" toName="image_object">
-<Choice value="Cat"/>
-<Choice value="Dog"/>
-</Choices>
-</View>
-```
-Annotation 1: `Cat`
-Annotation 2: `Dog`
-Annotation 3: `Cat`
+How control tag agreement is calculated depends on how your project is set up. See [Per-control-tag agreement](stats#Per-control-tag-agreement).

-The three unique pairs are
-1. Annotation 1 <> Annotation 2 - agreement score is `0`
-2. Annotation 1 <> Annotation 3 - agreement score is `1`
-3. Annotation 2 <> Annotation 3 - agreement score is `0`
+![Screenshot](/images/review/agreement-dm.png)

-The agreement column for this task would show the average of all annotation pair's agreement score:
-`33%`
+### Annotators and models

-### Agreement (Selected)
+Click any agreement column to select specific annotators and models that you want to use for agreement calculation.

-The **Agreement (Selected)** column builds on top of the agreement column, allowing you to get agreement scores between annotators, ground truth, and model versions.
+![Screenshot](/images/review/agreement-dm-modal.png)

-The column header is a dropdown where you can make your selection of which pairs you want to include in the calculation.
+By default, all annotators (and not models) are selected for agreement calculation.

-<img src="/images/project/agreement-selected.png" class="gif-border" style="max-width:679px">
+However, you can customize this to select a subset of annotators, models, or models and annotators to compare.

-Under **Choose What To Calculate** there are two options, which can be used for different use cases.
+For example, if you have 10 annotators and you select 3, the overall agreement score and the control tag agreement scores will be recalculated to reflect only your selections.

-#### Agreement Pairs
-
-This allows you to select specific annotators and/or models to compare.
-
-
-You must select at least two items to compare. This can be used in a variety of ways.
-
-**Subset of annotators**
-
-You can select a subset of annotators to compare. This is different and more precise than the **Agreement** column which automatically includes all annotators in the score.
-
-This will then average all annotator vs annotator scores for only the selected annotators.
-
-<img src="/images/project/agreement-selected-annotators.png" class="gif-border" style="max-width:679px">
-
-**Subset of models**
-
-You can also select multiple models to see model consensus in your project. This will average all model vs model scores for the selected models.
-
-<img src="/images/project/agreement-selected-models.png" class="gif-border" style="max-width:679px">
-
-**Subset of models and annotators**
-
-Other combinations are also possible such as selecting one annotator and multiple models, multiple annotators and multiple models, etc.
-
-* If multiple annotators are selected, all annotator vs annotator scores will be included in the average.
-* If multiple models are selected, all model vs model scores will be included in the average.
-* If one or more annotators are selected along with one or more models, all annotator vs model scores will be included in the average.
-
-#### Ground Truth Match
-
-If your project contains ground truth annotations, this allows you to compare either a single annotator or a single model to ground truth annotations.
-
-<img src="/images/project/agreement-selected-gt.png" class="gif-border" style="max-width:679px">
-
-
-#### Limitations
-
-We currently only support calculating the **Agreement (Selected)** columen for tasks with 20 or less annotations. If you have a task with more than this threshold, you will see an info icon with a tooltip.
-
-<img src="/images/project/agreement-selected-threshold.png" class="gif-border" style="max-width:679px">
-
-
-#### Example Score Calculations
-
-Example using the same simple label config as above:
+!!! note
+    You must select at least two items to compare.

-```xml
-<View>
-<Image name="image_object" value="$image_url"/>
-<Choices name="image_classes" toName="image_object">
-<Choice value="Cat"/>
-<Choice value="Dog"/>
-</Choices>
-</View>
-```
+Your selections will apply to all agreement columns in the Data Manager. You cannot select different annotators and models for different agreement columns.

-Lets say for one task we have the following:
-1. Annotation 1 from annotator 1 - `Cat` (marked as ground truth)
-2. Annotation 2 from annotator 2 - `Dog`
-3. Prediction 1 from model version 1 - `Dog`
-4. Prediction 2 from model version 2 - `Cat`

-Here is how the score would be calculated for various selections in the dropdown
+### Ground truth match

-#### `Agreement Pairs` with `All Annotators` selected
-This will match the behavior of the **Agreement** column - all annotation pair's scores will be averaged:
+If your project contains ground truth annotations, you can use this option to compare either a single annotator or a single model to ground truth annotations.

-1. Annotation 1 <> Annotation 2: Agreement score is `0`
+Label Studio will apply whatever agreement metrics and methodology you have configured for your project, but will limit the calculation to the selected annotator or model and the annotations marked as ground truth.

-Score displayed in column for this task: `0%`
+<img src="/images/review/agreement-dm-gt.png" class="gif-border" style="max-width:679px">

-#### `Agreement Pairs` with `All Annotators` and `All Model Versions` selected
-This will average all annotation pair's scores, as well as all annotation <> model version pair's scores
-1. Annotation 1 <> Annotation 2 - agreement score is `0`
-4. Annotation 1 <> Prediction 1 - agreement score is `0`
-5. Annotation 1 <> Prediction 2 - agreement score is `1`
-6. Annotation 2 <> Prediction 1 - agreement score is `1`
-7. Annotation 2 <> Prediction 2 - agreement score is `0`
+### Agreement popover

-Score displayed in column for this task: `40%`
+Click any agreement column to see a popover that has information about the metric and methodology used.

-#### `Ground Truth Match` with `model version 2` selected
-This will compare all ground truth annotations with all predictions from `model version 2`.
+If you are using **Pairwise** methodology, you will see a breakdown of agreement scores for the selected annotators and models.

-In this example, Annotation 1 is marked as ground truth and Prediction 2 is from `model version 2`:
+<img src="/images/review/agreement-dm-popover.png" class="gif-border" style="max-width:600px">

-1. Annotation 1 <> Prediction 2 - agreement score is `1`

-Score displayed in column for this task: `100%`
 </div>
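
For reference, the pairwise averaging walked through in the removed worked example (three annotations, three unique pairs, an average of 33%) can be sketched in a few lines of Python. This is purely illustrative and not Label Studio source code:

```python
from itertools import combinations

def exact_match(a: str, b: str) -> float:
    """Exact-matching-choices agreement for a single pair of results."""
    return 1.0 if a == b else 0.0

def pairwise_agreement(annotations: list[str]) -> float:
    """Average exact-match score over all unique annotation pairs."""
    pairs = list(combinations(annotations, 2))
    return sum(exact_match(a, b) for a, b in pairs) / len(pairs)

# The example's three annotations: Cat, Dog, Cat -> pair scores 0, 1, 0.
print(round(pairwise_agreement(["Cat", "Dog", "Cat"]) * 100))  # 33
```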

docs/source/guide/project_settings_lse.md

Lines changed: 78 additions & 17 deletions
@@ -783,15 +783,18 @@ For more information about pausing annotators, including how to manually pause s

 </dd>

-<dt id="task-agreement">Agreement</dt>
+<dt id="task-agreement">Agreement <span class="badge"></span></dt>

 <dd>

 When multiple annotators are labeling a task, the task agreement reflects how much agreement there is between annotators.

 For example, if 10 annotators review a task and only 2 select the same choice, then that task would have a low agreement score.

-You can customize how task agreement is calculated and how it should affect the project workflow. For more information, see [Task agreement and how it is calculated](stats).
+You can customize how task agreement is calculated and how it should affect the project workflow. For more information, see [Task agreement](stats).
+
+!!! error Enterprise
+    Label Studio Starter Cloud only supports the **Pairwise** methodology. Each control tag uses the [default built-in metric](agreement_metrics#Default-metric-reference) for agreement calculation.

 <table>
 <thead>
@@ -803,20 +806,90 @@ You can customize how task agreement is calculated and how it should affect the
 <tr>
 <td>

-**Agreement metric**
+**Methodology**
+
+</td>
+<td>
+
+Methodology to use for calculating task agreement.
+
+* **Consensus**: Consensus measures *"What percentage of annotators chose the most common answer?"*
+* **Pairwise**: Pairwise measures *"What is the average agreement score across all pairs of annotators?"*
+
+For more information, see [Task agreement - methodology](stats#Methodology).
+
+</td>
+</tr>
+<tr>
+<td>
+
+**Built-in Metrics vs Custom**
+
+</td>
+<td>
+
+Select whether you want to use the built-in metrics or custom metrics for agreement.
+
+For more information, see [Built-in agreement metrics reference](agreement_metrics) and [Custom agreement metrics](custom_metric).
+
+</td>
+</tr>
+<tr>
+<td>
+
+**Overall Agreement**
+
 </td>
 <td>

-Select the [metric](stats#Available-agreement-metrics) that should determine task agreement.
+Configure how overall agreement is calculated by setting the weight for each control tag.
+
+For more information, see [Configure weight for the overall agreement](stats#Configure-weight-for-the-overall-agreement).
+

 </td>
 </tr>
 <tr>
 <td>

+**Agreement Columns**
+
+</td>
+<td>
+
+Configure how agreement is calculated for each control tag.
+
+For more information, see [Configure agreement for each control tag](stats#Configure-agreement-for-each-control-tag).
+
+</td>
+</tr>
+</table>
+
+</dd>
+
+<dt id="low-agreement">Low Agreement Resolution <span class="badge"></span></dt>
+
+<dd>
+
+!!! note
+    Low agreement resolution settings are only available when the project is configured to [automatically assign tasks](#distribute-tasks). If you are using Manual distribution, this section will not appear in your project settings.
+
+    If you switch a project from Automatic to Manual distribution, low agreement resolution is automatically disabled.
+
+Resolve tasks with low agreement scores by automatically assigning additional annotators to the task.
+
+<table>
+<thead>
+<tr>
+<th>Field</th>
+<th>Description</th>
+</tr>
+</thead>
+<tr>
+<td>
+
 **Assign additional annotator**

-<span class="badge"></span>
 </td>
 <td>
 Enable this option to automatically assign an additional annotator to any tasks that have a low agreement score.
@@ -832,7 +905,6 @@ Note that to see this setting, the project must be set up with [automatic task a

 **Agreement threshold**

-<span class="badge"></span>
 </td>
 <td>

@@ -845,7 +917,6 @@ Enter the agreement score that a task must meet before it can be considered comp

 **Maximum additional annotators**

-<span class="badge"></span>
 </td>
 <td>

@@ -860,16 +931,6 @@ Annotators are assigned one at a time until the agreement threshold is achieved.
 !!! note
     When configuring **Maximum additional annotators**, be mindful of the number of annotators available in your project. If you have fewer annotators available than the sum of [**Annotations per task**](#overlap) + **Maximum additional annotators**, you might encounter a scenario in which a task with a low agreement score cannot be marked complete.

-</dd>
-
-<dt>Custom weights</dt>
-
-<dd>
-
-Set custom weights for tags and labels to change the agreement calculation. The options you are given are automatically generated from your labeling interface setup.
-
-Weights set to zero are ignored from calculation.
-
 </dd>
 </dl>
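
To make the difference between the two methodologies added above concrete, here is a rough numeric sketch for a single Choices-style task with four annotators. It is not Label Studio's implementation, just the two questions the docs pose translated into code:

```python
from collections import Counter
from itertools import combinations

def consensus(choices: list[str]) -> float:
    """Share of annotators who chose the most common answer."""
    most_common_count = Counter(choices).most_common(1)[0][1]
    return most_common_count / len(choices)

def pairwise(choices: list[str]) -> float:
    """Mean exact-match score over all pairs of annotators."""
    pairs = list(combinations(choices, 2))
    return sum(1.0 if a == b else 0.0 for a, b in pairs) / len(pairs)

answers = ["Cat", "Cat", "Cat", "Dog"]
print(f"Consensus: {consensus(answers):.0%}")  # 75%
print(f"Pairwise:  {pairwise(answers):.0%}")   # 50%
```

With three of four annotators agreeing, Consensus reports 75%, while Pairwise reports 50% because only three of the six annotator pairs match.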

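The new **Overall Agreement** setting describes overall task agreement as a weighted combination of per-control-tag scores, and the removed "Custom weights" text notes that weights set to zero are ignored. A hypothetical sketch of that weighting (tag names, scores, and weights below are made up for illustration):

```python
def overall_agreement(tag_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-control-tag agreement scores; zero-weight tags are ignored."""
    active = {tag: weights.get(tag, 1.0) for tag in tag_scores}
    total_weight = sum(w for w in active.values() if w > 0)
    return sum(tag_scores[tag] * w for tag, w in active.items() if w > 0) / total_weight

scores = {"sentiment": 0.9, "topic": 0.5}
weights = {"sentiment": 2.0, "topic": 1.0}
weights_zeroed = {"sentiment": 1.0, "topic": 0.0}

print(f"Weighted overall:      {overall_agreement(scores, weights):.0%}")         # 77%
print(f"Topic weight set to 0: {overall_agreement(scores, weights_zeroed):.0%}")  # 90%
```
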
docs/source/guide/quality.md

Lines changed: 0 additions & 5 deletions
@@ -183,11 +183,6 @@ Review a table to see the following for each annotator:
 - The agreement of their annotations with the ground truth annotations, if there are any.
 - The agreement of their annotations with predicted annotations, if there are any.

-See the following video for an overview of annotator agreement metrics:
-
-<iframe class="video-border" width="560" height="315" src="https://www.youtube.com/embed/Lo_PVE9Pyw4?si=z1vtyI_xIo8aR8fY" width="100%" height="400vh" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
-
-
 ### Review annotator agreement matrix

 You can also review the overall annotator agreement on a more individual basis with the annotator agreement matrix.
