docs/source/guide/custom_metric.md (+4 −3)

@@ -1,17 +1,18 @@
 ---
 title: Add a custom agreement metric to Label Studio
-short: Custom agreement metric
+short: Custom metrics
 tier: enterprise
 type: guide
 order: 0
 order_enterprise: 310
 meta_title: Add a Custom Agreement Metric for Labeling
 meta_description: Label Studio Enterprise documentation about how to add a custom agreement metric to use for assessing annotator agreement or the quality of your annotation and prediction results for data labeling and machine learning projects.
 section: "Review & Measure Quality"
-
+parent: "stats"
+parent_enterprise: "stats"
 ---
 
-Write a custom agreement metric to assess the quality of the predictions and annotations in your Label Studio Enterprise project. Label Studio Enterprise contains a variety of [agreement metrics for your project](stats.html), but if you want to evaluate annotations using a custom metric or a standard metric not available in Label Studio, you can write your own.
+Write a custom agreement metric to assess the quality of the predictions and annotations in your Label Studio Enterprise project. Label Studio Enterprise contains a variety of [agreement metrics for your project](agreement_metrics), but if you want to evaluate annotations using a custom metric or a standard metric not available in Label Studio, you can write your own.
 
 !!! note
     This functionality is available out-of-the-box for Label Studio Enterprise Cloud users.
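As a companion to this change, here is a minimal sketch of what such a custom metric can look like. It is illustrative only: the function name is made up, and the assumption that the metric receives two annotations in the standard Label Studio JSON format (a `result` list of regions, each with a `value` dict) and returns a score between 0 and 1 is an assumption, not the exact entry-point signature the feature requires.

```python
def choice_agreement(annotation_1: dict, annotation_2: dict) -> float:
    """Return a 0..1 score comparing the choices selected in two annotations.

    Hypothetical sketch: assumes the standard Label Studio annotation JSON,
    where each annotation carries a "result" list of regions.
    """
    def choices(annotation: dict) -> set:
        # Collect every selected choice across the annotation's results.
        selected = set()
        for item in annotation.get("result", []):
            if item.get("type") == "choices":
                selected.update(item["value"]["choices"])
        return selected

    c1, c2 = choices(annotation_1), choices(annotation_2)
    if not c1 and not c2:
        return 1.0  # both empty: treat as full agreement
    if not c1 or not c2:
        return 0.0
    # Jaccard overlap between the two sets of selected choices.
    return len(c1 & c2) / len(c1 | c2)
```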
docs/source/guide/dashboard_members.md (+3 −0)

@@ -63,6 +63,9 @@ The Annotator Agreement Matrix helps you see how consistently different members
 - **Hover over any cell** to view more information, including the number of tasks where both members made an annotation. If a member made more than one annotation in a task, the additional annotation(s) are also considered.
 - **Use the label dropdown** to filter and explore agreement when at least one annotation contains the specified label.
 
+!!! note
+    Agreement in the Members Dashboard reflects the [Pairwise agreement](stats#Pairwise) between annotators, regardless of what methodology you have selected for the project.
+
 ## Agreement Distribution
 
 The Agreement Distribution visualizes how agreement scores vary across tasks in your project. The bar chart displays the number of tasks at each agreement score range.
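To make the added note concrete, the sketch below shows one way a pairwise agreement matrix can be derived: for each pair of annotators, average the scores of the tasks they both annotated. The data layout, function name, and `agreement_fn` stand-in are illustrative assumptions, not the dashboard's actual implementation.

```python
from collections import defaultdict
from itertools import combinations

def pairwise_matrix(annotations_by_task, agreement_fn):
    """annotations_by_task: {task_id: {annotator_id: annotation}} (assumed layout)."""
    pair_scores = defaultdict(list)  # (annotator_a, annotator_b) -> list of scores
    for task_annotations in annotations_by_task.values():
        # Compare every pair of annotators who both worked on this task.
        for a, b in combinations(sorted(task_annotations), 2):
            pair_scores[(a, b)].append(
                agreement_fn(task_annotations[a], task_annotations[b])
            )
    # Each matrix cell is the mean agreement over the pair's shared tasks.
    return {pair: sum(s) / len(s) for pair, s in pair_scores.items()}

tasks = {
    "t1": {"alice": "Cat", "bob": "Cat", "carol": "Dog"},
    "t2": {"alice": "Dog", "bob": "Cat"},
}
exact = lambda x, y: 1.0 if x == y else 0.0
print(pairwise_matrix(tasks, exact))
# {('alice', 'bob'): 0.5, ('alice', 'carol'): 0.0, ('bob', 'carol'): 0.0}
```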
docs/source/guide/label_studio_compare.md (+1 −1)

@@ -303,7 +303,7 @@ Label Studio is available to everyone as open source software (Label Studio Comm
 <tr>
 <td><b>Agreement metrics</b><br/><a href="https://docs.humansignal.com/guide/stats.html">Define how annotator consensus is calculated using pre-defined agreement metrics.</a></td>

@@ -29,7 +29,7 @@ For information on setting up a project, see [Create and configure projects](set
 </div>
 
-In Label Studio Community Edition, the data manager is the default view for your data. In Label Studio Enterprise, click **Data Manager** to open and view the data manager page. Every row in the data manager represents a labeling task in your dataset.
+Every row in the data manager represents a labeling task in your dataset.
 
 <div class="enterprise-only">
@@ -142,136 +142,54 @@ If you want to make changes to the labeling interface or perform a different typ
 
 <div class="enterprise-only">
 
-## Agreement and Agreement (Selected) columns
+## Agreement columns
 
-These two columns allow you to see agreement scores at a task level.
+The agreement columns in the Data Manager reflect consensus between annotators for a task. For more information on agreement and how it is calculated, see [Task agreement](stats).
 
-### Agreement
+You will see the following agreement columns in the Data Manager:
 
-The **Agreement** column displays the average agreement score between all annotators for a particular task.
+* **Agreement** - This is the overall agreement for the task.
 
-Each annotation pair's agreement score will be calculated as new annotations are submitted. For example, if there are three annotations for a task, there will be three unique annotation pairs, and the agreement column will show the average agreement score of those three pairs.
+  This is calculated as the mean agreement score between all control tags for a particular task. See [Overall agreement](stats#Overall-agreement).
+* **[Control tag] agreement** - Each control tag has its own agreement score.
 
-Here is an example with a simple label config. Let's assume we are using the ["Exact matching choices" agreement calculation](stats#Exact-matching-choices-example).
-The agreement column for this task would show the average of all annotation pairs' agreement scores:
-`33%`
+### Annotators and models
 
-### Agreement (Selected)
+Click any agreement column to select specific annotators and models that you want to use for agreement calculation.
 
-The **Agreement (Selected)** column builds on top of the agreement column, allowing you to get agreement scores between annotators, ground truth, and model versions.
-However, you can customize this to select a subset of annotators, models, or models and annotators to compare.
-Under **Choose What To Calculate** there are two options, which can be used for different use cases.
+For example, if you have 10 annotators and you select 3, the overall agreement score and the control tag agreement scores will be recalculated to reflect only your selections.
 
-#### Agreement Pairs
-
-This allows you to select specific annotators and/or models to compare.
-
-You must select at least two items to compare. This can be used in a variety of ways.
-
-**Subset of annotators**
-
-You can select a subset of annotators to compare. This is different and more precise than the **Agreement** column, which automatically includes all annotators in the score.
-
-This will then average all annotator vs. annotator scores for only the selected annotators.
-We currently only support calculating the **Agreement (Selected)** column for tasks with 20 or fewer annotations. If you have a task with more than this threshold, you will see an info icon with a tooltip.
+Your selections will apply to all agreement columns in the Data Manager. You cannot select different annotators and models for different agreement columns.
 
-Let's say for one task we have the following:
-1. Annotation 1 from annotator 1 - `Cat` (marked as ground truth)
-2. Annotation 2 from annotator 2 - `Dog`
-3. Prediction 1 from model version 1 - `Dog`
-4. Prediction 2 from model version 2 - `Cat`
 
-Here is how the score would be calculated for various selections in the dropdown.
+### Ground truth match
 
-#### `Agreement Pairs` with `All Annotators` selected
-This will match the behavior of the **Agreement** column - all annotation pairs' scores will be averaged:
+If your project contains ground truth annotations, you can use this option to compare either a single annotator or a single model to ground truth annotations.
 
-1. Annotation 1 <> Annotation 2: Agreement score is `0`
+Label Studio will apply whatever agreement metrics and methodology you have configured for your project, but will limit the calculation to the selected annotator or model and the annotations marked as ground truth.
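The removed example above still captures how pairwise task agreement behaves, so here is a small sketch of the same idea. The helper names are made up, and exact match stands in for whichever per-pair metric the project is actually configured with.

```python
from itertools import combinations
from statistics import mean

def task_agreement(annotations, agreement_fn):
    """Average the agreement score over every unique pair of annotations."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        return None  # a lone annotation has nothing to be compared against
    return mean(agreement_fn(a, b) for a, b in pairs)

# Three annotations yield three unique pairs; with exact matching,
# ["Cat", "Dog", "Dog"] has one matching pair out of three -> ~33%.
exact = lambda a, b: 1.0 if a == b else 0.0
print(task_agreement(["Cat", "Dog", "Dog"], exact))  # 0.333...
```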
 When multiple annotators are labeling a task, the task agreement reflects how much agreement there is between annotators.
 
 For example, if 10 annotators review a task and only 2 select the same choice, then that task would have a low agreement score.
 
-You can customize how task agreement is calculated and how it should affect the project workflow. For more information, see [Task agreement and how it is calculated](stats).
+You can customize how task agreement is calculated and how it should affect the project workflow. For more information, see [Task agreement](stats).
+
+!!! error Enterprise
+    Label Studio Starter Cloud only supports the **Pairwise** methodology. Each control tag uses the [default built-in metric](agreement_metrics#Default-metric-reference) for agreement calculation.
 
 <table>
 <thead>
@@ -803,20 +806,90 @@ You can customize how task agreement is calculated and how it should affect the
 <tr>
 <td>
 
-**Agreement metric**
+**Methodology**
+
+</td>
+<td>
+
+Methodology to use for calculating task agreement.
+
+* **Consensus**: Consensus measures *"What percentage of annotators chose the most common answer?"*
+* **Pairwise**: Pairwise measures *"What is the average agreement score across all pairs of annotators?"*
+
+For more information, see [Task agreement - methodology](stats#Methodology).
+
+</td>
+</tr>
+<tr>
+<td>
+
+**Built-in Metrics vs Custom**
+
+</td>
+<td>
+
+Select whether you want to use the built-in metrics or custom metrics for agreement.
+
+For more information, see [Built-in agreement metrics reference](agreement_metrics) and [Custom agreement metrics](custom_metric).
+
+</td>
+</tr>
+<tr>
+<td>
+
+**Overall Agreement**
+
 </td>
 <td>
 
-Select the [metric](stats#Available-agreement-metrics) that should determine task agreement.
+Configure how overall agreement is calculated by setting the weight for each control tag.
+
+For more information, see [Configure weight for the overall agreement](stats#Configure-weight-for-the-overall-agreement).
 
 </td>
 </tr>
 <tr>
 <td>
+
+**Agreement Columns**
+
+</td>
+<td>
+
+Configure how agreement is calculated for each control tag.
+
+For more information, see [Configure agreement for each control tag](stats#Configure-agreement-for-each-control-tag).
+
+Low agreement resolution settings are only available when the project is configured to [automatically assign tasks](#distribute-tasks). If you are using Manual distribution, this section will not appear in your project settings.
+
+If you switch a project from Automatic to Manual distribution, low agreement resolution is automatically disabled.
+
+Resolve tasks with low agreement scores by automatically assigning additional annotators to the task.
+
+<table>
+<thead>
+<tr>
+<th>Field</th>
+<th>Description</th>
+</tr>
+</thead>
+<tr>
+<td>
 
 **Assign additional annotator**
 
-<span class="badge"></span>
 </td>
 <td>
 Enable this option to automatically assign an additional annotator to any tasks that have a low agreement score.
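The **Methodology** row added in this hunk distinguishes Consensus from Pairwise; the toy example below works both out for a single control tag. Exact match stands in for the configured per-tag metric, and the answers are purely illustrative.

```python
from collections import Counter
from itertools import combinations

answers = ["Cat", "Cat", "Cat", "Dog"]  # 4 annotators, one control tag

# Consensus: what percentage of annotators chose the most common answer?
consensus = Counter(answers).most_common(1)[0][1] / len(answers)  # 3/4 = 0.75

# Pairwise: what is the average agreement score across all pairs of annotators?
pairs = list(combinations(answers, 2))                 # 6 unique pairs
pairwise = sum(a == b for a, b in pairs) / len(pairs)  # 3 matching pairs / 6 = 0.5

print(consensus, pairwise)  # 0.75 0.5
```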
@@ -832,7 +905,6 @@ Note that to see this setting, the project must be set up with [automatic task a
 **Agreement threshold**
 
-<span class="badge"></span>
 </td>
 <td>
@@ -845,7 +917,6 @@ Enter the agreement score that a task must meet before it can be considered comp
 **Maximum additional annotators**
 
-<span class="badge"></span>
 </td>
 <td>
@@ -860,16 +931,6 @@ Annotators are assigned one at a time until the agreement threshold is achieved.
 !!! note
     When configuring **Maximum additional annotators**, be mindful of the number of annotators available in your project. If you have fewer annotators available than the sum of [**Annotations per task**](#overlap) + **Maximum additional annotators**, you might encounter a scenario in which a task with a low agreement score cannot be marked complete.
 
-</dd>
-
-<dt>Custom weights</dt>
-
-<dd>
-
-Set custom weights for tags and labels to change the agreement calculation. The options you are given are automatically generated from your labeling interface setup.
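The removed **Custom weights** setting and the new **Overall Agreement** row describe the same idea from different angles: overall agreement as a weighted mean of per-control-tag scores. Below is a sketch under that assumption; the tag names, scores, and default-weight handling are made up for illustration.

```python
def overall_agreement(tag_scores: dict, weights: dict) -> float:
    """Weighted mean of per-control-tag agreement scores.

    tag_scores / weights: {control_tag_name: value}; tags without an
    explicit weight default to 1.0 (an assumption of this sketch).
    """
    total = sum(weights.get(tag, 1.0) for tag in tag_scores)
    return sum(
        score * weights.get(tag, 1.0) for tag, score in tag_scores.items()
    ) / total

scores = {"sentiment": 0.9, "entities": 0.6}
print(overall_agreement(scores, {"sentiment": 1.0, "entities": 1.0}))  # 0.75
# Doubling a tag's weight pulls the overall score toward that tag:
print(overall_agreement(scores, {"sentiment": 2.0, "entities": 1.0}))  # 0.8
```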