Updates LexGLUE and MultiEURLEX README.md files #3075

Merged: 24 commits, Oct 18, 2021

23 changes: 9 additions & 14 deletions datasets/lex_glue/README.md
@@ -98,7 +98,7 @@ The supported tasks are the following:
<table>
<tr><td>Dataset</td><td>Source</td><td>Sub-domain</td><td>Task Type</td><td>Classes</td></tr>
<tr><td>ECtHR (Task A)</td><td> <a href="https://aclanthology.org/P19-1424/">Chalkidis et al. (2019)</a> </td><td>ECHR</td><td>Multi-label classification</td><td>10+1</td></tr>
-<tr><td>ECtHR (Task B)</td><td> <a href="https://aclanthology.org/2021.naacl-main.22/">Chalkidis et al. (2021a)</a> </td><td>ECHR</td><td>Multi-label classification </td><td>10</td></tr>
+<tr><td>ECtHR (Task B)</td><td> <a href="https://aclanthology.org/2021.naacl-main.22/">Chalkidis et al. (2021a)</a> </td><td>ECHR</td><td>Multi-label classification </td><td>10+1</td></tr>
<tr><td>SCOTUS</td><td> <a href="http://scdb.wustl.edu">Spaeth et al. (2020)</a></td><td>US Law</td><td>Multi-class classification</td><td>14</td></tr>
<tr><td>EUR-LEX</td><td> <a href="https://arxiv.org/abs/2109.00904">Chalkidis et al. (2021b)</a></td><td>EU Law</td><td>Multi-label classification</td><td>100</td></tr>
<tr><td>LEDGAR</td><td> <a href="https://aclanthology.org/2020.lrec-1.155/">Tuggener et al. (2020)</a></td><td>Contracts</td><td>Multi-class classification</td><td>100</td></tr>
@@ -141,13 +141,13 @@ The current leaderboard includes several Transformer-based (Vaswani et al., 2017)
<table>
<tr><td>Dataset</td><td>ECtHR Task A </td><td>ECtHR Task B </td><td>SCOTUS </td><td>EUR-LEX</td><td>LEDGAR </td><td>UNFAIR-ToS </td><td>CaseHOLD</td></tr>
<tr><td>Model</td><td>μ-F1 / m-F1 </td><td>μ-F1 / m-F1 </td><td>μ-F1 / m-F1 </td><td>μ-F1 / m-F1 </td><td>μ-F1 / m-F1 </td><td>μ-F1 / m-F1</td><td>μ-F1 / m-F1 </td></tr>
-<tr><td>BERT </td><td><b>71.4</b> / 64.0 </td><td>87.6 / <b>77.8</b> </td><td>70.5 / 60.9 </td><td>71.6 / 55.6 </td><td>87.7 / 82.2 </td><td>97.3 / 80.4</td><td>70.7 </td></tr>
-<tr><td>RoBERTa </td><td>69.5 / 60.7 </td><td>87.2 / 77.3 </td><td>70.8 / 61.2 </td><td>71.8 / <b>57.5</b> </td><td>87.9 / 82.1 </td><td>97.2 / 79.6</td><td>71.7 </td></tr>
-<tr><td>DeBERTa </td><td>69.1 / 61.2 </td><td>87.4 / 77.3 </td><td>70.0 / 60.0 </td><td><b>72.3</b> / 57.2 </td><td>87.9 / 82.0 </td><td>97.2 / 80.2</td><td>72.1 </td></tr>
-<tr><td>Longformer </td><td>69.6 / 62.4 </td><td>88.0 / <b>77.8</b> </td><td>72.2 / 62.5 </td><td>71.9 / 56.7 </td><td>87.7 / 82.3 </td><td><b>97.5</b> / 81.0</td><td>72.0 </td></tr>
-<tr><td>BigBird </td><td>70.5 / 63.8 </td><td> <b>88.1</b> / 76.6 </td><td>71.7 / 61.4 </td><td>71.8 / 56.6 </td><td>87.7 / 82.1 </td><td>97.4 / 81.1</td><td>70.4 </td></tr>
-<tr><td>Legal-BERT </td><td>71.2 / <b>64.6</b> </td><td>88.0 / 77.2 </td><td>76.2 / 65.8 </td><td>72.2 / 56.2 </td><td><b>88.1</b> / <b>82.7</b></td><td> 97.4 / <b>83.4</b></td><td>75.1</td></tr>
-<tr><td>CaseLaw-BERT </td><td>71.2 / 64.2 </td><td>88.0 / 77.5 </td><td><b>76.4</b> / <b>66.2</b> </td><td>71.0 / 55.9 </td><td>88.0 / 82.3</td><td>97.4 / 82.4</td><td><b>75.6</b> </td></tr>
+<tr><td>BERT </td><td><b>71.4</b> / 64.0 </td><td>79.6 / <b>78.3</b> </td><td>70.5 / 60.9 </td><td>71.6 / 55.6 </td><td>87.7 / 82.2 </td><td>97.3 / 80.4</td><td>70.7 </td></tr>
+<tr><td>RoBERTa </td><td>69.5 / 60.7 </td><td>78.6 / 77.0 </td><td>70.8 / 61.2 </td><td>71.8 / <b>57.5</b> </td><td>87.9 / 82.1 </td><td>97.2 / 79.6</td><td>71.7 </td></tr>
+<tr><td>DeBERTa </td><td>69.1 / 61.2 </td><td>79.9 / <b>78.3</b> </td><td>70.0 / 60.0 </td><td><b>72.3</b> / 57.2 </td><td>87.9 / 82.0 </td><td>97.2 / 80.2</td><td>72.1 </td></tr>
+<tr><td>Longformer </td><td>69.6 / 62.4 </td><td>78.8 / 75.8 </td><td>72.2 / 62.5 </td><td>71.9 / 56.7 </td><td>87.7 / 82.3 </td><td><b>97.5</b> / 81.0</td><td>72.0 </td></tr>
+<tr><td>BigBird </td><td>70.5 / 63.8 </td><td> 79.9 / 76.9 </td><td>71.7 / 61.4 </td><td>71.8 / 56.6 </td><td>87.7 / 82.1 </td><td>97.4 / 81.1</td><td>70.4 </td></tr>
+<tr><td>Legal-BERT </td><td>71.2 / <b>64.6</b> </td><td><b>80.6</b> / 77.2 </td><td>76.2 / 65.8 </td><td>72.2 / 56.2 </td><td><b>88.1</b> / <b>82.7</b></td><td> 97.4 / <b>83.4</b></td><td>75.1</td></tr>
+<tr><td>CaseLaw-BERT </td><td>71.2 / 64.2 </td><td>79.7 / 76.8 </td><td><b>76.4</b> / <b>66.2</b> </td><td>71.0 / 55.9 </td><td>88.0 / 82.3</td><td>97.4 / 82.4</td><td><b>75.6</b> </td></tr>
</table>
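
The leaderboard columns report μ-F1 / m-F1, i.e. micro- and macro-averaged F1. The two can diverge sharply when labels are imbalanced, which is common in multi-label legal classification. The following is a minimal sketch of the difference, not the benchmark's evaluation code: the helper names `f1` and `micro_macro_f1`, the toy labels, and the convention of scoring a label with no positives as 0.0 are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; assumed to be 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[int]], pred: List[Set[int]], num_labels: int
) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for multi-label predictions."""
    # Per-label counts stored as [tp, fp, fn].
    counts = [[0, 0, 0] for _ in range(num_labels)]
    for g, p in zip(gold, pred):
        for label in range(num_labels):
            if label in g and label in p:
                counts[label][0] += 1  # true positive
            elif label in p:
                counts[label][1] += 1  # false positive
            elif label in g:
                counts[label][2] += 1  # false negative

    # Micro: pool the counts over all labels, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    micro = f1(tp, fp, fn)

    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / num_labels
    return micro, macro


# Hypothetical toy data: label 0 is frequent, labels 1 and 2 are rare and missed.
gold = [{0}, {0}, {0, 1}, {2}]
pred = [{0}, {0}, {0}, {0}]
micro, macro = micro_macro_f1(gold, pred, num_labels=3)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
```

On this toy input the micro average is 2/3 while the macro average is 2/7, because the two rare labels each contribute an F1 of zero to the unweighted mean. This is why the table can show a high μ-F1 alongside a noticeably lower m-F1.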

### Languages
@@ -223,12 +223,7 @@ An example of 'train' looks as follows.
An example of 'test' looks as follows.
```json
{
-"contexts": ["In Granato v. City and County of Denver, No. CIV 11-0304 MSK/BNB, 2011 WL 3820730 (D.Colo. Aug. 20, 2011), the Honorable Marcia S. Krieger, now-Chief United States District Judge for the District of Colorado, ruled similarly: At a minimum, a party asserting a Mo-nell claim must plead sufficient facts to identify ... to act pursuant to City or State policy, custom, decision, ordinance, re d 503, 506-07 (3d Cir.l985)(<HOLDING>).",
-"In Granato v. City and County of Denver, No. CIV 11-0304 MSK/BNB, 2011 WL 3820730 (D.Colo. Aug. 20, 2011), the Honorable Marcia S. Krieger, now-Chief United States District Judge for the District of Colorado, ruled similarly: At a minimum, a party asserting a Mo-nell claim must plead sufficient facts to identify ... to act pursuant to City or State policy, custom, decision, ordinance, re d 503, 506-07 (3d Cir.l985)(<HOLDING>).",
-"In Granato v. City and County of Denver, No. CIV 11-0304 MSK/BNB, 2011 WL 3820730 (D.Colo. Aug. 20, 2011), the Honorable Marcia S. Krieger, now-Chief United States District Judge for the District of Colorado, ruled similarly: At a minimum, a party asserting a Mo-nell claim must plead sufficient facts to identify ... to act pursuant to City or State policy, custom, decision, ordinance, re d 503, 506-07 (3d Cir.l985)(<HOLDING>).",
-"In Granato v. City and County of Denver, No. CIV 11-0304 MSK/BNB, 2011 WL 3820730 (D.Colo. Aug. 20, 2011), the Honorable Marcia S. Krieger, now-Chief United States District Judge for the District of Colorado, ruled similarly: At a minimum, a party asserting a Mo-nell claim must plead sufficient facts to identify ... to act pursuant to City or State policy, custom, decision, ordinance, re d 503, 506-07 (3d Cir.l985)(<HOLDING>).",
-"In Granato v. City and County of Denver, No. CIV 11-0304 MSK/BNB, 2011 WL 3820730 (D.Colo. Aug. 20, 2011), the Honorable Marcia S. Krieger, now-Chief United States District Judge for the District of Colorado, ruled similarly: At a minimum, a party asserting a Mo-nell claim must plead sufficient facts to identify ... to act pursuant to City or State policy, custom, decision, ordinance, re d 503, 506-07 (3d Cir.l985)(<HOLDING>).",
-],
+"context": "In Granato v. City and County of Denver, No. CIV 11-0304 MSK/BNB, 2011 WL 3820730 (D.Colo. Aug. 20, 2011), the Honorable Marcia S. Krieger, now-Chief United States District Judge for the District of Colorado, ruled similarly: At a minimum, a party asserting a Mo-nell claim must plead sufficient facts to identify ... to act pursuant to City or State policy, custom, decision, ordinance, re d 503, 506-07 (3d Cir.l985)(<HOLDING>).",
"endings": ["holding that courts are to accept allegations in the complaint as being true including monell policies and writing that a federal court reviewing the sufficiency of a complaint has a limited task",
"holding that for purposes of a class certification motion the court must accept as true all factual allegations in the complaint and may draw reasonable inferences therefrom",
"recognizing that the allegations of the complaint must be accepted as true on a threshold motion to dismiss",
55 changes: 29 additions & 26 deletions datasets/multi_eurlex/README.md
@@ -201,32 +201,35 @@ for sample in dataset:
```

### Data Splits

-| Language | ISO code | Member Countries where official | EU Speakers (%) (Native / Total) | Number of Documents (Training/Dev/Test) |
-| ---- | ---- | ---- | ---- | ---- |
-English |**en** | United Kingdom (1973-2020), Ireland (1973), Malta (2004) |13/ 51\% | 55,000 / 5,000 / 5,000 |
-| German | **de** |Germany (1958), Belgium (1958), Luxembourg (1958) |16/32\% |55,000 / 5,000 / 5,000
-| French | **fr** |France (1958), Belgium(1958), Luxembourg (1958) |12/26\% |55,000 / 5,000 / 5,000
-Italian | **it** |Italy (1958) | 13/16\% | 55,000 / 5,000 / 5,000
-Spanish | **es** |Spain (1986) | 8/15\% | 52,785 / 5,000 / 5,000
-Polish | **pl** |Poland (2004) | 8/9\% | 23,197 / 5,000 / 5,000 |
-Romanian | **ro** |Romania (2007) | 5/5\% | 15,921 / 5,000 / 5,000 |
-Dutch | **nl** |Netherlands (1958), Belgium (1958) | 4/5\% | 55,000 / 5,000 / 5,000 |
-Greek | **el** |Greece (1981), Cyprus (2008) | 3/4\% | 55,000 / 5,000 / 5,000 |
-Hungarian | **hu** |Hungary (2004) | 3/3\% | 22,664 / 5,000 / 5,000 |
-Portuguese | **pt** |Portugal (1986) | 2/3\% | 23,188 / 5,000 / 5,000 |
-Czech | **cs** |Czech Republic (2004) | 2/3\% | 23,187 / 5,000 / 5,000 |
-Swedish | **sv** |Sweden (1995) | 2/3\% | 42,490 / 5,000 / 5,000 |
-Bulgarian | **bg** |Bulgaria (2007) | 2/2\% | 15,986 / 5,000 / 5,000 |
-Danish | **da** |Denmark (1973) | 1/1\% | 55,000 / 5,000 / 5,000 |
-Finnish | **fi** |Finland (1995) | 1/1\% | 42,497 / 5,000 / 5,000 |
-Slovak | **sk** |Slovakia (2004) | 1/1\% | 15,986 / 5,000 / 5,000 |
-Lithuanian | **lt** |Lithuania (2004) | 1/1\% | 23,188 / 5,000 / 5,000 |
-Croatian | **hr** |Croatia (2013) | 1/1\% | 7,944 / 2,500 / 5,000 |
-Slovene | **sl** |Slovenia (2004) | <1/<1\% | 23,184 / 5,000 / 5,000 |
-Estonian | **et** |Estonia (2004) | <1/<1\% | 23,126 / 5,000 / 5,000 |
-Latvian | **lv** |Latvia (2004) | <1/<1\% | 23,188 / 5,000 / 5,000 |
-Maltese | **mt** |Malta (2004) | <1/<1\% | 17,521 / 5,000 / 5,000 |
+<table>
+<tr><td> Language </td> <td> ISO code </td> <td> Member Countries where official </td> <td> EU Speakers [1] </td> <td> Number of Documents [2] </td> </tr>
+<tr><td> English </td> <td> <b>en</b> </td> <td> United Kingdom (1973-2020), Ireland (1973), Malta (2004) </td> <td> 13/ 51% </td> <td> 55,000 / 5,000 / 5,000 </td> </tr>
+<tr><td> German </td> <td> <b>de</b> </td> <td> Germany (1958), Belgium (1958), Luxembourg (1958) </td> <td> 16/32% </td> <td> 55,000 / 5,000 / 5,000 </td> </tr>
+<tr><td> French </td> <td> <b>fr</b> </td> <td> France (1958), Belgium(1958), Luxembourg (1958) </td> <td> 12/26% </td> <td> 55,000 / 5,000 / 5,000 </td> </tr>
+<tr><td> Italian </td> <td> <b>it</b> </td> <td> Italy (1958) </td> <td> 13/16% </td> <td> 55,000 / 5,000 / 5,000 </td> </tr>
+<tr><td> Spanish </td> <td> <b>es</b> </td> <td> Spain (1986) </td> <td> 8/15% </td> <td> 52,785 / 5,000 / 5,000 </td> </tr>
+<tr><td> Polish </td> <td> <b>pl</b> </td> <td> Poland (2004) </td> <td> 8/9% </td> <td> 23,197 / 5,000 / 5,000 </td> </tr>
+<tr><td> Romanian </td> <td> <b>ro</b> </td> <td> Romania (2007) </td> <td> 5/5% </td> <td> 15,921 / 5,000 / 5,000 </td> </tr>
+<tr><td> Dutch </td> <td> <b>nl</b> </td> <td> Netherlands (1958), Belgium (1958) </td> <td> 4/5% </td> <td> 55,000 / 5,000 / 5,000 </td> </tr>
+<tr><td> Greek </td> <td> <b>el</b> </td> <td> Greece (1981), Cyprus (2008) </td> <td> 3/4% </td> <td> 55,000 / 5,000 / 5,000 </td> </tr>
+<tr><td> Hungarian </td> <td> <b>hu</b> </td> <td> Hungary (2004) </td> <td> 3/3% </td> <td> 22,664 / 5,000 / 5,000 </td> </tr>
+<tr><td> Portuguese </td> <td> <b>pt</b> </td> <td> Portugal (1986) </td> <td> 2/3% </td> <td> 23,188 / 5,000 / 5,000 </td> </tr>
+<tr><td> Czech </td> <td> <b>cs</b> </td> <td> Czech Republic (2004) </td> <td> 2/3% </td> <td> 23,187 / 5,000 / 5,000 </td> </tr>
+<tr><td> Swedish </td> <td> <b>sv</b> </td> <td> Sweden (1995) </td> <td> 2/3% </td> <td> 42,490 / 5,000 / 5,000 </td> </tr>
+<tr><td> Bulgarian </td> <td> <b>bg</b> </td> <td> Bulgaria (2007) </td> <td> 2/2% </td> <td> 15,986 / 5,000 / 5,000 </td> </tr>
+<tr><td> Danish </td> <td> <b>da</b> </td> <td> Denmark (1973) </td> <td> 1/1% </td> <td> 55,000 / 5,000 / 5,000 </td> </tr>
+<tr><td> Finnish </td> <td> <b>fi</b> </td> <td> Finland (1995) </td> <td> 1/1% </td> <td> 42,497 / 5,000 / 5,000 </td> </tr>
+<tr><td> Slovak </td> <td> <b>sk</b> </td> <td> Slovakia (2004) </td> <td> 1/1% </td> <td> 15,986 / 5,000 / 5,000 </td> </tr>
+<tr><td> Lithuanian </td> <td> <b>lt</b> </td> <td> Lithuania (2004) </td> <td> 1/1% </td> <td> 23,188 / 5,000 / 5,000 </td> </tr>
+<tr><td> Croatian </td> <td> <b>hr</b> </td> <td> Croatia (2013) </td> <td> 1/1% </td> <td> 7,944 / 2,500 / 5,000 </td> </tr>
+<tr><td> Slovene </td> <td> <b>sl</b> </td> <td> Slovenia (2004) </td> <td> <1/<1% </td> <td> 23,184 / 5,000 / 5,000 </td> </tr>
+<tr><td> Estonian </td> <td> <b>et</b> </td> <td> Estonia (2004) </td> <td> <1/<1% </td> <td> 23,126 / 5,000 / 5,000 </td> </tr>
+<tr><td> Latvian </td> <td> <b>lv</b> </td> <td> Latvia (2004) </td> <td> <1/<1% </td> <td> 23,188 / 5,000 / 5,000 </td> </tr>
+<tr><td> Maltese </td> <td> <b>mt</b> </td> <td> Malta (2004) </td> <td> <1/<1% </td> <td> 17,521 / 5,000 / 5,000 </td> </tr>
+</table>
+
+[1] Native and Total EU speakers percentage (%) \
+[2] Training / Development / Test Splits

## Dataset Creation
