fix: add custom load dataset function for MLSUM tasks (#405)
* fix: add custom load dataset function

* fix: fix linter

* docs: added points

---------

Co-authored-by: Imene Kerboua <imene.kerboua@esker.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
3 people authored Apr 18, 2024
1 parent b3e4239 commit c549af2
Showing 4 changed files with 34 additions and 3 deletions.
6 changes: 3 additions & 3 deletions docs/mmteb/points.md
@@ -2,15 +2,15 @@

| GitHub | Total points | New dataset | New task | Dataset annotations | (Bug)fixes | Running Models | Review PR | Paper Writing | Ideation | Coordination |
|-------------------| ------------ |-------------| -------- | ------------------- | ---------- | -------------- |-----------| -------------- | -------- | ------------- |
-| KennethEnevoldsen | | 54 | | 8 | 18 | | 32 | | | 5 |
+| KennethEnevoldsen | | 54 | | 8 | 18 | | 33 | | | 5 |
| x-tabdeveloping | | 48 | | | | | | | | |
-| imenelydiaker | | 88 | | | | | 15 | | | |
+| imenelydiaker | | 88 | | | 2 | | 15 | | | |
| wissam-sib | | 88 | | | | | 1 | | | |
| GabrielSequeira | | 88 | | | | | | | | |
| schmarion | | 88 | | | | | | | | |
| MathieuCiancone | | 88 | | | | | | | | |
| Sakshamrzt | | 10 | | | | | 2 | | | |
-| MartinBernstorff | | 2 | | | 7 | | 3 | | | |
+| MartinBernstorff | | 2 | | | 7 | | 4 | | | |
| guenthermi | | 12 | | | | | | | | |
| Muennighoff | | | | | | | 8 | | | |
| rasdani | | 4 | | | | | | | | |
15 changes: 15 additions & 0 deletions mteb/tasks/Clustering/fra/MLSUMClusteringP2P.py
@@ -38,6 +38,21 @@ class MLSUMClusteringP2P(AbsTaskClustering):
         avg_character_length=None,
     )

+    def load_data(self, **kwargs):
+        """
+        Load dataset from HuggingFace hub and convert it to the standard format.
+        """
+        if self.data_loaded:
+            return
+        self.dataset = datasets.load_dataset(
+            self.metadata.dataset["path"],
+            self.metadata.dataset["name"],
+            split=self.metadata.eval_splits[0],
+            revision=self.metadata.dataset["revision"],
+        )
+        self.dataset_transform()
+        self.data_loaded = True
+
     def create_description(self, example):
         example["text"] = example["title"] + " " + example["text"]
         return example
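
The `load_data` override added above follows a load-once pattern: return early if the data is already loaded, fetch a single evaluation split at a pinned revision, run `dataset_transform`, then set the flag. The following is a minimal sketch of that control flow only, not the MTEB implementation. `SketchTask`, `fake_loader`, the injected `loader` argument, and all metadata values are hypothetical stand-ins so the example runs without network access or the `datasets` package.

```python
from types import SimpleNamespace

# Records every call made to the stand-in loader (hypothetical helper).
loader_calls = []


def fake_loader(path, name, split, revision):
    """Stand-in for datasets.load_dataset; returns a tiny in-memory 'split'."""
    loader_calls.append(
        {"path": path, "name": name, "split": split, "revision": revision}
    )
    return [{"title": "t", "text": "body"}]


class SketchTask:
    """Hypothetical task class mirroring the control flow of load_data above."""

    def __init__(self, metadata, loader):
        self.metadata = metadata
        self.loader = loader  # assumption: injected here only to keep the sketch offline
        self.data_loaded = False
        self.dataset = None
        self.transform_calls = 0

    def load_data(self, **kwargs):
        # Guard: do nothing if the data has already been loaded.
        if self.data_loaded:
            return
        # Load only the first evaluation split, pinned to a revision.
        self.dataset = self.loader(
            self.metadata.dataset["path"],
            self.metadata.dataset["name"],
            split=self.metadata.eval_splits[0],
            revision=self.metadata.dataset["revision"],
        )
        self.dataset_transform()
        self.data_loaded = True

    def dataset_transform(self):
        # Placeholder for the task-specific conversion step.
        self.transform_calls += 1


# Placeholder metadata values, made up for illustration.
metadata = SimpleNamespace(
    dataset={"path": "example/dataset", "name": "fr", "revision": "0000000"},
    eval_splits=["test"],
)

task = SketchTask(metadata, fake_loader)
task.load_data()
task.load_data()  # second call returns early because data_loaded is True
```

Because the flag is set only after `dataset_transform` returns, a failure during loading or transformation leaves `data_loaded` as `False`, so a later call would try again.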
15 changes: 15 additions & 0 deletions mteb/tasks/Clustering/fra/MLSUMClusteringS2S.py
@@ -38,6 +38,21 @@ class MLSUMClusteringS2S(AbsTaskClustering):
         avg_character_length=None,
     )

+    def load_data(self, **kwargs):
+        """
+        Load dataset from HuggingFace hub and convert it to the standard format.
+        """
+        if self.data_loaded:
+            return
+        self.dataset = datasets.load_dataset(
+            self.metadata.dataset["path"],
+            self.metadata.dataset["name"],
+            split=self.metadata.eval_splits[0],
+            revision=self.metadata.dataset["revision"],
+        )
+        self.dataset_transform()
+        self.data_loaded = True
+
     def dataset_transform(self):
         """
         Convert to standard format
1 change: 1 addition & 0 deletions tmp/results
Submodule results added at 6cecf2
