Is your feature request related to a problem? Please describe.
I’m working through these Classifier and Heuristic Quality Filtering docs. I’m looking for an elegant way to write filtered docs back to file. If I follow the example in the docs, I read with books = DocumentDataset.read_json(files, add_filename=True) and ultimately write with long_books.to_json("long_books/", write_to_filename=True). This gets me an output file with the correct name, but the new data now has a filename field, which I do not wish to have.
If instead I avoid using add_filename=True and then use .to_json("long_books") then I end up with the data I want, but in a file called 0.part.
Describe the solution you'd like
I'd like to be able to write to a .jsonl file directly, either without creating the filename column or without including it in the output file.
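As a rough illustration of the requested behavior, here is a minimal sketch in plain Python. It is not NeMo Curator's API: the helper name `write_without_filename`, the sample records, and the output directory are all made up for this sketch, and it assumes the records fit in memory.

```python
import json
import os
import tempfile
from collections import defaultdict


def write_without_filename(records, out_dir, filename_key="filename"):
    """Hypothetical helper: group records by their source filename and
    write each group to out_dir/<filename>, dropping the filename field."""
    os.makedirs(out_dir, exist_ok=True)
    groups = defaultdict(list)
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not mutated
        name = rec.pop(filename_key)  # use the field for routing only
        groups[name].append(rec)

    written = []
    for name, rows in groups.items():
        path = os.path.join(out_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")  # one JSON object per line
        written.append(path)
    return written


# Made-up sample: two filtered records that came from different input files.
long_books = [
    {"text": "a long book", "filename": "books_a.jsonl"},
    {"text": "another long book", "filename": "books_b.jsonl"},
]
out_dir = tempfile.mkdtemp()
paths = write_without_filename(long_books, out_dir)
```

The point of the sketch is that the filename is used only to pick the output path and never appears in the written rows.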
Describe alternatives you've considered
...which won't work for larger datasets or multiple files.