How to apply the pre-trained model to a raw text file? #1

Open
hppRC opened this issue Jan 3, 2022 · 4 comments

Comments

@hppRC

hppRC commented Jan 3, 2022

Hi, @mounicam and @danieljkim0118.

Thank you for sharing your experiment code and the pre-trained model.

I want to apply the pre-trained Split-and-Rephrase model (ourmodel_bisect_wiki-001.pt) to my own raw text data, which consists only of the "complex" side. I have read some of the code and tried to run it in my environment, but I still don't understand how to do this.

Could you give me some instructions for applying your model to a raw text file, or share a code snippet?

Here is an example of my raw text file.
It contains only complex sentences, one sentence per line.

One side of the armed conflicts is composed mainly of the Sudanese military and the Janjaweed, a Sudanese militia group recruited mostly from the Afro-Arab Abbala tribes of the northern Rizeigat region in Sudan.
Jeddah is the principal gateway to Mecca, Islam's holiest city, which able-bodied Muslims are required to visit at least once in their lifetime.
The Great Dark Spot is thought to represent a hole in the methane cloud deck of Neptune.

I've already finished installing packages such as fairseq, tensorflow, simplediff, and stanfordcorenlp according to your README.md.
Also, I've downloaded the .jar file and the pre-trained model.

I will probably need to run the Moses tokenizer first, but what should I do after that?

Thanks a lot!

@tampered816

I have the same question.

@mounicam
Owner

mounicam commented May 10, 2022

The instructions to generate the output are in the README at
https://github.com/mounicam/BiSECT/tree/main/our_model

You can have dummy train, valid and test.dst files and use your file as test.src.
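
For readers following this suggestion, the file layout could be prepared roughly as in the sketch below. It is only an illustration under assumptions: the function name `build_raw_data`, the `"."` placeholder line, and the idea that every file should have the same number of lines are hypothetical choices, not something confirmed by the repository. It also assumes the input has already been tokenized if the pipeline requires that, and it says nothing about which dictionary the preprocessing step uses.

```python
from pathlib import Path


def build_raw_data(complex_file, out_dir, placeholder="."):
    """Write train/valid/test .src and .dst files into out_dir.

    Hypothetical helper: test.src receives the real complex sentences,
    every other file receives one placeholder line per sentence.
    Returns the number of sentences written to test.src.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Read the input, one complex sentence per line, skipping blank lines.
    text = Path(complex_file).read_text(encoding="utf-8")
    sentences = [line.strip() for line in text.splitlines() if line.strip()]

    # Assumption: dummy files only need to be line-aligned with test.src.
    dummy = [placeholder] * len(sentences)

    for split in ("train", "valid", "test"):
        src_lines = sentences if split == "test" else dummy
        (out / f"{split}.src").write_text("\n".join(src_lines) + "\n", encoding="utf-8")
        (out / f"{split}.dst").write_text("\n".join(dummy) + "\n", encoding="utf-8")

    return len(sentences)
```

Calling `build_raw_data("my_complex_sentences.txt", "data/raw_data")` (both paths are examples) would produce the six files; the preprocessing and generation steps described in the README would then be run on that folder.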

@tampered816

> The instructions to generate the output are in the README at https://github.com/mounicam/BiSECT/tree/main/our_model
>
> You can have dummy train, valid and test.dst files and use your file as test.src.

Our question is whether there is a test.py script corresponding to Train.py, because we need to obtain the results.

@drillerjon

I do not understand how to generate the output.
I created a folder raw_data containing train.dst, valid.dst, and test.dst, as well as train.src, valid.src, and test.src. test.src contains only one sentence for the model to split. Then I run:
sh generate.sh ../data/binarized_data ../model/our_model/ourmodel_bisect.pt result ../data/raw_data/train.src

data/binarized_data contains the files created during preprocessing.

I don't get any result. Could you help me?
