ValueError: cannot mmap an empty file #37

Open
xiang-xiang-zhu opened this issue Aug 23, 2021 · 6 comments

Comments

@xiang-xiang-zhu

When I try to view the shape of train.features.mmap, NumPy raises "ValueError: cannot mmap an empty file". How can I solve this problem?

By the way, can I use the mmap files (such as train/valid/test.features.mmap) directly as the video features, for example by saving them as .npy files for multimodal training?

Thank you.

@xiang-xiang-zhu
Author

Because I didn't download the complete compressed image package, I want to know how many images there are in the training, validation, and test sets respectively.
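
Since the image package was only partly downloaded, one thing worth ruling out is a feature file that is missing or zero bytes long, which would fit the error in the title. A minimal sketch, assuming the three file names mentioned earlier in the thread; find_empty_files and the data directory are hypothetical and not part of the repository:

```python
import os


def find_empty_files(paths):
    """Return the paths that are missing or have zero bytes."""
    bad = []
    for path in paths:
        # A zero-byte file cannot be memory-mapped, so flag it before opening.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(path)
    return bad


# Hypothetical usage; "path/to/data" is a placeholder directory.
# splits = ["train", "valid", "test"]
# paths = [os.path.join("path/to/data", f"{s}.features.mmap") for s in splits]
# print(find_empty_files(paths))
```

Any path this reports would need to be downloaded again before np.memmap can open it.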

@YuxianMeng
Collaborator

YuxianMeng commented Aug 23, 2021

@xiang-xiang-zhu Hi, I guess you are not familiar with the mmap format.
We chose mmap instead of .npy because loading an .npy file reads the whole np.array into memory by default, and our feature file is too big for that.
If you want to know the number of images without downloading them, you can download the text first; each sentence in the text should be paired with one image.
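
If each sentence is paired with one image, counting sentences gives the image count. A rough sketch, assuming the text is stored as one sentence per line in a UTF-8 file; the thread does not confirm that layout, and count_sentences and the file path are hypothetical:

```python
def count_sentences(text_path):
    """Count non-empty lines, assuming one sentence per line."""
    with open(text_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


# Hypothetical usage; the path is a placeholder for a downloaded text file.
# print(count_sentences("path/to/data/train.text"))
```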

@xiang-xiang-zhu
Author

Now I want to train with my own model. Can I read the image data directly with mmap? Is the shape (image_num, feature_dim)?

@YuxianMeng
Collaborator

@xiang-xiang-zhu Yes, please refer to our corresponding code.
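
For readers without the repository at hand, a file of shape (image_num, feature_dim) can be opened roughly as follows. This is a sketch, not the project's loader: open_features is a hypothetical helper, float32 is assumed as the dtype, and feature_dim must come from the dataset documentation because a raw mmap file stores no shape metadata.

```python
import os

import numpy as np


def open_features(path, feature_dim, dtype="float32"):
    """Memory-map a raw feature file as a read-only (image_num, feature_dim) array."""
    row_bytes = feature_dim * np.dtype(dtype).itemsize
    size = os.path.getsize(path)
    # An empty or truncated file cannot hold a whole number of feature rows.
    if size == 0 or size % row_bytes != 0:
        raise ValueError(
            f"{path}: size {size} is not a positive multiple of {row_bytes} bytes"
        )
    image_num = size // row_bytes
    return np.memmap(path, dtype=dtype, mode="r", shape=(image_num, feature_dim))


# Hypothetical usage; the path and FEATURE_DIM are placeholders.
# features = open_features("path/to/data/train.features.mmap", FEATURE_DIM)
# print(features.shape)
# np.save("subset.npy", np.asarray(features[:1000]))  # copies only these rows
```

Only the rows that are indexed get read from disk, so a slice can be copied into an .npy file without loading the whole feature file.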

@xiang-xiang-zhu
Author

xiang-xiang-zhu commented Aug 24, 2021

@xiang-xiang-zhu Yes, please refer to our corresponding code.

Thank you very much.
I would like to ask whether [0, 1, 2] in the jsonl file represents one conversation made up of sentences 0 through 2, where 1 is the response to 0 and 2 is the response to 1. Are the conversations in [3, 4, 5] unrelated to [0, 1, 2] during training?

np.memmap(feature_file(data_dir, split), dtype='float32', mode='r', shape=(self.total_num, self.dim))
By the way, when I read the mmap file with this code, does the array index correspond to the image index? For example, after reading train.features.mmap, is the nth element of the array the feature of the nth training image?

@YuxianMeng
Collaborator

@xiang-xiang-zhu Yes, both of your comments are right.
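
The confirmed reading of the jsonl groups can be sketched as follows: each sentence is paired with the next one in the same group, and no pair crosses a group boundary. This assumes every line is a JSON array of sentence indices such as [0, 1, 2]; dialogue_pairs is a hypothetical helper, not the repository's loader.

```python
import json


def dialogue_pairs(jsonl_lines):
    """Yield (context_id, response_id) pairs from lines like "[0, 1, 2]"."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        ids = json.loads(line)
        # Pair each sentence with the one that follows it in the same dialogue only.
        for prev_id, next_id in zip(ids, ids[1:]):
            yield prev_id, next_id


pairs = list(dialogue_pairs(["[0, 1, 2]", "[3, 4, 5]"]))
print(pairs)  # [(0, 1), (1, 2), (3, 4), (4, 5)]
```

Because the nth row of the feature array belongs to the nth image, each sentence index in a pair can also be used to look up its image feature.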
