Downloading the tbv dataset. #8

Closed
tom-bu opened this issue Mar 18, 2022 · 13 comments

@tom-bu

tom-bu commented Mar 18, 2022

I'm trying to download the tbv dataset and it seems there are two instructions to do so. Do these two methods produce the same result?

One here:

  1. https://github.com/argoai/argoverse2-api/blob/main/DOWNLOAD.md
    s5cmd --no-sign-request cp s3://argoai-argoverse/av2/tbv/* target-directory

And another here:

  2. https://github.com/argoai/argoverse2-api/blob/main/src/av2/datasets/tbv/README.md
    SHARD_DIR={DESIRED PATH FOR TAR.GZ files}
    s5cmd cp s3://argoai-argoverse/av2/tars/tbv/*.tar.gz ${SHARD_DIR}

When I try 1, I get the error "s5cmd is hitting the max open file limit allowed by your OS. Either increase the open file limit or try to decrease the number of workers with '-numworkers' parameter".

When I try 2, I get the error "Error session: fetching region failed: NoCredentialProviders: no valid providers in chain. Deprecated."

Option 1 probably downloads about half of the dataset, while option 2 doesn't start the download at all. I will probably continue with 1, although 2 is probably faster. I'm using Ubuntu 18.04.
@benjaminrwilson
Collaborator

Hi @tom-bu,

Thanks for your interest!

  1. Have you tried increasing the open file limit? Additionally, does changing the numworkers parameter resolve the issue?

  2. I think the second command you listed is missing --no-sign-request. Does this resolve the issue?
    SHARD_DIR={DESIRED PATH FOR TAR.GZ files}
    s5cmd --no-sign-request cp s3://argoai-argoverse/av2/tars/tbv/*.tar.gz ${SHARD_DIR}
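
A minimal sketch of the first suggestion, for a Linux shell. The limit of 4096 and the worker count of 64 are arbitrary illustrative values, not recommendations from the maintainers, and `tbv_cp` is a hypothetical helper name. The error message spells the flag `-numworkers`; the sketch writes it as `--numworkers`, assuming an s5cmd release that takes double-dash global options in the same way as `--no-sign-request`.

```shell
# Current soft limit on open file descriptors for this shell session.
current=$(ulimit -n)
echo "open file limit: ${current}"

# Raise the soft limit only if it is below the (assumed) target of 4096.
# This fails when the hard limit (ulimit -Hn) is lower than the target.
if [ "${current}" != "unlimited" ] && [ "${current}" -lt 4096 ]; then
  ulimit -n 4096 2>/dev/null || echo "could not raise limit (hard limit: $(ulimit -Hn))" >&2
fi

# Assumed worker count, lower than the s5cmd default; tune to your machine.
NUMWORKERS=64

tbv_cp() {
  # $1: destination directory. With DRY_RUN=1 (the default here) the
  # command is only printed, so nothing is downloaded by accident.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "s5cmd --no-sign-request --numworkers ${NUMWORKERS} cp s3://argoai-argoverse/av2/tbv/* $1"
  else
    # The wildcard is quoted so that s5cmd, not the shell, expands it.
    s5cmd --no-sign-request --numworkers "${NUMWORKERS}" cp 's3://argoai-argoverse/av2/tbv/*' "$1"
  fi
}

tbv_cp target-directory
```

Setting `DRY_RUN=0` before calling `tbv_cp` runs the download instead of printing it. The `ulimit` change only applies to the current shell session and the processes started from it.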

@johnwlambert

Hey @tom-bu, thanks for trying out TbV. If you're still having issues with s5cmd after trying the fixes Ben suggested, you can download the 21 tar files directly from these links:

wget https://s3.amazonaws.com/argoai-argoverse/av2/tars/tbv/TbV_v1.0_shard0.tar.gz
wget https://s3.amazonaws.com/argoai-argoverse/av2/tars/tbv/TbV_v1.0_shard1.tar.gz
...
wget https://s3.amazonaws.com/argoai-argoverse/av2/tars/tbv/TbV_v1.0_shard20.tar.gz

The TbV dataset is available in two identical formats for users' convenience. You can either download the 21 tar.gz files directly and then use the untar_tbv.py script, or pull down all of the files in their extracted form. Depending on your connection, one may be much faster than the other (the extracted format contains almost 8 million image files, for example).
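
The wget commands above can be generated with a short loop rather than typed out. This is a sketch that assumes the shards are numbered 0 through 20 with no gaps, which matches the "21 tar files" and the three links shown but is not confirmed beyond that; `tbv_shard_urls` is a hypothetical helper name.

```shell
# Base URL taken from the links quoted in this thread.
BASE_URL="https://s3.amazonaws.com/argoai-argoverse/av2/tars/tbv"

tbv_shard_urls() {
  # Assumption: shard indices run from 0 to 20 inclusive, with no gaps.
  i=0
  while [ "$i" -le 20 ]; do
    echo "${BASE_URL}/TbV_v1.0_shard${i}.tar.gz"
    i=$((i + 1))
  done
}

# Print the list to check it before downloading anything.
tbv_shard_urls

# To download, pipe the list to wget. The -c option resumes partially
# downloaded files, and "-i -" reads the URLs from standard input:
#   tbv_shard_urls | wget -c -i -
```

Because of `-c`, rerunning the same wget pipeline after an interrupted session should pick up where it left off instead of restarting each shard.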

@tom-bu
Author

tom-bu commented Mar 18, 2022

Thanks for the feedback! My s5cmd version doesn't seem to have a sync method to continue where I left off with option 1, so I'm currently trying option 2 with the tar files. I realized for option 2 I wasn't waiting long enough to download, but I believe the --no-sign-request is necessary.

@tom-bu
Author

tom-bu commented Mar 18, 2022

Hi @johnwlambert, thanks for the info. Another question I had about the TbV dataset is where can I find the labels for change? Is there a file that indicates different log pairs and what has changed?

Thanks!

@senselessdev1
Contributor

Hi @tom-bu, we'll upload more information about the train/val/test splits for TbV in the next day or two.

@tom-bu
Author

tom-bu commented Mar 21, 2022

Thanks for the update @johnlambert-argo. Look forward to it!

@senselessdev1
Contributor

@tom-bu I've provided a clustering of logs by spatial location in this PR: #26.

A few things to note:

  • Each log within a cluster shares some significant visual overlap with other logs within its cluster.
  • These are not necessarily before/after pairs. In some cases, all logs in a cluster may be "after" a change.
  • Each cluster has at least one log in the val or test set.
  • Logs of each cluster are provided in chronological order.

@tom-bu
Author

tom-bu commented Mar 26, 2022

@johnlambert-argo great, thanks! Are all of the logs clustered? I believe there are ~1000 logs? Also, will labels be released for which logs include change/no change?

@johnwlambert

Hi @tom-bu, no, this is only a specific subset of logs, where some log in the cluster had a real-world change.

You could cluster all the logs spatially, though, using their poses.
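
One very rough way to do this, as a sketch only: bucket logs into a coarse grid by location. Everything about the input here is an assumption. It expects a CSV you would have to build yourself with one row per log, `log_id,city,x,y`, where x and y are the log's mean ego-vehicle position in meters in that city's coordinate frame. Neither that file layout nor the 50 m cell size comes from the TbV documentation, and `bucket_logs` is a hypothetical helper name.

```shell
# Assumed grid cell size in meters; not a value from the TbV docs.
CELL_M=50

bucket_logs() {
  # $1: path to the hypothetical "log_id,city,x,y" CSV described above.
  awk -F, -v cell="${CELL_M}" '
    # Round toward negative infinity so negative coordinates bucket correctly.
    function fl(v) { return (v < 0 && v != int(v)) ? int(v) - 1 : int(v) }
    {
      # Logs whose mean position falls in the same grid cell of the same
      # city receive the same cluster key.
      key = $2 ":" fl($3 / cell) ":" fl($4 / cell)
      print key "," $1
    }
  ' "$1" | sort
}
```

Calling `bucket_logs my_log_positions.csv` (file name hypothetical) prints one `city:cell_x:cell_y,log_id` line per log, sorted so that logs sharing a cell appear together. Grid bucketing splits neighbors that straddle a cell boundary, and a mean position says little about how much two trajectories actually overlap, so treat the output as a starting point rather than a substitute for the clusters provided in the PR.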

Labels for the val set will be released, but labels for the test set will not be (they'll be used for an online leaderboard, which you will be welcome to submit to). We'll release those val set annotations in probably 1-3 days.

@tom-bu
Author

tom-bu commented Mar 29, 2022

Hi @johnlambert-argo,

I just realized that this map change dataset doesn't necessarily have before and after sensor data as shown in the image here. So it seems we're just checking if the corresponding vector map is up-to-date or not?

And I wanted to verify that the training/validation sets have no changes. Therefore, all we need to know is if a log is in the training/validation set to know the label?

Thanks,

Tom

@benjaminrwilson
Collaborator

Hi @tom-bu, were you able to get your questions answered?

@tom-bu
Author

tom-bu commented Apr 13, 2022

Have the data splits been released yet? I think that's the only thing I'm waiting on.

@benjaminrwilson
Collaborator

@tom-bu These are now available here: https://github.com/argoai/av2-api/blob/main/src/av2/datasets/tbv/splits.py

Please reach out if you have any other questions!
