Comments (11)
Since I have not heard back from anyone, I wrote a script that uses selenium to populate the downloads file from the webpage.
import argparse
import time

import requests
import selenium.webdriver.firefox.options
from selenium import webdriver

CO3D_WEBPAGE_URL = "https://ai.facebook.com/datasets/co3d-downloads/"


def fetch_url_by_span_text(driver, query_text):
    # Find the <span> containing the text, then read the href of its parent element.
    # Note: find_element_by_xpath is the Selenium 3 API; later Selenium 4 releases
    # removed it in favour of find_element(By.XPATH, ...).
    text_elem = driver.find_element_by_xpath("//span[contains(text(),'{}')]".format(query_text))
    a_elm = text_elem.find_element_by_xpath("..")
    url = a_elm.get_attribute("href")
    return url


def get_category_ids(driver):
    cur_list_url = fetch_url_by_span_text(driver, "Download all links")
    response = requests.get(cur_list_url)
    data = response.text
    lines = data.split('\n')[1:]  # Skip the header line
    # Ignore blank lines so a trailing newline does not raise an IndexError
    category_ids = [elm.split()[0].strip() for elm in lines if elm.strip()]
    return category_ids


def get_co3d_urls(page_path):
    options = selenium.webdriver.firefox.options.Options()
    options.headless = True
    firefox_profile = webdriver.FirefoxProfile()
    firefox_profile.set_preference("browser.privatebrowsing.autostart", True)
    with webdriver.Firefox(options=options, firefox_profile=firefox_profile) as driver:
        driver.get(page_path)
        time.sleep(1)  # Some delay to let the webpage populate
        category_ids = get_category_ids(driver)
        item_path_pairs = []
        for category_id in category_ids:
            url = fetch_url_by_span_text(driver, category_id)
            item_path_pairs.append((category_id, url))
    return item_path_pairs


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--download_files_list", type=str, required=False, help="Where the downloadable list will be generated", default="./downloadpaths.txt")
    args = parser.parse_args()
    co3d_item_urls = get_co3d_urls(CO3D_WEBPAGE_URL)
    with open(args.download_files_list, 'w') as f_out:
        f_out.write("file_name\tcdn_link\n")
        for i, (item, url) in enumerate(co3d_item_urls):
            f_out.write(item)
            f_out.write('\t')
            f_out.write(url)
            if i < len(co3d_item_urls) - 1:
                f_out.write('\n')
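To check the generated file, it can be read back as follows. This is a minimal sketch that assumes the tab-separated layout the script above writes (a `file_name`/`cdn_link` header followed by one row per category). The helper name `read_download_links` is hypothetical and is not part of the co3d repository.

```python
import csv


def read_download_links(path):
    """Return {file_name: cdn_link} from the tab-separated links file."""
    links = {}
    with open(path, newline="") as f_in:
        reader = csv.DictReader(f_in, delimiter="\t")
        for row in reader:
            # Skip rows where either the name or the link is missing.
            if row.get("file_name") and row.get("cdn_link"):
                links[row["file_name"]] = row["cdn_link"]
    return links
```

Calling `read_download_links("./downloadpaths.txt")` after running the script should give one entry per category, which makes it easy to spot a category whose link was not found.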
from co3d.
+1
There is also a ZeroDivisionError in download_dataset.py (line 138) if the download fails like this.
from co3d.
Thanks for releasing this useful dataset. I was trying to download the data following the CDN links found in the text file, but for the URLs I get a "URL signature expired" error from any browser and any machine I try it from. How do I solve this?
Hi, thanks for the interest in our dataset and sorry for being late with the response due to some of us being on summer holiday.
The CDN links expire once every few days and the link text file has to be re-downloaded. Make sure to download a fresh list of links whenever you start the download. This should fix the problem.
Indeed the solution using selenium seems to do the latter automatically. Thanks for the code!
from co3d.
Hi, I tried to use the script to download the dataset with the CDN links copied from your website, but I also hit the ZeroDivisionError.
Do you know what could be the reason for this?
from co3d.
I just downloaded the text file and retried the URLs, and I still get the "URL signature expired" error. The URLs within the text file have expired. The ZeroDivisionError in the Python script also happens because of this. That's why I wrote the script above. It requires selenium to work, but it generates a fresh text file, which should allow one to download the dataset without facing these errors.
from co3d.
@davnov134 Happy Summer Holiday! The bugs here are:
- The website generating the 51-line text links file appears to be broken. The URLs are all expired. Perhaps it's generating a static file and/or there are some caching problems going on. It seems we have multiple reproductions here.
- The download script has a ZeroDivisionError bug that triggers when one or more of the files can't be downloaded. There are several reproductions of this too.
Edit: huh it seems the manual download links may have also expired now (i.e. the 50 links on https://ai.facebook.com/datasets/co3d-downloads/ ). I haven't seen that happen before.
from co3d.
@pwais The links on the page do work for me still. Perhaps the problem on your end happens due to webpage caching by your browser? Do the links work if you load the page in incognito?
from co3d.
hmm looks like I had some connection issues. the download script still doesn't work for me tho. the selenium script does help a lot!!
from co3d.
@pwais , I just downloaded a fresh link list file and launched the download without issues.
If you make sure that you are using a fresh set of links (i.e. do a no-cache refresh of the link page; in Chrome on a Mac this is done via Cmd+Shift+R), do you still encounter the zero-division error?
The selenium solution is very nice, but introduces too big of a dependency to be supported officially. So I'd rather make sure that the problem cannot be solved in a simpler manner.
from co3d.
Agree that selenium is a heavy requirement, but the links seem to expire from time to time nonetheless. I was able to download using wget --continue, which was handy because the downloads did fail from time to time. The ZeroDivisionError remains: when a download has zero bytes (it fails), the exception hides the real issue, which is that the link is broken. The division only feeds the progress bar, and a broken progress bar is irrelevant if the file can't be downloaded at all. If the response is empty, perhaps just raise a ValueError?
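The suggestion above can be sketched as follows. This is not the code in download_dataset.py; the function name `progress_fraction` and its parameters are hypothetical, and the only assumption is that the progress bar divides the bytes received by a total size that is zero when the download failed.

```python
def progress_fraction(bytes_done, total_bytes):
    """Fraction of the download completed, for a progress bar."""
    if total_bytes <= 0:
        # A zero-length response means the link is broken or expired;
        # say so instead of letting the division below fail.
        raise ValueError(
            "Server reported a zero-length download; the link may have expired."
        )
    return bytes_done / total_bytes
```

With this guard, the caller sees an error about the link itself instead of a ZeroDivisionError raised from the progress-bar arithmetic.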
@davnov134 The paper says "The CO3D collection effort still continues at a steady pace of ∼500 videos per week which we plan to release in the near future." Do you intend to version the dataset and/or provide the new videos? I think what most people would want here is an experience similar to rsync or aws s3 sync: any partial data is not re-downloaded, and new data can be downloaded easily too. (Note that the existing download script always starts from scratch, which doesn't scale well for a dataset this size. I had to resume multiple times due to network issues, and I never saw better than 50 MByte/sec download.) awscli is a healthy multi-platform client, but I can understand why Facebook might not want to depend on that and/or publish under an S3-compatible server-side solution ...
At any rate, thanks for this amazing dataset! I wish there were a more straightforward way for distributing stuff like this, COCO, imagenet, etc...
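One way to approximate the resume behaviour described above is an HTTP Range request, which is also the mechanism wget --continue relies on. The sketch below only builds the request headers; the helper name `resume_request_headers` is hypothetical, and whether the CDN honours Range requests is an assumption, not something confirmed in this thread.

```python
import os


def resume_request_headers(local_path):
    """Build HTTP headers that ask the server for the bytes not yet on disk."""
    already = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if already == 0:
        # Nothing downloaded yet: request the whole file.
        return {}
    # Ask only for the bytes from the current end of the partial file onwards.
    return {"Range": "bytes={}-".format(already)}
```

The headers could then be passed to something like `requests.get(url, headers=..., stream=True)`. A 206 response means the server sent only the missing part, so the body should be appended to the existing file; a 200 response means the Range header was ignored and the file has to be rewritten from the start.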
from co3d.
@davnov134 Thanks for the recent fix. I still can't download the whole dataset though :( It eventually times out, and the download script doesn't allow resumes (it tends to blank out everything downloaded thus far). I have fiber internet, so I don't think the problem is that my connection is too slow.
- Will the dataset be available via one big download (e.g. how imagenet was) or bittorrent or something? The current distribution method doesn't seem to work. Once upon a time Facebook had Wirehog (https://en.wikipedia.org/wiki/Wirehog ) ... maybe they can revive that?
- Again, the paper says "The CO3D collection effort still continues at a steady pace of ∼500 videos per week which we plan to release in the near future." Do you intend to version the dataset and/or provide the new videos?
from co3d.