celebA-HQ-dataset-download

While working with the celebA-HQ dataset I found it quite difficult to generate the dataset, so I collected the following scripts and dockerized them to make life a little bit easier.

To get the celebA-HQ dataset, you need to
a) download the celebA dataset with download_celebA.py,
b) unzip the celebA files with p7zip,
c) move the Anno files to the celebA folder,
d) download some extra files with download_celebA_HQ.py,
e) run some processing to get the HQ images with make_HQ_images.py.
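
In rough shell terms, the pipeline above looks something like the following. The script names come from this repo, but the exact arguments and the directory layout are assumptions for illustration, not taken from the scripts:

```shell
# Hypothetical sketch of the manual pipeline (a-e above); the target
# directory and script arguments are assumptions, not the scripts' real CLI.
DATA_DIR=./celebA-HQ

# a) download the celebA dataset
python download_celebA.py "$DATA_DIR"

# b) unzip the celebA files (p7zip joins the .7z.001, .7z.002, ... parts)
7z x "$DATA_DIR/celebA/img_celeba.7z.001" -o"$DATA_DIR/celebA"

# c) move the annotation files into the celebA folder
mv "$DATA_DIR/Anno" "$DATA_DIR/celebA/"

# d) download the CelebA-HQ delta files
python download_celebA_HQ.py "$DATA_DIR"

# e) combine the images and deltas into the final HQ images
python make_HQ_images.py "$DATA_DIR"
```

In practice the create_celebA-HQ.sh wrapper described below runs these steps for you.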

The final dataset is 89 GB. However, you will need a bit more storage than that to be able to run the scripts.

Usage

Docker

If you have Docker installed, run the following command from the root directory of this project:

docker build -t celeba-hq . && docker run -it -v $(pwd):/data celeba-hq

By default, this will create the dataset in the current directory. To put it elsewhere, replace $(pwd) with the absolute path to the desired output directory.
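
For example, to write the dataset to a dedicated drive instead of the current directory (the host path here is only an illustration):

```shell
# Build the image, then mount an arbitrary host directory as /data
docker build -t celeba-hq .
docker run -it -v /mnt/bigdisk/celeba-hq:/data celeba-hq
```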

Prebuilt Docker Image

I also have a pre-built Docker image at suvojit0x55aa/celeba-hq, so you can docker run it without even cloning the repo!

docker run -it -v $(pwd):/data suvojit0x55aa/celeba-hq

Running it locally

If you choose to go down the more involved path for whatever reason, follow the steps below:

  1. Clone the repository
git clone https://github.com/suvojit-0x55aa/celebA-HQ-dataset-download.git
cd celebA-HQ-dataset-download
  2. Install the necessary packages (Conda is recommended because specific versions are required)
conda create -n celebaHQ python=3
source activate celebaHQ
  • Install the packages
conda install jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy
pip install opencv-python==3.4.0.12 cryptography==2.1.4
  • Install 7zip (on Ubuntu)
sudo apt-get install p7zip-full
  3. Run the scripts
./create_celebA-HQ.sh <dir_to_save_files>

where <dir_to_save_files> is the directory where you wish the data to be saved.

  4. Go watch a movie; these scripts will take a few hours to run depending on your internet connection and your CPU power. By default the script launches as many jobs as you have cores on your CPU. If you want to change this behaviour, edit the create_celebA-HQ.sh script. The final HQ images will be saved as .jpg files in the <dir_to_save_files>/celeba-hq folder.
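
The one-job-per-core behaviour mentioned above can be sketched with a standard multiprocessing pool. This is a generic pattern, not the repo's actual code, and process_image is a hypothetical stand-in for the real per-image work:

```python
import multiprocessing as mp

def process_image(index):
    # Hypothetical worker: in the real script this would reconstruct
    # one 1024x1024 HQ image from the celebA image and its delta.
    return index * index  # placeholder work

if __name__ == "__main__":
    # Use as many workers as there are CPU cores, mirroring the
    # default behaviour described above.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(process_image, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```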

Pre-Calculated Dataset

This script generates the dataset with the original file names from CelebA. If you are okay with a version of the dataset whose files are named by index instead, you can save a lot of time and effort by downloading it from this convenient Google Drive link.

Remark

This script has a lot of very specific dependencies and is likely to break somewhere, but if it runs to the end, you should obtain the correct dataset. The Docker image, however, is pretty fool-proof, so do use it if you can.

Sources

This code is inspired by these files

Citing the dataset

You probably want to cite the paper "Progressive Growing of GANs for Improved Quality, Stability, and Variation" that was submitted to ICLR 2018 by Tero Karras (NVIDIA), Timo Aila (NVIDIA), Samuli Laine (NVIDIA), Jaakko Lehtinen (NVIDIA and Aalto University).


Issues

Checksum error persists

The link to directly download the dataset is not working. Additionally, the checksum problem raised in a previous issue (now closed) is still there. Please let me know of a solution to these problems.

Refactor create_celeba_HQ.py:create_celeba_hq(celeba_dir,delta_dir,output_dir,num_threads,num_tasks)

I've selected create_celeba_HQ.py:create_celeba_hq(celeba_dir,delta_dir,output_dir,num_threads,num_tasks) for refactoring, which is a unit of 151 lines of code. Addressing this will make our codebase more maintainable and improve Better Code Hub's Write Short Units of Code guideline rating!

Here's the gist of this guideline:

  • Definition
    Limit the length of code units to 15 lines of code.
  • Why?
    Small units are easier to analyse, test and reuse.
  • How
    When writing new units, don't let them grow above 15 lines of code. When a unit grows beyond this, split it into smaller units of no longer than 15 lines.

You can find more info about this guideline in Building Maintainable Software.
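
As a generic illustration of the guideline (not the actual create_celeba_hq code, which is not reproduced here), a long do-everything unit can be split into short, named steps:

```python
# Hypothetical illustration of the 15-line guideline; these helpers are
# invented for the example and are not taken from create_celeba_HQ.py.
def parse_landmark_line(line):
    """One short unit: parse a 'name x y' record into (name, (x, y))."""
    name, x, y = line.split()
    return name, (int(x), int(y))

def mean_point(points):
    """Another short unit: average a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```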


โ„น๏ธ To know how many other refactoring candidates need addressing to get a guideline compliant, select some by clicking on the ๐Ÿ”ฒ next to them. The risk profile below the candidates signals (โœ…) when it's enough! ๐Ÿ


Good luck and happy coding!

Error downloading `jpeg` & `pillow` with conda

I get stuck at the same error whether I try the Docker approach or follow the instructions myself.

PackagesNotFoundError: The following packages are not available from current channels:

  - pillow==3.1.1
  - jpeg=8d

Am I doing something wrong? Or does the script need to be updated?

Google Drive Link is dead

Hello, thank you for providing the script to download CelebA_HQ dataset. I'm trying to download the dataset through the google drive link you provided, but it seems that the link is dead. Could you perhaps reactivate the link please? Thank you very much.

Prebuilt Docker Image fails

Per the instructions in the README, the command

docker run -it -v $(pwd):/data suvojit0x55aa/celeba-hq

fails with:

Loading CelebA data from /data/celebA
(202599, 5, 2)

Loading CelebA-HQ deltas from /data/celebA-HQ

Error: Expected to find 30 zips in /data/celebA-HQ/delta*.zip

Any ideas as to what the problem may be?
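
One quick diagnostic, assuming the deltas live where the error message says (a guess based on the log above), is to count the delta archives yourself:

```python
import glob

# Count the CelebA-HQ delta archives; the script expects exactly 30.
# The directory below is taken from the error message above.
delta_zips = sorted(glob.glob("/data/celebA-HQ/delta*.zip"))
print(len(delta_zips))  # anything other than 30 suggests an incomplete download
```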

Checksum error for download_CelebA.py

Hi, I have tried to download celebA using your code and I got the errors below. It seems that the checksum of img_celeba.7z.001 does not match. Is there any workaround for this? Thanks!

(celebaHQ) swheo@insight56:/ssd2/swheo/db/celebA-HQ-dataset-download$ sh ./create_celebA-HQ.sh ./celebA-HQ   
/ssd2/swheo/db/celebA-HQ-dataset-download
Downloading Anno/list_landmarks_celeba.txt to ./celebA-HQ/celebA/Anno/list_landmarks_celeba.txt
...
Check SHA1 ./celebA-HQ/celebA/img_celeba.7z.001
Traceback (most recent call last):
  File "/ssd2/swheo/db/celebA-HQ-dataset-download/download_celebA.py", line 186, in <module>
    download_celabA(dataset_dir)
  File "/ssd2/swheo/db/celebA-HQ-dataset-download/download_celebA.py", line 178, in download_celabA
    download_and_check(_IMGS_DRIVE, dataset_dir)
  File "/ssd2/swheo/db/celebA-HQ-dataset-download/download_celebA.py", line 112, in download_and_check
    raise RuntimeError('Checksum mismatch for %s.' % save_path)
RuntimeError: Checksum mismatch for ./celebA-HQ/celebA/img_celeba.7z.001.
...
Deal with file: image_list.txt
./celebA-HQ/celebA-HQ/image_list.txt: 23.0B [00:00, 306B/s]

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,72 CPUs Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (50654),ASM,AES-NI)

Scanning the drive for archives:
1 file, 2257 bytes (3 KiB)   

Extracting archive: ./celebA-HQ/celebA/img_celeba.7z
ERROR: ./celebA-HQ/celebA/img_celeba.7z
./celebA-HQ/celebA/img_celeba.7z
Open ERROR: Can not open the file as [7z] archive


ERRORS:
Is not archive
    
Can't open as archive: 1
Files: 0
Size:       0
Compressed: 0
  File "/ssd2/swheo/db/celebA-HQ-dataset-download/create_celeba_HQ.py", line 71
    print '\n\nWorker thread caught an exception:\n' + result.traceback + '\n',
    
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?
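
Separately, the SyntaxError at the bottom of the log is a Python 2 print statement being run under Python 3. If you hit it, that line can be rewritten with the print() function; the trailing comma in Python 2 suppressed the newline, which maps to end='' in Python 3 (result_traceback below is a stand-in for the script's result.traceback):

```python
# Python 3 form of the failing line from create_celeba_HQ.py.
# result_traceback stands in for the script's result.traceback value.
result_traceback = "example traceback text"
print('\n\nWorker thread caught an exception:\n' + result_traceback + '\n', end='')
```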

Can you share the 64x64 size data?

Hi, thanks for your effort!
It's very convenient to download the data directly from the google drive you shared.
I just noticed that the 64x64 version of the data seems to be missing. Could you share that version as well?
