idea-research / humansd

[ICCV 2023] The official implementation of paper "HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation"

License: Apache License 2.0

Python 99.99% Shell 0.01%

conditional-image-generation deep-learning iccv iccv2023 image-generation pytorch

humansd's People

barseghyanartur tuofeilunhifi zonepg haizhu12 xywen97 cv-synthesis jianzfb xxmiprai steven-xiong bruinxiong hidou-mahiro jackzhousz chnxindong guoqincode paperwave peterzs xayahlia

humansd's Issues

How can we use this method for other types of datasets

How can this method be applied to other types of datasets, i.e., to image generation tasks other than human generation? Presumably it would need to be pre-trained on a new dataset.

ValueError: 'cuda' is not a valid DistributedType

I tried to train, but I encountered this error. Has anyone run into a similar problem?

Downloading LAION-Human Dataset

Hi, thanks for your great work!

I'm finding it difficult to download the LAION-Human dataset using the provided Python script (utils/download_data.py), because downloading from URLs is slow and unstable. Would it be possible to provide zip files containing the jpg images directly, similar to the pose files? This would make downloading much more convenient. Thank you.

ImportError: cannot import name 'inference_bottom_up_pose_model' from 'mmpose.apis'

'cuda' is not a valid DistributedType

Hi, when I train the HumanSD model, I get the error 'cuda' is not a valid DistributedType.
Setting the number of GPUs to 1 does not fix it.

Is it possible to use full openpose file for the humanSD pose?

Hi, I was wondering whether it is possible to use the face and hand keypoints from OpenPose, in addition to the body pose, in HumanSD.

Thanks
Best regards

pose_ckpt

Thank you for your great work!

The pose_ckpt path is set in HumanSD/configs/humansd/humansd-finetune.yaml (line 22 at commit d8fb179):

    pose_ckpt: 'humansd_data/checkpoints/higherhrnet_w48_coco_512x512_udp.pth'

However, I couldn't find 'higherhrnet_w48_coco_512x512_udp.pth' in the shared link:
https://drive.google.com/drive/folders/1NLQAlF7i0zjEpd-XY0EcVw9iXP5bB5BJ

Should I change the filename in the config?

Evaluation bugs

Hi! Thanks for sharing great work!

While following the inference instructions in the README (running scripts/gradio/pose2img.py), I encountered some bugs, and even after resolving them the results are not good.

Bugs

Functions in mmpose.apis have changed

In HumanSD/scripts/gradio/pose2img.py (line 27 at commit 464fcc7):

    from mmpose.apis import inference_bottom_up_pose_model, init_pose_model

The functions have been renamed:
- init_pose_model -> init_model
- inference_bottom_up_pose_model -> inference_bottomup
The arguments of each function also seem to have changed.

use_fp16: True in the inference config

In HumanSD/configs/humansd/humansd-inference.yaml (line 25 at commit 464fcc7):

    use_fp16: True

If use_fp16 is True, an error occurs.

Result

What I did: 1) skipped mmpose (I'm not very familiar with it, and it would require a lot of debugging) and fed the pose image directly instead of running the pre-trained mmpose model, and 2) set use_fp16=False.
I set the arguments as follows:

    import numpy as np
    from PIL import Image

    # Modified excerpt of predict() in scripts/gradio/pose2img.py:
    # mmpose is skipped and the uploaded skeleton image is used directly as the pose condition.
    # HWC3, resize_image and IMAGE_RESOLUTION come from the existing code in pose2img.py.
    def predict(comparison_model, load_image_type, input_image, prompt, added_prompt, ddim_steps, detection_thresh, num_samples, scale, seed, eta, strength, negative_prompt, save_path="logs/gradio_images"):
        image = np.array(input_image.convert("RGB"))
        image = HWC3(image)
        image = resize_image(image, IMAGE_RESOLUTION)
        humansd_pose_image = image  # skeleton image used as-is, no pose estimation

        ...

    if __name__ == "__main__":
        # Arguments used for this test
        input_image = Image.open('assets/demo/demo7.jpg')
        load_image_type = "Upload skeleton image"
        prompt = "A girl is running in the field"
        num_samples = 1
        ddim_steps = 50
        detection_thresh = 0.05
        scale = 10
        strength = 1.
        seed = 299033459
        eta = 0.
        negative_prompt = "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"
        added_prompt = "detailed background, best quality, extremely detailed"
        predict(
            comparison_model=[],
            load_image_type=load_image_type,
            input_image=input_image,
            prompt=prompt,
            added_prompt=added_prompt,
            ddim_steps=ddim_steps,
            detection_thresh=detection_thresh,
            num_samples=num_samples,
            scale=scale,
            seed=seed,
            eta=eta,
            strength=strength,
            negative_prompt=negative_prompt,
        )

However, the result is quite poor.

Figure: my result vs. the result in the README

Could you give me any advice?
Thank you!

Memory issue when calculating FID

Thank you for sharing your great work!

I'd like to calculate FID between the source images and the generated images.

In your paper, the validation set contains 3,750 images.
My experiment uses 3,000 images, fewer than yours.
However, when I calculate FID, I run out of GPU memory on a V100.

Could you please give me some suggestions for solving this?
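One common way to keep FID within memory is to avoid holding all image features (or images) on the GPU at once. The following is a minimal sketch, not the repository's FID code: it assumes features have already been extracted batch by batch and moved to the CPU (the extractor is not shown), keeps only the running sums needed for the mean and covariance, and then computes the standard Fréchet distance ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)) in float64. The class and function names are hypothetical.

```python
import numpy as np


class RunningFeatureStats:
    """Accumulate the mean and covariance of feature vectors one batch at a time."""

    def __init__(self, dim):
        self.n = 0
        self.total = np.zeros(dim, dtype=np.float64)
        self.outer = np.zeros((dim, dim), dtype=np.float64)

    def update(self, feats):
        # feats: (batch, dim) array, e.g. pooled features already moved to the CPU
        feats = np.asarray(feats, dtype=np.float64)
        self.n += feats.shape[0]
        self.total += feats.sum(axis=0)
        self.outer += feats.T @ feats

    def mean_cov(self):
        mu = self.total / self.n
        # Unbiased sample covariance recovered from the accumulated sums
        cov = (self.outer - self.n * np.outer(mu, mu)) / (self.n - 1)
        return mu, cov


def _sqrtm_psd(mat):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(mu1, cov1, mu2, cov2):
    s1 = _sqrtm_psd(cov1)
    # Tr((cov1 cov2)^(1/2)) equals Tr((s1 cov2 s1)^(1/2)) for PSD covariances
    middle = s1 @ cov2 @ s1
    eig = np.linalg.eigvalsh((middle + middle.T) / 2.0)
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_covmean)
```

Calling update() once per batch for each image set and then passing both mean_cov() results to frechet_distance() gives the score. For 2048-dimensional features, each accumulator needs a 2048 x 2048 float64 matrix (about 32 MiB) no matter how many images are processed.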

LAION-Human Dataset

Hi,

I have already downloaded the full LAION-5B dataset. How can I use your .parquet and mapping files to get the corresponding images?

Train with fp16

I am trying to fine-tune the HumanSD checkpoint with the use_fp16=True option (I installed xformers), but a RuntimeError still occurs.

I think some code is needed to convert the model weights to fp16, but there isn't any conversion or autocast. See HumanSD/ldm/modules/diffusionmodules/openaimodel.py (line 23 at commit fe480f7):

    def convert_module_to_f16(x):

Is it possible to train Stable Diffusion with fp16?
If I set use_fp16=False in the configs, it works.

About GHI dataset.

Hi, I am interested in the GHI dataset. Will it be released?
Thank you for your reply.

The mapping file does not match downloaded Laion-aesthetic

I followed the instructions to download LAION-Aesthetics V1. However, the images do not match the provided mapping_file_training.json. For example, for 00040_000400060 I downloaded an image with the text prompt "mother of the bride hairstyles: woman with sleek blown out hair and a headband", but mapping_file_training.json says "Understanding Urinary Tract Infections".

Can you provide the script you used to download LAION-Aesthetics?

Evaluation of quality (FID, KID, IS)

Hi, thanks for the great work and the comprehensive evaluation framework. We recently wanted to include your work in a comparison but found what seems to be a problem in how the quality of generated images is evaluated. Please let me know if I have misunderstood anything:

As line 233 of scripts/pose2img_metrics.py shows, the images are saved as a concatenation of four images: [generated_image, pose_image, text_image, original_image]. However, the quality evaluation in utils/metrics/evalute_metrics_of_each_category.py seems to load the entire image rather than splitting out the generated image (see line 78). We believe this makes the quality scores worse.

We revised evalute_metrics_of_each_category.py by adding three lines between line 77 and line 78:

    img = np.array(img).reshape(512, 4, 512, 3)
    img = cv2.cvtColor(img[:, 0, :, :], cv2.COLOR_BGR2RGB)
    img = Image.fromarray(img)

These lines extract the generated image from the saved image. With this change, we obtain an FID of about 10 using your pretrained checkpoint (much lower than reported).
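The reshape works because the four 512x512 panels are stacked side by side along the width, so a (512, 2048, 3) array can be viewed as (512, 4, 512, 3) and panel 0 selected. An equivalent sketch that slices along the width and checks the size, assuming the panel order [generated_image, pose_image, text_image, original_image] described above (function names are hypothetical):

```python
import numpy as np


def split_panels(strip, num_panels=4):
    """Split an (H, num_panels * W, C) image strip into num_panels arrays of shape (H, W, C)."""
    strip = np.asarray(strip)
    total_width = strip.shape[1]
    if total_width % num_panels != 0:
        raise ValueError(f"width {total_width} is not divisible by {num_panels}")
    width = total_width // num_panels
    # Panels are laid out left to right along the width axis
    return [strip[:, i * width:(i + 1) * width] for i in range(num_panels)]


def generated_panel(strip):
    # Assumed order: [generated_image, pose_image, text_image, original_image]
    return split_panels(strip, num_panels=4)[0]
```

Whether a BGR-to-RGB conversion is also needed depends on how the saved strip was written and read; the three-line fix above applies one via cv2.cvtColor.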

Thanks and I'm looking forward to further discussion :)

About the data quality

Hi Xuan,

I used the utils/download_data.py script from the repository to download data via the Laion-Human's URL.

After downloading 1000 images, I randomly sampled several of them, and they look like this:

I would like to confirm whether I missed any steps or where the problem occurred.

Thanks.

laion human dataset download

The download cannot proceed smoothly (/data/disk1/code/HumanSD/humansd_data/datasets/LaionAesthetics/images/00003.parquet); it cannot even download images/00000 completely, only up to 000095.jpg.
Could you provide a way to download the jpg files directly? Downloading from URLs does not seem stable.

huggingface connect error

Hi, thanks for your excellent work!
When the code automatically downloads the 'open_clip_pytorch_model.bin' model, it cannot connect to Hugging Face.

I also tried to place the .bin file locally, but where should I put it? Putting it directly in '～./cache/models--laion--CLIP-ViT-H-14-laion2B-s32B-b79K' does not work either.
Looking forward to your reply!

Which Stable Diffusion checkpoint was used for fine-tuning in the paper, and how should it be loaded in main.py?

Could you give me some guidance? Thank you.

The problem of the model file download

When I download the "humansd-v1.ckpt" model, the download always fails because the file is too large. Is there any other download link?

How to correctly install and use ldm

Hi, I got an error related to the ldm package when fine-tuning HumanSD.
Running pip install -e . does not fix it.

After clicking the 'Run' button, no results are produced and the demo seems to keep running. Can you give me some guidance?

questions about metrics

Hi, thanks for your team's contribution.

I would like to ask about calculating the metrics during training. The training process is usually interspersed with validation steps. Do you compute the evaluation metrics during validation? That seems time-consuming, so I'm wondering how you schedule evaluation during training.

Questions on Evaluation Metrics

Thank you very much for your work. We recently tried to add your work for comparison but had some problems with the evaluation metrics.

Do you have reference code for Pose Cosine Similarity-based AP (CAP) and People Count Error (PCE)? We would like to make sure our values are consistent with yours. Thank you very much! :)
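The authors' reference code is not reproduced here. Purely as an illustration, assuming that PCE is the mean absolute difference between the number of people in the pose condition and the number detected in the generated image, and that pose similarity is the cosine similarity between flattened keypoint coordinate vectors, the two building blocks might look like the sketch below. Both definitions are assumptions rather than the paper's exact formulation, the function names are hypothetical, and the matching and AP computation behind CAP are not covered.

```python
import numpy as np


def pose_cosine_similarity(kpts_a, kpts_b):
    """Cosine similarity between two (K, 2) keypoint arrays, flattened to vectors (assumed definition)."""
    a = np.asarray(kpts_a, dtype=np.float64).ravel()
    b = np.asarray(kpts_b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Degenerate all-zero poses get similarity 0 instead of dividing by zero
    return float(a @ b / denom) if denom > 0 else 0.0


def people_count_error(condition_counts, detected_counts):
    """Mean absolute difference between per-image person counts (assumed definition)."""
    cond = np.asarray(condition_counts, dtype=np.float64)
    det = np.asarray(detected_counts, dtype=np.float64)
    return float(np.abs(cond - det).mean())
```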

How to run the task about (img+pose+text)2img

@juxuan27
Hi, I found that all the demo samples are either img2img or pose2img.
Could you please show how to provide both an image and a pose as conditions to generate a result?
Thanks!

How to use the *.parquet files to download the images

How to generate poses for custom images? Please advise.

@juxuan27, could you please explain how to create .npz files for a custom dataset?

When will the code be released?

When will the code be released and under which license?

About the heatmap usage

Dear authors of HumanSD,

First of all, I would like to thank you for sharing your work.
I have read both the paper and the code, and there are some parts I cannot understand, so I would like to ask some questions here.

Regarding the heatmap loss, Eq. 6 of the paper states that $W_a$ is a weight that gives higher priority to the loss in regions highly correlated with the input condition.

From Figure 2, my understanding is that the heatmap is used as a simple multiplier or mask.

However, in the code the obtained heatmap does not seem to be used directly as a simple mask. Instead, it is passed to the VAE encoder, as shown here:

HumanSD/ldm/models/diffusion/ddpm.py (line 2011 at commit c5db29d):

    back_to_embed_pose_add=self.encode_first_stage(pose_add_weight)

The resulting embedding is then used to weight the loss here, in HumanSD/ldm/models/diffusion/ddpm.py (line 2026 at commit c5db29d):

    loss_simple = torch.mul(self.get_loss(model_output, target, mean=False),(1+self.pose_loss_weight*back_to_embed_pose_add_weight)).mean([1, 2, 3])
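Read literally, this line multiplies the element-wise diffusion loss by (1 + pose_loss_weight * M), where M is back_to_embed_pose_add_weight, and averages over the channel and spatial dimensions to get one value per sample. The NumPy sketch below reproduces only that arithmetic, assuming arrays shaped (batch, channel, height, width) in place of torch tensors and the squared-error case of get_loss(..., mean=False); it does not show how the weight map is built. The function name is hypothetical.

```python
import numpy as np


def weighted_per_sample_loss(model_output, target, weight_map, pose_loss_weight):
    """Mirror of the quoted loss_simple line for (batch, channel, height, width) arrays.

    Assumes the element-wise loss is the squared error, as in the L2 case of get_loss(..., mean=False).
    """
    elementwise = (model_output - target) ** 2
    # Base weight of 1 everywhere, plus extra weight from the latent-space map
    weights = 1.0 + pose_loss_weight * weight_map
    # Average over channel and spatial axes, keeping one loss value per sample
    return (elementwise * weights).mean(axis=(1, 2, 3))
```

With the 1 + term, elements where the weight map is zero still contribute the ordinary squared error instead of being dropped, so setting pose_loss_weight to 0 recovers the plain per-sample MSE.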

My questions are the following:

Why is it necessary to pass the obtained heatmap to the VAE encoder?
Why is the 1+ needed in loss_simple = torch.mul(self.get_loss(model_output, target, mean=False),(1+self.pose_loss_weight*back_to_embed_pose_add_weight)).mean([1, 2, 3])?
Regarding how the heatmap is obtained in this part of the code, in HumanSD/ldm/models/diffusion/ddpm.py (line 1998 at commit c5db29d):

    aggregated_heatmaps=torch.where(aggregated_heatmaps > self.estimate_thresh,

With this computation, pixels greater than the threshold are set to zero, and the remaining pixels are not. I thought the usual approach is to assign 1 to pixels whose value is greater than the threshold. Why is it done the other way around here?

I would really appreciate it if you could guide me to understand your work more correctly.
Thank you very much.

Is Aesthetics_Human a subset that you filtered from LAION-Aesthetics using a human detector?

As the title says.

The implementation of training

Hello authors, your work is very good and I learned a lot. Could you share the implementation of HumanSD with the heatmap-guided loss? Thank you for your help.