idea-research / humansd
[ICCV 2023] The official implementation of paper "HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation"
License: Apache License 2.0
How can we use this method for other types of datasets, that is, for image generation other than human generation? Presumably the method would need to be pre-trained on a new dataset.
Hi, thanks for your great work!
I'm finding it difficult to download the LAION-Human dataset using the provided Python script (utils/download_data.py), as downloading from URLs seems to be slow and unstable. Would it be possible for you to directly provide zip files containing the jpg images, similar to the pose files? This would make the download process much more convenient. Thank you.
Hi, I was wondering whether it is possible to use the face and hand keypoints from OpenPose, in addition to the body pose, in HumanSD?
Thanks
Best regards
Thank you for your great work!
Here is the file path for pose_ckpt.
However, I couldn't find 'higherhrnet_w48_coco_512x512_udp.pth' in the shared link:
https://drive.google.com/drive/folders/1NLQAlF7i0zjEpd-XY0EcVw9iXP5bB5BJ
Should I change the filename in the config?
Hi! Thanks for sharing great work!
While following the inference instructions in the README, I encountered some bugs, and after resolving them the results are still not good (when running scripts/gradio/pose2img.py).
HumanSD/scripts/gradio/pose2img.py
Line 27 in 464fcc7
input_image = Image.open('assets/demo/demo7.jpg')
# No trailing comma here: a trailing comma would turn the string into a one-element tuple.
load_image_type = "Upload skeleton image"
prompt = "A girl is running in the field"
num_samples = 1
ddim_steps = 50
detection_thresh = 0.05
scale = 10
strength = 1.
seed = 299033459
eta = 0.
negative_prompt = "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"
added_prompt = "detailed background, best quality, extremely detailed"
predict(
    comparison_model=[],
    load_image_type=load_image_type,
    input_image=input_image,
    prompt=prompt,
    added_prompt=added_prompt,
    ddim_steps=ddim_steps,
    detection_thresh=detection_thresh,
    num_samples=num_samples,
    scale=scale,
    seed=seed,
    eta=eta,
    strength=strength,
    negative_prompt=negative_prompt,
)
def predict(comparison_model, load_image_type, input_image, prompt, added_prompt, ddim_steps, detection_thresh, num_samples, scale, seed, eta, strength, negative_prompt, save_path="logs/gradio_images"):
    # Use the uploaded skeleton image directly as the pose condition.
    image = np.array(input_image.convert("RGB"))
    image = HWC3(image)
    image = resize_image(image, IMAGE_RESOLUTION)
    humansd_pose_image = image
    ...
But the result is quite bad.
Can you give me any advice?
Thank you!
Thank you for sharing your great work!
I'd like to calculate FID given source images and generated images.
In your paper, the validation set contains 3,750 images.
In my experiment I use 3,000 images, fewer than in your case.
When I calculate FID, an out-of-memory error occurs on a V100 GPU here
Could you please suggest how to solve it?
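One general way to bound memory when computing FID is to accumulate the feature statistics batch by batch instead of holding all features at once. The sketch below is only an illustration under assumptions: it presumes the out-of-memory error comes from processing every image in one pass (the thread does not confirm this), and `RunningStats` and `frechet_distance` are hypothetical helpers, not part of the HumanSD code. The feature extractor itself is left out; each batch passed to `update` is assumed to be a 2-D array of features.

```python
import numpy as np


class RunningStats:
    """Accumulates the mean and covariance of feature vectors batch by batch."""

    def __init__(self, dim):
        self.n = 0
        self.total = np.zeros(dim, dtype=np.float64)
        self.outer = np.zeros((dim, dim), dtype=np.float64)

    def update(self, feats):
        # feats: array of shape (batch, dim); only this batch is in memory.
        feats = np.asarray(feats, dtype=np.float64)
        self.n += feats.shape[0]
        self.total += feats.sum(axis=0)
        self.outer += feats.T @ feats

    def finalize(self):
        mean = self.total / self.n
        # Unbiased covariance, matching np.cov(..., rowvar=False).
        cov = (self.outer - self.n * np.outer(mean, mean)) / (self.n - 1)
        return mean, cov


def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between two Gaussians given their means and covariances."""
    diff = mu1 - mu2
    # Tr(sqrt(cov1 @ cov2)) equals the sum of square roots of the eigenvalues
    # of cov1 @ cov2; tiny negative values from round-off are clipped to zero.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)
```

With this layout, the batch size fed to the feature extractor can be reduced until it fits on the GPU, since the accumulated statistics do not grow with the number of images.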
Hi,
I have already downloaded the full LAION-5B dataset. How can I use your .parquet and mapping files to get the corresponding images?
I am trying to fine-tune the HumanSD ckpt with the use_fp16=True option (I installed xformers).
But a RuntimeError still occurs, as below.
I think some code is needed to convert the model weights to fp16, but there isn't any conversion or autocast.
Is it possible to train Stable Diffusion with fp16?
If I set use_fp16=False in the configs, it works.
Hi, I am interested in the GHI dataset. I am wondering whether the GHI dataset will be released.
Thank you for your reply.
I followed the instructions to download Laion-aesthetic V1. However, I found that the images do not match the provided mapping_file_training.json. For example, for 00040_000400060 I downloaded an image with the text prompt "mother of the bride hairstyles: woman with sleek blown out hair and a headband", but mapping_file_training.json says "Understanding Urinary Tract Infections".
Can you provide the script used to download Laion-aesthetic?
Hi, thanks for the great work and the comprehensive evaluation framework. We recently wanted to add your work for comparison but found that there seems to be a problem in evaluating the quality of generated images. Please tell me if there is any misunderstanding:
As line 233 in scripts/pose2img_metrics.py shows, the images are saved as a concatenation of four images: [generated_image, pose_image, text_image, original_image]. However, in your implementation of the quality evaluation in utils/metrics/evalute_metrics_of_each_category.py, you seem to load the entire image for evaluation rather than splitting out the generated image. Please refer to line 78. We believe this makes the quality scores worse.
We tried revising evalute_metrics_of_each_category.py by adding three lines of code between line 77 and line 78:
    img = np.array(img).reshape(512,4,512,3)
    img = cv2.cvtColor(img[:,0,:,:], cv2.COLOR_BGR2RGB)
    img = Image.fromarray(img)
The code above extracts the generated image from the saved image. With this revision, we achieve an FID of about 10 using your pretrained checkpoint (much lower than reported).
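For readers following along, the extraction step can also be written with plain slicing. This is a minimal sketch, assuming the saved image is a horizontal concatenation of equally wide tiles with the generated image first, as described above; `extract_tile` is a hypothetical helper name and is not part of the repository. It leaves the channel order untouched, since whether a colour conversion is needed depends on how the image was loaded.

```python
import numpy as np


def extract_tile(concat, index=0, num_tiles=4):
    """Return one tile from an image made of num_tiles side-by-side tiles."""
    concat = np.asarray(concat)
    width = concat.shape[1]
    if width % num_tiles != 0:
        raise ValueError("image width is not divisible by the number of tiles")
    tile_w = width // num_tiles
    return concat[:, index * tile_w:(index + 1) * tile_w]


# Synthetic stand-in for a saved 512 x 2048 result image (four 512 x 512 tiles).
tiles = [np.full((512, 512, 3), value, dtype=np.uint8) for value in (10, 20, 30, 40)]
saved = np.concatenate(tiles, axis=1)

generated = extract_tile(saved, index=0)
# Same result as the reshape-based snippet for the 512 x 512 case.
via_reshape = saved.reshape(512, 4, 512, 3)[:, 0, :, :]
```

Slicing avoids hard-coding the 512-pixel tile size, so the same helper works if the evaluation resolution changes.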
Thanks and I'm looking forward to further discussion :)
Hi Xuan,
I used the utils/download_data.py script from the repository to download data via the Laion-Human's URL.
After downloading 1000 images, I randomly sampled several images, which look like this:
I would like to confirm if I missed any steps or where the problem occurred.
Thanks.
Hi, thanks for your excellent work!
When the code automatically downloads the 'open_clip_pytorch_model.bin' model, it cannot connect to Hugging Face, like this.
I also tried to upload the .bin file manually, but where should I put it? Placing it directly under '~./cache/models--laion--CLIP-ViT-H-14-laion2B-s32B-b79K' does not work either.
Looking forward to your reply!
Can you give me some guidance? Thank you.
When I download the "humansd-v1.ckpt" model, the download always fails because the file is too large. Is there any other download link?
Hi author, thanks for your team's contribution.
I would like to ask a question about calculating the metrics during training. Training is usually interspersed with validation steps. Do you compute the evaluation metrics during the validation step? That seems time-consuming, so I'm wondering how you schedule the evaluation during training.
Thank you very much for your work. We recently tried to add your work for comparison but had some problems with the evaluation metrics.
Do you have reference code for Pose Cosine Similarity-based AP (CAP) and People Count Error (PCE)? We would like to ensure our values are consistent with yours. Thank you very much! :)
@juxuan27
Hi, I found that all the demo samples are either img2img or pose2img.
Could you please show how to give both an image and a pose as conditions and generate the result?
thanks!
@juxuan27, can you please explain how to create .npz files for a custom dataset?
When will the code be released and under which license?
Dear authors of HumanSD,
First of all, I would like to thank you for sharing your work.
I have read both the paper and the code, and there are some parts I cannot understand, so I would like to ask some questions here.
About the heatmap loss, in Eq. 6 of the paper, it is written that
From my understanding of Figure 2, it seems that the heatmap is used as a simple multiplication or a simple mask.
But after checking the code, it seems that the obtained heatmap is not used directly as a simple mask. After the heatmap is obtained, it is passed to the VAE encoder, as shown here:
HumanSD/ldm/models/diffusion/ddpm.py
Line 2011 in c5db29d
HumanSD/ldm/models/diffusion/ddpm.py
Line 2026 in c5db29d
My questions are the following:
loss_simple = torch.mul(self.get_loss(model_output, target, mean=False),(1+self.pose_loss_weight*back_to_embed_pose_add_weight)).mean([1, 2, 3])
HumanSD/ldm/models/diffusion/ddpm.py
Line 1998 in c5db29d
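To make the quoted loss_simple line easier to discuss, here is a NumPy restatement of what it computes. It is a sketch under assumptions, not the repository's code: it assumes self.get_loss(..., mean=False) returns the element-wise squared error, and that back_to_embed_pose_add_weight broadcasts against the model output; `weighted_loss` and `pose_weight_map` are hypothetical names.

```python
import numpy as np


def weighted_loss(model_output, target, pose_weight_map, pose_loss_weight):
    """Per-sample loss with extra weight where pose_weight_map is large."""
    # Assumption: element-wise squared error stands in for get_loss(mean=False).
    per_element = (model_output - target) ** 2
    # Each element is scaled by 1 + pose_loss_weight * map, so a zero map
    # reduces to the plain loss.
    weights = 1.0 + pose_loss_weight * pose_weight_map
    # Average over channel and spatial axes, keeping one value per sample.
    return (per_element * weights).mean(axis=(1, 2, 3))
```

Under these assumptions the map acts as a soft re-weighting of the per-element loss rather than a hard mask: elements where the map is zero still contribute with weight 1.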
I would really appreciate it if you could guide me to understand your work more correctly.
Thank you very much.
As the title says.
Hello authors, your work is very good and I learned a lot from it. Could you share the implementation of HumanSD with the heatmap-guided loss? Thank you for your help.