
humansd's People

Contributors

juxuan27

humansd's Issues

Downloading LAION-Human Dataset

Hi, thanks for your great work!

I'm finding it difficult to download the LAION-Human dataset using the provided python scripts (utils/download_data.py) as downloading from URLs seems to be slow and unstable. Would it be possible for you to directly provide zip files containing the jpg images, similar to the pose files? This would make the download process much more convenient. Thank you.

Evaluation bugs

Hi! Thanks for sharing great work!

While following the inference instructions in the README, I encountered some bugs, and the result is still not good after resolving them (when running scripts/gradio/pose2img.py).

Bugs

  1. Functions from mmpose.apis have changed
    from mmpose.apis import inference_bottom_up_pose_model, init_pose_model
  • Function names
    • init_pose_model -> init_model
    • inference_bottom_up_pose_model -> inference_bottomup
  • The arguments of each function also seem to have changed
  2. use_fp16=True in
  • If use_fp16 is True, an error occurs
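
The renames listed above suggest a small compatibility shim. The following is a minimal sketch that uses only the four function names from this report. `resolve_api` is a hypothetical helper, and the two namespaces are stand-ins so the example runs without mmpose installed. Since the arguments also appear to have changed, the call sites would still need adapting.

```python
from types import SimpleNamespace


def resolve_api(module, new_name, old_name):
    """Return module.<new_name> if it exists, otherwise module.<old_name>."""
    for name in (new_name, old_name):
        func = getattr(module, name, None)
        if func is not None:
            return func
    raise ImportError(f"neither {new_name!r} nor {old_name!r} found")


# Stand-ins for two versions of mmpose.apis (hypothetical, for illustration).
old_apis = SimpleNamespace(
    init_pose_model=lambda: "old init",
    inference_bottom_up_pose_model=lambda: "old inference",
)
new_apis = SimpleNamespace(
    init_model=lambda: "new init",
    inference_bottomup=lambda: "new inference",
)

init_old = resolve_api(old_apis, "init_model", "init_pose_model")
init_new = resolve_api(new_apis, "init_model", "init_pose_model")
```

With the real package, the same helper would presumably be called as `resolve_api(mmpose.apis, "init_model", "init_pose_model")`.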

Result

  • What I did: 1) skipped mmpose (I don't know mmpose well and it requires a lot of debugging) and fed the pose image directly instead of running the pre-trained mmpose model, 2) set use_fp16=False.
  • Set arguments as follows
    input_image = Image.open('assets/demo/demo7.jpg')
    load_image_type="Upload skeleton image",
    prompt = "A girl is running in the field"
    num_samples = 1
    ddim_steps = 50
    detection_thresh = 0.05
    scale = 10
    strength = 1.
    seed = 299033459
    eta = 0.
    negative_prompt = "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"
    added_prompt = "detailed background, best quality, extremely detailed"
    predict(
        comparison_model=[],
        load_image_type=load_image_type,
        input_image=input_image,
        prompt=prompt,
        added_prompt=added_prompt,
        ddim_steps=ddim_steps,
        detection_thresh=detection_thresh,
        num_samples=num_samples,
        scale=scale,
        seed=seed,
        eta=eta,
        strength=strength,
        negative_prompt=negative_prompt,
    )

def predict(comparison_model, load_image_type, input_image, prompt, added_prompt, ddim_steps, detection_thresh, num_samples, scale, seed, eta, strength, negative_prompt, save_path="logs/gradio_images"):
    image = np.array(input_image.convert("RGB"))
    image = HWC3(image)
    image = resize_image(image, IMAGE_RESOLUTION)
    humansd_pose_image = image

    ...

But the result is quite bad.

  • My result: [image 2023-05-09-20_24_01]
  • Result in README: [image]

Can you give me any advice?
Thank you!

Memory issue when calculating FID

Thank you for sharing your great work!

I'd like to calculate FID given source images and generated images.

In your paper, the validation set contains 3,750 images.
My experiment uses 3,000 images, fewer than yours.
When I calculate FID, an out-of-memory error occurs on a V100 GPU.

Could you suggest how to solve it?
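
One common way to limit memory use is to accumulate the feature statistics batch by batch, so that only one batch of features is alive at a time. The following is a minimal NumPy sketch under that assumption. `RunningStats` and `frechet_distance` are hypothetical names, feature extraction is assumed to happen elsewhere, and this is not the repository's metric code.

```python
import numpy as np


class RunningStats:
    """Accumulates the mean and covariance of feature vectors batch by batch."""

    def __init__(self, dim):
        self.n = 0
        self.total = np.zeros(dim, dtype=np.float64)
        self.outer = np.zeros((dim, dim), dtype=np.float64)

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)  # shape (batch_size, dim)
        self.n += batch.shape[0]
        self.total += batch.sum(axis=0)
        self.outer += batch.T @ batch

    def finalize(self):
        mean = self.total / self.n
        # Unbiased covariance from the accumulated sums.
        cov = (self.outer - self.n * np.outer(mean, mean)) / (self.n - 1)
        return mean, cov


def frechet_distance(mean1, cov1, mean2, cov2):
    """|m1 - m2|^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2))."""
    diff = mean1 - mean2
    # Tr((C1 C2)^(1/2)) is the sum of square roots of the eigenvalues of C1 C2.
    eigvals = np.linalg.eigvals(cov1 @ cov2).real
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * trace_sqrt)
```

Each batch of features can be moved to the CPU and passed to `update` before the next batch is extracted, which keeps GPU memory roughly independent of the number of images.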

LAION-Human Dataset

Hi,

I have already downloaded the full LAION-5B dataset. How can I use your .parquet and mapping files to get the corresponding images?

Train with fp16

I am trying to fine-tune the HumanSD checkpoint with the use_fp16=True option (I installed xformers).
But a RuntimeError still occurs.

  • I think some code is needed to convert the model weights to fp16, but there isn't any conversion or autocast

  • def convert_module_to_f16(x):

  • Is it possible to train Stable Diffusion with fp16?

  • If I set use_fp16=False in the configs, it works

About GHI dataset.

Hi, I am interested in the GHI dataset. I am wondering whether the GHI dataset will be released.
Thank you for your reply.

The mapping file does not match downloaded Laion-aesthetic

I followed the instructions to download Laion-aesthetic V1. However, I found that the images do not match the provided mapping_file_training.json. For example, for 00040_000400060, I downloaded an image with the text prompt "mother of the bride hairstyles: woman with sleek blown out hair and a headband" but mapping_file_training.json says "Understanding Urinary Tract Infections".

Can you provide the script used to download Laion-aesthetic?
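
A quick way to measure how widespread such a mismatch is would be to compare captions id by id. The sketch below assumes both sources can be reduced to plain dicts mapping an image id to its caption. That layout and the helper name `find_caption_mismatches` are assumptions for illustration, not the real format of mapping_file_training.json.

```python
def find_caption_mismatches(expected, downloaded):
    """Return ids whose downloaded caption differs from the expected one.

    Both arguments are dicts mapping an image id to a caption
    (an assumed layout, not the real file format).
    """
    mismatches = {}
    for image_id, caption in expected.items():
        got = downloaded.get(image_id)
        if got is not None and got.strip() != caption.strip():
            mismatches[image_id] = (caption, got)
    return mismatches


# The single example reported in this issue.
expected = {"00040_000400060": "Understanding Urinary Tract Infections"}
downloaded = {
    "00040_000400060": "mother of the bride hairstyles: woman with sleek "
                       "blown out hair and a headband"
}
report = find_caption_mismatches(expected, downloaded)
```

If nearly every id mismatches, the download probably used a different dataset version or shard order rather than a few corrupted entries.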

Evaluation of quality (FID, KID, IS)

Hi, thanks for the great work and the comprehensive evaluation framework. We recently wanted to add your work for comparison but found that there seems to be a problem in evaluating the quality of generated images. Please tell me if there is any misunderstanding:

As line 233 in scripts/pose2img_metrics.py shows, the images are saved as a concatenation of four images: [generated_image, pose_image, text_image, original_image]. However, your implementation of the quality evaluation in utils/metrics/evalute_metrics_of_each_category.py seems to load the entire image for evaluation rather than splitting out the generated image. Please refer to line 78. We believe this will worsen the quality scores.

We tried to revise the code in evalute_metrics_of_each_category.py by adding three lines between line 77 and line 78:
    img = np.array(img).reshape(512,4,512,3)
    img = cv2.cvtColor(img[:,0,:,:], cv2.COLOR_BGR2RGB)
    img = Image.fromarray(img)
The code above extracts the generated image from the saved image. With this change, we obtain an FID of about 10 using your pretrained checkpoint (much lower than reported).
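
The extraction step can be checked in isolation. The following is a minimal sketch, assuming the four images are concatenated horizontally with the generated image first, as described above. `split_generated` is a hypothetical helper, and a small synthetic array stands in for a saved 512x2048 result.

```python
import numpy as np


def split_generated(concat, tile=512, n_tiles=4):
    """Return the left-most tile of a horizontally concatenated image."""
    height, width = concat.shape[:2]
    if (height, width) != (tile, tile * n_tiles):
        raise ValueError(f"unexpected image size {height}x{width}")
    return concat[:, :tile, :]


# Synthetic stand-in for a saved result: tile k is filled with the value k.
tile = 8
concat = np.concatenate(
    [np.full((tile, tile, 3), k, dtype=np.uint8) for k in range(4)], axis=1
)
generated = split_generated(concat, tile=tile)

# Same pixels as the reshape-based indexing quoted in the issue.
via_reshape = concat.reshape(tile, 4, tile, 3)[:, 0, :, :]
```

Plain slicing and the reshape-based indexing select the same pixels. The slicing form also fails loudly if an image does not have the expected size.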

Thanks and I'm looking forward to further discussion :)

About the data quality

Hi Xuan,

I used the utils/download_data.py script from the repository to download data via the Laion-Human's URL.

After downloading 1000 images, I randomly sampled several of them, for example 000002238, 000001138, and 000001038.

I would like to confirm if I missed any steps or where the problem occurred.

Thanks.

laion human dataset download

The process does not run fluently (for example on /data/disk1/code/HumanSD/humansd_data/datasets/LaionAesthetics/images/00003.parquet), and it cannot even download images/00000 completely, only up to 000095.jpg.
Can you provide a way to download the jpg files directly? I think downloading from URLs is not stable.
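
Until direct archives are available, a download loop that skips finished files and retries transient failures can make an interrupted run resumable. The following is a minimal standard-library sketch. `fetch_bytes` and `download_with_retries` are hypothetical helpers and are not part of utils/download_data.py.

```python
import os
import time
import urllib.request


def fetch_bytes(url, timeout=10):
    """Download one URL and return its content as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_retries(url, dest, fetch=fetch_bytes, retries=3, delay=1.0):
    """Save url to dest; skip existing files and retry transient failures.

    Returns True if dest exists afterwards, False if every attempt failed.
    """
    if os.path.exists(dest):
        return True  # already downloaded in an earlier run
    for attempt in range(retries):
        try:
            data = fetch(url)
        except OSError:  # covers URLError and socket timeouts
            time.sleep(delay * (2 ** attempt))  # exponential backoff
            continue
        tmp = dest + ".part"
        with open(tmp, "wb") as handle:
            handle.write(data)
        os.replace(tmp, dest)  # atomic rename avoids half-written jpg files
        return True
    return False
```

Because existing files are skipped, rerunning the loop after a crash continues from where it stopped instead of starting over.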

huggingface connect error

Hi, thanks for your excellent work!
When the code automatically downloads the 'open_clip_pytorch_model.bin' model, it cannot connect to Hugging Face.
I also tried to provide the .bin file manually, but I don't know where it should go. Placing it directly under '~/.cache/models--laion--CLIP-ViT-H-14-laion2B-s32B-b79K' does not work either.
Looking forward to your reply!
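
As far as I know, the Hugging Face hub cache defaults to ~/.cache/huggingface/hub (unless overridden by environment variables) and stores each file under a per-revision snapshots directory, so a .bin file placed one level above that is not found. The following is a minimal sketch for checking whether a file is where the cache expects it. `find_cached_file` is a hypothetical helper and the layout is stated as an assumption in its docstring.

```python
from pathlib import Path


def find_cached_file(cache_root, repo_id, filename):
    """List copies of filename stored for repo_id in a Hugging Face hub cache.

    Assumes the layout <cache_root>/models--<org>--<name>/snapshots/<rev>/<file>.
    """
    repo_dir = Path(cache_root) / ("models--" + repo_id.replace("/", "--"))
    if not repo_dir.is_dir():
        return []
    return sorted(repo_dir.glob(f"snapshots/*/{filename}"))
```

For this issue, the call would presumably be `find_cached_file(Path.home() / ".cache" / "huggingface" / "hub", "laion/CLIP-ViT-H-14-laion2B-s32B-b79K", "open_clip_pytorch_model.bin")`. An empty list means the file is not in a location the cache lookup would use.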

questions about metrics

Hi author, thanks for your team's contribution.

I would like to ask about calculating the metrics during training. Training is usually interspersed with validation steps. Do you compute the evaluation metrics during the validation step? That seems time-consuming, so I'm wondering how you schedule the evaluation during training.

Questions on Evaluation Metrics

Thank you very much for your work. We recently tried to add your work for comparison but had some problems with the evaluation metrics.

Do you have reference code for Pose Cosine Similarity-based AP (CAP) and People Count Error (PCE)? We would like to ensure consistency of values. Thank you very much! :)

About the heatmap usage

Dear authors of HumanSD,

First of all, I would like to thank you for sharing your work.
I have read both the paper and the code and I found some parts that I cannot understand so I would like to ask some questions here.

Regarding the heatmap loss, Eq. 6 of the paper states that $W_a$ is a weight that gives higher priority to the loss in areas that are highly correlated with the input condition.

From my understanding of Figure 2, the heatmap seems to be used as a simple multiplication or a simple mask.

But after checking the code, it seems that the heatmap is not used directly as a simple mask. Once obtained, the heatmap is passed to the VAE encoder, as shown here:

back_to_embed_pose_add=self.encode_first_stage(pose_add_weight)

After that, the obtained embedding is used to mask the loss here:
loss_simple = torch.mul(self.get_loss(model_output, target, mean=False),(1+self.pose_loss_weight*back_to_embed_pose_add_weight)).mean([1, 2, 3])

My questions are the following:

  1. Why is it necessary to pass the obtained heatmap to the VAE encoder?
  2. Why do you need the 1+ in loss_simple = torch.mul(self.get_loss(model_output, target, mean=False),(1+self.pose_loss_weight*back_to_embed_pose_add_weight)).mean([1, 2, 3])
  3. About obtaining the heatmap, as shown in this part of the code:
    aggregated_heatmaps=torch.where(aggregated_heatmaps > self.estimate_thresh,

    The way the heatmap is calculated gives pixels whose value is greater than the threshold a value of zero, and vice versa. I thought the normal way is to assign 1 to pixels whose value is greater than the threshold. Why is it done the other way here?
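
For question 2, it may help to see what the quoted expression computes numerically. The following is a minimal NumPy sketch with made-up values. It only illustrates the arithmetic of a `1 + weight * map` factor and is not a statement of the authors' rationale.

```python
import numpy as np

per_pixel_loss = np.ones((2, 2))            # stand-in for get_loss(..., mean=False)
heat = np.array([[0.0, 1.0], [0.0, 0.5]])   # stand-in for the encoded heatmap
pose_loss_weight = 0.1                      # arbitrary value for illustration

# With the "1 +": every position keeps its base loss, highlighted ones get extra weight.
with_one = per_pixel_loss * (1 + pose_loss_weight * heat)

# Without it: positions where the map is zero contribute nothing to the loss.
without_one = per_pixel_loss * (pose_loss_weight * heat)
```

Here `with_one` stays at 1 wherever the map is zero and rises to 1.1 where it is one, while `without_one` drops to 0 wherever the map is zero.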

I would really appreciate it if you could guide me to understand your work more correctly.
Thank you very much.

The implementation of Train

Hello authors, your work is very good and I learned a lot. Could you share the implementation of HumanSD with the heatmap-guided loss? Thank you for your help.
