
diffpool's People

Contributors

rexying


diffpool's Issues

Receiving an IndexError when executing example.sh

I was trying to run the file example.sh. The first command runs fine, but it throws an exception on the second one:

$ python -m train --bmname=ENZYMES --assign-ratio=0.1 --hidden-dim=30 --output-dim=30 --cuda=1 --num-classes=6 --method=soft-assign

The output is as follows. Any idea what is going on? Given that I haven't changed anything, maybe my package versions are different?

CUDA 1
Using node labels
Num training graphs:  540 ; Num validation graphs:  60
Number of graphs:  600
Number of edges:  37282
Max, avg, std of graph size:  125 , 32.46 , 14.87
Method: soft-assign
/home/sunxiaohan/paper2/diffpool-master/encoders.py:71: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/home/sunxiaohan/paper2/diffpool-master/encoders.py:73: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
/home/sunxiaohan/paper2/diffpool-master/encoders.py:293: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/home/sunxiaohan/paper2/diffpool-master/encoders.py:295: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
Epoch:  0
Traceback (most recent call last):
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sunxiaohan/paper2/diffpool-master/train.py", line 652, in <module>
    main()
  File "/home/sunxiaohan/paper2/diffpool-master/train.py", line 640, in main
    benchmark_task_val(prog_args, writer=writer)
  File "/home/sunxiaohan/paper2/diffpool-master/train.py", line 519, in benchmark_task_val
    _, val_accs = train(train_dataset, model, args, val_dataset=val_dataset, test_dataset=None,
  File "/home/sunxiaohan/paper2/diffpool-master/train.py", line 199, in train
    for batch_idx, data in enumerate(dataset):
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sunxiaohan/paper2/diffpool-master/graph_sampler.py", line 101, in __getitem__
    num_nodes = adj.shape[0]
IndexError: tuple index out of range
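For readers debugging this: an IndexError: tuple index out of range on adj.shape[0] means adj is 0-dimensional at that point, so its .shape is the empty tuple (). A tiny standalone illustration (not the repo's code); a practical next step would be to print type(adj) and np.shape(adj) just before that line in graph_sampler.py, since the cause is likely a library-version difference in how the adjacency matrix is built:

import numpy as np

adj_ok = np.zeros((5, 5))
print(adj_ok.shape[0])    # 5 -- a proper 2-D adjacency matrix

adj_bad = np.array(3.0)   # a 0-d array: .shape is the empty tuple ()
print(adj_bad.shape)      # ()
print(adj_bad.shape[0])   # IndexError: tuple index out of range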

Test dataset

Hi, thanks for your work. The code is still hard for me to follow; could you give me an overall guide for two questions?

  1. When a graph in the test dataset has a size that does not appear in the training dataset, the parameters learned from the training set don't fit the "unseen" graph, so how does the code handle this? Also, I can't even find a test dataset in the code, only a validation dataset. Why is that?
  2. The graph dataset is mainly constructed with the networkx module; are there any advantages to this? The GCN, GAT, and ST-GCN implementations don't use it, and they seem more concise.
    Thanks.

Cannot reproduce results in the paper

I fixed some small bugs mentioned in earlier issues and ran the program. However, I cannot get the results reported in the paper. The results produced by the code are not averaged over 10 folds (the paper says results are averaged over 10-fold cross-validation), so I averaged them myself to get the 10-fold cross-validation accuracy (the best validation model of each fold is chosen for test), which is 47.74% for ENZYMES and 67.37% for DD. This does not match the results in the paper. Can you let us know how to reproduce them?

About Dataset and Performance

I read your paper with great interest and want to reproduce its performance on the ENZYMES dataset.
First of all, thank you for your great paper and code.

However, the performance is not reproducible, and the DiffPool implementation in pytorch-geometric also does not reproduce the performance reported in this paper.
(Depending on the random seed, the difference in performance is very large.)
I am curious about the experimental setup and the data.

  1. Did you split the data into train/validation, or into train/validation/test?
    (benchmark_task_val in your code does not create a test split.)
  2. What are the percentages of the train/validation/test sets?
  3. Is the performance reported in the paper measured on the validation set or on a test set? Did you report the average 10-fold validation performance without an explicit test set?

Different Graph Size

Hi,
How do you deal with input graphs of different sizes? Once the assignment matrix S is learned, its size is fixed, so how do you handle graphs with different numbers of nodes?

Thank you.
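For context, a common way to handle variable-sized graphs (and, judging by the max_num_nodes / --max-nodes argument, what this repo appears to do, though that is my reading rather than a confirmed detail) is to zero-pad every graph to a fixed maximum size before batching. A minimal sketch:

import numpy as np

def pad_graph(adj, feats, max_num_nodes):
    """Zero-pad an (n, n) adjacency matrix and (n, d) feature matrix to max_num_nodes."""
    n = adj.shape[0]
    padded_adj = np.zeros((max_num_nodes, max_num_nodes), dtype=adj.dtype)
    padded_adj[:n, :n] = adj
    padded_feats = np.zeros((max_num_nodes, feats.shape[1]), dtype=feats.dtype)
    padded_feats[:n, :] = feats
    # A mask records which rows correspond to real nodes.
    mask = np.zeros(max_num_nodes, dtype=bool)
    mask[:n] = True
    return padded_adj, padded_feats, mask

# Example: a 3-node graph padded to 5 nodes.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.float32)
feats = np.random.rand(3, 8).astype(np.float32)
padded_adj, padded_feats, mask = pad_graph(adj, feats, max_num_nodes=5)

Because the assignment matrix S is produced by a GNN applied to the (padded) node features, its number of rows simply follows the padded node count, so a fixed number of clusters can be shared across all graphs in a batch.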

About the name

Thank you very much for your interesting paper and code. I have a concern regarding the naming. Rather than pooling, I think what you propose is a fully connected layer for graphs: each output node is connected to all the input nodes, and the number of output nodes in DiffPool can be larger or smaller than the number of inputs. Since it is officially called a pooling layer, all subsequently proposed pooling layers are expected to be compared with DiffPool, which is actually based on a fully connected mechanism (an FC layer). That seems unfair and doesn't quite make sense.

Requirements

May I ask what the requirements of this project are?
Which version of PyTorch does it use?

Thanks in advance.

Cluster assignments learning for Graph classification

Hi @RexYing ,

Thank you for your work and for the code.

I have a question related to cluster assignments in the context of graph classification.

How do you choose the number of clusters C, and how do you obtain the cluster assignment for each node?

Say x has dimension (30, 256), where 30 is the number of nodes and 256 is the feature dimension, and x is the output of my convolutional layer. A is an adjacency matrix of dimension (30, 30).

How can I get S, the assignment matrix of dimension (30, C), given x and A?

Thank you
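For reference, the DiffPool paper defines the assignment matrix as S = softmax(GNN_pool(A, X)), with a row-wise softmax. A minimal single-layer sketch (my own simplification, not the repo's encoder) of going from x of shape (30, 256) and A of shape (30, 30) to S of shape (30, C):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AssignmentGNN(nn.Module):
    """One dense GCN-style layer mapping node features to C cluster logits."""
    def __init__(self, in_dim, num_clusters):
        super().__init__()
        self.proj = nn.Linear(in_dim, num_clusters, bias=False)

    def forward(self, x, adj):
        # Add self-loops and symmetrically normalize the adjacency matrix.
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        adj = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        logits = self.proj(adj @ x)          # (n, C) cluster logits
        return F.softmax(logits, dim=-1)     # each row sums to 1

x = torch.randn(30, 256)                     # node embeddings from the conv layer
A = (torch.rand(30, 30) > 0.8).float()
A = ((A + A.t()) > 0).float()                # make it symmetric
C = 10                                       # number of clusters (a hyperparameter)
S = AssignmentGNN(256, C)(x, A)              # S has shape (30, 10)

The number of clusters C is a hyperparameter; in this repo it appears to be derived from assign_ratio times the maximum number of nodes, but treat that as my reading of the flags rather than a confirmed detail.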

Typo in encoders.py causes nan

Hi, thanks for your code. I think in encoders.py line 383, pred_adj = torch.min(pred_adj, torch.Tensor(1).cuda()) should be pred_adj = torch.min(pred_adj, torch.Tensor([1, ]).cuda()). I'm using PyTorch 1.1.0, and torch.Tensor(1).cuda() creates an uninitialized tensor of size 1 with an arbitrary value. Sometimes this leads to a negative pred_adj and causes nan in the log function.
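To illustrate the difference with a standalone snippet (independent of the repo):

import torch

# torch.Tensor(1) allocates a 1-element tensor with uninitialized memory,
# so its value is arbitrary and can even be negative:
t_bad = torch.Tensor(1)
print(t_bad)                 # e.g. tensor([2.8026e-45]) -- whatever was in memory

# torch.Tensor([1]) (or torch.tensor([1.0]) / torch.ones(1)) actually holds 1:
t_good = torch.Tensor([1.0])
print(t_good)                # tensor([1.])

pred_adj = torch.rand(4, 4) * 2.0          # toy values, some greater than 1
capped = torch.min(pred_adj, t_good)       # correctly caps entries at 1

torch.clamp(pred_adj, max=1.0) would be an equivalent and arguably clearer fix.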

Code of paper?

Is this the official implementation of the paper "Hierarchical Graph Representation Learning with Differentiable Pooling"?

Your Implementation is WRONG

In encoders.py, lines 251, 272, and 276:

class SoftPoolingGcnEncoder(GcnEncoderGraph):
    def __init__(self, max_num_nodes, input_dim, hidden_dim, embedding_dim, label_dim, num_layers,
            assign_hidden_dim, assign_ratio=0.25, assign_num_layers=-1, num_pooling=1,
            pred_hidden_dims=[50], concat=True, bn=True, dropout=0.0, linkpred=True,
            assign_input_dim=-1, args=None):
        '''
        Args:
            num_layers: number of gc layers before each pooling
            num_nodes: number of nodes for each graph in batch
            linkpred: flag to turn on link prediction side objective
        '''

        super(SoftPoolingGcnEncoder, self).__init__(input_dim, hidden_dim, embedding_dim, label_dim,
                num_layers, pred_hidden_dims=pred_hidden_dims, concat=concat, args=args)
        add_self = not concat
        self.num_pooling = num_pooling
        self.linkpred = linkpred
        self.assign_ent = True

        # GC
        self.conv_first_after_pool = []
        self.conv_block_after_pool = []
        self.conv_last_after_pool = []
        for i in range(num_pooling):
            # use self to register the modules in self.modules()
            self.conv_first2, self.conv_block2, self.conv_last2 = self.build_conv_layers(
                    self.pred_input_dim, hidden_dim, embedding_dim, num_layers, 
                    add_self, normalize=True, dropout=dropout)
            self.conv_first_after_pool.append(self.conv_first2)
            self.conv_block_after_pool.append(self.conv_block2)
            self.conv_last_after_pool.append(self.conv_last2)
        ...

You add self. ahead of the submodules (self.conv_first2, self.conv_block2, self.conv_last2, and so on) to register these modules.

It seems that you DON'T fully understand PyTorch's module registration mechanism: each loop iteration replaces the module registered under the same attribute name with a new one, so only the last one stays registered. The result is that this code only trains the top layer.
You can check the details of PyTorch's registration mechanism in its source code:
https://github.com/pytorch/pytorch/blob/3805490d6ac7d77f93f91c55cdb37f08cfcfa254/torch/nn/modules/module.py#L542
Or you can try it yourself with a small example:

import torch.nn as nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.l = nn.Linear(5, 3)

t = Test()
print('t:', list(t.parameters()))

class Test2(nn.Module):
    def __init__(self):
        super(Test2, self).__init__()
        self.ls = []
        for i in range(3):
            self.l = nn.Linear(5, 3)  # Wrong
            self.ls.append(self.l)  # Wrong

t2 = Test2()
print('t2:', list(t2.parameters()))

class Test3(nn.Module):
    def __init__(self):
        super(Test3, self).__init__()
        self.ls = []
        for i in range(3):
            l = nn.Linear(5, 3)
            self.ls.append(l)
        self.ls = nn.ModuleList(self.ls)  # This is the correct way!

t3 = Test3()
print('t3:', list(t3.parameters()))

The result is:

t: [Parameter containing:
tensor([[ 0.1651, -0.0475, -0.2202, -0.3338, -0.1139],
        [-0.0085, -0.2673, -0.1991, -0.2291, -0.2957],
        [ 0.1386,  0.4027, -0.2351, -0.3703, -0.2127]], requires_grad=True), Parameter containing:
tensor([-0.3098, -0.0805,  0.2104], requires_grad=True)]
t2: [Parameter containing:
tensor([[ 0.1684, -0.2316,  0.3627,  0.2841,  0.3111],
        [ 0.3504, -0.1804, -0.0317, -0.2029, -0.3229],
        [-0.0555, -0.3587, -0.0170, -0.0934,  0.1872]], requires_grad=True), Parameter containing:
tensor([ 0.0846, -0.3757,  0.3119], requires_grad=True)]
t3: [Parameter containing:
tensor([[ 0.1828, -0.3483,  0.0333,  0.4261, -0.0936],
        [-0.3653,  0.0440, -0.3851,  0.4471, -0.0447],
        [ 0.3575,  0.0553, -0.3604,  0.4240, -0.4345]], requires_grad=True), Parameter containing:
tensor([-0.4442,  0.2580, -0.0387], requires_grad=True), Parameter containing:
tensor([[ 0.4045,  0.2508, -0.0131, -0.0955,  0.2923],
        [ 0.2839, -0.2880, -0.4088, -0.2684,  0.0620],
        [ 0.3492,  0.3595,  0.3115, -0.3213, -0.3045]], requires_grad=True), Parameter containing:
tensor([ 0.0668, -0.2749,  0.1292], requires_grad=True), Parameter containing:
tensor([[ 0.3967,  0.1927, -0.4264, -0.0192,  0.2547],
        [ 0.1547,  0.1158,  0.1841,  0.3160, -0.4269],
        [ 0.1797,  0.0126,  0.3575,  0.0975, -0.1188]], requires_grad=True), Parameter containing:
tensor([-0.0337,  0.4061,  0.3979], requires_grad=True)]

You can clearly see that t2 only registers the last Linear module.

So I DO doubt the results of your paper, even though I haven't run your code. I think you should check your code and produce new results.
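For reference, a minimal runnable sketch of the nn.ModuleList pattern applied to a toy stand-in for the per-pooling conv stacks (my own illustration, not a patch from the authors):

import torch
import torch.nn as nn

class PoolBlocks(nn.Module):
    """Toy stand-in for per-pooling conv layers, registered via nn.ModuleList."""
    def __init__(self, dim, num_pooling=2):
        super().__init__()
        # One 'conv' per pooling step; ModuleList registers all of them.
        self.conv_first_after_pool = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_pooling)])
        self.conv_last_after_pool = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_pooling)])

    def forward(self, x):
        for first, last in zip(self.conv_first_after_pool,
                               self.conv_last_after_pool):
            x = torch.relu(first(x))
            x = last(x)
        return x

m = PoolBlocks(dim=8, num_pooling=2)
# All four linear layers appear in the parameter list seen by an optimizer:
print(len(list(m.parameters())))   # 8 tensors (weight + bias for each of 4 layers)

In the repo itself the analogous change would be wrapping conv_first_after_pool, conv_block_after_pool, and conv_last_after_pool in nn.ModuleList instead of plain Python lists, as the Test3 example above demonstrates.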

What GNN method is used in this implementation?

Hi,

According to the paper, you used GraphSAGE as the GNN module and stacked two layers before each DiffPool layer. However, in this code implementation, are you using GCN instead and stacking 3 layers before DiffPool?

Batch Normalization and the direction of softmax

Hi,

Thank you for sharing your code. It is a very exciting paper! I have a few concerns about some details in your code. Please correct me if I make any mistakes. :-)

  1. The parameters in the batch normalization are not trainable, as discussed in this issue. In that case, I personally think it is better to call it standardization instead of batch normalization. I am also wondering whether it is possible to view it the way we do in CNNs and image processing, where 1D batch normalization is applied to every image in a batch and every location in an image. Here we could regard the different nodes of a graph as the different locations of an image where convolutional kernels are applied.

  2. Which dimension batch normalization should be applied to. Following (1), if we regard the nodes of a graph as equivalent to the different locations of an image, would it be better to apply batch normalization over the last dimension, i.e., the feature channels of the nodes? I personally guess that might make more sense than applying it over the node dimension (see the sketch after this list).

  3. The direction of the softmax when calculating the assignment matrix. I am curious whether it is better to take the softmax over the node dimension or over the new cluster dimension. I agree that, to create an assignment matrix, each original node's contributions to the new clusters should sum to 1, which is achieved by applying the softmax to the last dimension. But I am still wondering whether it would also make sense to apply the softmax over the node dimension; in that case, each column of the assignment matrix would represent a distribution of contributions over the original nodes.
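Regarding point 2, here is a sketch of what feature-channel batch normalization would look like for a (batch, nodes, features) tensor, contrasted with normalizing over the node dimension as the nn.BatchNorm1d(x.size()[1]) call quoted elsewhere in these issues does. This is my own illustration, not code from the repo:

import torch
import torch.nn as nn

batch_size, num_nodes, feat_dim = 20, 100, 30
x = torch.randn(batch_size, num_nodes, feat_dim)

# Normalizing over the feature channels (point 2 above): BatchNorm1d expects
# the channel dimension second, so transpose (B, N, F) -> (B, F, N) and back.
bn_feat = nn.BatchNorm1d(feat_dim)
x_feat_norm = bn_feat(x.transpose(1, 2)).transpose(1, 2)

# What the quoted code does instead: treat the node dimension x.size()[1]
# as the channel dimension.
bn_node = nn.BatchNorm1d(num_nodes)
x_node_norm = bn_node(x)

print(x_feat_norm.shape, x_node_norm.shape)  # both torch.Size([20, 100, 30])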

difference between methods

Hi

Thank you for your implementation. You have three different models in your encoders: soft-assign, base-set2set, and base. What's the difference between them?

Some questions about the SoftPoolingGcnEncoder

Hello,
I edited parser.set_defaults(method='soft-assign'), and then an error occurred in the SoftPoolingGcnEncoder. So I have some questions:
Should .cuda() be added to nn.Linear() on line 90?
Should self.assign_pred be changed to self.assign_pred_modules[i] on line 339?
Is there a difference between assign_input_dim and the dimension of x_a as it is updated by the new x?
Can you let me know whether I have misunderstood something?

Receiving a TypeError when executing example.sh

I was trying to run the file example.sh. The first command runs fine, but it throws an exception on the second one:

$ python -m train --bmname=ENZYMES --assign-ratio=0.1 --hidden-dim=30 --output-dim=30 --cuda=1 --num-classes=6 --method=soft-assign

The output is as follows. Any idea what is going on? Given that I haven't changed anything, maybe my package versions are different?

CUDA 1
Using node labels
Num training graphs:  540 ; Num validation graphs:  60
Number of graphs:  600
Number of edges:  37282
Max, avg, std of graph size:  125 , 32.46 , 14.87
Method: soft-assign
/location/diffpool/encoders.py:71: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/location/diffpool/encoders.py:73: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
/location/diffpool/encoders.py:293: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/location/diffpool/encoders.py:295: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
Epoch:  0
/location/miniconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
/location/diffpool/train.py:209: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  nn.utils.clip_grad_norm(model.parameters(), args.clip)
Traceback (most recent call last):
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/PIL/Image.py", line 2515, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 1800), '|u1')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/location/miniconda/envs/pytorch/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/location/miniconda/envs/pytorch/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/location/diffpool/train.py", line 640, in <module>
    main()
  File "/location/diffpool/train.py", line 628, in main
    benchmark_task_val(prog_args, writer=writer)
  File "/location/diffpool/train.py", line 511, in benchmark_task_val
    writer=writer)
  File "/location/diffpool/train.py", line 218, in train
    log_assignment(model.assign_tensor, writer, epoch, writer_batch_idx)
  File "/location/diffpool/train.py", line 98, in log_assignment
    writer.add_image('assignment', data, epoch)
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/tensorboardX/writer.py", line 427, in add_image
    image(tag, img_tensor, dataformats=dataformats), global_step, walltime)
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/tensorboardX/summary.py", line 216, in image
    image = make_image(tensor, rescale=rescale)
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/tensorboardX/summary.py", line 254, in make_image
    image = Image.fromarray(tensor)
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/PIL/Image.py", line 2517, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type

About the module pipeline

Hi, I have a question about the module pipeline in your paper, which says "apply a DiffPool layer after two GraphSAGE layers", "a total of 2 DiffPool layers are used", and "after each DiffPool layer, 3 layers of graph convolutions are performed". Isn't this contradictory? Are the "graph convolutions" here not GraphSAGE layers?
If so, is the pipeline 2 GraphSAGE + DiffPool1 + 3 graph convolutions + FC1 + 2 GraphSAGE + DiffPool2 + 3 graph convolutions + FC2 + FC3?
Is GraphSAGE used as the embedding GNN and the "graph convolutions" as the pooling GNN? Why use different GNNs?
And why is a prediction layer (FC layer) added after each graph convolution in the code?
Also, in encoders.py, I see classes sometimes calling each other's methods and attributes without raising an error. How does that work?
Could you give me some guidance on these points? They have confused me for several days. I'd appreciate it.

Auxiliary loss

Hello, could you elaborate on the rationale behind adopting the link prediction loss, i.e., encouraging nearby nodes to be pooled together?

It's a bit confusing.

Rolling Up Cross Validation Results

Thanks for providing the repo. I was going through the code and couldn't understand why it averages the results across all the cross-validation folds per epoch and then picks the max.

I would think we would want to pick the max accuracy from each cross-validation run and then average those best values.

Error when plt.savefig

I got a FileNotFoundError when executing plt.savefig(gen_train_plt_name(args), dpi=600) in train.py's train():

  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/pyplot.py", line 842, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/figure.py", line 2311, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 2217, in print_figure
    **kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 1639, in wrapper
    return func(*args, **kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/backends/backend_agg.py", line 512, in print_png
    dpi=self.figure.dpi, metadata=metadata, pil_kwargs=pil_kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/image.py", line 1591, in imsave
    image.save(fname, **pil_kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/PIL/Image.py", line 2155, in save
    fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'results/ENZYMES_soft-assign_l3x1_ar10_h30_o30.png'
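A likely cause (my guess, not confirmed by the maintainers) is simply that the results/ directory referenced in the path does not exist yet; creating it before saving avoids the error:

import os

# Hypothetical placement: before the plt.savefig(...) call in train.py's train().
os.makedirs('results', exist_ok=True)

Equivalently, running mkdir results in the repo root before training should work.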

Cannot benchmark on common graph classification dataset

Hi,

I'm benchmarking your code (without modification) on common datasets, also processed by Kristian, Nils, et al., but I couldn't get it to run on 3 common datasets: MUTAG, PTC, and REDDIT-BINARY.

All fail with the same error, shown below:

$ python -m train --datadir=data --bmname=MUTAG --cuda=0 --max-nodes=28 --num-classes=2
Remove existing log dir:  log/MUTAG_base_l3_h20_o20
CUDA 0
No node attributes
Using node labels
Num training graphs:  170 ; Num validation graphs:  18
Number of graphs:  188
Number of edges:  3721
Max, avg, std of graph size:  28 , 17.93 , 4.58
Method: base
/home/survivio/ruochun/diffpool/encoders.py:71: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/home/survivio/ruochun/diffpool/encoders.py:73: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
Epoch:  0
/home/survivio/anaconda3/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=111 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "/home/survivio/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/survivio/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/survivio/ruochun/diffpool/train.py", line 640, in <module>
    main()
  File "/home/survivio/ruochun/diffpool/train.py", line 628, in main
    benchmark_task_val(prog_args, writer=writer)
  File "/home/survivio/ruochun/diffpool/train.py", line 511, in benchmark_task_val
    writer=writer)
  File "/home/survivio/ruochun/diffpool/train.py", line 205, in train
    loss = model.loss(ypred, label)
  File "/home/survivio/ruochun/diffpool/encoders.py", line 193, in loss
    return F.cross_entropy(pred, label, size_average=True)
  File "/home/survivio/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/survivio/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:111

Could you help investigate the root cause?

Thanks,
Ruochun
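For readers hitting this: the `Assertion t >= 0 && t < n_classes failed` message from ClassNLLCriterion usually means the class labels passed to the loss fall outside [0, num_classes). A quick hedged check (a hypothetical snippet, not part of the repo; some benchmark datasets store graph labels starting from 1 or as {-1, 1}, so they may need remapping to 0-based indices):

import numpy as np

labels = np.array([1, 2, 1, 2, 1])        # example: graph labels as stored on disk
print(np.unique(labels))                  # [1 2] -> out of range for num_classes=2

# Remap arbitrary label values to contiguous 0-based class indices.
remap = {lab: i for i, lab in enumerate(sorted(np.unique(labels)))}
labels = np.array([remap[lab] for lab in labels])
print(np.unique(labels))                  # [0 1] -> valid targets for cross-entropy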

Dataset

Hello. Is the REDDIT-MULTI-5K dataset part of the REDDIT-MULTI-12K dataset?

Typo in example.sh

Hi,

I'm trying your code by running example.sh and found that the argument --label-classes should be --num-classes, according to the arg_parse function in train.py.
You may want to fix the typo.

Best,
Ruochun

How to set features for REDDIT-12k and COLLAB?

I found on this website that REDDIT-12K and COLLAB have neither node labels nor node attributes. How do you set the node features for training?

Also, in your paper you mention Appendix A, but I couldn't find it anywhere. Where can I find it?

Thanks!
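For datasets with neither node labels nor attributes, a common choice (and what the features == 'deg' / max_deg options discussed elsewhere in these issues appear to support, though that is my assumption) is to use the node degree as the feature, e.g. one-hot encoded up to a cap:

import numpy as np
import networkx as nx

def degree_features(G, max_deg=10):
    """One-hot encode each node's degree, capping degrees at max_deg."""
    feats = np.zeros((G.number_of_nodes(), max_deg + 1), dtype=np.float32)
    for i, (_, deg) in enumerate(G.degree()):
        feats[i, min(deg, max_deg)] = 1.0
    return feats

G = nx.erdos_renyi_graph(20, 0.3)   # toy graph standing in for a REDDIT graph
X = degree_features(G)              # shape (20, 11)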

Exact architecture from the paper?

I am confused about the exact layers in the architecture from the paper. It states that:
"
We use the “mean” variant of GRAPHSAGE [16] and apply a DIFFPOOL layer after every two GRAPHSAGE layers in our architecture. A total of 2 DIFFPOOL layers are used for the datasets. For small datasets such as ENZYMES and COLLAB, 1 DIFFPOOL layer can achieve similar performance. After each DIFFPOOL layer, 3 layers of graph convolutions are performed, before the next DIFFPOOL layer, or the readout layer.
"

So are 2 or 3 GraphSAGE layers used? Which of the options below (A, B, or C) would be correct? Or, if none of them is, what would be the exact architecture from the paper?

Option A:

1. GraphSAGE
2. GraphSAGE
3. DIFFPOOL
4. GraphSAGE
5. GraphSAGE
6. DIFFPOOL
7. GraphSAGE
8. GraphSAGE
9. GraphSAGE
10. READOUT

Option B:

1. GraphSAGE
2. GraphSAGE
3. GraphSAGE
4. DIFFPOOL
5. GraphSAGE
6. GraphSAGE
7. GraphSAGE
8. DIFFPOOL
9. GraphSAGE
10. GraphSAGE
11. GraphSAGE
12. READOUT

Option C:

1. GraphSAGE
2. GraphSAGE
3. DIFFPOOL
4. GraphSAGE
5. GraphSAGE
6. DIFFPOOL
7. READOUT

batch normalization

I am a little confused by the batch normalization implementation here: every time batch normalization is applied, self.bn(x) is called, and inside it a new BN layer is created, i.e., bn_module = nn.BatchNorm1d(x.size()[1]).cuda(). Will this fail to train the BN parameters, since the layer is created from scratch on each call?
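For reference, the usual pattern is to create the BatchNorm layer once in __init__ so its affine parameters and running statistics are registered and updated across training steps; a minimal sketch (a generic illustration, not a patch of this repo's encoder):

import torch
import torch.nn as nn

class GcnLayerWithBn(nn.Module):
    """A toy graph-conv layer with a properly registered BatchNorm."""
    def __init__(self, in_dim, out_dim, num_nodes):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        # Created once here, so weight/bias/running stats persist across steps.
        self.bn = nn.BatchNorm1d(num_nodes)

    def forward(self, x, adj):
        x = torch.relu(self.lin(adj @ x))   # (batch, num_nodes, out_dim)
        return self.bn(x)                   # BN over the node dimension

layer = GcnLayerWithBn(16, 32, num_nodes=100)
x = torch.randn(8, 100, 16)
adj = torch.rand(8, 100, 100)
out = layer(x, adj)                         # shape (8, 100, 32)

A BatchNorm created inside the forward pass, by contrast, re-initializes its affine parameters and running statistics on every call, so it effectively only standardizes each batch, which matches the observation in the "Batch Normalization and the direction of softmax" issue above.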

Parameters for REDDIT-MULTI-12K dataset

Can I ask what the parameters for running REDDIT-MULTI-12K are?
I ran into a memory error when trying:
python -m train --bmname=REDDIT-MULTI-12K --assign-ratio=0.1 --hidden-dim=5 --output-dim=11 --cuda=0 --batch-size=10 --num-gc-layers=1 --num-classes=11 --method=soft-assign

thanks!

edge attributes

This might be a stupid question, but do you use edge attributes anywhere in the code? I can't seem to figure out where and how.

Max-degree=10?

Why is self.max_deg set to 10 when calculating node features with features == 'deg' or 'struct'?

Reproducing results

It would be good to list the exact commands needed to reproduce the results reported in the paper, similar to how they're listed in the example.sh script.

Auxiliary loss implementations

Hi Rex,
I have a couple of questions regarding the implementation of the auxiliary losses.

  1. In the paper it says that 'at each layer l, we minimize L_LP = ||A^(l) - S^(l) S^(l)^T||_F, where ||·||_F denotes the Frobenius norm.'
However, in the code, what I find is:

self.link_loss = -adj * torch.log(pred_adj+eps) - (1-adj) * torch.log(1-pred_adj+eps)
which is the binary cross-entropy on pred_adj.
Could you please explain why/how this is equivalent to the mathematical formulation? Also, I believe the pred_adj used here is built from the final assignment tensor, isn't it? (See the sketch at the end of this issue.)

  2. In theory you are also regularizing the entropy of the cluster assignments by minimizing
    L_E = (1/n) * sum_i H(S_i),
    but I can't see this anywhere in the code. Could you point me to it, please?

  3. A third comment, not related to the losses: in the experiments section of the paper you say that you use GraphSAGE as the base for the model, but as far as I can see, the code uses a graph convolution (GConv). Could you also enlighten me a little on this, please?

Thanks!
Guadalupe
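For reference, a direct translation of the two auxiliary terms as stated in the paper, the Frobenius-norm link loss ||A - S S^T||_F and the average per-node assignment entropy (1/n) sum_i H(S_i), might look like the sketch below. This is my own illustration of the formulas, not the repo's code, which uses the binary cross-entropy form quoted above for the link term:

import torch

def diffpool_aux_losses(adj, S, eps=1e-8):
    """adj: (n, n) adjacency matrix; S: (n, C) row-stochastic assignment matrix."""
    # Link prediction loss: Frobenius norm of A - S S^T.
    link_loss = torch.norm(adj - S @ S.t(), p='fro')
    # Entropy loss: mean entropy of each node's cluster assignment distribution.
    entropy_per_node = -(S * torch.log(S + eps)).sum(dim=1)
    ent_loss = entropy_per_node.mean()
    return link_loss, ent_loss

n, C = 30, 10
S = torch.softmax(torch.randn(n, C), dim=-1)
adj = (torch.rand(n, n) > 0.8).float()
adj = ((adj + adj.t()) > 0).float()
link_loss, ent_loss = diffpool_aux_losses(adj, S)

The BCE form quoted above pursues the same goal of making S S^T match A; whether the entropy term is actually applied in this repo is exactly what the question asks, so the sketch only reflects the formulas from the paper.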

code for auxiliary link prediction objective

Hi, thanks for your code!
In your code, you use the auxiliary link prediction objective to reconstruct the adjacency matrix, and the relevant code is in encoders.py.
In encoders.py line 383, you compare every element of pred_adj with 1 to make sure they are at most 1, so that torch.log on line 387 can be computed safely.
However, torch.Tensor(1) generates an uninitialized size-1 tensor, not the value 1; maybe try torch.Tensor([1]).
That's a tiny mistake in my opinion. Or maybe I'm wrong!
Looking forward to your reply!

Appendix pages

Hello, Rex Ying!

I'm Jaehoon, and I'm very thankful for your paper introducing the DiffPool method.
I would just like to ask about the 'Appendix' pages.
I already checked issue #7, but I still can't find them.
So, how can I get the appendix pages now?

Thank you.
Best Regards
