
diffpool's People

Contributors

rexying


diffpool's Issues

Receiving an IndexError when executing example.sh

I was trying to run the file example.sh. The first command runs fine, but it throws an exception on the second one:

$ python -m train --bmname=ENZYMES --assign-ratio=0.1 --hidden-dim=30 --output-dim=30 --cuda=1 --num-classes=6 --method=soft-assign

The output is as follows. Any idea what is going on? Given that I haven't changed anything, maybe my package versions are different?

CUDA 1
Using node labels
Num training graphs:  540 ; Num validation graphs:  60
Number of graphs:  600
Number of edges:  37282
Max, avg, std of graph size:  125 , 32.46 , 14.87
Method: soft-assign
/home/sunxiaohan/paper2/diffpool-master/encoders.py:71: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/home/sunxiaohan/paper2/diffpool-master/encoders.py:73: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
/home/sunxiaohan/paper2/diffpool-master/encoders.py:293: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/home/sunxiaohan/paper2/diffpool-master/encoders.py:295: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
Epoch:  0
Traceback (most recent call last):
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sunxiaohan/paper2/diffpool-master/train.py", line 652, in <module>
    main()
  File "/home/sunxiaohan/paper2/diffpool-master/train.py", line 640, in main
    benchmark_task_val(prog_args, writer=writer)
  File "/home/sunxiaohan/paper2/diffpool-master/train.py", line 519, in benchmark_task_val
    _, val_accs = train(train_dataset, model, args, val_dataset=val_dataset, test_dataset=None,
  File "/home/sunxiaohan/paper2/diffpool-master/train.py", line 199, in train
    for batch_idx, data in enumerate(dataset):
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sunxiaohan/anaconda3/envs/pep/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sunxiaohan/paper2/diffpool-master/graph_sampler.py", line 101, in __getitem__
    num_nodes = adj.shape[0]
IndexError: tuple index out of range
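For readers debugging this: an IndexError: tuple index out of range on adj.shape[0] means adj is 0-dimensional at that point, so its .shape is the empty tuple (). A tiny standalone illustration (not the repo's code); a practical next step would be to print type(adj) and np.shape(adj) just before that line in graph_sampler.py, since the cause is likely a library-version difference in how the adjacency matrix is built:

import numpy as np

adj_ok = np.zeros((5, 5))
print(adj_ok.shape[0])    # 5 -- a proper 2-D adjacency matrix

adj_bad = np.array(3.0)   # a 0-d array: .shape is the empty tuple ()
print(adj_bad.shape)      # ()
print(adj_bad.shape[0])   # IndexError: tuple index out of range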

Test dataset

Hi, thanks for your work. The code is still hard for me to follow; could you give me an overall guide for two questions?

  1. When a graph in the test dataset has a size that does not appear in the training dataset, the parameters learned from the training set don't fit the "unseen" graph, so how does the code handle this? Also, I can't even find a test dataset in the code, only a validation dataset. Why is that?
  2. The graph dataset is mainly constructed with the networkx module; are there any advantages to this? The GCN, GAT, and ST-GCN implementations don't use it, and they seem more concise.
    Thanks.

Cannot reproduce results in the paper

I fixed some small bugs mentioned in earlier issues and ran the program. However, I cannot get the results reported in the paper. The results produced by the code are not averaged over 10 folds (the paper says results are averaged over 10-fold cross-validation), so I averaged them myself to get the 10-fold cross-validation accuracy (the best validation model of each fold is chosen for test), which is 47.74% for ENZYMES and 67.37% for DD. This does not match the results in the paper. Can you let us know how to reproduce them?

About Dataset and Performance

I read your paper with great interest and want to reproduce its performance on the ENZYMES dataset.
First of all, thank you for your great paper and code.

However, the performance is not reproducible, and the DiffPool implementation in pytorch-geometric also does not reproduce the performance reported in this paper.
(Depending on the random seed, the difference in performance is very large.)
I am curious about the experimental setup and the data.

  1. Did you split the data into train/validation, or into train/validation/test?
    (benchmark_task_val in your code does not create a test split.)
  2. What are the percentages of the train/validation/test sets?
  3. Is the performance reported in the paper measured on the validation set or on a test set? Did you report the average 10-fold validation performance without an explicit test set?

Different Graph Size

Hi,
How do you deal with input graphs of different sizes? Once the assignment matrix S is learned, its size is fixed, so how do you handle graphs with different numbers of nodes?

Thank you.
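For context, a common way to handle variable-sized graphs (and, judging by the max_num_nodes / --max-nodes argument, what this repo appears to do, though that is my reading rather than a confirmed detail) is to zero-pad every graph to a fixed maximum size before batching. A minimal sketch:

import numpy as np

def pad_graph(adj, feats, max_num_nodes):
    """Zero-pad an (n, n) adjacency matrix and (n, d) feature matrix to max_num_nodes."""
    n = adj.shape[0]
    padded_adj = np.zeros((max_num_nodes, max_num_nodes), dtype=adj.dtype)
    padded_adj[:n, :n] = adj
    padded_feats = np.zeros((max_num_nodes, feats.shape[1]), dtype=feats.dtype)
    padded_feats[:n, :] = feats
    # A mask records which rows correspond to real nodes.
    mask = np.zeros(max_num_nodes, dtype=bool)
    mask[:n] = True
    return padded_adj, padded_feats, mask

# Example: a 3-node graph padded to 5 nodes.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.float32)
feats = np.random.rand(3, 8).astype(np.float32)
padded_adj, padded_feats, mask = pad_graph(adj, feats, max_num_nodes=5)

Because the assignment matrix S is produced by a GNN applied to the (padded) node features, its number of rows simply follows the padded node count, so a fixed number of clusters can be shared across all graphs in a batch.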

About the name

Thank you very much for your interesting paper and code. I have a concern regarding the naming. Rather than pooling, I think what you propose is a fully connected layer for graphs: each output node is connected to all the input nodes, and the number of output nodes in DiffPool can be larger or smaller than the number of inputs. Since it is officially called a pooling layer, all subsequently proposed pooling layers are expected to be compared with DiffPool, which is actually based on a fully connected mechanism (an FC layer). That seems unfair and doesn't quite make sense.

Requirements

May I ask what the requirements of this project are?
Which version of PyTorch does it use?

Thanks in advance.

Cluster assignments learning for Graph classification

Hi @RexYing ,

Thank you for your work and for the code.

I have a question related to cluster assignments in the context of graph classification.

How do you choose the number of clusters C, and how do you obtain the cluster assignment for each node?

Say x has dimension (30, 256), where 30 is the number of nodes and 256 is the feature dimension, and x is the output of my convolutional layer. A is an adjacency matrix of dimension (30, 30).

How can I get S, the assignment matrix of dimension (30, C), given x and A?

Thank you
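For reference, the DiffPool paper defines the assignment matrix as S = softmax(GNN_pool(A, X)), with a row-wise softmax. A minimal single-layer sketch (my own simplification, not the repo's encoder) of going from x of shape (30, 256) and A of shape (30, 30) to S of shape (30, C):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AssignmentGNN(nn.Module):
    """One dense GCN-style layer mapping node features to C cluster logits."""
    def __init__(self, in_dim, num_clusters):
        super().__init__()
        self.proj = nn.Linear(in_dim, num_clusters, bias=False)

    def forward(self, x, adj):
        # Add self-loops and symmetrically normalize the adjacency matrix.
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        adj = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        logits = self.proj(adj @ x)          # (n, C) cluster logits
        return F.softmax(logits, dim=-1)     # each row sums to 1

x = torch.randn(30, 256)                     # node embeddings from the conv layer
A = (torch.rand(30, 30) > 0.8).float()
A = ((A + A.t()) > 0).float()                # make it symmetric
C = 10                                       # number of clusters (a hyperparameter)
S = AssignmentGNN(256, C)(x, A)              # S has shape (30, 10)

The number of clusters C is a hyperparameter; in this repo it appears to be derived from assign_ratio times the maximum number of nodes, but treat that as my reading of the flags rather than a confirmed detail.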

Typo in encoders.py causes nan

Hi, thanks for your code. I think in encoders.py line 383, pred_adj = torch.min(pred_adj, torch.Tensor(1).cuda()) should be pred_adj = torch.min(pred_adj, torch.Tensor([1, ]).cuda()). I'm using PyTorch 1.1.0, and torch.Tensor(1).cuda() creates an uninitialized tensor of size 1 with an arbitrary value. Sometimes this leads to a negative pred_adj and causes nan in the log function.
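To illustrate the difference with a standalone snippet (independent of the repo):

import torch

# torch.Tensor(1) allocates a 1-element tensor with uninitialized memory,
# so its value is arbitrary and can even be negative:
t_bad = torch.Tensor(1)
print(t_bad)                 # e.g. tensor([2.8026e-45]) -- whatever was in memory

# torch.Tensor([1]) (or torch.tensor([1.0]) / torch.ones(1)) actually holds 1:
t_good = torch.Tensor([1.0])
print(t_good)                # tensor([1.])

pred_adj = torch.rand(4, 4) * 2.0          # toy values, some greater than 1
capped = torch.min(pred_adj, t_good)       # correctly caps entries at 1

torch.clamp(pred_adj, max=1.0) would be an equivalent and arguably clearer fix.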

Code of paper?

Is this the official implementation of the paper "Hierarchical Graph Representation Learning with Differentiable Pooling"?

Your Implementation is WRONG

In encoders.py, lines 251, 272, and 276:

class SoftPoolingGcnEncoder(GcnEncoderGraph):
    def __init__(self, max_num_nodes, input_dim, hidden_dim, embedding_dim, label_dim, num_layers,
            assign_hidden_dim, assign_ratio=0.25, assign_num_layers=-1, num_pooling=1,
            pred_hidden_dims=[50], concat=True, bn=True, dropout=0.0, linkpred=True,
            assign_input_dim=-1, args=None):
        '''
        Args:
            num_layers: number of gc layers before each pooling
            num_nodes: number of nodes for each graph in batch
            linkpred: flag to turn on link prediction side objective
        '''

        super(SoftPoolingGcnEncoder, self).__init__(input_dim, hidden_dim, embedding_dim, label_dim,
                num_layers, pred_hidden_dims=pred_hidden_dims, concat=concat, args=args)
        add_self = not concat
        self.num_pooling = num_pooling
        self.linkpred = linkpred
        self.assign_ent = True

        # GC
        self.conv_first_after_pool = []
        self.conv_block_after_pool = []
        self.conv_last_after_pool = []
        for i in range(num_pooling):
            # use self to register the modules in self.modules()
            self.conv_first2, self.conv_block2, self.conv_last2 = self.build_conv_layers(
                    self.pred_input_dim, hidden_dim, embedding_dim, num_layers, 
                    add_self, normalize=True, dropout=dropout)
            self.conv_first_after_pool.append(self.conv_first2)
            self.conv_block_after_pool.append(self.conv_block2)
            self.conv_last_after_pool.append(self.conv_last2)
        ...

You add self. ahead of the submodules (self.conv_first2, self.conv_block2, self.conv_last2, and so on) to register these modules.

It seems that you DON'T fully understand PyTorch's module registration mechanism: each loop iteration replaces the module registered under the same attribute name with a new one, so only the last one stays registered. The result is that this code only trains the top layer.
You can check the details of PyTorch's registration mechanism in its source code:
https://github.com/pytorch/pytorch/blob/3805490d6ac7d77f93f91c55cdb37f08cfcfa254/torch/nn/modules/module.py#L542
Or you can try it yourself with a small example:

import torch.nn as nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.l = nn.Linear(5, 3)

t = Test()
print('t:', list(t.parameters()))

class Test2(nn.Module):
    def __init__(self):
        super(Test2, self).__init__()
        self.ls = []
        for i in range(3):
            self.l = nn.Linear(5, 3)  # Wrong
            self.ls.append(self.l)  # Wrong

t2 = Test2()
print('t2:', list(t2.parameters()))

class Test3(nn.Module):
    def __init__(self):
        super(Test3, self).__init__()
        self.ls = []
        for i in range(3):
            l = nn.Linear(5, 3)
            self.ls.append(l)
        self.ls = nn.ModuleList(self.ls)  # This is the correct way!

t3 = Test3()
print('t3:', list(t3.parameters()))

The result is:

t: [Parameter containing:
tensor([[ 0.1651, -0.0475, -0.2202, -0.3338, -0.1139],
        [-0.0085, -0.2673, -0.1991, -0.2291, -0.2957],
        [ 0.1386,  0.4027, -0.2351, -0.3703, -0.2127]], requires_grad=True), Parameter containing:
tensor([-0.3098, -0.0805,  0.2104], requires_grad=True)]
t2: [Parameter containing:
tensor([[ 0.1684, -0.2316,  0.3627,  0.2841,  0.3111],
        [ 0.3504, -0.1804, -0.0317, -0.2029, -0.3229],
        [-0.0555, -0.3587, -0.0170, -0.0934,  0.1872]], requires_grad=True), Parameter containing:
tensor([ 0.0846, -0.3757,  0.3119], requires_grad=True)]
t3: [Parameter containing:
tensor([[ 0.1828, -0.3483,  0.0333,  0.4261, -0.0936],
        [-0.3653,  0.0440, -0.3851,  0.4471, -0.0447],
        [ 0.3575,  0.0553, -0.3604,  0.4240, -0.4345]], requires_grad=True), Parameter containing:
tensor([-0.4442,  0.2580, -0.0387], requires_grad=True), Parameter containing:
tensor([[ 0.4045,  0.2508, -0.0131, -0.0955,  0.2923],
        [ 0.2839, -0.2880, -0.4088, -0.2684,  0.0620],
        [ 0.3492,  0.3595,  0.3115, -0.3213, -0.3045]], requires_grad=True), Parameter containing:
tensor([ 0.0668, -0.2749,  0.1292], requires_grad=True), Parameter containing:
tensor([[ 0.3967,  0.1927, -0.4264, -0.0192,  0.2547],
        [ 0.1547,  0.1158,  0.1841,  0.3160, -0.4269],
        [ 0.1797,  0.0126,  0.3575,  0.0975, -0.1188]], requires_grad=True), Parameter containing:
tensor([-0.0337,  0.4061,  0.3979], requires_grad=True)]

You can clearly see that t2 only registers the last Linear module.

So I DO doubt the results of your paper, even though I haven't run your code. I think you should check your code and produce new results.
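For reference, a minimal runnable sketch of the nn.ModuleList pattern applied to a toy stand-in for the per-pooling conv stacks (my own illustration, not a patch from the authors):

import torch
import torch.nn as nn

class PoolBlocks(nn.Module):
    """Toy stand-in for per-pooling conv layers, registered via nn.ModuleList."""
    def __init__(self, dim, num_pooling=2):
        super().__init__()
        # One 'conv' per pooling step; ModuleList registers all of them.
        self.conv_first_after_pool = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_pooling)])
        self.conv_last_after_pool = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_pooling)])

    def forward(self, x):
        for first, last in zip(self.conv_first_after_pool,
                               self.conv_last_after_pool):
            x = torch.relu(first(x))
            x = last(x)
        return x

m = PoolBlocks(dim=8, num_pooling=2)
# All four linear layers appear in the parameter list seen by an optimizer:
print(len(list(m.parameters())))   # 8 tensors (weight + bias for each of 4 layers)

In the repo itself the analogous change would be wrapping conv_first_after_pool, conv_block_after_pool, and conv_last_after_pool in nn.ModuleList instead of plain Python lists, as the Test3 example above demonstrates.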

What GNN method is used in this implementation?

Hi,

According to the paper, you used GraphSAGE as the GNN module and stacked two layers before each DiffPool layer. However, in this code implementation, are you using GCN instead and stacking 3 layers before DiffPool?

Batch Normalization and the direction of softmax

Hi,

Thank you for sharing your code. It is a very exciting paper! I have a few concerns about some details in your code. Please correct me if I make any mistakes. :-)

  1. The parameters in the batch normalization are not trainable, as discussed in this issue. In that case, I personally think it is better to call it standardization instead of batch normalization. I am also wondering whether it is possible to view it the way we do in CNNs and image processing, where 1D batch normalization is applied to every image in a batch and every location in an image. Here we could regard the different nodes of a graph as the different locations of an image where convolutional kernels are applied.

  2. Which dimension batch normalization should be applied to. Following (1), if we regard the nodes of a graph as equivalent to the different locations of an image, would it be better to apply batch normalization over the last dimension, i.e., the feature channels of the nodes? I personally guess that might make more sense than applying it over the node dimension (see the sketch after this list).

  3. The direction of the softmax when calculating the assignment matrix. I am curious whether it is better to take the softmax over the node dimension or over the new cluster dimension. I agree that, to create an assignment matrix, each original node's contributions to the new clusters should sum to 1, which is achieved by applying the softmax to the last dimension. But I am still wondering whether it would also make sense to apply the softmax over the node dimension; in that case, each column of the assignment matrix would represent a distribution of contributions over the original nodes.
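Regarding point 2, here is a sketch of what feature-channel batch normalization would look like for a (batch, nodes, features) tensor, contrasted with normalizing over the node dimension as the nn.BatchNorm1d(x.size()[1]) call quoted elsewhere in these issues does. This is my own illustration, not code from the repo:

import torch
import torch.nn as nn

batch_size, num_nodes, feat_dim = 20, 100, 30
x = torch.randn(batch_size, num_nodes, feat_dim)

# Normalizing over the feature channels (point 2 above): BatchNorm1d expects
# the channel dimension second, so transpose (B, N, F) -> (B, F, N) and back.
bn_feat = nn.BatchNorm1d(feat_dim)
x_feat_norm = bn_feat(x.transpose(1, 2)).transpose(1, 2)

# What the quoted code does instead: treat the node dimension x.size()[1]
# as the channel dimension.
bn_node = nn.BatchNorm1d(num_nodes)
x_node_norm = bn_node(x)

print(x_feat_norm.shape, x_node_norm.shape)  # both torch.Size([20, 100, 30])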

difference between methods

Hi

Thank you for your implementation. You have three different models in your encoders: soft-assign, base-set2set, and base. What's the difference between them?

Some questions about the SoftPoolingGcnEncoder

Hello,
I edited parser.set_defaults(method='soft-assign'), and then an error occurred in the SoftPoolingGcnEncoder. So I have some questions:
Should .cuda() be added to nn.Linear() on line 90?
Should self.assign_pred be changed to self.assign_pred_modules[i] on line 339?
Is there a difference between assign_input_dim and the dimension of x_a as it is updated by the new x?
Can you let me know whether I have misunderstood something?

Receiving a TypeError when executing example.sh

I was trying to run the file example.sh. The first command runs fine, but it throws an exception on the second one:

$ python -m train --bmname=ENZYMES --assign-ratio=0.1 --hidden-dim=30 --output-dim=30 --cuda=1 --num-classes=6 --method=soft-assign

The output is as follows. Any idea what is going on? Given that I haven't changed anything, maybe my package versions are different?

CUDA 1
Using node labels
Num training graphs:  540 ; Num validation graphs:  60
Number of graphs:  600
Number of edges:  37282
Max, avg, std of graph size:  125 , 32.46 , 14.87
Method: soft-assign
/location/diffpool/encoders.py:71: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/location/diffpool/encoders.py:73: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
/location/diffpool/encoders.py:293: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/location/diffpool/encoders.py:295: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
Epoch:  0
/location/miniconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
/location/diffpool/train.py:209: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  nn.utils.clip_grad_norm(model.parameters(), args.clip)
Traceback (most recent call last):
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/PIL/Image.py", line 2515, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 1800), '|u1')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/location/miniconda/envs/pytorch/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/location/miniconda/envs/pytorch/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/location/diffpool/train.py", line 640, in <module>
    main()
  File "/location/diffpool/train.py", line 628, in main
    benchmark_task_val(prog_args, writer=writer)
  File "/location/diffpool/train.py", line 511, in benchmark_task_val
    writer=writer)
  File "/location/diffpool/train.py", line 218, in train
    log_assignment(model.assign_tensor, writer, epoch, writer_batch_idx)
  File "/location/diffpool/train.py", line 98, in log_assignment
    writer.add_image('assignment', data, epoch)
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/tensorboardX/writer.py", line 427, in add_image
    image(tag, img_tensor, dataformats=dataformats), global_step, walltime)
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/tensorboardX/summary.py", line 216, in image
    image = make_image(tensor, rescale=rescale)
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/tensorboardX/summary.py", line 254, in make_image
    image = Image.fromarray(tensor)
  File "/location/miniconda/envs/pytorch/lib/python3.6/site-packages/PIL/Image.py", line 2517, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type

About the module pipeline

Hi, I have a question about the module pipeline in your paper, which says "apply a DiffPool layer after two GraphSAGE layers", "a total of 2 DiffPool layers are used", and "after each DiffPool layer, 3 layers of graph convolutions are performed". Isn't this contradictory? Are the "graph convolutions" here not GraphSAGE layers?
If so, is the pipeline 2 GraphSAGE + DiffPool1 + 3 graph convolutions + FC1 + 2 GraphSAGE + DiffPool2 + 3 graph convolutions + FC2 + FC3?
Is GraphSAGE used as the embedding GNN and the "graph convolutions" as the pooling GNN? Why use different GNNs?
And why is a prediction layer (FC layer) added after each graph convolution in the code?
Also, in encoders.py, I see classes sometimes calling each other's methods and attributes without raising an error. How does that work?
Could you give me some guidance on these points? They have confused me for several days. I'd appreciate it.

Auxiliary loss

Hello, could you elaborate on the rationale behind adopting the link prediction loss, i.e., encouraging nearby nodes to be pooled together?

It's a bit confusing.

Rolling Up Cross Validation Results

Thanks for providing the repo. I was going through the code and couldn't understand why it averages the results across all the cross-validation folds per epoch and then picks the max.

I would think we would want to pick the max accuracy from each cross-validation run and then average those best values.

Error when plt.savefig

I got a FileNotFoundError when executing plt.savefig(gen_train_plt_name(args), dpi=600) in train.py's train():

  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/pyplot.py", line 842, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/figure.py", line 2311, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 2217, in print_figure
    **kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 1639, in wrapper
    return func(*args, **kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/backends/backend_agg.py", line 512, in print_png
    dpi=self.figure.dpi, metadata=metadata, pil_kwargs=pil_kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/matplotlib/image.py", line 1591, in imsave
    image.save(fname, **pil_kwargs)
  File "/home/LAB/penghao/.conda/envs/torch-sparse/lib/python3.6/site-packages/PIL/Image.py", line 2155, in save
    fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'results/ENZYMES_soft-assign_l3x1_ar10_h30_o30.png'
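A likely cause (my guess, not confirmed by the maintainers) is simply that the results/ directory referenced in the path does not exist yet; creating it before saving avoids the error:

import os

# Hypothetical placement: before the plt.savefig(...) call in train.py's train().
os.makedirs('results', exist_ok=True)

Equivalently, running mkdir results in the repo root before training should work.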

Cannot benchmark on common graph classification dataset

Hi,

I'm benchmarking your code (without modification) on common datasets, also processed by Kristian, Nils, et al., but I couldn't get it to run on 3 common datasets: MUTAG, PTC, and REDDIT-BINARY.

All fail with the same error, shown below:

$ python -m train --datadir=data --bmname=MUTAG --cuda=0 --max-nodes=28 --num-classes=2
Remove existing log dir:  log/MUTAG_base_l3_h20_o20
CUDA 0
No node attributes
Using node labels
Num training graphs:  170 ; Num validation graphs:  18
Number of graphs:  188
Number of edges:  3721
Max, avg, std of graph size:  28 , 17.93 , 4.58
Method: base
/home/survivio/ruochun/diffpool/encoders.py:71: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  m.weight.data = init.xavier_uniform(m.weight.data, gain=nn.init.calculate_gain('relu'))
/home/survivio/ruochun/diffpool/encoders.py:73: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  m.bias.data = init.constant(m.bias.data, 0.0)
Epoch:  0
/home/survivio/anaconda3/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=111 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "/home/survivio/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/survivio/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/survivio/ruochun/diffpool/train.py", line 640, in <module>
    main()
  File "/home/survivio/ruochun/diffpool/train.py", line 628, in main
    benchmark_task_val(prog_args, writer=writer)
  File "/home/survivio/ruochun/diffpool/train.py", line 511, in benchmark_task_val
    writer=writer)
  File "/home/survivio/ruochun/diffpool/train.py", line 205, in train
    loss = model.loss(ypred, label)
  File "/home/survivio/ruochun/diffpool/encoders.py", line 193, in loss
    return F.cross_entropy(pred, label, size_average=True)
  File "/home/survivio/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/survivio/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:111

Could you help investigate the root cause?

Thanks,
Ruochun
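For readers hitting this: the `Assertion t >= 0 && t < n_classes failed` message from ClassNLLCriterion usually means the class labels passed to the loss fall outside [0, num_classes). A quick hedged check (a hypothetical snippet, not part of the repo; some benchmark datasets store graph labels starting from 1 or as {-1, 1}, so they may need remapping to 0-based indices):

import numpy as np

labels = np.array([1, 2, 1, 2, 1])        # example: graph labels as stored on disk
print(np.unique(labels))                  # [1 2] -> out of range for num_classes=2

# Remap arbitrary label values to contiguous 0-based class indices.
remap = {lab: i for i, lab in enumerate(sorted(np.unique(labels)))}
labels = np.array([remap[lab] for lab in labels])
print(np.unique(labels))                  # [0 1] -> valid targets for cross-entropy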

Dataset

Hello. Is the REDDIT-MULTI-5K dataset part of the REDDIT-MULTI-12K dataset?

Typo in example.sh

Hi,

I'm trying your code by running example.sh and found that the argument --label-classes should be --num-classes, according to the arg_parse function in train.py.
You may want to fix the typo.

Best,
Ruochun

How to set features for REDDIT-12k and COLLAB?

I found on this website that REDDIT-12K and COLLAB have neither node labels nor node attributes. How do you set the node features for training?

Also, in your paper you mention Appendix A, but I couldn't find it anywhere. Where can I find it?

Thanks!
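For datasets with neither node labels nor attributes, a common choice (and what the features == 'deg' / max_deg options discussed elsewhere in these issues appear to support, though that is my assumption) is to use the node degree as the feature, e.g. one-hot encoded up to a cap:

import numpy as np
import networkx as nx

def degree_features(G, max_deg=10):
    """One-hot encode each node's degree, capping degrees at max_deg."""
    feats = np.zeros((G.number_of_nodes(), max_deg + 1), dtype=np.float32)
    for i, (_, deg) in enumerate(G.degree()):
        feats[i, min(deg, max_deg)] = 1.0
    return feats

G = nx.erdos_renyi_graph(20, 0.3)   # toy graph standing in for a REDDIT graph
X = degree_features(G)              # shape (20, 11)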

Exact architecture from the paper?

I am confused about the exact layers in the architecture from the paper. It states that:
"
We use the “mean” variant of GRAPHSAGE [16] and apply a DIFFPOOL layer after every two GRAPHSAGE layers in our architecture. A total of 2 DIFFPOOL layers are used for the datasets. For small datasets such as ENZYMES and COLLAB, 1 DIFFPOOL layer can achieve similar performance. After each DIFFPOOL layer, 3 layers of graph convolutions are performed, before the next DIFFPOOL layer, or the readout layer.
"

So are 2 or 3 GraphSAGE layers used? Which of the options below (A, B, or C) would be correct? Or, if none of them is, what would be the exact architecture from the paper?

Option A:

1. GraphSAGE
2. GraphSAGE
3. DIFFPOOL
4. GraphSAGE
5. GraphSAGE
6. DIFFPOOL
7. GraphSAGE
8. GraphSAGE
9. GraphSAGE
10. READOUT

Option B:

1. GraphSAGE
2. GraphSAGE
3. GraphSAGE
4. DIFFPOOL
5. GraphSAGE
6. GraphSAGE
7. GraphSAGE
8. DIFFPOOL
9. GraphSAGE
10. GraphSAGE
11. GraphSAGE
12. READOUT

Option C:

1. GraphSAGE
2. GraphSAGE
3. DIFFPOOL
4. GraphSAGE
5. GraphSAGE
6. DIFFPOOL
7. READOUT

batch normalization

I am a little confused by the batch normalization implementation here: every time batch normalization is applied, self.bn(x) is called, and inside it a new BN layer is created, i.e., bn_module = nn.BatchNorm1d(x.size()[1]).cuda(). Will this fail to train the BN parameters, since the layer is created from scratch on each call?
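For reference, the usual pattern is to create the BatchNorm layer once in __init__ so its affine parameters and running statistics are registered and updated across training steps; a minimal sketch (a generic illustration, not a patch of this repo's encoder):

import torch
import torch.nn as nn

class GcnLayerWithBn(nn.Module):
    """A toy graph-conv layer with a properly registered BatchNorm."""
    def __init__(self, in_dim, out_dim, num_nodes):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        # Created once here, so weight/bias/running stats persist across steps.
        self.bn = nn.BatchNorm1d(num_nodes)

    def forward(self, x, adj):
        x = torch.relu(self.lin(adj @ x))   # (batch, num_nodes, out_dim)
        return self.bn(x)                   # BN over the node dimension

layer = GcnLayerWithBn(16, 32, num_nodes=100)
x = torch.randn(8, 100, 16)
adj = torch.rand(8, 100, 100)
out = layer(x, adj)                         # shape (8, 100, 32)

A BatchNorm created inside the forward pass, by contrast, re-initializes its affine parameters and running statistics on every call, so it effectively only standardizes each batch, which matches the observation in the "Batch Normalization and the direction of softmax" issue above.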

Parameters for REDDIT-MULTI-12K dataset

Can I ask what the parameters for running REDDIT-MULTI-12K are?
I ran into a memory error when trying:
python -m train --bmname=REDDIT-MULTI-12K --assign-ratio=0.1 --hidden-dim=5 --output-dim=11 --cuda=0 --batch-size=10 --num-gc-layers=1 --num-classes=11 --method=soft-assign

thanks!

edge attributes

This might be a stupid question, but do you use edge attributes anywhere in the code? I can't seem to figure out where and how.

Max-degree=10?

Why is self.max_deg set to 10 when calculating node features with features == 'deg' or 'struct'?

Reproducing results

It would be good to list the exact commands needed to reproduce the results reported in the paper, similar to how they're listed in the example.sh script.

Auxiliary loss implementations

Hi Rex,
I have a couple of questions regarding the implementation of the auxiliary losses.

  1. In the paper it says that 'at each layer l, we minimize L_LP = ||A^(l) - S^(l) S^(l)^T||_F, where ||·||_F denotes the Frobenius norm.'
However, in the code, what I find is:

self.link_loss = -adj * torch.log(pred_adj+eps) - (1-adj) * torch.log(1-pred_adj+eps)
which is the binary cross-entropy on pred_adj.
Could you please explain why/how this is equivalent to the mathematical formulation? Also, I believe the pred_adj used here is built from the final assignment tensor, isn't it? (See the sketch at the end of this issue.)

  2. In theory you are also regularizing the entropy of the cluster assignments by minimizing
    L_E = (1/n) * sum_i H(S_i),
    but I can't see this anywhere in the code. Could you point me to it, please?

  3. A third comment, not related to the losses: in the experiments section of the paper you say that you use GraphSAGE as the base for the model, but as far as I can see, the code uses a graph convolution (GConv). Could you also enlighten me a little on this, please?

Thanks!
Guadalupe
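For reference, a direct translation of the two auxiliary terms as stated in the paper, the Frobenius-norm link loss ||A - S S^T||_F and the average per-node assignment entropy (1/n) sum_i H(S_i), might look like the sketch below. This is my own illustration of the formulas, not the repo's code, which uses the binary cross-entropy form quoted above for the link term:

import torch

def diffpool_aux_losses(adj, S, eps=1e-8):
    """adj: (n, n) adjacency matrix; S: (n, C) row-stochastic assignment matrix."""
    # Link prediction loss: Frobenius norm of A - S S^T.
    link_loss = torch.norm(adj - S @ S.t(), p='fro')
    # Entropy loss: mean entropy of each node's cluster assignment distribution.
    entropy_per_node = -(S * torch.log(S + eps)).sum(dim=1)
    ent_loss = entropy_per_node.mean()
    return link_loss, ent_loss

n, C = 30, 10
S = torch.softmax(torch.randn(n, C), dim=-1)
adj = (torch.rand(n, n) > 0.8).float()
adj = ((adj + adj.t()) > 0).float()
link_loss, ent_loss = diffpool_aux_losses(adj, S)

The BCE form quoted above pursues the same goal of making S S^T match A; whether the entropy term is actually applied in this repo is exactly what the question asks, so the sketch only reflects the formulas from the paper.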

code for auxiliary link prediction objective

Hi, thanks for your code!
In your code, you use the auxiliary link prediction objective to reconstruct the adjacency matrix, and the relevant code is in encoders.py.
In encoders.py line 383, you compare every element of pred_adj with 1 to make sure they are at most 1, so that torch.log on line 387 can be computed safely.
However, torch.Tensor(1) generates an uninitialized size-1 tensor, not the value 1; maybe try torch.Tensor([1]).
That's a tiny mistake in my opinion. Or maybe I'm wrong!
Looking forward to your reply!

Appendix pages

Hello, Rex Ying!

I'm Jaehoon, and I'm very thankful for your paper introducing the DiffPool method.
I would just like to ask about the 'Appendix' pages.
I already checked issue #7, but I still can't find them.
So, how can I get the appendix pages now?

Thank you.
Best Regards
