
pyg_autoscale's People

Contributors

rusty1s, sherylhyx


pyg_autoscale's Issues

A Question about the Code

Hello. I really appreciate your work; it is excellent and has influenced me a lot. Could you explain something in pyg_autoscale/torch_geometric_autoscale/history.py? In the push() method of the History class, in the else branch, why is it self.emb[dst_o:dst_o + c] = x[src_o:src_o + c]? Where is the index vector n_id? I thought self.emb (the "history" mentioned in your paper) stores node v's most recent embedding in its v-th row, i.e. self.emb[v]. In the last elif of push(), self.emb is assigned values from x (the batch feature vectors) only at the positions given by n_id, so why does the else branch not involve n_id at all? In other words, should it be self.emb[n_id[dst_o:dst_o + c]] = x[src_o:src_o + c], or are offset and count not exactly what is described in pyg_autoscale/torch_geometric_autoscale/models/base.py, in ScalableGNN.__call__()? I also could not find the source code of torch.ops.torch_sparse.partition (trust me, I have tried a lot).
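A toy sketch of one possible reading (an assumption, not a confirmed answer): after metis() and permute(), the nodes of every partition occupy a contiguous block of the history, so a per-partition (offset, count) pair already identifies the destination rows and no n_id is needed in that branch.

import torch

# Toy illustration only, not the library's code: contiguous block copies
# driven by per-partition offsets and counts.
num_nodes, dim = 10, 4
emb = torch.zeros(num_nodes, dim)        # the "history" storage
x = torch.randn(6, dim)                  # embeddings computed for one mini-batch
offset = torch.tensor([0, 7])            # first global row of each in-batch partition
count = torch.tensor([4, 2])             # number of nodes in each partition

src_o = 0
for dst_o, c in zip(offset.tolist(), count.tolist()):
    emb[dst_o:dst_o + c] = x[src_o:src_o + c]   # contiguous block copy, no n_id required
    src_o += c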

Installation Failure

My environment: CentOS 7, GCC 5.4, NVIDIA driver 470.82.01, CUDA 11.4, Anaconda, Python 3.9.13, PyTorch 1.10.1, PyG 2.0.4.

I install from source with pip install -e ., which fails.

[screenshot of the build error]

Could you help me install this package? Thanks!
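A quick sanity check before retrying the build (not a fix, just a diagnostic): the C++/CUDA extensions are compiled against the locally installed toolkit, so the CUDA version PyTorch was built with should match the system CUDA (11.4 here) when running pip install -e .

import torch

print(torch.__version__)         # e.g. 1.10.1
print(torch.version.cuda)        # CUDA version this torch build was compiled with
print(torch.cuda.is_available())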

Failed to import package torch_geometric_autoscale

My environment is:
torch 1.12.0
torch-cluster 1.6.0
torch-geometric 2.0.4
torch-geometric-autoscale 0.0.0
torch-scatter 2.0.9
torch-sparse 0.6.15
torch-spline-conv 1.2.1

The import command hangs until I press Ctrl+C, which then raises the following exception:

from torch_geometric_autoscale import History, AsyncIOPool
^CTraceback (most recent call last):
File "", line 1, in
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/torch_geometric_autoscale/init.py", line 12, in
from .data import get_data # noqa
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/torch_geometric_autoscale/data.py", line 9, in
from ogb.nodeproppred import PygNodePropPredDataset
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/ogb/nodeproppred/init.py", line 1, in
from .evaluate import Evaluator
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/ogb/nodeproppred/evaluate.py", line 1, in
from sklearn.metrics import roc_auc_score
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/sklearn/init.py", line 82, in
from .base import clone
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/sklearn/base.py", line 17, in
from .utils import _IS_32BIT
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/sklearn/utils/init.py", line 26, in
from . import _joblib
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/sklearn/utils/_joblib.py", line 7, in
import joblib
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/joblib/init.py", line 113, in
from .memory import Memory, MemorizedResult, register_store_backend
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/joblib/memory.py", line 32, in
from ._store_backends import StoreBackendBase, FileSystemStoreBackend
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/joblib/_store_backends.py", line 15, in
from .backports import concurrency_safe_rename
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/joblib/backports.py", line 7, in
from distutils.version import LooseVersion
File "", line 1007, in _find_and_load
File "", line 982, in _find_and_load_unlocked
File "", line 925, in _find_spec
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/_distutils_hack/init.py", line 97, in find_spec
return method()
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/_distutils_hack/init.py", line 108, in spec_for_distutils
mod = importlib.import_module('setuptools._distutils')
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/setuptools/init.py", line 16, in
import setuptools.version
File "/home/ma-user/anaconda3/envs/py3.9/lib/python3.9/site-packages/setuptools/version.py", line 1, in
import pkg_resources
File "", line 211, in _lock_unlock_module
File "", line 107, in acquire
KeyboardInterrupt

Support the multi-gpu option

Hi, thanks for the effort and commitment to scalable GNNs; the memory issue really bugs me sometimes.
The autoscale method seems like a cool approach to handle this issue, and if it supported multi-GPU training, the training speed would be faster than ever!
It seems hard to support the multi-GPU option, but I am asking just in case :)

Which code can reproduce the Figure 4 runtime overhead in the paper?

I wonder how we can reproduce Figure 4 and Tables 4 and 5 in the paper: runtime overhead in relation to the inter-/intra-connectivity ratio of mini-batches, both for serial and concurrent history access patterns.
I am sorry, but I cannot find it in the repo.
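The exact benchmark script does not appear to be in the repository; a minimal wall-clock harness like the one below (an assumption, not the paper's code) can be used to compare serial and concurrent history access by timing one epoch with and without an AsyncIOPool / pool_size configured.

import time
import torch

def time_fn(fn, iters=10):
    # Average wall-clock time of `fn` over `iters` runs, synchronizing the GPU
    # so asynchronous transfers are included in the measurement.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters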

question about emb_device and device for History class

Hello Matthias,
Thank you very much for the code. Nice work as always.
I was wondering what is the difference between emb_device and device for the History class?

When I initialize a GCN like this

model = GCN(10, 10, 10, 10, 5, device='cpu')
print(model)

I get

GCN(
  (histories): ModuleList(
    (0): History(10, 10, emb_device=cpu, device=cpu)
    (1): History(10, 10, emb_device=cpu, device=cpu)
    (2): History(10, 10, emb_device=cpu, device=cpu)
    (3): History(10, 10, emb_device=cpu, device=cpu)
  )
  (lins): ModuleList()
  (convs): ModuleList(
    (0): GCNConv(10, 10)
    (1): GCNConv(10, 10)
    (2): GCNConv(10, 10)
    (3): GCNConv(10, 10)
    (4): GCNConv(10, 10)
  )
  (bns): ModuleList(
    (0): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)

However, I noticed that a process appears on cuda:0 (I have multiple GPUs), and I don't understand why. Is this the intended behavior?
Also, in general, should I always set the device argument of the GCN class to None? I noticed this is what you do in large_benchmark/main.py.
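A conceptual sketch only (an assumption about the two arguments, not the History implementation): emb_device is where the history matrix itself is stored (typically CPU, pinned so transfers can run asynchronously), while device is the compute device that mini-batch rows are pulled to and pushed from. One possible explanation for the process on cuda:0, also an assumption, is that allocating pinned memory initializes a CUDA context even for a CPU-resident model.

import torch

# Histories on CPU ("emb_device=cpu"), computation on the training device.
num_nodes, dim = 1000, 10
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

emb = torch.empty(num_nodes, dim, pin_memory=torch.cuda.is_available())
n_id = torch.arange(64)                              # nodes of one mini-batch
batch_emb = emb[n_id].to(device, non_blocking=True)  # pull rows to the compute device
emb[n_id] = batch_emb.to('cpu')                      # push updated rows back to the history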

SubgraphLoader for heterogeneous graph

Hi,
I am trying to apply pyg_autoscale to a heterogeneous graph and have to modify the compute_subgraph method of the SubgraphLoader class. Could you elaborate on what offset and count are, and what relabel_fn does?
My current understanding is that compute_subgraph basically takes the sub-graph spanned by n_id. Is this understanding accurate?
Many thanks!

    def compute_subgraph(self, batches: List[Tuple[int, Tensor]]) -> SubData:
        batch_ids, n_ids = zip(*batches)
        n_id = torch.cat(n_ids, dim=0)
        batch_id = torch.tensor(batch_ids)

        # We collect the in-mini-batch size (`batch_size`), the offset of each
        # partition in the mini-batch (`offset`), and the number of nodes in
        # each partition (`count`)
        batch_size = n_id.numel()
        offset = self.ptr[batch_id]
        count = self.ptr[batch_id.add_(1)].sub_(offset)

        rowptr, col, value = self.data.adj_t.csr()
        rowptr, col, value, n_id = relabel_fn(rowptr, col, value, n_id,
                                              self.bipartite)

        adj_t = SparseTensor(rowptr=rowptr, col=col, value=value,
                             sparse_sizes=(rowptr.numel() - 1, n_id.numel()),
                             is_sorted=True)

        data = self.data.__class__(adj_t=adj_t)
        for k, v in self.data:
            if isinstance(v, Tensor) and v.size(0) == self.data.num_nodes:
                data[k] = v.index_select(0, n_id)

        return SubData(data, batch_size, n_id, offset, count)
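A toy illustration of the offset/count computation above, based only on my reading of this snippet (so an assumption rather than an authoritative explanation): self.ptr holds the cumulative partition boundaries produced by metis() and permute(), and relabel_fn restricts adj_t to the sub-graph spanned by n_id while mapping the surviving neighbor ids to local column indices.

import torch

ptr = torch.tensor([0, 4, 9, 15])     # 3 partitions with 4, 5 and 6 nodes
batch_id = torch.tensor([0, 2])       # partitions selected for this mini-batch
offset = ptr[batch_id]                # tensor([0, 9]) -> first global node id of each partition
count = ptr[batch_id + 1] - offset    # tensor([4, 6]) -> number of nodes per partition
print(offset, count)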

AttributeError: 'NoneType' object has no attribute 'origin'

Hi @rusty1s, thank you for your great contribution to large-scale graph learning. I am facing a problem with the configuration of GAS: the error occurs while importing the GAS library. I have tried many things to fix it, switching the versions of torch, CUDA, and torch-sparse, but none of them make a difference. What can I do about it?
[screenshot of the error]

Attribute Error while importing torch_geometric_autoscale

When I try to run the benchmark examples, I get this error:

Traceback (most recent call last):
  File "repos/pyg_autoscale/small_benchmark/main.py", line 8, in <module>
    from torch_geometric_autoscale import (get_data, metis, permute, models,
  File "repos/pyg_autoscale/torch_geometric_autoscale/__init__.py", line 10, in <module>
    library, [osp.dirname(__file__)]).origin)
AttributeError: 'NoneType' object has no attribute 'origin'

I have these versions:

  • python 3.9.16
  • torch 2.1.0
  • torch_geometric 2.4.0
  • torch-geometric-autoscale 0.0.0
  • torch-scatter 2.0.9
  • torch-sparse 0.6.18

No cuda. MacBook Pro macOS 14.0.

Edit: Following pyg-team/pytorch_geometric#2304, I checked my env folder and I can see the *.so files:

  • _async.so
  • _relabel.so
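A hedged diagnostic that reproduces the lookup returning None in __init__.py: if find_spec() cannot see the shared libraries even though they exist, their filename suffix (for example built for a different CPython version, or for x86_64 instead of arm64) usually does not match the running interpreter. The package path below is taken from the traceback above.

import importlib.machinery as machinery

pkg_dir = 'repos/pyg_autoscale/torch_geometric_autoscale'
print(machinery.EXTENSION_SUFFIXES)          # suffixes this interpreter accepts
for library in ['_relabel', '_async']:
    spec = machinery.PathFinder().find_spec(library, [pkg_dir])
    print(library, spec.origin if spec is not None else None)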

Issue with SubgraphLoader() and/or Torch

Hi, I get this error just by running the first example, train_gcn.py:

Traceback (most recent call last):
  File "/home/ersa/spage2vec/pyg_autoscale/examples/train_gcn.py", line 85, in <module>
    train(model, loader, optimizer)
  File "/home/ersa/spage2vec/pyg_autoscale/examples/train_gcn.py", line 57, in train
    for batch, *args in loader:
  File "/home/ersa/miniconda3/envs/spage2vec/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/ersa/miniconda3/envs/spage2vec/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ersa/miniconda3/envs/spage2vec/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/ersa/miniconda3/envs/spage2vec/lib/python3.10/site-packages/torch_geometric_autoscale/loader.py", line 74, in compute_subgraph
    rowptr, col, value, n_id = relabel_fn(rowptr, col, value, n_id,
  File "/home/ersa/miniconda3/envs/spage2vec/lib/python3.10/site-packages/torch/_ops.py", line 502, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: isBool() INTERNAL ASSERT FAILED at "/home/ersa/miniconda3/envs/spage2vec/lib/python3.10/site-packages/torch/include/ATen/core/ivalue.h":666, please report a bug to PyTorch.

My conda environment:
python version==3.10.13.final.0

torch 1.13.1+cu117 pypi_0 pypi
torch-cluster 1.6.3+pt20cu117 pypi_0 pypi
torch-geometric 2.4.0 pypi_0 pypi
torch-geometric-autoscale 0.0.0 pypi_0 pypi
torch-scatter 2.1.2+pt20cu117 pypi_0 pypi
torch-sparse 0.6.18+pt20cu117 pypi_0 pypi
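A hedged observation: the wheels listed above are tagged "+pt20cu117" (built for PyTorch 2.0) while torch itself is 1.13.1; such mismatches are a common cause of INTERNAL ASSERT failures inside custom ops. The versions printed below should all refer to the same PyTorch release.

import torch
import torch_scatter
import torch_sparse

print('torch        ', torch.__version__)
print('torch_scatter', torch_scatter.__version__)
print('torch_sparse ', torch_sparse.__version__)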

synchronize in pool.py

Hello, GNNAutoScale is an interesting and solid piece of work; thanks for your hard work.
However, when running synchronize = torch.ops.torch_geometric_autoscale.synchronize, it reports an error.
How can I deal with this problem?
[screenshot of the error]

RuntimeError: Not compiled with CUDA support

Hello, thank you very much for your contribution to large-scale graph training. I am trying to run it on my machine, but I hit a problem; I hope you can help solve it, thank you!
[screenshot of the error]

Error raised when reproducing large_benchmark

We ran main.py in the large_benchmark directory, but we hit the following issue; please help us. Our error message and environment are below.

Error message

Loading data... Processing...
Error executing job with overrides: ['model=pna', 'dataset=flickr', 'root=/tmp/datasets', 'device=0', 'log_every=1']
Traceback (most recent call last):
  File "main.py", line 78, in main
    data, in_channels, out_channels = get_data(conf.root, conf.dataset.name)
  File "/root/anaconda3/lib/python3.8/site-packages/torch_geometric_autoscale/data.py", line 126, in get_data
    return get_flickr(root)
  File "/root/anaconda3/lib/python3.8/site-packages/torch_geometric_autoscale/data.py", line 81, in get_flickr
    dataset = Flickr(f'{root}/Flickr', pre_transform=T.ToSparseTensor())
  File "/root/anaconda3/lib/python3.8/site-packages/torch_geometric/datasets/flickr.py", line 37, in __init__
    super().__init__(root, transform, pre_transform)
  File "/root/anaconda3/lib/python3.8/site-packages/torch_geometric/data/in_memory_dataset.py", line 60, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/root/anaconda3/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 86, in __init__
    self._process()
  File "/root/anaconda3/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 165, in _process
    self.process()
  File "/root/anaconda3/lib/python3.8/site-packages/torch_geometric/datasets/flickr.py", line 69, in process
    x = np.load(osp.join(self.raw_dir, 'feats.npy'))
  File "/root/anaconda3/lib/python3.8/site-packages/numpy/lib/npyio.py", line 444, in load
    raise ValueError("Cannot load file containing pickled data "
ValueError: Cannot load file containing pickled data when allow_pickle=False

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
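A hedged diagnostic (not a guaranteed fix): this error is usually raised either because the raw Flickr download is truncated or because the file genuinely contains pickled data. Inspecting the file directly narrows it down; if it is corrupt, deleting the raw folder under the dataset root and re-running the script to re-download it is the simplest remedy. The path below is derived from the root=/tmp/datasets override above.

import numpy as np

path = '/tmp/datasets/Flickr/raw/feats.npy'
arr = np.load(path, allow_pickle=True)
print(type(arr), getattr(arr, 'shape', None), getattr(arr, 'dtype', None))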

Environment

Package                            Version
---------------------------------- -------------------
alabaster                          0.7.12
anaconda-client                    1.7.2
anaconda-navigator                 1.10.0
anaconda-project                   0.8.3
antlr4-python3-runtime             4.8
argh                               0.26.2
argon2-cffi                        20.1.0
asn1crypto                         1.4.0
astroid                            2.4.2
astropy                            4.0.2
async-generator                    1.10
atomicwrites                       1.4.0
attrs                              20.3.0
autopep8                           1.5.4
Babel                              2.8.1
backcall                           0.2.0
backports.functools-lru-cache      1.6.1
backports.shutil-get-terminal-size 1.0.0
backports.tempfile                 1.0
backports.weakref                  1.0.post1
beautifulsoup4                     4.9.3
bitarray                           1.6.1
bkcharts                           0.2
bleach                             3.2.1
bokeh                              2.2.3
boto                               2.49.0
Bottleneck                         1.3.2
brotlipy                           0.7.0
certifi                            2020.6.20
cffi                               1.14.3
chardet                            3.0.4
click                              7.1.2
cloudpickle                        1.6.0
clyent                             1.2.2
colorama                           0.4.4
conda                              4.11.0
conda-build                        3.20.5
conda-package-handling             1.7.2
conda-verify                       3.4.2
contextlib2                        0.6.0.post1
cryptography                       3.1.1
cycler                             0.10.0
Cython                             0.29.21
cytoolz                            0.11.0
dask                               2.30.0
decorator                          4.4.2
defusedxml                         0.6.0
dgl-cu110                          0.6.1
diff-match-patch                   20200713
distributed                        2.30.1
docutils                           0.16
entrypoints                        0.3
et-xmlfile                         1.0.1
fastcache                          1.1.0
filelock                           3.0.12
flake8                             3.8.4
Flask                              1.1.2
fsspec                             0.8.3
future                             0.18.2
gevent                             20.9.0
glob2                              0.7
gmpy2                              2.0.8
googledrivedownloader              0.4
greenlet                           0.4.17
h5py                               2.10.0
HeapDict                           1.0.1
html5lib                           1.1
hydra-core                         1.1.1
idna                               2.10
imageio                            2.9.0
imagesize                          1.2.0
importlib-metadata                 2.0.0
importlib-resources                5.4.0
iniconfig                          1.1.1
intervaltree                       3.1.0
ipykernel                          5.3.4
ipython                            7.19.0
ipython-genutils                   0.2.0
ipywidgets                         7.5.1
isodate                            0.6.0
isort                              5.6.4
itsdangerous                       1.1.0
jdcal                              1.4.1
jedi                               0.17.1
jeepney                            0.5.0
Jinja2                             2.11.2
joblib                             0.17.0
json5                              0.9.5
jsonschema                         3.2.0
jupyter                            1.0.0
jupyter-client                     6.1.7
jupyter-console                    6.2.0
jupyter-core                       4.6.3
jupyterlab                         2.2.6
jupyterlab-pygments                0.1.2
jupyterlab-server                  1.2.0
keyring                            21.4.0
kiwisolver                         1.3.0
lazy-object-proxy                  1.4.3
libarchive-c                       2.9
littleutils                        0.2.2
llvmlite                           0.34.0
locket                             0.2.0
lxml                               4.6.1
MarkupSafe                         1.1.1
matplotlib                         3.3.2
mccabe                             0.6.1
mistune                            0.8.4
mkl-fft                            1.2.0
mkl-random                         1.1.1
mkl-service                        2.3.0
mock                               4.0.2
more-itertools                     8.6.0
mpmath                             1.1.0
msgpack                            1.0.0
multipledispatch                   0.6.0
navigator-updater                  0.2.1
nbclient                           0.5.1
nbconvert                          6.0.7
nbformat                           5.0.8
nest-asyncio                       1.4.2
networkx                           2.5
nltk                               3.5
nose                               1.3.7
notebook                           6.1.4
numba                              0.51.2
numexpr                            2.7.1
numpy                              1.19.2
numpydoc                           1.1.0
ogb                                1.3.1
olefile                            0.46
omegaconf                          2.1.1
openpyxl                           3.0.5
outdated                           0.2.1
packaging                          20.4
pandas                             1.1.3
pandocfilters                      1.4.3
parso                              0.7.0
partd                              1.1.0
path                               15.0.0
pathlib2                           2.3.5
pathtools                          0.1.2
patsy                              0.5.1
pep8                               1.7.1
pexpect                            4.8.0
pickleshare                        0.7.5
Pillow                             8.0.1
pip                                20.2.4
pkginfo                            1.6.1
pluggy                             0.13.1
ply                                3.11
prometheus-client                  0.8.0
prompt-toolkit                     3.0.8
psutil                             5.7.2
ptyprocess                         0.6.0
py                                 1.9.0
pycodestyle                        2.6.0
pycosat                            0.6.3
pycparser                          2.20
pycurl                             7.43.0.6
pydocstyle                         5.1.1
pyflakes                           2.2.0
Pygments                           2.7.2
pylint                             2.6.0
pyodbc                             4.0.0-unsupported
pyOpenSSL                          19.1.0
pyparsing                          2.4.7
pyrsistent                         0.17.3
PySocks                            1.7.1
pytest                             0.0.0
python-dateutil                    2.8.1
python-jsonrpc-server              0.4.0
python-language-server             0.35.1
python-louvain                     0.15
pytz                               2020.1
PyWavelets                         1.1.1
pyxdg                              0.27
PyYAML                             5.3.1
pyzmq                              19.0.2
QDarkStyle                         2.8.1
QtAwesome                          1.0.1
qtconsole                          4.7.7
QtPy                               1.9.0
rdflib                             5.0.0
regex                              2020.10.15
requests                           2.24.0
rope                               0.18.0
Rtree                              0.9.4
ruamel-yaml                        0.15.87
scikit-image                       0.17.2
scikit-learn                       0.23.2
scipy                              1.5.2
seaborn                            0.11.0
SecretStorage                      3.1.2
Send2Trash                         1.5.0
setuptools                         50.3.1.post20201107
simplegeneric                      0.8.1
singledispatch                     3.4.0.3
sip                                4.19.13
six                                1.15.0
snowballstemmer                    2.0.0
sortedcollections                  1.2.1
sortedcontainers                   2.2.2
soupsieve                          2.0.1
Sphinx                             3.2.1
sphinxcontrib-applehelp            1.0.2
sphinxcontrib-devhelp              1.0.2
sphinxcontrib-htmlhelp             1.0.3
sphinxcontrib-jsmath               1.0.1
sphinxcontrib-qthelp               1.0.3
sphinxcontrib-serializinghtml      1.1.4
sphinxcontrib-websupport           1.2.4
spyder                             4.1.5
spyder-kernels                     1.9.4
SQLAlchemy                         1.3.20
statsmodels                        0.12.0
sympy                              1.6.2
tables                             3.6.1
tblib                              1.7.0
terminado                          0.9.1
testpath                           0.4.4
threadpoolctl                      2.1.0
tifffile                           2020.10.1
toml                               0.10.1
toolz                              0.11.1
torch                              1.8.0+cu111
torch-cluster                      1.5.9
torch-geometric                    1.7.2
torch-geometric-autoscale          0.0.0
torch-scatter                      2.0.7
torch-sparse                       0.6.10
torch-spline-conv                  1.2.1
torchaudio                         0.8.0
torchvision                        0.9.0+cu111
tornado                            6.0.4
tqdm                               4.50.2
traitlets                          5.0.5
typing-extensions                  3.7.4.3
ujson                              4.0.1
unicodecsv                         0.14.1
urllib3                            1.25.11
watchdog                           0.10.3
wcwidth                            0.2.5
webencodings                       0.5.1
Werkzeug                           1.0.1
wheel                              0.35.1
widgetsnbextension                 3.5.1
wrapt                              1.11.2
wurlitzer                          2.0.1
xlrd                               1.2.0
XlsxWriter                         1.3.7
xlwt                               1.3.0
xmltodict                          0.12.0
yapf                               0.30.0
zict                               2.0.0
zipp                               3.4.0
zope.event                         4.5.0
zope.interface                     5.1.2

The code cannot exit properly

Hi, thank you very much for your extraordinary work; it helps me a lot.
I have a problem when I run the following command in the examples directory:
python train_gcn.py --root=/my_data --device=0
No error is raised, but the script just cannot exit after training has finished, and I have to quit manually with Ctrl+C as shown below. I have tried adding "sys.exit(0)" to train_gcn.py, but it doesn't help.

Traceback (most recent call last):
  File "/home/qchen/anaconda3/envs/py38/lib/python3.8/threading.py", line 1388, in _shutdown
    lock.acquire()
KeyboardInterrupt:

My dependencies are Python 3.8 + PyTorch 1.10.1 + PyG 2.0.3. Do you have any idea what causes this? Thank you again for your contribution!
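A hedged workaround rather than a real fix: the traceback shows the interpreter blocking in threading._shutdown, i.e. waiting for a non-daemon background thread at exit. sys.exit(0) only raises SystemExit in the main thread, which is why adding it did not help; os._exit() terminates the process immediately.

import os
import sys

# Place at the very end of train_gcn.py, after all results have been printed.
sys.stdout.flush()
os._exit(0)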

pickling error

Hi

thank you a lot for this brilliant work; it works like a charm and is very impressive (a perfect addition to the already fantastic torch_geometric package).

I have an issue that seems far beyond my Python competence. When I try to use several workers with SubgraphLoader, I get a very interesting error:

_pickle.PicklingError: Can't pickle <class 'torch_geometric.data.batch.Batch'>: it's not the same object as torch_geometric.data.batch.Batch

(made a few google searches, but it did not help)

My setup is a bit more involved than your examples; I will try to describe it and hopefully give useful information without too much noise.

I have an outer torch_geometric.loader.DataLoader that reads samples from disk. These samples are of the form MyGraph(torch_geometric.data.Data) and redefine __cat_dim__ and the constructor.

(I did not redefine the collation functions, so I get a Batch out of this.)
This outer dataloader has a fixed batch size of 1, as I am scaling up to larger graphs.

These samples are passed (one by one, as my batch size is 1) to code that splits them like in your examples:

perm, ptr = metis(d.adj_t, num_parts=num_parts, log=self.debug)                                 
data = permute(d, perm, log=self.debug)                        

and then it works like a charm if I do:

loader = SubgraphLoader(data, ptr, batch_size=inner_batch_size, shuffle=True) 

but I get the above error if I do:

loader = SubgraphLoader(data, ptr, batch_size=inner_batch_size, shuffle=True, num_workers=2, persistent_workers=True) 

So I am wondering if you have any clue about this, or if you know where I should investigate.

Thanks a lot, and congratulations again on this piece of work and on the whole torch_geometric package.
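A hedged idea to investigate (an assumption, not a verified fix): the outer DataLoader yields a dynamically created Batch class, and re-pickling such an object inside the SubgraphLoader worker processes can fail with exactly this message. Copying the attributes into a plain Data object before partitioning keeps only stock, picklable types in the inner loader.

from torch_geometric.data import Data
from torch_geometric_autoscale import metis, permute, SubgraphLoader

def make_inner_loader(sample, num_parts: int, inner_batch_size: int):
    # `sample` is the single-graph Batch coming out of the outer DataLoader
    # (a hypothetical name; adapt to the surrounding code).
    plain = Data(**{key: value for key, value in sample})
    perm, ptr = metis(plain.adj_t, num_parts=num_parts)
    data = permute(plain, perm)
    return SubgraphLoader(data, ptr, batch_size=inner_batch_size, shuffle=True,
                          num_workers=2, persistent_workers=True)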

Memory Usage

Hello,

Thanks for the great work. I am trying to check the memory usage of GAS. I used the provided APPNP model to run on some large-scale datasets; the code is below, and I just added several lines for tracking memory usage within forward():

class APPNP(ScalableGNN):
    def __init__(self, num_nodes: int, in_channels, hidden_channels: int,
                 out_channels: int, num_mlp:int, num_layers: int, alpha: float,
                 dropout: float = 0.0, pool_size: Optional[int] = None,
                 buffer_size: Optional[int] = None, device=None):
        super().__init__(num_nodes, out_channels, num_layers, pool_size,
                         buffer_size, device)

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.alpha = alpha
        self.dropout = dropout


        self.lins = torch.nn.ModuleList()
        self.lins.append(torch.nn.Linear(in_channels, hidden_channels))
        self.bns = torch.nn.ModuleList()
        self.bns.append(torch.nn.BatchNorm1d(hidden_channels))
        for _ in range(num_mlp):
            self.lins.append(torch.nn.Linear(hidden_channels, hidden_channels))
            self.bns.append(torch.nn.BatchNorm1d(hidden_channels))
        self.lins.append(torch.nn.Linear(hidden_channels, out_channels))

        self.reg_modules = self.lins[:1]
        self.nonreg_modules = self.lins[1:]

    def reset_parameters(self):
        super().reset_parameters()
        for lin in self.lins:
            lin.reset_parameters()
        for bn in self.bns:
            bn.reset_parameters()

    def forward(self, x: Tensor, adj_t: SparseTensor, *args) -> Tensor:
        print("allocated1: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
        x = self.lins[0](x).relu_()
        x = F.dropout(x, p=self.dropout, training=self.training)
        for i, lin in enumerate(self.lins[1:-1]):
            x = lin(x)
            x = self.bns[i](x)
            x = F.relu(x)
            x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.lins[-1](x)
        x_0 = x[:adj_t.size(0)]
        print("allocated2: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
        for history in self.histories:
            x = (1 - self.alpha) * (adj_t @ x) + self.alpha * x_0
            x = self.push_and_pull(history, x, *args)

        x = (1 - self.alpha) * (adj_t @ x) + self.alpha * x_0
        print("allocated3: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
        return x

    @torch.no_grad()
    def forward_layer(self, layer, x, adj_t, state):
        if layer == 0:
            x = F.dropout(x, p=self.dropout, training=self.training)
            x = self.lins[0](x)
            x = x.relu()
            x = F.dropout(x, p=self.dropout, training=self.training)
            x = x_0 = self.lins[1](x)
            state['x_0'] = x_0[:adj_t.size(0)]

        x = (1 - self.alpha) * (adj_t @ x) + self.alpha * state['x_0']
        return x

I also added some lines to the training function for memory usage:

def mini_train(model, loader, criterion, optimizer, max_steps, grad_norm=None,
               edge_dropout=0.0):
    model.train()

    total_loss = total_examples = 0
    for i, (batch, batch_size, *args) in enumerate(loader):
        x = batch.x.to(model.device)
        adj_t = batch.adj_t.to(model.device)
        y = batch.y[:batch_size].to(model.device)
        train_mask = batch.train_mask[:batch_size].to(model.device)

        if train_mask.sum() == 0:
            continue

        # We make use of edge dropout on ogbn-products to avoid overfitting.
        adj_t = dropout(adj_t, p=edge_dropout)

        optimizer.zero_grad()
        out = model(x, adj_t, batch_size, *args)
        loss = criterion(out[train_mask], y[train_mask])
        print("middle: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
        loss.backward(retain_graph=True)
        if grad_norm is not None:
           torch.nn.utils.clip_grad_norm_(model.parameters(), grad_norm)
        optimizer.step()
        print("after: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)

        total_loss += float(loss) * int(train_mask.sum())
        total_examples += int(train_mask.sum())

        # We may abort after a fixed number of steps to refresh histories...
        if (i + 1) >= max_steps and (i + 1) < len(loader):
            break

    #print("allocated: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
    return total_loss / total_examples

@torch.no_grad()
def full_test(model, data):
    model.eval()
    return model(data.x.to(model.device), data.adj_t.to(model.device)).cpu()


@torch.no_grad()
def mini_test(model, loader):
    model.eval()
    return model(loader=loader)

When I run the code, the output is as follows:

Loading data... Done! [4.13s]
Computing METIS partitioning with 150 parts... Done! [72.42s]
Permuting data... Done! [52.20s]
Adding self-loops... Done! [3.61s]
Normalizing data... Done! [1.83s]
Pre-processing subgraphs... Done! [5.54s]
Pre-processing subgraphs... Done! [23.81s]
Calculating buffer size... Done! [0.01s] -> 104781
Fill history... allocated1: 0.00 MB
allocated2: 0.00 MB
allocated3: 4259.25 MB
Done! [3.61s]

I am wondering why "allocated2", which should be the memory after the MLP, is 0, and why there is a huge increase between "allocated2" and "allocated3". Also, during training the memory keeps increasing and then becomes stable; is that normal? Thanks for the help.
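A side note on measurement (not an answer to the allocation gap itself): torch.cuda.memory_allocated() only counts tensors that are alive on the GPU at the moment of the call, so 0 MB before anything has been moved to the device is expected. Peak statistics are usually more informative, for example:

import torch

torch.cuda.reset_peak_memory_stats()
# ... run one forward/backward step here ...
print("peak: %.2f MB" % (torch.cuda.max_memory_allocated() / 1024 / 1024))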

Have trouble on installing the package

    warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
  building 'torch_geometric_autoscale._async' extension
  creating build/temp.linux-x86_64-cpython-38
  creating build/temp.linux-x86_64-cpython-38/csrc
  gcc -pthread -B /gpfs/gibbs/pi/zhao/tl688/conda_envs/graphnn/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc -I/home/tl688/.local/lib/python3.8/site-packages/torch/include -I/home/tl688/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/tl688/.local/lib/python3.8/site-packages/torch/include/TH -I/home/tl688/.local/lib/python3.8/site-packages/torch/include/THC -I/gpfs/gibbs/pi/zhao/tl688/conda_envs/graphnn/include/python3.8 -c csrc/async.cpp -o build/temp.linux-x86_64-cpython-38/csrc/async.o -DAT_PARALLEL_OPENMP -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_async -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
  gcc: error: unrecognized command line option '-std=c++14'
  error: command '/usr/bin/gcc' failed with exit code 1
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

This is the error I hit; installing PyG itself did not fail. Could you please help me solve this problem? Thanks a lot!

Whether autoscale supports heterogeneous graph

Thanks for this great work.
What should I do if I want to use autoscale on a heterogeneous graph? Can the three approaches described in the docs be combined with autoscale to create models on heterogeneous graph data?
[screenshot of the referenced docs section]

Results on OGBN-Papers100M

Hello,

I am wondering if there are any results of GAS on OGBN-Papers100M. (Or results on some datasets larger than ogbn-products)

BTW, since the dataset is so big, preprocessing steps such as partitioning with METIS are unrealistic to run in the usual way. I am also wondering if there are any code scripts I can refer to that would help with this.

Thanks.

Impossible to install on AWS SageMaker (gcc < 5.0)

When I try to install pyg_autoscale on AWS SageMaker, I get the following error:

pip install git+https://github.com/rusty1s/pyg_autoscale.git -q

ERROR: Command errored out with exit status 1:
command: /home/ec2-user/anaconda3/envs/python3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-wjznfuqn/setup.py'"'"'; file='"'"'/tmp/pip-req-build-wjznfuqn/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-ajrg7df1
cwd: /tmp/pip-req-build-wjznfuqn/
Complete output (46 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/__init__.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/metis.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/pool.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/loader.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/history.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/utils.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/data.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
creating build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/gcn2.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/__init__.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/gcn.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/gat.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/appnp.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/pna_jk.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/base.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/pna.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
running build_ext
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py:312: UserWarning:

                             !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                            !! WARNING !!

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))

building 'torch_geometric_autoscale._relabel' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/csrc
creating build/temp.linux-x86_64-3.6/csrc/cpu
gcc -pthread -B /home/ec2-user/anaconda3/envs/python3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -Icsrc -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/ec2-user/anaconda3/envs/python3/include/python3.6m -c csrc/relabel.cpp -o build/temp.linux-x86_64-3.6/csrc/relabel.o -DAT_PARALLEL_OPENMP -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_relabel -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
gcc: error: unrecognized command line option '-std=c++14'
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for torch-geometric-autoscale
ERROR: Command errored out with exit status 1:
command: /home/ec2-user/anaconda3/envs/python3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-wjznfuqn/setup.py'"'"'; file='"'"'/tmp/pip-req-build-wjznfuqn/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-o2qjs31r/install-record.txt --single-version-externally-managed --compile --install-headers /home/ec2-user/anaconda3/envs/python3/include/python3.6m/torch-geometric-autoscale
cwd: /tmp/pip-req-build-wjznfuqn/
Complete output (46 lines):
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/__init__.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/metis.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/pool.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/loader.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/history.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/utils.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
copying torch_geometric_autoscale/data.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale
creating build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/gcn2.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/__init__.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/gcn.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/gat.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/appnp.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/pna_jk.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/base.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
copying torch_geometric_autoscale/models/pna.py -> build/lib.linux-x86_64-3.6/torch_geometric_autoscale/models
running build_ext
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/utils/cpp_extension.py:312: UserWarning:

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'torch_geometric_autoscale._relabel' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/csrc
creating build/temp.linux-x86_64-3.6/csrc/cpu
gcc -pthread -B /home/ec2-user/anaconda3/envs/python3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -Icsrc -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/TH -I/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/ec2-user/anaconda3/envs/python3/include/python3.6m -c csrc/relabel.cpp -o build/temp.linux-x86_64-3.6/csrc/relabel.o -DAT_PARALLEL_OPENMP -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_relabel -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
gcc: error: unrecognized command line option '-std=c++14'
error: command 'gcc' failed with exit status 1
----------------------------------------

ERROR: Command errored out with exit status 1: /home/ec2-user/anaconda3/envs/python3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-wjznfuqn/setup.py'"'"'; file='"'"'/tmp/pip-req-build-wjznfuqn/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-o2qjs31r/install-record.txt --single-version-externally-managed --compile --install-headers /home/ec2-user/anaconda3/envs/python3/include/python3.6m/torch-geometric-autoscale Check the logs for full command output.

gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28) Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Is GCC 5.0 really needed?

Non-Square adj_t for GCN

Hello,

I was trying to rewrite the code by myself. First, I found that the adj_t returned by SubgraphLoader is non-square: it has size [num_target, num_target + num_neighbors]. Hence, when I try something like:

x = conv(x, adj)

it raises an AssertionError because of these lines in GCNConv:

if isinstance(edge_index, SparseTensor):
        assert edge_index.size(0) == edge_index.size(1)

I am wondering how the author solved this issue, and how to modify SubgraphLoader so that it outputs a square adj_t as in the normal case. Thanks!
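One possible way around the assert, sketched under the assumption that the target nodes come first in x (as in the loader output above): propagate manually with a sparse-dense matmul, which happily accepts a rectangular SparseTensor and returns exactly the rows for the target nodes.

import torch
from torch_sparse import SparseTensor

num_target, num_all, dim = 4, 10, 8
x = torch.randn(num_all, dim)            # target nodes first, then out-of-batch neighbors
row = torch.tensor([0, 1, 2, 3, 0, 2])   # toy bipartite edges: target <- neighbor
col = torch.tensor([4, 5, 6, 7, 8, 9])
adj_t = SparseTensor(row=row, col=col, sparse_sizes=(num_target, num_all))
out = adj_t @ x                          # shape [num_target, dim], no square assert involved
print(out.shape)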

Plan to implement the VR-GCN model

Hi, thanks for your solid work! I am wondering if you plan to implement VR-GCN [1] in pytorch_geometric, since it is similar to GNNAutoScale.

[1] Stochastic training of graph convolutional networks with variance reduction. In ICML2018.

RuntimeError: Not compiled with METIS support

I would like to try pyg_autoscale. I have tried to install it on my mac M1 Max with MacOS 13.2.1. Here is the sequence of commands that I used:

conda create --name playground_pygas python=3.8
conda activate playground_pygas
pip3 install torch torchvision
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-2.0.0+cpu.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-2.0.0+cpu.html
pip install torch-geometric
pip install git+https://github.com/rusty1s/pyg_autoscale.git

According to https://pytorch.org/get-started/locally/ the command pip3 install torch torchvision installs the current stable (2.0.0) PyTorch. The other commands follow https://github.com/rusty1s/pyg_autoscale#readme.

All packages install successfully, but when I try one of the example scripts (for example https://github.com/rusty1s/pyg_autoscale/blob/master/examples/train_gcn.py), I get the following error:

Computing METIS partitioning with 40 parts... Traceback (most recent call last):
  File "train_gcn.py", line 25, in <module>
    perm, ptr = metis(data.adj_t, num_parts=40, log=True)
  File "/Users/marco/miniconda3/envs/playground_pygas/lib/python3.8/site-packages/torch_geometric_autoscale/metis.py", line 31, in metis
    cluster = partition_fn(rowptr, col, None, num_parts, recursive)
  File "/Users/marco/miniconda3/envs/playground_pygas/lib/python3.8/site-packages/torch/_ops.py", line 502, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: Not compiled with METIS support

What am I doing wrong? Any help is much appreciated.
Thank you!
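A hedged check of where the error comes from: the partitioning op lives in torch-sparse, and the pre-built CPU wheels for some platforms appear to ship without METIS support. If the call below fails, rebuilding torch-sparse from source with the METIS library installed is the usual remedy rather than anything in pyg_autoscale itself. The op signature matches the call in torch_geometric_autoscale/metis.py shown in the traceback above.

import torch
from torch_sparse import SparseTensor

adj = SparseTensor.from_dense(torch.ones(4, 4))
rowptr, col, _ = adj.csr()
try:
    torch.ops.torch_sparse.partition(rowptr, col, None, 2, False)
    print('METIS available')
except RuntimeError as err:
    print('METIS unavailable:', err)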

Reproduction problem of gcn on the yelp dataset

Hello, GNNAutoScale is an interesting and solid work, and thanks for your hard work.
Now I am trying to reproduce the results. However, when I run GAS on the yelp dataset with GCN using the hyper-parameters provided in conf/model/gcn.yaml, the final test result is much lower than reported.

I am not sure where the problem is; could you please help me solve it? Thanks.


Configuration

python version: 3.8.10
torch and cuda version: 1.7.0+cu110

pytorch cuda streams parallel

Hi, this is a great contribution for training big GNNs, thank you for your work. I have a question: it seems that CUDA streams cannot really run in parallel in PyTorch (see pytorch/pytorch#25540). Are there any tricks in pyg_autoscale to deal with this?

Missing LICENSE information

Hi,

We are considering using this package for experiments.
Would it be possible to add license information?

Thanks for the amazing implementation.

Inductive learning on Reddit

Hi @rusty1s, thanks for your really great work and wonderful code! I am currently working on graph representation learning with the Reddit dataset. It seems you treat the dataset as semi-supervised transductive node classification, just like Cora, Citeseer, and Pubmed? I am a little confused about whether, in the inductive setting, we should split the whole dataset into training/validation/test graphs and then use METIS to partition them separately. I am looking forward to your reply! Thanks very much!
