phanein / deepwalk

DeepWalk - Deep Learning for Graphs

Home Page: http://www.perozzi.net/projects/deepwalk/

License: Other

Makefile 4.84% TeX 2.81% Python 92.35%

deepwalk's Issues

ValueError: Unknown mat file type, version 32, 49

I'm trying to get the adjacency matrix for matlab format of the example:

macbookproloreto:deepwalk admin$ deepwalk --format mat --matfile adjlist --input example_graphs/karate.adjlist --output karate.embeddings

but I get an error

Traceback (most recent call last):
  File "/usr/local/bin/deepwalk", line 9, in <module>
    load_entry_point('deepwalk==1.0.1', 'console_scripts', 'deepwalk')()
  File "/Library/Python/2.7/site-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 157, in main
    process(args)
  File "/Library/Python/2.7/site-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 52, in process
    G = graph.load_matfile(args.input, variable_name=args.matfile_variable_name, undirected=args.undirected)
  File "/Library/Python/2.7/site-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/graph.py", line 263, in load_matfile
    mat_varables = loadmat(file_)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/io/matlab/mio.py", line 125, in loadmat
    MR = mat_reader_factory(file_name, appendmat, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/io/matlab/mio.py", line 55, in mat_reader_factory
    mjv, mnv = get_matfile_version(byte_stream)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/io/matlab/miobase.py", line 236, in get_matfile_version
    % ret)
ValueError: Unknown mat file type, version 32, 49
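The numbers 32 and 49 in the message are the ASCII codes for a space and the digit 1, which suggests that plain text was read where a binary MAT header was expected: the command passes the text file karate.adjlist together with --format mat. A minimal sketch of a pre-flight check follows. `looks_like_mat_file` and `make_demo_file` are hypothetical helpers, not part of deepwalk, and the check only recognises MAT files of version 5 and later, whose header text begins with "MATLAB" (legacy v4 files have no such header).

```python
import os
import tempfile

def looks_like_mat_file(path):
    """Return True if the file starts with the text header used by MAT v5+ files.

    Hypothetical helper for illustration; it does not detect legacy v4 files.
    """
    with open(path, "rb") as handle:
        return handle.read(6) == b"MATLAB"

def make_demo_file(contents):
    """Write bytes to a temporary file and return its path (demo only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(contents)
    return path

adjlist_path = make_demo_file(b"1 2 3\n2 1\n3 1\n")
mat_path = make_demo_file(b"MATLAB 5.0 MAT-file, Platform: demo" + b" " * 100)

print(looks_like_mat_file(adjlist_path))  # False
print(looks_like_mat_file(mat_path))      # True
```

For a text adjacency list, leaving out --format mat should be enough; the plain example command shown in other issues on this page (deepwalk --input example_graphs/karate.adjlist --output karate.embeddings) takes that route.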

load_adjacency-list no reaction

Hi Phanein,
I want to load a graph using G = graph.load_adjacencylist('adjedges.txt'), but something goes wrong when I run the code:

The numbers are some flags:

'adjedges.txt' is as follows:

Deepwalk on directed graphs

What am I missing? Embeddings produced with deepwalk --format edgelist --undirected false --input file --out embedfile are identical to deepwalk --format edgelist --input file --out embedfile. The question is how do I create embeddings for directed graphs? Please help.
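One possible explanation, assuming the --undirected option is declared with argparse `type=bool` (an assumption about the CLI definition; check `__main__.py` in your installed version): `bool("false")` is `True` in Python because any non-empty string is truthy, so `--undirected false` would still produce an undirected graph. The sketch below reproduces the pitfall with a stand-alone parser and shows a hypothetical `str2bool` converter that parses the text instead.

```python
import argparse

def str2bool(text):
    """Hypothetical converter: interpret common true/false spellings."""
    lowered = text.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % text)

# Stand-alone parsers for illustration; not deepwalk's actual parser.
naive = argparse.ArgumentParser()
naive.add_argument("--undirected", default=True, type=bool)

fixed = argparse.ArgumentParser()
fixed.add_argument("--undirected", default=True, type=str2bool)

naive_args = naive.parse_args(["--undirected", "false"])
fixed_args = fixed.parse_args(["--undirected", "false"])

print(naive_args.undirected)  # True: bool("false") is truthy
print(fixed_args.undirected)  # False
```

If the flag really is declared this way, calling the loader from Python with undirected=False would sidestep the command-line parsing altogether.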

A Warning when I run the example command

$ deepwalk --input example_graphs/karate.adjlist --output karate.embeddings
Number of nodes: 34
Number of walks: 340
Data size (walks*length): 13600
Walking...
Training...
2018-03-29 10:35:53 WARNING word2vec.py: 1089 under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay

Deepwalk on Weighted Graph

Hi,
I would like to know what should be done to use DeepWalk on weighted graphs.
Do I need to use a weighted random walk, or is there a way to run it on a weighted adjacency or Laplacian matrix?

Thanks,
Asif
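One common way to account for edge weights is to bias each step of the walk so that a neighbor is chosen with probability proportional to the edge weight. The sketch below illustrates that idea on a plain dictionary. `weighted_random_walk` and the graph layout are assumptions made for the example; they are not part of deepwalk's API.

```python
import random

def weighted_random_walk(graph, start, length, rand):
    """Walk up to `length` nodes, picking neighbors proportionally to edge weight.

    `graph` maps node -> {neighbor: weight}; this layout is hypothetical.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], {})
        if not neighbors:
            break  # dead end: no outgoing edges
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rand.choices(nodes, weights=weights, k=1)[0])
    return walk

graph = {
    "a": {"b": 9.0, "c": 1.0},
    "b": {"a": 9.0},
    "c": {"a": 1.0},
}
rand = random.Random(0)
walks = [weighted_random_walk(graph, "a", 2, rand) for _ in range(1000)]
share_b = sum(1 for w in walks if w[1] == "b") / len(walks)
print(share_b)  # roughly 0.9, since b carries 9 of the 10 weight units
```

Walks generated this way could then be fed to the same skip-gram training step as unweighted walks.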

Has the data set BlogCatalog been processed?

The BlogCatalog data set here differs from the one at http://socialcomputing.asu.edu/datasets/BlogCatalog3, which is used in node2vec.

ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

My scikit-learn version is 0.17.
I get the error when I run scoring.py.

Here is the full stack trace:

Traceback (most recent call last):
  File "F:\local_work\deepwalk\example_graphs\scoring.py", line 111, in <module>
    results[average] = f1_score(y_test, preds, average=average)
  File "D:\Programs\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 639, in f1_score
    sample_weight=sample_weight)
  File "D:\Programs\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 756, in fbeta_score
    sample_weight=sample_weight)
  File "D:\Programs\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 956, in precision_recall_fscore_support
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "D:\Programs\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 74, in _check_targets
    type_pred = type_of_target(y_pred)
  File "D:\Programs\Anaconda2\lib\site-packages\sklearn\utils\multiclass.py", line 251, in type_of_target
    raise ValueError('You appear to be using a legacy multi-label data'
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

What does this mean?

I tried

    from sklearn.preprocessing import MultiLabelBinarizer
    y = MultiLabelBinarizer().fit_transform(y)

but it doesn't run.

ValueError: Attempted relative import in non-package

Command: python main.py --input ~/code/deepwalk/example_graphs/karate.adjlist --output ~/code/deepwalk/karate.embeddings
When I use Python 3, it displays: ImportError: cannot import name 'graph'
When I use Python 2, it displays: ValueError: Attempted relative import in non-package

I looked through the commit history and reset the code to an earlier commit, 00dce6a. Then it ran successfully.
I don't know whether something is wrong. Just a question.

Exception "AttributeError: 'NoneType' object has no attribute 'nodes'" arises when dumping walks to disk

When I train on the dataset youtube.mat with these parameters:
deepwalk --input ./datasets/youtube.mat --output ./datasets/youtube.embeddings --format mat --number-walks 80 --window-size 10 --workers 4 --representation-size 128
it prints this message:

Number of nodes: 1138499
Number of walks: 91079920
Data size (walks*length): 3643196800
Data size 3643196800 is larger than limit (max-memory-data-size: 1000000000). Dumping walks to disk.
Walking...
Traceback (most recent call last):
  File "C:\Users\Jiaming\Anaconda2\Scripts\deepwalk-script.py", line 9, in <module>
    load_entry_point('deepwalk==1.0.1', 'console_scripts', 'deepwalk')()
  File "C:\Users\Jiaming\Anaconda2\lib\site-packages\deepwalk-1.0.1-py2.7.egg\deepwalk\__main__.py", line 161, in main
    process(args)
  File "C:\Users\Jiaming\Anaconda2\lib\site-packages\deepwalk-1.0.1-py2.7.egg\deepwalk\__main__.py", line 83, in process
    num_workers=args.workers)
  File "C:\Users\Jiaming\Anaconda2\lib\site-packages\deepwalk-1.0.1-py2.7.egg\deepwalk\walks.py", line 87, in write_walks_to_disk
    for file_ in executor.map(_write_walks_to_disk, args_list):
  File "C:\Users\Jiaming\Anaconda2\lib\site-packages\concurrent\futures\_base.py", line 579, in result_iterator
    yield future.result()
  File "C:\Users\Jiaming\Anaconda2\lib\site-packages\concurrent\futures\_base.py", line 403, in result
    return self.__get_result()
  File "C:\Users\Jiaming\Anaconda2\lib\site-packages\concurrent\futures\_base.py", line 355, in __get_result
    raise type(self._exception), self._exception, self._traceback
AttributeError: 'NoneType' object has no attribute 'nodes'

Can anyone tell me what happened and how I can fix it?
Thanks.

deepwalk does not install dependencies automatically on "pip install deepwalk"

setup.py does not specify requirements in the requirements section.

What about time-varying graphs?

error in mac os 10.10

Hi,

I am trying to run deepwalk on a Mac Pro after installing it successfully.

When using the command:
deepwalk --input example_graphs/karate.adjlist --output karate.embeddings

I had the error:

Traceback (most recent call last):
  File "/usr/local/bin/deepwalk", line 9, in <module>
    load_entry_point('deepwalk==v1.0.2', 'console_scripts', 'deepwalk')()
  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 356, in load_entry_point
    """Return name entry point of group for dist or raise ImportError"""
  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 2476, in load_entry_point
    except ValueError:
  File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 2190, in load
    parse_map = classmethod(parse_map)
  File "/Library/Python/2.7/site-packages/deepwalk-v1.0.2-py2.7.egg/deepwalk/__main__.py", line 26, in <module>
    p.set_cpu_affinity(list(range(cpu_count())))
AttributeError: 'Process' object has no attribute 'set_cpu_affinity'

Any ideas? I guess it's because of my recent update of mac os from 10.9 to 10.10, but I cannot figure out how to fix this.
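For context: psutil 2.0 renamed `Process.set_cpu_affinity()` to `Process.cpu_affinity()`, and psutil does not offer CPU affinity on macOS at all, so the call can fail for either reason. Below is a minimal compatibility sketch, assuming the pinning is only an optimization that can safely be skipped. `pin_to_cpus` is a hypothetical helper, and the two stand-in classes exist only so the example runs without psutil installed.

```python
def pin_to_cpus(process, cpus):
    """Pin `process` to `cpus` using whichever affinity method it offers.

    Returns the name of the method used, or None when affinity is unsupported.
    """
    for name in ("cpu_affinity", "set_cpu_affinity"):
        method = getattr(process, name, None)
        if callable(method):
            method(list(cpus))
            return name
    return None  # e.g. macOS: skip pinning instead of crashing

class NewStyleProcess:
    """Stand-in for a psutil >= 2.0 Process (demo only)."""
    def cpu_affinity(self, cpus):
        self.cpus = cpus

class NoAffinityProcess:
    """Stand-in for a platform without affinity support (demo only)."""

new_proc = NewStyleProcess()
used_new = pin_to_cpus(new_proc, range(4))
used_none = pin_to_cpus(NoAffinityProcess(), range(4))
print(used_new)   # cpu_affinity
print(used_none)  # None
```

With psutil installed, the object passed in would be `psutil.Process(os.getpid())` instead of a stand-in.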

Question: DeepWalk on multigraphs?

Hello,

I would like to ask whether DeepWalk can be applied to directed graphs with more than one edge between two nodes.

Thank you very much for your time and for this great research work!

Kind regards,
Prodromos

Embeddings for nodes that didn't appear in walks

Hi,

I read through the code and I have a question. The embeddings are learned from walks that are generated randomly from the graph, so there must be some nodes that never appear in the generated walks. How are their embeddings learned? I found that the generated walks didn't cover all the nodes, yet the output embedding contains a representation for every node. So I'm confused about how the missing nodes' representations are learned.

Can anybody help? :)
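The corpus builder quoted in another issue on this page (`build_deepwalk_corpus_iter`) starts one walk from every node in each pass, so every node occurs at least once as the first element of a walk, even when no other walk reaches it. The sketch below mimics that loop on a toy graph. `random_walk` and `build_corpus` are simplified stand-ins, not deepwalk's implementation.

```python
import random

def random_walk(graph, start, length, rand):
    """Uniform random walk; stops early at a node with no neighbors."""
    walk = [start]
    while len(walk) < length and graph[walk[-1]]:
        walk.append(rand.choice(graph[walk[-1]]))
    return walk

def build_corpus(graph, num_paths, path_length, rand):
    """Start `num_paths` walks from every node, as in the quoted loop."""
    nodes = list(graph)
    corpus = []
    for _ in range(num_paths):
        rand.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(graph, node, path_length, rand))
    return corpus

# Toy graph: node 4 is isolated.
graph = {1: [2, 3], 2: [1], 3: [1], 4: []}
corpus = build_corpus(graph, num_paths=2, path_length=5, rand=random.Random(0))

seen = {node for walk in corpus for node in walk}
isolated_walks = [walk for walk in corpus if walk[0] == 4]
print(seen == set(graph))  # True
print(isolated_walks)      # [[4], [4]]
```

Under this scheme an isolated node still ends up in the vocabulary, but its walks contain no context nodes, so its vector presumably carries little structural information.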

Why does my installation fail? It shows: from skipgram import Skipgram ImportError: No module named 'skipgram'

E:\实验\全视角特征\deepwalk>deepwalk --input example_graphs/karate.adjlist --output karate.embeddings
C:\Users\FIRST\AppData\Local\Programs\Python\Python35\lib\site-packages\gensim\utils.py:840: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
C:\Users\FIRST\AppData\Local\Programs\Python\Python35\lib\site-packages\gensim\utils.py:1015: UserWarning: Pattern library is not installed, lemmatization won't be available.
warnings.warn("Pattern library is not installed, lemmatization won't be available.")
Traceback (most recent call last):
  File "C:\Users\FIRST\AppData\Local\Programs\Python\Python35\Scripts\deepwalk-script.py", line 9, in <module>
    load_entry_point('deepwalk==1.0.1', 'console_scripts', 'deepwalk')()
  File "C:\Users\FIRST\AppData\Local\Programs\Python\Python35\lib\site-packages\pkg_resources\__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "C:\Users\FIRST\AppData\Local\Programs\Python\Python35\lib\site-packages\pkg_resources\__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "C:\Users\FIRST\AppData\Local\Programs\Python\Python35\lib\site-packages\pkg_resources\__init__.py", line 2229, in load
    return self.resolve()
  File "C:\Users\FIRST\AppData\Local\Programs\Python\Python35\lib\site-packages\pkg_resources\__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "C:\Users\FIRST\AppData\Local\Programs\Python\Python35\lib\site-packages\deepwalk-1.0.1-py3.5.egg\deepwalk\__main__.py", line 16, in <module>
    from skipgram import Skipgram
ImportError: No module named 'skipgram'

Reproduce paper results

Hi,

I am trying to reproduce the paper's results at the moment. As the paper indicates, I have tried using LIBLINEAR. However, I could not find exact details on how each instance is evaluated. My guess was to train n one-vs-all binary classifiers (where n is the number of classes), sort the predictions by estimated value, and compare the highest m to the ground truth (where m is the number of true classes). Still, I got a macro F1 score of 16%.

Cython compilation failed

As I'm running the deepwalk, a warning occurs:

C:\Python27\lib\site-packages\gensim\models\word2vec.py:406: UserWarning: Cython compilation failed, training will be slow. Do you have Cython installed? pip install cython
  warnings.warn("Cython compilation failed, training will be slow. Do you have Cython installed? pip install cython")

However, I have installed the Cython as the requirements say, version 0.23.4
Why? And how slow will it be without Cython?

Thank you for your attention

AttributeError: 'NoneType' object has no attribute 'nodes'

def build_deepwalk_corpus_iter(G, num_paths, path_length, alpha=0,
                               rand=random.Random(0)):
    walks = []

    nodes = list(G.nodes())

    for cnt in range(num_paths):
        rand.shuffle(nodes)
        for node in nodes:
            yield G.random_walk(path_length, rand=rand, alpha=alpha, start=node)

G is None. It looks like __current_graph is reset to None again.
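A likely cause on Windows is that Python starts worker processes by spawning a fresh interpreter instead of forking, so a module-level global assigned in the parent (such as `__current_graph`) is not set in the workers. Below is a minimal sketch of one way around this, assuming Python 3.7+ where `ProcessPoolExecutor` accepts `initializer` and `initargs`. The function names and the dictionary graph are illustrative, not deepwalk's code.

```python
import random
from concurrent.futures import ProcessPoolExecutor

_worker_graph = None  # set separately inside each worker process

def init_worker(graph):
    """Runs once in every worker, so the global also exists after a spawn."""
    global _worker_graph
    _worker_graph = graph

def walk_from(args):
    """Generate one uniform random walk of up to `length` nodes from `start`."""
    start, length, seed = args
    rand = random.Random(seed)
    walk = [start]
    while len(walk) < length and _worker_graph[walk[-1]]:
        walk.append(rand.choice(_worker_graph[walk[-1]]))
    return walk

def parallel_walks(graph, length, workers=2):
    """Start one walk per node, distributing the work over a process pool."""
    jobs = [(node, length, seed) for seed, node in enumerate(graph)]
    with ProcessPoolExecutor(max_workers=workers,
                             initializer=init_worker,
                             initargs=(graph,)) as executor:
        return list(executor.map(walk_from, jobs))
```

On Windows, `parallel_walks` should be called from under an `if __name__ == "__main__":` guard, because spawned workers re-import the main module.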

Parallel Implementation of Deepwalk

When I set --workers 5, it does not run on 5 instances. It takes a lot of time to complete for larger files and then shows the message "Killed", with no information on why it was killed.

change the value of argument

Hi, I was reading your paper about DeepWalk. I think the idea in the paper is amazing, and I want to run some experiments with it. How can I change the values of the arguments in deepwalk? Thank you very much.

ImportError: No module named 'skipgram'

I install all requirements by

cd deepwalk
pip install -r requirements.txt
python setup.py install

but when I run deepwalk.exe --input example_graphs/karate.adjlist --output karate.embeddings

it shows: ImportError: No module named 'skipgram'

deepwalk: error: unrecognized arguments

Hi,
Thanks for deepwalk!
I get 'error: unrecognized arguments' when I run the code. I would really appreciate your help.

a question rather than an issue

I would like to know whether a disconnected node can have a low-dimensional representation. E.g., if a node v is not connected to any other node in the graph, can we learn its representation using deepwalk?
As I see it, since it is not connected to any other node, we can't obtain random walks for it, and thus we can't learn its representation. Am I correct?

python3 can't execute deepwalk but can build deepwalk

When I use Python 3 to run $deepwalk --input XXXX --output XXXX, an error occurs: ImportError, there is no module named graph. With Python 2 it works. Can you tell me why, and how to use Python 3?

TypeError: 'NoneType' object is not iterable

Traceback (most recent call last):
  File "__main__.py", line 165, in <module>
    sys.exit(main())
  File "__main__.py", line 162, in main
    process(args)
  File "__main__.py", line 57, in process
    G = graph.load_matfile(args.input, variable_name=args.matfile_variable_name, undirected=args.undirected)
  File "/home/aswathy/.local/lib/python2.7/site-packages/deepwalk/graph.py", line 263, in load_matfile
    mat_varables = loadmat(file_)
  File "/home/aswathy/.local/lib/python2.7/site-packages/scipy/io/matlab/mio.py", line 141, in loadmat
    MR, file_opened = mat_reader_factory(file_name, appendmat, **kwargs)
  File "/home/aswathy/.local/lib/python2.7/site-packages/scipy/io/matlab/mio.py", line 64, in mat_reader_factory
    byte_stream, file_opened = _open_file(file_name, appendmat)
TypeError: 'NoneType' object is not iterable

Why do I get this error?

Paralleling walks generation (corpus generation)

Hi,

Can we parallelize graph.build_deepwalk_corpus so that multiple walks are generated in parallel?
This would bring a good speedup when a large number of walks has to be generated. We could also see whether parts of previous walks can be reused when generating new walks.

Is the code only applicable to undirected and unweighted graphs?

Dear sir,
Is the code only applicable to undirected and unweighted graphs?
Does it really not work for weighted and directed graphs?

Error: example_graphs/scoring.py UserWarning

When I run 'scoring.py', I get this. Did I miss something? Thanks for the help.

node embeddings missing for some nodes

Can you tell me in which cases deepwalk does not provide an embedding for a node in the graph?

A Question about train set

I would like to know whether you use only part of the data (the labeled nodes) or all of the data in the embedding process (not the classification).
I don't understand what this sentence in your paper means: 'Specifically, we randomly sample a portion (TR) of the labeled nodes, and use them as training data.'
Do you use all the data for the embedding and then use part of the nodes as the training set for classification?
Thanks!

Did the model use CBOW rather than skipgram?

Hello,
I read the source code of deepwalk, especially skipgram.py.
I found that you only set the values of 'min_count', 'workers', 'size', and 'sentences', and the other parameters are left at their defaults. So 'cbow_mean' is set to 1, and the model uses CBOW rather than skip-gram.
I don't understand that. Please help. Thank you very much.

A question about how to use the embedding the deepwalk generate

I would like to know how the experimental results were produced. Did you use the LIBLINEAR mentioned in your paper, or scoring.py from the deepwalk repository?
I'm a newcomer to network representation learning, and I look forward to your reply!

error in score.py

I ran scoring.py and got the following error:

Traceback (most recent call last):
  File "scoring.py", line 96, in <module>
    clf.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/multiclass.py", line 205, in fit
    Y = self.label_binarizer_.fit_transform(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 494, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/label.py", line 296, in fit
    self.y_type_ = type_of_target(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 250, in type_of_target
    raise ValueError('You appear to be using a legacy multi-label data'
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

Then I solved this by adding:

from sklearn.preprocessing import MultiLabelBinarizer
y_train = MultiLabelBinarizer().fit_transform(y_train)
y_test = MultiLabelBinarizer().fit_transform(y_test)
preds = MultiLabelBinarizer().fit_transform(preds)

But a new bug throws:

Traceback (most recent call last):
  File "scoring.py", line 108, in <module>
    results[average] = f1_score(y_test,  preds, average=average)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 692, in f1_score
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 806, in fbeta_score
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 1004, in precision_recall_fscore_support
    present_labels = unique_labels(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 92, in unique_labels
    raise ValueError("Multi-label binary indicator input with "
ValueError: Multi-label binary indicator input with different numbers of labels

The error may occur because the sklearn function used in scoring.py is too old for the new version. Is there any trick to solve this?
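The second error is consistent with fitting three separate `MultiLabelBinarizer` objects: each one derives its columns from the labels it happens to see, so `y_test` and `preds` can end up with different widths. One remedy is to encode all three with a single shared label set (in scikit-learn, one binarizer constructed with an explicit `classes=` list and reused for every array does this). The dependency-free sketch below shows the idea; `to_indicator` is a hypothetical helper and the data is made up.

```python
def to_indicator(label_lists, classes):
    """Encode lists of labels as 0/1 rows with one column per entry in `classes`."""
    index = {label: col for col, label in enumerate(classes)}
    rows = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return rows

y_test = [[0, 2], [1]]
preds = [[0, 1], [1]]  # label 2 is never predicted

# Separate label sets give different widths (the situation behind the error).
separate_widths = (len({lab for labs in y_test for lab in labs}),
                   len({lab for labs in preds for lab in labs}))
print(separate_widths)  # (3, 2)

# A shared label set gives matrices with matching columns.
classes = [0, 1, 2]
y_test_bin = to_indicator(y_test, classes)
preds_bin = to_indicator(preds, classes)
print(y_test_bin)  # [[1, 0, 1], [0, 1, 0]]
print(preds_bin)   # [[1, 1, 0], [0, 1, 0]]
```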

error with running scoring.py

When I run scoring.py, this error occurs:
TypeError: load_word2vec_format() got an unexpected keyword argument 'norm_only'

Macro scores worse than expected on Blogcatalog

The issue was caused by normalization in the gensim load function (added in gensim 0.9).

Thanks to Shaosheng Cao from Xidian University for reporting.

AttributeError: 'NoneType' object has no attribute 'nodes'

When I set --max-memory-data-size 0, it throws the following error:
Number of nodes: 6301
Number of walks: 63010
Data size (walks*length): 2520400
Data size 2520400 is larger than limit (max-memory-data-size: 0). Dumping walks to disk.
Walking...
Traceback (most recent call last):
  File "D:\Python27\Scripts\deepwalk-script.py", line 11, in <module>
    load_entry_point('deepwalk==1.0.1', 'console_scripts', 'deepwalk')()
  File "D:\Python27\lib\site-packages\deepwalk-1.0.1-py2.7.egg\deepwalk\__main__.py", line 162, in main
    process(args)
  File "D:\Python27\lib\site-packages\deepwalk-1.0.1-py2.7.egg\deepwalk\__main__.py", line 83, in process
    num_workers=args.workers)
  File "D:\Python27\lib\site-packages\deepwalk-1.0.1-py2.7.egg\deepwalk\walks.py", line 85, in write_walks_to_disk
    for file_ in executor.map(_write_walks_to_disk, args_list):
  File "D:\Python27\lib\site-packages\concurrent\futures\_base.py", line 641, in result_iterator
    yield fs.pop().result()
  File "D:\Python27\lib\site-packages\concurrent\futures\_base.py", line 462, in result
    return self.__get_result()
  File "D:\Python27\lib\site-packages\concurrent\futures\_base.py", line 414, in __get_result
    raise exception_type, self._exception, self._traceback
AttributeError: 'NoneType' object has no attribute 'nodes'
Can someone help me?

AttributeError: 'Process' object has no attribute 'set_cpu_affinity'

Traceback (most recent call last):
  File "/home/aswathy/.local/bin/deepwalk", line 7, in <module>
    from deepwalk.__main__ import main
  File "/home/aswathy/.local/lib/python2.7/site-packages/deepwalk/__main__.py", line 26, in <module>
    p.set_cpu_affinity(list(range(cpu_count())))
AttributeError: 'Process' object has no attribute 'set_cpu_affinity'

I have installed deepwalk on Ubuntu 16.04 with Python 2.7.6.
Why do I get this error when I run $deepwalk --input example_graphs/karate.adjlist --output karate.embeddings?

UnicodeDecodeError: 'gbk' codec can't decode bytes

Running on karate.adjlist succeeds, but running on 'blogcatalog.mat' fails. Thanks for the help!

Error: 'example_graphs\scoring.py' UserWarning

When I run 'scoring.py', I get this. Did I miss something? Thanks for the help.

about the top_k_list scoring.py

How should I understand top_k_list?
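As other issues on this page describe the evaluation, each test node has a known number of true labels k, and the classifier's k highest-scoring labels are taken as its prediction; `top_k_list` holds that k for every test node. Below is a minimal sketch of the selection step, assuming per-label scores are already available as plain lists. `predict_top_k` is a hypothetical stand-in for the logic in scoring.py, not a copy of it.

```python
def predict_top_k(scores, top_k_list):
    """For each row of label scores, return the indices of its k best labels."""
    predictions = []
    for row, k in zip(scores, top_k_list):
        ranked = sorted(range(len(row)), key=lambda label: row[label], reverse=True)
        predictions.append(sorted(ranked[:k]))
    return predictions

y_test = [[0, 2], [1]]                        # true labels per node (made up)
top_k_list = [len(labels) for labels in y_test]
scores = [[0.7, 0.1, 0.2], [0.3, 0.6, 0.1]]   # one score per label, per node

top_k_preds = predict_top_k(scores, top_k_list)
print(top_k_list)   # [2, 1]
print(top_k_preds)  # [[0, 2], [1]]
```

Because k is taken from the ground truth, the ranker is told how many labels each node has, which is the concern raised in the issue about top-k predictions.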

What about the type of file that is a CSV file?

Dear sir,
The input can be an adjlist, an edgelist, a matfile, or a networkx graph,
but what about CSV?

For example, the first and second columns contain nodes, and each row indicates an edge between the two nodes.
How can I use a CSV file as input? Thanks!
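A CSV file with two node columns can be converted to a whitespace-separated edge list and then passed with --format edgelist. Below is a minimal sketch, assuming the first two columns hold the node ids and that the edge list format is one `source target` pair per line (worth checking against your deepwalk version). `csv_to_edgelist` and the file names are hypothetical.

```python
import csv
import os
import tempfile

def csv_to_edgelist(csv_path, edgelist_path, has_header=False):
    """Copy the first two CSV columns to a space-separated edge list file."""
    count = 0
    with open(csv_path, newline="") as src, open(edgelist_path, "w") as dst:
        reader = csv.reader(src)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed lines
            dst.write("%s %s\n" % (row[0].strip(), row[1].strip()))
            count += 1
    return count

workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "graph.csv")
edgelist_path = os.path.join(workdir, "graph.edgelist")
with open(csv_path, "w", newline="") as handle:
    handle.write("source,target\n1,2\n2,3\n")

n_edges = csv_to_edgelist(csv_path, edgelist_path, has_header=True)
with open(edgelist_path) as handle:
    edgelist_text = handle.read()
print(n_edges)  # 2
```

The resulting file could then be used as the --input argument together with --format edgelist.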

Why you are only using top-k predictions in the scoring.py?

This will provide additional information to the classifier that there are only k labels we want to get. Hence the F1 score is no longer a fair evaluation.

dict.keys() in Python3

In Python 3, dict.keys() returns a view object instead of a list (e.g. graph.py:142).

Thanks to Shitian Shen (NCSU) for reporting.
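A view returned by `dict.keys()` in Python 3 cannot be indexed, so code that shuffles or slices it fails; wrapping it in `list()` behaves the same on Python 2 and 3. A short illustration follows (the dictionary stands in for a graph's adjacency mapping and is made up for the example):

```python
import random

adjacency = {1: [2, 3], 2: [1], 3: [1]}

try:
    random.shuffle(adjacency.keys())  # a view does not support indexing
    shuffle_error = None
except TypeError as exc:
    shuffle_error = exc

nodes = list(adjacency.keys())  # explicit copy works on Python 2 and 3
random.Random(0).shuffle(nodes)

print(type(shuffle_error).__name__)  # TypeError (on Python 3)
print(sorted(nodes))                 # [1, 2, 3]
```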

TypeError in running deepwalk

Hi, I'm trying to run deepwalk with the BlogCatalog dataset on a Fedora 20 system.
Here's the error I got:
[HH@localhost deepwalk-master]$ deepwalk --input example_graphs/blogcatalog.mat --output blogcatalog.embeddings --format mat
/usr/lib/python2.7/site-packages/pkg_resources.py:979: UserWarning: /home/HH/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
Number of nodes: 10312
Number of walks: 618720
Data size (walks*length): 18561600
Data size 18561600 is larger than limit (max-memory-data-size: 10). Dumping walks to disk.
Walking...
Counting vertex frequency...
Training...
Traceback (most recent call last):
  File "/usr/bin/deepwalk", line 9, in <module>
    load_entry_point('deepwalk==1.0.1', 'console_scripts', 'deepwalk')()
  File "/usr/lib/python2.7/site-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 160, in main
    process(args)
  File "/usr/lib/python2.7/site-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 94, in process
    window=args.window_size, min_count=0, workers=args.workers)
  File "/usr/lib/python2.7/site-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/skipgram.py", line 27, in __init__
    super(Skipgram, self).__init__(**kwargs)
  File "/usr/lib/python2.7/site-packages/gensim-0.11.1_1-py2.7-linux-i686.egg/gensim/models/word2vec.py", line 314, in __init__
    raise TypeError("You can't pass a generator as the sentences argument. Try an iterator.")
TypeError: You can't pass a generator as the sentences argument. Try an iterator.
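gensim rejects generators because it iterates over the corpus more than once (first to build the vocabulary, then again for training), and a generator is exhausted after the first pass. The usual remedy is an object whose `__iter__` starts a fresh pass each time. Below is a minimal sketch, assuming the walks were dumped one per line to text files. `WalksFromFiles` is a hypothetical class, not deepwalk's own.

```python
import os
import tempfile

class WalksFromFiles:
    """Restartable corpus: every call to iter() re-reads the walk files."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __iter__(self):
        for path in self.paths:
            with open(path) as handle:
                for line in handle:
                    yield line.split()

fd, walk_path = tempfile.mkstemp(suffix=".walks")
with os.fdopen(fd, "w") as handle:
    handle.write("1 2 3\n3 2 1\n")

corpus = WalksFromFiles([walk_path])
first_pass = list(corpus)
second_pass = list(corpus)  # a plain generator would be empty here
print(first_pass == second_pass)  # True
print(first_pass)                 # [['1', '2', '3'], ['3', '2', '1']]
```

An instance of such a class can be passed wherever a list of sentences is expected, since it can be iterated any number of times.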

TypeError: unsupported operand type(s) for +: 'int' and 'str'

I had this exception:

macbookproloreto:deepwalk admin$ deepwalk --input example_graphs/karate.adjlist --output karate.embeddings
Number of nodes: 34
Number of walks: 340
Data size (walks*length): 13600
Walking...
Training...
Traceback (most recent call last):
  File "/usr/local/bin/deepwalk", line 9, in <module>
    load_entry_point('deepwalk==1.0.1', 'console_scripts', 'deepwalk')()
  File "/Library/Python/2.7/site-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 157, in main
    process(args)
  File "/Library/Python/2.7/site-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 71, in process
    model = Word2Vec(walks, size=args.representation_size, window=args.window_size, min_count=0, workers=args.workers)
  File "/Library/Python/2.7/site-packages/gensim/models/word2vec.py", line 444, in __init__
    self.build_vocab(sentences, trim_rule=trim_rule)
  File "/Library/Python/2.7/site-packages/gensim/models/word2vec.py", line 510, in build_vocab
    self.finalize_vocab()  # build tables & arrays
  File "/Library/Python/2.7/site-packages/gensim/models/word2vec.py", line 640, in finalize_vocab
    self.reset_weights()
  File "/Library/Python/2.7/site-packages/gensim/models/word2vec.py", line 986, in reset_weights
    self.syn0[i] = self.seeded_vector(self.index2word[i] + str(self.seed))
TypeError: unsupported operand type(s) for +: 'int' and 'str'
macbookproloreto:deepwalk admin$ vi /Library/Python/2.7/site-packages/gensim/models/word2vec.py

I fixed it by removing str() from line 986:

for i in xrange(len(self.vocab)):
    # construct deterministic seed from word AND seed argument
    self.syn0[i] = self.seeded_vector(self.index2word[i] + self.seed)
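Editing gensim in place is lost on the next upgrade and changes the library's seeding logic. Since the failing expression concatenates a vocabulary entry with `str(self.seed)`, an alternative that leaves gensim untouched is to make sure the walks contain strings before they reach `Word2Vec`. A minimal sketch follows; `stringify_walks` is a hypothetical helper.

```python
def stringify_walks(walks):
    """Return the walks with every node id converted to str."""
    return [[str(node) for node in walk] for walk in walks]

walks = [[1, 2, 3], [3, 2, 1]]
string_walks = stringify_walks(walks)
print(string_walks)  # [['1', '2', '3'], ['3', '2', '1']]

# With string tokens, the expression from the traceback is well defined:
seed = 1
seeded_key = string_walks[0][0] + str(seed)
print(seeded_key)  # 11
```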

AttributeError: 'Process' object has no attribute 'set_cpu_affinity'

On Ubuntu 14.04 LTS with Python 2.7.6, I have done the deepwalk installation. When I use
$deepwalk --input example_graphs/karate.adjlist --output karate.embeddings

it displays

Traceback (most recent call last):
  File "/usr/local/bin/deepwalk", line 9, in <module>
    load_entry_point('deepwalk==1.0.1', 'console_scripts', 'deepwalk')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 351, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2363, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2088, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/local/lib/python2.7/dist-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 26, in <module>
    p.set_cpu_affinity(list(range(cpu_count())))
AttributeError: 'Process' object has no attribute 'set_cpu_affinity'

Deepwalk for Signed Networks?

I am trying to analyse a signed network using embeddings. Can I use deepwalk for Signed Network (each edge having label/weight as +1 or -1)?

My intuition is that negative edges should be handled differently than positive edges like distance between the nodes which have negative edge should be higher in the embedding space and so on.

A question about the dataset -- BlogCatalog

I notice that some nodes in BlogCatalog may have more than one label (group).
In that case, how do you handle these nodes in classification, given that they have several labels?
Thanks!

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Hi @phanein,
There is a problem when I run
sudo deepwalk --input example_graphs/karate.adjlist --output karate.embeddings

Here are the error messages:

Number of nodes: 34
Number of walks: 340
Data size (walks*length): 13600
Walking...
Training...
Traceback (most recent call last):
  File "/usr/local/bin/deepwalk", line 9, in <module>
    load_entry_point('deepwalk==1.0.1', 'console_scripts', 'deepwalk')()
  File "/usr/local/lib/python2.7/dist-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 155, in main
    process(args)
  File "/usr/local/lib/python2.7/dist-packages/deepwalk-1.0.1-py2.7.egg/deepwalk/__main__.py", line 69, in process
    model = Word2Vec(walks, size=args.representation_size, window=args.window_size, min_count=0, workers=args.workers)
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 312, in __init__
    self.build_vocab(sentences)
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 414, in build_vocab
    self.reset_weights()
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 521, in reset_weights
    random.seed(uint32(self.hashfxn(self.index2word[i] + str(self.seed))))
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Number of nodes in graph class needs "self."

See line 126 of graph.py:

def number_of_nodes(self):
    "Returns the number of nodes in the graph"
    return order()

should be

def number_of_nodes(self):
    "Returns the number of nodes in the graph"
    return self.order()

Cool project!