safe-graph / dgfraud-tf2
A Deep Graph-based Toolbox for Fraud Detection in TensorFlow 2.X
License: Apache License 2.0
When I use the Player2Vec algorithm, I am confused by the masked cross-entropy loss. The step mask / tf.reduce_sum(mask) already normalizes over the entries that are equal to 1, so why does the code then take another global average with tf.reduce_mean(loss) instead of summing with tf.reduce_sum(loss)?
def masked_softmax_cross_entropy(preds: tf.Tensor, labels: tf.Tensor,
                                 mask: tf.Tensor) -> tf.Tensor:
    """
    Softmax cross-entropy loss with masking.

    :param preds: the last layer logits of the input data
    :param labels: the labels of the input data
    :param mask: the mask for train/val/test data
    :return: the masked cross-entropy loss, averaged over all nodes
    """
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)
    mask = tf.cast(mask, dtype=tf.float32)
    mask /= tf.maximum(tf.reduce_sum(mask), tf.constant([1.]))
    loss *= mask
    return tf.reduce_mean(loss)
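The arithmetic behind the question can be checked outside TensorFlow with a minimal NumPy sketch (the loss and mask values below are made up for illustration):

```python
import numpy as np

# Hypothetical per-node losses and a train mask covering 2 of 4 nodes.
loss = np.array([2.0, 4.0, 6.0, 8.0], dtype=np.float32)
mask = np.array([1.0, 1.0, 0.0, 0.0], dtype=np.float32)

# Mirrors mask /= tf.maximum(tf.reduce_sum(mask), 1.) in the repo.
norm_mask = mask / max(mask.sum(), 1.0)
masked = loss * norm_mask

# Summing recovers the exact mean over the masked nodes: (2 + 4) / 2 = 3.0
print(masked.sum())   # 3.0
# The repo's reduce_mean divides once more by the total node count N = 4:
print(masked.mean())  # 0.75
```

So reduce_sum would return the true masked mean, while reduce_mean rescales it by the constant 1/N; the minimizer is unchanged, and only the effective gradient scale (learning rate) differs.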
When I tried to run GraphConsis, I found that the model always predicted every node as negative (normal), resulting in AUC = 0.5000 and F1-score = 0.0000. I tried modifying the parameters, mainly the learning rate and the number of epochs, while keeping the other parameters consistent with the paper, but the result did not change, which confuses me. The parameters are as follows:
parser.add_argument('--seed', type=int, default=42, help='random seed')
parser.add_argument('--epochs', type=int, default=5, help='number of epochs to train')
parser.add_argument('--batch_size', type=int, default=512, help='batch size')
parser.add_argument('--train_size', type=float, default=0.8, help='training set percentage')
parser.add_argument('--lr', type=float, default=0.1, help='learning rate')
parser.add_argument('--nhid', type=int, default=128, help='number of hidden units')
parser.add_argument('--sample_sizes', type=list, default=[10, 5], help='number of samples for each layer')
parser.add_argument('--identity_dim', type=int, default=32, help='dimension of context embedding')
parser.add_argument('--eps', type=float, default=0.001, help='consistency score threshold ε')
args = parser.parse_args()
When generating u_i and u_j in SemiGNN, the code at line 187 in utils.py reads:

for index in range(0, num_of_nodes):
    u_i.append(pairs[index][0])
    u_j.append(pairs[index][1])

but num_of_nodes is not the same as len(pairs), so the loop cannot iterate over all pairs.
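The mismatch can be reproduced with a minimal sketch (the values of pairs and num_of_nodes below are illustrative stand-ins for what utils.py builds):

```python
# When a node contributes more than one sampled pair, len(pairs) exceeds
# num_of_nodes and the original loop bound silently drops the tail.
num_of_nodes = 3
pairs = [(0, 1), (0, 2), (1, 2), (1, 0), (2, 0)]  # len(pairs) == 5

u_i, u_j = [], []
for index in range(0, num_of_nodes):  # original loop bound
    u_i.append(pairs[index][0])
    u_j.append(pairs[index][1])
print(len(u_i))  # 3 -- the last two pairs are never visited

# Iterating over the pairs themselves visits every sampled pair:
u_i = [p[0] for p in pairs]
u_j = [p[1] for p in pairs]
print(len(u_i))  # 5
```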
There seems to be an error in the implementation of Equation (8) in the GAS paper.
In the combination part, from line 553 to 561:

# Combination
if self.concat:
    user_output = dot(user_output, self.concate_user_weights,
                      sparse=False)
    item_output = dot(item_output, self.concate_item_weights,
                      sparse=False)
    user_output = tf.concat([user_vecs, user_output], axis=1)
    item_output = tf.concat([item_vecs, item_output], axis=1)

It seems that user_vecs, rather than user_output, should be passed to the dot function with the concat weights, as shown in Equation (8).
Hello, author!
In the experiments section of your paper you state:
We take the 100-dimension Word2Vec embedding of each review as its feature like previous work
My understanding is that Word2Vec is applied to the text of each review to obtain a 100-dimensional feature vector. However, the dataset contains a 45954x32 sparse feature matrix, i.e., each review node has 32-dimensional features.
I am confused about what these 32 dimensions mean concretely: are they text features of the review, behavioral features, or both?
In the original YelpChi dataset paper:
S. Rayana and L. Akoglu. 2015. Collective Opinion Spam Detection: Bridging Review Networks and Metadata. In KDD.
the review feature extraction includes 6 behavioral features and 10 text features, 16 dimensions in total.
I contacted the authors of that paper but received no reply, so I do not know how the node features in the original data were constructed.
I hope you can find time to answer; thank you!
When computing the joint embedding of user u (i.e., Eq. 4) in SemiGNN, we should concatenate the weighted embedding of user u from every view.
In the code, the matrix alpha * h (called h_tmp below) has shape (view_num, node_num, self.encoding[-1]). For user 0, we ought to concatenate h_tmp[0][0], h_tmp[1][0], h_tmp[2][0], and so on.
Yet this line does not do that, since tf.reshape(...) does not change the overall order of the elements in the matrix.
Instead, the concatenation should be implemented with the code below:
output = tf.concat([h_tmp[i] for i in range(self.view_num)], 1)
I am not sure whether this is right...
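The difference between the two operations can be seen with a tiny NumPy example (shapes chosen for readability; h_tmp stands in for alpha * h):

```python
import numpy as np

# Stand-in for alpha * h with shape (view_num, node_num, dim).
view_num, node_num, dim = 2, 2, 2
h_tmp = np.arange(view_num * node_num * dim).reshape(view_num, node_num, dim)

# A plain reshape preserves element order, so row 0 holds both nodes of
# view 0 -- not the per-view embeddings of user 0.
reshaped = h_tmp.reshape(node_num, view_num * dim)
print(reshaped[0])      # [0 1 2 3]

# Concatenating per view places every view of the same user side by side:
concatenated = np.concatenate([h_tmp[i] for i in range(view_num)], axis=1)
print(concatenated[0])  # [0 1 4 5] -> h_tmp[0][0] followed by h_tmp[1][0]
```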
This is the first time I have opened an issue and I don't know what the rules are; please forgive me if I offend you.
SemiGNN uses the get_negative_sampling function in Utils.py, but I noticed that len(pairs) does not equal len(adj_nodelist).
Maybe replacing "for index in range(0, len(adj_nodelist)):" with "for index in range(0, len(pairs)):" would be more correct?