safe-graph / dgfraud-tf2
A Deep Graph-based Toolbox for Fraud Detection in TensorFlow 2.X
License: Apache License 2.0
When I use the Player2Vec algorithm, I am confused by the masked cross-entropy loss. The step mask / tf.reduce_sum(mask) already normalizes over the entries that are equal to 1, so why does the code then take another global average with tf.reduce_mean(loss) instead of summing with tf.reduce_sum(loss)?
def masked_softmax_cross_entropy(preds: tf.Tensor, labels: tf.Tensor,
                                 mask: tf.Tensor) -> tf.Tensor:
    """
    Softmax cross-entropy loss with masking.

    :param preds: the last layer logits of the input data
    :param labels: the labels of the input data
    :param mask: the mask for train/val/test data
    :return: the masked cross-entropy loss, averaged over all nodes
    """
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)
    mask = tf.cast(mask, dtype=tf.float32)
    mask /= tf.maximum(tf.reduce_sum(mask), tf.constant([1.]))
    loss *= mask
    return tf.reduce_mean(loss)
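The arithmetic behind the question can be checked outside TensorFlow with a minimal NumPy sketch (the loss and mask values below are made up for illustration):

```python
import numpy as np

# Hypothetical per-node losses and a train mask covering 2 of 4 nodes.
loss = np.array([2.0, 4.0, 6.0, 8.0], dtype=np.float32)
mask = np.array([1.0, 1.0, 0.0, 0.0], dtype=np.float32)

# Mirrors mask /= tf.maximum(tf.reduce_sum(mask), 1.) in the repo.
norm_mask = mask / max(mask.sum(), 1.0)
masked = loss * norm_mask

# Summing recovers the exact mean over the masked nodes: (2 + 4) / 2 = 3.0
print(masked.sum())   # 3.0
# The repo's reduce_mean divides once more by the total node count N = 4:
print(masked.mean())  # 0.75
```

So reduce_sum would return the true masked mean, while reduce_mean rescales it by the constant 1/N; the minimizer is unchanged, and only the effective gradient scale (learning rate) differs.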
When I tried to run GraphConsis, I found that the model always predicted every node as negative (normal), resulting in AUC = 0.5000 and F1-score = 0.0000. I tried modifying the parameters, mainly the learning rate and the number of epochs, while keeping the other parameters consistent with the paper, but the result did not change, which confuses me. The parameters are as follows:
parser.add_argument('--seed', type=int, default=42, help='random seed')
parser.add_argument('--epochs', type=int, default=5, help='number of epochs to train')
parser.add_argument('--batch_size', type=int, default=512, help='batch size')
parser.add_argument('--train_size', type=float, default=0.8, help='training set percentage')
parser.add_argument('--lr', type=float, default=0.1, help='learning rate')
parser.add_argument('--nhid', type=int, default=128, help='number of hidden units')
parser.add_argument('--sample_sizes', type=list, default=[10, 5], help='number of samples for each layer')
parser.add_argument('--identity_dim', type=int, default=32, help='dimension of context embedding')
parser.add_argument('--eps', type=float, default=0.001, help='consistency score threshold ε')
args = parser.parse_args()
When generating u_i and u_j in SemiGNN, the code at line 187 in utils.py reads:

for index in range(0, num_of_nodes):
    u_i.append(pairs[index][0])
    u_j.append(pairs[index][1])

but num_of_nodes is not the same as len(pairs), so the loop cannot iterate over all pairs.
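The mismatch can be reproduced with a minimal sketch (the values of pairs and num_of_nodes below are illustrative stand-ins for what utils.py builds):

```python
# When a node contributes more than one sampled pair, len(pairs) exceeds
# num_of_nodes and the original loop bound silently drops the tail.
num_of_nodes = 3
pairs = [(0, 1), (0, 2), (1, 2), (1, 0), (2, 0)]  # len(pairs) == 5

u_i, u_j = [], []
for index in range(0, num_of_nodes):  # original loop bound
    u_i.append(pairs[index][0])
    u_j.append(pairs[index][1])
print(len(u_i))  # 3 -- the last two pairs are never visited

# Iterating over the pairs themselves visits every sampled pair:
u_i = [p[0] for p in pairs]
u_j = [p[1] for p in pairs]
print(len(u_i))  # 5
```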
There seems to be an error in the implementation of Equation (8) in the GAS paper.
In the combination part, from line 553 to 561:

# Combination
if self.concat:
    user_output = dot(user_output, self.concate_user_weights,
                      sparse=False)
    item_output = dot(item_output, self.concate_item_weights,
                      sparse=False)
    user_output = tf.concat([user_vecs, user_output], axis=1)
    item_output = tf.concat([item_vecs, item_output], axis=1)

It seems that user_vecs, rather than user_output, should be passed to the dot function with the concat weights, as shown in Equation (8).
Hello, author!
In the experiments section of your paper you state:
We take the 100-dimension Word2Vec embedding of each review as its feature like previous work
My understanding is that Word2Vec is applied to the text of each review to obtain a 100-dimensional feature vector. However, the dataset contains a 45954x32 sparse feature matrix, i.e., each review node has 32-dimensional features.
I am confused about what these 32 dimensions mean concretely: are they text features of the review, behavioral features, or both?
In the original YelpChi dataset paper:
S. Rayana and L. Akoglu. 2015. Collective Opinion Spam Detection: Bridging Review Networks and Metadata. In KDD.
the review feature extraction includes 6 behavioral features and 10 text features, 16 dimensions in total.
I contacted the authors of that paper but received no reply, so I do not know how the node features in the original data were constructed.
I hope you can find time to answer; thank you!
When computing the joint embedding of user u (i.e., Eq. 4) in SemiGNN, we should concatenate the weighted embedding of user u from every view.
In the code, the matrix alpha * h (called h_tmp below) has shape (view_num, node_num, self.encoding[-1]). For user 0, we ought to concatenate h_tmp[0][0], h_tmp[1][0], h_tmp[2][0], and so on.
Yet this line does not do that, since tf.reshape(...) does not change the overall order of the elements in the matrix.
Instead, the concatenation should be implemented with the code below:
output = tf.concat([h_tmp[i] for i in range(self.view_num)], 1)
I am not sure whether this is right...
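The difference between the two operations can be seen with a tiny NumPy example (shapes chosen for readability; h_tmp stands in for alpha * h):

```python
import numpy as np

# Stand-in for alpha * h with shape (view_num, node_num, dim).
view_num, node_num, dim = 2, 2, 2
h_tmp = np.arange(view_num * node_num * dim).reshape(view_num, node_num, dim)

# A plain reshape preserves element order, so row 0 holds both nodes of
# view 0 -- not the per-view embeddings of user 0.
reshaped = h_tmp.reshape(node_num, view_num * dim)
print(reshaped[0])      # [0 1 2 3]

# Concatenating per view places every view of the same user side by side:
concatenated = np.concatenate([h_tmp[i] for i in range(view_num)], axis=1)
print(concatenated[0])  # [0 1 4 5] -> h_tmp[0][0] followed by h_tmp[1][0]
```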
This is the first time I have opened an issue and I don't know what the rules are; please forgive me if I offend you.
SemiGNN uses the get_negative_sampling function in Utils.py, but I noticed that len(pairs) does not equal len(adj_nodelist).
Maybe replacing "for index in range(0, len(adj_nodelist)):" with "for index in range(0, len(pairs)):" would be more correct?