Question on the feature format for generated samples of ENZYMES dataset #11

lizaitang · 2023-06-02T03:43:35Z

Dear Author, your paper really helps a lot. I have a question: I want to pass the generated graphs to some classifiers, but the node features generated by GDSS seem to differ in scale from those of the original dataset. How can I solve this? Thanks.

harryjo97 · 2023-06-02T05:31:47Z

Hi lizaitang,

In our work, we used the degree of each node as the node features instead of the given node features of the original dataset. In order to use the node features of the original dataset, you can modify the code in

GDSS/utils/data_loader.py

Line 6 in 4d96334

def graphs_to_dataloader(config, graph_list):

and

GDSS/utils/graph_utils.py

Line 43 in 4d96334

def init_features(init, adjs=None, nfeat=10):

to load the node features of the dataset.
After making these changes, you can retrain the score models to generate both the node features and the adjacency matrices.
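To illustrate why the generated features look different in scale, here is a minimal sketch of degree-based one-hot features, written with NumPy rather than torch. The helper name `degree_one_hot` is hypothetical, and zeroing the rows of nodes without edges is an assumption about what `node_flags` and `mask_x` do in the repository.

```python
import numpy as np

def degree_one_hot(adj, nfeat):
    """One-hot encode node degrees (NumPy analogue of the 'deg' init)."""
    deg = adj.sum(axis=-1).astype(int)            # degree of each node, shape (N,)
    if deg.max() >= nfeat:
        raise ValueError(f"max degree {deg.max()} does not fit in nfeat={nfeat}")
    feat = np.eye(nfeat, dtype=np.float32)[deg]   # shape (N, nfeat), entries in {0, 1}
    feat[deg == 0] = 0.0                          # assumption: nodes without edges are masked out
    return feat

# Path graph on 3 nodes plus one padded (isolated) node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=np.float32)
feat = degree_one_hot(adj, nfeat=4)
```

Every entry of `feat` is 0 or 1, so a classifier trained on the dataset's own node features would see inputs of a different scale and meaning.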

lizaitang · 2023-06-02T07:29:30Z

Dear Author, thank you very much for your quick reply! How should we modify the code to load the node features of the dataset? Should we change init to zeros or ones? Or should we directly replace x_tensor = init_features(config.data.init, adjs_tensor, config.data.max_feat_num) with the features of the dataset?
def init_features(init, adjs=None, nfeat=10):

    if init=='zeros':
        feature = torch.zeros((adjs.size(0), adjs.size(1), nfeat), dtype=torch.float32, device=adjs.device)
    elif init=='ones':
        feature = torch.ones((adjs.size(0), adjs.size(1), nfeat), dtype=torch.float32, device=adjs.device)
    elif init=='deg':
        feature = adjs.sum(dim=-1).to(torch.long)
        num_classes = nfeat
        try:
            feature = F.one_hot(feature, num_classes=num_classes).to(torch.float32)
        except:
            print(feature.max().item())
            raise NotImplementedError(f'max_feat_num mismatch')
    else:
        raise NotImplementedError(f'{init} not implemented')

    flags = node_flags(adjs)

    return mask_x(feature, flags)

harryjo97 · 2023-06-02T12:31:57Z

You can change init_features in

GDSS/utils/graph_utils.py

Line 43 in 4d96334

def init_features(init, adjs=None, nfeat=10):

to take in graph_list as input and return the node features.
To be specific, each graph in the graph_list is a networkx Graph with node features.

Or you could directly modify x_tensor = init_features(config.data.init, adjs_tensor, config.data.max_feat_num) in

GDSS/utils/data_loader.py

Line 6 in 4d96334

def graphs_to_dataloader(config, graph_list):

to obtain the original node features from the networkx Graph.

Please refer to the networkx documentation for more details.

FYI, the attributed graphs of the ENZYMES dataset are loaded by this function:
https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/data/data_generators.py#LL131C13-L131C13
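As a rough sketch of the second option, the snippet below reads one scalar attribute per node from each networkx Graph and stacks the values into a zero-padded array. The helper name `node_attr_to_array` and the default key `label` are assumptions for illustration, and it assumes that node order follows `g.nodes()` and that no graph has more than `max_node_num` nodes.

```python
import networkx as nx
import numpy as np

def node_attr_to_array(graph_list, max_node_num, key="label"):
    """Stack one scalar node attribute per graph into a zero-padded array."""
    out = np.zeros((len(graph_list), max_node_num), dtype=np.float32)
    for gi, g in enumerate(graph_list):
        attrs = nx.get_node_attributes(g, key)    # {node: value}; nodes without the key are skipped
        if len(attrs) != g.number_of_nodes():
            raise KeyError(f"attribute {key!r} is missing on some nodes")
        for ni, node in enumerate(g.nodes()):     # rows follow the iteration order of g.nodes()
            out[gi, ni] = attrs[node]
    return out

# Tiny hand-made graph standing in for one dataset graph.
g = nx.Graph()
g.add_node(1, label=2)
g.add_node(2, label=3)
g.add_edge(1, 2)
arr = node_attr_to_array([g], max_node_num=4)
```

Raising an error when the key is absent makes a wrong attribute name fail loudly instead of silently producing empty features.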

lizaitang · 2023-06-03T00:36:13Z

Dear Author,
Thank you so much for your quick reply. I have a minor question: I followed the format of graph_to_tensor to load the original node features, but in for v, feature in g.nodes.data('feature') the value of feature is None. Could you please help me fix it?

def feat_to_tensor(graph_list, max_node_num, max_feat_num):
    feat_list = []

    for g in graph_list:
        assert isinstance(g, nx.Graph)

        node_feat_list = np.zeros([max_node_num, max_feat_num], dtype=float)
        i = 0
        for v, feature in g.nodes.data('feature'):
            node_feat_list[i] = feature
            i = i + 1
        # print(node_feat_list)

        feat_list.append(node_feat_list)

    del graph_list

    feat_np = np.asarray(feat_list)
    del feat_list

    adjs_tensor = torch.tensor(feat_np, dtype=torch.float32)
    del feat_np

    return adjs_tensor

harryjo97 · 2023-06-03T06:05:10Z

In the graph loader code:

GDSS/data/data_generators.py

Line 131 in 4d96334

def graph_load_batch(min_num_nodes=20, max_num_nodes=1000, name='ENZYMES', node_attributes=True, graph_labels=True):

The node labels you are looking for are stored under g.nodes.data('label'). They are set at Line 158 by G.add_node(i + 1, label=data_node_label[i]).

You may want to try g.nodes.data('label') instead of g.nodes.data('feature').
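The None values can be reproduced on a small hand-made graph: `g.nodes.data(key)` falls back to its default of None for any key that was never stored on a node. The graph below is a made-up stand-in, not ENZYMES data.

```python
import networkx as nx

g = nx.Graph()
g.add_node(1, label=2)   # only 'label' is stored on the nodes
g.add_node(2, label=3)

missing = list(g.nodes.data("feature"))   # key never stored, so each value is None
present = list(g.nodes.data("label"))     # stored key, real values come back

# Listing the attribute keys that actually exist helps to pick the right one.
keys = sorted({k for _, attrs in g.nodes(data=True) for k in attrs})
```

Here `missing` is `[(1, None), (2, None)]`, `present` is `[(1, 2), (2, 3)]`, and `keys` is `['label']`.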

lizaitang · 2023-06-03T13:03:53Z

Thanks for your reply. But if I want to generate graphs whose node features have the same format as in the original dataset, shouldn't we use the features instead of the node labels?

harryjo97 · 2023-06-03T14:29:11Z

I think the node features you want to use for the classifier are contained in the label.

lizaitang · 2023-06-04T02:45:14Z

Sorry to bother you, but when I tried label, I got:
```
[[2. 2. 2. ... 2. 2. 2.]
[2. 2. 2. ... 2. 2. 2.]
[2. 2. 2. ... 2. 2. 2.]
```

harryjo97 · 2023-06-04T04:43:53Z

First of all, the label contains values other than 2 (please see https://github.com/harryjo97/GDSS/blob/master/dataset/ENZYMES/ENZYMES_node_labels.txt).

Furthermore, if you want to use the node attributes in https://github.com/harryjo97/GDSS/blob/master/dataset/ENZYMES/ENZYMES_node_attributes.txt,
you may change the code in:

GDSS/data/data_generators.py

Line 266 in 4d96334

graphs = graph_load_batch(min_num_nodes=10, max_num_nodes=1000, name=dataset,

by setting node_attributes=True, which loads the node attributes file via
data_node_att = np.loadtxt(path + name + '_node_attributes.txt', delimiter=',')
in

GDSS/data/data_generators.py

Line 131 in 4d96334

def graph_load_batch(min_num_nodes=20, max_num_nodes=1000, name='ENZYMES', node_attributes=True, graph_labels=True):
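For reference, a small sketch of what that loading step does: `np.loadtxt` with `delimiter=','` turns a comma-separated file into an array with one row per node. The in-memory text stands in for the real attributes file, the values are made up, and attaching each row under the key `feature` is an assumption for illustration.

```python
import io

import networkx as nx
import numpy as np

# Stand-in for the attributes file: one comma-separated row per node (made-up values).
fake_file = io.StringIO("0.5,1.0,2.0\n1.5,0.0,3.0\n")
data_node_att = np.loadtxt(fake_file, delimiter=",")   # shape (num_nodes, num_attrs)

G = nx.Graph()
for i in range(data_node_att.shape[0]):
    # 1-based node ids, like the add_node call for labels; the key 'feature' is an assumption.
    G.add_node(i + 1, feature=data_node_att[i])

first_node_feature = G.nodes[1]["feature"]
```

With attributes stored this way, each node carries a real-valued vector instead of a single integer label.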
