Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prase edge error when data props num is different #211

Merged
merged 3 commits into from
May 21, 2021

Conversation

DoubleBabylol
Copy link
Contributor

fix #209

@codecov
Copy link

codecov bot commented May 12, 2021

Codecov Report

Merging #211 (e15d397) into master (5f9aa9b) will decrease coverage by 0.01%.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #211      +/-   ##
============================================
- Coverage     74.04%   74.03%   -0.02%     
  Complexity      871      871              
============================================
  Files            80       80              
  Lines          3645     3647       +2     
  Branches        436      437       +1     
============================================
+ Hits           2699     2700       +1     
  Misses          745      745              
- Partials        201      202       +1     
Impacted Files Coverage Δ Complexity Δ
...om/baidu/hugegraph/loader/builder/EdgeBuilder.java 93.33% <66.66%> (-0.99%) 27.00 <10.00> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f9aa9b...e15d397. Read the comment docs.

@imbajin imbajin changed the title fix #209 fix: prase edge error when data props num is different May 12, 2021
if (this.vertexIdsIndex == null) {
this.vertexIdsIndex = this.extractVertexIdsIndex(names);
}
this.vertexIdsIndex = this.extractVertexIdsIndex(names);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

每次build都重新提取下标开销太大。

image

图中这个提交将属性的获取从map变成了数组下标访问的方式,你碰到的问题应该是由这个提交导致的。

请切到这个提交的前面一个提交,验证一下是否存在你的那个问题。如果不存在了,我觉得还是得改回用map访问的方式。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改回map的话 ,又回到了每行json创建map的情况。
如果对names进行判断,names代表json数据的key值,他没有变化就说明这行数据的索引位置和数值都没变,那么就只有在json的key变化的时候重新计算就可以了

Copy link
Contributor

@Linary Linary May 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

文件是Json的时候,现有代码是一定会生成一次map的,性能无影响的;受影响的是文件是CSV或TEXT格式的时候。
所以现在的方案就是两个:

  1. 所有文件格式都先生成map,无需提取数组索引,这样CSV、TEXT有性能损耗,JSON无影响;
  2. CSV、TEXT直接生成数组,JSON先生成map再转成数组,然后CSV、TEXT无性能损耗,JSON需要对比和重新提取,有损耗。

综合考虑,建议采用方案2,可做这样修改:

  1. 文件是CSV和TEXT的时候,names其实每次都是header引用的那个对象,所以不会有变化,不用重新提取索引;每次都比较(或者用==比较的方式);
  2. 文件是JSON格式时,先对比,如果有改变就重新提取。

代码大概是:

private String[] lastNames;

...

public List<Edge> build(String[] names, Object[] values) {
        if (this.vertexIdsIndex == null || 
            (this.lastNames != names && !Arrays.equals(this.lastNames, names))) {
            this.vertexIdsIndex = this.extractVertexIdsIndex(names);
        }
        this.lastNames = names;
        ...
}

if (this.vertexIdsIndex == null) {
this.vertexIdsIndex = this.extractVertexIdsIndex(names);
}
this.vertexIdsIndex = this.extractVertexIdsIndex(names);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

提交前先rebase一下master分支

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep original, use this config file with IDEA

@DoubleBabylol DoubleBabylol reopened this May 20, 2021
@@ -69,9 +71,12 @@ public EdgeMapping mapping() {

@Override
public List<Edge> build(String[] names, Object[] values) {
if (this.vertexIdsIndex == null) {
if (this.vertexIdsIndex == null ||
(this.lastNames != names &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to compare address because Arrays.equals() will compare them first

@Linary Linary merged commit ba85558 into apache:master May 21, 2021
@imbajin imbajin mentioned this pull request May 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

边json数据导入格式不统一时数组溢出
3 participants