Describe the bug
ViterbiSegment does not correctly replace its DoubleArrayTrie when loading a custom dictionary.
Code to reproduce the issue
com/hankcs/hanlp/seg/Viterbi/ViterbiSegment.java
    private void loadCustomDic(String customPath, boolean isCache)
    {
        if (TextUtility.isBlank(customPath))
        {
            return;
        }
        // Log message: "Start loading custom dictionary:"
        logger.info("开始加载自定义词典:" + customPath);
        DoubleArrayTrie<CoreDictionary.Attribute> dat = new DoubleArrayTrie<CoreDictionary.Attribute>();
        String path[] = customPath.split(";");
        String mainPath = path[0];
        StringBuilder combinePath = new StringBuilder();
        for (String aPath : path)
        {
            combinePath.append(aPath.trim());
        }
        File file = new File(mainPath);
        // Cache file name is derived from the hash of all dictionary paths combined
        mainPath = file.getParent() + "/" + Math.abs(combinePath.toString().hashCode());
        mainPath = mainPath.replace("\\", "/");
        // dat is filled here, but it is a local variable and is never assigned
        // to the dictionary the segmenter uses (the reported bug)
        DynamicCustomDictionary.loadMainDictionary(mainPath, path, dat, isCache, config.normalization);
    }
com/hankcs/hanlp/seg/SegmentTest.java
    public void testExtendViterbi() throws Exception
    {
        HanLP.Config.enableDebug(false);
        String path = System.getProperty("user.dir") + "/" + "data/dictionary/custom/CustomDictionary.txt;" +
            System.getProperty("user.dir") + "/" + "data/dictionary/custom/全国地名大全.txt";
        path = path.replace("\\", "/");
        String text = "一半天帕克斯曼是走不出丁字桥镇的";
        Segment segment = HanLP.newSegment().enableCustomDictionary(false);
        Segment seg = new ViterbiSegment(path);
        // Result with the custom dictionary disabled
        System.out.println("不启用字典的分词结果:" + segment.seg(text));
        // Result of the default segmenter
        System.out.println("默认分词结果:" + HanLP.segment(text));
        seg.enableCustomDictionaryForcing(true).enableCustomDictionary(true);
        List<Term> termList = seg.seg(text);
        // Result with the custom dictionaries passed to ViterbiSegment
        System.out.println("自定义字典的分词结果:" + termList);
    }
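Describe the current behavior
CustomDictionary.txt and 全国地名大全.txt are loaded, and together they should contain the entry '丁字桥镇', but the actual segmentation result does not split it out as a word.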
Expected behavior
The entry '丁字桥镇' should be segmented as a single word.
System information
Other info / logs
In com/hankcs/hanlp/seg/Viterbi/ViterbiSegment.java, loadCustomDic(String customPath, boolean isCache) should replace the corresponding dictionary once the DoubleArrayTrie has been loaded.
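The missing step can be illustrated with a minimal, self-contained sketch. It does not use HanLP's classes: a TreeMap stands in for DoubleArrayTrie, and the names DictionaryReplaceSketch, loadWithoutReplace and loadAndReplace are hypothetical. It is not the change made in the pull request, only an illustration of the pattern described in this issue, where a dictionary is loaded into a local object that the segmenter never sees.

```java
import java.util.Map;
import java.util.TreeMap;

public class DictionaryReplaceSketch {
    // The place-name entry from this issue, written with Unicode escapes.
    public static final String ENTRY = "\u4e01\u5b57\u6865\u9547";

    // Stand-in for the dictionary the segmenter consults (assumption:
    // the real code holds a DoubleArrayTrie, not a TreeMap).
    private Map<String, String> dictionary = new TreeMap<>();

    // Pattern described in the issue: the loaded data stays in a local
    // variable and is discarded when the method returns.
    public void loadWithoutReplace(String... words) {
        Map<String, String> loaded = new TreeMap<>();
        for (String word : words) {
            loaded.put(word, "ns"); // "ns" is a placeholder attribute
        }
        // Missing step: 'dictionary' still points at the old, empty map.
    }

    // Same loading, followed by replacing the dictionary in use.
    public void loadAndReplace(String... words) {
        Map<String, String> loaded = new TreeMap<>();
        for (String word : words) {
            loaded.put(word, "ns");
        }
        this.dictionary = loaded;
    }

    public boolean contains(String word) {
        return dictionary.containsKey(word);
    }

    public static void main(String[] args) {
        DictionaryReplaceSketch withoutReplace = new DictionaryReplaceSketch();
        withoutReplace.loadWithoutReplace(ENTRY);
        System.out.println(withoutReplace.contains(ENTRY)); // prints false

        DictionaryReplaceSketch withReplace = new DictionaryReplaceSketch();
        withReplace.loadAndReplace(ENTRY);
        System.out.println(withReplace.contains(ENTRY)); // prints true
    }
}
```

In the sketch, only the variant that assigns the loaded map to the field makes the entry visible to lookups, which mirrors why '丁字桥镇' is not segmented in the test above.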
See pull request #1835 for details. Please point out anything that falls short. Thanks.
Merged. Thanks for the PR!
hankcs