Some machine learning to adjust the grid size of components when they are composed would be nice...
though I believe that's clearly future work.
About decomposition: https://www.babelstone.co.uk/CJK/index.html -> ids.txt
There's an actively maintained dataset there in IDS (Ideographic Description Sequence) format; not sure if you're already using it. The entries could easily be parsed and transformed directly into your format, though I'd expect quite a few of them to need hand correction. Visualizing them with this project may also help identify errors.
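A minimal Python sketch of the parsing step, assuming each data line of ids.txt is tab-separated as code point, character, then one or more IDS fields wrapped like `^⿱宀匕$(GHTJKPV)`, with `#` comment lines. That layout is my assumption, so check the file's own header. `parse_ids` and `read_ids_line` are hypothetical helper names. The sketch only knows the twelve original IDCs (U+2FF0 to U+2FFB), so later additions to the IDC set are not handled, and every other code point is treated as a leaf component. Raising on truncated or over-long sequences is deliberate: it gives a cheap first pass at finding entries that need hand correction.

```python
import re

# Arity of the original Unicode IDCs U+2FF0..U+2FFB: all binary except ⿲ and ⿳.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # ⿲ left to middle and right
IDC_ARITY["\u2ff3"] = 3  # ⿳ above to middle and below

# ASSUMED wrapping of one IDS field, e.g. "^⿱宀匕$(GHTJKPV)".
WRAPPED = re.compile(r"\^([^$]*)\$")


def parse_ids(ids):
    """Parse one IDS string into a tree.

    A leaf is a single component character; an inner node is the tuple
    (idc, child, child) or (idc, child, child, child).
    Raises ValueError on truncated or over-long sequences.
    """
    def parse(pos):
        if pos >= len(ids):
            raise ValueError(f"truncated IDS: {ids!r}")
        ch = ids[pos]
        if ch not in IDC_ARITY:
            return ch, pos + 1  # any non-IDC code point is treated as a leaf
        children = []
        pos += 1
        for _ in range(IDC_ARITY[ch]):
            child, pos = parse(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def read_ids_line(line):
    """Return (character, [trees]) for a data line, or None for comment/blank lines.

    ASSUMES the layout "U+XXXX<TAB>char<TAB>IDS[<TAB>IDS...]".
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    trees = []
    for field in fields[2:]:
        if not field:
            continue  # tolerate trailing tabs
        m = WRAPPED.match(field)
        trees.append(parse_ids(m.group(1) if m else field))
    return fields[1], trees
```

Wrapping each `read_ids_line` call in a `try`/`except ValueError` while looping over the file would produce a list of malformed lines to review, before anything gets rendered.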
Personally, I suggest not breaking the components up too finely/deeply. Down to some mid-level, components are most likely essentially standalone (look them up on zdic.net or hanziyuan.net, for example), but they may have been generalized and merged so that they look like combinations of semantically unrelated sub-shapes during 隸變 (the transition to clerical script) and 楷化 (regularization into standard script). In other words, there's a good chance of some fake orthogonality. For example, 它→宀匕, 宁→宀丁, 寅→宀?, but in fact none of them is really composed with 宀. Also, too many levels of composition degrade output quality.
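One way to follow this when importing a dataset is to cap how far decompositions get expanded and to keep a stop-list of characters that should stay whole. A small Python sketch, assuming decompositions are stored as trees where a leaf is a component string and a node is `(idc, child, ...)`; `expand`, `keep_whole` and `max_depth` are hypothetical names, and which characters belong on the stop-list (e.g. 它, 宁, 寅) would still have to be decided by hand from sources like zdic.net or hanziyuan.net.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical tree shape: a leaf is a component string,
# an inner node is (idc, child, child[, child]).
Tree = Union[str, Tuple]


def expand(char: str, decomp: Dict[str, Tree], keep_whole: Set[str], max_depth: int) -> Tree:
    """Expand `char` through `decomp`, at most `max_depth` levels deep.

    Characters in `keep_whole` (a hypothetical hand-curated stop-list, e.g. ones
    whose apparent split is a fake one left over from script evolution) are
    never broken up.
    """
    if max_depth <= 0 or char in keep_whole:
        return char
    tree = decomp.get(char)
    if tree is None or tree == char:  # unknown, or atomic (decomposes to itself)
        return char
    return _expand_tree(tree, decomp, keep_whole, max_depth - 1)


def _expand_tree(tree: Tree, decomp: Dict[str, Tree], keep_whole: Set[str], depth: int) -> Tree:
    """Expand every leaf of `tree` with the remaining depth budget."""
    if isinstance(tree, str):
        return expand(tree, decomp, keep_whole, depth)
    idc, *children = tree
    return (idc, *(_expand_tree(c, decomp, keep_whole, depth) for c in children))
```

For example, with 蛇 stored as ⿰虫它 and 它 as ⿱宀匕, putting 它 in `keep_whole` makes `expand` stop at ⿰虫它 instead of continuing into the fake 宀 split; `max_depth` bounds the number of composition levels regardless of what the dataset contains.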