Input Chinese characters, and the application finds optimized component paths connecting them. It's like melting a word, or traversing a banyan grove by following the branches between its prop-trunks.
With appreciation and respect for the late poet Yungtze (1922-2021).
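```python
import rongzi

rz = rongzi.RongZi()
df = rz.analyze_sequence('永遠的青鳥')  # "the eternal blue bird"
print(df.to_markdown())  # pandas' to_markdown() needs the optional tabulate package
```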
|   | 永遠 | 遠的 | 的青 | 青鳥 |
|---|---|---|---|---|
| 0 | 永栐木榬袁遠 | 遠薳艹菂的 | 的白皘青 | 青鶄鳥 |
| 1 | 永水閖門闧遠 | 遠辶迫白的 | 的白胉月青 | 青聙耳鵈鳥 |
| 2 | 永詠言誾門闧遠 | 遠袁鎱金鉑白的 | 的白日晴青 | 青錆金鵭鳥 |
| 3 | 永詠言這辶遠 | 遠袁褤衤袙白的 | 的白鲌鱼鲭青 | 青靖立鴗鳥 |
| 4 | 永泳氵溒袁遠 | 遠袁𠮷口啲的 | 的白鉑金錆青 | 青鯖魚鷠鳥 |
- For basic usage rongzi requires:
  - Python version 3.9 or newer.
  - Third-party libraries pandas (1.3.4 or newer) and numpy.
- For graph visualization rongzi additionally needs graphviz.
  - (graphviz can be a trickier install, so I split this functionality out into a separate library, starting with the Streamlit app app.py.)
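Because graphviz is optional, one lightweight way to visualize a single component path is to emit plain DOT text and render it with the Graphviz `dot` command. The sketch below is an illustration only, not rongzi's own visualization code: `path_to_dot` is a hypothetical helper, and its input is simply a string of characters such as row 0 of the 永遠 column in the example output.

```python
def path_to_dot(path, name="rongzi_path"):
    """Return Graphviz DOT source for a path given as a sequence of characters.

    Hypothetical helper for illustration; not part of rongzi's API.
    """
    lines = [f"digraph {name} {{", "    rankdir=LR;", "    node [shape=circle];"]
    # One directed edge per step between consecutive characters in the path.
    for src, dst in zip(path, path[1:]):
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Row 0 of the 永遠 column in the example output.
    print(path_to_dot("永栐木榬袁遠"))
```

Saving the printed text as `path.dot` and running `dot -Tsvg path.dot -o path.svg` renders it; the `graphviz` Python package's `Source` class can wrap the same DOT text if you prefer to stay in Python.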
- Query the CCD database of Chinese character radical breakdowns, modeled as a directed graph.
- Develop a process that, given a Chinese poem as input, finds an optimized path stringing its characters together by moving from character to character via shared components and intermediate characters, while:
  - using the fewest possible steps,
  - avoiding basic and compound strokes as components that link characters, and
  - minimizing duplication of characters.
- Present it as a hosted interactive web app.
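The path-finding goals above can be sketched as a breadth-first search, which returns a fewest-step path between two characters. This is a minimal illustration under stated assumptions, not rongzi's actual algorithm: the decomposition table is a tiny hand-written toy rather than the CCD data, the stroke set is an assumed placeholder, decomposition edges are walked in both directions (whole to component and component to whole), and the `seen` set keeps any character from appearing twice in a path. The names `DECOMPOSITIONS`, `STROKES`, `build_neighbors`, `shortest_path`, and `analyze` are all hypothetical.

```python
from collections import deque

# Toy, hand-written decomposition data (hypothetical; NOT the real CCD):
# each character maps to the components it is built from.
DECOMPOSITIONS = {
    "栐": ["木", "永"],
    "榬": ["木", "袁"],
    "遠": ["辶", "袁"],
    "詠": ["言", "永"],
    "這": ["辶", "言"],
    "永": ["丶"],  # deliberately simplified so that 永 and 辶 share the dot stroke
    "辶": ["丶"],
}

# Basic strokes that must not be used as links (assumed placeholder set).
STROKES = {"丶", "一", "丨", "丿", "乙"}


def build_neighbors(decomp):
    """Make decomposition edges traversable in both directions."""
    neighbors = {}
    for whole, parts in decomp.items():
        for part in parts:
            neighbors.setdefault(whole, set()).add(part)
            neighbors.setdefault(part, set()).add(whole)
    return neighbors


def shortest_path(start, goal, neighbors, banned=STROKES):
    """Breadth-first search for a fewest-step path that avoids banned nodes."""
    queue = deque([[start]])
    seen = {start}  # no character is visited (or repeated) twice
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbors.get(path[-1], ())):  # sorted for determinism
            if nxt in seen or nxt in banned:
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no path exists in this graph


def analyze(text, decomp=DECOMPOSITIONS):
    """Map each adjacent character pair to one shortest linking path (or None)."""
    neighbors = build_neighbors(decomp)
    return {a + b: shortest_path(a, b, neighbors) for a, b in zip(text, text[1:])}
```

On this toy data, `"".join(shortest_path("永", "遠", build_neighbors(DECOMPOSITIONS)))` gives 永栐木榬袁遠. Passing `banned=set()` instead lets the search cut through the shared dot stroke and return the shorter but uninformative 永丶辶遠, which is the kind of link the stroke rule is meant to exclude. Producing several ranked alternatives per pair, as in the five rows of the example output, would need more than a single BFS (for example, a k-shortest-paths search).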
熔字,róngzì
蓉子,Róngzi
榕子,róngzi
榮。róng
Molten words,
Yungtze,
a banyan seed,
and honor.
- Refactor (ccd, pdb, kdb) as class property methods:
  - Allow choosing between CJK languages/scripts (CH, JA, TW).
- Create a CI/CD pipeline using GitHub Actions.
- Dockerize the application.
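One way the first to-do item could look, assuming "class property methods" means lazily loaded, cached attributes, is Python's `functools.cached_property` (available in the Python 3.9+ the project already requires): each data source is loaded on first access and reused afterwards. This is a hypothetical sketch; `RongZiSketch`, `_load`, `load_log`, and the placeholder return values are invented for illustration, and the language codes simply mirror the CH/JA/TW list above.

```python
from functools import cached_property


class RongZiSketch:
    """Hypothetical layout: each data source is a lazily loaded, cached property."""

    LANGUAGES = ("CH", "JA", "TW")

    def __init__(self, language="TW"):
        if language not in self.LANGUAGES:
            raise ValueError(f"language must be one of {self.LANGUAGES}, got {language!r}")
        self.language = language
        self.load_log = []  # records which sources were actually loaded

    def _load(self, name):
        # Placeholder: real code would read and parse the data file for `name`.
        self.load_log.append(name)
        return {"source": name, "language": self.language}

    @cached_property
    def ccd(self):
        return self._load("ccd")

    @cached_property
    def pdb(self):
        return self._load("pdb")

    @cached_property
    def kdb(self):
        return self._load("kdb")
```

Because `cached_property` stores the computed value in the instance's `__dict__`, repeated access to `rz.ccd` skips the loader after the first call, and an instance that never touches `kdb` never loads it.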