Chinese-to-English (Help Wanted) #914

Open
krahets opened this issue Nov 2, 2023 · 25 comments · Fixed by #994
Labels
translation English translation

Comments

@krahets
Owner

krahets commented Nov 2, 2023

We are working on translating "Hello Algo" from Chinese to English with the following approach:

  1. AI translation: Produce a first-pass translation with a machine translation model.
  2. Human optimization: Manually refine the machine-generated output to ensure authenticity and accuracy.
  3. Pull request review: Reviewers double-check the refined translation through the GitHub pull request workflow.
  4. Repeat steps 2 and 3 for further improvements.

Join us

We're seeking contributors who meet the following criteria.

  • Technical background: Strong foundation in computer science, particularly in data structures and algorithms.
  • Language skills: Native proficiency in Chinese with professional-level English, or native English.
  • Available time: Dedicated to contributing to open-source projects with a willingness to engage in long-term translation efforts.

In other words, our contributors are computer scientists, engineers, and students from different linguistic backgrounds, and each group has a different focus:

  • Native Chinese with professional working English: Ensure translation accuracy and consistency between the CN and EN versions.
  • Native English: Enhance the authenticity and fluency of the English content so that it reads naturally and is engaging.

Tip

  • If you are interested in joining us, don't hesitate to comment on this issue or contact me via krahetx@gmail.com or WeChat krahets-jyd.
  • Please read the contributing guidelines before submitting or reviewing pull requests.
  • We use this Notion page to track progress and assign tasks. Please visit it for more details.
@krahets
Owner Author

krahets commented Nov 11, 2023

Check out the following PR for more clarity on the workflow:

@dxtym

dxtym commented Feb 3, 2024

Hi! I don't know Chinese, but can I contribute to this? I have a formal background in CS. Thank you!

@krahets
Owner Author

krahets commented Feb 3, 2024

> Hi! I don't know Chinese, but can I contribute to this? I have a formal background in CS. Thank you!

Welcome! I think you can engage in PR reviewing, focusing on optimizing fluency and authenticity, if you’re proficient in English. Is English your first language?

@dxtym

dxtym commented Feb 3, 2024

> > Hi! I don't know Chinese, but can I contribute to this? I have a formal background in CS. Thank you!
>
> Welcome! I think you can engage in PR reviewing, focusing on optimizing fluency and authenticity, if you’re proficient in English. Is English your first language?

Not really, but I'm quite proficient.

@krahets
Owner Author

krahets commented Feb 4, 2024

@thisisdilmurod Great! Please add my WeChat: krahets-jyd (if you use it) and join us on Discord

@dxtym

dxtym commented Feb 5, 2024

> @thisisdilmurod Great! Please add my WeChat: krahets-jyd (if you use it) and join us on Discord

Thank you! But I'm afraid the link to Discord above looks expired. Can you send it again, please?

@krahets
Owner Author

krahets commented Feb 8, 2024

> > @thisisdilmurod Great! Please add my WeChat: krahets-jyd (if you use it) and join us on Discord
>
> Thank you! But I'm afraid the link to Discord above looks expired. Can you send it again, please?

Sorry for the inconvenience. Please try this link: https://discord.gg/nvspS56295

@sheng0321

> A background in computer science, whether as a student, engineer, or researcher.

Hello, I hope I can help.

@SHUANGBRO888

Hello, I am willing to help.

@frankliuao

Hey I'm happy to help too.
I've been working in the U.S. since 2010 and I'm currently a TPM (Technical Project/Program Manager) at the University of Chicago. I'm working on SaaS products and I work closely with an awesome team of developers, platform engineers, and SecOps. I used to be a developer myself and I still enjoy doing development work regularly. I can do manual translation, proofreading, ChatGPT + Gemini + Meta AI translation comparison, etc.
I admire and strongly respect your work, @krahets . This is a great project and I am happy to help in any way.

@krahets
Owner Author

krahets commented May 3, 2024

@frankliuao Thanks for your interest and kind words about this book! Could you contact me via WeChat or Discord to discuss the details further?

@MENG2010

Hi Krahets (@krahets),

Thank you for this awesome project. I'm happy to help too.

I've been studying in the US since 2015 and I'm currently a Ph.D. student in Computer Science at George Mason University. My research interests include trustworthy machine learning and software security. I can assist with manual translations, proofreading, and AI translation comparison (using ChatGPT and ClaudeAI).

Thank you for your time.

@krahets
Owner Author

krahets commented Jul 12, 2024

Hi, @MENG2010, thanks for your interest! You're welcome to join us! Could you please add my WeChat: krahets-jyd?

@umer77jahangir

Hello! Thanks for this opportunity to collaborate with you. I am a third-year computer science bachelor's student. Although English is not my first language, I am proficient in it. In addition, I work as a front-end developer and content writer, having written several articles for other people's blogs. I believe my skills will be useful for this project. On GitHub, I also provide leetcode-feedback.
There is a problem with the previous links, so if you are interested, please send an invite link.

@tinatsina

Hello 👋
My name is Tinaye (天籁). I am an Embedded Engineer here in China, and I work on stuff like this often. Mostly translating and improving Datasheets and Register Map documents. I am fluent in English, and my Chinese is "okay" 😓 .

Hope to join you on this project as well. I can assist with proofreading and fine-tuning text to be more in line with standard documentation.

@Huilin-Li

Huilin-Li commented Jul 18, 2024

> > > @thisisdilmurod Great! Please add my WeChat: krahets-jyd (if you use it) and join us on Discord
> >
> > Thank you! But I'm afraid the link to Discord above looks expired. Can you send it again, please?
>
> Sorry for the inconvenience. Please try this link: https://discord.gg/nvspS56295

Hi, the Discord link has expired again.

@krahets
Owner Author

krahets commented Jul 18, 2024

@Huilin-Li Updated

@krahets
Owner Author

krahets commented Jul 18, 2024

Welcome, @tinatsina and @umer77jahangir! Do you use WeChat? If so, please add me: krahets-jyd.

@umer77jahangir

No, I don't use it. Please send a link that has not expired.

@vampirepapi

Hey! I don't know Chinese, but can I still contribute to this? I know English and have a good grasp of CS.

@ofou

ofou commented Jul 27, 2024

Is there any way to keep track of the translation status? 👀

I used this script to quickly compare the number of lines in each file under the 'docs' directory for both the Chinese and English versions of the content. While line count is not an ideal metric for translation progress, it provides a rough estimate of differences between the two versions.

#!/bin/bash

# Function to get file counts
get_file_counts() {
    cd "$1" || exit
    find . -name "*.md" -print0 | xargs -0 wc -l | sort -n
    cd - > /dev/null || exit
}

# Get counts for both directories
en_counts=$(get_file_counts "./en/docs")
zh_counts=$(get_file_counts "./docs")

# Combine and format as a markdown table, showing only differences
echo "| File | ZH Lines | EN Lines | Difference |"
echo "| ---- | -------- | -------- | ---------- |"

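awk '
# Paragraph mode: with RS="" and FS="\n", each input (which contains no blank
# lines) is read as a single record whose fields are its lines.
BEGIN {FS="\n"; RS=""}
{
    for (i=1; i<=NF; i++) {
        # Each line looks like "<count> ./<path>"; paths with spaces are not handled.
        split($i, a, " ")
        file = a[2]
        sub(/^\.\//, "", file)
        if (NR == 1) {
            # First record: the Chinese counts
            zh_files[file] = a[1]
        } else {
            # Second record: the English counts
            en_files[file] = a[1]
        }
    }
}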
END {
    for (file in zh_files) {
        if (file in en_files) {
            diff = zh_files[file] - en_files[file]
            if (diff != 0) {
                printf "| %s | %s | %s | %d |\n", file, zh_files[file], en_files[file], diff
            }
        } else {
            printf "| %s | %s | - | %s |\n", file, zh_files[file], zh_files[file]
        }
    }
    for (file in en_files) {
        if (!(file in zh_files)) {
            printf "| %s | - | %s | -%s |\n", file, en_files[file], en_files[file]
        }
    }
}
' <(echo "$zh_counts") <(echo "$en_counts") | sort -t '|' -k5 -n

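# Print totals (after sort -n, the "total" line from wc is the last line)
zh_total=$(echo "$zh_counts" | tail -n 1 | awk '{print $1}')
en_total=$(echo "$en_counts" | tail -n 1 | awk '{print $1}')
total_diff=$((zh_total - en_total))
echo "Total difference (ZH - EN): $total_diff"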

The result of that script is a markdown table like this:

| File | ZH Lines | EN Lines | Difference |
| ---- | -------- | -------- | ---------- |
| chapter_array_and_linkedlist/summary.md | 76 | 81 | -5 |
| chapter_computational_complexity/performance_evaluation.md | 49 | 48 | 1 |
| chapter_data_structure/summary.md | 66 | 65 | 1 |
| chapter_tree/array_representation_of_tree.md | 166 | 164 | 2 |
| chapter_graph/graph_traversal.md | 140 | 136 | 4 |
| chapter_data_structure/basic_data_types.md | 181 | 170 | 11 |
| chapter_tree/avl_tree.md | 364 | 353 | 11 |
| chapter_introduction/summary.md | 22 | 9 | 13 |
| chapter_preface/suggestions.md | 252 | 239 | 13 |
| chapter_array_and_linkedlist/array.md | 235 | 221 | 14 |
| chapter_backtracking/backtracking_algorithm.md | 509 | 489 | 20 |
| chapter_tree/binary_tree.md | 688 | 662 | 26 |
| chapter_stack_and_queue/stack.md | 436 | 389 | 47 |
| chapter_stack_and_queue/queue.md | 429 | 381 | 48 |
| chapter_hashing/hash_algorithm.md | 416 | 366 | 50 |
| chapter_stack_and_queue/deque.md | 458 | 405 | 53 |
| chapter_hashing/hash_map.md | 603 | 537 | 66 |
| chapter_paperbook/index.md | 68 | - ¹ | 68 |
| chapter_array_and_linkedlist/linked_list.md | 761 | 686 | 75 |
| chapter_computational_complexity/space_complexity.md | 898 | 803 | 95 |
| chapter_computational_complexity/time_complexity.md | 1224 | 1112 | 112 |
| chapter_array_and_linkedlist/list.md | 1034 | 906 | 128 |
| total | 14570 | 13717 | 853 |

Then we can inspect by file using something like this:

diff \
  --width="$COLUMNS" \
  --side-by-side \
  --color=always \
  --expand-tabs \
  en/docs/chapter_array_and_linkedlist/summary.md \
  docs/chapter_array_and_linkedlist/summary.md 

Now we can see where the two versions differ:

# Summary                                               |  # 小结
                                                           
### Key review                                          |  ### 重点回顾
                                                           
- Arrays and linked lists are two basic data structures |  - 数组和链表是两种基本的数据结构,分�
- Arrays support random access and use less memory; how |  - 数组支持随机访问、占用内存较少;但�
- Linked lists implement efficient node insertion and d |  - 链表通过更改引用(指针)实现高效的�
- Common types of linked lists include singly linked li |  - 列表是一种支持增删查改的元素有序集�
- Lists are ordered collections of elements that suppor |  - 列表的出现大幅提高了数组的实用性,�
- The advent of lists significantly enhanced the practi |  - 程序运行时,数据主要存储在内存中。�
- During program execution, data is mainly stored in me |  - 缓存通过缓存行、预取机制以及空间局�
- Caches provide fast data access to CPUs through mecha |  - 由于数组具有更高的缓存命中率,因此�
- Due to higher cache hit rates, arrays are generally m <  
                                                           
### Q & A                                                  ### Q & A
                                                           
**Q**: Does storing arrays on the stack versus the heap |  **Q**:数组存储在栈上和存储在堆上,对�
                                                           
Arrays stored on both the stack and heap are stored in  |  存储在栈上和堆上的数组都被存储在连续�
                                                           
1. Allocation and release efficiency: The stack is a sm |  1. 分配和释放效率:栈是一块较小的内存�
2. Size limitation: Stack memory is relatively small, w |  2. 大小限制:栈内存相对较小,堆的大小�
3. Flexibility: The size of arrays on the stack needs t |  3. 灵活性:栈上的数组的大小需要在编译�
                                                           
**Q**: Why do arrays require elements of the same type, |  **Q**:为什么数组要求相同类型的元素,�
                                                           
Linked lists consist of nodes connected by references ( |  链表由节点组成,节点之间通过引用(指�
                                                           
In contrast, array elements must be of the same type, a |  相对地,数组元素则必须是相同类型的,�
                                                           
```shell                                                   ```shell
# Element memory address = array memory address + eleme |  # 元素内存地址 = 数组内存地址(首元素�
```                                                        ```
                                                           
**Q**: After deleting a node, is it necessary to set `P |  **Q**:删除节点 `P` 后,是否需要把 `P.next`
                                                           
Not modifying `P.next` is also acceptable. From the per |  不修改 `P.next` 也可以。从该链表的角度看
                                                           
From a garbage collection perspective, for languages wi |  从数据结构与算法(做题)的角度看,不�
                                                           
**Q**: In linked lists, the time complexity for inserti |  **Q**:在链表中插入和删除操作的时间复�
                                                           
If an element is searched first and then deleted, the t |  如果是先查找元素、再删除元素,时间复�
                                                           
**Q**: In the figure "Linked List Definition and Storag |  **Q**:图“链表定义与存储方式”中,浅�
                                                           
The figure is just a qualitative representation; quanti |  该示意图只是定性表示,定量表示需要根�
                                                           
- Different types of node values occupy different amoun |  - 不同类型的节点值占用的空间是不同的�
- The memory space occupied by pointer variables depend |  - 指针变量占用的内存空间大小根据所使�
                                                           
**Q**: Is adding elements to the end of a list always ` |  **Q**:在列表末尾添加元素是否时时刻刻�
                                                           
If adding an element exceeds the list length, the list  |  如果添加元素时超出列表长度,则需要先�
                                                           
**Q**: The statement "The emergence of lists greatly im |  **Q**:“列表的出现极大地提高了数组的�
                                                           
The space wastage here mainly refers to two aspects: on |  这里的空间浪费主要有两方面含义:一方�
                                                           
**Q**: In Python, after initializing `n = [1, 2, 3]`, t |  **Q**:在 Python 中初始化 `n = [1, 2, 3]` 后,�
                                                           
If we replace list elements with linked list nodes `n = |  假如把列表元素换成链表节点 `n = [n1, n2, n
                                                           
Unlike many languages, in Python, numbers are also wrap |  与许多语言不同,Python 中的数字也被包装
                                                           
**Q**: The `std::list` in C++ STL has already implement |  **Q**:C++ STL 里面的 `std::list` 已经实现了�
                                                           
On the one hand, we often prefer to use arrays to imple |  一方面,我们往往更青睐使用数组实现算�
                                                           
- Space overhead: Since each element requires two addit |  - 空间开销:由于每个元素需要两个额外�
- Cache unfriendly: As the data is not stored continuou |  - 缓存不友好:由于数据不是连续存放的�
                                                           
On the other hand, linked lists are primarily necessary |  另一方面,必要使用链表的情况主要是二�
                                                           
**Q**: Does initializing a list `res = [0] * self.size( |  **Q**:初始化列表 `res = [0] * self.size()` 操�
                                                           
No. However, this issue arises with two-dimensional arr |  不会。但二维数组会有这个问题,例如初�
                                                        <  
**Q**: In deleting a node, is it necessary to break the <  
                                                        <  
From the perspective of data structures and algorithms  <  

In my humble opinion, it makes sense to have a side-by-side translation (with an equal number of lines) because of the significant differences between Mandarin Chinese (普通话) and English.
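
If the two versions are meant to stay line-aligned, a quick per-pair check can flag drift before running the full side-by-side diff. This is only a minimal sketch, assuming a POSIX shell with `wc`; the function name `line_diff` is hypothetical and not part of the repository.

```shell
# line_diff (hypothetical helper): print (lines in $1) - (lines in $2).
# A result of 0 means both files have the same number of lines.
line_diff() {
    a=$(wc -l < "$1")   # line count of the first file
    b=$(wc -l < "$2")   # line count of the second file
    echo $((a - b))
}
```

For example, `line_diff docs/chapter_array_and_linkedlist/summary.md en/docs/chapter_array_and_linkedlist/summary.md` should print the same difference the table above reports for that file, provided the files have not changed since.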

By the way, excellent book! I can't wait to read the English PDF/EPUB3 ASAP!

Footnotes

  1. These checks will output rows for any files that exist in one language version but not in the other. The output will show "-" in the column for the language where the file is missing.
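
The missing-file check described in this footnote can also be run on its own, without counting lines. The following is a minimal sketch, assuming the `docs` and `en/docs` layout used in the script above and a POSIX shell; the function name `only_in_first` is hypothetical.

```shell
# only_in_first (hypothetical helper): print the relative paths of .md files
# that exist under directory $1 but have no counterpart under directory $2.
only_in_first() {
    # The subshell changes directory only for find; the loop below still
    # runs in the caller's working directory, so "$2" resolves as given.
    (cd "$1" && find . -name "*.md" | sort) | while IFS= read -r f; do
        [ -e "$2/$f" ] || printf '%s\n' "${f#./}"
    done
}
```

Under these assumptions, `only_in_first docs en/docs` would list Chinese files that have no English counterpart, and swapping the arguments would list the reverse.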

@krahets
Owner Author

krahets commented Oct 24, 2024

Hi @ofou, sorry I didn't notice your comment. The analysis is great! Indeed, synchronizing the English and Chinese versions is not an easy task. Would you like to join us? I hope we can discuss this in detail.

@ofou

ofou commented Oct 24, 2024

Hey @krahets, thanks for reaching out! I'd love to contribute to this project, but I barely know basic Mandarin. However, I can help with the English version and ensure that the content is well-aligned with the original Chinese version.

@krahets
Owner Author

krahets commented Oct 29, 2024

@ofou Welcome aboard! We hope to involve native English speakers to make the English version more authentic. The task of synchronizing the Chinese and English versions will be handled by other Chinese translators.

Do you use WeChat? Please add me (krahets-jyd), and I will invite you to the group chat.

@ofou

ofou commented Oct 30, 2024

@krahets I added you; mine is omarnomad. I'm a noob at WeChat, but let's do it.
