[Enhancement] Implement the function of File Source Connector #4635

Closed
2 of 3 tasks
pandaapo opened this issue Dec 10, 2023 · 12 comments · Fixed by #4650
Labels
enhancement New feature or request

Comments

@pandaapo
Member

Search before asking

  • I had searched in the issues and found no similar issues.

Enhancement Request

At present, the File Source Connector is basically an empty implementation.

Describe the solution you'd like

Implement the function of File Source Connector.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@pandaapo pandaapo added the enhancement New feature or request label Dec 10, 2023
@HarshSawarkar
Contributor

Hi @pandaapo, I would like to work on this issue. Can I get some information on how to implement these methods (start(), commit(), stop())?

@pandaapo
Member Author

Welcome!
First, the File Source Connector's main job is to read content from files, and Java offers several ways to do that. For example, you can open an input stream on the target file in the start() method. The stop() method should then close any resources that need to be released, such as that input stream. The commit() method is mainly for recording the read position; you don't need to implement it for now, but if you can accurately track how much data has already been read, feel free to implement commit() as well. These are only suggestions, and you are free to choose other, possibly better, approaches.
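
A minimal sketch of the lifecycle described above, for illustration only. The class `FileSourceLifecycleSketch` and its `readLine()` and `committedLines()` helpers are hypothetical names; the class does not implement EventMesh's actual Source interface, and the line-count "position" kept by commit() lives only in memory, which is an assumption made to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class for illustration; not EventMesh's real Source interface.
class FileSourceLifecycleSketch {
    private final Path file;
    private BufferedReader reader;
    private long linesRead;       // lines consumed since start()
    private long committedLines;  // last position recorded by commit()

    FileSourceLifecycleSketch(Path file) {
        this.file = file;
    }

    // start(): open the input stream to the target file.
    void start() throws IOException {
        reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
        linesRead = 0;
    }

    // Reads one line, or returns null at end of file.
    String readLine() throws IOException {
        String line = reader.readLine();
        if (line != null) {
            linesRead++;
        }
        return line;
    }

    // commit(): record how far reading has progressed (in memory only here).
    void commit() {
        committedLines = linesRead;
    }

    long committedLines() {
        return committedLines;
    }

    // stop(): release the resources opened in start().
    void stop() throws IOException {
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }
}
```

A real connector would also need to persist the committed position somewhere durable so that it survives a restart.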

@HarshSawarkar
Contributor

Thanks @pandaapo for your suggestion. I have already started the implementation, but I am stuck on the poll() method. Any suggestions for implementing it would be a great help.

@pandaapo
Member Author

pandaapo commented Dec 12, 2023

I think you can refer to the implementations of the other Source Connectors, and also look at the File Sink Connector to work out which functions this Connector needs. That is all the guidance I can offer for now.
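
For readers following along, here is one rough shape a poll() could take, assuming a line-oriented file and a caller-chosen batch size. `LinePollerSketch` and `maxBatchSize` are hypothetical names, the method returns plain strings rather than EventMesh's real record type, and this is not necessarily the approach taken in the other connectors or in the eventual PR.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; returns plain strings, not connector records.
class LinePollerSketch {
    private final BufferedReader reader;
    private final int maxBatchSize;

    LinePollerSketch(BufferedReader reader, int maxBatchSize) {
        this.reader = reader;
        this.maxBatchSize = maxBatchSize;
    }

    // Returns up to maxBatchSize lines; an empty list means nothing is available right now.
    List<String> poll() throws IOException {
        List<String> batch = new ArrayList<>();
        String line;
        // Check the batch size first so a line is never read and then dropped.
        while (batch.size() < maxBatchSize && (line = reader.readLine()) != null) {
            batch.add(line);
        }
        return batch;
    }
}
```

Bounding the batch keeps each call short, so the caller can interleave polling with committing or shutting down.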

@HarshSawarkar
Contributor

Thanks for the tips! I'll check out other Source Connectors and the File Sink Connector for reference. I appreciate your input and will keep you posted on the progress.

@HarshSawarkar
Contributor

Hello @pandaapo, I have submitted a pull request. Could you please review it and suggest any changes if needed?

@VishalMCF
Contributor

VishalMCF commented Dec 13, 2023

@pandaapo @HarshSawarkar I have a theoretical question about the File Source connector, and I would appreciate any suggestions or resources to study. Because we are not implementing the commit() method, every time a file changes our connector will read the entire file and push it to the event broker. This seems inefficient, but maybe we can ignore that for now.
But suppose we do want to implement the commit() method: how would we manage the offset? For each file, we would need to store that offset persistently somewhere. From what I have read about offsets in the context of a file source, the offset can be the position of the last byte read, or perhaps a line number. What happens if the file is edited at a point before the offset? For example, the offset is at line 91 but the user edits line 22. I am not sure my understanding of the file source connector is correct, but I am trying to refine it. I hope my question is clear enough. Thanks in advance!
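
To make the offset idea concrete, here is a minimal sketch of byte-offset tracking, assuming an append-only file and a sidecar file that stores the committed offset as text. The class `OffsetTrackingReaderSketch`, the sidecar file, and the `readNew()` method are all hypothetical; none of this comes from EventMesh. It deliberately does not handle the "edit before the offset" case raised above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: resumes reading from a byte offset persisted in a sidecar file.
class OffsetTrackingReaderSketch {
    private final Path dataFile;
    private final Path offsetFile; // assumed sidecar file holding the committed offset
    private long position;

    OffsetTrackingReaderSketch(Path dataFile, Path offsetFile) throws IOException {
        this.dataFile = dataFile;
        this.offsetFile = offsetFile;
        this.position = loadOffset();
    }

    private long loadOffset() throws IOException {
        if (!Files.exists(offsetFile)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(offsetFile), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? 0L : Long.parseLong(text);
    }

    // Returns the bytes appended since the last read, decoded as UTF-8.
    // Simplification: assumes the new data fits in memory and ends on a character boundary.
    String readNew() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dataFile.toFile(), "r")) {
            long length = raf.length();
            if (length <= position) {
                return "";
            }
            byte[] buffer = new byte[(int) (length - position)];
            raf.seek(position);
            raf.readFully(buffer);
            position = length;
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }

    // commit(): persist the current offset so that a restart resumes from here.
    void commit() throws IOException {
        Files.write(offsetFile, Long.toString(position).getBytes(StandardCharsets.UTF_8));
    }
}
```

A byte offset like this only stays meaningful while everything before it is unchanged, which is exactly the limitation the question points at.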

@pandaapo
Member Author

> From what I have read about offset in the context of file source, it can be the location of the byte last read or maybe the line number. What if the file is edited from the line existing previously than offset? For eg:- (offset present at line 91 but the user edited the file at line 22).

It is indeed a complex problem. I don't think an offset alone can solve the problem of incrementally reading file changes.
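
One possible way to at least detect such edits, sketched here under my own assumptions rather than taken from EventMesh: keep a digest of the bytes already consumed next to the offset, and treat a digest mismatch as a signal to re-read the file from the start. `PrefixChangeDetectorSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch: pairs the offset with a digest of the bytes before it.
class PrefixChangeDetectorSketch {
    private long offset;
    private byte[] prefixDigest;

    // Records that bytes [0, offset) were consumed and remembers their digest.
    void checkpoint(Path file, long offset) throws IOException {
        this.offset = offset;
        this.prefixDigest = digestPrefix(file, offset);
    }

    // True when the content before the offset no longer matches the checkpoint.
    boolean prefixChanged(Path file) throws IOException {
        if (Files.size(file) < offset) {
            return true; // file was truncated below the offset
        }
        return !Arrays.equals(prefixDigest, digestPrefix(file, offset));
    }

    private static byte[] digestPrefix(Path file, long length) throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Simplification: reads the whole file into memory before hashing the prefix.
            byte[] all = Files.readAllBytes(file);
            md.update(all, 0, (int) Math.min(length, all.length));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }
}
```

This only detects that something before the offset changed; it does not say what changed, and re-hashing the prefix on every check costs time proportional to the offset.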

@HarshSawarkar
Contributor

Hi @pandaapo, I have created a new PR that incorporates the changes requested on the earlier PR. Could you please review it again and suggest changes, if any? Here's the link to the new PR: #4650

Pil0tXia pushed a commit that referenced this issue Jan 20, 2024
* Implemented the functions of file source connector.

@HarshSawarkar
Contributor

Hello @pandaapo and @Pil0tXia, I appreciate your help and support! I'm eagerly looking forward to making more contributions. By the way, do you happen to know if EventMesh will be part of GSoC 2024? Additionally, could you share some of the ideas for this year? Thanks a bunch!

@Pil0tXia
Member

@HarshSawarkar

EventMesh will participate in at least one of GSoC/OSPP/GLCC each year. You don't need to worry too much because the registration period for these three events hasn't started yet. You can subscribe to our issues mailing list, where the subjects will be announced as issues.

@HarshSawarkar
Contributor

@Pil0tXia Thanks for the update! Exciting to hear that EventMesh is participating in GSoC/OSPP/GLCC annually. I'll stay tuned for the registration period and subscribe to the issues mailing list. Appreciate your guidance!

4 participants