File Sink Output - JSON Lines type - Not completing the last log line at the end of the file #2343
Comments
@canob Appreciate the detailed description. Are you trying to use files to transfer data between rules? That is not the purpose of the file sink: it is not designed to write each JSON output to the file immediately (which is what "completing the last line" would require). Instead, the file sink is meant to save data in batches. To transfer data between rules, try the memory sink/source pair; see https://ekuiper.org/docs/en/latest/guide/rules/rule_pipeline.html. The REST error means the JSON output format has a problem. Please try to debug it following https://ekuiper.org/docs/en/latest/getting_started/debug_rules.html
Hi @ngjaying. Thanks for your answer.
Thanks for clarifying. You said "and I cannot process that file with other piece of software in that case." I don't think the file sink is designed for "real-time" data transfer. Would you like to use the MQTT or REST sink to publish "real-time" data to an external system? The file sink has a rolling strategy (https://ekuiper.org/docs/en/latest/guide/sinks/builtin/file.html#rolling-strategy) and will guarantee that the writing is completed for each roll.
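For reference, the rolling strategy mentioned above is configured through file sink properties. A minimal sketch, assuming the property names from the linked file sink docs (the path and interval values here are made up):

```json
{
  "file": {
    "path": "/tmp/result.jsonl",
    "fileType": "lines",
    "rollingInterval": 10000,
    "rollingNamePattern": "suffix"
  }
}
```

With a configuration like this, each rolled file is finished and closed before a new one is started, so a completed roll should always contain whole JSON lines.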
Based on your idea of using another type of sink (not the file sink) to publish "real-time" data to an external system, I configured a REST sink output to send data to a Fluent Bit HTTP input (see my second post in this issue). In that case the "unexpected end of JSON input" problem happened, which I need to try to debug with the procedure you mentioned in your other answer. I even tried this, for example:
Each sink can have multiple actions; while composing a rule, you can add an additional log action to watch the output. I guess you'll just need to set "sendSingle" to true in the sink properties.
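As a sketch of what "multiple actions" means here, a rule's actions array can carry both a log action and the real sink, with sendSingle enabled (the rule id, SQL, and path below are hypothetical):

```json
{
  "id": "ruleDebug",
  "sql": "SELECT * FROM demo",
  "actions": [
    { "log": {} },
    {
      "file": {
        "path": "/tmp/result.jsonl",
        "fileType": "lines",
        "sendSingle": true
      }
    }
  ]
}
```

The log action prints each result to the eKuiper log, so you can compare what the rule emits against what arrives in the file.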
I had already set "sendSingle" on the sink output when I first saw the
Hi again @ngjaying. So, I ran many tests to try to understand what is happening, but it is still not really clear to me why I'm getting the error. The final conclusion is that if I create this path:
I receive the error, but I'm not losing any events. As you can see, I sent 6 events from Fluent Bit to eKuiper and ended up with 6 events in the eKuiper file (you are going to see an additional field, "tag", because I'm adding that field with a filter in Fluent Bit before sending the events to the eKuiper file). The debug log of eKuiper does not show any particular error:
So, my questions, to try to understand this, are basically:
Thanks in advance for your help.
@canob The error happens when eKuiper parses the response from Fluent Bit. It looks like the response from Fluent Bit is not a JSON string. The error message is misleading; we'll optimize that. I guess you have set the debugResp option to true for the REST sink. You can just set it to false to skip parsing the response and avoid this problem.
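A sketch of the suggested change for the REST sink configuration (the URL is hypothetical; debugResp is the property named above):

```json
{
  "rest": {
    "url": "http://fluentbit:8888/ekuiper",
    "method": "post",
    "sendSingle": true,
    "debugResp": false
  }
}
```

With debugResp disabled, eKuiper should no longer try to interpret Fluent Bit's non-JSON response body, so the "unexpected end of JSON input" message should disappear.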
Thanks @ngjaying!
Hi @ngjaying,
Now, what for me is not OK, or not the expected behavior (in other words, I have not seen this behavior in other stream-processing solutions), is this: for many seconds, the last line of the oldest file, when I do a "cat filename", contains a cut-off JSON line at the end. After 10-20 seconds (one or two more "cat filename" runs), that line completes its content, additional JSON lines appear, the file is closed, and eKuiper creates a new file to continue appending JSON lines to.
My assumption is that eKuiper has some kind of 4K buffer/cache for writing the generated rule output to the file, because the size of the file after receiving the first bunch of events is always 4.0K, and it grows in 4K steps (8.0K, 12.0K, etc.). I already tried to play with the cache configs and async configs in eKuiper, but nothing changed this behavior, so maybe it is by design; still, I want to understand why, because I would prefer to use the file sink over the REST sink for performance reasons. Thanks in advance for your help, and sorry for the long explanation, but I feel that when you explain a problem it is better to give all the details you can (even more so if you are not a native English speaker, like me).
As the file sink is not designed for real-time data transfer, we use Go's bufio to write files for better I/O performance; that is its default behavior. MQTT may be suitable for real-time transfer. We are also working on a WebSocket sink, which is also suitable for data transfer. Regarding the JSON "not being completed": that is because you cannot complete a JSON array in the middle. Suppose you expect 10 messages to be written to a JSON array; when you get the first message, you cannot "complete" it as [{"id":1}], otherwise how would you append the second message? [{"id":1}][{"id":2}] is not valid JSON; it must be [{"id":1},{"id":2}], so the closing bracket can only be appended after the last message. Finally, as we are an open-source project, we encourage you to read the source code directly if needed.
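The buffering effect described above can be reproduced outside eKuiper. This sketch uses Python's default buffered file writes (analogous to Go's bufio) to show that a written line only reaches the disk after a flush; the file name is made up:

```python
import os
import tempfile

# Write one JSON line through a buffered text stream.
path = os.path.join(tempfile.mkdtemp(), "sink.jsonl")
f = open(path, "w")          # buffered by default (~8 KB buffer)
f.write('{"id":1}\n')        # 9 bytes, still sitting in the process buffer

size_before = os.path.getsize(path)  # nothing has reached the file yet
f.flush()                            # push the buffer to the OS
size_after = os.path.getsize(path)
f.close()

print(size_before, size_after)  # 0 9
```

This is why a `cat` of the output file can show a cut-off last line: the tail of the record is still in the writer's buffer and only appears once the buffer fills or is flushed on roll/close.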
Hi @ngjaying. Thanks for your answer.
@canob You're welcome! |
I configured different sources on eKuiper:
In both cases, I'm sending the output to a file with eKuiper:
In the first case (JSON Lines file source -> JSON Lines file), I see two problems:
The first one is that every JSON line of the source file is processed again every time eKuiper "reviews the file" (I configured it to do that every 30 seconds); my expectation is that eKuiper only processes the new JSON lines in the file, not all of them again.
The second problem is the one mentioned in the title of this issue: in the destination JSON Lines file that eKuiper is filling, the last line is incomplete. For example:
In the second case (HTTPPush input -> JSON Lines file), the problem is again the one mentioned in the title of the issue: in the destination JSON Lines file that eKuiper is filling, the last line is incomplete. For example:
This is a big issue for me, because if I use another program to consume the output file in real time, it is going to have many problems with the last, incomplete line.
The really strange thing here is that after some time (sometimes minutes), eKuiper completes the last JSON line of the output file with the cut-off part.
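To illustrate why the incomplete last line breaks downstream consumers, here is a sketch (with made-up payloads) of a reader hitting a JSON Lines record that was cut mid-write:

```python
import json

# Contents of a JSON Lines file whose last line was cut mid-write
# (hypothetical payloads, mimicking the situation in this issue).
lines = ['{"id": 1, "temp": 20.5}', '{"id": 2, "te']

parsed, bad_lines = [], 0
for line in lines:
    try:
        parsed.append(json.loads(line))
    except json.JSONDecodeError:
        bad_lines += 1  # the truncated line cannot be parsed

print(len(parsed), bad_lines)  # 1 1
```

Any real-time consumer therefore has to either skip or retry the truncated tail until the writer finishes it, which is exactly the problem described here.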
Environment:

- eKuiper version (e.g. `1.3.0`): lfedge/ekuiper:1.11.4, Docker image
- Hardware configuration (e.g. `lscpu`): x86_64, Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 16 GB RAM
- OS (e.g. `cat /etc/os-release`): Manjaro Linux, but using a Docker image for eKuiper

What happened and what you expected to happen:
Two things happened:
What I expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Nothing in particular.