Support multiple parsing threads #377

Closed
d3f3kt opened this issue Feb 22, 2016 · 20 comments · Fixed by #2594

Comments

@d3f3kt
Contributor

d3f3kt commented Feb 22, 2016

The ability to parse logs with multiple threads would be a great improvement.

@allinurl
Owner

Would this be multiple threads when parsing one log? If it's for multiple logs, can you run multiple instances of goaccess?

@d3f3kt
Contributor Author

d3f3kt commented Feb 23, 2016

The idea is that multiple threads are parsing a single log file.
A server with 32 cores currently needs almost an hour to parse a 2 GB log file. That would be much faster if all 32 cores were parsing the log file.
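
To make the request concrete, here is a minimal sketch of the split-and-merge idea: cut one log into line-aligned chunks, process each chunk in its own worker, then combine the partial results. This is not how GoAccess parses logs. Counting lines with `wc -l` only stands in for real parsing, the function name `count_requests_parallel` is made up for illustration, and `split -n l/N` assumes GNU coreutils.

```shell
#!/bin/sh
# Illustration only: count the requests in one log using several workers.
# "wc -l" is a stand-in for a real per-chunk parser.
count_requests_parallel() {
    log=$1
    workers=${2:-4}          # number of chunks and of parallel workers
    tmp=$(mktemp -d) || return 1

    # GNU split: "l/N" makes N chunks without cutting a line in half.
    split -n "l/$workers" "$log" "$tmp/chunk."

    # Run one wc per chunk, up to $workers at a time, then sum the
    # partial counts (the "merge" step).
    find "$tmp" -type f -name 'chunk.*' -print0 |
        xargs -0 -n 1 -P "$workers" wc -l |
        awk '{ total += $1 } END { print total + 0 }'

    rm -rf "$tmp"
}
```

For example, `count_requests_parallel access.log 32` would use up to 32 workers. Summing counts is the easy case; a real parser would also have to merge per-chunk tables of hosts, URLs and so on.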

@allinurl
Owner

allinurl commented Mar 1, 2016

Are you using the on-disk storage?

@freefd

freefd commented Oct 12, 2016

Folks, is there any news here?

@allinurl
Owner

@freefd I haven't had the chance to look at this particular request. As soon as some progress is made, I'll post back.

@Blaar

Blaar commented Oct 13, 2016

Dear Gerardo,
We have a huge number of log files containing about 60,400,000 requests. To speed up processing, we would like to parse them with parallel goaccess processes. Currently, parsing ~200 files takes about 3 hours on our VM's hardware.
So, is it possible to use taskset to run a few concurrent goaccess processes that use on-disk storage?

@allinurl
Owner

allinurl commented Oct 13, 2016

@Blaar good question, you should be able to use taskset with goaccess. Give it a shot and let me know how it goes for you. The times that I've used taskset, I noticed that it would use only one core. I'll bump this up on the to-do list, though I still want to get the filters done first.

@freefd

freefd commented Oct 13, 2016

@allinurl,
Ok, we'll check and get back to you. Thanks.

@Blaar

Blaar commented Nov 14, 2016

I tried to do:
taskset 0x00000001 goaccess
and
taskset 0x00000002 goaccess
Got:
taskset -cp 28057
pid 28057's current affinity list: 0
taskset -cp 28059
pid 28059's current affinity list: 1

But only one of them was actually working at any given time.

@toontong

toontong commented Nov 22, 2016

> If it's for multiple logs, can you run multiple instances of goaccess?

With multiple results, how can they be merged into one HTML report?
And, as @d3f3kt says, my log file was 100 GB, but it used just one of the server's 32 cores.

@allinurl
Owner

@toontong You could run multiple instances but you won't be able to merge the results. Definitely need to look into this request, #117 will make use of multiple threads, so this request could be part of that one as well. Stay tuned!

@shaun-ba

shaun-ba commented Nov 7, 2017

Why was this closed? I'm also trying to use GoAccess on many logs, and I find it incredible that such a powerful piece of software is single-threaded.

@allinurl
Owner

allinurl commented Nov 7, 2017

@shaun-ba This is still open (see the label at the top of the page); issue 799 was closed.

@gitqlt
Contributor

gitqlt commented Feb 20, 2018

Multiple parsing threads or multiple parsing projects...
I have daily access logs, from midnight to midnight, and goaccess works fine. Sometimes however I create hourly pages as well, using a loop like

for H in $(seq -w 0 23); do grep "20/Feb/2018:$H:" 2018-02-20_access.log | goaccess - >$H.html & done

The 24 goaccess processes start immediately, but they are executed sequentially (perhaps files like /tmp/-1mdb_hostnames.tcb and the other similar files lock the execution?). I won't use any analyzer other than goaccess, so I may just get used to this, but it would be better to process them in parallel.

@allinurl
Owner

@gitqlt Are you using the same database files for the 24 processes? If they are all different, you could execute them all in parallel, and use different folder paths, i.e., --db-path <dir>.

@gitqlt
Contributor

gitqlt commented Feb 21, 2018

Yep, I didn't pay attention to those database files... I then created 24 subfolders, passed them via --db-path, and all the processes worked like a charm. Thank you for your help.
However, couldn't that be the default behaviour? Or couldn't there be an option, something like --par-procs <num> (with a default value of ... 24 ... for the lazy guy)?
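
The workaround described in this comment (one database folder per process) can be sketched roughly as follows. This is an assumption-laden illustration, not an official recipe: it presumes a goaccess build with on-disk storage that accepts `--db-path <dir>` as mentioned above, it omits log-format options, and the function name `hourly_reports`, the `db/` folder layout and the `max` argument (a crude stand-in for the suggested `--par-procs`, which is not a real option) are all hypothetical.

```shell
#!/bin/sh
# Sketch: one goaccess process per hour, each with its own database folder
# so that the processes do not block each other on shared database files.
hourly_reports() {
    log=$1          # e.g. 2018-02-20_access.log
    day=$2          # e.g. 20/Feb/2018
    out=$3          # output directory (hypothetical layout)
    max=${4:-24}    # how many processes to run at once
    n=0
    for h in $(seq -w 0 23); do
        mkdir -p "$out/db/$h"
        grep "$day:$h:" "$log" |
            goaccess - --db-path "$out/db/$h" > "$out/$h.html" &
        n=$((n + 1))
        # Crude cap: after every $max launches, wait for the whole batch.
        if [ $((n % max)) -eq 0 ]; then
            wait
        fi
    done
    wait            # wait for any remaining background jobs
}
```

A call such as `hourly_reports 2018-02-20_access.log 20/Feb/2018 reports 8` would write `reports/00.html` through `reports/23.html`, running at most 8 processes at a time.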

@twomiles-dev

Any news about this one?

@allinurl
Owner

@balazsbaranyi still in the works. I need to address a few other issues on the to-do list before getting to this.

@fisherwei

Multi-threading would be a very useful feature.
I run goaccess on a k8s cluster, and it takes a very long time to start when the pod is rescheduled to another node.

@allinurl
Owner

Implemented. The plan is to incorporate this feature in the next version release. Stay tuned.


9 participants