still amazing yet helm-org-rifle-org-directory and helm-org-rifle-occur-directories really slow #15
Hey Z! Funny you should mention that, just yesterday I was working on this because I saw that update in I'd like to try The slowness comes from having to open each file and activate Another angle is to get actual matching nodes from the external tool (which Can you give me an idea of how many files you end up searching when you call one of these commands? If it's a really large number, the first method would probably help your situation a lot. If it's a few large files, the second method would probably be the one to try. Thanks for the feedback! |
Hey, I just added a branch which may help a lot: https://github.com/alphapapa/helm-org-rifle/tree/find-files-raw This reads unopened files "literally," which avoids activating Org mode unless the user actually chooses a result. This should avoid the slowness caused by activating Org mode in every file before searching it. Could you please test it and let me know how it goes? I'd still like to make use of some external searching tools, but that's proving a bit difficult since rifle searches by nodes rather than lines, so this may be a good solution in the meantime. |
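A minimal sketch of what reading a file "literally" can look like, assuming a hypothetical helper named `my/literal-search-file` (it is not part of helm-org-rifle, and the branch's actual code may differ). It loads the file into a temporary buffer with `insert-file-contents-literally`, so no major mode or Org hooks run, and collects the matching lines:

```elisp
(defun my/literal-search-file (file regexp)
  "Return a list of (POSITION . LINE) for lines in FILE matching REGEXP.
FILE is read into a temporary buffer without activating any major mode.
Hypothetical helper for illustration only."
  (with-temp-buffer
    ;; No decoding, no mode activation, no hooks: just the raw contents.
    (insert-file-contents-literally file)
    (goto-char (point-min))
    (let ((case-fold-search t)
          matches)
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (push (cons (match-beginning 0)
                    (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position)))
              matches)
        ;; Move to the next line so each line is reported at most once.
        (forward-line 1))
      (nreverse matches))))
```

Because the contents are not decoded, non-ASCII text shows up as raw bytes; a real implementation would probably need to handle that before displaying results.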
Hi @alphapapa, happy to test it out! But I'm very nontechnical, as you remember :) How do I update to the new branch? Best, Z |
Oh, sorry. :) Probably the easiest way to try it is to go here in your browser: https://raw.githubusercontent.com/alphapapa/helm-org-rifle/find-files-raw/helm-org-rifle.el Copy and paste the contents of that file into a buffer in Emacs, then run |
Whoa boy, that's insanely fast now :) :) I tried it various times over the last ~1 hour or so and it seems to work great. One related question: can the results be presented in ivy instead of helm (as an option)? Thanks again, will be happy to test anything needed! Z |
Hey, great! Can you give me an idea of how many files you're searching with it? Like, are we talking tens, hundreds, or thousands, or...? Using Ivy is not a bad idea for an alternative UI. It wouldn't be too difficult to add. However, Ivy doesn't provide as much functionality behind the scenes, so some of the more advanced features (like choosing between multiple actions, sorting in different ways) would either have to be reimplemented from scratch (not appealing) or left out altogether. But a basic version of the command could be done easily enough. Of course, if I were to do that, I might need to rename the package since it wouldn't be just for Helm anymore (and that's something I've been considering anyway). Another thing I don't know about is whether Ivy supports multi-line entries. If it doesn't, that would be a big drawback. I'll put it on the todo list as a maybe item. ;) I'm going to hold off on releasing this find-files-raw branch for a while because I wouldn't be surprised if it causes some little bugs here and there. I'll probably tag the 1.4 release without it in the next few days, and then push find-files-raw to master after I polish and test it more, aiming to release it in 1.5. But I would appreciate it if you could continue testing it and let me know about any issues you may find. If you want to use it automatically, without having to evaluate the buffer, you can replace the Thanks. |
Hi, i think its around 100 files more or less :) best Z |
Great, thanks. |
Hi, I have been testing helm-org-rifle on a project with about 8000 files.
After applying the branch as described above, running After that, typing a string known to exist fails silently (no results, no error message). On a different note, have you considered testing sift? It seems to have multiline support. |
Hi Priyadarshan, Thank you very much, that's definitely the kind of testing I've been hoping for. That is a lot of files indeed. I am curious to see how Emacs would handle opening that many files in, say, text-mode. I'll see if I can test this myself. I'm guessing that that's simply too many for Emacs to handle quickly, and so the way rifle currently works, opening each one in an Emacs buffer first, is just not suitable for that many files. As a matter of fact, I stumbled upon sift again last night, and it's on my list of tools to test. I've tried a few others, but each one seems to have some small issue that makes it unsuitable or difficult to use for this project. I'm hoping that sift will be the one! By the way, can you give me a rough idea of the size of these files, like the average size? I doubt it matters much here, but I'm curious. For that many files, you might want to consider some kind of indexing solution. If I could impose on you, would you mind running one of your typical queries using Thanks for your help. |
Since testing on about 8000 files was too lengthy, I made a selection of a collection of 1768 files
Files are more or less the same length, total size is 76M,
So, each file is about 42K. I do use an indexing tool, recoll, but being able to access the files from Emacs would be ideal. I tried searching for pattern "please" with I tested it on Intel i7 2.6 GHz, with 16GB Ram. I wonder if it would make sense to just "slurp" all files as "fundamental mode", and then use Reading a file seems an ideal candidate for async operation, so Emacs could use all CPU cores. I would not mind to dedicate even 1GB of RAM, in order to have all the archive available through |
Hm, well, that's a lot fewer files, but I'm guessing Emacs is going to take a while to open 1,768 files, no matter what.
Have you seen helm-recoll? I remember reading about it a while back. Here are a couple of links you might want to check: https://oremacs.com/2015/07/27/counsel-recoll/
The find-files-raw branch does load them in fundamental mode...only in the Helm commands, not the
It would be, indeed, but as far as I know, there's no way for Emacs to load files asynchronously. Tools that use async stuff, like Paradox, Magit, etc, run external processes. So, yeah, you could run a second Emacs process in the background and load all the files into it, but then you'd have to pass the results back into the first process, and if you're going to do that, you probably should just use a dedicated searching tool like sift, etc.
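As a rough sketch of the external-process approach, the following asks `grep` for the names of files containing a pattern without blocking Emacs. The function name `my/grep-files-async` and the `my-callback` process property are hypothetical, and it assumes a `grep` that supports `-r`, `-l`, and `-i` is on the PATH; sift or ripgrep would be invoked along the same lines with their own options.

```elisp
(defun my/grep-files-async (pattern dir callback)
  "Asynchronously list files under DIR whose contents match PATTERN.
CALLBACK is called with the list of file names when grep finishes.
Hypothetical helper; assumes a grep supporting -r, -l and -i."
  (let ((proc (make-process
               :name "my-grep"
               :buffer (generate-new-buffer " *my-grep*")
               :command (list "grep" "-r" "-l" "-i" "-e" pattern
                              (expand-file-name dir))
               :noquery t
               :sentinel
               (lambda (proc _event)
                 ;; Runs in the main Emacs process once grep has finished.
                 (when (memq (process-status proc) '(exit signal))
                   (let* ((buf (process-buffer proc))
                          (files (with-current-buffer buf
                                   (split-string (buffer-string) "\n" t))))
                     (kill-buffer buf)
                     (funcall (process-get proc 'my-callback) files)))))))
    ;; The sentinel cannot run before we return to the event loop,
    ;; so attaching the callback after creating the process is safe.
    (process-put proc 'my-callback callback)
    proc))
```

This only narrows down which files are worth opening; the results still have to be passed back and turned into entries inside Emacs, which is the cost described above.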
Well, that sounds good to me! haha :) I guess you could try loading all of the files you might want to search, then taking a coffee break while they load, and then keeping that Emacs process loaded and all those buffers open while you work. I guess the only problem might be displaying the buffer list when you need to switch buffers, but that could probably be worked around with some kind of custom function that only displays certain ones, or something like that.
That's a little bit surprising to me, but I don't actually have And it's also possible that Emacs is just not able to handle that much data coming in from a process very well. For example, if I use Magit on a git repo containing Firefox, it is...very slow indeed, just to display the status buffer. I guess that's because there are so many lines to read from the external tool, but I'm not completely sure. Well anyway, thanks for your help. I hope to be able to make rifle more useful for you in the future, but I'm not sure how much of the issue is Emacs itself. If sift turns out to work, then I think that will help a lot. |
Thank you for the detailed reply. I cut down my test files to a subset, since 8000 files were taking way too much time. Please let me know if I can be of any help testing. I like helm-org-rifle, and I would like to use it as much as possible. In case, let me know what kind of elisp functions to use for timing and benchmarking, or do replicable testing. I am submitting a small report on my own "toying" around. Just for fun, I tried to open those 1768 files with Ram usage went from about 300M to about 900M. Browsing buffers via Then I tried consolidating all files into one:
That took about 4 secs (on SSD). The nice thing was, to open that file with Having the 76M file open in a fundamental buffer, I then tried to use I then tried to install Also using |
Great, I am very thankful for testers like you!
As a matter of fact, there is a macro in the So, for example, evaluate that macro and then you can run: (profile-rifle 1 (helm-org-rifle-directories "~/org" t)) That will instrument most of the relevant commands, then run If you run the macro from an Org source block with You can also profile the internal functions, like: (profile-rifle 10 (helm-org-rifle--get-candidates-in-buffer (get-buffer "~/org/something.org") "please")) And that will search that file for that input 10 times, then display the profiling results.
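For simple timing without that macro, Emacs's built-in `benchmark-run` may be enough. The sketch below is only an illustration: `my/count-matches-in-text` and `my/sample-text` are made-up names standing in for whatever search function and data you want to measure.

```elisp
(require 'benchmark)

(defun my/count-matches-in-text (text regexp)
  "Count matches of REGEXP in TEXT using a temporary buffer.
REGEXP is assumed not to match the empty string."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((count 0))
      (while (re-search-forward regexp nil t)
        (setq count (1+ count)))
      count)))

(defvar my/sample-text
  (mapconcat #'identity
             (make-list 1000 "a line of text, please match me")
             "\n")
  "Made-up sample data for the timing example.")

;; Runs the form 10 times and returns a list of
;; (ELAPSED-SECONDS GC-RUNS GC-ELAPSED-SECONDS).
(benchmark-run 10
  (my/count-matches-in-text my/sample-text "please"))
```

Dividing the first element by the repetition count gives a rough per-run time; the garbage-collection figures help tell whether a slow run was due to the search itself or to memory churn.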
Thanks, that confirms my suspicion that Emacs simply can't open that many buffers quickly.
That's a lot of memory, too, but not terribly surprising. Glad to hear that it's usable once it's loaded, though.
Yeah, I guess Emacs handles one large file better than many smaller ones. I doubt many people even try to open that many files in Emacs. :)
Yeah, helm-swoop is slow by nature. It's okay for smaller files, but...
That's interesting! Sometime I'll have to take a look at how it works. Maybe helm-swoop can be made faster. Did you happen to try Thanks for all your help. I'm going to be busy here for a while, but maybe in a few weeks we can work on improving this. |
Regarding sift, sift.el may be of some use. |
Thanks, I'll check it out. |
Well, sift is almost good enough, but not quite. For example:
That produces what looks like good output: the heading and entry contents for every entry that contains "emacs". But the problem is that the negated character sequences
I think using only negated character classes would be a bad idea, because it would result in truncated matches, and that might cause false negatives as well--I'm not sure. Anyway, it's another case of a tool being 95% of what we need, but that last 5% is really important. :( So, two possibilities that I can think of for going on from here:
|
Well, I've been experimenting, and I've found that the biggest speed problem is fontification.

(defun helm-org-rifle--get-source-for-literal-results (results)
  "Return Helm source for RESULTS."
  (let ((source (helm-build-sync-source (car results)
                  :after-init-hook helm-org-rifle-after-init-hook
                  :candidates (cdr results)
                  :candidate-transformer helm-org-rifle-transformer
                  :match 'identity
                  :multiline helm-org-rifle-multiline
                  :volatile t
                  :action (helm-make-actions
                           "Show entry" 'helm-org-rifle--show-candidates
                           "Show entry in indirect buffer" 'helm-org-rifle-show-entry-in-indirect-buffer
                           "Show entry in real buffer" 'helm-org-rifle-show-entry-in-real-buffer)
                  :keymap helm-org-rifle-map)))
    source))

(let ((helm-candidate-separator " ")
      ;; `fontify-fn' is bound twice; the later binding takes effect.
      ;; Keep only the `identity' binding to skip fontification.
      (fontify-fn #'identity)
      (fontify-fn #'helm-org-rifle-fontify-like-in-org-mode))
  (helm :sources (cl-loop for r in (let ((case-fold-search t)
                                         (input "emacs")
                                         (outline-regexp "\\*+ "))
                                     (with-current-buffer (get-buffer "*test*")
                                       ;; Load each agenda file literally, replacing the buffer contents.
                                       (cl-loop for file in org-agenda-files
                                                do (progn
                                                     (insert-file-contents-literally file nil nil nil t)
                                                     (goto-char (point-min)))
                                                ;; Collect (FILE . ((ENTRY-TEXT . POSITION) ...)).
                                                collect (cons file
                                                              (cl-loop while (re-search-forward input nil t)
                                                                       collect (progn
                                                                                 (outline-back-to-heading)
                                                                                 (cons (funcall fontify-fn
                                                                                                (buffer-substring-no-properties (point)
                                                                                                                                (progn
                                                                                                                                  (outline-next-heading)
                                                                                                                                  (point))))
                                                                                       (point))))))))
                          collect (helm-org-rifle--get-source-for-literal-results r))))

That will show results for Now if you set And I don't see any way to fix that. Emacs has to do the fontification itself, so no matter how we feed it entries, whether from an external tool or from within Emacs, the fontification is going to be the bottleneck. So, if you are interested in a non-fontified version, I can add some code to do that. The only advantage it would have over a plain grep command is that it would show the whole entry instead of just matching lines, but that's some benefit. Let me know what you think. :) |
Thank you, very useful comments. I do not mind at all bartering fontification for more speed; I would be very interested in testing it and using it. I see Emacs more as a platform for many applications, and I think it is fine to rely on lower-level tools, like find, awk, ag, rg, etc., to leverage their speed. For example, two packages that can deal with hundreds of thousands of text files are mu4e and notmuch. They both use xapian to index the messages. Perhaps in the future that could be leveraged as well. |
Ok, I'll try to push a branch with that soon. Thanks. |
I was intrigued by your hint of combining a "search-engine" like recoll. In the meanwhile, I have found beagrep, which could perhaps offer some additional ideas. |
Thanks, that looks very interesting. The author says that it only supports whole-word matches, so we'd need to test it to see how it matches Org syntax, non-alphabetic characters, etc. It might be useful. |
@alphapapa I have been testing the find-files-raw branch with great success; the time between the keybinding and the helm buffer appearing is significantly shorter. It shaved off about 5 seconds on my setup; I have around 200 org-mode files across 60 directories. I wasn't able to get the benchmarking macro to work. Should it just be a case of C-c C-c on the src_block? Finally, thanks for all the work you have put into this and your other Emacs packages; they are extremely helpful and much appreciated. |
@Johnstone-Tech
Thanks for the feedback. How long was the total time? Were any of the files already open in Emacs?
I'm not sure which one you mean. What happened when you tried?
Thanks for the kind words. You sound like someone who might be interested in some early code I have for indexing Org files in a SQLite database. It's not very user-friendly yet, but you can look at the
Thanks for your feedback. |
I will do some additional tests prior to opening the org files and after to verify i was getting a performance increase. Certainly feels a lot faster. Brilliant I will checkout the org-rifle branch. Sqlite seems like an appropriate choice considering how popular it is and the single file nature of the format. |
I noticed that on the find-files-raw branch, that all of the files that are searched are still open buffers after the search is complete. I was thinking that this branch implemented the "draw all content into a single file and fontify/search that". Is this not the case? If that is never going to be a thing, what do you think about a parameter that would track which files were opened as a result of search, and then subsequently close them after search is complete? |
This also produces the prompt:
When you go to open the file normally. |
No, because that would present the results as all being from a single buffer rather than being from their individual source files. One could try using text properties on each buffer's text to keep track of that; it would require changes in a few places, and it would need benchmarking.
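A minimal sketch of the text-property idea, under the assumption that all files are inserted into one buffer; the function names and the `my-source-file` property are hypothetical and not part of helm-org-rifle. Each file's text is tagged with the file it came from, so a match position can be mapped back to its source:

```elisp
(defun my/insert-files-tagged (files)
  "Append the contents of FILES to the current buffer.
Each file's text gets a `my-source-file' text property naming the file.
Hypothetical helper for illustration only."
  (dolist (file files)
    (goto-char (point-max))
    (let ((start (point)))
      (insert-file-contents-literally file)
      ;; We always insert at the end, so the new text runs to `point-max'.
      (put-text-property start (point-max) 'my-source-file file)
      (goto-char (point-max))
      ;; Keep files separated so headings always start on their own line.
      (unless (bolp)
        (insert "\n")))))

(defun my/source-file-at (pos)
  "Return the file that the text at POS was read from, or nil."
  (get-text-property pos 'my-source-file))
```

After a `re-search-forward` in the combined buffer, `(my/source-file-at (match-beginning 0))` would give the originating file. Offsets within that file would still need to be computed, which is one of the "changes in a few places" this approach would require.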
That would make sense.
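One way such tracking could work, sketched with a hypothetical wrapper (`my/call-and-kill-new-buffers` is not an existing helm-org-rifle function): record the buffer list before the search, then kill any file-visiting buffers that appeared during it and were left unmodified.

```elisp
(defun my/call-and-kill-new-buffers (fn)
  "Call FN with no arguments and return its value.
Afterwards, kill file-visiting buffers that FN created and left unmodified.
Hypothetical helper for illustration only."
  (let ((before (buffer-list)))
    (unwind-protect
        (funcall fn)
      (dolist (buf (buffer-list))
        (when (and (not (memq buf before))      ; opened during FN
                   (buffer-file-name buf)       ; visiting a file
                   (not (buffer-modified-p buf)))
          (kill-buffer buf))))))
```

A real version would also need to keep the buffer of whichever result the user selects, since that one should stay open after the search.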
The branch is experimental, and probably needs rebasing by now, being a few years old. |
Hi again
So I'm loving helm-org-rifle and have been using it a lot over the past year or so. One thing that constantly breaks my fast workflow is commands that work on directories, such as helm-org-rifle-org-directory and helm-org-rifle-occur-directories. These can take a long time to load, from the time you issue the command until it asks for input. I saw today that the latest ivy release includes ripgrep for searching, which is blazing fast. I was wondering if we can have something like that in helm-org-rifle? Thanks a lot again for your amazing work!
Z