
Text is empty while searching large (>2Gb) files #300

Open
ilgrank opened this issue Jan 5, 2022 · 13 comments

@ilgrank commented Jan 5, 2022

Hi,
Much like https://sourceforge.net/p/grepwin/tickets/454/, the issue is still present even with 2.0.9.1098.
I'm searching a 2.9 GB web server log file for malicious requests. Even though matches are found, the text is not displayed.
[screenshot: grepwin_bug]

@stefankueng (Owner)

The filename _urlrewrite.log indicates an ETW trace log file. These files are not pure text but essentially binary (open one with a text editor and see for yourself).
In that case, grepWin detects the format as binary and therefore won't show 'text' in the column.

If I'm mistaken, please provide part of the file or some other way to reproduce the problem.

@ilgrank (Author) commented Jan 7, 2022

Hi,
Thanks @stefankueng, but no, the file is plain text, as you can see in the screenshot (I'm not using Apache or Nginx).
It's just a very large text file.
[screenshot: urlrewrite]

@stefankueng (Owner)

Ah, yes: files bigger than 2 GB are not loaded fully into memory, and therefore the lines are not available. That's why they're not shown.

@ilgrank (Author) commented Jan 8, 2022

Isn't it possible (as an option) to allow loading even large files? On a 64-bit system with plenty of RAM this should not be an issue.
Thanks :)

@garry-ut99 commented Oct 14, 2023

ilgrank: Isn't it possible (as an option) to allow loading even large files? On a 64-bit system with plenty of RAM this should not be an issue.

You don't seem to have much experience with opening big files; there is no need for plenty of RAM. Even very large text files can be opened with reasonably small memory consumption: the proper way is to read them chunk by chunk instead of loading the whole file into RAM. Even so, I still think the 2 GB limit is reasonable, because the purpose of grepWin and similar find-and-replace-in-files tools is to find/replace text across many files that are not too big, rather than to deal with a single huge file. You should definitely use another, specialized tool; search for phrases like "large file editor" or "big file viewer" to find something suited to a log as huge as yours.
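For illustration, here is a minimal sketch of the chunk-by-chunk approach described above (written in Python for brevity; this is not grepWin's actual implementation, and the file path and pattern are made up):

```python
import re

def search_chunked(path, pattern, chunk_size=16 * 1024 * 1024):
    """Scan a large file in fixed-size chunks so memory use stays bounded.
    Caveat: a match that spans a chunk boundary is missed (see the next comment)."""
    regex = re.compile(pattern)
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for m in regex.finditer(chunk):
                print(f"match at byte offset {offset + m.start()}")
            offset += len(chunk)

# Hypothetical usage:
# search_chunked(r"C:\logs\_urlrewrite.log", rb"GET /malicious")
```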

@stefankueng (Owner)

While large files can be opened chunk by chunk, the nature of a regex often requires the whole file to be present. For example, a search for a.*b can match an 'a' at the start of a 1 TB file and a 'b' at its end.

Simple search/replace tools don't have that kind of problem, but regex tools do (at least if they do it right and don't just implement part of the regex standard).
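To make that concrete, here is a small illustration (Python, with toy data) of how a chunked search can miss a match that a whole-buffer regex finds:

```python
import re

data = b"a" + b"x" * 100 + b"b"   # 'a' at the very start, 'b' at the very end

# Whole-buffer search: the regex engine sees everything, so a.*b matches.
print(bool(re.search(rb"a.*b", data)))               # True

# Chunked search with 50-byte chunks: no single chunk contains both the
# 'a' and the 'b', so every per-chunk search comes up empty.
chunks = [data[i:i + 50] for i in range(0, len(data), 50)]
print(any(re.search(rb"a.*b", c) for c in chunks))   # False
```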

@garry-ut99

That's one more good reason not to go above 2 GB. To be clear, the point of my previous comment was to keep the current limit; for bigger files users should use other tools.

@ilgrank (Author) commented Oct 16, 2023

@garry-ut99: I do have an idea: I use grep (under Linux) to search large log files for strings.
The same can't be done with grepWin at the moment, hence my proposal.
What I don't get is why people believe their use of a tool is the only legitimate one.

@garry-ut99 commented Oct 17, 2023

I'm usually open-minded and in favor of adding new features and improvements, but some clowns artificially create huge log files by combining many smaller logs into a single huge file, then file requests on issue trackers and want someone to spend time implementing their private whims, like here: funilrys/PyFunceble#234 (comment):

Question:
Yeah, I'm curious as well how he ended up with a 2 GB text file...

  • is it some log that accumulated over time (weeks/months)?
  • or is it a file combined from many other smaller files (filter lists, HTML web pages, logs, etc.)?

I just can't imagine how a single 2 GB text file could ever be created under normal conditions.
I don't think a lot of people need this; it is rather a niche feature request needed by individuals for specific tasks, nothing that many people would benefit from. However, I'm always open-minded.

Reply: The data are completely mixed files that are combined into one file.

But then I read this: https://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm and it convinced me that in cases like yours such huge logs can be generated naturally on popular/big websites.

However, several questions remain:

  • Can't you simply limit the size of your log files to 2 GB each, so that when one log file reaches 2 GB another one is created, and so on? All of the chunks can sit in the same folder.
  • By the way, I'm very curious: how long did your log file take to reach 2.9 GB? (hours, days, weeks, months?)
  • Also, how many other logs do you have, how many are smaller than 2 GB, and how many are bigger than 2 GB?

If you can convince me that support for such large log files is really necessary for grepWin, I'm willing to back off and agree to supporting such large files.

@stefankueng (Owner)

@garry-ut99 If you read this issue properly, you'll discover that searching is NOT the problem; grepWin can search much bigger files without problems. What's missing for files > 2 GB is that the result view does not show a preview of the line where the text was found. But the line number where it was found is still shown, and if you set up your text editor properly it will open at that specific line as well.
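As an example of the editor setup mentioned above (assuming Notepad++ as the external editor; its -n<line> switch jumps to a given line, and the exact way grepWin passes the line number to the editor should be checked in grepWin's settings/documentation), opening a result at the reported line looks roughly like `notepad++.exe -n12345 "C:\logs\_urlrewrite.log"`, where 12345 stands in for the line number shown in the result list.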

@garry-ut99 commented Oct 17, 2023

Nowhere did I write that "searching" is a problem. If you read my comments properly, you'll discover that I didn't speak specifically about "searching" at all; I never even used the word "search", as you wrongly suggest. I was only speaking about the 2 GB limit you mentioned in your comment (#300 (comment)), which causes files over 2 GB not to be handled properly in the search results.

Another thing: if you read even your own comment properly, you'll further discover that you didn't say it's a bug, nor did you attach a "bug" label, so your comment clearly reads as describing a 2 GB limit. That's why ilgrank and I understood it as a limit; he even asked you to support bigger files, which means he understood your comment the same way I did, as a limit, not a bug, hence the discussion about the 2 GB limit. Even you yourself, in your next comment (#300 (comment)), were talking about opening large files, as a problem of supporting large files rather than a bug to fix.

And now you're contradicting yourself and speaking as if it were a bug to be fixed and no longer a limit. In that case, blame yourself for not expressing it clearly in your earlier comments rather than blaming us for not reading properly. Also, this is already the second time it is you who has had trouble reading my comments, or their context, properly.

@ilgrank (Author) commented Oct 18, 2023

@garry-ut99: The bug (and sorry for not adding a "bug" label) is that occurrences are found but not displayed. That seemed clear to me.
Is it a limit? Is it a bug? I'm not arguing about what to call it.

@garry-ut99 commented Oct 18, 2023

ilgrank: and sorry for not adding a "bug" label

  • It's fine, but I don't require an apology for minor things.
  • If you read my comment properly, you'll discover I blamed the admin, not you, so I don't understand why you apologize for him.
  • If you read the GitHub manual properly, you'll discover that only admins and collaborators can assign labels, so you can't apologize for something you couldn't do anyway...

ilgrank: the bug
ilgrank: Is it a limit? Is it a bug? I'm not arguing about what to call it.

You filed an issue, not a bug, and the conversation between the two of you read as being about a limit, not a bug, especially given that there was no "bug" label. Then the admin blamed me for not reading properly, and I pointed out to him that it was rather he who was not being clear enough.

ilgrank: the bug is that occurrences are found but not displayed. That seemed clear to me.

  • If you read my comments properly, you'll discover I do know what the bug/limit is about;
    the only question is whether it's a limit or a bug, which you yourself admitted you don't know either.
  • Even though I made that clear when replying to the admin, you still simply repeat his wrong assumptions.
  • Neither of you needs to tell me something I have known since the beginning.
  • Please, both of you, stop trying to make me look like an idiot, because it's not me who looks like an idiot here...

I feel like I'm defending myself against being trolled, but if this is just a misunderstanding rather than trolling, then fine, and I hope it's all clear now.
