Added an option to compress the created hosts file. #459
In particular, the compression option removes unnecessary lines (empty lines and comments) and puts multiple domains on each line. This option should solve issue StevenBlack#411 regarding the Windows DNS Client service.
Thank you for submitting this pull request! We’ll get back to you as soon as we can!
Excellent. How much smaller is the file on average?
My hosts file, with all the extensions and some custom entries, goes from 1.47MB to 1.00MB, but the biggest change is in the number of lines: 5947 instead of 58143.
30% is an excellent gain. I'm adding the hosts file into initramfs in my custom kernels, so the smaller it is, the better.
@stefanopini as you can see,
@StevenBlack sorry, I forgot to check the code with flake8 before committing it.
updateHostsFile.py
Outdated
lines_index = 0
for line in input_file.readlines():
    line = line.decode("UTF-8")
    if line.startswith('#') or line.startswith('\n'):
This could be changed to
    if line.startswith(('#', '\n')):
for the same effect with less code. Note that the argument has to be a tuple of strings; a list won't work.
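A minimal sketch of the suggestion above, using only the built-in `str.startswith`. The names `PREFIXES`, `is_skippable`, and `list_is_rejected` are hypothetical and do not come from `updateHostsFile.py`; they only illustrate that a tuple of prefixes is accepted while a list raises `TypeError`.

```python
# Prefixes that mark a line as a comment or an empty line.
PREFIXES = ("#", "\n")


def is_skippable(line):
    """Return True when the line starts with any prefix in the tuple."""
    return line.startswith(PREFIXES)


def list_is_rejected():
    """Return True if str.startswith refuses a list of prefixes."""
    try:
        "# comment".startswith(["#", "\n"])
    except TypeError:
        return True
    return False
```

If the prefixes are built dynamically as a list, converting with `tuple(prefixes)` before the call avoids the error.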
@stefanopini I think that @rautamiekka has a good point here.
Updated. Thank you
@@ -249,7 +250,7 @@ def test_freshen_update(self, _):

     def tearDown(self):
         BaseStdout.tearDown(self)
-        BaseStdout.tearDown(self)
+        # BaseStdout.tearDown(self)
Couldn't we do it like this to avoid the error?
    while 1:
        try:
            BaseStdout.tearDown(self)
        except:
            break
Of course, if there are any exceptions to deal with, deal with them appropriately, and don't forget to break the loop in any case.
Even if it should work, I think this is not necessary.
As far as I understand, the tearDown method closes the current standard output file (which was changed before calling the setUp method) and resets it to the system standard output.
Calling tearDown twice closes the system standard output itself, which raises an error in whatever runs afterwards.
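The behaviour described above can be sketched as follows. This is not the project's `BaseStdout` class: `FakeSys`, `StdoutFixture`, and `system_stdout_closed_after` are hypothetical stand-ins, written under the assumption that tearDown closes whatever the current stdout is and then restores the system one. A fake `sys` object is used so the real standard output is never closed.

```python
import io


class FakeSys:
    """Stand-in for the sys module, so the real stdout stays untouched."""

    def __init__(self):
        self.system_stdout = io.StringIO()  # plays the role of the terminal
        self.stdout = self.system_stdout


class StdoutFixture:
    """Assumed shape of a fixture that redirects stdout during a test."""

    def __init__(self, fake_sys):
        self.sys = fake_sys

    def setUp(self):
        # Redirect output to an in-memory buffer for the test.
        self.sys.stdout = io.StringIO()

    def tearDown(self):
        # Close the current stdout, then restore the system one.
        self.sys.stdout.close()
        self.sys.stdout = self.sys.system_stdout


def system_stdout_closed_after(teardown_calls):
    """Run setUp once, tearDown N times, and report the damage."""
    fake_sys = FakeSys()
    fixture = StdoutFixture(fake_sys)
    fixture.setUp()
    for _ in range(teardown_calls):
        fixture.tearDown()
    return fake_sys.system_stdout.closed
```

With one call, only the capture buffer is closed. With two, the second call finds the system stdout already restored and closes it, which is why a duplicated tearDown line would break later output.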
If line quantity is the issue, would "compressing" contiguous comment lines into a single line also reduce the line count enough to be acceptable, without needing to remove comment lines entirely? Also, since this is a line issue, have the different types of line endings been tested and compared (CR, LF, CR+LF)? Depending on how resources are allocated and how line endings are interpreted during parsing, is it possible that the affected systems were designed to read small lines of data iteratively, broken by specific line endings, in which case inappropriate line endings may glob the entire file together as one line that is then parsed inefficiently?
Hi @stefanopini thank you for this pull request. I'll be evaluating this shortly, thank you for your patience!
@stefanopini can I ask you to please do the following before I merge?
Thanks!
@ScriptTiger the idea behind the removal of the comments and the empty lines is that the hosts file created with this script shouldn't need to be edited manually. Thank you for suggesting to test the different line breaks, I didn't think about that. Unfortunately, the problem occurs with every type of line break (CR, LF, and CR+LF). Interestingly, Windows is able to parse each of them. I also tried the hosts files on a machine with up-to-date Windows 10 (build 1709) and the issue is still there.
As this is an opt-in type of feature, it's worth considering for addition even if Windows is fixed in the future. I for one would be very happy about the size gain in the initramfs scenario I'm using.
Removed a redundant skipstatichosts option.
@StevenBlack I updated the docs and the
updateHostsFile.py
Outdated
        continue

    if line.startswith(target_ip):
        l = len(lines[lines_index])
@stefanopini in the code below, I'm curious why you chose these particular limits for line length? Why limit at all?
I was quite sure Windows had a length limit for each line of the hosts file, so I chose a value to reduce the file size without incurring errors.
But, while searching for a reference, I discovered that the limit doesn't refer to the length of the line, but to the length of a single hostname.
Instead, the limit that matters in our case is the number of hostnames supported on each line, which is fixed at 9. I found that issue #49 also reports that.
Therefore, the generated hosts file must have at most 9 hostnames on each line, regardless of the line length. I've just committed an update with the correct if statement and a few small changes.
I apologize for the dumb error, I should have checked it before. Unfortunately, with my length limits, nearly every line contained fewer than 9 hosts, so I didn't notice the error.
I'll update the stats previously posted with the new values.
Fixed the number of domains in each line and added the support to inline comments (they will be ignored as the comment lines). Code refactoring.
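The rule discussed above (at most 9 hostnames per line, with comments, inline comments, and empty lines dropped) can be sketched as follows. This is an illustrative approximation, not the code merged in this pull request: the function name `compress_hosts` and its parameters are hypothetical, and as a simplification it keeps only entries whose first field equals `target_ip`, discarding any other lines.

```python
def compress_hosts(lines, target_ip="0.0.0.0", max_hosts=9):
    """Pack hostnames that share target_ip onto lines of at most max_hosts."""
    hosts = []
    for line in lines:
        # Drop any inline comment, then surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue  # empty line or comment-only line
        fields = entry.split()
        if fields[0] == target_ip:
            hosts.extend(fields[1:])
    # Emit one output line per group of up to max_hosts hostnames.
    return [
        target_ip + " " + " ".join(hosts[i:i + max_hosts])
        for i in range(0, len(hosts), max_hosts)
    ]
```

Grouping by hostname count rather than by line length matches the limit identified in the review comments: the number of hostnames per line is what matters, regardless of how long the line is.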
Thanks @stefanopini. You're awesome.
Congratulations on merging your first pull request! 🎉🎉🎉
thanks for your continuing effort, if that can help people, good. as for me, I'm really sorry, but I have no idea how to "compress" a host file.
@user789465
|
@user789465 |
I was doing a bit more research into this and actually came full circle back to this repository, ranked second on Google to a hit on superuser.com, so I thought I'd share it here just in case anyone wanted to do a bit more digging into this: #49. I guess @stefanopini had already mentioned it here at some point, as well, but edited/deleted it out at some point. I am not actually personally affected by this, but it definitely is an interesting new feature indeed. So props to everyone that helped finally solve this and bring the solution to fruition. |
the compression seems to work a little; |
@user789465 Windows – always a problem it seems – limits this to 9 domains per line. |
Hello, how can I compress the hosts file without installing Python? I have VSCode if needed.
@KostiantynO, what's your OS?
Hello, man! Thanks for feedback!👍 Then just took hosts file with mouse🐁 and drag-and-dropped🖱️ it directly into Compressed.cmd file📄! It was great!!🎉
@KostiantynO, I'm glad that could help you. I also have another project that can do compression and update your hosts file automatically, if you're interested in that:
In particular, the compression option removes unnecessary lines (empty lines and comments) and puts multiple domains on each line.
This option should solve issue #411 regarding the Windows DNS Client service.
A test for this option is still missing; I hope someone can work on it.