Added an option to compress the created hosts file. #459

Merged
merged 4 commits into from
Jan 2, 2018

Conversation

stefanopini

Specifically, the compression option removes unnecessary lines (empty lines and comments) and puts multiple domains on each line.
This option should resolve issue #411 regarding the Windows DNS Client service.
A test for this option is still missing; I hope someone can work on it.

@welcome

welcome bot commented Dec 30, 2017

Thank you for submitting this pull request! We’ll get back to you as soon as we can!

@tbalden

tbalden commented Dec 30, 2017

Excellent. How much smaller is the file on average?

@stefanopini
Author

stefanopini commented Dec 30, 2017

My hosts file, with all the extensions and some custom entries, goes from 1.47 MB to 1.00 MB, but the biggest change is in the number of lines: 5947 instead of 58143.
If it's useful, I can run some tests to measure the average improvement in different scenarios.
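
For reference, the relative savings implied by these figures can be worked out directly. This is only a sketch using the numbers quoted in the comment above; the helper name `percent_reduction` is made up for illustration.

```python
def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100.0


# Figures quoted in the comment above.
size_saving = percent_reduction(1.47, 1.00)    # file size in MB
line_saving = percent_reduction(58143, 5947)   # number of lines

print(f"size: {size_saving:.1f}% smaller, lines: {line_saving:.1f}% fewer")
```

The file size shrinks by roughly a third, while the line count drops by roughly 90%.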

@tbalden

tbalden commented Dec 30, 2017

30% is an excellent gain. I'm adding the hosts file into initramfs in my custom kernels, so the smaller it is, the better.
I'm curious how Linux/Android would handle the parsing. I'll try it when I get some time.

@StevenBlack
Owner

@stefanopini as you can see, Travis CI is failing... Why did you comment out the teardown code for testing?

@stefanopini
Author

@StevenBlack sorry, I forgot to check the code with flake8 before committing it.
There were some overly long lines and an ambiguous variable name... It should be OK now.
I commented out the second tearDown call because it was executed twice, which caused an error in my Windows environment. Why was it called twice? I couldn't figure it out.

lines_index = 0
for line in input_file.readlines():
    line = line.decode("UTF-8")
    if line.startswith('#') or line.startswith('\n'):

@rautamiekka rautamiekka Dec 30, 2017

This could be changed to

if line.startswith( ('#', '\n') ):

for the same effect with less code. Note that it has to be a tuple of strings; a list won't work.
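
The tuple requirement can be checked in isolation. The sketch below is illustrative only: `is_skippable` and `list_prefixes_rejected` are hypothetical names, not functions from the repository.

```python
def is_skippable(line):
    """True for comment lines and bare newlines (hypothetical helper)."""
    # str.startswith accepts a single string or a tuple of strings.
    return line.startswith(("#", "\n"))


def list_prefixes_rejected():
    """Show that a list of prefixes raises TypeError instead of matching."""
    try:
        "# comment".startswith(["#", "\n"])
    except TypeError:
        return True
    return False
```

Passing a list fails loudly with a `TypeError` rather than silently returning `False`, so the mistake is easy to catch.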

Owner

@stefanopini I think that @rautamiekka has a good point here.

Author

Updated. Thank you

@@ -249,7 +250,7 @@ def test_freshen_update(self, _):

     def tearDown(self):
         BaseStdout.tearDown(self)
-        BaseStdout.tearDown(self)
+        # BaseStdout.tearDown(self)

Couldn't we do it like this to avoid the error?

while 1:
    try:
        BaseStdout.tearDown(self)
    except:
        break

Of course, if there are any exceptions to deal with, deal with them appropriately, and don't forget to break the loop in any case.

Author

Even though it should work, I don't think this is necessary.
As far as I understand, the tearDown method closes the current standard output file (which was replaced when setUp was called) and resets it to the system standard output.
Calling tearDown twice therefore closes the system standard output itself, which raises an error later on.
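
The double-call problem described above can be guarded against in isolation. The sketch below is not the project's `BaseStdout`; `StdoutCapture` and its method names are hypothetical, and it assumes the fixture behaves as described in the comment (swap `sys.stdout` for a temporary stream, then close it and restore the original). The guard turns a second teardown into a no-op instead of closing the real stream.

```python
import io
import sys


class StdoutCapture:
    """Hypothetical fixture that redirects sys.stdout to an in-memory buffer."""

    def __init__(self):
        self._saved = None  # the stream that was active before set_up()

    def set_up(self):
        self._saved = sys.stdout
        sys.stdout = io.StringIO()

    def tear_down(self):
        # Nothing to restore: set_up() was never called, or we already ran.
        if self._saved is None:
            return
        sys.stdout.close()          # closes only the temporary buffer
        sys.stdout = self._saved    # put the original stream back
        self._saved = None
```

Simply removing the duplicate call, as done in this pull request, is the smaller fix; the guard is only useful if the teardown might legitimately run more than once.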

@ScriptTiger
Contributor

ScriptTiger commented Dec 31, 2017

If the number of lines is the issue, would "compressing" contiguous comment lines into a single line reduce the line count enough to be acceptable, without removing comment lines entirely?

Also, since this is a line-related issue, have the different types of line endings (CR, LF, CR+LF) been tested and compared? Depending on how resources are allocated and how line endings are interpreted during parsing, the affected systems may be designed to read short lines iteratively, split on specific line endings. In that case, inappropriate line endings could glob the entire file together as one line, which would then be parsed inefficiently.

@StevenBlack
Owner

Hi @stefanopini thank you for this pull request. I'll be evaluating this shortly, thank you for your patience!

@StevenBlack
Owner

StevenBlack commented Jan 1, 2018

@stefanopini can I ask you to please do the following before I merge?

Thanks!

@stefanopini
Author

@ScriptTiger the idea behind removing the comments and empty lines is that a hosts file created with this script shouldn't need to be edited manually.
In my scenario (using all the extensions and some custom rules), comment lines and empty lines account for 2271 and 2629 of the 58143 lines, respectively. Only 793 of the comment lines are contiguous, so I don't consider them significant. Nevertheless, since comment lines do not add much to the total number of lines, we could add an option to keep them in the future.

Thank you for suggesting that I test the different line breaks; I hadn't thought of that. Unfortunately, the problem occurs with every line break (CR, LF, and CR+LF). Interestingly, Windows is capable of parsing each of them.

I also tried the hosts files on a machine with an up-to-date Windows 10 (version 1709) and the issue is still there.
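
One way to prepare such a line-ending test is to produce the same hosts content with each ending style. A minimal sketch follows; `with_line_ending` is a hypothetical helper and the sample hostnames are placeholders, neither is part of the repository.

```python
def with_line_ending(text, ending):
    """Return `text` with every line terminated by `ending`."""
    # splitlines() recognises CR, LF and CR+LF, so mixed input is normalised.
    return "".join(line + ending for line in text.splitlines())


sample = "0.0.0.0 a.example\n0.0.0.0 b.example\r\n"
variants = {
    name: with_line_ending(sample, eol)
    for name, eol in (("CR", "\r"), ("LF", "\n"), ("CR+LF", "\r\n"))
}
```

When writing a variant to disk, open the file in binary mode or with `newline=""` so that Python does not translate the endings again.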

@tbalden

tbalden commented Jan 2, 2018

As this is an opt-in feature, it's worth adding even if Windows is fixed in the future. I, for one, would be very happy about the size reduction in the initramfs scenario I'm using.

Removed a redundant skipstatichosts option.
@stefanopini
Author

@StevenBlack I updated the docs and the startswith code.
Do you think we're ready to merge now?

    continue

if line.startswith(target_ip):
    l = len(lines[lines_index])
Owner

@stefanopini in the code below, I'm curious why you chose these particular limits for line length? Why limit at all?

Author

I was quite sure Windows had a length limit for each line of the hosts file, so I chose a value that would reduce the file size without causing errors.
But, while searching for a reference, I discovered that the limit doesn't apply to the length of the line, but to the length of a single hostname.
The limit that actually matters in our case is the number of hostnames supported on each line, which is 9. I found that issue #49 also reports this.
Therefore, the generated hosts file must have at most 9 hostnames on each line, regardless of the line length. I've just committed an update with the correct if statement and a few small changes.
I apologize for the dumb error; I should have checked it before. Unfortunately, with my length limits, nearly every line contained fewer than 9 hosts, so I didn't notice the error.
I'll update the stats previously posted with the new values.
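
The rule described above (at most 9 hostnames per line, with comments and empty lines dropped) can be sketched as follows. This is not the code merged in this pull request: `compress_hosts_lines` is a hypothetical helper, the default `target_ip` of `"0.0.0.0"` is an assumption, and unlike an in-place rewrite it moves entries for other addresses (such as localhost) ahead of the packed lines.

```python
def compress_hosts_lines(lines, target_ip="0.0.0.0", max_per_line=9):
    """Pack blocked hostnames into lines holding at most `max_per_line` names.

    Hypothetical helper for illustration; not the function added by this PR.
    """
    passthrough = []  # entries for other IPs (e.g. localhost), one per line
    domains = []      # hostnames mapped to target_ip, in original order

    for raw in lines:
        # Drop full-line and inline comments, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # empty line or comment-only line
        fields = line.split()
        if fields[0] == target_ip:
            domains.extend(fields[1:])
        else:
            passthrough.append(line)

    # Emit the collected hostnames in chunks of max_per_line.
    packed = [
        target_ip + " " + " ".join(domains[i:i + max_per_line])
        for i in range(0, len(domains), max_per_line)
    ]
    return passthrough + packed
```

Feeding it ten single-domain entries yields two packed lines: one with nine hostnames and one with the remaining hostname.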

Fixed the number of domains on each line and added support for
inline comments (they are ignored, like comment lines).
Code refactoring.
@StevenBlack
Owner

Thanks @stefanopini. You're awesome.

@StevenBlack StevenBlack merged commit 9d634c3 into StevenBlack:master Jan 2, 2018
@welcome

welcome bot commented Jan 2, 2018

Congratulations on merging your first pull request! 🎉🎉🎉

@ghost

ghost commented Jan 5, 2018

thanks for your continuing effort, if that can help people, good.

as for me, I'm really sorry, but I have no idea how to "compress" a host file.

@onmyouji

onmyouji commented Jan 5, 2018

@user789465

  • install python and download the repo.
  • from command prompt type "python updateHostsFile.py -a -c"

@tbalden

tbalden commented Jan 5, 2018

@user789465
Please read the opening description and big things will be revealed to you!
"In particular, the compression option removes non-necessary lines (empty lines and comments) and puts multiple domains in each line."

@ScriptTiger
Contributor

ScriptTiger commented Jan 5, 2018

I was doing a bit more research into this and actually came full circle back to this repository, ranked second on Google to a hit on superuser.com, so I thought I'd share it here just in case anyone wanted to do a bit more digging into this: #49. I guess @stefanopini had already mentioned it here at some point, as well, but edited/deleted it out at some point.

I am not actually personally affected by this, but it definitely is an interesting new feature indeed. So props to everyone that helped finally solve this and bring the solution to fruition.

@ghost

ghost commented Jan 7, 2018

The compression seems to help a little.
Is there a way to compress even more?
(Before, I had a 60-second delay; now it feels like 5 or 10 seconds.)
The hosts file is about 5000 lines; maybe 100 would do the trick?
How can I make that happen?

@StevenBlack
Owner

@user789465 Windows – always a problem it seems – limits this to 9 domains per line.

@KostiantynO

@user789465

  • install python and download the repo.
  • from command prompt type "python updateHostsFile.py -a -c"

Hello, how can I compress the hosts file without installing Python? I have VS Code if needed.

@ScriptTiger
Contributor

@KostiantynO, what's your OS?

@KostiantynO

@KostiantynO, what's your OS?

Hello, man! Thanks for the feedback!👍
Win10 x64. I already figured it out by myself💪.
I followed your link and downloaded the Compressed.cmd file📄.

Then I just grabbed the hosts file with the mouse🐁 and dragged and dropped🖱️ it directly onto the Compressed.cmd file📄!

It was great!!🎉
I didn't know that I could launch a batch script by dropping a file onto a *.cmd file!!
It compressed my hosts file from 3 MB (115K lines) to 2 MB (12K lines), with 9 domains per line.
And the DNS client with DNS caching enabled is now working fine, I hope 🤞🙏.
At least the DNS cache does not eat 100% of the CPU now.💻✨
That is some progress!🏆

@ScriptTiger
Contributor

@KostiantynO, I'm glad that could help you. I also have another project that can do compression and update your hosts file automatically, if you're interested in that:

https://github.com/ScriptTiger/Unified-Hosts-AutoUpdate

7 participants