Better document the PSL roadmap and needs #671
Comments
In addition to your list of issues, I would add the failure to rely upon |
@sleevi @weppos I haven't a sense that we have gotten anywhere on this (likely due to #dayjobs), but I have made some great headway with respect to how we engage with other entities in the ICANN and domain space. I have worked with the ICANN Office of the CTO team on helping create a document to be distributed within the IANA to ccTLD and gTLD administrators to help elevate their awareness of the PSL, and we'll be presenting this at the ICANN 67 meeting in Cancun, Mexico in March of 2020. I believe that this will help us with an improved quality of the requests that come in to the ICANN section at the top of the file. Where I think we're suffering is the PRIVATE section and the increasing volume of requests that are hitting us. We'd proposed splitting the file at that horizon, and I think it is a good idea. IF we did that, we need to prepare people for it. It seems to me the place that would get folks to notice would be in the header sections or adding a new comment line or two in the file itself. |
Jothan: It might be useful to focus on the problem you’d like to solve,
rather than the solution.
Both as a maintainer and as a consumer, I don’t believe there is any
benefit to be had at all from splitting the file, and that it would do more
harm than good. That said, I’m probably missing something important that
you’re concerned about, and so I’d want to make sure we got that
documented, before discussing a solution.
It would probably be good to open up as a separate issue for the specific
problem(s) you see, which we can reference here, so that the roadmap
solution is “Solve Problem X” rather than “Do Thing Y”
|
I suppose it would be better to not split the list, unless there is a demand by those who want to treat the sections differently (there is a CA use case given on the PSL web site). If there is no such demand, why bother? |
I'll back off on the idea... that's just a bias for results within me fighting to help this project thrive.
Completely see this perspective. I would not want in any way to introduce disruption.
I think the challenge here, for all of us as volunteers, is the #dayjobs vs time to invest in the architectural stuff and writing up documentation. Clearly, though we have a mailing list and the ability to communicate via github or dm, we can discuss things, but I wonder if we might benefit from some form of ability to announce stuff like changes or proposals and or poll the integrators/users about their biases. |
Right, every known consumer wants both, so splitting doesn’t solve any problems for consumers. It also doesn’t reduce the number of PRs, and having changes go to different files just increases complexity without compelling benefits (at least, AIUI; if there are overlooked benefits, we should nail them down)
It seems like we have that already, as you mention? It’s not clear to me what would be missing in that? |
From my POV we closed the discussion on splitting the file into two sections - just using my leaf blower on the remnants of the chalk dust from the outline of that horse. Moving on ...
Announcements/Polls
To answer that, let's journey back to the initial issue -
IF we embark on that type of roadmap dialog, should we not engage the integrators, users and consumers of the PSL? I assert that most of them blindly download the .dat file and are not on mailing lists or monitoring this on github. I am not saying or recommending we do it, but it seems that the most effective manner to reach the largest number of PSL interested parties might be to tweak the file header to include announcements of some sort in a manner that would let us engage them w/o breaking stuff. |
I am closing a few lingering issue reports, and have caught some meta issues that I'll document in issues which we could incorporate into a roadmap concept and reference them in this Issue |
I agree with @sleevi that splitting would not solve the problem of managing the private section. It may solve other problems, but I find myself in great agreement with @sleevi's statement
I strongly believe automating the submission and validation process is the key. I do have some proposals on how to make it happen leveraging a slightly revised version of the DNS validation we use today. I hope to be able to find the time to make a prototype. In short terms, I'd like to:
|
Could this be leveraged for automation of removals at some point? |
What would this look like? The LE process deals with a "token" they provide; for all intents and purposes, ours is the # of the PR. Having it within the _psl TXT record currently helps indicate to me that there is a tether to the PR |
The Git hash of the modified version of the PSL, for example. You could
compute that prior to submitting the PR by making your modifications
against the current HEAD.
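
As a rough illustration of the suggestion above, here is a minimal sketch of computing a Git hash for a modified copy of the list before a PR exists. It assumes that "the Git hash" means the blob object ID of the file and that the repository uses Git's default SHA-1 object format; the comment could equally mean a commit hash, which would need a different calculation. The function name `git_blob_hash` is hypothetical.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content.

    Git hashes the header b"blob <size>\\0" followed by the raw file bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Hypothetical stand-in for the submitter's locally modified list file.
    modified_list = b"// example\nexample.com\n"
    print(git_blob_hash(modified_list))
```

The result should match what `git hash-object` reports for the same file, so a submitter could compute the token locally and publish it in DNS before opening the PR.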
|
I am all for automating as much as we can, and leveraging the DNS infrastructure where possible for it. Not trying to go too far down the road on being prescriptive but the RFC 8552 stuff that a zone admin might add could hold a txt record that matches the git user handle so we could know who's an authoritative rep |
I suppose the objective is that such a string would never end up in the DNS, unless so intended by an authorized admin. That can also be achieved by using a hash of the concatenation of |
I don’t think stability is necessarily a goal here. The goal is to be able to quickly authenticate a pull request, which is why the current method uses the PR number. It’s fairly common for a PR to modify multiple domains. To be clear, it is not that someone needs to continually be updating this value. The primary objective is merely authenticating the PR. |
Possibly, yes. Regarding how it is going to work, I'm working to get a proposal out for feedback. I am trying to stay away from using anything connected to how we manage the list. In other words, using something related to Git would tie the process strictly to how we use Git today, similar to the fact that the DNS TXT record today references the GitHub repo. I am more inclined to find something that doesn't require any extra shared state besides the suggested PSL change and the hostname itself. That would be sufficient, given that the authentication ultimately comes down to whether the user can edit the DNS records or not. |
True. In an earlier comment, it was said that the PR number should be replaced by something self-referencing. The question is what "self" should be: Should it identify the change, or should it identify the candidate public suffix at which the record is added? The latter has the advantage that, if changes in a PR are required, that would not invalidate the verification records configured in the DNS prior to PR submission; they could be reused for a changed or even a completely new PR. That is what I meant with stability; I did not mean long-term stability for continued verification. The same goal can be achieved by allowing the hash of any commit within the PR as a verification token. The invalidation problem upon PR changes can then be avoided by adding changes as new commits, squashing them at merge time. However, I think that's more complicated for users. I have no stakes in this, I simply proposed this because I thought it covers the requirements (as far as they are known to me) and is suitable to reach the goal with minimal friction. |
A significant majority of the entries in the PRIVATE section of the list are simply entries of a domain without any use of the list's special features or syntax. Inclusion on the list is simply used as a signal that subdomains are untrusted, mostly for cookie security. A list entry is basically a reflection / descriptor of the domain's DNS configuration. To enable automatic updates to the list, each domain's entry could be stored inside of a TXT record on the domain. An automated system could automatically update the list by checking the TXT record. There would be no need for tokens - presence of the record would be enough authentication to indicate authorization. The ability to manage DNS is enough to indicate intent to be on the list, as having the authorization to manage DNS is the authorization you need in order to manage DNS records of subdomains. Further, I don't necessarily think there needs to be a central list of private domains. I can't think of any reasonable use case where the PRIVATE list is used for anything except lookups - there's no need for enumeration. We could instead standardize a DNS record indicating the status of a domain as a "public suffix." I'm not 100% sure what to call it, but maybe something like This wouldn't be a ton of overhead - DNS is very fast and designed for pretty much this use case (at its essence, it's a distributed hosts list). For example, on first connection to a domain, browsers could request and cache that record, and use its value to enforce cross-origin policies. DNS lookups add relatively low latency to requests (which already require a network connection), and the result is cacheable. I can't think of a use case where enumeration or offline lookups are required - and I think standardizing a DNS record would be a much more maintainable strategy. However, using DNS records as a basis for generating the list would mean no complex authorization schemes.
All that would be required would be submitting a domain to an automated system which compiles the list from the records. However, there'd have to be thought put in to anti-spam/abuse of the automated system. |
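
To make the proposal above concrete, here is a minimal sketch of how an automated system might compile entries from TXT records that have already been fetched. Everything specific in it is an assumption for illustration: the `psl-entry=` record prefix, the function names, and the rule that a domain may only publish entries at or below itself. No such record format is defined by the PSL, and DNS resolution is deliberately left out.

```python
PREFIX = "psl-entry="  # hypothetical TXT record format, not a PSL convention


def entry_allowed(domain: str, entry: str) -> bool:
    """Accept an entry only if it names the domain itself or something below it."""
    rule = entry[1:] if entry.startswith("!") else entry  # exception rule
    if rule.startswith("*."):                             # wildcard rule
        rule = rule[2:]
    return rule == domain or rule.endswith("." + domain)


def compile_private_section(records: dict[str, list[str]]) -> list[str]:
    """Build a sorted, de-duplicated entry list from {domain: [TXT values]}."""
    entries = set()
    for domain, txt_values in records.items():
        domain = domain.strip(".").lower()
        for value in txt_values:
            if not value.startswith(PREFIX):
                continue  # unrelated TXT record (SPF, site verification, ...)
            entry = value[len(PREFIX):].strip().lower()
            if entry and entry_allowed(domain, entry):
                entries.add(entry)
    return sorted(entries)


if __name__ == "__main__":
    fetched = {
        "example.com": [
            "psl-entry=example.com",
            "psl-entry=*.apps.example.com",
            "psl-entry=victim.org",  # rejected: not under example.com
            "v=spf1 -all",
        ],
    }
    print(compile_private_section(fetched))
```

The scope check is one possible answer to part of the abuse concern: a zone can only nominate names it controls. It does nothing about spam submissions, which would still need rate limiting or review.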
Ben: Thanks for your suggestions! It might be worthwhile to visit the
archives of the IETF DBOUND mailing list, which explored what you proposed
and looked at the real tradeoffs different client use cases had to be
concerned about.
You might also find
https://github.com/sleevi/psl-problems/blob/master/README.md helpful for
historic context about why the list in its current form exists.
Hope this helps!
|
Thanks for the links, @sleevi! The design of the internet is fascinating to me. Especially interested in HTTP State Tokens as an alternative to cookies. Will definitely read more of the archives from the DBOUND list, but I'm glad more experienced people than myself have already considered that option. Anyway, would it be possible to use similar DNS records to automate maintenance of the list itself? That would at least answer the "when can we remove entries?" question. However, I understand additions to the list are obviously much more difficult in order to prevent abuse. What if there were an automated system which charged a nominal fee ($25?), similar to domain registration? The funds could go to the IETF or a similar non-polarizing public-benefit organization, and the fee could dissuade people from using the list to circumvent things such as Let's Encrypt's rate limiting and, with guidance, help a requestor better understand the economic impacts of addition. A manual review option could be available for projects that could not afford the fee. Plus, money changing hands through conventional means almost always leads to auditable identity and accountability in cases of abuse. |
While your point is the operation of the PSL through the DNS, I wanted to point out that consumers can already use the DNS for querying the PSL, see https://publicsuffix.zone/. |
Thanks again for your suggestions.
There are zero plans to ever charge for the list, and it would be counter
to the goals and ethos that created that list, just like we would not
charge to submit open-source patches.
Automation has been heavily discussed by the PSL maintainers (and I believe
some of that is archived in the psl-discuss@ mailing list). It can indeed
be helpful, but does not solve the removal problem without having all
domains on an automated solution. I believe the discussion around those
tradeoffs was public, although it may have been on the older,
maintainers-only mail list unfortunately.
|
@sleevi That makes a lot of sense. I admire the work y'all do. Let me know if there's anything I can do to help. |
Thanks, Ben - ideas and energy always appreciated.
The big challenge in making any changes comes from the diversity of usage
and expectations of the status quo that are present in use cases out there
in the wild.
We are seeing Let's Encrypt and other CAs use the PSL as a fast fail
on requests, and other services do something similar with respect to domain
behaviors.
Depending upon the use case, the list may be used in part or in whole in a
variety of ways, and we got stuck in dbound because it was challenging to
even define a "public suffix" due to the diversity of use cases.
Some wanted to have a top down authority chain from the root, akin to
DNSSEC. This breaks some use cases.
Some wanted to publish all their info from their zone. This might work,
but without a 100% replacement that is backward compatible it would mean
the costs of operating a parallel system and dealing with syncing plus
authority issues.
We are volunteers here, and are not looking to introduce less opportunity
to spend time with families or distract from day jobs if one is fortunate
enough to have one right now.
So the pr# txt _psl.foo.bar gives us validation/verification for now.
Automation of this would be helpful in speeding things up, but it does
cause a quick human review of the rationale of the request and validation
of the dns.
We want to have some connective tissue between the PR and the
administration of a domain name. The PR is specific to a given change, so
it is a reasonable assurance of verification.
Though it may sound like a bias toward ensuring technical review is present,
adding the TXT record tends to trigger a technical scrutiny that helps
demonstrate that someone who is aware of the impacts of adding or removing
an entry will put some thought into it.
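
The PR-to-domain tether described above can be sketched as a small check. This is only an illustration, not the maintainers' tooling: it assumes the TXT record lives at `_psl.<suffix>` and carries the pull request URL in the form shown, the function names are hypothetical, and fetching the records from DNS is left to the caller.

```python
def verification_name(suffix: str) -> str:
    """Name at which the TXT record is expected, e.g. _psl.foo.bar."""
    return "_psl." + suffix.strip(".").lower()


def expected_txt_value(pr_number: int) -> str:
    """Assumed record content: the URL of the pull request being authenticated."""
    return f"https://github.com/publicsuffix/list/pull/{pr_number}"


def pr_is_authenticated(pr_number: int, txt_values: list[str]) -> bool:
    """True if any TXT value fetched for the _psl name references this PR."""
    expected = expected_txt_value(pr_number)
    # Resolvers often return TXT data wrapped in double quotes; strip them.
    return any(value.strip().strip('"') == expected for value in txt_values)


if __name__ == "__main__":
    # 1234 is a made-up PR number; the list stands in for a real DNS answer.
    answer = ['"https://github.com/publicsuffix/list/pull/1234"']
    print(verification_name("foo.bar"), pr_is_authenticated(1234, answer))
```

A check like this only covers the DNS side; the human review of the request's rationale would still happen separately.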
With respect to money being collected...
I also believe, like Ryan, that introducing charging money at this time
would not be a good idea.
It may set expectations, and also might advantage or disadvantage some, and
it has not been something that has a cost other than git ability and
patience.
Perhaps in general, my only shift in this attitude about charging would be
ensuring that the hosting of the list is covered in the future, should
github change their model, or if there were costs that need covering to
evolve the service.
…-J
|
Updates: See
|
@simon-friedberger I think this can be closed as we have since established sufficient guidelines for what we accept. |
Please leave open |
If we look at where things historically were, the PSL emerged from three distinct needs:
1. Registry data for namespaces such as .co.uk, which separated out the .uk namespace into a set of 2LD groupings that were organized similar to how the gTLD namespace was organized at the time: com/net/org translated to .co.uk/.net.uk/.org.uk
2. Registered domains acting as domain registries (ar.com or gb.com - acting as ccTLDs within the .com space, .gb.net in .net, etc.)
3. Hosting providers, such as operaunite.com, appspot.com, and blogspot.com. This was a rather late addition - two of these were only added in 2010, operaunite.com slightly before then.
The registry data was almost entirely reported by the PSL maintainers, chasing down registry operators. Registered domains acting as domain registries was largely due to CentralNic, a popular registrar that also operated or partnered with several ccTLDs, and thus the data was incidentally picked up. The third case - hosting providers - was not really imagined in the PSL's creation, although it's come to dominate the number of changes to the PSL today.
The PSL has had some growing pains along the way - the opening of the gTLD space by ICANN meant that maintaining registry data was no longer something the PSL maintainers could do alone, because the sheer number of new registries prevented effective, ongoing maintenance. Registries started to be added by script, and manual curation of existing records stopped receiving much dedicated time.
A number of dynamic DNS providers were added, which are in a similar-but-not-identical case as the second - there are generally no WHOIS services being provided, and registration policies are a bit ad hoc, but both are aligned in that they provide vanity suffixes for registrants.
The growth of Internet services (and the centralization onto common platforms) has driven a significant amount of churn in the third case. New providers come up and old providers wither away, and the maintenance of that list is done almost exclusively based on self-reporting, with some basic automation before addition (the TXT records), since it's no longer possible to scale the investigative analysis that every PSL change previously got.
As the PSL itself has grown, consumers have had to dramatically alter how they consume the list - filtering out some use cases (such as the third), pushing for more information to be included for the first two use cases, or even rewriting the data structures used, going from static lists to hash lists to tries (compressed or full). Each big growth spurt of the PSL has forced some change for consumers.
Similarly, the adoption of the third case has increased the rate of change in the PSL. While previously the first case could be largely met by a static list updated annually, supporting the second and third cases means that changes on the order of days are at times necessary for consumers, as otherwise domain holders can't use certain features or they don't work correctly.
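
As a rough picture of the trie-based approach consumers moved to, here is a minimal, uncompressed label trie with a longest-match lookup. It is a sketch under stated assumptions rather than any particular consumer's implementation: the names `Node`, `build_trie`, and `public_suffix` are hypothetical, the sample rules are illustrative, and the matching follows the list's documented semantics (wildcard `*` labels, `!` exception rules, and an implicit `*` default).

```python
class Node:
    """One label in the trie; labels are stored right to left (TLD first)."""

    def __init__(self):
        self.children = {}
        self.is_rule = False
        self.is_exception = False


def build_trie(rules):
    root = Node()
    for rule in rules:
        exception = rule.startswith("!")
        labels = rule.lstrip("!").lower().split(".")
        node = root
        for label in reversed(labels):
            node = node.children.setdefault(label, Node())
        node.is_rule = True
        node.is_exception = exception
    return root


def public_suffix(root, domain):
    labels = domain.strip(".").lower().split(".")
    rev = labels[::-1]
    best = 1  # implicit "*" rule: the last label alone is a suffix
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.is_exception:
            # Exception rules win outright; drop the rule's leftmost label.
            return ".".join(labels[len(labels) - (depth - 1):])
        if node.is_rule:
            best = max(best, depth)
        if depth < len(rev):
            for key in (rev[depth], "*"):  # exact label or wildcard
                child = node.children.get(key)
                if child is not None:
                    stack.append((child, depth + 1))
    return ".".join(labels[-best:])


if __name__ == "__main__":
    trie = build_trie(["com", "uk", "co.uk", "*.ck", "!www.ck", "blogspot.com"])
    for name in ("a.b.co.uk", "foo.blogspot.com", "foo.bar.ck", "www.ck"):
        print(name, "->", public_suffix(trie, name))
```

Lookup cost here depends on the number of labels in the queried name rather than the size of the list, which is one reason a trie copes better with list growth than a linear scan.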
The PSL is thus at an inflection point - supporting all of these use cases means that its pace of change and its growth rate are no longer sustainable for the use cases and consumers it supports, and every new use of it brings greater overall risk into the ecosystem.
We thus need to figure out a roadmap for how the PSL will be maintained and scale, what use cases it will consider and not consider, and if and how to wean existing consumers off it, in the search for better solutions.