feat: introduce support for generating observations for circumvention nettests #48
base: main
@add_slots
@dataclass
class CircumventionToolObservation(MeasurementMeta):
While this looks really good, I think we should invest a bit of time in defining this Observation data model so that it's flexible enough for all future circumvention tools.
Ideally it would be something that can adapt nicely to new circumvention tools as we put them out and it's probably worth checking with @ainghazal what his thoughts are on the topic.
Some of the considerations to keep in mind are the following:
- Schema migrations are a pain, so the fewer we do, the better
- It's easier to add new columns than it is to change an existing column
- We should design for schema evolution up front: make the model as future-proof as possible, and where we do anticipate changes, make them by adding new columns rather than altering existing ones
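To make the "add columns, don't change them" point concrete, here is a minimal sketch, assuming plain dataclasses rather than the project's real base classes. `ObservationV1`, `ObservationV2`, `load_row` and all field names are hypothetical and only illustrate the idea: new columns arrive with a `None` default, so rows written before they existed still load unchanged.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ObservationV1:
    # Hypothetical original columns (not the real schema).
    measurement_uid: str
    tool_name: str
    bootstrap_time: float


@dataclass
class ObservationV2(ObservationV1):
    # New columns are only ever appended, each with a None default,
    # so existing columns keep their name, type and meaning.
    transport: Optional[str] = None
    failure: Optional[str] = None


def load_row(row: dict) -> ObservationV2:
    # Drop keys the model does not know about; columns missing from
    # older rows fall back to their None default.
    known = {f.name for f in fields(ObservationV2)}
    return ObservationV2(**{k: v for k, v in row.items() if k in known})


old_row = {"measurement_uid": "m1", "tool_name": "psiphon", "bootstrap_time": 4.2}
new_row = {**old_row, "transport": "obfs4"}

old_obs = load_row(old_row)
new_obs = load_row(new_row)
```

With this shape, old rows and new rows can sit in the same table, and readers only need to handle `None` for the columns that were added later.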
I lack broader context about data design for observations, but in principle, I think I'd go for a generic circumvention observation (with perhaps method family and optional flavor or configuration parameters) rather than a flat observation table that tries to accommodate all of them.
A couple of quick thoughts though:
- If schema migrations are painful, wouldn't it be a good idea to spend some effort on a solution that automates them? (I'm thinking of the equivalent of Django's south.) I guess we'd basically need a version and a way to convert semantically equivalent data for each field, plus the ability to mark a change as backwards incompatible (NA before a version cut).
- One thought I've been entertaining is to draw a "family tree" of circumvention tools (proxy, VPN, onion routing) that captures at least the broad aspects of the protocols, and then allows specifying the parameters that change (for example, Tor over an obfs4 bridge, with endpoint E, where obfs4 has a version that allows us to compare breaking changes, etc.). Same for VPN: is_vpn=True, but proto=wireguard && transport=tcp && obfuscation=foo.
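A rough sketch of the generic shape described above, assuming a family enum plus free-form parameters. `ToolFamily`, `GenericCircumventionObservation`, `matches` and the parameter keys are all hypothetical names for illustration, not anything that exists in this PR:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class ToolFamily(Enum):
    # Top level of the hypothetical "family tree".
    PROXY = "proxy"
    VPN = "vpn"
    ONION_ROUTING = "onion_routing"


@dataclass
class GenericCircumventionObservation:
    family: ToolFamily
    tool_name: str
    # Tool-specific configuration lives here, so a new tool or flavor
    # adds keys instead of changing the fixed columns above.
    params: Dict[str, str] = field(default_factory=dict)

    def matches(self, **wanted: str) -> bool:
        # True when every requested parameter is present with that value.
        return all(self.params.get(k) == v for k, v in wanted.items())


vpn = GenericCircumventionObservation(
    family=ToolFamily.VPN,
    tool_name="example_vpn",
    params={"proto": "wireguard", "transport": "tcp", "obfuscation": "foo"},
)
tor = GenericCircumventionObservation(
    family=ToolFamily.ONION_ROUTING,
    tool_name="tor",
    params={"bridge": "obfs4", "bridge_version": "1", "endpoint": "E"},
)
```

The trade-off is that free-form parameters are harder to query and validate than dedicated columns, so which keys get promoted to real columns would still need to be decided per tool.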
This looks really good, thanks for working on it! Could you perhaps split this PR into two, separating out the part that adds new Observation data models (the one about circumvention tools), so that we can land the facebook_messenger observation generation sooner? I think we should take a bit more time to consider whether the approach for the new circumvention tool Observation tables is solid (which it might very well be), and I would not want that to stall landing the facebook_messenger transformer, which is much more straightforward.
psiphon and vanilla_tor using a new CircumventionToolObservation observations class.