
feat: introduce support for generating observations for circumvention nettests #48

Draft: wants to merge 5 commits into main


@DecFox (Contributor) commented Dec 7, 2023

  • This diff introduces support for generating observations for the circumvention nettests psiphon and vanilla_tor, using a new CircumventionToolObservation observation class.


```python
@add_slots
@dataclass
class CircumventionToolObservation(MeasurementMeta):
```
Member

While this looks really good, I think we should invest a bit of time in defining this Observation data model so that it's flexible enough for all future circumvention tools.

Ideally, it should adapt nicely to new circumvention tools as we release them, and it's probably worth asking @ainghazal for his thoughts on the topic.

Some of the considerations to keep in mind are the following:

  • Schema migrations are painful, so the fewer we do, the better
  • It's easier to add new columns than to change an existing column
  • We should plan for schema evolution so that the model is as future-proof as possible, and where we do anticipate changes, make them by adding new columns rather than by changing existing ones
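The third point can be illustrated with a minimal sketch, assuming observation rows are plain Python dataclasses like CircumventionToolObservation. The class names, the `bootstrap_time` column and the `load_row` helper are hypothetical, not taken from the PR. Version 2 only appends an optional column with a default, so rows written under version 1 still load without any migration.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


# Hypothetical, trimmed-down observation row: version 1 of the schema.
@dataclass
class CircumventionRowV1:
    measurement_uid: str
    tool_name: str              # e.g. "psiphon" or "vanilla_tor"
    success: Optional[bool]


# Version 2 evolves the schema additively: existing columns keep their
# names and types, and the new column is optional with a default.
@dataclass
class CircumventionRowV2:
    measurement_uid: str
    tool_name: str
    success: Optional[bool]
    bootstrap_time: Optional[float] = None  # hypothetical new column


def load_row(raw: Dict[str, Any]) -> CircumventionRowV2:
    """Load a stored row, old or new, into the latest schema.

    Columns missing from older rows fall back to their defaults; keys the
    latest schema does not know about are ignored.
    """
    known = {f.name for f in fields(CircumventionRowV2)}
    return CircumventionRowV2(**{k: v for k, v in raw.items() if k in known})


old_row = asdict(CircumventionRowV1("m1", "psiphon", True))
new_row = {**old_row, "measurement_uid": "m2", "bootstrap_time": 3.2}
print(load_row(old_row))  # bootstrap_time=None
print(load_row(new_row))  # bootstrap_time=3.2
```

Renaming `success` or changing its type, by contrast, would break `load_row` for every stored row, which is exactly the kind of change the comment suggests avoiding.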


I lack broader context about data design for observations, but in principle I'd go for a generic circumvention observation (perhaps with a method family and optional flavor or configuration parameters) rather than a flat observation table that tries to accommodate every tool.
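One way to read the "generic circumvention observation" suggestion is a single record type with a method family, an optional flavor, and a free-form map of configuration parameters. The sketch below assumes that reading; `MethodFamily`, `GenericCircumventionObservation`, `config_json` and the config keys are hypothetical names, not part of the PR.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class MethodFamily(str, Enum):
    """Hypothetical top-level families of circumvention methods."""
    PROXY = "proxy"
    VPN = "vpn"
    ONION_ROUTING = "onion_routing"


@dataclass
class GenericCircumventionObservation:
    measurement_uid: str
    tool_name: str                    # e.g. "psiphon", "vanilla_tor"
    method_family: MethodFamily
    flavor: Optional[str] = None      # optional variant within the family
    # Tool-specific parameters live in a key/value map instead of dedicated
    # columns, so supporting a new tool does not require a schema change.
    config: Dict[str, str] = field(default_factory=dict)
    success: Optional[bool] = None

    def config_json(self) -> str:
        """Serialize config deterministically for a single text column."""
        return json.dumps(self.config, sort_keys=True)


obs = GenericCircumventionObservation(
    measurement_uid="m1",
    tool_name="vanilla_tor",
    method_family=MethodFamily.ONION_ROUTING,
    config={"bridge": "none"},
    success=True,
)
print(obs.config_json())  # {"bridge": "none"}
```

The trade-off against a flat table is that tool-specific parameters become less strongly typed and harder to query, in exchange for not needing a new nullable column per tool.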

A couple of quick thoughts though:

  1. If schema migrations are painful, wouldn't it be a good idea to spend some effort on a solution that automates them (thinking of an equivalent of Django's South)? I guess we'd basically need a version number and a way to convert semantically equivalent data for each field, plus the ability to mark a change as backwards incompatible (NA before a version cut).
  2. One idea I've been entertaining is to draw a "family tree" of circumvention tools (proxy, VPN, onion routing) that captures at least the broad aspects of their protocols, and then lets us specify the parameters that vary (for example, Tor over an obfs4 bridge with endpoint E, where obfs4 has a version that lets us compare breaking changes, etc.). The same goes for VPNs: is_vpn=True, but proto=wireguard && transport=tcp && obfuscation=foo.
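Point 1 could be sketched as a chain of versioned migration steps, each carrying field renames, per-field converters for semantically equivalent data, and a list of fields whose meaning changed incompatibly, which rows older than the cut then receive as None (NA). This is a minimal illustration under those assumptions, not Django South's API; `Migration`, `migrate`, and the `runtime`, `bootstrap_time_ms` and `blocking_type` fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Migration:
    """One schema step: rows at from_version become rows at from_version + 1."""
    from_version: int
    renames: Dict[str, str] = field(default_factory=dict)  # old name -> new name
    # Converters for semantically equivalent data, keyed by the (new) field name.
    converters: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)
    # Fields whose meaning changed incompatibly: rows older than this version
    # cut get None (NA) instead of a guessed value.
    incompatible: List[str] = field(default_factory=list)


def migrate(row: Row, version: int, migrations: List[Migration]) -> Row:
    """Apply every step from `version` onwards and stamp the final version."""
    steps = {m.from_version: m for m in migrations}
    row = dict(row)  # never mutate the caller's row
    while version in steps:
        step = steps[version]
        for old, new in step.renames.items():
            if old in row:
                row[new] = row.pop(old)
        for name, convert in step.converters.items():
            if row.get(name) is not None:
                row[name] = convert(row[name])
        for name in step.incompatible:
            row[name] = None
        version += 1
    row["schema_version"] = version
    return row


MIGRATIONS = [
    # v1 -> v2: "runtime" in seconds becomes "bootstrap_time_ms".
    Migration(1, renames={"runtime": "bootstrap_time_ms"},
              converters={"bootstrap_time_ms": lambda s: s * 1000}),
    # v2 -> v3: "blocking_type" changed meaning, so older values become NA.
    Migration(2, incompatible=["blocking_type"]),
]

print(migrate({"measurement_uid": "m1", "runtime": 2.5, "blocking_type": "dns"}, 1, MIGRATIONS))
# {'measurement_uid': 'm1', 'blocking_type': None, 'bootstrap_time_ms': 2500.0, 'schema_version': 3}
```

Each step only moves a row forward by one version, so a row stored at any version reaches the latest schema by replaying the chain.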
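The "family tree" in point 2 might look like a small tree of method nodes, where a concrete configuration is a node plus the parameters that vary (endpoint, protocol, transport, obfuscation), and a check like is_vpn becomes "is this node under the vpn family". The sketch assumes that shape; `MethodNode`, `MethodInstance`, `is_in` and the node names are hypothetical, and the parameter values simply echo the placeholders in the comment (endpoint E, obfuscation=foo).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MethodNode:
    """A node in a hypothetical "family tree" of circumvention methods."""
    name: str
    parent: Optional["MethodNode"] = None

    def path(self) -> List[str]:
        """Names from the root family down to this node."""
        node, out = self, []
        while node is not None:
            out.append(node.name)
            node = node.parent
        return list(reversed(out))


@dataclass
class MethodInstance:
    """A concrete configuration: a tree node plus the parameters that vary."""
    method: MethodNode
    params: Dict[str, str] = field(default_factory=dict)

    def is_in(self, family: str) -> bool:
        """True if this configuration sits anywhere under `family`."""
        return family in self.method.path()

    def describe(self) -> str:
        parts = [f"{k}={v}" for k, v in sorted(self.params.items())]
        return "/".join(self.method.path()) + (" [" + ", ".join(parts) + "]" if parts else "")


# Families named in the comment: proxy, VPN, onion routing.
proxy = MethodNode("proxy")
vpn = MethodNode("vpn")
onion = MethodNode("onion_routing")
tor = MethodNode("tor", parent=onion)
obfs4_bridge = MethodNode("obfs4_bridge", parent=tor)

tor_over_obfs4 = MethodInstance(obfs4_bridge, {"endpoint": "E", "obfs4_version": "<version>"})
wireguard = MethodInstance(vpn, {"proto": "wireguard", "transport": "tcp", "obfuscation": "foo"})

print(tor_over_obfs4.describe())  # onion_routing/tor/obfs4_bridge [endpoint=E, obfs4_version=<version>]
print(wireguard.describe())       # vpn [obfuscation=foo, proto=wireguard, transport=tcp]
```

Because `is_in` walks the path, a query such as "all VPN measurements" keeps working as new leaves and parameters are added, which fits the additive schema-evolution goal raised earlier in the review.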

@hellais (Member) commented Dec 7, 2023

This looks really good, thanks for working on it!

Could you perhaps split this PR in two, moving the changes that add new Observation data models (the circumvention tools one) into a separate PR, so that we can land the facebook_messenger observation generation sooner?

For the new circumvention tool Observation tables, I think we should take a bit more time to consider whether this approach is solid (which it may very well be), and I would not want that to hold up landing the facebook_messenger transformer, which is much more straightforward.

3 participants