Towards Comprehensive Analysis of Tor Hidden Service Access Behavior Identification Under Obfs4 Scenario #101
Comments
I think this is the research paper that corresponds to the data set. https://dl.acm.org/doi/10.1145/3491396.3506532 What brought it to your attention?
Sorry, I forgot to include the link. The article itself doesn't look particularly good, but some things caught my interest:
If all this is true, it might be interesting to think about adding cover traffic to any obfs4 successor. That, and perhaps playing with adversarial attacks that can confuse the classifier, since DF is just a CNN ;) Anyway, feel free to close this if it doesn't seem relevant to discuss further; I understand the need for some filtering criteria.
It's all good, don't worry about it. It's encouraged to post anything you've read that you've found interesting.
It's actually possible to get almost arbitrary traffic shaping in obfs4.
The idea is: you have an algorithm for the traffic schedule that is independent of when the tunneled application actually sends traffic. When the schedule calls for a certain number of bytes to be sent at a certain time, you send encrypted application data if there is any available; otherwise you send padding.
Something similar is possible with Shadowsocks AEAD ciphers using encryptions of empty plaintexts, even though Shadowsocks doesn't have explicit support for padding. I heard this idea from @fortuna. When you need a source of bytes for padding, you can encrypt a zero-length plaintext, which gives you 34 bytes that decrypt to nothing: a 2-byte encrypted length field, a 16-byte length field authentication tag, a 0-byte encrypted payload, and a 16-byte payload authentication tag. You can concatenate as many of these empty ciphertexts as you need to satisfy the traffic schedule.
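To make the arithmetic concrete, here is a minimal sketch of how one slot of such a schedule could be filled in the Shadowsocks case. The function name `plan_slot` and its parameters are hypothetical and not part of Shadowsocks or obfs4. The sketch only counts bytes (no encryption is performed), and it ignores the per-chunk payload size limit. It assumes every chunk, empty or not, costs the 34 bytes of framing described above.

```python
LEN_FIELD = 2   # encrypted length field, in bytes
TAG = 16        # AEAD authentication tag, in bytes
# An empty chunk: length field + its tag + 0 bytes of payload + payload tag.
EMPTY_CHUNK = LEN_FIELD + TAG + 0 + TAG  # 34 bytes


def plan_slot(budget: int, pending: int) -> tuple[int, int, int]:
    """Plan one schedule slot (hypothetical helper, byte counting only).

    budget  -- bytes the traffic schedule wants on the wire in this slot
    pending -- application bytes currently waiting to be sent

    Returns (payload_bytes, empty_chunks, unused_bytes).
    """
    payload = 0
    remaining = budget
    # Prefer real data: one data chunk costs 34 bytes of framing plus payload.
    if pending > 0 and remaining > EMPTY_CHUNK:
        payload = min(pending, remaining - EMPTY_CHUNK)
        remaining -= EMPTY_CHUNK + payload
    # Fill whatever is left with empty chunks that decrypt to nothing.
    empty_chunks, unused = divmod(remaining, EMPTY_CHUNK)
    return payload, empty_chunks, unused


if __name__ == "__main__":
    print(plan_slot(100, 0))     # no data waiting: padding only
    print(plan_slot(100, 10))    # a little data, the rest is padding
    print(plan_slot(100, 1000))  # plenty of data: the slot is all data
```

One thing the sketch makes visible: with no application data available, padding comes only in multiples of 34 bytes, so a slot whose budget is not a multiple of 34 cannot be filled exactly by empty chunks alone. A schedule would need to be designed with that granularity in mind.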
Defending against this sort of attack is beyond the threat model of the obfs4 traffic shaper implementation (and to be honest is beyond the threat model of obfs4 in general). For a while I was working on something that supported using an at-the-time state-of-the-art anti-fingerprinting defense, but I stopped working on it because I lost interest. I have been told that some people evaluated it for a research paper and it worked quite well, though even with some congestion awareness trickery, it did burn a (to me) unreasonable amount of bandwidth. Edit: Don't take this as a suggestion to go dig up the code for my old prototype either. There is quite a bit of the design that I would change in the unlikely event that I were to do it again.
Thanks for the clarification, it makes sense. Intuitively I would have expected the protections against fingerprinting via packet timing to be somewhat more effective.
I'd be interested in any pointers to that, if it's been published. My hunch is that, for the particular case of convolutional neural nets (which seem to achieve the top accuracy among the classifiers I've seen so far), there might be a relatively cheap set of perturbations that would flip the output. I have no hands-on experience with that, but it looks like a fun experiment to run (the most extreme example of such attacks on classifiers is perhaps https://arxiv.org/abs/1710.08864).
Towards Comprehensive Analysis of Tor Hidden Service Access Behavior Identification Under Obfs4 Scenario
Xuebin Wang, Zeyu Li, Wentao Huang, Meiqi Wang, Jinqiao Shi, Yanyan Yang
dataset: https://github.com/Meiqiw/obfs4-mingan
In this paper, we present a novel approach to identify Tor hidden service access activity with key sequences under Obfs4 scenario. By calculating the key cell signals occurred during Tor hidden service access process, we get the start index and window size of the key TCP package sequence of traffic. In order to verify the effectiveness of this method, we perform comprehensive analysis under nine scenarios of different Obfs4 transmission model. We find through experimental results that there is a TCP package sequence window, which has a great contribution to identifying Tor hidden service access traffic. Only use the key TCP sequence as input, we can achieve more than 90% accuracy as well as recall in nine scenarios.