Modifying nodes #9
Comments
Hi. Yeah, I've been meaning to do this for a while (making nodes mutable), but I never really got to it because I didn't know if it was needed and was holding off until someone asked for it.
My initial idea was to change

```diff
--- a/src/bytes.rs
+++ b/src/bytes.rs
@@ -3,7 +3,12 @@ use std::borrow::Cow;
 #[derive(PartialEq, Eq, PartialOrd, Ord, Hash, Clone)]
-pub struct Bytes<'a>(&'a [u8]);
+pub enum Bytes<'a> {
+    /// Borrowed bytes
+    Borrowed(&'a [u8]),
+    /// An owned string
+    Owned(Box<str>),
+}
```

That is, make it an enum representing either borrowed (unmodified) data, or an owned string in case it's mutated later after parsing (much like how
That being said, I have some more ideas on how to implement it without changing the
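The borrowed-or-owned idea in the diff above can be sketched in safe Rust. This is only an illustration under assumptions: the name `CowBytes` and the methods `as_bytes`, `set`, and `is_borrowed` are hypothetical and not taken from the crate.

```rust
/// Hypothetical safe version of the enum proposed in the diff.
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum CowBytes<'a> {
    /// Unmodified data that still points into the parser input.
    Borrowed(&'a [u8]),
    /// Data that was replaced after parsing.
    Owned(Box<str>),
}

impl<'a> CowBytes<'a> {
    /// View the current contents, whichever variant is active.
    pub fn as_bytes(&self) -> &[u8] {
        match self {
            CowBytes::Borrowed(bytes) => *bytes,
            CowBytes::Owned(text) => text.as_bytes(),
        }
    }

    /// Replace the contents with an owned copy of `value`.
    /// The original input is left untouched.
    pub fn set(&mut self, value: &str) {
        *self = CowBytes::Owned(value.into());
    }

    /// True while the value has not been mutated.
    pub fn is_borrowed(&self) -> bool {
        matches!(self, CowBytes::Borrowed(_))
    }
}
```

Parsing would only ever create `Borrowed` values, so the cost of an allocation is paid only for the substrings that are actually changed.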
Thanks for the effort. I would gladly take a look at your PR, but I have a lot of school projects to finish first. Anyway, happy new year 🎉
Hey guys, what can I work on to help with this effort? I too need to be able to mutate hrefs.
The PR that implements this (#10) needs a few more things until it can be merged.
I was a bit busy the last few days and didn't have a lot of time to spend here, but now I have more time.
@Skallwar Hey, I'm still not sure I understand your use-case entirely. What happens after modifying
Exactly
Yeah, I saw that. When I have more time, I would love to help you with both mutable nodes and serialization.
10: Allow DOM mutation r=y21 a=y21

This PR implements DOM mutation as suggested in #9. It does so by changing `Bytes` (the type that holds substrings) to be copy-on-write, roughly like this:

```diff
- #[derive(PartialEq, Eq, PartialOrd, Ord, Hash, Clone)]
- pub struct Bytes<'a>(&'a [u8]);
+ #[derive(PartialEq, Eq, PartialOrd, Ord)]
+ enum Bytes {
+     /// Borrowed bytes
+     Borrowed(*const u8, u32),
+     /// Owned bytes
+     Owned(*mut u8, u32),
+ }
```

Initially, during parsing, it references input strings through `Bytes::Borrowed`. Later, the substring can be mutated by copying the bytes and changing its discriminant to `Bytes::Owned` (this is exposed through the safe method `Bytes::set`). In the `Drop` code for `Bytes`, it deallocates the data *if* (and only if) `Bytes` contains owned data.

Sadly, we can't use `&[u8]` in the enum as that increases the size of the enum to 24 bytes. It gets around this by storing a pointer to the start of the slice and using a `u32` for the length. This brings it back to 16 bytes (8 for the pointer + 4 for the length + 4 for discriminant/padding/alignment).

This PR also adds a new fuzzing target for query selectors.

## Todo

- [x] Check for input length in `tl::parse()` (it can't be `> u32::MAX`).
- [x] Check for input length in `Bytes::set` (it can't be `> u32::MAX`).
- [x] InlineVec and InlineHashMap need `Drop` code to deallocate possibly owned `Bytes`
- [x] Create InlineHashMap/InlineVec mutation methods.
  - [x] `InlineHashMap::insert`
  - [x] `InlineHashMap::remove`
  - [x] `InlineHashMap::get_mut`
  - [x] `InlineVec::push`
  - [x] `InlineVec::remove`
  - [x] `InlineVec::get_mut`

Co-authored-by: y21 <btimo8822@gmail.com>
Co-authored-by: Timo <30553356+y21@users.noreply.github.com>
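The pointer-plus-`u32` layout described in the PR text can be sketched as follows. This is a minimal illustration, not the PR's actual code: the names `Repr`, `PackedBytes`, `borrowed`, `release`, and `is_owned` are hypothetical, the lifetime marker is an added assumption to keep borrowed data tied to the input, and the allocation strategy (a boxed slice) is just one way to do it.

```rust
use std::marker::PhantomData;
use std::{ptr, slice};

/// Raw representation mirroring the enum in the PR description.
enum Repr {
    Borrowed(*const u8, u32),
    Owned(*mut u8, u32),
}

/// Hypothetical wrapper; the marker ties borrowed data to the input lifetime.
pub struct PackedBytes<'a> {
    repr: Repr,
    _input: PhantomData<&'a [u8]>,
}

impl<'a> PackedBytes<'a> {
    /// Borrow from the input; `None` if the length does not fit in a `u32`.
    pub fn borrowed(data: &'a [u8]) -> Option<Self> {
        if data.len() > u32::MAX as usize {
            return None;
        }
        Some(PackedBytes {
            repr: Repr::Borrowed(data.as_ptr(), data.len() as u32),
            _input: PhantomData,
        })
    }

    pub fn as_bytes(&self) -> &[u8] {
        let (start, len) = match self.repr {
            Repr::Borrowed(start, len) => (start, len),
            Repr::Owned(start, len) => (start as *const u8, len),
        };
        // SAFETY: `start` and `len` come from a live slice (borrowed for `'a`)
        // or from an allocation owned by `self`.
        unsafe { slice::from_raw_parts(start, len as usize) }
    }

    /// Copy `data` into a fresh allocation and switch to the owned variant.
    /// Returns `false` (and changes nothing) if `data` is too long.
    pub fn set(&mut self, data: &[u8]) -> bool {
        if data.len() > u32::MAX as usize {
            return false;
        }
        let copy: Box<[u8]> = data.into();
        let start = Box::into_raw(copy) as *mut u8;
        self.release();
        self.repr = Repr::Owned(start, data.len() as u32);
        true
    }

    pub fn is_owned(&self) -> bool {
        matches!(self.repr, Repr::Owned(..))
    }

    /// Free the allocation if, and only if, the data is owned.
    fn release(&mut self) {
        if let Repr::Owned(start, len) = self.repr {
            // SAFETY: the pointer and length were produced by `Box::into_raw`
            // in `set`, and every call site overwrites or drops `repr` next.
            unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(start, len as usize))) };
        }
    }
}

impl<'a> Drop for PackedBytes<'a> {
    fn drop(&mut self) {
        self.release();
    }
}
```

Both length checks in the sketch correspond to the `u32::MAX` items in the todo list: once the length is stored as a `u32`, every entry point that accepts a slice has to reject inputs that would not fit.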
As far as I can tell, most of what was requested here is already implemented. Right? Modifying

I would need this kind of feature for my fun project. Depending on whether you plan to implement those at all (and maybe even with some idea of when), I would either just stick with tl or put a second layer of abstraction (e.g. rctree) on top: using tl to build up the DOM and rctree to manipulate the heck out of it before writing it out as a string. Any info you can share would be much appreciated.
Yeah, most of the things in the DOM can be mutated now.
I think adding nodes to the DOM should be fairly easy to support. I can try to get it implemented later today.
I greatly appreciate it 🙂 I wasn't so sure if that would be viable, as it appears to me that you use some aspects of the
Oh, that's true. I forgot that the query selector iterator relies on the value of the node ids. That indeed makes it more difficult than I thought.
A thought on this topic I would like to share: instead of making the model itself mutable through and through, maybe add special methods on the parser, inspired by CQRS. That way you still have super-fast parsing and iteration over the whole model, with as little overhead and copying as possible.
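One possible reading of this suggestion, sketched under heavy assumptions: the parsed data stays read-only, and changes are recorded as commands in a separate log that is consulted when the document is written out. None of the names below (`ParsedAttrs`, `Command`, `EditLog`, `record`, `resolve`) exist in tl; they are hypothetical and only illustrate the read/write split.

```rust
use std::collections::HashMap;

/// Hypothetical read side: attribute values borrowed straight from the input.
pub struct ParsedAttrs<'a> {
    pub values: Vec<&'a str>,
}

/// Hypothetical write side: a change request that is recorded, not applied in place.
pub enum Command {
    SetAttr { index: usize, value: String },
}

/// Collects commands; the parsed data itself is never touched.
#[derive(Default)]
pub struct EditLog {
    overrides: HashMap<usize, String>,
}

impl EditLog {
    /// Record a command; a later command for the same index wins.
    pub fn record(&mut self, command: Command) {
        match command {
            Command::SetAttr { index, value } => {
                self.overrides.insert(index, value);
            }
        }
    }

    /// What a serializer would write for attribute `index`:
    /// the recorded override if there is one, else the parsed value.
    pub fn resolve<'m, 'a>(
        &'m self,
        parsed: &'m ParsedAttrs<'a>,
        index: usize,
    ) -> Option<&'m str> {
        match self.overrides.get(&index) {
            Some(value) => Some(value.as_str()),
            None => parsed.values.get(index).copied(),
        }
    }
}
```

The trade-off in this sketch is that readers who want to see edits must go through `resolve`, while plain iteration over `ParsedAttrs` keeps seeing the original input.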
It would be great if `tl` users were able to modify nodes. My use case is a web downloader, so I need to change the target href of images, for example.
From what I see this is not available yet, but it would be great, as `tl` seems to be a speedy HTML parser. I might try to implement this, but it seems that a big chunk of code will need to be changed, so I might need some help figuring out the implications of such a change.
Anyway, awesome project, keep up the good work.