Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to fix string escaping and template problems. #25

Merged
merged 2 commits into from
Apr 13, 2024

Conversation

Kiisu-Master
Copy link
Contributor

@Kiisu-Master Kiisu-Master commented Mar 17, 2024

Fix #23

I don't know exactly what cdata is. Should it be escaped too?

I'm a bit concerned about the performance impact of the weird template formatter i made.
I made it because it's way cleaner to work with templates obviously and it would be ugly to switch it all to concatenating.
In my testing it seemed to solve the problem though.

@Kiisu-Master
Copy link
Contributor Author

Kiisu-Master commented Mar 17, 2024

Ok I just did some testing on a random website HTML file (30kb).
Before it would not preserve the content one to one after parsing and dumping.
With this PR it seems to preserve it.

Edit: As the file is 30kb and it doesn't seem to take considerably longer to process than before, i don't think there is any big performance impact with this.

Edit: My testing is inaccurate

@Kiisu-Master Kiisu-Master marked this pull request as draft March 17, 2024 19:33
@elenakrittik elenakrittik added the pr: bug A PR that fixs a bug. label Mar 20, 2024
@elenakrittik
Copy link
Owner

I don't know exactly what cdata is. Should it be escaped too?

SO post. In general, it should be escaped as well, but i'm afraid we won't be able to use String.xml_escape for it as it has a different set of characters that need escaping.

@Kiisu-Master
Copy link
Contributor Author

Ah so its like a comment but part of the document. That's weird

@Kiisu-Master
Copy link
Contributor Author

So i believe i fixed the problems my format function had before, but i don't know if it's really that much better to use a whole custom non-recursive string formatter instead of just concatenating the strings.

I'm also unsure about what exactly xml_escape does and when to use its option to escape quotes. It seems to work though.

@Kiisu-Master
Copy link
Contributor Author

I don't know exactly what cdata is. Should it be escaped too?

SO post. In general, it should be escaped as well, but i'm afraid we won't be able to use String.xml_escape for it as it has a different set of characters that need escaping.

Reading up on that, according to the specification explained in this SO answer, you can have anything in a CDATA block without it getting parsed, except its end sequence. The end sequence cannot be escaped, anything else doesn't need escaping. That's how I understand it.

@elenakrittik
Copy link
Owner

i don't know if it's really that much better to use a whole custom non-recursive string formatter instead of just concatenating the strings.

I don't remember saying that i was against concatenating method specifically 🤔 If you could re-do this with concatenation instead, it would be much appreciated (and more concise as well).

what exactly xml_escape does and when to use its option to escape quotes

It turns e.g. " hello '>.<' " into " hello &apos;&gt;.&lt;&apos; ", and .xml_unescape() turns that back into " hello '>.<' ".

@elenakrittik
Copy link
Owner

The end sequence cannot be escaped

If we really want to, we can do what's suggested in this answer.

@elenakrittik elenakrittik linked an issue Mar 22, 2024 that may be closed by this pull request
@Kiisu-Master
Copy link
Contributor Author

I don't remember saying that i was against concatenating method specifically 🤔 If you could re-do this with concatenation instead, it would be much appreciated (and more concise as well).

😐?

    if self.standalone:
        return "<" + self.name + attribute_string + "/>"
    else:
        return "".join([
            "<" + self.name + attribute_string + ">",
            self.content.xml_escape() + cdata_string + children_string,
            "</" + self.name + ">"
        ])

@elenakrittik
Copy link
Owner

I don't remember saying that i was against concatenating method specifically 🤔 If you could re-do this with concatenation instead, it would be much appreciated (and more concise as well).

😐?

    if self.standalone:
        return "<" + self.name + attribute_string + "/>"
    else:
        return "".join([
            "<" + self.name + attribute_string + ">",
            self.content.xml_escape() + cdata_string + children_string,
            "</" + self.name + ">"
        ])

I meant that as in "more concise than the 'escaper' you currently have". Additionally, you can use parentheses instead of .join()ing parts and extract parts into variables.

@Kiisu-Master Kiisu-Master marked this pull request as ready for review March 25, 2024 20:23
@elenakrittik elenakrittik self-requested a review April 2, 2024 15:24
Copy link
Owner

@elenakrittik elenakrittik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, sorry for the review taking so long and thanks for contributing!

@elenakrittik elenakrittik merged commit 25d5aae into elenakrittik:master Apr 13, 2024
@Kiisu-Master
Copy link
Contributor Author

Kiisu-Master commented Apr 13, 2024

Your commit introduced double unescaping i think...
Godots xml parser already does unescaping.
image
(These errors don't happen without your commit.)

@elenakrittik
Copy link
Owner

If you mean that when e.g. <node attr="&quot;"> gets parsed into a node it has .attributes["attr"] == '"', then that's intentional. Is this not what you expected?

@Kiisu-Master
Copy link
Contributor Author

You added xml_unescape but Godot xml parser already does unecaping.
This means it gets unescaped twice now.

func test_unescaping() -> void:
	var xml := XMLNode.new()
	xml.content = "&quot;"
	xml.attributes["one"] = "&lt;"
	xml.attributes["two"] = "&amp; "
	var parsed_xml := XML.parse_str(xml.dump_str(true)).root
	# This is a function i wrote that uses GDUnit to test if all properties match (It will fail in this case)
	assert_xml(xml, parsed_xml)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr: bug A PR that fixs a bug.
Projects
None yet
2 participants