Simple namespace-aware parsing #119

Open: wants to merge 4 commits into master


pbiron commented on Jun 14, 2017

This patch adds "simple" namespace-aware parsing. By "simple" I mean:

  • it ignores the version-number portion of the various WXR namespace URIs that have been used
  • it still relies on <wp:wxr_version> to detect the WXR version (rather than examining the namespace URI of any element)
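A minimal Python sketch of what "ignoring the version number" could look like. It assumes WXR namespace URIs always have the form `http://wordpress.org/export/<version>`, with an optional trailing slash. The helper names (`is_wxr_namespace`, `split_clark`, `is_wxr_element`) are hypothetical and do not come from the patch, which targets the PHP importer. The idea is that an element is matched on the stable URI prefix plus its local name, never on the full versioned URI. ElementTree reports namespaced tags in `{uri}local` form, which is what `split_clark` takes apart.

```python
import re

# Assumption: every WXR namespace URI is http://wordpress.org/export/<version>,
# optionally ending in "/". Only the version part is allowed to vary.
WXR_NS_PATTERN = re.compile(r"^http://wordpress\.org/export/\d+(?:\.\d+)*/?$")


def is_wxr_namespace(uri):
    """Return True if uri is *any* version of the WXR namespace."""
    return bool(WXR_NS_PATTERN.match(uri or ""))


def split_clark(tag):
    """Split an ElementTree tag like '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag  # element in no namespace


def is_wxr_element(tag, local_name):
    """True if tag is the WXR element local_name, whatever the WXR version."""
    uri, local = split_clark(tag)
    return local == local_name and is_wxr_namespace(uri)


print(is_wxr_element("{http://wordpress.org/export/1.1}post_id", "post_id"))  # True
```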

So

<rss>
   ...
   <item>
      <wp:post_id xmlns:wp='http://wordpress.org/export/1.1'>123</wp:post_id>
      <wp:post_name xmlns:wp='http://wordpress.org/export/1.2'>home</wp:post_name>
      ...
   </item>
   ...
</rss>

is perfectly acceptable to it.
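To illustrate, here is a hypothetical Python sketch that streams the example above with the standard library's `xml.etree.ElementTree.iterparse`. It is not the importer's PHP code. The example is wrapped in a `<channel>` here, and `read_items` is an illustrative name. Each item's WXR fields are collected by local name. Because only the `http://wordpress.org/export/` prefix is checked, the 1.1 and 1.2 declarations are treated alike.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed pattern covering every WXR namespace version (illustrative only).
WXR_NS = re.compile(r"^http://wordpress\.org/export/\d+(?:\.\d+)*/?$")

SAMPLE = b"""<rss>
  <channel>
    <item>
      <wp:post_id xmlns:wp='http://wordpress.org/export/1.1'>123</wp:post_id>
      <wp:post_name xmlns:wp='http://wordpress.org/export/1.2'>home</wp:post_name>
    </item>
  </channel>
</rss>"""


def read_items(source):
    """Yield one dict per <item>, keyed by the local name of each WXR child."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        fields = {}
        for child in elem:
            if child.tag.startswith("{"):
                uri, _, local = child.tag[1:].partition("}")
                if WXR_NS.match(uri):  # version number is ignored here
                    fields[local] = (child.text or "").strip()
        yield fields
        elem.clear()  # drop the finished item's children while streaming


items = list(read_items(io.BytesIO(SAMPLE)))
print(items)  # [{'post_id': '123', 'post_name': 'home'}]
```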

I decided to do it this way to maintain backward compatibility with the standard importer (which, of course, ignores namespaces altogether) and in keeping with Postel's Law: "Be liberal in what you accept, and conservative in what you send".

The one part of the implementation (not strictly related to namespace-aware parsing) that you may not agree with is that it skips WXR elements until it has detected the version of WXR being used. I did this because this importer is truly streaming. Since, according to the RSS spec, <wp:wxr_version/> could come at the very end of <channel> (i.e., after all WXR elements), I don't think it is safe to insert posts, etc., into the DB without knowing which version of WXR is being used: some future version of WXR could change how the WordPress-related content is represented. Of course, the standard exporter will never produce that case, but since this importer needs to handle exports generated by arbitrary plugins, I think it's safer to do it this way.
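The skip-until-version behaviour can be sketched as follows. This is an illustrative Python version under stated assumptions, not the patch itself. `import_channel` is a hypothetical name, "importing" is reduced to recording element names, and only direct children of `<channel>` are considered (`<item>` and WXR elements such as `<wp:author>`). Any such element that ends before `<wp:wxr_version>` has been read is skipped.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed pattern for any WXR namespace version (illustrative only).
WXR_NS = re.compile(r"^http://wordpress\.org/export/\d+(?:\.\d+)*/?$")


def local_wxr_name(tag):
    """Return the local name if tag is in any WXR namespace version, else None."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        if WXR_NS.match(uri):
            return local
    return None


def import_channel(source):
    """Return (wxr_version, imported, skipped) for a WXR stream (hypothetical API).

    <item>s and channel-level WXR elements are "imported" (recorded) only once
    <wp:wxr_version> has been seen; any that end before it are skipped.
    """
    version = None
    imported, skipped = [], []
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # After decrementing, depth 2 means a direct child of <channel>
        # (<rss> is depth 0 and <channel> is depth 1 at this point).
        if depth != 2:
            continue
        name = local_wxr_name(elem.tag)
        if name == "wxr_version":
            version = (elem.text or "").strip()
        elif elem.tag == "item" or name is not None:
            label = name or "item"
            (imported if version else skipped).append(label)
        elem.clear()  # free the finished channel child and its descendants
    return version, imported, skipped


SAMPLE = b"""<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <wp:author><wp:author_login>admin</wp:author_login></wp:author>
    <wp:wxr_version>1.2</wp:wxr_version>
    <item><wp:post_id>1</wp:post_id></item>
  </channel>
</rss>"""

print(import_channel(io.BytesIO(SAMPLE)))  # ('1.2', ['item'], ['author'])
```

A real importer would insert posts where this sketch appends to `imported`. Buffering the early elements instead of dropping them would be an alternative, at the cost of holding them in memory until the version is known.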

I have also added unit tests and the XML Schema I wrote for WXR 1.2 (which is not used by the importer; I just wanted to get it out there).

Finally, there are a few @todos (marked with my initials, "pvb", to make them easier to find) that you should read over before committing.
