Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DomReader: slow performance $O(n^2)$ when deserialize on millions of elements to list #342

Closed
morris821028 opened this issue Jul 5, 2023 · 3 comments
Assignees
Milestone

Comments

@morris821028
Copy link

Target Object

LinkedList<String> data = new LinkedList<>();
for (int i = 0; i < 1000000; i++)
  data.add("elem");

Source Code

// Element container; // using DocumentBuilder to append children.

new XStream(new DomDriver()).unmarshal(new DomReader(container));

The parser will achieve $O(n^2)$ time when switching context.

https://github.com/x-stream/xstream/blob/master/xstream/src/java/com/thoughtworks/xstream/io/xml/DomReader.java#L135

Because the parser will move down and up, the DomReader.reassignCurrentElement in linear time $O(n)$. For each element, this function will be called in $O(n)$. Finally, complete in $O(n^2)$.

Then, I'm not sure the implementation of Element.item(int). In my understanding, some of implementations used linked list and make it worse in $O(n^2)$ in one DomReader.reassignCurrentElement.

Here is my workaround for your reference.

public class DomReader extends AbstractDocumentReader {
...
		private List<Element> childElements;
		private Map<Object, List<Element>> cache;
...
		@Override
		protected void reassignCurrentElement(Object current) {
			currentElement = (Element) current;
			if (cache == null)
				cache = new HashMap<>();
			childElements = cache.get(current);
			if (childElements != null)
				return;

			childElements = new ArrayList<>();
			Node node = currentElement.getFirstChild();
			while (node != null) {
				if (node instanceof Element) {
					childElements.add((Element) node);
				}
				node = node.getNextSibling();
			}
			cache.put(current, childElements);
		}

		@Override
		public void moveUp() {
			cache.remove(getCurrent());
			super.moveUp();
		}
...
@joehni
Copy link
Member

joehni commented Jul 10, 2023

Actually the code was written 19 years ago (predates my time with XStream), but I confess the current implementation is unfortunate for elements with a lot of children.

@joehni joehni self-assigned this Jul 10, 2023
@kdebski85
Copy link

This can be considered as a security issue.
An attacker can send XML with many elements to perform DOS attack on a server.

@joehni
Copy link
Member

joehni commented Sep 26, 2023

This can be considered as a security issue. An attacker can send XML with many elements to perform DOS attack on a server.

No, but you're free to prove it. DOM is a memory-based model. You'll never get an XML structure loaded with so many elements to slow down this code significantly enough for a DOS attack. You'll run into an OOME first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants