Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added functionality to correctly parse xml lists #15

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Charlemagne3
Copy link

@Charlemagne3 Charlemagne3 commented Apr 18, 2016

The problem

Hello, I am using this library to parse XML in Django REST, and unfortunately, the XML I am parsing is third party, so I cannot use the "list-item" designator.

The solution

I added a function _check_xml_list to check if an XML node represents a list of items. This replaces the check for children[0].tag == "list-item", allowing any list of child nodes that have identical names to be parsed as a list.

The way the function works is that it checks that

  1. the element has more than one child node
  2. the child nodes all have the same name

If they do not have the same name, then it is treated as a normal dictionary element, and duplicate nodes will be overwritten as before.

Testing

I added tests to check both valid and invalid lists nodes, which check the behavior described above.

The unit tests would not run with pytest-django 2.6; the error I received was:

Failed: Database access not allowed, use the "django_db" mark to enable

Upgrading to pytest-django to the latest, 2.9.1, solved this problem.

I am also using Django 1.9, and unittest is no longer a package in django.utils, so I removed the references to it. (This made the linter mad, so I reverted this in 9a24ffd and added the skipUnless checks to my two test methods)

I can remove those changes if desired, my main interest is getting this feature pulled in, and I needed to be able to run the tests to make sure that they passed.

Related issues

Here are related issues and a closed PR:

#5
#11
#12

@bufke
Copy link

bufke commented Jul 25, 2016

For anyone waiting this is a good workaround to use this patch right now.

from rest_framework_xml.parsers import XMLParser


class ListXMLParser(XMLParser):
    def _check_xml_list(self, element):
        return len(element) > 1 and len(set([child.tag for child in element])) <= 1


    def _xml_convert(self, element):
        children = list(element)

        if len(children) == 0:
            return self._type_convert(element.text)
        else:
            # if the fist child tag is list-item means all children are list-item
            if self._check_xml_list(element):
                data = []
                for child in children:
                    data.append(self._xml_convert(child))
            else:
                data = {}
                for child in children:
                    data[child.tag] = self._xml_convert(child)

        return data

@kyleobrien91
Copy link
Collaborator

@Charlemagne3 - sorry for the delay on this. I think this is a great idea.

Still looking at the PR but I noticed there is no documentation update. I think this should get some documentation change and then we can look at merging this.

Perhaps consider adding some info on this into the parsers page of the documentation.

@gertjanol
Copy link

gertjanol commented Dec 11, 2017

I'm not sure this is the way to go. I'm also facing this issue, and I tested this solution. This solution has no way to recognize a list with just one element, which the original implementation with the hard-coded item name (list-item) does get correctly.

Check this example that clarifies my point:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <field_a>
    <list-item>foo</list-item>
  </field_a>
  <field_b>
    <list-item>foo</list-item>
    <list-item>bar</list-item>
  </field_b>  
</root>
{
    "field_a": {
        "list-item": "foo"
    },
    "field_b": ["foo", "bar"]
}

Even though the XML is formatted in a structured way, the resulting dict is not easy to predict. And you can't demand of your users that lists should always contain two or more items. At least, I can't :).

@miniyuuzhan
Copy link

miniyuuzhan commented Dec 14, 2017

@gertjanol

This does the job for me

class ListXMLParser(XMLParser):
    list_tags = [
        'Record',
        'BulletPoint',
        'Job'
    ]

    def _check_xml_list(self, element):
        return len(element) > 1 >= len(set([child.tag for child in element]))

    def _xml_convert(self, element):
        children = list(element)

        if len(children) == 0:
            return self._type_convert(element.text)
        else:
            if self._check_xml_list(element) or children[0].tag in self.list_tags:
                data = []
                for child in children:
                    data.append(self._xml_convert(child))
            else:
                data = {}
                for child in children:
                    data[child.tag] = self._xml_convert(child)
        return data
<ns0:my_xml_service>
     <Records>
	   <Record>
	      <InterfaceType>Test</InterfaceType>
	      <TypeID>testid</TypeID>
	      <Identifier>testidentifier</Identifier>
	      <Status>teststatus</Status>
	      <ResendType>testtype</ResendType>
	      <ResendTypeID>testtypeid</ResendTypeID>
	   </Record>
	   <Record>
	      <InterfaceType>TestB</InterfaceType>
	      <TypeID>testidB</TypeID>
	      <Identifier>testidentifierB</Identifier>
	      <Status>teststatusB</Status>
	      <ResendType>testtypeB</ResendType>
	      <ResendTypeID>testtypeidB</ResendTypeID>
	   </Record>
     </Records>
</ns0:my_xml_service>

Becomes

{
	'Records': [{
		'InterfaceType': 'Test',
		'TypeID': 'testid',
		'Identifier': 'testidentifier',
		'Status': 'teststatus',
		'ResendType': 'testtype',
		'ResendTypeID': 'testtypeid'
	}, {
		'InterfaceType': 'TestB',
		'TypeID': 'testidB',
		'Identifier': 'testidentifierB',
		'Status': 'teststatusB',
		'ResendType': 'testtypeB',
		'ResendTypeID': 'testtypeidB'
	}]
}

Because my_xml_service contains two items

And this:

<ns0:my_xml_service>
	<Records>
	   <Record>
	      <InterfaceType>Test</InterfaceType>
	      <TypeID>testid</TypeID>
	      <Identifier>testidentifier</Identifier>
	      <Status>teststatus</Status>
	      <ResendType>testtype</ResendType>
	      <ResendTypeID>testtypeid</ResendTypeID>
	   </Record>
   </Records>
</ns0:my_xml_service>

Becomes

{
	'Records': [{
		'InterfaceType': 'Test',
		'TypeID': 'testid',
		'Identifier': 'testidentifier',
		'Status': 'teststatus',
		'ResendType': 'testtype',
		'ResendTypeID': 'testtypeid'
	}]
}

Because Record is in the node name list

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants