XML Schema Download incorrectly modifies the schema #387
Comments
Hi, thank you for the detailed explanation. An alternative is the uri_mapper option, available since release v3.0.0 (download the schemas manually and then provide a map for remote URLs to local paths). |
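A minimal sketch of the kind of remote-to-local map described above, assuming the schemas were already downloaded by hand into a local `schemas` directory. The helper name `build_uri_map` and the directory name are hypothetical; the base URL and file names are the ones that appear elsewhere in this thread.

```python
from pathlib import Path
from urllib.parse import urljoin


def build_uri_map(base_url, local_dir, filenames):
    """Map each remote schema URL to the path of a manually downloaded copy."""
    return {
        urljoin(base_url, name): str(Path(local_dir) / name)
        for name in filenames
    }


# URL taken from the thread; "schemas" is an assumed local directory.
BASE_URL = "http://www.accellera.org/XMLSchema/IPXACT/1685-2022/index.xsd"
uri_map = build_uri_map(BASE_URL, "schemas", ["index.xsd", "component.xsd"])

for remote, local in uri_map.items():
    print(remote, "->", local)
```

How the dictionary is handed to the library (presumably through the uri_mapper option when the schema is built) should be checked against the xmlschema documentation for v3.0.0 or later.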
For now, I have manually reverted the changes in the XML Schema affecting me to move ahead. |
I could add a logger for the export method (the export_schema function, in fact), providing an optional loglevel argument as is already done for schema initialization/building. |
Maybe a fix in this helper is sufficient to solve this:

    def replace_location(text: str, location: str, repl_location: str) -> str:
        repl = 'schemaLocation="{}"'.format(repl_location)
        pattern = r'\bschemaLocation\s*=\s*[\'\"].*%s.*[\'"]' % re.escape(location)
        return re.sub(pattern, repl, text)

The replacement pattern also matches the namespace part so the XML namespace has no Also another improvement (reducing useless changes) could be to skip the erasing of residual non-remote locations. |
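The over-matching can be reproduced with the standard library alone. The sketch below is not the fix that went into the library: `replace_location_strict` is a hypothetical variant, and the attribute order in the sample line is an assumption chosen to show one way the greedy `.*` can swallow a neighbouring namespace attribute.

```python
import re


def replace_location_greedy(text, location, repl_location):
    # Pattern as quoted in the comment above: both ".*" are greedy.
    repl = 'schemaLocation="{}"'.format(repl_location)
    pattern = r'\bschemaLocation\s*=\s*[\'\"].*%s.*[\'"]' % re.escape(location)
    return re.sub(pattern, repl, text)


def replace_location_strict(text, location, repl_location):
    # Hypothetical variant: the value may not contain quotes, and the
    # closing quote must be the same character as the opening one.
    repl = 'schemaLocation="{}"'.format(repl_location)
    pattern = (r'\bschemaLocation\s*=\s*([\'"])[^\'"]*%s[^\'"]*\1'
               % re.escape(location))
    return re.sub(pattern, lambda _match: repl, text)


# Assumed sample line: schemaLocation comes before namespace.
LINE = ('<xs:import schemaLocation="http://www.w3.org/2001/xml.xsd" '
        'namespace="http://www.w3.org/XML/1998/namespace"/>')
URL = "http://www.w3.org/2001/xml.xsd"

greedy_result = replace_location_greedy(LINE, URL, "xml.xsd")
strict_result = replace_location_strict(LINE, URL, "xml.xsd")
print(greedy_result)  # the namespace attribute is lost
print(strict_result)  # the namespace attribute is preserved
```

Restricting the match to characters that are not quotes keeps the substitution inside a single attribute value, which is why the second version leaves the rest of the element alone.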
- Fix the replacement pattern in export_schema()
- Add loglevel argument, apply with a decorator
- Add logger.debug statements
- Don't remove non-remote residuals schemaLocation entries
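As an illustration of the "loglevel argument applied with a decorator" idea, here is a small sketch using only the standard logging module. It is not the library's implementation: the logger name, the decorator `with_loglevel`, and the function `export_demo` are all hypothetical.

```python
import functools
import logging

logger = logging.getLogger("export_demo")  # hypothetical logger name


def with_loglevel(func):
    """Apply an optional ``loglevel`` keyword argument for the duration of a call."""
    @functools.wraps(func)
    def wrapper(*args, loglevel=None, **kwargs):
        if loglevel is None:
            return func(*args, **kwargs)
        previous = logger.level
        logger.setLevel(loglevel)
        try:
            return func(*args, **kwargs)
        finally:
            # Restore the previous level even if the call fails.
            logger.setLevel(previous)
    return wrapper


@with_loglevel
def export_demo(locations):
    # Stand-in for an export routine: it only logs what it would save.
    for location in locations:
        logger.debug("exporting %s", location)
    return len(locations)
```

Calling `export_demo(["a.xsd"], loglevel=logging.DEBUG)` enables the debug messages for that call only; without the argument the logger keeps whatever level it already had.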
The new release v3.1.0 has a fix for schema exports. Also a logging facility has been added to |
I tried out the latest release; it no longer seems to modify the XML schema. |
The XML namespace is already loaded within the meta-schema, so an So the download of remote |
To clarify: the schema export doesn't download anything; it only uses the already downloaded XSD sources contained in the schema instance and saves them locally. |
I'm sorry, I didn't remember correctly: the referenced xml.xsd (e.g. "http://www.w3.org/2001/xml.xsd") is an XSD file with a stylesheet. Schema classes use a meta-schema that has already loaded a minimal set of base namespaces:
I cannot remove XML from the base namespaces because xml:lang is used in the XSD namespace meta-schema (with a regular import). The meta-schema plays a fundamental part in making validation and decoding efficient, although it can be rebuilt if needed. Anyway, I think the export procedure can be extended with another option that attempts to load and save the residual locations referred to by skipped xs:import elements. I will try this approach for a future release. |
FYI about the special status of the above four base namespaces: https://www.w3.org/TR/xmlschema11-1/#sec-nss-special |
@AmeyaVS: I will not change schema export to download skipped schemas, as in the case of xml.xsd, but in the next minor release I will add a new API |
Makes sense. Thank you for getting a fix in quickly. |
Should I close this issue in the meantime, or should I keep it open once the |
Keep it open, the next minor release should be ready soon. |
The |
Hello @brunato, I tried the following code to try and observe the

    import os.path
    import urllib.parse  # "import urllib" alone does not guarantee urllib.parse is loaded

    import xmlschema
    from xmlschema import download_schemas


    def main():
        # Schema base URL resource
        xsd_base_uri = "http://www.accellera.org/XMLSchema/IPXACT/1685-2022/index.xsd"
        # Extract the path from the URI
        path = urllib.parse.urlparse(xsd_base_uri).path
        print(path)
        # Split the path into directory + resource name
        schema_path = os.path.split(path)
        print(schema_path)
        target_path = f"schemas/{schema_path[0]}"
        # Create the directory if it doesn't exist
        os.makedirs(target_path, exist_ok=True)
        local_target_path = f"{target_path}/{schema_path[1]}"
        # Download schemas
        download_schemas(xsd_base_uri, target="schemas2")
        if not os.path.isfile(local_target_path):
            schema = xmlschema.XMLSchema(xsd_base_uri)
            schema.export(target=target_path, save_remote=True)
        schema = xmlschema.XMLSchema(local_target_path)


    if __name__ == '__main__':
        main()

And I am observing the following error message on the console with respect to the
While looking at the

    <xs:schema xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2022" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.accellera.org/XMLSchema/IPXACT/1685-2022" elementFormDefault="qualified">
        <xs:include schemaLocation="busDefinition.xsd"/>
        <xs:include schemaLocation="component.xsd"/>
        <xs:include schemaLocation="design.xsd"/>
        <xs:include schemaLocation="designConfig.xsd"/>
        <xs:include schemaLocation="abstractionDefinition.xsd"/>
        <xs:include schemaLocation="catalog.xsd"/>
        <xs:include schemaLocation="abstractor.xsd"/>
        <xs:include schemaLocation="typeDefinitions.xsd"/>
        <!-- <xs:include schemaLocation="memoryMapDefinition.xsd"/> -->
        <!-- <xs:include schemaLocation="addressBlockDefinition.xsd"/> -->
        <!-- <xs:include schemaLocation="registerFileDefinition.xsd"/> -->
        <!-- <xs:include schemaLocation="registerDefinition.xsd"/> -->
        <!-- <xs:include schemaLocation="fieldDefinition.xsd"/> -->
        <!-- <xs:include schemaLocation="enumerationDefinition.xsd"/> -->
        <xs:group name="IPXACTDocumentTypes">

It seems xmlschema is also parsing the commented-out section, whose entries are invalid schema definitions anyway. Let me know if additional context is needed. Regarding the two different ways to get the schemas: both result in identical schemas being downloaded for my use case. |
OK, maybe it is better to abandon the regex for extracting the schemaLocation list from text and iterate over the ElementTree structure instead. Thank you. |
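The difference between the two approaches can be sketched with the standard library. This is an illustration, not the code that went into xmlschema: the function names and the shortened sample document are assumptions. The default ElementTree parser drops comments, so commented-out includes never show up as elements.

```python
import re
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Top-level XSD elements that may carry a schemaLocation attribute.
LOCATION_TAGS = {
    f"{{{XSD_NS}}}{name}"
    for name in ("include", "import", "redefine", "override")
}


def locations_by_regex(xsd_text):
    # Text-based extraction: cannot tell live elements from comments.
    return re.findall(r'schemaLocation\s*=\s*["\']([^"\']+)["\']', xsd_text)


def locations_by_tree(xsd_text):
    # Tree-based extraction: only real child elements of the root are visited.
    root = ET.fromstring(xsd_text)
    return [
        child.get("schemaLocation")
        for child in root
        if child.tag in LOCATION_TAGS and child.get("schemaLocation")
    ]


# Shortened, assumed sample modelled on the index.xsd excerpt in this thread.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="catalog.xsd"/>
  <!-- <xs:include schemaLocation="memoryMapDefinition.xsd"/> -->
  <xs:include schemaLocation="typeDefinitions.xsd"/>
</xs:schema>"""

print(locations_by_regex(SAMPLE))
print(locations_by_tree(SAMPLE))
```

The regex version reports the commented-out memoryMapDefinition.xsd as well, while the tree version returns only the two active includes.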
Changing that for export is not advisable because xml.xsd is already included in the base schema set, so the xmlschema library doesn't need to save another copy of it. If you want, you can try an export after creating the schema providing Anyway the |
Sounds good, let me know if you want me to close this issue. |
- Modify dataclass XsdSource: now takes a path and an XMLResource, other attributes are set in __init__;
- Schema locations now are extracted from XML tree.
The changes are now published. Try the updated code and report any other problems, or close the issue. |
Sorry for the delay. |
I am trying to download an XML schema from a remote URL, and the library seems to be modifying one of the schema documents incorrectly.
Here's a snippet of the code to reproduce the issue:
The library seems to be modifying the xs:import line in the autoConfigure.xsd document:
The left side is the original file downloaded from the URL:
http://www.accellera.org/XMLSchema/IPXACT/1685-2022/autoConfigure.xsd
Because of the edit, XML validation fails due to an incorrect XML Schema specification, with the following error:
There are other changes as well, but they have no significant impact.
Is there an option to download the XML Schemas without editing?