From 0d1740d9ac5b118b7e5b85a2345963fbd567af2d Mon Sep 17 00:00:00 2001 From: Eric Phenix Date: Wed, 3 Oct 2018 01:04:54 -0600 Subject: [PATCH] 01 - xml blog 2 --- .gitignore | 3 + ...g-XML.markdown => 2018-09-14-XML.markdown} | 308 ++++++++++-------- _posts/2019-10-02-XML.markdown | 183 +++++++++++ 3 files changed, 351 insertions(+), 143 deletions(-) create mode 100644 .gitignore rename _posts/{2018-09-14-Using-XML.markdown => 2018-09-14-XML.markdown} (75%) create mode 100644 _posts/2019-10-02-XML.markdown diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..45c1505 --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +_site +.sass-cache +.jekyll-metadata diff --git a/_posts/2018-09-14-Using-XML.markdown b/_posts/2018-09-14-XML.markdown similarity index 75% rename from _posts/2018-09-14-Using-XML.markdown rename to _posts/2018-09-14-XML.markdown index ffbb83a..6f809d7 100644 --- a/_posts/2018-09-14-Using-XML.markdown +++ b/_posts/2018-09-14-XML.markdown @@ -1,143 +1,165 @@ ---- -layout: post -title: "Working with XML & PowerShell - Part 0" -date: 2018-09-14 00:35:14 -0600 -categories: powershell xml guide series ---- - -## Introduction - -In this guide, I hope to introduce you to manipulating XML data using PowerShell. We should cover the following topics: - -1. Loading XML data into PowerShell -2. How to reference nodes and attributes -3. Some of the nuances behind nodes and attributes when they are overloaded -4. Introduction to XPath -5. Saving your XML - - -Unlike JSON, XML nodes have both children (nested nodes) and attribues. XML also supports comments, and certain special attributes such as schemas. We'll cover those later on. - -You can find all the supporting xml and Powershell files for this guide on github, [here](https://github.com/ephenix/XML-Powershell). - -I encourage you to step through the powershell and play with the objects at each step. 
- -Keep in mind, the default ToString of the XMLElement object can be quite misleading -- Collections of nodes can be truncated by the text output. It's not quite straightforward when you are given an attribute, a child node, a collection of nodes, a node's name, or a '#Text' value. Doing testing and writing robust code with the information from this guide will ensure consistency when working with data of varying structures. - ---- - -## Loading the XML into PowerShell -Here is the xml we'll be working with for this first example: - -[00.xml](https://github.com/ephenix/XML-Powershell/blob/master/00.xml) -```xml - - - - value3 - - -``` - -```PowerShell -[xml]$XMLDocument = Get-Content ".\00.xml" -``` - -Casting a string to [xml] yields an XMLDocument object. This is what we'll use to parse through our XML. - -## Referencing Nodes and Attributes - -Both nodes and attributes are referenced by a Property added to their parent XMLDocument or XMLElement object. - -The first node of our XML here is the \ tag, which we'd reference with $XMLDocument.rootnode - -```PowerShell -$XMLDocument | Get-Member -MemberType Property - - TypeName: System.Xml.XmlDocument -Name MemberType Definition ----- ---------- ---------- -rootnode Property System.Xml.XmlElement rootnode {get;} -``` - -To get the name of the element you are currently on, we can use the built in .name property of XMLDocument and XMLElement objects. If there is also an attribute or child node called 'name', you can use the inhereted method get_name() to retrieve the name of your node. - -```PowerShell -$XMLDocument.name -eq "#document" -$XMLDocument.rootnode.name -eq "rootnode" -# because rootnode.node[0] doesn't have a name attribute, .name gives the node name. -$XMLDocument.rootnode.node[0].name -eq "node" -# node[1] does have a name attribute, so it returns that value instead. 
-$XMLDocument.rootnode.node[1].name -eq "nameattribute" -$XMLDocument.rootnode.node[1].get_name() -eq "node" -``` - -When there are child nodes and attributes with the same name, the property returns an object[] array of the '#Text' properties of the attributes and child nodes, or the 'name' property if they don't have text. - -```PowerShell - -$XMLDocument.rootnode.node[1] | Get-Member -MemberType Property - - TypeName: System.Xml.XmlElement - -Name MemberType Definition ----- ---------- ---------- -attribute Property System.Object[] attribute {get;} -name Property string name {get;set;} - -$XMLDocument.rootnode.node[1].Attribute -value2 -value3 - -$XMLDocument.rootnode.node[1].Attribute | % { $_.getType().name } -String -String - -``` -To robustly reference either the attributes or child nodes, use the corresponding collection properties for 'Attributes' or 'ChildNodes': - -```PowerShell -$XMLDocument.rootnode.node[1].Attributes | gm - - TypeName: System.Xml.XmlAttribute - -( $XMLDocument.rootnode.node[1].Attributes | ? Name -eq "attribute" ).'#Text' -eq "value2" - -$XMLDocument.rootnode.node[1].ChildNodes.'#Text' -eq "value3" -``` - -And of course, as these property names may also be overwritten, there are inhereted methods to reference these as well -- get_ChildNodes() and get_Attributes(). For a full list of these inherited methods, you can run - -```PowerShell -$XMLDocument.rootnode | Get-Member -Force -``` ---- -## XPath Lookups - -[XPath](https://www.w3schools.com/xml/xpath_intro.asp) is a way of codifying node paths. [Syntax Reference](https://www.w3schools.com/xml/xpath_syntax.asp). - -We can select nodes with string-based paths more robustly than we could otherwise do with regular expressions. 
- -```PowerShell -#Select all nodes with an attribute called "name" -$XMLDocument.SelectNodes("//*[@name]") - -#Select all nodes with a child node -$XMLDocument.SelectNodes("//*[*]") -#Select all nodes with a child node of type "attribute" -$XMLDocument.SelectNodes("//*[attribute]") -``` - -I use XPath extensively to store the location for specific values in our configs we need to change. We will go over the schema I use in a future post. - ---- - -## Saving the XML - -Alright, say we've loaded our XML, found our node using either the automatic properties, inherited methods, or XPath. We've modified the values we need to, now what? - -The XMLDocument is strictly in-memory -- we haven't changed the original file. In order to save our XML to disk, we neet to call the .Save( \ ) method of our XMLDocument object. - -```PowerShell -$XMLDocument.Save($Path) -``` +--- +layout: post +title: "Working with XML & PowerShell - Part 0" +date: 2018-09-14 00:35:14 -0600 +categories: powershell xml guide series +--- + +## Introduction + +Let's learn how to manipulate XML with PowerShell! + +If you are token replacing your XML configs with regex, you probably haven't seen [this famous Stackoverflow post](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Let me summarize: Regular Expressions are wholly insufficient for parsing XML, as XML is not a regular language. + +Here we'll use the .NET [System.Xml] namespace -- primarily XMLDocument and XMLElement to robustly parse XML structure to get and set exactly the values we want. + +In this post, we will cover the following topics: + +1. Loading existing XML into PowerShell +2. Referencing Nodes and Attributes of the XML +3. Using the inherited methods to unambiguously refer to Attributes or Nodes +4. Using XPath to refer to specific nodes +5. Saving the XML +6. Closing Thoughts + +--- + +## 1. 
Loading XML into PowerShell + +Here is the xml we'll be working with for this first example: + +[00.xml](https://github.com/ephenix/XML-Powershell/blob/master/00.xml) +```xml + + + + value3 + + +``` + +```PowerShell +[xml]$XMLDocument = Get-Content ".\00.xml" +``` + +Casting a string to [xml] yields an XMLDocument object. This object is the gateway to all of our sub-nodes. + +## 2. Referencing Nodes and Attributes + +Both nodes and attributes are referenced by a Property added to their parent XMLDocument or XMLElement object. + +The first node of our XML here is the \<rootnode\> tag, which we'd reference with $XMLDocument.rootnode: + +```PowerShell +$XMLDocument | Get-Member -MemberType Property + + TypeName: System.Xml.XmlDocument +Name MemberType Definition +---- ---------- ---------- +rootnode Property System.Xml.XmlElement rootnode {get;} +``` + +## 3. Inherited Methods + +To get the name of the element you are currently on, we can use the built-in .name property of XMLDocument and XMLElement objects. If the node also has an attribute or child node called 'name', that value shadows the property, so use the inherited method get_name() to retrieve the name of your node instead. + +```PowerShell +$XMLDocument.name -eq "#document" +$XMLDocument.rootnode.name -eq "rootnode" +# because rootnode.node[0] doesn't have a name attribute, .name gives the node name. +$XMLDocument.rootnode.node[0].name -eq "node" +# node[1] does have a name attribute, so it returns that value instead. +$XMLDocument.rootnode.node[1].name -eq "nameattribute" +$XMLDocument.rootnode.node[1].get_name() -eq "node" +``` + +When a child node and an attribute share the same name, the property returns an object[] array of the '#Text' properties of the attributes and child nodes, or the 'name' property if they don't have text. 
+ +```PowerShell + +$XMLDocument.rootnode.node[1] | Get-Member -MemberType Property + + TypeName: System.Xml.XmlElement + +Name MemberType Definition +---- ---------- ---------- +attribute Property System.Object[] attribute {get;} +name Property string name {get;set;} + +$XMLDocument.rootnode.node[1].Attribute +value2 +value3 + +$XMLDocument.rootnode.node[1].Attribute | % { $_.getType().name } +String +String + +``` +To robustly reference either the attributes or child nodes, use the corresponding collection properties, 'Attributes' or 'ChildNodes'. + +Also shown here, the value in between tags, such as \Value\, is referenced by the '#Text' property. + +```PowerShell +$XMLDocument.rootnode.node[1].Attributes | gm + + TypeName: System.Xml.XmlAttribute + +( $XMLDocument.rootnode.node[1].Attributes | ? Name -eq "attribute" ).'#Text' -eq "value2" + +$XMLDocument.rootnode.node[1].ChildNodes.'#Text' -eq "value3" +``` + +And of course, as these property names may also be shadowed by an attribute or child node of the same name, there are inherited methods to reference these as well -- get_ChildNodes() and get_Attributes(). For a full list of these inherited methods, you can run: + +```PowerShell +$XMLDocument.rootnode | Get-Member -Force +``` +--- +## 4. XPath Lookups + +[XPath](https://www.w3schools.com/xml/xpath_intro.asp) is a way of codifying node paths. [Syntax Reference](https://www.w3schools.com/xml/xpath_syntax.asp). + +We can select nodes with string-based paths more robustly than we could otherwise do with regular expressions. + +```PowerShell +#Select all nodes with an attribute called "name" +$XMLDocument.SelectNodes("//*[@name]") + +#Select all nodes with a child node +$XMLDocument.SelectNodes("//*[*]") +#Select all nodes with a child node named "attribute" +$XMLDocument.SelectNodes("//*[attribute]") +``` + +I use XPath extensively to store the location of specific values in our configs that we need to change. We will go over the schema I use in a future post. + +--- + +## 5. 
Saving the XML + +Alright, say we've loaded our XML and found our node using either the automatic properties, inherited methods, or XPath. We've modified the values we need to, now what? + +The XMLDocument is strictly in-memory -- we haven't changed the original file. In order to save our XML to disk, we need to call the .Save( \<path\> ) method of our XMLDocument object. + +```PowerShell +$XMLDocument.Save($Path) +``` + +## 6. Closing Thoughts + +Unlike JSON, XML nodes have both children (nested nodes) and attributes. XML also supports comments, and certain special attributes such as schemas. We'll cover those in a future post. + +You can find all the supporting XML and PowerShell files for this guide on GitHub, [here](https://github.com/ephenix/XML-Powershell). + +I encourage you to step through the PowerShell and play with the objects at each step. + +Keep in mind, the default ToString of the XMLElement object can be quite misleading -- collections of nodes can be truncated by the text output. It's not always obvious whether you have been given an attribute, a child node, a collection of nodes, a node's name, or a '#Text' value. Testing, and writing robust code with the information from this guide, will help keep your results consistent when working with data of varying structures. + + \ No newline at end of file diff --git a/_posts/2019-10-02-XML.markdown b/_posts/2019-10-02-XML.markdown new file mode 100644 index 0000000..6708638 --- /dev/null +++ b/_posts/2019-10-02-XML.markdown @@ -0,0 +1,183 @@ +--- +layout: post +title: "Working with XML & PowerShell - Part 1" +date: 2018-10-02 00:35:14 -0600 +categories: powershell xml guide series +--- + +## Introduction + +Welcome to Part 1 of our series on working with XML in PowerShell. + +If you missed part 0, you can find it here: + +As before, you can find the supporting PowerShell and XML files on [GitHub](https://github.com/ephenix/XML-Powershell). + +In this post, we will cover the following topics: + +1. Working With Real Data +2. 
Updating Values the Hard Way +3. Updating Values with XPath +4. Conclusion + +--- + +## 1. Working With Real Data + +Here is the xml we'll be working with for this example: + +[01.xml](https://github.com/ephenix/XML-Powershell/blob/master/01.xml) +```xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +Here's an example web.config for a WCF web application. + +One thing that's important to note here -- each of the client and service endpoints is a little different: the attributes appear in different orders, some are missing, and others are formatted on different lines. This is an example of how even a multi-line regex would be insufficient for identifying nodes. It closely mirrors what you'll find in real-world configs as you begin trying to automatically transform them. + +## 2. Updating Values the Hard Way + +Let's say that, for our build, we need to update our configs to use the development environment URLs on the client side. The services defined in the same file must stay pointed to localhost. + +If we attempt to replace all localhost addresses with our app-dev URL, it will break our services config, and that's not good. 
+ +```PowerShell +$Content = Get-Content "$PSScriptRoot\01.xml" -Raw +$Content = $Content -replace "localhost:8000", "app-dev.contoso.local" + +``` + +```PowerShell +Context "Try to Set URLs with Regex" { + $Content = $Content -replace "localhost:8000", "app-dev.contoso.local" + [xml]$xml = $Content + it "We don't want to change service addresses" { + $addresses = $xml.SelectNodes("//service/endpoint") | + Foreach-Object { ( $_.Get_Attributes() | + Where-Object { $_.Get_Name() -eq "address"} ).'#Text' } + $addresses | Should -match "localhost:8000" + } + it "But we do want to change endpoint addresses"{ + $addresses = $xml.SelectNodes("//client/endpoint") | + Foreach-Object { ( $_.Get_Attributes() | + Where-Object { $_.Get_Name() -eq "address"} ).'#Text' } + $addresses | Should -match "app-dev.contoso.local" + } + } + + Context Try to Set URLs with Regex + [+] We don't want to change service addresses 31ms + [-] But we do want to change endpoint addresses 9ms + Expected regular expression 'app-dev.contoso.local' to match 'http://localhost:8000/ServiceModelSamples/service.svc', but it did not +match. + 18: $addresses | Should -match "app-dev.contoso.local" + at , C:\onedrive\OneDrive - StrategicTech.io\dev\blog\powershell\XML-Powershell\01.ps1: line 18 +``` + +Writing a regular expression that safely identifies only the nodes we want is not feasible. The same XML can be validly formatted in far too many ways: attribute order, quoting, and line breaks can all vary. + + +Without using Regex or XPath, our code needs to essentially traverse the XML tree with a chain of Where-Object calls. + +```PowerShell +$configuration = $XmlDocument.ChildNodes | ? {$_.Get_Name() -eq "configuration"} +$servicemodel = $configuration.ChildNodes | ? {$_.Get_Name() -eq "system.servicemodel"} +$client = $servicemodel.ChildNodes | ? {$_.Get_Name() -eq "client"} +$endpoints = $client.ChildNodes | ? 
{$_.Get_Name() -eq "endpoint"} + +foreach ( $endpoint in $endpoints ) +{ + $endpoint.address = $endpoint.address -replace "localhost:8000", "app-dev.contoso.local" +} +``` + +## 3. Updating Values with XPath + +XPath makes things much, much simpler. + +Here, we're making all the URLs https://, including both services and client config. This could be tailored to one or the other by selecting nodes on //service/endpoint or //client/endpoint. + +```PowerShell +foreach ( $endpoint in $XmlDocument.SelectNodes("//endpoint") ) +{ + $endpoint.address = $endpoint.address -replace "http:","https:" +} +``` + +## 4. Conclusion + +I hope this has been a good introduction to some of the inconsistencies that real-world XML can throw at your parsing code. These were relatively minor in comparison to some I've faced. + +Finding a primary key among 200 haphazardly named nodes, some with names and some without, was an interesting challenge. + +Next up, we're going to look at formatting and sorting XML. This can be used to help compare changes, standardize en masse, and make the XML more human-readable. + + + \ No newline at end of file
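
As a rough end-to-end sketch of the client-only variant mentioned in section 3, the block below selects //client/endpoint, rewrites only those addresses, and saves a copy with .Save() as covered in Part 0. The inline XML is a cut-down, hypothetical stand-in for 01.xml, and the endpoint names and output path are assumptions made only so the example runs on its own.

```PowerShell
# Hypothetical, cut-down stand-in for 01.xml (not the real file from the repo).
[xml]$XmlDocument = @'
<configuration>
  <system.serviceModel>
    <client>
      <endpoint name="calc" address="http://localhost:8000/ServiceModelSamples/service.svc" />
    </client>
    <services>
      <service name="CalculatorService">
        <endpoint address="http://localhost:8000/ServiceModelSamples/service.svc" />
      </service>
    </services>
  </system.serviceModel>
</configuration>
'@

# Only endpoints under <client> match this path; <service> endpoints are left alone.
foreach ( $endpoint in $XmlDocument.SelectNodes("//client/endpoint") )
{
    # GetAttribute/SetAttribute are unambiguous even if a child node were also called "address".
    $address = $endpoint.GetAttribute("address")
    $endpoint.SetAttribute("address", ($address -replace "localhost:8000", "app-dev.contoso.local"))
}

# Assumed output location; Save() needs a full path, since relative paths
# resolve against the .NET working directory rather than the PowerShell location.
$Path = Join-Path ([System.IO.Path]::GetTempPath()) "01.client-dev.xml"
$XmlDocument.Save($Path)
```

Using GetAttribute and SetAttribute here instead of the automatic .address property is a deliberate choice: it sidesteps the name-collision behaviour described in Part 0, at the cost of being slightly more verbose.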