Skip to content

Commit

Permalink
01 - xml blog 2
Browse files Browse the repository at this point in the history
  • Loading branch information
Eric Phenix committed Oct 3, 2018
1 parent 9056272 commit 0d1740d
Show file tree
Hide file tree
Showing 3 changed files with 351 additions and 143 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
_site
.sass-cache
.jekyll-metadata
308 changes: 165 additions & 143 deletions _posts/2018-09-14-Using-XML.markdown → _posts/2018-09-14-XML.markdown
Original file line number Diff line number Diff line change
@@ -1,143 +1,165 @@
---
layout: post
title: "Working with XML & PowerShell - Part 0"
date: 2018-09-14 00:35:14 -0600
categories: powershell xml guide series
---

## Introduction

In this guide, I hope to introduce you to manipulating XML data using PowerShell. We should cover the following topics:

1. Loading XML data into PowerShell
2. How to reference nodes and attributes
3. Some of the nuances behind nodes and attributes when they are overloaded
4. Introduction to XPath
5. Saving your XML


Unlike JSON, XML nodes have both children (nested nodes) and attribues. XML also supports comments, and certain special attributes such as schemas. We'll cover those later on.

You can find all the supporting xml and Powershell files for this guide on github, [here](https://github.com/ephenix/XML-Powershell).

I encourage you to step through the powershell and play with the objects at each step.

Keep in mind, the default ToString of the XMLElement object can be quite misleading -- Collections of nodes can be truncated by the text output. It's not quite straightforward when you are given an attribute, a child node, a collection of nodes, a node's name, or a '#Text' value. Doing testing and writing robust code with the information from this guide will ensure consistency when working with data of varying structures.

---

## Loading the XML into PowerShell
Here is the xml we'll be working with for this first example:

[00.xml](https://github.com/ephenix/XML-Powershell/blob/master/00.xml)
```xml
<rootnode>
<node attribute="value1" />
<node attribute="value2" name="nameattribute">
<attribute>value3</attribute>
</node>
</rootnode>
```

```PowerShell
[xml]$XMLDocument = Get-Content ".\00.xml"
```

Casting a string to [xml] yields an XMLDocument object. This is what we'll use to parse through our XML.

## Referencing Nodes and Attributes

Both nodes and attributes are referenced by a Property added to their parent XMLDocument or XMLElement object.

The first node of our XML here is the \<rootnode> tag, which we'd reference with $XMLDocument.rootnode

```PowerShell
$XMLDocument | Get-Member -MemberType Property
TypeName: System.Xml.XmlDocument
Name MemberType Definition
---- ---------- ----------
rootnode Property System.Xml.XmlElement rootnode {get;}
```

To get the name of the element you are currently on, we can use the built in .name property of XMLDocument and XMLElement objects. If there is also an attribute or child node called 'name', you can use the inhereted method get_name() to retrieve the name of your node.

```PowerShell
$XMLDocument.name -eq "#document"
$XMLDocument.rootnode.name -eq "rootnode"
# because rootnode.node[0] doesn't have a name attribute, .name gives the node name.
$XMLDocument.rootnode.node[0].name -eq "node"
# node[1] does have a name attribute, so it returns that value instead.
$XMLDocument.rootnode.node[1].name -eq "nameattribute"
$XMLDocument.rootnode.node[1].get_name() -eq "node"
```

When there are child nodes and attributes with the same name, the property returns an object[] array of the '#Text' properties of the attributes and child nodes, or the 'name' property if they don't have text.

```PowerShell
$XMLDocument.rootnode.node[1] | Get-Member -MemberType Property
TypeName: System.Xml.XmlElement
Name MemberType Definition
---- ---------- ----------
attribute Property System.Object[] attribute {get;}
name Property string name {get;set;}
$XMLDocument.rootnode.node[1].Attribute
value2
value3
$XMLDocument.rootnode.node[1].Attribute | % { $_.getType().name }
String
String
```
To robustly reference either the attributes or child nodes, use the corresponding collection properties for 'Attributes' or 'ChildNodes':

```PowerShell
$XMLDocument.rootnode.node[1].Attributes | gm
TypeName: System.Xml.XmlAttribute
( $XMLDocument.rootnode.node[1].Attributes | ? Name -eq "attribute" ).'#Text' -eq "value2"
$XMLDocument.rootnode.node[1].ChildNodes.'#Text' -eq "value3"
```

And of course, as these property names may also be overwritten, there are inhereted methods to reference these as well -- get_ChildNodes() and get_Attributes(). For a full list of these inherited methods, you can run

```PowerShell
$XMLDocument.rootnode | Get-Member -Force
```
---
## XPath Lookups

[XPath](https://www.w3schools.com/xml/xpath_intro.asp) is a way of codifying node paths. [Syntax Reference](https://www.w3schools.com/xml/xpath_syntax.asp).

We can select nodes with string-based paths more robustly than we could otherwise do with regular expressions.

```PowerShell
#Select all nodes with an attribute called "name"
$XMLDocument.SelectNodes("//*[@name]")
#Select all nodes with a child node
$XMLDocument.SelectNodes("//*[*]")
#Select all nodes with a child node of type "attribute"
$XMLDocument.SelectNodes("//*[attribute]")
```

I use XPath extensively to store the location for specific values in our configs we need to change. We will go over the schema I use in a future post.

---

## Saving the XML

Alright, say we've loaded our XML, found our node using either the automatic properties, inherited methods, or XPath. We've modified the values we need to, now what?

The XMLDocument is strictly in-memory -- we haven't changed the original file. In order to save our XML to disk, we neet to call the .Save( \<Path> ) method of our XMLDocument object.

```PowerShell
$XMLDocument.Save($Path)
```
---
layout: post
title: "Working with XML & PowerShell - Part 0"
date: 2018-09-14 00:35:14 -0600
categories: powershell xml guide series
---

## Introduction

Let's learn how to manipulate XML with PowerShell!

If you are token replacing your XML configs with regex, you probably haven't seen [this famous Stackoverflow post](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Let me summarize: Regular Expressions are wholly insufficient for parsing XML, as XML is not a regular language.

Here we'll use the .NET [System.Xml] namespace -- primarily XMLDocument and XMLElement to robustly parse XML structure to get and set exactly the values we want.

In this post, we will cover the following topics:

1. Loading existing XML into PowerShell
2. Referencing Nodes and Attributes of the XML
3. Using the inherited methods to unambiguously refer to Attributes or Nodes
4. Using XPath to refer to specific nodes
5. Saving the XML
6. Closing Thoughts

---

## 1. Loading XML into PowerShell

Here is the xml we'll be working with for this first example:

[00.xml](https://github.com/ephenix/XML-Powershell/blob/master/00.xml)
```xml
<rootnode>
<node attribute="value1" />
<node attribute="value2" name="nameattribute">
<attribute>value3</attribute>
</node>
</rootnode>
```

```PowerShell
[xml]$XMLDocument = Get-Content ".\00.xml"
```

Casting a string to [xml] yields an XMLDocument object. This element is the gateway to all of our sub-nodes.

## 2. Referencing Nodes and Attributes

Both nodes and attributes are referenced by a Property added to their parent XMLDocument or XMLElement object.

The first node of our XML here is the \<rootnode> tag, which we'd reference with $XMLDocument.rootnode

```PowerShell
$XMLDocument | Get-Member -MemberType Property
TypeName: System.Xml.XmlDocument
Name MemberType Definition
---- ---------- ----------
rootnode Property System.Xml.XmlElement rootnode {get;}
```

## 3. Inherited Methods

To get the name of the element you are currently on, we can use the built in .name property of XMLDocument and XMLElement objects. If there is also an attribute or child node called 'name', you can use the inhereted method get_name() to retrieve the name of your node.

```PowerShell
$XMLDocument.name -eq "#document"
$XMLDocument.rootnode.name -eq "rootnode"
# because rootnode.node[0] doesn't have a name attribute, .name gives the node name.
$XMLDocument.rootnode.node[0].name -eq "node"
# node[1] does have a name attribute, so it returns that value instead.
$XMLDocument.rootnode.node[1].name -eq "nameattribute"
$XMLDocument.rootnode.node[1].get_name() -eq "node"
```

When there are child nodes and attributes with the same name, the property returns an object[] array of the '#Text' properties of the attributes and child nodes, or the 'name' property if they don't have text.

```PowerShell
$XMLDocument.rootnode.node[1] | Get-Member -MemberType Property
TypeName: System.Xml.XmlElement
Name MemberType Definition
---- ---------- ----------
attribute Property System.Object[] attribute {get;}
name Property string name {get;set;}
$XMLDocument.rootnode.node[1].Attribute
value2
value3
$XMLDocument.rootnode.node[1].Attribute | % { $_.getType().name }
String
String
```
To robustly reference either the attributes or child nodes, use the corresponding collection properties for 'Attributes' or 'ChildNodes'.

Also shown here, the value in between tags, such as \<Tag>Value\</Tag>, is referenced by the '#Text' property.

```PowerShell
$XMLDocument.rootnode.node[1].Attributes | gm
TypeName: System.Xml.XmlAttribute
( $XMLDocument.rootnode.node[1].Attributes | ? Name -eq "attribute" ).'#Text' -eq "value2"
$XMLDocument.rootnode.node[1].ChildNodes.'#Text' -eq "value3"
```

And of course, as these property names may also be overwritten, there are inhereted methods to reference these as well -- get_ChildNodes() and get_Attributes(). For a full list of these inherited methods, you can run

```PowerShell
$XMLDocument.rootnode | Get-Member -Force
```
---
## 4. XPath Lookups

[XPath](https://www.w3schools.com/xml/xpath_intro.asp) is a way of codifying node paths. [Syntax Reference](https://www.w3schools.com/xml/xpath_syntax.asp).

We can select nodes with string-based paths more robustly than we could otherwise do with regular expressions.

```PowerShell
#Select all nodes with an attribute called "name"
$XMLDocument.SelectNodes("//*[@name]")
#Select all nodes with a child node
$XMLDocument.SelectNodes("//*[*]")
#Select all nodes with a child node of type "attribute"
$XMLDocument.SelectNodes("//*[attribute]")
```

I use XPath extensively to store the location for specific values in our configs we need to change. We will go over the schema I use in a future post.

---

## 5. Saving the XML

Alright, say we've loaded our XML, found our node using either the automatic properties, inherited methods, or XPath. We've modified the values we need to, now what?

The XMLDocument is strictly in-memory -- we haven't changed the original file. In order to save our XML to disk, we neet to call the .Save( \<Path> ) method of our XMLDocument object.

```PowerShell
$XMLDocument.Save($Path)
```

## 6. Closing Thoughts

Unlike JSON, XML nodes have both children (nested nodes) and attribues. XML also supports comments, and certain special attributes such as schemas. We'll cover those in a future post.

You can find all the supporting xml and Powershell files for this guide on github, [here](https://github.com/ephenix/XML-Powershell).

I encourage you to step through the powershell and play with the objects at each step.

Keep in mind, the default ToString of the XMLElement object can be quite misleading -- Collections of nodes can be truncated by the text output. It's not quite straightforward when you are given an attribute, a child node, a collection of nodes, a node's name, or a '#Text' value. Doing testing and writing robust code with the information from this guide will ensure consistency when working with data of varying structures.

<div class="PageNavigation">
{% if page.previous.url %}
<a class="prev" href="{{page.previous.url}}">&laquo; {{page.previous.title}}</a>
{% endif %}
{% if page.next.url %}
<a class="next" href="{{page.next.url}}">{{page.next.title}} &raquo;</a>
{% endif %}
</div>
Loading

0 comments on commit 0d1740d

Please sign in to comment.