XML is a significant markup language mainly intended as a means of serialising data structures as a text document. Go has basic support for XML document processing.
XML is now a widespread way of representing complex data structures serialised into text format. It is used to describe documents such as DocBook and XHTML. It is used in specialised markup languages such as MathML and CML (Chemistry Markup Language). It is used to encode data as SOAP messages for Web Services, and the Web Service can be specified using WSDL (Web Services Description Language).
At the simplest level, XML allows you to define your own tags for use in text documents. Tags can be nested and can be interspersed with text. Each tag can also contain attributes with values. For example,
<person>
  <name>
    <family> Newmarch </family>
    <personal> Jan </personal>
  </name>
  <email type="personal">
    jan@newmarch.name
  </email>
  <email type="work">
    j.newmarch@boxhill.edu.au
  </email>
</person>
The structure of any XML document can be described in a number of ways, for example with a DTD (document type definition), with XML Schema, or with Relax NG.
There is argument over the relative value of each way of defining the structure of an XML document. We won't buy into that, as Go does not support any of them. Go cannot check the validity of a document against a schema; it can only check that the document is well-formed.
Four topics are discussed in this chapter: parsing an XML stream, unmarshalling XML into Go data, marshalling Go data into XML, and XHTML.
Go has an XML parser which is created using NewDecoder. This takes an io.Reader as parameter and returns a pointer to Decoder. The main method of this type is Token, which returns the next token in the input stream. The token is one of the types StartElement, EndElement, CharData, Comment, ProcInst or Directive.
The types are
StartElement
The type StartElement is a structure with two fields:
type StartElement struct {
	Name Name
	Attr []Attr
}
type Name struct {
	Space, Local string
}
type Attr struct {
	Name  Name
	Value string
}
EndElement
This is also a structure
type EndElement struct {
	Name Name
}
CharData
This type represents the text content enclosed by a tag and is a simple type
type CharData []byte
Comment
Similarly for this type
type Comment []byte
ProcInst
A ProcInst represents an XML processing instruction of the form <?target inst?>
type ProcInst struct {
	Target string
	Inst   []byte
}
Directive
A Directive represents an XML directive of the form <!text>. The bytes do not include the <! and > markers.
type Directive []byte
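As a small illustration of the StartElement, Name and Attr types, the following sketch lists the attributes found on each start element. The helper name attrsOf and the output format are assumptions made for this example only.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// attrsOf (a hypothetical helper) returns one "element.attribute=value"
// string for every attribute found on a start element in doc.
func attrsOf(doc string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var out []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			for _, a := range start.Attr {
				out = append(out, start.Name.Local+"."+a.Name.Local+"="+a.Value)
			}
		}
	}
}

func main() {
	doc := `<person><email type="work">j.newmarch@boxhill.edu.au</email></person>`
	attrs, err := attrsOf(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, a := range attrs {
		fmt.Println(a)
	}
}
```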
A program to print out the tree structure of an XML document is
/* Parse XML
 */
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"os"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Println("Usage: ", os.Args[0], "file")
		os.Exit(1)
	}
	// The decoder reads directly from the open file.
	file, err := os.Open(os.Args[1])
	checkError(err)
	defer file.Close()

	parser := xml.NewDecoder(file)
	depth := 0
	for {
		token, err := parser.Token()
		if err == io.EOF {
			break // end of input
		}
		checkError(err) // a syntax or read error

		// Print each token at the current nesting depth.
		switch t := token.(type) {
		case xml.StartElement:
			printElmt(t.Name.Local, depth)
			depth++
		case xml.EndElement:
			depth--
			printElmt(t.Name.Local, depth)
		case xml.CharData:
			printElmt("\""+string(t)+"\"", depth)
		case xml.Comment:
			printElmt("Comment", depth)
		case xml.ProcInst:
			printElmt("ProcInst", depth)
		case xml.Directive:
			printElmt("Directive", depth)
		default:
			fmt.Println("Unknown")
		}
	}
}

// printElmt prints s indented by one space per level of depth.
func printElmt(s string, depth int) {
	for n := 0; n < depth; n++ {
		fmt.Print(" ")
	}
	fmt.Println(s)
}

func checkError(err error) {
	if err != nil {
		fmt.Println("Fatal error ", err.Error())
		os.Exit(1)
	}
}
Note that the parser includes all CharData, including the whitespace between tags.
If we run this program against the person document given earlier, it produces
person
"
"
name
"
"
family
" Newmarch "
family
"
"
personal
" Jan "
personal
"
"
name
"
"
email
"
jan@newmarch.name
"
email
"
"
email
"
j.newmarch@boxhill.edu.au
"
email
"
"
person
"
"
Note that as no DTD or other XML specification has been used, the tokenizer correctly prints out all the white space (a DTD may specify that the whitespace can be ignored, but without it that assumption cannot be made.)
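When an application knows that whitespace between tags carries no meaning, it can drop that whitespace itself. The sketch below keeps only non-blank character data and trims it. The helper name textTokens is hypothetical, and trimming is an application-level decision, not something the XML package does for you.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textTokens (a hypothetical helper) returns the character data in doc
// with surrounding whitespace trimmed, skipping whitespace-only tokens.
func textTokens(doc string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var out []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if cd, ok := tok.(xml.CharData); ok {
			// string(cd) copies the bytes, so s stays valid after the next Token call.
			if s := strings.TrimSpace(string(cd)); s != "" {
				out = append(out, s)
			}
		}
	}
}

func main() {
	doc := `<person>
  <name>
    <family> Newmarch </family>
    <personal> Jan </personal>
  </name>
</person>`
	texts, err := textTokens(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range texts {
		fmt.Printf("%q\n", t)
	}
}
```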
There is a potential trap in using this parser. It re-uses its internal buffer for token data, so the byte slices in a token are only valid until the next call to Token. If you want to refer to a token's value later, you need to copy it. Go has methods such as func (c CharData) Copy() CharData to make a copy of data.
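A minimal sketch of the copying pattern follows: each CharData token is copied before the next call to Token, so the saved values stay valid. The helper name collectText is hypothetical. The package also provides xml.CopyToken for copying a token of any type.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// collectText (a hypothetical helper) saves a copy of every CharData
// token so that the values can be used after parsing has moved on.
func collectText(doc string) ([]xml.CharData, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var saved []xml.CharData
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return saved, nil
		}
		if err != nil {
			return nil, err
		}
		if cd, ok := tok.(xml.CharData); ok {
			// Without Copy, cd may be overwritten by the next call to Token.
			saved = append(saved, cd.Copy())
		}
	}
}

func main() {
	saved, err := collectText("<a><b>first</b><c>second</c></a>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, cd := range saved {
		fmt.Println(string(cd))
	}
}
```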
Go provides a function Unmarshal and a method func (d *Decoder) Decode to unmarshal XML into Go data structures. The unmarshalling is not perfect: Go and XML are different languages.
We consider a simple example before looking at the details. We take the XML document given earlier:
<person>
  <name>
    <family> Newmarch </family>
    <personal> Jan </personal>
  </name>
  <email type="personal">
    jan@newmarch.name
  </email>
  <email type="work">
    j.newmarch@boxhill.edu.au
  </email>
</person>
We would like to map this onto the Go structures
type Person struct {
	Name  Name
	Email []Email
}
type Name struct {
	Family   string
	Personal string
}
type Email struct {
	Type    string
	Address string
}
This requires several comments:
Unmarshalling uses the Go reflection package, so all fields must be exported, i.e. start with a capital letter. Earlier versions of Go used case-insensitive matching to match an XML string such as "name" to the field Name. Now, though, case-sensitive matching is used. To perform a match, the structure fields must be tagged to show the XML string that will be matched against. This changes Person to
type Person struct {
	Name  Name    `xml:"name"`
	Email []Email `xml:"email"`
}
While tagging can attach XML strings to fields, it cannot do so for the name of the structure itself. An additional field named XMLName, of type xml.Name, is required for that. This only affects the top-level struct, Person
type Person struct {
	XMLName xml.Name `xml:"person"`
	Name    Name     `xml:"name"`
	Email   []Email  `xml:"email"`
}
Repeated tags in the document map to a slice in Go.
Attributes within tags match fields in a structure only if the Go field has the tag ",attr". This occurs with the field Type of Email, where matching the attribute "type" of the "email" tag requires `xml:"type,attr"`
If an XML tag has no attributes and only has character data, then it matches a string field by the same name (case-sensitive, though). So the tag `xml:"family"` with character data "Newmarch" maps to the string field Family
But if the tag has attributes, then it must map to a structure. Go assigns the character data to the field with tag ,chardata. This occurs with the "email" data and the field Address with tag ,chardata
A program to unmarshal the document above is
/* Unmarshal
 */
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

type Person struct {
	XMLName xml.Name `xml:"person"` // checks the name of the top-level element
	Name    Name     `xml:"name"`
	Email   []Email  `xml:"email"` // repeated <email> tags fill the slice
}

type Name struct {
	Family   string `xml:"family"`
	Personal string `xml:"personal"`
}

type Email struct {
	Type    string `xml:"type,attr"` // the "type" attribute
	Address string `xml:",chardata"` // the text inside <email>
}

func main() {
	str := `<?xml version="1.0" encoding="utf-8"?>
<person>
  <name>
    <family> Newmarch </family>
    <personal> Jan </personal>
  </name>
  <email type="personal">
    jan@newmarch.name
  </email>
  <email type="work">
    j.newmarch@boxhill.edu.au
  </email>
</person>`

	var person Person
	err := xml.Unmarshal([]byte(str), &person)
	checkError(err)

	// now use the person structure e.g.
	fmt.Println("Family name: \"" + person.Name.Family + "\"")
	fmt.Println("Second email address: \"" + person.Email[1].Address + "\"")
}

func checkError(err error) {
	if err != nil {
		fmt.Println("Fatal error ", err.Error())
		os.Exit(1)
	}
}
(Note that the spaces inside the quotes are correct: the character data is kept exactly as it appears in the document.) The strict rules are given in the package documentation.
Go 1 also has support for marshalling data structures into an XML document. The function is
func Marshal(v interface{}) ([]byte, error)
It can be used as a check on the previous program, by marshalling the person structure back into XML and printing the result.
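A minimal sketch of marshalling follows, using the same tagged structures as the unmarshalling example. The helper name marshalPerson is hypothetical. MarshalIndent is the package's variant of Marshal that adds line breaks and indentation for readability.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type Person struct {
	XMLName xml.Name `xml:"person"`
	Name    Name     `xml:"name"`
	Email   []Email  `xml:"email"`
}

type Name struct {
	Family   string `xml:"family"`
	Personal string `xml:"personal"`
}

type Email struct {
	Type    string `xml:"type,attr"`
	Address string `xml:",chardata"`
}

// marshalPerson (a hypothetical helper) returns p as a compact XML string.
func marshalPerson(p Person) (string, error) {
	out, err := xml.Marshal(p)
	return string(out), err
}

func main() {
	p := Person{
		Name:  Name{Family: "Newmarch", Personal: "Jan"},
		Email: []Email{{Type: "personal", Address: "jan@newmarch.name"}},
	}

	compact, err := marshalPerson(p)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(compact)

	// The same data, indented two spaces per nesting level.
	pretty, err := xml.MarshalIndent(p, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(pretty))
}
```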
HTML does not conform to XML syntax. It has unterminated tags such as '<br>'. XHTML is a cleanup of HTML to make it compliant with XML. Documents in XHTML can be managed using the techniques above for XML.
There is some support in the XML package to handle HTML documents even though they are not XML-compliant. The XML parser discussed earlier can handle many HTML documents if it is modified by
parser := xml.NewDecoder(r)
parser.Strict = false
parser.AutoClose = xml.HTMLAutoClose
parser.Entity = xml.HTMLEntity
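To see these settings at work, the sketch below lists the element names in a small HTML fragment containing an unterminated '<br>' and an HTML entity. The helper name elementNames and the sample fragment are assumptions made for this example, and real-world HTML can still defeat the XML parser.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// elementNames (a hypothetical helper) returns the local name of every
// start element in doc, parsing with the lenient HTML settings.
func elementNames(doc string) ([]string, error) {
	parser := xml.NewDecoder(strings.NewReader(doc))
	parser.Strict = false
	parser.AutoClose = xml.HTMLAutoClose // close tags such as <br> automatically
	parser.Entity = xml.HTMLEntity       // resolve entities such as &nbsp;

	var names []string
	for {
		tok, err := parser.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			names = append(names, start.Name.Local)
		}
	}
}

func main() {
	names, err := elementNames("<p>one<br>two&nbsp;three</p>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```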
Go has basic support for dealing with XML strings. It does not as yet have mechanisms for dealing with XML specification languages such as XML Schema or Relax NG.
Copyright Jan Newmarch, jan@newmarch.name