Yes, the lxml.html package provides this functionality: it's called fragment_fromstring or fragments_fromstring. These functions use the HTML parser, but in most cases the HTML parser also handles XML quite well:
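from lxml import etree, html

xml = """
<tree id="A">
<anotherelement />
</tree>
<tree id="B">
<yetanotherelement />
</tree>
"""

# Parse the rootless input into a list of top-level elements
fragments = html.fragments_fromstring(xml)

# Collect the fragments under a single new root element
root = etree.Element("root")
for f in fragments:
    root.append(f)

# tostring() returns bytes on Python 3, so decode before printing
print(etree.tostring(root, pretty_print=True).decode())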
output:
<root>
<tree id="A">
<anotherelement />
</tree>
<tree id="B">
<yetanotherelement />
</tree>
</root>
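If the fragments are well-formed XML and a strict XML parser is preferred over the HTML parser, a common alternative is to wrap the input in a synthetic root element before parsing. The following is a minimal sketch under stated assumptions: `parse_fragments` and its `wrapper_tag` parameter are hypothetical names, the stdlib fallback import is only there for portability, and the input is assumed to carry no XML declaration or DOCTYPE (either would break the wrapping).

```python
try:
    from lxml import etree  # preferred, as in the answer above
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API

xml = """
<tree id="A">
  <anotherelement />
</tree>
<tree id="B">
  <yetanotherelement />
</tree>
"""


def parse_fragments(text, wrapper_tag="root"):
    """Parse rootless XML by wrapping it in a synthetic root element.

    Hypothetical helper: assumes `text` has no XML declaration or DOCTYPE.
    """
    wrapped = "<%s>%s</%s>" % (wrapper_tag, text, wrapper_tag)
    return etree.fromstring(wrapped)


root = parse_fragments(xml)
ids = [child.get("id") for child in root]
print(ids)
```

Unlike the HTML parser, this rejects input that is not well-formed, which may or may not be what you want.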
This is a tutorial on XML processing with lxml.etree. It briefly overviews the main concepts of the ElementTree API, and some simple enhancements that make your life as a programmer easier.
To aid in writing portable code, this tutorial makes it clear in the examples which part of the presented API is an extension of lxml.etree over the original ElementTree API, as defined by Fredrik Lundh's ElementTree library.
In the original xml.etree.ElementTree implementation and in lxml up to 1.3.3, the output looks the same as when serialising only the root Element.
An ElementTree is mainly a document wrapper around a tree with a root node. It provides a couple of methods for serialisation and general document handling.
>>> from lxml import etree
try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")
>>> root = etree.Element("root")
>>> print(root.tag)
root
>>> root.append(etree.Element("child1"))
>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")
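To make the "document wrapper" idea concrete, here is a small sketch that wraps a root Element in an ElementTree and serialises it through the wrapper. It only uses calls shared by lxml.etree and the standard library's xml.etree.ElementTree; the fallback import and the variable names are assumptions for illustration.

```python
import io

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API

# Build a small tree: one root with two children
root = etree.Element("root")
etree.SubElement(root, "child1")
etree.SubElement(root, "child2")

# Wrap the root Element in a document-level ElementTree object
tree = etree.ElementTree(root)

# getroot() returns the root Element of the wrapped document
root_tag = tree.getroot().tag

# write() serialises the whole document to a file or file-like object
buffer = io.BytesIO()
tree.write(buffer)
serialized = buffer.getvalue()

child_tags = [child.tag for child in root]
print(root_tag, child_tags)
```

The exact bytes written differ slightly between implementations (for example, how empty elements are spelled), so code should not depend on them.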
The parse() function returns an object which represents the entire document. This is not the root element. To get a reference to the root element, call the getroot() method.
To create a new element, instantiate the Element class. You pass the element name (namespace + local name) as the first argument. This statement creates a feed element in the Atom namespace. This will be our new document’s root element.
So far, we’ve worked with this XML document “from the top down,” starting with the root element, getting its child elements, and so on throughout the document. But many uses of XML require you to find specific elements. Etree can do that, too.
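The points about parse(), getroot(), namespaced element names, and finding specific elements can be sketched together as follows. The sample feed below is made up for illustration, and the stdlib fallback import is an assumption; the namespace-plus-local-name form is written in the `{namespace}localname` notation that the ElementTree API uses for tags.

```python
import io

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace prefix in {uri} notation

# Hypothetical sample document, not taken from the original text
sample = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>demo feed</title>
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>"""

# parse() returns the whole document; getroot() returns its root element
doc = etree.parse(io.BytesIO(sample))
root = doc.getroot()

# Find specific elements anywhere below the root by qualified name
entry_titles = [
    entry.findtext(ATOM + "title") for entry in root.iter(ATOM + "entry")
]

# Create a new element in the Atom namespace
new_feed = etree.Element(ATOM + "feed")

print(root.tag, entry_titles, new_feed.tag)
```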
XML is a generalized way of describing hierarchical structured data. An XML document contains one or more elements, which are delimited by start and end tags. This is a complete (albeit boring) XML document:
<foo>
</foo>
Elements can be nested to any depth. An element bar inside an element foo is said to be a subelement or child of foo.
<foo>
<bar></bar>
</foo>
The first element in every XML document is called the root element. An XML document can only have one root element. The following is not an XML document, because it has two root elements:
<foo></foo>
<bar></bar>
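A parser enforces the single-root rule. The sketch below checks both inputs; `is_well_formed` is a hypothetical helper name, and the stdlib fallback import is an assumption. It relies on `etree.ParseError`, which both lxml.etree and xml.etree.ElementTree expose.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API


def is_well_formed(text):
    """Return True if `text` parses as a single-rooted XML document."""
    try:
        etree.fromstring(text)
    except etree.ParseError:
        return False
    return True


one_root = is_well_formed("<foo></foo>")
two_roots = is_well_formed("<foo></foo>\n<bar></bar>")
print(one_root, two_roots)
```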
Elements can have text content.
<foo lang='en'>
<bar lang='fr'>PapayaWhip</bar>
</foo>
Elements that contain no text and no children are empty.
<foo></foo>
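In the ElementTree API these properties map onto a handful of accessors. A brief sketch, assuming the fallback import below and using the same snippets as above: attributes are read with get(), text content with .text, and an empty element has no children and no text.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API

foo = etree.fromstring("<foo lang='en'><bar lang='fr'>PapayaWhip</bar></foo>")
bar = foo.find("bar")  # first direct child named bar

# Attributes are name-value pairs on the element; text is the element content
foo_lang = foo.get("lang")
bar_lang = bar.get("lang")
bar_text = bar.text

# An empty element: no children, no text
empty = etree.fromstring("<foo></foo>")
empty_children = len(empty)
empty_text = empty.text

print(foo_lang, bar_lang, bar_text, empty_children, empty_text)
```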
Is it feasible in Java, using the SAX API, to parse a list of XML fragments with no root element from a stream input? I would rather not settle for obvious but clumsy solutions such as "prepend a custom root element" or "use buffered fragment parsing".
org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
Enumeration<InputStream> streams = Collections.enumeration(
    Arrays.asList(new InputStream[] {
        new ByteArrayInputStream("<root>".getBytes()),
        yourXmlLikeStream,
        new ByteArrayInputStream("</root>".getBytes()),
    }));
SequenceInputStream seqStream = new SequenceInputStream(streams);
// Now pass the `seqStream` into the SAX parser.
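The wrapping idea is not specific to Java. As a rough Python analogue (a sketch, not a translation of the SAX code), the incremental feed interface of XMLParser lets you send a synthetic opening tag, then the stream in chunks, then the closing tag, without buffering the whole input first. `parse_rootless_stream`, `wrapper`, and `chunk_size` are hypothetical names, the fallback import is an assumption, and the stream is assumed to contain no XML declaration.

```python
import io

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API


def parse_rootless_stream(stream, wrapper=b"root", chunk_size=64):
    """Parse a binary stream of sibling XML fragments under a synthetic root.

    Hypothetical helper: assumes the stream holds no XML declaration.
    """
    parser = etree.XMLParser()
    parser.feed(b"<" + wrapper + b">")       # synthetic opening tag
    while True:
        chunk = stream.read(chunk_size)      # forward the input piece by piece
        if not chunk:
            break
        parser.feed(chunk)
    parser.feed(b"</" + wrapper + b">")      # synthetic closing tag
    return parser.close()                    # returns the root element


stream = io.BytesIO(b'<tree id="A"><a/></tree>\n<tree id="B"><b/></tree>\n')
root = parse_rootless_stream(stream)
fragment_ids = [child.get("id") for child in root]
print(root.tag, fragment_ids)
```

Note that this still builds the full tree in memory; only the reading is incremental.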
Aug 10, 2020
<?xml version="1.0" encoding="UTF-8"?>
<store>
<inventory>
<apples count="5">Golden Delicious</apples>
<oranges count="0" />
</inventory>
<employees>
<employee title="writer">SearingFrost</employee>
</employees>
</store>
# Pass bytes: lxml rejects str input that carries an encoding declaration,
# and the XML declaration must be the very first thing in the input
xml_string = b'''<?xml version="1.0" encoding="UTF-8"?>
<store>
<inventory>...
</store>'''
store = etree.fromstring(xml_string)
# store is the root element
print(etree.tostring(store, encoding='utf-8', xml_declaration=True, pretty_print=True))
# b'<?xml version=\'1.0\' encoding=\'utf-8\'?>
# <store>
# <inventory>...
# </store>'
# Bytes input, with the XML declaration at the very start
xml_string = b'''<?xml version="1.0" encoding="UTF-8"?>
<store>
<inventory>...
</store>'''
store = etree.fromstring(xml_string)
print(store.xpath('/store'))
# [<Element store at 0x112bb51c0>]
print(store.xpath('/store/*'))
# [<Element inventory at 0x112bb5100>, <Element employees at 0x112bb52c0>]
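The xpath() method is specific to lxml. For simple queries like the ones above, the portable find() and findall() methods accept a limited path syntax that both lxml.etree and the standard library understand. The sketch below runs a few such queries against the store document shown earlier; the fallback import and the variable names are assumptions for illustration.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API

store = etree.fromstring(b"""<?xml version="1.0" encoding="UTF-8"?>
<store>
  <inventory>
    <apples count="5">Golden Delicious</apples>
    <oranges count="0" />
  </inventory>
  <employees>
    <employee title="writer">SearingFrost</employee>
  </employees>
</store>""")

# Direct children of the root, comparable to '/store/*'
top_level = [child.tag for child in store.findall("*")]

# A relative path from the root to a nested element
apples = store.find("inventory/apples")
apple_info = (apples.get("count"), apples.text)

# Descendant search with an attribute predicate
writers = [e.text for e in store.findall(".//employee[@title='writer']")]

print(top_level, apple_info, writers)
```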
# Iterate through the store, receiving start and end events.
# The SearingFrost element is renamed to NewName in place.
for event, element in etree.iterwalk(store, events=('start', 'end')):
    print(event, element)
    if element.tag == 'SearingFrost':
        element.tag = 'NewName'
# start <Element store at 0x107240700>
# start <Element inventory at 0x1081d0540>
# start <Element apples at 0x1081d6580>
# end <Element apples at 0x1081d6580>
# start <Element oranges at 0x1081d6600>
# end <Element oranges at 0x1081d6600>
# end <Element inventory at 0x1081d0540>
# start <Element employees at 0x1081c0b00>
# start <Element SearingFrost at 0x1081d05c0>
# end <Element NewName at 0x1081d05c0>
# end <Element employees at 0x1081c0b00>
# end <Element store at 0x107240700>
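iterwalk() is an lxml extension. When start and end events are not needed, the same in-place rename can be sketched with the portable iter() method, which visits the element and all of its descendants in document order. The small tree below is a cut-down stand-in for the store document, and the fallback import is an assumption.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API

store = etree.fromstring(
    "<store><inventory><apples/></inventory>"
    "<employees><SearingFrost title='writer'/></employees></store>"
)

tags_before = [element.tag for element in store.iter()]

# Renaming a tag does not change the tree structure, so it is safe mid-iteration
for element in store.iter():
    if element.tag == "SearingFrost":
        element.tag = "NewName"

tags_after = [element.tag for element in store.iter()]
print(tags_before)
print(tags_after)
```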
from lxml import etree
store = etree.Element('store')
inventory = etree.SubElement(store, 'inventory')
# Careful: when .text is set directly on the SubElement(...) call,
# any variable assigned in the same statement receives the text,
# not the Element
etree.SubElement(inventory, 'apples', {'count': '5'}).text = 'Golden Delicious'
etree.SubElement(inventory, 'oranges', {'count': '0'})
employees = etree.SubElement(store, 'employees')
etree.SubElement(employees, 'SearingFrost', {'title': 'writer'})
print(etree.tostring(store, encoding='utf-8', xml_declaration=True, pretty_print=True))
# b'<?xml version=\'1.0\' encoding=\'utf-8\'?><store><inventory>...</store>'
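The warning about .text in the comments above is easy to trip over, so here is a short sketch of both forms. The variable names and the "Navel" value are made up for illustration, and the fallback import is an assumption.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed fallback with the same core API

store = etree.Element("store")
inventory = etree.SubElement(store, "inventory")

# Safe form: bind the Element first, then set its text
apples = etree.SubElement(inventory, "apples", {"count": "5"})
apples.text = "Golden Delicious"

# Pitfall: in a chained assignment the variable receives the string,
# not the newly created Element
oranges = etree.SubElement(inventory, "oranges", {"count": "0"}).text = "Navel"

apples_is_element = hasattr(apples, "tag")
oranges_is_string = isinstance(oranges, str)
print(apples_is_element, oranges_is_string)
```

The oranges element is still created and attached to inventory; only the local variable ends up pointing at the text.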