Title: Unprettify XML: Strip irrelevant spaces and newlines from XML
Submitter: Drew Gulino
Last Updated: 2008/03/18
Version no: 1.0
Category: XML

 

Description:

This recipe 'unprettifies' XML: it removes the indentation and newlines that pretty-printing adds, which makes the document harder to read but smaller.

Source:

#!/usr/bin/env python
# Also works with Jython.
import xml.dom.minidom as dom

input_xml = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="urn:ietf:params:xml:ns:epp-1.0 epp-1.0.xsd"
>
  <command>
    <login>
      <clID>username</clID>
      <pw>password</pw>
      <options>
        <version>1.0</version>
        <lang>en</lang>
      </options>
      <svcs>
        <objURI>urn:ietf:params:xml:ns:domain-1.0</objURI>
        <objURI>urn:ietf:params:xml:ns:host-1.0</objURI>
      </svcs>
    </login>
    <clTRID>ABC-12345-XYZ</clTRID>
  </command>
</epp>"""

def fromprettyxml(input_xml):  # cool name, but not the exact inverse of dom.toprettyxml()
    """Return input_xml serialized on one line, without indentation or newlines.

    Simple doctest:
    >>> print(fromprettyxml(input_xml))
    <?xml version="1.0" ?><epp xmlns="urn:ietf:params:xml:ns:epp-1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:ietf:params:xml:ns:epp-1.0 epp-1.0.xsd"><command><login><clID>username</clID><pw>password</pw><options><version>1.0</version><lang>en</lang></options><svcs><objURI>urn:ietf:params:xml:ns:domain-1.0</objURI><objURI>urn:ietf:params:xml:ns:host-1.0</objURI></svcs></login><clTRID>ABC-12345-XYZ</clTRID></command></epp>
    """
    _dom = dom.parseString(input_xml)
    # Strip leading/trailing whitespace from every serialized line, then rejoin.
    output_xml = ''.join([line.strip() for line in _dom.toxml().splitlines()])
    _dom.unlink()  # break reference cycles so the DOM can be freed promptly
    return output_xml

def _test():
    # Run the doctests in this module; no need to import the file by name.
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _test()
    print(fromprettyxml(input_xml))

Discussion:

If you're dealing with a lot of pretty-printed XML,
the kind that is broken onto separate lines with indentation between elements,
and you want to shrink it so you don't waste bandwidth
transmitting all those irrelevant bytes,
here's a way to strip them out without removing the significant spaces inside tags, such as those between namespace declarations and other attributes.
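
Because the recipe strips each line of the serialized output, leading and trailing whitespace is also removed from text content that happens to span several lines. Where that matters, one alternative is to delete only the whitespace-only text nodes from the DOM before serializing. The following is a minimal sketch of that idea rather than the submitter's method; `strip_whitespace_nodes`, `compact_xml`, and the `sample` document are names invented here for illustration.

```python
import xml.dom.minidom as dom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because removeChild mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


def compact_xml(pretty_xml):
    """Hypothetical helper: serialize pretty_xml without inter-element whitespace."""
    document = dom.parseString(pretty_xml)
    try:
        strip_whitespace_nodes(document)
        return document.toxml()
    finally:
        document.unlink()


# Illustrative input: <note> holds text that spans two lines.
sample = """<?xml version="1.0"?>
<login>
  <clID>username</clID>
  <note>two
    lines</note>
</login>"""

compact = compact_xml(sample)
print(compact)
print(len(sample), "->", len(compact), "characters")
```

In this sketch the indentation between elements disappears, while the line break and spaces inside `<note>` are kept. It still assumes data-oriented XML: in mixed content, a whitespace-only text node between two inline elements would be dropped as well.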


