Saturday, November 19, 2011


Convert all TIFs in a directory to JPEGs
mogrify -format jpg -quality 50 *.tiff

Thursday, November 10, 2011

Python: lxml insert element before and after given element

from lxml import etree
xml = etree.parse('old_xml.xml')
count = 1
for pp in xml.xpath('/response/items/item'):
   prod_id = etree.Element('new_element')  # this element will be inserted
   prod_id.text = "%s" % count
   nav = pp.find('name')  # we will insert before/after this element
   div = nav.getparent()
   pp.insert(pp.index(nav), prod_id) # insert before 'name'
   pp.insert(pp.index(nav) + 1, prod_id) # insert after 'name'
   count += 1

out = open('new_xml.xml','w')
out.write(etree.tostring(xml, pretty_print=True))

Python: Detect Charset and convert to Unicode

Converting data to Unicode via chardet

import chardet
from lxml import html

f = open('my.xml')
content =

encoding = chardet.detect(content)['encoding']
if encoding != 'utf-8':
    content = content.decode(encoding, 'replace').encode('utf-8')

Converting Data to Unicode via UnicodeDammit

from BeautifulSoup import UnicodeDammit

converted = UnicodeDammit(content)
   if not converted.unicode: 
    raise UnicodeDecodeError( 
       "Failed to detect encoding, tried [%s]", 
        ', '.join(converted.triedEncodings))  
        # print converted.originalEncoding
         return converted.unicode

Credit goes to Ian Bicking and others on the lxml team

Tuesday, November 08, 2011

Using lxml xpath to get elements with a default namespace

<a xmlns="">

from lxml import etree
xml = etree.fromstring(xml_string)
xml.xpath('/w3:a/w3:b/w3:r/w3:Value/text()', namespaces={'w3':''})