Saturday, November 19, 2011

ImageMatik


Convert all TIFs in a directory to JPEGs
mogrify -format jpg -quality 50 *.tiff

http://linuxuser32.wordpress.com/2007/06/16/batch-image-convert-scale-thumbnail-jpegs-pdf/

Thursday, November 10, 2011


Python: lxml insert element before and after given element



from lxml import etree
xml = etree.parse('old_xml.xml')
count = 1
for pp in xml.xpath('/response/items/item'):
   prod_id = etree.Element('new_element')  # this element will be inserted
   prod_id.text = "%s" % count
   nav = pp.find('name')  # we will insert before/after this element
   div = nav.getparent()
   pp.insert(pp.index(nav), prod_id) # insert before 'name'
   pp.insert(pp.index(nav) + 1, prod_id) # insert after 'name'
   count += 1

out = open('new_xml.xml','w')
out.write(etree.tostring(xml, pretty_print=True))
out.close()


Python: Detect Charset and convert to Unicode


Converting data to Unicode via chardet 
http://stackoverflow.com/questions/2686709/encoding-in-python-with-lxml-complex-solution

import chardet
from lxml import html

f = open('my.xml')
content = f.read()
f.close()

encoding = chardet.detect(content)['encoding']
if encoding != 'utf-8':
    content = content.decode(encoding, 'replace').encode('utf-8')

Converting Data to Unicode via UnicodeDammit 
http://lxml.de/elementsoup.html

from BeautifulSoup import UnicodeDammit

converted = UnicodeDammit(content)
   if not converted.unicode: 
    raise UnicodeDecodeError( 
       "Failed to detect encoding, tried [%s]", 
        ', '.join(converted.triedEncodings))  
        # print converted.originalEncoding
         return converted.unicode


Credit goes to Ian Bicking and others on the lxml team

Tuesday, November 08, 2011


Using lxml xpath to get elements with a default namespace


xml_string="""
<a xmlns="http://www.w3.org/1999/xhtml">
  <b>
    <r>
      <Value>string</Value>
    </r>
    <r>
      <Value>string</Value>
    </r>
  </b>
  <e>
    <string>string1</string>
    <string>string2</string>
  </e>
</a>
"""

from lxml import etree
xml = etree.fromstring(xml_string)
xml.xpath('/w3:a/w3:b/w3:r/w3:Value/text()', namespaces={'w3':'http://www.w3.org/1999/xhtml'})