May 28, 2012

sed and strong quotes '

If you want to substitute strings via sed, it is very easy:
# echo 'hello world!' |sed 's/hello/it is a cool/'
it is a cool world!
But what if you want to use a strong quote inside the substitution?
# echo 'hello world!' |sed 's/hello/it's a cool/'
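The shell interprets the quotes itself: the second ' ends the string and the last one opens a new string that is never closed, so the shell just waits for more input and nothing happens. Next try: escape the strong quote with a backslash: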
 # echo 'hello world!' |sed 's/hello/it\'s a cool/'
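Does not work... Inside strong quotes a backslash is just a literal character, so \' still ends the string.
Any other ideas? What about using the dollar (money always helps ;-)? In bash, $'...' is a quoting form in which \' stands for a literal strong quote: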
 # echo 'hello world!' |sed $'s/hello/it\'s a cool/'
it's a cool world!
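
The $'...' form may not be available in every shell (a plain /bin/sh might not know it). As a small sketch, here are two other ways to get the same result that should work in any POSIX-style shell. The variable names out1 and out2 are only used for this illustration:

```shell
# Variant 1: close the strong quote, add an escaped quote, reopen the strong quote.
# The shell glues the three pieces 's/hello/it' + \' + 's a cool/' together.
out1=$(echo 'hello world!' | sed 's/hello/it'\''s a cool/')
echo "$out1"

# Variant 2: use weak (double) quotes around the sed script.
# This is only safe when the script contains nothing the shell would expand,
# such as a dollar sign or a backslash.
out2=$(echo 'hello world!' | sed "s/hello/it's a cool/")
echo "$out2"
```

Both variants print the same line as the $'...' version above.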

May 27, 2012

How to export a blog from blogspot.com and to use the output...

First you have to log in at http://www.blogger.com and open the settings (Einstellungen in the German interface):

There you have to choose "Other" (Sonstiges):

There you find a link "export blog" (aka Blog exportieren):

After the following dialog you get one big xml-file:

I got a file named blog-05-27-2012.xml. This file contains everything from your blog:
  • Layout
  • Users
  • Configuration
  • All postings (incl. comments, date, labels, ...)
  • Locales
  • Meta description
  • Timezone, timestamp format
  • ...
The problem is: How to extract the postings out of this file?

First extract only the lines with the xml-tag "entry":
 grep "<entry>" blog-05-27-2012.xml > blog.entry.xml
Then put every entry on its own line (the \n in the replacement needs GNU sed):
sed 's/<entry>/\n<entry>/g' blog.entry.xml > blog.newline.xml
Now you have some lines with configuration details. You can remove them with this command:
 grep -v  "<email>noreply@blogger.com</email>" blog.newline.xml  |grep "<author>" > blog.posts.xml
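
To see what the three steps do, here is a small sketch that runs them on a made-up miniature export. The sample content, the address me@example.com and the temporary file names are invented for this illustration; a real export is much larger. GNU sed is assumed because of the \n in the replacement:

```shell
# Hypothetical miniature export: all entries sit on one long line.
tmp=$(mktemp -d)
cat > "$tmp/sample.xml" <<'EOF'
<?xml version='1.0'?>
<feed><entry><author><email>noreply@blogger.com</email></author><title type='text'>Template</title></entry><entry><author><email>me@example.com</email></author><title type='text'>First post</title></entry><entry><author><email>me@example.com</email></author><title type='text'>Second post</title></entry></feed>
EOF

# Step 1: keep only the lines that contain entries.
grep "<entry>" "$tmp/sample.xml" > "$tmp/entry.xml"
# Step 2: start a new line in front of every entry (GNU sed).
sed 's/<entry>/\n<entry>/g' "$tmp/entry.xml" > "$tmp/newline.xml"
# Step 3: drop the configuration entries and the leftover line without an author.
grep -v "<email>noreply@blogger.com</email>" "$tmp/newline.xml" | grep "<author>" > "$tmp/posts.xml"

# Count the remaining lines: one per real post.
posts=$(wc -l < "$tmp/posts.xml")
echo "posts found: $posts"
rm -rf "$tmp"
```

Of the three entries in the sample, only the two with a real author are left at the end.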
Now this XML contains a lot of tags:
  • id
  • author
  • title
  • content
  • link
  • published
  • updated
  • uri
  • email
  • category
  • name
and some more...

If you want to get a file with one line per post like "date**title**content" you can use the following command:
cat blog.posts.xml | sed $'s/<title type=\'text\'>/gruzelwurbel/g'|sed $'s/<\/title><content type=\'html\'>/gruzelwurbel/g'|sed 's/<\/content>/gruzelwurbel/g'|sed 's/<published>/gruzelwurbel/g'|sed 's/<\/published>/gruzelwurbel/g'| awk -F gruzelwurbel '{printf("%s**%s**%s\n",$2,$4,$5)}'
-> 2008-01-01T00:00:00.000-08:00**Gästebuch**Um einen Kommentar im Gästebuch zu hinterlassen bitte "Kommentar veröffentlichen" anklicken.
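
The trick is to replace every interesting tag with the same nonsense word and then let awk split the line at that word. Here is a sketch of the same idea on one made-up line, with all substitutions passed to a single sed call via -e instead of five separate sed processes. The sample line and its tag order are an assumption for this illustration; if your export orders the tags differently, the awk field numbers have to be adjusted:

```shell
# Hypothetical single post, shaped like one line of blog.posts.xml.
line="<entry><published>2008-01-01T00:00:00.000-08:00</published><updated>2008-01-02T00:00:00.000-08:00</updated><title type='text'>Guestbook</title><content type='html'>Hello</content></entry>"

# Each -e expression is applied in order to every line, like the chained seds.
# The first two scripts sit in weak quotes so the strong quotes need no escaping.
result=$(printf '%s\n' "$line" | sed \
  -e "s/<title type='text'>/gruzelwurbel/g" \
  -e "s/<\/title><content type='html'>/gruzelwurbel/g" \
  -e 's/<\/content>/gruzelwurbel/g' \
  -e 's/<published>/gruzelwurbel/g' \
  -e 's/<\/published>/gruzelwurbel/g' |
  awk -F gruzelwurbel '{printf("%s**%s**%s\n",$2,$4,$5)}')
echo "$result"
```

For the sample line, field 2 is the date, field 4 the title and field 5 the content; field 3 holds the skipped part between the published date and the title.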

The HTML inside the content is escaped with &lt; and &gt;. To turn it back into normal HTML, the following two sed commands can be used:
cat file | sed 's/&lt;/</g' | sed 's/&gt;/>/g' > newfile
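
Other characters may be escaped as well. The following sketch assumes that &quot; and &amp; also show up in the content (I have not checked this against every export) and uses a made-up sample string. &amp; is handled last so that a double-escaped sequence is unescaped only once, and the & in the replacement is written as \& because a bare & there means "the matched text" in sed:

```shell
# Hypothetical escaped content, as it might appear in the export.
escaped='&lt;p&gt;Tom &amp;amp; Jerry&lt;/p&gt;'

# Unescape in one sed call; &amp; comes last on purpose.
unescaped=$(printf '%s\n' "$escaped" | sed \
  -e 's/&lt;/</g' \
  -e 's/&gt;/>/g' \
  -e 's/&quot;/"/g' \
  -e 's/&amp;/\&/g')
echo "$unescaped"
```

The result is ordinary HTML in which the ampersand is still written as an HTML entity.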

May 25, 2012

Visualization of disk usage: filelight

Did you ever run out of space? And then try "du" on every directory to find the largest files and directories and to decide what can be deleted?
There is a nice tool to visualize the disk usage: filelight

The manpage states:
filelight allows you to understand exactly where your disk space is being used by graphically representing your file system as a set of concentric segmented rings, where each segment subtends an angle proportional to the disk space occupied by that file or directory.
And here is a screenshot:


You can navigate by mouse click inside the graphic... Very nice...