
A mass downloader…

September 28th, 2005 by George Notaras

I was reading the Lynx man page the other day and I came across the --dump option. The output of:

lynx --dump http://fedora.redhat.com/

contains, among other things, the complete URLs of the document’s hyperlinks. This matters because the output can be piped through grep to isolate certain URLs, and then through sed to strip everything except the bare URLs, for example “http://www.example.com/file1.zip”. The result is a list of URLs that can be handed to a download manager. Smells like mass downloading to me!

After some trial and error I ended up with a very useful line of code:

lynx --dump "$URL" | grep -ie "tp:.*\.$EXT" | sed 's/^[ 	]*[0-9]*\.//'

If you substitute $URL with a valid web page URL and $EXT with a file extension, e.g. “rpm”, the output of the above line is a list of the web page’s links to files with the extension “rpm”. It works with “http” and “ftp” links, since the pattern “tp:” matches the tail of both scheme names.
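To make the two filtering stages easier to follow, here is a minimal sketch that wraps them in a shell function reading Lynx dump output on standard input. The function name extract_links is hypothetical, and the sketch assumes the links appear in Lynx's numbered reference list with nothing after the URL; unlike the one-liner, it anchors the extension at the end of the line.

```shell
# extract_links EXT
# Hypothetical helper: reads `lynx --dump` output on stdin and prints
# only the http/ftp URLs that end in ".EXT" (case-insensitive).
extract_links() {
    ext=$1
    # Stage 1: keep lines containing an http:// or ftp:// URL whose last
    # characters are ".EXT". The dot is escaped so it matches literally.
    # Stage 2: strip the leading "   12. " numbering that Lynx puts in
    # front of each entry in its reference list.
    grep -i "tp://.*\.$ext\$" |
        sed 's/^[[:space:]]*[0-9]*\.[[:space:]]*//'
}
```

It could be used as: lynx --dump "$URL" | extract_links rpm. Anchoring at the end of the line avoids false matches such as “file.rpm.html”, at the cost of skipping links that carry a query string after the extension.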

Now if I direct the output to a downloader like GWget, I can download many files with minimal effort. So, I have this:

gwget $(lynx --dump "$URL" | grep -ie "tp:.*\.$EXT" | sed 's/^[ 	]*[0-9]*\.//')

This was the base for a script I wrote. I use it as an Epiphany action.
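The script itself is not reproduced here, so what follows is only a rough sketch of what a wrapper built on that line might look like, not the actual script. The function name mass_download and the DOWNLOADER override are assumptions made for illustration; the downloader defaults to gwget as in the command above.

```shell
# mass_download URL EXT
# Hypothetical wrapper around the one-liner. DOWNLOADER is an assumed
# environment override; it defaults to gwget.
mass_download() {
    if [ "$#" -ne 2 ]; then
        echo "usage: mass_download URL EXT" >&2
        return 2
    fi
    url=$1
    ext=$2

    # Collect the matching links: dump the page, keep http/ftp lines
    # that mention ".EXT", then strip Lynx's "   12. " numbering.
    links=$(lynx --dump "$url" |
        grep -i "tp:.*\.$ext" |
        sed 's/^[[:space:]]*[0-9]*\.[[:space:]]*//')

    if [ -z "$links" ]; then
        echo "no .$ext links found on $url" >&2
        return 1
    fi

    # Left unquoted on purpose so that each URL becomes its own argument.
    ${DOWNLOADER:-gwget} $links
}
```

Running it with DOWNLOADER=echo is a cheap way to preview the list before downloading anything. Because $links is expanded unquoted, URLs containing shell glob characters such as “?” or “*” could be altered if they happen to match local file names, so this sketch is best suited to plain file links.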

“A mass downloader…” by George Notaras, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Terms and conditions beyond the scope of this license may be available at www.g-loaded.eu.
