A mass downloader…


I was reading the Lynx man page the other day and I came across the -dump option. The output of:

lynx --dump http://fedora.redhat.com/

contains, among other things, the complete URLs of the document’s hyperlinks. What makes this so useful is that we can pipe the output to grep, to isolate certain URLs, and then through sed, to strip the irrelevant stuff and keep only the bare URLs, for example “http://www.example.com/file1.zip”. That gives us the ability to send multiple URLs to a download manager at once. Smells like mass downloading to me!

After some trial and error I ended up with a very useful line of code:

lynx --dump $URL | grep -ie "tp:.*\.$EXT" | sed 's/^[[:space:]]*[0-9]*\. *//'

If you substitute $URL with a valid web page URL and $EXT with a file extension, e.g. “rpm”, the output of the above line is a list of the web page’s links to files with the extension “rpm”. It works with both “http” and “ftp” links.
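To see what the pipeline actually does, here is a minimal sketch with a simulated tail of `lynx --dump` output — lynx appends a numbered “References” list of the page’s links. The URLs and filenames below are made up for illustration:

```shell
# Simulated tail of `lynx --dump` output: a numbered "References"
# list of the page's links (these URLs are hypothetical).
dump='References

   1. http://fedora.redhat.com/index.html
   2. http://www.example.com/file1.rpm
   3. ftp://ftp.example.com/file2.rpm'

EXT=rpm

# Keep only http/ftp links ending in .$EXT, then strip the leading
# whitespace and reference number, leaving the bare URLs.
printf '%s\n' "$dump" | grep -ie "tp:.*\.$EXT" | sed 's/^[[:space:]]*[0-9]*\. *//'
# → http://www.example.com/file1.rpm
# → ftp://ftp.example.com/file2.rpm
```

The “tp:” trick matches both “http:” and “ftp:” with a single pattern, which is why the one-liner handles both kinds of links.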

Now if I feed the output to a downloader like GWget, I can download many files with minimal effort. So, I have this:

gwget $(lynx --dump $URL | grep -ie "tp:.*\.$EXT" | sed 's/^[[:space:]]*[0-9]*\. *//')

This was the base for a script I wrote. I use it as an Epiphany action.
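The author’s script isn’t shown, but a minimal sketch of such a wrapper might look like this — the script name `masslinks.sh` and the `extract_links` helper are my own invention, not the original:

```shell
#!/bin/sh
# masslinks.sh -- hypothetical wrapper around the pipeline above.
# Usage: masslinks.sh URL EXT
# Prints the page's http/ftp links to files ending in .EXT, one per
# line, ready to be handed to a downloader such as gwget.

# Filter a `lynx --dump` reference list on stdin down to bare URLs
# that end in the given extension.
extract_links() {
    grep -ie "tp:.*\.$1" | sed 's/^[[:space:]]*[0-9]*\. *//'
}

# Only run the pipeline when both arguments were supplied.
if [ $# -eq 2 ]; then
    lynx --dump "$1" | extract_links "$2"
fi
```

It could then be invoked as `gwget $(./masslinks.sh http://fedora.redhat.com/ rpm)`, or wired into an Epiphany action as described above.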

A mass downloader… by George Notaras is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright © 2005 - Some Rights Reserved


About George Notaras

George Notaras is the editor of the G-Loaded Journal, a technical blog about Free and Open-Source Software. George, among other things, is an enthusiastic self-taught GNU/Linux system administrator. He created this web site to share the IT knowledge and experience he has gained over the years with other people. George primarily uses CentOS and Fedora. He has also developed some open-source software projects in his spare time.