I was reading the Lynx man page the other day and I came across the -dump option. The output of:
lynx --dump http://fedora.redhat.com/
contains, among other things, the complete URLs of the document’s hyperlinks. What makes this so useful is that the output can be piped to grep, to isolate the URLs we are interested in, and then through sed, to strip the irrelevant stuff and keep only the pure URLs, for example “http://www.example.com/file1.zip”. That gives us the ability to feed multiple URLs to a download manager. Smells like mass downloading to me!
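The numbered URLs come from the References section that lynx appends to the dump. It looks roughly like this (the addresses below are purely illustrative):

References

   1. http://www.example.com/index.html
   2. http://www.example.com/file1.zip
   3. ftp://ftp.example.com/pub/file2.zip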
After some trial and error I ended up with a very useful line of code:
lynx --dump "$URL" | grep -ie "tp:.*\.$EXT" | sed 's/^[ ]*[0-9]*\. //'
If you substitute $URL with a valid web page URL and $EXT with a file extension, e.g. “rpm”, the output of the above line is a list of the web page’s links to files with the extension “rpm”. It works with both “http” and “ftp” links, since the pattern “tp:” matches the tail of either scheme.
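For example, assuming a page that lists RPM packages (the URL below is just a placeholder):

URL="http://www.example.com/packages/"
EXT="rpm"
lynx --dump "$URL" | grep -ie "tp:.*\.$EXT" | sed 's/^[ ]*[0-9]*\. //'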
Now if I direct the output to a downloader like GWget, I can download many files with minimal effort. So, I have this:
gwget $(lynx --dump "$URL" | grep -ie "tp:.*\.$EXT" | sed 's/^[ ]*[0-9]*\. //')
This was the basis for a script I wrote. I use it as an Epiphany action.
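For the record, a minimal sketch of such a wrapper, assuming it simply takes the page URL and the extension as arguments (this is not the actual script, and the file name is made up):

#!/bin/bash
# massget.sh - hypothetical wrapper around the one-liner above
# Usage: massget.sh URL EXT
URL="$1"
EXT="$2"
if [ -z "$URL" ] || [ -z "$EXT" ]; then
    echo "Usage: $0 URL EXT" >&2
    exit 1
fi
# Extract the matching links and hand them all to gwget at once
gwget $(lynx --dump "$URL" | grep -ie "tp:.*\.$EXT" | sed 's/^[ ]*[0-9]*\. //')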