Using wget via Python

How would I download files (video) with Python using wget and save them locally? There will be a bunch of files, so how do I know when one file has finished downloading so that the next one can start automatically?

Thanks.




Answers

Don’t do this. Use either urllib2 or urlgrabber instead.

If you use os.system() to spawn a process for the wget, it will block until wget finishes the download (or quits with an error). So, just call os.system('wget blah') in a loop until you've downloaded all of your files.
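As a rough sketch of that loop (using subprocess rather than os.system to avoid shell-quoting issues; `wget_command` and `download_all` are hypothetical names, not part of any library):

```python
import subprocess

def wget_command(url, dest):
    """Build the argument list for a single wget invocation."""
    return ["wget", "-O", dest, url]

def download_all(urls, runner=subprocess.call):
    """Download each URL in turn. subprocess.call blocks until wget
    exits, so each download finishes before the next one starts."""
    for url in urls:
        dest = url.rsplit("/", 1)[-1]  # save under the URL's basename
        runner(wget_command(url, dest))
```

Because each call blocks, the loop itself answers the "how do I know one file is done" question: control only returns when wget has exited.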

Alternatively, you can use urllib2 or httplib. You'll have to write a non-trivial amount of code, but you'll get better performance, since you can reuse a single HTTP connection to download many files, as opposed to opening a new connection for each file.
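For instance, with the standard-library http.client (httplib in Python 2) you can issue several requests over one connection; `fetch_many` is a hypothetical helper, not a library function:

```python
from http.client import HTTPConnection

def fetch_many(host, paths):
    """Fetch several paths from one host, reusing a single HTTP
    connection where the server supports keep-alive."""
    conn = HTTPConnection(host)
    bodies = []
    for path in paths:
        conn.request("GET", path)
        resp = conn.getresponse()
        bodies.append(resp.read())  # read fully before the next request
    conn.close()
    return bodies
```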

No reason to use os.system. Avoid writing a shell script in Python and go with something like urllib.urlretrieve (urllib.request.urlretrieve in Python 3) or an equivalent.

Edit… to answer the second part of your question, you can set up a thread pool using the standard library Queue class. Since you’re doing a lot of downloading, the GIL shouldn’t be a problem. Generate a list of the URLs you wish to download and feed them to your work queue. It will handle pushing requests to worker threads.

I’m waiting for a database update to complete, so I put this together real quick.

#!/usr/bin/env python3

import sys
import threading
import urllib.request
from queue import Queue
import logging

class Downloader(threading.Thread):
    def __init__(self, queue):
        super(Downloader, self).__init__()
        self.queue = queue

    def run(self):
        while True:
            download_url, save_as = self.queue.get()
            # a (None, None) pair is the sentinel telling this worker to exit
            if not download_url:
                return
            try:
                urllib.request.urlretrieve(download_url, filename=save_as)
            except Exception as e:
                logging.warning("error downloading %s: %s", download_url, e)

if __name__ == '__main__':
    queue = Queue()
    threads = []
    for i in range(5):
        threads.append(Downloader(queue))
        threads[-1].start()

    for line in sys.stdin:
        url = line.strip()
        filename = url.split('/')[-1]
        print("Download %s as %s" % (url, filename))
        queue.put((url, filename))

    # if we get here, stdin has gotten the ^D
    print("Finishing current downloads")
    for i in range(5):
        queue.put((None, None))
    for t in threads:
        t.join()

No reason to use Python. Avoid writing a shell script in Python and go with something like bash or an equivalent.

Install the wget package from PyPI (http://pypi.python.org/pypi/wget/0.3):

pip install wget

then run it, just as documented:

python -m wget <url>

Short answer (simplified): to get one file,

 import urllib.request
 urllib.request.urlretrieve("http://google.com/index.html", filename="local/index.html")

You can figure out how to loop that if necessary.