Stopping gearman workers nicely

I have a number of Gearman workers running constantly, saving things like records of user page views, etc. Occasionally, I'll update the PHP code that the Gearman workers use. In order to get the workers to switch to the new code, I kill and restart the PHP processes for the workers.

What is a better way to do this? Presumably, I'm sometimes losing data (albeit not very important data) when I kill one of those worker processes.

Edit: I found an answer that works for me, and posted it below.


Answers

Hmm, you could implement code in the workers that occasionally checks whether the source code has been modified; if it has, they kill themselves when they see fit. That is, check between jobs, or even in the middle of a job if the job is very long.

Another way would be to implement some kind of interrupt, perhaps over the network, telling the worker to stop whenever it gets the chance and then restart.

The last option would be to modify Gearman's source to include this functionality.

This would fit nicely into your continuous integration system. I hope you have one, or will soon 🙂

As you check in new code, it automatically gets built and deployed onto the server. As a part of the build script, you kill all workers, and launch new ones.

http://phpscaling.com/2009/06/23/doing-the-work-elsewhere-sidebar-running-the-worker/

As the above article demonstrates, I've run a worker inside a Bash shell script, having it exit occasionally between jobs to clean up (or reload the worker script). Alternatively, if it is given a particular task, it can exit with a specific exit code to shut itself down.

Solution 1


Generally I run my workers with the Unix daemon utility with the -r flag and let them expire after one job. Your script will end gracefully after each iteration, and daemon will restart it automatically.

Your workers will be stale for one job, but that may not be as big a deal to you as losing data.

This solution also has the advantage of freeing up memory. You may run into memory problems if you're doing large jobs, as PHP before 5.3 has god-awful garbage collection.
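For illustration, a one-shot worker of that kind might look like the minimal sketch below (the function name and server address are placeholders); daemon -r, or any other supervisor, then starts a fresh process as soon as this one exits:

$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('save_page_view', function (GearmanJob $job) {
    // ... do the actual work here ...
});

$worker->work(); // blocks until exactly one job has been handled
exit(0);         // clean exit; the supervisor launches the next worker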

Solution 2


You could also add a quit function to all of your workers that exits the script. When you'd like to restart them, you simply submit quit calls to Gearman with high priority.
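A rough sketch of that idea with the pecl/gearman extension (the quit function name and addresses are placeholders; submit one quit job per worker you want to recycle):

// Worker side: register a 'quit' handler alongside the normal work.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('save_page_view', function (GearmanJob $job) {
    // ... normal work ...
});
$worker->addFunction('quit', function (GearmanJob $job) {
    exit(0); // end this process; the process manager restarts it with the new code
});
while ($worker->work());

// Deploy side: tell a waiting worker to quit as soon as it is free.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doHighBackground('quit', '');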

I've been looking at this recently as well (though in Perl with Gearman::XS). My use case was the same as yours: allow a long-running Gearman worker to periodically check for a new version of itself and reload.

My first attempt was just having the worker keep track of how long it had been since it last checked the worker script's version (an md5sum would also work). Then, once N seconds had elapsed, it would check between jobs to see whether a new version of itself was available, and restart itself (fork()/exec()). This did work OK, but workers registered for rare jobs could potentially wait hours for work() to return, and thus hours before checking the current time.

So I'm now setting a fairly short timeout when waiting for jobs with work(), so I can check the time more regularly. The PHP interface suggests that you can set this timeout value when registering for the job. I'm using SIGALRM to trigger the new-version check. The Perl interface blocks on work(), so the alarm wasn't being triggered initially; setting the timeout to 60 seconds got SIGALRM working.
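A PHP version of the same pattern might look roughly like the sketch below (it uses GearmanWorker::setTimeout() and an md5 of the script file instead of SIGALRM; the function name is a placeholder, and the hash only covers this one file, not its includes):

$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('save_page_view', function (GearmanJob $job) {
    // ... do the actual work ...
});
$worker->setTimeout(60000); // work() returns after 60s even when no job arrives

$startVersion = md5_file(__FILE__);

while (true) {
    $worker->work(); // handles one job, or times out

    $code = $worker->returnCode();
    if ($code !== GEARMAN_SUCCESS && $code !== GEARMAN_TIMEOUT) {
        break; // real error: exit and let the supervisor restart us
    }

    clearstatcache();
    if (md5_file(__FILE__) !== $startVersion) {
        exit(0); // new code deployed: quit between jobs and get restarted
    }
}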

Well, I posted this question, and now I think I have found a good answer to it.

If you look in the code for Net_Gearman_Worker, you’ll find that in the work loop, the function stopWork is monitored, and if it returns true, it exits the function.

I did the following:
Using Memcache, I created a cached value, gearman_restarttime, and I use a separate script to set it to the current timestamp whenever I update the site. (I used Memcache, but this could be stored anywhere: a database, a file, or anything else.)

I extended the Worker class to be, essentially, Net_Gearman_Worker_Foo, and had all of my workers instantiate that. In the Foo class, I overrode the stopWork function to do the following: first, it checks gearman_restarttime; the first time through, it saves the value in a global variable. From then on, each time through, it compares the cached value to the global. If it has changed, stopWork returns true, and the worker quits. A cron job checks every minute to see whether each worker is still running, and restarts any worker that has quit.

It may be worth putting a timer in stopWork as well, and checking the cache only once every x minutes. In our case, Memcache is fast enough that checking the value each time doesn’t seem to be a problem, but if you are using some other system to store off the current timestamp, checking less often would be better.
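In condensed form, the override described above might look something like the sketch below. It assumes Net_Gearman_Worker exposes stopWork() as described, and uses the Memcache extension; the host, port, and key name are just the ones from this answer:

require_once 'Net/Gearman/Worker.php';

class Net_Gearman_Worker_Foo extends Net_Gearman_Worker
{
    private $memcache = null;
    private $restartTime = null;

    // The work loop checks stopWork(); returning true makes the worker quit.
    public function stopWork()
    {
        if ($this->memcache === null) {
            $this->memcache = new Memcache();
            $this->memcache->connect('127.0.0.1', 11211);
        }

        $current = $this->memcache->get('gearman_restarttime');

        if ($this->restartTime === null) {
            $this->restartTime = $current; // remember the value seen at startup
            return false;
        }

        return $current !== $this->restartTime; // deploy script bumped the timestamp
    }
}

The deploy script then only has to set gearman_restarttime to the current time, and the cron job restarts whichever workers have exited.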

If someone is looking for an answer for a worker running Perl, that's part of what the GearmanX::Starter library is for. You can stop workers after the current job completes in two different ways: externally, by sending the worker process a SIGTERM, or programmatically, by setting a global variable.

Given the fact that the workers are written in PHP, it is a good idea to recycle them on a known schedule. This can be a fixed amount of time since they started, or it can be done after a certain number of jobs have been attempted.

This essentially kills (no pun intended) two birds with one stone: you are mitigating the potential for memory leaks, and you have a consistent way to determine when your workers will pick up any new code.

I generally write workers such that they report their interval to stdout and/or to a logging facility, so it is simple to check where a worker is in the process.

function AutoRestart() {
   static $startTime = null;
   if ($startTime === null) {
      $startTime = time(); // record when this worker process started
   }

   clearstatcache(); // filemtime() results are cached in long-running processes
   if (filemtime(__FILE__) > $startTime) {
      exit(); // the script on disk is newer than this process, so exit and let it be restarted
   }
}

// Call this between jobs, e.g. at the top of the worker loop.
AutoRestart();

I ran into this same problem and came up with a solution for Python 2.7.

I'm writing a Python script which uses Gearman to communicate with other components on the system. The script has multiple workers, and I have each worker running in a separate thread. The workers all receive Gearman data, they process and store that data on a message queue, and the main thread can pull the data off the queue as necessary.

My solution to cleanly shutting down each worker was to subclass gearman.GearmanWorker and override the work() function:

from gearman import GearmanWorker
POLL_TIMEOUT_IN_SECONDS = 60.0
class StoppableWorker(GearmanWorker):
    def __init__(self, host_list=None):
        super(StoppableWorker,self).__init__(host_list=host_list)
        self._exit_runloop = False


    # OVERRIDDEN
    def work(self, poll_timeout=POLL_TIMEOUT_IN_SECONDS):
        worker_connections = []
        continue_working = True

        def continue_while_connections_alive(any_activity):
            return self.after_poll(any_activity)

        while continue_working and not self._exit_runloop:
            worker_connections = self.establish_worker_connections()
            continue_working = self.poll_connections_until_stopped(
                worker_connections,
                continue_while_connections_alive,
                timeout=poll_timeout)

        for current_connection in worker_connections:
            current_connection.close()

        self.shutdown()


    def stopwork(self):
        self._exit_runloop = True

Use it just like GearmanWorker. When it's time to exit the script, call the stopwork() function. It won't stop immediately; it can take up to poll_timeout seconds before it kicks out of the run loop.

There may be multiple smart ways to invoke the stopwork() function. In my case, I create a temporary Gearman client in the main thread. For the worker that I'm trying to shut down, I send a special STOP command through the Gearman server. When the worker gets this message, it knows to shut itself down.

Hope this helps!

I use the following code, which supports both Ctrl-C and kill -TERM. By default, supervisor sends the TERM signal if you have not modified the signal= setting. In PHP 5.3+, declare(ticks = 1) is deprecated; use pcntl_signal_dispatch() instead.

$terminate = false;
pcntl_signal(SIGINT, function() use (&$terminate)
{
    $terminate = true;
});
pcntl_signal(SIGTERM, function() use (&$terminate)
{
    $terminate = true;
});

$worker = new GearmanWorker();
$worker->addOptions(GEARMAN_WORKER_NON_BLOCKING);
$worker->setTimeout(1000);
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('reverse', function(GearmanJob $job)
{
    return strrev($job->workload());
});

$count = 500 + rand(0, 100); // rand to prevent multiple workers from restarting at the same time
for($i = 0; $i < $count; $i++)
{
    if ( $terminate )
    {
        break;
    }
    else
    {
        pcntl_signal_dispatch();
    }

    $worker->work();

    if ( $terminate )
    {
        break;
    }
    else
    {
        pcntl_signal_dispatch();
    }

    if ( GEARMAN_SUCCESS == $worker->returnCode() )
    {
        continue;
    }

    if ( GEARMAN_IO_WAIT != $worker->returnCode() && GEARMAN_NO_JOBS != $worker->returnCode() )
    {
        $e = new ErrorException($worker->error(), $worker->returnCode());
        // log exception
        break;
    }

    $worker->wait();
}

$worker->unregisterAll();

What I do is use gearadmin to check whether there are any jobs running. I used the admin API to make a UI for this. When the workers are sitting idle, there is no harm in killing them.
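If you want to script that check rather than eyeball a UI, gearmand's text admin protocol (the same interface gearadmin talks to) answers a status command with one line per function (name, queued jobs, running jobs, available workers), terminated by a line containing a single dot. A rough sketch, assuming the default host and port:

// Returns true if any registered function currently has running jobs.
function gearmanJobsRunning($host = '127.0.0.1', $port = 4730)
{
    $fp = fsockopen($host, $port, $errno, $errstr, 5);
    if (!$fp) {
        throw new RuntimeException("Cannot reach gearmand: $errstr");
    }

    fwrite($fp, "status\n");

    $running = 0;
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);
        if ($line === '.') {
            break; // end of the status listing
        }
        $parts = explode("\t", $line);
        if (count($parts) >= 3) {
            $running += (int) $parts[2]; // third column: jobs currently running
        }
    }
    fclose($fp);

    return $running > 0;
}

// Deploy script: wait until the workers are idle, then it is safe to kill them.
while (gearmanJobsRunning()) {
    sleep(1);
}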