Welcome to ProcessRunner’s documentation!


This documentation includes an introduction to the purpose of ProcessRunner, example uses, and API docs.

Introduction and Background

ProcessRunner runs external programs and collects their character (string, non-binary) output. It is built on subprocess.Popen and simplifies managing output when several concurrent consumers each need their own copy of it.

ProcessRunner was originally built to split the output of a potentially long-running command line application where it was necessary to write the app’s output to a file while also processing the records in real time.

Today ProcessRunner continues to simplify running many concurrent activities against an external application’s output streams.
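For comparison, the original use case (write a command’s output to a file while also processing each record as it arrives) can be sketched with the standard library alone. This is not how ProcessRunner is implemented. The helper name tee_lines, the handler argument, and the temporary file path are hypothetical, and a short Python one-liner stands in for a long-running command.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for a long-running command; prints "Line 1" through "Line 3".
COMMAND = [sys.executable, "-c", "for i in range(1, 4): print('Line', i)"]


def tee_lines(command, path, handler):
    """Write each stdout line to `path` and pass it to `handler` as it is read."""
    with open(path, "w") as log, subprocess.Popen(
        command, stdout=subprocess.PIPE, text=True
    ) as proc:
        for line in proc.stdout:
            log.write(line)               # copy 1: the file
            handler(line.rstrip("\n"))    # copy 2: per-record processing
    # Leaving the `with` block waits for the command, so returncode is set.
    return proc.returncode


records = []
log_path = os.path.join(tempfile.mkdtemp(), "output.txt")
exit_code = tee_lines(COMMAND, log_path, records.append)
print(exit_code, records)
```

A single loop like this works for one file and one handler. It gets harder to manage as more independent consumers are added, which is the situation ProcessRunner targets.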

Note

When not to use ProcessRunner

Don’t use ProcessRunner when the native tools work. subprocess is a powerful toolset and should satisfy most use cases. Using the native tools also eliminates the fairly significant overhead ProcessRunner introduces.

ProcessRunner uses subprocess.Popen and does not set the shell=True flag. All processes started by the class are saved in PROCESSRUNNER_PROCESSES. To list the currently active processes started by the class, call processrunner.getActiveProcesses(). Note that this is NOT a member of the ProcessRunner class.
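As a minimal sketch of the native approach, the following starts a command with subprocess.Popen from an argument list, without shell=True. The helper name run_without_shell is hypothetical, and a Python one-liner stands in for seq so the sketch runs wherever Python does.

```python
import subprocess
import sys

# Stand-in for `seq -f 'Line %g' 3`.
COMMAND = [sys.executable, "-c", "for i in range(1, 4): print('Line', i)"]


def run_without_shell(command):
    """Run `command` (a list of arguments) and return (stdout lines, exit code)."""
    proc = subprocess.Popen(
        command,                  # each list item reaches the program verbatim
        stdout=subprocess.PIPE,   # capture stdout instead of inheriting it
        text=True,                # decode bytes to str (character output)
    )
    out, _ = proc.communicate()   # blocks until the command exits
    return out.splitlines(), proc.returncode


lines, code = run_without_shell(COMMAND)
print(lines, code)
```

Because no shell is involved, arguments containing spaces or shell metacharacters need no quoting or escaping. Shell features such as pipes and globbing are not available either.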

Determining when to stop

There are multiple mechanisms to determine when the command has stopped and readers have finished. This may seem like a topic for later, but skipping this can cause indefinite hangs!

Blocking methods hold (block) the user’s application from continuing to execute. wait() is the most thorough, as it blocks until both the command and all readers are finished. readlines() blocks until the command is finished and readlines() has collected all requested lines from the command’s pipes: both stdout and stderr, or just the one requested via the procPipeName parameter.

The ProcessRunner attributes stdout, stderr, and output are special in that they are blocking but also produce values as they are generated. readlines() by contrast only returns a complete list of all output once the command has finished.
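The difference between producing values as they are generated and returning a complete list at the end can be illustrated with the standard library. This is only an analogy for the behavior described above, not ProcessRunner’s implementation. The function names stream_lines and collect_lines are hypothetical.

```python
import subprocess
import sys

# Stand-in for `seq -f 'Line %g' 3`.
COMMAND = [sys.executable, "-c", "for i in range(1, 4): print('Line', i)"]


def stream_lines(command):
    """Yield each stdout line as it is read, while the command may still be running."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


def collect_lines(command):
    """Return every stdout line at once, only after the command has exited."""
    done = subprocess.run(command, stdout=subprocess.PIPE, text=True, check=True)
    return done.stdout.splitlines()


streamed = []
for line in stream_lines(COMMAND):
    streamed.append(line)    # each line can be handled here before the next arrives

collected = collect_lines(COMMAND)
print(streamed == collected)
```

Both forms block the caller. The streaming form simply hands back control once per line instead of once at the end.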

When using the non-blocking methods map() and write(), it is necessary to use another mechanism to ensure all output has been processed. ProcessRunner does not require that all output be processed by a reader, except when using wait(). (Further discussion later in this section.)

A potentially risky situation arises when map() and wait() are used together. If a map() consumer never finishes reading all the output queued for it, wait() will effectively hang. ProcessRunner logs an INFO notification for this situation at the NOTIFICATION_DELAY interval (every 1 second by default; not currently configurable).

map() returns a multiprocessing.Event() object the user can leverage to determine when the mapping has completed. Event() objects contain an is_set() method that returns a bool. In this context, False means the map is incomplete, while True means it has finished and there are no more lines to be processed. For an example of this, see Process output in real time, in the background.
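The completion-Event pattern can be sketched without ProcessRunner. In this illustration, threading.Event stands in for multiprocessing.Event (both offer is_set(), set(), and wait()), and the start_mapping helper, the queue, and the None sentinel are all assumptions made for the sketch rather than parts of the library.

```python
import queue
import threading


def start_mapping(lines, func):
    """Apply `func` to each queued line in the background; return a completion Event."""
    complete = threading.Event()

    def worker():
        while True:
            line = lines.get()
            if line is None:      # sentinel: no more lines will arrive
                break
            func(line)
        complete.set()            # signal that every line has been processed

    threading.Thread(target=worker, daemon=True).start()
    return complete               # returns immediately (non-blocking)


lines = queue.Queue()
seen = []
complete = start_mapping(lines, seen.append)

for i in range(1, 4):
    lines.put(f"Line {i}")

before = complete.is_set()        # False: the sentinel has not been queued yet
lines.put(None)

# A bounded wait avoids hanging forever if the consumer never finishes.
finished = complete.wait(timeout=5)
print(before, finished, seen)
```

Passing a timeout to wait() is a design choice for the sketch: it turns a potential indefinite hang into a False return value the caller can act on.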

Quickstart

For this series of examples, we’ll use a small one-and-done shell command to produce some output for us:

seq -f 'Line %g' 10

In Python, this can be run without ProcessRunner like so:

from subprocess import call

# Python shell
call(['seq', '-f', 'Line %g', '10'])

# Python file
if __name__ == "__main__":
    # seq writes straight to our stdout; call() returns only the exit code (an int)
    print(call(['seq', '-f', 'Line %g', '10']))

This prints the following 11 lines. The first 10 come from seq, and the last is the exit code returned by call().

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
0

Similar behavior can be seen with ProcessRunner:

from processrunner import ProcessRunner as Pr

# Python shell
Pr(['seq', '-f', 'Line %g', '10']).output

# Python file
if __name__ == "__main__":
    print("\n".join(
        Pr(['seq', '-f', 'Line %g', '10']).output
    ))

This will output the same lines, but without the exit code:

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10

Note on output

ProcessRunner has several ways to collect output, the simplest being the attributes stdout, stderr, and output. This third attribute, output, is a combination of the other two, interwoven as ProcessRunner gets lines from the command’s stdout and stderr.
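For readers using the native tools, a comparable single interleaved stream can be obtained by pointing the child’s stderr at the same pipe as its stdout. This is a standard-library sketch, not a description of how ProcessRunner builds its output attribute. The child program below is made up for the illustration and flushes after each line so the ordering is predictable.

```python
import subprocess
import sys

# Hypothetical child that alternates between stdout and stderr.
CHILD = (
    "import sys; "
    "print('out 1', flush=True); "
    "print('err 1', file=sys.stderr, flush=True); "
    "print('out 2', flush=True)"
)

done = subprocess.run(
    [sys.executable, "-c", CHILD],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,   # send stderr into the same pipe as stdout
    text=True,
)
combined = done.stdout.splitlines()
print(combined)
```

The trade-off is that the merged stream no longer records which pipe each line came from, whereas separate stdout and stderr attributes keep that distinction.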

Writing output to a file

To quickly direct output to a file, the write() method has you covered:

from processrunner import ProcessRunner as Pr

# Python shell
Pr(['seq', '-f', 'Line %g', '10']).write('output.txt')

# Python file
if __name__ == "__main__":
    Pr(['seq', '-f', 'Line %g', '10']).write('output.txt').wait()

Note

Non-blocking!

Note the use of wait() after write() above. The write() method is non-blocking, meaning that it will return immediately. In this case, that happens before the output from the seq command makes it into output.txt.

See more examples on the examples page.

Issues

BrokenPipeErrors and Error 32s/IO Errors

These happen when a ProcessRunner instance starts to shut down before it’s really finished. Internally, this often occurs when the shutdown() routine begins terminating the child processes that manage various aspects of the library (like the central _Command object, referred to as the run attribute) while mapLines processing children are still working and try to get status from the run/_Command object.

This can be mitigated by waiting for the “complete” Event returned from a call to mapLines to be set before tearing down the ProcessRunner instance. wait() was redesigned after version 2.5.3 to globally watch these instances before returning.

RuntimeError with mention of freeze_support()

When running ProcessRunner in a program (rather than on a console), make sure to start your primary logic inside an if __name__ == '__main__': block.

This stems from a compromise made when handling the fallout from changes discussed in https://bugs.python.org/issue33725. See GitHub for details on the changes.
