Dec 13, 2008

NEVER Use PIPE with Python popen

I do use Python at work from time to time, mostly for build tools and pipeline stuff. This is a problem I ran into a week ago, and I thought I should let other people know too. (I assume a lot of game programmers use Python? Well, not in game code, though.)

This is basically a short summary of this article, which is where I got my answer.
  • If you redirect the stdout of a child process through PIPE, it will hang once the output is bigger than the pipe buffer (64KB in my case): the child blocks on the full pipe while the parent is still waiting for it to exit. (And it does not give you any meaningful error message. You might see a deadlock deep inside some file-handling functions in ntdll or so.)
  • My advice is NEVER use PIPE with popen. Even if your current subprocess never outputs more than 64K, other programmers may break this fragile assumption later. (This was my case.)
  • Instead, always use a tempfile. (see the code example below)
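A minimal sketch of the hang described above, assuming Python 3.3+ (for `wait(timeout=...)`) rather than the 2.5.1 used in this post. `CHILD` and `demo_pipe_hang` are hypothetical names; the child is just a Python one-liner standing in for `cmd` that writes about 1 MB to the pipe. Instead of hanging forever, the parent waits with a timeout, sees that the child cannot exit, and then drains the pipe so the child can finish.

```python
import subprocess
import sys

# Hypothetical stand-in for `cmd`: writes ~1 MB to stdout, far more than a
# typical OS pipe buffer (64KB by default on Linux).
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]


def demo_pipe_hang(timeout=2.0):
    """Show that waiting before reading a PIPE stalls once the buffer fills.

    Returns (timed_out, bytes_read).
    """
    p = subprocess.Popen(CHILD, stdout=subprocess.PIPE)
    timed_out = False
    try:
        # Nobody drains the pipe, so the child blocks in write() and can
        # never exit; this wait() can only time out.
        p.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        timed_out = True
    # Reading the pipe to EOF unblocks the child so it can finish.
    data = p.stdout.read()
    p.stdout.close()
    p.wait()
    return timed_out, len(data)


if __name__ == "__main__":
    print(demo_pipe_hang())  # (True, 1000000)
```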

Code examples:
  • Not-so-good code:

    from subprocess import Popen, PIPE

    p = Popen(cmd, stdout=PIPE)

    ...


    out_lines = p.stdout.readlines()

  • Better code I ended up using (though I was later told this is bad advice; see comment 7):

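    import tempfile
    from subprocess import Popen

    temp_file = tempfile.TemporaryFile()
    d = temp_file.fileno()
    p = Popen(cmd, stdout=d)

    ...

    temp_file.seek(0)  # rewind: the child's writes left the position at EOF
    out_lines = temp_file.readlines()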
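A runnable sketch of the temp-file approach, assuming Python 3 and a hypothetical `CHILD` command standing in for `cmd` that prints roughly 1 MB. It passes the file object straight to `Popen` (which uses its `fileno()` for the child), rewinds with `seek(0)` before reading, and lets a `with` block close the file, which also deletes a `TemporaryFile`. `run_to_tempfile` is likewise a hypothetical name.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-in for `cmd`: prints 100,000 short lines (roughly 1 MB),
# far more than a 64KB pipe buffer.
CHILD = [sys.executable, "-c", "for i in range(100000): print('line', i)"]


def run_to_tempfile(cmd):
    """Run cmd with stdout sent to an anonymous temp file; return its lines."""
    # Closing the TemporaryFile (here, at the end of the with-block) deletes it.
    with tempfile.TemporaryFile() as out:
        # Popen accepts the file object and hands its fileno() to the child.
        p = subprocess.Popen(cmd, stdout=out)
        p.wait()      # safe: writes to a regular file never wait for a reader
        out.seek(0)   # rewind: the child's writes left the position at EOF
        return out.read().splitlines()


if __name__ == "__main__":
    lines = run_to_tempfile(CHILD)
    print(len(lines), lines[0], lines[-1])  # 100000 b'line 0' b'line 99999'
```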


p.s.1 You don't have to close temp_file; it will be closed when the garbage collector collects it. Still, closing it explicitly is good practice.

p.s.2 This was done in Python 2.5.1. I mention this because I just heard that Python 3.0 is not backward-compatible.

10 comments:

  1. I cannot read from d; Python says it can't read from an int object.

  2. I've run into this problem before, and it can be worked around by calling p.stdout.readlines() and p.stderr.readlines() in separate threads before calling p.wait(). The problem is that each call waits for its own pipe to close, while the child may be blocked writing to the other pipe because its buffer is full.
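The thread-based workaround from this comment might look like the following sketch, assuming Python 3. `CHILD` (a child that floods both stdout and stderr) and `read_both` are hypothetical names. One thread drains each pipe, so neither buffer can fill up and stall the child while the parent is blocked on the other pipe.

```python
import subprocess
import sys
import threading

# Hypothetical child that floods BOTH pipes (roughly 500KB each), so draining
# only one pipe at a time could leave the child blocked on the other.
CHILD = [sys.executable, "-c",
         "import sys\n"
         "for i in range(50000):\n"
         "    sys.stdout.write('out %d\\n' % i)\n"
         "    sys.stderr.write('err %d\\n' % i)\n"]


def read_both(cmd):
    """Drain stdout and stderr concurrently, one thread per pipe."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    results = {}

    def drain(name, pipe):
        # read() blocks until this pipe reaches EOF; the other thread keeps
        # the other pipe moving in the meantime.
        results[name] = pipe.read()
        pipe.close()

    threads = [threading.Thread(target=drain, args=("out", p.stdout)),
               threading.Thread(target=drain, args=("err", p.stderr))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    p.wait()  # both pipes are at EOF, so this returns promptly
    return results["out"], results["err"]


if __name__ == "__main__":
    out, err = read_both(CHILD)
    print(len(out.splitlines()), len(err.splitlines()))  # 50000 50000
```

Calling `readlines()` in each thread works the same way as `read()` here; what matters is that both pipes are drained at the same time.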

  3. In response to the first Anonymous post: the cause is a small error in Pope's source.

    The line out_lines = d.readlines() should be out_lines = temp_file.readlines().

    d is the file descriptor number since that's what is returned by temp_file.fileno(). You can't read from a file descriptor number.

  4. Thanks for correcting the source code, Steven. I guess I really should double-check my source code :-)

    I'll update the main post to reflect it.

  5. This sounds extremely strange. What would make it sound less strange (and even proper and expected) would be if the ellipses in your example code include waiting for the Popened command to stop.

    If readlines() is called while the command writing to the pipe is still running, proper buffering means the writing process is sometimes blocked until the reading process catches up (and when the reading process has caught up, it is of course blocked until the writing process has more data). Also, since readlines() reads all the data and returns a list of lines, it is often more efficient to iterate over the lines. So a better example would be:

    from subprocess import Popen, PIPE

    p = Popen(cmd, stdout=PIPE)

    for line in p.stdout:
        pass  # ... handle line ...

    ...

    If the process is long-lived, the pipe may carry more data than there is room for on the file system, in which case a temporary file would fail instead.
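A sketch of the line-by-line iteration suggested in this comment, assuming Python 3; `CHILD` and `count_lines_streaming` are hypothetical names. Because the parent reads while the child is still writing, the pipe buffer keeps getting emptied and the child never stalls for good, even though the total output here is well over 1 MB. Only stdout is piped; if stderr were piped too, it would also need draining (threads or `communicate()`).

```python
import subprocess
import sys

# Hypothetical stand-in for `cmd`: prints the numbers 0..199999, one per line
# (well over 1 MB in total).
CHILD = [sys.executable, "-c", "for i in range(200000): print(i)"]


def count_lines_streaming(cmd):
    """Handle a child's stdout line by line while the child is still running."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    count = 0
    last = None
    # Every iteration pulls data out of the pipe, so the child is never stuck
    # on a full buffer for long, and memory use does not grow with the output.
    for line in p.stdout:
        count += 1
        last = line.rstrip()
    p.stdout.close()
    p.wait()  # stdout reached EOF, so the child has finished writing
    return count, last


if __name__ == "__main__":
    print(count_lines_streaming(CHILD))  # (200000, b'199999')
```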

  6. A temp_file.seek(0) is needed before readlines(); otherwise readlines() will return an empty list.

    temp_file.seek(0)
    out_lines = temp_file.readlines()
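A small sketch of why the rewind matters, assuming Python 3 and a hypothetical `CHILD` that prints two lines (`read_with_and_without_seek` is also hypothetical). The child inherits the temp file's descriptor and shares its file position, so after the child writes, the position sits at the end of the file and reading without `seek(0)` yields nothing.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-in for `cmd`: prints two short lines.
CHILD = [sys.executable, "-c", "print('hello'); print('world')"]


def read_with_and_without_seek(cmd):
    """Return what readlines() sees before and after rewinding the temp file."""
    with tempfile.TemporaryFile() as temp_file:
        p = subprocess.Popen(cmd, stdout=temp_file.fileno())
        p.wait()
        # The child's writes advanced the file position it shares with us,
        # so reading from the current position hits EOF immediately.
        before = temp_file.readlines()
        temp_file.seek(0)
        after = [line.rstrip() for line in temp_file.readlines()]
    return before, after


if __name__ == "__main__":
    print(read_with_and_without_seek(CHILD))  # ([], [b'hello', b'world'])
```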

    Thanks for the post.

  7. This solution with the temporary file is BAD advice!

    The correct way of using Popen is with .communicate().
    Interprocess communication is known to be deadlock-prone; that is exactly why the subprocess module, with the Popen() object and its communicate() method, was introduced.

    from subprocess import Popen, PIPE

    p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)

    ...

    stdoutdata, stderrdata = p.communicate(input)
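A runnable sketch of the `communicate()` pattern, assuming Python 3; `CHILD` (which echoes stdin back upper-cased and writes a note to stderr) and `run_with_communicate` are hypothetical names. `communicate()` sends the input and reads stdout and stderr concurrently until EOF, then waits for the process, so neither pipe can fill up and block the child. Note that all of the output is buffered in memory.

```python
import subprocess
import sys

# Hypothetical child: echoes stdin back upper-cased, then writes to stderr.
CHILD = [sys.executable, "-c",
         "import sys\n"
         "data = sys.stdin.buffer.read()\n"
         "sys.stdout.buffer.write(data.upper())\n"
         "sys.stderr.write('done')\n"]


def run_with_communicate(cmd, input_bytes):
    """Feed stdin and collect stdout/stderr without the pipe deadlock."""
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() sends input_bytes, closes stdin, reads stdout and stderr
    # until EOF, and then waits for the process; results are held in memory.
    out, err = p.communicate(input_bytes)
    return p.returncode, out, err


if __name__ == "__main__":
    data = b"abc\n" * 100000  # about 400KB in each direction
    code, out, err = run_with_communicate(CHILD, data)
    print(code, out == data.upper(), err)  # 0 True b'done'
```

On Python 3.7+, `subprocess.run(cmd, input=..., capture_output=True)` wraps this same pattern.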

    1. Thanks. I never knew there was a communicate() method. I updated my main post with your comment.

  8. Hi, does anyone have an idea why Popen changes the arguments in the command list when they contain German characters or other special characters?
    e.g.
    args = ['cmd.exe','/C', 'mybat.bat','arg1ü','arg2ßä']
    p = Popen(args,stdin=PIPE,stdout=PIPE,stderr=PIPE)
    a,b = p.communicate()
    print a
    print b

    And in mybat.bat print the arguments with
    echo %1,%2

    Both arguments were encoded, but I don't want this; I need to pass them to another script to process.
