Dec 13, 2008

NEVER Use PIPE with Python popen

I use Python at work from time to time, mostly for build tools and pipeline stuff. I ran into this problem a week ago and thought I should let other people know too. (I assume a lot of game programmers use Python? Well, not in game code, though.)

This is basically a short summary of this article, which is where I found my answer.
  • If you redirect the stdout of a child process through PIPE, it will hang once the output is bigger than the pipe's buffer (64KB here) if the parent waits for the child to exit before reading: the child blocks writing to the full pipe, and the parent blocks waiting for the child. And it does not give you any meaningful error message. You might just see a deadlock deep inside some file-handling functions in ntdll or so.
  • My advice: NEVER use PIPE with popen. Even if your current subprocess never outputs more than 64KB, another programmer may break that shallow assumption later. (That was my case.)
  • Instead, always use a tempfile. (see the code example below)
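If you want to see the hang for yourself without freezing your build, here is a minimal sketch that reproduces it with a timeout. It assumes Python 3.3+ for `wait(timeout=...)` (which did not exist in 2.5), and the child command and the 1MB output size are made-up test values, chosen only to be bigger than a typical pipe buffer.

```python
import subprocess
import sys

# Hypothetical child process: writes 1MB to stdout, far more than a typical
# OS pipe buffer (64KB on Linux; other systems differ).
CHILD_CODE = "import sys; sys.stdout.write('x' * (1024 * 1024))"


def hangs_when_waiting_first(timeout=2.0):
    """Return True if the child is still blocked after `timeout` seconds."""
    p = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                         stdout=subprocess.PIPE)
    try:
        # The problematic order: wait for the child before reading its output.
        # Without a timeout, this call would never return.
        p.wait(timeout=timeout)
        stuck = False
    except subprocess.TimeoutExpired:
        stuck = True
    # Drain the pipe so the blocked child can finish, then reap it.
    p.stdout.read()
    p.stdout.close()
    p.wait()
    return stuck


if __name__ == "__main__":
    print(hangs_when_waiting_first())
```

If the wait times out, the child was stuck on the full pipe; reading `p.stdout` is what finally lets it finish.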

Code examples:
  • Not-so-good code:

    from subprocess import Popen, PIPE

    p = Popen(cmd, stdout=PIPE)

    ...

    out_lines = p.stdout.readlines()

  • Better code:

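    import tempfile
    from subprocess import Popen

    temp_file = tempfile.TemporaryFile()
    d = temp_file.fileno()
    p = Popen(cmd, stdout=d)

    ...

    # The child's writes leave the shared file position at the end,
    # so rewind before reading the output back.
    temp_file.seek(0)
    out_lines = temp_file.readlines()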
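For reference, here is a self-contained sketch of the temp-file approach. The function name `run_to_tempfile` and the test command are hypothetical, the elided part is replaced by a plain `p.wait()`, and it assumes Python 3, where `TemporaryFile()` opens in binary mode so the lines come back as bytes. It hands the file object straight to `Popen` (which accepts file objects as well as descriptors) and uses a `with` block so the file is closed without relying on the garbage collector.

```python
import subprocess
import sys
import tempfile


def run_to_tempfile(cmd):
    """Run cmd, capturing stdout in a temporary file instead of a pipe.

    Returns the output as a list of byte strings, one per line.
    """
    # The `with` block closes (and deletes) the temp file deterministically.
    with tempfile.TemporaryFile() as temp_file:
        # A file never fills up the way a pipe buffer does, so the child can
        # write as much as it likes without waiting for the parent to read.
        p = subprocess.Popen(cmd, stdout=temp_file)
        p.wait()
        # The child's writes moved the shared file position to the end;
        # rewind before reading.
        temp_file.seek(0)
        return temp_file.readlines()


if __name__ == "__main__":
    # Hypothetical test command: prints 100,000 numbered lines (well over 64KB).
    child = "for i in range(100000): print(i)"
    lines = run_to_tempfile([sys.executable, "-c", child])
    print(len(lines))
```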

p.s.1 You don't have to close temp_file; it will be closed when the garbage collector collects it. Still, closing it explicitly is good practice.

p.s.2 This was done in Python 2.5.1. I mention this because I just heard Python 3.0 is not backward-compatible.

5 Comments:

  1. I cannot read the file d; Python says it's impossible to read an int object.
  2. Can you post your code here?
  3. I've run into the problem before, and it can be worked around by placing your calls to p.stdout.readlines() and p.stderr.readlines() in threads before calling p.wait(). The problem is that either call would wait for its file to close while the other one potentially blocks when its buffer is full.
  4. In response to the first Anonymous post, the reason is a small error in Pope's source.

    The line out_lines = d.readlines() should be out_lines = temp_file.readlines().

    d is the file descriptor number since that's what is returned by temp_file.fileno(). You can't read from a file descriptor number.
  5. Thanks for correcting the source code, Steven. I guess I really should double-check my source code :-)

    I'll update the main post to reflect it.
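The threaded workaround from comment 3 could look something like this. It is a minimal sketch assuming Python 3; `run_with_reader_threads` is a hypothetical helper name, not a library function. Each pipe gets its own reader thread, so neither can fill up while the parent is blocked on the other, and `p.wait()` is only called after both readers hit end-of-file.

```python
import subprocess
import sys
import threading


def run_with_reader_threads(cmd):
    """Run cmd with stdout and stderr on pipes, draining both concurrently.

    Returns (stdout_lines, stderr_lines, returncode).
    """
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    results = {}

    def drain(name, stream):
        # readlines() blocks until EOF; giving each pipe its own thread means
        # a full stderr pipe can't stall the child while we read stdout
        # (or vice versa).
        results[name] = stream.readlines()
        stream.close()

    threads = [
        threading.Thread(target=drain, args=("out", p.stdout)),
        threading.Thread(target=drain, args=("err", p.stderr)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Both pipes reached EOF, so the child has closed them; wait() can't deadlock.
    returncode = p.wait()
    return results["out"], results["err"], returncode


if __name__ == "__main__":
    # Hypothetical child that writes 100KB to both stdout and stderr.
    child = "import sys; sys.stdout.write('o\\n' * 50000); sys.stderr.write('e\\n' * 50000)"
    out, err, rc = run_with_reader_threads([sys.executable, "-c", child])
    print(len(out), len(err), rc)
```

The standard `Popen.communicate()` method addresses the same problem, so it is worth checking before rolling your own threads.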