I'm fairly baffled by the behavior I'm seeing for a Perl script that
runs well on UNIX (e.g., Linux RedHat 7.2 with Perl 5.8.1) but is
having problems running on Windows (Windows XP Professional) on
ActivePerl (5.8.2 ActiveState build 808). I've done a fair amount of
searching email list archives and reading documentation and release
notes, but am coming up empty.
The Perl script opens a long-lived TCP connection to a Java server.
Through that connection, the Java server sends the Perl script work to
do. For each piece of work, the Perl script calls socketpair(), then
calls fork() to create a transient child (child process on UNIX, but
child thread on Win32 ActiveState Perl) to do the actual work. The
child reports work progress and completion to the parent through the
socketpair() socket and then calls exit() to terminate cleanly when
the parent asks it to exit; the parent then reports completion to the
Java server. (For scale, note that each piece of work takes about
5-10 seconds clock time to complete, and the Perl script limits the
number of children it's creating to 16, so there should be no overall
resource exhaustion issue. While trying to debug this problem, I've
also tried stubbing out the actual work to be done in the children and
having each child sleep for a few seconds and then return some
prepackaged results, so they consume essentially zero resources.
Didn't help a bit.)
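
For illustration, here is a minimal sketch of the per-job pattern
described above. It is not the actual script: the helper name
run_one_job, the handle names, and the one-line "progress"/"done"/"exit"
messages are all made up for this example, and the AF_UNIX/PF_UNSPEC
arguments simply follow the usual perlipc socketpair() idiom.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;

# Hypothetical helper: run one piece of work in a forked child and
# return the lines the child reported over the socketpair.
sub run_one_job {
    my ($job) = @_;

    # $to_child is the parent's end; $to_parent is the child's end.
    socketpair(my $to_parent, my $to_child, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair failed: $!";
    $to_parent->autoflush(1);
    $to_child->autoflush(1);

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: report progress and completion, then wait to be told to exit.
        close($to_child);
        print {$to_parent} "progress $job\n";
        print {$to_parent} "done $job\n";
        my $command = <$to_parent>;    # blocks until the parent says "exit"
        close($to_parent);
        exit(0);
    }

    # Parent: collect the child's reports, then ask it to exit.
    close($to_parent);
    my @report;
    while (defined(my $line = <$to_child>)) {
        chomp $line;
        push @report, $line;
        last if $line =~ /^done /;
    }
    print {$to_child} "exit\n";
    close($to_child);
    waitpid($pid, 0);
    return @report;
}
```

In this sketch each side closes the end it does not use immediately
after the fork(), so every job leaves no socketpair handles open once
run_one_job() returns.
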
This executes fine for a while. But after some number of pieces of
work have been executed (exactly how many is unpredictable from run to
run: sometimes only 50 pieces of work get completed, other times 500),
the parent's attempt to send results back to the Java server fails.
Specifically, a call to syswrite($sock, $messageBuffer) returns undef.
When that happens, some debugging print statements I added tell me
that either:
- The $! variable contains the numeric value 10038 (string value
"Unknown error"); the $^E variable contains the text "An operation was
attempted on something that is not a socket".
- The $! variable contains the numeric value 10045 (string value
"Unknown error"); the $^E variable contains the text "The attempted
operation is not supported for the type of object referenced."
The perlvar(1) man page documents $^E on Win32 as the value reported
by the Win32 API GetLastError() call, and I was easily able to confirm
both error codes by looking them up on MSDN on the Web.
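
For reference, 10038 and 10045 are the Winsock codes WSAENOTSOCK and
WSAEOPNOTSUPP. Below is a minimal sketch of the kind of debugging
wrapper that produces output like the above. The helper name write_all
and the message format are made up for this example; it is not the code
from the actual script.

```perl
use strict;
use warnings;

# Winsock names for the two codes seen in the failures.
my %WINSOCK_NAME = (
    10038 => 'WSAENOTSOCK',
    10045 => 'WSAEOPNOTSUPP',
);

# Hypothetical helper: write all of $buf to $sock with syswrite().
# Returns undef on success, or a diagnostic string on failure.
sub write_all {
    my ($sock, $buf) = @_;
    my $offset = 0;
    while ($offset < length $buf) {
        my $n = syswrite($sock, $buf, length($buf) - $offset, $offset);
        if (!defined $n) {
            # Capture both variables right away, before any other
            # call has a chance to overwrite them.
            my ($errno, $errstr) = ($! + 0, $! . '');
            my ($oserr, $osstr)  = ($^E + 0, $^E . '');
            my $name = $WINSOCK_NAME{$errno} || 'no Winsock name known';
            return "syswrite failed: \$!=$errno ($errstr; $name), "
                 . "\$^E=$oserr ($osstr)";
        }
        $offset += $n;    # syswrite() may write less than requested
    }
    return;
}
```

A caller would use it as

    my $err = write_all($sock, $messageBuffer);
    print STDERR "$err\n" if defined $err;
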
Thoughts?
- Ping Huang
P.S. I also noticed something else odd, but it may or may not be a
red herring: I've been using Process Explorer (from
www.SysInternals.com, purveyors of interesting Win32 utilities that
show you the guts of all sorts of things) to watch the threads in my
Perl process. I see that the following Perl code snippet

    use Socket;
    socket(SOCK, AF_INET, SOCK_STREAM, getprotobyname("tcp"));
    my $dest = sockaddr_in(23, inet_aton("192.168.1.1"));
    # 192.168.1.1 port 23 is used here as an example only
    connect(SOCK, $dest);

results in the creation of three File handles in the PERL.EXE Win32
process, one with name "\Device\Afd\Endpoint" which comes into
existence after the socket() call (example handle: 0x48) and two with
name "\Device\Tcp" which come into existence after the connect() call
(example handles: 0x4C and 0x50). If I then call close(SOCK), all
three handles get closed. If I don't call close() right away but
first kill the other end of the TCP connection (e.g., the telnet
server on 192.168.1.1), Windows doesn't preemptively clean up the file
handles but leaves them there until I do call close().
However, when I use Process Explorer to watch my Perl script, I
sometimes find after the syswrite() fails that the
"\Device\Afd\Endpoint" handle has vanished.