PuTTY semi-bug unix-connreset-spin

This is a mirror. The primary PuTTY web site can be found here.

Home | Licence | FAQ | Docs | Download | Keys | Links
Mirrors | Updates | Feedback | Changes | Wishlist | Team

summary: Unix PuTTY spins in tight loop after abnormal exit
class: semi-bug: This might or might not be a bug, depending on your precise definition of what a bug is.
depends: port-unix-gtk2
priority: medium: This should be fixed one day.
present-in: 2006-11-29
fixed-in: r8040 2008-06-06 (0.61) (0.62) (0.63) (0.64)

PuTTY (development snapshot 2006-11-29) on Linux spins on the CPU after a
connection has been terminated unexpectedly.  (In this case, displaying a
PuTTY Fatal Error dialogue box with a "Connection Reset by Peer" message
in it.) stracing the putty process when it's in this state produces lots
of:

poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=5,
events=POLLIN|POLLPRI, revents=POLLNVAL}], 3, -1) = 1
gettimeofday({1165238007, 805859}, NULL) = 0
gettimeofday({1165238007, 805953}, NULL) = 0
ioctl(3, FIONREAD, [0])                 = 0
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN}, {fd=5,
events=POLLIN|POLLPRI, revents=POLLNVAL}], 3, -1) = 1
gettimeofday({1165238007, 806293}, NULL) = 0
gettimeofday({1165238007, 806386}, NULL) = 0
ioctl(3, FIONREAD, [0])                 = 0

SGT, 2006-12-28: I've looked briefly into this. I can reproduce it
easily: the simplest way is to make an SSH connection to a Linux
box, type `ps j $$' to find out the parent PID of the bash process
(i.e. the user half of the privilege-separated sshd) and kill it.

Examining the full strace shows that the reason poll() is repeatedly
returning POLLNVAL on that fd is because it was closed shortly
before that. So we've closed the fd, but not removed it from GDK's
input loop.

Or have we? Applying the following diagnostic patch...

Index: uxsel.c
===================================================================
--- uxsel.c	(revision 7026)
+++ uxsel.c	(working copy)
@@ -71,6 +71,7 @@
 
     oldfd = find234(fds, newfd, NULL);
     if (oldfd) {
+printf("uxsel_input_remove %d %d\n", fd, oldfd->id);
 	uxsel_input_remove(oldfd->id);
 	del234(fds, oldfd);
 	sfree(oldfd);
@@ -78,12 +79,14 @@
 
     add234(fds, newfd);
     newfd->id = uxsel_input_add(fd, rwx);
+printf("uxsel_input_add %d %d\n", fd, newfd->id);
 }
 
 void uxsel_del(int fd)
 {
     struct fd *oldfd = find234(fds, &fd, uxsel_fd_findcmp);
     if (oldfd) {
+printf("uxsel_input_remove %d %d\n", fd, oldfd->id);
 	uxsel_input_remove(oldfd->id);
 	del234(fds, oldfd);
 	sfree(oldfd);

... shows that we actually _do_ call gdk_input_remove() for that fd
before entering the tight loop, and yet somehow the GDK event loop
is still checking it. Also, setting a breakpoint on fd_input_func()
in gtkwin.c indicates that GDK's event loop is not repeatedly
calling it - i.e. that POLLNVAL is _not_ being passed to user code,
but is entirely contained within GDK.

In other words, I'm reasonably sure this is a bug in GDK or Glib,
not in PuTTY. And since PuTTY is still running on GDK 1.2, it seems
quite possible that the bug might already be fixed in later
versions. Therefore, we probably ought to (finally) get round to
porting to GTK 2.x, and see if the problem has gone away, before
devoting any more effort to this issue.

SGT, 2007-01-03: Indeed, this is now confirmed as a GTK problem, and
moreover one which is already fixed in GTK 2. Colin Watson has
submitted a preliminary patch to build PuTTY against GTK 2, and the
problem goes away if you do that.

Audit trail for this semi-bug.


If you want to comment on this web site, see the Feedback page.
(last revision of this bug record was at 2008-06-10 23:44:36 +0000)