[PATCH] fix wrong error code on interrupted close syscalls

The problem is that close() syscalls can call a file system's flush handler, which in turn might sleep interruptibly and ultimately pass back an -ERESTARTSYS return value. This happens for files backed by an interruptible NFS mount under nfs_file_flush() when a large file has just been written and nfs_wait_bit_interruptible() detects that there is a signal pending. I have a test case where the "strace" command is used to attach to a process sleeping in such a close(). Since the SIGSTOP is forced onto the victim process (removing it from the thread's "blocked" mask in force_sig_info()), the RPC wait is interrupted and the close() is terminated early. But the file table entry has already been cleared before the flush handler was called. Thus, when the syscall is restarted, the file descriptor appears closed and an EBADF error is returned (which is wrong). What's worse, there is the hypothetical case where another thread of a multi-threaded application might have reused the file descriptor, in which case that file would be mistakenly closed. The bottom line is that close() syscalls are not restartable, and thus -ERESTARTSYS return values should be mapped to -EINTR. This is consistent with the close(2) manual page. The fix is below. Signed-off-by: Ernie Petrides <petrides@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix wrong error code on interrupted close syscalls
The problem is that close() syscalls can call a file system's flush handler, which in turn might sleep interruptibly and ultimately pass back an -ERESTARTSYS return value. This happens for files backed by an interruptible NFS mount under nfs_file_flush() when a large file has just been written and nfs_wait_bit_interruptible() detects that there is a signal pending. I have a test case where the "strace" command is used to attach to a process sleeping in such a close(). Since the SIGSTOP is forced onto the victim process (removing it from the thread's "blocked" mask in force_sig_info()), the RPC wait is interrupted and the close() is terminated early. But the file table entry has already been cleared before the flush handler was called. Thus, when the syscall is restarted, the file descriptor appears closed and an EBADF error is returned (which is wrong). What's worse, there is the hypothetical case where another thread of a multi-threaded application might have reused the file descriptor, in which case that file would be mistakenly closed. The bottom line is that close() syscalls are not restartable, and thus -ERESTARTSYS return values should be mapped to -EINTR. This is consistent with the close(2) manual page. The fix is below. Signed-off-by: Ernie Petrides <petrides@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ee731f4f · Ernie Petrides · Linus Torvalds · 7bbab916 · ee731f4f
Commit ee731f4f authored Sep 29, 2006 by Ernie Petrides Committed by Linus Torvalds Sep 29, 2006
Hide whitespace changes
Inline Side-by-side

Showing with 11 additions and 1 deletion

fs/open.c fs/open.c +11 -1

No files found.
--- a/fs/open.c
+++ b/fs/open.c
@@ -1173,6 +1173,7 @@ asmlinkage long sys_close(unsigned int fd)
 	struct file * filp;
 	struct files_struct *files = current->files;
 	struct fdtable *fdt;
+	int retval;

 	spin_lock(&files->file_lock);
 	fdt = files_fdtable(files);
@@ -1185,7 +1186,16 @@ asmlinkage long sys_close(unsigned int fd)
 	FD_CLR(fd, fdt->close_on_exec);
 	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
-	return filp_close(filp, files);
+	retval = filp_close(filp, files);
+
+	/* can't restart close syscall because file table entry was cleared */
+	if (unlikely(retval == -ERESTARTSYS ||
+		     retval == -ERESTARTNOINTR ||
+		     retval == -ERESTARTNOHAND ||
+		     retval == -ERESTART_RESTARTBLOCK))
+		retval = -EINTR;
+
+	return retval;

 out_unlock:
 	spin_unlock(&files->file_lock);