1. 12 Apr, 2004 40 commits
    • [PATCH] AIO+DIO bio_count race fix · c58d3aeb
      Andrew Morton authored
      From: Suparna Bhattacharya <suparna@in.ibm.com>,
            Daniel McNeil <daniel@osdl.org>
      
      This patch ensures that when the DIO code falls back to buffered i/o after
      having submitted part of the i/o, then buffered i/o is issued only for the
      remaining part of the request (i.e.  the part not already covered by DIO),
      rather than redo the entire i/o.  Now, instead of returning written ==
      -ENOTBLK, generic_file_direct_IO returns the number of bytes already handled
      by DIO, so that the caller knows how much of the I/O is left to be handled
      via fallback to buffered write.
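A minimal userspace sketch of the partial-fallback idea described above. All names here (sketch_direct_write, sketch_buffered_write, sketch_write, dio_limit) are hypothetical stand-ins and not the kernel functions; the only point is that the direct path reports how many bytes it covered and the buffered path is issued for the remainder alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical direct path: covers at most dio_limit bytes and returns
 * the number of bytes it handled instead of an all-or-nothing error. */
static long sketch_direct_write(char *dst, const char *src, size_t len,
                                size_t dio_limit)
{
    size_t n = len < dio_limit ? len : dio_limit;

    memcpy(dst, src, n);
    return (long)n;
}

/* Hypothetical buffered path: handles whatever range it is given. */
static long sketch_buffered_write(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    return (long)len;
}

/* Try the direct path first, then fall back to the buffered path for
 * only the bytes the direct path did not cover. */
static long sketch_write(char *dst, const char *src, size_t len,
                         size_t dio_limit, size_t *buffered_bytes)
{
    long written = sketch_direct_write(dst, src, len, dio_limit);
    size_t done = (size_t)written;

    assert(done <= len);
    *buffered_bytes = len - done;   /* what is left for the fallback */
    if (done < len)
        written += sketch_buffered_write(dst + done, src + done, len - done);
    return written;
}
```

With an 8-byte request and a direct path that stops after 3 bytes, the fallback in this sketch is asked for 5 bytes rather than all 8.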
      
      We need to be careful not to access dio fields if it's possible that the dio
      could already have been freed asynchronously during i/o completion.  A tricky
      part of this involves plugging the window between the decrement of bio_count
      and accessing dio->waiter during i/o completion where the dio could get freed
      by the submission path.  This potential "bio_count race" was tackled (by
      Daniel) by changing bio_list_lock into bio_lock and using that for all the
      bio fields.  Now bio_count and bios_in_flight have been converted from
      atomics into int and are both protected by the bio_lock.  The race in
      finished_one_bio() could thus be fixed by leaving the bio_count at 1 until
      after the dio_complete() and then doing the bio_count decrement and wakeup
      holding the bio_lock.  It appears that shifting to the spin_lock instead of
      atomic_inc/decs is ok performance wise as well.
      
      Update:
      
      An AIO O_DIRECT request was extending the file so it was done
      synchronously.  However, the request got an EFAULT and direct_io_worker()
      was calling aio_complete() on the iocb and returning the EFAULT.  When
      io_submit_one() got the EFAULT return, it assumed it had to call
      aio_complete() since the i/o never got queued.
      
      The fix is for direct_io_worker() to only call aio_complete() when the
      upper layer is going to return -EIOCBQUEUED and not when getting errors
      that are being returned to the submit path.
    • [PATCH] direct-io AIO fixes · 332c8cf1
      Andrew Morton authored
      From: Suparna Bhattacharya <suparna@in.ibm.com>
      
      Fixes the following remaining issues with the DIO code:
      
      1. During DIO file extends, intermediate writes could extend i_size
         exposing unwritten blocks to intermediate reads (Soln: Don't drop i_sem
         for file extends)
      
      2. AIO-DIO file extends may update i_size before I/O completes,
         exposing unwritten blocks to intermediate reads.  (Soln: Force AIO-DIO
         file extends to be synchronous)
      
      3. AIO-DIO writes to holes call aio_complete() before falling back to
         buffered I/O !  (Soln: Avoid calling aio_complete() if -ENOTBLK)
      
      4. AIO-DIO writes to an allocated region followed by a hole fall back
         to buffered i/o without waiting for already submitted i/o to complete;
         might return to user-space, which could overwrite the buffer contents
         while they are still being written out by the kernel (Soln: Always wait
         for submitted i/o to complete before falling back to buffered i/o)
    • [PATCH] blockdev direct-io speedups · aa34baa2
      Andrew Morton authored
      From: Badari Pulavarty <pbadari@us.ibm.com>
      
      1) blkdev_direct_IO() calls blockdev_direct_IO() instead of
         blockdev_direct_IO_no_locking().
      
      2) writev entry point is generic_file_writev() which grabs i_sem.  It
         should use generic_file_write_nolock() instead.
    • [PATCH] Fix race between ll_rw_block() and block_write_full_page() · c2179a48
      Andrew Morton authored
      Fix a race which was identified by Daniel McNeil <daniel@osdl.org>
      
      If a buffer_head is under I/O due to JBD's ordered data writeout (which uses
      ll_rw_block()) then either filemap_fdatawrite() or filemap_fdatawait() needs
      to wait on the buffer's existing I/O.
      
      Presently neither will do so, because __block_write_full_page() will not
      actually submit any I/O and will hence not mark the page as being under
      writeback.
      
      The best-performing fix would be to somehow mark the page as being under
      writeback and defer waiting for the ll_rw_block-initiated I/O until
      filemap_fdatawait()-time.  But this is hard, because in
      __block_write_full_page() we do not have control of the buffer_head's end_io
      handler.  Possibly we could make JBD call into end_buffer_async_write(), but
      that gets nasty.
      
      This patch makes __block_write_full_page() wait for any buffer_head I/O to
      complete before inspecting the buffer_head state.  It only does this in the
      case where __block_write_full_page() was called for a "data-integrity" write:
      (wbc->sync_mode != WB_SYNC_NONE).
      
      Probably it doesn't matter, because kjournald is currently submitting (or has
      already submitted) all dirty buffers anyway.
    • [PATCH] O_DIRECT data exposure fixes · bc0e2bbf
      Andrew Morton authored
      From: Badari Pulavarty, Suparna Bhattacharya, Andrew Morton
      
      Forward port of Stephen Tweedie's DIO fixes from 2.4, to fix various DIO vs
      buffered IO exposures involving races causing:
      
      (a) stale data from uninstantiated blocks to be read, e.g.
      
          - O_DIRECT reads against buffered writes to a sparse region
      
          - O_DIRECT writes to a sparse region against buffered reads
      
      (b) potential data corruption with
      
          - O_DIRECT IOs against truncate
      
          due to writes to truncated blocks (which may have been reallocated to
          another file).
      
      Summary of fixes:
      
      1) All the changes affect only regular files.  RAW/O_DIRECT on block are
         unaffected. 
      
      2) The DIO code will not fill in sparse regions on a write.  Instead
         -ENOTBLK is returned and the generic file write code would fall through to
         buffered IO in this case followed by writing through the pages to disk
         using filemap_fdatawrite/wait.
      
      3) i_sem is held during both DIO reads and writes.  For reads, and writes
         to already allocated blocks, it is released right after IO is issued,
         while for writes to newly allocated blocks (e.g. file extending writes and
         hole overwrites) it is held all the way through until IO completes (and
         data is committed to disk).
      
      4) filemap_fdatawrite/wait are called under i_sem to synchronize buffered
         pages to disk blocks before issuing DIO.
      
      5) A new rwsem (i_alloc_sem) is held in shared mode all the while a DIO
         (read or write) is in progress, and in exclusive mode by truncate to guard
         against deallocation of data blocks during DIO. 
      
      6) All this new locking has been pushed down into blockdev_direct_IO to
         avoid interfering with NFS direct IO.  The locks are taken in the order
         i_sem followed by i_alloc_sem.  While i_sem may be released after IO
         submission in some cases, i_alloc_sem is held through until dio_complete
         (in the case of AIO-DIO this happens through the IO completion callback).
      
      7) i_sem and i_alloc_sem are not held for the _nolock versions of write
         routines, as used by blockdev and XFS.  Filesystems can specify the
         needs_special_locking parameter to __blockdev_direct_IO from their direct
         IO address space op accordingly.
      
      Note from Badari:
      Here is the locking (when needs_special_locking is true):
      
      (1) generic_file_*_write() holds i_sem (as before) and calls
          ->direct_IO().  blockdev_direct_IO gets i_alloc_sem and calls
          direct_io_worker().
      
      (2) generic_file_*_read() does not hold any locks.  blockdev_direct_IO()
          gets i_sem and then i_alloc_sem and calls direct_io_worker() to do the
          work.
      
      (3) direct_io_worker() does the work and drops i_sem after submitting IOs
          if appropriate and drops i_alloc_sem after completing IOs.
    • [PATCH] enable suspend-on-halt for NS Geode · 62a36b1f
      Andrew Morton authored
      From: Matt Mackall <mpm@selenic.com>
      
      From: Zwane Mwaikambo <zwane@arm.linux.org.uk>
      
      This enables deep powersaving mode on Geode boxes.
    • [PATCH] shrink inode when quota is disabled · 87217f47
      Andrew Morton authored
      From: Matt Mackall <mpm@selenic.com>
      
      drop quota array in inode struct if no quota support
    • [PATCH] eliminate nswap and cnswap · 8398bcc6
      Andrew Morton authored
      From: Matt Mackall <mpm@selenic.com>
      
      The nswap and cnswap counters have never been incremented as
      Linux doesn't do task swapping.
    • [PATCH] improve CONFIG_EMBEDDED help text · b931abdb
      Andrew Morton authored
      From: Matt Mackall <mpm@selenic.com>
      
      Make CONFIG_EMBEDDED description more accurate
    • [PATCH] remove bogus MOD_{INC,DEC}_USE_COUNT from hysdn · cc66b6fc
      Andrew Morton authored
      From: Christoph Hellwig <hch@lst.de>
      
      The maintainer doesn't respond, unfortunately, but removing these from
      net_devices unconditionally is the 2.6 way to go, there's no more module
      refcounting on net devices.
    • [PATCH] oss/wavfront.c warning fix. · 36bf1087
      Andrew Morton authored
      From: "Luiz Fernando N. Capitulino" <lcapitulino@prefeitura.sp.gov.br>
      
      sound/oss/wavfront.c: At top level:
      sound/oss/wavfront.c:2498: warning: `errno' defined but not used
    • [PATCH] kill spurious MAKDEV scripts · ffe52a4a
      Andrew Morton authored
      From: Christoph Hellwig <hch@lst.de>
      
      Kill magic ide/sound makedev scripts in scripts/.  The userland MAKEDEV is
      the proper place and already has support for them.
    • [PATCH] missing NULL pointer check in pte_alloc_one. · 7653e3ac
      Andrew Morton authored
      From: Martin Schwidefsky <schwidefsky@de.ibm.com>
      
      Just found a small bug in pgalloc for s390*.  Comparing notes with other
      architectures I found that pte_alloc_one is sick for alpha and sparc64 as
      well.
    • [PATCH] selinux: fix struct type · d15128eb
      Andrew Morton authored
      From: Stephen Smalley <sds@epoch.ncsc.mil>
      
      This patch fixes the type of the ssec pointer in the sk_free_security
      function.  This has no current impact as the magic element is the top of each
      structure.  Thanks to Chad Hanson of TCS for discovering the bug and
      submitting the patch.
    • [PATCH] stv0299.c unused variable · 25c1c70b
      Andrew Morton authored
      From: "Luiz Fernando N. Capitulino" <lcapitulino@prefeitura.sp.gov.br>
      
      drivers/media/dvb/frontends/stv0299.c:356: warning: unused variable `i'
    • [PATCH] ia64 MSI support · 9938e2c2
      Andrew Morton authored
      From: "Nguyen, Tom L" <tom.l.nguyen@intel.com>
      
      Adds MSI support for ia64.
      
      - Modified existing code in drivers/pci/msi.c and drivers/pci/msi.h to
        include MSI support on IA64 platform.
      
      - Based on the comments received from Zwane Mwaikambo and David Mosberger,
        this patch consolidates the vector allocators as
        assign_irq_vector(AUTO_ASSIGN) has the same semantics as
        ia64_alloc_vector() by converting the existing uses of ia64_alloc_vector()
        to assign_irq_vector(AUTO_ASSIGN).
      
      - Based on the comments received from Zwane Mwaikambo, this patch
        consolidates the semantics of vector allocator assign_irq_vector() in
        drivers/pci/msi.c into the relevant architecture's vector allocator
        assign_irq_vector() in arch/i386/kernel/io_apic.c.
      
      - Regarding vector allocation, this patch modifies the existing function
        assign_irq_vector() to maximize the number of allocated vectors to 188
        before going -ENOSPC.
      
      - Based on your comments, this patch creates <asm-i386/msi.h>,
        <asm-ia64/msi.h> and <asm-x86_64/msi.h>, includes <asm/msi.h> from within
        drivers/pci/msi.h and then places all the code which is currently under
        ifdef in msi.h into the relevant architecture's <asm/msi.h> file.
      
      - Based on your comments, this patch places pci_vector_resources() in
        existing drivers/pci/msi.c in the relevant architecture implementations
        such as into arch/.../pci/irq.c.
    • [PATCH] summmit: increase MAX_MP_BUSSES · 27b5c750
      Andrew Morton authored
      From: James Cleverdon <jamesclv@us.ibm.com>
      
      Bump up MAX_MP_BUSSES for summit/generic subarch to cope with big IBM x440
      systems.
    • [PATCH] summit: per-subarch NR_IRQ_VECTORS · 15e98d5d
      Andrew Morton authored
      From: James Cleverdon <jamesclv@us.ibm.com>
      
      Break out the definition of NR_IRQ_VECTORS, etc from irq_vectors.h into
      irq_vectors_limits.h, so we can change it per subarch without having code
      duplication for the rest of the file.  Stick the same values back for
      mach-default, and override them for mach-summit/generic which needs bigger
      limits.
    • [PATCH] Strip quotes from kernel parameters · 8e1aabbc
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Agustin Martin <agmartin@debian.org> pointed out that this doesn't work:
      
      	options ide-mod options="ide=nodma hdc=cdrom"
      
      The quotes are understood by kernel/params.c (ie.  it skips over spaces
      inside them), but are not stripped before handing to the underlying
      function.  They should be.
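The behaviour asked for above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the code in kernel/params.c: sketch_next_arg is a hypothetical helper that splits one "param=value" token in place, treats spaces inside double quotes as part of the value, and hands back the value with the surrounding quotes removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the next "param=value" token off args in
 * place.  Spaces inside double quotes do not end the token, and a value
 * wrapped in double quotes is returned without them.  Returns a pointer
 * to the text that follows the token. */
static char *sketch_next_arg(char *args, char **param, char **val)
{
    size_t i, equals = 0;
    int in_quote = 0, have_equals = 0;

    assert(args != NULL);
    for (i = 0; args[i] != '\0'; i++) {
        if (args[i] == ' ' && !in_quote)
            break;                      /* unquoted space ends the token */
        if (args[i] == '=' && !have_equals) {
            equals = i;                 /* only the first '=' splits */
            have_equals = 1;
        }
        if (args[i] == '"')
            in_quote = !in_quote;
    }

    *param = args;
    if (!have_equals) {
        *val = NULL;
    } else {
        args[equals] = '\0';
        *val = args + equals + 1;
        if (**val == '"') {             /* strip the opening quote ... */
            (*val)++;
            if (args[i - 1] == '"')     /* ... and the closing one */
                args[i - 1] = '\0';
        }
    }

    if (args[i] != '\0') {
        args[i] = '\0';
        return args + i + 1;
    }
    return args + i;
}
```

Given the string options="ide=nodma hdc=cdrom" other=1, this sketch yields the parameter options with the value ide=nodma hdc=cdrom (no quotes) and leaves other=1 as the remaining text.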
    • [PATCH] Fix huge sparse tmpfs files · 7feebd5c
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Kevin P.  Fleming pointed out that the 2.6 tmpfs does not allow writing huge
      sparse files.  This is an unintended side-effect of the strict memory commit
      changes: which should make no difference.
      
      The solution is to treat the tmpfs files (of variable size) and the shmem
      objects (of fixed size) differently: sounds nasty but works out well.  The
      shmem objects follow the VM preallocation convention as before, but the tmpfs
      files revert to allocation on demand as a filesystem would.  If there's not
      enough memory to write to a tmpfs hole, it is reported as -ENOSPC rather than
      -ENOMEM, so the mmap writer gets SIGBUS rather than everyone else getting
      OOM-killed.
    • [PATCH] Remove bitmap_shift_*() bitmap length limits · 77c8efae
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      Change bitmap_shift_left()/bitmap_shift_right() to have O(1) stackspace
      requirements.
      
      Given zeroed tail preconditions these implementations satisfy zeroed tail
      postconditions, which makes them compatible with whatever changes from Paul
      Jackson one may want to merge in the future.  No particular effort was
      required to ensure this.
      
      A small (but hopefully forgivable) cleanup is a spelling correction:
      s/bitmap_shift_write/bitmap_shift_right/ in one of the kerneldoc comments.
      
      The primary effect of the patch is to remove the MAX_BITMAP_BITS
      limitation, so restoring the NR_CPUS to be limited only by stackspace and
      slab allocator maximums.  They also look vaguely more efficient than the
      current code, though as this was not done for performance reasons, no
      performance testing was done.
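A rough sketch of a multiword right shift that needs no temporary bitmap, and so uses constant stack space whatever the bitmap length. This is an illustration written for this note, not the code from the patch; the SKETCH_* macros and sketch_bitmap_shift_right are hypothetical names. Bits beyond the stated length in the last source word are masked off, so the result has a zeroed tail.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define SKETCH_BITS_TO_LONGS(bits) \
    (((bits) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Shift the bits-long bitmap in src right by shift bits into dst, word
 * by word.  Only a few scalars live on the stack; dst may equal src. */
static void sketch_bitmap_shift_right(unsigned long *dst,
                                      const unsigned long *src,
                                      int shift, int bits)
{
    int lim = SKETCH_BITS_TO_LONGS(bits);
    int left = bits % SKETCH_BITS_PER_LONG;
    int off = shift / SKETCH_BITS_PER_LONG;   /* whole words to skip */
    int rem = shift % SKETCH_BITS_PER_LONG;   /* remaining bit shift */
    unsigned long mask = left ? (1UL << left) - 1 : ~0UL;
    int k;

    assert(shift >= 0 && bits > 0);
    for (k = 0; off + k < lim; k++) {
        unsigned long upper = 0, lower;

        /* Bits that cross down from the next higher word, if any. */
        if (rem && off + k + 1 < lim) {
            upper = src[off + k + 1];
            if (off + k + 1 == lim - 1)
                upper &= mask;                /* ignore the tail */
            upper <<= SKETCH_BITS_PER_LONG - rem;
        }
        lower = src[off + k];
        if (off + k == lim - 1)
            lower &= mask;                    /* ignore the tail */
        lower >>= rem;
        dst[k] = lower | upper;
    }
    for (; k < lim; k++)                      /* vacated high words */
        dst[k] = 0;
}
```

Each destination word is written only after the source words it depends on have been read, which is what allows the shift to run in place.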
    • [PATCH] Support for floppies whose sectors are numbered from zero instead of one · 387f7c83
      Andrew Morton authored
      From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
      
      From: Alain Knaff <alain.knaff@lll.lu>
      
      This patch adds support for floppy disks whose sectors are numbered
      starting at 0 rather than 1 as usual disks would be.  This format is used
      for some CP/M disks, and also for certain music samplers (such as the
      Ensoniq EPS 16plus).
      
      In order to use it, you need an fdutils with the current patch from
      http://fdutils.linux.lu as well, and then do setfdprm /dev/fd0 dd zerobased
      sect=10 or setfdprm /dev/fd0 hd zerobased sect.
      
      In addition, the patch also fixes my email addresses.  I no longer use
      pobox.com.
    • [PATCH] fix modversions now __this_module is created only in .ko · 7fdaa121
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Brian Gerst's patch which moved __this_module out from module.h into the
      module post-processing had a side effect.  genksyms didn't see the
      undefined symbols for modules without a module_init (or module_exit), and
      hence didn't generate a version for them, causing the kernel to be tainted.
      
      The simple solution is to always include the versions for these functions. 
      Also includes two cleanups:
      
      1) alloc_symbol is easier to use if it populates ->next for us.
      
      2) add_exported_symbol should set owner to module, not head of module
         list (we don't use this field in entries in that list, fortunately).
    • [PATCH] Move __this_module to modpost · 7ee168c0
      Andrew Morton authored
      From: Brian Gerst <bgerst@didntduck.org>
      
      Move the __this_module structure to the modpost code where it really
      belongs.
    • [PATCH] speed up fget() and fget_light() · a05fc485
      Andrew Morton authored
      From: Eric Dumazet <dada1@cosmosbay.com>
      
      We can avoid evaluating `current' in a few places.
    • [PATCH] cpu5wdt.c warning fix · a18fde09
      Andrew Morton authored
      From: Heiko Ronsdorf <hero@persua.de>
      
      - Remove a volatile which causes a warning via module_param()
      
      - Remove an unused variable.
    • [PATCH] /dev/urandom scalability improvement · 47b54fbf
      Andrew Morton authored
      From: David Mosberger <davidm@napali.hpl.hp.com>
      
      Somebody recently pointed out a performance-anomaly to me where an unusual
      amount of time was being spent reading from /dev/urandom.  The problem
      isn't really surprising as it happened only on >= 4-way machines and the
      random driver isn't terribly scalable the way it is written today.  If
      scalability _really_ mattered, I suppose per-CPU data structures would be
      the way to go.  However, I found that at least for 4-way machines,
      performance can be improved considerably with the attached patch.  In
      particular, I saw the following performance on a 4-way ia64 machine:
      
      Test: 3 tasks running "dd if=/dev/urandom of=/dev/null bs=1024":
      
      			throughput:
      			
    • [PATCH] export complete_all() · ce334bb8
      Andrew Morton authored
      From: Mike Waychison <Michael.Waychison@Sun.COM>
      
      Export complete_all for module use.
    • [PATCH] i830 DRM missing put_user · 50a1d632
      Andrew Morton authored
      From: Arjan van de Ven <arjanv@redhat.com>
      
      The patch below adds a few missing put_user()'s to the i810/i830 drm
      modules.  Users reported oopses with 4g/4g split in action, and sparse
      annotations indeed found the offender in the function in question.  I've
      kept the sparse __user annotations since those are generally useful anyway.
      I can't test it myself but a few people reported that the oopses went away
      so far.
    • [PATCH] Update Documentation/Changes · 1079b187
      Andrew Morton authored
      From: Trivial Patch Monkey <trivial@rustcorp.com.au>
      
      From:  Thomas Molina <tmolina@cablespeed.com>
    • [PATCH] ne2k-pci.c compile fix on ppc[64] · 73007d9b
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      These macros are redefined here.  Previous definitions are in
      asm-ppc(64)/io.h
    • [PATCH] Add CC Trivial Patch Monkey to SubmittingPatches · 64ea79c7
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      From: maximilian attems <janitor@sternwelten.at>
      
      Add the Monkey to SubmittingPatches.
    • [PATCH] Use valid node number when unmapping x86 CPUs · 7275fb97
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      From:  colpatch@us.ibm.com
      
      The cpu_2_node[] array for i386 is initialized to all 0's, meaning that
      until modified at CPU bring-up, all CPUs are mapped to node 0.
      
      When CPUs are brought online, they are mapped to the appropriate node by
      various mechanisms, depending on the underlying hardware.
      
      When we unmap CPUs (hotplug time), we should return the mapping for the CPU
      that is going away to its original state, ie: 0.
      
      When this code was initially submitted, the misguided poster (me) made the
      mistake of putting a -1 in the cpu_2_node[] array for the CPU going away.
      
      This patch fixes this mistake, and allows code to get a valid node number
      for all valid CPU numbers.  This is important, because most (if not all)
      callers do not error check the value returned by the cpu_to_node() macro,
      and they should not have to.  The API specifies that a valid node number be
      returned for any valid CPU number.
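The invariant described above can be sketched in a few lines of userspace C. The names below (SKETCH_NR_CPUS, sketch_cpu_2_node and the three helpers) are hypothetical and only model the idea: the table starts at node 0 for every CPU, and unmapping a CPU puts it back to node 0 rather than -1, so a lookup for any valid CPU number never yields an invalid node.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 8

/* Zero-initialized, so every CPU starts out mapped to node 0. */
static int sketch_cpu_2_node[SKETCH_NR_CPUS];

/* Record the node a CPU belongs to when it is brought online. */
static void sketch_map_cpu_to_node(int cpu, int node)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS && node >= 0);
    sketch_cpu_2_node[cpu] = node;
}

/* On unmap, restore the original state (node 0), not -1. */
static void sketch_unmap_cpu_to_node(int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    sketch_cpu_2_node[cpu] = 0;
}

/* Callers do not error-check this, so it must always be a valid node. */
static int sketch_cpu_to_node(int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    return sketch_cpu_2_node[cpu];
}
```

Because the lookup result is typically used directly as an index, a -1 entry would push the error check onto every caller; keeping the table valid at all times avoids that.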
    • [PATCH] Kill duplicate #include <linux_ioport.h> · 3a2d85ea
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      include/linux/device.h includes include/linux/ioport.h twice.
    • [PATCH] updating email info in CREDITS · 17ec30a3
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      From:  Thomas Molina <tmolina@cablespeed.com>
    • [PATCH] CONFIG_X86_GENERIC description fixup · e1319f38
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      From:  Stewart Smith <stewart@linux.org.au>
      
      A better explanation of the X86_GENERIC config option follows.
    • [PATCH] Fix genksyms parsing · f17ea056
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      From: Andreas Schwab <schwab@suse.de>
      
      I'm getting a warning when building for ia64 with MODVERSIONS enabled.
      This is a bug in genksyms, it can't cope with some arguments of
      __typeof__.
      
      The following patch will fix that.  Actually the argument of __typeof__ is
      an abstract declarator, but the genksyms parser has no production for that;
      decl_specifier_seq also matches some invalid constructs, but I don't think
      this is a problem in practice, since the compiler will reject them.
    • [PATCH] Trivial Patch Monkey should be in MAINTAINERS · fa79e47b
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      From:  Petri Koistinen <petri.koistinen@iki.fi>
    • [PATCH] Fix firmware loader docs · f333f50d
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      From:  Pavel Machek <pavel@ucw.cz>
      
      sysfs should be mounted on /sys these days.
    • [PATCH] i386 irq.c ifdef cleanup · bc344a64
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      From:  Josef 'Jeff' Sipek <jeffpc@optonline.net>
      
      I just noticed the nested ifdefs, and made it a little more readable.