1. 20 Mar, 2012 15 commits
    • exit_signal: simplify the "we have changed execution domain" logic · e6368253
      Oleg Nesterov authored
      exit_notify() checks "tsk->self_exec_id != tsk->parent_exec_id"
      to handle the "we have changed execution domain" case.
      
      We can change de_thread() to always set ->exit_signal = SIGCHLD
      and remove this check to simplify the code.
      
      We could change setup_new_exec() instead, this looks more logical
      because it increments ->self_exec_id. But note that de_thread()
      already resets ->exit_signal if it changes the leader, let's keep
      both changes close to each other.
      
      Note that we change ->exit_signal lockless, this changes the rules.
      Thereafter ->exit_signal is not stable under tasklist but this is
      fine, the only possible change is OLDSIG -> SIGCHLD. This can race
      with eligible_child() but the race is harmless. We can race with
      reparent_leader() which changes our ->exit_signal in parallel, but
      it does the same change to SIGCHLD.
      
      The noticeable user-visible change is that the execing task is not
      "visible" to do_wait()->eligible_child(__WCLONE) right after exec.
      To me this looks more logical, and this is consistent with mt case.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
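The user-visible effect described in the commit above can be illustrated with a small userspace model. This is only a sketch: `model_task`, `MODEL_WCLONE`, `MODEL_WALL`, `model_eligible_child()` and `model_exec()` are hypothetical names, not kernel code, and the model covers only the `->exit_signal` part of the wait-eligibility check (a "clone" child being one that reports with a signal other than SIGCHLD).

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the __WCLONE and __WALL wait flags. */
#define MODEL_WCLONE 0x1u
#define MODEL_WALL   0x2u

struct model_task {
    int exit_signal;   /* signal sent to the parent when the task exits */
};

/* Models the exit_signal part of the eligibility check: with MODEL_WALL
 * every child matches; otherwise "clone" children match only when
 * MODEL_WCLONE is set, and ordinary children only when it is not. */
static bool model_eligible_child(const struct model_task *p, unsigned flags)
{
    assert(p != NULL);
    bool is_clone_child = (p->exit_signal != SIGCHLD);
    bool want_clone = (flags & MODEL_WCLONE) != 0;

    if (flags & MODEL_WALL)
        return true;
    return is_clone_child == want_clone;
}

/* Models what the commit describes for exec: exit_signal is always
 * reset to SIGCHLD, so OLDSIG -> SIGCHLD is the only transition. */
static void model_exec(struct model_task *p)
{
    assert(p != NULL);
    p->exit_signal = SIGCHLD;
}
```

For a child whose `exit_signal` is SIGUSR1, `model_eligible_child(&child, MODEL_WCLONE)` holds before `model_exec(&child)` and no longer holds afterwards, while a plain wait (flags 0) matches only after the exec.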
    • CLONE_PARENT shouldn't allow to set ->exit_signal · 5f8aadd8
      Oleg Nesterov authored
      The child must not control its ->exit_signal, it is the parent who
      decides which signal the child should use for notification.
      
      This means that CLONE_PARENT should not use "clone_flags & CSIGNAL",
      the forking task is the sibling of the new process and their parent
      doesn't control exit_signal in this case.
      
      This patch uses ->exit_signal of the forking process, but perhaps
      we should simply use SIGCHLD.
      
      We read group_leader->exit_signal lockless, this can race with the
      ORIGINAL_SIGNAL -> SIGCHLD transition, but this is fine.
      
      Potentially this change allows to kill self_exec_id/parent_exec_id.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'usb-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · ed378a52
      Linus Torvalds authored
      Pull USB merge for 3.4-rc1 from Greg KH:
       "Here's the big USB merge for the 3.4-rc1 merge window.
      
        Lots of gadget driver reworks here, driver updates, xhci changes, some
        new drivers added, usb-serial core reworking to fix some bugs, and
        other various minor things.
      
        There are some patches touching arch code, but they have all been
        acked by the various arch maintainers."
      
      * tag 'usb-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (302 commits)
        net: qmi_wwan: add support for ZTE MF820D
        USB: option: add ZTE MF820D
        usb: gadget: f_fs: Remove lock is held before freeing checks
        USB: option: make interface blacklist work again
        usb/ub: deprecate & schedule for removal the "Low Performance USB Block" driver
        USB: ohci-pxa27x: add clk_prepare/clk_unprepare calls
        USB: use generic platform driver on ath79
        USB: EHCI: Add a generic platform device driver
        USB: OHCI: Add a generic platform device driver
        USB: ftdi_sio: new PID: LUMEL PD12
        USB: ftdi_sio: add support for FT-X series devices
        USB: serial: mos7840: Fixed MCS7820 device attach problem
        usb: Don't make USB_ARCH_HAS_{XHCI,OHCI,EHCI} depend on USB_SUPPORT.
        usb gadget: fix a section mismatch when compiling g_ffs with CONFIG_USB_FUNCTIONFS_ETH
        USB: ohci-nxp: Remove i2c_write(), use smbus
        USB: ohci-nxp: Support for LPC32xx
        USB: ohci-nxp: Rename symbols from pnx4008 to nxp
        USB: OHCI-HCD: Rename ohci-pnx4008 to ohci-nxp
        usb: gadget: Kconfig: fix typo for 'different'
        usb: dwc3: pci: fix another failure path in dwc3_pci_probe()
        ...
    • Merge tag 'tty-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 843ec558
      Linus Torvalds authored
      Pull TTY/serial patches from Greg KH:
       "tty and serial merge for 3.4-rc1
      
        Here's the big serial and tty merge for the 3.4-rc1 tree.
      
        There's loads of fixes and reworks in here from Jiri for the tty
        layer, and a number of patches from Alan to help try to wrestle the vt
        layer into a sane model.
      
        Other than that, lots of driver updates and fixes, and other minor
        stuff, all detailed in the shortlog."
      
      * tag 'tty-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (132 commits)
        serial: pxa: add clk_prepare/clk_unprepare calls
        TTY: Wrong unicode value copied in con_set_unimap()
        serial: PL011: clear pending interrupts
        serial: bfin-uart: Don't access tty circular buffer in TX DMA interrupt after it is reset.
        vt: NULL dereference in vt_do_kdsk_ioctl()
        tty: serial: vt8500: fix annotations for probe/remove
        serial: remove back and forth conversions in serial_out_sync
        serial: use serial_port_in/out vs serial_in/out in 8250
        serial: introduce generic port in/out helpers
        serial: reduce number of indirections in 8250 code
        serial: delete useless void casts in 8250.c
        serial: make 8250's serial_in shareable to other drivers.
        serial: delete last unused traces of pausing I/O in 8250
        pch_uart: Add module parameter descriptions
        pch_uart: Use existing default_baud in setup_console
        pch_uart: Add user_uartclk parameter
        pch_uart: Add Fish River Island II uart clock quirks
        pch_uart: Use uartclk instead of base_baud
        mpc5200b/uart: select more tolerant uart prescaler on low baudrates
        tty: moxa: fix bit test in moxa_start()
        ...
    • Merge tag 'staging-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 71e7ff25
      Linus Torvalds authored
      Pull big staging driver updates from Greg KH:
       "Here is the big drivers/staging/ merge for 3.4-rc1
      
        Lots of new driver updates here, with the addition of a few new ones,
        and only one moving out of the staging tree to the "real" part of the
        kernel (the hyperv scsi driver, acked by the scsi maintainer).
      
        There are also loads of cleanups, fixes, and other minor things in
        here, all self-contained in the drivers/staging/ tree.
      
        Overall we reversed the recent trend by adding more lines than we
        removed:
         379 files changed, 37952 insertions(+), 14153 deletions(-)"
      
      * tag 'staging-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (360 commits)
        staging/zmem: Use lockdep_assert_held instead of spin_is_locked
        Staging: rtl8187se: r8180_wx.c: Cleaned up comments
        Staging: rtl8187se: r8180_wx.c: Removed old comments
        Staging: rtl8187se: r8180_dm.c: Removed old comments
        Staging: android: ram_console.c:
        Staging: rtl8187se: r8180_dm.c: Fix comments
        Staging: rtl8187se: r8180_dm.c: Fix spacing issues
        Staging: rtl8187se: r8180_dm.c Fixed indentation issues
        Staging: rtl8187se: r8180_dm.c: Fix brackets
        Staging: rtl8187se: r8180_dm.c: Removed spaces before tab stop
        staging: vme: fix section mismatches in linux-next 20120314
        Staging: rtl8187se: r8180_core.c: Fix some long line issues
        Staging: rtl8187se: r8180_core.c: Fix some spacing issues
        Staging: rtl8187se: r8180_core.c: Removed trailing spaces
        staging: mei: remove driver internal versioning
        Staging: rtl8187se: r8180_core.c: Cleaned up if statement
        staging: ozwpan depends on NET
        staging: ozwpan: added maintainer for ozwpan driver
        staging/mei: propagate error codes up in the write flow
        drivers:staging:mei Fix some typos in staging/mei
        ...
    • Merge tag 'driver-core-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 4a522463
      Linus Torvalds authored
      Pull driver core patches for 3.4-rc1 from Greg KH:
       "Here's the big driver core merge for 3.4-rc1.
      
        Lots of various things here, sysfs fixes/tweaks (with the nlink
        breakage reverted), dynamic debugging updates, w1 drivers, hyperv
        driver updates, and a variety of other bits and pieces, full
        information in the shortlog."
      
      * tag 'driver-core-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (78 commits)
        Tools: hv: Support enumeration from all the pools
        Tools: hv: Fully support the new KVP verbs in the user level daemon
        Drivers: hv: Support the newly introduced KVP messages in the driver
        Drivers: hv: Add new message types to enhance KVP
        regulator: Support driver probe deferral
        Revert "sysfs: Kill nlink counting."
        uevent: send events in correct order according to seqnum (v3)
        driver core: minor comment formatting cleanups
        driver core: move the deferred probe pointer into the private area
        drivercore: Add driver probe deferral mechanism
        DS2781 Maxim Stand-Alone Fuel Gauge battery and w1 slave drivers
        w1_bq27000: Only one thread can access the bq27000 at a time.
        w1_bq27000 - remove w1_bq27000_write
        w1_bq27000: remove unnecessary NULL test.
        sysfs: Fix memory leak in sysfs_sd_setsecdata().
        intel_idle: Revert change of auto_demotion_disable_flags for Nehalem
        w1: Fix w1_bq27000
        driver-core: documentation: fix up Greg's email address
        powernow-k6: Really enable auto-loading
        powernow-k7: Fix CPU family number
        ...
    • Merge tag 'char-misc-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 9f9d2760
      Linus Torvalds authored
      Pull char and misc patches for 3.4-rc1 from Greg KH:
       "Not much here, just a few minor fixes and some conversions to the
        module_*_driver() functions, making the codebase smaller."
      
      * tag 'char-misc-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        misc: bmp085: Use unsigned long to store jiffies
        char/ramoops: included linux/err.h twice
        misc: bmp085: Handle jiffies overflow correctly
        misc: fsa9480: Remove obsolete cleanup for clientdata
        char: Fix typo in tlclk.c
        char: Fix typo in viotape.c
        cs5535-mfgpt: don't call __init function from __devinit
        MISC: convert drivers/misc/* to use module_spi_driver()
        MISC: convert drivers/misc/* to use module_i2c_driver()
        MISC: convert drivers/misc/* to use module_platform_driver()
    • AFS: checking wrong bit in afs_readpages() · ad2a8e60
      Dan Carpenter authored
      We should be testing "if (vnode->flags & (1 << 4))" instead of
      "if (vnode->flags & 4) {".  The current test checks if the data was
      modified instead of deleted.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
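The bug class fixed in the AFS commit above (using a bit number where a mask is expected) can be reproduced in isolation. The sketch below assumes a flag layout in which "modified" is bit 2 and "deleted" is bit 4, which matches the commit's description of the symptom; the `MODEL_*` names and both helper functions are hypothetical and are not the AFS source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag layout: these names hold bit NUMBERS, not masks. */
enum {
    MODEL_VNODE_MODIFIED = 2,  /* bit 2, mask 0x04 */
    MODEL_VNODE_DELETED  = 4   /* bit 4, mask 0x10 */
};

/* The numeric value 4 is at the same time "bit number of DELETED" and
 * "mask of MODIFIED", which is exactly why the buggy test looked plausible. */
static_assert((1UL << MODEL_VNODE_MODIFIED) == MODEL_VNODE_DELETED,
              "bit number of DELETED equals mask of MODIFIED");

/* Buggy form: treats the bit number as a mask, so it really tests bit 2. */
static bool deleted_buggy(unsigned long flags)
{
    return (flags & MODEL_VNODE_DELETED) != 0;
}

/* Fixed form: shift 1 by the bit number to build the mask. */
static bool deleted_fixed(unsigned long flags)
{
    return (flags & (1UL << MODEL_VNODE_DELETED)) != 0;
}
```

With only the "modified" bit set, `deleted_buggy()` reports true and `deleted_fixed()` reports false; with only the "deleted" bit set, the results are reversed.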
    • Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 161f7a71
      Linus Torvalds authored
      Pull timer changes for v3.4 from Ingo Molnar
      
      * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
        ntp: Fix integer overflow when setting time
        math: Introduce div64_long
        cs5535-clockevt: Allow the MFGPT IRQ to be shared
        cs5535-clockevt: Don't ignore MFGPT on SMP-capable kernels
        x86/time: Eliminate unused irq0_irqs counter
        clocksource: scx200_hrt: Fix the build
        x86/tsc: Reduce the TSC sync check time for core-siblings
        timer: Fix bad idle check on irq entry
        nohz: Remove ts->inidle checks before restarting the tick
        nohz: Remove update_ts_time_stat from tick_nohz_start_idle
        clockevents: Leave the broadcast device in shutdown mode when not needed
        clocksource: Load the ACPI PM clocksource asynchronously
        clocksource: scx200_hrt: Convert scx200 to use clocksource_register_hz
        clocksource: Get rid of clocksource_calc_mult_shift()
        clocksource: dbx500: convert to clocksource_register_hz()
        clocksource: scx200_hrt:  use pr_<level> instead of printk
        time: Move common updates to a function
        time: Reorder so the hot data is together
        time: Remove most of xtime_lock usage in timekeeping.c
        ntp: Add ntp_lock to replace xtime_locking
        ...
    • Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2ba68940
      Linus Torvalds authored
      Pull scheduler changes for v3.4 from Ingo Molnar
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
        printk: Make it compile with !CONFIG_PRINTK
        sched/x86: Fix overflow in cyc2ns_offset
        sched: Fix nohz load accounting -- again!
        sched: Update yield() docs
        printk/sched: Introduce special printk_sched() for those awkward moments
        sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
        sched: Cleanup cpu_active madness
        sched: Fix load-balance wreckage
        sched: Clean up parameter passing of proc_sched_autogroup_set_nice()
        sched: Ditch per cgroup task lists for load-balancing
        sched: Rename load-balancing fields
        sched: Move load-balancing arguments into helper struct
        sched/rt: Do not submit new work when PI-blocked
        sched/rt: Prevent idle task boosting
        sched/wait: Add __wake_up_all_locked() API
        sched/rt: Document scheduler related skip-resched-check sites
        sched/rt: Use schedule_preempt_disabled()
        sched/rt: Add schedule_preempt_disabled()
        sched/rt: Do not throttle when PI boosting
        sched/rt: Keep period timer ticking when rt throttling is active
        ...
    • Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9c2b957d
      Linus Torvalds authored
      Pull perf events changes for v3.4 from Ingo Molnar:
      
       - New "hardware based branch profiling" feature both on the kernel and
         the tooling side, on CPUs that support it.  (modern x86 Intel CPUs
         with the 'LBR' hardware feature currently.)
      
         This new feature is basically a sophisticated 'magnifying glass' for
         branch execution - something that is pretty difficult to extract from
         regular, function histogram centric profiles.
      
         The simplest mode is activated via 'perf record -b', and the result
         looks like this in perf report:
      
      	$ perf record -b any_call,u -e cycles:u branchy
      
      	$ perf report -b --sort=symbol
      	    52.34%  [.] main                   [.] f1
      	    24.04%  [.] f1                     [.] f3
      	    23.60%  [.] f1                     [.] f2
      	     0.01%  [k] _IO_new_file_xsputn    [k] _IO_file_overflow
      	     0.01%  [k] _IO_vfprintf_internal  [k] _IO_new_file_xsputn
      	     0.01%  [k] _IO_vfprintf_internal  [k] strchrnul
      	     0.01%  [k] __printf               [k] _IO_vfprintf_internal
      	     0.01%  [k] main                   [k] __printf
      
         This output shows from/to branch columns and shows the highest
         percentage (from,to) jump combinations - i.e.  the most likely taken
         branches in the system.  "branches" can also include function calls
         and any other synchronous and asynchronous transitions of the
         instruction pointer that are not 'next instruction' - such as system
         calls, traps, interrupts, etc.
      
         This feature comes with (hopefully intuitive) flat ascii and TUI
         support in perf report.
      
       - Various 'perf annotate' visual improvements for us assembly junkies.
         It will now recognize function calls in the TUI and by hitting enter
         you can follow the call (recursively) and back, amongst other
         improvements.
      
       - Multiple threads/processes recording support in perf record, perf
         stat, perf top - which is activated via a comma-list of PIDs:
      
      	perf top -p 21483,21485
      	perf stat -p 21483,21485 -ddd
      	perf record -p 21483,21485
      
       - Support for per UID views, via the --uid parameter to perf top, perf
         report, etc.  For example 'perf top --uid mingo' will only show the
         tasks that I am running, excluding other users, root, etc.
      
       - Jump label restructurings and improvements - this includes the
         factoring out of the (hopefully much clearer) include/linux/static_key.h
         generic facility:
      
      	struct static_key key = STATIC_KEY_INIT_FALSE;
      
      	...
      
      	if (static_key_false(&key))
      	        do unlikely code
      	else
      	        do likely code
      
      	...
      	static_key_slow_inc(&key);
      	...
      	static_key_slow_dec(&key);
      	...
      
         The static_key_false() branch will be generated into the code with as
         little impact to the likely code path as possible.  The
         static_key_slow_*() APIs flip the branch via live kernel code patching.
      
         This facility can now be used more widely within the kernel to
         micro-optimize hot branches whose likelihood matches the static-key
         usage and fast/slow cost patterns.
      
       - SW function tracer improvements: perf support and filtering support.
      
       - Various hardenings of the perf.data ABI, to make older perf.data's
         smoother on newer tool versions, to make new features integrate more
         smoothly, to support cross-endian recording/analyzing workflows
         better, etc.
      
       - Restructuring of the kprobes code, the splitting out of 'optprobes',
         and a corner case bugfix.
      
       - Allow the tracing of kernel console output (printk).
      
       - Improvements/fixes to user-space RDPMC support, allowing user-space
         self-profiling code to extract PMU counts without performing any
         system calls, while playing nice with the kernel side.
      
       - 'perf bench' improvements
      
       - ... and lots of internal restructurings, cleanups and fixes that made
         these features possible.  And, as usual this list is incomplete as
         there were also lots of other improvements
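The static-key usage pattern from the list above can be mimicked in userspace to show the calling convention. This is a model under stated assumptions, not the kernel facility: the real implementation flips the branch by patching kernel code, whereas this sketch keeps a plain reference count and tests it with an ordinary branch. All `model_*` names are hypothetical, and `__builtin_expect` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: a counter stands in for live code patching. */
struct model_static_key {
    int enabled;   /* number of outstanding "inc" calls */
};

#define MODEL_STATIC_KEY_INIT_FALSE { 0 }

/* True only while the key is enabled; the hint marks that case unlikely. */
static bool model_static_key_false(const struct model_static_key *key)
{
    assert(key != NULL);
    return __builtin_expect(key->enabled > 0, 0);
}

/* Enables the unlikely branch (the expensive "slow" step in the kernel). */
static void model_static_key_slow_inc(struct model_static_key *key)
{
    assert(key != NULL);
    key->enabled++;
}

/* Drops one enable request; the branch turns off when the count hits 0. */
static void model_static_key_slow_dec(struct model_static_key *key)
{
    assert(key != NULL && key->enabled > 0);
    key->enabled--;
}
```

A caller would guard the rare path with `if (model_static_key_false(&key))` and pair every `model_static_key_slow_inc(&key)` with a later `model_static_key_slow_dec(&key)`.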
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
        perf report: Fix annotate double quit issue in branch view mode
        perf report: Remove duplicate annotate choice in branch view mode
        perf/x86: Prettify pmu config literals
        perf report: Enable TUI in branch view mode
        perf report: Auto-detect branch stack sampling mode
        perf record: Add HEADER_BRANCH_STACK tag
        perf record: Provide default branch stack sampling mode option
        perf tools: Make perf able to read files from older ABIs
        perf tools: Fix ABI compatibility bug in print_event_desc()
        perf tools: Enable reading of perf.data files from different ABI rev
        perf: Add ABI reference sizes
        perf report: Add support for taken branch sampling
        perf record: Add support for sampling taken branch
        perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
        x86/kprobes: Split out optprobe related code to kprobes-opt.c
        x86/kprobes: Fix a bug which can modify kernel code permanently
        x86/kprobes: Fix instruction recovery on optimized path
        perf: Add callback to flush branch_stack on context switch
        perf: Disable PERF_SAMPLE_BRANCH_* when not supported
        perf/x86: Add LBR software filter support for Intel CPUs
        ...
    • Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0bbfcaff
      Linus Torvalds authored
      Pull irq/core changes for v3.4 from Ingo Molnar
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Remove paranoid warnons and bogus fixups
        genirq: Flush the irq thread on synchronization
        genirq: Get rid of unnecessary IRQTF_DIED flag
        genirq: No need to check IRQTF_DIED before stopping a thread handler
        genirq: Get rid of unnecessary irqaction field in task_struct
        genirq: Fix incorrect check for forced IRQ thread handler
        softirq: Reduce invoke_softirq() code duplication
        genirq: Fix long-term regression in genirq irq_set_irq_type() handling
        x86-32/irq: Don't switch to irq stack for a user-mode irq
    • Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5928a2b6
      Linus Torvalds authored
      Pull RCU changes for v3.4 from Ingo Molnar.  The major features of this
      series are:
      
       - making RCU more aggressive about entering dyntick-idle mode in order
         to improve energy efficiency
      
       - converting a few more call_rcu()s to kfree_rcu()s
      
       - applying a number of rcutree fixes and cleanups to rcutiny
      
       - removing CONFIG_SMP #ifdefs from treercu
      
       - allowing RCU CPU stall times to be set via sysfs
      
       - adding CPU-stall capability to rcutorture
      
       - adding more RCU-abuse diagnostics
      
       - updating documentation
      
       - fixing yet more issues located by the still-ongoing top-to-bottom
         inspection of RCU, this time with a special focus on the CPU-hotplug
         code path.
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
        rcu: Stop spurious warnings from synchronize_sched_expedited
        rcu: Hold off RCU_FAST_NO_HZ after timer posted
        rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop
        rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
        rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
        rcu: Remove redundant check for rcu_head misalignment
        PTR_ERR should be called before its argument is cleared.
        rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep
        rcu: Trace only after NULL-pointer check
        rcu: Call out dangers of expedited RCU primitives
        rcu: Rework detection of use of RCU by offline CPUs
        lockdep: Add CPU-idle/offline warning to lockdep-RCU splat
        rcu: No interrupt disabling for rcu_prepare_for_idle()
        rcu: Move synchronize_sched_expedited() to rcutree.c
        rcu: Check for illegal use of RCU from offlined CPUs
        rcu: Update stall-warning documentation
        rcu: Add CPU-stall capability to rcutorture
        rcu: Make documentation give more realistic rcutorture duration
        rcutorture: Permit holding off CPU-hotplug operations during boot
        rcu: Print scheduling-clock information on RCU CPU stall-warning messages
        ...
    • Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5ed59af8
      Linus Torvalds authored
      Pull core/locking changes for v3.4 from Ingo Molnar
      
      * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Simplify return logic
        futex: Cover all PI opcodes with cmpxchg enabled check
    • Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b7f077d7
      Linus Torvalds authored
      Pull core/iommu changes for v3.4 from Ingo Molnar
      
      * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/iommu/intel: Increase the number of iommus supported to MAX_IO_APICS
        x86/iommu/intel: Fix identity mapping for sandy bridge
  2. 19 Mar, 2012 2 commits
    • Merge branch 'dcache-word-accesses' · b0e37d7a
      Linus Torvalds authored
      * branch 'dcache-word-accesses':
        vfs: use 'unsigned long' accesses for dcache name comparison and hashing
      
      This does the name hashing and lookup using word-sized accesses when
      that is efficient, namely on x86 (although any little-endian machine
      with good unaligned accesses would do).
      
      It does very much depend on little-endian logic, but it's a very hot
      couple of functions under some real loads, and this patch improves the
      performance of __d_lookup_rcu() and link_path_walk() by up to about 30%,
      giving a 10% improvement on some very pathname-heavy benchmarks.
      
      Because we do make unaligned accesses past the filename, the
      optimization is disabled when CONFIG_DEBUG_PAGEALLOC is active, and we
      effectively depend on the fact that on x86 we don't really ever have the
      last page of usable RAM followed immediately by any IO memory (due to
      ACPI tables, BIOS buffer areas etc).
      
      Some of the bit operations we do are a bit "subtle".  It's commented,
      but you do need to really think about the code.  Or just consider it
      black magic.
      
      Thanks to people on G+ for some of the optimized bit tricks.
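The general idea of scanning a name one word at a time can be sketched in portable C. This is not the kernel's code: it uses the well-known "has a zero byte" bit trick to find the word that contains the terminator, then finishes byte by byte so that it does not depend on endianness. Unlike the commit, which relies on reads past the filename being harmless, the sketch assumes the caller has padded the buffer to a multiple of 8 bytes. `has_zero_byte()` and `name_len_wordwise()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero if and only if at least one byte of v is zero. */
static uint64_t has_zero_byte(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/* Length of a NUL-terminated name, scanning 8 bytes per step.
 * Assumed precondition: the buffer is readable up to the next multiple
 * of 8 bytes past the terminator (the caller padded it). */
static size_t name_len_wordwise(const char *name)
{
    assert(name != NULL);
    size_t len = 0;

    for (;;) {
        uint64_t word;
        memcpy(&word, name + len, sizeof word);  /* unaligned-safe load */
        if (has_zero_byte(word))
            break;                               /* terminator is in this word */
        len += sizeof word;
    }
    while (name[len] != '\0')                    /* locate it byte by byte */
        len++;
    return len;
}
```

For a padded buffer such as `char buf[32] = "hello_world_filename";` the result agrees with `strlen(buf)`.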
    • vfs: get rid of batshit-insane pointless dentry hash calculations · 6d7d1a0d
      Linus Torvalds authored
      For some odd historical reason, the final mixing round for the dentry
      cache hash table lookup had an insane "xor with big constant" logic.  In
      two places.
      
      The big constant that is being xor'ed is GOLDEN_RATIO_PRIME, which is a
      fairly random-looking number that is designed to be *multiplied* with so
      that the bits get spread out over a whole long-word.
      
      But xor'ing with it is insane.  It doesn't really even change the hash -
      it really only shifts the hash around in the hash table.  To make
      matters worse, the insane big constant is different on 32-bit and 64-bit
      builds, even though the name hash bits we use are always 32-bit (and the
      bits from the pointer we mix in effectively are too).
      
      It's all total voodoo programming, in other words.
      
      Now, some testing and analysis of the hash chains shows that the rest of
      the hash function seems to be fairly good.  It does pick the right bits
      of the parent dentry pointer, for example, and while it's generally a
      bad idea to use an xor to mix down the upper bits (because if there is a
      repeating pattern, the xor can cause "destructive interference"), it
      seems to not have been a disaster.
      
      For example, replacing the hash with the normal "hash_long()" code (that
      uses the GOLDEN_RATIO_PRIME constant correctly, btw) actually just makes
      the hash worse.  The hand-picked hash knew which bits of the pointer had
      the highest entropy, and hash_long() ends up mixing bits less optimally
      at least in some trivial tests.
      
      So the hash function overall seems fine, it just has that really odd
      "shift result around by a constant xor".
      
      So get rid of the silly xor, and replace the down-mixing of the bits
      with an add instead of an xor that tends to not have the same kind of
      destructive interference issues.  Some stats on the resulting hash
      chains shows that they look statistically identical before and after,
      but the code is simpler and no longer makes you go "WTF?".
      
      Also, the incoming hash really is just "unsigned int", not a long, and
      there's no real point to worry about the high 26 bits of the dentry
      pointer for the 64-bit case, because they are all going to be identical
      anyway.
      
      So also change the hashing to be done in the more natural 'unsigned int'
      that is the real size of the actual hashed data anyway.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
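The claim in the commit above that xor'ing with a constant "only shifts the hash around in the hash table" can be checked with a simplified model. The sketch below is not the dcache hash: it applies the xor directly before masking, uses a hypothetical 256-bucket table, and takes `0x9e370001` as an assumed stand-in for the 32-bit constant (any constant behaves the same way). It shows that the chain lengths are the same set of numbers, merely stored under different bucket indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_HASHBITS 8u
#define MODEL_BUCKETS  (1u << MODEL_HASHBITS)
#define MODEL_MASK     (MODEL_BUCKETS - 1u)
#define MODEL_CONSTANT 0x9e370001u   /* assumed value; any constant works */

/* Bucket index with the extra xor-by-constant step. */
static unsigned bucket_with_xor(uint32_t hash)
{
    return (hash ^ MODEL_CONSTANT) & MODEL_MASK;
}

/* Bucket index without it. */
static unsigned bucket_plain(uint32_t hash)
{
    return hash & MODEL_MASK;
}

/* Fills counts[] with the chain length of every bucket for the hashes
 * 0, step, 2*step, ... (n values), using the given bucket function. */
static void chain_lengths(unsigned (*bucket)(uint32_t), uint32_t step,
                          unsigned n, unsigned counts[MODEL_BUCKETS])
{
    assert(bucket != NULL && counts != NULL);
    memset(counts, 0, MODEL_BUCKETS * sizeof counts[0]);
    for (unsigned i = 0; i < n; i++)
        counts[bucket(i * step)]++;
}
```

Because `bucket_with_xor(h)` always equals `bucket_plain(h) ^ (MODEL_CONSTANT & MODEL_MASK)`, the chain that lands in bucket `b` without the xor lands in bucket `b ^ (MODEL_CONSTANT & MODEL_MASK)` with it, so the distribution of chain lengths is unchanged.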
  3. 18 Mar, 2012 3 commits
    • Linux 3.3 · c16fa4f2
      Linus Torvalds authored
    • Don't limit non-nested epoll paths · 93dc6107
      Jason Baron authored
      Commit 28d82dc1 ("epoll: limit paths") that I did to limit the
      number of possible wakeup paths in epoll is causing a few applications
      to no longer work (dovecot for one).
      
      The original patch is really about limiting the amount of epoll nesting
      (since epoll fds can be attached to other fds). Thus, we probably can
      allow an unlimited number of paths of depth 1. My current patch limits
      it at 1000. The limits are still enforced on paths of greater depth.
      
      This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c579bc7e
      Linus Torvalds authored
      Pull networking changes from David Miller:
       "1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to
           crashes, particularly during shutdown.  Reported by Dave Jones and
           fixed by Eric Dumazet.
      
        2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have
           already freed the SKB, which causes crashes as to the caller this
           means requeue the packet.  Fixes from Eric Dumazet.
      
        3) usbnet driver doesn't allocate the right amount of headroom on
           fresh RX SKBs, fix from Eric Dumazet.
      
        4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it
           absolutely should not take a reference to 'dev', this leads to
           leaks.  Fix from RonQing Li.
      
        5) Fix netfilter ctnetlink race between delete and timeout expiration.
           From Pablo Neira Ayuso.
      
        6) Revert SFQ change which causes regressions, specifically queueing
           to tail can lead to unavoidable flow starvation.  From Eric
           Dumazet.
      
        7) Fix a memory leak and a crash on corrupt firmware files in bnx2x,
           from Michal Schmidt."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        netfilter: ctnetlink: fix race between delete and timeout expiration
        ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
        wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
        net/hyperv: fix erroneous NETDEV_TX_BUSY use
        net/usbnet: reserve headroom on rx skbs
        bnx2x: fix memory leak in bnx2x_init_firmware()
        bnx2x: fix a crash on corrupt firmware file
        sch_sfq: revert dont put new flow at the end of flows
        ipv6: fix icmp6_dst_alloc()
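Item 1 of the networking pull above (a function returning NULL where callers expect an ERR_PTR() value) describes a mismatch that is easy to model in userspace. The sketch below re-creates simplified versions of the error-pointer helpers. `model_alloc()` and `caller_would_use()` are hypothetical, and the code only illustrates why a caller that checks for an error pointer does not catch NULL; it is not the IPv6 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace models of the ERR_PTR()/IS_ERR()/PTR_ERR() helpers. */
static void *model_err_ptr(long error)
{
    assert(error < 0 && error >= -MODEL_MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True for pointers in the top MODEL_MAX_ERRNO addresses; NULL is not one. */
static bool model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical allocator that can report failure in either convention. */
static void *model_alloc(bool fail, bool return_null_on_failure)
{
    static int object;

    if (!fail)
        return &object;
    return return_null_on_failure ? NULL : model_err_ptr(-ENOMEM);
}

/* A caller that only checks the error-pointer convention: true means it
 * would go on to dereference the pointer. */
static bool caller_would_use(const void *ptr)
{
    return !model_is_err(ptr);
}
```

A failed `model_alloc(true, false)` is rejected by `caller_would_use()`, whereas a failed `model_alloc(true, true)` returns NULL, passes the check, and would be dereferenced.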
  4. 17 Mar, 2012 10 commits
  5. 16 Mar, 2012 10 commits