    IB/hfi1: Fix mm_struct use after free · e0cf75de
    Ira Weiny authored
    Testing with CONFIG_SLUB_DEBUG_ON=y resulted in the kernel panic below.
    
    This is the result of the mm_struct sometimes being freed before
    hfi1_file_close is called.
    
    This was due to a combination of two factors:
    
    1) hfi1_file_close is deferred in process exit and it therefore may not
       be called synchronously with process exit.
    2) exit_mm is called prior to exit_files in do_exit.  Normally this is
       ok; however, our kernel bypass code requires access to the mm_struct
       for housekeeping both at "normal" close time and at process exit.
    
    Therefore, the fix is to simply keep a reference to the mm_struct until
    we are done with it.
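
    The ordering problem and the fix can be illustrated with a small
    user-space model. This is only a sketch under stated assumptions, not
    the driver code: every name below (model_mm, model_filedata,
    model_process_exit, and so on) is hypothetical, and the kernel's
    separate mm_users/mm_count counters are collapsed into a single count.
    In the kernel, the release side of such a reference is typically
    mmdrop().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mm_struct; only the lifetime matters here. */
struct model_mm {
    int count;   /* references keeping the structure alive */
    bool freed;  /* set instead of calling free() so the model can be inspected */
};

/* Hypothetical stand-in for the driver's per-open-file data. */
struct model_filedata {
    struct model_mm *mm;
    bool holds_ref;
};

void model_mm_get(struct model_mm *mm)
{
    mm->count++;
}

void model_mm_put(struct model_mm *mm)
{
    assert(mm->count > 0);
    if (--mm->count == 0)
        mm->freed = true;  /* last reference gone: the mm is "freed" */
}

/* Open: remember the mm; with the fix applied, also pin it. */
void model_file_open(struct model_filedata *fd, struct model_mm *mm,
                     bool take_ref)
{
    fd->mm = mm;
    fd->holds_ref = take_ref;
    if (take_ref)
        model_mm_get(mm);
}

/*
 * Close: housekeeping needs the mm. Returns true if the mm was still
 * alive at that point, false if close would have touched freed memory.
 */
bool model_file_close(struct model_filedata *fd)
{
    bool mm_was_alive = !fd->mm->freed;

    if (fd->holds_ref)
        model_mm_put(fd->mm);  /* done with it: drop our reference */
    fd->mm = NULL;
    return mm_was_alive;
}

/*
 * Process exit in the problematic order: the task drops its mm first
 * (exit_mm analogue) and the deferred file close runs afterwards.
 */
bool model_process_exit(bool take_ref)
{
    struct model_mm mm = { .count = 1, .freed = false }; /* task's own ref */
    struct model_filedata fd;

    model_file_open(&fd, &mm, take_ref);
    model_mm_put(&mm);             /* task gives up its mm */
    return model_file_close(&fd);  /* deferred close runs last */
}
```

    In this model, model_process_exit(false) reports that close ran
    against an already-freed mm, while model_process_exit(true) reports
    that the mm stayed alive until close dropped the extra reference.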
    
    [ 3006.340150] general protection fault: 0000 [#1] SMP
    [ 3006.346469] Modules linked in: hfi1 rdmavt rpcrdma ib_isert iscsi_target_mod
     ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod
     ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
     ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_realtek
     iTCO_wdt snd_hda_codec_generic iTCO_vendor_support sb_edac edac_core
     x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass
     crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw snd_hda_intel
     gf128mul snd_hda_codec glue_helper snd_hda_core ablk_helper snd_hwdep
     cryptd snd_seq snd_seq_device snd_pcm snd_timer snd soundcore pcspkr
     shpchp mei_me sg lpc_ich mei i2c_i801 mfd_core ioatdma ipmi_devintf
     wmi ipmi_si ipmi_msghandler acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd
     grace sunrpc ip_tables ext4 jbd2 mbcache mlx4_en ib_core sr_mod sd_mod
     cdrom crc32c_intel mgag200 drm_kms_helper syscopyarea sysfillrect igb
     sysimgblt fb_sys_fops ptp mlx4_core ttm isci pps_core ahci drm libsas
     libahci dca firewire_ohci i2c_algo_bit scsi_transport_sas firewire_core
     crc_itu_t i2c_core libata [last unloaded: mlx4_ib]
     [ 3006.461759] CPU: 16 PID: 11624 Comm: mpi_stress Not tainted 4.7.0-rc5+ #1
     [ 3006.469915] Hardware name: Intel Corporation W2600CR ........../W2600CR, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
     [ 3006.483027] task: ffff8804102f0040 ti: ffff8804102f8000 task.ti: ffff8804102f8000
     [ 3006.491971] RIP: 0010:[<ffffffff810f0383>]  [<ffffffff810f0383>] __lock_acquire+0xb3/0x19e0
     [ 3006.501905] RSP: 0018:ffff8804102fb908  EFLAGS: 00010002
     [ 3006.508447] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000001 RCX: 0000000000000000
     [ 3006.517012] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880410b56a40
     [ 3006.525569] RBP: ffff8804102fb9b0 R08: 0000000000000001 R09: 0000000000000000
     [ 3006.534119] R10: ffff8804102f0040 R11: 0000000000000000 R12: 0000000000000000
     [ 3006.542664] R13: ffff880410b56a40 R14: 0000000000000000 R15: 0000000000000000
     [ 3006.551203] FS:  00007ff478c08700(0000) GS:ffff88042e200000(0000) knlGS:0000000000000000
     [ 3006.560814] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     [ 3006.567806] CR2: 00007f667f5109e0 CR3: 0000000001c06000 CR4: 00000000000406e0
     [ 3006.576352] Stack:
     [ 3006.579157]  ffffffff8124b819 ffffffffffffffff 0000000000000000 ffff8804102fb940
     [ 3006.588072]  0000000000000002 0000000000000000 ffff8804102f0040 0000000000000007
     [ 3006.596971]  0000000000000006 ffff8803cad6f000 0000000000000000 ffff8804102f0040
     [ 3006.605878] Call Trace:
     [ 3006.609220]  [<ffffffff8124b819>] ? uncharge_batch+0x109/0x250
     [ 3006.616382]  [<ffffffff810f2313>] lock_acquire+0xd3/0x220
     [ 3006.623056]  [<ffffffffa0a30bfc>] ? hfi1_release_user_pages+0x7c/0xa0 [hfi1]
     [ 3006.631593]  [<ffffffff81775579>] down_write+0x49/0x80
     [ 3006.638022]  [<ffffffffa0a30bfc>] ? hfi1_release_user_pages+0x7c/0xa0 [hfi1]
     [ 3006.646569]  [<ffffffffa0a30bfc>] hfi1_release_user_pages+0x7c/0xa0 [hfi1]
     [ 3006.654898]  [<ffffffffa0a2efb6>] cacheless_tid_rb_remove+0x106/0x330 [hfi1]
     [ 3006.663417]  [<ffffffff810efd36>] ? mark_held_locks+0x66/0x90
     [ 3006.670498]  [<ffffffff817771f6>] ? _raw_spin_unlock_irqrestore+0x36/0x60
     [ 3006.678741]  [<ffffffffa0a2f1ee>] tid_rb_remove+0xe/0x10 [hfi1]
     [ 3006.686010]  [<ffffffffa0a0c5d5>] hfi1_mmu_rb_unregister+0xc5/0x100 [hfi1]
     [ 3006.694387]  [<ffffffffa0a2fcb9>] hfi1_user_exp_rcv_free+0x39/0x120 [hfi1]
     [ 3006.702732]  [<ffffffffa09fc6ea>] hfi1_file_close+0x17a/0x330 [hfi1]
     [ 3006.710489]  [<ffffffff81263e9a>] __fput+0xfa/0x230
     [ 3006.716595]  [<ffffffff8126400e>] ____fput+0xe/0x10
     [ 3006.722696]  [<ffffffff810b95c6>] task_work_run+0x86/0xc0
     [ 3006.729379]  [<ffffffff81099933>] do_exit+0x323/0xc40
     [ 3006.735672]  [<ffffffff8109a2dc>] do_group_exit+0x4c/0xc0
     [ 3006.742371]  [<ffffffff810a7f55>] get_signal+0x345/0x940
     [ 3006.748958]  [<ffffffff810340c7>] do_signal+0x37/0x700
     [ 3006.755328]  [<ffffffff8127872a>] ? poll_select_set_timeout+0x5a/0x90
     [ 3006.763146]  [<ffffffff811609cb>] ? __audit_syscall_exit+0x1db/0x260
     [ 3006.770853]  [<ffffffff8110f3e3>] ? rcu_read_lock_sched_held+0x93/0xa0
     [ 3006.778765]  [<ffffffff812347a4>] ? kfree+0x1e4/0x2a0
     [ 3006.784986]  [<ffffffff8108e75a>] ? exit_to_usermode_loop+0x33/0xac
     [ 3006.792551]  [<ffffffff8108e785>] exit_to_usermode_loop+0x5e/0xac
     [ 3006.799907]  [<ffffffff81003dca>] do_syscall_64+0x12a/0x190
     [ 3006.806664]  [<ffffffff81777a7f>] entry_SYSCALL64_slow_path+0x25/0x25
     [ 3006.814396] Code: 24 08 44 89 44 24 10 89 4c 24 18 e8 a8 d8 ff ff 48 85 c0
     8b 4c 24 18 44 8b 44 24 10 44 8b 4c 24 08 4c 8b 14 24 0f 84 30
     08 00 00 <f0> ff 80 98 01 00 00 8b 3d 48 ad be 01 45 8b a2 90 0b 00 00 85
     [ 3006.837158] RIP  [<ffffffff810f0383>] __lock_acquire+0xb3/0x19e0
     [ 3006.844401]  RSP <ffff8804102fb908>
     [ 3006.851170] ---[ end trace b7b9f21cf06c27df ]---
     [ 3006.927420] Kernel panic - not syncing: Fatal exception
     [ 3006.933954] Kernel Offset: disabled
     [ 3006.940961] ---[ end Kernel panic - not syncing: Fatal exception
     [ 3006.948249] ------------[ cut here ]------------
    
    Fixes: 3faa3d9a ("IB/hfi1: Make use of mm consistent")
    Reviewed-by: Dean Luick <dean.luick@intel.com>
    Signed-off-by: Ira Weiny <ira.weiny@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>