    x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled · 06c5fe9b
    Xiaochen Shen authored
    The MBA software controller (mba_sc) is a feedback loop which
    periodically reads MBM counters and tries to restrict the bandwidth
    below a user-specified value. It piggybacks on the MBM counter overflow
    handler to do its updates at a 1s interval in mbm_update() and
    update_mba_bw().
    
    The purpose of mbm_update() is to periodically read the MBM counters to
    make sure that the hardware counter doesn't wrap around more than once
    between user samplings. mbm_update() calls __mon_event_count() for local
    bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
    instead when mba_sc is enabled. So with mba_sc enabled,
    __mon_event_count() is not called for local bandwidth updating in the
    MBM counter overflow handler, but it is still called when reading the
    MBM local bandwidth counter file 'mbm_local_bytes'. The call path is:
    
      rdtgroup_mondata_show()
        mon_event_read()
          mon_event_count()
            __mon_event_count()
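
The "no more than one wraparound" requirement can be illustrated with a
minimal sketch of a width-limited counter delta. The function name
counter_delta() and the 24-bit width used in the example are assumptions
made for illustration, not taken from this commit or from monitor.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta between two raw readings of a counter that is only `width` bits
 * wide (hypothetical helper). Shifting both readings to the top of a
 * 64-bit word makes the subtraction wrap at the counter width, so a
 * single wraparound between readings still yields the right delta.
 * Two or more wraparounds are indistinguishable from fewer, which is
 * why the counter has to be read often enough.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift;
	uint64_t delta;

	assert(width >= 1 && width <= 64);
	shift = 64 - width;
	delta = (cur << shift) - (prev << shift);
	return delta >> shift;
}
```

For example, with an assumed 24-bit counter, counter_delta(0xFFFFF0, 0x10, 24)
gives 0x20 even though the raw value went backwards.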
    
    In __mon_event_count(), m->chunks is updated by the delta chunks
    calculated from the previous MSR value (m->prev_msr) and the current
    MSR value. When mba_sc is enabled, mbm_bw_count(), called from
    mbm_update(), also updates m->chunks by mistake, using delta chunks
    calculated from m->prev_bw_msr instead of m->prev_msr. m->chunks is
    not even used by update_mba_bw() in the mba_sc feedback loop.
    
    So by the time the MBM local bandwidth counter file is read, m->chunks
    has already been changed unexpectedly by mbm_bw_count(). As a result,
    an incorrect local bandwidth counter, calculated from the incorrect
    m->chunks, is shown to the user.
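
The double accounting can be reproduced with a small user-space model.
This is a sketch under simplifying assumptions: struct mbm_model and the
model_*() functions are hypothetical stand-ins, counter wraparound is
ignored, and only the three fields named above are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the per-RMID MBM state; not the kernel struct. */
struct mbm_model {
	uint64_t chunks;      /* accumulated count reported to the user */
	uint64_t prev_msr;    /* baseline used by the read path */
	uint64_t prev_bw_msr; /* baseline used by the mba_sc path */
};

/* Read path: models what __mon_event_count() does to m->chunks. */
static uint64_t model_read(struct mbm_model *m, uint64_t cur_msr)
{
	assert(m);
	m->chunks += cur_msr - m->prev_msr;
	m->prev_msr = cur_msr;
	return m->chunks;
}

/* Overflow-handler path before the fix: models the buggy mbm_bw_count(). */
static void model_bw_count_buggy(struct mbm_model *m, uint64_t cur_msr)
{
	assert(m);
	m->chunks += cur_msr - m->prev_bw_msr; /* the update the fix removes */
	m->prev_bw_msr = cur_msr;
}
```

If the modelled counter advances by 100 between two reads and the
overflow handler runs once in between, both paths add the same 100 to
m->chunks, so the user sees a delta of 200. That is consistent with the
roughly doubled bandwidth in the "Before fix" output.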
    
    Fix this by removing the incorrect m->chunks update from mbm_bw_count()
    in the MBM counter overflow handler, and by always calling
    __mon_event_count() in mbm_update() to make sure that the hardware
    local bandwidth counter doesn't wrap around.
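
A minimal user-space sketch of the fixed flow follows. It is a model,
not the patch itself: struct mbm_model, the model_*() names and the
mba_sc flag parameter are assumptions, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the per-RMID MBM state; not the kernel struct. */
struct mbm_model {
	uint64_t chunks;      /* accumulated count reported to the user */
	uint64_t prev_msr;    /* baseline owned by the event-count path */
	uint64_t prev_bw_msr; /* baseline owned by the mba_sc path */
};

/* Models __mon_event_count(): the only writer of m->chunks. */
static uint64_t model_event_count(struct mbm_model *m, uint64_t cur_msr)
{
	assert(m);
	m->chunks += cur_msr - m->prev_msr;
	m->prev_msr = cur_msr;
	return m->chunks;
}

/* Models the fixed mbm_bw_count(): tracks its own baseline only. */
static uint64_t model_bw_count(struct mbm_model *m, uint64_t cur_msr)
{
	uint64_t delta;

	assert(m);
	delta = cur_msr - m->prev_bw_msr;
	m->prev_bw_msr = cur_msr;
	return delta; /* input for the feedback loop, never added to chunks */
}

/* Models the fixed mbm_update() for the local bandwidth event. */
static void model_update(struct mbm_model *m, uint64_t cur_msr, int mba_sc)
{
	(void)model_event_count(m, cur_msr); /* always, so wraps are caught */
	if (mba_sc)
		(void)model_bw_count(m, cur_msr);
}
```

With a single writer for m->chunks, a read that follows an overflow
handler run at the same counter value adds nothing, so the reported
delta matches the counter delta.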
    
    Test steps:
      # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
      git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat \
        && make
      ./tools/membw/membw -c 0 -b 10000 --read
    
      # Enable MBA software controller
      mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
    
      # Create control group c1
      mkdir /sys/fs/resctrl/c1
    
      # Set MB throttle to 6 GB/s
      echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata
    
      # Write PID of the workload to tasks file
      echo `pidof membw` > /sys/fs/resctrl/c1/tasks
    
      # Read the local bytes counter twice, 1s apart. Before the fix, the
      # calculated local bandwidth is not the expected value (close to 6 GB/s):
      local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
      sleep 1
      local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
      echo "local b/w (bytes/s):" `expr $local_2 - $local_1`
    
    Before fix:
      local b/w (bytes/s): 11076796416
    
    After fix:
      local b/w (bytes/s): 5465014272
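
To compare the two readings with the 6000 written to the schemata file,
the byte rates can be converted to megabytes per second. The sketch
below assumes 1 MB = 2^20 bytes, which may not be the unit the schemata
value uses; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed unit: 1 MB = 2^20 bytes. Integer division truncates. */
#define MODEL_BYTES_PER_MB (UINT64_C(1) << 20)

/* Convert a rate in bytes/s to whole MB/s (hypothetical helper). */
static uint64_t bytes_to_mb_per_sec(uint64_t bytes_per_sec)
{
	return bytes_per_sec / MODEL_BYTES_PER_MB;
}
```

Under that assumption, bytes_to_mb_per_sec(11076796416) is 10563, well
above the limit, and bytes_to_mb_per_sec(5465014272) is 5211, below it.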
    
    Fixes: ba0f26d8 (x86/intel_rdt/mba_sc: Prepare for feedback loop)
    Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Tony Luck <tony.luck@intel.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
monitor.c 16.2 KB