    drm/msm/dpu: Move min BW request and full BW disable back to mdss
    In commit a670ff57 ("drm/msm/dpu: always use mdp device to scale
    bandwidth") we fully moved interconnect stuff to the DPU driver. This
    had no change for sc7180 but _did_ have an impact for other SoCs. It
    made them match the sc7180 scheme.
    
    Unfortunately, the sc7180 scheme seems like it was a bit broken.
    Specifically the interconnect needs to be on for more than just the
    DPU driver's AXI bus. At the very least it also needs to be on for the
    DSI driver's AXI bus. This can be seen fairly easily by doing this on
    a ChromeOS sc7180-trogdor class device:
    
      set_power_policy --ac_screen_dim_delay=5 --ac_screen_off_delay=10
      sleep 10
      cd /sys/bus/platform/devices/ae94000.dsi/power
      echo on > control
    
    When you do that, you'll get a warning splat in the logs about
    "gcc_disp_hf_axi_clk status stuck at 'off'".
    
    One could argue that perhaps what I have done above is "illegal" and
    that it can't happen naturally in the system because in normal system
    usage the DPU is pretty much always on when DSI is on. That being
    said:
    * In official ChromeOS builds (admittedly a 5.4 kernel with backports)
      we have seen that splat at bootup.
    * Even though we don't use "autosuspend" for these components, we
      don't use the "put_sync" variants. Thus plausibly the DSI could stay
      "runtime enabled" past when the DPU is enabled. Techncially we
      shouldn't do that if the DPU's suspend ends up yanking our clock.
    
    Let's change things such that the "bare minimum" request for the
    interconnect happens in the mdss driver again. That means that all of
    the children can assume that the interconnect is on at the minimum
    bandwidth. We'll then let the DPU request the higher amount that it
    wants.
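    
    As a rough sketch of what that looks like (the helper names and the
    exact minimum value here are made up for illustration; of_icc_get(),
    icc_set_bw() and Bps_to_icc() are the standard in-kernel interconnect
    calls):
    
      #include <linux/err.h>
      #include <linux/interconnect.h>
    
      /* Illustrative "bare minimum" peak vote, in bytes/sec. */
      #define MDSS_MIN_IB_BW 400000000UL
    
      static struct icc_path *mdss_bus_path; /* the real driver keeps this in its state struct */
    
      static int mdss_icc_init(struct device *dev)
      {
              /* "mdp0-mem" is the usual DT name for the MDP -> memory path */
              mdss_bus_path = of_icc_get(dev, "mdp0-mem");
              return PTR_ERR_OR_ZERO(mdss_bus_path);
      }
    
      static void mdss_icc_enable(void)
      {
              /* avg = 0, peak = minimum: keep the bus alive for all children (DPU, DSI, ...) */
              icc_set_bw(mdss_bus_path, 0, Bps_to_icc(MDSS_MIN_IB_BW));
      }
    
      static void mdss_icc_disable(void)
      {
              /* drop the vote entirely when mdss itself runtime-suspends */
              icc_set_bw(mdss_bus_path, 0, 0);
      }
    
    The idea is that mdss runtime-resume would call mdss_icc_enable()
    before powering up its children and runtime-suspend would call
    mdss_icc_disable() after they're down; DPU then stacks its own,
    larger vote on top from its own runtime PM hooks.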
    
    It should be noted that this isn't as hacky of a solution as it might
    initially appear. Specifically:
    * Since MDSS and DPU individually get their own references to the
      interconnect, the framework will actually handle aggregating
      them. The two drivers are _not_ clobbering each other.
    * When the Qualcomm interconnect driver aggregates it takes the max of
      all the peaks. Thus having MDSS request a peak, as we're doing here,
      won't actually change the total interconnect bandwidth (it won't be
      added to the request for the DPU). This perhaps explains why the
      "average" requested in MDSS was historically 0 since that one
      _would_ be added in.
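    
    To make that aggregation behaviour concrete, here's a simplified
    model of what the Qualcomm provider's per-node aggregation does
    (roughly the shape of qcom_icc_aggregate(), not a copy of it):
    
      #include <linux/types.h>
    
      /*
       * Averages from every requester are summed; peaks are max'ed.
       * So MDSS's peak-only vote folds into DPU's larger peak, while a
       * non-zero average from MDSS would have been added on top.
       */
      static void aggregate_node_bw(u32 avg_bw, u32 peak_bw,
                                    u32 *agg_avg, u32 *agg_peak)
      {
              *agg_avg += avg_bw;
              if (peak_bw > *agg_peak)
                      *agg_peak = peak_bw;
      }
    
    e.g. MDSS voting (avg = 0, peak = MIN) alongside DPU's (avg = A,
    peak = P) aggregates to (A, max(MIN, P)): the minimum peak vote
    disappears into DPU's larger one, which matches the historical
    choice of a 0 "average" in MDSS.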
    
    NOTE also that in the downstream ChromeOS 5.4 and 5.15 kernels, we're
    also seeing some RPMH hangs that are addressed by this fix. These
    hangs are showing up in the field and on _some_ devices with enough
    stress testing of suspend/resume. Specifically right at suspend time
    with a stack crawl that looks like this (from chromeos-5.15 tree):
      rpmh_write_batch+0x19c/0x240
      qcom_icc_bcm_voter_commit+0x210/0x420
      qcom_icc_set+0x28/0x38
      apply_constraints+0x70/0xa4
      icc_set_bw+0x150/0x24c
      dpu_runtime_resume+0x50/0x1c4
      pm_generic_runtime_resume+0x30/0x44
      __genpd_runtime_resume+0x68/0x7c
      genpd_runtime_resume+0x12c/0x20c
      __rpm_callback+0x98/0x138
      rpm_callback+0x30/0x88
      rpm_resume+0x370/0x4a0
      __pm_runtime_resume+0x80/0xb0
      dpu_kms_enable_commit+0x24/0x30
      msm_atomic_commit_tail+0x12c/0x630
      commit_tail+0xac/0x150
      drm_atomic_helper_commit+0x114/0x11c
      drm_atomic_commit+0x68/0x78
      drm_atomic_helper_disable_all+0x158/0x1c8
      drm_atomic_helper_suspend+0xc0/0x1c0
      drm_mode_config_helper_suspend+0x2c/0x60
      msm_pm_prepare+0x2c/0x40
      pm_generic_prepare+0x30/0x44
      genpd_prepare+0x80/0xd0
      device_prepare+0x78/0x17c
      dpm_prepare+0xb0/0x384
      dpm_suspend_start+0x34/0xc0
    
    We don't completely understand all the mechanisms in play, but the
    hang seemed to come and go with random factors. It's not terribly
    surprising that the hang is gone after this patch since the line of
    code that was failing is no longer present in the kernel.
    
    Fixes: a670ff57 ("drm/msm/dpu: always use mdp device to scale bandwidth")
    Fixes: c33b7c03 ("drm/msm/dpu: add support for clk and bw scaling for display")
    Signed-off-by: Douglas Anderson <dianders@chromium.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Tested-by: Jessica Zhang <quic_jesszhan@quicinc.com> # RB3 (sdm845) and
    Reviewed-by: Stephen Boyd <swboyd@chromium.org>
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Patchwork: https://patchwork.freedesktop.org/patch/487884/
    Link: https://lore.kernel.org/r/20220531160059.v2.1.Ie7f6d4bf8cce28131da31a43354727e417cae98d@changeid
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>