• drm/xe: Introduce a simple wedged state · fb74b205
    Rodrigo Vivi authored
    Introduce a very simple 'wedged' state where any attempt
    to access the GPU is entirely blocked.
    
    In some critical cases, such as a gt_reset failure, we need to
    block any further attempt to use the GPU. Otherwise we risk
    reaching states that would force us to reboot the machine.
    
    So, when these cases are identified, we corner and block any GPU
    access. No IOCTL and not even another GT reset should be attempted.
    
    The 'wedged' state in Xe is an end state with no way back.
    Only a device "re-probe" (unbind + bind) can restore the GPU access.
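
The behaviour described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the driver code: `struct model_device` and every `model_*` function are hypothetical names invented for illustration, and the choice of `-ECANCELED` and `-EIO` as return values is an assumption that this commit message does not confirm. It shows the three properties the message names: the flag is set once and never cleared, every entry point checks it first, and the notice is printed only once.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's per-device structure. */
struct model_device {
	atomic_bool wedged;   /* end state: set once, never cleared */
	atomic_bool reported; /* guards the one-time notice */
	int resets_attempted; /* counts resets that reached the "hardware" */
};

static bool model_device_wedged(struct model_device *dev)
{
	return atomic_load(&dev->wedged);
}

/* Enter the wedged state and print the recovery hint only once. */
static void model_declare_wedged(struct model_device *dev)
{
	atomic_store(&dev->wedged, true);
	if (!atomic_exchange(&dev->reported, true))
		fprintf(stderr, "device wedged: unbind + bind to recover\n");
}

/* Every user-facing entry point is refused while wedged. */
static int model_ioctl(struct model_device *dev)
{
	if (model_device_wedged(dev))
		return -ECANCELED; /* assumed error code */
	return 0;
}

/* A failed reset wedges the device; later resets are not attempted. */
static int model_gt_reset(struct model_device *dev, bool hw_reset_ok)
{
	if (model_device_wedged(dev))
		return -ECANCELED;
	dev->resets_attempted++;
	if (!hw_reset_ok) {
		model_declare_wedged(dev);
		return -EIO; /* assumed error code */
	}
	return 0;
}
```

In this model, a zero-initialised `struct model_device` accepts calls until `model_gt_reset()` is invoked with `hw_reset_ok` set to false. After that, both `model_ioctl()` and `model_gt_reset()` return early without touching the device, and nothing in the model clears the flag, mirroring the "no way back" rule short of a re-probe.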
    
    v2: - s/wedged/busted (Lucas)
        - use unbind+bind instead of module reload (Lucas)
        - add more info on unbind operations and instructions for bug reports
        - only print the message once
    
    v3: - s/busted/wedged (Ashutosh, Tvrtko, Thomas)
        - don't assume user has sudo and tee available (Lucas)
    
    v4: - remove unnecessary cases around ct communication or migration.
    
    Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
    Cc: Tvrtko Ursulin <tursulin@ursulin.net>
    Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Cc: Lucas De Marchi <lucas.demarchi@intel.com>
    Cc: Anshuman Gupta <anshuman.gupta@intel.com>
    Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
    Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> #v2
    Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-1-rodrigo.vivi@intel.com
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
xe_device.c 18 KB