• Rafael J. Wysocki's avatar
    ACPI: EC: Rework flushing of pending work · 016b87ca
    Rafael J. Wysocki authored
    There is a race condition in the ACPI EC driver, between
    __acpi_ec_flush_event() and acpi_ec_event_handler(), that may
    cause systems to stay in suspended-to-idle forever after a wakeup
    event coming from the EC.
    
    Namely, acpi_s2idle_wake() calls acpi_ec_flush_work() to wait until
    the delayed work resulting from the handling of the EC GPE in
    acpi_ec_dispatch_gpe() is processed, and that function invokes
    __acpi_ec_flush_event() which uses wait_event() to wait for
    ec->nr_pending_queries to become zero on ec->wait, and that wait
    queue may be woken up too early.
    
    Suppose that acpi_ec_dispatch_gpe() has caused acpi_ec_gpe_handler()
    to run, so advance_transaction() has been called and it has invoked
    acpi_ec_submit_query() to queue up an event work item, so
    ec->nr_pending_queries has been incremented (under ec->lock).  The
    work function of that work item, acpi_ec_event_handler() runs later
    and calls acpi_ec_query() to process the event.  That function calls
    acpi_ec_transaction() which invokes acpi_ec_transaction_unlocked()
    and the latter wakes up ec->wait under ec->lock, but it drops that
    lock before returning.
    
    When acpi_ec_query() returns, acpi_ec_event_handler() acquires
    ec->lock and decrements ec->nr_pending_queries, but at that point
    __acpi_ec_flush_event() (woken up previously) may already have
    acquired ec->lock, checked the value of ec->nr_pending_queries (and
    it would not have been zero then) and decided to go back to sleep.
    Next, if ec->nr_pending_queries is equal to zero now, the loop
    in acpi_ec_event_handler() terminates, ec->lock is released and
    acpi_ec_check_event() is called, but it does nothing unless
    ec_event_clearing is equal to ACPI_EC_EVT_TIMING_EVENT (which is
    not the case by default).  In the end, if no more event work items
    have been queued up while executing acpi_ec_transaction_unlocked(),
    there is nothing to wake up __acpi_ec_flush_event() again and it
    sleeps forever, so the suspend-to-idle loop cannot make progress and
    the system is permanently suspended.
    
    To avoid this issue, notice that it actually is not necessary to
    wait for ec->nr_pending_queries to become zero in every case in
    which __acpi_ec_flush_event() is used.
    
    First, during platform-based system suspend (not suspend-to-idle),
    __acpi_ec_flush_event() is called by acpi_ec_disable_event() after
    clearing the EC_FLAGS_QUERY_ENABLED flag, which prevents
    acpi_ec_submit_query() from submitting any new event work items,
    so calling flush_scheduled_work() and flushing ec_query_wq
    subsequently (in order to wait until all of the queries in that
    queue have been processed) would be sufficient to flush all of
    the pending EC work in that case.
    
    Second, the purpose of the flushing of pending EC work while
    suspended-to-idle described above really is to wait until the
    first event work item coming from acpi_ec_dispatch_gpe() is
    complete, because it should produce system wakeup events if
    that is a valid EC-based system wakeup, so calling
    flush_scheduled_work() followed by flushing ec_query_wq is also
    sufficient for that purpose.
    
    Rework the code to follow the above observations.
    
    Fixes: 56b99184 ("PM: sleep: Simplify suspend-to-idle control flow")
    Reported-by: default avatarKenneth R. Crudup <kenny@panix.com>
    Tested-by: default avatarKenneth R. Crudup <kenny@panix.com>
    Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
    Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
    016b87ca
ec.c 56.5 KB