    BUG#17018343 SLAVE CRASHES WHEN APPLYING ROW-BASED BINLOG ENTRIES IN CASCADING REPLICATION · bb32ac1d
    Venkatesh Duggirala authored
    
    Problem: In RBR mode, updates to merge tables are not applied successfully in
    cascading replication.
    
    Analysis & Fix: Every type of row event is preceded by one or more
    Table_map_log_events that give information about all the tables involved in
    the row event. The server maintains this list in RPL_TABLE_LIST, goes through
    all the tables, and checks compatibility between master and slave. Before
    checking compatibility, it calls open_tables(), which takes the list of all
    tables that need to be locked and opened. In RBR, because of the
    Table_map_log_event, the list already contains all the tables, including base
    tables. However, open_tables(), being a generic call, also appends the base
    tables of any merge table in the list. The current replication layer assumes
    that these added TABLE_LIST objects are always placed at the end of the list:
    it keeps a count of the tables (tables_to_lock_count) that must be checked for
    compatibility, walks only that many entries from the start of the list, and
    skips the remaining objects in the linked list. This assumption is wrong:
    open_tables()->..->add_children_list() inserts the base tables immediately
    after the merge table in the list.
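The splice can be modeled in a few lines. This is a toy model, not MySQL code: `TableEntry` is a hypothetical stand-in that keeps only the `name`, `parent_l` and `next_global` members of a table-list entry (plus a hypothetical `children` list of base-table names), and `splice_children()` is a hypothetical function that mimics the behavior described above by linking each child in directly after its merge table rather than at the tail.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a TABLE_LIST entry (not MySQL's struct).
struct TableEntry {
  std::string name;
  std::vector<std::string> children;     // non-empty only for a merge table
  const TableEntry *parent_l = nullptr;  // set only on spliced-in children
  TableEntry *next_global = nullptr;
};

// Hypothetical model of the splice: after each merge entry, one child entry
// per base table is linked in immediately, ahead of whatever entry originally
// followed the merge table.
void splice_children(TableEntry *head, std::deque<TableEntry> &storage) {
  assert(head != nullptr);  // this sketch expects a non-empty list
  TableEntry *cur = head;
  while (cur != nullptr) {
    TableEntry *after = cur->next_global;  // original successor
    TableEntry *prev = cur;
    for (const std::string &child : cur->children) {
      storage.push_back(TableEntry{child + "'", {}, cur, nullptr});
      prev->next_global = &storage.back();
      prev = prev->next_global;
    }
    prev->next_global = after;  // reattach the rest of the original list
    cur = after;                // spliced children are never merge tables here
  }
}

// Renders the list as "a->b->c".
std::string render(const TableEntry *head) {
  std::string out;
  for (const TableEntry *t = head; t != nullptr; t = t->next_global) {
    if (!out.empty()) out += "->";
    out += t->name;
  }
  return out;
}
```

A `std::deque` holds the spliced entries because `push_back` on a deque does not invalidate references to existing elements, so the `next_global` pointers into it stay valid as more children are added.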
    
    For example, if the list passed to open_tables() is t1->t2->t3, where t3 is a
    merge table (and t1 and t2 are its base tables), it adds t1'->t2' to the list
    after t3. The new table list is t1->t2->t3->t1'->t2'. It looks as if the
    children were added at the end of the list, but that is not the rule. If the
    list passed to open_tables() is t3->t1->t2, where t3 is the merge table (and t1
    and t2 are its base tables), the new list will be t3->t1'->t2'->t1->t2. Here
    t1' and t2' are TABLE_LIST objects added by the add_children_list() call, and
    the replication layer should not look at them. tables_to_lock_count does not
    help here because the objects are inserted in the middle of the list.
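The failure can be reproduced with a similar toy model. Assuming the loop simply walks the first `tables_to_lock_count` entries and checks each one (a simplified, hypothetical reading of the pre-fix logic; `TableEntry` and `naive_checked()` are illustrative names, not MySQL's), the list `t3->t1'->t2'->t1->t2` with a count of 3 gets `t3`, `t1'` and `t2'` checked, while the real `t1` and `t2` are never reached.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a TABLE_LIST entry (not MySQL's struct).
struct TableEntry {
  std::string name;
  const TableEntry *parent_l = nullptr;  // non-null for a spliced-in merge child
  TableEntry *next_global = nullptr;
};

// Hypothetical model of the pre-fix assumption: each of the first
// tables_to_lock_count entries is treated as a table from a Table_map event
// and checked; everything after them is ignored.
std::vector<std::string> naive_checked(const TableEntry *head,
                                       unsigned tables_to_lock_count) {
  std::vector<std::string> checked;
  unsigned i = 0;
  for (const TableEntry *t = head; t != nullptr && i < tables_to_lock_count;
       t = t->next_global, ++i) {
    checked.push_back(t->name);  // the compatibility check would run here
  }
  assert(checked.size() <= tables_to_lock_count);
  return checked;
}
```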
    
    Fix: add_children_list() (which is called from open_tables()) has no flag or
    logic to skip adding the children to the list, even when the children are
    already included in the table list. Hence, to fix the issue, logic is added in
    the replication layer to skip children in the list by checking whether
    'parent_l' is non-null. If an entry is a child, the compatibility check is
    skipped for that table.
    
    This patch does not remove the 'tables_to_lock_count' logic, for performance
    reasons: any children at the end of the list can be skipped directly by
    stopping the loop with the tables_to_lock_count check.
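A minimal sketch of the combined rule, under the same toy-model assumptions (hypothetical `TableEntry` and `tables_to_check()`; this is not the actual code in log_event.cc). One choice here is an assumption of the sketch: only entries with a null `parent_l` count toward `tables_to_lock_count`, so children spliced into the middle are skipped without using up the count, and the walk stops as soon as all replication-layer tables have been checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a TABLE_LIST entry (not MySQL's struct).
struct TableEntry {
  std::string name;
  const TableEntry *parent_l = nullptr;  // non-null for a spliced-in merge child
  TableEntry *next_global = nullptr;
};

// Returns the names of the entries that would get the master/slave
// compatibility check. Children (parent_l set) are skipped wherever they
// appear; once tables_to_lock_count tables have been checked the walk stops,
// so children at the tail are never visited.
std::vector<std::string> tables_to_check(const TableEntry *head,
                                         unsigned tables_to_lock_count) {
  std::vector<std::string> checked;
  for (const TableEntry *t = head;
       t != nullptr && checked.size() < tables_to_lock_count;
       t = t->next_global) {
    if (t->parent_l != nullptr)
      continue;  // merge child added by the splice, not from a Table_map event
    checked.push_back(t->name);  // the compatibility check would run here
  }
  assert(checked.size() <= tables_to_lock_count);
  return checked;
}
```

On `t3->t1'->t2'->t1->t2` with a count of 3 this checks `t3`, `t1` and `t2`; on `t1->t2->t3->t1'->t2'` it checks `t1`, `t2` and `t3` and stops before reaching the tail children.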
log_event.cc