WL#3072 Maria Recovery

misc fixes of execution of UNDOs in the UNDO phase:
- into the CLR_END, store the LSN of the _previous_ UNDO (we debated what was best, so far we're going with "previous"; later we can change to "current" if needed), and store the type of record which is being undone (needed to know how to update state.records when we see the CLR_END during the REDO phase).
- declaring all UNDOs and CLR_END as "compressed"
- when executing an UNDO in the UNDO phase, state.records is updated as a hook when writing CLR_END (needed for "recovery of the state"), and so is trn->undo_lsn (needed for when we have checkpoints).
- bugfix (execution of UNDO_ROW_DELETE didn't store the correct checksum into the re-inserted row, maria_chk -r thus threw the row away).
- modifications of ma_test1: where to stop is now driven by --testflag; --test-undo just tells how to stop (flush data, flush log, nothing).
- ma_test_recovery: testing of the UNDO phase, more testing of the REDO phase, identification of a bug.

storage/maria/ma_blockrec.c:
- bugfix: execution of UNDO_ROW_DELETE didn't store the correct checksum into the row (leading to "maria_chk -r" eliminating the re-inserted row, net effect was that rollback appeared to have rolled back no deletion). Reason was that write_block_record() used info->cur_row.checksum, while "row" can be != &info->cur_row (case of UNDO_ROW_DELETE). After fixing this, problems with _ma_update_block_record() appeared; indeed checksum was computed by allocate_and_write_block_record() while _ma_update_block_record() directly calls write_block_record(). Solution is to compute checksum in write_block_record() instead.
- when executing an UNDO, we now pass the LSN of the _previous_ UNDO to block_format functions. This LSN can be 0 (if the being-executed UNDO was the transaction's first UNDO), so "undo_lsn==0" cannot work anymore to indicate "this is not UNDO work". Using undo_lsn==LSN_ERROR instead (this is an impossible LSN).
- store into CLR_END the type of log record which was undone (INSERT/UPDATE/DELETE); needed for Recovery to know if/how it has to update state.records if it sees this CLR_END in the REDO phase.
- when writing the CLR_END in _ma_apply_undo_row_insert(), the place to store file's id is log_data+LSN_STORE_SIZE.
- in _ma_apply_undo_row_insert(), the records-- is moved to a hook when writing the CLR_END (this way it is under log's mutex which is needed for "recovery of the state")

storage/maria/ma_loghandler.c:
- all UNDOs, and CLR_END, start with the LSN of another UNDO; so we can declare them "compressed".
- write_hook_for_clr_end() to set trn->undo_lsn (to the previous UNDO's LSN) under log's lock (like UNDOs set trn->undo_lsn under log's lock), and also update, if appropriate, state.records.
- reset share->id to 0 when deassigning; not useful for now but sounds logical.

storage/maria/ma_recovery.c:
- if no table is found for a REDO, it's not an error; for an UNDO, it is
- in the REDO phase, when we see a CLR_END we must update trn->undo_lsn and sometimes state.records.
- in the UNDO phase, when we execute an UNDO_ROW_INSERT:
  * update trn->undo_lsn only after executing the record
  * store the _previous_ undo_lsn into the CLR_END
- at the end of the REDO phase, when we recreate TRN objects, they have already their long id in the log (either via a LOGREC_LONG_TRANSACTION_ID, or in a checkpoint record), don't write a new, useless LOGREC_LONG_TRANSACTION_ID for them.

storage/maria/ma_test1.c:
* where to stop execution is now driven by --testflag and not --test-undo (ma_test2 already has --testflag for the same purpose). This allows us to do a clean stop (with commit) at any point.
* --test-undo=# tells how to abort (flush all pages (which implies flushing log) or only log or nothing); all such "ways of crashing" are tested in ma_test_recovery

storage/maria/ma_test_recovery:
* Testing execution of UNDOs, with and without BLOBs.
* Testing idempotency of REDOs.
* See @todo for a probable bug with BLOBs.
* maria_chk -rq instead of -r, as with -q it nicely stops on any problem in the data file (like the checksum bug see comment of ma_blockrec.c).
* Testing if log was written by UNDO phase (often expected), not written by REDO phase (always expected).
* Less output on the screen, compares with expected output in the end.
* some shell thingies like "set --" and $# are courtesy of Danny and Pekka.

storage/maria/maria_read_log.c:
when only displaying the records, don't do an UNDO phase

storage/maria/ma_test_recovery.expected:
This is the expected output of a great part of ma_test_recovery. ma_test_recovery compares its output to the expected output and tells if different. If we look at this file it mentions differences in checksum (normal, it's not recovered yet) and in records count (getting a correct records' count when recovery starts on an already existing table, like when testing rollback, is coded but not yet pushed).
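
The checksum bug described in the commit message comes down to reading a cached value from one row descriptor while writing another. The following is a minimal sketch of the fixed behaviour, not the Maria code: `toy_row`, `toy_handler`, `toy_checksum` and `toy_write_block_record` are hypothetical stand-ins, and the byte-sum checksum is an assumption made only so the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the row descriptor and the handler. */
typedef struct { uint32_t checksum; } toy_row;
typedef struct { toy_row cur_row; } toy_handler;

/* Toy checksum (plain byte sum); the real function is table-specific. */
uint32_t toy_checksum(const unsigned char *record, size_t len)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < len; i++)
    sum += record[i];
  return sum;
}

/*
  Writes the one-byte checksum header for 'record' into 'out' and returns it.
  The checksum is computed here, from the record actually being written, and
  cached in 'row'. info->cur_row is deliberately not consulted, because 'row'
  may be a different descriptor (the UNDO_ROW_DELETE case).
*/
unsigned char toy_write_block_record(toy_handler *info, toy_row *row,
                                     const unsigned char *record,
                                     size_t len, unsigned char *out)
{
  (void) info;                              /* unused in this sketch */
  row->checksum = toy_checksum(record, len);
  out[0] = (unsigned char) row->checksum;   /* least significant byte */
  return out[0];
}
```

Computing the value at the single place where the byte is stored means every caller, whether it goes through an allocation step first or not, ends up with a checksum that matches the written record.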

Commit ac4ad9bd authored Sep 06, 2007 by unknown
7 changed files
--- a/storage/maria/ma_blockrec.c
+++ b/storage/maria/ma_blockrec.c
@@ -1659,7 +1659,7 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
   @param  map_blocks      On which pages the record should be stored
   @param  row_pos         Position on head page where to put head part of
                           record
-   @param  undo_lsn	   <> 0 if we are in UNDO
+   @param  undo_lsn	   <> LSN_ERROR if we are executing an UNDO

   @note
     On return all pinned pages are released.
@@ -1729,7 +1729,10 @@ static my_bool write_block_record(MARIA_HA *info,
  if (share->base.pack_fields)
    store_key_length_inc(data, row->field_lengths_length);
  if (share->calc_checksum)
-    *(data++)= (uchar) info->cur_row.checksum;
+  {
+    row->checksum= (info->s->calc_checksum)(info, record);
+    *(data++)= (uchar) (row->checksum); /* store least significant byte */
+  }
  memcpy(data, record, share->base.null_bytes);
  data+= share->base.null_bytes;
  memcpy(data, row->empty_bits, share->base.pack_bytes);
@@ -2283,19 +2286,25 @@ static my_bool write_block_record(MARIA_HA *info,
  {
    LEX_STRING *log_array= info->log_row_parts;

-    if (undo_lsn)
+    if (undo_lsn != LSN_ERROR)
    {
-      uchar log_data[LSN_STORE_SIZE + FILEID_STORE_SIZE];
-
+      uchar log_data[LSN_STORE_SIZE + FILEID_STORE_SIZE + 1];
      /* undo_lsn must be first for compression to work */
      lsn_store(log_data, undo_lsn);
+      /*
+        Store if this CLR is about an UNDO_INSERT, UNDO_DELETE or UNDO_UPDATE;
+        in the first/second case, Recovery, when it sees the CLR_END in the
+        REDO phase, may decrement/increment the records' count.
+      */
+      /** @todo when Monty has UNDO_UPDATE coded, revisit this */
+      log_data[LSN_STORE_SIZE + FILEID_STORE_SIZE]= LOGREC_UNDO_ROW_DELETE;
      log_array[TRANSLOG_INTERNAL_PARTS + 0].str=    (char*) log_data;
      log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);

      if (translog_write_record(&lsn, LOGREC_CLR_END,
                                info->trn, info, sizeof(log_data),
                                TRANSLOG_INTERNAL_PARTS + 1, log_array,
-                                log_data+ FILEID_STORE_SIZE))
+                                log_data + LSN_STORE_SIZE))
        goto disk_err;
    }
    else
@@ -2425,7 +2434,7 @@ disk_err:
  @param info                Maria handler
  @param record              Record to write
  @param row		     Information about fields in 'record'
-  @param undo_lsn	     <> 0 if in undo
+  @param undo_lsn	     <> LSN_ERROR if we are executing an UNDO

  @return
  @retval 0	ok
@@ -2449,8 +2458,6 @@ static my_bool allocate_and_write_block_record(MARIA_HA *info,
                            PAGECACHE_LOCK_WRITE, &row_pos))
    DBUG_RETURN(1);
  row->lastpos= ma_recordpos(blocks->block->page, row_pos.rownr);
-  if (info->s->calc_checksum)
-    row->checksum= (info->s->calc_checksum)(info,record);
  if (write_block_record(info, (uchar*) 0, record, row,
                         blocks, blocks->block->org_bitmap_value != 0,
                         &row_pos, undo_lsn))
@@ -2482,7 +2489,8 @@ MARIA_RECORD_POS _ma_write_init_block_record(MARIA_HA *info,
  DBUG_ENTER("_ma_write_init_block_record");

  calc_record_size(info, record, &info->cur_row);
-  if (allocate_and_write_block_record(info, record, &info->cur_row, 0))
+  if (allocate_and_write_block_record(info, record,
+                                      &info->cur_row, LSN_ERROR))
    DBUG_RETURN(HA_OFFSET_ERROR);
  DBUG_RETURN(info->cur_row.lastpos);
 }
@@ -2669,7 +2677,7 @@ my_bool _ma_update_block_record(MARIA_HA *info, MARIA_RECORD_POS record_pos,
    if (cur_row->extents_count && free_full_pages(info, cur_row))
      goto err;
    DBUG_RETURN(write_block_record(info, oldrec, record, new_row, blocks,
-                                   1, &row_pos, 0));
+                                   1, &row_pos, LSN_ERROR));
  }
  /*
    Allocate all size in block for record
@@ -2702,7 +2710,7 @@ my_bool _ma_update_block_record(MARIA_HA *info, MARIA_RECORD_POS record_pos,
  row_pos.data= buff + uint2korr(dir);
  row_pos.length= head_length;
  DBUG_RETURN(write_block_record(info, oldrec, record, new_row, blocks, 1,
-                                 &row_pos, 0));
+                                 &row_pos, LSN_ERROR));

 err:
  _ma_unpin_all_pages(info, 0);
@@ -4825,7 +4833,7 @@ my_bool _ma_apply_undo_row_insert(MARIA_HA *info, LSN undo_lsn,
  ulonglong page;
  uint rownr;
  LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
-  uchar log_data[LSN_STORE_SIZE + FILEID_STORE_SIZE], *buff;
+  uchar log_data[LSN_STORE_SIZE + FILEID_STORE_SIZE + 1], *buff;
  my_bool res= 1;
  MARIA_PINNED_PAGE page_link;
  LSN lsn;
@@ -4858,16 +4866,16 @@ my_bool _ma_apply_undo_row_insert(MARIA_HA *info, LSN undo_lsn,

  /* undo_lsn must be first for compression to work */
  lsn_store(log_data, undo_lsn);
+  log_data[LSN_STORE_SIZE + FILEID_STORE_SIZE]= LOGREC_UNDO_ROW_INSERT;
  log_array[TRANSLOG_INTERNAL_PARTS + 0].str=    (char*) log_data;
  log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);

  if (translog_write_record(&lsn, LOGREC_CLR_END,
                            info->trn, info, sizeof(log_data),
                            TRANSLOG_INTERNAL_PARTS + 1, log_array,
-                            log_data+ FILEID_STORE_SIZE))
+                            log_data + LSN_STORE_SIZE))
    goto err;

-  info->s->state.state.records--;
  res= 0;
 err:
  _ma_unpin_all_pages(info, lsn);
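
As a reading aid for the hunks above, here is a rough sketch of the CLR_END payload (LSN of the previous UNDO, then the file id, then one byte for the undone record type) and of the sentinel test that replaces `undo_lsn != 0`. Everything prefixed `sk_`/`SK_` is hypothetical. The 7-byte and 2-byte sizes are inferred from the old 9-byte descriptor and the "2-byte-id" comment in this diff; the numeric values of the two special LSNs, the type codes and the little-endian byte order are assumptions. In the real code the file id is filled in by the log writer at `log_data + LSN_STORE_SIZE`; the sketch writes it directly.

```c
#include <stdint.h>

#define SK_LSN_STORE_SIZE    7   /* inferred, see lead-in */
#define SK_FILEID_STORE_SIZE 2   /* inferred, see lead-in */
#define SK_CLR_END_SIZE (SK_LSN_STORE_SIZE + SK_FILEID_STORE_SIZE + 1)

/* Hypothetical type codes; the real ones are enum translog_record_type. */
enum sk_undone_type { SK_UNDO_ROW_INSERT = 1, SK_UNDO_ROW_DELETE = 2 };

typedef uint64_t sk_lsn;
#define SK_LSN_IMPOSSIBLE ((sk_lsn) 0)  /* assumed: "no previous UNDO" */
#define SK_LSN_ERROR      ((sk_lsn) 1)  /* assumed sentinel: "not UNDO work" */

/* 0 is a legal "previous UNDO" value, so only the sentinel means "no UNDO". */
int sk_is_undo_work(sk_lsn undo_lsn)
{
  return undo_lsn != SK_LSN_ERROR;
}

/* Stores the low 7 bytes of an LSN, least significant byte first. */
void sk_lsn_store(unsigned char *dst, sk_lsn lsn)
{
  for (int i = 0; i < SK_LSN_STORE_SIZE; i++)
    dst[i] = (unsigned char) (lsn >> (8 * i));
}

/* Reads back an LSN written by sk_lsn_store(). */
sk_lsn sk_lsn_korr(const unsigned char *src)
{
  sk_lsn lsn = 0;
  for (int i = 0; i < SK_LSN_STORE_SIZE; i++)
    lsn |= (sk_lsn) src[i] << (8 * i);
  return lsn;
}

/* Fills 'buf' (SK_CLR_END_SIZE bytes): previous UNDO LSN, file id, type. */
void sk_build_clr_end(unsigned char *buf, sk_lsn prev_undo_lsn,
                      uint16_t file_id, enum sk_undone_type type)
{
  sk_lsn_store(buf, prev_undo_lsn);        /* LSN first, for compression */
  buf[SK_LSN_STORE_SIZE]     = (unsigned char) (file_id & 0xff);
  buf[SK_LSN_STORE_SIZE + 1] = (unsigned char) (file_id >> 8);
  buf[SK_LSN_STORE_SIZE + SK_FILEID_STORE_SIZE] = (unsigned char) type;
}
```

Keeping the type in a trailing byte leaves the leading LSN where the "compressed" record encoding expects it.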

--- a/storage/maria/ma_loghandler.c
+++ b/storage/maria/ma_loghandler.c
@@ -213,6 +213,9 @@ static my_bool write_hook_for_redo(enum translog_record_type type,
 static my_bool write_hook_for_undo(enum translog_record_type type,
                                   TRN *trn, MARIA_HA *tbl_info, LSN *lsn,
                                   struct st_translog_parts *parts);
+static my_bool write_hook_for_clr_end(enum translog_record_type type,
+                                      TRN *trn, MARIA_HA *tbl_info, LSN *lsn,
+                                      struct st_translog_parts *parts);

 static my_bool translog_page_validator(uchar *page_addr, uchar* data_ptr);

@@ -414,7 +417,8 @@ static LOG_DESC INIT_LOGREC_REDO_UNDELETE_ROW=
 "redo_undelete_row", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};

 static LOG_DESC INIT_LOGREC_CLR_END=
-{LOGRECTYPE_FIXEDLENGTH, 9, 9, NULL, write_hook_for_redo, NULL, 0,
+{LOGRECTYPE_PSEUDOFIXEDLENGTH, LSN_STORE_SIZE + FILEID_STORE_SIZE + 1,
+ LSN_STORE_SIZE + FILEID_STORE_SIZE + 1, NULL, write_hook_for_clr_end, NULL, 1,
 "clr_end", LOGREC_LAST_IN_GROUP, NULL, NULL};

 static LOG_DESC INIT_LOGREC_PURGE_END=
@@ -422,16 +426,16 @@ static LOG_DESC INIT_LOGREC_PURGE_END=
 "purge_end", LOGREC_LAST_IN_GROUP, NULL, NULL};

 static LOG_DESC INIT_LOGREC_UNDO_ROW_INSERT=
-{LOGRECTYPE_FIXEDLENGTH,
+{LOGRECTYPE_PSEUDOFIXEDLENGTH,
 LSN_STORE_SIZE + FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
 LSN_STORE_SIZE + FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
- NULL, write_hook_for_undo, NULL, 0,
+ NULL, write_hook_for_undo, NULL, 1,
 "undo_row_insert", LOGREC_LAST_IN_GROUP, NULL, NULL};

 static LOG_DESC INIT_LOGREC_UNDO_ROW_DELETE=
 {LOGRECTYPE_VARIABLE_LENGTH, 0,
 LSN_STORE_SIZE + FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
- NULL, write_hook_for_undo, NULL, 0,
+ NULL, write_hook_for_undo, NULL, 1,
 "undo_row_delete", LOGREC_LAST_IN_GROUP, NULL, NULL};

 static LOG_DESC INIT_LOGREC_UNDO_ROW_UPDATE=
@@ -451,8 +455,8 @@ static LOG_DESC INIT_LOGREC_UNDO_KEY_INSERT=
 "undo_key_insert", LOGREC_LAST_IN_GROUP, NULL, NULL};

 static LOG_DESC INIT_LOGREC_UNDO_KEY_DELETE=
-{LOGRECTYPE_VARIABLE_LENGTH, 0, 15, NULL, write_hook_for_undo, NULL, 0,
- "undo_key_delete", LOGREC_LAST_IN_GROUP, NULL, NULL}; // QQ: why not compressed?
+{LOGRECTYPE_VARIABLE_LENGTH, 0, 15, NULL, write_hook_for_undo, NULL, 1,
+ "undo_key_delete", LOGREC_LAST_IN_GROUP, NULL, NULL};

 static LOG_DESC INIT_LOGREC_PREPARE=
 {LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0,
@@ -6303,6 +6307,46 @@ static my_bool write_hook_for_undo(enum translog_record_type type
  */
 }

+
+/**
+   @brief Sets transaction's undo_lsn, first_undo_lsn if needed
+
+   @todo move it to a separate file
+
+   @return Operation status, always 0 (success)
+*/
+
+static my_bool write_hook_for_clr_end(enum translog_record_type type
+                                      __attribute__ ((unused)),
+                                      TRN *trn, MARIA_HA *tbl_info
+                                      __attribute__ ((unused)),
+                                      LSN *lsn
+                                      __attribute__ ((unused)),
+                                      struct st_translog_parts *parts)
+{
+  char *ptr= parts->parts[TRANSLOG_INTERNAL_PARTS + 0].str;
+  enum translog_record_type undone_record_type=
+    ptr[LSN_STORE_SIZE + FILEID_STORE_SIZE];
+
+  DBUG_ASSERT(trn->trid != 0);
+  /** @todo depending on what we are undoing, update "records" or not */
+  trn->undo_lsn= lsn_korr(ptr);
+  switch (undone_record_type) {
+  case LOGREC_UNDO_ROW_DELETE:
+    tbl_info->s->state.state.records++;
+    break;
+  case LOGREC_UNDO_ROW_INSERT:
+    tbl_info->s->state.state.records--;
+    break;
+  default:
+    DBUG_ASSERT(0);
+  }
+  if (trn->undo_lsn == LSN_IMPOSSIBLE) /* has fully rolled back */
+    trn->first_undo_lsn= LSN_WITH_FLAGS_TO_FLAGS(trn->first_undo_lsn);
+  return 0;
+}
+
+
 /**
   @brief Gives a 2-byte-id to MARIA_SHARE and logs this fact

@@ -6375,6 +6419,15 @@ int translog_assign_id_to_share(MARIA_SHARE *share, TRN *trn)
                                       sizeof(log_array)/sizeof(log_array[0]),
                                       log_array, NULL)))
      return 1;
+    /*
+      Note that we first set share->id then write the record. The checkpoint
+      record does not include any share with id==0; this is ok because:
+      checkpoint_start_log_horizon is either before or after the above
+      record. If before, ok to not include the share, as the record will be
+      seen for sure during the REDO phase. If after, Checkpoint will see all
+      data as it was after this record was written, including the id!=0, so
+      share will be included.
+    */
  }
  pthread_mutex_unlock(&share->intern_lock);
  return 0;
@@ -6400,6 +6453,7 @@ void translog_deassign_id_from_share(MARIA_SHARE *share)
  my_atomic_rwlock_rdlock(&LOCK_id_to_share);
  my_atomic_storeptr((void **)&id_to_share[share->id], 0);
  my_atomic_rwlock_rdunlock(&LOCK_id_to_share);
+  share->id= 0;
 }
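
The effect of a CLR_END on in-memory state, as done by `write_hook_for_clr_end()` above and, per the commit message, again when a CLR_END is seen in the REDO phase, can be sketched as one small function. This is an illustration under assumptions: `hk_trn`, `hk_table_state`, `hk_apply_clr_end` and the type codes are hypothetical, the `first_undo_lsn` handling is left out, and an unknown type returns an error here where the real hook asserts.

```c
#include <stdint.h>

/* Hypothetical type codes; the real ones are enum translog_record_type. */
enum hk_undone_type { HK_UNDO_ROW_INSERT = 1, HK_UNDO_ROW_DELETE = 2 };

/* Hypothetical, minimal views of the transaction and the table state. */
typedef struct { uint64_t undo_lsn; } hk_trn;
typedef struct { uint64_t records; } hk_table_state;

/*
  Applies the bookkeeping implied by one CLR_END.
  Returns 0 on success, 1 if the undone type is not recognised (in which
  case nothing is modified).
*/
int hk_apply_clr_end(hk_trn *trn, hk_table_state *state,
                     uint64_t prev_undo_lsn, enum hk_undone_type undone)
{
  switch (undone) {
  case HK_UNDO_ROW_DELETE:
    state->records++;          /* the deleted row is back */
    break;
  case HK_UNDO_ROW_INSERT:
    state->records--;          /* the inserted row is gone */
    break;
  default:
    return 1;
  }
  /* The transaction's next UNDO to execute is the one before this one. */
  trn->undo_lsn = prev_undo_lsn;
  return 0;
}
```

Because the CLR_END carries both the previous UNDO's LSN and the undone type, the same adjustment can be replayed from the log alone.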



--- a/storage/maria/ma_recovery.c
+++ b/storage/maria/ma_recovery.c
--- a/storage/maria/ma_test1.c
+++ b/storage/maria/ma_test1.c
@@ -15,11 +15,12 @@

 /* Testing of the basic functions of a MARIA table */

-#include "maria.h"
+#include "maria_def.h"
 #include <my_getopt.h>
 #include <m_string.h>
 #include "ma_control_file.h"
 #include "ma_loghandler.h"
+#include "trnman.h"

 extern PAGECACHE *maria_log_pagecache;
 extern const char *maria_data_root;
@@ -28,7 +29,7 @@ extern const char *maria_data_root;

 static void usage();

-static int rec_pointer_size=0, flags[50];
+static int rec_pointer_size=0, flags[50], testflag;
 static int key_field=FIELD_SKIP_PRESPACE,extra_field=FIELD_SKIP_ENDSPACE;
 static int key_type=HA_KEYTYPE_NUM;
 static int create_flag=0;
@@ -223,6 +224,9 @@ static int run_test(const char *filename)
  if (maria_commit(file) || maria_begin(file))
    goto err;

+  if (testflag == 1)
+    goto end;
+
  /* Insert 2 rows with null values */
  if (null_fields)
  {
@@ -240,16 +244,10 @@ static int run_test(const char *filename)
    flags[0]=2;
  }

-  if (die_in_middle_of_transaction == 1)
+  if (testflag == 2)
  {
-    /*
-      Ensure we get changed pages and log to disk
-      As commit record is not done, the undo entries needs to be rolled back.
-    */
-    _ma_flush_table_files(file, MARIA_FLUSH_DATA, FLUSH_RELEASE,
-                          FLUSH_RELEASE);
-    printf("Dying on request after insert without maria_close()\n");
-    exit(1);
+    printf("terminating after inserts\n");
+    goto end;
  }

  if (!skip_update)
@@ -304,6 +302,8 @@ static int run_test(const char *filename)
    maria_scan_end(file);
  }

+  if (testflag == 3)
+    goto end;
  if (!silent)
    printf("- Reopening file\n");
  if (maria_commit(file))
@@ -321,6 +321,12 @@ static int run_test(const char *filename)

    for (i=0 ; i <= 10 ; i++)
    {
+      /*
+        If you want to debug the problem in ma_test_recovery with BLOBs
+        (see @todo there), you can break out of the loop after just one
+        delete, it is enough, like this:
+        if (i==1) break;
+      */
      /* testing */
      if (remove_count-- == 0)
      {
@@ -355,19 +361,14 @@ static int run_test(const char *filename)
 	}
      }
    }
+  }

-    if (die_in_middle_of_transaction == 2)
+  if (testflag == 4)
  {
-      /*
-        Ensure we get changed pages and log to disk
-        As commit record is not done, the undo entries needs to be rolled back.
-      */
-      _ma_flush_table_files(file, MARIA_FLUSH_DATA, FLUSH_RELEASE,
-                            FLUSH_RELEASE);
-      printf("Dying on request after delete without maria_close()\n");
-      exit(1);
-    }
+    printf("terminating after deletes\n");
+    goto end;
  }
+
  if (!silent)
    printf("- Reading rows with key\n");
  record[1]= 0;                                 /* For nicer printf */
@@ -412,6 +413,39 @@ static int run_test(const char *filename)
 	     i-1,error,my_errno,read_record+1);
    }
  }
+
+end:
+  if (die_in_middle_of_transaction)
+  {
+    /* As commit record is not done, UNDO entries needs to be rolled back */
+    switch (die_in_middle_of_transaction) {
+    case 1:
+      /*
+        Flush changed pages go to disk. That will also flush log. Recovery
+        will skip REDOs and apply UNDOs.
+      */
+      _ma_flush_table_files(file, MARIA_FLUSH_DATA, FLUSH_RELEASE,
+                            FLUSH_RELEASE);
+      break;
+    case 2:
+      /*
+        Just flush log. Pages are likely to not be on disk. Recovery will
+        then execute REDOs and UNDOs.
+      */
+      if (translog_flush(file->trn->undo_lsn))
+        goto err;
+      break;
+    case 3:
+      /*
+        Flush nothing. Pages and log are likely to not be on disk. Recovery
+        will then do nothing.
+      */
+      break;
+    }
+    printf("Dying on request without maria_commit()/maria_close()\n");
+    exit(0);
+  }
+
  if (maria_commit(file))
    goto err;
  if (maria_close(file))
@@ -676,11 +710,13 @@ static struct my_option my_long_options[] =
   (uchar**) &skip_delete, 0, GET_BOOL, NO_ARG, 0, 0, 0, 0, 0, 0},
  {"skip-update", 'D', "Don't test updates", (uchar**) &skip_update,
   (uchar**) &skip_update, 0, GET_BOOL, NO_ARG, 0, 0, 0, 0, 0, 0},
+  {"testflag", 't', "Stop test at specified stage", (uchar**) &testflag,
+   (uchar**) &testflag, 0, GET_INT, REQUIRED_ARG, 0, 0, 0, 0, 0, 0},
  {"test-undo", 'A',
-   "Abort hard after doing inserts. Used for testing recovery with undo",
+   "Abort hard. Used for testing recovery with undo",
   (uchar**) &die_in_middle_of_transaction,
   (uchar**) &die_in_middle_of_transaction,
-   0, GET_INT, OPT_ARG, 0, 0, 0, 0, 0, 0},
+   0, GET_INT, REQUIRED_ARG, 0, 0, 0, 0, 0, 0},
  {"transactional", 'T',
   "Test in transactional mode. (Only works with block format)",
   (uchar**) &transactional, (uchar**) &transactional, 0, GET_BOOL, NO_ARG,
@@ -768,12 +804,6 @@ get_one_option(int optid, const struct my_option *opt __attribute__((unused)),
  case 'K':                                     /* Use key cacheing */
    pagecacheing=1;
    break;
-  case 'A':
-    if (!argument)
-      die_in_middle_of_transaction= 1;
-    else
-      die_in_middle_of_transaction= atoi(argument);
-    break;
  case 'V':
    printf("test1 Ver 1.2 \n");
    exit(0);
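
After this change ma_test1 has two independent knobs: `--testflag` picks the stage at which the run stops, and `--test-undo` picks how it dies. A small sketch of that split follows; `tm_should_stop`, `tm_recovery_plan` and `tm_plan_for_mode` are hypothetical names, and the expectations per mode are simply the ones stated in the comments of the `end:` block above (which themselves say "likely").

```c
/* What Recovery is expected to do after a given way of dying. */
typedef struct {
  int redo_expected;   /* 1 if REDOs are expected to be executed */
  int undo_expected;   /* 1 if UNDOs are expected to be executed */
} tm_recovery_plan;

/* --testflag: stop when the test reaches the requested stage. */
int tm_should_stop(int testflag, int stage)
{
  return testflag == stage;
}

/*
  --test-undo: fills 'plan' for modes 1..3 and returns 0;
  returns 1 (plan untouched) for any other value.
*/
int tm_plan_for_mode(int test_undo, tm_recovery_plan *plan)
{
  switch (test_undo) {
  case 1:                      /* pages and log flushed */
    plan->redo_expected = 0;
    plan->undo_expected = 1;
    return 0;
  case 2:                      /* only the log flushed */
    plan->redo_expected = 1;
    plan->undo_expected = 1;
    return 0;
  case 3:                      /* nothing flushed */
    plan->redo_expected = 0;
    plan->undo_expected = 0;
    return 0;
  default:
    return 1;
  }
}
```

Separating "where" from "how" is what lets the recovery test script run the same program once to a clean commit and once to an abort at a later stage.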

--- a/storage/maria/ma_test_recovery
+++ b/storage/maria/ma_test_recovery
@@ -7,58 +7,201 @@ then
    maria_path="."
 fi

-tmp=$maria_path/tmp
+# test data is always put in the current directory or a tmp subdirectory of it
+tmp="./tmp"

 if test '!' -d $tmp
 then
  mkdir $tmp
 fi

-echo "MARIA RECOVERY TESTS - success is if exit code is 0"
+echo "MARIA RECOVERY TESTS"

+check_table_is_same()
+{
+    # Computes checksum of new table and compares to checksum of old table
+    # Shows any difference in table's state (info from the index's header)
+
+    $maria_path/maria_chk -dvv $table | grep -v "Creation time:" > $tmp/maria_chk_message.txt 2>&1
+
+    # save the index file (because we want to test idempotency afterwards)
+    cp $table.MAI tmp/
+    # In the repair below it's good to use -q because it will die on any
+    # incorrectness of the data file if UNDO was badly applied.
+    # QQ: Remove the following line when we also can recover the index file
+    $maria_path/maria_chk -s -rq $table
+
+    $maria_path/maria_chk -s -e $table
+    checksum2=`$maria_path/maria_chk -dss $table`
+    if test "$checksum" != "$checksum2"
+        then
+        echo "checksum differs for $table before and after recovery"
+        return 1;
+    fi
+
+    diff $tmp/maria_chk_message.good.txt $tmp/maria_chk_message.txt > $tmp/maria_chk_diff.txt || true
+    if [ -s $tmp/maria_chk_diff.txt ]
+        then
+        echo "Differences in maria_chk -dvv, recovery not yet perfect !"
+        echo "========DIFF START======="
+        cat $tmp/maria_chk_diff.txt
+        echo "========DIFF END======="
+    fi
+    mv tmp/$table.MAI .
+}
+
+apply_log()
+{
+    # applies log, can verify if applying did write to log or not
+
+    shouldchangelog=$1
+    if [ "$shouldchangelog" != "shouldnotchangelog" ] &&
+        [ "$shouldchangelog" != "shouldchangelog" ] &&
+        [ "$shouldchangelog" != "dontknow" ]
+        then
+        echo "bad argument '$shouldchangelog'"
+        return 1
+    fi
+    log_md5=`md5sum maria_log.*`
+    echo "applying log"
+    $maria_path/maria_read_log -a > $tmp/maria_read_log_$table.txt
+    log_md5_2=`md5sum maria_log.*`
+    if [ "$log_md5" != "$log_md5_2" ]
+        then
+        if [ "$shouldchangelog" == "shouldnotchangelog" ]
+            then
+            echo "maria_read_log should not have modified the log"
+            return 1
+        fi
+        else
+        if [ "$shouldchangelog" == "shouldchangelog" ]
+            then
+            echo "maria_read_log should have modified the log"
+            return 1
+        fi
+    fi
+}
+
+# To not flood the screen, we redirect all the commands below to a text file
+# and just give a final error if their output is not as expected
+
+(
+
+# this message is to remember about the problem with -b (see @todo below)
+echo "!!!!!!!! REMEMBER to FIX this BLOB issue !!!!!!!"
+
+echo "Testing the REDO PHASE ALONE"
 # runs a program inserting/deleting rows, then moves the resulting table
 # elsewhere; applies the log and checks that the data file is
 # identical to the saved original.
 # Does not test the index file as we don't have logging for it yet.

-for prog in "$maria_path/ma_test1 $silent -M -T -c" "$maria_path/ma_test2 $silent -L -K -W -P -M -T -c" "$maria_path/ma_test2 $silent -M -T -c -b"
+set -- "$maria_path/ma_test1 $silent -M -T -c" "$maria_path/ma_test2 $silent -L -K -W -P -M -T -c" "$maria_path/ma_test2 $silent -M -T -c -b"
+while [ $# != 0 ]
 do
-  rm -f maria_log.* maria_log_control
+  prog=$1
+  rm maria_log.* maria_log_control
  echo "TEST WITH $prog"
  $prog
  # derive table's name from program's name
  table=`echo $prog | sed -e 's;.*ma_\(test[0-9]\).*;\1;' `
-  $maria_path/maria_chk -dvv $table > $tmp/maria_chk_message.good.txt 2>&1
+  $maria_path/maria_chk -dvv $table | grep -v "Creation time:"> $tmp/maria_chk_message.good.txt 2>&1
  checksum=`$maria_path/maria_chk -dss $table`
-  mv -f $table.MAD $tmp/$table.MAD.good
+  mv $table.MAD $tmp/$table.MAD.good
  rm $table.MAI
-  echo "applying log"
-  $maria_path/maria_read_log -a > $tmp/maria_read_log_$table.txt
-  $maria_path/maria_chk -dvv $table > $tmp/maria_chk_message.txt 2>&1
-
+  apply_log "shouldnotchangelog"
  cmp $table.MAD $tmp/$table.MAD.good
+  check_table_is_same
+  echo "testing idempotency"
+  apply_log "shouldnotchangelog"
+  cmp $table.MAD $tmp/$table.MAD.good
+  check_table_is_same
+  shift
+done

-  # QQ: Remove the following line when we also can recovert the index file
-  $maria_path/maria_chk -s -r $table
+echo "Testing the REDO AND UNDO PHASE"
+# The test programs look like:
+# work; commit (time T1); work; exit-without-commit (time T2)
+# We first run the test program and let it exit after T1's commit.
+# Then we run it again and let it exit at T2. Then we compare
+# and expect identity.

-  $maria_path/maria_chk -s -e $table
-  checksum2=`$maria_path/maria_chk -dss $table`
-  if test "$checksum" != "$checksum2"
+for blobs in "" "-b" # we test table without blobs and then table with blobs
+do
+  for test_undo in 1 2 3
+  do
+  # first iteration tests rollback of insert, second tests rollback of delete
+  set -- "$maria_path/ma_test1 $silent -M -T -c -N $blobs" "--testflag=1" "--testflag=2" "$maria_path/ma_test1 $silent -M -T -c -N --debug=d:t:i:o,/tmp/ma_test1.trace $blobs" "--testflag=3" "--testflag=4"
+  # -N (create NULL fields) is needed because --test-undo adds it anyway
+  while [ $# != 0 ]
+    do
+    prog=$1
+    commit_run_args=$2
+    abort_run_args=$3;
+    rm maria_log.* maria_log_control
+    echo "TEST WITH $prog $commit_run_args (commit at end)"
+    $prog $commit_run_args
+    # derive table's name from program's name
+    table=`echo $prog | sed -e 's;.*ma_\(test[0-9]\).*;\1;' `
+    $maria_path/maria_chk -dvv $table | grep -v "Creation time:"> $tmp/maria_chk_message.good.txt 2>&1
+    checksum=`$maria_path/maria_chk -dss $table`
+    mv $table.MAD $tmp/$table.MAD.good
+    rm $table.MAI
+    rm maria_log.* maria_log_control
+    echo "TEST WITH $prog $abort_run_args --test-undo=$test_undo (additional aborted work)"
+    $prog $abort_run_args --test-undo=$test_undo
+    cp $table.MAD $tmp/$table.MAD.before_undo
+    if [ $test_undo -lt 3 ]
        then
-   echo "checksum differs for $table before and after recovery"
-   exit 1;
+        apply_log "shouldchangelog" # should undo aborted work
+        else
+        # probably nothing to undo went to log or data file
+        apply_log "dontknow"
    fi
+    cp $table.MAD $tmp/$table.MAD.after_undo
+
+    # It is impossible to do a "cmp" between .good and .after_undo,
+    # because the UNDO phase generated log
+    # records whose LSN tagged pages. Another reason is that rolling back
+    # INSERT only marks the rows free, does not empty them (optimization), so
+    # traces of the INSERT+rollback remain.

-# When "recovery of the table's state" is ready, we can test it like this:
-#  diff $tmp/maria_chk_message.good.txt $tmp/maria_chk_message.txt > $tmp/maria_chk_diff.txt || true
-#  if [ -s $tmp/maria_chk_diff.txt ]
-#      then
-#      echo "Differences in maria_chk -dvv, recovery not yet perfect !"
-#      echo "========DIFF START======="
-#      cat $tmp/maria_chk_diff.txt
-#      echo "========DIFF END======="
-#  fi
-  rm -f $table.* $tmp/maria_chk_*.txt $tmp/maria_read_log_$table.txt
+    check_table_is_same
+    echo "testing idempotency"
+    apply_log "shouldnotchangelog"
+    cmp $table.MAD $tmp/$table.MAD.after_undo
+    check_table_is_same
+    echo "testing applying of CLRs to recreate table"
+    rm $table.MA?
+    apply_log "shouldnotchangelog"
+    # the cmp below fails with blobs! @todo RECOVERY BUG find out why.
+    # It is probably serious; REDOs shouldn't place rows in different
+    # positions from what the run-time code did. Indeed it may lead to
+    # more or less free space...
+    # Execution of UNDO re-inserted rows at different positions than
+    # originally. This generated REDOs which do not insert at the same
+    # positions as the execution of UNDOs, but at the same positions
+    # as before the row was originally deleted.
+    if [ "$blobs" == "" ]
+        then
+        cmp $table.MAD $tmp/$table.MAD.after_undo
+    fi
+    check_table_is_same
+    shift 3
+  done
+done
 done
+rm -f $table.* $tmp/$table* $tmp/maria_chk_*.txt $tmp/maria_read_log_$table.txt

+) > $tmp/ma_test_recovery.output
+
+diff $maria_path/ma_test_recovery.expected $tmp/ma_test_recovery.output > /dev/null || diff_failed=1
+if [ "$diff_failed" == "1" ]
+    then
+    echo "UNEXPECTED OUTPUT OF TESTS, FAILED"
+    echo "For more info, do diff $maria_path/ma_test_recovery.expected $tmp/ma_test_recovery.output"
+    exit 1
+    fi
 echo "ALL RECOVERY TESTS OK"
+# this message is to remember about the problem with -b (see @todo above)
+echo "!!!!!!!! BUT REMEMBER to FIX this BLOB issue !!!!!!!"
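
The decision made by the script's `apply_log` function (compare a digest of the log files before and after applying, then check it against what the caller expected) can be restated as a pure function. This is only a sketch in C of the shell logic above; `al_expect` and `al_check_log_change` are hypothetical names, and the digests are assumed to be NUL-terminated strings such as md5sum output.

```c
#include <string.h>

/* The three arguments accepted by the script's apply_log function. */
enum al_expect { AL_SHOULD_NOT_CHANGE, AL_SHOULD_CHANGE, AL_DONT_KNOW };

/*
  Returns 0 if the observed log change matches the expectation,
  1 otherwise. AL_DONT_KNOW accepts both outcomes.
*/
int al_check_log_change(enum al_expect expect,
                        const char *digest_before,
                        const char *digest_after)
{
  int changed = strcmp(digest_before, digest_after) != 0;

  if (changed && expect == AL_SHOULD_NOT_CHANGE)
    return 1;                  /* applying wrote to the log unexpectedly */
  if (!changed && expect == AL_SHOULD_CHANGE)
    return 1;                  /* applying should have written to the log */
  return 0;
}
```

In the script, a run of the REDO phase alone is called with "shouldnotchangelog", while a run that rolls back aborted work is called with "shouldchangelog", or "dontknow" when nothing was flushed before the abort.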
--- a/storage/maria/ma_test_recovery.expected
+++ b/storage/maria/ma_test_recovery.expected
--- a/storage/maria/maria_read_log.c
+++ b/storage/maria/maria_read_log.c
@@ -93,7 +93,8 @@ int main(int argc, char **argv)
  */

  fprintf(stdout, "TRACE of the last maria_read_log\n");
-  if (maria_apply_log(lsn, opt_display_and_apply, stdout, TRUE))
+  if (maria_apply_log(lsn, opt_display_and_apply, stdout,
+                      opt_display_and_apply))
    goto err;
  fprintf(stdout, "%s: SUCCESS\n", my_progname);