bug: Handling an Unexpected Halt of a Replica #1611

Open
adofsauron opened this issue Apr 23, 2023 · 44 comments
Labels
A-bug Something isn't working prio: high High priority

Comments

@adofsauron
Collaborator

Reference:

https://dev.mysql.com/doc/refman/8.0/en/replication-features-transaction-inconsistencies.html

@adofsauron adofsauron added A-feature feature with good idea prio: high High priority labels Apr 23, 2023
@adofsauron adofsauron self-assigned this Apr 23, 2023
@adofsauron adofsauron changed the title feature: Handling an Unexpected Halt of a Replica bug: Handling an Unexpected Halt of a Replica Apr 23, 2023
@adofsauron adofsauron added A-bug Something isn't working and removed A-feature feature with good idea labels Apr 23, 2023
@adofsauron
Collaborator Author

ACK

@adofsauron
Collaborator Author

Crash recovery is usually tied to transaction processing and relies on redundant logs: undo logs and redo logs.

---
LOG
---
Log sequence number 0 52087
Log flushed up to   0 52087
Last checkpoint at  0 52087
0 pending log writes, 0 pending chkp writes
20 log i/o's done, 0.00 log i/o's/second


@adofsauron
Collaborator Author

Applies the hashed log records to the page, if the page lsn is less than the
lsn of a log record. This can be called when a buffer page has just been
read in, or also for a page already in the buffer pool.
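A minimal sketch of the LSN test described above, using hypothetical simplified types (toy_page_t, toy_redo_rec_t and toy_apply_redo are illustration names, not InnoDB's). A record is applied only if it starts at or after the page LSN, the same comparison recv_recover_page() makes with ut_dulint_cmp(recv->start_lsn, page_lsn) >= 0, so replaying the same records a second time changes nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types: a page stores the LSN of its newest
   change; a redo record stores the LSN range it covers and the value it
   writes (a stand-in for a real physical change). */
typedef struct {
    uint64_t lsn;  /* newest modification LSN recorded in the page */
    int value;     /* stand-in for the page contents */
} toy_page_t;

typedef struct {
    uint64_t start_lsn; /* LSN at which the record starts */
    uint64_t end_lsn;   /* LSN just past the record */
    int new_value;      /* stand-in for the change the record describes */
} toy_redo_rec_t;

/* Applies, in order, the records (sorted by LSN) that the page does not yet
   reflect. Returns how many records were applied. */
size_t toy_apply_redo(toy_page_t *page, const toy_redo_rec_t *recs, size_t n)
{
    size_t applied = 0;
    assert(page != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Records that start before the page LSN are already on the page. */
        if (recs[i].start_lsn >= page->lsn) {
            page->value = recs[i].new_value;
            page->lsn = recs[i].end_lsn; /* stamp the page with the new LSN */
            applied++;
        }
    }
    return applied;
}
```

For a page at LSN 100 and records covering [50,60), [100,120) and [120,130), only the last two are applied and the page ends at LSN 130; a second call applies nothing.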

@adofsauron
Collaborator Author

Recovering from the crash point:

ibool recv_read_cp_info_for_backup(
    /*=========================*/
    /* out: TRUE if success */
    byte *hdr,        /* in: buffer containing the log group header */
    dulint *lsn,      /* out: checkpoint lsn */
    ulint *offset,    /* out: checkpoint offset in the log group */
    ulint *fsp_limit, /* out: fsp limit of space 0, 1000000000 if the
                    database is running with < version 3.23.50 of InnoDB */
    dulint *cp_no,    /* out: checkpoint number */
    dulint *first_header_lsn);

@adofsauron
Collaborator Author

Recovers from a checkpoint. When this function returns, the database is able
to start processing of new user transactions, but the function
recv_recovery_from_checkpoint_finish should be called later to complete
the recovery and free the resources used in it.

@adofsauron
Collaborator Author

The recovery process by which a replica recovers from an unexpected halt varies depending on the configuration of the replica. The details of the recovery process are influenced by the chosen method of replication, whether the replica is single-threaded or multithreaded, and the setting of relevant system variables. The overall aim of the recovery process is to identify what transactions had already been applied on the replica's database before the unexpected halt occurred, and retrieve and apply the transactions that the replica missed following the unexpected halt.

@adofsauron
Collaborator Author

For file position based replication, the recovery process needs an accurate replication SQL thread (applier) position showing the last transaction that was applied on the replica. Based on that position, the replication I/O thread (receiver) retrieves from the source's binary log all of the transactions that should be applied on the replica from that point on.

@adofsauron
Collaborator Author

The recovery process fails if gaps in the sequence of transactions cannot be filled using the information in the relay log. For a single-threaded replica, the recovery process only needs to use the relay log if the relevant information is not available in the applier metadata repository.
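To make the notion of a "gap" concrete, here is a sketch under a simplified model that is an assumption, not MySQL's on-disk format: the transactions after the last applier checkpoint are numbered in relay-log order, and a flag per transaction says whether a worker committed it before the halt. A gap is a missing transaction that precedes an executed one; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: transactions after the last applier checkpoint are
   numbered 0..n-1 in relay-log order, and done[i] says whether a worker
   committed transaction i before the halt. */

/* Returns true if an unexecuted transaction precedes an executed one. */
bool has_gaps(const bool *done, size_t n)
{
    bool seen_missing = false;
    assert(n == 0 || done != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!done[i])
            seen_missing = true;
        else if (seen_missing)
            return true; /* executed after a missing one: a gap */
    }
    return false;
}

/* Writes to out[] the indexes of missing transactions that lie before the
   last executed one (the gaps to fill from the relay log); returns the count. */
size_t collect_gaps(const bool *done, size_t n, size_t *out)
{
    size_t count = 0;
    size_t end = 0; /* one past the last executed transaction */
    assert(n == 0 || done != NULL);
    for (size_t i = 0; i < n; i++)
        if (done[i])
            end = i + 1;
    for (size_t i = 0; i < end; i++)
        if (!done[i])
            out[count++] = i;
    return count;
}
```

Missing transactions after the last executed one are not gaps in this model; they are simply the normal continuation that the applier resumes from.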

@adofsauron
Collaborator Author

Each rollback segment has a segment header page with 1024 slots (TRX_RSEG_N_SLOTS), and each slot corresponds to one undo log object. In theory, InnoDB therefore supports at most 96 * 1024 (98,304) ordinary concurrent transactions.

@adofsauron
Collaborator Author

Read the first log file header to print a note if this is a recovery from a restored Hot Backup

@adofsauron
Collaborator Author

Start reading the log groups from the checkpoint lsn up. The
variable contiguous_lsn contains an lsn up to which the log is
known to be contiguously written to all log groups.

@adofsauron
Collaborator Author

When a logical backup explicitly opens a read view and runs for a long time, purge cannot remove old row versions from the slave_relay_log_info table, so its version chain grows long. When the backup then reads slave_relay_log_info, building the old version takes a long time. Meanwhile the replication thread, which must update slave_relay_log_info, waits on the page latch. This can eventually make the semaphore wait time out, at which point the instance intentionally crashes itself.

@adofsauron
Collaborator Author

When the instance recovers from a crash, the transactions that were still in flight must be reconstructed from the undo logs. A transaction in the ACTIVE state is rolled back directly. A transaction in the PREPARED state is committed if its binlog events have been written; otherwise it is rolled back.
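The decision rule above fits in one function. This is a sketch with hypothetical enum and function names; in MySQL the "is it in the binlog" check matches the transaction's XID against the binlog, which is reduced here to a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch only. */
typedef enum { TRX_STATE_ACTIVE, TRX_STATE_PREPARED, TRX_STATE_COMMITTED } trx_state_t;
typedef enum { RECOVER_ROLLBACK, RECOVER_COMMIT, RECOVER_NOTHING } recover_action_t;

/* binlog_has_trx: whether the transaction's events were found in the binlog. */
recover_action_t recover_decision(trx_state_t state, bool binlog_has_trx)
{
    switch (state) {
    case TRX_STATE_ACTIVE:
        /* Never reached the prepare phase: undo it. */
        return RECOVER_ROLLBACK;
    case TRX_STATE_PREPARED:
        /* Prepared in the engine: the binlog decides the outcome. */
        return binlog_has_trx ? RECOVER_COMMIT : RECOVER_ROLLBACK;
    case TRX_STATE_COMMITTED:
    default:
        /* Already durable: nothing to do. */
        return RECOVER_NOTHING;
    }
}
```

Making the binlog the tie-breaker for prepared transactions keeps the engine and the binlog (and therefore the replicas) consistent.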

@adofsauron
Collaborator Author

struct log_group_struct
{
  /* The following fields are protected by log_sys->mutex */
  ulint id;                /* log group id */
  ulint n_files;           /* number of files in the group */
  ulint file_size;         /* individual log file size in bytes,
                           including the log file header */
  ulint space_id;          /* file space which implements the log
                           group */
  ulint state;             /* LOG_GROUP_OK or
                           LOG_GROUP_CORRUPTED */
  dulint lsn;              /* lsn used to fix coordinates within
                           the log group */
  ulint lsn_offset;        /* the offset of the above lsn */
  ulint n_pending_writes;  /* number of currently pending flush
                          writes for this log group */
  byte **file_header_bufs; /* buffers for each file header in the
                          group */
  /*-----------------------------*/
  byte **archive_file_header_bufs; /* buffers for each file
                          header in the group */
  ulint archive_space_id;          /* file space which implements the log
                                  group archive */
  ulint archived_file_no;          /* file number corresponding to
                                  log_sys->archived_lsn */
  ulint archived_offset;           /* file offset corresponding to
                                   log_sys->archived_lsn, 0 if we have
                                   not yet written to the archive file
                                   number archived_file_no */
  ulint next_archived_file_no;     /* during an archive write,
                             until the write is completed, we
                             store the next value for
                             archived_file_no here: the write
                             completion function then sets the new
                             value to ..._file_no */
  ulint next_archived_offset;      /* like the preceding field */
  /*-----------------------------*/
  dulint scanned_lsn;   /* used only in recovery: recovery scan
                        succeeded up to this lsn in this log
                        group */
  byte *checkpoint_buf; /* checkpoint header is written from
                        this buffer to the group */
  UT_LIST_NODE_T(log_group_t)
  log_groups; /* list of log groups */
};

@adofsauron
Collaborator Author

When the server restarts for crash recovery, it reads the LSN recorded in the checkpoint and scans the redo log starting from that LSN.
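A minimal sketch of that scan, assuming records sorted by LSN and a checkpoint that falls on a record boundary (toy_log_rec_t and scan_from_checkpoint are hypothetical names). Records ending at or before the checkpoint are skipped; scanning stops at the first record that does not start where the previous one ended, which this toy treats as the end of the valid log, similar in spirit to the contiguous_lsn idea quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t start_lsn; /* LSN at which the record starts */
    uint64_t end_lsn;   /* LSN just past the record */
} toy_log_rec_t;

/* Returns the number of records to replay; *scanned_lsn receives the LSN up
   to which the log was contiguous from the checkpoint. */
size_t scan_from_checkpoint(const toy_log_rec_t *recs, size_t n,
                            uint64_t checkpoint_lsn, uint64_t *scanned_lsn)
{
    size_t count = 0;
    uint64_t expected = checkpoint_lsn; /* where the next record must start */
    assert(scanned_lsn != NULL);
    for (size_t i = 0; i < n; i++) {
        if (recs[i].end_lsn <= checkpoint_lsn)
            continue; /* already covered by the checkpoint */
        if (recs[i].start_lsn != expected)
            break;    /* discontinuity: treat as end of the valid log */
        expected = recs[i].end_lsn;
        count++;
    }
    *scanned_lsn = expected;
    return count;
}
```

With records [0,10), [10,25), [25,40), [50,60) and a checkpoint at 10, two records are replayed and the scan stops at LSN 40.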

@adofsauron
Collaborator Author

For every four external storage pages, check whether enough redo log space is available; if it is not, advance the checkpoint LSN.

@adofsauron
Collaborator Author

If we are using the doublewrite method, we will
check if there are half-written pages in data files,
and restore them from the doublewrite buffer if
possible
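A sketch of the doublewrite restore step, assuming a tiny page that carries its own checksum. The page layout, the FNV-1a checksum and all names here are illustrative assumptions, not InnoDB's page format or checksum algorithm. A data-file page whose checksum fails is treated as half-written and is overwritten with the doublewrite copy, provided that copy is itself intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16 /* tiny page for illustration only */

typedef struct {
    uint8_t data[TOY_PAGE_SIZE];
    uint32_t checksum; /* checksum of data[] when the page was written */
} dw_page_t;

/* 32-bit FNV-1a, used here only as a stand-in page checksum. */
static uint32_t toy_checksum(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

void dw_page_seal(dw_page_t *pg)
{
    pg->checksum = toy_checksum(pg->data, TOY_PAGE_SIZE);
}

bool page_is_intact(const dw_page_t *pg)
{
    return toy_checksum(pg->data, TOY_PAGE_SIZE) == pg->checksum;
}

/* Returns true if the data-file page is intact after the call. */
bool restore_from_doublewrite(dw_page_t *file_page, const dw_page_t *dw_copy)
{
    if (page_is_intact(file_page))
        return true;  /* no torn write: leave the page alone */
    if (!page_is_intact(dw_copy))
        return false; /* both copies damaged: cannot repair here */
    memcpy(file_page, dw_copy, sizeof *file_page);
    return true;
}
```

The design relies on the doublewrite copy being written and synced before the in-place write, so at most one of the two copies can be torn by a crash.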

@adofsauron
Collaborator Author

An MLOG_CHECKPOINT record stores a checkpoint LSN. When the LSN stored in such a record equals the checkpoint LSN recorded in the log file header, the matching MLOG_CHECKPOINT has been found, and the LSN scanned so far is saved to recv_sys->mlog_checkpoint_lsn.
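A sketch of that matching step with hypothetical record types (TOY_REC_DATA, TOY_REC_CHECKPOINT and find_mlog_checkpoint are illustration names). The function returns the LSN reached after parsing the matching checkpoint record, which plays the role of recv_sys->mlog_checkpoint_lsn, and 0 if no record names the header's checkpoint LSN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record types for this sketch only. */
enum { TOY_REC_DATA = 0, TOY_REC_CHECKPOINT = 1 };

typedef struct {
    int type;         /* TOY_REC_DATA or TOY_REC_CHECKPOINT */
    uint64_t payload; /* for TOY_REC_CHECKPOINT: the checkpoint LSN it names */
    uint64_t end_lsn; /* LSN just past this record */
} toy_rec_t;

/* Returns the scanned LSN at the checkpoint record that names
   header_checkpoint_lsn, or 0 if no such record exists. */
uint64_t find_mlog_checkpoint(const toy_rec_t *recs, size_t n,
                              uint64_t header_checkpoint_lsn)
{
    assert(n == 0 || recs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (recs[i].type == TOY_REC_CHECKPOINT &&
            recs[i].payload == header_checkpoint_lsn)
            return recs[i].end_lsn; /* analogous to mlog_checkpoint_lsn */
    }
    return 0;
}
```

Checkpoint records naming older checkpoints are skipped; only the one that matches the header counts.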

@adofsauron
Collaborator Author

Parsed log records are stored in a hash table whose hash value is computed from the space id and page no; changes to the same page are chained together as list nodes.
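A minimal sketch of that structure, loosely modeled on recv_addr_t and its rec_list but with hypothetical names and a made-up fold function. Each bucket holds per-page entries keyed on (space, page_no), and each entry keeps its records in parse order. Freeing memory is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define N_BUCKETS 64

typedef struct page_rec {
    uint64_t start_lsn;
    struct page_rec *next;       /* next record for the same page */
} page_rec_t;

typedef struct page_addr {
    uint32_t space;
    uint32_t page_no;
    page_rec_t *head, *tail;     /* this page's records, in parse order */
    struct page_addr *hash_next; /* next entry in the same bucket */
} page_addr_t;

typedef struct {
    page_addr_t *buckets[N_BUCKETS];
} recv_hash_t;

/* Mixes (space, page_no) into a bucket index; any reasonable mix works here. */
static size_t fold(uint32_t space, uint32_t page_no)
{
    return (((size_t)space << 20) + space + page_no) % N_BUCKETS;
}

page_addr_t *recv_hash_find(const recv_hash_t *h, uint32_t space, uint32_t page_no)
{
    for (page_addr_t *a = h->buckets[fold(space, page_no)]; a; a = a->hash_next)
        if (a->space == space && a->page_no == page_no)
            return a;
    return NULL;
}

/* Appends a parsed record to its page's list, creating the page entry on
   first use. Returns 0 on success, -1 if memory runs out. */
int recv_hash_add(recv_hash_t *h, uint32_t space, uint32_t page_no, uint64_t start_lsn)
{
    page_addr_t *a = recv_hash_find(h, space, page_no);
    if (a == NULL) {
        size_t b = fold(space, page_no);
        a = calloc(1, sizeof *a);
        if (a == NULL)
            return -1;
        a->space = space;
        a->page_no = page_no;
        a->hash_next = h->buckets[b];
        h->buckets[b] = a;
    }
    page_rec_t *r = calloc(1, sizeof *r);
    if (r == NULL)
        return -1;
    r->start_lsn = start_lsn;
    if (a->tail != NULL)
        a->tail->next = r;
    else
        a->head = r;
    a->tail = r;
    return 0;
}

/* Counts the records queued for one page. */
size_t recv_hash_count(const recv_hash_t *h, uint32_t space, uint32_t page_no)
{
    size_t n = 0;
    const page_addr_t *a = recv_hash_find(h, space, page_no);
    for (const page_rec_t *r = a ? a->head : NULL; r; r = r->next)
        n++;
    assert(a != NULL || n == 0);
    return n;
}
```

Grouping by page means that when a page is read in, all of its pending changes can be applied in one pass, which is what recv_recover_page() does with recv_addr->rec_list.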

@adofsauron
Collaborator Author

struct recv_addr_struct
{
  ulint state;   /* RECV_NOT_PROCESSED, RECV_BEING_PROCESSED,
                 or RECV_PROCESSED */
  ulint space;   /* space id */
  ulint page_no; /* page number */
  UT_LIST_BASE_NODE_T(recv_t)
  rec_list; /* list of log records for this page */
  hash_node_t addr_hash;
};

/* Recovery system data structure */
typedef struct recv_sys_struct recv_sys_t;
struct recv_sys_struct
{
  mutex_t mutex; /* mutex protecting the fields apply_log_recs,
                 n_addrs, and the state field in each recv_addr
                 struct */
  ibool apply_log_recs;
  /* this is TRUE when log rec application to
  pages is allowed; this flag tells the
  i/o-handler if it should do log record
  application */
  ibool apply_batch_on;
  /* this is TRUE when a log rec application
  batch is running */
  dulint lsn; /* log sequence number */
  ulint last_log_buf_size;
  /* size of the log buffer when the database
  last time wrote to the log */
  byte *last_block;
  /* possible incomplete last recovered log
  block */
  byte *last_block_buf_start;
  /* the nonaligned start address of the
  preceding buffer */
  byte *buf; /* buffer for parsing log records */
  ulint len; /* amount of data in buf */
  dulint parse_start_lsn;
  /* this is the lsn from which we were able to
  start parsing log records and adding them to
  the hash table; ut_dulint_zero if a suitable
  start point not found yet */
  dulint scanned_lsn;
  /* the log data has been scanned up to this
  lsn */
  ulint scanned_checkpoint_no;
  /* the log data has been scanned up to this
  checkpoint number (lowest 4 bytes) */
  ulint recovered_offset;
  /* start offset of non-parsed log records in
  buf */
  dulint recovered_lsn;
  /* the log records have been parsed up to
  this lsn */
  dulint limit_lsn; /* recovery should be made at most up to this
                  lsn */
  ibool found_corrupt_log;
  /* this is set to TRUE if we during log
  scan find a corrupt log block, or a corrupt
  log record, or there is a log parsing
  buffer overflow */
  log_group_t *archive_group;
  /* in archive recovery: the log group whose
  archive is read */
  mem_heap_t *heap;        /* memory heap of log records and file
                           addresses*/
  hash_table_t *addr_hash; /* hash table of file addresses of pages */
  ulint n_addrs;           /* number of not processed hashed file
                           addresses in the hash table */
};

@adofsauron
Collaborator Author

void recv_recover_page(
    /*==============*/
    ibool recover_backup, /* in: TRUE if we are recovering a backup
                          page: then we do not acquire any latches
                          since the page was read in outside the
                          buffer pool */
    ibool just_read_in,   /* in: TRUE if the i/o-handler calls this for
                          a freshly read page */
    page_t *page,         /* in: buffer page */
    ulint space,          /* in: space id */
    ulint page_no)        /* in: page number */
{
  buf_block_t *block = NULL;
  recv_addr_t *recv_addr;
  recv_t *recv;
  byte *buf;
  dulint start_lsn;
  dulint end_lsn;
  dulint page_lsn;
  dulint page_newest_lsn;
  ibool modification_to_page;
  ibool success;
  mtr_t mtr;

  mutex_enter(&(recv_sys->mutex));

  if (recv_sys->apply_log_recs == FALSE)
  {
    /* Log records should not be applied now */

    mutex_exit(&(recv_sys->mutex));

    return;
  }

  recv_addr = recv_get_fil_addr_struct(space, page_no);

  if ((recv_addr == NULL) || (recv_addr->state == RECV_BEING_PROCESSED) || (recv_addr->state == RECV_PROCESSED))
  {
    mutex_exit(&(recv_sys->mutex));

    return;
  }

  /* fprintf(stderr, "Recovering space %lu, page %lu\n", space, page_no); */

  recv_addr->state = RECV_BEING_PROCESSED;

  mutex_exit(&(recv_sys->mutex));

  mtr_start(&mtr);
  mtr_set_log_mode(&mtr, MTR_LOG_NONE);

  if (!recover_backup)
  {
    block = buf_block_align(page);

    if (just_read_in)
    {
      /* Move the ownership of the x-latch on the page to this OS
      thread, so that we can acquire a second x-latch on it. This
      is needed for the operations to the page to pass the debug
      checks. */

      rw_lock_x_lock_move_ownership(&(block->lock));
    }

    success = buf_page_get_known_nowait(RW_X_LATCH, page, BUF_KEEP_OLD, __FILE__, __LINE__, &mtr);
    ut_a(success);

#ifdef UNIV_SYNC_DEBUG
    buf_page_dbg_add_level(page, SYNC_NO_ORDER_CHECK);
#endif /* UNIV_SYNC_DEBUG */
  }

  /* Read the newest modification lsn from the page */
  page_lsn = mach_read_from_8(page + FIL_PAGE_LSN);

  if (!recover_backup)
  {
    /* It may be that the page has been modified in the buffer
    pool: read the newest modification lsn there */

    page_newest_lsn = buf_frame_get_newest_modification(page);

    if (!ut_dulint_is_zero(page_newest_lsn))
    {
      page_lsn = page_newest_lsn;
    }
  }
  else
  {
    /* In recovery from a backup we do not really use the buffer
    pool */

    page_newest_lsn = ut_dulint_zero;
  }

  modification_to_page = FALSE;
  start_lsn = end_lsn = ut_dulint_zero;

  recv = UT_LIST_GET_FIRST(recv_addr->rec_list);

  while (recv)
  {
    end_lsn = recv->end_lsn;

    if (recv->len > RECV_DATA_BLOCK_SIZE)
    {
      /* We have to copy the record body to a separate
      buffer */

      buf = mem_alloc(recv->len);

      recv_data_copy_to_buf(buf, recv);
    }
    else
    {
      buf = ((byte *)(recv->data)) + sizeof(recv_data_t);
    }

    if (recv->type == MLOG_INIT_FILE_PAGE)
    {
      page_lsn = page_newest_lsn;

      mach_write_to_8(page + UNIV_PAGE_SIZE - FIL_PAGE_END_LSN_OLD_CHKSUM, ut_dulint_zero);
      mach_write_to_8(page + FIL_PAGE_LSN, ut_dulint_zero);
    }

    if (ut_dulint_cmp(recv->start_lsn, page_lsn) >= 0)
    {
      if (!modification_to_page)
      {
        modification_to_page = TRUE;
        start_lsn = recv->start_lsn;
      }

#ifdef UNIV_DEBUG
      if (log_debug_writes)
      {
        fprintf(stderr, "InnoDB: Applying log rec type %lu len %lu to space %lu page no %lu\n", (ulong)recv->type,
                (ulong)recv->len, (ulong)recv_addr->space, (ulong)recv_addr->page_no);
      }
#endif /* UNIV_DEBUG */

      recv_parse_or_apply_log_rec_body(recv->type, buf, buf + recv->len, page, &mtr);
      mach_write_to_8(page + UNIV_PAGE_SIZE - FIL_PAGE_END_LSN_OLD_CHKSUM, ut_dulint_add(recv->start_lsn, recv->len));
      mach_write_to_8(page + FIL_PAGE_LSN, ut_dulint_add(recv->start_lsn, recv->len));
    }

    if (recv->len > RECV_DATA_BLOCK_SIZE)
    {
      mem_free(buf);
    }

    recv = UT_LIST_GET_NEXT(rec_list, recv);
  }

  mutex_enter(&(recv_sys->mutex));

  if (ut_dulint_cmp(recv_max_page_lsn, page_lsn) < 0)
  {
    recv_max_page_lsn = page_lsn;
  }

  recv_addr->state = RECV_PROCESSED;

  ut_a(recv_sys->n_addrs);
  recv_sys->n_addrs--;

  mutex_exit(&(recv_sys->mutex));

  if (!recover_backup && modification_to_page)
  {
    ut_a(block);

    buf_flush_recv_note_modification(block, start_lsn, end_lsn);
  }

  /* Make sure that committing mtr does not change the modification
  lsn values of page */

  mtr.modifications = FALSE;

  mtr_commit(&mtr);
}

@adofsauron
Collaborator Author

static byte *recv_parse_or_apply_log_rec_body(
    /*=============================*/
    /* out: log record end, NULL if not a complete
    record */
    byte type,     /* in: type */
    byte *ptr,     /* in: pointer to a buffer */
    byte *end_ptr, /* in: pointer to the buffer end */
    page_t *page,  /* in: buffer page or NULL; if not NULL, then the log
                   record is applied to the page, and the log record
                   should be complete then */
    mtr_t *mtr)    /* in: mtr or NULL; should be non-NULL if and only if
                   page is non-NULL */
{
  dict_index_t *index = NULL;

  switch (type)
  {
    case MLOG_1BYTE:
    case MLOG_2BYTES:
    case MLOG_4BYTES:
    case MLOG_8BYTES:
      ptr = mlog_parse_nbytes(type, ptr, end_ptr, page);
      break;
    case MLOG_REC_INSERT:
    case MLOG_COMP_REC_INSERT:
      if (NULL != (ptr = mlog_parse_index(ptr, end_ptr, type == MLOG_COMP_REC_INSERT, &index)))
      {
        ut_a(!page || (ibool) !!page_is_comp(page) == index->table->comp);
        ptr = page_cur_parse_insert_rec(FALSE, ptr, end_ptr, index, page, mtr);
      }
      break;
    case MLOG_REC_CLUST_DELETE_MARK:
    case MLOG_COMP_REC_CLUST_DELETE_MARK:
      if (NULL != (ptr = mlog_parse_index(ptr, end_ptr, type == MLOG_COMP_REC_CLUST_DELETE_MARK, &index)))
      {
        ut_a(!page || (ibool) !!page_is_comp(page) == index->table->comp);
        ptr = btr_cur_parse_del_mark_set_clust_rec(ptr, end_ptr, index, page);
      }
      break;
    case MLOG_COMP_REC_SEC_DELETE_MARK:
      /* This log record type is obsolete, but we process it for
      backward compatibility with MySQL 5.0.3 and 5.0.4. */
      ut_a(!page || page_is_comp(page));
      ptr = mlog_parse_index(ptr, end_ptr, TRUE, &index);
      if (!ptr)
      {
        break;
      }
      /* Fall through */
    case MLOG_REC_SEC_DELETE_MARK:
      ptr = btr_cur_parse_del_mark_set_sec_rec(ptr, end_ptr, page);
      break;
    case MLOG_REC_UPDATE_IN_PLACE:
    case MLOG_COMP_REC_UPDATE_IN_PLACE:
      if (NULL != (ptr = mlog_parse_index(ptr, end_ptr, type == MLOG_COMP_REC_UPDATE_IN_PLACE, &index)))
      {
        ut_a(!page || (ibool) !!page_is_comp(page) == index->table->comp);
        ptr = btr_cur_parse_update_in_place(ptr, end_ptr, page, index);
      }
      break;
    case MLOG_LIST_END_DELETE:
    case MLOG_COMP_LIST_END_DELETE:
    case MLOG_LIST_START_DELETE:
    case MLOG_COMP_LIST_START_DELETE:
      if (NULL != (ptr = mlog_parse_index(
                       ptr, end_ptr, type == MLOG_COMP_LIST_END_DELETE || type == MLOG_COMP_LIST_START_DELETE, &index)))
      {
        ut_a(!page || (ibool) !!page_is_comp(page) == index->table->comp);
        ptr = page_parse_delete_rec_list(type, ptr, end_ptr, index, page, mtr);
      }
      break;
    case MLOG_LIST_END_COPY_CREATED:
    case MLOG_COMP_LIST_END_COPY_CREATED:
      if (NULL != (ptr = mlog_parse_index(ptr, end_ptr, type == MLOG_COMP_LIST_END_COPY_CREATED, &index)))
      {
        ut_a(!page || (ibool) !!page_is_comp(page) == index->table->comp);
        ptr = page_parse_copy_rec_list_to_created_page(ptr, end_ptr, index, page, mtr);
      }
      break;
    case MLOG_PAGE_REORGANIZE:
    case MLOG_COMP_PAGE_REORGANIZE:
      if (NULL != (ptr = mlog_parse_index(ptr, end_ptr, type == MLOG_COMP_PAGE_REORGANIZE, &index)))
      {
        ut_a(!page || (ibool) !!page_is_comp(page) == index->table->comp);
        ptr = btr_parse_page_reorganize(ptr, end_ptr, index, page, mtr);
      }
      break;
    case MLOG_PAGE_CREATE:
    case MLOG_COMP_PAGE_CREATE:
      ptr = page_parse_create(ptr, end_ptr, type == MLOG_COMP_PAGE_CREATE, page, mtr);
      break;
    case MLOG_UNDO_INSERT:
      ptr = trx_undo_parse_add_undo_rec(ptr, end_ptr, page);
      break;
    case MLOG_UNDO_ERASE_END:
      ptr = trx_undo_parse_erase_page_end(ptr, end_ptr, page, mtr);
      break;
    case MLOG_UNDO_INIT:
      ptr = trx_undo_parse_page_init(ptr, end_ptr, page, mtr);
      break;
    case MLOG_UNDO_HDR_DISCARD:
      ptr = trx_undo_parse_discard_latest(ptr, end_ptr, page, mtr);
      break;
    case MLOG_UNDO_HDR_CREATE:
    case MLOG_UNDO_HDR_REUSE:
      ptr = trx_undo_parse_page_header(type, ptr, end_ptr, page, mtr);
      break;
    case MLOG_REC_MIN_MARK:
    case MLOG_COMP_REC_MIN_MARK:
      ptr = btr_parse_set_min_rec_mark(ptr, end_ptr, type == MLOG_COMP_REC_MIN_MARK, page, mtr);
      break;
    case MLOG_REC_DELETE:
    case MLOG_COMP_REC_DELETE:
      if (NULL != (ptr = mlog_parse_index(ptr, end_ptr, type == MLOG_COMP_REC_DELETE, &index)))
      {
        ut_a(!page || (ibool) !!page_is_comp(page) == index->table->comp);
        ptr = page_cur_parse_delete_rec(ptr, end_ptr, index, page, mtr);
      }
      break;
    case MLOG_IBUF_BITMAP_INIT:
      ptr = ibuf_parse_bitmap_init(ptr, end_ptr, page, mtr);
      break;
    case MLOG_INIT_FILE_PAGE:
      ptr = fsp_parse_init_file_page(ptr, end_ptr, page);
      break;
    case MLOG_WRITE_STRING:
      ptr = mlog_parse_string(ptr, end_ptr, page);
      break;
    case MLOG_FILE_CREATE:
    case MLOG_FILE_RENAME:
    case MLOG_FILE_DELETE:
      ptr = fil_op_log_parse_or_replay(ptr, end_ptr, type, FALSE, ULINT_UNDEFINED);
      break;
    default:
      ptr = NULL;
      recv_sys->found_corrupt_log = TRUE;
  }

  ut_ad(!page || ptr);
  if (index)
  {
    dict_table_t *table = index->table;
    mem_heap_free(index->heap);
    mutex_free(&(table->autoinc_mutex));
    mem_heap_free(table->heap);
  }

  return (ptr);
}

@adofsauron
Collaborator Author

recv_parse_log_recs
    --> recv_parse_log_rec
        --> recv_parse_or_apply_log_rec_body

@adofsauron
Collaborator Author

If the primary database crashes before its binlog has been delivered to the secondary, promoting the secondary directly to primary leaves the two databases inconsistent. The old primary must then be rebuilt from the new primary to restore a consistent state.

@adofsauron
Collaborator Author


cnt:1 bzize:16384 totalsize:212992 cursize:16384
cnt:2 bzize:16384 totalsize:212992 cursize:32768
cnt:3 bzize:16384 totalsize:212992 cursize:49152
cnt:4 bzize:16384 totalsize:212992 cursize:65536
hint2
index_id:37 level:1 next_offset:4294967295 offset:3
cnt:5 bzize:16384 totalsize:212992 cursize:81920
hint1
index_id:37 level:0 next_offset:5 offset:4
cnt:6 bzize:16384 totalsize:212992 cursize:98304
hint1
index_id:37 level:0 next_offset:6 offset:5
cnt:7 bzize:16384 totalsize:212992 cursize:114688
hint1
index_id:37 level:0 next_offset:7 offset:6
cnt:8 bzize:16384 totalsize:212992 cursize:131072
hint1
index_id:37 level:0 next_offset:8 offset:7
cnt:9 bzize:16384 totalsize:212992 cursize:147456
hint1
index_id:37 level:0 next_offset:9 offset:8
cnt:10 bzize:16384 totalsize:212992 cursize:163840
hint1
index_id:37 level:0 next_offset:10 offset:9
cnt:11 bzize:16384 totalsize:212992 cursize:180224
hint1
index_id:37 level:0 next_offset:11 offset:10
cnt:12 bzize:16384 totalsize:212992 cursize:196608
hint1
index_id:37 level:0 next_offset:4294967295 offset:11
cnt:13 bzize:16384 totalsize:212992 cursize:212992
===INDEX_ID:37
level1 total block is (1)
block_no:         3,level:   1|*|
level0 total block is (8)
block_no:         4,level:   0|*|block_no:         5,level:   0|*|block_no:         6,level:   0|*|
block_no:         7,level:   0|*|block_no:         8,level:   0|*|block_no:         9,level:   0|*|
block_no:        10,level:   0|*|block_no:        11,level:   0|*|


@adofsauron
Collaborator Author


block_no:3          space_id:20           index_id:37          
slot_nums:3         heaps_rows:10         n_rows:8         
heap_top:224        del_bytes:0           last_ins_offset:216        
page_dir:2          page_n_dir:7          
leaf_inode_space:20         leaf_inode_pag_no:2         
leaf_inode_offset:242       
no_leaf_inode_space:20      no_leaf_inode_pag_no:2         
no_leaf_inode_offset:50        
last_modify_lsn:3011837
page_type:B+_TREE level:1     

@adofsauron
Collaborator Author

If the operating system, the storage subsystem, or the mysqld process exits unexpectedly in the middle of a page write, InnoDB can find a good copy of the page in the doublewrite buffer during crash recovery.

@adofsauron
Collaborator Author

Undo logs exist within undo log segments, which are contained within rollback segments. Rollback segments reside in the system tablespace, in undo tablespaces, and in the temporary tablespace

@adofsauron
Collaborator Author

The number of transactions supported by a rollback segment depends on the number of undo slots in the rollback segment and the number of undo logs required per transaction

@adofsauron
Collaborator Author

Transactions that perform INSERT, UPDATE, and DELETE operations on regular and temporary tables require a full allocation of four undo logs. Transactions that perform INSERT operations only on regular tables require a single undo log
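The capacity arithmetic can be sketched under a simplified model, which is an assumption: all of a transaction's undo logs come from the single rollback segment assigned to it, and every slot is usable. The capacity is then the number of rollback segments times the number of transactions that fit in one segment's slots. The function name is hypothetical.

```c
#include <assert.h>

/* Upper bound on concurrent transactions under the simplified model:
   each transaction holds undo_logs_per_trx slots in one rollback segment
   for its whole lifetime. */
unsigned long max_concurrent_trx(unsigned long n_rsegs,
                                 unsigned long slots_per_rseg,
                                 unsigned long undo_logs_per_trx)
{
    assert(undo_logs_per_trx > 0);
    /* Transactions that fit in one segment, times the number of segments. */
    return n_rsegs * (slots_per_rseg / undo_logs_per_trx);
}
```

With 96 rollback segments of 1024 slots, INSERT-only transactions on regular tables (one undo log each) give 98,304, while transactions needing all four undo logs give 96 * 256 = 24,576.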

@adofsauron adofsauron assigned adofsauron and unassigned adofsauron Apr 27, 2023
@adofsauron
Collaborator Author

On a RocksDB-based system, recovery also has to be coordinated with the master database.

@adofsauron
Collaborator Author

class PageHeader {
  friend NdbOut& operator<<(NdbOut&, const PageHeader&);
public:
  bool check();
  Uint32 getLogRecordSize();
  bool lastPage();
  Uint32 lastWord();
protected:
  Uint32 m_checksum;
  Uint32 m_lap;
  Uint32 m_max_gci_completed;
  Uint32 m_max_gci_started;
  Uint32 m_next_page;
  Uint32 m_previous_page;
  Uint32 m_ndb_version;
  Uint32 m_number_of_logfiles;
  Uint32 m_current_page_index;
  Uint32 m_old_prepare_file_number;
  Uint32 m_old_prepare_page_reference;
  Uint32 m_dirty_flag;
  /* Debug info Start */
  Uint32 m_log_timer;
  Uint32 m_page_i_value;
  Uint32 m_place_written_from;
  Uint32 m_page_no;
  Uint32 m_file_no;
  Uint32 m_word_written;
  Uint32 m_in_writing_flag;
  Uint32 m_prev_page_no;
  Uint32 m_in_free_list;
  /* Debug info End */
};

@adofsauron
Collaborator Author

ibool trx_start_low(
    /*==========*/
    /* out: TRUE */
    trx_t *trx,    /* in: transaction */
    ulint rseg_id) /* in: rollback segment id; if ULINT_UNDEFINED,
                   one is chosen by trx_assign_rseg() */
{
  trx_rseg_t *rseg;

#ifdef UNIV_SYNC_DEBUG
  ut_ad(mutex_own(&kernel_mutex));
#endif /* UNIV_SYNC_DEBUG */
  ut_ad(trx->rseg == NULL);

  if (trx->type == TRX_PURGE)
  {
    trx->id = ut_dulint_zero;
    trx->conc_state = TRX_ACTIVE;
    trx->start_time = time(NULL);

    return (TRUE);
  }

  ut_ad(trx->conc_state != TRX_ACTIVE);

  if (rseg_id == ULINT_UNDEFINED)
  {
    rseg_id = trx_assign_rseg();
  }

  rseg = trx_sys_get_nth_rseg(trx_sys, rseg_id);

  trx->id = trx_sys_get_new_trx_id();

  /* The initial value for trx->no: ut_dulint_max is used in
  read_view_open_now: */

  trx->no = ut_dulint_max;

  trx->rseg = rseg;

  trx->conc_state = TRX_ACTIVE;
  trx->start_time = time(NULL);

  UT_LIST_ADD_FIRST(trx_list, trx_sys->trx_list, trx);

  return (TRUE);
}

@adofsauron
Collaborator Author

Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 6; hex 000000000200; asc ;; 1: len 6; hex 000000000505; asc ;; 2: len 7; hex 800000002d0110; asc - ;; 3: len 4; hex 80000001; asc ;; 4: len 4; hex 80000003; asc ;;

@adofsauron
Collaborator Author

enum class WALType : uint8_t {
  INVALID = 0,
  // -----------------------------
  // Catalog
  // -----------------------------
  CREATE_TABLE = 1,
  DROP_TABLE = 2,

  CREATE_SCHEMA = 3,
  DROP_SCHEMA = 4,

  CREATE_VIEW = 5,
  DROP_VIEW = 6,

  CREATE_SEQUENCE = 8,
  DROP_SEQUENCE = 9,
  SEQUENCE_VALUE = 10,

  CREATE_MACRO = 11,
  DROP_MACRO = 12,

  CREATE_TYPE = 13,
  DROP_TYPE = 14,

  ALTER_INFO = 20,

  CREATE_TABLE_MACRO = 21,
  DROP_TABLE_MACRO = 22,

  CREATE_INDEX = 23,
  DROP_INDEX = 24,

  // -----------------------------
  // Data
  // -----------------------------
  USE_TABLE = 25,
  INSERT_TUPLE = 26,
  DELETE_TUPLE = 27,
  UPDATE_TUPLE = 28,
  // -----------------------------
  // Flush
  // -----------------------------
  CHECKPOINT = 99,
  WAL_FLUSH = 100
};

@adofsauron
Collaborator Author

FieldWriter writer(serializer);
writer.WriteString(GetSchemaName());
writer.WriteString(GetTableName());
writer.WriteString(name);
writer.WriteString(sql);
writer.WriteField(index->type);
writer.WriteField(index->constraint_type);
writer.WriteSerializableList(expressions);
writer.WriteSerializableList(parsed_expressions);
writer.WriteList<idx_t>(index->column_ids);
writer.Finalize();

@adofsauron
Collaborator Author

	while (true) {
		// read the current entry
		WALType entry_type = initial_reader->Read<WALType>();
		if (entry_type == WALType::WAL_FLUSH) {
			// check if the file is exhausted
			if (initial_reader->Finished()) {
				// we finished reading the file: break
				break;
			}
		} else {
			// replay the entry
			checkpoint_state.ReplayEntry(entry_type);
		}
	}

@adofsauron
Collaborator Author

If there is a checkpoint flag, the contents of the WAL may already have been flushed to disk.
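The two-pass pattern behind this can be sketched in C. This is a general illustration, not the actual replay code of the WALType list and reader loop above: the WAL is reduced to an array of entry types, a first pass looks for a checkpoint entry, and `checkpoint_on_disk` is a hypothetical flag meaning the database file already contains that checkpoint. Otherwise, entries are buffered and committed at each WAL_FLUSH marker; entries after the last marker belong to an unfinished commit and are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A few values borrowed from the WALType list; the rest are omitted. */
typedef enum {
    WAL_INSERT_TUPLE = 26,
    WAL_DELETE_TUPLE = 27,
    WAL_CHECKPOINT = 99,
    WAL_FLUSH = 100
} wal_type_t;

/* Returns the number of committed data entries that were replayed. */
size_t wal_replay(const wal_type_t *entries, size_t n, bool checkpoint_on_disk)
{
    assert(n == 0 || entries != NULL);

    /* First pass: if the checkpoint recorded in the WAL is already in the
       database file, the WAL contents were flushed and replay is skipped. */
    for (size_t i = 0; i < n; i++)
        if (entries[i] == WAL_CHECKPOINT && checkpoint_on_disk)
            return 0;

    /* Second pass: replay, committing at each flush marker. */
    size_t committed = 0, pending = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i] == WAL_FLUSH) {
            committed += pending; /* commit everything buffered so far */
            pending = 0;
        } else if (entries[i] != WAL_CHECKPOINT) {
            pending++;            /* buffer until its flush marker */
        }
    }
    return committed; /* entries after the last flush are discarded */
}
```

For the sequence INSERT, INSERT, FLUSH, DELETE, two entries are committed and the trailing DELETE is discarded.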

@adofsauron
Collaborator Author

Uint32 m_operationType; // 0 READ, 1 UPDATE, 2 INSERT, 3 DELETE

@hustjieke hustjieke added this to the StoneDB_5.7_v1.0.4 milestone May 29, 2023
@adofsauron adofsauron removed their assignment Jun 14, 2023

2 participants