buf0buf.c 104 KB
Newer Older
vasil's avatar
vasil committed
1
/*****************************************************************************
2

vasil's avatar
vasil committed
3 4
Copyright (c) 1995, 2009, Innobase Oy. All Rights Reserved.
Copyright (c) 2008, Google Inc.
5

vasil's avatar
vasil committed
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Portions of this file contain modifications contributed and copyrighted by
Google, Inc. Those modifications are gratefully acknowledged and are described
briefly in the InnoDB documentation. The contributions by Google are
incorporated with their permission, and subject to the conditions contained in
the file COPYING.Google.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA

*****************************************************************************/
25

osku's avatar
osku committed
26 27 28 29 30 31 32 33 34 35 36 37
/******************************************************
The database buffer buf_pool

Created 11/5/1995 Heikki Tuuri
*******************************************************/

#include "buf0buf.h"

#ifdef UNIV_NONINL
#include "buf0buf.ic"
#endif

marko's avatar
marko committed
38
#include "buf0buddy.h"
osku's avatar
osku committed
39 40 41 42 43 44 45 46 47 48
#include "mem0mem.h"
#include "btr0btr.h"
#include "fil0fil.h"
#include "lock0lock.h"
#include "btr0sea.h"
#include "ibuf0ibuf.h"
#include "dict0dict.h"
#include "log0recv.h"
#include "trx0undo.h"
#include "srv0srv.h"
marko's avatar
marko committed
49
#include "page0zip.h"
osku's avatar
osku committed
50 51 52 53 54

/*
		IMPLEMENTATION OF THE BUFFER POOL
		=================================

55
Performance improvement:
osku's avatar
osku committed
56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138
------------------------
Thread scheduling in NT may be so slow that the OS wait mechanism should
not be used even in waiting for disk reads to complete.
Rather, we should put waiting query threads to the queue of
waiting jobs, and let the OS thread do something useful while the i/o
is processed. In this way we could remove most OS thread switches in
an i/o-intensive benchmark like TPC-C.

A possibility is to put a user space thread library between the database
and NT. User space thread libraries might be very fast.

SQL Server 7.0 can be configured to use 'fibers' which are lightweight
threads in NT. These should be studied.

		Buffer frames and blocks
		------------------------
Following the terminology of Gray and Reuter, we call the memory
blocks where file pages are loaded buffer frames. For each buffer
frame there is a control block, or shortly, a block, in the buffer
control array. The control info which does not need to be stored
in the file along with the file page, resides in the control block.

		Buffer pool struct
		------------------
The buffer buf_pool contains a single mutex which protects all the
control data structures of the buf_pool. The content of a buffer frame is
protected by a separate read-write lock in its control block, though.
These locks can be locked and unlocked without owning the buf_pool mutex.
The OS events in the buf_pool struct can be waited for without owning the
buf_pool mutex.

The buf_pool mutex is a hot-spot in main memory, causing a lot of
memory bus traffic on multiprocessor systems when processors
alternately access the mutex. On our Pentium, the mutex is accessed
maybe every 10 microseconds. We gave up the solution to have mutexes
for each control block, for instance, because it seemed to be
complicated.

A solution to reduce mutex contention of the buf_pool mutex is to
create a separate mutex for the page hash table. On Pentium,
accessing the hash table takes 2 microseconds, about half
of the total buf_pool mutex hold time.

		Control blocks
		--------------

The control block contains, for instance, the bufferfix count
which is incremented when a thread wants a file page to be fixed
in a buffer frame. The bufferfix operation does not lock the
contents of the frame, however. For this purpose, the control
block contains a read-write lock.

The buffer frames have to be aligned so that the start memory
address of a frame is divisible by the universal page size, which
is a power of two.

We intend to make the buffer buf_pool size on-line reconfigurable,
that is, the buf_pool size can be changed without closing the database.
Then the database administarator may adjust it to be bigger
at night, for example. The control block array must
contain enough control blocks for the maximum buffer buf_pool size
which is used in the particular database.
If the buf_pool size is cut, we exploit the virtual memory mechanism of
the OS, and just refrain from using frames at high addresses. Then the OS
can swap them to disk.

The control blocks containing file pages are put to a hash table
according to the file address of the page.
We could speed up the access to an individual page by using
"pointer swizzling": we could replace the page references on
non-leaf index pages by direct pointers to the page, if it exists
in the buf_pool. We could make a separate hash table where we could
chain all the page references in non-leaf pages residing in the buf_pool,
using the page reference as the hash key,
and at the time of reading of a page update the pointers accordingly.
Drawbacks of this solution are added complexity and,
possibly, extra space required on non-leaf pages for memory pointers.
A simpler solution is just to speed up the hash table mechanism
in the database, using tables whose size is a power of 2.

		Lists of blocks
		---------------

139 140 141 142
There are several lists of control blocks.

The free list (buf_pool->free) contains blocks which are currently not
used.
osku's avatar
osku committed
143

144
The common LRU list contains all the blocks holding a file page
osku's avatar
osku committed
145 146 147
except those for which the bufferfix count is non-zero.
The pages are in the LRU list roughly in the order of the last
access to the page, so that the oldest pages are at the end of the
148 149 150 151 152 153 154 155 156 157
list. We also keep a pointer to near the end of the LRU list,
which we can use when we want to artificially age a page in the
buf_pool. This is used if we know that some page is not needed
again for some time: we insert the block right after the pointer,
causing it to be replaced sooner than would noramlly be the case.
Currently this aging mechanism is used for read-ahead mechanism
of pages, and it can also be used when there is a scan of a full
table which cannot fit in the memory. Putting the pages near the
of the LRU list, we make sure that most of the buf_pool stays in the
main memory, undisturbed.
osku's avatar
osku committed
158

159 160 161 162 163 164 165 166
The unzip_LRU list contains a subset of the common LRU list.  The
blocks on the unzip_LRU list hold a compressed file page and the
corresponding uncompressed page frame.  A block is in unzip_LRU if and
only if the predicate buf_page_belongs_to_unzip_LRU(&block->page)
holds.  The blocks in unzip_LRU will be in same order as they are in
the common LRU list.  That is, each manipulation of the common LRU
list will result in the same manipulation of the unzip_LRU list.

167
The chain of modified blocks (buf_pool->flush_list) contains the blocks
osku's avatar
osku committed
168 169 170 171
holding file pages that have been modified in the memory
but not written to disk yet. The block with the oldest modification
which has not yet been written to disk is at the end of the chain.

172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
The chain of unmodified compressed blocks (buf_pool->zip_clean)
contains the control blocks (buf_page_t) of those compressed pages
that are not in buf_pool->flush_list and for which no uncompressed
page has been allocated in the buffer pool.  The control blocks for
uncompressed pages are accessible via buf_block_t objects that are
reachable via buf_pool->chunks[].

The chains of free memory blocks (buf_pool->zip_free[]) are used by
the buddy allocator (buf0buddy.c) to keep track of currently unused
memory blocks of size sizeof(buf_page_t)..UNIV_PAGE_SIZE / 2.  These
blocks are inside the UNIV_PAGE_SIZE-sized memory blocks of type
BUF_BLOCK_MEMORY that the buddy allocator requests from the buffer
pool.  The buddy allocator is solely used for allocating control
blocks for compressed pages (buf_page_t) and compressed page frames.

osku's avatar
osku committed
187 188 189 190 191 192 193 194 195 196 197
		Loading a file page
		-------------------

First, a victim block for replacement has to be found in the
buf_pool. It is taken from the free list or searched for from the
end of the LRU-list. An exclusive lock is reserved for the frame,
the io_fix field is set in the block fixing the block in buf_pool,
and the io-operation for loading the page is queued. The io-handler thread
releases the X-lock on the frame and resets the io_fix field
when the io operation completes.

198 199
A thread may request the above operation using the function
buf_page_get(). It may then continue to request a lock on the frame.
osku's avatar
osku committed
200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237
The lock is granted when the io-handler releases the x-lock.

		Read-ahead
		----------

The read-ahead mechanism is intended to be intelligent and
isolated from the semantically higher levels of the database
index management. From the higher level we only need the
information if a file page has a natural successor or
predecessor page. On the leaf level of a B-tree index,
these are the next and previous pages in the natural
order of the pages.

Let us first explain the read-ahead mechanism when the leafs
of a B-tree are scanned in an ascending or descending order.
When a read page is the first time referenced in the buf_pool,
the buffer manager checks if it is at the border of a so-called
linear read-ahead area. The tablespace is divided into these
areas of size 64 blocks, for example. So if the page is at the
border of such an area, the read-ahead mechanism checks if
all the other blocks in the area have been accessed in an
ascending or descending order. If this is the case, the system
looks at the natural successor or predecessor of the page,
checks if that is at the border of another area, and in this case
issues read-requests for all the pages in that area. Maybe
we could relax the condition that all the pages in the area
have to be accessed: if data is deleted from a table, there may
appear holes of unused pages in the area.

A different read-ahead mechanism is used when there appears
to be a random access pattern to a file.
If a new page is referenced in the buf_pool, and several pages
of its random access area (for instance, 32 consecutive pages
in a tablespace) have recently been referenced, we may predict
that the whole area may be needed in the near future, and issue
the read requests for the whole area.
*/

238
/* Value in microseconds */
239
static const int WAIT_FOR_READ	= 5000;
240

241 242
/* The buffer buf_pool of the database */
UNIV_INTERN buf_pool_t*	buf_pool = NULL;
osku's avatar
osku committed
243

244 245
/* mutex protecting the buffer pool struct and control blocks, except the
read-write lock in them */
246
UNIV_INTERN mutex_t		buf_pool_mutex;
247 248
/* mutex protecting the control blocks of compressed-only pages
(of type buf_page_t, not buf_block_t) */
249
UNIV_INTERN mutex_t		buf_pool_zip_mutex;
250

251 252
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
static ulint	buf_dbg_counter	= 0; /* This is used to insert validation
osku's avatar
osku committed
253 254
					operations in excution in the
					debug version */
255
/** Flag to forbid the release of the buffer pool mutex.
256
Protected by buf_pool_mutex. */
257
UNIV_INTERN ulint		buf_pool_mutex_exit_forbidden = 0;
258 259
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG
260 261 262
/* If this is set TRUE, the program prints info whenever
read-ahead or flush occurs */
UNIV_INTERN ibool		buf_debug_prints = FALSE;
osku's avatar
osku committed
263
#endif /* UNIV_DEBUG */
264

265 266 267 268 269 270 271 272 273
/* A chunk of buffers.  The buffer pool is allocated in chunks. */
struct buf_chunk_struct{
	ulint		mem_size;	/* allocated size of the chunk */
	ulint		size;		/* size of frames[] and blocks[] */
	void*		mem;		/* pointer to the memory area which
					was allocated for the frames */
	buf_block_t*	blocks;		/* array of buffer control blocks */
};

osku's avatar
osku committed
274 275 276 277
/************************************************************************
Calculates a page checksum which is stored to the page when it is written
to a file. Note that we must be careful to calculate the same value on
32-bit and 64-bit architectures. */
278
UNIV_INTERN
osku's avatar
osku committed
279 280 281
ulint
buf_calc_page_new_checksum(
/*=======================*/
marko's avatar
marko committed
282 283
				/* out: checksum */
	const byte*	page)	/* in: buffer page */
osku's avatar
osku committed
284
{
285
	ulint checksum;
osku's avatar
osku committed
286

287 288 289 290
	/* Since the field FIL_PAGE_FILE_FLUSH_LSN, and in versions <= 4.1.x
	..._ARCH_LOG_NO, are written outside the buffer pool to the first
	pages of data files, we have to skip them in the page checksum
	calculation.
osku's avatar
osku committed
291 292 293
	We must also skip the field FIL_PAGE_SPACE_OR_CHKSUM where the
	checksum is stored, and also the last 8 bytes of page because
	there we store the old formula checksum. */
294 295

	checksum = ut_fold_binary(page + FIL_PAGE_OFFSET,
296 297 298 299
				  FIL_PAGE_FILE_FLUSH_LSN - FIL_PAGE_OFFSET)
		+ ut_fold_binary(page + FIL_PAGE_DATA,
				 UNIV_PAGE_SIZE - FIL_PAGE_DATA
				 - FIL_PAGE_END_LSN_OLD_CHKSUM);
300 301 302
	checksum = checksum & 0xFFFFFFFFUL;

	return(checksum);
osku's avatar
osku committed
303 304 305 306 307
}

/************************************************************************
In versions < 4.0.14 and < 4.1.1 there was a bug that the checksum only
looked at the first few bytes of the page. This calculates that old
308
checksum.
osku's avatar
osku committed
309 310 311
NOTE: we must first store the new formula checksum to
FIL_PAGE_SPACE_OR_CHKSUM before calculating and storing this old checksum
because this takes that field as an input! */
312
UNIV_INTERN
osku's avatar
osku committed
313 314 315
ulint
buf_calc_page_old_checksum(
/*=======================*/
marko's avatar
marko committed
316 317
				/* out: checksum */
	const byte*	page)	/* in: buffer page */
osku's avatar
osku committed
318
{
319 320 321
	ulint checksum;

	checksum = ut_fold_binary(page, FIL_PAGE_FILE_FLUSH_LSN);
osku's avatar
osku committed
322

323
	checksum = checksum & 0xFFFFFFFFUL;
osku's avatar
osku committed
324

325
	return(checksum);
osku's avatar
osku committed
326 327 328 329
}

/************************************************************************
Checks if a page is corrupt. */
330
UNIV_INTERN
osku's avatar
osku committed
331 332 333
ibool
buf_page_is_corrupted(
/*==================*/
marko's avatar
marko committed
334 335 336 337
					/* out: TRUE if corrupted */
	const byte*	read_buf,	/* in: a database page */
	ulint		zip_size)	/* in: size of compressed page;
					0 for uncompressed pages */
osku's avatar
osku committed
338
{
339 340
	ulint		checksum_field;
	ulint		old_checksum_field;
osku's avatar
osku committed
341
#ifndef UNIV_HOTBACKUP
342
	ib_uint64_t	current_lsn;
osku's avatar
osku committed
343
#endif
344
	if (UNIV_LIKELY(!zip_size)
345 346 347
	    && memcmp(read_buf + FIL_PAGE_LSN + 4,
		      read_buf + UNIV_PAGE_SIZE
		      - FIL_PAGE_END_LSN_OLD_CHKSUM + 4, 4)) {
osku's avatar
osku committed
348 349 350 351 352 353 354 355 356

		/* Stored log sequence numbers at the start and the end
		of page do not match */

		return(TRUE);
	}

#ifndef UNIV_HOTBACKUP
	if (recv_lsn_checks_on && log_peek_lsn(&current_lsn)) {
357
		if (current_lsn < mach_read_ull(read_buf + FIL_PAGE_LSN)) {
osku's avatar
osku committed
358 359 360
			ut_print_timestamp(stderr);

			fprintf(stderr,
361
				"  InnoDB: Error: page %lu log sequence number"
362
				" %llu\n"
363
				"InnoDB: is in the future! Current system "
364
				"log sequence number %llu.\n"
365 366 367 368 369 370 371 372 373
				"InnoDB: Your database may be corrupt or "
				"you may have copied the InnoDB\n"
				"InnoDB: tablespace but not the InnoDB "
				"log files. See\n"
				"InnoDB: http://dev.mysql.com/doc/refman/"
				"5.1/en/forcing-recovery.html\n"
				"InnoDB: for more information.\n",
				(ulong) mach_read_from_4(read_buf
							 + FIL_PAGE_OFFSET),
374 375
				mach_read_ull(read_buf + FIL_PAGE_LSN),
				current_lsn);
osku's avatar
osku committed
376 377 378 379
		}
	}
#endif

380 381 382 383 384
	/* If we use checksums validation, make additional check before
	returning TRUE to ensure that the checksum is not equal to
	BUF_NO_CHECKSUM_MAGIC which might be stored by InnoDB with checksums
	disabled. Otherwise, skip checksum calculation and return FALSE */

385 386
	if (UNIV_LIKELY(srv_use_checksums)) {
		checksum_field = mach_read_from_4(read_buf
387
						  + FIL_PAGE_SPACE_OR_CHKSUM);
388 389 390

		if (UNIV_UNLIKELY(zip_size)) {
			return(checksum_field != BUF_NO_CHECKSUM_MAGIC
391 392
			       && checksum_field
			       != page_zip_calc_checksum(read_buf, zip_size));
393
		}
osku's avatar
osku committed
394

395 396 397
		old_checksum_field = mach_read_from_4(
			read_buf + UNIV_PAGE_SIZE
			- FIL_PAGE_END_LSN_OLD_CHKSUM);
osku's avatar
osku committed
398

399
		/* There are 2 valid formulas for old_checksum_field:
osku's avatar
osku committed
400

401 402
		1. Very old versions of InnoDB only stored 8 byte lsn to the
		start and the end of the page.
osku's avatar
osku committed
403

404 405 406 407
		2. Newer InnoDB versions store the old formula checksum
		there. */

		if (old_checksum_field != mach_read_from_4(read_buf
408 409 410 411
							   + FIL_PAGE_LSN)
		    && old_checksum_field != BUF_NO_CHECKSUM_MAGIC
		    && old_checksum_field
		    != buf_calc_page_old_checksum(read_buf)) {
412 413 414 415 416

			return(TRUE);
		}

		/* InnoDB versions < 4.0.14 and < 4.1.1 stored the space id
417
		(always equal to 0), to FIL_PAGE_SPACE_OR_CHKSUM */
418

419
		if (checksum_field != 0
420 421 422
		    && checksum_field != BUF_NO_CHECKSUM_MAGIC
		    && checksum_field
		    != buf_calc_page_new_checksum(read_buf)) {
423 424 425 426

			return(TRUE);
		}
	}
osku's avatar
osku committed
427 428 429 430 431 432

	return(FALSE);
}

/************************************************************************
Prints a page to stderr. */
433
UNIV_INTERN
osku's avatar
osku committed
434 435 436
void
buf_page_print(
/*===========*/
marko's avatar
marko committed
437 438
	const byte*	read_buf,	/* in: a database page */
	ulint		zip_size)	/* in: compressed page size, or
439
				0 for uncompressed pages */
osku's avatar
osku committed
440 441 442 443
{
	dict_index_t*	index;
	ulint		checksum;
	ulint		old_checksum;
444 445 446 447 448
	ulint		size	= zip_size;

	if (!size) {
		size = UNIV_PAGE_SIZE;
	}
osku's avatar
osku committed
449 450 451

	ut_print_timestamp(stderr);
	fprintf(stderr, "  InnoDB: Page dump in ascii and hex (%lu bytes):\n",
452 453
		(ulong) size);
	ut_print_buf(stderr, read_buf, size);
454
	fputs("\nInnoDB: End of page dump\n", stderr);
osku's avatar
osku committed
455

456 457 458 459 460
	if (zip_size) {
		/* Print compressed page. */

		switch (fil_page_get_type(read_buf)) {
		case FIL_PAGE_TYPE_ZBLOB:
461
		case FIL_PAGE_TYPE_ZBLOB2:
462
			checksum = srv_use_checksums
463 464
				? page_zip_calc_checksum(read_buf, zip_size)
				: BUF_NO_CHECKSUM_MAGIC;
465 466
			ut_print_timestamp(stderr);
			fprintf(stderr,
467 468 469 470 471 472 473
				"  InnoDB: Compressed BLOB page"
				" checksum %lu, stored %lu\n"
				"InnoDB: Page lsn %lu %lu\n"
				"InnoDB: Page number (if stored"
				" to page already) %lu,\n"
				"InnoDB: space id (if stored"
				" to page already) %lu\n",
474
				(ulong) checksum,
475 476 477 478 479 480 481 482 483
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_LSN),
				(ulong) mach_read_from_4(
					read_buf + (FIL_PAGE_LSN + 4)),
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_OFFSET),
				(ulong) mach_read_from_4(
484 485
					read_buf
					+ FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));
486 487 488 489
			return;
		default:
			ut_print_timestamp(stderr);
			fprintf(stderr,
490 491
				"  InnoDB: unknown page type %lu,"
				" assuming FIL_PAGE_INDEX\n",
492 493 494 495
				fil_page_get_type(read_buf));
			/* fall through */
		case FIL_PAGE_INDEX:
			checksum = srv_use_checksums
496 497
				? page_zip_calc_checksum(read_buf, zip_size)
				: BUF_NO_CHECKSUM_MAGIC;
498 499 500

			ut_print_timestamp(stderr);
			fprintf(stderr,
501 502 503 504 505 506 507
				"  InnoDB: Compressed page checksum %lu,"
				" stored %lu\n"
				"InnoDB: Page lsn %lu %lu\n"
				"InnoDB: Page number (if stored"
				" to page already) %lu,\n"
				"InnoDB: space id (if stored"
				" to page already) %lu\n",
508
				(ulong) checksum,
509 510 511 512 513 514 515 516 517 518 519
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_LSN),
				(ulong) mach_read_from_4(
					read_buf + (FIL_PAGE_LSN + 4)),
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_OFFSET),
				(ulong) mach_read_from_4(
					read_buf
					+ FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));
520 521 522 523 524 525 526
			return;
		case FIL_PAGE_TYPE_XDES:
			/* This is an uncompressed page. */
			break;
		}
	}

527 528 529 530
	checksum = srv_use_checksums
		? buf_calc_page_new_checksum(read_buf) : BUF_NO_CHECKSUM_MAGIC;
	old_checksum = srv_use_checksums
		? buf_calc_page_old_checksum(read_buf) : BUF_NO_CHECKSUM_MAGIC;
osku's avatar
osku committed
531 532

	ut_print_timestamp(stderr);
533
	fprintf(stderr,
534 535 536 537 538 539 540 541 542 543 544 545 546
		"  InnoDB: Page checksum %lu, prior-to-4.0.14-form"
		" checksum %lu\n"
		"InnoDB: stored checksum %lu, prior-to-4.0.14-form"
		" stored checksum %lu\n"
		"InnoDB: Page lsn %lu %lu, low 4 bytes of lsn"
		" at page end %lu\n"
		"InnoDB: Page number (if stored to page already) %lu,\n"
		"InnoDB: space id (if created with >= MySQL-4.1.1"
		" and stored already) %lu\n",
		(ulong) checksum, (ulong) old_checksum,
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
		(ulong) mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					 - FIL_PAGE_END_LSN_OLD_CHKSUM),
osku's avatar
osku committed
547 548 549
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_LSN),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_LSN + 4),
		(ulong) mach_read_from_4(read_buf + UNIV_PAGE_SIZE
550
					 - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
osku's avatar
osku committed
551
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_OFFSET),
552 553
		(ulong) mach_read_from_4(read_buf
					 + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));
osku's avatar
osku committed
554 555

	if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE)
556
	    == TRX_UNDO_INSERT) {
557
		fprintf(stderr,
osku's avatar
osku committed
558 559
			"InnoDB: Page may be an insert undo log page\n");
	} else if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR
560 561
				    + TRX_UNDO_PAGE_TYPE)
		   == TRX_UNDO_UPDATE) {
562
		fprintf(stderr,
osku's avatar
osku committed
563 564 565
			"InnoDB: Page may be an update undo log page\n");
	}

566 567 568
	switch (fil_page_get_type(read_buf)) {
	case FIL_PAGE_INDEX:
		fprintf(stderr,
569 570
			"InnoDB: Page may be an index page where"
			" index id is %lu %lu\n",
571 572 573 574
			(ulong) ut_dulint_get_high(
				btr_page_get_index_id(read_buf)),
			(ulong) ut_dulint_get_low(
				btr_page_get_index_id(read_buf)));
osku's avatar
osku committed
575

576
#ifdef UNIV_HOTBACKUP
osku's avatar
osku committed
577 578 579
		/* If the code is in ibbackup, dict_sys may be uninitialized,
		i.e., NULL */

580 581 582 583
		if (dict_sys == NULL) {
			break;
		}
#endif /* UNIV_HOTBACKUP */
osku's avatar
osku committed
584

585 586
		index = dict_index_find_on_id_low(
			btr_page_get_index_id(read_buf));
587 588 589 590
		if (index) {
			fputs("InnoDB: (", stderr);
			dict_index_name_print(stderr, NULL, index);
			fputs(")\n", stderr);
osku's avatar
osku committed
591
		}
592 593
		break;
	case FIL_PAGE_INODE:
osku's avatar
osku committed
594
		fputs("InnoDB: Page may be an 'inode' page\n", stderr);
595 596
		break;
	case FIL_PAGE_IBUF_FREE_LIST:
osku's avatar
osku committed
597
		fputs("InnoDB: Page may be an insert buffer free list page\n",
598
		      stderr);
599 600 601
		break;
	case FIL_PAGE_TYPE_ALLOCATED:
		fputs("InnoDB: Page may be a freshly allocated page\n",
602
		      stderr);
603 604 605
		break;
	case FIL_PAGE_IBUF_BITMAP:
		fputs("InnoDB: Page may be an insert buffer bitmap page\n",
606
		      stderr);
607 608 609
		break;
	case FIL_PAGE_TYPE_SYS:
		fputs("InnoDB: Page may be a system page\n",
610
		      stderr);
611 612 613
		break;
	case FIL_PAGE_TYPE_TRX_SYS:
		fputs("InnoDB: Page may be a transaction system page\n",
614
		      stderr);
615 616 617
		break;
	case FIL_PAGE_TYPE_FSP_HDR:
		fputs("InnoDB: Page may be a file space header page\n",
618
		      stderr);
619 620 621
		break;
	case FIL_PAGE_TYPE_XDES:
		fputs("InnoDB: Page may be an extent descriptor page\n",
622
		      stderr);
623 624 625
		break;
	case FIL_PAGE_TYPE_BLOB:
		fputs("InnoDB: Page may be a BLOB page\n",
626
		      stderr);
627
		break;
628
	case FIL_PAGE_TYPE_ZBLOB:
629
	case FIL_PAGE_TYPE_ZBLOB2:
630
		fputs("InnoDB: Page may be a compressed BLOB page\n",
631
		      stderr);
632
		break;
osku's avatar
osku committed
633 634 635 636 637 638 639 640 641 642
	}
}

/************************************************************************
Initializes a buffer control block when the buf_pool is created. */
static
void
buf_block_init(
/*===========*/
	buf_block_t*	block,	/* in: pointer to control block */
643
	byte*		frame)	/* in: pointer to buffer frame */
osku's avatar
osku committed
644
{
645 646
	UNIV_MEM_DESC(frame, UNIV_PAGE_SIZE, block);

osku's avatar
osku committed
647 648
	block->frame = frame;

649 650 651
	block->page.state = BUF_BLOCK_NOT_USED;
	block->page.buf_fix_count = 0;
	block->page.io_fix = BUF_IO_NONE;
652

653
	block->modify_clock = 0;
654

655
#ifdef UNIV_DEBUG_FILE_ACCESSES
656
	block->page.file_page_was_freed = FALSE;
657
#endif /* UNIV_DEBUG_FILE_ACCESSES */
osku's avatar
osku committed
658 659 660 661

	block->check_index_page_at_flush = FALSE;
	block->index = NULL;

662
#ifdef UNIV_DEBUG
663 664
	block->page.in_page_hash = FALSE;
	block->page.in_zip_hash = FALSE;
665
	block->page.in_flush_list = FALSE;
666
	block->page.in_free_list = FALSE;
667
	block->page.in_LRU_list = FALSE;
668
	block->in_unzip_LRU_list = FALSE;
669
#endif /* UNIV_DEBUG */
670 671 672
#if defined UNIV_AHI_DEBUG || defined UNIV_DEBUG
	block->n_pointers = 0;
#endif /* UNIV_AHI_DEBUG || UNIV_DEBUG */
673
	page_zip_des_init(&block->page.zip);
marko's avatar
marko committed
674

675 676
	mutex_create(&block->mutex, SYNC_BUF_BLOCK);

677
	rw_lock_create(&block->lock, SYNC_LEVEL_VARYING);
osku's avatar
osku committed
678 679 680
	ut_ad(rw_lock_validate(&(block->lock)));

#ifdef UNIV_SYNC_DEBUG
681
	rw_lock_create(&block->debug_latch, SYNC_NO_ORDER_CHECK);
osku's avatar
osku committed
682 683 684 685
#endif /* UNIV_SYNC_DEBUG */
}

/************************************************************************
686 687 688 689 690 691 692 693
Allocates a chunk of buffer frames. */
static
buf_chunk_t*
buf_chunk_init(
/*===========*/
					/* out: chunk, or NULL on failure */
	buf_chunk_t*	chunk,		/* out: chunk of buffers */
	ulint		mem_size)	/* in: requested size in bytes */
osku's avatar
osku committed
694
{
695
	buf_block_t*	block;
osku's avatar
osku committed
696 697 698
	byte*		frame;
	ulint		i;

699 700 701 702 703 704
	/* Round down to a multiple of page size,
	although it already should be. */
	mem_size = ut_2pow_round(mem_size, UNIV_PAGE_SIZE);
	/* Reserve space for the block descriptors. */
	mem_size += ut_2pow_round((mem_size / UNIV_PAGE_SIZE) * (sizeof *block)
				  + (UNIV_PAGE_SIZE - 1), UNIV_PAGE_SIZE);
osku's avatar
osku committed
705

706 707
	chunk->mem_size = mem_size;
	chunk->mem = os_mem_alloc_large(&chunk->mem_size);
708

709
	if (UNIV_UNLIKELY(chunk->mem == NULL)) {
osku's avatar
osku committed
710 711 712 713

		return(NULL);
	}

714 715 716
	/* Allocate the block descriptors from
	the start of the memory block. */
	chunk->blocks = chunk->mem;
717

718 719 720 721
	/* Align a pointer to the first frame.  Note that when
	os_large_page_size is smaller than UNIV_PAGE_SIZE,
	we may allocate one fewer block than requested.  When
	it is bigger, we may allocate more blocks than requested. */
722

723 724 725
	frame = ut_align(chunk->mem, UNIV_PAGE_SIZE);
	chunk->size = chunk->mem_size / UNIV_PAGE_SIZE
		- (frame != chunk->mem);
osku's avatar
osku committed
726

727 728 729
	/* Subtract the space needed for block descriptors. */
	{
		ulint	size = chunk->size;
730

731 732 733 734
		while (frame < (byte*) (chunk->blocks + size)) {
			frame += UNIV_PAGE_SIZE;
			size--;
		}
osku's avatar
osku committed
735

736
		chunk->size = size;
osku's avatar
osku committed
737 738
	}

739 740 741
	/* Init block structs and assign frames for them. Then we
	assign the frames to the first blocks (we already mapped the
	memory above). */
osku's avatar
osku committed
742

743
	block = chunk->blocks;
osku's avatar
osku committed
744

745
	for (i = chunk->size; i--; ) {
osku's avatar
osku committed
746 747

		buf_block_init(block, frame);
748

749 750 751 752 753
#ifdef HAVE_purify
		/* Wipe contents of frame to eliminate a Purify warning */
		memset(block->frame, '\0', UNIV_PAGE_SIZE);
#endif
		/* Add the block to the free list */
754
		UT_LIST_ADD_LAST(list, buf_pool->free, (&block->page));
755
		ut_d(block->page.in_free_list = TRUE);
756 757

		block++;
758
		frame += UNIV_PAGE_SIZE;
osku's avatar
osku committed
759 760
	}

761 762 763
	return(chunk);
}

764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780
#ifdef UNIV_DEBUG
/*************************************************************************
Finds a block in the given buffer chunk that points to a
given compressed page. */
static
buf_block_t*
buf_chunk_contains_zip(
/*===================*/
				/* out: buffer block pointing to
				the compressed page, or NULL */
	buf_chunk_t*	chunk,	/* in: chunk being checked */
	const void*	data)	/* in: pointer to compressed page */
{
	buf_block_t*	block;
	ulint		i;

	ut_ad(buf_pool);
781
	ut_ad(buf_pool_mutex_own());
782

783 784 785 786 787 788 789 790 791 792 793 794 795 796 797
	block = chunk->blocks;

	for (i = chunk->size; i--; block++) {
		if (block->page.zip.data == data) {

			return(block);
		}
	}

	return(NULL);
}

/*************************************************************************
Finds a block in the buffer pool that points to a
given compressed page. */
798
UNIV_INTERN
799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820
buf_block_t*
buf_pool_contains_zip(
/*==================*/
				/* out: buffer block pointing to
				the compressed page, or NULL */
	const void*	data)	/* in: pointer to compressed page */
{
	ulint		n;
	buf_chunk_t*	chunk = buf_pool->chunks;

	for (n = buf_pool->n_chunks; n--; chunk++) {
		buf_block_t* block = buf_chunk_contains_zip(chunk, data);

		if (block) {
			return(block);
		}
	}

	return(NULL);
}
#endif /* UNIV_DEBUG */

821 822 823 824 825 826 827 828 829 830 831 832 833 834
/*************************************************************************
Checks that all file pages in the buffer chunk are in a replaceable state. */
static
const buf_block_t*
buf_chunk_not_freed(
/*================*/
				/* out: address of a non-free block,
				or NULL if all freed */
	buf_chunk_t*	chunk)	/* in: chunk being checked */
{
	buf_block_t*	block;
	ulint		i;

	ut_ad(buf_pool);
835
	ut_ad(buf_pool_mutex_own());
836 837 838 839 840 841

	block = chunk->blocks;

	for (i = chunk->size; i--; block++) {
		mutex_enter(&block->mutex);

842
		if (buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE
843
		    && !buf_flush_ready_for_replace(&block->page)) {
844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867

			mutex_exit(&block->mutex);
			return(block);
		}

		mutex_exit(&block->mutex);
	}

	return(NULL);
}

/*************************************************************************
Checks that all blocks in the buffer chunk are in BUF_BLOCK_NOT_USED state. */
static
ibool
buf_chunk_all_free(
/*===============*/
					/* out: TRUE if all freed */
	const buf_chunk_t*	chunk)	/* in: chunk being checked */
{
	const buf_block_t*	block;
	ulint			i;

	ut_ad(buf_pool);
868
	ut_ad(buf_pool_mutex_own());
869 870 871 872 873

	block = chunk->blocks;

	for (i = chunk->size; i--; block++) {

874
		if (buf_block_get_state(block) != BUF_BLOCK_NOT_USED) {
875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893

			return(FALSE);
		}
	}

	return(TRUE);
}

/************************************************************************
Frees a chunk of buffer frames. */
static
void
buf_chunk_free(
/*===========*/
	buf_chunk_t*	chunk)		/* out: chunk of buffers */
{
	buf_block_t*		block;
	const buf_block_t*	block_end;

894
	ut_ad(buf_pool_mutex_own());
895 896 897 898

	block_end = chunk->blocks + chunk->size;

	for (block = chunk->blocks; block < block_end; block++) {
899
		ut_a(buf_block_get_state(block) == BUF_BLOCK_NOT_USED);
900
		ut_a(!block->page.zip.data);
901

902
		ut_ad(!block->page.in_LRU_list);
903
		ut_ad(!block->in_unzip_LRU_list);
904
		ut_ad(!block->page.in_flush_list);
905
		/* Remove the block from the free list. */
906
		ut_ad(block->page.in_free_list);
907
		UT_LIST_REMOVE(list, buf_pool->free, (&block->page));
908 909 910 911 912 913 914

		/* Free the latches. */
		mutex_free(&block->mutex);
		rw_lock_free(&block->lock);
#ifdef UNIV_SYNC_DEBUG
		rw_lock_free(&block->debug_latch);
#endif /* UNIV_SYNC_DEBUG */
915
		UNIV_MEM_UNDESC(block);
916 917 918 919 920 921 922
	}

	os_mem_free_large(chunk->mem, chunk->mem_size);
}

/************************************************************************
Creates the buffer pool. */
923
UNIV_INTERN
924 925 926 927 928 929 930 931 932
buf_pool_t*
buf_pool_init(void)
/*===============*/
				/* out, own: buf_pool object, NULL if not
				enough memory or error */
{
	buf_chunk_t*	chunk;
	ulint		i;

933
	buf_pool = mem_zalloc(sizeof(buf_pool_t));
934 935 936

	/* 1. Initialize general fields
	------------------------------- */
937
	mutex_create(&buf_pool_mutex, SYNC_BUF_POOL);
938
	mutex_create(&buf_pool_zip_mutex, SYNC_BUF_BLOCK);
939

940
	buf_pool_mutex_enter();
941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958

	buf_pool->n_chunks = 1;
	buf_pool->chunks = chunk = mem_alloc(sizeof *chunk);

	UT_LIST_INIT(buf_pool->free);

	if (!buf_chunk_init(chunk, srv_buf_pool_size)) {
		mem_free(chunk);
		mem_free(buf_pool);
		buf_pool = NULL;
		return(NULL);
	}

	srv_buf_pool_old_size = srv_buf_pool_size;
	buf_pool->curr_size = chunk->size;
	srv_buf_pool_curr_size = buf_pool->curr_size * UNIV_PAGE_SIZE;

	buf_pool->page_hash = hash_create(2 * buf_pool->curr_size);
959
	buf_pool->zip_hash = hash_create(2 * buf_pool->curr_size);
osku's avatar
osku committed
960 961 962 963

	buf_pool->last_printout_time = time(NULL);

	/* 2. Initialize flushing fields
964
	-------------------------------- */
osku's avatar
osku committed
965

966
	for (i = BUF_FLUSH_LRU; i < BUF_FLUSH_N_TYPES; i++) {
osku's avatar
osku committed
967 968 969 970
		buf_pool->no_flush[i] = os_event_create(NULL);
	}

	buf_pool->ulint_clock = 1;
971

osku's avatar
osku committed
972
	/* 3. Initialize LRU fields
973
	--------------------------- */
974
	/* All fields are initialized by mem_zalloc(). */
osku's avatar
osku committed
975

976
	buf_pool_mutex_exit();
osku's avatar
osku committed
977

978 979
	btr_search_sys_create(buf_pool->curr_size
			      * UNIV_PAGE_SIZE / sizeof(void*) / 64);
osku's avatar
osku committed
980

981
	/* 4. Initialize the buddy allocator fields */
982
	/* All fields are initialized by mem_zalloc(). */
983

984 985
	return(buf_pool);
}
osku's avatar
osku committed
986

987 988 989
/************************************************************************
Frees the buffer pool at shutdown.  This must not be invoked before
freeing all mutexes. */
990
UNIV_INTERN
991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009
void
buf_pool_free(void)
/*===============*/
{
	buf_chunk_t*	chunk;
	buf_chunk_t*	chunks;

	chunks = buf_pool->chunks;
	chunk = chunks + buf_pool->n_chunks;

	while (--chunk >= chunks) {
		/* Bypass the checks of buf_chunk_free(), since they
		would fail at shutdown. */
		os_mem_free_large(chunk->mem, chunk->mem_size);
	}

	buf_pool->n_chunks = 0;
}

1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027

/************************************************************************
Drops the adaptive hash index.  To prevent a livelock, this function
is only to be called while holding btr_search_latch and while
btr_search_enabled == FALSE. */
UNIV_INTERN
void
buf_pool_drop_hash_index(void)
/*==========================*/
{
	ibool		released_search_latch;

#ifdef UNIV_SYNC_DEBUG
	ut_ad(rw_lock_own(&btr_search_latch, RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */
	ut_ad(!btr_search_enabled);

	do {
1028
		buf_chunk_t*	chunks	= buf_pool->chunks;
1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091
		buf_chunk_t*	chunk	= chunks + buf_pool->n_chunks;

		released_search_latch = FALSE;

		while (--chunk >= chunks) {
			buf_block_t*	block	= chunk->blocks;
			ulint		i	= chunk->size;

			for (; i--; block++) {
				/* block->is_hashed cannot be modified
				when we have an x-latch on btr_search_latch;
				see the comment in buf0buf.h */

				if (!block->is_hashed) {
					continue;
				}

				/* To follow the latching order, we
				have to release btr_search_latch
				before acquiring block->latch. */
				rw_lock_x_unlock(&btr_search_latch);
				/* When we release the search latch,
				we must rescan all blocks, because
				some may become hashed again. */
				released_search_latch = TRUE;

				rw_lock_x_lock(&block->lock);

				/* This should be guaranteed by the
				callers, which will be holding
				btr_search_enabled_mutex. */
				ut_ad(!btr_search_enabled);

				/* Because we did not buffer-fix the
				block by calling buf_block_get_gen(),
				it is possible that the block has been
				allocated for some other use after
				btr_search_latch was released above.
				We do not care which file page the
				block is mapped to.  All we want to do
				is to drop any hash entries referring
				to the page. */

				/* It is possible that
				block->page.state != BUF_FILE_PAGE.
				Even that does not matter, because
				btr_search_drop_page_hash_index() will
				check block->is_hashed before doing
				anything.  block->is_hashed can only
				be set on uncompressed file pages. */

				btr_search_drop_page_hash_index(block);

				rw_lock_x_unlock(&block->lock);

				rw_lock_x_lock(&btr_search_latch);

				ut_ad(!btr_search_enabled);
			}
		}
	} while (released_search_latch);
}

1092 1093
/************************************************************************
Relocate a buffer control block.  Relocates the block on the LRU list
1094 1095
and in buf_pool->page_hash.  Does not relocate bpage->list.
The caller must take care of relocating bpage->list. */
1096
UNIV_INTERN
1097 1098 1099
void
buf_relocate(
/*=========*/
1100 1101 1102 1103
	buf_page_t*	bpage,	/* in/out: control block being relocated;
				buf_page_get_state(bpage) must be
				BUF_BLOCK_ZIP_DIRTY or BUF_BLOCK_ZIP_PAGE */
	buf_page_t*	dpage)	/* in/out: destination control block */
1104 1105 1106
{
	buf_page_t*	b;
	ulint		fold;
1107

1108
	ut_ad(buf_pool_mutex_own());
1109
	ut_ad(mutex_own(buf_page_get_mutex(bpage)));
1110 1111
	ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE);
	ut_a(bpage->buf_fix_count == 0);
1112 1113 1114
	ut_ad(bpage->in_LRU_list);
	ut_ad(!bpage->in_zip_hash);
	ut_ad(bpage->in_page_hash);
1115
	ut_ad(bpage == buf_page_hash_get(bpage->space, bpage->offset));
1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129
#ifdef UNIV_DEBUG
	switch (buf_page_get_state(bpage)) {
	case BUF_BLOCK_ZIP_FREE:
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_FILE_PAGE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
		ut_error;
	case BUF_BLOCK_ZIP_DIRTY:
	case BUF_BLOCK_ZIP_PAGE:
		break;
	}
#endif /* UNIV_DEBUG */
1130

1131
	memcpy(dpage, bpage, sizeof *dpage);
1132

1133 1134 1135
	ut_d(bpage->in_LRU_list = FALSE);
	ut_d(bpage->in_page_hash = FALSE);

1136
	/* relocate buf_pool->LRU */
1137 1138 1139 1140 1141 1142 1143 1144 1145
	b = UT_LIST_GET_PREV(LRU, bpage);
	UT_LIST_REMOVE(LRU, buf_pool->LRU, bpage);

	if (b) {
		UT_LIST_INSERT_AFTER(LRU, buf_pool->LRU, b, dpage);
	} else {
		UT_LIST_ADD_FIRST(LRU, buf_pool->LRU, dpage);
	}

1146 1147
	if (UNIV_UNLIKELY(buf_pool->LRU_old == bpage)) {
		buf_pool->LRU_old = dpage;
1148 1149 1150 1151 1152 1153 1154 1155
#ifdef UNIV_LRU_DEBUG
		/* buf_pool->LRU_old must be the first item in the LRU list
		whose "old" flag is set. */
		ut_a(!UT_LIST_GET_PREV(LRU, buf_pool->LRU_old)
		     || !UT_LIST_GET_PREV(LRU, buf_pool->LRU_old)->old);
		ut_a(!UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old)
		     || UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old)->old);
#endif /* UNIV_LRU_DEBUG */
1156 1157
	}

1158 1159
	ut_d(UT_LIST_VALIDATE(LRU, buf_page_t, buf_pool->LRU,
			      ut_ad(ut_list_node_313->in_LRU_list)));
1160

1161 1162 1163 1164 1165
	/* relocate buf_pool->page_hash */
	fold = buf_page_address_fold(bpage->space, bpage->offset);

	HASH_DELETE(buf_page_t, hash, buf_pool->page_hash, fold, bpage);
	HASH_INSERT(buf_page_t, hash, buf_pool->page_hash, fold, dpage);
1166 1167

	UNIV_MEM_INVALID(bpage, sizeof *bpage);
1168 1169
}

1170
/************************************************************************
1171 1172
Shrinks the buffer pool. */
static
1173
void
1174 1175 1176 1177
buf_pool_shrink(
/*============*/
				/* out: TRUE if shrunk */
	ulint	chunk_size)	/* in: number of pages to remove */
1178 1179 1180
{
	buf_chunk_t*	chunks;
	buf_chunk_t*	chunk;
1181 1182 1183 1184 1185
	ulint		max_size;
	ulint		max_free_size;
	buf_chunk_t*	max_chunk;
	buf_chunk_t*	max_free_chunk;

1186
	ut_ad(!buf_pool_mutex_own());
1187

1188
try_again:
1189
	btr_search_disable(); /* Empty the adaptive hash index again */
1190
	buf_pool_mutex_enter();
1191

1192 1193
shrink_again:
	if (buf_pool->n_chunks <= 1) {
1194

1195 1196
		/* Cannot shrink if there is only one chunk */
		goto func_done;
1197 1198
	}

1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219
	/* Search for the largest free chunk
	not larger than the size difference */
	chunks = buf_pool->chunks;
	chunk = chunks + buf_pool->n_chunks;
	max_size = max_free_size = 0;
	max_chunk = max_free_chunk = NULL;

	while (--chunk >= chunks) {
		if (chunk->size <= chunk_size
		    && chunk->size > max_free_size) {
			if (chunk->size > max_size) {
				max_size = chunk->size;
				max_chunk = chunk;
			}

			if (buf_chunk_all_free(chunk)) {
				max_free_size = chunk->size;
				max_free_chunk = chunk;
			}
		}
	}
1220

1221
	if (!max_free_size) {
1222

1223 1224 1225 1226
		ulint		dirty	= 0;
		ulint		nonfree	= 0;
		buf_block_t*	block;
		buf_block_t*	bend;
1227

1228 1229 1230
		/* Cannot shrink: try again later
		(do not assign srv_buf_pool_old_size) */
		if (!max_chunk) {
1231

1232
			goto func_exit;
osku's avatar
osku committed
1233 1234
		}

1235 1236 1237 1238 1239 1240
		block = max_chunk->blocks;
		bend = block + max_chunk->size;

		/* Move the blocks of chunk to the end of the
		LRU list and try to flush them. */
		for (; block < bend; block++) {
1241
			switch (buf_block_get_state(block)) {
1242 1243 1244 1245 1246 1247 1248
			case BUF_BLOCK_NOT_USED:
				continue;
			case BUF_BLOCK_FILE_PAGE:
				break;
			default:
				nonfree++;
				continue;
1249
			}
1250 1251 1252

			mutex_enter(&block->mutex);
			/* The following calls will temporarily
1253
			release block->mutex and buf_pool_mutex.
1254 1255 1256
			Therefore, we have to always retry,
			even if !dirty && !nonfree. */

1257
			if (!buf_flush_ready_for_replace(&block->page)) {
1258

1259
				buf_LRU_make_block_old(&block->page);
1260
				dirty++;
1261 1262
			} else if (buf_LRU_free_block(&block->page, TRUE, NULL)
				   != BUF_LRU_FREED) {
1263 1264 1265 1266
				nonfree++;
			}

			mutex_exit(&block->mutex);
1267 1268
		}

1269
		buf_pool_mutex_exit();
1270

1271 1272 1273 1274 1275 1276 1277
		/* Request for a flush of the chunk if it helps.
		Do not flush if there are non-free blocks, since
		flushing will not make the chunk freeable. */
		if (nonfree) {
			/* Avoid busy-waiting. */
			os_thread_sleep(100000);
		} else if (dirty
1278
			   && buf_flush_batch(BUF_FLUSH_LRU, dirty, 0)
1279 1280 1281 1282
			   == ULINT_UNDEFINED) {

			buf_flush_wait_batch_end(BUF_FLUSH_LRU);
		}
1283

1284 1285
		goto try_again;
	}
1286

1287 1288
	max_size = max_free_size;
	max_chunk = max_free_chunk;
1289

1290
	srv_buf_pool_old_size = srv_buf_pool_size;
1291

1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307
	/* Rewrite buf_pool->chunks.  Copy everything but max_chunk. */
	chunks = mem_alloc((buf_pool->n_chunks - 1) * sizeof *chunks);
	memcpy(chunks, buf_pool->chunks,
	       (max_chunk - buf_pool->chunks) * sizeof *chunks);
	memcpy(chunks + (max_chunk - buf_pool->chunks),
	       max_chunk + 1,
	       buf_pool->chunks + buf_pool->n_chunks
	       - (max_chunk + 1));
	ut_a(buf_pool->curr_size > max_chunk->size);
	buf_pool->curr_size -= max_chunk->size;
	srv_buf_pool_curr_size = buf_pool->curr_size * UNIV_PAGE_SIZE;
	chunk_size -= max_chunk->size;
	buf_chunk_free(max_chunk);
	mem_free(buf_pool->chunks);
	buf_pool->chunks = chunks;
	buf_pool->n_chunks--;
1308

1309 1310
	/* Allow a slack of one megabyte. */
	if (chunk_size > 1048576 / UNIV_PAGE_SIZE) {
1311

1312 1313
		goto shrink_again;
	}
1314

1315 1316 1317
func_done:
	srv_buf_pool_old_size = srv_buf_pool_size;
func_exit:
1318
	buf_pool_mutex_exit();
1319 1320
	btr_search_enable();
}
1321

1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332
/************************************************************************
Rebuild buf_pool->page_hash. */
static
void
buf_pool_page_hash_rebuild(void)
/*============================*/
{
	ulint		i;
	ulint		n_chunks;
	buf_chunk_t*	chunk;
	hash_table_t*	page_hash;
1333 1334
	hash_table_t*	zip_hash;
	buf_page_t*	b;
1335

1336
	buf_pool_mutex_enter();
1337 1338 1339 1340

	/* Free, create, and populate the hash table. */
	hash_table_free(buf_pool->page_hash);
	buf_pool->page_hash = page_hash = hash_create(2 * buf_pool->curr_size);
1341 1342 1343 1344 1345 1346 1347 1348 1349
	zip_hash = hash_create(2 * buf_pool->curr_size);

	HASH_MIGRATE(buf_pool->zip_hash, zip_hash, buf_page_t, hash,
		     BUF_POOL_ZIP_FOLD_BPAGE);

	hash_table_free(buf_pool->zip_hash);
	buf_pool->zip_hash = zip_hash;

	/* Insert the uncompressed file pages to buf_pool->page_hash. */
1350 1351 1352 1353 1354 1355 1356 1357 1358

	chunk = buf_pool->chunks;
	n_chunks = buf_pool->n_chunks;

	for (i = 0; i < n_chunks; i++, chunk++) {
		ulint		j;
		buf_block_t*	block = chunk->blocks;

		for (j = 0; j < chunk->size; j++, block++) {
1359 1360
			if (buf_block_get_state(block)
			    == BUF_BLOCK_FILE_PAGE) {
1361 1362 1363
				ut_ad(!block->page.in_zip_hash);
				ut_ad(block->page.in_page_hash);

1364
				HASH_INSERT(buf_page_t, hash, page_hash,
1365
					    buf_page_address_fold(
1366 1367
						    block->page.space,
						    block->page.offset),
1368
					    &block->page);
1369 1370 1371 1372
			}
		}
	}

1373 1374 1375 1376 1377 1378 1379
	/* Insert the compressed-only pages to buf_pool->page_hash.
	All such blocks are either in buf_pool->zip_clean or
	in buf_pool->flush_list. */

	for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
1380
		ut_ad(!b->in_flush_list);
1381 1382 1383
		ut_ad(b->in_LRU_list);
		ut_ad(b->in_page_hash);
		ut_ad(!b->in_zip_hash);
1384 1385 1386 1387 1388 1389 1390

		HASH_INSERT(buf_page_t, hash, page_hash,
			    buf_page_address_fold(b->space, b->offset), b);
	}

	for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
1391
		ut_ad(b->in_flush_list);
1392 1393 1394
		ut_ad(b->in_LRU_list);
		ut_ad(b->in_page_hash);
		ut_ad(!b->in_zip_hash);
1395

1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415
		switch (buf_page_get_state(b)) {
		case BUF_BLOCK_ZIP_DIRTY:
			HASH_INSERT(buf_page_t, hash, page_hash,
				    buf_page_address_fold(b->space,
							  b->offset), b);
			break;
		case BUF_BLOCK_FILE_PAGE:
			/* uncompressed page */
			break;
		case BUF_BLOCK_ZIP_FREE:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			ut_error;
			break;
		}
	}

1416
	buf_pool_mutex_exit();
1417 1418
}

1419 1420
/************************************************************************
Resizes the buffer pool. */
1421
UNIV_INTERN
1422 1423 1424 1425
void
buf_pool_resize(void)
/*=================*/
{
1426
	buf_pool_mutex_enter();
1427

1428 1429
	if (srv_buf_pool_old_size == srv_buf_pool_size) {

1430
		buf_pool_mutex_exit();
1431 1432 1433 1434 1435
		return;
	}

	if (srv_buf_pool_curr_size + 1048576 > srv_buf_pool_size) {

1436
		buf_pool_mutex_exit();
1437 1438 1439 1440 1441

		/* Disable adaptive hash indexes and empty the index
		in order to free up memory in the buffer pool chunks. */
		buf_pool_shrink((srv_buf_pool_curr_size - srv_buf_pool_size)
				/ UNIV_PAGE_SIZE);
1442 1443 1444 1445 1446 1447
	} else if (srv_buf_pool_curr_size + 1048576 < srv_buf_pool_size) {

		/* Enlarge the buffer pool by at least one megabyte */

		ulint		mem_size
			= srv_buf_pool_size - srv_buf_pool_curr_size;
1448 1449
		buf_chunk_t*	chunks;
		buf_chunk_t*	chunk;
1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467

		chunks = mem_alloc((buf_pool->n_chunks + 1) * sizeof *chunks);

		memcpy(chunks, buf_pool->chunks, buf_pool->n_chunks
		       * sizeof *chunks);

		chunk = &chunks[buf_pool->n_chunks];

		if (!buf_chunk_init(chunk, mem_size)) {
			mem_free(chunks);
		} else {
			buf_pool->curr_size += chunk->size;
			srv_buf_pool_curr_size = buf_pool->curr_size
				* UNIV_PAGE_SIZE;
			mem_free(buf_pool->chunks);
			buf_pool->chunks = chunks;
			buf_pool->n_chunks++;
		}
1468 1469

		srv_buf_pool_old_size = srv_buf_pool_size;
1470
		buf_pool_mutex_exit();
osku's avatar
osku committed
1471 1472
	}

1473
	buf_pool_page_hash_rebuild();
1474
}
osku's avatar
osku committed
1475 1476 1477 1478 1479 1480 1481 1482

/************************************************************************
Moves to the block to the start of the LRU list if there is a danger
that the block would drift out of the buffer pool. */
UNIV_INLINE
void
buf_block_make_young(
/*=================*/
1483
	buf_page_t*	bpage)	/* in: block to make younger */
osku's avatar
osku committed
1484
{
1485
	ut_ad(!buf_pool_mutex_own());
1486 1487 1488 1489

	/* Note that we read freed_page_clock's without holding any mutex:
	this is allowed since the result is used only in heuristics */

1490
	if (buf_page_peek_if_too_old(bpage)) {
osku's avatar
osku committed
1491

1492
		buf_pool_mutex_enter();
osku's avatar
osku committed
1493 1494 1495
		/* There has been freeing activity in the LRU list:
		best to move to the head of the LRU list */

1496
		buf_LRU_make_block_young(bpage);
1497
		buf_pool_mutex_exit();
osku's avatar
osku committed
1498 1499 1500 1501 1502 1503 1504
	}
}

/************************************************************************
Moves a page to the start of the buffer pool LRU list. This high-level
function can be used to prevent an important page from from slipping out of
the buffer pool. */
1505
UNIV_INTERN
osku's avatar
osku committed
1506 1507
void
buf_page_make_young(
1508
/*================*/
1509
	buf_page_t*	bpage)	/* in: buffer block of a file page */
osku's avatar
osku committed
1510
{
1511
	buf_pool_mutex_enter();
osku's avatar
osku committed
1512

1513
	ut_a(buf_page_in_file(bpage));
osku's avatar
osku committed
1514

1515
	buf_LRU_make_block_young(bpage);
osku's avatar
osku committed
1516

1517
	buf_pool_mutex_exit();
osku's avatar
osku committed
1518 1519 1520 1521 1522
}

/************************************************************************
Resets the check_index_page_at_flush field of a page if found in the buffer
pool. */
1523
UNIV_INTERN
osku's avatar
osku committed
1524 1525 1526 1527 1528 1529 1530 1531
void
buf_reset_check_index_page_at_flush(
/*================================*/
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_block_t*	block;

1532
	buf_pool_mutex_enter();
osku's avatar
osku committed
1533

1534
	block = (buf_block_t*) buf_page_hash_get(space, offset);
osku's avatar
osku committed
1535

1536
	if (block && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE) {
osku's avatar
osku committed
1537 1538
		block->check_index_page_at_flush = FALSE;
	}
1539

1540
	buf_pool_mutex_exit();
osku's avatar
osku committed
1541 1542 1543 1544 1545 1546
}

/************************************************************************
Returns the current state of is_hashed of a page. FALSE if the page is
not in the pool. NOTE that this operation does not fix the page in the
pool if it is found there. */
1547
UNIV_INTERN
osku's avatar
osku committed
1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558
ibool
buf_page_peek_if_search_hashed(
/*===========================*/
			/* out: TRUE if page hash index is built in search
			system */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_block_t*	block;
	ibool		is_hashed;

1559
	buf_pool_mutex_enter();
osku's avatar
osku committed
1560

1561
	block = (buf_block_t*) buf_page_hash_get(space, offset);
osku's avatar
osku committed
1562

1563
	if (!block || buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE) {
osku's avatar
osku committed
1564 1565 1566 1567 1568
		is_hashed = FALSE;
	} else {
		is_hashed = block->is_hashed;
	}

1569
	buf_pool_mutex_exit();
osku's avatar
osku committed
1570 1571 1572 1573

	return(is_hashed);
}

1574
#ifdef UNIV_DEBUG_FILE_ACCESSES
osku's avatar
osku committed
1575 1576 1577 1578 1579
/************************************************************************
Sets file_page_was_freed TRUE if the page is found in the buffer pool.
This function should be called when we free a file page and want the
debug version to check that it is not accessed any more unless
reallocated. */
1580
UNIV_INTERN
1581
buf_page_t*
osku's avatar
osku committed
1582 1583
buf_page_set_file_page_was_freed(
/*=============================*/
1584
			/* out: control block if found in page hash table,
osku's avatar
osku committed
1585 1586 1587 1588
			otherwise NULL */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
1589
	buf_page_t*	bpage;
osku's avatar
osku committed
1590

1591
	buf_pool_mutex_enter();
osku's avatar
osku committed
1592

1593
	bpage = buf_page_hash_get(space, offset);
osku's avatar
osku committed
1594

1595 1596
	if (bpage) {
		bpage->file_page_was_freed = TRUE;
osku's avatar
osku committed
1597 1598
	}

1599
	buf_pool_mutex_exit();
osku's avatar
osku committed
1600

1601
	return(bpage);
osku's avatar
osku committed
1602 1603 1604 1605 1606 1607 1608
}

/************************************************************************
Sets file_page_was_freed FALSE if the page is found in the buffer pool.
This function should be called when we free a file page and want the
debug version to check that it is not accessed any more unless
reallocated. */
1609
UNIV_INTERN
1610
buf_page_t*
osku's avatar
osku committed
1611 1612
buf_page_reset_file_page_was_freed(
/*===============================*/
1613
			/* out: control block if found in page hash table,
osku's avatar
osku committed
1614 1615 1616 1617
			otherwise NULL */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
1618
	buf_page_t*	bpage;
osku's avatar
osku committed
1619

1620
	buf_pool_mutex_enter();
osku's avatar
osku committed
1621

1622
	bpage = buf_page_hash_get(space, offset);
osku's avatar
osku committed
1623

1624 1625
	if (bpage) {
		bpage->file_page_was_freed = FALSE;
osku's avatar
osku committed
1626 1627
	}

1628
	buf_pool_mutex_exit();
osku's avatar
osku committed
1629

1630
	return(bpage);
osku's avatar
osku committed
1631
}
1632
#endif /* UNIV_DEBUG_FILE_ACCESSES */
osku's avatar
osku committed
1633

1634
/************************************************************************
1635 1636
Get read access to a compressed page (usually of type
FIL_PAGE_TYPE_ZBLOB or FIL_PAGE_TYPE_ZBLOB2).
1637 1638 1639 1640 1641
The page must be released with buf_page_release_zip().
NOTE: the page is not protected by any latch.  Mutual exclusion has to
be implemented at a higher level.  In other words, all possible
accesses to a given page through this function must be protected by
the same set of mutexes or latches. */
1642
UNIV_INTERN
1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660
buf_page_t*
buf_page_get_zip(
/*=============*/
				/* out: pointer to the block */
	ulint		space,	/* in: space id */
	ulint		zip_size,/* in: compressed page size */
	ulint		offset)	/* in: page number */
{
	buf_page_t*	bpage;
	mutex_t*	block_mutex;
	ibool		must_read;

#ifndef UNIV_LOG_DEBUG
	ut_ad(!ibuf_inside());
#endif
	buf_pool->n_page_gets++;

	for (;;) {
1661
		buf_pool_mutex_enter();
1662 1663 1664 1665 1666 1667 1668 1669
lookup:
		bpage = buf_page_hash_get(space, offset);
		if (bpage) {
			break;
		}

		/* Page not in buf_pool: needs to be read from file */

1670
		buf_pool_mutex_exit();
1671 1672 1673 1674 1675 1676 1677 1678 1679 1680

		buf_read_page(space, zip_size, offset);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		ut_a(++buf_dbg_counter % 37 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
	}

	if (UNIV_UNLIKELY(!bpage->zip.data)) {
		/* There is no compressed page. */
1681
		buf_pool_mutex_exit();
1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701
		return(NULL);
	}

	block_mutex = buf_page_get_mutex(bpage);
	mutex_enter(block_mutex);

	switch (buf_page_get_state(bpage)) {
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
	case BUF_BLOCK_ZIP_FREE:
		ut_error;
		break;
	case BUF_BLOCK_ZIP_PAGE:
	case BUF_BLOCK_ZIP_DIRTY:
		bpage->buf_fix_count++;
		break;
	case BUF_BLOCK_FILE_PAGE:
		/* Discard the uncompressed page frame if possible. */
1702 1703
		if (buf_LRU_free_block(bpage, FALSE, NULL)
		    == BUF_LRU_FREED) {
1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715

			mutex_exit(block_mutex);
			goto lookup;
		}

		buf_block_buf_fix_inc((buf_block_t*) bpage,
				      __FILE__, __LINE__);
		break;
	}

	must_read = buf_page_get_io_fix(bpage) == BUF_IO_READ;

1716
	buf_pool_mutex_exit();
1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753

	buf_page_set_accessed(bpage, TRUE);

	mutex_exit(block_mutex);

	buf_block_make_young(bpage);

#ifdef UNIV_DEBUG_FILE_ACCESSES
	ut_a(!bpage->file_page_was_freed);
#endif

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(bpage->buf_fix_count > 0);
	ut_a(buf_page_in_file(bpage));
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

	if (must_read) {
		/* Let us wait until the read operation
		completes */

		for (;;) {
			enum buf_io_fix	io_fix;

			mutex_enter(block_mutex);
			io_fix = buf_page_get_io_fix(bpage);
			mutex_exit(block_mutex);

			if (io_fix == BUF_IO_READ) {

				os_thread_sleep(WAIT_FOR_READ);
			} else {
				break;
			}
		}
	}

1754
#ifdef UNIV_IBUF_COUNT_DEBUG
1755 1756 1757 1758 1759 1760
	ut_a(ibuf_count_get(buf_page_get_space(bpage),
			    buf_page_get_page_no(bpage)) == 0);
#endif
	return(bpage);
}

1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829
/************************************************************************
Initialize some fields of a control block. */
UNIV_INLINE
void
buf_block_init_low(
/*===============*/
	buf_block_t*	block)	/* in: block to init */
{
	block->check_index_page_at_flush = FALSE;
	block->index		= NULL;

	block->n_hash_helps	= 0;
	block->is_hashed	= FALSE;
	block->n_fields		= 1;
	block->n_bytes		= 0;
	block->left_side	= TRUE;
}

/************************************************************************
Decompress a block. */
static
ibool
buf_zip_decompress(
/*===============*/
				/* out: TRUE if successful */
	buf_block_t*	block,	/* in/out: block */
	ibool		check)	/* in: TRUE=verify the page checksum */
{
	const byte* frame = block->page.zip.data;

	ut_ad(buf_block_get_zip_size(block));
	ut_a(buf_block_get_space(block) != 0);

	if (UNIV_LIKELY(check)) {
		ulint	stamp_checksum	= mach_read_from_4(
			frame + FIL_PAGE_SPACE_OR_CHKSUM);
		ulint	calc_checksum	= page_zip_calc_checksum(
			frame, page_zip_get_size(&block->page.zip));

		if (UNIV_UNLIKELY(stamp_checksum != calc_checksum)) {
			ut_print_timestamp(stderr);
			fprintf(stderr,
				"  InnoDB: compressed page checksum mismatch"
				" (space %u page %u): %lu != %lu\n",
				block->page.space, block->page.offset,
				stamp_checksum, calc_checksum);
			return(FALSE);
		}
	}

	switch (fil_page_get_type(frame)) {
	case FIL_PAGE_INDEX:
		if (page_zip_decompress(&block->page.zip,
					block->frame)) {
			return(TRUE);
		}

		fprintf(stderr,
			"InnoDB: unable to decompress space %lu page %lu\n",
			(ulong) block->page.space,
			(ulong) block->page.offset);
		return(FALSE);

	case FIL_PAGE_TYPE_ALLOCATED:
	case FIL_PAGE_INODE:
	case FIL_PAGE_IBUF_BITMAP:
	case FIL_PAGE_TYPE_FSP_HDR:
	case FIL_PAGE_TYPE_XDES:
	case FIL_PAGE_TYPE_ZBLOB:
1830
	case FIL_PAGE_TYPE_ZBLOB2:
1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844
		/* Copy to uncompressed storage. */
		memcpy(block->frame, frame,
		       buf_block_get_zip_size(block));
		return(TRUE);
	}

	ut_print_timestamp(stderr);
	fprintf(stderr,
		"  InnoDB: unknown compressed page"
		" type %lu\n",
		fil_page_get_type(frame));
	return(FALSE);
}

1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875
/***********************************************************************
Gets the block to whose frame the pointer is pointing to. */
UNIV_INTERN
buf_block_t*
buf_block_align(
/*============*/
				/* out: pointer to block, never NULL */
	const byte*	ptr)	/* in: pointer to a frame */
{
	buf_chunk_t*	chunk;
	ulint		i;

	/* TODO: protect buf_pool->chunks with a mutex (it will
	currently remain constant after buf_pool_init()) */
	for (chunk = buf_pool->chunks, i = buf_pool->n_chunks; i--; chunk++) {
		lint	offs = ptr - chunk->blocks->frame;

		if (UNIV_UNLIKELY(offs < 0)) {

			continue;
		}

		offs >>= UNIV_PAGE_SIZE_SHIFT;

		if (UNIV_LIKELY((ulint) offs < chunk->size)) {
			buf_block_t*	block = &chunk->blocks[offs];

			/* The function buf_chunk_init() invokes
			buf_block_init() so that block[n].frame ==
			block->frame + n * UNIV_PAGE_SIZE.  Check it. */
			ut_ad(block->frame == page_align(ptr));
1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921
#ifdef UNIV_DEBUG
			/* A thread that updates these fields must
			hold buf_pool_mutex and block->mutex.  Acquire
			only the latter. */
			mutex_enter(&block->mutex);

			switch (buf_block_get_state(block)) {
			case BUF_BLOCK_ZIP_FREE:
			case BUF_BLOCK_ZIP_PAGE:
			case BUF_BLOCK_ZIP_DIRTY:
				/* These types should only be used in
				the compressed buffer pool, whose
				memory is allocated from
				buf_pool->chunks, in UNIV_PAGE_SIZE
				blocks flagged as BUF_BLOCK_MEMORY. */
				ut_error;
				break;
			case BUF_BLOCK_NOT_USED:
			case BUF_BLOCK_READY_FOR_USE:
			case BUF_BLOCK_MEMORY:
				/* Some data structures contain
				"guess" pointers to file pages.  The
				file pages may have been freed and
				reused.  Do not complain. */
				break;
			case BUF_BLOCK_REMOVE_HASH:
				/* buf_LRU_block_remove_hashed_page()
				will overwrite the FIL_PAGE_OFFSET and
				FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID with
				0xff and set the state to
				BUF_BLOCK_REMOVE_HASH. */
				ut_ad(page_get_space_id(page_align(ptr))
				      == 0xffffffff);
				ut_ad(page_get_page_no(page_align(ptr))
				      == 0xffffffff);
				break;
			case BUF_BLOCK_FILE_PAGE:
				ut_ad(block->page.space
				      == page_get_space_id(page_align(ptr)));
				ut_ad(block->page.offset
				      == page_get_page_no(page_align(ptr)));
				break;
			}

			mutex_exit(&block->mutex);
#endif /* UNIV_DEBUG */
1922 1923 1924 1925 1926 1927 1928 1929 1930 1931

			return(block);
		}
	}

	/* The block should always be found. */
	ut_error;
	return(NULL);
}

1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946
/************************************************************************
Find out if a buffer block was created by buf_chunk_init(). */
static
ibool
buf_block_is_uncompressed(
/*======================*/
					/* out: TRUE if "block" has
					been added to buf_pool->free
					by buf_chunk_init() */
	const buf_block_t*	block)	/* in: pointer to block,
					not dereferenced */
{
	const buf_chunk_t*		chunk	= buf_pool->chunks;
	const buf_chunk_t* const	echunk	= chunk + buf_pool->n_chunks;

1947
	ut_ad(buf_pool_mutex_own());
1948

1949
	if (UNIV_UNLIKELY((((ulint) block) % sizeof *block) != 0)) {
1950 1951 1952 1953
		/* The pointer should be aligned. */
		return(FALSE);
	}

1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966
	while (chunk < echunk) {
		if (block >= chunk->blocks
		    && block < chunk->blocks + chunk->size) {

			return(TRUE);
		}

		chunk++;
	}

	return(FALSE);
}

osku's avatar
osku committed
1967 1968
/************************************************************************
This is the general function used to get access to a database page. */
1969
UNIV_INTERN
1970
buf_block_t*
osku's avatar
osku committed
1971 1972
buf_page_get_gen(
/*=============*/
1973
				/* out: pointer to the block or NULL */
osku's avatar
osku committed
1974
	ulint		space,	/* in: space id */
1975 1976
	ulint		zip_size,/* in: compressed page size in bytes
				or 0 for uncompressed pages */
osku's avatar
osku committed
1977 1978
	ulint		offset,	/* in: page number */
	ulint		rw_latch,/* in: RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH */
1979
	buf_block_t*	guess,	/* in: guessed block or NULL */
osku's avatar
osku committed
1980
	ulint		mode,	/* in: BUF_GET, BUF_GET_IF_IN_POOL,
1981
				BUF_GET_NO_LATCH */
osku's avatar
osku committed
1982 1983 1984 1985 1986 1987 1988 1989
	const char*	file,	/* in: file name */
	ulint		line,	/* in: line where called */
	mtr_t*		mtr)	/* in: mini-transaction */
{
	buf_block_t*	block;
	ibool		accessed;
	ulint		fix_type;
	ibool		must_read;
1990

osku's avatar
osku committed
1991 1992
	ut_ad(mtr);
	ut_ad((rw_latch == RW_S_LATCH)
1993 1994
	      || (rw_latch == RW_X_LATCH)
	      || (rw_latch == RW_NO_LATCH));
osku's avatar
osku committed
1995 1996
	ut_ad((mode != BUF_GET_NO_LATCH) || (rw_latch == RW_NO_LATCH));
	ut_ad((mode == BUF_GET) || (mode == BUF_GET_IF_IN_POOL)
1997
	      || (mode == BUF_GET_NO_LATCH));
1998
	ut_ad(zip_size == fil_space_get_zip_size(space));
osku's avatar
osku committed
1999
#ifndef UNIV_LOG_DEBUG
2000
	ut_ad(!ibuf_inside() || ibuf_page(space, zip_size, offset, NULL));
osku's avatar
osku committed
2001 2002 2003
#endif
	buf_pool->n_page_gets++;
loop:
2004
	block = guess;
2005
	buf_pool_mutex_enter();
2006

2007
	if (block) {
2008
		/* If the guess is a compressed page descriptor that
2009 2010
		has been allocated by buf_buddy_alloc(), it may have
		been invalidated by buf_buddy_relocate().  In that
2011 2012 2013 2014
		case, block could point to something that happens to
		contain the expected bits in block->page.  Similarly,
		the guess may be pointing to a buffer pool chunk that
		has been released when resizing the buffer pool. */
2015 2016 2017

		if (!buf_block_is_uncompressed(block)
		    || offset != block->page.offset
2018
		    || space != block->page.space
2019
		    || buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE) {
osku's avatar
osku committed
2020

2021
			block = guess = NULL;
2022 2023 2024
		} else {
			ut_ad(!block->page.in_zip_hash);
			ut_ad(block->page.in_page_hash);
osku's avatar
osku committed
2025 2026 2027 2028
		}
	}

	if (block == NULL) {
2029
		block = (buf_block_t*) buf_page_hash_get(space, offset);
osku's avatar
osku committed
2030 2031
	}

2032 2033 2034
loop2:
	if (block == NULL) {
		/* Page not in buf_pool: needs to be read from file */
osku's avatar
osku committed
2035

2036
		buf_pool_mutex_exit();
osku's avatar
osku committed
2037

2038
		if (mode == BUF_GET_IF_IN_POOL) {
osku's avatar
osku committed
2039 2040 2041 2042

			return(NULL);
		}

marko's avatar
marko committed
2043
		buf_read_page(space, zip_size, offset);
osku's avatar
osku committed
2044

2045 2046 2047
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		ut_a(++buf_dbg_counter % 37 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
osku's avatar
osku committed
2048 2049 2050
		goto loop;
	}

2051
	ut_ad(page_zip_get_size(&block->page.zip) == zip_size);
osku's avatar
osku committed
2052

2053
	must_read = buf_block_get_io_fix(block) == BUF_IO_READ;
2054

2055 2056
	if (must_read && mode == BUF_GET_IF_IN_POOL) {
		/* The page is only being read to buffer */
2057
		buf_pool_mutex_exit();
osku's avatar
osku committed
2058

2059
		return(NULL);
2060
	}
osku's avatar
osku committed
2061

2062 2063
	switch (buf_block_get_state(block)) {
		buf_page_t*	bpage;
2064
		ibool		success;
2065 2066 2067 2068 2069 2070 2071 2072 2073 2074

	case BUF_BLOCK_FILE_PAGE:
		break;

	case BUF_BLOCK_ZIP_PAGE:
	case BUF_BLOCK_ZIP_DIRTY:
		bpage = &block->page;

		if (bpage->buf_fix_count
		    || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
2075 2076 2077
			/* This condition often occurs when the buffer
			is not buffer-fixed, but I/O-fixed by
			buf_page_init_for_read(). */
2078
wait_until_unfixed:
2079 2080
			/* The block is buffer-fixed or I/O-fixed.
			Try again later. */
2081
			buf_pool_mutex_exit();
2082 2083 2084 2085 2086 2087
			os_thread_sleep(WAIT_FOR_READ);

			goto loop;
		}

		/* Allocate an uncompressed page. */
2088
		buf_pool_mutex_exit();
2089 2090 2091 2092

		block = buf_LRU_get_free_block(0);
		ut_a(block);

2093
		buf_pool_mutex_enter();
2094 2095 2096 2097 2098 2099 2100 2101
		mutex_enter(&block->mutex);

		{
			buf_page_t*	hash_bpage
				= buf_page_hash_get(space, offset);

			if (UNIV_UNLIKELY(bpage != hash_bpage)) {
				/* The buf_pool->page_hash was modified
2102
				while buf_pool_mutex was released.
2103 2104 2105 2106 2107 2108 2109 2110 2111 2112
				Free the block that was allocated. */

				buf_LRU_block_free_non_file_page(block);
				mutex_exit(&block->mutex);

				block = (buf_block_t*) hash_bpage;
				goto loop2;
			}
		}

2113 2114 2115 2116 2117
		if (UNIV_UNLIKELY
		    (bpage->buf_fix_count
		     || buf_page_get_io_fix(bpage) != BUF_IO_NONE)) {

			/* The block was buffer-fixed or I/O-fixed
2118
			while buf_pool_mutex was not held by this thread.
2119 2120
			Free the block that was allocated and try again.
			This should be extremely unlikely. */
2121 2122 2123 2124 2125 2126 2127

			buf_LRU_block_free_non_file_page(block);
			mutex_exit(&block->mutex);

			goto wait_until_unfixed;
		}

2128 2129 2130
		/* Move the compressed page from bpage to block,
		and uncompress it. */

2131
		mutex_enter(&buf_pool_zip_mutex);
2132 2133 2134 2135 2136

		buf_relocate(bpage, &block->page);
		buf_block_init_low(block);
		block->lock_hash_val = lock_rec_hash(space, offset);

2137 2138
		UNIV_MEM_DESC(&block->page.zip.data,
			      page_zip_get_size(&block->page.zip), block);
2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165

		if (buf_page_get_state(&block->page)
		    == BUF_BLOCK_ZIP_PAGE) {
			UT_LIST_REMOVE(list, buf_pool->zip_clean,
				       &block->page);
			ut_ad(!block->page.in_flush_list);
		} else {
			/* Relocate buf_pool->flush_list. */
			buf_page_t*	b;

			b = UT_LIST_GET_PREV(list, &block->page);
			ut_ad(block->page.in_flush_list);
			UT_LIST_REMOVE(list, buf_pool->flush_list,
				       &block->page);

			if (b) {
				UT_LIST_INSERT_AFTER(
					list, buf_pool->flush_list, b,
					&block->page);
			} else {
				UT_LIST_ADD_FIRST(
					list, buf_pool->flush_list,
					&block->page);
			}
		}

		/* Buffer-fix, I/O-fix, and X-latch the block
2166 2167
		for the duration of the decompression.
		Also add the block to the unzip_LRU list. */
2168
		block->page.state = BUF_BLOCK_FILE_PAGE;
2169 2170 2171 2172

		/* Insert at the front of unzip_LRU list */
		buf_unzip_LRU_add_block(block, FALSE);

2173 2174
		block->page.buf_fix_count = 1;
		buf_block_set_io_fix(block, BUF_IO_READ);
2175
		buf_pool->n_pend_unzip++;
2176 2177
		rw_lock_x_lock(&block->lock);
		mutex_exit(&block->mutex);
2178
		mutex_exit(&buf_pool_zip_mutex);
2179 2180 2181

		buf_buddy_free(bpage, sizeof *bpage);

2182
		buf_pool_mutex_exit();
2183

2184
		/* Decompress the page and apply buffered operations
2185
		while not holding buf_pool_mutex or block->mutex. */
2186 2187 2188 2189 2190 2191
		success = buf_zip_decompress(block, srv_use_checksums);

		if (UNIV_LIKELY(success)) {
			ibuf_merge_or_delete_for_page(block, space, offset,
						      zip_size, TRUE);
		}
2192 2193

		/* Unfix and unlatch the block. */
2194
		buf_pool_mutex_enter();
2195
		mutex_enter(&block->mutex);
2196
		buf_pool->n_pend_unzip--;
2197 2198 2199 2200 2201
		block->page.buf_fix_count--;
		buf_block_set_io_fix(block, BUF_IO_NONE);
		mutex_exit(&block->mutex);
		rw_lock_x_unlock(&block->lock);

2202 2203
		if (UNIV_UNLIKELY(!success)) {

2204
			buf_pool_mutex_exit();
2205 2206 2207
			return(NULL);
		}

2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221
		break;

	case BUF_BLOCK_ZIP_FREE:
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
		ut_error;
		break;
	}

	ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);

	mutex_enter(&block->mutex);
2222
	UNIV_MEM_ASSERT_RW(&block->page, sizeof block->page);
2223

2224
	buf_block_buf_fix_inc(block, file, line);
2225
	buf_pool_mutex_exit();
osku's avatar
osku committed
2226 2227 2228

	/* Check if this is the first access to the page */

2229
	accessed = buf_page_is_accessed(&block->page);
osku's avatar
osku committed
2230

2231
	buf_page_set_accessed(&block->page, TRUE);
osku's avatar
osku committed
2232

2233 2234
	mutex_exit(&block->mutex);

2235
	buf_block_make_young(&block->page);
2236

osku's avatar
osku committed
2237
#ifdef UNIV_DEBUG_FILE_ACCESSES
2238
	ut_a(!block->page.file_page_was_freed);
2239
#endif
osku's avatar
osku committed
2240

2241 2242
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
2243
	ut_a(block->page.buf_fix_count > 0);
2244
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
2245
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
osku's avatar
osku committed
2246

2247 2248
	switch (rw_latch) {
	case RW_NO_LATCH:
osku's avatar
osku committed
2249
		if (must_read) {
2250
			/* Let us wait until the read operation
osku's avatar
osku committed
2251 2252
			completes */

2253
			for (;;) {
2254
				enum buf_io_fix	io_fix;
osku's avatar
osku committed
2255

2256 2257 2258
				mutex_enter(&block->mutex);
				io_fix = buf_block_get_io_fix(block);
				mutex_exit(&block->mutex);
osku's avatar
osku committed
2259

2260
				if (io_fix == BUF_IO_READ) {
osku's avatar
osku committed
2261

2262
					os_thread_sleep(WAIT_FOR_READ);
2263 2264
				} else {
					break;
osku's avatar
osku committed
2265 2266 2267 2268 2269
				}
			}
		}

		fix_type = MTR_MEMO_BUF_FIX;
2270
		break;
osku's avatar
osku committed
2271

2272
	case RW_S_LATCH:
osku's avatar
osku committed
2273 2274 2275
		rw_lock_s_lock_func(&(block->lock), 0, file, line);

		fix_type = MTR_MEMO_PAGE_S_FIX;
2276 2277
		break;

2278 2279
	default:
		ut_ad(rw_latch == RW_X_LATCH);
osku's avatar
osku committed
2280 2281 2282
		rw_lock_x_lock_func(&(block->lock), 0, file, line);

		fix_type = MTR_MEMO_PAGE_X_FIX;
2283
		break;
osku's avatar
osku committed
2284 2285 2286 2287 2288 2289 2290 2291
	}

	mtr_memo_push(mtr, block, fix_type);

	if (!accessed) {
		/* In the case of a first access, try to apply linear
		read-ahead */

marko's avatar
marko committed
2292
		buf_read_ahead_linear(space, zip_size, offset);
osku's avatar
osku committed
2293 2294
	}

2295
#ifdef UNIV_IBUF_COUNT_DEBUG
2296 2297
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
osku's avatar
osku committed
2298
#endif
2299
	return(block);
osku's avatar
osku committed
2300 2301 2302 2303 2304
}

/************************************************************************
This is the general function used to get optimistic access to a database
page. */
2305
UNIV_INTERN
osku's avatar
osku committed
2306 2307 2308 2309 2310 2311
ibool
buf_page_optimistic_get_func(
/*=========================*/
				/* out: TRUE if success */
	ulint		rw_latch,/* in: RW_S_LATCH, RW_X_LATCH */
	buf_block_t*	block,	/* in: guessed buffer block */
2312
	ib_uint64_t	modify_clock,/* in: modify clock value if mode is
osku's avatar
osku committed
2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323
				..._GUESS_ON_CLOCK */
	const char*	file,	/* in: file name */
	ulint		line,	/* in: line where called */
	mtr_t*		mtr)	/* in: mini-transaction */
{
	ibool		accessed;
	ibool		success;
	ulint		fix_type;

	ut_ad(mtr && block);
	ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH));
2324

2325
	mutex_enter(&block->mutex);
osku's avatar
osku committed
2326

2327
	if (UNIV_UNLIKELY(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE)) {
2328 2329

		mutex_exit(&block->mutex);
osku's avatar
osku committed
2330 2331 2332 2333

		return(FALSE);
	}

2334
	buf_block_buf_fix_inc(block, file, line);
2335 2336
	accessed = buf_page_is_accessed(&block->page);
	buf_page_set_accessed(&block->page, TRUE);
osku's avatar
osku committed
2337

2338 2339
	mutex_exit(&block->mutex);

2340
	buf_block_make_young(&block->page);
2341 2342

	/* Check if this is the first access to the page */
osku's avatar
osku committed
2343

2344
	ut_ad(!ibuf_inside()
2345 2346
	      || ibuf_page(buf_block_get_space(block),
			   buf_block_get_zip_size(block),
2347
			   buf_block_get_page_no(block), NULL));
osku's avatar
osku committed
2348 2349

	if (rw_latch == RW_S_LATCH) {
inaam's avatar
inaam committed
2350 2351
		success = rw_lock_s_lock_nowait(&(block->lock),
						file, line);
osku's avatar
osku committed
2352 2353 2354
		fix_type = MTR_MEMO_PAGE_S_FIX;
	} else {
		success = rw_lock_x_lock_func_nowait(&(block->lock),
2355
						     file, line);
osku's avatar
osku committed
2356 2357 2358 2359
		fix_type = MTR_MEMO_PAGE_X_FIX;
	}

	if (UNIV_UNLIKELY(!success)) {
2360
		mutex_enter(&block->mutex);
2361
		buf_block_buf_fix_dec(block);
2362 2363 2364
		mutex_exit(&block->mutex);

		return(FALSE);
osku's avatar
osku committed
2365 2366
	}

2367
	if (UNIV_UNLIKELY(modify_clock != block->modify_clock)) {
2368
		buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
marko's avatar
marko committed
2369

osku's avatar
osku committed
2370 2371 2372 2373 2374 2375
		if (rw_latch == RW_S_LATCH) {
			rw_lock_s_unlock(&(block->lock));
		} else {
			rw_lock_x_unlock(&(block->lock));
		}

2376
		mutex_enter(&block->mutex);
2377
		buf_block_buf_fix_dec(block);
2378 2379 2380
		mutex_exit(&block->mutex);

		return(FALSE);
osku's avatar
osku committed
2381 2382 2383 2384
	}

	mtr_memo_push(mtr, block, fix_type);

2385 2386
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
2387
	ut_a(block->page.buf_fix_count > 0);
2388
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
2389
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
osku's avatar
osku committed
2390 2391

#ifdef UNIV_DEBUG_FILE_ACCESSES
2392
	ut_a(block->page.file_page_was_freed == FALSE);
osku's avatar
osku committed
2393 2394 2395 2396 2397
#endif
	if (UNIV_UNLIKELY(!accessed)) {
		/* In the case of a first access, try to apply linear
		read-ahead */

2398 2399 2400
		buf_read_ahead_linear(buf_block_get_space(block),
				      buf_block_get_zip_size(block),
				      buf_block_get_page_no(block));
osku's avatar
osku committed
2401 2402
	}

2403
#ifdef UNIV_IBUF_COUNT_DEBUG
2404 2405
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
osku's avatar
osku committed
2406 2407 2408 2409 2410 2411 2412 2413 2414 2415
#endif
	buf_pool->n_page_gets++;

	return(TRUE);
}

/************************************************************************
This is used to get access to a known database page, when no waiting can be
done. For example, if a search in an adaptive hash index leads us to this
frame. */
2416
UNIV_INTERN
osku's avatar
osku committed
2417 2418 2419 2420 2421
ibool
buf_page_get_known_nowait(
/*======================*/
				/* out: TRUE if success */
	ulint		rw_latch,/* in: RW_S_LATCH, RW_X_LATCH */
2422
	buf_block_t*	block,	/* in: the known page */
osku's avatar
osku committed
2423 2424 2425 2426 2427 2428 2429 2430 2431 2432
	ulint		mode,	/* in: BUF_MAKE_YOUNG or BUF_KEEP_OLD */
	const char*	file,	/* in: file name */
	ulint		line,	/* in: line where called */
	mtr_t*		mtr)	/* in: mini-transaction */
{
	ibool		success;
	ulint		fix_type;

	ut_ad(mtr);
	ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH));
2433

2434
	mutex_enter(&block->mutex);
osku's avatar
osku committed
2435

2436
	if (buf_block_get_state(block) == BUF_BLOCK_REMOVE_HASH) {
2437 2438
		/* Another thread is just freeing the block from the LRU list
		of the buffer pool: do not try to access this page; this
osku's avatar
osku committed
2439 2440 2441 2442 2443
		attempt to access the page can only come through the hash
		index because when the buffer block state is ..._REMOVE_HASH,
		we have already removed it from the page address hash table
		of the buffer pool. */

2444
		mutex_exit(&block->mutex);
osku's avatar
osku committed
2445 2446 2447 2448

		return(FALSE);
	}

2449
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
osku's avatar
osku committed
2450

2451 2452
	buf_block_buf_fix_inc(block, file, line);

2453 2454
	mutex_exit(&block->mutex);

osku's avatar
osku committed
2455
	if (mode == BUF_MAKE_YOUNG) {
2456
		buf_block_make_young(&block->page);
osku's avatar
osku committed
2457 2458 2459 2460 2461
	}

	ut_ad(!ibuf_inside() || (mode == BUF_KEEP_OLD));

	if (rw_latch == RW_S_LATCH) {
inaam's avatar
inaam committed
2462 2463
		success = rw_lock_s_lock_nowait(&(block->lock),
						file, line);
osku's avatar
osku committed
2464 2465 2466
		fix_type = MTR_MEMO_PAGE_S_FIX;
	} else {
		success = rw_lock_x_lock_func_nowait(&(block->lock),
2467
						     file, line);
osku's avatar
osku committed
2468 2469
		fix_type = MTR_MEMO_PAGE_X_FIX;
	}
2470

osku's avatar
osku committed
2471
	if (!success) {
2472
		mutex_enter(&block->mutex);
2473
		buf_block_buf_fix_dec(block);
2474 2475
		mutex_exit(&block->mutex);

osku's avatar
osku committed
2476 2477 2478 2479 2480
		return(FALSE);
	}

	mtr_memo_push(mtr, block, fix_type);

2481 2482
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
2483
	ut_a(block->page.buf_fix_count > 0);
2484
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
2485
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
osku's avatar
osku committed
2486
#ifdef UNIV_DEBUG_FILE_ACCESSES
2487
	ut_a(block->page.file_page_was_freed == FALSE);
osku's avatar
osku committed
2488 2489
#endif

2490
#ifdef UNIV_IBUF_COUNT_DEBUG
osku's avatar
osku committed
2491
	ut_a((mode == BUF_KEEP_OLD)
2492 2493
	     || (ibuf_count_get(buf_block_get_space(block),
				buf_block_get_page_no(block)) == 0));
osku's avatar
osku committed
2494 2495 2496 2497 2498 2499
#endif
	buf_pool->n_page_gets++;

	return(TRUE);
}

vasil's avatar
vasil committed
2500 2501 2502 2503
/***********************************************************************
Given a tablespace id and page number tries to get that page. If the
page is not in the buffer pool it is not loaded and NULL is returned.
Suitable for using when holding the kernel mutex. */
2504
UNIV_INTERN
2505
const buf_block_t*
vasil's avatar
vasil committed
2506 2507
buf_page_try_get_func(
/*==================*/
vasil's avatar
vasil committed
2508
				/* out: pointer to a page or NULL */
vasil's avatar
vasil committed
2509 2510 2511 2512 2513 2514 2515
	ulint		space_id,/* in: tablespace id */
	ulint		page_no,/* in: page number */
	const char*	file,	/* in: file name */
	ulint		line,	/* in: line where called */
	mtr_t*		mtr)	/* in: mini-transaction */
{
	buf_block_t*	block;
2516 2517
	ibool		success;
	ulint		fix_type;
vasil's avatar
vasil committed
2518

2519
	buf_pool_mutex_enter();
2520 2521 2522
	block = buf_block_hash_get(space_id, page_no);

	if (!block) {
2523
		buf_pool_mutex_exit();
2524 2525 2526 2527
		return(NULL);
	}

	mutex_enter(&block->mutex);
2528
	buf_pool_mutex_exit();
vasil's avatar
vasil committed
2529

2530 2531 2532 2533 2534
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
	ut_a(buf_block_get_space(block) == space_id);
	ut_a(buf_block_get_page_no(block) == page_no);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
vasil's avatar
vasil committed
2535

2536 2537
	buf_block_buf_fix_inc(block, file, line);
	mutex_exit(&block->mutex);
vasil's avatar
vasil committed
2538

2539
	fix_type = MTR_MEMO_PAGE_S_FIX;
inaam's avatar
inaam committed
2540
	success = rw_lock_s_lock_nowait(&block->lock, file, line);
vasil's avatar
vasil committed
2541

2542 2543 2544 2545
	if (!success) {
		/* Let us try to get an X-latch. If the current thread
		is holding an X-latch on the page, we cannot get an
		S-latch. */
vasil's avatar
vasil committed
2546

2547 2548 2549
		fix_type = MTR_MEMO_PAGE_X_FIX;
		success = rw_lock_x_lock_func_nowait(&block->lock,
						     file, line);
vasil's avatar
vasil committed
2550 2551
	}

2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569
	if (!success) {
		mutex_enter(&block->mutex);
		buf_block_buf_fix_dec(block);
		mutex_exit(&block->mutex);

		return(NULL);
	}

	mtr_memo_push(mtr, block, fix_type);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG_FILE_ACCESSES
	ut_a(block->page.file_page_was_freed == FALSE);
#endif /* UNIV_DEBUG_FILE_ACCESSES */
	buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
marko's avatar
marko committed
2570

2571 2572
	buf_pool->n_page_gets++;

2573 2574 2575 2576 2577
#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
#endif

vasil's avatar
vasil committed
2578 2579 2580
	return(block);
}

2581 2582 2583 2584 2585 2586 2587 2588
/************************************************************************
Initialize some fields of a control block. */
UNIV_INLINE
void
buf_page_init_low(
/*==============*/
	buf_page_t*	bpage)	/* in: block to init */
{
2589
	bpage->flush_type = BUF_FLUSH_LRU;
2590 2591 2592 2593 2594 2595
	bpage->accessed = FALSE;
	bpage->io_fix = BUF_IO_NONE;
	bpage->buf_fix_count = 0;
	bpage->freed_page_clock = 0;
	bpage->newest_modification = 0;
	bpage->oldest_modification = 0;
2596
	HASH_INVALIDATE(bpage, hash);
2597 2598 2599 2600 2601
#ifdef UNIV_DEBUG_FILE_ACCESSES
	bpage->file_page_was_freed = FALSE;
#endif /* UNIV_DEBUG_FILE_ACCESSES */
}

2602
#ifdef UNIV_HOTBACKUP
osku's avatar
osku committed
2603 2604
/************************************************************************
Inits a page to the buffer buf_pool, for use in ibbackup --restore. */
2605
UNIV_INTERN
osku's avatar
osku committed
2606 2607 2608 2609 2610 2611
void
buf_page_init_for_backup_restore(
/*=============================*/
	ulint		space,	/* in: space id */
	ulint		offset,	/* in: offset of the page within space
				in units of a page */
2612 2613
	ulint		zip_size,/* in: compressed page size in bytes
				or 0 for uncompressed pages */
osku's avatar
osku committed
2614 2615
	buf_block_t*	block)	/* in: block to init */
{
2616
	buf_block_init_low(block);
osku's avatar
osku committed
2617 2618

	block->lock_hash_val	= 0;
2619

2620
	buf_page_init_low(&block->page);
2621
	block->page.state	= BUF_BLOCK_FILE_PAGE;
2622 2623
	block->page.space	= space;
	block->page.offset	= offset;
osku's avatar
osku committed
2624

2625
	page_zip_des_init(&block->page.zip);
2626

2627
	/* We assume that block->page.data has been allocated
2628 2629 2630
	with zip_size == UNIV_PAGE_SIZE. */
	ut_ad(zip_size <= UNIV_PAGE_SIZE);
	ut_ad(ut_is_2pow(zip_size));
2631
	page_zip_set_size(&block->page.zip, zip_size);
osku's avatar
osku committed
2632
}
2633
#endif /* UNIV_HOTBACKUP */
osku's avatar
osku committed
2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645

/************************************************************************
Inits a page to the buffer buf_pool. */
static
void
buf_page_init(
/*==========*/
	ulint		space,	/* in: space id */
	ulint		offset,	/* in: offset of the page within space
				in units of a page */
	buf_block_t*	block)	/* in: block to init */
{
2646
	buf_page_t*	hash_page;
2647

2648
	ut_ad(buf_pool_mutex_own());
2649
	ut_ad(mutex_own(&(block->mutex)));
2650
	ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE);
osku's avatar
osku committed
2651 2652

	/* Set the state of the block */
2653
	buf_block_set_file_page(block, space, offset);
2654 2655 2656 2657 2658 2659 2660 2661 2662

#ifdef UNIV_DEBUG_VALGRIND
	if (!space) {
		/* Silence valid Valgrind warnings about uninitialized
		data being written to data files.  There are some unused
		bytes on some pages that InnoDB does not initialize. */
		UNIV_MEM_VALID(block->frame, UNIV_PAGE_SIZE);
	}
#endif /* UNIV_DEBUG_VALGRIND */
osku's avatar
osku committed
2663

2664
	buf_block_init_low(block);
2665

osku's avatar
osku committed
2666
	block->lock_hash_val	= lock_rec_hash(space, offset);
2667

osku's avatar
osku committed
2668 2669
	/* Insert into the hash table of file pages */

2670 2671 2672
	hash_page = buf_page_hash_get(space, offset);

	if (UNIV_LIKELY_NULL(hash_page)) {
2673
		fprintf(stderr,
2674
			"InnoDB: Error: page %lu %lu already found"
2675
			" in the hash table: %p, %p\n",
osku's avatar
osku committed
2676
			(ulong) space,
2677
			(ulong) offset,
2678
			(const void*) hash_page, (const void*) block);
2679
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
2680
		mutex_exit(&block->mutex);
2681
		buf_pool_mutex_exit();
2682 2683 2684 2685
		buf_print();
		buf_LRU_print();
		buf_validate();
		buf_LRU_validate();
2686
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
2687
		ut_error;
2688
	}
osku's avatar
osku committed
2689

2690 2691
	buf_page_init_low(&block->page);

2692 2693 2694
	ut_ad(!block->page.in_zip_hash);
	ut_ad(!block->page.in_page_hash);
	ut_d(block->page.in_page_hash = TRUE);
2695 2696
	HASH_INSERT(buf_page_t, hash, buf_pool->page_hash,
		    buf_page_address_fold(space, offset), &block->page);
osku's avatar
osku committed
2697 2698 2699 2700 2701 2702 2703 2704 2705 2706
}

/************************************************************************
Function which inits a page for read to the buffer buf_pool. If the page is
(1) already in buf_pool, or
(2) if we specify to read only ibuf pages and the page is not an ibuf page, or
(3) if the space is deleted or being deleted,
then this function does nothing.
Sets the io_fix flag to BUF_IO_READ and sets a non-recursive exclusive lock
on the buffer frame. The io-handler must take care that the flag is cleared
2707
and the lock released later. */
2708
UNIV_INTERN
2709
buf_page_t*
osku's avatar
osku committed
2710 2711 2712 2713 2714 2715
buf_page_init_for_read(
/*===================*/
				/* out: pointer to the block or NULL */
	ulint*		err,	/* out: DB_SUCCESS or DB_TABLESPACE_DELETED */
	ulint		mode,	/* in: BUF_READ_IBUF_PAGES_ONLY, ... */
	ulint		space,	/* in: space id */
2716
	ulint		zip_size,/* in: compressed page size, or 0 */
2717
	ibool		unzip,	/* in: TRUE=request uncompressed page */
2718
	ib_int64_t	tablespace_version,/* in: prevents reading from a wrong
osku's avatar
osku committed
2719 2720 2721 2722 2723
				version of the tablespace in case we have done
				DISCARD + IMPORT */
	ulint		offset)	/* in: page number */
{
	buf_block_t*	block;
2724
	buf_page_t*	bpage;
osku's avatar
osku committed
2725
	mtr_t		mtr;
2726 2727
	ibool		lru	= FALSE;
	void*		data;
osku's avatar
osku committed
2728 2729 2730 2731 2732 2733 2734 2735

	ut_ad(buf_pool);

	*err = DB_SUCCESS;

	if (mode == BUF_READ_IBUF_PAGES_ONLY) {
		/* It is a read-ahead within an ibuf routine */

2736
		ut_ad(!ibuf_bitmap_page(zip_size, offset));
osku's avatar
osku committed
2737
		ut_ad(ibuf_inside());
2738

osku's avatar
osku committed
2739
		mtr_start(&mtr);
2740

2741 2742
		if (!recv_no_ibuf_operations
		    && !ibuf_page(space, zip_size, offset, &mtr)) {
osku's avatar
osku committed
2743 2744 2745 2746 2747 2748 2749 2750

			mtr_commit(&mtr);

			return(NULL);
		}
	} else {
		ut_ad(mode == BUF_READ_ANY_PAGE);
	}
2751

2752 2753
	if (zip_size && UNIV_LIKELY(!unzip)
	    && UNIV_LIKELY(!recv_recovery_is_on())) {
2754 2755 2756 2757 2758
		block = NULL;
	} else {
		block = buf_LRU_get_free_block(0);
		ut_ad(block);
	}
osku's avatar
osku committed
2759

2760
	buf_pool_mutex_enter();
2761

2762 2763
	if (buf_page_hash_get(space, offset)) {
		/* The page is already in the buffer pool. */
marko's avatar
marko committed
2764
err_exit:
2765
		if (block) {
2766
			mutex_enter(&block->mutex);
2767 2768 2769
			buf_LRU_block_free_non_file_page(block);
			mutex_exit(&block->mutex);
		}
osku's avatar
osku committed
2770

2771 2772
		bpage = NULL;
		goto func_exit;
marko's avatar
marko committed
2773
	}
osku's avatar
osku committed
2774

2775 2776 2777 2778 2779 2780 2781 2782 2783
	if (fil_tablespace_deleted_or_being_deleted_in_mem(
		    space, tablespace_version)) {
		/* The page belongs to a space which has been
		deleted or is being deleted. */
		*err = DB_TABLESPACE_DELETED;

		goto err_exit;
	}

2784
	if (block) {
2785 2786 2787
		bpage = &block->page;
		mutex_enter(&block->mutex);
		buf_page_init(space, offset, block);
osku's avatar
osku committed
2788

2789 2790
		/* The block must be put to the LRU list, to the old blocks */
		buf_LRU_add_block(bpage, TRUE/* to old blocks */);
osku's avatar
osku committed
2791

2792 2793 2794 2795 2796 2797 2798 2799
		/* We set a pass-type x-lock on the frame because then
		the same thread which called for the read operation
		(and is running now at this point of code) can wait
		for the read to complete by waiting for the x-lock on
		the frame; if the x-lock were recursive, the same
		thread would illegally get the x-lock before the page
		read is completed.  The x-lock is cleared by the
		io-handler thread. */
2800

2801
		rw_lock_x_lock_gen(&block->lock, BUF_IO_READ);
marko's avatar
marko committed
2802
		buf_page_set_io_fix(bpage, BUF_IO_READ);
2803

2804 2805
		if (UNIV_UNLIKELY(zip_size)) {
			page_zip_set_size(&block->page.zip, zip_size);
2806

2807
			/* buf_pool_mutex may be released and
2808 2809 2810
			reacquired by buf_buddy_alloc().  Thus, we
			must release block->mutex in order not to
			break the latching order in the reacquisition
2811
			of buf_pool_mutex.  We also must defer this
2812 2813 2814
			operation until after the block descriptor has
			been added to buf_pool->LRU and
			buf_pool->page_hash. */
2815 2816
			mutex_exit(&block->mutex);
			data = buf_buddy_alloc(zip_size, &lru);
2817 2818
			mutex_enter(&block->mutex);
			block->page.zip.data = data;
2819 2820 2821 2822 2823 2824 2825 2826

			/* To maintain the invariant
			block->in_unzip_LRU_list
			== buf_page_belongs_to_unzip_LRU(&block->page)
			we have to add this block to unzip_LRU
			after block->page.zip.data is set. */
			ut_ad(buf_page_belongs_to_unzip_LRU(&block->page));
			buf_unzip_LRU_add_block(block, TRUE);
2827
		}
2828

2829
		mutex_exit(&block->mutex);
2830
	} else {
2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843
		/* Defer buf_buddy_alloc() until after the block has
		been found not to exist.  The buf_buddy_alloc() and
		buf_buddy_free() calls may be expensive because of
		buf_buddy_relocate(). */

		/* The compressed page must be allocated before the
		control block (bpage), in order to avoid the
		invocation of buf_buddy_relocate_block() on
		uninitialized data. */
		data = buf_buddy_alloc(zip_size, &lru);
		bpage = buf_buddy_alloc(sizeof *bpage, &lru);

		/* If buf_buddy_alloc() allocated storage from the LRU list,
2844
		it released and reacquired buf_pool_mutex.  Thus, we must
2845 2846 2847 2848 2849 2850
		check the page_hash again, as it may have been modified. */
		if (UNIV_UNLIKELY(lru)
		    && UNIV_LIKELY_NULL(buf_page_hash_get(space, offset))) {

			/* The block was added by some other thread. */
			buf_buddy_free(bpage, sizeof *bpage);
2851
			buf_buddy_free(data, zip_size);
2852 2853 2854

			bpage = NULL;
			goto func_exit;
2855 2856 2857 2858 2859 2860
		}

		page_zip_des_init(&bpage->zip);
		page_zip_set_size(&bpage->zip, zip_size);
		bpage->zip.data = data;

2861
		mutex_enter(&buf_pool_zip_mutex);
2862 2863 2864 2865 2866 2867
		UNIV_MEM_DESC(bpage->zip.data,
			      page_zip_get_size(&bpage->zip), bpage);
		buf_page_init_low(bpage);
		bpage->state	= BUF_BLOCK_ZIP_PAGE;
		bpage->space	= space;
		bpage->offset	= offset;
2868

2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886
#ifdef UNIV_DEBUG
		bpage->in_page_hash = FALSE;
		bpage->in_zip_hash = FALSE;
		bpage->in_flush_list = FALSE;
		bpage->in_free_list = FALSE;
		bpage->in_LRU_list = FALSE;
#endif /* UNIV_DEBUG */

		ut_d(bpage->in_page_hash = TRUE);
		HASH_INSERT(buf_page_t, hash, buf_pool->page_hash,
			    buf_page_address_fold(space, offset), bpage);

		/* The block must be put to the LRU list, to the old blocks */
		buf_LRU_add_block(bpage, TRUE/* to old blocks */);
		buf_LRU_insert_zip_clean(bpage);

		buf_page_set_io_fix(bpage, BUF_IO_READ);

2887
		mutex_exit(&buf_pool_zip_mutex);
2888
	}
osku's avatar
osku committed
2889

2890
	buf_pool->n_pend_reads++;
2891
func_exit:
2892
	buf_pool_mutex_exit();
2893

osku's avatar
osku committed
2894 2895 2896 2897 2898
	if (mode == BUF_READ_IBUF_PAGES_ONLY) {

		mtr_commit(&mtr);
	}

2899
	ut_ad(!bpage || buf_page_in_file(bpage));
2900
	return(bpage);
2901
}
osku's avatar
osku committed
2902 2903 2904 2905 2906

/************************************************************************
Initializes a page to the buffer buf_pool. The page is usually not read
from a file even if it cannot be found in the buffer buf_pool. This is one
of the functions which perform to a block a state transition NOT_USED =>
2907
FILE_PAGE (the other is buf_page_get_gen). */
2908
UNIV_INTERN
2909
buf_block_t*
osku's avatar
osku committed
2910 2911
buf_page_create(
/*============*/
2912
			/* out: pointer to the block, page bufferfixed */
osku's avatar
osku committed
2913 2914 2915
	ulint	space,	/* in: space id */
	ulint	offset,	/* in: offset of the page within space in units of
			a page */
2916
	ulint	zip_size,/* in: compressed page size, or 0 */
osku's avatar
osku committed
2917 2918 2919 2920 2921
	mtr_t*	mtr)	/* in: mini-transaction handle */
{
	buf_frame_t*	frame;
	buf_block_t*	block;
	buf_block_t*	free_block	= NULL;
2922

osku's avatar
osku committed
2923
	ut_ad(mtr);
2924
	ut_ad(space || !zip_size);
osku's avatar
osku committed
2925

2926
	free_block = buf_LRU_get_free_block(0);
2927

2928
	buf_pool_mutex_enter();
osku's avatar
osku committed
2929

2930
	block = (buf_block_t*) buf_page_hash_get(space, offset);
osku's avatar
osku committed
2931

2932
	if (block && buf_page_in_file(&block->page)) {
2933
#ifdef UNIV_IBUF_COUNT_DEBUG
2934
		ut_a(ibuf_count_get(space, offset) == 0);
osku's avatar
osku committed
2935
#endif
2936
#ifdef UNIV_DEBUG_FILE_ACCESSES
2937
		block->page.file_page_was_freed = FALSE;
2938
#endif /* UNIV_DEBUG_FILE_ACCESSES */
osku's avatar
osku committed
2939 2940

		/* Page can be found in buf_pool */
2941
		buf_pool_mutex_exit();
osku's avatar
osku committed
2942 2943 2944

		buf_block_free(free_block);

2945 2946
		return(buf_page_get_with_no_latch(space, zip_size,
						  offset, mtr));
osku's avatar
osku committed
2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958
	}

	/* If we get here, the page was not in buf_pool: init it there */

#ifdef UNIV_DEBUG
	if (buf_debug_prints) {
		fprintf(stderr, "Creating space %lu page %lu to buffer\n",
			(ulong) space, (ulong) offset);
	}
#endif /* UNIV_DEBUG */

	block = free_block;
2959

2960 2961
	mutex_enter(&block->mutex);

marko's avatar
marko committed
2962
	buf_page_init(space, offset, block);
osku's avatar
osku committed
2963 2964

	/* The block must be put to the LRU list */
2965
	buf_LRU_add_block(&block->page, FALSE);
2966

2967
	buf_block_buf_fix_inc(block, __FILE__, __LINE__);
2968 2969
	buf_pool->n_pages_created++;

2970 2971
	if (zip_size) {
		void*	data;
2972
		ibool	lru;
2973 2974

		/* Prevent race conditions during buf_buddy_alloc(),
2975
		which may release and reacquire buf_pool_mutex,
2976 2977 2978 2979 2980
		by IO-fixing and X-latching the block. */

		buf_page_set_io_fix(&block->page, BUF_IO_READ);
		rw_lock_x_lock(&block->lock);

2981 2982
		page_zip_set_size(&block->page.zip, zip_size);
		mutex_exit(&block->mutex);
2983
		/* buf_pool_mutex may be released and reacquired by
2984 2985
		buf_buddy_alloc().  Thus, we must release block->mutex
		in order not to break the latching order in
2986
		the reacquisition of buf_pool_mutex.  We also must
2987 2988
		defer this operation until after the block descriptor
		has been added to buf_pool->LRU and buf_pool->page_hash. */
2989
		data = buf_buddy_alloc(zip_size, &lru);
2990 2991 2992
		mutex_enter(&block->mutex);
		block->page.zip.data = data;

2993 2994 2995 2996 2997 2998 2999 3000
		/* To maintain the invariant
		block->in_unzip_LRU_list
		== buf_page_belongs_to_unzip_LRU(&block->page)
		we have to add this block to unzip_LRU after
		block->page.zip.data is set. */
		ut_ad(buf_page_belongs_to_unzip_LRU(&block->page));
		buf_unzip_LRU_add_block(block, FALSE);

3001 3002 3003
		buf_page_set_io_fix(&block->page, BUF_IO_NONE);
		rw_lock_x_unlock(&block->lock);
	}
3004

3005
	buf_pool_mutex_exit();
3006

osku's avatar
osku committed
3007 3008
	mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX);

3009
	buf_page_set_accessed(&block->page, TRUE);
3010

3011
	mutex_exit(&block->mutex);
osku's avatar
osku committed
3012 3013 3014 3015

	/* Delete possible entries for the page from the insert buffer:
	such can exist if the page belonged to an index which was dropped */

3016
	ibuf_merge_or_delete_for_page(NULL, space, offset, zip_size, TRUE);
osku's avatar
osku committed
3017 3018 3019 3020 3021 3022

	/* Flush pages from the end of the LRU list if necessary */
	buf_flush_free_margin();

	frame = block->frame;

3023 3024 3025 3026
	memset(frame + FIL_PAGE_PREV, 0xff, 4);
	memset(frame + FIL_PAGE_NEXT, 0xff, 4);
	mach_write_to_2(frame + FIL_PAGE_TYPE, FIL_PAGE_TYPE_ALLOCATED);

osku's avatar
osku committed
3027 3028 3029 3030 3031 3032 3033 3034
	/* Reset to zero the file flush lsn field in the page; if the first
	page of an ibdata file is 'created' in this function into the buffer
	pool then we lose the original contents of the file flush lsn stamp.
	Then InnoDB could in a crash recovery print a big, false, corruption
	warning if the stamp contains an lsn bigger than the ib_logfile lsn. */

	memset(frame + FIL_PAGE_FILE_FLUSH_LSN, 0, 8);

3035 3036 3037
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 357 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
3038
#ifdef UNIV_IBUF_COUNT_DEBUG
3039 3040
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
osku's avatar
osku committed
3041
#endif
3042
	return(block);
osku's avatar
osku committed
3043 3044 3045 3046 3047
}

/************************************************************************
Completes an asynchronous read or write request of a file page to or from
the buffer pool. */
3048
UNIV_INTERN
osku's avatar
osku committed
3049 3050 3051
void
buf_page_io_complete(
/*=================*/
3052
	buf_page_t*	bpage)	/* in: pointer to the block in question */
osku's avatar
osku committed
3053
{
3054
	enum buf_io_fix	io_type;
3055 3056
	const ibool	uncompressed = (buf_page_get_state(bpage)
					== BUF_BLOCK_FILE_PAGE);
3057

3058
	ut_a(buf_page_in_file(bpage));
osku's avatar
osku committed
3059

3060
	/* We do not need protect io_fix here by mutex to read
3061 3062 3063 3064 3065
	it because this is the only function where we can change the value
	from BUF_IO_READ or BUF_IO_WRITE to some other value, and our code
	ensures that this is the only thread that handles the i/o for this
	block. */

3066 3067
	io_type = buf_page_get_io_fix(bpage);
	ut_ad(io_type == BUF_IO_READ || io_type == BUF_IO_WRITE);
osku's avatar
osku committed
3068 3069

	if (io_type == BUF_IO_READ) {
3070 3071
		ulint	read_page_no;
		ulint	read_space_id;
3072
		byte*	frame;
3073

3074 3075
		if (buf_page_get_zip_size(bpage)) {
			frame = bpage->zip.data;
3076
			buf_pool->n_pend_unzip++;
3077 3078 3079
			if (uncompressed
			    && !buf_zip_decompress((buf_block_t*) bpage,
						   FALSE)) {
3080

3081
				buf_pool->n_pend_unzip--;
3082
				goto corrupt;
marko's avatar
marko committed
3083
			}
3084
			buf_pool->n_pend_unzip--;
3085
		} else {
3086 3087
			ut_a(uncompressed);
			frame = ((buf_block_t*) bpage)->frame;
marko's avatar
marko committed
3088 3089
		}

osku's avatar
osku committed
3090
		/* If this page is not uninitialized and not in the
3091 3092
		doublewrite buffer, then the page number and space id
		should be the same as in block. */
3093
		read_page_no = mach_read_from_4(frame + FIL_PAGE_OFFSET);
3094 3095
		read_space_id = mach_read_from_4(
			frame + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID);
3096

3097 3098
		if (bpage->space == TRX_SYS_SPACE
		    && trx_doublewrite_page_inside(bpage->offset)) {
osku's avatar
osku committed
3099

3100
			ut_print_timestamp(stderr);
osku's avatar
osku committed
3101
			fprintf(stderr,
3102 3103 3104
				"  InnoDB: Error: reading page %lu\n"
				"InnoDB: which is in the"
				" doublewrite buffer!\n",
3105
				(ulong) bpage->offset);
3106 3107
		} else if (!read_space_id && !read_page_no) {
			/* This is likely an uninitialized page. */
3108 3109 3110
		} else if ((bpage->space
			    && bpage->space != read_space_id)
			   || bpage->offset != read_page_no) {
3111
			/* We did not compare space_id to read_space_id
3112
			if bpage->space == 0, because the field on the
3113
			page may contain garbage in MySQL < 4.1.1,
3114
			which only supported bpage->space == 0. */
3115 3116 3117

			ut_print_timestamp(stderr);
			fprintf(stderr,
3118 3119 3120 3121
				"  InnoDB: Error: space id and page n:o"
				" stored in the page\n"
				"InnoDB: read in are %lu:%lu,"
				" should be %lu:%lu!\n",
3122
				(ulong) read_space_id, (ulong) read_page_no,
3123 3124
				(ulong) bpage->space,
				(ulong) bpage->offset);
osku's avatar
osku committed
3125
		}
marko's avatar
marko committed
3126

osku's avatar
osku committed
3127
		/* From version 3.23.38 up we store the page checksum
3128
		to the 4 first bytes of the page end lsn field */
osku's avatar
osku committed
3129

3130
		if (buf_page_is_corrupted(frame,
3131
					  buf_page_get_zip_size(bpage))) {
3132
corrupt:
3133
			fprintf(stderr,
3134 3135 3136 3137 3138
				"InnoDB: Database page corruption on disk"
				" or a failed\n"
				"InnoDB: file read of page %lu.\n"
				"InnoDB: You may have to recover"
				" from a backup.\n",
3139 3140
				(ulong) bpage->offset);
			buf_page_print(frame, buf_page_get_zip_size(bpage));
3141
			fprintf(stderr,
3142 3143 3144 3145 3146
				"InnoDB: Database page corruption on disk"
				" or a failed\n"
				"InnoDB: file read of page %lu.\n"
				"InnoDB: You may have to recover"
				" from a backup.\n",
3147
				(ulong) bpage->offset);
3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167
			fputs("InnoDB: It is also possible that"
			      " your operating\n"
			      "InnoDB: system has corrupted its"
			      " own file cache\n"
			      "InnoDB: and rebooting your computer"
			      " removes the\n"
			      "InnoDB: error.\n"
			      "InnoDB: If the corrupt page is an index page\n"
			      "InnoDB: you can also try to"
			      " fix the corruption\n"
			      "InnoDB: by dumping, dropping,"
			      " and reimporting\n"
			      "InnoDB: the corrupt table."
			      " You can use CHECK\n"
			      "InnoDB: TABLE to scan your"
			      " table for corruption.\n"
			      "InnoDB: See also"
			      " http://dev.mysql.com/doc/refman/5.1/en/"
			      "forcing-recovery.html\n"
			      "InnoDB: about forcing recovery.\n", stderr);
3168

osku's avatar
osku committed
3169
			if (srv_force_recovery < SRV_FORCE_IGNORE_CORRUPT) {
3170 3171 3172
				fputs("InnoDB: Ending processing because of"
				      " a corrupt database page.\n",
				      stderr);
3173 3174
				exit(1);
			}
osku's avatar
osku committed
3175 3176 3177
		}

		if (recv_recovery_is_on()) {
3178 3179
			/* Pages must be uncompressed for crash recovery. */
			ut_a(uncompressed);
3180
			recv_recover_page(TRUE, (buf_block_t*) bpage);
osku's avatar
osku committed
3181 3182
		}

3183
		if (uncompressed && !recv_no_ibuf_operations) {
3184
			ibuf_merge_or_delete_for_page(
3185 3186 3187
				(buf_block_t*) bpage, bpage->space,
				bpage->offset, buf_page_get_zip_size(bpage),
				TRUE);
osku's avatar
osku committed
3188 3189
		}
	}
3190

3191
	buf_pool_mutex_enter();
3192
	mutex_enter(buf_page_get_mutex(bpage));
3193

3194
#ifdef UNIV_IBUF_COUNT_DEBUG
3195 3196 3197 3198 3199 3200
	if (io_type == BUF_IO_WRITE || uncompressed) {
		/* For BUF_IO_READ of compressed-only blocks, the
		buffered operations will be merged by buf_page_get_gen()
		after the block has been uncompressed. */
		ut_a(ibuf_count_get(bpage->space, bpage->offset) == 0);
	}
osku's avatar
osku committed
3201 3202 3203 3204 3205 3206
#endif
	/* Because this thread which does the unlocking is not the same that
	did the locking, we use a pass value != 0 in unlock, which simply
	removes the newest lock debug record, without checking the thread
	id. */

3207
	buf_page_set_io_fix(bpage, BUF_IO_NONE);
3208

3209 3210
	switch (io_type) {
	case BUF_IO_READ:
osku's avatar
osku committed
3211 3212
		/* NOTE that the call to ibuf may have moved the ownership of
		the x-latch to this OS thread: do not let this confuse you in
3213 3214
		debugging! */

osku's avatar
osku committed
3215 3216 3217 3218
		ut_ad(buf_pool->n_pend_reads > 0);
		buf_pool->n_pend_reads--;
		buf_pool->n_pages_read++;

3219 3220 3221
		if (uncompressed) {
			rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock,
					     BUF_IO_READ);
osku's avatar
osku committed
3222 3223
		}

3224 3225 3226
		break;

	case BUF_IO_WRITE:
osku's avatar
osku committed
3227 3228 3229
		/* Write means a flush operation: call the completion
		routine in the flush system */

3230
		buf_flush_write_complete(bpage);
osku's avatar
osku committed
3231

3232 3233 3234 3235
		if (uncompressed) {
			rw_lock_s_unlock_gen(&((buf_block_t*) bpage)->lock,
					     BUF_IO_WRITE);
		}
osku's avatar
osku committed
3236 3237 3238

		buf_pool->n_pages_written++;

3239 3240 3241 3242
		break;

	default:
		ut_error;
osku's avatar
osku committed
3243
	}
3244

osku's avatar
osku committed
3245 3246
#ifdef UNIV_DEBUG
	if (buf_debug_prints) {
3247 3248 3249 3250
		fprintf(stderr, "Has %s page space %lu page no %lu\n",
			io_type == BUF_IO_READ ? "read" : "written",
			(ulong) buf_page_get_space(bpage),
			(ulong) buf_page_get_page_no(bpage));
osku's avatar
osku committed
3251 3252
	}
#endif /* UNIV_DEBUG */
3253 3254 3255

	mutex_exit(buf_page_get_mutex(bpage));
	buf_pool_mutex_exit();
osku's avatar
osku committed
3256 3257 3258 3259 3260 3261
}

/*************************************************************************
Invalidates the file pages in the buffer pool when an archive recovery is
completed. All the file pages buffered must be in a replaceable state when
this function is called: not latched and not modified. */
3262
UNIV_INTERN
osku's avatar
osku committed
3263 3264 3265 3266 3267 3268 3269
void
buf_pool_invalidate(void)
/*=====================*/
{
	ibool	freed;

	ut_ad(buf_all_freed());
3270

osku's avatar
osku committed
3271 3272 3273 3274 3275
	freed = TRUE;

	while (freed) {
		freed = buf_LRU_search_and_free_block(100);
	}
3276

3277
	buf_pool_mutex_enter();
osku's avatar
osku committed
3278 3279

	ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);
3280
	ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);
osku's avatar
osku committed
3281

3282
	buf_pool_mutex_exit();
osku's avatar
osku committed
3283 3284
}

3285
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
osku's avatar
osku committed
3286 3287
/*************************************************************************
Validates the buffer buf_pool data structure. */
3288
UNIV_INTERN
osku's avatar
osku committed
3289 3290 3291 3292
ibool
buf_validate(void)
/*==============*/
{
3293
	buf_page_t*	b;
3294
	buf_chunk_t*	chunk;
osku's avatar
osku committed
3295 3296 3297 3298 3299 3300 3301
	ulint		i;
	ulint		n_single_flush	= 0;
	ulint		n_lru_flush	= 0;
	ulint		n_list_flush	= 0;
	ulint		n_lru		= 0;
	ulint		n_flush		= 0;
	ulint		n_free		= 0;
3302
	ulint		n_zip		= 0;
3303

osku's avatar
osku committed
3304 3305
	ut_ad(buf_pool);

3306
	buf_pool_mutex_enter();
osku's avatar
osku committed
3307

3308
	chunk = buf_pool->chunks;
osku's avatar
osku committed
3309

3310
	/* Check the uncompressed blocks. */
3311

3312
	for (i = buf_pool->n_chunks; i--; chunk++) {
osku's avatar
osku committed
3313

3314 3315 3316 3317 3318 3319
		ulint		j;
		buf_block_t*	block = chunk->blocks;

		for (j = chunk->size; j--; block++) {

			mutex_enter(&block->mutex);
3320

3321
			switch (buf_block_get_state(block)) {
3322
			case BUF_BLOCK_ZIP_FREE:
3323
			case BUF_BLOCK_ZIP_PAGE:
3324 3325 3326
			case BUF_BLOCK_ZIP_DIRTY:
				/* These should only occur on
				zip_clean, zip_free[], or flush_list. */
3327 3328
				ut_error;
				break;
osku's avatar
osku committed
3329

3330
			case BUF_BLOCK_FILE_PAGE:
3331 3332 3333 3334
				ut_a(buf_page_hash_get(buf_block_get_space(
							       block),
						       buf_block_get_page_no(
							       block))
3335
				     == &block->page);
osku's avatar
osku committed
3336

3337
#ifdef UNIV_IBUF_COUNT_DEBUG
3338 3339
				ut_a(buf_page_get_io_fix(&block->page)
				     == BUF_IO_READ
3340 3341 3342 3343
				     || !ibuf_count_get(buf_block_get_space(
								block),
							buf_block_get_page_no(
								block)));
osku's avatar
osku committed
3344
#endif
3345 3346 3347
				switch (buf_page_get_io_fix(&block->page)) {
				case BUF_IO_NONE:
					break;
3348

3349
				case BUF_IO_WRITE:
3350 3351
					switch (buf_page_get_flush_type(
							&block->page)) {
3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366
					case BUF_FLUSH_LRU:
						n_lru_flush++;
						ut_a(rw_lock_is_locked(
							     &block->lock,
							     RW_LOCK_SHARED));
						break;
					case BUF_FLUSH_LIST:
						n_list_flush++;
						break;
					case BUF_FLUSH_SINGLE_PAGE:
						n_single_flush++;
						break;
					default:
						ut_error;
					}
osku's avatar
osku committed
3367

3368 3369 3370
					break;

				case BUF_IO_READ:
osku's avatar
osku committed
3371

3372 3373
					ut_a(rw_lock_is_locked(&block->lock,
							       RW_LOCK_EX));
3374
					break;
3375
				}
3376

3377
				n_lru++;
osku's avatar
osku committed
3378

3379
				if (block->page.oldest_modification > 0) {
3380 3381 3382
					n_flush++;
				}

3383 3384 3385
				break;

			case BUF_BLOCK_NOT_USED:
3386
				n_free++;
3387
				break;
3388 3389 3390 3391 3392 3393

			case BUF_BLOCK_READY_FOR_USE:
			case BUF_BLOCK_MEMORY:
			case BUF_BLOCK_REMOVE_HASH:
				/* do nothing */
				break;
3394 3395
			}

3396
			mutex_exit(&block->mutex);
osku's avatar
osku committed
3397
		}
3398
	}
osku's avatar
osku committed
3399

3400
	mutex_enter(&buf_pool_zip_mutex);
3401 3402 3403 3404 3405 3406

	/* Check clean compressed-only blocks. */

	for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
marko's avatar
marko committed
3407 3408 3409 3410 3411 3412 3413
		switch (buf_page_get_io_fix(b)) {
		case BUF_IO_NONE:
			/* All clean blocks should be I/O-unfixed. */
			break;
		case BUF_IO_READ:
			/* In buf_LRU_free_block(), we temporarily set
			b->io_fix = BUF_IO_READ for a newly allocated
3414 3415
			control block in order to prevent
			buf_page_get_gen() from decompressing the block. */
marko's avatar
marko committed
3416 3417 3418 3419 3420
			break;
		default:
			ut_error;
			break;
		}
3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431
		ut_a(!b->oldest_modification);
		ut_a(buf_page_hash_get(b->space, b->offset) == b);

		n_lru++;
		n_zip++;
	}

	/* Check dirty compressed-only blocks. */

	for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
3432 3433
		ut_ad(b->in_flush_list);

3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476
		switch (buf_page_get_state(b)) {
		case BUF_BLOCK_ZIP_DIRTY:
			ut_a(b->oldest_modification);
			n_lru++;
			n_flush++;
			n_zip++;
			switch (buf_page_get_io_fix(b)) {
			case BUF_IO_NONE:
			case BUF_IO_READ:
				break;

			case BUF_IO_WRITE:
				switch (buf_page_get_flush_type(b)) {
				case BUF_FLUSH_LRU:
					n_lru_flush++;
					break;
				case BUF_FLUSH_LIST:
					n_list_flush++;
					break;
				case BUF_FLUSH_SINGLE_PAGE:
					n_single_flush++;
					break;
				default:
					ut_error;
				}
				break;
			}
			break;
		case BUF_BLOCK_FILE_PAGE:
			/* uncompressed page */
			break;
		case BUF_BLOCK_ZIP_FREE:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			ut_error;
			break;
		}
		ut_a(buf_page_hash_get(b->space, b->offset) == b);
	}

3477
	mutex_exit(&buf_pool_zip_mutex);
3478 3479 3480 3481 3482

	if (n_lru + n_free > buf_pool->curr_size + n_zip) {
		fprintf(stderr, "n LRU %lu, n free %lu, pool %lu zip %lu\n",
			(ulong) n_lru, (ulong) n_free,
			(ulong) buf_pool->curr_size, (ulong) n_zip);
osku's avatar
osku committed
3483 3484 3485
		ut_error;
	}

3486
	ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru);
osku's avatar
osku committed
3487 3488
	if (UT_LIST_GET_LEN(buf_pool->free) != n_free) {
		fprintf(stderr, "Free list len %lu, free blocks %lu\n",
3489 3490
			(ulong) UT_LIST_GET_LEN(buf_pool->free),
			(ulong) n_free);
osku's avatar
osku committed
3491 3492 3493 3494
		ut_error;
	}
	ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == n_flush);

3495 3496 3497
	ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_single_flush);
	ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);
	ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);
3498

3499
	buf_pool_mutex_exit();
osku's avatar
osku committed
3500 3501 3502 3503 3504

	ut_a(buf_LRU_validate());
	ut_a(buf_flush_validate());

	return(TRUE);
3505
}
3506
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
osku's avatar
osku committed
3507

3508
#if defined UNIV_DEBUG_PRINT || defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
osku's avatar
osku committed
3509 3510
/*************************************************************************
Prints info of the buffer buf_pool data structure. */
3511
UNIV_INTERN
osku's avatar
osku committed
3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522
void
buf_print(void)
/*===========*/
{
	dulint*		index_ids;
	ulint*		counts;
	ulint		size;
	ulint		i;
	ulint		j;
	dulint		id;
	ulint		n_found;
3523
	buf_chunk_t*	chunk;
osku's avatar
osku committed
3524
	dict_index_t*	index;
3525

osku's avatar
osku committed
3526 3527 3528 3529 3530 3531 3532
	ut_ad(buf_pool);

	size = buf_pool->curr_size;

	index_ids = mem_alloc(sizeof(dulint) * size);
	counts = mem_alloc(sizeof(ulint) * size);

3533
	buf_pool_mutex_enter();
3534

osku's avatar
osku committed
3535 3536 3537 3538 3539
	fprintf(stderr,
		"buf_pool size %lu\n"
		"database pages %lu\n"
		"free pages %lu\n"
		"modified database pages %lu\n"
3540
		"n pending decompressions %lu\n"
osku's avatar
osku committed
3541 3542 3543 3544 3545 3546 3547
		"n pending reads %lu\n"
		"n pending flush LRU %lu list %lu single page %lu\n"
		"pages read %lu, created %lu, written %lu\n",
		(ulong) size,
		(ulong) UT_LIST_GET_LEN(buf_pool->LRU),
		(ulong) UT_LIST_GET_LEN(buf_pool->free),
		(ulong) UT_LIST_GET_LEN(buf_pool->flush_list),
3548
		(ulong) buf_pool->n_pend_unzip,
osku's avatar
osku committed
3549 3550 3551 3552 3553 3554 3555 3556
		(ulong) buf_pool->n_pend_reads,
		(ulong) buf_pool->n_flush[BUF_FLUSH_LRU],
		(ulong) buf_pool->n_flush[BUF_FLUSH_LIST],
		(ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE],
		(ulong) buf_pool->n_pages_read, buf_pool->n_pages_created,
		(ulong) buf_pool->n_pages_written);

	/* Count the number of blocks belonging to each index in the buffer */
3557

osku's avatar
osku committed
3558 3559
	n_found = 0;

3560
	chunk = buf_pool->chunks;
osku's avatar
osku committed
3561

3562 3563 3564
	for (i = buf_pool->n_chunks; i--; chunk++) {
		buf_block_t*	block		= chunk->blocks;
		ulint		n_blocks	= chunk->size;
osku's avatar
osku committed
3565

3566 3567
		for (; n_blocks--; block++) {
			const buf_frame_t* frame = block->frame;
osku's avatar
osku committed
3568

3569
			if (fil_page_get_type(frame) == FIL_PAGE_INDEX) {
osku's avatar
osku committed
3570

3571
				id = btr_page_get_index_id(frame);
osku's avatar
osku committed
3572

3573 3574
				/* Look for the id in the index_ids array */
				j = 0;
osku's avatar
osku committed
3575

3576 3577 3578 3579 3580 3581 3582 3583 3584
				while (j < n_found) {

					if (ut_dulint_cmp(index_ids[j],
							  id) == 0) {
						counts[j]++;

						break;
					}
					j++;
osku's avatar
osku committed
3585 3586
				}

3587 3588 3589 3590 3591
				if (j == n_found) {
					n_found++;
					index_ids[j] = id;
					counts[j] = 1;
				}
osku's avatar
osku committed
3592 3593 3594 3595
			}
		}
	}

3596
	buf_pool_mutex_exit();
osku's avatar
osku committed
3597 3598 3599 3600 3601 3602

	for (i = 0; i < n_found; i++) {
		index = dict_index_get_if_in_cache(index_ids[i]);

		fprintf(stderr,
			"Block count for index %lu in buffer is about %lu",
3603 3604
			(ulong) ut_dulint_get_low(index_ids[i]),
			(ulong) counts[i]);
osku's avatar
osku committed
3605 3606 3607 3608 3609 3610 3611 3612

		if (index) {
			putc(' ', stderr);
			dict_index_name_print(stderr, NULL, index);
		}

		putc('\n', stderr);
	}
3613

osku's avatar
osku committed
3614 3615 3616
	mem_free(index_ids);
	mem_free(counts);

3617
	ut_a(buf_validate());
3618
}
3619
#endif /* UNIV_DEBUG_PRINT || UNIV_DEBUG || UNIV_BUF_DEBUG */
osku's avatar
osku committed
3620

3621
#ifdef UNIV_DEBUG
osku's avatar
osku committed
3622 3623
/*************************************************************************
Returns the number of latched pages in the buffer pool. */
3624
UNIV_INTERN
osku's avatar
osku committed
3625 3626
ulint
buf_get_latched_pages_number(void)
3627
/*==============================*/
osku's avatar
osku committed
3628
{
3629
	buf_chunk_t*	chunk;
3630
	buf_page_t*	b;
3631 3632
	ulint		i;
	ulint		fixed_pages_number = 0;
osku's avatar
osku committed
3633

3634
	buf_pool_mutex_enter();
osku's avatar
osku committed
3635

3636 3637 3638 3639 3640 3641 3642
	chunk = buf_pool->chunks;

	for (i = buf_pool->n_chunks; i--; chunk++) {
		buf_block_t*	block;
		ulint		j;

		block = chunk->blocks;
osku's avatar
osku committed
3643

3644
		for (j = chunk->size; j--; block++) {
3645 3646
			if (buf_block_get_state(block)
			    != BUF_BLOCK_FILE_PAGE) {
3647 3648 3649

				continue;
			}
osku's avatar
osku committed
3650

3651 3652
			mutex_enter(&block->mutex);

3653 3654 3655
			if (block->page.buf_fix_count != 0
			    || buf_page_get_io_fix(&block->page)
			    != BUF_IO_NONE) {
3656 3657 3658 3659
				fixed_pages_number++;
			}

			mutex_exit(&block->mutex);
3660
		}
3661
	}
osku's avatar
osku committed
3662

3663
	mutex_enter(&buf_pool_zip_mutex);
3664 3665 3666 3667 3668 3669

	/* Traverse the lists of clean and dirty compressed-only blocks. */

	for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
3670
		ut_a(buf_page_get_io_fix(b) != BUF_IO_WRITE);
3671 3672 3673 3674 3675 3676 3677 3678 3679

		if (b->buf_fix_count != 0
		    || buf_page_get_io_fix(b) != BUF_IO_NONE) {
			fixed_pages_number++;
		}
	}

	for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
3680 3681
		ut_ad(b->in_flush_list);

3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702
		switch (buf_page_get_state(b)) {
		case BUF_BLOCK_ZIP_DIRTY:
			if (b->buf_fix_count != 0
			    || buf_page_get_io_fix(b) != BUF_IO_NONE) {
				fixed_pages_number++;
			}
			break;
		case BUF_BLOCK_FILE_PAGE:
			/* uncompressed page */
			break;
		case BUF_BLOCK_ZIP_FREE:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			ut_error;
			break;
		}
	}

3703 3704
	mutex_exit(&buf_pool_zip_mutex);
	buf_pool_mutex_exit();
3705 3706

	return(fixed_pages_number);
osku's avatar
osku committed
3707
}
3708
#endif /* UNIV_DEBUG */
osku's avatar
osku committed
3709 3710 3711

/*************************************************************************
Returns the number of pending buf pool ios. */
3712
UNIV_INTERN
osku's avatar
osku committed
3713 3714 3715 3716 3717
ulint
buf_get_n_pending_ios(void)
/*=======================*/
{
	return(buf_pool->n_pend_reads
3718 3719 3720
	       + buf_pool->n_flush[BUF_FLUSH_LRU]
	       + buf_pool->n_flush[BUF_FLUSH_LIST]
	       + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]);
osku's avatar
osku committed
3721 3722 3723 3724 3725
}

/*************************************************************************
Returns the ratio in percents of modified pages in the buffer pool /
database pages in the buffer pool. */
3726
UNIV_INTERN
osku's avatar
osku committed
3727 3728 3729 3730 3731 3732
ulint
buf_get_modified_ratio_pct(void)
/*============================*/
{
	ulint	ratio;

3733
	buf_pool_mutex_enter();
osku's avatar
osku committed
3734 3735

	ratio = (100 * UT_LIST_GET_LEN(buf_pool->flush_list))
3736
		/ (1 + UT_LIST_GET_LEN(buf_pool->LRU)
3737
		   + UT_LIST_GET_LEN(buf_pool->free));
osku's avatar
osku committed
3738

3739
	/* 1 + is there to avoid division by zero */
osku's avatar
osku committed
3740

3741
	buf_pool_mutex_exit();
osku's avatar
osku committed
3742 3743 3744 3745 3746 3747

	return(ratio);
}

/*************************************************************************
Prints info of the buffer i/o. */
3748
UNIV_INTERN
osku's avatar
osku committed
3749 3750 3751 3752 3753 3754 3755 3756
void
buf_print_io(
/*=========*/
	FILE*	file)	/* in/out: buffer where to print */
{
	time_t	current_time;
	double	time_elapsed;
	ulint	size;
3757

osku's avatar
osku committed
3758 3759 3760
	ut_ad(buf_pool);
	size = buf_pool->curr_size;

3761
	buf_pool_mutex_enter();
3762

osku's avatar
osku committed
3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775
	fprintf(file,
		"Buffer pool size   %lu\n"
		"Free buffers       %lu\n"
		"Database pages     %lu\n"
		"Modified db pages  %lu\n"
		"Pending reads %lu\n"
		"Pending writes: LRU %lu, flush list %lu, single page %lu\n",
		(ulong) size,
		(ulong) UT_LIST_GET_LEN(buf_pool->free),
		(ulong) UT_LIST_GET_LEN(buf_pool->LRU),
		(ulong) UT_LIST_GET_LEN(buf_pool->flush_list),
		(ulong) buf_pool->n_pend_reads,
		(ulong) buf_pool->n_flush[BUF_FLUSH_LRU]
3776
		+ buf_pool->init_flush[BUF_FLUSH_LRU],
osku's avatar
osku committed
3777
		(ulong) buf_pool->n_flush[BUF_FLUSH_LIST]
3778
		+ buf_pool->init_flush[BUF_FLUSH_LIST],
osku's avatar
osku committed
3779 3780 3781 3782
		(ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]);

	current_time = time(NULL);
	time_elapsed = 0.001 + difftime(current_time,
3783
					buf_pool->last_printout_time);
osku's avatar
osku committed
3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800
	buf_pool->last_printout_time = current_time;

	fprintf(file,
		"Pages read %lu, created %lu, written %lu\n"
		"%.2f reads/s, %.2f creates/s, %.2f writes/s\n",
		(ulong) buf_pool->n_pages_read,
		(ulong) buf_pool->n_pages_created,
		(ulong) buf_pool->n_pages_written,
		(buf_pool->n_pages_read - buf_pool->n_pages_read_old)
		/ time_elapsed,
		(buf_pool->n_pages_created - buf_pool->n_pages_created_old)
		/ time_elapsed,
		(buf_pool->n_pages_written - buf_pool->n_pages_written_old)
		/ time_elapsed);

	if (buf_pool->n_page_gets > buf_pool->n_page_gets_old) {
		fprintf(file, "Buffer pool hit rate %lu / 1000\n",
3801 3802 3803 3804 3805
			(ulong)
			(1000 - ((1000 * (buf_pool->n_pages_read
					  - buf_pool->n_pages_read_old))
				 / (buf_pool->n_page_gets
				    - buf_pool->n_page_gets_old))));
osku's avatar
osku committed
3806 3807
	} else {
		fputs("No buffer pool page gets since the last printout\n",
3808
		      file);
osku's avatar
osku committed
3809 3810 3811 3812 3813 3814 3815
	}

	buf_pool->n_page_gets_old = buf_pool->n_page_gets;
	buf_pool->n_pages_read_old = buf_pool->n_pages_read;
	buf_pool->n_pages_created_old = buf_pool->n_pages_created;
	buf_pool->n_pages_written_old = buf_pool->n_pages_written;

3816 3817 3818 3819 3820 3821 3822 3823 3824 3825
	/* Print some values to help us with visualizing what is
	happening with LRU eviction. */
	fprintf(file,
		"LRU len: %lu, unzip_LRU len: %lu\n"
		"I/O sum[%lu]:cur[%lu], unzip sum[%lu]:cur[%lu]\n",
		UT_LIST_GET_LEN(buf_pool->LRU),
		UT_LIST_GET_LEN(buf_pool->unzip_LRU),
		buf_LRU_stat_sum.io, buf_LRU_stat_cur.io,
		buf_LRU_stat_sum.unzip, buf_LRU_stat_cur.unzip);

3826
	buf_pool_mutex_exit();
osku's avatar
osku committed
3827 3828 3829 3830
}

/**************************************************************************
Refreshes the statistics used to print per-second averages. */
3831
UNIV_INTERN
osku's avatar
osku committed
3832 3833 3834 3835
void
buf_refresh_io_stats(void)
/*======================*/
{
3836
	buf_pool->last_printout_time = time(NULL);
osku's avatar
osku committed
3837 3838 3839 3840 3841 3842 3843 3844
	buf_pool->n_page_gets_old = buf_pool->n_page_gets;
	buf_pool->n_pages_read_old = buf_pool->n_pages_read;
	buf_pool->n_pages_created_old = buf_pool->n_pages_created;
	buf_pool->n_pages_written_old = buf_pool->n_pages_written;
}

/*************************************************************************
Checks that all file pages in the buffer are in a replaceable state. */
3845
UNIV_INTERN
osku's avatar
osku committed
3846 3847 3848 3849
ibool
buf_all_freed(void)
/*===============*/
{
3850
	buf_chunk_t*	chunk;
osku's avatar
osku committed
3851
	ulint		i;
3852

osku's avatar
osku committed
3853 3854
	ut_ad(buf_pool);

3855
	buf_pool_mutex_enter();
osku's avatar
osku committed
3856

3857
	chunk = buf_pool->chunks;
3858

3859
	for (i = buf_pool->n_chunks; i--; chunk++) {
osku's avatar
osku committed
3860

3861
		const buf_block_t* block = buf_chunk_not_freed(chunk);
osku's avatar
osku committed
3862

3863 3864 3865
		if (UNIV_LIKELY_NULL(block)) {
			fprintf(stderr,
				"Page %lu %lu still fixed or dirty\n",
3866 3867
				(ulong) block->page.space,
				(ulong) block->page.offset);
3868
			ut_error;
osku's avatar
osku committed
3869
		}
3870
	}
osku's avatar
osku committed
3871

3872
	buf_pool_mutex_exit();
osku's avatar
osku committed
3873 3874

	return(TRUE);
3875
}
osku's avatar
osku committed
3876 3877 3878 3879

/*************************************************************************
Checks that there currently are no pending i/o-operations for the buffer
pool. */
3880
UNIV_INTERN
osku's avatar
osku committed
3881 3882 3883 3884 3885 3886 3887
ibool
buf_pool_check_no_pending_io(void)
/*==============================*/
				/* out: TRUE if there is no pending i/o */
{
	ibool	ret;

3888
	buf_pool_mutex_enter();
osku's avatar
osku committed
3889 3890

	if (buf_pool->n_pend_reads + buf_pool->n_flush[BUF_FLUSH_LRU]
3891 3892
	    + buf_pool->n_flush[BUF_FLUSH_LIST]
	    + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]) {
osku's avatar
osku committed
3893 3894 3895 3896 3897
		ret = FALSE;
	} else {
		ret = TRUE;
	}

3898
	buf_pool_mutex_exit();
osku's avatar
osku committed
3899 3900 3901 3902 3903 3904

	return(ret);
}

/*************************************************************************
Gets the current length of the free list of buffer blocks. */
3905
UNIV_INTERN
osku's avatar
osku committed
3906 3907 3908 3909 3910 3911
ulint
buf_get_free_list_len(void)
/*=======================*/
{
	ulint	len;

3912
	buf_pool_mutex_enter();
osku's avatar
osku committed
3913 3914 3915

	len = UT_LIST_GET_LEN(buf_pool->free);

3916
	buf_pool_mutex_exit();
osku's avatar
osku committed
3917 3918 3919

	return(len);
}