Commits · 2b2d6d019724de6e51ac5bcf22b5ef969daefa8b · Kirill Smelkov / linux

26 Jul, 2008 1 commit

ext4: Cleanup whitespace and other miscellaneous style issues · 2b2d6d01

Theodore Ts'o authored Jul 26, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2b2d6d01

24 Jul, 2008 1 commit

ext4: improve ext4_fill_flex_info() a bit · ec05e868

Li Zefan authored Jul 24, 2008

- use kzalloc() instead of kmalloc() + memset()
- improve a printk info
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ec05e868

17 Jul, 2008 1 commit

ext4: Cleanup the block reservation code path · 12219aea

Aneesh Kumar K.V authored Jul 17, 2008

The truncate path should not use the i_allocated_meta_blocks
value. So add separate functions to be used in the truncate
and alloc paths. We also need to release the meta-data blocks
that we reserved for the blocks that we are truncating.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

12219aea

02 Aug, 2008 1 commit

ext4: don't assume extents can't cross block groups when truncating · 34071da7

Theodore Ts'o authored Aug 01, 2008

With the FLEX_BG layout, there is no reason why extents can't cross
block groups, so make the truncate code reserve enough credits so we
don't BUG if we come across such an extent.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

34071da7

03 Aug, 2008 1 commit

ext4: Fix lack of credits BUG() when deleting a badly fragmented inode · bc965ab3

Theodore Ts'o authored Aug 02, 2008

The extents codepath for ext4_truncate() requests journal transaction
credits in very small chunks, requesting only what is needed.  This
means there may not be enough credits left on the transaction handle
after ext4_truncate() returns, and then when ext4_delete_inode() tries
to finish up its work, it may not have enough transaction credits,
causing a BUG() oops in the jbd2 core.

Also, reserve an extra 2 blocks when starting an ext4_delete_inode()
since we need to update the inode bitmap, as well as update the
orphaned inode linked list.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

bc965ab3

02 Aug, 2008 2 commits

ext4: Fix ext4_ext_journal_restart() · 0123c939

Theodore Ts'o authored Aug 01, 2008

The ext4_ext_journal_restart() is a convenience function which checks
to see if the requested number of credits is present, and if so it
closes the current transaction and attaches the current handle to the
new transaction.  Unfortunately, it wasn't properly checking the
return value from ext4_journal_extend(), so it was starting a new
transaction when one was not necessary, and returning an error when
all that was necessary was to restart the handle with a new
transaction.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0123c939
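
The control flow described in the commit above can be modelled in userspace. This is a minimal Python sketch, not the kernel code: the `Handle` class and the helper names are hypothetical, and the return convention of the extend step (0 on success, a positive value when the transaction has no room, a negative value on error) is an assumption made for illustration.

```python
class Handle:
    """Hypothetical stand-in for a journal handle; not the kernel structure."""

    def __init__(self, credits, extendable):
        self.credits = credits        # credits still available on this handle
        self.extendable = extendable  # can the running transaction grow?
        self.restarts = 0             # how many new transactions were started


def journal_extend(handle, needed):
    """Assumed convention: 0 = extended, > 0 = no room, < 0 = error."""
    if handle.extendable:
        handle.credits += needed
        return 0
    return 1


def journal_restart(handle, needed):
    """Attach the handle to a fresh transaction with `needed` credits."""
    handle.restarts += 1
    handle.credits = needed
    return 0


def ext_journal_restart(handle, needed):
    """Make sure `needed` credits are available, restarting only if required."""
    if handle.credits > needed:
        return 0                      # enough credits already
    err = journal_extend(handle, needed)
    if err <= 0:
        return err                    # extended (0) or a real error (< 0)
    return journal_restart(handle, needed)  # no room: start a new transaction
```

The point of the sketch is the three-way check on the extend result: a restart happens only when extending is impossible, and a successful extend is not reported as an error.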

ext4: fix ext4_da_write_begin error path · d5a0d4f7

Eric Sandeen authored Aug 02, 2008

ext4_da_write_begin needs to call journal_stop before returning,
if the page allocation fails.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d5a0d4f7

01 Aug, 2008 1 commit

jbd2: don't abort if flushing file data failed · e9e34f4e

Hidehiro Kawai authored Jul 31, 2008

In ordered mode, the current jbd2 aborts the journal if a file data buffer
has an error.  But this behavior is unintended, and we found that it has
been adopted accidentally.

This patch undoes it and just calls printk() instead of aborting the
journal.  Unlike a similar patch for ext3/jbd, file data buffers are
written via generic_writepages().  But we also need to set AS_EIO
into their mappings because wait_on_page_writeback_range() clears
AS_EIO before a user process sees it.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e9e34f4e

26 Jul, 2008 1 commit

ext4: don't read inode block if the buffer has a write error · 9c83a923

Hidehiro Kawai authored Jul 26, 2008

A transient I/O error can corrupt inode data.  Here is the scenario:

(1) update inode_A at the block_B
(2) pdflush writes out new inode_A to the filesystem, but the write
    fails with an I/O error; at this point, the BH_Uptodate flag of the
    buffer for block_B is cleared and BH_Write_EIO is set
(3) create new inode_C, which is located at block_B, and
    __ext4_get_inode_loc() tries to read on-disk block_B because the
    buffer is not uptodate
(4) if it can read on-disk block_B successfully, inode_A is
    overwritten by old data

This patch makes __ext4_get_inode_loc() not read the inode block if the
buffer has the BH_Write_EIO flag.  In this case, the buffer should have the
latest information, so the patch sets the uptodate flag on the buffer (this
avoids a WARN_ON_ONCE() in mark_buffer_dirty()).

With this change, we would need to test the BH_Write_EIO flag for
error checking.  Currently nobody checks write I/O errors on metadata
buffers, but it will be done in other patches I'm working on.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9c83a923
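
The scenario in the commit above can be replayed with a small userspace model. This is a hedged Python sketch: the `Buffer` class and `get_inode_block()` are hypothetical names that mirror only the two flags the message discusses, not the kernel's buffer_head handling.

```python
class Buffer:
    """Hypothetical model of a buffer: contents plus the two flags discussed."""

    def __init__(self, data, uptodate=True, write_eio=False):
        self.data = data
        self.uptodate = uptodate      # models BH_Uptodate
        self.write_eio = write_eio    # models BH_Write_EIO


def get_inode_block(buf, read_from_disk):
    """Return the block contents, rereading from disk only when that is safe."""
    if not buf.uptodate:
        if buf.write_eio:
            # A failed write cleared the uptodate flag, but the in-memory
            # copy is newer than the disk, so keep it and mark it uptodate.
            buf.uptodate = True
        else:
            buf.data = read_from_disk()
            buf.uptodate = True
    return buf.data
```

With a buffer whose write failed, the stale on-disk copy is never consulted, which is what prevents step (4) of the scenario.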

23 Jul, 2008 3 commits

ext4: Don't allow lg prealloc list to grow large. · 6be2ded1

Aneesh Kumar K.V authored Jul 23, 2008

Currently, the locality group prealloc list is freed only when there
is a block allocation failure. This can result in a large number of
entries in the preallocation list, making ext4_mb_use_preallocated()
expensive.

To fix this, we convert the locality group prealloc list to a hash
list. The hash index is the order of the number of blocks in the prealloc
space, with a max order of 9. When adding prealloc space to the list we
make sure the total entries for each order do not exceed 8. If there are
more than 8, we discard a few entries and make sure that we have only <= 5
entries.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

6be2ded1
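
The bucketing scheme in the commit above can be sketched in a few lines of Python. This is an illustrative model only: the function names are hypothetical, "order" is assumed to mean floor(log2(number of blocks)), and the choice to discard the oldest entries first is an assumption, since the message does not say which entries are dropped.

```python
MAX_ORDER = 9        # highest bucket index, from the commit message
DISCARD_ABOVE = 8    # trim a bucket once it holds more than this many entries
KEEP = 5             # entries left in the bucket after trimming


def bucket_order(free_blocks):
    """Order of the block count: floor(log2(n)), capped at MAX_ORDER."""
    if free_blocks < 1:
        raise ValueError("a preallocation must hold at least one block")
    return min(free_blocks.bit_length() - 1, MAX_ORDER)


def add_prealloc(buckets, free_blocks):
    """Insert an entry; return the entries discarded to keep the bucket short."""
    bucket = buckets[bucket_order(free_blocks)]
    bucket.append(free_blocks)
    discarded = []
    if len(bucket) > DISCARD_ABOVE:
        # Assumption: the oldest entries are the ones dropped.
        discarded = bucket[:-KEEP]
        del bucket[:-KEEP]
    return discarded


# One list per order, 0 through MAX_ORDER.
buckets = [[] for _ in range(MAX_ORDER + 1)]
```

Because each bucket is bounded, a lookup only ever scans a short list for the relevant order instead of one unbounded list.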

ext4: Convert the usage of NR_CPUS to nr_cpu_ids. · 1320cbcf

Aneesh Kumar K.V authored Jul 23, 2008

NR_CPUS can be really large. We should be using nr_cpu_ids instead.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

1320cbcf

ext4: Improve error handling in mballoc · ce89f46c

Aneesh Kumar K.V authored Jul 23, 2008

Don't call BUG_ON on file system failures. Instead use ext4_error and
also handle the continue case properly.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ce89f46c

03 Aug, 2008 2 commits

ext4: lock block groups when initializing · b5f10eed

Eric Sandeen authored Aug 02, 2008

I noticed when filling a 1T filesystem with 4 threads using the
fs_mark benchmark:

fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0

that I occasionally got checksum mismatch errors:

EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935

etc.  I'd reliably get 4-5 of them during the run.

It appears that the problem is likely a race to init the bg's
when the uninit_bg feature is enabled.

With the patch below, which adds sb_bgl_locking around initialization,
I was able to complete several runs with no errors or warnings.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b5f10eed
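
The race described in the commit above is a check-then-initialize pattern running without a lock. The following Python sketch models the fixed behaviour under stated assumptions: `BlockGroup`, `read_bitmap()` and `run_racers()` are hypothetical names, and a `threading.Lock` stands in for the per-group lock the patch takes around initialization.

```python
import threading


class BlockGroup:
    """Hypothetical model of one lazily initialized block group."""

    def __init__(self):
        self.uninit = True             # group has not been set up yet
        self.init_count = 0            # how many times initialization ran
        self.lock = threading.Lock()   # stands in for the block group lock


def read_bitmap(group):
    """Initialize the group's bitmap at most once, even with racing callers."""
    with group.lock:
        if group.uninit:               # test and update under the same lock
            group.init_count += 1
            group.uninit = False


def run_racers(group, n_threads=4):
    """Start several threads that all try to initialize the same group."""
    threads = [threading.Thread(target=read_bitmap, args=(group,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return group.init_count
```

Without the lock, two threads could both observe the uninitialized state and both run the initialization, which is one plausible way to end up with a checksum that no longer matches.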

ext4: sync up block and inode bitmap reading functions · e29d1cde

Eric Sandeen authored Aug 02, 2008

ext4_read_block_bitmap and read_inode_bitmap do essentially
the same thing, and yet they are structured quite differently.
I came across this difference while looking at doing bg locking
during bg initialization.

This patch:

* removes unnecessary casts in the error messages
* renames read_inode_bitmap to ext4_read_inode_bitmap
* and more substantially, restructures the inode bitmap
  reading function to be more like the block bitmap counterpart.

The change to the inode bitmap reader simplifies the locking
to be applied in the next patch.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e29d1cde

26 Jul, 2008 1 commit

ext4: Allow read/only mounts with corrupted block group checksums · 8a266467

Theodore Ts'o authored Jul 26, 2008

If the block group checksums are corrupted, still allow the mount to
succeed, so e2fsck can have a chance to try to fix things up. Add
code in the remount r/w path to make sure the block group checksums
are valid before allowing the filesystem to be remounted read/write.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8a266467
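
The policy in the commit above reduces to two small checks. This Python sketch only restates that policy; the function names are hypothetical and the real mount path has many other conditions that are not modelled here.

```python
def may_mount(checksums_valid, read_only):
    """Initial mount: bad checksums are tolerated only for read-only mounts."""
    return checksums_valid or read_only


def may_remount_rw(checksums_valid):
    """Switching to read/write requires valid block group checksums."""
    return checksums_valid
```

A filesystem with damaged checksums can therefore still be mounted read-only so that e2fsck can repair it, but it cannot be made writable until the checksums verify.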

02 Aug, 2008 1 commit

ext4: Fix data corruption when writing to prealloc area · d03856bd

Aneesh Kumar K.V authored Aug 02, 2008

Inserting an extent can cause a new entry in the already existing index
block. That doesn't increase the depth of the tree. Instead it adds a
new leaf block. Now with the new leaf block the path information
corresponding to the logical block should be fetched from the new block.
The old path will be pointing to the old leaf block.

We need to recalculate the path information on extent insert
even if depth doesn't change. Without this change, the extent merge
after converting an unwritten extent to an initialized extent takes the wrong
extent and causes data corruption.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d03856bd

29 Jul, 2008 6 commits

Linux 2.6.27-rc1 · 6e86841d

Linus Torvalds authored Jul 28, 2008

6e86841d

Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus · 7874d351

Linus Torvalds authored Jul 28, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  lguest: turn Waker into a thread, not a process
  lguest: Enlarge virtio rings
  lguest: Use GSO/IFF_VNET_HDR extensions on tun/tap
  lguest: Remove 'network: no dma buffer!' warning
  lguest: Adaptive timeout
  lguest: Tell Guest net not to notify us on every packet xmit
  lguest: net block unneeded receive queue update notifications
  lguest: wrap last_avail accesses.
  lguest: use cpu capability accessors
  lguest: virtio-rng support
  lguest: Support assigning a MAC address
  lguest: Don't leak /dev/zero fd
  lguest: fix verbose printing of device features.
  lguest: fix switcher_page leak on unload
  lguest: Guest int3 fix
  lguest: set max_pfn_mapped, growl loudly at Yinghai Lu

7874d351

Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd · 5dfb66ba

Linus Torvalds authored Jul 28, 2008

* 'for-linus' of git://git.o-hand.com/linux-mfd:
  mfd: accept pure device as a parent, not only platform_device
  mfd: add platform_data to mfd_cell
  mfd: Coding style fixes
  mfd: Use to_platform_device instead of container_of

5dfb66ba

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 · 1d9b9f6a

Linus Torvalds authored Jul 28, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
  x86/PCI: use dev_printk when possible
  PCI: add D3 power state avoidance quirk
  PCI: fix bogus "'device' may be used uninitialized" warning in pci_slot
  PCI: add an option to allow ASPM enabled forcibly
  PCI: disable ASPM on pre-1.1 PCIe devices
  PCI: disable ASPM per ACPI FADT setting
  PCI MSI: Don't disable MSIs if the mask bit isn't supported
  PCI: handle 64-bit resources better on 32-bit machines
  PCI: rewrite PCI BAR reading code
  PCI: document pci_target_state
  PCI hotplug: fix typo in pcie hotplug output
  x86 gart: replace to_pages macro with iommu_num_pages
  x86, AMD IOMMU: replace to_pages macro with iommu_num_pages
  iommu: add iommu_num_pages helper function
  dma-coherent: add documentation to new interfaces
  Cris: convert to using generic dma-coherent mem allocator
  Sh: use generic per-device coherent dma allocator
  ARM: support generic per-device coherent dma mem
  Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVE
  x86: use generic per-device dma coherent allocator
  ...

1d9b9f6a

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · a3ad7f12

Linus Torvalds authored Jul 28, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] qla2xxx: fix msleep compile error

a3ad7f12

Fix 'get_user_pages_fast()' with non-page-aligned start address · 9b79022c

Linus Torvalds authored Jul 28, 2008

Alexey Dobriyan reported trouble with LTP with the new fast-gup code,
and Johannes Weiner debugged it to non-page-aligned addresses, where the
new get_user_pages_fast() code would do all the wrong things, including
just traversing past the end of the requested area due to 'addr' never
matching 'end' exactly.

This is not a pretty fix, and we may actually want to move the alignment
into generic code, leaving just the core code per-arch, but Alexey
verified that the vmsplice01 LTP test doesn't crash with this.
Reported-and-tested-by: Alexey Dobriyan <adobriyan@gmail.com>
Debugged-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9b79022c
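
The failure mode in the commit above, 'addr' never matching 'end' exactly, can be shown with plain address arithmetic. This is a simplified Python model, not the kernel's page-table walk: it assumes 4 KiB pages, and `walk_pages()`, `boundary` and `unaligned` are hypothetical names chosen for the illustration.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_MASK = ~(PAGE_SIZE - 1)


def walk_pages(addr, end, max_steps=16):
    """Advance one page at a time until addr == end; report an overshoot."""
    steps = 0
    while addr != end:
        if steps == max_steps or addr > end:
            return steps, False      # walked past `end` without matching it
        addr += PAGE_SIZE
        steps += 1
    return steps, True


boundary = 0x200000                  # hypothetical page-aligned end of a range
unaligned = 0x1FF234                 # start address inside the last page

# The unaligned start steps straight over the boundary; rounding the start
# down to a page boundary first makes the equality test reachable.
overshoot = walk_pages(unaligned, boundary)
aligned_ok = walk_pages(unaligned & PAGE_MASK, boundary)
```

A loop that terminates on exact equality is only safe when both ends share the same alignment, which is why rounding the start address down is enough to stop the walk at the right place.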

28 Jul, 2008 18 commits

lguest: turn Waker into a thread, not a process · 8c79873d

Rusty Russell authored Jul 29, 2008

lguest uses a Waker process to break it out of the kernel (ie.
actually running the guest) when a file descriptor needs attention.

Changing this from a process to a thread somewhat simplifies things:
it can directly access the fd_set of things to watch.  More
importantly, it means that the Waker can see Guest memory correctly,
so /dev/vring file descriptors will work as anticipated (the
alternative is to actually mmap MAP_SHARED, but you can't do that with
/dev/zero).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

8c79873d

lguest: Enlarge virtio rings · 0f0c4fab

Rusty Russell authored Jul 29, 2008

With big packets, 128 entries is a little small.

Guest -> Host 1GB TCP:
Before: 8.43625 seconds xmit 95640 recv 198266 timeout 49771 usec 1252
After: 8.01099 seconds xmit 49200 recv 102263 timeout 26014 usec 2118
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

0f0c4fab

lguest: Use GSO/IFF_VNET_HDR extensions on tun/tap · 398f187d

Rusty Russell authored Jul 29, 2008

Guest -> Host 1GB TCP:
Before 20.1974 seconds xmit 214510 recv 5 timeout 214491 usec 278
After 8.43625 seconds xmit 95640 recv 198266 timeout 49771 usec 1252

Host -> Guest 1GB TCP:
Before: Seconds 9.98854 xmit 172166 recv 5344 timeout 172157 usec 251
After: Seconds 5.72803 xmit 244322 recv 9919 timeout 244302 usec 156
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

398f187d

lguest: Remove 'network: no dma buffer!' warning · 9254926f

Rusty Russell authored Jul 29, 2008

This warning can happen a lot under load, and it should be warnx not
warn anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

9254926f

lguest: Adaptive timeout · aa124984

Rusty Russell authored Jul 29, 2008

Since the correct timeout value varies, use a heuristic which adjusts
the timeout depending on how many packets we've seen.  This gives
slightly worse results, but doesn't need tweaking when GSO is
introduced.

500 usec	19.1887		xmit 561141 recv 1 timeout 559657
Dynamic (278)	20.1974		xmit 214510 recv 5 timeout 214491 usec 278
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

aa124984

lguest: Tell Guest net not to notify us on every packet xmit · a161883a

Rusty Russell authored Jul 29, 2008

virtio_ring has the ability to suppress notifications.  This prevents
a guest exit for every packet, but we need to set a timer on packet
receipt to re-check if there were any remaining packets.

Here are the times for 1G TCP Guest->Host with different timeout
settings (it matters because the TCP window doesn't grow big enough to
fill the entire buffer):

Timeout value	Seconds		Xmit/Recv/Timeout
None (before)	25.3784		xmit 7750233 recv 1
2500 usec	62.5119		xmit 207020 recv 2 timeout 207020
1000 usec	34.5379		xmit 207003 recv 2 timeout 207003
750 usec	29.2305		xmit 207002 recv 1 timeout 207002
500 usec	19.1887		xmit 561141 recv 1 timeout 559657
250 usec	20.0465		xmit 214128 recv 2 timeout 214110
100 usec	19.2583		xmit 561621 recv 1 timeout 560153

(Note that these values are sensitive to the GSO patches which come
 later, and probably other traffic-related variables, so take with a
 large grain of salt).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

a161883a

lguest: net block unneeded receive queue update notifications · 5dae785a

Rusty Russell authored Jul 29, 2008

Number of exits transmitting 10GB Guest->Host before:
	network xmit 7858610 recv 118136

After:
	network xmit 7750233 recv 1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

5dae785a

lguest: wrap last_avail accesses. · b5111790

Rusty Russell authored Jul 29, 2008

To simplify the transition to when we publish indices in the ring
(and make shuffling my patch queue easier), wrap them in a lg_last_avail()
macro.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

b5111790

lguest: use cpu capability accessors · cf485e56

Andrew Morton authored Jun 09, 2008

To support my little make-x86-bitops-use-proper-typechecking projectlet.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

cf485e56

lguest: virtio-rng support · 28fd6d7f

Rusty Russell authored Jul 29, 2008

This is a simple patch to add support for the virtio "hardware random
generator" to lguest.  It gets about 1.2 MB/sec reading from /dev/hwrng
in the guest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

28fd6d7f

lguest: Support assigning a MAC address · dec6a2be

Mark McLoughlin authored Jul 29, 2008

If you've got a nice DHCP configuration which maps MAC
addresses to specific IP addresses, then you're going to
want to start your guest with one of those MAC addresses.

Also, in Fedora, we have persistent network interface naming
based on the MAC address, so with randomly assigned
addresses you're soon going to hit eth13. Who knows what
will happen then!

Allow assigning a MAC address to the network interface with
e.g.

  --tunnet=bridge:eth0:00:FF:95:6B:DA:3D

or:

  --tunnet=192.168.121.1:00:FF:95:6B:DA:3D

which is pretty unintelligible, but ...

(includes Rusty's minor rework)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

dec6a2be
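
The two argument shapes shown in the commit above can be parsed with a short helper. This Python sketch is an assumption about how such a string could be split, not the launcher's actual parser: `parse_tunnet()` is a hypothetical name, and it treats everything after the bridge name or IP address as the optional MAC address.

```python
def parse_tunnet(arg):
    """Split a --tunnet value into (target, mac); mac is None if absent."""
    prefix = ""
    if arg.startswith("bridge:"):
        prefix, arg = "bridge:", arg[len("bridge:"):]
    # The first field is the bridge name or IP address; the rest, if any,
    # is expected to be the MAC address.
    target, sep, rest = arg.partition(":")
    mac = None
    if sep:
        parts = rest.split(":")
        if len(parts) != 6 or not all(len(p) == 2 for p in parts):
            raise ValueError("MAC address must be six hex octets: %r" % rest)
        mac = bytes(int(p, 16) for p in parts)
    return prefix + target, mac
```

For the examples in the message, the MAC address `00:FF:95:6B:DA:3D` is returned as six bytes alongside either `bridge:eth0` or `192.168.121.1`.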

lguest: Don't leak /dev/zero fd · 34bdaab4

Mark McLoughlin authored Jun 13, 2008

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

34bdaab4

lguest: fix verbose printing of device features. · 32c68e5c

Rusty Russell authored Jul 29, 2008

%02x is more appropriate for bytes than %08x.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

32c68e5c
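
The difference between the two format specifiers is easy to see on a couple of bytes. Python's `%` operator accepts the same `%02x` and `%08x` conversions as C's printf, so this small sketch (with a hypothetical `format_features()` helper) shows the effect without touching the launcher itself.

```python
def format_features(feature_bytes, spec="%02x"):
    """Render each feature byte as hex using the given printf-style spec."""
    return "".join(spec % b for b in feature_bytes)


sample = bytes([0x01, 0xFF])
compact = format_features(sample)          # two hex digits per byte
padded = format_features(sample, "%08x")   # eight hex digits per byte
```

With `%08x`, every byte is padded to eight digits, so the output is four times longer and mostly zeros.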

lguest: fix switcher_page leak on unload · 0a707210

Johannes Weiner authored Jul 08, 2008

map_switcher allocates the array, unmap_switcher has to free it
accordingly.
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

0a707210

lguest: Guest int3 fix · 0c12091d

Rusty Russell authored Jul 29, 2008

Ron Minnich noticed that guest userspace gets a GPF when it tries to int3:
we need to copy the privilege level from the guest-supplied IDT to the real
IDT. int3 is the only common case where guest userspace expects to invoke
an interrupt, so that's the symptom of failing to do this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

0c12091d
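
The privilege level the commit above refers to lives in two bits of an x86 gate descriptor. The Python sketch below shows the bit manipulation only: `host_gate_flags()` and `gate_dpl()` are hypothetical helpers, and using 0x8E00 (present, DPL 0, 32-bit interrupt gate) as the base flags is an assumption about how the real entry is built.

```python
DPL_MASK = 0x6000        # bits 13-14 of the gate's high word: privilege level
BASE_FLAGS = 0x8E00      # present, DPL 0, 32-bit interrupt gate


def host_gate_flags(guest_hi):
    """Flags for the real IDT entry, keeping the Guest's privilege level."""
    return BASE_FLAGS | (guest_hi & DPL_MASK)


def gate_dpl(flags):
    """Extract the descriptor privilege level (0-3) from gate flags."""
    return (flags >> 13) & 3
```

A Guest gate with DPL 3, as used for int3, keeps DPL 3 in the real entry, so userspace can invoke it without a general protection fault; gates the Guest marked as DPL 0 stay unreachable from userspace.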

lguest: set max_pfn_mapped, growl loudly at Yinghai Lu · 5d006d8d

Rusty Russell authored Jul 29, 2008

6af61a76 'x86: clean up max_pfn_mapped
usage - 32-bit' makes the following comment:

    XEN PV and lguest may need to assign max_pfn_mapped too.

But no CC.  Yinghai, wasting fellow developers' time is a VERY bad
habit.  If you do it again, I will hunt you down and try to extract
the three hours of my life I just lost :)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>

5d006d8d

mfd: accept pure device as a parent, not only platform_device · 424f525a

Dmitry Baryshkov authored Jul 29, 2008

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

424f525a

include/asm-generic/pgtable-nopmd.h: macros are noxious, reason #435 · 34ee5501

Andrew Morton authored Jul 28, 2008

arch/x86/mm/pgtable.c: In function 'pgd_mop_up_pmds':
  arch/x86/mm/pgtable.c:194: warning: unused variable 'pmd'

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

34ee5501