Commits · be2479818ff227fbb6800935909b65e06159735c · Kirill Smelkov / linux

19 Sep, 2016 40 commits

staging: lustre: ptlrpc: prevent request timeout grow due to recovery · be247981

Mikhail Pershin authored Sep 18, 2016

Patch fixes the issue seen on the client with growing request
timeout which occurred after the server side patch landed for
LU-5079. While commit itself is correct, it reveals another
issue. If request is being processed for a long time on server
then client adaptive timeouts will adapt to that after receiving
reply and new requests will have bigger timeout. Another problem
is that server AT history is corrupted by recovery request
processing time which not pure service time but includes
also waiting time for clients to recover.

Patch prevents the AT stats update from early replies on client and
from recovering requests processing time on server.
The ptlrpc_at_recv_early_reply() still updates the current request
timeout as asked by server, but don't include this into AT stats.
The real reply will bring that data from server after all.
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6084
Reviewed-on: http://review.whamcloud.com/13520Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

be247981

staging: lustre: obd: use proper flags for call_usermodehelper · bbc222b1

James Simmons authored Sep 18, 2016

When a parameter is permanently changed on the MGS the
MGS send a changelog packet to the proper nodes that
are affected by the change. Once the nodes receive the
change they then call the userland utility lctl to
change its local value. When calling a userland
application from the kernel you specify a flag to
control the interaction with the application. Originally
by default the flag was set to 0 which is UMH_NO_WAIT
which meant lctl was being called asynchronously. In
older kernels this was fine since UHM_NO_WAIT and
UHM_WAIT_PROC had nearly the same logic. This changed
with newer kernels which broke updating our parameters.
Plus doing a UHM_NO_WAIT doesn't report back a error
if something goes wrong with lctl. The fix is to set
the flag to UHM_WAIT_PROC so kernel space waits until
lctl has finished and we get a proper error code if
something does go wrong with lctl.
Signed-off-by: James Simmons <uja.ornl@gmail.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6063
Reviewed-on: http://review.whamcloud.com/13677Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bbc222b1

staging: lustre: ptlrpc: remove unnecessary EXPORT_SYMBOL · e47fad9a

frank zago authored Sep 18, 2016

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.
Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/12510Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e47fad9a

staging: lustre: llite: lock the inode to be migrated · 8ef9dbe4

wang di authored Sep 18, 2016

Because the inode and its connected dentries will be cleared
out of the cache after migration, the inode needs to be locked
during the migration.
Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4712
Reviewed-on: http://review.whamcloud.com/9689Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8ef9dbe4

staging: lustre: obdclass: remove unnecessary EXPORT_SYMBOL · fee58df6

frank zago authored Sep 18, 2016

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.
Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/13323Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fee58df6

staging: lustre: misc: remove unnecessary EXPORT_SYMBOL · b5d1b04e

frank zago authored Sep 18, 2016

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.
Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/13321Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b5d1b04e

staging: lustre: grant: quiet message on grant waiting timeout · a09be844

Johann Lombardi authored Sep 18, 2016

Use at_max in osc_enter_cache() to bound how long we wait for grant
space before switching to synchronous I/Os. Do not print a message
on the console when the timeout is hit since such long wait can
be legitimate with flaky network (i.e. BRW is resent multiple times).
Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5521
Reviewed-on: http://review.whamcloud.com/12146Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a09be844

staging: lustre: lmv: Do not revalidate stripes with master lock · 15b241c5

wang di authored Sep 18, 2016

Do not revalidate slave stripes while holding master lock.
Otherwise if the revalidating slaves are blocked, then the
master lock can not be released in time.

Remove some unnecesary merging in ll_revalidate_slave(), and
the attributes will be stored in each stripe, only
merging them if required.
Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6088
Reviewed-on: http://review.whamcloud.com/13432Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

15b241c5

staging: lustre: client: Fix mkdir -i 1 from DNE2 client to DNE1 server · 0022c6bc

Artem Blagodarenko authored Sep 18, 2016

After DNE phase 2 has been added to client it sends
create request to slave MDT.  DNT1-only server doesn't
expect request to slave MDT from client. It expects
only cross-mdt request from master MDT. Thus if DNE2
client tries to "mkdir -i 1" on DNE1 server, then
LBUG happened.

This patch adds OBD_CONNECT_DIR_STRIPE connection
flag check on client side. If striped directories are not
supported by server, then create requrest is sent to
master MDT.
Signed-off-by: Artem Blagodarenko <artem_blagodarenko@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6071
Xyratex-bug-id: MRP-2319
Reviewed-on: http://review.whamcloud.com/13189Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: wang di <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0022c6bc

staging: lustre: clio: pass fid for OST setattr · 7ebb0ef3

Bobi Jam authored Sep 18, 2016

Store inode's fid in cl_setattr_ost() and OSC packs this info on the
wire (via lustre_set_wire_obdo) so that OST can use.

NOTE: currently lu_fid::f_ver and obdo::o_parent_ver are not used on
OFD device, and we use obdo::o_stripe_idx as
filter_fid::ff_parent::f_ver and save it to the device.
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1154
Reviewed-on: http://review.whamcloud.com/12902Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7ebb0ef3

staging: lustre: clio: rename coo_attr_set to coo_attr_update · 96234ec5

Bobi Jam authored Sep 18, 2016

coo_attr_set() is used to update object's attribute but its name
makes confusion that people intuitively think that it is used to
pass object's attribute down to server sides.
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1154
Reviewed-on: http://review.whamcloud.com/12888Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

96234ec5

staging: lustre: llite: pack suppgid to MDS correctly · 864d6a25

Fan Yong authored Sep 18, 2016

The ll_lookup_it() may trigger IT_OPEN RPC to open a file by name.
But at that time, the client does not know the target file's GID,
so it cannot pack the necessary supplementary group ID in the RPC.
Because of missing the supplementary group ID, the RPC maybe fail
for open permission check on the MDS. Under such case, MDS should
return the target file's GID, if the current thread on the client
in the right group (according to the file's GID), the client will
try the IT_OPEN RPC again with the right supplementary group ID.

This patch is also helpful if some other(s) changed the file's GID
after current RPC sent to the MDS with the suppgid as the original
GID by race.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5423
Reviewed-on: http://review.whamcloud.com/12476Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

864d6a25

staging: lustre: remove lustre/include/linux/ · 23ec6607

John L. Hammond authored Sep 18, 2016

Merge the contents of lustre/include/linux/lvfs.h into
lustre/include/lvfs.h. Merge lustre/include/linux/lustre_user.h into
lustre/include/lustre/lustre_user.h. Move lustre_compat25.h and
lustre_patchless_compat.h from lustre/include/linux/ to
lustre/include/ and rename lustre_compat25.h to lustre_compat.h.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/13271Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

23ec6607

staging: lustre: libcfs: check mask returned by cpumask_of_node · c658b696

Liang Zhen authored Sep 18, 2016

cpumask_of_node can return NULL if NUMA node is unavailable,
in this case cfs_node_to_cpumask will try to copy from NULL
and cause kernel panic.
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5751
Reviewed-on: http://review.whamcloud.com/13207Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c658b696

staging: lustre: obd: change type of cl_conn_count to size_t · 6314ccb6

Dmitry Eremin authored Sep 18, 2016

Change type of cl_conn_count to size_t.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/13125Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6314ccb6

staging: lustre: llite: unlock inode size in ll_lov_setstripe_ea_info() · 6d2f0127

John L. Hammond authored Sep 18, 2016

In ll_lov_setstripe_ea_info() release the inode size lock on all
appropriate exit paths.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6059
Reviewed-on: http://review.whamcloud.com/13167Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6d2f0127

staging: lustre: lprocfs: cleanup stats locking code · a466ca4e

Andreas Dilger authored Sep 18, 2016

Add comment blocks on lprocfs_stats_lock() and lprocfs_stats_unlock().
Move common NOPERCPU code out of the switch() statements to reduce
code size and complexity, since it doesn't depend on the opc at all.

Replace switch() in lprocfs_stats_unlock() with a simple if/else,
since the lock opc was already checked in lprocfs_stats_lock().

Add an enum for the lprocfs_stats_lock() operations to make it clear
what the valid values are and allow compiler checking.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5946
Reviewed-on: http://review.whamcloud.com/12872Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a466ca4e

staging: lustre: osc: change cl_extent_tax and *grants to unsigned · 6ffc4b3b

Dmitry Eremin authored Sep 18, 2016

Change the type accordant usage and remove warnings.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12386Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6ffc4b3b

staging: lustre: osc: osc_object_ast_clear() LBUG · 7715c636

Bobi Jam authored Sep 18, 2016

An OSC object could be destroyed with AGL locks waiting for granted,
so we'd get rid of the osc_object_ast_clear() assertion that its
dlm locks all getting granted.
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6042
Reviewed-on: http://review.whamcloud.com/13163Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7715c636

staging: lustre: mgc: add nid iteration · bc875120

Alexander Boyko authored Sep 18, 2016

mgc_apply_recover_logs use only first nid from entry,
this could be the problem for a cluster with several network
address for a one node.
Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5950
Xyratex-bug-id: MRP-2255
Reviewed-on: http://review.whamcloud.com/12829Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bc875120

staging: lustre: ptlrpc: fix race between connect vs resend · d09cfb00

Alexander Boyko authored Sep 18, 2016

Buggy code at ptlrpc_connect_interpret()
finish:
    rc = ptlrpc_import_recovery_state_machine(imp);
    ...
    Set import connection flags
When import has FULL state ptlrpc_import_recovery_state_machine()
wakeup all waiters on import and all delayed request, which was
resented. And it could happened that request was send without
updated flags and AT is disabled. If such request is in progress
on the server, server drop the new instance, and could do early reply
for it. But this early reply confuse client, cause it wait real
reply(no AT for this request). Client try to touch buffer outside
reply and got EPROTO error.
The same bug existed for initital connect too. Import became FULL
before import connection flags was set.
Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5528
Xyratex-bug-id: MRP-2034
Reviewed-on: http://review.whamcloud.com/11723Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Alexander Boyko <alexander.boyko@seagate.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d09cfb00

staging: lustre: lov: flatten struct lov_stripe_md · 01cd98ff

John L. Hammond authored Sep 18, 2016

Flatten out the lsm_wire struct from the middle of struct
lov_stripe_md and remove the member name macros.
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5814
Reviewed-on: http://review.whamcloud.com/12581Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

01cd98ff

staging: lustre: ldlm: move LDLM_GID_ANY to lustre_dlm.h · 8b915c1e

Jinshan Xiong authored Sep 18, 2016

lustre_idl.h only includes wire data; lustre_dlm.h is the
right place for LDLM_GID_ANY.
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6028
Reviewed-on: http://review.whamcloud.com/13074Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8b915c1e

staging: lustre: ptlrpc: fix comparison between signed and unsigned · 26ad0dc3

Dmitry Eremin authored Sep 18, 2016

Change return type and size argiments of lustre_msg_hdr_size(),
lustre_msg_buf{len,count}() and req_capsule_*_size() to __u32.
Change type of req_format->rf_idx and req_format->rf_fields.nr
to size_t. Also return zero for incorrect message magic instead
of -EINVAL. This will be more robust because of few of them after
LASSERTF(0, "...") and will not be returned. In the rest places
it return zero size instead of huge number after implicit
unsigned conversion.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12475Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

26ad0dc3

staging: lustre: clio: add coo_getstripe interface · a33fdc0d

Bobi Jam authored Sep 18, 2016

Use cl_object_operations::coo_getstripe() to handle
LL_IOC_LOV_GETSTRIPE ops.
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5823
Reviewed-on: http://review.whamcloud.com/12452Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a33fdc0d

staging: lustre: obdclass: change cl_fault_io->ft_nob to size_t · fdeb14fa

Dmitry Eremin authored Sep 18, 2016

Change the type accordant usage.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12380Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fdeb14fa

staging: lustre: obd: change brw_page->count to unsigned · 855b5da5

Dmitry Eremin authored Sep 18, 2016

Pages count is unsigned. So, change the type accordant usage.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12378Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

855b5da5

staging: lustre: ldlm: Recalculate interval in ldlm_pool_recalc() · a5d604d7

Nathaniel Clark authored Sep 18, 2016

Instead of rechecking a static value, recalculate to see if pool stats
need to be updated.
Add newline so message will print instead of warning about missing
newline.
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4536
Reviewed-on: http://review.whamcloud.com/12547Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a5d604d7

staging: lustre: ptlrpc: Suppress error message when imp_sec is freed · d333597a

Amir Shehata authored Sep 18, 2016

There is a race condition on client reconnect when the import
is being destroyed.  Some outstanding client bound requests
are being processed when the imp_sec has alread been freed.
Ensure to suppress the error message in import_sec_validate_get()
in that case
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3353
Reviewed-on: http://review.whamcloud.com/10200Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d333597a

staging: lustre: obdclass: eliminate NULL error return · a21e4b90

Bob Glossman authored Sep 18, 2016

Always return an ERR_PTR() on errors, never return a NULL,
in lu_object_find_slice().  Also clean up callers who
no longer need special case handling of NULL returns.
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5858
Reviewed-on: http://review.whamcloud.com/12554Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a21e4b90

staging: lustre: obdclass: change loop indexes to unsigned · 0ae0d5eb

Dmitry Eremin authored Sep 18, 2016

Cleanup warnings about comparison between signed and unsigned.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12387Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0ae0d5eb

staging: lustre: fiemap: set FIEMAP_EXTENT_LAST correctly · 5e6bcb48

Bobi Jam authored Sep 18, 2016

When we've collected enough extents as user requested, we'd check one
further to decide whether we've reached the last extent of the file.
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5933
Reviewed-on: http://review.whamcloud.com/12781Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5e6bcb48

staging: lustre: ldlm: evict clients returning errors on ASTs · c86a6c11

Alexey Lyashkov authored Sep 18, 2016

To test proper behavior of clients returning errors on ASTs
we can induce a failure with setting OBD_FAIL_LDLM_BL_CALLBACK_NET.
Handle the new additonal case of cfs_fail_err being set as well
so that the cfs_fail_err can be sent back in a reply.
Signed-off-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5581
Xyratex-bug-id: MRP-2041
Reviewed-on: http://review.whamcloud.com/11752Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c86a6c11

staging: lustre: mdc: Proper accessing struct lov_user_md · 8971ee5b

Yoshifumi Uemura authored Sep 18, 2016

In mdc_setattr_pack() access the members of struct lov_user_md by
little endian byte order.
Signed-off-by: Yoshifumi Uemura <kogexe@gmail.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5889
Reviewed-on: http://review.whamcloud.com/12683Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8971ee5b

staging: lustre: obdclass: lu_htable_order() return type to long · ba40ae79

Dmitry Eremin authored Sep 18, 2016

Change the type accordant usage.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5577
Reviewed-on: http://review.whamcloud.com/12385Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ba40ae79

staging: lustre: llite: fix dup flags names · 816afc79

Bob Glossman authored Sep 18, 2016

The name 'xattr' is used for two different ll_flags bits.
Change the names to be distinct and different, reflecting
the names of the bits used in LL_SBI_xbitnamex #defines.
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5586
Reviewed-on: http://review.whamcloud.com/12892Reviewed-by: Minh Diep <minh.diep@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

816afc79

staging: lustre: llog: prevent out-of-bound index · 2ce3647e

frank zago authored Sep 18, 2016

llog_process_thread() can be called from llog_cat_process_cb with an
index already out of bound, leading to the following crash:

LustreError: 3773:0:(llog.c:310:llog_process_thread())
  ASSERTION(index <= last_index + 1 ) failed:
LustreError: 3773:0:(llog.c:310:llog_process_thread()) LBUG

 #0 [ffff8801144bf900] machine_kexec at ffffffff81038f3b
 #1 [ffff8801144bf960] crash_kexec at ffffffff810c5d82
 #2 [ffff8801144bfa30] panic at ffffffff8152798a
 #3 [ffff8801144bfab0] lbug_with_loc at ffffffffa02f8eeb [libcfs]
 #4 [ffff8801144bfad0] llog_process_thread at ffffffffa0413fff [obdclass]
 #5 [ffff8801144bfb80] llog_process_or_fork at ffffffffa041585f [obdclass]
 #6 [ffff8801144bfbd0] llog_cat_process_cb at ffffffffa0418612 [obdclass]
 #7 [ffff8801144bfc30] llog_process_thread at ffffffffa0413c22 [obdclass]
 #8 [ffff8801144bfce0] llog_process_or_fork at ffffffffa041585f [obdclass]
 #9 [ffff8801144bfd30] llog_cat_process_or_fork at ffffffffa0416b9d [obdclass]

If index is too big, simply return success.
Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5635
Reviewed-on: http://review.whamcloud.com/12161Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2ce3647e

staging: lustre: ptlrpc: quiet errors on initial connection · 50932a22

Andreas Dilger authored Sep 18, 2016

It may be that a client or MDS is trying to connect to a target (OST
or peer MDT) before that target is finished setup. Rather than
spamming the console logs during initial connection, only print a
console error message if there are repeated failures trying to
connect to the target, which may indicate an error on that node.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3456
Reviewed-on: http://review.whamcloud.com/10057Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

50932a22

staging: lustre: ldlm: revert the changes for lock canceling policy · 08fd0346

Jinshan Xiong authored Sep 18, 2016

The changes for LRU lock policy was introduced by commit bfae5a4e,
where I was trying to revise the policy to pick locks for canceling.

However, this caused two problems as mentioned in LU-5727. The first
problem is that the lock can only be picked for canceling only if
the number of LRU locks is over preset LRU number AND it's aged; the
second problem is that mdc_cancel_weight() tends to not cancel OPEN
locks, therefore open locks can be kept forever and finally exhausts
memory on the MDT side.

The commit 7b2d26b0 ("revert changes to ldlm_cancel_aged_policy") fixed
the first problem. This patch will revert the rest of changes related
to LRU policy revise.
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5727
Reviewed-on: http://review.whamcloud.com/12733Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

08fd0346

staging: lustre: recovery: don't replay closed open · eda3565c

Niu Yawei authored Sep 18, 2016

To avoid scanning the replay open list every time in the
ptlrpc_free_committed(), the fix of LU-2613 (4322e0f9) changed
the ptlrpc_free_committed() to skip the open list unless the
import generation is changed. That introduced a race which could
make a closed open being replayed:

1. Application calls ll_close_inode_openhandle()-> mdc_close(),
   to close file, rq_replay is cleared, but the open request is
   still on the imp_committed_list;

2. Before the md_clear_open_replay_data() is called for close,
   client start replay, and that closed open will be replayed
   mistakenly;

3. Open replay interpret callback (mdc_replay_open) could race
   with the mdc_clear_open_replay_data() at the end;

This patch fix the ptlrpc_free_committed() to make sure the
open list is scanned on recovery to prevent the closed open request
from being replayed.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5507
Reviewed-on: http://review.whamcloud.com/12667Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eda3565c