[SASAG] Fwd: NFS server stumbling
Stuart Kendrick
skendric at fhcrc.org
Fri Nov 4 04:42:18 PDT 2011
I'm working with an NFS server which is stumbling. And I'm looking for
insights from our collective expertise.
DESCRIBE PROBLEM
The problem presents itself as slow 'ls' output -- sometimes, 'ls' can
spend seconds or minutes (1, 5, 20! minutes) to produce a listing of a
directory containing tens or hundreds of files. Happens maybe a dozen
times a day, across a handful of users on a handful of NFS clients (all
OpenSuse 11.3) mounting their volumes from one particular NFS server
(Solaris 10 9/10 s10x_u9wos_14a X86, assembled 2010-08-11) (which in
turn builds its LUNs from any and all of three back-end systems -- 3Par,
Hitachi, and NetApp storage arrays). The sole emacs user experiences
similar hangs on opening files. Most of the time, these boxes are
reading/writing large files (multi gigabyte output files from genome
sequencing runs), running multi-hour compute jobs on them; we don't know
if performance is affected on these.
This is an infrequent issue -- the vast majority of the time,
performance of 'ls' and 'emacs' feel reasonable. And the majority of
the clients (scores) and users (several hundred) don't report a problem...
REFERENCES
Have any recommends for books which explain NFS, and ONC-RPC in
particular, in detail? Which explain how to use 'nfsstat' and others to
trouble-shoot? I have these on-order:
NFS Illustrated by Callaghan
Managing NFS and NIS (O'Reilly)
TOOLS
What tools do you use to shed light on NFS issues? 'nfsstat' and
'wireshark' ... others?
PACKET TRACE
Packet traces show WRITEs and COMMITs which can take .0001 seconds to
complete ... or as long as 35 minutes! to complete.
[BTW: packet traces between Client and Server show no TCP pathology:
no TCP retransmissions or RSTs. Which is why I'm focusing on the NFS /
storage portion of the stack.]
Here is a WRITE which takes ~5 minutes to complete:
filter: rpc.xid == 0x1d207259
No. Abs Time DeltaT Bytes Src Dst Info
55808 10:23:46 14.563 2962 Client Server V3 WRITE Call (Reply In
85295)
62624 10:25:46 14.572 2962 Client Server [RPC retransmission of
#55808] V3 WRITE Call (Reply In 85295)
69434 10:26:46 14.579 2962 Client Server [RPC retransmission of
#55808] V3 WRITE Call (Reply In 85295)
76235 10:28:46 14.577 2962 Client Server [RPC retransmission of
#55808] V3 WRITE Call (Reply In 85295)
85295 10:29:22 0.860 206 Server Client V3 WRITE Reply (Call In
55808)
Looks like some WRITEs needed ~20 minutes to complete, while some
COMMITs needed ~35 minutes.
ONC-RPC Server Response Time statistics for NFSv3:
Procedure Calls Min SRT Max SRT Avg SRT
WRITE 89 .000180 1188.59 260.127
COMMIT 5 328.581 2106.62 1163.34
FSSTAT 14 .000180 .000441 .000260
ACCESS 3 .000187 .000456 .000282
ONC-RPC Program Statistics:
Program Version Calls Min SRT Max SRT Avg SRT
NFS 3 111 .000180 2106.62 260.973
That '2106.62' seconds under Max SRT for a COMMIT is where I pulled the
'~35 minutes' figure.
NFSSTAT
Server:/home/test-user> nfsstat
Server rpc:
Connection oriented:
calls badcalls nullrecv badlen xdrcall dupchecks dupreqs
6565848596 0 0 0 0 2161188143 128335
Now, I'm new to interpreting nfsstat output. Anyone have enough
confidence to speak to this? Here's my thinking:
'calls' are the total number of RPC calls. 'dupchecks' tells us how
many times the NFS server ran one of those calls through cache. And
'dupreqs' tells us how many retransmits, or, more precisely, how many
times the NFS server realized that it was /already/ working on that call.
dupreqs / calls = 128334 / 6547537906 =~ .000019600% Not bad at all
i.e. only a tiny fraction of all the RPC calls are being retransmitted.
But then, I start to doubt my interpretation. I think I'm
misunderstanding the use of the term 'cache' as it relates to dupchecks
.. are we using the term cache in the sense of "If it's in my cache,
then I'll return the answer to you quicker than if I had to retrieve
blocks from storage". Or are we using the term cache in the way I would
use the term 'queue' ... meaning, if I'm already working on this request
(in my queue), then I won't bother adding your most recent request to my
queue. If the latter, then:
dupchecks / calls = 2161188143 / 6565848596 = ~.33 =~ 33%
Which would suggest that ~33% of all RPCs are retransmitted ... a
different story entirely.
http://docstore.mik.ua/orelly/networking_2ndEd/nfs/appb_01.htm
WHAT WE'RE DOING
-We've spent effort analyzing health (load, CPU utilization, network IO,
others) on the clients during these events; so far, we believe that the
clients are underutlized while 'ls' is being slow.
-A wrapper around 'ls' logs information: precisely what 'ls' command
was run, on what volume, by what user, with what flags, and how long it
took to complete
-We're building a table listing which volumes are involved with slow
'ls' commands, and which LUNs comprise those volumes. [Looks like some
of the volumes are built from spindles hosted on multiple storage
systems ... using concats, not stripes, meaning some files probably live
on a single spindle, rather than being striped across multiple spindles
... more investigation needed here.]
-We've started to chart IO latency on the relevant volumes
--sk
Stuart Kendrick
FHCRC
More information about the Members
mailing list