[SASAG] Fwd: NFS server stumbling
Ski Kacoroski
kacoroski at gmail.com
Fri Nov 4 11:48:05 PDT 2011
Stuart,
I have seen this kind of behavior when a server flushes the
meta-data of a directory out of its cache, which forces it to essentially
re-scan the directories on disk to rebuild their meta-data. Try the
following to help debug:
1. Is there a difference between ls and ls -l? If ls -l is much slower,
then that points to a meta-data issue.
2. If you re-run ls again right after a slow one, does it speed up (i.e.
the meta-data is now cached)? A quick timing check for both is sketched below.
We had this happen on our old file server: the cache was not large
enough (or smart enough) to keep all of the meta-data in it, and it
was flushed whenever several large files filled it up.
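
For example, a rough way to check both of these (the directory path below
is only a placeholder):

    cd /path/to/suspect-directory
    time ls > /dev/null         # names only
    time ls -l > /dev/null      # forces a stat (NFS GETATTR) per entry
    time ls -l > /dev/null      # re-run immediately: if this one is fast,
                                # the meta-data simply wasn't cached before
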
Good luck,
ski
On 11-11-04 04:42 AM, Stuart Kendrick wrote:
> I'm working with an NFS server which is stumbling. And I'm looking for
> insights from our collective expertise.
>
>
> DESCRIBE PROBLEM
> The problem presents itself as slow 'ls' output -- sometimes, 'ls' can
> take seconds or even minutes (1, 5, 20! minutes) to produce a listing of
> a directory containing tens or hundreds of files. Happens maybe a dozen
> times a day, across a handful of users on a handful of NFS clients (all
> OpenSuse 11.3) mounting their volumes from one particular NFS server
> (Solaris 10 9/10 s10x_u9wos_14a X86, assembled 2010-08-11) (which in
> turn builds its LUNs from any and all of three back-end systems -- 3Par,
> Hitachi, and NetApp storage arrays). The sole emacs user experiences
> similar hangs on opening files. Most of the time, these boxes are
> reading/writing large files (multi-gigabyte output files from genome
> sequencing runs) and running multi-hour compute jobs on them; we don't
> know whether those jobs' performance is affected.
>
> This is an infrequent issue -- the vast majority of the time,
> performance of 'ls' and 'emacs' feels reasonable. And the majority of
> the clients (scores) and users (several hundred) don't report a problem...
>
>
>
> REFERENCES
> Have any recommendations for books which explain NFS, and ONC-RPC in
> particular, in detail? Which explain how to use 'nfsstat' and other
> tools to troubleshoot? I have these on order:
> NFS Illustrated by Callaghan
> Managing NFS and NIS (O'Reilly)
>
>
> TOOLS
> What tools do you use to shed light on NFS issues? 'nfsstat' and
> 'wireshark' ... others?
>
>
>
> PACKET TRACE
> Packet traces show WRITEs and COMMITs which can take .0001 seconds ...
> or as long as 35 minutes! to complete.
>
> [BTW: packet traces between Client and Server show no TCP pathology:
> no TCP retransmissions or RSTs. Which is why I'm focusing on the NFS /
> storage portion of the stack.]
>
>
> Here is a WRITE which takes ~5 minutes to complete:
>
> filter: rpc.xid == 0x1d207259
>
> No.    Abs Time   DeltaT   Bytes  Src     Dst     Info
> 55808  10:23:46   14.563   2962   Client  Server  V3 WRITE Call (Reply In 85295)
> 62624  10:25:46   14.572   2962   Client  Server  [RPC retransmission of #55808] V3 WRITE Call (Reply In 85295)
> 69434  10:26:46   14.579   2962   Client  Server  [RPC retransmission of #55808] V3 WRITE Call (Reply In 85295)
> 76235  10:28:46   14.577   2962   Client  Server  [RPC retransmission of #55808] V3 WRITE Call (Reply In 85295)
> 85295  10:29:22   0.860    206    Server  Client  V3 WRITE Reply (Call In 55808)
>
> Looks like some WRITEs needed ~20 minutes to complete, while some
> COMMITs needed ~35 minutes.
>
> ONC-RPC Server Response Time statistics for NFSv3 (SRT in seconds):
> Procedure   Calls   Min SRT   Max SRT   Avg SRT
> WRITE          89   .000180   1188.59   260.127
> COMMIT          5   328.581   2106.62   1163.34
> FSSTAT         14   .000180   .000441   .000260
> ACCESS          3   .000187   .000456   .000282
>
> ONC-RPC Program Statistics:
> Program   Version   Calls   Min SRT   Max SRT   Avg SRT
> NFS             3     111   .000180   2106.62   260.973
>
>
> That '2106.62' seconds under Max SRT for a COMMIT is where I pulled the
> '~35 minutes' figure.
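>
> [For reference, those numbers come from Wireshark's ONC-RPC service
> response time report; roughly the equivalent tshark invocation would be
> something along these lines (the capture filename here is made up, and
> 100003,3 is the NFS program number and version):
>
>     tshark -q -r nfs-trace.pcap -z rpc,srt,100003,3
>
> and the single WRITE above was isolated with the display filter shown
> earlier, e.g.:
>
>     tshark -r nfs-trace.pcap -R "rpc.xid == 0x1d207259"
> ]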
>
>
> NFSSTAT
>
> Server:/home/test-user> nfsstat
>
> Server rpc:
> Connection oriented:
> calls        badcalls  nullrecv  badlen  xdrcall  dupchecks    dupreqs
> 6565848596   0         0         0       0        2161188143   128335
>
> Now, I'm new to interpreting nfsstat output. Anyone have enough
> confidence to speak to this? Here's my thinking:
>
> 'calls' is the total number of RPC calls. 'dupchecks' tells us how
> many times the NFS server ran one of those calls through its cache. And
> 'dupreqs' tells us how many retransmits there were or, more precisely, how
> many times the NFS server realized that it was /already/ working on that call.
>
> dupreqs / calls = 128335 / 6565848596 =~ 0.0000195 (about 0.002%). Not bad at all
>
> i.e. only a tiny fraction of all the RPC calls are being retransmitted.
>
> But then I start to doubt my interpretation. I think I may be
> misunderstanding the use of the term 'cache' as it relates to dupchecks.
> Are we using 'cache' in the sense of "if it's in my cache, then I'll
> return the answer to you quicker than if I had to retrieve blocks from
> storage"? Or are we using it the way I would use the term 'queue',
> meaning: if I'm already working on this request (in my queue), then I
> won't bother adding your most recent request to my queue. If the
> latter, then:
>
> dupchecks / calls = 2161188143 / 6565848596 = ~.33 =~ 33%
>
> Which would suggest that ~33% of all RPCs are retransmitted ... a
> different story entirely.
>
> http://docstore.mik.ua/orelly/networking_2ndEd/nfs/appb_01.htm
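>
> [Quick arithmetic on those two ratios, using the counters from the
> nfsstat output above:
>
>     awk 'BEGIN { calls=6565848596; dupchecks=2161188143; dupreqs=128335;
>                  printf "dupchecks/calls = %.3f\ndupreqs/calls  = %.7f\n",
>                         dupchecks/calls, dupreqs/calls }'
>
> which prints 0.329 and 0.0000195 respectively.]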
>
>
>
> WHAT WE'RE DOING
> -We've spent effort analyzing the health (load, CPU utilization, network
> IO, and so on) of the clients during these events; so far, we believe that
> the clients are underutilized while 'ls' is being slow.
>
> -A wrapper around 'ls' logs information: precisely what 'ls' command
> was run, on what volume, by what user, with what flags, and how long it
> took to complete (a rough sketch of the wrapper follows this list)
>
> -We're building a table listing which volumes are involved with slow
> 'ls' commands, and which LUNs comprise those volumes. [Looks like some
> of the volumes are built from spindles hosted on multiple storage
> systems ... using concats, not stripes, meaning some files probably live
> on a single spindle, rather than being striped across multiple spindles
> ... more investigation needed here.]
>
> -We've started to chart IO latency on the relevant volumes
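>
> [Roughly, the 'ls' wrapper is just a shell script that sits earlier in
> PATH than /bin/ls; the log location and fields below are illustrative
> rather than our exact script:
>
>     #!/bin/sh
>     # ls wrapper: log who ran what, from where, with what flags, and how
>     # long it took, then hand back ls's own exit status
>     START=`date +%s`
>     /bin/ls "$@"
>     STATUS=$?
>     END=`date +%s`
>     WHO="user=`id -un` pwd=`pwd` args='$*'"
>     echo "`date` $WHO elapsed=`expr $END - $START`s status=$STATUS" >> /var/tmp/ls-wrapper.log
>     exit $STATUS
> ]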
>
>
> --sk
>
> Stuart Kendrick
> FHCRC
> _______________________________________________
> Members mailing list
> Members at lists.sasag.org
> http://lists.sasag.org/mailman/listinfo/members
--
"When we try to pick out anything by itself, we find it
connected to the entire universe" John Muir
Chris "Ski" Kacoroski, Unix Admin
206-501-9803, ski98033 on IRC and most IM services