[SASAG] Fwd: NFS server stumbling

Ski Kacoroski kacoroski at gmail.com
Fri Nov 4 11:48:05 PDT 2011


Stuart,

I have seen this kind of behavior when a server flushes a directory's 
meta-data out of its cache, which forces it to essentially rescan the 
directories to fetch their meta-data again.  Try the following to help 
debug:

1. Is there a difference between 'ls' and 'ls -l'?  If 'ls -l' is much 
slower, that points to a meta-data issue.

2. If you re-run 'ls' again right after a slow one, does it speed up 
(i.e. the meta-data is now cached)?
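
A rough sketch of those two checks from a client (the directory is a 
placeholder; point it at an NFS-mounted path):

```shell
# substitute an NFS-mounted directory; /tmp is only a stand-in
dir=${NFS_DIR:-/tmp}

# names only: cheap for the server
time ls "$dir" > /dev/null

# names plus attributes: the server must produce per-entry meta-data
time ls -l "$dir" > /dev/null

# immediate re-run: a big speedup suggests the meta-data is now cached
time ls -l "$dir" > /dev/null
```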

We had this happen on our old file server: the cache was not large 
enough (or smart enough) to hold all the meta-data, and it got flushed 
whenever several large files filled it up.

Good luck,

ski

On 11-11-04 04:42 AM, Stuart Kendrick wrote:
> I'm working with an NFS server which is stumbling.  And I'm looking for
> insights from our collective expertise.
>
>
> DESCRIBE PROBLEM
> The problem presents itself as slow 'ls' output -- sometimes, 'ls' can
> spend seconds or minutes (1, 5, 20! minutes) to produce a listing of a
> directory containing tens or hundreds of files.  Happens maybe a dozen
> times a day, across a handful of users on a handful of NFS clients (all
> OpenSuse 11.3) mounting their volumes from one particular NFS server
> (Solaris 10 9/10 s10x_u9wos_14a X86, assembled 2010-08-11) (which in
> turn builds its LUNs from any and all of three back-end systems -- 3Par,
> Hitachi, and NetApp storage arrays).  The sole emacs user experiences
> similar hangs on opening files.  Most of the time, these boxes are
> reading/writing large files (multi gigabyte output files from genome
> sequencing runs), running multi-hour compute jobs on them; we don't know
> if performance is affected on these.
>
> This is an infrequent issue -- the vast majority of the time,
> performance of 'ls' and 'emacs' feels reasonable.  And the majority of
> the clients (scores) and users (several hundred) don't report a problem...
>
>
>
> REFERENCES
> Any recommendations for books which explain NFS, and ONC-RPC in
> particular, in detail?  Or which explain how to use 'nfsstat' and other
> tools to troubleshoot?  I have these on order:
> 	NFS Illustrated by Callaghan
> 	Managing NFS and NIS (O'Reilly)
>
>
> TOOLS
> What tools do you use to shed light on NFS issues?  'nfsstat' and
> 'wireshark' ... others?
>
>
>
> PACKET TRACE
> Packet traces show WRITEs and COMMITs which can take .0001 seconds to
> complete ... or as long as 35 minutes!
>
> [BTW:  packet traces between Client and Server show no TCP pathology:
> no TCP retransmissions or RSTs.  Which is why I'm focusing on the NFS /
> storage portion of the stack.]
>
>
> Here is a WRITE which takes ~5 minutes to complete:
>
> filter:  rpc.xid == 0x1d207259
>
> No.   Abs Time DeltaT Bytes Src     Dst    Info
> 55808 10:23:46 14.563 2962  Client  Server V3 WRITE Call (Reply In
>                                                            85295)
> 62624 10:25:46 14.572 2962  Client  Server [RPC retransmission of
>                            #55808] V3 WRITE Call (Reply In 85295)
> 69434 10:26:46 14.579 2962  Client  Server [RPC retransmission of
>                            #55808] V3 WRITE Call (Reply In 85295)
> 76235 10:28:46 14.577 2962  Client  Server [RPC retransmission of
>                            #55808] V3 WRITE Call (Reply In 85295)
> 85295 10:29:22 0.860  206   Server  Client V3 WRITE Reply (Call In
>                                                            55808)
>
> Looks like some WRITEs needed ~20 minutes to complete, while some
> COMMITs needed ~35 minutes.
>
> ONC-RPC Server Response Time statistics for NFSv3:
> Procedure     Calls     Min SRT     Max SRT     Avg SRT
> WRITE            89     .000180     1188.59     260.127
> COMMIT            5     328.581     2106.62     1163.34
> FSSTAT           14     .000180     .000441     .000260
> ACCESS            3     .000187     .000456     .000282
>
> ONC-RPC Program Statistics:
> Program   Version   Calls  Min SRT  Max SRT  Avg SRT
> NFS       3         111    .000180  2106.62  260.973
>
>
> That '2106.62' seconds under Max SRT for a COMMIT is where I pulled the
> '~35 minutes' figure.
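
For pulling numbers like these out of a capture on the command line, 
tshark (Wireshark's CLI twin) can apply the same filters and has an RPC 
service-response-time report built in.  A sketch -- the capture filename 
is a placeholder, and exact field/option spellings may vary by Wireshark 
version:

```shell
# follow a single slow RPC by transaction ID (same filter as above)
tshark -r nfs-trace.pcap -Y 'rpc.xid == 0x1d207259'

# list replies whose call-to-reply latency exceeded 60 seconds
tshark -r nfs-trace.pcap -Y 'rpc.time > 60' \
    -T fields -e frame.number -e rpc.xid -e rpc.time

# service response time table for NFS (ONC-RPC program 100003) version 3
tshark -r nfs-trace.pcap -q -z rpc,srt,100003,3
```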
>
>
> NFSSTAT
>
> Server:/home/test-user>  nfsstat
>
> Server rpc:
> Connection oriented:
> calls      badcalls nullrecv badlen xdrcall dupchecks dupreqs
> 6565848596 0        0        0      0       2161188143 128335
>
> Now, I'm new to interpreting nfsstat output.  Anyone have enough
> confidence to speak to this?  Here's my thinking:
>
> 'calls' are the total number of RPC calls.  'dupchecks' tells us how
> many times the NFS server ran one of those calls through cache.  And
> 'dupreqs' tells us how many retransmits, or, more precisely, how many
> times the NFS server realized that it was /already/ working on that call.
>
> dupreqs / calls = 128335 / 6565848596 =~ .0000196 (~.002%)  Not bad at all
>
> i.e. only a tiny fraction of all the RPC calls are being retransmitted.
>
> But then I start to doubt my interpretation.  I may be misunderstanding
> the term 'cache' as it relates to dupchecks.  Are we using 'cache' in
> the sense of "if it's in my cache, I'll return the answer to you faster
> than if I had to retrieve blocks from storage"?  Or in the sense I
> would use the term 'queue': if I'm already working on this request (it's
> in my queue), I won't bother adding your most recent retransmission of
> it to my queue.  If the latter, then:
>
> dupchecks / calls = 2161188143 / 6565848596 = ~.33 =~ 33%
>
> Which would suggest that ~33% of all RPCs are retransmitted ... a
> different story entirely.
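
Either way, the two ratios are easy to recompute from the counters in 
the nfsstat output above; a quick sketch:

```shell
# counters copied from the nfsstat output above
calls=6565848596
dupchecks=2161188143
dupreqs=128335

# retransmissions the server actually recognized: dupreqs / calls
awk -v d="$dupreqs" -v c="$calls" \
    'BEGIN { printf "dupreqs/calls   = %.7f  (%.4f%%)\n", d / c, 100 * d / c }'

# fraction of calls checked against the duplicate-request cache
awk -v d="$dupchecks" -v c="$calls" \
    'BEGIN { printf "dupchecks/calls = %.2f\n", d / c }'
```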
>
> http://docstore.mik.ua/orelly/networking_2ndEd/nfs/appb_01.htm
>
>
>
> WHAT WE'RE DOING
> -We've spent effort analyzing client health (load, CPU utilization,
> network IO, etc.) during these events; so far, we believe that the
> clients are underutilized while 'ls' is being slow.
>
> -A wrapper around 'ls' logs information:  precisely what 'ls' command
> was run, on what volume, by what user, with what flags, and how long it
> took to complete
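
A minimal sketch of what such a wrapper might look like, written as a 
shell function (the log path and record fields here are made up, not 
Stuart's actual wrapper):

```shell
# log each 'ls' invocation: timestamp, user, directory, args, duration, status
LOG=${LS_WRAPPER_LOG:-./ls-wrapper.log}

timed_ls() {
    start=$(date +%s.%N)
    /bin/ls "$@"
    status=$?
    end=$(date +%s.%N)
    elapsed=$(awk -v a="$start" -v b="$end" 'BEGIN { printf "%.3f", b - a }')
    printf '%s user=%s pwd=%s args="%s" elapsed=%ss status=%s\n' \
        "$(date '+%F %T')" "$(id -un)" "$PWD" "$*" "$elapsed" "$status" >> "$LOG"
    return $status
}

timed_ls -l /tmp > /dev/null
```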
>
> -We're building a table listing which volumes are involved with slow
> 'ls' commands, and which LUNs comprise those volumes.  [Looks like some
> of the volumes are built from spindles hosted on multiple storage
> systems ... using concats, not stripes, meaning some files probably live
> on a single spindle, rather than being striped across multiple spindles
> ... more investigation needed here.]
>
> -We've started to chart IO latency on the relevant volumes
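
One low-tech option for the latency charting on the Solaris side (a 
sketch; column names vary a bit by release) is iostat's extended device 
statistics:

```shell
# extended per-device stats every 5 seconds; asvc_t is average service
# time in ms -- watch for spikes on the LUNs backing the slow volumes
iostat -xn 5
```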
>
>
> --sk
>
> Stuart Kendrick
> FHCRC
> _______________________________________________
> Members mailing list
> Members at lists.sasag.org
> http://lists.sasag.org/mailman/listinfo/members

-- 
"When we try to pick out anything by itself, we find it
  connected to the entire universe"            John Muir

Chris "Ski" Kacoroski, Unix Admin
206-501-9803, ski98033 on IRC and most IM services

