[dtrace-discuss] How to trace process memory usage
Jim Mauro
James.Mauro at Sun.COM
Mon Dec 4 06:10:10 PST 2006
You need to take a step back, I think, and first identify the problem.
You do not yet know whether the memory consumer is a user-land process,
and the approach of determining which processes are having their
pages stolen may not help - such processes may be victims, not the
cause. You need to audit the memory consumers, and go from there.
They are:
- The kernel
- The file system cache
- Processes
I assume you're certain about the paging activity, meaning you see free
memory drop and the page scanner getting busy. This is observable
with vmstat - monitor freemem and the "sr" column.
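If it helps, the vmstat check can be left running unattended. This is
only a sketch, assuming the usual Solaris vmstat column layout (free is
the 5th field, in KB, and sr is the 12th); the function name
flag_scanning is made up for illustration.

```shell
# flag_scanning: read Solaris-style vmstat output on stdin and print only
# the samples where the page scanner is active (sr > 0).
# Assumes the default column order: r b w swap free re mf pi po fr de sr ...
flag_scanning() {
    awk '
        # Skip the two header lines (first field is not a number).
        $1 ~ /^[0-9]+$/ && NF >= 12 && ($12 + 0) > 0 {
            printf "free=%s KB sr=%s\n", $5, $12
        }
    '
}
```

Something like "vmstat 5 | flag_scanning" would then stay quiet until
the scanner wakes up. Keep in mind that the first sample vmstat prints
is an average since boot, so it is usually best ignored.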
prstat(1) is your friend. "prstat -s rss" is a wonderfully simple and
effective way to track physical memory usage on a per-process basis.
Sure, we know all about shared pages, and the fact that the sum of all
processes' RSS sizes will be something much, much larger than physical
memory. But all we're looking for here are processes with increasingly
large RSS, and who the large consumers are. Once you've identified the
process(es), use "pmap -x" to refine your understanding of their memory
usage.
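To spot growing RSS without watching the screen, two snapshots can be
compared. The sketch below swaps in "ps -eo pid,rss,comm" for prstat,
purely because its RSS column is plain KB and easier to parse; the
function name rss_growth and the snapshot file names are hypothetical.

```shell
# rss_growth OLD NEW: compare two "pid rss comm" snapshots (RSS in KB),
# e.g. taken with: ps -eo pid,rss,comm > snapshot-file
# Prints "growth_kb pid command" for processes whose RSS grew,
# largest growth first.
# Note: this needs a POSIX awk; on older Solaris releases that may mean
# nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk (an assumption to check).
rss_growth() {
    awk '
        # First file: remember the RSS of each PID (header line is skipped).
        NR == FNR { if ($1 ~ /^[0-9]+$/) old[$1] = $2 + 0; next }
        # Second file: report PIDs seen before whose RSS is now larger.
        $1 ~ /^[0-9]+$/ && ($1 in old) && ($2 + 0) > old[$1] {
            printf "%d %d %s\n", ($2 + 0) - old[$1], $1, $3
        }
    ' "$1" "$2" | sort -rn
}
```

Take one snapshot, wait a few minutes, take another, and run
"rss_growth before.txt after.txt". Processes that started between the
two snapshots are not reported, so large newcomers still need a look
with prstat.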
On a system of this size (1.5GB of RAM), use mdb's "memstat" dcmd:
    mdb -k
    [output from mdb starting]
    ::memstat
This will give you a memory usage profile.
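Since you can't log in once the machine bogs down, it may be worth
collecting the profile ahead of time. A minimal sketch, piping the dcmd
into mdb non-interactively; the function name memstat_once and the log
path in the note below are placeholders.

```shell
# memstat_once: print a timestamp line, then one ::memstat profile.
# Needs to run as root, since "mdb -k" opens the live kernel.
memstat_once() {
    date '+=== %Y-%m-%d %H:%M:%S ==='
    echo '::memstat' | mdb -k
}
```

Run in a loop, for example
"while :; do memstat_once >> /var/tmp/memstat.log; sleep 60; done",
this leaves a history to read after the event. ::memstat walks every
page, so keep the interval generous.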
In my experience, the symptoms you describe are frequently the result of
the file system cache consuming memory (which, in and of itself, is not
a bad thing). Then a process comes along that needs a bigger chunk than
is available, and the kernel has to get busy managing the shortfall.
With UFS, you'll see the page cache in memstat. With ZFS, you will
not, since ZFS uses its own mechanism for caching data and metadata.
Unfortunately, there isn't an easy way to track ZFS as a memory consumer
(at least not that I'm aware of). The mdb "kmastat" dcmd will show
usage for all the zio pools and zfs caches, but it takes a bit of
parsing to sort it out.
I'm sure a dtrace script could help track ZFS memory consumption, but
I'd need to spend a bit of time working through something like that.
Anyway, before we jump to conclusions, let's start with first identifying
the consumer. If it turns out that kernel memory is growing, we can
chase that down with dtrace and mdb/kmastat. If it's a process, pmap to
determine the segment(s), and dtrace to track allocations.
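For the process case, a rough starting point with the pid provider
might look like the following. It is a sketch only: it assumes the
process grows through malloc() (growth via mmap or brk called directly
would not show up), and the wrapper name trace_malloc is made up.

```shell
# trace_malloc PID: sum the bytes requested from malloc() in the given
# process, keyed by the top 5 user stack frames. Press Ctrl-C to stop
# tracing and print the aggregation.
trace_malloc() {
    dtrace -n 'pid$target::malloc:entry { @bytes[ustack(5)] = sum(arg0); }' -p "$1"
}
```

The stacks with the largest totals show which code paths are asking for
the memory. This counts requests only, not frees, so a big number is a
lead to follow up on rather than proof of a leak.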
HTH,
/jim
Peter Eriksson wrote:
> We're trying to locate a problem on one of our web servers where suddenly everything grinds to a virtual halt (well, not really) due to something forcing a *lot* of paging activity. We suspect that it might be some process that suddenly allocates a lot of memory and accesses it quickly - forcing the rest of the (big) processes out to swap. *Or* something filesystem related (ZFS perhaps?).
>
> One thing that makes it problematic to trace is that when things go slowly/halt
> we can't login to the machine ("fork: resource temporarily unavailable").
>
> Using a dtrace script we've seen that during the periods when things are really slow some processes start paging (and have really long paging response times). (script: http://www.solarisinternals.com/si/dtrace/whospaging.d)
>
> An added complication is that during the times when things fail dtrace also more or less fails to run...
>
> # priocntl -e -c RT dtrace -s ./whospaging.d > paging-RT.log
> dtrace: processing aborted: Abort due to systemic unresponsiveness
>
> It worked better with:
>
> # priocntl -e -c RT dtrace -w -s ./whospaging.d > paging-RT-2.log
>
> but then it wouldn't print anything at all when the interesting things were happening...
>
> (Machine: Sun Ultra 60, 2x360MHz CPUs, 1500MB RAM)
>
> Any suggestions on what to check next?
>