[dtrace-discuss] [networking-discuss] Re: DTrace Network Providers, take 2
Brendan Gregg - Sun Microsystems
brendan at sun.com
Mon Jul 2 18:02:15 PDT 2007
G'Day Darren,
On Fri, Jun 29, 2007 at 05:27:01PM -0700, Darren.Reed at sun.com wrote:
> Brendan Gregg - Sun Microsystems wrote:
>
> >G'Day Darren,
> >
> >On Fri, Jun 22, 2007 at 09:01:26PM -0700, Darren.Reed at Sun.COM wrote:
> >
> >>Brendan Gregg - Sun Microsystems wrote:
> >>
[...]
> >Has anyone noticed a problem already? There are *already* more DTrace
> >probes in TCP/IP than what I'm suggesting to add. For example, the
> >following three commands were run at the same time,
> >
> > # snoop -r
> > Using device nge0 (promiscuous mode)
> > 192.168.1.109 -> 192.168.1.108 TELNET C port=40965 d
> > 192.168.1.108 -> 192.168.1.109 TELNET R port=40965 d
> > 192.168.1.109 -> 192.168.1.108 TELNET C port=40965
> > ^C
> >
>
> As Garrett alluded to, the situation that we're most concerned about
> with performance is not just local delivery but packet forwarding
> where there are various efforts underway to try and reduce the size
> of the code path. I'm sure that the length of the code path to get
> data to/from telnet leans heavily towards delivery into TCP.
Rightio - sounds like testing the forwarding code path performance should
also be a standard test (such as in an automated perf PIT).
[...]
> >Thoughts:
> >
> >Each probe addition adds 1 to 5 nops, and a few movs of recently
> >accessed data which we would expect to still be cached. Given a maximum
> >packet rate per CPU per second of 200,000, the execution overhead of some
> >nops and cache movs per packet should be negligible. The effect on i-cache
> >is harder to estimate, and is probably only best checked through
> >measurements.
> >
> >In summary, the execution time of the probes is negligible when considering
> >the per CPU packet rates; the i-cache effect looks negligble, and from the
> >tests was more affected by code layout than the existance or non-existance
> >of probes.
> >
>
> I'm not really worried about the data cache, just the i-cache and
> the CPU pipeline. Throwing in some NOPs and otherwise redundant
> instructions every now and then doesn't seem like too big of a sin,
> but filling an entire i-cache row or two with NOPs seems...wasteful.
> Depending on when the CPU decides to throw away a NOP, during the
> pipeline decoding of the instruction would by and large determine
> what it means to have 2-3 vs 6-9 of them from 1 vs 3 dtrace probes
> next to each other. This may sound silly, but I'd have less of an
> issue about this if the same probes were further away from each
> other :) However this really is micro-optimisation.
3 DTrace probes next to each other is certainly questionable. I've recently
dropped one set of the IP probes so that it is now 2, and put a webrev
of this with just the send/receive probes at the end of,
http://www.opensolaris.org/os/community/dtrace/NetworkProvider/Prototype2/
I wasn't certain that having both ip::: and ipv4/ipv6::: providers was
the right way to go, and put the webrev and website out to see how it looked.
I've also been writing scripts using either to see how they felt. The ip:::
provider is more in the DTrace mould (compare with the io::: provider), and
so I've dropped the ipv4/ipv6 providers from the recent webrev (ip::: now
has identical functionality).
[...]
> >I'd suggest this strategy.
> >
> >1) For development/troubleshooting probes, use fbt.
> >...
> >
>
> I agree with this.
> But having seen the sdt:::ip-xmit-v4 probe, wouldn't it be
> nice if you could present that as fbt::ip-xmit-v4:entry but
> pass in ire->ire_nce as arg2 instead of ire?
I would have thought that the existing (ire_t *)ire as args[2] was already
ideal, as users could refer to args[2]->ire_nce as needed. It might be
a different story if the kernel wasn't CTF'd and these wern't already
casted.
[...]
> >We can't - DTrace doesn't support aliasing like that.
> >
>
> Can I have this added to the list of new things I'd like to
> see dtrace be able to support, along with the DTRACE_IF? :)
>
> I don't know how this would look, maybe:
>
> DTRACE_DEF(ip__hook__abc, m, ip, stq_ill);
> DTRACE_VIEW1(ip__hook__abc, ip__physical__out__end, mblk_t *);
> DTRACE_VIEW2(ip__hook__abc, ip_send, mblk_t *, void_ip_t *);
> DTRACE_VIEW3(ip__hook__abc, ip4__send, mblk_t *, ipha_t *, ill_t *);
>
> or maybe something more complex so that you could create a hook
> alias that included just the mblk_t and the ill_t but still only
> have one hook in the code path.
>
> The general story is we have a collection of values that we want
> to export out through dtrace and we'd like to have a way to see
> them using different filters (or views.)
>
> If dtrace can handle multiple people using the same probe then
> technically it should be able to cope with multiple people
> using the same probe but with different aliases to present the
> values caught by the dtrace hook in different ways.
Cool idea. I don't think there is a need to do this right now based on
performance testing so far, but it is good to have options to try like
this than not to. And both DTRACE_IF and DTRACE_VIEW are options that
could be added later without changing the exported stable provider
interface.
It would take some moderate work to get this done; and when complete, there
would still be more non-enabled probe overhead in other places - like the
mib probes, that wouldn't benifit from this approach.
> Another thought is can these dtrace probe views be defined with
> dtrace(1m) rather than in the kernel?
>
> This would mean that "dtrace -l" might not provide an exhaustive
> list of all the probes available (if it only consults the kernel.)
>
> So, if you wanted to get the ip-send dtrace probe, maybe in your
> dtrace script you would include some special library that knows
> to use the ip4-physical-out-end (with extra args) and then to
> apply some sort of condition and change the types of the args.
>
> So my dtrace script might be:
>
> #!/usr/sbin/dtrace -Fs
>
> include <provider/ip>
> ip:::ip-send{
> ...
> }
>
> If dtrace could be evolved to do that then we could define base
> probes, such as the ip4-physical-out-start and layer on top of
> that new probes without having to deliver any new kernel modules.
Yes, that would be another line of attack.
...
There might be an easier way to drop the duplicate probe overhead, which
currently looks like,
14223 DTRACE_PROBE4(ip4__physical__out__start,
14224 ill_t *, NULL, ill_t *, stq_ill,
14225 ipha_t *, ipha, mblk_t *, mp);
14226 FW_HOOKS(ipst->ips_ip4_physical_out_event,
14227 ipst->ips_ipv4firewall_physical_out,
14228 NULL, stq_ill, ipha, mp, mpip, ipst);
14229 DTRACE_PROBE1(ip4__physical__out__end, mblk_t *,
14230 mp);
14231 if (mp == NULL)
14232 goto drop;
14233
14234 DTRACE_IP5(send, mblk_t *, mp, void_ip_t *, ipha,
14235 ill_t *, stq_ill, ipha_t *, ipha, ip6_t *, NULL);
Why have sdt:::ip4-physical-out-start and sdt:::ip4-physical-out-end anyway?
FW_HOOKS wraps hook_run() if hooks are enabled, and hook_run() is tracable
at zero cost from fbt.
Those sdt::: probes provide interface info, IP header and raw mblk pointer.
So you could write scripts like,
# ./fwhooks01.d
EVENT *MP IF-IN IF-OUT FROM TO
> PHYSICAL_OUT ffffff6137dde4e0 0 2 192.168.1.109 192.168.1.198
< PHYSICAL_OUT ffffff6137dde4e0 0 2 192.168.1.109 192.168.1.198
> PHYSICAL_OUT fffffffedad37220 0 2 192.168.1.109 192.168.1.108
< PHYSICAL_OUT 0 0 2 192.168.1.109 192.168.1.108
> PHYSICAL_OUT ffffff0e2ec32620 0 2 192.168.1.109 192.168.1.198
< PHYSICAL_OUT ffffff0e2ec32620 0 2 192.168.1.109 192.168.1.198
> PHYSICAL_IN ffffff0e2ec32620 2 0 192.168.1.198 192.168.1.109
< PHYSICAL_IN ffffff0e2ec32620 2 0 192.168.1.198 192.168.1.109
> PHYSICAL_OUT ffffffff9ba996c0 0 2 192.168.1.109 192.168.1.198
< PHYSICAL_OUT ffffffff9ba996c0 0 2 192.168.1.109 192.168.1.198
[...]
(That zero *mp was for a blocked packet.)
The above script was written using fbt at zero cost, not sdt!
root at deimos:/root> cat fwhooks01.d
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option switchrate=10
dtrace:::BEGIN
{
printf(" %-12s %16s %6s %6s %16s %16s\n",
"EVENT", "*MP", "IF-IN", "IF-OUT", "FROM", "TO");
}
fbt::hook_run:entry
{
self->info = (hook_pkt_event_t *)arg1;
self->name = stringof(args[0]->hei_event->he_name);
this->ipha = (ipha_t *)self->info->hpe_hdr;
self->saddr = inet_ntoa(&this->ipha->ipha_src);
self->daddr = inet_ntoa(&this->ipha->ipha_dst);
printf("> %-12s %16x %6d %6d %16s %16s\n", self->name,
(uint64_t)*self->info->hpe_mp, self->info->hpe_ifp,
self->info->hpe_ofp, self->saddr, self->daddr);
}
fbt::hook_run:return
/self->info/
{
printf("< %-12s %16x %6d %6d %16s %16s\n", self->name,
(uint64_t)*self->info->hpe_mp, self->info->hpe_ifp,
self->info->hpe_ofp, self->saddr, self->daddr);
self->info = 0;
self->name = 0;
self->saddr = 0;
self->daddr = 0;
}
Woah, cool huh! :-)
(BTW, that script needs a recent build so that inet_ntoa() exists).
Any specific reason why those sdt probes were added? ... such as a script
that needed to be written? I can check if fbt can be used instead.
I'm not saying that sdt:::ip4-physical-out-start/end *need* to be dropped,
just that it looks like they could be served by zero cost fbt probes, and
so dropped if a performance problem is shown to exist. And that might be
an an easier option than building a new DTrace framework. ;).
cheers,
Brendan
--
Brendan
[CA, USA]
More information about the dtrace-discuss
mailing list