[intel-platform-discuss] Intel Platform optimised software crypto algorithms
Darren J Moffat
Darren.Moffat at Sun.COM
Thu Mar 15 06:48:53 PDT 2007
Since one of the main focus areas of the intel-platform project on
OpenSolaris is performance, I'd like to get some discussion started about
using all available Intel CPU capabilities to get the fastest performing
software crypto implementations we can.
I've cc'd crypto-discuss because that's where the crypto framework people
hang out, and most of us aren't (and probably won't become) regular
members of intel-platform-discuss, so please keep both mailing lists
copied for all discussion on this topic.
What do I want from the Intel Platform project ?
------------------------------------------------
First let me say that you don't need to understand anything about
cryptography to assist here; this is purely about making the maths and
other parts of the cryptographic algorithms go fast. We have an
extensive known-answer and performance test suite that can be used
to ensure things don't break.
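To illustrate what a known-answer test looks like, here is a minimal
sketch in C. It is not taken from our test suite: the names rc4_crypt,
rc4_kat and rc4_kat_failures are made up for this example, the RC4 code
is the plain textbook algorithm rather than the ON implementation, and
the three vectors are the widely published RC4 key/plaintext/ciphertext
triples. The idea is simply that an optimised routine must reproduce
the same bytes as the reference for fixed inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Textbook RC4 (illustrative only): key schedule, then keystream XOR. */
static void
rc4_crypt(const uint8_t *key, size_t keylen,
    const uint8_t *in, uint8_t *out, size_t len)
{
	uint8_t s[256], tmp;
	unsigned i, j = 0;
	size_t n;

	assert(keylen > 0 && keylen <= 256);
	for (i = 0; i < 256; i++)
		s[i] = (uint8_t)i;
	for (i = 0; i < 256; i++) {		/* key scheduling */
		j = (j + s[i] + key[i % keylen]) & 0xff;
		tmp = s[i]; s[i] = s[j]; s[j] = tmp;
	}
	i = j = 0;
	for (n = 0; n < len; n++) {		/* keystream generation */
		i = (i + 1) & 0xff;
		j = (j + s[i]) & 0xff;
		tmp = s[i]; s[i] = s[j]; s[j] = tmp;
		out[n] = in[n] ^ s[(s[i] + s[j]) & 0xff];
	}
}

/* One known-answer vector: fixed key and input, expected output. */
struct rc4_kat {
	const char *key;
	const char *plaintext;
	uint8_t expected[16];
};

static const struct rc4_kat rc4_kats[] = {
	{ "Key", "Plaintext",
	    { 0xBB, 0xF3, 0x16, 0xE8, 0xD9, 0x40, 0xAF, 0x0A, 0xD3 } },
	{ "Wiki", "pedia",
	    { 0x10, 0x21, 0xBF, 0x04, 0x20 } },
	{ "Secret", "Attack at dawn",
	    { 0x45, 0xA0, 0x1F, 0x64, 0x5F, 0xC3, 0x5B, 0x38,
	      0x35, 0x52, 0x54, 0x4B, 0x9B, 0xF5 } },
};

/* Run every vector; return how many did not match. */
static int
rc4_kat_failures(void)
{
	int failures = 0;
	size_t t;

	for (t = 0; t < sizeof (rc4_kats) / sizeof (rc4_kats[0]); t++) {
		const struct rc4_kat *k = &rc4_kats[t];
		size_t len = strlen(k->plaintext);
		uint8_t out[16];

		rc4_crypt((const uint8_t *)k->key, strlen(k->key),
		    (const uint8_t *)k->plaintext, out, len);
		if (memcmp(out, k->expected, len) != 0)
			failures++;
	}
	return (failures);
}
```

An optimised variant would be dropped in behind rc4_crypt and is
acceptable only while rc4_kat_failures() keeps returning 0.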
What we in the crypto team need is people with experience of using the
Intel processors' extended instructions (e.g. SSE) to help improve the
performance of the software implementations of the algorithms.
We have support for CPU- and platform-specific variants (via the runtime
linker's $HWCAP and $PLATFORM support) in both user and kernel versions.
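For anyone who hasn't worked with per-CPU variants before, the general
idea can be sketched in C as below. Note this shows a different
technique from the $HWCAP mechanism: the runtime linker picks a whole
library variant at load time, whereas this sketch picks a function
pointer inside one program. Everything here is hypothetical: the
CAP_WIDE bit, the select_xor name and the two XOR routines are invented
for illustration, and the "wide" routine is portable C standing in for
what would really be an SSE implementation. How the capability bits are
obtained is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	CAP_WIDE	0x1u	/* hypothetical capability bit */

typedef void (*xor_fn_t)(uint8_t *dst, const uint8_t *src, size_t len);

/* Baseline variant: one byte at a time, works everywhere. */
static void
xor_bytes_generic(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Stand-in for an optimised variant: eight bytes per step. */
static void
xor_bytes_wide(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t i = 0;

	for (; i + 8 <= len; i += 8) {
		uint64_t a, b;

		memcpy(&a, dst + i, 8);	/* memcpy avoids alignment traps */
		memcpy(&b, src + i, 8);
		a ^= b;
		memcpy(dst + i, &a, 8);
	}
	for (; i < len; i++)		/* leftover tail bytes */
		dst[i] ^= src[i];
}

/* Pick the best variant the reported capabilities allow. */
static xor_fn_t
select_xor(uint32_t caps)
{
	assert((caps & ~CAP_WIDE) == 0);	/* only known bits */
	return ((caps & CAP_WIDE) ? xor_bytes_wide : xor_bytes_generic);
}
```

Whichever selection mechanism is used, every variant has to produce
identical output, which is what lets the same test suite cover all of
them.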
Where is the focus needed ?
---------------------------
While we could just say we need to make all the algorithms go fast, there
is actually a short list of important ones. These are the ones covered
by FIPS 140-2 and the ones used in the SPECWeb benchmarks (and they are
used in SPECWeb because that is what the majority of browsers/webservers
end up using in real life). That list is the following:
Symmetric Crypto: RC4 (aka ArcFour), AES
Asymmetric Crypto: RSA
Digest/HMAC: MD5, SHA1, HMAC-MD5, HMAC-SHA1
The other algorithms I believe we should focus on, since no optimisation
work at all has been done on the implementations used in Solaris for any
platform, are the SHA256, SHA384 and SHA512 set, which will be getting
much more use in crypto protocols in the future; SHA256 is also used in
ZFS.
Where is the source code ?
--------------------------
All the algorithms listed above are built for userland and kernel from
the same algorithm source code but with slightly different "plumbing".
It is the raw algorithms that need to be focused on. The source for all
of these can be found in the OpenSolaris ON consolidation here:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/crypto/
For RSA most (probably all) of the optimisation work would actually be
done in the BIGNUM code base, linked below. Some work has already been
done that uses SSE for Montgomery multiplication; see the README in
the bignum source directory:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/bignum
Specifically see this example:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/bignum/i386/bignum_i386_asm.s
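For those who haven't met Montgomery multiplication, the sketch below
shows the arithmetic on a single 32-bit word in portable C. It is not
the bignum code and uses no SSE; the function names are invented for
this example, and it assumes an odd modulus below 2^31 so that the
intermediate sum fits in 64 bits. The real code applies the same
reduction word by word across multi-precision numbers, and that inner
loop is where the optimisation effort goes.

```c
#include <assert.h>
#include <stdint.h>

/* Return -n^{-1} mod 2^32 for odd n, by Newton iteration. */
static uint32_t
mont_neg_inv(uint32_t n)
{
	uint32_t inv = n;	/* n*n == 1 (mod 8), so 3 bits correct */
	int i;

	for (i = 0; i < 4; i++)
		inv *= 2u - n * inv;	/* doubles the correct bits */
	return (0u - inv);
}

/* Montgomery reduction: return t / 2^32 mod n, for t < n * 2^32. */
static uint32_t
mont_redc(uint64_t t, uint32_t n, uint32_t n_neg_inv)
{
	uint32_t m = (uint32_t)t * n_neg_inv;
	uint64_t u = (t + (uint64_t)m * n) >> 32;  /* low word is zero */

	return ((uint32_t)(u >= n ? u - n : u));
}

/* Convert a (< n) into Montgomery form: a * 2^32 mod n. */
static uint32_t
mont_to(uint32_t a, uint32_t n)
{
	return ((uint32_t)(((uint64_t)a << 32) % n));
}

/* Multiply two values that are already in Montgomery form. */
static uint32_t
mont_mul(uint32_t a, uint32_t b, uint32_t n, uint32_t n_neg_inv)
{
	return (mont_redc((uint64_t)a * b, n, n_neg_inv));
}

/* base^exp mod n by square-and-multiply, with no division in the loop. */
static uint32_t
mont_modexp(uint32_t base, uint32_t exp, uint32_t n)
{
	uint32_t ninv, x, acc;

	assert((n & 1) && n > 1 && n < 0x80000000u);
	ninv = mont_neg_inv(n);
	x = mont_to(base % n, n);
	acc = mont_to(1, n);
	while (exp != 0) {
		if (exp & 1)
			acc = mont_mul(acc, x, n, ninv);
		x = mont_mul(x, x, n, ninv);
		exp >>= 1;
	}
	return (mont_redc(acc, n, ninv));	/* leave Montgomery form */
}
```

As a quick check, mont_modexp(4, 13, 497) returns 445, matching
4^13 mod 497 computed directly.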
What is acceptable ?
--------------------
C or asm code is fine. As long as it builds with the ON tool set and
passes the test suite we will be very grateful. Hand-optimised output
from an assembler is perfectly acceptable, as is asm code written from
scratch.
--
Darren J Moffat