[PATCH] optimize hweight64 for x86_64
Based on patch from David Rientjes <rientjes@google.com>, but
changed by AK.
Optimizes the 64-bit hamming weight for x86_64 processors assuming they
have fast multiplication. Uses five fewer bitops than the generic
hweight64. Benchmark on one EMT64 showed ~25% speedup with 2^24
consecutive calls.
Define a new ARCH_HAS_FAST_MULTIPLIER that can be set by other
architectures that can also multiply fast.
Signed-off-by: Andi Kleen <ak@suse.de>
diff --git a/include/asm-x86_64/bitops.h b/include/asm-x86_64/bitops.h
index f7ba57b..5b535ea 100644
--- a/include/asm-x86_64/bitops.h
+++ b/include/asm-x86_64/bitops.h
@@ -399,6 +399,8 @@
return r+1;
}
+#define ARCH_HAS_FAST_MULTIPLIER 1
+
#include <asm-generic/bitops/hweight.h>
#endif /* __KERNEL__ */