lockref: Limit number of cmpxchg loop retries

The lockref cmpxchg loop is unbound as long as the spinlock is not
taken. Depending on the hardware implementation of compare-and-swap
a high number of loop retries might happen.

Add an upper bound to the loop to force the fallback to spinlocks
after some time. A retry value of 100 should not impact any hardware
that does not have this issue.

With the retry limit the performance of an open-close testcase
improved between 60-70% on ThunderX2.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jan Glauber <jglauber@marvell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/lib/lockref.c b/lib/lockref.c
index 3d468b5..5b34bbd 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -9,6 +9,7 @@
  * failure case.
  */
 #define CMPXCHG_LOOP(CODE, SUCCESS) do {					\
+	int retry = 100;							\
 	struct lockref old;							\
 	BUILD_BUG_ON(sizeof(old) != 8);						\
 	old.lock_count = READ_ONCE(lockref->lock_count);			\
@@ -21,6 +22,8 @@
 		if (likely(old.lock_count == prev.lock_count)) {		\
 			SUCCESS;						\
 		}								\
+		if (!--retry)							\
+			break;							\
 		cpu_relax();							\
 	}									\
 } while (0)