Use neon_m128_reduce_add_epi32 for NEON vector reduction

This performs the entire horizontal addition in a single NEON instruction,
replacing the manual sum of the four vector lanes.

closes https://github.com/official-stockfish/Stockfish/pull/5885

No functional change
FauziAkram
2025-02-14 03:07:39 +03:00
committed by Disservin
parent ee7259e48b
commit 095d19afea

@@ -102,7 +102,7 @@ static void affine_transform_non_ssse3(std::int32_t* output,
product = vmlal_s8(product, inputVector[j * 2 + 1], row[j * 2 + 1]);
sum = vpadalq_s16(sum, product);
}
-output[i] = sum[0] + sum[1] + sum[2] + sum[3];
+output[i] = Simd::neon_m128_reduce_add_epi32(sum);
#endif
}
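
For readers unfamiliar with horizontal reduction, the following is a minimal sketch of what a helper of this kind could look like. It is not taken from the Stockfish sources: the names `reduce_add_epi32_sketch`, `reduce_add_scalar` and `reduce_add4` are hypothetical, and the sketch assumes that the single instruction referred to in the commit message is the AArch64 across-vector add exposed by the `vaddvq_s32` intrinsic. On targets without NEON it falls back to the scalar sum so it compiles anywhere.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>

// Hypothetical stand-in for the helper named in the commit (an assumption,
// not Stockfish's actual implementation).
static inline std::int32_t reduce_add_epi32_sketch(int32x4_t v) {
#if defined(__aarch64__)
    // AArch64: one across-vector add sums all four 32-bit lanes.
    return vaddvq_s32(v);
#else
    // 32-bit ARM has no across-vector add: add the two halves, then add
    // the remaining pair of lanes.
    int32x2_t half = vadd_s32(vget_low_s32(v), vget_high_s32(v));
    half           = vpadd_s32(half, half);
    return vget_lane_s32(half, 0);
#endif
}
#endif

// Scalar reference, equivalent to the removed line
// `sum[0] + sum[1] + sum[2] + sum[3]`.
static inline std::int32_t reduce_add_scalar(const std::int32_t lanes[4]) {
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Portable wrapper: uses the NEON path when available, the scalar path
// otherwise.
static inline std::int32_t reduce_add4(const std::int32_t lanes[4]) {
#if defined(__ARM_NEON)
    return reduce_add_epi32_sketch(vld1q_s32(lanes));
#else
    return reduce_add_scalar(lanes);
#endif
}

// Small self-check: both paths must agree on the same input.
static inline void reduce_add_self_check() {
    const std::int32_t lanes[4] = {1, -2, 3, 40};
    assert(reduce_add4(lanes) == reduce_add_scalar(lanes));
}
```

Both paths return the same value for any input whose sum fits in 32 bits, which is consistent with the commit's "No functional change" note.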