Replicate network weights only to used NUMA nodes

On a system with multiple NUMA nodes, this patch avoids unneeded replicated
(e.g. 8x for a single threaded run), reducting memory use in that case.

Lazy initialization forced before search.

Passed STC:
https://tests.stockfishchess.org/tests/view/66a28c524ff211be9d4ecdd4
LLR: 2.96 (-2.94,2.94) <-1.75,0.25>
Total: 691776 W: 179429 L: 179927 D: 332420
Ptnml(0-2): 2573, 79370, 182547, 78778, 2620

closes https://github.com/official-stockfish/Stockfish/pull/5515

No functional change
This commit is contained in:
Tomasz Sobczyk
2024-07-25 14:37:08 +02:00
committed by Joost VandeVondele
parent 2343f71f3f
commit 8e560c4fd3
7 changed files with 152 additions and 16 deletions

View File

@@ -83,6 +83,8 @@ class Thread {
void clear_worker();
void run_custom_job(std::function<void()> f);
void ensure_network_replicated();
// Thread has been slightly altered to allow running custom jobs, so
// this name is no longer correct. However, this class (and ThreadPool)
// require further work to make them properly generic while maintaining
@@ -146,6 +148,8 @@ class ThreadPool {
std::vector<size_t> get_bound_thread_count_by_numa_node() const;
void ensure_network_replicated();
std::atomic_bool stop, abortedSearch, increaseDepth;
auto cbegin() const noexcept { return threads.cbegin(); }