Andrew Scull | 5e1ddfa | 2018-08-14 10:06:54 +0100 | [diff] [blame] | 1 | //===- Threads.h ------------------------------------------------*- C++ -*-===// |
| 2 | // |
| 3 | // The LLVM Linker |
| 4 | // |
| 5 | // This file is distributed under the University of Illinois Open Source |
| 6 | // License. See LICENSE.TXT for details. |
| 7 | // |
| 8 | //===----------------------------------------------------------------------===// |
| 9 | // |
| 10 | // LLD supports threads to distribute workloads to multiple cores. Using |
| 11 | // multicore is most effective when more than one core are idle. At the |
| 12 | // last step of a build, it is often the case that a linker is the only |
| 13 | // active process on a computer. So, we are naturally interested in using |
| 14 | // threads wisely to reduce latency to deliver results to users. |
| 15 | // |
| 16 | // That said, we don't want to do "too clever" things using threads. |
| 17 | // Complex multi-threaded algorithms are sometimes extremely hard to |
| 18 | // reason about and can easily mess up the entire design. |
| 19 | // |
| 20 | // Fortunately, when a linker links large programs (when the link time is |
| 21 | // most critical), it spends most of the time to work on massive number of |
| 22 | // small pieces of data of the same kind, and there are opportunities for |
| 23 | // large parallelism there. Here are examples: |
| 24 | // |
| 25 | // - We have hundreds of thousands of input sections that need to be |
| 26 | // copied to a result file at the last step of link. Once we fix a file |
| 27 | // layout, each section can be copied to its destination and its |
| 28 | // relocations can be applied independently. |
| 29 | // |
| 30 | // - We have tens of millions of small strings when constructing a |
| 31 | // mergeable string section. |
| 32 | // |
| 33 | // For the cases such as the former, we can just use parallelForEach |
| 34 | // instead of std::for_each (or a plain for loop). Because tasks are |
| 35 | // completely independent from each other, we can run them in parallel |
| 36 | // without any coordination between them. That's very easy to understand |
| 37 | // and reason about. |
| 38 | // |
| 39 | // For the cases such as the latter, we can use parallel algorithms to |
| 40 | // deal with massive data. We have to write code for a tailored algorithm |
| 41 | // for each problem, but the complexity of multi-threading is isolated in |
| 42 | // a single pass and doesn't affect the linker's overall design. |
| 43 | // |
| 44 | // The above approach seems to be working fairly well. As an example, when |
| 45 | // linking Chromium (output size 1.6 GB), using 4 cores reduces latency to |
| 46 | // 75% compared to single core (from 12.66 seconds to 9.55 seconds) on my |
| 47 | // Ivy Bridge Xeon 2.8 GHz machine. Using 40 cores reduces it to 63% (from |
| 48 | // 12.66 seconds to 7.95 seconds). Because of the Amdahl's law, the |
| 49 | // speedup is not linear, but as you add more cores, it gets faster. |
| 50 | // |
| 51 | // On a final note, if you are trying to optimize, keep the axiom "don't |
| 52 | // guess, measure!" in mind. Some important passes of the linker are not |
| 53 | // that slow. For example, resolving all symbols is not a very heavy pass, |
| 54 | // although it would be very hard to parallelize it. You want to first |
| 55 | // identify a slow pass and then optimize it. |
| 56 | // |
| 57 | //===----------------------------------------------------------------------===// |
| 58 | |
| 59 | #ifndef LLD_COMMON_THREADS_H |
| 60 | #define LLD_COMMON_THREADS_H |
| 61 | |
| 62 | #include "llvm/Support/Parallel.h" |
| 63 | #include <functional> |
| 64 | |
| 65 | namespace lld { |
| 66 | |
| 67 | extern bool ThreadsEnabled; |
| 68 | |
| 69 | template <typename R, class FuncTy> void parallelForEach(R &&Range, FuncTy Fn) { |
| 70 | if (ThreadsEnabled) |
| 71 | for_each(llvm::parallel::par, std::begin(Range), std::end(Range), Fn); |
| 72 | else |
| 73 | for_each(llvm::parallel::seq, std::begin(Range), std::end(Range), Fn); |
| 74 | } |
| 75 | |
| 76 | inline void parallelForEachN(size_t Begin, size_t End, |
| 77 | std::function<void(size_t)> Fn) { |
| 78 | if (ThreadsEnabled) |
| 79 | for_each_n(llvm::parallel::par, Begin, End, Fn); |
| 80 | else |
| 81 | for_each_n(llvm::parallel::seq, Begin, End, Fn); |
| 82 | } |
| 83 | |
| 84 | } // namespace lld |
| 85 | |
| 86 | #endif |