lava_wait_jobs.py: Disable inefficient top-level retries

These retries cause the whole processing to restart from the beginning on the slightest error. E.g., if there were 500 test jobs and there was an error fetching the result for the last one, all 500 would be resubmitted, waited for, and fetched again.

This is a highly inefficient way to handle retries. It puts LAVA under heavy load for a prolonged amount of time, which affects the TrustedFirmware project in general (i.e. not only TF-M, but also TF-A and maybe other sub-projects).

Instead, retries should happen at the lowest reasonable level, e.g. at the level of a particular request or a particular job. A good example of doing it right is the recent change 81ff0ad8cde35276859b.

So, disable the inefficient retries. It is better to get an immediate exception and then gradually add retries at the right spots.

Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
Change-Id: Ib35568d2bbf17ec6afb56557fb57201aa9166c5f

commit: de25e1ffd230cfc4da7629c72deb1075b93043dd [log] [tgz]
author: Paul Sokolovsky <paul.sokolovsky@linaro.org> Mon Jan 02 14:29:21 2023 +0300
committer: Paul Sokolovsky <paul.sokolovsky@linaro.org> Mon Jan 02 14:41:27 2023 +0300
tree: ace35460bf7260abc44bde126b5900acfb855827
parent: 119e4df58918245ffcbb476d93c9c65f899653a4 [diff]
diff --git a/lava_helper/lava_wait_jobs.py b/lava_helper/lava_wait_jobs.py
index be69a29..2f4268f 100755
--- a/lava_helper/lava_wait_jobs.py
+++ b/lava_helper/lava_wait_jobs.py

@@ -186,14 +186,20 @@
     if not silent:
         print("INFO: {}".format(line))
 
+
+# WARNING: Setting this to >1 is a last resort, temporary stop-gap measure,
+# which will overload LAVA and jeopardize stability of the entire TF CI.
+INEFFICIENT_RETRIES = 1
+
+
 def main(user_args):
     """ Main logic """
-    for try_time in range(3):
+    for try_time in range(INEFFICIENT_RETRIES):
         try:
             finished_jobs = wait_for_jobs(user_args)
             break
         except Exception as e:
-            if try_time < 2:
+            if try_time < INEFFICIENT_RETRIES - 1:
                 _log.exception("Exception in wait_for_jobs")
                 _log.info("Will try to get LAVA jobs again, this was try: %d", try_time)
             else:
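
For illustration only, here is a minimal sketch of what "retries at the lowest reasonable level" could look like. It is not code from this repository or from change 81ff0ad8cde35276859b. The names `retry_call`, `fetch_all_results` and `fetch_one` are hypothetical, and treating `OSError` as the transient error type is an assumption. The point is that a failure on one job retries only that job's request, so the other jobs are never resubmitted or refetched.

```python
import time


def retry_call(func, *args, attempts=3, delay=1.0, retry_on=(OSError,),
               sleep=time.sleep, **kwargs):
    """Call func, retrying only this single call on the listed exceptions.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay * attempt)  # linear backoff before the next attempt


def fetch_all_results(job_ids, fetch_one, **retry_opts):
    """Fetch each job's result, retrying per job instead of per batch.

    fetch_one is a caller-supplied function (an assumption here) that
    performs one request for one job ID.
    """
    return {job_id: retry_call(fetch_one, job_id, **retry_opts)
            for job_id in job_ids}
```

With this shape, an error on the last of 500 jobs costs at most a few extra requests for that one job, and an error that persists past the last attempt still propagates as an immediate exception.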