{2023.06}[foss/2021b] TensorFlow 2.8.4 #343

laraPPr
Collaborator

@laraPPr laraPPr commented Sep 27, 2023

14 out of 69 required modules missing:

  • Zip/3.0-GCCcore-11.2.0 (Zip-3.0-GCCcore-11.2.0.eb)
  • protobuf/3.17.3-GCCcore-11.2.0 (protobuf-3.17.3-GCCcore-11.2.0.eb)
  • dill/0.3.4-GCCcore-11.2.0 (dill-0.3.4-GCCcore-11.2.0.eb)
  • pkgconfig/1.5.5-GCCcore-11.2.0-python (pkgconfig-1.5.5-GCCcore-11.2.0-python.eb)
  • Bazel/4.2.2-GCCcore-11.2.0 (Bazel-4.2.2-GCCcore-11.2.0.eb)
  • giflib/5.2.1-GCCcore-11.2.0 (giflib-5.2.1-GCCcore-11.2.0.eb)
  • flatbuffers/2.0.0-GCCcore-11.2.0 (flatbuffers-2.0.0-GCCcore-11.2.0.eb)
  • JsonCpp/1.9.4-GCCcore-11.2.0 (JsonCpp-1.9.4-GCCcore-11.2.0.eb)
  • LMDB/0.9.29-GCCcore-11.2.0 (LMDB-0.9.29-GCCcore-11.2.0.eb)
  • nsync/1.24.0-GCCcore-11.2.0 (nsync-1.24.0-GCCcore-11.2.0.eb)
  • h5py/3.6.0-foss-2021b (h5py-3.6.0-foss-2021b.eb)
  • protobuf-python/3.17.3-GCCcore-11.2.0 (protobuf-python-3.17.3-GCCcore-11.2.0.eb)
  • flatbuffers-python/2.0-GCCcore-11.2.0 (flatbuffers-python-2.0-GCCcore-11.2.0.eb)
  • TensorFlow/2.8.4-foss-2021b (TensorFlow-2.8.4-foss-2021b.eb)
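
The report above pairs each missing module with the easyconfig that would provide it. As a rough sketch (not part of the EESSI tooling), such a listing could be turned into structured data like this; the function name `parse_missing_modules` and the accepted bullet characters are assumptions made for illustration.

```python
import re

# Matches lines such as "• Zip/3.0-GCCcore-11.2.0 (Zip-3.0-GCCcore-11.2.0.eb)".
# Accepting both "•" and "*" as the bullet is an assumption for this sketch.
MISSING_LINE = re.compile(
    r"^\s*[•*]\s+(?P<module>\S+)\s+\((?P<easyconfig>\S+\.eb)\)\s*$"
)


def parse_missing_modules(report: str) -> list[tuple[str, str]]:
    """Return (module, easyconfig) pairs found in a missing-modules report."""
    pairs = []
    for line in report.splitlines():
        match = MISSING_LINE.match(line)
        if match:
            pairs.append((match["module"], match["easyconfig"]))
    return pairs


sample = """
14 out of 69 required modules missing:

  • Zip/3.0-GCCcore-11.2.0 (Zip-3.0-GCCcore-11.2.0.eb)
  • TensorFlow/2.8.4-foss-2021b (TensorFlow-2.8.4-foss-2021b.eb)
"""
missing = parse_missing_modules(sample)
for module, easyconfig in missing:
    print(f"{module} <- {easyconfig}")
```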

@eessi-bot

eessi-bot bot commented Sep 27, 2023

Instance eessi-bot-citc-aws is configured to build:

  • arch x86_64/generic for repo eessi-2021.12
  • arch x86_64/generic for repo eessi.org-2023.06-compat
  • arch x86_64/generic for repo eessi.io-2023.06-compat
  • arch x86_64/generic for repo eessi-2023.06-software
  • arch x86_64/intel/haswell for repo eessi-2021.12
  • arch x86_64/intel/haswell for repo eessi.org-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi-2021.12
  • arch x86_64/intel/skylake_avx512 for repo eessi.org-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi-2021.12
  • arch x86_64/amd/zen2 for repo eessi.org-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi-2021.12
  • arch x86_64/amd/zen3 for repo eessi.org-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi-2023.06-software
  • arch aarch64/generic for repo eessi-2021.12
  • arch aarch64/generic for repo eessi.org-2023.06-compat
  • arch aarch64/generic for repo eessi.io-2023.06-compat
  • arch aarch64/generic for repo eessi-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi-2021.12
  • arch aarch64/neoverse_n1 for repo eessi.org-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi-2021.12
  • arch aarch64/neoverse_v1 for repo eessi.org-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi-2023.06-software
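
The list above is every combination of eight CPU targets and four repositories. A small sketch that reproduces it as a Cartesian product follows; the constant and function names are made up for illustration and say nothing about how the bot stores its configuration.

```python
from itertools import product

# Values copied from the bot's comment; the variable names are hypothetical.
ARCHITECTURES = [
    "x86_64/generic",
    "x86_64/intel/haswell",
    "x86_64/intel/skylake_avx512",
    "x86_64/amd/zen2",
    "x86_64/amd/zen3",
    "aarch64/generic",
    "aarch64/neoverse_n1",
    "aarch64/neoverse_v1",
]
REPOSITORIES = [
    "eessi-2021.12",
    "eessi.org-2023.06-compat",
    "eessi.io-2023.06-compat",
    "eessi-2023.06-software",
]


def build_targets() -> list[tuple[str, str]]:
    """Return every (architecture, repository) pair, architecture-major."""
    return list(product(ARCHITECTURES, REPOSITORIES))


targets = build_targets()
for arch, repo in targets[:4]:
    print(f"arch {arch} for repo {repo}")
print(len(targets), "build targets in total")
```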

@laraPPr
Collaborator Author

laraPPr commented Sep 27, 2023

bot: build repo:eessi-2023.06-software arch:x86_64/generic

@eessi-bot

eessi-bot bot commented Sep 27, 2023

Updates by the bot instance eessi-bot-citc-aws (click for details)
  • received bot command build repo:eessi-2023.06-software arch:x86_64/generic from laraPPr

    • expanded format: build repository:eessi-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi-2023.06-software architecture:x86_64/generic resulted in:
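
The bot reports that it expands the short keys `repo:` and `arch:` into `repository:` and `architecture:`. A minimal sketch of that expansion, assuming a simple whitespace-separated `key:value` syntax, is shown below; the real bot's parser may differ, and `expand_bot_command` is a hypothetical name.

```python
from typing import Optional

# Short-to-long key mapping as seen in the bot's "expanded format" message.
SHORT_TO_LONG = {"repo": "repository", "arch": "architecture"}


def expand_bot_command(comment_line: str) -> Optional[str]:
    """Expand a 'bot: <action> key:value ...' line, or return None if it is not a command."""
    prefix = "bot:"
    if not comment_line.startswith(prefix):
        return None
    words = comment_line[len(prefix):].split()
    if not words:
        return None
    action, args = words[0], words[1:]
    expanded = [action]
    for arg in args:
        key, sep, value = arg.partition(":")
        if sep:
            # Replace a known short key; unknown keys pass through unchanged.
            expanded.append(f"{SHORT_TO_LONG.get(key, key)}:{value}")
        else:
            expanded.append(arg)
    return " ".join(expanded)


print(expand_bot_command("bot: build repo:eessi-2023.06-software arch:x86_64/generic"))
```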

@eessi-bot

eessi-bot bot commented Sep 27, 2023

New job on instance eessi-bot-citc-aws for architecture x86_64-generic for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.09/pr_343/7586

date | job status | comment
Sep 27 07:50:07 UTC 2023 | submitted | job id 7586 awaits release by job manager
Sep 27 07:50:33 UTC 2023 | released | job awaits launch by Slurm scheduler
Sep 27 09:31:32 UTC 2023 | finished
🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job7586.result does not exist in job directory or reading it failed.
  • No artefacts were found/reported.

@boegel
Contributor

boegel commented Sep 27, 2023

I've just cancelled job 7586; we should retry with the hook included in #321.

@boegel changed the title from "add tensoflow 4.8.1" to "{2023.06}[foss/2021b] TensorFlow 4.8.1" on Sep 27, 2023
@laraPPr
Collaborator Author

laraPPr commented Sep 27, 2023

bot: build repo:eessi-2023.06-software arch:aarch64/neoverse_v1

@eessi-bot

eessi-bot bot commented Sep 27, 2023

Updates by the bot instance eessi-bot-citc-aws (click for details)
  • received bot command build repo:eessi-2023.06-software arch:aarch64/neoverse_v1 from laraPPr

    • expanded format: build repository:eessi-2023.06-software architecture:aarch64/neoverse_v1
  • handling command build repository:eessi-2023.06-software architecture:aarch64/neoverse_v1 resulted in:

@eessi-bot

eessi-bot bot commented Sep 27, 2023

New job on instance eessi-bot-citc-aws for architecture aarch64-neoverse_v1 for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.09/pr_343/7587

date | job status | comment
Sep 27 13:50:23 UTC 2023 | submitted | job id 7587 awaits release by job manager
Sep 27 13:51:15 UTC 2023 | released | job awaits launch by Slurm scheduler
Sep 27 13:55:17 UTC 2023 | running | job 7587 is running
Sep 27 14:28:56 UTC 2023 | finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-7587.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-1695824818.tar.gz
size: 50 MiB (52871105 bytes)
entries: 921
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
Bazel/4.2.2-GCCcore-11.2.0.lua
dill/0.3.4-GCCcore-11.2.0.lua
flatbuffers/2.0.0-GCCcore-11.2.0.lua
flatbuffers-python/2.0-GCCcore-11.2.0.lua
giflib/5.2.1-GCCcore-11.2.0.lua
h5py/3.6.0-foss-2021b.lua
JsonCpp/1.9.4-GCCcore-11.2.0.lua
LMDB/0.9.29-GCCcore-11.2.0.lua
nsync/1.24.0-GCCcore-11.2.0.lua
pkgconfig/1.5.5-GCCcore-11.2.0-python.lua
protobuf/3.17.3-GCCcore-11.2.0.lua
protobuf-python/3.17.3-GCCcore-11.2.0.lua
Zip/3.0-GCCcore-11.2.0.lua
software under 2023.06/software/linux/aarch64/neoverse_v1/software
Bazel/4.2.2-GCCcore-11.2.0
dill/0.3.4-GCCcore-11.2.0
flatbuffers/2.0.0-GCCcore-11.2.0
flatbuffers-python/2.0-GCCcore-11.2.0
giflib/5.2.1-GCCcore-11.2.0
h5py/3.6.0-foss-2021b
JsonCpp/1.9.4-GCCcore-11.2.0
LMDB/0.9.29-GCCcore-11.2.0
nsync/1.24.0-GCCcore-11.2.0
pkgconfig/1.5.5-GCCcore-11.2.0-python
protobuf/3.17.3-GCCcore-11.2.0
protobuf-python/3.17.3-GCCcore-11.2.0
Zip/3.0-GCCcore-11.2.0
other under 2023.06/software/linux/aarch64/neoverse_v1
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
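
The Details list above suggests the bot scans the Slurm output for a few fixed strings: three that must be absent and two that must be present. A hedged sketch of such a check is given below. Plain substring matching and the rule "all checks must pass for SUCCESS" are assumptions for illustration; the real bot may use regular expressions and additional conditions (for example, whether the output file exists at all).

```python
# Strings taken from the bot's report; how the bot actually matches them is assumed.
ERROR_PATTERNS = ["ERROR:", "FAILED:", "required modules missing:"]
SUCCESS_PATTERNS = ["No missing installations", ".tar.gz created!"]


def check_job_output(text: str) -> tuple[str, dict[str, bool]]:
    """Return an overall status and a per-pattern pass/fail report."""
    report = {}
    for pattern in ERROR_PATTERNS:
        report[pattern] = pattern not in text  # passes when the message is absent
    for pattern in SUCCESS_PATTERNS:
        report[pattern] = pattern in text  # passes when the message is present
    status = "SUCCESS" if all(report.values()) else "FAILURE"
    return status, report


sample_log = (
    "== building TensorFlow...\n"
    "ERROR: Build step failed\n"
    "No missing installations\n"
    "/tmp/eessi-example.tar.gz created!\n"
)
status, report = check_job_output(sample_log)
print(status)
for pattern, passed in report.items():
    print("pass" if passed else "fail", pattern)
```

Under these assumptions, a log can contain both the success messages and an error message, as in the job above, and still be classified as a failure.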

@boegel
Contributor

boegel commented Sep 28, 2023

Hmm, looks like there's little hope with TensorFlow v2.8.4 on aarch64 as well?

external/XNNPACK/src/qu8-igemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S: Assembler messages:
external/XNNPACK/src/qu8-igemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S:123: Error: selected processor does not support `udot v12.4s,v8.16b,v0.16b'

Let's see if x86_64 works at least...

bot: build repo:eessi-2023.06-software arch:x86_64/generic
bot: build repo:eessi-2023.06-software arch:x86_64/amd/zen2

@eessi-bot

eessi-bot bot commented Sep 28, 2023

Updates by the bot instance eessi-bot-citc-aws (click for details)
  • received bot command build repo:eessi-2023.06-software arch:x86_64/generic from boegel

    • expanded format: build repository:eessi-2023.06-software architecture:x86_64/generic
  • received bot command build repo:eessi-2023.06-software arch:x86_64/amd/zen2 from boegel

    • expanded format: build repository:eessi-2023.06-software architecture:x86_64/amd/zen2
  • handling command build repository:eessi-2023.06-software architecture:x86_64/generic resulted in:

  • handling command build repository:eessi-2023.06-software architecture:x86_64/amd/zen2 resulted in:

@eessi-bot

eessi-bot bot commented Sep 28, 2023

New job on instance eessi-bot-citc-aws for architecture x86_64-generic for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.09/pr_343/7588

date | job status | comment
Sep 28 07:34:49 UTC 2023 | submitted | job id 7588 awaits release by job manager
Sep 28 07:35:28 UTC 2023 | released | job awaits launch by Slurm scheduler
Sep 28 07:39:32 UTC 2023 | running | job 7588 is running
Sep 28 11:17:20 UTC 2023 | finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-7588.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1695899696.tar.gz
size: 303 MiB (318489048 bytes)
entries: 15541
modules under 2023.06/software/linux/x86_64/generic/modules/all
Bazel/4.2.2-GCCcore-11.2.0.lua
dill/0.3.4-GCCcore-11.2.0.lua
flatbuffers/2.0.0-GCCcore-11.2.0.lua
flatbuffers-python/2.0-GCCcore-11.2.0.lua
giflib/5.2.1-GCCcore-11.2.0.lua
h5py/3.6.0-foss-2021b.lua
JsonCpp/1.9.4-GCCcore-11.2.0.lua
LMDB/0.9.29-GCCcore-11.2.0.lua
nsync/1.24.0-GCCcore-11.2.0.lua
pkgconfig/1.5.5-GCCcore-11.2.0-python.lua
protobuf/3.17.3-GCCcore-11.2.0.lua
protobuf-python/3.17.3-GCCcore-11.2.0.lua
TensorFlow/2.8.4-foss-2021b.lua
Zip/3.0-GCCcore-11.2.0.lua
software under 2023.06/software/linux/x86_64/generic/software
Bazel/4.2.2-GCCcore-11.2.0
dill/0.3.4-GCCcore-11.2.0
flatbuffers/2.0.0-GCCcore-11.2.0
flatbuffers-python/2.0-GCCcore-11.2.0
giflib/5.2.1-GCCcore-11.2.0
h5py/3.6.0-foss-2021b
JsonCpp/1.9.4-GCCcore-11.2.0
LMDB/0.9.29-GCCcore-11.2.0
nsync/1.24.0-GCCcore-11.2.0
pkgconfig/1.5.5-GCCcore-11.2.0-python
protobuf/3.17.3-GCCcore-11.2.0
protobuf-python/3.17.3-GCCcore-11.2.0
TensorFlow/2.8.4-foss-2021b
Zip/3.0-GCCcore-11.2.0
other under 2023.06/software/linux/x86_64/generic
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
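
The Artefacts summary above groups the tarball contents into module files, software installations, and everything else. The sketch below shows one way such a summary could be derived with Python's `tarfile` module. The grouping rules and the name `summarize_tarball` are assumptions, and the demo uses a tiny in-memory tarball rather than a real EESSI artefact.

```python
import io
import tarfile


def summarize_tarball(fileobj, prefix: str):
    """Count entries and group members under `prefix` into modules, software and other."""
    groups = {"modules": set(), "software": set(), "other": set()}
    entries = 0
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            entries += 1
            if not member.name.startswith(prefix + "/"):
                continue
            rel = member.name[len(prefix) + 1:]
            parts = rel.split("/")
            if parts[:2] == ["modules", "all"] and len(parts) == 4 and member.isfile():
                groups["modules"].add("/".join(parts[2:]))  # e.g. Bazel/4.2.2-....lua
            elif parts[0] == "software" and len(parts) >= 3:
                groups["software"].add("/".join(parts[1:3]))  # e.g. Bazel/4.2.2-...
            elif parts[0] not in ("modules", "software") and member.isfile():
                groups["other"].add(rel)
    return entries, {name: sorted(items) for name, items in groups.items()}


def make_demo_tarball(prefix: str, relative_paths) -> io.BytesIO:
    """Build a small gzip-compressed tarball in memory for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for rel in relative_paths:
            data = b"x"
            info = tarfile.TarInfo(f"{prefix}/{rel}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


PREFIX = "2023.06/software/linux/x86_64/generic"
demo = make_demo_tarball(PREFIX, [
    "modules/all/Bazel/4.2.2-GCCcore-11.2.0.lua",
    "software/Bazel/4.2.2-GCCcore-11.2.0/bin/bazel",
    ".lmod/cache/timestamp",
])
entries, groups = summarize_tarball(demo, PREFIX)
print("entries:", entries)
for name, items in groups.items():
    print(name, items)
```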

@eessi-bot

eessi-bot bot commented Sep 28, 2023

New job on instance eessi-bot-citc-aws for architecture x86_64-amd-zen2 for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.09/pr_343/7589

date | job status | comment
Sep 28 07:34:55 UTC 2023 | submitted | job id 7589 awaits release by job manager
Sep 28 07:35:25 UTC 2023 | released | job awaits launch by Slurm scheduler
Sep 28 07:39:30 UTC 2023 | running | job 7589 is running
Sep 28 10:49:18 UTC 2023 | finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-7589.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1695898089.tar.gz
size: 306 MiB (321850289 bytes)
entries: 15541
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
Bazel/4.2.2-GCCcore-11.2.0.lua
dill/0.3.4-GCCcore-11.2.0.lua
flatbuffers/2.0.0-GCCcore-11.2.0.lua
flatbuffers-python/2.0-GCCcore-11.2.0.lua
giflib/5.2.1-GCCcore-11.2.0.lua
h5py/3.6.0-foss-2021b.lua
JsonCpp/1.9.4-GCCcore-11.2.0.lua
LMDB/0.9.29-GCCcore-11.2.0.lua
nsync/1.24.0-GCCcore-11.2.0.lua
pkgconfig/1.5.5-GCCcore-11.2.0-python.lua
protobuf/3.17.3-GCCcore-11.2.0.lua
protobuf-python/3.17.3-GCCcore-11.2.0.lua
TensorFlow/2.8.4-foss-2021b.lua
Zip/3.0-GCCcore-11.2.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
Bazel/4.2.2-GCCcore-11.2.0
dill/0.3.4-GCCcore-11.2.0
flatbuffers/2.0.0-GCCcore-11.2.0
flatbuffers-python/2.0-GCCcore-11.2.0
giflib/5.2.1-GCCcore-11.2.0
h5py/3.6.0-foss-2021b
JsonCpp/1.9.4-GCCcore-11.2.0
LMDB/0.9.29-GCCcore-11.2.0
nsync/1.24.0-GCCcore-11.2.0
pkgconfig/1.5.5-GCCcore-11.2.0-python
protobuf/3.17.3-GCCcore-11.2.0
protobuf-python/3.17.3-GCCcore-11.2.0
TensorFlow/2.8.4-foss-2021b
Zip/3.0-GCCcore-11.2.0
other under 2023.06/software/linux/x86_64/amd/zen2
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
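
For a sense of how long the two successful x86_64 builds took, the "running" and "finished" timestamps reported by the bot can be subtracted. This is only a sketch: the format string is inferred from the timestamps in this thread, and `%b` assumes English month names (the default C locale).

```python
from datetime import datetime, timedelta

# Format inferred from stamps such as "Sep 28 07:39:32 UTC 2023".
STAMP_FORMAT = "%b %d %H:%M:%S UTC %Y"


def parse_stamp(stamp: str) -> datetime:
    """Parse one of the bot's status timestamps (treated as naive UTC)."""
    return datetime.strptime(stamp, STAMP_FORMAT)


def run_time(running: str, finished: str) -> timedelta:
    """Time between the 'running' and 'finished' status updates."""
    return parse_stamp(finished) - parse_stamp(running)


generic = run_time("Sep 28 07:39:32 UTC 2023", "Sep 28 11:17:20 UTC 2023")
zen2 = run_time("Sep 28 07:39:30 UTC 2023", "Sep 28 10:49:18 UTC 2023")
print("x86_64/generic:", generic)
print("x86_64/amd/zen2:", zen2)
```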

@laraPPr
Collaborator Author

laraPPr commented Sep 28, 2023

> Hmm, looks like there's little hope with TensorFlow v2.8.4 on aarch64 as well?
>
> external/XNNPACK/src/qu8-igemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S: Assembler messages:
> external/XNNPACK/src/qu8-igemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S:123: Error: selected processor does not support `udot v12.4s,v8.16b,v0.16b'
>
> Let's see if x86_64 works at least...
>
> bot: build repo:eessi-2023.06-software arch:x86_64/generic
> bot: build repo:eessi-2023.06-software arch:x86_64/amd/zen2

WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/XNNPACK/archive/113092317754c7dea47bfb3cb49c4f59c3c1fa10.zip failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found

I found this in the logs. Or was this triggered by the error you found above?

@boegel
Contributor

boegel commented Sep 28, 2023

@laraPPr Oh, I overlooked that part...
In that case, I think we can also give it another try on aarch64.

@boegel
Contributor

boegel commented Sep 28, 2023

Well, since it's a 404 error, it looks like it's not a temporary error, but something that actually got removed and is no longer available. In that case, I expect to see the same problem pop up for x86_64 as well...

The URL is definitely broken with mirror.tensorflow.org, but the direct URL (https://github.com/google/XNNPACK/archive/113092317754c7dea47bfb3cb49c4f59c3c1fa10.zip) works just fine... So this may be patchable.

@laraPPr
Collaborator Author

laraPPr commented Sep 28, 2023

The mirror.tensorflow.org URL is no longer broken, which also explains why the build succeeded for x86_64-amd-zen2 and x86_64-generic. So I'm retrying the build for aarch64-neoverse_v1.

@laraPPr
Collaborator Author

laraPPr commented Sep 28, 2023

bot: build repo:eessi-2023.06-software arch:aarch64/neoverse_v1

@eessi-bot

eessi-bot bot commented Sep 28, 2023

Updates by the bot instance eessi-bot-citc-aws (click for details)
  • received bot command build repo:eessi-2023.06-software arch:aarch64/neoverse_v1 from laraPPr

    • expanded format: build repository:eessi-2023.06-software architecture:aarch64/neoverse_v1
  • handling command build repository:eessi-2023.06-software architecture:aarch64/neoverse_v1 resulted in:

@eessi-bot

eessi-bot bot commented Sep 28, 2023

New job on instance eessi-bot-citc-aws for architecture aarch64-neoverse_v1 for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.09/pr_343/7597

date | job status | comment
Sep 28 14:12:29 UTC 2023 | submitted | job id 7597 awaits release by job manager
Sep 28 14:12:39 UTC 2023 | released | job awaits launch by Slurm scheduler
Sep 28 15:30:54 UTC 2023 | running | job 7597 is running
Sep 28 15:35:17 UTC 2023 | finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-7597.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-1695915289.tar.gz
size: 0 MiB (173156 bytes)
entries: 3
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/neoverse_v1/software
no software packages in tarball
other under 2023.06/software/linux/aarch64/neoverse_v1
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp

@laraPPr
Collaborator Author

laraPPr commented Sep 29, 2023

The last job failed with "HTTP Error 403: rate limit exceeded".
Restarting on aarch64-neoverse_v1.

@laraPPr
Collaborator Author

laraPPr commented Sep 29, 2023

bot: build repo:eessi-2023.06-software arch:aarch64/neoverse_v1

@eessi-bot

eessi-bot bot commented Sep 29, 2023

Updates by the bot instance eessi-bot-citc-aws (click for details)
  • received bot command build repo:eessi-2023.06-software arch:aarch64/neoverse_v1 from laraPPr

    • expanded format: build repository:eessi-2023.06-software architecture:aarch64/neoverse_v1
  • handling command build repository:eessi-2023.06-software architecture:aarch64/neoverse_v1 resulted in:

@eessi-bot

eessi-bot bot commented Sep 29, 2023

New job on instance eessi-bot-citc-aws for architecture aarch64-neoverse_v1 for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.09/pr_343/7601

date | job status | comment
Sep 29 07:37:55 UTC 2023 | submitted | job id 7601 awaits release by job manager
Sep 29 07:38:57 UTC 2023 | released | job awaits launch by Slurm scheduler
Sep 29 07:43:08 UTC 2023 | running | job 7601 is running
Sep 29 08:42:34 UTC 2023 | finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-7601.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-1695976822.tar.gz
size: 50 MiB (52870832 bytes)
entries: 921
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
Bazel/4.2.2-GCCcore-11.2.0.lua
dill/0.3.4-GCCcore-11.2.0.lua
flatbuffers/2.0.0-GCCcore-11.2.0.lua
flatbuffers-python/2.0-GCCcore-11.2.0.lua
giflib/5.2.1-GCCcore-11.2.0.lua
h5py/3.6.0-foss-2021b.lua
JsonCpp/1.9.4-GCCcore-11.2.0.lua
LMDB/0.9.29-GCCcore-11.2.0.lua
nsync/1.24.0-GCCcore-11.2.0.lua
pkgconfig/1.5.5-GCCcore-11.2.0-python.lua
protobuf/3.17.3-GCCcore-11.2.0.lua
protobuf-python/3.17.3-GCCcore-11.2.0.lua
Zip/3.0-GCCcore-11.2.0.lua
software under 2023.06/software/linux/aarch64/neoverse_v1/software
Bazel/4.2.2-GCCcore-11.2.0
dill/0.3.4-GCCcore-11.2.0
flatbuffers/2.0.0-GCCcore-11.2.0
flatbuffers-python/2.0-GCCcore-11.2.0
giflib/5.2.1-GCCcore-11.2.0
h5py/3.6.0-foss-2021b
JsonCpp/1.9.4-GCCcore-11.2.0
LMDB/0.9.29-GCCcore-11.2.0
nsync/1.24.0-GCCcore-11.2.0
pkgconfig/1.5.5-GCCcore-11.2.0-python
protobuf/3.17.3-GCCcore-11.2.0
protobuf-python/3.17.3-GCCcore-11.2.0
Zip/3.0-GCCcore-11.2.0
other under 2023.06/software/linux/aarch64/neoverse_v1
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp

@laraPPr
Collaborator Author

laraPPr commented Sep 29, 2023

WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/c3e082762b7664bbc7ffd2c39e86464928e27c0c.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found

@boegel
Contributor

boegel commented Sep 29, 2023

Yeah, those mirrors don't seem to be very stable...

Again works fine with the direct URL (https://github.com/tensorflow/runtime/archive/c3e082762b7664bbc7ffd2c39e86464928e27c0c.tar.gz).

@Flamefire Do you happen to know what this downloading via mirror.tensorflow.org shenanigans is about?

@Flamefire
Contributor

> @Flamefire Do you happen to know what this downloading via mirror.tensorflow.org shenanigans is about?

They have (and require) a mirror for the stuff they download, and they try that first. I'm not sure it has ever succeeded for me, so maybe it is only reachable internally at Google; we can just ignore those warnings.
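
The "try the mirror first, then the direct URL" behaviour described above can be sketched as follows. This is not Bazel's downloader: `fetch_first_available`, `DownloadError` and `fake_fetch` are hypothetical names, and the fetcher is a stand-in so the example runs without network access. It illustrates why a 404 from the mirror only shows up as a WARNING when a later URL succeeds.

```python
from typing import Callable, List, Sequence, Tuple


class DownloadError(Exception):
    """Raised by a fetcher when a URL cannot be downloaded."""


def fetch_first_available(
    urls: Sequence[str], fetch: Callable[[str], bytes]
) -> Tuple[bytes, List[str]]:
    """Try each URL in order; return the first payload plus collected warnings."""
    warnings: List[str] = []
    for url in urls:
        try:
            return fetch(url), warnings
        except DownloadError as err:
            # A failing mirror is only a warning as long as a later URL works.
            warnings.append(f"WARNING: Download from {url} failed: {err}")
    raise DownloadError("all URLs failed: " + "; ".join(warnings))


# URLs from this thread; the fetcher below is a hypothetical offline stand-in.
ARCHIVE = "google/XNNPACK/archive/113092317754c7dea47bfb3cb49c4f59c3c1fa10.zip"
MIRROR_URL = "https://storage.googleapis.com/mirror.tensorflow.org/github.com/" + ARCHIVE
DIRECT_URL = "https://github.com/" + ARCHIVE


def fake_fetch(url: str) -> bytes:
    """Pretend the mirror returns 404 while the direct URL works."""
    if "mirror.tensorflow.org" in url:
        raise DownloadError("GET returned 404 Not Found")
    return b"archive-bytes"


payload, warnings = fetch_first_available([MIRROR_URL, DIRECT_URL], fake_fetch)
print(len(payload), "bytes downloaded")
for warning in warnings:
    print(warning)
```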

@eessi-bot

eessi-bot bot commented Sep 29, 2023

Updates by the bot instance eessi-bot-citc-aws (click for details)
  • account Flamefire has NO permission to send commands to the bot

@boegel
Contributor

boegel commented Sep 29, 2023

OK, so then I think that this is really the problem we're hitting on aarch64 with this TensorFlow version:

  /tmp/eb-600ci_9n/eb-gh7fxuh7/eb-b8j9xosv/eb-ngr9cagg/tmpzkv_lbrw/rpath_wrappers/gcc_wrapper/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections -MD -MF bazel-out/aarch64-opt/bin/external/XNNPACK/_objs/asm_microkernels/2/4x16c4-minmax-fp32-aarch64-neondot-ld128.pic.d -fPIC -iquoteexternal/XNNPACK -iquotebazel-out/aarch64-opt/bin/external/XNNPACK -isystem external/XNNPACK/include -isystem bazel-out/aarch64-opt/bin/external/XNNPACK/include -isystem external/XNNPACK/src -isystem bazel-out/aarch64-opt/bin/external/XNNPACK/src -w -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize '-mcpu=native' -fno-math-errno -fPIC -fPIC -Iinclude -Isrc '-march=armv8.2-a+fp16+dotprod' -O2 -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/XNNPACK/src/qs8-gemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S -o bazel-out/aarch64-opt/bin/external/XNNPACK/_objs/asm_microkernels/2/4x16c4-minmax-fp32-aarch64-neondot-ld128.pic.o)
Execution platform: @local_execution_config_platform//:platform
external/XNNPACK/src/qs8-gemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S: Assembler messages:
external/XNNPACK/src/qs8-gemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S:102: Error: selected processor does not support `sdot v16.4s,v4.16b,v0.4b[0]'
external/XNNPACK/src/qs8-gemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S:103: Error: selected processor does not support `sdot v17.4s,v4.16b,v1.4b[0]'
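
When digging through a long build log like the one above, it can help to pull out just the rejected instructions and the target-selection flags on the compiler command line. The sketch below does only that; it does not diagnose why the assembler rejects `sdot`/`udot`. The helper names are hypothetical, and the sample command is shortened from the one quoted above.

```python
import re
import shlex

# Matches GNU assembler lines of the form seen in this thread:
#   <file>:<line>: Error: selected processor does not support `<instruction>'
ASM_ERROR = re.compile(
    r"^(?P<file>\S+):(?P<line>\d+): Error: "
    r"selected processor does not support `(?P<insn>[^']+)'$"
)


def unsupported_instructions(log: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, instruction) for each matching assembler error."""
    found = []
    for line in log.splitlines():
        match = ASM_ERROR.match(line.strip())
        if match:
            found.append((match["file"], int(match["line"]), match["insn"]))
    return found


def target_flags(command: str) -> list[str]:
    """Return the -mcpu/-march/-mtune options of a shell-quoted compiler command."""
    return [
        token
        for token in shlex.split(command)
        if token.startswith(("-mcpu=", "-march=", "-mtune="))
    ]


log = (
    "external/XNNPACK/src/qs8-gemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S: Assembler messages:\n"
    "external/XNNPACK/src/qs8-gemm/gen/4x16c4-minmax-fp32-aarch64-neondot-ld128.S:102: "
    "Error: selected processor does not support `sdot v16.4s,v4.16b,v0.4b[0]'\n"
)
command = "gcc -O2 -ftree-vectorize '-mcpu=native' -fPIC '-march=armv8.2-a+fp16+dotprod' -c kernel.S -o kernel.o"

errors = unsupported_instructions(log)
flags = target_flags(command)
print(errors)
print(flags)
```

In the command quoted above, both `-mcpu=native` and `-march=armv8.2-a+fp16+dotprod` are present, which is the kind of detail this helper is meant to surface for further investigation.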

@boegel changed the title from "{2023.06}[foss/2021b] TensorFlow 4.8.1" to "{2023.06}[foss/2021b] TensorFlow 2.8.4" on Sep 29, 2023
@boegel
Contributor

boegel commented Sep 29, 2023

I've opened PRs to check whether we're still seeing problems with newer TensorFlow versions on aarch64:

@boegel
Contributor

boegel commented Oct 2, 2023

I've opened a dedicated issue in the EasyBuild repo for the problem on aarch64/*, since this is in no way specific to EESSI: easybuilders/easybuild-easyconfigs#18899

@boegel
Contributor

boegel commented Oct 3, 2023

bot: build repo:eessi-2023.06-software arch:x86_64/intel/haswell
bot: build repo:eessi-2023.06-software arch:aarch64/neoverse_n1
bot: build repo:eessi-2023.06-software arch:aarch64/neoverse_v1

@eessi-bot

eessi-bot bot commented Oct 3, 2023

Updates by the bot instance eessi-bot-citc-aws (click for details)
  • received bot command build repo:eessi-2023.06-software arch:x86_64/intel/haswell from boegel

    • expanded format: build repository:eessi-2023.06-software architecture:x86_64/intel/haswell
  • received bot command build repo:eessi-2023.06-software arch:aarch64/neoverse_n1 from boegel

    • expanded format: build repository:eessi-2023.06-software architecture:aarch64/neoverse_n1
  • received bot command build repo:eessi-2023.06-software arch:aarch64/neoverse_v1 from boegel

    • expanded format: build repository:eessi-2023.06-software architecture:aarch64/neoverse_v1
  • handling command build repository:eessi-2023.06-software architecture:x86_64/intel/haswell resulted in:

  • handling command build repository:eessi-2023.06-software architecture:aarch64/neoverse_n1 resulted in:

  • handling command build repository:eessi-2023.06-software architecture:aarch64/neoverse_v1 resulted in:

@eessi-bot

eessi-bot bot commented Oct 3, 2023

New job on instance eessi-bot-citc-aws for architecture x86_64-intel-haswell for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.10/pr_343/7690

date | job status | comment
Oct 03 18:50:00 UTC 2023 | submitted | job id 7690 awaits release by job manager
Oct 03 18:51:07 UTC 2023 | released | job awaits launch by Slurm scheduler
Oct 03 18:55:17 UTC 2023 | running | job 7690 is running
Oct 03 19:02:49 UTC 2023 | finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-7690.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-haswell-1696359741.tar.gz
size: 0 MiB (211352 bytes)
entries: 3
modules under 2023.06/software/linux/x86_64/intel/haswell/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/intel/haswell/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/intel/haswell
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp

@eessi-bot

eessi-bot bot commented Oct 3, 2023

New job on instance eessi-bot-citc-aws for architecture aarch64-neoverse_n1 for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.10/pr_343/7691

date | job status | comment
Oct 03 18:50:08 UTC 2023 | submitted | job id 7691 awaits release by job manager
Oct 03 18:51:04 UTC 2023 | released | job awaits launch by Slurm scheduler
Oct 03 18:55:15 UTC 2023 | running | job 7691 is running
Oct 03 19:03:54 UTC 2023 | finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-7691.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_n1-1696359795.tar.gz
size: 0 MiB (210745 bytes)
entries: 3
modules under 2023.06/software/linux/aarch64/neoverse_n1/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/neoverse_n1/software
no software packages in tarball
other under 2023.06/software/linux/aarch64/neoverse_n1
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp

@eessi-bot

eessi-bot bot commented Oct 3, 2023

New job on instance eessi-bot-citc-aws for architecture aarch64-neoverse_v1 for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.10/pr_343/7692

date | job status | comment
Oct 03 18:50:15 UTC 2023 | submitted | job id 7692 awaits release by job manager
Oct 03 18:51:00 UTC 2023 | released | job awaits launch by Slurm scheduler
Oct 03 18:54:10 UTC 2023 | running | job 7692 is running
Oct 03 19:03:52 UTC 2023 | finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-7692.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-1696359762.tar.gz
size: 0 MiB (210748 bytes)
entries: 3
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/neoverse_v1/software
no software packages in tarball
other under 2023.06/software/linux/aarch64/neoverse_v1
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp

@boegel boegel changed the base branch from 2023.06 to pilot.eessi-hpc.org-2023.06 November 21, 2023 21:18
@laraPPr laraPPr closed this Apr 2, 2024
trz42 pushed a commit to trz42/software-layer that referenced this pull request May 2, 2024
…eb491

Rebuild `hatchling 1.18.0`, `Python 3.11.*`, `Python-bundle-PyPI-2023.*` to solve `setuptools_scm`