Sockeye is training much faster than Marian #396

Open
tomsbergmanis opened this issue Sep 14, 2022 · 0 comments

Bug description

Sockeye trains much faster than Marian.
I ran a one-epoch training on a small data set of 4.7M training examples with each framework. To the best of my knowledge, I used comparable training parameters for both frameworks. But the results were 21 min vs. 36 min, favoring Sockeye.
What I do not know is whether this is a problem with my old setup (Ubuntu 18.04.6 and everything that follows from it, e.g. an old compiler) or something to do with Marian.

How to reproduce

A typical way of training Sockeye systems is to run a data preparation step before training:
sockeye-prepare-data --source train.bpe.en --target train.bpe.lv --output . --max-seq-len 128 --shared-vocab --num-words 25000
Data preparation time was not included in the training time.
To measure Sockeye's training time, I used the timestamps between the start and the end of training, which worked out to 21 min.
touch sockeye.start & torchrun --no_python --nproc_per_node 2 sockeye-train --prepared-data . --output models --validation-source dev.bpe.en --validation-target dev.bpe.lv --max-num-epochs 1 --shared-vocab --dist --amp --update-interval 12 --batch-size 18000 --max-seq-len 128 > training.log 2>&1 & touch sockeye.end
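A note for anyone reproducing the measurement: if the command above is run exactly as written, each single `&` puts the preceding command in the background, so `touch sockeye.end` would run immediately rather than after training. A minimal sketch of a wrapper that waits for the process to finish and reports wall-clock time is shown here. The helper name `time_command` and the placeholder command are illustrative assumptions, not part of either framework.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    # subprocess.run blocks until the child process exits.
    result = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    return result.returncode, elapsed


if __name__ == "__main__":
    # Placeholder command; substitute the torchrun or marian invocation
    # as a list of arguments.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, wall-clock {seconds / 60:.2f} min")
```

Measuring both frameworks with the same wrapper would also include start-up and data loading time on both sides, which log timestamps may not.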
For Marian, I used /marian-vocab --max-size 25000 to build the vocabulary, and then ran:
marian --devices 0 1 --type transformer --model /tmp/toms/sockeye-test/model.npz --train-sets /tmp/toms/sockeye-test/train.bpe.en /tmp/toms/sockeye-test/train.bpe.lv --vocabs en-lv-shared-vocab.yml en-lv-shared-vocab.yml --max-length 128 --max-length-factor 1.5 --mini-batch-fit --workspace 18000 --maxi-batch 2000 --early-stopping 10 --valid-freq 1000000 --save-freq 2000000 --disp-freq 100 --keep-best --overwrite --valid-metrics cross-entropy translation --valid-sets /tmp/toms/sockeye-test/dev.bpe.en /tmp/toms/sockeye-test/dev.bpe.lv --valid-script-path /tmp/toms/sockeye-test/validate.sh --log /tmp/toms/sockeye-test/train.log --valid-log /tmp/toms/sockeye-test/valid.log --seed 347155 --exponential-smoothing --normalize 0.6 --beam-size 6 --quiet-translation --valid-translation-output /tmp/toms/sockeye-test/valid.output.txt --valid-mini-batch 16 --enc-depth 6 --dec-depth 6 --transformer-heads 8 --transformer-preprocess d --transformer-postprocess-emb d --transformer-postprocess dan --optimizer-delay 12 --learn-rate 0.0005 --lr-warmup 16000 --lr-decay-inv-sqrt 16000 --lr-report --clip-norm 5 --tied-embeddings-all --sync-sgd --transformer-dropout 0.1 --transformer-dropout-attention 0.1 --transformer-dropout-ffn 0.1 --optimizer adam --optimizer-params 0.9 0.98 1e-09 --sqlite /tmp/en-lv-W69bwc2f6meuT-combined.db -e 1 --fp16
To measure Marian's training time, I used the timestamps of the log messages "Training started" and "Training finished", which worked out to around 36 min. This was with Marian version v1.10.24; 4dd30b5 2021-09-08 14:02:21 +0100.
I also tried Marian v1.11.0 f00d062 2022-02-08 08:39:24 -0800, but it was even slower: 43 min.
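The timestamp subtraction can be scripted so that it is done the same way for every run. The sketch below assumes each log line starts with a bracketed timestamp such as `[2022-09-14 10:00:00]` followed by the message; that line shape, the function name `training_minutes`, and the sample lines are assumptions for illustration and are not taken from the attached logs.

```python
import re
from datetime import datetime

# Assumed line shape: "[YYYY-MM-DD HH:MM:SS] message"
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+(.*)$")


def training_minutes(lines):
    """Return minutes between 'Training started' and 'Training finished'."""
    start = end = None
    for line in lines:
        match = STAMP.match(line.strip())
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2)
        if start is None and message.startswith("Training started"):
            start = when
        elif message.startswith("Training finished"):
            end = when
    if start is None or end is None:
        raise ValueError("start or finish marker not found in log")
    return (end - start).total_seconds() / 60


if __name__ == "__main__":
    # Made-up lines, only to show the call.
    demo = [
        "[2000-01-01 10:00:00] Training started",
        "[2000-01-01 10:36:00] Training finished",
    ]
    print(training_minutes(demo))
```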

I do realize that Marian's --workspace 18000 and Sockeye's --batch-size 18000 aren't the same; however, running with different --batch-size values didn't affect the time it took Sockeye to train for one epoch.
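To make the difference concrete: Marian's --workspace reserves GPU memory in MB, and --mini-batch-fit then sizes batches to fit that memory, so there is no fixed token count to compare against. On the Sockeye side, the tokens per optimizer update can be estimated with the arithmetic below. It assumes that --batch-size counts tokens per process and that the update size is the product of batch size, update interval, and number of processes; treat that as an assumption to verify against the Sockeye documentation.

```python
def tokens_per_update(batch_size, update_interval, num_processes):
    """Estimated tokens contributing to one optimizer update.

    Assumes batch_size is counted per process (an assumption, see above).
    """
    return batch_size * update_interval * num_processes


# Values from the Sockeye command: --batch-size 18000,
# --update-interval 12, --nproc_per_node 2.
sockeye_update = tokens_per_update(18000, 12, 2)
print(sockeye_update)  # 432000
```

Comparing this estimate with the per-update word counts that each framework logs would show whether the two runs really use similar update sizes.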

I also checked whether both frameworks had seen the same number of sentences during their respective training runs. The numbers were about the same.
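Since the sentence counts match, the timings translate directly into throughput. A small sketch of that arithmetic, using the rounded figures reported in this issue (4.7M examples; 21, 36, and 43 minutes); the variable names are illustrative:

```python
def sentences_per_second(num_sentences, minutes):
    """Average throughput over a run of the given length."""
    return num_sentences / (minutes * 60)


NUM_SENTENCES = 4_700_000  # rounded figure from the report

sockeye = sentences_per_second(NUM_SENTENCES, 21)
marian_1_10 = sentences_per_second(NUM_SENTENCES, 36)
marian_1_11 = sentences_per_second(NUM_SENTENCES, 43)

for name, rate in [
    ("Sockeye", sockeye),
    ("Marian v1.10.24", marian_1_10),
    ("Marian v1.11.0", marian_1_11),
]:
    # Slowdown is relative to the Sockeye run.
    print(f"{name}: {rate:.0f} sent/s, {sockeye / rate:.2f}x slower than Sockeye")
```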

Context

  • Marian version: v1.10.24; 4dd30b5 2021-09-08 14:02:21 +0100
  • Marian version: v1.11.0 f00d062 2022-02-08 08:39:24 -0800
  • CMake command:
    cmake ..
    -- The CXX compiler identification is GNU 7.5.0
    -- The C compiler identification is GNU 7.5.0
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Project name: marian
    -- Project version: v1.11.0+f00d0621
    Submodule 'examples' (https://github.com/marian-nmt/marian-examples) registered for path 'examples'
    Submodule 'regression-tests' (https://github.com/marian-nmt/marian-regression-tests) registered for path 'regression-tests'
    Submodule 'src/3rd_party/fbgemm' (https://github.com/marian-nmt/FBGEMM) registered for path 'src/3rd_party/fbgemm'
    Submodule 'src/3rd_party/intgemm' (https://github.com/marian-nmt/intgemm/) registered for path 'src/3rd_party/intgemm'
    Submodule 'src/3rd_party/nccl' (https://github.com/marian-nmt/nccl) registered for path 'src/3rd_party/nccl'
    Submodule 'src/3rd_party/sentencepiece' (https://github.com/marian-nmt/sentencepiece) registered for path 'src/3rd_party/sentencepiece'
    Submodule 'src/3rd_party/simple-websocket-server' (https://github.com/marian-nmt/Simple-WebSocket-Server) registered for path 'src/3rd_party/simple-websocket-server'
    Cloning into '/tmp/toms/sockeye-test/marian/examples'...
    Cloning into '/tmp/toms/sockeye-test/marian/regression-tests'...
    Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/fbgemm'...
    Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/intgemm'...
    Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/nccl'...
    Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/sentencepiece'...
    Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/simple-websocket-server'...
    Submodule path 'examples': checked out '6d5921cc7de91f4e915b59e9c52c9a76c4e99b00'
    Submodule path 'regression-tests': checked out '0716f4e012d1e3f7543bffa8aecc97ce9c903e17'
    Submodule path 'src/3rd_party/fbgemm': checked out '6f45243cb8ab7d7ab921af18d313ae97144618b8'
    Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'src/3rd_party/fbgemm/third_party/asmjit'
    Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'src/3rd_party/fbgemm/third_party/cpuinfo'
    Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'src/3rd_party/fbgemm/third_party/googletest'
    Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/fbgemm/third_party/asmjit'...
    Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/fbgemm/third_party/cpuinfo'...
    Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/fbgemm/third_party/googletest'...
    Submodule path 'src/3rd_party/fbgemm/third_party/asmjit': checked out '4da474ac9aa2689e88d5e40a2f37628f302d7e3c'
    Submodule path 'src/3rd_party/fbgemm/third_party/cpuinfo': checked out 'd5e37adf1406cf899d7d9ec1d317c47506ccb970'
    Submodule path 'src/3rd_party/fbgemm/third_party/googletest': checked out '0fc5466dbb9e623029b1ada539717d10bd45e99e'
    Submodule path 'src/3rd_party/intgemm': checked out '8abde25b13c3ab210c0dec8e23f4944e3953812d'
    Submodule path 'src/3rd_party/nccl': checked out '5dcf7751494f9d04057bfc6b4a2b64611bc12253'
    Submodule path 'src/3rd_party/sentencepiece': checked out 'c307b874deb5ea896db8f93506e173353e66d4d3'
    Submodule path 'src/3rd_party/simple-websocket-server': checked out '1d7e84aeb3f1ebdc78f6965d79ad3ca3003789fe'
    CMake Warning at CMakeLists.txt:79 (message):
    CMAKE_BUILD_TYPE not set; setting to Release

    -- Building with -march=native and intrinsics will be chosen automatically by the compiler to match the current machine.
    -- Checking support for CPU intrinsics
    -- Looking for pthread.h
    -- Looking for pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Found CUDA: software/anaconda3/envs/sockeye3 (found suitable version "10.0", minimum required is "9.0")
    -- Compiling code for Pascal GPUs
    -- Compiling code for Volta GPUs
    -- Compiling code for Turing GPUs
    -- Found CUDA libraries: software/anaconda3/envs/sockeye3/lib64/libcurand.so; software/anaconda3/envs/sockeye3/lib64/libcusparse.so; software/anaconda3/envs/sockeye3/lib64/libcublas.so
    -- Found Tcmalloc: /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so
    -- Found MKL: -Wl,--start-group;/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a;/opt/intel/mkl/lib/intel64/libmkl_sequential.a;/opt/intel/mkl/lib/intel64/libmkl_core.a;-Wl,--end-group
    CMake Warning at src/3rd_party/intgemm/CMakeLists.txt:33 (message):
    Not building AVX512VNNI-based multiplication because your compiler is
    too old.

    For details rerun cmake with --debug-trycompile then try to build in
    compile_tests/CMakeFiles/CMakeTmp.

    -- VERSION: 0.1.94
    -- Found TCMalloc: /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so
    -- Found Doxygen: /usr/bin/doxygen (found version "1.8.13") found components: doxygen dot
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /tmp/toms/sockeye-test/marian/build

  • Both frameworks use CUDA version 10, although there could be minor differences, as Sockeye 3 is installed via Conda and uses the Conda installation.
  • I ran it on two NVIDIA TITAN RTX GPUs.
  • OS: Ubuntu 18.04.6

marian-v-1.10.train.log
marian-v-1.11.train.log
sockye_training.log
sockeye.args.yaml.txt
sockeye.data.config.txt