Don't include undef sym refs when building map of symbol definitions #629
Merged
andrewjcg force-pushed the undef_symbs branch from 1e754f3 to 98cbcd3 on November 9, 2023 03:37
andrewjcg added a commit to andrewjcg/py-spy that referenced this pull request on Sep 3, 2024
… index

Summary: Don't count undefined symbols in the index of symbols that py-spy builds. This can cause py-spy to, e.g., misattribute an undefined ref to `_PyRuntime` in some location other than `libpython.so` as the definition.

Upstreamed as: benfred#629

Test Plan: Ran on `/packages/cpu.xlformers.train/penv.par`.

Before, we'd die with:
```
$ RUST_LOG=info ./fbpy-spy dump -p 1162
[2023-10-31T18:04:04.658254536Z INFO py_spy::config] Command line args: ArgMatches { args: {}, subcommand: Some(SubCommand { id: [hash: B8461C91A07ADDC8], name: "dump", matches: ArgMatches { args: {[hash: CD5160AB4406C427]: MatchedArg { occurs: 1, source: Some(CommandLine), indices: [2], type_id: Some(TypeId { t: 69534013883876418352099503721857626982 }), vals: [[AnyValue { inner: TypeId { t: 69534013883876418352099503721857626982 } }]], raw_vals: [["1162"]], ignore_case: false }}, subcommand: None } }) }
[2023-10-31T18:04:04.660694834Z INFO py_spy::python_spy] Got virtual memory maps from pid 1162:
[2023-10-31T18:04:07.033385523Z INFO py_spy::python_spy] Found libpython binary @ /usr/local/fbcode/platform010/lib/libpython3.10.so.1.0
[2023-10-31T18:04:07.038415315Z INFO py_spy::python_spy] got symbol Py_GetVersion.version (0x00007fa5a425acf0) from libpython binary
[2023-10-31T18:04:07.038425108Z INFO py_spy::python_spy] Getting version from symbol address
[2023-10-31T18:04:07.039366641Z INFO py_spy::version] Found matching version string '3.10.9+fb (3.10:1dd9be6, May 4 2022, 01:23:45) [Clang 12.0.1 (mononoke://'
[2023-10-31T18:04:07.039374857Z INFO py_spy::python_spy] python version 3.10.9 detected
[2023-10-31T18:04:07.039380427Z INFO py_spy::python_spy] got symbol _PyRuntime (0x000056301cf89000) from python binary
[2023-10-31T18:04:07.039498251Z WARN py_spy::python_spy] Interpreter address from _PyRuntime symbol is invalid 0000000000000040
[2023-10-31T18:04:07.039503358Z INFO py_spy::python_spy] Failed to get interp_head from symbols, scanning BSS section from main binary
[2023-10-31T18:04:07.154577459Z INFO py_spy::python_spy] Failed to get interpreter from binary BSS, scanning libpython BSS
Error: Failed to find a python interpreter in the .data section
```

After:
```
$ RUST_LOG=info ./py-spy dump -p 1162
[2023-10-31T18:04:20.036236603Z INFO py_spy::config] Command line args: ArgMatches { args: {}, subcommand: Some(SubCommand { id: [hash: B8461C91A07ADDC8], name: "dump", matches: ArgMatches { args: {[hash: CD5160AB4406C427]: MatchedArg { occurs: 1, source: Some(CommandLine), indices: [2], type_id: Some(TypeId { t: 69534013883876418352099503721857626982 }), vals: [[AnyValue { inner: TypeId { t: 69534013883876418352099503721857626982 } }]], raw_vals: [["1162"]], ignore_case: false }}, subcommand: None } }) }
[2023-10-31T18:04:20.038355392Z INFO py_spy::python_spy] Got virtual memory maps from pid 1162:
[2023-10-31T18:04:22.319161826Z INFO py_spy::python_spy] Found libpython binary @ /usr/local/fbcode/platform010/lib/libpython3.10.so.1.0
[2023-10-31T18:04:22.323992753Z INFO py_spy::python_spy] got symbol Py_GetVersion.version (0x00007fa5a425acf0) from libpython binary
[2023-10-31T18:04:22.324001859Z INFO py_spy::python_spy] Getting version from symbol address
[2023-10-31T18:04:22.324937137Z INFO py_spy::version] Found matching version string '3.10.9+fb (3.10:1dd9be6, May 4 2022, 01:23:45) [Clang 12.0.1 (mononoke://'
[2023-10-31T18:04:22.324946474Z INFO py_spy::python_spy] python version 3.10.9 detected
[2023-10-31T18:04:22.324951227Z INFO py_spy::python_spy] got symbol _PyRuntime (0x00007fa5a42531b0) from libpython binary
[2023-10-31T18:04:22.325348234Z INFO py_spy::python_spy] Found interpreter at 0x00007fa57daea000
[2023-10-31T18:04:22.325352986Z INFO py_spy::python_spy] got symbol _PyRuntime (0x00007fa5a42531b0) from libpython binary
[2023-10-31T18:04:22.325356193Z INFO py_spy::python_spy] Found _PyRuntime @ 0x00007fa5a42531b0, getting gilstate.tstate_current from offset 0x238
Process 1162: [xarexec] /packages/cpu.xlformers.train/penv.par -tt /dev/shm/uid-0/894107fb-seed-nspid4026533351_cgpid8628534-ns-4026533348/__run_xar_main__.py --model=genesis220B_kv8 --model.non_linearity=swiglu --model.use_rope=True --model.init.use_gaussian=True --model.init.use_depth=current --model.alpha_depth=disabled --optim.lr=0.00015 --optim.lr_min_ratio=0.1 --optim.warmup=2000 --seq_len=4096 --batch_size=4 --steps=476000 --unlimited_steps=False --log_freq=10 --eval_freq=-1 --profile_freq=-1 --dump_freq=50 --iter_type=multi --fp32_reduce_scatter=False --checkpoint_destination=directio --model_entity_id=-1 --do_checkpoint=True --model_parallel_size=8 --log_all_steps=True --gpu_check_level=-1 --tokenizer_dir=/mnt/wsfuse/tokenizers --periodic_gpu_check=False --data=/mnt/wsfuse/fair_llm_v2/shuffled/stackexchange:0.88,/mnt/wsfuse/fair_llm_v2/shuffled/b3g:3.15,/mnt/wsfuse/fair_llm_v2/shuffled/arxiv:1.14,/mnt/wsfuse/fair_llm_v2/shuffled/github_oss_with_stack:4,/mnt/wsfuse/fair_llm_v2/shuffled/c4/en:6,/mnt/wsfuse/fair_llm_v2/edouard_cc_20220927_new:24.7,/mnt/wsfuse/fair_llm_v2/ccnet_new:28.3,/mnt/wsfuse/fair_llm_v2/shuffled/wikipedia:3.5 --use_libuv=True --model_ckpt_multiplier=1 --optim_ckpt_multiplier=1 --dump_dir=/mnt/wsfuse/outputs/torchx-cpu-xlformers-h514mwh
Python v3.10.9 (/dev/shm/uid-0/894107fb-seed-nspid4026533351_cgpid8628534-ns-4026533348/runtime/bin/train#native-main#platform-runtime#python#py_version_3_10)

Thread 0x7FA5B5B8E000 (active): "MainThread"
    _single_tensor_adamw (torch/optim/adamw.py:466)
    adamw (torch/optim/adamw.py:335)
    step (torch/optim/adamw.py:184)
    _use_grad (torch/optim/optimizer.py:76)
    wrapper (torch/optim/optimizer.py:373)
    wrapper (torch/optim/lr_scheduler.py:68)
    main (train.py:761)
    manifoldfs_main_wrapper (train.py:296)
    inner (contextlib.py:79)
    <module> (train.py:1204)
    _run_code (runpy.py:86)
    _run_module_as_main (runpy.py:196)
    run_as_main (__par__/bootstrap.py:58)
    run_as_main (__par__/meta_only/bootstrap.py:76)
    __invoke_main (__run_xar_main__.py:91)
    <module> (__run_xar_main__.py:140)
Thread 0x7FA55F400000 (idle): "Thread-1"
    wait (threading.py:324)
    get (queue.py:180)
    _run (tensorboard/summary/writer/event_file_writer.py:225)
    run (tensorboard/summary/writer/event_file_writer.py:253)
    _bootstrap_inner (threading.py:1016)
    _bootstrap (threading.py:973)
Thread 0x7FA562C00000 (idle): "Thread-2"
    wait (threading.py:324)
    get (queue.py:180)
    _run (tensorboard/summary/writer/event_file_writer.py:225)
    run (tensorboard/summary/writer/event_file_writer.py:253)
    _bootstrap_inner (threading.py:1016)
    _bootstrap (threading.py:973)
Thread 0x7F9FC4600000 (idle): "Thread-3"
    wait (threading.py:324)
    get (queue.py:180)
    _run (tensorboard/summary/writer/event_file_writer.py:225)
    run (tensorboard/summary/writer/event_file_writer.py:253)
    _bootstrap_inner (threading.py:1016)
    _bootstrap (threading.py:973)
Thread 0x7F9F85A00000 (idle): "Thread-4"
    wait (threading.py:324)
    get (queue.py:180)
    _run (tensorboard/summary/writer/event_file_writer.py:225)
    run (tensorboard/summary/writer/event_file_writer.py:253)
    _bootstrap_inner (threading.py:1016)
    _bootstrap (threading.py:973)
Thread 0x7F9F85000000 (idle): "Thread-5"
    wait (threading.py:324)
    get (queue.py:180)
    _run (tensorboard/summary/writer/event_file_writer.py:225)
    run (tensorboard/summary/writer/event_file_writer.py:253)
    _bootstrap_inner (threading.py:1016)
    _bootstrap (threading.py:973)
Thread 0x7F9F84600000 (idle): "Thread-6"
    wait (threading.py:324)
    get (queue.py:180)
    _run (tensorboard/summary/writer/event_file_writer.py:225)
    run (tensorboard/summary/writer/event_file_writer.py:253)
    _bootstrap_inner (threading.py:1016)
    _bootstrap (threading.py:973)
```

Reviewers: bmaurer, kunalb, wenyinfu

Reviewed By: bmaurer

Subscribers: mzlee

Differential Revision: https://phabricator.intern.facebook.com/D50847131
andrewjcg added a commit to andrewjcg/py-spy that referenced this pull request on Nov 1, 2024
Don't count undefined symbols in the index of symbols that py-spy builds. This can cause py-spy to, e.g., misattribute an undefined ref to `_PyRuntime` in some location other than `libpython.so` as the definition.
andrewjcg force-pushed the undef_symbs branch from 2599fd3 to 844a5ae on November 1, 2024 23:02
benfred approved these changes on Nov 1, 2024
Previously, we'd count undefined symbol references in the map of symbols defined in a binary, which could cause py-spy to, e.g., misattribute an undefined ref to `_PyRuntime` in some location other than `libpython.so` as the definition.
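The idea can be sketched roughly as follows. This is a minimal illustration, not py-spy's actual code: the `Symbol` struct, the `defined_symbols` function, and the `load_offset` parameter are hypothetical names made up for the example. The only fact taken from the ELF format is that a symbol whose section index is `SHN_UNDEF` (0) is a reference to a definition living elsewhere, so its value should not be treated as an address.

```rust
use std::collections::HashMap;

/// ELF section index that marks a symbol as undefined, i.e. a reference
/// to something defined in another object rather than a definition.
pub const SHN_UNDEF: u16 = 0;

/// Hypothetical, simplified view of one symbol table entry
/// (a real parser would expose more fields than this).
#[derive(Debug, Clone)]
pub struct Symbol {
    pub name: String,
    pub value: u64, // st_value: only meaningful for defined symbols
    pub shndx: u16, // st_shndx: section the symbol is defined in
}

/// Build a name -> address map that contains only symbols the binary
/// actually defines. Undefined references are skipped so that a lookup
/// miss here can fall through to another binary (e.g. libpython).
pub fn defined_symbols(symbols: &[Symbol], load_offset: u64) -> HashMap<String, u64> {
    symbols
        .iter()
        .filter(|sym| sym.shndx != SHN_UNDEF)
        .map(|sym| (sym.name.clone(), sym.value + load_offset))
        .collect()
}
```

With a filter like this, a main executable that merely references `_PyRuntime` contributes no entry for it, so a lookup for that name can only be answered by the binary that defines it.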