Metal ion binding dataset #2

empyriumz opened this issue Nov 17, 2022 · 19 comments

@empyriumz

Hi there,

Nice work!
I have a question about the metal ion binding dataset used in your paper.
Could you let me know where you got the original dataset?

Thanks!

@elttaes
Owner

elttaes commented Nov 18, 2022

Hi, empyriumz:

The metal ion binding dataset was collected from the PDB (https://www.rcsb.org/). If a protein has any metal ion binding site, we set its label to 1.

@empyriumz
Author

empyriumz commented Nov 18, 2022

Thanks for your reply!
To clarify, I tried searching the PDB for metal ion binding with these two queries:
[image]
or
[image]

Both return 87,669 entries.
Did you run similar queries to compile the dataset?

@elttaes
Owner

elttaes commented Nov 18, 2022

We wrote a crawler to crawl the annotations of each PDB protein. Do you need the original dataset we collected?

@empyriumz
Author

By original dataset, do you mean all the PDB files? That would be too large, I guess, so could you share the script used to search and annotate the PDB entries?
Thanks!

@elttaes
Owner

elttaes commented Nov 24, 2022

I am very sorry, but the classmates who wrote the crawler are not on the author list and are unwilling to share it with us. They have jobs now and plan to release the relevant dataset themselves. I can notify you once their paper is out.

However, I can give you a simple snippet that checks whether a page contains a keyword. It may help you.

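import urllib.request
from bs4 import BeautifulSoup

# Fetch the RCSB annotations page for one PDB entry
url = 'https://www.rcsb.org/annotations/2XEV'
req = urllib.request.Request(url=url)
content = urllib.request.urlopen(req).read()
content = content.decode('utf-8')
# Collect every text node that is exactly 'metal ion binding'
soup = BeautifulSoup(content, "html.parser")
tag = soup.find_all(string='metal ion binding')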

If the page does not contain 'metal ion binding', the code returns an empty list.
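
For anyone who wants to turn the keyword check into labels for many entries, here is a minimal sketch using only the Python standard library. It is not the original crawler. The helper names (`has_keyword`, `fetch_annotations_html`, `label_entries`) are hypothetical, it compares whitespace-stripped text nodes against the keyword, and it assumes the annotation text is present in the HTML the server returns rather than being filled in later by JavaScript.

```python
from html.parser import HTMLParser
import urllib.request


class _TextCollector(HTMLParser):
    """Collects the stripped text of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def has_keyword(html, keyword="metal ion binding"):
    """Return True if any text node equals the keyword exactly."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return keyword in collector.texts


def fetch_annotations_html(pdb_id):
    """Download the RCSB annotations page for one PDB ID (needs network access)."""
    url = f"https://www.rcsb.org/annotations/{pdb_id}"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def label_entries(pdb_ids, fetch=fetch_annotations_html):
    """Map each PDB ID to 1 if its page contains the keyword, else 0."""
    return {pdb_id: int(has_keyword(fetch(pdb_id))) for pdb_id in pdb_ids}
```

Calling `label_entries(["2XEV"])` would fetch the page from the snippet above. The `fetch` argument is there so the labelling logic can be tried on saved HTML without hitting the site.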

@Violet969

Hi, I am trying to use your metal AlphaFold code to predict other protein features, but I see that the code takes pkl data as input. Could you tell me how you generate the pkl files? Thanks!

@elttaes
Owner

elttaes commented Nov 25, 2022

Hi Violet969,

The pkl file contains the MSA and template information.
For the related code, see lines 172-174 of https://github.com/deepmind/alphafold/blob/main/run_alphafold.py. data_pipeline.process takes a FASTA file as input and returns the MSA and template features that are saved to the pkl.

feature_dict = data_pipeline.process(
    input_fasta_path=fasta_path,
    msa_output_dir=msa_output_dir)

For details of the pkl contents, see pages 8-9 of the AlphaFold paper's supplementary information.

I have already released the MSAs at https://drive.google.com/drive/folders/1iShEW8NcMIlWqxTRgsEaI_t5ahoHsixt?usp=share_link

To generate the pkl files, though, you may need to modify run_alphafold.py slightly. I can upload this part of the preprocessing code later.
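
Until that preprocessing code is available, the save and load step can be sketched roughly as follows. This assumes the pkl is simply the pickled feature dictionary, which is how run_alphafold.py writes its features.pkl. The helper names and the tiny `example_features` dict are hypothetical placeholders; the real dictionary returned by data_pipeline.process holds NumPy arrays.

```python
import os
import pickle


def save_features(feature_dict, output_dir, filename="features.pkl"):
    """Pickle a feature dictionary into output_dir and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "wb") as f:
        pickle.dump(feature_dict, f, protocol=4)
    return path


def load_features(path):
    """Load a feature dictionary back from a pkl file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder standing in for the output of data_pipeline.process (assumption).
example_features = {"msa": [[0, 1, 2], [0, 1, 3]], "num_alignments": 2}
```

In practice you would pass the real `feature_dict` to `save_features` in place of the placeholder.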

@empyriumz
Author

I see, thanks for the sample code! I'll check whether its results match my earlier search.

@Violet969

Thanks for your answer. I have another question: I saw that you use Evoformer and ESM to predict protein secondary structure (SS), but I can't find that code. Will you share it?

@elttaes
Owner

elttaes commented Dec 7, 2022

Sure, I will upload this part of the code later.

@elttaes
Owner

elttaes commented Dec 19, 2022

Hi Violet969,
The secondary-structure code and the code that generates a pkl from an a3m file have been uploaded to the Structure and Data folders. If you have any questions, feel free to contact me.

@Violet969

I see, thanks for the answer. I tried merge_msa.py but it didn't work. Could you show me an example of how to use it?

@elttaes
Owner

elttaes commented Dec 21, 2022

Hi,
I have added an example; please have a look at the latest code.

@Violet969

Thanks for your answer. I would also like an example of how to run metal/alphafold/train.py. Could you share one?

@elttaes
Owner

elttaes commented Dec 23, 2022

Now you should be able to run train.py directly with a few simple modifications. Please make sure you have configured the AlphaFold runtime environment.

In addition, the current AlphaFold parameter format seems to differ from the earlier one, so you may need to find the previously released parameter file.

@Violet969

Thanks for your reply. I tried to run train.py on my server, but I always get an error like this:

2023-01-01 07:07:23.507834: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: INTERNAL: Failed to allocate 50331648 bytes for new constant
Traceback (most recent call last):
File "train.py", line 264, in <module>
app.run(main)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "train.py", line 216, in main
state = jax.pmap(updater.init)(rng_pmap, data)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/api.py", line 2158, in cache_miss
out_tree, out_flat = f_pmapped(*args, **kwargs)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/api.py", line 2034, in pmap_f
out = pxla.xla_pmap(
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2022, in bind
return map_bind(self, fun, *args, **params)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2054, in map_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2025, in process
return trace.process_map(self, fun, tracers, params)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 687, in process_call
return primitive.impl(f, *tracers, **params)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/interpreters/pxla.py", line 841, in xla_pmap_impl
return compiled_fun(*args)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/profiler.py", line 294, in wrapper
return func(*args, **kwargs)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/interpreters/pxla.py", line 1656, in __call__
out_bufs = self.xla_executable.execute_sharded_on_local_devices(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to allocate 50331648 bytes for new constant: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "train.py", line 264, in <module>
app.run(main)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "train.py", line 216, in main
state = jax.pmap(updater.init)(rng_pmap, data)
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to allocate 50331648 bytes for new constant: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).

I have 8 nodes with 12 GB GPUs and 125 GB of memory. Can you tell me how to solve it?

@elttaes
Owner

elttaes commented Jan 1, 2023

I tested this code on an A40 (48 GB) server and it works.
You can try setting os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '2', or a lower value, to reduce memory usage.
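
A minimal sketch of how such a setting is usually applied: JAX reads these variables when its GPU backend starts, so they need to be set before jax is first imported or used. The helper name `configure_xla_memory` and its defaults are hypothetical. As far as I know, a fraction above 1 only has an effect together with TF_FORCE_UNIFIED_MEMORY=1 (AlphaFold's Docker launcher pairs that with a fraction of 4.0), which lets allocations spill into host memory, whereas a value below 1 limits the share of GPU memory JAX preallocates. Whether train.py already sets any of these is not stated in the thread.

```python
import os


def configure_xla_memory(mem_fraction="0.5", preallocate=True, unified_memory=False):
    """Set JAX/XLA GPU memory variables. Call this before importing jax."""
    # Fraction of GPU memory the XLA client may reserve.
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = str(mem_fraction)
    # "false" makes JAX allocate on demand instead of reserving memory up front.
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "true" if preallocate else "false"
    if unified_memory:
        # Allows fractions above 1 by letting allocations use host memory.
        os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"
    names = (
        "XLA_PYTHON_CLIENT_MEM_FRACTION",
        "XLA_PYTHON_CLIENT_PREALLOCATE",
        "TF_FORCE_UNIFIED_MEMORY",
    )
    return {name: os.environ[name] for name in names if name in os.environ}
```

For example, `configure_xla_memory("2", unified_memory=True)` at the very top of the training script mirrors the value suggested above.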

@Violet969

Thanks for the quick reply. Setting os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '2' works.
But I ran into another error:

Traceback (most recent call last):
  File "train.py", line 269, in <module>
    app.run(main)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train.py", line 233, in main
    state, metrics = updater.update(state, data)
  File "train.py", line 176, in update
    if step % self._checkpoint_every_n == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Can you tell me how to solve it?

@elttaes
Owner

elttaes commented Jan 1, 2023

Delete the './tmp' folder.
