-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MD simulation extension #218
base: upcoming-2.0.0
Are you sure you want to change the base?
Conversation
Thank you for the PR! so basically you do:
In case anything goes south in the process: first create a backup branch and you can always I'll be on a working trip next week but try to take a look as soon as the PR is updated. |
- add missed scopes, coordinate attribute - correct description of the triclinic simulation box - remove irrelevant simex and lammps references
a89ef2e
to
dfe2e03
Compare
@ax3l Thanks. I've followed your instruction and squashed the commits. Please check that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry for the delay.
This looks wonderful, great work! I added a round of questions and formatting suggestions.
The following additional attributes are defined in this extension. | ||
The individual requirement is given in `scope`. | ||
|
||
- `forceField` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for forceField
and forceFieldParameters
, do you want to define the syntax of the values further or keep values as a human-readable-only free text?
You can check out the SpeciesType
extension for ideas how to define syntax and alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ideal case is that there could be a well-defined syntax to describe the forceField, and to be read by MD simulation code directly. But in practice, for different forceFields, there will be different parameters. Like the LJ potential and EAM potential, the parameter syntax are different.
Since I cannot understand all the potentials used in MD simulation, maybe I can start with defining the syntax from some potentials I understand? If it's other potential, just make it a human-readable-only free text.
Should we also ask the opinion of some other experts about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, feel free to reach out. Otherwise you can also just keep it undefined for now.
You can always add more attributes in openPMD and if you arrive at something that is worth standardizing for those attributes, then we can just specify them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't particularly like this constraint in the name. ;)
Consider a case where one wishes to find the ground state of an atomic configuration. This is not a forceField
but just a method
. Would this make sense to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think @JunCEEE? Sounds reasonable to me, if storing the used method is the intent for this attribute :)
Note that if we use a very general attribute name like method
, we need to add a better description to compensate what we mean :)
- description: the representation of the stored position coordinates | ||
- available values: | ||
- `absolute` the unscaled coordinates | ||
- `fractional` the coordinates that are scaled in the range of [0,1] relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fractional, do you intentionally include the upper value? (E.g. [0.0, 1.0]
vs. [0.0, 1.0)
) Just double-checking.
Formatting suggestion:
- `fractional` the coordinates that are scaled in the range of [0,1] relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0` | |
- `fractional` the coordinates that are scaled in the range of `[0.0, 1.0]` relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think it deeply. But intuitively, for the atom located at the edge corner of the simulation box, it's fractional coordinate is [1.0,1.0,1.0]. Thus I set the range to be [0.0,1.0].
Please correct me if there are some occasions I missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I am just wondering about double-counting @JunCEEE - when a particle belongs to one box or another. But maybe I misunderstood and there is only one (simulation) box here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @JunCEEE here. It may be valuable to have an atom at the box-edge on one side and not the other. I guess the code should it-self check for mirrored positions. Especially considering that not all directions need to be periodic [0 ; 1]
seems like the correct choice here.
|
||
### Records for Sub-Group `box` | ||
|
||
The following records are defined in this extension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to store these small data sets as records
or as attributes
? Both is fine, just be aware that attributes can only be scalars or 1D-arrays, due to limitations in some file formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would like to use records
here.
Thanks Axel. I've replied to your questions. Do we need more suggestions from experts in MD simulation? |
VC meeting today:
Action items:
|
@CFGrote I realized that the information of scattering factors and electronic charges related is missed in the standard. when I drafted the standard, I only considered the output of traditional/classical MD simulations, where the scattering process was absent. I am not sure whether we add it to this standard or have a different one. Any suggestions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comments, I come from the DFT community and would like to explore whether OpenPMD could be of use to us!
For that this PR seems very much needed.
EXT_MD.md
Outdated
|
||
The following attributes are defined in this extension. The individual requirement is given in `scope`. | ||
|
||
- `coordinate` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the regular openmpd scheme coordinates are labelled position
. Is there need to introduce another naming convention?
Wouldn't it be better to sub-inherit the position
spec here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the coordinate
attribute is a string indicating if the position
records are in absolute coordinates or fractional coordinates. It's not a replacement/equivalent of position
.
Maybe the name is a little confusing. We replace it with a bool attribute, fractionalCoordinate
. Do you think it would be clearer to be with the bool attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it relates directly to position
, then why not positionFormat
? One could also imagine scaled coordinates according to some lattice parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like a good suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EXT_MD.md
Outdated
- description: the boundary condition of the box along each dimension. The allowed values in `boundary` are either **periodic** or **none**. | ||
- example values: | ||
- `["periodic","periodic","periodic"]` periodic in all the three dimensions | ||
- `["none","periodic","periodic"]` periodic in only *y* and *z* dimensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might I suggest we add dirichlet
and neumann
for a more complete basis?.
Also, I would refrain from using y and z as specifiers. When the box
is a (3, 3)
D=3 one may have a skewed box where then the boundary refers to the box-boundaries which are not necessarily Cartesian directions. As below the triclinic box is suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestions. I'll add the two conditions. Also, the other dimensions will be referred to as the second and third dimensions in the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `[[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]]` 3D cubic simulation box, Ax = (1.,0.,0.), Ay = (0.,1.,0.), Az = (0.,0.,1.) | ||
- `[[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]]` 3D triclinic simulation box, Ax = (3.46,0.,0.), Ay = (1.73,2.997,0.), Az = (0.,0.,10.) | ||
- `[[1.,0.],[1.,1.]]` 2D rectangle simulation box, Ax = (1.,0.), Ay = (0.,1.) | ||
- `limit` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the difference between limit
and edge
they are both required?
Why is limit
needed? Could you make an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edge
is a collection of the edge direction vector of each dimension of the box. It basically defines the direction of each edge/dimension of the box. edge=[[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]]
means a 3D orthorhombic (instead of cubic, my fault.) simulation box. edge= [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]]
means a 3D triclinic simulation box.
limit
defines the starting and ending point on each edge of the box. It basically defines the size and position of the box.
Maybe, still, the naming is a bit confusing. It would be nice if you could suggest something better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand! ;) You write exactly what the document says.
Why isn't edge
fully explaining the box
? What does limit
tell you that edge
doesn't?
Could you make a picture (2D I guess would suffice)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. what is the relation between edge
and limit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for asking. Actually, I just realized the problem of this definition.
The idea was inspired by xhi
and xlo
in https://docs.lammps.org/create_box.html. However, I didn't really implement the iead clearly in this definition. What I really need (as shown in the graph) is actually edge
for length and direction and origin
instead of limit
for the origin of the box in a cartesian coordinate. Please let me know how you think if we implement it in this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I think we should replace edge
with vectors
.
In addition to that, I'd also introduce a lengths
keyword to define the length of box edges to ease the defining and changing of box sizes (vectors
will only represent the direction.). Considering a box with vectors = [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]]
, it's not straightforward to change the box size to 5x5x5 using only vectors representing both directions and lengths. With the lengths
keyword, this will be more straightforward and practical.
It will be like this:
https://github.com/JunCEEE/openPMD-standard/blob/85369d1dd4119bd27413de90a64caccaac604520/EXT_MD.md?plain=1#L125-L139
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think length
should be there at all. The vectors
should contain the length of the vectors.
I don't understand your:
Considering a box with vectors = [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]], it's not straightforward to change the box size to 5x5x5 using only vectors representing both directions and lengths. With the lengths keyword, this will be more straightforward and practical.
I really have no clue what you mean here? Do you just want each vector to have a length of 5?
One parameter is more than enough to cover all cases.
My problem with your proposal is that it requires users to do this: 1) read vectors, 2) normalize vectors, 3) read lengths, 4) scale vectors
With only 1 parameter you just read it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. lengths
is intended to help users change the box size easily: only change the lengths
without re-calculate the vectors, and sizes are changed more frequently than vector directions. But surely your comment also makes sense. The implementation about being user-friendly can also be done on the user interface level. If we follow the principle of storing essential information, we should set only one parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think this should be a data format. Not a convenience format for simulators. :) So keeping things simple is much more important to me ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like you arrived at a solution :)
Hello @zerothi! Welcome! Your comments would be important for us to see if this MD standard can also fulfill the need of DFT community. |
I would be happy to review a bit more. Is there something in particular that needs attention? |
|
||
- `positionFormat` | ||
- type: *(string)* | ||
- scope: *required* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, another ping here ;)
Since generally position
is considered absolute
wouldn't it make sense to make this optional, thus defaulting to absolute
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a good idea for a program that writes in this format. However, as a data format, I feel it's better to make the positionFormat
clear in metadata to ensure the data is self-described. Thus I tend to keep it required.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what the OpenPMD community has of rules for these somewhat optional things. I don't mind having it required, it is just weird given that the OpenPMD specification without the MD simulation extension does not have this. If anything this format shouldn't belong to the MD extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if I understand correctly, but there is indeed the scope
with optional
in the "PIC extension" which I take as a template. And there are actually some records belonging to the topic of MD simulation but not essential (e.g. observables
).
Maybe @ax3l can comment on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for thinking about this!
What I understand is that this attribute tries to modify how we interpret position records. While we generally can do such modifications in extensions, they might be a bit tricky to keep neatly compatible with the base standard.
For the specific task here, it looks to me like you try to achieve a "fine" and "coarse" position, similar to what we do in some particle-in-cell codes. For this purpose, the base standard has already he "position" and "positionOffset" records defined. Could we potentially use exactly that mechanism here or does it differ?
As an example, in PIConGPU, we defined positionOffset
to a cell index, scaled via unitSI
back to m
. Then we use position
with values between [0,1)
to store the fine-grained position in a cell. The unitSI
for position
can be a different value than the one for positionOffset
.
For data sets that do not need this splitting in fine and coarse positions, we define positionOffset
to zero values (constant record).
|
||
### Additional Sub-Group for each Particle Species | ||
|
||
`box` is an *optional* sub-group for each particle species to contain the information of the simulation box. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the box optional for each particle species? I mean the box is globally defined for the entire simulation box. Not by each particle specie?
Implements issue: #212 @CFGrote
Description
What is introduced, removed or renamed and why?
Example
Affected Components
EXT-MD
Writer Changes
What would a writer need to change?
observables
as a group under eachIteration
, in the same level ofparticles
to describe the thermodynamic parameters of the whole atomic system.Does this pull request change the interpretation of existing data writers?
Reader Changes
What would a reader need to change? Link implementation examples!
observables
as a group under eachIteration
, in the same level ofparticles
to describe the thermodynamic parameters of the whole atomic system.Example data: https://github.com/ejcjason/MDDomainExtension/blob/master/dataMD.h5