💡 Discussion on vocabulary for data #45

Open
RoteKekse opened this issue Feb 5, 2024 · 18 comments
Labels
discussion This needs more discussion

Comments

@RoteKekse
Contributor

RoteKekse commented Feb 5, 2024

Description

Dear NFDI4Cat,

I work at HZB on catalysis and use NOMAD as our software/ELN to describe catalytic data.

I met David Linke last Wednesday and would like to discuss whether it is worth adding terms that describe the data.

In an ontology there is usually a distinction between a specific dependent quality, for example selectivity (of a product), and a data point, for example 50%. Selectivity is a property a catalyst always has when probed in a specific catalytic experiment under specific conditions. Data usually also have a unit or are dimensionless.
The data point(s) could be the output of a measurement, or the result of a simulation or a calculation. And there can be multiple of those, e.g. the experiment could be conducted under the same conditions multiple times on different days.

For selectivity one could argue that any catalyst has a selectivity for any possible product (although it might quite often be 0). The fact that it has one does not depend on any data; it is just something it has.

Data then reveals how this quality could be quantified in a specific situation.

This distinction is also present in the ontology work such as in the OBI.

I find this distinction quite helpful since it separates the data from the quality. A more practical distinction would be the mass of a human. Any human has a mass. The fact that they have a mass is fixed and a quality of every human. That mass then can be measured and data is produced, could be 80kg, 100000g or others.

For me as a data steward this makes life a lot easier, and I have already applied it in the context of solar cells.

What do you think :)?

@schumannj, @dalito @HendrikBorgelt @AleSteB

All the best
Micha

@HendrikBorgelt
Member

Hi Micha,

while I accept that selectivity is a property that every catalyst has and that can be uniquely attributed to a catalyst, using selectivity in the same sense as weight is quite difficult. For weight we might ignore that it must be measured at a certain time, since it is mostly unimportant whether a person is around 100 kg today and was 99.8 kg yesterday; it is still around 100 kg. We cannot do so with selectivity. Selectivity is a non-unary characteristic: it always depends on the given educts and the respective product to which it refers, so "selectivity (of a product for a given educt)". Measuring the selectivity of a deNOx catalyst such as a car catalyst also depends on the fuel we use, not only on whether we measure NO2 or N2O2. And even then, the selectivity still depends on the whole set of reaction conditions (e.g. cold-start selectivity vs. set-point selectivity). If you are familiar with I-ADOPT, this picture, in which I tried to model the data content of https://nfdirepo.fokus.fraunhofer.de/dataset.xhtml?persistentId=doi:10.82207/FK2/NR5BWO via I-ADOPT, might help you understand the complexity of modeling a selectivity measurement in a simple data set.
[Image: Blank diagram - Page 1]

Maybe it is best not to focus on a highly composite term like selectivity (it is basically not one parameter but several, combined to be more easily comparable) and instead to focus on more straightforward terms like reaction enthalpy, or even to decompose a term like selectivity into more manageable terms like conversion and yield.

But I would also like to hear your suggestion as to how you either want to handle the complexity of a selectivity or examples in which you want to use the term selectivity without struggling with such things.

all the best
Hendrik

@RoteKekse
Contributor Author

RoteKekse commented Feb 6, 2024

Hey Hendrik,

thanks for your answer :). I think we need to boil it down to the point where it is like the mass of a person. The data point is the output of a measurement, and therefore the context such as time, reactants, temperature profile etc. is given. If we have a term like selectivity measurement datum, it can then have, in an ontology sense, "children" such as selectivity of methane measurement datum. The instances of such a thing are then the specific values of a specific measurement. By the way, for the mass of a human it is not that much different: it also depends on temperature, time, air pressure and much more; it just feels easy because we know what it is a quality of. The question remains what selectivity is a quality of. Below I would say that selectivity/selectivities are a quality of a chemical reaction.

Let's be concrete. Since you asked for an example, I will use highlighted notation to indicate triple stores:
Let's say we have a catalytic measurement. The catalytic measurement has specific inputs temperature setting datum, pressure setting datum, ..., which are data points and are specifications of process temperature, process pressure, ... Note that the data points could easily be a profile over time. The catalytic measurement contains catalytic reaction. The catalytic reaction is a chemical reaction and has reactant some chemicals, e.g. CO, has products some chemicals, e.g. methane, and has catalyst some catalyst.
Then conversion is a quality of each reactant, while the conversion measurement datum is output of the catalytic measurement.
Selectivity is a bit more tricky, since it is a quality of a ratio of reactants and products. So selectivity is a quality of chemical reaction, while the selectivity measurement datum is output of the whole catalytic measurement.
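
To make the statements above easier to check, here is a minimal sketch that writes them down as plain subject/predicate/object tuples in Python. The term and relation names are copied from this comment and are only illustrative; they are not official voc4cat, OBI or IAO identifiers, and the `match` helper is a hypothetical stand-in for a real triple-store query.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative triples taken from the prose above (labels are assumptions,
# not identifiers from any published ontology).
TRIPLES: Set[Triple] = {
    ("catalytic measurement", "has specific input", "temperature setting datum"),
    ("catalytic measurement", "has specific input", "pressure setting datum"),
    ("temperature setting datum", "is specification of", "process temperature"),
    ("pressure setting datum", "is specification of", "process pressure"),
    ("catalytic measurement", "contains", "catalytic reaction"),
    ("catalytic reaction", "is a", "chemical reaction"),
    ("catalytic reaction", "has reactant", "CO"),
    ("catalytic reaction", "has product", "methane"),
    ("catalytic reaction", "has catalyst", "catalyst"),
    ("conversion", "is quality of", "reactant"),
    ("selectivity", "is quality of", "chemical reaction"),
    ("conversion measurement datum", "is output of", "catalytic measurement"),
    ("selectivity measurement datum", "is output of", "catalytic measurement"),
}


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which data are outputs of the measurement, and what is selectivity a quality of?
outputs = [s for s, _, _ in match(p="is output of", o="catalytic measurement")]
bearer = match(s="selectivity", p="is quality of")
```

The point of the sketch is only that the quality (`selectivity`) and the data (`selectivity measurement datum`) end up as separate nodes with different relations.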

This vocabulary could be used quite nicely to annotate data in NOMAD. I still find it quite helpful to separate qualities like mass from the data, because then you can think about what mass is as a quality and, at another point, about where the data comes from, e.g. a measurement.

Obviously one could use one term for both data and quality, but then I always wonder: are you referring to selectivity as a property of a reaction, or to a numerical value of a specific measurement?

What do you think, best Micha

@schumannj
Contributor

Hi Hendrik and all,
I think there are 2 things getting mixed up in the discussion. If I understood correctly, Micha asked whether we need/want to define a separate 'measurement datum' concept for already defined concepts, to annotate our data entries in a database. So for me the question is whether we need to define "conversion" and "conversion measurement datum", "reaction temperature" and "reaction temperature measurement datum", and then also "reaction temperature setting datum". Sometimes we might even have another one, "reaction temperature calibrated setting datum", e.g. when we know we have to set the heater to a different temperature to achieve the desired temperature in a reactor. Or maybe this could be simplified in the vocabulary by just defining "measurement datum", "setting datum", etc., and these are concepts associated with the main qualitative concept such as conversion, reaction temperature, selectivity, yield, ...

The 2nd point of discussion, how we define selectivity, is separate from the original post, I think. Maybe we should open a separate post about this? I don't think it is an option to not define selectivity in a catalysis vocabulary. The currently existing definition for selectivity is not general enough and does not capture what I need to annotate the data in my datasets: "In photocatalytic processes where product formation is expected, selectivity describes the ability of a photocatalyst to produce only the desired product with minimal (or none) of byproducts." The definition I suggest, "A property of a product that refers to the ratio of products obtained from given reactants.", is more data focused. The definitions could be extended by the formula "S(product) = amount of product / sum over all products" and include the ambiguity that sometimes products are assigned special weight factors depending on the number of reactant molecules that go into each product.
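
As a rough illustration of the formula above, the following sketch computes S(product) = amount of product / sum over all products, with optional per-product weight factors. The function name, the example amounts and the choice of carbon atoms per molecule as the weight are assumptions made for illustration; they are not part of any agreed voc4cat definition.

```python
from typing import Dict, Optional


def selectivities(
    amounts: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return S(product) = weighted amount of product / sum over all products.

    Products without an explicit weight factor get a weight of 1.
    """
    weights = weights or {}
    weighted = {p: n * weights.get(p, 1.0) for p, n in amounts.items()}
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("no product detected; selectivity is undefined")
    return {p: w / total for p, w in weighted.items()}


# Hypothetical product amounts in mol (illustration only, not measured data).
amounts = {"CH4": 3.0, "C2H6": 1.0}

# Unweighted: every product molecule counts equally.
plain = selectivities(amounts)

# Weighted by carbon atoms per product molecule (an assumed weighting choice).
carbon_based = selectivities(amounts, weights={"CH4": 1, "C2H6": 2})
```

With these example amounts the same data give 75 % methane selectivity unweighted and 60 % when weighted by carbon, which is exactly the ambiguity a definition would need to mention.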

@dalito
Member

dalito commented Feb 6, 2024

I agree to separate the discussion on the definition for selectivity (to be continued in #46) from how to model data (to be discussed here).

@schumannj - Do you want to copy/move your 2nd point to the new issue?

@schumannj
Contributor

Yes, thanks David, I copied my last paragraph into the new issue #46.

@HendrikBorgelt
Member

Hi @schumannj and @RoteKekse and all others,

sorry if I misunderstood the selectivity (of a product), I think I was just interpreting it badly. I agree that we should have something like a selectivity measurement datum, however, I would still recommend against a direct reaction selectivity measurement datum as to my knowledge selectivity in a chemical sense is never directly measured but always computed (correct me if I am wrong and don't know the right measurement technique). This also eliminates the question of whether an entity has a selectivity or not and therefore reduces the complexity of describing the selectivity. I know that most "measurements" in the field of chemistry directly reduce their data into something like a selectivity, however, I don't know if this should be best practice and if we thus should model it in such a way, when describing data. Unfortunately, I have to admit that I work relatively little on real data in the context of selectivity determinations. I would therefore recommend that you simply send me another comment as to whether my suggestion makes sense or not.

With regard to @schumannj's point, I think we should define all those measurement datum terms, since the class "conversion", for example, might just be used in a thesaurus sense, while the class "conversion measurement datum" is the only one allowed to have individuals which link to the data of the measurement. The question for me is more where the pure application level begins and where our direct ontology ends. I recommend reading this design pattern and the corresponding example, maybe that will help. As you can see in the links, the 4Chem data would just be modeled.

Regarding the difficulties in finding a definition for selectivity, I agree.

@dalito
Member

dalito commented Feb 6, 2024

Even after the split there may still be two discussion topics here:

  • How to model catalysis data?
  • Which terms should voc4cat provide "for data"?

Regarding the latter:

  • First, voc4cat should provide at least all definitions for the terms used in a data model. Most important is that voc4cat has at least all terms for a normalized or narrow data model in the RDMS world, or for the TBox in the ontology world.
  • Second, voc4cat may provide more definitions/terms which are not strictly necessary for a data model but are established terms in the (sub-)domains. For example "catalytic reactor" is kind of redundant, if reactor and catalytic reaction are defined.
  • Third, voc4cat may provide collections of terms for various purposes (in applications), e.g. we could make a catalyst performance measure collection or a synthesis method collection etc. Note that one term can be part of multiple collections and that collections can be part of other collections in SKOS.
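
The third point can be sketched in a few lines of Python: a term may sit in several collections, and a collection may contain other collections. The collection names and their members below are hypothetical examples, not existing voc4cat collections, and a plain dictionary stands in for SKOS collection membership.

```python
from typing import Dict, List, Optional, Set

# Hypothetical collections: members are either terms or other collections.
COLLECTIONS: Dict[str, Set[str]] = {
    "catalyst performance measures": {"conversion", "selectivity", "yield"},
    "reaction conditions": {"reaction temperature"},
    "catalytic test annotation": {
        "catalyst performance measures",
        "reaction conditions",
        "selectivity",
    },
}


def expand(name: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Return all terms in a collection, following nested collections."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against accidental cycles
        return set()
    seen.add(name)
    terms: Set[str] = set()
    for member in COLLECTIONS[name]:
        if member in COLLECTIONS:
            terms |= expand(member, seen)
        else:
            terms.add(member)
    return terms


def collections_of(term: str) -> List[str]:
    """Return the collections that list the term as a direct member."""
    return sorted(name for name, members in COLLECTIONS.items() if term in members)


all_terms = expand("catalytic test annotation")
homes = collections_of("selectivity")
```

Here `selectivity` is a direct member of two collections, and expanding the outer collection yields the union of its own terms and those of the nested ones.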

On "datum" in the previous messages: You seem to be talking about something different from a location, as in "a datum is a reference point, surface, or axis on an object against which measurements are made" (https://en.wikipedia.org/wiki/Datum_reference). What do you mean by datum?

@HendrikBorgelt
Member

In the OBI and IAO, to which @RoteKekse and @schumannj or at least I am referring, a datum is just the singular of a data item.
So I hope that we all speak of the same idea of datum as also found in the Wikipedia article of data:

In common usage data (US: /ˈdætə/; UK: /ˈdeɪtə/) is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data.

@HendrikBorgelt
Member

Regarding the question of whether we need to split the discussion further, I think splitting is not necessary, as I think the two questions cannot be answered independently of each other.

If we want to provide an OBI/IAO-esque approach to providing data we are "stuck" with the general terms and ideas provided via both Ontologies. If we do not want to align your vocabulary (at least too much) to those ontologies we might be able to answer both questions separately.

@dalito dalito changed the title 💡 Discussion on vocabluary for data 💡 Discussion on vocabulary for data Feb 6, 2024
@RoteKekse
Contributor Author

Hey Hendrik,
In the OBI world there is a distinction between a datum and a measurement datum. The first is the output of a data transformation, while the second is the output of an assay. I am totally fine with selectivity data not being measured but being the output of a data transformation. Nevertheless, selectivity can be a quality of a reaction.

The same is true for the efficiency of a solar cell. It is calculated from a measurement but is a quality of a solar cell.
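
A minimal sketch of this distinction, assuming a deliberately simplified data model: each datum records whether it is the output of an assay or of a data transformation, and a derived datum keeps links to its inputs. The class and field names are hypothetical and only loosely inspired by OBI/IAO; the amounts are made-up example values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Datum:
    quality: str        # the quality the value is about, e.g. "amount"
    bearer: str         # what the quality belongs to, e.g. "CH4"
    value: float
    unit: str
    output_of: str      # "assay" or "data transformation"
    derived_from: Tuple["Datum", ...] = ()


def is_measurement_datum(d: Datum) -> bool:
    """A measurement datum is, in this sketch, any datum produced by an assay."""
    return d.output_of == "assay"


def selectivity_datum(product: Datum, all_products: Tuple[Datum, ...]) -> Datum:
    """Derive a selectivity datum from measured product amounts."""
    total = sum(d.value for d in all_products)
    return Datum(
        quality="selectivity",
        bearer=product.bearer,
        value=product.value / total,
        unit="1",  # dimensionless
        output_of="data transformation",
        derived_from=all_products,
    )


# Hypothetical measured amounts (illustration only).
ch4 = Datum("amount", "CH4", 3.0, "mol", "assay")
c2h6 = Datum("amount", "C2H6", 1.0, "mol", "assay")
s_ch4 = selectivity_datum(ch4, (ch4, c2h6))
```

In this sketch `s_ch4` is a datum but not a measurement datum, while the quality it is about (selectivity) is unaffected by how the value was obtained.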

I would be happy if we could include these distinctions in voc4cat :). I find the initiative very helpful and was happy to learn about it last week. I think this can be a big step towards interoperability in catalysis! Julia and I could then try annotating data in NOMAD to test the vocabulary in an ELN setting.

best Micha

@dalito
Member

dalito commented Feb 7, 2024

I agree that selectivity is most often an output of a data transformation. However, you can get devices that output selectivity data directly and do the transformation internally. So selectivity is not fundamentally different from other measures. Temperature is also the result of a transformation, e.g. from voltage to Kelvin. The difference is that in one case you trust the reading as the result of an "external" transformation, and in the other case you include the transformation in the data model. Both cases are relevant.

@RoteKekse
Contributor Author

Hey David, you are correct. For us it is important to know where to draw a line for now (we can extend it if needed). We do not need to describe reality in its fullest (which is not possible anyhow :D), and we need to compromise somewhere. For catalysis research I think it is fine to treat temperature as a measurement (it might not be if you are a control engineer); for selectivity that might be different. But if not, we can just treat selectivity as a measurement datum for now and refine later if needed.

This process is never finished and we have always the option to refine if needed.

@dalito
Member

dalito commented Feb 7, 2024

Yes, be pragmatic first. ...but also be aware of the simplifications you make. I am looking forward to hearing how well the NOMAD ELN integration goes.

@dalito dalito added the discussion This needs more discussion label Feb 13, 2024
@dalito
Member

dalito commented Feb 13, 2024

@RoteKekse will you closely follow OBO-style modelling? I wonder because you were referring to their ontologies mainly.

@HendrikBorgelt is leaning towards I-ADOPT and DCAT. I-ADOPT should interface well with voc4cat as terminology.

It would be great to have an exchange about how well both approaches work. Maybe at or soon after Katalytiker-Tagung?

@dalito
Member

dalito commented Feb 13, 2024

Regarding the modelling of selectivity, which is a ratio of two or more measurements, you may find the example for modelling body-mass index (BMI) in OBI interesting. BMI is similar in the way that it is also derived from two measurements.

@RoteKekse
Contributor Author

RoteKekse commented Feb 14, 2024

Hey David, I read the book on BFO, and OBI is based on BFO. I felt that everything I need already exists there. And with OBI and CHMO there are already lots of terms which are relevant to us.

Yes, the example of BMI is exactly how I think about these things. But I have never explored different approaches.

@HendrikBorgelt
Member

@dalito, regarding I-ADOPT and DCAT, I am currently just trying to get something in between a data format which is unstructured but uses a defined vocabulary such as "Voc4Cat", and "an" ontology (in catalysis I think we mostly have to deal with a plethora of ontologies combined into the "local" ontology), which I think is too difficult for most scientists and even data stewards to handle. So my focus is more on how we can use a "vocabulary" such as DCAT to give scientists as well as data stewards an easy model after which they can structure their data (something like a conceptual model, which might not be perfectly depicted in this picture: https://www.researchgate.net/publication/234793155/figure/fig1/AS:595782646378496@1519057060120/More-Expressive-Semantic-Models-Enable-More-Complex-Applications.png ).

@HendrikBorgelt
Member

@RoteKekse I think the OBI/IAO approach is one of the most flexible and expressive modeling descriptions one can have; it is just quite large and therefore "unwieldy" compared to an approach such as
[Image: grafik]
from I-ADOPT. This is of course shorter, but it sometimes also causes trouble by not being flexible enough. See:
[Image: Blank diagram - Page 1 (1)]
which tries to model a high-throughput selectivity measurement of 4 different types of catalyst, as can be found here: https://nfdirepo.fokus.fraunhofer.de/dataset.xhtml?persistentId=doi:10.82207/FK2/NR5BWO


4 participants