DetailedDiscussionOfConcentrationQualityInOBI

How do we specify solute concentration in OBI using BFO + OBO-RO + PATO
There are typically two ways in which concentration is important to the experimental descriptions OBI will need to support:


 * 1) Concentration of a particular set of solutes is fed into an experiment as a part of an assay or material transformation - e.g.:
 * 2) * incubate this set of materials in a saline buffer composed of these solute concentrations
 * 3) * set the pH of your electrophysiological recording saline to X


 * 1) Concentration of a particular solute is the output of an assay - e.g.:
 * 2) * the concentration of protein in this solution
 * 3) * the concentration of glucose in this blood sample

How can we build formal descriptions for capturing concentration both as a description of a planned process/protocol, as well as the information content entity output of an assay? This must also take into account the linear reporting range of the concentration assay being used.

One of the most difficult requirements of the representational strategy we adopt is it must be able to provide a means to capture the fact that frequently concentration measurement assays will indicate the molecule in question has a concentration outside of the linear range of sensitivity for that assay. When hitting the lower bounds, If present at all, the material is at a concentration that is less than the sensitivity of the assay being used. When reaching the upper bound for the assay (typically the bound beyond which the data transformation used to interpret the information content entity output of the assay no longer provides a predictable linear relation to the concentration being measured). For instance, the standard | Coomassie Blue-based Bradford protein assay is only linear in the range of 0.1 mg - 1.5 mg protein/ml. When the data hits the upper bounds, there would likely be an additional assay performed on a diluted sample with the intension of obtaining a determination that falls within the linear range. When one hits the lower bounds (for Bradford, 0.1 mg/ml), one can only state the value is < that lower limit - not that it is zero. One would often then resort to a method whose linear range is more sensitive - e.g., the | Micro BCA assay with a sensitivity in the range of 0.5 - 20 micrograms/ml. Additional techniques such as | silver stained gels or various | immunoassays can provide determination of very specific proteins down in the nanogram range. This variety of techniques is presented in an effort to capture the everyday complexity behind even a simple determination of protein concentration in a sample. This is important for OBI to cover, since the type of assay and its concommittant sensitivity range will provide a means to determine whether two concentration values (information content entities) can actually be compared. When the values are close to the bounds of linear sensitivity of the assay, such comparisons must only be made with a modicum of caution. Some would even say to support a quantitative comparison between concentration values, the same assay should be used.

[problem summary by BB]

Participants

 * Alan Ruttenberg (AR)
 * John Westbrook (JW)
 * Bjoern Peters (BP)
 * Ryan Brinkman (RB)
 * Philippe Rocca-Serra (PRS)
 * James Malone (JM)
 * Melanie Courtot (MC)
 * Chris Mungall (CM)
 * Barry Smith (BS)
 * Bill Bug (BB)

<<Summary Discussion Of Concentration Quality In OBI <<Back to Homepage

BS: February 1, 2008 4:18:46 PM PST
I don't see how this differs from the Alan Rector solution That is, in Case 2: There is no zinc in the sample. What does 'ZincConcentration2' represent?

BB: February 1, 2008 4:31:18 PM PST
That was my concern with the current example as drawn. The definition of concentration does not REQUIRE an "of-solute" relation. It only says that when a concentration quality has a "of-solute" relation, it MUST be to a material. My sense was this was essentially equivalent to the current "ro:towards" relation used to specified "pato:concentration of" what. In other words, a Concentration MUST have a "in-solvent" relation but doesn't have to have an "of-solvent" relation. Of course, it then has NO WAY of linking back to the solute material it is intended to be reporting on.

We need to refine this representation so that it:
 * 1) Copes well with the "lacks" case (i.e., concentration less than the sensitivity of the assay used)
 * 2) Can do so in a way that makes the link back to concentration of WHAT explicit
 * 3) Doesn't lead to the explosion of "no concentration of X" qualities.

CM: February 1, 2008 5:10:06 PM PST
BB: That was my concern with the current example as drawn. The definition of concentration does not REQUIRE an "of-solute" relation. It only says that when a concentration quality has a "of-solute" relation, it MUST be to a material. My sense was this was essentially equivalent to the current "ro:towards" relation used to specified "pato:concentration of" what.

That is what I thought too

I will be the first to admit that this is a curious relation. It is in some part an artificial instance-type relation to make up for the constraint that the inheres relation must be binary.

The strangeness manifests itself at the class an instance level. In [1], is the Zinc box denoting a single molecule of Zinc? Is this a representative molecule? Or have other molecules been omitted from the diagram?

BB: In other words, a Concentration MUST have a "in-solvent" relation but doesn't have to have an "of-solute" relation. Of course, it then has NO WAY of linking back to the solute material it is intended to be reporting on.

Well it can have a universal restriction at the class level; Alan has defined a class

ZincConcentration = Concentration THAT in-solute ONLY Zinc

The instance ZincConcentration2 is presumably asserted to instantiate this class.

AR: February 1, 2008 9:32:34 PM PST
BS: I don't see how this differs from the Alan Rector solution That is, in Case 2: There is no zinc in the sample. What does 'ZincConcentration2' represent?

Don't know Alan's solution but you should notice that in the case that there is no zinc there is no instance of zinc(properly "portion of zinc"). The zero, when there is one, is an information entity about the quality. The quality itself is a bit odd, in that while it generally refers to some zinc, it might not. Finally, there is no implication that a 0 information entity says that there is in fact no zinc - this saves us from the "limit of detectibility" problem-that we might measure nothing because the instrument can't detect that little. Of course that can be tightened up for other classes (of big things)

AR: February 1, 2008 9:32:34 PM PST
CM: The strangeness manifests itself at the class an instance level. In [1], is the Zinc box denoting a single molecule of Zinc? Is this a representative molecule? Or have other molecules been omitted from the diagram?

"Zinc" should be read "portion of zinc".

CM: Alan has defined a class: ZincConcentration = Concentration THAT in-solute ONLY Zinc. The instance ZincConcentration2 is presumably asserted to instantiate this class.

Yes. The name is only a name. What it represents when there is no zinc is admittedly odd, but presumably it is something to do only with the solute. Perhaps "Solute that lacks Zinc" ;-)

BB: We need to refine this representation so that it: 1) Copes well with the "lacks" case (i.e., concentration less than the sensitivity of the assay used).

That isn't "lacks". It is "might lack". Sensitivity of the assay is in that the result of the assay is information and that the information is different from the stuff.

BB: 2) Can do so in a way that makes the link back to concentration of WHAT explicit.

Via the class relations (traversed via allValuesFrom)

BB: 3) Doesn't lead to the explosion of "no concentration of X" qualities.

There is no such quality that in some cases happens not to related to zinc.

CM: February 2, 2008 4:27:51 PM PST
AR: "Zinc" should be read "portion of zinc".</tt>

Aggregate of zinc molecules?

compare: portion of zinc vs portion of cake

CM: February 2, 2008 6:32:30 PM PST
I don't think this works

Consider:
 * SubClassOf(Zinc,Metal)
 * ZincConcentration = Concentration THAT hasSolute ONLY Zinc
 * MetalConcentration = Concentration THAT hasSolute ONLY Metal
 * => SubClassOf(ZincConcentration,MetalConcentration)

Fair enough. This seems intuitive

(we can imagine Zinc means Aggregate that hasPart only Zinc here; we also get the same inference if we use SOME).

However, if we attach our magnitudes to concentration instances (whichever way we do this: direct representation of ppm in reality, or via some information entity, measurement or the like) and rely solely on the type description as a means of capturing the actual concentration we end up with contradictions:

The zinc concentration may be low (or zero), but the overall concentration of metal may be high.

To see this: translate your first (non-zero) example to OWL then ask Pellet for metal concentrations of a certain magnitude.

There are two options: [a] accept this, and understand that "metal concentration" means concentration of *some* metal, and attach concentration instances to instances of molecule aggregates; [b] make the towards/of-solute relation always instance-type.

In detail:

[a] accept this, and understand that "metal concentration" means concentration of *some* metal, and attach concentration instances to instances of molecule aggregates

Recapitulating Alan's diagram1, but using an aggregate:


 * 1) zc1 instanceOf Conc
 * 2) zc1 has_solute z1
 * 3) z1 instanceOf Aggregate THAT hasPart ONLY Zinc
 * 4) zc1 has_magnitude [123ppm]

Fact [4] can be done via some info-entity if we like. That's an orthogonal discussion.

We *cannot* drop fact [3] in the zero-limit case and rely on the asserted membership of a ZincConcentration class for the reasons given above; We can't discriminate between zero-conc of zinc and zero-conc of metal.

We can have the zero-limit case if we allow aggregates with no parts/grains. This would be a little anti-realist.

We can define named classes X Concentration = Concentration THAT has_solute some Aggregate THAT hasPart only X. But we should name them such that it is clear we mean some Xs, not all Xs.

[b] make the towards/of-solute relation always instance-type

The 'towards' relation (or its sub-relations such as has-solute) must always be to a type, even for quality instances.

This has a few interesting consequences:
 * We need OWL Full (or DL+punning?) to represent the type ZincConcentration = Concentration THAT inheresIn VALUE Zinc
 * ZincConcentration is not a kind of MetalConcentration

The benefit of this approach is that we allow something to have an instance of both LOW ZincConc and HIGH MetalConc without contradiction.

We may be able to have a zero-magnitude ZincConc

AR: February 2, 2008 9:24:54 PM PST
CM: Consider:
 * SubClassOf(Zinc,Metal)
 * ZincConcentration = Concentration THAT hasSolute ONLY Zinc
 * MetalConcentration = Concentration THAT hasSolute ONLY Metal
 * => SubClassOf(ZincConcentration,MetalConcentration)</tt>

Reading this as a MetalConcentration is a Concentration who's solute is Metal (if anything).

Are you distinguishing the MetalConcentration from a MetalConcentration? the MetalConcentration would be the concentration any metal in the solvent, i.e. the sum of ZincConcentration, IronConcentration, SilverConcentration, etc.

a MetalConcentration would be any of the individual ones. Let's call the the case TotalMetalConcentration to make this clear.

A material might have several MetalConcentration qualities inhere in it. However it could have only 1 each of TotalMetalConcentration, ZincConcentration, IronConcentration, etc.

We can't capture that the magnitude of the TotalMetalConcentration is the sum of the individual metal concentrations in OWL. We can say that there can be only one ZincConcentration that inheres in a material.

Class(Material partial restriction(bears maxCardinality(1 ZincConcentration))) // OWL 1.1 QCR (Qualified Cardinality Restrictions)

What does this say:
 * 1) If we subclass such Concentration qualities we need to be careful about what we are saying.
 * 2) Where we have true subClasses of Concentrations we may need to have a TotalConcentration quality as well.

CM: We can imagine Zinc means Aggregate that hasPart only Zinc here; we also get the same inference if we use SOME.</tt>

I've deliberately avoided the aggregate distinction. One of the outcomes of the OBI workshop is that the Aggregate/Object/FiatObjectPart doesn't work for OBI because it is too multigranular an ontology. We resolved to erase the distinction by creating

Class(obi:Material :complete (union-of bfo:Object bfo:ObjectAggregate bfo:FiatObjectPart))

and then only subclass obi:Material (Their's more to say about this. It suggests that bfo should only have Object, and leave the granularity distinctions to the domain specific ontologies, for instance).

CM: However, if we attach our magnitudes to concentration instances (whichever way we do this: direct representation of ppm in reality, or via some information entity, measurement or the like) and rely solely on the type description as a means of capturing the actual concentration we end up with contradictions:

The zinc concentration may be low (or zero), but the overall concentration of metal may be high.</tt>

The is the part that I was having trouble tracking without the *a* versus *the* distinction. Indeed the zinc concentration may be low, and then we have that some MetalConcentration is low, but all we know is that the magnitude of TotalMetalConcentration > magnitude ZincConcentration.

No contradiction - just the need to be careful which concentration we are talking about.

CM: To see this: translate your first (non-zero) example to OWL then ask Pellet for metal concentrations of a certain magnitude.</tt>


 * ) Please share yours. It will be easier to sync. I haven't done my OWL implementation yet.

CM: There are two options: [a] accept this, and understand that "metal concentration" means concentration of *some* metal, and attach concentration instances to instances of molecule aggregates, or [b] make the towards/of-solute relation always instance-type.</tt>

I take [a]. MetalConcentration, as you defined it can only mean concentration of *some* metal. We may also want to define TotalMetalConcentration. If I'm not mistaken, TotalMetalConcentration would be a subclass of MetalConcentration as long as the interpretation of Metal is as an aggregate of molecules each of which is any element that is a Metal. ZincConcentration would be a subclass of MetalConcentration but not ofTotalMetalConcentration.

CM:
 * 1) zc1 instanceOf Conc
 * 2) zc1 has_solute z1
 * 3) z1 instanceOf Aggregate THAT hasPart ONLY Zinc
 * 4) zc1 has_magnitude [123ppm]

Fact [4] can be done via some info-entity if we like. That's an orthogonal discussion.</tt>

Agreed, sort of. If the magnitude is real then we can't have it be an information entity. However we could define a new relation between an info entity and a quality and define it to mean that the info entity is an exact representation of the quality value, rather than simply being "about". The current relation is intended to be rather looser. Was it you or Cristian who said that a measurement is necessarily wrong :) (unless you get improbably lucky).

CM: We cannot drop fact [3] in the zero-limit case and rely on the asserted membership of a ZincConcentration class for the reasons given above; We can't discriminate between zero-conc of zinc and zero-conc of metal.</tt>

Yes we can, I think. I believe that your zero-conc of metal means of total metal - a different quality.

CM: We can have the zero-limit case if we allow aggregates with no parts/grains. This would be a little anti-realist.</tt>

Unless we are anti-aggregate (as is OBI now, apparently :)

CM: We can define named classes X Concentration = Concentration THAT has_solute some Aggregate THAT hasPart only X. But we should name them such that it is clear we mean some Xs, not all Xs.</tt>

+1

CM: This has a few interesting consequences:
 * We need OWL Full (or DL+punning?) to represent the type ZincConcentration = Concentration THAT inheresIn VALUE Zinc
 * ZincConcentration is not a kind of MetalConcentration</tt>

Agree with the conclusions. I don't think it is satisfactory, even barring the OWL Full issues (which is merely annoying). The thing is that the realist position also suggests that we ground all relations by explaining how the instance level relations work, and no such explanation is offered here. I think that the instance level description I offer is a reasonable grounding, but then there is no difference between the two cases.

CM: The benefit of this approach is that we allow something to have an instance of both LOW ZincConc and HIGH MetalConc without contradiction.</tt>

Both [a] and [b] allow this.

CM: We may be able to have a zero-magnitude ZincConc.</tt>

Interestingly, I think that the interpretation of my proposed solution doesn't have a zero magnitude ZincConc. There can be concentration datums that are 0, and real magnitudes that are positive, but because there is no Zinc sometimes, the value 0 for real magnitude simply isn't available, as the ratio of solute to solvent can't be computed when there isn't any solute. Which is kind if satisfying, in a perverse (realist) kind of way.

Thanks for the fun :)

BB: February 3, 2008 3:24 AM PST
I do think this discussion is important - especially the points Chris has made regarding the problem of describing a concentration value of zero that can still point back to the specific molecular species we are trying to capture. I don't think we really solved that yet, regardless of whether we're looking with "God's eye" (counting individual molecules) or using an information entity.

However, I want to try to bring us back to the typical way in which we are trying to manipulate and represent concentration in the context of performing experiments.

There are typically two ways in which concentration is important to the experimental descriptions we need to represent:
 * 1) Concentration of a particular set of solutes is fed into an experiment as a part of an assay (e.g., incubate this set of materials in a saline buffer composed of these solute concentrations, set the pH of your electrophysiological recording saline to X, etc.)
 * 2) Concentration of a particular solute is the output of an assay (e.g., the concentration of protein in this solution, the concentration of glucose in this blood sample)

For the former, typically you use an a priori means to set concentration based on some intrinsic property of the solvent - e.g., molar wt (wt. of one Avogadro's Number [6.02 x 10e23] worth of atoms of that compound) or equivalent wt (wt. of one Avogadro's number worth of dissociated H30+ ions) of solute. The following is an exhaustive list of the means by which concentration is measured. OBI will ultimately need to support them all. Since they "mean" slightly different things (though all are essentially designed to specify a density of solutes in solution), we may need more than a Units Ontology to create a BFO-compliant ontological representation of all of these:
 * 1) Molarity = moles of solute per liter of solution
 * 2) Molality = moles of solute per kg of H2O
 * 3) Normality = H3O+ molar equivalents per liter of solution
 * 4) % by wt = % by wt. of solute compared to total wt. of solution
 * 5) % by vol = % by vol. of solute compared to total vol. of solution

To give us something concrete to work with, here's a standard chemistry problem one might be given and then asked to calculate the various concentration measures:

''A 1 Liter solution containing 571.6g of H2SO4 (sulfuric acid) has a density of 1.329g/cm3. Calculate the Molarity, Molality, Normality, % by wt, and % by vol. of this H2SO4 solution''

The properties we have are:
 * H2SO4 molecular weight = 2*(1.0079) + 32.06 + 4*(15.9994) = 98.0734 g/mole
 * H2SO4 equivalent weight = 98.0734 g/mole * 1/(2 H+/mole) = 49.0367 g/H3O+ equiv

The values for this solution are:
 * moles H2SO4 = 571.6 g/(98.0734 g/mole) = 5.828 moles
 * volume of H2O = 1 Liter - (1329g - 571.6g)*1ml/g = 0.7574 L H2O [note that 1 cm3 or cc = 1 ml]
 * Molarity of H2SO4 solution = 5.828 Moles H2SO4/1 L soln = 5.828 M
 * Molality of H2SO4 solution = moles/g H2O = 5.828 / 0.7574 kg H2O = 7.694745180882 m
 * Normality of H2SO4 solution = equiv/L = 571.6 g/49.0367 g/equiv * 1/L = 11.657 N
 * % by wt of H2SO4 solution = 571.6 g/1329 g for 1 L * 100 % = 43.01 % by wt
 * % by vol of H2SO4 solution = (1000 ml total soln - 757.4 ml H2O)/(1000 ml total soln) = 24.26 % by volume

This is just so we all have a sense of what entities we are talking about when dealing with concentration. The type of quality we are describing is a density - whether in grams or moles - described in thermodynamic terms (i.e., scaled by Avogadro's Number). Even when given in grams, its meant to be a proxy for the number of molecules per unit volume again scaled by Avogadro's Number.

These are the concentration qualities we'll need to represent when providing descriptions of assays that rely on precise concentration measures. Though this may seem like more detail than OBI needs contend with, my sense is if we cannot describe a solution creation protocol in detail, we'll not have provided what is needed to specify critical experimental detail, since without the detail of the "typical" protocol, we'll not be able to clearly identify when a specific instance of that protocol as carried out in a particular lab on a particular day may have deviated from that "typical" protocol.

If we now look at case 2 above (using an assay to measure a concentration), it's useful to look just briefly at a fairly ubiquitous assay and associated instrument used for making that measurement - absorbance spectroscopy. Basically, a sample of the solution is put into a container of controlled size made of material that doesn't absorb in the visible (e.g., a quartz cuvette), and the concentration is deduced based on the % of incident light that is transmitted to the photomultiplier detector. The theory behind this is expressed in Beers Law:

A = e x  l  x  C

where The underlying theory associated with Beers Law is best thought of from a statistical mechanical point-of-view. Basically, the absorbance will be a function of liklihood a photon is absorbed when it collides with a molecule (molar absorptivity) times the liklihood it will hit a molecule along its path. The latter quantity is represented as a path length (l) times the density of molecules along that path (C), so once again, we come back to the fact that concentration is thought of as a density.
 * A = absorbance (-log(transmitted photon power/absorbed photon power))
 * e = molar absorptivity (the extinction coefficient when I learned it in the 70's)
 * l = the path length traveled by the light through the solution (i.e., the width of the cuvette)
 * C = concentration

Despite the extensive level of automation that has been added to most lab assays in the last 20 years, absorbance spectroscopy is still a common technique (e.g., Bradford protein determination assay). Regardless, the theory behind Beers Law helps to reinforce this idea that we are dealing with a mass quantity that though explained well at the stat mech level, is always being measured at the thermodynamic level.

Most other assay methods for determining concentration work according to this same basic model. For instance, iontophoretic measurements of neurotransmitters in situ are based on an electrochemical theory of molar electron transfer, and the various means for measuring blood glucose are based on a variety of means of determining molar concentration of glucose or glucose metabolites.

The BFO+OBO-RO+PATO+OBI representation for concentration must be able to cover both of these cases - i.e., a priori protocol instructions for mixing molar amounts AND a posteriori determined via a concentration measurement assay. For the latter, there must be a clear means of representing exactly which molecular constituent the concentration assay was intended to measure even when the information content entity output from the assay is outside the linear response range for that particular assay.