Workshop notes, 9 July 2007

Present: Gilberto Fragoso, Chris Stoeckert, Richard Scheuerman, Susanna Sansone, Philippe Rocca-Serra, Bill Bug, Liju Fan, Jennifer Fostel, Allyson Lister, Randi Vita, Tina Boussard, Helen Parkinson, James Malone, Kevin Clancy, Alan Ruttenberg, Christian Cocos, Barry Smith, Daniel Rubin (by phone), Mervi Heiskanen

Morning Session
This was an informational meeting on OBI (semantics), MIBBI (minimum information checklists) and FuGE (syntax) data standards. The following three talks were presented to NIH programme officers.

1. Chris Stoeckert

2. Bill Bug: OBI Overview for July 2007 f2f meeting (Bill Bug's update of the OBI Core v2 slides)

3. Susanna Sansone

Draft Agenda

Interspersed among these talks were two discussion sessions focusing on OBI's goals, funding and future.

Discussion I: Who's missing from OBI that should be involved?
The main questions raised by CS in this discussion section were: Who's missing from OBI that should be involved? Any criteria to decide who to target? What incentives should we be trying to provide to join us?

The audience wondered what OBI means by the "genomics" community, as it's a very broad topic. Further, many of the communities described overlap. CS replied with the following examples: the eventual replacement of the MGED Ontology and BIRNLex with OBI, and the RADLex project for the MRI community (e.g. Daniel Rubin).

It is difficult to get funding for this work, as funders won't generally give money for ontology curators. AR mentioned that money should be provided to develop the *skill* of ontology creation and curation. He wants to establish a teaching program.

Someone in the audience later made the statement that they (or a subset of users) might want to use only a minimal set of the ontology. BB mentioned the MIA* efforts, and using them in the context of the ontologies. Also, members of the audience suggested that OBI could be used in a number of efforts, including the caBIG (cancer Biomedical Informatics Grid) community and the NCI Thesaurus. AR also said you could try to invest part of people's time, rather than getting specific funding for the entirety of an FTE (full-time equivalent). Another topic brought up was the Clinical Trials community - what can we show management? Does OBI have any good examples for them? This was brought up again later in the day, when the OBI developers thought of a number of good OBI use-cases (see below).

Discussion II: How should efforts such as OBI be funded?
The main questions raised by CS in this discussion section were: How should efforts such as OBI be funded? Encourage communities to make it a budget item? Put it in an OBI-focused resource grant? Development, infrastructure, and training are three separate funding areas. What is the role of the NCBO? It currently serves in an advisory role and provides tools and methodologies, but not support for building.

'''What is the real point of OBI? How to use it?''' There are plenty of examples: Science Commons and Neurocommons, journals (e.g. tagging articles or sections of articles), Alan's work, and CISBAN DPI. ArrayExpress to GEO mapping will be a lot easier once the core of OBI is developed. AHA and similar bodies were suggested as funding sources, as long as we can give good examples of the usefulness of OBI to these communities.

How will the world be different when OBI is complete? It provides a method for data exchange and for correct analysis and searching over a large corpus of investigations. People will use MIBBI to discover if there is already a minimum information checklist. If there isn't anything there, they have to make their own MIA*. But how will they know how to do this? Look at MIBBI and get started: this sort of thing needs to be written up. There is a need for guidelines on how to do this sort of work. Publication costs money, but if you treat putting data into FuGE format as a publication of your data in electronic format, it could be a useful way of adding such work into grant proposals.

AR: Changing incentives requires pushing from either journals or funding agencies to say this must be done. Secondly, a workforce that is able to do this sort of encoding does not yet exist. OBI is the start of this training. OBI promises (with a common language for describing results etc.) a situation where integration and searching of genomic-scale datasets will be quicker than before. The interest lies not with the individual investigators but with the people who fund them, who will know they get more for their money by using OBI.

Do we want to model raw data or "final research data"? Makes a big difference to the cost of using something like OBI. Everything should be included in the long term, including LIMS.

Reiteration of the importance of use-cases (how to use OBI) from the point of view of the people who would use OBI. Inevitably, the response to "Our institute should use OBI" is "What is the benefit to us?"

Additional Agenda Items for Workshop

 * Started with a discussion of what to do in the coming days, with main requested agenda items. This includes presentations on branches, with a brief discussion on how these should go.
 * Formalizing how the advisors get credit for OBI. Have it "offline" as a little subgroup and present the results.
 * Other discussion topics for this week: SOP and reasoning, svn and branching "clinic" (Alan), how to organize OBI when mature, to make it easier for users to use it.
 * AL: How to layer the ontology for the public. Check Matt Pocock's email on letting an end user look at one file, not all the branch files. AR and GF also want to do that.
 * Deprecation policy development and spec. for an implementation
 * Reasoning has been done, but should continue to be done. OWL-DL, consistent apart from an annotation property. Suggestion that AR will demo it this week.
 * Created the page for coordinators; we're also going to have a page for developers. AL volunteered to make a mock-up of the page to include all obi-devel list people. BB suggests we can mine the wiki to calculate numbers of posts. AL reports some problems with subversion and static web pages. She will look into that. Different people have editing access to various pages. Suggestion for a half-hour clinic on using subversion. Some core developers wanted their own copy for testing, e.g. with Protege. Suggest using the branching system to create your own branch.
 * Discuss whether the lab environment is under the env ontology or should come back into OBI.
 * AR issue 1 - out of scope terms should go somewhere; we need an out of scope section. For now it is in OBI, but later it will be moved. Need to go over some of these cases.
 * AR issue 2 - some terms were rejected from branches for multiple inheritance. We need a process for generating defined terms that make our users happy, and need to look at these cases too.
 * Linking out to other ontologies from OBO. Break out session for this. Bill Bug suggested that the Relations branch organise this.
 * Additional Discussion Items also were proposed. Unless we want to get a dedicated space we can start on these @ 3pm through the week each day. These are not complex and we shouldn't spend too much time on these.
 * Pros and cons of Protege 4. When is it stable enough for use. And Protege collaborative framework use. Has been evaluated by Daniel Schober at EBI and is working better after recent bug fixes.

Review of Milestones from San Diego

 * Milestone: Polishing of metadata - a large number of discussions on that.

AA: schedule an offline discussion.
 * Milestone: Developing policy for assigning credit. Not finished, but we did come up with a draft. A couple of issues remain, e.g. an OBI contributor's advisor wanted some credit, and we need a way to do that. These need to be formalised. Discussion on this is deferred in favour of doing ontology building.
 * Milestone: Submission of community terms. Complete for the first round. We acknowledge there is still work to do iterating, but we have something to work with. It is just a mechanism, and doesn't preclude adding terms on the fly. Should this process be separated from community submission of terms? We think yes, as it helps us deal with batch submissions.

Then we talked about our policy on multiple inheritance. AR said one possibility would be to make a defined (necessary & sufficient) class that is not necessarily in the "real" hierarchy; inference would place it in the right location. An example is Diploid Cell, which could go into multiple locations in the single hierarchy. Further, it may go into an external ontology (diploid could be a quality, i.e. a PATO term). This will be discussed in its own session later in the week.
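To make AR's suggestion concrete, here is a toy sketch (not OBI's actual tooling, and not a real OWL reasoner). It assumes a single-parent asserted hierarchy plus defined classes expressed as a genus and a set of required qualities. The class names other than "diploid cell" and the helper names are hypothetical.

```python
# Asserted hierarchy: every class has exactly one parent (single inheritance).
# "cultured cell" and "primary cell" are hypothetical example classes.
ASSERTED_PARENT = {
    "cell": "material entity",
    "cultured cell": "cell",
    "primary cell": "cell",
}

# Defined classes: genus plus required qualities, treated here as
# necessary and sufficient. The quality "diploid" stands in for a PATO term.
DEFINED = {
    "diploid cell": ("cell", {"diploid"}),
    "cultured diploid cell": ("cultured cell", {"diploid"}),
}


def ancestors(cls):
    """Walk the asserted single-parent chain, starting with cls itself."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = ASSERTED_PARENT.get(cls)
    return chain


def classify(asserted_class, qualities):
    """Return every defined class whose definition the description satisfies."""
    lineage = set(ancestors(asserted_class))
    return sorted(
        name
        for name, (genus, required) in DEFINED.items()
        if genus in lineage and required <= set(qualities)
    )


# A cultured cell that is diploid is inferred to fall under both defined
# classes, although it has only one asserted parent.
print(classify("cultured cell", {"diploid"}))
```

The asserted tree stays single-inheritance; the multiple placements exist only as inferences, which is the separation AR described.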

Another milestone was the review of preliminary community OBI versions for inspiration of branch editors. There was discussion on what this means. It depends on communities making OWL files for review. That wasn't done, so this hasn't yet happened.

May 1st had a couple of milestones. The first is to present the proposal for environmental/medical/other history. Jen reported. Barry mentions Geo.obo (thought up by Michael Ashburner), which is an OBO Foundry ontology. There is also EnvO (Environment Ontology), which Dawn Field, for instance, plans to use within the GSC framework. Both are OBO Foundry ontologies, and are primarily devoted to children of the BFO class Site. If they are both OBO, then how will we keep them orthogonal? Geo is for annotating *real* geographic locations (already begun, large-scale things like "Poland"), and EnvO (planned with funding, but not started) is for terms like habitat and oral cavity (small-scale things, the kinds of entities where organisms live). Geo has a workshop at the end of August. Michael's Geo sort of popped up suddenly. A lot of Jen's terms have been subsumed into EnvO. Laboratory or clinical artefact may go back to OBI. However, ultimately, those laboratory terms are still environments. We may develop them initially, but then submit them to EnvO. Further discussion of this has been added to the agenda for this week.

'''Note from Suzanna Lewis: I noticed the mention of geo.obo and envo in the notes. These are not different ontologies, but different types of things. geo.obo is a straw-man ontology that we're discussing as a part of the envo project. Being good OBO Foundry practitioners we would never have >1 ontology to cover the same domain. We're agreed to knit it all together, it is just part of the process.'''

Discussion - JF: Norman was working on this and has been working on getting funding. It will include gut cavity and habitat. There is some potential for overlap with OBI. The env. ontology is still being planned. BS: Related efforts. We need a systematic way to annotate fly populations. geo.ontology, created by Michael Ashburner, is a geographical location annotation ontology. Both ontologies will be part of OBO, and will be children of the BFO class 'Site'. Will these ever have to merge? geo.obo is designed for real geographical information; the env ontology is for describing habitat, at a smaller scale. We think that this sounds like roles. Update from Susanna: funds for the env ontology have now been provided. They have a workshop at the end of August 2007. The direction may have changed based on Michael Ashburner's geo.ontology. It was suggested that the biodiversity community should be included. JF: The original question: the env. ontology talks about e.g. rivers, whereas OBI is interested in lab habitats. At the last meeting, clinical history etc. were subsumed into the env. ontology. We should revisit that at this meeting. There is a placeholder for that; we might bring it back into the domain in the context of the laboratory. BS: OBI should consider everything that is part of the experimental artefact.

The second May 1 milestone is the proposal for process - how to link to ontologies / terms / free text entries apart from canonical OBI links. The main point here is that we should/must reuse other ontologies where available. We will probably have a breakout session about this. Perhaps it is a task best given to the Relations Branch.

The June 1 milestone of review of placement of community terms will be covered with the branch updates given tomorrow. July 1 was the finalizing of terms into branches, which hasn't quite been reached as we are still working on the branches - it took a while to get subversion sorted. The July 9th milestone of re-merging branches will no longer be necessary as we'll be keeping the branches for a while. However, if we want to re-merge to provide a single file for download, AR has made a script to do this dynamically.

Another 9 July milestone was to have the deprecation policy finalized. Alan had a proposal about where to put deprecated terms - into a separate import file - so that "normally" you wouldn't see them, but could import them if you want to see them. Will talk about the deprecation policy this week too.
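As a rough illustration of Alan's proposal, the sketch below keeps deprecated terms in a separate collection that is only merged in on request. It uses plain Python dictionaries rather than OWL imports, and the term IDs, the `deprecated` flag and the function names are all hypothetical.

```python
def split_deprecated(terms):
    """Partition term records into (active, deprecated) by a 'deprecated' flag."""
    active = {tid: t for tid, t in terms.items() if not t.get("deprecated", False)}
    deprecated = {tid: t for tid, t in terms.items() if t.get("deprecated", False)}
    return active, deprecated


def load(active, deprecated, include_deprecated=False):
    """Mimic the optional import: deprecated terms appear only when requested."""
    view = dict(active)
    if include_deprecated:
        view.update(deprecated)
    return view


# Hypothetical term records, for illustration only.
terms = {
    "EX_0000001": {"label": "assay"},
    "EX_0000002": {"label": "old assay term", "deprecated": True},
}
active, deprecated = split_deprecated(terms)

print(sorted(load(active, deprecated)))
print(sorted(load(active, deprecated, include_deprecated=True)))
```

The default view hides deprecated terms, matching the idea that "normally" you wouldn't see them.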

This led into a longer discussion of versioning, history, and deprecation. Versioning is a lot more complex than deprecation, but AR argues that you can't have deprecation without history. GO has a versioning policy. We should be documenting ANY change - spelling, added annotation etc. - recording both what was changed and why. Barry suggests that each time any change is made, you should create a new ID. AR says that this imposes a larger burden on the user. BB and AL agree that only semantic changes should make ID changes - syntactic changes shouldn't. AR asks what happens if you have a closure axiom over a group of terms, and then you need to add a term, or remove the closure axiom. Is that a semantic change? AR suggested that we not worry about it until we have a stable core. Perhaps a subgroup should set up a proposed policy and send it around. Bill suggests an intermediate milestone of 3 weeks where *everyone* would submit any use-cases / examples they want considered when building the requirements list for this policy. The policy should be ready for the next workshop. Trish had a wiki page for OBI, and one for MO. Then come up with a proposal for implementation: when and how terms are deprecated, and cases where users do not want to see them. Added as an agenda item. Question on whether versioning is also covered here and what history is needed for consumers. Liju suggests edit notes for each version. It is unclear what granularity is needed here. BIRN say what the former parents were, e.g. say what this term looked like on date x. GO does this already. Best practice - a term gets a new number whenever any aspect of the term is changed. If it has many child terms, their IDs would also need to change. This is problematic for users - too much burden. GF: most people want to know when the definition changes, in text or placement, and we need to decide if we track lexical and logical definitions. SKOS has some info on this. We could defer until we end up with changes that need this.
GF: we need to have this in place for V1, and it will take a while to get right. We decided to identify a subgroup to do this offline by the next workshop: Alan, Bill, Gilberto. A milestone date is set for discussion of this. A wiki page needs creating for requirements. We want to invite Daniel, so it needs to be scheduled in the am if in this meeting. Relating to this am's discussion with the funders: implementing a deprecation policy is time consuming, and we might need to prove utility rather than develop this complex policy, i.e. not implement it prior to production version 1.0.
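No policy was agreed, but the position voiced by BB and AL (only semantic changes force a new ID, while every change is still logged with what and why) can be sketched as follows. Which fields count as semantic is an assumption here (definition and parents, following GF's remark about definition and placement), and the record layout and function names are hypothetical.

```python
# Assumption: the meeting did not settle which fields count as "semantic".
SEMANTIC_FIELDS = frozenset({"definition", "parents"})


def changed_fields(old, new):
    """List the fields whose values differ between two versions of a term."""
    return sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))


def needs_new_id(old, new):
    """True when at least one changed field is classed as semantic."""
    return any(f in SEMANTIC_FIELDS for f in changed_fields(old, new))


def history_entry(term_id, old, new, reason):
    """Record what changed and why, whether or not the ID changes."""
    return {
        "term": term_id,
        "changed": changed_fields(old, new),
        "new_id_required": needs_new_id(old, new),
        "reason": reason,
    }


# Hypothetical term versions, for illustration only.
old = {"label": "diploid cel", "definition": "A cell with two chromosome sets.", "parents": ["cell"]}
respelled = dict(old, label="diploid cell")
moved = dict(old, parents=["cultured cell"])

print(history_entry("EX_0000003", old, respelled, "fixed spelling of label"))
print(history_entry("EX_0000003", old, moved, "re-parented after branch review"))
```

AR's closure-axiom question shows where this gets hard: adding a sibling term under a closure axiom changes the meaning of terms whose own fields are untouched, which a per-term field comparison like this would miss.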
 * Milestone, finalising prev meta data annotations. Seems like clean up of web pages. Do offline


 * Milestone. Bill and Bjoern, communities using OBI post their OWL files for review. NMR has done this and Bill will do for BIRN. Suggestion to do this off the community pages. NMR will disappear once OBI is up.


 * Milestone, Present proposal for env ontologies, clinical etc.

 * Milestone - how to link to other ontologies. We started on this, e.g. using OBO reln ontologies. How to do merging etc.? OWL API has e-connections, for example. No conclusion reached. Needs more work.

 * Milestone, finalised placing of terms into branches and merge back branches. Partly done. No need to merge back at this point; it will be done dynamically.

Future Milestones Discussion
Based on the discussion with the NIH programme officers we have developed some new milestones, to which we will assign dates by the end of the workshop.

New milestone, identify a time and location for the next workshop.

New milestone, identify a test strategy for OBI in terms of proving utility e.g. (Application Use cases?) for text mining, data acquisition etc. Use in context of OMs. Enthusiasm for this, list of projects below.


 * Bill suggests brainmap.org - mining neuro studies. It has imaging and behavioural assays and anatomy. They want to ask questions they can't ask in the current state. We could use this as a way to assess OBI. Kevin is representing what vendors are doing. Vendors also want to see value prior to commitment. We propose to use the following to do a preliminary grant proposal. Alan: we have a triple store at Science Commons for e.g. OBI plus data, and could use this during this process for multiple data types.


 * HP offers to do this with MO and OBI for some ArrayExpress array data and some text mining of AE descriptions.


 * SS will do this for multi-omic data sets


 * SS will do text mining for NMR using OBI as seed terms


 * Gully Burns - text mining of neuroanatomical techniques.


 * Data acquisition, problem as when running production


 * Biosupplynet - all catalogs scraped by Cold Spring Harbor Lab. Press - we may be able to interact with that from the ProtocolApplication and Instrument branches, and Biomaterial for reagents. Has also worked with the PSI vendor community. Alan suggests we need to be proactive and then scrape a vendor(s) site into OBI and work from there.

Agenda discussion and some time changes

 * 1) Tuesday AM. Each branch leader will walk us through the OWL file and highlight issues. Should all be in subversion by tomorrow am. All to do that if not uploaded already
 * 2) Clinical ontology meeting report from Jennifer will now be at 2-3pm Tuesday, to allow Barry to be there
 * 3) Dealing with biomaterials, or other things, that are outputs of processes. BS and AR want to discuss this on Weds am.
 * 4) Development of implementation examples esp. e.g. with MAGE/MAGETAB. 3pm Tuesday. Lots of ambiguity there and unresolved problems. We will identify examples that we need based on Bill's neuroimaging.
 * 5) Defined terms process to get us into OBO Foundry, also needs to be Weds am.
 * 6) Agenda for data transformation workshop stays at 4pm Thursday slot.

Also, we should have 15 minutes that covers what Protege 4's status is, and what it looks like.

Agenda Item. Using OBI for real based on Matt's email
Based on email from Matt Pocock sent to obi-dev. His email can be read here: http://sourceforge.net/mailarchive/message.php?msg_name=200707091513.52789.matthew.pocock%40ncl.ac.uk
The email describes 3 OBI tiers, plus 1 extra layer which is used to annotate with. Discussion on whether instances/individuals should be in OBI. It seems not; this is what the lab is annotating. Looking at OBI core: BFO is the top tier; the top 2-3 levels of general terms are the second tier, with BFO classes hidden from consumers; for the third tier, users get the whole file dynamically. Users of OBI might then want to send terms from their own 4th tier OWL file. Who are the users? Developers for data integration? Or biologists? Bioportal is a way to provide access for ontologists and in future build graphs etc. There were problems with adding BIRNLEX: they don't want it indexed in this way, so it is linked via SF instead. BIRNLEX imports OBI, and OBI is now on the portal. With ontologies split into parts, the Bioportal doesn't work for OBI, as we have so many imports. Question of whether this is relevant for OBI. Do we want people browsing OBI? We need to identify these cases and decide how they impact development. It seems we need both, and we need to follow this up with OBO - Daniel Rubin needs to be contacted about this. '''AA. Bill Bug will provide details for us to contact Daniel/Bioportal to solve this problem.''' '''AA. Get the plugin and test OBO format output, might solve this problem, though this is a sub-optimal solution for some people.'''

Conclusion on agenda item.
 * 1) We need a single file for some users, some will want parts.
 * 2) We need to meet all these needs
 * 3) This is also a selling point confirmed by Programme officers meeting.
 * 4) We need to do this automatically from subversion at commit on each branch.
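
The conclusions above could be met by a build step along the following lines. This is a minimal sketch over plain dictionaries, not AR's merge script and not a subversion hook; the branch contents, term IDs, the `BFO:` prefix convention and the function names are assumptions for illustration.

```python
def merge_branches(branches):
    """Combine per-branch term maps into one map, refusing duplicate term IDs."""
    merged = {}
    for branch_name, terms in branches.items():
        for term_id, term in terms.items():
            if term_id in merged:
                raise ValueError(
                    f"{term_id} is defined in both "
                    f"{merged[term_id]['branch']} and {branch_name}"
                )
            merged[term_id] = {**term, "branch": branch_name}
    return merged


def consumer_view(merged, hidden_prefixes=("BFO:",)):
    """Single-file view for end users, with upper-level classes hidden."""
    return {
        tid: t for tid, t in merged.items()
        if not tid.startswith(hidden_prefixes)
    }


# Hypothetical branch contents, for illustration only.
branches = {
    "Core": {"BFO:site": {"label": "site"}},
    "Biomaterial": {"EX:0001": {"label": "reagent"}},
    "Instrument": {"EX:0002": {"label": "scanner"}},
}

merged = merge_branches(branches)
print(sorted(merged))
print(sorted(consumer_view(merged)))
```

Users who want only parts can filter the merged map by its `branch` key; running the merge at each commit would be wired up separately.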

Notes by Helen Parkinson and Allyson Lister