Workshop Data transformation 2008 Nov 5

Action items day 3
AI:Look at the categorization in the GenePattern UI and check it against the current OBI objectives; there should be some concordance

AI:GenePattern needs parameters as well as analysis methods; need to add these as wish list items.

AI:Add 'cross validation' as an objective - JM (added to s/sheet)

AI:Need to add distance measures to the ontology, e.g. Euclidean distance, and add these to the list. Problem: we don't know where to put these yet. Where's the list? - Branch Work

AI:Add pattern matching (class neighbours) as an objective - DT branch. We need a generic algorithm for this. We think KNN is the closest thing to a generic algorithm for this, where K=2. Checking how this is implemented in GenePattern

AI:GenePattern has a list of what was run most frequently; we will use this to determine the focus for GenePattern support

AI:We need features - where features discriminate between samples - feature selection objective. ComparativeMarkerSelection. Also need to add markers to the role branch, where a marker is a synonym of feature. - follow up with TL.

AI:Add a link to Ted's tools that assist users with which tool to use in which cases; we could build this into the ontology later - TL/DT branch

AI:Compare spectra is a comparison again; we need info about the implementation to do more than add the objective for this to the sheet. - TL

AI:Add cross correlation as a term - generic term will be cross correlation, needs an objective.

AI:Add BIC (Bayesian information criterion) - a way to determine the number of clusters; AIC and ICL also to be added as new terms. Related to GenePattern ConsensusClustering

AI:RMA, GC-RMA, MAS5 - need to be added somewhere - with algorithm

AI:Need to deal with annotation where we map to ids and where we add annotation values, e.g. for the case Hu68kHu35kAtoU95 - is this in data transformation?

AI: Add a subclass for GSEA on differential expression exercise. MM

AI: LandmarkMatch, LocatePeaks relate to proteomics. May relate to the sequence of the data - MM will investigate these

 ---

Day 3 Ted Liefeld GenePattern Presentation  * add ppt *

In silico support for reproducible research. Info to reproduce the research.

Accessible to different levels of users.

Workflow support

Local dist/computing

GenePattern: 60 analyses, 14 reformatters/data loaders, 18 visualizers, 8 pipelines

Using O-XML - uses LSIDs. Taverna uses Scufl. Limitation - param files external in a provenance recording system in RDF.

When a module is defined, it has some semantic support in GenePattern

Execution logs come with the output files.

Discussion on the formats which are used to define inputs and outputs per module. List of formats and relationship to the processes are available.

NB. Zelig is not yet plugged into GenePattern

Using OBI in GenePattern.

Using GCT and RES - local GenePattern formats. Tab-delimited formats embedded in GenePattern

Extend file readers for MAGE-TAB.

Within MAGE-TAB can record a workflow. Want to do this in a generic way. OBI id for the analysis type.

Can reverse generate a pipeline from an analysis - possibly done in any tool.

Seen in OBI about the analyses - need something regarding the parameters. E.g. random seeds - these need to be recorded for reproducibility. IEEE spec for random number generators may not be enough.
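
A minimal sketch of the point about parameters and random seeds. The analysis step (`subsample`) and the record layout are hypothetical, made up for illustration; they are not GenePattern or OBI structures. The idea is only that the seed is stored alongside the other parameter values so the run can be repeated:

```python
import random

def subsample(values, n, seed):
    """Hypothetical analysis step: draw n values using an explicit seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(values, n)

def run_and_record(values, n, seed):
    """Run the step and keep the parameter values needed to repeat it."""
    result = subsample(values, n, seed)
    record = {"analysis": "subsample", "parameters": {"n": n, "seed": seed}}
    return result, record

data = list(range(100))
first, record = run_and_record(data, 5, seed=42)
# Re-running from the recorded parameters reproduces the same draw.
again = subsample(data, **record["parameters"])
```

This only gives repeatability with the same random number generator implementation, which is the concern noted above about a generator spec alone not being enough.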

Need parameter values to map analyses from other systems.

Execution logs replaced by SDRF.

Define generic protocols using OBI ids rather than LSIDs - static, need to be able to add these easily to the examples.

Categorization of analysis - large menu of terms that describe analyses.

Need to map terms between tools, e.g. GenePattern vs. geWorkbench - different names, same method. Already allow suites, want to have tag clouds - people can add their own names. Want to prepopulate this from OBI.

AI:Look at the categorization in GenePattern and check it against the current OBI objectives; there should be some concordance

AI:GenePattern needs parameters as well as analysis methods; need to add these as wish list items.

AI:GenePattern has a list of what was run most frequently; we will use this to determine the focus for GenePattern support.

Working through the GenePattern use case doc to categorize and add new terms

See google doc http://spreadsheets.google.com/ccc?key=phYpwAQC6nxrfk0SR099aZQ&inv=ob...

AI:Add 'cross validation' as an objective - JM (added to s/sheet)
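
For reference, a sketch of what the cross validation objective covers, assuming plain k-fold splitting without shuffling or stratification. The function name is made up for illustration and is not taken from any tool discussed here:

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(10, 3)
```

Each sample appears in exactly one test fold, so every sample is used once for evaluation and k-1 times for training.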

AI:Need to add distance measures to the ontology, e.g. Euclidean distance, and add these to the list. Problem: we don't know where to put these yet. Where's the list? - Branch Work

'''AI:Add pattern matching (class neighbours) as an objective - DT branch. We need a generic algorithm for this. We think KNN is the closest thing to a generic algorithm for this, where K=2. Checking how this is implemented in GenePattern'''
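
A minimal sketch of KNN with Euclidean distance, to make the "generic algorithm" concrete. This is an illustration under assumptions (majority vote, ties resolved by the closest neighbour); how GenePattern implements it is still to be checked, so nothing here describes that module:

```python
import math
from collections import Counter

def knn_predict(train, query, k=2):
    """Predict the class of `query` from its k nearest training points.

    `train` is a list of (vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # On a tied vote (likely with k=2), the closest tied neighbour wins.
    for _, label in nearest:
        if votes[label] == top:
            return label

train = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.0), "B")]
prediction = knn_predict(train, (0.5, 0.5))
```

With K=2 a split vote is always possible, so some tie-breaking rule is needed; the one used here is an assumption.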

Comparative marker selection - generic form of differential analysis. Want features that differentiate the samples. A distance metric is computed, then the dataset is permuted. Still need an answer for the KNN
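
One way to read "compute a metric, then permute the dataset" is as a permutation test on a single feature. The sketch below assumes the absolute difference of class means as the statistic; the notes do not say which statistic ComparativeMarkerSelection uses, so this is not a description of that module:

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(class_a, class_b, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for a difference in class means."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    observed = abs(mean(class_a) - mean(class_b))
    pooled = list(class_a) + list(class_b)
    n_a = len(class_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # permute the class labels
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_permutations + 1)

p_separated = permutation_p_value([1.0, 1.1, 0.9, 1.2], [5.0, 5.2, 4.9, 5.1])
p_identical = permutation_p_value([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

A feature whose values separate the two classes gets a small p-value; a feature with identical values in both classes gets a p-value of 1.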

'''AI:We need features - where features discriminate between samples - feature selection objective. ComparativeMarkerSelection. Also need to add markers to the role branch, where a marker is a synonym of feature. - follow up with TL.'''

AI:Add a link to Ted's tools that assist users with which tool to use in which cases; we could build this into the ontology later - TL/DT branch

AI:Compare spectra is a comparison again; we need info about the implementation to do more than add the objective for this to the sheet. - TL

AI:Add cross correlation as a term - generic term will be cross correlation, needs an objective.
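
For the generic term, a sketch of unnormalized cross correlation between two equally sampled signals. Which variant (normalized or not) the term should cover is not settled in these notes, and the function names are made up for illustration:

```python
def cross_correlation(x, y, lag):
    """Sum of x[i] * y[i + lag] over the indices where both are defined."""
    return sum(
        x[i] * y[i + lag]
        for i in range(len(x))
        if 0 <= i + lag < len(y)
    )

def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the largest cross correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(x, y, lag))

x = [0, 1, 0, 0, 0]
y = [0, 0, 0, 1, 0]
shift = best_lag(x, y, 3)
```

Here the peak in `y` sits two positions after the peak in `x`, so the best lag is 2.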

AI:Add BIC (Bayesian information criterion) - a way to determine the number of clusters; AIC and ICL also to be added as new terms. Related to GenePattern ConsensusClustering
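
A sketch of how BIC can be used to pick a number of clusters, using the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). The log-likelihoods and parameter counts below are made-up illustrative numbers, and `pick_cluster_count` is a hypothetical helper, not part of any tool discussed here:

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood

def pick_cluster_count(candidates, n_samples):
    """Return the cluster count with the lowest BIC.

    `candidates` maps cluster count -> (log_likelihood, n_params).
    """
    return min(candidates, key=lambda c: bic(*candidates[c], n_samples))

# Hypothetical fits: made-up log-likelihoods and parameter counts.
fits = {1: (-520.0, 2), 2: (-430.0, 5), 3: (-428.0, 8)}
chosen = pick_cluster_count(fits, n_samples=200)
```

The third cluster improves the likelihood only slightly, so the penalty on extra parameters makes two clusters the preferred choice in this example.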

AI:RMA, GC-RMA, MAS5 - need to be added somewhere - with algorithm

AI:Need to deal with annotation where we map to ids and where we add annotation values, e.g. for the case Hu68kHu35kAtoU95 - is this in data transformation?

AI: Add a subclass for GSEA on differential expression exercise. MM

TL: HeatmapViewer - if it allows subsetting, it is not simple data visualization - the dataset can be sorted and subsetted.

AI: LandmarkMatch, LocatePeaks relate to proteomics. May relate to the sequence of the data - MM will investigate these