Workshop Data transformation 2008 Nov action items

Editing Tasks - James Malone/Melanie Courtot/Helen Parkinson
AI:Hammer BFO/IO for answers on where variables will be in the ontology. Add this to the agenda for Vancouver. MC

AI:'center calculation' and 'averaging data transformation' defined classes need attention. MM thinks that these could be conflated. Moving average needs to be under both. Data imputation possibly incorrect - could be a class in itself. Need a new defined class for that - probably objective.

DONE AI:Background correction - changes to an objective.

DONE AI:Change 'data partitioning' to an objective (obsoleted property-based and random-based vector selection)

DONE AI:gating is not a dimensionality reduction and is therefore incorrectly placed under property based vector reduction - PBVR. Gating has objective data partitioning, goes under the root. PBVR - was this created so that gating will fall into the hierarchy? Send an email to Richard to describe the issue and see if he is happy with this proposal - MC - DONE. RS is happy

AI:Address dimensionality reduction vs. data vector reduction - if these are the same thing - then we can have a single term for both. - MC will also add this to the email.

DONE AI:create a sheet for variables as these determine what test you do -DONE and submitted to IAO - DONE, discussion with AR

DONE AI:Descriptive stats also becomes an objective

DONE AI:Differential expression analysis becomes an objective - as any test can be used - this is the objective

DONE AI:Discriminant analysis should be a synonym of classification or class discovery MM

DONE AI:EH transformation - belongs where B transformation is sibling - B/H/EH/S transformations should have objective normalization

DONE AI: Feature extraction suggested to be made an objective as suggested in review.

DONE AI:Loess scale group transformation is a scaling adjustment. Change tree to reflect this. Just the scaling, not the transformation. A scaling objective will be added; loess scale gp trans is-a loess group transformation, followed by a scaling adjustment. This will be a defined class - any dt with objective scaling.

AI:JM will email Elisabetta offline to ask about MA transformation's objective.

DONE AI:We need a new objective - 'Type 1 Error Rate correction' (MC comment: should that be correction of feature type 1 rate error?) - JM/MC: added error correction, to revise if need for specific error rate

DONE AI:Add a synonym to Network analysis - network topology analysis.

AI:Send an email to RS about Network analysis's objectives

DONE AI:Find out what WAS classified under polynomial transformation (in some previous version) - need to have a property that is associated with it somehow. MC checked in previous revisions and didn't find anything

DONE AI: Sequence analysis becomes an objective, anything underneath will have it as an objective. Objective is 'sequence feature identification' or 'prediction'

DONE AI: SMCA - has a normalization objective

DONE AI:Add 'cross validation' as an objective - JM (added to s/sheet) - NOTE: this has been added as a child of partitioning objective, with alternative term rotation estimation
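
To illustrate why cross validation sits under the partitioning objective, here is a minimal sketch of k-fold partitioning. The function name k_fold_partitions and the interleaved fold assignment are assumptions made for illustration; they are not taken from OBI or from any particular tool.

```python
def k_fold_partitions(items, k):
    """Split items into k disjoint folds and yield (training, held_out) pairs.

    Each item lands in exactly one held-out fold, which is the data
    partitioning step that cross validation relies on.
    """
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    # Interleaved assignment: item i goes to fold i % k (an arbitrary choice).
    folds = [items[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, held_out


samples = list(range(6))
splits = list(k_fold_partitions(samples, 3))
```

Each round uses one fold for evaluation and the remaining folds for fitting; the rotation through the folds is what the alternative term 'rotation estimation' refers to.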

AI:design decision - we will use feature to say things like 'normally distributed' - about data as a workaround for now.

AI:Implement feature - normally distributed in OBI and apply to a process.

AI:Document these use cases. Use case: longitudinal analysis - show me the methods I can use - consider modelling as part of data, rather than in the hierarchy. Use case: longitudinal analysis, response variable=time - show me the methods I can use

AI:Quality assessment to add - QC is a thing you do to have quality. Outlier id is a part of QA, but not a synonym.

DONE AI:Add outlier to the role branch. MC - submitted to tracker https://sourceforge.net/tracker/index.php?func=detail&aid=2229248&group_id=177891&atid=886178

AI:Put the lsids back in the GenePattern use case file as names change - JM/TL

AI:Special kind of warping or possibly normalization (if we can agree that these are the same thing; the last resolution is that warping isn't a correction for systematic error) - Warping can be many:1 or 1:1; for many:1 we need to add a Discretization objective - Discretization objective is the approximation of the solution of a continuous problem by representing it in terms of a discrete set of elements - JM. Synonym - bucketing, binning. Some relation to normalization but we cannot define this ontologically
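
As a rough illustration of the many:1 case, the sketch below bins continuous values into a discrete set of bin indices. The function name discretize, the half-open bin convention and the clamping of out-of-range values are all assumptions chosen for the example, not part of the proposed definition.

```python
import bisect


def discretize(values, edges):
    """Map each continuous value to the index of the bin it falls in.

    edges must be sorted ascending; bin i covers [edges[i], edges[i + 1]).
    Values outside the outer edges are clamped to the first or last bin
    (an assumption made for this sketch).
    """
    if len(edges) < 2:
        raise ValueError("need at least two edges to form a bin")
    last_bin = len(edges) - 2
    indices = []
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        indices.append(min(max(i, 0), last_bin))
    return indices


bins = discretize([0.1, 0.49, 0.5, 0.99], [0.0, 0.5, 1.0])
```

Several distinct input values map to the same bin index, which is the many:1 behaviour that separates discretization from a 1:1 warping.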

AI:Add warping as a synonym for alignment objective - JM

DONE AI:Merge needs to be added as an objective synonym - combine - JM Definition: the union of two or more sets. E.g. the merging of columns or merging of rows from two different tab-delimited data sets.
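
A minimal sketch of the column-merging example, assuming two tab-delimited data sets that share a row identifier column. The function name merge_columns, the column names and the sample data are hypothetical.

```python
import csv
import io


def merge_columns(tsv_a, tsv_b, key):
    """Merge the columns of two tab-delimited data sets on a shared key column.

    The result covers the union of the row identifiers found in either input;
    a row missing from one input simply lacks that input's columns.
    """
    rows_a = {r[key]: r for r in csv.DictReader(io.StringIO(tsv_a), delimiter="\t")}
    rows_b = {r[key]: r for r in csv.DictReader(io.StringIO(tsv_b), delimiter="\t")}
    merged = []
    for row_id in sorted(rows_a.keys() | rows_b.keys()):
        row = {key: row_id}
        row.update(rows_a.get(row_id, {}))
        row.update(rows_b.get(row_id, {}))
        merged.append(row)
    return merged


expression = "id\texpr\ng1\t2.0\ng2\t3.5\n"
p_values = "id\tpval\ng2\t0.01\ng3\t0.2\n"
merged = merge_columns(expression, p_values, "id")
```

Merging rows rather than columns would instead concatenate the two row lists under a common header.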

AI:Ensure that the dt objectives are precise e.g. selection -> data selection - JM

AI:We will need to add union, intersection, complement to OBI - JM

AI:define NMF currently u/c and add objectives - A pattern recognition algorithm that ids patterns that together explain the data as a linear combination of expression signatures. Synonyms:SY - NCIT. Citation needs adding to NCIT - TL

AI:Add selection - e.g. selection based on row or column ids after merge. Def:Selection is the objective of choosing data based on some criteria such as row or column identifiers. - Add to OWL file. JM
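
A small sketch of selection by row or column identifiers, assuming the data set is held as a list of dictionaries with an identifier column. The function name select, the id_column parameter and the sample rows are assumptions for illustration only.

```python
def select(table, row_ids=None, columns=None, id_column="id"):
    """Choose rows and/or columns of a table by their identifiers.

    table is a list of dicts; row_ids and columns are optional criteria.
    Leaving a criterion as None keeps everything along that axis.
    """
    selected = []
    for row in table:
        if row_ids is not None and row[id_column] not in row_ids:
            continue
        if columns is None:
            selected.append(dict(row))
        else:
            selected.append({c: row[c] for c in columns})
    return selected


table = [
    {"id": "g1", "expr": 2.0, "pval": 0.30},
    {"id": "g2", "expr": 3.5, "pval": 0.01},
    {"id": "g3", "expr": 1.2, "pval": 0.20},
]
subset = select(table, row_ids={"g2", "g3"}, columns=["id", "pval"])
```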

Tina Boussard's Tasks
AI:Add cross correlation as a term - generic term will be cross correlation, needs an objective.

AI:Add BIC - Bayesian information criterion - way to determine a number that will determine cluster numbers, AIC, ICL, to be added as new terms. Related GenePattern ConsensusClustering

AI:Add survival analysis objective, and add cox regression, move Kaplan-Meier under this - TB

AI:Check the NCIT for several alg definitions present there - TB

Monnie McGee's Tasks
AI:Add CART, random forest to OBI - MM will define

AI:Def trimmed mean calc - has part trimming process, succeeded by mean calculation. Outlier removal to be added as an objective. MM
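
The two-part structure can be sketched as two separate steps composed in order. The function names and the choice to trim the same proportion from each end are assumptions for illustration; a formal definition may settle on a different trimming rule.

```python
def trim(values, proportion):
    """Trimming process: drop the given proportion of values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return ordered[k:len(ordered) - k]


def mean(values):
    """Mean calculation: arithmetic mean of the remaining values."""
    return sum(values) / len(values)


def trimmed_mean(values, proportion):
    """Trimmed mean calculation: trimming process followed by mean calculation."""
    return mean(trim(values, proportion))


result = trimmed_mean([1, 2, 3, 4, 100], 0.2)
```

Keeping trim as its own function mirrors the 'has part' relation: the trimming step can carry the outlier removal objective independently of the mean calculation.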

DONE AI:MM will ask a colleague what to do for 3D and 2D feature extraction and if they are different. - DONE: 3D is different - opens up lots more terms

AI:MM will look to see whether other scaling adjustments are used instead of loess ever.

DONE AI:look up other types of chi squared and define them MM - DONE: see google docs hypothesis testing spreadsheet

AI:quantitative variable - we need a definition that separates from ordinal - MM

AI: LandmarkMatch, LocatePeaks relate to proteomics. May relate to the sequence of the data - MM will investigate these

AI: Add a subclass for GSEA on differential expression exercise. MM

AI:Add 'test for trend' to OBI

AI:Add cox regression

AI:add multivariate analysis of variance

AI:Add Longitudinal analysis and define it - change to objective

AI:thresholding: Thresholding objective is a special kind of filtering where boundaries for minimum and maximum values are determined and the values that do not meet these criteria are removed. Add as a child of filtering - JM
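
A minimal sketch of thresholding as a special kind of filtering. The function name threshold_filter and the choice of inclusive boundaries are assumptions; the definition above does not say whether values equal to a boundary are kept.

```python
def threshold_filter(values, minimum, maximum):
    """Remove the values that fall outside the [minimum, maximum] boundaries.

    Boundaries are treated as inclusive here (an assumption for this sketch).
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return [v for v in values if minimum <= v <= maximum]


kept = threshold_filter([5, 20, 150, 16000, 30000], minimum=20, maximum=16000)
```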

AI:LOHPaired module - needs paper checking for what is implemented so we can assign a general algorithm and specific ones - MM

AI:Add mutual information to OBI - def: a formula (borrowed from information theory) that compares the probability that two items occur together as a joint event with the probability that they occur individually (and that their co-occurrences are a result of chance). Needs a placement in OBI - should live with correlation - MM
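
For reference, a sketch of the standard discrete mutual information estimate from paired observations, using empirical frequencies as probabilities. The function name and the use of base-2 logarithms (result in bits) are choices made for this example.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate mutual information (in bits) from a list of (x, y) observations.

    Sums p(x, y) * log2(p(x, y) / (p(x) * p(y))) over the observed pairs,
    i.e. it compares each joint probability with the product of the
    individual probabilities expected under chance co-occurrence.
    """
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = left[x] / n
        p_y = right[y] / n
        total += p_xy * math.log2(p_xy / (p_x * p_y))
    return total


dependent = mutual_information([(0, 0), (1, 1), (0, 0), (1, 1)])
independent = mutual_information([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Items that always occur together give a positive value, while items whose co-occurrence matches chance give zero, which is why the term would sit naturally alongside correlation.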

AI:add fold-change - process of calculating to OBI and define - JM

AI:Need definition for Ward's method under agglomerative hierarchical clustering - MM

Ryan Brinkman's Tasks
AI:Find information for B transformation (existing term) - need an objective. - RB

AI: Add QT clustering - RB will define

AI:Add feature selection - objective. RB In this sense doing machine learning as in wikipedia http://en.wikipedia.org/wiki/Feature_selection - RB. Random forests should be under feature

AI:Need some way to relate multiple testing with the need for multiple test correction RB has a use case selection.

Ted Liefeld's Tasks
AI:Add a link to Ted's tools that assist users with which tool to use under which cases, we could build this into the ontology later - TL/DT branch

AI:GenePattern's Compare spectra, is a comparison again, we need info about the implementation to do more than add the objective for this to the sheet. -TL

AI:GenePattern has a list of what was run most frequently, we will use these to determine focus for GenePattern support

AI:We need Features - where features are discriminant between samples - feature selection objective. ComparativeMarkerSelection. Also need to add markers to the role branch where a marker is a synonym of feature. - follow up with TL.

Ricardo Pietrobon's Tasks
AI: Alt term:categorical variable. Needs two child terms: dichotomous variable, polychotomous variable and definitions - RP

AI:Dump out the definitions of the variables used at Duke - these are defined in templates RP - DONE, available as a google doc.

Philippe Rocca-Serra's Tasks
AI:PRS will go back and check the objective for 'soft independent modeling of class analogy analysis'

AI:We will ask the metagenomic community for a use case (about sequence alignment?). We will work on this when we have use cases for these - PRS

AI:Methods will be proposed from relevant terms in the header row of the hypothesis test spreadsheet to the plan/planned process branch. PRS

General Data Transformation Branch Tasks
AI:Follow up on input/output lists for Zelig - with Amrupali - DT branch

AI:Need to add the fuzzy clustering methods - http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html

AI:Review branch to check for missing subsumption of terms for objectives

AI:Add more children for specific methods of multiple testing procedure.

AI:Need to deal with annotation where we map to ids only e.g. add GO terms, and where we add annotation value also, as in the case of GenePattern module - Hu68kHu35kAtoU95 - is this in data transformation or not?

AI:RMA, GC-RMA, MAS5 - need to be added somewhere - with algorithm

AI:Look at the categorization in GenePattern UI and check vs. the current OBI objectives, should be some concordance

AI:Need to add distance measures in the ontology. We need to add these to the list Euclidean distance. Problem that we don't know where to put these yet. Where's the list?- Branch Work
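
As a reference point for the first entry on that list, a minimal sketch of Euclidean distance between two equal-length vectors. The function name is an assumption for illustration.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distance = euclidean_distance((0.0, 0.0), (3.0, 4.0))
```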

AI:Add pattern matching (class neighbours) as an objective - DT branch. We need a generic algorithm for this. We think KNN is the closest thing to a generic algorithm for this, where K=2. Checking about how this is implemented in GenePattern

AI:Check Ricardo's slides and create a competency question per path

AI:Check that the test graphs and tests present on RP's slides are consistent with what we have in the ontology. Add any tests that are not present.

AI:Address longitudinal data sets in DT - DT branch

AI:GenePattern needs parameters as well as analysis methods, need to add these as wish list items.

AI:Add SNP and LOH methods/processes to dt - DT branch