Research Data Management Forum: November 2008

The second RDMF workshop, again held in Manchester last week, was on roles and responsibilities in the management of data. We heard from the ubiquitous Andrew Treloar yesterday evening, relating the Australian data (and other) developments to those in the UK. Key to responsibilities there is the Australian Code for the Responsible Conduct of Research, which I wrote about before in the Digital Curation Blog: this puts complementary obligations onto researchers and their institutions in the retention and management of their data (incidentally, the RCUK consultation on similar matters ended on October 24; I gather we might get a result by next May or so). I do like the system-wide coherence of design of the Platforms for Collaboration in Australia, of which the Australian National Data Service (ANDS) is one part.

Thursday morning, Alma Swan has been talking about the Key Perspectives report (Swan & Brown, 2008) on skills and career structures for data curators and data scientists. Lots of good information here, although a pretty bleak picture. The grouping they use is:

Data creators/authors
Data scientists
Data managers
Data librarians.

Stephen Lawrie of Edinburgh, a research scientist (and data creator) in the field of neuroscience, gave the research scientist’s view (his group has been working with the DCC SCARP project, and applying the DRAMBORA risk assessment tool). His group has around 2,000 sMRI scans and 800 fMRI scans, taken on two different scanners (the source of some of their problems), some of which are unique resources. Their High Risk study into schizophrenia has been gathering data from people with family indications of pre-disposition, from age 16 (average age of onset is 25), coupling the MRI images with lots of other measures and indicators. Angus Whyte of DCC SCARP has written a report (Whyte, 2008) with some significant recommendations in this area. Although instrument makers appear to be interested in innovating rather than in harmonising their data, the field is attempting to develop post-processing capabilities that will harmonise that heterogeneous data.

Helen Parkinson of EBI as a data scientist and biocurator. Data scientists do real research, but may never touch the lab equipment. Nice quote on biocurators as the museum cataloguers of the Internet age; but not passive curators: they want to slice and dice, combine and re-use these data (see the PLoS article, Bourne & McEntyre, 2006). Talks about the data explosion and new types of data over the last 5 years. Both Helen and Stephen mention the key role of ontologies. The genomics field is benefiting from the Bermuda Rules, where an article cannot be published without the accession number for data deposit in the appropriate database (this does of course mean extra pressure on the curators: “my Nature paper deadline is tomorrow, I need that accession number right now!”). Her particular area is ArrayExpress, on transcriptomics data. They run a public/private data archive, a public gene data slice (which may be re-annotated), and another one I didn't catch (see the slides when they get mounted!). They seem to see curation as effectively a quality process, with added-value aspects such as annotation, mapping to ontology terms etc.

Robin Rice as data librarian. See Wikipedia definition of data library. Mentions IASSIST as an organisation of about 300 or so data libraries (or their equivalents) in the social sciences. Also DISC-UK as a small support group for information specialists in the UK. More personal help on campus than a virtual helpdesk? Services include helping with finding, accessing, using, and managing datasets. Data librarians could do more to help data creators. Refers back to the Leona Carpenter (Carpenter, 2004) definitions used by the DCC: data creators, data curators, and data re-users.

Sam Pepler of BADC as Data Manager, or “plumber” (“piping data from one place to another, ensuring data flows properly and that valuable [data] are not lost”; Swan?). Mentions several staff members fitting in the data manager box: storage coordinator, development manager, developer, infrastructure manager, climate modeller/domain expert. Looking at these people, their origins and current roles, it is clear that there is no career structure here (yet). No clear line between system admin, software developer and data manager, although separating data managers and maintainers is important.

Sheila Corrall on the education and training implications. Combination needed of domain expertise, technical skills (in data management) and people skills (in “translating” roles). Overlapping roles: content specialists, conduit specialists, and context specialists. Suggesting a federated model with disciplinary centres and librarians, and I guess local laboratories. Makes the good point that papers in repositories are pretty similar technically (give or take the PDF/Word/LaTeX continuum), but datasets are extremely heterogeneous (eg multiple scale factors, widely different encodings, standards and description/metadata standards).

In the afternoon, three breakout groups aiming to come up with recommendations to JISC, to the Research Information Network, to the DCC, and to research funders: I think these recommendations are worth getting right, in terms of wording, and will be made available later after a bit of careful wordsmithing.

Bourne, P. E., & McEntyre, J. (2006). Biocurators: Contributors to the World of Science. PLoS Computational Biology, 2(10), e142.

Carpenter, L. (2004). Taxonomy of digital curation users. Bath: Digital Curation Centre.

Swan, A., & Brown, S. (2008). The Skills, Role and Career Structure of Data Scientists and Curators: an Assessment of Current Practice and Future Needs: Key Perspectives.

Whyte, A. (2008). Curating Brain Images in a Psychiatric Research Group.
