19 December 2008

Data management advice the UKDA way

Some of you may be aware of this already, but I recently came across a useful resource which underlines much of what was covered at the last Forum meeting in November. The UKDA website includes a set of well-written advice pages on sharing and managing data, including some bits on advice and training. Although it's obviously geared for social sciences researchers, most of the material is sufficiently generic to apply to data management across most disciplines. See http://www.data-archive.ac.uk/sharing/sharing.asp for info.

17 December 2008

RDMF2: Core Skills Diagram

Here's the draft version of a diagram we've put together to reflect the core skills for research data management - another outcome of RDMF2, which informed the breakout groups summarised in an earlier post.

It would be good to hear your views on this, particularly the crossover points (i.e. those skills which span two or more job 'types').

11 December 2008

Notes from breakout groups now available

The raw notes from last month's Forum breakout groups are now online at the RDMF section of the DCC website.

These will form the basis of the group's recommendations and outcomes, to follow in the New Year.

The notes are fairly rough and ready, but there could well be the germ of a great idea or two contained within...

29 November 2008

Research Data Management Forum: Roles & Responsibilities

The second RDMF workshop, again held in Manchester last week, was on roles and responsibilities in the management of data. We heard from the ubiquitous Andrew Treloar yesterday evening, relating the Australian data (and other) developments to those in the UK. Key to responsibilities there is the Australian Code for the Responsible Conduct of Research, which I wrote about before in the Digital Curation Blog: this puts complementary obligations onto researchers and their institutions in the retention and management of their data (incidentally, the RCUK consultation on similar matters ended on October 24; I gather we might get a result by next May or so). I do like the system-wide coherence of design of the Platforms for Collaboration in Australia, of which the Australian National Data Service (ANDS) is one part.

Thursday morning, Alma Swan has been talking about the Key Perspectives report (Swan & Brown, 2008) on skills and career structures for data curators and data scientists. Lots of good information here, although a pretty bleak picture. The grouping they use is:

Data creators/authors
Data scientists
Data managers
Data librarians.

Stephen Lawrie of Edinburgh, as a research scientist (and data creator) in the field of Neuroscience, taking the research scientist’s view (his group has been working with the DCC SCARP project, and applying the DRAMBORA risk assessment tool). Their group has around 2,000 sMRI scans and 800 fMRI scans, taken on two different scanners (the source of some of their problems), some of which are unique resources. Their High Risk study into schizophrenia has been gathering data from people with family indications of pre-disposition, from age 16 (average age of onset is 25), coupling the MRI images with lots of other measures and indicators. Angus Whyte of DCC SCARP has written a report (Whyte, 2008) with some significant recommendations in this area. Although instrument makers appear not to be interested in harmonising their data, but rather in innovating, the field is attempting to develop post-processing capabilities that will harmonise that heterogeneous data.

Helen Parkinson of EBI as a data scientist and biocurator. Data scientists do real research, but may never touch the lab equipment. Nice quote on biocurators as the museum cataloguers of the Internet age; but not passive curators: they want to slice and dice, combine and re-use these data (see the PLoS article by Bourne & McEntyre, 2006). Talks about the data explosion and new types of data over the last 5 years. Both Helen and Stephen mention the key role of ontologies. The genomics field is benefiting from the Bermuda Rules, where an article cannot be published without the accession number for data deposit in the appropriate database. (This does of course mean extra pressure on the curators: “my Nature paper deadline is tomorrow, I need that accession number right now!”) Her particular area is ArrayExpress, on transcriptomics data. They run a public/private data archive, a public gene data slice (which may be re-annotated), and another one I didn't catch (see the slides when they get mounted!). They seem to see curation as effectively a quality process, but with added-value aspects such as annotation, mapping to ontology terms etc.

Robin Rice as data librarian. See Wikipedia definition of data library. Mentions IASSIST as an organisation of about 300 or so data libraries (or their equivalents) in the social sciences. Also DISC-UK as a small support group for information specialists in the UK. More personal help on campus than a virtual helpdesk? Services include helping with finding, accessing, using, and managing datasets. Data librarians could do more to help data creators. Refers back to the Leona Carpenter (Carpenter, 2004) definitions used by the DCC: data creators, data curators, and data re-users.

Sam Pepler of BADC as Data Manager, or “plumber” (“piping data from one place to another, ensuring data flows properly and that valuable [data] are not lost” Swan?). Mentions several staff members fitting in the data manager box: storage coordinator, development manager, developer, infrastructure manager, climate modeller/domain expert. Looking at these people, their origins and current roles, it is clear that there is no career structure here (yet). No clear line between system admin, software developer and data manager, although separating data managers and maintainers is important.

Sheila Corrall on the education and training implications. Combination needed of domain expertise, technical skills (in data management) and people skills (in “translating” roles). Overlapping roles: content specialists, conduit specialists, and context specialists. Suggesting a federated model with disciplinary centres and librarians, and I guess local laboratories. Makes the good point that papers in repositories are pretty similar technically (give or take the PDF/Word/LaTeX continuum), but datasets are extremely heterogeneous (e.g. multiple scale factors, widely different encodings, and differing description/metadata standards).

In the afternoon, three breakout groups aiming to come up with recommendations to JISC, to the Research Information Network, to the DCC, and to research funders: I think these recommendations are worth getting right, in terms of wording, and will be made available later after a bit of careful wordsmithing.

Bourne, P. E., & McEntyre, J. (2006). Biocurators: Contributors to the World of Science. PLoS Computational Biology, 2(10), e142.

Carpenter, L. (2004). Taxonomy of digital curation users. Bath: Digital Curation Centre.

Swan, A., & Brown, S. (2008). The Skills, Role and Career Structure of Data Scientists and Curators: an Assessment of Current Practice and Future Needs. Key Perspectives.

Whyte, A. (2008). Curating Brain Images in a Psychiatric Research Group.

11 November 2008

Reading for RDMF?

James Mullins's paper "Enabling International Access to Scientific Data Sets" may provide food for thought ahead of this month's second RDMF meeting.

Thoughts re. other suggested readings? Let us know via the comments, below.

28 October 2008

Trusted Repository in the Clouds?

You may have seen that Microsoft has unveiled a cloud computing service, in which data and applications will not be stored on individuals' computers (reported 27th October at http://news.bbc.co.uk/1/hi/technology/7693993.stm). The new platform, dubbed Windows Azure, was announced at Microsoft's Professional Developers Conference in Los Angeles as "Windows for the cloud". It will be offered alongside the next Windows release, Windows 7.

It is not surprising to see Microsoft taking on established players like Google and Amazon in the rapidly growing business of online software. One may ask, however, what guarantees Microsoft is likely to provide for the continued storage and preservation of data stored in the cloud. Consumers will already have asked similar questions when archiving their photographs or personal data files in online vaults.

As Gavin Clarke has reported (http://www.theregister.co.uk/2008/10/28/microsoft_blocks_azure_traffic/), if applications posted to the Azure services platform exceed their allocated storage or processing hours, users may actually be turned away. But never mind, while you might not be able to sign up new customers or do any business with your data, Microsoft will look after them. It has promised not to dump data. So that’s ok then.

21 October 2008


A colleague drew my attention today to the service being offered by the MIT Libraries, namely, to help MIT faculty and researchers manage and publish their data – see http://libraries.mit.edu/guides/subjects/data-management/index.html. I did wonder for a moment whether this was a case of the old data/information conundrum but no, they really do mean data, and they’re not talking in terms merely of research papers.

Interestingly, MIT Libraries don’t claim to provide all the necessary resources, but to describe them, giving links to established national data repositories, for example - an approach that would fit nicely with the umbrella solution described in the UKRDS interim report (accessible from http://www.ukrds.ac.uk).

I’d be interested to know how many of the UK’s university libraries are offering resources for managing research data throughout their lifecycle – or even how many have plans to offer that service. Or is everyone awaiting the outcomes from the UKRDS study before making a commitment?

09 October 2008

Roles and Responsibilities for Effective Data Management

Registration is now open for the second DCC/RIN Research Data Management Forum, which will be held on November 26th and 27th at Chancellors Hotel and Conference Centre in Manchester.

Themes explored at the workshop will include: roles and responsibilities associated with data curation and good stewardship; core skills required for these roles; ways in which the various stakeholders might acquire the required skills; and models and working practice to facilitate the exchange of skills.

Full details, including registration information and draft agenda, are available via the DCC website at http://www.dcc.ac.uk/data-forum/

Please note that spaces are limited, so delegates are urged to register sooner rather than later.

02 July 2008

2nd Forum topic

I would support this topic area for the meeting. It has been touched on by other events but needs more exposure. I would include under roles and responsibilities the question of where understanding and skills development should come from - whose role to provide/deliver.

01 July 2008


We propose to hold the next meeting of the Research Data Management Forum in Birmingham, in November 2008. It has been suggested that the focus of the event should be upon roles and responsibilities, skills and expertise, with a particular examination of the skills and practice models that different groups bring to the table, both here and in an international setting.

Is this an area of interest or concern that you feel it would be advantageous to explore, or are there other issues that you consider it would be more useful and appropriate to address in the context of the Forum’s next meeting?