Monday, April 23, 4:00 pm
Virginia Dale Room/University Club
Lory Student Center
Presentation Abstract:
Digital repository tools such as DSpace and Fedora are now being widely promoted for the capture and preservation of primary academic research output, and many institutions and funders are starting to mandate such processes. In principle, this effort can be extended to the data on which scientific research rests, and it has the potential to generate a huge resource for data-driven practice. In Cambridge, we have started to explore this, and I shall report - with interactive demonstrations - on what is currently possible.
However, there are many problems that do not apply to research articles (usually in PDF).
These include:
But the major problem is getting it to happen. We believe that the best place to start is with theses. Here the institution is (or should be) in control of what is done, including the requirement to reposit semantic theses (i.e. in addition to any PDF). We are starting the JISC-funded SPECTRa-T project to develop protocols and tools.
When data are properly reposited, enormous new opportunities in data-driven science arise. We have developed a protocol for the automatic semantic capture of crystallography, computational chemistry and spectroscopy. All tools are Open, and we encourage anyone to take them and adapt them for their institutions. In this way we can use social computing to tackle the problems of scale and maintenance. We have also managed to capture all currently published crystallography from those journals which allow us to do so. This creates a large knowledgebase for molecules which - if widely adopted - will replace much of the broken commercial aggregation of chemical data by out-of-date secondary abstracting services.
But academia - disciplines, libraries and computational science - and funders have so far been apathetic to these problems and opportunities. I shall show what is possible and give demonstrations of what can be done with scientific data in repositories.