Locah Linking Lives: an introduction

We are very pleased to announce that the Archives Hub will be working on a new Linked Data project, called Linking Lives, over the next 11 months, following on from our first phase of Locah. We’d like to introduce this with a brief overview of the benefits of the Linked Data approach, and an outline of what the project is looking to achieve. For more in-depth discussion of Linked Data, please see our Locah project blog, or take a look at the Linked Data website guides and tutorials.

Linked Data Cloud

The benefits of Linked Data

The W3C currently has a draft of a report, ‘Library Linked Data’, which covers archives and museums. In this they state that:

‘Linked data is shareable, extensible, and easily re-usable… These characteristics are inherent in the linked data standards and are supported by the use of web-friendly identifiers for data and concepts.’

Shareable

One of the exciting things about Linked Data is that it is about sharing data (certainly where you have Linked Open Data). I have found that this emphasis on sharing and data integration has actually had a positive effect aside from the practical reality of sharing; it engenders a mindset of collaboration and sharing, something that is of great value not just in the pursuit of the Linked Data vision, but also, more broadly, for any kind of collaborative effort and for encouraging a supportive environment. Our previous Linked Data project, Locah, has been great for forging contacts and for putting archival data within this exciting space where people are talking about the future of the Web and the sorts of things that we might be able to do if we work together.

For the Archives Hub, our aim is to share the descriptions of archive collections as a means to raise the profile of archives, and show just how relevant archives are across numerous disciplines and for numerous purposes. In many ways, sharing the data gives us an opportunity to get away from the idea that archives are only of interest to a narrow group of people (i.e. family historians and academics purely within the History Faculty).

Extensible

The principle of allowing for future growth and development seems to me to be vital. The idea is to ensure that we can take a flexible approach, whereby enhancements can be made over time. This is vital for an exploratory area like Linked Data, where an iterative approach to development is the best way to go, and where we are looking at presenting data in new ways, looking to respond to user needs, and working with what technology offers.

Reusable

‘Reuse’ has become a real buzzword, and is seen as synonymous with efficiency and flexibility. In this context it is about using data in different contexts, for different purposes. In a Linked Data environment what this can mean is providing the means for people to combine data from different sources to create something new, something that answers a certain need. To many archivists this will be fine, but some may question the implications in terms of whether the provenance of the data is lost and what this might mean. What if information from an archive description is combined with information from Wikipedia? Does this have any implications for the idea of archive repositories being trusted, and does it mean that pieces of the information will be out of their original context and therefore in some way open to misuse or misinterpretation?

Reuse may throw up issues, but it provides a great deal more benefits than risks. Whatever the caveats, it is an inevitable consequence of open data combined with technology, so archives either join in or exclude themselves from this type of free-flow of data. Nevertheless, it is certainly worth thinking about the issues involved in providing data within different contexts, and our project will consider issues around provenance.
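To make the provenance question a little more concrete, here is a minimal sketch in Python of combining statements about one person from two sources while keeping a note of where each statement came from. It is only an illustration under stated assumptions: the URI, the property names and the values are all invented, and this is not how the Hub or the Linking Lives interface actually stores or merges data.

```python
from collections import defaultdict

# Hypothetical identifier for a person; not a real Hub or Wikipedia URI.
PERSON = "http://example.org/person/1"

# Each statement is (subject, property, value, source). All values are invented.
hub_statements = [
    (PERSON, "name", "Example Person", "Archives Hub"),
    (PERSON, "archive", "Papers of Example Person", "Archives Hub"),
]
wikipedia_statements = [
    (PERSON, "name", "Example Person", "Wikipedia"),
    (PERSON, "birthPlace", "Manchester", "Wikipedia"),
]


def combine(*datasets):
    """Merge statements by subject, recording every source that makes each claim."""
    merged = defaultdict(lambda: defaultdict(set))
    for dataset in datasets:
        for subject, prop, value, source in dataset:
            merged[subject][(prop, value)].add(source)
    return merged


profile = combine(hub_statements, wikipedia_statements)

# Show each statement alongside the source or sources it came from.
for (prop, value), sources in sorted(profile[PERSON].items()):
    print(f"{prop}: {value} (from {', '.join(sorted(sources))})")
```

The point of the sketch is simply that combining data need not mean losing track of its origin: each statement carries its sources with it, so an interface could show the user which information came from an archive description and which came from elsewhere.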

Linking Lives

The basic idea of the Linking Lives project is to develop a new Web interface that presents useful resources relating to individual people, and potentially organisations as well.

It is about more than just looking at a name-based approach for archives. It is also about utilising external datasets in order to bring archives together with other data sources. Researchers are often interested in individual people or organisations, and will want to know what sort of resources are out there. They may not just be interested in archives. Indeed, they may not really have thought about using archives, but they may be very interested in biographical information, known connections, events during a person’s lifetime, etc. The idea is to show that various sources exist, including archives, and thus to work towards broadening the user-base of archives.

Our interface will bring different data sources together and we will link to all the archival collections relating to an individual. We have many ideas about what we can do, but with limited time and resources, we will have to prioritise, test out various options and see what works and what doesn’t and what each option requires to implement. We’ll be updating you via the blog, and we are very interested in any thoughts that you have about the work, so please do leave comments, or contact us directly.

In some ways, our approach may be an alternative to using EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons and Families, an XML standard for marking up names associated with archive collections). But maybe in essence it will complement EAC-CPF, because eventually we could use EAC authority records and create Linked Data from them. We have followed the SNAC project with interest, and we recently met up with some of the SNAC project members at the Society of American Archivists’ Conference. We hope to take advantage of some of the exciting work that they are doing to match and normalise name records.

The W3C draft report on Library Linked Data states that ‘through rich linkages with complementary data from trusted sources, libraries [and archives] can increase the value of their own data beyond the sum of its sources taken individually’. This is one of the main principles that we would like to explore. By providing an interface that is designed for researchers, we will be able to test out the benefits of the Linked Data approach in a much more realistic way.

Maybe we are at a bit of a crossroads with Linked Data. A large number of data sets have been put out as RDF/XML, and some great work has been done by people like the BBC (e.g. the Wildlife Finder), the University of Southampton and the various JISC-funded projects. We have data.gov.uk, making Government data sets much more open. But there is still a need to make a convincing argument that Linked Data really will provide concrete benefits to the end user. Talking about SPARQL endpoints, JSON, Turtle, triples that connect entities and the benefits of persistent URIs won’t convince people who are not really interested in process and principles, but just want to see the benefits for themselves.

Has there been too much emphasis on the idea that if we output Linked Data then other people can (will) build tools? The much quoted adage is ‘The best thing that will be done with your data will be done by someone else’, but is there a risk in relying on this idea? In order to get buy-in to Linked Data, we need visible, concrete examples of benefit. Yes, people can build tools, they can combine data for their own purposes, and that’s great if and when it happens; but for a community like the archives community the problem may be that this won’t happen very rapidly, or on the sort of scale that we need to prove the worth of the investment in Linked Data. Maybe we are going to have to come up with some exemplars that show the sorts of benefits that end users can get. We hope that Linking Lives will be one step along this road; not so much an exemplar as a practical addition to the Archives Hub service that researchers will immediately be able to benefit from.

Of course, there has already been very good work within the library, archive and museum communities, and readers of this blog may be interested in the Use Cases that are provided at http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport

Just to finish with a quote from the W3C draft report (I’ve taken the liberty of adding ‘archives’ as I think they can readily be substituted):

“Linked Data reaches a diverse community far broader than the library/archive community; moving to library/archival Linked Data requires libraries/archives to understand and interact with the entire information community. Much of this information community has been engendered by the capabilities provided by new technologies. The library/archive community has not fully engaged with these new information communities, yet the success of Linked Data will require libraries/archives to interact with them as fully as they interact with other libraries/archives today. This will be a huge cultural change that must be addressed.”

photo of paper chain dolls
Flickr: Icono SVDs photostream, http://www.flickr.com/photos/28860201@N05/with/3674610629/

Archives Wales

I recently attended the ‘Online Development in Wales’ day organised by ARCW (Archives and Records Council Wales) to talk about the Porth Archifau (Archives Hub). I found out a good deal about what is happening in Wales at the moment and heard about plans and wishes for future developments.

In her introduction, Charlotte Hodgson from ARCW talked about the need for online catalogues with images rather than the other way around. Maybe there is too much emphasis on digitisation of images which become separated from their context. She referred to the good work of Archives Network Wales (ANW), but acknowledged that Wales is in danger of falling behind with online catalogues. There is a need to maximise opportunities, minimise duplication and effectively deploy resources.

Kim Collis from ARCW gave some background on ANW (now Archives Wales), which is a searchable database for collection-level descriptions that uses a MySQL database and a Typo3 front-end. It has stayed relatively static since it was first developed; the emphasis of individual offices may have moved to their own web presence (many were using CALM and there was something of a race to get their catalogues online). The front-end of the ANW site has not always been very user-friendly and has not provided the depth of information that it might. However, it was developed in a standards-based way, and this stands it in good stead for future development. ‘Archives Wales’ was a bolt-on to the database, giving more information, including additional information about repositories, and making a more complete and visually appealing site.

There has been some geo-tagging within ANW recently. This was seen as a good way to link in with People’s Collection Wales, enabling users to find out more information about, for example, a family that has owned an estate. Kim talked about a number of possible developments, such as a project to provide links to searchable tithe apportionment transcripts. The idea is to allow volunteers to transcribe the images.

Kim talked about the need to improve branding and identity. The site must be kept up to date to give it credibility. But there is, in a sense, competition with repository websites because many repositories want to prioritise these. I think it is worth impressing upon archivists the importance of cross-searching capability that aggregators provide, as well as the value of searching within a repository. We should not presuppose that researchers primarily want to know what is at just one individual office; they usually want to find ‘stuff’ on their topic of interest and then go down to the more detailed level of individual sources of information.

Sam Velumyl from The National Archives talked about the Discovery initiative at TNA, which provides a new information architecture that will accommodate the different systems that TNA has.   The idea is that it can accommodate the integration of other systems easily, making it a more sustainable and flexible solution. They are going to be carrying out an exercise in gathering feedback on Discovery, and you’re likely to hear about that very soon.  Sam said that the feedback will help TNA to decide upon their priorities. It may be that A2A will become active again, but at present this has not been decided.  There were concerns in the room that it is very difficult to get TNA to provide data back out of A2A.

People’s Collection Wales, which was presented to us by three speakers, is very much geared towards user-friendly and fun engagement in the history and culture of Wales. It works on the basis of everything being an item, and it gathers items together in collections by topic, not in the way that archivists would normally understand collections, but simply by areas that will be of interest to users. It is quite an eclectic experience, designed to draw in a broad section of the community and promote learning and understanding of Welsh history.  Re-purposing is a strong principle behind PCW. It integrates social media to encourage the idea of sharing the photograph or interview or whatever on Facebook or Twitter. It also has a scrapbook function so that people can gather together their own collections. It does link to the item within context, so you can link back to the website of the depositor.

PCW are going to be using an API to upload collection records  from Archives Wales. I got a little confused about this, as they also spoke about manual upload. I think the automated upload will only be for certain records.  They are also doing some interesting work with GIS, to enable users to do things like look at maps over time to see how a place has developed, and looking at making museum objects viewable in a 3-D way.

My plea to PCW is to make their titles clickable links where it seems as if they should be clickable. I found the site fun, with some great stuff, but it can take a while to understand what you are looking at. I went to browse the collections and many of them are untitled, and it’s not really clear what they are representing. I tried the map interface and looked for ‘castle’ near ‘barmouth’ and I was taken to a page of images of people talking about the Eisteddfod. The second time it worked better, but some of the images were not actually images, and one of them remained in place when I did another search and I couldn’t delete it from the display. I also had a few more experiences of searches hanging and the display freezing. But then other searches worked well and I started getting links from places to objects. So it was a mixed bag for me: it seemed quite beta in terms of functionality, and it was also very slow, which I do think is a problem. It feels very experimental, with loads of good ideas, but I wonder if it would be better to concentrate on developing fewer ideas and making them more effective.

The afternoon was more focussed on solutions for getting archives online. CyMAL recently commissioned research to analyse requirements for extending online access to archive catalogues in Wales, building on ARCW, and Sarah Horton gave us a summary of some of the findings. Some of the stats were quite interesting: 11 local authority services use CALM, 1 uses the Archivists’ Toolkit and 1 uses Word. In higher education: 3 CALM, 1 Word, 1 no formal catalogue. The National Library of Wales uses the Virtua library system and AC-NMW uses AdLib. The survey found that the application of authority files and data standards was variable.

For online access: 3 services provide it via CALMView, but there are barriers to this for many offices, one being IT and their concerns about security. 4 services provide access via their own systems, and 2 via PDF documents. About 8,000 collections are listed on Archives Wales and 2,000 on the Hub.

9 services have backlogs of between 10% and 30%, and 6 have backlogs of over 30%; the figures are higher if poor quality catalogues are taken into account. Many catalogues remain in manual form only.

We had a very interesting talk on the Black Country History website. Linda Ellis talked about how important it was for the project to be sustainable right from the outset.  The project was about working together to reduce costs and create a sustainable online resource. The original website used the Axiell DSCovery software, but it was not fit for purpose.  The redevelopment was by Orangeleaf System using their CollectionsBase system and WordPress, which means it is very easy to create different front-ends. There are a number of microsites, such as one for geology, filtered by keyword, a great idea for a way to target different audiences with minimal additional effort. Partners can upload data when they like via an XML export from CALM.  CollectionsBase will also take Excel, Access and manual data entry.   There is an API, so the data goes on to Culture Grid and Europeana.

Altogether a very stimulating day, with a good vibe and plenty of discussion.

Out and about or Hub contributor training

Every year we provide our contributors and potential contributors with free training on how to use our EAD editor software.

The days are great fun and we really enjoy the chance to meet archivists from around the UK and find out what they are working on.

The EAD editor has been developed so that archivists can create online descriptions of their collections without having to know EAD.  It’s intuitive and user friendly and allows contributors to easily add collection level and multi-level descriptions to the Hub.  Users can also enhance their descriptions by adding digital archival objects  – images, documents and sound files.

Contributor training day

Our training days are a mixture of presentation, demonstration and practical hands-on work. We (the training team consists of Jane, Beth and myself) tend to start by talking a little about Hub news and developments to set the scene for the day, and then we move on to why the Hub uses EAD and why using standards is important for interoperability and means that more ‘stuff’ can be done with the data. We go from here on to a hands-on session that demonstrates how to create a basic record. We also cover adding lower-level components and images, and we show contributors how to add index terms to their descriptions. (Something that we heartily endorse! We LOVE standards and indexing!)

We always like to tailor our training to the users, and encourage users to bring along their own descriptions for the hands-on sessions. Some users manage to submit their first descriptions to the Hub by the end of the training session!

This year we have done training in Manchester and London, for the Lifeshare project team in Sheffield and for the Oxford colleges. We are also hoping (if we get enough take up) to run courses in Glasgow and Cardiff this year. (6th Sept at Glasgow Caledonian, Cardiff date TBC. Email archiveshub@mimas.ac.uk to book a place)

So far this year three new contributors have joined the Hub as a result of training:  Middle East Centre Archive, St Antony’s College, Oxford; Salford City Archive and the Taylor Institute, Oxford. We’ve also enabled four of our existing contributors to start updating their collections on the Hub: National Fairground Archive, the Co-operative Archive, St John’s College, Oxford and the V&A.

We have been given some great feedback this year and 100% of our attendees agreed/strongly agreed that they were satisfied with the content and teaching style of the course.

Some of our feedback:

A very good introductory session to working with the EAD editor for the Archives Hub. I have not used the Archives Hub for a long time so an excellent refresher course.

This was a fantastic workshop – excellently designed resources, Lisa and Jane were really helpful (and patient!). The hands-on aspect was really useful: I now feel quite confident about creating EAD records for the Hub, and even more confident that the Hub team are on hand with online help

The hands on experience and being able to ask questions of the course leaders as things happened was really useful. Being able to work on something relevant to me was also a bonus.

Excellent presentation and delivery. I came along with a theoretical but not a practical knowledge of the Archives Hub and its workings, and the training session was pitched perfectly and was completely relevant to my job. Many thanks.

The Hub team train archivists in how to use the EAD editor, archive students in EAD and social media, and research students in how to use the Hub to search for primary source materials. You can find a list of the training that we provide on our training pages: http://archiveshub.ac.uk/trainingmodules/. We’re always happy to hear from people who are interested in training – do let us know!

HubbuB: August 2011

We are out and about in August. Jane and Joy will be going to the Society of American Archivists’ Conference this year, speaking as part of a panel session. We will be talking about Discovery, the Archives Hub and Linked Data. We’re also very excited to be visiting the OCLC offices in Dublin, Ohio. Lisa and Bethan will be at the Archives and Records Association conference in Edinburgh, so go and say hello if you are there. Lisa is also speaking at the conference.

Our Monthly Feature is all levitating women and moustachioed men, as we take a trip into Magic and Illusion at the Fairground Archive: http://archiveshub.ac.uk/features/magic/. Some great images, and a lovely photograph of Cyril Critchlow, a wizard in his 80s, performing as ‘Wizardo, Harry Potter’s grandfather’!

We’ve recently created a page of Top Tips for Cataloguing: http://archiveshub.ac.uk/cataloguingtips/. These are some of the key areas that we believe are important for good online catalogues. We do still find that archivists don’t always think about the global online environment, so it’s worth setting out some of the most important points to bear in mind. It’s partly about thinking of the audience, browsing the Web, using Google, scanning pages for relevant content, and it’s partly about descriptions – ensuring that the title is as clear and self-explanatory as possible, thinking about how best to describe the archive in a way that is user-friendly.

We’ve been talking about ways to help get descriptions onto the Hub when they are created in Microsoft Word or Excel. We’re just exploring possibilities at the moment, but we are interested in anyone who uses, or knows anyone who uses, Microsoft Word to catalogue. Maybe smaller offices, or maybe you ask volunteers to do some of this?

We know people do use Microsoft Excel as well. We are thinking about ‘Tips for using Excel’. Would this be useful? We don’t necessarily want to give the impression that Excel is the most appropriate choice for cataloguing: it’s spreadsheet software, not really designed for complex hierarchical archives. But we do realise that for some people, the choice of what to use is limited, and we want to do our best to accommodate the realities that people are faced with.

We’ve had some interest in the idea of researchers being able to request digital copies of archives through the Hub. That is, a researcher comes across an archive they would like to see, and they would like digital copies, so they indicate this in some way. This is not yet fully thought out, but again, we’d need to know if there is a need for it. How many offices are starting to digitise on demand?

Finally, we’re covering music, dance, plants, medicine and the Middle East with our latest contributors. Check out who has recently come on board on our contributors’ page:
http://archiveshub.ac.uk/contributors/