This repository has been archived by the owner on Sep 8, 2021. It is now read-only.

Web archiving / preservation testing #157

Open
tdonohue opened this issue Aug 25, 2016 · 0 comments

tdonohue commented Aug 25, 2016

At OR16, I had a brief conversation with Paulo Graça (of RCAAP) about whether Angular 2 websites can be archived/preserved using standard web archiving tools like Webrecorder (https://webrecorder.io/).

I'm happy to say that this week, Paulo sent me an email noting that all his tests against the demo Angular 2 UI were successful! Here's the full email he sent along, which also does a great job of describing how this test could be reproduced (as needed):

We managed to talk briefly at Open Repositories. My name is Paulo Graça and I'm working with DSpace on a Portuguese project called RCAAP.

RCAAP is a national open access initiative, started in 2008, whose mission is to promote and support the development of the open access movement in Portugal. One of our electronic services is based on DSpace, which we have used since 2008. We currently manage the infrastructure and support for 28 DSpace repositories.

Regarding the new UI, our major concern is preservation. After the UI change, we would like all web pages to still be indexed and captured by Internet Archive search engines. We have the privilege of working alongside one of those teams, the Portuguese Internet Archive, arquivo.pt.

We would like to remain able to view and navigate past versions of the repository:
http://arquivo.pt/search.jsp?l=en&query=bibliotecadigital.ipb.pt&btnSubmit=Search

So we spoke with them and they recommended a tool called Webrecorder (https://webrecorder.io/). This tool works similarly to the one they use and can store an interactive session in which the user navigates through the website. The user provides a starting URL, and when the session ends the content is saved and the user can replay it. The same content remains accessible even if the website is unavailable.

We ran a small test. Since the content is supplied by the demo.dspace.org REST API, it would not be possible to repeat the exact test with the same content the following week.

The recorded session:

Note:
During replay, if the user tries to access any content that was not recorded, they will see a page without content.

The session was saved and can be accessed at the following URL:
https://webrecorder.io/paulo_graca/dspace-ui

For the replay test, we did not want any calls to be made to the source website, so we edited the hosts file so that the content source address resolved to localhost:
52.202.29.190 localhost

That measure ensured that no calls reached http://ui-prototype.atmire.com or, if any were made, that they would result in 404 errors.

We replayed the session and the result was as expected: the pages replayed perfectly, we navigated through them, and we were able to access the content on the website. For us, this means that this interface, as it is, is compatible with our expectations for website preservation.
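
For anyone reproducing the hosts-file step described in the email, it may help to confirm that the override is actually in effect before replaying. The following is only a rough sketch and not part of Paulo's test: the function names `is_loopback` and `resolves_to_loopback` are hypothetical, and it assumes the goal is for the source hostname to resolve to a loopback address on the machine doing the replay.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the textual IP address is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def resolves_to_loopback(hostname: str, resolver=socket.gethostbyname) -> bool:
    """Resolve the hostname with the system resolver and check for loopback.

    `resolver` is injectable (an illustrative design choice) so the check
    can be exercised without performing a real lookup.
    """
    try:
        return is_loopback(resolver(hostname))
    except OSError:
        # The name did not resolve at all, so it is not redirected to loopback.
        return False
```

Calling `resolves_to_loopback("ui-prototype.atmire.com")` on the replay machine should return `True` only when a local override points that name at the loopback interface. A `False` result suggests replayed pages could still reach the live site.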
