Report deadline: April 28th, during test 1 (or at the DI secretariat)
The goal of the project is to create a system for indexing and searching information about (external) documents.
A document consists of the following information: a URL and a list of keywords.
The system will allow a client to:
- Add documents to be indexed;
- Remove documents from the index;
- Search for documents, given a set of keywords.
Indexing services are used in a number of scenarios. For example, operating systems like Windows and Mac have indexing services that help search for files based on their contents (e.g., Windows Search, Spotlight). Document repositories, in addition to providing document storage and retrieval, also maintain indices to help search for information in the stored documents (e.g., Apache Solr can be used to build such systems... and more complex ones :-)). In the first example, the system you will be building could be used for indexing information about files by adding each file to the index whenever it changes: the URL would be the URL of the file in the filesystem (file://...) and the keywords would be the words present in the file. A similar approach could be used for the second example, replacing the notion of file with that of document. Likewise, you can also use the system you are building for indexing web pages, or any other documents that have a URL and for which you can identify a set of keywords.
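As an illustration of the file-indexing scenario, the sketch below turns the text of a file into a set of keywords. The class name `KeywordExtractor` and its tokenisation rule (lowercase, split on anything that is not a letter or digit) are assumptions made for this example; the project statement does not prescribe how keywords are obtained.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the provided library.
final class KeywordExtractor {
    private KeywordExtractor() {}

    // Lowercases the text, splits it on every run of characters that are
    // neither letters nor digits, and returns the distinct words, sorted.
    static Set<String> keywords(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

A document for a local file could then pair a `file://...` URL with the result of `KeywordExtractor.keywords(contents)`.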
The system must contain, at least, the following components:
Rendez-vous Server
The rendez-vous server maintains a list of indexing servers. The
REST
interface of this server should be the following:
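// JAX-RS annotations; the javax.ws.rs namespace is assumed here.
import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;

@Path("/contacts")
public interface RendezVousService {

    // Returns the endpoints of all registered indexing servers.
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    Endpoint[] endpoints();

    // Registers the indexing server identified by id.
    @POST
    @Path("/{id}")
    @Consumes(MediaType.APPLICATION_JSON)
    void register(@PathParam("id") String id, Endpoint endpoint);

    // Unregisters the indexing server identified by id.
    @DELETE
    @Path("/{id}")
    void unregister(@PathParam("id") String id);
}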
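One possible way to back these three operations is an in-memory map from server id to endpoint. The sketch below is only an illustration: the `Endpoint` class is a stand-in for the course-provided type (its `url` and `attributes` fields are assumed), `RendezVousRegistry` is a hypothetical name, and the JAX-RS wiring is left out so that the example depends only on the standard library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the course-provided Endpoint type; its fields are assumed.
final class Endpoint {
    final String url;
    final Map<String, Object> attributes;

    Endpoint(String url, Map<String, Object> attributes) {
        this.url = url;
        this.attributes = attributes;
    }
}

// Hypothetical in-memory registry mirroring the operations of RendezVousService.
final class RendezVousRegistry {
    // A concurrent map, since a REST server handles requests from several threads.
    private final Map<String, Endpoint> servers = new ConcurrentHashMap<>();

    Endpoint[] endpoints() {
        return servers.values().toArray(new Endpoint[0]);
    }

    void register(String id, Endpoint endpoint) {
        servers.put(id, endpoint);
    }

    void unregister(String id) {
        servers.remove(id);
    }
}
```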
Indexing servers
Each indexing server maintains indexing information.
The REST interface of this server should be the following:
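// JAX-RS annotations; the javax.ws.rs namespace is assumed here.
import java.util.List;
import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;

@Path("/indexer")
public interface IndexerService {

    // Searches the documents indexed by this server for the given keywords.
    @GET
    @Path("/search")
    @Produces(MediaType.APPLICATION_JSON)
    List<String> search(@QueryParam("query") String keywords);

    // Adds the document identified by id to the index.
    @POST
    @Path("/{id}")
    @Consumes(MediaType.APPLICATION_JSON)
    void add(@PathParam("id") String id, Document doc);

    // Removes the document identified by id from the index.
    @DELETE
    @Path("/{id}")
    void remove(@PathParam("id") String id);
}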
The SOAP interface of this server is the following:
package api.soap;

// JAX-WS annotations; the javax.jws and javax.xml.ws namespaces are assumed here.
import java.util.List;
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.WebFault;

@WebService
public interface IndexerAPI {

    @WebFault
    class InvalidArgumentException extends Exception {
        private static final long serialVersionUID = 1L;

        public InvalidArgumentException() {
            super("");
        }

        public InvalidArgumentException(String msg) {
            super(msg);
        }
    }

    static final String NAME = "IndexerService";
    static final String NAMESPACE = "http://sd2017";
    static final String INTERFACE = "api.soap.IndexerAPI";

    /*
     * keywords contains a list of words separated by '+'
     * returns the list of urls of the documents stored in this server that contain all the keywords
     * throws InvalidArgumentException if keywords is null
     */
    @WebMethod
    List<String> search(String keywords) throws InvalidArgumentException;

    /*
     * return true if document was added, false if the document already exists in this server.
     * throws InvalidArgumentException if doc is null
     */
    @WebMethod
    boolean add(Document doc) throws InvalidArgumentException;

    /*
     * return true if document was removed, false if it was not found in the system.
     * throws InvalidArgumentException if id is null
     */
    @WebMethod
    boolean remove(String id) throws InvalidArgumentException;
}
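The semantics described in the comments of the SOAP interface can be sketched with a plain in-memory index. This is only an illustration under stated assumptions: `Document` is a stand-in for the course-provided type (the `id`, `url` and `keywords` fields are assumed), `LocalIndex` is a hypothetical name, keyword matching is case-sensitive here, and the JAX-WS annotations are omitted so that the example depends only on the standard library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the course-provided Document type; field names are assumed.
final class Document {
    final String id;
    final String url;
    final Set<String> keywords;

    Document(String id, String url, List<String> keywords) {
        this.id = id;
        this.url = url;
        this.keywords = new HashSet<>(keywords);
    }
}

// Mirrors the exception declared inside IndexerAPI.
class InvalidArgumentException extends Exception {
    private static final long serialVersionUID = 1L;

    InvalidArgumentException(String msg) {
        super(msg);
    }
}

// Hypothetical local index following the contract in the interface comments.
final class LocalIndex {
    private final Map<String, Document> docs = new ConcurrentHashMap<>();

    // Returns the URLs of the stored documents that contain every '+'-separated keyword.
    List<String> search(String keywords) throws InvalidArgumentException {
        if (keywords == null) throw new InvalidArgumentException("keywords is null");
        List<String> wanted = Arrays.asList(keywords.split("\\+"));
        List<String> urls = new ArrayList<>();
        for (Document doc : docs.values()) {
            if (doc.keywords.containsAll(wanted)) urls.add(doc.url);
        }
        return urls;
    }

    // Returns false when a document with the same id is already stored.
    boolean add(Document doc) throws InvalidArgumentException {
        if (doc == null) throw new InvalidArgumentException("doc is null");
        return docs.putIfAbsent(doc.id, doc) == null;
    }

    // Returns false when no document with this id is stored.
    boolean remove(String id) throws InvalidArgumentException {
        if (id == null) throw new InvalidArgumentException("id is null");
        return docs.remove(id) != null;
    }
}
```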
- To allow clients to distinguish between REST and SOAP service instances, the endpoint of the server should be registered at the RendezVousServer with attributes that include the key "type" with the value "rest" or "soap", respectively. In the absence of the "type" key, the client will assume the server is a REST server.
- The first argument of the indexer, if present, must be the URL of the rendez-vous server -- the test program will start the indexer with the correct parameters.
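The rule for the "type" attribute can be sketched as a small client-side helper. The representation of the attributes as a `Map<String, Object>` and the name `EndpointTypes` are assumptions for this example; treating any value other than "soap" as REST is also an assumption, since the statement only defines the behaviour when the key is absent.

```java
import java.util.Map;

// Hypothetical helper; the attribute map type is assumed.
final class EndpointTypes {
    private EndpointTypes() {}

    // Returns "soap" only when the "type" attribute says so; a missing
    // attribute map or a missing "type" key means the server is REST.
    static String typeOf(Map<String, Object> attributes) {
        Object type = attributes == null ? null : attributes.get("type");
        return "soap".equals(type) ? "soap" : "rest";
    }
}
```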
The indexing service will be used by clients according to the following access pattern.
For adding information for a document, a client will: (1) contact the rendez-vous server to get a list of indexing servers; (2) select one of the indexing servers and invoke the add operation.
For removing information of a document, a client will: (1) contact the rendez-vous server to get a list of indexing servers; (2) select one of the indexing servers and invoke the remove operation.
For searching for information stored in the system, a client will: (1) contact the rendez-vous server to get a list of indexing servers; (2) select one of the indexing servers and invoke the search operation.
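The two-step pattern shared by the three operations can be sketched as follows. The names are hypothetical: `IndexClient` takes a supplier that stands for step (1), returning the URLs of the indexing servers known to the rendez-vous server, and a function that stands for the operation invoked in step (2). Choosing the server at random is an assumption; the statement only says that the client selects one.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical client skeleton; the actual REST/SOAP calls are passed in.
final class IndexClient {
    private final Supplier<List<String>> rendezVous;
    private final Random random = new Random();

    IndexClient(Supplier<List<String>> rendezVous) {
        this.rendezVous = rendezVous;
    }

    // Step 1: ask the rendez-vous server for the indexing servers.
    // Step 2: pick one of them and invoke the operation on it.
    <T> T invoke(Function<String, T> operation) {
        List<String> servers = rendezVous.get();
        if (servers.isEmpty()) {
            throw new IllegalStateException("no indexing servers registered");
        }
        String chosen = servers.get(random.nextInt(servers.size()));
        return operation.apply(chosen);
    }
}
```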
IMPORTANT: In phase 1, each indexing server only needs to be able to return information for documents that have been added to that server.
NOTE: We will provide the following components, to be used in the system being developed and for testing it:
- a library for indexing, supporting an interface similar to the interface of the indexing server, which stores information locally in a node;
- a test program that will execute a sequence of operations and check if the returned results are the expected ones. You should not change the code of the test program.
The rendez-vous server must maintain up-to-date information about indexing servers. To this end, the information about an indexing server must be discarded if the server stops.
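The statement leaves the mechanism open. One possible approach, sketched below, is a lease: each indexing server renews its registration periodically, and the rendez-vous server discards entries that have not been renewed within a time limit. The class `LeaseTable`, the heartbeat idea and the time limit are all assumptions of this example, not requirements of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lease table kept by the rendez-vous server.
final class LeaseTable {
    private final long ttlMillis;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    LeaseTable(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Called when a server registers and on every later heartbeat.
    void renew(String id, long nowMillis) {
        lastSeen.put(id, nowMillis);
    }

    // Removes every server whose last renewal is older than the time limit
    // and returns the ids that were discarded.
    List<String> evictExpired(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            boolean tooOld = nowMillis - entry.getValue() > ttlMillis;
            // remove(key, value) leaves the entry alone if it was renewed meanwhile.
            if (tooOld && lastSeen.remove(entry.getKey(), entry.getValue())) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    boolean isAlive(String id) {
        return lastSeen.containsKey(id);
    }
}
```

In this sketch the rendez-vous server would call `evictExpired(System.currentTimeMillis())` from a periodic task and unregister the returned ids.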
It should be possible to automatically find the rendez-vous server. To this end, the rendez-vous server should reply to a multicast request carrying the message "rendezvous" with a string containing the URL of the rendez-vous server. The multicast address and port used by the server can be selected freely.
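A minimal sketch of the server side of this discovery exchange, using `java.net.MulticastSocket`, is shown below. The class name `DiscoveryResponder`, the UTF-8 encoding, the 1024-byte buffer and the choice to send the reply directly back to the sender of the request are assumptions of this example.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

// Hypothetical responder run by the rendez-vous server.
final class DiscoveryResponder {
    static final String REQUEST = "rendezvous";

    private DiscoveryResponder() {}

    // Returns the reply payload (the server URL) for a discovery request,
    // or null when the received message is not a discovery request.
    static byte[] replyFor(byte[] data, int length, String serverUrl) {
        String message = new String(data, 0, length, StandardCharsets.UTF_8).trim();
        return REQUEST.equals(message) ? serverUrl.getBytes(StandardCharsets.UTF_8) : null;
    }

    // Joins the multicast group and answers discovery requests until the
    // socket fails; intended to run in its own thread.
    static void serve(String groupAddress, int port, String serverUrl) throws Exception {
        InetAddress group = InetAddress.getByName(groupAddress);
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(new InetSocketAddress(group, port), null);
            byte[] buffer = new byte[1024];
            while (true) {
                DatagramPacket request = new DatagramPacket(buffer, buffer.length);
                socket.receive(request);
                byte[] reply = replyFor(request.getData(), request.getLength(), serverUrl);
                if (reply != null) {
                    // Send the URL back to the address the request came from.
                    socket.send(new DatagramPacket(reply, reply.length, request.getSocketAddress()));
                }
            }
        }
    }
}
```

A client would send a datagram containing "rendezvous" to the same group address and port, then wait (with a timeout) for a datagram holding the URL.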
This option consists of the complete system, composed of the rendez-vous and indexing servers, communicating using REST.
As a result of this option, you should have a working system consisting of a REST-based rendez-vous server and a set of REST-based indexing servers.
Each indexing server only needs to be able to return information for documents that have been added to that server. However, if a remove for a given document is invoked in a server, the information for that document should be removed independently of the server where it is indexed.
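One way to meet the remove requirement is for the server that receives the request to remove the document locally and then ask every other indexing server to do the same. The sketch below shows that idea only; the `Peer` interface and its `removeLocal` operation are hypothetical (they are not part of the prescribed interfaces), and a real implementation has to make sure that a forwarded remove is not forwarded again.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical view of an indexing server that can remove a document
// from its own index without forwarding the request any further.
interface Peer {
    boolean removeLocal(String id);
}

// Hypothetical coordinator run by the server that received the remove.
final class RemoveCoordinator {
    private final Peer self;
    private final Supplier<List<Peer>> otherServers;

    RemoveCoordinator(Peer self, Supplier<List<Peer>> otherServers) {
        this.self = self;
        this.otherServers = otherServers;
    }

    // Removes the document here and on every other reachable server.
    // Returns true if at least one server actually held the document.
    boolean remove(String id) {
        boolean removed = self.removeLocal(id);
        for (Peer peer : otherServers.get()) {
            try {
                removed |= peer.removeLocal(id);
            } catch (RuntimeException e) {
                // The peer is unreachable; skip it and continue with the rest.
            }
        }
        return removed;
    }
}
```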
This option consists of implementing the indexing server using SOAP. It is optional to also implement the rendez-vous server in SOAP or to use the REST version. The exact interfaces that the servers must implement will be introduced in lab 3.
As a result of this option, you should have a working system consisting of a rendez-vous server (either using REST or SOAP) and a set of SOAP-based indexing servers.
Each indexing server only needs to be able to return information for documents that have been added to that server. However, if a remove for a given document is invoked in a server, the information for that document should be removed independently of the server where it is stored.
This option consists of having REST and SOAP indexing servers capable of working together.
As a result of this option, you should have a working system consisting of a REST-based rendez-vous server and a set of indexing servers, some working in REST and the others working in SOAP.
Regarding failures of the components, you must assume:
- the rendez-vous server will not fail;
- indexer servers may fail permanently (fail-stop model) -- note that this will cause connections to the server to fail.
Regarding communications, you should assume that communication may fail temporarily.
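Temporary communication failures are commonly masked by retrying the failed call a bounded number of times. The sketch below shows one such helper; the name `Retry`, the fixed delay between attempts and the decision to retry on any exception are assumptions of this example. A call that keeps failing after all attempts may indicate a permanently failed indexing server rather than a temporary fault.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper for remote invocations.
final class Retry {
    private Retry() {}

    // Runs the call up to maxAttempts times, sleeping delayMillis between
    // attempts, and rethrows the last failure if every attempt fails.
    static <T> T withRetries(Callable<T> call, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```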
IMPORTANT: The project must be demonstrated in the labs, with servers running on at least two computers/containers, either using existing hardware or the students' own hardware.
Your system will be tested using the test program provided in this link, which is divided into steps that test the different functionalities of your program -- you should use the client to check the progress of your project as you add new functionalities to your work.
The grading of your project will take into consideration the tests passed by your system, so you should ensure that your system passes as many tests as possible (projects will be accepted even if they do not pass all tests).
A written report must be delivered by each group describing their work and implementation. The report should have at most 4 pages (any code that is found relevant should be delivered as an appendix that goes beyond the 4 page limit).
The report must cover the following topics:
- General description of the work performed by the students, clearly identifying which aspects were completed and fully implemented.
- Limitations of the delivered code. Students should include as an annex a table that specifies which tests their code passed. For the failed tests, students should indicate whether the test failed because the tested functionality was not implemented or because it had a bug.
- Interfaces of the servers (both SOAP and REST).
- Clear explanation of the mechanisms (i.e., protocols) employed for:
  - Discovery of the rendez-vous server.
  - Keeping the rendez-vous server up-to-date.
  - Handling of faults.
- Discussion of the implementation decisions taken by the students, when applicable, discussing these decisions in light of possible alternatives (this should include how operations are executed, with a focus on those whose implementation is non-trivial).
The report can also cover difficulties encountered by the students during the project, or other aspects that the students consider relevant.
The code of the project should be delivered in electronic format, by uploading a zip file that includes:
- all source files (src directory in the project)
- the sd2017-t1.props file
- the pom.xml file
Use this link to deliver your work (NOTE: you must log in with your @campus account).
To keep the size of the zip archive small, zip the full Eclipse project minus the target folder that Maven generates with the compiled classes and downloaded dependencies.
IMPORTANT: The name of the zip archive should be:
SD2017-T1-NUM1.zip or SD2017-T1-NUM1-NUM2.zip
NOTE: You may deliver the project as many times as needed.