Various end-user tools for searching the FTP sites of the Ensembl project.

stefanches7/ensembl-ftp-search

Ensembl/Ensembl Genomes FTP search API

HTTP and JS interfaces for searching the FTP sites of Ensembl/Ensembl Genomes

Summary

This API is a convenient way to get links to the files you need from the Ensembl FTP site and the Ensembl Genomes FTP site. Enter an organism release name or a taxonomy branch, plus the data type you are interested in (say, "Sophophora" and "vep"), and you get links to all files matching your query (in this case, all "vep" files for every species in Sophophora).

Credits

Development of this API started during the Google Summer of Code program. The Ensembl/EG development team, part of the European Bioinformatics Institute, accepted the student Stefan Dvoretskii (author of this repository) to start and develop this project under the mentorship of Dan Staines, for the benefit of the global bioinformatics community. The repository author thanks his mentor and the Ensembl/EBI community for the great work together, and the Google Summer of Code community and organizations for making this collaboration technically possible.

Usage

HTTP interface

The API can be accessed through HTTP requests to the following endpoints:

  • /hello - "Hello World!" healthcheck
  • /search - the main end-point for searching.
  • /organismNameSuggestion, /fileTypeSuggestion - pattern search for distinct organism names and file types in the local database.

Searching

The /search endpoint searches the database with all the filters you specify. Currently, it intersects the filters and returns a Java-like list of the links that match the whole query (i.e., the list looks like [<link1>, <link2>...]).
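A client usually needs the links as a proper list rather than a single string. The following is a minimal sketch of one way to parse the response, assuming the body is exactly the bracketed, comma-and-space separated format described above; the function name and the example URLs are hypothetical and not part of this repository.

```python
def parse_link_list(body: str) -> list[str]:
    """Parse a Java-like list string such as "[a, b]" into a Python list."""
    text = body.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"expected a bracketed list, got: {text[:40]!r}")
    inner = text[1:-1].strip()
    if not inner:
        return []  # "[]" means no file matched the query
    # Assumption: elements are separated by ", " and links contain no such sequence.
    return [item.strip() for item in inner.split(", ") if item.strip()]


# Hypothetical response body, for illustration only.
links = parse_link_list("[ftp://ftp.example.org/a.gz, ftp://ftp.example.org/b.gz]")
print(len(links))  # 2
```

If the server output ever changes to JSON, a standard JSON parser would be the safer choice.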

The available filters are:

  • organismName - organism release name (lowercase, with underscores, e.g. "drosophila_melanogaster")
  • fileType - data type, e.g. "vep" or "fasta_cdna" (the fasta subtype follows the underscore)
  • taxaBranch - numeric taxonomy id from the NCBI Taxonomy database (e.g. "Drosophila melanogaster" has id 7227)

Note that the page and size parameters are available for paging; both must be numbers.
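The paging arithmetic can be sketched as follows. This assumes zero-based page numbering, which is inferred from the example in this section (page=1 is described as the second page); the helper name is hypothetical.

```python
def page_bounds(page: int, size: int) -> tuple[int, int]:
    """Return the zero-based (first, last) link indices covered by a page."""
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size must be >= 1")
    first = page * size      # pages are assumed to start at 0
    last = first + size - 1  # inclusive index of the last link on the page
    return first, last


print(page_bounds(1, 20))  # (20, 39)
```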

Examples:

<ip>:<port>/search?taxaBranch=7227&fileType=vep&page=1&size=20 - vep files for Drosophila melanogaster, second page of 20 links (links 20-39 of the result)

curl <ip>:<port>/search -d organismName='drosophila_melanogaster' -d fileType='vep' - the same query by organism name, but returning the whole result set this time
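For scripted access, the query string can be assembled programmatically. This is a minimal sketch that uses the filter names listed above (organismName, fileType, taxaBranch, page, size); the function name and the base URL http://localhost:8080 are assumptions for illustration, so substitute your own host and port.

```python
from urllib.parse import urlencode


def build_search_url(base_url, organism_name=None, file_type=None,
                     taxa_branch=None, page=None, size=None):
    """Build a /search URL, including only the filters that were given."""
    params = {
        "organismName": organism_name,
        "fileType": file_type,
        "taxaBranch": taxa_branch,
        "page": page,
        "size": size,
    }
    # Drop unset filters so they do not appear as "None" in the query string.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/search"
    return f"{url}?{query}" if query else url


# Hypothetical host and port.
print(build_search_url("http://localhost:8080", taxa_branch=7227,
                       file_type="vep", page=1, size=20))
# http://localhost:8080/search?fileType=vep&taxaBranch=7227&page=1&size=20
```

Omitting page and size yields a URL that requests the whole result set.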

Javascript interface

A more user-friendly JS interface is available; see web\src\js\search-webui.

Update job

The data indices that the HTTP interface uses for search and suggestions are updated by a Perl update job. Its source code can be found in the updatejob-perl directory.

Setup & configuration

Requirements

The following software is required to run the HTTP server:

  • JDK 1.8
  • MySQL 5.5+
  • Gradle *

* Gradle can be installed automatically via the Gradle Wrapper:

Linux:

ensembl-ftp-search~$ ./gradlew

Windows:

ensembl-ftp-search> gradlew.bat

The JavaScript user interface was built with React and therefore depends on Node and npm.

The update job requires Perl 5.24+ to run.

Configuration

The Spring Boot application is configured either through the main/resources/application.properties file in the Java sources root or by passing command line arguments, e.g. java -jar build/libs/ensembl-ftp-search-XXXXX.jar --server.port=9988. See the Spring documentation for a comprehensive list of the available Spring Boot options. Pay special attention to the spring.datasource group of values: they must point to a running and accessible MySQL database, otherwise the application will not start.
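As an illustration, an application.properties fragment might look like the sketch below. The property names are standard Spring Boot keys, but every value (host, port, schema name, credentials) is a hypothetical placeholder rather than this project's actual configuration.

```properties
# Hypothetical values: adjust host, port, schema name and credentials to your setup.
spring.datasource.url=jdbc:mysql://localhost:3306/ensembl_ftp_search
spring.datasource.username=ftpsearch
spring.datasource.password=change-me

# Optional: same effect as passing the port on the command line.
server.port=9988
```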

Startup

The HTTP server is started with the ./gradlew bootRun command (or gradle bootRun if you already have Gradle installed).

Logging

The HTTP server logs to the console from which it was started.
