The first step in setting up Lepbase was to create a locally hosted mirror of the four Lepidoptera species already on Ensembl Metazoa. If you’d like to easily install your own Ensembl mirror, check out the code at github.com/lepbase/easy-mirror.
One of the core Lepbase services is the Lepidoptera-specific Ensembl genome browser at ensembl.lepbase.org. Ensembl is a logical choice for a taxon-oriented resource such as Lepbase: it gives us access to a mature database structure and codebase, and it allows us to store data in a format that will have long-term support. Unfortunately, setting up an Ensembl server, even just to create a local mirror of existing content, has long been considered non-trivial. There are many dependencies, the code is complex, and the interconnected configuration files can make it difficult to trace the cause of problems during installation.
One of our earliest tasks at Lepbase was to make it easy to set up an Ensembl webserver, so that we could run multiple instances for development and testing and move our site between virtual machines without worrying about missing dependencies.
Easy mirror is the result of generalising this approach to simplify setting up a mirror of any Ensembl or EnsemblGenomes (including Bacteria, Metazoa, Fungi, Plants and Protists) species, with none, some or all of the data hosted locally. At the moment it is Ubuntu-specific (but we’d love to hear from anyone who’d like to use it as a guide to navigate the dependencies on another Linux distribution). In four steps it will take you from a freshly installed OS to a fully functional, locally hosted Ensembl mirror site:
- install-dependencies.sh takes care of installing all package and Perl module dependencies
- setup-databases.sh will set up users and fetch and load local copies of whichever single-/multi-species databases you wish to host locally
- update-ensembl-code.sh clones the required Ensembl/EnsemblGenomes GitHub repositories, checks out the appropriate branches for the release you wish to mirror and sets up basic config files for each species
- reload-ensembl-site.sh starts/restarts the Ensembl webserver, using local databases if available and falling back on remote databases if necessary
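The four steps are meant to be run in order. As a minimal sketch, the Python snippet below builds the command sequence and stops at the first failing step. It assumes that each script accepts the path to an .ini file as its only argument and that the scripts sit in a local checkout directory. The `easy-mirror` directory and `config.ini` file name are placeholders, so check the repository README for the actual invocation.

```python
import subprocess
from pathlib import Path

# The four easy-mirror steps, in the order described above.
STEPS = [
    "install-dependencies.sh",
    "setup-databases.sh",
    "update-ensembl-code.sh",
    "reload-ensembl-site.sh",
]


def build_commands(script_dir, ini_file):
    """Return one command (as an argument list) per step.

    Assumption: each script takes the .ini path as its only argument.
    """
    return [[str(Path(script_dir) / step), str(ini_file)] for step in STEPS]


def run_all(script_dir, ini_file, dry_run=True):
    """Run the steps in order; check=True aborts on the first failing step."""
    for command in build_commands(script_dir, ini_file):
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the repository was cloned.
    run_all("easy-mirror", "config.ini", dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a safe way to review the sequence before running it on a fresh machine.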
Setup is kept as simple as possible by storing configuration options in .ini files rather than relying on command-line flags, and fully working examples are included.
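To illustrate what an .ini-driven setup might look like, the sketch below parses a small configuration with Python's standard `configparser` module and decides, for each requested database, whether it would be served from a local host or from a remote fallback. The section names, key names (`DB_HOST`, `DB_FALLBACK_HOST`, `LOCAL_DBS`, `SPECIES_DBS`), database names and hostnames are all invented for illustration and are not taken from the easy-mirror example files.

```python
import configparser

# Hypothetical configuration; not the format used by easy-mirror itself.
EXAMPLE_INI = """
[DATABASE]
DB_HOST = localhost
DB_FALLBACK_HOST = remote.example.org
LOCAL_DBS = species_a_core species_b_core

[SITE]
SPECIES_DBS = species_a_core species_b_core species_c_core
"""


def choose_hosts(ini_text):
    """Map each requested database to the host it would be served from."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    local_host = parser["DATABASE"]["DB_HOST"]
    fallback_host = parser["DATABASE"]["DB_FALLBACK_HOST"]
    # Databases that have been loaded locally (whitespace-separated list).
    local_dbs = set(parser["DATABASE"].get("LOCAL_DBS", "").split())
    wanted = parser["SITE"]["SPECIES_DBS"].split()
    # Prefer the local copy; fall back to the remote host otherwise.
    return {db: (local_host if db in local_dbs else fallback_host) for db in wanted}


hosts = choose_hosts(EXAMPLE_INI)
for db, host in hosts.items():
    print(f"{db} -> {host}")
```

Keeping every option in one file like this means the same configuration can be reused unchanged when a site is rebuilt on another virtual machine.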