# Installation

To start using the EpiDiverse analysis pipelines, follow the steps below:

1. [Install Nextflow](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#1-install-nextflow)
2. [Install the pipeline](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#2-install-the-pipeline)
   * [Automatic](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#2-1-automatic)
   * [Offline](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#2-2-offline)
   * [Development](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#2-3-development)
3. [Pipeline configuration](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#3-pipeline-configuration)
   * [Configuration profiles](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#3-1-configuration-profiles)
   * [Software dependencies: Bioconda](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#3-2-software-dependencies-bioconda)
   * [Software dependencies: Docker and Singularity](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#3-3-software-dependencies-docker-and-singularity)
4. [Appendices](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#appendices)
   * [Running on EpiDiverse infrastructure](/project/-MfxkdBDZggX_vc_sG5l/epidiverse-pipelines/installation.md#running-on-epidiverse-infrastructure)

### 1) Install Nextflow

Nextflow runs on most POSIX systems (Linux, macOS, etc.). It can be installed by running the following commands:

```bash
# Make sure that Java v8+ is installed:
java -version

# Install Nextflow v19.09+
curl -fsSL get.nextflow.io | bash

# Add Nextflow binary to your $PATH (create ~/bin first if it does not exist):
mkdir -p ~/bin && mv nextflow ~/bin/
# OR system-wide installation:
# sudo mv nextflow /usr/local/bin
```

See [nextflow.io](https://www.nextflow.io/) for further instructions on how to install and configure Nextflow itself.
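The Java check above can also be scripted. The following is a minimal sketch, not part of the pipelines: `java_major` and `MIN_JAVA` are hypothetical names introduced here, and the threshold of 8 is taken from the comment in the install commands. Raise `MIN_JAVA` if your Nextflow release requires a newer Java.

```bash
# Hypothetical helper: print the major Java version for a version string.
# Handles the legacy scheme (1.8.0_292 -> 8) and the current one (11.0.2 -> 11).
java_major() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;    # strip the legacy "1." prefix
    esac
    echo "${v%%[!0-9]*}"       # keep only the leading digits
}

MIN_JAVA=8   # assumption: minimum taken from the instructions above

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; the version is the first quoted field
    raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    major=$(java_major "$raw")
    if [ "${major:-0}" -ge "$MIN_JAVA" ]; then
        echo "Java $raw looks sufficient (major version $major)"
    else
        echo "Java $raw may be too old: version $MIN_JAVA or later is needed" >&2
    fi
else
    echo "java not found on \$PATH" >&2
fi
```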

### 2) Install the pipeline

#### **2.1) Automatic**

The pipelines themselves need no installation: Nextflow will automatically fetch them from GitHub if, for example, `epidiverse/wgbs` is specified as the pipeline name.

#### **2.2) Offline**

The above method requires an internet connection so that Nextflow can download the pipeline files. If you're running on a system that has no internet connection, you'll need to download and transfer the pipeline files manually using the following (pseudo)code:

```bash
# Download the latest release of the pipeline
# eg. (see https://github.com/epidiverse/wgbs/releases)
curl -L https://github.com/epidiverse/[PIPELINE]/archive/[VERSION].zip -o epidiverse-[PIPELINE]-[VERSION].zip
unzip epidiverse-[PIPELINE]-[VERSION].zip
cd /path/to/my/data
nextflow run /path/to/pipelines/epidiverse-[PIPELINE]-[VERSION] [PARAMETERS]
```

NB: Please replace `[PIPELINE]`, `[VERSION]` and `[PARAMETERS]` as necessary, depending on the latest release from e.g. <https://github.com/EpiDiverse/wgbs/releases>
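As an illustration of filling in the placeholders, the sketch below builds the download URL and archive name from shell variables. The function `release_url` is a hypothetical helper, and the version `1.0` is only an assumed example tag; check the releases page for the tags that actually exist.

```bash
# Hypothetical helper: build the release archive URL used in the commands above.
# $1 = pipeline name, $2 = release tag
release_url() {
    echo "https://github.com/epidiverse/$1/archive/$2.zip"
}

PIPELINE="wgbs"
VERSION="1.0"   # assumed example tag, not necessarily a real release

echo "URL:     $(release_url "$PIPELINE" "$VERSION")"
echo "Archive: epidiverse-${PIPELINE}-${VERSION}.zip"
```

The printed URL can then be passed to `curl -L ... -o` exactly as in the commands above.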

#### **2.3) Development**

If you would like to make changes to the pipeline, it's best to make a fork on GitHub and then clone the files. Once cloned you can run the pipeline directly as above.

### 3) Pipeline configuration

By default, the pipelines run with the `-profile standard` configuration profile. This uses a number of sensible defaults for process requirements and is suitable for running on a single, reasonably powerful server. You can see this configuration in `conf/base.config`, relative to the base directory of each pipeline repository.

Be warned of two important points about the default configuration:

1. The default profile uses the `local` executor
   * All jobs are run in the login session. If you're using a simple server, this may be fine. If you're using a compute cluster, this is bad as all jobs will run on the head node.
   * See the [Nextflow docs](https://www.nextflow.io/docs/latest/executor.html) for information about running with other hardware backends. Most job scheduler systems are natively supported.
2. Nextflow will expect all software to be installed and available on the `$PATH`
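Because the default profile expects software on the `$PATH`, a quick pre-flight check can save a failed run. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the tool list below only names `java` and `nextflow`, so extend it with the software each pipeline actually requires.

```bash
# Hypothetical helper: print every given tool name that is not on the $PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumption: this list is illustrative only, add the pipeline's own tools.
missing=$(missing_tools java nextflow)

if [ -n "$missing" ]; then
    echo "Not found on \$PATH:" $missing >&2
else
    echo "All listed tools found on \$PATH"
fi
```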

#### **3.1) Configuration profiles**

Nextflow can be configured to run on a wide range of different computational infrastructures. In addition to pipeline-specific parameters it is likely that you will need to define system-specific options.

{% hint style="info" %}
For more information, please see the [Nextflow documentation](https://www.nextflow.io/docs/latest/).
{% endhint %}

Whilst most parameters can be specified on the command line, it is usually sensible to create a configuration file for your environment. A template for such a config can be found in `assets/custom.config` from the base directory of each pipeline repository.

If you are the only person to be running this pipeline, you can create your config file as `~/.nextflow/config` and it will be applied every time you run Nextflow. Alternatively, save the file anywhere and reference it when running the pipeline with `-config /path/to/config`.
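For example, a small site-specific config could be written and referenced as sketched below. This is not the template shipped in `assets/custom.config`: the file name `custom.config`, the queue name and the resource values are all placeholders chosen for illustration.

```bash
# Assumption: the config is written to ./custom.config unless CONFIG_FILE is set.
CONFIG_FILE="${CONFIG_FILE:-custom.config}"

cat > "$CONFIG_FILE" <<'EOF'
// Hypothetical site settings: adjust to your own scheduler and limits
process {
    executor = 'slurm'
    queue = 'my_queue'
    cpus = 4
    memory = '8 GB'
}
EOF

echo "Wrote $CONFIG_FILE"

# Then reference it when launching a pipeline, for example:
# nextflow run epidiverse/wgbs -config "$CONFIG_FILE" [PARAMETERS]
```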

If you think that other people using the pipeline would benefit from your configuration (e.g. other common cluster setups), please let us know. We can add a new preset configuration profile which can be used by specifying `-profile <name>` when running the pipeline.

The pipelines already come with several such config profiles - see the installation appendices and usage documentation for more information.

#### **3.2) Software dependencies: Bioconda**

If you're unable to use either Docker or Singularity but you have conda installed, you can use the bioconda environment that comes with the pipeline. Using the predefined `-profile conda` configuration when running the pipeline will take care of this automatically.

If you prefer to build your own environment, running this command will create a new conda environment with all of the required software installed:

```bash
conda env create -f env/environment.yml # Run from the pipeline base directory
conda clean -a                          # Recommended, not essential
conda activate wgbs                     # Name depends on version
```

The `env/environment.yml` file is located relative to the base directory of the pipeline repository. Note that you may need to download this file from the GitHub project page if Nextflow is automatically fetching the pipeline files. Ensure that the version of the Bioconda environment file matches the pipeline version that you run.
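Since the environment name depends on the pipeline version, it can be read from the file rather than guessed. The sketch below assumes the file has a top-level `name:` key, as conda environment files normally do; `conda_env_name` is a hypothetical helper.

```bash
# Hypothetical helper: print the value of the top-level "name:" key
# of a conda environment file.
conda_env_name() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Assumption: run from the pipeline base directory unless ENV_FILE is set.
ENV_FILE="${ENV_FILE:-env/environment.yml}"

if [ -f "$ENV_FILE" ]; then
    echo "Activate with: conda activate $(conda_env_name "$ENV_FILE")"
else
    echo "No environment file at $ENV_FILE" >&2
fi
```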

#### **3.3) Software dependencies: Docker and Singularity**

With either [Docker](https://www.docker.com/) or [Singularity](http://singularity.lbl.gov/) installed, you can use the predefined `-profile docker` or `-profile singularity` configurations when running the pipeline. These take care of software dependencies automatically using the official container pulled from Docker Hub.

If you prefer to use your own container, running the pipeline with the option `-with-singularity <container>` or `-with-docker <container>` and pointing towards a specific image will allow it to be automatically fetched and used.

If running offline with Singularity, you'll need to download and transfer the Singularity image first:

```bash
singularity pull --name epidiverse-[PIPELINE]-[VERSION].simg docker://epidiverse/[PIPELINE]:[VERSION]
```

Once transferred, use `-with-singularity` but specify the path to the image file:

```bash
nextflow run /path/to/epidiverse/[PIPELINE] -with-singularity /path/to/epidiverse-[PIPELINE]-[VERSION].simg
```

### Appendices

#### **Running on EpiDiverse infrastructure**

To run the pipeline on the [EpiDiverse](https://epidiverse.eu/) servers (`epi` or `diverse`), use the command line flag `-profile epi` or `-profile diverse` respectively. This tells Nextflow to submit jobs using the `SLURM` job executor and use a pre-built conda environment for software dependencies.

There are also three shortcuts available for EpiDiverse species which can be used in place of `--reference` in pipelines that require a reference genome.

* `--thlaspi`
* `--fragaria`
* `--populus`


