Project Tour

Let’s take a tour of the project repository. If you haven’t already, clone the repository for the project:

git clone git@github.com:phyletica/codiv-sanger-bake-off.git

and then cd into it:

cd codiv-sanger-bake-off

Here’s an overview of the contents of the project repository:

bin directory

This directory contains the executable Bash scripts psub and spawn_job_array.

These Bash scripts were written for members of the Phyletica Lab to submit jobs to the queues on Auburn University’s Hopper cluster. If you are working on a different system and want to use these scripts to submit analyses for this project, you will need to edit them to work on your system. Alternatively, you can submit the analyses “manually”; simple for loops at the command prompt work just fine for submitting the jobs.
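As a rough sketch of the manual approach, the function below submits every job script in a directory, one at a time. The directory name job-scripts, the *.sh pattern, and the use of sbatch as the submit command are all assumptions made for illustration, not names taken from this project; substitute whatever your scheduler and file layout require.

```shell
# Hypothetical defaults: adjust the submit command and the script
# directory for your system. Neither name comes from the project.
SUBMIT_CMD="${SUBMIT_CMD:-sbatch}"
JOB_DIR="${JOB_DIR:-job-scripts}"

submit_all() {
    # Submit each job script found in JOB_DIR, one at a time.
    for job_script in "$JOB_DIR"/*.sh; do
        # Skip the unexpanded pattern when no scripts match.
        [ -e "$job_script" ] || continue
        $SUBMIT_CMD "$job_script"
    done
}
```

Setting SUBMIT_CMD=echo before calling submit_all gives a dry run that only prints the paths that would be submitted.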

Also, when you run setup_project_env.sh (described below), all of the ecoevolity tools will be installed in this bin directory.

python-requirements.txt file

This text file lists the Python packages that the project’s Python scripts need in order to run. It is used when setting up a Python virtual environment (with venv or virtualenv) for the project.

This Python environment is created when you run setup_project_env.sh (described below).
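For readers curious what that step amounts to, here is a minimal sketch of creating such an environment by hand with Python’s venv module and pip. The directory name pyenv is a placeholder, not necessarily the one setup_project_env.sh uses, and the script may well do more than this.

```shell
# Sketch only: ENV_DIR is a hypothetical name; setup_project_env.sh
# may place the environment somewhere else.
ENV_DIR="${ENV_DIR:-pyenv}"

create_python_env() {
    # Create the virtual environment, then install the listed packages
    # using the environment's own pip. Run from the project root so
    # that python-requirements.txt is found.
    python3 -m venv "$ENV_DIR" &&
        "$ENV_DIR/bin/pip" install -r python-requirements.txt
}
```

In normal use you should not need this; running setup_project_env.sh is the supported route.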

data directory

This directory contains data files in a file format (YAML) recognized by ecoevolity.

All of the files with the naming scheme of comp##-#species-#genomes-######chars.txt are “dummy” data files that will be used by the simcoevolity tool to simulate datasets. They are fully valid data files for ecoevolity, but are only meant to serve the purpose of “telling” simcoevolity the size of datasets to simulate.
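To see which files follow this scheme, a shell glob can stand in for the # placeholders. The sketch below is deliberately loose: it only requires each numeric field to begin with a digit, and it assumes you run it from the top-level project directory (or pass the data directory as an argument). The function name is made up for this example.

```shell
list_dummy_data_files() {
    # The optional first argument is the data directory (default: data).
    # Each numeric field of the naming scheme must start with a digit.
    for f in "${1:-data}"/comp[0-9]*-[0-9]*species-[0-9]*genomes-[0-9]*chars.txt; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        echo "$f"
    done
}
```

Calling list_dummy_data_files with no arguments prints one matching path per line, or nothing if no file in data matches.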

docs directory

This directory contains the HTML of the project documentation, which is served by GitHub at http://phyletica.org/codiv-sanger-bake-off.

These HTML files are generated by Sphinx from the source files in docs-source.

docs-source directory

This directory contains the source files that are used by Sphinx to create the HTML files for the project site. These files are in reStructuredText format.

If you want to add to or edit the documentation for this project, docs-source is where to do that. This is covered in the Working on project docs section.

configs directory

This directory contains all of the configuration files needed for the ecoevolity tools. These configuration files specify where the data files are located, and all of the settings for analysis.

These configs are covered more thoroughly in The ecoevolity configs section.

For more details about ecoevolity config files, please see the ecoevolity documentation.

modules-to-load.sh file

This file contains the shell commands to load the modules on AU’s Hopper cluster that are needed for setting up and working on the project. If you are working on a different system, you will have to determine what modules are needed on your system to replace these; if your cluster is relatively new, perhaps none!

README.md file

This file contains some basic information about the project, and is rendered on the GitHub landing page for the project repository.

scripts directory

This directory contains a number of Bash and Python scripts that do most of the “heavy lifting” for this project.

setup_project_env.sh script

This is an executable Bash script that will:

  1. Compile and install (locally; within the project directory) ecoevolity.

  2. Create a Python virtual environment for this project.

See the Setting up the project section for instructions on using this script to set up the working environment for this project.