Prepare second release (#6)
0x002A authored Nov 19, 2021
1 parent 3a25612 commit 0e2e042
Showing 19 changed files with 645 additions and 86 deletions.
38 changes: 30 additions & 8 deletions README.md
@@ -3,18 +3,40 @@
**Mu**lti-**C**ore **H**ybrid **S**hort- **A**nd **L**ong-read **S**equence **A**ssembler

Based on:
Gatter, Thomas, et al. "Economic genome assembly from low coverage Illumina and Nanopore data." 20th International
Workshop on Algorithms in Bioinformatics (WABI 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
Gatter, T., von Löhneysen, S., Fallmann, J. et al. LazyB: fast and cheap genome assembly. Algorithms Mol Biol 16, 8 (
2021).

# Binary
# Prerequisites

A statically linked binary for x86-64 can be downloaded from the releases page on GitHub.
`MuCHSALSA` requires the following tools to be available in the `PATH`:

- jellyfish 2
- bbduk
- Abyss 2 (using 'abyss-pe')
- minimap2
- Python 3.6.9 (at least)

# Download

A zip file containing a statically linked binary for x86-64 along with the assembly pipeline and associated scripts
can be downloaded from the releases page on GitHub.

# Usage

In contrast to [LazyB](https://github.com/TGatter/LazyB) this tool has an additional parameter to control the level of
parallelization. Simply plug it into the LazyB-pipeline and set the additional parameter.
**ATTENTION**: If no value is given all available cores will be used.
The pipeline can be run using the following command:

```bash
sh pipeline.sh [k-mer-size-filter] [k-mer-size-assembly] [name] [illumina-inputfile-1] [illumina-inputfile-2] [nanopore-inputfile] [output-folder]
```

**Note**: The level of parallelization used for parts of the pipeline (default value: 8) is set via a variable
in `pipeline.sh`.

[k-mer-size-filter] specifies the k-mer size for k-mer counting in the raw Illumina data. Reads with highly abundant
k-mers are removed from the data. Starting at k=50 is recommended.

[k-mer-size-assembly] specifies the k-mer size used during Illumina assembly (here using Abyss). Starting at k=90 is
recommended.
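
The filtering idea behind [k-mer-size-filter] can be illustrated with a small sketch. This is not how the pipeline does it (the pipeline delegates counting and filtering to the external tools listed under Prerequisites); the function names `countKmers` and `filterAbundantReads` and the `maxCount` threshold are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: count every k-mer (substring of length k) over all reads.
std::unordered_map<std::string, std::size_t> countKmers(std::vector<std::string> const &reads, std::size_t k) {
  assert(k > 0);
  std::unordered_map<std::string, std::size_t> counts;
  for (auto const &read : reads) {
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
      ++counts[read.substr(i, k)];
    }
  }
  return counts;
}

// Hypothetical helper: keep only reads in which no k-mer occurs more than
// maxCount times across the whole input. Reads shorter than k are kept.
std::vector<std::string> filterAbundantReads(std::vector<std::string> const &reads, std::size_t k,
                                             std::size_t maxCount) {
  auto const counts = countKmers(reads, k);

  std::vector<std::string> kept;
  for (auto const &read : reads) {
    bool hasAbundantKmer = false;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
      if (counts.at(read.substr(i, k)) > maxCount) {
        hasAbundantKmer = true;
        break;
      }
    }
    if (!hasAbundantKmer) {
      kept.push_back(read);
    }
  }
  return kept;
}
```

A larger k makes k-mers more specific, so fewer reads share them and fewer reads are dropped; this is one reason the choice of k matters for the filter step.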

# Building

@@ -59,5 +81,5 @@ find . -regex '.*\.\(cpp\|h\)' -exec clang-format-12 -style=file -i {} \;

# Standard Library

This project uses _libc++_ as standard library. For information on building _libc++_
This project uses `libc++-12` as standard library. For information on building `libc++`
see [here](https://libcxx.llvm.org/docs/BuildingLibcxx.html).
2 changes: 1 addition & 1 deletion include/ms/BlastFileReader.h
@@ -26,7 +26,7 @@

#include <gsl/pointers>

#include "Lb.fwd.h"
#include "MS.fwd.h"

namespace muchsalsa {

17 changes: 8 additions & 9 deletions include/ms/Kernel.h
@@ -19,8 +19,8 @@
//
//===---------------------------------------------------------------------------------------------------------------==//

#ifndef INCLUDED_MUCHSALSA_PROKRASTINATOR
#define INCLUDED_MUCHSALSA_PROKRASTINATOR
#ifndef INCLUDED_MUCHSALSA_KERNEL
#define INCLUDED_MUCHSALSA_KERNEL

#pragma once

@@ -31,7 +31,7 @@
#include <unordered_map>
#include <vector>

#include "Lb.fwd.h"
#include "MS.fwd.h"
#include "graph/Edge.h"

namespace muchsalsa {
@@ -40,9 +40,8 @@ namespace muchsalsa {
// KERNEL
// =====================================================================================================================

std::optional<graph::EdgeOrder> computeOverlap(matching::MatchMap const &matches, std::vector<unsigned int> &ids,
graph::Edge const &edge, bool direction, std::size_t score,
bool isPrimary);
std::optional<graph::EdgeOrder> getOverlap(matching::MatchMap const &matches, std::vector<unsigned int> &ids,
graph::Edge const &edge, bool direction, std::size_t score, bool isPrimary);

std::vector<std::tuple<std::vector<unsigned int>, std::size_t, bool>>
getMaxPairwisePaths(matching::MatchMap const &matches, graph::Edge const &edge,
@@ -55,8 +54,8 @@ graph::Graph getMaxSpanTree(graph::Graph const &graph);

std::vector<std::vector<muchsalsa::graph::Vertex *>> getConnectedComponents(graph::Graph const &graph);

graph::DiGraph getDirectionGraph(gsl::not_null<matching::MatchMap *> pMatchMap, graph::Graph const &graph,
graph::Graph const &connectedComponent, graph::Vertex const &startNode);
graph::DiGraph getDirectedGraph(gsl::not_null<matching::MatchMap *> pMatchMap, graph::Graph const &graph,
graph::Graph const &connectedComponent, graph::Vertex const &startNode);

std::vector<std::vector<muchsalsa::graph::Vertex const *>> linearizeGraph(gsl::not_null<graph::DiGraph *> pDiGraph);

@@ -68,6 +67,6 @@ void assemblePath(

} // namespace muchsalsa

#endif // INCLUDED_MUCHSALSA_PROKRASTINATOR
#endif // INCLUDED_MUCHSALSA_KERNEL

// ---------------------------------------------------- END-OF-FILE ----------------------------------------------------
File renamed without changes.
2 changes: 1 addition & 1 deletion include/ms/SequenceAccessor.h
@@ -34,7 +34,7 @@
#include <unordered_map>
#include <utility>

#include "Lb.fwd.h"
#include "MS.fwd.h"

namespace muchsalsa {

2 changes: 1 addition & 1 deletion include/ms/graph/Edge.h
@@ -29,7 +29,7 @@
#include <utility>
#include <vector>

#include "Lb.fwd.h"
#include "MS.fwd.h"
#include "types/Direction.h"
#include "types/Toggle.h"

2 changes: 1 addition & 1 deletion include/ms/graph/Graph.h
@@ -38,7 +38,7 @@
#include <utility>
#include <vector>

#include "Lb.fwd.h"
#include "MS.fwd.h"
#include "Util.h"
#include "graph/Vertex.h"

64 changes: 24 additions & 40 deletions include/ms/matching/Id2OverlapMap.h
@@ -58,77 +58,61 @@ struct KeyEqual : public std::binary_function<key_t, key_t, bool> {
// class Id2OverlapMap
// -------------------

/**
* Class representing the mapping of pairs of an Illumina id and a clique index to overlaps.
*/
class Id2OverlapMap {
public:
/**
* Constructor.
*/
Id2OverlapMap() = default;

/**
* Destructor.
*/
~Id2OverlapMap() = default;

/**
* Copying is disallowed.
*/
Id2OverlapMap(Id2OverlapMap const &) = delete;

/**
* Moving is disallowed.
*/
Id2OverlapMap(Id2OverlapMap &&) = delete;

/**
* Move assignment is disallowed.
*/
Id2OverlapMap &operator=(Id2OverlapMap &&) = delete;

/**
* Copy assignment is disallowed.
*/
Id2OverlapMap &operator=(Id2OverlapMap const &) = delete;

/**
* Subscript operator.
*
* @param key
* @return
* @param key a const reference to a std::pair of an unsigned int and a std::size_t representing the illumina id and
* the clique index which the overlap is or should become mapped to
* @return A reference to the mapped overlap, performing an insertion if no element with a key equivalent to key exists
*/
std::pair<int, int> &operator[](detail::key_t const &key);

/**
* Access operator.
*
* @param key
* @return
* @param key a const reference to a std::pair of an unsigned int and a std::size_t representing the illumina id and
* the clique index which the overlap is mapped to
* @return A const reference to the mapped overlap
* @throws std::out_of_range
*/
std::pair<int, int> const &at(detail::key_t const &key) const;

/**
* Access operator.
*
* @param key
* @return
* @param key a const reference to a std::pair of an unsigned int and a std::size_t representing the illumina id and
* the clique index which the overlap is mapped to
* @return A reference to the mapped overlap
* @throws std::out_of_range
*/
std::pair<int, int> &at(detail::key_t const &key);

std::size_t getSize() const;

private:
std::unordered_map<detail::key_t, std::pair<int, int>, detail::KeyHash, detail::KeyEqual>
m_map; /*!< std::unordered_map storing the mapping */
};

// =====================================================================================================================
// INLINE DEFINITIONS
// =====================================================================================================================

// -------------------
// class Id2OverlapMap
// -------------------

// PUBLIC CLASS METHODS

inline std::pair<int, int> &Id2OverlapMap::operator[](detail::key_t const &key) { return m_map[key]; }

inline std::pair<int, int> &Id2OverlapMap::at(detail::key_t const &key) { return m_map.at(key); }

inline std::pair<int, int> const &Id2OverlapMap::at(detail::key_t const &key) const { return m_map.at(key); }

inline std::size_t Id2OverlapMap::getSize() const { return m_map.size(); }

} // namespace muchsalsa::matching

#endif // INCLUDED_MUCHSALSA_ID2OVERLAPMAP
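
The difference between `operator[]` (inserts a default-constructed overlap when the key is missing) and `at` (throws `std::out_of_range`) documented above can be shown with a stand-in class. This is only a sketch: `sketch::OverlapMap` mirrors the inline definitions in this header, but the body of `KeyHash` is an assumption, since the real `detail::KeyHash` is not part of this hunk, and the default `std::equal_to` is used instead of `detail::KeyEqual`.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <unordered_map>
#include <utility>

namespace sketch {

// Key layout as described in the doc comments: (illumina id, clique index).
using key_t = std::pair<unsigned int, std::size_t>;

// Assumed hash combining both key parts; the project's real KeyHash may differ.
struct KeyHash {
  std::size_t operator()(key_t const &key) const noexcept {
    auto const h1 = std::hash<unsigned int>{}(key.first);
    auto const h2 = std::hash<std::size_t>{}(key.second);
    return h1 ^ (h2 << 1);
  }
};

// Stand-in for Id2OverlapMap that forwards to std::unordered_map.
class OverlapMap {
public:
  // Inserts a value-initialized overlap if the key is not present yet.
  std::pair<int, int> &operator[](key_t const &key) { return m_map[key]; }

  // Throws std::out_of_range if the key is not present; never inserts.
  std::pair<int, int> const &at(key_t const &key) const { return m_map.at(key); }

  std::size_t getSize() const { return m_map.size(); }

private:
  std::unordered_map<key_t, std::pair<int, int>, KeyHash> m_map;
};

} // namespace sketch
```

Because `operator[]` can insert, it cannot be called on a const map, which is why the header offers a const overload only for `at`.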
2 changes: 1 addition & 1 deletion include/ms/matching/MatchMap.h
@@ -31,7 +31,7 @@
#include <unordered_map>
#include <utility>

#include "Lb.fwd.h"
#include "MS.fwd.h"
#include "graph/Graph.h"
#include "types/Toggle.h"

2 changes: 1 addition & 1 deletion include/ms/types/Toggle.h
@@ -37,7 +37,7 @@ namespace muchsalsa {
/**
* Class representing a toggle which can reach two possible states.
*
* A toggle has the size of a bool, but supports operations like an unsigned integer.
* A toggle has the size of a bool, but supports operations like an integer.
*
* Example:
*
24 changes: 24 additions & 0 deletions libms/src/MS.fwd.cpp
@@ -0,0 +1,24 @@
// -*- C++ -*-
//===---------------------------------------------------------------------------------------------------------------==//
//
// Copyright (C) 2021 Kevin Klein
// This file is part of MuCHSALSA <https://github.com/0x002A/MuCHSALSA>.
//
// MuCHSALSA is free software: you can redistribute it and/or modify it under the terms of the GNU General
// Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any
// later version.
//
// MuCHSALSA is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
// details.
//
// You should have received a copy of the GNU General Public License along with MuCHSALSA.
// If not, see <http://www.gnu.org/licenses/>.
//
// SPDX-License-Identifier: GPL-3.0-or-later
//
//===---------------------------------------------------------------------------------------------------------------==//

#include "MS.fwd.h"

// ---------------------------------------------------- END-OF-FILE ----------------------------------------------------
2 changes: 1 addition & 1 deletion libms/src/kernel/ap.cpp
@@ -1090,7 +1090,7 @@ void muchsalsa::assemblePath(

auto globalLeftMostPosition = globalPos1 * -1;
auto const targetName = [&]() {
std::string targetName = ">Prokrastinator_";
std::string targetName = ">muchsalsa_";
targetName.append(std::to_string(asmIdx));

return targetName;
8 changes: 4 additions & 4 deletions libms/src/kernel/dg.cpp
@@ -32,10 +32,10 @@
#include "types/Direction.h"
#include "types/Toggle.h"

muchsalsa::graph::DiGraph muchsalsa::getDirectionGraph(gsl::not_null<muchsalsa::matching::MatchMap *> pMatchMap,
muchsalsa::graph::Graph const & graph,
muchsalsa::graph::Graph const & connectedComponent,
muchsalsa::graph::Vertex const &startNode) {
muchsalsa::graph::DiGraph muchsalsa::getDirectedGraph(gsl::not_null<muchsalsa::matching::MatchMap *> pMatchMap,
muchsalsa::graph::Graph const &graph,
muchsalsa::graph::Graph const &connectedComponent,
muchsalsa::graph::Vertex const &startNode) {
std::stack<std::tuple<graph::Vertex const *, Toggle>> stack;
stack.push(std::make_tuple(&startNode, true));

22 changes: 11 additions & 11 deletions libms/src/kernel/co.cpp → libms/src/kernel/ol.cpp
@@ -29,9 +29,9 @@

namespace {

std::pair<double, double> computeOverhangs(muchsalsa::matching::MatchMap const & matches,
muchsalsa::graph::Vertex const *const pVertex,
muchsalsa::graph::Edge const &edge, unsigned int illuminaId) {
std::pair<double, double> getOverhangs(muchsalsa::matching::MatchMap const &matches,
muchsalsa::graph::Vertex const *const pVertex,
muchsalsa::graph::Edge const &edge, unsigned int illuminaId) {
auto const pVertexMatch = gsl::make_not_null(matches.getVertexMatch(pVertex->getId(), illuminaId));
auto const pEdgeMatch = gsl::make_not_null(matches.getEdgeMatch(&edge, illuminaId));

@@ -52,19 +52,19 @@ std::pair<double, double> computeOverhangs(muchsalsa::matching::MatchMap const &

} // unnamed namespace

std::optional<muchsalsa::graph::EdgeOrder> muchsalsa::computeOverlap(muchsalsa::matching::MatchMap const &matches,
std::vector<unsigned int> & ids,
muchsalsa::graph::Edge const &edge, bool direction,
std::size_t score, bool isPrimary) {
std::optional<muchsalsa::graph::EdgeOrder> muchsalsa::getOverlap(muchsalsa::matching::MatchMap const &matches,
std::vector<unsigned int> &ids,
muchsalsa::graph::Edge const &edge, bool direction,
std::size_t score, bool isPrimary) {
auto const &firstId = ids.front();
auto const &lastId = ids.back();

auto const vertices = edge.getVertices();

auto const overhangsFirstIdFirstVertex = computeOverhangs(matches, vertices.first, edge, firstId);
auto const overhangsLastIdFirstVertex = computeOverhangs(matches, vertices.first, edge, lastId);
auto const overhangsFirstIdSecondVertex = computeOverhangs(matches, vertices.second, edge, firstId);
auto const overhangsLastIdSecondVertex = computeOverhangs(matches, vertices.second, edge, lastId);
auto const overhangsFirstIdFirstVertex = getOverhangs(matches, vertices.first, edge, firstId);
auto const overhangsLastIdFirstVertex = getOverhangs(matches, vertices.first, edge, lastId);
auto const overhangsFirstIdSecondVertex = getOverhangs(matches, vertices.second, edge, firstId);
auto const overhangsLastIdSecondVertex = getOverhangs(matches, vertices.second, edge, lastId);

auto const leftOverhangFirstVertex = overhangsFirstIdFirstVertex.first;
auto const rightOverhangFirstVertex = overhangsLastIdFirstVertex.second;