Redefine validity for records #1

Merged
merged 2 commits into from
Oct 18, 2024
24 changes: 9 additions & 15 deletions .github/workflows/documentation.yml
@@ -8,22 +8,16 @@ on:
pull_request:

jobs:
build:
permissions:
contents: write
pull-requests: read
statuses: write
Documenter:
name: Documentation
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v2
- uses: actions/checkout@v2
- uses: julia-actions/setup-julia@v1
with:
version: 'nightly'
- uses: julia-actions/cache@v1
- name: Install dependencies
run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
- name: Build and deploy
version: '1'
- uses: julia-actions/julia-buildpkg@latest
- uses: julia-actions/julia-docdeploy@latest
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # If authenticating with GitHub Actions token
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # If authenticating with SSH deploy key
run: julia --project=docs/ docs/make.jl
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }}
25 changes: 12 additions & 13 deletions .github/workflows/unittest.yml
@@ -8,35 +8,34 @@ on:

jobs:
test:
name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }}
runs-on: ${{ matrix.os }}
continue-on-error: ${{ matrix.experimental }}
strategy:
fail-fast: false
matrix:
julia-version:
- 'nightly'
os: [ ubuntu-latest, windows-latest ]
arch: [ x64 ]
julia-version: ['1', '1.11']
os: [ubuntu-latest, macOS-latest, windows-latest]
experimental: [false]
include:
# Include nightly, but experimental, so it's allowed to fail without
# failing CI.
- julia-version: nightly
julia-arch: x86
os: ubuntu-latest
experimental: true
# - julia-version: 1
# os: macOS-latest
# experimental: false
fail_ci_if_error: false
steps:
- name: Checkout Repository
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Setup Julia
uses: julia-actions/setup-julia@v1
uses: julia-actions/setup-julia@latest
with:
version: ${{ matrix.julia-version }}
- name: Run Tests
uses: julia-actions/julia-runtest@latest
- name: Create CodeCov
uses: julia-actions/julia-processcoverage@v1
uses: julia-actions/julia-processcoverage@latest
- name: Upload CodeCov
uses: codecov/codecov-action@v1
uses: codecov/codecov-action@v4
with:
file: ./lcov.info
flags: unittests
5 changes: 3 additions & 2 deletions docs/Project.toml
@@ -3,5 +3,6 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
MemoryViews = "a791c907-b98b-4e44-8f4d-e4c2362c6b2f"
XAMAuxData = "e99d641e-1821-45d7-9150-ecb7bf333fe1"

[sources]
XAMAuxData = {path = ".."}
[compat]
Documenter = "1"
MemoryViews = "0.2"
34 changes: 13 additions & 21 deletions docs/make.jl
@@ -1,33 +1,25 @@
using Documenter, XAMAuxData

meta = quote
using XAMAuxData: SAM, BAM, AuxTag, Hex, Errors, Error
using XAMAuxData: SAM, BAM, AuxTag, Hex, Errors, Error, is_well_formed
using MemoryViews: MemoryView
line = "AK:z:some string\ts1:i:2512\tst:A:+\tas:f:211.2\tar:B:c3,-16,21,-100"
end

DocMeta.setdocmeta!(
XAMAuxData,
:DocTestSetup,
meta,
recursive=true
)
DocMeta.setdocmeta!(XAMAuxData, :DocTestSetup, meta; recursive=true)

makedocs(
sitename = "XAMAuxData.jl",
modules = [XAMAuxData],
pages = [
"Home" => "index.md",
"Reference" => "reference.md",
],
authors = "Jakob Nybo Nissen",
checkdocs = :public,
makedocs(;
sitename="XAMAuxData.jl",
modules=[XAMAuxData],
pages=["Home" => "index.md", "Reference" => "reference.md"],
authors="Jakob Nybo Nissen",
checkdocs=:public,
remotes=nothing,
)

deploydocs(
repo = "github.com/BioJulia/XAMAuxData.jl.git",
push_preview = true,
deps = nothing,
make = nothing,
deploydocs(;
repo="github.com/BioJulia/XAMAuxData.jl.git",
push_preview=true,
deps=nothing,
make=nothing,
)
47 changes: 44 additions & 3 deletions docs/src/index.md
@@ -1,7 +1,7 @@
```@meta
CurrentModule = XAMAuxData
DocTestSetup = quote
using XAMAuxData: BAM, SAM, AuxTag, Hex, Errors, Error
using XAMAuxData: BAM, SAM, AuxTag, Hex, Errors, Error, is_well_formed
using MemoryViews: MemoryView
end
```
@@ -19,7 +19,9 @@ Any differences to the BAM format will be explicitly mentioned.

!!! note
Annoyingly, the specification of GFA auxiliary fields differs slightly from that of SAM
auxiliary fields. Hence, in the future, a dedicated GFA module may be introduced.
auxiliary fields.
Currently, this package implements only the SAM specification, and as such does not
fully support GFA files. In the future, a dedicated GFA module may be introduced.

The single auxiliary field `AN:i:1234` is encoded as the key-value pair `AuxTag("AN") => 1234`.
A collection of aux fields is represented by a `SAM.Auxiliary` (or `BAM.Auxiliary`), both of which are subtypes of `AbstractDict{AuxTag, Any}`.
@@ -101,7 +103,7 @@ false
```

## Manipulating `Auxiliary` objects
`Auxiliary`'s can be read and written like a normal `AbstractDict{AuxTag, Any}`:
`Auxiliary`s can be read and written like a normal `AbstractDict{AuxTag, Any}`:

```jldoctest
julia> aux = SAM.Auxiliary(UInt8[], 1); # empty Auxiliary
@@ -256,3 +258,42 @@ println(
true

```

## Invalid data in `Auxiliary` objects
The elements of an `Auxiliary` are lazily loaded, and in the interest of speed,
there is no mandatory validation of the data done when constructing an `Auxiliary`.
Hence, they may contain invalid data.
This package distinguishes two different kinds of bad data:

1. If the data is malformed in such a way that it's not possible to identify
the keys of the auxiliary, or the data segment of the corresponding values,
we say the auxiliary is malformed.
Loading keys or values from malformed auxiliaries _may_ throw an exception.
The function [`is_well_formed`](@ref) can be used to check for malformed auxiliaries.

2. If the keys and the data segments corresponding to the values _can_ be identified,
but the data itself is corrupt such that the values cannot be loaded, we instead
say the auxiliary is invalid.
Loading an invalid value returns an object of type [`Error`](@ref).
The validity of an auxiliary can be checked with `isvalid`.
All valid auxiliaries are also well-formed.

For example:
```jldoctest
julia> # This is completely mangled

julia> aux = SAM.Auxiliary("erwlifju093");

julia> (is_well_formed(aux), isvalid(aux))
(false, false)

julia> # Keys and values can be identified, but data can't be loaded as an integer

julia> aux = SAM.Auxiliary("AB:i:dslkjas");

julia> (is_well_formed(aux), isvalid(aux))
(true, false)

julia> only(values(aux)) isa Error
true
```
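
The two failure modes described above can be sketched with a toy model in plain Julia that does not depend on this package. It is not how XAMAuxData is implemented: the names `toy_is_well_formed`, `toy_is_valid` and `classify` are hypothetical, each tab-separated field is checked on its own, and only the `i` and `Z` types are modelled, with simplified parsing rules.

```julia
# Toy model only: none of these functions exist in XAMAuxData.

# A field is well-formed if it has the layout "<2-byte tag>:<1-byte type>:<payload>",
# i.e. the key and the data segment can be located.
function toy_is_well_formed(field::AbstractString)
    bytes = codeunits(field)
    length(bytes) >= 5 || return false
    bytes[3] == UInt8(':') && bytes[5] == UInt8(':')
end

# A well-formed field is valid if its payload can be loaded as the declared type.
function toy_is_valid(field::AbstractString)
    toy_is_well_formed(field) || return false
    typ = Char(codeunits(field)[4])
    payload = field[6:end]
    if typ == 'i'
        # Simplification: accept whatever Base can parse as an Int
        tryparse(Int, payload) !== nothing
    elseif typ == 'Z'
        # Printable ASCII only, space through tilde
        all(c -> ' ' <= c <= '~', payload)
    else
        false  # other types are not modelled in this sketch
    end
end

# Label every tab-separated field of a line with one of three states.
function classify(line::AbstractString)
    map(split(line, '\t')) do field
        if toy_is_valid(field)
            :valid
        elseif toy_is_well_formed(field)
            :invalid
        else
            :malformed
        end
    end
end

classify("KM:i:252\tAK::C\tAB:i:dslkjas")  # returns [:valid, :malformed, :invalid]
```

The ordering of the checks mirrors the rule that validity implies well-formedness: a field is only tested for validity once its key and data segment have been located.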
5 changes: 3 additions & 2 deletions docs/src/reference.md
@@ -1,7 +1,7 @@
```@meta
CurrentModule = XAMAuxData
DocTestSetup = quote
using XAMAuxData: BAM, SAM, AuxTag, Hex, Errors, Error
using XAMAuxData: BAM, SAM, AuxTag, Hex, Errors, Error, is_well_formed
end
```

@@ -13,5 +13,6 @@ Error
Errors
SAM.Auxiliary
BAM.Auxiliary
is_well_formed
Base.isvalid(::XAMAuxData.AbstractAuxiliary)
```
```
122 changes: 83 additions & 39 deletions src/XAMAuxData.jl
@@ -6,7 +6,7 @@ using StringViews: StringView
struct Unsafe end
const unsafe = Unsafe()

export Hex, AuxTag, SAM, BAM
export Hex, AuxTag, SAM, BAM, is_well_formed
public Errors, Error

# These are the numerical types supported by the BAM format.
@@ -130,42 +130,6 @@ end

abstract type AbstractAuxiliary{T} <: AbstractDict{AuxTag, Any} end

"""
isvalid(aux::Auxiliary) -> Bool

Check if the keys of `aux` are valid. Valid `aux` can be iterated
and accessed without throwing an exception.
Invalid `aux` has at least one key/value pair which will error when
accessed.
This function does _not_ validate if the values of `aux` are well-formatted,
a valid `Auxiliary` may return an `Errors.Error` when accessed.
To check if all values of `aux` is valid, use
`isvalid(aux) && all(i -> !isa(i, SAM.Error), values(aux))`.

# Examples
```jldoctest
julia> aux = BAM.Auxiliary("KLZab\\t\\0ABCF");

julia> isvalid(aux)
true

julia> aux["KL"] == Errors.InvalidString
true

julia> aux = BAM.Auxiliary("KLZab\\tABCF");

julia> isvalid(aux)
false

julia> aux["KL"]
ERROR: BAM string or Hex type not terminated by null byte
[...]
```
"""
function Base.isvalid(aux::AbstractAuxiliary)
all(i -> !isa(i, Error), iter_encodings(aux))
end

function striptype end
function Base.copy(aux::AbstractAuxiliary)
x = aux.x
@@ -332,6 +296,86 @@ function load_hex(mem::ImmutableMemoryView)::Union{Memory{UInt8}, Error}
hex
end

"""
is_well_formed(aux::AbstractAuxiliary) -> Bool

Check if the auxiliary is well formed.
The data of an `AbstractAuxiliary` can be incorrect in two distinct ways:
If the data is corrupted such that it is not possible to identify the key, or
start/end of the encoded data of the value for a key/value pair, we say the
auxiliary is malformed.
Accessing values of, or iterating over, such an auxiliary may or may not throw an exception.
This notion of malformedness is what this function checks for.

Alternatively, if the keys and encoded values _can_ be identified, but the encoded
values are corrupt and the value cannot be loaded, we say the auxiliary is invalid.
Such auxiliaries can be iterated over, and the values can be loaded, although the
loaded value will be of type [`XAMAuxData.Error`](@ref).
This can be checked for with `isvalid`.
Valid auxiliaries are always well-formed.

# Examples
```jldoctest
julia> aux = SAM.Auxiliary("KM:i:252\\tAK::C"); # bad key

julia> is_well_formed(aux)
false

julia> aux["KM"] # loading a value may error
252

julia> aux["AK"] # loading a value may error
ERROR: Invalid SAM tag header. Expected <AuxTag>:<type tag>:, but found no colons.
[...]

julia> aux = SAM.Auxiliary("AB:Z:αβγδ\\tCD:f:-1.2"); # bad value encoding

julia> (is_well_formed(aux), isvalid(aux))
(true, false)

julia> aux["AB"] # note: numerical value of enum is not API
InvalidString::Error = 9
```
"""
function is_well_formed(it::AbstractAuxiliary)
all(iter_encodings(it)) do i
!isa(i, Error)
end
end

"""
isvalid(aux::AbstractAuxiliary) -> Bool

Check if the auxiliary is well formed, and that all the values can be loaded
without returning an [`XAMAuxData.Error`](@ref).

# Examples
```jldoctest
julia> aux = SAM.Auxiliary("AB:i:not an integer");

julia> is_well_formed(aux)
true

julia> isvalid(aux)
false
```

See also: [`is_well_formed`](@ref)
"""
Base.isvalid(x::AbstractAuxiliary) = throw(MethodError(isvalid, (x,)))

function validate_hex(mem::ImmutableMemoryView{UInt8})::Bool
len = length(mem)
iseven(len) || return false
good = true
for byte in mem
# Hex digits are 0-9, a-f and A-F; 'g' and 'h' must be rejected
good &= byte in UInt8('0'):UInt8('9') ||
byte in UInt8('a'):UInt8('f') ||
byte in UInt8('A'):UInt8('F')
end
good
end

function pack_hex(a::UInt8)::UInt8 # 0xff for bad hex
if a ∈ 0x30:0x39
a - 0x30
@@ -345,7 +389,7 @@ function pack_hex(a::UInt8)::UInt8 # 0xff for bad hex
end

function Base.show(io::IO, ::MIME"text/plain", x::AbstractAuxiliary)
if isvalid(x)
if is_well_formed(x)
buf = IOBuffer()
println(buf, length(x), "-element ", typeof(x), ':')
content = IOContext(buf, :limit => true, :compact => true)
@@ -359,7 +403,7 @@ function Base.show(io::IO, ::MIME"text/plain", x::AbstractAuxiliary)
end
write(io, take!(buf)[1:end-1]) # remove trailing newline
else
print(io, "Invalid ", typeof(x))
print(io, "Malformed ", typeof(x))
end
end
