Thoughts about output files from `gtar.uniwig` npy #65

ClaudeHu · 2025-01-07T21:20:40Z

Three metadata files, one each for start/core/end, would record the start position on each chromosome. For example:

If start.wig looks like this when the output type is wiggle:

fixedStep chrom=chr1 start=9010079 step=5

...


fixedStep chrom=chr2 start=46656910 step=5

...

Then with `-y npy`, a start_meta.json should be produced like this:

{
    "chr1": 9010079,
    "chr2": 46656910,

    ...

}
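To show how a consumer might use such a file, here is a minimal sketch that maps an array index back to a genomic position. It assumes the npy array for a chromosome holds one value per step, beginning at the start recorded in start_meta.json, mirroring the wiggle `fixedStep` convention; the thread does not confirm that layout. The function names are hypothetical.

```rust
use std::collections::HashMap;

/// Genomic position of element `index` in a chromosome's array, assuming the
/// array holds one value per step beginning at the recorded start.
fn index_to_position(start: u64, step: u64, index: u64) -> u64 {
    start + index * step
}

/// Inverse lookup: array index covering `position`, or None if the position
/// lies before the recorded start. `step` must be non-zero.
fn position_to_index(start: u64, step: u64, position: u64) -> Option<u64> {
    position.checked_sub(start).map(|offset| offset / step)
}

fn main() {
    // Values copied from the start_meta.json example above.
    let start_meta: HashMap<&str, u64> =
        HashMap::from([("chr1", 9_010_079), ("chr2", 46_656_910)]);
    let step = 5;

    let chr1_start = start_meta["chr1"];
    println!("{}", index_to_position(chr1_start, step, 3)); // 9010094
    println!("{:?}", position_to_index(chr1_start, step, 9_010_094)); // Some(3)
}
```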

Another file (named ref.json?) would hold the step size and chromosome sizes (it would be more convenient if the chromosome-size dictionary were sorted by key):

{
    "step": 5,
    "chroms": {
        "chr1": 248956422,
        "chr2": 242193529,

        ...

    } 
}
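On the "sorted by keys" point: a plain lexicographic sort places chr10 before chr2. A minimal sketch of one possible ordering follows, with numbered chromosomes in numeric order and named ones after them. The helper names are hypothetical, and the chr10 and chrX sizes are illustrative values that do not come from this thread.

```rust
/// Sort key that places numbered chromosomes in numeric order, followed by
/// named ones (chrX, chrY, chrM, ...) in alphabetical order.
fn chrom_sort_key(name: &str) -> (u8, u32, String) {
    let suffix = name.strip_prefix("chr").unwrap_or(name);
    match suffix.parse::<u32>() {
        Ok(n) => (0, n, String::new()),
        Err(_) => (1, 0, suffix.to_string()),
    }
}

/// Return the (name, size) pairs ordered by `chrom_sort_key`.
fn sorted_chroms(mut chroms: Vec<(String, u64)>) -> Vec<(String, u64)> {
    chroms.sort_by_key(|(name, _)| chrom_sort_key(name));
    chroms
}

fn main() {
    let chroms = vec![
        ("chr10".to_string(), 133_797_422), // illustrative size
        ("chr2".to_string(), 242_193_529),
        ("chrX".to_string(), 156_040_895), // illustrative size
        ("chr1".to_string(), 248_956_422),
    ];
    for (name, size) in sorted_chroms(chroms) {
        println!("{name}: {size}");
    }
}
```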


donaldcampbelljr · 2025-01-09T19:00:33Z

Ok, I will attempt a nested hashmap to store the npy metadata and then export at the end of the process.

POC Rust Code (this works nicely):

use std::collections::HashMap;
use std::fs::File;
use std::io::Write;

fn main() {
    // Outer key: chromosome name. Inner map: metadata field -> value.
    let mut chromosome_data: HashMap<String, HashMap<String, i32>> = HashMap::new();

    // A chromosome whose metadata is fully known up front.
    chromosome_data.insert(
        "chr1".to_string(),
        HashMap::from([
            ("start".to_string(), 1),
            ("core".to_string(), 10),
            ("end".to_string(), 100),
            ("stepsize".to_string(), 5),
            ("reported_chrom_size".to_string(), 300),
        ]),
    );

    // A chromosome that begins with only the shared fields.
    chromosome_data.insert(
        "chr22".to_string(),
        HashMap::from([
            ("stepsize".to_string(), 5),
            ("reported_chrom_size".to_string(), 400),
        ]),
    );

    // Fill in the remaining fields later, as they become available.
    if let Some(current_chr_data) = chromosome_data.get_mut("chr22") {
        current_chr_data.insert("start".to_string(), 10);
        current_chr_data.insert("end".to_string(), 87);
    }

    println!("{:?}", chromosome_data);

    // Serialize the nested map to pretty-printed JSON and write it to disk.
    let json_string = serde_json::to_string_pretty(&chromosome_data).unwrap();

    let mut file = File::create("chromosome_data.json").unwrap();
    file.write_all(json_string.as_bytes()).unwrap();

    println!("HashMap exported to chromosome_data.json");
}

Results in this json:

{
  "chr22": {
    "stepsize": 5,
    "start": 10,
    "end": 87,
    "reported_chrom_size": 400
  },
  "chr1": {
    "stepsize": 5,
    "reported_chrom_size": 300,
    "start": 1,
    "end": 100,
    "core": 10
  }
}
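The key order in the output above is arbitrary because `HashMap` does not guarantee an iteration order. As a minimal sketch of one way to get stable output, the maps can be swapped for `BTreeMap`, which iterates in sorted key order. The `render_json` helper is hypothetical and hand-written only to keep the example free of external crates; it is not the POC's serde_json call.

```rust
use std::collections::BTreeMap;

type ChromMeta = BTreeMap<String, BTreeMap<String, i32>>;

/// Minimal hand-written JSON rendering, used here only to avoid external
/// crates. Keys are assumed not to need escaping.
fn render_json(meta: &ChromMeta) -> String {
    let chroms: Vec<String> = meta
        .iter()
        .map(|(chrom, fields)| {
            let body: Vec<String> = fields
                .iter()
                .map(|(key, value)| format!("\"{}\": {}", key, value))
                .collect();
            format!("\"{}\": {{{}}}", chrom, body.join(", "))
        })
        .collect();
    format!("{{{}}}", chroms.join(", "))
}

fn main() {
    let mut meta = ChromMeta::new();
    // Inserted out of order on purpose; BTreeMap iterates by sorted key.
    meta.entry("chr22".to_string()).or_default().insert("stepsize".to_string(), 5);
    meta.entry("chr22".to_string()).or_default().insert("start".to_string(), 10);
    meta.entry("chr1".to_string()).or_default().insert("start".to_string(), 1);
    meta.entry("chr1".to_string()).or_default().insert("core".to_string(), 10);
    println!("{}", render_json(&meta));
}
```

Note that the ordering is lexicographic, so a key such as chr10 would still sort before chr2.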

donaldcampbelljr · 2025-01-09T21:50:29Z

I attempted with the above approach in this commit: 27d52f5

However, because of parallel processing with Rayon, I am unable to mutate the hashmap during parallel iterations. I also attempted to use an `Arc<Mutex<>>` but was unsuccessful.
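For reference, a minimal sketch of the general `Arc<Mutex<HashMap>>` pattern is shown here. It uses `std::thread` as a stand-in for Rayon so that it runs without external crates, and it is not the code from commit 27d52f5; why that attempt failed is not stated in this thread. The `collect_metadata` function and its inputs are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

type ChromMeta = HashMap<String, HashMap<String, i32>>;

/// Record a "start" value per chromosome from several threads at once.
fn collect_metadata(chroms: Vec<(String, i32)>) -> ChromMeta {
    let shared: Arc<Mutex<ChromMeta>> = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = chroms
        .into_iter()
        .map(|(chrom, start)| {
            // Each worker gets its own handle to the same map.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Hold the lock only for the insert itself.
                let mut guard = shared.lock().unwrap();
                guard.entry(chrom).or_default().insert("start".to_string(), start);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All workers have finished, so this is the only remaining reference.
    Arc::try_unwrap(shared).unwrap().into_inner().unwrap()
}

fn main() {
    let meta = collect_metadata(vec![
        ("chr1".to_string(), 9_010_079),
        ("chr2".to_string(), 46_656_910),
    ]);
    println!("{:?}", meta);
}
```

With Rayon, the same idea is often expressed by having each task return its own entry and collecting the results afterwards, which avoids the lock entirely. Whether that fits the uniwig code path is not something this thread establishes.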

I can attempt to make a JSON metadata file at the end by simply parsing the metadata files that are already created and combining them. This is not as elegant and requires making temporary files that are then discarded, but it should allow us to get a single meta.json file with all of the npy metadata.
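A minimal sketch of that combine-at-the-end step, under stated assumptions: the per-chromosome metadata is taken to be available as text in a hypothetical `key=value` line format (the real format of the existing metadata files is not described here), and reading and deleting the temporary files is left out. The function names are hypothetical.

```rust
use std::collections::BTreeMap;

type ChromMeta = BTreeMap<String, BTreeMap<String, i32>>;

/// Parse one per-chromosome metadata snippet in a hypothetical `key=value`
/// line format. Lines that do not parse are skipped.
fn parse_partial(text: &str) -> BTreeMap<String, i32> {
    text.lines()
        .filter_map(|line| {
            let (key, value) = line.split_once('=')?;
            let value = value.trim().parse::<i32>().ok()?;
            Some((key.trim().to_string(), value))
        })
        .collect()
}

/// Combine (chromosome, snippet) pairs into one nested map, sequentially,
/// after all parallel work has finished.
fn merge_partials(partials: &[(&str, &str)]) -> ChromMeta {
    let mut merged = ChromMeta::new();
    for (chrom, text) in partials {
        merged
            .entry(chrom.to_string())
            .or_default()
            .extend(parse_partial(text));
    }
    merged
}

fn main() {
    // Values reused from the POC above.
    let merged = merge_partials(&[
        ("chr1", "start=1\nend=100"),
        ("chr22", "start=10\nend=87"),
    ]);
    println!("{:?}", merged);
}
```

Because the merge runs on a single thread once the parallel work is done, no shared mutable state is needed.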

donaldcampbelljr · 2025-01-13T17:31:04Z

Closing with 0.2.0 Release

ClaudeHu added the brainstorming label Jan 7, 2025

donaldcampbelljr added a commit that referenced this issue Jan 9, 2025

attempt to use shared hashmap for #65 does not work

27d52f5

donaldcampbelljr added a commit that referenced this issue Jan 9, 2025

Revert "attempt to use shared hashmap for #65 does not work"

391ba68

This reverts commit 27d52f5.

donaldcampbelljr added a commit that referenced this issue Jan 9, 2025

working solution for #65

5f5973b

donaldcampbelljr mentioned this issue Jan 9, 2025

Fixes for #64 and #65 #66

Merged

donaldcampbelljr added a commit that referenced this issue Jan 10, 2025

Merge pull request #66 from databio/dev_64

ce0967a

Fixes for #64 and #65

donaldcampbelljr added the likely solved label Jan 10, 2025

donaldcampbelljr closed this as completed Jan 13, 2025
