Thoughts about output files from gtar.uniwig npy #65

Closed
ClaudeHu opened this issue Jan 7, 2025 · 3 comments
Comments

@ClaudeHu

ClaudeHu commented Jan 7, 2025

The three output files (start, core, and end) each record the start position on every chromosome. For example:

If start.wig looks like this when the output type is wiggle:

fixedStep chrom=chr1 start=9010079 step=5

...


fixedStep chrom=chr2 start=46656910 step=5

...

Then, when running with -y npy, a start_meta.json should be produced like this:

{
    "chr1": 9010079,
    "chr2": 46656910,

    ...

}
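To make the mapping from wiggle headers to start_meta.json concrete, here is a minimal sketch. It is not the gtars implementation: the function names `fixed_step_starts` and `to_json` are hypothetical, the JSON is formatted by hand to stay std-only, and it assumes chromosome names need no JSON escaping.

```rust
use std::collections::BTreeMap;

/// Collect the `start=` value of the first `fixedStep` header seen per chromosome.
fn fixed_step_starts(wig: &str) -> BTreeMap<String, u64> {
    let mut starts = BTreeMap::new();
    for line in wig.lines() {
        let line = line.trim();
        if !line.starts_with("fixedStep") {
            continue; // data lines and other declarations are skipped
        }
        let mut chrom = None;
        let mut start = None;
        for field in line.split_whitespace().skip(1) {
            match field.split_once('=') {
                Some(("chrom", v)) => chrom = Some(v.to_string()),
                Some(("start", v)) => start = v.parse::<u64>().ok(),
                _ => {} // e.g. step=5, not needed for this file
            }
        }
        if let (Some(c), Some(s)) = (chrom, start) {
            // Keep only the first header for each chromosome.
            starts.entry(c).or_insert(s);
        }
    }
    starts
}

/// Render the map as a JSON object (keys come out sorted because of BTreeMap).
fn to_json(starts: &BTreeMap<String, u64>) -> String {
    let body: Vec<String> = starts
        .iter()
        .map(|(chrom, start)| format!("    \"{}\": {}", chrom, start))
        .collect();
    format!("{{\n{}\n}}", body.join(",\n"))
}

fn main() {
    let wig = "fixedStep chrom=chr1 start=9010079 step=5\n2\n3\n\
               fixedStep chrom=chr2 start=46656910 step=5\n1\n";
    println!("{}", to_json(&fixed_step_starts(wig)));
}
```

In the real tool the start values are presumably already known when the npy arrays are written, so re-parsing a wiggle file would not be needed; the parser here only illustrates the correspondence between the two formats.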

Another file (named ref.json?) would include the step size and chromosome sizes (it would be more convenient if the chromosome sizes dictionary were sorted by key):

{
    "step": 5,
    "chroms": {
        "chr1": 248956422,
        "chr2": 242193529,

        ...

    } 
}
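One way to get the sorted keys requested above is to hold the chromosome sizes in a `BTreeMap`, which always iterates in key order. The sketch below is an illustration under assumptions: the `RefMeta` struct and its hand-written `to_json` are hypothetical and std-only, not part of gtars.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of the proposed ref.json.
struct RefMeta {
    step: u32,
    chroms: BTreeMap<String, u64>, // BTreeMap iterates in sorted key order
}

impl RefMeta {
    /// Format the struct as JSON by hand (assumes names need no escaping).
    fn to_json(&self) -> String {
        let chroms: Vec<String> = self
            .chroms
            .iter()
            .map(|(name, size)| format!("        \"{}\": {}", name, size))
            .collect();
        format!(
            "{{\n    \"step\": {},\n    \"chroms\": {{\n{}\n    }}\n}}",
            self.step,
            chroms.join(",\n")
        )
    }
}

fn main() {
    let mut chroms = BTreeMap::new();
    // Inserted out of order on purpose; the output is still sorted.
    chroms.insert("chr2".to_string(), 242193529u64);
    chroms.insert("chr1".to_string(), 248956422u64);
    let meta = RefMeta { step: 5, chroms };
    println!("{}", meta.to_json());
}
```

Note that the ordering is lexicographic, so `chr10` sorts before `chr2`. A natural chromosome order would need a custom key or comparator.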
@donaldcampbelljr

Ok, I will attempt a nested hashmap to store the npy metadata and then export at the end of the process.

Proof-of-concept Rust code (this works nicely):

use std::collections::HashMap;
use std::fs::File;
use std::io::Write;

fn main() {
    // Outer key: chromosome name. Inner map: metadata field -> value.
    let mut chromosome_data: HashMap<String, HashMap<String, i32>> = HashMap::new();

    // chr1: every field is known up front.
    chromosome_data.insert(
        "chr1".to_string(),
        HashMap::from([
            ("start".to_string(), 1),
            ("core".to_string(), 10),
            ("end".to_string(), 100),
            ("stepsize".to_string(), 5),
            ("reported_chrom_size".to_string(), 300),
        ]),
    );

    // chr22: only the static fields are known at first.
    chromosome_data.insert(
        "chr22".to_string(),
        HashMap::from([
            ("stepsize".to_string(), 5),
            ("reported_chrom_size".to_string(), 400),
        ]),
    );

    // Fill in the remaining chr22 fields later by mutating the nested map in place.
    if let Some(current_chr_data) = chromosome_data.get_mut("chr22") {
        current_chr_data.insert("start".to_string(), 10);
        current_chr_data.insert("end".to_string(), 87);
    }

    println!("{:?}", chromosome_data);

    // Serialize with the serde_json crate and write the result to disk.
    let json_string = serde_json::to_string_pretty(&chromosome_data).unwrap();

    let mut file = File::create("chromosome_data.json").unwrap();
    file.write_all(json_string.as_bytes()).unwrap();

    println!("HashMap exported to chromosome_data.json");
}

This results in the following JSON (key order can differ between runs, because HashMap iteration order is not fixed):

{
  "chr22": {
    "stepsize": 5,
    "start": 10,
    "end": 87,
    "reported_chrom_size": 400
  },
  "chr1": {
    "stepsize": 5,
    "reported_chrom_size": 300,
    "start": 1,
    "end": 100,
    "core": 10
  }
}

@donaldcampbelljr

I attempted the above approach in this commit: 27d52f5

However, because of the parallel processing with Rayon, I am unable to mutate the HashMap during parallel iterations. I also attempted to use an Arc<Mutex<>> but was unsuccessful.
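For reference, the general shape of the shared-map pattern is sketched below. This is not the code from the commit, and the thread does not say why that attempt failed. To stay self-contained it uses `std::thread::scope` as a stand-in for Rayon's worker threads, and the `collect_starts` function and its inputs are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

type ChromMeta = HashMap<String, HashMap<String, i32>>;

/// Record one "start" value per chromosome from worker threads into a shared map.
fn collect_starts(inputs: &[(&str, i32)]) -> ChromMeta {
    let shared: Mutex<ChromMeta> = Mutex::new(HashMap::new());

    thread::scope(|scope| {
        for &(chrom, start) in inputs {
            // Borrow the mutex so each `move` closure captures only a reference.
            let shared = &shared;
            scope.spawn(move || {
                // Any expensive per-chromosome work belongs before this point;
                // the lock is held only for the insert itself.
                let mut map = shared.lock().unwrap();
                map.entry(chrom.to_string())
                    .or_default()
                    .insert("start".to_string(), start);
            });
        }
    }); // all spawned threads are joined when the scope ends

    shared.into_inner().unwrap()
}

fn main() {
    let meta = collect_starts(&[("chr1", 9010079), ("chr2", 46656910)]);
    println!("{:?}", meta);
}
```

Scoped threads allow a plain borrowed `Mutex` here. Code that hands the map to threads that may outlive the caller would wrap it in `Arc` as well. An alternative that avoids shared mutation entirely is to have each parallel task return its own (chromosome, metadata) pair and build the map after the parallel section.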

Instead, I can attempt to build a JSON metadata file at the end by parsing the metadata files that have already been created and combining them. This is not as elegant, and it requires temp files that are later discarded, but it should give us a single meta.json file with all of the npy metadata.
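A rough sketch of that combining step follows, under stated assumptions: the per-track metadata (start, core, end) has already been read into maps of chromosome to start position, and the names `TrackStarts`, `Combined`, and `merge_tracks` are hypothetical. Reading the temp files is omitted because their format is not described here.

```rust
use std::collections::BTreeMap;

type TrackStarts = BTreeMap<String, i32>;
type Combined = BTreeMap<String, BTreeMap<String, i32>>;

/// Turn (track name, chrom -> start) pairs into chrom -> track -> start.
fn merge_tracks(tracks: &[(&str, TrackStarts)]) -> Combined {
    let mut combined = Combined::new();
    for (track, starts) in tracks {
        for (chrom, start) in starts {
            combined
                .entry(chrom.clone())
                .or_default()
                .insert(track.to_string(), *start);
        }
    }
    combined
}

fn main() {
    // Values borrowed from the proof-of-concept earlier in this thread.
    let start = TrackStarts::from([("chr1".to_string(), 1), ("chr22".to_string(), 10)]);
    let core = TrackStarts::from([("chr1".to_string(), 10)]);
    let end = TrackStarts::from([("chr1".to_string(), 100), ("chr22".to_string(), 87)]);

    let merged = merge_tracks(&[("start", start), ("core", core), ("end", end)]);
    println!("{:?}", merged);
}
```

Because the merge runs once, after the parallel section has finished, it needs no locking, and the `BTreeMap` keeps the chromosome keys sorted in the final file.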

@donaldcampbelljr

Closing with the 0.2.0 release.
