Context
Samples can have many ComputedFiles, which also have ComputationalResults.
Processor jobs create instances of ComputationalResult and ComputedFile when samples are processed.
Problem or idea
We have some samples with multiple computed files, but for each computed file it's not obvious which original files were used to generate it. Also, there's no way to know which processor job generated it.
For example, sample GSM248431 has multiple computed files.
data_refinery=> select * from computed_files where id in (select computed_file_id from sample_computed_file_associations where sample_id in (select id from samples where accession_code='GSM248431'));
id | filename | absolute_file_path | size_in_bytes | sha1 | is_smashable | is_qc | is_qn_target | s3_bucket | s3_key | is_public | created_at | last_modified | result_id | compendia_organism_id | compendia_version | is_compendia | quant_sf_only | svd_algorithm
---------+--------------------------+----------------------------------------------------------------------+---------------+------------------------------------------+--------------+-------+--------------+--------------------------------+---------------------------------------------------+-----------+-------------------------------+-------------------------------+-----------+-----------------------+-------------------+--------------+---------------+---------------
1043270 | GSM248431_GE1002_2_2.PCL | /home/user/data_store/processor_job_1615745/GSM248431_GE1002_2_2.PCL | 143873 | 619f543f7b5ed39180e0e71a69f37b9ad6e11bd8 | t | f | f | data-refinery-s3-circleci-prod | y3wchb8nev4iyz4c6s9ukv0w_GSM248431_GE1002_2_2.PCL | t | 2018-12-20 14:44:10.890171+00 | 2018-12-20 14:44:11.139879+00 | 927601 | | | f | f | NONE
1043658 | GSM248431_GE1001_3.PCL | /home/user/data_store/processor_job_1616335/GSM248431_GE1001_3.PCL | 143950 | 2688b7951ec72ab49540fcfb4da44d2683715c68 | t | f | f | data-refinery-s3-circleci-prod | kk01w6jrhdp9o0f31mt1pnje_GSM248431_GE1001_3.PCL | t | 2018-12-20 14:47:08.301983+00 | 2018-12-20 14:47:12.167675+00 | 927989 | | | f | f | NONE
1043633 | GSM248431_GE1003_3.PCL | /home/user/data_store/processor_job_1616313/GSM248431_GE1003_3.PCL | 143963 | 925810e5b65bdb8e10079e52da93c0c588833589 | t | f | f | data-refinery-s3-circleci-prod | puu08rbp3hoq9aspikktocmd_GSM248431_GE1003_3.PCL | t | 2018-12-20 14:47:01.282186+00 | 2018-12-20 14:47:01.845971+00 | 927964 | | | f | f | NONE
1043207 | GSM248431_GE1004_3.PCL | /home/user/data_store/processor_job_1615679/GSM248431_GE1004_3.PCL | 143930 | 4cf20fad148a0632a02df608b3246db9f0992797 | t | f | f | data-refinery-s3-circleci-prod | cxcqnjnuorgarw3kx6www1i0_GSM248431_GE1004_3.PCL | t | 2018-12-20 14:43:32.654072+00 | 2018-12-20 14:43:46.704009+00 | 927538 | | | f | f | NONE
1043592 | GSM248431_GE1002_2_1.PCL | /home/user/data_store/processor_job_1616233/GSM248431_GE1002_2_1.PCL | 143841 | 7fe7f2702e1dca767345906d81a33a07b9031484 | t | f | f | data-refinery-s3-circleci-prod | rcou7jcua1kb9kp2a7ryqzu7_GSM248431_GE1002_2_1.PCL | t | 2018-12-20 14:46:34.442139+00 | 2018-12-20 14:46:36.158067+00 | 927923 | | | f | f | NONE
1043205 | GSM248431_GE1003_1.PCL | /home/user/data_store/processor_job_1615657/GSM248431_GE1003_1.PCL | 143799 | ad9a3ce7580aca4acb284c531587d8b4ecc25f7a | t | f | f | data-refinery-s3-circleci-prod | 5xnifq0jqtfkxpb9k2mds8ux_GSM248431_GE1003_1.PCL | t | 2018-12-20 14:43:32.579096+00 | 2018-12-20 14:43:46.693395+00 | 927536 | | | f | f | NONE
1043449 | GSM248431_GE1002_1.PCL | /home/user/data_store/processor_job_1615986/GSM248431_GE1002_1.PCL | 143796 | 68314b156905068ef1649ed78bd5f7ed14f5ed14 | t | f | f | data-refinery-s3-circleci-prod | ffsdc752f6gyfh6dn31ibow9_GSM248431_GE1002_1.PCL | t | 2018-12-20 14:45:31.363883+00 | 2018-12-20 14:45:40.448563+00 | 927780 | | | f | f | NONE
1044554 | GSM248431_GE1004_1.PCL | /home/user/data_store/processor_job_1617375/GSM248431_GE1004_1.PCL | 143816 | 3e72fc3a4f1b4818bc212abc443758511c183b6c | t | f | f | data-refinery-s3-circleci-prod | e1b6si5cgpi02r9n0nyjuwry_GSM248431_GE1004_1.PCL | t | 2018-12-20 15:03:08.943553+00 | 2018-12-20 15:03:21.513281+00 | 928887 | | | f | f | NONE
1043195 | GSM248431_GE1002_3.PCL | /home/user/data_store/processor_job_1615673/GSM248431_GE1002_3.PCL | 143916 | 0df30bf5d76da1a2af1a5d4b43381f220a784fc6 | t | f | f | data-refinery-s3-circleci-prod | bm4x03ivvrxt4cd7dzjfsasu_GSM248431_GE1002_3.PCL | t | 2018-12-20 14:43:26.870718+00 | 2018-12-20 14:43:42.702274+00 | 927526 | | | f | f | NONE
(9 rows)
And multiple original files:
data_refinery=> select id, filename, is_archive, source_filename from original_files where id in (select original_file_id from original_file_sample_associations where sample_id in (select id from samples where accession_code='GSM248431'));
id | filename | is_archive | source_filename
---------+--------------------------+------------+-----------------------------
1482345 | GSM248431_GE1002_3.CEL | f | GSM248431_GE1002_3.CEL.gz
1419569 | | t | GSM248431_GE1002_2_1.CEL.gz
1419302 | | t | GSM248431_GE1002_1.CEL.gz
1482325 | GSM248431_GE1003_1.CEL | f | GSM248431_GE1003_1.CEL.gz
1482352 | GSM248431_GE1004_3.CEL | f | GSM248431_GE1004_3.CEL.gz
1420599 | | t | GSM248431_GE1003_3.CEL.gz
1483092 | GSM248431_GE1003_3.CEL | f | GSM248431_GE1003_3.CEL.gz
1483111 | GSM248431_GE1001_3.CEL | f | GSM248431_GE1001_3.CEL.gz
1419866 | | t | GSM248431_GE1002_2_2.CEL.gz
1484300 | GSM248431_GE1004_1.CEL | f | GSM248431_GE1004_1.CEL.gz
1420090 | | t | GSM248431_GE1002_3.CEL.gz
1419042 | | t | GSM248431_GE1001_3.CEL.gz
1483015 | GSM248431_GE1002_2_1.CEL | f | GSM248431_GE1002_2_1.CEL.gz
1482723 | GSM248431_GE1002_1.CEL | f | GSM248431_GE1002_1.CEL.gz
1420814 | | t | GSM248431_GE1004_1.CEL.gz
1482431 | GSM248431_GE1002_2_2.CEL | f | GSM248431_GE1002_2_2.CEL.gz
1421034 | | t | GSM248431_GE1004_3.CEL.gz
1420360 | | t | GSM248431_GE1003_1.CEL.gz
(18 rows)
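Until a real relation exists, the rows above hint at two possible workarounds. The following is only a rough sketch, not something the codebase provides: it assumes that the `processor_job_<id>` directory in `absolute_file_path` really is the ID of the job that wrote the file, and that a computed file and its original file share a filename stem. Neither assumption is guaranteed by the schema, and the helper names `guess_processor_job_id` and `guess_original_filenames` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: the data_store directory name encodes the processor job ID.
JOB_DIR_PATTERN = re.compile(r"/processor_job_(\d+)/")


def guess_processor_job_id(absolute_file_path):
    """Return the job ID embedded in the path, or None if there is none."""
    match = JOB_DIR_PATTERN.search(absolute_file_path)
    return int(match.group(1)) if match else None


def guess_original_filenames(computed_filename, original_filenames):
    """Return original filenames whose stem matches the computed file's stem.

    Assumption: GSM248431_GE1002_3.PCL came from GSM248431_GE1002_3.CEL.
    Archive rows have an empty filename and are skipped.
    """
    stem = PurePosixPath(computed_filename).stem
    return [
        name
        for name in original_filenames
        if name and PurePosixPath(name).stem == stem
    ]


# Values copied from the query output above.
path = "/home/user/data_store/processor_job_1615745/GSM248431_GE1002_2_2.PCL"
originals = ["GSM248431_GE1002_3.CEL", "", "GSM248431_GE1002_2_2.CEL"]

job_id = guess_processor_job_id(path)
matches = guess_original_filenames("GSM248431_GE1002_2_2.PCL", originals)
print(job_id, matches)
```

Both helpers are guesses based on naming conventions, which is exactly why an explicit relation would be more reliable.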
Solution or next step
I think it makes sense to add a new relation between ComputationalResult and ProcessorJob.
I think you're right! I think we should be able to tell what ProcessorJob generated a ComputationalResult. A ComputationalResult will never have more than one ProcessorJob associated with it, so it should just be a processor_job_id property on the ComputationalResult model, one that we'll probably want to not expose via the API? (I think at the moment we aren't exposing anything about jobs via the API.)
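To make the proposal concrete, here is a minimal sketch of what a `processor_job_id` column on the result table could look like, using an in-memory SQLite database rather than the project's actual models. The table names `processor_jobs` and `computational_results` are assumptions (only `computed_files` and its `result_id` column appear in the output above), and pairing result 927601 with job 1615745 is inferred from the file path, not confirmed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processor_jobs (id INTEGER PRIMARY KEY);
    CREATE TABLE computational_results (
        id INTEGER PRIMARY KEY,
        -- Proposed column: a result has at most one processor job.
        processor_job_id INTEGER REFERENCES processor_jobs (id)
    );
    CREATE TABLE computed_files (
        id INTEGER PRIMARY KEY,
        filename TEXT,
        result_id INTEGER REFERENCES computational_results (id)
    );
    INSERT INTO processor_jobs (id) VALUES (1615745);
    INSERT INTO computational_results (id, processor_job_id)
        VALUES (927601, 1615745);
    INSERT INTO computed_files (id, filename, result_id)
        VALUES (1043270, 'GSM248431_GE1002_2_2.PCL', 927601);
    """
)

# With the column in place, the job behind a computed file is one join away.
rows = conn.execute(
    """
    SELECT cf.filename, cr.processor_job_id
    FROM computed_files AS cf
    JOIN computational_results AS cr ON cr.id = cf.result_id
    """
).fetchall()
print(rows)
```

Because the foreign key lives on the result, every computed file that points at that result inherits the link without needing a column of its own.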
> one that we'll probably want to not expose via the API? (I think at the moment we aren't exposing anything about jobs via the API.)
Actually, we have endpoints that expose all the jobs: /jobs/downloader and /jobs/processor. I started using them to list the jobs for each sample at AlexsLemonade/refinebio-frontend#784. Is there any reason not to expose processor_job_id in the API? It would be nice to be able to inspect the jobs associated with a ComputationalResult via the API.
Nope, no reason at all! I just thought we were trying to hide those deets from our users but honestly I was hoping that we'd eventually change that anyway :D
Tagging @kurtwheeler for further discussion.