
Fs insert one new #166

Open · wants to merge 16 commits into base: client

Conversation


@sl5035 sl5035 commented Mar 21, 2023

No description provided.

@sl5035 (Author) commented Mar 21, 2023

Sorry, I had to change many files in order to make the function work.

  • Anything related to ChainDB had to be commented out since I have not written the ChainDB module yet. I'm considering removing it from pydatarecognition after I'm done with fsclient.
  • Also, what do db['blacklist'] and db['whitelist'] do? I deleted them to run fsclient for now.
  • I am still figuring out how to test invalid inputs.

@@ -15,7 +15,7 @@
from copy import deepcopy


-OUTPUT_FAKE_DB = False # always turn it to false after you used it
+OUTPUT_FAKE_DB = True # always turn it to false after you used it
Author:

Sorry, I forgot to turn it back to False.

Collaborator:

please change to false and push

@sbillinge sbillinge (Collaborator) left a comment

I think this looks good except we need a new push with the db output set to False

@@ -15,7 +15,7 @@
from copy import deepcopy


-OUTPUT_FAKE_DB = False # always turn it to false after you used it
+OUTPUT_FAKE_DB = True # always turn it to false after you used it
Collaborator:

this should be False

@@ -15,7 +15,7 @@
from copy import deepcopy


-OUTPUT_FAKE_DB = False # always turn it to false after you used it
+OUTPUT_FAKE_DB = True # always turn it to false after you used it
Collaborator:

please change to false and push

@sbillinge sbillinge (Collaborator) left a comment

pls see comments

@pytest.mark.skip("Not written")
def test_insert_one():
    pass
test_insert_cif = [{'intensity': [], 'q': [], 'ttheta': [], 'wavelength': 0.111111, '_id': 'ts1129'},
Collaborator:

let's think about all the things we want to test. Since insert_one is general (not just cifs in principle) we just need to validate that it is valid json or sthg?
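(For illustration, a minimal sketch of such a check; is_valid_json_doc is a hypothetical helper, not something in the codebase. A document counts as valid if it is a dict that round-trips through JSON.)

import json

def is_valid_json_doc(doc):
    # Hypothetical helper: a valid document is a dict that JSON can serialize.
    if not isinstance(doc, dict):
        return False
    try:
        json.dumps(doc)
        return True
    except (TypeError, ValueError):
        return False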

def test_insert_one(make_db, make_bad_db, tc):
    db_path = make_db
    pydr_rc['databases'][0]['url'] = db_path
    rc._update(pydr_rc)  # TODO: Is there a way to update rc at a global scope so that we don't have to write this every time we run the test functions?
Collaborator:

you may be able to build rc in conftest.py and import it?
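(For illustration, a sketch of what that fixture could look like; RunControl and pydr_rc here are stand-ins for whatever pydatarecognition actually defines, and make_db is the existing conftest fixture.)

# conftest.py (sketch): build rc once as a fixture so tests can just request it
# instead of repeating the pydr_rc/_update setup in every test.
import pytest

pydr_rc = {'databases': [{'name': 'local', 'url': '.', 'public': False}]}  # illustrative only

class RunControl:
    # Minimal stand-in exposing the _update interface used in the tests.
    def _update(self, other):
        self.__dict__.update(other)

@pytest.fixture
def rc(make_db):
    runcontrol = RunControl()
    pydr_rc['databases'][0]['url'] = make_db  # point the filesystem db at the tempdir
    runcontrol._update(pydr_rc)
    return runcontrol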

@sl5035 (Author) commented Mar 23, 2023

Hello Professor,
Could you please confirm test_insert_one and insert_one? If there is nothing to change, I will move on to find_one after this branch gets merged. Thank you!

@sbillinge sbillinge (Collaborator) left a comment

sorry, I thought I sent these comments before...just finding them now as unsent.....

client.insert_one(dbname, collname, tc)

assert list(client.dbs[dbname][collname].values())[0] == tc # TODO: How to reformat
try:
Collaborator:

remember, don't put logic in the test. This is passing because you handle it in the test, but it needs to pass because you handle it in the way you want in the function itself.

                   {'intensity': [], 'q': [], 'ttheta': [], 'wavelength': 0.111111, '_id': 'ts1129'}),
                   ('bad_case_test', 'bad_case_test')]
@pytest.mark.parametrize('input, result', test_insert_cif)
def test_insert_one(make_db, rc, input, result):
    client = FileSystemClient(rc)
    client.open()

Collaborator:

the code below seems to be testing connect_db, which should be tested in test_connect_db and not here. It is ok to use the function here to connect your db, but don't write tests for it here.


    assert list(client.dbs[dbname][collname].values())[0] == tc # TODO: How to reformat
    try:
        client.insert_one(dbname, collname, input)
Collaborator:

Let's maybe take a step back here and decide what behavior we want. I think we want insert_one to just insert a document successfully into the client database, but we don't want it to do any validation, etc. Do you agree? Then when we use it we may want the validation, so perhaps it would be something like:

cif_doc = create_doc(cif_info)  # I made this up, we don't have this function or need it necessarily but it captures the behavior
cif_doc_bool = validate_cifness(cif_doc)
if cif_doc_bool:
    client.insert_one(db, coll, cif_doc)
else:
    something

I am not sure if this is exactly right, but something like that. This makes the functions more reusable because I could use client.insert_one() to insert something that is not a cif document.

In this case, I think what we need to test is that insert_one does insert the thing properly, so we need a file on the TempDir filesystem, then after the insert_one we need to read it and show that nothing has changed in there except that the new doc has been inserted correctly. If the input is invalid we need to make sure that the file remains unchanged. sthg like that?

@sbillinge (Collaborator) commented Mar 23, 2023 via email

@sl5035 (Author) commented Mar 23, 2023

> sorry, I thought I sent these comments before...just finding them now as unsent.....

All good! I see them now. I'll fix these asap.

@sl5035 (Author) commented Mar 23, 2023

Using a bad test case, the function does not insert into the database, so the last element of the database remains the same.

@sbillinge sbillinge (Collaborator) left a comment

please see my comment.

For the bad entry, test it in a different function called test_insert_one_bad() and use a context manager that checks that the right exception is thrown. I think there are examples in the regolith tests.
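(For illustration, roughly what that separate test could look like; the database/collection names follow the existing tests, the import path is assumed, and the exception type is a placeholder until insert_one defines its error behavior.)

import pytest

from pydatarecognition.fsclient import FileSystemClient  # import path assumed

def test_insert_one_bad(rc):
    client = FileSystemClient(rc)
    client.open()
    bad_doc = 'not_a_dict'  # illustrative bad entry
    with pytest.raises(TypeError):
        client.insert_one('local', 'calculated', bad_doc)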

try:
    coll = self.dbs[dbname][collname]
    coll[doc["_id"]] = doc
except TypeError:
Collaborator:

maybe don't catch the exception, let the program crash? Otherwise we have bad magic that the program is not updating the database but we don't know why. Here having it crash with the exception is a feature, not a bug.

except TypeError:
    print('Input type should be json (dict).')
client.load_database(pydr_rc['databases'][0])
client.insert_one(dbname, collname, input)
Collaborator:

Please check we have the right logic here. I think it should look like:

  1. take exemplar and create a filesystem database with exemplar somewhere in tempdir
  2. connect to this db
  3. insert_one into this db
  4. read the file from the tempdir file
  5. make sure it contains the original exemplar and additionally the thing that was added.

I don't see the read so I am not sure we are testing that the file was updated.
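(For illustration, the five steps above as a sketch of a test; the collection file name, its JSON layout keyed by _id, and the import path are assumptions, while make_db and FileSystemClient come from the existing tests.)

import json
from pathlib import Path

from pydatarecognition.fsclient import FileSystemClient  # import path assumed

def test_insert_one(make_db, rc):
    # 1. make_db is assumed to create a filesystem db in a tempdir containing an exemplar doc
    coll_file = Path(make_db) / 'calculated.json'  # assumed backend file for the collection
    before = json.loads(coll_file.read_text())

    # 2. connect to this db
    client = FileSystemClient(rc)
    client.open()

    # 3. insert_one into this db
    new_doc = {'_id': 'ts1129', 'wavelength': 0.111111, 'q': [], 'intensity': [], 'ttheta': []}
    client.insert_one('local', 'calculated', new_doc)

    # 4. read the file back from the tempdir
    after = json.loads(coll_file.read_text())

    # 5. the original exemplar is untouched and only the new doc has been added
    for key, value in before.items():
        assert after[key] == value
    assert after['ts1129'] == new_doc
    assert len(after) == len(before) + 1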

Author:

I think since we are calling the rc fixture when running the test, the make_db and rc functions in conftest.py automatically generate a filesystem database with the exemplar in tempdir. So if we print(client.dbs) after client.insert_one, the database contains {'local': {'calculated': {...EXEMPLAR, INSERTED_DATA}, 'measured': {...EXEMPLAR}}}. Also, this is what I originally intended to do when I wrote assert list(client.dbs[dbname][collname].values())[-1] == result.

@sbillinge sbillinge (Collaborator) left a comment

please see comments.


client.load_database(pydr_rc['databases'][0])

with pytest.raises(KeyError):
Collaborator:

do we want to return more info back to the user, for example, printing what the bad entry was? To help with debugging? I don't remember the syntax but there is a way of capturing the error message and doing an assert on it.
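(For reference, pytest can capture the exception info so the test can assert on the message; there is also a match= argument on pytest.raises. A self-contained illustration, with a stand-in for the real insert_one call:)

import pytest

def _insert_stub(doc):
    # Stand-in for client.insert_one; raises the way the real function might.
    if '_id' not in doc:
        raise KeyError(f"bad entry, missing '_id': {doc!r}")

def test_insert_one_bad_message():
    with pytest.raises(KeyError) as excinfo:
        _insert_stub({'wavelength': 0.111111})  # no '_id' key
    assert '_id' in str(excinfo.value)  # the captured message can be inspected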

@@ -134,6 +133,22 @@ def test_insert_one(make_db, rc, input, result):
    assert list(client.dbs[dbname][collname].values())[-1] == result
Collaborator:

Per your question (which I don't see here for some reason), this may be left over from Regolith, where the process is to load the entire database into memory (as client.dbs[dbname][collname]). It is then put back into the database at the end with a db.dump() or something like that. But I think we don't want to do that here, as we will have 100,000 entries in the db, so we want insert_one to insert it all the way into the database backend. In the filesystem case this will be a text file on disc containing yml or json; in the mongo case it will be a remote mongodb, and for mpcontribs it will be in mpcontribs.

Author:

Sorry, quick questions here. So is it bad practice to load client.dbs[dbname][collname] entirely using the above? What if we implement find_one first and use it to test insert_one? For example: assert client.find_one(inserted_cif_id) == inserted_cif?

Collaborator:

I think we could, but I am not convinced that it is the best way. We should presumably also test that nothing else in the file was corrupted or overwritten or sthg.

Also, as I mentioned before, we are not inserting cifs per se. I think insert_one should be able to insert any valid json. I think you agree, but by putting cif in the name it makes the test harder to read so I suggest changing cif to json in the name.

Author:

Thank you! Also, another quick question: by inserting all the way into the database backend, do you mean modifying the yaml file directly instead of using client.load_db and client.dump_db? That means we have to change our insert_one function to directly access the yaml file?

Collaborator:

yes, I think that is right. Regolith assumed small databases that could be stored in memory, then operations on the items could be done with do loops and list comprehensions, but with large databases we want to use the powerful database capabilities directly, so we want kind of CRUD capabilities. We probably want the fs to be json format, not yml, but otherwise, yes, I think we want to insert directly.

with pytest.raises(TypeError):
    client.insert_one(dbname, collname, test_insert_cif_bad[1])
assert '_id' in str(excinfo.value)
print(excinfo)
Author:

Also, how do we return info back to the user in pytest? Is print() enough? I couldn't find it in the pytest docs...

Collaborator:

we don't want to return anything back to the user in the tests; we just want them to pass or fail. This pattern could work:

def test_insert_one_bad():
    with pytest.raises(TypeError, match=r"Bad value in database entry key bad_entry_key"):
        insert_one(input[0])

Collaborator:

then we need logic in the function itself so it raises an exception with the right error message.
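(For illustration, the validation and message could live in insert_one itself; the exception types and wording below are placeholders, and the last two lines mirror the current implementation quoted above.)

def insert_one(self, dbname, collname, doc):
    # Sketch: raise informative exceptions up front so pytest.raises(..., match=...)
    # in the test has a concrete message to match against.
    if not isinstance(doc, dict):
        raise TypeError(f"Bad value in database entry: expected a dict (json document), "
                        f"got {type(doc).__name__}")
    if '_id' not in doc:
        raise KeyError(f"Bad database entry, missing '_id': {doc!r}")
    coll = self.dbs[dbname][collname]
    coll[doc['_id']] = doc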

@sl5035

This comment was marked as off-topic.

@sbillinge (Collaborator) commented Mar 27, 2023 via email

@sl5035 (Author) commented Mar 28, 2023

Sorry for the delay, but I am still thinking about the logic here. So the goal is to modify our database (preferably in .json format) without loading the entire database into memory, right? I thought one way to achieve that is to open the file using json.load(), append our doc to the file, and dump it without storing it in self.dbs. I am still figuring out the code layout, though.
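(For illustration, a sketch of that load/append/dump idea, assuming one JSON file per collection keyed by _id; the path construction is hypothetical.)

import json
import os

def insert_one(self, dbname, collname, doc):
    # Only the one collection file is touched; nothing is cached in self.dbs.
    coll_path = os.path.join(self.dbpaths[dbname], f"{collname}.json")  # hypothetical attribute/layout
    with open(coll_path) as f:
        coll = json.load(f)  # dict keyed by '_id'
    coll[doc['_id']] = doc  # append (or overwrite) the new document
    with open(coll_path, 'w') as f:
        json.dump(coll, f, indent=2)  # dump the updated collection back to disc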

@sbillinge (Collaborator):
> Sorry for the delay, but still thinking about the logic here. So the goal is to modify our database (preferably in .json format) without loading the entire database into our memory, right? I thought one way to achieve that is to open the file using json.load() and append our doc in the file and dump it without storing it in self.dbs. I am still figuring out the code layout though.

yes, I think that can work. It of course makes the filesystem backend very slow, if we load and dump for every insert, but it should make the mongo backend fast because we only insert what we want to insert.

Unless there is a way to have our cake and eat it. If we detect fs backend we load at the beginning, insert everything into a memory object like self.dbs then dump at the end. It may not matter if the client behaves differently in this way if it is transparent to the front end. It will make a difference if the program crashes in the middle. I am not sure how DB programs handle this tbh.

What do you think? I don't have all the answers here

@sbillinge sbillinge (Collaborator) left a comment

please see my comment. I wonder if we want to have a quick call and do a bit of a hackathon here? It is hard because I am not sure what the right structure is either, so we may need to play around a bit together?

@@ -234,7 +239,7 @@ def all_documents(self, collname, copy=True):
return deepcopy(self.chained_db.get(collname, {})).values()
return self.chained_db.get(collname, {}).values()

-def insert_one(self, filename, dbname, collname, doc):
+def insert_one(self, filename, doc):
Collaborator:

do we know why this takes filename and doesn't take dbname and collname? Doesn't this break the api, because insert_one will have a different signature for the different clients (mongo vs fs)?

I may be wrong here, so let me know if so, but I think the idea is that insert_one has the same signature regardless of the client, but it has different behavior depending on the backend. In general, we want to insert something into a particular collection in a particular database. If that collection is in a mongo db, the client will need a URL and login credentials and so on for the db; if it is in a file on a filesystem, it will need to know the file name and where in the directory structure it is located, but this is handled by rc.databases at runtime.
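(In other words, something like a shared interface where only the backend implementation differs; this is a sketch and the class names are illustrative.)

from abc import ABC, abstractmethod

class DatabaseClient(ABC):
    # Backend-agnostic interface: every client exposes the same signature.

    @abstractmethod
    def insert_one(self, dbname, collname, doc):
        """Insert doc into collname of dbname; how and where is a backend detail."""

class FileSystemBackend(DatabaseClient):
    def insert_one(self, dbname, collname, doc):
        # resolve the file for this db/collection from rc.databases and write to it
        ...

class MongoBackend(DatabaseClient):
    def insert_one(self, dbname, collname, doc):
        # use the db URL/credentials from rc.databases and call the mongo driver
        ...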

@sl5035 (Author) commented Mar 30, 2023 via email

Labels: None yet
Projects: None yet
2 participants