Fs insert one new #166

Open
wants to merge 16 commits into base: client
pydatarecognition/fsclient.py (2 additions & 5 deletions)
@@ -238,11 +238,8 @@ def all_documents(self, collname, copy=True):

    def insert_one(self, dbname, collname, doc):
        """Inserts one document to a database/collection."""
-        try:
-            coll = self.dbs[dbname][collname]
-            coll[doc["_id"]] = doc
-        except TypeError:
-            return 'Input type should be json (dict).'
+        coll = self.dbs[dbname][collname]
+        coll[doc["_id"]] = doc

    def insert_many(self, dbname, collname, docs):
        """Inserts many documents into a database/collection."""
tests/test_fsclient.py (18 additions & 3 deletions)
@@ -118,10 +118,9 @@ def test_all_documents():


test_insert_cif = [({'intensity': [], 'q': [], 'ttheta': [], 'wavelength': 0.111111, '_id': 'ts1129'},
-                    {'intensity': [], 'q': [], 'ttheta': [], 'wavelength': 0.111111, '_id': 'ts1129'}),
-                   ('bad_case_test', EXEMPLARS['calculated'][-1])]
+                    {'intensity': [], 'q': [], 'ttheta': [], 'wavelength': 0.111111, '_id': 'ts1129'})]
@pytest.mark.parametrize('input, result', test_insert_cif)
-def test_insert_one(make_db, rc, input, result):
+def test_insert_one(rc, input, result):
    client = FileSystemClient(rc)
    client.open()

Collaborator:
The code below seems to be testing connect_db, which should be tested in test_connect_db, not here. It is fine to use the function here to connect your db, but don't write tests for it in this test.

@@ -134,6 +133,22 @@ def test_insert_one(make_db, rc, input, result):
    assert list(client.dbs[dbname][collname].values())[-1] == result
Collaborator:
Per your question (which I don't see here for some reason): this may be left over from Regolith, where the process is to load the entire database into memory (as client.dbs[dbname][collname]) and then put it back into the database at the end with a db.dump() or something like that. But I think we don't want to do that here, as we will have 100,000 entries in the db, so we want insert_one to insert all the way into the database backend. In the filesystem case that will be a text file on disk containing yml or json; in the mongo case it will be a remote mongodb; and for mpcontribs it will be in mpcontribs.
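To make the suggestion concrete, here is a minimal sketch of a filesystem-backed insert_one that writes straight through to disk. The one-file-per-document layout, the json format, and the dbpath argument are assumptions for illustration only, not the project's actual API.

```python
import json
from pathlib import Path


def insert_one(dbpath, collname, doc):
    """Write a single document straight into the filesystem backend.

    Hypothetical sketch: assumes one json file per document, stored under
    <dbpath>/<collname>/<_id>.json, so nothing else needs to be loaded
    into memory or rewritten.
    """
    if not isinstance(doc, dict) or "_id" not in doc:
        raise TypeError("doc must be a json-like dict with an '_id' key")
    colldir = Path(dbpath) / collname
    colldir.mkdir(parents=True, exist_ok=True)
    with open(colldir / f"{doc['_id']}.json", "w") as f:
        json.dump(doc, f, indent=2)
```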

Author:
Sorry, quick questions here. So it's bad practice to load client.dbs[dbname][collname] entirely, as above? What if we implement find_one first and use it to test insert_one? For example: assert client.find_one(inserted_cif_id) == inserted_cif?
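A rough sketch of the test being proposed here. It assumes a client.find_one(dbname, collname, id) method that does not exist yet, so both its name and signature are purely illustrative; the fixtures are the ones already used in this test file.

```python
def test_insert_one_roundtrip(rc):
    # Hypothetical: relies on a not-yet-implemented client.find_one().
    client = FileSystemClient(rc)
    client.open()
    client.load_database(pydr_rc['databases'][0])

    doc = {'intensity': [], 'q': [], 'ttheta': [],
           'wavelength': 0.111111, '_id': 'ts1129'}
    client.insert_one('local', 'calculated', doc)

    # Read the document back through the client instead of inspecting
    # client.dbs directly.
    assert client.find_one('local', 'calculated', doc['_id']) == doc
```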

Collaborator:
I think we could, but I am not convinced that it is the best way. We should presumably also test that nothing else in the file was corrupted or overwritten or something.

Also, as I mentioned before, we are not inserting cifs per se; I think insert_one should be able to insert any valid json. I think you agree, but putting cif in the name makes the test harder to read, so I suggest changing cif to json in the name.
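One way to cover the "nothing else was corrupted" concern is to compare the collection before and after the insert. This is only a sketch: it snapshots the in-memory view, which is acceptable for the small test database even though production code should not load whole collections.

```python
def test_insert_one_preserves_existing(rc):
    client = FileSystemClient(rc)
    client.open()
    client.load_database(pydr_rc['databases'][0])

    # Snapshot the collection before the insert (fine for a small test db).
    before = dict(client.dbs['local']['calculated'])

    new_doc = {'intensity': [], 'q': [], 'ttheta': [],
               'wavelength': 0.111111, '_id': 'ts1129'}
    client.insert_one('local', 'calculated', new_doc)
    after = dict(client.dbs['local']['calculated'])

    # Every pre-existing document is still present and unchanged,
    # and exactly one new document was added.
    for key, value in before.items():
        assert after[key] == value
    assert set(after) - set(before) == {'ts1129'}
```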

Author:
Thank you! Another quick question: by inserting all the way into the database backend, do you mean overwriting the yaml file directly instead of using client.load_db and client.dump_db? That would mean we have to change our insert_one function to access the yaml file directly?

Collaborator:
Yes, I think that is right. Regolith assumed small databases that could be stored in memory, where operations on the items could be done with loops and list comprehensions, but with large databases we want to use the database's capabilities directly, so we want proper CRUD operations. We probably want the filesystem backend to be json format rather than yml, but otherwise, yes, I think we want to insert directly.
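For contrast with the filesystem sketch above, the mongo backend mentioned earlier would delegate the same CRUD operation straight to the driver. The URI and the wrapper function below are placeholders, not part of pydatarecognition.

```python
from pymongo import MongoClient


def insert_one_mongo(mongo_uri, dbname, collname, doc):
    # Placeholder sketch: pymongo writes the single document directly to
    # the remote database; nothing is loaded into local memory.
    client = MongoClient(mongo_uri)
    try:
        client[dbname][collname].insert_one(doc)
    finally:
        client.close()
```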



test_insert_cif_bad = [{'bad_case_test_dict': 'bad'}, 'bad_case_test_str']
def test_insert_one_bad(rc):
    client = FileSystemClient(rc)
    client.open()

    dbname = 'local'
    collname = 'calculated'

    client.load_database(pydr_rc['databases'][0])

    with pytest.raises(KeyError):
Collaborator:
Do we want to return more info to the user, for example printing what the bad entry was, to help with debugging? I don't remember the syntax, but there is a way of capturing the error message and doing an assert on it (sketched after this test).

        client.insert_one(dbname, collname, test_insert_cif_bad[0])
    with pytest.raises(TypeError):
        client.insert_one(dbname, collname, test_insert_cif_bad[1])
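The syntax the reviewer is thinking of is pytest's excinfo object, or equivalently the match argument to pytest.raises. A minimal sketch follows, reusing client, dbname, collname, and test_insert_cif_bad from the test above; the asserted message text assumes insert_one were changed to report the offending entry, which the current implementation does not do.

```python
    # Capture the exception and assert on its message.
    with pytest.raises(TypeError) as excinfo:
        client.insert_one(dbname, collname, test_insert_cif_bad[1])
    assert 'bad_case_test_str' in str(excinfo.value)  # assumes a more informative message

    # Or pass the expected text as a regex directly.
    with pytest.raises(TypeError, match='bad_case_test_str'):
        client.insert_one(dbname, collname, test_insert_cif_bad[1])
```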


@pytest.mark.skip("Not written")
def test_insert_many():
    pass