
Merge of Tjdett and Microtardis branches #4

Open
wants to merge 29 commits into master

Conversation


@stevage commented Aug 31, 2012

I've more or less tested this and it more or less works.

Steve Bennett and others added 25 commits March 14, 2012 15:02
Most site policies can now be customised with option constants:

    ALLOW_EXPERIMENT_CREATION = True         # Should we create new experiments?
    ALLOW_EXPERIMENT_TITLE_MATCHING = True   # If there's no id, is the title enough to match on?
    ALLOW_UNIDENTIFIED_EXPERIMENT = True     # If there's no title/id, should we process it as "uncategorized"?
    DEFAULT_UNIDENTIFIED_EXPERIMENT_TITLE = "Uncategorized Data"
    ALLOW_UNNAMED_DATASETS = True            # If a dataset has no title, should we ingest it with a default name?
    DEFAULT_UNNAMED_DATASET_TITLE = '(assorted files)'
    ALLOW_USER_CREATION = True               # If experiments belong to unknown users, create them?
    # Can existing datasets be updated? If not, we ignore updates. To cause a new dataset to be created,
    # the incoming feed must have a unique EntryID for the dataset (eg, a hash of its contents).
    ALLOW_UPDATING_DATASETS = True
    # If a datafile is modified, do we re-harvest it (creating two copies)? Else, we ignore the update.
    # False is not recommended.
    ALLOW_UPDATING_DATAFILES = True

    # If files are served as /user/instrument/experiment/dataset/datafile/moredatafiles
    # then 'datafile' is at depth 5. This is so we can maintain directory structure that
    # is significant within a dataset. Set to -1 to assume the last directory.
    DATAFILE_DIRECTORY_DEPTH = 9
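
To make the depth setting concrete, here is a rough illustration of how a path could be split at DATAFILE_DIRECTORY_DEPTH. The helper below is hypothetical and not part of this pull request; it only shows the intended interpretation of the setting.

    # Hypothetical illustration of DATAFILE_DIRECTORY_DEPTH, not the actual ingest code.
    def split_datafile_path(href, depth):
        """Split a path into (identifying components, subpath kept within the dataset)."""
        parts = href.strip('/').split('/')
        if depth == -1:
            depth = len(parts)  # assume the last component is the datafile itself
        # Components before 'depth' identify user/instrument/experiment/dataset;
        # everything from 'depth' onwards is directory structure preserved inside the dataset.
        return parts[:depth - 1], '/'.join(parts[depth - 1:])

    # e.g. with depth 5:
    #   split_datafile_path('/user/instrument/experiment/dataset/sub/dir/file.tif', 5)
    #   returns (['user', 'instrument', 'experiment', 'dataset'], 'sub/dir/file.tif')
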
Major changes include:
- Support for updated datasets
- Support for updated datafiles
- Store datafile modification dates
- Matching on experiment title
- Lots of useful logging at debug/info/warn levels as appropriate
- Comments everywhere

Known issues/limitations:
- Doesn't yet handle directory structures within datasets
- Probably crashes if feed doesn't contain modification dates for datafiles
- Title matches are case-sensitive
- Matching against existing title-less experiments is untested
- Sometimes duplicate entries appear in DatasetParameterSet and crash everything
  (possibly caused by strange debug scenarios - ie, simultaneous celery beats)
- I didn't add any tests.
- I only tested against my customised atom dataset provider (for all I know, Picasa import is now horribly broken)
- Makes use of cringeworthy non-standard additions to the Atom feed. Sorry.
- A few debuggy comments left in code.
Also, support subdirectories within datasets.
Requires a change to tardis_portal.util.get_local_time() to work properly.
Also some nicer log messages (instead of 134584 seconds etc)
…stamps for dataset updates that I don't understand.
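
One of the commit messages above mentions friendlier durations in the logs instead of raw second counts like "134584 seconds". A minimal sketch of such a helper follows; the human_time name appears in a later hunk of this diff, but the body shown here is only an assumption about what it does.

    # Sketch only: the real human_time in this branch may differ.
    def human_time(seconds):
        """Render a duration in seconds as a rough human-readable string."""
        seconds = int(seconds)
        for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
            if seconds >= size:
                count = seconds // size
                return "{0} {1}{2}".format(count, unit, "s" if count != 1 else "")
        return "{0} seconds".format(seconds)

    # human_time(134584) -> "1 day"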


@task(name="atom_ingest.walk_feeds", ignore_result=True)
def walk_feeds(*feeds):
    for feed in feeds:
        walk_feed.delay(feed)

'''
# This function is now in MyTardis core - but not including the 'local copy' mode. What to do?
Contributor

Replicating the functionality here isn't appropriate. Even if duplicating it weren't dangerous in itself, this copy skips the new file verification behaviour.

Is there a reason why you can't change the data file URLs when the dataset & datafiles are created by AtomPersister?

(The use of tardis:// URLs also has to be a mistake, as the core no longer knows how to handle them.)

Author

So, what are you suggesting exactly?

Contributor

This method appears to differ from the core implementation simply in which URL it fetches.

Is there a reason why that functionality can't occur in process_enclosure? eg.

def _get_enclosure_url(self, enclosure):
    if IngestOptions.USE_LOCAL_TRANSFERS:
        return enclosure.href.replace(IngestOptions.URL_BASE_TO_REPLACE,
                                      IngestOptions.LOCAL_SOURCE_PATH)
    else:
        return enclosure.href

def process_enclosure(self, dataset, enclosure):
    ...
    datafile = Dataset_File(url=self._get_enclosure_url(enclosure),
                            filename=filename,
                            dataset=dataset)
    ...
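
For this to work, a couple of entries would also be needed in the ingest options. The names below come straight from the snippet above; the defaults and comments are illustrative only, not part of this pull request.

    # Illustrative defaults only; option names are taken from the snippet above.
    USE_LOCAL_TRANSFERS = False                        # Fetch enclosures via a local path instead of HTTP?
    URL_BASE_TO_REPLACE = "http://example.com/files/"  # Prefix to strip from enclosure hrefs (hypothetical value)
    LOCAL_SOURCE_PATH = "/mnt/instrument-data/"        # Local path substituted for that prefix (hypothetical value)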

Author

No reason at all - just time. I wrote the below hack pretty quickly - but after your comments about the performance of HTTP transfers compared to NFS, I'm not sure it's even necessary. Anyway, I'll see how I go with your suggestion.

Author

Done.

dataset, getattr(enclosure, 'href', "<no href!>")))
logging.getLogger(__name__).error("Enclosure: {0}".format(json.dumps(enclosure)))
for media_content in getattr(entry, 'media_content', []):
    self.process_media_content(dataset, media_content)
Contributor

self.process_media_content is gone along with the Picasa feed functionality, as it did not support hash digests. You should no longer be calling it.

Author

Thanks.

@ghost assigned steveandroulakis Sep 2, 2012
format(enclosure.href, self.human_time(timediff)))
if IngestOptions.HIDE_REPLACED_DATAFILES:
    # Mark all older versions of file as hidden. (!)
    from tardis.microtardis.models import Dataset_Hidden
Contributor

Please at least wrap this in a try...except block to handle the absence of MicroTardis.
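
A minimal sketch of the guard being suggested, assuming the option and import names shown in the diff hunk above; the hide-replaced-files logic itself is elided.

    # Sketch only: skip the hide-replaced-files behaviour if MicroTardis isn't installed.
    if IngestOptions.HIDE_REPLACED_DATAFILES:
        try:
            from tardis.microtardis.models import Dataset_Hidden
        except ImportError:
            logging.getLogger(__name__).warning(
                "HIDE_REPLACED_DATAFILES is set but MicroTardis is not installed; skipping.")
        else:
            pass  # mark older versions of the file as hidden, as in the diff above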

Author

True. Although the default, in options.py, is not to use this feature - and there's a comment pointing out you need MicroTardis right next to it.

@tjdett
Contributor

tjdett commented Sep 2, 2012

@steveandroulakis I found a number of cases in this pull-request where old MyTardis functionality is referenced.

Could you please take a look and see if there are any additional issues.

@steveandroulakis
Contributor

@tjdett Aside from testing the functionality, I'm admittedly a bit stuck for time while I work on 3.0 (and do my 'actual' job at Monash) -- could you please highlight some of the old parts for me and I'll follow it up.

I'll be much freer once the 3.0 code list is done (closer than you may think!). Thanks.

@tjdett
Contributor

tjdett commented Sep 6, 2012

@steveandroulakis I'm not an active maintainer for this app, so the following comments are merely suggestions. The final call is yours for this and all subsequent pull-requests. You're in a better position to evaluate the long-term interests of the MyTardis project when performing the merge....

As mentioned in comments, it's doubtful this is necessary:
https://github.com/mytardis/mytardis-app-atom/pull/4/files#L1R32

time_util.py is probably unnecessary duplication - it should at least be checked against core util.py to see if it's compatible before it's merged in.

make_local_copy needs to be removed entirely:
https://github.com/mytardis/mytardis-app-atom/pull/4/files#L3R30

Personally, I believe pull-requests incorporating new features should also include tests for those features (even if they're not particularly comprehensive). Without them, there's no guarantee that any of those features will stay working. Not having them also makes refactoring difficult (and as noted in the comments, I think there's plenty of potential for that).
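
For what it's worth, even a very small test along these lines would pin down the new options. The import path and test below are assumptions about the app's layout, not code from this pull request.

    # Assumed import path and option names; adjust to the app's actual layout.
    from django.test import TestCase

    class IngestOptionsTestCase(TestCase):

        def test_experiment_creation_enabled_by_default(self):
            from tardis.apps.atom.options import IngestOptions  # assumed path
            self.assertTrue(IngestOptions.ALLOW_EXPERIMENT_CREATION)

        def test_unnamed_datasets_get_a_default_title(self):
            from tardis.apps.atom.options import IngestOptions  # assumed path
            self.assertTrue(IngestOptions.ALLOW_UNNAMED_DATASETS)
            self.assertEqual(IngestOptions.DEFAULT_UNNAMED_DATASET_TITLE, '(assorted files)')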

@steveandroulakis
Contributor

@tjdett Hi Tim,

Thanks for the prompt reply, even though there's no requirement for you to do so. Much appreciated.

Steve

@crawley
Contributor

crawley commented Sep 7, 2012

I'm coming in late, but I'd like to support Tim (and the MyTardis submission guidelines) in saying that pull requests should include unit tests for new and modified functionality. This is particularly important for me because I'm tracking HEAD and don't have the resources for repeated user/acceptance testing.
