Automatic Collection Generation #4

b5 · 2017-03-28T03:26:40Z

A common pattern emerging on government sites is an HTML page with numerous direct links to content URLs, for example: http://www.nrel.gov/gis/data_solar.html

In an ideal world, each of these pages would automatically generate a collection, with metadata attributed to that collection based on the page's HTML content (page title as collection title, meta tags scrutinized & added, etc).
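To make the metadata idea concrete, here is a minimal sketch using only Python's standard-library `html.parser`. The names `CollectionMetadataParser` and `collection_metadata` are hypothetical, and the choice to keep every `<meta>` tag that has a `name` (or `property`) and a `content` attribute is an assumption; which tags are actually worth keeping is still an open question.

```python
from html.parser import HTMLParser


class CollectionMetadataParser(HTMLParser):
    """Collects the <title> text and <meta name/property + content> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Assumption: tags without a name/property (e.g. charset) are skipped.
            name = attr.get("name") or attr.get("property")
            content = attr.get("content")
            if name and content:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def collection_metadata(html):
    """Return a candidate collection title and meta-tag dict for one page."""
    parser = CollectionMetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}
```

Calling `collection_metadata(page_html)` on a crawled page would give a starting title and a dict of meta tags to review before attaching them to the collection.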

I'm not totally sure how to pull this off. It may be as simple as looking for pages with more than 10 direct links to content URLs. Some of this thinking should be driven by analyzing already-crawled content.
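As a rough sketch of the "more than 10 direct links" heuristic, the snippet below counts distinct links whose path ends in a file extension that looks like downloadable content. The extension list, the helper names (`content_links`, `looks_like_collection`), and the idea of recognizing a content URL by extension alone are all assumptions for illustration; analysis of already-crawled content may suggest a better signal.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical set of extensions treated as "content"; tune against crawl data.
CONTENT_EXTENSIONS = {".zip", ".csv", ".pdf", ".xls", ".xlsx", ".json", ".xml"}
# Threshold taken from the "more than 10" guess above.
MIN_CONTENT_LINKS = 10


class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def content_links(html, base_url):
    """Return distinct absolute URLs on the page that look like content files."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    links, seen = [], set()
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in CONTENT_EXTENSIONS and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def looks_like_collection(html, base_url, min_links=MIN_CONTENT_LINKS):
    """True when the page has more than min_links distinct content links."""
    return len(content_links(html, base_url)) > min_links
```

Matching on extension keeps the check cheap, since it needs no extra requests, but it will miss content served from extensionless URLs.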

b5 added the enhancement label Mar 28, 2017
