WebEntity Links update process
jrault edited this page Dec 21, 2012 · 1 revision
The WebEntity Links update process is designed to keep two operations fast: retrieving the links between web entities, and inserting new links between Nodes. In effect, it caches the WebEntity Links. We do not want to rebuild this cache every time the content of a WebEntity changes (a Node insertion, with its links), nor do we want to rebuild the links every time the core asks for data (which would amount to having no cache).
One of our goals is to provide users with a graph of the web entities they have defined. This graph is an expected and common output of our system, but it is highly aggregated and thus costly to build. As a reminder, these are the aggregation stages we use:
- At the lowest level, we have pages and the links they contain.
- We use this information to build a graph of nodes. Nodes are only an approximation of the pages; we use them to reduce the complexity (and size) of the graph.
- At the top level, we have the web entities, defined by the user, whose links are aggregated from the links between nodes.
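The last aggregation stage can be sketched as follows. This is only an illustration, not the project's implementation: the function name `aggregate_links`, the `node_to_entity` mapping, and the choice to drop links whose two ends fall in the same web entity are all assumptions made for the example.

```python
from collections import Counter


def aggregate_links(node_links, node_to_entity):
    """Collapse (source_node, target_node) pairs into weighted entity-level links."""
    weights = Counter()
    for source, target in node_links:
        src_entity = node_to_entity.get(source)
        tgt_entity = node_to_entity.get(target)
        # Assumption: ignore nodes that belong to no web entity, and links
        # that stay inside a single web entity.
        if src_entity is None or tgt_entity is None or src_entity == tgt_entity:
            continue
        weights[(src_entity, tgt_entity)] += 1
    return dict(weights)


# Hypothetical sample data: two nodes in entity "A", one node in entity "B".
node_to_entity = {"a.com/1": "A", "a.com/2": "A", "b.org/x": "B"}
node_links = [
    ("a.com/1", "b.org/x"),
    ("a.com/2", "b.org/x"),
    ("a.com/1", "a.com/2"),
]
print(aggregate_links(node_links, node_to_entity))  # {('A', 'B'): 2}
```

Every node link has to be visited to produce a single entity-level link, which is why recomputing the aggregation on every request is costly.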
The process involves the following data:
- A WebEntity (we describe the process for a given WebEntity)
  - Including some timestamps:
    - last insert timestamp
    - last update timestamp
- Nodes contained by this WebEntity
- Node Links outbound from these Nodes
- The WebEntity Links that will be built from these Node Links.
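One possible way to picture this data is the pair of dataclasses below. All class and field names are hypothetical, and representing timestamps as floats (seconds since the epoch) is an assumption; the actual storage layout is not described here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeLink:
    source: str         # Node the link leaves from (contained by the WebEntity)
    target: str         # Node the link points to
    inserted_at: float  # assumption: insertion time in seconds since the epoch


@dataclass
class WebEntity:
    name: str
    last_insert: float = 0.0   # last time a Node and its Node Links were inserted
    last_update: float = 0.0   # last time the WebEntity Links were rebuilt
    nodes: set = field(default_factory=set)                # Nodes contained by this WebEntity
    node_links: list = field(default_factory=list)         # outbound NodeLink objects
    web_entity_links: dict = field(default_factory=dict)   # target entity -> weight


entity = WebEntity(name="example")
entity.nodes.add("a.com/1")
entity.node_links.append(NodeLink("a.com/1", "b.org/x", inserted_at=1.0))
entity.last_insert = 1.0
```

Keeping the two timestamps on the WebEntity itself means a single comparison is enough to tell whether the cached links are stale.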
The process is described below in pseudo-code.
- When a Node and its outbound Node Links are inserted:
  - Set the last insert timestamp of the WebEntity to now (this step is implied by the comparison made in the update process).
- Update process of the WebEntity:
  - Retrieve from the WebEntity the last insert timestamp and the last update timestamp, and compare them.
  - If last update timestamp < last insert timestamp (Node Links were inserted since the last update), we need to update:
    - Set the current update timestamp to now.
    - Retrieve from the WebEntity all the Node Links inserted after the last update timestamp and before the current update timestamp.
    - Build the WebEntity Links from these Node Links.
    - Finally, set the last update timestamp to the current update timestamp.
    - Iterate (trigger the process again), so that Node Links inserted in the meantime are also processed.
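The steps above can be sketched in Python as follows. This is a minimal single-process illustration under stated assumptions, not the actual implementation: the class `WebEntityLinkCache`, its methods, the `entity_of` callable, and the injectable `clock` are all hypothetical, and an update is taken to be needed when Node Links were inserted after the last update.

```python
import itertools
import time


class WebEntityLinkCache:
    """Hypothetical in-memory stand-in for one WebEntity and its cached links."""

    def __init__(self, entity_of, clock=time.time):
        self.entity_of = entity_of   # callable: node id -> web entity name
        self.clock = clock           # injectable so the example is deterministic
        self.last_insert = 0
        self.last_update = 0
        self.node_links = []         # (inserted_at, source_node, target_node)
        self.links = {}              # target web entity -> number of Node Links

    def insert_node(self, source, targets):
        """Store a Node's outbound Node Links without touching the cache."""
        now = self.clock()
        self.node_links.extend((now, source, target) for target in targets)
        self.last_insert = now

    def update(self):
        """Fold Node Links inserted since the last update into the cache."""
        # The loop plays the role of "iterate": it re-checks for inserts that
        # arrived while the previous pass was running.
        while self.last_update < self.last_insert:
            current = self.clock()
            for inserted_at, _source, target in self.node_links:
                if self.last_update < inserted_at <= current:
                    entity = self.entity_of(target)
                    self.links[entity] = self.links.get(entity, 0) + 1
            self.last_update = current


# A counter stands in for the clock, so every event gets a distinct timestamp.
ticks = itertools.count(1)
cache = WebEntityLinkCache(
    entity_of=lambda node: node.split("/")[0],
    clock=lambda: next(ticks),
)
cache.insert_node("a.com/1", ["b.org/x", "c.net/y"])  # timestamp 1
cache.insert_node("a.com/2", ["b.org/z"])             # timestamp 2
cache.update()                                        # runs at timestamp 3
print(cache.links)  # {'b.org': 2, 'c.net': 1}
```

Calling `update()` again without a new insertion does nothing, because the last update timestamp is no longer older than the last insert timestamp. When `update()` gets triggered is left to the caller here. With a real clock, two events can share a timestamp, so a production version would need a stricter rule for the window boundaries than this sketch uses.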