-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathproviders.txt
208 lines (131 loc) · 11.4 KB
/
providers.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
==========
Fri Jan 19 10:13:42 -0800 2018
The New Face of LcResource Providers, after the Great Context Refactor:
0. Archives have a source, reference, and namespace uuid
A central authority ties reference to namespace uuid, and delivers these out to clients after authentication (or for free, in the case of public resources). Knowing the namespace uuid allows a client to correctly map a key to a UUID and retrieve information about the entity. Alternately, clients that simply know the UUID can query for it but may not have access to retrieve the entity.
An LcResource-- contained in the interfaces package-- is a spec for creating an archive.
The interfaces package also contains a factory for creating a small subset of archives-- those that do not require the resources package-- to wit, anything that only depends on the BasicArchive: Qdb, Antelope clients, and Foregrounds.
0.1 Foregrounds
The purpose of a foreground is to collect (or create) observations from both the real world and from catalogued resources.
Contents of a foreground:
- flows
- contexts
- quantities
- fragments
LcForeground inherits from BasicArchive and adds fragments and ForegroundImplementation; cannot store processes. Foregrounds do not have an nsuuid ((??))-- they require a catalog and use UUIDs from catalog references. However, they have a bespoke list of fragment names that allow users to give non-uuid external refs to their fragments. AND they have the Qdb upstream so the user can still __getitem__ using canonical names.
When new flows are created, they may be renamed in the foreground- which is why nsuuid should not be used to generate uuids; instead uuid4 should be used.
0.1.1 Transitioning Foregrounds to use link instead of uuid
In order for foregrounds to store entities from multiple origins with the same uuid or external ref, foregrounds will start using links as the default keys. This should not be too hard to deal with, with the proviso that if multiple entities have the same external ref, one will be picked nondeterministically.
Maybe we should transform the entire system to use links instead of uuids. What is the advantage of uuids, if we consider them to no longer be unique? why do we even have uuids?
- they are unique within an origin; plus, lots of people already use them. foregrounds ARE special in that they are designed to hold information from many sources, and in fact they NEED to be able to distinguish between differently-versioned entities in ways that other BasicArchive subclasses don't.
The proposed plan is:
* create a _uuid_map defaultdict that maps uuid to a set of links
* _key_to_id returns a valid key to _entities
- in LcForeground, it accommodates links, uuids (nondeterministically), or fragment names
* change _add signature to _add(entity, key)
- update BasicArchive.add() to use entity.uuid as key
- update LcForeground.add() to use entity.link as key
* some minor plumbing to ensure that _key_to_id returns uuid expectation is not violated in client code
and that's it!
0.2 Archives
LcArchive is contained in the resources package-- it provides the ability to handle reference processes, allocations, and compute background systems (with partial ordering and foreground generation). The main purpose of an archive is to extract tagged entities from a data source, and to store and make available two composite data types:
= Exchange
- relates one process to one flowable in one direction, with an optional termination
- Exchange value is a proportionality between the magnitude of the one flow and the process's reference
= Characterization
- relates one flowable to one quantity in one context, with an optional locale
- Characterization factor is a proportionality between the magnitude of the quantity and the flow's reference
Concisely, foregrounds are for collecting observed exchanges; the Qdb is for collecting observed characterizations.
LcArchive inherits from BasicArchive and adds Inventory, Background, and Configure; cannot store fragments.
LcArchives (with background) can produce exchange lists that can be used to generate fragments in foreground systems; similarly, foreground systems (once constructed) can be "archived" into JSON archives in which fragment systems (under a given scenario specification) are hardened into unit processes. (this still TODO)
I. Archives store contexts, flowables, quantities, and processes (and sometimes fragments)
All entities have a UUID.
Entity uuids are either supplied directly via LcEntity(), or computed from the namespace via LcArchive.new(etype, name). Do away with LcEntity.new()
In the ILCD (and ecospold2) case, UUIDs identify flows (or intermediate or elementary exchanges), which are flowable + context. We want to store these too, right? so ILCD archives (and maybe ecoinvent archives too) have a _flow_list dict, which maps uuid to a flow+context pair for quick retrieval. We can investigate whether this should be a more general purpose feature, but I think it should only be used in instances where archives already have stable UUIDs that we want to preserve-- but these will not be serialized.
I.0 Accessing root data
Archives are also the framework that should be used to provide generalized access to source data (e.g. execute XML queries).. still need to figure this out, but there should be some way to ask an archive to perform an operation on the source file for a given entity-- the implementation would have to be subclass-specific since obviously flows and quantities don't have universal representation in different archive subclasses.
Well, the xml-based ones [unevenly] implement .objectify() so that at least gets us the raw info.
I.1 Characteristics of Entity Types:
Context:
- Name is unique ID
- Reference is nominal direction. Not sure how to figure this out. One idea:
= _sink: +1 for every time the context is observed at the end of an output flow;
= _source: +1 for every time the context is observed at the end of an input flow;
= direction: Input if (_sink - _source) < 0 else Output
= this works fine as long as an archive has 0 or few inverted processes
- optional parent (Adding a parent will update parent's internal list of children)
Quantity: (no changes)
- Name or unitstring is unique ID (unitstring only in deficient data systems e.g. ecospold v1)
- Reference Unit
- optional UnitConversion property as implemented
Flowable: (deprecate Compartment property in favor of context entity)
- Name is unique ID
- reference quantity-- has characterization factor 1 in all contexts
- required CasNumber (may be blank)
- optional Classification (e.g. 'halogenated organic compound')
- Characterizations: unordered set of
(quantity, flowable, context) tuples with an included dict of [location]->factor
Process: (no changes) (only permitted in LcArchive subclasses)
- UUID is unique ID (could be derived from name)
- required SpatialScope, TemporalScope
- optional Classification (e.g. 'plastics and rubber manufacturing')
- Exchanges: unordered set of
(process, flowable, direction, termination) tuples, as implemented
Fragment: (no changes) (only permitted in LcForeground / subclasses)
- UUID4 or user-set name is unique ID
- required flowable + direction
- optional parent fragment is reference entity
- optional balance setting, background setting, exchange_value
- constructed exchange values + terminations by scenario
I.2
Below is crufty-- or already implemented
==========
Tue Jan 09 10:16:59 -0800 2018
The classes in the 'providers' subdirectory all function to open different data sources and extract information into the archive format (or into the interface specification)
This helps me clarify what exactly is required to subclass an LcArchive-- I think it's mainly two things:
- _fetch(uid)
- _load_all()
_load_all() is easy-- just list the processes and load them.
_fetch() is where all the complexity is: need to recursively fetch quantities for flows for exchanges, just like others.
Thu 2018-01-11 15:34:51 -0800
Now that we've [finally] gotten the resource generation working right, we need to think about a couple things:
1- modifying archives. It would be really slick if making alterations to an archive automatically assigned it a new semantic reference. We could do that with archive tools, but there is no real way for an archive-- let alone a resource-- to detect when entities stored within it are modified. What we can do is create an export function that spins out a new archive by appending a new semantic tag or by incrementing / modding the trailing tag.
right now we can create archives, with arbitrary names, and they get stored in the catalog directory. This is just a wrapper around create_source_cache() which is a bit indirect.
To do above, what we want to do is essentially change the resource's reference-- which means relocating it in the resolver's index.
THe problem is: non-JSON-type resources should not be writable. The only kinds of things you should be able to save from a non-writable resource are indices and archives. (but the save function needs to still be enabled for these resources for that reason). When you want to make a new archive, you should do so explicitly.
We should somehow log the provenance of the archive? I guess by enforcing similarity in the semantic ref.
So what should we be allowed to do? only append. And the append
When we create a NEW resource, we should append an 8-character date code to it; spinoff references should have the date removed, additional sgment appended, new date appended.
Just to recap-- what we are doing to local.uslci.ecospold is creating local.uslci.ecospold.clean.20180112, which we do by the following steps:
+ user supplies new semantic name segment
* construct new semantic name
* load_all()
* alter archive's source
* make_cache(_archive_dir/new.semantic.name.json.gz)
* resource from existing archive (source = cache, ref = new.semantic.name, dstype = json; interfaces, privacy, priority, args kept)
*
Fri 2018-01-12 13:16:11 -0800
In fact, almost all of this should happen inside the archive itself. [done]
Now, what we must do is:
+ user supplies new semantic name segment
* archive creates descendant
* resource from existing archive
* remove archive from old resource
==========
Mon Jan 15 15:22:59 -0800 2018
That's all done-- now the problem is persistent configuration. How should this work?
- configuration options are established by the archive providers that are being configured.
- The four core configuration options, established in LcArchive, are:
- ConfigSetReference -- specify a reference exchange for a process
- ConfigBadReference -- remove the reference designation for an exchange
- ConfigFlowCharacterization -- specify a quantity + value (and optional location) for a flow
: note: this will get complicated, requiring context + quantity + value + optional location after refactor
- ConfigAllocation -- specify a quantity to be used for partitioning allocation
- a resource author has to manually configure the resource-- this should be done interactively with something that knows all the various config options available.
- config options are:
= named tuples? just sequences of values to be assigned to a particular config method
= subclasses? hold their own serialization + validation checking properties
The usage I want to demonstrate is:
- configuration is already set and stored in the resource
- configuration gets deserialized + added by the archive
- resource must be saved manually