Audit and document reserved document and element properties #17

Open
bsowell opened this issue Sep 8, 2023 · 0 comments

bsowell commented Sep 8, 2023

We use a variety of properties in our transforms, for example coordinates and page_number. We should audit these and make clear which ones are required and which are optional. In some cases we may want to promote them to top-level fields in the Element or Document class.
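As a rough illustration of what such an audit and promotion could look like, here is a minimal sketch. It is not Sycamore's actual API: the `Element` dataclass, `RESERVED_PROPERTIES`, `promote_reserved`, and `count_property_usage` are all hypothetical names, and only `coordinates` and `page_number` come from the issue text.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional, Tuple

# Hypothetical reserved set; only these two names are mentioned in the issue.
RESERVED_PROPERTIES = {"coordinates", "page_number"}


@dataclass
class Element:
    """Illustrative stand-in, not the real Sycamore Element class."""
    text: str = ""
    # Reserved properties promoted to top-level, typed fields.
    page_number: Optional[int] = None
    coordinates: Optional[Tuple[float, float, float, float]] = None
    # Everything else stays in a free-form property bag.
    properties: Dict[str, Any] = field(default_factory=dict)


def promote_reserved(raw: Dict[str, Any]) -> Element:
    """Split a flat property dict into top-level fields and leftover properties."""
    props = dict(raw)  # copy so the caller's dict is not mutated
    return Element(
        text=props.pop("text", ""),
        page_number=props.pop("page_number", None),
        coordinates=props.pop("coordinates", None),
        properties=props,
    )


def count_property_usage(property_dicts: Iterable[Dict[str, Any]]) -> Counter:
    """Count how often each property key appears, as a starting point for an audit."""
    counts: Counter = Counter()
    for props in property_dicts:
        counts.update(props.keys())
    return counts
```

Counting key usage across a sample of elements is one way to see which properties transforms actually rely on before deciding which ones deserve a dedicated field.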

eric-anderson added a commit that referenced this issue Feb 2, 2024
* Create sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Add files via upload

* Update my branch (#14)

* Update README.md

* Switch to using profiles for compose.yaml. (#6)

This change avoids the "orphaned containers" warning message we got with the old approach.
It leaves the complexity of running the commands unchanged.

Also fix a missing command in the clean up step (was missing the down before reset)

* Add support for crawling arbitrary websites (#7)

* Add support for downloading arbitrary websites via http

Also add newlines between containers to make the file more readable

* Add documentation on how to crawl an arbitrary website.

* Review fixes

* Add build stamping to our opensearch container. (#8)

* Add support for specifying the version for the containers. (#9)

Initial configuration defaults to the stable version which won't exist until we mark it, so
for now, people will need to use either:

VERSION=latest docker compose up
or
VERSION=latest_rc docker compose up

* Add .gitignore. Remove file that should have been ignored. (#10)

* Add --pull=always to make sure people get up-to-date images. (#11)

* Add --pull=always to make sure people get up-to-date images.

* Fix typos.

* Minor readme improvements found during final testing (#13)

* Run sort all via 'compose run', not 'compose up'; the latter restarts all the containers and will
  reset env vars if they aren't set consistently.

* Document what to do on macOS if opensearch isn't starting

* Give a pointer to the command to get version info when people are reaching out for help

---------

Co-authored-by: Eric Anderson <[email protected]>

* Update my branch (#15)

* Update README.md

* Switch to using profiles for compose.yaml. (#6)

This change avoids the "orphaned containers" warning message we got with the old approach.
It leaves the complexity of running the commands unchanged.

Also fix a missing command in the clean up step (was missing the down before reset)

* Add support for crawling arbitrary websites (#7)

* Add support for downloading arbitrary websites via http

Also add newlines between containers to make the file more readable

* Add documentation on how to crawl an arbitrary website.

* Review fixes

* Add build stamping to our opensearch container. (#8)

* Add support for specifying the version for the containers. (#9)

Initial configuration defaults to the stable version which won't exist until we mark it, so
for now, people will need to use either:

VERSION=latest docker compose up
or
VERSION=latest_rc docker compose up

* Add .gitignore. Remove file that should have been ignored. (#10)

* Add --pull=always to make sure people get up-to-date images. (#11)

* Add --pull=always to make sure people get up-to-date images.

* Fix typos.

* Minor readme improvements found during final testing (#13)

* Run sort all via 'compose run', not 'compose up'; the latter restarts all the containers and will
  reset env vars if they aren't set consistently.

* Document what to do on macOS if opensearch isn't starting

* Give a pointer to the command to get version info when people are reaching out for help

---------

Co-authored-by: Eric Anderson <[email protected]>

* Update README.md

* Update sycamore-local-development-example.md

* Update README.md

* Update README.md

* Update my branch (#17)

* Update README.md

* Switch to using profiles for compose.yaml. (#6)

This change avoids the "orphaned containers" warning message we got with the old approach.
It leaves the complexity of running the commands unchanged.

Also fix a missing command in the clean up step (was missing the down before reset)

* Add support for crawling arbitrary websites (#7)

* Add support for downloading arbitrary websites via http

Also add newlines between containers to make the file more readable

* Add documentation on how to crawl an arbitrary website.

* Review fixes

* Add build stamping to our opensearch container. (#8)

* Add support for specifying the version for the containers. (#9)

Initial configuration defaults to the stable version which won't exist until we mark it, so
for now, people will need to use either:

VERSION=latest docker compose up
or
VERSION=latest_rc docker compose up

* Add .gitignore. Remove file that should have been ignored. (#10)

* Add --pull=always to make sure people get up-to-date images. (#11)

* Add --pull=always to make sure people get up-to-date images.

* Fix typos.

* Minor readme improvements found during final testing (#13)

* Run sort all via 'compose run', not 'compose up'; the latter restarts all the containers and will
  reset env vars if they aren't set consistently.

* Document what to do on macOS if opensearch isn't starting

* Give a pointer to the command to get version info when people are reaching out for help

* Update README.md (#16)

Update README.md to provide an overview of the steps and remove a typo.

* Update README.md

Small changes

* Update README.md

Removed space

---------

Co-authored-by: Eric Anderson <[email protected]>

* Partial review by Eric. Up to the last failure.

* Update sycamore-local-development-example.md

* Update sycamore_local_dev_example.ipynb

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore_local_dev_example.ipynb

* Minor fixes:

* Fix /tmp path
* Fix font path on linux
* Fix typo

* Fix script up to step 3k.

The bug was that the variable names were used inconsistently, and as a result the initial
partitioning was dropped in the remainder of the processing.

Also add a bunch more documentation on what the output should be.

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update sycamore-local-development-example.md

* Update the remaining steps.

* Fixup how we adjust the script to match with earlier changes.
* Renumber to 5[a-e] so that all the steps have unique numbers.
* Fix typo

* Update sycamore-local-development-example.md

* ispell

* Update sycamore-local-development-example.md

* Delete sycamore_local_dev_example.ipynb

* Add files via upload

* Delete sycamore_local_dev_example.ipynb

* Add files via upload

* Cleanup step 5 instructions

* Cleanup step 5 instructions more

---------

Co-authored-by: Eric Anderson <[email protected]>
HenryL27 added a commit that referenced this issue Mar 26, 2024
* add changes required to build from only the plugin

Signed-off-by: HenryL27 <[email protected]>

* use protoletariat to fix protobuf generation.

Signed-off-by: HenryL27 <[email protected]>

* add protoc to gh actions

Signed-off-by: HenryL27 <[email protected]>

* fix test imports

Signed-off-by: HenryL27 <[email protected]>

---------

Signed-off-by: HenryL27 <[email protected]>