
Database and UI for contributing to catalog data

Jennifer Sun edited this page May 9, 2022 · 4 revisions

Goal: Create a workflow for adding new data to the catalog that produces fewer errors, includes validation, and is less labor intensive for the user.

Problem

When someone wants to add a new resource to the catalog data, they must use Git to clone the repository and edit or add a file. They then check that the added data is "correct", meaning it must be a valid JSON object with the correct parameter types; to verify this, they run make in the directory. This make target runs a series of scripts that build the JSON objects used in the catalog API and UI. It is long and complex, and does much more than verify the JSON added to the repository; as a result, the make run often surfaces errors that are irrelevant to a user whose only goal is to add or edit an object.

Another pain point in this process is handling the "submodules". Because catalog-data-caida contains CAIDA-specific data that is not publicly accessible, catalog-data uses a submodule to point to that directory, and the scripts substitute placeholder files for those who do not have access. However, submodules are difficult to maintain, and there have been make errors caused by an out-of-date submodule or by using the wrong branch. Most Git users rarely work with submodules, and the error messages do not make it clear that the submodule is the cause.

Also, when creating or editing JSON or Markdown files, it can be difficult to know which fields are available to add and what format each field expects. Some fields must take one value from a fixed list, and there is no way to tell the user which fields are required. Additionally, users who edit these files need a background in JSON and a sense of what constitutes valid JSON. It can be nearly impossible for users to know what to add to a catalog data resource, and testing requires many steps.
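As a sketch of what schema-driven validation could look like, the check below enforces required fields and fixed-list values for one entry. The field names and allowed values here are hypothetical, not the actual catalog schema; the real schema would be the single source of truth.

```python
# Hypothetical schema: field name -> (required?, allowed values or None).
# The real catalog schema would drive this table, not hand-written code.
SCHEMA = {
    "name":   (True,  None),
    "type":   (True,  {"dataset", "paper", "software"}),
    "access": (False, {"public", "restricted"}),
}

def validate_entry(entry):
    """Return a list of human-readable errors for one catalog entry."""
    errors = []
    for field, (required, allowed) in SCHEMA.items():
        if field not in entry:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if allowed is not None and entry[field] not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")
    return errors
```

Because the same table can drive both the UI (which fields to show, which are dropdowns) and the backend validation, the user gets "why didn't my entry work" messages instead of make failures.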

For the script developers, "passing the make" is a poor marker of success. Even if make succeeds, the API and data may not behave as expected. For example, resources.name is a required field in the API, but some resources do not have a name. Resources are produced by a script that parses data from the external papers file and CAIDA's pubdb and constructs "resources" based on other fields in those files. Ensuring that every resource has a "name" would mean updating that script to parse those files' fields correctly and to manage all the edge cases. If all of this data were stored in a uniform way, it would be easier to handle the edge cases for each type.

If we were able to reduce these pain points and make it easier to add new data to the catalog, we would benefit from:

  • fewer errors and bugs in the UI
  • encouraging more users to add resources to the catalog
  • less script debugging and maintenance in catalog-data
  • less need for manual documentation, since documentation can be managed from the dynamically generated schema
  • deprecation of pubdb (and the external papers YAML), reducing the number of inputs the catalog-data scripts must read
  • error messages tailored to the audience -- developers will get "script" errors, while other users will get "why didn't my new entry work" errors

Proposed Solution

My proposed solution is to add a ReactJS UI form through which users can add, edit, and delete catalog entries.

Flow

  1. User creates or updates an entry in the ReactJS UI for catalog data
    1. The UI will be built from the schema, so validation and the selection options for each field will have one source of truth (the schema)
    2. This could be extended later: entries with "content tabs" could be tested and previewed before being added to the UI.
  2. The UI sends a "POST" (a GraphQL mutation) to update or add the entry in catalog-data
  3. Backend: creates the entry and runs the scripts that calculate the entry's connections to other entries (updating the node graph)
    1. The backend would need a database in order to make changes
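To make step 2 concrete, the UI's "POST" could carry a GraphQL mutation like the one sketched below. The `addEntry` mutation and `EntryInput` type are illustrative names, not an existing API; the real GraphQL schema would define them.

```python
import json

def build_add_entry_payload(entry):
    """Build the JSON body the UI would POST to the GraphQL endpoint.

    `addEntry` and `EntryInput` are hypothetical; the actual mutation
    would be defined by the catalog's GraphQL schema.
    """
    query = """
    mutation AddEntry($input: EntryInput!) {
      addEntry(input: $input) { id name }
    }
    """
    return json.dumps({"query": query, "variables": {"input": entry}})

payload = build_add_entry_payload({"name": "example_dataset", "type": "dataset"})
```

The backend would validate the input against the same schema that drives the UI, so a malformed entry is rejected with a field-level message instead of a failed make.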

This requires the backend to be a database + GraphQL API, instead of the current scripts -> JSON objects -> GraphQL API pipeline. It will also require some sort of "user account" system to specify permissions for CAIDA resources versus other resources.

Backend Database

A Postgres database. I am not too familiar with this part, but after initial conversations it appears doable, though it will take work.
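As a rough sketch of what an entry table might look like, the snippet below uses Python's stdlib sqlite3 as a stand-in for Postgres (in Postgres the metadata column would likely be JSONB rather than TEXT). All table and column names are guesses, not a settled design.

```python
import sqlite3

# Stand-in for Postgres; the same DDL pattern applies there.
# Storing entries in one table lets constraints like NOT NULL on `name`
# replace the ad hoc checks currently scattered across scripts.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entry (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,    -- enforced by the database, not scripts
        type     TEXT NOT NULL,    -- dataset, paper, software, ...
        metadata TEXT              -- remaining schema fields as JSON
    )
""")
conn.execute(
    "INSERT INTO entry (name, type, metadata) VALUES (?, ?, ?)",
    ("example_dataset", "dataset", "{}"),
)
rows = conn.execute("SELECT name, type FROM entry").fetchall()
```

A uniform table like this is what would make the resources.name edge cases tractable: a row simply cannot exist without a name.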

User Accounts

To start, we can have two account types: CAIDA and non-CAIDA. A "CAIDA" account, established by connecting from a CAIDA address (via VPN or SOCKS proxy), will allow the user to edit or update CAIDA resources, and will also allow the user to "delete" any resource. Non-CAIDA users will be able to add or update non-CAIDA resources, but will not be able to delete resources.
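The two-account model above can be sketched as a pair of permission checks. The function names and the idea of tagging each resource with an owner are assumptions for illustration; the design only specifies the CAIDA / non-CAIDA split.

```python
# Hypothetical permission checks for the two starting account types.
# How "caida" membership is established (VPN / SOCKS proxy origin) is
# handled upstream of these checks.

def can_edit(account, resource_owner):
    """CAIDA accounts can edit anything; others only non-CAIDA resources."""
    return account == "caida" or resource_owner != "caida"

def can_delete(account):
    """Only CAIDA accounts may delete resources."""
    return account == "caida"
```

Starting with just these two rules keeps the backend simple while leaving room to add finer-grained roles later.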