HATS renaming (#443)
* Initial renaming hipscat -> hats (#418)

* Initial renaming

* Add the data back...

* Rename test file.

* Update notebooks.

* Update requirement in branch

* Fiiiine

* Initial work toward properties file. (#422)

* Initial work toward properties file.

* Responses to code review comment.

* Fix reference to partition info constant (#423)

* Fix reference to partition info constant

* Remove unused import (and unused typedef)

* Fix tests for table properties, and add additional on creation.

* catalog name in margin generation

* Un-skip tests. Fix data so tests can pass. (#425)

* Fix reference to partition info constant

* Remove unused import (and unused typedef)

* Fix tests for table properties, and add additional on creation.

* catalog name in margin generation

* Un-skip tests. Fix data so tests can pass.

* change to spatial index

* regen test data

* fix unit tests

* regenerate test files with point map files

* unskip test

* fix mypy

* update index type

* fix review

* Update test data for dataset insertion (#440)

* Update test data for dataset insertion

* Update dependency.

* Fix nan references for numpy 2

* Update repo links

---------

Co-authored-by: Sean McGuire <[email protected]>
Co-authored-by: Sean McGuire <[email protected]>
3 people authored Oct 17, 2024
1 parent c478401 commit aa1780a
Showing 413 changed files with 2,801 additions and 3,106 deletions.
6 changes: 2 additions & 4 deletions .gitignore
@@ -156,8 +156,6 @@ _html/
# Project initialization script
.initialize_new_project.sh

# large, unused fits files
point_map.fits

# test notebook
dev/test.ipynb
dev/test.ipynb
docs/tutorials/pre_executed/data
10 changes: 5 additions & 5 deletions README.md
@@ -17,17 +17,17 @@

A framework to facilitate and enable spatial analysis for extremely large astronomical databases
(i.e. querying and crossmatching O(1B) sources). This package uses dask to parallelize operations across
multiple HiPSCat partitioned surveys.
multiple HATS partitioned surveys.

Check out our [ReadTheDocs site](https://lsdb.readthedocs.io/en/stable/)
for more information on partitioning, installation, and contributing.

See related projects:

* HiPSCat ([on GitHub](https://github.com/astronomy-commons/hipscat))
([on ReadTheDocs](https://hipscat.readthedocs.io/en/stable/))
* HiPSCat Import ([on GitHub](https://github.com/astronomy-commons/hipscat-import))
([on ReadTheDocs](https://hipscat-import.readthedocs.io/en/stable/))
* HATS ([on GitHub](https://github.com/astronomy-commons/hats))
([on ReadTheDocs](https://hats.readthedocs.io/en/stable/))
* HATS Import ([on GitHub](https://github.com/astronomy-commons/hats-import))
([on ReadTheDocs](https://hats-import.readthedocs.io/en/stable/))

## Contributing

10 changes: 5 additions & 5 deletions benchmarks/benchmarks.py
@@ -22,15 +22,15 @@


def load_small_sky():
return lsdb.read_hipscat(TEST_DIR / DATA_DIR_NAME / SMALL_SKY_DIR_NAME, catalog_type=lsdb.Catalog)
return lsdb.read_hats(TEST_DIR / DATA_DIR_NAME / SMALL_SKY_DIR_NAME, catalog_type=lsdb.Catalog)


def load_small_sky_order1():
return lsdb.read_hipscat(TEST_DIR / DATA_DIR_NAME / SMALL_SKY_ORDER1, catalog_type=lsdb.Catalog)
return lsdb.read_hats(TEST_DIR / DATA_DIR_NAME / SMALL_SKY_ORDER1, catalog_type=lsdb.Catalog)


def load_small_sky_xmatch():
return lsdb.read_hipscat(TEST_DIR / DATA_DIR_NAME / SMALL_SKY_XMATCH_NAME, catalog_type=lsdb.Catalog)
return lsdb.read_hats(TEST_DIR / DATA_DIR_NAME / SMALL_SKY_XMATCH_NAME, catalog_type=lsdb.Catalog)


def time_kdtree_crossmatch():
@@ -63,8 +63,8 @@ def time_box_filter_on_partition():


def time_create_midsize_catalog():
return lsdb.read_hipscat(BENCH_DATA_DIR / "midsize_catalog")
return lsdb.read_hats(BENCH_DATA_DIR / "midsize_catalog")


def time_create_large_catalog():
return lsdb.read_hipscat(BENCH_DATA_DIR / "large_catalog")
return lsdb.read_hats(BENCH_DATA_DIR / "large_catalog")
12 changes: 6 additions & 6 deletions docs/_static/lazy_diagram.svg
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -74,8 +74,8 @@

pygments_style = "sphinx"

# Cross-link hipscat documentation from the API reference:
# Cross-link hats documentation from the API reference:
# https://docs.readthedocs.io/en/stable/guides/intersphinx.html
intersphinx_mapping = {
"hipscat": ("http://hipscat.readthedocs.io/en/stable/", None),
"hats": ("http://hats.readthedocs.io/en/stable/", None),
}
2 changes: 1 addition & 1 deletion docs/developer/contributing.rst
@@ -55,7 +55,7 @@ the GitHub repository. The next steps assume the creation of branches and PRs ar
If you are (or expect to be) a frequent contributor, you should consider requesting
access to the `hipscat-friends <https://github.com/orgs/astronomy-commons/teams/hipscat-friends>`_
working group. Members of this GitHub group should be able to create branches and PRs directly
on LSDB, hipscat and hipscat-import, without the need of a fork.
on LSDB, hats and hats-import, without the need of a fork.

Create a branch
-------------------------------------------------------------------------------
23 changes: 11 additions & 12 deletions docs/getting-started.rst
@@ -62,14 +62,14 @@ for more information.
Loading a Catalog
~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's start by loading a HiPSCat formatted Catalog into LSDB. Use the :func:`lsdb.read_hipscat` function to
Let's start by loading a HATS formatted Catalog into LSDB. Use the :func:`lsdb.read_hats` function to
lazy load a catalog object. We'll pass in the URL to load the Zwicky Transient Facility Data Release 14
Catalog, and specify which columns we want to use from it.

.. code-block:: python
import lsdb
ztf = lsdb.read_hipscat(
ztf = lsdb.read_hats(
'https://data.lsdb.io/unstable/ztf/ztf_dr14/',
columns=["ra", "dec", "ps1_objid", "nobs_r", "mean_mag_r"],
)
@@ -94,7 +94,7 @@ usually see values).

Where to get Catalogs
~~~~~~~~~~~~~~~~~~~~~~~~~~
LSDB can load any catalogs in the HiPSCat format, locally or from remote sources. There are a number of
LSDB can load any catalogs in the HATS format, locally or from remote sources. There are a number of
catalogs available publicly to use from the cloud. You can see them with their URLs to load in LSDB at our
website `data.lsdb.io <https://data.lsdb.io>`_

@@ -107,7 +107,7 @@ If you have your own data not in this format, you can import it by following the
Performing Filters
~~~~~~~~~~~~~~~~~~~~~~~~~~

LSDB can perform spatial filters fast, taking advantage of HiPSCat's spatial partitioning. These optimized
LSDB can perform spatial filters fast, taking advantage of HATS's spatial partitioning. These optimized
filters have their own methods, such as :func:`cone_search <lsdb.catalog.Catalog.cone_search>`. For the list
of these methods see the full docs for the :func:`Catalog <lsdb.catalog.Catalog>` class.

@@ -132,7 +132,7 @@ get accurate results. This should be provided with the catalog by the catalog's

.. code-block:: python
gaia = lsdb.read_hipscat(
gaia = lsdb.read_hats(
'https://data.lsdb.io/unstable/gaia_dr3/gaia/',
columns=["ra", "dec", "phot_g_n_obs", "phot_g_mean_flux", "pm"],
margin_cache="https://data.lsdb.io/unstable/gaia_dr3/gaia_10arcs/",
@@ -166,13 +166,13 @@ Saving the Result
~~~~~~~~~~~~~~~~~~~~~~~~~~

For large results, it won't be possible to ``compute()`` since the full result won't be able to fit into memory.
So instead, we can run the computation and save the results directly to disk in hipscat format.
So instead, we can run the computation and save the results directly to disk in hats format.

.. code-block:: python
ztf_x_gaia.to_hipscat("./ztf_x_gaia")
ztf_x_gaia.to_hats("./ztf_x_gaia")
This creates the following HiPSCat Catalog on disk:
This creates the following HATS Catalog on disk:

.. code-block::
@@ -182,11 +182,10 @@ This creates the following HiPSCat Catalog on disk:
│ │ ├── Npix=57.parquet
│ │ └── ...
│ └── ...
├── _metadata
├── _common_metadata
├── catalog_info.json
├── partition_info.csv
└── provenance_info.json
├── _metadata
├── properties
└── partition_info.csv
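The getting-started page above walks through loading, filtering, and saving with the renamed API. As a minimal sketch under those names (the cone-search center, radius, keyword names, and output path are illustrative assumptions, not part of this diff):

```python
import lsdb

# Lazily load ZTF DR14 with a small column subset (read_hipscat is now read_hats).
ztf = lsdb.read_hats(
    "https://data.lsdb.io/unstable/ztf/ztf_dr14/",
    columns=["ra", "dec", "ps1_objid", "nobs_r", "mean_mag_r"],
)

# Spatial filters stay lazy and exploit the HATS partitioning; the center and
# radius below are illustrative values only.
ztf_cone = ztf.cone_search(ra=180.0, dec=10.0, radius_arcsec=600)

# Run the computation and write the result to disk in HATS format
# (to_hipscat is now to_hats); the output path is illustrative.
ztf_cone.to_hats("./ztf_cone")
```
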
Creation of Jupyter Kernel
--------------------------
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -11,7 +11,7 @@ large astronomical catalogs (e.g. querying and crossmatching O(1B) sources). It
data processing challenges, in particular those brought up by `LSST <https://www.lsst.org/about>`_.

Built on top of Dask to efficiently scale and parallelize operations across multiple distributed workers, it
uses the `HiPSCat <https://hipscat.readthedocs.io/en/stable/>`_ data format to efficiently perform spatial
uses the `HATS <https://hats.readthedocs.io/en/stable/>`_ data format to efficiently perform spatial
operations.

.. figure:: _static/gaia.png
3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -10,4 +10,5 @@ sphinx-autoapi
sphinx-copybutton
sphinx-book-theme
sphinx-design
git+https://github.com/astronomy-commons/hipscat.git@main
git+https://github.com/astronomy-commons/hats.git@main
git+https://github.com/astronomy-commons/hats-import.git@main
6 changes: 3 additions & 3 deletions docs/tutorials/exporting_results.ipynb
@@ -6,16 +6,16 @@
"source": [
"# Exporting results\n",
"\n",
"You can save the catalogs that result from running your workflow to disk, in parquet format, using the `to_hipscat` call. \n",
"You can save the catalogs that result from running your workflow to disk, in parquet format, using the `to_hats` call. \n",
"\n",
"You must provide a `base_catalog_path`, which is the output path for your catalog directory, and (optionally) a name for your catalog, `catalog_name`. The `catalog_name` is the catalog's internal name and therefore may differ from the catalog's base directory name. If the directory already exists and you want to overwrite its content set the `overwrite` flag to True. Do not forget to provide the necessary credentials, as `storage_options` to the UPath construction, when trying to export the catalog to protected remote storage.\n",
"\n",
"For example, to save a catalog that contains the results of crossmatching Gaia with ZTF to `\"./my_catalogs/gaia_x_ztf\"` one could run:\n",
"```python\n",
"gaia_x_ztf_catalog.to_hipscat(base_catalog_path=\"./my_catalogs/gaia_x_ztf\", catalog_name=\"gaia_x_ztf\")\n",
"gaia_x_ztf_catalog.to_hats(base_catalog_path=\"./my_catalogs/gaia_x_ztf\", catalog_name=\"gaia_x_ztf\")\n",
"```\n",
"\n",
"The HiPSCat catalogs on disk follow a well-defined directory structure:\n",
"The HATS catalogs on disk follow a well-defined directory structure:\n",
"\n",
"```\n",
"gaia_x_ztf/\n",
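A short, hedged sketch of the renamed export call this notebook describes (the output paths, the `overwrite` value, and the UPath storage options are illustrative; `gaia_x_ztf_catalog` is assumed to be the crossmatch result named in the text):

```python
# Local export with an explicit internal catalog name; overwrite any existing output.
gaia_x_ztf_catalog.to_hats(
    base_catalog_path="./my_catalogs/gaia_x_ztf",
    catalog_name="gaia_x_ztf",
    overwrite=True,
)

# For protected remote storage, credentials go in as storage options on a UPath;
# the bucket name and the anon flag below are illustrative.
from upath import UPath

remote_path = UPath("s3://my-bucket/catalogs/gaia_x_ztf", anon=False)
gaia_x_ztf_catalog.to_hats(base_catalog_path=remote_path, catalog_name="gaia_x_ztf")
```
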
6 changes: 3 additions & 3 deletions docs/tutorials/filtering_large_catalogs.ipynb
@@ -9,7 +9,7 @@
"source": [
"# Filtering large catalogs\n",
"\n",
"Large astronomical surveys contain a massive volume of data. Billion object, multi-terabyte sized catalogs are challenging to store and manipulate because they demand state-of-the-art hardware. Processing them is expensive, both in terms of runtime and memory consumption, and performing it in a single machine has become impractical. LSDB is a solution that enables scalable algorithm execution. It handles loading, querying, filtering and crossmatching astronomical data (of HiPSCat format) in a distributed environment. \n",
"Large astronomical surveys contain a massive volume of data. Billion object, multi-terabyte sized catalogs are challenging to store and manipulate because they demand state-of-the-art hardware. Processing them is expensive, both in terms of runtime and memory consumption, and performing it in a single machine has become impractical. LSDB is a solution that enables scalable algorithm execution. It handles loading, querying, filtering and crossmatching astronomical data (of HATS format) in a distributed environment. \n",
"\n",
"In this tutorial, we will demonstrate how to:\n",
"\n",
@@ -93,7 +93,7 @@
"outputs": [],
"source": [
"ztf_object_path = f\"{surveys_path}/ztf/ztf_dr14\"\n",
"ztf_object = lsdb.read_hipscat(ztf_object_path, columns=[\"ps1_objid\", \"ra\", \"dec\"])\n",
"ztf_object = lsdb.read_hats(ztf_object_path, columns=[\"ps1_objid\", \"ra\", \"dec\"])\n",
"ztf_object"
]
},
@@ -318,7 +318,7 @@
"id": "9a887b31",
"metadata": {},
"source": [
"We can stack a several number of filters, which are applied in sequence. For example, `catalog.box_search().polygon_search()` should result in a perfectly valid HiPSCat catalog containing the objects that match both filters."
"We can stack a several number of filters, which are applied in sequence. For example, `catalog.box_search().polygon_search()` should result in a perfectly valid HATS catalog containing the objects that match both filters."
]
},
{
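A hedged sketch of the chained filtering this notebook describes (the `surveys_path` value, box ranges, and polygon vertices are illustrative assumptions; the filter keyword names follow the current LSDB API):

```python
import lsdb

surveys_path = "https://data.lsdb.io/unstable"  # assumed value; not shown in this diff
ztf_object = lsdb.read_hats(
    f"{surveys_path}/ztf/ztf_dr14",
    columns=["ps1_objid", "ra", "dec"],
)

# Filters are lazy and can be chained; the box ranges and polygon vertices are
# illustrative (ra/dec in degrees).
stacked = ztf_object.box_search(ra=(40.0, 60.0), dec=(-30.0, -20.0)).polygon_search(
    [(42.0, -28.0), (58.0, -28.0), (50.0, -22.0)]
)

# Only this call reads and filters the data, returning an in-memory dataframe.
result = stacked.compute()
```
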
14 changes: 7 additions & 7 deletions docs/tutorials/getting_data.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Getting data into LSDB\n",
"\n",
"The most practical way to load data into LSDB is from catalogs in HiPSCat format, hosted locally or on a remote source. We recommend you to visit our own cloud repository, [data.lsdb.io](https://data.lsdb.io), where you are able to find large surveys publicly available to use."
"The most practical way to load data into LSDB is from catalogs in HATS format, hosted locally or on a remote source. We recommend you to visit our own cloud repository, [data.lsdb.io](https://data.lsdb.io), where you are able to find large surveys publicly available to use."
]
},
{
@@ -24,7 +24,7 @@
"source": [
"### Example: Loading Gaia DR3\n",
"\n",
"Let's get Gaia DR3 into our workflow, as an example. It is as simple as invoking `read_hipscat` with the respective catalog URL, which you can copy directly from our website."
"Let's get Gaia DR3 into our workflow, as an example. It is as simple as invoking `read_hats` with the respective catalog URL, which you can copy directly from our website."
]
},
{
@@ -33,7 +33,7 @@
"metadata": {},
"outputs": [],
"source": [
"gaia_dr3 = lsdb.read_hipscat(\"https://data.lsdb.io/unstable/gaia_dr3/gaia/\")\n",
"gaia_dr3 = lsdb.read_hats(\"https://data.lsdb.io/unstable/gaia_dr3/gaia/\")\n",
"gaia_dr3"
]
},
@@ -59,11 +59,11 @@
"source": [
"Note that it's important (and highly recommended) to:\n",
"\n",
"- **Pre-select a small subset of columns** that satisfies your scientific needs. Loading an unnecessarily large amount of data leads to computationally expensive and inefficient workflows. To see which columns are available before even having to invoke `read_hipscat`, please refer to the column descriptions in each catalog's section on [data.lsdb.io](https://data.lsdb.io).\n",
"- **Pre-select a small subset of columns** that satisfies your scientific needs. Loading an unnecessarily large amount of data leads to computationally expensive and inefficient workflows. To see which columns are available before even having to invoke `read_hats`, please refer to the column descriptions in each catalog's section on [data.lsdb.io](https://data.lsdb.io).\n",
"\n",
"- **Load catalogs with their respective margin caches**, when available. These margins are necessary to obtain accurate results in several operations such as joining and crossmatching. For more information about margins please visit our [Margins](margins.ipynb) topic notebook.\n",
"\n",
"Let's define the set of columns we need and add the margin catalog's path to our `read_hipscat` call."
"Let's define the set of columns we need and add the margin catalog's path to our `read_hats` call."
]
},
{
@@ -72,7 +72,7 @@
"metadata": {},
"outputs": [],
"source": [
"gaia_dr3 = lsdb.read_hipscat(\n",
"gaia_dr3 = lsdb.read_hats(\n",
" \"https://data.lsdb.io/unstable/gaia_dr3/gaia/\",\n",
" margin_cache=\"https://data.lsdb.io/unstable/gaia_dr3/gaia_10arcs/\",\n",
" columns=[\n",
@@ -99,7 +99,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When invoking `read_hipscat` only metadata information about that catalog (e.g. sky coverage, number of total rows and column schema) is loaded into memory! Notice that the ellipses in the previous catalog representation are just placeholders.\n",
"When invoking `read_hats` only metadata information about that catalog (e.g. sky coverage, number of total rows and column schema) is loaded into memory! Notice that the ellipses in the previous catalog representation are just placeholders.\n",
"\n",
"You will find that most use cases start with **LAZY** loading and planning operations, followed by more expensive **COMPUTE** operations. The data is only loaded into memory when we trigger the workflow computations, usually with a `compute` call.\n",
"\n",
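A hedged sketch of the lazy-loading pattern this notebook describes, combining the renamed `read_hats` call with a margin cache and a column subset (the cone-search center and radius are illustrative assumptions):

```python
import lsdb

# Lazy load: only metadata (sky coverage, row counts, column schema) is read here.
gaia_dr3 = lsdb.read_hats(
    "https://data.lsdb.io/unstable/gaia_dr3/gaia/",
    margin_cache="https://data.lsdb.io/unstable/gaia_dr3/gaia_10arcs/",
    columns=["ra", "dec", "phot_g_n_obs", "phot_g_mean_flux", "pm"],
)

# Nothing is pulled into memory until a compute() (or a write) triggers the workflow;
# the cone-search values below are illustrative.
nearby = gaia_dr3.cone_search(ra=280.0, dec=-60.0, radius_arcsec=3600)
result = nearby.compute()
```
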
24 changes: 12 additions & 12 deletions docs/tutorials/import_catalogs.ipynb
@@ -7,12 +7,12 @@
"collapsed": false
},
"source": [
"# Importing catalogs to HiPSCat format\n",
"# Importing catalogs to HATS format\n",
"\n",
"This notebook presents two modes of importing catalogs to HiPSCat format:\n",
"This notebook presents two modes of importing catalogs to HATS format:\n",
"\n",
"1. `lsdb.from_dataframe()` method: helpful to load smaller catalogs from a single dataframe. data should have fewer than 1-2 million rows and the pandas dataframe should be less than 1-2G in-memory. if your data is larger, the format is complicated, you need more flexibility, or you notice any performance issues when importing with this mode, use the next mode.\n",
"2. `hipscat-import` package: for large datasets (1G - 100s of TB). this is a purpose-built map-reduce pipeline for creating hipscat catalogs from various datasets. in this notebook, we use a very basic dataset and basic import options. please see [the full package documentation](https://hipscat-import.readthedocs.io/) if you need to do anything more complicated."
"2. `hats-import` package: for large datasets (1G - 100s of TB). this is a purpose-built map-reduce pipeline for creating HATS catalogs from various datasets. in this notebook, we use a very basic dataset and basic import options. please see [the full package documentation](https://hats-import.readthedocs.io/) if you need to do anything more complicated."
]
},
{
@@ -119,8 +119,8 @@
" threshold=100,\n",
")\n",
"\n",
"# Save it to disk in HiPSCat format\n",
"catalog.to_hipscat(f\"{tmp_dir.name}/from_dataframe\")"
"# Save it to disk in HATS format\n",
"catalog.to_hats(f\"{tmp_dir.name}/from_dataframe\")"
]
},
{
@@ -130,15 +130,15 @@
"collapsed": false
},
"source": [
"## HiPSCat import pipeline"
"## HATS import pipeline"
]
},
{
"cell_type": "markdown",
"id": "3842520c",
"metadata": {},
"source": [
"Let's install the latest release of hipscat-import:"
"Let's install the latest release of hats-import:"
]
},
{
@@ -153,7 +153,7 @@
},
"outputs": [],
"source": [
"!pip install git+https://github.com/astronomy-commons/hipscat-import.git@main --quiet"
"!pip install git+https://github.com/astronomy-commons/hats-import.git@main --quiet"
]
},
{
@@ -169,8 +169,8 @@
"outputs": [],
"source": [
"from dask.distributed import Client\n",
"from hipscat_import.catalog.arguments import ImportArguments\n",
"from hipscat_import.pipeline import pipeline_with_client"
"from hats_import.catalog.arguments import ImportArguments\n",
"from hats_import.pipeline import pipeline_with_client"
]
},
{
@@ -226,7 +226,7 @@
},
"outputs": [],
"source": [
"from_dataframe_catalog = lsdb.read_hipscat(f\"{tmp_dir.name}/from_dataframe\")\n",
"from_dataframe_catalog = lsdb.read_hats(f\"{tmp_dir.name}/from_dataframe\")\n",
"from_dataframe_catalog"
]
},
@@ -242,7 +242,7 @@
},
"outputs": [],
"source": [
"from_import_pipeline_catalog = lsdb.read_hipscat(f\"{tmp_dir.name}/from_import_pipeline\")\n",
"from_import_pipeline_catalog = lsdb.read_hats(f\"{tmp_dir.name}/from_import_pipeline\")\n",
"from_import_pipeline_catalog"
]
},
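A hedged sketch of the two import modes this notebook contrasts, using the renamed `lsdb.from_dataframe`, `to_hats`, `read_hats`, and `hats_import` entry points (the input file, catalog name, and worker count are illustrative assumptions):

```python
import pandas as pd
import lsdb

# Mode 1: a small catalog from a single in-memory dataframe. The input file and
# catalog name are illustrative; threshold matches the notebook's partitioning value.
df = pd.read_csv("my_objects.csv")  # expected to contain ra/dec columns
catalog = lsdb.from_dataframe(df, catalog_name="from_dataframe", threshold=100)
catalog.to_hats("./from_dataframe")

# Mode 2, for large datasets, runs the hats-import pipeline under a Dask client,
# using the renamed imports shown in the notebook:
#   from dask.distributed import Client
#   from hats_import.catalog.arguments import ImportArguments
#   from hats_import.pipeline import pipeline_with_client
#   with Client(n_workers=4) as client:
#       pipeline_with_client(ImportArguments(...), client)

# Either output can then be read back lazily with the renamed reader.
from_dataframe_catalog = lsdb.read_hats("./from_dataframe")
```
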
(Diffs for the remaining changed files are not shown.)
