"Golden spike" PR #488

Draft: wants to merge 47 commits into base: main

Commits on Oct 28, 2023

  1. a859b58
  2. index_backend().

    knighton committed Oct 28, 2023
    bd0208a
  3. Fix.

    knighton committed Oct 28, 2023
    69575bc
  4. task.py for benchmarking.

    knighton committed Oct 28, 2023
    42b59f1
  5. generate_datasets.py.

    knighton committed Oct 28, 2023
    2fb1b09
  6. Fix.

    knighton committed Oct 28, 2023
    11dd673
  7. Organize/divide streaming/base/util.py:

    Into:
    - importing
    - merging,
    - pretty
    - retrying
    - shared
    - storage.
    knighton committed Oct 28, 2023
    82737e0
  8. Completely rip out and rewrite pretty args handling:

    Was:
    - bytes_to_int
    - number_abbrev_to_int
    
    Now:
    - normalize_dec_bytes
    - normalize_bin_bytes
    - normalize_bytes
    - normalize_count
    - normalize_duration
    knighton committed Oct 28, 2023
    3212f66
  9. Layer several new storage APIs wrapping/complementing streaming/base/storage/.
    
    Let's properly integrate these later.
    
    - walk_dir()
    - Very Fancy list_dataset_files()
    - smart_download_file()
    knighton committed Oct 28, 2023
    eb93bea
  10. 23554ac
  11. Add cli/index_parquet.py.

    knighton committed Oct 28, 2023
    c711567
  12. 4ea01b2
  13. 157381a
  14. Long lines.

    knighton committed Oct 28, 2023
    c72127f
  15. d2be6a0
  16. Fix.

    knighton committed Oct 28, 2023
    da6f4af
  17. Move benchmarks up and out.

    knighton committed Oct 28, 2023
    b0fa3d7
  18. Fix.

    knighton committed Oct 28, 2023
    4a22638
  19. cb80865
  20. Update paths accordingly.

    knighton committed Oct 28, 2023
    4851888
  21. Update more paths.

    knighton committed Oct 28, 2023
    1051474
  22. Formatting.

    knighton committed Oct 28, 2023
    65ef0de
  23. Fix.

    knighton committed Oct 28, 2023
    408999a
  24. Move examples/ to top level.

    knighton committed Oct 28, 2023
    b38f8a3
  25. Update multimodal.

    knighton committed Oct 28, 2023
    ff90826
  26. a7808ae
  27. c857ed6
  28. 89d5719
  29. c09248c
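
The Oct 28 commit "Completely rip out and rewrite pretty args handling" replaces `bytes_to_int` and `number_abbrev_to_int` with a family of `normalize_*` helpers, but this commit list does not show how they are implemented. The following is only a minimal sketch of what the byte-size helpers might look like, assuming they accept either a non-negative int or a string such as `'1.5gb'` or `'2KiB'`. The function names come from the commit message; the signatures, suffix tables, and error handling are assumptions made for illustration.

```python
import re
from fractions import Fraction
from typing import Dict, Union

# Assumed suffix tables (hypothetical; the units the PR accepts are not shown here).
DEC_UNITS: Dict[str, int] = {
    '': 1, 'b': 1, 'kb': 10**3, 'mb': 10**6, 'gb': 10**9, 'tb': 10**12,
}
BIN_UNITS: Dict[str, int] = {
    '': 1, 'b': 1, 'kib': 2**10, 'mib': 2**20, 'gib': 2**30, 'tib': 2**40,
}

# A decimal number, optional whitespace, then an optional alphabetic suffix.
_PATTERN = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$')


def _normalize(value: Union[int, str], units: Dict[str, int]) -> int:
    """Convert an int or a human-readable size string to a whole number of bytes."""
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f'Expected int or str, got {type(value).__name__}.')
    if isinstance(value, int):
        if value < 0:
            raise ValueError(f'Size must be non-negative, got {value}.')
        return value
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f'Cannot parse size: {value!r}.')
    number, suffix = match.groups()
    factor = units.get(suffix.lower())
    if factor is None:
        raise ValueError(f'Unknown unit suffix: {suffix!r}.')
    # Fraction keeps the arithmetic exact, so '1.5gb' is not subject to float rounding.
    result = Fraction(number) * factor
    if result.denominator != 1:
        raise ValueError(f'Size is not a whole number of bytes: {value!r}.')
    return int(result)


def normalize_dec_bytes(value: Union[int, str]) -> int:
    """Accept decimal (power-of-1000) suffixes only."""
    return _normalize(value, DEC_UNITS)


def normalize_bin_bytes(value: Union[int, str]) -> int:
    """Accept binary (power-of-1024) suffixes only."""
    return _normalize(value, BIN_UNITS)


def normalize_bytes(value: Union[int, str]) -> int:
    """Accept either family of suffixes."""
    return _normalize(value, {**DEC_UNITS, **BIN_UNITS})
```

Under these assumptions, `normalize_dec_bytes('1.5gb')` yields `1500000000` and `normalize_bin_bytes('2KiB')` yields `2048`. Keeping the decimal and binary tables separate is one way to reject an ambiguous or unintended unit early; whether the PR does this is not visible from the commit list.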

Commits on Oct 29, 2023

  1. 9befaa6
  2. c4a5094
  3. 48dce5c
  4. Naming.

    knighton committed Oct 29, 2023
    b0d1543
  5. Fixes.

    knighton committed Oct 29, 2023
    99ad0c0
  6. cli/hash.py.

    knighton committed Oct 29, 2023
    b38fce0
  7. 7a9fc90

Commits on Nov 5, 2023

  1. 5247bfe
  2. 6dc5e22
  3. Switch to box-drawing chars in Tabulator. Example:

    ```
        ─ ──────── ─ ────── ─ ──────────── ─ ──────── ─ ────────────── ─ ────── ─ ──────────── ─ ────────────── ─
        │ format   │    sec │      samples │  usec/sp │          bytes │  files │   bytes/file │ max bytes/file │
        ─ ──────── ─ ────── ─ ──────────── ─ ──────── ─ ────────────── ─ ────── ─ ──────────── ─ ────────────── ─
        │ csv      │  5.131 │    2,097,152 │    2.446 │    171,899,840 │     41 │    4,192,679 │      8,388,616 │
        │ jsonl    │ 12.535 │    2,097,152 │    5.977 │    211,747,148 │     51 │    4,151,904 │      8,388,607 │
        │ lance    │  1.074 │    2,097,152 │    0.512 │    176,961,928 │     19 │    9,313,785 │     11,067,536 │
        │ mds      │  8.649 │    2,097,152 │    4.124 │    176,880,177 │     23 │    7,690,442 │      8,388,604 │
        │ parquet  │  1.323 │    2,097,152 │    0.631 │     63,528,364 │     16 │    3,970,522 │      3,973,860 │
        │ delta    │ 16.881 │    2,097,152 │    8.050 │     55,106,514 │     66 │      834,947 │      1,710,970 │
        ─ ──────── ─ ────── ─ ──────────── ─ ──────── ─ ────────────── ─ ────── ─ ──────────── ─ ────────────── ─
    ```
    knighton committed Nov 5, 2023
    18f6474
  4. Rewrite task.py.

    knighton committed Nov 5, 2023
    52af2cb
  5. Fixes.

    knighton committed Nov 5, 2023
    bc125b4
  6. Fix.

    knighton committed Nov 5, 2023
    a2ff86f
  7. Misc.

    knighton committed Nov 5, 2023
    57e7571
  8. 52dcb42
  9. Split out Tabulator.

    knighton committed Nov 5, 2023
    f1e10bb
  10. 56674e8
  11. Refactor.

    knighton committed Nov 5, 2023
    cbfcab3
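
The Nov 5 commit "Switch to box-drawing chars in Tabulator" shows the resulting table layout but not the code behind it. As a rough illustration of how that layout could be produced, here is a minimal sketch that mimics the rule and row format from the example in that commit message. `render_table` is a hypothetical name, not the PR's `Tabulator` API, and column widths are computed from the content here, whereas the example output appears to use wider, fixed widths.

```python
from typing import List, Sequence


def render_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render pre-formatted string cells as a box-drawing table.

    Hypothetical helper: the first column is left-justified and the rest are
    right-justified, matching the example output in the commit message.
    """
    table: List[List[str]] = [list(map(str, header))]
    table += [list(map(str, row)) for row in rows]
    num_cols = len(table[0])
    if any(len(row) != num_cols for row in table):
        raise ValueError('Every row must have the same number of cells as the header.')

    # Each column is as wide as its widest cell (an assumption of this sketch).
    widths = [max(len(row[i]) for row in table) for i in range(num_cols)]

    def rule() -> str:
        # Horizontal rule: a dash under each separator and a run under each column.
        return '─ ' + ' ─ '.join('─' * width for width in widths) + ' ─'

    def line(cells: List[str]) -> str:
        padded = [cells[0].ljust(widths[0])]
        padded += [cell.rjust(width) for cell, width in zip(cells[1:], widths[1:])]
        return '│ ' + ' │ '.join(padded) + ' │'

    out = [rule(), line(table[0]), rule()]
    out += [line(row) for row in table[1:]]
    out.append(rule())
    return '\n'.join(out)


if __name__ == '__main__':
    print(render_table(
        ['format', 'sec', 'samples'],
        [['csv', '5.131', '2,097,152'], ['parquet', '1.323', '2,097,152']],
    ))
```

Cells are passed in already formatted as strings (for example with thousands separators), so the sketch handles only alignment and borders.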