
Locked into UTCTime/UTCTime is slow #1082

Conversation

MaxGabriel
Member

(This is really an issue with some supporting code attached, not a true PR)

At work we've got an admin page that returns all of the companies that use our product and some associated data about them. This page has been taking longer and longer to load, and after profiling we found 15% of the time was spent in deserializing UTCTimes from the database.

My coworker @chessai recommended I check out chronos, which Layer 3 and some others use for much faster time parsing.

Unfortunately, I think Persistent locks you into using UTCTime with Postgres, based on these lines:

, (k PS.timestamp, convertPV (PersistUTCTime . localTimeToUTC utc))
, (k PS.timestamptz, convertPV PersistUTCTime)

(I could be wrong about this, but it seems that way)

To test out chronos, I modified persistent-postgresql to return PersistByteString for timestamps and made a benchmark. I ran the benchmarks with profiling enabled; the resulting .prof files and flamegraphs (generated by the sick https://github.com/fpco/ghc-prof-flamegraph) are included here.

This benchmark tested selecting 10,000 rows, each with two Text fields and two UTCTime fields. Using UTCTime, I got:

time                 448.3 ms   (432.3 ms .. 465.9 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 470.0 ms   (459.2 ms .. 485.7 ms)
std dev              14.77 ms   (1.109 ms .. 18.59 ms)
variance introduced by outliers: 19% (moderately inflated)

Using OffsetDatetime from Chronos, I got:

benchmarking postAdminOrganizationsStatusR
time                 349.7 ms   (331.0 ms .. 373.9 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 381.1 ms   (365.7 ms .. 404.3 ms)
std dev              21.85 ms   (1.356 ms .. 27.41 ms)
variance introduced by outliers: 19% (moderately inflated)

This is a fairly dramatic difference. Because of how common time fields are (especially with common patterns like createdAt and updatedAt metadata fields), deserializing timestamps can be quite a drain, especially when joining multiple tables with Esqueleto.


For context, this is my understanding of why chronos is faster:

  1. UTCTime is represented as a Day (an arbitrary-precision Integer counting days since 1858-11-17, the Modified Julian Day epoch), plus a time-of-day in picoseconds (10^-12 seconds), represented by [Pico](https://www.stackage.org/haddock/lts-15.11/base-4.13.0.0/Data-Fixed.html#t:Pico) from Data.Fixed. Conversely, chronos uses machine integer types like Int and Int64, and only offers nanosecond resolution (10^-9 seconds)
  2. chronos has put a lot more work into making its parsing functions fast
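
To make the representational difference concrete, here's a small sketch using only base and time. The single-Int64-of-nanoseconds encoding is meant to illustrate the chronos style of representation; the `toNanos`/`fromNanos` helpers and the choice of the MJD epoch are mine, not part of either library.

```haskell
import Data.Int (Int64)
import Data.Time.Calendar (Day (..))
import Data.Time.Clock
  (UTCTime (..), diffTimeToPicoseconds, picosecondsToDiffTime)

-- UTCTime: arbitrary-precision Integer day + Pico time-of-day.
-- A chronos-style alternative: one Int64 of nanoseconds since an epoch
-- (here, the Modified Julian Day epoch, 1858-11-17).

nanosPerDay :: Int64
nanosPerDay = 86400 * 1000000000

toNanos :: UTCTime -> Int64
toNanos (UTCTime (ModifiedJulianDay d) t) =
  fromIntegral d * nanosPerDay
    + fromIntegral (diffTimeToPicoseconds t `div` 1000)

fromNanos :: Int64 -> UTCTime
fromNanos n =
  let (d, rest) = n `divMod` nanosPerDay
  in UTCTime (ModifiedJulianDay (fromIntegral d))
             (picosecondsToDiffTime (fromIntegral rest * 1000))
```

Note the loss of sub-nanosecond precision in `toNanos` (`div` 1000 drops the extra picosecond digits), which is exactly the resolution trade-off described above.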

A big question mark here is postgresql-simple: I'm not sure how much of the slowness comes from its parsing code in particular, versus how much is inherent to UTCTime.


My thoughts on this:

  1. It'd be great if Persistent had some benchmarks in general. I'm not sure what a good structure would be; perhaps a persistent-benchmark package, analogous to persistent-test, that the various backends then import?
  2. It's lame to be locked into UTCTime. Any ideas on fixes to this?
  3. Is there a way to make parsing UTCTime faster? I suspect the code in postgresql-simple could be improved, but I have no firm basis for this (I've looked at the code and seen it in profiling results, but I'm not experienced enough with high-performance Haskell to look at the code and immediately see why it's slow).
  4. Could chronos be a good general purpose replacement for UTCTime in Persistent? I would be hesitant about this since it's much less commonly used in the ecosystem.
  5. Possibly he was joking, but fwiw @chessai thought that parsing with chronos and then converting the results to UTCTime would probably still be faster than parsing directly to UTCTime

@yitz-zoomin

It's not lame for there to be a built-in PersistField instance for UTCTime. It's the standard in the Haskell ecosystem and by far the most widely used.

That's not locking you in. There are many alternative time libraries with various trade-offs. There is nothing stopping you from writing a PersistField instance for any of them that you want to use with your favorite backend or backends. You could even release it on hackage if you think others will be interested.

I don't believe there is any reason that the PersistField instance for UTCTime can't be made much more efficient for any given backend. You're right that it might mean avoiding parseTime and formatTime from the time library. But I doubt that it would make sense to go via another time library like chronos, at least not for backends that store time types as integers, such as PostgreSQL.

@MaxGabriel
Member Author

It's not lame for there to be a built-in PersistField instance for UTCTime. It's the standard in the Haskell ecosystem and by far the most widely used.

That's not locking you in. There are many alternative time libraries with various trade-offs. There is nothing stopping you from writing a PersistField instance for any of them that you want to use with your favorite backend or backends. You could even release it on hackage if you think others will be interested.

My read of the code, and my experience using it, is that any timestamp or timestamptz is going to be deserialized into a PersistUTCTime based on the code in Postgresql.hs, thus incurring the cost of deserializing into a UTCTime. I had to modify the code in Postgresql.hs to write a PersistField instance for chronos that deserialized from a ByteString. Do you see another way?

@MaxGabriel
Member Author

Updated: The benchmarks for UTCTime are a lot closer when you turn off profiling and compile with -O2:

time                 36.93 ms   (36.18 ms .. 37.53 ms)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 38.14 ms   (37.49 ms .. 38.95 ms)
std dev              1.472 ms   (1.147 ms .. 2.001 ms)
variance introduced by outliers: 12% (moderately inflated)

vs chronos:

time                 32.52 ms   (31.43 ms .. 33.46 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 33.19 ms   (32.43 ms .. 34.54 ms)
std dev              2.299 ms   (1.241 ms .. 3.764 ms)
variance introduced by outliers: 28% (moderately inflated)

@yitz-zoomin

Your custom persistent types can be marshaled to and from the DB however you want, using PersistDbSpecific and SqlOther. See the haddocks in Database.Persist.Types for a full example.

If you want a custom instance for UTCTime in PostgreSQL that uses only a numeric Unix epoch representation, without parsing and rendering textual timestamps, you can use extract(epoch from t) to convert a timestamptz to a number, and to_timestamp(t) to convert a number to a timestamptz.

But for that you still need to parse and render your Unix epoch values represented as text. Using rawSql you could marshal them directly to and from the DB in libpq as integers.
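
As a sketch of the Haskell side of that approach: a value selected via extract(epoch from t) comes back as a number of seconds, which time can convert without any timestamp parsing. The helper names here are mine, and the assumption that the epoch value arrives as a Double is an illustration, not how any particular backend behaves.

```haskell
import Data.Time.Clock (UTCTime)
import Data.Time.Clock.POSIX
  (posixSecondsToUTCTime, utcTimeToPOSIXSeconds)

-- Hypothetical glue for the extract(epoch from t) approach:
-- POSIXTime is a NominalDiffTime (Pico-based), so realToFrac
-- bridges it to and from a plain Double of seconds.
epochToUTCTime :: Double -> UTCTime
epochToUTCTime = posixSecondsToUTCTime . realToFrac

utcTimeToEpoch :: UTCTime -> Double
utcTimeToEpoch = realToFrac . utcTimeToPOSIXSeconds
```

This skips all textual timestamp parsing, at the cost of pushing the conversion into the SQL and accepting Double's precision limits on subsecond values.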

If you play around with these approaches and benchmark them, you might come up with a really great PR to improve the performance of UTCTime marshaling in the PostgreSQL backend.

@MaxGabriel
Member Author

MaxGabriel commented May 12, 2020

Your custom persistent types can be marshaled to and from the DB however you want, using PersistDbSpecific and SqlOther. See the haddocks in Database.Persist.Types for a full example.

@yitz-zoomin I don't think that's the case. You're saying you would expect this code to work, right?

instance PersistField OffsetDatetime where
  toPersistValue date = PersistDbSpecific $ cs $ B.toLazyByteString $ builderUtf8_YmdHMSz OffsetFormatColonAuto SubsecondPrecisionAuto datetimeFormat date
  fromPersistValue (PersistDbSpecific bs) = case APBS.parseOnly (parserUtf8_YmdHMSz OffsetFormatColonAuto datetimeFormat) bs of
    Left err -> Left $ "When parsing a Chronos OffsetDatetime, got error: " <> T.pack err
    Right offsetDateTime -> Right offsetDateTime
  fromPersistValue bad = Left $ "When deserializing a Chronos OffsetDatetime, expected PersistDbSpecific but got " <> (T.pack $ show bad)

instance PersistFieldSql OffsetDatetime where
  sqlType _ = SqlOther "timestamptz"

But instead that code will fail with the error:

persistent-postgresql-bench: PersistMarshalError "Couldn't parse field `createdAt` from table `user_with_timestamps`. When deserializing a Chronos OffsetDatetime, expected PersistDbSpecific but got PersistUTCTime 2020-05-12 16:08:14.937688 UTC"

To be clear, PersistDbSpecific and SqlOther are great and I use them all the time for things like UUIDs. It's just that in this case, timestamps are deserialized to PersistUTCTime.

@parsonsmatt
Collaborator

Updated: The benchmarks for UTCTime are a lot closer when you turn off profiling and compile with -O2:

time                 36.93 ms   (36.18 ms .. 37.53 ms)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 38.14 ms   (37.49 ms .. 38.95 ms)
std dev              1.472 ms   (1.147 ms .. 2.001 ms)
variance introduced by outliers: 12% (moderately inflated)

vs chronos:

time                 32.52 ms   (31.43 ms .. 33.46 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 33.19 ms   (32.43 ms .. 34.54 ms)
std dev              2.299 ms   (1.241 ms .. 3.764 ms)
variance introduced by outliers: 28% (moderately inflated)

I know that @jessekempf also found that parsing UTCTime was a huge drain. Parsing into a chronos datatype for extreme performance and then converting to UTCTime to keep all the relevant instances sane sounds good to me.

Jesse do you mind posting your notes here? Slack ate them and I can't remember what happened :(

@jkachmar

jkachmar commented Dec 17, 2020

This honestly feels like the perfect use-case for the zepto parser from chronos (not sure which one y'all are looking at for benchmarks); parsing time values from a PostgreSQL table that's managed by persistent should never fail, right?

EDIT: I guess what I'm saying is that I honestly wouldn't be surprised if there was a measurable speed gain from using the zepto parser for Datetime and then converting to UTCTime (e.g. both use Modified Julian Days, so the only real "calculation" that would have to be done would be recovering UTCTime's DiffTime from Datetime's TimeOfDay)
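
If the two Modified-Julian-Day representations line up as described, the conversion really is just arithmetic. A sketch using only time; the function name and the assumption that the parser hands you an Int day count since 1858-11-17 plus nanoseconds since midnight are mine:

```haskell
import Data.Int (Int64)
import Data.Time.Calendar (Day (..))
import Data.Time.Clock (UTCTime (..), picosecondsToDiffTime)

-- Hypothetical glue: given an MJD day count (as chronos's Day carries)
-- and nanoseconds since midnight, no calendar math is needed; only the
-- time-of-day has to be scaled from nanoseconds to picoseconds.
mjdNanosToUTCTime :: Int -> Int64 -> UTCTime
mjdNanosToUTCTime mjd nanosOfDay =
  UTCTime (ModifiedJulianDay (fromIntegral mjd))
          (picosecondsToDiffTime (fromIntegral nanosOfDay * 1000))
```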

@jessekempf
Contributor

@parsonsmatt, @MaxGabriel: Ultimately any dense time representation is going to look like a 128-bit number. The overhead for that will be negligible compared to the fact that time parses a linked list of characters, which is about the most cache-unfriendly access pattern I can imagine.

The other problem is that the UTCTime parser that ships with time is overly general. It tries to be a reimplementation of strptime(3) from C. We don't need that level of flexibility; all we need is to parse ISO 8601 into its constituent parts and then pass those to the Gregorian calendrical functions in Data.Time.Calendar.
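
As a sketch of that narrower approach: a hand-rolled parser over a strict ByteString for one fixed layout, with no strptime-style generality. It deliberately skips timezones and fractional seconds, and `parseTimestamp`/`expect` are my names, not an existing API:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..), secondsToDiffTime)

-- Consume one expected literal character, or fail.
expect :: Char -> B.ByteString -> Maybe B.ByteString
expect c bs = case B.uncons bs of
  Just (c', rest) | c' == c -> Just rest
  _                         -> Nothing

-- Parse "YYYY-MM-DD HH:MM:SS" directly into UTCTime by splitting the
-- input into its constituent numeric fields and handing them to the
-- Gregorian functions in Data.Time.Calendar.
parseTimestamp :: B.ByteString -> Maybe UTCTime
parseTimestamp bs0 = do
  (y,  r1) <- B.readInt bs0
  (mo, r2) <- expect '-' r1 >>= B.readInt
  (d,  r3) <- expect '-' r2 >>= B.readInt
  (h,  r4) <- expect ' ' r3 >>= B.readInt
  (mi, r5) <- expect ':' r4 >>= B.readInt
  (s,  _)  <- expect ':' r5 >>= B.readInt
  pure $ UTCTime (fromGregorian (fromIntegral y) mo d)
                 (secondsToDiffTime (fromIntegral (h * 3600 + mi * 60 + s)))
```

A real backend parser would also need subseconds and offsets, but this shows the shape: fixed format in, machine integers out, no String traversal.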

@Vlix
Contributor

Vlix commented Mar 23, 2024

@parsonsmatt, @MaxGabriel: Ultimately any dense time representation is going to look like a 128-bit number. The overhead for that will be negligible compared to the fact that time parses a linked list of characters, which is about the most cache-unfriendly access pattern I can imagine.

The other problem is that the UTCTime parser that ships with time is overly general. It tries to be a reimplementation of strptime(3) from C. We don't need that level of flexibility; all we need is to parse ISO 8601 into its constituent parts and then pass those to the Gregorian calendrical functions in Data.Time.Calendar.

postgresql-simple uses its own attoparsec-based parsers, and did even back in 2020, so no parsers from time are involved anyway.
