feat(rust): utf8 to temporal casting #10517

Closed
wants to merge 106 commits into from
Closed
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
106 commits
Select commit Hold shift + click to select a range
869fa1d
feat(rust): utf8 to temporal casting
brayanjuls Aug 15, 2023
bb73153
feat(rust): utf8 to temporal casting
brayanjuls Aug 15, 2023
ca7c105
Merge branch 'main' into utf8-to-temporal-cast
brayanjuls Oct 10, 2023
34ba75a
feat: utf8 to timestamp/date casting support
brayanjuls Oct 10, 2023
8be15bd
feat: added missing tests for failure scenarios, also fixed casting f…
brayanjuls Oct 16, 2023
901ab3a
Merge branch 'main' into utf8-to-temporal-cast
brayanjuls Oct 16, 2023
de17db6
fix: fixed issue regarding arrow libraries import and code formatting
brayanjuls Oct 16, 2023
aea80f4
fix(rust,python): only exclude final output names of group_by key exp…
nameexhaustion Oct 16, 2023
20a3991
depr(python): Rename `group_by_rolling` to `rolling` (#11761)
stinodego Oct 16, 2023
7efc54e
refactor(rust): Make all emw function expr non-anonymous (#11638)
romanovacca Oct 16, 2023
8589836
feat(python,rust,cli): add `DATE` function for SQL (#11541)
cmdlineluser Oct 16, 2023
6e886f9
chore: more granular polars-ops imports (#11760)
ritchie46 Oct 16, 2023
5d48cc8
feat(python): primitive kwargs in plugins (#11268)
ritchie46 Oct 16, 2023
17e1402
fix: fixed validate_is_number import issue, also added missing datafr…
brayanjuls Oct 16, 2023
befe308
refactor(python): Rename `IntegralType` to `IntegerType` (#11773)
stinodego Oct 16, 2023
084a7e1
fix: fixed linter issues.
brayanjuls Oct 16, 2023
0cfce61
chore(rust): Move cum_agg to polars-ops (#11770)
reswqa Oct 17, 2023
6f25831
depr(python): Deprecate `use_pyarrow` param for `Series.to_list` (#11…
stinodego Oct 17, 2023
2c6c9bd
refactor(python): Fix Exception module paths (#11785)
stinodego Oct 17, 2023
6b929f2
perf(python): Improve `DataFrame.get_column` performance by ~35% (#11…
stinodego Oct 17, 2023
dcec1e8
fix(rust,python): make `PyLazyGroupby` reusable (#11769)
nameexhaustion Oct 17, 2023
ef503c3
fix(python): Fix values printed by `assert_*_equal` AssertionError wh…
stinodego Oct 17, 2023
4fb3f07
docs(python): Minor tweak in code example in section Coming from Pand…
jrycw Oct 17, 2023
d00a432
fix: handle logical types in plugins (#11788)
ritchie46 Oct 17, 2023
7983724
feat: Expressify pct_change and move to ops (#11786)
reswqa Oct 17, 2023
4476fbd
Fix typo in docs (#11776)
aberres Oct 17, 2023
8463def
fix: fix key in object-store cache (#11790)
ritchie46 Oct 17, 2023
89ffd88
python polars 0.19.9 (#11791)
ritchie46 Oct 17, 2023
a507d67
refactor(python): Remove unused `_to_rust_syntax` util (#11795)
stinodego Oct 17, 2023
003ca4d
refactor(python): Minor updates to assertion utils and docstrings (#1…
stinodego Oct 17, 2023
32e7a24
chore(python): Bump lint dependencies (#11802)
stinodego Oct 17, 2023
f63014e
fix: patch broken aHash AES intrinsics on ARM (#11801)
orlp Oct 17, 2023
d85c452
depr(python): Deprecate non-keyword args for `ewm` methods (#11804)
stinodego Oct 17, 2023
8d29d3c
fix(rust,python): ensure projections containing only hive columns are…
nameexhaustion Oct 17, 2023
00082c5
fix: removing additional asserts from unit test, also improved patter…
brayanjuls Oct 17, 2023
45009eb
refactor(rust): Make some functions in dsl::mod non-anonymous (#11799)
reswqa Oct 18, 2023
d24c508
chore(rust): Move ewma to polars-ops (#11794)
reswqa Oct 18, 2023
f5f3fa9
fix(rust): remove flag inconsistency 'map_many' (#11817)
ritchie46 Oct 18, 2023
d6ef2e4
refactor(rust): remove redundant if branch in nested parquet (#11814)
nameexhaustion Oct 18, 2023
c3e1c1e
feat: don't require empty config for cloud scan_parquet (#11819)
ritchie46 Oct 18, 2023
9ea46ef
chore(rust): Move diff to polars-ops (#11818)
reswqa Oct 18, 2023
46e7009
fix(rust, python): Edge cases for list count formatting (#11780)
Walnut356 Oct 18, 2023
34d42c6
chore(rust): arrow: remove unused arithmetic code and remove doctests…
ritchie46 Oct 18, 2023
89cc1e2
refactor(python): Assert utils refactor (#11813)
stinodego Oct 18, 2023
28a99f6
chore(rust): Move round to ops (#11838)
reswqa Oct 19, 2023
a42185f
docs(rust): Update doc comments for with_column to reflect that colum…
0xForerunner Oct 19, 2023
a05b298
fix: propagate validity when cast primitive to list (#11846)
reswqa Oct 19, 2023
d39c360
fix(rust): panic on hive scan from cloud (#11847)
nameexhaustion Oct 19, 2023
f41e8f4
perf: properly push down slice before left/asof join (#11854)
orlp Oct 19, 2023
dfbc5f4
refactor: add missing polars-ops tests to CI (#11859)
orlp Oct 19, 2023
cd0288e
build: Bump docs dependencies (#11852)
stinodego Oct 19, 2023
21e2c0c
docs(python): fix typo in code example in section Expressions - Basic…
jrycw Oct 19, 2023
65659b9
chore(python): bump hypothesis from 6.87.1 to 6.88.1 in /py-polars (#…
dependabot[bot] Oct 19, 2023
754067c
build(rust): update aws-creds requirement from 0.35.0 to 0.36.0 (#11868)
dependabot[bot] Oct 19, 2023
2fea820
chore(python): bump black from 23.9.1 to 23.10.0 in /py-polars (#11866)
dependabot[bot] Oct 19, 2023
570dca7
build(rust): update regex-syntax requirement from 0.7 to 0.8 (#11870)
dependabot[bot] Oct 19, 2023
6cf037b
build(rust): update zstd requirement from 0.12 to 0.13 (#11869)
dependabot[bot] Oct 19, 2023
eac03a2
build(rust): update pyo3-build-config requirement from 0.19 to 0.20 (…
dependabot[bot] Oct 19, 2023
c8cfdee
docs: fix incorrect example of valid time zones (#11873)
romanovacca Oct 20, 2023
e21c1a7
fix(rust): series.to_numpy fails with dtype=Null (#11858)
romanovacca Oct 20, 2023
c2562d8
build(rust): update simd-json requirement from 0.11 to 0.12 (#11871)
dependabot[bot] Oct 20, 2023
a7fdbee
docs: add section about plugins (#11855)
ritchie46 Oct 20, 2023
0b8be40
fix: fix project pushdown for double projection contains count (#11843)
reswqa Oct 20, 2023
d1af5f9
refactor(rust): rename new_from_owned_with_null_bitmap (#11828)
orlp Oct 20, 2023
7b9f10e
fix(python): Frame slicing single column (#11825)
rjthoen Oct 20, 2023
27a4fe2
fix: recursively check allowed streaming dtypes (#11879)
ritchie46 Oct 20, 2023
d31c30d
ci: Allow manual trigger for docs deployment (#11881)
stinodego Oct 20, 2023
106dce8
feat(rust, python): Introduce list.sample (#11845)
reswqa Oct 20, 2023
f509de9
chore: Fix Cargo warning for parquet2 dependency (#11882)
stinodego Oct 20, 2023
d9f2f5f
depr(python): Deprecate `DataType.is_nested` (#11844)
stinodego Oct 20, 2023
1a0c174
fix: recursively apply `cast_unchecked` in lists (#11884)
ritchie46 Oct 20, 2023
6dae550
docs: load 40x40 avatar from github and add loading=lazy attribute. (…
dannyvankooten Oct 20, 2023
cb92cb8
perf(python): optimise `read_database` Databricks queries made using …
alexander-beedie Oct 20, 2023
0d9f865
refactor(python): Further assert utils refactor (#11888)
stinodego Oct 20, 2023
d9c6316
fix(python): Add `include_nulls` parameter to `update` (#11830)
mcrumiller Oct 20, 2023
fada98b
fix: use physcial append (#11894)
ritchie46 Oct 20, 2023
5425f6a
perf: fix quadratic behavior in append sorted check (#11893)
orlp Oct 20, 2023
c69722d
perf: fix accidental quadratic behavior; cache null_count (#11889)
ritchie46 Oct 20, 2023
eb469b4
python polars 0.19.10 (#11895)
ritchie46 Oct 20, 2023
fe04f4a
fix(python): raise a suitable error from `read_excel` and/or `read_od…
alexander-beedie Oct 20, 2023
d847f69
fix: fixed the bug that incorrectly enabled the conversion from epoch…
brayanjuls Oct 21, 2023
c5459f1
fix: removed unused import
brayanjuls Oct 21, 2023
6a1731e
fix(python): address issue with inadvertently shared options dict in …
alexander-beedie Oct 21, 2023
a75eaca
docs(python): add missing 'diagonal_relaxed' to `pl.concat` "how" par…
alexander-beedie Oct 21, 2023
96b465e
fix(python): address DataFrame construction error with lists of `nump…
alexander-beedie Oct 21, 2023
6155e7f
feat(python): upcast int->float and date->datetime for certain Series…
mcrumiller Oct 21, 2023
ff358ca
fix: predicate push-down remove predicate refers to alias for more br…
reswqa Oct 21, 2023
6e8ce9c
fix(python): set null_count on categorical append (#11914)
ritchie46 Oct 21, 2023
3251703
refactor(rust): prepare for multiple files in a node (#11918)
ritchie46 Oct 21, 2023
8af94b0
docs: fix some typos and add polars-business to curated plugin list (…
ritchie46 Oct 21, 2023
5e96abd
feat: error instead of panic in unsupported sinks (#11915)
ritchie46 Oct 21, 2023
492a3c1
fix(python): Fix `Array` data type initialization (#11907)
stinodego Oct 21, 2023
04357ef
docs(python): Fix docstring for `diff` methods (#11921)
LaurynasMiksys Oct 21, 2023
3386abd
feat(rust): utf8 to temporal casting
brayanjuls Aug 15, 2023
bce21ec
feat(rust): utf8 to temporal casting
brayanjuls Aug 15, 2023
4b08b6b
feat: utf8 to timestamp/date casting support
brayanjuls Oct 10, 2023
013916a
feat: added missing tests for failure scenarios, also fixed casting f…
brayanjuls Oct 16, 2023
dab1451
fix: fixed issue regarding arrow libraries import and code formatting
brayanjuls Oct 16, 2023
374d946
fix: fixed validate_is_number import issue, also added missing datafr…
brayanjuls Oct 16, 2023
68f9d69
fix: fixed linter issues.
brayanjuls Oct 16, 2023
8e810ff
fix: removing additional asserts from unit test, also improved patter…
brayanjuls Oct 17, 2023
5225444
fix: fixed the bug that incorrectly enabled the conversion from epoch…
brayanjuls Oct 21, 2023
fa5a0e1
fix: removed unused import
brayanjuls Oct 21, 2023
30dcdc3
fix: fixed variable naming from tu to time_unit and from tz to time_zone
brayanjuls Oct 21, 2023
215990b
Merge remote-tracking branch 'origin/utf8-to-temporal-cast' into utf8…
brayanjuls Oct 21, 2023
20f94d9
Revert "Merge remote-tracking branch 'origin/utf8-to-temporal-cast' i…
brayanjuls Oct 24, 2023
12 changes: 6 additions & 6 deletions crates/polars-arrow/src/compute/cast/mod.rs
@@ -580,9 +580,9 @@ pub fn cast(
LargeUtf8 => Ok(Box::new(utf8_to_large_utf8(
array.as_any().downcast_ref().unwrap(),
))),
-Timestamp(TimeUnit::Nanosecond, None) => utf8_to_naive_timestamp_ns_dyn::<i32>(array),
brayanjuls marked this conversation as resolved.
-Timestamp(TimeUnit::Nanosecond, Some(tz)) => {
-utf8_to_timestamp_ns_dyn::<i32>(array, tz.clone())
+Timestamp(tu, None) => utf8_to_naive_timestamp_dyn::<i32>(array, tu.to_owned()),
+Timestamp(tu, Some(tz)) => {
+utf8_to_timestamp_dyn::<i32>(array, tz.clone(), tu.to_owned())
},
_ => polars_bail!(InvalidOperation:
"casting from {from_type:?} to {to_type:?} not supported",
@@ -607,9 +607,9 @@ pub fn cast(
to_type.clone(),
)
.boxed()),
-Timestamp(TimeUnit::Nanosecond, None) => utf8_to_naive_timestamp_ns_dyn::<i64>(array),
-Timestamp(TimeUnit::Nanosecond, Some(tz)) => {
-utf8_to_timestamp_ns_dyn::<i64>(array, tz.clone())
+Timestamp(tu, None) => utf8_to_naive_timestamp_dyn::<i64>(array, tu.to_owned()),
+Timestamp(tu, Some(tz)) => {
+utf8_to_timestamp_dyn::<i64>(array, tz.clone(), tu.to_owned())
},
_ => polars_bail!(InvalidOperation:
"casting from {from_type:?} to {to_type:?} not supported",
32 changes: 19 additions & 13 deletions crates/polars-arrow/src/compute/cast/utf8_to.rs
@@ -3,11 +3,11 @@ use polars_error::PolarsResult;

use super::CastOptions;
use crate::array::*;
-use crate::datatypes::DataType;
+use crate::datatypes::{DataType, TimeUnit};
use crate::offset::Offset;
use crate::temporal_conversions::{
-utf8_to_naive_timestamp_ns as utf8_to_naive_timestamp_ns_,
-utf8_to_timestamp_ns as utf8_to_timestamp_ns_, EPOCH_DAYS_FROM_CE,
+utf8_to_naive_timestamp as utf8_to_naive_timestamp_, utf8_to_timestamp as utf8_to_timestamp_,
+EPOCH_DAYS_FROM_CE,
};
use crate::types::NativeType;

@@ -110,34 +110,40 @@ pub fn utf8_to_dictionary<O: Offset, K: DictionaryKey>(
Ok(array.into())
}

-pub(super) fn utf8_to_naive_timestamp_ns_dyn<O: Offset>(
+pub(super) fn utf8_to_naive_timestamp_dyn<O: Offset>(
from: &dyn Array,
+tu: TimeUnit,
) -> PolarsResult<Box<dyn Array>> {
let from = from.as_any().downcast_ref().unwrap();
-Ok(Box::new(utf8_to_naive_timestamp_ns::<O>(from)))
+Ok(Box::new(utf8_to_naive_timestamp::<O>(from, tu)))
}

-/// [`crate::temporal_conversions::utf8_to_timestamp_ns`] applied for RFC3339 formatting
-pub fn utf8_to_naive_timestamp_ns<O: Offset>(from: &Utf8Array<O>) -> PrimitiveArray<i64> {
-utf8_to_naive_timestamp_ns_(from, RFC3339)
+/// [`crate::temporal_conversions::utf8_to_timestamp`] applied for RFC3339 formatting
+pub fn utf8_to_naive_timestamp<O: Offset>(
+from: &Utf8Array<O>,
+tu: TimeUnit,
+) -> PrimitiveArray<i64> {
+utf8_to_naive_timestamp_(from, RFC3339, tu)
}

-pub(super) fn utf8_to_timestamp_ns_dyn<O: Offset>(
+pub(super) fn utf8_to_timestamp_dyn<O: Offset>(
from: &dyn Array,
timezone: String,
+tu: TimeUnit,
) -> PolarsResult<Box<dyn Array>> {
let from = from.as_any().downcast_ref().unwrap();
-utf8_to_timestamp_ns::<O>(from, timezone)
+utf8_to_timestamp::<O>(from, timezone, tu)
.map(Box::new)
.map(|x| x as Box<dyn Array>)
}

-/// [`crate::temporal_conversions::utf8_to_timestamp_ns`] applied for RFC3339 formatting
-pub fn utf8_to_timestamp_ns<O: Offset>(
+/// [`crate::temporal_conversions::utf8_to_timestamp`] applied for RFC3339 formatting
+pub fn utf8_to_timestamp<O: Offset>(
from: &Utf8Array<O>,
timezone: String,
+tu: TimeUnit,
) -> PolarsResult<PrimitiveArray<i64>> {
-utf8_to_timestamp_ns_(from, RFC3339, timezone)
+utf8_to_timestamp_(from, RFC3339, timezone, tu)
}

/// Conversion of utf8
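In `mod.rs` and `utf8_to.rs` the unit is now passed through as `tu` instead of being fixed to `TimeUnit::Nanosecond`, so a parsed instant is stored at the requested scale. One practical consequence, independent of this PR's code: an `i64` count of nanoseconds only spans roughly the years 1677 to 2262, so a coarser unit can hold dates that a nanosecond timestamp cannot. A minimal std-only sketch; `TimeUnit` is a local stand-in that borrows arrow's variant names, and `seconds_to_unit` is a hypothetical helper, not a polars-arrow function.

```rust
/// Local stand-in for arrow's `TimeUnit` (same variant names, nothing else).
#[derive(Clone, Copy, Debug, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical helper: re-express whole seconds since the Unix epoch in `tu`.
/// `checked_mul` turns an out-of-range result into `None` instead of wrapping.
fn seconds_to_unit(secs: i64, tu: TimeUnit) -> Option<i64> {
    let per_second: i64 = match tu {
        TimeUnit::Second => 1,
        TimeUnit::Millisecond => 1_000,
        TimeUnit::Microsecond => 1_000_000,
        TimeUnit::Nanosecond => 1_000_000_000,
    };
    secs.checked_mul(per_second)
}

fn main() {
    // 2006-05-17T15:34:04 UTC, the instant used in `test_from_epoch_str`.
    let in_range: i64 = 1_147_880_044;
    // 2300-01-01T00:00:00 UTC, i.e. 120_530 days after the epoch.
    let far_future: i64 = 120_530 * 86_400;

    for tu in [
        TimeUnit::Second,
        TimeUnit::Millisecond,
        TimeUnit::Microsecond,
        TimeUnit::Nanosecond,
    ] {
        println!(
            "{:?}: {:?} / {:?}",
            tu,
            seconds_to_unit(in_range, tu),
            seconds_to_unit(far_future, tu)
        );
    }
}
```

Returning `None` on overflow matches the null-on-failure convention that the array-level parsers in this diff document for non-parsable elements.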
49 changes: 18 additions & 31 deletions crates/polars-arrow/src/temporal_conversions.rs
@@ -321,17 +321,6 @@ pub fn parse_offset(offset: &str) -> PolarsResult<FixedOffset> {
.expect("FixedOffset::east out of bounds"))
}

-/// Parses `value` to `Option<i64>` consistent with the Arrow's definition of timestamp with timezone.
-/// `tz` must be built from `timezone` (either via [`parse_offset`] or `chrono-tz`).
-#[inline]
-pub fn utf8_to_timestamp_ns_scalar<T: chrono::TimeZone>(
-value: &str,
-fmt: &str,
-tz: &T,
-) -> Option<i64> {
-utf8_to_timestamp_scalar(value, fmt, tz, &TimeUnit::Nanosecond)
-}
-
/// Parses `value` to `Option<i64>` consistent with the Arrow's definition of timestamp with timezone.
/// `tz` must be built from `timezone` (either via [`parse_offset`] or `chrono-tz`).
/// Returns in scale `tz` of `TimeUnit`.
@@ -362,12 +351,6 @@ pub fn utf8_to_timestamp_scalar<T: chrono::TimeZone>(
}
}

-/// Parses `value` to `Option<i64>` consistent with the Arrow's definition of timestamp without timezone.
-#[inline]
-pub fn utf8_to_naive_timestamp_ns_scalar(value: &str, fmt: &str) -> Option<i64> {
-utf8_to_naive_timestamp_scalar(value, fmt, &TimeUnit::Nanosecond)
-}
-
/// Parses `value` to `Option<i64>` consistent with the Arrow's definition of timestamp without timezone.
/// Returns in scale `tz` of `TimeUnit`.
#[inline]
@@ -386,18 +369,18 @@ pub fn utf8_to_naive_timestamp_scalar(value: &str, fmt: &str, tu: &TimeUnit) ->
.ok()
}

-fn utf8_to_timestamp_ns_impl<O: Offset, T: chrono::TimeZone>(
+fn utf8_to_timestamp_impl<O: Offset, T: chrono::TimeZone>(
array: &Utf8Array<O>,
fmt: &str,
timezone: String,
tz: T,
+tu: TimeUnit,
) -> PrimitiveArray<i64> {
let iter = array
.iter()
-.map(|x| x.and_then(|x| utf8_to_timestamp_ns_scalar(x, fmt, &tz)));
+.map(|x| x.and_then(|x| utf8_to_timestamp_scalar(x, fmt, &tz, &tu)));

-PrimitiveArray::from_trusted_len_iter(iter)
-.to(DataType::Timestamp(TimeUnit::Nanosecond, Some(timezone)))
+PrimitiveArray::from_trusted_len_iter(iter).to(DataType::Timestamp(tu, Some(timezone)))
}

/// Parses `value` to a [`chrono_tz::Tz`] with the Arrow's definition of timestamp with a timezone.
@@ -411,59 +394,63 @@ pub fn parse_offset_tz(timezone: &str) -> PolarsResult<chrono_tz::Tz> {

#[cfg(feature = "chrono-tz")]
#[cfg_attr(docsrs, doc(cfg(feature = "chrono-tz")))]
-fn chrono_tz_utf_to_timestamp_ns<O: Offset>(
+fn chrono_tz_utf_to_timestamp<O: Offset>(
array: &Utf8Array<O>,
fmt: &str,
timezone: String,
+tu: TimeUnit,
) -> PolarsResult<PrimitiveArray<i64>> {
let tz = parse_offset_tz(&timezone)?;
-Ok(utf8_to_timestamp_ns_impl(array, fmt, timezone, tz))
+Ok(utf8_to_timestamp_impl(array, fmt, timezone, tz, tu))
}

#[cfg(not(feature = "chrono-tz"))]
-fn chrono_tz_utf_to_timestamp_ns<O: Offset>(
+fn chrono_tz_utf_to_timestamp<O: Offset>(
_: &Utf8Array<O>,
_: &str,
timezone: String,
+_: TimeUnit,
) -> PolarsResult<PrimitiveArray<i64>> {
panic!("timezone \"{timezone}\" cannot be parsed (feature chrono-tz is not active)")
}

/// Parses a [`Utf8Array`] to a timeozone-aware timestamp, i.e. [`PrimitiveArray<i64>`] with type `Timestamp(Nanosecond, Some(timezone))`.
/// # Implementation
/// * parsed values with timezone other than `timezone` are converted to `timezone`.
-/// * parsed values without timezone are null. Use [`utf8_to_naive_timestamp_ns`] to parse naive timezones.
+/// * parsed values without timezone are null. Use [`utf8_to_naive_timestamp`] to parse naive timezones.
/// * Null elements remain null; non-parsable elements are null.
/// The feature `"chrono-tz"` enables IANA and zoneinfo formats for `timezone`.
/// # Error
/// This function errors iff `timezone` is not parsable to an offset.
-pub fn utf8_to_timestamp_ns<O: Offset>(
+pub fn utf8_to_timestamp<O: Offset>(
array: &Utf8Array<O>,
fmt: &str,
timezone: String,
+tu: TimeUnit,
) -> PolarsResult<PrimitiveArray<i64>> {
let tz = parse_offset(timezone.as_str());

if let Ok(tz) = tz {
-Ok(utf8_to_timestamp_ns_impl(array, fmt, timezone, tz))
+Ok(utf8_to_timestamp_impl(array, fmt, timezone, tz, tu))
} else {
-chrono_tz_utf_to_timestamp_ns(array, fmt, timezone)
+chrono_tz_utf_to_timestamp(array, fmt, timezone, tu)
}
}

/// Parses a [`Utf8Array`] to naive timestamp, i.e.
/// [`PrimitiveArray<i64>`] with type `Timestamp(Nanosecond, None)`.
/// Timezones are ignored.
/// Null elements remain null; non-parsable elements are set to null.
-pub fn utf8_to_naive_timestamp_ns<O: Offset>(
+pub fn utf8_to_naive_timestamp<O: Offset>(
array: &Utf8Array<O>,
fmt: &str,
+tu: TimeUnit,
) -> PrimitiveArray<i64> {
let iter = array
.iter()
-.map(|x| x.and_then(|x| utf8_to_naive_timestamp_ns_scalar(x, fmt)));
+.map(|x| x.and_then(|x| utf8_to_naive_timestamp_scalar(x, fmt, &tu)));

-PrimitiveArray::from_trusted_len_iter(iter).to(DataType::Timestamp(TimeUnit::Nanosecond, None))
+PrimitiveArray::from_trusted_len_iter(iter).to(DataType::Timestamp(tu, None))
}

fn add_month(year: i32, month: u32, months: i32) -> chrono::NaiveDate {
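`utf8_to_naive_timestamp` keeps its element-wise shape: `x.and_then(...)` leaves nulls as null and turns unparsable strings into nulls, and only the output scale now follows `tu`. Below is a std-only sketch of that pattern. It assumes a fixed `YYYY-MM-DDTHH:MM:SS` layout in place of the `fmt`-driven parsing done by `utf8_to_naive_timestamp_scalar`, and every function in it is hypothetical; date arithmetic uses Howard Hinnant's `days_from_civil` algorithm.

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

fn is_leap(y: i64) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

fn days_in_month(y: i64, m: i64) -> i64 {
    match m {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap(y) => 29,
        2 => 28,
        _ => 0,
    }
}

/// Days since 1970-01-01 for a proleptic Gregorian date (`days_from_civil`).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// All-ASCII-digit field; rejects signs, spaces and empty strings.
fn digits(s: &str) -> Option<i64> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Parse "YYYY-MM-DDTHH:MM:SS" (no offset, no fraction) into seconds since the epoch.
fn parse_naive_seconds(s: &str) -> Option<i64> {
    let b = s.as_bytes();
    let layout_ok = b.len() == 19
        && b[4] == b'-'
        && b[7] == b'-'
        && b[10] == b'T'
        && b[13] == b':'
        && b[16] == b':';
    if !layout_ok {
        return None;
    }
    let (y, mo, d) = (digits(s.get(0..4)?)?, digits(s.get(5..7)?)?, digits(s.get(8..10)?)?);
    let (h, mi, se) = (digits(s.get(11..13)?)?, digits(s.get(14..16)?)?, digits(s.get(17..19)?)?);
    if !(1..=12).contains(&mo) || d < 1 || d > days_in_month(y, mo) || h > 23 || mi > 59 || se > 59 {
        return None;
    }
    Some(days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + se)
}

/// Scale whole seconds to `tu`; `None` on overflow.
fn seconds_to_unit(secs: i64, tu: TimeUnit) -> Option<i64> {
    let per_second: i64 = match tu {
        TimeUnit::Second => 1,
        TimeUnit::Millisecond => 1_000,
        TimeUnit::Microsecond => 1_000_000,
        TimeUnit::Nanosecond => 1_000_000_000,
    };
    secs.checked_mul(per_second)
}

/// Element-wise, like `.map(|x| x.and_then(...))` in the diff: nulls stay null,
/// unparsable or out-of-range strings become null.
fn naive_timestamps(values: &[Option<&str>], tu: TimeUnit) -> Vec<Option<i64>> {
    values
        .iter()
        .map(|&x| x.and_then(parse_naive_seconds).and_then(|secs| seconds_to_unit(secs, tu)))
        .collect()
}

fn main() {
    let column = [
        Some("2006-05-17T15:34:04"),
        None,                        // null input stays null
        Some("1234"),                // not a timestamp: null
        Some("2023-02-29T00:00:00"), // no such date: null
    ];
    // Prints [Some(1147880044000), None, None, None]
    println!("{:?}", naive_timestamps(&column, TimeUnit::Millisecond));
}
```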
28 changes: 28 additions & 0 deletions crates/polars-core/src/chunked_array/cast.rs
@@ -7,6 +7,7 @@ use arrow::compute::cast::CastOptions;
use crate::chunked_array::categorical::CategoricalChunkedBuilder;
#[cfg(feature = "timezones")]
use crate::chunked_array::temporal::validate_time_zone;
+use crate::prelude::DataType::Datetime;
use crate::prelude::*;

pub(crate) fn cast_chunks(
@@ -195,6 +196,33 @@ impl ChunkCast for Utf8Chunked {
polars_bail!(ComputeError: "expected 'precision' or 'scale' when casting to Decimal")
},
},
+#[cfg(feature = "dtype-date")]
+DataType::Date => {
+let result = cast_chunks(&self.chunks, data_type, true)?;
+let out = Series::try_from((self.name(), result))?;
+Ok(out)
+},
+#[cfg(feature = "dtype-datetime")]
+DataType::Datetime(tu, tz) => {
+let out = match tz {
+#[cfg(feature = "timezones")]
+Some(tz) => {
+validate_time_zone(tz)?;
+let result = cast_chunks(
+&self.chunks,
+&Datetime(tu.to_owned(), Some(tz.clone())),
+true,
+)?;
+Series::try_from((self.name(), result))
+},
+_ => {
+let result =
+cast_chunks(&self.chunks, &Datetime(tu.to_owned(), None), true)?;
+Series::try_from((self.name(), result))
+},
+};
+out
+},
_ => cast_impl(self.name(), &self.chunks, data_type),
}
}
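In `ChunkCast for Utf8Chunked`, `Date` goes straight to `cast_chunks`, while `Datetime(tu, tz)` forwards `tu` unchanged and, when the `timezones` feature is enabled, validates `tz` first. As written, with `timezones` disabled the `_` arm also matches `Some(tz)` and casts to a naive `Datetime(tu, None)`. The sketch below mirrors only that dispatch shape with hypothetical stand-in types; `check_time_zone` is a toy validator (accepting `UTC` and `+HH:MM` / `-HH:MM` offsets), not polars' `validate_time_zone`, and the `timezones` flag is a runtime parameter standing in for the cargo feature.

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum TimeUnit {
    Nanoseconds,
    Microseconds,
    Milliseconds,
}

#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Date,
    Datetime(TimeUnit, Option<String>),
}

/// Toy validator (hypothetical): accepts "UTC" or a fixed offset such as "+01:00".
fn check_time_zone(tz: &str) -> Result<(), String> {
    let b = tz.as_bytes();
    let is_offset = b.len() == 6
        && (b[0] == b'+' || b[0] == b'-')
        && b[3] == b':'
        && [1, 2, 4, 5].iter().all(|&i| b[i].is_ascii_digit());
    if tz == "UTC" || is_offset {
        Ok(())
    } else {
        Err(format!("invalid time zone: {}", tz))
    }
}

/// Target dtype of a Utf8 cast, following the arms in the diff.
fn utf8_cast_target(to: &DataType, timezones: bool) -> Result<DataType, String> {
    match to {
        DataType::Date => Ok(DataType::Date),
        // Only reachable when the (simulated) `timezones` feature is on.
        DataType::Datetime(tu, Some(tz)) if timezones => {
            check_time_zone(tz)?;
            Ok(DataType::Datetime(*tu, Some(tz.clone())))
        }
        // Naive target, or `Some(tz)` with the feature off: the zone is not kept.
        DataType::Datetime(tu, _) => Ok(DataType::Datetime(*tu, None)),
    }
}

fn main() {
    let aware = DataType::Datetime(TimeUnit::Microseconds, Some("+01:00".to_string()));
    println!("{:?}", utf8_cast_target(&aware, true));
    println!("{:?}", utf8_cast_target(&aware, false));
    let bogus = DataType::Datetime(TimeUnit::Nanoseconds, Some("Mars/Olympus".to_string()));
    println!("{:?}", utf8_cast_target(&bogus, true));
    println!("{:?}", utf8_cast_target(&DataType::Date, true));
}
```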
19 changes: 18 additions & 1 deletion py-polars/tests/unit/test_lazy.py
@@ -1375,7 +1375,7 @@ def test_quadratic_behavior_4736() -> None:
ldf.select(reduce(add, (pl.col(fld) for fld in ldf.columns)))


-@pytest.mark.parametrize("input_dtype", [pl.Utf8, pl.Int64, pl.Float64])
Review thread on this line:

Member: Why was `pl.Utf8` removed here?

Contributor Author: This `pl.Utf8` case tests that the epoch-string behavior works, but I think it should not be supported, since such a string does not follow a date/datetime format, e.g. `pl.DataFrame({"x1": ["1234"]}).with_columns(**{"x1-date": pl.col("x1").cast(pl.Date)})`.

Should we keep support for epoch strings?

Contributor Author: I added it back.

Collaborator: Are we sure this is meant to work?

To be honest, it feels a bit strange that `pl.Series([<string>]).cast(pl.Date)` could be interpreted either as a formatted date string or as the number of days since 1970-01-01.

I find it hard to believe that anyone would store a timestamp as a string. If they're trying to cast a string to Date, it's almost certainly a date string (and if not, I think it's OK to let the user cast to Int32 first):

# this looks odd to me
In [9]: print(pl.Series(['1234']).cast(pl.Date))
shape: (1,)
Series: '' [date]
[
        1973-05-19
]

My suggestion would be to remove Utf8 from here, treat this as a bug fix, and not double-traverse the array with `is_parsable_as_number`. @stinodego, thoughts?

Contributor Author: I completely agree; that's why I removed it in the first place, but I got no feedback on my answer, so I put it back. There is also an open issue reporting this behavior as a bug: #10478.

Collaborator: Nice, thanks. OK, maybe let's add that as a test case, and add "closes #10478" to the description?

And could we wait for Stijn's response before marking this as resolved, please? Thanks 🙏

Finally, sorry for having been slow on this PR; I've been on holiday recently.

Member (@stinodego, Oct 20, 2023): Apologies, I haven't been able to get back to this one. I agree with what you're saying: casting "1234" to a Date should fail.

Contributor Author: No worries, I understand there is a lot of work to be done here, and life happens. I will make the changes. Thanks.
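The behavior the reviewers converge on can be stated as a parsing rule: a Utf8 to Date cast should accept only date-formatted strings, so "1234" fails instead of being read as 1,234 days after 1970-01-01, which is where the 1973-05-19 in the output above comes from. A std-only sketch contrasting the two readings; `parse_date_strict` and `lenient_date` are hypothetical helpers, the fixed `YYYY-MM-DD` layout is an assumption, and `lenient_date` only reproduces the ambiguity, not the previous polars implementation.

```rust
fn is_leap(y: i64) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

fn days_in_month(y: i64, m: i64) -> i64 {
    match m {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap(y) => 29,
        2 => 28,
        _ => 0,
    }
}

/// Days since 1970-01-01 (Howard Hinnant's `days_from_civil`).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// All-ASCII-digit field; rejects signs, spaces and empty strings.
fn digits(s: &str) -> Option<i64> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Strict reading: only "YYYY-MM-DD" is a date. Returns days since the epoch,
/// the physical representation of a Date column.
fn parse_date_strict(s: &str) -> Option<i32> {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    let y = digits(s.get(0..4)?)?;
    let m = digits(s.get(5..7)?)?;
    let d = digits(s.get(8..10)?)?;
    if !(1..=12).contains(&m) || d < 1 || d > days_in_month(y, m) {
        return None;
    }
    i32::try_from(days_from_civil(y, m, d)).ok()
}

/// Ambiguous reading (illustration only): a bare integer is also accepted
/// and taken as a day count since 1970-01-01.
fn lenient_date(s: &str) -> Option<i32> {
    parse_date_strict(s).or_else(|| s.parse::<i32>().ok())
}

fn main() {
    for s in ["1973-05-19", "1234"] {
        println!(
            "{}: strict = {:?}, lenient = {:?}",
            s,
            parse_date_strict(s),
            lenient_date(s)
        );
    }
}
```

Under the lenient reading two different strings land on the same date; under the strict one, "1234" has to be cast to `Int32` first, which is the workaround the reviewer suggests.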

+@pytest.mark.parametrize("input_dtype", [pl.Int64, pl.Float64])
def test_from_epoch(input_dtype: pl.PolarsDataType) -> None:
ldf = pl.LazyFrame(
[
@@ -1415,6 +1415,23 @@ def test_from_epoch(input_dtype: pl.PolarsDataType) -> None:
_ = ldf.select(pl.from_epoch(ts_col, time_unit="s2")) # type: ignore[call-overload]


+def test_from_epoch_str() -> None:
Review comment on this line:

Collaborator: Nice, good one for splitting this out into a separate test!

+    ldf = pl.LazyFrame(
+        [
+            pl.Series("timestamp_ms", [1147880044 * 1_000]).cast(pl.Utf8),
+            pl.Series("timestamp_us", [1147880044 * 1_000_000]).cast(pl.Utf8),
+        ]
+    )
+
+    with pytest.raises(ComputeError):
+        ldf.select(
+            [
+                pl.from_epoch(pl.col("timestamp_ms"), time_unit="ms"),
+                pl.from_epoch(pl.col("timestamp_us"), time_unit="us"),
+            ]
+        ).collect()
+
+
def test_cumagg_types() -> None:
ldf = pl.LazyFrame({"a": [1, 2], "b": [True, False], "c": [1.3, 2.4]})
cumsum_lf = ldf.select(