[ENH] pivot_longer now supports named groups where names_pattern is a regular expression. A dictionary can now be passed to names_pattern, and is internally evaluated as a list/tuple of regular expressions. Issue #1209 @samukweku
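A minimal sketch of the dictionary form described above; the frame, column names, and regexes are illustrative, not taken from the issue. The dictionary keys become the new column names, and each value is treated as a regular expression:

```python
import pandas as pd
import janitor  # noqa: F401 -- registers the pivot_longer accessor

df = pd.DataFrame({"id": [1], "x_start": [5], "x_end": [7]})
# Keys name the new columns; values are regexes matched against column names.
out = df.pivot_longer(index="id", names_pattern={"start": "start$", "end": "end$"})
```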
[ENH] Improve selection in conditional_join. Issue #1223 @samukweku
[ENH] Add col class for selecting columns within an expression. Currently limited to use within conditional_join. PR #1260 @samukweku
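A minimal sketch of col-based selection inside conditional_join; that col is importable from the top-level janitor namespace is an assumption, and the frames are illustrative:

```python
import pandas as pd
import janitor  # noqa: F401
from janitor import col  # assumption: re-exported at the top level

left = pd.DataFrame({"value": [2, 5, 7]})
right = pd.DataFrame({"start": [0, 3], "end": [4, 6]})
# Keep rows of `left` whose value falls inside a [start, end] interval of `right`.
out = left.conditional_join(
    right,
    col("value") >= col("start"),
    col("value") <= col("end"),
)
```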
[ENH] Performance improvement for range joins in conditional_join when use_numba = False. Performance improvement for an equi-join combined with a range join when use_numba = True, for many-to-many joins with wide ranges. PR #1256, #1267 @samukweku
[DEPR] Add deprecation warning for pivot_wider. Issue #1045 @samukweku
[BUG] Fix string column selection on a MultiIndex. Issue #1265 @samukweku
[ENH] Add lazy imports to speed up the time taken to load pyjanitor (part 2)
[DOC] Updated developer guide docs.
[ENH] Allow column selection/renaming within conditional_join. Issue #1102. Also allow first or last match. Issue #1020 @samukweku
[ENH] New decorator deprecated_kwargs for breaking API changes. #1103 @Zeroto521
[ENH] Extend select_columns to support non-string columns. Issue #1105 @samukweku
[ENH] Performance improvement for groupby_topk. Issue #1093 @samukweku
[ENH] min_max_scale drops old_min and old_max to fit sklearn's method API. Issue #1068 @Zeroto521
[ENH] Add jointly option to min_max_scale, to transform each column separately or all values jointly. The default transforms each column, matching the behavior of sklearn.preprocessing.MinMaxScaler. (Issue #1067, PR #1112, PR #1123) @Zeroto521
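A minimal sketch of the jointly option from the entry above; the frame is illustrative and the sklearn-style signature is an assumption based on these entries:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"a": [0.0, 1.0], "b": [0.0, 2.0]})
per_column = df.min_max_scale()         # scales each column to [0, 1] separately
joint = df.min_max_scale(jointly=True)  # one min/max computed across all values
```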
[INF] Require pyspark minimum version v3.2.0 to cut duplicate code. Issue #1110 @Zeroto521
[ENH] Add support for extension arrays in expand_grid. Issue #1121 @samukweku
[ENH] Add names_expand and index_expand parameters to pivot_wider for exposing missing categoricals. Issue #1108 @samukweku
[ENH] Add fix for slicing error when selecting columns in pivot_wider. Issue #1134 @samukweku
[ENH] dropna parameter added to pivot_longer. Issue #1132 @samukweku
[INF] Update mkdocstrings version and adapt to its upcoming features. PR #1138 @Zeroto521
[BUG] Force math.softmax to return a Series. PR #1139 @Zeroto521
[INF] Set up an independent environment for building documentation. PR #1141 @Zeroto521
[DOC] Add local documentation preview via GitHub Actions artifact. PR #1149 @Zeroto521
[TST] Fix test cases failing on Windows. Issue #1160 @Zeroto521 and @samukweku
[INF] Cancel old workflow runs via GitHub Actions concurrency. PR #1161 @Zeroto521
[ENH] Faster computation for non-equi joins, with a numba engine. Speed improvement for left/right joins when sort_by_appearance is False. Issue #1102 @samukweku
[BUG] Avoid change_type mutating the original DataFrame. PR #1162 @Zeroto521
[ENH] The column_name parameter of change_type now fully supports multiple columns. #1163 @Zeroto521
[ENH] Fix error when sort_by_appearance=True is combined with dropna=True. Issue #1168 @samukweku
[ENH] select_rows function added for flexible row selection. Generic select function added as well. Add support for MultiIndex selection via dictionary. Issue #1124 @samukweku
[TST] Compatibility with macOS and Windows, to fix FailedHealthCheck. Issue #1181 @Zeroto521
[INF] Merge two docs CI workflows (docs-preview.yml and docs.yml) into one, and add a documentation pytest mark. PR #1183 @Zeroto521
[INF] Merge codecov.yml (which only ran on pushes to the dev branch) into tests.yml (which only ran on PR events). PR #1185 @Zeroto521
[TST] Fix failure for test/timeseries/test_fill_missing_timestamp. Issue #1184 @samukweku
[BUG] Import DataDescription to fix: AttributeError: 'DataFrame' object has no attribute 'data_description'. PR #1191 @Zeroto521
[DOC] Updated fill.py and update_where.py documentation with working examples.
[ENH] Deprecate num_bins from bin_numeric in favour of bins, and allow generic **kwargs to be passed into pd.cut. Issue #969 @thatlittleboy
[ENH] Fix concatenate_columns not working on category inputs @zbarry
[INF] Simplify CI system @ericmjl
[ENH] Added read_commandline function to janitor.io @BaritoneBeard
[BUG] Fix bug with the complement parameter of filter_on. Issue #988 @thatlittleboy
[ENH] Add xlsx_table, for reading tables from an Excel sheet. @samukweku
[ENH] Minor improvements to conditional_join; equality-only joins are no longer supported; there has to be at least one non-equi join present. @samukweku
[BUG] sort_column_value_order no longer mutates the original dataframe.
[BUG] Extend fill_empty's column_names type range. Issue #998 @Zeroto521
[BUG] Removed/updated error-inducing default arguments in row_to_names (#1004) and round_to_fraction (#1005). @thatlittleboy
[ENH] patterns deprecated in favour of re.compile. #1007 @samukweku
[ENH] Changes to kwargs in encode_categorical, where the values can either be a string or a 1D array. #1021 @samukweku
[ENH] Add fill_value and explicit parameters to the complete function. #1019 @samukweku
[ENH] Performance improvement for expand_grid. @samukweku
[BUG] Make factorize_columns (PR #1028) and truncate_datetime_dataframe (PR #1040) functions non-mutating. @thatlittleboy
[BUG] Fix SettingWithCopyWarning and other minor bugs when using truncate_datetime_dataframe, along with further performance improvements (PR #1040). @thatlittleboy
[ENH] Performance improvement for conditional_join. @samukweku
[ENH] Multiple .value is now supported in pivot_longer. Multiple values_to is also supported, when names_pattern is a list or tuple. names_transform parameter added, for efficient dtype transformation of unpivoted columns. #1034, #1048, #1051 @samukweku
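A minimal sketch of the `.value` mechanism that the entry above extends to multiple `.value` entries and multiple values_to; the frame is illustrative:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"id": [1], "x_start": [1], "x_end": [4]})
# The `.value` part of each column name ("x") stays a column of values,
# while the remainder ("start"/"end") is collected into "status".
out = df.pivot_longer(index="id", names_to=(".value", "status"), names_sep="_")
```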
[ENH] Add xlsx_cells for reading a spreadsheet as a table of individual cells. #929 @samukweku
[ENH] Align filter_string parameters with those of Series.str.contains. Issues #1003 and #1047 @Zeroto521
[ENH] names_glue in pivot_wider now takes a string form, using str.format_map under the hood. levels_order is also deprecated. @samukweku
[BUG] Fixed bug in transform_columns which ignored the column_names specification when the new_column_names dictionary was provided as an argument. Issue #1063 @thatlittleboy
[BUG] count_cumulative_unique no longer modifies the column being counted in the output when the case_sensitive argument is set to False. Issue #1065 @thatlittleboy
[BUG] Fix for missing gcc error in dev container
[DOC] Added a step in the dev guide to install Remote Containers in VS Code. @ashenafiyb
[DOC] Convert expand_column and find_replace code examples to doctests, issue #972. @gahjelle
[DOC] Convert expand_column code examples to doctests, issue #972. @gahjelle
[DOC] Convert get_dupes code examples to doctests, issue #972. @ethompsy
[DOC] Convert engineering code examples to doctests, issue #972. @ashenafiyb
[DOC] Convert groupby_topk code examples to doctests, issue #972. @ethompsy
[DOC] Add doctests to math, issue #972. @gahjelle
[DOC] Add doctests to math and ml, issue #972. @gahjelle
[DOC] Add doctests to math, ml, and xarray, issue #972. @gahjelle
[ENH] pivot_longer can handle multiple values in paired columns, and can reshape using a list/tuple of regular expressions in names_pattern. @samukweku
[ENH] Replaced default numeric conversion of dataframe with a dtypes parameter, allowing the user to control the data types. @samukweku
[INF] Loosen dependency specifications. Switch to pip-tools for managing dependencies. Issue #760. @MinchinWeb
[ENH] Add pivot_wider function, which is the inverse of the pivot_longer function. @samukweku
[INF] Add openpyxl to environment-dev.yml. @samukweku
[ENH] Reduce code by reusing existing functions for fill_direction. @samukweku
[ENH] Improvements to the pivot_longer function, with improved speed and cleaner code. dtypes parameter dropped; users can change dtypes with pandas' astype method, or pyjanitor's change_type method. @samukweku
[ENH] Add kwargs to the encode_categorical function, to create ordered categorical columns, or categorical columns with explicit categories. @samukweku
[ENH] Improvements to the complete method. Use pd.merge to handle duplicates and null values. @samukweku
[ENH] Add new_column_names parameter to process_text, allowing a user to create a new column name after processing a text column. Also added a merge_frame parameter, allowing dataframe merging, if the result of the text processing is a dataframe. @samukweku
[ENH] Add aggfunc parameter to pivot_wider. @samukweku
[ENH] Modified the check function in utils to verify if a value is a callable. @samukweku
[ENH] Add a base _select_column function, using functools.singledispatch, to allow for flexible column selection. @samukweku
[ENH] pivot_longer and pivot_wider now support janitor.select_columns syntax, allowing for more flexible and dynamic column selection. @samukweku
[ENH] Added function sort_timestamps_monotonically to timeseries functions @UGuntupalli
[ENH] Added the complete function for converting implicit missing values to explicit ones. @samukweku
[ENH] Further simplification of expand_grid. @samukweku
[BUGFIX] Added a copy() call on the original dataframe, to avoid mutation. Issue #729. @samukweku
[ENH] Added the also method for running functions in a chain with no return values.
[DOC] Added a timeseries module section to website docs. Issue #742. @loganthomas
[ENH] Added a pivot_longer function, a wrapper around pd.melt and similar to tidyr's pivot_longer function. Also added an example notebook. @samukweku
[ENH] Raise an error if fill_value is not a dictionary. @samukweku
[INF] Welcome bot (.github/config.yml) for new users added. Issue #739. @samukweku
[ENH] Upgraded the update_where function to use either the pandas query style, or boolean indexing via the loc method. Also updated the find_replace function to use the loc method directly, instead of routing it through the update_where function. @samukweku
[INF] Update pandas minimum version to 1.0.0. @hectormz
[DOC] Updated the general functions API page to show all available functions. @samukweku
[DOC] Added the few missing type annotations of functions. @VPerrollaz
[DOC] Changed the signature from str to Optional[str] when initialized with None. @VPerrollaz
[DOC] Add the Optional type for all signatures of the API. @VPerrollaz
[TST] Updated test_expand_grid to account for int dtype differences on Windows. @samukweku
[TST] Make importing pandas testing functions follow a uniform pattern. @hectormz
[ENH] Added process_text wrapper function for all pandas string methods. @samukweku
[TST] Only skip tests for non-installed libraries on local machines. @hectormz
[DOC] Fix minor issues in documentation. @hectormz
[ENH] Added fill_direction function for forward/backward fills on missing values for selected columns in a dataframe. @samukweku
[ENH] Simpler logic and fewer lines of code for the expand_grid function @samukweku
[INF] Add debug-statements, requirements-txt-fixer, and interrogate to pre-commit. @hectormz
[ENH] Upgraded transform_column to use df.assign under the hood, and also added an option to transform the column elementwise (via apply) or columnwise (thus operating on a series). @ericmjl
Biology and bioinformatics-oriented data cleaning functions.

join_fasta(df, filename, id_col, column_name)
@pf.register_dataframe_method
@deprecated_alias(col_name="column_name")
def join_fasta(
    df: pd.DataFrame, filename: str, id_col: str, column_name: str
) -> pd.DataFrame:
    """Convenience method to join in a FASTA file as a column.

    This allows us to add the string sequence of a FASTA file as a new column
    of data in the dataframe.

    This method only attaches the string representation of the SeqRecord.Seq
    object from Biopython. Does not attach the full SeqRecord. Alphabet is
    also not stored, under the assumption that the data scientist has domain
    knowledge of what kind of sequence is being read in (nucleotide vs. amino
    acid.)

    This method mutates the original DataFrame.

    For more advanced functions, please use phylopandas.

    Examples:
        >>> import tempfile
        >>> import pandas as pd
        >>> import janitor.biology
        >>> tf = tempfile.NamedTemporaryFile()
        >>> tf.write('''>SEQUENCE_1
        ... MTEITAAMVKELRESTGAGMMDCK
        ... >SEQUENCE_2
        ... SATVSEINSETDFVAKN'''.encode('utf8'))
        66
        >>> tf.seek(0)
        0
        >>> df = pd.DataFrame({"sequence_accession":
        ...     ["SEQUENCE_1", "SEQUENCE_2", ]})
        >>> df = df.join_fasta(  # doctest: +SKIP
        ...     filename=tf.name,
        ...     id_col='sequence_accession',
        ...     column_name='sequence',
        ... )
        >>> df.sequence  # doctest: +SKIP
        0    MTEITAAMVKELRESTGAGMMDCK
        1           SATVSEINSETDFVAKN
        Name: sequence, dtype: object

    Args:
        df: A pandas DataFrame.
        filename: Path to the FASTA file.
        id_col: The column in the DataFrame that houses sequence IDs.
        column_name: The name of the new column.

    Returns:
        A pandas DataFrame with new FASTA string sequence column.
    """
    seqrecords = {
        x.id: x.seq.__str__() for x in SeqIO.parse(filename, "fasta")
    }
    seq_col = [seqrecords[i] for i in df[id_col]]
    df[column_name] = seq_col
    return df

maccs_keys_fingerprint(df, mols_column_name)
@pf.register_dataframe_method
@deprecated_alias(mols_col="mols_column_name")
def maccs_keys_fingerprint(
    df: pd.DataFrame, mols_column_name: Hashable
) -> pd.DataFrame:
    """Convert a column of RDKIT mol objects into MACCS Keys Fingerprints.

    Returns a new dataframe without any of the original data.
    This is intentional to leave the user with the data requested.

    This method does not mutate the original DataFrame.

    Examples:
        Functional usage

        >>> import pandas as pd
        >>> import janitor.chemistry
        >>> df = pd.DataFrame({"smiles": ["O=C=O", "CCC(=O)O"]})
        >>> maccs = janitor.chemistry.maccs_keys_fingerprint(
        ...     df=df.smiles2mol('smiles', 'mols'),
        ...     mols_column_name='mols'
        ... )
        >>> len(maccs.columns)
        167

        Method chaining usage

        >>> import pandas as pd
        >>> import janitor.chemistry
        >>> df = pd.DataFrame({"smiles": ["O=C=O", "CCC(=O)O"]})
        >>> maccs = (
        ...     df.smiles2mol('smiles', 'mols')
        ...     .maccs_keys_fingerprint(mols_column_name='mols')
        ... )
        >>> len(maccs.columns)
        167

        If you wish to join the maccs keys fingerprints back into the
        original dataframe, this can be accomplished by doing a `join`,
        because the indices are preserved:

        >>> joined = df.join(maccs)
        >>> len(joined.columns)
        169

    Args:
        df: A pandas DataFrame.
        mols_column_name: The name of the column that has the RDKIT mol
            objects.

    Returns:
        A new pandas DataFrame of MACCS keys fingerprints.
    """

    maccs = [GetMACCSKeysFingerprint(m) for m in df[mols_column_name]]

    np_maccs = []

    for macc in maccs:
        arr = np.zeros((1,))
        DataStructs.ConvertToNumpyArray(macc, arr)
        np_maccs.append(arr)
    np_maccs = np.vstack(np_maccs)
    fmaccs = pd.DataFrame(np_maccs)
    fmaccs.index = df.index
    return fmaccs

molecular_descriptors(df, mols_column_name)
Convert a column of RDKIT mol objects into a Pandas DataFrame of molecular descriptors.

Returns a new dataframe without any of the original data. This is intentional to leave the user only with the data requested.

This method does not mutate the original DataFrame.

The molecular descriptors are from rdkit.Chem.rdMolDescriptors. If you wish to join the molecular descriptors back into the original dataframe, this can be accomplished by doing a join, because the indices are preserved.

If you wish to join the morgan fingerprints back into the original dataframe, this can likewise be accomplished by doing a join, because the indices are preserved.
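A minimal sketch based on the signature above; this requires RDKit, and the exact descriptor columns depend on the installed version:

```python
import pandas as pd
import janitor.chemistry  # noqa: F401

df = pd.DataFrame({"smiles": ["O=C=O", "CCC(=O)O"]})
descriptors = df.smiles2mol("smiles", "mols").molecular_descriptors(
    mols_column_name="mols"
)
joined = df.join(descriptors)  # indices are preserved, so a plain join works
```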
@pf.register_dataframe_method
@deprecated_alias(smiles_col="smiles_column_name", mols_col="mols_column_name")
def smiles2mol(
    df: pd.DataFrame,
    smiles_column_name: Hashable,
    mols_column_name: Hashable,
    drop_nulls: bool = True,
    progressbar: Optional[str] = None,
) -> pd.DataFrame:
    """Convert a column of SMILES strings into RDKit Mol objects.

    Automatically drops invalid SMILES, as determined by RDKIT.

    This method mutates the original DataFrame.

    Examples:
        Functional usage

        >>> import pandas as pd
        >>> import janitor.chemistry
        >>> df = pd.DataFrame({"smiles": ["O=C=O", "CCC(=O)O"]})
        >>> df = janitor.chemistry.smiles2mol(
        ...     df=df,
        ...     smiles_column_name='smiles',
        ...     mols_column_name='mols'
        ... )
        >>> df.mols[0].GetNumAtoms(), df.mols[0].GetNumBonds()
        (3, 2)
        >>> df.mols[1].GetNumAtoms(), df.mols[1].GetNumBonds()
        (5, 4)

        Method chaining usage

        >>> import pandas as pd
        >>> import janitor.chemistry
        >>> df = df.smiles2mol(
        ...     smiles_column_name='smiles',
        ...     mols_column_name='rdkmol'
        ... )
        >>> df.rdkmol[0].GetNumAtoms(), df.rdkmol[0].GetNumBonds()
        (3, 2)

    A progressbar can be optionally used.

    - Pass in "notebook" to show a `tqdm` notebook progressbar.
      (`ipywidgets` must be enabled with your Jupyter installation.)
    - Pass in "terminal" to show a `tqdm` progressbar. Better suited for use
      with scripts.
    - `None` is the default value - progress bar will not be shown.

    Args:
        df: pandas DataFrame.
        smiles_column_name: Name of column that holds the SMILES strings.
        mols_column_name: Name to be given to the new mols column.
        drop_nulls: Whether to drop rows whose mols failed to be
            constructed.
        progressbar: Whether to show a progressbar or not.

    Raises:
        ValueError: If `progressbar` is not one of
            `"notebook"`, `"terminal"`, or `None`.

    Returns:
        A pandas DataFrame with new RDKIT Mol objects column.
    """
    valid_progress = ["notebook", "terminal", None]
    if progressbar not in valid_progress:
        raise ValueError(f"progressbar kwarg must be one of {valid_progress}")

    if progressbar is None:
        df[mols_column_name] = df[smiles_column_name].apply(
            lambda x: Chem.MolFromSmiles(x)
        )
    else:
        if progressbar == "notebook":
            tqdmn().pandas(desc="mols")
        elif progressbar == "terminal":
            tqdm.pandas(desc="mols")
        df[mols_column_name] = df[smiles_column_name].progress_apply(
            lambda x: Chem.MolFromSmiles(x)
        )

    if drop_nulls:
        df = df.dropna(subset=[mols_column_name])
    df = df.reset_index(drop=True)
    return df

convert_units(df, column_name, existing_units, to_units, dest_column_name)
@pf.register_dataframe_method
def convert_units(
    df: pd.DataFrame,
    column_name: str = None,
    existing_units: str = None,
    to_units: str = None,
    dest_column_name: str = None,
) -> pd.DataFrame:
    """Converts a column of numeric values from one unit to another.

    Unit conversion can only take place if the `existing_units` and
    `to_units` are of the same type (e.g., temperature or pressure).
    The provided unit types can be any unit name or alternate name provided
    in the `unyt` package's [Listing of Units table](
    https://unyt.readthedocs.io/en/stable/unit_listing.html#unit-listing).

    Volume units are not provided natively in `unyt`. However, exponents are
    supported, and therefore some volume units can be converted. For example,
    a volume in cubic centimeters can be converted to cubic meters using
    `existing_units='cm**3'` and `to_units='m**3'`.

    This method mutates the original DataFrame.

    Examples:
        >>> import pandas as pd
        >>> import janitor.engineering
        >>> df = pd.DataFrame({"temp_F": [-40, 112]})
        >>> df = df.convert_units(
        ...     column_name='temp_F',
        ...     existing_units='degF',
        ...     to_units='degC',
        ...     dest_column_name='temp_C'
        ... )
        >>> df
           temp_F     temp_C
        0     -40 -40.000000
        1     112  44.444444

    Args:
        df: A pandas DataFrame.
        column_name: Name of the column containing numeric
            values that are to be converted from one set of units to another.
        existing_units: The unit type to convert from.
        to_units: The unit type to convert to.
        dest_column_name: The name of the new column containing the
            converted values that will be created.

    Raises:
        TypeError: If column is not numeric.

    Returns:
        A pandas DataFrame with a new column of unit-converted values.
    """

    # Check all inputs are correct data type
    check("column_name", column_name, [str])
    check("existing_units", existing_units, [str])
    check("to_units", to_units, [str])
    check("dest_column_name", dest_column_name, [str])

    # Check that column_name is a numeric column
    if not np.issubdtype(df[column_name].dtype, np.number):
        raise TypeError(f"{column_name} must be a numeric column.")

    original_vals = df[column_name].to_numpy() * unyt.Unit(existing_units)
    converted_vals = original_vals.to(to_units)
    df[dest_column_name] = np.array(converted_vals)

    return df

@pf.register_dataframe_method
@deprecated_alias(colname="column_name")
def convert_currency(
    df: pd.DataFrame,
    api_key: str,
    column_name: str = None,
    from_currency: str = None,
    to_currency: str = None,
    historical_date: date = None,
    make_new_column: bool = False,
) -> pd.DataFrame:
    """Deprecated function.

    <!--
    # noqa: DAR101
    # noqa: DAR401
    -->
    """
    raise JanitorError(
        "The `convert_currency` function has been temporarily disabled due to "
        "exchangeratesapi.io disallowing free pinging of its API. "
        "(Our tests started to fail due to this issue.) "
        "There is no easy way around this problem "
        "except to find a new API to call on. "
        "Please comment on issue #829 "
        "(https://github.com/pyjanitor-devs/pyjanitor/issues/829) "
        "if you know of an alternative API that we can call on, "
        "otherwise the function will be removed in pyjanitor's 1.0 release."
    )

convert_stock(stock_symbol)
def convert_stock(stock_symbol: str) -> str:
    """
    This function takes in a stock symbol as a parameter,
    queries an API for the company's full name and returns
    it.

    Examples:

        ```python
        import janitor.finance
        janitor.finance.convert_stock("aapl")
        ```

    Args:
        stock_symbol: Stock ticker Symbol

    Raises:
        ConnectionError: Internet connection is not available

    Returns:
        Full company name
    """
    if is_connected("www.google.com"):
        stock_symbol = stock_symbol.upper()
        return get_symbol(stock_symbol)
    else:
        raise ConnectionError(
            "Connection Error: Client Not Connected to Internet"
        )

get_symbol(symbol)
def get_symbol(symbol: str) -> Optional[str]:
    """
    This is a helper function to get a company's full
    name based on the stock symbol.

    Examples:

        ```python
        import janitor.finance
        janitor.finance.get_symbol("aapl")
        ```

    Args:
        symbol: This is our stock symbol that we use
            to query the api for the company's full name.

    Returns:
        Company full name
    """
    result = requests.get(
        "http://d.yimg.com/autoc."
        + "finance.yahoo.com/autoc?query={}&region=1&lang=en".format(symbol)
    ).json()

    for x in result["ResultSet"]["Result"]:
        if x["symbol"] == symbol:
            return x["name"]
        else:
            return None

@pf.register_dataframe_method
def inflate_currency(
    df: pd.DataFrame,
    column_name: str = None,
    country: str = None,
    currency_year: int = None,
    to_year: int = None,
    make_new_column: bool = False,
) -> pd.DataFrame:
    """
    Inflates a column of monetary values from one year to another, based on
    the currency's country.

    The provided country can be any economy name or code from the World Bank
    [list of economies](https://databank.worldbank.org/data/download/site-content/CLASS.xls).

    **Note**: This method mutates the original DataFrame.

    Examples:

        >>> import pandas as pd
        >>> import janitor.finance
        >>> df = pd.DataFrame({"profit": [100.10, 200.20, 300.30, 400.40, 500.50]})
        >>> df
           profit
        0   100.1
        1   200.2
        2   300.3
        3   400.4
        4   500.5
        >>> df.inflate_currency(
        ...     column_name='profit',
        ...     country='USA',
        ...     currency_year=2015,
        ...     to_year=2018,
        ...     make_new_column=True
        ... )
           profit  profit_2018
        0   100.1   106.050596
        1   200.2   212.101191
        2   300.3   318.151787
        3   400.4   424.202382
        4   500.5   530.252978

    Args:
        df: A pandas DataFrame.
        column_name: Name of the column containing monetary
            values to inflate.
        country: The country associated with the currency being inflated.
            May be any economy or code from the World Bank
            [List of economies](https://databank.worldbank.org/data/download/site-content/CLASS.xls).
        currency_year: The currency year to inflate from.
            The year should be 1960 or later.
        to_year: The currency year to inflate to.
            The year should be 1960 or later.
        make_new_column: Generates new column for inflated currency if
            True, otherwise, inflates currency in place.

    Returns:
        The DataFrame with inflated currency column.
    """  # noqa: E501

    inflator = _inflate_currency(country, currency_year, to_year)

    if make_new_column:
        new_column_name = column_name + "_" + str(to_year)
        df[new_column_name] = df[column_name] * inflator

    else:
        df[column_name] = df[column_name] * inflator

    return df

pyjanitor's general-purpose data cleaning functions.

NOTE: Instructions for future contributors:

1. Place the source code of the functions in a file named after the function.
2. Place utility functions in the same file.
3. If you use a utility function from another source file, please refactor it out to janitor.functions.utils.
4. Import the function into this file so that it shows up in the top-level API.
5. Sort the imports in alphabetical order.
6. Try to group related functions together (e.g. see convert_date.py).
7. Never import utils.

DropLabel (dataclass)
@dataclass
class DropLabel:
    """Helper class for removing labels within the `select` syntax.

    `label` can be any of the types supported in the `select`,
    `select_rows` and `select_columns` functions.
    An array of integers not matching the labels is returned.

    !!! info "New in version 0.24.0"

    Args:
        label: Label(s) to be dropped from the index.
    """

    label: Any
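A minimal sketch of DropLabel inside the select syntax; that it is re-exported from the top-level janitor namespace is an assumption:

```python
import pandas as pd
import janitor  # noqa: F401
from janitor import DropLabel  # assumption: top-level re-export

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
out = df.select_columns(DropLabel("a"))  # keeps every column except "a"
```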

col
class col:
    """Helper class for column selection within an expression.

    Args:
        column (Hashable): The name of the column to be selected.

    Raises:
        TypeError: If the `column` parameter is not hashable.

    !!! info "New in version 0.25.0"

    !!! warning

        `col` is currently considered experimental.
        The implementation and parts of the API
        may change without warning.

    """

    def __init__(self, column: Hashable):
        self.cols = column
        check("column", self.cols, [Hashable])
        self.join_args = None

    def __gt__(self, other):
        """Implements the greater-than comparison operator (`>`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, ">")
        return self

    def __ge__(self, other):
        """Implements the greater-than-or-equal-to comparison operator (`>=`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, ">=")
        return self

    def __lt__(self, other):
        """Implements the less-than comparison operator (`<`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, "<")
        return self

    def __le__(self, other):
        """Implements the less-than-or-equal-to comparison operator (`<=`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, "<=")
        return self

    def __ne__(self, other):
        """Implements the not-equal-to comparison operator (`!=`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, "!=")
        return self

    def __eq__(self, other):
        """Implements the equal-to comparison operator (`==`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, "==")
        return self
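A short illustration of the semantics shown in the source above: each comparison records a (left_label, right_label, op) triple on the col instance, which conditional_join and get_join_indices then consume. The top-level import is an assumption:

```python
from janitor import col  # assumption: top-level re-export

condition = col("value") <= col("end")
print(condition.join_args)  # ("value", "end", "<=")
```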

convert_excel_date(df, column_name)

Convert Excel's serial date format into Python datetime format.
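A minimal sketch based on the signature above; the serial numbers are illustrative:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"date": [39690, 40101]})
df = df.convert_excel_date("date")  # Excel serial numbers become Timestamps
```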

fill_empty(df, column_names, value)
@pf.register_dataframe_method
@refactored_function(
    message="This function will be deprecated in a 1.x release. "
    "Kindly use `jn.impute` instead."
)
@deprecated_alias(columns="column_names")
def fill_empty(
    df: pd.DataFrame,
    column_names: Union[str, Iterable[str], Hashable],
    value: Any,
) -> pd.DataFrame:
    """Fill `NaN` values in specified columns with a given value.

    Super sugary syntax that wraps `pandas.DataFrame.fillna`.

    This method mutates the original DataFrame.

    !!!note

        This function will be deprecated in a 1.x release.
        Please use [`jn.impute`][janitor.functions.impute.impute] instead.

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> df = pd.DataFrame(
        ...     {
        ...         'col1': [1, 2, 3],
        ...         'col2': [None, 4, None],
        ...         'col3': [None, 5, 6]
        ...     }
        ... )
        >>> df
           col1  col2  col3
        0     1   NaN   NaN
        1     2   4.0   5.0
        2     3   NaN   6.0
        >>> df.fill_empty(column_names = 'col2', value = 0)
           col1  col2  col3
        0     1   0.0   NaN
        1     2   4.0   5.0
        2     3   0.0   6.0
        >>> df.fill_empty(column_names = ['col2', 'col3'], value = 0)
           col1  col2  col3
        0     1   0.0   0.0
        1     2   4.0   5.0
        2     3   0.0   6.0

    Args:
        df: A pandas DataFrame.
        column_names: A column name or an iterable (list
            or tuple) of column names. If a single column name is passed in,
            then only that column will be filled; if a list or tuple is passed
            in, then those columns will all be filled with the same value.
        value: The value that replaces the `NaN` values.

    Returns:
        A pandas DataFrame with `NaN` values filled.
    """

    check_column(df, column_names)
    return _fill_empty(df, column_names, value=value)

filter_column_isin(df, column_name, iterable, complement=False)

Source code in janitor/functions/filter.py
@pf.register_dataframe_method
@deprecated_alias(column="column_name")
def filter_column_isin(
    df: pd.DataFrame,
    column_name: Hashable,
    iterable: Iterable,
    complement: bool = False,
) -> pd.DataFrame:
    """Filter a dataframe for values in a column that exist in the given iterable.

    This method does not mutate the original DataFrame.

    Assumes exact matching; fuzzy matching not implemented.

    Examples:
        Filter the dataframe to retain rows for which `names`
        are exactly `James` or `John`.

        >>> import pandas as pd
        >>> import janitor
        >>> df = pd.DataFrame({"names": ["Jane", "Jeremy", "John"], "foo": list("xyz")})
        >>> df
            names foo
        0    Jane   x
        1  Jeremy   y
        2    John   z
        >>> df.filter_column_isin(column_name="names", iterable=["James", "John"])
          names foo
        2  John   z

        This is the method-chaining alternative to:

        ```python
        df = df[df["names"].isin(["James", "John"])]
        ```

        If `complement=True`, then we will only get rows for which the names
        are neither `James` nor `John`.

    Args:
        df: A pandas DataFrame.
        column_name: The column on which to filter.
        iterable: An iterable. Could be a list, tuple, another pandas
            Series.
        complement: Whether to return the complement of the selection or
            not.

    Raises:
        ValueError: If `iterable` does not have a length of `1`
            or greater.

    Returns:
        A filtered pandas DataFrame.
    """  # noqa: E501
    if len(iterable) == 0:
        raise ValueError(
            "`iterable` kwarg must be given an iterable of length 1 "
            "or greater."
        )
    criteria = df[column_name].isin(iterable)

    if complement:
        return df[~criteria]
    return df[criteria]

filter_date(df, column_name, start_date=None, end_date=None, years=None, months=None, days=None, column_date_options=None, format=None)

Source code in janitor/functions/filter.py
@pf.register_dataframe_method
@deprecated_alias(column="column_name", start="start_date", end="end_date")
def filter_date(
    df: pd.DataFrame,
    column_name: Hashable,
    start_date: Optional[dt.date] = None,
    end_date: Optional[dt.date] = None,
    years: Optional[List] = None,
    months: Optional[List] = None,
    days: Optional[List] = None,
    column_date_options: Optional[Dict] = None,
    format: Optional[str] = None,  # skipcq: PYL-W0622
) -> pd.DataFrame:
    """Filter a date-based column based on certain criteria.

    This method does not mutate the original DataFrame.

    Dates may be finicky and this function builds on top of the *magic* from
    the pandas `to_datetime` function that is able to parse dates well.

    Additional options to parse the date type of your column may be found at
    the official pandas [documentation][datetime].

    [datetime]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> df = pd.DataFrame({
        ...     "a": range(5, 9),
        ...     "dt": ["2021-11-12", "2021-12-15", "2022-01-03", "2022-01-09"],
        ... })
        >>> df
           a          dt
        0  5  2021-11-12
        1  6  2021-12-15
        2  7  2022-01-03
        3  8  2022-01-09
        >>> df.filter_date("dt", start_date="2021-12-01", end_date="2022-01-05")
           a         dt
        1  6 2021-12-15
        2  7 2022-01-03
        >>> df.filter_date("dt", years=[2021], months=[12])
           a         dt
        1  6 2021-12-15

    !!!note

        This method will cast your column to a Timestamp!

    !!!note

        This only affects the format of the `start_date` and `end_date`
        parameters. If there's an issue with the format of the DataFrame being
        parsed, you would pass `{'format': your_format}` to `column_date_options`.

    Args:
        df: The dataframe to filter on.
        column_name: The column on which to filter.
        start_date: The beginning date to use to filter the DataFrame.
        end_date: The end date to use to filter the DataFrame.
        years: The years to use to filter the DataFrame.
        months: The months to use to filter the DataFrame.
        days: The days to use to filter the DataFrame.
        column_date_options: Special options to use when parsing the date
            column in the original DataFrame. The options may be found at the
            official Pandas documentation.
        format: If you're using a format for `start_date` or `end_date`
            that is not recognized natively by pandas' `to_datetime` function, you
            may supply the format yourself. Python date and time formats may be
            found [here](http://strftime.org/).

    Returns:
        A filtered pandas DataFrame.
    """  # noqa: E501

    def _date_filter_conditions(conditions):
        """Taken from: https://stackoverflow.com/a/13616382."""
        return reduce(np.logical_and, conditions)

    if column_date_options is None:
        column_date_options = {}
    df[column_name] = pd.to_datetime(df[column_name], **column_date_options)

    _filter_list = []

    if start_date:
        start_date = pd.to_datetime(start_date, format=format)
        _filter_list.append(df[column_name] >= start_date)

    if end_date:
        end_date = pd.to_datetime(end_date, format=format)
        _filter_list.append(df[column_name] <= end_date)

    if years:
        _filter_list.append(df[column_name].dt.year.isin(years))

    if months:
        _filter_list.append(df[column_name].dt.month.isin(months))

    if days:
        _filter_list.append(df[column_name].dt.day.isin(days))

    if start_date and end_date and start_date > end_date:
        warnings.warn(
            f"Your start date of {start_date} is after your end date of "
            f"{end_date}. Is this intended?"
        )

    return df.loc[_date_filter_conditions(_filter_list), :]

filter_on(df, criteria, complement=False)
@pf.register_dataframe_method
@refactored_function(
    message=(
        "This function will be deprecated in a 1.x release. "
        "Please use `pd.DataFrame.query` instead."
    )
)
def filter_on(
    df: pd.DataFrame,
    criteria: str,
    complement: bool = False,
) -> pd.DataFrame:
    """Return a dataframe filtered on a particular criteria.

    This method does not mutate the original DataFrame.

    This is super-sugary syntax that wraps the pandas `.query()` API, enabling
    users to use strings to quickly specify filters for filtering their
    dataframe. The intent is that `filter_on` as a verb better matches the
    intent of a pandas user than the verb `query`.

    This is intended to be the method-chaining equivalent of the following:

    ```python
    df = df[df["score"] < 3]
    ```

    !!!note

        This function will be deprecated in a 1.x release.
        Please use `pd.DataFrame.query` instead.

    Examples:
        Filter students who failed an exam (scored less than 50).

        >>> import pandas as pd
        >>> import janitor
        >>> df = pd.DataFrame({
        ...     "student_id": ["S1", "S2", "S3"],
        ...     "score": [40, 60, 85],
        ... })
        >>> df
          student_id  score
        0         S1     40
        1         S2     60
        2         S3     85
        >>> df.filter_on("score < 50", complement=False)
          student_id  score
        0         S1     40

        Credit to Brant Peterson for the name.

    Args:
        df: A pandas DataFrame.
        criteria: A filtering criteria that returns an array or Series of
            booleans, on which pandas can filter on.
        complement: Whether to return the complement of the filter or not.
            If set to True, then the rows for which the criteria is False are
            retained instead.

    Returns:
        A filtered pandas DataFrame.
    """

    warnings.warn(
        "This function will be deprecated in a 1.x release. "
        "Kindly use `pd.DataFrame.query` instead.",
        DeprecationWarning,
        stacklevel=find_stack_level(),
    )

    if complement:
        return df.query(f"not ({criteria})")
    return df.query(criteria)

filter_string(df, column_name, search_string, complement=False, case=True, flags=0, na=None, regex=True)

@pf.register_dataframe_method
@deprecated_alias(column="column_name")
def filter_string(
    df: pd.DataFrame,
    column_name: Hashable,
    search_string: str,
    complement: bool = False,
    case: bool = True,
    flags: int = 0,
    na: Any = None,
    regex: bool = True,
) -> pd.DataFrame:
    """Filter a string-based column according to whether it contains a substring.

    This is super sugary syntax that builds on top of `pandas.Series.str.contains`.
    It is meant to be the method-chaining equivalent of the following:

    ```python
    df = df[df[column_name].str.contains(search_string)]
    ```

    This method does not mutate the original DataFrame.

    Examples:
        Retain rows whose column values contain a particular substring.

        >>> import pandas as pd
        >>> import janitor
        >>> df = pd.DataFrame({"a": range(3, 6), "b": ["bear", "peeL", "sail"]})
        >>> df
           a     b
        0  3  bear
        1  4  peeL
        2  5  sail
        >>> df.filter_string(column_name="b", search_string="ee")
           a     b
        1  4  peeL
        >>> df.filter_string(column_name="b", search_string="L", case=False)
           a     b
        1  4  peeL
        2  5  sail

        Filter for names that do not contain `'.'` (disable regex mode).

        >>> import pandas as pd
        >>> import janitor
        >>> df = pd.Series(["JoseChen", "Brian.Salvi"], name="Name").to_frame()
        >>> df
                  Name
        0     JoseChen
        1  Brian.Salvi
        >>> df.filter_string(column_name="Name", search_string=".", regex=False, complement=True)
               Name
        0  JoseChen

    Args:
        df: A pandas DataFrame.
        column_name: The column to filter. The column should contain strings.
        search_string: A regex pattern or a (sub-)string to search.
        complement: Whether to return the complement of the filter or not. If
            set to True, then the rows for which the string search fails are retained
            instead.
        case: If True, case sensitive.
        flags: Flags to pass through to the re module, e.g. re.IGNORECASE.
        na: Fill value for missing values. The default depends on dtype of
            the array. For object-dtype, `numpy.nan` is used. For `StringDtype`,
            `pandas.NA` is used.
        regex: If True, assumes `search_string` is a regular expression. If False,
            treats the `search_string` as a literal string.

    Returns:
        A filtered pandas DataFrame.
    """  # noqa: E501

    criteria = df[column_name].str.contains(
        pat=search_string,
        case=case,
        flags=flags,
        na=na,
        regex=regex,
    )

    if complement:
        return df[~criteria]

    return df[criteria]

get_columns(group, label)
def get_columns(group: Union[DataFrameGroupBy, SeriesGroupBy], label):
    """
    Helper function for selecting columns on a grouped object,
    using the
    [`select`][janitor.functions.select.select] syntax.

    !!! info "New in version 0.25.0"

    Args:
        group: A Pandas GroupBy object.
        label: column(s) to select.

    Returns:
        A pandas groupby object.
    """
    check("groupby object", group, [DataFrameGroupBy, SeriesGroupBy])
    label = get_index_labels(label, group.obj, axis="columns")
    label = label if is_scalar(label) else list(label)
    return group[label]
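A minimal sketch of get_columns with the select syntax; the import path janitor.functions.utils is an assumption based on the contributor notes above:

```python
import pandas as pd
from janitor.functions.utils import get_columns  # assumed path

df = pd.DataFrame({"a_1": [1, 2], "a_2": [3, 4], "g": ["x", "y"]})
grouped = get_columns(df.groupby("g"), "a_*")  # glob-selects a_1 and a_2
```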

get_index_labels(arg, df, axis)
def get_index_labels(
    arg, df: pd.DataFrame, axis: Literal["index", "columns"]
) -> pd.Index:
    """Convenience function to get actual labels from column/index.

    !!! info "New in version 0.25.0"

    Args:
        arg: Valid inputs include: an exact column name to look for,
            a shell-style glob string (e.g. `*_thing_*`),
            a regular expression,
            a callable,
            or variable arguments of all the aforementioned.
            A sequence of booleans is also acceptable.
            A dictionary can be used for selection
            on a MultiIndex on different levels.
        df: The pandas DataFrame object.
        axis: Should be either `index` or `columns`.

    Returns:
        A pandas Index.
    """
    assert axis in {"index", "columns"}
    index = getattr(df, axis)
    return index[_select_index(arg, df, axis)]
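A minimal sketch mirroring the glob example in the docstring above; the import path is again an assumption:

```python
import pandas as pd
from janitor.functions.utils import get_index_labels  # assumed path

df = pd.DataFrame({"a_thing_1": [1], "other": [2]})
labels = get_index_labels("*_thing_*", df, axis="columns")
# labels -> Index(['a_thing_1'], dtype='object')
```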

get_join_indices(df, right, conditions, keep='all', use_numba=False, force=False)

Source code in janitor/functions/conditional_join.py
def get_join_indices(
    df: pd.DataFrame,
    right: Union[pd.DataFrame, pd.Series],
    conditions: list[tuple[str]],
    keep: Literal["first", "last", "all"] = "all",
    use_numba: bool = False,
    force: bool = False,
) -> tuple[np.ndarray, np.ndarray]:
    """Convenience function to return the matching indices from an inner join.

    !!! info "New in version 0.27.0"

    Args:
        df: A pandas DataFrame.
        right: Named Series or DataFrame to join to.
        conditions: List of arguments of tuple(s) of the form
            `(left_on, right_on, op)`, where `left_on` is the column
            label from `df`, `right_on` is the column label from `right`,
            while `op` is the operator.
            The `col` class is also supported. The operator can be any of
            `==`, `!=`, `<=`, `<`, `>=`, `>`. For multiple conditions,
            the and(`&`) operator is used to combine the results
            of the individual conditions.
        use_numba: Use numba, if installed, to accelerate the computation.
        keep: Choose whether to return the first match, last match or all matches.
        force: If `True`, force the non-equi join conditions
            to execute before the equi join.

    Returns:
        A tuple of indices for the rows in the dataframes that match.
    """
    return _conditional_join_compute(
        df=df,
        right=right,
        conditions=conditions,
        how="inner",
        sort_by_appearance=False,
        df_columns=None,
        right_columns=None,
        keep=keep,
        use_numba=use_numba,
        indicator=False,
        force=force,
        return_matching_indices=True,
    )
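A minimal sketch of get_join_indices on a small range join; the frames are illustrative, and the import path follows the source location noted above:

```python
import pandas as pd
from janitor.functions.conditional_join import get_join_indices

left = pd.DataFrame({"value": [2, 5, 7]})
right = pd.DataFrame({"start": [0, 3], "end": [4, 6]})
left_idx, right_idx = get_join_indices(
    left,
    right,
    [("value", "start", ">="), ("value", "end", "<=")],
)
# left_idx and right_idx pair up the matching rows of `left` and `right`.
```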

patterns(regex_pattern)
def patterns(regex_pattern: Union[str, Pattern]) -> Pattern:
    """This function converts a string into a compiled regular expression.

    It can be used to select columns in the index or columns_names
    arguments of `pivot_longer` function.

    !!!warning

        This function is deprecated. Kindly use `re.compile` instead.

    Args:
        regex_pattern: String to be converted to compiled regular
            expression.

    Returns:
        A compiled regular expression from provided `regex_pattern`.
    """
    warnings.warn(
        "This function is deprecated. Kindly use `re.compile` instead.",
        DeprecationWarning,
        stacklevel=find_stack_level(),
    )
    check("regular expression", regex_pattern, [str, Pattern])

    return re.compile(regex_pattern)
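Since patterns is deprecated, the equivalent call passes a compiled regular expression directly, which the pivot_longer signature below accepts for index and column_names; the frame is illustrative:

```python
import re
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"id": [9], "x_start": [1], "x_end": [2]})
out = df.pivot_longer(index="id", column_names=re.compile(".*_(start|end)"))
```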

pivot_longer(df, index=None, column_names=None, names_to=None, values_to='value', column_level=None, names_sep=None, names_pattern=None, names_transform=None, dropna=False, sort_by_appearance=False, ignore_index=True)
Parameters:

    index: Name(s) of columns to use as identifier variables.
        Should be either a single column name, or a list/tuple of
        column names. `index` should be a list of tuples if the
        columns are a MultiIndex. Default: None.
    column_names: Name(s) of columns to unpivot. Should be either
        a single column name or a list/tuple of column names.
        `column_names` should be a list of tuples if the columns
        are a MultiIndex. Default: None.
    names_to: Name of new column as a string that will contain
        what were previously the column names in `column_names`.
        The default is `variable` if no value is provided. It can
        also be a list/tuple of strings that will serve as new column
        names, if `names_sep` or `names_pattern` is provided.
        If `.value` is in `names_to`, new column names will be extracted
        from part of the existing column names and this overrides
        `values_to`. Default: None.
    values_to: Name of new column as a string that will contain what
        were previously the values of the columns in `column_names`.
        `values_to` can also be a list/tuple
        and requires that `names_pattern` is also a list/tuple.
        Default: 'value'.
    column_level: If columns are a MultiIndex, then use this level to
        unpivot the DataFrame. Provided for compatibility with pandas' melt,
        and applies only if neither `names_sep` nor `names_pattern` is
        provided. Default: None.
    names_sep: Determines how the column name is broken up, if
        `names_to` contains multiple values. It takes the same
        specification as pandas' `str.split` method, and can be a string
        or regular expression. `names_sep` does not work with MultiIndex
        columns. Default: None.
    names_pattern: Determines how the column name is broken up.
        It can be a regular expression containing matching groups.
        Under the hood it is processed with pandas' `str.extract` function.
        If it is a single regex, the number of groups must match
        the length of `names_to`.
        Named groups are supported, if `names_to` is None. `_` is used
        instead of `.value` as a placeholder in named groups.
        `_` can be overloaded for multiple `.value`
        calls - `_`, `__`, `___`, ...
        `names_pattern` can also be a list/tuple of regular expressions.
        It can also be a list/tuple of strings;
        the strings will be treated as regular expressions.
        Under the hood it is processed with pandas' `str.contains` function.
        For a list/tuple of regular expressions,
        `names_to` must also be a list/tuple and the lengths of both
        arguments must match.
        `names_pattern` can also be a dictionary, where the keys are
        the new column names, while the values can be a regular expression
        or a string which will be evaluated as a regular expression.
        Alternatively, a nested dictionary can be used, where the sub
        key(s) are associated with `values_to`. Please have a look
        at the examples for usage.
        `names_pattern` does not work with MultiIndex columns.
        Default: None.
    names_transform: Use this option to change the types of columns that
        have been transformed to rows. This does not apply to the values'
        columns. Accepts any argument that is acceptable by `pd.astype`.
        Default: None.
    dropna: Determines whether or not to drop nulls
        from the values columns. Default: False.
    sort_by_appearance: Boolean value that determines
        the final look of the DataFrame. If True, the unpivoted DataFrame
        will be stacked in order of first appearance. Default: False.
    ignore_index: If True,
        the original index is ignored. If False, the original index
        is retained and the index labels will be repeated as necessary.
        Default: True.

Returns:

    A pandas DataFrame that has been unpivoted from wide to long format.
@pf.register_dataframe_method
+def pivot_longer(
+    df: pd.DataFrame,
+    index: Optional[Union[list, tuple, str, Pattern]] = None,
+    column_names: Optional[Union[list, tuple, str, Pattern]] = None,
+    names_to: Optional[Union[list, tuple, str]] = None,
+    values_to: Optional[str] = "value",
+    column_level: Optional[Union[int, str]] = None,
+    names_sep: Optional[Union[str, Pattern]] = None,
+    names_pattern: Optional[Union[list, tuple, str, Pattern]] = None,
+    names_transform: Optional[Union[str, Callable, dict]] = None,
+    dropna: bool = False,
+    sort_by_appearance: Optional[bool] = False,
+    ignore_index: Optional[bool] = True,
+) -> pd.DataFrame:
+    """Unpivots a DataFrame from *wide* to *long* format.
+
+ This method does not mutate the original DataFrame.
+
+ It is modeled after the `pivot_longer` function in R's tidyr package,
+ and also takes inspiration from R's data.table package.
+
+ This function is useful to massage a DataFrame into a format where
+ one or more columns are considered measured variables, and all other
+ columns are considered as identifier variables.
+
+ All measured variables are *unpivoted* (and typically duplicated) along the
+ row axis.
+
+ Column selection in `index` and `column_names` is possible using the
+ [`select`][janitor.functions.select.select] syntax.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame(
+ ... {
+ ... "Sepal.Length": [5.1, 5.9],
+ ... "Sepal.Width": [3.5, 3.0],
+ ... "Petal.Length": [1.4, 5.1],
+ ... "Petal.Width": [0.2, 1.8],
+ ... "Species": ["setosa", "virginica"],
+ ... }
+ ... )
+ >>> df
+ Sepal.Length Sepal.Width Petal.Length Petal.Width Species
+ 0 5.1 3.5 1.4 0.2 setosa
+ 1 5.9 3.0 5.1 1.8 virginica
+
+ Replicate pandas' melt:
+ >>> df.pivot_longer(index = 'Species')
+ Species variable value
+ 0 setosa Sepal.Length 5.1
+ 1 virginica Sepal.Length 5.9
+ 2 setosa Sepal.Width 3.5
+ 3 virginica Sepal.Width 3.0
+ 4 setosa Petal.Length 1.4
+ 5 virginica Petal.Length 5.1
+ 6 setosa Petal.Width 0.2
+ 7 virginica Petal.Width 1.8
+
+ Convenient, flexible column selection in the `index` via the
+ [`select`][janitor.functions.select.select] syntax:
+ >>> from pandas.api.types import is_string_dtype
+ >>> df.pivot_longer(index = is_string_dtype)
+ Species variable value
+ 0 setosa Sepal.Length 5.1
+ 1 virginica Sepal.Length 5.9
+ 2 setosa Sepal.Width 3.5
+ 3 virginica Sepal.Width 3.0
+ 4 setosa Petal.Length 1.4
+ 5 virginica Petal.Length 5.1
+ 6 setosa Petal.Width 0.2
+ 7 virginica Petal.Width 1.8
+
+ Split the column labels into parts:
+ >>> df.pivot_longer(
+ ... index = 'Species',
+ ... names_to = ('part', 'dimension'),
+ ... names_sep = '.',
+ ... sort_by_appearance = True,
+ ... )
+ Species part dimension value
+ 0 setosa Sepal Length 5.1
+ 1 setosa Sepal Width 3.5
+ 2 setosa Petal Length 1.4
+ 3 setosa Petal Width 0.2
+ 4 virginica Sepal Length 5.9
+ 5 virginica Sepal Width 3.0
+ 6 virginica Petal Length 5.1
+ 7 virginica Petal Width 1.8
+
+ Retain parts of the column names as headers:
+ >>> df.pivot_longer(
+ ... index = 'Species',
+ ... names_to = ('part', '.value'),
+ ... names_sep = '.',
+ ... sort_by_appearance = True,
+ ... )
+ Species part Length Width
+ 0 setosa Sepal 5.1 3.5
+ 1 setosa Petal 1.4 0.2
+ 2 virginica Sepal 5.9 3.0
+ 3 virginica Petal 5.1 1.8
+
+ Split the column labels based on regex:
+ >>> df = pd.DataFrame({"id": [1], "new_sp_m5564": [2], "newrel_f65": [3]})
+ >>> df
+ id new_sp_m5564 newrel_f65
+ 0 1 2 3
+ >>> df.pivot_longer(
+ ... index = 'id',
+ ... names_to = ('diagnosis', 'gender', 'age'),
+ ... names_pattern = r"new_?(.+)_(.)(\\d+)",
+ ... )
+ id diagnosis gender age value
+ 0 1 sp m 5564 2
+ 1 1 rel f 65 3
+
+ Split the column labels for the above dataframe using named groups in `names_pattern`:
+ >>> df.pivot_longer(
+ ... index = 'id',
+ ... names_pattern = r"new_?(?P<diagnosis>.+)_(?P<gender>.)(?P<age>\\d+)",
+ ... )
+ id diagnosis gender age value
+ 0 1 sp m 5564 2
+ 1 1 rel f 65 3
+
+ Convert the dtypes of specific columns with `names_transform`:
+ >>> result = (df
+ ... .pivot_longer(
+ ... index = 'id',
+ ... names_to = ('diagnosis', 'gender', 'age'),
+ ... names_pattern = r"new_?(.+)_(.)(\\d+)",
+ ... names_transform = {'gender': 'category', 'age':'int'})
+ ... )
+ >>> result.dtypes
+ id int64
+ diagnosis object
+ gender category
+ age int64
+ value int64
+ dtype: object
+
+ Use multiple `.value` to reshape dataframe:
+ >>> df = pd.DataFrame(
+ ... [
+ ... {
+ ... "x_1_mean": 10,
+ ... "x_2_mean": 20,
+ ... "y_1_mean": 30,
+ ... "y_2_mean": 40,
+ ... "unit": 50,
+ ... }
+ ... ]
+ ... )
+ >>> df
+ x_1_mean x_2_mean y_1_mean y_2_mean unit
+ 0 10 20 30 40 50
+ >>> df.pivot_longer(
+ ... index="unit",
+ ... names_to=(".value", "time", ".value"),
+ ... names_pattern=r"(x|y)_([0-9])(_mean)",
+ ... )
+ unit time x_mean y_mean
+ 0 50 1 10 30
+ 1 50 2 20 40
+
+ Replicate the above with named groups in `names_pattern` - use `_` instead of `.value`:
+ >>> df.pivot_longer(
+ ... index="unit",
+ ... names_pattern=r"(?P<_>x|y)_(?P<time>[0-9])(?P<__>_mean)",
+ ... )
+ unit time x_mean y_mean
+ 0 50 1 10 30
+ 1 50 2 20 40
+
+ Convenient, flexible column selection in the `column_names` via
+ [`select`][janitor.functions.select.select] syntax:
+ >>> df.pivot_longer(
+ ... column_names="*mean",
+ ... names_to=(".value", "time", ".value"),
+ ... names_pattern=r"(x|y)_([0-9])(_mean)",
+ ... )
+ unit time x_mean y_mean
+ 0 50 1 10 30
+ 1 50 2 20 40
+
+ >>> df.pivot_longer(
+ ... column_names=slice("x_1_mean", "y_2_mean"),
+ ... names_to=(".value", "time", ".value"),
+ ... names_pattern=r"(x|y)_([0-9])(_mean)",
+ ... )
+ unit time x_mean y_mean
+ 0 50 1 10 30
+ 1 50 2 20 40
+
+ Reshape dataframe by passing a sequence to `names_pattern`:
+ >>> df = pd.DataFrame({'hr1': [514, 573],
+ ... 'hr2': [545, 526],
+ ... 'team': ['Red Sox', 'Yankees'],
+ ... 'year1': [2007, 2007],
+ ... 'year2': [2008, 2008]})
+ >>> df
+ hr1 hr2 team year1 year2
+ 0 514 545 Red Sox 2007 2008
+ 1 573 526 Yankees 2007 2008
+ >>> df.pivot_longer(
+ ... index = 'team',
+ ... names_to = ['year', 'hr'],
+ ... names_pattern = ['year', 'hr']
+ ... )
+ team hr year
+ 0 Red Sox 514 2007
+ 1 Yankees 573 2007
+ 2 Red Sox 545 2008
+ 3 Yankees 526 2008
+
+
+ Reshape above dataframe by passing a dictionary to `names_pattern`:
+ >>> df.pivot_longer(
+ ... index = 'team',
+ ... names_pattern = {"year":"year", "hr":"hr"}
+ ... )
+ team hr year
+ 0 Red Sox 514 2007
+ 1 Yankees 573 2007
+ 2 Red Sox 545 2008
+ 3 Yankees 526 2008
+
+ Multiple values_to:
+ >>> df = pd.DataFrame(
+ ... {
+ ... "City": ["Houston", "Austin", "Hoover"],
+ ... "State": ["Texas", "Texas", "Alabama"],
+ ... "Name": ["Aria", "Penelope", "Niko"],
+ ... "Mango": [4, 10, 90],
+ ... "Orange": [10, 8, 14],
+ ... "Watermelon": [40, 99, 43],
+ ... "Gin": [16, 200, 34],
+ ... "Vodka": [20, 33, 18],
+ ... },
+ ... columns=[
+ ... "City",
+ ... "State",
+ ... "Name",
+ ... "Mango",
+ ... "Orange",
+ ... "Watermelon",
+ ... "Gin",
+ ... "Vodka",
+ ... ],
+ ... )
+ >>> df
+ City State Name Mango Orange Watermelon Gin Vodka
+ 0 Houston Texas Aria 4 10 40 16 20
+ 1 Austin Texas Penelope 10 8 99 200 33
+ 2 Hoover Alabama Niko 90 14 43 34 18
+ >>> df.pivot_longer(
+ ... index=["City", "State"],
+ ... column_names=slice("Mango", "Vodka"),
+ ... names_to=("Fruit", "Drink"),
+ ... values_to=("Pounds", "Ounces"),
+ ... names_pattern=["M|O|W", "G|V"],
+ ... )
+ City State Fruit Pounds Drink Ounces
+ 0 Houston Texas Mango 4 Gin 16.0
+ 1 Austin Texas Mango 10 Gin 200.0
+ 2 Hoover Alabama Mango 90 Gin 34.0
+ 3 Houston Texas Orange 10 Vodka 20.0
+ 4 Austin Texas Orange 8 Vodka 33.0
+ 5 Hoover Alabama Orange 14 Vodka 18.0
+ 6 Houston Texas Watermelon 40 None NaN
+ 7 Austin Texas Watermelon 99 None NaN
+ 8 Hoover Alabama Watermelon 43 None NaN
+
+ Replicate the above transformation with a nested dictionary passed to `names_pattern`
+ - the outer keys in the `names_pattern` dictionary are passed to `names_to`,
+ while the inner keys are passed to `values_to`:
+ >>> df.pivot_longer(
+ ... index=["City", "State"],
+ ... column_names=slice("Mango", "Vodka"),
+ ... names_pattern={
+ ... "Fruit": {"Pounds": "M|O|W"},
+ ... "Drink": {"Ounces": "G|V"},
+ ... },
+ ... )
+ City State Fruit Pounds Drink Ounces
+ 0 Houston Texas Mango 4 Gin 16.0
+ 1 Austin Texas Mango 10 Gin 200.0
+ 2 Hoover Alabama Mango 90 Gin 34.0
+ 3 Houston Texas Orange 10 Vodka 20.0
+ 4 Austin Texas Orange 8 Vodka 33.0
+ 5 Hoover Alabama Orange 14 Vodka 18.0
+ 6 Houston Texas Watermelon 40 None NaN
+ 7 Austin Texas Watermelon 99 None NaN
+ 8 Hoover Alabama Watermelon 43 None NaN
+
+ !!! abstract "Version Changed"
+
+ - 0.24.0
+ - Added `dropna` parameter.
+ - 0.24.1
+ - `names_pattern` can accept a dictionary.
+ - named groups supported in `names_pattern`.
+
+ Args:
+ df: A pandas DataFrame.
+ index: Name(s) of columns to use as identifier variables.
+ Should be either a single column name, or a list/tuple of
+ column names.
+ `index` should be a list of tuples if the columns are a MultiIndex.
+ column_names: Name(s) of columns to unpivot. Should be either
+ a single column name or a list/tuple of column names.
+ `column_names` should be a list of tuples
+ if the columns are a MultiIndex.
+        names_to: Name of new column as a string that will contain
+            what were previously the column names in `column_names`.
+            The default is `variable` if no value is provided. It can
+            also be a list/tuple of strings that will serve as new column
+            names, if `names_sep` or `names_pattern` is provided.
+            If `.value` is in `names_to`, new column names will be extracted
+            from part of the existing column names and override `values_to`.
+        values_to: Name of new column as a string that will contain what
+            were previously the values of the columns in `column_names`.
+            `values_to` can also be a list/tuple
+            and requires that `names_pattern` is also a list/tuple.
+ column_level: If columns are a MultiIndex, then use this level to
+ unpivot the DataFrame. Provided for compatibility with pandas' melt,
+ and applies only if neither `names_sep` nor `names_pattern` is
+ provided.
+ names_sep: Determines how the column name is broken up, if
+ `names_to` contains multiple values. It takes the same
+ specification as pandas' `str.split` method, and can be a string
+ or regular expression. `names_sep` does not work with MultiIndex
+ columns.
+ names_pattern: Determines how the column name is broken up.
+ It can be a regular expression containing matching groups.
+ Under the hood it is processed with pandas' `str.extract` function.
+ If it is a single regex, the number of groups must match
+ the length of `names_to`.
+            Named groups are supported, if `names_to` is `None`. `_` is used
+            instead of `.value` as a placeholder in named groups.
+            `_` can be overloaded for multiple `.value`
+            calls - `_`, `__`, `___`, ...
+            `names_pattern` can also be a list/tuple of regular expressions.
+ It can also be a list/tuple of strings;
+ the strings will be treated as regular expressions.
+ Under the hood it is processed with pandas' `str.contains` function.
+ For a list/tuple of regular expressions,
+ `names_to` must also be a list/tuple and the lengths of both
+ arguments must match.
+ `names_pattern` can also be a dictionary, where the keys are
+ the new column names, while the values can be a regular expression
+ or a string which will be evaluated as a regular expression.
+ Alternatively, a nested dictionary can be used, where the sub
+ key(s) are associated with `values_to`. Please have a look
+ at the examples for usage.
+ `names_pattern` does not work with MultiIndex columns.
+        names_transform: Use this option to change the types of columns that
+            have been transformed to rows. This does not apply to the values columns.
+            Accepts any argument that is acceptable by `pd.astype`.
+ dropna: Determines whether or not to drop nulls
+ from the values columns. Default is `False`.
+ sort_by_appearance: Boolean value that determines
+ the final look of the DataFrame. If `True`, the unpivoted DataFrame
+ will be stacked in order of first appearance.
+ ignore_index: If `True`,
+ the original index is ignored. If `False`, the original index
+ is retained and the index labels will be repeated as necessary.
+
+ Returns:
+ A pandas DataFrame that has been unpivoted from wide to long
+ format.
+ """# noqa: E501
+
+ # this code builds on the wonderful work of @benjaminjack’s PR
+ # https://github.com/benjaminjack/pyjanitor/commit/e3df817903c20dd21634461c8a92aec137963ed0
+
+ return_computations_pivot_longer(
+ df=df,
+ index=index,
+ column_names=column_names,
+ column_level=column_level,
+ names_to=names_to,
+ values_to=values_to,
+ names_sep=names_sep,
+ names_pattern=names_pattern,
+ names_transform=names_transform,
+ dropna=dropna,
+ sort_by_appearance=sort_by_appearance,
+ ignore_index=ignore_index,
+ )
+
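The `dropna` parameter added in 0.24.0 has no doctest above; a minimal sketch of its effect, assuming an illustrative frame with missing values:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"id": [1, 2], "x": [1.0, None], "y": [None, 4.0]})

# dropna=True drops the null rows that unpivoting would otherwise
# emit in the `value` column.
long_df = df.pivot_longer(index="id", dropna=True)
```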
+ pivot_wider(df, index=None, names_from=None, values_from=None, flatten_levels=True, names_sep='_', names_glue=None, reset_index=True, names_expand=False, index_expand=False)
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `pd.DataFrame.pivot` instead."
+    )
+)
+def pivot_wider(
+    df: pd.DataFrame,
+    index: Optional[Union[list, str]] = None,
+    names_from: Optional[Union[list, str]] = None,
+    values_from: Optional[Union[list, str]] = None,
+    flatten_levels: Optional[bool] = True,
+    names_sep: str = "_",
+    names_glue: str = None,
+    reset_index: bool = True,
+    names_expand: bool = False,
+    index_expand: bool = False,
+) -> pd.DataFrame:
+    """Reshapes data from *long* to *wide* form.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.pivot` instead.
+
+    The number of columns is increased, while the number
+    of rows is decreased. It is the inverse of the
+    [`pivot_longer`][janitor.functions.pivot.pivot_longer]
+    method, and is a wrapper around the `pd.DataFrame.pivot` method.
+
+ This method does not mutate the original DataFrame.
+
+ Column selection in `index`, `names_from` and `values_from`
+ is possible using the
+ [`select`][janitor.functions.select.select] syntax.
+
+ A ValueError is raised if the combination
+ of the `index` and `names_from` is not unique.
+
+ By default, values from `values_from` are always
+ at the top level if the columns are not flattened.
+ If flattened, the values from `values_from` are usually
+ at the start of each label in the columns.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = [{'dep': 5.5, 'step': 1, 'a': 20, 'b': 30},
+ ... {'dep': 5.5, 'step': 2, 'a': 25, 'b': 37},
+ ... {'dep': 6.1, 'step': 1, 'a': 22, 'b': 19},
+ ... {'dep': 6.1, 'step': 2, 'a': 18, 'b': 29}]
+ >>> df = pd.DataFrame(df)
+ >>> df
+ dep step a b
+ 0 5.5 1 20 30
+ 1 5.5 2 25 37
+ 2 6.1 1 22 19
+ 3 6.1 2 18 29
+
+ Pivot and flatten columns:
+ >>> df.pivot_wider( # doctest: +SKIP
+ ... index = "dep",
+ ... names_from = "step",
+ ... )
+ dep a_1 a_2 b_1 b_2
+ 0 5.5 20 25 30 37
+ 1 6.1 22 18 19 29
+
+ Modify columns with `names_sep`:
+ >>> df.pivot_wider( # doctest: +SKIP
+ ... index = "dep",
+ ... names_from = "step",
+ ... names_sep = "",
+ ... )
+ dep a1 a2 b1 b2
+ 0 5.5 20 25 30 37
+ 1 6.1 22 18 19 29
+
+ Modify columns with `names_glue`:
+ >>> df.pivot_wider( # doctest: +SKIP
+ ... index = "dep",
+ ... names_from = "step",
+ ... names_glue = "{_value}_step{step}",
+ ... )
+ dep a_step1 a_step2 b_step1 b_step2
+ 0 5.5 20 25 30 37
+ 1 6.1 22 18 19 29
+
+ Expand columns to expose implicit missing values
+ - this applies only to categorical columns:
+ >>> weekdays = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
+ >>> daily = pd.DataFrame(
+ ... {
+ ... "day": pd.Categorical(
+ ... values=("Tue", "Thu", "Fri", "Mon"), categories=weekdays
+ ... ),
+ ... "value": (2, 3, 1, 5),
+ ... },
+ ... index=[0, 0, 0, 0],
+ ... )
+ >>> daily
+ day value
+ 0 Tue 2
+ 0 Thu 3
+ 0 Fri 1
+ 0 Mon 5
+ >>> daily.pivot_wider(names_from='day', values_from='value') # doctest: +SKIP
+ Tue Thu Fri Mon
+ 0 2 3 1 5
+ >>> (daily # doctest: +SKIP
+ ... .pivot_wider(
+ ... names_from='day',
+ ... values_from='value',
+ ... names_expand=True)
+ ... )
+ Mon Tue Wed Thu Fri Sat Sun
+ 0 5 2 NaN 3 1 NaN NaN
+
+ Expand the index to expose implicit missing values
+ - this applies only to categorical columns:
+ >>> daily = daily.assign(letter = list('ABBA'))
+ >>> daily
+ day value letter
+ 0 Tue 2 A
+ 0 Thu 3 B
+ 0 Fri 1 B
+ 0 Mon 5 A
+ >>> daily.pivot_wider(index='day',names_from='letter',values_from='value') # doctest: +SKIP
+ day A B
+ 0 Tue 2.0 NaN
+ 1 Thu NaN 3.0
+ 2 Fri NaN 1.0
+ 3 Mon 5.0 NaN
+ >>> (daily # doctest: +SKIP
+ ... .pivot_wider(
+ ... index='day',
+ ... names_from='letter',
+ ... values_from='value',
+ ... index_expand=True)
+ ... )
+ day A B
+ 0 Mon 5.0 NaN
+ 1 Tue 2.0 NaN
+ 2 Wed NaN NaN
+ 3 Thu NaN 3.0
+ 4 Fri NaN 1.0
+ 5 Sat NaN NaN
+ 6 Sun NaN NaN
+
+
+ !!! abstract "Version Changed"
+
+ - 0.24.0
+ - Added `reset_index`, `names_expand` and `index_expand` parameters.
+
+ Args:
+ df: A pandas DataFrame.
+ index: Name(s) of columns to use as identifier variables.
+ It should be either a single column name, or a list of column names.
+ If `index` is not provided, the DataFrame's index is used.
+ names_from: Name(s) of column(s) to use to make the new
+ DataFrame's columns. Should be either a single column name,
+ or a list of column names.
+ values_from: Name(s) of column(s) that will be used for populating
+ the new DataFrame's values.
+ If `values_from` is not specified, all remaining columns
+ will be used.
+ flatten_levels: If `False`, the DataFrame stays as a MultiIndex.
+ names_sep: If `names_from` or `values_from` contain multiple
+ variables, this will be used to join the values into a single string
+ to use as a column name. Default is `_`.
+ Applicable only if `flatten_levels` is `True`.
+ names_glue: A string to control the output of the flattened columns.
+ It offers more flexibility in creating custom column names,
+ and uses python's `str.format_map` under the hood.
+ Simply create the string template,
+ using the column labels in `names_from`,
+ and special `_value` as a placeholder for `values_from`.
+ Applicable only if `flatten_levels` is `True`.
+ reset_index: Determines whether to restore `index`
+ as a column/columns. Applicable only if `index` is provided,
+ and `flatten_levels` is `True`.
+ names_expand: Expand columns to show all the categories.
+ Applies only if `names_from` is a categorical column.
+ index_expand: Expand the index to show all the categories.
+ Applies only if `index` is a categorical column.
+
+ Returns:
+        A pandas DataFrame that has been pivoted from long to wide form.
+ """# noqa: E501
+
+ # no need for an explicit copy --> df = df.copy()
+ # `pd.pivot` creates one
+ return_computations_pivot_wider(
+ df,
+ index,
+ names_from,
+ values_from,
+ flatten_levels,
+ names_sep,
+ names_glue,
+ reset_index,
+ names_expand,
+ index_expand,
+ )
+
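All of the doctests above flatten the columns; a minimal sketch of keeping the MultiIndex with `flatten_levels=False`, on an illustrative frame:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame(
    {"dep": [5.5, 5.5, 6.1, 6.1], "step": [1, 2, 1, 2], "a": [20, 25, 22, 18]}
)

# The result keeps ("a", 1), ("a", 2) as MultiIndex columns
# instead of joining them into "a_1", "a_2".
wide_df = df.pivot_wider(index="dep", names_from="step", flatten_levels=False)
```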
+ rename_column(df, old_column_name, new_column_name)
+
+ Source code in janitor/functions/rename_columns.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `pd.DataFrame.rename` instead."
+    )
+)
+@deprecated_alias(old="old_column_name", new="new_column_name")
+def rename_column(
+    df: pd.DataFrame,
+    old_column_name: str,
+    new_column_name: str,
+) -> pd.DataFrame:
+    """Rename a column.
+
+ This method does not mutate the original DataFrame.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.rename` instead.
+
+ This is just syntactic sugar/a convenience function for renaming one column at a time.
+ If you are convinced that there are multiple columns in need of changing,
+ then use the `pandas.DataFrame.rename` method.
+
+ Examples:
+ Change the name of column 'a' to 'a_new'.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": list(range(3)), "b": list("abc")})
+ >>> df.rename_column(old_column_name='a', new_column_name='a_new')
+ a_new b
+ 0 0 a
+ 1 1 b
+ 2 2 c
+
+ Args:
+ df: The pandas DataFrame object.
+ old_column_name: The old column name.
+ new_column_name: The new column name.
+
+ Returns:
+ A pandas DataFrame with renamed columns.
+ """# noqa: E501
+
+ check_column(df,[old_column_name])
+
+ returndf.rename(columns={old_column_name:new_column_name})
+
+
+ select_columns(df, *args, invert=False)
+ Source code in janitor/functions/select.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `jn.select` instead."
+    )
+)
+def select_columns(
+    df: pd.DataFrame,
+    *args: Any,
+    invert: bool = False,
+) -> pd.DataFrame:
+    """Method-chainable selection of columns.
+
+ It accepts a string, shell-like glob strings `(*string*)`,
+ regex, slice, array-like object, or a list of the previous options.
+
+ Selection on a MultiIndex on a level, or multiple levels,
+ is possible with a dictionary.
+
+ This method does not mutate the original DataFrame.
+
+    An optional ability to invert the selection of columns is available as well.
+
+ !!!note
+
+ The preferred option when selecting columns or rows in a Pandas DataFrame
+ is with `.loc` or `.iloc` methods.
+ `select_columns` is primarily for convenience.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `jn.select` instead.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> from numpy import nan
+ >>> pd.set_option("display.max_columns", None)
+ >>> pd.set_option("display.expand_frame_repr", False)
+ >>> pd.set_option("max_colwidth", None)
+ >>> data = {'name': ['Cheetah','Owl monkey','Mountain beaver',
+ ... 'Greater short-tailed shrew','Cow'],
+ ... 'genus': ['Acinonyx', 'Aotus', 'Aplodontia', 'Blarina', 'Bos'],
+ ... 'vore': ['carni', 'omni', 'herbi', 'omni', 'herbi'],
+ ... 'order': ['Carnivora','Primates','Rodentia','Soricomorpha','Artiodactyla'],
+ ... 'conservation': ['lc', nan, 'nt', 'lc', 'domesticated'],
+ ... 'sleep_total': [12.1, 17.0, 14.4, 14.9, 4.0],
+ ... 'sleep_rem': [nan, 1.8, 2.4, 2.3, 0.7],
+ ... 'sleep_cycle': [nan, nan, nan, 0.133333333, 0.666666667],
+ ... 'awake': [11.9, 7.0, 9.6, 9.1, 20.0],
+ ... 'brainwt': [nan, 0.0155, nan, 0.00029, 0.423],
+ ... 'bodywt': [50.0, 0.48, 1.35, 0.019, 600.0]}
+ >>> df = pd.DataFrame(data)
+ >>> df
+ name genus vore order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
+ 0 Cheetah Acinonyx carni Carnivora lc 12.1 NaN NaN 11.9 NaN 50.000
+ 1 Owl monkey Aotus omni Primates NaN 17.0 1.8 NaN 7.0 0.01550 0.480
+ 2 Mountain beaver Aplodontia herbi Rodentia nt 14.4 2.4 NaN 9.6 NaN 1.350
+ 3 Greater short-tailed shrew Blarina omni Soricomorpha lc 14.9 2.3 0.133333 9.1 0.00029 0.019
+ 4 Cow Bos herbi Artiodactyla domesticated 4.0 0.7 0.666667 20.0 0.42300 600.000
+
+ Explicit label selection:
+ >>> df.select_columns('name', 'order')
+ name order
+ 0 Cheetah Carnivora
+ 1 Owl monkey Primates
+ 2 Mountain beaver Rodentia
+ 3 Greater short-tailed shrew Soricomorpha
+ 4 Cow Artiodactyla
+
+ Selection via globbing:
+ >>> df.select_columns("sleep*", "*wt")
+ sleep_total sleep_rem sleep_cycle brainwt bodywt
+ 0 12.1 NaN NaN NaN 50.000
+ 1 17.0 1.8 NaN 0.01550 0.480
+ 2 14.4 2.4 NaN NaN 1.350
+ 3 14.9 2.3 0.133333 0.00029 0.019
+ 4 4.0 0.7 0.666667 0.42300 600.000
+
+ Selection via regex:
+ >>> import re
+ >>> df.select_columns(re.compile(r"o.+er"))
+ order conservation
+ 0 Carnivora lc
+ 1 Primates NaN
+ 2 Rodentia nt
+ 3 Soricomorpha lc
+ 4 Artiodactyla domesticated
+
+ Selection via slicing:
+ >>> df.select_columns(slice('name','order'), slice('sleep_total','sleep_cycle'))
+ name genus vore order sleep_total sleep_rem sleep_cycle
+ 0 Cheetah Acinonyx carni Carnivora 12.1 NaN NaN
+ 1 Owl monkey Aotus omni Primates 17.0 1.8 NaN
+ 2 Mountain beaver Aplodontia herbi Rodentia 14.4 2.4 NaN
+ 3 Greater short-tailed shrew Blarina omni Soricomorpha 14.9 2.3 0.133333
+ 4 Cow Bos herbi Artiodactyla 4.0 0.7 0.666667
+
+ Selection via callable:
+ >>> from pandas.api.types import is_numeric_dtype
+ >>> df.select_columns(is_numeric_dtype)
+ sleep_total sleep_rem sleep_cycle awake brainwt bodywt
+ 0 12.1 NaN NaN 11.9 NaN 50.000
+ 1 17.0 1.8 NaN 7.0 0.01550 0.480
+ 2 14.4 2.4 NaN 9.6 NaN 1.350
+ 3 14.9 2.3 0.133333 9.1 0.00029 0.019
+ 4 4.0 0.7 0.666667 20.0 0.42300 600.000
+ >>> df.select_columns(lambda f: f.isna().any())
+ conservation sleep_rem sleep_cycle brainwt
+ 0 lc NaN NaN NaN
+ 1 NaN 1.8 NaN 0.01550
+ 2 nt 2.4 NaN NaN
+ 3 lc 2.3 0.133333 0.00029
+ 4 domesticated 0.7 0.666667 0.42300
+
+ Exclude columns with the `invert` parameter:
+ >>> df.select_columns(is_numeric_dtype, invert=True)
+ name genus vore order conservation
+ 0 Cheetah Acinonyx carni Carnivora lc
+ 1 Owl monkey Aotus omni Primates NaN
+ 2 Mountain beaver Aplodontia herbi Rodentia nt
+ 3 Greater short-tailed shrew Blarina omni Soricomorpha lc
+ 4 Cow Bos herbi Artiodactyla domesticated
+
+ Exclude columns with the `DropLabel` class:
+ >>> from janitor import DropLabel
+ >>> df.select_columns(DropLabel(slice("name", "awake")), "conservation")
+ brainwt bodywt conservation
+ 0 NaN 50.000 lc
+ 1 0.01550 0.480 NaN
+ 2 NaN 1.350 nt
+ 3 0.00029 0.019 lc
+ 4 0.42300 600.000 domesticated
+
+ Selection on MultiIndex columns:
+ >>> d = {'num_legs': [4, 4, 2, 2],
+ ... 'num_wings': [0, 0, 2, 2],
+ ... 'class': ['mammal', 'mammal', 'mammal', 'bird'],
+ ... 'animal': ['cat', 'dog', 'bat', 'penguin'],
+ ... 'locomotion': ['walks', 'walks', 'flies', 'walks']}
+ >>> df = pd.DataFrame(data=d)
+ >>> df = df.set_index(['class', 'animal', 'locomotion']).T
+ >>> df
+ class mammal bird
+ animal cat dog bat penguin
+ locomotion walks walks flies walks
+ num_legs 4 4 2 2
+ num_wings 0 0 2 2
+
+ Selection with a scalar:
+ >>> df.select_columns('mammal')
+ class mammal
+ animal cat dog bat
+ locomotion walks walks flies
+ num_legs 4 4 2
+ num_wings 0 0 2
+
+ Selection with a tuple:
+ >>> df.select_columns(('mammal','bat'))
+ class mammal
+ animal bat
+ locomotion flies
+ num_legs 2
+ num_wings 2
+
+ Selection within a level is possible with a dictionary,
+ where the key is either a level name or number:
+ >>> df.select_columns({'animal':'cat'})
+ class mammal
+ animal cat
+ locomotion walks
+ num_legs 4
+ num_wings 0
+ >>> df.select_columns({1:["bat", "cat"]})
+ class mammal
+ animal bat cat
+ locomotion flies walks
+ num_legs 2 4
+ num_wings 2 0
+
+ Selection on multiple levels:
+ >>> df.select_columns({"class":"mammal", "locomotion":"flies"})
+ class mammal
+ animal bat
+ locomotion flies
+ num_legs 2
+ num_wings 2
+
+ Selection with a regex on a level:
+ >>> df.select_columns({"animal":re.compile(".+t$")})
+ class mammal
+ animal cat bat
+ locomotion walks flies
+ num_legs 4 2
+ num_wings 0 2
+
+ Selection with a callable on a level:
+ >>> df.select_columns({"animal":lambda f: f.str.endswith('t')})
+ class mammal
+ animal cat bat
+ locomotion walks flies
+ num_legs 4 2
+ num_wings 0 2
+
+ Args:
+ df: A pandas DataFrame.
+ *args: Valid inputs include: an exact column name to look for,
+ a shell-style glob string (e.g. `*_thing_*`),
+ a regular expression,
+ a callable,
+ or variable arguments of all the aforementioned.
+ A sequence of booleans is also acceptable.
+ A dictionary can be used for selection
+ on a MultiIndex on different levels.
+ invert: Whether or not to invert the selection.
+ This will result in the selection
+ of the complement of the columns provided.
+
+ Returns:
+ A pandas DataFrame with the specified columns selected.
+ """# noqa: E501
+
+ return_select(df,columns=list(args),invert=invert)
+
+
+ select_rows(df, *args, invert=False)
+ Source code in janitor/functions/select.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `jn.select` instead."
+    )
+)
+def select_rows(
+    df: pd.DataFrame,
+    *args: Any,
+    invert: bool = False,
+) -> pd.DataFrame:
+    """Method-chainable selection of rows.
+
+ It accepts a string, shell-like glob strings `(*string*)`,
+ regex, slice, array-like object, or a list of the previous options.
+
+ Selection on a MultiIndex on a level, or multiple levels,
+ is possible with a dictionary.
+
+ This method does not mutate the original DataFrame.
+
+    An optional ability to invert the selection of rows is available as well.
+
+
+ !!! info "New in version 0.24.0"
+
+ !!!note
+
+ The preferred option when selecting columns or rows in a Pandas DataFrame
+ is with `.loc` or `.iloc` methods, as they are generally performant.
+ `select_rows` is primarily for convenience.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `jn.select` instead.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = {"col1": [1, 2], "foo": [3, 4], "col2": [5, 6]}
+ >>> df = pd.DataFrame.from_dict(df, orient='index')
+ >>> df
+ 0 1
+ col1 1 2
+ foo 3 4
+ col2 5 6
+ >>> df.select_rows("col*")
+ 0 1
+ col1 1 2
+ col2 5 6
+
+ More examples can be found in the
+ [`select_columns`][janitor.functions.select.select_columns] section.
+
+ Args:
+ df: A pandas DataFrame.
+ *args: Valid inputs include: an exact index name to look for,
+ a shell-style glob string (e.g. `*_thing_*`),
+ a regular expression,
+ a callable,
+ or variable arguments of all the aforementioned.
+ A sequence of booleans is also acceptable.
+ A dictionary can be used for selection
+ on a MultiIndex on different levels.
+ invert: Whether or not to invert the selection.
+ This will result in the selection
+ of the complement of the rows provided.
+
+ Returns:
+ A pandas DataFrame with the specified rows selected.
+ """# noqa: E501
+ return_select(df,rows=list(args),invert=invert)
+
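Dictionary-based selection on a MultiIndex, mentioned in the Args above, is not covered by the doctest; a minimal sketch with an illustrative frame:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product(
        [["a", "b"], [1, 2]], names=["letter", "number"]
    ),
)

# Keep only the rows whose "letter" level equals "a".
subset = df.select_rows({"letter": "a"})
```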
+ transform_column(df, column_name, function, dest_column_name=None, elementwise=True)
+
+ Source code in janitor/functions/transform_columns.py
+
@pf.register_dataframe_method
+@deprecated_alias(col_name="column_name", dest_col_name="dest_column_name")
+def transform_column(
+    df: pd.DataFrame,
+    column_name: Hashable,
+    function: Callable,
+    dest_column_name: Optional[str] = None,
+    elementwise: bool = True,
+) -> pd.DataFrame:
+    """Transform the given column using the provided function.
+
+ Meant to be the method-chaining equivalent of:
+ ```python
+ df[dest_column_name] = df[column_name].apply(function)
+ ```
+
+ Functions can be applied in one of two ways:
+
+ - **Element-wise** (default; `elementwise=True`). Then, the individual
+ column elements will be passed in as the first argument of `function`.
+ - **Column-wise** (`elementwise=False`). Then, `function` is expected to
+ take in a pandas Series and return a sequence that is of identical length
+ to the original.
+
+ If `dest_column_name` is provided, then the transformation result is stored
+ in that column. Otherwise, the transformed result is stored under the name
+ of the original column.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ Transform a column in-place with an element-wise function.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": [2, 3, 4],
+ ... "b": ["area", "pyjanitor", "grapefruit"],
+ ... })
+ >>> df
+ a b
+ 0 2 area
+ 1 3 pyjanitor
+ 2 4 grapefruit
+ >>> df.transform_column(
+ ... column_name="a",
+ ... function=lambda x: x**2 - 1,
+ ... )
+ a b
+ 0 3 area
+ 1 8 pyjanitor
+ 2 15 grapefruit
+
+    Examples:
+        Transform a column in-place with a column-wise function.
+
+ >>> df.transform_column(
+ ... column_name="b",
+ ... function=lambda srs: srs.str[:5],
+ ... elementwise=False,
+ ... )
+ a b
+ 0 2 area
+ 1 3 pyjan
+ 2 4 grape
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: The column to transform.
+ function: A function to apply on the column.
+ dest_column_name: The column name to store the transformation result
+ in. Defaults to None, which will result in the original column
+ name being overwritten. If a name is provided here, then a new
+ column with the transformed values will be created.
+ elementwise: Whether to apply the function elementwise or not.
+ If `elementwise` is True, then the function's first argument
+ should be the data type of each datum in the column of data,
+ and should return a transformed datum.
+            If `elementwise` is False, then the function should expect
+            a pandas Series passed into it, and return a pandas Series.
+
+ Returns:
+ A pandas DataFrame with a transformed column.
+ """
+    check_column(df, column_name)
+
+    if dest_column_name is None:
+        dest_column_name = column_name
+    elif dest_column_name != column_name:
+        # If `dest_column_name` is provided and equals `column_name`, then we
+        # assume that the user's intent is to perform an in-place
+        # transformation (Same behaviour as when `dest_column_name` = None).
+        # Otherwise we throw an error if `dest_column_name` already exists in
+        # df.
+        check_column(df, dest_column_name, present=False)
+
+    result = _get_transform_column_result(
+        df[column_name],
+        function,
+        elementwise,
+    )
+
+    return df.assign(**{dest_column_name: result})
+
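Both doctests above overwrite the source column; `dest_column_name` stores the result in a new column instead. A minimal sketch; the frame and the column name `a_sq` are illustrative:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"a": [2, 3, 4]})

# Write the squares to a new column "a_sq"; column "a" is untouched.
out = df.transform_column(
    column_name="a",
    function=lambda x: x**2,
    dest_column_name="a_sq",
)
```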
+
+ truncate_datetime_dataframe(df, datepart)
+
 Truncate times down to a user-specified precision of
+year, month, day, hour, minute, or second.
+
 This method does not mutate the original DataFrame.
+
+ unionize_dataframe_categories(*dataframes, column_names=None)
+
def unionize_dataframe_categories(
+    *dataframes: Any,
+    column_names: Optional[Iterable[pd.CategoricalDtype]] = None,
+) -> List[pd.DataFrame]:
+    """
+ Given a group of dataframes which contain some categorical columns, for
+ each categorical column present, find all the possible categories across
+ all the dataframes which have that column.
+    Update each dataframe's corresponding column with a new categorical object
+    that contains the original data
+    but has labels for all the possible categories from all dataframes.
+ This is useful when concatenating a list of dataframes which all have the
+ same categorical columns into one dataframe.
+
+    If, for a given categorical column, the input dataframes do not all
+    contain at least one instance of every possible category,
+    Pandas will change the output dtype of that column from `category` to
+    `object`, losing out on the dramatic speed gains you get from the former
+    format.
+
+ Examples:
+ Usage example for concatenation of categorical column-containing
+ dataframes:
+
+ Instead of:
+
+ ```python
+ concatenated_df = pd.concat([df1, df2, df3], ignore_index=True)
+ ```
+
+ which in your case has resulted in `category` -> `object` conversion,
+ use:
+
+ ```python
+ unionized_dataframes = unionize_dataframe_categories(df1, df2, df2)
+ concatenated_df = pd.concat(unionized_dataframes, ignore_index=True)
+ ```
+
+ Args:
+ *dataframes: The dataframes you wish to unionize the categorical
+ objects for.
+ column_names: If supplied, only unionize this subset of columns.
+
+ Raises:
+ TypeError: If any of the inputs are not pandas DataFrames.
+
+ Returns:
+ A list of the category-unioned dataframes in the same order they
+ were provided.
+ """
+
+    if any(not isinstance(df, pd.DataFrame) for df in dataframes):
+        raise TypeError("Inputs must all be dataframes.")
+
+    if column_names is None:
+        # Find all columns across all dataframes that are categorical
+
+        column_names = set()
+
+        for dataframe in dataframes:
+            column_names = column_names.union(
+                [
+                    column_name
+                    for column_name in dataframe.columns
+                    if isinstance(
+                        dataframe[column_name].dtype, pd.CategoricalDtype
+                    )
+                ]
+            )
+
+    else:
+        column_names = [column_names]
+    # For each categorical column, find all possible values across the DFs
+
+    category_unions = {
+        column_name: union_categoricals(
+            [df[column_name] for df in dataframes if column_name in df.columns]
+        )
+        for column_name in column_names
+    }
+
+    # Make a shallow copy of all DFs and modify the categorical columns
+    # such that they can encode the union of all possible categories for each.
+
+    refactored_dfs = []
+
+    for df in dataframes:
+        df = df.copy(deep=False)
+
+        for column_name, categorical in category_unions.items():
+            if column_name in df.columns:
+                df[column_name] = pd.Categorical(
+                    df[column_name], categories=categorical.categories
+                )
+
+        refactored_dfs.append(df)
+
+    return refactored_dfs
+
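A runnable version of the instead-of/use pattern from the docstring above; the frames are illustrative, and the top-level `janitor` export is assumed:

```python
import pandas as pd
from janitor import unionize_dataframe_categories

df1 = pd.DataFrame({"fruit": pd.Categorical(["apple", "pear"])})
df2 = pd.DataFrame({"fruit": pd.Categorical(["banana"])})

# A naive concat downgrades "fruit" from category to object,
# because the two frames carry different category sets.
naive = pd.concat([df1, df2], ignore_index=True)

# Unionizing the categories first keeps the column categorical.
unioned = unionize_dataframe_categories(df1, df2)
fixed = pd.concat(unioned, ignore_index=True)
print(naive["fruit"].dtype, fixed["fruit"].dtype)  # object category
```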
+ add_column(df, column_name, value, fill_remaining=False)
+
+ Source code in janitor/functions/add_columns.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `pd.DataFrame.assign` instead."
+    )
+)
+@deprecated_alias(col_name="column_name")
+def add_column(
+    df: pd.DataFrame,
+    column_name: str,
+    value: Union[List[Any], Tuple[Any], Any],
+    fill_remaining: bool = False,
+) -> pd.DataFrame:
+    """Add a column to the dataframe.
+
+ Intended to be the method-chaining alternative to:
+
+ ```python
+ df[column_name] = value
+ ```
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.assign` instead.
+
+ Examples:
+ Add a column of constant values to the dataframe.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": list(range(3)), "b": list("abc")})
+ >>> df.add_column(column_name="c", value=1)
+ a b c
+ 0 0 a 1
+ 1 1 b 1
+ 2 2 c 1
+
+ Add a column of different values to the dataframe.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": list(range(3)), "b": list("abc")})
+ >>> df.add_column(column_name="c", value=list("efg"))
+ a b c
+ 0 0 a e
+ 1 1 b f
+ 2 2 c g
+
+ Add a column using an iterator.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": list(range(3)), "b": list("abc")})
+ >>> df.add_column(column_name="c", value=range(4, 7))
+ a b c
+ 0 0 a 4
+ 1 1 b 5
+ 2 2 c 6
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: Name of the new column. Should be a string, in order
+ for the column name to be compatible with the Feather binary
+ format (this is a useful thing to have).
+ value: Either a single value, or a list/tuple of values.
+ fill_remaining: If value is a tuple or list that is smaller than
+ the number of rows in the DataFrame, repeat the list or tuple
+ (R-style) to the end of the DataFrame.
+
+    Raises:
+        ValueError: If attempting to add a column that already exists.
+        ValueError: If `value` has more elements than the number of
+            rows in the DataFrame.
+        ValueError: If attempting to add an iterable of values with
+            a length not equal to the number of DataFrame rows.
+        ValueError: If `value` has a length of `0`.
+
+ Returns:
+ A pandas DataFrame with an added column.
+ """
+ check("column_name",column_name,[str])
+
+ ifcolumn_nameindf.columns:
+ raiseValueError(
+ f"Attempted to add column that already exists: "f"{column_name}."
+ )
+
+ nrows=len(df)
+
+ ifhasattr(value,"__len__")andnotisinstance(
+ value,(str,bytes,bytearray)
+ ):
+ len_value=len(value)
+
+ # if `value` is a list, ndarray, etc.
+ iflen_value>nrows:
+ raiseValueError(
+ "`value` has more elements than number of rows "
+ f"in your `DataFrame`. vals: {len_value}, "
+ f"df: {nrows}"
+ )
+ iflen_value!=nrowsandnotfill_remaining:
+ raiseValueError(
+ "Attempted to add iterable of values with length"
+ " not equal to number of DataFrame rows"
+ )
+ ifnotlen_value:
+ raiseValueError(
+ "`value` has to be an iterable of minimum length 1"
+ )
+
+ eliffill_remaining:
+ # relevant if a scalar val was passed, yet fill_remaining == True
+ len_value=1
+ value=[value]
+
+ df=df.copy()
+ iffill_remaining:
+ times_to_loop=int(np.ceil(nrows/len_value))
+ fill_values=list(value)*times_to_loop
+ df[column_name]=fill_values[:nrows]
+ else:
+ df[column_name]=value
+
+ returndf
+
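`fill_remaining` has no doctest above; a minimal sketch of the R-style recycling it performs, on an illustrative frame:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"a": list(range(5))})

# The two values are recycled until the column is full: [1, 2, 1, 2, 1].
out = df.add_column(column_name="b", value=[1, 2], fill_remaining=True)
```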
+ add_columns(df, fill_remaining=False, **kwargs)
+
+ Source code in janitor/functions/add_columns.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `pd.DataFrame.assign` instead."
+    )
+)
+def add_columns(
+    df: pd.DataFrame,
+    fill_remaining: bool = False,
+    **kwargs: Any,
+) -> pd.DataFrame:
+    """Add multiple columns to the dataframe.
+
+ This method does not mutate the original DataFrame.
+
+    Method to augment
+    [`add_column`][janitor.functions.add_columns.add_column]
+    with the ability to add multiple columns in
+    one go. This replaces the need for multiple
+    [`add_column`][janitor.functions.add_columns.add_column] calls.
+
+    Usage is through supplying kwargs where the key is the column name and the
+    values correspond to the values of the new DataFrame column.
+
+    Values passed can be scalar or iterable (list, ndarray, etc.).
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.assign` instead.
+
+ Examples:
+ Inserting two more columns into a dataframe.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": list(range(3)), "b": list("abc")})
+ >>> df.add_columns(x=4, y=list("def"))
+ a b x y
+ 0 0 a 4 d
+ 1 1 b 4 e
+ 2 2 c 4 f
+
+ Args:
+ df: A pandas DataFrame.
+ fill_remaining: If value is a tuple or list that is smaller than
+ the number of rows in the DataFrame, repeat the list or tuple
+ (R-style) to the end of the DataFrame. (Passed to
+ [`add_column`][janitor.functions.add_columns.add_column])
+ **kwargs: Column, value pairs which are looped through in
+ [`add_column`][janitor.functions.add_columns.add_column] calls.
+
+ Returns:
+ A pandas DataFrame with added columns.
+ """
+ # Note: error checking can pretty much be handled in `add_column`
+
+ forcol_name,valuesinkwargs.items():
+ df=df.add_column(col_name,values,fill_remaining=fill_remaining)
+
+ returndf
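
Given the deprecation note above, the same result can be had from plain pandas; a minimal equivalent sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": list(range(3)), "b": list("abc")})
# equivalent to df.add_columns(x=4, y=list("def")), without pyjanitor
out = df.assign(x=4, y=list("def"))
print(out)
```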
also

Implementation source for chainable function also.

also(df, func, *args, **kwargs)
@pf.register_dataframe_method
def also(
    df: pd.DataFrame, func: Callable, *args: Any, **kwargs: Any
) -> pd.DataFrame:
    """Run a function with side effects.

    This function allows you to run an arbitrary function
    in the `pyjanitor` method chain.
    Doing so will let you do things like save the dataframe to disk midway
    while continuing to modify the dataframe afterwards.

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> df = (
        ...     pd.DataFrame({"a": [1, 2, 3], "b": list("abc")})
        ...     .query("a > 1")
        ...     .also(lambda df: print(f"DataFrame shape is: {df.shape}"))
        ...     .rename_column(old_column_name="a", new_column_name="a_new")
        ...     .also(lambda df: df.to_csv("midpoint.csv"))
        ...     .also(
        ...         lambda df: print(f"Columns: {df.columns}")
        ...     )
        ... )
        DataFrame shape is: (2, 2)
        Columns: Index(['a_new', 'b'], dtype='object')

    Args:
        df: A pandas DataFrame.
        func: A function you would like to run in the method chain.
            It should take one DataFrame object as a parameter and have no return.
            If there is a return, it will be ignored.
        *args: Optional arguments for `func`.
        **kwargs: Optional keyword arguments for `func`.

    Returns:
        The input pandas DataFrame, unmodified.
    """  # noqa: E501
    func(df.copy(), *args, **kwargs)
    return df
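
Note that `func` receives a copy of the DataFrame, so even an in-place mutation inside the side-effect function leaves the chain untouched; a quick sketch:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"a": [1, 2, 3]})
# the lambda mutates only the copy passed to it
out = df.also(lambda d: d.drop(columns="a", inplace=True))
assert "a" in out.columns  # the chained DataFrame is unchanged
```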
+
@pf.register_dataframe_method
@deprecated_alias(
    from_column="from_column_name",
    to_column="to_column_name",
    num_bins="bins",
)
def bin_numeric(
    df: pd.DataFrame,
    from_column_name: str,
    to_column_name: str,
    bins: Optional[Union[int, ScalarSequence, pd.IntervalIndex]] = 5,
    **kwargs: Any,
) -> pd.DataFrame:
    """Generate a new column that labels bins for a specified numeric column.
+
+ This method does not mutate the original DataFrame.
+
+ A wrapper around the pandas [`cut()`][pd_cut_docs] function to bin data of
+ one column, generating a new column with the results.
+
+ [pd_cut_docs]: https://pandas.pydata.org/docs/reference/api/pandas.cut.html
+
+ Examples:
+ Binning a numeric column with specific bin edges.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": [3, 6, 9, 12, 15]})
+ >>> df.bin_numeric(
+ ... from_column_name="a", to_column_name="a_binned",
+ ... bins=[0, 5, 11, 15],
+ ... )
+ a a_binned
+ 0 3 (0, 5]
+ 1 6 (5, 11]
+ 2 9 (5, 11]
+ 3 12 (11, 15]
+ 4 15 (11, 15]
+
+ Args:
+ df: A pandas DataFrame.
+ from_column_name: The column whose data you want binned.
+ to_column_name: The new column to be created with the binned data.
+ bins: The binning strategy to be utilized. Read the `pd.cut`
+ documentation for more details.
+ **kwargs: Additional kwargs to pass to `pd.cut`, except `retbins`.
+
+ Raises:
+ ValueError: If `retbins` is passed in as a kwarg.
+
+ Returns:
+ A pandas DataFrame.
+ """
+ if"retbins"inkwargs:
+ raiseValueError("`retbins` is not an acceptable keyword argument.")
+
+ check("from_column_name",from_column_name,[str])
+ check("to_column_name",to_column_name,[str])
+ check_column(df,from_column_name)
+
+ df=df.assign(
+ **{
+ to_column_name:pd.cut(df[from_column_name],bins=bins,**kwargs),
+ }
+ )
+
+ returndf
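
Because extra keyword arguments are forwarded to `pd.cut`, the bins can be labeled directly; a minimal sketch (assuming pyjanitor is installed):

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"a": [3, 6, 9, 12, 15]})
out = df.bin_numeric(
    from_column_name="a",
    to_column_name="a_binned",
    bins=[0, 5, 11, 15],
    labels=["low", "mid", "high"],  # forwarded to pd.cut
)
print(out.a_binned.tolist())  # ['low', 'mid', 'mid', 'high', 'high']
```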
case_when(df, *args, default=None, column_name)

Source code in janitor/functions/case_when.py
+
@pf.register_dataframe_method
@refactored_function(
    message=(
        "This function will be deprecated in a 1.x release. "
        "Please use `pd.Series.case_when` instead."
    )
)
def case_when(
    df: pd.DataFrame, *args: Any, default: Any = None, column_name: str
) -> pd.DataFrame:
    """Create a column based on a condition or multiple conditions.
+
+ Similar to SQL and dplyr's case_when
+ with inspiration from `pydatatable` if_else function.
+
+ If your scenario requires direct replacement of values,
+ pandas' `replace` method or `map` method should be better
+ suited and more efficient; if the conditions check
+ if a value is within a range of values, pandas' `cut` or `qcut`
+ should be more efficient; `np.where/np.select` are also
+ performant options.
+
+ This function relies on `pd.Series.mask` method.
+
+ When multiple conditions are satisfied, the first one is used.
+
+ The variable `*args` parameters takes arguments of the form :
+ `condition0`, `value0`, `condition1`, `value1`, ..., `default`.
+ If `condition0` evaluates to `True`, then assign `value0` to
+ `column_name`, if `condition1` evaluates to `True`, then
+ assign `value1` to `column_name`, and so on. If none of the
+ conditions evaluate to `True`, assign `default` to
+ `column_name`.
+
+ This function can be likened to SQL's `case_when`:
+
+ ```sql
+ CASE WHEN condition0 THEN value0
+ WHEN condition1 THEN value1
+ --- more conditions
+ ELSE default
+ END AS column_name
+ ```
+
+ compared to python's `if-elif-else`:
+
+ ```python
+ if condition0:
+ value0
+ elif condition1:
+ value1
+ # more elifs
+ else:
+ default
+ ```
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame(
+ ... {
+ ... "a": [0, 0, 1, 2, "hi"],
+ ... "b": [0, 3, 4, 5, "bye"],
+ ... "c": [6, 7, 8, 9, "wait"],
+ ... }
+ ... )
+ >>> df
+ a b c
+ 0 0 0 6
+ 1 0 3 7
+ 2 1 4 8
+ 3 2 5 9
+ 4 hi bye wait
+ >>> df.case_when(
+ ... ((df.a == 0) & (df.b != 0)) | (df.c == "wait"), df.a,
+ ... (df.b == 0) & (df.a == 0), "x",
+ ... default = df.c,
+ ... column_name = "value",
+ ... )
+ a b c value
+ 0 0 0 6 x
+ 1 0 3 7 0
+ 2 1 4 8 8
+ 3 2 5 9 9
+ 4 hi bye wait hi
+
+ !!! abstract "Version Changed"
+
+ - 0.24.0
+ - Added `default` parameter.
+
+ Args:
+ df: A pandas DataFrame.
+ *args: Variable argument of conditions and expected values.
+ Takes the form
+ `condition0`, `value0`, `condition1`, `value1`, ... .
+ `condition` can be a 1-D boolean array, a callable, or a string.
+ If `condition` is a callable, it should evaluate
+ to a 1-D boolean array. The array should have the same length
+ as the DataFrame. If it is a string, it is computed on the dataframe,
+ via `df.eval`, and should return a 1-D boolean array.
+ `result` can be a scalar, a 1-D array, or a callable.
+ If `result` is a callable, it should evaluate to a 1-D array.
+ For a 1-D array, it should have the same length as the DataFrame.
+ default: This is the element inserted in the output
+ when all conditions evaluate to False.
+ Can be scalar, 1-D array or callable.
+ If callable, it should evaluate to a 1-D array.
+ The 1-D array should be the same length as the DataFrame.
+ column_name: Name of column to assign results to. A new column
+ is created if it does not already exist in the DataFrame.
+
+ Raises:
+ ValueError: If condition/value fails to evaluate.
+
+ Returns:
+ A pandas DataFrame.
+ """# noqa: E501
+ # Preliminary checks on the case_when function.
+ # The bare minimum checks are done; the remaining checks
+ # are done within `pd.Series.mask`.
+ check("column_name",column_name,[str])
+ len_args=len(args)
+ iflen_args<2:
+ raiseValueError(
+ "At least two arguments are required for the `args` parameter"
+ )
+
+ iflen_args%2:
+ ifdefaultisNone:
+ warnings.warn(
+ "The last argument in the variable arguments "
+ "has been assigned as the default. "
+ "Note however that this will be deprecated "
+ "in a future release; use an even number "
+ "of boolean conditions and values, "
+ "and pass the default argument to the `default` "
+ "parameter instead.",
+ DeprecationWarning,
+ stacklevel=find_stack_level(),
+ )
+ *args,default=args
+ else:
+ raiseValueError(
+ "The number of conditions and values do not match. "
+ f"There are {len_args-len_args//2} conditions "
+ f"and {len_args//2} values."
+ )
+
+ booleans=[]
+ replacements=[]
+
+ forindex,valueinenumerate(args):
+ ifindex%2:
+ ifcallable(value):
+ value=apply_if_callable(value,df)
+ replacements.append(value)
+ else:
+ ifcallable(value):
+ value=apply_if_callable(value,df)
+ elifisinstance(value,str):
+ value=df.eval(value)
+ booleans.append(value)
+
+ ifcallable(default):
+ default=apply_if_callable(default,df)
+ ifis_scalar(default):
+ default=pd.Series([default]).repeat(len(df))
+ ifnothasattr(default,"shape"):
+ default=pd.Series([*default])
+ ifisinstance(default,pd.Index):
+ arr_ndim=default.nlevels
+ else:
+ arr_ndim=default.ndim
+ ifarr_ndim!=1:
+ raiseValueError(
+ "The argument for the `default` parameter "
+ "should either be a 1-D array, a scalar, "
+ "or a callable that can evaluate to a 1-D array."
+ )
+ ifnotisinstance(default,pd.Series):
+ default=pd.Series(default)
+ default.index=df.index
+ # actual computation
+ # ensures value assignment is on a first come basis
+ booleans=booleans[::-1]
+ replacements=replacements[::-1]
+ forindex,(condition,value)inenumerate(zip(booleans,replacements)):
+ try:
+ default=default.mask(condition,value)
+ # error `feedoff` idea from SO
+ # https://stackoverflow.com/a/46091127/7175713
+ exceptExceptionaserror:
+ raiseValueError(
+ f"condition{index} and value{index} failed to evaluate. "
+ f"Original error message: {error}"
+ )fromerror
+
+ returndf.assign(**{column_name:default})
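
For the replacement named in the deprecation message, pandas 2.2+ ships `pd.Series.case_when`; a rough equivalent sketch, where the starting Series supplies the default and `caselist` holds (condition, replacement) pairs, with the first matching condition winning:

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1, 2], "b": [0, 3, 4, 5]})
# default comes from the Series the method is called on (column "b")
df = df.assign(
    value=df.b.case_when(
        caselist=[
            (df.a == 0, -1),   # (condition, replacement)
            (df.a.gt(1), 99),
        ]
    )
)
print(df)
```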
change_index_dtype

Implementation of the change_index_dtype function.

change_index_dtype(df, dtype, axis='index')
Cast an index to a specified dtype.

This method does not mutate the original DataFrame.
change_type(df, column_name, dtype, ignore_exception=False)
@pf.register_dataframe_method
@refactored_function(
    message=(
        "This function will be deprecated in a 1.x release. "
        "Please use `pd.DataFrame.astype` instead."
    )
)
@deprecated_alias(column="column_name")
def change_type(
    df: pd.DataFrame,
    column_name: Hashable | list[Hashable] | pd.Index,
    dtype: type,
    ignore_exception: bool = False,
) -> pd.DataFrame:
    """Change the type of a column.
+
+ This method does not mutate the original DataFrame.
+
+ Exceptions that are raised can be ignored. For example, if one has a mixed
+ dtype column that has non-integer strings and integers, and you want to
+ coerce everything to integers, you can optionally ignore the non-integer
+ strings and replace them with `NaN` or keep the original value.
+
+ Intended to be the method-chaining alternative to:
+
+ ```python
+ df[col] = df[col].astype(dtype)
+ ```
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.astype` instead.
+
+ Examples:
+ Change the type of a column.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"col1": range(3), "col2": ["m", 5, True]})
+ >>> df
+ col1 col2
+ 0 0 m
+ 1 1 5
+ 2 2 True
+ >>> df.change_type(
+ ... "col1", dtype=str,
+ ... ).change_type(
+ ... "col2", dtype=float, ignore_exception="fillna",
+ ... )
+ col1 col2
+ 0 0 NaN
+ 1 1 5.0
+ 2 2 1.0
+
+ Change the type of multiple columns. To change the type of all columns,
+ please use `DataFrame.astype` instead.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"col1": range(3), "col2": ["m", 5, True]})
+ >>> df.change_type(['col1', 'col2'], str)
+ col1 col2
+ 0 0 m
+ 1 1 5
+ 2 2 True
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: The column(s) in the dataframe.
+ dtype: The datatype to convert to. Should be one of the standard
+ Python types, or a numpy datatype.
+ ignore_exception: One of `{False, "fillna", "keep_values"}`.
+
+ Raises:
+ ValueError: If unknown option provided for `ignore_exception`.
+
+ Returns:
+ A pandas DataFrame with changed column types.
+ """# noqa: E501
+
+ df=df.copy()# avoid mutating the original DataFrame
+ ifnotignore_exception:
+ df[column_name]=df[column_name].astype(dtype)
+ elifignore_exception=="keep_values":
+ df[column_name]=df[column_name].astype(dtype,errors="ignore")
+ elifignore_exception=="fillna":
+ ifisinstance(column_name,Hashable):
+ column_name=[column_name]
+ df[column_name]=df[column_name].map(_convert,dtype=dtype)
+ else:
+ raiseValueError("Unknown option for ignore_exception")
+
+ returndf
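
As the deprecation note suggests, plain pandas covers the common paths; a minimal sketch of the equivalents, using `pd.to_numeric` to approximate the `"fillna"` behavior for numeric targets:

```python
import pandas as pd

df = pd.DataFrame({"col1": range(3), "col2": ["m", 5, True]})
# equivalent to df.change_type("col1", dtype=str)
df = df.astype({"col1": str})
# approximates ignore_exception="fillna": non-convertible values become NaN
df["col2"] = pd.to_numeric(df["col2"], errors="coerce")
print(df)
```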
clean_names(df, axis='columns', column_names=None, strip_underscores=None, case_type='lower', remove_special=False, strip_accents=True, preserve_original_labels=True, enforce_string=True, truncate_limit=None)

Source code in janitor/functions/clean_names.py
@pf.register_dataframe_method
@deprecated_alias(preserve_original_columns="preserve_original_labels")
def clean_names(
    df: pd.DataFrame,
    axis: Union[str, None] = "columns",
    column_names: Union[str, list] = None,
    strip_underscores: Optional[Union[str, bool]] = None,
    case_type: str = "lower",
    remove_special: bool = False,
    strip_accents: bool = True,
    preserve_original_labels: bool = True,
    enforce_string: bool = True,
    truncate_limit: int = None,
) -> pd.DataFrame:
    """Clean column/index names. It can also be applied to column values.
+
+ Takes all column names, converts them to lowercase,
+ then replaces all spaces with underscores.
+
+ By default, column names are converted to string types.
+ This can be switched off by passing in `enforce_string=False`.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame(
+ ... {
+ ... "Aloha": range(3),
+ ... "Bell Chart": range(3),
+ ... "Animals@#$%^": range(3)
+ ... }
+ ... )
+ >>> df
+ Aloha Bell Chart Animals@#$%^
+ 0 0 0 0
+ 1 1 1 1
+ 2 2 2 2
+ >>> df.clean_names()
+ aloha bell_chart animals@#$%^
+ 0 0 0 0
+ 1 1 1 1
+ 2 2 2 2
+ >>> df.clean_names(remove_special=True)
+ aloha bell_chart animals
+ 0 0 0 0
+ 1 1 1 1
+ 2 2 2 2
+
+ !!! summary "Version Changed"
+
+ - 0.26.0
+ - Added `axis` and `column_names` parameters.
+
+ Args:
+ df: The pandas DataFrame object.
+ axis: Whether to clean the labels on the index or columns.
+ If `None`, applies to a defined column
+ or columns in `column_names`.
+ column_names: Clean the values in a column.
+ `axis` should be `None`.
+ Column selection is possible using the
+ [`select`][janitor.functions.select.select] syntax.
+ strip_underscores: Removes the outer underscores from all
+ column names. Default None keeps outer underscores. Values can be
+ either 'left', 'right' or 'both' or the respective shorthand 'l',
+ 'r' and True.
+ case_type: Whether to make columns lower or uppercase.
+ Current case may be preserved with 'preserve',
+ while snake case conversion (from CamelCase or camelCase only)
+ can be turned on using "snake".
+ Default 'lower' makes all characters lowercase.
+ remove_special: Remove special characters from columns.
+ Only letters, numbers and underscores are preserved.
+ strip_accents: Whether or not to remove accents from
+ columns names.
+ preserve_original_labels: Preserve original names.
+ This is later retrievable using `df.original_labels`.
+ Applies if `axis` is not None.
+ enforce_string: Whether or not to convert all column names
+ to string type. Defaults to True, but can be turned off.
+ Columns with >1 levels will not be converted by default.
+ truncate_limit: Truncates formatted column names to
+ the specified length. Default None does not truncate.
+
+ Raises:
+ ValueError: If `axis=None` and `column_names=None`.
+
+ Returns:
+ A pandas DataFrame.
+ """
+ ifnotaxisandnotcolumn_names:
+ raiseValueError(
+ "Kindly provide an argument to `column_names`, if axis is None."
+ )
+ ifaxisisNone:
+ column_names=get_index_labels(
+ arg=column_names,df=df,axis="columns"
+ )
+ ifis_scalar(column_names):
+ column_names=[column_names]
+ df=df.copy()
+ forcolumn_nameincolumn_names:
+ df[column_name]=_clean_names_single_object(
+ obj=df[column_name],
+ enforce_string=enforce_string,
+ case_type=case_type,
+ remove_special=remove_special,
+ strip_accents=strip_accents,
+ strip_underscores=strip_underscores,
+ truncate_limit=truncate_limit,
+ )
+ returndf
+
+ assertaxisin{"index","columns"}
+ df=df[:]
+ target_axis=getattr(df,axis)
+ ifisinstance(target_axis,pd.MultiIndex):
+ target_axis=[
+ target_axis.get_level_values(number)
+ fornumberinrange(target_axis.nlevels)
+ ]
+ target_axis=[
+ _clean_names_single_object(
+ obj=obj,
+ enforce_string=enforce_string,
+ case_type=case_type,
+ remove_special=remove_special,
+ strip_accents=strip_accents,
+ strip_underscores=strip_underscores,
+ truncate_limit=truncate_limit,
+ )
+ forobjintarget_axis
+ ]
+ else:
+ target_axis=_clean_names_single_object(
+ obj=target_axis,
+ enforce_string=enforce_string,
+ case_type=case_type,
+ remove_special=remove_special,
+ strip_accents=strip_accents,
+ strip_underscores=strip_underscores,
+ truncate_limit=truncate_limit,
+ )
+ # Store the original column names, if enabled by user
+ ifpreserve_original_labels:
+ df.__dict__["original_labels"]=getattr(df,axis)
+ setattr(df,axis,target_axis)
+ returndf
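
Cleaning the values inside a column, rather than the axis labels, works by combining `axis=None` with `column_names`; a minimal sketch (assuming pyjanitor is installed):

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"raw": ["First Name", "Température Max"]})
out = df.clean_names(axis=None, column_names="raw")
# lowercased, accents stripped, spaces replaced with underscores,
# e.g. ['first_name', 'temperature_max']
print(out.raw.tolist())
```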
coalesce(df, *column_names, target_column_name=None, default_value=None)

Source code in janitor/functions/coalesce.py
@pf.register_dataframe_method
@deprecated_alias(columns="column_names", new_column_name="target_column_name")
def coalesce(
    df: pd.DataFrame,
    *column_names: Any,
    target_column_name: Optional[str] = None,
    default_value: Optional[Union[int, float, str]] = None,
) -> pd.DataFrame:
    """Coalesce two or more columns of data in order of column names provided.
+
+ Given the variable arguments of column names,
+ `coalesce` finds and returns the first non-missing value
+ from these columns, for every row in the input dataframe.
+ If all the column values are null for a particular row,
+ then the `default_value` will be filled in.
+
+ If `target_column_name` is not provided,
+ then the first column is coalesced.
+
+ This method does not mutate the original DataFrame.
+
+ The [`select`][janitor.functions.select.select] syntax
+ can be used in `column_names`.
+
+ Examples:
+ Use `coalesce` with 3 columns, "a", "b" and "c".
+
+ >>> import pandas as pd
+ >>> import numpy as np
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": [np.nan, 1, np.nan],
+ ... "b": [2, 3, np.nan],
+ ... "c": [4, np.nan, np.nan],
+ ... })
+ >>> df.coalesce("a", "b", "c")
+ a b c
+ 0 2.0 2.0 4.0
+ 1 1.0 3.0 NaN
+ 2 NaN NaN NaN
+
+ Provide a target_column_name.
+
+ >>> df.coalesce("a", "b", "c", target_column_name="new_col")
+ a b c new_col
+ 0 NaN 2.0 4.0 2.0
+ 1 1.0 3.0 NaN 1.0
+ 2 NaN NaN NaN NaN
+
+ Provide a default value.
+
+ >>> import pandas as pd
+ >>> import numpy as np
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": [1, np.nan, np.nan],
+ ... "b": [2, 3, np.nan],
+ ... })
+ >>> df.coalesce(
+ ... "a", "b",
+ ... target_column_name="new_col",
+ ... default_value=-1,
+ ... )
+ a b new_col
+ 0 1.0 2.0 1.0
+ 1 NaN 3.0 3.0
+ 2 NaN NaN -1.0
+
+ This is more syntactic diabetes! For R users, this should look familiar to
+ `dplyr`'s `coalesce` function; for Python users, the interface
+ should be more intuitive than the `pandas.Series.combine_first`
+ method.
+
+ Args:
+ df: A pandas DataFrame.
+ column_names: A list of column names.
+ target_column_name: The new column name after combining.
+ If `None`, then the first column in `column_names` is updated,
+ with the Null values replaced.
+ default_value: A scalar to replace any remaining nulls
+ after coalescing.
+
+ Raises:
+ ValueError: If length of `column_names` is less than 2.
+
+ Returns:
+ A pandas DataFrame with coalesced columns.
+ """
+
+ ifnotcolumn_names:
+ returndf
+
+ indexers=_select_index([*column_names],df,axis="columns")
+
+ iflen(indexers)<2:
+ raiseValueError(
+ "The number of columns to coalesce should be a minimum of 2."
+ )
+
+ iftarget_column_name:
+ check("target_column_name",target_column_name,[str])
+
+ ifdefault_value:
+ check("default_value",default_value,[int,float,str])
+
+ df=df.copy()
+
+ outcome=df.iloc[:,indexers[0]]
+
+ fornuminrange(1,len(indexers)):
+ position=indexers[num]
+ replacement=df.iloc[:,position]
+ outcome=outcome.fillna(replacement)
+
+ ifoutcome.hasnansand(default_valueisnotNone):
+ outcome=outcome.fillna(default_value)
+
+ iftarget_column_nameisNone:
+ df.iloc[:,indexers[0]]=outcome
+ else:
+ df[target_column_name]=outcome
+
+ returndf
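
For comparison, the core of `coalesce` can be sketched in plain pandas with a row-wise backfill; the pyjanitor version adds select syntax, validation, and `default_value` handling on top:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "a": [np.nan, 1, np.nan],
    "b": [2, 3, np.nan],
    "c": [4, np.nan, np.nan],
})
# first non-missing value per row, scanning columns left to right
df["new_col"] = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
print(df)
```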
collapse_levels(df, sep=None, glue=None, axis='columns')
@pf.register_dataframe_method
def collapse_levels(
    df: pd.DataFrame,
    sep: Union[str, None] = None,
    glue: Union[str, None] = None,
    axis="columns",
) -> pd.DataFrame:
    """Flatten multi-level index/column dataframe to a single level.
+
+ This method does not mutate the original DataFrame.
+
+ Given a DataFrame containing multi-level index/columns, flatten to single-level
+ by string-joining the labels in each level.
+
+ After a `groupby` / `aggregate` operation where `.agg()` is passed a
+ list of multiple aggregation functions, a multi-level DataFrame is
+ returned with the name of the function applied in the second level.
+
+ It is sometimes convenient for later indexing to flatten out this
+ multi-level configuration back into a single level. This function does
+ this through a simple string-joining of all the names across different
+ levels in a single column.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "class": ["bird", "bird", "bird", "mammal", "mammal"],
+ ... "max_speed": [389, 389, 24, 80, 21],
+ ... "type": ["falcon", "falcon", "parrot", "Lion", "Monkey"],
+ ... })
+ >>> df
+ class max_speed type
+ 0 bird 389 falcon
+ 1 bird 389 falcon
+ 2 bird 24 parrot
+ 3 mammal 80 Lion
+ 4 mammal 21 Monkey
+ >>> grouped_df = df.groupby("class")[['max_speed']].agg(["mean", "median"])
+ >>> grouped_df # doctest: +NORMALIZE_WHITESPACE
+ max_speed
+ mean median
+ class
+ bird 267.333333 389.0
+ mammal 50.500000 50.5
+ >>> grouped_df.collapse_levels(sep="_") # doctest: +NORMALIZE_WHITESPACE
+ max_speed_mean max_speed_median
+ class
+ bird 267.333333 389.0
+ mammal 50.500000 50.5
+
+ Before applying `.collapse_levels`, the `.agg` operation returns a
+ multi-level column DataFrame whose columns are `(level 1, level 2)`:
+
+ ```python
+ [("max_speed", "mean"), ("max_speed", "median")]
+ ```
+
+ `.collapse_levels` then flattens the column MultiIndex into a single
+ level index with names:
+
+ ```python
+ ["max_speed_mean", "max_speed_median"]
+ ```
+
+ For more control, a `glue` specification can be passed,
+ where the names of the levels are used to control the output of the
+ flattened index:
+ >>> (grouped_df
+ ... .rename_axis(columns=['column_name', 'agg_name'])
+ ... .collapse_levels(glue="{agg_name}_{column_name}")
+ ... )
+ mean_max_speed median_max_speed
+ class
+ bird 267.333333 389.0
+ mammal 50.500000 50.5
+
+ Note that for `glue` to work, the keyword arguments
+ in the glue specification
+ should be the names of the levels in the MultiIndex.
+
+ !!! abstract "Version Changed"
+
+ - 0.27.0
+ - Added `glue` and `axis` parameters.
+
+ Args:
+ df: A pandas DataFrame.
+ sep: String separator used to join the column level names.
+ glue: A specification on how the column levels should be combined.
+ It allows for a more granular composition,
+ and serves as an alternative to `sep`.
+ axis: Determines whether to collapse the
+ levels on the index or columns.
+
+ Returns:
+ A pandas DataFrame with single-level column index.
+ """# noqa: E501
+ if(sepisnotNone)and(glueisnotNone):
+ raiseValueError("Only one of sep or glue should be provided.")
+ ifsepisnotNone:
+ check("sep",sep,[str])
+ ifglueisnotNone:
+ check("glue",glue,[str])
+ check("axis",axis,[str])
+ ifaxisnotin{"index","columns"}:
+ raiseValueError(
+ "axis argument should be either 'index' or 'columns'."
+ )
+
+ ifnotisinstance(getattr(df,axis),pd.MultiIndex):
+ returndf
+
+ # TODO: Pyarrow offers faster string computations
+ # future work should take this into consideration,
+ # which would require a different route from python's string.join
+ # since work is only on the columns
+ # it is safe, and more efficient to slice/view the dataframe
+ # plus Pandas creates a new Index altogether
+ # as such, the original dataframe is not modified
+ df=df[:]
+ new_index=getattr(df,axis)
+ ifglueisnotNone:
+ new_index=[dict(zip(new_index.names,entry))forentryinnew_index]
+ new_index=[glue.format_map(mapping)formappinginnew_index]
+ setattr(df,axis,new_index)
+ returndf
+ sep="_"ifsepisNoneelsesep
+ levels=[levelforlevelinnew_index.levels]
+ all_strings=all(map(is_string_dtype,levels))
+ ifall_strings:
+ no_empty_string=all((entry!="").all()forentryinlevels)
+ ifno_empty_string:
+ new_index=new_index.map(sep.join)
+ setattr(df,axis,new_index)
+ returndf
+ new_index=(map(str,entry)forentryinnew_index)
+ new_index=[
+ # faster to use a list comprehension within string.join
+ # compared to a generator
+ # https://stackoverflow.com/a/37782238
+ sep.join([entryforentryinwordifentry])
+ forwordinnew_index
+ ]
+ setattr(df,axis,new_index)
+ returndf
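
Since `axis` can also target the row index, a short sketch of flattening a MultiIndex produced by a two-key groupby (assuming pyjanitor is installed):

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({
    "cls": ["bird", "bird", "mammal"],
    "kind": ["falcon", "parrot", "lion"],
    "speed": [389, 24, 80],
})
out = df.groupby(["cls", "kind"]).agg("mean").collapse_levels(axis="index")
print(out.index.tolist())  # ['bird_falcon', 'bird_parrot', 'mammal_lion']
```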
complete(df, *columns, sort=False, by=None, fill_value=None, explicit=True)

Source code in janitor/functions/complete.py
@pf.register_dataframe_method
def complete(
    df: pd.DataFrame,
    *columns: Any,
    sort: bool = False,
    by: Optional[Union[list, str]] = None,
    fill_value: Optional[Union[Dict, Any]] = None,
    explicit: bool = True,
) -> pd.DataFrame:
    """Complete a data frame with missing combinations of data.
+
+ It is modeled after tidyr's `complete` function, and is a wrapper around
+ [`expand_grid`][janitor.functions.expand_grid.expand_grid], `pd.merge`
+ and `pd.fillna`. In a way, it is the inverse of `pd.dropna`, as it exposes
+ implicitly missing rows.
+
+ Combinations of column names or a list/tuple of column names, or even a
+ dictionary of column names and new values are possible.
+ If a dictionary is passed,
+ the user is required to ensure that the values are unique 1-D arrays.
+ The keys in a dictionary must be present in the dataframe.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> import numpy as np
+ >>> df = pd.DataFrame(
+ ... {
+ ... "Year": [1999, 2000, 2004, 1999, 2004],
+ ... "Taxon": [
+ ... "Saccharina",
+ ... "Saccharina",
+ ... "Saccharina",
+ ... "Agarum",
+ ... "Agarum",
+ ... ],
+ ... "Abundance": [4, 5, 2, 1, 8],
+ ... }
+ ... )
+ >>> df
+ Year Taxon Abundance
+ 0 1999 Saccharina 4
+ 1 2000 Saccharina 5
+ 2 2004 Saccharina 2
+ 3 1999 Agarum 1
+ 4 2004 Agarum 8
+
+ Expose missing pairings of `Year` and `Taxon`:
+ >>> df.complete("Year", "Taxon", sort=True)
+ Year Taxon Abundance
+ 0 1999 Agarum 1.0
+ 1 1999 Saccharina 4.0
+ 2 2000 Agarum NaN
+ 3 2000 Saccharina 5.0
+ 4 2004 Agarum 8.0
+ 5 2004 Saccharina 2.0
+
+ Expose missing years from 1999 to 2004:
+ >>> df.complete(
+ ... {"Year": range(df.Year.min(), df.Year.max() + 1)},
+ ... "Taxon",
+ ... sort=True
+ ... )
+ Year Taxon Abundance
+ 0 1999 Agarum 1.0
+ 1 1999 Saccharina 4.0
+ 2 2000 Agarum NaN
+ 3 2000 Saccharina 5.0
+ 4 2001 Agarum NaN
+ 5 2001 Saccharina NaN
+ 6 2002 Agarum NaN
+ 7 2002 Saccharina NaN
+ 8 2003 Agarum NaN
+ 9 2003 Saccharina NaN
+ 10 2004 Agarum 8.0
+ 11 2004 Saccharina 2.0
+
+ Fill missing values:
+ >>> df = pd.DataFrame(
+ ... dict(
+ ... group=(1, 2, 1, 2),
+ ... item_id=(1, 2, 2, 3),
+ ... item_name=("a", "a", "b", "b"),
+ ... value1=(1, np.nan, 3, 4),
+ ... value2=range(4, 8),
+ ... )
+ ... )
+ >>> df
+ group item_id item_name value1 value2
+ 0 1 1 a 1.0 4
+ 1 2 2 a NaN 5
+ 2 1 2 b 3.0 6
+ 3 2 3 b 4.0 7
+ >>> df.complete(
+ ... "group",
+ ... ("item_id", "item_name"),
+ ... fill_value={"value1": 0, "value2": 99},
+ ... sort=True
+ ... )
+ group item_id item_name value1 value2
+ 0 1 1 a 1.0 4.0
+ 1 1 2 a 0.0 99.0
+ 2 1 2 b 3.0 6.0
+ 3 1 3 b 0.0 99.0
+ 4 2 1 a 0.0 99.0
+ 5 2 2 a 0.0 5.0
+ 6 2 2 b 0.0 99.0
+ 7 2 3 b 4.0 7.0
+
+ Limit the fill to only implicit missing values
+ by setting explicit to `False`:
+ >>> df.complete(
+ ... "group",
+ ... ("item_id", "item_name"),
+ ... fill_value={"value1": 0, "value2": 99},
+ ... explicit=False,
+ ... sort=True
+ ... )
+ group item_id item_name value1 value2
+ 0 1 1 a 1.0 4.0
+ 1 1 2 a 0.0 99.0
+ 2 1 2 b 3.0 6.0
+ 3 1 3 b 0.0 99.0
+ 4 2 1 a 0.0 99.0
+ 5 2 2 a NaN 5.0
+ 6 2 2 b 0.0 99.0
+ 7 2 3 b 4.0 7.0
+
+ Args:
+ df: A pandas DataFrame.
+ *columns: This refers to the columns to be completed.
+ It could be column labels (string type),
+ a list/tuple of column labels, or a dictionary that pairs
+ column labels with new values.
+ sort: Sort DataFrame based on *columns.
+ by: Label or list of labels to group by.
+ The explicit missing rows are returned per group.
+ fill_value: Scalar value to use instead of NaN
+ for missing combinations. A dictionary, mapping columns names
+ to a scalar value is also accepted.
+ explicit: Determines if only implicitly missing values
+ should be filled (`False`), or all nulls existing in the dataframe
+ (`True`). `explicit` is applicable only
+ if `fill_value` is not `None`.
+
+ Returns:
+ A pandas DataFrame with explicit missing rows, if any.
+ """# noqa: E501
+
+ ifnotcolumns:
+ returndf
+
+ # no copy made of the original dataframe
+ # since pd.merge (computed some lines below)
+ # makes a new object - essentially a copy
+ return_computations_complete(df,columns,sort,by,fill_value,explicit)
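
The `by` parameter exposes the missing combinations within each group rather than across the whole frame; a minimal sketch (assuming pyjanitor is installed):

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({
    "state": ["CA", "CA", "HI", "HI"],
    "year": [2010, 2013, 2010, 2012],
    "count": [1, 3, 1, 2],
})
# complete the year range separately within each state
out = df.complete({"year": range(2010, 2014)}, by="state", sort=True)
print(out)
```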
concatenate_columns(df, column_names, new_column_name, sep='-', ignore_empty=True)

Source code in janitor/functions/concatenate_columns.py
@pf.register_dataframe_method
@deprecated_alias(columns="column_names")
def concatenate_columns(
    df: pd.DataFrame,
    column_names: List[Hashable],
    new_column_name: Hashable,
    sep: str = "-",
    ignore_empty: bool = True,
) -> pd.DataFrame:
    """Concatenates the set of columns into a single column.
+
+ Used to quickly generate an index based on a group of columns.
+
+ This method mutates the original DataFrame.
+
+ Examples:
+ Concatenate two columns row-wise.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": [1, 3, 5], "b": list("xyz")})
+ >>> df
+ a b
+ 0 1 x
+ 1 3 y
+ 2 5 z
+ >>> df.concatenate_columns(
+ ... column_names=["a", "b"], new_column_name="m",
+ ... )
+ a b m
+ 0 1 x 1-x
+ 1 3 y 3-y
+ 2 5 z 5-z
+
+ Args:
+ df: A pandas DataFrame.
+ column_names: A list of columns to concatenate together.
+ new_column_name: The name of the new column.
+ sep: The separator between each column's data.
+ ignore_empty: Ignore null values if exists.
+
+ Raises:
+ JanitorError: If at least two columns are not provided
+ within `column_names`.
+
+ Returns:
+ A pandas DataFrame with concatenated columns.
+ """
+ iflen(column_names)<2:
+ raiseJanitorError("At least two columns must be specified")
+
+ df[new_column_name]=(
+ df[column_names].astype(str).fillna("").agg(sep.join,axis=1)
+ )
+
+ ifignore_empty:
+
+ defremove_empty_string(x):
+"""Ignore empty/null string values from the concatenated output."""
+ returnsep.join(xforxinx.split(sep)ifx)
+
+ df[new_column_name]=df[new_column_name].transform(
+ remove_empty_string
+ )
+
+ returndf
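
Because `ignore_empty=True` drops empty segments after joining, rows with empty strings stay tidy; a quick sketch:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"a": ["x", "", "z"], "b": [1, 2, 3]})
out = df.concatenate_columns(
    column_names=["a", "b"], new_column_name="key", sep="|",
)
print(out.key.tolist())  # ['x|1', '2', 'z|3'] - the empty segment is dropped
```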
conditional_join(df, right, *conditions, how='inner', sort_by_appearance=False, df_columns=slice(None), right_columns=slice(None), keep='all', use_numba=False, indicator=False, force=False)

Source code in janitor/functions/conditional_join.py
@pf.register_dataframe_method
def conditional_join(
    df: pd.DataFrame,
    right: Union[pd.DataFrame, pd.Series],
    *conditions: Any,
    how: Literal["inner", "left", "right", "outer"] = "inner",
    sort_by_appearance: bool = False,
    df_columns: Optional[Any] = slice(None),
    right_columns: Optional[Any] = slice(None),
    keep: Literal["first", "last", "all"] = "all",
    use_numba: bool = False,
    indicator: Optional[Union[bool, str]] = False,
    force: bool = False,
) -> pd.DataFrame:
    """The conditional_join function operates similarly to `pd.merge`,
+ but supports joins on inequality operators,
+ or a combination of equi and non-equi joins.
+
+ Joins solely on equality are not supported.
+
+ If the join is solely on equality, `pd.merge` function
+ covers that; if you are interested in nearest joins, asof joins,
+ or rolling joins, then `pd.merge_asof` covers that.
+ There is also pandas' IntervalIndex, which is efficient for range joins,
+ especially if the intervals do not overlap.
+
+ Column selection in `df_columns` and `right_columns` is possible using the
+ [`select`][janitor.functions.select.select] syntax.
+
+ Performance might be improved by setting `use_numba` to `True`.
+ This assumes that `numba` is installed.
+
+ This function returns rows, if any, where values from `df` meet the
+ condition(s) for values from `right`. The conditions are passed in
+ as a variable argument of tuples, where the tuple is of
+ the form `(left_on, right_on, op)`; `left_on` is the column
+ label from `df`, `right_on` is the column label from `right`,
+ while `op` is the operator.
+
+ The `col` class is also supported in the `conditional_join` syntax.
+
+ For multiple conditions, the and(`&`)
+ operator is used to combine the results of the individual conditions.
+
+ In some scenarios there might be performance gains if the less than join,
+ or the greater than join condition, or the range condition
+ is executed before the equi join - pass `force=True` to force this.
+
+ The operator can be any of `==`, `!=`, `<=`, `<`, `>=`, `>`.
+
+ The join is done only on the columns.
+
+ For non-equi joins, only numeric, timedelta and date columns are supported.
+
+ `inner`, `left`, `right` and `outer` joins are supported.
+
+ If the columns from `df` and `right` have nothing in common,
+ a single index column is returned; else, a MultiIndex column
+ is returned.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df1 = pd.DataFrame({"value_1": [2, 5, 7, 1, 3, 4]})
+ >>> df2 = pd.DataFrame({"value_2A": [0, 3, 7, 12, 0, 2, 3, 1],
+ ... "value_2B": [1, 5, 9, 15, 1, 4, 6, 3],
+ ... })
+ >>> df1
+ value_1
+ 0 2
+ 1 5
+ 2 7
+ 3 1
+ 4 3
+ 5 4
+ >>> df2
+ value_2A value_2B
+ 0 0 1
+ 1 3 5
+ 2 7 9
+ 3 12 15
+ 4 0 1
+ 5 2 4
+ 6 3 6
+ 7 1 3
+
+ >>> df1.conditional_join(
+ ... df2,
+ ... ("value_1", "value_2A", ">"),
+ ... ("value_1", "value_2B", "<")
+ ... )
+ value_1 value_2A value_2B
+ 0 2 1 3
+ 1 5 3 6
+ 2 3 2 4
+ 3 4 3 5
+ 4 4 3 6
+
+ Use the `col` class:
+ >>> df1.conditional_join(
+ ... df2,
+ ... col("value_1") > col("value_2A"),
+ ... col("value_1") < col("value_2B")
+ ... )
+ value_1 value_2A value_2B
+ 0 2 1 3
+ 1 5 3 6
+ 2 3 2 4
+ 3 4 3 5
+ 4 4 3 6
+
+ Select specific columns, after the join:
+ >>> df1.conditional_join(
+ ... df2,
+ ... col("value_1") > col("value_2A"),
+ ... col("value_1") < col("value_2B"),
+ ... right_columns='value_2B',
+ ... how='left'
+ ... )
+ value_1 value_2B
+ 0 2 3.0
+ 1 5 6.0
+ 2 7 NaN
+ 3 1 NaN
+ 4 3 4.0
+ 5 4 5.0
+ 6 4 6.0
+
+ Rename columns, before the join:
+ >>> (df1
+ ... .rename(columns={'value_1':'left_column'})
+ ... .conditional_join(
+ ... df2,
+ ... ("left_column", "value_2A", ">"),
+ ... ("left_column", "value_2B", "<"),
+ ... right_columns='value_2B',
+ ... how='outer')
+ ... )
+ left_column value_2B
+ 0 7.0 NaN
+ 1 1.0 NaN
+ 2 2.0 3.0
+ 3 5.0 6.0
+ 4 3.0 4.0
+ 5 4.0 5.0
+ 6 4.0 6.0
+ 7 NaN 1.0
+ 8 NaN 9.0
+ 9 NaN 15.0
+ 10 NaN 1.0
+
+ Get the first match:
+ >>> df1.conditional_join(
+ ... df2,
+ ... col("value_1") > col("value_2A"),
+ ... col("value_1") < col("value_2B"),
+ ... keep='first'
+ ... )
+ value_1 value_2A value_2B
+ 0 2 1 3
+ 1 5 3 6
+ 2 3 2 4
+ 3 4 3 5
+
+ Get the last match:
+ >>> df1.conditional_join(
+ ... df2,
+ ... col("value_1") > col("value_2A"),
+ ... col("value_1") < col("value_2B"),
+ ... keep='last'
+ ... )
+ value_1 value_2A value_2B
+ 0 2 1 3
+ 1 5 3 6
+ 2 3 2 4
+ 3 4 3 6
+
+ Add an indicator column:
+ >>> df1.conditional_join(
+ ... df2,
+ ... ("value_1", "value_2A", ">"),
+ ... ("value_1", "value_2B", "<"),
+ ... how='outer',
+ ... indicator=True
+ ... )
+ value_1 _merge value_2A value_2B
+ 0 7.0 left_only NaN NaN
+ 1 1.0 left_only NaN NaN
+ 2 2.0 both 1.0 3.0
+ 3 5.0 both 3.0 6.0
+ 4 3.0 both 2.0 4.0
+ 5 4.0 both 3.0 5.0
+ 6 4.0 both 3.0 6.0
+ 7 NaN right_only 0.0 1.0
+ 8 NaN right_only 7.0 9.0
+ 9 NaN right_only 12.0 15.0
+ 10 NaN right_only 0.0 1.0
+
+ !!! abstract "Version Changed"
+
+ - 0.24.0
+ - Added `df_columns`, `right_columns`, `keep` and `use_numba` parameters.
+ - 0.24.1
+ - Added `indicator` parameter.
+ - 0.25.0
+ - `col` class supported.
+ - Outer join supported. `sort_by_appearance` deprecated.
+ - Numba support for equi join
+ - 0.27.0
+ - Added support for timedelta dtype.
+
+ Args:
+ df: A pandas DataFrame.
+ right: Named Series or DataFrame to join to.
+ conditions: Variable argument of tuple(s) of the form
+ `(left_on, right_on, op)`, where `left_on` is the column
+ label from `df`, `right_on` is the column label from `right`,
+ while `op` is the operator.
+ The `col` class is also supported. The operator can be any of
+ `==`, `!=`, `<=`, `<`, `>=`, `>`. For multiple conditions,
+ the and(`&`) operator is used to combine the results
+ of the individual conditions.
+ how: Indicates the type of join to be performed.
+ It can be one of `inner`, `left`, `right` or `outer`.
+ sort_by_appearance: If `how = inner` and
+ `sort_by_appearance = False`, there
+ is no guarantee that the original order is preserved.
+ Usually, this offers more performance.
+ If `how = left`, the row order from the left dataframe
+ is preserved; if `how = right`, the row order
+ from the right dataframe is preserved.
+ !!!warning "Deprecated in 0.25.0"
+ df_columns: Columns to select from `df` in the final output dataframe.
+ Column selection is based on the
+ [`select`][janitor.functions.select.select] syntax.
+ right_columns: Columns to select from `right` in the final output dataframe.
+ Column selection is based on the
+ [`select`][janitor.functions.select.select] syntax.
+ use_numba: Use numba, if installed, to accelerate the computation.
+ keep: Choose whether to return the first match, last match or all matches.
+ indicator: If `True`, adds a column to the output DataFrame
+ called `_merge` with information on the source of each row.
+ The column can be given a different name by providing a string argument.
+ The column will have a Categorical type with the value of `left_only`
+ for observations whose merge key only appears in the left DataFrame,
+ `right_only` for observations whose merge key
+ only appears in the right DataFrame, and `both` if the observation’s
+ merge key is found in both DataFrames.
+ force: If `True`, force the non-equi join conditions to execute before the equi join.
+
+
+ Returns:
+ A pandas DataFrame of the two merged Pandas objects.
+ """# noqa: E501
+
+ return_conditional_join_compute(
+ df,
+ right,
+ conditions,
+ how,
+ sort_by_appearance,
+ df_columns,
+ right_columns,
+ keep,
+ use_numba,
+ indicator,
+ force,
+ )
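
For intuition, a non-equi join is logically a cross join followed by a filter; a plain-pandas sketch of the first docstring example (conditional_join returns the same rows without materializing the full cross product):

```python
import pandas as pd

df1 = pd.DataFrame({"value_1": [2, 5, 7, 1, 3, 4]})
df2 = pd.DataFrame({"value_2A": [0, 3, 7, 12, 0, 2, 3, 1],
                    "value_2B": [1, 5, 9, 15, 1, 4, 6, 3]})
out = (
    df1.merge(df2, how="cross")
    .query("value_1 > value_2A and value_1 < value_2B")
    .reset_index(drop=True)
)
print(out)
```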
get_join_indices(df, right, conditions, keep='all', use_numba=False, force=False)

Source code in janitor/functions/conditional_join.py
def get_join_indices(
    df: pd.DataFrame,
    right: Union[pd.DataFrame, pd.Series],
    conditions: list[tuple[str]],
    keep: Literal["first", "last", "all"] = "all",
    use_numba: bool = False,
    force: bool = False,
) -> tuple[np.ndarray, np.ndarray]:
+"""Convenience function to return the matching indices from an inner join.
+
+ !!! info "New in version 0.27.0"
+
+ Args:
+ df: A pandas DataFrame.
+ right: Named Series or DataFrame to join to.
+ conditions: List of arguments of tuple(s) of the form
+ `(left_on, right_on, op)`, where `left_on` is the column
+ label from `df`, `right_on` is the column label from `right`,
+ while `op` is the operator.
+ The `col` class is also supported. The operator can be any of
+ `==`, `!=`, `<=`, `<`, `>=`, `>`. For multiple conditions,
+ the and(`&`) operator is used to combine the results
+ of the individual conditions.
+ use_numba: Use numba, if installed, to accelerate the computation.
+ keep: Choose whether to return the first match, last match or all matches.
+ force: If `True`, force the non-equi join conditions
+ to execute before the equi join.
+
+ Returns:
+ A tuple of indices for the rows in the dataframes that match.
+ """
    return _conditional_join_compute(
        df=df,
        right=right,
        conditions=conditions,
        how="inner",
        sort_by_appearance=False,
        df_columns=None,
        right_columns=None,
        keep=keep,
        use_numba=use_numba,
        indicator=False,
        force=force,
        return_matching_indices=True,
    )
+
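A minimal sketch of consuming the returned index arrays, assuming `get_join_indices` is importable from `janitor.functions` (the toy frames are invented):

```python
import pandas as pd
from janitor.functions import get_join_indices  # assumed import path

events = pd.DataFrame({"event_id": [1, 2, 3], "t": [3, 7, 20]})
windows = pd.DataFrame({"w_id": ["a", "b"], "start": [0, 5], "end": [6, 10]})

# Positional indices of matching rows, inner-join semantics.
left_idx, right_idx = get_join_indices(
    events, windows, [("t", "start", ">="), ("t", "end", "<=")]
)

# Stitch the matched rows together manually from the indices.
joined = pd.concat(
    [
        events.iloc[left_idx].reset_index(drop=True),
        windows.iloc[right_idx].reset_index(drop=True),
    ],
    axis=1,
)
print(joined)
```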
convert_date
convert_excel_date(df, column_name)
Convert Excel's serial date format into Python datetime format.

count_cumulative_unique

count_cumulative_unique(df, column_name, dest_column_name, case_sensitive=True)

Generates a running total of cumulative unique values in a given column.
+
A new column will be created containing a running count of unique values in the specified column. If case_sensitive is True, then the case of any letters will matter (i.e., a != A); otherwise, the case of any letters will not matter.

This method does not mutate the original DataFrame.
+
+
+
+
Examples:

    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({
    ...     "letters": list("aabABb"),
    ...     "numbers": range(4, 10),
    ... })
    >>> df
      letters  numbers
    0       a        4
    1       a        5
    2       b        6
    3       A        7
    4       B        8
    5       b        9
    >>> df.count_cumulative_unique(
    ...     column_name="letters",
    ...     dest_column_name="letters_unique_count",
    ... )
      letters  numbers  letters_unique_count
    0       a        4                     1
    1       a        5                     1
    2       b        6                     2
    3       A        7                     3
    4       B        8                     4
    5       b        9                     4

Cumulative counts, ignoring casing.

    >>> df.count_cumulative_unique(
    ...     column_name="letters",
    ...     dest_column_name="letters_unique_count",
    ...     case_sensitive=False,
    ... )
      letters  numbers  letters_unique_count
    0       a        4                     1
    1       a        5                     1
    2       b        6                     2
    3       A        7                     2
    4       B        8                     2
    5       b        9                     2
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | A pandas DataFrame. | *required* |
| `column_name` | `Hashable` | Name of the column containing values from which a running count of unique values will be created. | *required* |
| `dest_column_name` | `str` | The name of the new column containing the cumulative count of unique values that will be created. | *required* |
| `case_sensitive` | `bool` | Whether or not uppercase and lowercase letters will be considered equal. Only valid with string-like columns. | `True` |

Raises:

| Type | Description |
| --- | --- |
| `TypeError` | If `case_sensitive` is False when counting a non-string `column_name`. |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame with a new column containing a cumulative count of unique values from another column. |

Source code in janitor/functions/count_cumulative_unique.py
+
@pf.register_dataframe_method
def count_cumulative_unique(
    df: pd.DataFrame,
    column_name: Hashable,
    dest_column_name: str,
    case_sensitive: bool = True,
) -> pd.DataFrame:
+"""Generates a running total of cumulative unique values in a given column.
+
+ A new column will be created containing a running
+ count of unique values in the specified column.
+ If `case_sensitive` is `True`, then the case of
+ any letters will matter (i.e., `a != A`);
+ otherwise, the case of any letters will not matter.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "letters": list("aabABb"),
+ ... "numbers": range(4, 10),
+ ... })
+ >>> df
+ letters numbers
+ 0 a 4
+ 1 a 5
+ 2 b 6
+ 3 A 7
+ 4 B 8
+ 5 b 9
+ >>> df.count_cumulative_unique(
+ ... column_name="letters",
+ ... dest_column_name="letters_unique_count",
+ ... )
+ letters numbers letters_unique_count
+ 0 a 4 1
+ 1 a 5 1
+ 2 b 6 2
+ 3 A 7 3
+ 4 B 8 4
+ 5 b 9 4
+
+ Cumulative counts, ignoring casing.
+
+ >>> df.count_cumulative_unique(
+ ... column_name="letters",
+ ... dest_column_name="letters_unique_count",
+ ... case_sensitive=False,
+ ... )
+ letters numbers letters_unique_count
+ 0 a 4 1
+ 1 a 5 1
+ 2 b 6 2
+ 3 A 7 2
+ 4 B 8 2
+ 5 b 9 2
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: Name of the column containing values from which a
+ running count of unique values will be created.
+ dest_column_name: The name of the new column containing the
+ cumulative count of unique values that will be created.
+ case_sensitive: Whether or not uppercase and lowercase letters
+ will be considered equal. Only valid with string-like columns.
+
+ Raises:
+ TypeError: If `case_sensitive` is False when counting a non-string
+ `column_name`.
+
+ Returns:
+ A pandas DataFrame with a new column containing a cumulative
+ count of unique values from another column.
+ """
    check_column(df, column_name)
    check_column(df, dest_column_name, present=False)

    counter = df[column_name]
    if not case_sensitive:
        try:
            # Make it so that the same uppercase and lowercase
            # letter are treated as one unique value
            counter = counter.str.lower()
        except (AttributeError, TypeError) as e:
            # AttributeError is raised by pandas when .str is used on
            # non-string types, e.g. int.
            # TypeError is raised by pandas when .str.lower is used on a
            # forbidden string type, e.g. bytes.
            raise TypeError(
                "case_sensitive=False can only be used with a string-like "
                f"type. Column {column_name} is {counter.dtype} type."
            ) from e

    counter = (
        counter.groupby(counter, sort=False).cumcount().to_numpy(copy=False)
    )
    counter = np.cumsum(counter == 0)

    return df.assign(**{dest_column_name: counter})
+
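The heart of the implementation above is the `cumcount`/`cumsum` trick: `cumcount()` is `0` exactly at a value's first appearance, so cumulatively summing those zero positions yields the running count of uniques. A standalone sketch:

```python
import numpy as np
import pandas as pd

letters = pd.Series(list("aabab"))
# True marks the first time each value is seen.
first_seen = letters.groupby(letters, sort=False).cumcount().to_numpy() == 0
print(np.cumsum(first_seen))  # [1 1 2 2 2]
```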
currency_column_to_numeric

currency_column_to_numeric(df, column_name, cleaning_style=None, cast_non_numeric=None, fill_all_non_numeric=None, remove_non_numeric=False)

Convert currency column to numeric.

This method does not mutate the original DataFrame.

This method allows one to take a column containing currency values, inadvertently imported as a string, and cast it as a float. This is usually the case when reading CSV files that were modified in Excel. Empty strings (i.e. '') are retained as NaN values.
Valid cleaning styles are:

- None: Default cleaning is applied. Empty strings are always retained as NaN. Numbers, -, . are extracted and the resulting string is cast to a float.
- 'accounting': Replaces numbers in parentheses with negatives, removes commas.
+
+
+
+
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The pandas DataFrame. | *required* |
| `column_name` | `str` | The column containing currency values to modify. | *required* |
| `cleaning_style` | `Optional[str]` | What style of cleaning to perform. | `None` |
| `cast_non_numeric` | `Optional[dict]` | A dict of how to coerce certain strings to numeric type. For example, if there are values of 'REORDER' in the DataFrame, `{'REORDER': 0}` will cast all instances of 'REORDER' to 0. Only takes effect in the default cleaning style. | `None` |
| `fill_all_non_numeric` | `Optional[Union[float, int]]` | Similar to `cast_non_numeric`, but fills all strings to the same value. For example, `fill_all_non_numeric=1` will make everything that doesn't coerce to a currency `1`. Only takes effect in the default cleaning style. | `None` |
| `remove_non_numeric` | `bool` | If set to True, rows of `df` that contain non-numeric values in the `column_name` column will be removed. Only takes effect in the default cleaning style. | `False` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `cleaning_style` is not one of the accepted styles. |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame. |

Source code in janitor/functions/currency_column_to_numeric.py
+
@pf.register_dataframe_method
@deprecated_alias(col_name="column_name", type="cleaning_style")
def currency_column_to_numeric(
    df: pd.DataFrame,
    column_name: str,
    cleaning_style: Optional[str] = None,
    cast_non_numeric: Optional[dict] = None,
    fill_all_non_numeric: Optional[Union[float, int]] = None,
    remove_non_numeric: bool = False,
) -> pd.DataFrame:
+"""Convert currency column to numeric.
+
+ This method does not mutate the original DataFrame.
+
+ This method allows one to take a column containing currency values,
+ inadvertently imported as a string, and cast it as a float. This is
+ usually the case when reading CSV files that were modified in Excel.
+ Empty strings (i.e. `''`) are retained as `NaN` values.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a_col": [" 24.56", "-", "(12.12)", "1,000,000"],
+ ... "d_col": ["", "foo", "1.23 dollars", "-1,000 yen"],
+ ... })
+ >>> df # doctest: +NORMALIZE_WHITESPACE
+ a_col d_col
+ 0 24.56
+ 1 - foo
+ 2 (12.12) 1.23 dollars
+ 3 1,000,000 -1,000 yen
+
+ The default cleaning style.
+
+ >>> df.currency_column_to_numeric("d_col")
+ a_col d_col
+ 0 24.56 NaN
+ 1 - NaN
+ 2 (12.12) 1.23
+ 3 1,000,000 -1000.00
+
+ The accounting cleaning style.
+
+ >>> df.currency_column_to_numeric("a_col", cleaning_style="accounting") # doctest: +NORMALIZE_WHITESPACE
+ a_col d_col
+ 0 24.56
+ 1 0.00 foo
+ 2 -12.12 1.23 dollars
+ 3 1000000.00 -1,000 yen
+
+ Valid cleaning styles are:
+
+ - `None`: Default cleaning is applied. Empty strings are always retained as
+ `NaN`. Numbers, `-`, `.` are extracted and the resulting string
+ is cast to a float.
+ - `'accounting'`: Replaces numbers in parentheses with negatives, removes commas.
+
+ Args:
+ df: The pandas DataFrame.
+ column_name: The column containing currency values to modify.
+ cleaning_style: What style of cleaning to perform.
+ cast_non_numeric: A dict of how to coerce certain strings to numeric
+ type. For example, if there are values of 'REORDER' in the DataFrame,
+ `{'REORDER': 0}` will cast all instances of 'REORDER' to 0.
+ Only takes effect in the default cleaning style.
+ fill_all_non_numeric: Similar to `cast_non_numeric`, but fills all
+ strings to the same value. For example, `fill_all_non_numeric=1`, will
+ make everything that doesn't coerce to a currency `1`.
+ Only takes effect in the default cleaning style.
+ remove_non_numeric: If set to True, rows of `df` that contain
+ non-numeric values in the `column_name` column will be removed.
+ Only takes effect in the default cleaning style.
+
+ Raises:
+ ValueError: If `cleaning_style` is not one of the accepted styles.
+
+ Returns:
+ A pandas DataFrame.
+ """# noqa: E501
+
+ check("column_name",column_name,[str])
+ check_column(df,column_name)
+
+ column_series=df[column_name]
+ ifcleaning_style=="accounting":
+ outcome=(
+ df[column_name]
+ .str.strip()
+ .str.replace(",","",regex=False)
+ .str.replace(")","",regex=False)
+ .str.replace("(","-",regex=False)
+ .replace({"-":0.0})
+ .astype(float)
+ )
+ returndf.assign(**{column_name:outcome})
+ ifcleaning_styleisnotNone:
+ raiseValueError(
+ "`cleaning_style` is expected to be one of ('accounting', None). "
+ f"Got {cleaning_style!r} instead."
+ )
+
+ ifcast_non_numeric:
+ check("cast_non_numeric",cast_non_numeric,[dict])
+
+ _make_cc_patrial=partial(
+ _currency_column_to_numeric,
+ cast_non_numeric=cast_non_numeric,
+ )
+ column_series=column_series.apply(_make_cc_patrial)
+
+ ifremove_non_numeric:
+ df=df.loc[column_series!="",:]
+
+ # _replace_empty_string_with_none is applied here after the check on
+ # remove_non_numeric since "" is our indicator that a string was coerced
+ # in the original column
+ column_series=_replace_empty_string_with_none(column_series)
+
+ iffill_all_non_numericisnotNone:
+ check("fill_all_non_numeric",fill_all_non_numeric,[int,float])
+ column_series=column_series.fillna(fill_all_non_numeric)
+
+ column_series=_replace_original_empty_string_with_none(column_series)
+
+ df=df.assign(**{column_name:pd.to_numeric(column_series)})
+
+ returndf
+
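A hedged sketch combining the default style's coercion options described above; the data is made up, and the expected output follows from the documented behavior:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"amount": ["1,200", "REORDER", "foo", ""]})

out = df.currency_column_to_numeric(
    "amount",
    cast_non_numeric={"REORDER": 0},  # explicit string-to-number mapping
    fill_all_non_numeric=-1,          # everything else that fails coercion
)
print(out)
# Expected: "1,200" -> 1200.0, "REORDER" -> 0, "foo" -> -1,
# and the original empty string stays NaN, per the docstring above.
```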
deconcatenate_column

deconcatenate_column(df, column_name, sep=None, new_column_names=None, autoname=None, preserve_position=False)

De-concatenates a single column into multiple columns.
+
The column to de-concatenate can be either a collection (list, tuple, ...)
+which can be separated out with pd.Series.tolist(),
+or a string to slice based on sep.
+
To determine this behaviour automatically,
+the first element in the column specified is inspected.
+
If it is a string, then sep must be specified.
+Else, the function assumes that it is an iterable type
+(e.g. list or tuple),
+and will attempt to deconcatenate by splitting the list.
+
Given a column with string values, this is the inverse of the
+concatenate_columns
+function.
+
Used to quickly split columns out of a single column.
+
+
+
+
Examples:

    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({"m": ["1-x", "2-y", "3-z"]})
    >>> df
         m
    0  1-x
    1  2-y
    2  3-z
    >>> df.deconcatenate_column("m", sep="-", autoname="col")
         m col1 col2
    0  1-x    1    x
    1  2-y    2    y
    2  3-z    3    z
+
The keyword argument preserve_position takes a True or False boolean that controls whether the new_column_names will take the original position of the to-be-deconcatenated column_name:

- When preserve_position=False (default), df.columns change from [..., column_name, ...] to [..., column_name, ..., new_column_names]. In other words, the deconcatenated new columns are appended to the right of the original dataframe and the original column_name is NOT dropped.
- When preserve_position=True, df.columns change from [..., column_name, ...] to [..., new_column_names, ...]. In other words, the deconcatenated new columns REPLACE the original column_name at its original position, and column_name itself is dropped.

The keyword argument autoname accepts a base string and then automatically creates numbered column names based off the base string. For example, if col is passed in as the argument to autoname, and 4 columns are created, then the resulting columns will be named col1, col2, col3, col4. Numbering is always 1-indexed, not 0-indexed, in order to make the column names human-friendly.

This method does not mutate the original DataFrame.
+
+
+
+
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | A pandas DataFrame. | *required* |
| `column_name` | `Hashable` | The column to split. | *required* |
| `sep` | `Optional[str]` | The separator delimiting the column's data. | `None` |
| `new_column_names` | `Optional[Union[List[str], Tuple[str]]]` | A list of new column names post-splitting. | `None` |
| `autoname` | `str` | A base name for automatically naming the new columns. Takes precedence over `new_column_names` if both are provided. | `None` |
| `preserve_position` | `bool` | Boolean for whether or not to preserve the original position of the column upon de-concatenation. | `False` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `column_name` is not present in the DataFrame. |
| `ValueError` | If `sep` is not provided and the column values are of type `str`. |
| `ValueError` | If neither `new_column_names` nor `autoname` is supplied. |
| `JanitorError` | If an incorrect number of names is provided within `new_column_names`. |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame with a deconcatenated column. |

Source code in janitor/functions/deconcatenate_column.py
+
@pf.register_dataframe_method
@deprecated_alias(column="column_name")
def deconcatenate_column(
    df: pd.DataFrame,
    column_name: Hashable,
    sep: Optional[str] = None,
    new_column_names: Optional[Union[List[str], Tuple[str]]] = None,
    autoname: str = None,
    preserve_position: bool = False,
) -> pd.DataFrame:
+"""De-concatenates a single column into multiple columns.
+
+ The column to de-concatenate can be either a collection (list, tuple, ...)
+ which can be separated out with `pd.Series.tolist()`,
+ or a string to slice based on `sep`.
+
+ To determine this behaviour automatically,
+ the first element in the column specified is inspected.
+
+ If it is a string, then `sep` must be specified.
+ Else, the function assumes that it is an iterable type
+ (e.g. `list` or `tuple`),
+ and will attempt to deconcatenate by splitting the list.
+
+ Given a column with string values, this is the inverse of the
+ [`concatenate_columns`][janitor.functions.concatenate_columns.concatenate_columns]
+ function.
+
+ Used to quickly split columns out of a single column.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"m": ["1-x", "2-y", "3-z"]})
+ >>> df
+ m
+ 0 1-x
+ 1 2-y
+ 2 3-z
+ >>> df.deconcatenate_column("m", sep="-", autoname="col")
+ m col1 col2
+ 0 1-x 1 x
+ 1 2-y 2 y
+ 2 3-z 3 z
+
+ The keyword argument `preserve_position`
+ takes `True` or `False` boolean
+ that controls whether the `new_column_names`
+ will take the original position
+ of the to-be-deconcatenated `column_name`:
+
+ - When `preserve_position=False` (default), `df.columns` change from
+ `[..., column_name, ...]` to `[..., column_name, ..., new_column_names]`.
+ In other words, the deconcatenated new columns are appended to the right
+ of the original dataframe and the original `column_name` is NOT dropped.
+ - When `preserve_position=True`, `df.column` change from
+ `[..., column_name, ...]` to `[..., new_column_names, ...]`.
+ In other words, the deconcatenated new column will REPLACE the original
+ `column_name` at its original position, and `column_name` itself
+ is dropped.
+
+ The keyword argument `autoname` accepts a base string
+ and then automatically creates numbered column names
+ based off the base string.
+ For example, if `col` is passed in as the argument to `autoname`,
+ and 4 columns are created, then the resulting columns will be named
+ `col1, col2, col3, col4`.
+ Numbering is always 1-indexed, not 0-indexed,
+ in order to make the column names human-friendly.
+
+ This method does not mutate the original DataFrame.
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: The column to split.
+ sep: The separator delimiting the column's data.
+ new_column_names: A list of new column names post-splitting.
+ autoname: A base name for automatically naming the new columns.
+ Takes precedence over `new_column_names` if both are provided.
+ preserve_position: Boolean for whether or not to preserve original
+ position of the column upon de-concatenation.
+
+ Raises:
+ ValueError: If `column_name` is not present in the DataFrame.
+ ValueError: If `sep` is not provided and the column values
+ are of type `str`.
        ValueError: If neither `new_column_names` nor `autoname`
            is supplied.
+ JanitorError: If incorrect number of names is provided
+ within `new_column_names`.
+
+ Returns:
+ A pandas DataFrame with a deconcatenated column.
+ """# noqa: E501
+
    if column_name not in df.columns:
        raise ValueError(f"column name {column_name} not present in DataFrame")

    if isinstance(df[column_name].iloc[0], str):
        if sep is None:
            raise ValueError(
                "`sep` must be specified if the column values "
                "are of type `str`."
            )
        df_deconcat = df[column_name].str.split(sep, expand=True)
    else:
        df_deconcat = pd.DataFrame(
            df[column_name].to_list(), columns=new_column_names, index=df.index
        )

    if new_column_names is None and autoname is None:
        raise ValueError(
            "One of `new_column_names` or `autoname` must be supplied."
        )

    if autoname:
        new_column_names = [
            f"{autoname}{i}" for i in range(1, df_deconcat.shape[1] + 1)
        ]

    if not len(new_column_names) == df_deconcat.shape[1]:
        raise JanitorError(
            f"You need to provide {df_deconcat.shape[1]} names "
            "to `new_column_names`"
        )

    df_deconcat.columns = new_column_names
    df_new = pd.concat([df, df_deconcat], axis=1)

    if preserve_position:
        df_original = df.copy()
        cols = list(df_original.columns)
        index_original = cols.index(column_name)

        for i, col_new in enumerate(new_column_names):
            cols.insert(index_original + i, col_new)

        df_new = df_new.select(cols, axis="columns").drop(columns=column_name)

    return df_new
+
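A hedged sketch of the `new_column_names` and `preserve_position` options described above (invented data):

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"id": [1, 2], "m": ["1-x", "2-y"]})

out = df.deconcatenate_column(
    "m",
    sep="-",
    new_column_names=["num", "char"],
    preserve_position=True,  # the new columns replace "m" at its position
)
print(out)
# Expected columns: id, num, char -- "m" itself is dropped.
```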
drop_constant_columns

Implementation of drop_constant_columns.
drop_constant_columns(df)
Finds and drops the constant columns from a Pandas DataFrame.
+
+
+
+
Examples:

    >>> import pandas as pd
    >>> import janitor
    >>> data_dict = {
    ...     "a": [1, 1, 1],
    ...     "b": [1, 2, 3],
    ...     "c": [1, 1, 1],
    ...     "d": ["rabbit", "leopard", "lion"],
    ...     "e": ["Cambridge", "Shanghai", "Basel"]
    ... }
    >>> df = pd.DataFrame(data_dict)
    >>> df
       a  b  c        d          e
    0  1  1  1   rabbit  Cambridge
    1  1  2  1  leopard   Shanghai
    2  1  3  1     lion      Basel
    >>> df.drop_constant_columns()
       b        d          e
    0  1   rabbit  Cambridge
    1  2  leopard   Shanghai
    2  3     lion      Basel
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | Input Pandas DataFrame. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | The Pandas DataFrame with the constant columns dropped. |
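A minimal sketch of the underlying idea, assuming a plain-pandas `nunique`-based check; this is an illustration, not necessarily the library's exact implementation:

```python
import pandas as pd

def drop_constant_columns_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # A column is "constant" when it holds a single unique value;
    # keep only the columns with more than one unique value.
    # (Note: nunique() ignores NaN by default.)
    return df.loc[:, df.nunique() > 1]

df = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 3]})
print(drop_constant_columns_sketch(df))  # only column "b" survives
```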
drop_duplicate_columns

drop_duplicate_columns(df, column_name, nth_index=0)

Source code in janitor/functions/drop_duplicate_columns.py
@pf.register_dataframe_method
def drop_duplicate_columns(
    df: pd.DataFrame, column_name: Hashable, nth_index: int = 0
) -> pd.DataFrame:
+"""Remove a duplicated column specified by `column_name`.
+
+ Specifying `nth_index=0` will remove the first column,
+ `nth_index=1` will remove the second column,
+ and so on and so forth.
+
+ The corresponding tidyverse R's library is:
+ `select(-<column_name>_<nth_index + 1>)`
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": range(2, 5),
+ ... "b": range(3, 6),
+ ... "A": range(4, 7),
+ ... "a*": range(6, 9),
+ ... }).clean_names(remove_special=True)
+ >>> df
+ a b a a
+ 0 2 3 4 6
+ 1 3 4 5 7
+ 2 4 5 6 8
+ >>> df.drop_duplicate_columns(column_name="a", nth_index=1)
+ a b a
+ 0 2 3 6
+ 1 3 4 7
+ 2 4 5 8
+
+ Args:
+ df: A pandas DataFrame
+ column_name: Name of duplicated columns.
+ nth_index: Among the duplicated columns,
+ select the nth column to drop.
+
+ Returns:
+ A pandas DataFrame
+ """
    col_indexes = [
        col_idx
        for col_idx, col_name in enumerate(df.columns)
        if col_name == column_name
    ]

    # Select the column to remove based on nth_index.
    removed_col_idx = col_indexes[nth_index]
    # Filter out columns except for the one to be removed.
    filtered_cols = [
        c_i for c_i, _ in enumerate(df.columns) if c_i != removed_col_idx
    ]

    return df.iloc[:, filtered_cols]
+
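For contrast with the docstring example above, a hedged sketch dropping the first of the duplicated columns instead (`nth_index` is 0-based):

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame([[2, 3, 4], [3, 4, 5]], columns=["a", "b", "a"])

# Remove the first "a" column; the second "a" is retained.
print(df.drop_duplicate_columns(column_name="a", nth_index=0))
#    b  a
# 0  3  4
# 1  4  5
```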
dropnotnull

Implementation source for dropnotnull.
dropnotnull(df, column_name)
+
+
+
+
+
+
+
Drop rows that do not have null values in the given column.
+
This method does not mutate the original DataFrame.
+
+
+
+
Examples:

    >>> import numpy as np
    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({"a": [1., np.NaN, 3.], "b": [None, "y", "z"]})
    >>> df
         a     b
    0  1.0  None
    1  NaN     y
    2  3.0     z
    >>> df.dropnotnull("a")
         a  b
    1  NaN  y
    >>> df.dropnotnull("b")
         a     b
    0  1.0  None
+
+
+
+
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | A pandas DataFrame. | *required* |
| `column_name` | `Hashable` | The column name to drop rows from. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame with dropped rows. |

Source code in janitor/functions/dropnotnull.py
+
@pf.register_dataframe_method
@deprecated_alias(column="column_name")
def dropnotnull(df: pd.DataFrame, column_name: Hashable) -> pd.DataFrame:
+"""Drop rows that do *not* have null values in the given column.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ >>> import numpy as np
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": [1., np.NaN, 3.], "b": [None, "y", "z"]})
+ >>> df
+ a b
+ 0 1.0 None
+ 1 NaN y
+ 2 3.0 z
+ >>> df.dropnotnull("a")
+ a b
+ 1 NaN y
+ >>> df.dropnotnull("b")
+ a b
+ 0 1.0 None
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: The column name to drop rows from.
+
+ Returns:
+ A pandas DataFrame with dropped rows.
+ """
    return df[pd.isna(df[column_name])]
+
encode_categorical

encode_categorical(df, column_names=None, **kwargs)

Encode the specified columns with Pandas' category dtype.
+
It is syntactic sugar around pd.Categorical.
+
This method does not mutate the original DataFrame.
+
Simply pass a string, or a sequence of column names to column_names; alternatively, you can pass kwargs, where the keys are the column names and the values can either be None, sort, appearance or a 1-D array-like object.

- None: column is cast to an unordered categorical.
- sort: column is cast to an ordered categorical, with the order defined by the sort-order of the categories.
- appearance: column is cast to an ordered categorical, with the order defined by the order of appearance in the original column.
- 1d-array-like object: column is cast to an ordered categorical, with the categories and order as specified in the input array.

column_names and kwargs parameters cannot be used at the same time.
+
+
+
+
Examples:

Using column_names

    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({
    ...     "foo": ["b", "b", "a", "c", "b"],
    ...     "bar": range(4, 9),
    ... })
    >>> df
      foo  bar
    0   b    4
    1   b    5
    2   a    6
    3   c    7
    4   b    8
    >>> df.dtypes
    foo    object
    bar     int64
    dtype: object
    >>> enc_df = df.encode_categorical(column_names="foo")
    >>> enc_df.dtypes
    foo    category
    bar       int64
    dtype: object
    >>> enc_df["foo"].cat.categories
    Index(['a', 'b', 'c'], dtype='object')
    >>> enc_df["foo"].cat.ordered
    False
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | A pandas DataFrame object. | *required* |
| `column_names` | `Union[str, Iterable[str], Hashable]` | A column name or an iterable (list or tuple) of column names. | `None` |
| `**kwargs` | `Any` | A mapping from column name to either `None`, `'sort'` or `'appearance'`, or a 1-D array. This is useful in creating categorical columns that are ordered, or if the user needs to explicitly specify the categories. | `{}` |
Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If both `column_names` and `kwargs` are provided. |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame. |

Source code in janitor/functions/encode_categorical.py
+
@pf.register_dataframe_method
@deprecated_alias(columns="column_names")
def encode_categorical(
    df: pd.DataFrame,
    column_names: Union[str, Iterable[str], Hashable] = None,
    **kwargs: Any,
) -> pd.DataFrame:
+"""Encode the specified columns with Pandas' [category dtype][cat].
+
+ [cat]: http://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html
+
+ It is syntactic sugar around `pd.Categorical`.
+
+ This method does not mutate the original DataFrame.
+
+ Simply pass a string, or a sequence of column names to `column_names`;
+ alternatively, you can pass kwargs, where the keys are the column names
+ and the values can either be None, `sort`, `appearance`
+ or a 1-D array-like object.
+
+ - None: column is cast to an unordered categorical.
+ - `sort`: column is cast to an ordered categorical,
+ with the order defined by the sort-order of the categories.
+ - `appearance`: column is cast to an ordered categorical,
+ with the order defined by the order of appearance
+ in the original column.
+ - 1d-array-like object: column is cast to an ordered categorical,
+ with the categories and order as specified
+ in the input array.
+
+ `column_names` and `kwargs` parameters cannot be used at the same time.
+
+ Examples:
+ Using `column_names`
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "foo": ["b", "b", "a", "c", "b"],
+ ... "bar": range(4, 9),
+ ... })
+ >>> df
+ foo bar
+ 0 b 4
+ 1 b 5
+ 2 a 6
+ 3 c 7
+ 4 b 8
+ >>> df.dtypes
+ foo object
+ bar int64
+ dtype: object
+ >>> enc_df = df.encode_categorical(column_names="foo")
+ >>> enc_df.dtypes
+ foo category
+ bar int64
+ dtype: object
+ >>> enc_df["foo"].cat.categories
+ Index(['a', 'b', 'c'], dtype='object')
+ >>> enc_df["foo"].cat.ordered
+ False
+
+ Using `kwargs` to specify an ordered categorical.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "foo": ["b", "b", "a", "c", "b"],
+ ... "bar": range(4, 9),
+ ... })
+ >>> df.dtypes
+ foo object
+ bar int64
+ dtype: object
+ >>> enc_df = df.encode_categorical(foo="appearance")
+ >>> enc_df.dtypes
+ foo category
+ bar int64
+ dtype: object
+ >>> enc_df["foo"].cat.categories
+ Index(['b', 'a', 'c'], dtype='object')
+ >>> enc_df["foo"].cat.ordered
+ True
+
+ Args:
+ df: A pandas DataFrame object.
+ column_names: A column name or an iterable (list or tuple)
+ of column names.
+ **kwargs: A mapping from column name to either `None`,
+ `'sort'` or `'appearance'`, or a 1-D array. This is useful
+ in creating categorical columns that are ordered, or
+ if the user needs to explicitly specify the categories.
+
+ Raises:
+ ValueError: If both `column_names` and `kwargs` are provided.
+
+ Returns:
+ A pandas DataFrame.
+ """# noqa: E501
+
    if all((column_names, kwargs)):
        raise ValueError(
            "Only one of `column_names` or `kwargs` can be provided."
        )
    # column_names deal with only category dtype (unordered)
    # kwargs takes care of scenarios where user wants an ordered category
    # or user supplies specific categories to create the categorical
    if column_names is not None:
        column_names = get_index_labels([column_names], df, axis="columns")
        dtypes = {col: "category" for col in column_names}
        return df.astype(dtypes)

    return _computations_as_categorical(df, **kwargs)
+
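A hedged sketch of the 1-D array form of the kwargs described above, which pins both the categories and their order:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"foo": ["b", "b", "a", "c", "b"]})

# Explicit categories: ordered categorical with c < b < a.
enc_df = df.encode_categorical(foo=["c", "b", "a"])
print(enc_df["foo"].cat.categories)  # Index(['c', 'b', 'a'], dtype='object')
print(enc_df["foo"].cat.ordered)     # True
```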
expand_grid

expand_grid(df=None, df_key=None, *, others=None)

Creates a DataFrame from a cartesian combination of all inputs.
+
It is not restricted to DataFrame;
+it can work with any list-like structure
+that is 1 or 2 dimensional.
+
If method-chaining to a DataFrame, a string argument
+to df_key parameter must be provided.
+
Data types are preserved in this function,
+including pandas' extension array dtypes.
+
The output will always be a DataFrame, usually with a MultiIndex column,
+with the keys of the others dictionary serving as the top level columns.
+
If a pandas Series/DataFrame is passed, and has a labeled index, or
+a MultiIndex index, the index is discarded; the final DataFrame
+will have a RangeIndex.
+
The MultiIndexed DataFrame can be flattened using pyjanitor's
+collapse_levels
+method; the user can also decide to drop any of the levels, via pandas'
+droplevel method.
+
Examples:
+
>>> import pandas as pd
+>>> from janitor.functions.expand_grid import expand_grid
+>>> df = pd.DataFrame({"x": [1, 2], "y": [2, 1]})
+>>> data = {"z": [1, 2, 3]}
+>>> df.expand_grid(df_key="df", others=data)
+ df z
+ x y 0
+0 1 2 1
+1 1 2 2
+2 1 2 3
+3 2 1 1
+4 2 1 2
+5 2 1 3
+
+`expand_grid` works with non-pandas objects:
+
+>>> data = {"x": [1, 2, 3], "y": [1, 2]}
+>>> expand_grid(others=data)
+ x y
+ 0 0
+0 1 1
+1 1 2
+2 2 1
+3 2 2
+4 3 1
+5 3 2
+
+
+
+
+
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `Optional[DataFrame]` | A pandas DataFrame. | `None` |
| `df_key` | `Optional[str]` | Name of key for the dataframe. It becomes part of the column names of the dataframe. | `None` |
| `others` | `Optional[Dict]` | A dictionary that contains the data to be combined with the dataframe. If no dataframe exists, all inputs in `others` will be combined to create a DataFrame. | `None` |

Raises:

| Type | Description |
| --- | --- |
| `KeyError` | If there is a DataFrame and `df_key` is not provided. |

Returns:

| Type | Description |
| --- | --- |
| `Union[DataFrame, None]` | A pandas DataFrame of the cartesian product. If `df` is not provided, and `others` is not provided, None is returned. |

Source code in janitor/functions/expand_grid.py
+
@pf.register_dataframe_method
def expand_grid(
    df: Optional[pd.DataFrame] = None,
    df_key: Optional[str] = None,
    *,
    others: Optional[Dict] = None,
) -> Union[pd.DataFrame, None]:
+"""Creates a DataFrame from a cartesian combination of all inputs.
+
+ It is not restricted to DataFrame;
+ it can work with any list-like structure
+ that is 1 or 2 dimensional.
+
+ If method-chaining to a DataFrame, a string argument
+ to `df_key` parameter must be provided.
+
+ Data types are preserved in this function,
+ including pandas' extension array dtypes.
+
+ The output will always be a DataFrame, usually with a MultiIndex column,
+ with the keys of the `others` dictionary serving as the top level columns.
+
+ If a pandas Series/DataFrame is passed, and has a labeled index, or
+ a MultiIndex index, the index is discarded; the final DataFrame
+ will have a RangeIndex.
+
+ The MultiIndexed DataFrame can be flattened using pyjanitor's
+ [`collapse_levels`][janitor.functions.collapse_levels.collapse_levels]
+ method; the user can also decide to drop any of the levels, via pandas'
+ `droplevel` method.
+
+ Examples:
+
+ >>> import pandas as pd
+ >>> from janitor.functions.expand_grid import expand_grid
+ >>> df = pd.DataFrame({"x": [1, 2], "y": [2, 1]})
+ >>> data = {"z": [1, 2, 3]}
+ >>> df.expand_grid(df_key="df", others=data)
+ df z
+ x y 0
+ 0 1 2 1
+ 1 1 2 2
+ 2 1 2 3
+ 3 2 1 1
+ 4 2 1 2
+ 5 2 1 3
+
+ `expand_grid` works with non-pandas objects:
+
+ >>> data = {"x": [1, 2, 3], "y": [1, 2]}
+ >>> expand_grid(others=data)
+ x y
+ 0 0
+ 0 1 1
+ 1 1 2
+ 2 2 1
+ 3 2 2
+ 4 3 1
+ 5 3 2
+
+ Args:
+ df: A pandas DataFrame.
+ df_key: Name of key for the dataframe.
+ It becomes part of the column names of the dataframe.
+ others: A dictionary that contains the data
+ to be combined with the dataframe.
+ If no dataframe exists, all inputs
+ in `others` will be combined to create a DataFrame.
+
+ Raises:
+ KeyError: If there is a DataFrame and `df_key` is not provided.
+
+ Returns:
+ A pandas DataFrame of the cartesian product.
+ If `df` is not provided, and `others` is not provided,
+ None is returned.
+ """
+
    if df is not None:
        check("df", df, [pd.DataFrame])
        if not df_key:
            raise KeyError(
                "Using `expand_grid` as part of a "
                "DataFrame method chain requires that "
                "a string argument be provided for "
                "the `df_key` parameter. "
            )

        check("df_key", df_key, [str])

    if not others and (df is not None):
        return df

    if not others:
        return None

    check("others", others, [dict])

    for key in others:
        check("key", key, [str])

    if df is not None:
        others = {**{df_key: df}, **others}

    others = _computations_expand_grid(others)
    return pd.DataFrame(others, copy=False)
+
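As the docstring suggests, the MultiIndex columns can be flattened with `collapse_levels`; a brief sketch (the flattened names assume the default underscore separator):

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"x": [1, 2]})
out = df.expand_grid(df_key="df", others={"z": [1, 2, 3]})

flat = out.collapse_levels()
print(flat.columns.tolist())  # e.g. ['df_x', 'z_0']
```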
factorize_columns

factorize_columns(df, column_names, suffix='_enc', **kwargs)

Converts labels into numerical data.

This method will create a new column with the string _enc appended
+after the original column's name.
+This can be overridden with the suffix parameter.
+
Internally, this method uses pandas factorize method.
+It takes in an optional suffix and keyword arguments also.
+An empty string as suffix will override the existing column.
+
This method does not mutate the original DataFrame.
+
+
+
+
Examples:

    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({
    ...     "foo": ["b", "b", "a", "c", "b"],
    ...     "bar": range(4, 9),
    ... })
    >>> df
      foo  bar
    0   b    4
    1   b    5
    2   a    6
    3   c    7
    4   b    8
    >>> df.factorize_columns(column_names="foo")
      foo  bar  foo_enc
    0   b    4        0
    1   b    5        0
    2   a    6        1
    3   c    7        2
    4   b    8        0
+
+
+
+
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The pandas DataFrame object. | *required* |
| `column_names` | `Union[str, Iterable[str], Hashable]` | A column name or an iterable (list or tuple) of column names. | *required* |
| `suffix` | `str` | Suffix to be used for the new column. An empty string suffix means it will override the existing column. | `'_enc'` |
| `**kwargs` | `Any` | Keyword arguments. It takes any of the keyword arguments which the pandas `factorize` method takes, like `sort`, `na_sentinel`, `size_hint`. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame. |

Source code in janitor/functions/factorize_columns.py
+
@pf.register_dataframe_method
def factorize_columns(
    df: pd.DataFrame,
    column_names: Union[str, Iterable[str], Hashable],
    suffix: str = "_enc",
    **kwargs: Any,
) -> pd.DataFrame:
+"""Converts labels into numerical data.
+
+ This method will create a new column with the string `_enc` appended
+ after the original column's name.
+ This can be overridden with the suffix parameter.
+
+ Internally, this method uses pandas `factorize` method.
+ It takes in an optional suffix and keyword arguments also.
+ An empty string as suffix will override the existing column.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "foo": ["b", "b", "a", "c", "b"],
+ ... "bar": range(4, 9),
+ ... })
+ >>> df
+ foo bar
+ 0 b 4
+ 1 b 5
+ 2 a 6
+ 3 c 7
+ 4 b 8
+ >>> df.factorize_columns(column_names="foo")
+ foo bar foo_enc
+ 0 b 4 0
+ 1 b 5 0
+ 2 a 6 1
+ 3 c 7 2
+ 4 b 8 0
+
+ Args:
+ df: The pandas DataFrame object.
+ column_names: A column name or an iterable (list or tuple) of
+ column names.
+ suffix: Suffix to be used for the new column.
+ An empty string suffix means, it will override the existing column.
+ **kwargs: Keyword arguments. It takes any of the keyword arguments,
+ which the pandas factorize method takes like `sort`, `na_sentinel`,
+ `size_hint`.
+
+ Returns:
+ A pandas DataFrame.
+ """
    df = _factorize(df.copy(), column_names, suffix, **kwargs)
    return df
+
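A hedged sketch of the empty-suffix override mentioned in the Args above, which replaces the original column rather than adding a new one:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"foo": ["b", "b", "a"]})

# suffix="" writes the codes back into "foo" itself.
print(df.factorize_columns(column_names="foo", suffix=""))
#    foo
# 0    0
# 1    0
# 2    1
```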
fill
fill_direction(df, **kwargs)
Provide a method-chainable function for filling missing values
+in selected columns.
+
It is a wrapper for pd.Series.ffill and pd.Series.bfill,
+and pairs the column name with one of up, down, updown,
+and downup.
+
+
Note
+
This function will be deprecated in a 1.x release.
+Please use pd.DataFrame.assign instead.
fill_empty(df, column_names, value)

Fill NaN values in specified columns with a given value.
@pf.register_dataframe_method
+@refactored_function(
+ message="This function will be deprecated in a 1.x release. "
+ "Kindly use `jn.impute` instead."
+)
+@deprecated_alias(columns="column_names")
def fill_empty(
    df: pd.DataFrame,
    column_names: Union[str, Iterable[str], Hashable],
    value: Any,
) -> pd.DataFrame:
+"""Fill `NaN` values in specified columns with a given value.
+
+ Super sugary syntax that wraps `pandas.DataFrame.fillna`.
+
+ This method mutates the original DataFrame.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use [`jn.impute`][janitor.functions.impute.impute] instead.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame(
+ ... {
+ ... 'col1': [1, 2, 3],
+ ... 'col2': [None, 4, None ],
+ ... 'col3': [None, 5, 6]
+ ... }
+ ... )
+ >>> df
+ col1 col2 col3
+ 0 1 NaN NaN
+ 1 2 4.0 5.0
+ 2 3 NaN 6.0
+ >>> df.fill_empty(column_names = 'col2', value = 0)
+ col1 col2 col3
+ 0 1 0.0 NaN
+ 1 2 4.0 5.0
+ 2 3 0.0 6.0
+ >>> df.fill_empty(column_names = ['col2', 'col3'], value = 0)
+ col1 col2 col3
+ 0 1 0.0 0.0
+ 1 2 4.0 5.0
+ 2 3 0.0 6.0
+
+ Args:
+ df: A pandas DataFrame.
+ column_names: A column name or an iterable (list
+ or tuple) of column names. If a single column name is passed in,
+ then only that column will be filled; if a list or tuple is passed
+ in, then those columns will all be filled with the same value.
+ value: The value that replaces the `NaN` values.
+
+ Returns:
+ A pandas DataFrame with `NaN` values filled.
+ """
+
    check_column(df, column_names)
    return _fill_empty(df, column_names, value=value)
+
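Since the note above steers toward other tools, the plain-pandas equivalent of a single-column fill is worth keeping in mind; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [None, 4.0, None]})

# fillna with a dict fills per column, mirroring fill_empty's behavior.
print(df.fillna({"col2": 0}))
```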
filter_column_isin

filter_column_isin(df, column_name, iterable, complement=False)

Filter a dataframe for values in a column that exist in the given iterable.
+
This method does not mutate the original DataFrame.
+
Assumes exact matching; fuzzy matching not implemented.
+
+
+
+
Examples:

Filter the dataframe to retain rows for which names are exactly James or John.

    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({"names": ["Jane", "Jeremy", "John"], "foo": list("xyz")})
    >>> df
        names foo
    0    Jane   x
    1  Jeremy   y
    2    John   z
    >>> df.filter_column_isin(column_name="names", iterable=["James", "John"])
      names foo
    2  John   z

This is the method-chaining alternative to:

    df = df[df["names"].isin(["James", "John"])]

If complement=True, then we will only get rows for which the names are neither James nor John.
+
+
+
+
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | A pandas DataFrame. | *required* |
| `column_name` | `Hashable` | The column on which to filter. | *required* |
| `iterable` | `Iterable` | An iterable. Could be a list, tuple, or another pandas Series. | *required* |
| `complement` | `bool` | Whether to return the complement of the selection or not. | `False` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `iterable` does not have a length of 1 or greater. |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A filtered pandas DataFrame. |

Source code in janitor/functions/filter.py
+
@pf.register_dataframe_method
@deprecated_alias(column="column_name")
def filter_column_isin(
    df: pd.DataFrame,
    column_name: Hashable,
    iterable: Iterable,
    complement: bool = False,
) -> pd.DataFrame:
+"""Filter a dataframe for values in a column that exist in the given iterable.
+
+ This method does not mutate the original DataFrame.
+
+ Assumes exact matching; fuzzy matching not implemented.
+
+ Examples:
+ Filter the dataframe to retain rows for which `names`
+ are exactly `James` or `John`.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"names": ["Jane", "Jeremy", "John"], "foo": list("xyz")})
+ >>> df
+ names foo
+ 0 Jane x
+ 1 Jeremy y
+ 2 John z
+ >>> df.filter_column_isin(column_name="names", iterable=["James", "John"])
+ names foo
+ 2 John z
+
+ This is the method-chaining alternative to:
+
+ ```python
+ df = df[df["names"].isin(["James", "John"])]
+ ```
+
+ If `complement=True`, then we will only get rows for which the names
+ are neither `James` nor `John`.
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: The column on which to filter.
+ iterable: An iterable. Could be a list, tuple, another pandas
+ Series.
+ complement: Whether to return the complement of the selection or
+ not.
+
+ Raises:
+ ValueError: If `iterable` does not have a length of `1`
+ or greater.
+
+ Returns:
+ A filtered pandas DataFrame.
+ """# noqa: E501
    if len(iterable) == 0:
        raise ValueError(
            "`iterable` kwarg must be given an iterable of length 1 "
            "or greater."
        )
    criteria = df[column_name].isin(iterable)

    if complement:
        return df[~criteria]
    return df[criteria]
+
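A quick sketch of the `complement=True` behavior described above, using the same toy data:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"names": ["Jane", "Jeremy", "John"]})

# complement=True keeps the rows whose names are NOT in the iterable.
print(df.filter_column_isin("names", ["James", "John"], complement=True))
#     names
# 0    Jane
# 1  Jeremy
```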
filter_date

filter_date(df, column_name, start_date=None, end_date=None, years=None, months=None, days=None, column_date_options=None, format=None)

Filter a date-based column based on certain criteria.

Note: this only affects the format of the start_date and end_date parameters. If there's an issue with the format of the DataFrame column being parsed, pass {'format': your_format} to column_date_options.
+
+
+
+
+
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The dataframe to filter on. | *required* |
| `column_name` | `Hashable` | The date-based column to filter on. | *required* |
| `start_date` | `Optional[date]` | The beginning date to use to filter the DataFrame. | `None` |
| `end_date` | `Optional[date]` | The end date to use to filter the DataFrame. | `None` |
| `years` | `Optional[List]` | The years to use to filter the DataFrame. | `None` |
| `months` | `Optional[List]` | The months to use to filter the DataFrame. | `None` |
| `days` | `Optional[List]` | The days to use to filter the DataFrame. | `None` |
| `column_date_options` | `Optional[Dict]` | Special options to use when parsing the date column in the original DataFrame. The options may be found at the official Pandas documentation. | `None` |
| `format` | `Optional[str]` | If you're using a format for `start_date` or `end_date` that is not recognized natively by pandas' `to_datetime` function, you may supply the format yourself. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A filtered pandas DataFrame. |

Source code in janitor/functions/filter.py
+
@pf.register_dataframe_method
@deprecated_alias(column="column_name", start="start_date", end="end_date")
def filter_date(
    df: pd.DataFrame,
    column_name: Hashable,
    start_date: Optional[dt.date] = None,
    end_date: Optional[dt.date] = None,
    years: Optional[List] = None,
    months: Optional[List] = None,
    days: Optional[List] = None,
    column_date_options: Optional[Dict] = None,
    format: Optional[str] = None,  # skipcq: PYL-W0622
) -> pd.DataFrame:
+"""Filter a date-based column based on certain criteria.
+
+ This method does not mutate the original DataFrame.
+
+ Dates may be finicky and this function builds on top of the *magic* from
+ the pandas `to_datetime` function that is able to parse dates well.
+
+ Additional options to parse the date type of your column may be found at
+ the official pandas [documentation][datetime].
+
+ [datetime]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": range(5, 9),
+ ... "dt": ["2021-11-12", "2021-12-15", "2022-01-03", "2022-01-09"],
+ ... })
+ >>> df
+ a dt
+ 0 5 2021-11-12
+ 1 6 2021-12-15
+ 2 7 2022-01-03
+ 3 8 2022-01-09
+ >>> df.filter_date("dt", start_date="2021-12-01", end_date="2022-01-05")
+ a dt
+ 1 6 2021-12-15
+ 2 7 2022-01-03
+ >>> df.filter_date("dt", years=[2021], months=[12])
+ a dt
+ 1 6 2021-12-15
+
+ !!!note
+
+ This method will cast your column to a Timestamp!
+
+ !!!note
+
+ This only affects the format of the `start_date` and `end_date`
+ parameters. If there's an issue with the format of the DataFrame being
+ parsed, you would pass `{'format': your_format}` to `column_date_options`.
+
+ Args:
+ df: The dataframe to filter on.
        column_name: The date-based column to filter on.
+ start_date: The beginning date to use to filter the DataFrame.
+ end_date: The end date to use to filter the DataFrame.
+ years: The years to use to filter the DataFrame.
+ months: The months to use to filter the DataFrame.
+ days: The days to use to filter the DataFrame.
+ column_date_options: Special options to use when parsing the date
+ column in the original DataFrame. The options may be found at the
+ official Pandas documentation.
+ format: If you're using a format for `start_date` or `end_date`
+ that is not recognized natively by pandas' `to_datetime` function, you
+ may supply the format yourself. Python date and time formats may be
+ found [here](http://strftime.org/).
+
+ Returns:
+ A filtered pandas DataFrame.
+ """# noqa: E501
+
    def _date_filter_conditions(conditions):
        """Taken from: https://stackoverflow.com/a/13616382."""
        return reduce(np.logical_and, conditions)

    if column_date_options is None:
        column_date_options = {}
    df[column_name] = pd.to_datetime(df[column_name], **column_date_options)

    _filter_list = []

    if start_date:
        start_date = pd.to_datetime(start_date, format=format)
        _filter_list.append(df[column_name] >= start_date)

    if end_date:
        end_date = pd.to_datetime(end_date, format=format)
        _filter_list.append(df[column_name] <= end_date)

    if years:
        _filter_list.append(df[column_name].dt.year.isin(years))

    if months:
        _filter_list.append(df[column_name].dt.month.isin(months))

    if days:
        _filter_list.append(df[column_name].dt.day.isin(days))

    if start_date and end_date and start_date > end_date:
        warnings.warn(
            f"Your start date of {start_date} is after your end date of "
            f"{end_date}. Is this intended?"
        )

    return df.loc[_date_filter_conditions(_filter_list), :]
+
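A hedged sketch of the `format` parameter described above, assuming a day-first date string for `start_date`:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"a": [5, 6], "dt": ["2021-11-12", "2021-12-15"]})

# The day-first start_date needs its format spelled out explicitly.
print(df.filter_date("dt", start_date="01/12/2021", format="%d/%m/%Y"))
# Expected: only the 2021-12-15 row survives.
```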
filter_on(df, criteria, complement=False)
+
+
+
+
+
+
+
Return a dataframe filtered on a particular criteria.
+
This method does not mutate the original DataFrame.
+
This is super-sugary syntax that wraps the pandas .query() API, enabling
+users to use strings to quickly specify filters for filtering their
+dataframe. The intent is that filter_on as a verb better matches the
+intent of a pandas user than the verb query.
+
This is intended to be the method-chaining equivalent of the following:
+
df=df[df["score"]<3]
+
+
+
Note
+
This function will be deprecated in a 1.x release.
+Please use pd.DataFrame.query instead.
+
+
+
+
+
Examples:
+
Filter students who failed an exam (scored less than 50).
@pf.register_dataframe_method
+@refactored_function(
+ message=(
+ "This function will be deprecated in a 1.x release. "
+ "Please use `pd.DataFrame.query` instead."
+ )
+)
def filter_on(
    df: pd.DataFrame,
    criteria: str,
    complement: bool = False,
) -> pd.DataFrame:
+"""Return a dataframe filtered on a particular criteria.
+
+ This method does not mutate the original DataFrame.
+
+ This is super-sugary syntax that wraps the pandas `.query()` API, enabling
+ users to use strings to quickly specify filters for filtering their
+ dataframe. The intent is that `filter_on` as a verb better matches the
+ intent of a pandas user than the verb `query`.
+
+ This is intended to be the method-chaining equivalent of the following:
+
+ ```python
+ df = df[df["score"] < 3]
+ ```
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.query` instead.
+
+
+ Examples:
+ Filter students who failed an exam (scored less than 50).
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "student_id": ["S1", "S2", "S3"],
+ ... "score": [40, 60, 85],
+ ... })
+ >>> df
+ student_id score
+ 0 S1 40
+ 1 S2 60
+ 2 S3 85
+ >>> df.filter_on("score < 50", complement=False)
+ student_id score
+ 0 S1 40
+
+ Credit to Brant Peterson for the name.
+
+ Args:
+ df: A pandas DataFrame.
+ criteria: A filtering criteria that returns an array or Series of
+ booleans, on which pandas can filter on.
+ complement: Whether to return the complement of the filter or not.
+ If set to True, then the rows for which the criteria is False are
+ retained instead.
+
+ Returns:
+ A filtered pandas DataFrame.
+ """
+
+ warnings.warn(
+ "This function will be deprecated in a 1.x release. "
+ "Kindly use `pd.DataFrame.query` instead.",
+ DeprecationWarning,
+ stacklevel=find_stack_level(),
+ )
+
    if complement:
        return df.query(f"not ({criteria})")
    return df.query(criteria)
+
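The `complement` flag inverts the criteria; a brief sketch on the docstring's data:

```python
import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"student_id": ["S1", "S2", "S3"], "score": [40, 60, 85]})

# complement=True retains the rows where the criteria is False.
print(df.filter_on("score < 50", complement=True))
#   student_id  score
# 1         S2     60
# 2         S3     85
```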
@pf.register_dataframe_method
+@deprecated_alias(column="column_name")
def filter_string(
    df: pd.DataFrame,
    column_name: Hashable,
    search_string: str,
    complement: bool = False,
    case: bool = True,
    flags: int = 0,
    na: Any = None,
    regex: bool = True,
) -> pd.DataFrame:
+"""Filter a string-based column according to whether it contains a substring.
+
+ This is super sugary syntax that builds on top of `pandas.Series.str.contains`.
+ It is meant to be the method-chaining equivalent of the following:
+
+ ```python
    df = df[df[column_name].str.contains(search_string)]
+ ```
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ Retain rows whose column values contain a particular substring.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": range(3, 6), "b": ["bear", "peeL", "sail"]})
+ >>> df
+ a b
+ 0 3 bear
+ 1 4 peeL
+ 2 5 sail
+ >>> df.filter_string(column_name="b", search_string="ee")
+ a b
+ 1 4 peeL
+ >>> df.filter_string(column_name="b", search_string="L", case=False)
+ a b
+ 1 4 peeL
+ 2 5 sail
+
        Filter names that do not contain `'.'` (disable regex mode).
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.Series(["JoseChen", "Brian.Salvi"], name="Name").to_frame()
+ >>> df
+ Name
+ 0 JoseChen
+ 1 Brian.Salvi
+ >>> df.filter_string(column_name="Name", search_string=".", regex=False, complement=True)
+ Name
+ 0 JoseChen
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: The column to filter. The column should contain strings.
+ search_string: A regex pattern or a (sub-)string to search.
+ complement: Whether to return the complement of the filter or not. If
+ set to True, then the rows for which the string search fails are retained
+ instead.
+ case: If True, case sensitive.
+ flags: Flags to pass through to the re module, e.g. re.IGNORECASE.
+ na: Fill value for missing values. The default depends on dtype of
+ the array. For object-dtype, `numpy.nan` is used. For `StringDtype`,
+ `pandas.NA` is used.
+ regex: If True, assumes `search_string` is a regular expression. If False,
+ treats the `search_string` as a literal string.
+
+ Returns:
+ A filtered pandas DataFrame.
+ """# noqa: E501
+
    criteria = df[column_name].str.contains(
        pat=search_string,
        case=case,
        flags=flags,
        na=na,
        regex=regex,
    )

    if complement:
        return df[~criteria]

    return df[criteria]
+
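A hedged sketch of the `flags` parameter above, passing `re.IGNORECASE` through to the underlying `str.contains`:

```python
import re

import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"b": ["bear", "peeL", "sail"]})

# Case-insensitive regex match at the start of the string.
print(df.filter_string("b", search_string="^PE", flags=re.IGNORECASE))
#       b
# 1  peeL
```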
find_replace

Implementation for find_replace.
find_replace(df, match='exact', **mappings)
+
+
+
+
+
+
+
Perform a find-and-replace action on provided columns.
+
+
Note
+
This function will be deprecated in a 1.x release.
+Please use pd.DataFrame.replace instead.
+
+
Depending on use case, users can choose either exact, full-value matching,
+or regular-expression-based fuzzy matching
+(hence allowing substring matching in the latter case).
+For strings, the matching is always case sensitive.
+
+
+
+
Examples:
+
For instance, given a DataFrame containing orders at a coffee shop:

    >>> df = pd.DataFrame({
    ...     "customer": ["Mary", "Tom", "Lila"],
    ...     "order": ["ice coffee", "lemonade", "regular coffee"]
    ... })
    >>> df
      customer           order
    0     Mary      ice coffee
    1      Tom        lemonade
    2     Lila  regular coffee

Our task is to replace values ice coffee and regular coffee of the order column into latte.

Example 1 - exact matching (functional usage):

    >>> df = find_replace(
    ...     df,
    ...     match="exact",
    ...     order={"ice coffee": "latte", "regular coffee": "latte"},
    ... )
    >>> df
      customer     order
    0     Mary     latte
    1      Tom  lemonade
    2     Lila     latte

Example 1 - exact matching (method chaining):

    >>> df = df.find_replace(
    ...     match="exact",
    ...     order={"ice coffee": "latte", "regular coffee": "latte"},
    ... )
    >>> df
      customer     order
    0     Mary     latte
    1      Tom  lemonade
    2     Lila     latte

Example 2 - Regular-expression-based matching (functional usage):

    >>> df = find_replace(
    ...     df,
    ...     match='regex',
    ...     order={'coffee$': 'latte'},
    ... )
    >>> df
      customer     order
    0     Mary     latte
    1      Tom  lemonade
    2     Lila     latte

Example 2 - Regular-expression-based matching (method chaining usage):

    >>> df = df.find_replace(
    ...     match='regex',
    ...     order={'coffee$': 'latte'},
    ... )
    >>> df
      customer     order
    0     Mary     latte
    1      Tom  lemonade
    2     Lila     latte
+
To perform a find and replace on the entire DataFrame,
+pandas' df.replace() function provides the appropriate functionality.
+You can find more detail on the replace docs.
+
This function only works with column names that have no spaces
+or punctuation in them.
+For example, a column name item_name would work with find_replace,
+because it is a contiguous string that can be parsed correctly,
+but item name would not be parsed correctly by the Python interpreter.
+
If you have column names that might not be compatible,
+we recommend calling on clean_names()
+as the first method call. If, for whatever reason, that is not possible,
+then _find_replace is available as a function
+that you can do a pandas pipe call on.
+
+
+
+
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | A pandas DataFrame. | *required* |
| `match` | `str` | Whether or not to perform an exact match. Valid values are "exact" or "regex". | `'exact'` |
| `**mappings` | `Any` | Keyword arguments corresponding to column names that have dictionaries passed in indicating what to find (keys) and what to replace with (values). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame with replaced values. |

Source code in janitor/functions/find_replace.py
+
@pf.register_dataframe_method
+@refactored_function(
+ message=(
+ "This function will be deprecated in a 1.x release. "
+ "Please use `pd.DataFrame.replace` instead."
+ )
+)
def find_replace(
    df: pd.DataFrame, match: str = "exact", **mappings: Any
) -> pd.DataFrame:
    """Perform a find-and-replace action on provided columns.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.replace` instead.
+
+ Depending on use case, users can choose either exact, full-value matching,
+ or regular-expression-based fuzzy matching
+ (hence allowing substring matching in the latter case).
+ For strings, the matching is always case sensitive.
+
+ Examples:
+ For instance, given a DataFrame containing orders at a coffee shop:
+
+ >>> df = pd.DataFrame({
+ ... "customer": ["Mary", "Tom", "Lila"],
+ ... "order": ["ice coffee", "lemonade", "regular coffee"]
+ ... })
+ >>> df
+ customer order
+ 0 Mary ice coffee
+ 1 Tom lemonade
+ 2 Lila regular coffee
+
+ Our task is to replace values `ice coffee` and `regular coffee`
+ of the `order` column into `latte`.
+
+ Example 1 - exact matching (functional usage):
+
+ >>> df = find_replace(
+ ... df,
+ ... match="exact",
+ ... order={"ice coffee": "latte", "regular coffee": "latte"},
+ ... )
+ >>> df
+ customer order
+ 0 Mary latte
+ 1 Tom lemonade
+ 2 Lila latte
+
+ Example 1 - exact matching (method chaining):
+
+ >>> df = df.find_replace(
+ ... match="exact",
+ ... order={"ice coffee": "latte", "regular coffee": "latte"},
+ ... )
+ >>> df
+ customer order
+ 0 Mary latte
+ 1 Tom lemonade
+ 2 Lila latte
+
+ Example 2 - Regular-expression-based matching (functional usage):
+
+ >>> df = find_replace(
+ ... df,
+ ... match='regex',
+ ... order={'coffee$': 'latte'},
+ ... )
+ >>> df
+ customer order
+ 0 Mary latte
+ 1 Tom lemonade
+ 2 Lila latte
+
+ Example 2 - Regular-expression-based matching (method chaining usage):
+
+ >>> df = df.find_replace(
+ ... match='regex',
+ ... order={'coffee$': 'latte'},
+ ... )
+ >>> df
+ customer order
+ 0 Mary latte
+ 1 Tom lemonade
+ 2 Lila latte
+
+ To perform a find and replace on the entire DataFrame,
+ pandas' `df.replace()` function provides the appropriate functionality.
+ You can find more detail on the [replace] docs.
+
+ [replace]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html
+
+ This function only works with column names that have no spaces
+ or punctuation in them.
+ For example, a column name `item_name` would work with `find_replace`,
+ because it is a contiguous string that can be parsed correctly,
+ but `item name` would not be parsed correctly by the Python interpreter.
+
+ If you have column names that might not be compatible,
+ we recommend calling on [`clean_names()`][janitor.functions.clean_names.clean_names]
+ as the first method call. If, for whatever reason, that is not possible,
+ then `_find_replace` is available as a function
+ that you can do a pandas [pipe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pipe.html) call on.
+
+ Args:
+ df: A pandas DataFrame.
+ match: Whether or not to perform an exact match or not.
+ Valid values are "exact" or "regex".
+ **mappings: keyword arguments corresponding to column names
+ that have dictionaries passed in indicating what to find (keys)
+ and what to replace with (values).
+
+ Returns:
+ A pandas DataFrame with replaced values.
+ """# noqa: E501
+ forcolumn_name,mapperinmappings.items():
+ df=_find_replace(df,column_name,mapper,match=match)
+ returndf

flag_nulls

Implementation of the flag_nulls function

flag_nulls(df, column_name='null_flag', columns=None)

Creates a new column to indicate whether you have null values in a given
+row.
+
If the columns parameter is not set, looks across the entire
+DataFrame, otherwise will look only in the columns you set.
+
This method does not mutate the original DataFrame.
+
+
+
+
Examples:
+
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({
...     "a": ["w", "x", None, "z"], "b": [5, None, 7, 8],
... })
>>> df.flag_nulls()
      a    b  null_flag
0     w  5.0          0
1     x  NaN          1
2  None  7.0          1
3     z  8.0          0
>>> df.flag_nulls(columns="b")
      a    b  null_flag
0     w  5.0          0
1     x  NaN          1
2  None  7.0          0
3     z  8.0          0
+
+
+
+
+
Parameters:

    df (DataFrame): Input pandas DataFrame. (required)
    column_name (Optional[Hashable]): Name for the output column.
        (default: 'null_flag')
    columns (Optional[Union[str, Iterable[str], Hashable]]): List of
        columns to look at for finding null values. If you only want to
        look at one column, you can simply give its name. If set to None
        (default), all DataFrame columns are used. (default: None)

Raises:

    ValueError: If column_name is already present in the DataFrame.
    ValueError: If any column within columns is not present in the
        DataFrame.

Returns:

    DataFrame: Input dataframe with the null flag column.

+
+ Source code in janitor/functions/flag_nulls.py
+
@pf.register_dataframe_method
def flag_nulls(
    df: pd.DataFrame,
    column_name: Optional[Hashable] = "null_flag",
    columns: Optional[Union[str, Iterable[str], Hashable]] = None,
) -> pd.DataFrame:
    """Creates a new column to indicate whether you have null values in a given
+ row.
+
+ If the columns parameter is not set, looks across the entire
+ DataFrame, otherwise will look only in the columns you set.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": ["w", "x", None, "z"], "b": [5, None, 7, 8],
+ ... })
+ >>> df.flag_nulls()
+ a b null_flag
+ 0 w 5.0 0
+ 1 x NaN 1
+ 2 None 7.0 1
+ 3 z 8.0 0
+ >>> df.flag_nulls(columns="b")
+ a b null_flag
+ 0 w 5.0 0
+ 1 x NaN 1
+ 2 None 7.0 0
+ 3 z 8.0 0
+
+ Args:
+ df: Input pandas DataFrame.
+ column_name: Name for the output column.
+ columns: List of columns to look at for finding null values. If you
+ only want to look at one column, you can simply give its name.
+ If set to None (default), all DataFrame columns are used.
+
+ Raises:
+ ValueError: If `column_name` is already present in the
+ DataFrame.
+ ValueError: If any column within `columns` is not present in
+ the DataFrame.
+
+ Returns:
+ Input dataframe with the null flag column.
+
+ <!--
+ # noqa: DAR402
+ -->
+ """
    # Sort out columns input
    if isinstance(columns, str):
        columns = [columns]
    elif columns is None:
        columns = df.columns
    elif not isinstance(columns, Iterable):
        # catches other hashable types
        columns = [columns]

    # Input sanitation checks
    check_column(df, columns)
    check_column(df, [column_name], present=False)

    # This algorithm works best for n_rows >> n_cols. See issue #501
    null_array = np.zeros(len(df))
    for col in columns:
        null_array = np.logical_or(null_array, pd.isna(df[col]))

    df = df.copy()
    df[column_name] = null_array.astype(int)
    return df
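
For comparison, the same flag can be computed with a single vectorized pandas
expression; a sketch, not the library's implementation (which favors the loop
above for tall, narrow frames):

import pandas as pd

df = pd.DataFrame({"a": ["w", "x", None, "z"], "b": [5, None, 7, 8]})

# any(axis=1) marks rows with at least one null among the chosen columns
df["null_flag"] = df[["a", "b"]].isna().any(axis=1).astype(int)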
+
get_dupes

Implementation of the get_dupes function

get_dupes(df, column_names=None)

Return all duplicate rows.
+
This method does not mutate the original DataFrame.

Parameters:

    df (DataFrame): A pandas DataFrame. (required)
    column_names: A column name or an iterable
        (list or tuple) of column names. Following pandas API, this only
        considers certain columns for identifying duplicates. Defaults
        to using all columns. (default: None)
+
Returns:

    DataFrame: The duplicate rows, as a pandas DataFrame.
+
+
+ Source code in janitor/functions/get_dupes.py
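
For reference, a plain-pandas sketch of the documented behavior (not the
library's exact source); `keep=False` marks every member of each duplicate
group:

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# rows 0 and 1 duplicate each other, so both are returned
dupes = df[df.duplicated(subset=None, keep=False)]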

groupby_topk

Implementation of the groupby_topk function

groupby_topk(df, by, column, k, dropna=True, ascending=True, ignore_index=True)

Return top k rows from a groupby of a set of columns.
+
Returns a DataFrame that has the top k values per column,
+grouped by by. Under the hood it uses nlargest/nsmallest,
+for numeric columns, which avoids sorting the entire dataframe,
+and is usually more performant. For non-numeric columns, pd.sort_values
+is used.
+No sorting is done to the by column(s); the order is maintained
+in the final output.
>>> df.groupby_topk(by="result",column="age",k=3)
+ age id result
+0 20 1 pass
+1 23 4 pass
+2 43 2 pass
+3 21 5 fail
+4 22 6 fail
+
+
Descending top 2:
+
>>> df.groupby_topk(
+... by="result",column="age",k=2,ascending=False,ignore_index=False
+... )
+ age id result
+3 43 2 pass
+1 23 4 pass
+2 22 6 fail
+4 21 5 fail
+
+
+
+
+
Parameters:

    df (DataFrame): A pandas DataFrame. (required)
    by (Union[list, Hashable]): Column name(s) to group input DataFrame
        df by. (required)
    column (Hashable): Name of the column that determines k rows to
        return. (required)
    k (int): Number of top rows to return for each group. (required)
    dropna (bool): If True, and NA values exist in by, the NA values are
        not used in the groupby computation to get the relevant k rows.
        If False, and NA values exist in by, then the NA values are used
        in the groupby computation to get the relevant k rows.
        (default: True)
    ascending (bool): If True, the smallest top k rows, determined by
        column, are returned; if False, the largest top k rows,
        determined by column, are returned. (default: True)
    ignore_index (bool): If True, the original index is ignored.
        If False, the original index for the top k rows is retained.
        (default: True)

Raises:

    ValueError: If k is less than 1.

Returns:

    DataFrame: A pandas DataFrame with top k rows per column,
    grouped by by.

+
+
+ Source code in janitor/functions/groupby_topk.py
+
@pf.register_dataframe_method
@deprecated_alias(groupby_column_name="by", sort_column_name="column")
def groupby_topk(
    df: pd.DataFrame,
    by: Union[list, Hashable],
    column: Hashable,
    k: int,
    dropna: bool = True,
    ascending: bool = True,
    ignore_index: bool = True,
) -> pd.DataFrame:
+"""Return top `k` rows from a groupby of a set of columns.
+
+ Returns a DataFrame that has the top `k` values per `column`,
+ grouped by `by`. Under the hood it uses `nlargest/nsmallest`,
+ for numeric columns, which avoids sorting the entire dataframe,
+ and is usually more performant. For non-numeric columns, `pd.sort_values`
+ is used.
+ No sorting is done to the `by` column(s); the order is maintained
+ in the final output.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame(
+ ... {
+ ... "age": [20, 23, 22, 43, 21],
+ ... "id": [1, 4, 6, 2, 5],
+ ... "result": ["pass", "pass", "fail", "pass", "fail"],
+ ... }
+ ... )
+ >>> df
+ age id result
+ 0 20 1 pass
+ 1 23 4 pass
+ 2 22 6 fail
+ 3 43 2 pass
+ 4 21 5 fail
+
+ Ascending top 3:
+
+ >>> df.groupby_topk(by="result", column="age", k=3)
+ age id result
+ 0 20 1 pass
+ 1 23 4 pass
+ 2 43 2 pass
+ 3 21 5 fail
+ 4 22 6 fail
+
+ Descending top 2:
+
+ >>> df.groupby_topk(
+ ... by="result", column="age", k=2, ascending=False, ignore_index=False
+ ... )
+ age id result
+ 3 43 2 pass
+ 1 23 4 pass
+ 2 22 6 fail
+ 4 21 5 fail
+
+ Args:
+ df: A pandas DataFrame.
+ by: Column name(s) to group input DataFrame `df` by.
+ column: Name of the column that determines `k` rows
+ to return.
+ k: Number of top rows to return for each group.
+ dropna: If `True`, and `NA` values exist in `by`, the `NA`
+ values are not used in the groupby computation to get the relevant
+ `k` rows. If `False`, and `NA` values exist in `by`, then the `NA`
+ values are used in the groupby computation to get the relevant
+ `k` rows.
        ascending: If `True`, the smallest top `k` rows,
            determined by `column`, are returned; if `False`, the largest top
            `k` rows, determined by `column`, are returned.
+ ignore_index: If `True`, the original index is ignored.
+ If `False`, the original index for the top `k` rows is retained.
+
+ Raises:
+ ValueError: If `k` is less than 1.
+
+ Returns:
+ A pandas DataFrame with top `k` rows per `column`, grouped by `by`.
+ """# noqa: E501
+
+ ifisinstance(by,Hashable):
+ by=[by]
+
+ check("by",by,[Hashable,list])
+
+ check_column(df,[column])
+ check_column(df,by)
+
+ ifk<1:
+ raiseValueError(
+ "Numbers of rows per group "
+ "to be returned must be greater than 0."
+ )
+
+ indices=df.groupby(by=by,dropna=dropna,sort=False,observed=True)
+ indices=indices[column]
+
+ try:
+ ifascending:
+ indices=indices.nsmallest(n=k)
+ else:
+ indices=indices.nlargest(n=k)
+ exceptTypeError:
+ indices=indices.apply(
+ lambdad:d.sort_values(ascending=ascending).head(k)
+ )
+
+ indices=indices.index.get_level_values(-1)
+ ifignore_index:
+ returndf.loc[indices].reset_index(drop=True)
+ returndf.loc[indices]
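
The `try/except TypeError` above is what makes non-numeric columns work:
`nsmallest`/`nlargest` only accept numeric data, so other dtypes fall through
to the per-group `sort_values` path. A sketch with hypothetical data:

import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({
    "result": ["pass", "pass", "fail", "fail"],
    "name": ["dina", "anna", "carl", "bob"],
})

# "name" is a string column: nsmallest raises TypeError internally,
# so each group is instead sorted and truncated to k rows
df.groupby_topk(by="result", column="name", k=1)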

impute

Implementation of the impute function

impute(df, column_names, value=None, statistic_column_name=None)

Method-chainable imputation of values in a column.
+
This method does not mutate the original DataFrame.
+
Underneath the hood, this function calls the .fillna() method available
+to every pandas.Series object.
+
Either one of value or statistic_column_name should be provided.
+
If value is provided, then all null values in the selected column will
+take on the value provided.
+
If statistic_column_name is provided, then all null values in the
+selected column(s) will take on the summary statistic value
+of other non-null values.
+
Column selection in column_names is possible using the
+select syntax.
+
Currently supported statistics include:

- mean (also aliased by average)
- median
- mode
- minimum (also aliased by min)
- maximum (also aliased by max)
+
+
+
+
+
Examples:
+
>>> import numpy as np
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({
...     "a": [1, 2, 3],
...     "sales": np.nan,
...     "score": [np.nan, 3, 2],
... })
>>> df
   a  sales  score
0  1    NaN    NaN
1  2    NaN    3.0
2  3    NaN    2.0

Imputing null values with 0 (using the value parameter):

>>> df.impute(column_names="sales", value=0.0)
   a  sales  score
0  1    0.0    NaN
1  2    0.0    3.0
2  3    0.0    2.0

Imputing null values with median (using the statistic_column_name
parameter):

>>> df.impute(column_names="score", statistic_column_name="median")
   a  sales  score
0  1    NaN    2.5
1  2    NaN    3.0
2  3    NaN    2.0
+
+
+
+
+
Parameters:

    df (DataFrame): A pandas DataFrame. (required)
    column_names (Any): The name of the column(s) on which to impute
        values. (required)
    value (Optional[Any]): The value used for imputation, passed into
        the .fillna method of the underlying pandas Series.
        (default: None)
    statistic_column_name (Optional[str]): The column statistic to
        impute. (default: None)

Raises:

    ValueError: If both value and statistic_column_name are provided.
    KeyError: If statistic_column_name is not one of mean, average,
        median, mode, minimum, min, maximum, or max.

Returns:

    DataFrame: An imputed pandas DataFrame.

+
+
+
+ Source code in janitor/functions/impute.py
+
@pf.register_dataframe_method
+@deprecated_alias(column="column_name")
+@deprecated_alias(column_name="column_names")
+@deprecated_alias(statistic="statistic_column_name")
def impute(
    df: pd.DataFrame,
    column_names: Any,
    value: Optional[Any] = None,
    statistic_column_name: Optional[str] = None,
) -> pd.DataFrame:
+"""Method-chainable imputation of values in a column.
+
+ This method does not mutate the original DataFrame.
+
+ Underneath the hood, this function calls the `.fillna()` method available
+ to every `pandas.Series` object.
+
+ Either one of `value` or `statistic_column_name` should be provided.
+
+ If `value` is provided, then all null values in the selected column will
+ take on the value provided.
+
+ If `statistic_column_name` is provided, then all null values in the
+ selected column(s) will take on the summary statistic value
+ of other non-null values.
+
+ Column selection in `column_names` is possible using the
+ [`select`][janitor.functions.select.select] syntax.
+
+ Currently supported statistics include:
+
+ - `mean` (also aliased by `average`)
+ - `median`
+ - `mode`
+ - `minimum` (also aliased by `min`)
+ - `maximum` (also aliased by `max`)
+
+ Examples:
+ >>> import numpy as np
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": [1, 2, 3],
+ ... "sales": np.nan,
+ ... "score": [np.nan, 3, 2],
+ ... })
+ >>> df
+ a sales score
+ 0 1 NaN NaN
+ 1 2 NaN 3.0
+ 2 3 NaN 2.0
+
+ Imputing null values with 0 (using the `value` parameter):
+
+ >>> df.impute(column_names="sales", value=0.0)
+ a sales score
+ 0 1 0.0 NaN
+ 1 2 0.0 3.0
+ 2 3 0.0 2.0
+
+ Imputing null values with median (using the `statistic_column_name`
+ parameter):
+
+ >>> df.impute(column_names="score", statistic_column_name="median")
+ a sales score
+ 0 1 NaN 2.5
+ 1 2 NaN 3.0
+ 2 3 NaN 2.0
+
+ Args:
+ df: A pandas DataFrame.
+ column_names: The name of the column(s) on which to impute values.
+ value: The value used for imputation, passed into `.fillna` method
+ of the underlying pandas Series.
+ statistic_column_name: The column statistic to impute.
+
+ Raises:
+ ValueError: If both `value` and `statistic_column_name` are
+ provided.
+ KeyError: If `statistic_column_name` is not one of `mean`,
+ `average`, `median`, `mode`, `minimum`, `min`, `maximum`, or
+ `max`.
+
+ Returns:
+ An imputed pandas DataFrame.
+ """
    # Firstly, we check that only one of `value` or `statistic` are provided.
    if (value is None) and (statistic_column_name is None):
        raise ValueError("Kindly specify a value or a statistic_column_name")

    if value is not None and statistic_column_name is not None:
        raise ValueError(
            "Only one of `value` or `statistic_column_name` should be "
            "provided."
        )

    column_names = get_index_labels([column_names], df, axis="columns")

    if value is not None:
        value = dict(product(column_names, [value]))

    else:
        # If statistic is provided, then we compute
        # the relevant summary statistic
        # from the other data.
        funcs = {
            "mean": "mean",
            "average": "mean",  # aliased
            "median": "median",
            "mode": "mode",
            "minimum": "min",
            "min": "min",  # aliased
            "maximum": "max",
            "max": "max",  # aliased
        }
        # Check that the statistic keyword argument is one of the approved.
        if statistic_column_name not in funcs:
            raise KeyError(
                f"`statistic_column_name` must be one of {funcs.keys()}."
            )

        value = dict(product(column_names, [funcs[statistic_column_name]]))

        value = df.agg(value)

        # special treatment for mode
        if statistic_column_name == "mode":
            value = {key: val.at[0] for key, val in value.items()}

    return df.fillna(value=value)
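
In other words, the statistic path reduces to a `fillna` with a per-column
aggregate; the median example above is equivalent to this plain-pandas sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [np.nan, 3, 2]})

# the median of the non-null values (2.5) fills the nulls
df["score"] = df["score"].fillna(df["score"].median())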

jitter

Implementation of the jitter function

jitter(df, column_name, dest_column_name, scale, clip=None, random_state=None)

Adds Gaussian noise (jitter) to the values of a column.
+
A new column will be created containing the values of the original column
+with Gaussian noise added.
+For each value in the column, a Gaussian distribution is created
+having a location (mean) equal to the value
+and a scale (standard deviation) equal to scale.
+A random value is then sampled from this distribution,
+which is the jittered value.
+If a tuple is supplied for clip,
+then any values of the new column less than clip[0]
+will be set to clip[0],
+and any values greater than clip[1] will be set to clip[1].
+Additionally, if a numeric value is supplied for random_state,
+this value will be used to set the random seed used for sampling.
+NaN values are ignored in this method.
+
This method mutates the original DataFrame.
+
+
+
+
Examples:
+
>>> import numpy as np
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({"a": [3, 4, 5, np.nan]})
>>> df
     a
0  3.0
1  4.0
2  5.0
3  NaN
>>> df.jitter("a", dest_column_name="a_jit", scale=1, random_state=42)
     a     a_jit
0  3.0  3.496714
1  4.0  3.861736
2  5.0  5.647689
3  NaN       NaN
+
+
+
+
+
Parameters:

    df (DataFrame): A pandas DataFrame. (required)
    column_name (Hashable): Name of the column containing values to add
        Gaussian jitter to. (required)
    dest_column_name (str): The name of the new column containing the
        jittered values that will be created. (required)
    scale (number): A positive value multiplied by the original column
        value to determine the scale (standard deviation) of the Gaussian
        distribution to sample from. (A value of zero results in no
        jittering.) (required)
    clip (Optional[Iterable[number]]): An iterable of two values
        (minimum and maximum) to clip the jittered values to.
        (default: None)
    random_state (Optional[number]): An integer or 1-d array value used
        to set the random seed. (default: None)

Raises:

    TypeError: If column_name is not numeric.
    ValueError: If scale is not a numerical value greater than 0.
    ValueError: If clip is not an iterable of length 2.
    ValueError: If clip[0] is greater than clip[1].

Returns:

    DataFrame: A pandas DataFrame with a new column containing
    Gaussian-jittered values from another column.

+
+
+
+ Source code in janitor/functions/jitter.py
+
@pf.register_dataframe_method
def jitter(
    df: pd.DataFrame,
    column_name: Hashable,
    dest_column_name: str,
    scale: np.number,
    clip: Optional[Iterable[np.number]] = None,
    random_state: Optional[np.number] = None,
) -> pd.DataFrame:
+"""Adds Gaussian noise (jitter) to the values of a column.
+
+ A new column will be created containing the values of the original column
+ with Gaussian noise added.
+ For each value in the column, a Gaussian distribution is created
+ having a location (mean) equal to the value
+ and a scale (standard deviation) equal to `scale`.
+ A random value is then sampled from this distribution,
+ which is the jittered value.
+ If a tuple is supplied for `clip`,
+ then any values of the new column less than `clip[0]`
+ will be set to `clip[0]`,
+ and any values greater than `clip[1]` will be set to `clip[1]`.
+ Additionally, if a numeric value is supplied for `random_state`,
+ this value will be used to set the random seed used for sampling.
+ NaN values are ignored in this method.
+
+ This method mutates the original DataFrame.
+
+ Examples:
+ >>> import numpy as np
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": [3, 4, 5, np.nan]})
+ >>> df
+ a
+ 0 3.0
+ 1 4.0
+ 2 5.0
+ 3 NaN
+ >>> df.jitter("a", dest_column_name="a_jit", scale=1, random_state=42)
+ a a_jit
+ 0 3.0 3.496714
+ 1 4.0 3.861736
+ 2 5.0 5.647689
+ 3 NaN NaN
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: Name of the column containing
+ values to add Gaussian jitter to.
+ dest_column_name: The name of the new column containing the
+ jittered values that will be created.
+ scale: A positive value multiplied by the original
+ column value to determine the scale (standard deviation) of the
+ Gaussian distribution to sample from. (A value of zero results in
+ no jittering.)
+ clip: An iterable of two values (minimum and maximum) to clip
+ the jittered values to, default to None.
+ random_state: An integer or 1-d array value used to set the random
+ seed, default to None.
+
+ Raises:
+ TypeError: If `column_name` is not numeric.
+ ValueError: If `scale` is not a numerical value
+ greater than `0`.
+ ValueError: If `clip` is not an iterable of length `2`.
+ ValueError: If `clip[0]` is greater than `clip[1]`.
+
+ Returns:
+ A pandas DataFrame with a new column containing
+ Gaussian-jittered values from another column.
+ """
+
    # Check types
    check("scale", scale, [int, float])

    # Check that `column_name` is a numeric column
    if not np.issubdtype(df[column_name].dtype, np.number):
        raise TypeError(f"{column_name} must be a numeric column.")

    if scale <= 0:
        raise ValueError("`scale` must be a numeric value greater than 0.")
    values = df[column_name]
    if random_state is not None:
        np.random.seed(random_state)
    result = np.random.normal(loc=values, scale=scale)
    if clip:
        # Ensure `clip` has length 2
        if len(clip) != 2:
            raise ValueError("`clip` must be an iterable of length 2.")
        # Ensure the values in `clip` are ordered as min, max
        if clip[1] < clip[0]:
            raise ValueError(
                "`clip[0]` must be less than or equal to `clip[1]`."
            )
        result = np.clip(result, *clip)
    df[dest_column_name] = result

    return df
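
The heart of the method is a single normal draw per value; a minimal numpy
sketch of the same idea (using numpy's newer Generator API rather than the
module-level seeding shown above):

import numpy as np
import pandas as pd

values = pd.Series([3.0, 4.0, 5.0, np.nan])
rng = np.random.default_rng(42)

# one sample per value: mean = the original value, std = scale; NaNs propagate
jittered = rng.normal(loc=values, scale=1.0)
jittered = np.clip(jittered, 2.0, 6.0)  # optional clip to [2.0, 6.0]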
+
join_apply

Implementation of the join_apply function

join_apply(df, func, new_column_name)

Join the result of applying a function across dataframe rows.
+
This method does not mutate the original DataFrame.
+
This is a convenience function that allows us to apply arbitrary functions
+that take any combination of information from any of the columns. The only
+requirement is that the function signature takes in a row from the
+DataFrame.
+
+
Note
+
This function will be deprecated in a 1.x release.
+Please use jn.transform_column
+instead.
@pf.register_dataframe_method
+@refactored_function(
+ message=(
+ "This function will be deprecated in a 1.x release. "
+ "Please use `jn.transform_columns` instead."
+ )
+)
def join_apply(
    df: pd.DataFrame,
    func: Callable,
    new_column_name: str,
) -> pd.DataFrame:
+"""Join the result of applying a function across dataframe rows.
+
+ This method does not mutate the original DataFrame.
+
+ This is a convenience function that allows us to apply arbitrary functions
+ that take any combination of information from any of the columns. The only
+ requirement is that the function signature takes in a row from the
+ DataFrame.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use [`jn.transform_column`][janitor.functions.transform_columns.transform_column]
+ instead.
+
+ Examples:
+ Sum the result of two columns into a new column.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a":[1, 2, 3], "b": [2, 3, 4]})
+ >>> df
+ a b
+ 0 1 2
+ 1 2 3
+ 2 3 4
+ >>> df.join_apply(
+ ... func=lambda x: 2 * x["a"] + x["b"],
+ ... new_column_name="2a+b",
+ ... )
+ a b 2a+b
+ 0 1 2 4
+ 1 2 3 7
+ 2 3 4 10
+
+ Incorporating conditionals in `func`.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": [1, 2, 3], "b": [20, 30, 40]})
+ >>> df
+ a b
+ 0 1 20
+ 1 2 30
+ 2 3 40
+ >>> def take_a_if_even(x):
+ ... if x["a"] % 2 == 0:
+ ... return x["a"]
+ ... else:
+ ... return x["b"]
+ >>> df.join_apply(take_a_if_even, "a_if_even")
+ a b a_if_even
+ 0 1 20 20
+ 1 2 30 2
+ 2 3 40 40
+
+ Args:
+ df: A pandas DataFrame.
+ func: A function that is applied elementwise across all rows of the
+ DataFrame.
+ new_column_name: Name of the resulting column.
+
+ Returns:
+ A pandas DataFrame with new column appended.
+ """# noqa: E501
+ df=df.copy().join(df.apply(func,axis=1).rename(new_column_name))
+ returndf
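
Given the pending deprecation, the same result is easy to get with plain
pandas; a sketch using `assign` around the very `df.apply(func, axis=1)` call
the implementation makes:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})

# equivalent to df.join_apply(func=..., new_column_name="2a+b")
df = df.assign(**{"2a+b": df.apply(lambda x: 2 * x["a"] + x["b"], axis=1)})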
+
label_encode

Implementation of label_encode function

label_encode(df, column_names)

Convert labels into numerical data.
+
This method will create a new column with the string _enc appended
+after the original column's name.
+Consider this to be syntactic sugar.
+This function uses the factorize pandas function under the hood.
+
This method behaves differently from
+encode_categorical.
+This method creates a new column of numeric data.
+encode_categorical
+replaces the dtype of the original column with a categorical dtype.
+
This method mutates the original DataFrame.
+
+
Note
+
This function will be deprecated in a 1.x release.
+Please use factorize_columns
+instead.
+
+
+
+
+
Examples:
+
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({
...     "foo": ["b", "b", "a", "c", "b"],
...     "bar": range(4, 9),
... })
>>> df
   foo  bar
0    b    4
1    b    5
2    a    6
3    c    7
4    b    8
>>> df.label_encode(column_names="foo")
   foo  bar  foo_enc
0    b    4        0
1    b    5        0
2    a    6        1
3    c    7        2
4    b    8        0
+
+
+
+
+
Parameters:

    df (DataFrame): The pandas DataFrame object. (required)
    column_names (Union[str, Iterable[str], Hashable]): A column name or
        an iterable (list or tuple) of column names. (required)

Returns:

    DataFrame: A pandas DataFrame.

+
+
+ Source code in janitor/functions/label_encode.py
+
@pf.register_dataframe_method
+@deprecated_alias(columns="column_names")
def label_encode(
    df: pd.DataFrame,
    column_names: Union[str, Iterable[str], Hashable],
) -> pd.DataFrame:
+"""Convert labels into numerical data.
+
+ This method will create a new column with the string `_enc` appended
+ after the original column's name.
+ Consider this to be syntactic sugar.
+ This function uses the `factorize` pandas function under the hood.
+
+ This method behaves differently from
+ [`encode_categorical`][janitor.functions.encode_categorical.encode_categorical].
+ This method creates a new column of numeric data.
+ [`encode_categorical`][janitor.functions.encode_categorical.encode_categorical]
+ replaces the dtype of the original column with a *categorical* dtype.
+
+ This method mutates the original DataFrame.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use [`factorize_columns`][janitor.functions.factorize_columns.factorize_columns]
+ instead.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "foo": ["b", "b", "a", "c", "b"],
+ ... "bar": range(4, 9),
+ ... })
+ >>> df
+ foo bar
+ 0 b 4
+ 1 b 5
+ 2 a 6
+ 3 c 7
+ 4 b 8
+ >>> df.label_encode(column_names="foo")
+ foo bar foo_enc
+ 0 b 4 0
+ 1 b 5 0
+ 2 a 6 1
+ 3 c 7 2
+ 4 b 8 0
+
+ Args:
+ df: The pandas DataFrame object.
+ column_names: A column name or an iterable (list
+ or tuple) of column names.
+
+ Returns:
+ A pandas DataFrame.
+ """# noqa: E501
+ warnings.warn(
+ "`label_encode` will be deprecated in a 1.x release. "
+ "Please use `factorize_columns` instead."
+ )
+ df=_factorize(df,column_names,"_enc")
+ returndf
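
Since `factorize` does the real work here, the encoding can also be written
directly in pandas; a sketch equivalent to the example above:

import pandas as pd

df = pd.DataFrame({"foo": ["b", "b", "a", "c", "b"]})

# codes follow order of first appearance: b -> 0, a -> 1, c -> 2
df["foo_enc"], _ = pd.factorize(df["foo"])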

limit_column_characters

Implementation of the limit_column_characters function

limit_column_characters(df, column_length, col_separator='_')

Truncate column sizes to a specific length.

This method mutates the original DataFrame.

Method chaining will truncate all column names to a given length, appending
the given separator character and an index to duplicate column names, except
for the first occurrence of each distinct name.
+
+
+
+
Examples:
+
>>> import pandas as pd
>>> import janitor
>>> data_dict = {
...     "really_long_name": [9, 8, 7],
...     "another_really_long_name": [2, 4, 6],
...     "another_really_longer_name": list("xyz"),
...     "this_is_getting_out_of_hand": list("pqr"),
... }
>>> df = pd.DataFrame(data_dict)
>>> df
   really_long_name  another_really_long_name  another_really_longer_name  this_is_getting_out_of_hand
0                 9                         2                           x                            p
1                 8                         4                           y                            q
2                 7                         6                           z                            r
>>> df.limit_column_characters(7)
   really_  another  another_1  this_is
0        9        2          x        p
1        8        4          y        q
2        7        6          z        r
+
+
+
+
+
Parameters:

    df (DataFrame): A pandas DataFrame. (required)
    column_length (int): Character length at which to truncate all
        columns. The column separator value and number for duplicate
        column names do not contribute. Therefore, if all columns are
        truncated to 10 characters, the first distinct column will be 10
        characters and the remaining will be 12 characters (assuming a
        column separator of one character). (required)
    col_separator (str): The separator to use for counting distinct
        column values, for example, '_' or '.'. Supply an empty string
        (i.e. '') to remove the separator. (default: '_')

Returns:

    DataFrame: A pandas DataFrame with truncated column lengths.

+
+
+ Source code in janitor/functions/limit_column_characters.py
+
@pf.register_dataframe_method
def limit_column_characters(
    df: pd.DataFrame,
    column_length: int,
    col_separator: str = "_",
) -> pd.DataFrame:
+"""Truncate column sizes to a specific length.
+
+ This method mutates the original DataFrame.
+
+ Method chaining will truncate all columns to a given length and append
+ a given separator character with the index of duplicate columns, except
+ for the first distinct column name.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> data_dict = {
+ ... "really_long_name": [9, 8, 7],
+ ... "another_really_long_name": [2, 4, 6],
+ ... "another_really_longer_name": list("xyz"),
+ ... "this_is_getting_out_of_hand": list("pqr"),
+ ... }
+ >>> df = pd.DataFrame(data_dict)
+ >>> df # doctest: +SKIP
+ really_long_name another_really_long_name another_really_longer_name this_is_getting_out_of_hand
+ 0 9 2 x p
+ 1 8 4 y q
+ 2 7 6 z r
+ >>> df.limit_column_characters(7)
+ really_ another another_1 this_is
+ 0 9 2 x p
+ 1 8 4 y q
+ 2 7 6 z r
+
+ Args:
+ df: A pandas DataFrame.
+ column_length: Character length for which to truncate all columns.
+ The column separator value and number for duplicate column name does
+ not contribute. Therefore, if all columns are truncated to 10
+ characters, the first distinct column will be 10 characters and the
+ remaining will be 12 characters (assuming a column separator of one
+ character).
+ col_separator: The separator to use for counting distinct column
+ values, for example, `'_'` or `'.'`.
+ Supply an empty string (i.e. `''`) to remove the separator.
+
+ Returns:
+ A pandas DataFrame with truncated column lengths.
+ """# noqa: E501
+
+ check("column_length",column_length,[int])
+ check("col_separator",col_separator,[str])
+
+ col_names=df.columns
+ col_names=[col_name[:column_length]forcol_nameincol_names]
+
+ col_name_set=set(col_names)
+ col_name_count={}
+
+ # If no columns are duplicates, we can skip the loops below.
+ iflen(col_name_set)==len(col_names):
+ df.columns=col_names
+ returndf
+
+ forcol_name_to_checkincol_name_set:
+ count=0
+ foridx,col_nameinenumerate(col_names):
+ ifcol_name_to_check==col_name:
+ col_name_count[idx]=count
+ count+=1
+
+ final_col_names=[]
+ foridx,col_nameinenumerate(col_names):
+ ifcol_name_count[idx]>0:
+ col_name_to_append=(
+ col_name+col_separator+str(col_name_count[idx])
+ )
+ final_col_names.append(col_name_to_append)
+ else:
+ final_col_names.append(col_name)
+
+ df.columns=final_col_names
+ returndf
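
Following the numbering logic above, an empty `col_separator` simply appends
the bare counter; a sketch with hypothetical column names:

import pandas as pd
import janitor  # noqa: F401

df = pd.DataFrame({"name_one": [1], "name_two": [2]})

# both names truncate to "name"; the duplicate becomes "name1"
df.limit_column_characters(4, col_separator="")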

min_max_scale

Implementation of the min_max_scale function

min_max_scale(df, feature_range=(0, 1), column_name=None, jointly=False)

Scales DataFrame to between a minimum and maximum value.
+
One can optionally set a new target minimum and maximum value
+using the feature_range keyword argument.
+
If column_name is specified, then only that column (or those columns) of data
is scaled. Otherwise, the entire dataframe is scaled.
If jointly is True, the selected columns are recognized as one set of values
and scaled together. Otherwise, each column of data is scaled separately.
+
+
+
+
Examples:
+
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({'a': [1, 2], 'b': [0, 1]})
>>> df.min_max_scale()
     a    b
0  0.0  0.0
1  1.0  1.0
>>> df.min_max_scale(jointly=True)
     a    b
0  0.5  0.0
1  1.0  0.5

Setting custom minimum and maximum.

>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({'a': [1, 2], 'b': [0, 1]})
>>> df.min_max_scale(feature_range=(0, 100))
       a      b
0    0.0    0.0
1  100.0  100.0
>>> df.min_max_scale(feature_range=(0, 100), jointly=True)
       a     b
0   50.0   0.0
1  100.0  50.0

Apply min-max to the selected columns.

>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({'a': [1, 2], 'b': [0, 1], 'c': [1, 0]})
>>> df.min_max_scale(
...     feature_range=(0, 100),
...     column_name=["a", "c"],
... )
       a  b      c
0    0.0  0  100.0
1  100.0  1    0.0
>>> df.min_max_scale(
...     feature_range=(0, 100),
...     column_name=["a", "c"],
...     jointly=True,
... )
       a  b      c
0   50.0  0   50.0
1  100.0  1    0.0
>>> df.min_max_scale(feature_range=(0, 100), column_name='a')
       a  b  c
0    0.0  0  1
1  100.0  1  0
+
+
The aforementioned example might be applied to something like scaling the
+isoelectric points of amino acids. While technically they range from
+approx 3-10, we can also think of them on the pH scale which ranges from
+1 to 14. Hence, 3 gets scaled not to 0 but approx. 0.15 instead, while 10
+gets scaled to approx. 0.69 instead.
+
+
Version Changed

- 0.24.0
    - Deleted old_min, old_max, new_min, and new_max options.
    - Added feature_range and jointly options.
+
+
+
+
+
+
+
+
Parameters:

    df (DataFrame): A pandas DataFrame. (required)
    feature_range (tuple[int | float, int | float]): Desired range of
        transformed data. (default: (0, 1))
    column_name (str | int | list[str | int] | Index): The column on
        which to perform scaling. (default: None)
    jointly (bool): Scale the entire data if True. (default: False)

Raises:

    ValueError: If feature_range isn't tuple type.
    ValueError: If the length of feature_range isn't equal to two.
    ValueError: If the element of feature_range isn't number type.
    ValueError: If feature_range[1] <= feature_range[0].

Returns:

    DataFrame: A pandas DataFrame with scaled data.

+
+
+ Source code in janitor/functions/min_max_scale.py
+
@pf.register_dataframe_method
+@deprecated_kwargs(
+ "old_min",
+ "old_max",
+ "new_min",
+ "new_max",
+ message=(
+ "The keyword argument {argument!r} of {func_name!r} is deprecated. "
+ "Please use 'feature_range' instead."
+ ),
+)
+@deprecated_alias(col_name="column_name")
def min_max_scale(
    df: pd.DataFrame,
    feature_range: tuple[int | float, int | float] = (0, 1),
    column_name: str | int | list[str | int] | pd.Index = None,
    jointly: bool = False,
) -> pd.DataFrame:
+"""Scales DataFrame to between a minimum and maximum value.
+
+ One can optionally set a new target **minimum** and **maximum** value
+ using the `feature_range` keyword argument.
+
    If `column_name` is specified, then only that column (or those columns)
    of data is scaled. Otherwise, the entire dataframe is scaled.
    If `jointly` is `True`, the selected columns are recognized as one set
    of values and scaled together. Otherwise, each column of data
    will be scaled separately.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({'a':[1, 2], 'b':[0, 1]})
+ >>> df.min_max_scale()
+ a b
+ 0 0.0 0.0
+ 1 1.0 1.0
+ >>> df.min_max_scale(jointly=True)
+ a b
+ 0 0.5 0.0
+ 1 1.0 0.5
+
+ Setting custom minimum and maximum.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({'a':[1, 2], 'b':[0, 1]})
+ >>> df.min_max_scale(feature_range=(0, 100))
+ a b
+ 0 0.0 0.0
+ 1 100.0 100.0
+ >>> df.min_max_scale(feature_range=(0, 100), jointly=True)
+ a b
+ 0 50.0 0.0
+ 1 100.0 50.0
+
+ Apply min-max to the selected columns.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({'a':[1, 2], 'b':[0, 1], 'c': [1, 0]})
+ >>> df.min_max_scale(
+ ... feature_range=(0, 100),
+ ... column_name=["a", "c"],
+ ... )
+ a b c
+ 0 0.0 0 100.0
+ 1 100.0 1 0.0
+ >>> df.min_max_scale(
+ ... feature_range=(0, 100),
+ ... column_name=["a", "c"],
+ ... jointly=True,
+ ... )
+ a b c
+ 0 50.0 0 50.0
+ 1 100.0 1 0.0
+ >>> df.min_max_scale(feature_range=(0, 100), column_name='a')
+ a b c
+ 0 0.0 0 1
+ 1 100.0 1 0
+
+ The aforementioned example might be applied to something like scaling the
+ isoelectric points of amino acids. While technically they range from
+ approx 3-10, we can also think of them on the pH scale which ranges from
+ 1 to 14. Hence, 3 gets scaled not to 0 but approx. 0.15 instead, while 10
+ gets scaled to approx. 0.69 instead.
+
+ !!! summary "Version Changed"
+
+ - 0.24.0
+ - Deleted `old_min`, `old_max`, `new_min`, and `new_max` options.
+ - Added `feature_range`, and `jointly` options.
+
+ Args:
+ df: A pandas DataFrame.
+ feature_range: Desired range of transformed data.
+ column_name: The column on which to perform scaling.
+ jointly: Scale the entire data if True.
+
+ Raises:
+ ValueError: If `feature_range` isn't tuple type.
+ ValueError: If the length of `feature_range` isn't equal to two.
+ ValueError: If the element of `feature_range` isn't number type.
+ ValueError: If `feature_range[1]` <= `feature_range[0]`.
+
+ Returns:
+ A pandas DataFrame with scaled data.
+ """# noqa: E501
+
+ ifnot(
+ isinstance(feature_range,(tuple,list))
+ andlen(feature_range)==2
+ andall((isinstance(i,(int,float)))foriinfeature_range)
+ andfeature_range[1]>feature_range[0]
+ ):
+ raiseValueError(
+ "`feature_range` should be a range type contains number element, "
+ "the first element must be greater than the second one"
+ )
+
+ ifcolumn_nameisnotNone:
+ df=df.copy()# Avoid to change the original DataFrame.
+
+ old_feature_range=df[column_name].pipe(_min_max_value,jointly)
+ df[column_name]=df[column_name].pipe(
+ _apply_min_max,
+ *old_feature_range,
+ *feature_range,
+ )
+ else:
+ old_feature_range=df.pipe(_min_max_value,jointly)
+ df=df.pipe(
+ _apply_min_max,
+ *old_feature_range,
+ *feature_range,
+ )
+
+ returndf
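
The transformation applied per column (or jointly) is the standard min-max
formula; a sketch of what `_apply_min_max` presumably computes for an old
range (min, max) and a target range (a, b):

import pandas as pd

def min_max(s: pd.Series, a: float = 0.0, b: float = 1.0) -> pd.Series:
    # x' = (x - min) / (max - min) * (b - a) + a
    return (s - s.min()) / (s.max() - s.min()) * (b - a) + a

min_max(pd.Series([1, 2]), 0, 100)  # -> 0.0, 100.0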

move

Implementation of the move function

move(df, source, target=None, position='before', axis=0)

Changes rows or columns positions in the dataframe.
+
It uses the
+select syntax,
+making it easy to move blocks of rows or columns at once.
+
This operation does not reset the index of the dataframe. User must
+explicitly do so.
+
The dataframe must have unique column names or indices.
+
+
+
+
Examples:
+
Move a row:
+
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({"a": [2, 4, 6, 8], "b": list("wxyz")})
>>> df
   a  b
0  2  w
1  4  x
2  6  y
3  8  z
>>> df.move(source=0, target=3, position="before", axis=0)
   a  b
1  4  x
2  6  y
0  2  w
3  8  z

Move a column:

>>> import pandas as pd
>>> import janitor
>>> data = [{"a": 1, "b": 1, "c": 1,
...          "d": "a", "e": "a", "f": "a"}]
>>> df = pd.DataFrame(data)
>>> df
   a  b  c  d  e  f
0  1  1  1  a  a  a
>>> df.move(source="a", target="c", position="after", axis=1)
   b  c  a  d  e  f
0  1  1  1  a  a  a
>>> df.move(source="f", target="b", position="before", axis=1)
   a  f  b  c  d  e
0  1  a  1  1  a  a
>>> df.move(source="a", target=None, position="after", axis=1)
   b  c  d  e  f  a
0  1  1  a  a  a  1

Move columns:

>>> from pandas.api.types import is_numeric_dtype, is_string_dtype
>>> df.move(source=is_string_dtype, target=None, position="before", axis=1)
   d  e  f  a  b  c
0  a  a  a  1  1  1
>>> df.move(source=is_numeric_dtype, target=None, position="after", axis=1)
   d  e  f  a  b  c
0  a  a  a  1  1  1
>>> df.move(source=["d", "f"], target=is_numeric_dtype, position="before", axis=1)
   d  f  a  b  c  e
0  a  a  1  1  1  a
+
+
+
+
+
Parameters:

    df (DataFrame): The pandas DataFrame object. (required)
    source (Any): Columns or rows to move. (required)
    target (Any): Columns or rows to move adjacent to.
        If None and position == 'before', source is moved to the
        beginning; if position == 'after', source is moved to the end.
        (default: None)
    position (str): Specifies the destination of the columns/rows.
        Values can be either before or after. (default: 'before')
    axis (int): Axis along which the function is applied. 0 to move
        along the index, 1 to move along the columns. (default: 0)
@pf.register_dataframe_method
def move(
    df: pd.DataFrame,
    source: Any,
    target: Any = None,
    position: str = "before",
    axis: int = 0,
) -> pd.DataFrame:
+"""Changes rows or columns positions in the dataframe.
+
+ It uses the
+ [`select`][janitor.functions.select.select] syntax,
+ making it easy to move blocks of rows or columns at once.
+
+ This operation does not reset the index of the dataframe. User must
+ explicitly do so.
+
+ The dataframe must have unique column names or indices.
+
+ Examples:
+ Move a row:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": [2, 4, 6, 8], "b": list("wxyz")})
+ >>> df
+ a b
+ 0 2 w
+ 1 4 x
+ 2 6 y
+ 3 8 z
+ >>> df.move(source=0, target=3, position="before", axis=0)
+ a b
+ 1 4 x
+ 2 6 y
+ 0 2 w
+ 3 8 z
+
+ Move a column:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> data = [{"a": 1, "b": 1, "c": 1,
+ ... "d": "a", "e": "a","f": "a"}]
+ >>> df = pd.DataFrame(data)
+ >>> df
+ a b c d e f
+ 0 1 1 1 a a a
+ >>> df.move(source="a", target="c", position="after", axis=1)
+ b c a d e f
+ 0 1 1 1 a a a
+ >>> df.move(source="f", target="b", position="before", axis=1)
+ a f b c d e
+ 0 1 a 1 1 a a
+ >>> df.move(source="a", target=None, position="after", axis=1)
+ b c d e f a
+ 0 1 1 a a a 1
+
+ Move columns:
+ >>> from pandas.api.types import is_numeric_dtype, is_string_dtype
+ >>> df.move(source=is_string_dtype, target=None, position="before", axis=1)
+ d e f a b c
+ 0 a a a 1 1 1
+ >>> df.move(source=is_numeric_dtype, target=None, position="after", axis=1)
+ d e f a b c
+ 0 a a a 1 1 1
+ >>> df.move(source = ["d", "f"], target=is_numeric_dtype, position="before", axis=1)
+ d f a b c e
+ 0 a a 1 1 1 a
+
+ Args:
+ df: The pandas DataFrame object.
+ source: Columns or rows to move.
+ target: Columns or rows to move adjacent to.
+ If `None` and `position == 'before'`, `source`
+ is moved to the beginning; if `position == 'after'`,
+ `source` is moved to the end.
+ position: Specifies the destination of the columns/rows.
+ Values can be either `before` or `after`; defaults to `before`.
+ axis: Axis along which the function is applied. 0 to move along
+ the index, 1 to move along the columns.
+
+ Raises:
+ ValueError: If `axis` is not `0` or `1`.
+ ValueError: If `position` is not `before` or `after`.
+
+ Returns:
+ The dataframe with the Series moved.
+ """# noqa: E501
+ ifaxisnotin[0,1]:
+ raiseValueError(f"Invalid axis '{axis}'. Can only be 0 or 1.")
+
+ ifpositionnotin["before","after"]:
+ raiseValueError(
+ f"Invalid position '{position}'. Can only be 'before' or 'after'."
+ )
+
+ mapping={0:"index",1:"columns"}
+ names=getattr(df,mapping[axis])
+
+ assertnames.is_unique
+
+ index=np.arange(names.size)
+ source=_select_index([source],df,mapping[axis])
+ source=_index_converter(source,index)
+ iftargetisNone:
+ ifposition=="after":
+ target=np.array([names.size])
+ else:
+ target=np.array([0])
+ else:
+ target=_select_index([target],df,mapping[axis])
+ target=_index_converter(target,index)
+ index=np.delete(index,source)
+
+ ifposition=="before":
+ position=index.searchsorted(target[0])
+ else:
+ position=index.searchsorted(target[-1])+1
+ start=index[:position]
+ end=index[position:]
+ position=np.concatenate([start,source,end])
+
+ returndf.iloc(axis=axis)[position]
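
The placement logic above reduces to: drop the source positions, locate the
target among the remaining positions with `searchsorted`, and splice the
source back in. A standalone sketch of that index arithmetic:

import numpy as np

index = np.arange(5)     # positions 0..4
source = np.array([0])   # move position 0
target = np.array([3])   # next to position 3

index = np.delete(index, source)
pos = index.searchsorted(target[0])  # "before" placement
order = np.concatenate([index[:pos], source, index[pos:]])
# order == [1, 2, 0, 3, 4], matching the "Move a row" example above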

pivot_longer

Implementation of the pivot_longer function

pivot_longer(df, index=None, column_names=None, names_to=None, values_to='value', column_level=None, names_sep=None, names_pattern=None, names_transform=None, dropna=False, sort_by_appearance=False, ignore_index=True)

Unpivots a DataFrame from wide to long format.

This method does not mutate the original DataFrame.
+
It is modeled after the pivot_longer function in R's tidyr package,
+and also takes inspiration from R's data.table package.
+
This function is useful to massage a DataFrame into a format where
+one or more columns are considered measured variables, and all other
+columns are considered as identifier variables.
+
All measured variables are unpivoted (and typically duplicated) along the
+row axis.
+
Column selection in index and column_names is possible using the
+select syntax.
Parameters:

    df (DataFrame): A pandas DataFrame. (required)
    index (Optional[Union[list, tuple, str, Pattern]]): Name(s) of
        columns to use as identifier variables. Should be either a
        single column name, or a list/tuple of column names.
        index should be a list of tuples if the columns are a
        MultiIndex. (default: None)
    column_names (Optional[Union[list, tuple, str, Pattern]]): Name(s)
        of columns to unpivot. Should be either a single column name or
        a list/tuple of column names. column_names should be a list of
        tuples if the columns are a MultiIndex. (default: None)
    names_to (Optional[Union[list, tuple, str]]): Name of new column as
        a string that will contain what were previously the column names
        in column_names. The default is variable if no value is
        provided. It can also be a list/tuple of strings that will serve
        as new column names, if names_sep or names_pattern is provided.
        If .value is in names_to, new column names will be extracted
        from part of the existing column names, and this overrides
        values_to. (default: None)
    values_to (Optional[str]): Name of new column as a string that will
        contain what were previously the values of the columns in
        column_names. values_to can also be a list/tuple and requires
        that names_pattern is also a list/tuple. (default: 'value')
    column_level (Optional[Union[int, str]]): If columns are a
        MultiIndex, then use this level to unpivot the DataFrame.
        Provided for compatibility with pandas' melt, and applies only
        if neither names_sep nor names_pattern is provided.
        (default: None)
    names_sep (Optional[Union[str, Pattern]]): Determines how the column
        name is broken up, if names_to contains multiple values. It
        takes the same specification as pandas' str.split method, and
        can be a string or regular expression. names_sep does not work
        with MultiIndex columns. (default: None)
    names_pattern (Optional[Union[list, tuple, str, Pattern]]):
        Determines how the column name is broken up. It can be a regular
        expression containing matching groups. Under the hood it is
        processed with pandas' str.extract function. If it is a single
        regex, the number of groups must match the length of names_to.
        Named groups are supported if names_to is None; _ is used
        instead of .value as a placeholder in named groups, and _ can be
        overloaded for multiple .value calls: _, __, ___, ...
        names_pattern can also be a list/tuple of regular expressions,
        or a list/tuple of strings; the strings will be treated as
        regular expressions. Under the hood it is processed with pandas'
        str.contains function. For a list/tuple of regular expressions,
        names_to must also be a list/tuple and the lengths of both
        arguments must match.
        names_pattern can also be a dictionary, where the keys are the
        new column names, while the values can be a regular expression
        or a string which will be evaluated as a regular expression.
        Alternatively, a nested dictionary can be used, where the sub
        key(s) are associated with values_to. Please have a look at the
        examples for usage. names_pattern does not work with MultiIndex
        columns. (default: None)
    names_transform (Optional[Union[str, Callable, dict]]): Use this
        option to change the types of columns that have been transformed
        to rows. This does not apply to the values' columns. Accepts any
        argument that is acceptable by pd.astype. (default: None)
    dropna (bool): Determines whether or not to drop nulls from the
        values columns. (default: False)
    sort_by_appearance (Optional[bool]): Boolean value that determines
        the final look of the DataFrame. If True, the unpivoted
        DataFrame will be stacked in order of first appearance.
        (default: False)
    ignore_index (Optional[bool]): If True, the original index is
        ignored. If False, the original index is retained and the index
        labels will be repeated as necessary. (default: True)

Returns:

    DataFrame: A pandas DataFrame that has been unpivoted from wide to
    long format.
@pf.register_dataframe_method
def pivot_longer(
    df: pd.DataFrame,
    index: Optional[Union[list, tuple, str, Pattern]] = None,
    column_names: Optional[Union[list, tuple, str, Pattern]] = None,
    names_to: Optional[Union[list, tuple, str]] = None,
    values_to: Optional[str] = "value",
    column_level: Optional[Union[int, str]] = None,
    names_sep: Optional[Union[str, Pattern]] = None,
    names_pattern: Optional[Union[list, tuple, str, Pattern]] = None,
    names_transform: Optional[Union[str, Callable, dict]] = None,
    dropna: bool = False,
    sort_by_appearance: Optional[bool] = False,
    ignore_index: Optional[bool] = True,
) -> pd.DataFrame:
+"""Unpivots a DataFrame from *wide* to *long* format.
+
+ This method does not mutate the original DataFrame.
+
+ It is modeled after the `pivot_longer` function in R's tidyr package,
+ and also takes inspiration from R's data.table package.
+
+ This function is useful to massage a DataFrame into a format where
+ one or more columns are considered measured variables, and all other
+ columns are considered as identifier variables.
+
+ All measured variables are *unpivoted* (and typically duplicated) along the
+ row axis.
+
+ Column selection in `index` and `column_names` is possible using the
+ [`select`][janitor.functions.select.select] syntax.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame(
+ ... {
+ ... "Sepal.Length": [5.1, 5.9],
+ ... "Sepal.Width": [3.5, 3.0],
+ ... "Petal.Length": [1.4, 5.1],
+ ... "Petal.Width": [0.2, 1.8],
+ ... "Species": ["setosa", "virginica"],
+ ... }
+ ... )
+ >>> df
+ Sepal.Length Sepal.Width Petal.Length Petal.Width Species
+ 0 5.1 3.5 1.4 0.2 setosa
+ 1 5.9 3.0 5.1 1.8 virginica
+
+ Replicate pandas' melt:
+ >>> df.pivot_longer(index = 'Species')
+ Species variable value
+ 0 setosa Sepal.Length 5.1
+ 1 virginica Sepal.Length 5.9
+ 2 setosa Sepal.Width 3.5
+ 3 virginica Sepal.Width 3.0
+ 4 setosa Petal.Length 1.4
+ 5 virginica Petal.Length 5.1
+ 6 setosa Petal.Width 0.2
+ 7 virginica Petal.Width 1.8
+
+ Convenient, flexible column selection in the `index` via the
+ [`select`][janitor.functions.select.select] syntax:
+ >>> from pandas.api.types import is_string_dtype
+ >>> df.pivot_longer(index = is_string_dtype)
+ Species variable value
+ 0 setosa Sepal.Length 5.1
+ 1 virginica Sepal.Length 5.9
+ 2 setosa Sepal.Width 3.5
+ 3 virginica Sepal.Width 3.0
+ 4 setosa Petal.Length 1.4
+ 5 virginica Petal.Length 5.1
+ 6 setosa Petal.Width 0.2
+ 7 virginica Petal.Width 1.8
+
+ Split the column labels into parts:
+ >>> df.pivot_longer(
+ ... index = 'Species',
+ ... names_to = ('part', 'dimension'),
+ ... names_sep = '.',
+ ... sort_by_appearance = True,
+ ... )
+ Species part dimension value
+ 0 setosa Sepal Length 5.1
+ 1 setosa Sepal Width 3.5
+ 2 setosa Petal Length 1.4
+ 3 setosa Petal Width 0.2
+ 4 virginica Sepal Length 5.9
+ 5 virginica Sepal Width 3.0
+ 6 virginica Petal Length 5.1
+ 7 virginica Petal Width 1.8
+
+ Retain parts of the column names as headers:
+ >>> df.pivot_longer(
+ ... index = 'Species',
+ ... names_to = ('part', '.value'),
+ ... names_sep = '.',
+ ... sort_by_appearance = True,
+ ... )
+ Species part Length Width
+ 0 setosa Sepal 5.1 3.5
+ 1 setosa Petal 1.4 0.2
+ 2 virginica Sepal 5.9 3.0
+ 3 virginica Petal 5.1 1.8
+
+ Split the column labels based on regex:
+ >>> df = pd.DataFrame({"id": [1], "new_sp_m5564": [2], "newrel_f65": [3]})
+ >>> df
+ id new_sp_m5564 newrel_f65
+ 0 1 2 3
+ >>> df.pivot_longer(
+ ... index = 'id',
+ ... names_to = ('diagnosis', 'gender', 'age'),
+ ... names_pattern = r"new_?(.+)_(.)(\\d+)",
+ ... )
+ id diagnosis gender age value
+ 0 1 sp m 5564 2
+ 1 1 rel f 65 3
+
+ Split the column labels for the above dataframe using named groups in `names_pattern`:
+ >>> df.pivot_longer(
+ ... index = 'id',
+ ... names_pattern = r"new_?(?P<diagnosis>.+)_(?P<gender>.)(?P<age>\\d+)",
+ ... )
+ id diagnosis gender age value
+ 0 1 sp m 5564 2
+ 1 1 rel f 65 3
+
+ Convert the dtypes of specific columns with `names_transform`:
+ >>> result = (df
+ ... .pivot_longer(
+ ... index = 'id',
+ ... names_to = ('diagnosis', 'gender', 'age'),
+ ... names_pattern = r"new_?(.+)_(.)(\\d+)",
+ ... names_transform = {'gender': 'category', 'age':'int'})
+ ... )
+ >>> result.dtypes
+ id int64
+ diagnosis object
+ gender category
+ age int64
+ value int64
+ dtype: object
+
+ Use multiple `.value` to reshape dataframe:
+ >>> df = pd.DataFrame(
+ ... [
+ ... {
+ ... "x_1_mean": 10,
+ ... "x_2_mean": 20,
+ ... "y_1_mean": 30,
+ ... "y_2_mean": 40,
+ ... "unit": 50,
+ ... }
+ ... ]
+ ... )
+ >>> df
+ x_1_mean x_2_mean y_1_mean y_2_mean unit
+ 0 10 20 30 40 50
+ >>> df.pivot_longer(
+ ... index="unit",
+ ... names_to=(".value", "time", ".value"),
+ ... names_pattern=r"(x|y)_([0-9])(_mean)",
+ ... )
+ unit time x_mean y_mean
+ 0 50 1 10 30
+ 1 50 2 20 40
+
+ Replicate the above with named groups in `names_pattern` - use `_` instead of `.value`:
+ >>> df.pivot_longer(
+ ... index="unit",
+ ... names_pattern=r"(?P<_>x|y)_(?P<time>[0-9])(?P<__>_mean)",
+ ... )
+ unit time x_mean y_mean
+ 0 50 1 10 30
+ 1 50 2 20 40
+
+ Convenient, flexible column selection in the `column_names` via
+ [`select`][janitor.functions.select.select] syntax:
+ >>> df.pivot_longer(
+ ... column_names="*mean",
+ ... names_to=(".value", "time", ".value"),
+ ... names_pattern=r"(x|y)_([0-9])(_mean)",
+ ... )
+ unit time x_mean y_mean
+ 0 50 1 10 30
+ 1 50 2 20 40
+
+ >>> df.pivot_longer(
+ ... column_names=slice("x_1_mean", "y_2_mean"),
+ ... names_to=(".value", "time", ".value"),
+ ... names_pattern=r"(x|y)_([0-9])(_mean)",
+ ... )
+ unit time x_mean y_mean
+ 0 50 1 10 30
+ 1 50 2 20 40
+
+ Reshape dataframe by passing a sequence to `names_pattern`:
+ >>> df = pd.DataFrame({'hr1': [514, 573],
+ ... 'hr2': [545, 526],
+ ... 'team': ['Red Sox', 'Yankees'],
+ ... 'year1': [2007, 2007],
+ ... 'year2': [2008, 2008]})
+ >>> df
+ hr1 hr2 team year1 year2
+ 0 514 545 Red Sox 2007 2008
+ 1 573 526 Yankees 2007 2008
+ >>> df.pivot_longer(
+ ... index = 'team',
+ ... names_to = ['year', 'hr'],
+ ... names_pattern = ['year', 'hr']
+ ... )
+ team hr year
+ 0 Red Sox 514 2007
+ 1 Yankees 573 2007
+ 2 Red Sox 545 2008
+ 3 Yankees 526 2008
+
+
+ Reshape above dataframe by passing a dictionary to `names_pattern`:
+ >>> df.pivot_longer(
+ ... index = 'team',
+ ... names_pattern = {"year":"year", "hr":"hr"}
+ ... )
+ team hr year
+ 0 Red Sox 514 2007
+ 1 Yankees 573 2007
+ 2 Red Sox 545 2008
+ 3 Yankees 526 2008
+
+ Multiple values_to:
+ >>> df = pd.DataFrame(
+ ... {
+ ... "City": ["Houston", "Austin", "Hoover"],
+ ... "State": ["Texas", "Texas", "Alabama"],
+ ... "Name": ["Aria", "Penelope", "Niko"],
+ ... "Mango": [4, 10, 90],
+ ... "Orange": [10, 8, 14],
+ ... "Watermelon": [40, 99, 43],
+ ... "Gin": [16, 200, 34],
+ ... "Vodka": [20, 33, 18],
+ ... },
+ ... columns=[
+ ... "City",
+ ... "State",
+ ... "Name",
+ ... "Mango",
+ ... "Orange",
+ ... "Watermelon",
+ ... "Gin",
+ ... "Vodka",
+ ... ],
+ ... )
+ >>> df
+ City State Name Mango Orange Watermelon Gin Vodka
+ 0 Houston Texas Aria 4 10 40 16 20
+ 1 Austin Texas Penelope 10 8 99 200 33
+ 2 Hoover Alabama Niko 90 14 43 34 18
+ >>> df.pivot_longer(
+ ... index=["City", "State"],
+ ... column_names=slice("Mango", "Vodka"),
+ ... names_to=("Fruit", "Drink"),
+ ... values_to=("Pounds", "Ounces"),
+ ... names_pattern=["M|O|W", "G|V"],
+ ... )
+ City State Fruit Pounds Drink Ounces
+ 0 Houston Texas Mango 4 Gin 16.0
+ 1 Austin Texas Mango 10 Gin 200.0
+ 2 Hoover Alabama Mango 90 Gin 34.0
+ 3 Houston Texas Orange 10 Vodka 20.0
+ 4 Austin Texas Orange 8 Vodka 33.0
+ 5 Hoover Alabama Orange 14 Vodka 18.0
+ 6 Houston Texas Watermelon 40 None NaN
+ 7 Austin Texas Watermelon 99 None NaN
+ 8 Hoover Alabama Watermelon 43 None NaN
+
+ Replicate the above transformation with a nested dictionary passed to `names_pattern`
+ - the outer keys in the `names_pattern` dictionary are passed to `names_to`,
+ while the inner keys are passed to `values_to`:
+ >>> df.pivot_longer(
+ ... index=["City", "State"],
+ ... column_names=slice("Mango", "Vodka"),
+ ... names_pattern={
+ ... "Fruit": {"Pounds": "M|O|W"},
+ ... "Drink": {"Ounces": "G|V"},
+ ... },
+ ... )
+ City State Fruit Pounds Drink Ounces
+ 0 Houston Texas Mango 4 Gin 16.0
+ 1 Austin Texas Mango 10 Gin 200.0
+ 2 Hoover Alabama Mango 90 Gin 34.0
+ 3 Houston Texas Orange 10 Vodka 20.0
+ 4 Austin Texas Orange 8 Vodka 33.0
+ 5 Hoover Alabama Orange 14 Vodka 18.0
+ 6 Houston Texas Watermelon 40 None NaN
+ 7 Austin Texas Watermelon 99 None NaN
+ 8 Hoover Alabama Watermelon 43 None NaN
+
+ !!! abstract "Version Changed"
+
+ - 0.24.0
+ - Added `dropna` parameter.
+ - 0.24.1
+ - `names_pattern` can accept a dictionary.
+ - named groups supported in `names_pattern`.
+
+ Args:
+ df: A pandas DataFrame.
+ index: Name(s) of columns to use as identifier variables.
+ Should be either a single column name, or a list/tuple of
+ column names.
+ `index` should be a list of tuples if the columns are a MultiIndex.
+ column_names: Name(s) of columns to unpivot. Should be either
+ a single column name or a list/tuple of column names.
+ `column_names` should be a list of tuples
+ if the columns are a MultiIndex.
+ names_to: Name of new column as a string that will contain
+ what were previously the column names in `column_names`.
+ The default is `variable` if no value is provided. It can
+ also be a list/tuple of strings that will serve as new column
+ names, if `names_sep` or `names_pattern` is provided.
+ If `.value` is in `names_to`, new column names will be extracted
+ from part of the existing column names, and this overrides `values_to`.
+ values_to: Name of new column as a string that will contain what
+ were previously the values of the columns in `column_names`.
+ values_to can also be a list/tuple
+ and requires that names_pattern is also a list/tuple.
+ column_level: If columns are a MultiIndex, then use this level to
+ unpivot the DataFrame. Provided for compatibility with pandas' melt,
+ and applies only if neither `names_sep` nor `names_pattern` is
+ provided.
+ names_sep: Determines how the column name is broken up, if
+ `names_to` contains multiple values. It takes the same
+ specification as pandas' `str.split` method, and can be a string
+ or regular expression. `names_sep` does not work with MultiIndex
+ columns.
+ names_pattern: Determines how the column name is broken up.
+ It can be a regular expression containing matching groups.
+ Under the hood it is processed with pandas' `str.extract` function.
+ If it is a single regex, the number of groups must match
+ the length of `names_to`.
+ Named groups are supported, if `names_to` is None. `_` is used
+ instead of `.value` as a placeholder in named groups.
+ `_` can be overloaded for multiple `.value`
+ calls - `_`, `__`, `___`, ...
+ `names_pattern` can also be a list/tuple of regular expressions,
+ or a list/tuple of strings that will be treated
+ as regular expressions.
+ Under the hood it is processed with pandas' `str.contains` function.
+ For a list/tuple of regular expressions,
+ `names_to` must also be a list/tuple and the lengths of both
+ arguments must match.
+ `names_pattern` can also be a dictionary, where the keys are
+ the new column names, while the values can be a regular expression
+ or a string which will be evaluated as a regular expression.
+ Alternatively, a nested dictionary can be used, where the sub
+ key(s) are associated with `values_to`. Please have a look
+ at the examples for usage.
+ `names_pattern` does not work with MultiIndex columns.
+ names_transform: Use this option to change the types of columns that
+ have been transformed to rows. This does not apply to the values' columns.
+ Accepts any argument that is acceptable by `pd.astype`.
+ dropna: Determines whether or not to drop nulls
+ from the values columns. Default is `False`.
+ sort_by_appearance: Boolean value that determines
+ the final look of the DataFrame. If `True`, the unpivoted DataFrame
+ will be stacked in order of first appearance.
+ ignore_index: If `True`,
+ the original index is ignored. If `False`, the original index
+ is retained and the index labels will be repeated as necessary.
+
+ Returns:
+ A pandas DataFrame that has been unpivoted from wide to long
+ format.
+ """# noqa: E501
+
+ # this code builds on the wonderful work of @benjaminjack’s PR
+ # https://github.com/benjaminjack/pyjanitor/commit/e3df817903c20dd21634461c8a92aec137963ed0
+
+ return_computations_pivot_longer(
+ df=df,
+ index=index,
+ column_names=column_names,
+ column_level=column_level,
+ names_to=names_to,
+ values_to=values_to,
+ names_sep=names_sep,
+ names_pattern=names_pattern,
+ names_transform=names_transform,
+ dropna=dropna,
+ sort_by_appearance=sort_by_appearance,
+ ignore_index=ignore_index,
+ )
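
For completeness, a minimal sketch of the `dropna` parameter described in the Args above, on a small hypothetical frame (output shown for the default `sort_by_appearance=False`; illustrative only, hence skipped):

>>> df = pd.DataFrame({"id": [1, 2], "x1": [4, 5], "x2": [6, None]})
>>> df.pivot_longer(index="id", dropna=True)  # doctest: +SKIP
   id variable  value
0   1       x1    4.0
1   2       x1    5.0
2   1       x2    6.0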
+
pivot_wider
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `pd.DataFrame.pivot` instead."
+    )
+)
+def pivot_wider(
+    df: pd.DataFrame,
+    index: Optional[Union[list, str]] = None,
+    names_from: Optional[Union[list, str]] = None,
+    values_from: Optional[Union[list, str]] = None,
+    flatten_levels: Optional[bool] = True,
+    names_sep: str = "_",
+    names_glue: str = None,
+    reset_index: bool = True,
+    names_expand: bool = False,
+    index_expand: bool = False,
+) -> pd.DataFrame:
+    """Reshapes data from *long* to *wide* form.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.pivot` instead.
+
+ The number of columns are increased, while decreasing
+ the number of rows. It is the inverse of the
+ [`pivot_longer`][janitor.functions.pivot.pivot_longer]
+ method, and is a wrapper around `pd.DataFrame.pivot` method.
+
+ This method does not mutate the original DataFrame.
+
+ Column selection in `index`, `names_from` and `values_from`
+ is possible using the
+ [`select`][janitor.functions.select.select] syntax.
+
+ A ValueError is raised if the combination
+ of the `index` and `names_from` is not unique.
+
+ By default, values from `values_from` are always
+ at the top level if the columns are not flattened.
+ If flattened, the values from `values_from` are usually
+ at the start of each label in the columns.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = [{'dep': 5.5, 'step': 1, 'a': 20, 'b': 30},
+ ... {'dep': 5.5, 'step': 2, 'a': 25, 'b': 37},
+ ... {'dep': 6.1, 'step': 1, 'a': 22, 'b': 19},
+ ... {'dep': 6.1, 'step': 2, 'a': 18, 'b': 29}]
+ >>> df = pd.DataFrame(df)
+ >>> df
+ dep step a b
+ 0 5.5 1 20 30
+ 1 5.5 2 25 37
+ 2 6.1 1 22 19
+ 3 6.1 2 18 29
+
+ Pivot and flatten columns:
+ >>> df.pivot_wider( # doctest: +SKIP
+ ... index = "dep",
+ ... names_from = "step",
+ ... )
+ dep a_1 a_2 b_1 b_2
+ 0 5.5 20 25 30 37
+ 1 6.1 22 18 19 29
+
+ Modify columns with `names_sep`:
+ >>> df.pivot_wider( # doctest: +SKIP
+ ... index = "dep",
+ ... names_from = "step",
+ ... names_sep = "",
+ ... )
+ dep a1 a2 b1 b2
+ 0 5.5 20 25 30 37
+ 1 6.1 22 18 19 29
+
+ Modify columns with `names_glue`:
+ >>> df.pivot_wider( # doctest: +SKIP
+ ... index = "dep",
+ ... names_from = "step",
+ ... names_glue = "{_value}_step{step}",
+ ... )
+ dep a_step1 a_step2 b_step1 b_step2
+ 0 5.5 20 25 30 37
+ 1 6.1 22 18 19 29
+
+ Expand columns to expose implicit missing values
+ - this applies only to categorical columns:
+ >>> weekdays = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
+ >>> daily = pd.DataFrame(
+ ... {
+ ... "day": pd.Categorical(
+ ... values=("Tue", "Thu", "Fri", "Mon"), categories=weekdays
+ ... ),
+ ... "value": (2, 3, 1, 5),
+ ... },
+ ... index=[0, 0, 0, 0],
+ ... )
+ >>> daily
+ day value
+ 0 Tue 2
+ 0 Thu 3
+ 0 Fri 1
+ 0 Mon 5
+ >>> daily.pivot_wider(names_from='day', values_from='value') # doctest: +SKIP
+ Tue Thu Fri Mon
+ 0 2 3 1 5
+ >>> (daily # doctest: +SKIP
+ ... .pivot_wider(
+ ... names_from='day',
+ ... values_from='value',
+ ... names_expand=True)
+ ... )
+ Mon Tue Wed Thu Fri Sat Sun
+ 0 5 2 NaN 3 1 NaN NaN
+
+ Expand the index to expose implicit missing values
+ - this applies only to categorical columns:
+ >>> daily = daily.assign(letter = list('ABBA'))
+ >>> daily
+ day value letter
+ 0 Tue 2 A
+ 0 Thu 3 B
+ 0 Fri 1 B
+ 0 Mon 5 A
+ >>> daily.pivot_wider(index='day',names_from='letter',values_from='value') # doctest: +SKIP
+ day A B
+ 0 Tue 2.0 NaN
+ 1 Thu NaN 3.0
+ 2 Fri NaN 1.0
+ 3 Mon 5.0 NaN
+ >>> (daily # doctest: +SKIP
+ ... .pivot_wider(
+ ... index='day',
+ ... names_from='letter',
+ ... values_from='value',
+ ... index_expand=True)
+ ... )
+ day A B
+ 0 Mon 5.0 NaN
+ 1 Tue 2.0 NaN
+ 2 Wed NaN NaN
+ 3 Thu NaN 3.0
+ 4 Fri NaN 1.0
+ 5 Sat NaN NaN
+ 6 Sun NaN NaN
+
+
+ !!! abstract "Version Changed"
+
+ - 0.24.0
+ - Added `reset_index`, `names_expand` and `index_expand` parameters.
+
+ Args:
+ df: A pandas DataFrame.
+ index: Name(s) of columns to use as identifier variables.
+ It should be either a single column name, or a list of column names.
+ If `index` is not provided, the DataFrame's index is used.
+ names_from: Name(s) of column(s) to use to make the new
+ DataFrame's columns. Should be either a single column name,
+ or a list of column names.
+ values_from: Name(s) of column(s) that will be used for populating
+ the new DataFrame's values.
+ If `values_from` is not specified, all remaining columns
+ will be used.
+ flatten_levels: If `False`, the DataFrame stays as a MultiIndex.
+ names_sep: If `names_from` or `values_from` contain multiple
+ variables, this will be used to join the values into a single string
+ to use as a column name. Default is `_`.
+ Applicable only if `flatten_levels` is `True`.
+ names_glue: A string to control the output of the flattened columns.
+ It offers more flexibility in creating custom column names,
+ and uses python's `str.format_map` under the hood.
+ Simply create the string template,
+ using the column labels in `names_from`,
+ and special `_value` as a placeholder for `values_from`.
+ Applicable only if `flatten_levels` is `True`.
+ reset_index: Determines whether to restore `index`
+ as a column/columns. Applicable only if `index` is provided,
+ and `flatten_levels` is `True`.
+ names_expand: Expand columns to show all the categories.
+ Applies only if `names_from` is a categorical column.
+ index_expand: Expand the index to show all the categories.
+ Applies only if `index` is a categorical column.
+
+ Returns:
+ A pandas DataFrame that has been unpivoted from long to wide form.
+ """# noqa: E501
+
+ # no need for an explicit copy --> df = df.copy()
+ # `pd.pivot` creates one
+ return_computations_pivot_wider(
+ df,
+ index,
+ names_from,
+ values_from,
+ flatten_levels,
+ names_sep,
+ names_glue,
+ reset_index,
+ names_expand,
+ index_expand,
+ )
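
Since `pivot_wider` is deprecated in favour of `pd.DataFrame.pivot`, here is a minimal sketch of an equivalent call for the `dep`/`step` example above, with the resulting MultiIndex columns flattened by hand (illustrative only, hence skipped):

>>> out = df.pivot(index="dep", columns="step", values=["a", "b"])  # doctest: +SKIP
>>> out.columns = [f"{value}_{step}" for value, step in out.columns]  # doctest: +SKIP
>>> out.reset_index()  # doctest: +SKIP
   dep  a_1  a_2  b_1  b_2
0  5.5   20   25   30   37
1  6.1   22   18   19   29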
+
process_text
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `jn.transform_columns` instead."
+    )
+)
+@deprecated_alias(column="column_name")
+def process_text(
+    df: pd.DataFrame,
+    column_name: str,
+    string_function: str,
+    **kwargs: Any,
+) -> pd.DataFrame:
+    """Apply a Pandas string method to an existing column.
+
+ This function aims to make string cleaning easy, while chaining,
+ by simply passing the string method name,
+ along with keyword arguments, if any, to the function.
+
+ This modifies an existing column; it does not create a new column;
+ new columns can be created via pyjanitor's
+ [`transform_columns`][janitor.functions.transform_columns.transform_columns].
+
+ A list of all the string methods in Pandas can be accessed [here](https://pandas.pydata.org/docs/user_guide/text.html#method-summary).
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use [`jn.transform_column`][janitor.functions.transform_columns.transform_column]
+ instead.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> import re
+ >>> df = pd.DataFrame({"text": ["Ragnar", "sammywemmy", "ginger"],
+ ... "code": [1, 2, 3]})
+ >>> df
+ text code
+ 0 Ragnar 1
+ 1 sammywemmy 2
+ 2 ginger 3
+ >>> df.process_text(column_name="text", string_function="lower")
+ text code
+ 0 ragnar 1
+ 1 sammywemmy 2
+ 2 ginger 3
+
+ For string methods with parameters, simply pass the keyword arguments:
+
+ >>> df.process_text(
+ ... column_name="text",
+ ... string_function="extract",
+ ... pat=r"(ag)",
+ ... expand=False,
+ ... flags=re.IGNORECASE,
+ ... )
+ text code
+ 0 ag 1
+ 1 NaN 2
+ 2 NaN 3
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: String column to be operated on.
+ string_function: pandas string method to be applied.
+ **kwargs: Keyword arguments for parameters of the `string_function`.
+
+ Raises:
+ KeyError: If `string_function` is not a Pandas string method.
+ ValueError: If the text function returns a DataFrame, instead of a Series.
+
+ Returns:
+ A pandas DataFrame with modified column.
+ """# noqa: E501
+
+ check("column_name",column_name,[str])
+ check("string_function",string_function,[str])
+ check_column(df,[column_name])
+
+ pandas_string_methods=[
+ func.__name__
+ for_,funcininspect.getmembers(pd.Series.str,inspect.isfunction)
+ ifnotfunc.__name__.startswith("_")
+ ]
+
+ ifstring_functionnotinpandas_string_methods:
+ raiseKeyError(f"{string_function} is not a Pandas string method.")
+
+ result=getattr(df[column_name].str,string_function)(**kwargs)
+
+ ifisinstance(result,pd.DataFrame):
+ raiseValueError(
+ "The outcome of the processed text is a DataFrame, "
+ "which is not supported in `process_text`."
+ )
+
+ returndf.assign(**{column_name:result})
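
A minimal sketch of the suggested `transform_column` replacement for the lower-casing example above (assuming the same `df`; element-wise application is `transform_column`'s default, so `str.lower` can be passed directly):

>>> df.transform_column("text", str.lower)  # doctest: +SKIP
         text  code
0      ragnar     1
1  sammywemmy     2
2      ginger     3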
remove_columns

Implementation of remove_columns.

remove_columns(df, column_names)
+ Source code in janitor/functions/remove_columns.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `pd.DataFrame.drop` instead."
+    )
+)
+@deprecated_alias(columns="column_names")
+def remove_columns(
+    df: pd.DataFrame,
+    column_names: Union[str, Iterable[str], Hashable],
+) -> pd.DataFrame:
+    """Remove the set of columns specified in `column_names`.
+
+ This method does not mutate the original DataFrame.
+
+ Intended to be the method-chaining alternative to `del df[col]`.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Kindly use `pd.DataFrame.drop` instead.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": [2, 4, 6], "b": [1, 3, 5], "c": [7, 8, 9]})
+ >>> df
+ a b c
+ 0 2 1 7
+ 1 4 3 8
+ 2 6 5 9
+ >>> df.remove_columns(column_names=['a', 'c'])
+ b
+ 0 1
+ 1 3
+ 2 5
+
+ Args:
+ df: A pandas DataFrame.
+ column_names: The columns to remove.
+
+ Returns:
+ A pandas DataFrame.
+ """
+
+    return df.drop(columns=column_names)
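
As the one-line body shows, the suggested replacement is a direct `pd.DataFrame.drop` call; a sketch using the example frame above:

>>> df.drop(columns=['a', 'c'])  # doctest: +SKIP
   b
0  1
1  3
2  5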
+
remove_empty

Implementation of remove_empty.

remove_empty(df, reset_index=True)

Drop all rows and columns that are completely null.

This method does not mutate the original DataFrame.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({
...     "a": [1, np.nan, 2],
...     "b": [3, np.nan, 4],
...     "c": [np.nan, np.nan, np.nan],
... })
>>> df
     a    b   c
0  1.0  3.0 NaN
1  NaN  NaN NaN
2  2.0  4.0 NaN
>>> df.remove_empty()
     a    b
0  1.0  3.0
1  2.0  4.0

Args:
    df: The pandas DataFrame object.
    reset_index: Determines if the index is reset. Default is True.

Returns:
    A pandas DataFrame.

Source code in janitor/functions/remove_empty.py
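A short sketch of the `reset_index` option (assuming the `df` above): with `reset_index=False`, the surviving rows keep their original index labels:

>>> df.remove_empty(reset_index=False)  # doctest: +SKIP
     a    b
0  1.0  3.0
2  2.0  4.0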
rename_column
+ Source code in janitor/functions/rename_columns.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `pd.DataFrame.rename` instead."
+    )
+)
+@deprecated_alias(old="old_column_name", new="new_column_name")
+def rename_column(
+    df: pd.DataFrame,
+    old_column_name: str,
+    new_column_name: str,
+) -> pd.DataFrame:
+"""Rename a column in place.
+
+ This method does not mutate the original DataFrame.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.rename` instead.
+
+ This is just syntactic sugar/a convenience function for renaming one column at a time.
+ If you are convinced that there are multiple columns in need of changing,
+ then use the `pandas.DataFrame.rename` method.
+
+ Examples:
+ Change the name of column 'a' to 'a_new'.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": list(range(3)), "b": list("abc")})
+ >>> df.rename_column(old_column_name='a', new_column_name='a_new')
+ a_new b
+ 0 0 a
+ 1 1 b
+ 2 2 c
+
+ Args:
+ df: The pandas DataFrame object.
+ old_column_name: The old column name.
+ new_column_name: The new column name.
+
+ Returns:
+ A pandas DataFrame with renamed columns.
+ """# noqa: E501
+
+ check_column(df,[old_column_name])
+
+ returndf.rename(columns={old_column_name:new_column_name})
+
rename_columns
+ Source code in janitor/functions/rename_columns.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `pd.DataFrame.rename` instead."
+    )
+)
+def rename_columns(
+    df: pd.DataFrame,
+    new_column_names: Union[Dict, None] = None,
+    function: Callable = None,
+) -> pd.DataFrame:
+    """Rename columns.
+
+ This method does not mutate the original DataFrame.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.rename` instead.
+
+ One of `new_column_names` or `function` is a required parameter.
+ If both are provided, then `new_column_names` takes priority and `function`
+ is never executed.
+
+ Examples:
+ Rename columns using a dictionary which maps old names to new names.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": list(range(3)), "b": list("xyz")})
+ >>> df
+ a b
+ 0 0 x
+ 1 1 y
+ 2 2 z
+ >>> df.rename_columns(new_column_names={"a": "a_new", "b": "b_new"})
+ a_new b_new
+ 0 0 x
+ 1 1 y
+ 2 2 z
+
+ Rename columns using a generic callable.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"a": list(range(3)), "b": list("xyz")})
+ >>> df.rename_columns(function=str.upper)
+ A B
+ 0 0 x
+ 1 1 y
+ 2 2 z
+
+ Args:
+ df: The pandas DataFrame object.
+ new_column_names: A dictionary of old and new column names.
+ function: A function which should be applied to all the columns.
+
+ Raises:
+ ValueError: If both `new_column_names` and `function` are None.
+
+ Returns:
+ A pandas DataFrame with renamed columns.
+ """# noqa: E501
+
+ ifnew_column_namesisNoneandfunctionisNone:
+ raiseValueError(
+ "One of new_column_names or function must be provided"
+ )
+
+ ifnew_column_namesisnotNone:
+ check_column(df,new_column_names)
+ returndf.rename(columns=new_column_names)
+
+ returndf.rename(mapper=function,axis="columns")
+
reorder_columns

Implementation source for reorder_columns.

reorder_columns(df, column_order)
@pf.register_dataframe_method
+def reorder_columns(
+    df: pd.DataFrame, column_order: Union[Iterable[str], pd.Index, Hashable]
+) -> pd.DataFrame:
+    """Reorder DataFrame columns by specifying desired order as list of col names.
+
+ Columns not specified retain their order and follow after the columns specified
+ in `column_order`.
+
+ All columns specified within the `column_order` list must be present within `df`.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({"col1": [1, 1, 1], "col2": [2, 2, 2], "col3": [3, 3, 3]})
+ >>> df
+ col1 col2 col3
+ 0 1 2 3
+ 1 1 2 3
+ 2 1 2 3
+ >>> df.reorder_columns(['col3', 'col1'])
+ col3 col1 col2
+ 0 3 1 2
+ 1 3 1 2
+ 2 3 1 2
+
+ Notice that the column order of `df` is now `col3`, `col1`, `col2`.
+
+ Internally, this function uses `DataFrame.reindex` with `copy=False`
+ to avoid unnecessary data duplication.
+
+ Args:
+ df: `DataFrame` to reorder
+ column_order: A list of column names or Pandas `Index`
+ specifying their order in the returned `DataFrame`.
+
+ Raises:
+ IndexError: If a column within `column_order` is not found
+ within the DataFrame.
+
+ Returns:
+ A pandas DataFrame with reordered columns.
+ """# noqa: E501
+ check("column_order",column_order,[list,tuple,pd.Index])
+
+ ifany(colnotindf.columnsforcolincolumn_order):
+ raiseIndexError(
+ "One or more columns in `column_order` were not found in the "
+ "DataFrame."
+ )
+
+ # if column_order is a Pandas index, needs conversion to list:
+ column_order=list(column_order)
+
+ returndf.reindex(
+ columns=(
+ column_order
+ +[colforcolindf.columnsifcolnotincolumn_order]
+ ),
+ copy=False,
+ )
+
@pf.register_dataframe_method
+@deprecated_alias(col_name="column_name")
+def round_to_fraction(
+    df: pd.DataFrame,
+    column_name: Hashable,
+    denominator: float,
+    digits: float = np.inf,
+) -> pd.DataFrame:
+    """Round all values in a column to a fraction.
+
+ This method mutates the original DataFrame.
+
+ Taken from [the R package](https://github.com/sfirke/janitor/issues/235).
+
+ Also, optionally round to a specified number of digits.
+
+ Examples:
+ Round numeric column to the nearest 1/4 value.
+
+ >>> import numpy as np
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a1": [1.263, 2.499, np.nan],
+ ... "a2": ["x", "y", "z"],
+ ... })
+ >>> df
+ a1 a2
+ 0 1.263 x
+ 1 2.499 y
+ 2 NaN z
+ >>> df.round_to_fraction("a1", denominator=4)
+ a1 a2
+ 0 1.25 x
+ 1 2.50 y
+ 2 NaN z
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: Name of column to round to fraction.
+ denominator: The denominator of the fraction for rounding. Must be
+ a positive number.
+ digits: The number of digits for rounding after rounding to the
+ fraction. Default is np.inf (i.e. no subsequent rounding).
+
+ Raises:
+ ValueError: If `denominator` is not a positive number.
+
+ Returns:
+ A pandas DataFrame with a column's values rounded.
+ """
+    check_column(df, column_name)
+    check("denominator", denominator, [float, int])
+    check("digits", digits, [float, int])
+
+    if denominator <= 0:
+        raise ValueError("denominator is expected to be a positive number.")
+
+    df[column_name] = round(df[column_name] * denominator, 0) / denominator
+    if not np.isinf(digits):
+        df[column_name] = round(df[column_name], digits)
+
+    return df
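
A short sketch of the optional `digits` parameter, assuming a fresh copy of the example frame above (round to the nearest third, then to two decimal places):

>>> df.round_to_fraction("a1", denominator=3, digits=2)  # doctest: +SKIP
     a1 a2
0  1.33  x
1  2.33  y
2   NaN  z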
+
row_to_names
+ Source code in janitor/functions/row_to_names.py
+
@pf.register_dataframe_method
+@deprecated_alias(row_number="row_numbers", remove_row="remove_rows")
+def row_to_names(
+    df: pd.DataFrame,
+    row_numbers: int = 0,
+    remove_rows: bool = False,
+    remove_rows_above: bool = False,
+    reset_index: bool = False,
+) -> pd.DataFrame:
+    """Elevates a row, or rows, to be the column names of a DataFrame.
+
+ This method does not mutate the original DataFrame.
+
+ Contains options to remove the elevated row from the DataFrame along with
+ removing the rows above the selected row.
+
+ Examples:
+ Replace column names with the first row and reset the index.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": ["nums", 6, 9],
+ ... "b": ["chars", "x", "y"],
+ ... })
+ >>> df
+ a b
+ 0 nums chars
+ 1 6 x
+ 2 9 y
+ >>> df.row_to_names(0, remove_rows=True, reset_index=True)
+ nums chars
+ 0 6 x
+ 1 9 y
+ >>> df.row_to_names([0,1], remove_rows=True, reset_index=True)
+ nums chars
+ 6 x
+ 0 9 y
+
+ Remove rows above the elevated row and the elevated row itself.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": ["bla1", "nums", 6, 9],
+ ... "b": ["bla2", "chars", "x", "y"],
+ ... })
+ >>> df
+ a b
+ 0 bla1 bla2
+ 1 nums chars
+ 2 6 x
+ 3 9 y
+ >>> df.row_to_names(1, remove_rows=True, remove_rows_above=True, reset_index=True)
+ nums chars
+ 0 6 x
+ 1 9 y
+
+ Args:
+ df: A pandas DataFrame.
+ row_numbers: Position of the row(s) containing the variable names.
+ Note that indexing starts from 0. It can also be a list,
+ in which case, a MultiIndex column is created.
+ Defaults to 0 (first row).
+ remove_rows: Whether the row(s) should be removed from the DataFrame.
+ remove_rows_above: Whether the row(s) above the selected row should
+ be removed from the DataFrame.
+ reset_index: Whether the index should be reset on the returning DataFrame.
+
+ Returns:
+ A pandas DataFrame with set column names.
+ """# noqa: E501
+ ifnotpd.options.mode.copy_on_write:
+ df=df.copy()
+
+ check("row_number",row_numbers,[int,list])
+ ifisinstance(row_numbers,list):
+ forentryinrow_numbers:
+ check("entry in the row_number argument",entry,[int])
+
+ warnings.warn(
+ "The function row_to_names will, in the official 1.0 release, "
+ "change its behaviour to reset the dataframe's index by default. "
+ "You can prepare for this change right now by explicitly setting "
+ "`reset_index=True` when calling on `row_to_names`."
+ )
+ # should raise if positional indexers are missing
+ # IndexError: positional indexers are out-of-bounds
+ headers=df.iloc[row_numbers]
+ ifisinstance(headers,pd.DataFrame)and(len(headers)==1):
+ headers=headers.squeeze()
+ ifisinstance(headers,pd.Series):
+ headers=pd.Index(headers)
+ else:
+ headers=[entry.arrayfor_,entryinheaders.items()]
+ headers=pd.MultiIndex.from_tuples(headers)
+
+ df.columns=headers
+ df.columns.name=None
+
+ df_index=df.index
+ ifremove_rows_above:
+ ifisinstance(row_numbers,list):
+ ifnot(np.diff(row_numbers)==1).all():
+ raiseValueError(
+ "The remove_rows_above argument is applicable "
+ "only if the row_numbers argument is an integer, "
+ "or the integers in a list are consecutive increasing, "
+ "with a difference of 1."
+ )
+ tail=row_numbers[0]
+ else:
+ tail=row_numbers
+ df=df.iloc[tail:]
+ ifremove_rows:
+ ifisinstance(row_numbers,int):
+ row_numbers=[row_numbers]
+ df_index=df.index.symmetric_difference(df_index[row_numbers])
+ df=df.loc[df_index]
+ ifreset_index:
+ df.index=range(len(df))
+ returndf
+
select
+ Source code in janitor/functions/select.py
+
@pf.register_dataframe_method
+@deprecated_alias(rows="index")
+def select(
+    df: pd.DataFrame,
+    *args,
+    index: Any = None,
+    columns: Any = None,
+    axis: str = "columns",
+    invert: bool = False,
+) -> pd.DataFrame:
+    """Method-chainable selection of rows and columns.
+
+ It accepts a string, shell-like glob strings `(*string*)`,
+ regex, slice, array-like object, or a list of the previous options.
+
+ Selection on a MultiIndex on a level, or multiple levels,
+ is possible with a dictionary.
+
+ This method does not mutate the original DataFrame.
+
+ Selection can be inverted with the `DropLabel` class.
+
+ Optional ability to invert selection of index/columns available as well.
+
+
+ !!! info "New in version 0.24.0"
+
+
+ !!!note
+
+ The preferred option when selecting columns or rows in a Pandas DataFrame
+ is with `.loc` or `.iloc` methods, as they are generally performant.
+ `select` is primarily for convenience.
+
+ !!! abstract "Version Changed"
+
+ - 0.26.0
+ - Added variable `args`, `invert` and `axis` parameters.
+ - `rows` keyword deprecated in favour of `index`.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
+ ... index=['cobra', 'viper', 'sidewinder'],
+ ... columns=['max_speed', 'shield'])
+ >>> df
+ max_speed shield
+ cobra 1 2
+ viper 4 5
+ sidewinder 7 8
+ >>> df.select(index='cobra', columns='shield')
+ shield
+ cobra 2
+
+ Labels can be dropped with the `DropLabel` class:
+
+ >>> df.select(index=DropLabel('cobra'))
+ max_speed shield
+ viper 4 5
+ sidewinder 7 8
+
+ More examples can be found in the
+ [`select_columns`][janitor.functions.select.select_columns] section.
+
+ Args:
+ df: A pandas DataFrame.
+ *args: Valid inputs include: an exact index name to look for,
+ a shell-style glob string (e.g. `*_thing_*`),
+ a regular expression,
+ a callable,
+ or variable arguments of all the aforementioned.
+ A sequence of booleans is also acceptable.
+ A dictionary can be used for selection
+ on a MultiIndex on different levels.
+ index: Valid inputs include: an exact label to look for,
+ a shell-style glob string (e.g. `*_thing_*`),
+ a regular expression,
+ a callable,
+ or variable arguments of all the aforementioned.
+ A sequence of booleans is also acceptable.
+ A dictionary can be used for selection
+ on a MultiIndex on different levels.
+ columns: Valid inputs include: an exact label to look for,
+ a shell-style glob string (e.g. `*_thing_*`),
+ a regular expression,
+ a callable,
+ or variable arguments of all the aforementioned.
+ A sequence of booleans is also acceptable.
+ A dictionary can be used for selection
+ on a MultiIndex on different levels.
+ invert: Whether or not to invert the selection.
+ This will result in the selection
+ of the complement of the rows/columns provided.
+ axis: Whether the selection should be on the index('index'),
+ or columns('columns').
+ Applicable only for the variable args parameter.
+
+ Raises:
+ ValueError: If args and index/columns are provided.
+
+ Returns:
+ A pandas DataFrame with the specified rows and/or columns selected.
+ """# noqa: E501
+
+ ifargs:
+ check("invert",invert,[bool])
+ if(indexisnotNone)or(columnsisnotNone):
+ raiseValueError(
+ "Either provide variable args with the axis parameter, "
+ "or provide arguments to the index and/or columns parameters."
+ )
+ ifaxis=="index":
+ return_select(df,rows=list(args),columns=columns,invert=invert)
+ ifaxis=="columns":
+ return_select(df,columns=list(args),rows=index,invert=invert)
+ raiseValueError("axis should be either 'index' or 'columns'.")
+ return_select(df,rows=index,columns=columns,invert=invert)
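
A brief sketch of the variable-args form with the `axis` parameter, assuming the snakes `df` from the docstring examples (illustrative only, hence skipped):

>>> df.select("max_speed", axis="columns")  # doctest: +SKIP
            max_speed
cobra               1
viper               4
sidewinder          7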
+
select_columns(df, *args, invert=False)

Method-chainable selection of columns.
+ Source code in janitor/functions/select.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `jn.select` instead."
+    )
+)
+def select_columns(
+    df: pd.DataFrame,
+    *args: Any,
+    invert: bool = False,
+) -> pd.DataFrame:
+    """Method-chainable selection of columns.
+
+ It accepts a string, shell-like glob strings `(*string*)`,
+ regex, slice, array-like object, or a list of the previous options.
+
+ Selection on a MultiIndex on a level, or multiple levels,
+ is possible with a dictionary.
+
+ This method does not mutate the original DataFrame.
+
+ Optional ability to invert selection of columns available as well.
+
+ !!!note
+
+ The preferred option when selecting columns or rows in a Pandas DataFrame
+ is with `.loc` or `.iloc` methods.
+ `select_columns` is primarily for convenience.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `jn.select` instead.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> from numpy import nan
+ >>> pd.set_option("display.max_columns", None)
+ >>> pd.set_option("display.expand_frame_repr", False)
+ >>> pd.set_option("max_colwidth", None)
+ >>> data = {'name': ['Cheetah','Owl monkey','Mountain beaver',
+ ... 'Greater short-tailed shrew','Cow'],
+ ... 'genus': ['Acinonyx', 'Aotus', 'Aplodontia', 'Blarina', 'Bos'],
+ ... 'vore': ['carni', 'omni', 'herbi', 'omni', 'herbi'],
+ ... 'order': ['Carnivora','Primates','Rodentia','Soricomorpha','Artiodactyla'],
+ ... 'conservation': ['lc', nan, 'nt', 'lc', 'domesticated'],
+ ... 'sleep_total': [12.1, 17.0, 14.4, 14.9, 4.0],
+ ... 'sleep_rem': [nan, 1.8, 2.4, 2.3, 0.7],
+ ... 'sleep_cycle': [nan, nan, nan, 0.133333333, 0.666666667],
+ ... 'awake': [11.9, 7.0, 9.6, 9.1, 20.0],
+ ... 'brainwt': [nan, 0.0155, nan, 0.00029, 0.423],
+ ... 'bodywt': [50.0, 0.48, 1.35, 0.019, 600.0]}
+ >>> df = pd.DataFrame(data)
+ >>> df
+ name genus vore order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
+ 0 Cheetah Acinonyx carni Carnivora lc 12.1 NaN NaN 11.9 NaN 50.000
+ 1 Owl monkey Aotus omni Primates NaN 17.0 1.8 NaN 7.0 0.01550 0.480
+ 2 Mountain beaver Aplodontia herbi Rodentia nt 14.4 2.4 NaN 9.6 NaN 1.350
+ 3 Greater short-tailed shrew Blarina omni Soricomorpha lc 14.9 2.3 0.133333 9.1 0.00029 0.019
+ 4 Cow Bos herbi Artiodactyla domesticated 4.0 0.7 0.666667 20.0 0.42300 600.000
+
+ Explicit label selection:
+ >>> df.select_columns('name', 'order')
+ name order
+ 0 Cheetah Carnivora
+ 1 Owl monkey Primates
+ 2 Mountain beaver Rodentia
+ 3 Greater short-tailed shrew Soricomorpha
+ 4 Cow Artiodactyla
+
+ Selection via globbing:
+ >>> df.select_columns("sleep*", "*wt")
+ sleep_total sleep_rem sleep_cycle brainwt bodywt
+ 0 12.1 NaN NaN NaN 50.000
+ 1 17.0 1.8 NaN 0.01550 0.480
+ 2 14.4 2.4 NaN NaN 1.350
+ 3 14.9 2.3 0.133333 0.00029 0.019
+ 4 4.0 0.7 0.666667 0.42300 600.000
+
+ Selection via regex:
+ >>> import re
+ >>> df.select_columns(re.compile(r"o.+er"))
+ order conservation
+ 0 Carnivora lc
+ 1 Primates NaN
+ 2 Rodentia nt
+ 3 Soricomorpha lc
+ 4 Artiodactyla domesticated
+
+ Selection via slicing:
+ >>> df.select_columns(slice('name','order'), slice('sleep_total','sleep_cycle'))
+ name genus vore order sleep_total sleep_rem sleep_cycle
+ 0 Cheetah Acinonyx carni Carnivora 12.1 NaN NaN
+ 1 Owl monkey Aotus omni Primates 17.0 1.8 NaN
+ 2 Mountain beaver Aplodontia herbi Rodentia 14.4 2.4 NaN
+ 3 Greater short-tailed shrew Blarina omni Soricomorpha 14.9 2.3 0.133333
+ 4 Cow Bos herbi Artiodactyla 4.0 0.7 0.666667
+
+ Selection via callable:
+ >>> from pandas.api.types import is_numeric_dtype
+ >>> df.select_columns(is_numeric_dtype)
+ sleep_total sleep_rem sleep_cycle awake brainwt bodywt
+ 0 12.1 NaN NaN 11.9 NaN 50.000
+ 1 17.0 1.8 NaN 7.0 0.01550 0.480
+ 2 14.4 2.4 NaN 9.6 NaN 1.350
+ 3 14.9 2.3 0.133333 9.1 0.00029 0.019
+ 4 4.0 0.7 0.666667 20.0 0.42300 600.000
+ >>> df.select_columns(lambda f: f.isna().any())
+ conservation sleep_rem sleep_cycle brainwt
+ 0 lc NaN NaN NaN
+ 1 NaN 1.8 NaN 0.01550
+ 2 nt 2.4 NaN NaN
+ 3 lc 2.3 0.133333 0.00029
+ 4 domesticated 0.7 0.666667 0.42300
+
+ Exclude columns with the `invert` parameter:
+ >>> df.select_columns(is_numeric_dtype, invert=True)
+ name genus vore order conservation
+ 0 Cheetah Acinonyx carni Carnivora lc
+ 1 Owl monkey Aotus omni Primates NaN
+ 2 Mountain beaver Aplodontia herbi Rodentia nt
+ 3 Greater short-tailed shrew Blarina omni Soricomorpha lc
+ 4 Cow Bos herbi Artiodactyla domesticated
+
+ Exclude columns with the `DropLabel` class:
+ >>> from janitor import DropLabel
+ >>> df.select_columns(DropLabel(slice("name", "awake")), "conservation")
+ brainwt bodywt conservation
+ 0 NaN 50.000 lc
+ 1 0.01550 0.480 NaN
+ 2 NaN 1.350 nt
+ 3 0.00029 0.019 lc
+ 4 0.42300 600.000 domesticated
+
+ Selection on MultiIndex columns:
+ >>> d = {'num_legs': [4, 4, 2, 2],
+ ... 'num_wings': [0, 0, 2, 2],
+ ... 'class': ['mammal', 'mammal', 'mammal', 'bird'],
+ ... 'animal': ['cat', 'dog', 'bat', 'penguin'],
+ ... 'locomotion': ['walks', 'walks', 'flies', 'walks']}
+ >>> df = pd.DataFrame(data=d)
+ >>> df = df.set_index(['class', 'animal', 'locomotion']).T
+ >>> df
+ class mammal bird
+ animal cat dog bat penguin
+ locomotion walks walks flies walks
+ num_legs 4 4 2 2
+ num_wings 0 0 2 2
+
+ Selection with a scalar:
+ >>> df.select_columns('mammal')
+ class mammal
+ animal cat dog bat
+ locomotion walks walks flies
+ num_legs 4 4 2
+ num_wings 0 0 2
+
+ Selection with a tuple:
+ >>> df.select_columns(('mammal','bat'))
+ class mammal
+ animal bat
+ locomotion flies
+ num_legs 2
+ num_wings 2
+
+ Selection within a level is possible with a dictionary,
+ where the key is either a level name or number:
+ >>> df.select_columns({'animal':'cat'})
+ class mammal
+ animal cat
+ locomotion walks
+ num_legs 4
+ num_wings 0
+ >>> df.select_columns({1:["bat", "cat"]})
+ class mammal
+ animal bat cat
+ locomotion flies walks
+ num_legs 2 4
+ num_wings 2 0
+
+ Selection on multiple levels:
+ >>> df.select_columns({"class":"mammal", "locomotion":"flies"})
+ class mammal
+ animal bat
+ locomotion flies
+ num_legs 2
+ num_wings 2
+
+ Selection with a regex on a level:
+ >>> df.select_columns({"animal":re.compile(".+t$")})
+ class mammal
+ animal cat bat
+ locomotion walks flies
+ num_legs 4 2
+ num_wings 0 2
+
+ Selection with a callable on a level:
+ >>> df.select_columns({"animal":lambda f: f.str.endswith('t')})
+ class mammal
+ animal cat bat
+ locomotion walks flies
+ num_legs 4 2
+ num_wings 0 2
+
+ Args:
+ df: A pandas DataFrame.
+ *args: Valid inputs include: an exact column name to look for,
+ a shell-style glob string (e.g. `*_thing_*`),
+ a regular expression,
+ a callable,
+ or variable arguments of all the aforementioned.
+ A sequence of booleans is also acceptable.
+ A dictionary can be used for selection
+ on a MultiIndex on different levels.
+ invert: Whether or not to invert the selection.
+ This will result in the selection
+ of the complement of the columns provided.
+
+ Returns:
+ A pandas DataFrame with the specified columns selected.
+ """# noqa: E501
+
+ return_select(df,columns=list(args),invert=invert)
+
select_rows(df, *args, invert=False)

Method-chainable selection of rows.
+ Source code in janitor/functions/select.py
+
@pf.register_dataframe_method
+@refactored_function(
+    message=(
+        "This function will be deprecated in a 1.x release. "
+        "Please use `jn.select` instead."
+    )
+)
+def select_rows(
+    df: pd.DataFrame,
+    *args: Any,
+    invert: bool = False,
+) -> pd.DataFrame:
+    """Method-chainable selection of rows.
+
+ It accepts a string, shell-like glob strings `(*string*)`,
+ regex, slice, array-like object, or a list of the previous options.
+
+ Selection on a MultiIndex on a level, or multiple levels,
+ is possible with a dictionary.
+
+ This method does not mutate the original DataFrame.
+
+ Optional ability to invert selection of rows available as well.
+
+
+ !!! info "New in version 0.24.0"
+
+ !!!note
+
+ The preferred option when selecting columns or rows in a Pandas DataFrame
+ is with `.loc` or `.iloc` methods, as they are generally performant.
+ `select_rows` is primarily for convenience.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `jn.select` instead.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = {"col1": [1, 2], "foo": [3, 4], "col2": [5, 6]}
+ >>> df = pd.DataFrame.from_dict(df, orient='index')
+ >>> df
+ 0 1
+ col1 1 2
+ foo 3 4
+ col2 5 6
+ >>> df.select_rows("col*")
+ 0 1
+ col1 1 2
+ col2 5 6
+
+ More examples can be found in the
+ [`select_columns`][janitor.functions.select.select_columns] section.
+
+ Args:
+ df: A pandas DataFrame.
+ *args: Valid inputs include: an exact index name to look for,
+ a shell-style glob string (e.g. `*_thing_*`),
+ a regular expression,
+ a callable,
+ or variable arguments of all the aforementioned.
+ A sequence of booleans is also acceptable.
+ A dictionary can be used for selection
+ on a MultiIndex on different levels.
+ invert: Whether or not to invert the selection.
+ This will result in the selection
+ of the complement of the rows provided.
+
+ Returns:
+ A pandas DataFrame with the specified rows selected.
+ """# noqa: E501
+ return_select(df,rows=list(args),invert=invert)
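
For MultiIndex rows, the same dictionary syntax shown for `select_columns` applies. A minimal, hedged sketch (the level names and the callable-per-level form are illustrative, mirroring the column examples above, not taken from this page):

    import pandas as pd
    import janitor  # noqa: F401

    # hypothetical two-level row index
    ix = pd.MultiIndex.from_tuples(
        [("mammal", "cat"), ("mammal", "bat"), ("bird", "hawk")],
        names=["class", "animal"],
    )
    df = pd.DataFrame({"num_legs": [4, 2, 2]}, index=ix)

    # select rows where the "animal" level ends with "t"
    out = df.select_rows({"animal": lambda s: s.str.endswith("t")})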
+
@pf.register_dataframe_method
def shuffle(
    df: pd.DataFrame, random_state: Any = None, reset_index: bool = True
) -> pd.DataFrame:
+"""Shuffle the rows of the DataFrame.
+
+ This method does not mutate the original DataFrame.
+
+ Super-sugary syntax! Underneath the hood, we use `df.sample(frac=1)`,
+ with the option to set the random state.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "col1": range(5),
+ ... "col2": list("abcde"),
+ ... })
+ >>> df
+ col1 col2
+ 0 0 a
+ 1 1 b
+ 2 2 c
+ 3 3 d
+ 4 4 e
+ >>> df.shuffle(random_state=42)
+ col1 col2
+ 0 1 b
+ 1 4 e
+ 2 2 c
+ 3 0 a
+ 4 3 d
+
+ Args:
+ df: A pandas DataFrame.
+ random_state: If provided, set a seed for the random number
+ generator. Passed to `pd.DataFrame.sample()`.
+ reset_index: If True, reset the dataframe index to the default
+ RangeIndex.
+
+ Returns:
+ A shuffled pandas DataFrame.
+ """
    result = df.sample(frac=1, random_state=random_state)
    if reset_index:
        result = result.reset_index(drop=True)
    return result
+

sort_column_value_order
Implementation of the sort_column_value_order function.
@pf.register_dataframe_method
def sort_column_value_order(
    df: pd.DataFrame, column: str, column_value_order: dict, columns=None
) -> pd.DataFrame:
    """This function adds precedence to certain values in a specified column,
+ then sorts based on that column and any other specified columns.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> import numpy as np
+ >>> company_sales = {
+ ... "SalesMonth": ["Jan", "Feb", "Feb", "Mar", "April"],
+ ... "Company1": [150.0, 200.0, 200.0, 300.0, 400.0],
+ ... "Company2": [180.0, 250.0, 250.0, np.nan, 500.0],
+ ... "Company3": [400.0, 500.0, 500.0, 600.0, 675.0],
+ ... }
+ >>> df = pd.DataFrame.from_dict(company_sales)
+ >>> df
+ SalesMonth Company1 Company2 Company3
+ 0 Jan 150.0 180.0 400.0
+ 1 Feb 200.0 250.0 500.0
+ 2 Feb 200.0 250.0 500.0
+ 3 Mar 300.0 NaN 600.0
+ 4 April 400.0 500.0 675.0
+ >>> df.sort_column_value_order(
+ ... "SalesMonth",
+ ... {"April": 1, "Mar": 2, "Feb": 3, "Jan": 4}
+ ... )
+ SalesMonth Company1 Company2 Company3
+ 4 April 400.0 500.0 675.0
+ 3 Mar 300.0 NaN 600.0
+ 1 Feb 200.0 250.0 500.0
+ 2 Feb 200.0 250.0 500.0
+ 0 Jan 150.0 180.0 400.0
+
+ Args:
+ df: pandas DataFrame that we are manipulating
+ column: This is a column name as a string we are using to specify
+ which column to sort by
+ column_value_order: Dictionary of values that will
+ represent precedence of the values in the specified column
+ columns: A list of additional columns that we can sort by
+
+ Raises:
+ ValueError: If chosen Column Name is not in
+ Dataframe, or if `column_value_order` dictionary is empty.
+
+ Returns:
+ A sorted pandas DataFrame.
+ """
    # Validation checks
    check_column(df, column, present=True)
    check("column_value_order", column_value_order, [dict])
    if not column_value_order:
        raise ValueError("column_value_order dictionary cannot be empty")

    df = df.assign(cond_order=df[column].map(column_value_order))

    sort_by = ["cond_order"]
    if columns is not None:
        sort_by = ["cond_order"] + columns

    df = df.sort_values(sort_by).remove_columns("cond_order")
    return df
+
sort_naturally
@pf.register_dataframe_method
def sort_naturally(
    df: pd.DataFrame, column_name: str, **natsorted_kwargs: Any
) -> pd.DataFrame:
    """Sort a DataFrame by a column using *natural* sorting.
+
+ Natural sorting is distinct from
    the default lexicographical sorting provided by `pandas`.
+ For example, given the following list of items:
+
+ ```python
+ ["A1", "A11", "A3", "A2", "A10"]
+ ```
+
+ Lexicographical sorting would give us:
+
+ ```python
+ ["A1", "A10", "A11", "A2", "A3"]
+ ```
+
+ By contrast, "natural" sorting would give us:
+
+ ```python
+ ["A1", "A2", "A3", "A10", "A11"]
+ ```
+
+ This function thus provides *natural* sorting
+ on a single column of a dataframe.
+
+ To accomplish this, we do a natural sort
+ on the unique values that are present in the dataframe.
+ Then, we reconstitute the entire dataframe
+ in the naturally sorted order.
+
+ Natural sorting is provided by the Python package
+ [natsort](https://natsort.readthedocs.io/en/master/index.html).
+
+ All keyword arguments to `natsort` should be provided
+ after the column name to sort by is provided.
+ They are passed through to the `natsorted` function.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame(
+ ... {
+ ... "Well": ["A21", "A3", "A21", "B2", "B51", "B12"],
+ ... "Value": [1, 2, 13, 3, 4, 7],
+ ... }
+ ... )
+ >>> df
+ Well Value
+ 0 A21 1
+ 1 A3 2
+ 2 A21 13
+ 3 B2 3
+ 4 B51 4
+ 5 B12 7
+ >>> df.sort_naturally("Well")
+ Well Value
+ 1 A3 2
+ 0 A21 1
+ 2 A21 13
+ 3 B2 3
+ 5 B12 7
+ 4 B51 4
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: The column on which natural sorting should take place.
+ **natsorted_kwargs: Keyword arguments to be passed
+ to natsort's `natsorted` function.
+
+ Returns:
+ A sorted pandas DataFrame.
+ """
    new_order = index_natsorted(df[column_name], **natsorted_kwargs)
    return df.iloc[new_order, :]
+

take_first

Implementation of take_first function.

take_first(df, subset, by, ascending=True)
+
+
+
+
+
+
+
Take the first row within each group specified by subset.
+
+
+
+
Examples:

    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({"a": ["x", "x", "y", "y"], "b": [0, 1, 2, 3]})
    >>> df
       a  b
    0  x  0
    1  x  1
    2  y  2
    3  y  3
    >>> df.take_first(subset="a", by="b")
       a  b
    0  x  0
    2  y  2
+

Parameters:

    df (DataFrame): A pandas DataFrame. Required.
    subset (Union[Hashable, Iterable[Hashable]]): Column(s) defining the group. Required.
    by (Hashable): Column to sort by. Required.
    ascending (bool): Whether or not to sort in ascending order. Default: True.

Returns:

    DataFrame: A pandas DataFrame.
+
+
+ Source code in janitor/functions/take_first.py
+
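The source body is not reproduced on this page. As a rough sketch only, under the assumption that the function mirrors its documented behavior, `take_first` can be emulated by sorting on `by` and keeping the first row per `subset` group:

    import pandas as pd

    def take_first_sketch(
        df: pd.DataFrame, subset, by, ascending: bool = True
    ) -> pd.DataFrame:
        # Sort on the ordering column, then keep the first row per group.
        return df.sort_values(by=by, ascending=ascending).drop_duplicates(
            subset=subset, keep="first"
        )
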
then(df, func)
@pf.register_dataframe_method
@refactored_function(
    message="This function will be deprecated in a 1.x release. "
    "Kindly use `pd.DataFrame.pipe` instead."
)
def then(df: pd.DataFrame, func: Callable) -> pd.DataFrame:
    """Add an arbitrary function to run in the `pyjanitor` method chain.
+
+ This method does not mutate the original DataFrame.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `pd.DataFrame.pipe` instead.
+
+ Examples:
+ A trivial example using a lambda `func`.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> (pd.DataFrame({"a": [1, 2, 3], "b": [7, 8, 9]})
+ ... .then(lambda df: df * 2))
+ a b
+ 0 2 14
+ 1 4 16
+ 2 6 18
+
+ Args:
+ df: A pandas DataFrame.
+ func: A function you would like to run in the method chain.
+ It should take one parameter and return one parameter, each being
+ the DataFrame object. After that, do whatever you want in the
+ middle. Go crazy.
+
+ Returns:
+ A pandas DataFrame.
+ """
    df = func(df)
    return df
+
@pf.register_series_method
@refactored_function(
    message=(
        "This function will be deprecated in a 1.x release. "
        "Please use `set(df[column])` instead."
    )
)
def toset(series: pd.Series) -> Set:
    """Return a set of the values.
+
+ !!!note
+
+ This function will be deprecated in a 1.x release.
+ Please use `set(df[column])` instead.
+
+ These are each a scalar type, which is a Python scalar
+ (for str, int, float) or a pandas scalar
+ (for Timestamp/Timedelta/Interval/Period)
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> s = pd.Series([1, 2, 3, 5, 5], index=["a", "b", "c", "d", "e"])
+ >>> s
+ a 1
+ b 2
+ c 3
+ d 5
+ e 5
+ dtype: int64
+ >>> s.toset()
+ {1, 2, 3, 5}
+
+ Args:
+ series: A pandas series.
+
+ Returns:
+ A set of values.
+ """

    return set(series.tolist())
+
transform_column(df, column_name, function, dest_column_name=None, elementwise=True)

Source code in janitor/functions/transform_columns.py
@pf.register_dataframe_method
@deprecated_alias(col_name="column_name", dest_col_name="dest_column_name")
def transform_column(
    df: pd.DataFrame,
    column_name: Hashable,
    function: Callable,
    dest_column_name: Optional[str] = None,
    elementwise: bool = True,
) -> pd.DataFrame:
    """Transform the given column using the provided function.
+
+ Meant to be the method-chaining equivalent of:
+ ```python
+ df[dest_column_name] = df[column_name].apply(function)
+ ```
+
+ Functions can be applied in one of two ways:
+
+ - **Element-wise** (default; `elementwise=True`). Then, the individual
+ column elements will be passed in as the first argument of `function`.
+ - **Column-wise** (`elementwise=False`). Then, `function` is expected to
+ take in a pandas Series and return a sequence that is of identical length
+ to the original.
+
+ If `dest_column_name` is provided, then the transformation result is stored
+ in that column. Otherwise, the transformed result is stored under the name
+ of the original column.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ Transform a column in-place with an element-wise function.
+
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "a": [2, 3, 4],
+ ... "b": ["area", "pyjanitor", "grapefruit"],
+ ... })
+ >>> df
+ a b
+ 0 2 area
+ 1 3 pyjanitor
+ 2 4 grapefruit
+ >>> df.transform_column(
+ ... column_name="a",
+ ... function=lambda x: x**2 - 1,
+ ... )
+ a b
+ 0 3 area
+ 1 8 pyjanitor
+ 2 15 grapefruit
+
+ Examples:
        Transform a column in-place with a column-wise function.
+
+ >>> df.transform_column(
+ ... column_name="b",
+ ... function=lambda srs: srs.str[:5],
+ ... elementwise=False,
+ ... )
+ a b
+ 0 2 area
+ 1 3 pyjan
+ 2 4 grape
+
+ Args:
+ df: A pandas DataFrame.
+ column_name: The column to transform.
+ function: A function to apply on the column.
+ dest_column_name: The column name to store the transformation result
+ in. Defaults to None, which will result in the original column
+ name being overwritten. If a name is provided here, then a new
+ column with the transformed values will be created.
+ elementwise: Whether to apply the function elementwise or not.
+ If `elementwise` is True, then the function's first argument
+ should be the data type of each datum in the column of data,
+ and should return a transformed datum.
            If `elementwise` is False, then the function should expect
+ a pandas Series passed into it, and return a pandas Series.
+
+ Returns:
+ A pandas DataFrame with a transformed column.
+ """
+ check_column(df,column_name)
+
+ ifdest_column_nameisNone:
+ dest_column_name=column_name
+ elifdest_column_name!=column_name:
+ # If `dest_column_name` is provided and equals `column_name`, then we
+ # assume that the user's intent is to perform an in-place
+ # transformation (Same behaviour as when `dest_column_name` = None).
+ # Otherwise we throw an error if `dest_column_name` already exists in
+ # df.
+ check_column(df,dest_column_name,present=False)
+
+ result=_get_transform_column_result(
+ df[column_name],
+ function,
+ elementwise,
+ )
+
+ returndf.assign(**{dest_column_name:result})
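
The helper `_get_transform_column_result` is not shown on this page. A minimal sketch of what such a helper could look like, given the documented elementwise semantics (the details here are assumptions, not the library's code):

    import pandas as pd
    from typing import Callable

    def _get_transform_column_result_sketch(
        series: pd.Series, function: Callable, elementwise: bool
    ) -> pd.Series:
        if elementwise:
            # Apply the function to each datum in the column.
            return series.apply(function)
        # Otherwise hand the whole Series to the function at once.
        return function(series)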
+
transform_columns(df, column_names, function, suffix=None, elementwise=True, new_column_names=None)

Source code in janitor/functions/transform_columns.py
@pf.register_dataframe_method
@deprecated_alias(columns="column_names", new_names="new_column_names")
def transform_columns(
    df: pd.DataFrame,
    column_names: Union[List[str], Tuple[str]],
    function: Callable,
    suffix: Optional[str] = None,
    elementwise: bool = True,
    new_column_names: Optional[Dict[str, str]] = None,
) -> pd.DataFrame:
    """Transform multiple columns through the same transformation.
+
+ This method does not mutate the original DataFrame.
+
+ Super syntactic sugar!
+ Essentially wraps [`transform_column`][janitor.functions.transform_columns.transform_column]
+ and calls it repeatedly over all column names provided.
+
+ User can optionally supply either a suffix to create a new set of columns
+ with the specified suffix, or provide a dictionary mapping each original
+ column name in `column_names` to its corresponding new column name.
+ Note that all column names must be strings.
+
+ Examples:
+ log10 transform a list of columns, replacing original columns.
+
+ >>> import numpy as np
+ >>> import pandas as pd
+ >>> import janitor
+ >>> df = pd.DataFrame({
+ ... "col1": [5, 10, 15],
+ ... "col2": [3, 6, 9],
+ ... "col3": [10, 100, 1_000],
+ ... })
+ >>> df
+ col1 col2 col3
+ 0 5 3 10
+ 1 10 6 100
+ 2 15 9 1000
+ >>> df.transform_columns(["col1", "col2", "col3"], np.log10)
+ col1 col2 col3
+ 0 0.698970 0.477121 1.0
+ 1 1.000000 0.778151 2.0
+ 2 1.176091 0.954243 3.0
+
+ Using the `suffix` parameter to create new columns.
+
+ >>> df.transform_columns(["col1", "col3"], np.log10, suffix="_log")
+ col1 col2 col3 col1_log col3_log
+ 0 5 3 10 0.698970 1.0
+ 1 10 6 100 1.000000 2.0
+ 2 15 9 1000 1.176091 3.0
+
+ Using the `new_column_names` parameter to create new columns.
+
+ >>> df.transform_columns(
+ ... ["col1", "col3"],
+ ... np.log10,
+ ... new_column_names={"col1": "transform1"},
+ ... )
+ col1 col2 col3 transform1
+ 0 5 3 1.0 0.698970
+ 1 10 6 2.0 1.000000
+ 2 15 9 3.0 1.176091
+
+ Args:
+ df: A pandas DataFrame.
+ column_names: An iterable of columns to transform.
+ function: A function to apply on each column.
+ suffix: Suffix to use when creating new columns to hold
+ the transformed values.
+ elementwise: Passed on to [`transform_column`][janitor.functions.transform_columns.transform_column]; whether or not
+ to apply the transformation function elementwise (True)
+ or columnwise (False).
+ new_column_names: An explicit mapping of old column names in
+ `column_names` to new column names. If any column specified in
+ `column_names` is not a key in this dictionary, the transformation
+ will happen in-place for that column.
+
+ Raises:
+ ValueError: If both `suffix` and `new_column_names` are specified.
+
+ Returns:
+ A pandas DataFrame with transformed columns.
+ """# noqa: E501
+ check("column_names",column_names,[list,tuple])
+ check_column(df,column_names)
+
+ ifsuffixisnotNoneandnew_column_namesisnotNone:
+ raiseValueError(
+ "Only one of `suffix` or `new_column_names` should be specified."
+ )
+
+ ifsuffix:
+ check("suffix",suffix,[str])
+ dest_column_names={col:col+suffixforcolincolumn_names}
+ elifnew_column_names:
+ check("new_column_names",new_column_names,[dict])
+ dest_column_names={
+ col:new_column_names.get(col,col)forcolincolumn_names
+ }
+ else:
+ dest_column_names=dict(zip(column_names,column_names))
+
+ results={}
+ forold_col,new_colindest_column_names.items():
+ ifold_col!=new_col:
+ check_column(df,new_col,present=False)
+ results[new_col]=_get_transform_column_result(
+ df[old_col],
+ function,
+ elementwise=elementwise,
+ )
+
+ returndf.assign(**results)
+

truncate_datetime

Implementation of the truncate_datetime family of functions.

truncate_datetime_dataframe(df, datepart)

Truncate times down to a user-specified precision of
year, month, day, hour, minute, or second.

This method does not mutate the original DataFrame.
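
The rest of this entry is cut off on this page. As an illustrative sketch only, such truncation can be emulated with plain pandas; the `datepart` handling below is an assumption, not the library's code:

    import pandas as pd

    def truncate_datetime_sketch(df: pd.DataFrame, datepart: str) -> pd.DataFrame:
        # Map each supported precision to a pandas frequency.
        freq = {"YEAR": "YS", "MONTH": "MS", "DAY": "D",
                "HOUR": "h", "MINUTE": "min", "SECOND": "s"}[datepart.upper()]
        df = df.copy()
        for name, col in df.items():
            if pd.api.types.is_datetime64_any_dtype(col):
                # dt.floor only supports fixed frequencies; fall back to
                # to_period for year/month truncation.
                if freq in ("YS", "MS"):
                    df[name] = col.dt.to_period(freq[0]).dt.to_timestamp()
                else:
                    df[name] = col.dt.floor(freq)
        return df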
update_where(df, conditions, target_column_name, target_val)

Add multiple conditions to update a column in the dataframe.

This method does not mutate the original DataFrame.

Source code in janitor/functions/update_where.py
@pf.register_dataframe_method
@deprecated_alias(target_col="target_column_name")
def update_where(
    df: pd.DataFrame,
    conditions: Any,
    target_column_name: Hashable,
    target_val: Any,
) -> pd.DataFrame:
    """Add multiple conditions to update a column in the dataframe.
+
+ This method does not mutate the original DataFrame.
+
+ Examples:
+ >>> import janitor
+ >>> data = {
+ ... "a": [1, 2, 3, 4],
+ ... "b": [5, 6, 7, 8],
+ ... "c": [0, 0, 0, 0],
+ ... }
+ >>> df = pd.DataFrame(data)
+ >>> df
+ a b c
+ 0 1 5 0
+ 1 2 6 0
+ 2 3 7 0
+ 3 4 8 0
+ >>> df.update_where(
+ ... conditions = (df.a > 2) & (df.b < 8),
+ ... target_column_name = 'c',
+ ... target_val = 10
+ ... )
+ a b c
+ 0 1 5 0
+ 1 2 6 0
+ 2 3 7 10
+ 3 4 8 0
+ >>> df.update_where( # supports pandas *query* style string expressions
+ ... conditions = "a > 2 and b < 8",
+ ... target_column_name = 'c',
+ ... target_val = 10
+ ... )
+ a b c
+ 0 1 5 0
+ 1 2 6 0
+ 2 3 7 10
+ 3 4 8 0
+
+ Args:
+ df: The pandas DataFrame object.
+ conditions: Conditions used to update a target column
+ and target value.
+ target_column_name: Column to be updated. If column does not exist
+ in DataFrame, a new column will be created; note that entries
+ that do not get set in the new column will be null.
+ target_val: Value to be updated.
+
+ Raises:
+ ValueError: If `conditions` does not return a boolean array-like
+ data structure.
+
+ Returns:
+ A pandas DataFrame.
+ """
+
+ df=df.copy()
+
+ # use query mode if a string expression is passed
+ ifisinstance(conditions,str):
+ conditions=df.eval(conditions)
+
+ ifnotis_bool_dtype(conditions):
+ raiseValueError(
+"""
+ Kindly ensure that `conditions` passed
+ evaluates to a Boolean dtype.
+ """
+ )
+
+ df.loc[conditions,target_column_name]=target_val
+
+ returndf
+

utils

Utility functions for all of the functions submodule.
+

DropLabel (dataclass)
@dataclass
class DropLabel:
    """Helper class for removing labels within the `select` syntax.

    `label` can be any of the types supported in the `select`,
    `select_rows` and `select_columns` functions.
    An array of integers not matching the labels is returned.

    !!! info "New in version 0.24.0"

    Args:
        label: Label(s) to be dropped from the index.
    """

    label: Any
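
A short, hedged usage sketch: combined with the selection functions above, `DropLabel` excludes the named labels while everything else is kept. The frame and the import path are illustrative assumptions:

    import pandas as pd
    import janitor  # noqa: F401
    from janitor.functions.utils import DropLabel  # import path assumed

    df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

    # keep every column except "a"
    out = df.select_columns(DropLabel("a"))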
+

col

Helper class for column selection within an expression.
class col:
    """Helper class for column selection within an expression.

    Args:
        column (Hashable): The name of the column to be selected.

    Raises:
        TypeError: If the `column` parameter is not hashable.

    !!! info "New in version 0.25.0"

    !!! warning

        `col` is currently considered experimental.
        The implementation and parts of the API
        may change without warning.

    """

    def __init__(self, column: Hashable):
        self.cols = column
        check("column", self.cols, [Hashable])
        self.join_args = None

    def __gt__(self, other):
        """Implements the greater-than comparison operator (`>`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, ">")
        return self

    def __ge__(self, other):
        """Implements the greater-than-or-equal-to comparison operator (`>=`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, ">=")
        return self

    def __lt__(self, other):
        """Implements the less-than comparison operator (`<`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, "<")
        return self

    def __le__(self, other):
        """Implements the less-than-or-equal-to comparison operator (`<=`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, "<=")
        return self

    def __ne__(self, other):
        """Implements the not-equal-to comparison operator (`!=`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, "!=")
        return self

    def __eq__(self, other):
        """Implements the equal-to comparison operator (`==`).

        Args:
            other (col): The other `col` object to compare to.

        Returns:
            col: The current `col` object.
        """
        self.join_args = (self.cols, other.cols, "==")
        return self
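
A hedged usage sketch: within `conditional_join`, comparing two `col` objects records the pair of column names plus the operator in `join_args`, which the join then consumes. The DataFrames below are illustrative, and the import path is an assumption:

    import pandas as pd
    import janitor  # noqa: F401
    from janitor import col  # import path assumed; see the janitor API docs

    left = pd.DataFrame({"value": [2, 5, 7]})
    right = pd.DataFrame({"start": [0, 4], "end": [3, 6]})

    # Non-equi join: keep pairs where start < value <= end.
    out = left.conditional_join(
        right,
        col("value") > col("start"),
        col("value") <= col("end"),
    )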
get_columns(group, label)

Helper function for selecting columns on a grouped object, using the select syntax.
def get_columns(group: Union[DataFrameGroupBy, SeriesGroupBy], label):
    """
    Helper function for selecting columns on a grouped object,
    using the
    [`select`][janitor.functions.select.select] syntax.

    !!! info "New in version 0.25.0"

    Args:
        group: A Pandas GroupBy object.
        label: column(s) to select.

    Returns:
        A pandas groupby object.
    """
    check("groupby object", group, [DataFrameGroupBy, SeriesGroupBy])
    label = get_index_labels(label, group.obj, axis="columns")
    label = label if is_scalar(label) else list(label)
    return group[label]
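
A hedged sketch of how this helper might be used, selecting columns on a GroupBy with glob syntax (column names and the import path are illustrative assumptions):

    import pandas as pd
    import janitor  # noqa: F401
    from janitor.functions.utils import get_columns  # import path assumed

    df = pd.DataFrame(
        {"group": ["a", "a", "b"], "val_1": [1, 2, 3], "val_2": [4, 5, 6]}
    )

    # Select the val_* columns on the grouped object, then aggregate.
    grouped = get_columns(df.groupby("group"), "val_*")
    out = grouped.sum()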
+

get_index_labels(arg, df, axis)

Convenience function to get actual labels from column/index.
def get_index_labels(
    arg, df: pd.DataFrame, axis: Literal["index", "columns"]
) -> pd.Index:
    """Convenience function to get actual labels from column/index.

    !!! info "New in version 0.25.0"

    Args:
        arg: Valid inputs include: an exact column name to look for,
            a shell-style glob string (e.g. `*_thing_*`),
            a regular expression,
            a callable,
            or variable arguments of all the aforementioned.
            A sequence of booleans is also acceptable.
            A dictionary can be used for selection
            on a MultiIndex on different levels.
        df: The pandas DataFrame object.
        axis: Should be either `index` or `columns`.

    Returns:
        A pandas Index.
    """
    assert axis in {"index", "columns"}
    index = getattr(df, axis)
    return index[_select_index(arg, df, axis)]
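
For instance (a hedged sketch; the frame and import path are illustrative), resolving a glob against the columns axis returns the matching labels as an Index:

    import pandas as pd
    from janitor.functions.utils import get_index_labels  # import path assumed

    df = pd.DataFrame({"col_a": [1], "col_b": [2], "other": [3]})

    labels = get_index_labels("col_*", df, axis="columns")
    # Index(['col_a', 'col_b'], dtype='object')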
+

patterns(regex_pattern)

This function converts a string into a compiled regular expression.
def patterns(regex_pattern: Union[str, Pattern]) -> Pattern:
    """This function converts a string into a compiled regular expression.

    It can be used to select columns in the index or columns_names
    arguments of `pivot_longer` function.

    !!! warning

        This function is deprecated. Kindly use `re.compile` instead.

    Args:
        regex_pattern: String to be converted to compiled regular
            expression.

    Returns:
        A compiled regular expression from the provided `regex_pattern`.
    """
    warnings.warn(
        "This function is deprecated. Kindly use `re.compile` instead.",
        DeprecationWarning,
        stacklevel=find_stack_level(),
    )
    check("regular expression", regex_pattern, [str, Pattern])

    return re.compile(regex_pattern)
+
unionize_dataframe_categories(*dataframes, column_names=None)
def unionize_dataframe_categories(
    *dataframes: Any,
    column_names: Optional[Iterable[pd.CategoricalDtype]] = None,
) -> List[pd.DataFrame]:
+"""
+ Given a group of dataframes which contain some categorical columns, for
+ each categorical column present, find all the possible categories across
+ all the dataframes which have that column.
+ Update each dataframes' corresponding column with a new categorical object
+ that contains the original data
+ but has labels for all the possible categories from all dataframes.
+ This is useful when concatenating a list of dataframes which all have the
+ same categorical columns into one dataframe.
+
+ If, for a given categorical column, all input dataframes do not have at
+ least one instance of all the possible categories,
+ Pandas will change the output dtype of that column from `category` to
+ `object`, losing out on dramatic speed gains you get from the former
+ format.
+
+ Examples:
+ Usage example for concatenation of categorical column-containing
+ dataframes:
+
+ Instead of:
+
+ ```python
+ concatenated_df = pd.concat([df1, df2, df3], ignore_index=True)
+ ```
+
+ which in your case has resulted in `category` -> `object` conversion,
+ use:
+
+ ```python
+ unionized_dataframes = unionize_dataframe_categories(df1, df2, df2)
+ concatenated_df = pd.concat(unionized_dataframes, ignore_index=True)
+ ```
+
+ Args:
+ *dataframes: The dataframes you wish to unionize the categorical
+ objects for.
+ column_names: If supplied, only unionize this subset of columns.
+
+ Raises:
+ TypeError: If any of the inputs are not pandas DataFrames.
+
+ Returns:
+ A list of the category-unioned dataframes in the same order they
+ were provided.
+ """

    if any(not isinstance(df, pd.DataFrame) for df in dataframes):
        raise TypeError("Inputs must all be dataframes.")

    if column_names is None:
        # Find all columns across all dataframes that are categorical

        column_names = set()

        for dataframe in dataframes:
            column_names = column_names.union(
                [
                    column_name
                    for column_name in dataframe.columns
                    if isinstance(
                        dataframe[column_name].dtype, pd.CategoricalDtype
                    )
                ]
            )

    else:
        column_names = [column_names]
    # For each categorical column, find all possible values across the DFs

    category_unions = {
        column_name: union_categoricals(
            [df[column_name] for df in dataframes if column_name in df.columns]
        )
        for column_name in column_names
    }

    # Make a shallow copy of all DFs and modify the categorical columns
    # such that they can encode the union of all possible categories for each.

    refactored_dfs = []

    for df in dataframes:
        df = df.copy(deep=False)

        for column_name, categorical in category_unions.items():
            if column_name in df.columns:
                df[column_name] = pd.Categorical(
                    df[column_name], categories=categorical.categories
                )

        refactored_dfs.append(df)

    return refactored_dfs
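
A small runnable sketch of the pattern described above (the data and the top-level import are illustrative assumptions):

    import pandas as pd
    from janitor import unionize_dataframe_categories  # import path assumed

    df1 = pd.DataFrame({"size": pd.Categorical(["S", "M"])})
    df2 = pd.DataFrame({"size": pd.Categorical(["L", "M"])})

    # Without unionizing, concatenation would fall back to object dtype.
    df1u, df2u = unionize_dataframe_categories(df1, df2)
    combined = pd.concat([df1u, df2u], ignore_index=True)
    # combined["size"] stays categorical, with categories {'S', 'M', 'L'}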
+
read_commandline(cmd, **kwargs)
def read_commandline(cmd: str, **kwargs: Any) -> pd.DataFrame:
+"""Read a CSV file based on a command-line command.
+
+ For example, you may wish to run the following command on `sep-quarter.csv`
+ before reading it into a pandas DataFrame:
+
+ ```bash
+ cat sep-quarter.csv | grep .SEA1AA
+ ```
+
+ In this case, you can use the following Python code to load the dataframe:
+
+ ```python
+ import janitor as jn
+ df = jn.read_commandline("cat data.csv | grep .SEA1AA")
+ ```
+
+ This function assumes that your command line command will return
+ an output that is parsable using `pandas.read_csv` and StringIO.
+ We default to using `pd.read_csv` underneath the hood.
+ Keyword arguments are passed through to read_csv.
+
+ Args:
+ cmd: Shell command to preprocess a file on disk.
+ **kwargs: Keyword arguments that are passed through to
+ `pd.read_csv()`.
+
+ Returns:
+ A pandas DataFrame parsed from the stdout of the underlying
+ shell.
+ """
+
+ check("cmd",cmd,[str])
+ # adding check=True ensures that an explicit, clear error
+ # is raised, so that the user can see the reason for the failure
+ outcome=subprocess.run(
+ cmd,shell=True,capture_output=True,text=True,check=True
+ )
+ returnpd.read_csv(StringIO(outcome.stdout),**kwargs)
+
read_csvs(files_path, separate_df=False, **kwargs)
@deprecated_alias(seperate_df="separate_df", filespath="files_path")
def read_csvs(
    files_path: Union[str, Iterable[str]],
    separate_df: bool = False,
    **kwargs: Any,
) -> Union[pd.DataFrame, dict]:
+"""Read multiple CSV files and return a dictionary of DataFrames, or
+ one concatenated DataFrame.
+
+ Args:
+ files_path: The filepath pattern matching the CSV files.
+ Accepts regular expressions, with or without `.csv` extension.
+ Also accepts iterable of file paths.
+ separate_df: If `False` (default), returns a single Dataframe
+ with the concatenation of the csv files.
+ If `True`, returns a dictionary of separate DataFrames
+ for each CSV file.
+ **kwargs: Keyword arguments to pass into the
+ original pandas `read_csv`.
+
+ Raises:
+ JanitorError: If `None` provided for `files_path`.
+ JanitorError: If length of `files_path` is `0`.
+ ValueError: If no CSV files exist in `files_path`.
+ ValueError: If columns in input CSV files do not match.
+
+ Returns:
+ DataFrame of concatenated DataFrames or dictionary of DataFrames.
+ """
+ # Sanitize input
+ iffiles_pathisNone:
+ raiseJanitorError("`None` provided for `files_path`")
+ ifnotfiles_path:
+ raiseJanitorError("0 length `files_path` provided")
+
+ # Read the csv files
+ # String to file/folder or file pattern provided
+ ifisinstance(files_path,str):
+ dfs_dict={
+ os.path.basename(f):pd.read_csv(f,**kwargs)
+ forfinglob(files_path)
+ }
+ # Iterable of file paths provided
+ else:
+ dfs_dict={
+ os.path.basename(f):pd.read_csv(f,**kwargs)forfinfiles_path
+ }
+ # Check if dataframes have been read
+ ifnotdfs_dict:
+ raiseValueError("No CSV files to read with the given `files_path`")
+ # Concatenate the dataframes if requested (default)
+ col_names=list(dfs_dict.values())[0].columns# noqa: PD011
+ ifnotseparate_df:
+ # If columns do not match raise an error
+ fordfindfs_dict.values():# noqa: PD011
+ ifnotall(df.columns==col_names):
+ raiseValueError(
+ "Columns in input CSV files do not match."
+ "Files cannot be concatenated."
+ )
+ returnpd.concat(
+ list(dfs_dict.values()),
+ ignore_index=True,
+ sort=False,# noqa: PD011
+ copy=False,
+ )
+ returndfs_dict
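
A hedged usage sketch (the paths and the import path are illustrative):

    from janitor.io import read_csvs  # import path assumed

    # One concatenated DataFrame from every CSV matching the glob:
    df = read_csvs("data/2023-*.csv")

    # Or a {filename: DataFrame} dictionary, one entry per file:
    dfs = read_csvs("data/2023-*.csv", separate_df=True)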
+
xlsx_cells(path, sheetnames=None, start_point=None, end_point=None, read_only=True, include_blank_cells=True, fill=False, font=False, alignment=False, border=False, protection=False, comment=False, **kwargs)
def xlsx_cells(
    path: Union[str, Workbook],
    sheetnames: Union[str, list, tuple] = None,
    start_point: Union[str, int] = None,
    end_point: Union[str, int] = None,
    read_only: bool = True,
    include_blank_cells: bool = True,
    fill: bool = False,
    font: bool = False,
    alignment: bool = False,
    border: bool = False,
    protection: bool = False,
    comment: bool = False,
    **kwargs: Any,
) -> Union[dict, pd.DataFrame]:
+"""Imports data from spreadsheet without coercing it into a rectangle.
+
+ Each cell is represented by a row in a dataframe, and includes the
+ cell's coordinates, the value, row and column position.
+ The cell formatting (fill, font, border, etc) can also be accessed;
+ usually this is returned as a dictionary in the cell, and the specific
+ cell format attribute can be accessed using `pd.Series.str.get`.
+
+ Inspiration for this comes from R's [tidyxl][link] package.
+ [link]: https://nacnudus.github.io/tidyxl/reference/tidyxl.html
+
+ Examples:
+ >>> import pandas as pd
+ >>> from janitor import xlsx_cells
+ >>> pd.set_option("display.max_columns", None)
+ >>> pd.set_option("display.expand_frame_repr", False)
+ >>> pd.set_option("max_colwidth", None)
+ >>> filename = "../pyjanitor/tests/test_data/worked-examples.xlsx"
+
+ Each cell is returned as a row:
+
+ >>> xlsx_cells(filename, sheetnames="highlights")
+ value internal_value coordinate row column data_type is_date number_format
+ 0 Age Age A1 1 1 s False General
+ 1 Height Height B1 1 2 s False General
+ 2 1 1 A2 2 1 n False General
+ 3 2 2 B2 2 2 n False General
+ 4 3 3 A3 3 1 n False General
+ 5 4 4 B3 3 2 n False General
+ 6 5 5 A4 4 1 n False General
+ 7 6 6 B4 4 2 n False General
+
+ Access cell formatting such as fill:
+
+ >>> out=xlsx_cells(filename, sheetnames="highlights", fill=True).select("value", "fill", axis='columns')
+ >>> out
+ value fill
+ 0 Age {'patternType': None, 'fgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}, 'bgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}}
+ 1 Height {'patternType': None, 'fgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}, 'bgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}}
+ 2 1 {'patternType': None, 'fgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}, 'bgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}}
+ 3 2 {'patternType': None, 'fgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}, 'bgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}}
+ 4 3 {'patternType': 'solid', 'fgColor': {'rgb': 'FFFFFF00', 'type': 'rgb', 'tint': 0.0}, 'bgColor': {'rgb': 'FFFFFF00', 'type': 'rgb', 'tint': 0.0}}
+ 5 4 {'patternType': 'solid', 'fgColor': {'rgb': 'FFFFFF00', 'type': 'rgb', 'tint': 0.0}, 'bgColor': {'rgb': 'FFFFFF00', 'type': 'rgb', 'tint': 0.0}}
+ 6 5 {'patternType': None, 'fgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}, 'bgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}}
+ 7 6 {'patternType': None, 'fgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}, 'bgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}}
+
+ Specific cell attributes can be accessed by using Pandas' `series.str.get`:
+
+ >>> out.fill.str.get("fgColor").str.get("rgb")
+ 0 00000000
+ 1 00000000
+ 2 00000000
+ 3 00000000
+ 4 FFFFFF00
+ 5 FFFFFF00
+ 6 00000000
+ 7 00000000
+ Name: fill, dtype: object
+
+ Args:
+ path: Path to the Excel File. It can also be an openpyxl Workbook.
+ sheetnames: Names of the sheets from which the cells are to be extracted.
+ If `None`, all the sheets in the file are extracted;
+ if it is a string, or list or tuple, only the specified sheets are extracted.
+ start_point: Start coordinates of the Excel sheet. This is useful
+ if the user is only interested in a subsection of the sheet.
+ If `start_point` is provided, `end_point` must be provided as well.
+ end_point: End coordinates of the Excel sheet. This is useful
+ if the user is only interested in a subsection of the sheet.
+ If `end_point` is provided, `start_point` must be provided as well.
+ read_only: Determines if the entire file is loaded in memory,
+ or streamed. For memory efficiency, read_only should be set to `True`.
+ Some cell properties like `comment`, can only be accessed by
+ setting `read_only` to `False`.
+ include_blank_cells: Determines if cells without a value should be included.
+ fill: If `True`, return fill properties of the cell.
+ It is usually returned as a dictionary.
+ font: If `True`, return font properties of the cell.
+ It is usually returned as a dictionary.
+ alignment: If `True`, return alignment properties of the cell.
+ It is usually returned as a dictionary.
+ border: If `True`, return border properties of the cell.
+ It is usually returned as a dictionary.
+ protection: If `True`, return protection properties of the cell.
+ It is usually returned as a dictionary.
+ comment: If `True`, return comment properties of the cell.
+ It is usually returned as a dictionary.
+ **kwargs: Any other attributes of the cell, that can be accessed from openpyxl.
+
+ Raises:
+ ValueError: If kwargs is provided, and one of the keys is a default column.
+ AttributeError: If kwargs is provided and any of the keys
+ is not a openpyxl cell attribute.
+
+ Returns:
+ A pandas DataFrame, or a dictionary of DataFrames.
+ """# noqa : E501
+
+ try:
+ fromopenpyxlimportload_workbook
+ fromopenpyxl.cell.cellimportCell
+ fromopenpyxl.cell.read_onlyimportReadOnlyCell
+ fromopenpyxl.workbook.workbookimportWorkbook
+ exceptImportError:
+ import_message(
+ submodule="io",
+ package="openpyxl",
+ conda_channel="conda-forge",
+ pip_install=True,
+ )
+
+ path_is_workbook=isinstance(path,Workbook)
+ ifnotpath_is_workbook:
+ # for memory efficiency, read_only is set to True
+ # if comments is True, read_only has to be False,
+ # as lazy loading is not enabled for comments
+ ifcommentandread_only:
+ raiseValueError(
+ "To access comments, kindly set 'read_only' to False."
+ )
+ path=load_workbook(
+ filename=path,read_only=read_only,keep_links=False
+ )
+ # start_point and end_point applies if the user is interested in
+ # only a subset of the Excel File and knows the coordinates
+ ifstart_pointorend_point:
+ check("start_point",start_point,[str,int])
+ check("end_point",end_point,[str,int])
+
+ defaults=(
+ "value",
+ "internal_value",
+ "coordinate",
+ "row",
+ "column",
+ "data_type",
+ "is_date",
+ "number_format",
+ )
+
+ parameters={
+ "fill":fill,
+ "font":font,
+ "alignment":alignment,
+ "border":border,
+ "protection":protection,
+ "comment":comment,
+ }
+
+ ifkwargs:
+ ifpath_is_workbook:
+ ifpath.read_only:
+ _cell=ReadOnlyCell
+ else:
+ _cell=Cell
+ else:
+ ifread_only:
+ _cell=ReadOnlyCell
+ else:
+ _cell=Cell
+
+ attrs={
+ attr
+ forattr,_ininspect.getmembers(_cell,not(inspect.isroutine))
+ ifnotattr.startswith("_")
+ }
+
+ forkeyinkwargs:
+ ifkeyindefaults:
+ raiseValueError(
+ f"{key} is part of the default attributes "
+ "returned as a column."
+ )
+ elifkeynotinattrs:
+ raiseAttributeError(
+ f"{key} is not a recognized attribute of {_cell}."
+ )
+ parameters.update(kwargs)
+
+ ifnotsheetnames:
+ sheetnames=path.sheetnames
+ elifisinstance(sheetnames,str):
+ sheetnames=[sheetnames]
+ else:
+ check("sheetnames",sheetnames,[str,list,tuple])
+
+ out={
+ sheetname:_xlsx_cells(
+ path[sheetname],
+ defaults,
+ parameters,
+ start_point,
+ end_point,
+ include_blank_cells,
+ )
+ forsheetnameinsheetnames
+ }
+ iflen(out)==1:
+ _,out=out.popitem()
+
+ if(notpath_is_workbook)andpath.read_only:
+ path.close()
+
+ returnout
+

xlsx_table(path, sheetname=None, table=None)
def xlsx_table(
    path: Union[str, IO, Workbook],
    sheetname: str = None,
    table: Union[str, list, tuple] = None,
) -> Union[pd.DataFrame, dict]:
+"""Returns a DataFrame of values in a table in the Excel file.
+
+ This applies to an Excel file, where the data range is explicitly
+ specified as a Microsoft Excel table.
+
+ If there is a single table in the sheet, or a string is provided
+ as an argument to the `table` parameter, a pandas DataFrame is returned;
+ if there is more than one table in the sheet,
+ and the `table` argument is `None`, or a list/tuple of names,
+ a dictionary of DataFrames is returned, where the keys of the dictionary
+ are the table names.
+
+ Examples:
+ >>> import pandas as pd
+ >>> from janitor import xlsx_table
+ >>> filename="../pyjanitor/tests/test_data/016-MSPTDA-Excel.xlsx"
+
+ Single table:
+
+ >>> xlsx_table(filename, table='dCategory')
+ CategoryID Category
+ 0 1 Beginner
+ 1 2 Advanced
+ 2 3 Freestyle
+ 3 4 Competition
+ 4 5 Long Distance
+
+ Multiple tables:
+
+ >>> out=xlsx_table(filename, table=["dCategory", "dSalesReps"])
+ >>> out["dCategory"]
+ CategoryID Category
+ 0 1 Beginner
+ 1 2 Advanced
+ 2 3 Freestyle
+ 3 4 Competition
+ 4 5 Long Distance
+ >>> out["dSalesReps"].head(3)
+ SalesRepID SalesRep Region
+ 0 1 Sioux Radcoolinator NW
+ 1 2 Tyrone Smithe NE
+ 2 3 Chantel Zoya SW
+
+ Args:
+ path: Path to the Excel File. It can also be an openpyxl Workbook.
+ table: Name of a table, or list of tables in the sheet.
+
+ Raises:
+ AttributeError: If a workbook is provided, and is a ReadOnlyWorksheet.
+ ValueError: If there are no tables in the sheet.
+ KeyError: If the provided table does not exist in the sheet.
+
+ Returns:
+ A pandas DataFrame, or a dictionary of DataFrames,
+ if there are multiple arguments for the `table` parameter,
+ or the argument to `table` is `None`.
+ """# noqa : E501
+
+ try:
+ fromopenpyxlimportload_workbook
+ fromopenpyxl.workbook.workbookimportWorkbook
+ exceptImportError:
+ import_message(
+ submodule="io",
+ package="openpyxl",
+ conda_channel="conda-forge",
+ pip_install=True,
+ )
+ # TODO: remove in version 1.0
+ ifsheetname:
+ warnings.warn(
+ "The keyword argument "
+ "'sheetname' of 'xlsx_tables' is deprecated.",
+ DeprecationWarning,
+ stacklevel=find_stack_level(),
+ )
+ iftableisnotNone:
+ check("table",table,[str,list,tuple])
+ ifisinstance(table,(list,tuple)):
+ fornum,entryinenumerate(table):
+ check(f"entry{num} in the table argument",entry,[str])
+ ifisinstance(path,Workbook):
+ ws=path
+ else:
+ ws=load_workbook(
+ filename=path,read_only=False,keep_links=False,data_only=True
+ )
+ ifws.read_only:
+ raiseValueError("xlsx_table does not work in read only mode.")
+
+ def_create_dataframe_or_dictionary_from_table(
+ table_name_and_worksheet:tuple,
+ ):
+"""
+ Create DataFrame/dictionary if table exists in Workbook
+ """
+ dictionary={}
+ fortable_name,worksheetintable_name_and_worksheet:
+ contents=worksheet.tables[table_name]
+ header_exist=contents.headerRowCount
+ coordinates=contents.ref
+ data=worksheet[coordinates]
+ data=[[entry.valueforentryincell]forcellindata]
+ ifheader_exist:
+ header,*data=data
+ else:
+ header=[f"C{num}"fornuminrange(len(data[0]))]
+ data=pd.DataFrame(data,columns=header)
+ dictionary[table_name]=data
+ returndictionary
+
+ worksheets=[worksheetforworksheetinwsifworksheet.tables.items()]
+ ifnotany(worksheets):
+ raiseValueError("There are no tables in the Workbook.")
+ table_is_a_string=False
+ iftable:
+ ifisinstance(table,str):
+ table_is_a_string=True
+ table=[table]
+ table_names=(
+ entryforworksheetinworksheetsforentryinworksheet.tables
+ )
+ missing=set(table).difference(table_names)
+ ifmissing:
+ raiseKeyError(f"Tables {*missing,} do not exist in the Workbook.")
+ tables=[
+ (entry,worksheet)
+ forworksheetinworksheets
+ forentryinworksheet.tables
+ ifentryintable
+ ]
+ else:
+ tables=[
+ (entry,worksheet)
+ forworksheetinworksheets
+ forentryinworksheet.tables
+ ]
+ data=_create_dataframe_or_dictionary_from_table(
+ table_name_and_worksheet=tables
+ )
+ iftable_is_a_string:
+ returndata[table[0]]
+ returndata
+
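+Since `xlsx_table` also accepts an already-loaded openpyxl `Workbook`, one
+workbook object can be reused across several extractions instead of
+re-reading the file each time. A minimal sketch (the file name here is
+hypothetical):
+
+```python
+from openpyxl import load_workbook
+from janitor import xlsx_table
+
+# Hypothetical file; load once, extract many tables.
+wb = load_workbook("report.xlsx", read_only=False, data_only=True)
+df = xlsx_table(wb, table="dCategory")  # a single DataFrame
+dfs = xlsx_table(wb)  # a dict of DataFrames, keyed by table name
+```
+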
@pf.register_series_method
+def ecdf(s: "Series") -> Tuple["ndarray", "ndarray"]:
+"""Return cumulative distribution of values in a series.
+
+ Null values must be dropped from the series,
+ otherwise a `ValueError` is raised.
+
+ Also, if the `dtype` of the series is not numeric,
+ a `TypeError` is raised.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> s = pd.Series([0, 4, 0, 1, 2, 1, 1, 3])
+ >>> x, y = s.ecdf()
+ >>> x # doctest: +SKIP
+ array([0, 0, 1, 1, 1, 2, 3, 4])
+ >>> y # doctest: +SKIP
+ array([0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. ])
+
+ You can then plot the ECDF values, for example:
+
+ >>> from matplotlib import pyplot as plt
+ >>> plt.scatter(x, y) # doctest: +SKIP
+
+ Args:
+ s: A pandas series. `dtype` should be numeric.
+
+ Raises:
+ TypeError: If series is not numeric.
+ ValueError: If series contains nulls.
+
+ Returns:
+ x: Sorted array of values.
+ y: Cumulative fraction of data points with value `x` or lower.
+ """
+    import numpy as np
+    import pandas.api.types as pdtypes
+
+    if not pdtypes.is_numeric_dtype(s):
+        raise TypeError(f"series {s.name} must be numeric!")
+    if s.isna().any():
+        raise ValueError(f"series {s.name} contains nulls. Please drop them.")
+
+    n = len(s)
+    x = np.sort(s)
+    y = np.arange(1, n + 1) / n
+
+    return x, y
+
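+Because `ecdf` refuses nulls, a common pattern is to drop them first. A
+minimal sketch:
+
+```python
+import pandas as pd
+import janitor  # registers the .ecdf() series method
+
+s = pd.Series([3.0, None, 1.0, 2.0])
+x, y = s.dropna().ecdf()  # calling s.ecdf() directly would raise ValueError
+```
+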
@pf.register_series_method
+def log(s: "Series", error: str = "warn") -> "Series":
+"""
+ Take natural logarithm of the Series.
+
+ Each value in the series should be positive. Use `error` to control the
+ behavior if there are nonpositive entries in the series.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> s = pd.Series([0, 1, 3], name="numbers")
+ >>> s.log(error="ignore")
+ 0 NaN
+ 1 0.000000
+ 2 1.098612
+ Name: numbers, dtype: float64
+
+ Args:
+ s: Input Series.
+ error: Determines behavior when taking the log of nonpositive
+ entries. If `'warn'` then a `RuntimeWarning` is thrown. If
+ `'raise'`, then a `RuntimeError` is thrown. Otherwise, nothing
+ is thrown and log of nonpositive values is `np.nan`.
+
+ Raises:
+ RuntimeError: Raised when there are nonpositive values in the
+ Series and `error='raise'`.
+
+ Returns:
+ Transformed Series.
+ """
+    import numpy as np
+
+    s = s.copy()
+    nonpositive = s <= 0
+    if nonpositive.any():
+        msg = f"Log taken on {nonpositive.sum()} nonpositive value(s)"
+        if error.lower() == "warn":
+            warnings.warn(msg, RuntimeWarning)
+        elif error.lower() == "raise":
+            raise RuntimeError(msg)
+        s[nonpositive] = np.nan
+    return np.log(s)
+
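+Any `error` value other than `'warn'` or `'raise'` silently converts
+nonpositive entries to `NaN`; the docstring above uses `'ignore'` for this.
+A small sketch contrasting the modes:
+
+```python
+import pandas as pd
+import janitor  # registers the .log() series method
+
+s = pd.Series([-1.0, 1.0], name="vals")
+s.log(error="ignore")   # -1.0 becomes NaN, silently
+# s.log()               # default 'warn': same result plus a RuntimeWarning
+# s.log(error="raise")  # raises RuntimeError instead
+```
+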
@pf.register_series_method
+def logit(s: "Series", error: str = "warn") -> "Series":
+"""Take logit transform of the Series.
+
+ The logit transform is defined:
+
+ ```python
+ logit(p) = log(p/(1-p))
+ ```
+
+ Each value in the series should be between 0 and 1. Use `error` to
+ control the behavior if any series entries are outside of (0, 1).
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> s = pd.Series([0.1, 0.5, 0.9], name="numbers")
+ >>> s.logit()
+ 0 -2.197225
+ 1 0.000000
+ 2 2.197225
+ Name: numbers, dtype: float64
+
+ Args:
+ s: Input Series.
+ error: Determines behavior when `s` is outside of `(0, 1)`.
+ If `'warn'` then a `RuntimeWarning` is thrown. If `'raise'`, then a
+ `RuntimeError` is thrown. Otherwise, nothing is thrown and `np.nan`
+ is returned for the problematic entries; defaults to `'warn'`.
+
+ Raises:
+ RuntimeError: If `error` is set to `'raise'` and there are
+ entries outside of `(0, 1)`.
+
+ Returns:
+ Transformed Series.
+ """
+    import numpy as np
+    import scipy
+
+    s = s.copy()
+    outside_support = (s <= 0) | (s >= 1)
+    if outside_support.any():
+        msg = f"{outside_support.sum()} value(s) are outside of (0, 1)"
+        if error.lower() == "warn":
+            warnings.warn(msg, RuntimeWarning)
+        elif error.lower() == "raise":
+            raise RuntimeError(msg)
+        s[outside_support] = np.nan
+    return scipy.special.logit(s)
+
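+`scipy.special.expit` is the inverse of the logit transform, so the two
+round-trip; a minimal sketch:
+
+```python
+import pandas as pd
+import janitor  # registers the .logit() series method
+from scipy.special import expit
+
+p = pd.Series([0.2, 0.5, 0.8])
+z = p.logit()
+recovered = expit(z)  # numerically equal to p
+```
+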
+normal_cdf(s)
+
+Transforms the Series via the CDF of the Normal distribution.
+
@pf.register_series_method
+def probit(s: "Series", error: str = "warn") -> "Series":
+"""Transforms the Series via the inverse CDF of the Normal distribution.
+
+ Each value in the series should be between 0 and 1. Use `error` to
+ control the behavior if any series entries are outside of (0, 1).
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> s = pd.Series([0.1, 0.5, 0.8], name="numbers")
+ >>> s.probit()
+ 0 -1.281552
+ 1 0.000000
+ 2 0.841621
+ dtype: float64
+
+ Args:
+ s: Input Series.
+ error: Determines behavior when `s` is outside of `(0, 1)`.
+ If `'warn'` then a `RuntimeWarning` is thrown. If `'raise'`, then
+ a `RuntimeError` is thrown. Otherwise, nothing is thrown and
+ `np.nan` is returned for the problematic entries.
+
+ Raises:
+ RuntimeError: When there are problematic values
+ in the Series and `error='raise'`.
+
+ Returns:
+ Transformed Series.
+ """
+    import numpy as np
+    import pandas as pd
+    import scipy
+
+    s = s.copy()
+    outside_support = (s <= 0) | (s >= 1)
+    if outside_support.any():
+        msg = f"{outside_support.sum()} value(s) are outside of (0, 1)"
+        if error.lower() == "warn":
+            warnings.warn(msg, RuntimeWarning)
+        elif error.lower() == "raise":
+            raise RuntimeError(msg)
+        s[outside_support] = np.nan
+    with np.errstate(all="ignore"):
+        out = pd.Series(scipy.stats.norm.ppf(s), index=s.index)
+    return out
+
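+`probit` applies the quantile function (inverse CDF) of the standard Normal
+distribution, so `scipy.stats.norm.cdf` undoes it; a minimal sketch:
+
+```python
+import pandas as pd
+import janitor  # registers the .probit() series method
+from scipy.stats import norm
+
+p = pd.Series([0.1, 0.5, 0.8])
+z = p.probit()
+recovered = norm.cdf(z)  # numerically equal to p
+```
+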
@pf.register_series_method
+def softmax(s: "Series") -> "Series":
+"""Take the softmax transform of the series.
+
+ The softmax function transforms each element of a collection by
+ computing the exponential of each element divided by the sum of the
+ exponentials of all the elements.
+
+ That is, if x is a one-dimensional numpy array or pandas Series:
+
+ ```python
+ softmax(x) = exp(x)/sum(exp(x))
+ ```
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor
+ >>> s = pd.Series([0, 1, 3], name="numbers")
+ >>> s.softmax()
+ 0 0.042010
+ 1 0.114195
+ 2 0.843795
+ Name: numbers, dtype: float64
+
+ Args:
+ s: Input Series.
+
+ Returns:
+ Transformed Series.
+ """
+    import pandas as pd
+    import scipy
+
+    return pd.Series(scipy.special.softmax(s), index=s.index, name=s.name)
+
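+Two handy sanity checks on the transform: the output sums to one, and
+shifting the input by a constant leaves the result unchanged. A minimal
+sketch:
+
+```python
+import pandas as pd
+import janitor  # registers the .softmax() series method
+
+s = pd.Series([0.0, 1.0, 3.0])
+out = s.softmax()
+assert abs(out.sum() - 1.0) < 1e-12  # probabilities sum to one
+assert ((s + 100).softmax() - out).abs().max() < 1e-9  # shift-invariant
+```
+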
@pf.register_dataframe_method
+@deprecated_alias(
+    target_columns="target_column_names",
+    feature_columns="feature_column_names",
+)
+def get_features_targets(
+    df: pd.DataFrame,
+    target_column_names: Union[str, Union[List, Tuple], Hashable],
+    feature_column_names: Optional[Union[str, Iterable[str], Hashable]] = None,
+) -> Tuple[pd.DataFrame, pd.DataFrame]:
+"""Get the features and targets as separate DataFrames/Series.
+
+ This method does not mutate the original DataFrame.
+
+ The behaviour is as follows:
+
+ - `target_column_names` is mandatory.
+ - If `feature_column_names` is present, then we will respect the column
+ names inside there.
+ - If `feature_column_names` is not passed in, then we will assume that
+ the rest of the columns are feature columns, and return them.
+
+ Examples:
+ >>> import pandas as pd
+ >>> import janitor.ml
+ >>> df = pd.DataFrame(
+ ... {"a": [1, 2, 3], "b": [-2, 0, 4], "c": [1.23, 7.89, 4.56]}
+ ... )
+ >>> X, Y = df.get_features_targets(target_column_names=["a", "c"])
+ >>> X
+ b
+ 0 -2
+ 1 0
+ 2 4
+ >>> Y
+ a c
+ 0 1 1.23
+ 1 2 7.89
+ 2 3 4.56
+
+ Args:
+ df: The pandas DataFrame object.
+ target_column_names: Either a column name or an
+ iterable (list or tuple) of column names that are the target(s) to
+ be predicted.
+ feature_column_names: The column name or
+ iterable of column names that are the features (a.k.a. predictors)
+ used to predict the targets.
+
+ Returns:
+ `(X, Y)` the feature matrix (`X`) and the target matrix (`Y`).
+ Both are pandas DataFrames.
+ """
+    Y = df[target_column_names]
+
+    if feature_column_names:
+        X = df[feature_column_names]
+    else:
+        if isinstance(target_column_names, (list, tuple)):
+            xcols = [c for c in df.columns if c not in target_column_names]
+        else:
+            xcols = [c for c in df.columns if target_column_names != c]
+
+        X = df[xcols]
+    return X, Y
+
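+When `feature_column_names` is given explicitly, columns outside both
+selections are simply dropped; a minimal sketch:
+
+```python
+import pandas as pd
+import janitor.ml
+
+df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
+# Only "b" is kept as a feature; "c" appears in neither X nor Y.
+X, Y = df.get_features_targets(
+    target_column_names="a", feature_column_names=["b"]
+)
+```
+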
@pf.register_dataframe_method
+def fill_missing_timestamps(
+    df: pd.DataFrame,
+    frequency: str,
+    first_time_stamp: pd.Timestamp = None,
+    last_time_stamp: pd.Timestamp = None,
+) -> pd.DataFrame:
+"""Fills a DataFrame with missing timestamps based on a defined frequency.
+
+ If timestamps are missing, this function will re-index the DataFrame.
+ If timestamps are not missing, then the function will return the DataFrame
+ unmodified.
+
+ Examples:
+ Functional usage
+
+ >>> import pandas as pd
+ >>> import janitor.timeseries
+ >>> df = janitor.timeseries.fill_missing_timestamps(
+ ... df=pd.DataFrame(...),
+ ... frequency="1H",
+ ... ) # doctest: +SKIP
+
+ Method chaining example:
+
+ >>> import pandas as pd
+ >>> import janitor.timeseries
+ >>> df = (
+ ... pd.DataFrame(...)
+ ... .fill_missing_timestamps(frequency="1H")
+ ... ) # doctest: +SKIP
+
+ Args:
+ df: DataFrame which needs to be tested for missing timestamps.
+ frequency: Sampling frequency of the data.
+ Acceptable frequency strings are available
+ [here](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases)
+ (the offset aliases in the time series section of the pandas user guide).
+ first_time_stamp: Timestamp expected to start from;
+ defaults to `None`. If no input is provided, assumes the
+ minimum value in the DataFrame index.
+ last_time_stamp: Timestamp expected to end with; defaults to `None`.
+ If no input is provided, assumes the maximum value in
+ the DataFrame index.
+
+ Returns:
+ DataFrame that has a complete set of contiguous datetimes.
+ """
+    # Check all the inputs are the correct data type
+    check("frequency", frequency, [str])
+    check("first_time_stamp", first_time_stamp, [pd.Timestamp, type(None)])
+    check("last_time_stamp", last_time_stamp, [pd.Timestamp, type(None)])
+
+    if first_time_stamp is None:
+        first_time_stamp = df.index.min()
+    if last_time_stamp is None:
+        last_time_stamp = df.index.max()
+
+    # Generate expected timestamps
+    expected_timestamps = pd.date_range(
+        start=first_time_stamp, end=last_time_stamp, freq=frequency
+    )
+
+    return df.reindex(expected_timestamps)
+
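+A concrete sketch with an hourly index (the data here is made up): the
+missing 01:00 row is inserted with `NaN` values.
+
+```python
+import pandas as pd
+import janitor.timeseries
+
+idx = pd.to_datetime(["2021-01-01 00:00", "2021-01-01 02:00"])
+df = pd.DataFrame({"v": [1.0, 2.0]}, index=idx)
+filled = df.fill_missing_timestamps(frequency="1H")
+# filled has rows at 00:00, 01:00 (v is NaN), and 02:00
+```
+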
@pf.register_dataframe_method
+def flag_jumps(
+    df: pd.DataFrame,
+    scale: Union[str, Dict[str, str]] = "percentage",
+    direction: Union[str, Dict[str, str]] = "any",
+    threshold: Union[int, float, Dict[str, Union[int, float]]] = 0.0,
+    strict: bool = False,
+) -> pd.DataFrame:
+"""Create boolean column(s) that flag whether or not the change
+ between consecutive rows exceeds a provided threshold.
+
+ Examples:
+
+ Applies specified criteria across all columns of the DataFrame
+ and appends a flag column for each column in the DataFrame
+
+ >>> df = (
+ ... pd.DataFrame(...)
+ ... .flag_jumps(
+ ... scale="absolute",
+ ... direction="any",
+ ... threshold=2
+ ... )
+ ... ) # doctest: +SKIP
+
+ Applies specific criteria to certain DataFrame columns,
+ applies default criteria to columns *not* specifically listed and
+ appends a flag column for each column in the DataFrame
+
+ >>> df = (
+ ... pd.DataFrame(...)
+ ... .flag_jumps(
+ ... scale=dict(col1="absolute", col2="percentage"),
+ ... direction=dict(col1="increasing", col2="any"),
+ ... threshold=dict(col1=1, col2=0.5),
+ ... )
+ ... ) # doctest: +SKIP
+
+ Applies specific criteria to certain DataFrame columns,
+ applies default criteria to columns *not* specifically listed and
+ appends a flag column for only those columns found in specified
+ criteria
+
+ >>> df = (
+ ... pd.DataFrame(...)
+ ... .flag_jumps(
+ ... scale=dict(col1="absolute"),
+ ... threshold=dict(col2=1),
+ ... strict=True,
+ ... )
+ ... ) # doctest: +SKIP
+
+ Args:
+ df: DataFrame which needs to be flagged for changes between
+ consecutive rows above a certain threshold.
+ scale:
+ Type of scaling approach to use.
+ Acceptable arguments are
+
+ * `'absolute'` (consider the difference between rows)
+ * `'percentage'` (consider the percentage change between rows).
+
+ direction: Type of method used to handle the sign change when
+ comparing consecutive rows.
+ Acceptable arguments are
+
+ * `'increasing'` (only consider rows that are increasing in value)
+ * `'decreasing'` (only consider rows that are decreasing in value)
+ * `'any'` (consider rows that are either increasing or decreasing;
+ sign is ignored).
+ threshold: The value to check if consecutive row comparisons
+ exceed. Always uses a greater than comparison. Must be `>= 0.0`.
+ strict: Flag to enable/disable appending of a flag column for
+ each column in the provided DataFrame. If set to `True`, will
+ only append a flag column for those columns found in at least
+ one of the input dictionaries. If set to `False`, will append
+ a flag column for each column found in the provided DataFrame.
+ If a criterion is not specified for a column, that criterion's
+ default is used.
+
+ Raises:
+ JanitorError: If `strict=True` and at least one of
+ `scale`, `direction`, or `threshold` inputs is not a
+ dictionary.
+ JanitorError: If `scale` is not one of
+ `("absolute", "percentage")`.
+ JanitorError: If `direction` is not one of
+ `("increasing", "decreasing", "any")`.
+ JanitorError: If `threshold` is less than `0.0`.
+
+ Returns:
+ DataFrame that has `flag jump` columns.
+
+ <!--
+ # noqa: DAR101
+ -->
+ """
+    df = df.copy()
+
+    if strict:
+        if not any(
+            isinstance(arg, dict) for arg in (scale, direction, threshold)
+        ):
+            raise JanitorError(
+                "When enacting 'strict=True', 'scale', 'direction', or "
+                "'threshold' must be a dictionary."
+            )
+
+        # Only append a flag col for the cols that appear
+        # in at least one of the input dicts
+        arg_keys = [
+            arg.keys()
+            for arg in (scale, direction, threshold)
+            if isinstance(arg, dict)
+        ]
+        cols = set(itertools.chain.from_iterable(arg_keys))
+
+    else:
+        # Append a flag col for each col in the DataFrame
+        cols = df.columns
+
+    columns_to_add = {}
+    for col in sorted(cols):
+        # Allow arguments to be a mix of dict and single instances
+        s = scale.get(col, "percentage") if isinstance(scale, dict) else scale
+        d = (
+            direction.get(col, "any")
+            if isinstance(direction, dict)
+            else direction
+        )
+        t = (
+            threshold.get(col, 0.0)
+            if isinstance(threshold, dict)
+            else threshold
+        )
+
+        columns_to_add[f"{col}_jump_flag"] = _flag_jumps_single_col(
+            df, col, scale=s, direction=d, threshold=t
+        )
+
+    df = df.assign(**columns_to_add)
+
+    return df
+
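+A concrete sketch with made-up data (assuming `flag_jumps` is registered by
+the timeseries module, like its neighbours here):
+
+```python
+import pandas as pd
+import janitor.timeseries  # assumed home of flag_jumps, as in this document
+
+df = pd.DataFrame({"col1": [1, 2, 5], "col2": [1.0, 1.1, 3.0]})
+out = df.flag_jumps(scale="absolute", direction="any", threshold=2)
+# adds boolean columns "col1_jump_flag" and "col2_jump_flag";
+# only row-to-row changes larger than 2 are flagged
+```
+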
@pf.register_dataframe_method
+def sort_timestamps_monotonically(
+    df: pd.DataFrame, direction: str = "increasing", strict: bool = False
+) -> pd.DataFrame:
+"""Sort DataFrame such that index is monotonic.
+
+ If timestamps are monotonic, this function will return
+ the DataFrame unmodified. If timestamps are not monotonic,
+ then the function will sort the DataFrame.
+
+ Examples:
+ Functional usage
+
+ >>> import pandas as pd
+ >>> import janitor.timeseries
+ >>> df = janitor.timeseries.sort_timestamps_monotonically(
+ ... df=pd.DataFrame(...),
+ ... direction="increasing",
+ ... ) # doctest: +SKIP
+
+ Method chaining example:
+
+ >>> import pandas as pd
+ >>> import janitor.timeseries
+ >>> df = (
+ ... pd.DataFrame(...)
+ ... .sort_timestamps_monotonically(direction="increasing")
+ ... ) # doctest: +SKIP
+
+ Args:
+ df: DataFrame which needs to be tested for monotonicity.
+ direction: Type of monotonicity desired.
+ Acceptable arguments are `'increasing'` or `'decreasing'`.
+ strict: Flag to enable/disable strict monotonicity.
+ If set to `True`, will remove duplicates in the index
+ by retaining first occurrence of value in index.
+ If set to `False`, will not test for duplicates in the index.
+
+ Returns:
+ DataFrame that has monotonically increasing (or decreasing)
+ timestamps.
+ """
+    # Check all the inputs are the correct data type
+    check("df", df, [pd.DataFrame])
+    check("direction", direction, [str])
+    check("strict", strict, [bool])
+
+    # Remove duplicates if requested
+    if strict:
+        df = df[~df.index.duplicated(keep="first")]
+
+    # Sort timestamps
+    if direction == "increasing":
+        df = df.sort_index()
+    else:
+        df = df.sort_index(ascending=False)
+
+    # Return the DataFrame
+    return df
+
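+A concrete sketch with made-up data: note that with `strict=True`,
+duplicates are dropped by original position (first occurrence kept) before
+sorting.
+
+```python
+import pandas as pd
+import janitor.timeseries
+
+idx = pd.to_datetime(["2021-01-02", "2021-01-01", "2021-01-01"])
+df = pd.DataFrame({"v": [2, 1, 3]}, index=idx)
+out = df.sort_timestamps_monotonically(strict=True)
+# index is now [2021-01-01, 2021-01-02]; the duplicate row with v=3 is
+# dropped because it appeared after the first 2021-01-01 row
+```
+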
@pf.register_xarray_dataarray_method
+def clone_using(
+    da: xr.DataArray,
+    np_arr: np.array,
+    use_coords: bool = True,
+    use_attrs: bool = False,
+    new_name: str = None,
+) -> xr.DataArray:
+"""
+ Given a NumPy array, return an XArray `DataArray` which contains the same
+ dimension names and (optionally) coordinates and other properties as the
+ supplied `DataArray`.
+
+ This is similar to `xr.DataArray.copy()` with more specificity for
+ the type of cloning you would like to perform - the different properties
+ that you desire to mirror in the new `DataArray`.
+
+ If the coordinates from the source `DataArray` are not desired, the shape
+ of the source and new NumPy arrays don't need to match.
+ The number of dimensions does, however, need to match.
+
+ Examples:
+ Making a new `DataArray` from a previous one, keeping the
+ dimension names but dropping the coordinates (the input NumPy array
+ is of a different size):
+
+ >>> import numpy as np
+ >>> import xarray as xr
+ >>> import janitor.xarray
+ >>> da = xr.DataArray(
+ ... np.zeros((512, 1024)), dims=["ax_1", "ax_2"],
+ ... coords=dict(ax_1=np.linspace(0, 1, 512),
+ ... ax_2=np.logspace(-2, 2, 1024)),
+ ... name="original",
+ ... )
+ >>> new_da = da.clone_using(
+ ... np.ones((4, 6)), new_name='new_and_improved', use_coords=False,
+ ... )
+ >>> new_da
+ <xarray.DataArray 'new_and_improved' (ax_1: 4, ax_2: 6)> Size: 192B
+ array([[1., 1., 1., 1., 1., 1.],
+ [1., 1., 1., 1., 1., 1.],
+ [1., 1., 1., 1., 1., 1.],
+ [1., 1., 1., 1., 1., 1.]])
+ Dimensions without coordinates: ax_1, ax_2
+
+ Args:
+ da: The `DataArray` supplied by the method itself.
+ np_arr: The NumPy array which will be wrapped in a new `DataArray`
+ given the properties copied over from the source `DataArray`.
+ use_coords: If `True`, use the coordinates of the source
+ `DataArray` for the coordinates of the newly-generated array.
+ Shapes must match in this case. If `False`, only the number of
+ dimensions must match.
+ use_attrs: If `True`, copy over the `attrs` from the source
+ `DataArray`.
+ The data inside `attrs` itself is not copied, only the mapping.
+ Otherwise, use the supplied attrs.
+ new_name: If set, use as the new name of the returned `DataArray`.
+ Otherwise, use the name of `da`.
+
+ Raises:
+ ValueError: If number of dimensions in `NumPy` array and
+ `DataArray` do not match.
+ ValueError: If shape of `NumPy` array and `DataArray`
+ do not match.
+
+ Returns:
+ A `DataArray` styled like the input `DataArray` containing the
+ NumPy array data.
+ """
+
+    if np_arr.ndim != da.ndim:
+        raise ValueError(
+            "Number of dims in the NumPy array and the DataArray "
+            "must match."
+        )
+
+    if use_coords and not all(
+        np_ax_len == da_ax_len
+        for np_ax_len, da_ax_len in zip(np_arr.shape, da.shape)
+    ):
+        raise ValueError(
+            "Input NumPy array and DataArray must have the same "
+            "shape if copying over coordinates."
+        )
+
+    return xr.DataArray(
+        np_arr,
+        dims=da.dims,
+        coords=da.coords if use_coords else None,
+        attrs=da.attrs.copy() if use_attrs else None,
+        name=new_name if new_name is not None else da.name,
+    )
+
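+A complementary sketch to the docstring example above: keeping the source
+coordinates (shapes must then match) and copying the `attrs` mapping.
+
+```python
+import numpy as np
+import xarray as xr
+import janitor.xarray
+
+da = xr.DataArray(
+    np.zeros((2, 3)),
+    dims=["y", "x"],
+    coords={"y": [0, 1], "x": [0, 1, 2]},
+    attrs={"units": "m"},
+)
+clone = da.clone_using(np.ones((2, 3)), use_attrs=True)
+# same dims, coords, and attrs mapping; new data
+```
+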
{"use strict";/*!
+ * escape-html
+ * Copyright(c) 2012-2013 TJ Holowaychuk
+ * Copyright(c) 2015 Andreas Lubbe
+ * Copyright(c) 2015 Tiancheng "Timothy" Gu
+ * MIT Licensed
+ */var Va=/["'&<>]/;qn.exports=za;function za(e){var t=""+e,r=Va.exec(t);if(!r)return t;var o,n="",i=0,s=0;for(i=r.index;i0&&i[i.length-1])&&(p[0]===6||p[0]===2)){r=0;continue}if(p[0]===3&&(!i||p[1]>i[0]&&p[1]=e.length&&(e=void 0),{value:e&&e[o++],done:!e}}};throw new TypeError(t?"Object is not iterable.":"Symbol.iterator is not defined.")}function V(e,t){var r=typeof Symbol=="function"&&e[Symbol.iterator];if(!r)return e;var o=r.call(e),n,i=[],s;try{for(;(t===void 0||t-- >0)&&!(n=o.next()).done;)i.push(n.value)}catch(a){s={error:a}}finally{try{n&&!n.done&&(r=o.return)&&r.call(o)}finally{if(s)throw s.error}}return i}function z(e,t,r){if(r||arguments.length===2)for(var o=0,n=t.length,i;o1||a(u,h)})})}function a(u,h){try{c(o[u](h))}catch(w){f(i[0][3],w)}}function c(u){u.value instanceof ot?Promise.resolve(u.value.v).then(p,l):f(i[0][2],u)}function p(u){a("next",u)}function l(u){a("throw",u)}function f(u,h){u(h),i.shift(),i.length&&a(i[0][0],i[0][1])}}function so(e){if(!Symbol.asyncIterator)throw new TypeError("Symbol.asyncIterator is not defined.");var t=e[Symbol.asyncIterator],r;return t?t.call(e):(e=typeof ue=="function"?ue(e):e[Symbol.iterator](),r={},o("next"),o("throw"),o("return"),r[Symbol.asyncIterator]=function(){return this},r);function o(i){r[i]=e[i]&&function(s){return new Promise(function(a,c){s=e[i](s),n(a,c,s.done,s.value)})}}function n(i,s,a,c){Promise.resolve(c).then(function(p){i({value:p,done:a})},s)}}function k(e){return typeof e=="function"}function pt(e){var t=function(o){Error.call(o),o.stack=new Error().stack},r=e(t);return r.prototype=Object.create(Error.prototype),r.prototype.constructor=r,r}var Wt=pt(function(e){return function(r){e(this),this.message=r?r.length+` errors occurred during unsubscription:
+`+r.map(function(o,n){return n+1+") "+o.toString()}).join(`
+ `):"",this.name="UnsubscriptionError",this.errors=r}});function Ve(e,t){if(e){var r=e.indexOf(t);0<=r&&e.splice(r,1)}}var Ie=function(){function e(t){this.initialTeardown=t,this.closed=!1,this._parentage=null,this._finalizers=null}return e.prototype.unsubscribe=function(){var t,r,o,n,i;if(!this.closed){this.closed=!0;var s=this._parentage;if(s)if(this._parentage=null,Array.isArray(s))try{for(var a=ue(s),c=a.next();!c.done;c=a.next()){var p=c.value;p.remove(this)}}catch(A){t={error:A}}finally{try{c&&!c.done&&(r=a.return)&&r.call(a)}finally{if(t)throw t.error}}else s.remove(this);var l=this.initialTeardown;if(k(l))try{l()}catch(A){i=A instanceof Wt?A.errors:[A]}var f=this._finalizers;if(f){this._finalizers=null;try{for(var u=ue(f),h=u.next();!h.done;h=u.next()){var w=h.value;try{co(w)}catch(A){i=i!=null?i:[],A instanceof Wt?i=z(z([],V(i)),V(A.errors)):i.push(A)}}}catch(A){o={error:A}}finally{try{h&&!h.done&&(n=u.return)&&n.call(u)}finally{if(o)throw o.error}}}if(i)throw new Wt(i)}},e.prototype.add=function(t){var r;if(t&&t!==this)if(this.closed)co(t);else{if(t instanceof e){if(t.closed||t._hasParent(this))return;t._addParent(this)}(this._finalizers=(r=this._finalizers)!==null&&r!==void 0?r:[]).push(t)}},e.prototype._hasParent=function(t){var r=this._parentage;return r===t||Array.isArray(r)&&r.includes(t)},e.prototype._addParent=function(t){var r=this._parentage;this._parentage=Array.isArray(r)?(r.push(t),r):r?[r,t]:t},e.prototype._removeParent=function(t){var r=this._parentage;r===t?this._parentage=null:Array.isArray(r)&&Ve(r,t)},e.prototype.remove=function(t){var r=this._finalizers;r&&Ve(r,t),t instanceof e&&t._removeParent(this)},e.EMPTY=function(){var t=new e;return t.closed=!0,t}(),e}();var Er=Ie.EMPTY;function Dt(e){return e instanceof Ie||e&&"closed"in e&&k(e.remove)&&k(e.add)&&k(e.unsubscribe)}function co(e){k(e)?e():e.unsubscribe()}var ke={onUnhandledError:null,onStoppedNotification:null,Promise:void 0,useDeprecatedSynchronousErrorHandling:!1,useDeprecatedNextContext:!1};var lt={setTimeout:function(e,t){for(var r=[],o=2;o0},enumerable:!1,configurable:!0}),t.prototype._trySubscribe=function(r){return this._throwIfClosed(),e.prototype._trySubscribe.call(this,r)},t.prototype._subscribe=function(r){return this._throwIfClosed(),this._checkFinalizedStatuses(r),this._innerSubscribe(r)},t.prototype._innerSubscribe=function(r){var o=this,n=this,i=n.hasError,s=n.isStopped,a=n.observers;return i||s?Er:(this.currentObservers=null,a.push(r),new Ie(function(){o.currentObservers=null,Ve(a,r)}))},t.prototype._checkFinalizedStatuses=function(r){var o=this,n=o.hasError,i=o.thrownError,s=o.isStopped;n?r.error(i):s&&r.complete()},t.prototype.asObservable=function(){var r=new j;return r.source=this,r},t.create=function(r,o){return new vo(r,o)},t}(j);var vo=function(e){se(t,e);function t(r,o){var n=e.call(this)||this;return n.destination=r,n.source=o,n}return t.prototype.next=function(r){var o,n;(n=(o=this.destination)===null||o===void 0?void 0:o.next)===null||n===void 0||n.call(o,r)},t.prototype.error=function(r){var o,n;(n=(o=this.destination)===null||o===void 0?void 0:o.error)===null||n===void 0||n.call(o,r)},t.prototype.complete=function(){var r,o;(o=(r=this.destination)===null||r===void 0?void 0:r.complete)===null||o===void 0||o.call(r)},t.prototype._subscribe=function(r){var o,n;return(n=(o=this.source)===null||o===void 0?void 0:o.subscribe(r))!==null&&n!==void 0?n:Er},t}(v);var St={now:function(){return(St.delegate||Date).now()},delegate:void 0};var Ot=function(e){se(t,e);function 
t(r,o,n){r===void 0&&(r=1/0),o===void 0&&(o=1/0),n===void 0&&(n=St);var i=e.call(this)||this;return i._bufferSize=r,i._windowTime=o,i._timestampProvider=n,i._buffer=[],i._infiniteTimeWindow=!0,i._infiniteTimeWindow=o===1/0,i._bufferSize=Math.max(1,r),i._windowTime=Math.max(1,o),i}return t.prototype.next=function(r){var o=this,n=o.isStopped,i=o._buffer,s=o._infiniteTimeWindow,a=o._timestampProvider,c=o._windowTime;n||(i.push(r),!s&&i.push(a.now()+c)),this._trimBuffer(),e.prototype.next.call(this,r)},t.prototype._subscribe=function(r){this._throwIfClosed(),this._trimBuffer();for(var o=this._innerSubscribe(r),n=this,i=n._infiniteTimeWindow,s=n._buffer,a=s.slice(),c=0;c0?e.prototype.requestAsyncId.call(this,r,o,n):(r.actions.push(this),r._scheduled||(r._scheduled=ut.requestAnimationFrame(function(){return r.flush(void 0)})))},t.prototype.recycleAsyncId=function(r,o,n){var i;if(n===void 0&&(n=0),n!=null?n>0:this.delay>0)return e.prototype.recycleAsyncId.call(this,r,o,n);var s=r.actions;o!=null&&((i=s[s.length-1])===null||i===void 0?void 0:i.id)!==o&&(ut.cancelAnimationFrame(o),r._scheduled=void 0)},t}(zt);var yo=function(e){se(t,e);function t(){return e!==null&&e.apply(this,arguments)||this}return t.prototype.flush=function(r){this._active=!0;var o=this._scheduled;this._scheduled=void 0;var n=this.actions,i;r=r||n.shift();do if(i=r.execute(r.state,r.delay))break;while((r=n[0])&&r.id===o&&n.shift());if(this._active=!1,i){for(;(r=n[0])&&r.id===o&&n.shift();)r.unsubscribe();throw i}},t}(qt);var de=new yo(xo);var L=new j(function(e){return e.complete()});function Kt(e){return e&&k(e.schedule)}function _r(e){return e[e.length-1]}function Je(e){return k(_r(e))?e.pop():void 0}function Ae(e){return Kt(_r(e))?e.pop():void 0}function Qt(e,t){return typeof _r(e)=="number"?e.pop():t}var dt=function(e){return e&&typeof e.length=="number"&&typeof e!="function"};function Yt(e){return k(e==null?void 0:e.then)}function Bt(e){return k(e[ft])}function Gt(e){return Symbol.asyncIterator&&k(e==null?void 0:e[Symbol.asyncIterator])}function Jt(e){return new TypeError("You provided "+(e!==null&&typeof e=="object"?"an invalid object":"'"+e+"'")+" where a stream was expected. 
You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.")}function Di(){return typeof Symbol!="function"||!Symbol.iterator?"@@iterator":Symbol.iterator}var Xt=Di();function Zt(e){return k(e==null?void 0:e[Xt])}function er(e){return ao(this,arguments,function(){var r,o,n,i;return Ut(this,function(s){switch(s.label){case 0:r=e.getReader(),s.label=1;case 1:s.trys.push([1,,9,10]),s.label=2;case 2:return[4,ot(r.read())];case 3:return o=s.sent(),n=o.value,i=o.done,i?[4,ot(void 0)]:[3,5];case 4:return[2,s.sent()];case 5:return[4,ot(n)];case 6:return[4,s.sent()];case 7:return s.sent(),[3,2];case 8:return[3,10];case 9:return r.releaseLock(),[7];case 10:return[2]}})})}function tr(e){return k(e==null?void 0:e.getReader)}function N(e){if(e instanceof j)return e;if(e!=null){if(Bt(e))return Ni(e);if(dt(e))return Vi(e);if(Yt(e))return zi(e);if(Gt(e))return Eo(e);if(Zt(e))return qi(e);if(tr(e))return Ki(e)}throw Jt(e)}function Ni(e){return new j(function(t){var r=e[ft]();if(k(r.subscribe))return r.subscribe(t);throw new TypeError("Provided object does not correctly implement Symbol.observable")})}function Vi(e){return new j(function(t){for(var r=0;r=2;return function(o){return o.pipe(e?g(function(n,i){return e(n,i,o)}):ce,ye(1),r?Qe(t):jo(function(){return new or}))}}function $r(e){return e<=0?function(){return L}:x(function(t,r){var o=[];t.subscribe(S(r,function(n){o.push(n),e=2,!0))}function le(e){e===void 0&&(e={});var t=e.connector,r=t===void 0?function(){return new v}:t,o=e.resetOnError,n=o===void 0?!0:o,i=e.resetOnComplete,s=i===void 0?!0:i,a=e.resetOnRefCountZero,c=a===void 0?!0:a;return function(p){var l,f,u,h=0,w=!1,A=!1,Z=function(){f==null||f.unsubscribe(),f=void 0},te=function(){Z(),l=u=void 0,w=A=!1},J=function(){var C=l;te(),C==null||C.unsubscribe()};return x(function(C,ct){h++,!A&&!w&&Z();var Ne=u=u!=null?u:r();ct.add(function(){h--,h===0&&!A&&!w&&(f=Pr(J,c))}),Ne.subscribe(ct),!l&&h>0&&(l=new it({next:function(Pe){return Ne.next(Pe)},error:function(Pe){A=!0,Z(),f=Pr(te,n,Pe),Ne.error(Pe)},complete:function(){w=!0,Z(),f=Pr(te,s),Ne.complete()}}),N(C).subscribe(l))})(p)}}function Pr(e,t){for(var r=[],o=2;oe.next(document)),e}function R(e,t=document){return Array.from(t.querySelectorAll(e))}function P(e,t=document){let r=me(e,t);if(typeof r=="undefined")throw new ReferenceError(`Missing element: expected "${e}" to be present`);return r}function me(e,t=document){return t.querySelector(e)||void 0}function Re(){var e,t,r,o;return(o=(r=(t=(e=document.activeElement)==null?void 0:e.shadowRoot)==null?void 0:t.activeElement)!=null?r:document.activeElement)!=null?o:void 0}var la=T(d(document.body,"focusin"),d(document.body,"focusout")).pipe(be(1),q(void 0),m(()=>Re()||document.body),B(1));function vt(e){return la.pipe(m(t=>e.contains(t)),Y())}function Vo(e,t){return T(d(e,"mouseenter").pipe(m(()=>!0)),d(e,"mouseleave").pipe(m(()=>!1))).pipe(t?be(t):ce,q(!1))}function Ue(e){return{x:e.offsetLeft,y:e.offsetTop}}function zo(e){return T(d(window,"load"),d(window,"resize")).pipe(Me(0,de),m(()=>Ue(e)),q(Ue(e)))}function ir(e){return{x:e.scrollLeft,y:e.scrollTop}}function et(e){return T(d(e,"scroll"),d(window,"resize")).pipe(Me(0,de),m(()=>ir(e)),q(ir(e)))}function qo(e,t){if(typeof t=="string"||typeof t=="number")e.innerHTML+=t.toString();else if(t instanceof Node)e.appendChild(t);else if(Array.isArray(t))for(let r of t)qo(e,r)}function E(e,t,...r){let o=document.createElement(e);if(t)for(let n of Object.keys(t))typeof t[n]!="undefined"&&(typeof 
t[n]!="boolean"?o.setAttribute(n,t[n]):o.setAttribute(n,""));for(let n of r)qo(o,n);return o}function ar(e){if(e>999){let t=+((e-950)%1e3>99);return`${((e+1e-6)/1e3).toFixed(t)}k`}else return e.toString()}function gt(e){let t=E("script",{src:e});return H(()=>(document.head.appendChild(t),T(d(t,"load"),d(t,"error").pipe(b(()=>Ar(()=>new ReferenceError(`Invalid script: ${e}`))))).pipe(m(()=>{}),_(()=>document.head.removeChild(t)),ye(1))))}var Ko=new v,ma=H(()=>typeof ResizeObserver=="undefined"?gt("https://unpkg.com/resize-observer-polyfill"):$(void 0)).pipe(m(()=>new ResizeObserver(e=>{for(let t of e)Ko.next(t)})),b(e=>T(qe,$(e)).pipe(_(()=>e.disconnect()))),B(1));function pe(e){return{width:e.offsetWidth,height:e.offsetHeight}}function Ee(e){return ma.pipe(y(t=>t.observe(e)),b(t=>Ko.pipe(g(({target:r})=>r===e),_(()=>t.unobserve(e)),m(()=>pe(e)))),q(pe(e)))}function xt(e){return{width:e.scrollWidth,height:e.scrollHeight}}function sr(e){let t=e.parentElement;for(;t&&(e.scrollWidth<=t.scrollWidth&&e.scrollHeight<=t.scrollHeight);)t=(e=t).parentElement;return t?e:void 0}var Qo=new v,fa=H(()=>$(new IntersectionObserver(e=>{for(let t of e)Qo.next(t)},{threshold:0}))).pipe(b(e=>T(qe,$(e)).pipe(_(()=>e.disconnect()))),B(1));function yt(e){return fa.pipe(y(t=>t.observe(e)),b(t=>Qo.pipe(g(({target:r})=>r===e),_(()=>t.unobserve(e)),m(({isIntersecting:r})=>r))))}function Yo(e,t=16){return et(e).pipe(m(({y:r})=>{let o=pe(e),n=xt(e);return r>=n.height-o.height-t}),Y())}var cr={drawer:P("[data-md-toggle=drawer]"),search:P("[data-md-toggle=search]")};function Bo(e){return cr[e].checked}function Be(e,t){cr[e].checked!==t&&cr[e].click()}function We(e){let t=cr[e];return d(t,"change").pipe(m(()=>t.checked),q(t.checked))}function ua(e,t){switch(e.constructor){case HTMLInputElement:return e.type==="radio"?/^Arrow/.test(t):!0;case HTMLSelectElement:case HTMLTextAreaElement:return!0;default:return e.isContentEditable}}function da(){return T(d(window,"compositionstart").pipe(m(()=>!0)),d(window,"compositionend").pipe(m(()=>!1))).pipe(q(!1))}function Go(){let e=d(window,"keydown").pipe(g(t=>!(t.metaKey||t.ctrlKey)),m(t=>({mode:Bo("search")?"search":"global",type:t.key,claim(){t.preventDefault(),t.stopPropagation()}})),g(({mode:t,type:r})=>{if(t==="global"){let o=Re();if(typeof o!="undefined")return!ua(o,r)}return!0}),le());return da().pipe(b(t=>t?L:e))}function ve(){return new URL(location.href)}function st(e,t=!1){if(G("navigation.instant")&&!t){let r=E("a",{href:e.href});document.body.appendChild(r),r.click(),r.remove()}else location.href=e.href}function Jo(){return new v}function Xo(){return location.hash.slice(1)}function Zo(e){let t=E("a",{href:e});t.addEventListener("click",r=>r.stopPropagation()),t.click()}function ha(e){return T(d(window,"hashchange"),e).pipe(m(Xo),q(Xo()),g(t=>t.length>0),B(1))}function en(e){return ha(e).pipe(m(t=>me(`[id="${t}"]`)),g(t=>typeof t!="undefined"))}function At(e){let t=matchMedia(e);return nr(r=>t.addListener(()=>r(t.matches))).pipe(q(t.matches))}function tn(){let e=matchMedia("print");return T(d(window,"beforeprint").pipe(m(()=>!0)),d(window,"afterprint").pipe(m(()=>!1))).pipe(q(e.matches))}function Ur(e,t){return e.pipe(b(r=>r?t():L))}function Wr(e,t){return new j(r=>{let o=new XMLHttpRequest;return o.open("GET",`${e}`),o.responseType="blob",o.addEventListener("load",()=>{o.status>=200&&o.status<300?(r.next(o.response),r.complete()):r.error(new Error(o.statusText))}),o.addEventListener("error",()=>{r.error(new Error("Network 
error"))}),o.addEventListener("abort",()=>{r.complete()}),typeof(t==null?void 0:t.progress$)!="undefined"&&(o.addEventListener("progress",n=>{var i;if(n.lengthComputable)t.progress$.next(n.loaded/n.total*100);else{let s=(i=o.getResponseHeader("Content-Length"))!=null?i:0;t.progress$.next(n.loaded/+s*100)}}),t.progress$.next(5)),o.send(),()=>o.abort()})}function De(e,t){return Wr(e,t).pipe(b(r=>r.text()),m(r=>JSON.parse(r)),B(1))}function rn(e,t){let r=new DOMParser;return Wr(e,t).pipe(b(o=>o.text()),m(o=>r.parseFromString(o,"text/html")),B(1))}function on(e,t){let r=new DOMParser;return Wr(e,t).pipe(b(o=>o.text()),m(o=>r.parseFromString(o,"text/xml")),B(1))}function nn(){return{x:Math.max(0,scrollX),y:Math.max(0,scrollY)}}function an(){return T(d(window,"scroll",{passive:!0}),d(window,"resize",{passive:!0})).pipe(m(nn),q(nn()))}function sn(){return{width:innerWidth,height:innerHeight}}function cn(){return d(window,"resize",{passive:!0}).pipe(m(sn),q(sn()))}function pn(){return Q([an(),cn()]).pipe(m(([e,t])=>({offset:e,size:t})),B(1))}function pr(e,{viewport$:t,header$:r}){let o=t.pipe(X("size")),n=Q([o,r]).pipe(m(()=>Ue(e)));return Q([r,t,n]).pipe(m(([{height:i},{offset:s,size:a},{x:c,y:p}])=>({offset:{x:s.x-c,y:s.y-p+i},size:a})))}function ba(e){return d(e,"message",t=>t.data)}function va(e){let t=new v;return t.subscribe(r=>e.postMessage(r)),t}function ln(e,t=new Worker(e)){let r=ba(t),o=va(t),n=new v;n.subscribe(o);let i=o.pipe(ee(),oe(!0));return n.pipe(ee(),$e(r.pipe(U(i))),le())}var ga=P("#__config"),Et=JSON.parse(ga.textContent);Et.base=`${new URL(Et.base,ve())}`;function we(){return Et}function G(e){return Et.features.includes(e)}function ge(e,t){return typeof t!="undefined"?Et.translations[e].replace("#",t.toString()):Et.translations[e]}function Te(e,t=document){return P(`[data-md-component=${e}]`,t)}function ne(e,t=document){return R(`[data-md-component=${e}]`,t)}function xa(e){let t=P(".md-typeset > :first-child",e);return d(t,"click",{once:!0}).pipe(m(()=>P(".md-typeset",e)),m(r=>({hash:__md_hash(r.innerHTML)})))}function mn(e){if(!G("announce.dismiss")||!e.childElementCount)return L;if(!e.hidden){let t=P(".md-typeset",e);__md_hash(t.innerHTML)===__md_get("__announce")&&(e.hidden=!0)}return H(()=>{let t=new v;return t.subscribe(({hash:r})=>{e.hidden=!0,__md_set("__announce",r)}),xa(e).pipe(y(r=>t.next(r)),_(()=>t.complete()),m(r=>F({ref:e},r)))})}function ya(e,{target$:t}){return t.pipe(m(r=>({hidden:r!==e})))}function fn(e,t){let r=new v;return r.subscribe(({hidden:o})=>{e.hidden=o}),ya(e,t).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))}function Ct(e,t){return t==="inline"?E("div",{class:"md-tooltip md-tooltip--inline",id:e,role:"tooltip"},E("div",{class:"md-tooltip__inner md-typeset"})):E("div",{class:"md-tooltip",id:e,role:"tooltip"},E("div",{class:"md-tooltip__inner md-typeset"}))}function un(e,t){if(t=t?`${t}_annotation_${e}`:void 0,t){let r=t?`#${t}`:void 0;return E("aside",{class:"md-annotation",tabIndex:0},Ct(t),E("a",{href:r,class:"md-annotation__index",tabIndex:-1},E("span",{"data-md-annotation-id":e})))}else return E("aside",{class:"md-annotation",tabIndex:0},Ct(t),E("span",{class:"md-annotation__index",tabIndex:-1},E("span",{"data-md-annotation-id":e})))}function dn(e){return E("button",{class:"md-clipboard md-icon",title:ge("clipboard.copy"),"data-clipboard-target":`#${e} > code`})}function Dr(e,t){let r=t&2,o=t&1,n=Object.keys(e.terms).filter(c=>!e.terms[c]).reduce((c,p)=>[...c,E("del",null,p)," "],[]).slice(0,-1),i=we(),s=new 
URL(e.location,i.base);G("search.highlight")&&s.searchParams.set("h",Object.entries(e.terms).filter(([,c])=>c).reduce((c,[p])=>`${c} ${p}`.trim(),""));let{tags:a}=we();return E("a",{href:`${s}`,class:"md-search-result__link",tabIndex:-1},E("article",{class:"md-search-result__article md-typeset","data-md-score":e.score.toFixed(2)},r>0&&E("div",{class:"md-search-result__icon md-icon"}),r>0&&E("h1",null,e.title),r<=0&&E("h2",null,e.title),o>0&&e.text.length>0&&e.text,e.tags&&e.tags.map(c=>{let p=a?c in a?`md-tag-icon md-tag--${a[c]}`:"md-tag-icon":"";return E("span",{class:`md-tag ${p}`},c)}),o>0&&n.length>0&&E("p",{class:"md-search-result__terms"},ge("search.result.term.missing"),": ",...n)))}function hn(e){let t=e[0].score,r=[...e],o=we(),n=r.findIndex(l=>!`${new URL(l.location,o.base)}`.includes("#")),[i]=r.splice(n,1),s=r.findIndex(l=>l.scoreDr(l,1)),...c.length?[E("details",{class:"md-search-result__more"},E("summary",{tabIndex:-1},E("div",null,c.length>0&&c.length===1?ge("search.result.more.one"):ge("search.result.more.other",c.length))),...c.map(l=>Dr(l,1)))]:[]];return E("li",{class:"md-search-result__item"},p)}function bn(e){return E("ul",{class:"md-source__facts"},Object.entries(e).map(([t,r])=>E("li",{class:`md-source__fact md-source__fact--${t}`},typeof r=="number"?ar(r):r)))}function Nr(e){let t=`tabbed-control tabbed-control--${e}`;return E("div",{class:t,hidden:!0},E("button",{class:"tabbed-button",tabIndex:-1,"aria-hidden":"true"}))}function vn(e){return E("div",{class:"md-typeset__scrollwrap"},E("div",{class:"md-typeset__table"},e))}function Ea(e){let t=we(),r=new URL(`../${e.version}/`,t.base);return E("li",{class:"md-version__item"},E("a",{href:`${r}`,class:"md-version__link"},e.title))}function gn(e,t){return e=e.filter(r=>{var o;return!((o=r.properties)!=null&&o.hidden)}),E("div",{class:"md-version"},E("button",{class:"md-version__current","aria-label":ge("select.version")},t.title),E("ul",{class:"md-version__list"},e.map(Ea)))}var wa=0;function Ta(e,t){document.body.append(e);let{width:r}=pe(e);e.style.setProperty("--md-tooltip-width",`${r}px`),e.remove();let o=sr(t),n=typeof o!="undefined"?et(o):$({x:0,y:0}),i=T(vt(t),Vo(t)).pipe(Y());return Q([i,n]).pipe(m(([s,a])=>{let{x:c,y:p}=Ue(t),l=pe(t),f=t.closest("table");return f&&t.parentElement&&(c+=f.offsetLeft+t.parentElement.offsetLeft,p+=f.offsetTop+t.parentElement.offsetTop),{active:s,offset:{x:c-a.x+l.width/2-r/2,y:p-a.y+l.height+8}}}))}function Ge(e){let t=e.title;if(!t.length)return L;let r=`__tooltip_${wa++}`,o=Ct(r,"inline"),n=P(".md-typeset",o);return n.innerHTML=t,H(()=>{let i=new v;return 
i.subscribe({next({offset:s}){o.style.setProperty("--md-tooltip-x",`${s.x}px`),o.style.setProperty("--md-tooltip-y",`${s.y}px`)},complete(){o.style.removeProperty("--md-tooltip-x"),o.style.removeProperty("--md-tooltip-y")}}),T(i.pipe(g(({active:s})=>s)),i.pipe(be(250),g(({active:s})=>!s))).subscribe({next({active:s}){s?(e.insertAdjacentElement("afterend",o),e.setAttribute("aria-describedby",r),e.removeAttribute("title")):(o.remove(),e.removeAttribute("aria-describedby"),e.setAttribute("title",t))},complete(){o.remove(),e.removeAttribute("aria-describedby"),e.setAttribute("title",t)}}),i.pipe(Me(16,de)).subscribe(({active:s})=>{o.classList.toggle("md-tooltip--active",s)}),i.pipe(_t(125,de),g(()=>!!e.offsetParent),m(()=>e.offsetParent.getBoundingClientRect()),m(({x:s})=>s)).subscribe({next(s){s?o.style.setProperty("--md-tooltip-0",`${-s}px`):o.style.removeProperty("--md-tooltip-0")},complete(){o.style.removeProperty("--md-tooltip-0")}}),Ta(o,e).pipe(y(s=>i.next(s)),_(()=>i.complete()),m(s=>F({ref:e},s)))}).pipe(ze(ie))}function Sa(e,t){let r=H(()=>Q([zo(e),et(t)])).pipe(m(([{x:o,y:n},i])=>{let{width:s,height:a}=pe(e);return{x:o-i.x+s/2,y:n-i.y+a/2}}));return vt(e).pipe(b(o=>r.pipe(m(n=>({active:o,offset:n})),ye(+!o||1/0))))}function xn(e,t,{target$:r}){let[o,n]=Array.from(e.children);return H(()=>{let i=new v,s=i.pipe(ee(),oe(!0));return i.subscribe({next({offset:a}){e.style.setProperty("--md-tooltip-x",`${a.x}px`),e.style.setProperty("--md-tooltip-y",`${a.y}px`)},complete(){e.style.removeProperty("--md-tooltip-x"),e.style.removeProperty("--md-tooltip-y")}}),yt(e).pipe(U(s)).subscribe(a=>{e.toggleAttribute("data-md-visible",a)}),T(i.pipe(g(({active:a})=>a)),i.pipe(be(250),g(({active:a})=>!a))).subscribe({next({active:a}){a?e.prepend(o):o.remove()},complete(){e.prepend(o)}}),i.pipe(Me(16,de)).subscribe(({active:a})=>{o.classList.toggle("md-tooltip--active",a)}),i.pipe(_t(125,de),g(()=>!!e.offsetParent),m(()=>e.offsetParent.getBoundingClientRect()),m(({x:a})=>a)).subscribe({next(a){a?e.style.setProperty("--md-tooltip-0",`${-a}px`):e.style.removeProperty("--md-tooltip-0")},complete(){e.style.removeProperty("--md-tooltip-0")}}),d(n,"click").pipe(U(s),g(a=>!(a.metaKey||a.ctrlKey))).subscribe(a=>{a.stopPropagation(),a.preventDefault()}),d(n,"mousedown").pipe(U(s),ae(i)).subscribe(([a,{active:c}])=>{var p;if(a.button!==0||a.metaKey||a.ctrlKey)a.preventDefault();else if(c){a.preventDefault();let l=e.parentElement.closest(".md-annotation");l instanceof HTMLElement?l.focus():(p=Re())==null||p.blur()}}),r.pipe(U(s),g(a=>a===o),Ye(125)).subscribe(()=>e.focus()),Sa(e,t).pipe(y(a=>i.next(a)),_(()=>i.complete()),m(a=>F({ref:e},a)))})}function Oa(e){return e.tagName==="CODE"?R(".c, .c1, .cm",e):[e]}function Ma(e){let t=[];for(let r of Oa(e)){let o=[],n=document.createNodeIterator(r,NodeFilter.SHOW_TEXT);for(let i=n.nextNode();i;i=n.nextNode())o.push(i);for(let i of o){let s;for(;s=/(\(\d+\))(!)?/.exec(i.textContent);){let[,a,c]=s;if(typeof c=="undefined"){let p=i.splitText(s.index);i=p.splitText(a.length),t.push(p)}else{i.textContent=a,t.push(i);break}}}}return t}function yn(e,t){t.append(...Array.from(e.childNodes))}function lr(e,t,{target$:r,print$:o}){let n=t.closest("[id]"),i=n==null?void 0:n.id,s=new Map;for(let a of Ma(t)){let[,c]=a.textContent.match(/\((\d+)\)/);me(`:scope > li:nth-child(${c})`,e)&&(s.set(c,un(c,i)),a.replaceWith(s.get(c)))}return s.size===0?L:H(()=>{let a=new v,c=a.pipe(ee(),oe(!0)),p=[];for(let[l,f]of s)p.push([P(".md-typeset",f),P(`:scope > li:nth-child(${l})`,e)]);return 
o.pipe(U(c)).subscribe(l=>{e.hidden=!l,e.classList.toggle("md-annotation-list",l);for(let[f,u]of p)l?yn(f,u):yn(u,f)}),T(...[...s].map(([,l])=>xn(l,t,{target$:r}))).pipe(_(()=>a.complete()),le())})}function En(e){if(e.nextElementSibling){let t=e.nextElementSibling;if(t.tagName==="OL")return t;if(t.tagName==="P"&&!t.children.length)return En(t)}}function wn(e,t){return H(()=>{let r=En(e);return typeof r!="undefined"?lr(r,e,t):L})}var Tn=jt(zr());var La=0;function Sn(e){if(e.nextElementSibling){let t=e.nextElementSibling;if(t.tagName==="OL")return t;if(t.tagName==="P"&&!t.children.length)return Sn(t)}}function _a(e){return Ee(e).pipe(m(({width:t})=>({scrollable:xt(e).width>t})),X("scrollable"))}function On(e,t){let{matches:r}=matchMedia("(hover)"),o=H(()=>{let n=new v,i=n.pipe($r(1));n.subscribe(({scrollable:c})=>{c&&r?e.setAttribute("tabindex","0"):e.removeAttribute("tabindex")});let s=[];if(Tn.default.isSupported()&&(e.closest(".copy")||G("content.code.copy")&&!e.closest(".no-copy"))){let c=e.closest("pre");c.id=`__code_${La++}`;let p=dn(c.id);c.insertBefore(p,e),G("content.tooltips")&&s.push(Ge(p))}let a=e.closest(".highlight");if(a instanceof HTMLElement){let c=Sn(a);if(typeof c!="undefined"&&(a.classList.contains("annotate")||G("content.code.annotate"))){let p=lr(c,e,t);s.push(Ee(a).pipe(U(i),m(({width:l,height:f})=>l&&f),Y(),b(l=>l?p:L)))}}return _a(e).pipe(y(c=>n.next(c)),_(()=>n.complete()),m(c=>F({ref:e},c)),$e(...s))});return G("content.lazy")?yt(e).pipe(g(n=>n),ye(1),b(()=>o)):o}function Aa(e,{target$:t,print$:r}){let o=!0;return T(t.pipe(m(n=>n.closest("details:not([open])")),g(n=>e===n),m(()=>({action:"open",reveal:!0}))),r.pipe(g(n=>n||!o),y(()=>o=e.open),m(n=>({action:n?"open":"close"}))))}function Mn(e,t){return H(()=>{let r=new v;return r.subscribe(({action:o,reveal:n})=>{e.toggleAttribute("open",o==="open"),n&&e.scrollIntoView()}),Aa(e,t).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}var Ln=".node circle,.node ellipse,.node path,.node polygon,.node rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}marker{fill:var(--md-mermaid-edge-color)!important}.edgeLabel .label rect{fill:#0000}.label{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.label foreignObject{line-height:normal;overflow:visible}.label div .edgeLabel{color:var(--md-mermaid-label-fg-color)}.edgeLabel,.edgeLabel rect,.label div .edgeLabel{background-color:var(--md-mermaid-label-bg-color)}.edgeLabel,.edgeLabel rect{fill:var(--md-mermaid-label-bg-color);color:var(--md-mermaid-edge-color)}.edgePath .path,.flowchart-link{stroke:var(--md-mermaid-edge-color);stroke-width:.05rem}.edgePath .arrowheadPath{fill:var(--md-mermaid-edge-color);stroke:none}.cluster rect{fill:var(--md-default-fg-color--lightest);stroke:var(--md-default-fg-color--lighter)}.cluster span{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}g #flowchart-circleEnd,g #flowchart-circleStart,g #flowchart-crossEnd,g #flowchart-crossStart,g #flowchart-pointEnd,g #flowchart-pointStart{stroke:none}g.classGroup line,g.classGroup rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}g.classGroup text{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.classLabel .box{fill:var(--md-mermaid-label-bg-color);background-color:var(--md-mermaid-label-bg-color);opacity:1}.classLabel .label{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.node 
.divider{stroke:var(--md-mermaid-node-fg-color)}.relation{stroke:var(--md-mermaid-edge-color)}.cardinality{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.cardinality text{fill:inherit!important}defs #classDiagram-compositionEnd,defs #classDiagram-compositionStart,defs #classDiagram-dependencyEnd,defs #classDiagram-dependencyStart,defs #classDiagram-extensionEnd,defs #classDiagram-extensionStart{fill:var(--md-mermaid-edge-color)!important;stroke:var(--md-mermaid-edge-color)!important}defs #classDiagram-aggregationEnd,defs #classDiagram-aggregationStart{fill:var(--md-mermaid-label-bg-color)!important;stroke:var(--md-mermaid-edge-color)!important}g.stateGroup rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}g.stateGroup .state-title{fill:var(--md-mermaid-label-fg-color)!important;font-family:var(--md-mermaid-font-family)}g.stateGroup .composit{fill:var(--md-mermaid-label-bg-color)}.nodeLabel,.nodeLabel p{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.node circle.state-end,.node circle.state-start,.start-state{fill:var(--md-mermaid-edge-color);stroke:none}.end-state-inner,.end-state-outer{fill:var(--md-mermaid-edge-color)}.end-state-inner,.node circle.state-end{stroke:var(--md-mermaid-label-bg-color)}.transition{stroke:var(--md-mermaid-edge-color)}[id^=state-fork] rect,[id^=state-join] rect{fill:var(--md-mermaid-edge-color)!important;stroke:none!important}.statediagram-cluster.statediagram-cluster .inner{fill:var(--md-default-bg-color)}.statediagram-cluster rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}.statediagram-state rect.divider{fill:var(--md-default-fg-color--lightest);stroke:var(--md-default-fg-color--lighter)}defs #statediagram-barbEnd{stroke:var(--md-mermaid-edge-color)}.attributeBoxEven,.attributeBoxOdd{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}.entityBox{fill:var(--md-mermaid-label-bg-color);stroke:var(--md-mermaid-node-fg-color)}.entityLabel{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.relationshipLabelBox{fill:var(--md-mermaid-label-bg-color);fill-opacity:1;background-color:var(--md-mermaid-label-bg-color);opacity:1}.relationshipLabel{fill:var(--md-mermaid-label-fg-color)}.relationshipLine{stroke:var(--md-mermaid-edge-color)}defs #ONE_OR_MORE_END *,defs #ONE_OR_MORE_START *,defs #ONLY_ONE_END *,defs #ONLY_ONE_START *,defs #ZERO_OR_MORE_END *,defs #ZERO_OR_MORE_START *,defs #ZERO_OR_ONE_END *,defs #ZERO_OR_ONE_START *{stroke:var(--md-mermaid-edge-color)!important}defs #ZERO_OR_MORE_END circle,defs #ZERO_OR_MORE_START circle{fill:var(--md-mermaid-label-bg-color)}.actor{fill:var(--md-mermaid-sequence-actor-bg-color);stroke:var(--md-mermaid-sequence-actor-border-color)}text.actor>tspan{fill:var(--md-mermaid-sequence-actor-fg-color);font-family:var(--md-mermaid-font-family)}line{stroke:var(--md-mermaid-sequence-actor-line-color)}.actor-man circle,.actor-man 
line{fill:var(--md-mermaid-sequence-actorman-bg-color);stroke:var(--md-mermaid-sequence-actorman-line-color)}.messageLine0,.messageLine1{stroke:var(--md-mermaid-sequence-message-line-color)}.note{fill:var(--md-mermaid-sequence-note-bg-color);stroke:var(--md-mermaid-sequence-note-border-color)}.loopText,.loopText>tspan,.messageText,.noteText>tspan{stroke:none;font-family:var(--md-mermaid-font-family)!important}.messageText{fill:var(--md-mermaid-sequence-message-fg-color)}.loopText,.loopText>tspan{fill:var(--md-mermaid-sequence-loop-fg-color)}.noteText>tspan{fill:var(--md-mermaid-sequence-note-fg-color)}#arrowhead path{fill:var(--md-mermaid-sequence-message-line-color);stroke:none}.loopLine{fill:var(--md-mermaid-sequence-loop-bg-color);stroke:var(--md-mermaid-sequence-loop-border-color)}.labelBox{fill:var(--md-mermaid-sequence-label-bg-color);stroke:none}.labelText,.labelText>span{fill:var(--md-mermaid-sequence-label-fg-color);font-family:var(--md-mermaid-font-family)}.sequenceNumber{fill:var(--md-mermaid-sequence-number-fg-color)}rect.rect{fill:var(--md-mermaid-sequence-box-bg-color);stroke:none}rect.rect+text.text{fill:var(--md-mermaid-sequence-box-fg-color)}defs #sequencenumber{fill:var(--md-mermaid-sequence-number-bg-color)!important}";var qr,ka=0;function Ha(){return typeof mermaid=="undefined"||mermaid instanceof Element?gt("https://unpkg.com/mermaid@10.7.0/dist/mermaid.min.js"):$(void 0)}function _n(e){return e.classList.remove("mermaid"),qr||(qr=Ha().pipe(y(()=>mermaid.initialize({startOnLoad:!1,themeCSS:Ln,sequence:{actorFontSize:"16px",messageFontSize:"16px",noteFontSize:"16px"}})),m(()=>{}),B(1))),qr.subscribe(()=>ro(this,null,function*(){e.classList.add("mermaid");let t=`__mermaid_${ka++}`,r=E("div",{class:"mermaid"}),o=e.textContent,{svg:n,fn:i}=yield mermaid.render(t,o),s=r.attachShadow({mode:"closed"});s.innerHTML=n,e.replaceWith(r),i==null||i(s)})),qr.pipe(m(()=>({ref:e})))}var An=E("table");function Cn(e){return e.replaceWith(An),An.replaceWith(vn(e)),$({ref:e})}function $a(e){let t=e.find(r=>r.checked)||e[0];return T(...e.map(r=>d(r,"change").pipe(m(()=>P(`label[for="${r.id}"]`))))).pipe(q(P(`label[for="${t.id}"]`)),m(r=>({active:r})))}function kn(e,{viewport$:t,target$:r}){let o=P(".tabbed-labels",e),n=R(":scope > input",e),i=Nr("prev");e.append(i);let s=Nr("next");return e.append(s),H(()=>{let a=new v,c=a.pipe(ee(),oe(!0));Q([a,Ee(e)]).pipe(U(c),Me(1,de)).subscribe({next([{active:p},l]){let f=Ue(p),{width:u}=pe(p);e.style.setProperty("--md-indicator-x",`${f.x}px`),e.style.setProperty("--md-indicator-width",`${u}px`);let h=ir(o);(f.xh.x+l.width)&&o.scrollTo({left:Math.max(0,f.x-16),behavior:"smooth"})},complete(){e.style.removeProperty("--md-indicator-x"),e.style.removeProperty("--md-indicator-width")}}),Q([et(o),Ee(o)]).pipe(U(c)).subscribe(([p,l])=>{let f=xt(o);i.hidden=p.x<16,s.hidden=p.x>f.width-l.width-16}),T(d(i,"click").pipe(m(()=>-1)),d(s,"click").pipe(m(()=>1))).pipe(U(c)).subscribe(p=>{let{width:l}=pe(o);o.scrollBy({left:l*p,behavior:"smooth"})}),r.pipe(U(c),g(p=>n.includes(p))).subscribe(p=>p.click()),o.classList.add("tabbed-labels--linked");for(let p of n){let l=P(`label[for="${p.id}"]`);l.replaceChildren(E("a",{href:`#${l.htmlFor}`,tabIndex:-1},...Array.from(l.childNodes))),d(l.firstElementChild,"click").pipe(U(c),g(f=>!(f.metaKey||f.ctrlKey)),y(f=>{f.preventDefault(),f.stopPropagation()})).subscribe(()=>{history.replaceState({},"",`#${l.htmlFor}`),l.click()})}return G("content.tabs.link")&&a.pipe(Le(1),ae(t)).subscribe(([{active:p},{offset:l}])=>{let 
f=p.innerText.trim();if(p.hasAttribute("data-md-switching"))p.removeAttribute("data-md-switching");else{let u=e.offsetTop-l.y;for(let w of R("[data-tabs]"))for(let A of R(":scope > input",w)){let Z=P(`label[for="${A.id}"]`);if(Z!==p&&Z.innerText.trim()===f){Z.setAttribute("data-md-switching",""),A.click();break}}window.scrollTo({top:e.offsetTop-u});let h=__md_get("__tabs")||[];__md_set("__tabs",[...new Set([f,...h])])}}),a.pipe(U(c)).subscribe(()=>{for(let p of R("audio, video",e))p.pause()}),$a(n).pipe(y(p=>a.next(p)),_(()=>a.complete()),m(p=>F({ref:e},p)))}).pipe(ze(ie))}function Hn(e,{viewport$:t,target$:r,print$:o}){return T(...R(".annotate:not(.highlight)",e).map(n=>wn(n,{target$:r,print$:o})),...R("pre:not(.mermaid) > code",e).map(n=>On(n,{target$:r,print$:o})),...R("pre.mermaid",e).map(n=>_n(n)),...R("table:not([class])",e).map(n=>Cn(n)),...R("details",e).map(n=>Mn(n,{target$:r,print$:o})),...R("[data-tabs]",e).map(n=>kn(n,{viewport$:t,target$:r})),...R("[title]",e).filter(()=>G("content.tooltips")).map(n=>Ge(n)))}function Ra(e,{alert$:t}){return t.pipe(b(r=>T($(!0),$(!1).pipe(Ye(2e3))).pipe(m(o=>({message:r,active:o})))))}function $n(e,t){let r=P(".md-typeset",e);return H(()=>{let o=new v;return o.subscribe(({message:n,active:i})=>{e.classList.toggle("md-dialog--active",i),r.textContent=n}),Ra(e,t).pipe(y(n=>o.next(n)),_(()=>o.complete()),m(n=>F({ref:e},n)))})}function Pa({viewport$:e}){if(!G("header.autohide"))return $(!1);let t=e.pipe(m(({offset:{y:n}})=>n),Ke(2,1),m(([n,i])=>[nMath.abs(i-n.y)>100),m(([,[n]])=>n),Y()),o=We("search");return Q([e,o]).pipe(m(([{offset:n},i])=>n.y>400&&!i),Y(),b(n=>n?r:$(!1)),q(!1))}function Rn(e,t){return H(()=>Q([Ee(e),Pa(t)])).pipe(m(([{height:r},o])=>({height:r,hidden:o})),Y((r,o)=>r.height===o.height&&r.hidden===o.hidden),B(1))}function Pn(e,{header$:t,main$:r}){return H(()=>{let o=new v,n=o.pipe(ee(),oe(!0));o.pipe(X("active"),je(t)).subscribe(([{active:s},{hidden:a}])=>{e.classList.toggle("md-header--shadow",s&&!a),e.hidden=a});let i=fe(R("[title]",e)).pipe(g(()=>G("content.tooltips")),re(s=>Ge(s)));return r.subscribe(o),t.pipe(U(n),m(s=>F({ref:e},s)),$e(i.pipe(U(n))))})}function Ia(e,{viewport$:t,header$:r}){return pr(e,{viewport$:t,header$:r}).pipe(m(({offset:{y:o}})=>{let{height:n}=pe(e);return{active:o>=n}}),X("active"))}function In(e,t){return H(()=>{let r=new v;r.subscribe({next({active:n}){e.classList.toggle("md-header__title--active",n)},complete(){e.classList.remove("md-header__title--active")}});let o=me(".md-content h1");return typeof o=="undefined"?L:Ia(o,t).pipe(y(n=>r.next(n)),_(()=>r.complete()),m(n=>F({ref:e},n)))})}function Fn(e,{viewport$:t,header$:r}){let o=r.pipe(m(({height:i})=>i),Y()),n=o.pipe(b(()=>Ee(e).pipe(m(({height:i})=>({top:e.offsetTop,bottom:e.offsetTop+i})),X("bottom"))));return Q([o,n,t]).pipe(m(([i,{top:s,bottom:a},{offset:{y:c},size:{height:p}}])=>(p=Math.max(0,p-Math.max(0,s-c,i)-Math.max(0,p+c-a)),{offset:s-i,height:p,active:s-i<=c})),Y((i,s)=>i.offset===s.offset&&i.height===s.height&&i.active===s.active))}function Fa(e){let t=__md_get("__palette")||{index:e.findIndex(o=>matchMedia(o.getAttribute("data-md-color-media")).matches)},r=Math.max(0,Math.min(t.index,e.length-1));return $(...e).pipe(re(o=>d(o,"change").pipe(m(()=>o))),q(e[r]),m(o=>({index:e.indexOf(o),color:{media:o.getAttribute("data-md-color-media"),scheme:o.getAttribute("data-md-color-scheme"),primary:o.getAttribute("data-md-color-primary"),accent:o.getAttribute("data-md-color-accent")}})),B(1))}function jn(e){let 
t=R("input",e),r=E("meta",{name:"theme-color"});document.head.appendChild(r);let o=E("meta",{name:"color-scheme"});document.head.appendChild(o);let n=At("(prefers-color-scheme: light)");return H(()=>{let i=new v;return i.subscribe(s=>{if(document.body.setAttribute("data-md-color-switching",""),s.color.media==="(prefers-color-scheme)"){let a=matchMedia("(prefers-color-scheme: light)"),c=document.querySelector(a.matches?"[data-md-color-media='(prefers-color-scheme: light)']":"[data-md-color-media='(prefers-color-scheme: dark)']");s.color.scheme=c.getAttribute("data-md-color-scheme"),s.color.primary=c.getAttribute("data-md-color-primary"),s.color.accent=c.getAttribute("data-md-color-accent")}for(let[a,c]of Object.entries(s.color))document.body.setAttribute(`data-md-color-${a}`,c);for(let a=0;a{let s=Te("header"),a=window.getComputedStyle(s);return o.content=a.colorScheme,a.backgroundColor.match(/\d+/g).map(c=>(+c).toString(16).padStart(2,"0")).join("")})).subscribe(s=>r.content=`#${s}`),i.pipe(Oe(ie)).subscribe(()=>{document.body.removeAttribute("data-md-color-switching")}),Fa(t).pipe(U(n.pipe(Le(1))),at(),y(s=>i.next(s)),_(()=>i.complete()),m(s=>F({ref:e},s)))})}function Un(e,{progress$:t}){return H(()=>{let r=new v;return r.subscribe(({value:o})=>{e.style.setProperty("--md-progress-value",`${o}`)}),t.pipe(y(o=>r.next({value:o})),_(()=>r.complete()),m(o=>({ref:e,value:o})))})}var Kr=jt(zr());function ja(e){e.setAttribute("data-md-copying","");let t=e.closest("[data-copy]"),r=t?t.getAttribute("data-copy"):e.innerText;return e.removeAttribute("data-md-copying"),r.trimEnd()}function Wn({alert$:e}){Kr.default.isSupported()&&new j(t=>{new Kr.default("[data-clipboard-target], [data-clipboard-text]",{text:r=>r.getAttribute("data-clipboard-text")||ja(P(r.getAttribute("data-clipboard-target")))}).on("success",r=>t.next(r))}).pipe(y(t=>{t.trigger.focus()}),m(()=>ge("clipboard.copied"))).subscribe(e)}function Dn(e,t){return e.protocol=t.protocol,e.hostname=t.hostname,e}function Ua(e,t){let r=new Map;for(let o of R("url",e)){let n=P("loc",o),i=[Dn(new URL(n.textContent),t)];r.set(`${i[0]}`,i);for(let s of R("[rel=alternate]",o)){let a=s.getAttribute("href");a!=null&&i.push(Dn(new URL(a),t))}}return r}function mr(e){return on(new URL("sitemap.xml",e)).pipe(m(t=>Ua(t,new URL(e))),he(()=>$(new Map)))}function Wa(e,t){if(!(e.target instanceof Element))return L;let r=e.target.closest("a");if(r===null)return L;if(r.target||e.metaKey||e.ctrlKey)return L;let o=new URL(r.href);return o.search=o.hash="",t.has(`${o}`)?(e.preventDefault(),$(new URL(r.href))):L}function Nn(e){let t=new Map;for(let r of R(":scope > *",e.head))t.set(r.outerHTML,r);return t}function Vn(e){for(let t of R("[href], [src]",e))for(let r of["href","src"]){let o=t.getAttribute(r);if(o&&!/^(?:[a-z]+:)?\/\//i.test(o)){t[r]=t[r];break}}return $(e)}function Da(e){for(let o of["[data-md-component=announce]","[data-md-component=container]","[data-md-component=header-topic]","[data-md-component=outdated]","[data-md-component=logo]","[data-md-component=skip]",...G("navigation.tabs.sticky")?["[data-md-component=tabs]"]:[]]){let n=me(o),i=me(o,e);typeof n!="undefined"&&typeof i!="undefined"&&n.replaceWith(i)}let t=Nn(document);for(let[o,n]of Nn(e))t.has(o)?t.delete(o):document.head.appendChild(n);for(let o of t.values()){let n=o.getAttribute("name");n!=="theme-color"&&n!=="color-scheme"&&o.remove()}let r=Te("container");return Fe(R("script",r)).pipe(b(o=>{let n=e.createElement("script");if(o.src){for(let i of 
o.getAttributeNames())n.setAttribute(i,o.getAttribute(i));return o.replaceWith(n),new j(i=>{n.onload=()=>i.complete()})}else return n.textContent=o.textContent,o.replaceWith(n),L}),ee(),oe(document))}function zn({location$:e,viewport$:t,progress$:r}){let o=we();if(location.protocol==="file:")return L;let n=mr(o.base);$(document).subscribe(Vn);let i=d(document.body,"click").pipe(je(n),b(([c,p])=>Wa(c,p)),le()),s=d(window,"popstate").pipe(m(ve),le());i.pipe(ae(t)).subscribe(([c,{offset:p}])=>{history.replaceState(p,""),history.pushState(null,"",c)}),T(i,s).subscribe(e);let a=e.pipe(X("pathname"),b(c=>rn(c,{progress$:r}).pipe(he(()=>(st(c,!0),L)))),b(Vn),b(Da),le());return T(a.pipe(ae(e,(c,p)=>p)),e.pipe(X("pathname"),b(()=>e),X("hash")),e.pipe(Y((c,p)=>c.pathname===p.pathname&&c.hash===p.hash),b(()=>i),y(()=>history.back()))).subscribe(c=>{var p,l;history.state!==null||!c.hash?window.scrollTo(0,(l=(p=history.state)==null?void 0:p.y)!=null?l:0):(history.scrollRestoration="auto",Zo(c.hash),history.scrollRestoration="manual")}),e.subscribe(()=>{history.scrollRestoration="manual"}),d(window,"beforeunload").subscribe(()=>{history.scrollRestoration="auto"}),t.pipe(X("offset"),be(100)).subscribe(({offset:c})=>{history.replaceState(c,"")}),a}var Qn=jt(Kn());function Yn(e){let t=e.separator.split("|").map(n=>n.replace(/(\(\?[!=<][^)]+\))/g,"").length===0?"\uFFFD":n).join("|"),r=new RegExp(t,"img"),o=(n,i,s)=>`${i}${s}`;return n=>{n=n.replace(/[\s*+\-:~^]+/g," ").trim();let i=new RegExp(`(^|${e.separator}|)(${n.replace(/[|\\{}()[\]^$+*?.-]/g,"\\$&").replace(r,"|")})`,"img");return s=>(0,Qn.default)(s).replace(i,o).replace(/<\/mark>(\s+)]*>/img,"$1")}}function Ht(e){return e.type===1}function fr(e){return e.type===3}function Bn(e,t){let r=ln(e);return T($(location.protocol!=="file:"),We("search")).pipe(He(o=>o),b(()=>t)).subscribe(({config:o,docs:n})=>r.next({type:0,data:{config:o,docs:n,options:{suggest:G("search.suggest")}}})),r}function Gn({document$:e}){let t=we(),r=De(new URL("../versions.json",t.base)).pipe(he(()=>L)),o=r.pipe(m(n=>{let[,i]=t.base.match(/([^/]+)\/?$/);return n.find(({version:s,aliases:a})=>s===i||a.includes(i))||n[0]}));r.pipe(m(n=>new Map(n.map(i=>[`${new URL(`../${i.version}/`,t.base)}`,i]))),b(n=>d(document.body,"click").pipe(g(i=>!i.metaKey&&!i.ctrlKey),ae(o),b(([i,s])=>{if(i.target instanceof Element){let a=i.target.closest("a");if(a&&!a.target&&n.has(a.href)){let c=a.href;return!i.target.closest(".md-version")&&n.get(c)===s?L:(i.preventDefault(),$(c))}}return L}),b(i=>{let{version:s}=n.get(i);return mr(new URL(i)).pipe(m(a=>{let p=ve().href.replace(t.base,"");return a.has(p.split("#")[0])?new URL(`../${s}/${p}`,t.base):new URL(i)}))})))).subscribe(n=>st(n,!0)),Q([r,o]).subscribe(([n,i])=>{P(".md-header__topic").appendChild(gn(n,i))}),e.pipe(b(()=>o)).subscribe(n=>{var s;let i=__md_get("__outdated",sessionStorage);if(i===null){i=!0;let a=((s=t.version)==null?void 0:s.default)||"latest";Array.isArray(a)||(a=[a]);e:for(let c of a)for(let p of n.aliases.concat(n.version))if(new RegExp(c,"i").test(p)){i=!1;break e}__md_set("__outdated",i,sessionStorage)}if(i)for(let a of ne("outdated"))a.hidden=!1})}function Ka(e,{worker$:t}){let{searchParams:r}=ve();r.has("q")&&(Be("search",!0),e.value=r.get("q"),e.focus(),We("search").pipe(He(i=>!i)).subscribe(()=>{let i=ve();i.searchParams.delete("q"),history.replaceState({},"",`${i}`)}));let o=vt(e),n=T(t.pipe(He(Ht)),d(e,"keyup"),o).pipe(m(()=>e.value),Y());return Q([n,o]).pipe(m(([i,s])=>({value:i,focus:s})),B(1))}function 
Jn(e,{worker$:t}){let r=new v,o=r.pipe(ee(),oe(!0));Q([t.pipe(He(Ht)),r],(i,s)=>s).pipe(X("value")).subscribe(({value:i})=>t.next({type:2,data:i})),r.pipe(X("focus")).subscribe(({focus:i})=>{i&&Be("search",i)}),d(e.form,"reset").pipe(U(o)).subscribe(()=>e.focus());let n=P("header [for=__search]");return d(n,"click").subscribe(()=>e.focus()),Ka(e,{worker$:t}).pipe(y(i=>r.next(i)),_(()=>r.complete()),m(i=>F({ref:e},i)),B(1))}function Xn(e,{worker$:t,query$:r}){let o=new v,n=Yo(e.parentElement).pipe(g(Boolean)),i=e.parentElement,s=P(":scope > :first-child",e),a=P(":scope > :last-child",e);We("search").subscribe(l=>a.setAttribute("role",l?"list":"presentation")),o.pipe(ae(r),Ir(t.pipe(He(Ht)))).subscribe(([{items:l},{value:f}])=>{switch(l.length){case 0:s.textContent=f.length?ge("search.result.none"):ge("search.result.placeholder");break;case 1:s.textContent=ge("search.result.one");break;default:let u=ar(l.length);s.textContent=ge("search.result.other",u)}});let c=o.pipe(y(()=>a.innerHTML=""),b(({items:l})=>T($(...l.slice(0,10)),$(...l.slice(10)).pipe(Ke(4),jr(n),b(([f])=>f)))),m(hn),le());return c.subscribe(l=>a.appendChild(l)),c.pipe(re(l=>{let f=me("details",l);return typeof f=="undefined"?L:d(f,"toggle").pipe(U(o),m(()=>f))})).subscribe(l=>{l.open===!1&&l.offsetTop<=i.scrollTop&&i.scrollTo({top:l.offsetTop})}),t.pipe(g(fr),m(({data:l})=>l)).pipe(y(l=>o.next(l)),_(()=>o.complete()),m(l=>F({ref:e},l)))}function Qa(e,{query$:t}){return t.pipe(m(({value:r})=>{let o=ve();return o.hash="",r=r.replace(/\s+/g,"+").replace(/&/g,"%26").replace(/=/g,"%3D"),o.search=`q=${r}`,{url:o}}))}function Zn(e,t){let r=new v,o=r.pipe(ee(),oe(!0));return r.subscribe(({url:n})=>{e.setAttribute("data-clipboard-text",e.href),e.href=`${n}`}),d(e,"click").pipe(U(o)).subscribe(n=>n.preventDefault()),Qa(e,t).pipe(y(n=>r.next(n)),_(()=>r.complete()),m(n=>F({ref:e},n)))}function ei(e,{worker$:t,keyboard$:r}){let o=new v,n=Te("search-query"),i=T(d(n,"keydown"),d(n,"focus")).pipe(Oe(ie),m(()=>n.value),Y());return o.pipe(je(i),m(([{suggest:a},c])=>{let p=c.split(/([\s-]+)/);if(a!=null&&a.length&&p[p.length-1]){let l=a[a.length-1];l.startsWith(p[p.length-1])&&(p[p.length-1]=l)}else p.length=0;return p})).subscribe(a=>e.innerHTML=a.join("").replace(/\s/g," ")),r.pipe(g(({mode:a})=>a==="search")).subscribe(a=>{switch(a.type){case"ArrowRight":e.innerText.length&&n.selectionStart===n.value.length&&(n.value=e.innerText);break}}),t.pipe(g(fr),m(({data:a})=>a)).pipe(y(a=>o.next(a)),_(()=>o.complete()),m(()=>({ref:e})))}function ti(e,{index$:t,keyboard$:r}){let o=we();try{let n=Bn(o.search,t),i=Te("search-query",e),s=Te("search-result",e);d(e,"click").pipe(g(({target:c})=>c instanceof Element&&!!c.closest("a"))).subscribe(()=>Be("search",!1)),r.pipe(g(({mode:c})=>c==="search")).subscribe(c=>{let p=Re();switch(c.type){case"Enter":if(p===i){let l=new Map;for(let f of R(":first-child [href]",s)){let u=f.firstElementChild;l.set(f,parseFloat(u.getAttribute("data-md-score")))}if(l.size){let[[f]]=[...l].sort(([,u],[,h])=>h-u);f.click()}c.claim()}break;case"Escape":case"Tab":Be("search",!1),i.blur();break;case"ArrowUp":case"ArrowDown":if(typeof p=="undefined")i.focus();else{let l=[i,...R(":not(details) > [href], summary, details[open] [href]",s)],f=Math.max(0,(Math.max(0,l.indexOf(p))+l.length+(c.type==="ArrowUp"?-1:1))%l.length);l[f].focus()}c.claim();break;default:i!==Re()&&i.focus()}}),r.pipe(g(({mode:c})=>c==="global")).subscribe(c=>{switch(c.type){case"f":case"s":case"/":i.focus(),i.select(),c.claim();break}});let 
a=Jn(i,{worker$:n});return T(a,Xn(s,{worker$:n,query$:a})).pipe($e(...ne("search-share",e).map(c=>Zn(c,{query$:a})),...ne("search-suggest",e).map(c=>ei(c,{worker$:n,keyboard$:r}))))}catch(n){return e.hidden=!0,qe}}function ri(e,{index$:t,location$:r}){return Q([t,r.pipe(q(ve()),g(o=>!!o.searchParams.get("h")))]).pipe(m(([o,n])=>Yn(o.config)(n.searchParams.get("h"))),m(o=>{var s;let n=new Map,i=document.createNodeIterator(e,NodeFilter.SHOW_TEXT);for(let a=i.nextNode();a;a=i.nextNode())if((s=a.parentElement)!=null&&s.offsetHeight){let c=a.textContent,p=o(c);p.length>c.length&&n.set(a,p)}for(let[a,c]of n){let{childNodes:p}=E("span",null,c);a.replaceWith(...Array.from(p))}return{ref:e,nodes:n}}))}function Ya(e,{viewport$:t,main$:r}){let o=e.closest(".md-grid"),n=o.offsetTop-o.parentElement.offsetTop;return Q([r,t]).pipe(m(([{offset:i,height:s},{offset:{y:a}}])=>(s=s+Math.min(n,Math.max(0,a-i))-n,{height:s,locked:a>=i+n})),Y((i,s)=>i.height===s.height&&i.locked===s.locked))}function Qr(e,o){var n=o,{header$:t}=n,r=to(n,["header$"]);let i=P(".md-sidebar__scrollwrap",e),{y:s}=Ue(i);return H(()=>{let a=new v,c=a.pipe(ee(),oe(!0)),p=a.pipe(Me(0,de));return p.pipe(ae(t)).subscribe({next([{height:l},{height:f}]){i.style.height=`${l-2*s}px`,e.style.top=`${f}px`},complete(){i.style.height="",e.style.top=""}}),p.pipe(He()).subscribe(()=>{for(let l of R(".md-nav__link--active[href]",e)){if(!l.clientHeight)continue;let f=l.closest(".md-sidebar__scrollwrap");if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:h}=pe(f);f.scrollTo({top:u-h/2})}}}),fe(R("label[tabindex]",e)).pipe(re(l=>d(l,"click").pipe(Oe(ie),m(()=>l),U(c)))).subscribe(l=>{let f=P(`[id="${l.htmlFor}"]`);P(`[aria-labelledby="${l.id}"]`).setAttribute("aria-expanded",`${f.checked}`)}),Ya(e,r).pipe(y(l=>a.next(l)),_(()=>a.complete()),m(l=>F({ref:e},l)))})}function oi(e,t){if(typeof t!="undefined"){let r=`https://api.github.com/repos/${e}/${t}`;return Lt(De(`${r}/releases/latest`).pipe(he(()=>L),m(o=>({version:o.tag_name})),Qe({})),De(r).pipe(he(()=>L),m(o=>({stars:o.stargazers_count,forks:o.forks_count})),Qe({}))).pipe(m(([o,n])=>F(F({},o),n)))}else{let r=`https://api.github.com/users/${e}`;return De(r).pipe(m(o=>({repositories:o.public_repos})),Qe({}))}}function ni(e,t){let r=`https://${e}/api/v4/projects/${encodeURIComponent(t)}`;return De(r).pipe(he(()=>L),m(({star_count:o,forks_count:n})=>({stars:o,forks:n})),Qe({}))}function ii(e){let t=e.match(/^.+github\.com\/([^/]+)\/?([^/]+)?/i);if(t){let[,r,o]=t;return oi(r,o)}if(t=e.match(/^.+?([^/]*gitlab[^/]+)\/(.+?)\/?$/i),t){let[,r,o]=t;return ni(r,o)}return L}var Ba;function Ga(e){return Ba||(Ba=H(()=>{let t=__md_get("__source",sessionStorage);if(t)return $(t);if(ne("consent").length){let o=__md_get("__consent");if(!(o&&o.github))return L}return ii(e.href).pipe(y(o=>__md_set("__source",o,sessionStorage)))}).pipe(he(()=>L),g(t=>Object.keys(t).length>0),m(t=>({facts:t})),B(1)))}function ai(e){let t=P(":scope > :last-child",e);return H(()=>{let r=new v;return r.subscribe(({facts:o})=>{t.appendChild(bn(o)),t.classList.add("md-source__repository--active")}),Ga(e).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}function Ja(e,{viewport$:t,header$:r}){return Ee(document.body).pipe(b(()=>pr(e,{header$:r,viewport$:t})),m(({offset:{y:o}})=>({hidden:o>=10})),X("hidden"))}function si(e,t){return H(()=>{let r=new v;return 
r.subscribe({next({hidden:o}){e.hidden=o},complete(){e.hidden=!1}}),(G("navigation.tabs.sticky")?$({hidden:!1}):Ja(e,t)).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}function Xa(e,{viewport$:t,header$:r}){let o=new Map,n=R(".md-nav__link",e);for(let a of n){let c=decodeURIComponent(a.hash.substring(1)),p=me(`[id="${c}"]`);typeof p!="undefined"&&o.set(a,p)}let i=r.pipe(X("height"),m(({height:a})=>{let c=Te("main"),p=P(":scope > :first-child",c);return a+.8*(p.offsetTop-c.offsetTop)}),le());return Ee(document.body).pipe(X("height"),b(a=>H(()=>{let c=[];return $([...o].reduce((p,[l,f])=>{for(;c.length&&o.get(c[c.length-1]).tagName>=f.tagName;)c.pop();let u=f.offsetTop;for(;!u&&f.parentElement;)f=f.parentElement,u=f.offsetTop;let h=f.offsetParent;for(;h;h=h.offsetParent)u+=h.offsetTop;return p.set([...c=[...c,l]].reverse(),u)},new Map))}).pipe(m(c=>new Map([...c].sort(([,p],[,l])=>p-l))),je(i),b(([c,p])=>t.pipe(Rr(([l,f],{offset:{y:u},size:h})=>{let w=u+h.height>=Math.floor(a.height);for(;f.length;){let[,A]=f[0];if(A-p=u&&!w)f=[l.pop(),...f];else break}return[l,f]},[[],[...c]]),Y((l,f)=>l[0]===f[0]&&l[1]===f[1])))))).pipe(m(([a,c])=>({prev:a.map(([p])=>p),next:c.map(([p])=>p)})),q({prev:[],next:[]}),Ke(2,1),m(([a,c])=>a.prev.length{let i=new v,s=i.pipe(ee(),oe(!0));if(i.subscribe(({prev:a,next:c})=>{for(let[p]of c)p.classList.remove("md-nav__link--passed"),p.classList.remove("md-nav__link--active");for(let[p,[l]]of a.entries())l.classList.add("md-nav__link--passed"),l.classList.toggle("md-nav__link--active",p===a.length-1)}),G("toc.follow")){let a=T(t.pipe(be(1),m(()=>{})),t.pipe(be(250),m(()=>"smooth")));i.pipe(g(({prev:c})=>c.length>0),je(o.pipe(Oe(ie))),ae(a)).subscribe(([[{prev:c}],p])=>{let[l]=c[c.length-1];if(l.offsetHeight){let f=sr(l);if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:h}=pe(f);f.scrollTo({top:u-h/2,behavior:p})}}})}return G("navigation.tracking")&&t.pipe(U(s),X("offset"),be(250),Le(1),U(n.pipe(Le(1))),at({delay:250}),ae(i)).subscribe(([,{prev:a}])=>{let c=ve(),p=a[a.length-1];if(p&&p.length){let[l]=p,{hash:f}=new URL(l.href);c.hash!==f&&(c.hash=f,history.replaceState({},"",`${c}`))}else c.hash="",history.replaceState({},"",`${c}`)}),Xa(e,{viewport$:t,header$:r}).pipe(y(a=>i.next(a)),_(()=>i.complete()),m(a=>F({ref:e},a)))})}function Za(e,{viewport$:t,main$:r,target$:o}){let n=t.pipe(m(({offset:{y:s}})=>s),Ke(2,1),m(([s,a])=>s>a&&a>0),Y()),i=r.pipe(m(({active:s})=>s));return Q([i,n]).pipe(m(([s,a])=>!(s&&a)),Y(),U(o.pipe(Le(1))),oe(!0),at({delay:250}),m(s=>({hidden:s})))}function pi(e,{viewport$:t,header$:r,main$:o,target$:n}){let i=new v,s=i.pipe(ee(),oe(!0));return i.subscribe({next({hidden:a}){e.hidden=a,a?(e.setAttribute("tabindex","-1"),e.blur()):e.removeAttribute("tabindex")},complete(){e.style.top="",e.hidden=!0,e.removeAttribute("tabindex")}}),r.pipe(U(s),X("height")).subscribe(({height:a})=>{e.style.top=`${a+16}px`}),d(e,"click").subscribe(a=>{a.preventDefault(),window.scrollTo({top:0})}),Za(e,{viewport$:t,main$:o,target$:n}).pipe(y(a=>i.next(a)),_(()=>i.complete()),m(a=>F({ref:e},a)))}function li({document$:e}){e.pipe(b(()=>R(".md-ellipsis")),re(t=>yt(t).pipe(U(e.pipe(Le(1))),g(r=>r),m(()=>t),ye(1))),g(t=>t.offsetWidth{let r=t.innerText,o=t.closest("a")||t;return o.title=r,Ge(o).pipe(U(e.pipe(Le(1))),_(()=>o.removeAttribute("title")))})).subscribe(),e.pipe(b(()=>R(".md-status")),re(t=>Ge(t))).subscribe()}function 
mi({document$:e,tablet$:t}){e.pipe(b(()=>R(".md-toggle--indeterminate")),y(r=>{r.indeterminate=!0,r.checked=!1}),re(r=>d(r,"change").pipe(Fr(()=>r.classList.contains("md-toggle--indeterminate")),m(()=>r))),ae(t)).subscribe(([r,o])=>{r.classList.remove("md-toggle--indeterminate"),o&&(r.checked=!1)})}function es(){return/(iPad|iPhone|iPod)/.test(navigator.userAgent)}function fi({document$:e}){e.pipe(b(()=>R("[data-md-scrollfix]")),y(t=>t.removeAttribute("data-md-scrollfix")),g(es),re(t=>d(t,"touchstart").pipe(m(()=>t)))).subscribe(t=>{let r=t.scrollTop;r===0?t.scrollTop=1:r+t.offsetHeight===t.scrollHeight&&(t.scrollTop=r-1)})}function ui({viewport$:e,tablet$:t}){Q([We("search"),t]).pipe(m(([r,o])=>r&&!o),b(r=>$(r).pipe(Ye(r?400:100))),ae(e)).subscribe(([r,{offset:{y:o}}])=>{if(r)document.body.setAttribute("data-md-scrolllock",""),document.body.style.top=`-${o}px`;else{let n=-1*parseInt(document.body.style.top,10);document.body.removeAttribute("data-md-scrolllock"),document.body.style.top="",n&&window.scrollTo(0,n)}})}Object.entries||(Object.entries=function(e){let t=[];for(let r of Object.keys(e))t.push([r,e[r]]);return t});Object.values||(Object.values=function(e){let t=[];for(let r of Object.keys(e))t.push(e[r]);return t});typeof Element!="undefined"&&(Element.prototype.scrollTo||(Element.prototype.scrollTo=function(e,t){typeof e=="object"?(this.scrollLeft=e.left,this.scrollTop=e.top):(this.scrollLeft=e,this.scrollTop=t)}),Element.prototype.replaceWith||(Element.prototype.replaceWith=function(...e){let t=this.parentNode;if(t){e.length===0&&t.removeChild(this);for(let r=e.length-1;r>=0;r--){let o=e[r];typeof o=="string"?o=document.createTextNode(o):o.parentNode&&o.parentNode.removeChild(o),r?t.insertBefore(this.previousSibling,o):t.replaceChild(o,this)}}}));function ts(){return location.protocol==="file:"?gt(`${new URL("search/search_index.js",Yr.base)}`).pipe(m(()=>__index),B(1)):De(new URL("search/search_index.json",Yr.base))}document.documentElement.classList.remove("no-js");document.documentElement.classList.add("js");var rt=No(),Rt=Jo(),wt=en(Rt),Br=Go(),_e=pn(),ur=At("(min-width: 960px)"),hi=At("(min-width: 1220px)"),bi=tn(),Yr=we(),vi=document.forms.namedItem("search")?ts():qe,Gr=new v;Wn({alert$:Gr});var Jr=new v;G("navigation.instant")&&zn({location$:Rt,viewport$:_e,progress$:Jr}).subscribe(rt);var di;((di=Yr.version)==null?void 0:di.provider)==="mike"&&Gn({document$:rt});T(Rt,wt).pipe(Ye(125)).subscribe(()=>{Be("drawer",!1),Be("search",!1)});Br.pipe(g(({mode:e})=>e==="global")).subscribe(e=>{switch(e.type){case"p":case",":let t=me("link[rel=prev]");typeof t!="undefined"&&st(t);break;case"n":case".":let r=me("link[rel=next]");typeof r!="undefined"&&st(r);break;case"Enter":let o=Re();o instanceof HTMLLabelElement&&o.click()}});li({document$:rt});mi({document$:rt,tablet$:ur});fi({document$:rt});ui({viewport$:_e,tablet$:ur});var 
tt=Rn(Te("header"),{viewport$:_e}),$t=rt.pipe(m(()=>Te("main")),b(e=>Fn(e,{viewport$:_e,header$:tt})),B(1)),rs=T(...ne("consent").map(e=>fn(e,{target$:wt})),...ne("dialog").map(e=>$n(e,{alert$:Gr})),...ne("header").map(e=>Pn(e,{viewport$:_e,header$:tt,main$:$t})),...ne("palette").map(e=>jn(e)),...ne("progress").map(e=>Un(e,{progress$:Jr})),...ne("search").map(e=>ti(e,{index$:vi,keyboard$:Br})),...ne("source").map(e=>ai(e))),os=H(()=>T(...ne("announce").map(e=>mn(e)),...ne("content").map(e=>Hn(e,{viewport$:_e,target$:wt,print$:bi})),...ne("content").map(e=>G("search.highlight")?ri(e,{index$:vi,location$:Rt}):L),...ne("header-title").map(e=>In(e,{viewport$:_e,header$:tt})),...ne("sidebar").map(e=>e.getAttribute("data-md-type")==="navigation"?Ur(hi,()=>Qr(e,{viewport$:_e,header$:tt,main$:$t})):Ur(ur,()=>Qr(e,{viewport$:_e,header$:tt,main$:$t}))),...ne("tabs").map(e=>si(e,{viewport$:_e,header$:tt})),...ne("toc").map(e=>ci(e,{viewport$:_e,header$:tt,main$:$t,target$:wt})),...ne("top").map(e=>pi(e,{viewport$:_e,header$:tt,main$:$t,target$:wt})))),gi=rt.pipe(b(()=>os),$e(rs),B(1));gi.subscribe();window.document$=rt;window.location$=Rt;window.target$=wt;window.keyboard$=Br;window.viewport$=_e;window.tablet$=ur;window.screen$=hi;window.print$=bi;window.alert$=Gr;window.progress$=Jr;window.component$=gi;})();
+//# sourceMappingURL=bundle.bd41221c.min.js.map
+
'success' : 'error', {\n action: action,\n text: text,\n trigger: trigger,\n clearSelection: function clearSelection() {\n if (trigger) {\n trigger.focus();\n }\n\n window.getSelection().removeAllRanges();\n }\n });\n }\n /**\n * Default `action` lookup function.\n * @param {Element} trigger\n */\n\n }, {\n key: \"defaultAction\",\n value: function defaultAction(trigger) {\n return getAttributeValue('action', trigger);\n }\n /**\n * Default `target` lookup function.\n * @param {Element} trigger\n */\n\n }, {\n key: \"defaultTarget\",\n value: function defaultTarget(trigger) {\n var selector = getAttributeValue('target', trigger);\n\n if (selector) {\n return document.querySelector(selector);\n }\n }\n /**\n * Allow fire programmatically a copy action\n * @param {String|HTMLElement} target\n * @param {Object} options\n * @returns Text copied.\n */\n\n }, {\n key: \"defaultText\",\n\n /**\n * Default `text` lookup function.\n * @param {Element} trigger\n */\n value: function defaultText(trigger) {\n return getAttributeValue('text', trigger);\n }\n /**\n * Destroy lifecycle.\n */\n\n }, {\n key: \"destroy\",\n value: function destroy() {\n this.listener.destroy();\n }\n }], [{\n key: \"copy\",\n value: function copy(target) {\n var options = arguments.length > 1 && arguments[1] !== undefined ? arguments[1] : {\n container: document.body\n };\n return actions_copy(target, options);\n }\n /**\n * Allow fire programmatically a cut action\n * @param {String|HTMLElement} target\n * @returns Text cutted.\n */\n\n }, {\n key: \"cut\",\n value: function cut(target) {\n return actions_cut(target);\n }\n /**\n * Returns the support of the given action, or all actions if no action is\n * given.\n * @param {String} [action]\n */\n\n }, {\n key: \"isSupported\",\n value: function isSupported() {\n var action = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : ['copy', 'cut'];\n var actions = typeof action === 'string' ? 
[action] : action;\n var support = !!document.queryCommandSupported;\n actions.forEach(function (action) {\n support = support && !!document.queryCommandSupported(action);\n });\n return support;\n }\n }]);\n\n return Clipboard;\n}((tiny_emitter_default()));\n\n/* harmony default export */ var clipboard = (Clipboard);\n\n/***/ }),\n\n/***/ 828:\n/***/ (function(module) {\n\nvar DOCUMENT_NODE_TYPE = 9;\n\n/**\n * A polyfill for Element.matches()\n */\nif (typeof Element !== 'undefined' && !Element.prototype.matches) {\n var proto = Element.prototype;\n\n proto.matches = proto.matchesSelector ||\n proto.mozMatchesSelector ||\n proto.msMatchesSelector ||\n proto.oMatchesSelector ||\n proto.webkitMatchesSelector;\n}\n\n/**\n * Finds the closest parent that matches a selector.\n *\n * @param {Element} element\n * @param {String} selector\n * @return {Function}\n */\nfunction closest (element, selector) {\n while (element && element.nodeType !== DOCUMENT_NODE_TYPE) {\n if (typeof element.matches === 'function' &&\n element.matches(selector)) {\n return element;\n }\n element = element.parentNode;\n }\n}\n\nmodule.exports = closest;\n\n\n/***/ }),\n\n/***/ 438:\n/***/ (function(module, __unused_webpack_exports, __webpack_require__) {\n\nvar closest = __webpack_require__(828);\n\n/**\n * Delegates event to a selector.\n *\n * @param {Element} element\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @param {Boolean} useCapture\n * @return {Object}\n */\nfunction _delegate(element, selector, type, callback, useCapture) {\n var listenerFn = listener.apply(this, arguments);\n\n element.addEventListener(type, listenerFn, useCapture);\n\n return {\n destroy: function() {\n element.removeEventListener(type, listenerFn, useCapture);\n }\n }\n}\n\n/**\n * Delegates event to a selector.\n *\n * @param {Element|String|Array} [elements]\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @param {Boolean} useCapture\n * @return {Object}\n */\nfunction delegate(elements, selector, type, callback, useCapture) {\n // Handle the regular Element usage\n if (typeof elements.addEventListener === 'function') {\n return _delegate.apply(null, arguments);\n }\n\n // Handle Element-less usage, it defaults to global delegation\n if (typeof type === 'function') {\n // Use `document` as the first parameter, then apply arguments\n // This is a short way to .unshift `arguments` without running into deoptimizations\n return _delegate.bind(null, document).apply(null, arguments);\n }\n\n // Handle Selector-based usage\n if (typeof elements === 'string') {\n elements = document.querySelectorAll(elements);\n }\n\n // Handle Array-like based usage\n return Array.prototype.map.call(elements, function (element) {\n return _delegate(element, selector, type, callback, useCapture);\n });\n}\n\n/**\n * Finds closest match and invokes callback.\n *\n * @param {Element} element\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @return {Function}\n */\nfunction listener(element, selector, type, callback) {\n return function(e) {\n e.delegateTarget = closest(e.target, selector);\n\n if (e.delegateTarget) {\n callback.call(element, e);\n }\n }\n}\n\nmodule.exports = delegate;\n\n\n/***/ }),\n\n/***/ 879:\n/***/ (function(__unused_webpack_module, exports) {\n\n/**\n * Check if argument is a HTML element.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.node = function(value) {\n return value !== undefined\n && 
value instanceof HTMLElement\n && value.nodeType === 1;\n};\n\n/**\n * Check if argument is a list of HTML elements.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.nodeList = function(value) {\n var type = Object.prototype.toString.call(value);\n\n return value !== undefined\n && (type === '[object NodeList]' || type === '[object HTMLCollection]')\n && ('length' in value)\n && (value.length === 0 || exports.node(value[0]));\n};\n\n/**\n * Check if argument is a string.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.string = function(value) {\n return typeof value === 'string'\n || value instanceof String;\n};\n\n/**\n * Check if argument is a function.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.fn = function(value) {\n var type = Object.prototype.toString.call(value);\n\n return type === '[object Function]';\n};\n\n\n/***/ }),\n\n/***/ 370:\n/***/ (function(module, __unused_webpack_exports, __webpack_require__) {\n\nvar is = __webpack_require__(879);\nvar delegate = __webpack_require__(438);\n\n/**\n * Validates all params and calls the right\n * listener function based on its target type.\n *\n * @param {String|HTMLElement|HTMLCollection|NodeList} target\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listen(target, type, callback) {\n if (!target && !type && !callback) {\n throw new Error('Missing required arguments');\n }\n\n if (!is.string(type)) {\n throw new TypeError('Second argument must be a String');\n }\n\n if (!is.fn(callback)) {\n throw new TypeError('Third argument must be a Function');\n }\n\n if (is.node(target)) {\n return listenNode(target, type, callback);\n }\n else if (is.nodeList(target)) {\n return listenNodeList(target, type, callback);\n }\n else if (is.string(target)) {\n return listenSelector(target, type, callback);\n }\n else {\n throw new TypeError('First argument must be a String, HTMLElement, HTMLCollection, or NodeList');\n }\n}\n\n/**\n * Adds an event listener to a HTML element\n * and returns a remove listener function.\n *\n * @param {HTMLElement} node\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenNode(node, type, callback) {\n node.addEventListener(type, callback);\n\n return {\n destroy: function() {\n node.removeEventListener(type, callback);\n }\n }\n}\n\n/**\n * Add an event listener to a list of HTML elements\n * and returns a remove listener function.\n *\n * @param {NodeList|HTMLCollection} nodeList\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenNodeList(nodeList, type, callback) {\n Array.prototype.forEach.call(nodeList, function(node) {\n node.addEventListener(type, callback);\n });\n\n return {\n destroy: function() {\n Array.prototype.forEach.call(nodeList, function(node) {\n node.removeEventListener(type, callback);\n });\n }\n }\n}\n\n/**\n * Add an event listener to a selector\n * and returns a remove listener function.\n *\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenSelector(selector, type, callback) {\n return delegate(document.body, selector, type, callback);\n}\n\nmodule.exports = listen;\n\n\n/***/ }),\n\n/***/ 817:\n/***/ (function(module) {\n\nfunction select(element) {\n var selectedText;\n\n if (element.nodeName === 'SELECT') {\n element.focus();\n\n selectedText = element.value;\n }\n else if (element.nodeName === 'INPUT' || element.nodeName 
=== 'TEXTAREA') {\n var isReadOnly = element.hasAttribute('readonly');\n\n if (!isReadOnly) {\n element.setAttribute('readonly', '');\n }\n\n element.select();\n element.setSelectionRange(0, element.value.length);\n\n if (!isReadOnly) {\n element.removeAttribute('readonly');\n }\n\n selectedText = element.value;\n }\n else {\n if (element.hasAttribute('contenteditable')) {\n element.focus();\n }\n\n var selection = window.getSelection();\n var range = document.createRange();\n\n range.selectNodeContents(element);\n selection.removeAllRanges();\n selection.addRange(range);\n\n selectedText = selection.toString();\n }\n\n return selectedText;\n}\n\nmodule.exports = select;\n\n\n/***/ }),\n\n/***/ 279:\n/***/ (function(module) {\n\nfunction E () {\n // Keep this empty so it's easier to inherit from\n // (via https://github.com/lipsmack from https://github.com/scottcorgan/tiny-emitter/issues/3)\n}\n\nE.prototype = {\n on: function (name, callback, ctx) {\n var e = this.e || (this.e = {});\n\n (e[name] || (e[name] = [])).push({\n fn: callback,\n ctx: ctx\n });\n\n return this;\n },\n\n once: function (name, callback, ctx) {\n var self = this;\n function listener () {\n self.off(name, listener);\n callback.apply(ctx, arguments);\n };\n\n listener._ = callback\n return this.on(name, listener, ctx);\n },\n\n emit: function (name) {\n var data = [].slice.call(arguments, 1);\n var evtArr = ((this.e || (this.e = {}))[name] || []).slice();\n var i = 0;\n var len = evtArr.length;\n\n for (i; i < len; i++) {\n evtArr[i].fn.apply(evtArr[i].ctx, data);\n }\n\n return this;\n },\n\n off: function (name, callback) {\n var e = this.e || (this.e = {});\n var evts = e[name];\n var liveEvents = [];\n\n if (evts && callback) {\n for (var i = 0, len = evts.length; i < len; i++) {\n if (evts[i].fn !== callback && evts[i].fn._ !== callback)\n liveEvents.push(evts[i]);\n }\n }\n\n // Remove event from queue to prevent memory leak\n // Suggested by https://github.com/lazd\n // Ref: https://github.com/scottcorgan/tiny-emitter/commit/c6ebfaa9bc973b33d110a84a307742b7cf94c953#commitcomment-5024910\n\n (liveEvents.length)\n ? 
e[name] = liveEvents\n : delete e[name];\n\n return this;\n }\n};\n\nmodule.exports = E;\nmodule.exports.TinyEmitter = E;\n\n\n/***/ })\n\n/******/ \t});\n/************************************************************************/\n/******/ \t// The module cache\n/******/ \tvar __webpack_module_cache__ = {};\n/******/ \t\n/******/ \t// The require function\n/******/ \tfunction __webpack_require__(moduleId) {\n/******/ \t\t// Check if module is in cache\n/******/ \t\tif(__webpack_module_cache__[moduleId]) {\n/******/ \t\t\treturn __webpack_module_cache__[moduleId].exports;\n/******/ \t\t}\n/******/ \t\t// Create a new module (and put it into the cache)\n/******/ \t\tvar module = __webpack_module_cache__[moduleId] = {\n/******/ \t\t\t// no module.id needed\n/******/ \t\t\t// no module.loaded needed\n/******/ \t\t\texports: {}\n/******/ \t\t};\n/******/ \t\n/******/ \t\t// Execute the module function\n/******/ \t\t__webpack_modules__[moduleId](module, module.exports, __webpack_require__);\n/******/ \t\n/******/ \t\t// Return the exports of the module\n/******/ \t\treturn module.exports;\n/******/ \t}\n/******/ \t\n/************************************************************************/\n/******/ \t/* webpack/runtime/compat get default export */\n/******/ \t!function() {\n/******/ \t\t// getDefaultExport function for compatibility with non-harmony modules\n/******/ \t\t__webpack_require__.n = function(module) {\n/******/ \t\t\tvar getter = module && module.__esModule ?\n/******/ \t\t\t\tfunction() { return module['default']; } :\n/******/ \t\t\t\tfunction() { return module; };\n/******/ \t\t\t__webpack_require__.d(getter, { a: getter });\n/******/ \t\t\treturn getter;\n/******/ \t\t};\n/******/ \t}();\n/******/ \t\n/******/ \t/* webpack/runtime/define property getters */\n/******/ \t!function() {\n/******/ \t\t// define getter functions for harmony exports\n/******/ \t\t__webpack_require__.d = function(exports, definition) {\n/******/ \t\t\tfor(var key in definition) {\n/******/ \t\t\t\tif(__webpack_require__.o(definition, key) && !__webpack_require__.o(exports, key)) {\n/******/ \t\t\t\t\tObject.defineProperty(exports, key, { enumerable: true, get: definition[key] });\n/******/ \t\t\t\t}\n/******/ \t\t\t}\n/******/ \t\t};\n/******/ \t}();\n/******/ \t\n/******/ \t/* webpack/runtime/hasOwnProperty shorthand */\n/******/ \t!function() {\n/******/ \t\t__webpack_require__.o = function(obj, prop) { return Object.prototype.hasOwnProperty.call(obj, prop); }\n/******/ \t}();\n/******/ \t\n/************************************************************************/\n/******/ \t// module exports must be returned from runtime so entry inlining is disabled\n/******/ \t// startup\n/******/ \t// Load entry module and return exports\n/******/ \treturn __webpack_require__(686);\n/******/ })()\n.default;\n});", "/*!\n * escape-html\n * Copyright(c) 2012-2013 TJ Holowaychuk\n * Copyright(c) 2015 Andreas Lubbe\n * Copyright(c) 2015 Tiancheng \"Timothy\" Gu\n * MIT Licensed\n */\n\n'use strict';\n\n/**\n * Module variables.\n * @private\n */\n\nvar matchHtmlRegExp = /[\"'&<>]/;\n\n/**\n * Module exports.\n * @public\n */\n\nmodule.exports = escapeHtml;\n\n/**\n * Escape special characters in the given string of html.\n *\n * @param {string} string The string to escape for inserting into HTML\n * @return {string}\n * @public\n */\n\nfunction escapeHtml(string) {\n var str = '' + string;\n var match = matchHtmlRegExp.exec(str);\n\n if (!match) {\n return str;\n }\n\n var escape;\n var html = '';\n var index = 0;\n 
var lastIndex = 0;\n\n for (index = match.index; index < str.length; index++) {\n switch (str.charCodeAt(index)) {\n case 34: // \"\n escape = '"';\n break;\n case 38: // &\n escape = '&';\n break;\n case 39: // '\n escape = ''';\n break;\n case 60: // <\n escape = '<';\n break;\n case 62: // >\n escape = '>';\n break;\n default:\n continue;\n }\n\n if (lastIndex !== index) {\n html += str.substring(lastIndex, index);\n }\n\n lastIndex = index + 1;\n html += escape;\n }\n\n return lastIndex !== index\n ? html + str.substring(lastIndex, index)\n : html;\n}\n", "/*\n * Copyright (c) 2016-2024 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport \"focus-visible\"\n\nimport {\n EMPTY,\n NEVER,\n Observable,\n Subject,\n defer,\n delay,\n filter,\n map,\n merge,\n mergeWith,\n shareReplay,\n switchMap\n} from \"rxjs\"\n\nimport { configuration, feature } from \"./_\"\nimport {\n at,\n getActiveElement,\n getOptionalElement,\n requestJSON,\n setLocation,\n setToggle,\n watchDocument,\n watchKeyboard,\n watchLocation,\n watchLocationTarget,\n watchMedia,\n watchPrint,\n watchScript,\n watchViewport\n} from \"./browser\"\nimport {\n getComponentElement,\n getComponentElements,\n mountAnnounce,\n mountBackToTop,\n mountConsent,\n mountContent,\n mountDialog,\n mountHeader,\n mountHeaderTitle,\n mountPalette,\n mountProgress,\n mountSearch,\n mountSearchHiglight,\n mountSidebar,\n mountSource,\n mountTableOfContents,\n mountTabs,\n watchHeader,\n watchMain\n} from \"./components\"\nimport {\n SearchIndex,\n setupClipboardJS,\n setupInstantNavigation,\n setupVersionSelector\n} from \"./integrations\"\nimport {\n patchEllipsis,\n patchIndeterminate,\n patchScrollfix,\n patchScrolllock\n} from \"./patches\"\nimport \"./polyfills\"\n\n/* ----------------------------------------------------------------------------\n * Functions - @todo refactor\n * ------------------------------------------------------------------------- */\n\n/**\n * Fetch search index\n *\n * @returns Search index observable\n */\nfunction fetchSearchIndex(): Observable {\n if (location.protocol === \"file:\") {\n return watchScript(\n `${new URL(\"search/search_index.js\", config.base)}`\n )\n .pipe(\n // @ts-ignore - @todo fix typings\n map(() => __index),\n shareReplay(1)\n )\n } else {\n return requestJSON(\n new URL(\"search/search_index.json\", config.base)\n )\n }\n}\n\n/* ----------------------------------------------------------------------------\n * Application\n * 
------------------------------------------------------------------------- */\n\n/* Yay, JavaScript is available */\ndocument.documentElement.classList.remove(\"no-js\")\ndocument.documentElement.classList.add(\"js\")\n\n/* Set up navigation observables and subjects */\nconst document$ = watchDocument()\nconst location$ = watchLocation()\nconst target$ = watchLocationTarget(location$)\nconst keyboard$ = watchKeyboard()\n\n/* Set up media observables */\nconst viewport$ = watchViewport()\nconst tablet$ = watchMedia(\"(min-width: 960px)\")\nconst screen$ = watchMedia(\"(min-width: 1220px)\")\nconst print$ = watchPrint()\n\n/* Retrieve search index, if search is enabled */\nconst config = configuration()\nconst index$ = document.forms.namedItem(\"search\")\n ? fetchSearchIndex()\n : NEVER\n\n/* Set up Clipboard.js integration */\nconst alert$ = new Subject()\nsetupClipboardJS({ alert$ })\n\n/* Set up progress indicator */\nconst progress$ = new Subject()\n\n/* Set up instant navigation, if enabled */\nif (feature(\"navigation.instant\"))\n setupInstantNavigation({ location$, viewport$, progress$ })\n .subscribe(document$)\n\n/* Set up version selector */\nif (config.version?.provider === \"mike\")\n setupVersionSelector({ document$ })\n\n/* Always close drawer and search on navigation */\nmerge(location$, target$)\n .pipe(\n delay(125)\n )\n .subscribe(() => {\n setToggle(\"drawer\", false)\n setToggle(\"search\", false)\n })\n\n/* Set up global keyboard handlers */\nkeyboard$\n .pipe(\n filter(({ mode }) => mode === \"global\")\n )\n .subscribe(key => {\n switch (key.type) {\n\n /* Go to previous page */\n case \"p\":\n case \",\":\n const prev = getOptionalElement(\"link[rel=prev]\")\n if (typeof prev !== \"undefined\")\n setLocation(prev)\n break\n\n /* Go to next page */\n case \"n\":\n case \".\":\n const next = getOptionalElement(\"link[rel=next]\")\n if (typeof next !== \"undefined\")\n setLocation(next)\n break\n\n /* Expand navigation, see https://bit.ly/3ZjG5io */\n case \"Enter\":\n const active = getActiveElement()\n if (active instanceof HTMLLabelElement)\n active.click()\n }\n })\n\n/* Set up patches */\npatchEllipsis({ document$ })\npatchIndeterminate({ document$, tablet$ })\npatchScrollfix({ document$ })\npatchScrolllock({ viewport$, tablet$ })\n\n/* Set up header and main area observable */\nconst header$ = watchHeader(getComponentElement(\"header\"), { viewport$ })\nconst main$ = document$\n .pipe(\n map(() => getComponentElement(\"main\")),\n switchMap(el => watchMain(el, { viewport$, header$ })),\n shareReplay(1)\n )\n\n/* Set up control component observables */\nconst control$ = merge(\n\n /* Consent */\n ...getComponentElements(\"consent\")\n .map(el => mountConsent(el, { target$ })),\n\n /* Dialog */\n ...getComponentElements(\"dialog\")\n .map(el => mountDialog(el, { alert$ })),\n\n /* Header */\n ...getComponentElements(\"header\")\n .map(el => mountHeader(el, { viewport$, header$, main$ })),\n\n /* Color palette */\n ...getComponentElements(\"palette\")\n .map(el => mountPalette(el)),\n\n /* Progress bar */\n ...getComponentElements(\"progress\")\n .map(el => mountProgress(el, { progress$ })),\n\n /* Search */\n ...getComponentElements(\"search\")\n .map(el => mountSearch(el, { index$, keyboard$ })),\n\n /* Repository information */\n ...getComponentElements(\"source\")\n .map(el => mountSource(el))\n)\n\n/* Set up content component observables */\nconst content$ = defer(() => merge(\n\n /* Announcement bar */\n ...getComponentElements(\"announce\")\n .map(el => 
mountAnnounce(el)),\n\n /* Content */\n ...getComponentElements(\"content\")\n .map(el => mountContent(el, { viewport$, target$, print$ })),\n\n /* Search highlighting */\n ...getComponentElements(\"content\")\n .map(el => feature(\"search.highlight\")\n ? mountSearchHiglight(el, { index$, location$ })\n : EMPTY\n ),\n\n /* Header title */\n ...getComponentElements(\"header-title\")\n .map(el => mountHeaderTitle(el, { viewport$, header$ })),\n\n /* Sidebar */\n ...getComponentElements(\"sidebar\")\n .map(el => el.getAttribute(\"data-md-type\") === \"navigation\"\n ? at(screen$, () => mountSidebar(el, { viewport$, header$, main$ }))\n : at(tablet$, () => mountSidebar(el, { viewport$, header$, main$ }))\n ),\n\n /* Navigation tabs */\n ...getComponentElements(\"tabs\")\n .map(el => mountTabs(el, { viewport$, header$ })),\n\n /* Table of contents */\n ...getComponentElements(\"toc\")\n .map(el => mountTableOfContents(el, {\n viewport$, header$, main$, target$\n })),\n\n /* Back-to-top button */\n ...getComponentElements(\"top\")\n .map(el => mountBackToTop(el, { viewport$, header$, main$, target$ }))\n))\n\n/* Set up component observables */\nconst component$ = document$\n .pipe(\n switchMap(() => content$),\n mergeWith(control$),\n shareReplay(1)\n )\n\n/* Subscribe to all components */\ncomponent$.subscribe()\n\n/* ----------------------------------------------------------------------------\n * Exports\n * ------------------------------------------------------------------------- */\n\nwindow.document$ = document$ /* Document observable */\nwindow.location$ = location$ /* Location subject */\nwindow.target$ = target$ /* Location target observable */\nwindow.keyboard$ = keyboard$ /* Keyboard observable */\nwindow.viewport$ = viewport$ /* Viewport observable */\nwindow.tablet$ = tablet$ /* Media tablet observable */\nwindow.screen$ = screen$ /* Media screen observable */\nwindow.print$ = print$ /* Media print observable */\nwindow.alert$ = alert$ /* Alert subject */\nwindow.progress$ = progress$ /* Progress indicator subject */\nwindow.component$ = component$ /* Component observable */\n", "/*! *****************************************************************************\r\nCopyright (c) Microsoft Corporation.\r\n\r\nPermission to use, copy, modify, and/or distribute this software for any\r\npurpose with or without fee is hereby granted.\r\n\r\nTHE SOFTWARE IS PROVIDED \"AS IS\" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH\r\nREGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY\r\nAND FITNESS. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,\r\nINDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM\r\nLOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR\r\nOTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR\r\nPERFORMANCE OF THIS SOFTWARE.\r\n***************************************************************************** */\r\n/* global Reflect, Promise */\r\n\r\nvar extendStatics = function(d, b) {\r\n extendStatics = Object.setPrototypeOf ||\r\n ({ __proto__: [] } instanceof Array && function (d, b) { d.__proto__ = b; }) ||\r\n function (d, b) { for (var p in b) if (Object.prototype.hasOwnProperty.call(b, p)) d[p] = b[p]; };\r\n return extendStatics(d, b);\r\n};\r\n\r\nexport function __extends(d, b) {\r\n if (typeof b !== \"function\" && b !== null)\r\n throw new TypeError(\"Class extends value \" + String(b) + \" is not a constructor or null\");\r\n extendStatics(d, b);\r\n function __() { this.constructor = d; }\r\n d.prototype = b === null ? Object.create(b) : (__.prototype = b.prototype, new __());\r\n}\r\n\r\nexport var __assign = function() {\r\n __assign = Object.assign || function __assign(t) {\r\n for (var s, i = 1, n = arguments.length; i < n; i++) {\r\n s = arguments[i];\r\n for (var p in s) if (Object.prototype.hasOwnProperty.call(s, p)) t[p] = s[p];\r\n }\r\n return t;\r\n }\r\n return __assign.apply(this, arguments);\r\n}\r\n\r\nexport function __rest(s, e) {\r\n var t = {};\r\n for (var p in s) if (Object.prototype.hasOwnProperty.call(s, p) && e.indexOf(p) < 0)\r\n t[p] = s[p];\r\n if (s != null && typeof Object.getOwnPropertySymbols === \"function\")\r\n for (var i = 0, p = Object.getOwnPropertySymbols(s); i < p.length; i++) {\r\n if (e.indexOf(p[i]) < 0 && Object.prototype.propertyIsEnumerable.call(s, p[i]))\r\n t[p[i]] = s[p[i]];\r\n }\r\n return t;\r\n}\r\n\r\nexport function __decorate(decorators, target, key, desc) {\r\n var c = arguments.length, r = c < 3 ? target : desc === null ? desc = Object.getOwnPropertyDescriptor(target, key) : desc, d;\r\n if (typeof Reflect === \"object\" && typeof Reflect.decorate === \"function\") r = Reflect.decorate(decorators, target, key, desc);\r\n else for (var i = decorators.length - 1; i >= 0; i--) if (d = decorators[i]) r = (c < 3 ? d(r) : c > 3 ? d(target, key, r) : d(target, key)) || r;\r\n return c > 3 && r && Object.defineProperty(target, key, r), r;\r\n}\r\n\r\nexport function __param(paramIndex, decorator) {\r\n return function (target, key) { decorator(target, key, paramIndex); }\r\n}\r\n\r\nexport function __metadata(metadataKey, metadataValue) {\r\n if (typeof Reflect === \"object\" && typeof Reflect.metadata === \"function\") return Reflect.metadata(metadataKey, metadataValue);\r\n}\r\n\r\nexport function __awaiter(thisArg, _arguments, P, generator) {\r\n function adopt(value) { return value instanceof P ? value : new P(function (resolve) { resolve(value); }); }\r\n return new (P || (P = Promise))(function (resolve, reject) {\r\n function fulfilled(value) { try { step(generator.next(value)); } catch (e) { reject(e); } }\r\n function rejected(value) { try { step(generator[\"throw\"](value)); } catch (e) { reject(e); } }\r\n function step(result) { result.done ? 
resolve(result.value) : adopt(result.value).then(fulfilled, rejected); }\r\n step((generator = generator.apply(thisArg, _arguments || [])).next());\r\n });\r\n}\r\n\r\nexport function __generator(thisArg, body) {\r\n var _ = { label: 0, sent: function() { if (t[0] & 1) throw t[1]; return t[1]; }, trys: [], ops: [] }, f, y, t, g;\r\n return g = { next: verb(0), \"throw\": verb(1), \"return\": verb(2) }, typeof Symbol === \"function\" && (g[Symbol.iterator] = function() { return this; }), g;\r\n function verb(n) { return function (v) { return step([n, v]); }; }\r\n function step(op) {\r\n if (f) throw new TypeError(\"Generator is already executing.\");\r\n while (_) try {\r\n if (f = 1, y && (t = op[0] & 2 ? y[\"return\"] : op[0] ? y[\"throw\"] || ((t = y[\"return\"]) && t.call(y), 0) : y.next) && !(t = t.call(y, op[1])).done) return t;\r\n if (y = 0, t) op = [op[0] & 2, t.value];\r\n switch (op[0]) {\r\n case 0: case 1: t = op; break;\r\n case 4: _.label++; return { value: op[1], done: false };\r\n case 5: _.label++; y = op[1]; op = [0]; continue;\r\n case 7: op = _.ops.pop(); _.trys.pop(); continue;\r\n default:\r\n if (!(t = _.trys, t = t.length > 0 && t[t.length - 1]) && (op[0] === 6 || op[0] === 2)) { _ = 0; continue; }\r\n if (op[0] === 3 && (!t || (op[1] > t[0] && op[1] < t[3]))) { _.label = op[1]; break; }\r\n if (op[0] === 6 && _.label < t[1]) { _.label = t[1]; t = op; break; }\r\n if (t && _.label < t[2]) { _.label = t[2]; _.ops.push(op); break; }\r\n if (t[2]) _.ops.pop();\r\n _.trys.pop(); continue;\r\n }\r\n op = body.call(thisArg, _);\r\n } catch (e) { op = [6, e]; y = 0; } finally { f = t = 0; }\r\n if (op[0] & 5) throw op[1]; return { value: op[0] ? op[1] : void 0, done: true };\r\n }\r\n}\r\n\r\nexport var __createBinding = Object.create ? (function(o, m, k, k2) {\r\n if (k2 === undefined) k2 = k;\r\n Object.defineProperty(o, k2, { enumerable: true, get: function() { return m[k]; } });\r\n}) : (function(o, m, k, k2) {\r\n if (k2 === undefined) k2 = k;\r\n o[k2] = m[k];\r\n});\r\n\r\nexport function __exportStar(m, o) {\r\n for (var p in m) if (p !== \"default\" && !Object.prototype.hasOwnProperty.call(o, p)) __createBinding(o, m, p);\r\n}\r\n\r\nexport function __values(o) {\r\n var s = typeof Symbol === \"function\" && Symbol.iterator, m = s && o[s], i = 0;\r\n if (m) return m.call(o);\r\n if (o && typeof o.length === \"number\") return {\r\n next: function () {\r\n if (o && i >= o.length) o = void 0;\r\n return { value: o && o[i++], done: !o };\r\n }\r\n };\r\n throw new TypeError(s ? 
\"Object is not iterable.\" : \"Symbol.iterator is not defined.\");\r\n}\r\n\r\nexport function __read(o, n) {\r\n var m = typeof Symbol === \"function\" && o[Symbol.iterator];\r\n if (!m) return o;\r\n var i = m.call(o), r, ar = [], e;\r\n try {\r\n while ((n === void 0 || n-- > 0) && !(r = i.next()).done) ar.push(r.value);\r\n }\r\n catch (error) { e = { error: error }; }\r\n finally {\r\n try {\r\n if (r && !r.done && (m = i[\"return\"])) m.call(i);\r\n }\r\n finally { if (e) throw e.error; }\r\n }\r\n return ar;\r\n}\r\n\r\n/** @deprecated */\r\nexport function __spread() {\r\n for (var ar = [], i = 0; i < arguments.length; i++)\r\n ar = ar.concat(__read(arguments[i]));\r\n return ar;\r\n}\r\n\r\n/** @deprecated */\r\nexport function __spreadArrays() {\r\n for (var s = 0, i = 0, il = arguments.length; i < il; i++) s += arguments[i].length;\r\n for (var r = Array(s), k = 0, i = 0; i < il; i++)\r\n for (var a = arguments[i], j = 0, jl = a.length; j < jl; j++, k++)\r\n r[k] = a[j];\r\n return r;\r\n}\r\n\r\nexport function __spreadArray(to, from, pack) {\r\n if (pack || arguments.length === 2) for (var i = 0, l = from.length, ar; i < l; i++) {\r\n if (ar || !(i in from)) {\r\n if (!ar) ar = Array.prototype.slice.call(from, 0, i);\r\n ar[i] = from[i];\r\n }\r\n }\r\n return to.concat(ar || Array.prototype.slice.call(from));\r\n}\r\n\r\nexport function __await(v) {\r\n return this instanceof __await ? (this.v = v, this) : new __await(v);\r\n}\r\n\r\nexport function __asyncGenerator(thisArg, _arguments, generator) {\r\n if (!Symbol.asyncIterator) throw new TypeError(\"Symbol.asyncIterator is not defined.\");\r\n var g = generator.apply(thisArg, _arguments || []), i, q = [];\r\n return i = {}, verb(\"next\"), verb(\"throw\"), verb(\"return\"), i[Symbol.asyncIterator] = function () { return this; }, i;\r\n function verb(n) { if (g[n]) i[n] = function (v) { return new Promise(function (a, b) { q.push([n, v, a, b]) > 1 || resume(n, v); }); }; }\r\n function resume(n, v) { try { step(g[n](v)); } catch (e) { settle(q[0][3], e); } }\r\n function step(r) { r.value instanceof __await ? Promise.resolve(r.value.v).then(fulfill, reject) : settle(q[0][2], r); }\r\n function fulfill(value) { resume(\"next\", value); }\r\n function reject(value) { resume(\"throw\", value); }\r\n function settle(f, v) { if (f(v), q.shift(), q.length) resume(q[0][0], q[0][1]); }\r\n}\r\n\r\nexport function __asyncDelegator(o) {\r\n var i, p;\r\n return i = {}, verb(\"next\"), verb(\"throw\", function (e) { throw e; }), verb(\"return\"), i[Symbol.iterator] = function () { return this; }, i;\r\n function verb(n, f) { i[n] = o[n] ? function (v) { return (p = !p) ? { value: __await(o[n](v)), done: n === \"return\" } : f ? f(v) : v; } : f; }\r\n}\r\n\r\nexport function __asyncValues(o) {\r\n if (!Symbol.asyncIterator) throw new TypeError(\"Symbol.asyncIterator is not defined.\");\r\n var m = o[Symbol.asyncIterator], i;\r\n return m ? m.call(o) : (o = typeof __values === \"function\" ? 
__values(o) : o[Symbol.iterator](), i = {}, verb(\"next\"), verb(\"throw\"), verb(\"return\"), i[Symbol.asyncIterator] = function () { return this; }, i);\r\n function verb(n) { i[n] = o[n] && function (v) { return new Promise(function (resolve, reject) { v = o[n](v), settle(resolve, reject, v.done, v.value); }); }; }\r\n function settle(resolve, reject, d, v) { Promise.resolve(v).then(function(v) { resolve({ value: v, done: d }); }, reject); }\r\n}\r\n\r\nexport function __makeTemplateObject(cooked, raw) {\r\n if (Object.defineProperty) { Object.defineProperty(cooked, \"raw\", { value: raw }); } else { cooked.raw = raw; }\r\n return cooked;\r\n};\r\n\r\nvar __setModuleDefault = Object.create ? (function(o, v) {\r\n Object.defineProperty(o, \"default\", { enumerable: true, value: v });\r\n}) : function(o, v) {\r\n o[\"default\"] = v;\r\n};\r\n\r\nexport function __importStar(mod) {\r\n if (mod && mod.__esModule) return mod;\r\n var result = {};\r\n if (mod != null) for (var k in mod) if (k !== \"default\" && Object.prototype.hasOwnProperty.call(mod, k)) __createBinding(result, mod, k);\r\n __setModuleDefault(result, mod);\r\n return result;\r\n}\r\n\r\nexport function __importDefault(mod) {\r\n return (mod && mod.__esModule) ? mod : { default: mod };\r\n}\r\n\r\nexport function __classPrivateFieldGet(receiver, state, kind, f) {\r\n if (kind === \"a\" && !f) throw new TypeError(\"Private accessor was defined without a getter\");\r\n if (typeof state === \"function\" ? receiver !== state || !f : !state.has(receiver)) throw new TypeError(\"Cannot read private member from an object whose class did not declare it\");\r\n return kind === \"m\" ? f : kind === \"a\" ? f.call(receiver) : f ? f.value : state.get(receiver);\r\n}\r\n\r\nexport function __classPrivateFieldSet(receiver, state, value, kind, f) {\r\n if (kind === \"m\") throw new TypeError(\"Private method is not writable\");\r\n if (kind === \"a\" && !f) throw new TypeError(\"Private accessor was defined without a setter\");\r\n if (typeof state === \"function\" ? receiver !== state || !f : !state.has(receiver)) throw new TypeError(\"Cannot write private member to an object whose class did not declare it\");\r\n return (kind === \"a\" ? f.call(receiver, value) : f ? f.value = value : state.set(receiver, value)), value;\r\n}\r\n", "/**\n * Returns true if the object is a function.\n * @param value The value to check\n */\nexport function isFunction(value: any): value is (...args: any[]) => any {\n return typeof value === 'function';\n}\n", "/**\n * Used to create Error subclasses until the community moves away from ES5.\n *\n * This is because compiling from TypeScript down to ES5 has issues with subclassing Errors\n * as well as other built-in types: https://github.com/Microsoft/TypeScript/issues/12123\n *\n * @param createImpl A factory function to create the actual constructor implementation. 
The returned\n * function should be a named function that calls `_super` internally.\n */\nexport function createErrorClass(createImpl: (_super: any) => any): T {\n const _super = (instance: any) => {\n Error.call(instance);\n instance.stack = new Error().stack;\n };\n\n const ctorFunc = createImpl(_super);\n ctorFunc.prototype = Object.create(Error.prototype);\n ctorFunc.prototype.constructor = ctorFunc;\n return ctorFunc;\n}\n", "import { createErrorClass } from './createErrorClass';\n\nexport interface UnsubscriptionError extends Error {\n readonly errors: any[];\n}\n\nexport interface UnsubscriptionErrorCtor {\n /**\n * @deprecated Internal implementation detail. Do not construct error instances.\n * Cannot be tagged as internal: https://github.com/ReactiveX/rxjs/issues/6269\n */\n new (errors: any[]): UnsubscriptionError;\n}\n\n/**\n * An error thrown when one or more errors have occurred during the\n * `unsubscribe` of a {@link Subscription}.\n */\nexport const UnsubscriptionError: UnsubscriptionErrorCtor = createErrorClass(\n (_super) =>\n function UnsubscriptionErrorImpl(this: any, errors: (Error | string)[]) {\n _super(this);\n this.message = errors\n ? `${errors.length} errors occurred during unsubscription:\n${errors.map((err, i) => `${i + 1}) ${err.toString()}`).join('\\n ')}`\n : '';\n this.name = 'UnsubscriptionError';\n this.errors = errors;\n }\n);\n", "/**\n * Removes an item from an array, mutating it.\n * @param arr The array to remove the item from\n * @param item The item to remove\n */\nexport function arrRemove(arr: T[] | undefined | null, item: T) {\n if (arr) {\n const index = arr.indexOf(item);\n 0 <= index && arr.splice(index, 1);\n }\n}\n", "import { isFunction } from './util/isFunction';\nimport { UnsubscriptionError } from './util/UnsubscriptionError';\nimport { SubscriptionLike, TeardownLogic, Unsubscribable } from './types';\nimport { arrRemove } from './util/arrRemove';\n\n/**\n * Represents a disposable resource, such as the execution of an Observable. A\n * Subscription has one important method, `unsubscribe`, that takes no argument\n * and just disposes the resource held by the subscription.\n *\n * Additionally, subscriptions may be grouped together through the `add()`\n * method, which will attach a child Subscription to the current Subscription.\n * When a Subscription is unsubscribed, all its children (and its grandchildren)\n * will be unsubscribed as well.\n *\n * @class Subscription\n */\nexport class Subscription implements SubscriptionLike {\n /** @nocollapse */\n public static EMPTY = (() => {\n const empty = new Subscription();\n empty.closed = true;\n return empty;\n })();\n\n /**\n * A flag to indicate whether this Subscription has already been unsubscribed.\n */\n public closed = false;\n\n private _parentage: Subscription[] | Subscription | null = null;\n\n /**\n * The list of registered finalizers to execute upon unsubscription. Adding and removing from this\n * list occurs in the {@link #add} and {@link #remove} methods.\n */\n private _finalizers: Exclude[] | null = null;\n\n /**\n * @param initialTeardown A function executed first as part of the finalization\n * process that is kicked off when {@link #unsubscribe} is called.\n */\n constructor(private initialTeardown?: () => void) {}\n\n /**\n * Disposes the resources held by the subscription. 
May, for instance, cancel\n * an ongoing Observable execution or cancel any other type of work that\n * started when the Subscription was created.\n * @return {void}\n */\n unsubscribe(): void {\n let errors: any[] | undefined;\n\n if (!this.closed) {\n this.closed = true;\n\n // Remove this from it's parents.\n const { _parentage } = this;\n if (_parentage) {\n this._parentage = null;\n if (Array.isArray(_parentage)) {\n for (const parent of _parentage) {\n parent.remove(this);\n }\n } else {\n _parentage.remove(this);\n }\n }\n\n const { initialTeardown: initialFinalizer } = this;\n if (isFunction(initialFinalizer)) {\n try {\n initialFinalizer();\n } catch (e) {\n errors = e instanceof UnsubscriptionError ? e.errors : [e];\n }\n }\n\n const { _finalizers } = this;\n if (_finalizers) {\n this._finalizers = null;\n for (const finalizer of _finalizers) {\n try {\n execFinalizer(finalizer);\n } catch (err) {\n errors = errors ?? [];\n if (err instanceof UnsubscriptionError) {\n errors = [...errors, ...err.errors];\n } else {\n errors.push(err);\n }\n }\n }\n }\n\n if (errors) {\n throw new UnsubscriptionError(errors);\n }\n }\n }\n\n /**\n * Adds a finalizer to this subscription, so that finalization will be unsubscribed/called\n * when this subscription is unsubscribed. If this subscription is already {@link #closed},\n * because it has already been unsubscribed, then whatever finalizer is passed to it\n * will automatically be executed (unless the finalizer itself is also a closed subscription).\n *\n * Closed Subscriptions cannot be added as finalizers to any subscription. Adding a closed\n * subscription to a any subscription will result in no operation. (A noop).\n *\n * Adding a subscription to itself, or adding `null` or `undefined` will not perform any\n * operation at all. (A noop).\n *\n * `Subscription` instances that are added to this instance will automatically remove themselves\n * if they are unsubscribed. Functions and {@link Unsubscribable} objects that you wish to remove\n * will need to be removed manually with {@link #remove}\n *\n * @param teardown The finalization logic to add to this subscription.\n */\n add(teardown: TeardownLogic): void {\n // Only add the finalizer if it's not undefined\n // and don't add a subscription to itself.\n if (teardown && teardown !== this) {\n if (this.closed) {\n // If this subscription is already closed,\n // execute whatever finalizer is handed to it automatically.\n execFinalizer(teardown);\n } else {\n if (teardown instanceof Subscription) {\n // We don't add closed subscriptions, and we don't add the same subscription\n // twice. Subscription unsubscribe is idempotent.\n if (teardown.closed || teardown._hasParent(this)) {\n return;\n }\n teardown._addParent(this);\n }\n (this._finalizers = this._finalizers ?? 
[]).push(teardown);\n }\n }\n }\n\n /**\n * Checks to see if a this subscription already has a particular parent.\n * This will signal that this subscription has already been added to the parent in question.\n * @param parent the parent to check for\n */\n private _hasParent(parent: Subscription) {\n const { _parentage } = this;\n return _parentage === parent || (Array.isArray(_parentage) && _parentage.includes(parent));\n }\n\n /**\n * Adds a parent to this subscription so it can be removed from the parent if it\n * unsubscribes on it's own.\n *\n * NOTE: THIS ASSUMES THAT {@link _hasParent} HAS ALREADY BEEN CHECKED.\n * @param parent The parent subscription to add\n */\n private _addParent(parent: Subscription) {\n const { _parentage } = this;\n this._parentage = Array.isArray(_parentage) ? (_parentage.push(parent), _parentage) : _parentage ? [_parentage, parent] : parent;\n }\n\n /**\n * Called on a child when it is removed via {@link #remove}.\n * @param parent The parent to remove\n */\n private _removeParent(parent: Subscription) {\n const { _parentage } = this;\n if (_parentage === parent) {\n this._parentage = null;\n } else if (Array.isArray(_parentage)) {\n arrRemove(_parentage, parent);\n }\n }\n\n /**\n * Removes a finalizer from this subscription that was previously added with the {@link #add} method.\n *\n * Note that `Subscription` instances, when unsubscribed, will automatically remove themselves\n * from every other `Subscription` they have been added to. This means that using the `remove` method\n * is not a common thing and should be used thoughtfully.\n *\n * If you add the same finalizer instance of a function or an unsubscribable object to a `Subscription` instance\n * more than once, you will need to call `remove` the same number of times to remove all instances.\n *\n * All finalizer instances are removed to free up memory upon unsubscription.\n *\n * @param teardown The finalizer to remove from this subscription\n */\n remove(teardown: Exclude): void {\n const { _finalizers } = this;\n _finalizers && arrRemove(_finalizers, teardown);\n\n if (teardown instanceof Subscription) {\n teardown._removeParent(this);\n }\n }\n}\n\nexport const EMPTY_SUBSCRIPTION = Subscription.EMPTY;\n\nexport function isSubscription(value: any): value is Subscription {\n return (\n value instanceof Subscription ||\n (value && 'closed' in value && isFunction(value.remove) && isFunction(value.add) && isFunction(value.unsubscribe))\n );\n}\n\nfunction execFinalizer(finalizer: Unsubscribable | (() => void)) {\n if (isFunction(finalizer)) {\n finalizer();\n } else {\n finalizer.unsubscribe();\n }\n}\n", "import { Subscriber } from './Subscriber';\nimport { ObservableNotification } from './types';\n\n/**\n * The {@link GlobalConfig} object for RxJS. It is used to configure things\n * like how to react on unhandled errors.\n */\nexport const config: GlobalConfig = {\n onUnhandledError: null,\n onStoppedNotification: null,\n Promise: undefined,\n useDeprecatedSynchronousErrorHandling: false,\n useDeprecatedNextContext: false,\n};\n\n/**\n * The global configuration object for RxJS, used to configure things\n * like how to react on unhandled errors. Accessible via {@link config}\n * object.\n */\nexport interface GlobalConfig {\n /**\n * A registration point for unhandled errors from RxJS. These are errors that\n * cannot were not handled by consuming code in the usual subscription path. 
For\n * example, if you have this configured, and you subscribe to an observable without\n * providing an error handler, errors from that subscription will end up here. This\n * will _always_ be called asynchronously on another job in the runtime. This is because\n * we do not want errors thrown in this user-configured handler to interfere with the\n * behavior of the library.\n */\n onUnhandledError: ((err: any) => void) | null;\n\n /**\n * A registration point for notifications that cannot be sent to subscribers because they\n * have completed, errored or have been explicitly unsubscribed. By default, next, complete\n * and error notifications sent to stopped subscribers are noops. However, sometimes callers\n * might want a different behavior. For example, with sources that attempt to report errors\n * to stopped subscribers, a caller can configure RxJS to throw an unhandled error instead.\n * This will _always_ be called asynchronously on another job in the runtime. This is because\n * we do not want errors thrown in this user-configured handler to interfere with the\n * behavior of the library.\n */\n onStoppedNotification: ((notification: ObservableNotification, subscriber: Subscriber) => void) | null;\n\n /**\n * The promise constructor used by default for {@link Observable#toPromise toPromise} and {@link Observable#forEach forEach}\n * methods.\n *\n * @deprecated As of version 8, RxJS will no longer support this sort of injection of a\n * Promise constructor. If you need a Promise implementation other than native promises,\n * please polyfill/patch Promise as you see appropriate. Will be removed in v8.\n */\n Promise?: PromiseConstructorLike;\n\n /**\n * If true, turns on synchronous error rethrowing, which is a deprecated behavior\n * in v6 and higher. This behavior enables bad patterns like wrapping a subscribe\n * call in a try/catch block. It also enables producer interference, a nasty bug\n * where a multicast can be broken for all observers by a downstream consumer with\n * an unhandled error. DO NOT USE THIS FLAG UNLESS IT'S NEEDED TO BUY TIME\n * FOR MIGRATION REASONS.\n *\n * @deprecated As of version 8, RxJS will no longer support synchronous throwing\n * of unhandled errors. All errors will be thrown on a separate call stack to prevent bad\n * behaviors described above. Will be removed in v8.\n */\n useDeprecatedSynchronousErrorHandling: boolean;\n\n /**\n * If true, enables an as-of-yet undocumented feature from v5: The ability to access\n * `unsubscribe()` via `this` context in `next` functions created in observers passed\n * to `subscribe`.\n *\n * This is being removed because the performance was severely problematic, and it could also cause\n * issues when types other than POJOs are passed to subscribe as subscribers, as they will likely have\n * their `this` context overwritten.\n *\n * @deprecated As of version 8, RxJS will no longer support altering the\n * context of next functions provided as part of an observer to Subscribe. Instead,\n * you will have access to a subscription or a signal or token that will allow you to do things like\n * unsubscribe and test closed status. 
Will be removed in v8.\n */\n useDeprecatedNextContext: boolean;\n}\n", "import type { TimerHandle } from './timerHandle';\ntype SetTimeoutFunction = (handler: () => void, timeout?: number, ...args: any[]) => TimerHandle;\ntype ClearTimeoutFunction = (handle: TimerHandle) => void;\n\ninterface TimeoutProvider {\n setTimeout: SetTimeoutFunction;\n clearTimeout: ClearTimeoutFunction;\n delegate:\n | {\n setTimeout: SetTimeoutFunction;\n clearTimeout: ClearTimeoutFunction;\n }\n | undefined;\n}\n\nexport const timeoutProvider: TimeoutProvider = {\n // When accessing the delegate, use the variable rather than `this` so that\n // the functions can be called without being bound to the provider.\n setTimeout(handler: () => void, timeout?: number, ...args) {\n const { delegate } = timeoutProvider;\n if (delegate?.setTimeout) {\n return delegate.setTimeout(handler, timeout, ...args);\n }\n return setTimeout(handler, timeout, ...args);\n },\n clearTimeout(handle) {\n const { delegate } = timeoutProvider;\n return (delegate?.clearTimeout || clearTimeout)(handle as any);\n },\n delegate: undefined,\n};\n", "import { config } from '../config';\nimport { timeoutProvider } from '../scheduler/timeoutProvider';\n\n/**\n * Handles an error on another job either with the user-configured {@link onUnhandledError},\n * or by throwing it on that new job so it can be picked up by `window.onerror`, `process.on('error')`, etc.\n *\n * This should be called whenever there is an error that is out-of-band with the subscription\n * or when an error hits a terminal boundary of the subscription and no error handler was provided.\n *\n * @param err the error to report\n */\nexport function reportUnhandledError(err: any) {\n timeoutProvider.setTimeout(() => {\n const { onUnhandledError } = config;\n if (onUnhandledError) {\n // Execute the user-configured error handler.\n onUnhandledError(err);\n } else {\n // Throw so it is picked up by the runtime's uncaught error mechanism.\n throw err;\n }\n });\n}\n", "/* tslint:disable:no-empty */\nexport function noop() { }\n", "import { CompleteNotification, NextNotification, ErrorNotification } from './types';\n\n/**\n * A completion object optimized for memory use and created to be the\n * same \"shape\" as other notifications in v8.\n * @internal\n */\nexport const COMPLETE_NOTIFICATION = (() => createNotification('C', undefined, undefined) as CompleteNotification)();\n\n/**\n * Internal use only. Creates an optimized error notification that is the same \"shape\"\n * as other notifications.\n * @internal\n */\nexport function errorNotification(error: any): ErrorNotification {\n return createNotification('E', undefined, error) as any;\n}\n\n/**\n * Internal use only. Creates an optimized next notification that is the same \"shape\"\n * as other notifications.\n * @internal\n */\nexport function nextNotification(value: T) {\n return createNotification('N', value, undefined) as NextNotification;\n}\n\n/**\n * Ensures that all notifications created internally have the same \"shape\" in v8.\n *\n * TODO: This is only exported to support a crazy legacy test in `groupBy`.\n * @internal\n */\nexport function createNotification(kind: 'N' | 'E' | 'C', value: any, error: any) {\n return {\n kind,\n value,\n error,\n };\n}\n", "import { config } from '../config';\n\nlet context: { errorThrown: boolean; error: any } | null = null;\n\n/**\n * Handles dealing with errors for super-gross mode. 
import { config } from '../config';
import { timeoutProvider } from '../scheduler/timeoutProvider';

/**
 * Handles an error on another job either with the user-configured {@link onUnhandledError},
 * or by throwing it on that new job so it can be picked up by `window.onerror`, `process.on('error')`, etc.
 *
 * This should be called whenever there is an error that is out-of-band with the subscription
 * or when an error hits a terminal boundary of the subscription and no error handler was provided.
 *
 * @param err the error to report
 */
export function reportUnhandledError(err: any) {
  timeoutProvider.setTimeout(() => {
    const { onUnhandledError } = config;
    if (onUnhandledError) {
      // Execute the user-configured error handler.
      onUnhandledError(err);
    } else {
      // Throw so it is picked up by the runtime's uncaught error mechanism.
      throw err;
    }
  });
}

/* tslint:disable:no-empty */
export function noop() { }

import { CompleteNotification, NextNotification, ErrorNotification } from './types';

/**
 * A completion object optimized for memory use and created to be the
 * same "shape" as other notifications in v8.
 * @internal
 */
export const COMPLETE_NOTIFICATION = (() => createNotification('C', undefined, undefined) as CompleteNotification)();

/**
 * Internal use only. Creates an optimized error notification that is the same "shape"
 * as other notifications.
 * @internal
 */
export function errorNotification(error: any): ErrorNotification {
  return createNotification('E', undefined, error) as any;
}

/**
 * Internal use only. Creates an optimized next notification that is the same "shape"
 * as other notifications.
 * @internal
 */
export function nextNotification<T>(value: T) {
  return createNotification('N', value, undefined) as NextNotification<T>;
}

/**
 * Ensures that all notifications created internally have the same "shape" in v8.
 *
 * TODO: This is only exported to support a crazy legacy test in `groupBy`.
 * @internal
 */
export function createNotification(kind: 'N' | 'E' | 'C', value: any, error: any) {
  return {
    kind,
    value,
    error,
  };
}

import { config } from '../config';

let context: { errorThrown: boolean; error: any } | null = null;

/**
 * Handles dealing with errors for super-gross mode. Creates a context in which
 * any synchronously thrown errors will be passed to {@link captureError}, which
 * will record the error such that it will be rethrown after the callback is complete.
 * TODO: Remove in v8
 * @param cb An immediately executed function.
 */
export function errorContext(cb: () => void) {
  if (config.useDeprecatedSynchronousErrorHandling) {
    const isRoot = !context;
    if (isRoot) {
      context = { errorThrown: false, error: null };
    }
    cb();
    if (isRoot) {
      const { errorThrown, error } = context!;
      context = null;
      if (errorThrown) {
        throw error;
      }
    }
  } else {
    // This is the general non-deprecated path for everyone that
    // isn't crazy enough to use super-gross mode (useDeprecatedSynchronousErrorHandling)
    cb();
  }
}

/**
 * Captures errors only in super-gross mode.
 * @param err the error to capture
 */
export function captureError(err: any) {
  if (config.useDeprecatedSynchronousErrorHandling && context) {
    context.errorThrown = true;
    context.error = err;
  }
}
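A short sketch of the deprecated behavior this machinery enables. With the flag on, an error from a subscription with no error handler is captured by `captureError` and rethrown synchronously by `errorContext`, so a `try`/`catch` around `subscribe` "works" again — for migration only:

```ts
import { config, throwError } from 'rxjs';

// Deprecated escape hatch; will be removed in v8.
config.useDeprecatedSynchronousErrorHandling = true;

try {
  throwError(() => new Error('boom')).subscribe(); // no error handler supplied
} catch (err) {
  console.error('Caught synchronously:', err); // only reachable with the flag on
}
```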
import { isFunction } from './util/isFunction';
import { Observer, ObservableNotification } from './types';
import { isSubscription, Subscription } from './Subscription';
import { config } from './config';
import { reportUnhandledError } from './util/reportUnhandledError';
import { noop } from './util/noop';
import { nextNotification, errorNotification, COMPLETE_NOTIFICATION } from './NotificationFactories';
import { timeoutProvider } from './scheduler/timeoutProvider';
import { captureError } from './util/errorContext';

/**
 * Implements the {@link Observer} interface and extends the
 * {@link Subscription} class. While the {@link Observer} is the public API for
 * consuming the values of an {@link Observable}, all Observers get converted to
 * a Subscriber, in order to provide Subscription-like capabilities such as
 * `unsubscribe`. Subscriber is a common type in RxJS, and crucial for
 * implementing operators, but it is rarely used as a public API.
 *
 * @class Subscriber<T>
 */
export class Subscriber<T> extends Subscription implements Observer<T> {
  /**
   * A static factory for a Subscriber, given a (potentially partial) definition
   * of an Observer.
   * @param next The `next` callback of an Observer.
   * @param error The `error` callback of an Observer.
   * @param complete The `complete` callback of an Observer.
   * @return A Subscriber wrapping the (partially defined)
   * Observer represented by the given arguments.
   * @nocollapse
   * @deprecated Do not use. Will be removed in v8. There is no replacement for this
   * method, and there is no reason to be creating instances of `Subscriber` directly.
   * If you have a specific use case, please file an issue.
   */
  static create<T>(next?: (x?: T) => void, error?: (e?: any) => void, complete?: () => void): Subscriber<T> {
    return new SafeSubscriber(next, error, complete);
  }

  /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */
  protected isStopped: boolean = false;
  /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */
  protected destination: Subscriber<any> | Observer<any>; // this `any` is the escape hatch to erase extra type param (e.g. R)

  /**
   * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   * There is no reason to directly create an instance of Subscriber. This type is exported for typings reasons.
   */
  constructor(destination?: Subscriber<any> | Observer<any>) {
    super();
    if (destination) {
      this.destination = destination;
      // Automatically chain subscriptions together here.
      // if destination is a Subscription, then it is a Subscriber.
      if (isSubscription(destination)) {
        destination.add(this);
      }
    } else {
      this.destination = EMPTY_OBSERVER;
    }
  }

  /**
   * The {@link Observer} callback to receive notifications of type `next` from
   * the Observable, with a value. The Observable may call this method 0 or more
   * times.
   * @param {T} [value] The `next` value.
   * @return {void}
   */
  next(value?: T): void {
    if (this.isStopped) {
      handleStoppedNotification(nextNotification(value), this);
    } else {
      this._next(value!);
    }
  }

  /**
   * The {@link Observer} callback to receive notifications of type `error` from
   * the Observable, with an attached `Error`. Notifies the Observer that
   * the Observable has experienced an error condition.
   * @param {any} [err] The `error` exception.
   * @return {void}
   */
  error(err?: any): void {
    if (this.isStopped) {
      handleStoppedNotification(errorNotification(err), this);
    } else {
      this.isStopped = true;
      this._error(err);
    }
  }

  /**
   * The {@link Observer} callback to receive a valueless notification of type
   * `complete` from the Observable. Notifies the Observer that the Observable
   * has finished sending push-based notifications.
   * @return {void}
   */
  complete(): void {
    if (this.isStopped) {
      handleStoppedNotification(COMPLETE_NOTIFICATION, this);
    } else {
      this.isStopped = true;
      this._complete();
    }
  }

  unsubscribe(): void {
    if (!this.closed) {
      this.isStopped = true;
      super.unsubscribe();
      this.destination = null!;
    }
  }

  protected _next(value: T): void {
    this.destination.next(value);
  }

  protected _error(err: any): void {
    try {
      this.destination.error(err);
    } finally {
      this.unsubscribe();
    }
  }

  protected _complete(): void {
    try {
      this.destination.complete();
    } finally {
      this.unsubscribe();
    }
  }
}
/**
 * This bind is captured here because we want to be able to have
 * compatibility with monoid libraries that tend to use a method named
 * `bind`. In particular, a library called Monio requires this.
 */
const _bind = Function.prototype.bind;

function bind<Fn extends (...args: any[]) => any>(fn: Fn, thisArg: any): Fn {
  return _bind.call(fn, thisArg);
}

/**
 * Internal optimization only, DO NOT EXPOSE.
 * @internal
 */
class ConsumerObserver<T> implements Observer<T> {
  constructor(private partialObserver: Partial<Observer<T>>) {}

  next(value: T): void {
    const { partialObserver } = this;
    if (partialObserver.next) {
      try {
        partialObserver.next(value);
      } catch (error) {
        handleUnhandledError(error);
      }
    }
  }

  error(err: any): void {
    const { partialObserver } = this;
    if (partialObserver.error) {
      try {
        partialObserver.error(err);
      } catch (error) {
        handleUnhandledError(error);
      }
    } else {
      handleUnhandledError(err);
    }
  }

  complete(): void {
    const { partialObserver } = this;
    if (partialObserver.complete) {
      try {
        partialObserver.complete();
      } catch (error) {
        handleUnhandledError(error);
      }
    }
  }
}
export class SafeSubscriber<T> extends Subscriber<T> {
  constructor(
    observerOrNext?: Partial<Observer<T>> | ((value: T) => void) | null,
    error?: ((e?: any) => void) | null,
    complete?: (() => void) | null
  ) {
    super();

    let partialObserver: Partial<Observer<T>>;
    if (isFunction(observerOrNext) || !observerOrNext) {
      // The first argument is a function, not an observer. The next
      // two arguments *could* be observers, or they could be empty.
      partialObserver = {
        next: (observerOrNext ?? undefined) as (((value: T) => void) | undefined),
        error: error ?? undefined,
        complete: complete ?? undefined,
      };
    } else {
      // The first argument is a partial observer.
      let context: any;
      if (this && config.useDeprecatedNextContext) {
        // This is a deprecated path that made `this.unsubscribe()` available in
        // next handler functions passed to subscribe. This only exists behind a flag
        // now, as it is *very* slow.
        context = Object.create(observerOrNext);
        context.unsubscribe = () => this.unsubscribe();
        partialObserver = {
          next: observerOrNext.next && bind(observerOrNext.next, context),
          error: observerOrNext.error && bind(observerOrNext.error, context),
          complete: observerOrNext.complete && bind(observerOrNext.complete, context),
        };
      } else {
        // The "normal" path. Just use the partial observer directly.
        partialObserver = observerOrNext;
      }
    }

    // Wrap the partial observer to ensure it's a full observer, and
    // make sure proper error handling is accounted for.
    this.destination = new ConsumerObserver(partialObserver);
  }
}

function handleUnhandledError(error: any) {
  if (config.useDeprecatedSynchronousErrorHandling) {
    captureError(error);
  } else {
    // Ideal path, we report this as an unhandled error,
    // which is thrown on a new call stack.
    reportUnhandledError(error);
  }
}

/**
 * An error handler used when no error handler was supplied
 * to the SafeSubscriber -- meaning no error handler was supplied
 * to the `subscribe` call on our observable.
 * @param err The error to handle
 */
function defaultErrorHandler(err: any) {
  throw err;
}

/**
 * A handler for notifications that cannot be sent to a stopped subscriber.
 * @param notification The notification being sent
 * @param subscriber The stopped subscriber
 */
function handleStoppedNotification(notification: ObservableNotification<any>, subscriber: Subscriber<any>) {
  const { onStoppedNotification } = config;
  onStoppedNotification && timeoutProvider.setTimeout(() => onStoppedNotification(notification, subscriber));
}

/**
 * The observer used as a stub for subscriptions where the user did not
 * pass any arguments to `subscribe`. Comes with the default error handling
 * behavior.
 */
export const EMPTY_OBSERVER: Readonly<Observer<any>> & { closed: true } = {
  closed: true,
  next: noop,
  error: defaultErrorHandler,
  complete: noop,
};
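Tying the pieces together: `Subscriber.next`/`error`/`complete` route post-terminal notifications to `handleStoppedNotification`, which only does something if `config.onStoppedNotification` is set. A small sketch of observing that path (the misbehaving source is contrived for illustration):

```ts
import { config, Observable } from 'rxjs';

// By default, notifications sent after termination are no-ops.
// This hook makes them visible, e.g. for debugging misbehaving sources.
config.onStoppedNotification = (notification) => {
  console.warn(`Late "${notification.kind}" notification after termination`);
};

const source$ = new Observable<number>((subscriber) => {
  subscriber.next(1);
  subscriber.complete();
  subscriber.next(2); // stopped: routed asynchronously to onStoppedNotification
});

source$.subscribe((value) => console.log(value)); // logs 1, then warns about the late "N"
```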
/**
 * Symbol.observable or a string "@@observable". Used for interop
 *
 * @deprecated We will no longer be exporting this symbol in upcoming versions of RxJS.
 * Instead polyfill and use Symbol.observable directly *or* use https://www.npmjs.com/package/symbol-observable
 */
export const observable: string | symbol = (() => (typeof Symbol === 'function' && Symbol.observable) || '@@observable')();

/**
 * This function takes one parameter and just returns it. Simply put,
 * this is like `(x: T): T => x`.
 *
 * ## Examples
 *
 * This is useful in some cases when using things like `mergeMap`
 *
 * ```ts
 * import { interval, take, map, range, mergeMap, identity } from 'rxjs';
 *
 * const source$ = interval(1000).pipe(take(5));
 *
 * const result$ = source$.pipe(
 *   map(i => range(i)),
 *   mergeMap(identity) // same as mergeMap(x => x)
 * );
 *
 * result$.subscribe({
 *   next: console.log
 * });
 * ```
 *
 * Or when you want to selectively apply an operator
 *
 * ```ts
 * import { interval, take, identity } from 'rxjs';
 *
 * const shouldLimit = () => Math.random() < 0.5;
 *
 * const source$ = interval(1000);
 *
 * const result$ = source$.pipe(shouldLimit() ? take(5) : identity);
 *
 * result$.subscribe({
 *   next: console.log
 * });
 * ```
 *
 * @param x Any value that is returned by this function
 * @returns The value passed as the first parameter to this function
 */
export function identity<T>(x: T): T {
  return x;
}

import { identity } from './identity';
import { UnaryFunction } from '../types';

export function pipe(): typeof identity;
export function pipe<T, A>(fn1: UnaryFunction<T, A>): UnaryFunction<T, A>;
export function pipe<T, A, B>(fn1: UnaryFunction<T, A>, fn2: UnaryFunction<A, B>): UnaryFunction<T, B>;
export function pipe<T, A, B, C>(fn1: UnaryFunction<T, A>, fn2: UnaryFunction<A, B>, fn3: UnaryFunction<B, C>): UnaryFunction<T, C>;
export function pipe<T, A, B, C, D>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>
): UnaryFunction<T, D>;
export function pipe<T, A, B, C, D, E>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>
): UnaryFunction<T, E>;
export function pipe<T, A, B, C, D, E, F>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>
): UnaryFunction<T, F>;
export function pipe<T, A, B, C, D, E, F, G>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>,
  fn7: UnaryFunction<F, G>
): UnaryFunction<T, G>;
export function pipe<T, A, B, C, D, E, F, G, H>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>,
  fn7: UnaryFunction<F, G>,
  fn8: UnaryFunction<G, H>
): UnaryFunction<T, H>;
export function pipe<T, A, B, C, D, E, F, G, H, I>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>,
  fn7: UnaryFunction<F, G>,
  fn8: UnaryFunction<G, H>,
  fn9: UnaryFunction<H, I>
): UnaryFunction<T, I>;
export function pipe<T, A, B, C, D, E, F, G, H, I>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>,
  fn7: UnaryFunction<F, G>,
  fn8: UnaryFunction<G, H>,
  fn9: UnaryFunction<H, I>,
  ...fns: UnaryFunction<any, any>[]
): UnaryFunction<T, unknown>;

/**
 * pipe() can be called on one or more functions, each of which can take one argument ("UnaryFunction")
 * and uses it to return a value.
 * It returns a function that takes one argument, passes it to the first UnaryFunction, and then
 * passes the result to the next one, passes that result to the next one, and so on.
 */
export function pipe(...fns: Array<UnaryFunction<any, any>>): UnaryFunction<any, any> {
  return pipeFromArray(fns);
}

/** @internal */
export function pipeFromArray<T, R>(fns: Array<UnaryFunction<T, R>>): UnaryFunction<T, R> {
  if (fns.length === 0) {
    return identity as UnaryFunction<any, any>;
  }

  if (fns.length === 1) {
    return fns[0];
  }

  return function piped(input: T): R {
    return fns.reduce((prev: any, fn: UnaryFunction<T, R>) => fn(prev), input as any);
  };
}
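Note that the standalone `pipe` utility composes plain unary functions and involves no Observable at all; a small usage sketch:

```ts
import { pipe } from 'rxjs';

// Left-to-right function composition: the output of each
// function feeds the input of the next.
const shout = pipe(
  (s: string) => s.toUpperCase(),
  (s) => `${s}!!!`
);

console.log(shout('hello')); // "HELLO!!!"
```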
import { Operator } from './Operator';
import { SafeSubscriber, Subscriber } from './Subscriber';
import { isSubscription, Subscription } from './Subscription';
import { TeardownLogic, OperatorFunction, Subscribable, Observer } from './types';
import { observable as Symbol_observable } from './symbol/observable';
import { pipeFromArray } from './util/pipe';
import { config } from './config';
import { isFunction } from './util/isFunction';
import { errorContext } from './util/errorContext';

/**
 * A representation of any set of values over any amount of time. This is the most basic building block
 * of RxJS.
 *
 * @class Observable<T>
 */
export class Observable<T> implements Subscribable<T> {
  /**
   * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   */
  source: Observable<any> | undefined;

  /**
   * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   */
  operator: Operator<any, T> | undefined;

  /**
   * @constructor
   * @param {Function} subscribe the function that is called when the Observable is
   * initially subscribed to. This function is given a Subscriber, to which new values
   * can be `next`ed, or an `error` method can be called to raise an error, or
   * `complete` can be called to notify of a successful completion.
   */
  constructor(subscribe?: (this: Observable<T>, subscriber: Subscriber<T>) => TeardownLogic) {
    if (subscribe) {
      this._subscribe = subscribe;
    }
  }

  // HACK: Since TypeScript inherits static properties too, we have to
  // fight against TypeScript here so Subject can have a different static create signature
  /**
   * Creates a new Observable by calling the Observable constructor
   * @owner Observable
   * @method create
   * @param {Function} subscribe? the subscriber function to be passed to the Observable constructor
   * @return {Observable} a new observable
   * @nocollapse
   * @deprecated Use `new Observable()` instead. Will be removed in v8.
   */
  static create: (...args: any[]) => any = <T>(subscribe?: (subscriber: Subscriber<T>) => TeardownLogic) => {
    return new Observable<T>(subscribe);
  };

  /**
   * Creates a new Observable, with this Observable instance as the source, and the passed
   * operator defined as the new observable's operator.
   * @method lift
   * @param operator the operator defining the operation to take on the observable
   * @return a new observable with the Operator applied
   * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   * If you have implemented an operator using `lift`, it is recommended that you create an
   * operator by simply returning `new Observable()` directly. See "Creating new operators from
   * scratch" section here: https://rxjs.dev/guide/operators
   */
  lift<R>(operator?: Operator<T, R>): Observable<R> {
    const observable = new Observable<R>();
    observable.source = this;
    observable.operator = operator;
    return observable;
  }

  subscribe(observerOrNext?: Partial<Observer<T>> | ((value: T) => void)): Subscription;
  /** @deprecated Instead of passing separate callback arguments, use an observer argument. Signatures taking separate callback arguments will be removed in v8. Details: https://rxjs.dev/deprecations/subscribe-arguments */
  subscribe(next?: ((value: T) => void) | null, error?: ((error: any) => void) | null, complete?: (() => void) | null): Subscription;
  /**
   * Invokes an execution of an Observable and registers Observer handlers for notifications it will emit.
   *
   * Use it when you have all these Observables, but still nothing is happening.
   *
   * `subscribe` is not a regular operator, but a method that calls Observable's internal `subscribe` function. It
   * might be for example a function that you passed to Observable's constructor, but most of the time it is
   * a library implementation, which defines what will be emitted by an Observable, and when it will be emitted. This means
   * that calling `subscribe` is actually the moment when the Observable starts its work, not when it is created, as is
   * often thought.
   *
   * Apart from starting the execution of an Observable, this method allows you to listen for values
   * that an Observable emits, as well as for when it completes or errors. You can achieve this in two
   * of the following ways.
   *
   * The first way is creating an object that implements the {@link Observer} interface. It should have methods
   * defined by that interface, but note that it should be just a regular JavaScript object, which you can create
   * yourself in any way you want (ES6 class, classic function constructor, object literal etc.). In particular, do
   * not attempt to use any RxJS implementation details to create Observers - you don't need them. Remember also
   * that your object does not have to implement all methods. If you find yourself creating a method that doesn't
   * do anything, you can simply omit it. Note however, if the `error` method is not provided and an error happens,
   * it will be thrown asynchronously. Errors thrown asynchronously cannot be caught using `try`/`catch`. Instead,
   * use the {@link onUnhandledError} configuration option or use a runtime handler (like `window.onerror` or
   * `process.on('error')`) to be notified of unhandled errors. Because of this, it's recommended that you provide
   * an `error` method to avoid missing thrown errors.
   *
   * The second way is to give up on the Observer object altogether and simply provide callback functions in place of its methods.
   * This means you can provide three functions as arguments to `subscribe`, where the first function is the equivalent
   * of a `next` method, the second of an `error` method and the third of a `complete` method. Just as in the case of an Observer,
   * if you do not need to listen for something, you can omit a function by passing `undefined` or `null`,
   * since `subscribe` recognizes these functions by where they were placed in the function call. When it comes
   * to the `error` function, as with an Observer, if not provided, errors emitted by an Observable will be thrown asynchronously.
   *
   * You can, however, subscribe with no parameters at all. This may be the case where you're not interested in terminal events
   * and you also handled emissions internally by using operators (e.g. using `tap`).
   *
   * Whichever style of calling `subscribe` you use, in both cases it returns a Subscription object.
   * This object allows you to call `unsubscribe` on it, which in turn will stop the work that an Observable does and will clean
   * up all resources that an Observable used. Note that cancelling a subscription will not call the `complete` callback
   * provided to `subscribe`, which is reserved for a regular completion signal that comes from an Observable.
   *
   * Remember that callbacks provided to `subscribe` are not guaranteed to be called asynchronously.
   * It is an Observable itself that decides when these functions will be called. For example {@link of}
   * by default emits all its values synchronously. Always check documentation for how a given Observable
   * will behave when subscribed and if its default behavior can be modified with a `scheduler`.
   *
   * #### Examples
   *
   * Subscribe with an {@link guide/observer Observer}
   *
   * ```ts
   * import { of } from 'rxjs';
   *
   * const sumObserver = {
   *   sum: 0,
   *   next(value) {
   *     console.log('Adding: ' + value);
   *     this.sum = this.sum + value;
   *   },
   *   error() {
   *     // We actually could just remove this method,
   *     // since we do not really care about errors right now.
   *   },
   *   complete() {
   *     console.log('Sum equals: ' + this.sum);
   *   }
   * };
   *
   * of(1, 2, 3) // Synchronously emits 1, 2, 3 and then completes.
   *   .subscribe(sumObserver);
   *
   * // Logs:
   * // 'Adding: 1'
   * // 'Adding: 2'
   * // 'Adding: 3'
   * // 'Sum equals: 6'
   * ```
   *
   * Subscribe with functions ({@link deprecations/subscribe-arguments deprecated})
   *
   * ```ts
   * import { of } from 'rxjs'
   *
   * let sum = 0;
   *
   * of(1, 2, 3).subscribe(
   *   value => {
   *     console.log('Adding: ' + value);
   *     sum = sum + value;
   *   },
   *   undefined,
   *   () => console.log('Sum equals: ' + sum)
   * );
   *
   * // Logs:
   * // 'Adding: 1'
   * // 'Adding: 2'
   * // 'Adding: 3'
   * // 'Sum equals: 6'
   * ```
   *
   * Cancel a subscription
   *
   * ```ts
   * import { interval } from 'rxjs';
   *
   * const subscription = interval(1000).subscribe({
   *   next(num) {
   *     console.log(num)
   *   },
   *   complete() {
   *     // Will not be called, even when cancelling subscription.
   *     console.log('completed!');
   *   }
   * });
   *
   * setTimeout(() => {
   *   subscription.unsubscribe();
   *   console.log('unsubscribed!');
   * }, 2500);
   *
   * // Logs:
   * // 0 after 1s
   * // 1 after 2s
   * // 'unsubscribed!' after 2.5s
   * ```
   *
   * @param {Observer|Function} observerOrNext (optional) Either an observer with methods to be called,
   * or the first of three possible handlers, which is the handler for each value emitted from the subscribed
   * Observable.
   * @param {Function} error (optional) A handler for a terminal event resulting from an error. If no error handler is provided,
   * the error will be thrown asynchronously as unhandled.
   * @param {Function} complete (optional) A handler for a terminal event resulting from successful completion.
   * @return {Subscription} a subscription reference to the registered handlers
   * @method subscribe
   */
  subscribe(
    observerOrNext?: Partial<Observer<T>> | ((value: T) => void) | null,
    error?: ((error: any) => void) | null,
    complete?: (() => void) | null
  ): Subscription {
    const subscriber = isSubscriber(observerOrNext) ? observerOrNext : new SafeSubscriber(observerOrNext, error, complete);

    errorContext(() => {
      const { operator, source } = this;
      subscriber.add(
        operator
          ? // We're dealing with a subscription in the
            // operator chain to one of our lifted operators.
            operator.call(subscriber, source)
          : source
          ? // If `source` has a value, but `operator` does not, something that
            // had intimate knowledge of our API, like our `Subject`, must have
            // set it. We're going to just call `_subscribe` directly.
            this._subscribe(subscriber)
          : // In all other cases, we're likely wrapping a user-provided initializer
            // function, so we need to catch errors and handle them appropriately.
            this._trySubscribe(subscriber)
      );
    });

    return subscriber;
  }
  /** @internal */
  protected _trySubscribe(sink: Subscriber<T>): TeardownLogic {
    try {
      return this._subscribe(sink);
    } catch (err) {
      // We don't need to return anything in this case,
      // because it's just going to try to `add()` to a subscription
      // above.
      sink.error(err);
    }
  }

  /**
   * Used as a NON-CANCELLABLE means of subscribing to an observable, for use with
   * APIs that expect promises, like `async/await`. You cannot unsubscribe from this.
   *
   * **WARNING**: Only use this with observables you *know* will complete. If the source
   * observable does not complete, you will end up with a promise that is hung up, and
   * potentially all of the state of an async function hanging out in memory. To avoid
   * this situation, look into adding something like {@link timeout}, {@link take},
   * {@link takeWhile}, or {@link takeUntil} amongst others.
   *
   * #### Example
   *
   * ```ts
   * import { interval, take } from 'rxjs';
   *
   * const source$ = interval(1000).pipe(take(4));
   *
   * async function getTotal() {
   *   let total = 0;
   *
   *   await source$.forEach(value => {
   *     total += value;
   *     console.log('observable -> ' + value);
   *   });
   *
   *   return total;
   * }
   *
   * getTotal().then(
   *   total => console.log('Total: ' + total)
   * );
   *
   * // Expected:
   * // 'observable -> 0'
   * // 'observable -> 1'
   * // 'observable -> 2'
   * // 'observable -> 3'
   * // 'Total: 6'
   * ```
   *
   * @param next a handler for each value emitted by the observable
   * @return a promise that either resolves on observable completion or
   * rejects with the handled error
   */
  forEach(next: (value: T) => void): Promise<void>;

  /**
   * @param next a handler for each value emitted by the observable
   * @param promiseCtor a constructor function used to instantiate the Promise
   * @return a promise that either resolves on observable completion or
   * rejects with the handled error
   * @deprecated Passing a Promise constructor will no longer be available
   * in upcoming versions of RxJS. This is because it adds weight to the library, for very
   * little benefit. If you need this functionality, it is recommended that you either
   * polyfill Promise, or you create an adapter to convert the returned native promise
   * to whatever promise implementation you wanted. Will be removed in v8.
   */
  forEach(next: (value: T) => void, promiseCtor: PromiseConstructorLike): Promise<void>;

  forEach(next: (value: T) => void, promiseCtor?: PromiseConstructorLike): Promise<void> {
    promiseCtor = getPromiseCtor(promiseCtor);

    return new promiseCtor<void>((resolve, reject) => {
      const subscriber = new SafeSubscriber<T>({
        next: (value) => {
          try {
            next(value);
          } catch (err) {
            reject(err);
            subscriber.unsubscribe();
          }
        },
        error: reject,
        complete: resolve,
      });
      this.subscribe(subscriber);
    }) as Promise<void>;
  }

  /** @internal */
  protected _subscribe(subscriber: Subscriber<any>): TeardownLogic {
    return this.source?.subscribe(subscriber);
  }

  /**
   * An interop point defined by the es7-observable spec https://github.com/zenparsing/es-observable
   * @method Symbol.observable
   * @return {Observable} this instance of the observable
   */
  [Symbol_observable]() {
    return this;
  }

  /* tslint:disable:max-line-length */
  pipe(): Observable<T>;
  pipe<A>(op1: OperatorFunction<T, A>): Observable<A>;
  pipe<A, B>(op1: OperatorFunction<T, A>, op2: OperatorFunction<A, B>): Observable<B>;
  pipe<A, B, C>(op1: OperatorFunction<T, A>, op2: OperatorFunction<A, B>, op3: OperatorFunction<B, C>): Observable<C>;
  pipe<A, B, C, D>(
    op1: OperatorFunction<T, A>,
    op2: OperatorFunction<A, B>,
    op3: OperatorFunction<B, C>,
    op4: OperatorFunction<C, D>
  ): Observable<D>;
  pipe<A, B, C, D, E>(
    op1: OperatorFunction<T, A>,
    op2: OperatorFunction<A, B>,
    op3: OperatorFunction<B, C>,
    op4: OperatorFunction<C, D>,
    op5: OperatorFunction<D, E>
  ): Observable<E>;
  pipe<A, B, C, D, E, F>(
    op1: OperatorFunction<T, A>,
    op2: OperatorFunction<A, B>,
    op3: OperatorFunction<B, C>,
    op4: OperatorFunction<C, D>,
    op5: OperatorFunction<D, E>,
    op6: OperatorFunction<E, F>
  ): Observable<F>;
  pipe<A, B, C, D, E, F, G>(
    op1: OperatorFunction<T, A>,
    op2: OperatorFunction<A, B>,
    op3: OperatorFunction<B, C>,
    op4: OperatorFunction<C, D>,
    op5: OperatorFunction<D, E>,
    op6: OperatorFunction<E, F>,
    op7: OperatorFunction<F, G>
  ): Observable<G>;
  pipe<A, B, C, D, E, F, G, H>(
    op1: OperatorFunction<T, A>,
    op2: OperatorFunction<A, B>,
    op3: OperatorFunction<B, C>,
    op4: OperatorFunction<C, D>,
    op5: OperatorFunction<D, E>,
    op6: OperatorFunction<E, F>,
    op7: OperatorFunction<F, G>,
    op8: OperatorFunction<G, H>
  ): Observable<H>;
  pipe<A, B, C, D, E, F, G, H, I>(
    op1: OperatorFunction<T, A>,
    op2: OperatorFunction<A, B>,
    op3: OperatorFunction<B, C>,
    op4: OperatorFunction<C, D>,
    op5: OperatorFunction<D, E>,
    op6: OperatorFunction<E, F>,
    op7: OperatorFunction<F, G>,
    op8: OperatorFunction<G, H>,
    op9: OperatorFunction<H, I>
  ): Observable<I>;
  pipe<A, B, C, D, E, F, G, H, I>(
    op1: OperatorFunction<T, A>,
    op2: OperatorFunction<A, B>,
    op3: OperatorFunction<B, C>,
    op4: OperatorFunction<C, D>,
    op5: OperatorFunction<D, E>,
    op6: OperatorFunction<E, F>,
    op7: OperatorFunction<F, G>,
    op8: OperatorFunction<G, H>,
    op9: OperatorFunction<H, I>,
    ...operations: OperatorFunction<any, any>[]
  ): Observable<unknown>;
  /* tslint:enable:max-line-length */

  /**
   * Used to stitch together functional operators into a chain.
   * @method pipe
   * @return {Observable} the Observable result of all of the operators having
   * been called in the order they were passed in.
   *
   * ## Example
   *
   * ```ts
   * import { interval, filter, map, scan } from 'rxjs';
   *
   * interval(1000)
   *   .pipe(
   *     filter(x => x % 2 === 0),
   *     map(x => x + x),
   *     scan((acc, x) => acc + x)
   *   )
   *   .subscribe(x => console.log(x));
   * ```
   */
  pipe(...operations: OperatorFunction<any, any>[]): Observable<any> {
    return pipeFromArray(operations)(this);
  }

  /* tslint:disable:max-line-length */
  /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise */
  toPromise(): Promise<T | undefined>;
  /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise */
  toPromise(PromiseCtor: typeof Promise): Promise<T | undefined>;
  /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise */
  toPromise(PromiseCtor: PromiseConstructorLike): Promise<T | undefined>;
  /* tslint:enable:max-line-length */

  /**
   * Subscribe to this Observable and get a Promise resolving on
   * `complete` with the last emission (if any).
   *
   * **WARNING**: Only use this with observables you *know* will complete. If the source
   * observable does not complete, you will end up with a promise that is hung up, and
   * potentially all of the state of an async function hanging out in memory. To avoid
   * this situation, look into adding something like {@link timeout}, {@link take},
   * {@link takeWhile}, or {@link takeUntil} amongst others.
   *
   * @method toPromise
   * @param [promiseCtor] a constructor function used to instantiate
   * the Promise
   * @return A Promise that resolves with the last value emitted, or
   * rejects on an error. If there were no emissions, the Promise
   * resolves with undefined.
   * @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise
   */
  toPromise(promiseCtor?: PromiseConstructorLike): Promise<T | undefined> {
    promiseCtor = getPromiseCtor(promiseCtor);

    return new promiseCtor((resolve, reject) => {
      let value: T | undefined;
      this.subscribe(
        (x: T) => (value = x),
        (err: any) => reject(err),
        () => resolve(value)
      );
    }) as Promise<T | undefined>;
  }
}

/**
 * Decides between a passed promise constructor from consuming code,
 * a default configured promise constructor, and the native promise
 * constructor, and returns it. If nothing can be found, it will throw
 * an error.
 * @param promiseCtor The optional promise constructor passed by consuming code
 */
function getPromiseCtor(promiseCtor: PromiseConstructorLike | undefined) {
  return promiseCtor ?? config.Promise ?? Promise;
}

function isObserver<T>(value: any): value is Observer<T> {
  return value && isFunction(value.next) && isFunction(value.error) && isFunction(value.complete);
}

function isSubscriber<T>(value: any): value is Subscriber<T> {
  return (value && value instanceof Subscriber) || (isObserver(value) && isSubscription(value));
}

import { Observable } from '../Observable';
import { Subscriber } from '../Subscriber';
import { OperatorFunction } from '../types';
import { isFunction } from './isFunction';

/**
 * Used to determine if an object is an Observable with a lift function.
 */
export function hasLift(source: any): source is { lift: InstanceType<typeof Observable>['lift'] } {
  return isFunction(source?.lift);
}

/**
 * Creates an `OperatorFunction`. Used to define operators throughout the library in a concise way.
 * @param init The logic to connect the liftedSource to the subscriber at the moment of subscription.
 */
export function operate<T, R>(
  init: (liftedSource: Observable<T>, subscriber: Subscriber<R>) => (() => void) | void
): OperatorFunction<T, R> {
  return (source: Observable<T>) => {
    if (hasLift(source)) {
      return source.lift(function (this: Subscriber<R>, liftedSource: Observable<T>) {
        try {
          return init(liftedSource, this);
        } catch (err) {
          this.error(err);
        }
      });
    }
    throw new TypeError('Unable to lift unknown Observable type');
  };
}
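A sketch of a trivial operator built on `operate`, to show how `init` connects the lifted source to the downstream subscriber. `operate` is an internal helper (the deep import path below is an internal detail, shown for illustration); in application code you would normally just use `map` from 'rxjs':

```ts
import { Observable, of } from 'rxjs';
import { operate } from 'rxjs/internal/util/lift'; // internal path, illustrative only

// A hypothetical operator that doubles each number it receives.
function double(): (source: Observable<number>) => Observable<number> {
  return operate<number, number>((liftedSource, subscriber) => {
    const sub = liftedSource.subscribe({
      next: (value) => subscriber.next(value * 2),
      error: (err) => subscriber.error(err),
      complete: () => subscriber.complete(),
    });
    // The returned teardown is added to the downstream subscriber by `subscribe`.
    return () => sub.unsubscribe();
  });
}

of(1, 2, 3).pipe(double()).subscribe(console.log); // 2, 4, 6
```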
import { Subscriber } from '../Subscriber';

/**
 * Creates an instance of an `OperatorSubscriber`.
 * @param destination The downstream subscriber.
 * @param onNext Handles next values, only called if this subscriber is not stopped or closed. Any
 * error that occurs in this function is caught and sent to the `error` method of this subscriber.
 * @param onError Handles errors from the subscription, any errors that occur in this handler are caught
 * and sent to the `destination` error handler.
 * @param onComplete Handles completion notification from the subscription. Any errors that occur in
 * this handler are sent to the `destination` error handler.
 * @param onFinalize Additional teardown logic here. This will only be called on teardown if the
 * subscriber itself is not already closed. This is called after all other teardown logic is executed.
 */
export function createOperatorSubscriber