docs: Consolidate "getting started" and "user guide" sections (#12246)
stinodego authored Nov 8, 2023
1 parent d963338 commit 444ae3a
Showing 19 changed files with 98 additions and 136 deletions.
31 changes: 0 additions & 31 deletions docs/getting-started/installation.md

This file was deleted.

16 changes: 0 additions & 16 deletions docs/getting-started/intro.md

This file was deleted.

9 changes: 0 additions & 9 deletions docs/index.md
@@ -32,15 +32,6 @@ Polars is a highly performant DataFrame library for manipulating structured data
- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.

-## About this guide
-
-The Polars user guide is intended to live alongside the API documentation. Its purpose is to explain (new) users how to use Polars and to provide meaningful examples. The guide is split into two parts:
-
-- [Getting started](getting-started/intro.md): A 10 minute helicopter view of the library and its primary function.
-- [User guide](user-guide/index.md): A detailed explanation of how the library is setup and how to use it most effectively.
-
-If you are looking for details on a specific level / object, it is probably best to go the API documentation: [Python](https://pola-rs.github.io/polars/py-polars/html/reference/index.html) | [Rust](https://docs.rs/polars/latest/polars/).
-
## Performance :rocket: :rocket:

Polars is very fast, and in fact is one of the best performing solutions available.
File renamed without changes.
File renamed without changes.
@@ -7,124 +7,124 @@
- `with_columns`
- `group_by`

-To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](../user-guide/concepts/contexts.md) and [Expressions](../user-guide/concepts/expressions.md).
+To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](../concepts/contexts.md) and [Expressions](../concepts/expressions.md).

### Select statement

To select a column we need to do two things. Define the `DataFrame` we want the data from. And second, select the data that we need. In the example below you see that we select `col('*')`. The asterisk stands for all columns.

-{{code_block('getting-started/expressions','select',['select'])}}
+{{code_block('user-guide/basics/expressions','select',['select'])}}

```python exec="on" result="text" session="getting-started/expressions"
---8<-- "python/getting-started/expressions.py:setup"
+--8<-- "python/user-guide/basics/expressions.py:setup"
print(
---8<-- "python/getting-started/expressions.py:select"
+--8<-- "python/user-guide/basics/expressions.py:select"
)
```

You can also specify the specific columns that you want to return. There are two ways to do this. The first option is to pass the column names, as seen below.

-{{code_block('getting-started/expressions','select2',['select'])}}
+{{code_block('user-guide/basics/expressions','select2',['select'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
---8<-- "python/getting-started/expressions.py:select2"
+--8<-- "python/user-guide/basics/expressions.py:select2"
)
```

The second option is to specify each column using `pl.col`. This option is shown below.

-{{code_block('getting-started/expressions','select3',['select'])}}
+{{code_block('user-guide/basics/expressions','select3',['select'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
---8<-- "python/getting-started/expressions.py:select3"
+--8<-- "python/user-guide/basics/expressions.py:select3"
)
```

If you want to exclude an entire column from your view, you can simply use `exclude` in your `select` statement.

-{{code_block('getting-started/expressions','exclude',['select'])}}
+{{code_block('user-guide/basics/expressions','exclude',['select'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
---8<-- "python/getting-started/expressions.py:exclude"
+--8<-- "python/user-guide/basics/expressions.py:exclude"
)
```

### Filter

The `filter` option allows us to create a subset of the `DataFrame`. We use the same `DataFrame` as earlier and we filter between two specified dates.

-{{code_block('getting-started/expressions','filter',['filter'])}}
+{{code_block('user-guide/basics/expressions','filter',['filter'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
---8<-- "python/getting-started/expressions.py:filter"
+--8<-- "python/user-guide/basics/expressions.py:filter"
)
```

With `filter` you can also create more complex filters that include multiple columns.

-{{code_block('getting-started/expressions','filter2',['filter'])}}
+{{code_block('user-guide/basics/expressions','filter2',['filter'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
---8<-- "python/getting-started/expressions.py:filter2"
+--8<-- "python/user-guide/basics/expressions.py:filter2"
)
```

### With_columns

`with_columns` allows you to create new columns for your analyses. We create two new columns `e` and `b+42`. First we sum all values from column `b` and store the results in column `e`. After that we add `42` to the values of `b`. Creating a new column `b+42` to store these results.

-{{code_block('getting-started/expressions','with_columns',['with_columns'])}}
+{{code_block('user-guide/basics/expressions','with_columns',['with_columns'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
---8<-- "python/getting-started/expressions.py:with_columns"
+--8<-- "python/user-guide/basics/expressions.py:with_columns"
)
```

### Group by

We will create a new `DataFrame` for the Group by functionality. This new `DataFrame` will include several 'groups' that we want to group by.

-{{code_block('getting-started/expressions','dataframe2',['DataFrame'])}}
+{{code_block('user-guide/basics/expressions','dataframe2',['DataFrame'])}}

```python exec="on" result="text" session="getting-started/expressions"
---8<-- "python/getting-started/expressions.py:dataframe2"
+--8<-- "python/user-guide/basics/expressions.py:dataframe2"
print(df2)
```

-{{code_block('getting-started/expressions','group_by',['group_by'])}}
+{{code_block('user-guide/basics/expressions','group_by',['group_by'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
---8<-- "python/getting-started/expressions.py:group_by"
+--8<-- "python/user-guide/basics/expressions.py:group_by"
)
```

-{{code_block('getting-started/expressions','group_by2',['group_by'])}}
+{{code_block('user-guide/basics/expressions','group_by2',['group_by'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
---8<-- "python/getting-started/expressions.py:group_by2"
+--8<-- "python/user-guide/basics/expressions.py:group_by2"
)
```

### Combining operations

Below are some examples on how to combine operations to create the `DataFrame` you require.

-{{code_block('getting-started/expressions','combine',['select','with_columns'])}}
+{{code_block('user-guide/basics/expressions','combine',['select','with_columns'])}}

```python exec="on" result="text" session="getting-started/expressions"
---8<-- "python/getting-started/expressions.py:combine"
+--8<-- "python/user-guide/basics/expressions.py:combine"
```

-{{code_block('getting-started/expressions','combine2',['select','with_columns'])}}
+{{code_block('user-guide/basics/expressions','combine2',['select','with_columns'])}}

```python exec="on" result="text" session="getting-started/expressions"
---8<-- "python/getting-started/expressions.py:combine2"
+--8<-- "python/user-guide/basics/expressions.py:combine2"
```
18 changes: 18 additions & 0 deletions docs/user-guide/basics/index.md
@@ -0,0 +1,18 @@
# Introduction

This chapter is intended for new Polars users.
The goal is to provide a quick overview of the most common functionality.
Feel free to skip ahead to the [next chapter](../concepts/data-types.md) to dive into the details.

!!! rust "Rust Users Only"

Due to historical reasons, the eager API in Rust is outdated. In the future, we would like to redesign it as a small wrapper around the lazy API (as is the design in Python / NodeJS). In the examples, we will use the lazy API instead with `.lazy()` and `.collect()`. For now you can ignore these two functions. If you want to know more about the lazy and eager API, go [here](../concepts/lazy-vs-eager.md).

To enable the Lazy API ensure you have the feature flag `lazy` configured when installing Polars
```
# Cargo.toml
[dependencies]
polars = { version = "x", features = ["lazy", ...]}
```

Because of the ownership ruling in Rust, we can not reuse the same `DataFrame` multiple times in the examples. For simplicity reasons we call `clone()` to overcome this issue. Note that this does not duplicate the data but just increments a pointer (`Arc`).
@@ -6,21 +6,21 @@ There are two ways `DataFrame`s can be combined depending on the use case: join

Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look on how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example.

-{{code_block('getting-started/joins','join',['join'])}}
+{{code_block('user-guide/basics/joins','join',['join'])}}

```python exec="on" result="text" session="getting-started/joins"
---8<-- "python/getting-started/joins.py:setup"
---8<-- "python/getting-started/joins.py:join"
+--8<-- "python/user-guide/basics/joins.py:setup"
+--8<-- "python/user-guide/basics/joins.py:join"
```

-To see more examples with other types of joins, go the [User Guide](../user-guide/transformations/joins.md).
+To see more examples with other types of joins, go the [User Guide](../transformations/joins.md).

## Concat

We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of an horizontal concatenation of our two `DataFrames`.

-{{code_block('getting-started/joins','hstack',['hstack'])}}
+{{code_block('user-guide/basics/joins','hstack',['hstack'])}}

```python exec="on" result="text" session="getting-started/joins"
---8<-- "python/getting-started/joins.py:hstack"
+--8<-- "python/user-guide/basics/joins.py:hstack"
```
@@ -2,44 +2,44 @@

Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). In the following examples we will show how to operate on most common file formats. For the following dataframe

-{{code_block('getting-started/reading-writing','dataframe',['DataFrame'])}}
+{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}}

```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/getting-started/reading-writing.py:dataframe"
+--8<-- "python/user-guide/basics/reading-writing.py:dataframe"
```

#### CSV

Polars has its own fast implementation for csv reading with many flexible configuration options.

-{{code_block('getting-started/reading-writing','csv',['read_csv','write_csv'])}}
+{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}}

```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/getting-started/reading-writing.py:csv"
+--8<-- "python/user-guide/basics/reading-writing.py:csv"
```

As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. The example can be found below:

-{{code_block('getting-started/reading-writing','csv2',['read_csv'])}}
+{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}}

```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/getting-started/reading-writing.py:csv2"
+--8<-- "python/user-guide/basics/reading-writing.py:csv2"
```

#### JSON

-{{code_block('getting-started/reading-writing','json',['read_json','write_json'])}}
+{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}}

```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/getting-started/reading-writing.py:json"
+--8<-- "python/user-guide/basics/reading-writing.py:json"
```

#### Parquet

-{{code_block('getting-started/reading-writing','parquet',['read_parquet','write_parquet'])}}
+{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}}

```python exec="on" result="text" session="getting-started/reading"
---8<-- "python/getting-started/reading-writing.py:parquet"
+--8<-- "python/user-guide/basics/reading-writing.py:parquet"
```

-To see more examples and other data formats go to the [User Guide](../user-guide/io/csv.md), section IO.
+To see more examples and other data formats go to the [User Guide](../io/csv.md), section IO.