Polish content of README.md file
stloyd authored Oct 13, 2023
1 parent b1d5ad7 commit 07caa39
Showing 1 changed file with 32 additions and 33 deletions.
65 changes: 32 additions & 33 deletions README.md
@@ -1,23 +1,38 @@
 ![img](docs/flow_php_banner_02_2022.png)
 
-Flow is a PHP based, strongly typed ETL (Extract Transform Load), asynchronous data processing library with constant memory consumption.
+Flow is a PHP-based, strongly typed ETL (Extract Transform Load), asynchronous data processing library with constant memory consumption.
 
 [![Latest Stable Version](https://poser.pugx.org/flow-php/flow/v)](https://packagist.org/packages/flow-php/flow)
 [![Latest Unstable Version](https://poser.pugx.org/flow-php/flow/v/unstable)](https://packagist.org/packages/flow-php/flow)
 [![License](https://poser.pugx.org/flow-php/flow/license)](https://packagist.org/packages/flow-php/flow)
 [![Test Suite](https://github.com/flow-php/flow/actions/workflows/test-suite.yml/badge.svg?branch=1.x)](https://github.com/flow-php/flow/actions/workflows/test-suite.yml)
 
-Supported PHP versions
+Supported PHP versions: [![PHP 8.1](https://img.shields.io/badge/php-~8.1-8892BF.svg)](https://php.net/) [![PHP 8.2](https://img.shields.io/badge/php-~8.2-8892BF.svg)](https://php.net/)
 
-* [![Supported PHP Version](https://img.shields.io/badge/php-~8.1-8892BF.svg)](https://php.net/)
-* [![Supported PHP Version](https://img.shields.io/badge/php-~8.2-8892BF.svg)](https://php.net/)
+## Features
+
+* low and constant memory consumption
+* asynchronous data processing
+* reading from any data source
+* writing to any data source
+* rich collection of data transformation functions
+* direct access to remote filesystems
+* partitioning
+* grouping & aggregating
+* remote file processing
+* joins
+* sorting
+* displaying datasets as ASCII table
+* validation against the schema
+* window functions
+* caching
 
 📈[Project Roadmap](https://github.com/orgs/flow-php/projects/1)
 
 ## Installation
 
 This package is a [monorepo](https://tomasvotruba.com/blog/2019/10/28/all-you-always-wanted-to-know-about-monorepo-but-were-afraid-to-ask/).
-Please check below packages and select only those that you are going to use,
+Please check the below packages and select only those that you are going to use,
 this will reduce the number of unnecessary dependencies in your project (less maintenance).
 
 - [ETL](src/core/etl/README.md)
@@ -38,10 +53,12 @@ this will reduce the number of unnecessary dependencies in your project (less ma
 - [text](src/adapter/etl-adapter-text/README.md)
 - [xml](src/adapter/etl-adapter-xml/README.md)
 - Libraries
-- [array-dot](src/lib/array-dot/README.md) - auto included
+- [array-dot](src/lib/array-dot/README.md)
 - [doctrine-dbal-bulk](src/lib/doctrine-dbal-bulk/README.md)
+- [Google Dremel algorithm](src/lib/dremel/README.md)
+- [Parquet](src/lib/parquet/README.md)
 
-For example if you want to work with json/csv files here are dependencies you will need to install:
+For example, if you want to work with JSON/CSV files here are the dependencies you will need to install:
 
 ```shell
 composer require flow-php/etl:^0.1 flow-php/etl-adapter-csv:^0.1 flow-php/etl-adapter-json:^0.1
@@ -53,40 +70,22 @@ In order to understand how Flow works, please read [documentation](src/core/etl/
 
 ### [Usage Examples](examples/README.md)
 
-## Features
-
-* low and constant memory consumption
-* asynchronous data processing
-* reading from any data source
-* writing to any data source
-* rich collection of data transformation functions
-* direct access to remote filesystems
-* partitioning
-* grouping & aggregating
-* remote files processing
-* joins
-* sorting
-* displaying datasets as ASCII table
-* validation against schema
-* window functions
-* caching
-
-## Asynchronous Processing
-
-* [etl-adapter-amphp](https://github.com/flow-php/etl-adapter-amphp)
-* [etl-adapter-reactphp](https://github.com/flow-php/etl-adapter-reactphp)
-
 ## Building blocks
 
 * DataFrame - Lazy data processing frame.
 * Rows - Immutable collection of `Row` objects.
 * Row - Immutable, strongly typed collection of `Entry` objects.
-* Entry - Immutable, strongly typed object representing cell in a row.
+* Entry - Immutable, strongly typed object representing a cell in a row.
 * **E**xtractor (Reader) - Memory safe, Data Source returning \Generator, yielding `Rows` to the `Pipeline`
 * **T**ransformer - Data transformer receiving and returning `Rows` (in most cases transformer), one instance of `Rows` at once.
-* **L**oader (Writer) - Memory safe representation of Data Sink, responsibility of Loader is to write `Rows` into destination storage, one at time.
+* **L**oader (Writer) - Memory safe representation of Data Sink, the responsibility of Loader is to write `Rows` into destination storage, one at time.
 * Pipeline - Interface representing ETL process, each received `Rows` instanced is passed through all `Pipes`, also responsible for error handling.
-* Pipe - Loader of Transformer instance existing in `Pipes` collection.
+* Pipe - Loader of Transformer instance existing in the `Pipes` collection.
 
+## Asynchronous Processing
+
+* [etl-adapter-amphp](https://github.com/flow-php/etl-adapter-amphp)
+* [etl-adapter-reactphp](https://github.com/flow-php/etl-adapter-reactphp)
+
 ### GitHub Stars
 
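The roles listed under `## Building blocks` in the README (Extractor, Transformer, Loader, Pipeline) can be illustrated with a minimal sketch in plain PHP generators. This is not Flow's API: `extractBatches`, `upperCaseNames`, `runPipeline`, and `$loader` are hypothetical names invented for the illustration, and plain arrays stand in for `Rows` and `Row`. It only assumes what the README states, namely that an extractor returns a `\Generator` and that each yielded batch passes through every pipe in turn.

```php
<?php

declare(strict_types=1);

// Hypothetical helper names throughout; this is not Flow's API.

/**
 * Extractor: yields the source in batches of at most $batchSize rows,
 * so the extractor itself buffers only one batch at a time.
 */
function extractBatches(iterable $source, int $batchSize): \Generator
{
    $batch = [];
    foreach ($source as $row) {
        $batch[] = $row;
        if (count($batch) === $batchSize) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // trailing, possibly smaller batch
    }
}

/** Transformer: receives one batch and returns a new batch. */
function upperCaseNames(array $batch): array
{
    return array_map(
        static fn (array $row): array => array_merge($row, ['name' => strtoupper($row['name'])]),
        $batch
    );
}

/** Pipeline: passes each extracted batch through every pipe in order. */
function runPipeline(iterable $extractor, array $pipes): void
{
    foreach ($extractor as $batch) {
        foreach ($pipes as $pipe) {
            $batch = $pipe($batch);
        }
    }
}

$written = [];

// Loader: appends each row of the batch to an in-memory sink
// and hands the batch on unchanged.
$loader = function (array $batch) use (&$written): array {
    foreach ($batch as $row) {
        $written[] = $row;
    }

    return $batch;
};

$source = [
    ['id' => 1, 'name' => 'alice'],
    ['id' => 2, 'name' => 'bob'],
    ['id' => 3, 'name' => 'carol'],
];

runPipeline(extractBatches($source, 2), [upperCaseNames(...), $loader]);

foreach ($written as $row) {
    echo $row['id'], ' ', $row['name'], PHP_EOL;
}
```

Treating the loader as just another pipe mirrors the README's description of a Pipe as a Loader or Transformer instance. The first-class callable syntax `upperCaseNames(...)` requires PHP 8.1, which matches the supported versions shown in the badges.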