ScalaPB docs
mjakubowski84 committed Sep 22, 2023
1 parent 310faf8 commit 56e82c2
Showing 3 changed files with 41 additions and 4 deletions.
@@ -5,6 +5,7 @@ import com.github.mjakubowski84.parquet4s.protobuf.Data
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

import java.nio.file.Files
+import scala.util.Using

object WriteAndReadApp extends App {
val data = (1 to 100).map(id => Data(id = id, text = id.toString))
@@ -14,8 +15,5 @@ object WriteAndReadApp extends App {
ParquetWriter.of[Data].writeAndClose(path.append("data.parquet"), data)

// read
-val readData = ParquetReader.as[Data].read(path)
-// hint: you can filter by dict using string value, for example: filter = Col("dict") === "A"
-try readData.foreach(println)
-finally readData.close()
+Using(ParquetReader.as[Data].read(path))(_.foreach(println))
}
3 changes: 3 additions & 0 deletions site/docs/data/menu.yml
@@ -37,3 +37,6 @@ options:

- title: (Experimental) ETL
url: docs/experimental

- title: (Experimental) Protobuf
url: docs/protobuf
36 changes: 36 additions & 0 deletions site/docs/docs/protobuf.md
@@ -0,0 +1,36 @@
---
layout: docs
title: Read and write Parquet from and to Protobuf
permalink: docs/protobuf/
---

# Read and write Parquet from and to Protobuf

Using the original Java Parquet library, you can read and write Parquet to and from Protobuf. Parquet4S has `custom` functions in its API, which could be leveraged for that. However, Protobuf Parquet can only be used with Java models, not to mention other issues that make it hard to use, especially in Scala. You would prefer to use [ScalaPB](https://scalapb.github.io/) in Scala projects, right? Thanks to Parquet4S, you can! Import the ScalaPB extension into any Parquet4S project, whether it is Akka, FS2 or plain Scala:

```scala
"com.github.mjakubowski84" %% "parquet4s-scalapb" % "@VERSION@"
```

Follow the ScalaPB [documentation](https://scalapb.github.io/docs/installation) to generate your Scala model from `.proto` files.
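
As a rough orientation, a typical sbt setup for ScalaPB code generation looks like the sketch below. It is based on the ScalaPB installation guide, not on anything specific to Parquet4S; the version numbers are illustrative assumptions and should be replaced with the current releases.

```scala
// project/plugins.sbt
// Versions are placeholders; check the ScalaPB documentation for current ones.
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.13"

// build.sbt
// Generate Scala case classes from the project's .proto files
// (sbt-protoc looks in src/main/protobuf by default).
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)
```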

Then, import the Parquet4S type classes tailored for Protobuf. The rest of the code stays the same as in regular Parquet4S, whether that is Akka, FS2 or core!

```scala mdoc:compile-only
import com.github.mjakubowski84.parquet4s.ScalaPBImplicits.*
import com.github.mjakubowski84.parquet4s.protobuf.Data
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

import scala.util.Using

val data: Iterable[Data] = ??? // your data
val path: Path = ??? // path to write to / to read from

// write
ParquetWriter.of[Data].writeAndClose(path.append("data.parquet"), data)

// read
Using(ParquetReader.as[Data].read(path))(_.foreach(println))
```
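
Note that `scala.util.Using` closes the resource and returns a `Try`, so a non-fatal error raised while reading is captured in the result instead of being thrown. The sketch below illustrates this with the standard library only; `FakeRecords` and `countRecords` are hypothetical stand-ins for a closeable reader and are not part of Parquet4S.

```scala
import scala.util.{Failure, Success, Try, Using}

object UsingSketch {
  // Hypothetical stand-in for a closeable source of records.
  final class FakeRecords(items: List[String]) extends AutoCloseable {
    var closed: Boolean = false
    def iterator: Iterator[String] = items.iterator
    override def close(): Unit = { closed = true }
  }

  // Using closes the resource whether or not the body succeeds
  // and wraps the outcome in a Try.
  def countRecords(records: FakeRecords): Try[Int] =
    Using(records)(_.iterator.size)

  def main(args: Array[String]): Unit =
    countRecords(new FakeRecords(List("a", "b", "c"))) match {
      case Success(count) => println(s"Read $count records")
      case Failure(error) => println(s"Reading failed: ${error.getMessage}")
    }
}
```

If you would rather let the error propagate, as the earlier `try`/`finally` form did, call `.get` on the returned `Try`.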

Please follow the [examples](https://github.com/mjakubowski84/parquet4s/tree/master/examples/src/main/scala/com/github/mjakubowski84/parquet4s/scalapb) to learn more.
