_layout
landing

Introduction

ParquetSharp is a cross-platform .NET library for reading and writing Apache Parquet files.

ParquetSharp is implemented in C# as a PInvoke wrapper around Apache Parquet C++ to provide high performance and compatibility. Check out ParquetSharp.DataFrame if you need a convenient integration with the .NET DataFrames.

Supported platforms:

Chip	Linux	Windows	macOS
x64	✔	✔	✔
arm64	✔		✔

	Status
Release Nuget
Pre-Release Nuget
CI Build

Why use Parquet?

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Relative to CSV files, Parquet executes queries 34x faster while taking up 87% less space. Source

Quickstart

The following examples show how to write and then read a Parquet file with three columns representing a timeseries of object-value pairs. These use the low-level API, which is the recommended API for working with native .NET types and closely maps to the API of Apache Parquet C++. For reading and writing data in the Apache Arrow format, an Arrow-based API is also provided.

1. Initialize a new project

First, let's create a new console application:

dotnet new console -n ParquetExample
cd ParquetExample

In your project directory, you'll find a Program.cs file that we'll use to write a Parquet file, and then read it back.

2. Install ParquetSharp

ParquetSharp is available as a NuGet package. You can install it using the following command:

dotnet add package ParquetSharp

3. Write a Parquet File

This example shows how to write a Parquet file with three columns: Timestamp, ObjectId, and Value.

Update your Program.cs with the following code:

using System;
using ParquetSharp;

class Program
{
    static void Main()
    {
        var timestamps = new DateTime[] { DateTime.Now, DateTime.Now.AddMinutes(1) };
        var objectIds = new int[] { 1, 2 };
        var values = new float[] { 1.23f, 4.56f };

        var columns = new Column[]
        {
            new Column<DateTime>("Timestamp"),
            new Column<int>("ObjectId"),
            new Column<float>("Value")
        };

        using var file = new ParquetFileWriter("float_timeseries.parquet", columns);
        using var rowGroup = file.AppendRowGroup();

        using (var timestampWriter = rowGroup.NextColumn().LogicalWriter<DateTime>())
        {
            timestampWriter.WriteBatch(timestamps);
        }
        using (var objectIdWriter = rowGroup.NextColumn().LogicalWriter<int>())
        {
            objectIdWriter.WriteBatch(objectIds);
        }
        using (var valueWriter = rowGroup.NextColumn().LogicalWriter<float>())
        {
            valueWriter.WriteBatch(values);
        }

        file.Close();
        Console.WriteLine("Parquet file written successfully!");
    }
}

You can execute it with:

dotnet run

4. Read a Parquet File

After writing the Parquet file, we can read it back by updating the Program.cs file with the following code:

using System;
using ParquetSharp;

class Program
{
    static void Main()
    {
        using var file = new ParquetFileReader("float_timeseries.parquet");

        for (int rowGroup = 0; rowGroup < file.FileMetaData.NumRowGroups; ++rowGroup)
        {
            using var rowGroupReader = file.RowGroup(rowGroup);
            var groupNumRows = checked((int)rowGroupReader.MetaData.NumRows);

            var groupTimestamps = rowGroupReader.Column(0).LogicalReader<DateTime>().ReadAll(groupNumRows);
            var groupObjectIds = rowGroupReader.Column(1).LogicalReader<int>().ReadAll(groupNumRows);
            var groupValues = rowGroupReader.Column(2).LogicalReader<float>().ReadAll(groupNumRows);

            Console.WriteLine("Read Parquet file:");
            for (int i = 0; i < groupNumRows; ++i)
            {
                Console.WriteLine($"Timestamp: {groupTimestamps[i]}, ObjectId: {groupObjectIds[i]}, Value: {groupValues[i]}");
            }
        }

        file.Close();
    }
}

Once again, run the program with:

dotnet run

This should give you an output similar to:

Read Parquet file:
Timestamp: 2025-01-25 10:15:25 AM, ObjectId: 1, Value: 1.23
Timestamp: 2025-01-25 10:16:25 AM, ObjectId: 2, Value: 4.56

Documentation

For more detailed information on how to use ParquetSharp, see the following documentation:

Writing Parquet files
Reading Parquet files
Working with nested data
Reading and writing Arrow data — how to read and write data using the Apache Arrow format
Row-oriented API — a higher level API that abstracts away the column-oriented nature of Parquet files
Custom types — how to customize the mapping between .NET and Parquet types, including using the DateOnly and TimeOnly types added in .NET 6.
Encryption — using Parquet Modular Encryption to read and write encrypted data
Writing TimeSpan data — interoperability with other libraries when writing TimeSpan data
Use from PowerShell

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

index.md

index.md

Introduction

Why use Parquet?

Quickstart

1. Initialize a new project

2. Install ParquetSharp

3. Write a Parquet File

4. Read a Parquet File

Documentation

Files

index.md

Latest commit

History

index.md

File metadata and controls

Introduction

Why use Parquet?

Quickstart

1. Initialize a new project

2. Install ParquetSharp

3. Write a Parquet File

4. Read a Parquet File

Documentation