How to Setup Metadata Engine

:::tip Version Tips The environment variable META_PASSWORD used in this document is a new feature in JuiceFS v1.0 and is not supported by older clients. If you are running an older client, please upgrade it before using this variable. :::

As mentioned in JuiceFS Technical Architecture and How JuiceFS Store Files, JuiceFS is designed to store data and metadata separately. Generally, data is stored in cloud storage based on object storage, and the corresponding metadata is stored in an independent database. A database that supports storing metadata is referred to as a "Metadata Storage Engine".

Metadata Storage Engine

Metadata is crucially important to a file system because it contains the detailed information of each file, such as name, size, permissions and location. In particular, for a file system where data and metadata are stored separately, the read and write performance of metadata directly determines the performance of the file system, and the engine that stores metadata is the most fundamental determinant of performance and reliability.

The metadata storage of JuiceFS uses a multi-engine design. To build an ultra-high-performance cloud-native file system, JuiceFS first supported Redis, an in-memory Key-Value database, which lets JuiceFS perform ten times better than Amazon EFS and S3FS. Test results can be seen here.

However, feedback from community users shows that many application scenarios do not urgently need a high-performance file system. Sometimes users just want a convenient, highly reliable tool for migrating data in the cloud, or want to mount object storage locally for small-scale use. Therefore, JuiceFS has successively added support for more databases such as MySQL/MariaDB and SQLite. The performance comparison can be found here.

Pay special attention when using a JuiceFS file system: no matter which database you choose to store metadata, make sure the metadata is kept safe! Once the metadata is damaged or lost, the corresponding data is damaged or lost as well, and in the worst case the entire file system can be damaged.

:::caution No matter which database is used to store metadata, it is important to ensure the safety of metadata. The corruption or loss of metadata will directly cause the damage of the corresponding data, or even the whole file system. For production environments, you should always choose a database with high availability, and at the same time, it is recommended to "backup metadata" periodically. :::

Redis

Redis is an open source (BSD license) memory-based Key-Value storage system, often used as a database, cache, and message broker.

:::note JuiceFS requires Redis 4.0+ :::

Create a file system

When using Redis as the metadata storage engine, the following format is usually used to access the database:

# use tcp
redis[s]://[<username>:<password>@]<host>[:<port>]/<db>

# use unix socket 
unix://[<username>:<password>@]<socket-file-path>?db=<db>

Parts enclosed in [] are optional; the rest are mandatory.

  • If the TLS feature of Redis is enabled, the protocol header needs to use rediss://, otherwise use redis://.
  • <username> was introduced in Redis 6.0 and can be omitted if no username is set, but the colon (:) in front of the password must be kept, e.g. redis://:<password>@<host>:6379/1.
  • Redis listens on port 6379 by default. The port can be omitted if the default has not been changed, e.g. redis://:<password>@<host>/1.
  • Redis supports multiple logical databases. Replace <db> with the number of the database you actually use.
  • To connect to Redis Sentinel, the format of the metadata URL is slightly different. Please refer to the "Redis Best Practices" document for details.
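The rules above can be sketched as a small shell helper that assembles a Redis metadata URL from its parts. The function name `redis_meta_url` and its argument order are hypothetical and are not part of JuiceFS. It only prints a string that follows the format described above.

```shell
# Hypothetical helper (not part of JuiceFS): assemble a Redis metadata URL.
# Usage: redis_meta_url <tls:yes|no> <username> <password> <host> <port> <db>
# Pass an empty string "" for any optional part you want to omit.
redis_meta_url() (
    tls=$1 user=$2 pass=$3 host=$4 port=$5 db=$6

    # rediss:// when TLS is enabled, redis:// otherwise
    if [ "$tls" = "yes" ]; then scheme="rediss"; else scheme="redis"; fi

    # Keep the ':' in front of the password even when there is no username
    auth=""
    if [ -n "$user" ] || [ -n "$pass" ]; then
        auth="${user}:${pass}@"
    fi

    # The port may be omitted when Redis listens on the default port
    hostport=$host
    if [ -n "$port" ]; then hostport="${host}:${port}"; fi

    printf '%s://%s%s/%s\n' "$scheme" "$auth" "$hostport" "$db"
)

# Password only, explicit port, database No. 1
redis_meta_url no "" mypassword 192.168.1.6 6379 1
# TLS enabled, no credentials in the URL, default port
redis_meta_url yes "" "" 192.168.1.6 "" 1
```

The first call prints `redis://:mypassword@192.168.1.6:6379/1` and the second prints `rediss://192.168.1.6/1`.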

For example, the following command will create a JuiceFS file system named pics, using the database No. 1 in Redis to store metadata:

$ juicefs format --storage s3 \
    ...
    "redis://:mypassword@192.168.1.6:6379/1" \
    pics

For security purposes, it is recommended to pass the password using the environment variable META_PASSWORD or REDIS_PASSWORD, e.g.

export META_PASSWORD=mypassword

Then there is no need to set a password in the metadata URL.

$ juicefs format --storage s3 \
    ...
    "redis://192.168.1.6:6379/1" \
    pics

:::note You can also use the standard URL syntax when passing database passwords using environment variables, e.g., "redis://:@192.168.1.6:6379/1" which preserves the : and @ separators between the username and password. :::
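If an existing script already embeds the password in a Redis metadata URL, one possible way to move it into META_PASSWORD is sketched here. The helper names `extract_password` and `strip_password` are hypothetical. The sketch assumes the password contains no `@` character and it keeps the `:` and `@` separators, as in the note above.

```shell
# Hypothetical helpers (not part of JuiceFS).
# Assumption: the password itself contains no '@' character.

# Print the password found between "://[user]:" and "@"; print nothing if absent.
extract_password() {
    printf '%s\n' "$1" | sed -nE 's#^[a-z0-9]+://[^:/@]*:([^@]*)@.*#\1#p'
}

# Print the URL with the password removed but the ':' and '@' separators kept.
strip_password() {
    printf '%s\n' "$1" | sed -E 's#^([a-z0-9]+://[^:/@]*:)[^@]*@#\1@#'
}

url='redis://:mypassword@192.168.1.6:6379/1'

# Move the password into the environment variable...
META_PASSWORD=$(extract_password "$url")
export META_PASSWORD

# ...and print the password-free URL to pass to the client instead.
strip_password "$url"
```

A URL that carries no password is printed unchanged by `strip_password`, and `extract_password` prints nothing for it.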

Mount a file system

sudo juicefs mount -d "redis://:mypassword@192.168.1.6:6379/1" /mnt/jfs

Passing passwords with the META_PASSWORD or REDIS_PASSWORD environment variables is also supported when mounting file systems.

$ export META_PASSWORD=mypassword
$ sudo juicefs mount -d "redis://192.168.1.6:6379/1" /mnt/jfs

:::tip If you need to share the same file system on multiple servers, you must ensure that each server has access to the database where the metadata is stored. :::

If you maintain the Redis database on your own, it is recommended to read Redis Best Practices.

KeyDB

KeyDB is an open source fork of Redis, developed to stay aligned with the Redis community. KeyDB implements multi-threading support, better memory utilization, and greater throughput on top of Redis, and also supports Active Replication, i.e., the Active Active feature.

:::note As with Redis, Active Replication is asynchronous, which may cause consistency issues. So use it with caution! :::

When used as the metadata storage engine for JuiceFS, KeyDB is used in exactly the same way as Redis. Please refer to the Redis section for usage.

PostgreSQL

PostgreSQL is a powerful open source relational database with a mature ecosystem and rich application scenarios, and it also works as the metadata engine of JuiceFS.

Many cloud computing platforms offer hosted PostgreSQL database services, or you can deploy one yourself by following the Usage Wizard.

Other PostgreSQL-compatible databases (such as CockroachDB) can also be used as metadata engine.

Create a file system

When using PostgreSQL as the metadata storage engine, you need to create a database manually before creating the file system. The following format is usually used to access the database:

# use tcp
postgres://<username>[:<password>]@<host>[:5432]/<database-name>[?parameters]

# use unix socket
postgres:///<database-name>?host=<socket-directories-path>

Parts enclosed in [] are optional; the rest are mandatory.
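As a rough illustration of the two PostgreSQL formats above, the following sketch prints a TCP URL and a Unix socket URL. The function names and the socket directory `/var/run/postgresql` are assumptions made for the example. The password is left out of the URL on the assumption that it is supplied through META_PASSWORD.

```shell
# Hypothetical helpers (not part of JuiceFS): build PostgreSQL metadata URLs.

# TCP form: postgres://<username>@<host>[:<port>]/<database-name>
pg_tcp_url() (
    user=$1 host=$2 port=$3 db=$4
    hostport=$host
    # The port is optional; append it only when one is given
    if [ -n "$port" ]; then hostport="${host}:${port}"; fi
    printf 'postgres://%s@%s/%s\n' "$user" "$hostport" "$db"
)

# Unix socket form: postgres:///<database-name>?host=<socket-directories-path>
pg_socket_url() {
    printf 'postgres:///%s?host=%s\n' "$1" "$2"
}

pg_tcp_url user 192.168.1.6 5432 juicefs
pg_socket_url juicefs /var/run/postgresql   # assumed socket directory
```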

For example:

$ juicefs format --storage s3 \
    ...
    "postgres://user:mypassword@192.168.1.6:5432/juicefs" \
    pics

A more secure approach would be to pass the database password through the environment variable META_PASSWORD:

$ export META_PASSWORD="mypassword"
$ juicefs format --storage s3 \
    ...
    "postgres://user@192.168.1.6:5432/juicefs" \
    pics

Mount a file system

sudo juicefs mount -d "postgres://user:mypassword@192.168.1.6:5432/juicefs" /mnt/jfs

Passing the password with the META_PASSWORD environment variable is also supported when mounting a file system.

$ export META_PASSWORD="mypassword"
$ sudo juicefs mount -d "postgres://user@192.168.1.6:5432/juicefs" /mnt/jfs

Troubleshooting

The JuiceFS client connects to PostgreSQL via SSL encryption by default. If you encounter an error saying pq: SSL is not enabled on the server, either enable SSL encryption for PostgreSQL according to your own business scenario, or disable SSL validation by adding a parameter to the metadata URL.

$ juicefs format --storage s3 \
    ...
    "postgres://user@192.168.1.6:5432/juicefs?sslmode=disable" \
    pics

Additional parameters can be appended to the metadata URL. More details can be seen here.
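Appending a parameter such as `sslmode=disable` needs a `?` for the first parameter and a `&` for every later one. The helper `add_meta_param` here is a hypothetical sketch of that rule and is not a JuiceFS command. It only manipulates the URL string.

```shell
# Hypothetical helper (not part of JuiceFS): append "key=value" to a metadata URL.
# Uses '?' when the URL has no query string yet and '&' otherwise.
add_meta_param() {
    case $1 in
        *\?*) printf '%s&%s\n' "$1" "$2" ;;
        *)    printf '%s?%s\n' "$1" "$2" ;;
    esac
}

url="postgres://user@192.168.1.6:5432/juicefs"

# First parameter: separated from the database name by '?'
url=$(add_meta_param "$url" "sslmode=disable")
printf '%s\n' "$url"
```

This prints `postgres://user@192.168.1.6:5432/juicefs?sslmode=disable`. Calling the helper again on the result would join the next parameter with `&`.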

MySQL

MySQL is one of the most popular open source relational databases, and is often preferred for web applications.

Create a file system

When using MySQL as the metadata storage engine, you need to create a database manually before creating the file system. The following format is usually used to access the database:

# use tcp
mysql://<username>[:<password>]@(<host>:3306)/<database-name>

# use unix socket 
mysql://<username>[:<password>]@unix(<socket-file-path>)/<database-name>

:::note Don't leave out the () brackets that enclose the address part of the URL. :::
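The brackets are easy to forget, so a small sketch may help. The helpers here are hypothetical and not part of JuiceFS. They print MySQL metadata URLs in the two forms above and run a rough check that the address part is wrapped in (). The socket path is an assumed example, and the password is expected to come from META_PASSWORD.

```shell
# Hypothetical helpers (not part of JuiceFS): build MySQL metadata URLs.

# TCP form: mysql://<username>@(<host>:<port>)/<database-name>
mysql_tcp_url() {
    printf 'mysql://%s@(%s:%s)/%s\n' "$1" "$2" "$3" "$4"
}

# Unix socket form: mysql://<username>@unix(<socket-file-path>)/<database-name>
mysql_socket_url() {
    printf 'mysql://%s@unix(%s)/%s\n' "$1" "$2" "$3"
}

# Rough sanity check: succeed only if the address part is wrapped in ().
has_address_parens() {
    case $1 in
        mysql://*@\(*\)/*|mysql://*@unix\(*\)/*) return 0 ;;
        *) return 1 ;;
    esac
}

mysql_tcp_url user 192.168.1.6 3306 juicefs
mysql_socket_url user /var/run/mysqld/mysqld.sock juicefs   # assumed socket path

if ! has_address_parens "mysql://user@192.168.1.6:3306/juicefs"; then
    echo "missing () around the address"
fi
```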

For example:

$ juicefs format --storage s3 \
    ...
    "mysql://user:mypassword@(192.168.1.6:3306)/juicefs" \
    pics

A more secure approach would be to pass the database password through the environment variable META_PASSWORD:

$ export META_PASSWORD="mypassword"
$ juicefs format --storage s3 \
    ...
    "mysql://user@(192.168.1.6:3306)/juicefs" \
    pics

To connect to a TLS enabled MySQL server, pass the tls=true parameter (or tls=skip-verify if using a self-signed certificate).

$ juicefs format --storage s3 \
    ...
    "mysql://user:mypassword@(192.168.1.6:3306)/juicefs?tls=true" \
    pics

Mount a file system

sudo juicefs mount -d "mysql://user:mypassword@(192.168.1.6:3306)/juicefs" /mnt/jfs

Passing the password with the META_PASSWORD environment variable is also supported when mounting a file system.

$ export META_PASSWORD="mypassword"
$ sudo juicefs mount -d "mysql://user@(192.168.1.6:3306)/juicefs" /mnt/jfs

To connect to a TLS enabled MySQL server, pass the tls=true parameter (or tls=skip-verify if using a self-signed certificate).

sudo juicefs mount -d "mysql://user:mypassword@(192.168.1.6:3306)/juicefs?tls=true" /mnt/jfs

For more examples of MySQL database address format, please refer to Go-MySQL-Driver.

MariaDB

MariaDB is an open source fork of MySQL, maintained by the original developers of MySQL.

Because MariaDB is highly compatible with MySQL, there is no difference in usage: the parameters and settings are exactly the same.

For example:

$ juicefs format --storage s3 \
    ...
    "mysql://user:mypassword@(192.168.1.6:3306)/juicefs" \
    pics

$ sudo juicefs mount -d "mysql://user:mypassword@(192.168.1.6:3306)/juicefs" /mnt/jfs

Passing passwords through environment variables is also the same:

$ export META_PASSWORD="mypassword"
$ juicefs format --storage s3 \
    ...
    "mysql://user@(192.168.1.6:3306)/juicefs" \
    pics

$ sudo juicefs mount -d "mysql://user@(192.168.1.6:3306)/juicefs" /mnt/jfs

To connect to a TLS enabled MariaDB server, pass the tls=true parameter (or tls=skip-verify if using a self-signed certificate).

$ export META_PASSWORD="mypassword"
$ juicefs format --storage s3 \
    ...
    "mysql://user@(192.168.1.6:3306)/juicefs?tls=true" \
    pics

$ sudo juicefs mount -d "mysql://user@(192.168.1.6:3306)/juicefs?tls=true" /mnt/jfs

For more examples of MariaDB database address format, please refer to Go-MySQL-Driver.

SQLite

SQLite is a widely used small, fast, single-file, reliable and full-featured SQL database engine.

The SQLite database has only one file, which is very flexible to create and use. When using SQLite as the JuiceFS metadata storage engine, there is no need to create a database file in advance, and you can directly create a file system:

$ juicefs format --storage s3 \
    ...
    "sqlite3://my-jfs.db" \
    pics

Executing the above command will automatically create a database file named my-jfs.db in the current directory. Please keep this file safe!

Mount the file system:

sudo juicefs mount -d "sqlite3://my-jfs.db" /mnt/jfs/

Please note the location of the database file. If it is not in the current directory, you need to specify the absolute path to the database file, e.g.

sudo juicefs mount -d "sqlite3:///home/herald/my-jfs.db" /mnt/jfs/
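To avoid depending on the current directory, the database path can be turned into an absolute one before building the URL. The helper `sqlite_meta_url` is hypothetical and not part of JuiceFS. It is a minimal sketch that resolves a relative path against the current directory.

```shell
# Hypothetical helper (not part of JuiceFS): print a sqlite3:// metadata URL
# that always carries an absolute path to the database file.
sqlite_meta_url() (
    path=$1
    case $path in
        /*) ;;                        # already absolute, keep as-is
        *)  path="$(pwd)/${path}" ;;  # resolve against the current directory
    esac
    printf 'sqlite3://%s\n' "$path"
)

# An absolute path yields three slashes after "sqlite3:"
sqlite_meta_url /home/herald/my-jfs.db
# A relative path is expanded using the directory the command is run from
sqlite_meta_url my-jfs.db
```

The first call prints `sqlite3:///home/herald/my-jfs.db`. The output of the second depends on the directory it is run from.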

:::note Since SQLite is a single-file database, usually only the host where the database is located can access it. Therefore, SQLite database is more suitable for standalone use. For multiple servers sharing the same file system, it is recommended to use databases such as Redis or MySQL. :::

BadgerDB

BadgerDB is an embedded, persistent, and standalone Key-Value database developed in pure Go. The database files are stored locally in the specified directory.

When using BadgerDB as the JuiceFS metadata storage engine, use badger:// to specify the database path.

Create a file system

There is no need to create a BadgerDB database in advance. Simply create the file system directly:

juicefs format badger://$HOME/badger-data myjfs

This command creates badger-data as a database directory in the home directory of the current user, which is used as metadata storage for JuiceFS.

Mount a file system

The database path needs to be specified when mounting the file system.

juicefs mount -d badger://$HOME/badger-data /mnt/jfs

:::note Since BadgerDB is a standalone database, it can only be used locally and does not support multi-host shared mounts. In addition, only one process is allowed to access BadgerDB at the same time, and gc and fsck operations cannot be performed when the file system is mounted. :::

TiKV

TiKV is a distributed transactional Key-Value database. It was originally developed by PingCAP as the storage layer for their flagship product TiDB. TiKV is now an independent open source project and a graduated project of CNCF.

By using the official tool TiUP, you can easily build a local playground for testing (refer here for details). Production environment generally requires at least three hosts to store three data replicas (refer to the official document for all deployment steps).

Create a file system

When using TiKV as the metadata storage engine, the metadata URL needs to be specified in the following format:

tikv://<pd_addr>[,<pd_addr>...]/<prefix>

The prefix is a user-defined string, which can be used to distinguish multiple file systems or applications when they share the same TiKV cluster. For example:

$ juicefs format --storage s3 \
    ...
    "tikv://192.168.1.6:2379,192.168.1.7:2379,192.168.1.8:2379/jfs" \
    pics
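As a small illustration of the format above, the following sketch joins any number of PD addresses with commas and appends the prefix. The function name `tikv_meta_url` is hypothetical and not part of JuiceFS.

```shell
# Hypothetical helper (not part of JuiceFS): build a TiKV metadata URL.
# Usage: tikv_meta_url <prefix> <pd_addr> [<pd_addr>...]
tikv_meta_url() (
    prefix=$1
    shift

    # Join the remaining arguments (PD addresses) with commas
    addrs=""
    for addr in "$@"; do
        if [ -n "$addrs" ]; then
            addrs="${addrs},${addr}"
        else
            addrs=$addr
        fi
    done

    printf 'tikv://%s/%s\n' "$addrs" "$prefix"
)

tikv_meta_url jfs 192.168.1.6:2379 192.168.1.7:2379 192.168.1.8:2379
```

This prints `tikv://192.168.1.6:2379,192.168.1.7:2379,192.168.1.8:2379/jfs`. Using a different prefix for each file system keeps them apart on a shared cluster.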

Set up TLS

If you need to enable TLS, set the TLS configuration items by adding query parameters after the metadata URL. The currently supported configuration items are:

| Name | Value |
| --- | --- |
| ca | CA root certificate, used to connect to TiKV/PD with TLS |
| cert | Certificate file path, used to connect to TiKV/PD with TLS |
| key | Private key file path, used to connect to TiKV/PD with TLS |
| verify-cn | Verify the component caller's identity, reference link |

For example:

$ juicefs format --storage s3 \
    ...
    "tikv://192.168.1.6:2379,192.168.1.7:2379,192.168.1.8:2379/jfs?ca=/path/to/ca.pem&cert=/path/to/tikv-server.pem&key=/path/to/tikv-server-key.pem&verify-cn=CN1,CN2" \
    pics
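Long query strings like the one above are easier to keep correct when they are assembled from variables. The following is a minimal sketch. The function name `tikv_tls_query` is hypothetical and the file paths are the placeholders from the example above.

```shell
# Hypothetical helper (not part of JuiceFS): build the TLS query string.
# Usage: tikv_tls_query <ca> <cert> <key> [<verify-cn>]
tikv_tls_query() (
    ca=$1 cert=$2 key=$3 cn=${4:-}
    query="ca=${ca}&cert=${cert}&key=${key}"
    # verify-cn is appended only when a value is given
    if [ -n "$cn" ]; then query="${query}&verify-cn=${cn}"; fi
    printf '%s\n' "$query"
)

base="tikv://192.168.1.6:2379,192.168.1.7:2379,192.168.1.8:2379/jfs"
tls=$(tikv_tls_query /path/to/ca.pem /path/to/tikv-server.pem \
    /path/to/tikv-server-key.pem CN1,CN2)

# The query string is separated from the prefix by '?'
printf '%s?%s\n' "$base" "$tls"
```

Quote the resulting URL when passing it to a command, because `&` is special to the shell.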

Mount a file system

sudo juicefs mount -d "tikv://192.168.1.6:2379,192.168.1.7:2379,192.168.1.8:2379/jfs" /mnt/jfs

FoundationDB

Coming soon...