
Auxiliary directory for supporting the multi-directory configuration. #294

Merged
merged 23 commits into from
Mar 7, 2023

Conversation

LykxSassinator
Contributor

@LykxSassinator LykxSassinator commented Feb 15, 2023

Brief Introduction

Close #257.

This PR adds support for the auxiliary directory feature, picking up the earlier PR #261.
It mainly contains:

  • A new configuration option, auxiliary-dir, to enable this feature. It defaults to None; when set to Some(...), this directory is used once the main directory (specified by the dir option) is full.

IMO, auxiliary-dir is clearer for users to understand and use than the previous name, secondary-dir.
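As a rough sketch of how such an option might look (field names and defaults here are illustrative, not the exact raft-engine `Config`):

```rust
// Hypothetical sketch of the proposed option; names and defaults are
// illustrative, not the exact raft-engine `Config`.
#[derive(Debug, Clone)]
pub struct Config {
    /// Main directory for log files.
    pub dir: String,
    /// Auxiliary directory, used only when the main `dir` is full.
    /// `None` (the default) disables the feature.
    pub auxiliary_dir: Option<String>,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            dir: "./raft-engine".to_owned(),
            auxiliary_dir: None,
        }
    }
}

fn main() {
    // Feature off by default.
    assert!(Config::default().auxiliary_dir.is_none());
    // Enabled by pointing at a second directory, ideally on another device.
    let cfg = Config {
        auxiliary_dir: Some("/mnt/disk2/raft-engine".to_owned()),
        ..Default::default()
    };
    assert_eq!(cfg.auxiliary_dir.as_deref(), Some("/mnt/disk2/raft-engine"));
}
```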

@codecov

codecov bot commented Feb 15, 2023

Codecov Report

Patch coverage: 98.68% and project coverage change: +0.08% 🎉

Comparison is base (404e3fe) 97.79% compared to head (b8ff5bd) 97.88%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #294      +/-   ##
==========================================
+ Coverage   97.79%   97.88%   +0.08%     
==========================================
  Files          30       30              
  Lines       11454    11998     +544     
==========================================
+ Hits        11202    11744     +542     
- Misses        252      254       +2     
Impacted Files Coverage Δ
src/env/default.rs 92.47% <75.00%> (-1.49%) ⬇️
src/file_pipe_log/pipe_builder.rs 95.85% <94.54%> (+0.33%) ⬆️
src/file_pipe_log/pipe.rs 98.40% <96.36%> (-0.08%) ⬇️
src/config.rs 97.63% <100.00%> (+0.05%) ⬆️
src/engine.rs 98.63% <100.00%> (+0.34%) ⬆️
src/errors.rs 100.00% <100.00%> (ø)
src/file_pipe_log/log_file.rs 100.00% <100.00%> (ø)
src/log_batch.rs 98.07% <100.00%> (-0.22%) ⬇️
tests/failpoints/test_engine.rs 99.89% <100.00%> (+0.01%) ⬆️
tests/failpoints/test_io_error.rs 100.00% <100.00%> (ø)
... and 5 more


@LykxSassinator
Contributor Author

ping @tabokie PTAL, thx. This pr is a successor to the obsolete #261.

Comment on lines 29 to 32
/// Main directory path id.
pub const DEFAULT_PATH_ID: PathId = 0;
/// Auxiliary directory path id.
pub const AUXILIARY_PATH_ID: PathId = 1;
Member

I doubt this is necessary. What if there're 100 directories? Maybe just iterate them all whenever needing a dir.

Contributor Author

I hold a different perspective on this point.

  • Firstly, the original design is to support an EXTRA dir for storing logs if and only if the main dir is full, ideally placed on another DEVICE. It serves the goal that the directory storing raft logs can live on its own device, keeping its IO isolated from other directories' IO operations.
  • Secondly, in normal cases, the two directories specified by dir and auxiliary_dir are enough for practical production environments, and we do not expect to support lists of 3, 4, or more directories for users.

So, I still recommend this design.

Member

This contradicts the whole point of using a vector for paths. If you need to hardcode the path ID, it is clearly better to just use two variables, dir1 and dir2, because that is checked statically rather than at runtime (via vector indexing). My original idea in using a vector is to not restrict the number of directories. For example, if in the future we support #258, users can configure multiple disks to increase write parallelism.

if idx >= paths.len() {
    break;
}
let disk_stats = match fs2::statvfs(&paths[idx]) {
Member

Ignore error, and avoid syscall when there's only one dir.

debug_assert!(!paths.is_empty());
if paths.len() > 1 {
  for (i, p) in paths.iter().enumerate() {
    if let Ok(stats) = fs2::statvfs(p) {
      if stats.space > target {
        return i;
      }
    }
  }
}
return 0;

Contributor Author

IMO, we need Err, which means that there is no free space left for creating new files.

Member

Two reasons: (1) statvfs is only a hint. There could be another user deleting or creating files, so the result of fetch_dir doesn't promise anything at all, and it is wrong to rely on it. (2) All the error handling logic is already in place; adding another code path that handles the error of fetch_dir is redundant. Even if they (fetch_dir().is_err() and write().is_err()) share the same code path, it is always better to have fewer exit points.

if let Err(te) = writer.truncate() {
panic!(
"error when truncate {} after error: {}, get: {}",
seq, e, te
);
}
return Err(e);
if is_no_space_err(&e) {
// To avoid incorrect `rotate_imp` on the same dir when the size of
Member

I don't understand. If the available space is less than buf_bytes.len(), it should panic when actually writing that buf_bytes, instead of panicking here too early.

Contributor Author

In this case, it won't panic but will fail on engine.write(...) unexpectedly.
Suppose we have a two-directory setting with dir and spill-dir. dir has free space for rotating several new log files, but not enough for dumping the given buf, while spill-dir has free space for dumping this buf.
That is:

  • buf_bytes.len() > target_file_size == true;
  • n * target_file_size < dir.spare_space() && buf_bytes.len() > dir.spare_space() && buf_bytes.len() < spill-dir.spare_space().

Then, with the previous rotate_imp(self.target_file_size):

    1. The first attempt at writing failed on self.write() and tried to rotate.
    2. As target_file_size < dir.spare_space(), rotate_imp(target_file_size) succeeded and returned TryAgain(...) to trigger a second write attempt. (Call this newly rotated file rotate-1.)
    3. The second attempt tried to dump this buf into rotate-1, but self.write() failed again because dir.spare_space() < buf_bytes.len(), returning the NOSPC error. The buffer still could not be dumped into the current rotate-1 file, so the next rotate_imp() ran, and unfortunately it succeeded again because dir still had enough space for rotating a new file. It therefore returned a TryAgain() error again, causing the final failure of engine.write(...).
    4. The whole write of this LogBatch failed because its write retries exceeded the retry limit, even though spill-dir still had enough space for it.

For example, given target_file_size == 1MB, dir.spare_space() == 1.5MB, spill-dir.spare_space() == 100MB and buf_bytes.len() == 3MB, the above steps would be executed.

   Compiling raft-engine v0.3.0 (/nvme1n1/lucasliang/workspace/raft-engine)
    Finished test [unoptimized + debuginfo] target(s) in 13.33s
     Running tests/failpoints/mod.rs (target/debug/deps/failpoints-23610de2336687d7)
thread 'test_engine::test_build_engine_with_spill_dir' panicked at 'called `Result::unwrap()` on an `Err` value: 
TryAgain("Failed to write logbatch, exceed max_retry_count: (2),
err: failed to write 41 file, get IO Error: Custom { kind: StorageFull, error: \"nospace\" } try to flush it to another dir")', 
tests/failpoints/test_engine.rs:37:40
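The failure mode described above can be modeled with a toy retry loop (a sketch with made-up names, not the raft-engine implementation): rotation only checks that one new file fits, so it keeps succeeding while the oversized write keeps failing until the retry limit is hit.

```rust
const MAX_RETRY_COUNT: u32 = 2;

struct Dir {
    spare: u64, // free bytes
}

// Rotation only checks that one new file of `target_file_size` fits, so it
// keeps succeeding while the oversized `buf_len` write keeps failing.
fn write_with_retry(dir: &mut Dir, target_file_size: u64, buf_len: u64) -> Result<(), String> {
    for _ in 0..MAX_RETRY_COUNT {
        if buf_len <= dir.spare {
            dir.spare -= buf_len;
            return Ok(()); // write fits
        }
        if target_file_size > dir.spare {
            return Err("no space to rotate".to_owned());
        }
        // rotate_imp succeeds -> TryAgain -> loop for another attempt
    }
    Err("exceed max_retry_count".to_owned())
}

fn main() {
    const MB: u64 = 1024 * 1024;
    // target_file_size == 1MB, dir.spare_space() == 1.5MB, buf == 3MB:
    let mut dir = Dir { spare: 3 * MB / 2 };
    let err = write_with_retry(&mut dir, MB, 3 * MB).unwrap_err();
    assert_eq!(err, "exceed max_retry_count");
    // A 100MB spill-dir would have fit the buffer, but is never consulted.
}
```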

Signed-off-by: Lucasliang <[email protected]>
@@ -258,7 +263,7 @@ impl<F: FileSystem> SinglePipe<F> {
fail_point!("file_pipe_log::append");
let mut writable_file = self.writable_file.lock();
if writable_file.writer.offset() >= self.target_file_size {
if let Err(e) = self.rotate_imp(&mut writable_file) {
if let Err(e) = self.rotate_imp(&mut writable_file, self.target_file_size) {
Member

@tabokie Mar 1, 2023

Let's ignore this case temporarily (assume buf.len is always smaller than target_file_size). You can improve it in another PR. It's a bit more complicated than this: you also have to consider recycled files, e.g. one recycled_file + available_space is not enough, but all recycled_files + available_space is enough.

Contributor Author

@LykxSassinator Mar 1, 2023

That's right. I've considered it and will patch it in this PR, but I'm struggling with how to add a unit test for this.

I'll update it in the next commit, thx for any supplementary comments.

src/log_batch.rs
// `BufState::Sealed` means that `LogBatch` is under a repeated state of dumping.
BufState::Encoded(header_offset, entries_len)
| BufState::Sealed(header_offset, entries_len) => {
self.item_batch.prepare_write(&mut self.buf, file_context)?;
Member

What if item batch is empty?

Suggested change
self.item_batch.prepare_write(&mut self.buf, file_context)?;
self.item_batch.prepare_write(&mut self.buf[header_offset + HEADER_LEN + entries_len ..], file_context)?;

Contributor Author

What if item batch is empty?

In this case, it will return Ok(0) directly when calling engine.write(...).

raft-engine/src/engine.rs

Lines 138 to 141 in 404e3fe

pub fn write(&self, log_batch: &mut LogBatch, mut sync: bool) -> Result<usize> {
if log_batch.is_empty() {
return Ok(0);
}

Member

Better make the function more robust. Skip writing when empty should only be an optimization instead of being depended on.

Contributor Author

Better make the function more robust. Skip writing when empty should only be an optimization instead of being depended on.

Got.
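The robustness point can be sketched as follows (an illustrative encoding, not the real LogBatch layout): the encoder stays correct for empty input, so skipping the write for an empty batch remains a pure optimization rather than a correctness requirement.

```rust
// Illustrative item encoding, not the real LogBatch layout: a u32 count
// header followed by the items. Correct even when `items` is empty.
fn encode_items(items: &[u32], buf: &mut Vec<u8>) -> usize {
    let start = buf.len();
    buf.extend_from_slice(&(items.len() as u32).to_le_bytes());
    for it in items {
        buf.extend_from_slice(&it.to_le_bytes());
    }
    buf.len() - start
}

fn main() {
    let mut buf = Vec::new();
    // An empty batch still produces a well-formed (header-only) record.
    assert_eq!(encode_items(&[], &mut buf), 4);
    // A non-empty batch appends its items after the count header.
    assert_eq!(encode_items(&[7, 8], &mut buf), 12);
}
```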

// current buffer is too huge to be appended directly. So, we should free
// enough space by clearing several recycled log files until this buffer
// could be dumped.
let mut expected_size = expected_file_size;
Member

(1) recycled file could be on a different disk than the full one
(2) should not modify the expected_file_size passed into fetch_dir. There could be another writer occupying the space you just released.
(3) if deleting recycled file is enough, there's no need to rotate, which makes the rotated file smaller than usual and hurt future performance.

This branch needs test.

Contributor Author

Yep, these cases have already been considered in this branch.

Firstly, about:

(1) recycled file could be on a different disk than the full one

That's right. This case has three corner cases (suppose main-dir is full; the whole writing progress enters this branch if and only if the first attempt by engine.write(...) fails):

    1. main-dir has no recycled logs, and spill-dir has several recycled logs.
      As main-dir is full, the following find_available_dir must skip main-dir, and spill-dir should prepare enough space for this write; otherwise, the whole write (engine.write(...)) finally fails with Err:TryAgain.
      So, in this case, the deletion is necessary for preparing enough space for the write. And if there is not enough space for writing this LogBatch, it fails as expected, because there truly is not enough space on either disk.
    2. main-dir has several recycled logs with sum(recycled_logs.size()) < expected_file_size, and no recycled logs exist in spill-dir.
      After deletion, main-dir still does not have enough space for the second attempt, so the following find_available_dir must skip main-dir as expected. Whether the second write attempt succeeds depends on whether there is enough space on spill-dir, which is expected. So, in this case, the deletion is also necessary.
    3. Both main-dir and spill-dir have several recycled logs.
      The deletion progress tries to release free space on one dir until it is enough to dump this LogBatch into that dir. The second write fails if and only if free space is still not enough, that is, deleted_recycled_logs.size() + free_space.size() < expected_file_size. So the deletion should be kept.

Secondly, about:

(2) should not modify the expected_file_size passed into fetch_dir. There could be another writer occupying the space you just released.

Here, expected_file_size is not modified; this PR just copies it to expected_size.

And for:

(3) if deleting recycled file is enough, there's no need to rotate, which makes the rotated file smaller than usual and hurt future performance.

Yep, it reuses recycled logs by rotating only when expected_file_size <= self.target_file_size. Otherwise, it tries to free enough space and find an available dir to place this LogBatch via the following find_available_dir directly.

As for testing this branch, I've done it with the following manual case on CentOS 7:

#[test]
fn test_build_engine_with_spill_dir() {
    fn mount_tmpfs(path: &str) {
        debug_assert!(!path.is_empty());
        #[cfg(target_os = "linux")]
        {
            use std::process::Command;

            let mut mk_device = Command::new("sudo");
            mk_device.args([
                "mount",
                "-t",
                "tmpfs",
                "-o",
                "size=128M",
                "tmpfs",
                path,
            ]);
            mk_device.status().expect("process failed to execute");
        }
    }

    fn umount_tmpfs(path: &str) {
        debug_assert!(!path.is_empty());
        #[cfg(target_os = "linux")]
        {
            use std::process::Command;

            let mut umount_cmd = Command::new("sudo");
            umount_cmd.args(["umount", path]);
            umount_cmd.status().expect("process failed to execute");
        }
    }

    let main_dir = tempfile::Builder::new()
        .prefix("test_build_engine_with_spill_dir_main")
        .tempdir()
        .unwrap();
    let spill_dir = tempfile::Builder::new()
        .prefix("test_build_engine_with_spill_dir_spill")
        .tempdir()
        .unwrap();
    {
        // Mount a tmpfs with limited size.
        mount_tmpfs(main_dir.path().to_str().unwrap());
    }
    let cfg = Config {
        dir: main_dir.path().to_str().unwrap().to_owned(),
        spill_dir: Some(spill_dir.path().to_str().unwrap().to_owned()),
        target_file_size: ReadableSize::mb(1),
        purge_threshold: ReadableSize::mb(128),
        enable_log_recycle: true,
        prefill_for_recycle: true,
        ..Default::default()
    };
    let data = vec![b'x'; 17371];
    let engine = Engine::open(cfg.clone()).unwrap();
    let _f1 = FailGuard::new("file_pipe_log::force_choose_dir", "return");
    for rid in 1..12000 {
        assert!(catch_unwind_silent(|| append(&engine, rid, 1, 100, Some(&data))).is_ok());
    }
    {
        // TODO: this fails because the current pid still occupies this dir.
        umount_tmpfs(main_dir.path().to_str().unwrap());
    }
}

But I'm wondering how to release the temporarily mounted device in a way that CI/CD will accept.

Member

Honestly, I don't even know how to respond to such a long comment. If the situation is indeed so complicated, you should comment on it in the code so that we can discuss line by line there.

(1.i) So, in this case, the deletion is necessary for preparing enough space for write.

You didn't check the available space of spill_dir. It might not need deletion at all.

(1.iii) The deletion progress will try to release free space on one dir until it's enough to dump this LogBatch into this dir.

Same thing as before: you don't know if the other directory has enough available space so that it can avoid deletion. In addition, your code won't work if the sum of deleted files from dir1 and dir2 reaches target_size, but the two individual directories still haven't each deleted enough.

(2) Here, the expected_file_size do not be modified, just copy it to expected_size in this pr.

Well, it's a misread, but you shouldn't copy something into another variable with nearly the same name.

Yep, it reuses recycled logs by rotating only when expected_file_size <= self.target_file_size. Otherwise, it tries to free enough space and find an available dir to place this LogBatch via the following find_available_dir directly.

I think there's some misunderstanding. I meant you don't need to rotate the active file, i.e. no need to close the file writer and open a new one.

Member

As for testing this branch, I've done it with the following manual case on CentOS 7:

Just some failpoint unit tests are enough. Hardware doesn't really matter to this particular function. The only reason I asked before to test on hardware is that I don't know if truncating a file on an already full disk can succeed.

Member

My suggestion is to move the whole deletion outside of rotate. Like this:

fn try_make_space(target) {
  let mut used = vec![0; self.paths.len()];
  for f in recycled_files {
    used[f.path_id] += f.handle.file_size()?;
  }
  for i in 0..self.paths.len() {
    if used[i] + self.paths[i].available_space > target + HEADER_LEN {
      // remove some.
      break;
    }
  }
}
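A runnable rendering of the sketch above, under the assumption of per-path free space and recycled file sizes (all names are illustrative, not raft-engine APIs):

```rust
// Pick the first path where reclaiming its recycled files would leave room
// for `target` bytes; returns None if no single path can make enough space.
fn try_make_space(
    available: &[u64],     // free bytes per path
    recycled: &[Vec<u64>], // recycled file sizes per path
    target: u64,
) -> Option<usize> {
    for (i, &avail) in available.iter().enumerate() {
        let reclaimable: u64 = recycled[i].iter().sum();
        if reclaimable + avail > target {
            // Here the real code would remove some recycled files on path i.
            return Some(i);
        }
    }
    None
}

fn main() {
    // Path 0: 1MB free + 2MB recyclable; path 1: 5MB free, nothing recyclable.
    let available = [1u64 << 20, 5 << 20];
    let recycled = [vec![1u64 << 20, 1 << 20], vec![]];
    // Path 1 is picked without any deletion at all.
    assert_eq!(try_make_space(&available, &recycled, 4 << 20), Some(1));
    // No single path can serve 8MB, even after reclaiming.
    assert_eq!(try_make_space(&available, &recycled, 8 << 20), None);
}
```

Note that the space check runs per path, which also reflects the reviewer's point: a path with enough free space is chosen without deleting anything.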

Contributor Author

You didn't check the available space of spill_dir. It might not need deletion at all.

IMO, the early, unified deletion aims to avoid extra checks of available_space on each dir and to unify the code into one branch, expected_file_size > target_file_size. For example, the available space consists of recycled logs plus free space on each dir respectively, and the checking and deletion of recycled files would have to be handled separately for each.

As this PR has already accumulated overly long comment threads on this point, making it hard to review, I accept the previous advice that we should open another issue focused on optimizing this.

Member

@tabokie left a comment

is #294 (comment) tested?

for (i, file_pair) in files.windows(2).enumerate() {
    // If there exists a black hole or duplicate scenario on FileSeq, these
    // files should be skipped and cleared.
    if file_pair[1].seq - file_pair[0].seq != 1 {
Member

is duplicate file tested?

Contributor Author

Yep, this case has already been supplemented in src/engine.rs +2735, as the following shows:

// Case 4: abnormal case - duplicate FileSeq among different dirs.
{
    // Prerequisite: choose several files and duplicate them to main dir.
    let mut file_count = 0;
    std::fs::read_dir(paths[1]).unwrap().for_each(|e| {
        let p = e.unwrap().path();
        let file_name = p.file_name().unwrap().to_str().unwrap();
        if let Some(FileId {
            queue: LogQueue::Append,
            seq: _,
        }) = FileId::parse_file_name(file_name)
        {
            let mut dst_path = PathBuf::from(&paths[0]);
            dst_path.push(file_name);
            if file_count % 2 == 0 {
                std::fs::copy(p, dst_path).unwrap();
            }
            file_count += 1;
        }
    });
}
let engine = engine.reopen();
// Duplicate log files will be skipped and cleared.
assert!(engine.file_span(LogQueue::Append).0 > start_5);
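The windows(2) check under discussion can be sketched standalone (simplified to plain sequence numbers instead of file handles; the function name is made up):

```rust
// Returns the index of the first pair that breaks strict +1 ordering,
// i.e. a hole (gap) or a duplicate/out-of-order FileSeq.
fn first_seq_break(seqs: &[u64]) -> Option<usize> {
    seqs.windows(2)
        .position(|w| w[1].checked_sub(w[0]) != Some(1))
}

fn main() {
    assert_eq!(first_seq_break(&[5, 6, 7, 8]), None);    // contiguous
    assert_eq!(first_seq_break(&[5, 6, 8]), Some(1));    // hole at 7
    assert_eq!(first_seq_break(&[5, 6, 6, 7]), Some(1)); // duplicate 6
}
```

Using checked_sub keeps the duplicate case (difference 0) and any out-of-order pair from underflowing, which a plain subtraction would in debug builds.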

@tabokie
Member

tabokie commented Mar 6, 2023

update changelog.

@LykxSassinator
Contributor Author

LykxSassinator commented Mar 6, 2023

is #294 (comment) tested?

Done in the log_batch::tests::test_empty_log_batch case.

@tabokie
Member

tabokie commented Mar 6, 2023

update changelog.

ping

@LykxSassinator
Contributor Author

update changelog.

ping

Also done in the last commit:

* Add a new configuration `spill-dir` to allow automatic placement of logs into an auxiliary directory when `dir` is full.

Successfully merging this pull request may close these issues.

Support secondary log directory
2 participants