Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding a realdir with target to '/' creates a self-referential '' entry that makes the fs unusable #901

Closed
pdemarti opened this issue Oct 25, 2023 · 14 comments · Fixed by #903

Comments

@pdemarti
Copy link

Describe the bug
My first baby steps with pyfakefs aren't going well... I am trying to populate a fake fs with custom entries under root (to test a system that expects certain files at absolute locations).

Instead of seeing the desired content, doing a simple os.walk() results in an infinite loop, seemingly because an extra, self-referential entry (the empty string '') is created under /. If I look at the beginning of the iterator of os.walk(), I see a repeating alternation of '/' and '/tmp'.

How To Reproduce

cd $(mktemp -d)
mkdir -p data/root/foo/bar
touch data/root/foo/bar/file-{0,1,2,3}.txt
touch __init__.py

cat > conftest.py <<"EOF"
import pytest

@pytest.fixture
def myfs(fs):
    fs.add_real_directory('data/root', target_path='/')
    yield fs
EOF

cat > test_fake.py <<"EOF"
import os

def test_fake(myfs):
    print('\n\nwith fake fs')
    for _, e in zip(range(10), os.walk('/')):
        print(e)

def test_real():
    print('\n\nwith real fs')
    os.chdir('data/root')
    for _, e in zip(range(10), os.walk('.')):
        print(e)
EOF

pytest -s test_fake.py

The output of the above is:

================================================== test session starts ==================================================
platform linux -- Python 3.10.9, pytest-7.1.2, pluggy-1.0.0
rootdir: /tmp/tmp.xpzkMjworA
plugins: xdist-2.5.0, anyio-3.5.0, cov-3.0.0, pyfakefs-5.3.0, hypothesis-6.29.3, forked-1.3.0
collected 2 items                                                                                                       

test_fake.py 

with fake fs
('/', ['tmp', ''], [])
('/tmp', [], [])
('/', ['tmp', ''], [])
('/tmp', [], [])
('/', ['tmp', ''], [])
('/tmp', [], [])
('/', ['tmp', ''], [])
('/tmp', [], [])
('/', ['tmp', ''], [])
('/tmp', [], [])
.

with real fs
('.', ['foo'], [])
('./foo', ['bar'], [])
('./foo/bar', [], ['file-2.txt', 'file-3.txt', 'file-0.txt', 'file-1.txt'])
.

=================================================== 2 passed in 0.19s ===================================================

What I was expecting was to see the same output for test_fake as for test_real, but relocated under '/'.

Your environment

Linux-6.2.0-32-generic-x86_64-with-glibc2.35
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
pyfakefs 5.3.0
pytest 7.1.2

Probable cause

When I run the test in PyCharm's debugger, I noticed a strange entry under the root dir of myfs: the empty string ('', third line of the output below):

>>> str(myfs)
'/'(40777):
  'tmp'(40777):
  ''(40775):
  'foo'(40775):
    'bar'(40775):
      'file-2.txt'(100444)
      'file-3.txt'(100444)
      'file-0.txt'(100444)
      'file-1.txt'(100444)

Incidentally, if I try to print that in the unit test itself (rather than copying the view from the debugger above), I get:

self = <pyfakefs.fake_file.FakeDirectory object at 0x7fe0490633d0>

    def __str__(self) -> str:
        description = super(FakeDirectory, self).__str__() + ":\n"
>       for item in self.entries:
E       RuntimeError: dictionary changed size during iteration

In any case, it seems that this erroneous entry has been created during fs.add_real_directory. Inspecting it in the debugger, that entry refers to the filesystem itself. Specifically:

>>> myfs.root_dir.entries[''].filesystem is myfs
True

# or, in other words:
>>> id(myfs.root_dir.entries[''].filesystem) == id(myfs)
True
@pdemarti
Copy link
Author

Workaround

The following produces the desired result.

cd $(mktemp -d)
mkdir -p data/root/foo/bar
touch data/root/foo/bar/file-{0,1,2,3}.txt
touch data/root/hello.txt
touch __init__.py

cat > conftest.py <<"EOF"
import os
import pytest


@pytest.fixture
def myfs(fs):
    fs.pause()
    content = [(e.name, e.path, os.path.isdir(e)) for e in os.scandir('data/root')]
    fs.resume()
    for name, path, isdir in content:
        if isdir:
            fs.add_real_directory(path, target_path=f'/{name}')
        else:
            fs.add_real_file(path, target_path=f'/{name}')
    yield fs
EOF

cat > test_fake.py <<"EOF"
import os

def test_fake(myfs):
    print('\n\nwith fake fs')
    for _, e in zip(range(10), os.walk('/')):
        print(e)

def test_real():
    print('\n\nwith real fs')
    os.chdir('data/root')
    for _, e in zip(range(10), os.walk('.')):
        print(e)
EOF

pytest -s test_fake.py

Output:

================================================== test session starts ==================================================
platform linux -- Python 3.10.9, pytest-7.1.2, pluggy-1.0.0
rootdir: /tmp/tmp.ykVCAbMQoK
plugins: xdist-2.5.0, anyio-3.5.0, cov-3.0.0, pyfakefs-5.3.0, hypothesis-6.29.3, forked-1.3.0
collected 2 items                                                                                                       

test_fake.py 

with fake fs
('/', ['tmp', 'foo'], ['hello.txt'])
('/tmp', [], [])
('/foo', ['bar'], [])
('/foo/bar', [], ['file-2.txt', 'file-3.txt', 'file-0.txt', 'file-1.txt'])
.

with real fs
('.', ['foo'], ['hello.txt'])
('./foo', ['bar'], [])
('./foo/bar', [], ['file-2.txt', 'file-3.txt', 'file-0.txt', 'file-1.txt'])
.

=================================================== 2 passed in 0.20s ===================================================

@mrbean-bremen
Copy link
Member

Thanks for the report and the investigation!
Mapping a real directory to root is something that I have never tested, so it is likely a bug somehow related to the special handling of the root path. I will probably have a closer look at the weekend.

@mrbean-bremen
Copy link
Member

So far I have not been able to reproduce your problem. I let the test run in the CI and got no errors on any system - maybe I'm doing something wrong. Here is the latest varaint of the test I have been using (modified it a bit to work on different systems):

import os
import shutil
import tempfile
from pathlib import Path

import pytest


@pytest.fixture(scope="session")
def real_dir():
    path = Path(tempfile.gettempdir()) / "data" / "root" / "foo" / "bar"
    os.makedirs(path, exist_ok=True)
    (path / "file.txt").touch(exist_ok=True)
    (path / "__init__.py").touch(exist_ok=True)
    yield path
    shutil.rmtree(path, ignore_errors=True)


@pytest.fixture
def myfs(real_dir, fs):
    fs.add_real_directory(real_dir, target_path=fs.root.path)
    yield fs


def test_fake(myfs):
    print("\n\nwith fake fs")
    for _, e in zip(range(10), os.walk(myfs.root.path)):
        print(e)


def test_real(real_dir):
    print("\n\nwith real fs")
    for _, e in zip(range(10), os.walk(real_dir)):
        print(e)

Can you please check the outcome of this test in your environment?

@pdemarti
Copy link
Author

pdemarti commented Oct 28, 2023

I am away from my desktop right now. But what I provided was a full reproducible example (including setting up of the data dir) using simple shell commands. Were you able to just execute the shell commands I provided? If so, what was the output?

@pdemarti
Copy link
Author

pdemarti commented Oct 28, 2023

Update: when I try your test code, I can see the unexpected output (same as in my case). You have to use pytest -s to see the output (the tests themselves don't fail, as written; just the output is incorrect, indicating the problem in the fake fs).

pytest -s test_it.py
================================================================================================ test session starts =================================================================================================
platform linux -- Python 3.10.9, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/pierred/tmp/foobar
plugins: xdist-2.5.0, anyio-3.5.0, cov-3.0.0, pyfakefs-5.3.0, hypothesis-6.29.3, forked-1.3.0
collected 2 items                                                                                                                                                                                                    

test_it.py 

with fake fs
('/', ['tmp', ''], [])
('/tmp', [], [])
('/', ['tmp', ''], [])
('/tmp', [], [])
('/', ['tmp', ''], [])
('/tmp', [], [])
('/', ['tmp', ''], [])
('/tmp', [], [])
('/', ['tmp', ''], [])
('/tmp', [], [])
.

with real fs
('/tmp/data/root/foo/bar', [], ['__init__.py', 'file.txt'])
.

================================================================================================= 2 passed in 0.21s =========

Addendum
To make it fail, try changing:

def test_fake(myfs):
    print("\n\nwith fake fs")
    for _, e in zip(range(10), os.walk(myfs.root.path)):
        print(e)

into:

def test_fake(myfs):
    print("\n\nwith fake fs")
    for e in os.walk(myfs.root.path):
        print(e)

You should then get a recursion error.

@mrbean-bremen
Copy link
Member

Ah sorry, ignore that - I just noticed that I can reproduce it, I just made a dumb mistake...
I was checking for a recursion error, but should have looked at the output which was indeed wrong. And yes, with your change I can also reproduce the recursion error, thanks!

@pdemarti
Copy link
Author

No worries at all. I am grateful that you're willing to take a look. If I had time, I would dig in, but alas, I have zero time at the moment.

@mrbean-bremen
Copy link
Member

The problem is not mapping to the root dir as such, but mapping to a directory already existing in the fake directory - this is currently not supported (never come up before). This may take a bit...

@mrbean-bremen
Copy link
Member

mrbean-bremen commented Oct 29, 2023

There was actually a test that checked that mapping to an existing directory raises an exception - which it does, except for the root path (which was actually a bug).
Mapping to an existing directory (including root) would mean to potentially merge two directory trees, with some conflicts possible, e.g. a file already existing in the fake file system that would be overwritten, or a file existing where a directory would be mapped. There may be more problems...
I'm not quite sure what the expected behavior in these cases should be - maybe you already have an idea.
I will put something together - though I guess this is not urgent, as a workaround exists, and I also have some time constraints, so it may take a few days.

mrbean-bremen added a commit to mrbean-bremen/pyfakefs that referenced this issue Oct 31, 2023
- changed behavior of add_real_directory() to be able to map
  a real directory to an existing directory in the fake filesystem
- fixes pytest-dev#901
mrbean-bremen added a commit to mrbean-bremen/pyfakefs that referenced this issue Oct 31, 2023
- changed behavior of add_real_directory() to be able to map
  a real directory to an existing directory in the fake filesystem
- fixes pytest-dev#901
@pdemarti
Copy link
Author

pdemarti commented Nov 5, 2023

I think the merge could be done in a few ways, a bit similar to some of the ways rsync can operate between a source and a destination.

  1. Hard replace (equivalent of rsync -az --force --delete src/ dst/). It would mean that the whole target path, if existing, is removed first and replaced by the new content. In any case, the result is that the new path becomes an exact mirror of the source.
  2. Merge when possible (e.g. both src and dst are dir, of src is file and dst doesn't exist), clobbering previous entries in cases where merge is not possible (e.g., src is file and dst is file or dir). This would be equivalent to rsync -az --force src/ dst/).
  3. Merge only if the target doesn't exist and raise otherwise.

We would need to think about how to control this by a clear set of flags. But how does it sound in terms of the logic?

@mrbean-bremen
Copy link
Member

mrbean-bremen commented Nov 12, 2023

Sorry, I seem to have missed your comment and completely forgot about my PR (#903)...

EDIT: Yours was not the only comment I missed - looks like I accidentally removed myself from the watchers for this repo (or have been removed somehow).

The PR uses a mix of the second and third approach. If a directory exists, the directories are merged to the extent that no files are overwritten. If a file would be overwritten (or a file exists where a directory would be created), an exception is raised (you can have a look at the PR, especially the tests which should show the behavior). If this is ok for you, I would merge it.

@mrbean-bremen
Copy link
Member

@pdemarti - I'm going to merge the PR, and if that isn't sufficient for you, I can see if I can make more changes.

mrbean-bremen added a commit that referenced this issue Nov 13, 2023
- changed behavior of add_real_directory() to be able to map
  a real directory to an existing directory in the fake filesystem
- fixes #901
@pdemarti
Copy link
Author

@mrbean-bremen : that sounds perfect. We probably don't need the other cases. Thank you so much for doing this.

The PR uses a mix of the second and third approach. If a directory exists, the directories are merged to the extent that no files are overwritten. If a file would be overwritten (or a file exists where a directory would be created), an exception is raised (you can have a look at the PR, especially the tests which should show the behavior). If this is ok for you, I would merge it.

@mrbean-bremen
Copy link
Member

Good to hear! I'm waiting for another PR to be finished and probably will make a bug fix release afterwards (I wanted to do it anyway because of another fixed bug).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants