-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathpathlib.Rmd
255 lines (187 loc) · 6.79 KB
/
pathlib.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
---
jupyter:
jupytext:
notebook_metadata_filter: all,-language_info
split_at_heading: true
text_representation:
extension: .Rmd
format_name: rmarkdown
format_version: '1.2'
jupytext_version: 1.10.3
kernelspec:
display_name: Python 3
language: python
name: python3
orphan: true
---
# Using the `pathlib` module
The `pathlib` module is one of two ways of manipulating and using file paths in
Python. See the [path manipulation](path_manipulation.Rmd) page for more
discussion, and [os.path](os_path.Rmd) page for a tutorial on an alternative
approach.
The primary documentation for `pathlib` is
<https://docs.python.org/3/library/pathlib.html>.
The standard way to use the Pathlib module is to import the `Path` class from the module:
```{python}
# Import Path class from pathlib library.
from pathlib import Path
```
In Jupyter or IPython, you can tab complete on `Path` to list the methods (functions) and attributes attached to it.
An object (value) of type `Path` represents a pathname. As you [remember
](path_manipulation.Rmd), a pathname is a string that identifies a
particular file or directory on a computer filesystem.
Let us start by making a default object from the `Path` class, like this:
```{python}
p = Path()
p
```
By default, the path object, here `p`, refers to our current working directory, or `.` for short. `.` is a *relative path*, meaning that we specify where we are relative to our current directory. `.` means we are exactly in our current directory.
Because the `.` is a *relative path*, it does not tell us where we are in
the filesystem, only where we are relative to the current directory.
Path objects have an `absolute` function attached to them. Another way of
saying this is that Path objects have an `absolute` *method*. Calling
this method gives us the *absolute* location of the path, meaning, the
filesystem position relative to the base location of the disk the file is
on.
```{python}
abs_p = p.absolute()
abs_p
```
Notice the `/` in front of the absolute filename (on Unix), meaning the
base location for all files. You will see a drive location like `C:` or
similar, at the front of the absolute path, if you are on Windows.
We can always convert the `Path` object to a simple string, using the
`str` function. `str()` converts anything to a string, if it can:
```{python}
# The path, as a string
str(abs_p)
```
Sometimes we want to get a path referring the directory *containing* a path.
The Path object has a `parent` attribute attached to it (an attribute is data
attached to an object). The `parent` attribute is a Path object for the
containing directory:
```{python}
abs_p.parent
```
The `parent` attribute of the Path object gives the directory name from a
full file path. It works correctly for Unix paths on Unix machines, and
Windows paths on Windows machines.
```{python}
# On Unix
a_path = Path('/a/full/path/then_filename.txt')
# Show the directory containing the file.
a_path.parent
```
`parent` also works for relative paths.
```{python}
# On Unix
rel_path = Path('relative/path/then_filename.txt')
rel_path.parent
```
Use the `name` attribute of the Path object to get the filename rather
than the directory name:
```{python}
# On Unix
rel_path.name
```
Sometimes you want to join one or more directory names with a filename to get a
path. Path objects have a clever way of doing this, by *overriding* the `/`
(division) operator.
To remind you about operator overloading, remember that addition means different things for numbers and strings. For numbers, addition means arithmetic addition:
```{python}
# Addition for numbers
2 + 2
```
For strings, addition means concatenation — sticking the strings together:
```{python}
# Addition for strings.
"first" + "second"
```
Path objects use the division operator `/` to mean "stick the path fragments
together to make a new path, where the `/` separates directories".
```{python}
# On Unix
Path('relative') / 'path' / 'then_filename.txt'
```
This also works on Windows and Unix in the same way.
Sometimes you want to get the filename extension. Use the `suffix` attribute for this:
```{python}
rel_path
```
```{python}
rel_path.suffix
```
You will often find yourself wanting to replace the file extension. You can do this with the `with_suffix` method:
```{python}
rel_path.with_suffix('.md')
```
Path objects also have methods that allow you to read and write text characters and raw bytes.
Let us make a new path to point to a file we will write in the current directory.
```{python}
new_path = Path() / 'a_test_file.txt'
new_path
```
We can write text characters (strings) to this file, with the `write_text` method:
```{python}
a_multiline_string = """Some text.
More text.
Last text."""
new_path.write_text(a_multiline_string)
```
We can read the text out of a file using `read_text`:
```{python}
new_path.read_text()
```
Similarly, we can write and read raw byte data, using `write_bytes` and `read_bytes`.
It is often useful to read in a text file, and split the result into lines. We
do this with `read_text`, and then we use the `splitlines` method of string
object to split the read text into lines.
```{python}
text = new_path.read_text()
text.splitlines()
```
## Listing files in a directory
We can use the `glob` method of the `Path` object to give a list of all, or
some files in a directory.
For example, to see all files in the current directory, we could do this:
```{python}
cwd = Path()
list(cwd.glob('*'))
```
Notice two things here.
### Selecting files with `glob`
The argument to the `glob` method above is `'*'`. The `'*'` tells `glob` to
get *all* files and directories, using what is called a [Glob
match](https://docs.python.org/3/library/glob.html). This is a powerful
feature that allows you to be selective in asking for the files that `glob`
returns. For example, if you wanted to see only the files ending with `.txt`
you could do:
```{python}
list(cwd.glob('*.txt'))
```
There are more detail in the page linked above.
### `list` around the output of `glob`
Notice that we used `list` around the output of `glob`, as in, for example:
```{python}
# List all txt files.
list(cwd.glob('*.txt'))
```
This is because `glob` returns something called a *generator* which *can*
return all the Path objects, but will not do that until we ask it to.
```{python}
cwd.glob('*.txt')
```
The `list` call converts the result into a list, and in doing so, asks the
generator to return all the Path objects:
```{python}
list(cwd.glob('*.txt'))
```
## Deleting files
And finally, to be tidy, we use the `unlink` method to delete the temporary
file we were using. `unlink` is [strangely
named](https://linux.die.net/man/2/unlink), where the name refers to the way
the computer disk system stores files, but does always have the effect of
deleting the file.
```{python}
new_path.unlink()
```