Scratchpad:Paths

Please note: This is a work in progress.

OpenLP currently uses string objects to represent file and directory paths. Since Python 3.4, the standard library has included pathlib, a module that introduces a Path object.

Switching to this Path object will make it easier to deal with file paths on different platforms. In some cases it also reduces the line count and, in my opinion, makes the code cleaner and easier to read.

Naming Convention

At this point I would like to propose a naming convention.

  • All variables that reference a Path object end with '_path', e.g. save_path, media_path
  • Variables that reference a string representation of part of a path end with '_name', e.g. file_name, directory_name

Lists of the aforementioned types shall be plurals, e.g.

  • save_paths, media_paths
  • file_names, directory_names

The Path object

Here are some examples to help get started using pathlib.

All of these code samples are from work I've done on refactoring OpenLP to use Path objects. (Some have been simplified to provide a concise example)

Creating paths

The existing way using strings:

path = os.path.join(AppLocation.get_section_data_path('themes'), 'theme_name')

Using a Path object (note that once OpenLP has been converted to use Path objects, AppLocation.get_section_data_path will return a Path object):

# Using the Path constructor (If you're creating a Path object from scratch)
path = Path(AppLocation.get_section_data_path('themes'), 'theme_name')

# Creating a new Path object from an existing Path object 
path = AppLocation.get_section_data_path('themes') / 'theme_name'

# -- or --
path = AppLocation.get_section_data_path('themes').joinpath('theme_name')

The '/' operator joins two Paths, or a Path and a string, regardless of whether the operating system uses forward slashes or backslashes.
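
As a small illustration of the '/' operator, the sketch below uses the pure path classes so that it gives the same result on any operating system. The directory and file names are made up for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical directory names, used only for illustration.
themes_path = PurePosixPath('data') / 'themes'

# A string may appear on either side of the operator, as long as the other operand is a path.
theme_path = themes_path / 'blue' / 'blue.xml'
prefixed_path = 'openlp' / themes_path

# The separator in the string form depends on the path flavour, not on what was typed.
windows_theme_path = PureWindowsPath('data') / 'themes' / 'blue.xml'

print(theme_path)          # data/themes/blue/blue.xml
print(prefixed_path)       # openlp/data/themes
print(windows_theme_path)  # data\themes\blue.xml
```

Two plain strings cannot be joined this way; at least one operand has to be a path object.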

Formatting strings

Nothing special needs to be done when using a Path object as an argument to the format method of a string.

from pathlib import PurePosixPath, PureWindowsPath
'Directory: {path}'.format(path=PurePosixPath('test', 'path')) == 'Directory: test/path'
'Directory: {path}'.format(path=PureWindowsPath('test', 'path')) == 'Directory: test\\path'

Using Paths

The Path object is divided into ConcretePath objects (ones whose methods access the file system) and PurePath objects (ones whose methods provide their functionality without accessing the file system). These objects are subclassed to provide the Path object. ("ConcretePath" is used on this page as shorthand for pathlib's concrete path classes: Path, PosixPath and WindowsPath.) See the pathlib documentation for more details.

PurePath Methods

These methods do not access the file system. Consequently, PurePosixPath can be used on Windows and PureWindowsPath can be used on Posix systems. The same cannot be said for the ConcretePath objects.
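
The difference can be seen in a short sketch. The paths are invented for the example, and the last part simply picks whichever concrete class does not belong to the platform the code is running on.

```python
import os
from pathlib import PosixPath, PurePosixPath, PureWindowsPath, WindowsPath

# Pure paths of either flavour can be created on any operating system.
windows_song_path = PureWindowsPath('C:/Users/openlp/song.osz')
posix_song_path = PurePosixPath('/home/openlp/song.osz')

print(windows_song_path.name)   # song.osz
print(windows_song_path.drive)  # C:
print(posix_song_path.parent)   # /home/openlp

# The concrete class belonging to the *other* platform cannot be instantiated.
foreign_class = PosixPath if os.name == 'nt' else WindowsPath
try:
    foreign_class('song.osz')
    foreign_allowed = True
except NotImplementedError:
    foreign_allowed = False

print(foreign_allowed)  # False
```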

name (File / Directory Names)

Used to access the name of the last part of the path (anything after the last slash)

With os.path:

filename = os.path.split(self.theme.background_filename)[1]

# -- or --
filename = os.path.basename(self.theme.background_filename)

With pathlib:

file_name = self.theme.background_file_path.name

Note: See the "Path object removes the trailing /" section for a difference between how os.path and pathlib.Path handle trailing slashes.

with_name (File / Directory Names)

Same as above, but allows the file / directory name to be easily changed when using a Path object

With os.path:

data_folder_backup_path = data_folder_path + '-' + timestamp

With pathlib:

data_folder_backup_path = data_folder_path.with_name(data_folder_path.name + '-' + timestamp)

suffix (File extensions)

In pathlib the extension is known (correctly) as the suffix.

With os.path:

extension = os.path.splitext(file_name)[1].lower()

With pathlib:

extension = file_path.suffix.lower()

with_suffix (File extensions)

As with file names, pathlib makes replacing the extension/suffix a breeze.

With os.path:

if os.path.splitext(file_name)[1] == '':
    file_name += '.osz'
else:
    ext = os.path.splitext(file_name)[1]
    # str.replace returns a new string, so the result must be assigned
    file_name = file_name.replace(ext, '.osz')

With pathlib:

# with_suffix returns a new Path object, so the result must be assigned
file_path = file_path.with_suffix('.osz')

stem (File name without extension)

The stem attribute gets the file name without its final extension.

With os.path this involved a two-step process of 'splitting' off the name and then 'splitting' off the extension.

With os.path:

path_file_name = self.file_name()
path, file_name = os.path.split(path_file_name)
base_name = os.path.splitext(file_name)[0]

With pathlib:

base_name = self.file_name().stem
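
File names with more than one extension are worth a quick look, because suffix, stem and with_suffix only deal with the final one. The file name below is hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical file name, used only for illustration.
archive_path = PurePosixPath('backups/service.tar.gz')

print(archive_path.name)      # service.tar.gz
print(archive_path.suffix)    # .gz
print(archive_path.suffixes)  # ['.tar', '.gz']
print(archive_path.stem)      # service.tar

# with_suffix replaces only the final suffix, and appends one if there is none.
print(archive_path.with_suffix('.osz'))              # backups/service.tar.osz
print(PurePosixPath('service').with_suffix('.osz'))  # service.osz
```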

parent

Get the parent directory. Unlike os.path.dirname, this returns a Path object rather than a string.

With os.path:

last_dir = os.path.split(file)[0]

# -- or --
last_dir = os.path.dirname(file)

With pathlib:

last_dir_path = file_path.parent
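
A short sketch of how parent behaves at the edges, using an invented path. Note that a bare file name has '.' as its parent, whereas os.path.dirname returns an empty string in that case.

```python
import os.path
from pathlib import PurePosixPath

file_path = PurePosixPath('/home/openlp/themes/blue.xml')

print(file_path.parent)         # /home/openlp/themes
print(file_path.parent.parent)  # /home/openlp
print(file_path.parents[1])     # /home/openlp

# parent is purely lexical: a bare file name has '.' as its parent,
# and the root is its own parent.
print(PurePosixPath('blue.xml').parent)  # .
print(PurePosixPath('/').parent)         # /

# os.path.dirname gives an empty string for a bare file name.
print(repr(os.path.dirname('blue.xml')))  # ''
```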

ConcretePath Methods

ConcretePath methods perform reads or writes on the file system. Because of this, the ConcretePath implementations can only be used on the system for which they were written.

stat

With os.path:

os.path.getsize(file_name) == 0

With pathlib:

file_path.stat().st_size == 0

With os.path:

image_date = os.stat(file_path).st_mtime

With pathlib:

image_date = file_path.stat().st_mtime

exists

Does the path exist, regardless of whether it is a file or a directory?

With os.path:

if os.path.exists(thumb_path):

With pathlib:

if thumb_path.exists():

is_dir

Is the path a directory?

With os.path:

if os.path.isdir(local_file):

With pathlib:

if local_path.is_dir():

is_file

Is the path a file?

With os.path:

if not os.path.isfile(text_file):

With pathlib:

if not text_file_path.is_file():
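
The checks above can be tried out safely in a temporary directory. The sketch below is only an illustration with made-up file names; it also shows that exists() and is_file() return False for a missing path, while stat() raises an exception.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir_name:
    temp_dir_path = Path(temp_dir_name)
    notes_path = temp_dir_path / 'notes.txt'
    missing_path = temp_dir_path / 'missing.txt'

    notes_path.write_text('hello', encoding='utf-8')

    results = {
        'dir_is_dir': temp_dir_path.is_dir(),
        'file_is_file': notes_path.is_file(),
        'file_exists': notes_path.exists(),
        'file_size': notes_path.stat().st_size,
        # For a missing path these return False instead of raising an exception ...
        'missing_exists': missing_path.exists(),
        'missing_is_file': missing_path.is_file(),
    }

    # ... whereas stat() raises FileNotFoundError.
    try:
        missing_path.stat()
        results['missing_stat_raised'] = False
    except FileNotFoundError:
        results['missing_stat_raised'] = True

print(results)
```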
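
iterdir

Returns an iterator of Path objects, each one already joined to the directory path, so there is no need to loop over the results and join them yourself. The paths are only absolute if the directory path itself is absolute. Wrap the call in list() if a list is needed.

With os.path:

files = []
listing = os.listdir(local_file)
for file_name in listing:
    files.append(os.path.join(local_file, file_name))

With pathlib:

file_paths = list(local_path.iterdir())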

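iterdir can also replace os.walk when only results from the source directory are expected (i.e. no sub directories). Note that iterdir yields directories as well as files, so filter with is_file() if only files are wanted.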

With os.path:

for files in os.walk(source):
    for name in files[2]:

With pathlib:

for file_path in source_path.iterdir():
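
To see what iterdir actually yields, here is a self-contained sketch that builds a small, made-up directory tree in a temporary directory and lists it.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir_name:
    source_path = Path(temp_dir_name)
    (source_path / 'a.txt').write_text('a', encoding='utf-8')
    (source_path / 'b.txt').write_text('b', encoding='utf-8')
    (source_path / 'sub').mkdir()
    (source_path / 'sub' / 'c.txt').write_text('c', encoding='utf-8')

    # iterdir is lazy and yields entries in arbitrary order, so sort if order matters.
    entry_names = sorted(entry_path.name for entry_path in source_path.iterdir())

    # Directories are included, and sub directories are not descended into.
    file_names = sorted(
        entry_path.name for entry_path in source_path.iterdir() if entry_path.is_file())

    # Each entry is the directory path joined with the entry's name.
    all_children = all(entry_path.parent == source_path for entry_path in source_path.iterdir())

print(entry_names)   # ['a.txt', 'b.txt', 'sub']
print(file_names)    # ['a.txt', 'b.txt']
print(all_children)  # True
```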

open

With os.path:

with open(filename, 'rb') as detect_file:

With pathlib:

with file_path.open('rb') as detect_file:

read_text

Open the file and read out the text.

With os.path:

song_file = open(self.import_source, 'rt', encoding='utf-8-sig')
file_content = song_file.read()
song_file.close()

# -- or --
with open(self.import_source, 'rt', encoding='utf-8-sig') as song_file:
    file_content = song_file.read()

With pathlib:

file_content = self.import_source.read_text(encoding='utf-8-sig')
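
The encoding argument is passed through to open, so 'utf-8-sig' behaves as it does there. The sketch below, using a made-up file in a temporary directory, shows the byte order mark being stripped on reading.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir_name:
    song_path = Path(temp_dir_name) / 'song.txt'

    # Write a UTF-8 file that starts with a byte order mark (BOM).
    song_path.write_bytes(b'\xef\xbb\xbfAmazing Grace')

    with_bom = song_path.read_text(encoding='utf-8')
    without_bom = song_path.read_text(encoding='utf-8-sig')

print(repr(with_bom))     # '\ufeffAmazing Grace'
print(repr(without_bom))  # 'Amazing Grace'
```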

write_text

Open the file and write the text to it.

With os.path:

fn = open(notes_file, mode='wt', encoding='utf-8')
fn.write(note)
fn.close()

# -- or --
with open(notes_file, mode='wt', encoding='utf-8') as fn:
    fn.write(note)

With pathlib:

notes_path.write_text(note, encoding='utf-8')

resolve

Wrappers and Utility functions

Gotchas

No such thing as a Falsey path

Perhaps the biggest annoyance of the Path object is that it is assumed to be relative to the current working directory. If it's instantiated without any arguments, or with an empty string, it's still an object with a path relative to the current working directory.

Path() == Path('') == Path('.')

Previously in OpenLP there would be cases where we did things like:

file_name = ''
# some code ...
if file_name:

We could do this because an empty string is a Falsey value. However, all Path objects are Truthy.
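
A quick check of this behaviour, using a pure path so the output is the same on every platform:

```python
from pathlib import PurePosixPath

empty_name = ''
empty_path = PurePosixPath(empty_name)

print(bool(empty_name))  # False
print(bool(empty_path))  # True
print(str(empty_path))   # .
```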

To work round this, empty path variables should be defined as None. This leads to extra effort when handling things like QFileDialogs, as they return an empty string if the user cancels the dialog box, meaning we can't just wrap the return value with a Path object. Instead the return value needs evaluating and, if it is equal to a Falsey value, we need to return None.

file_name = ''
# some code ..
if file_name == '':
    file_path = None
else:
    file_path = Path(file_name)

Of course it goes the other way too. We cannot just call str() on a variable which stores a Path object, as it could be None, and str(None) == 'None'. So something like the following is needed.

file_path = None
# some code ..
if file_path is None:
    file_name = ''
else:
    file_name = str(file_path)


To simplify this I have implemented a version of both the above code samples as utilities path_to_str and str_to_path.
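
The helpers themselves are not shown on this page. A minimal sketch of how such a pair could look is given below. The signatures and the type checks are assumptions based on the two snippets above, and the real OpenLP implementations may differ.

```python
from pathlib import Path


def str_to_path(string):
    """Hypothetical sketch: convert a string to a Path, mapping '' to None."""
    if not isinstance(string, str):
        raise TypeError('string must be a str')
    if string == '':
        return None
    return Path(string)


def path_to_str(path=None):
    """Hypothetical sketch: convert a Path (or None) to a string, mapping None to ''."""
    if path is None:
        return ''
    if not isinstance(path, Path):
        raise TypeError('path must be a Path or None')
    return str(path)


print(str_to_path(''))           # None
print(repr(path_to_str(None)))   # ''
print(path_to_str(str_to_path('themes')))  # themes
```

With helpers like these, a cancelled file dialog (an empty string) becomes None in one call, and None becomes an empty string again when a string is needed.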

Path object removes the trailing /

Another feature to look out for is that the Path object removes the trailing slash. For example:

str(Path('a/')) == 'a'

Path('a/') == Path('a')

This kind of makes sense. Drop into a terminal and try the following (should work on Windows too):

:~$ cd Documents/
:~/Documents$ cd ..

:~$ cd Documents
:~/Documents$

However, this leads to some inconsistencies between the os.path module and the pathlib module. Here are some examples (not an exhaustive list):

a_name = 'user/desktop'
b_name = 'user/desktop/'

a_path = Path(a_name)
b_path = Path(b_name)

(a_path == b_path) == True

# Get the file / directory name
os.path.basename(a_name) ==  'desktop'
os.path.basename(b_name) == ''

a_path.name == 'desktop'

# Get the parent directory
os.path.dirname(a_name) == 'user'
os.path.dirname(b_name) == 'user/desktop'

a_path.parent == Path('user')

Saving Paths

To save the Path in a cross platform way, we should consider using relative Paths, i.e. relative to the service file, theme file and so on.

Whilst PureWindowsPath accepts both forward slashes and backslashes, PurePosixPath only supports forward slashes.

For ultimate portability, we should save the value of parts on the Path object. That way they can be used in a Path object constructor. See the example that follows:

orig_path = Path('user/desktop')
orig_path.parts == ('user', 'desktop')

orig_parts = orig_path.parts

new_path = Path(*orig_parts)
new_path == Path('user/desktop')
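
The pure path classes make it possible to try this across flavours on one machine. The sketch below uses an invented relative path. It also shows that absolute paths are less portable, since the drive and root end up in the first part.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A relative path as it might have been saved on Windows.
windows_path = PureWindowsPath('user\\desktop\\song.osz')
saved_parts = windows_path.parts
print(saved_parts)  # ('user', 'desktop', 'song.osz')

# Rebuilding from the parts works for either flavour.
print(PurePosixPath(*saved_parts))    # user/desktop/song.osz
print(PureWindowsPath(*saved_parts))  # user\desktop\song.osz

# Passing the Windows string straight to a Posix path does not split it.
print(PurePosixPath('user\\desktop\\song.osz').parts)  # ('user\\desktop\\song.osz',)

# For an absolute Windows path the first part holds the drive and root.
print(PureWindowsPath('C:/Users/openlp').parts)  # ('C:\\', 'Users', 'openlp')
```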

JSON

Once we have the parts of the Path object, encoding them with JSON means that we can save the path as a string.

I have implemented a JSON encoder and decoder which automatically does the conversion to / from Path objects:

import json
from pathlib import Path
from json import JSONDecoder, JSONEncoder


class OpenLPJsonEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Path):
            return {'__path__': obj.parts}
        return super().default(obj)


class OpenLPJsonDecoder(JSONDecoder):
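    def decode(self, jsn):
        obj = super().decode(jsn)
        # Only a top-level JSON object can be a path; lists, numbers and null pass through
        if isinstance(obj, dict) and '__path__' in obj:
            return Path(*obj['__path__'])
        return obj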


orig_path = Path('user', 'home')
json_encoded_path = json.dumps(orig_path, cls=OpenLPJsonEncoder)
json_encoded_path == '{"__path__": ["user", "home"]}'

new_path = json.loads(json_encoded_path, cls=OpenLPJsonDecoder)
new_path == Path('user/home')
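
The decoder above only looks at the top-level value. If Path objects can appear nested inside other structures (an assumption about how the encoder will be used), one possible variation is to do the conversion in an object_hook, which the json module calls for every decoded JSON object. The names path_default and path_object_hook, and the sample settings, are made up for this sketch.

```python
import json
from pathlib import Path


def path_default(obj):
    """Encode Path objects as a tagged dictionary of their parts."""
    if isinstance(obj, Path):
        return {'__path__': obj.parts}
    raise TypeError('Object of type {} is not JSON serializable'.format(type(obj).__name__))


def path_object_hook(obj):
    """Called for every decoded JSON object (dict), including nested ones."""
    if '__path__' in obj:
        return Path(*obj['__path__'])
    return obj


# Hypothetical data with Path objects nested in a dict and in a list.
settings = {
    'theme': Path('themes', 'blue.xml'),
    'recent': [Path('services', 'sunday.osz')],
}
encoded = json.dumps(settings, default=path_default)
decoded = json.loads(encoded, object_hook=path_object_hook)

print(decoded == settings)  # True
```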

SQLAlchemy

Wrapping this in a SQLAlchemy type becomes trivial.

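import json

import sqlalchemy.types as types


class PathType(types.TypeDecorator):
    """Store a Path object in a Unicode column as its JSON encoded parts."""
    impl = types.Unicode

    def process_bind_param(self, value, dialect):
        # Python -> database. Keep None as NULL rather than storing the string 'null'
        if value is None:
            return None
        return json.dumps(value, cls=OpenLPJsonEncoder)

    def process_result_value(self, value, dialect):
        # Database -> Python. json.loads cannot handle None, so pass NULL through
        if value is None:
            return None
        return json.loads(value, cls=OpenLPJsonDecoder)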