Python's "with open(...) as ..." Pattern | ||||||||||||||||||
use Python's "with open(...) as ..." pattern to open a text file and read its contents
with open('data.txt', 'r') as f:
    data = f.read()

open() takes a filename and a file mode as its arguments; for further info on file modes see Reading and Writing Text Files
Getting a Directory Listing
to get a list of all the files and folders in a particular directory in the filesystem, use os.listdir() (legacy Python) or os.scandir() / pathlib.Path.iterdir() (modern Python)
os.scandir() and pathlib.Path.iterdir() also return file and directory properties such as file size and modification date

Directory Listing in Legacy Python Versions
in legacy versions of Python use os.listdir() to get a directory listing
>>> import os
>>> entries = os.listdir('my_directory/')
>>> os.listdir('my_directory/')
['sub_dir_c', 'file1.py', 'sub_dir_b', 'file3.txt', 'file2.csv', 'sub_dir']

os.listdir() returns a Python list containing the names of the files and subdirectories; can use a loop to read the list

>>> entries = os.listdir('my_directory/')
>>> for entry in entries:
...     print(entry)
...
sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir

Directory Listing in Modern Python Versions
os.scandir() returns an iterator as opposed to a list when called
>>> import os
>>> entries = os.scandir('my_directory/')
>>> entries
<posix.ScandirIterator object at 0x7f5b047f3690>

the ScandirIterator points to all the entries in the current directory; can loop over the contents of the iterator and print out the filenames

import os

with os.scandir('my_directory/') as entries:
    for entry in entries:
        print(entry.name)

another way to get a directory listing is to use the pathlib module

from pathlib import Path

entries = Path('my_directory/')
for entry in entries.iterdir():
    print(entry.name)

objects returned by Path are either PosixPath or WindowsPath objects depending on the OS
pathlib.Path() objects have an .iterdir() method for creating an iterator of all files and folders in a directory
each entry yielded by .iterdir() contains information about the file or directory such as its name and file attributes
pathlib offers a set of classes featuring most of the common operations on paths in an easy, object-oriented way
using pathlib is at least as efficient as, if not more efficient than, using the functions in os
a benefit of using pathlib over os is that it reduces the number of imports needed to manipulate filesystem paths
for further info, see Python's pathlib Module
in summary, the directory-listing functions are os.listdir(), os.scandir(), and pathlib.Path.iterdir()
Listing All Files in a Directory
to filter out directories and only list files from a directory listing produced by os.listdir() use os.path
import os

# List all files in a directory using os.listdir
basepath = 'my_directory/'
for entry in os.listdir(basepath):
    # only print out filenames and not directories
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)

an easier and cleaner way is to use os.scandir() or pathlib.Path()

import os

# List all files in a directory using scandir()
basepath = 'my_directory/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)
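for completeness, a pathlib.Path() version of the same file listing — a minimal sketch, not part of the original notes:

from pathlib import Path

# List all files in a directory using pathlib
basepath = Path('my_directory/')
for entry in basepath.iterdir():
    if entry.is_file():
        print(entry.name)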
Listing Subdirectories

to list subdirectories instead of files, use one of the following methods
import os

# List all subdirectories using os.listdir
basepath = 'my_directory/'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)

using pathlib.Path() is again much simpler and cleaner

from pathlib import Path

# List all subdirectories using pathlib
basepath = Path('my_directory/')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)
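an os.scandir() version works the same way; a minimal sketch along the lines of the file-listing example above:

import os

# List all subdirectories using scandir()
basepath = 'my_directory/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)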
Getting File Attributes
os.scandir() and pathlib.Path() retrieve a directory listing with file attributes combined; this can be more efficient than using os.listdir() to list files and then getting file attribute information for each file

>>> import os
>>> with os.scandir('my_directory/') as dir_contents:
...     for entry in dir_contents:
...         info = entry.stat()  # list time file was last modified
...         print(info.st_mtime)
...
1539032199.0052035
1539032469.6324475
1538998552.2402923
1540233322.4009316
1537192240.0497339
1540266380.3434134

os.scandir() returns a ScandirIterator object
each entry in a ScandirIterator object has a .stat() method that retrieves information about the file or directory it points to
.stat() provides information such as file size and the time of last modification
the pathlib module has corresponding methods for retrieving file information that give the same results

>>> from pathlib import Path
>>> current_dir = Path('my_directory')
>>> for path in current_dir.iterdir():
...     info = path.stat()  # list time file was last modified
...     print(info.st_mtime)
...
1539032199.0052035
1539032469.6324475
1538998552.2402923
1540233322.4009316
1537192240.0497339
1540266380.3434134

the st_mtime attribute returns a float value that represents seconds since the epoch
to convert the values returned by st_mtime for display purposes, write a helper function that converts the seconds into a datetime object

from datetime import datetime
from os import scandir

def convert_date(timestamp):
    d = datetime.utcfromtimestamp(timestamp)
    formatted_date = d.strftime('%d %b %Y')
    return formatted_date

def get_files():
    dir_entries = scandir('my_directory/')
    for entry in dir_entries:
        if entry.is_file():
            info = entry.stat()
            print(f'{entry.name}\t Last Modified: {convert_date(info.st_mtime)}')

the arguments passed to .strftime() are format codes: '%d' is the day of the month, '%b' is the abbreviated month name, and '%Y' is the four-digit year
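as a quick illustration of those format codes, converting one of the timestamps shown above by hand (the exact string naturally depends on the timestamp):

>>> from datetime import datetime
>>> datetime.utcfromtimestamp(1539032199.0052035).strftime('%d %b %Y')
'08 Oct 2018'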
Making Directories
Creating a Single Directory
the os and pathlib modules include functions for creating directories
import os

os.mkdir('example_directory/')

if a directory already exists, os.mkdir() raises FileExistsError
can also create a directory using pathlib

from pathlib import Path

p = Path('example_directory/')
p.mkdir()

again, if the directory already exists, .mkdir() raises FileExistsError
to avoid errors like this, catch the error when it happens and let the user know

from pathlib import Path

p = Path('example_directory')
try:
    p.mkdir()
except FileExistsError as exc:
    print(exc)

alternatively, ignore the FileExistsError by passing the exist_ok=True argument to .mkdir()

from pathlib import Path

p = Path('example_directory')
p.mkdir(exist_ok=True)

Creating Multiple Directories
os.makedirs() is similar to os.mkdir()
the difference between the two is that not only can os.makedirs() create individual directories, it can also be used to create directory trees
it creates any necessary intermediate folders to ensure a full path exists
to create a group of directories like 2018/10/05:

import os

os.makedirs('2018/10/05')

results in:

.
|
└── 2018/
    └── 10/
        └── 05/

.makedirs() creates directories with default permissions
if directories with different permissions are needed, call .makedirs() passing in the mode for the directories to be created in

import os

os.makedirs('2018/10/05', mode=0o770)

creates the 2018/10/05 directory structure and gives the owner and group users read, write, and execute permission
the default mode is 0o777
the mode argument does not affect the file permission bits of newly created intermediate-level directories
octal values are used to represent file permissions and are specified with a leading 0o or 0O
each digit in the octal number represents a specific set of permissions for the owner, group, and others, respectively
the digits range from 0 to 7, where each bit represents a permission: read (4), write (2), and execute (1)
0o755 represents rwxr-xr-x permissions
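pathlib can build the same tree; a minimal sketch, where parents=True creates the intermediate directories and exist_ok=True suppresses FileExistsError:

from pathlib import Path

# create the full 2018/10/05 path, including intermediate directories
Path('2018/10/05').mkdir(parents=True, exist_ok=True)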
Filename Pattern Matching
example directory tree used in this section:

.
├── sub_dir/
|   ├── file1.py
|   └── file2.py
|
├── admin.py
├── data_01_backup.txt
├── data_01.txt
├── data_02_backup.txt
├── data_02.txt
├── data_03_backup.txt
├── data_03.txt
└── tests.py

Using String Methods
.startswith() and .endswith() are useful when searching for patterns in filenames
first get a directory listing and then iterate over it

>>> import os
>>> # Get .txt files
>>> for f_name in os.listdir('some_directory'):
...     if f_name.endswith('.txt'):
...         print(f_name)

the code above finds all the files in some_directory/ ending with .txt
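a .startswith() check works the same way; a small sketch against the example tree above (output order depends on the filesystem, so it is omitted):

>>> import os
>>> # Get files whose names start with 'data'
>>> for f_name in os.listdir('.'):
...     if f_name.startswith('data'):
...         print(f_name)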
Simple Filename Pattern Matching Using fnmatch

fnmatch has more advanced functions and methods for pattern matching
fnmatch.fnmatch() is a function which supports the use of wildcards such as * and ? to match filenames
to find all .txt files in a directory using fnmatch, do the following:

>>> import os
>>> import fnmatch
>>> for file_name in os.listdir('some_directory/'):
...     if fnmatch.fnmatch(file_name, '*.txt'):
...         print(file_name)

More Advanced Pattern Matching
a bare * wildcard is not always enough for finding files that meet certain criteria
for example, you could be interested only in .txt files whose names contain the word data, a number between underscores, and the word backup
>>> for filename in os.listdir('.'):
...     if fnmatch.fnmatch(filename, 'data_*_backup.txt'):
...         print(filename)

Filename Pattern Matching Using glob
.glob() in the glob module works just like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files beginning with a period (.) as special
here's how to use glob to search for all Python (.py) source files in the current directory:

>>> import glob
>>> glob.glob('*.py')
['admin.py', 'tests.py']

glob can also search for files recursively in subdirectories

>>> import glob
>>> for file in glob.iglob('**/*.py', recursive=True):
...     print(file)
...
admin.py
tests.py
sub_dir/file1.py
sub_dir/file2.py

pathlib contains similar methods for making flexible file listings
can use Path.glob() to list file types that start with the letter p

>>> from pathlib import Path
>>> p = Path('.')
>>> for name in p.glob('*.p*'):
...     print(name)
...
admin.py
scraper.py
docs.pdf

calling p.glob('*.p*') returns a generator object which points to all files in the current directory whose file extension starts with the letter p
pathlib combines many of the best features of the os, os.path, and glob modules into one single module
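pathlib can also glob recursively with the ** pattern (or the equivalent .rglob()); a minimal sketch that should list the same files as the glob.iglob() example above:

>>> from pathlib import Path
>>> p = Path('.')
>>> # '**/' makes the pattern recursive; p.rglob('*.py') is equivalent
>>> for name in p.glob('**/*.py'):
...     print(name)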
Traversing Directories and Processing Files
example directory structure used in this section
.
|
├── folder_1/
|   ├── file1.py
|   ├── file2.py
|   └── file3.py
|
├── folder_2/
|   ├── file4.py
|   ├── file5.py
|   └── file6.py
|
├── test1.txt
└── test2.txt

os.walk() defaults to traversing directories in a top-down manner

import os

# Walking a directory tree and printing the names of the directories and files
for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

os.walk() returns three values on each iteration of the loop: the name of the current folder (dirpath), a list of the folders in the current folder (dirnames), and a list of the files in the current folder (files)
Found directory: .
test1.txt
test2.txt
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py

to traverse the directory tree in a bottom-up manner, pass in a topdown=False keyword argument to os.walk()

for dirpath, dirnames, files in os.walk('.', topdown=False):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
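the filenames yielded by os.walk() are relative to dirpath, so joining the two gives a path you can actually open; a minimal sketch, not in the original notes:

import os

# build full paths for every file under the current directory
for dirpath, dirnames, files in os.walk('.'):
    for file_name in files:
        print(os.path.join(dirpath, file_name))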
Making Temporary Files and Directories
the tempfile module can be used to open and store data temporarily in a file or directory while the app is running
tempfile handles the deletion of the temporary files when the app is done with them

from tempfile import TemporaryFile

# Create a temporary file and write some data to it
fp = TemporaryFile('w+t')
fp.write('Hello universe!')

# Go back to the beginning and read data from file
fp.seek(0)
data = fp.read()

# Close the file, after which it will be removed
fp.close()

the mode is 'w+t', which makes tempfile create a temporary text file in write mode
no need to give the temporary file a filename as it will be deleted after the script has completed
temporary files and directories are stored in a special system directory for storing temporary files
Python searches a standard list of directories to find one that the user can create files in
on Windows the directories are C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order
.TemporaryFile() is also a context manager, so it can be used in conjunction with the with statement

with TemporaryFile('w+t') as fp:
    fp.write('Hello universe!')
    fp.seek(0)
    fp.read()
# File is now closed and removed

tempfile can also be used to create temporary directories using tempfile.TemporaryDirectory()

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     print('Created temporary directory ', tmpdir)
...     os.path.exists(tmpdir)
...
Created temporary directory  /tmp/tmpoxbkrm6c
True
>>> # Directory contents have been removed
...
>>> tmpdir
'/tmp/tmpoxbkrm6c'
>>> os.path.exists(tmpdir)
False
Deleting Files and Directories
Deleting Files in Python
import os

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data_0.txt'
os.remove(data_file)

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data_1.txt'
os.unlink(data_file)

both functions will throw an OSError if the path passed to them points to a directory instead of a file
can either check that what is to be deleted is actually a file and only delete it if it is, or use exception handling to handle the OSError

import os

data_file = 'home/data.txt'

# If the file exists, delete it
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')

os.path.isfile() checks whether data_file is actually a file
if it is, it is deleted by the call to os.remove()
if data_file points to a folder, an error message is printed to the console

using exception handling when deleting files:

import os

data_file = 'home/data.txt'

# Use exception handling
try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')

can also use pathlib.Path.unlink() to delete files

from pathlib import Path

data_file = Path('home/data.txt')

try:
    data_file.unlink()
except IsADirectoryError as e:
    print(f'Error: {data_file} : {e.strerror}')

this creates a Path object called data_file that points to a file
calling .unlink() on data_file will delete home/data.txt
if data_file points to a directory, an IsADirectoryError is raised
a Python program has the same permissions as the user running it; if the user does not have permission to delete the file, a PermissionError is raised
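on Python 3.8 and later, .unlink() also accepts a missing_ok argument, which avoids the try/except when a missing file is acceptable; a small sketch reusing the same illustrative path:

from pathlib import Path

# no exception is raised if home/data.txt does not exist (Python 3.8+)
Path('home/data.txt').unlink(missing_ok=True)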
Deleting Directories

the standard library offers the following functions for deleting directories: os.rmdir(), pathlib.Path.rmdir(), and shutil.rmtree()
os.rmdir() and pathlib.Path.rmdir() only work if the directory is empty; if the directory isn't empty, an OSError is raised

import os

trash_dir = 'my_documents/bad_dir'

try:
    os.rmdir(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

can use pathlib to delete a directory as well

from pathlib import Path

trash_dir = Path('my_documents/bad_dir')

try:
    trash_dir.rmdir()
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

Deleting Entire Directory Trees
to delete non-empty directories and entire directory trees use shutil.rmtree()
import shutil

trash_dir = 'my_documents/bad_dir'

try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

to delete empty folders recursively, use os.rmdir() in conjunction with os.walk()

import os

for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)
    except OSError as ex:
        pass

this walks down the directory tree and tries to delete each directory it finds
if the directory isn't empty, an OSError is raised and that directory is skipped
Copying, Moving, and Renaming Files and Directories
shutil is short for shell utilities; it provides a number of high-level operations on files to support copying, archiving, and removal of files and directories

Copying Files in Python
shutil offers commonly used functions shutil.copy() and shutil.copy2()
to copy a file from one location to another using shutil.copy()
import shutil

src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy(src, dst)

shutil.copy() only copies the file's contents and the file's permissions
other metadata like the file's creation and modification times are not preserved
to preserve all file metadata when copying, use shutil.copy2()

import shutil

src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy2(src, dst)

Copying Directories
shutil.copytree() will copy an entire directory and everything contained in it
shutil.copytree(src, dest) takes two arguments: a source directory and the destination directory where the files and folders will be copied to
>>> import shutil
>>> shutil.copytree('data_1', 'data1_backup')
'data1_backup'

Moving Files and Directories
to move a file or directory to another location, use shutil.move(src, dst)
src is the file or directory to be moved and dst is the destination

>>> import shutil
>>> shutil.move('dir_1/', 'backup/')
'backup'

shutil.move('dir_1/', 'backup/') moves dir_1/ into backup/ if backup/ exists
if backup/ does not exist, dir_1/ will be renamed to backup

Renaming Files and Directories
Python includes os.rename(src, dst) for renaming files and directories
>>> import os
>>> os.rename('first.zip', 'first_01.zip')

another way to rename files or directories is to use rename() from the pathlib module

>>> from pathlib import Path
>>> data_file = Path('data_01.txt')
>>> data_file.rename('data.txt')
Archiving
Python programs can create, read, and extract data from archives

Reading ZIP Files
the zipfile module is a low-level module which is part of the Standard Library
zipfile has functions that make it easy to open and extract ZIP files
to read the contents of a ZIP file, create a ZipFile object
ZipFile objects are similar to file objects created using open()
ZipFile is also a context manager supporting the with statement

import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    pass

the archive below was created from a directory named data which contains a total of 5 files and 1 subdirectory

.
|
├── sub_dir/
|   ├── bar.py
|   └── foo.py
|
├── file1.py
├── file2.py
└── file3.py

to get a list of files in the archive, call namelist()

import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    zipobj.namelist()

output is a list

['file1.py', 'file2.py', 'file3.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']

.namelist() returns a list of names of the files and directories in the archive
to retrieve information about the files in the archive, use .getinfo()

import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    bar_info.file_size

output

15277

.getinfo() returns a ZipInfo object that stores information about a single member of the archive
to get information about a file in the archive, pass its path as an argument to .getinfo()
using .getinfo() can retrieve information about archive members such as the file's name, its last-modified date, and its compressed and uncompressed sizes
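a short sketch reading a few more of those ZipInfo attributes (attribute names come from the zipfile module, not from the original notes):

import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    print(bar_info.filename)       # path of the member inside the archive
    print(bar_info.date_time)      # last-modified time as a (year, month, day, h, m, s) tuple
    print(bar_info.compress_size)  # compressed size in bytes
    print(bar_info.file_size)      # uncompressed size in bytes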
Extracting ZIP Archives
the zipfile module allows you to extract one or more files from ZIP archives through .extract() and .extractall()
these methods extract files to the current directory by default
both take an optional path parameter specifying the directory to extract files to
if the directory does not exist, it is automatically created

>>> import zipfile
>>> import os
>>> os.listdir('.')
['data.zip']
>>> data_zip = zipfile.ZipFile('data.zip', 'r')
>>> # Extract a single file to current directory
>>> data_zip.extract('file1.py')
'/home/terra/test/dir1/zip_extract/file1.py'
>>> os.listdir('.')
['file1.py', 'data.zip']
>>> # Extract all files into a different directory
>>> data_zip.extractall(path='extract_dir/')
>>> os.listdir('.')
['file1.py', 'extract_dir', 'data.zip']
>>> os.listdir('extract_dir')
['file1.py', 'file3.py', 'file2.py', 'sub_dir']
>>> data_zip.close()

Extracting Data From Password Protected Archives
zipfile supports extracting password protected ZIPs
to extract password protected ZIP files, pass in the password to the .extract() or .extractall() method as an argument; note that the password must be a bytes object

>>> import zipfile
>>> with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
...     # Extract from a password protected archive
...     pwd_zip.extractall(path='extract_dir', pwd=b'Quish3@o')

Creating New ZIP Archives
to create a new ZIP archive open a ZipFile object in write mode (w) and add the files to the archive
>>> import zipfile
>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)

to add files to an existing archive, open a ZipFile object in append mode and then add the files

>>> # Open a ZipFile object in append mode
>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')
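by default ZipFile stores members uncompressed; to compress them, pass a compression argument — a sketch, with the archive name and file list chosen purely for illustration:

>>> import zipfile
>>> # ZIP_DEFLATED compresses members; the default ZIP_STORED does not
>>> with zipfile.ZipFile('new_compressed.zip', 'w', compression=zipfile.ZIP_DEFLATED) as new_zip:
...     for name in ['file1.py', 'sub_dir/bar.py', 'sub_dir/foo.py']:
...         new_zip.write(name)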
Opening TAR Archives

TAR files are uncompressed file archives like ZIP
they can be compressed using gzip, bzip2, and lzma compression methods
the TarFile class allows reading and writing of TAR archives
to read from an archive:

import tarfile

with tarfile.open('example.tar', 'r') as tar_file:
    print(tar_file.getnames())

tarfile objects open like most file-like objects
tarfile has an open() function that takes a mode that determines how the file is to be opened
use the 'r', 'w' or 'a' modes to open an uncompressed TAR file for reading, writing, and appending, respectively
to open compressed TAR files, pass in a mode argument to tarfile.open() in the form filemode[:compression]
possible modes TAR files can be opened in include 'r', 'r:gz', 'r:bz2', 'r:xz', 'w', 'w:gz', 'w:bz2', 'w:xz', and 'a' (appending to compressed archives is not supported)
to read an uncompressed TAR file and retrieve the names of the files in it, use .getnames()

>>> import tarfile
>>> tar = tarfile.open('example.tar', mode='r')
>>> tar.getnames()
['CONTRIBUTING.rst', 'README.md', 'app.py']

the metadata of each entry in the archive can be accessed using special attributes

>>> import time
>>> for entry in tar.getmembers():
...     print(entry.name)
...     print(' Modified:', time.ctime(entry.mtime))
...     print(' Size :', entry.size, 'bytes')
...     print()
CONTRIBUTING.rst
 Modified: Sat Nov 1 09:09:51 2018
 Size : 402 bytes
README.md
 Modified: Sat Nov 3 07:29:40 2018
 Size : 5426 bytes
app.py
 Modified: Sat Nov 3 07:29:13 2018
 Size : 6218 bytes

Extracting Files From a TAR Archive
to extract files from TAR archives, can use .extract(), .extractfile(), and .extractall()
>>> tar.extract('README.md')
>>> os.listdir('.')
['README.md', 'example.tar']

to unpack or extract everything from the archive, use .extractall()

>>> tar.extractall(path="extracted/")

to extract a file object for reading or writing, use .extractfile(), which takes a filename or TarInfo object to extract as an argument
.extractfile() returns a file-like object that can be read and used

>>> f = tar.extractfile('app.py')
>>> f.read()
>>> tar.close()

opened archives should always be closed after they have been read or written to

Creating New TAR Archives
>>> import tarfile
>>> # create list of files to be archived
>>> file_list = ['app.py', 'config.py', 'CONTRIBUTORS.md', 'tests.py']
>>> with tarfile.open('packages.tar', mode='w') as tar:
...     for file in file_list:
...         tar.add(file)

>>> # Read the contents of the newly created archive
>>> with tarfile.open('packages.tar', mode='r') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py

to add new files to an existing archive, open the archive in append mode ('a')

>>> with tarfile.open('packages.tar', mode='a') as tar:
...     tar.add('foo.bar')

>>> with tarfile.open('packages.tar', mode='r') as tar:
...     for member in tar.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
foo.bar

Working With Compressed Archives
tarfile can also read and write TAR archives compressed using gzip, bzip2, and lzma compression
to read or write to a compressed archive, use tarfile.open(), passing in the appropriate mode for the compression type
to read or write data to a TAR archive compressed using gzip, use the 'r:gz' or 'w:gz' modes respectively

>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     tar.add('app.py')
...     tar.add('config.py')
...     tar.add('tests.py')

>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
tests.py

opening compressed archives in append mode is not possible
An Easier Way of Creating Archives
the Standard Library also supports creating TAR and ZIP archives using the high-level functions in the shutil module
the archiving utilities in shutil allow you to create, read, and extract ZIP and TAR archives
these utilities rely on the lower-level tarfile and zipfile modules
shutil.make_archive() takes at least two arguments: the name of the archive (base_name) and the archive format
can pass in an optional root_dir argument to compress files in a different directory
.make_archive() supports the zip, tar, bztar, and gztar archive formats

import shutil

# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('data/backup', 'tar', 'data/')

the above code copies everything in data/, creates an archive called backup.tar in the filesystem, and returns its name
to extract the archive, call .unpack_archive()

shutil.unpack_archive('backup.tar', 'extract_dir/')
Reading Multiple Files
Python supports reading data from multiple input streams or from a list of files through the fileinput module
the module allows looping over the contents of one or more text files quickly and easily
the typical way fileinput is used:

import fileinput

for line in fileinput.input():
    process(line)  # process() stands in for whatever per-line work is needed

use fileinput to build a crude version of the common UNIX utility cat
the cat utility reads files sequentially, writing them to standard output
when given more than one file in its command line arguments, cat will concatenate the text files and display the result in the terminal

# File: fileinput-example.py
import fileinput
import sys

files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print(' -> ' + line, end='')
print()

example:

$ python3 fileinput-example.py bacon.txt cupcake.txt

--- Reading bacon.txt ---
 -> Spicy jalapeno bacon ipsum dolor amet in in aute est qui enim aliquip,
 -> irure cillum drumstick elit.
 -> Doner jowl shank ea exercitation landjaeger incididunt ut porchetta.
 -> Tenderloin bacon aliquip cupidatat chicken chuck quis anim et swine.
 -> Tri-tip doner kevin cillum ham veniam cow hamburger.
 -> Turkey pork loin cupidatat filet mignon capicola brisket cupim ad in.
 -> Ball tip dolor do magna laboris nisi pancetta nostrud doner.

--- Reading cupcake.txt ---
 -> Cupcake ipsum dolor sit amet candy I love cheesecake fruitcake.
 -> Topping muffin cotton candy.
 -> Gummies macaroon jujubes jelly beans marzipan.

fileinput retrieves more information about each line, such as whether it is the first line of a file (.isfirstline()), the line number (.lineno()), and the name of the source file (.filename())
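a small sketch of those per-line helpers in use (the filenames are the same illustrative ones as in the cat example):

import fileinput

# prefix each line with its source file and cumulative line number
for line in fileinput.input(['bacon.txt', 'cupcake.txt']):
    print(f'{fileinput.filename()}:{fileinput.lineno()}: {line}', end='')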