Getting to Know Strings and Characters in Python
Python doesn't have a character type. Single characters are strings of length one.
Strings are immutable sequences of characters.
Any operation that modifies a string creates a new string.
A string is also a sequence, which allows access to its characters using zero-based integer indices.
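The points above can be sketched in a few lines. The variable names are illustrative and not from the notes.

```python
# Strings are immutable: item assignment raises TypeError.
word = "python"
try:
    word[0] = "P"
    mutated = True
except TypeError:
    mutated = False

# "Modifying" a string really builds a new object; the original is untouched.
capitalized = "P" + word[1:]

# A single character is itself a str of length one.
first = word[0]

print(mutated)              # False
print(word, capitalized)    # python Python
print(type(first), len(first))
```

Because the original string never changes, it is safe to share the same string object between different parts of a program.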
Creating Strings in Python
Standard String Literals
A string literal is a sequence of characters enclosed in quotes.
You can use single or double quotes.

    # create an empty string object
    x = ''
    y = ""
    # create a string object
    z = 'not an empty string'

You can use triple-quoted strings to create multiline strings.

    >>> '''A triple-quoted string
    ... spanning across multiple
    ... lines using single quotes'''
    'A triple-quoted string\nspanning across multiple\nlines using single quotes'
    >>> """A triple-quoted string
    ... spanning across multiple
    ... lines using double quotes"""
    'A triple-quoted string\nspanning across multiple\nlines using double quotes'

Escape Sequences in String Literals
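An escape sequence makes Python interpret characters differently from their usual meaning.
Here a backslash suppresses the single quote's usual meaning as a delimiter:

    >>> 'This string contains a single quote (\') character'
    "This string contains a single quote (') character"

A backslash at the end of a line is also an escape sequence: the backslash and the newline that follows it are ignored.

    >>> "Hello\
    ... , World\
    ... !"
    'Hello, World!'

Additional escape sequences include \n (newline), \t (tab), and \\ (a literal backslash).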
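As a small illustration of the escape sequences discussed above, the following sketch uses a few of the most common ones. The variable names and sample values are made up for the example.

```python
# Common escape sequences and the characters they produce.
tab_separated = "name\tage"        # \t is a horizontal tab
two_lines = "first\nsecond"        # \n is a newline
windows_path = "C:\\temp"          # \\ is one literal backslash
quoted = 'It\'s here'              # \' is a literal single quote

print(tab_separated)
print(two_lines)
print(windows_path)   # C:\temp
print(quoted)         # It's here

# Each escape sequence counts as a single character.
print(len("\n"), len("\\"), len("\t"))   # 1 1 1
```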
Raw String Literals
With raw string literals, you can create strings that don't translate escape sequences.
Any backslash characters are left in the string.
To create a raw string, prepend the string literal with the letter r or R.

    >>> print("Before\tAfter")  # Regular string
    Before  After
    >>> print(r"Before\tAfter")  # Raw string
    Before\tAfter

Raw strings are commonly used to create regular expressions because they let you use characters that have special meanings, such as the backslash, without escaping them.
Suppose you want to create a regular expression to match email addresses:

    >>> import re
    >>> pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
    >>> pattern
    '\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b'
    >>> regex = re.compile(pattern)
    >>> text = """
    ... Please contact us at [email protected]
    ... or [email protected] for further information.
    ... """
    >>> regex.findall(text)
    ['[email protected]', '[email protected]']

Formatted String Literals
Formatted string literals (f-strings) let you interpolate values into strings and format them as needed.
To create an f-string, prepend the string literal with the letter f or F.
F-strings let you interpolate values into replacement fields in a string literal. You create these fields using curly brackets.

    >>> name = "Jane"
    >>> f"Hello, {name}!"
    'Hello, Jane!'

The Built-in str() Function
You can create new strings using the built-in str() function.
A more common use case is converting other data types into strings.

    >>> str()
    ''
    >>> str(42)
    '42'
    >>> str(3.14)
    '3.14'
    >>> str([1, 2, 3])
    '[1, 2, 3]'
    >>> str({"one": 1, "two": 2, "three": 3})
    "{'one': 1, 'two': 2, 'three': 3}"
    >>> str({"A", "B", "C"})
    "{'B', 'C', 'A'}"
Using Operators on Strings
Concatenating Strings: The + Operator
The + operator is used to concatenate strings.
Concatenation joins two or more string objects to create a single new string.

    >>> greeting = "Hello"
    >>> name = "Pythonista"
    >>> greeting + ", " + name + "!!!"
    'Hello, Pythonista!!!'
    >>> file = "app"
    >>> file += ".py"
    >>> file
    'app.py'

Repeating Strings: The * Operator
The repetition operator is the asterisk (*).
It takes two operands: one is the string to be repeated, and the other is an integer representing the number of repetitions.

    >>> "=" * 10
    '=========='
    >>> 10 * "Hi!"
    'Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!Hi!'
    >>> sep = "-"
    >>> sep *= 10
    >>> sep
    '----------'
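The two operators combine naturally. The following sketch builds a small text banner; the names title, rule, banner, and message are hypothetical and chosen only for illustration.

```python
# Build a simple text banner with + and *.
title = "Report"
rule = "=" * len(title)          # repeat "=" once per character in the title
banner = rule + "\n" + title + "\n" + rule

print(banner)

# A repetition count of zero or less yields an empty string.
print(repr("ab" * 0), repr("ab" * -3))   # '' ''

# + only joins strings; other types must be converted first.
count = 3
message = "Items: " + str(count)
print(message)                   # Items: 3
```

Trying "Items: " + count without the str() call would raise a TypeError, since + does not convert its operands implicitly.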
Finding Substrings in a String: The in and not in Operators
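Use membership tests when you need to determine whether a substring appears in a given string. The in operator returns True if the substring is found, and not in returns the opposite result.

    >>> "food" in "That's food for thought."
    True
    >>> "food" in "That's good for now."
    False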
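The examples above only use in. A minimal sketch of not in, along with two details that are easy to overlook (case sensitivity and the empty string), might look like this. The variable names are illustrative.

```python
sentence = "That's food for thought."

# in returns True when the left operand occurs anywhere in the right operand.
has_food = "food" in sentence

# not in is the negation of in.
lacks_good = "good" not in sentence

# The test is case-sensitive, so normalize case first if that matters.
has_that_any_case = "that's" in sentence.lower()
has_that_exact = "that's" in sentence

# The empty string is considered a substring of every string.
empty_is_member = "" in sentence

print(has_food, lacks_good, has_that_any_case, has_that_exact, empty_is_member)
# True True True False True
```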
Exploring Built-in Functions for String Processing
Finding the Number of Characters: len()
The built-in len() function returns the number of characters in a string.

    >>> len("Python")
    6
    >>> len("")
    0

Converting Objects Into Strings: str() and repr()
To convert objects into their string representation, you can use the built-in str() and repr() functions.
The str() function converts a given object into its user-friendly representation. This type of string representation is targeted at end users.

    >>> str(42)
    '42'
    >>> str(3.14)
    '3.14'
    >>> str([1, 2, 3])
    '[1, 2, 3]'
    >>> str({"one": 1, "two": 2, "three": 3})
    "{'one': 1, 'two': 2, 'three': 3}"
    >>> str({"A", "B", "C"})
    "{'B', 'C', 'A'}"

The repr() function returns a developer-friendly representation of the object.

    >>> repr(42)
    '42'
    >>> repr(3.14)
    '3.14'
    >>> repr([1, 2, 3])
    '[1, 2, 3]'
    >>> repr({"one": 1, "two": 2, "three": 3})
    "{'one': 1, 'two': 2, 'three': 3}"
    >>> repr({"A", "B", "C"})
    "{'B', 'C', 'A'}"

Ideally, you should be able to copy the output of repr() to re-create the original object.
The difference between repr() and str() shows up in a class that defines both .__repr__() and .__str__():

    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

        def __repr__(self):
            return f"{type(self).__name__}(name='{self.name}', age={self.age})"

        def __str__(self):
            return f"I'm {self.name}, and I'm {self.age} years old."

Formatting Strings: format()
The built-in format() function formats a value according to a format specifier.

    >>> import math
    >>> from datetime import datetime
    >>> format(math.pi, ".4f")  # Four decimal places
    '3.1416'
    >>> format(1000000, ",.2f")  # Thousand separators
    '1,000,000.00'
    >>> format("Header", "=^30")  # Centered and filled
    '============Header============'
    >>> format(datetime.now(), "%a %b %d, %Y")  # Date
    'Mon Jul 29, 2024'

The ".4f" specifier formats the input value as a floating-point number with four decimal places.
The ",.2f" specifier formats a number using commas as thousand separators and with two decimal places.
The "=^30" specifier centers the string "Header" in a width of 30 characters, using the equal sign as a filler character.
The "%a %b %d, %Y" specifier formats a date as abbreviated weekday, abbreviated month, day of the month, and four-digit year.
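The same format specifiers also work inside f-string replacement fields, after a colon. The sketch below compares the two forms. The fixed date and the variable names are assumptions made so that the result is reproducible.

```python
import math
from datetime import datetime

# format(value, spec) and an f-string replacement field {value:spec}
# accept the same format specifier.
pi_builtin = format(math.pi, ".4f")
pi_fstring = f"{math.pi:.4f}"

amount = f"{1000000:,.2f}"
header = f"{'Header':=^30}"

# A fixed date keeps the result reproducible (datetime.now() changes daily).
release = datetime(2024, 7, 29)
stamp = format(release, "%Y-%m-%d")

print(pi_builtin, pi_fstring)   # 3.1416 3.1416
print(amount)                   # 1,000,000.00
print(header)
print(stamp)                    # 2024-07-29
```

The numeric directives in "%Y-%m-%d" are used here instead of "%a" and "%b" because day and month names depend on the active locale.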
Processing Characters Through Code Points: ord() and chr()
The ord() function returns an integer value representing the Unicode code point for the given character.
The chr() function does the reverse of ord(): it returns the character associated with a given code point.
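A minimal sketch of the two functions working together follows. The shift_letter() helper is hypothetical and exists only to show one use for code point arithmetic.

```python
# ord() maps a one-character string to its Unicode code point.
code_a = ord("a")          # 97
code_euro = ord("€")       # 8364

# chr() maps a code point back to the character.
char_a = chr(97)           # 'a'
char_euro = chr(8364)      # '€'

# The two functions are inverses of each other.
round_trip = chr(ord("ñ"))


# Hypothetical helper: shift a lowercase ASCII letter by n positions,
# wrapping around after "z".
def shift_letter(letter, n):
    offset = (ord(letter) - ord("a") + n) % 26
    return chr(ord("a") + offset)


print(code_a, code_euro, char_a, char_euro, round_trip)
print(shift_letter("y", 3))   # b
```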
Indexing and Slicing Strings
Python's strings are ordered sequences of characters.

Indexing Strings
You can access individual characters from a string using each character's associated index.
Sequences are zero-indexed.
You can use a positive or a negative index. Negative indices count from the end of the string, so -1 refers to the last character.

Slicing Strings
A slice is an expression of the form s[m:n].
It returns the portion of s starting at index m, and up to but not including index n.
By default, the first index is zero, so slicing starts at the beginning of the string.
If the second index isn't provided, slicing returns the rest of the string.

    >>> s = "foobar"
    >>> s[2:5]
    'oba'

An optional third argument is the step to use.

    >>> numbers = "12345" * 5
    >>> numbers
    '1234512345123451234512345'
    >>> numbers[::5]
    '11111'
    >>> numbers[4::5]
    '55555'
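The indexing and slicing rules above can be sketched together on the same "foobar" string. The variable names are illustrative.

```python
s = "foobar"

# Indexing: positive indices count from the left, negative from the right.
first, last = s[0], s[-1]          # 'f', 'r'

# Slicing: s[m:n] runs from index m up to, but not including, index n.
middle = s[2:5]                    # 'oba'
head = s[:3]                       # omitted start defaults to 0 -> 'foo'
tail = s[3:]                       # omitted end runs to the end -> 'bar'

# A third value sets the step; a step of -1 walks the string backwards.
every_other = s[::2]               # 'foa'
reversed_s = s[::-1]               # 'raboof'

# Out-of-range indexing raises IndexError, but slicing is forgiving.
try:
    s[10]
    index_ok = True
except IndexError:
    index_ok = False
beyond = s[4:100]                  # 'ar'

print(first, last, middle, head, tail, every_other, reversed_s, index_ok, beyond)
```

Note that head + tail always rebuilds the original string when both slices use the same split point.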
Exploring str Class Methods
Manipulating Casing
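The notes list no methods under this heading. As a minimal sketch, assuming the heading refers to the standard str casing methods .upper(), .lower(), .capitalize(), .title(), and .swapcase(), they behave as follows. The sample text is made up for the example.

```python
text = "hello, World of PYTHON"

upper = text.upper()              # every cased character uppercased
lower = text.lower()              # every cased character lowercased
capitalized = text.capitalize()   # first character upper, the rest lower
titled = text.title()             # first letter of each word upper, rest lower
swapped = text.swapcase()         # upper becomes lower and vice versa

print(upper)         # HELLO, WORLD OF PYTHON
print(lower)         # hello, world of python
print(capitalized)   # Hello, world of python
print(titled)        # Hello, World Of Python
print(swapped)       # HELLO, wORLD OF python
```

Like every string method, each call returns a new string and leaves text unchanged.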
Finding and Replacing Substrings
.count(sub[, start[, end]])
sub is the substring to search for.
start is the index to start with. The default is the beginning of the string.
end is an exclusive index. The default is the end of the string.
Returns the number of non-overlapping occurrences of the substring.

    >>> "foo goo moo".count("oo")
    3

.find(sub[, start[, end]])
The arguments are the same as above.
You can use .find() to check whether a string contains a particular substring.
Calling .find(sub) returns the lowest index in the target string where sub is found.
Returns -1 if the substring isn't found.

    >>> "foo bar foo baz foo qux".find("foo")
    0
    >>> "foo bar foo baz foo qux".find("grault")
    -1

.index(sub[, start[, end]])
Similar to .find(), but raises an exception if the substring isn't found.

    >>> "foo bar foo baz foo qux".index("foo")
    0
    >>> "foo bar foo baz foo qux".index("grault")
    Traceback (most recent call last):
        ...
    ValueError: substring not found

.rfind(sub[, start[, end]])
Similar to .find(), but returns the highest index in the target string where the substring sub is found.

    >>> "foo bar foo baz foo qux".rfind("foo", 0, 14)
    8
    >>> "foo bar foo baz foo qux".rfind("foo", 10, 14)
    -1

.rindex(sub[, start[, end]])
Similar to .rfind(), but raises an exception if the substring isn't found.

    >>> "foo bar foo baz foo qux".rindex("foo")
    16
    >>> "foo bar foo baz foo qux".rindex("grault")
    Traceback (most recent call last):
        ...
    ValueError: substring not found

.startswith(prefix[, start[, end]])
Returns True if the target string starts with the specified prefix and False otherwise.

    >>> "foobar".startswith("foo")
    True
    >>> "foobar".startswith("bar")
    False

The comparison is restricted to the substring indicated by start and end if they're specified.

    >>> "foobar".startswith("bar", 3)
    True
    >>> "foobar".startswith("bar", 3, 5)
    False

.endswith(suffix[, start[, end]])
Returns True if the target string ends with the specified suffix and False otherwise. The optional start and end arguments work as in .startswith().

    >>> "foobar".endswith("bar")
    True
    >>> "foobar".endswith("foo")
    False
    >>> "foobar".endswith("oob", 0, 4)
    True
    >>> "foobar".endswith("oob", 2, 4)
    False

Classifying Strings
These methods classify a string based on its characters.
In all cases, the methods are predicates that return True or False.

.isalnum()
Returns True if the target string isn't empty and all its characters are alphanumeric.

    >>> "abc123".isalnum()
    True
    >>> "abc$123".isalnum()
    False

.isalpha()
Returns True if the target string isn't empty and all its characters are alphabetic.
Whitespace characters aren't considered alphabetic.

    >>> "ABCabc".isalpha()
    True
    >>> "abc123".isalpha()
    False
    >>> "ABC abc".isalpha()
    False

.isdigit()
Returns True if the target string isn't empty and all its characters are numeric digits.

    >>> "123".isdigit()
    True
    >>> "123abc".isdigit()
    False

.isidentifier()
Returns True if the target string is a valid Python identifier according to the language definition.
It also returns True for a string that matches a Python keyword, even though a keyword can't be used as an identifier.

    >>> "foo32".isidentifier()
    True
    >>> "32foo".isidentifier()
    False
    >>> "foo$32".isidentifier()
    False
    >>> "and".isidentifier()
    True

iskeyword()
This is a function in the keyword module, not a string method. Use it to check whether a string is a Python keyword.

    >>> from keyword import iskeyword
    >>> iskeyword("and")
    True

.islower()
Returns True if the target string isn't empty and all its alphabetic characters are lowercase.

    >>> "abc".islower()
    True
    >>> "abc1$d".islower()
    True
    >>> "Abc1$D".islower()
    False

.isprintable()
Returns True if the target string is empty or if all its characters are printable.

    >>> "a\tb".isprintable()
    False
    >>> "a b".isprintable()
    True
    >>> "".isprintable()
    True
    >>> "a\nb".isprintable()
    False

.isspace()
Returns True if the target string isn't empty and all its characters are whitespace characters.
The most commonly used whitespace characters are space (" "), tab ("\t"), and newline ("\n").

    >>> " \t \n ".isspace()
    True
    >>> " a ".isspace()
    False

.istitle()
Returns True if the target string isn't empty, the first alphabetic character of each word is uppercase, and all other alphabetic characters in each word are lowercase.

    >>> "This Is A Title".istitle()
    True
    >>> "This is a title".istitle()
    False
    >>> "Give Me The #$#@ Ball!".istitle()
    True

.isupper()
Returns True if the target string isn't empty and all its alphabetic characters are uppercase.

    >>> "ABC".isupper()
    True
    >>> "ABC1$D".isupper()
    True
    >>> "Abc1$D".isupper()
    False

Formatting Strings
.center(width[, fill])
Returns a string consisting of the target string centered in a field of width characters.
The default padding is the ASCII space character.
If the target string is as long as width or longer, then it's returned unchanged.

    >>> "foo".center(10)
    '   foo    '
    >>> "bar".center(10, "-")
    '---bar----'
    >>> "foo".center(2)
    'foo'

.expandtabs(tabsize=8)
Replaces each tab character ("\t") found in the target string with enough spaces to reach the next tab stop.
By default, tab stops are assumed to be eight columns apart.

    >>> "a\tb\tc".expandtabs()
    'a       b       c'
    >>> "aaa\tbbb\tc".expandtabs()
    'aaa     bbb     c'
    >>> "a\tb\tc".expandtabs(4)
    'a   b   c'
    >>> "aaa\tbbb\tc".expandtabs(tabsize=4)
    'aaa bbb c'

.ljust(width[, fill])
Returns a string consisting of the target string left-justified in a field of width characters.
The default padding is the ASCII space character.
If the target string is as long as width or longer, then it's returned unchanged.

    >>> "foo".ljust(10)
    'foo       '
    >>> "foo".ljust(10, "-")
    'foo-------'
    >>> "foo".ljust(2)
    'foo'

.rjust(width[, fill])
Similar to .ljust(), but right-justifies the string.

.removeprefix(prefix)
Returns a copy of the target string with prefix removed from the beginning.
If the original string doesn't begin with prefix, then the string is returned unchanged.

    >>> "http://python.org".removeprefix("http://")
    'python.org'
    >>> "http://python.org".removeprefix("python")
    'http://python.org'

.removesuffix(suffix)
Similar to .removeprefix(), but removes suffix from the end of the string.

.lstrip([chars])
Returns a copy of the target string with any whitespace characters removed from the left end.

    >>> "   foo bar baz   ".lstrip()
    'foo bar baz   '
    >>> "\t\nfoo\t\nbar\t\nbaz".lstrip()
    'foo\t\nbar\t\nbaz'

The optional chars argument is a string that specifies the set of characters to be removed.

    >>> "http://cnn.com".lstrip("/:htp")
    'cnn.com'

.rstrip([chars])
Similar to .lstrip(), but removes characters from the right end.

.strip([chars])
Trims whitespace from both ends of the string.

    >>> "   foo bar baz   ".strip()
    'foo bar baz'

The optional chars argument is a string that specifies the set of characters to be removed.

.replace(old, new[, count])
Use the .replace() method to replace a substring of a string.
Returns a copy of the target string with all the occurrences of the old substring replaced by new.

    >>> "foo bar foo baz foo qux".replace("foo", "grault")
    'grault bar grault baz grault qux'

If the optional count argument is given, then at most count replacements are performed, starting at the left end of the target string.

    >>> "foo bar foo baz foo qux".replace("foo", "grault", 2)
    'grault bar grault baz foo qux'

.zfill(width)
Returns a copy of the target string left-padded with zeros to the specified width.
If the target string contains a leading sign, then it remains at the left edge of the result string after the zeros are inserted.
If the target string is as long as width or longer, then it's returned unchanged.

    >>> "42".zfill(5)
    '00042'
    >>> "+42".zfill(8)
    '+0000042'
    >>> "-42".zfill(8)
    '-0000042'

.zfill() will also zero-pad a string that isn't a numeric value.

    >>> "foo".zfill(6)
    '000foo'
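One detail of the stripping methods above is easy to misread: the chars argument is a set of characters, not a prefix or suffix. A minimal sketch contrasting .lstrip() with .removeprefix() follows; the variable names are illustrative.

```python
url = "http://python.org"

# .lstrip(chars) treats chars as a set of characters, not as a prefix:
# it keeps stripping while the leftmost character is in the set.
stripped = url.lstrip("htp:/")

# .removeprefix() (Python 3.9+) removes the exact prefix at most once.
unprefixed = url.removeprefix("http://")

print(stripped)      # ython.org  (the "p" of "python" is in the set too)
print(unprefixed)    # python.org

# .strip(chars) applies the same set logic to both ends.
trimmed = "--==title==--".strip("-=")
print(trimmed)       # title

# .zfill() keeps a leading sign to the left of the padding.
padded = "-7".zfill(4)
print(padded)        # -007
```

When the goal is to drop a known prefix or suffix, .removeprefix() and .removesuffix() are usually the safer choice.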
Joining and Splitting Strings
These methods operate on or return iterables.

.join(iterable)
Takes an iterable of string objects.
Returns the string that results from concatenating the objects in the input iterable, separated by the target string.

    >>> "**".join(["foo", "bar", "baz", "qux"])
    'foo**bar**baz**qux'

.partition(sep)
Splits the target string at the first occurrence of the string sep.
The return value is a tuple with three objects: the portion of the string before sep, sep itself, and the portion after sep.
If sep isn't found, then the returned tuple contains the string followed by two empty strings.

    >>> "foo.bar".partition(".")
    ('foo', '.', 'bar')
    >>> "foo@@bar@@baz".partition("@@")
    ('foo', '@@', 'bar@@baz')
    >>> "foo.bar@@".partition("@@")
    ('foo.bar', '@@', '')
    >>> "foo.bar".partition("@@")
    ('foo.bar', '', '')

.rpartition(sep)
Works like .partition(sep), except that the target string is split at the last occurrence of sep instead of the first.

    >>> "foo@@bar@@baz".partition("@@")
    ('foo', '@@', 'bar@@baz')
    >>> "foo@@bar@@baz".rpartition("@@")
    ('foo@@bar', '@@', 'baz')

.split(sep=None, maxsplit=-1)
Without arguments, .split() splits the target string into substrings delimited by any sequence of whitespace.
Consecutive whitespace characters are combined into a single delimiter.
Returns the substrings as a list.

    >>> "foo bar baz qux".split()
    ['foo', 'bar', 'baz', 'qux']
    >>> "foo\n\tbar baz\r\fqux".split()
    ['foo', 'bar', 'baz', 'qux']

If the sep argument is specified, then it's used as the separator.

    >>> "foo.bar.baz.qux".split(".")
    ['foo', 'bar', 'baz', 'qux']

If the optional parameter maxsplit is specified, then a maximum of that many splits are performed.

    >>> "foo.bar.baz.qux".split(".", 1)
    ['foo', 'bar.baz.qux']

If maxsplit isn't specified, then the results of .split() and .rsplit() are identical.
Escape sequences such as \n, \r, and \r\n can work as line boundaries.
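A minimal sketch of the methods above follows. It assumes the line-boundary note refers to str.splitlines(), which the notes do not name; that method treats \n, \r, and \r\n, among others, as line boundaries. The variable names are illustrative.

```python
# .split() and .join() are inverses when the same separator is used.
record = "foo.bar.baz.qux"
parts = record.split(".")
rebuilt = ".".join(parts)

# With maxsplit, .split() counts from the left and .rsplit() from the right.
left_first = record.split(".", 1)     # ['foo', 'bar.baz.qux']
right_first = record.rsplit(".", 1)   # ['foo.bar.baz', 'qux']

# Assumption: the line-boundary note refers to str.splitlines(), which
# treats \n, \r, and \r\n (among others) as line boundaries.
lines = "one\ntwo\r\nthree\rfour".splitlines()

print(parts, rebuilt)
print(left_first, right_first)
print(lines)   # ['one', 'two', 'three', 'four']
```

The difference between .split() and .rsplit() only becomes visible once maxsplit limits the number of splits.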