Python Topics: Requests Library (HTTP)
Getting Started With Python's Requests Library
Requests is not included in the standard library
to install the package
$ python -m pip install requests
The GET Request
one of the most common HTTP methods is GET
used to retrieve data from a specified resource
to make a GET request with Requests, call requests.get()
The Response
requests.get() returns a Response, an object for inspecting the results of the request
>>> import requests
>>> response = requests.get("https://api.github.com")
Status Codes
status code represents the status of the request

Status Code  Category                Description
1xx          informational response  the request was received, continuing process
2xx          successful              the request was successfully received, understood, and accepted
3xx          redirection             further action needs to be taken in order to complete the request
4xx          client error            the request contains bad syntax or cannot be fulfilled
5xx          server error            the server failed to fulfill an apparently valid request
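A small sketch of the first-digit rule behind the table. status_category is a hypothetical helper written for these notes; requests.codes (the named status-code lookup that ships with Requests) and http.HTTPStatus (standard library) are shown for comparison.

```python
from http import HTTPStatus

import requests


def status_category(code: int) -> str:
    """Hypothetical helper: return the table's category for an HTTP status code."""
    categories = {
        1: "informational response",
        2: "successful",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # The first digit of the code selects the class
    return categories.get(code // 100, "unknown")


print(status_category(200))
print(status_category(404))
print(HTTPStatus(404).phrase)  # standard reason phrase for the code
# requests.codes maps readable names to numeric codes
print(requests.codes.ok, requests.codes.not_found)
```

Comparing against requests.codes.ok instead of a bare 200 is a readability choice; both are the same integer.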
checking the status code
if response.status_code == 200:
    print("Success!")
elif response.status_code == 404:
    print("Not Found.")
below, the Response object itself is used as a conditional
this implicitly checks response.ok, which is True when response.status_code is less than 400
so it indicates general success (in practice a 2xx or 3xx code), not only status code 200
if response:
    print("Success!")
else:
    raise Exception(f"Non-success status code: {response.status_code}")
can also use the Response object's .raise_for_status() method
it raises an HTTPError for status codes from 400 to 599 and does nothing otherwise
import requests
from requests.exceptions import HTTPError

URLS = ["https://api.github.com", "https://api.github.com/invalid"]

for url in URLS:
    try:
        response = requests.get(url)
        response.raise_for_status() 
    except HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"Other error occurred: {err}")
    else:
        print("Success!")
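To see the truthiness rule and raise_for_status() without touching the network, a Response can be built by hand and given a status code. This is only a testing trick (fake_response is a hypothetical helper); real Response objects come from requests.get() and the other request functions.

```python
import requests
from requests.exceptions import HTTPError


def fake_response(status_code: int) -> requests.Response:
    """Hypothetical helper: a bare Response with only a status code (nothing is sent)."""
    response = requests.Response()
    response.status_code = status_code
    return response


for code in (200, 302, 404, 503):
    response = fake_response(code)
    # bool(response) is the same as response.ok: True for codes below 400
    print(code, bool(response), response.ok)

try:
    fake_response(503).raise_for_status()
except HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
```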
Content
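a Response object's message body is its payload
to view the message body as raw bytes use response.content
to view the message body as a string use response.text (Requests decodes the bytes with response.encoding, taken from the response headers when they declare a charset)

the GitHub API response is actually serialized JSON content
response.json() deserializes it; for this endpoint it returns a dictionary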
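A minimal offline sketch of .content, .text, and .json(). It assumes a throwaway local server (JSONHandler is hypothetical, built on the standard library's http.server) standing in for the GitHub API, so the payload is made up.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = {"name": "requests", "stars": 1}  # made-up data


class JSONHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that always returns PAYLOAD as JSON."""

    def do_GET(self):
        body = json.dumps(PAYLOAD).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), JSONHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

response = requests.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)
raw = response.content    # bytes, exactly as received
text = response.text      # str, decoded using response.encoding
data = response.json()    # Python objects parsed from the JSON body

print(type(raw).__name__, type(text).__name__, type(data).__name__)
print(response.encoding)

server.shutdown()
server.server_close()
```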

Headers
response headers can contain information such as
  • the content type of the response payload
  • a time limit on how long to cache the response
to view the headers
>>> import requests

>>> response = requests.get("https://api.github.com")
>>> response.headers
{'Server': 'GitHub.com',
...
'X-GitHub-Request-Id': 'AE83:3F40:2151C46:438A840:65C38178'}
response.headers returns a dictionary-like object
to see the content type of a response via its headers
>>> response.headers["Content-Type"]
'application/json; charset=utf-8'
the HTTP specification defines header names as case-insensitive, so they can be looked up with any casing
>>> response.headers["content-type"]
'application/json; charset=utf-8'
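response.headers is a requests.structures.CaseInsensitiveDict, so the case-insensitive lookup can be tried directly on that class. The header value is copied from the example response; no request is made.

```python
from requests.structures import CaseInsensitiveDict

# The same type Requests uses for response.headers
headers = CaseInsensitiveDict({"Content-Type": "application/json; charset=utf-8"})

print(headers["Content-Type"])
print(headers["content-type"])
print("CONTENT-TYPE" in headers)  # membership ignores case too
print(list(headers))              # iteration keeps the original spelling
```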
Query String Parameters
one way to customize a GET request is to pass values through query string parameters in the URL
to do this with get(), pass the data to the params parameter
here params is a dictionary
import requests

# Search GitHub's repositories for popular Python projects
response = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:python", "sort": "stars", "order": "desc"},
)

# Inspect some attributes of the first three repositories
json_response = response.json()
popular_repositories = json_response["items"]
for repo in popular_repositories[:3]:
    print(f"Name: {repo['name']}")
    print(f"Description: {repo['description']}")
    print(f"Stars: {repo['stargazers_count']}")
    print()
params can also be a list of tuples
>>> import requests

>>> requests.get(
...     "https://api.github.com/search/repositories",
...     [("q", "language:python"), ("sort", "stars"), ("order", "desc")],
... )
<Response [200]>
or the query string can be passed as bytes
>>> requests.get(
...     "https://api.github.com/search/repositories",
...     params=b"q=language:python&sort=stars&order=desc",
... )
<Response [200]>
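A sketch comparing the three params forms offline: requests.Request(...).prepare() builds the final URL without sending anything. The dictionary and the list of tuples are URL-encoded the same way (the colon becomes %3A), while bytes are used as given, so the URLs differ in spelling but carry the same query.

```python
import requests

BASE = "https://api.github.com/search/repositories"

forms = [
    {"q": "language:python", "sort": "stars", "order": "desc"},
    [("q", "language:python"), ("sort", "stars"), ("order", "desc")],
    b"q=language:python&sort=stars&order=desc",
]

urls = []
for params in forms:
    # prepare() encodes params into the URL; no request is sent
    prepared = requests.Request("GET", BASE, params=params).prepare()
    urls.append(prepared.url)

for url in urls:
    print(url)
```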
query strings are useful for parameterizing GET requests
another way to customize requests is by adding or modifying the headers sent with the request
Request Headers
to customize headers pass a dictionary of HTTP headers to get() using the headers parameter
import requests

response = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": '"real python"'},
    headers={"Accept": "application/vnd.github.text-match+json"},
)

# View the new 'text-matches' list which provides information
# about your search term within the results
json_response = response.json()
first_repository = json_response["items"][0]
print(first_repository["text_matches"][0]["matches"])
the Accept header tells the server what content types the application can handle
here the request asks for text-match metadata, so the results include the matching search terms highlighted
the header value application/vnd.github.text-match+json is a proprietary GitHub media type whose content is a special JSON format
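A sketch for checking which headers would actually be sent: Session.prepare_request() merges the per-request headers with the session defaults (such as User-Agent) without making the call. The search term and Accept value are the ones from the example above.

```python
import requests

request = requests.Request(
    "GET",
    "https://api.github.com/search/repositories",
    params={"q": '"real python"'},
    headers={"Accept": "application/vnd.github.text-match+json"},
)

with requests.Session() as session:
    # Merge request and session settings; nothing is sent
    prepared = session.prepare_request(request)

print(prepared.url)
print(prepared.headers["Accept"])      # per-request value overrides the default
print(prepared.headers["User-Agent"])  # default added by the session
```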
Other HTTP Methods
other popular HTTP methods include POST, PUT, DELETE, HEAD, PATCH, and OPTIONS
Requests provides a function for each of these HTTP methods
>>> import requests

>>> requests.get("https://httpbin.org/get")
<Response [200]>
>>> requests.post("https://httpbin.org/post", data={"key": "value"})
<Response [200]>
>>> requests.put("https://httpbin.org/put", data={"key": "value"})
<Response [200]>
>>> requests.delete("https://httpbin.org/delete")
<Response [200]>
>>> requests.head("https://httpbin.org/get")
<Response [200]>
>>> requests.patch("https://httpbin.org/patch", data={"key": "value"})
<Response [200]>
>>> requests.options("https://httpbin.org/get")
<Response [200]>
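Each of these functions is a thin wrapper around requests.request(method, url, **kwargs). A quick offline check, using prepared requests, that the method names end up as uppercase HTTP verbs (the httpbin URL is illustrative; nothing is sent):

```python
import requests

URL = "https://httpbin.org/anything"  # illustrative; no request is sent

methods = []
for name in ("get", "post", "put", "delete", "head", "patch", "options"):
    # requests.get(url) is equivalent to requests.request("GET", url), and so on
    prepared = requests.Request(name, URL).prepare()
    methods.append(prepared.method)

print(methods)
```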
The Message Body
according to the HTTP specification, POST, PUT, and PATCH requests pass their data through the message body
using Requests pass the payload to the corresponding function's data parameter

data takes a dictionary, a list of tuples, bytes, or a file-like object
adapt the data to what the service receiving the request expects

if the service expects form data (content type application/x-www-form-urlencoded), send it as a dictionary; Requests form-encodes it and sets that Content-Type header

>>> import requests

>>> requests.post("https://httpbin.org/post", data={"key": "value"})
<Response [200]>
send the same data as a list of tuples
>>> requests.post("https://httpbin.org/post", data=[("key", "value")])
<Response [200]>
if the server requires JSON data, then use the json parameter
when JSON data is passed via json, Requests will serialize the data and add the correct Content-Type header
>>> response = requests.post("https://httpbin.org/post", json={"key": "value"})
>>> json_response = response.json()
>>> json_response["data"]
'{"key": "value"}'
>>> json_response["headers"]["Content-Type"]
'application/json'
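The difference between data and json shows up in the prepared body and Content-Type header. A sketch using prepared requests, so nothing is sent to httpbin:

```python
import requests

URL = "https://httpbin.org/post"  # nothing is sent; the requests are only prepared

form = requests.Request("POST", URL, data={"key": "value"}).prepare()
pairs = requests.Request("POST", URL, data=[("key", "value")]).prepare()
as_json = requests.Request("POST", URL, json={"key": "value"}).prepare()

# data= is form-encoded into a str body
print(form.headers["Content-Type"], repr(form.body))
print(pairs.headers["Content-Type"], repr(pairs.body))
# json= is serialized to UTF-8 encoded JSON bytes
print(as_json.headers["Content-Type"], repr(as_json.body))
```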
Request Inspection
when a request is made, the Requests library prepares the request before sending it to the destination server
request preparation includes things like validating headers and serializing JSON content
can view the PreparedRequest object by accessing .request on a Response object
>>> import requests

>>> response = requests.post("https://httpbin.org/post", json={"key":"value"})

>>> response.request.headers["Content-Type"]
'application/json'
>>> response.request.url
'https://httpbin.org/post'
>>> response.request.body
b'{"key": "value"}'
Authentication
credentials are typically provided to a server through the Authorization header or a custom header defined by the service
all the Requests functions covered so far accept a parameter named auth for passing the credentials
>>> import requests

>>> response = requests.get(
...     "https://httpbin.org/basic-auth/user/passwd",
...     auth=("user", "passwd")
... )

>>> response.status_code
200
>>> response.request.headers["Authorization"]
'Basic dXNlcjpwYXNzd2Q='
the request succeeds if the credentials passed in the tuple to auth are valid

when credentials are passed in a tuple to the auth parameter, Requests applies them using HTTP's Basic access authentication scheme
'Basic dXNlcjpwYXNzd2Q=' is a Base64-encoded string of the username and password with the prefix "Basic "

  1. Requests combines the provided username and password putting a colon in between them
    the username "user" and password "passwd" becomes "user:passwd"
  2. Requests encodes this string in Base64 using base64.b64encode()
  3. Requests adds "Basic " in front of this Base64 string
HTTP Basic authentication isn't very secure: Base64 is an encoding, not encryption, so anyone who intercepts the header can decode the credentials
always send these requests over HTTPS
HTTPS adds a layer of security by encrypting the entire HTTP request, headers included

could make the same request by passing explicit Basic authentication credentials using HTTPBasicAuth

>>> from requests.auth import HTTPBasicAuth
>>> requests.get(
...     "https://httpbin.org/basic-auth/user/passwd",
...     auth=HTTPBasicAuth("user", "passwd")
... )
<Response [200]>
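The three Basic-auth steps listed earlier can be reproduced with the standard library's base64 module and compared with the header Requests builds, both for the plain tuple and for HTTPBasicAuth. The sketch assumes ASCII credentials and only prepares the requests:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

username, password = "user", "passwd"

# 1. join username and password with a colon
joined = f"{username}:{password}"
# 2. Base64-encode the joined string (ASCII credentials assumed)
encoded = base64.b64encode(joined.encode("ascii")).decode("ascii")
# 3. put the scheme name in front
manual_header = f"Basic {encoded}"

url = "https://httpbin.org/basic-auth/user/passwd"
from_tuple = requests.Request("GET", url, auth=(username, password)).prepare()
from_class = requests.Request("GET", url, auth=HTTPBasicAuth(username, password)).prepare()

print(manual_header)
print(from_tuple.headers["Authorization"] == manual_header)
print(from_class.headers["Authorization"] == manual_header)
```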
Requests provides other methods of authentication out of the box such as HTTPDigestAuth and HTTPProxyAuth

some servers require the use of an authentication token

>>> import requests

>>> token = "<YOUR_TOKEN>"
>>> response = requests.get(
...     "https://api.github.com/user",
...     auth=("", token)
... )
>>> response.status_code
200
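the code above works, but it's not the right way to authenticate with a token: it sends the token as a Basic-auth password with an empty username
the empty string argument is a code smell
Requests lets you supply your own authentication mechanism by subclassing requests.auth.AuthBase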
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    """Implements a token authentication scheme."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        """Attach an API token to the Authorization header."""
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request
the TokenAuth mechanism receives a token, then
  • includes that token in the Authorization header of your request
  • sets the recommended "Bearer " prefix in front of the token
use the TokenAuth type in the request (the example imports the class from a module named custom_token_auth)
>>> import requests
>>> from custom_token_auth import TokenAuth

>>> token = "<YOUR_TOKEN>"
>>> response = requests.get(
...     "https://api.github.com/user",
...     auth=TokenAuth(token)
... )

>>> response.status_code
200
>>> response.request.headers["Authorization"]
'Bearer ghp_b...Tx'
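The TokenAuth class can be checked offline as well: preparing a request runs the auth object's __call__, so the Authorization header is visible without contacting GitHub. The class is repeated so the sketch is self-contained, and the token value is a made-up placeholder.

```python
import requests
from requests.auth import AuthBase


class TokenAuth(AuthBase):
    """Implements a token authentication scheme."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        """Attach an API token to the Authorization header."""
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request


fake_token = "example-token"  # hypothetical placeholder, not a real credential
prepared = requests.Request(
    "GET", "https://api.github.com/user", auth=TokenAuth(fake_token)
).prepare()  # prepare() calls TokenAuth.__call__; nothing is sent

print(prepared.headers["Authorization"])
```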
SSL Certificate Verification
communicating securely with a site means using HTTPS, which establishes an encrypted connection using SSL/TLS
verifying the target server's SSL certificate is critical
Requests does this by default
to disable SSL certificate verification, pass verify=False (urllib3 then emits a warning)
>>> import requests

>>> requests.get("https://api.github.com", verify=False)
InsecureRequestWarning: Unverified HTTPS request is being made to host 'api.github.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
<Response [200]>
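Rather than turning verification off, verify also accepts a path to a CA bundle file. A short sketch of the defaults, assuming the CA bundle Requests ships with via certifi (requests.certs.where() returns its path):

```python
import os

import requests
import requests.certs

session = requests.Session()
print(session.verify)  # certificate verification is on by default

# Path of the CA bundle Requests uses by default (provided by certifi)
ca_bundle = requests.certs.where()
print(os.path.exists(ca_bundle))

# Instead of verify=False, verification can point at a specific CA bundle file;
# here it is set to the default bundle just to show the shape of the setting
session.verify = ca_bundle
session.close()
```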
Performance
Timeouts
by default Requests waits indefinitely for a response, so a slow or unresponsive server can make the application hang
specify a timeout duration to prevent this
to set the request's timeout, use the timeout parameter
timeout can be an integer or float representing the number of seconds to wait
>>> requests.get("https://api.github.com", timeout=1)
<Response [200]>
>>> requests.get("https://api.github.com", timeout=3.05)
<Response [200]>
timeout also accepts a tuple with two elements
  1. Connect timeout - the time allowed for the client to establish a connection to the server
  2. Read timeout - the time the client waits for the server to send data once the connection is established (measured between bytes received, not as a limit on the whole download)
>>> requests.get("https://api.github.com", timeout=(3.05, 5))
<Response [200]>
if the request times out a Timeout exception will be raised
import requests
from requests.exceptions import Timeout

try:
    response = requests.get("https://api.github.com", timeout=(3.05, 5))
except Timeout:
    print("The request timed out")
else:
    print("The request did not time out")
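A sketch that triggers a read timeout without any remote server: a local socket that listens but never answers (the TCP handshake still completes through the listen backlog). The 0.5-second read timeout is an arbitrary demo value. ReadTimeout and ConnectTimeout are both subclasses of Timeout, so catching Timeout, as in the code above, covers both.

```python
import socket

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, Timeout

# A local socket that accepts TCP connections (via the listen backlog) but never replies
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

try:
    # connect timeout 3.05 s, read timeout 0.5 s (arbitrary demo values)
    requests.get(f"http://127.0.0.1:{port}/", timeout=(3.05, 0.5))
except ConnectTimeout:
    outcome = "connect timeout"
except ReadTimeout:
    outcome = "read timeout"
else:
    outcome = "no timeout"
finally:
    listener.close()

print(outcome)
```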
The Session Object
Session objects are used to persist parameters across requests
import requests
from custom_token_auth import TokenAuth

TOKEN = "<YOUR_GITHUB_PA_TOKEN>"

with requests.Session() as session:
    session.auth = TokenAuth(TOKEN)

    first_response = session.get("https://api.github.com/user")
    second_response = session.get("https://api.github.com/user")

print(first_response.headers)
print(second_response.json())
the credentials only need to be set once per session
every request made through the session is then authenticated
Requests persists the credentials while the session exists
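Besides auth, a Session also merges its headers into every request, while per-request values win on conflicts. A sketch using Session.prepare_request() so nothing is sent; the tuple credentials and the Accept values are illustrative stand-ins for the TokenAuth example.

```python
import requests

with requests.Session() as session:
    # Session-level settings: applied to every request made through this session
    session.auth = ("user", "passwd")
    session.headers.update({"Accept": "application/vnd.github+json"})

    # prepare_request() merges session and request settings without sending anything
    first = session.prepare_request(
        requests.Request("GET", "https://api.github.com/user")
    )
    second = session.prepare_request(
        requests.Request(
            "GET",
            "https://api.github.com/user/repos",
            headers={"Accept": "application/json"},  # per-request value wins
        )
    )

print(first.headers["Accept"], first.headers["Authorization"])
print(second.headers["Accept"], second.headers["Authorization"])
```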

the primary performance optimization of sessions is persistent connections
when the app uses a Session to make a connection to a server, the Session keeps that connection in a connection pool
when the app connects to the same server again, the pooled connection is reused instead of opening a new one
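Connection reuse can be observed with a local keep-alive server that records the client-side port of each request: requests sent through one Session should arrive on the same connection (same port), while separate requests.get() calls each open their own. The handler (KeepAliveHandler) and its response are hypothetical; only the standard library and Requests are used.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

client_ports = []  # client-side port of every request the server sees


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint that keeps connections open between requests."""

    protocol_version = "HTTP/1.1"  # allow persistent (keep-alive) connections

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Two requests through one Session: the pooled connection is reused
with requests.Session() as session:
    session.get(url, timeout=5)
    session.get(url, timeout=5)
session_ports = list(client_ports)

# Two standalone calls: each one creates (and closes) its own connection
client_ports.clear()
requests.get(url, timeout=5)
requests.get(url, timeout=5)
plain_ports = list(client_ports)

server.shutdown()
server.server_close()

print("session:", session_ports)
print("no session:", plain_ports)
```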

Max Retries
when a request fails, you may want to retry it
Requests doesn't do this by default
to get this functionality, configure a custom transport adapter

transport adapters define a set of configurations for each service the app interacts with
for example, all requests to https://api.github.com should retry failed connections two times before giving up
with an integer max_retries, connections that still fail after the retries raise a ConnectionError; RetryError is raised when status-based retries (a urllib3 Retry object with status_forcelist) run out
create a transport adapter, set its max_retries parameter, and mount it to an existing Session

import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import RetryError

# retry failed connections to GitHub up to 2 times
github_adapter = HTTPAdapter(max_retries=2)
session = requests.Session()
session.mount("https://api.github.com", github_adapter)

try:
    response = session.get("https://api.github.com/")
except (requests.exceptions.ConnectionError, RetryError) as err:
    # ConnectionError: connection retries ran out; RetryError: status-based retries ran out
    print(f"Error: {err}")
finally:
    session.close()
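The integer max_retries above mainly covers failed connections. For retrying on server error status codes, HTTPAdapter also accepts a urllib3 Retry object; a sketch with status_forcelist against a hypothetical local service (AlwaysUnavailable) that always answers 503. With total=2 the server should see three attempts (the original plus two retries) before Requests raises RetryError.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import RetryError
from urllib3.util.retry import Retry

attempts = []  # one entry per request that reaches the server


class AlwaysUnavailable(BaseHTTPRequestHandler):
    """Hypothetical service that answers every GET with 503 Service Unavailable."""

    def do_GET(self):
        attempts.append(self.path)
        self.send_response(503)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), AlwaysUnavailable)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

# Retry up to 2 more times when the server answers 503, with no backoff delay
retry_policy = Retry(total=2, status_forcelist=[503], backoff_factor=0)
session = requests.Session()
session.mount(base_url, HTTPAdapter(max_retries=retry_policy))

try:
    session.get(f"{base_url}/status", timeout=5)
except RetryError as err:
    outcome = "gave up"
    print(f"Error: {err}")
else:
    outcome = "succeeded"
finally:
    session.close()
    server.shutdown()
    server.server_close()

print(outcome, len(attempts))
```

In real use a nonzero backoff_factor is usually preferable so retries are spaced out; it is zero here only to keep the demo fast.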