Getting Started With Python's Requests Library
Requests is not included in Python's standard library. To install the package:

    $ python -m pip install requests

The GET Request
One of the most common HTTP methods is GET. The GET method is used to retrieve data from a specified resource. To make a GET request using Requests, invoke requests.get().

The Response
A Response is an object for inspecting the results of the request:

    >>> import requests
    >>> response = requests.get("https://api.github.com")

Status Codes
A status code represents the status of the request.
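As a rough orientation, a status code can be sorted into "success" or "error" by its numeric range. The sketch below uses only the standard library's http.HTTPStatus; describe_status is a hypothetical helper, not part of Requests, and it simply treats codes from 400 to 599 as errors.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Label a status code as 'ok' or 'error' (hypothetical helper).

    HTTPStatus(code) raises ValueError for codes it does not know.
    """
    phrase = HTTPStatus(code).phrase  # standard reason phrase for the code
    if 400 <= code < 600:
        return f"error: {code} {phrase}"
    return f"ok: {code} {phrase}"


for code in (200, 301, 404, 500):
    print(describe_status(code))
```

The range check is a simplification chosen for this sketch: anything below 400 counts as "ok" and the 400 and 500 ranges count as errors.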
For example, 200 means the request succeeded and 404 means the resource was not found:

    if response.status_code == 200:
        print("Success!")
    elif response.status_code == 404:
        print("Not Found.")

Below, the Response object is used as a conditional. This implicitly checks whether response.status_code is below 400 (in practice, between 200 and 399), so it indicates general success and is not specific to status code 200:

    if response:
        print("Success!")
    else:
        raise Exception(f"Non-success status code: {response.status_code}")

You can also use the Response object's .raise_for_status() method, which raises an HTTPError for status codes from 400 to 599:

    import requests
    from requests.exceptions import HTTPError

    URLS = ["https://api.github.com", "https://api.github.com/invalid"]

    for url in URLS:
        try:
            response = requests.get(url)
            response.raise_for_status()
        except HTTPError as http_err:
            print(f"HTTP error occurred: {http_err}")
        except Exception as err:
            print(f"Other error occurred: {err}")
        else:
            print("Success!")

Content
A Response object's message body is its payload.
To view the message body as raw bytes, use response.content. To view the message body as a string, use response.text. The response here is actually serialized JSON content, so response.json() returns a dictionary.

Headers
Response headers can contain information about the response, such as the content type of the payload:

    >>> import requests
    >>> response = requests.get("https://api.github.com")
    >>> response.headers
    {'Server': 'GitHub.com',
    ...
    'X-GitHub-Request-Id': 'AE83:3F40:2151C46:438A840:65C38178'}

response.headers returns a dictionary-like object. To see the content type of a response via its headers:

    >>> response.headers["Content-Type"]
    'application/json; charset=utf-8'

The HTTP specification defines header names as case-insensitive, so a lowercase key works too:

    >>> response.headers["content-type"]
    'application/json; charset=utf-8'
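The case-insensitive lookup and the two parts of a Content-Type value can be tried out without a network call. This sketch uses the standard library's email.message.Message as a stand-in for response.headers. That substitution is an assumption made for illustration; it is not the class Requests uses.

```python
from email.message import Message

# Stand-in for response.headers (assumption: illustration only).
headers = Message()
headers["Content-Type"] = "application/json; charset=utf-8"

# Lookup ignores the case of the header name.
print(headers["content-type"])

# The value has a media type and an optional charset parameter.
media_type = headers.get_content_type()
charset = headers.get_content_charset()
print(media_type, charset)
```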

Query String Parameters
One common way to customize a GET request is to pass values through query string parameters in the URL. To do this using get(), pass data to params. Here the params are a dictionary:

    import requests

    # Search GitHub's repositories for popular Python projects
    response = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": "language:python", "sort": "stars", "order": "desc"},
    )

    # Inspect some attributes of the first three repositories
    json_response = response.json()
    popular_repositories = json_response["items"]
    for repo in popular_repositories[:3]:
        print(f"Name: {repo['name']}")
        print(f"Description: {repo['description']}")
        print(f"Stars: {repo['stargazers_count']}")
        print()

Here the params are a list of tuples:

    >>> import requests
    >>> requests.get(
    ...     "https://api.github.com/search/repositories",
    ...     params=[("q", "language:python"), ("sort", "stars"), ("order", "desc")],
    ... )
    <Response [200]>

Here the values are passed as bytes:

    >>> requests.get(
    ...     "https://api.github.com/search/repositories",
    ...     params=b"q=language:python&sort=stars&order=desc",
    ... )
    <Response [200]>

Query strings are useful for parameterizing GET requests. You can also customize requests by adding or modifying the headers that are sent in the request.
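To see how a dictionary or a list of tuples becomes a query string, here is a small sketch using the standard library's urllib.parse. It only illustrates the encoding step and makes no request; the variable names are chosen for this example.

```python
from urllib.parse import parse_qsl, urlencode

as_dict = {"q": "language:python", "sort": "stars", "order": "desc"}
as_tuples = [("q", "language:python"), ("sort", "stars"), ("order", "desc")]

# Both forms encode to the same query string; ":" is percent-encoded.
query_from_dict = urlencode(as_dict)
query_from_tuples = urlencode(as_tuples)
print(query_from_dict)

# Append the query string to the base URL.
url = "https://api.github.com/search/repositories?" + query_from_dict
print(url)

# Decoding the query string recovers the original key/value pairs.
decoded = parse_qsl(query_from_dict)
print(decoded)
```

A list of tuples is handy when the order of parameters matters or when a key needs to appear more than once.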

Request Headers
To customize headers, pass a dictionary of HTTP headers to get() using the headers parameter:

    import requests

    response = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": '"real python"'},
        headers={"Accept": "application/vnd.github.text-match+json"},
    )

    # View the new 'text-matches' list which provides information
    # about your search term within the results
    json_response = response.json()
    first_repository = json_response["items"][0]
    print(first_repository["text_matches"][0]["matches"])

The Accept header tells the server what content types the application can handle. Here the request expects the matching search terms to be highlighted. The header value application/vnd.github.text-match+json is a proprietary GitHub Accept header where the content is a special JSON format.

Other HTTP Methods
Other popular HTTP methods include POST, PUT, DELETE, HEAD, PATCH, and OPTIONS. Requests provides a function for each of these HTTP methods:

    >>> import requests
    >>> requests.get("https://httpbin.org/get")
    <Response [200]>
    >>> requests.post("https://httpbin.org/post", data={"key": "value"})
    <Response [200]>
    >>> requests.put("https://httpbin.org/put", data={"key": "value"})
    <Response [200]>
    >>> requests.delete("https://httpbin.org/delete")
    <Response [200]>
    >>> requests.head("https://httpbin.org/get")
    <Response [200]>
    >>> requests.patch("https://httpbin.org/patch", data={"key": "value"})
    <Response [200]>
    >>> requests.options("https://httpbin.org/get")
    <Response [200]>

The Message Body
According to the HTTP specification, POST, PUT, and PATCH requests pass their data through the message body. Using Requests, pass the payload to the corresponding function's data parameter. data takes a dictionary, a list of tuples, bytes, or a file-like object. You need to adapt the data to the specific needs of the service receiving the request. If the request's content type is application/x-www-form-urlencoded, you can send the form data as a dictionary:

    >>> import requests
    >>> requests.post("https://httpbin.org/post", data={"key": "value"})
    <Response [200]>

Send the same data as a list of tuples:

    >>> requests.post("https://httpbin.org/post", data=[("key", "value")])
    <Response [200]>

If the server requires JSON data, use the json parameter. When JSON data is passed via json, Requests will serialize the data and add the correct Content-Type header:

    >>> response = requests.post("https://httpbin.org/post", json={"key": "value"})
    >>> json_response = response.json()
    >>> json_response["data"]
    '{"key": "value"}'
    >>> json_response["headers"]["Content-Type"]
    'application/json'
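The difference between a form-encoded body and a JSON body can be sketched with the standard library alone. build_body is a hypothetical helper written for this example; it mimics the idea of choosing a serialization and a matching Content-Type, and is not how Requests is implemented.

```python
import json
from urllib.parse import urlencode


def build_body(payload: dict, as_json: bool) -> tuple[str, bytes]:
    """Return (content_type, body_bytes) for a payload (hypothetical helper)."""
    if as_json:
        # JSON: serialize the whole structure as one document.
        return "application/json", json.dumps(payload).encode("utf-8")
    # Form data: key=value pairs joined with "&".
    return "application/x-www-form-urlencoded", urlencode(payload).encode("utf-8")


form_type, form_body = build_body({"key": "value"}, as_json=False)
json_type, json_body = build_body({"key": "value"}, as_json=True)

print(form_type, form_body)
print(json_type, json_body)
```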

Request Inspection
When a request is made, the Requests library prepares the request before sending it to the destination server. Request preparation includes things like validating headers and serializing JSON content. You can view the PreparedRequest object by accessing .request on a Response object:

    >>> import requests
    >>> response = requests.post("https://httpbin.org/post", json={"key":"value"})
    >>> response.request.headers["Content-Type"]
    'application/json'
    >>> response.request.url
    'https://httpbin.org/post'
    >>> response.request.body
    b'{"key": "value"}'

Authentication
Typically, you provide required credentials to a server by passing data through the Authorization header or a custom header defined by the service. All the functions of Requests covered so far provide a parameter named auth for passing the credentials:

    >>> import requests
    >>> response = requests.get(
    ...     "https://httpbin.org/basic-auth/user/passwd",
    ...     auth=("user", "passwd")
    ... )
    >>> response.status_code
    200
    >>> response.request.headers["Authorization"]
    'Basic dXNlcjpwYXNzd2Q='

The request succeeds if the credentials passed in the tuple to auth are valid. When credentials are passed in a tuple to the auth parameter, Requests applies them using HTTP's Basic access authentication scheme. 'Basic dXNlcjpwYXNzd2Q=' is a Base64-encoded string of the username and password with the prefix "Basic ".
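The header value shown above can be reproduced with the standard library's base64 module. basic_auth_header is a hypothetical helper for this sketch, and encoding the credentials as UTF-8 is an assumption made here.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build a Basic Authorization header value (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")  # assumption: UTF-8
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("user", "passwd")
print(header)

# Base64 is an encoding, not encryption: decoding reveals the credentials.
encoded_part = header.split(" ", 1)[1]
decoded = base64.b64decode(encoded_part).decode("utf-8")
print(decoded)
```

Because the credentials can be decoded by anyone who sees the header, the transport itself has to be encrypted.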
It is important to always send these requests over HTTPS, which provides an additional layer of security by encrypting the entire HTTP request. You could make the same request by passing explicit Basic authentication credentials using HTTPBasicAuth:

    >>> from requests.auth import HTTPBasicAuth
    >>> requests.get(
    ...     "https://httpbin.org/basic-auth/user/passwd",
    ...     auth=HTTPBasicAuth("user", "passwd")
    ... )
    <Response [200]>

Requests provides other methods of authentication out of the box, such as HTTPDigestAuth and HTTPProxyAuth. Some servers require the use of an authentication token:

    >>> import requests
    >>> token = "<YOUR_TOKEN>"
    >>> response = requests.get(
    ...     "https://api.github.com/user",
    ...     auth=("", token)
    ... )
    >>> response.status_code
    200

The code above works, but it is not the right way to authenticate with a token, and the empty string argument is a small code smell. With Requests, you can supply your own authentication mechanism derived from AuthBase:

    from requests.auth import AuthBase

    class TokenAuth(AuthBase):
        """Implements a token authentication scheme."""

        def __init__(self, token):
            self.token = token

        def __call__(self, request):
            """Attach an API token to the Authorization header."""
            request.headers["Authorization"] = f"Bearer {self.token}"
            return request

The TokenAuth mechanism receives a token and attaches it to the Authorization header of the request:

    >>> import requests
    >>> from custom_token_auth import TokenAuth
    >>> token = "
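The callable pattern behind a custom auth class can be exercised without Requests or a network connection. In this sketch, SketchTokenAuth is a hypothetical plain class (it does not derive from AuthBase), fake_request is a stand-in object with only a headers dictionary, and "example-token" is a placeholder value.

```python
from types import SimpleNamespace


class SketchTokenAuth:
    """Plain-Python stand-in for a token auth class (hypothetical)."""

    def __init__(self, token: str) -> None:
        self.token = token

    def __call__(self, request):
        # Mutate the outgoing request's headers, then hand the request back.
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request


# Stand-in for a prepared request: only the headers attribute is modeled.
fake_request = SimpleNamespace(headers={})

auth = SketchTokenAuth("example-token")
signed_request = auth(fake_request)
print(signed_request.headers)
```

The point of the pattern is that the auth object is called with the outgoing request and returns it after modifying its headers.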

SSL Certificate Verification
The way to communicate with secure sites over HTTP is by establishing an encrypted connection using SSL. Verifying the target server's SSL certificate is critical, and Requests does this by default. To disable SSL certificate verification, pass verify=False:

    >>> import requests
    >>> requests.get("https://api.github.com", verify=False)
    InsecureRequestWarning: Unverified HTTPS request is being made to host
    ⮑ 'api.github.com'. Adding certificate verification is strongly advised.
    ⮑ See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
    ⮑ warnings.warn(
    <Response [200]>
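What "verification" means at a lower level can be glimpsed with the standard library's ssl module. This is only a rough analogy to verify=True and verify=False, made for illustration; it does not show how Requests configures its connections.

```python
import ssl

# A default context requires a valid certificate and a matching hostname.
default_context = ssl.create_default_context()
print(default_context.verify_mode == ssl.CERT_REQUIRED)
print(default_context.check_hostname)

# Turning verification off (not recommended outside of local testing).
insecure_context = ssl.create_default_context()
insecure_context.check_hostname = False  # must be disabled before CERT_NONE
insecure_context.verify_mode = ssl.CERT_NONE
print(insecure_context.verify_mode == ssl.CERT_NONE)
```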

Performance
Timeouts
By default, Requests will wait indefinitely on the response, so specify a timeout duration to keep a request from hanging. To set the request's timeout, use the timeout parameter. timeout can be an integer or float representing the number of seconds to wait:

    >>> requests.get("https://api.github.com", timeout=1)
    <Response [200]>
    >>> requests.get("https://api.github.com", timeout=3.05)
    <Response [200]>

You can also pass a tuple to timeout with two elements, the connect timeout and the read timeout:

    >>> requests.get("https://api.github.com", timeout=(3.05, 5))
    <Response [200]>

If the request times out, a Timeout exception will be raised:

    import requests
    from requests.exceptions import Timeout

    try:
        response = requests.get("https://api.github.com", timeout=(3.05, 5))
    except Timeout:
        print("The request timed out")
    else:
        print("The request did not time out")

The Session Object
Session objects are used to persist parameters across requests:

    import requests
    from custom_token_auth import TokenAuth

    TOKEN = "<YOUR_GITHUB_PA_TOKEN>"

    with requests.Session() as session:
        session.auth = TokenAuth(TOKEN)

        first_response = session.get("https://api.github.com/user")
        second_response = session.get("https://api.github.com/user")

    print(first_response.headers)
    print(second_response.json())

You only need to log in once per session and can then make multiple authenticated requests. Requests will persist the credentials while the session exists. The primary performance optimization of sessions comes in the form of persistent connections. When the app uses a Session to make a connection to a server, the Session keeps that connection in a connection pool. When the app wants to connect to the same server again, the connection will be reused.

Max Retries
When a request fails, you may want to retry the same request. Requests doesn't do this by default. To use this functionality, you need to implement a custom transport adapter. Transport adapters define a set of configurations for each service the app interacts with. For example, say all requests to https://api.github.com should retry two times before finally raising a RetryError. Create a transport adapter, set its max_retries parameter, and mount it to an existing Session:

    import requests
    from requests.adapters import HTTPAdapter
    from requests.exceptions import RetryError

    github_adapter = HTTPAdapter(max_retries=2)

    session = requests.Session()

    session.mount("https://api.github.com", github_adapter)

    try:
        response = session.get("https://api.github.com/")
    except RetryError as err:
        print(f"Error: {err}")
    finally:
        session.close()
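The general idea of "retry a fixed number of times, then give up" can be sketched in plain Python. call_with_retries and flaky_call are hypothetical names for this example, and the built-in ConnectionError stands in for a network failure; this is not how the transport adapter is implemented.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], retries: int = 2, delay: float = 0.0) -> T:
    """Call func(); on ConnectionError, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: let the last error propagate
            time.sleep(delay)
    raise ValueError("retries must be >= 0")


attempts = {"count": 0}


def flaky_call() -> str:
    """Fail twice, then succeed (stands in for an unreliable request)."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = call_with_retries(flaky_call, retries=2)
print(result, attempts["count"])
```

With retries=2 the function is called at most three times in total: the initial attempt plus two retries.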