---
title: "httr2"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{httr2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
# needs pipe & avoids error = TRUE problem in 4.3.0
run_code <- getRversion() >= "4.4.0"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = run_code,
  purl = run_code
)
```

The goal of this document is to show you the basics of httr2. You'll learn how to create and submit HTTP requests and work with the HTTP responses that you get back. httr2 is designed to map closely to the underlying HTTP protocol, which I'll explain as we go along. For more details, I also recommend "[An overview of HTTP](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview)" from MDN.

```{r setup}
#| eval: true
library(httr2)
```

## Create a request

In httr2, you start by creating a request. If you're familiar with httr, this is a big change: with httr you could only submit a request, immediately receiving a response. Having an explicit request object makes it easier to build up a complex request piece by piece and works well with the pipe.

Every request starts with a URL:

```{r}
req <- request(example_url())
req
```

Here, instead of an external website, we use a test server that's built into httr2 itself. That ensures that this vignette will work regardless of when or where you run it.

We can see exactly what this request will send to the server with a dry run:

```{r}
req |> req_dry_run()
```

```{r}
#| include: FALSE
#| eval: true
port <- if (run_code) paste0("`", url_parse(example_url())$port, "`") else "e.g. `1234`"
```

The first line of the request contains three important pieces of information:

- The HTTP **method**, which is a verb that tells the server what you want to do. Here it's GET, the most common verb, indicating that we want to *get* a resource. Other verbs include POST, to create a new resource, PUT, to replace an existing resource, and DELETE, to delete a resource.
- The **path**, which is the URL stripped of details that the server already knows, i.e. the protocol (`http` or `https`), the host (`localhost`), and the port (`r port`).
- The version of the HTTP protocol. This is unimportant for our purposes because it's handled at a lower level.

The following lines specify the HTTP **headers**, a series of name-value pairs separated by `:`. The headers in this request were automatically added by httr2, but you can override them or add your own with `req_headers()`:

```{r}
req |>
  req_headers(
    Name = "Hadley",
    `Shoe-Size` = "11",
    Accept = "application/json"
  ) |>
  req_dry_run()
```

Header names are case-insensitive, and servers will ignore headers that they don't understand.

The headers finish with a blank line, which is followed by the **body**. The requests above (like all GET requests) don't have a body, so let's add one to see what happens. The `req_body_*()` functions provide a variety of ways to add data to the body. Here we'll use `req_body_json()` to add some data encoded as JSON:

```{r}
req |>
  req_body_json(list(x = 1, y = "a")) |>
  req_dry_run()
```

What's changed?

- The method has changed from GET to POST. POST is the standard method for sending data to a website, and is automatically used whenever you add a body. Use `req_method()` to use a different method.
- There are two new headers: `Content-Type` and `Content-Length`. They tell the server how to interpret the body: it's encoded as JSON and is 15 bytes long.
- We have a body, consisting of some JSON.

Different servers want data encoded differently so httr2 provides a selection of common formats.
For example, `req_body_form()` uses the encoding used when you submit a form from a web browser:

```{r}
req |>
  req_body_form(x = "1", y = "a") |>
  req_dry_run()
```

And `req_body_multipart()` uses the multipart encoding, which is particularly important when you need to send larger amounts of data or complete files:

```{r}
req |>
  req_body_multipart(x = "1", y = "a") |>
  req_dry_run()
```

If you need to send data encoded in a different form, you can use `req_body_raw()` to add the data to the body and set the `Content-Type` header.

## Perform a request and fetch the response

To actually perform a request and fetch the response back from the server, call `req_perform()`:

```{r}
req <- request(example_url()) |> req_url_path("/json")
resp <- req |> req_perform()
resp
```

You can see a simulation of what httr2 actually received with `resp_raw()`:

```{r}
resp |> resp_raw()
```

An HTTP response has a very similar structure to an HTTP request. The first line gives the version of HTTP used, and a status code that's optionally followed by a short description. Then we have the headers, followed by a blank line, followed by a body. The majority of responses will have a body, unlike requests.

You can extract data from the response using the `resp_*()` functions:

- `resp_status()` returns the status code and `resp_status_desc()` returns the description:

  ```{r}
  resp |> resp_status()
  resp |> resp_status_desc()
  ```

- You can extract all headers with `resp_headers()` or a specific header with `resp_header()`:

  ```{r}
  resp |> resp_headers()
  resp |> resp_header("Content-Length")
  ```

  Header names are case-insensitive:

  ```{r}
  resp |> resp_header("ConTEnT-LeNgTH")
  ```

- You can extract the body in various forms using the `resp_body_*()` family of functions. Since this response returns JSON we can use `resp_body_json()`:

  ```{r}
  resp |> resp_body_json() |> str()
  ```

Responses with status codes 4xx and 5xx are HTTP errors.
httr2 automatically turns these into R errors:

```{r, error = TRUE}
request(example_url()) |>
  req_url_path("/status/404") |>
  req_perform()

request(example_url()) |>
  req_url_path("/status/500") |>
  req_perform()
```

This is another important difference from httr, which required that you explicitly call `httr::stop_for_status()` to turn HTTP errors into R errors. You can revert to the httr behaviour with `req_error(req, is_error = ~ FALSE)`.

## Control the request process

A number of `req_*()` functions don't directly affect the HTTP request but instead control the overall process of submitting a request and handling the response. These include:

- `req_cache()` sets up a cache so if repeated requests return the same results, you can avoid a trip to the server.
- `req_throttle()` will automatically add a small delay before each request so you can avoid hammering a server with many requests.
- `req_retry()` sets up a retry strategy so that if the request either fails or you get a transient HTTP error, it'll automatically retry after a short delay.

For more details see their documentation, as well as examples of their use with real APIs in `vignette("wrapping-apis")`.
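
As a rough sketch of how these three functions fit together, the chunk below adds all of them to a single request against the built-in test server. The cache location (a temporary directory), the rate of 10 requests per minute, and the limit of 3 tries are arbitrary values picked for illustration, not recommendations:

```{r}
library(httr2)

req <- request(example_url()) |>
  req_url_path("/json") |>
  # Keep cacheable responses in a temporary directory (illustrative path)
  req_cache(path = tempfile()) |>
  # Allow at most 10 requests per minute (illustrative rate)
  req_throttle(rate = 10 / 60) |>
  # Make up to 3 attempts if a transient HTTP error occurs (illustrative limit)
  req_retry(max_tries = 3)

resp <- req |> req_perform()
resp |> resp_status()
```

These functions only modify the request object, so nothing is sent to the server until you call `req_perform()`.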