How gargle gets tokens

This vignette explains the purpose and usage of token_fetch() and the functions it subsequently calls. The goal of token_fetch() is to secure a token for use in downstream requests.

The target audience is someone who works directly with a Google API. These people roughly fall into two camps:

  • The author of an R package that wraps a Google API.
  • The useR who is writing a script or app, without using such a wrapper, either because the wrapper does not exist or there’s a reason to avoid the dependency.

This vignette might also be useful to the user of a wrapper package who needs to influence the operations of token_fetch(), e.g. by telling it to try auth methods in a non-default order or to not try certain methods at all.

token_fetch() is aimed at whoever is going to manage the returned token, e.g., incorporate it into downstream requests. It can be very nice for users if wrapper packages assume this responsibility, as opposed to requiring users to explicitly acquire and manage their tokens. We give a few design suggestions here and cover this in more depth in vignette("gargle-auth-in-client-package").

library(gargle)

token_fetch()

token_fetch() is a rather magical function for getting a token. The goal is to make auth relatively painless for users, while allowing developers and power users to take control when and if they need to. Most users will presumably interact with token_fetch() only in an indirect way, mediated through an API wrapper package. That is not because the interface of token_fetch() is unfriendly – it’s very flexible! The objective of token_fetch() is to allow package developers to take responsibility for managing the user’s token, without having to implement all the different ways of obtaining that token in the first place.

The signature of token_fetch() is very simple and, therefore, not very informative:

token_fetch(scopes, ...)

Under the hood, token_fetch() calls a sequence of much more specific credential functions, each wrapped in a tryCatch() and returning NULL if unsuccessful. The only formal argument these functions have in common is scopes, with the rest being passed via ....

This gives a sense of the credential functions and reflects the order in which they are called:

writeLines(names(cred_funs_list()))
#> credentials_byo_oauth2
#> credentials_service_account
#> credentials_external_account
#> credentials_app_default
#> credentials_gce
#> credentials_user_oauth2

It is possible to manipulate this registry of functions. The help for cred_funs_list() is a good place to learn more and we present a concrete example in the last section of this vignette.

For now, however, we assume you’re working with the default registry that ships with gargle.

Note also that these credential functions are exported and can be called directly.

Get verbose output

To see more information about what gargle is up to, set the "gargle_verbosity" option to “debug”. Read more in the docs for gargle_verbosity().

credentials_byo_oauth2()

The first function tried is credentials_byo_oauth2(). Here’s how a call to token_fetch() might work:

token_fetch(token = <TOKEN2.0>)

credentials_byo_oauth2(
  token = <TOKEN2.0>
)

credentials_byo_oauth2() provides a back door for a “bring your own token” workflow. This function accounts for the scenario where an OAuth token has been obtained through external means and it’s convenient to be able to put it into force.

credentials_byo_oauth2() checks that token is of class httr::Token2.0 and that it appears to be associated with Google. A token of class request is also acceptable, in which case the auth_token component is extracted and treated as the input. This is how a Token2.0 object would present, if processed with httr::config(), as functions like googledrive::drive_token() and bigrquery::bq_token() do.

If token is not provided or if it doesn’t satisfy these requirements, we fail and token_fetch()’s execution moves on to the next function in the registry.

credentials_service_account()

The next function tried is credentials_service_account(). Here’s how a call to token_fetch() with service account inputs plays out:

token_fetch(scopes = <SCOPES>, path = "/path/to/your/service-account.json")

# credentials_byo_oauth2() fails because no `token`,
# which leads to this call:
credentials_service_account(
  scopes = <SCOPES>,
  path = "/path/to/your/service-account.json"
)

The scopes are often provided by the API wrapper function that is mediating the calls to token_fetch() and credential_service_account(). The path argument is presumably coming from the user. It is treated as a JSON representation of service account credentials, in any form that is acceptable to jsonlite::fromJSON(). In the above example, that is a file path, but it could also be a JSON string. If there is no named path argument or if it can’t be parsed as a service account credential, we fail and token_fetch()’s execution moves on to the next function in the registry.

Here is some Google documentation about service accounts:

For R users, a service account is a great option for credentials that will be used in a script or application running remotely or in an unattended fashion. In particular, this is a better approach than trying to move OAuth2 credentials from one machine to another. For example, a service account is the preferred method of auth when testing and documenting a package on a continuous integration service.

The JSON key file must be managed securely. In particular, it should not be kept in, e.g., a GitHub repository (unless it is encrypted). The encryption strategy used by gargle and other packages is described in the article Managing tokens securely.

Note that fetching a token for a service account requires a reasonably accurate system clock. This is of particular importance for users running gargle inside a Docker container, as Docker for Windows has intermittently seen problems with clock drift. If your service account token requests fail with “Bad Request” inside a container, but succeed locally, check that the container’s system clock is accurate.

credentials_external_account()

The next function tried is credentials_external_account(). Here’s how a call to token_fetch() with an external account inputs plays out:

token_fetch(scopes = <SCOPES>, path = "/path/to/your/external-account.json")

# credentials_byo_oauth2() fails because no `token`,
# credentials_service_account() fails because the JSON provided via
#   `path` is not of type "service_account",
# which leads to this call:
credentials_external_account(
  scopes = <SCOPES>,
  path = "/path/to/your/external-account.json"
)

credentials_external_account() implements something called workload identity federation and is available to applications running on specific non-Google Cloud platforms. At the time of writing, gargle only supports AWS, but this could be expanded to other providers, such as Azure, if there is a documented need.

Similar to credentials_service_account(), the path is treated as a JSON representation of the account’s configuration and it’s probably a file path. However, in contrast to credentials_service_account(), this JSON only contains non-sensitive metadata, which is, indeed, the main point of this flow. The secrets needed to complete auth are obtained “on-the-fly” from, e.g., the running EC2 instance.

credentials_external_account() will fail for many reasons: there is no named path argument, the JSON at path can’t be parsed as configuration for an external AWS account, we don’t appear to running on AWS, suggested packages for AWS functionality are not installed, or the workload identity pool is misconfigured. If any of that happens, we fail and token_fetch()’s execution moves on to the next function in the registry.

Here is some Google documentation about workload identity federation and the specifics for AWS:

credentials_app_default()

The next function tried is credentials_app_default(). Here’s how a call to token_fetch() might work:

token_fetch(scopes = <SCOPES>)

# credentials_byo_oauth2() fails because no `token`,
# credentials_service_account() fails because no `path`,
# credentials_external_account() fails because no `path`,
# which leads to this call:
credentials_app_default(
  scopes = <SCOPES>
)

credentials_app_default() loads credentials from a file identified via a search strategy known as Application Default Credentials (ADC). The credentials themselves are conventional service account, external account, or user credentials that happen to be stored in a pre-ordained location and format.

The hope is to make auth “just work” for someone working on Google-provided infrastructure or who has used Google tooling to get started, such as the gcloud command line tool. A sequence of paths is consulted, which we describe here, with some abuse of notation. ALL_CAPS represents the value of an environment variable.

${GOOGLE_APPLICATION_CREDENTIALS}
${CLOUDSDK_CONFIG}/application_default_credentials.json

# on Windows:
%APPDATA%\gcloud\application_default_credentials.json
%SystemDrive%\gcloud\application_default_credentials.json
C:\gcloud\application_default_credentials.json

# on not-Windows:
~/.config/gcloud/application_default_credentials.json

If the above search successfully identifies a JSON file, it is parsed and ingested either as a service account token, an external account configuration, or an OAuth2 user credential. In the case of an OAuth2 credential, the requested scopes must also meet certain criteria. Note that this will NOT work for OAuth2 credentials initiated by gargle, which are stored on disk in .rds files. The storage of OAuth2 user credentials as JSON is unique to certain Google tools – possibly just the gcloud CLI – and should probably be regarded as deprecated. It is recommended to use ADC with a service account or workload identity federation. If this quest is unsuccessful, we fail and token_fetch()’s execution moves on to the next function in the registry.

The main takeaway lesson:

  • You can make auth “just work” by storing the JSON for a service account or an external account at one of the filepaths listed above. It will be automagically discovered when token_fetch() is called with only the scopes argument specified.

Again, remember that the JSON key file for a conventional service account must be managed securely and should NOT live in a directory that syncs to the cloud. The JSON configuration for an external account is not actually sensitive and this is one of the benefits of this flow, but it’s only available in a very narrow set of circumstances.

credentials_gce()

The next function tried is credentials_gce(). Here’s how a call to token_fetch() might work:

token_fetch(scopes = <SCOPES>)
# or perhaps
token_fetch(scopes = <SCOPES>, service_account = <SERVICE_ACCOUNT>)

# credentials_byo_oauth2() fails because no `token`,
# credentials_service_account() fails because no `path`,
# credentials_external_account() fails because no `path`,
# credentials_app_default() fails because no ADC found,
# which leads to one of these calls:
credentials_gce(
  scopes = <SCOPES>,
  service_account = "default"
)
# or
credentials_gce(
  scopes = <SCOPES>,
  service_account = <SERVICE_ACCOUNT>
)

credentials_gce() retrieves service account credentials from a metadata service that is specific to virtual machine instances running on Google Cloud Engine (GCE). Basically, if you have to ask what this is about, this is not the auth method for you. Let us move on.

If this seems to happening to you and it’s not what you want, see the last section for how to remove this auth method.

credentials_user_oauth2()

The next and final function tried is credentials_user_oauth2(). Here’s how a call to token_fetch() might work:

token_fetch(scopes = <SCOPES>)

# credentials_byo_oauth2() fails because no `token`,
# credentials_service_account() fails because no `path`,
# credentials_external_account() fails because no `path`,
# credentials_app_default() fails because no ADC found,
# credentials_gce() fails because not on GCE,
# which leads to this call:
credentials_user_oauth2(
  scopes = <SCOPES>,
  app = <OAUTH_APP>,
  package = "<PACKAGE>"
)

credentials_user_oauth2() is where the vast majority of users will end up. This is the function that choreographs the traditional “OAuth dance” in the browser. User credentials are cached locally, at the user level, by default. Therefore, after first use, there are scenarios in which gargle can determine unequivocally that it already has a suitable token on hand and can load (and possibly refresh) it, without additional user intervention.

The scopes, app (likely to be renamed client in a future version of gargle), and package are generally provided by the API wrapper function that is mediating the calls to token_fetch(). Do not “borrow” an OAuth client from gargle or any other package; always use credentials associated with your package or provided by your user. Per the Google User Data Policy https://developers.google.com/terms/api-services-user-data-policy, your application must accurately represent itself when authenticating to Google API services.

The wrapper package would presumably also declare itself as the package requesting a token (this is used in messages). So here’s how a call to token_fetch() and credentials_user_oauth2() might look when initiated from THINGY_auth(), a function in the fictional thingyr wrapper package:

# user initiates auth or does something that triggers it indirectly
THINGY_auth()

# which then calls
gargle::token_fetch(
  scopes  = <SCOPES_NEEDED_FOR_THE_THINGY_API>,
  app     = thingy_app(),
  package = "thingyr"
)

# which leads to this call:
credentials_user_oauth2(
  scopes  = <SCOPES_NEEDED_FOR_THE_THINGY_API>,
  app     = thingy_app(),
  package = "thingyr"
)

See vignette("gargle-auth-in-client-package") for design ideas for a function like THINGY_auth().

What happens tomorrow or next week? Do we make this user go through the browser dance again? How do we get to that happy place where we don’t bug them constantly about auth?

First, we define “suitable”, i.e. what it means to find a matching token in the cache. credentials_user_oauth2() is a thin wrapper around gargle2.0_token() which is the constructor for the gargle::Gargle2.0 class used to hold an OAuth2 token. And that call might look something like this (simplified for communication purposes):

gargle2.0_token(
  email   = gargle_oauth_email(),
  app     = thingy_app(),
  package = "thingyr",
  scope   = <SCOPES_NEEDED_FOR_THE_THINGY_API>,
  cache   = gargle_oauth_cache()
)

gargle looks in the cache specified by gargle_oauth_cache() for a token that has these scopes, this client, and the Google identity specified by email. By default email is NA, so we might find one or more tokens that have the necessary scopes and client. In that case, gargle reveals the email associated with the matching token(s) and asks the user for explicit instructions about how to proceed. That looks something like this:

The thingyr package is requesting access to your Google account.
Enter '1' to start a new auth process or select a pre-authorized account.
1: Send me to the browser for a new auth process.
2: janedoe_personal@gmail.com
3: janedoe@example.com
4: janedoe_work@gmail.com
Selection: 

If none of the tokens has the right scopes and client (or if the user declines to use a pre-existing token), we head to the browser to initiate OAuth2 flow de novo.

A user can reduce the need for interaction by passing the target email to thingy_auth():

thingy_auth(email = "[email protected]")

or by specifying same in the "gargle_oauth_email" option. A value of email = TRUE, passed directly or via the option, is an alternative strategy: TRUE means that gargle is allowed to use a matching token whenever there is exactly one match.

The elevated status of email for gargle::Gargle2.0 tokens is motivated by the fact that many of us have multiple Google identities and need them to be very prominent when working with Google APIs. This is one of the main motivations for gargle::Gargle2.0, which extends httr::Token2.0. The gargle::Gargle2.0 class also defaults to a user-level token cache, as opposed to project-level. An overview of the current OAuth cache is available via gargle_oauth_cache() and the output looks something like this:

gargle_oauth_sitrep()
#> 14 tokens found in this gargle OAuth cache:
#> ~/Library/Caches/gargle
#' 
#' email                         app         scopes                         hash...   
#' ----------------------------- ----------- ------------------------------ ----------
#' [email protected]       thingy      ...bigquery, ...cloud-platform 128f9cc...
#' [email protected]             gargle-demo                                15acf95...
#' [email protected]            gargle-demo ...drive                       4281945...
#' [email protected]       gargle-demo ...drive                       48e7e76...
#' [email protected]       tidyverse                                  69a7353...
#' [email protected]             tidyverse   ...spreadsheets.readonly       86a70b9...
#' [email protected]       tidyverse   ...drive                       d9443db...
#' [email protected]             tidyverse   ...drive                       d9443db...
#' [email protected]             tidyverse   ...drive                       d9443db...
#' [email protected]        tidyverse   ...drive                       d9443db...
#' [email protected] tidyverse   ...drive                       d9443db...
#' [email protected]       tidyverse   ...drive.readonly              ecd11fa...
#' [email protected]       tidyverse   ...bigquery, ...cloud-platform ece63f4...
#' [email protected]             tidyverse   ...spreadsheets                f178dd8...

Manipulate the credential function registry

Recall that you can get an overview of the credential functions that token_fetch() works through like so:

writeLines(names(cred_funs_list()))
#> credentials_byo_oauth2
#> credentials_service_account
#> credentials_external_account
#> credentials_app_default
#> credentials_gce
#> credentials_user_oauth2

Sometimes more than one of these auth methods “work”, but only one of them actually “works” and, sadly, it’s not the first one. In this case, gargle successfully gets a token, but then you experience token-related failure in downstream work.

The most common example of this is someone who is working on Google Compute Engine (GCE), but they prefer to auth as a normal user, not as the default service account.

Let’s say you want to prevent token_fetch() from even trying one specific auth method, clearing the way for it to automagically use the method you want. You can remove a specific credential function from the registry. Here’s how to do this for the scenario described above, where you want to skip GCE-specific auth:

gargle::cred_funs_add(credentials_gce = NULL)

Learn more in the docs for cred_funs_list(). You can even make narrowly scoped changes to the registry with local_cred_funs() and with_cred_funs().