--- title: "here" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{here} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", error = !identical(Sys.getenv("IN_PKGDOWN"), "true") ) project_path <- system.file("demo-project", package = "here") ``` The here package enables easy file referencing by using the top-level directory of a file project to easily build file paths. This is in contrast to using `setwd()`, which is fragile and dependent on the way you order your files on your computer. Read more about project-oriented workflows: - What They Forgot to Teach You About R: ["Project-oriented workflow"](https://rstats.wtf/projects.html) chapter by Jenny Bryan and Jim Hester - ["Project-oriented workflow"](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/) blog post by Jenny Bryan - R for data science: ["Workflow: projects"](https://r4ds.had.co.nz/workflow-projects.html) chapter by Hadley Wickham ## Basic functionality For demonstration, this article uses a data analysis project that lives in `` `r project_path` `` on my machine. This is the *project root*. The path will most likely be different on your machine, the here package helps deal with this situation. The project has the following structure: ```{r echo = FALSE} fs::dir_tree(project_path) ``` You can review the project on [GitHub](https://github.com/r-lib/here/tree/main/inst/demo-project) and also download a copy. To start working on this project in RStudio, open the `demo-project.Rproj` file. This ensures that the [working directory](https://en.wikipedia.org/wiki/Working_directory) is set to `` `r project_path` ``, the project root. Opening only the `.R` or the `.Rmd` file may be insufficient! Other development environments may have a different notion of a project. Either way, it is important that the working directory is set to the project root or a subdirectory of that path. You can check with: ```{r eval = FALSE} setwd(project_path) ``` ```{r include = FALSE} knitr::opts_knit$set(root.dir = project_path) ``` ```{r} getwd() ``` (See `vignette("rmarkdown")` for an example where the working directory is set to a subdirectory on start.) ### Declare the location of the current script The intended use is to add a call to `here::i_am()` at the beginning of your script or in the first chunk of your rmarkdown report.[^legacy] This achieves the following: [^legacy]: Prior to version 1.0.0, it was recommended to attach the here package via `library(here)`. This still works, but is no longer the recommended approach. - The location of the current script or report within the project is declared - The project root is initialized, consistent with the location of the current script or report - An informative message is emitted.[^print] [^print]: `library(here)` no longer emits an informative message if `here::i_am()` has been called before. The first argument to `here::i_am()` should be the path to the current file, relative to the project root. The `penguins.R` script uses: ```{r} here::i_am("prepare/penguins.R") ``` `here::i_am()` displays the top-level directory of the current project. Because the project has a `prepare/` directory in its root that contains `penguins.R`, it is correctly inferred as the project root. After `here::i_am()`, insert `library(here)` to make the `here()` function available:[^why-not-first] [^why-not-first]: `library(here)` emits a message that may be confusing if followed by the message from `here::i_am()`. ```{r} library(here) ``` The top-level directory is also returned from the `here()` function: ```{r} here() ``` One important distinction from the working directory is that this remains stable even if the working directory is changed: ```{r} setwd("analysis") getwd() here() setwd("..") ``` (I suggest to steer clear from ever changing the working directory. This may not always be feasible, in particular if the working directory is changed by code that you do not control.) ### Use project-relative paths You can build a path relative to the top-level directory in order to build the full path to a file: ```{r} here("data", "penguins.csv") readr::read_csv( here("data", "penguins.csv"), col_types = list(.default = readr::col_guess()), n_max = 3 ) ``` This works regardless of where the associated source file lives inside your project. With `here()`, the path will always be relative to the top-level project directory. `here()` works very similarly to `file.path()` or `fs::path()`, you can pass path components or entire subpaths: ```{r} here("data/penguins.csv") ``` As seen above, `here()` returns absolute paths (starting with `/`, `:\` or `\\`). This makes it safe to pass these paths to other functions, even if the working directory is changed along the way. As of version 1.0.0, absolute paths passed to `here()` are returned unchanged. This means that you can safely use both absolute and project-relative paths in `here()`. ```{r} data_path <- here("data") here(data_path) here(data_path, "penguins.csv") ``` ### Situation report The `dr_here()` function explains the reasoning behind choosing the project root: ```{r} dr_here() ``` The `show_reason` argument can be set to `FALSE` to reduce the output to one line: ```{r} dr_here(show_reason = FALSE) ``` ### What if the working directory is wrong? The declaration of the active file via `here::i_am()` also protects against accidentally running the script from a working directory outside of your project. The example below calls `here::i_am()` from the temporary directory, which is clearly outside our project: ```{r error = TRUE} withr::with_dir(tempdir(), { print(getwd()) here::i_am("prepare/penguins.R") }) ``` This can also happen when a file has been renamed or moved without updating the `here::i_am()` call. In the future, a helper function will assist with installing and updating suitably formatted `here::i_am()` calls in your scripts and reports. ## Extra safety ### Conflicts with other packages Other packages also export a `here()` function. Loading these packages after loading here masks our `here()` function: ```{r error = TRUE} library(plyr) here() ``` One way to work around this problem is to use `here::here()`: ```{r} here::here() ``` The conflicted package offers an alternative: it detects that `here()` is exported from more than one package and allows you to use neither until you indicate a preference. ```{r error = TRUE} library(conflicted) here() conflicted::conflict_prefer("here", "here") here() ``` ### Specify a unique identifier To eliminate potential confusion, `here::i_am()` accepts a `uuid` argument. The idea is that each script and report calls `here::i_am()` very early (in the first 100 lines) with a universally unique identifier. Even if a file location is reused across projects (e.g. two projects contain a "prepare/data.R" file), the files can be identified correctly if the `uuid` argument in the `here::i_am()` call is different. If a `uuid` argument is passed to `here::i_am()`: - a project root that contains a file at the specified file path is searched - the first 100 lines of that file are read - for the correct file, the `here::i_am()` call that passes this very `uuid` is among those 100 lines, and will be matched - for the wrong file, `uuid` is not found in the text Use `uuid::UUIDgenerate()` to create universally unique identifiers: ```{r} uuid::UUIDgenerate() ``` Ensure that the `uuid` arguments are actually unique across your files! In the future, a helper function will assist with installing and updating suitably formatted `here::i_am()` calls in your scripts and reports. ## Beyond here ### Change project root It is advisable to [start a fresh R session](https://rstats.wtf/save-source.html) as often as possible, especially before focusing on another project. There still may be legitimate cases when it is desirable to reset the project root. To start, let's create a temporary project for demonstration: ```{r} temp_project_path <- tempfile() dir.create(temp_project_path) scripts_path <- file.path(temp_project_path, "scripts") dir.create(scripts_path) script_path <- file.path(scripts_path, "script.R") writeLines( c( 'here::i_am("scripts/script.R")', 'print("Hello, world!")' ), script_path ) fs::dir_tree(temp_project_path) writeLines(readLines(script_path)) ``` The `script.R` file contains a call to `here::i_am()` to declare its location. Running it from the current working directory will fail: ```{r error = TRUE} source(script_path, echo = TRUE) ``` To reset the project root mid-session, change the working directory with `setwd()`. Now, the subsequent call to `here::i_am()` from within `script.R` works: ```{r eval = FALSE} setwd(temp_project_path) ``` ```{r include = FALSE} knitr::opts_knit$set(root.dir = temp_project_path) ``` ```{r} source(script_path, echo = TRUE) ``` To reiterate: a fresh session is almost always the better, cleaner, safer, and more robust solution. Use this approach only as a last resort. ### Under the hood: rprojroot The here package has a very simple and restricted interface, by design. The underlying logic is provided by the much more powerful rprojroot package. If the default behavior of here does not suit your workflow for one reason or another, the rprojroot package may be a better alternative. It is also recommended to import rprojroot and not here from other packages. The following example shows how to find an RStudio project starting from a directory: ```{r} library(rprojroot) find_root(is_rstudio_project, file.path(project_path, "analysis")) ``` Arbitrary criteria can be defined. See `vignette("rprojroot", package = "rprojroot")` for an introduction.