The here package makes it easy to reference files by building paths from the top-level directory of a project. This is in contrast to using setwd(), which is fragile and depends on how files are organized on your computer. Read more about project-oriented workflows:
What They Forgot to Teach You About R: “Project-oriented workflow” chapter by Jenny Bryan and Jim Hester
“Project-oriented workflow” blog post by Jenny Bryan
R for data science: “Workflow: projects” chapter by Hadley Wickham
For demonstration, this article uses a data analysis project that lives in /tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project on my machine. This is the project root. The path will most likely be different on your machine; the here package helps deal with this situation.
The project has the following structure:
#> /tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project
#> ├── analysis
#> │   └── report.Rmd
#> ├── data
#> │   └── penguins.csv
#> ├── demo-project.Rproj
#> └── prepare
#>     └── penguins.R
You can review the project on GitHub and also download a copy.
To start working on this project in RStudio, open the demo-project.Rproj file. This ensures that the working directory is set to /tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project, the project root. Opening only the .R or the .Rmd file may be insufficient!
Other development environments may have a different notion of a project. Either way, it is important that the working directory is set to the project root or a subdirectory of that path. You can check with getwd(). (See vignette("rmarkdown") for an example where the working directory is set to a subdirectory on start.)
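The check described above can also be automated. The following is a minimal base R sketch, not part of the here package: is_within() is a hypothetical helper, and the throwaway project under tempdir() is an assumption made only to keep the example self-contained.

```r
# Hypothetical helper (not part of here): TRUE if `dir` is `root` itself
# or lies somewhere below it.
is_within <- function(dir, root) {
  dir <- normalizePath(dir, winslash = "/", mustWork = TRUE)
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  dir == root || startsWith(dir, paste0(root, "/"))
}

# Build a throwaway project so the example runs anywhere.
root <- file.path(tempdir(), "demo-project-check")
dir.create(file.path(root, "analysis"), recursive = TRUE, showWarnings = FALSE)

in_root <- is_within(root, root)                         # TRUE
in_subdir <- is_within(file.path(root, "analysis"), root) # TRUE
outside <- is_within(tempdir(), root)                     # FALSE
```

In an interactive session, is_within(getwd(), root) would answer the question for the current working directory.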
The intended use is to add a call to here::i_am() at the beginning of your script or in the first chunk of your rmarkdown report.[1] This achieves the following:
The first argument to here::i_am() should be the path to the current file, relative to the project root. The penguins.R script uses:
here::i_am("prepare/penguins.R")
#> here() starts at /tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project
here::i_am() displays the top-level directory of the current project. Because the project root contains a prepare/ directory that in turn contains penguins.R, this directory is correctly inferred as the project root.
After here::i_am(), insert library(here) to make the here() function available.[3]
The top-level directory is also returned by the here() function when it is called without arguments.
One important distinction from the working directory is that this remains stable even if the working directory is changed:
setwd("analysis")
getwd()
#> [1] "/tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project/analysis"
here()
#> [1] "/tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project"
setwd("..")
(I suggest steering clear of ever changing the working directory. This may not always be feasible, in particular if the working directory is changed by code that you do not control.)
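The steps so far (declare the current file, attach the package, query the root) can be sketched end to end. This is a minimal sketch under assumptions: the project is a throwaway directory under tempdir() that mirrors the demo layout, and the working directory is changed only to simulate a session started in the project root.

```r
# Create a throwaway project that mirrors the demo layout (assumed names).
project <- file.path(tempdir(), "demo-project-root")
dir.create(file.path(project, "prepare"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(project, "prepare", "penguins.R")))

# Simulate a session that was started in the project root.
old_wd <- setwd(project)

# Declare the location of the current file, then attach the package.
here::i_am("prepare/penguins.R")
library(here)

# Without arguments, here() returns the project root.
root <- here()

# The root stays the same after the working directory changes.
setwd(tempdir())
root_after_setwd <- here()

# Restore the original working directory.
setwd(old_wd)
```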
You can build the full path to a file by passing a path relative to the top-level directory:
here("data", "penguins.csv")
#> [1] "/tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project/data/penguins.csv"
readr::read_csv(
  here("data", "penguins.csv"),
  col_types = list(.default = readr::col_guess()),
  n_max = 3
)
#> # A tibble: 3 × 8
#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Adelie Torgersen 39.1 18.7 181 3750
#> 2 Adelie Torgersen 39.5 17.4 186 3800
#> 3 Adelie Torgersen 40.3 18 195 3250
#> # ℹ 2 more variables: sex <chr>, year <dbl>
This works regardless of where the associated source file lives inside your project. With here(), the path will always be relative to the top-level project directory.
here() works very similarly to file.path() or fs::path(). You can pass individual path components or entire subpaths:
here("data/penguins.csv")
#> [1] "/tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project/data/penguins.csv"
As seen above, here() returns absolute paths (starting with /, <drive letter>:\ or \\). This makes it safe to pass these paths to other functions, even if the working directory is changed along the way.
As of version 1.0.0, absolute paths passed to here() are returned unchanged. This means that you can safely use both absolute and project-relative paths in here().
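The behavior described in the last few paragraphs can be checked in a small experiment. This is a sketch under assumptions: the project directory and its data/penguins.csv file are created under tempdir() purely for illustration, and the file is empty.

```r
# Throwaway project with a data file (assumed names, empty file).
project <- file.path(tempdir(), "demo-project-paths")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(project, "data", "penguins.csv")))

# Establish the project root from within the project, then step back out.
old_wd <- setwd(project)
here::i_am("data/penguins.csv")
setwd(old_wd)

# Path components and entire subpaths lead to the same result.
from_components <- here::here("data", "penguins.csv")
from_subpath <- here::here("data/penguins.csv")

# An absolute path passed to here() comes back unchanged.
round_trip <- here::here(from_components)
```

Because the result is absolute, file.exists(from_components) is expected to hold no matter what the current working directory is.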
The dr_here() function explains the reasoning behind choosing the project root:
dr_here()
#> here() starts at /tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project.
#> - This directory contains a file "prepare/penguins.R"
#> - Initial working directory: /tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project
#> - Current working directory: /tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project
Setting the show_reason argument to FALSE, as in dr_here(show_reason = FALSE), reduces the output to one line.
The declaration of the active file via here::i_am() also protects against accidentally running the script from a working directory outside of your project. The example below calls here::i_am() from the temporary directory, which is clearly outside our project:
withr::with_dir(tempdir(), {
  print(getwd())
  here::i_am("prepare/penguins.R")
})
#> [1] "/tmp/RtmpD769js"
#> Error: Could not find associated project in working directory or any parent directory.
#> - Path in project: prepare/penguins.R
#> - Current working directory: /tmp/RtmpD769js
#> Please open the project associated with this file and try again.
This can also happen when a file has been renamed or moved without updating the here::i_am() call. In the future, a helper function will assist with installing and updating suitably formatted here::i_am() calls in your scripts and reports.
Other packages also export a here() function. Loading these packages after loading here masks our here() function:
library(plyr)
#>
#> Attaching package: 'plyr'
#> The following object is masked from 'package:here':
#>
#> here
here()
#> Error in here(): argument "f" is missing, with no default
One way to work around this problem is to call the function with its namespace prefix, here::here().
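The effect of the namespace prefix can be illustrated without installing plyr. In this sketch, a locally defined here() with a required argument f stands in for the masking function; that stand-in is an assumption for illustration only and is not how plyr implements its function.

```r
library(here)

# Stand-in for a masking function such as plyr::here(); it requires `f`.
here <- function(f) f

# The bare name now resolves to the stand-in, so calling it without
# arguments fails. The error message is captured instead of stopping.
masked <- tryCatch(here(), error = function(e) conditionMessage(e))

# The namespace-qualified call still reaches the here package.
root <- here::here()

# Remove the stand-in so that here() refers to the package function again.
rm(here)
```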
The conflicted package offers an alternative: it detects that here() is exported from more than one package and allows you to use neither until you indicate a preference.
library(conflicted)
here()
#> Error:
#> ! [conflicted] here found in 2 packages.
#> Either pick the one you want with `::`:
#> • plyr::here
#> • here::here
#> Or declare a preference with `conflicts_prefer()`:
#> • `conflicts_prefer(plyr::here)`
#> • `conflicts_prefer(here::here)`
conflicted::conflict_prefer("here", "here")
#> [conflicted] Will prefer here::here over any other package.
here()
#> Error in here(): argument "f" is missing, with no default
To eliminate potential confusion, here::i_am() accepts a uuid argument. The idea is that each script and report calls here::i_am() very early (in the first 100 lines) with a universally unique identifier. Even if a file location is reused across projects (e.g. two projects contain a “prepare/data.R” file), the files can be identified correctly if the uuid argument in the here::i_am() call is different.
If a uuid argument is passed to here::i_am(), the first 100 lines of the file are searched for it:
a here::i_am() call that passes this very uuid is expected to be among those 100 lines, and will be matched
the file is not matched if the uuid is not found in the text
Use uuid::UUIDgenerate() to create universally unique identifiers.
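The matching on uuid can be tried out in a throwaway project. This is a sketch under assumptions: it requires the uuid package to be installed, the project layout and file name are made up for illustration, and the second call is expected to fail only because its freshly generated identifier does not occur in the file.

```r
# Generate an identifier and write it into the script's here::i_am() call.
id <- uuid::UUIDgenerate()

project <- file.path(tempdir(), "demo-project-uuid")
dir.create(file.path(project, "prepare"), recursive = TRUE, showWarnings = FALSE)
script <- file.path(project, "prepare", "data.R")
writeLines(sprintf('here::i_am("prepare/data.R", uuid = "%s")', id), script)

old_wd <- setwd(project)

# The identifier occurs in the first lines of the file, so this matches.
here::i_am("prepare/data.R", uuid = id)
root <- here::here()

# A different identifier is not found in the file, so no project matches.
other <- tryCatch(
  {
    here::i_am("prepare/data.R", uuid = uuid::UUIDgenerate())
    "matched"
  },
  error = function(e) "not matched"
)

setwd(old_wd)
```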
Ensure that the uuid arguments are actually unique across your files! In the future, a helper function will assist with installing and updating suitably formatted here::i_am() calls in your scripts and reports.
It is advisable to start a fresh R session as often as possible, especially before focusing on another project. There still may be legitimate cases when it is desirable to reset the project root.
To start, let’s create a temporary project for demonstration:
temp_project_path <- tempfile()
dir.create(temp_project_path)
scripts_path <- file.path(temp_project_path, "scripts")
dir.create(scripts_path)
script_path <- file.path(scripts_path, "script.R")
writeLines(
  c(
    'here::i_am("scripts/script.R")',
    'print("Hello, world!")'
  ),
  script_path
)
fs::dir_tree(temp_project_path)
#> /tmp/RtmpD769js/filecf24cdef230
#> └── scripts
#>     └── script.R
writeLines(readLines(script_path))
#> here::i_am("scripts/script.R")
#> print("Hello, world!")
The script.R file contains a call to here::i_am() to declare its location. Running it from the current working directory will fail:
source(script_path, echo = TRUE)
#>
#> > here::i_am("scripts/script.R")
#> Error: Could not find associated project in working directory or any parent directory.
#> - Path in project: scripts/script.R
#> - Current working directory: /tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project
#> Please open the project associated with this file and try again.
To reset the project root mid-session, change the working directory with setwd(), here by calling setwd(temp_project_path). Now, the subsequent call to here::i_am() from within script.R works:
source(script_path, echo = TRUE)
#>
#> > here::i_am("scripts/script.R")
#> here() starts at /tmp/RtmpD769js/filecf24cdef230
#>
#> > print("Hello, world!")
#> [1] "Hello, world!"
To reiterate: a fresh session is almost always the better, cleaner, safer, and more robust solution. Use this approach only as a last resort.
The here package has a very simple and restricted interface, by design. The underlying logic is provided by the much more powerful rprojroot package. If the default behavior of here does not suit your workflow for one reason or another, the rprojroot package may be a better alternative. It is also recommended to import rprojroot and not here from other packages.
The following example shows how to find an RStudio project starting from a directory:
library(rprojroot)
find_root(is_rstudio_project, file.path(project_path, "analysis"))
#> [1] "/tmp/RtmpTxQu3u/Rinstc3ccc3a1ac/here/demo-project"
Arbitrary criteria can be defined. See vignette("rprojroot", package = "rprojroot") for an introduction.
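As a small illustration of a custom criterion, the sketch below treats a directory as the project root if it contains data/penguins.csv, and combines that rule with the RStudio project criterion. The project directory is a throwaway created under tempdir(), and the choice of marker file is an assumption made for this example.

```r
library(rprojroot)

# Throwaway project without an .Rproj file (assumed layout).
project <- file.path(tempdir(), "demo-project-criteria")
dir.create(file.path(project, "analysis"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "data"), showWarnings = FALSE)
invisible(file.create(file.path(project, "data", "penguins.csv")))

# Custom criterion: the root is the directory that contains data/penguins.csv.
has_penguins <- has_file("data/penguins.csv")

# Criteria can be combined with `|`; either one is sufficient for a match.
criterion <- has_penguins | is_rstudio_project

# Search upwards, starting from a subdirectory of the project.
root <- find_root(criterion, path = file.path(project, "analysis"))
```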
[1] Prior to version 1.0.0, it was recommended to attach the here package via library(here). This still works, but is no longer the recommended approach.
[2] library(here) no longer emits an informative message if here::i_am() has been called before.
[3] library(here) emits a message that may be confusing if followed by the message from here::i_am().