docdata is an R package that semi-automatically generates documentation for datasets. It streamlines recording what a dataset contains and who collected it, when, where, how, and why. It also standardizes the resulting documentation.
Ideally, every dataset (e.g., csv/txt file) with tabular data should have a corresponding documentation file that describes the rows and columns of that dataset and other information about the dataset.
docdata helps you accomplish all that.
docdata aims to make data documentation and sharing easier. It helps you avoid being that person who shares data that no one else can use because nothing was documented.
Below are examples of documentation generated by docdata.
To install the package, type the following commands into the R console:
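A minimal sketch of the installation, assuming the package is hosted on GitHub under the repository slug hauselin/docdata. The slug is inferred from the documentation URL https://hauselin.github.io/docdata/ and is not confirmed here, so verify it before installing. The helper name install_from_github is hypothetical. The sketch uses the remotes package.

```r
# Assumption: the GitHub repository is "hauselin/docdata", inferred from the
# documentation URL; verify the slug before installing.
repo <- "hauselin/docdata"

# Hypothetical helper: installs a package from a GitHub repository
install_from_github <- function(repo) {
  # Install remotes first if it is not available
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github(repo)
}

# Run these two lines in the R console to install and load the package:
# install_from_github(repo)
# library(docdata)
```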
Step 1: use doc_data() to generate the documentation (a markdown file). (mtcars.csv is a dataset in your working directory.)
Step 2: use disp_doc() to print the doc in your console
Step 3: use doc_open() to open the doc to edit it
Step 4: use doc_refresh() to refresh/update your documentation
Step 5: share your dataset and documentation file with others or your future self(!)
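The steps above can be sketched end to end as follows. The first part uses only base R to create an mtcars.csv with the four columns used in this README. The docdata calls are an assumption: the function names come from the steps above, but passing the dataset path as the first argument to each one is a guess, so check the help pages for the real signatures.

```r
# Create the example dataset in the working directory:
# four columns of the built-in mtcars data (32 rows).
write.csv(mtcars[, c("mpg", "cyl", "disp", "hp")], "mtcars.csv", row.names = FALSE)

# Assumption: each function takes the path to the dataset as its first
# argument. Check the help pages (e.g., ?doc_data) for the real signatures.
# The block runs only in an interactive session with docdata installed,
# because doc_open() opens a file for editing.
if (interactive() && requireNamespace("docdata", quietly = TRUE)) {
  docdata::doc_data("mtcars.csv")     # Step 1: generate mtcars.md
  docdata::disp_doc("mtcars.csv")     # Step 2: print the documentation
  docdata::doc_open("mtcars.csv")     # Step 3: open the documentation to edit it
  docdata::doc_refresh("mtcars.csv")  # Step 4: refresh the documentation
}
```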
doc_data() generates a markdown file that looks like the one shown below. If your dataset is mtcars.csv, the markdown file will be named mtcars.md and will be located in the same directory as mtcars.csv. (This assumes mtcars.csv is a dataset in your working directory.)
A GitHub flavored Markdown textfile documenting a dataset.

Generated using [docdata package](https://hauselin.github.io/docdata/) on 2019-12-08 18:16:46. To cite this package, type citations("docdata") in console.

## Data source

mtcars.csv

## About this file

* What (is the data):
* Who (generated this documentation):
* Who (collected the data):
* When (was the data collected):
* Where (was the data collected):
* How (was the data collected):
* Why (was the data collected):

## Additional information

* Contact: XXX@XXX.com
* Registration: https://osf.io

## Columns

* Rows: 32
* Columns: 4

| Column  | Type     | Description |
| ------- | -------- | ----------- |
| mpg     | numeric  |             |
| cyl     | numeric  |             |
| disp    | numeric  |             |
| hp      | numeric  |             |

End of documentation.
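To show where the Columns section of the sample file comes from, here is a minimal base R sketch that derives the row count, column count, and column types from a data frame and prints them as a Markdown table. It is an illustration only, not docdata's implementation, and the variable names dat and col_info are hypothetical.

```r
# Example data: the four mtcars columns listed in the sample file
dat <- mtcars[, c("mpg", "cyl", "disp", "hp")]

# One row per column: its name, its class, and an empty description to fill in
col_info <- data.frame(
  Column = names(dat),
  Type = vapply(dat, function(x) class(x)[1], character(1)),
  Description = "",
  stringsAsFactors = FALSE
)

# Print the summary lines followed by a Markdown table
cat(sprintf("* Rows: %d\n", nrow(dat)))
cat(sprintf("* Columns: %d\n\n", ncol(dat)))
cat("| Column | Type | Description |\n")
cat("| ------ | ---- | ----------- |\n")
cat(sprintf("| %s | %s | %s |\n",
            col_info$Column, col_info$Type, col_info$Description), sep = "")
```

The Description column starts empty on purpose: the column names and types can be derived automatically, but the meaning of each column has to be written by hand.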
disp_doc() prints the documentation in your console. An example (truncated) output is shown below.
doc_open() opens the documentation in R or RStudio so you can edit it and fill in the details.
If your documentation looks messy after you've edited it (especially if the description column isn't aligned), run doc_refresh() to clean it up. If the columns or rows of your dataset have changed since the documentation was last generated, run doc_refresh() again to update it: the function merges your previous documentation with a refreshed version.
doc_refresh(): spacing is cleaned up, and columns are added or removed to match the current dataset
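The merge idea behind a refresh can be sketched in base R: keep the descriptions of columns that still exist, drop columns that are gone, and add new columns with a blank description. This is an illustration under assumed data, not docdata's actual implementation. The data frames, descriptions, and column changes below are hypothetical.

```r
# Hypothetical previous documentation: column names with hand-written descriptions
old_doc <- data.frame(
  Column = c("mpg", "cyl", "disp"),
  Description = c("Miles per gallon", "Number of cylinders", "Displacement"),
  stringsAsFactors = FALSE
)

# Hypothetical current dataset: disp was dropped and hp was added
current_cols <- c("mpg", "cyl", "hp")

# Start from the current columns, then carry over descriptions that still apply
refreshed <- data.frame(Column = current_cols, stringsAsFactors = FALSE)
refreshed$Description <- old_doc$Description[match(refreshed$Column, old_doc$Column)]
refreshed$Description[is.na(refreshed$Description)] <- ""  # new columns start blank

print(refreshed)
```

Matching on the column name means descriptions you have already written survive a refresh as long as the column keeps its name.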