We all have data we value, but do we value our data? Do we organize, version control and back up our data?
This overview is targeted at home users, not enterprises or governments, which require multiple access groups and clearances.
Approach
Before we start making all our data redundant, we need to have everything organized. We want our data to be:
- easy to file
- easy to find
- named with clear conventions; e.g. `20181120.md`, not `20nov2018.md`
- stored in a clear directory structure; e.g. `./meetings/20181120.md`, not `./meeting_20181120.md`
- independent of the data carrier; e.g. HDD, Dropbox, iCloud, etc.
When using dates in filenames, try to stick to the ISO 8601 basic date format (YYYYMMDD), so that sorting the filenames alphabetically also sorts them by date.
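A minimal Python sketch of why the YYYYMMDD form helps. Plain string sorting of such names is already chronological, whereas day-month-year names end up sorted by day of the month. The helper `dated_filename` is hypothetical. The `%b` month abbreviation depends on the active locale and is English in Python's default C locale.

```python
from datetime import date


def dated_filename(day: date, ext: str = "md") -> str:
    """Build a name like 20181120.md (ISO 8601 basic date, YYYYMMDD)."""
    return f"{day.strftime('%Y%m%d')}.{ext}"


days = [date(2018, 11, 20), date(2017, 3, 5), date(2018, 2, 1)]

iso_names = [dated_filename(d) for d in days]
# Day-month-year style; %b gives an English month abbreviation in the C locale.
human_names = [d.strftime("%d%b%Y").lower() + ".md" for d in days]

# String sorting puts the YYYYMMDD names in chronological order ...
print(sorted(iso_names))    # ['20170305.md', '20180201.md', '20181120.md']
# ... while the day-month-year names are ordered by day of the month.
print(sorted(human_names))  # ['01feb2018.md', '05mar2017.md', '20nov2018.md']
```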
SCARV attributes
When thinking about the attributes we can give to our data, we find five important indicators: Size, Confidentiality, Availability, Revision and Value (hence SCARV). The first one we discuss, version control (revision), depends on the platform. For example, Git provides very different version control than online collaborative document editors such as Google Docs. They all let you go back to previous versions. Some do this by storing snapshots. Others add the new state without changing the history; examples of the latter are a transactional database or a blockchain.
We first show the raw data and then discuss it:
```yaml
parameters:
  revision:
    desc: is version control (VCS) required?
    values:
      0:
        desc: "No"
        examples:
          - pictures
      1:
        desc: "Yes"
        examples:
          - Markdown in Git as technical documentation
  value:
    desc: impact when lost, importance
    values:
      0:
        desc: MAY be lost
        examples:
          - software binaries
          - music and movies
      1:
        desc: SHOULD NOT be lost
        examples:
          - birthday pictures
          - finished homework
      2:
        desc: MUST NOT be lost
        examples:
          - legal backup of financial records
          - private keys
      3:
        desc: MUST NOT be lost
        examples:
          - private keys
          - store in (H)EMP proof data container
          - military grade, national secrets
  confidentiality:
    desc: how secret / public is the data
    values:
      0:
        desc: public data
        examples:
          - html, css
          - food pics
      1:
        desc: non-public but not secret, may leak
        examples:
          - homework
          - selfies
      2:
        desc: non-public, but accessible to storage provider
        examples:
          - email
          - financial records
      3:
        desc: secrets
        examples:
          - private keys
          - passwords
  availability:
    desc: accessibility / availability
    values:
      0:
        desc: cold storage / archive
        examples:
          - homework from finished education
          - financial records from years back
      1:
        desc: easy to obtain access
        examples:
          - disaster backup
      2:
        desc: easy to access
        examples:
          - documents
      3:
        desc: always accessible, offline syncing
        examples:
          - password manager
          - calendar
  size:
    desc: size on disk
    values:
      0:
        desc: KB
        examples:
          - passwords
      1:
        desc: MB
        examples:
          - database dump
      2:
        desc: GB
        examples:
          - photo gallery
      3:
        desc: TB
        examples:
          - logs
```
Once we have classified the five fields for our data set, we check them from the most restrictive field down to the most generic:
- revision
- confidentiality
- size
The remaining two fields can be addressed in various ways, by providing one or more backups and different forms of availability:
- availability
- value
This gives us 2 * 4^4 = 512 possible file groups (two revision levels and four levels for each of the other four attributes), which is not what we want. So we simply group our files in directories and specify the requirements per directory.
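To check the arithmetic, the sketch below enumerates every combination of levels with Python's `itertools.product`. The `LEVELS` dictionary is a hypothetical encoding of the parameter list, not something the scheme itself prescribes.

```python
from itertools import product

# Number of levels per SCARV attribute, taken from the parameter list.
LEVELS = {
    "revision": 2,         # 0 = no VCS, 1 = VCS
    "value": 4,            # 0 = MAY be lost .. 3 = MUST NOT be lost
    "confidentiality": 4,  # 0 = public data .. 3 = secrets
    "availability": 4,     # 0 = cold storage .. 3 = always accessible
    "size": 4,             # 0 = KB .. 3 = TB
}

# Each combination of one level per attribute is a potential file group.
groups = list(product(*(range(n) for n in LEVELS.values())))

print(len(groups))  # 512
```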
Two omitted attributes:
To judge the availability, one should also consider the update cadence, i.e. the frequency of change. We do not consider append-only or write-once storage, since these are rare (e.g. a central log server) or old technology (CD/DVD).
We also do not consider collaboration. This requirement is met by tools like Google Docs or Git. Enterprise environments need to deal with it; home users do not.
Example providers
A subset of free storage providers:
Read the first row as: a public repository on GitHub allows Revision (R), is not Confidential (C), allows MBs of storage Size (S) and is highly Available (A).
provider | storage type | R | C | S | A |
---|---|---|---|---|---|
github | public repo. | 1 | 0 | 1 | 3 |
github | private repo. | 1 | 2 | 1 | 3 |
bitbucket | public repo. | 1 | 0 | 1 | 3 |
bitbucket | private repo. | 1 | 2 | 1 | 3 |
dropbox | default | 0 | 2 | 2 | 2 |
google | photos | 0 | 1 | 3 | 3 |
google | drive | 0 | 2 | 2 | 3 |
google | docs | 1 | 2 | 1 | 3 |
stack | default | 0 | 2 | 2 | 3 |
docker hub | other | 1 | 0 | 2 | 1 |
usb hdd | local | 0 | 3 | 2 | 1 |
NAS at home | local | 0 | 3 | 3 | 2 |
More providers can be found here or here.
We all know people who write down lists of contact details, their weight or passwords. Each of these has a better digital alternative, so we do not consider writing things down. Printing your private keys as a backup may be the exception.
Example files
type | R | C | S | A | V | example solution |
---|---|---|---|---|---|---|
finished education backup | 0 | 1 | 1 | 0 | 1 | 2 usb sticks, one at home, one at my parents' house |
finished course current edu. | 0 | 1 | 1 | 2 | 2 | git repo. which is backed up to docker hub |
current project / course | 1 | 2 | 1 | 3 | 3 | git repo. stored in Dropbox folder |
long-lived documents; resume, lists | 1 | 2 | 1 | 2 | 2 | Google Docs and monthly backup to usb hdd |
(watched) movies | 0 | 0 | 3 | 1 | 0 | NAS which enables video streaming |
private keys | 0 | 3 | 0 | 2 | 3 | local with full disk encryption and on usb in vault |
wordpress | 1 | 0 | 1 | 2 | 2 | on server and daily backup to git repo. |
wordpress db | 1 | 1 | 1 | 2 | 2 | dump.sql backed up using systemd timer to git |
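The selection procedure described earlier can be sketched in Python: filter on revision, confidentiality and size first, then solve availability and value with backups. This is an illustration under assumptions. A storage option qualifies when its level is at least the level the data needs on each of R, C and S. `Rating`, `PROVIDERS` and `candidates` are hypothetical names, and `PROVIDERS` holds only a subset of the provider table.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """Hypothetical R/C/S levels, using the scales from the parameter list."""
    revision: int         # 0 = no VCS, 1 = VCS
    confidentiality: int  # 0 = public .. 3 = secrets
    size: int             # 0 = KB .. 3 = TB


# Subset of the provider table: the highest level each option can handle.
PROVIDERS = {
    "github public repo.": Rating(1, 0, 1),
    "github private repo.": Rating(1, 2, 1),
    "bitbucket public repo.": Rating(1, 0, 1),
    "bitbucket private repo.": Rating(1, 2, 1),
    "dropbox": Rating(0, 2, 2),
    "stack": Rating(0, 2, 2),
    "docker hub": Rating(1, 0, 2),
    "usb hdd": Rating(0, 3, 2),
    "NAS at home": Rating(0, 3, 3),
}


def candidates(need: Rating) -> list[str]:
    """Names of storage options that meet or exceed `need` on R, C and S."""
    return [
        name
        for name, offer in PROVIDERS.items()
        if all(o >= n for o, n in zip(offer, need))
    ]


# R, C and S taken from the example files table.
print(candidates(Rating(revision=0, confidentiality=3, size=0)))  # private keys
print(candidates(Rating(revision=0, confidentiality=0, size=3)))  # (watched) movies
```

For private keys this leaves only `usb hdd` and `NAS at home`, and for movies only the NAS. That lines up with the example solutions in the table. Availability and value are then covered by adding backups on top.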
Email is usually very important, but what if your free email service blocks your account?
We can limit the damage by forwarding all email to another provider (e.g. using Gmail with Hotmail as a backup). Note that this roughly doubles your chance of getting hacked, since a breach of either account exposes your email.
Photos
Google's unlimited photo backup is very handy, but again, what if your account gets suspended?
To minimize the damage, create an album of your valuable photos/videos and download or back up this album once a month or quarter.
You could make albums of backup-worthy pictures by year for recent years (‘before2015’, ‘2015’, ‘2016’, ‘2017’, ‘2018’) and by life phase for older pictures (‘kid’, ‘highschool’, ‘college’, ‘uni’, ‘earlyparenthood’). This eases the backup procedure.
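The album scheme can be expressed as a small Python function that picks an album name from the date a photo was taken. This is a sketch under assumptions: `album_for` is a hypothetical helper. The life-phase start years in `MY_PHASES` are made up for illustration. Photos older than every phase fall back to the catch-all 'before2015' album.

```python
from datetime import date
from typing import Sequence, Tuple


def album_for(taken: date, first_yearly: int = 2015,
              phases: Sequence[Tuple[int, str]] = ()) -> str:
    """Return the backup album name for a photo taken on `taken`.

    Photos from `first_yearly` onwards get one album per year. Older photos go
    into the life-phase album with the latest start year not after the photo,
    or into a catch-all 'before<first_yearly>' album if no phase fits.
    """
    if taken.year >= first_yearly:
        return str(taken.year)
    # phases are (start year, album name) pairs; sorting makes input order irrelevant
    started = [name for start, name in sorted(phases) if start <= taken.year]
    return started[-1] if started else f"before{first_yearly}"


# Hypothetical life phases; replace the years with your own.
MY_PHASES = [(1990, "kid"), (2002, "highschool"), (2008, "uni")]

print(album_for(date(2017, 6, 1), phases=MY_PHASES))  # 2017
print(album_for(date(2009, 6, 1), phases=MY_PHASES))  # uni
print(album_for(date(1985, 6, 1), phases=MY_PHASES))  # before2015
```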
Stateful docker
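If you keep state in your Docker container, you are doing it wrong; please mount volumes instead. But if you are already there, commit the container to a new image and save that image to a tar archive:

```sh
docker commit <container ID> <new image name>
docker save <new image name> > image-name.tar
```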
Example setup
On my own machine (PC), my files are stored on the HDD.
```
                                    ____________
                                   |sep. backup |
                                   |____________|
                                  __.
                                   /|
 _________________________________/
|pc hdd                          /|
|  _____________________________/ |
| |cloud sync                   | |
| |  _________________________  | |
| | |git repo. 01             | | |
| | |  _____________________  | | |
| | | |files                | | | |
| | | | ├── README.md       | | | |
| | | | ├── project_dir     | | | |
| | | | │   ├── file1.txt   | | | |
| | | | │   └── filen.txt   | | | |
| | | | ├── another_dir     | | | |
| | | | │   └── config.yaml | | | |
| | | | └── directory       | | | |
| | | |_____________________| | | |
| | |_________________________| | |
| |                             | |
| |  _________________________  | |
| | |git repo. 02             | | |
| | |  _____________________  | | |
| | | |files                | | | |
| | | |_____________________| | | |
| | |_________________________| | |
| |_____________________________| |
|_________________________________|
```
I have a cloud sync client running (e.g. Dropbox or Google Drive) and treat its folder as the root directory for all my files, so all my files are synced to the cloud as a backup.
Since I use a free cloud provider, it has less incentive to keep my account up, so I keep a separate backup. This second backup is made once a week or month to another free service, which receives an encrypted archive (.zip or .tgz) of my files. This is my disaster recovery backup.
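A minimal Python sketch of the archive step, using only the standard library's `tarfile` module. The function name `make_backup_archive`, the `backup-YYYYMMDD.tgz` naming and the paths are assumptions. Encryption and the upload to the second service are not shown and would need a separate tool, for example GnuPG (`gpg --symmetric`).

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path


def make_backup_archive(source: Path, target_dir: Path, day: date) -> Path:
    """Pack the `source` directory into target_dir/backup-YYYYMMDD.tgz."""
    archive = target_dir / f"backup-{day:%Y%m%d}.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the synced root, e.g. sync/README.md
        tar.add(str(source), arcname=source.name)
    return archive


# Usage on a throwaway tree standing in for the cloud sync root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "sync"
    (root / "project_dir").mkdir(parents=True)
    (root / "project_dir" / "file1.txt").write_text("hello\n")
    archive = make_backup_archive(root, Path(tmp), date(2018, 11, 20))
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getnames()

print(archive.name)  # backup-20181120.tgz
print(members)
```

The date-stamped name follows the same YYYYMMDD convention as the rest of the files, so the archives sort chronologically on the backup service.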
Active projects are more important and also live inside a git repository. These git repositories are backed up to the git service in use.
To avoid keeping old git repositories on my machine, I back them up to Docker Hub.