Ticket #376 (new Improvement)

Opened 3 years ago

Last modified 3 years ago

Hot backup of repository

Reported by: idzelis@…
Owned by: somebody
Priority: Major
Milestone:
Component: Backup
Version: 1.5
Keywords:
Cc:

Description

[jira2trac import : issue created on December 4, 2006 12:47:56 PM CET http://issues.cocoondev.org/browse/DSY-376 ]

We have several distributed teams in different timezones accessing our Daisy document repository. Our repository is large and takes about an hour to back up. During this time, documents can't be created or changed because the repository is locked. Since someone may need access to the repository at any hour, there is no good time to run the backup. The best solution would be a way to back up the repository without locking it for writes.

Change History

comment:1 Changed 3 years ago by paul

[jira2trac import : comment created by bruno on June 18, 2007 9:12:45 AM CEST]

This issue could be solved by a more intelligent lock mode for the blobstore, exploiting the fact that the blobstore never updates blobs: it only adds new blobs and sometimes removes them.

From the point of view of the backup, the important thing is that no data is lost:

  • if new blobs are added during the backup, this is not really a problem. If a backup is restored, a tool could be run to check for, and remove, redundant (= non-referenced) blobs. [to check: would copying the blobstore fail if files are being written concurrently?]
  • blobs which are requested to be deleted during backup could be added to a queue, to be processed after the backup lock is released.
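
The lock mode described above could look roughly like the following minimal Python sketch. It assumes a flat directory of blob files named by key; `BlobStore`, `begin_backup` and `end_backup` are hypothetical names, not Daisy's actual blobstore API. Additions go through during a backup, while deletions are parked in an in-memory queue and replayed when the backup ends.

```python
import threading
from pathlib import Path


class BlobStore:
    """Hypothetical write-once blobstore with a backup-aware lock mode.

    Blobs are never updated in place: they are only added or removed.
    While a backup runs, additions proceed and deletions are queued.
    """

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self._lock = threading.Lock()
        self._backup_running = False
        self._pending_deletes: list[str] = []  # in-memory only: lost on a crash

    def add(self, key: str, data: bytes) -> None:
        # Allowed during a backup: at worst the backup copy gets an extra,
        # unreferenced file that a cleanup tool can remove later.
        (self.root / key).write_bytes(data)

    def delete(self, key: str) -> None:
        with self._lock:
            if self._backup_running:
                # The backup may not have copied this blob yet, so defer it.
                self._pending_deletes.append(key)
            else:
                (self.root / key).unlink(missing_ok=True)

    def begin_backup(self) -> None:
        with self._lock:
            self._backup_running = True

    def end_backup(self) -> None:
        with self._lock:
            self._backup_running = False
            # Replay the deletions requested while the backup was running.
            for key in self._pending_deletes:
                (self.root / key).unlink(missing_ok=True)
            self._pending_deletes.clear()
```

Because the queue lives only in memory, pending deletions are lost if the server dies mid-backup, leaving orphaned files behind; that is the first trade-off weighed below.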

The main problem is how to keep track of this queue:

  • in-memory queue: would be lost if the server is killed unexpectedly. This could be solved with the aforementioned garbage-cleanup tool. [running such tools is extra admin effort/knowledge, so should be avoided if possible]
  • queue stored in the database or in a file in the blobstore: avoids the problem of queue entries being lost, but might be a problem when a backup is restored, since the queue would also be part of the backup, so the state of the queue as it was during the backup would be restored. This could be solved by allowing the blobstore to check with the repository whether a key is still in use. [would introduce a two-way dependency between blobstore and repository, unless we check directly on the DB]

Note about the blobstore cleanup tool: if the repository server is running, care should be taken that the tool doesn't remove blobs for documents that are just being added (= uncommitted DB transactions). This could be solved, e.g., by only considering blobs older than one day.
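
The cleanup tool with the age safeguard could be sketched as follows. This assumes a flat blob directory and that the set of blob keys still referenced can be obtained from the repository database (how to query it is left open). `find_orphan_blobs` and `remove_orphan_blobs` are hypothetical names; the one-day threshold is the example value from the note above.

```python
import time
from pathlib import Path
from typing import Iterable, Optional

ONE_DAY = 24 * 60 * 60  # seconds; example threshold from the comment


def find_orphan_blobs(blob_dir: Path, referenced: Iterable[str],
                      min_age: float = ONE_DAY,
                      now: Optional[float] = None) -> list[Path]:
    """Return blob files that nothing references and that are older than min_age.

    `referenced` is assumed to be the set of blob keys still in use, e.g.
    as read from the repository database.
    """
    now = time.time() if now is None else now
    in_use = set(referenced)
    orphans = []
    for path in Path(blob_dir).iterdir():
        if not path.is_file() or path.name in in_use:
            continue
        # Young blobs may belong to a document whose DB transaction has not
        # committed yet, so they are never treated as orphans.
        if now - path.stat().st_mtime < min_age:
            continue
        orphans.append(path)
    return sorted(orphans)


def remove_orphan_blobs(blob_dir: Path, referenced: Iterable[str],
                        min_age: float = ONE_DAY,
                        dry_run: bool = True) -> list[Path]:
    """Delete (or, with dry_run=True, only list) the orphaned blobs."""
    orphans = find_orphan_blobs(blob_dir, referenced, min_age)
    if not dry_run:
        for path in orphans:
            path.unlink()
    return orphans
```

Defaulting to a dry run lets an administrator review the list before anything is deleted.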

comment:2 Changed 3 years ago by paul

[jira2trac import : comment created by julio.reis on May 22, 2009 12:40:24 PM CEST]

+1

We have about 5,000 documents in Daisy, and the backup already takes from 6:25 to 8:22 am CET: 1 hour 57 minutes. Too long to deny write access!

The time doesn't affect me much, but it does affect my teammates in New Zealand... and if I run the backup earlier, I will affect the Canadians ;-) So fiddling with the backup time won't solve anything. The backup has become a liability, but we cannot simply skip it.

So, please create a backup which won't lock the repository. Please.

comment:3 Changed 3 years ago by paul

[jira2trac import : comment created by idzelis on May 22, 2009 4:42:04 PM CEST]

Our backups were taking about 3-5 hours at this point. We've come up with a pretty nice backup strategy.

This will only work on Linux (or Linux-like) machines. Our Daisy installation lives on an LVM-managed partition. To back up, we use the Daisy API to lock the repository, then create an LVM snapshot of the disk. This "freezes" the state of the disk at that point in time. Then we back up the database and use the Daisy API to unlock the repository. Even with our 20+ GB disk image, we only need to lock the Daisy repository for write access for about 30 seconds! Backing up the LVM snapshot still takes 3-5 hours, but at least Daisy is open for business during that time. After the backup is done, the LVM snapshot is destroyed and the partition is "normal" again.
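
The lock, snapshot, unlock sequence could be scripted along these lines. This is a minimal sketch, assuming that (as the 30-second figure suggests) the lock is released as soon as the snapshot exists and the long copy then runs from the snapshot. The `lock` and `unlock` callables are placeholders for the Daisy backup-lock calls, whose exact API is not shown in this ticket; the volume names, snapshot size and copy command are also placeholders. `lvcreate --snapshot`, `mount`, `umount` and `lvremove` are the standard LVM and util-linux commands.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def snapshot_backup(lock: Callable[[], None], unlock: Callable[[], None],
                    vg: str, lv: str, snap_size: str, mount_point: str,
                    copy_cmd: Sequence[str], runner: Runner = run) -> None:
    """Lock only while the LVM snapshot is created, then copy from the snapshot."""
    snap = f"{lv}-backup-snap"  # hypothetical snapshot name
    snap_dev = f"/dev/{vg}/{snap}"
    lock()  # placeholder for the Daisy backup lock
    try:
        # Creating the snapshot takes seconds: this is the only locked window.
        runner(["lvcreate", "--snapshot", "--size", snap_size,
                "--name", snap, f"/dev/{vg}/{lv}"])
    finally:
        unlock()  # repository is writable again from here on
    try:
        runner(["mount", "-o", "ro", snap_dev, mount_point])
        try:
            runner(list(copy_cmd))  # the slow part, e.g. rsync or tar
        finally:
            runner(["umount", mount_point])
    finally:
        # Drop the snapshot so the volume stops tracking copy-on-write changes.
        runner(["lvremove", "-f", snap_dev])
```

For example, `snapshot_backup(lock, unlock, "vg0", "daisy", "5G", "/mnt/daisy-snap", ["rsync", "-a", "/mnt/daisy-snap/", "/backup/daisy/"])`. The snapshot size has to cover the writes made to the live volume during the copy: if the copy-on-write space fills up, the snapshot becomes invalid. Depending on the filesystem, extra mount options may be needed for the snapshot.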

To perform incremental backups, we've created an rsync script that hard-links the previous 14 days of data. Because they are hard links, unchanged files are stored only once, but appear in every daily backup directory. This allows us to roll the database back to any state it was in during the previous 14 days. The backups are stored on a separate NFS (or SMB?) mounted drive.
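
A rotation like this is commonly built on rsync's `--link-dest` option, which hard-links files that are unchanged relative to an earlier backup directory instead of copying them again. The sketch below assumes one directory per day, named by ISO date, under a backup root; it only builds the rsync command and picks which dated directories fall outside the 14-day window, leaving execution and deletion to the caller. The function names and directory layout are hypothetical, not the commenter's actual script.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Iterable, Optional

KEEP_DAYS = 14  # size of the rollback window described in the comment


def rsync_command(source: str, backup_root: Path, today: date,
                  previous: Optional[date]) -> list[str]:
    """Build the rsync call that creates today's backup directory."""
    cmd = ["rsync", "-a"]
    if previous is not None:
        # Files unchanged since the previous backup become hard links to it,
        # so each daily directory looks complete but shares storage.
        cmd.append(f"--link-dest={backup_root / previous.isoformat()}")
    dest = backup_root / today.isoformat()
    # Trailing slashes: copy the contents of source into dest.
    cmd += [source.rstrip("/") + "/", f"{dest}/"]
    return cmd


def backups_to_prune(existing: Iterable[date], today: date,
                     keep_days: int = KEEP_DAYS) -> list[date]:
    """Return the dated backups that fall outside the last keep_days days."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in existing if d <= cutoff)
```

Deleting a pruned directory only removes one set of links: a file still linked from a newer backup keeps its data. Passing an absolute path to `--link-dest` avoids surprises, since rsync resolves a relative one against the destination directory.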

comment:4 Changed 3 years ago by paul

[jira2trac import : comment created by karel on March 10, 2010 5:12:09 PM CET]

Another company using Daisy told us they use a backup lock combined with a proprietary storage system (HP EVA) for backups.

Another solution (similar to Min Idzelis's) would be to have two systems: a 'master' system and a 'backup' system. The backup system would receive data via MySQL replication, plus an rsync cron job (or something similar) for the blobstore and indexstore. Taking a backup would require these steps:

  • take a backup lock on the master
  • wait until you are sure MySQL slave replication is not lagging (maatkit tools can help here)
  • wait for a last rsync of the blobstore to complete
  • stop replication (and rsync)
  • unlock the master

At this point you can continue editing on the master repository and take as much time as needed to complete the backup. When you are done, resume MySQL replication and the rsync'ing of the blobstore and indexstore.
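
The five steps above could be orchestrated roughly as follows. This is a sketch only: every callable is a hypothetical hook supplied by the caller (for example, `stop_replication` might issue MySQL's `STOP SLAVE` and disable the rsync cron job), and how `slave_caught_up` decides that replication is not lagging, e.g. with one of the maatkit tools, is left open. The `finally` clause makes sure the master is unlocked even if the slave never catches up.

```python
import time
from typing import Callable


def freeze_backup_slave(
    lock_master: Callable[[], None],
    unlock_master: Callable[[], None],
    slave_caught_up: Callable[[], bool],
    final_rsync: Callable[[], None],
    stop_replication: Callable[[], None],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Bring the backup system to a consistent, frozen state (hypothetical hooks)."""
    lock_master()  # 1. backup lock on the master
    try:
        waited = 0.0
        # 2. wait until MySQL replication has caught up with the master
        while not slave_caught_up():
            if waited >= timeout:
                raise TimeoutError("replication did not catch up in time")
            sleep(poll_interval)
            waited += poll_interval
        final_rsync()       # 3. last blobstore/indexstore sync
        stop_replication()  # 4. stop MySQL replication and the rsync job
    finally:
        unlock_master()     # 5. master is writable again
```

After the slave has been backed up at leisure, replication (`START SLAVE`) and the rsync job would be resumed.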
