Ticket #20 (closed defect: fixed)
Record/FieldType create: reliably detect if record already exists
| Reported by: | bruno | Owned by: | evert |
|---|---|---|---|
| Priority: | major | Milestone: | 0.2 |
| Component: | Repository | Version: | |
| Keywords: | Cc: | lily-developers@… |
Description
HBase does not make a distinction between "create" and "update", you just "put" and it does not care if the record existed previously. Lily however does make this distinction.
Currently the implementation of record-create checks if the row exists and if so throws RecordExistsException?, otherwise it goes on to create the row by doing a Put.
However, there is a concurrency problem since someone else might do the same check at the same time, also come to conclusion that the row does not exist, and also do the put. Both clients will receive the answer that their "create" succeeded (I assume, not verified).
So I started thinking about a reliable way to do this. Turns out that you can actually take a lock on a row that does not exist. In fact, when doing an operation like put/get HBase internally always first takes a lock on the row (if the client did not specify one), ignoring if the row actually exists.
The below testcase illustrates this scenario.
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.lilycms.hbaseindex.IndexManager;
import org.lilycms.testfw.TestHelper;
public class CreateTest {
private final static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
@BeforeClass
public static void setUpBeforeClass() throws Exception {
TestHelper.setupLogging();
TEST_UTIL.startMiniCluster(1);
}
@AfterClass
public static void tearDownAfterClass() throws Exception {
TEST_UTIL.shutdownMiniCluster();
}
@Test
public void test() throws Exception {
HBaseAdmin hbaseAdmin = new HBaseAdmin(TEST_UTIL.getConfiguration());
HTableDescriptor table = new HTableDescriptor("CreateTestTable");
HColumnDescriptor family = new HColumnDescriptor("Family");
table.addFamily(family);
hbaseAdmin.createTable(table);
HTable htable = new HTable(TEST_UTIL.getConfiguration(), "CreateTestTable");
byte[] rowkey = Bytes.toBytes("fookey");
RowLock rowLock = htable.lockRow(rowkey);
try {
// Check if the record exists
Get get = new Get(rowkey, rowLock);
Result result = htable.get(get);
if (!result.isEmpty()) {
throw new RuntimeException("row with this key already exists.");
}
// Row does not exist yet, create it
Put put = new Put(rowkey, rowLock);
put.add(Bytes.toBytes("Family"), Bytes.toBytes("Qualifier"), Bytes.toBytes("Value"));
htable.put(put);
} finally {
htable.unlockRow(rowLock);
}
}
}
Change History
comment:2 Changed 3 years ago by bruno
Here's pretty much the same message again:
http://markmail.org/message/cqthwqedqvxmqnun
Also:
http://markmail.org/message/laxch6i67j6f6op6
Or from the start of the thread:
http://markmail.org/message/lxsw3c4raq2znbgx
Need to look at this when I have some more time...
comment:3 Changed 3 years ago by bruno
Just adding after talking with mpo: this could also be solved by sending the logic to the hbase server (mobile code style, or maybe stored-procedure style where the custom code already sits in the server)
comment:4 Changed 3 years ago by bruno
Here is a solution for the create scenario:
http://markmail.org/message/hxlo6n56adt6pqtg
The same trick might also be usable for the update scenario: read current situation, put with condition that the value of the counter field is still the same as when read (and augment it as part of the put, of course).
comment:7 Changed 3 years ago by evert
- Status changed from new to closed
- Resolution set to fixed
r4202 and 4203
<p>Reliable Record creates and updates are covered by taking rowlocks. This is needed since next to putting the row, we also have to update the wall before releasing the rowlock.
<p>RecordType? and FieldType? creates and updates have to check that the name they want to create or update to doesn't already exist.
A concurrentCounter is taken (through incrementColumnValue) on a row with as key the name of the type. Then it is checked if the name is already in use. This must happen after taking the counter, otherwise a concurrent create or update can 'jump in between'. A checkAndPut is then done on this counter to make sure it didn't change in between. Otherwise this means a concurrent create or update happened.
comment:8 Changed 3 years ago by evert
Avoid using duplicate UUID's upon creation of a RecordType? or FieldType?
(what a coincidence this just comes up on the hbase list) This seems to suggest this is not a good solution:
http://markmail.org/message/nfzgzv32eg36gkdi
Quote:
If I'm guessing right: