[leveldb] Log file in blocks

Discussion:

Lucas Lersch

2016-04-08 14:51:35 UTC

Hi,

this is probably a basic question, but the documentation says: "The log
file contents are a sequence of 32KB blocks. The only exception is that
the tail of the file may contain a partial block". Why exactly is it
organized as 32KB blocks? In other words, why is the block organization
useful? Can't I just append log entries in the following format?

entry :=
    checksum: uint32 // crc32c of type and data[] ; little-endian
    sequence: fixed64
    count: fixed32
    data: record[count]

record := kTypeValue varstring varstring | kTypeDeletion varstring

varstring :=
    len: varint32
    data: uint8[len]

Best regards.
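
For context, here is a rough sketch of the block layout the quoted documentation describes. It assumes the physical record header from leveldb's log format document (4-byte checksum, 2-byte length, 1-byte type, so 7 bytes) and the FULL/FIRST/MIDDLE/LAST fragment types. PlanFragments and Fragment are hypothetical names used only for illustration; the sketch plans fragment boundaries and does not write data or compute checksums.

```cpp
#include <cstddef>
#include <vector>

// Constants as described in leveldb's log format document.
constexpr std::size_t kBlockSize = 32768;  // 32KB
constexpr std::size_t kHeaderSize = 7;     // checksum(4) + length(2) + type(1)

enum RecordType { kFullType = 1, kFirstType = 2, kMiddleType = 3, kLastType = 4 };

// Hypothetical description of one physical fragment.
struct Fragment {
  RecordType type;
  std::size_t offset;  // file offset of the fragment header
  std::size_t length;  // payload bytes carried by this fragment
};

// Hypothetical helper: plan how a payload of payload_len bytes, appended at
// file_offset, is split so that no fragment crosses a block boundary.
std::vector<Fragment> PlanFragments(std::size_t file_offset,
                                    std::size_t payload_len) {
  std::vector<Fragment> out;
  std::size_t left = payload_len;
  bool begin = true;
  do {
    std::size_t leftover = kBlockSize - file_offset % kBlockSize;
    if (leftover < kHeaderSize) {
      // Too small for a header: the rest of the block is a zero-filled
      // trailer, and the record starts at the next block boundary.
      file_offset += leftover;
      leftover = kBlockSize;
    }
    const std::size_t avail = leftover - kHeaderSize;
    const std::size_t frag = left < avail ? left : avail;
    const bool end = (frag == left);
    RecordType type;
    if (begin && end) {
      type = kFullType;
    } else if (begin) {
      type = kFirstType;
    } else if (end) {
      type = kLastType;
    } else {
      type = kMiddleType;
    }
    out.push_back({type, file_offset, frag});
    file_offset += kHeaderSize + frag;
    left -= frag;
    begin = false;
  } while (left > 0);
  return out;
}
```

Under these assumptions, a 40000-byte payload appended at offset 0 becomes a FIRST fragment of 32761 bytes followed by a LAST fragment of 7239 bytes that starts at offset 32768, so every block begins with a record header.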

--
You received this message because you are subscribed to the Google Groups "leveldb" group.
To unsubscribe from this group and stop receiving emails from it, send an email to leveldb+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Escriva

2016-04-08 14:53:21 UTC

The block format means that corruption early in a file does not damage
the entire file. You can simply seek forward 32KB at a time until you
find a valid place to resume parsing.

-Robert
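
A minimal sketch of that resynchronisation, under simplifying assumptions: records are never fragmented (the writer pads to the next block instead), the payload fits in one block, and ToyChecksum is an FNV-1a stand-in for crc32c. AppendRecord, ReadAll and ReadResult are hypothetical names; this is not leveldb's reader.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kBlockSize = 32768;  // 32KB blocks
constexpr std::size_t kHeaderSize = 7;     // checksum(4) + length(2) + type(1)

// Stand-in for crc32c (FNV-1a), used only to keep the sketch self-contained.
std::uint32_t ToyChecksum(const std::string& data) {
  std::uint32_t h = 2166136261u;
  for (unsigned char c : data) { h ^= c; h *= 16777619u; }
  return h;
}

// Appends one unfragmented record. Simplification: if the record does not
// fit in the current block, zero-pad to the next block boundary.
void AppendRecord(std::string* log, const std::string& data) {
  const std::size_t left = kBlockSize - log->size() % kBlockSize;
  if (left < kHeaderSize + data.size()) log->append(left, '\0');
  const std::uint32_t sum = ToyChecksum(data);
  for (int i = 0; i < 4; ++i) {
    log->push_back(static_cast<char>((sum >> (8 * i)) & 0xff));
  }
  log->push_back(static_cast<char>(data.size() & 0xff));
  log->push_back(static_cast<char>((data.size() >> 8) & 0xff));
  log->push_back('\x01');  // type: full record
  log->append(data);
}

struct ReadResult {
  std::vector<std::string> records;
  int skipped_blocks = 0;  // blocks abandoned after a bad record
};

// Reads every record it can; on a bad record it drops the rest of that
// block and resumes at the next 32KB boundary.
ReadResult ReadAll(const std::string& log) {
  ReadResult r;
  std::size_t pos = 0;
  while (pos + kHeaderSize <= log.size()) {
    const std::size_t block_end = (pos / kBlockSize + 1) * kBlockSize;
    if (block_end - pos < kHeaderSize) { pos = block_end; continue; }
    auto byte = [&](std::size_t i) {
      return static_cast<std::uint32_t>(
          static_cast<unsigned char>(log[pos + i]));
    };
    const std::uint32_t sum =
        byte(0) | byte(1) << 8 | byte(2) << 16 | byte(3) << 24;
    const std::size_t len = byte(4) | byte(5) << 8;
    const std::uint32_t type = byte(6);
    if (type == 0 && len == 0) { pos = block_end; continue; }  // zero padding
    const std::size_t data_end = pos + kHeaderSize + len;
    const bool ok = type == 1 && data_end <= block_end &&
                    data_end <= log.size() &&
                    ToyChecksum(log.substr(pos + kHeaderSize, len)) == sum;
    if (!ok) { ++r.skipped_blocks; pos = block_end; continue; }
    r.records.push_back(log.substr(pos + kHeaderSize, len));
    pos = data_end;
  }
  return r;
}
```

In this model, damaging the first record of a block also loses the intact records after it in the same block, but records in later blocks are still read. In a plain stream of length-prefixed entries, a damaged length field would leave the reader with no known offset at which to resume.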

Lucas Lersch

2016-04-08 15:09:29 UTC

Thanks for the answer. I get it. But suppose you have a system failure and
need to rebuild from the log file: if there is a corruption early in
the file and you just seek forward to the next block, you lose all the
updates in the first block. In other words, why is a corruption in
the log file not treated as something critical? Why can you just ignore it
and keep going?

Dhruba Borthakur

2016-04-09 08:05:35 UTC

This exact problem caused us some pain earlier too. In RocksDB we made this
default behaviour of leveldb more flexible:

https://github.com/facebook/rocksdb/blob/master/include/rocksdb/options.h#L102

There were some use-cases that were ok with the default leveldb recovery
mode (which skips over corruptions in the transaction log), but there were
other use-cases that needed the database open to fail even if there is a
single corruption in the transaction log.

enum class WALRecoveryMode : char {
  // Original levelDB recovery
  // We tolerate incomplete record in trailing data on all logs
  // Use case : This is legacy behavior (default)
  kTolerateCorruptedTailRecords = 0x00,
  // Recover from clean shutdown
  // We don't expect to find any corruption in the WAL
  // Use case : This is ideal for unit tests and rare applications that
  // can require high consistency guarantee
  kAbsoluteConsistency = 0x01,
  // Recover to point-in-time consistency
  // We stop the WAL playback on discovering WAL inconsistency
  // Use case : Ideal for systems that have disk controller cache like
  // hard disk, SSD without super capacitor that store related data
  kPointInTimeRecovery = 0x02,
  // Recovery after a disaster
  // We ignore any corruption in the WAL and try to salvage as much data as
  // possible
  // Use case : Ideal for last ditch effort to recover data or systems that
  // operate with low grade unrelated data
  kSkipAnyCorruptedRecords = 0x03,
};
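
To make the difference between the four modes concrete, here is a toy model of how each one might react to a corrupted record during replay. It follows my reading of the comments above and is not RocksDB's replay code: LogRecord, ReplayResult and Replay are hypothetical names, and corruption is reduced to a per-record flag. In particular, treating mid-log corruption as a failure in kTolerateCorruptedTailRecords is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Enum values copied from the snippet above.
enum class WALRecoveryMode : char {
  kTolerateCorruptedTailRecords = 0x00,
  kAbsoluteConsistency = 0x01,
  kPointInTimeRecovery = 0x02,
  kSkipAnyCorruptedRecords = 0x03,
};

// Hypothetical stand-in for one WAL record; 'corrupt' models a failed
// checksum or an incomplete write.
struct LogRecord {
  std::string payload;
  bool corrupt;
};

struct ReplayResult {
  bool ok;                           // false: opening the database fails
  std::vector<std::string> applied;  // payloads replayed, in order
};

ReplayResult Replay(const std::vector<LogRecord>& log, WALRecoveryMode mode) {
  std::vector<std::string> applied;
  for (std::size_t i = 0; i < log.size(); ++i) {
    if (!log[i].corrupt) {
      applied.push_back(log[i].payload);
      continue;
    }
    switch (mode) {
      case WALRecoveryMode::kAbsoluteConsistency:
        // Any corruption at all makes the open fail.
        return ReplayResult{false, {}};
      case WALRecoveryMode::kPointInTimeRecovery:
        // Stop at the first inconsistency; keep what came before it.
        return ReplayResult{true, applied};
      case WALRecoveryMode::kSkipAnyCorruptedRecords:
        // Drop the bad record and keep salvaging.
        break;
      case WALRecoveryMode::kTolerateCorruptedTailRecords:
        // Assumption: tolerated only when nothing valid follows.
        for (std::size_t j = i + 1; j < log.size(); ++j) {
          if (!log[j].corrupt) return ReplayResult{false, {}};
        }
        return ReplayResult{true, applied};
    }
  }
  return ReplayResult{true, applied};
}
```

In RocksDB itself the mode is selected through the wal_recovery_mode field of the options struct defined in the header linked above.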

Lucas Lersch

2016-04-11 13:00:14 UTC

Thanks, that was very illuminating. I am taking a look at both the leveldb and
rocksdb source code. Unfortunately, I do not have a Facebook account to
participate in the rocksdb discussion group. Anyway, it is cool to see that you
guys are still active and improving the code :)

MARK CALLAGHAN

2016-04-11 13:23:03 UTC

We are happy to discuss RocksDB via email at
https://groups.google.com/forum/#!forum/rocksdb
