Discussion:
[leveldb] Some ldb files not getting garbage collected
And Or
2017-02-17 03:30:26 UTC
In our production code, we use LevelDB to store protobuf messages in
key-value format.

I am seeing that the on-disk size of the database does not match the
amount of data we insert into LevelDB. For example, I have a test with the
key size set to 45 bytes, the value size set to 3750 bytes, and a total of
5600 records. All keys are unique. Also, for every record insertion, we add
a special record with key "txn_identifier" and a value that is a simple
integer of a few bytes. For this experiment, the total index size should
come to around 21 MB, but I am seeing an on-disk size of 76 MB.
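The 21 MB estimate can be checked with a few lines of arithmetic. This is only a sketch: it counts raw key and value bytes for the unique records and ignores LevelDB's per-entry and per-block overhead, compression, and the small "txn_identifier" record (which is overwritten each time, so only one live copy should remain). The function name is made up for illustration.

```python
KEY_BYTES = 45
VALUE_BYTES = 3750
RECORDS = 5600
OBSERVED_BYTES = 76e6  # on-disk size reported in the post, taken as 76 MB


def expected_payload_bytes(records=RECORDS, key_bytes=KEY_BYTES,
                           value_bytes=VALUE_BYTES):
    """Raw key+value bytes for the unique records (no storage overhead)."""
    return records * (key_bytes + value_bytes)


if __name__ == "__main__":
    payload = expected_payload_bytes()
    print(f"expected payload: {payload} bytes ({payload / 1e6:.2f} MB)")
    print(f"observed / expected: {OBSERVED_BYTES / payload:.1f}x")
```

The observed size is roughly three and a half times the payload, which is too large to be explained by format overhead alone.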

Upon restart of our program the index on-disk size drops to 21 MB. This
restart essentially amounts to a re-open of the index. After restart some
files in the index directory are disappearing. My guess is that for some
reason compacted ldb files are still being kept around, but a restart of
the index is deleting them.

Has anyone seen this behaviour? Any help is appreciated.
--
You received this message because you are subscribed to the Google Groups "leveldb" group.
To unsubscribe from this group and stop receiving emails from it, send an email to leveldb+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Florian Weimer
2017-02-17 08:11:34 UTC
Post by And Or
Upon restart of our program the index on-disk size drops to 21 MB. This
restart essentially amounts to a re-open of the index. After restart some
files in the index directory are disappearing. My guess is that for some
reason compacted ldb files are still being kept around, but a restart of
the index is deleting them.
Do you keep iterators around which hold onto a snapshot?
And Or
2017-02-17 17:54:37 UTC
I do have iterators which use a snapshot. Will take a closer look at it
after reaching work today and post if I find anything wrong in the snapshot
deletion.
And Or
2017-02-18 01:35:32 UTC
Thanks, Florian. Yes, I had an iterator which was holding onto a snapshot.
After some reworking and removal of both the iterator and the snapshot, the
disk-size issue vanished.

From what I can understand from my tests, just having a snapshot around
does not cause any issues. Holding onto an iterator for a long period of
time is what causes the issue.
Robert Escriva
2017-02-18 15:06:25 UTC
Keeping a snapshot around and keeping an iterator around will both cause
data growth, but in different ways.

When you Put(k, v) into leveldb, it is stored as the tuple (k, t, v),
where t is the time of the write according to an internal sequence
number. A snapshot picks the highest sequence number at the time of the
snapshot.

During compaction, every version of a key with a sequence number greater
than that of the earliest snapshot (or of the latest write if there are no
snapshots) will be kept in the database, along with the newest version at
or before that point. If you hold a snapshot and perform many overwrites,
you will be able to watch your data grow in size. If you only perform
simple writes where no object previously existed, you won't notice any
inflation.
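The effect of a held snapshot on overwrites can be illustrated with a toy model. This is not LevelDB's code. It assumes one simplified rule: during compaction, an entry is dropped once a newer entry for the same key has a sequence number at or below the oldest snapshot (or the latest write if there is no snapshot). Deletion markers and levels are left out, and `compact` is a hypothetical name.

```python
def compact(entries, smallest_snapshot):
    """Return the entries that survive a compaction in this toy model.

    entries: iterable of (user_key, sequence, value) tuples.
    smallest_snapshot: sequence number of the oldest live snapshot, or the
        latest written sequence number if no snapshot is held.
    """
    survivors = []
    # Visit each key's versions from newest to oldest.
    ordered = sorted(entries, key=lambda e: (e[0], -e[1]))
    last_key = None
    newer_seq = None  # sequence of the next-newer version of the same key
    for key, seq, value in ordered:
        if key != last_key:
            last_key = key
            newer_seq = None
        # Hidden: a newer version is already visible to the oldest snapshot,
        # so no reader can ever ask for this one.
        hidden = newer_seq is not None and newer_seq <= smallest_snapshot
        if not hidden:
            survivors.append((key, seq, value))
        newer_seq = seq
    return survivors


if __name__ == "__main__":
    overwrites = [("k", s, f"v{s}") for s in range(1, 6)]
    print(len(compact(overwrites, smallest_snapshot=5)))  # no snapshot held
    print(len(compact(overwrites, smallest_snapshot=2)))  # snapshot at seq 2
```

With no snapshot, five overwrites of one key collapse to a single entry. With a snapshot taken at sequence 2, four of the five versions survive. Writes to unique keys are unaffected either way, since there is never a newer version to hide them.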

An iterator inherently has an underlying snapshot whether or not you
created one. This means you'll see growth as above, plus the growth you
noticed in your original email. During compaction, a set of input files
is rewritten as a set of output files. Normally, these input files are
deleted after compaction, but an iterator stops that: it prevents
compaction from deleting any file that existed at the time the iterator
was created. This means you'll watch your data grow in size for both
writes and overwrites.
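The file-pinning behaviour can be pictured with another toy model. It assumes a simplified picture in which each open iterator pins the set of files that was current when it was created, and obsolete files are only removed after a compaction or on reopen. `ToyFileSet` and its methods are invented for illustration and are not LevelDB's API.

```python
class ToyFileSet:
    """Toy model of table files pinned by open iterators."""

    def __init__(self, files):
        self.current = frozenset(files)  # files in the current version
        self.pinned = []                 # versions held by open iterators
        self.on_disk = set(files)

    def open_iterator(self):
        # An iterator pins the version that is current when it is created.
        handle = self.current
        self.pinned.append(handle)
        return handle

    def close_iterator(self, handle):
        # Releasing the pin does not delete anything by itself in this model.
        self.pinned.remove(handle)

    def compact(self, inputs, outputs):
        # Input files are rewritten as output files.
        self.current = (self.current - set(inputs)) | set(outputs)
        self.on_disk |= set(outputs)
        self._delete_obsolete()

    def reopen(self):
        # A restart drops every iterator, so nothing stays pinned.
        self.pinned.clear()
        self._delete_obsolete()

    def _delete_obsolete(self):
        live = set(self.current)
        for version in self.pinned:
            live |= version
        self.on_disk &= live


if __name__ == "__main__":
    fs = ToyFileSet(["1.ldb", "2.ldb"])
    fs.open_iterator()  # long-lived iterator, never closed
    fs.compact(["1.ldb", "2.ldb"], ["3.ldb"])
    print(sorted(fs.on_disk))  # inputs are still on disk
    fs.reopen()
    print(sorted(fs.on_disk))  # only the compaction output remains
```

In this model the compacted inputs stay on disk for as long as the iterator is open and disappear on reopen, which matches the symptom described at the start of the thread.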

A good practice to adopt is to restrict usage of iterators to local
scope, deleting them immediately after you are done with them. You can
use snapshots to hold the same data for longer periods of time, and
recreate the iterator by creating it from the snapshot and seeking to
the key where it previously was. This won't stop bloat, but will
restrict the bloat to be proportional to the keys that you overwrite.
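The pattern can be sketched without any real LevelDB binding. Below, `ToySnapshot` is a stand-in for a held snapshot (a frozen, sorted view of string keys), and `scan_in_batches` creates a fresh, short-lived iterator for each batch, seeking to just past the last key it returned. Both names are hypothetical, and the seek key relies on the assumption that keys are plain strings, so that appending "\x00" gives the smallest strictly greater key.

```python
import bisect
from itertools import islice


class ToySnapshot:
    """Frozen, sorted view of a key-value mapping (stand-in for a snapshot)."""

    def __init__(self, items):
        self._items = sorted(items.items())
        self._keys = [key for key, _ in self._items]

    def new_iterator(self, seek_key):
        # Position at the first key >= seek_key, like creating an iterator
        # from the snapshot and seeking it.
        start = bisect.bisect_left(self._keys, seek_key)
        return iter(self._items[start:])


def scan_in_batches(snapshot, batch_size):
    """Yield batches of (key, value), using one short-lived iterator each."""
    seek_key = ""  # the empty string sorts before every other string key
    while True:
        it = snapshot.new_iterator(seek_key)
        batch = list(islice(it, batch_size))
        del it  # drop the iterator as soon as the batch has been read
        if not batch:
            return
        yield batch
        # Resume strictly after the last key returned.
        seek_key = batch[-1][0] + "\x00"


if __name__ == "__main__":
    snap = ToySnapshot({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
    for batch in scan_in_batches(snap, 2):
        print(batch)
```

Between batches only the snapshot is held, so in terms of the advice above, the cost is limited to retained overwrites rather than to every file that compaction would otherwise delete.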

-Robert