Lucas Lersch
2016-07-05 15:13:48 UTC
Hi,
While looking at the code of db_bench.cc, and at how it runs the experiments
and calculates the latency of operations, I came up with some questions.
First, most of the default experiments use Write() in async mode, meaning
that the latency of writing a single key-value pair is simply the time to
push the write to the OS. However, if I want to offer full durability
guarantees, every Write() must be done synchronously, which has poor
performance.
I can still do every Write() in sync mode, but batch many Put requests (say
1000) into a WriteBatch (similar to what the fillbatch benchmark does). By
doing so, I control when a Put becomes durable.
While grouping many Puts into a single Write() increases the throughput of
the whole system, the latency of a single Put must take the whole latency
of the Write() into account.
In other words: latency of any Put = time to insert into the batch
(minimal) + time to flush the whole batch.
Only after the whole batch is written do I have a guarantee that a Put is
durable.
However, db_bench calculates latency as total_time / #operations, which
completely ignores the fact that some of the Put requests had to wait for
the batch to fill before being confirmed.
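The gap can be shown with a toy calculation. The numbers below are assumed for illustration (not measured from leveldb): b Puts per batch, a per-insert cost t_ins, and a sync flush cost t_flush, all in microseconds.

```cpp
#include <utility>

// Returns {micros_per_op, mean_durability_latency} for a batch of b Puts,
// given assumed per-insert cost t_ins and batch flush cost t_flush.
// All inputs are hypothetical illustration values, in microseconds.
std::pair<double, double> BatchLatency(double b, double t_ins,
                                       double t_flush) {
  // db_bench-style figure: total elapsed time divided by #operations.
  double micros_per_op = (b * t_ins + t_flush) / b;

  // Put i (issued at time i*t_ins) becomes durable only when the whole
  // batch is flushed at b*t_ins + t_flush, so it waits
  // (b - i)*t_ins + t_flush. Averaging over i = 0..b-1 gives:
  double mean_latency = t_flush + t_ins * (b + 1.0) / 2.0;

  return {micros_per_op, mean_latency};
}
```

With b = 1000, t_ins = 1 us, and t_flush = 10000 us, micros/op comes out to 11 us, while the mean time to durability is about 10500 us: the reported figure tracks throughput, not the wait each Put actually experiences.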
Therefore, it seems that the reported values (micros/op) are simply
1/throughput, not the latency. Is this assumption right? If so, is there
any way to get the real latency?
Best regards,
Lucas
--
You received this message because you are subscribed to the Google Groups "leveldb" group.
To unsubscribe from this group and stop receiving emails from it, send an email to leveldb+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.