DB读Get-leveldb源码剖析(11)

读操作基于user_key可以找到当前最大SequenceNumber的数据,也支持获取指定快照的数据,本篇将分析具体调用流程及详细实现。

DB读即DBImpl::Get的具体实现如下:

Status DBImpl::Get(const ReadOptions& options,
                   const Slice& key,
                   std::string* value) {
  Status s;
  MutexLock l(&mutex_);
  SequenceNumber snapshot;
  if (options.snapshot != NULL) {
    snapshot = reinterpret_cast(options.snapshot)->number_;
  } else {
    snapshot = versions_->LastSequence();
  }

  MemTable* mem = mem_;
  MemTable* imm = imm_;
  Version* current = versions_->current();
  mem->Ref();
  if (imm != NULL) imm->Ref();
  current->Ref();

  bool have_stat_update = false;
  Version::GetStats stats;

  // Unlock while reading from files and memtables
  {
    mutex_.Unlock();
    // First look in the memtable, then in the immutable memtable (if any).
    LookupKey lkey(key, snapshot);
    if (mem->Get(lkey, value, &s)) {
      // Done
    } else if (imm != NULL && imm->Get(lkey, value, &s)) {
      // Done
    } else {
      s = current->Get(options, lkey, value, &stats);
      have_stat_update = true;
    }
    mutex_.Lock();
  }

  if (have_stat_update && current->UpdateStats(stats)) {
    MaybeScheduleCompaction();
  }
  mem->Unref();
  if (imm != NULL) imm->Unref();
  current->Unref();
  return s;
}

首先检查是否有指定快照数据读取的需求,若没有就传入版本集合最后的SequenceNumber,能够确保正确读取到最新的数据。

然后获取当前版本下的memtable和immutable memtable,使用两个局部变量且分别Ref确保操作的安全性,主要有两点考量:
1. DB写Put操作时会调用MakeRoomForWrite检查mem若写满及时将mem置为imm并创建新的memtable。
2. 将imm做compaction dump到磁盘完成后会对imm进行Unref并置空。
继而开始真正的查找工作。

先做了一次unlock解锁,从Memtable查找数据时是允许同时写数据的,SSTable文件本身是只读的,所以对于并发的多读单写这里是需要解锁而且没有问题。
而对于后续的compaction检查则在多线程下是需要lock加锁的,要确保创建compaction任务和后台只有一个compaction work在执行的安全。

从Memtable查找数据是比较简单的,在之前的篇章也有过分析,这里着重看mem和imm均未命中查询时从SSTable的逐层查找过程:

Status Version::Get(const ReadOptions& options,
                    const LookupKey& k,
                    std::string* value,
                    GetStats* stats) {
  Slice ikey = k.internal_key();
  Slice user_key = k.user_key();
  const Comparator* ucmp = vset_->icmp_.user_comparator();
  Status s;

  stats->seek_file = NULL;
  stats->seek_file_level = -1;
  FileMetaData* last_file_read = NULL;
  int last_file_read_level = -1;

  // We can search level-by-level since entries never hop across
  // levels.  Therefore we are guaranteed that if we find data
  // in an smaller level, later levels are irrelevant.
  std::vector<FileMetaData*> tmp;
  FileMetaData* tmp2;
  for (int level = 0; level < config::kNumLevels; level++) {
    size_t num_files = files_[level].size();
    if (num_files == 0) continue;

    // Get the list of files to search in this level
    FileMetaData* const* files = &files_[level][0];
    if (level == 0) {
      // Level-0 files may overlap each other.  Find all files that
      // overlap user_key and process them in order from newest to oldest.
      tmp.reserve(num_files);
      for (uint32_t i = 0; i < num_files; i++) {
        FileMetaData* f = files[i];
        if (ucmp->Compare(user_key, f->smallest.user_key()) >= 0 &&
            ucmp->Compare(user_key, f->largest.user_key()) <= 0) {
          tmp.push_back(f);
        }
      }
      if (tmp.empty()) continue;

      std::sort(tmp.begin(), tmp.end(), NewestFirst);
      files = &tmp[0];
      num_files = tmp.size();
    } else {
      // Binary search to find earliest index whose largest key >= ikey.
      uint32_t index = FindFile(vset_->icmp_, files_[level], ikey);
      if (index >= num_files) {
        files = NULL;
        num_files = 0;
      } else {
        tmp2 = files[index];
        if (ucmp->Compare(user_key, tmp2->smallest.user_key()) < 0) {
          // All of "tmp2" is past any data for user_key
          files = NULL;
          num_files = 0;
        } else {
          files = &tmp2;
          num_files = 1;
        }
      }
    }

    for (uint32_t i = 0; i < num_files; ++i) {
      if (last_file_read != NULL && stats->seek_file == NULL) {
        // We have had more than one seek for this read.  Charge the 1st file.
        stats->seek_file = last_file_read;
        stats->seek_file_level = last_file_read_level;
      }

      FileMetaData* f = files[i];
      last_file_read = f;
      last_file_read_level = level;

      Saver saver;
      saver.state = kNotFound;
      saver.ucmp = ucmp;
      saver.user_key = user_key;
      saver.value = value;
      s = vset_->table_cache_->Get(options, f->number, f->file_size,
                                   ikey, &saver, SaveValue);
      if (!s.ok()) {
        return s;
      }
      switch (saver.state) {
        case kNotFound:
          break;      // Keep searching in other files
        case kFound:
          return s;
        case kDeleted:
          s = Status::NotFound(Slice());  // Use empty error message for speed
          return s;
        case kCorrupt:
          s = Status::Corruption("corrupted key for ", user_key);
          return s;
      }
    }
  }

  return Status::NotFound(Slice());  // Use an empty error message for speed
}

因第0 level的sstable数据range可能有重叠,需特别处理,从第0 level的文件里依次遍历通过比较key range选取出可能包含user_key的文件作为待查集合,并按照文件编号新旧排序。
对于非0 level的sstable数据没有重叠区间,则通过FindFile以internal_key对本身有序的这些文件区间进行二分查找确定可能包含待查key的sstable,至多有一个。

确定某层下的待查数据集合后,借助TableCache进行遍历查找,其中TableCache的查找过程在前面的文章已有分析。查找的过程中同时更新allowed_seeks,这个在Compaction上篇已做分析,不再赘述。之后对数据进行校验。

读操作最后会基于对SSTable的文件查找及allowed_seeks更新操作,主动调用MaybeScheduleCompaction做Compaction检查。

基于上面的分析看,读操作还是比较简单的。

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注