type
status
date
slug
summary
tags
category
icon
password

任务目标

Putting substrings in sequence

  • In this lab you’ll write the data structure that will be responsible for this reassembly: a Reassembler. It will receive substrings, consisting of a string of bytes, and the index of the first byte of that string within the larger stream. Each byte of the stream has its own unique index, starting from zero and counting upwards. As soon as the Reassembler knows the next byte of the stream, it will write it to the Writer side of a ByteStream— the same ByteStream you implemented in checkpoint 0. The Reassembler’s “customer” can read from the Reader side of the same ByteStream.
  • 接口:
    • The full (public) interface of the reassembler is described by the Reassembler class in the reassembler.hh header. Your task is to implement this class. You may add any private members and member functions you desire to the Reassembler class, but you cannot change its public interface.
    • 总而言之就是TCP发送方会把要发送的数据分成许多字串装在datagram里面,但传输过程中可能出现丢失、乱序、重复等情况,需要根据这些字串的索引将它们整合并且实时写出(pop)
      • 🔥
        The Reassembler's job is to reassemble the indexed substrings (possibly out-of-order and possibly overlapping) back into the original ByteStream. As soon as the Reassembler learns the next byte in the stream, it should write it to the output.

    What should the Reassembler store internally?

    • The insert method informs the Reassembler about a new excerpt of the ByteStream, and where it fits in the overall stream (the index of the beginning of the substring).
    • 处理三类东西:
      • 下一段bytes,一旦整合马上读出
      • Bytes that fit within the stream’s available capacity but can’t yet be written, because earlier bytes remain unknown. These should be stored internally in the Reassembler.
      • Bytes that lie beyond the stream’s available capacity. These should be discarded(丢弃). The Reassembler’s will not store any bytes that can’t be pushed to the ByteStream either immediately, or as soon as earlier bytes become known.
    • 红色和绿色的部分都不能超过“capacity”
      • notion image

    FAQs

    • When should bytes be written to the stream? As soon as possible. The only situation in which a byte should not be in the stream is that when there is a byte before it that has not been “pushed” yet.
    • Will I need to add private members to the Reassembler? Yes. Substrings may arrive in any order, so your data structure will have to “remember” substrings until they’re ready to be put into the stream—that is, until all indices before them have been written.
    • Is it okay for our re-assembly data structure to store overlapping substrings? No. It is possible to implement an “interface-correct” reassembler that stores overlapping substrings. But allowing the re-assembler to do this undermines the notion of “capacity” as a memory limit. If the caller provides redundant knowledge about the same index, the Reassembler should only store one copy of this information.
    • Will the Reassembler ever use the Reader side of the ByteStream? No—that’s for the external customer. The Reassembler uses the Writer side only
    • So a reasonable implementation of the Reassembler might be about 50–60 lines of code for the Reassembler (on top of the starter code).

    理解接口

    • explicit Reassembler( ByteStream&& output ) : output_( std::move( output ) ) {}
      • Construct Reassembler to write into given ByteStream. 这个是构造函数
      • explicit :这是一个关键字,用于指示编译器不要隐式地将此构造函数用于隐式类型转换
      • ( ByteStream&& output ) :这个是构造函数的参数列表
        • ByteStream&&是一个右值引用
          • 在C++中,我们通常用变量来存储值,比如 int x = 5;,这里 x 是一个变量,存储了值 5。当我们需要传递这个值给函数或者赋值给另一个变量时,我们通常会使用变量的名称来引用这个值,比如 int y = x;,这里 y 也存储了值 5,但它是通过复制 x 的值而得到的
            然而,在某些情况下,我们可能希望直接操作某个临时创建的对象,而不是复制它的值。这就是右值引用的用武之地。右值引用允许我们绑定到临时创建的对象,或者将要被移动的对象,而不是复制它们的值
        • 综合起来,( ByteStream&& output ) 这句代码表示一个函数的参数,它接受一个 ByteStream 类型的右值引用。右值引用允许我们在函数内部获取传递给函数的参数,并且可以对其进行修改或者移动其资源,而不需要执行深拷贝操作,从而提高了程序的效率
      • output: a mutable reference to the Writer
    • void insert( uint64_t first_index, std::string data, bool is_last_substring )
      • Insert a new substring to be reassembled into a ByteStream.
      • first_index: the index of the first byte of the substring
      • data: the substring itself
      • is_last_substring: this substring represents the end of the stream
    • uint64_t bytes_pending() const
      • How many bytes are stored in the Reassembler itself?
      • 指的是缓存在internal storage的字节数

    主要思路

    • 首先考虑存储unassembled string的数据结构
      • 最好是让data在这部分空间里面自动按照first_index排序
        • std::set中的元素是通过关键字升序排列的,支持快速查找,但不支持重复。
          • 💡
            c++中的std::set,是基于红黑树的平衡二叉树的数据结构实现的一种容器,因为其中所包含的元素的值是唯一的,因此主要用于去重和排序
        • std::map是一个关联式容器,其中元素以键值对形式存在,并且按照键值升序排列。
        • 因此如果考虑data不需要重复的情况,目前看来是使用std::set比较好
        • 如果你想要键值对按照键自动排序,但又不需要键和值分开存储,可以使用std::set结合std::pair,或者使用std::map,将键和值设置为相同的类型
        • std::set+std::pair
        • 迭代器:访问set里面的元素
          • 迭代器是一种用于遍历容器中元素的对象,它提供了一种统一的访问容器元素的方式,使得可以在不暴露容器内部结构的情况下对容器进行遍历和访问
    • 最难的是考虑怎么解决data与set里面元素重叠的情况
      • 用leftDownIndex,leftUpIndex,rightDownIndex,rightUpIndex来界定边界
    • 对于available_capacity()
      • 一开始是想着每次在set里面加入时就把这个数据减去一个string长度。后来发现理解错了
      • 因为每一个byte的位置是确定的,因此只有ByteStream的长度会影响available_capacity(),只用考虑ByteStream长度已经确定的情况下,data的某个byte的index是否超出firstUnacceptableIndex就行了
    • 对于pending_num
      • 这个表示存在assembler的internal storage里面的byte数

    代码及示意图

    assembler.hh

    assembler.cc

    示意图

    notion image

    运行结果

    notion image
     
    数组Computer Networking Notes
    • Giscus