实验5: 数据抽象, 树
Lab 5: Data Abstraction, Trees

lab05.zip

Due by 11:59pm on Tuesday, September 29.

查看英文原文

初始文件

下载 lab05.zip。在压缩包中，你可以找到本实验问题的初始文件，以及一份 Ok 自动评分器。

主要内容

如果你需要复习本实验的材料，请参考本节。你可以直接跳到问题部分，遇到困难再回到这里。

List Comprehensions

List comprehensions are a compact and powerful way of creating new lists out of sequences. The general syntax for a list comprehension is the following:

[<expression> for <element> in <sequence> if <conditional>]

The syntax is designed to read like English: "Compute the expression for each element in the sequence if the conditional is true for that element."

Let's see it in action:

>>> [i**2 for i in [1, 2, 3, 4] if i % 2 == 0]
[4, 16]

Here, for each element i in [1, 2, 3, 4] that satisfies i % 2 == 0, we evaluate the expression i**2 and insert the resulting values into a new list. In other words, this list comprehension will create a new list that contains the square of each of the even elements of the original list.

If we were to write this using a for statement, it would look like this:

>>> lst = []
>>> for i in [1, 2, 3, 4]:
...     if i % 2 == 0:
...         lst = lst + [i**2]
>>> lst
[4, 16]

Note: The if clause in a list comprehension is optional. For example, you can just say:
>>> [i**2 for i in [1, 2, 3, 4]]
[1, 4, 9, 16]

Data Abstraction

Data abstraction is a powerful concept in computer science that allows programmers to treat code as objects -- for example, car objects, chair objects, people objects, etc. That way, programmers don't have to worry about how code is implemented -- they just have to know what it does.

Data abstraction mimics how we think about the world. When you want to drive a car, you don't need to know how the engine was built or what kind of material the tires are made of. You just have to know how to turn the wheel and press the gas pedal.

An abstract data type consists of two types of functions:

Constructors: functions that build the abstract data type.
Selectors: functions that retrieve information from the data type.

Programmers design ADTs to abstract away how information is stored and calculated such that the end user does not need to know how constructors and selectors are implemented. The nature of abstract data types allows whoever uses them to assume that the functions have been written correctly and work as described.

Trees

A tree is a data structure that represents a hierarchy of information. A file system is a good example of a tree structure. For example, within your cs61a folder, you have folders separating your projects, lab assignments, and homework. The next level is folders that separate different assignments, hw01, lab01, hog, etc., and inside those are the files themselves, including the starter files and ok. Below is an incomplete diagram of what your cs61a directory might look like.

cs61a_tree

As you can see, unlike trees in nature, the tree abstract data type is drawn with the root at the top and the leaves at the bottom.

Some tree terminology:

root: the node at the top of the tree
label: the value in a node, selected by the label function
branches: a list of trees directly under the tree's root, selected by the branches function
leaf: a tree with zero branches
node: any location within the tree (e.g., root node, leaf nodes, etc.)

Our tree abstract data type consists of a root and a list of its branches. To create a tree and access its root value and branches, use the following constructor and selectors:

Constructor
- tree(label, branches=[]): creates a tree object with the given label value at its root node and list of branches. Notice that the second argument to this constructor, branches, is optional - if you want to make a tree with no branches, leave this argument empty.
Selectors
- label(tree): returns the value in the root node of tree.
- branches(tree): returns the list of branches of the given tree.
Convenience function
- is_leaf(tree): returns True if tree's list of branches is empty, and False otherwise.

For example, the tree generated by

number_tree = tree(1,
         [tree(2),
          tree(3,
               [tree(4),
                tree(5)]),
          tree(6,
               [tree(7)])])

would look like this:

To extract the number 3 from this tree, which is the label of the root of its second branch, we would do this:

label(branches(number_tree)[1])

The print_tree function prints out a tree in a human-readable form. The exact form follows the pattern illustrated above, where the root is unindented, and each of its branches is indented one level further.

def print_tree(t, indent=0):
    """Print a representation of this tree in which each node is
    indented by two spaces times its depth from the root.

    >>> print_tree(tree(1))
    1
    >>> print_tree(tree(1, [tree(2)]))
    1
      2
    >>> numbers = tree(1, [tree(2), tree(3, [tree(4), tree(5)]), tree(6, [tree(7)])])
    >>> print_tree(numbers)
    1
      2
      3
        4
        5
      6
        7
    """
    print('  ' * indent + str(label(t)))
    for b in branches(t):
        print_tree(b, indent + 1)

必答题

列表推导式

Q1: 配对

实现 couple 函数，该函数接收两个列表，并返回一个包含嵌套列表的列表，其中每个嵌套列表包含两个输入序列中相同索引位置的元素配对在一起。你可以假设两个序列的长度相同。尝试使用列表推导式 (list comprehension) 来实现。

提示：内置的 range 函数可能对你有帮助。

def couple(s, t):
    """Return a list of two-element lists in which the i-th element is [s[i], t[i]].

    >>> a = [1, 2, 3]
    >>> b = [4, 5, 6]
    >>> couple(a, b)
    [[1, 4], [2, 5], [3, 6]]
    >>> c = ['c', 6]
    >>> d = ['s', '1']
    >>> couple(c, d)
    [['c', 's'], [6, '1']]
    """
    assert len(s) == len(t)
    "*** YOUR CODE HERE ***"

使用 Ok 来测试你的代码：

python3 ok -q couple --local

数据抽象

假设我们有一个表示城市的抽象数据类型（ADT）。一个城市具有名称、纬度坐标和经度坐标。

我们的 ADT 有一个构造函数 (Constructor)：

make_city(name, lat, lon)：创建一个具有指定名称、纬度和经度的城市对象。

此外，我们还有以下选择器函数 (Selector)，用于获取城市的信息：

get_name(city)：返回城市的名称
get_lat(city)：返回城市的纬度
get_lon(city)：返回城市的经度

以下是如何使用构造函数和选择器来创建城市并提取其信息的示例：

>>> berkeley = make_city('Berkeley', 122, 37)
>>> get_name(berkeley)
'Berkeley'
>>> get_lat(berkeley)
122
>>> new_york = make_city('New York City', 74, 40)
>>> get_lon(new_york)
40

所有的选择器和构造函数都可以在实验文件中找到，如果你对它们的实现方式感兴趣，可以查看文件。然而，数据抽象的关键在于，我们不需要知道一个抽象数据类型是如何实现的，而只需要知道如何与其交互并使用它。

Q2: 距离计算

实现 distance 函数，用于计算两个城市对象之间的距离。回顾一下，两个坐标点 (x1, y1) 和 (x2, y2) 之间的距离可以通过计算 sqrt((x1 - x2)**2 + (y1 - y2)**2) 来得到。我们已经为你导入了 sqrt 函数以方便使用。请使用城市的纬度和经度作为其坐标，并使用选择器函数来访问这些信息！

from math import sqrt
def distance(city_a, city_b):
    """
    >>> city_a = make_city('city_a', 0, 1)
    >>> city_b = make_city('city_b', 0, 2)
    >>> distance(city_a, city_b)
    1.0
    >>> city_c = make_city('city_c', 6.5, 12)
    >>> city_d = make_city('city_d', 2.5, 15)
    >>> distance(city_c, city_d)
    5.0
    """
    "*** YOUR CODE HERE ***"

使用 Ok 来测试你的代码：

python3 ok -q distance --local

Q3: 更近的城市

接下来，实现 closer_city 函数。该函数接收一个纬度、一个经度以及两个城市，并返回相对更接近提供的纬度和经度的城市名称。

你只能使用上面介绍的选择器、构造器以及刚刚定义的 distance 函数来完成此问题。

提示：你如何利用 distance 函数来计算给定位置与两个城市之间的距离呢？

def closer_city(lat, lon, city_a, city_b):
    """
    Returns the name of either city_a or city_b, whichever is closest to
    coordinate (lat, lon). If the two cities are the same distance away
    from the coordinate, consider city_b to be the closer city.

    >>> berkeley = make_city('Berkeley', 37.87, 112.26)
    >>> stanford = make_city('Stanford', 34.05, 118.25)
    >>> closer_city(38.33, 121.44, berkeley, stanford)
    'Stanford'
    >>> bucharest = make_city('Bucharest', 44.43, 26.10)
    >>> vienna = make_city('Vienna', 48.20, 16.37)
    >>> closer_city(41.29, 174.78, bucharest, vienna)
    'Bucharest'
    """
    "*** YOUR CODE HERE ***"

使用 Ok 来测试你的代码：

python3 ok -q closer_city --local

Q4: 不要违反抽象屏障！

注意：如果你正确实现了 distance 和 closer_city，此问题无需编写代码！

在编写使用抽象数据类型（ADT）的函数时，应尽可能使用构造器和选择器，而不是假设 ADT 的具体实现。依赖数据抽象的底层实现被称为违反抽象屏障，我们应当尽量避免这样做！

即使你违反了抽象屏障，你的 distance 和 closer_city 代码仍可能通过 doctest。要检查是否存在此问题，请运行以下命令：

使用 Ok 来测试你的代码：

python3 ok -q check_city_abstraction --local

check_city_abstraction 函数仅用于 doctest，它会将 city 抽象的实现替换为另一种方式，运行前两部分的测试，然后恢复原始实现。

抽象屏障的性质保证了，只要正确使用构造器和选择器，更改 ADT 的实现就不会影响任何依赖该 ADT 的程序的功能。

如果你通过了前面问题的 Ok 测试但未通过本测试，修复方法很简单！只需将任何违反抽象屏障的代码（例如直接创建列表表示城市或对城市对象进行索引）替换为合适的构造器或选择器。

确保你的函数在第一种和第二种 City ADT 实现下都能通过测试，并理解为什么它们应该对两者都有效，然后再继续。

树

Q5: 寻找浆果！

校园里的松鼠需要你的帮助！校园里有很多树，松鼠们想知道哪些树上有浆果。定义函数 berry_finder，该函数接受一棵树作为输入，如果树中包含值为 'berry' 的节点，则返回 True，否则返回 False。

提示：考虑使用 for 循环递归遍历每个分支！

def berry_finder(t):
    """Returns True if t contains a node with the value 'berry' and 
    False otherwise.

    >>> scrat = tree('berry')
    >>> berry_finder(scrat)
    True
    >>> sproul = tree('roots', [tree('branch1', [tree('leaf'), tree('berry')]), tree('branch2')])
    >>> berry_finder(sproul)
    True
    >>> numbers = tree(1, [tree(2), tree(3, [tree(4), tree(5)]), tree(6, [tree(7)])])
    >>> berry_finder(numbers)
    False
    >>> t = tree(1, [tree('berry',[tree('not berry')])])
    >>> berry_finder(t)
    True
    """
    "*** YOUR CODE HERE ***"

使用 Ok 来测试你的代码：

python3 ok -q berry_finder --local

Q6: 萌发新叶

定义一个函数 sprout_leaves，它接受一棵树 t 和一个叶子列表 leaves。该函数返回一棵新的树，该树与 t 相同，但所有原始叶子节点都会新增分支，每个分支对应 leaves 中的一个元素。

例如，假设我们有以下树 t = tree(1, [tree(2), tree(3, [tree(4)])])：

如果我们调用 sprout_leaves(t, [5, 6])，生成的树如下：

def sprout_leaves(t, leaves):
    """Sprout new leaves containing the data in leaves at each leaf in
    the original tree t and return the resulting tree.

    >>> t1 = tree(1, [tree(2), tree(3)])
    >>> print_tree(t1)
    1
      2
      3
    >>> new1 = sprout_leaves(t1, [4, 5])
    >>> print_tree(new1)
    1
      2
        4
        5
      3
        4
        5

    >>> t2 = tree(1, [tree(2, [tree(3)])])
    >>> print_tree(t2)
    1
      2
        3
    >>> new2 = sprout_leaves(t2, [6, 1, 2])
    >>> print_tree(new2)
    1
      2
        3
          6
          1
          2
    """
    "*** YOUR CODE HERE ***"

使用 Ok 来测试你的代码：

python3 ok -q sprout_leaves --local

Q7: 不要违反抽象屏障！

注意：如果你正确实现了 berry_finder 和 sprout_leaves，那么这个问题不需要编写任何代码！

在编写使用 ADT（抽象数据类型）的函数时，我们应尽可能使用构造函数和选择器，而不是假设 ADT 的具体实现。依赖数据抽象的底层实现被称为违反抽象屏障，而我们应当避免这样做！

即使你违反了抽象屏障，你仍然可能通过 berry_finder 和 sprout_leaves 的 doctest 测试。要检查你是否违反了抽象屏障，请运行以下命令：

使用 Ok 来测试你的代码：

python3 ok -q check_abstraction --local

check_abstraction 仅用于 doctest，它会替换 tree 抽象的实现，运行前面两个部分的测试，然后恢复原始抽象。

抽象屏障的核心思想是，即使更改 ADT 的实现，任何正确使用构造函数和选择器的程序都不应受到影响。

如果你通过了前面问题的 Ok 测试但未通过本测试，修复方法很简单！只需替换任何违反抽象屏障的代码，例如直接创建一个新的列表对象表示树，或直接索引树的数据结构，改为使用正确的构造函数或选择器。

确保你的函数能够通过树 ADT 的两种不同实现的测试，并理解为什么它们应该适用于两者，然后再继续下一部分。

Submit

Make sure to submit this assignment by running:

python3 ok --submit

选做题

Q8: 坐标

实现一个函数 coords，它接受一个函数 fn，一个序列 seq，以及该函数输出的 lower（下界）和 upper（上界）。coords 返回一个包含两个坐标（列表）的列表，使得：

每个 (x, y) 对应的坐标表示为 [x, fn(x)]
x 坐标来自给定的序列
结果仅包含那些 y 坐标在给定上界和下界范围内（包含边界）的坐标对

请查看 doctest 以获取示例。

注意：你的答案必须只有一行代码。你应该使用列表推导式！

def coords(fn, seq, lower, upper):
    """
    >>> seq = [-4, -2, 0, 1, 3]
    >>> fn = lambda x: x**2
    >>> coords(fn, seq, 1, 9)
    [[-2, 4], [1, 1], [3, 9]]
    """
    "*** YOUR CODE HERE ***"
    return ______

使用 Ok 来测试你的代码：

python3 ok -q coords --local

Q9: 交错洗牌

交错洗牌（riffle shuffle）方法常用于洗一副牌（在本题中，是一个序列）。洗完牌后新的排列方式是：顶部的牌后面接着中间的牌，然后是第二张牌，再接着是中间的下一张牌，以此类推。假设序列的元素数量为偶数，请编写一个使用列表推导式的表达式来生成洗牌后的序列。

提示： 要将此操作写成单行列表推导式，表达式 k%2 可能对你有用。该表达式在偶数时返回 0，在奇数时返回 1。考虑如何利用 k%2 的值交替访问列表的前半部分和后半部分。

def riffle(deck):
    """Produces a single, perfect riffle shuffle of DECK, consisting of
    DECK[0], DECK[M], DECK[1], DECK[M+1], ... where M is position of the
    second half of the deck.  Assume that len(DECK) is even.
    >>> riffle([3, 4, 5, 6])
    [3, 5, 4, 6]
    >>> riffle(range(20))
    [0, 10, 1, 11, 2, 12, 3, 13, 4, 14, 5, 15, 6, 16, 7, 17, 8, 18, 9, 19]
    """
    "*** YOUR CODE HERE ***"
    return _______

使用 Ok 来测试你的代码：

python3 ok -q riffle --local

Q10: 两树之和

定义 add_trees 函数，该函数接收两棵树作为输入，返回一棵新的树，其中每个对应位置的节点来自两棵树相加的结果。如果某个位置的节点只在一棵树中存在，而在另一棵树中不存在，该节点也应出现在新树中。

提示: 你可能想要使用内置的 zip 函数来同时遍历多个序列。

注意: 如果你觉得这个问题比之前的树问题要难很多，那是完全可以理解的！这个问题确实挺难的，但是你可以做到的！可以和其他同学讨论，再来考虑这个问题。

def add_trees(t1, t2):
    """
    >>> numbers = tree(1,
    ...                [tree(2,
    ...                      [tree(3),
    ...                       tree(4)]),
    ...                 tree(5,
    ...                      [tree(6,
    ...                            [tree(7)]),
    ...                       tree(8)])])
    >>> print_tree(add_trees(numbers, numbers))
    2
      4
        6
        8
      10
        12
          14
        16
    >>> print_tree(add_trees(tree(2), tree(3, [tree(4), tree(5)])))
    5
      4
      5
    >>> print_tree(add_trees(tree(2, [tree(3)]), tree(2, [tree(3), tree(4)])))
    4
      6
      4
    >>> print_tree(add_trees(tree(2, [tree(3, [tree(4), tree(5)])]), \
    tree(2, [tree(3, [tree(4)]), tree(5)])))
    4
      6
        8
        5
      5
    """
    "*** YOUR CODE HERE ***"

使用 Ok 来测试你的代码：

python3 ok -q add_trees --local

趣味问题！

莎士比亚与字典

我们将使用字典来近似莎士比亚的全部作品！我们要使用一个二元语法（bigram）语言模型。它的基本思想如下：我们从某个单词开始——例如 "The"。然后，我们遍历莎士比亚所有的文本，记录每次 "The" 出现后紧随的单词，并将其添加到一个列表中，这个列表中的单词被称为 "The" 的 后继者（successors）。假设我们对莎士比亚曾经使用过的每一个单词都做了这样的处理。

回到 "The"。现在，我们可以从这个列表中随机选择一个单词，比如 "cat"。接着，我们查找 "cat" 的后继者，并从中随机选择一个单词，如此往复。最终，这个过程会以一个句号（"."）终止，我们就生成了一句莎士比亚风格的句子！

我们用于查找单词后继者的对象被称为 "后继表（successor table）"，本质上它就是一个字典（dictionary）。这个字典的键是单词，值是该单词的后继单词列表。

Q11: 后继表

下面是一个不完整的 build_successors_table 函数定义。它的输入是一个单词列表（对应于莎士比亚的文本），输出是一个后继表。（默认情况下，第一个单词是句号 "." 的后继者）。请参考下面的示例。

注意：有两个需要填写代码的地方，标记为 "*** YOUR CODE HERE ***"。

def build_successors_table(tokens):
    """Return a dictionary: keys are words; values are lists of successors.

    >>> text = ['We', 'came', 'to', 'investigate', ',', 'catch', 'bad', 'guys', 'and', 'to', 'eat', 'pie', '.']
    >>> table = build_successors_table(text)
    >>> sorted(table)
    [',', '.', 'We', 'and', 'bad', 'came', 'catch', 'eat', 'guys', 'investigate', 'pie', 'to']
    >>> table['to']
    ['investigate', 'eat']
    >>> table['pie']
    ['.']
    >>> table['.']
    ['We']
    """
    table = {}
    prev = '.'
    for word in tokens:
        if prev not in table:
            "*** YOUR CODE HERE ***"
        "*** YOUR CODE HERE ***"
        prev = word
    return table

使用 Ok 来测试你的代码：

python3 ok -q build_successors_table --local

Q12: 生成句子

让我们来生成一些句子！假设我们给定了一个起始单词。我们可以在后继表中查找这个单词，找到它的后继单词列表，然后从这个列表中随机选择一个单词作为句子的下一个单词。然后，我们重复这个过程，直到遇到某个终止标点符号。

提示：要从列表中随机选择一个单词，可以导入 Python 的 random 库： import random，然后使用表达式 random.choice(my_list) 进行选择。

现在是一个尝试字符串拼接的好时机。让我们完成 construct_sent 函数！

def construct_sent(word, table):
    """Prints a random sentence starting with word, sampling from
    table.

    >>> table = {'Wow': ['!'], 'Sentences': ['are'], 'are': ['cool'], 'cool': ['.']}
    >>> construct_sent('Wow', table)
    'Wow!'
    >>> construct_sent('Sentences', table)
    'Sentences are cool.'
    """
    import random
    result = ''
    while word not in ['.', '!', '?']:
        "*** YOUR CODE HERE ***"
    return result.strip() + word

使用 Ok 来测试你的代码：

python3 ok -q construct_sent --local

整合所有内容

太棒了！现在让我们尝试用一些实际数据运行我们的函数。以下代码片段包含在提供的框架代码中，它将返回一个包含莎士比亚所有作品单词的列表。

警告：请不要尝试打印此函数的返回结果。

def shakespeare_tokens(path='shakespeare.txt', url='http://composingprograms.com/shakespeare.txt'):
    """Return the words of Shakespeare's plays as a list."""
    import os
    from urllib.request import urlopen
    if os.path.exists(path):
        return open(path, encoding='ascii').read().split()
    else:
        shakespeare = urlopen(url)
        return shakespeare.read().decode(encoding='ascii').split()

取消以下两行代码的注释，以运行上述函数并从这些单词生成后继表。

# Uncomment the following two lines
# tokens = shakespeare_tokens()
# table = build_successors_table(tokens)

接下来，让我们定义一个辅助函数，它可以根据后继表构造句子：

>>> def sent():
...     return construct_sent('The', table)
>>> sent()
" The plebeians have done us must be news-cramm'd."

>>> sent()
" The ravish'd thee , with the mercy of beauty!"

>>> sent()
" The bird of Tunis , or two white and plucker down with better ; that's God's sake."

请注意，所有的句子都以“The”开头。只需进行一些小修改，我们就可以让句子以随机单词开头。以下 random_sent 函数（已在提供的代码文件中定义）可以实现这一点：

def random_sent():
    import random
    return construct_sent(random.choice(table['.']), table)

现在，你可以将文件加载到 Python 中（确保使用 -i 选项）。然后，你就可以调用 random_sent 函数来生成随机的莎士比亚风格句子！

>>> random_sent()
' Long live by thy name , then , Dost thou more angel , good Master Deep-vow , And tak'st more ado but following her , my sight Of speaking false!'

>>> random_sent()
' Yes , why blame him , as is as I shall find a case , That plays at the public weal or the ghost.'