项目2: CS61A 打字自动纠错软件
Project 2: CS 61A Autocorrected Typing Software
查看英文原文

Programmers dream of
Abstraction, recursion, and
Typing really fast.
介绍
截止日期:
- 提交第一阶段的截止日期为 星期二, 9/29。
- 整个项目的提交截止日期为 星期五, 10/2。
- 如果在星期四, 10/1前提交整个项目,你将获得一个提前提交的额外积分。
你可以与同伴一起完成整个项目。
在这个项目中,你将编写一个用于测量打字速度的程序。此外,你还会实现打字自动纠错功能,这是一种在用户输入单词后尝试纠正拼写错误的功能。这个项目的灵感来自 typeracer。
最终产品
我们的项目解决方案可以在 cats.cs61a.org 进行交互 - 如果你愿意,现在就试试看!当你完成项目时,你将自己实现这个游戏的一个重要部分!
下载起始文件
你可以下载项目代码的 zip 文件 该压缩包包含多个文件,但你的只需要更改
cats.py
。以下是压缩包中包含的文件:
cats.py
: 打字测试逻辑。utils.py
: 与文件和字符串交互的实用程序函数。ucb.py
: CS 61A 课程项目的实用程序函数。data/sample_paragraphs.txt
: 包含待输入文本样本的文件。这些是 抓取 的关于各种主题的维基百科文章。data/common_words.txt
: 包含常见的 英语单词 的文件,按频率排序。data/words.txt
: 包含更多 英语单词 的文件,按频率排序。gui.py
: 用于基于 Web 的图形用户界面 (GUI) 的 Web 服务器。gui_files
: 图形用户界面 (GUI) 所需文件的目录。images
: 图像存储路径ok
,proj02.ok
,tests
: 测试文件。
The CATS GUI is an open source project on Github.
项目安排与提交
本项目总分为20分。其中17分用于最终代码的正确性,1分用于在检查点截止日期前提交第一阶段,2分用于代码结构与风格。
你需要提交以下文件:
cats.py
你无需修改或提交其他文件即可完成本项目。提交项目时,请运行以下命令:
python3 ok --submit
你可以在 Ok dashboard 上查看你的提交记录。
对于我们要求你完成的函数,可能会提供一些初始代码。如果你不想使用这些代码,可以删除并从头开始编写。你也可以根据需要添加新的函数定义。
但请不要修改其他函数。这样做可能导致你的代码无法通过自动评分测试。同时,请不要更改任何函数签名(名称、参数顺序或参数数量)。
在整个项目过程中,你应该经常测试代码的正确性。经常测试有助于快速定位问题,但也不要测试过于频繁,要留出时间思考问题。
我们为你提供了一个名为 ok 的自动评分器,帮助你测试代码并跟踪进度。第一次运行自动评分器时,你会被要求使用你的 Ok 账户通过网页登录,请按要求操作。每次运行
ok
时,它都会将你的工作和进度备份到我们的服务器。
ok
的主要用途是测试你的实现。
我们建议你在完成每一个问题后就提交一次。只有你最后一次提交的内容会被评分。这样也方便我们为你的代码多做备份,以防你遇到提交问题。 如果你忘记提交,系统会自动将你最后一次备份转为正式提交。
如果你不希望我们记录你的作业备份或进度信息,可以运行
python3 ok --local使用此选项时,任何信息都不会发送到课程服务器。 如果你想以交互方式测试你的代码,可以运行
python3 ok -q [问题编号] -i将对应的问题编号(如
01
)填入即可。此命令会运行该问题的测试,直到遇到第一个未通过的测试,然后你可以在交互环境中测试自己编写的函数。
你还可以在 OK 中使用调试打印功能,只需写下
print("DEBUG:", x)这样会在终端输出调试信息,并且不会因为多余输出导致 OK 测试失败。
Phase 1: Typing
Phase 1 Hint Video
This video provides some helpful direction for tackling the problems on this phase of CATS.
Problem 1 (1 pt)
Implement choose
, which selects which paragraph the user will type. It takes a
list of paragraphs
(strings), a select
function that returns True
for
paragraphs that can be selected, and a non-negative index k
. The choose
function return's the k
th paragraph for which select
returns True
. If
no
such paragraph exists (because k
is too large), then choose
returns the
empty string.
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 01 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 01
Problem 2 (2 pt)
Implement about
, which takes a list of topic
words. It returns a function which
takes a paragraph and returns a boolean indicating whether that paragraph
contains any of the words in topic
. The returned function can be passed to choose
as
the select
argument.
To make this comparison accurately, you will need to ignore case (that is, assume that uppercase and lowercase letters don't change what word it is) and punctuation in the paragraph.
Assume that all words in the topic
list are already lowercased and do not
contain punctuation.
Hint: You may use the string utility functions in
utils.py
.
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 02 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 02
Problem 3 (1 pt)
Implement accuracy
, which takes a typed
paragraph and a reference
paragraph. It returns the percentage of words in typed
that exactly match the
corresponding words in reference
. Case and punctuation must match as well.
A word in this context is any sequence of characters separated from other words by whitespace, so treat "dog;" as all one word.
If a typed word has no corresponding word in the reference because typed
is
longer than reference
, then the extra words in typed
are all incorrect.
If typed
is empty, then the accuracy is zero.
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 03 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 03
Problem 4 (1 pt)
Implement wpm
, which computes the words per minute, a measure of typing
speed, given a string typed
and the amount of elapsed
time in
seconds.
Despite its name, words per minute is not based on the number of words typed,
but instead the number of characters, so that a typing test is not biased by the
length of words. The formula for words per minute is the ratio of the number
of characters (including spaces) typed divided by 5 (a typical word length) to
the elapsed time in minutes.
For example, the string "I am glad!"
contains three words and ten characters
(not including the quotation marks). The words per minute calculation uses 2 as
the number of words typed (because 10 / 5 = 2). If someone typed this string in
30 seconds (half a minute), their speed would be 4 words per minute.
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 04 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 04
Time to test your typing speed! You can use the command line to test your
typing speed on paragraphs about a particular topic. For example, the command
below will load paragraphs about cats or kittens. See the run_typing_test
function for the implementation if you're curious (but it is defined for you).
python3 cats.py -t cats kittens
You can try out the web-based graphical user interface (GUI) using the following command.
python3 gui.py
To submit your Phase 1 checkpoint type:
python3 ok --submit
You can submit again once you've finished the whole project, and we will score only your latest submission, but please submit at least once before the checkpoint deadline (after finishing at least the Phase 1 questions) to receive credit for the checkpoint.
Phase 2: Autocorrect
In the web-based GUI, there is an autocorrect button, but right now it doesn't do anything. Let's implement automatic correction of typos. Whenever the user presses the space bar, if the last word they typed doesn't match a word in the dictionary but is close to one, then that similar word will be substituted for what they typed.
Phase 2 Hint Video
This video provides some helpful direction for tackling the problems on this phase of CATS.
Problem 5 (2 pt)
Implement autocorrect
, which takes a user_word
, a list of all
valid_words
,
a diff_function
, and a limit
.
If the user_word
is contained inside the valid_words
list, autocorrect
returns that word. Otherwise, autocorrect
returns the word from valid_words
that has the lowest difference from the provided user_word
based on the
diff_function
. However, if the lowest difference between user_word
and any
of the valid_words
is greater than limit
, then user_word
is returned
instead.
A diff function takes in three arguments. The first two arguments are the two strings to be
compared (the user_word
and a word from valid_words
), and the third argument is
the limit
. The output of the diff function, which is a number, represents the
amount of difference between the two strings.
Here is an example of a diff function that computes the minimum of 1 + limit
and
the difference in length between the two input strings:
>>> def length_diff(w1, w2, limit):
... return min(limit + 1, abs(len(w2) - len(w1)))
>>> length_diff('mellow', 'cello', 10)
1
>>> length_diff('hippo', 'hippopotamus', 5)
6
Assume that user_word
and all elements of valid_words
are lowercase and have
no punctuation.
Important: if multiple strings have the same lowest difference according to
the diff_function
, autocorrect
should return the string that appears first
in valid_words
.
Hint: Try using
max
ormin
with the optionalkey
argument.
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 05 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 05
Problem 6 (2 pts)
Implement shifty_shifts
, which is a diff function that takes two strings. It
returns the minimum number of characters that must be changed in the start
word in order to transform it into the goal
word. If the strings are not of
equal length, the difference in lengths is added to the total.
Important: You may not use while
or for
statements in your
implementation. Use recursion.
Here are some examples:
>>> big_limit = 10
>>> shifty_shifts("nice", "rice", big_limit) # Substitute: n -> r
1
>>> shifty_shifts("range", "rungs", big_limit) # Substitute: a -> u, e -> s
2
>>> shifty_shifts("pill", "pillage", big_limit) # Don't substitute anything, length difference of 3.
3
>>> shifty_shifts("roses", "arose", big_limit) # Substitute: r -> a, o -> r, s -> o, e -> s, s -> e
5
>>> shifty_shifts("rose", "hello", big_limit) # Substitute: r->h, o->e, s->l, e->l, length difference of 1.
5
If the number of characters that must change is greater than limit
, then
shifty_shifts
should return any number larger than limit
and should minimize
the amount of computation needed to do so.
These two calls to shifty_shifts
should take about the same amount of time to
evaluate:
>>> limit = 4
>>> shifty_shifts("roses", "arose", limit) > limit
True
>>> shifty_shifts("rosesabcdefghijklm", "arosenopqrstuvwxyz", limit) > limit
True
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 06 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 06
Try turning on autocorrect in the GUI. Does it help you type faster? Are the corrections accurate? You should notice that inserting a letter or leaving one out near the beginning of a word is not handled well by this diff function. Let's fix that!
Problem 7 (3 pt)
Implement pawssible_patches
, which is a diff function that returns the minimum number
of edit operations needed to transform the start
word into the goal
word.
There are three kinds of edit operations:
- Add a letter to
start
, - Remove a letter from
start
, - Substitute a letter in
start
for another.
Each edit operation contributes 1 to the difference between two words.
>>> big_limit = 10
>>> pawssible_patches("cats", "scat", big_limit) # cats -> scats -> scat
2
>>> pawssible_patches("purng", "purring", big_limit) # purng -> purrng -> purring
2
>>> pawssible_patches("ckiteus", "kittens", big_limit) # ckiteus -> kiteus -> kitteus -> kittens
3
We have provided a template of an implementation in cats.py
. This is a
recursive function with three recursive calls. One of these recursive
calls will be similar to the recursive call in shifty_shifts
.
You may modify the template however you want or delete it entirely.
If the number of edits required is greater than limit
, then pawssible_patches
should return any number larger than limit
and should minimize the amount of
computation needed to do so.
These two calls to pawssible_patches
should take about the same amount of time to
evaluate:
>>> limit = 2
>>> pawssible_patches("ckiteus", "kittens", limit) > limit
True
>>> pawssible_patches("ckiteusabcdefghijklm", "kittensnopqrstuvwxyz", limit) > limit
True
There are no unlock cases for this problem. Make sure you understand the above test cases!
Test your implementation before proceeding:
python3 ok -q 07
Try typing again. Are the corrections more accurate?
python3 gui.py
Extensions: You may optionally design your own diff function called
final_diff
. Here are some ideas for making even more accurate corrections:
- Take into account which additions and deletions are more likely than others. For example, it's much more likely that you'll accidentally leave out a letter if it appears twice in a row.
- Treat two adjacent letters that have swapped positions as one change, not two.
- Try to incorporate common misspellings
Phase 3: Multiplayer
Typing is more fun with friends! You'll now implement multiplayer functionality,
so that when you run gui.py
on your computer, it connects to the
course server at cats.cs61a.org and looks for someone else to race
against.
To race against a friend, 5 different programs will be running:
- Your GUI, which is a program that handles all the text coloring and display in your web browser.
- Your
gui.py
, which is a web server that communicates with your GUI using the code you wrote incats.py
. - Your opponent's
gui.py
. - Your opponent's GUI.
- The CS 61A multiplayer server, which matches players together and passes messages around.
When you type, your GUI sends what you have typed to your gui.py
server, which
computes how much progress you have made and returns a progress update. It also
sends a progress update to the multiplayer server, so that your opponent's GUI can
display it.
Meanwhile, your GUI display is always trying to keep current by asking for
progress updates from gui.py
, which in turn requests that info from the
multiplayer server.
Each player has an id
number that is used by the server to track typing
progress.
Phase 3 Hint Video
This video provides some helpful direction for tackling the problems on this phase of CATS.
Problem 8 (2 pt)
Implement report_progress
, which is called every time the user finishes typing
a word. It takes a list of the words typed
, a list of the words in the
prompt
, the user's user_id
, and a send
function that is used to send a
progress
report to the multiplayer server. Note that there will never be more words in
typed
than in prompt
.
Your progress is a ratio of the words in the prompt
that you have typed
correctly, up to the first incorrect word, divided by the number of prompt
words. For example, this example has a progress of 0.25
:
report_progress(["Hello", "ths", "is"], ["Hello", "this", "is", "wrong"], ...)
Your report_progress
function should return this number. Before that, it
should send a message to the multiplayer server that is a two-element dictionary
containing the keys 'id'
and 'progress'
. The
user_id
is passed into
report_progress
from the GUI. The progress is the fraction you compute. Call
send
on this dictionary to send it to the multiplayer server.
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 08 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 08
Problem 9 (1 pt)
Implement time_per_word
, which takes in times_per_player
, a list of lists
for each player with timestamps indicating when each player finished typing each
word. It also takes in a list words
. It returns a game
with the given
information.
A game
is a data abstraction that has a list of words
and times
. The
times
are stored as a list of lists of how long it took each player to type
each word. times[i][j]
indicates how long it took player i
to type word
j
.
Timestamps are cumulative and always increasing, while the values in time
are differences
between consecutive timestamps. For example, if times_per_player = [[1, 3, 5], [2, 5, 6]]
, the
corresponding time
attribute of the game
would be [[2, 2], [3, 1]]
((3-1)
, (5-3)
and (5-2)
, (6-5)
). Note
that the first value of each list within times_per_player
represents the initial starting time
for each player.
Be sure to use the game
constructor when returning a game
, rather than assuming a
particular data format. Read the definitions for the game
constructor and selectors in
cats.py
to learn more about how the data abstraction is implemented.
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 09 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 09
Problem 10 (2 pt)
Implement fastest_words
, which returns which words each player typed fastest.
This function is called once both players have finished typing. It takes in a game
.
The game
argument is a game
data abstraction, like the one returned in
Problem 9. You can access words in the game
with selectors word_at
, which
takes in a game
and the word_index
(an integer). You can access the time it
took any player to type any word using time
.
The fastest_words
function returns a list of lists of words, one list for each
player, and within each list the words they typed the fastest (against all the other players). In the case of
a tie, consider the earliest player in the list (the smallest player index) to be the one who typed it the
fastest.
Be sure to use the accessor functions for the game
data abstraction, rather
than assuming a particular data format.
Before writing any code, unlock the tests to verify your understanding of the question.
python3 ok -q 10 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
python3 ok -q 10
Congratulations! Now you can play against other students in the course. Set
enable_multiplayer
to True
near the bottom of cats.py
and type swiftly!
python3 gui.py
At this point, run the entire autograder to see if there are any tests that don't pass.
python3 ok
Once you are satisfied, submit to Ok to complete the project.
python3 ok --submit
If you have a partner, make sure to add them to the submission on okpy.org.
Check to make sure that you did all the problems by running
python3 ok --score