Translation. Grasping `all-the-apples-at-once' [jsr-000U]
- March 3, 2025
-
Oleg Kiselyov with contributions from Jinser Kafka, 阅卜录
- Original
Translation. Grasping `all-the-apples-at-once' [jsr-000U]
- March 3, 2025
- Oleg Kiselyov with contributions from Jinser Kafka, 阅卜录
- Original
This article is translated from Oleg Kiselyov's article titled "Grasping `all-the-apples-at-once'".
本文翻译自 Oleg Kiselyov 题为《Grasping `all-the-apples-at-once'》的文章
tips: Click on the translation to expand and view the original text. 点击译文可展开查看原文。
Translated according to the version last updated on June 3, 2024.
根据2024年6月3日更新的版本翻译。
1. 介绍
Introduction
[#257]
- March 3, 2025
- Oleg Kiselyov
1. 介绍 Introduction [#257]
- March 3, 2025
- Oleg Kiselyov
这是一次轻松的讨论,内容是对一次即兴编程竞赛的分析:参与者使用自选的编程语言解决一个简单但现实的问题。结果显示,那些明显正确且用恰当语言表达出的解决方案,都不是用我预期的语言写成的。这个讨论记录了令人不安的观察和几个意想不到的鼓励,同时也反思了常见的编程语言教学及讨论方式。
This is a light talk with the analysis of an impromptu programming contest: solving a simple but realistic problem in a language of participants' choice. It turns out that the solutions that were obviously right, expressed in just the right words were written not in the languages I have expected. This talk notes the unsettling observations and a few unexpected encouragements, along with a reflection on the common way of teaching and arguing about programming languages.
这场竞赛发生在一个以计算机为中心的社区 Lobsters。Lobsters 有点像 HN,也是一个论坛,会员们会提交和讨论各种文章,从学术期刊论文到博客文章、公告和视频。这个社区可以被称为“主流 IT”:包括嵌入式系统到 Web 的开发人员,系统管理员,学生,计算机爱好者以及少数学者。
The contest happened at a computer-focused community Lobsters. Somewhat like HN, Lobsters is a forum for its members to submit and discuss articles: from journal papers to blog posts, announcements and videos. The community is what one may call the `mainstream IT': developers, from embedded systems to the Web, system administrators as well as students, computer hobbyists, and a few academics.
某天,一篇关于常见工作面试问题的文章被发布出来以供讨论。随后,成员们自发地开始提交他们用自己选择的编程语言编写的解决方案。他们还对自己的和其他人的解决方案进行了评论。
One day an article about a common job interview question was posted for discussion. Spontaneously, members started submitting their own solutions, in their language of choice. They also commented on their own and others solutions.
令我惊讶的是,无论它们是使用什么编程语言编写的,是否是函数式语言(如 Haskell、Erlang、Clojure)或命令式语言(如 Go、Ruby、Raku),这些解决方案被整齐地分成了两类。我现在才意识到,递归不是只在理论上才是迭代的对偶,就像循环一样,它是 low-level 的:一次选择一个苹果。同样,“纯函数式”的状态处理和命令式的状态处理也是如此。
It has struck me how the the solutions neatly fall into two classes -- irrespective of the language they are written in, be it a functional (such as Haskell, Erlang or Clojure) or imperative (Go, Ruby, Raku). Only now I realized that recursion is dual to iteration not just in theory. Like loops, it is low-level: selecting one apple at a time. Likewise is the `pure-functional' and imperative handling of state.
我们如何一次抓住所有苹果?就像在自然语言的学习和教学中一样,我们是否应该少关注语法,而更多地关注对话和讲故事?
How do we grasp all the apples at once? As in learning and teaching of natural languages, should we focus less on grammar and more on conversation and storytelling?
该演讲是在 2021年7月1日 的 IFIP WG 2.1 线上会议进行的。
The talk was given at an online meeting of IFIP WG 2.1 on July 1, 2021.
引用
References
[#258]
- March 3, 2025
- Oleg Kiselyov
引用 References [#258]
- March 3, 2025
- Oleg Kiselyov
Lobsters 上竞赛的讨论串。2021年2月。
<https://lobste.rs/s/zpqrpc/random_job_interview_challenge_clojure>
The Lobsters thread with the contest. February 2021.
<https://lobste.rs/s/zpqrpc/random_job_interview_challenge_clojure>
2. 竞赛问题
Contest problem
[jsr-000V]
2. 竞赛问题 Contest problem [jsr-000V]
Alexey Grigorev (@Al_Grigor) February 3, 2021 (Twitter)
大多数候选人都无法解决这个面试问题:
输入: "aaaabbbcca"
输出: [("a",4), ("b", 3), ("c", 2), ("a", 1)]
写一个将输入转换为输出的函数
我在视频面试中提出这个问题,给予 25 分钟时间
你会如何解决?
Most candidates cannot solve this interview problem:
Input: "aaaabbbcca"
Output: [("a",4), ("b", 3), ("c", 2), ("a", 1)]
Write a function that converts the input to the output
I ask it in the screening interview and give it 25 minutes
How would you solve it?
Alexey Grigorev (@Al_Grigor) February 3, 2021 (Twitter)
引用
References
[#288]
引用 References [#288]
该问题的说明引用自 Lobsters 上发布的网页:
A Random Job Interview Challenge In Clojure. Feb 4 2021 by Savo Djuric
<https://savo.rocks/posts/a-random-job-interview-challenge-in-clojure/>
The statement of the problem is quoted from the web page posted on Lobsters:
A Random Job Interview Challenge In Clojure. Feb 4 2021 by Savo Djuric
<https://savo.rocks/posts/a-random-job-interview-challenge-in-clojure/>
3. 解法样本
Sample solutions
[jsr-000W]
3. 解法样本 Sample solutions [jsr-000W]
提交到 Lobsters 供讨论的博客文章描述了在 Clojure 中的一个解决方案。但不知为何,这促使会员们用自己选择的编程语言来解决这个问题并发布结果。于是这就形成了一场即兴竞赛。
The blog post submitted to Lobsters for discussion described one solution, in Clojure. Somehow it prompted the members to solve the problem in their language of choice, and post the results. Hence there developed an impromptu contest.
以下是已发布的解法样本。这些解决方案由不同的人用不同的编程语言编写⸺然而不知为何,它们分成了两组,我暂时称之为 Group I 和 Group II。我稍后会告诉你这些组的意义⸺请让我卖个关子。
Below is a sample of posted solutions. They are written in various languages by various people -- yet somehow fall into two groups, which I call for now Group I and Group II. I'll tell what they mean only later -- I hope you'd allow me a bit of suspense.
3.1. Clojure (Group I) [#277]
3.1. Clojure (Group I) [#277]
(->> "aaaabbbcca"
(partition-by identity)
(map (juxt (comp str first) count)))
提交者还提供了另一个解法:“有点繁琐,但更清晰”:
(for [p (partition-by identity "aaaabbbcca" )]
[(-> p first str) (count p)])
The submitter also gave another solution, ``a bit chattier but also more legible:''
值得注意的是,
comp
是
函数组合
,而 juxt
粗略的类型是 (a -> b * a -> c) -> a -> (b * c)
,这是一个组合子,它接收一个函数二元组和一个值,对该值应用两个函数,然后返回由这两个函数的返回值组成的元组。
It is worth noting that comp is the functional composition, and juxt, with the rough type (a->b * a->c) -> a -> (b * c), is the combinator that takes a tuple of functions and a value, applies each function to it and returns the results as a tuple.
3.2. Ruby (Group I) [#278]
3.2. Ruby (Group I) [#278]
Ruby 有三种解法⸺都是单行的,而且很相似。第一个可能是最容易理解的。
There were three solutions in Ruby -- all one-liners, and rather similar. The first is probably the easiest to understand.
"aaaabbbcca".split("").chunk_while {|a,b| a==b}.map {|a| [a.first, a.size]}
"aaaabbbcca".chars.slice_when {|c,n| c != n }.map {|s| [s[0], s.size]}
'aaaabbbcca'.chars.chunk(&:itself).map{|x,y| [x, y.size]}
3.3. APL (Group I) [#279]
3.3. APL (Group I) [#279]
两个 APL 解法:
Two APL solutions:
(((1∘+ +\) ⊢ ≠ ¯1∘⌽) (((⊃,≢) ⊢) ⌸) ⊢) 'aaaabbbcca'
f←(⊃,≢)¨⊢⊆⍨1+(+\0,2≠/⊢) ⊢x←f'aaaabbbcca'
3.4. GO (Group II) [#280]
3.4. GO (Group II) [#280]
我知道这是一个 Clojure 的帖子,但是出于好奇,我用 Go 编写了它(我最近一直在使用 Go)。只是为了看看是否可以快速解决。我花了大约 10 分钟,减去 3-4 分钟与 runes 战斗的时间。 这不是特别优雅,但能工作。
``I know this is a Clojure post, but out of curiosity I wrote it in Go (what I've been working in lately) just to see if I could solve it quickly. Took about 10 minutes, subtract 3-4 for fighting with runes. It's not particularly elegant, but it works.''
type result struct {
letter string
count int
}
func main() {
const input = "aaaabbbcca"
var ret []result
currentLetter := string(input[0])
countCurrentLetter := 1
for _, elem := range input[1:] {
elemAsString := string(elem)
if currentLetter == elemAsString {
countCurrentLetter++
} else {
ret = append(ret, result{currentLetter, countCurrentLetter})
currentLetter = elemAsString
countCurrentLetter = 1
}
}
ret = append(ret, result{currentLetter, countCurrentLetter})
fmt.Printf("%+v", ret)
}
该解法与类似解法的显著特征是明确的迭代:
for
循环。
The salient feature of this and similar solutions is explicit iteration: the for loop.
3.5. Racket (Group II) [#281]
3.5. Racket (Group II) [#281]
(for/fold ([p #f]
[cnt 0]
[counts null]
#:result (cdr (reverse (cons `(,p ,cnt) counts))))
([c (in-string "aaaabbbcca")])
(if (equal? p c)
(values c (add1 cnt) counts)
(values c 1 (cons `(,p ,cnt) counts))))
该解法是纯函数式的⸺但与上面 GO 的解决方案非常相似。
The solution is pure functional -- yet closely resembling the Go solution above.
3.6. Raku (Group I) [#282]
3.6. Raku (Group I) [#282]
"aaaabbbcca".subst( /(.) $0*/ , {"({$0},{$/.chars})" }, :g )
Raku 是前身为 Perl 6 的艺术品。Perl 一直以其(非常)正则表达式而闻名,威力可见一斑。
Raku is the artist formerly known as Perl 6. Perl has always been known for its (ir)regular expressions, whose power is on display.
3.7. Erlang (Group II) [#283]
3.7. Erlang (Group II) [#283]
L = "aaaabbbcca",
lists:reverse(lists:foldl(
fun
(Elt, [ {[Elt], N} | R ]) -> [ {[Elt], N + 1} | R];
(Elt, Acc) -> [{[Elt], 1} | Acc]
end,
[],
L
)).
与 Racket 解法相似。我们看到了 Erlang
非线性模式匹配
的特征:在第一个模式中变量
Elt
出现了两次。
It is rather similar to the Racket solution. We see the characteristic for Erlang non-linear pattern matching: the variable Elt appears twice in the first pattern.
3.8. Python (Group I) [#284]
3.8. Python (Group I) [#284]
[(k, len(list(g))) for k, g in itertools.groupby("aaaabbbcca")]
3.9. Rust (Group II) [#285]
3.9. Rust (Group II) [#285]
fn run_length_encode(ins: &str) -> Vec<(char, usize)> {
let mut out = vec![];
let mut i = ins.chars();
if let Some(mut c) = i.next() {
let mut count = 1;
for new_c in i {
if new_c == c {
count += 1;
} else {
out.push((c, count));
count = 1;
c = new_c;
}
}
out.push((c, count));
}
out}
就像 GO 的解法一样⸺但是更干净,并且更好地处理边缘情况。作者确实掌握了问题的本质:
游程编码
。
It is quite like the solution in Go -- but cleaner and with a better handling of edge cases. The author did grasp the nature of the problem: run-length encoding.
3.10. Julia (Group II) [#286]
3.10. Julia (Group II) [#286]
function rle(itr)
acc = Tuple{eltype(itr), Int}[]
x = iterate(itr)
isnothing(x) && return acc # iterable is empty
last, state = x
n = 1
for v in Iterators.rest(itr, state)
if last == v
n += 1
else
push!(acc, (last, n))
last = v
n = 1
end
end
push!(acc, (last, n))
return acc
end
镜像了上面的 Rust 解法。
It mirrors the Rust solution above.
3.11. Haskell (Group II) [#287]
3.11. Haskell (Group II) [#287]
main = interact soln
soln = show . reverse . go []
go counts [] = counts
go counts (x:xs) = go (inc x counts) xs
inc c ((x,ct):xs) | c == x = (x,ct+1):xs
inc c xs = (c, 1):xs
尽管该解法依赖于显式递归而不是
fold
,但它与 Erlang 解法类似。(后来有另一个人提到了另一种使用 group
的解法)。
Although this solution relies on the explicit recursion rather than fold, it is rather similar to the Erlang solution. (A different person later mentioned another solution, using group).
4. 评论
Comments
[jsr-000X]
4. 评论 Comments [jsr-000X]
Lobsters 竞赛的一个特色是参与者对自己和他人的解法的评论。以下是几条值得注意的评论,它们进一步证明了我在本次演讲中的观点。
A particular feature of the Lobsters' contest is comments of the participants, about their and others' solutions. Below are several notable comments, which go on to make my point in this talk.
一位评论者想知道:
One commenter wondered:
阅读 Clojure 解法需要跟着作者的思维过程。阅读等效的 Go 解法几乎没有认知开销。
Clojure 程序员如何编写代码,才能不仅在作者眼中简洁优雅,而且还可以让其他人快速阅读而无需花费时间进行脑力劳动?``Reading the Clojure solution involves following the thought process of how the author ended up there. Reading the equivalent Go solution has very little cognitive overhead.
How does Clojure programmers write code that is not just brief and elegant in the eyes of the author, but can also be read quickly by others without time consuming mental gymnastics?''
另一人回答:
and another replied:
我想这只是你的习惯的问题。我发现上面的两种 Go 解法都很难理解,因为你必须遵循算法并在脑海中构建它的功能。而在大多数 Clojure 解法中,代码会直接告诉你发生了什么。
I guess it's just a matter of what you're used to. I find both Go solutions above arduous to understand because you have to follow the algorithm and construct what it functionally does in your head. Whereas in most of the clojure solutions the code directly tells you what's happening.
例如,带有注释的最高帖子:
Eg. the top post with comments:
; threading macro (take input, then do stuff to it in steps)
(->> "aaaabbbcca"
; split into lists of consecutive same character
(partition-by identity)
; map each item (list of chars) with a function
(map
; return list of outputs of functions
(juxt
; first character as string
(comp str first)
; count of characters
count)))
该信息更加密集。如果你擅长阅读它,你可以一目了然地理解它。
The information is much more dense. If you are good at reading it, you understand it at a glance.
我相信 APL 解法也是如此。我看不到它们,但是确实有清楚看到算法的人。
I believe the same is true for the APL solutions. I can't read them, but someone who does sees the algorithm clearly.''
另一位评论者同意:
Yet another commenter concurred:
我也是;他们的[Go 解法]似乎超级繁琐和低级。就像你给要编译器灌输基本的东西,而不是自然地描述算法一样。
``Same here; they [Go solutions] just seem super tedious and low-level; like you're spoon-feeding the compiler on basic things instead of just describing the algorithm naturally.
如果你没有词汇来描述算法,那么当然低级版本会更容易阅读,就像刚刚学习英语的人会发现“简化英语”维基百科比经常使用专业术语的常规维基百科更容易理解一样。但如果你每天都使用算法,你就应该投资学习这些术语!
If you don't have the vocabulary to describe an algorithm, of course the low-level version is going to be easier to read, much like someone who is only just learning English will find the `Simplified English' wikipedia more accessible than the regular one that often uses specialized jargon. But if you work with algorithms every day, you owe it to yourself to invest in learning the jargon!''
5. 苹果类比
The apples analogy
[jsr-000Y]
5. 苹果类比 The apples analogy [jsr-000Y]
我已经提到了几次苹果。以下引用能说明一切。
I have mentioned apples several times already. The following quote explains what is it all about.
假设我们有一箱苹果,想从中挑选出所有好苹果。我们不想自己做这个任务,所以我们写了一张清单,要求助手来完成。
``Suppose we have a box of apples from which we wish to select all of the good apples. Not wishing to do the task ourselves, we write a list of instructions to be carried out by a helper.
常规语言的指令可能表达如下内容:从箱子里选一个苹果。如果是好的,就把它放在为好苹果保留的地方;如果不好,就把它扔掉。再选一个苹果;如果是好的,就把它放在保留的地方;如果不好,就把它扔掉…继续以这种方式依次检查每个苹果,直到选出所有好苹果。总之,从我们拿起的第一个苹果开始,一次检查一个苹果,直到检查到箱子里的最后一个苹果,直到我们选出并把所有好苹果都放在一边。
The instructions corresponding to a conventional language might be expressed something like the following: Select an apple from the box. If it is good, set it aside in some place reserved for the good apples; if it is not good, discard it. Select a second apple; if it is good put it in the reserved place, and if it is not good discard it.... Continue in this manner examining each apple in turn until all of the good apples have been selected. In summary, then, examine the apples one at a time starting with the first one we pick up and finishing with the last one in the box until we have selected and set aside all of the good apples.
另一方面,与数组语言相对应的指令可以简单地说明“从箱子里选择所有好苹果”。
On the other hand the instructions corresponding to an array language could be stated simply as ``Select all of the good apples from the box.'' Of course, the apples would still have to be examined individually, but the apple-by-apple details could be left to the helper.
可以考虑常规的编程语言是以机器语言为原始形式的“
一次一苹果
”语言,而数组语言可以视为“
一次所有苹果
”语言。
Conventional programming languages may be considered then ``one-apple-at-a-time'' languages with machine languages being a very primitive form, whereas array languages may be considered ``all-the-apples-at-once'' languages.''
2005年10月,
基思·斯迈利
:《My Life with Array Languages》。这个比喻要感谢那个以写出《人月神话》 而闻名的
弗雷德
·P·
布鲁克斯
。
Keith Smillie: ``My Life with Array Languages''. October 2005. This analogy is credited to Frederick P. Brooks: that Fred Brooks, of the Mythical Man-Month fame.
(译者按:《My Life with Array Languages》 原文
The apple analogy originated with Frederick P. Brooks who worked closely with Ken Iverson in the early days of the development of APL.原翻译有错,感谢 阅卜录 指出。)
6. 观察
Observations
[jsr-000Z]
6. 观察 Observations [jsr-000Z]
在一场不受控制的竞赛中,仅凭几个针对单个问题的解法,我们无法得出任何统计结论。但仍有足够的数据供我们观察和思考。第一个观察结果是,人们成功地使用了多种语言来解决问题,提交者自述大约花了 5-10 分钟。提交的解法似乎也很地道,反映了“社区标准”,这些语言教学(或自学)用于工业家的使用方式。
A few solutions to a single problem in an uncontrolled contest do not let us draw any statistical conclusions. Still there is enough data for observations and reflections. The first observation is that quite a variety of languages were used to solve the problem, successfully, and it took the submitters, by their self-reports, about 5-10 minutes. The submitted solutions also seem idiomatic, reflecting the `community standards', the way these languages are (self-) taught and used in the industry.
第二个令人惊讶的观察结果是,提交的解法分为两类:
The second, surprising observation is that the submitted solutions fall into two classes:
逐字符扫描输入,构造结果: Group II
将问题视为 groupBy,然后
映射
:Group I
scan the input string char by char, building the result: Group II
see the problem as groupBy followed by mapping: Group I
这是编写各组解法的语言:
Here are the languages used to write the solutions in each group:
- Group I: Clojure, Ruby, Raku, Python, Haskell, APL
- Group II: Go, Racket, Erlang, Rust, Julia, Haskell
无论选择哪种方式分类编程语言⸺函数式与命令式、旧式与新型、由“业余爱好者”创建与由“专业人士”创建⸺这两类语言中都有表现。
Whichever the classification of programming languages one may choose -- functional vs. imperative, old vs. new, created by `amateurs' vs. `professionals' -- there is a representative in both groups.
看看 Racket 和 Rust(或 Go、Julia)的解法,可以得出另一个关于
状态
的结论。尽管 Racket 解法是“纯函数式”的,但它与 Rust 和 Go 解法一样拥有状态。状态数量相同。Racket 解法中的状态使用
values
子句一次性更新;更新按位置而不是按名称与其目标匹配,这很容易出错。(顺便说一句,Rust 和 Julia 对边缘情况的处理比 Racket 或 Go 解法更清晰,显然更正确。)
Looking at the Racket and Rust (or Go, Julia) solutions leads to another conclusion, about state. Although the Racket solution is `pure functional', it is just as stateful as Rust and Go solutions. There is the same amount of state. The state is updated in the Racket solution at once, with the values clause; the update is matched to its target by position rather by name, which is quite error prone. (As a side observation, the handling of edge cases is clearer and obviously correct in Rust and Julia than in the Racket or Go solutions.)
因此,“纯函数式”并不会消除状态;它只是改变了处理状态的方式⸺这可能或多或少更清晰和方便。毫无疑问,改进状态处理的方法是将其划分并封装到“
更高级
”、
可组合
的操作中,如“groupBy”:参见 Clojure、Ruby、Python 解法。groupBy 本身可以实现可变或不可变状态⸺无论怎样,状态都被抽象出来,groupBy 的用户不必关心它。
Thus being `pure functional' does not eradicate state; it merely changes the way state is handled -- which could be either more or less clearer and convenient. What unequivocally improves the handling of the state is partitioning and encapsulating it into `higher-level', composable operations like `groupBy': see Clojure, Ruby, Python solutions. The groupBy may itself be implemented with mutable state or without -- no matter, the state is abstracted away and the user of groupBy does not have to care about it.
7. 惊喜
Pleasant surprises
[jsr-0010]
7. 惊喜 Pleasant surprises [jsr-0010]
第一个惊喜是 Group I(即使用
一次所有苹果
解法)的人数众多⸺而且当今使用的语言也具有足够的能力来表达这些解决方案。
The first pleasant surprise was the notable number of Group I, that is, all-the-apples-at-once solutions -- and the number of languages in use today with the adequate facilities to express these solutions.
引发此次竞赛的 Savo Djuric 的博客文章也描述了一个解法,值得引用。
The blog post by Savo Djuric that prompted the contest has also described a solution, which is worth quoting.
由于我最近才开始学习 Clojure,而且我经常做 4clojure 练习,所以这个问题激起了我的兴趣,我决定立即尝试解决它。我启动了我的 REPL,几分钟后,我写出了:
Since I recently started learning Clojure, and I regularly do 4clojure exercises, this problem seemed intriguing, so I decided to try to solve it immediately. I fired up my REPL, and within several minutes i came up with:(interleave (map str (map first (partition-by identity "aaaabbbcca"))) (map count (partition-by identity "aaaabbbcca"))) => ("a" 4 "b" 3 "c" 2 "a" 1)
耶!这有用!但它并不完全符合要求。可以说,它也不是很“干净”。这个解法最让我困扰的是重复的代码,所以我决定用 let:... 来解决这个问题。
Yaay! It works! But it is not exactly what was asked for. Nor is it very ‘clean’, so to say. What bothers me the most in this solution is that i have repetitive code, so I decided to solve that with let:...(defn chop-chop [coll] (let [x (partition-by identity coll)] (map list (map (comp str first) x) (map count x))))
我必须说,在几分钟之内解决问题感觉很棒(尽管这并不难),因为我只是刚开始 Clojure 的旅程。
I must say that it felt great to solve a problem within several minutes (albeit it was not that hard), since I'm only at the beginning of my journey with Clojure.
这篇博文展示了我认为的解决问题的理想方法:把握问题的本质⸺
groupBy
,或 Clojure 中的partition-by
⸺设计解法原型,理解其缺陷并希望做得更好,采取措施改进。我认为作者作为程序员前途光明。
The blog post shows the ideal, to me, way to solve the problem: grasping its nature -- groupBy, or partition-by in Clojure, -- prototyping the solution, understanding its imperfections and wanting to do better, taking steps to improve. I think the author has a bright future as a programmer.
我从他的博客中得知,作者从未上过大学,偶然进入编程领域(但受到他才华横溢的兄弟的启发),并且一直在自学编程语言(现在是 Clojure)。也许这就是(他能做到这件事的)秘密。
From his blog I learned that the author never went to college, got into programming by accident (but prompted by his talented brother), and has been studying programming languages (now, Clojure) entirely on his own. May be that's the secret.
另一个惊喜是看到了我眼中理想的解法:
Another pleasant surprise was seeing the solution that in my eyes is ideal:
[(k, len(list(g))) for k, g in itertools.groupby("aaaabbbcca")]
这个 Python 解法本质上是用常规的数学符号来表达的,每个人都可以读懂(与 APL 不同)。更重要的是,它在语言上很经济⸺同样,与 APL 不同。它只使用两个直接相关的功能:
groupBy
和 comprehension。符号在这里显然是思维的工具。
This Python solution is expressed in what is essentially the conventional mathematical notation, readable by everyone (unlike APL). More importantly, it is linguistically economical -- again, unlike APL. It uses just the two directly relevant facilities: groupBy and comprehension. Notation is clearly here the tool of thought.
8. 结论:编程和自然语言,软件和文学 [jsr-0011]
8. 结论:编程和自然语言,软件和文学 [jsr-0011]
前面引用的一条评论一针见血:要描述一种算法,就需要
词汇表
。编程的很大一部分是
构建
(
学习
或
构造
)词汇来讨论常见的(子)问题,例如
游程编码
、groupBy、
映射
、
累积
、
过滤
等。学习、寻找和命名“正确”的抽象是数学的核心活动⸺正如“符号作为思维工具”中所阐述的那样。
A comment cited earlier hits the nail on the head: to describe an algorithm, one needs vocabulary. A large part of programming is building -- learning or constructing -- vocabulary to talk about common (sub)problems, e.g., run-length-encoding, groupBy, mapping, accumulating, filtering, etc. Learning, finding and naming the `right' abstractions is the core mathematical activity -- as explicated in the `Notation as a tool of thought'.
自然语言也是一种思维工具。建立词汇表并理解何时以及如何使用某个词是学习一门语言最难的部分。自然语言和形式语言有很多共同之处,这一观点由来已久⸺
蒙塔古
在 20 世纪 70 年代就曾对此作出著名而有力的阐述。在之前引用过的
基思·斯迈利
关于数组语言的论文中,他进一步指出,编程教学应该借鉴外语教学最好的理念。
A natural language is also a tool of thought. Building vocabulary and understanding when and how to use a particular word is the hardest part of learning a language. That natural and formal languages have a lot in common is the idea with a long pedigree -- famously and forcefully articulated by Montague in 1970s. Keith Smillie, in his paper about array languages cited earlier further argues that teaching programming ought to adapt the best ideas of teaching foreign languages.
无论是编程语言还是自然语言,人们都可以相对较快地掌握基础知识。流利掌握则需要几十年的时间。似乎没有什么比不断练习、说和听、写和重写更好。我想起
欧内斯特·海明威
和
乔治·普林普顿
(《Interviewer》杂志的创始人和编辑)之间那次著名访谈的结尾(《
巴黎评论
》第 18 期,1958 年春季):
Both in programming and natural languages, one can acquire basics relatively quickly. Fluency takes decades. There doesn't seem to be anything better than constant practice, speaking and listening, writing and rewriting. The ending of the famous interview between Ernest Hemingway and George Plimpton (the `Interviewer', the magazine's founder and editor) (Paris Review, issue 18, Spring 1958) comes to mind:
采访者 你在读到前一天停下的地方时会重写吗?还是等到全部完成后再重写?
海明威 我总是每天重写到我停下的地方。都完成后,你自然会再看一遍。在别人打字时,你会在干净的排版中看到它,这也是另一个修改和重写的机会。在校样时你还有最后的机会。你会对这几次不同的机会心怀感激。
采访者 你进行了多少次重写?
海明威 看情况。《 永别了,武器 》的结尾,也就是最后一页,我重写了三十九次才满意。
采访者 是有什么技术问题吗?什么问题难住了你?
海明威 推敲用词。
INTERVIEWER Do you do any rewriting as you read up to the place you left off the day before? Or does that come later, when the whole is finished? HEMINGWAY I always rewrite each day up to the point where I stopped. When it is all finished, naturally you go over it. You get another chance to correct and rewrite when someone else types it, and you see it clean in type. The last chance is in the proofs. You’re grateful for these different chances. INTERVIEWER How much rewriting do you do? HEMINGWAY It depends. I rewrote the ending to Farewell to Arms, the last page of it, thirty-nine times before I was satisfied. INTERVIEWER Was there some technical problem there? What was it that had stumped you? HEMINGWAY Getting the words right.
9. 尾声:我的“苹果流”解决方案 [jsr-0012]
9. 尾声:我的“苹果流”解决方案 [jsr-0012]
我该如何解决这个问题?我第一次看到它时,我脑海中浮现的第一个想法是:
groupBy
,这是对序列中连续元素进行分组的操作的通用名称。然后我们需要以 (group-element, counter) 对的形式呈现这些组。
How would I solve the problem? When I first saw it, the very first thought that came to mind was: groupBy, which is the common name for the operation to group consecutive elements of a sequence. We then need to present the groups in the form of (group-element, counter) pairs.
为了正确性(以及其他很快就会清楚的原因),我将使用 OCaml。假设我们有以下操作
I will use OCaml for concreteness (and for other reasons that should be clear soon). Suppose we have the operation
group : 'a list -> 'a list list
它将列表中(根据某些标准)相等的连续元素分组到它们自己的列表中,并返回这些组的列表(组不能为空)。然后,该问题解法的第一个近似写法如下,非常接近上面的第一个 Clojure 解决方案:
which groups equal (according to some criterion) consecutive elements of a list in their own list, and returns the list of such groups. (A group cannot be empty). Then the first approximation to the solution is as follows, quite close to the first Clojure solution above:
let rll (group : char list -> char list list) : char list -> (char * int) list =
group >> List.(map (fun g -> (hd g, length g)))
类型注释是可选的,但我倾向于写下它们(一个使用 Ocaml 的理由就是类型:不花哨,但没有过多细节,足够可视地展示数据的转移)。由于组是假设的,它作为参数(parameter/argument)出现。这里的
(>>)
是从左到右的函数组合(就像 F#)。将输入视为
序列
(例如列表)很方便。但字符串在 OCaml 中不是列表⸺不过可以借助几个标准库函数轻松地将其转换为列表:
Type annotations are optional, but I prefer to write them. (One reason to use OCaml is types: not fancy, but sufficient to visualize data transformations, without too many details.) Since group is the assumption, it appears as the parameter (argument). Here, (>>) is the left-to-right functional composition (as in F#). It is convenient to think of input as a sequence, such as list. But a string is not a list in OCaml -- but can be easily turned into one, with the help of a couple standard library functions:
let rll (group : char list -> char list list) : string -> (char * int) list =
String.to_seq >> List.of_seq >> group >> List.(map (fun g -> (hd g, length g)))
使用 APL 中经常使用的 fork 组合器(但是 left-to-write 版本):
With the fork combinator, so frequent in APL (but a left-to-write version):
let fork : ('a -> 'b) -> ('a -> 'c) -> ('b -> 'c -> 'd) -> 'a -> 'd = fun f g h -> fun x -> h (f x) (g x)
这个解法甚至可以写得更清楚些:
the solution can be written even cleaner:
let rll (group : char list -> char list list) : string -> (char * int) list =
String.to_seq >> List.of_seq >> group >> List.(map (fork hd length pair))
我们只需要
group
本身的实现即可运行此代码。可惜的是,OCaml 标准库目前不提供分组操作。我当然可以编写它。但让我更详细地重新表述这种方式。
We only need the implementation of group itself to run this code. Alas, the OCaml standard library at present does not provide a grouping operation. I can of course write it. But let me rephrase this approach in more detail.
我们从头开始,逐步开发解法,强调越来越紧地“
抓住
”数据流。重点在于理解数据流,而不是编写循环或递归函数,与终止条件和边缘情况作斗争。我们也不会沉迷于状态:它在需要时自然而然地出现。分组也是自然而然的,而且更简单。当我们找到解法时,只要一看就能知道它一定是正确的。确实如此。它也相当高效。它甚至可以稍后机械地重铸成 C 以提高效率⸺不是通常手写的那种 C,而是正确且没有问题的。
Let us start from scratch and develop the solution in smaller steps, emphasizing tighter and tighter `grasping' of the data flow. The stress is on understanding data flow, rather than on writing loops or recursive functions and struggling with termination conditions and edge cases. We won't obsess about state either: it comes naturally when needed. Grouping also comes naturally, and simpler. When we reach the solution, one look at it will tell that it must be right. It is. It is also reasonably efficient. It can even be later mechanically recast into C for better efficiency -- not the sort of C one typically writes by hand, but correct and problem-free.
首先从问题陈述中,我们很快就能明白我们面临的是一个
顺序
问题:将字符序列转换为其他序列(成对的),这是
局部
进行的。实际上,向示例输入
"aaaabbbcca"
添加更多字符不会影响示例输出的前缀 ("a",4), ("b", 3), ("c", 2)
。一旦理解了这个顺序特性,其余的东西几乎是自然而然的。
First, from the problem statement one quickly understands that we are facing a sequential problem: transforming a sequence of characters into some other sequence (of pairs), locally. Indeed, appending more characters to the sample input "aaaabbbcca" will not affect the prefix ("a",4), ("b", 3), ("c", 2) of the sample output. Once this sequential character is understood, the rest follows almost automatically.
为了思考和开发解法,我们需要一个
词汇表
,我们在过程中不断建立它。首先,我们需要一个表示序列的词。我们称之为
'a seq
:元素类型为 'a
的序列类型。现在我们先不要担心它是如何实现的,而是将其视为抽象类型。由于输入是字符串,我们显然需要操作
To think about and develop the solution, we need a vocabulary, which we build up as we go along. First we need a word for a sequence. Let's call it 'a seq: a type of a sequence of elements of type 'a. Let us not worry for now how it is actually implemented and treat it as an abstract type. Since the input is given as string, we obviously need the operation
of_string : string -> char seq
来将字符串
转换
为(或
渲染
为)字符序列。同样假设它已给定。再次重申,现在重要的是代表数据流的类型。
to transform (or, render) a string as a sequence of characters. Assume it is given. Again, what is important for now is the type, which represents the data flow.
接着我们必须处理如何将相邻字符分组在一起,这本质上是
有状态的
:字符属于哪个组取决于上下文,即取决于之前或之后的字符。我们采取后者,假设操作
Next we have to deal with grouping of neighboring characters, which is inherently stateful: which group the character belongs to depends on the context, that is, on the character that came before, or follows next. Let's go with the latter and assume the operation
look_ahead : 'a seq -> ('a * 'a option) seq
将序列的当前元素与下一个元素配对。下一个元素可能不存在(如果序列结束),所以使用
'a option
,允许缺失值。
that pairs the current element of sequence with the element that comes next. The next element may be absent (if the sequence is about to end), hence the 'a option type, allowing for the missing value.
然后,通过查看下一个元素,分组本身会插入分组中断:
The grouping itself is then inserting group breaks, determined by peeking at the next element:
type 'a annot = Plain of 'a | Break of 'a
let group : ('a * 'a option) seq -> 'a annot seq =
map @@ function
| (x,Some next) when x = next -> Plain x
| (x,_) -> Break x
也就是说,我们将
前瞻序列
转换为用
组边界
注释的元素序列:
Plain x
表示当前组继续,Break x
表示 x
是其组中的最后一个元素。如果有下一个元素,开始一个不同的组。('a annot
类型可以以不同的方式实现,比如一个对 'a * bool
。)我们还假设
映射
操作存在
That is, we transform a looked-ahead sequence into a sequence of elements annotated with group boundaries: Plain x means the current group continues, Break x means x is the last element in its group. The next element (if any) begins a different group. (The 'a annot type could have been realized differently, as a pair 'a * bool.) We have also assumed the existence of the mapping operation
map : ('a -> 'b) -> 'a seq -> 'b seq
最后,我们必须计数,这再次需要状态(当前组中元素的数量)并在组边界
发出
元组:
Finally, we have to count, which again requires state (the count of elements in the current group) and emit the tuples at the group boundary:
let count : 'a annot seq -> ('a * int) seq =
map_accum_filter 0 @@ fun cnt -> function
| Plain x -> (None, cnt+1)
| Break x -> (Some (x,cnt+1), 0)
分组中断标记强制输出(描述刚刚结束的组)并重置计数器。我们假设存在 accumulating map-filter,流处理中相当流行的操作:
The group break forces the output (describing the group that has just ended) and resets the counter. We assumed the existence of accumulating map-filter, a rather popular operation in stream processing:
map_accum_filter : 'state -> ('state -> 'a -> 'b option * 'state) -> 'a seq -> 'b seq
映射
('state -> 'a -> 'b option * 'state)
接受当前状态和当前元素,并返回可能转换的元素和新状态的元组。如果可能转换的元素为 None,则输出流中不会发出任何内容;但状态会继续前进。从类型来看,map_accum_filter
看起来是“
纯
”的,是函数式的状态传递。记住这个想法。
The mapping ('state ->'a -> 'b option * 'state) takes the current state and the current element and returns a tuple of a possibly transformed element and the new state. If the possibly transformed element is None, nothing is emitted in the output stream; the state advances nevertheless. From the type of it, map_accum_filter looks `pure', with functional-state passing. Hold that thought.
就是这样。问题的解法,即游程编码的字符序列,就是我们到现在开发的操作的组合:
That is it. The solution to the problem, the run-length-encoded sequence of characters, is then the composition of what we have developed so far:
let rll = of_string >> look_ahead >> group >> count
它以简洁且明显正确的方式表达了问题。我们只需要在链中添加一个观察操作,将结果
(char * int) seq
转换为列表,或者直接将其打印出来。
which expresses the problem in the concise and obviously correct way. We only need to add to our chain an observation operation, to transform the resulting (char * int) seq to a list, or to print it out.
我们解法的一个优点是易于修改。例如,如果源字符串是 UTF-8 编码的,我们只需要在管道中添加一个操作:UTF-8 解码和
字素聚类
:
One advantage of our solution is the ease of modification. For example, if the source string is UTF-8 encoded, we only need to add to the pipeline one more operation: UTF-8 decoding and grapheme clustering:
utf_decode : char seq -> uchar seq
留给读者作为练习。
We leave this as an exercise to the reader.
要真正运行
rll
程序,我们必须履行承诺:为 'a seq
选择一个具体表示并实现我们为其假设的操作。用 OCaml 术语来说,我们必须实现以下签名:
To actually run the rll program, we have to fulfill our promises: to pick a concrete representation for 'a seq and implement the operations that we assumed for it. In OCaml terms, we have to implement the following signature:
module type simple_seq = sig
type 'a seq
val of_string : string -> char seq
val map : ('a -> 'b) -> 'a seq -> 'b seq
val look_ahead : 'a seq -> ('a * 'a option) seq
val map_accum_filter : 'state -> ('state ->'a -> 'b option * 'state) -> 'a seq -> 'b seq
val iter : ('a -> unit) -> 'a seq -> unit
val to_list : 'a seq -> 'a list
end
事实证明,OCaml 标准库几乎拥有我们需要的一切:
Seq.t
数据类型以及映射、字符串/列表转换等操作。缺少的是 map_accum_filter
和 look_ahead
。不过,标准库提供了序列上的 filter_map
。我们要做的只是添加状态,实现为可变状态:
It turns out that the OCaml standard library has almost everything we need: the Seq.t data type and the mapping, string/list conversion, etc. operations on it. What is missing is map_accum_filter and look_ahead. However, the standard library provides filter_map on sequences. We merely need to add state, realized as mutable state:
let map_accum_filter : 'state -> ('state -> 'a -> 'b option * 'state) -> 'a seq -> 'b seq =
fun init f seq ->
let stat = ref init in
seq |> Seq.filter_map (fun a -> let (bo,st) = f !stat a in stat := st; bo)
总体而言,
map_accum_filter
确实有一个具有显式
状态传递
的“函数式”接口;但实现是命令式的。明确的规范和快速的实现之间没有内在冲突。剩下的 look_ahead
很容易通过 map_accum_filter
来表达(我们将实现留给读者⸺或者参见随附的代码)。
The overall map_accum_filter does have a `functional' interface with the explicit state-passing; the implementation is imperative however. There is no inherent conflict between a clear specification and a fast implementation. The remaining look_ahead turns out easily expressed via map_accum_filter (we leave the implementation to the reader -- or see the accompanying code).
回顾我们的解法,我们注意到所有状态最终都被封装了。
前瞻
的状态被封装在
look_ahead
中。分组本身变成了无状态的。计数需要多一个状态。每个操作都保留其所需的状态,而无需了解或干涉其他操作的状态。
Looking back at our solution, we notice that all state ends up encapsulated. The state for look-ahead is encapsulated in look_ahead. The grouping per se turned out stateless. Counting requires one more piece of state. Each operation keeps the state it needs, without knowing or interfering with the state of others.
我们还注意到这里没有循环或递归函数。我们也不必处理终止条件。这是所有 Group I 解决方案的特点,在 APL 中明确阐述:指定如何更改集合的内容或其形状,但不要浪费时间思考如何获取元素或将它们放在哪里。
We also notice that there are no loops or recursive functions. Neither did we have to deal with termination conditions. That is the feature of all Group I solutions, clearly articulated back in APL: specify how to change the content of a collection or its shape, but don't waste time thinking how to get the elements or where to put them.
我们的
'a seq
也可以实现为
迭代器
/
生成器
,通过 for
循环进行组合,拥有良好的性能。即使是 OCaml 的 Seq.t
也设计为了一定的
融合
(尽管不完整:仍有中间数据结构,但大小恒定)。但如果你使用如 strymonas 这样的库就可以实现完美的融合,还能生成C代码。
Our 'a seq can also be realized as iterator/generator, to be composed via for-loops -- with good performance. Even OCaml Seq.t fuses by design (albeit incompletely: there are still intermediate data structures, but of constant size). Perfect fusion is achievable, if one uses, say, strymonas, which also permits the generation of C code. Here it is.
void rll(const int64_t * aa_1,const int64_t al_2)
{
int64_t v_3 = 0;
int64_t v_4 = 0;
bool v_5 = 0;
int64_t v_6 = 0;
while (v_6 <= al_2)
{
int64_t t_7;
t_7 = v_6;
v_6++;
if (t_7 < al_2)
{
int64_t t_9;
t_9 = aa_1[t_7];
if (v_5)
{
int64_t t_10;
t_10 = v_4;
v_4 = t_9;
if (!(t_10 == t_9))
{
int64_t t_11;
t_11 = v_3;
v_3 = 0;
printf("%ld\n",(long)t_10);
printf("%ld\n",(long)(t_11 + 1));
}
else {
v_3++;
}
}
else {
v_4 = t_9;
v_5 = 1;
}
}
else {
if (v_5)
{
int64_t t_8;
t_8 = v_3;
v_3 = 0;
printf("%ld\n",(long)v_4);
printf("%ld\n",(long)(t_8 + 1));
}
}
}}
还有一件事:第一次尝试就成功了。
One more thing: it all worked on the first try.
引用
References
[#276]
引用 References [#276]
apples.ml [4K]
完整的源码
apples.ml [4K]
The complete source code
apples.c [2K]
strymonas 解法:生成出的 C 代码。有关生成器,请参阅 strymonas 发行中的 examples/run-length-encoding
目录。
apples.c [2K]
The strymonas solution: generated C code. For the generator, see the directory examples/run-length-encoding in the strymonas distribution.
Incremental, Linear Pretty-printing
分步式
的
管道
开发可以走得很远。上述参考资料开发了一种相当
不平凡
的算法,具有最理想的、难以实现的性能。
Incremental, Linear Pretty-printing
The stepwise pipeline development can be taken quite far. The above reference develops a rather non-trivial algorithm with the optimal, difficult to achieve performance.