Baby’s First Garbage Collector › 清楚清除
Sweepy sweep
[#269]
Baby’s First Garbage Collector › 清楚清除 Sweepy sweep [#269]
下一阶段是清除我们分配的所有对象并释放任何未标记的对象。但这里有一个问题:根据定义,所有未标记的对象都是无法访问的!我们无法获取他们!
The next phase is to sweep through all of the objects we’ve allocated and free any of them that aren’t marked. But there’s a problem here: all of the unmarked objects are, by definition, unreachable! We can’t get to them!
我们的虚拟机已经实现了对象引用的语言语义,因此我们只在变量和对的字段中存储指向对象的指针。一旦其中一个对象不再指向某个对象,虚拟机就完全丢失了它,并且实际上泄漏了内存。
The VM has implemented the language’s semantics for object references, so we’re only storing pointers to objects in variables and the pair fields. As soon as an object is no longer pointed to by one of those, the VM has lost it entirely and actually leaked memory.
解决这个问题的技巧是,虚拟机可以拥有自己的对象引用,与语言用户可见的语义不同。换句话说,我们可以自己追踪它们。
The trick to solve this is that the VM can have its own references to objects that are distinct from the semantics that are visible to the language user. In other words, we can keep track of them ourselves.
最简单的方法是维护我们分配的每个对象的链表。我们将
Object
本身扩展为该链表中的一个节点:
The simplest way to do this is to just maintain a linked list of every object we’ve ever allocated. We extend Object itself to be a node in that list:
typedef struct sObject {
/* The next object in the list of all objects. */
struct sObject* next;
/* Previous stuff... */
} Object;
虚拟机追踪该链表的头节点:
The VM keeps track of the head of that list:
typedef struct {
/* The first object in the list of all objects. */
Object* firstObject;
/* Previous stuff... */
} VM;
在
newVM()
中,我们确保将 firstObject
初始化为 NULL
。每当我们创建一个对象时,我们都会将其添加到链表中:
In newVM() we make sure to initialize firstObject to NULL. Whenever we create an object, we add it to the list:
Object* newObject(VM* vm, ObjectType type) {
Object* object = malloc(sizeof(Object));
object->type = type;
object->marked = 0;
/* Insert it into the list of allocated objects. */
object->next = vm->firstObject;
vm->firstObject = object;
return object;
}
这样,即使语言找不到对象,但语言实现仍然可以。要扫描并删除未标记的对象,我们遍历列表:
This way, even if the language can’t find an object, the language implementation still can. To sweep through and delete the unmarked objects, we traverse the list:
void sweep(VM* vm)
{
Object** object = &vm->firstObject;
while (*object) {
if (!(*object)->marked) {
/* This object wasn't reached, so remove it from the list
and free it. */
Object* unreached = *object;
*object = unreached->next;
free(unreached);
} else {
/* This object was reached, so unmark it (for the next GC)
and move on to the next. */
(*object)->marked = 0;
object = &(*object)->next;
}
}
}
由于指针指向指针,该代码读起来有点棘手,但如果你仔细阅读它,你会发现它非常简单。它只是遍历整个链表。每当它遇到未标记的对象时,它就会释放其内存并将其从链表中删除。完成操作后,我们会删除所有无法访问的对象。
That code is a bit tricky to read because of that pointer to a pointer, but if you work through it, you can see it’s pretty straightforward. It just walks the entire linked list. Whenever it hits an object that isn’t marked, it frees its memory and removes it from the list. When this is done, we will have deleted every unreachable object.
恭喜!我们有了一个垃圾回收器!只剩下最后一个碎片:实际调用它。首先让我们将这两个阶段结合在一起:
Congratulations! We have a garbage collector! There’s just one missing piece: actually calling it. First let’s wrap the two phases together:
void gc(VM* vm) {
markAll(vm);
sweep(vm);
}
你找不到更清晰的标记清除实现了。最棘手的部分是要弄清楚什么时候真正调用它。“内存不足”到底意味着什么,尤其是在虚拟内存接近无限的现代计算机上?
You couldn’t ask for a more obvious mark-sweep implementation. The trickiest part is figuring out when to actually call this. What does “low on memory” even mean, especially on modern computers with near-infinite virtual memory?
事实证明,这里没有精确的正确或错误答案。这实际上取决于你使用虚拟机的目的以及它运行的硬件类型。为了使这个示例简单,我们将在一定次数的分配后进行回收。这实际上就是某些语言实现的工作原理,而且很容易实现。
It turns out there’s no precise right or wrong answer here. It really depends on what you’re using your VM for and what kind of hardware it runs on. To keep this example simple, we’ll just collect after a certain number of allocations. That’s actually how some language implementations work, and it’s easy to implement.
我们扩展
VM
来追踪我们创建了多少个:
We extend VM to track how many we’ve created:
typedef struct {
/* The total number of currently allocated objects. */
int numObjects;
/* The number of objects required to trigger a GC. */
int maxObjects;
/* Previous stuff... */
} VM;
然后初始化它们:
And then initialize them:
VM* newVM() {
/* Previous stuff... */
vm->numObjects = 0;
vm->maxObjects = INITIAL_GC_THRESHOLD;
return vm;
}
INITIAL_GC_THRESHOLD
将是我们启动第一次 GC 时的对象数量。较小的数字在内存方面更加保守,较大的数字在垃圾回收上花费的时间较少。按个人口味调整。
The INITIAL_GC_THRESHOLD will be the number of objects at which we kick off the first GC. A smaller number is more conservative with memory, a larger number spends less time on garbage collection. Adjust to taste.
我们每创建一个对象,都增加
numObjects
并在其达到最大值时进行一次回收:
Whenever we create an object, we increment numObjects and run a collection if it reaches the max:
Object* newObject(VM* vm, ObjectType type) {
if (vm->numObjects == vm->maxObjects) gc(vm);
/* Create object... */
vm->numObjects++;
return object;
}
我不会费心展示它,但我们还会调整
sweep()
以在每次释放时减少 numObjects
。最后我们修改 gc()
来更新最大值:
I won’t bother showing it, but we’ll also tweak sweep() to decrement numObjects every time it frees one. Finally, we modify gc() to update the max:
void gc(VM* vm) {
int numObjects = vm->numObjects;
markAll(vm);
sweep(vm);
vm->maxObjects = vm->numObjects * 2;
}
每次回收后,我们都会根据回收后剩余的存活对象数量更新
maxObjects
。那里的乘数让我们的堆随着存活对象数量的增加而增长。同样,如果一堆对象最终被释放,它会自动收缩。
After every collection, we update maxObjects based on the number of live objects left after the collection. The multiplier there lets our heap grow as the number of living objects increases. Likewise, it will shrink automatically if a bunch of objects end up being freed.