Commit 1575a81

sergey-miryanov, nascheme, zanieb, and hugovk authored
GH-148726: Forward-port generational GC. (GH-148746)
This replaces the incremental GC with a forward port (from 3.13) of the generational GC.

Co-Authored-By: Neil Schemenauer <nas@arctrix.com>
Co-Authored-By: Zanie Blue <contact@zanie.dev>
Co-Authored-By: Sergey Miryanov <sergey.miryanov@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
1 parent b413bc7 commit 1575a81

11 files changed

Lines changed: 552 additions & 1101 deletions

Doc/library/gc.rst

Lines changed: 24 additions & 29 deletions
@@ -37,18 +37,11 @@ The :mod:`!gc` module provides the following functions:

 .. function:: collect(generation=2)

-   Perform a collection. The optional argument *generation*
+   With no arguments, run a full collection. The optional argument *generation*
    may be an integer specifying which generation to collect (from 0 to 2). A
    :exc:`ValueError` is raised if the generation number is invalid. The sum of
    collected objects and uncollectable objects is returned.

-   Calling ``gc.collect(0)`` will perform a GC collection on the young generation.
-
-   Calling ``gc.collect(1)`` will perform a GC collection on the young generation
-   and an increment of the old generation.
-
-   Calling ``gc.collect(2)`` or ``gc.collect()`` performs a full collection
-
    The free lists maintained for a number of built-in types are cleared
    whenever a full collection or collection of the highest generation (2)
    is run. Not all items in some free lists may be freed due to the
@@ -60,6 +53,9 @@ The :mod:`!gc` module provides the following functions:
    .. versionchanged:: 3.14
       ``generation=1`` performs an increment of collection.

+   .. versionchanged:: 3.14.5
+      ``generation=1`` performs collection of the middle generation.
+

 .. function:: set_debug(flags)

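Reviewer note: the documented `gc.collect()` contract can be exercised from Python roughly as follows. This is a sketch, not part of the commit; the `Node` class and the `make_cycle` and `collect_demo` helpers are hypothetical names introduced for illustration. What `generation=1` actually collects depends on the interpreter version, as the `versionchanged` notes above describe, so the sketch only looks at the return value of that call.

```python
import gc


class Node:
    """Plain container object; instances are tracked by the collector."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Two objects that reference each other become unreachable garbage
    # as soon as this function returns.
    a, b = Node(), Node()
    a.other, b.other = b, a


def collect_demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections out of the measurement
    try:
        gc.collect()            # start from a clean state
        make_cycle()
        full = gc.collect()     # no argument: full collection
        make_cycle()
        middle = gc.collect(1)  # explicit generation; meaning is version dependent
        try:
            gc.collect(3)       # only 0, 1 and 2 are valid generation numbers
        except ValueError:
            invalid_rejected = True
        else:
            invalid_rejected = False
        return full, middle, invalid_rejected
    finally:
        if was_enabled:
            gc.enable()


full, middle, invalid_rejected = collect_demo()
print(full, middle, invalid_rejected)
```

The automatic collector is disabled for the duration of the demo so that the cycle is still present when `gc.collect()` is called explicitly.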
@@ -75,20 +71,19 @@ The :mod:`!gc` module provides the following functions:

 .. function:: get_objects(generation=None)

-
    Returns a list of all objects tracked by the collector, excluding the list
-   returned. If *generation* is not ``None``, return only the objects as follows:
-
-   * 0: All objects in the young generation
-   * 1: No objects, as there is no generation 1 (as of Python 3.14)
-   * 2: All objects in the old generation
+   returned. If *generation* is not ``None``, return only the objects tracked by
+   the collector that are in that generation.

    .. versionchanged:: 3.8
       New *generation* parameter.

    .. versionchanged:: 3.14
       Generation 1 is removed

+   .. versionchanged:: 3.14.5
+      Generation 1 is reintroduced to maintain GC behavior from 3.13.
+
    .. audit-event:: gc.get_objects generation gc.get_objects

 .. function:: get_stats()
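Reviewer note: a short sketch of how the per-generation view of `gc.get_objects()` can be inspected. The helper `tracked_in` is a hypothetical name, not something this commit adds. Which generation a fresh object is reported in can differ between interpreter versions and builds, so the sketch prints that result without relying on it.

```python
import gc


def tracked_in(generation, obj):
    """Return True if *obj* is reported by gc.get_objects() for *generation*."""
    return any(o is obj for o in gc.get_objects(generation=generation))


was_enabled = gc.isenabled()
gc.disable()  # avoid an automatic collection moving the object mid-demo
try:
    fresh = [[]]                      # lists are tracked by the collector
    in_any = tracked_in(None, fresh)  # None: look at every generation
    in_young = tracked_in(0, fresh)   # version and build dependent
    sizes = [len(gc.get_objects(generation=g)) for g in range(3)]
    try:
        gc.get_objects(generation=3)  # there are only generations 0, 1 and 2
    except ValueError:
        rejected = True
    else:
        rejected = False
finally:
    if was_enabled:
        gc.enable()

print(in_any, in_young, sizes, rejected)
```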
@@ -124,33 +119,33 @@ The :mod:`!gc` module provides the following functions:
    Set the garbage collection thresholds (the collection frequency). Setting
    *threshold0* to zero disables collection.

-   The GC classifies objects into two generations depending on whether they have
-   survived a collection. New objects are placed in the young generation. If an
-   object survives a collection it is moved into the old generation.
-
-   In order to decide when to run, the collector keeps track of the number of object
+   The GC classifies objects into three generations depending on how many
+   collection sweeps they have survived. New objects are placed in the youngest
+   generation (generation ``0``). If an object survives a collection it is moved
+   into the next older generation. Since generation ``2`` is the oldest
+   generation, objects in that generation remain there after a collection. In
+   order to decide when to run, the collector keeps track of the number object
    allocations and deallocations since the last collection. When the number of
    allocations minus the number of deallocations exceeds *threshold0*, collection
-   starts. For each collection, all the objects in the young generation and some
-   fraction of the old generation is collected.
+   starts. Initially only generation ``0`` is examined. If generation ``0`` has
+   been examined more than *threshold1* times since generation ``1`` has been
+   examined, then generation ``1`` is examined as well.
+   With the third generation, things are a bit more complicated,
+   see `Collecting the oldest generation <https://github.com/python/cpython/blob/ff0ef0a54bef26fc507fbf9b7a6009eb7d3f17f5/InternalDocs/garbage_collector.md#collecting-the-oldest-generation>`_ for more information.

    In the free-threaded build, the increase in process memory usage is also
    checked before running the collector. If the memory usage has not increased
    by 10% since the last collection and the net number of object allocations
    has not exceeded 40 times *threshold0*, the collection is not run.

-   The fraction of the old generation that is collected is **inversely** proportional
-   to *threshold1*. The larger *threshold1* is, the slower objects in the old generation
-   are collected.
-   For the default value of 10, 1% of the old generation is scanned during each collection.
-
-   *threshold2* is ignored.
-
-   See `Garbage collector design <https://devguide.python.org/garbage_collector>`_ for more information.
+   See `Garbage collector design <https://github.com/python/cpython/blob/3.15/InternalDocs/garbage_collector.md>`_ for more information.

    .. versionchanged:: 3.14
       *threshold2* is ignored

+   .. versionchanged:: 3.14.5
+      *threshold2* is restored to match Python 3.13 behavior.
+

 .. function:: get_count()

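Reviewer note: the threshold rule described in the restored `set_threshold` text can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not the C implementation: `simulate_collections` is a hypothetical helper, each "trigger" stands for the moment the generation 0 count exceeds *threshold0*, and the extra condition on the oldest generation (covered by the linked "Collecting the oldest generation" notes) is deliberately left out.

```python
def simulate_collections(triggers, thresholds=(2000, 10, 10)):
    """Return the generation collected on each of *triggers* collections.

    Simplified model: the additional check that can postpone a collection
    of the oldest generation is not modelled.
    """
    counts = [0, 0, 0]
    history = []
    for _ in range(triggers):
        counts[0] = thresholds[0] + 1  # threshold0 was just exceeded
        # Pick the oldest generation whose counter exceeds its threshold.
        target = max(g for g in range(3) if counts[g] > thresholds[g])
        # Collecting generation N also collects every younger generation,
        # so their counters start over.
        for g in range(target + 1):
            counts[g] = 0
        # The next older generation has now seen one more collection.
        if target + 1 < len(counts):
            counts[target + 1] += 1
        history.append(target)
    return history


small = simulate_collections(13, thresholds=(1, 2, 2))
print(small)

default = simulate_collections(133)
print(default.index(1) + 1, default.index(2) + 1)
```

In this model, with *threshold1* and *threshold2* both set to 10, generation 1 is first collected on the 12th trigger and generation 2 on the 133rd. The real collector can postpone the latter.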
Include/internal/pycore_gc.h

Lines changed: 5 additions & 38 deletions
@@ -118,21 +118,6 @@ static inline void _PyObject_GC_SET_SHARED(PyObject *op) {
 /* Bit 1 is set when the object is in generation which is GCed currently. */
 #define _PyGC_PREV_MASK_COLLECTING ((uintptr_t)2)

-/* Bit 0 in _gc_next is the old space bit.
- * It is set as follows:
- * Young: gcstate->visited_space
- * old[0]: 0
- * old[1]: 1
- * permanent: 0
- *
- * During a collection all objects handled should have the bit set to
- * gcstate->visited_space, as objects are moved from the young gen
- * and the increment into old[gcstate->visited_space].
- * When object are moved from the pending space, old[gcstate->visited_space^1]
- * into the increment, the old space bit is flipped.
-*/
-#define _PyGC_NEXT_MASK_OLD_SPACE_1 1
-
 #define _PyGC_PREV_SHIFT 2
 #define _PyGC_PREV_MASK (((uintptr_t) -1) << _PyGC_PREV_SHIFT)

@@ -159,13 +144,11 @@ typedef enum {
 // Lowest bit of _gc_next is used for flags only in GC.
 // But it is always 0 for normal code.
 static inline PyGC_Head* _PyGCHead_NEXT(PyGC_Head *gc) {
-    uintptr_t next = gc->_gc_next & _PyGC_PREV_MASK;
+    uintptr_t next = gc->_gc_next;
     return (PyGC_Head*)next;
 }
 static inline void _PyGCHead_SET_NEXT(PyGC_Head *gc, PyGC_Head *next) {
-    uintptr_t unext = (uintptr_t)next;
-    assert((unext & ~_PyGC_PREV_MASK) == 0);
-    gc->_gc_next = (gc->_gc_next & ~_PyGC_PREV_MASK) | unext;
+    gc->_gc_next = (uintptr_t)next;
 }

 // Lowest two bits of _gc_prev is used for _PyGC_PREV_MASK_* flags.
@@ -207,10 +190,6 @@ static inline void _PyGC_CLEAR_FINALIZED(PyObject *op) {

 extern void _Py_ScheduleGC(PyThreadState *tstate);

-#ifndef Py_GIL_DISABLED
-extern void _Py_TriggerGC(struct _gc_runtime_state *gcstate);
-#endif
-

 /* Tell the GC to track this object.
  *
@@ -220,7 +199,7 @@ extern void _Py_TriggerGC(struct _gc_runtime_state *gcstate);
  * ob_traverse method.
  *
  * Internal note: interp->gc.generation0->_gc_prev doesn't have any bit flags
- * because it's not object header. So we don't use _PyGCHead_PREV() and
+ * because it's not an object header. So we don't use _PyGCHead_PREV() and
  * _PyGCHead_SET_PREV() for it to avoid unnecessary bitwise operations.
  *
  * See also the public PyObject_GC_Track() function.
@@ -244,19 +223,12 @@ static inline void _PyObject_GC_TRACK(
                           "object is in generation which is garbage collected",
                           filename, lineno, __func__);

-    struct _gc_runtime_state *gcstate = &_PyInterpreterState_GET()->gc;
-    PyGC_Head *generation0 = &gcstate->young.head;
+    PyGC_Head *generation0 = _PyInterpreterState_GET()->gc.generation0;
     PyGC_Head *last = (PyGC_Head*)(generation0->_gc_prev);
     _PyGCHead_SET_NEXT(last, gc);
     _PyGCHead_SET_PREV(gc, last);
-    uintptr_t not_visited = 1 ^ gcstate->visited_space;
-    gc->_gc_next = ((uintptr_t)generation0) | not_visited;
+    _PyGCHead_SET_NEXT(gc, generation0);
     generation0->_gc_prev = (uintptr_t)gc;
-    gcstate->young.count++; /* number of tracked GC objects */
-    gcstate->heap_size++;
-    if (gcstate->young.count > gcstate->young.threshold) {
-        _Py_TriggerGC(gcstate);
-    }
 #endif
 }

@@ -291,11 +263,6 @@ static inline void _PyObject_GC_UNTRACK(
     _PyGCHead_SET_PREV(next, prev);
     gc->_gc_next = 0;
     gc->_gc_prev &= _PyGC_PREV_MASK_FINALIZED;
-    struct _gc_runtime_state *gcstate = &_PyInterpreterState_GET()->gc;
-    if (gcstate->young.count > 0) {
-        gcstate->young.count--;
-    }
-    gcstate->heap_size--;
 #endif
 }

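Reviewer note: after this change `_PyObject_GC_TRACK` and `_PyObject_GC_UNTRACK` are back to plain list manipulation on a circular doubly linked list whose sentinel is `generation0`. The pointer updates can be modelled in Python as below. This is an illustrative sketch only: the `Head` class and the `track`, `untrack`, and `names` helpers are hypothetical stand-ins, and the flag bits that the C code keeps in the low bits of `_gc_prev` are not modelled.

```python
class Head:
    """Stand-in for PyGC_Head: a node in a circular doubly linked list."""

    def __init__(self, name):
        self.name = name
        self.next = self  # an empty list points back at itself
        self.prev = self


def track(generation0, gc):
    # Same order of pointer updates as the new _PyObject_GC_TRACK body:
    # append *gc* at the tail, just before the sentinel.
    last = generation0.prev
    last.next = gc
    gc.prev = last
    gc.next = generation0
    generation0.prev = gc


def untrack(gc):
    # Unlink the node from its neighbours and clear its own links.
    prev, next_ = gc.prev, gc.next
    prev.next = next_
    next_.prev = prev
    gc.next = None
    gc.prev = None


def names(generation0):
    """Walk the list from the sentinel and collect the node names."""
    out, node = [], generation0.next
    while node is not generation0:
        out.append(node.name)
        node = node.next
    return out


generation0 = Head("generation0")
a, b, c = Head("a"), Head("b"), Head("c")
for node in (a, b, c):
    track(generation0, node)
after_track = names(generation0)

untrack(b)
after_untrack = names(generation0)
print(after_track, after_untrack)
```

Note that neither helper touches a counter or threshold, which matches the diff: the per-object `young.count` and `heap_size` bookkeeping is removed from these inline functions.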
Include/internal/pycore_interp_structs.h

Lines changed: 22 additions & 27 deletions
@@ -181,29 +181,14 @@ struct gc_generation {
 struct gc_generation_stats {
     PyTime_t ts_start;
     PyTime_t ts_stop;
-
-    /* heap_size on the start of the collection */
-    Py_ssize_t heap_size;
-
-    /* work_to_do on the start of the collection */
-    Py_ssize_t work_to_do;
-
     /* total number of collections */
     Py_ssize_t collections;
-
-    /* total number of visited objects */
-    Py_ssize_t object_visits;
-
     /* total number of collected objects */
     Py_ssize_t collected;
     /* total number of uncollectable objects (put into gc.garbage) */
     Py_ssize_t uncollectable;
     // Total number of objects considered for collection and traversed:
     Py_ssize_t candidates;
-
-    Py_ssize_t objects_transitively_reachable;
-    Py_ssize_t objects_not_transitively_reachable;
-
     // Total duration of the collection in seconds:
     double duration;
 };
@@ -225,11 +210,6 @@ struct gc_old_stats_buffer {
     int8_t index;
 };

-enum _GCPhase {
-    GC_PHASE_MARK = 0,
-    GC_PHASE_COLLECT = 1
-};
-
 /* If we change this, we need to change the default value in the
    signature of gc.collect and change the size of PyStats.gc_stats */
 #define NUM_GENERATIONS 3
@@ -244,8 +224,13 @@ struct _gc_runtime_state {
     int enabled;
     int debug;
     /* linked lists of container objects */
+#ifndef Py_GIL_DISABLED
+    struct gc_generation generations[NUM_GENERATIONS];
+    PyGC_Head *generation0;
+#else
     struct gc_generation young;
     struct gc_generation old[2];
+#endif
     /* a permanent generation which won't be collected */
     struct gc_generation permanent_generation;
     struct gc_stats *generation_stats;
@@ -259,13 +244,6 @@ struct _gc_runtime_state {
     /* a list of callbacks to be invoked when collection is performed */
     PyObject *callbacks;

-    Py_ssize_t heap_size;
-    Py_ssize_t work_to_do;
-    /* Which of the old spaces is the visited space */
-    int visited_space;
-    int phase;
-
-#ifdef Py_GIL_DISABLED
     /* This is the number of objects that survived the last full
        collection. It approximates the number of long lived objects
        tracked by the GC.
@@ -278,6 +256,7 @@ struct _gc_runtime_state {
        the first time. */
     Py_ssize_t long_lived_pending;

+#ifdef Py_GIL_DISABLED
     /* True if gc.freeze() has been used. */
     int freeze_active;


@@ -293,6 +272,22 @@ struct _gc_runtime_state {
 #endif
 };

+#ifndef Py_GIL_DISABLED
+#define GC_GENERATION_INIT \
+    .generations = { \
+        { .threshold = 2000, }, \
+        { .threshold = 10, }, \
+        { .threshold = 10, }, \
+    },
+#else
+#define GC_GENERATION_INIT \
+    .young = { .threshold = 2000, }, \
+    .old = { \
+        { .threshold = 10, }, \
+        { .threshold = 10, }, \
+    },
+#endif
+
 #include "pycore_gil.h" // struct _gil_runtime_state

 /**** Import ********/

Include/internal/pycore_runtime_init.h

Lines changed: 1 addition & 7 deletions
@@ -130,13 +130,7 @@ extern PyTypeObject _PyExc_MemoryError;
             }, \
             .gc = { \
                 .enabled = 1, \
-                .young = { .threshold = 2000, }, \
-                .old = { \
-                    { .threshold = 10, }, \
-                    { .threshold = 0, }, \
-                }, \
-                .work_to_do = -5000, \
-                .phase = GC_PHASE_MARK, \
+                GC_GENERATION_INIT \
             }, \
             .qsbr = { \
                 .wr_seq = QSBR_INITIAL, \
