26 commits
cb479a1
Remove unused functions in binlog_file.go
grodowski Nov 7, 2025
215dee4
escape table name in ReadLastCheckpoint (#1610)
meiji163 Nov 24, 2025
ec02c37
Revertible Migration (#1607)
meiji163 Nov 25, 2025
aa93172
Merge branch 'master' into grodowski/remove-unused-code
grodowski Nov 26, 2025
982eefb
Change checkpoint binlog coordinates field to TEXT (#1611)
meiji163 Dec 5, 2025
8b5aa46
Merge branch 'master' into grodowski/remove-unused-code
grodowski Dec 9, 2025
cc2dd7f
Add flag --skip-metadata-lock-check (#1616)
meiji163 Jan 16, 2026
a1e9c9d
fix: add missing error check for WriteChangelogState in initiateAppli…
ajm188 Jan 30, 2026
5c0359a
Merge branch 'master' into grodowski/remove-unused-code
grodowski Feb 4, 2026
aadbb79
Fix problems when altering a column from `binary` to `varbinary` (#1628)
jorendorff Feb 10, 2026
c6f95cc
Fix 4 trigger handling bugs (#1626)
yakirgb Feb 10, 2026
e5760f7
Merge branch 'master' into grodowski/remove-unused-code
grodowski Mar 5, 2026
c72b237
Create a hook to capture copy batch errors and retries (#1638)
meiji163 Mar 5, 2026
753cf88
Fix data loss when inserting duplicate values during a migration (#1633)
ggilder Mar 9, 2026
7aea210
Merge branch 'master' into grodowski/remove-unused-code
grodowski Mar 9, 2026
f7862c0
Add support for go 1.25 (#1634)
ggilder Mar 9, 2026
7bb2c12
Merge branch 'master' into grodowski/remove-unused-code
grodowski Mar 10, 2026
b000b24
Replace usage of `Fatale` with context cancellation (#1639)
ggilder Mar 12, 2026
67cc636
Improve tests for various error scenarios (#1642)
ggilder Mar 17, 2026
b9652c3
Add retry logic for instant DDL on lock wait timeout (#1651)
yosefbs Mar 18, 2026
0d5c737
Fix local tests by making .gopath writable to avoid toolchain rm perm…
grodowski Mar 24, 2026
8f274f7
Fix handling of warnings on DML batches (#1643)
ggilder Mar 27, 2026
64f8c18
Merge branch 'master' into grodowski/remove-unused-code
grodowski Mar 30, 2026
8bc63f0
Fix abort/retry interaction (#1655)
ggilder Apr 2, 2026
0270a28
Add `GH_OST_INSTANT_DDL` for gh-ost-on-success hook (#1658)
meiji163 Apr 8, 2026
7824e0b
Merge branch 'master' into grodowski/remove-unused-code
grodowski Apr 21, 2026
18 changes: 18 additions & 0 deletions .github/CONTRIBUTING.md
@@ -19,6 +19,24 @@ Here are a few things you can do that will increase the likelihood of your pull
- Keep your change as focused as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests.
- Write a [good commit message](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).

## Development Guidelines

### Channel Safety

When working with channels in goroutines, it's critical to prevent deadlocks that can occur when a channel receiver exits due to an error while senders are still trying to send values. Always use `base.SendWithContext` for channel sends to avoid deadlocks:

```go
// ✅ CORRECT - Uses helper to prevent deadlock
if err := base.SendWithContext(ctx, ch, value); err != nil {
	return err // context was cancelled
}

// ❌ WRONG - Can deadlock if receiver exits
ch <- value
```

Even if the destination channel is buffered, deadlocks could still occur if the buffer fills up and the receiver exits, so it's important to use `SendWithContext` in those cases as well.

## Resources

- [Contributing to Open Source on GitHub](https://guides.github.com/activities/contributing-to-open-source/)
14 changes: 14 additions & 0 deletions .github/workflows/replica-tests.yml
@@ -28,6 +28,20 @@ jobs:
      - name: Run tests
        run: script/docker-gh-ost-replica-tests run

      - name: Set artifact name
        if: failure()
        run: |
          ARTIFACT_NAME=$(echo "${{ matrix.image }}" | tr '/:' '-')
          echo "ARTIFACT_NAME=test-logs-${ARTIFACT_NAME}" >> $GITHUB_ENV

      - name: Upload test logs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: ${{ env.ARTIFACT_NAME }}
          path: /tmp/gh-ost-test.*
          retention-days: 7

      - name: Teardown environment
        if: always()
        run: script/docker-gh-ost-replica-tests down
12 changes: 12 additions & 0 deletions doc/command-line-flags.md
@@ -24,6 +24,11 @@ By default, `gh-ost` would like you to connect to a replica, from where it figur

If, for some reason, you do not wish `gh-ost` to connect to a replica, you may connect it directly to the master and approve this via `--allow-on-master`.

### allow-setup-metadata-lock-instruments

`--allow-setup-metadata-lock-instruments` allows gh-ost to enable the [`metadata_locks`](https://dev.mysql.com/doc/refman/8.0/en/performance-schema-metadata-locks-table.html) table in `performance_schema`, if it is not already enabled. This is used for a safety check before cut-over.
See also: [`skip-metadata-lock-check`](#skip-metadata-lock-check)

### approve-renamed-columns

When your migration issues a column rename (`change column old_name new_name ...`) `gh-ost` analyzes the statement to try and associate the old column name with new column name. Otherwise, the new structure may also look like some column was dropped and another was added.
@@ -247,6 +252,13 @@ Defaults to an auto-determined and advertised upon startup file. Defines Unix so

By default `gh-ost` verifies no foreign keys exist on the migrated table. On servers with large number of tables this check can take a long time. If you're absolutely certain no foreign keys exist (table does not reference other table nor is referenced by other tables) and wish to save the check time, provide with `--skip-foreign-key-checks`.

### skip-metadata-lock-check

By default, `gh-ost` checks before the cut-over that the rename session holds the exclusive metadata lock on the table. If `performance_schema.metadata_locks` cannot be enabled on your setup, this check can be skipped with `--skip-metadata-lock-check`.
:warning: Disabling this check carries a small risk of data loss if a session accesses the ghost table during cut-over. See https://github.com/github/gh-ost/pull/1536 for details.

See also: [`allow-setup-metadata-lock-instruments`](#allow-setup-metadata-lock-instruments)

### skip-strict-mode

By default `gh-ost` enforces STRICT_ALL_TABLES sql_mode as a safety measure. In some cases this changes the behaviour of other modes (namely ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE) which may lead to errors during migration. Use `--skip-strict-mode` to explicitly tell `gh-ost` not to enforce this. **Danger** This may have some unexpected disastrous side effects.
4 changes: 4 additions & 0 deletions doc/hooks.md
@@ -49,6 +49,7 @@ The full list of supported hooks is best found in code: [hooks.go](https://githu
- `gh-ost-on-before-cut-over`
- `gh-ost-on-success`
- `gh-ost-on-failure`
- `gh-ost-on-batch-copy-retry`

### Context

@@ -76,11 +77,14 @@ The following variables are available on all hooks:
- `GH_OST_HOOKS_HINT_OWNER` - copy of `--hooks-hint-owner` value
- `GH_OST_HOOKS_HINT_TOKEN` - copy of `--hooks-hint-token` value
- `GH_OST_DRY_RUN` - whether or not the `gh-ost` run is a dry run
- `GH_OST_REVERT` - whether or not `gh-ost` is running in revert mode

The following variables are available on particular hooks:

- `GH_OST_INSTANT_DDL` is only available in `gh-ost-on-success`. The value is `true` if instant DDL was successful, and `false` if it was not.
- `GH_OST_COMMAND` is only available in `gh-ost-on-interactive-command`
- `GH_OST_STATUS` is only available in `gh-ost-on-status`
- `GH_OST_LAST_BATCH_COPY_ERROR` is only available in `gh-ost-on-batch-copy-retry`
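
As a rough illustration of consuming these variables, the sketch below is a hook written as a small Go program. It is only a sketch under assumptions: the `describeRun` helper and its output format are hypothetical, and it assumes `GH_OST_REVERT` is rendered as the string `true` in revert mode, which the text above does not state explicitly.

```go
package main

import (
	"fmt"
	"os"
)

// describeRun summarizes a hook invocation from gh-ost environment variables.
// lookup is injected so the function can be exercised without a gh-ost run.
// Assumption: GH_OST_REVERT is the literal string "true" in revert mode.
func describeRun(lookup func(string) string) string {
	mode := "migration"
	if lookup("GH_OST_REVERT") == "true" {
		mode = "revert"
	}
	msg := fmt.Sprintf("mode=%s", mode)
	// Only set for gh-ost-on-success.
	if v := lookup("GH_OST_INSTANT_DDL"); v != "" {
		msg += " instant_ddl=" + v
	}
	// Only set for gh-ost-on-batch-copy-retry.
	if v := lookup("GH_OST_LAST_BATCH_COPY_ERROR"); v != "" {
		msg += " last_batch_copy_error=" + v
	}
	return msg
}

func main() {
	fmt.Println(describeRun(os.Getenv))
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the summary logic testable with a plain map.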

### Examples

1 change: 1 addition & 0 deletions doc/resume.md
@@ -4,6 +4,7 @@
- The first `gh-ost` process was invoked with `--checkpoint`
- The first `gh-ost` process had at least one successful checkpoint
- The binlogs from the last checkpoint's binlog coordinates still exist on the replica gh-ost is inspecting (specified by `--host`)
- The checkpoint table (name ends with `_ghk`) still exists

To resume, invoke `gh-ost` again with the same arguments with the `--resume` flag.

56 changes: 56 additions & 0 deletions doc/revert.md
@@ -0,0 +1,56 @@
# Reverting Migrations

`gh-ost` can attempt to revert a previously completed migration if the following conditions are met:
- The first `gh-ost` process was invoked with `--checkpoint`
- The checkpoint table (name ends with `_ghk`) still exists
- The binlogs from the time of the migration's cut-over still exist on the replica gh-ost is inspecting (specified by `--host`)

To revert, find the name of the "old" table from the original migration e.g. `_mytable_del`. Then invoke `gh-ost` with the same arguments and the flags `--revert` and `--old-table="_mytable_del"`.
gh-ost will read the binlog coordinates of the original cut-over from the checkpoint table and bring the old table up to date. Then it performs another cut-over to complete the reversion.
Note that the checkpoint table (name ends with `_ghk`) will not be automatically dropped unless `--ok-to-drop-table` is provided.

> [!WARNING]
> It is recommended to use `--checkpoint` with `--gtid` enabled so that checkpoint binlog coordinates store GTID sets rather than file positions. In that case, `gh-ost` can revert using a different replica than it originally attached to.

### ❗ Note ❗
Reverting is roughly equivalent to applying the "reverse" migration. _Before attempting to revert you should determine if the reverse migration is possible and does not involve any unacceptable data loss._

For example: if the original migration drops a `NOT NULL` column that has no `DEFAULT` then the reverse migration adds the column. In this case, the reverse migration is impossible if rows were added after the original cut-over and the revert will fail.
Another example: if the original migration modifies a `VARCHAR(32)` column to `VARCHAR(64)`, the reverse migration truncates the `VARCHAR(64)` column to `VARCHAR(32)`. If values were inserted with length > 32 after the cut-over then the revert will fail.


## Example
The migration starts with a `gh-ost` invocation such as:
```shell
gh-ost \
  --chunk-size=100 \
  --host=replica1.company.com \
  --database="mydb" \
  --table="mytable" \
  --alter="drop key idx1" \
  --gtid \
  --checkpoint \
  --checkpoint-seconds=60 \
  --execute
```

In this example `gh-ost` writes a cut-over checkpoint to `_mytable_ghk` after the cut-over is successful. The original table is renamed to `_mytable_del`.

If dropping the index causes problems, the migration can be reverted with:
```shell
# revert migration
gh-ost \
  --chunk-size=100 \
  --host=replica1.company.com \
  --database="mydb" \
  --table="mytable" \
  --old-table="_mytable_del" \
  --gtid \
  --checkpoint \
  --checkpoint-seconds=60 \
  --revert \
  --execute
```

gh-ost then reconnects at the binlog coordinates stored in the cut-over checkpoint and applies DMLs until the old table is up-to-date.
Note that the "reverse" migration is `ADD KEY idx1(...)`, so there is no potential data loss to consider in this case.
93 changes: 88 additions & 5 deletions go/base/context.go
@@ -6,6 +6,7 @@
package base

import (
"context"
"fmt"
"math"
"os"
@@ -104,6 +105,8 @@ type MigrationContext struct {
AzureMySQL bool
AttemptInstantDDL bool
Resume bool
Revert bool
OldTableName string

// SkipPortValidation allows skipping the port validation in `ValidateConnection`
// This is useful when connecting to a MySQL instance where the external port
@@ -223,6 +226,16 @@ type MigrationContext struct {
InCutOverCriticalSectionFlag int64
PanicAbort chan error

// Context for cancellation signaling across all goroutines
// Stored in struct as it spans the entire migration lifecycle, not per-function.
// context.Context is safe for concurrent use by multiple goroutines.
ctx context.Context //nolint:containedctx
cancelFunc context.CancelFunc

// Stores the fatal error that triggered abort
AbortError error
abortMutex *sync.Mutex

OriginalTableColumnsOnApplier *sql.ColumnList
OriginalTableColumns *sql.ColumnList
OriginalTableVirtualColumns *sql.ColumnList
@@ -254,6 +267,7 @@ type MigrationContext struct {

BinlogSyncerMaxReconnectAttempts int
AllowSetupMetadataLockInstruments bool
SkipMetadataLockCheck bool
IsOpenMetadataLockInstruments bool

Log Logger
@@ -290,6 +304,7 @@ type ContextConfig struct {
}

func NewMigrationContext() *MigrationContext {
ctx, cancelFunc := context.WithCancel(context.Background())
return &MigrationContext{
Uuid: uuid.NewString(),
defaultNumRetries: 60,
@@ -310,6 +325,9 @@ func NewMigrationContext() *MigrationContext {
lastHeartbeatOnChangelogMutex: &sync.Mutex{},
ColumnRenameMap: make(map[string]string),
PanicAbort: make(chan error),
ctx: ctx,
cancelFunc: cancelFunc,
abortMutex: &sync.Mutex{},
Log: NewDefaultLogger(),
}
}
@@ -348,6 +366,10 @@ func getSafeTableName(baseName string, suffix string) string {
// GetGhostTableName generates the name of ghost table, based on original table name
// or a given table name
func (this *MigrationContext) GetGhostTableName() string {
if this.Revert {
// When reverting the "ghost" table is the _del table from the original migration.
return this.OldTableName
}
if this.ForceTmpTableName != "" {
return getSafeTableName(this.ForceTmpTableName, "gho")
} else {
@@ -364,14 +386,18 @@ func (this *MigrationContext) GetOldTableName() string {
tableName = this.OriginalTableName
}

suffix := "del"
if this.Revert {
suffix = "rev_del"
}
if this.TimestampOldTable {
t := this.StartTime
timestamp := fmt.Sprintf("%d%02d%02d%02d%02d%02d",
t.Year(), t.Month(), t.Day(),
t.Hour(), t.Minute(), t.Second())
-return getSafeTableName(tableName, fmt.Sprintf("%s_del", timestamp))
+return getSafeTableName(tableName, fmt.Sprintf("%s_%s", timestamp, suffix))
 }
-return getSafeTableName(tableName, "del")
+return getSafeTableName(tableName, suffix)
}

// GetChangelogTableName generates the name of changelog table, based on original table name
@@ -600,6 +626,13 @@ func (this *MigrationContext) GetIteration() int64 {
return atomic.LoadInt64(&this.Iteration)
}

func (this *MigrationContext) SetNextIterationRangeMinValues() {
this.MigrationIterationRangeMinValues = this.MigrationIterationRangeMaxValues
if this.MigrationIterationRangeMinValues == nil {
this.MigrationIterationRangeMinValues = this.MigrationRangeMinValues
}
}

func (this *MigrationContext) MarkPointOfInterest() int64 {
this.pointOfInterestTimeMutex.Lock()
defer this.pointOfInterestTimeMutex.Unlock()
@@ -959,9 +992,59 @@ func (this *MigrationContext) GetGhostTriggerName(triggerName string) string {
return triggerName + this.TriggerSuffix
}

-// validateGhostTriggerLength check if the ghost trigger name length is not more than 64 characters
+// ValidateGhostTriggerLengthBelowMaxLength checks if the given trigger name (already transformed
+// by GetGhostTriggerName) does not exceed the maximum allowed length.
 func (this *MigrationContext) ValidateGhostTriggerLengthBelowMaxLength(triggerName string) bool {
-ghostTriggerName := this.GetGhostTriggerName(triggerName)
+return utf8.RuneCountInString(triggerName) <= mysql.MaxTableNameLength
}

// GetContext returns the migration context for cancellation checking
func (this *MigrationContext) GetContext() context.Context {
return this.ctx
}

-return utf8.RuneCountInString(ghostTriggerName) <= mysql.MaxTableNameLength
// SetAbortError stores the fatal error that triggered abort
// Only the first error is stored (subsequent errors are ignored)
func (this *MigrationContext) SetAbortError(err error) {
this.abortMutex.Lock()
defer this.abortMutex.Unlock()
if this.AbortError == nil {
this.AbortError = err
}
}

// GetAbortError retrieves the stored abort error
func (this *MigrationContext) GetAbortError() error {
this.abortMutex.Lock()
defer this.abortMutex.Unlock()
return this.AbortError
}

// CancelContext cancels the migration context to signal all goroutines to stop
// The cancel function is safe to call multiple times and from multiple goroutines.
func (this *MigrationContext) CancelContext() {
if this.cancelFunc != nil {
this.cancelFunc()
}
}

// SendWithContext attempts to send a value to a channel, but returns early
// if the context is cancelled. This prevents goroutine deadlocks when the
// channel receiver has exited due to an error.
//
// Use this instead of bare channel sends (ch <- val) in goroutines to ensure
// proper cleanup when the migration is aborted.
//
// Example:
//
// if err := base.SendWithContext(ctx, ch, value); err != nil {
// return err // context was cancelled
// }
func SendWithContext[T any](ctx context.Context, ch chan<- T, val T) error {
select {
case ch <- val:
return nil
case <-ctx.Done():
return ctx.Err()
}
}