-
Notifications
You must be signed in to change notification settings - Fork 54.3k
fix(core): Support reconnecting on Redis failover #25038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Bundle ReportChanges will increase total bundle size by 40.94MB (100.0%) ⬆️
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues found across 3 files
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This comment has been minimized.
This comment has been minimized.
|
|
||
| /** Whether to reconnect to Redis on READONLY errors i.e., failover events. */ | ||
| @Env('QUEUE_BULL_REDIS_RECONNECT_ON_FAILOVER') | ||
| reconnectOnFailover: boolean = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not enabling this by default?
| if (reconnectOnFailover) { | ||
| options.reconnectOnError = (redisErr: Error) => { | ||
| const targetError = 'READONLY'; | ||
| if (redisErr.message.includes(targetError)) { | ||
| this.logger.warn('Reconnecting to Redis due to READONLY error (possible failover event)'); | ||
| return true; | ||
| } | ||
| return false; | ||
| }; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if a write e.g. to enqueue a Bull job fails and we fail over that job is lost forever? Should we resend the failed write as well?
This feature is useful when using Amazon ElastiCache instances with Auto-failover disabled. On these instances, test your reconnectOnError handler by manually promoting the replica node to the primary role using the AWS console. The following writes fail with the error READONLY. Using reconnectOnError, we can force the connection to reconnect on this error in order to connect to the new master. Furthermore, if the reconnectOnError returns 2, ioredis will resend the failed command after reconnecting.
Summary
If the Redis server fails over, it will start to throw
READONLYerrors when you try to write because you're trying to write to a readonly server. In this case, the client should reconnect.By default this doesn't happen, so we leave this disabled by default as well, but we expose an option to enable it.
Related Linear tickets, Github issues, and Community forum posts
https://linear.app/n8n/issue/CAT-2240/n8n-fails-to-reconnect-after-redisvalkey-failover-in-multi-az-setup
Review / Merge checklist
release/backport(if the PR is an urgent fix that needs to be backported)