Conversation

@haiyuan-eng-google haiyuan-eng-google commented Jan 29, 2026

Refactor error handling in BigQuery write operations and add timeout for perform_write function.

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

  • Closes: #issue_number
  • Related: #issue_number

2. Or, if no issue exists, describe the change:

Problem:
The BigQuery BatchProcessor worker thread could hang indefinitely during write_client.append_rows() calls under certain failure conditions (e.g., silent connection drops or blocked RPCs). Because the worker waits for this call without a timeout, it stops processing new events. The internal queue eventually fills up (defaulting to 1000 items) and subsequent logs are dropped to prevent memory leaks, leading to a silent cessation of logging despite the application continuing to run. Additionally, a ReferenceError was occasionally observed during interpreter shutdown (_atexit_cleanup) when the batch_processor object had already been garbage collected.

Solution:

  1. Enforce Timeout: Wrapped the write_client.append_rows call within asyncio.wait_for with a 30-second timeout in _write_rows_with_retry (see the sketch after this list).
  2. Robust Error Handling: Updated the exception handling to catch asyncio.TimeoutError. This ensures that if a write hangs, it fails fast, triggers the existing retry mechanism (with backoff), and eventually drops the problematic batch if retries are exhausted, allowing the worker to proceed to the next batch.
  3. Graceful Shutdown: Added a try-except ReferenceError block in _atexit_cleanup to prevent noisy errors during script termination.
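
As a rough sketch of items 1 and 2, assuming an async write_client and a simple fixed retry budget (the names _write_rows_with_retry and perform_write come from this PR; the retry count, backoff, and request shape are illustrative assumptions):

    import asyncio
    import logging

    WRITE_TIMEOUT_SECONDS = 30.0  # hardcoded in this PR; see the review discussion below
    MAX_ATTEMPTS = 3              # assumed retry budget, not taken from the PR

    async def _write_rows_with_retry(write_client, requests):
      async def perform_write():
        # Placeholder for the refactored response handling around append_rows.
        await write_client.append_rows(requests)

      for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
          # Fail fast instead of hanging forever on a silent connection drop.
          await asyncio.wait_for(perform_write(), timeout=WRITE_TIMEOUT_SECONDS)
          return
        except asyncio.TimeoutError:
          logging.warning("BigQuery write timed out (attempt %d/%d)", attempt, MAX_ATTEMPTS)
        if attempt == MAX_ATTEMPTS:
          logging.warning("BigQuery Batch Dropped after %d attempts", attempt)
          return
        await asyncio.sleep(2 ** attempt)  # simple exponential backoff

With this shape, a hung RPC becomes a bounded delay per attempt instead of blocking the worker's queue indefinitely.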

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Please include a summary of passed pytest results.

Manual End-to-End (E2E) Tests:

I verified the changes using a reproduction script that mocks a hanging BigQuery writer; a sketch of such a mock follows the numbered steps below.

  1. Reproduction: Created a script that mocks append_rows to return a hanging iterator and asyncio.wait_for to simulate a timeout.
  2. Observation: Verified that without the fix, the worker hangs and the queue fills up.
  3. Verification: With the fix, confirmed that:
    • The TimeoutError is raised after the simulated timeout.
    • The retry logic is triggered (logs "BigQuery Batch Dropped after X attempts").
    • The queue continues to drain, ensuring subsequent events are processed.
    • The application does not hang.
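
For reference, a self-contained sketch of this kind of reproduction, assuming a mock client whose append_rows never completes (the class name and timeout values are illustrative, not the actual reproduction script):

    import asyncio

    class MockHangingWriteClient:
      async def append_rows(self, requests):
        # Simulate a silent connection drop: the RPC never returns.
        await asyncio.Event().wait()

    async def main():
      client = MockHangingWriteClient()
      try:
        # With a timeout, the hang surfaces as TimeoutError instead of
        # blocking the worker; a short timeout keeps the reproduction fast.
        await asyncio.wait_for(client.append_rows([]), timeout=1.0)
      except asyncio.TimeoutError:
        print("timed out as expected; the worker can move on to the next batch")

    asyncio.run(main())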

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

This change prevents a critical failure mode where observability data is silently lost due to transient network issues or service hangs.

@gemini-code-assist
Contributor

Summary of Changes

Hello @haiyuan-eng-google, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability and robustness of BigQuery write operations within the analytics plugin. It refactors the existing error handling to be more structured, introduces a crucial timeout mechanism for write calls to prevent indefinite hangs, and improves the application's shutdown process by making the atexit cleanup more resilient to object lifecycle issues.

Highlights

  • Improved BigQuery Write Error Handling: The logic for handling BigQuery append_rows responses has been refactored into a dedicated perform_write asynchronous function, centralizing error processing.
  • Introduced Write Operation Timeout: A 30-second timeout has been added to the perform_write function using asyncio.wait_for, ensuring that BigQuery write operations do not hang indefinitely.
  • Enhanced Retry Mechanism: The retry block for BigQuery write operations now explicitly catches asyncio.TimeoutError, allowing timed-out operations to be retried gracefully.
  • Robust atexit Cleanup: The _atexit_cleanup static method now includes a try-except ReferenceError block when checking the batch_processor object, preventing potential crashes if the object has been garbage collected before cleanup (illustrated in the sketch below).
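
As a rough illustration of that last point, assuming the cleanup hook is registered with a weak proxy to the processor (a dead proxy raises ReferenceError when touched); the registration details are assumptions rather than code from this PR:

    import atexit
    import weakref

    class BatchProcessor:
      def __init__(self):
        self._shutdown = False
        # Register cleanup without keeping the processor alive.
        atexit.register(BatchProcessor._atexit_cleanup, weakref.proxy(self))

      @staticmethod
      def _atexit_cleanup(batch_processor):
        try:
          # Touching a dead weak proxy raises ReferenceError.
          if not batch_processor or batch_processor._shutdown:
            return
        except ReferenceError:
          return
        # ... flush any batches still sitting in the queue ...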


@adk-bot added the services label ([Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc.) on Jan 29, 2026
@adk-bot
Collaborator

adk-bot commented Jan 29, 2026

Response from ADK Triaging Agent

Hello @haiyuan-eng-google, thank you for creating this PR!

To help us review your contribution more effectively, could you please fill out the "Testing Plan" section in your PR description? This is required for all PRs that are not small documentation or typo fixes.

Additionally, could you please link to an existing issue or provide a more detailed description of the change in the PR body, following the structure of our issue templates?

This information will help reviewers to better understand and test your changes. Thanks!

@gemini-code-assist gemini-code-assist bot left a comment
Contributor

Code Review

This pull request refactors the BigQuery write operation to include a timeout and improves the robustness of the _atexit_cleanup function. The changes are generally good, but I have a couple of suggestions to improve configurability and code clarity. Specifically, I recommend making the new write timeout configurable instead of hardcoded, and refactoring a confusing if True: block in the atexit handler.

On the line:

    await asyncio.wait_for(perform_write(), timeout=30.0)
Contributor

Severity: medium

The timeout for perform_write is hardcoded to 30.0 seconds. It's better to make this value configurable to allow adjustments for different environments without changing the code. I recommend adding a write_timeout attribute to the BigQueryLoggerConfig class and using it here, similar to other timeout configurations.

Suggested change:

    - await asyncio.wait_for(perform_write(), timeout=30.0)
    + await asyncio.wait_for(perform_write(), timeout=self.config.write_timeout)
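
For illustration, one way the suggested write_timeout could be surfaced, assuming BigQueryLoggerConfig is (or can become) a dataclass; the field name matches the suggestion above, while the default and shape are assumptions:

    import dataclasses

    @dataclasses.dataclass
    class BigQueryLoggerConfig:
      # ... existing fields ...
      write_timeout: float = 30.0  # seconds; overridable per environment

The perform_write call would then read the timeout from self.config.write_timeout, as in the suggested change above.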

Comment on lines +1639 to +1648
    try:
      # Check if the batch_processor object is still alive
      if batch_processor and not batch_processor._shutdown:
        pass
      else:
        return
    except ReferenceError:
      return

    if True:  # Indentation anchor, logic continues below
Contributor

Severity: medium

The try...except ReferenceError block is a good addition for robustness. However, the if/else with pass can be simplified. More importantly, the if True: on line 1648 is an anti-pattern used as an 'indentation anchor' which harms readability. It should be removed, and the subsequent code block (lines 1650-1694) should be unindented.

Here's a suggestion for a cleaner implementation:

    try:
      # Check if the batch_processor object is still alive and not shut down.
      if not batch_processor or batch_processor._shutdown:
        return
    except ReferenceError:
      # The weak reference is no longer valid, so there's nothing to clean up.
      return

    # Emergency Flush: Rescue any logs remaining in the queue
    # ... (rest of the function, unindented)
