feat(genai): Support multimodal file inputs and display_name in function responses #1834
Conversation
Respected Sir Bagatur (@baskaryan) / Sir Harrison Chase (@hwchase17), I have resolved the issue regarding the missing display_name metadata in multimodal function responses. The conversion now parses interleaved media, file, and image_url blocks directly into the FunctionResponsePart and FunctionResponseFileData schemas, as required by the new google-genai SDK. I have also included a unit test to verify this behavior, and all strict typing (mypy) and formatting (ruff) checks pass without any workarounds. Thank you very much for this opportunity to contribute to the LangChain ecosystem; it is an honor to help improve this library. Please let me know if there are any further adjustments you would like me to make, and I will be happy to implement them. Looking forward to your response.
Hi maintainers, it looks like the langchain-google-genai-us integration test hit the 60-minute timeout limit on Google Cloud Build. Could someone with write access please re-run the failed jobs when you get a chance? All other 12 checks and unit tests have passed successfully. Thank you!
Respected Mason Daugherty (@mdrxy), Bagatur (@baskaryan), and Eugene Yurtsev (@eyurtsev), I hope you are having a great week! I wanted to gently bump this PR, which adds support for multimodal file inputs and display_name within function responses for the Google GenAI integration. The code is complete and all 12 core unit checks passed. However, the langchain-google-genai-us (llm-integration-tests) job hit the 60-minute Google Cloud Build timeout limit (likely an infrastructure flake). Could someone with write access please re-run the failed jobs when you have a chance? I know you all manage an incredible volume of work, and I respect your time. Whenever you have the bandwidth to review, I am available to make any structural adjustments you recommend so that this aligns with your vision for the package. Looking forward to your guidance, and thank you for all your hard work!
Description
This PR resolves the issue where multimodal files were disconnected from the FunctionResponse and stripped of their metadata. It updates _convert_tool_message_to_parts to accurately parse file, media, and image_url blocks from a ToolMessage and map them into FunctionResponsePart objects.

By natively integrating FunctionResponseFileData and FunctionResponseBlob from the google-genai SDK, this maintains data associations and preserves the display_name, ensuring the Gemini API can distinguish between multiple files generated by a single tool call.

Testing
test_convert_tool_message_to_parts_list_content_with_media asserts the correct bundled Part structure.
test_convert_tool_message_to_parts_with_display_name explicitly verifies display_name metadata preservation.

If I have missed anything, please let me know and I will fix it. Thank you so much for looking into it!
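For reviewers who want a quick picture of the mapping described in the Description, here is a minimal, self-contained sketch. It is not the code in this PR and does not import the google-genai SDK: FileRef and ResponsePart are hypothetical stand-ins for FunctionResponseFileData and FunctionResponsePart, blocks_to_parts is a hypothetical helper name, and the block keys ("url", "mime_type", "display_name") are assumptions about how a ToolMessage content block might look.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FileRef:
    """Hypothetical stand-in for a URI-based file reference (not the SDK class)."""

    file_uri: str
    mime_type: Optional[str] = None
    display_name: Optional[str] = None


@dataclass
class ResponsePart:
    """Hypothetical stand-in for one media part attached to a function response."""

    file_data: FileRef


def blocks_to_parts(content: list[Any]) -> tuple[str, list[ResponsePart]]:
    """Split tool-message content into joined text and a list of media parts.

    Block key names here are assumptions made for illustration only.
    """
    texts: list[str] = []
    parts: list[ResponsePart] = []
    for block in content:
        if isinstance(block, str):
            texts.append(block)
            continue
        kind = block.get("type")
        if kind == "text":
            texts.append(block.get("text", ""))
        elif kind in ("file", "media"):
            # Carry display_name through so multiple files stay distinguishable.
            ref = FileRef(
                file_uri=block["url"],
                mime_type=block.get("mime_type"),
                display_name=block.get("display_name"),
            )
            parts.append(ResponsePart(file_data=ref))
        elif kind == "image_url":
            image = block["image_url"]
            # image_url may be a bare string or a {"url": ...} mapping.
            url = image["url"] if isinstance(image, dict) else image
            ref = FileRef(file_uri=url, display_name=block.get("display_name"))
            parts.append(ResponsePart(file_data=ref))
    return "\n".join(texts), parts


example_content = [
    {"type": "text", "text": "Generated two files."},
    {
        "type": "file",
        "url": "gs://bucket/a.pdf",
        "mime_type": "application/pdf",
        "display_name": "a.pdf",
    },
    {
        "type": "image_url",
        "image_url": {"url": "https://example.com/chart.png"},
        "display_name": "chart.png",
    },
]
example_text, example_parts = blocks_to_parts(example_content)
```

The point of the sketch is that each media block becomes its own part that keeps its display_name, instead of being detached from the function response; inline (blob) data would follow the same pattern with a bytes payload in place of the URI.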