powers
Our team: Hamza (Data Scientist, SOPHiA GENETICS; multimodal ML, LLMs/RAG), Moncif (Java/Spring, Amundi), Taha (PhD, ENIB/UniSA; Unity/C# XR), and Wissal (SNCF; software, CV).
YouTube Video
Project Description
We have built an interactive Google Maps agent that shows directions using 3D artifacts, built on top of Unity, the Google Gemini API, and ADK. The user's speech is transcribed, and the text is sent to the agent through a FastAPI POST request. The agent decides whether to call the tools at its disposal (get_directions) to get the exact directions and the coordinates (latitude and longitude) of the start and end of the route, as well as of each step along the way. For example, while walking to your destination, you can see from far away an arrow pointing to your next turn, together with the distance to it in meters.
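The get_directions tool is where those coordinates come from. Below is a minimal sketch, assuming the tool wraps the Google Maps Directions API (JSON output) and keeps only the first route's first leg. The dataclasses and `parse_directions` are hypothetical names for illustration, not our actual code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LatLng:
    lat: float
    lng: float


@dataclass(frozen=True)
class Step:
    start: LatLng
    end: LatLng
    distance_m: int   # step length in meters ("distance.value" in the API response)
    instruction: str  # HTML instruction text, e.g. "Turn <b>left</b>"


@dataclass(frozen=True)
class Route:
    start: LatLng
    end: LatLng
    steps: list[Step]


def _latlng(d: dict[str, Any]) -> LatLng:
    return LatLng(float(d["lat"]), float(d["lng"]))


def parse_directions(response: dict[str, Any]) -> Route:
    """Flatten the first route/leg of a Directions-API-style JSON response (assumed shape)."""
    if response.get("status") != "OK" or not response.get("routes"):
        raise ValueError(f"no route found (status={response.get('status')!r})")
    leg = response["routes"][0]["legs"][0]
    steps = [
        Step(
            start=_latlng(s["start_location"]),
            end=_latlng(s["end_location"]),
            distance_m=int(s["distance"]["value"]),
            instruction=s.get("html_instructions", ""),
        )
        for s in leg["steps"]
    ]
    return Route(_latlng(leg["start_location"]), _latlng(leg["end_location"]), steps)
```

Returning plain start/end/step coordinates keeps the Unity side simple: it only needs to place one marker per step and point the arrow at the next one.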
Our agent has one goal: to show clear directions that help people navigate new cities, while staying highly accurate by relying on real coordinates.
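One way to turn real coordinates into the arrow direction and the distance in meters is sketched below. It assumes the app knows the user's current latitude/longitude (for example from the ARCore Geospatial API) and the device's compass heading, and that it uses the next step's start location from get_directions as the turn point. It applies the haversine formula on a spherical Earth, whose small approximation error is negligible at walking distances, and the standard initial-bearing formula. The function names are hypothetical. In the app itself this logic would be C# in Unity.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical-Earth approximation)


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Compass bearing from point 1 toward point 2, degrees clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lng2 - lng1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def arrow_for_next_turn(user: tuple[float, float], turn: tuple[float, float],
                        device_heading_deg: float) -> tuple[int, float]:
    """Distance label in whole meters, and arrow yaw relative to where the device faces, in [-180, 180)."""
    distance = haversine_m(*user, *turn)
    bearing = initial_bearing_deg(*user, *turn)
    # Positive yaw means the turn is to the right of the current heading.
    yaw = (bearing - device_heading_deg + 540.0) % 360.0 - 180.0
    return round(distance), yaw
```

For example, a user at (0, 0) facing north with the next turn at (0, 1) gets an arrow rotated 90 degrees to the right and a label of about 111 km. Real walking steps are, of course, tens of meters apart.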
Our current agent is deployed at: https://production-adk-agent-3-1004225678073.europe-west1.run.app
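The Unity app forwards each transcript to this service with an HTTP POST. The app does this in C#. The sketch below mirrors the request in Python using only the standard library. The `/query` path and the `{"text": ...}` body are hypothetical placeholders, since the real route and payload schema are not described here.

```python
import json
import urllib.request

AGENT_BASE_URL = "https://production-adk-agent-3-1004225678073.europe-west1.run.app"
QUERY_PATH = "/query"  # hypothetical route name, not the documented endpoint


def build_agent_request(transcript: str, base_url: str = AGENT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the JSON POST that forwards a speech transcript to the agent."""
    body = json.dumps({"text": transcript}).encode("utf-8")  # hypothetical payload field
    return urllib.request.Request(
        base_url + QUERY_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(transcript: str, timeout_s: float = 30.0) -> dict:
    """Send the transcript and decode the agent's JSON reply."""
    with urllib.request.urlopen(build_agent_request(transcript), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect and test without network access.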
We also deployed and tested gemma3-4b on an NVIDIA L4 GPU on Google Cloud at https://ollama-gemma3-4b-gpu-1004225678073.europe-west1.run.app, as well as two Google Maps MCP servers. However, we hit a major issue with gemma3-4b with both tool calling and MCP (we tested both): it kept calling the tools again in an endless loop. Only once we switched to the Gemini API did the agent work.
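The failure mode is easy to reproduce and detect on the client side. The following is a minimal sketch, not our shipped code (we solved the problem by switching to Gemini). It drives a generic tool-calling loop with a model callable whose replies are assumed to look roughly like Ollama-style chat messages (`tool_calls` entries with a function name and an arguments dict). It stops when the model repeats an identical tool call or exceeds a turn budget. `run_with_loop_guard` and `ToolLoopError` are hypothetical names.

```python
import json
from typing import Any, Callable

# Assumed message shape: an assistant reply may carry
# "tool_calls": [{"function": {"name": str, "arguments": dict}}].
Message = dict[str, Any]
ModelFn = Callable[[list[Message]], Message]


class ToolLoopError(RuntimeError):
    """Raised when the model keeps requesting tools instead of answering."""


def run_with_loop_guard(model: ModelFn, tools: dict[str, Callable[..., Any]],
                        messages: list[Message], max_turns: int = 5) -> str:
    seen_calls: set[str] = set()
    for _ in range(max_turns):
        reply = model(messages)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply.get("content", "")  # final answer: no tool requested
        messages.append(reply)
        for call in calls:
            name, args = call["function"]["name"], call["function"]["arguments"]
            # Same tool with the same arguments again: the model is ignoring the result.
            key = name + json.dumps(args, sort_keys=True)
            if key in seen_calls:
                raise ToolLoopError(f"repeated identical call to {name}({args})")
            seen_calls.add(key)
            result = tools[name](**args)
            messages.append({"role": "tool", "content": json.dumps(result)})
    raise ToolLoopError(f"no final answer after {max_turns} model turns")
```

Rejecting exact repeats is deliberately strict. A model may still re-query with different arguments, which the guard allows up to the turn budget.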
Our app is built with the following:
- Unity (C#)
- ARCore Geospatial API (to load the worldwide map)
- AR Foundation (for augmented reality)
- Unity Speech to Text Plugin (using Whisper)
- ADK for agent setup and tool calling
- Google Cloud Run for all deployments: the agent, the MCP servers, and Gemma3-4b (on GPU)
- FastAPI to expose the agent endpoint to the frontend
Prior Work
None