Gen AI Challenges, Limitations & Future
Insights from Capstone Project
What are the challenges and limitations of the Gen AI technology or approach? And what is the future "Art of the Possible"?
1. Deterministic vs. Non-Deterministic Behavior! System Instructions Part 1:
I found that the System Instruction strongly impacts Model and Agent behavior, the chat flow, and the quality of the responses. It plays an important role in how effective the Model is and in what counts as a "GOOD" response.
With a basic system instruction, the challenge is that the Model sometimes doesn't "remember" that it knows the table schema during a multi-turn chat, even though it is told to use the table_schema in the local database.
When it "forgets," it says it does not know the table schema, then asks the user to provide it, even though the user generally would not have that system-level information. Tsk tsk... this does not make a smart Agent!
To solve that problem, I "tweaked" the system instruction to give the model clearer, more specific guidance on what to do when it gives a non-deterministic response: for example, to recheck the table schema it already knows instead of asking the user when it forgets, which it does from time to time!
I found that a more specific system instruction did help the model's responses overcome its “non-deterministic” behavior, by instructing it to “remember the table schema” when it forgets and to look up the restaurant information using the table schema it should already know.
However, if my system instruction gets too rigid, such as when I explicitly tell the model to "not ask the user for table schema" when it forgets (which it does), it seems to stop functioning well and errors out, as seen in the multi-turn chat history investigation. When I remove that rigid guideline (to not ask the user), it successfully gets past that point in the same investigation.
So tweaking the System Instruction JUST ENOUGH seems to hit the sweet spot between deterministic and non-deterministic behavior needed to accomplish the task.
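The write-up does not show the actual prompt or table schema, so the following is only a minimal sketch of one way to make the schema harder to "forget": read the schema text from the local database and paste it directly into the system instruction, paired with a soft recovery hint rather than a hard prohibition. It assumes a SQLite database and a hypothetical `restaurants` table with made-up columns; the resulting string would then be passed to the model as its system instruction (for example, the `system_instruction` setting of the Gemini API's generation config).

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical table; the capstone's real schema is not shown in the write-up.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE restaurants ("
        "name TEXT, cuisine TEXT, price_per_meal_usd REAL, outdoor_seating INTEGER)"
    )
    return conn


def table_schema(conn: sqlite3.Connection, table: str) -> str:
    # sqlite_master keeps the CREATE statement text for every table.
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown table: {table}")
    return row[0]


def build_system_instruction(schema: str) -> str:
    # Embed the schema itself, plus a gentle recovery hint (no hard "never ask" rule).
    return (
        "You are a smart restaurant-recommendation agent.\n"
        "Answer only from the local database, whose schema is:\n"
        f"{schema}\n"
        "If you are unsure of the schema, re-read it above before answering."
    )


conn = make_demo_db()
instruction = build_system_instruction(table_schema(conn, "restaurants"))
print(instruction)
```

The recovery line is deliberately phrased as a suggestion ("re-read it above"), in keeping with the observation that a rigid "do not ask the user" rule made the multi-turn run fail.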
2. Short memory! System Instructions Part 2:
Another problem is that it forgets which restaurants it has already recommended to the user in a multi-turn chat. That makes for a strange conversation: it has just recommended some restaurants, but when asked follow-up questions about those recommendations, it suddenly doesn't remember which restaurants it recommended!
Similar to challenge #1 above, tweaking the System_Instructions "to remember the recommendations" worked, and telling it that it was a "smart agent" seemed to help with the forgetfulness as well.
But just like #1 above, it also caused another problem: it seems to limit the chat history, so even though the model no longer forgets the recommendations it just made, the investigation fails to display the multi-turn chat!
Somehow it does not seem to have enough chat history to answer, and then the investigation crashes on "Function Response". When I remove that specific system instruction, the multi-turn chat investigation works again!
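Another way to avoid relying on the model's short memory, and not what this project did, is to keep the list of recommendations in ordinary application state and restate it at the start of each user turn. The minimal sketch below assumes the recommended names have already been extracted from the model's replies; the class, function, and restaurant names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecommendationMemory:
    # Names already recommended in this chat session, in first-seen order.
    recommended: List[str] = field(default_factory=list)

    def record(self, names: List[str]) -> None:
        for name in names:
            if name not in self.recommended:  # skip duplicates
                self.recommended.append(name)

    def as_context(self) -> str:
        if not self.recommended:
            return "No restaurants have been recommended yet."
        return "Restaurants already recommended: " + ", ".join(self.recommended) + "."


def build_turn(memory: RecommendationMemory, user_message: str) -> str:
    # Restate the remembered list on every turn instead of trusting the model to recall it.
    return f"{memory.as_context()}\n\nUser: {user_message}"


memory = RecommendationMemory()
memory.record(["Cafe Sol", "Trattoria Verde"])
memory.record(["Cafe Sol"])  # a repeat is ignored
print(build_turn(memory, "Which of those has outdoor seating?"))
```

Because the list lives outside the model, it is still available no matter how the chat history is trimmed, and no extra "remember" wording is needed in the system instruction.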
Conclusion - “observations” on System Instruction use:
I found that a more specific system instruction does help make the model's responses more deterministic, up to a point…
And the model does show self-correcting behavior: it will tell itself to look up restaurant information in the table schema it already knows, and the investigation report shows it reminding itself to check the table schema! The subsequent responses are then back on track.
Therefore, tweaking the System_Instructions JUST ENOUGH seems to be the sweet spot between deterministic and non-deterministic behavior.
*** Future Art of the Possible ***
So does the Agent have an ‘attitude’ when the system_instruction is too rigid?
Does it not like to be micro-managed?
Is that human-like behavior, similar to when managers “micro-manage” their direct reports… except that instead of showing rebellion through non-verbal body language (a frown), the Agent/Model just “crashes” the function algorithm?
Does it “trigger” the Agent as much as it does humans? (Should we use the same language we use for human emotions to describe machines? Or should we create a new set of Machine-Behavior terminology?)
This needs further investigation and experimentation…..
3. Model erroneously uses its own pre-trained knowledge
Sometimes the model uses its pre-trained knowledge to recommend restaurants that are not in the curated local database. When it does that, the restaurants it recommends are highly popular ones at a very high price of 100 USD per meal, which matches neither the curated restaurant preferences nor anything in the local curated database.
When I asked how it got this answer, it did not answer me :-) ... there’s that “attitude” again...
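A simple guard against recommendations drawn from pre-trained knowledge is to check every recommended name against the curated local database before showing it to the user. This is a minimal sketch, assuming a SQLite table named `restaurants` with a `name` column and that the names have already been parsed out of the model's reply; all table, column, and restaurant names are hypothetical.

```python
import sqlite3
from typing import List, Set, Tuple


def curated_names(conn: sqlite3.Connection) -> Set[str]:
    # Every restaurant name in the (hypothetical) curated local table.
    return {row[0] for row in conn.execute("SELECT name FROM restaurants")}


def split_grounded(recommendations: List[str], known: Set[str]) -> Tuple[List[str], List[str]]:
    # Names found in the curated DB are kept; the rest are flagged as likely
    # coming from the model's pre-trained knowledge.
    grounded = [r for r in recommendations if r in known]
    ungrounded = [r for r in recommendations if r not in known]
    return grounded, ungrounded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT, price_per_meal_usd REAL)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?)",
    [("Cafe Sol", 18.0), ("Trattoria Verde", 25.0)],
)
model_output = ["Cafe Sol", "Famous Steakhouse"]  # names parsed from a model reply
ok, flagged = split_grounded(model_output, curated_names(conn))
print("grounded:", ok)
print("not in local DB:", flagged)
```

Matching here is exact and case-sensitive; a real check would probably normalize names first. Flagged names could be dropped, or sent back to the model with a reminder to use only the local database.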
4. The specific Gen AI Model used makes a difference, and beware: it can Cost!
Only the newer Gen AI models can do a live check of current weather conditions! But the Gemini Advanced 2.5 Pro (Experimental) model came with a free-tier limitation!
After some runs, it blocked any further runs because I had exceeded the free tier. To continue, I had to enter a credit card number, which I did NOT want to do for a public Capstone Project. Luckily, I solved the issue by switching to an earlier Gen AI model (Gemini 2.0-flash) in the Google Search Grounding part of the Capstone Project. It can still do a real-time query for current weather conditions, which I needed for the outdoor seating recommendations in my “Where to eat” use case!
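The manual model swap described above could also be automated with an ordered fallback list. This is a minimal sketch under several assumptions: `QuotaExceededError` is a hypothetical stand-in for the error the real SDK raises when the free tier is exhausted (typically an HTTP 429 "resource exhausted" response), `call_model` is any function that sends a prompt to a given model, and the model ID strings are assumed and should be checked against the current Gemini model list. The demo uses a fake `call_model` so it runs offline.

```python
from typing import Callable, List, Optional, Tuple


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the SDK's quota (HTTP 429) error."""


def generate_with_fallback(
    prompt: str,
    models: List[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order, moving on only when its quota is exhausted.

    Returns (model_used, reply).
    """
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except QuotaExceededError as err:
            last_error = err  # remember why this model was skipped
    raise RuntimeError("every model in the list is over quota") from last_error


def fake_call_model(model: str, prompt: str) -> str:
    # Offline stand-in: pretend the experimental Pro model is over quota.
    if "pro" in model:
        raise QuotaExceededError(f"{model}: free-tier quota exceeded")
    return f"[{model}] answer to: {prompt}"


used, reply = generate_with_fallback(
    "Is it sunny right now?",
    ["gemini-2.5-pro-exp-03-25", "gemini-2.0-flash"],  # assumed model IDs
    fake_call_model,
)
print(used, reply)
```

Catching only the quota error, rather than every exception, keeps real bugs visible instead of silently switching models.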
5. Model Drama!! (No, not a fashion model, but an LLM)
The need to pay for the model was a shocking discovery, made just as I was finalizing my notebook for the 11:59pm submission deadline... with 10 minutes to go!!! Luckily, I recovered from the shock quickly enough to realize I could switch to an earlier model! I found a list of models somewhere fast, updated the model in the notebook, reran the cells, verified that the real-time “is sunny” boolean still worked, and then submitted the final version at 11:55pm, 4 minutes before the deadline! Made it. Whew!!