Evaluating LLMs with Leva
In this episode of the Ruby AI Podcast, host Valentino Stoll talks with special guest Kieran, a prominent figure in the Ruby AI space. Kieran recently gave a talk at the San Francisco Ruby Meetup about his new gem, Leva, which focuses on LLM evaluations in Ruby. He discusses his background, his passion for AI and Ruby, and his journey building AI products, including Cora, a tool that helps manage email inboxes by using AI to categorize and summarize emails. Together, Valentino and Kieran explore the process, challenges, and best practices of creating AI-driven gems and tools in Ruby, the importance of evaluations, and the fun, creative side of integrating AI into Ruby on Rails projects.
00:00 Introduction and Guest Welcome
00:53 Kieran's Background and AI Journey
01:20 Building AI Tools and the Leva Gem
03:47 Challenges and Best Practices in AI Development
07:16 Evaluations and Real-World Applications
07:36 Community Recognition and Adoption
12:37 Prompt Engineering and Model Testing
22:06 Leveraging AI for Workflow Optimization
28:35 Visualizing Workflows and Tools
31:44 Exploring Hybrid Orchestration Layers
33:15 Debating Deterministic Workflows vs. Agent Flows
34:28 The Fun of Experimenting with AI and Ruby
34:55 Building Gems and Learning Through Creation
40:03 The Value of Rails in AI Development
46:28 Evaluating AI Outputs and Metrics
50:40 Annotation and Continuous Improvement
53:50 Future of AI and Rails Integration
54:54 Closing Thoughts and Recommendations
Chapters
1. Evaluating LLMs with Leva (00:00:00)
2. Kieran's Background and AI Journey (00:00:53)
3. Building AI Tools and the Leva Gem (00:01:10)
4. Challenges and Best Practices in AI Development (00:03:47)
5. Evaluations and Real-World Applications (00:07:16)
6. Community Recognition and Adoption (00:07:36)
7. Prompt Engineering and Model Testing (00:12:37)
8. Leveraging AI for Workflow Optimization (00:22:06)
9. Visualizing Workflows and Tools (00:28:35)
10. Exploring Hybrid Orchestration Layers (00:31:44)
11. Debating Deterministic Workflows vs. Agent Flows (00:33:15)
12. The Fun of Experimenting with AI and Ruby (00:34:28)
13. Building Gems and Learning Through Creation (00:34:55)
14. The Value of Rails in AI Development (00:40:03)
15. Evaluating AI Outputs and Metrics (00:46:28)
16. Annotation and Continuous Improvement (00:50:40)
17. Future of AI and Rails Integration (00:53:50)
18. Closing Thoughts and Recommendations (00:54:54)