Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.
 
In this episode, we dive into the intriguing mechanics behind why chat experiences with models like GPT often start slow but then rapidly pick up speed. The key? The KV cache. This essential but under-discussed component enables the seamless and snappy interactions we expect from modern AI systems. Harrison Chu breaks down how the KV cache works, h…
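
For intuition, here is a minimal sketch of the mechanism, assuming single-head attention in NumPy; `decode_step` and the random weights are illustrative stand-ins, not Arize's code. Each step computes keys and values for the new token only and reuses the cache for everything before it:

```python
# A minimal sketch (assumed, not the episode's code) of why a KV cache speeds up
# decoding: without it, attention would recompute K/V for the whole prefix every step.
import numpy as np

d = 16                                  # head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []               # the KV cache: one entry per generated token

def decode_step(x):
    """Attend the new token's query over all cached keys/values (O(n) per step)."""
    q = x @ Wq
    k_cache.append(x @ Wk)              # compute K/V for the NEW token only
    v_cache.append(x @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                  # attention output for this step

for _ in range(5):                      # each step reuses all previous K/V
    out = decode_step(rng.standard_normal(d))
print(out.shape)                        # (16,)
```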
 
In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the Shrek Sampler, an innovative entropy-based sampling technique that is transforming LLMs. Harrison talks about how this method improves upon traditional sampling strategies by leveraging entropy and varentropy to produce more dynam…
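
As a rough illustration of the idea (not the actual Shrek Sampler implementation; the thresholds and strategy names below are assumptions), one can compute entropy and varentropy from the next-token logits and branch on them:

```python
# Hypothetical sketch of entropy-based sampling: measure how "unsure" the model is
# from its next-token distribution, then branch on that signal.
import numpy as np

def entropy_varentropy(logits):
    """Shannon entropy of softmax(logits), and the variance of per-token surprisal."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    surprisal = -np.log(p + 1e-12)
    h = float((p * surprisal).sum())                 # entropy
    vh = float((p * (surprisal - h) ** 2).sum())     # varentropy
    return h, vh

def pick_strategy(logits, h_lo=0.5, h_hi=3.0):       # thresholds are illustrative
    h, vh = entropy_varentropy(logits)
    if h < h_lo:
        return "greedy"           # confident: take the argmax
    if h > h_hi and vh > h_hi:
        return "resample/branch"  # confused: explore alternatives
    return "temperature-sample"   # in between: ordinary sampling

print(pick_strategy(np.array([5.0, 0.1, 0.1, 0.1])))  # -> greedy
```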
 
This week, Aman Khan and Harrison Chu explore NotebookLM’s unique features, including its ability to generate realistic-sounding podcast episodes from text (but this podcast is very real!). They dive into some technical underpinnings of the product, specifically the SoundStorm model used for generating high-quality audio, and how it leverages a hie…
 
Financial institutions have dramatically increased their investment and trust in large language models since 2023. Kartik Ramakrishnan, Capgemini’s deputy CEO of Financial Services and head of banking and capital markets, shares the results of a recent report that analyzed these changes.
 
OpenAI recently released its o1-preview, which they claim outperforms GPT-4o on a number of benchmarks. These models are designed to think more before answering and to handle complex tasks, especially science and math questions, better than OpenAI's other models. We take a closer look at their latest crop of o1 models, and we also highlight some research …
 
A recent announcement on X boasted a tuned model with pretty outstanding performance, and claimed these results were achieved through Reflection Tuning. However, people were unable to reproduce the results. We dive into some recent drama in the AI community as a jumping-off point for a discussion about Reflection 70B. In 2023, there was a paper wri…
 
This week, we're excited to be joined by Kyle O'Brien, Applied Scientist at Microsoft, to discuss his most recent paper, Composable Interventions for Language Models. Kyle and his team present a new framework, composable interventions, that allows for the study of multiple interventions applied sequentially to the same language model. The discussio…
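
A hedged sketch of the framing, with toy dictionary-based "models" standing in for real ones: treat each intervention as a function from model to model and study how the order of composition changes the outcome:

```python
# Illustrative sketch of the composable-interventions idea (all names are made up):
# apply editing, compression, and unlearning in every order and compare results.
from functools import reduce
from itertools import permutations

def knowledge_edit(model):   return model | {"edited": True}
def compress(model):         return model | {"compressed": True}
def unlearn(model):          return model | {"unlearned": True}

def compose(model, interventions):
    """Apply interventions left to right: compose(m, [f, g]) == g(f(m))."""
    return reduce(lambda m, f: f(m), interventions, model)

base = {"name": "toy-lm"}
for order in permutations([knowledge_edit, compress, unlearn]):
    out = compose(base, order)
    # In the paper's setting, you would re-run each intervention's success metric
    # here to see whether earlier interventions degraded later ones.
    print([f.__name__ for f in order], out)
```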
 
This week’s paper presents a comprehensive study of the performance of various LLMs acting as judges. The researchers leverage TriviaQA as a benchmark for assessing objective knowledge reasoning of LLMs and evaluate them alongside human annotations, which they find to have high inter-annotator agreement. The study includes nine judge models and ni…
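
For reference, inter-annotator agreement is commonly quantified with Cohen's kappa; a quick sketch using scikit-learn (the labels below are made up for illustration):

```python
# Toy example: kappa between two human annotators, and between a judge model and
# a human, over binary correct/incorrect judgments.
from sklearn.metrics import cohen_kappa_score

human_a = [1, 1, 0, 1, 0, 1, 1, 0]
human_b = [1, 1, 0, 1, 0, 1, 0, 0]
judge   = [1, 0, 0, 1, 0, 1, 1, 1]

print("human-human kappa:", cohen_kappa_score(human_a, human_b))
print("judge-human kappa:", cohen_kappa_score(human_a, judge))
```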
 
Meta just released Llama 3.1 405B--according to them, it’s “the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation.” Will the latest Llama herd ignite new applications and modeling paradigms like synthetic data gene…
 
Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic “prompt engineering.” The paper this week introduces LM Assertions, a programming construct for expressing computational constraints that LMs should satisfy. The researchers integrated the…
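
A minimal sketch of the construct's spirit, not the paper's actual API (which builds on DSPy); `lm`, `assert_lm`, and the retry policy below are illustrative assumptions:

```python
# Sketch of the LM Assertions idea: declare a constraint on an LM call, and on
# violation retry with the failure message folded back into the prompt.
def lm(prompt: str) -> str:
    return "placeholder answer"            # stand-in for a real model call

def assert_lm(prompt, check, explain, max_retries=2):
    out = lm(prompt)
    for _ in range(max_retries):
        if check(out):
            return out
        # Feed the violated constraint back so the model can self-correct.
        out = lm(f"{prompt}\n\nYour previous answer violated: {explain}\nFix it.")
    return out

answer = assert_lm(
    "Answer in under 10 words: what is a KV cache?",
    check=lambda s: len(s.split()) < 10,
    explain="the answer must be under 10 words",
)
print(answer)
```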
 
Adapting LLMs to specialized domains (e.g., recent news, enterprise private documents) is essential for many applications, so we discuss a paper that asks how to adapt pre-trained LLMs for RAG in specialized domains. SallyAnn DeLucia is joined by Sai Kolasani, researcher at UC Berkeley’s RISE Lab (and Arize AI Intern), to talk about his work on RAFT: Adapting Languag…
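
As a rough sketch of RAFT-style training data construction (field names and proportions are assumptions, not the paper's exact recipe): pair each question with distractor documents, and include the golden document only some of the time so the model learns both to use and to ignore retrieved context:

```python
# Illustrative RAFT-style example builder: with probability p_golden the oracle
# document appears among the distractors; otherwise the model sees distractors only.
import random

def make_raft_example(question, golden_doc, distractors, p_golden=0.8, k=3):
    docs = random.sample(distractors, k)
    if random.random() < p_golden:
        docs.append(golden_doc)        # oracle document included
    random.shuffle(docs)
    return {"question": question, "context": docs}

example = make_raft_example(
    "When was the RISE Lab founded?",
    golden_doc="golden doc ...",
    distractors=[f"distractor {i}" for i in range(10)],
)
print(example)
```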
 
It’s been an exciting couple weeks for GenAI! Join us as we discuss the latest research from OpenAI and Anthropic. We’re excited to chat about this significant step forward in understanding how LLMs work and the implications it has for deeper understanding of the neural activity of language models. We take a closer look at some recent research from…
 
Foundational models like GPT-4, the large language model behind ChatGPT, have hoovered up content from publications like The New York Times and social media sites like Reddit, and OpenAI faces several lawsuits because of this. John Thompson, global head of artificial intelligence at EY and author of the book Data for All, has set up what is …
 
We break down the paper--Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment. Ensuring alignment (i.e., making models behave in accordance with human intentions) has become a critical task before deploying LLMs in real-world applications. However, a major challenge faced by practitioners is the lack of clear guid…
 
Proof of identity is critical for many things, including being able to open a bank account, get a job, or obtain health care. Yet proving one’s identity is getting harder in a world of frequent data breaches. We asked Mariana Dahan, founder of the World Identity Network and chair of the Universal ID Council, what she thinks will solve this problem.…
 
Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators often inherit the problems of the LLMs they evaluate, requiring further human validation. This week’s paper explores EvalGen, a m…
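
A toy sketch of the underlying idea, with made-up outputs, grades, and candidate assertions: propose candidate evaluator functions, then keep only those that agree with a small set of human grades:

```python
# Illustrative EvalGen-style selection: score each candidate assertion by its
# agreement with human thumbs-up/thumbs-down grades, then keep the aligned ones.
outputs = ["short answer", "a much longer and rambling answer indeed", "ok"]
human_grades = [True, False, True]          # human judgments of the same outputs

candidates = {
    "under_5_words": lambda s: len(s.split()) < 5,
    "contains_answer": lambda s: "answer" in s,
}

for name, fn in candidates.items():
    agreement = sum(fn(o) == g for o, g in zip(outputs, human_grades)) / len(outputs)
    print(name, agreement)                  # select assertions with high agreement
```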
 
Custodia Founder and CEO Caitlin Long says the Federal Reserve has rewritten the rules around accessing the government's payments system. The central bank and a federal court judge disagree. Editor’s note: This conversation was recorded on April 17. On April 26, Custodia Bank filed a notice of appeal, signaling that it will challenge the district c…
 
This week we explore ReAct, an approach that enhances the reasoning and decision-making capabilities of LLMs by combining step-by-step reasoning with the ability to take actions and gather information from external sources in a unified framework. To learn more about ML observability, join the Arize AI Slack community or get the latest on our Linked…
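
A minimal sketch of the ReAct loop under stated assumptions (`model` and `search` are stand-ins for a real LLM and a real tool): interleave model-generated reasoning and action steps with tool Observations appended back into the transcript:

```python
# Toy ReAct loop: the model emits Action lines, the harness executes them and
# appends Observations, and the loop ends on a Final Answer (or a step cap).
def model(transcript: str) -> str:
    return "Action: search[Arize AI]"        # a real LLM call would go here

def search(query: str) -> str:
    return f"stub results for {query!r}"     # stand-in for a search tool

transcript = "Question: What does Arize AI build?\n"
for _ in range(3):                           # cap the number of reasoning steps
    step = model(transcript)
    transcript += step + "\n"
    if step.startswith("Action: search["):
        obs = search(step[len("Action: search["):-1])
        transcript += f"Observation: {obs}\n"
    elif step.startswith("Final Answer:"):
        break
print(transcript)
```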
 
This week, we’re covering Amazon’s time series model: Chronos. Developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model customization. Chronos, however, is built on a language model architecture and trained with billions of tokenized time series observations, enabling it t…
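
A hedged sketch of the tokenization idea (bin counts and scaling simplified relative to the paper): mean-scale the real-valued series, then quantize it into a fixed vocabulary so a language model can consume it like text:

```python
# Illustrative Chronos-style tokenization: scale, then bin into discrete token ids.
import numpy as np

def tokenize_series(series, n_bins=100, lo=-5.0, hi=5.0):
    scaled = series / (np.abs(series).mean() + 1e-8)   # mean scaling
    bins = np.linspace(lo, hi, n_bins - 1)
    return np.digitize(scaled, bins)                   # token ids in [0, n_bins-1]

series = np.sin(np.linspace(0, 6, 50)) * 10 + 3
tokens = tokenize_series(series)
print(tokens[:10])          # the "sentence" a Chronos-style model would be fed
```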
 
This week we dive into the latest buzz in the AI world – the arrival of Claude 3. Claude 3 is the newest family of models in the LLM space, and Claude 3 Opus (Anthropic's "most intelligent" Claude model) challenges the likes of GPT-4. The Claude 3 family of models, according to Anthropic, "sets new industry benchmarks," and includes "three state-o…
 
We’re exploring Reinforcement Learning in the Era of LLMs this week with Claire Longo, Arize’s Head of Customer Success. Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest…
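
For context, RLHF typically begins with a reward model trained on pairwise human preferences; a toy sketch of the standard Bradley-Terry loss (the reward values below are made-up scalars, not real model outputs):

```python
# Pairwise preference loss: push the reward of the chosen response above the
# rejected one via -log(sigmoid(r_chosen - r_rejected)).
import torch

r_chosen = torch.tensor([1.2, 0.3, 2.0], requires_grad=True)
r_rejected = torch.tensor([0.4, 0.9, 1.5], requires_grad=True)

loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()             # gradients would update the reward model
print(float(loss))
```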
 
Geraldine Fleming, financial task force manager at United for Wildlife, and Jonny Bell, director, EMEA, at LexisNexis Risk Solutions, explain how banks around the world are helping to catch criminals who illegally mutilate, kill, and sell rhinoceroses, elephants, donkeys, and other animals.
 
This week, we discuss the implications of Text-to-Video Generation and speculate as to the possibilities (and limitations) of this incredible technology with some hot takes. Dat Ngo, ML Solutions Engineer at Arize, is joined by community member and AI Engineer Vibhu Sapra to review OpenAI’s technical report on their Text-To-Video Generation Model: …
 
This week, we’re discussing "RAG vs Fine-Tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture." This paper explores a pipeline for fine-tuning and RAG, and presents the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. The authors propose a pipeline that consists of multiple stages, including extracting …
 
We dive into Phi-2 and some of the major differences and use cases for a small language model (SLM) versus an LLM. With only 2.7 billion parameters, Phi-2 surpasses the performance of Mistral and Llama-2 models at 7B and 13B parameters on various aggregated benchmarks. Notably, it achieves better performance compared to the 25x larger Llama-2-70B model…
 
We discuss HyDE: a thrilling zero-shot learning technique that combines GPT-3’s language understanding with contrastive text encoders. HyDE revolutionizes information retrieval and grounding in real-world data by generating hypothetical documents from queries and retrieving similar real-world documents. It outperforms traditional unsupervised retri…
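
A minimal sketch of the HyDE recipe, where `generate` and `embed` are illustrative stand-ins for a real LLM and a real contrastive encoder: embed a hallucinated answer document instead of the raw query, then retrieve its nearest real neighbors:

```python
# Toy HyDE: the hypothetical document lives closer to real answer documents in
# embedding space than the raw query does, so we search with it instead.
import numpy as np

def generate(query: str) -> str:
    return f"A plausible passage answering: {query}"   # stand-in for an LLM

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(64)                        # stand-in encoder
    return v / np.linalg.norm(v)

corpus = ["doc about retrieval", "doc about sampling", "doc about KV caches"]
doc_vecs = np.stack([embed(d) for d in corpus])

hypothetical = generate("How does HyDE improve retrieval?")
scores = doc_vecs @ embed(hypothetical)                # cosine similarity
print(corpus[int(scores.argmax())])
```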
 
The fintech revolution has been more successful at working with banks than at trying to replace them, points out Gene Ludwig, former Comptroller of the Currency, chair of the Ludwig Institute for Shared Economic Prosperity, and co-founder of Canapi Ventures. Those with “must have” products will fare far better in 2024 than those with “nice to have”…
 
For the last paper read of the year, Arize CPO & Co-Founder, Aparna Dhinakaran, is joined by Dat Ngo (ML Solutions Architect) and Aman Khan (Product Manager) for an exploration of the new kids on the block: Gemini and Mixtral-8x7B. There's a lot to cover, so this week's paper read is Part I in a series about Mixtral and Gemini. In Part I, we prov…
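
For background on Mixtral's architecture: it is a sparse mixture-of-experts model in which a router selects the top-2 of 8 expert MLPs per token and mixes their outputs by the routing weights. A hedged PyTorch sketch of that routing step (dimensions and the toy linear experts are illustrative):

```python
# Toy top-2 MoE routing: each token is processed by only 2 of 8 experts.
import torch

n_experts, d = 8, 32
router = torch.nn.Linear(d, n_experts)
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))

def moe_layer(x):                                   # x: (tokens, d)
    weights, idx = router(x).topk(2, dim=-1)        # top-2 experts per token
    weights = torch.softmax(weights, dim=-1)
    out = torch.zeros_like(x)
    for slot in range(2):                           # each token's two chosen experts
        for e in range(n_experts):
            mask = idx[:, slot] == e                # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

print(moe_layer(torch.randn(4, d)).shape)           # torch.Size([4, 32])
```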
 
It's been a rough year for fintechs and for the venture capital firms that fund them. Venture capital flows into financial technology companies dropped by 36% year over year to $6 billion in the third quarter of 2023. But Amy Nauiokas, founder and CEO of Anthemis Group, is optimistic about 2024.
 
We’re thrilled to be joined by Shuaichen Chang, LLM researcher and the author of this week’s paper to discuss his findings. Shuaichen’s research investigates the impact of prompt constructions on the performance of large language models (LLMs) in the text-to-SQL task, particularly focusing on zero-shot, single-domain, and cross-domain settings. Shu…
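
As a concrete example of what "prompt construction" means here (the schema and template below are illustrative, not the paper's exact format): a common zero-shot construction serializes the schema as CREATE TABLE statements and appends the question:

```python
# Illustrative zero-shot text-to-SQL prompt; the LLM completes the SQL after SELECT.
schema = """CREATE TABLE singers (id INT, name TEXT, country TEXT);
CREATE TABLE concerts (id INT, singer_id INT, year INT);"""

question = "How many concerts did each singer perform in 2023?"

prompt = (
    f"{schema}\n\n"
    f"-- Using valid SQL, answer the following question.\n"
    f"-- Question: {question}\n"
    f"SELECT"
)
print(prompt)
```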
 
Over the past year, the national bank regulators’ oversight of Silicon Valley Bank, Signature Bank, Silvergate Capital and other banks that failed has been criticized. Reports of a toxic workplace at the FDIC have come to light. And the OCC hired a Deputy Comptroller and overseer of fintech who had easily discoverable falsehoods on his resume. Mich…
 
For this paper read, we’re joined by Samuel Marks, Postdoctoral Research Associate at Northeastern University, to discuss his paper, “The Geometry of Truth: Emergent Linear Structure in LLM Representations of True/False Datasets.” Samuel and his team curated high-quality datasets of true/false statements and used them to study in detail the structur…
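
A hedged sketch of the probing recipe, with synthetic activations standing in for hidden states you would actually extract from the model: fit a linear probe and test whether a single direction separates true from false statements:

```python
# Toy linear-probe experiment: activations are synthetic, shifted along a planted
# "truth direction" so the probe has something linear to find.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
direction = rng.standard_normal(128)                    # planted truth direction
labels = rng.integers(0, 2, size=500)                   # 1 = true statement
acts = rng.standard_normal((500, 128)) + np.outer(labels * 2 - 1, direction)

probe = LogisticRegression(max_iter=1000).fit(acts[:400], labels[:400])
print("probe accuracy:", probe.score(acts[400:], labels[400:]))
```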
 
In this paper read, we discuss “Towards Monosemanticity: Decomposing Language Models Into Understandable Components,” a paper from Anthropic that addresses the challenge of understanding the inner workings of neural networks, drawing parallels with the complexity of human brain function. It explores the concept of “features” (patterns of neuron ac…
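
The paper's core tool is a sparse autoencoder trained on MLP activations so that each learned latent fires for one interpretable feature. A hedged PyTorch sketch of the objective (reconstruction error plus an L1 sparsity penalty; dimensions, coefficients, and the random stand-in activations are illustrative):

```python
# Toy sparse autoencoder: overcomplete dictionary, ReLU features, L1 sparsity.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=128, d_hidden=1024):    # overcomplete dictionary
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))                    # sparse feature activations
        return self.dec(f), f

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(256, 128)                           # stand-in MLP activations

for _ in range(100):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```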
 
Community banks sometimes feel that they lack the budget and staff to compete with larger banks and fintechs on things like mobile and online banking, virtual assistants, and most recently generative AI. Jim Perry, senior strategist at Market Insights, suggests steps they can and should take to stay relevant technology-wise.…
 
The case is not really about cryptocurrency but about fraud, points out Seoyoung Kim, department chair and associate professor of finance and business analytics at the Leavey School of Business at Santa Clara University. But regulators and lawmakers are watching, and the outcome of the trial will have repercussions throughout finance.…
 