AI/ML news summary: Week 41

Another week in AI means more breakthroughs, new models, incredible research, and massive leaps in hardware. I’ve scoured a lot of dark and obscure places of the interwebs to bring you this content, as usual.

If you don’t like reading, here’s the podcast of this episode:


Before we start!

If you like this topic and you want to support me:

  • Comment on the article; Google appreciates that and it will really help spread the word 📢
  • Connect with me on Linkedin 🔗
  • Subscribe to TechTonic Shifts to get your daily dose of tech 💉


Alright, get ready…

GO!

OpenAI’s 2024 DevDay is the Super Bowl for AI nerds. Complete with executive drama, fancy new toys for developers, and enough jargon to make your head grow in circumference. But don’t worry, I’m here to translate it into something that won’t require multiple PhDs to understand.

Think of it as the highlights, minus the fluff. And trust me, the tea was hot, starting with execs leaving the company and OpenAI securing more cash than a lottery winner.

Ze big announcements

OpenAI has been busy despite the executive exits that could rival a reality show drama. They have rolled out new developer features like vision fine-tuning, model distillation, prompt caching, and a bunch of other tools designed to make developers’ lives easier, or at least slightly less expensive. They’ve even brought out their new favorite: the Realtime API for speech-to-speech conversations. Plus, they bragged about the 3 million developers building with their AI models. I’m pretty sure even Starbucks doesn’t see that many people in a week.

And Nvidia decided to crash the party with their NVLM 1.0 multimodal model. That’s right, Nvidia wants a piece of the AI pie too, because apparently, GPUs aren’t enough to keep them busy.

Or from becoming even richer.

Let’s break it down:

1. Realtime API for speech-to-speech conversations:

You like talking to Siri or Alexa? Well, imagine those voice assistants after a serious upgrade.

OpenAI introduced a Realtime API that lets developers build apps with smooth, low-latency speech-to-speech interactions. No more clunky back-and-forths where you ask the bot to order pizza, and it responds 30 seconds later like it’s on dial-up. This API handles streaming audio inputs and outputs so conversations flow like you’re chatting with a human. It’s in public beta now, and if you’re ready to drain your wallet, it costs $0.06 per minute of input and $0.24 per minute of output. (this is the link: https://openai.com/realtime-api if you don’t trust me with dollars).
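If you want to know what draining your wallet actually looks like, here’s a back-of-the-envelope cost calculator using the per-minute rates quoted above. The rates are the ones from this article; check OpenAI’s pricing page before trusting them with real dollars.

```python
# Estimated cost of a Realtime API session, at the rates quoted above:
# $0.06 per minute of audio in, $0.24 per minute of audio out.

INPUT_RATE = 0.06   # dollars per minute of audio you send
OUTPUT_RATE = 0.24  # dollars per minute of audio the model speaks back

def conversation_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimated dollar cost of one speech-to-speech conversation."""
    return input_minutes * INPUT_RATE + output_minutes * OUTPUT_RATE

# A 10-minute chat where you talk for 6 minutes and the bot for 4:
print(round(conversation_cost(6, 4), 2))  # 1.32
```

Notice that the model talking back costs four times as much as you talking to it. Choose your questions wisely.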

2. Canvas interface for writing and coding:

Ah, the Canvas Interface, for those of us who need a bit more hand-holding when it comes to coding and writing.

😊

OpenAI is now giving developers and writers a collaborative tool, built right into ChatGPT. It’s like Google Docs or GoogleLM, but the AI is somewhat smarter and more patient. You can tweak code or writing, get inline feedback, and make changes without the fear of judgment from your real-life colleagues. This is ChatGPT’s version of being your non-judgmental partner who helps fix your mess. You’re welcome. (this is the link: https://openai.com/canvas-interface).

3. Vision fine-tuning with GPT-4o:

Vision fine-tuning, because why settle for text when you can throw some images into the mix? OpenAI has expanded its GPT-4o capabilities to let developers fine-tune the model with images. You can teach it to recognize objects, analyze medical images, or power a visual search. The best part is that you don’t need thousands of images to get started.

A couple hundred will do!

It feels like someone handed me a personal AI that could diagnose a broken bone from an X-ray and find my missing keys under the couch. There is also a bonus… OpenAI is giving out 1 million free training tokens daily until the end of October 2024.

Who doesn’t love free stuff? (this is the link: https://openai.com/vision-fine-tuning).
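For the curious: a vision fine-tuning dataset is basically a JSONL file of chat-style examples with images in the user turn. Here’s a rough sketch of what one record might look like; the exact field names are OpenAI’s call, not mine, so treat this as an approximation and double-check their fine-tuning docs before uploading anything.

```python
import json

# A sketch of one vision fine-tuning training example: a chat exchange where
# the user turn mixes text with an image URL. The schema is an approximation
# of OpenAI's chat format; the image and answer below are made-up examples.

def make_example(image_url: str, question: str, answer: str) -> str:
    """Build one JSONL line for a (hypothetical) vision fine-tuning file."""
    record = {
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]},
            {"role": "assistant", "content": answer},
        ]
    }
    return json.dumps(record)

# A couple hundred lines like this make up the whole training file:
line = make_example("https://example.com/xray_042.png",
                    "Is there a fracture in this X-ray?",
                    "Yes, a hairline fracture of the left radius.")
print(line[:50])
```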

4. Prompt caching:

You know the feeling that you’re paying too much just to get the same AI response again?

Prompt Caching is here to save your bank account. This is a feature that lets OpenAI’s models reuse tokens from previous requests, which makes things cheaper and faster when you keep sending the same prompts. It’s kinda like when your phone auto-completes your texts based on what you usually say. The pricing: cached tokens cost half as much as regular ones (this is the link: https://openai.com/prompt-caching).
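How much does “half price” actually save you? Here’s a toy calculation. The 50% discount is the figure from this article; the per-token rate is one I made up for illustration, so swap in the real price list before budgeting.

```python
# Toy prompt-caching savings calculator. The 50% discount on cached tokens is
# the figure quoted above; the per-1K-token rate is a made-up placeholder.

REGULAR_PRICE_PER_1K = 0.005  # hypothetical dollars per 1K input tokens
CACHE_DISCOUNT = 0.5          # cached tokens cost half as much

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of a prompt when some of its tokens hit the cache."""
    fresh = total_tokens - cached_tokens
    return (fresh + cached_tokens * CACHE_DISCOUNT) * REGULAR_PRICE_PER_1K / 1000

# A 4,000-token prompt where 3,000 tokens are a reused system prompt:
no_cache = input_cost(4000, 0)        # every token billed at full rate
with_cache = input_cost(4000, 3000)   # 3K tokens served from cache
print(round(no_cache, 4), round(with_cache, 4))
```

The bigger your reused prefix (system prompts, few-shot examples), the closer you get to that full 50% saving.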

5. Model Distillation:

Model distillation sounds fancy, but it’s really about taking a big, expensive AI model and making a smaller, cheaper version of it that is almost as smart. OpenAI has made it easier to fine-tune these smaller models with tools like Stored Completions to generate datasets and Evals to measure model performance. So now you get high performance at a fraction of the cost. And no, it does not come with a free sandwich. But it’s still pretty good (this is the link: https://openai.com/model-distillation).
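The core trick behind distillation fits in about twenty lines: the small “student” model is trained to match the big “teacher” model’s soft output probabilities, not just hard labels. This is the general technique, not OpenAI’s specific pipeline (which wraps it in Stored Completions and Evals); the numbers below are toy logits.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; higher temperature = softer."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened outputs."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

# The student is rewarded for mimicking the teacher's whole distribution,
# even the probability it puts on the "wrong" classes:
close = distillation_loss([4.0, 1.0, 0.1], [3.8, 1.2, 0.2])
far = distillation_loss([4.0, 1.0, 0.1], [0.1, 1.0, 4.0])
print(close < far)  # True: the closer student scores a lower loss
```

Minimizing that loss over a dataset of teacher outputs is, roughly, what “distilling” a model means.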

Why even bother

Well, first off, competition is heating up. Nvidia is launching its NVLM 1.0, and Google, Anthropic, and DeepSeek are all pushing their own multimodal models. The AI space is starting to look like the Olympics. But OpenAI still holds the crown when it comes to making things user-friendly, even if they’re a bit late to the prompt-caching party.


Hottest news:

Because what’s a tech event without some breaking news? Here’s what you might’ve missed while OpenAI was busy:

1. Nvidia’s NVLM 1.0:

Nvidia just launched NVLM 1.0, a multimodal model that handles both vision and text tasks and throws down the gauntlet with 72 billion parameters. Because bigger is better when it comes to AI models. And Nvidia wants to prove they’re a bit more than just GPUs; they want more cash, and a slice of the AI pie sounds delicious too.

2. Meta’s Movie Gen Model:

If you ever wanted to create your own 16-second movie from a text prompt but were fed up with waiting for OpenAI’s Sora… now you can, thanks to Meta’s new Movie Gen model. It generates realistic video clips with sound, but they haven’t released it publicly yet because, let’s face it, there’s a lot that could go wrong with letting people make AI-generated videos at scale. Humans… Big sigh… (this is the link: https://meta.com/movie-gen-model).

3. Microsoft’s Copilot overhaul:

Microsoft’s Copilot is getting a major upgrade with voice and vision capabilities. It’s like Microsoft Word got a superhero makeover, and now it can talk to you and look at your documents while it helps you write that report (this is the link: https://microsoft.com/copilot-overhaul).


Short reads for when you need a break

1. Knowledge extraction using LLMs:

Extracting info from big ol’ piles of data using Large Language Models is all the rage. This piece tells you how businesses are getting smarter by having AI comb through text, tables, and figures to pull out the good stuff (this is the link: https://towardsdatascience.com/knowledge-extraction-llms).

2. A Data scientist’s guide to ensemble learning:

If you’ve ever wanted to combine multiple algorithms to get smarter predictions, then ensemble learning is your new best friend. This guide covers the techniques and code you’ll need to pull it off (this is the link: https://towardsdatascience.com/ensemble-learning-guide).
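If you want the one-minute version before reading the guide: the simplest ensemble is just majority voting over several models’ predictions. The “models” below are placeholder prediction lists, but the combining logic is the real thing.

```python
from collections import Counter

# Ensemble learning at its simplest: majority voting. Each "model" here is
# just a list of predictions, one per sample.

def majority_vote(predictions_per_model):
    """Combine several models' per-sample predictions by majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):  # one tuple of votes per sample
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Three shaky classifiers, each wrong on a different sample:
model_a = ["cat", "dog", "dog"]
model_b = ["cat", "cat", "dog"]
model_c = ["dog", "dog", "dog"]
print(majority_vote([model_a, model_b, model_c]))  # ['cat', 'dog', 'dog']
```

The point of ensembling: as long as the models make *different* mistakes, the vote washes the individual errors out.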


Research papers of the week:

For those of you who like to dive deep, here are the hottest academic papers right now:

1. Were RNNs all we needed?:

This paper asks whether we ever needed all this fancy new transformer tech in the first place. It turns out that by stripping RNNs down to their core components, you might get similar results (this is the link: https://arxiv.org/abs/rnns-vs-transformers).
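To give you a feel for the “stripped down” idea: as I read it, the paper’s minimal recurrent units drop the gates’ dependence on the previous hidden state, so the gate only looks at the current input (which is what lets the recurrence be trained in parallel). Here’s a toy one-dimensional sketch of that recurrence; the scalar weights are ones I made up, so see the paper for the real formulation.

```python
import math

# A toy 1-D sketch of a "minimal GRU"-style recurrence: the update gate z
# depends only on the current input x, not on the previous hidden state.
# Weights are made-up scalars for illustration.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def min_gru_step(h_prev, x, w_z=0.5, w_h=1.0):
    z = sigmoid(w_z * x)   # update gate: input-only, no h_prev inside
    h_candidate = w_h * x  # candidate state: also input-only
    return (1 - z) * h_prev + z * h_candidate  # blend old state and candidate

# Run the recurrence over a tiny input sequence:
h = 0.0
for x in [1.0, -0.5, 2.0]:
    h = min_gru_step(h, x)
print(h)
```

Because nothing inside the gate depends on `h_prev`, the per-step computation is trivially cheap, and that’s roughly the paper’s bet: maybe that was all we needed.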

Well, that’s it for the tech world’s latest roundup of AI goodness. From speech APIs to canvas coding interfaces, the AI future is looking bright.

Signing off – Marco


Well, that’s a wrap for today. Tomorrow, I’ll have a fresh episode of TechTonic Shifts for you. If you enjoy my writing and want to support my work, feel free to buy me a coffee ♨️


Think a friend would enjoy this too? Share the newsletter and let them join the conversation. Google appreciates your likes by making my articles available to more readers.

Become an AI Expert !

Sign up to receive insider articles in your inbox, every week.

✔️ We scour 75+ sources daily

✔️ Read by CEOs, scientists, business owners, and more

✔️ Join thousands of subscribers

✔️ No clickbait - 100% free

We don’t spam! Read our privacy policy for more info.
