In this episode, we explore how engineers are embedding powerful AI directly into hardware – no cloud connection required.
Michaël Uyttersprot from Avnet Silica and Cedric Vincent from Tria Technologies reveal how they run ChatGPT-quality language models on resource-constrained embedded devices. What once demanded data centre infrastructure now fits onto chips with just 2GB of RAM.
The conversation covers the technical challenges of cramming billion-parameter models into embedded systems, real-world applications from conference rooms to industrial robotics, and the three compelling reasons driving this shift: data privacy, power efficiency, and cost control.
Michaël and Cedric discuss hardware platforms from AMD, NXP, and Qualcomm, explain techniques like quantisation and mixture of experts, and demonstrate applications including a vintage telephone box that lets you call avatars from different time periods.
Tune in to learn why the future of AI might not be in the cloud at all – and what that means for industries from manufacturing to healthcare.
Summary of episode
- 02:48 - What makes large language models special
- 05:27 - Why run LLMs locally on embedded devices
- 07:42 - Real-world applications: Vision LLMs and OCR
- 11:12 - Technical deep dive: How to fit billions of parameters into tiny devices
- 18:52 - Understanding temperature: Making AI creative or accurate
- 22:41 - Industries moving fastest: OCR, security, and robotics
- 24:52 - Future applications: Robotic arms and time series analysis
- 28:00 - The biggest technical hurdle: Power consumption
- 30:55 - Advice for engineers: Start with llama.cpp
Speakers
Related links
- Tria Technologies
- Generative AI at the Edge
- The podcast episode where the generative AI examples were discussed
- How to enhance embedded systems with Generative AI and Local LLMs | Michael Uyttersprot at HWPMAX25
- Listen to the "We Talk IoT" Soundtrack on Spotify
- Listen to the "We Talk IoT" Soundtrack on YouTube
- The Llama song
See all episodes
From revolutionising water conservation to building smarter cities, each episode of the We Talk IoT podcast brings you the latest intriguing developments in IoT from a range of verticals and topics. Hosted by Stefanie Ruth Heyduck.

Liked this episode, have feedback or want to suggest a guest speaker?
Episode transcript
Transcript sample from the episode
Ruth: While ChatGPT and similar services dominate headlines, a growing number of engineers are embedding AI directly into hardware, from conference room systems to industrial ovens. But why choose local processing over cloud services? The answer lies in three critical factors: data privacy, connectivity independence, and cost control. My guests today are Michaël Uyttersprot from Avnet Silica and Cedric Vincent from Tria. Together we will explore the technical challenges of cramming internet-scale knowledge into embedded systems, examine real-world applications that are already working today, and discuss which hardware platforms deliver the best performance.
So, let's dive right in. Welcome to the show, Michaël, Cedric, I'm really glad to have you today. How are you?
Michaël: Thank you, Ruth, I'm doing well. Thank you.
Start of full transcript
Ruth: Hmm.
Cedric: Thank you for the invitation, looking forward to discussing large language models with you.
Ruth: Would you like the opportunity, both of you to introduce yourselves, Michaël, maybe we start with you. What is it that you do?
Michaël: Within Avnet Silica, which is a distributor of electronic components, the topic I'm focussed on is artificial intelligence. I started with this topic more than 20 years ago at university, when it was not yet popular.
I support customers, but I also support the company in moving in the direction we want for the future with artificial intelligence.
Ruth: Terrific, Cedric.
Cedric: So, I'm Cedric Vincent. I'm working for Tria Technologies. My team is specialised in application development for compute modules. We are developing compute modules based on NXP processors and Qualcomm processors, and what we do with my team is put software on these modules, but also try to show a bit of the future. So basically, not sticking with what we can do today, but really trying to embrace what we can implement with the next generation of hardware. For instance, if we talk about Qualcomm, with the new possibilities that come with these processors, thanks to the neural processing unit, we are trying to show customers, okay:
what's next, and what could we implement with that specific piece of hardware that comes with Qualcomm processors. And of course, one of these things is large language models.
Ruth: Mm-hmm. Fantastic. To start us off, Michaël, what is so special about large language models?
Michaël: Well, maybe it's good to go back a bit into history, I would say, and especially to take a look at what has been developed by OpenAI with ChatGPT. If you look specifically at GPT-2, this was in fact a game changer, because what they implemented was a kind of large language model that was not that big, but bigger than what they had done before; it was not the first language model, but it had 1.5 billion parameters.
And what they found out is that the usage you have with this model was completely different from what you had to do in the past. Before large language models, you had to train a model for a specific task, for example to translate a text. But if you wanted to do summarisation or any other kind of task, you needed to create a new kind of model and retrain it with a new kind of data.
So, this makes it more complex if you want to change something. But with GPT-2, they found out that once you have trained this model, you can use a prompt, like translate this English text to French, summarise this article in one sentence, or create a story, and this can be done with the prompt alone; in fact, you do not need to retrain the model. It's already inside the model that you have this capability. So, the model could change its behaviour just by changing the input text, what we call the context, without retraining. This was a game changer. It was the first general-purpose transformer-based language model. And what developers also realised is that the context can in fact be a control mechanism. That means, and this is how they have implemented it since GPT-3, that you have in-context learning, where you can say, for example: those are three examples, now behave the same way based on those examples.
And this made a very big difference if you compare it with what you had to do in the past, retraining with new data sets. Because of GPT-2 and then the follow-up GPT versions, where you have in-context learning, this made a big, big difference.
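As a rough sketch of the in-context learning Michaël describes, and not the guests' own setup, this is what a few-shot prompt can look like with the llama-cpp-python bindings and a locally quantised model (the model path and the example sentences are placeholders): the same model translates, summarises, or writes a story purely depending on the examples and instructions placed in the context, with no retraining.

```python
# Minimal in-context learning sketch with llama-cpp-python.
# Assumption: "model.gguf" is a placeholder for any quantised GGUF model file.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=2048, verbose=False)

# Three worked examples in the prompt steer the model towards translation;
# swapping them for summaries would change the task without any retraining.
prompt = (
    "Translate English to French.\n"
    "English: Good morning. French: Bonjour.\n"
    "English: Thank you very much. French: Merci beaucoup.\n"
    "English: See you tomorrow. French: A demain.\n"
    "English: The meeting starts at nine. French:"
)

out = llm(prompt, max_tokens=32, temperature=0.2, stop=["\n"])
print(out["choices"][0]["text"].strip())
```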
Ruth: Okay. Now, these large language models do need a lot of storage and a lot of power to run on machines that are usually in the cloud. And now we are talking about something different: you have been testing how local LLMs could work on embedded devices. Why did you start on this journey?
Michaël: It's a good question. Well, there are several reasons. One of them is because we saw more and more interest from our customers, maybe because of the reasons that you already mentioned, like privacy, cost reduction, so to run AI locally.
Ruth: Mm-hmm.
Michaël: In addition, we also wanted to find out: can this really work, running large language models on embedded devices?
Is this feasible?
Ruth: Mm-hmm.
Michaël: So, what we did is we started with some internal testing. We also created some applications together with Cedric. We developed a telephone box where you were able to talk with an avatar and call back to the past and to the future. Of course, this is nice to find out how things are working, but in addition, we also did some investigation into what we call small language models, between one and 10 billion parameters.
Can you create, for example, something like what Google developed with NotebookLM, the audio podcast implementation? So, you have two voices that talk to each other about a specific topic. That's how it works: with this application, you upload a document to NotebookLM, or you upload a link to a website, and two voices talk to each other, have an interaction, a discussion about that topic.
Ruth: Okay.
Michaël: So, we tried—
Ruth: What's happening here? These are all real-life people?
Michaël: This is real life. Exactly. Yeah. So, we did some tests. We took hardware like with AMD. We also tested with NXP devices like i.MX 95.
And what we found out is that even if you have a model, like 2 billion parameters, that this can work, you can have an interaction, between two voices, that talk about a specific topic, and that sounds very realistic. And this is one of the starting points for us to continue more on this and to have engagement with customers and to work on projects with customers.
Ruth: Speaking of customers, Cedric, is this something that you are also noticing, this increased customer interest? What do you think is driving this demand, in addition to what Michaël just mentioned?
Cedric: Actually, what's happening right now, it's not so much about large language models like ChatGPT. There is some interest, because ultimately this type of use case, where you talk with a model and get some answers, possibly using a database to give you that answer, is useful. But where we see more interest in the industry right now is vision large language models, which are a kind of derivative of the original ChatGPT.
You can see that already on your mobile phone. Like, sometimes I'm using that when I'm cooking, like I'm taking a picture of a vegetable. I'm not quite sure what it is because I get that in one of these boxes. Anyway,
Ruth: Okay. I understand what you mean.
Cedric: And then, and then I just take a picture of that and it's telling me, okay, this vegetable is this and you can cook it this way.
Ruth: Hmm.
Cedric: But where it gets interesting in an industrial context is like we can run this type of model as well on the device. And you can use that for safety purposes, for instance, like on a shop floor in any industry, that you want to check if everybody is wearing hard hats, for instance.
In the past you could have done that with what I'm going to call regular computer vision. But you would've needed to get a lot of data, like a lot of pictures of people basically wearing hard hats and then you train a model. But, what Michaël already explained, thanks to generative AI and vision large language models.
What you can do today is just explain the model, okay? Please look that everybody on that picture is wearing hard hats and if someone is not actually doing it, please make sure that you tag that and I get an alarm on my system. So that's one thing that is definitely triggering a lot of interest for a lot of customers, that ability to have AI models that you can reconfigure very easily to a specific task and including computer vision. Another use case that is triggering a lot of interest is OCR, because in the past we already had models for OCR, like it's nothing new, but the quality that we can reach today with a large language model that you can run on embedded devices that are developed by small companies like I2OCR, for instance, or some other Chinese company that I will not mention here, they reach a level of quality even for unstructured documents with handwriting and everything, which is absolutely crazy. And it's where we see a lot of interest as well coming from customers. They can use this type of models on processors like i.MX 95 on our SMARC modules or on Qualcomm, if you need a bit more processing power and the time between the moment you do the request and you get actually the document is super critical for you.
Most of interest is it's kind of this crossroad between computer vision and large language models, and we see more and more interest in that specific field today coming from our customers.
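To make the hard-hat example concrete, here is a hedged sketch of how such a check might be wired against a locally hosted vision-capable model exposing an OpenAI-compatible endpoint (for instance a llama.cpp server loaded with a multimodal model). The URL, image file, and alarm handling below are illustrative assumptions, not the customers' actual setup described in the episode.

```python
# Hedged sketch: ask a locally served vision LLM whether everyone on a shop-floor
# image wears a hard hat. Assumption: an OpenAI-compatible endpoint is running at
# localhost:8080 with a vision-capable model; all names here are placeholders.
import base64
import requests

with open("shopfloor.jpg", "rb") as f:          # placeholder image file
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Check whether every person in this picture is wearing a hard hat. "
                     "Answer only YES or NO, then one short sentence of explanation."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    "temperature": 0.0,   # keep the answer factual rather than creative
    "max_tokens": 64,
}

reply = requests.post("http://localhost:8080/v1/chat/completions", json=payload).json()
answer = reply["choices"][0]["message"]["content"]
print(answer)

if answer.strip().upper().startswith("NO"):
    print("ALARM: someone appears to be without a hard hat")  # placeholder alarm hook
```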
Ruth: Interesting. Now my mind goes in all different kinds of directions. I of course want to know more about the real-world applications, and especially the OCR example you just mentioned. But I think we should also do a little bit of a technical deep dive at this point on how you make this possible.
You mentioned you tested different boards on different applications. What are your findings here? Can you walk us through some of this?
Cedric: Yeah, of course. So, when Michaël was talking about the size of the model: it's billions of parameters. It's a lot. You should imagine it as a lot of little dials that you need to move to get to the solution for whatever you put in as an input. Is it your voice?
Is it some text? Is it some image? And then, as the user, you can move some of these dials with the context, and then somehow the magic happens. If you look at the original paper from the guys who wrote it, the name of the architecture is the transformer, and one of the sentences was basically: it works extremely well, but to be honest, we don't really know why.
Ruth: Yeah.
Cedric: It's kind of a mystery, and that's why some people were scared about them anyway. And so, you have all these parameters, and right now, from a technical standpoint, these parameters are encoded as big numbers. These big numbers take a lot of space in the memory of the compute module, and the magic to make this run on embedded devices is basically to cut these numbers down, to reduce their size. And we can do that very well; there are very good algorithms to take these, I don't know, 4 or 8 billion parameter models and shrink the size of the model by reducing the size of the way we encode these numbers.
Ruth: Okay.
Cedric: By doing that shrinking of the model, we can actually run it on very small microprocessors, on smaller processors like the i.MX 95 or the QCS 6490, and run it quite fast, actually.
Fast enough that it's actually usable for customers.
Michaël: That's quite interesting indeed. And, like Cedric mentioned, there are several techniques, like quantisation for example, which is a kind of compression, but you can keep the accuracy quite high, even 95 per cent or close to a hundred per cent, which is good enough for many applications.
Also, for example, we have mixture of experts. This is a technique where you select only one part of the model, so it's also a kind of shrinking down of what you use of the model: one expert is, for example, focussed on mathematics and another one on language, but this is all in the same model.
You use only a small part of it. And those are the things that we try to find out when we use those models on embedded devices: what can you use to make it efficient and run it fast on the processing devices that we target, for example with AMD or NXP.
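As a back-of-the-envelope illustration of the shrinking Cedric and Michaël describe, and not the specific algorithms used on the AMD, NXP, or Qualcomm parts, here is a toy 8-bit quantisation of a weight tensor with NumPy: storing each parameter in one byte instead of four cuts the memory roughly four times, while the reconstructed values stay close to the originals.

```python
# Toy symmetric 8-bit quantisation sketch (illustrative only; real toolchains
# such as llama.cpp's GGUF quantisers use more refined, block-wise schemes).
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=1_000_000).astype(np.float32)  # fake layer

scale = np.abs(weights).max() / 127.0               # one scale for the tensor
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale              # what the runtime works with

print(f"float32 size: {weights.nbytes / 1e6:.1f} MB")
print(f"int8 size:    {q.nbytes / 1e6:.1f} MB (plus one float32 scale)")
print(f"mean abs error: {np.abs(weights - dequant).mean():.6f}")
```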
Ruth: Let me try to reverse this logic: you're fitting something that has been out there, the large language model, onto small embedded devices. Now my question would be, why not use that intelligence to make the large language models themselves run more efficiently?
Would that also work? Or are we losing use cases or scenarios?
Cedric: You still lose accuracy. So basically, there is no miracle: if you reduce the size of something, at some point there is a loss in terms of quality of the output. You can balance that very well, and there are a lot of people who have been working for years to make sure that it's not too noticeable when you are using the model, but—
Ruth: Okay. Got it.
Michaël: But of course, some of these techniques are used in the higher-end models too; mixture of experts is also used in the bigger models. Quantisation is more sensitive: on GPUs you will typically not do this kind of compression. But yeah, you see that as the models evolve they improve, and this also helps, by reducing the number of parameters and speeding up the models, for example.
Ruth: And coming back to real-world applications, you are now testing what is possible with these compressed and local language models, correct?
Cedric: That's absolutely correct. It's what we have been doing from the beginning, because that specific architecture, it's our COM Express module with an AMD Ryzen 8000. And on this module there is a very powerful iGPU, but still, it's not what you will get in a data centre.
It's not the same power consumption; it's not the same processing power. So, for that type of processor, we already need to quantise the model, to basically reduce the size, to compress it, so that we can actually run it fast enough on this type of processor.
Ruth: And I think we have talked about the telephone box on one of our previous episodes. I will just put the link into the show notes for our listeners who have not listened to the previous episode. Can you explain what that use case was about? And I think you also have a video from Embedded World.
I think I will also put in the show notes, but maybe you can describe a little bit what that use case was about.
Cedric: So, it was to demonstrate how we can link different types of AI models to create an end-to-end experience for customers. Instead of talking to a real person, basically you are talking to an AI bot. And to do that, we are using one model to do the speech to text, so transforming your voice into some text; that model was developed by OpenAI, it's called Whisper, and we are running it on the processor.
Then we have a large language model, which is basically the brain of the system and which we customise for different types of personas. So, with Michaël, we had one Texas guy, one guy in the future. We had a couple of fun personas just to demonstrate the concept. Then we have a database attached to these personas.
We had a technical specialist for Tria Technologies products, and we have a database of all our products. Which means that when the AI is going to look for some information, it's not going to invent things. I guess you heard the bad stories about AI starting to invent new information because it was just trying to please the end customer. In that case, we absolutely wanted to avoid that. So, we have a database attached to the AI model, just to make sure that it's not going to invent new information.
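A rough sketch of how such a speech-to-text plus grounded-LLM chain could be wired in Python, assuming the openai-whisper and llama-cpp-python packages, a placeholder audio file and quantised model file, and a stand-in product "database" (a plain dictionary here). The real telephone-box implementation and its data sources are not shown in the episode.

```python
# Hedged sketch of the speech-to-text -> grounded LLM chain described above.
# Assumptions: openai-whisper and llama-cpp-python are installed; "call.wav"
# and "model.gguf" are placeholders; the product "database" is a toy dict.
import whisper
from llama_cpp import Llama

stt = whisper.load_model("base")                   # speech to text (Whisper)
question = stt.transcribe("call.wav")["text"]

# Stand-in for the product database: the model only answers from this context.
product_facts = {
    "module-a": "Compute module based on an NXP i.MX 95 processor (example entry).",
}
context = "\n".join(product_facts.values())

llm = Llama(model_path="model.gguf", n_ctx=2048, verbose=False)
answer = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "You are a Tria product specialist. Answer only from the facts "
                    "below. If the answer is not there, say you do not know.\n" + context},
        {"role": "user", "content": question},
    ],
    temperature=0.1,       # low temperature: accurate, not inventive
    max_tokens=200,
)
print(answer["choices"][0]["message"]["content"])
```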
Ruth: To check back with the database if the information it's giving is correct.
Cedric: Actually, it's taking the information from the database as an input, and then we set up the model in a way that it's not supposed to, I was about to say, lie, or improvise, or invent new information. So, you have that notion of temperature.
And the higher the temperature is, the more likely the model is to improvise, which is good because then you're going to have a more interesting discussion, or it might get funny, or it has a bit more of its own character. And the lower the temperature is, the more boring it's going to be, but the more accurate it's going to be in terms of what it's answering to your questions, basically.
Ruth: What's temperature, sorry?
Cedric: So, when you start to work with large language models, you have a lot of parameters that you can use to customise how the model is going to react to your request. It's all about probabilities, and one of these parameters is called the temperature of the model. And this temperature is very interesting.
If you use ChatGPT, I don't know actually if it's exposed to the public, but quite often that's the setting you can change, or the most common setting that you get access to, and you can change that setting.
Basically, the more you increase that temperature, the more you improve the possibilities for the model to come up with original ideas. So, for instance, if you use an LLM to do creative writing, then you want to have a very high temperature.
Ruth: Mm-hmm.
Cedric: Because you want to make sure that it can actually invent interesting things.
If you want to have a large language model for technical questions like for technical specialists, whoever is going to—
Ruth: Yeah.
Cedric: You go rather close to zero and say: no, no, please do not improvise. Do not invent a module with a brand-new processor.
Ruth: Keep it cool.
Cedric: Keep it toned down a tiny little bit, and that's the temperature that you adjust.
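In code, temperature is simply a sampling parameter you pass per request. A small hedged sketch with the llama-cpp-python bindings (the model path and the questions are placeholders) contrasting a near-zero setting for factual product answers with a high one for the creative avatars:

```python
# Temperature as a per-request knob (sketch; "model.gguf" is a placeholder).
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=2048, verbose=False)

def ask(question: str, temperature: float) -> str:
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        temperature=temperature,
        max_tokens=120,
    )
    return out["choices"][0]["message"]["content"]

# Close to zero: stick to the most likely tokens -> accurate, a bit boring.
print(ask("Which processor is on this compute module?", temperature=0.1))

# High: flatter token distribution -> more improvisation and character.
print(ask("Tell me about life in the Wild West.", temperature=1.2))
```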
Ruth: I didn't know that. Thank you.
Michaël: This is also how it was, for example, with the telephone box: the avatars that you call back to the past or call to the future are very creative and inventive, so the temperature is higher on that one. But if you ask questions about Tria products, it needs to be correct.
So, you will not put a high temperature there. And the same goes in the direction of what we call hallucinations: if the temperature is too high, it can also create hallucinations more easily, so that it goes in a direction that you do not want. So, it depends on the application how you set or change this parameter.
Ruth: Speaking of applications, I can see service hotlines, customer service desks, information. If I'm in a city, they used to have it in the airport where you can talk to an avatar and it would give you information on the city you just landed in, stuff like that.
Michaël: It can be. Cedric also mentioned something vision-related. But initially we started, in fact, more with the text-based models, where you have applications like a kiosk, or you arrive in a hotel and you can check in, ask questions.
For us, what was quite interesting is, if you have applications like conference systems.
Ruth: Mm-hmm.
Michaël: So the conference system can have an interaction with the participants in the room and can translate, for example. But this can be critical where what is said is very confidential.
So, to have this locally, not running in the cloud, not sending the data to OpenAI, for example. This was one of the triggers for us to do this kind of implementation. But of course, in the meantime, you see that there are many more applications, also related to vision in combination with audio.
And this is a trend that we have now. So, it's not only text-based, it's not only conference systems and kiosks, but also applications combining vision and audio. This is something that will be very, very interesting for the near future.
Ruth: Okay. So exactly that's what is next on my list of questions. I want to ask you which industries are moving fastest with this, and especially with the OCR use case you just mentioned. What is it about?
Cedric: The use case is about, like in some industrial context, when you are printing, you need to make sure that the output of your printing is 100 per cent accurate.
Ruth: Mm-hmm.
Cedric: It's one of the requirements in drug companies, for instance. And companies that are specialised in developing OCR systems are going to check that the output of a printer is going to be 100 per cent accurate compared to what it was supposed to print. And that's a real use case that we've been discussing with customers, for instance.
Ruth: You mentioned conference rooms as a use case, so security-relevant use cases.
Walk me through some more of those examples.
Michaël: Well, you can imagine, for example, robotics or security cameras. And even where you traditionally had vision implementations with convolutional networks, you can now do this also with generative AI, and have a combination: for example, you detect what you see on the street, but you can also ask questions about what happens on the street, for example, if there was an accident, when this accident happened.
So, you will see more and more of this kind of combined view. It's not purely the vision; it's also an interaction you have with the camera. So those are applications that we will see more and more, related to cameras, vision, robotics, where you not only have the interaction with the robot, but also the understanding of the physics around the robot, so that it knows what is happening.
Ruth: Hmm.
Michaël: Yeah, those kind of applications. This is typically what we discuss today with customers.
Cedric: I mean, it's what's going to happen. So that's really brand new, I would say, and it's going to take a couple of years before it gets to the industry, just to be clear. There are actually two new types of models that are becoming more and more popular. One is for robotic arms; that's a new type of model that should not really be called a large language model anymore, because the name does not make much sense there.
It's the same type of architecture, but it's for the movement of your arm. Basically, you are explaining what the robotic arm should do. So instead of training computer vision and having some logic to pick up, I don't know, the red blocks and put them in the blue bin, you just prompt it and say: please pick up the red blocks and put them in the blue bin.
And that's brand new, this type of large language model that you can use to actually control the robotic arm, because there is nothing in between. There is no logic anymore. It's just the AI model that is taking control of all the sensors and all the actuators in the robotic arm, and then making the move.
I've seen some demonstrations on Hugging Face, which is the go-to platform for this type of AI model today. They developed basically an open-source robotic arm, and they are developing a framework and a lot of things around that; the CEO of the company is French, so they call it LeRobot.
And I've seen that they were using it, for instance, to fold some laundry, which is super hard. You do that on your bed and you think, okay, yeah, easy, but for a robotic arm it takes very fine motor skills to do that.
Ruth: Wow.
Cedric: It's super impressive. And it's coming. And there is another trend, which is with time series.
Ruth: Hmm.
Cedric: When you are looking at a factory, basically a lot of the data coming from the sensors of a factory are time series. And, I think it's coming from Stanford in the US, they developed a new generation of what, again, I call a large language model, although the name is a bit wrong there.
Let's take a real-world example: your heart. You don't feel so good, so you go to the doctor, and they're going to do some measurements of how you're breathing, of your heart, all the information about what's going on in your body, basically.
And this new generation of LLM can take time series as an input, in addition to text. So a time series is going to be, for example, the recording of your heart, and the model can then tell you: okay, potentially the issue that you have is this and that. I mean, AI scientists have been trying that with health data so far, and they've always failed, just to be clear. Google invested a lot of money into that for a long time. But with this type of model, for the first time, it actually looks good. It will never replace doctors, but it could be a good assistant for doctors, in that it's actually going to give them a good hint in terms of what's happening for that specific person.
Ruth: What, in your opinion, is the biggest technical hurdle that you now still have to overcome?
Cedric: I would say power consumption. I mean, everybody talks about that when you use ChatGPT, and it's really bad. Right now, many of these AI models are still running on GPUs, which are extremely power-hungry; we use them because they're good at the job, but yeah, they use a lot of energy.
But it's changing. Companies like Qualcomm have been investing a lot of money so that we can run these models on NPUs, neural processing units.
Ruth: Mm-hmm.
Cedric: You might wonder why it is any different. NPUs are specifically optimised to work with those quantised values; I mean, GPUs can do it as well.
But NPUs are also specifically optimised for a very, very low power consumption, which means that in the future we will be able to run this type of AI model on NPUs with very little power. Today, when you use your mobile phone to run one of these models, it goes to the cloud and then you get your answer back.
Which is very bad for the environment. But with the next generation of NPUs that we're going to have on the mobile phone,
Ruth: Mm-hmm.
Cedric: That will not go to the cloud anymore. It'll just stay locally on your mobile phone using very little energy compared to what's happening today.
And you will get your answer about what the heck is that vegetable?
Ruth: And looking ahead, what do you think, when will we reach cloud quality performance on embedded devices?
Cedric: It's still going to take a couple of years.
Ruth: Hmm.
Cedric: Do we need it?
Michaël: We do not always need it. It depends on the application, of course. Do you need to have a high-end system that knows everything if you only have a specific task for it?
Ruth: Hmm.
Michaël: So, it depends on the application. But of course, what we will see is that many companies, like Qualcomm for example, but also companies that develop dedicated chips for generative AI or for CNNs, will come with chips with very low power consumption that you can integrate in your application, and that give you roughly the quality of, I would say, the ChatGPT of today. Within two years you will run that on a chip. This is the trend that we will see.
And of course, this is also what we will focus on to support our customers with this kind of higher-end application, even at chip level. It can be integrated in the processor, or it can be an external NPU, a neural processing unit, to build the application.
Ruth: And for engineers considering exploring local language models, what's a piece of advice you would give them before they start prototyping?
Cedric: I have one, quite straightforward: they should use llama.cpp. You can find it on GitHub. I mean, there are plenty of other options; sometimes, for instance, Qualcomm have their own libraries for the NPU, and I know that NXP is developing the same. But llama.cpp has been developed by the community for the last four years.
We have been using it from the beginning. Qualcomm has already contributed to the project to optimise the implementation that runs on their embedded GPUs. So, it's super well maintained. You have all the algorithms ready for you, and quantised models that you can find on Hugging Face.
You have all the tooling to retrain some of these models, because when you use a very small large language model, ultimately what you end up doing is still retraining it for your specific use case, if you want to have good accuracy in terms of output. So, all in all, I would say llama.cpp if you are looking into large language models running locally on embedded devices.
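As one possible first step along those lines, a minimal sketch using the Python bindings for llama.cpp (llama-cpp-python plus huggingface-hub) to pull an already quantised GGUF model from Hugging Face and generate locally; the repository and file names below are placeholders to swap for whatever small model suits the target board, not a recommendation from the guests.

```python
# Hedged quick-start sketch with the llama-cpp-python bindings.
# Requires llama-cpp-python and huggingface-hub; the repo and filename are
# placeholders for any small quantised GGUF model you pick on Hugging Face.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="your-org/your-small-model-GGUF",   # placeholder repository
    filename="*q4_k_m.gguf",                    # placeholder 4-bit quantised file
    n_ctx=2048,
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is an NPU?"}],
    max_tokens=64,
    temperature=0.3,
)
print(out["choices"][0]["message"]["content"])
```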
Ruth: Okay.
Michaël: And maybe don't start with too complex implementations, because many customers think they need to have the best hardware and the best models to learn. Then they buy expensive GPUs and want to run 70-billion-parameter models before they start testing, but it's in fact not really needed.
They can do that with the standard models in the cloud. It happens a lot that people want to start with the best and the most expensive, but from my point of view it's not really necessary.
Ruth: That's great advice. Thank you. If you had to pick a song for the soundtrack of this episode, what song would you put on it?
Cedric: Okay, so we discussed that with Michaël. I would like something with llama, but actually I could not find anything, unfortunately.
Ruth: I googled llama songs and I came up with very interesting songs.
Cedric: Yeah, but that's why like, okay, we found songs. But—
Ruth: Okay. I think this is like a children's song, the llama song. Mm-hmm.
Cedric: That's more what you find when you are looking for llama songs.
Ruth: Okay. So, we should not put that on the playlist.
Cedric: I mean, I'm up to it.
Ruth: Okay, let me see. What do you think, Michaël?
Michaël: Well, indeed, Cedric, we discussed it, and llama, for example in llama.cpp, does not come from the animal. They use the animal as a kind of icon, but it means something else completely. But to find a song about it, I don't know what the best one would be.
Ruth: Maybe we have to ask our listeners what song they would put on the playlist and see if they have an idea and some feedback.
Michaël: Well, and another option is of course, you can ask a generative AI model to generate a song. There are several possibilities now, like with,
Ruth: Oh.
Michaël: Oh yeah, like with ElevenLabs, or you can do it with Suno. I think it's Suno. You can do that: you put in the text what you just asked, can you create me a song about a llama, and it needs to be related to llama.cpp?
Let's see what comes out.
Ruth: Hmm, maybe I should give that back into your hands.
Michaël: Let me, I can do it.
Ruth: Okay, that's a first. Having the song composed by generative AI for the playlist. I love it. Let's see if we get something together, then.
Michaël: Okay.
Ruth: Michaël, Cedric, thank you so much for being on the show. It has been fascinating to chat with you about local language models. Thank you for being here and sharing your expertise.
Michaël: With pleasure.
Cedric: Thank you for the invitation.
About the We Talk IoT Podcast
We Talk IoT is an IoT and smart industry podcast that keeps you up to date with major developments in the world of the internet of things, IIoT, artificial intelligence, and cognitive computing. Our guests are leading industry experts, business professionals, and experienced journalists as they discuss some of today’s hottest tech topics and how they can help boost your bottom line.
From revolutionising water conservation to building smarter cities, each episode of the We Talk IoT podcast brings you the latest intriguing developments in IoT from a range of verticals and topics.
You can listen to the latest episodes right here on this page, or you can follow our IoT podcast anywhere you would usually listen to your podcasts. Follow the We Talk IoT podcast on the following streaming providers where you’ll be notified of all the latest episodes: