🎙 Develpreneur Podcast Episode

Audio + transcript

Navigating Data Integration: Scraping Vs. APIs

In this episode, Michael Milosz and Matt Zalto discuss why data integration makes or breaks AI systems, the guardrails needed to prevent exposure of sensitive information, the role of AI testing AI in data protection, and why flexibility matters in AI deployment.

2026-03-15 • Season 27 • Episode 19 • AI readiness and data integration • Podcast

Detailed Notes

The hosts discuss the importance of data integration in AI systems, citing the example of a Singapore insurance company whose voice AI agent succeeded in trials but failed when moved to production because it lacked integration with real-world data. They emphasize the need for guardrails to prevent exposure of sensitive information and discuss the role of AI testing AI in data protection. They also highlight the importance of flexibility in AI deployment, noting that TeleiPaaS is designed to be deployment-flexible, running on premises, in the cloud, or in hybrid environments.

Highlights

  • The importance of data integration in AI systems
  • The need for guardrails to prevent exposure to sensitive information
  • The role of AI testing AI in data protection
  • The importance of flexibility in AI deployment
  • The need for a hybrid approach to AI development

Key Takeaways

  • Data integration is essential for AI systems to function effectively.
  • Guardrails are necessary to prevent exposure to sensitive information.
  • AI testing AI is crucial for data protection.
  • Flexibility is key in AI deployment.

Practical Lessons

  • Start with a legal perspective when building AI systems.
  • Ensure data is 100% secure.
  • Build AI systems with the regulatory landscape in mind.
  • Test AI systems thoroughly before moving to production.

Strong Lines

  • An AI built in the lab without data integration is a Ferrari engine dropped into a golf-cart chassis.
  • Guardrails are necessary to prevent exposure to sensitive information.
  • AI testing AI is crucial for data protection.

Blog Post Angles

  • The importance of data integration in AI systems.
  • The need for guardrails to prevent exposure to sensitive information.
  • The role of AI testing AI in data protection.
  • The importance of flexibility in AI deployment.

Keywords

  • AI readiness
  • data integration
  • guardrails
  • AI testing AI
  • flexibility in AI deployment
Transcript Text

Welcome to Building Better Developers, the Develpreneur podcast, where we work on getting better step by step, professionally and personally. Let's get started. Well, hello and welcome back. We are continuing our season where we're getting unstuck. We're moving forward. We're building momentum. We are the Building Better Developers podcast, also known as the Develpreneur podcast. I am Rob Broadhead, also just known as Rob Broadhead, one of the founders of Develpreneur, also the founder of RV Consulting, where we help you with a technology reality check. We help you get in there before you do that AI project, that big IT effort and investment, to make sure that you've got your ducks in a row and are ready to make the most of it, so that we can improve your chance for success and help you with that roadmap. Good thing and bad thing. Good thing is we are into spring. We are into things just keeping us busy. And it is a great time to be alive, as it were, because there are so many projects, so many advances, so many things going on that are just really cool. I'm going to stop my geeking out right there. But this is one of those times that is an opportunity wherever you want to go. If you want to change your direction, if you want to change your career, if you want to take that next step and suddenly become an expert, this is the time to do it. Bad thing is there's just too much. Every day I go to sleep thinking, gosh, there's a hundred other things that I wanted to do. But one of the things I'm going to check off my list right now is pass this over to Michael to introduce himself. Hey, everyone. My name is Michael Milosz. I'm one of the co-founders of Building Better Developers, also known as Develpreneur. I'm also the founder of Envision QA, where we build and test custom software that helps you remove the bottlenecks of your business, that way your business can run smoother and faster.
Good thing, bad thing. Good thing: getting real busy. It's that time of year. Things are moving forward. Bad thing: we're getting real busy. Don't have enough time for a lot of things right now, so don't have enough time to enjoy the weather. But that's kind of a mixed bag there. It's a good time, like Rob said, and there are a lot of exciting things going on, like what's going to happen in this conversation. That's right. This conversation will continue. If you have not listened to part one, pause here, go back and listen to the first part of our conversation with Matt, because we're going to pick up right where we left off. And rather than waste my breath, let's go right back in and join Matt in our conversation. Yeah, I think mobility is going to continue to be a huge thing as far as being able to move your servers around, your data and things like that, because traditionally the EU has had their rules, Asia and China have their rules, and the US sort of has theirs. And they try to keep it in a box. But I think we're getting to the point where, if you're going to be global, you're going to have to be able to adjust. And it feels like that's coming. There's going to be more regulation. Some of these places that were essentially open before, now they're not. Or maybe it's just not worth going through the hoops to have a server somewhere else. Just like you said: OK, we'll just have something in Luxembourg, and then we don't have to worry about it. We don't have to worry about whatever the regulations would be to move data from Ireland into there. You touched on something earlier, and it really feels like it's about integrations and being able to connect to data from all these different places.
Are you seeing some organizations, because we talked about these big ones that are trying to figure stuff out, are you seeing them maybe change their tune a little bit on how open or integrated they are? Or do you feel that it's a push towards everybody almost having to be open, so everybody's going to have to learn how to find some way to play well with each other? Protectionism comes in much more. I mean, we mentioned it earlier: it's so easy to reverse engineer or even clone entire applications. Companies would be silly to expose more than they need to, especially into wider networks that are potentially unsecured. So among others, we work with one of the largest, Fortune 15 in fact, companies out of Korea. They're known for anything from smartphones to fridges to toasters to whatever else. And they have significant requirements on data security, privacy, et cetera. Now, that particular company is literally big enough that it's too big to fail, so this comes into Korean national security interests. But at the end of the day, they are more open because they have to be. It's a controlled openness, though. So let's look at a couple of things. API-first and integration readiness is really something that's needed in order to enable this underlying connectivity. But there are a lot of conditions, many more now than there have been before. One would be from a security and observability point of view: we've got plenty of governance, in fact, around all sorts of integrations that we do. What's interesting is that of late, there are a lot of companies asking us to deploy the entire integration for them, kind of moving away from the cloud. It's regaining this control. So that was a little bit to the conversation that we had earlier.
So I guess enterprises have become more open at the edges, to kind of open themselves up to be more relevant. But at their core, they've become very, very strict. That's really what we see at the moment. Now, are you saying that, for example at the API level and things like that, is it as much security as it is anything else, where they just want to make sure that only the right people are let through and only the correct data is pushed out? Or do you see it as trying to just sort of protect their environment a little more? Or is it maybe a blend of both? The security is nonnegotiable, right? So that needs to be there. Underlying it all, it's protecting whatever is worth protecting. Companies that we see certainly say yes to more integrations. They say yes to more best-of-breed applications. They say yes to new ideas in an AI environment as well; mind you, that really needs to be sitting in a sandbox first before it then slowly gets released. But they say no to anything they cannot govern. And that's really the important part. It goes back to control. And I think that's really where a lot of these laws that we're going to be seeing come in. We mentioned Korea, we mentioned the European Union, but even in the United States, Colorado, if I'm not mistaken, has quite interesting laws coming up as well. Ultimately what it means is the CIOs, CTOs, but also the wider organization, chief data officers, they need to be ready for all of it. And that's the entire organization at large. Yes, go ahead, Rob. Well, go ahead and finish off, because I can switch gears here. Yeah, so you talk about governance, and I've worked under different governance regimes in health care, banking, commerce.
For those that don't work in those types of industries, the idea of governance is kind of a foreign concept. You might be on the bleeding edge of AI. You might be building these systems, you might have the best APIs out there and even the best security. But how do you ensure, or what would you recommend to our listeners to help them be better at governance, to be aware of these things, so they don't build something that in six months could be completely blocked just by a bill that gets passed in Europe or in the US? Yeah, that's a good question. Always start with the legal side. First of all, know your customer, right? Each customer in different industries will be subject to different levels of governance. Always start with the legalities first and look into what's openly being discussed, and basically be ready for those parts. What I mean by that is you want to ensure that you become, to an extent, a subject matter expert in the particular field that you're trying to solve. Again, we have the HR example; let's use that here. So let's say you're building an HR tool. That obviously touches a lot of different data points, from personal information to people's contact details, addresses, LinkedIn profiles, whatever other details might be scraped. Could be social security numbers in the US. Could be police reports. Anything could be in there. So, first of all, make sure that data is 100 percent secure, because you don't want to end up on the front page of any news outlet. That's a legal obligation you obviously have. On top of that, for yourself and your business, you've got the obligation to think at least 12 months ahead from a regulatory-landscape point of view and build or advance the product in order to fulfill that. One example: let's pick the European Union.
You might have different source systems. You might have Workday, again related to this HR model. You might have some sort of applicant tracking or talent management systems. All of these gather information, and then the AI that you may have built does some form of AI screening. So make sure that data is completely anonymized, that it's masked, that the algorithms don't take personally identifiable information into account, that the destinations, ultimately what's happening with that data, are 100 percent clear, and that you've got audit logs across the entire scenario. Because then you can go to any CIO, CTO, or data compliance officer and say: hey, dear Mr. Officer, we've got your source systems here. This is your internal data, and it happens to have all this personally identifiable information. What we then do is cleanse that data, and from there it goes into target destinations. If that is a problem, well, we happen to have a solution for that. Ultimately what it is: you're taking data, you're making sure compliance sits within it, and that in itself could be a product, if you really wanted to sell it that way, before it then goes to its target destinations. And that is really just governed by what you know today in terms of regulation, plus a little bit of common sense on top. What we often see is that data compliance officers, especially in larger organizations, take an existing piece of regulatory text, put their own flavor on top, and often make it a little bit tougher than it was before, sometimes also to add their own signature. Everybody's got to have their own fingerprint on those things. It feels like nobody wants to just pass it on through and say, yep, we'll just go with what's there. You've got to make that little bit of an improvement.
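The mask-before-screening flow described here can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's implementation: the field names are hypothetical, and a one-way hash stands in for whatever masking scheme a real pipeline would use.

```python
import hashlib

# Hypothetical PII field names; real source systems define their own schemas.
PII_FIELDS = {"name", "email", "address", "ssn", "linkedin"}

def mask_record(record: dict, audit_log: list) -> dict:
    """Mask PII fields before a record reaches any AI screening step,
    appending an entry to the audit log for each masked field."""
    cleaned = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            # A one-way hash keeps records linkable across systems
            # without exposing the raw value.
            cleaned[key] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
            audit_log.append(f"masked field '{key}'")
        else:
            cleaned[key] = value
    return cleaned

audit_log = []
candidate = {"name": "Jane Doe", "ssn": "123-45-6789", "years_experience": 7}
safe = mask_record(candidate, audit_log)
# years_experience passes through untouched; name and ssn are hashed,
# and the audit log records exactly what was masked.
```

Hashing rather than deleting is one way to satisfy the "anonymized but still traceable" requirement: downstream screening can still group records, while the audit log gives a compliance officer the trail described above.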
I'll switch gears a little bit, swinging back around to AI readiness. You mentioned a lot of different things you can do, a lot of problems that need to be tackled as part of this. What is maybe one thing that you see as a common area where companies are not ready? I would even call it a blind spot, where they're like, yeah, we're ready to go, and you start digging a little bit, you scratch the surface, and it's like, oh, wait, no, we don't actually have that in place. Is there something you see as a common blind spot for AI readiness? A little bit back to the beginning of this conversation, and back to Gartner as well: if you don't have data that's AI-ready, then your entire AI readiness project is doomed to fail. Think of it this way. Say you've got a lab environment with incredible AI capabilities. I'll give you an example, actually. There's an insurance company in Singapore we've worked with. This insurance company had a very successful trial with a voice AI agent that listens in to incoming conversations with customers. They then had the customer service team, sometimes semi-trained, sometimes not trained at all, only follow what's on the screen. So we're having our conversation about whatever type of inquiry there might be, and the answer always appears on the screen, which is amazing, right? Because you've got immediate training; people can hit the ground running, listen to the conversation, and literally the correct answer is displayed in front of them. Worked perfectly in the sandbox; failed in production. Why? Because of various changes: weather events, other sorts of legislation. I mean, look what's happening around the world in terms of travel disruptions. All these sorts of things weren't built natively into that tool.
So the tool knew seemingly everything in the sandbox environment, but then didn't naturally evolve with the world. It kind of continued to operate in its own little black box. And as a result, the entire project essentially failed, although they didn't completely shelve it. Now we're involved, because again, to the point earlier, we need to make sure that data integration is there in order to allow real-time data to flow into an AI system, to make good recommendations or at least give the right answer. So it's an ongoing project. It's going to go live pretty soon, and I think it's going to be really successful. But this is a great example of something that worked particularly well in a lab environment. You've built your Ferrari engine in your lab, and then you drop it into the rest of your environment, which, to continue the analogy, happens to be a golf cart, and the chassis snaps. And there you have challenges that often result in some embarrassing conversations internally, because it all sounded good on paper, it worked well in that environment, but at scale it failed. So it's really data integration, or at least the readiness of data, wherever and however you get to it, that needs to be taken care of. So, part of this, obviously, is that you've got to get your data corrected, things like that. But if somebody wants to move from a proof of concept, sort of a sandbox environment, into production, do you see where there are opportunities for guardrails and things like that, to sort of dip a toe into production? Or is this one of those situations where they're in the sandbox, then they've moved into production, and that's when it really starts to fall apart?
Or are there some steps, mitigated risk I guess, that a company can go through to move from that sandbox into production, especially if they are a little bit concerned or unclear about how that's actually going to work in the real world? Use cases, right. Many companies have segmented their entire database. So if they're doing it right, and that's what we're going to be doing in this example that I mentioned with the insurer, we're only going to be focusing on one particular product first. On an incoming call, you either talk to it or press a number; I think you have to press a number at the moment, and you end up getting routed to a particular department. That particular department then only focuses on travel insurance, for example, which is going to be used in isolation. So you don't have to immediately focus on everything insurance. That is the end goal, but you want to make sure that you've got manageable proofs of concept that can last under pressure, and that you obviously have failsafes in place. It's a little bit like your autopilot car problem: it's nice to see that the car drives itself, but you still want to be in control if it really doesn't handle a situation the way it's supposed to. And that is a defined scope. You're only going to be focusing on one or two use cases in certain areas, and from there you slowly expand. Every single time you're going to get another learning. Every single time there's a different customer or a different complication coming up, you need to make sure that this entire system learns from it. That's really the important part, because otherwise you just continue having your black box, your sandbox environment, that you're trying to move further upstream.
That needs to naturally grow as well, in order to adapt to all the different environments that it's going to be exposed to all the time. Are there any particular tools or systems out there that you would recommend to help people test their systems to make sure that they're ready? There are various different ones out there, of course. We closely work with a company that focuses on that, but we're under NDA with them. It could be general. Part of why I ask is because, you know, if I'm starting out, I'm getting into AI, I'm building these systems. Maybe a way to flip this around: what kind of questions should I ask about my system to make sure that it's ready, that it's prepared to be taken out of the sandbox and into the real world? Yeah, that's a good question. In the real world, it really comes down to: expect the unexpected, right? For the most part. And that's exactly the challenge, especially when it comes to AI, because people will test it. An example is the chatbots that have famously been deployed on major websites; I think major US car manufacturers were some of them. And it turns out it was just plain ChatGPT underneath. In the end, people were joking, saying, you know, cancel your OpenAI subscription, you just have to go to automaker.com. I'm not going to mention which one it was, but it was a big one. And look, you don't want to create these situations in production. How can you best test for that? How can you really make sure that you've got the right guardrails in place? Because the honest answer is you won't be able to robustly test against the entire world's imagination of which options might be possible, right? I mean, next time you're calling your favorite pizza place and an AI voice greets you, try asking it for something completely out of the ordinary.
Chances are it's going to fail and just put you through to an operator, because they want to save themselves the embarrassment. And that is really the point: there are various tools, you can try different scenarios, you can try to be as creative as you want to be, but at the end of the day, you need to be ready for learning on the go. And with that, the most important part is having guardrails in place, so that if something happens, if exposures come up, if questions that aren't supposed to be asked are being raised, all these types of details can be addressed. Let's go one level deeper on this, actually. There are various different ways you can look at your data, right? So a request comes in. You want to make sure there's almost like an interceptor that tests it, and that's almost AI testing AI. Is the information being requested in breach of any laws? These could be internal rules or overall laws. On top of that, are inappropriate questions being asked? If yes, reject the request immediately. If no, it passes through. And from there, it just comes down to how you want to manage it. Do you want to manage it with your own internal AI bot? Or maybe a separate one? You can have different AI bots, with different knowledge and different LLMs, that are trained to do different things and provide better answers. Architecturally, you can very easily create that flow: a request comes in at the top, it passes the guardrails based on what it is, and after that it's ultimately routed to the particular expert. That also makes it easier to fix things that are broken. Because if it turns out that the advisor for pizza happens to be making mistakes all the time, but the advisor for pasta is fine ninety percent of the time...
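The interceptor-and-router flow Matt describes can be sketched roughly like this. It is a toy illustration only: the blocked patterns, the pizza/pasta experts, and the string matching are illustrative assumptions standing in for real guardrail checks and LLM-backed experts.

```python
# Hypothetical guardrail patterns; a real interceptor would use a policy
# model or rules engine, not substring matching.
BLOCKED_PATTERNS = ["social security", "password", "internal pricing"]

# Hypothetical specialist bots; in practice each could be a separately
# trained model or LLM with its own knowledge base.
EXPERTS = {
    "pizza": lambda q: "Routing to the pizza advisor...",
    "pasta": lambda q: "Routing to the pasta advisor...",
}

def handle_request(question: str) -> str:
    lowered = question.lower()
    # Guardrail / interceptor: reject requests that touch protected
    # information before any expert ever sees them.
    if any(p in lowered for p in BLOCKED_PATTERNS):
        return "REJECTED: request violates guardrail policy"
    # Route to the narrowest expert that matches the topic.
    for topic, expert in EXPERTS.items():
        if topic in lowered:
            return expert(question)
    # Failsafe: anything unrecognized escalates to a person.
    return "Escalating to a human operator"
```

Splitting the flow this way gives the isolation benefit mentioned next: if one expert keeps misbehaving, you can fix or retrain just that one without re-architecting the interceptor or the other experts.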
Well, then you know you need to focus on the one that happens to be giving you some grief, and all the other ones you can at least ignore for the time being. That's going to make it a little bit easier, rather than having to re-architect everything and look at everything from scratch. Well, as usual, it feels like time has flown by, and it's been a great conversation. So for those out there in the audience that would love to learn more about TeleiPaaS and some of the things that you've been talking about, what is the best way for them to get a hold of you? Yeah, thank you. The best place to go is our website, teleipaas.io, and TeleiPaaS will connect any of your data. It's designed to be completely deployment-flexible, so you can deploy it on premises, in the cloud, or in a hybrid environment. We even have a G7 NATO partner that deploys it in an air-gapped environment. It's really all about the flexibility, and obviously the underlying security with that. If you're an AI startup and you need data integration at scale, happy to have a conversation. You can also connect with me on LinkedIn. That's Matt Zalto, Zalto spelled with an S. And yeah, looking forward to having a conversation with any of you. Most importantly, Rob, Michael, I really appreciate your time. Great to be on, and I enjoyed the conversation. Good. Well, thank you so much. We will wrap this one up. For those of you out there, we are not done. We will continue with another interview right around the corner, and actually a weekly challenge just a couple of days away. As always, get out there and have yourself a great day, a great week, and we will talk to you next time. Thank you for listening. And remember, a little bit of effort every day adds up to great success. Keep learning, keep growing, and we'll see you in the next episode.