Bright Data CEO Or Lenchner - Why Meta Lost the Web-Scraping Battle

"World of DaaS"

A deep dive into Data-as-a-Service (DaaS) businesses. World of DaaS is a podcast for data enthusiasts, by data enthusiasts, where Auren Hoffman talks to business and technology leaders about all things data - building it, acquiring it, analyzing it, and everything in between.

August 06, 2024 • World of DaaS with Auren Hoffman • Episode 155

Or Lenchner is the CEO of Bright Data, a leading web data collection company.

In this episode of World of DaaS, Auren and Or discuss:

Web data freedom and ethics
Bright Data's legal battle with Meta
Data pricing strategies
The Israeli tech scene

Looking for more tech, data and venture capital intel? Head to worldofdaas.com for our podcast, newsletter and events, and follow us on X @worldofdaas.

You can find Auren Hoffman on X at @auren and Or Lenchner on X at @orlench.

Editing and post-production work for this episode was provided by The Podcast Consultant (https://thepodcastconsultant.com)

Auren Hoffman: 0:02

Hello data nerds, welcome to World of DaaS. I'm your host, Auren Hoffman, CEO of SafeGraph and GP of Flex Capital. Discover more episodes and get weekly data-as-a-service news at worldofdaas.com. That's spelled world of D-A-A-S dot com.

Auren Hoffman: 0:20

Hello, fellow data nerds. My guest today is Or Lenchner. Or is the CEO of Bright Data, a leading web data collection company. Or, welcome to World of DaaS.

Or Lenchner: 0:30

Thank you very much for inviting me. Super happy to be here.

Auren Hoffman: 0:34

I'm super excited as well. This is a topic I know both of us are very, very passionate about, and we're at a point right now where there are a lot of companies and websites trying various tactics to make their content less accessible for data collection, crawling, scraping, etc. Can you give us an overview of the state of web data freedom right now?

Or Lenchner: 1:04

I'll explain what that means, but I want to start by saying that this was always the case. Getting access to information in the public domain was always a challenge, because someone always tried to prevent others from getting to that data, because, as the cliche says, data is power, and that's true. What we're seeing today is that it's becoming increasingly challenging to gain access to public data, and first we need to define what public data means. By our definition, at least, and we'd like it to become the standard, it's any information you can find on a public web page, which means you don't need to sign up, log in, bypass a paywall or anything like that. Any information that you can see is public.

Auren Hoffman: 1:49

Which is traditionally what search engines have been able to crawl. So that's similar data.

Or Lenchner: 1:54

Yeah, sometimes search engines actually go even deeper, but it's still public data. There are some edge cases, such as open API endpoints, where you won't really find public data, but it's not behind a login, so you can argue about that. We don't go to those places, because we collect public data and nothing else. But, generally speaking, just think about opening an incognito browser and searching the web. Whatever your eyes can see in incognito mode, which means without being logged in, is pretty much public information. Now, attempts to block access to this public asset, which belongs to all of us, at least in my view, are becoming an increasing challenge from a technical point of view, and in many cases those websites are just trying to keep the public data for themselves. Part of what we do pretty well is help our customers, pretty much every company you can think of, gain access to this public asset and use that data for the benefit of their business and, eventually, for the benefit of consumers, which is all of us.

Auren Hoffman: 3:04

What is the analogy? I'm walking down a public street and I see a store, a Starbucks, a McDonald's, and I could note that the Starbucks or McDonald's is there. But I can't go into the McDonald's, jump behind the counter and start opening the internal refrigerator and stuff. That would be wrong, but I could just see what's going on. Is that the right analogy?

Or Lenchner: 3:28

I'll simplify that. You're walking down the same street and you see a shop window with clothes. You want to buy some products, there are prices, and you want to know what the price is. It's right there, presented to you, the potential consumer walking down the street. But then, as you try to look at it, someone identifies you as a competitor, rushes out of the store into the street and blindfolds you. That's a better analogy. I think that's literally what's going on on the internet. If you want to pay the website money to buy a product, that's fine, you will see the price.

Or Lenchner: 4:06

If you're just trying to see the price in order, for example, to compete and make sure you can present a better offer to the same consumer on your competing website, then someone tries to block you. In what world is this a positive thing? Competition is great, and you can't have competition without transparency. So I think that's a better analogy. You can also think about competing supermarkets on the same street. Why would an employee from one supermarket walk into another supermarket to check the price of tomatoes? Because they want to win consumers' attention with discounts on tomatoes. But as soon as he tries to enter the competing supermarket, someone blocks him. That's even closer to discrimination. [Auren Hoffman: Okay, interesting.] So we see a lot of it and it's growing, and sometimes we see the CEO of a company that uses Bright Data heavily to collect data from the web also trying to implement sophisticated technology on his own website to block Bright Data from collecting data from it.

Auren Hoffman: 5:15

It seems some of the biggest crawlers, the ones that crawl the internet the most, are also the companies trying hardest to stop people from crawling their sites. Is that right? [Or Lenchner: Absolutely.] Which I think is very funny, maybe just part of human nature that that would happen.

Or Lenchner: 5:36

I think that's an amazing, accurate representation of economics.

Auren Hoffman: 5:41

Yeah, they want to get all the benefits without giving any of their competition the benefits.

Or Lenchner: 5:46

Exactly, and actually that's okay, I assume. It's an interesting cat and mouse game to play. I personally enjoy it a lot as long as we keep winning.

Auren Hoffman: 5:55

Got it. So in your case, instead of sending the most recognizable person, Howard Schultz, the CEO or founder of Starbucks, to Peet's Coffee, where he would be immediately recognized and shut out, you send a regular Starbucks employee they would never know to Peet's Coffee to check it out, to see how the service is, how the products are, etc.

Or Lenchner: 6:19

Yeah, that's one thing. The more interesting part, I think, is doing it at scale. So it's sending Howard Schultz's people to all of the Peet's Coffee branches across the US 10 times an hour, assuming the prices will change. And it's not just prices, obviously. That's just an easy example, an easy use case to understand.

Or Lenchner: 6:41

But, using exactly the same analogy, in physical coffee shops you can also count customers or understand the sentiment inside the restaurant or the coffee shop. So it's much more than just the price of the coffee. Online, obviously, we're talking about web pages, virtual coffee shops. Every web page holds an infinite amount of data that represents an infinite amount of insights, so it's a lot more than prices.

Auren Hoffman: 7:11

I assume part of being a good citizen when you're crawling the internet is also not overly taxing servers. So crawling it in a way where you're a very tiny, insignificant percentage of the traffic and where you're being a good citizen in general, not overly crawling it, I assume one has to figure those things out too.

Or Lenchner: 7:33

It has to go together. If you look for shortcuts to earn a quick buck, you might succeed for a while, but you can't win in the long term if you're not crawling the web in a polite, fair way. That means you need to respect the website that holds the data, and you need to educate your customers to understand and support that. And probably the most important thing, because you can't control everything, is that you need to develop proprietary technology. I would say you have to, because there isn't anything like that available off the shelf, so you have to build your own in-house technology to make sure you can enforce your own standards, policies and rules.

Or Lenchner: 8:18

I'm very proud to say that we've been doing that since the first day of the company. A lot of our technology, our R&D, our inventions and patents are not just about the technology of web scraping itself, but also about how to do it the proper way. And it's not an easy task when you're talking about massive scale: a lot of traffic going to many websites from tens of thousands of customers at the same time. But it has to be done, and for us, at least, it's the number one priority above anything else. Otherwise it won't be a win-win situation, and if one side loses, we wouldn't have been around for a decade.
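The politeness Or describes (respecting the site that holds the data and not overloading it) can be made concrete with a minimal Python sketch. It is not Bright Data's proprietary technology, which the conversation doesn't detail; it simply assumes the widely used robots.txt convention and uses the standard library's `urllib.robotparser` to skip disallowed paths and to space requests by the site's `Crawl-delay`. The robots.txt text, the `example.com` URLs, the user agent string and the one-second default delay are all hypothetical.

```python
import urllib.robotparser

# Hypothetical robots.txt for an example site; a real crawler would fetch
# https://<site>/robots.txt instead of using a hard-coded string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_policy(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt rules from text (no network access needed)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_polite_fetches(parser, user_agent, urls, default_delay=1.0):
    """Return (url, start_offset_seconds) pairs for the URLs the policy allows.

    Disallowed URLs are skipped, and allowed requests are spaced apart by the
    site's Crawl-delay (or default_delay if none is given), so the crawler
    stays a small, steady share of the site's traffic.
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    schedule = []
    offset = 0.0
    for url in urls:
        if parser.can_fetch(user_agent, url):
            schedule.append((url, offset))
            offset += delay
    return schedule


if __name__ == "__main__":
    policy = load_policy(ROBOTS_TXT)
    urls = [
        "https://example.com/",
        "https://example.com/private/report",
        "https://example.com/products/1",
    ]
    for url, start in plan_polite_fetches(policy, "example-crawler", urls):
        print(f"t+{start:.0f}s  GET {url}")
```

Planning offsets instead of sleeping keeps the sketch easy to test; a real crawler would wait out each offset (for example with `time.sleep`) before sending the request.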

Auren Hoffman: 9:00

Now Meta recently dropped a lawsuit against you and Bright Data for web scraping, web crawling, and that was a long litigation cycle. Before we get into that, what did you learn from that experience?

Or Lenchner: 9:15

First of all, it was shorter than I expected, because we were right.

Auren Hoffman: 9:19

Yeah, you were doing all the right things following the guidance.

Or Lenchner: 9:23

The court saw that and realized that, and it took less than a year and a half to win. That's actually pretty fast.

Auren Hoffman: 9:29

Which is extremely rare. I've never heard of a lawsuit getting adjudicated that fast, especially against such a large company.

Or Lenchner: 9:36

Mentally and resource-wise, I was prepared for a very long and exhausting process, so finishing that quickly was a very good surprise. Obviously, I can only share what's public, but quite a lot of it has become public because it's already behind us. So, to answer your question, the experience was, I think, like any litigation, and this was my first one: a roller coaster. You know you're right, you know you're doing all the right things, and if you aren't, don't go into litigation. I knew we were doing all the right things, so we decided to fight for it, but litigating is still a roller coaster.

Auren Hoffman: 10:21

You're going up against one of the biggest companies in the world with endless amounts of legal budget, and Bright Data is a cool company, but you don't have endless amounts of legal budget to fight.

Or Lenchner: 10:33

But I'll tell you that the decision not to cave, to say we're not going to stop because we're doing the right thing and to back that up in court, wasn't a hard decision. It wasn't hard because we knew what we were doing. I can understand others who might cave. It's also a matter of resources: even if you're doing the right thing, sometimes you just can't afford the fight. Fortunately, Bright Data is a fast-growing, very profitable company, so we could quite easily fund it, and that was, I think, one of the best decisions of my career: to stand by my principles and the company management's principles, with the board's backing and support, and say, okay, if that's the case, let's meet in court.

Auren Hoffman: 11:21

If you steelman their argument, what was their contention? Why were they suing you?

Or Lenchner: 11:25

Great question for them.

Or Lenchner: 11:31

But to answer from what I've learned and seen in their filings, here's basically how it unfolded. As was revealed in court, they had been a Bright Data customer for about six and a half years, I hope a happy one, because if you're using a vendor for that long, I assume it's working well.

Or Lenchner: 11:45

We only do one thing as a company: allow our customers to collect public data from the web. So that is what they were doing with us, and one bright day we just got a cease and desist letter from them, giving us a very short period of time to stop collecting data from their assets, Facebook and Instagram, and also to stop allowing any of our customers to do so. I had expected something like that to happen one day, I just didn't know from whom. As time goes by, the value of data keeps growing, and the space is still unregulated enough, without any major court precedents like this case. So I knew it was going to happen one day, especially because we are leaders of this industry. Then we tried to understand what their claims were all about, because obviously scraping is legal. They were doing it with us, so what's going on?

Auren Hoffman: 12:42

And at that time, were they still a customer, or had they stopped being a customer?

Or Lenchner: 12:48

When I got the cease and desist letter, they were a paying customer.

Auren Hoffman: 12:52

And did their legal team know that? Sometimes one side doesn't know what the other is doing; Meta's a huge company. I'd assume they must have done a vendor check internally. That would be the first thing.

Or Lenchner: 13:03

I never asked that question. I regret not asking it at that point in time. My personal opinion is that no, they didn't realize it.

Auren Hoffman: 13:12

They didn't realize it. Okay.

Or Lenchner: 13:14

So, anyway, we decided almost immediately. Obviously I needed the support of management and the board, but it was unanimous that we weren't going to cave. We knew we were doing the right thing, and their claims all revolved around contract claims. Basically, the claims were: hey, Bright Data, many years ago you opened Facebook and Instagram pages for the company, where you show your happy hour on Thursday night or whatever, and by doing that you accepted the website's terms of service or license agreement, and one of those terms says no scraping is allowed, which is buried somewhere in 500 pages that nobody ever reads.

Or Lenchner: 13:57

You know what, even if it were the first paragraph, in bold font, our position was fine: we didn't sign up to Facebook and Instagram as a company to scrape their websites. We have never collected data behind a login in our history, and we never will, because those are our standards. We don't believe that's right. So that was the contract claim; eventually there were two claims around it. Fast forward probably a year and a half: we got a summary judgment ruling that completely dismissed their first claim, and a few weeks afterwards they dismissed the second claim themselves. We haven't changed anything on our side. If you need public information from any website, even those websites, you will still get it from Bright Data, because that's the right thing to do, and the wrong thing to do is to try to block access to public information. It's public.

Auren Hoffman: 14:55

And I assume one way they can block access is just putting it all behind a login. My guess is that today 90% of the data is already behind the login, and you're only able to crawl the 10% that's not, or some proportion like that. So they could have put 100% of it behind a login if they had chosen to, at any point in time.

Or Lenchner: 15:14

Absolutely, and then we couldn't crawl it, and that would be completely fine. But you can't have your cake and eat it too. You can't block all access to your website and assume that Google, for example, will still be able to index it, and what are you going to do about other websites that embed your snippets, which actually send you traffic? You can't enjoy the benefits of crawlers crawling your data, or at least of not being able to block them, which is part of what made you such a successful company, and at the same time shut them out. Let's get back to that analogy. You can put the prices on the clothes in the store window to attract customers, and that's fine, but then you can't remove the prices and complain that no one is buying your clothes because you're hiding them.

Or Lenchner: 16:10

You need to make a decision. What we see is that even large websites that everyone uses, which experiment a lot with moving data behind a login, reverse course fast, in a matter of days, and put it all back in the public domain, because it hurts their monetization, it hurts their visitors, it hurts their traffic, it hurts the experience. Just think about looking for content on a specific website. Sometimes you'll do it through Google: you'll Google whatever plus the website name, just to get to it directly.

Auren Hoffman: 16:44

In fact, if I'm searching LinkedIn, their search is so slow it's almost unusable. So the way I search LinkedIn is I just go to Google and I say LinkedIn and then someone's name or some company or something like that. I get it much faster. So for me it's a huge benefit that Google can crawl LinkedIn.

Or Lenchner: 17:01

Exactly. Now, it's completely fine if LinkedIn or anyone else moves all of their information behind a login, but there will be a downside to that, and this is why you don't really see it happen.

Auren Hoffman: 17:12

One of the things if you're in Facebook or WhatsApp or something like that, if you put a URL to a news site, let's say to New York Times or something like that, it will then show you the content in a beautiful way, which is essentially crawling. So it goes, it gets a nice preview image, it usually gets maybe the first sentence, it gets the headline and it makes the visual experience on a site like Facebook much more appealing to the user. And that is essentially, I assume, just crawling. They're going out and they're getting it. At least the key things are public and they put it back. So, just to make the internet work well, on just a regular basis, everybody has to crawl each other.

Or Lenchner: 17:50

Yeah, I don't think people actually understand what their regular, ongoing, daily lives would look like if data weren't accessible. You just gave one very simple example that everyone would notice, and everyone would get really mad if this preview on social media disappeared. Just think about what would happen. We're used by the largest security companies in the world. They use us at large scale to collect public data from the web to find malware, phishing attempts and scams. You can only do that if you collect massive amounts of data and then analyze it, and the analysis is done on the company's side and also with us. Just think what would happen if they no longer had access to this information. It's easier to think about the immediate stuff that you'd feel, like prices going up because there's less competition, or not seeing the nice previews on social media platforms. But the implications are much, much deeper than that.

Auren Hoffman: 18:51

Now, there was also a famous case, hiQ versus LinkedIn, which was very similar, around crawling and scraping, and now we've got your case and other cases that are part of this body of law, which seems to at least favor web crawling, judging by how these judges are ruling. How do you see the totality of the legal landscape in the US?

Or Lenchner: 19:14

It's becoming much clearer what's right and what's wrong, and I'm very, very happy. I wouldn't say surprised, because I wasn't sure how it was going to unfold, but I'm very happy that the court system, and it's all happening in California, which has even bigger implications, understands the importance of data. They understand that this is an asset equivalent to money, maybe even with higher value than money. Again, a cliche, but a true one. It's like the shovels for the gold rush, or it's the gold itself, depending on whom you ask. They get that, and they also see who the large corporations filing the lawsuits are. It's those large corporations trying to keep these assets to themselves, even though they're not theirs, and I believe that also tells an important story that these judges, I assume, understand.

Or Lenchner: 20:15

Just by looking at our cases, and you mentioned hiQ versus LinkedIn, even though that ended up a bit differently, what's becoming very, very clear is that you need definitions. They need to be clear, and they need to be logical. The definition we keep talking about in both of our cases, Meta versus Bright Data and X versus Bright Data, both of which we won, is the definition of public data I told you about: anything that is not behind a login or signup. It seems that the courts are accepting that, and this is important because it's a fast-growing, fast-moving industry. It will always move faster than the regulator, but usually what you see is that precedent starts in major court cases like the ones we're talking about and then moves up the chain to the regulators and legislators, and that's a very positive thing. We're seeing it happen: our case has already been cited in Congress, I think twice.

Auren Hoffman: 21:20

In some ways, it's not a surprise to me that being able to crawl responsibly is protected by the courts in the US, because it does fit the zeitgeist of the US. But I can imagine that other countries and other jurisdictions may have very different ways of thinking about this and maybe don't have the same types of freedoms built in as the US. And you crawl all over the world. So how do you navigate all those different jurisdictional issues?

Or Lenchner: 21:51

First of all, you said something interesting: that the courts, or at least we think so, understand the concept of fair crawling. I absolutely agree. I would even say that there is unfair crawling, and in those cases the crawling companies should lose in court. That's even more important than the good guys winning in court, and we're also seeing that happen. It's super important because it doesn't just tell you this is the right way to do it; it also tells you what the wrong way is and sets the borders. And that's a good segue to your question about the global presence of what we're doing and what others are doing.

Or Lenchner: 22:29

It's all about logic. I'm not sure the reason we're winning in US courts is freedom of speech. It might be part of it, but it's more about logic, I think. It's there, it's public, anyone can see it. Would someone sue a company for walking down the street and seeing a billboard that anyone can see, just because that company competes with whoever published the billboard? It's just logic.

Or Lenchner: 22:55

So I hope, I assume, and I know, because we're really engaged in trying to contribute to regulators in different countries, including in Europe, that they see it the same way. It's logical, and it creates good competition. It takes some of the power away from large organizations that are trying to gain more power, knowledge and advantage than anyone else. So that's a good thing, and I think they understand the risk of allowing someone to block access to public information. Think about an amazing library that only allows certain types of people to walk in, read books, gain knowledge and progress the world. No one would argue that this is a positive thing. It feels the same. Again, there are already two major cases that we've won, so to me it's a fact that they all get it.

Auren Hoffman: 23:50

When you think of AI, LLMs, transformers, on the one hand it seems like it would make it a lot easier for you to crawl sites, process the data and make sense of it. On the other hand, it seems like it would also make it a lot easier for competitors, other crawlers, to crawl. So when you think of AI, do you think of it more as an enabler, more as a threat, or a bit of both?

Or Lenchner: 24:17

100% an enabler, and we're a year and a half into that revolution, so that's not a theory. We're using AI to serve AI and LLMs. LLMs, or data for AI, have become our largest growth driver and our largest vertical in the company, surpassing everything else in the last year, because they need as much data as they can get, and we'll probably also talk about that. But to your question, we are able to deliver better, faster and bigger because we're implementing a lot of AI at Bright Data.

Or Lenchner: 24:55

Now, the fact that you're a large language foundation model doesn't mean you have any ability whatsoever to get to a website at massive scale, millions of websites at the same moment, and crawl it deep, wide, up and down, multiple times. That has nothing to do with your algorithms. It's a completely different challenge: to overcome those blocking mechanisms, to do it in a proper, fair way, to understand the structure and to parse it from HTML into a table that a machine can actually read. That's not really their business, and we can't really do what they're doing. I'm not an AI person. I know a lot about AI and I'm good friends with all of them, but we get the data. It's their job to write sophisticated algorithms. I can't; that's not my focus.
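The step of parsing a page "from HTML into a table that a machine can actually read" can be illustrated with a small sketch built on Python's standard `html.parser.HTMLParser`. It covers only the simplest case, a literal `<table>` of `<tr>`/`<th>`/`<td>` cells; real pages usually need site-specific selectors and much more robustness. The class name, function name and the coffee-price snippet are hypothetical and are not drawn from Bright Data's tooling.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags, etc.) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_rows(html: str) -> list[list[str]]:
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical product-listing snippet standing in for a scraped page.
PAGE = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Latte</td><td>$4.50</td></tr>
  <tr><td>Cold brew <b>(large)</b></td><td>$5.25</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in html_to_rows(PAGE):
        print(row)
```

Text inside nested tags such as `<b>` is kept as part of its enclosing cell, which is usually what a price table needs.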

Auren Hoffman: 25:49

Much of the AI revolution was initially built on the Common Crawl dataset and maybe a few other more open crawlers. Now, while a lot of these companies are still using a lot of open data, which maybe they get from Bright Data, et cetera, they're also trying to lock in proprietary data. They're making deals to get data, maybe data that's behind a login. How do you think this is going to shake out over time?

Or Lenchner: 26:14

What we're seeing is very, very interesting, because we're serving most of these household names in the AI industry. Everyone, whether they've been around for a few years or they're a two-person startup that just raised money, goes to the Common Crawls of the world, amazing platforms that are doing such amazing and important work for the world. They get a lot of value from that. They take the corpus of data and train their models. It allows them to start running really, really fast, but then they're all at the same level. All of their models were trained on the same data.

Or Lenchner: 26:49

Everyone will tell a slightly different story, because they're doing things a bit differently based on their experience. Fine, maybe that's true, but it all comes back to the data. You need the best talent, you need compute, and you need tokens, and tokens are data. You can argue about whether the talent differs from one company to another, but usually it's exactly the same people moving from one to another.

Or Lenchner: 27:12

The second thing is compute. It depends on how much money you raise; eventually, you buy GPUs from NVIDIA. And the third thing is tokens, and that's the major challenge and the huge opportunity.

Or Lenchner: 27:26

So what we're seeing now, and literally these are the everyday calls I'm having with the CEOs of the AGI companies, is: we have the whole internet, what now? And my reaction is: you have no idea what the internet is, and you don't know what you don't know. Let me show you what you don't know. Thanks to our scale, and what we've built over the last decade that really landed for us in the last year, we're able to map the internet, hundreds of millions of new websites every week, and we can show them.

Or Lenchner: 27:59

Okay, this is the content that you're not seeing on a daily basis. It's more content and data than anything you've collected so far. Fine, you can play around with synthetic data, that's fine, but you're nowhere close to eating the internet, as everyone likes to say these days. You just don't know what you don't know, and this is what I'm seeing now. So now they're all trying to get an advantage over their competition by adding more layers to their LLMs that are more specific and more sophisticated for particular use cases, and for that you need to find the right data. It's very, very hard. There's a limit to how much relevant data you can find by Googling stuff manually, especially when you need a lot of data. So the token part is very, very interesting, and this is what we're seeing now. It's becoming more niche, more specific, and the challenge is to find it and then also collect it. But that part is solved.

Auren Hoffman: 29:02

Can we double-click on this idea of a login as the main gate? One thing that's starting to happen is agents that can work on an individual's behalf to go do things. Right now, for me to check my mileage status on Southwest Airlines or something like that, I have to go to the Southwest site, log in and go see it, but maybe I'll have an agent that can do that for me and let me know: hey, you need to fly Southwest one more time this year before you get the new upgraded status. Or I could even have an agent book my flight, or do some of these other things. We're moving more and more toward these agent worlds that are going to act on my behalf. In some ways, what the agent does is log in, crawl, take some actions and come back; it's taking data and bringing it somewhere else. Why is the login the mechanism where we've all decided the crawler has to stop?

Or Lenchner: 29:58

That's a very, very good point, and we're seeing a lot of that. We're seeing a lot of demand. Crawling the web is pretty much a GET request. It's a read-only thing, and what you're describing is more like a POST request.

Auren Hoffman: 30:15

Yeah, a POST and a GET at the same time.

Or Lenchner: 30:18

Exactly. We're seeing some interest in that. It's not exploding right now, but there's a difference, I think, between logging in to collect data at large scale and logging in on your behalf, with your real user, your real credentials, at your request, under a contract with you, Auren, just to optimize and automate your life. I think those are two completely different things. I absolutely agree with the point, and I also believe that in the near future we'll see, I'm not sure about court cases, but I do think some regulation around that, because that's the logical thing to do. I'm not even sure websites would care that it's a bot working for you. It's still doing what they want you to do.
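The GET-versus-POST distinction Or draws, between read-only crawling and an agent acting on a user's behalf, can be sketched with Python's standard `urllib.request.Request`, which chooses GET when a request has no body and POST when it carries one. The sketch only builds request objects and sends nothing. The URLs, form fields, the bearer token standing in for a user's delegated credentials, and the `is_read_only` helper are all hypothetical; `SAFE_METHODS` follows the methods HTTP defines as safe (RFC 9110).

```python
import urllib.parse
import urllib.request

# Methods RFC 9110 defines as "safe", i.e. read-only from the client's view.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def build_read_request(url: str) -> urllib.request.Request:
    """A crawler-style read: no body, so urllib chooses GET."""
    return urllib.request.Request(url, headers={"User-Agent": "example-crawler/0.1"})


def build_agent_action(url: str, form: dict, user_token: str) -> urllib.request.Request:
    """An agent-style action on a user's behalf: a form body makes it a POST.

    user_token is a hypothetical placeholder for credentials the user
    explicitly delegated to the agent.
    """
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {user_token}"},
    )


def is_read_only(request: urllib.request.Request) -> bool:
    return request.get_method() in SAFE_METHODS


if __name__ == "__main__":
    read = build_read_request("https://example.com/flights?from=SFO&to=AUS")
    act = build_agent_action(
        "https://example.com/bookings",
        {"flight": "123", "seat": "12A"},
        "user-delegated-token",
    )
    # Building Request objects sends nothing; only urlopen() would.
    for req in (read, act):
        kind = "read-only" if is_read_only(req) else "state-changing"
        print(req.get_method(), req.full_url, kind)
```

Keying the check on the HTTP method, rather than on the URL, mirrors the point in the conversation: reading public pages and taking authenticated actions are different kinds of traffic.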

Auren Hoffman: 31:12

Some might not care. In Southwest Airlines' case, they might not care at all, unless I'm somehow using it to get the best offer or something like that, and then maybe they'll care. But right now, I might go to LinkedIn and search for all the engineers in the Atlanta area who work for certain types of companies. I'll go do it personally, poke around a bit, and maybe find the four engineers I'm going to approach for my company or something. I could have an agent do that for me in the future. It seems logical, and it would save me some time.

Auren Hoffman: 31:48

I could also have someone else in my company whom I pay less do that, which is usually what happens today. You get a person you pay less money to go do the research for you, and they come back with four candidates they think are good for your role. But instead of outsourcing that to some fabulous people in the Philippines, why can't I outsource it to an agent? Again, not to crawl every single person on LinkedIn, but just to do whatever would take me an hour to do, the same type of thing, even if it still takes the agent a whole hour. So why this difference?

Or Lenchner: 32:24

I absolutely agree.

Or Lenchner: 32:28

I just think that currently it's a stepping stone to get to what you're describing. In the early days, when not everyone understood what web data collection, web crawling, or web scraping, whatever you call it, means (now everyone knows, because everyone is doing it), I used exactly the same explanation. Yes, I can pay a thousand people in places where I can afford it, buy all of them a thousand computers, and ask them to go to a website to manually fetch the data. No one will argue that that's not okay. I'm just doing it more efficiently: I took the 1,000 people and 1,000 computers and merged them into a single computer that does it for me. I absolutely agree that this is the next step in the evolution of the industry, but I think it's more related to the AI industry and less to the data industry.

Auren Hoffman: 33:21

What is the difference between a company that publishes data that's true, like Bright Data or where I work, SafeGraph, or something like that (we publish data, and it's true), and a news organization like the New York Times, the Associated Press, Reuters, or Bloomberg, which is also trying to publish data that's true? Why is there a distinction between them? If you're a news organization, you're saying, okay, this is what happened in the football game, so-and-so scored, this is what happened in the stock market, this particular stock went up 2% last month, this other thing went down, et cetera. It's just facts that they publish.

Auren Hoffman: 34:01

Obviously, there's an opinion side to some of these things, but a lot of it is really just news. It's facts: here's what's going on. And really, that's what data is. I assume that with your business, you're publishing facts about what's happening, maybe about a price (here's the fact) or about some other type of thing. In some ways, it seems to me you're just a news organization, like they are, and should be afforded the same type of rights that they are. Or is that too crazy?

Or Lenchner: 34:32

No, no, that's exactly the same thing. I think maybe the one single difference is that when you're reading the New York Times, you're a human, and when you're reading the Bright Data Dataset Marketplace or the SafeGraph location dataset, you're usually a machine that needs to consume it at large scale, process it, and usually make automated decisions. Other than that, it's exactly the same thing. I absolutely agree. Actually, the New York Times worked with Bright Data. That's already been published, and it's a very good example of exactly what you just said.

Or Lenchner: 35:04

They wanted to publish a story, probably two years or a year and a half ago, about what happened to the Twitter feed after Twitter was acquired by Elon Musk. So that's a news story. They tried, I assume, to reflect the facts, but they needed a lot of information and data, and they had to analyze that data to get to the facts. So they partnered with Bright Data to collect public information from what was back then Twitter, and then you had a story in the New York Times. So it's exactly the same thing, but the data that they collected through us was given to an MST, which is an equivalent to a computer.

Auren Hoffman: 35:44

It's interesting. If you have a whole bunch of data, it's data and it's treated one way, but if you turn it into a big pie chart for everyone to see, it's news. It's very funny. Oh, we summarized the data, so it's news, but the data itself is data and we have to treat it in a different way or something.

Or Lenchner: 36:00

And, by the way, take the same data and, instead of giving it to the New York Times, give it to a large hedge fund: it becomes alternative data that helps them make investment decisions. It's exactly the same information, so that goes even further. I mean, it's the same facts, but the insights vary depending on who's reading them.

Auren Hoffman: 36:22

There are so many things happening in the data business. One thing I'd love to get your opinion on is pricing. How should one think about pricing their data? It's competitive. We live in a competitive world: you have competitors, we have competitors. So how should one think about pricing data and giving good value for it? You can do a yearly price. You can price per data element. How do you think about it?

Or Lenchner: 36:43

So I'll give you my philosophy, which seems to have been working for Bright Data for a while. It's all about value, but it's very hard to understand what the value is to the customer. So, first of all, just start with a price. In 100% of cases you'll get it wrong, either too high or too low, and that's fine. Move fast, and for the next customer you can adjust. For the customer after that you'll adjust even more, and then what will happen is that competition will arise and you'll need to adjust pricing, usually down. That's what happens.

Or Lenchner: 37:16

What we're doing, and what's working great for us, is making sure that every day we're working on the product line for the next two to three years, and we've been doing that for a decade. So roughly every six months, we introduce a completely new product with the same single goal: get you the data you need. But it's always more advanced, and it's a premium product. It's a premium product not because I decided so one day, or just because it's new. It's a premium product because it's easier for you, the customer, to get the data faster, at higher quality, with fewer resources on your side, and so the price is slightly higher. So it's always the more mature products, with more competition, whose prices come down over the years, and as long as you can keep innovating and increasing the value to the customer, you will always have higher-yielding products.

Or Lenchner: 38:15

Now, specifically for data, it really depends on how the customer is going to use it. As I talked about earlier, the same information from the same website, even parsed in exactly the same way, can be worth a lot of money to, say, an investment company that needs to make a very, very important decision about whether to invest billions of dollars in something, while to another company it can be of marginal value; maybe they could even skip it.

Auren Hoffman: 38:52

But then how do you price-discriminate? Is it, okay, you get it with a one-second delay and you get it with a one-week delay, or something? How does one think about doing different types of pricing?

Or Lenchner: 39:03

In most cases, you don't. At least if you're trying to be a pure product company, like we are, then you don't. You have it on the marketplace, visible to everyone. Transparency is important, so it's there.

Auren Hoffman: 39:19

We're talking about pricing, and partially the goal is to keep things cheap, keep moving, keep getting better. How does one think about building a moat? Is the only way to build a moat continual running and improvement every single day, where you can never let up and just have to keep improving, improving, improving? Or is there some way, in some of these data businesses, to build a moat? Of course you can never coast, but is there a way to build a bit more of a moat that makes your business a little more defensible?

Or Lenchner: 39:50

So I think it's actually about continuing to move fast, and it's all about product, even when it doesn't look that way. I'll give you an example. Eventually, for 99% of our customers, the only thing they want to get is the data. The crawling and scraping is just the means to get to the end goal, which is the data, and even more than that, most of them don't actually need the data. AI companies do want just the raw data, but most of the other companies or customers want the insights from the data, if they can just get them.

Auren Hoffman: 40:22

They care about a specific type of thing, like the different price or something, and they want to get that data from you.

Or Lenchner: 40:29

Exactly. So one new product that we launched, probably a year and a half ago, was Bright Insights, for now only for the e-commerce industry. It gives you the bottom-line insight: your market share on this website is X, these are your competitors, this is their market share, this is how many units they sold yesterday and at what average price. Really an amazing product. So you can call it a strategic moat: giving more value to the customers.

Auren Hoffman: 40:58

In some ways, you could just be giving them a dashboard. They just need a way to review the data.

Or Lenchner: 41:05

Yeah, but my point is that this is still a product. It's always about that. You can have a very sophisticated pricing model with subscriptions that lock the customer in for a long time, and the value is there. That's all fine, but eventually, at least in my view, it's all about product. It's all about technology. Otherwise, something will happen one day and you'll start losing pieces. It's all about that.

Auren Hoffman: 41:31

There are some data companies that have tried to get more proprietary data, or that have some give-to-get model where they have proprietary data and can lock in a bit more of an advantage. How do you think about that, or what would be your advice to some of these other data companies out there?

Or Lenchner: 41:48

I'm not against it, and it can work for a specific niche. That's fine. Think about those large LLM companies that, as you said before, are contracting directly with big publishers.

Auren Hoffman: 41:59

Reddit or something.

Or Lenchner: 42:01

Reddit, I'm not sure, is a great example, because a lot of the content is still public and not behind the login, so it's still collectible. I'm talking more about big publishers, the big newspapers of the world, that are contracting with the OpenAIs of the world. That's fine. It can be a moat, but if you're using ChatGPT as a coding copilot, who cares? If that coding copilot isn't on par with Claude for coding, you'll just move to the better product. So it can work really, really well as a moat for specific use cases, but if you want to conquer the world, that's just not good enough. It's all about products.

Auren Hoffman: 42:43

What do you think about these data marketplaces, where a buyer can see many, many different data sources and a data seller can access many buyers? I haven't seen them take off yet. There are a lot of data marketplaces, but I don't yet see a lot of dollars going through them. I'd be interested in your thoughts.

Or Lenchner: 43:00

Well, if we weren't a private company, you would have seen it. The Bright Data marketplace, which you can sign up for and see for yourself, is a product.

Auren Hoffman: 43:10

It's a marketplace? So you have other data sellers that sell on it?

Or Lenchner: 43:15

We have third-party data sellers that are selling on it.

Auren Hoffman: 43:18

Ah, I didn't realize that. Oh, that's interesting.

Or Lenchner: 43:21

But I think that's actually not the most interesting part. The interesting part is the way we built it from a product point of view, not from a business point of view. We see what our customers, prospects, previous customers, and everyone in between are looking for, and that's how we make decisions about how to build it and what to build: what datasets to add, how to consume them, how to filter them. I'll give you an example. In the early days of the marketplace, we believed it was a marketplace where you literally get a CSV or even an Excel file. That way it can serve not just the data science team in a company but also the marketing team, because anyone knows how to read an Excel file. So that's how we started building it. From a product point of view, it has a GUI where you can go in, filter things, and see them with your own eyes without writing a single line of code.

Or Lenchner: 44:19

What we realized from a product perspective is that, even when it's serving the marketing team, eventually the person responsible for getting the data wants to consume it through an API. That's a product understanding. So we built an API for it, and this is why it's taking off. And yes, you can also sell your own datasets there if you're not part of Bright Data, and it's, I assume, an economies-of-scale game.

Or Lenchner: 44:50

We're sending it out to around 20,000 data buyers. They don't really care whether Bright Data produced a dataset or SafeGraph produced it, and they keep coming back to get data because they know they will find it with us. Also because we won't cave if a huge company sues us, because we have very good technology to overcome blocking and get the data, and because we have strong partnerships with the other sellers on our marketplace. So once we reached that tipping point and passed it, we have many data buyers who just keep coming in. Then it's again all about the product. You need to deliver the right product to give them the data they need, fast and at high quality, and that will always be the moat.

Auren Hoffman: 45:40

And joining data across all these different data sources is really hard; in your case, you're crawling many, many different sites. Think of this shirt I'm wearing. It's a green shirt, and it might be sold in 10 different places at 10 different prices. Of course, it comes in different sizes and different colors, and it's probably closely related to the women's version of the shirt, to this other shirt, and to these other brands, and they're all connected to one another. How do you think about joining the data and these different join keys, especially for products? I imagine that's a huge challenge.

Or Lenchner: 46:15

That's a huge challenge, but not a new one. Product matching, that's the name for it, and it was a very interesting thing for Bright Data to solve. It was a classic build-versus-buy decision for us, and in this case we actually chose buy. To launch Bright Insights (product matching at Bright Data sits under the Bright Insights product suite), we actually acquired one of our customers. That was great due diligence, because they had used us for six years. We understood what they were doing and saw, okay, this is the next piece in the value chain for us: giving the insight, which is what the customer actually wants to get, and that's a very, very hard thing to solve. These guys, a company called Market Beyond, were doing that. It was an easy decision for us. We acquired it.

Auren Hoffman: 47:06

You could also, instead of buying the whole company, have just been a customer of theirs and paid them, which would have been a lot cheaper than buying the whole thing. Why did you choose to own the whole thing?

Or Lenchner: 47:22

Over the last decade (I've been CEO of Bright Data for almost seven years, I think, and I've been at the company almost from day one), I learned that two things work really, really well for us. The first is to focus only, or almost only, on getting the data. It's hard enough, it's a huge market, and we're the best at it. Laser focus only on that. The second is to keep going up the value chain, and it was simply a decision that if we want to keep going up the value chain, we need to own this technology. It's working well for us because we own the technology, from the underlying infrastructure, for example the proxy networks, all the way up to the insights. I already know what the next things in the value chain will look like, we're working on them, and I have a few strong build-versus-buy dilemmas right now.

Auren Hoffman: 48:19

All right. Now, a couple of personal questions before we go. Outside of Silicon Valley, Israel has really been the most prolific of all the high-tech scenes. So much has been written about it. What are some non-obvious reasons why Israel has been so successful in the tech sector?

Or Lenchner: 48:38

I'm not sure whether it's obvious or not. I think we have no other choice but to think about and solve problems at large scale. That's something we have to do as a nation. Even though we're not an actual island, it's like living on an island, just because of who's surrounding us. Physically, if I want to visit Europe or come to you, I need to get on a plane. Theoretically, over a couple of weeks, I could drive part of the way, but in practice it's impossible. So we're like an island, and that means we need to use the resources we have as a nation.

Or Lenchner: 49:17

You won't find any oil in Israel, and you won't find many minerals that you can extract and then sell to other countries.

Or Lenchner: 49:25

You will find a lot of knowledge that was developed over the years, going back, I want to say, to our history and the Bible, reading and writing books three to four thousand years ago, but that might be too much for me to testify on.

Or Lenchner: 49:45

But even serving in the army at a young age, before going to college, you get a lot of responsibility without the option to fail. When you're 18, there's no option to fail, and I see it at Bright Data and with a lot of Israeli CEO friends and colleagues: failing is not an option. If we fail, it will be a big failure, so we need to try really hard, but then we get up and try again, and I think that compensates for a lot of things that might be missing in Israel. So keep trying with a lot of resilience while understanding the power of knowledge. That's a very good mixture, and this is why I think you're seeing a lot of important technologies coming out of Israel, and we're seeing it now in AI as well. Just recently, a few large new ventures started in Israel, and going back a few years, NVIDIA bought Mellanox for many billions of dollars. That's a good testimony to that.

Auren Hoffman: 50:52

Last question we ask all of our guests what conventional wisdom or advice do you think is generally bad advice?

Or Lenchner: 50:58

That solving something in a meeting, or just holding meetings, is a legitimate way to work. I am really, really against meetings, and that's how the company operates. I'm not sure you even noticed, but when we tried to schedule this one, I didn't have anyone scheduling my calendar. I do it myself because I don't have any meetings. I just said, yeah, whatever works, I'm free, and that's usually the case. And I like being challenged by people who ask me, "Yeah, I get that, sounds cool, but you can't do this and that without an actual meeting." And the answer is no. You can actually do it much better without a meeting.

Auren Hoffman: 51:41

So you async it? You've got a shared doc, or you share Confluence or Slack, and you're asyncing the stuff. So is the difference whether you do it live or async? Because you're collaborating with other people in your company, I assume.

Or Lenchner: 51:55

But let's not confuse a meeting with collaboration. What you talked about are methods to collaborate. When you're holding a meeting, it usually means that there's no clear ownership, or, if there is a clear owner, that they don't have the authority to be a real owner. What I'm trying to do, and it's hard, is to make sure that there are clear owners with the right authority. It doesn't always work, but we have to keep trying, and then you don't really need a meeting: someone who knows how to make the right decisions can execute and collaborate, and you can absolutely do that async. It doesn't mean I can't give you a short phone call: "Hey, Auren, I'm going to do this and that. What do you think about it? I need your help with this and that." That's absolutely fine.

Auren Hoffman: 52:42

So you don't need to have a meeting, you just call someone, just a quick sync on it or something like that.

Or Lenchner: 52:46

Yeah, you're taking the 45-minute to one-hour meeting, which usually isn't very useful, and condensing it into the five minutes that actually are useful: the action items, the decisions, the steps to take, and all of that.

Auren Hoffman: 53:02

That sounds amazing. Well, thank you, Or Lenchner, for joining us on World of DaaS. You're @orlench on X, where I think you mostly post in Hebrew, so only some of our audience can benefit from that. But you're also active on LinkedIn, so I definitely encourage our listeners to engage with you there. This has been a ton of fun. I really appreciate you coming on World of DaaS.

Or Lenchner: 53:20

Thanks for having me. Unfortunately, my Instagram and Facebook accounts were shut down by Meta when they sued us, so find me on LinkedIn and feel free to reach out. I really appreciate the invitation. It was a great time.

Auren Hoffman: 53:35

If you're a super data nerd, go to worldofdaas.com (that's D-A-A-S) and sign up for our weekly data-as-a-service roundup newsletter. Thanks for listening. If you enjoyed the show, consider rating this podcast and leaving a review. For more World of DaaS (and DaaS is D-A-A-S), you can subscribe on Spotify, Apple Podcasts, or anywhere you get your podcasts, and also check out YouTube for videos. You can find me on Twitter at @auren. That's A-U-R-E-N. We'd love to hear from you. World of DaaS is brought to you by SafeGraph. SafeGraph is geospatial data for physical places. Check it out at safegraph.com. And by Flex Capital. Flex Capital invests in data companies like those we talk about on World of DaaS. Check it out at flexcapital.com.