How to Scrape Zomato Listings Using BeautifulSoup and Python?
One of the biggest applications of web scraping is scraping restaurant listings from different websites. This might be to build aggregators, monitor prices, or offer a better UX on top of existing restaurant listing sites.
We will see how a simple script can do that. We will use BeautifulSoup to scrape restaurant data from Zomato.
To begin with, the code below is boilerplate: we need to fetch the Zomato search results page and set up BeautifulSoup so that we can use CSS selectors to query the page for the data we want.
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'

# Fetch the search results page and parse it
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# print(soup.select('[data-lid]'))
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        # raise e
        print('')
We pass a User-Agent header to simulate a browser request so that we are less likely to get blocked.
Now, let's analyze the Zomato search results for the location we want.
When we inspect the page, we find that each listing's HTML is wrapped in an element with the class search-result.
We can use this to break the HTML document into the cards that hold each restaurant's data, like this.
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# print(soup.select('[data-lid]'))
# Each element with the class search-result is one restaurant card
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        # raise e
        print('')
And once you run that…
python3 scrapeZomato.py
You can see that the code separates the HTML into individual cards.
Looking further, you can see that the restaurant's name always has the class result-title. So, let's try to retrieve that.
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# print(soup.select('[data-lid]'))
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        # print(item)
        # The restaurant name is in the element with the class result-title
        print(item.select('.result-title')[0].get_text())
    except Exception as e:
        # raise e
        print('')
This will give us the restaurant names…
Hurrah!
Now, it’s time to get other data…
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.zomato.com/ncr/restaurants/pizza'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# print(soup.select('[data-lid]'))
for item in soup.select('.search-result'):
    try:
        print('----------------------------------------')
        # print(item)
        # Name, area, rating, votes, opening hours, and cost, in that order
        print(item.select('.result-title')[0].get_text().strip())
        print(item.select('.search_result_subzone')[0].get_text().strip())
        print(item.select('.res-rating-nf')[0].get_text().strip())
        print(item.select('[class*=rating-votes-div]')[0].get_text().strip())
        print(item.select('.res-timing')[0].get_text().strip())
        print(item.select('.res-cost')[0].get_text().strip())
    except Exception as e:
        # raise e
        print('')
And once you run that…
It prints all the details we need, including reviews, price, ratings, and addresses.
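One thing to watch in the loop above: if a card is missing any one field, the indexing raises an exception and the remaining fields for that card are skipped. Here is a minimal sketch of a more forgiving version, run against a made-up HTML fragment rather than the live site. The sample markup, the helper names text_or_none and parse_listings, and the dictionary keys are assumptions for illustration; only the CSS selectors come from the script above.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the classes used in the script above.
SAMPLE_HTML = """
<div class="search-result">
  <a class="result-title">Pizza Place</a>
  <a class="search_result_subzone">Connaught Place</a>
  <div class="res-rating-nf">4.2</div>
  <span class="rating-votes-div-123">512 votes</span>
  <div class="res-cost">800 for two</div>
</div>
<div class="search-result">
  <a class="result-title">Slice House</a>
</div>
"""

# Field name (our own choice) mapped to the selector from the script above.
FIELDS = {
    'name': '.result-title',
    'area': '.search_result_subzone',
    'rating': '.res-rating-nf',
    'votes': '[class*="rating-votes-div"]',
    'timing': '.res-timing',
    'cost': '.res-cost',
}


def text_or_none(item, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = item.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_listings(html):
    """Turn every .search-result card into a dict with one key per field."""
    soup = BeautifulSoup(html, 'html.parser')
    return [
        {name: text_or_none(item, selector) for name, selector in FIELDS.items()}
        for item in soup.select('.search-result')
    ]


for row in parse_listings(SAMPLE_HTML):
    print(row)
```

With this approach, a missing field shows up as None instead of cutting the card short, and the rows are ready to be written to a file or database.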
In more advanced implementations, you would need to rotate the User-Agent string so that Zomato can't tell that the requests come from the same browser.
Even if we get a bit more advanced, you will find that Zomato can simply block your IP address, which defeats all of these other tricks. It is a letdown, and it is where the majority of web scraping projects fail.
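Rotating the User-Agent can be sketched roughly as follows. The pool of strings and the helper name random_headers are assumptions for illustration; in practice, you would keep a larger list of current, real browser strings.

```python
import random

# Hypothetical pool of User-Agent strings; replace with up-to-date ones.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
]


def random_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {'User-Agent': random.choice(USER_AGENTS)}


print(random_headers())
```

In the scraper, you would then call requests.get(url, headers=random_headers()) for each request instead of reusing one fixed headers dictionary.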
Overcoming IP Blocks
Investing in a private rotating proxy service such as Proxies API can often make the difference between a successful, headache-free scraping project that gets the job done consistently and one that never really works.
In addition, with 1000 free API calls on offer, you have nothing to lose by trying our rotating proxy and comparing notes. It takes only a single-line addition, so it is barely disruptive.
Our rotating proxy server, Proxies API, offers a simple API that can solve your IP blocking problems instantly, with:
- Millions of high-speed rotating proxies located around the world
- Automatic IP rotation
- Automatic User-Agent string rotation (which simulates requests from different, valid web browsers and browser versions)
- Automatic CAPTCHA-solving technology
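To make the idea of IP rotation concrete, here is a rough client-side sketch that cycles through a list of proxies. This is not the Proxies API interface; the proxy addresses are placeholders from a documentation-only IP range, and the helper name next_proxy_config is made up for illustration.

```python
from itertools import cycle

# Placeholder proxy addresses; replace with proxies you actually control or rent.
PROXIES = [
    'http://203.0.113.10:8080',
    'http://203.0.113.11:8080',
]

# Endless round-robin iterator over the proxy list.
_proxy_cycle = cycle(PROXIES)


def next_proxy_config():
    """Return the next proxy as a mapping in the shape requests expects."""
    proxy = next(_proxy_cycle)
    return {'http': proxy, 'https': proxy}


print(next_proxy_config())
```

Each call returns the next proxy in turn, so a request could be sent as requests.get(url, headers=headers, proxies=next_proxy_config(), timeout=10). A managed rotating proxy service handles this rotation on the server side instead.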
Hundreds of clients have successfully resolved the problem of IP blocking using our easy API.
All of this can be accessed through a simple API from Foodspark.
To know more about our Zomato Listings Scraper, contact us or ask for a free quote!
https://www.foodspark.io/how-to-scrape-zomato-listings-using-beautifulsoup-and-python.php