How to Scrape Food Data with Google Maps Data Scraping Using Python & Google Colab?


Do you want a complete list of restaurants with their ratings and addresses whenever you visit a place or go on holiday? Of course you do, as it makes getting around much easier. The easiest way to get one is data scraping.

Web scraping, or data scraping, imports data from a website to your local machine. The result can be saved as a spreadsheet, so you get the entire list of restaurants around you, with their addresses and ratings, in one easy spreadsheet!

Here at Foodspark, we use Python 3 scripts to scrape restaurant and food data, as Python is very handy for this kind of task. To run and check the scripts, we use Google Colab, which lets us run Python scripts in the cloud.

As our objective is to get a complete list of places, scraping Google Maps data is the answer. With Google Maps scraping, we can extract a place’s name, coordinates, address, type of place, rating, phone number, and other important data. As a starting point, we can use Google’s Places API, which makes it easy to extract place data.

1st Step: Which data do you need?

Here, we will search for “restaurants around me” at Sanur Beach, Bali, within a radius of 1 km. The parameters are therefore ‘restaurants’, ‘Sanur Beach’, and ‘1 km’.

Let’s translate that into Python:

coordinates = ['-8.705833,115.261377']  # Sanur Beach as 'latitude,longitude'
keywords = ['restaurant']
radius = '1000'  # search radius in meters (1 km)
api_key = 'acbhsjbfeur2y8r'  # insert your API key here

The ‘keyword’ parameter helps us find places that are listed as restaurants OR whose details contain the word ‘restaurant’. This is better than filtering by the ‘type’ or ‘name’ of a place, because we get a complete list of places whose name or type contains ‘restaurant’. For instance, we can get both Sushi Tei and Se’i Sapi Restaurant at the same time. If we used ‘name’, we would only get places whose name contains the word ‘restaurant’. If we used ‘type’, we would only get places whose type is ‘restaurant’. The disadvantage of using ‘keyword’, however, is that cleaning the data takes more time.
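
To make the difference concrete, here is a minimal sketch that builds the same request once with ‘keyword’ and once with ‘type’. The helper `build_nearby_url` is a hypothetical name introduced only for this illustration, and `YOUR_API_KEY` is a placeholder; the endpoint is the one used in the script later in this post.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json'

def build_nearby_url(location, radius, api_key, **search_filter):
    """Build a Nearby Search URL (hypothetical helper for illustration).

    search_filter is one of keyword=..., type=... or name=...
    """
    params = {'location': location, 'radius': radius}
    params.update(search_filter)
    params['key'] = api_key
    # urlencode percent-encodes the values, e.g. the comma in 'lat,lng'
    return BASE_URL + '?' + urlencode(params)

location = '-8.705833,115.261377'
url_keyword = build_nearby_url(location, 1000, 'YOUR_API_KEY', keyword='restaurant')
url_type = build_nearby_url(location, 1000, 'YOUR_API_KEY', type='restaurant')

# Show which filter each URL carries
print(parse_qs(urlsplit(url_keyword).query))
print(parse_qs(urlsplit(url_type).query))
```

Only the filter parameter changes between the two URLs; the location, radius, and key stay the same.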

2nd Step: Import the required libraries:

import pandas as pd, numpy as np
import requests
import json
import time
from google.colab import files

Have you noticed the “from google.colab import files” line? Working in Google Colab means we use the google.colab library for opening or saving data.

3rd Step: Write code that produces data based on the parameters given in the 1st Step.

final_data = []  # one row per place

for coordinate in coordinates:
    for keyword in keywords:
        url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location='+coordinate+'&radius='+str(radius)+'&keyword='+str(keyword)+'&key='+str(api_key)
        while True:
            print(url)
            respon = requests.get(url)
            jj = json.loads(respon.text)
            results = jj['results']
            for result in results:
                name = result['name']
                place_id = result['place_id']
                lat = result['geometry']['location']['lat']
                lng = result['geometry']['location']['lng']
                rating = result.get('rating')  # some places have no rating yet
                types = result['types']
                vicinity = result.get('vicinity')
                data = [name, place_id, lat, lng, rating, types, vicinity]
                final_data.append(data)
            time.sleep(5)  # give the next page token time to become valid
            if 'next_page_token' not in jj:
                break
            else:
                next_page_token = jj['next_page_token']
                url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?key='+str(api_key)+'&pagetoken='+str(next_page_token)

# one label per field in each row, in the same order
labels = ['Place Name', 'Place ID', 'Latitude', 'Longitude', 'Rating', 'Types', 'Vicinity']

This code gets the place’s name, ID, latitude and longitude, rating, types, and vicinity for every keyword and coordinate. As Google returns at most 20 entries per page, we need ‘next_page_token’ to fetch the next page. Say there are 40 restaurants near Sanur: Google would return the results in two pages. With 55 results, there would be three pages.
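
The paging logic can also be looked at in isolation. The sketch below is an illustration under stated assumptions, not the code used in this post: `collect_all_pages` and `make_fake_fetch` are hypothetical names, and the fake fetcher stands in for the real HTTP call so the loop can be tried without an API key.

```python
import time

def collect_all_pages(fetch_page, first_params, api_key, delay_seconds=2.0, max_pages=3):
    """Follow next_page_token until it disappears or max_pages is reached.

    fetch_page(params) must return one decoded JSON response as a dict.
    """
    results = []
    params = dict(first_params, key=api_key)
    for _ in range(max_pages):
        page = fetch_page(params)
        results.extend(page.get('results', []))
        token = page.get('next_page_token')
        if not token:
            break
        # a freshly issued token needs a short moment before it is valid
        time.sleep(delay_seconds)
        params = {'pagetoken': token, 'key': api_key}
    return results

def make_fake_fetch(total, page_size=20):
    """Return a stand-in fetcher that serves `total` made-up places."""
    places = [{'name': f'Place {i}', 'place_id': f'id-{i}'} for i in range(total)]

    def fetch_page(params):
        start = int(params.get('pagetoken', 0))
        page = {'results': places[start:start + page_size]}
        if start + page_size < total:
            page['next_page_token'] = str(start + page_size)
        return page

    return fetch_page

found = collect_all_pages(make_fake_fetch(55), {'keyword': 'restaurant'},
                          'YOUR_API_KEY', delay_seconds=0)
print(len(found), 'places collected over three pages')
```

With a real request, `fetch_page` could be something like `lambda params: requests.get(url, params=params).json()`, where `url` is the Nearby Search endpoint.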

We can scrape at most 60 places per search. This is Google’s rule. For instance, if there are 140 restaurants within a 1 km radius of our starting point, only 60 of those 140 would be returned. Therefore, to prevent gaps in the data, we need to choose the radius and the coordinates carefully. Make sure the radius isn’t so wide that only 60 places are returned while many more exist. Also make sure the radius isn’t so small that we end up listing a great many coordinates. Neither is efficient, so we need to understand the context of a location beforehand.
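
One possible way to stay under the 60-place limit is to split a wide area into several smaller searches. The sketch below is only an illustration: `grid_centers` is a hypothetical helper, and the figure of about 111,320 meters per degree of latitude is a rough approximation that is good enough for areas a few kilometers wide.

```python
import math

def grid_centers(center_lat, center_lng, area_radius_m, search_radius_m):
    """Return 'lat,lng' strings on a square grid around the center point."""
    # With a spacing of r * sqrt(2), circles of radius r leave no gaps
    step_m = search_radius_m * math.sqrt(2)
    steps = math.ceil(area_radius_m / step_m)
    m_per_deg_lat = 111_320.0  # rough approximation
    m_per_deg_lng = m_per_deg_lat * math.cos(math.radians(center_lat))
    centers = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            lat = center_lat + (i * step_m) / m_per_deg_lat
            lng = center_lng + (j * step_m) / m_per_deg_lng
            centers.append(f'{lat:.6f},{lng:.6f}')
    return centers

# Cover roughly 1 km around Sanur Beach with 500 m searches
coordinates = grid_centers(-8.705833, 115.261377, 1000, 500)
print(len(coordinates), 'search centers')
print(coordinates[:3])
```

The resulting list can be used as the `coordinates` parameter. Because neighboring circles overlap, the same place can come back more than once, so the rows should be deduplicated by place ID afterwards.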

4th Step: Save data to the local machine

export_dataframe_1_medium = pd.DataFrame.from_records(final_data, columns=labels)
export_dataframe_1_medium.to_csv('export_dataframe_1_medium.csv')
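
Since keyword searches return loosely matching places, a cleaning pass before the export may be worthwhile. The following is a minimal sketch, assuming rows in the same field order as `final_data`; `clean_rows` is a hypothetical helper, and the sample rows use made-up IDs, coordinates, and ratings.

```python
def clean_rows(rows, word='restaurant'):
    """Drop duplicate place IDs and rows that match `word` in neither name nor types."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        name, place_id, types = row[0], row[1], row[5]
        matches = word in name.lower() or word in types
        if place_id in seen_ids or not matches:
            continue
        seen_ids.add(place_id)
        cleaned.append(row)
    return cleaned

# Made-up sample rows: name, place ID, lat, lng, rating, types, vicinity
sample_rows = [
    ['Sushi Tei', 'id-1', -8.70, 115.26, 4.5, ['restaurant', 'food'], 'Sanur'],
    ['Sushi Tei', 'id-1', -8.70, 115.26, 4.5, ['restaurant', 'food'], 'Sanur'],
    ["Se'i Sapi Restaurant", 'id-2', -8.71, 115.26, None, ['food'], 'Sanur'],
    ['Beach Hotel', 'id-3', -8.71, 115.25, 4.0, ['lodging'], 'Sanur'],
]

cleaned_rows = clean_rows(sample_rows)
for row in cleaned_rows:
    print(row[0], row[1])
```

The cleaned list can then be passed to `pd.DataFrame.from_records` in place of `final_data`.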

Final Step: Combine all these steps into one complete script:

import pandas as pd, numpy as np
import requests
import json
import time

final_data = []  # one row per place

# Parameters
coordinates = ['-8.705833,115.261377']  # Sanur Beach as 'latitude,longitude'
keywords = ['restaurant']
radius = '1000'  # search radius in meters (1 km)
api_key = 'acbhsjbfeur2y8r'  # insert your Places API key here

for coordinate in coordinates:
    for keyword in keywords:
        url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location='+coordinate+'&radius='+str(radius)+'&keyword='+str(keyword)+'&key='+str(api_key)
        while True:
            print(url)
            respon = requests.get(url)
            jj = json.loads(respon.text)
            results = jj['results']
            for result in results:
                name = result['name']
                place_id = result['place_id']
                lat = result['geometry']['location']['lat']
                lng = result['geometry']['location']['lng']
                rating = result.get('rating')  # some places have no rating yet
                types = result['types']
                vicinity = result.get('vicinity')
                data = [name, place_id, lat, lng, rating, types, vicinity]
                final_data.append(data)
            time.sleep(5)  # give the next page token time to become valid
            if 'next_page_token' not in jj:
                break
            else:
                next_page_token = jj['next_page_token']
                url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?key='+str(api_key)+'&pagetoken='+str(next_page_token)

# one label per field in each row, in the same order
labels = ['Place Name', 'Place ID', 'Latitude', 'Longitude', 'Rating', 'Types', 'Vicinity']

export_dataframe_1_medium = pd.DataFrame.from_records(final_data, columns=labels)
export_dataframe_1_medium.to_csv('export_dataframe_1_medium.csv')

Now you can download the data from the Google Colab file browser. Click the arrow button on the left pane, then click ‘Files’, and download the data!
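
The download can also be started from code with `files.download` from the google.colab library. The sketch below assumes the CSV file name used in this post; `download_if_in_colab` is a hypothetical wrapper added so that the snippet also runs, without doing anything, outside Colab.

```python
import os

def download_if_in_colab(path):
    """Start a browser download in Colab; return False if that is not possible."""
    if not os.path.exists(path):
        return False
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False
    files.download(path)
    return True

started = download_if_in_colab('export_dataframe_1_medium.csv')
print('download started' if started else 'not in Colab, or file missing')
```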

Your scraped data is saved in CSV format and can be visualized with any tool you’re familiar with, such as Python, Tableau, or R. Here, we visualized it with Kepler.gl, a WebGL-powered, data-agnostic, high-performance web app for geospatial analytic visualizations.

This is how the data looks in the spreadsheet:

[Image: the scraped data in a spreadsheet]

And this is how it looks on the Kepler.gl map:

[Image: the scraped places on a Kepler.gl map]

There are 59 restaurants around where we stand, chilling on the beach at Sanur. We just need to add names and ratings to the map, and we’re ready to explore the food in our area!


https://www.foodspark.io/how-to-scrape-food-data-with-google-maps-data-scraping-using-python-and-google-colab.php
