Prerequisites for the Indexing API: Getting Set Up with Google
Before you can start using Google’s powerful Indexing API, there are a few essential steps to complete. Don’t worry – it’s not as complex as it sounds! Whether you’re setting this up for a client or your own site, here’s a breakdown of the key prerequisites to get you up and running.
1. Create a Project for Your Client
The first thing you’ll need to do is create a project in Google’s API Console. This is where you’ll tell Google about your client and set up all the necessary settings to enable API access.
To get started, you’ll want to use Google’s setup tool, which will guide you through:
- Creating a project in the Google API Console
- Enabling the Indexing API
- Setting up credentials for your project
This step is crucial because it activates the API and lets Google know about the application you’re working with. Think of it like setting up a “home base” for your project where all the technical details live.
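If you prefer the command line, the same setup can be sketched with the gcloud CLI. This is a hedged sketch, not a required path: the project ID below is a placeholder, and Google's setup tool linked above does the same thing through the UI.

```shell
# Create a new project (the project ID must be globally unique).
gcloud projects create my-indexing-project

# Enable the Indexing API on that project.
gcloud services enable indexing.googleapis.com --project=my-indexing-project
```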
2. Create a Service Account
Once your project is set up, you’ll need to create a service account. This is the account that will interact with the Indexing API on behalf of your project. Here’s how you do it:
- Go to the Service Accounts page in the Google Cloud Console and select your project.
- Click Create Service Account, then give it a name and a description. You can keep the default service account ID or create your own.
- When you’re done, click Create.
- You don’t need to change any permissions here, so just click Continue.
- Now, you’ll need to create a key. Choose the JSON format (it’s the most commonly used), and Google will generate a public/private key pair that’s downloaded to your computer. This key is vital for authentication, so make sure you store it securely.
And that’s it! You’ve created your service account. Keep that key safe, as it’s the only copy Google will provide.
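The downloaded key is a small JSON file, and it's worth knowing what's inside, because you'll need its client_email value in the next step. Here's a minimal sketch of reading it; the key material below is an obviously fake placeholder, not a real credential.

```python
import json

# Structure of a service-account key file (all values are fake placeholders).
sample_key = """{
  "type": "service_account",
  "project_id": "project-name",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n",
  "client_email": "my-service-account@project-name.iam.gserviceaccount.com",
  "token_uri": "https://oauth2.googleapis.com/token"
}"""

key = json.loads(sample_key)

# This is the address you'll add as an owner in Search Console.
print(key["client_email"])
```

In practice you'd call `json.load(open("your-key.json"))` on the real file instead of embedding the text.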
3. Add Your Service Account as a Site Owner
Next up, you’ll need to prove that you own the site you’re going to index. This step involves two parts:
Step 1: Prove Site Ownership
To prove ownership of your site, you'll need to use Google Search Console. You can verify your site in a few different ways (HTML file, DNS, etc.), and you can choose between a Domain property (like example.com) or a URL-prefix property (like https://example.com).
Step 2: Grant Your Service Account Owner Access
Once your site is verified, you can add your service account as a delegated site owner. Here’s how:
- Go to Search Console and click on the property you’ve verified.
- In the Verified owner list, click Add an owner.
- Enter the service account email – you’ll find it in the client_email field of your JSON key file, or under the Service Account ID in the Google Cloud Console. It’ll look something like this:
my-service-account@project-name.google.com.iam.gserviceaccount.com
Once you add this, your service account will have the necessary permissions to interact with the Indexing API.
4. Get an Access Token
Every time you make a request to the Indexing API, you’ll need an OAuth token for authentication. This token is issued in exchange for a request signed with your private key, and it’s only valid for a limited time.
Here are the basic requirements for submitting a request to the Indexing API:
- Your request must include the scope:
https://www.googleapis.com/auth/indexing
- Use one of the API endpoints for sending indexing requests.
- Attach your service account access token to the request.
- Define the body of the request (as outlined in the API documentation).
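Before wiring up authentication, it helps to see what a single notification looks like. The request body is a small JSON object with just two fields; URL_UPDATED is used for new or changed pages, and URL_DELETED for removed ones. The helper name below is my own, not part of any library.

```python
import json

ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def build_notification(url, kind="URL_UPDATED"):
    """Build the JSON body for a single Indexing API notification."""
    if kind not in ("URL_UPDATED", "URL_DELETED"):
        raise ValueError("type must be URL_UPDATED or URL_DELETED")
    return json.dumps({"url": url, "type": kind})

body = build_notification("https://example.com/my-post/")
print(body)  # {"url": "https://example.com/my-post/", "type": "URL_UPDATED"}
```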
Putting this all together, here’s the script used on this site: it authenticates with the JSON key, pulls post URLs from the sitemap, and submits each one to the Indexing API.

```python
from oauth2client.service_account import ServiceAccountCredentials
import httplib2
import json
import requests
import pandas as pd  # only needed for the commented-out CSV variant at the end
from bs4 import BeautifulSoup

# https://developers.google.com/search/apis/indexing-api/v3/prereqs#header_2
JSON_KEY_FILE = "<CRED_FILE_FROM_GOOGLE_CONSOLE>"
SCOPES = ["https://www.googleapis.com/auth/indexing"]

# Authorize an HTTP client with the service account's private key.
credentials = ServiceAccountCredentials.from_json_keyfile_name(JSON_KEY_FILE, scopes=SCOPES)
http = credentials.authorize(httplib2.Http())

ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def indexURL(urls, http):
    """Send a URL_UPDATED notification for every URL in the list."""
    for u in urls:
        json_ctn = json.dumps({"url": u.strip(), "type": "URL_UPDATED"})
        response, content = http.request(ENDPOINT, method="POST", body=json_ctn)
        result = json.loads(content.decode())
        if "error" in result:
            print("Error({} - {}): {}".format(
                result["error"]["code"],
                result["error"]["status"],
                result["error"]["message"]))
        else:
            print("urlNotificationMetadata.url: {}".format(
                result["urlNotificationMetadata"]["url"]))

# Fetch recent links from the sitemap.
r = requests.get('https://<YOUR_WEBSITE_DOMAIN>/post-sitemap.xml')

# Parse the XML and collect every <loc> entry, skipping media files
# under wp-content/uploads/.
soup = BeautifulSoup(r.content, features="xml")
urls = []
for loc in soup.find_all('loc'):
    if "wp-content/uploads/" not in loc.text:
        urls.append(loc.text)

indexURL(urls, http)

# Alternative: read URLs from a CSV file instead of the sitemap.
# my_data.csv has two columns, URL and date; only URL is needed here.
# csv = pd.read_csv("my_data.csv")
# csv[["URL"]].apply(lambda x: indexURL(x, http))
```
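The sitemap-filtering step above is easy to pull out into a small, testable helper. Here’s a sketch using only the standard library (xml.etree instead of BeautifulSoup, so no extra dependency); the function name and skip pattern are my own choices, mirroring the wp-content/uploads/ filter used in the script.

```python
import xml.etree.ElementTree as ET

def extract_post_urls(sitemap_xml, skip_pattern="wp-content/uploads/"):
    """Return all <loc> URLs from a sitemap, skipping media uploads.

    endswith("loc") tolerates the sitemap XML namespace prefix.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [el.text for el in root.iter() if el.tag.endswith("loc")]
    return [u for u in urls if skip_pattern not in u]

sample = """<?xml version="1.0"?>
<urlset>
  <url><loc>https://example.com/my-post/</loc></url>
  <url><loc>https://example.com/wp-content/uploads/img.png</loc></url>
</urlset>"""

print(extract_post_urls(sample))  # ['https://example.com/my-post/']
```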