Friday, January 11, 2019

Etsy Shop Keyword Tag Scraping With Python

Idea:

This project stemmed from the idea of helping my wife with her Etsy shop. She's been running her shop on Etsy for a couple of years now, and to make her life a little bit easier I wanted to use my data skills to help her do some product research. Happy wife, happy life.

My main goal was to try and generate a CSV file with all the keywords tags that were used in each of the top fifty (50) products from a specific search query on Etsy.  I would then be able to use this data to discover what keywords other sellers are using, in hopes to improve Etsy product listings by adding the most popularly used keywords to existing listings (for example on my wife's products). Theoretically, this would improve the odds of these listings appearing in relevant Etsy searches.

**Sidenote: There are now two (2) organizations that have built a great user interface online cloud platforms with this similar type of idea.  Both websites provide this date as well as more insight than what is available with just a simple scrape.  Here are both links: marmalead.com and etsyrank.com**

Data Viz:



Insight:

The best insight for this project was finding keywords to use in a listing.  When reviewing the word cloud, I want to pay close attention to the little words just as much as the larger words.  The little words might provide great long-tail keyword tags that I may be missing.

The other thing we are looking for, that might impact our listing, is the different spelling variations or phrases used in the product listing.  Finding this type of data nuggets can help provide a strong Etsy SEO.  Some search queries may be different from the correct spelling.


I would also stack the words by the count that they show up in the column.  This will give an exact amount of time the words showed up in different listings.  This can provide a must-have in the keyword tags and what words are possibly being underutilized.


Keyword_Tag
COUNTA of Keyword_Tag
Clothing98
Tops & Tees94
Unisex Adult Clothing84
T-shirts84
dance shirt70
gift for dancer50
dancer shirt50
dancing shirt46
dance42
dancer gift38
*This is the top ten words used in the example

This same style of analysis could also be used on the title of the product.  The product title can be thoroughly analyzed to find the best words and how the words are organized in the listings.  This requires an understanding of statistics and how to get the range of the location, but it can be completed.

You can also perform a cost analysis of the products being sold.  This would help you to decide what price point you might want to set your product.

And lastly, you may be wondering why we needed the seller name.  Well, let me tell you it provides great insight.  This information can provide you with how many times a seller shows up on a particular keyword search.  This would be a bigger competitor in terms of ranking.  It would be worth investigating more about the seller to help improve your product listing in Etsy.

Project:

Etsy is an online marketplace where a seller who makes things (such as crafts) can sell their goods to customers.  Etsy is similar to Amazon and eBay but targets the sale of handmade goods and craft supplies.  Their platform allows for thirteen (13) keywords for each product listing, which they call keyword tags.  These keyword tags are what help get the product noticed in the search algorithm.  Each of these keyword tags is displayed on the product page near the bottom.

This project is going to scrape those keyword tags.

Tools:

Anaconda 
Python 3.7
Jupyter Notebook
Pandas
Beautiful Soup
Google Sheets

Data:

The data is from Etsy.com in an unstructured format.  This scrape will be used to build the data set.

Here is the example dataset generated in Google Sheets.

Data Cleaning:

The data cleaning was completed in Google Sheets to help build a quick word cloud.

The data we will be using to gather in this blog post is the search term in Etsy "Dance Shirts."  Here is what the CSV file looked like when I ran the code.

Process:

The coding will be written in Python on a Jupyter Notebook.

In order to scrape a site, I will need to install Beautiful Soup and Pandas packages into Python.  I am using Anaconda, which provides the packages already downloaded for Python.

Code:
import requests
from bs4 import BeautifulSoup
import pandas as pd

Next is to fetch that data from Etsy.com.  I need to get the website of the page I am going to scrape and look at the HTML code from the site.  The HTML code will be broken into tags and that is what breaks up the elements I need to extract.  What we are looking for is the results from a search query on Etsy, and from that page we will be learning the structure.  This will help give our crawler a list of products.

Code:
# this is the keyword search term for etsy
query = "dance+shirt"
#this is getting calling the page in python
etsy_page_search = requests.get("https://www.etsy.com/search/?q=" + query)
soup_search = BeautifulSoup(etsy_page_search.content,"html5lib")

The list data build needs to be put into a new data frame, which we will build a list for the next part of the crawl to use.

Code:
#This is the listing id list
listing_id = soup_search.find_all("a")
#This holds the listing url
list_id_records = []
keywords_records = []

#this gather listing url by listing id and adding to website address
for listing in listing_id:
    list_id = (listing.get("data-listing-id"))
    if list_id != None:
        url_product = "http://www.etsy.com/listing/" + str(list_id) +"/"
        list_id_records.append(url_product)

After building the code to get the 50 products on the page, we need to open each product page and have the crawler scrape this page.  In this crawl, we the main data we are looking to grab from each page is as follows:
  • Title of the product
  • Name of the Seller
  • The sale price of the product
  • The keyword tags
Code:

 #getting product page information     
for list_id in list_id_records:
    etsy_page_product = requests.get(list_id)
    soup_product = BeautifulSoup(etsy_page_product.content,"html.parser")
    keywords_list = soup_product.find_all("a", {"class":"text-center btn btn-link tag-button-link"})
    for keywords in keywords_list:
        keyword = keywords.text        
        title = soup_product.find("h1", {"class":"mb-xs-2 override-listing-title break-word"}).text
        seller = soup_product.find("span", {"itemprop":"title"}).text
        price = soup_product.find("span", {"class":"currency-value"}).text
        keywords_records.append((title, seller, price, keyword))

The way this data will output is by putting thirteen (13) keywords in one row, making thirteen columns of keywords.  What we want is to have all of the keywords in one column.  By placing them in one column, we also have to repeat a lot of the information again on the listing.  Guess how many times? You guessed it- thirteen (13) times.

The reason for this madness is to make it easier to build the word cloud picture in Google sheets and get a simple analysis of the count of the words.

After we gather all this information we need to store it in a CSV file.  I used pandas to help build the data frame and make it a CSV file.  The CSV file generated by the code will then be used in Google Sheets.

df = pd.DataFrame(keywords_records, columns =["Title","Seller","Current_Price","Keyword_Tag"])
df.to_csv(query + ".csv", index=True, encoding="utf-8")
len(keywords_records)

Inside Google sheets, we want to open our CSV as a Google Sheets file.  This will give the spreadsheet with all the information we needed to scrape.  We will take the keyword column and use that column to build the word cloud.  The larger the word, the more commonly it is used.

5 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
    Replies
    1. I apologize about deleting you comment. I hit the wrong button. I am currently trying to figure out how to get it back. But to answer your question. We did see a spike in some of products that did not usually sell. After a few days we sold stuff that we thought was on the last pages of products. If you would like some help please let me know I would be happy.

      Delete
  2. This is great - it seems to run AOK but the CSV file is blank - list_id_records is always [] after soup_search. Very odd!?

    ReplyDelete
  3. Hi Larry, it seems to run ok but he CSV file is blank Thanks

    ReplyDelete
  4. I fixed this code a bit, it fixes the CSV file being blank:

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    # this is the keyword search term for etsy

    query = input("what do you want to search for?\n")
    #this is getting calling the page in python
    etsy_page_search = requests.get("https://www.etsy.com/search/?q=" + query)
    soup_search = BeautifulSoup(etsy_page_search.content,"html.parser")

    #This is the listing id list
    listing_id = soup_search.find_all("a")

    #This holds the listing url
    list_id_records = []
    keywords_records = []

    #this gather listing url by listing id and adding to website address
    for listing in listing_id:
    list_id = (listing.get("data-listing-id"))

    if list_id != None:
    url_product = "http://www.etsy.com/listing/" + str(list_id) +"/"
    list_id_records.append(url_product)

    for list_id in list_id_records:
    etsy_page_product = requests.get(list_id)
    soup_product = BeautifulSoup(etsy_page_product.content,"html.parser")
    keywords_list = soup_product.find_all("a", {"class":"wt-btn wt-action-group__item"})
    for keywords in keywords_list:
    keyword = keywords.text
    title = soup_product.find("h1", {"class":"mb-xs-2 override-listing-title break-word"})
    seller = soup_product.find("span", {"itemprop":"title"})
    price = soup_product.find("span", {"class":"currency-value"})
    keywords_records.append((title, seller, price, keyword))
    print(keyword)

    df = pd.DataFrame(keywords_records, columns =["Title","Seller","Current_Price","Keyword_Tag"])
    df.to_csv(query + ".csv", index=True, encoding="utf-8")
    len(keywords_records)

    also check out my etsy https://www.etsy.com/shop/CybergothCreations?ref=profile_header
    lol

    ReplyDelete

10 Years of Marvel Comic Move Magic -Part 2 (How Much Was James Gunn Worth to MCU?)

Idea: Welcome to the second part of the MCU analysis project.  I know all you awesome readers have been dying with anticipation to see w...