A Modest Proposal

Recently, I had an "xkcd" moment, and decided to waste otherwise leisurely reading time conceiving a technically brilliant (but utterly worthless) project. Although the idea is meant as a joke, feel free to take it and run with it if it inspires you.

The Case for a New Restaurant Review Application

While I don't really consider myself a "foodie" (I avoid trendy, upscale restaurants, and shudder at sweet drizzles), I do enjoy going to a variety of restaurants and exploring their menus. However, the number of restaurants coming and going in Flagstaff have exceeded my ability to keep track. I hardly ever go downtown anymore, and have hopelessly lost touch with the restaurant scene there.

Nonetheless, sometimes I feel the urge to contribute reviews for those restaurants with which I'm still familiar, as well as to check out reviews by others. Perhaps I might even feel inclined to look at reviews for a place which I'm considering to visit.

While I acknowledge the existence of many crowd-sourced review sites, for this particular study I decided to focus on Yelp as an example of how such sites can go wrong.

Yelp claims to use "automated software" that filters and sorts reviews "based primarily on quality, reliability and the reviewer’s activity on Yelp", thereby supposedly relieving the intrepid seeker of recommendations from having to sort through the weeds to find useful reviews. The more I tried to determine the difference between approved and unapproved reviews, however, the more I was mystified by their methodology. I finally decided that the Yelp Sort was smoke and mirrors, and the filtering algorithm was pure snake oil. Clearly, there is a need for greater transparency when it comes to review analytics.

That's when I got the idea for a new type of restaurant review app: why not give the user the ability to sort through all submitted reviews and perform their own data filtering, analysis, and mapping? A well-designed and well-presented database, coupled with intuitive, simple-to-use tools, could conceivably allow the user to make queries and see the results in the form of map layers and charts.

Spatial Queries

The technical challenges behind the creation of such an application are daunting. While it's fairly easy to create widgets with pre-cooked queries, it's not so easy to develop a relatively simple and intuitive way to turn a series of mouse clicks or taps into sophisticated spatial queries that ask the sorts of questions that a user might want to ask. And you need to have an underlying data model that would support those queries.

Here are some examples of queries that might be made [see Appendix B for example results]:

A GIS professional could develop a sophisticated geoprocessing model that can make those queries, but again the real trick is turning it into a pleasant user experience. The key, I believe, would be to provide simple choices that could be made more complex as the user chooses to dive in deeper to explore the available data.

Data Collection

Obviously for such a scheme to provide meaningful results would require an unprecentedly massive data collection of every restaurant and customer in existence. And that's where my modest proposal comes in.

I'm not proposing body cams for bathroom cleanliness, nor interruption of phone displays and earpieces for AI-driven contextual interrogation. That sort of approach cannot possibly work because it would be annoying. What is required is an interface that is completely voluntary and fun for people to use.

I'll leave the rest to your virtual-reality-embracing, geoboundary-driven, incessantly game-playing imagination.

In the meantime, let me use this as an introduction to a new feature in my own web site: ffred's restaurant reviews! No, it's not the aforementioned app, but at least it's a place where I can write about various restaurants any way I damn well want to.

Appendix A: An Example Data Model

While I didn't spend a great deal of time on the database schema, my goal was to come up with something that could be readily quantified, analyzed, and presented thematically on a map. Of course, there's plenty of room for improvement. For example, I did not include a field for price range.

RESTAURANTS

This feature class contains the geographic location as well as certain business data. It would be maintained in the back-office, perhaps via wiki.

GLOBALID: A unique identifier.

NAME: Three guesses, and the first two don't count.

HOURS: I could not think of a good way to model this in a single field. Instead, I created a joined table. Hence, the need for a unique identifier.

BUSINESS CATEGORY: This was the toughest measure for me to decide. I definitely prefer local establishments to large chains, but there are gradations in between. Finally, I settled on the following categories:

CodeDescription
LocalLocal establishment (1-2 locations)
SmallLocal or small chain (3-20 locations)
LargeLarge chain or franchise (>20 locations)

Thus, Country Host, which has 2 locations, clearly qualifies as Local. Picazzo's definitely qualifies as Small, while Oregano's and Pita Jungle are not quite Large, but getting really close. Flame Broiler clearly qualifies as Large. Again, this is factual data and would not be captured by the review collection app.

KEYWORDS: This is a collection of single, space-delimited, lower-case keywords that describe the food and/or ingredients. For example: "mexican pizza thai burger" [that would be an interesting place!].

REVIEWS

This table is joined to the restaurants feature class via the restaurant ID. The review collection app would populate this table.

BREAKFAST, LUNCH, DINNER, CLEANLINESS, SERVICE, and NOISE are all rated on a scale of 1 to 10, where 10 is the most desirable value (or null where not applicable). Thus, 1 would be Noisiest, while 10 would be Quietest [although, I admit, arguments could be made for the opposite if you're looking for a sports bar]. Note that noise refers not only to sound, but also to visual distractions such as video screens.

VISITS: How often a place has been visited should influence the credibility of a review. A good or bad review after just one visit might not have the analytical weight of a review with several visits.

CodeDescription
0Never*
1Once
2A few times
3Many times
4Routinely

*This might be a placeholder for someone who intends to visit, or simply has some potentially interesting information to share.

RETURN: The likeliness that a reviewer will return might also affect the review's credibility.

CodeDescription
0Never
1Probably Not
2Maybe Not
3Maybe Yes
4Probably Yes
5Definitely

LASTVISIT: A visit that is less than six months old might be more credible than one that is two years old. The review collection app would obviously have to allow reviewers to keep their reviews up-to-date.

LASTUPDATED: Similarly, a review that is more recent might be more credible.

REVIEW: Link to detailed review.

Appendix B: Example Query Results

The dataset I use here is pulled out of my ass, but the queries and results are real; it is in a file geodatabase, rather than Oracle, which limits the complexity of the SQL statements I can build.

Query:

Show me Mexican restaurants in 86001 that are open for breakfast by 9 AM Sunday and average at least 5 out of 10 among reviews that were last visited less than 6 months ago and were visited more than once.

Python script:


import arcpy
import datetime
arcpy.env.workspace = "c:/apps/Restaurants/Restaurants.gdb"
# Get restaurants in zip code
result = arcpy.MakeFeatureLayer_management("Restaurants", "AllRestaurants")
lyrRestaurants = result[0]
sWhere = "zip_code = '86001'"
result = arcpy.MakeFeatureLayer_management("Flagstaff_Zip", "SelectedZip", sWhere)
lyrZip = result[0]
arcpy.SelectLayerByLocation_management(lyrRestaurants, "INTERSECT", lyrZip, "", "NEW_SELECTION")
sRestaurantIDs = "("
Fields = ["GLOBALID"]
with arcpy.da.SearchCursor(lyrRestaurants, Fields) as cursor:
    bFirst = True
    for row in cursor:
        if bFirst:
            bFirst = False
        else:
            sRestaurantIDs += ","
        sRestaurantIDs += "'" + row[0] + "'"
sRestaurantIDs += ")"
# Get ids for restaurants that are open by 9 AM on Sunday
sHoursIDs = "("
Fields = ["RestaurantID"]
sWhere = "day = 'Sunday' AND start <= 9 AND restaurantid IN " + sRestaurantIDs
with arcpy.da.SearchCursor("Hours", Fields, sWhere) as cursor:
    bFirst = True
    for row in cursor:
        if bFirst:
            bFirst = False
        else:
            sHoursIDs += ","
        sHoursIDs += "'" + row[0] + "'"
sHoursIDs += ")"
# Get ids for restaurants where reviews average at least 5
# among those that were visited less than 6 months ago
# and indicate more than one visit
ReviewSums = {};
ReviewCounts = {};
Fields = ["RestaurantID", "Breakfast"]
dtToday = datetime.date.today() - datetime.timedelta(days=183)
sDate = "date '" + dtToday.strftime("%Y-%m-%d") + "'"
sWhere = "lastvisit > " + sDate + " AND visits > 1 AND restaurantid IN " + sRestaurantIDs
with arcpy.da.SearchCursor("Reviews", Fields, sWhere) as cursor:
    for row in cursor:
        if not row[1]:
            continue
        sID = row[0]
        if not ReviewCounts.get(sID):
            ReviewCounts[sID] = 1
            ReviewSums[sID] = row[1]
        else:
            ReviewCounts[sID] += 1
            ReviewSums[sID] += row[1]
sReviewsIDs = "("
bFirst = True
for sID in ReviewSums.keys():
    dAvg = ReviewSums[sID] / ReviewCounts[sID]
    if dAvg < 5:
        continue
    if bFirst:
        bFirst = False
    else:
        sReviewsIDs += ","
    sReviewsIDs += "'" + row[0] + "'"
sReviewsIDs += ")"
# Get restaurants with Mexican food that match criteria
sWhere = "globalid IN " + sHoursIDs
sWhere += " AND globalid IN " + sReviewsIDs
sWhere += " AND keywords LIKE '%mexican%'"
result = arcpy.MakeFeatureLayer_management("Restaurants", "SundayMexican", sWhere)
lyrRestaurants = result[0]
arcpy.SetParameter(0, lyrRestaurants)

Results:

Query:

Show me restaurants within 1.5 miles that are open by 11 AM today and offer local or organic ingredients.

Python script:


import arcpy
import datetime
arcpy.env.workspace = "c:/apps/Restaurants/Restaurants.gdb"
# Get restaurants within 1.5 miles of location
result = arcpy.MakeFeatureLayer_management("Restaurants", "AllRestaurants")
lyrRestaurants = result[0]
dLongitude = -111.676364
dLatitude = 35.181468
sr = arcpy.SpatialReference(4326) # GCS_WGS_1984
sDist = "1.5 Miles"
sTmpWS = "in_memory"
sTmpLoc = "tmpLoc"
sTmpLocPath = sTmpWS + "/" + sTmpLoc
sTmpBuf = "tmpBuf"
sTmpBufPath = sTmpWS + "/" + sTmpBuf
arcpy.CreateFeatureclass_management(sTmpWS, sTmpLoc, "POINT", "", "DISABLED", "DISABLED", sr)
pt = arcpy.Point(dLongitude, dLatitude)
with arcpy.da.InsertCursor(sTmpLocPath, ["SHAPE@"]) as cursor:
    cursor.insertRow([pt])
arcpy.Buffer_analysis(sTmpLocPath, sTmpBufPath, sDist)
result = arcpy.MakeFeatureLayer_management(sTmpBufPath, "Buffer")
lyrBuf = result[0]
arcpy.SelectLayerByLocation_management(lyrRestaurants, "INTERSECT", lyrBuf, "", "NEW_SELECTION")
sRestaurantIDs = "("
Fields = ["GLOBALID"]
with arcpy.da.SearchCursor(lyrRestaurants, Fields) as cursor:
    bFirst = True
    for row in cursor:
        if bFirst:
            bFirst = False
        else:
            sRestaurantIDs += ","
        sRestaurantIDs += "'" + row[0] + "'"
sRestaurantIDs += ")"
arcpy.Delete_management(sTmpLoc)
arcpy.Delete_management(sTmpBuf)
# Get ids for restaurants that are open by 11 AM today
sToday = datetime.date.today().strftime("%A")
sHoursIDs = "("
Fields = ["RestaurantID"]
sWhere = "day = '" + sToday + "'"
sWhere += " AND start <= 11 AND end >= 13 AND restaurantid IN " + sRestaurantIDs
with arcpy.da.SearchCursor("Hours", Fields, sWhere) as cursor:
    bFirst = True
    for row in cursor:
        if bFirst:
            bFirst = False
        else:
            sHoursIDs += ","
        sHoursIDs += "'" + row[0] + "'"
sHoursIDs += ")"
# Get restaurants with local or organic food that match criteria
sWhere = "globalid IN " + sHoursIDs
sWhere += " AND (keywords LIKE '%local%' OR keywords LIKE '%organic%')"
result = arcpy.MakeFeatureLayer_management("Restaurants", "LocalOrganicLunch", sWhere)
lyrRestaurants = result[0]
arcpy.SetParameter(0, lyrRestaurants)

Results:

Query:

Show me restaurants within 1.5 miles that are open by 4 PM today, offer Chicago pizza, and are not large chains.

Python script:


import arcpy
import datetime
arcpy.env.workspace = "c:/apps/Restaurants/Restaurants.gdb"
# Get restaurants within 1.5 miles of location
result = arcpy.MakeFeatureLayer_management("Restaurants", "AllRestaurants")
lyrRestaurants = result[0]
dLongitude = -111.676364
dLatitude = 35.181468
sr = arcpy.SpatialReference(4326) # GCS_WGS_1984
sDist = "1.5 Miles"
sTmpWS = "in_memory"
sTmpLoc = "tmpLoc"
sTmpLocPath = sTmpWS + "/" + sTmpLoc
sTmpBuf = "tmpBuf"
sTmpBufPath = sTmpWS + "/" + sTmpBuf
arcpy.CreateFeatureclass_management(sTmpWS, sTmpLoc, "POINT", "", "DISABLED", "DISABLED", sr)
pt = arcpy.Point(dLongitude, dLatitude)
with arcpy.da.InsertCursor(sTmpLocPath, ["SHAPE@"]) as cursor:
    cursor.insertRow([pt])
arcpy.Buffer_analysis(sTmpLocPath, sTmpBufPath, sDist)
result = arcpy.MakeFeatureLayer_management(sTmpBufPath, "Buffer")
lyrBuf = result[0]
arcpy.SelectLayerByLocation_management(lyrRestaurants, "INTERSECT", lyrBuf, "", "NEW_SELECTION")
sRestaurantIDs = "("
Fields = ["GLOBALID"]
with arcpy.da.SearchCursor(lyrRestaurants, Fields) as cursor:
    bFirst = True
    for row in cursor:
        if bFirst:
            bFirst = False
        else:
            sRestaurantIDs += ","
        sRestaurantIDs += "'" + row[0] + "'"
sRestaurantIDs += ")"
arcpy.Delete_management(sTmpLoc)
arcpy.Delete_management(sTmpBuf)
# Get ids for restaurants that are open by 4 PM today
sToday = datetime.date.today().strftime("%A")
sHoursIDs = "("
Fields = ["RestaurantID"]
sWhere = "day = '" + sToday + "'"
sWhere += " AND start <= 16 AND end >= 18 AND restaurantid IN " + sRestaurantIDs
with arcpy.da.SearchCursor("Hours", Fields, sWhere) as cursor:
    bFirst = True
    for row in cursor:
        if bFirst:
            bFirst = False
        else:
            sHoursIDs += ","
        sHoursIDs += "'" + row[0] + "'"
sHoursIDs += ")"
# Get restaurants with Chicago pizza that match criteria
sWhere = "globalid IN " + sHoursIDs
sWhere += " AND keywords LIKE '%pizza%' AND keywords LIKE '%chicago%'"
sWhere += " AND category <> 'Large'"
result = arcpy.MakeFeatureLayer_management("Restaurants", "ChicagoPizzaDinner", sWhere)
lyrRestaurants = result[0]
arcpy.SetParameter(0, lyrRestaurants)

Results:

Query:

Show me the number and average of all ratings for the Crown Railroad by meal type (Breakfast, Lunch, Dinner).

Python script:


import arcpy
import datetime
arcpy.env.workspace = "c:/apps/Restaurants/Restaurants.gdb"
# Get summary stats for restaurant
ReviewSums = {"Breakfast":0, "Lunch":0, "Dinner":0};
ReviewCounts = {"Breakfast":0, "Lunch":0, "Dinner":0};
ReviewAvg = {"Breakfast":0, "Lunch":0, "Dinner":0};
MealFields = ["Breakfast", "Lunch", "Dinner"]
sWhere = "restaurantid = '{6100406A-F47B-4386-8080-745A0F977378}'"
with arcpy.da.SearchCursor("Reviews", MealFields, sWhere) as cursor:
    for row in cursor:
        if row[0]:
            ReviewCounts["Breakfast"] += 1
            ReviewSums["Breakfast"] += row[0]
        if row[1]:
            ReviewCounts["Lunch"] += 1
            ReviewSums["Lunch"] += row[1]
        if row[2]:
            ReviewCounts["Dinner"] += 1
            ReviewSums["Dinner"] += row[2]
for sKey in ReviewCounts.keys():
    iCount = ReviewCounts[sKey]
    if iCount == 0:
        continue
    ReviewAvg[sKey] = ReviewSums[sKey] / iCount
# Populate output table
sTableName = "CrownRailroad"
if arcpy.Exists(sTableName):
    arcpy.DeleteRows_management(sTableName)
else:
    arcpy.CreateTable_management("", sTableName)
    arcpy.AddField_management(sTableName, "Meal", "TEXT", "", "", 20)
    arcpy.AddField_management(sTableName, "Count", "LONG")
    arcpy.AddField_management(sTableName, "Average", "FLOAT")
Fields = ["Meal", "Count", "Average"]
for sMeal in MealFields:
    with arcpy.da.InsertCursor(sTableName, Fields) as cursor:
        row = [sMeal, ReviewCounts[sMeal], ReviewAvg[sMeal]]
        cursor.insertRow(row)

Results:

August 29, 2016


The Circular File