Step-by-step: Build Rebecca Yu’s dining recommender micro-app using Scrapy + Playwright
Stop wasting time arguing over dinner: build a reproducible dining micro-app
Decision fatigue, messy group chats, and a dozen half-baked suggestions are everyday pains for non-dev founders who just want a quick place to eat. In 2026, with mature headless-browser integrations and cheap LLM access, you can go from idea to a working dining recommender micro-app in a weekend. This tutorial recreates Rebecca Yu’s quick dining app concept using Scrapy + Playwright, implements a lightweight preference scoring engine, and ships a minimal web UI — with step-by-step code and deployment guidance that non-developers can follow.
Why this micro-app in 2026 (and why now)
Short answer: the tooling is aligned. By late 2025 and into 2026, Playwright’s cloud and headless improvements plus tighter integration with Scrapy (via scrapy-playwright) made rendering JS-heavy restaurant pages reliable. LLMs and managed embedding services let micro-app creators add explainability and personality to recommendations without building heavy infra. That convergence makes a reproducible, low-maintenance dining micro-app practical for founders who want ownership without a long engineering backlog.
"Once vibe-coding apps emerged, I started hearing about people with no tech backgrounds successfully building their own apps." — Rebecca Yu (paraphrase)
What you’ll build — overview & architecture
End result: a small scraper pipeline that extracts restaurant records (name, address, cuisine, price level, ratings, tags), a scoring function that matches user preferences, and a tiny web UI (FastAPI + Jinja) that returns ranked recommendations and short LLM-generated explanations.
Architecture (simple):
- Scraping: Scrapy + scrapy-playwright for JS-rendered pages
- Storage: local SQLite (or JSONL) for prototypes
- Matching: Python preference-scoring function (deterministic)
- Explainability: small LLM prompt to generate short recommendation text
- Web UI: FastAPI with a single HTML endpoint and a /recommend API
- Deployment: Docker + Render/Fly/Railway for quick hosting
Prerequisites (aimed at non-dev founders)
- Basic terminal familiarity
- Python 3.10+ installed
- pip, virtualenv (or use Docker)
- Optional: an OpenAI/Anthropic API key (or any LLM provider) for short explanations
Step 1 — Create the Scrapy project & enable Playwright
Start by making a reproducible Python project. If you are a non-dev founder, copy-paste the commands below into a terminal.
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install scrapy scrapy-playwright playwright
playwright install chromium
Create a new Scrapy project:
scrapy startproject dinebot
cd dinebot
Enable scrapy-playwright in dinebot/settings.py (add or update):
# settings.py additions
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
Note: scrapy-playwright registers a Playwright-backed download handler inside Scrapy, so rendered pages flow through Scrapy's normal pipelines and item model.
Step 2 — Write a Playwright-backed spider
We’ll target a generic example site structure. For legal and reliability reasons, use sites that allow crawling or your own mock pages. Below is a minimal spider that asks Playwright to render each page (via the `playwright` request meta key) and extracts restaurant fields.
# dinebot/spiders/restaurants.py
from scrapy import Request, Spider
from scrapy_playwright.page import PageMethod


class RestaurantSpider(Spider):
    name = "restaurants"
    start_urls = [
        "https://example-restaurant-site.com/city/restaurants"  # replace with an allowed target
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield Request(
                url,
                callback=self.parse_list,
                dont_filter=True,
                meta={
                    "playwright": True,
                    # wait until the listing cards have rendered
                    "playwright_page_methods": [
                        PageMethod("wait_for_selector", ".restaurant-card"),
                    ],
                },
            )

    def parse_list(self, response):
        # example selectors -- adjust to the target site
        for card in response.css(".restaurant-card"):
            detail_url = card.css("a::attr(href)").get()
            if detail_url:
                yield Request(
                    response.urljoin(detail_url),
                    callback=self.parse_detail,
                    meta={"playwright": True},
                )

    def parse_detail(self, response):
        yield {
            "name": response.css("h1.name::text").get(default="").strip(),
            "address": response.css(".address::text").get(default="").strip(),
            "cuisine": response.css(".cuisine::text").get(default="").strip(),
            "price_level": response.css(".price::text").get(default="").strip(),
            "rating": float(response.css(".rating::text").get(default=0) or 0),
            "tags": response.css(".tags li::text").getall(),
            "url": response.url,
        }
Tips
- Run this locally against a mock HTML set or a permissive site while you're testing.
- Use the `playwright_page_methods` meta key to wait for selectors or network idle, and Scrapy's USER_AGENT setting to present a realistic browser profile when necessary.
Step 3 — Store and normalize data (pipeline)
Add a pipeline to clean fields and write to SQLite or JSONL. For prototypes, JSONL is simplest.
# dinebot/pipelines.py
import json


class JsonWriterPipeline:
    def open_spider(self, spider):
        self.file = open("restaurants.jsonl", "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # basic normalization
        item["cuisine"] = item.get("cuisine", "").lower()
        item["tags"] = [t.lower() for t in item.get("tags", [])]
        self.file.write(json.dumps(item, ensure_ascii=False) + "\n")
        return item
Enable the pipeline in settings.py:
ITEM_PIPELINES = {
    "dinebot.pipelines.JsonWriterPipeline": 300,
}
Step 4 — Implement a deterministic preference scorer
Your micro-app’s core can be a small, easily explainable scoring function that non-devs can tweak. The goal is reproducibility over complexity. Here’s a simple scoring scheme that weights cuisine match, price, rating, and tag overlap.
# score.py
from typing import Dict


def score_restaurant(restaurant: Dict, prefs: Dict) -> float:
    score = 0.0
    # cuisine: exact match, up to 40 points (extend to substring matching if you like)
    preferred = [p.lower() for p in prefs.get("cuisines", [])]
    if restaurant.get("cuisine") in preferred:
        score += 40
    # price level: exact match, e.g. "$" or "$$"
    pref_price = prefs.get("price_level")
    if pref_price and restaurant.get("price_level") == pref_price:
        score += 20
    # rating: scale 0-5 => up to 25 points
    score += (restaurant.get("rating", 0) / 5.0) * 25
    # tag overlap: 5 points per shared tag, capped at 15
    tags = set(restaurant.get("tags", []))
    overlap = tags.intersection({t.lower() for t in prefs.get("tags", [])})
    score += min(15, 5 * len(overlap))
    return score
This deterministic approach is easy to inspect and change — ideal for founders who want control without black-box models.
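To see how the weights combine, here is the scoring scheme applied to one sample record (the function is repeated inline, in condensed form, so the snippet runs standalone):

```python
# Standalone sketch of the same scoring scheme as score.py.
def score_restaurant(restaurant, prefs):
    score = 0.0
    preferred = [p.lower() for p in prefs.get("cuisines", [])]
    if restaurant.get("cuisine") in preferred:
        score += 40  # cuisine match
    if prefs.get("price_level") and restaurant.get("price_level") == prefs["price_level"]:
        score += 20  # price match
    score += (restaurant.get("rating", 0) / 5.0) * 25  # rating, up to 25 points
    overlap = set(restaurant.get("tags", [])) & {t.lower() for t in prefs.get("tags", [])}
    score += min(15, 5 * len(overlap))  # tag overlap, capped at 15
    return score

sample = {
    "name": "Noodle House",
    "cuisine": "ramen",
    "price_level": "$$",
    "rating": 4.5,
    "tags": ["cozy", "late-night"],
}
prefs = {"cuisines": ["ramen"], "price_level": "$$", "tags": ["cozy"]}
print(score_restaurant(sample, prefs))  # 40 + 20 + 22.5 + 5 = 87.5
```

Because every point is traceable to one line of code, tuning the recommender is just editing these constants and re-running.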
Step 5 — Add optional LLM-based explanations (short prompts)
LLMs are great for turning raw scores into friendly language. Use them sparingly to avoid cost. A typical flow in 2026 is to generate one-sentence explanations and short blurbs for each recommendation. Keep the prompt constrained to avoid hallucination.
# example prompt (string)
PROMPT = '''You are a concise assistant. Given a restaurant and a user's preferences, return one sentence explaining why this restaurant is recommended.
Restaurant:
Name: {name}
Cuisine: {cuisine}
Price: {price}
Rating: {rating}
Tags: {tags}
User preferences:
{prefs}
Return: one sentence explanation, max 20 words.'''
Call your LLM provider (OpenAI, Anthropic, etc.) with the prompt and keep the response length small. In 2026, many micro-apps use managed LLMs with usage limits to keep costs predictable.
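A minimal, provider-agnostic sketch of filling the prompt before the API call (`build_prompt` is a hypothetical helper; the PROMPT string is restated inline so the snippet runs standalone, and the actual LLM call is omitted since it depends on your provider's client):

```python
# Restated inline copy of the PROMPT template from the article.
PROMPT = (
    "You are a concise assistant. Given a restaurant and a user's preferences, "
    "return one sentence explaining why this restaurant is recommended.\n"
    "Restaurant:\nName: {name}\nCuisine: {cuisine}\nPrice: {price}\n"
    "Rating: {rating}\nTags: {tags}\nUser preferences:\n{prefs}\n"
    "Return: one sentence explanation, max 20 words."
)

def build_prompt(restaurant: dict, prefs: dict) -> str:
    """Fill the template with only fields we actually scraped, so the LLM
    cannot be 'inspired' by data we don't have."""
    return PROMPT.format(
        name=restaurant.get("name", ""),
        cuisine=restaurant.get("cuisine", ""),
        price=restaurant.get("price_level", ""),
        rating=restaurant.get("rating", ""),
        tags=", ".join(restaurant.get("tags", [])),
        prefs=", ".join(f"{k}={v}" for k, v in prefs.items()),
    )

text = build_prompt({"name": "Noodle House", "cuisine": "ramen"}, {"price_level": "$$"})
print(text)
```

Passing only scraped fields into the template is also your main hallucination guard: the model has nothing to embellish beyond what you give it.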
Step 6 — Lightweight web UI with FastAPI
FastAPI gives you an easy way to serve an HTML UI and JSON /recommend endpoint that accepts preferences and returns ranked items. Non-dev founders can host this as a single container.
# app/main.py
import json

from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.templating import Jinja2Templates

from score import score_restaurant

app = FastAPI()
templates = Jinja2Templates(directory="templates")

# load scraped data (for prototype)
with open("restaurants.jsonl", encoding="utf-8") as f:
    DATA = [json.loads(line) for line in f]


@app.get("/", response_class=HTMLResponse)
async def home(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})


@app.post("/recommend")
async def recommend(payload: dict):
    prefs = payload.get("prefs", {})
    scored = [{"restaurant": r, "score": score_restaurant(r, prefs)} for r in DATA]
    scored.sort(key=lambda x: x["score"], reverse=True)
    # return top 10
    return JSONResponse({"results": scored[:10]})
Build a minimal template (templates/index.html) with a basic form and JS fetch to /recommend. Keep the UI intentionally simple — micro-apps are about utility, not polish.
Step 7 — Docker, local testing, and deploy
Dockerfile for quick reproducibility (first create a requirements.txt listing scrapy, scrapy-playwright, playwright, fastapi, uvicorn, and jinja2):
# Dockerfile (simplified)
FROM python:3.11-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
# Install playwright browsers in build stage
RUN playwright install --with-deps chromium
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
Recommended deployment for non-dev founders in 2026:
- Use Render or Fly for container deployments — they handle HTTPS and simple autoscaling.
- Schedule Scrapy runs using GitHub Actions or Render Cron to refresh the JSONL every night.
- Store production data in SQLite for prototypes and migrate to Postgres or a managed vector DB only if you need semantic search / RAG features.
Step 8 — Production hardening & compliance
Scraping restaurant data in production requires operational and legal considerations:
- Respect robots.txt and terms of service. Prefer official APIs (Google Places, Yelp Fusion, OpenTable) when available.
- Rate limits & proxies: Use polite crawl rates. For scale, integrate residential or rotating proxies and respect site limits.
- Bot detection: Playwright + realistic browser profiles reduce detection but don't guarantee it. Use backoff and caching.
- Data quality: Add validation in pipelines (addresses, normalized cuisines, price buckets).
- Audit logs & opt-out: Keep sourcing metadata (scrape date, source URL). If a business requests removal, honor it promptly.
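The sourcing-metadata point above can be sketched as one extra pipeline step (a hypothetical `ProvenancePipeline`, not part of the code earlier; a fake spider object stands in so the sketch runs without Scrapy installed):

```python
from datetime import datetime, timezone

class ProvenancePipeline:
    """Stamp each scraped item with audit metadata: when it was scraped,
    from which URL, and by which spider -- so opt-out requests are traceable."""
    def process_item(self, item, spider):
        item["scraped_at"] = datetime.now(timezone.utc).isoformat()
        item["source_url"] = item.get("url", "")
        item["spider"] = spider.name
        return item

class _FakeSpider:  # stand-in so the sketch runs standalone
    name = "restaurants"

item = ProvenancePipeline().process_item({"url": "https://example.com/a"}, _FakeSpider())
print(item["spider"], item["source_url"])
```

In the real project you'd register this in ITEM_PIPELINES alongside JsonWriterPipeline, with a lower order number so metadata is written out with every record.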
2026 trends and how they shape this build
Several trends through late 2025 and 2026 change how you should think about micro-apps:
- Managed headless services: Playwright-compatible hosting and playwright-cloud reduced flakiness of JS rendering; fewer infra headaches for founders.
- LLM cost predictability: More usage tiers and on-device inference choices mean you can add explanation features without runaway bills.
- Composability: Micro-apps increasingly use small composable services (API for geocoding, API for reviews) instead of building everything from scratch.
- Regulatory clarity: By 2025 many jurisdictions clarified scraping vs. API access expectations — but compliance still matters.
Advanced strategies (optional upgrades)
If you want to extend the prototype while keeping the micro-app spirit:
- Add a small embeddings index (Pinecone/Weaviate/Chroma) to support semantic matching like "I want cozy ramen".
- Use a lightweight feature store in Postgres to persist normalized attributes and track freshness.
- Expose a single shared link (short URL) per group that stores simple group prefs in DB — the micro-app becomes a private small tool rather than a public product.
- Automate daily scraping with GitHub Actions and validate diffs so you only re-scrape changed listings.
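The diff-validation idea in the last bullet can be sketched as a comparison of two snapshots keyed by URL (hypothetical helper names; shown against inline dicts so it runs standalone, though `load_by_url` accepts any iterable of JSONL lines):

```python
import json

def load_by_url(lines):
    """Index a JSONL snapshot by URL so two scrape runs can be compared."""
    return {rec["url"]: rec for rec in map(json.loads, lines) if "url" in rec}

def diff_snapshots(old, new):
    """Return URLs that were added, removed, or changed between two runs."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(u for u in set(old) & set(new) if old[u] != new[u])
    return added, removed, changed

old = {"u1": {"url": "u1", "rating": 4.0}, "u2": {"url": "u2", "rating": 3.5}}
new = {"u1": {"url": "u1", "rating": 4.2}, "u3": {"url": "u3", "rating": 4.8}}
print(diff_snapshots(old, new))  # (['u3'], ['u2'], ['u1'])
```

In a scheduled job, you'd only re-scrape the `changed` and `added` URLs, which keeps nightly runs fast and polite.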
Actionable takeaways — quick checklist
- Start small: target one city and one permissive data source.
- Prefer deterministic scoring for transparency and easy iteration.
- Use Playwright via scrapy-playwright for modern JS sites; test against mocks first.
- Limit LLM use to explanations and short text to reduce cost and risk.
- Deploy as a single container, schedule scrapes, and keep the codebase minimal for maintenance.
Common pitfalls and troubleshooting
- Spider returns no items: confirm requests carry meta={"playwright": True}, wait for the right selector via playwright_page_methods, and dump response.text to inspect the rendered HTML.
- Fields missing or inconsistent: build defensive parsing and normalization pipelines.
- Site blocks requests: slow down crawl rate, add jitter, or switch to official APIs.
- LLM explanations hallucinate: constrain prompts strictly and include source fields in the prompt for verification.
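For the "fields missing or inconsistent" case, a small coercion helper keeps one malformed record from crashing the whole pipeline (a hypothetical `safe_float`, not in the code above):

```python
def safe_float(value, default=0.0):
    """Coerce scraped text like ' 4.5 ' or '4,5' to a float, falling back
    to a default instead of raising on junk like 'N/A'."""
    if value is None:
        return default
    try:
        return float(str(value).strip().replace(",", "."))
    except ValueError:
        return default

print(safe_float(" 4.5 "), safe_float("4,5"), safe_float("N/A"))  # 4.5 4.5 0.0
```

You would use it in the spider's parse_detail (e.g. for the rating field) so a single badly formatted page yields a default value instead of an exception.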
Why reproducibility matters for non-dev founders
Non-dev founders benefit most from systems they can reason about and repeat. Keep the scraping pipeline compact, the scoring deterministic and visible, and the deployment simple. That way you can: iterate with confidence, onboard collaborators quickly, and avoid opaque ML black boxes that are expensive to debug.
Next steps & checklist to ship in one weekend
- Day 1 morning: scaffold project, run Playwright, and validate one rendered page.
- Day 1 afternoon: build spider and extract 20–50 sample restaurants into JSONL.
- Day 2 morning: implement scoring, create FastAPI endpoints, wire UI to call /recommend.
- Day 2 afternoon: add LLM explanation, Dockerize, deploy to Render, and schedule nightly scrapes.
Final notes — ethics & legal
Always verify a target site's robots.txt and terms. When in doubt, use public APIs or ask for permission. Keep business ethics in mind: a private micro-app for friends is different from a commercial aggregator. If you plan to scale, consult legal counsel before scraping at scale.
Call to action
Ready to ship Rebecca Yu’s vibe-coded dining micro-app for your circle? Clone the sample repo, run the spider on allowed targets, and iterate on the scoring weights until the recommendations feel right. If you want a starter repository with all the code above wired together and a ready-to-deploy Docker image, grab our reproducible template and deploy to Render in under 30 minutes.
Get the template, sample data, and deployment guide — jumpstart your dining micro-app today.