Step-by-step: Build Rebecca Yu’s dining recommender micro-app using Scrapy + Playwright
2026-02-22
10 min read

Step-by-step guide to build a reproducible dining recommender micro-app with Scrapy + Playwright, preference scoring, and a tiny web UI.

Stop wasting time arguing over dinner: build a reproducible dining micro-app

Decision fatigue, messy group chats, and a dozen half-baked suggestions are everyday pains for non-dev founders who just want a quick place to eat. In 2026, with mature headless-browser integrations and cheap LLM access, you can go from idea to a working dining recommender micro-app in a weekend. This tutorial recreates Rebecca Yu’s quick dining app concept using Scrapy + Playwright, implements a lightweight preference scoring engine, and ships a minimal web UI — with step-by-step code and deployment guidance that non-developers can follow.

Why this micro-app in 2026 (and why now)

Short answer: the tooling is aligned. By late 2025 and into 2026, Playwright’s cloud and headless improvements plus tighter integration with Scrapy (via scrapy-playwright) made rendering JS-heavy restaurant pages reliable. LLMs and managed embedding services let micro-app creators add explainability and personality to recommendations without building heavy infra. That convergence makes a reproducible, low-maintenance dining micro-app practical for founders who want ownership without a long engineering backlog.

"Once vibe-coding apps emerged, I started hearing about people with no tech backgrounds successfully building their own apps." — Rebecca Yu (paraphrase)

What you’ll build — overview & architecture

End result: a small scraper pipeline that extracts restaurant records (name, address, cuisine, price level, ratings, tags), a scoring function that matches user preferences, and a tiny web UI (FastAPI + Jinja) that returns ranked recommendations and short LLM-generated explanations.

Architecture (simple):

  • Scraping: Scrapy + scrapy-playwright for JS-rendered pages
  • Storage: local SQLite (or JSONL) for prototypes
  • Matching: Python preference-scoring function (deterministic)
  • Explainability: small LLM prompt to generate short recommendation text
  • Web UI: FastAPI with a single HTML endpoint and a /recommend API
  • Deployment: Docker + Render/Fly/Railway for quick hosting
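
Concretely, each scraped record is a flat JSON object. The values below are illustrative (made up for demonstration), but the field names match the schema the spider extracts later:

```python
import json

# Illustrative record shape; values are invented, field names match the
# spider's output and the scorer's expectations.
sample = {
    "name": "Noodle House",
    "address": "123 Main St",
    "cuisine": "ramen",
    "price_level": "$$",
    "rating": 4.5,
    "tags": ["cozy", "late-night"],
    "url": "https://example-restaurant-site.com/noodle-house",
}

line = json.dumps(sample, ensure_ascii=False)  # one JSONL line
print(json.loads(line)["cuisine"])  # ramen
```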

Prerequisites (aimed at non-dev founders)

  • Basic terminal familiarity
  • Python 3.10+ installed
  • pip, virtualenv (or use Docker)
  • Optional: an OpenAI/Anthropic API key (or any LLM provider) for short explanations

Step 1 — Create the Scrapy project & enable Playwright

Start by making a reproducible Python project. If you are a non-dev founder, copy-paste the commands below into a terminal.

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install scrapy scrapy-playwright playwright
playwright install chromium

Create a new Scrapy project:

scrapy startproject dinebot
cd dinebot

Enable scrapy-playwright in dinebot/settings.py (add or update):

# settings.py additions
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Note: scrapy-playwright handles Playwright-based page rendering inside Scrapy spiders so you can reuse Scrapy's pipelines and item model.

Step 2 — Write a Playwright-backed spider

We’ll target a generic example site structure. For legal and reliability reasons, use sites that allow crawling or your own mock pages. Below is a minimal spider that sets the playwright flag in request meta so scrapy-playwright renders the JS before your callbacks parse the response and extract restaurant fields.

# dinebot/spiders/restaurants.py
from scrapy import Request, Spider


class RestaurantSpider(Spider):
    name = "restaurants"
    start_urls = [
        "https://example-restaurant-site.com/city/restaurants"  # replace with allowed target
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield Request(
                url,
                meta={"playwright": True},
                callback=self.parse_list,
                dont_filter=True,
            )

    def parse_list(self, response):
        # example selectors -- adjust to the target site
        for card in response.css(".restaurant-card"):
            detail_url = card.css("a::attr(href)").get()
            if detail_url:
                yield Request(
                    response.urljoin(detail_url),
                    meta={"playwright": True},
                    callback=self.parse_detail,
                )

    def parse_detail(self, response):
        yield {
            "name": response.css("h1.name::text").get(default="").strip(),
            "address": response.css(".address::text").get(default="").strip(),
            "cuisine": response.css(".cuisine::text").get(default="").strip(),
            "price_level": response.css(".price::text").get(default="").strip(),
            "rating": float(response.css(".rating::text").get(default="0") or 0),
            "tags": response.css(".tags li::text").getall(),
            "url": response.url,
        }

Tips

  • Run this locally against a mock HTML set or a permissive site while you're testing.
  • Pass Playwright options through request meta (custom headers for the user agent, playwright_page_methods to wait for a selector or network idle) when the default render isn't enough.
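
For example, a request that waits for the listing cards to render before Scrapy parses the response — a sketch using scrapy-playwright's PageMethod; the selector and user-agent string are placeholders for your target:

```python
from scrapy import Request
from scrapy_playwright.page import PageMethod

# Wait for the restaurant cards to appear before the response is returned to
# the spider; PageMethod names mirror Playwright's Page API.
req = Request(
    "https://example-restaurant-site.com/city/restaurants",
    headers={"User-Agent": "dinebot/0.1"},
    meta={
        "playwright": True,
        "playwright_page_methods": [
            PageMethod("wait_for_selector", ".restaurant-card"),
        ],
    },
)
```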

Step 3 — Store and normalize data (pipeline)

Add a pipeline to clean fields and write to SQLite or JSONL. For prototypes, JSONL is simplest.

# dinebot/pipelines.py
import json

class JsonWriterPipeline:
    def open_spider(self, spider):
        self.file = open("restaurants.jsonl", "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # basic normalization
        item["cuisine"] = item.get("cuisine", "").lower()
        item["tags"] = [t.lower() for t in item.get("tags", [])]
        self.file.write(json.dumps(item, ensure_ascii=False) + "\n")
        return item

Enable the pipeline in settings.py:

ITEM_PIPELINES = {
    "dinebot.pipelines.JsonWriterPipeline": 300,
}
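
Reading the JSONL back is one json.loads per line. An in-memory buffer stands in for restaurants.jsonl here so the sketch runs standalone:

```python
import io
import json

# In-memory stand-in for restaurants.jsonl: one JSON object per line, exactly
# as JsonWriterPipeline writes it.
buf = io.StringIO()
for item in [{"name": "A", "cuisine": "thai"}, {"name": "B", "cuisine": "ramen"}]:
    buf.write(json.dumps(item, ensure_ascii=False) + "\n")

buf.seek(0)
records = [json.loads(line) for line in buf if line.strip()]
print(len(records))  # 2
```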

Step 4 — Implement a deterministic preference scorer

Your micro-app’s core can be a small, easily explainable scoring function that non-devs can tweak. The goal is reproducibility over complexity. Here’s a simple scoring scheme that weights cuisine match, price, rating, and tag overlap.

# score.py
from typing import Dict, List

def score_restaurant(restaurant: Dict, prefs: Dict) -> float:
    score = 0.0

    # cuisine: exact match against the preferred list
    preferred = [p.lower() for p in prefs.get("cuisines", [])]
    if restaurant.get("cuisine") in preferred:
        score += 40

    # price level: exact match, e.g. "$" or "$$"
    pref_price = prefs.get("price_level")
    if pref_price and restaurant.get("price_level") == pref_price:
        score += 20

    # rating: scale 0-5 => up to 25 points
    score += (restaurant.get("rating", 0) / 5.0) * 25

    # tag overlap: up to 15 points
    tags = set(restaurant.get("tags", []))
    overlap = tags.intersection(set([t.lower() for t in prefs.get("tags", [])]))
    score += min(15, 5 * len(overlap))

    return score

This deterministic approach is easy to inspect and change — ideal for founders who want control without black-box models.
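
To see the weights in action, here is the same logic inlined with one worked example (restaurant and preferences invented for illustration):

```python
from typing import Dict

# Same logic as score_restaurant in score.py above, inlined so this example
# runs alone: cuisine match 40, exact price match 20, rating up to 25,
# tag overlap up to 15.
def score_restaurant(restaurant: Dict, prefs: Dict) -> float:
    score = 0.0
    preferred = [p.lower() for p in prefs.get("cuisines", [])]
    if restaurant.get("cuisine") in preferred:
        score += 40
    if prefs.get("price_level") and restaurant.get("price_level") == prefs["price_level"]:
        score += 20
    score += (restaurant.get("rating", 0) / 5.0) * 25
    overlap = set(restaurant.get("tags", [])) & {t.lower() for t in prefs.get("tags", [])}
    score += min(15, 5 * len(overlap))
    return score

r = {"cuisine": "ramen", "price_level": "$$$", "rating": 4.5, "tags": ["cozy"]}
p = {"cuisines": ["Ramen"], "price_level": "$$", "tags": ["cozy", "quiet"]}
print(score_restaurant(r, p))  # 40 (cuisine) + 0 (price) + 22.5 (rating) + 5 (one tag) = 67.5
```

Because every term is visible, changing "how much cuisine matters" is a one-line edit to a weight, not a model retrain.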

Step 5 — Add optional LLM-based explanations (short prompts)

LLMs are great for turning raw scores into friendly language. Use them sparingly to avoid cost. A typical flow in 2026 is to generate one-sentence explanations and short blurbs for each recommendation. Keep the prompt constrained to avoid hallucination.

# example prompt (string)
PROMPT = '''You are a concise assistant. Given a restaurant and a user's preferences, return one sentence explaining why this restaurant is recommended.

Restaurant:
Name: {name}
Cuisine: {cuisine}
Price: {price}
Rating: {rating}
Tags: {tags}

User preferences:
{prefs}

Return: one sentence explanation, max 20 words.'''

Call your LLM provider (OpenAI, Anthropic, etc.) with the prompt and keep the response length small. In 2026, many micro-apps use managed LLMs with usage limits to keep costs predictable.
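
Binding a record into the template is plain str.format. A condensed copy of the PROMPT above, runnable on its own (the real template adds the instruction preamble and the 20-word limit):

```python
# Condensed copy of the PROMPT template so this example is self-contained.
PROMPT = (
    "Name: {name}\nCuisine: {cuisine}\nPrice: {price}\n"
    "Rating: {rating}\nTags: {tags}\n\nUser preferences:\n{prefs}"
)

filled = PROMPT.format(
    name="Noodle House",
    cuisine="ramen",
    price="$$",
    rating=4.5,
    tags="cozy, late-night",
    prefs="cuisines: ramen; tags: cozy",
)
print("{" in filled)  # False -- every placeholder is bound before the LLM call
```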

Step 6 — Lightweight web UI with FastAPI

FastAPI gives you an easy way to serve an HTML UI and JSON /recommend endpoint that accepts preferences and returns ranked items. Non-dev founders can host this as a single container.

# app/main.py
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.templating import Jinja2Templates
import json
from score import score_restaurant

app = FastAPI()
templates = Jinja2Templates(directory="templates")

# load scraped data (for prototype)
with open("restaurants.jsonl", encoding="utf-8") as f:
    DATA = [json.loads(line) for line in f]

@app.get("/", response_class=HTMLResponse)
async def home(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})

@app.post("/recommend")
async def recommend(payload: dict):
    prefs = payload.get("prefs", {})
    scored = []
    for r in DATA:
        scored.append({"restaurant": r, "score": score_restaurant(r, prefs)})
    scored.sort(key=lambda x: x["score"], reverse=True)
    # return top 10
    top = scored[:10]
    return JSONResponse({"results": top})

Build a minimal template (templates/index.html) with a basic form and JS fetch to /recommend. Keep the UI intentionally simple — micro-apps are about utility, not polish.

Step 7 — Docker, local testing, and deploy

Dockerfile for quick reproducibility:

# Dockerfile (simplified)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Install Playwright browsers at build time so containers start fast
RUN playwright install --with-deps chromium
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

Recommended deployment for non-dev founders in 2026:

  • Use Render or Fly for container deployments — they handle HTTPS and simple autoscaling.
  • Schedule Scrapy runs using GitHub Actions or Render Cron to refresh the JSONL every night.
  • Store production data in SQLite for prototypes and migrate to Postgres or a managed vector DB only if you need semantic search / RAG features.

Step 8 — Production hardening & compliance

Scraping restaurant data in production requires operational and legal considerations:

  • Respect robots.txt and terms of service. Prefer official APIs (Google Places, Yelp Fusion, OpenTable) when available.
  • Rate limits & proxies: Use polite crawl rates. For scale, integrate residential or rotating proxies and respect site limits.
  • Bot detection: Playwright + realistic browser profiles reduce detection but don't guarantee it. Use backoff and caching.
  • Data quality: Add validation in pipelines (addresses, normalized cuisines, price buckets).
  • Audit logs & opt-out: Keep sourcing metadata (scrape date, source URL). If a business requests removal, honor it promptly.
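
For the robots.txt point, Python's standard library ships a parser. A minimal check — sample rules are parsed inline here instead of fetched, and the URL is this tutorial's placeholder target:

```python
from urllib import robotparser

# Parse sample robots.txt rules inline; in production, rp.set_url(...) plus
# rp.read() would fetch the real file from the target site.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("dinebot", "https://example-restaurant-site.com/city/restaurants"))  # True
print(rp.can_fetch("dinebot", "https://example-restaurant-site.com/private/admin"))     # False
```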

What changed by 2026

Several trends through late 2025 and 2026 change how you should think about micro-apps:

  • Managed headless services: Playwright-compatible hosting and playwright-cloud reduced flakiness of JS rendering; fewer infra headaches for founders.
  • LLM cost predictability: More usage tiers and on-device inference choices mean you can add explanation features without runaway bills.
  • Composability: Micro-apps increasingly use small composable services (API for geocoding, API for reviews) instead of building everything from scratch.
  • Regulatory clarity: By 2025 many jurisdictions clarified scraping vs. API access expectations — but compliance still matters.

Advanced strategies (optional upgrades)

If you want to extend the prototype while keeping the micro-app spirit:

  1. Add a small embeddings index (Pinecone/Weaviate/Chroma) to support semantic matching like "I want cozy ramen".
  2. Use a lightweight feature store in Postgres to persist normalized attributes and track freshness.
  3. Expose a single shared link (short URL) per group that stores simple group prefs in DB — the micro-app becomes a private small tool rather than a public product.
  4. Automate daily scraping with GitHub Actions and validate diffs so you only re-scrape changed listings.
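
The "validate diffs" idea in item 4 can be as simple as hashing each normalized record, so a nightly job only re-scrapes listings whose fingerprint changed — a sketch (record_fingerprint is a hypothetical helper name):

```python
import hashlib
import json

# Stable fingerprint for a record: sort_keys makes the hash independent of
# dict key order, so only real content changes produce a new digest.
def record_fingerprint(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"name": "Noodle House", "rating": 4.5}
b = {"rating": 4.5, "name": "Noodle House"}  # same data, different key order
print(record_fingerprint(a) == record_fingerprint(b))  # True
```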

Actionable takeaways — quick checklist

  • Start small: target one city and one permissive data source.
  • Prefer deterministic scoring for transparency and easy iteration.
  • Use Playwright via scrapy-playwright for modern JS sites; test against mocks first.
  • Limit LLM use to explanations and short text to reduce cost and risk.
  • Deploy as a single container, schedule scrapes, and keep the codebase minimal for maintenance.

Common pitfalls and troubleshooting

  • Spider returns no items: log response.text to inspect the rendered HTML, and add playwright_page_methods waits if content loads late.
  • Fields missing or inconsistent: build defensive parsing and normalization pipelines.
  • Site blocks requests: slow down crawl rate, add jitter, or switch to official APIs.
  • LLM explanations hallucinate: constrain prompts strictly and include source fields in the prompt for verification.
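
For the "fields missing or inconsistent" case, a defensive numeric parser is a typical pipeline helper (safe_float is a hypothetical name, not part of the code above):

```python
def safe_float(raw, default=0.0):
    # Scraped fields like ".rating::text" may be None, padded with
    # whitespace, or use a comma decimal separator; never let them
    # crash the pipeline.
    try:
        return float(str(raw).strip().replace(",", "."))
    except ValueError:
        return default

print(safe_float(" 4.5 "))  # 4.5
print(safe_float("4,5"))    # 4.5
print(safe_float(None))     # 0.0
```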

Why reproducibility matters for non-dev founders

Non-dev founders benefit most from systems they can reason about and repeat. Keep the scraping pipeline compact, the scoring deterministic and visible, and the deployment simple. That way you can: iterate with confidence, onboard collaborators quickly, and avoid opaque ML black boxes that are expensive to debug.

Next steps & checklist to ship in one weekend

  1. Day 1 morning: scaffold project, run Playwright, and validate one rendered page.
  2. Day 1 afternoon: build spider and extract 20–50 sample restaurants into JSONL.
  3. Day 2 morning: implement scoring, create FastAPI endpoints, wire UI to call /recommend.
  4. Day 2 afternoon: add LLM explanation, Dockerize, deploy to Render, and schedule nightly scrapes.

Always verify a target site's robots.txt and terms. When in doubt, use public APIs or ask for permission. Keep business ethics in mind: a private micro-app for friends is different from a commercial aggregator. If you plan to scale, consult legal counsel before scraping at scale.

Call to action

Ready to ship Rebecca Yu’s vibe-coded dining micro-app for your circle? Clone the sample repo, run the spider on allowed targets, and iterate on the scoring weights until the recommendations feel right. If you want a starter repository with all the code above wired together and a ready-to-deploy Docker image, grab our reproducible template and deploy to Render in under 30 minutes.

Get the template, sample data, and deployment guide — jumpstart your dining micro-app today.


Related Topics

#tutorial #microapps #playwright