Step-by-step: Build Rebecca Yu’s dining recommender micro-app using Scrapy + Playwright
2026-02-22
10 min read

Step-by-step guide to build a reproducible dining recommender micro-app with Scrapy + Playwright, preference scoring, and a tiny web UI.

Stop wasting time arguing over dinner: build a reproducible dining micro-app

Decision fatigue, messy group chats, and a dozen half-baked suggestions are everyday pains for non-dev founders who just want a quick place to eat. In 2026, with mature headless-browser integrations and cheap LLM access, you can go from idea to a working dining recommender micro-app in a weekend. This tutorial recreates Rebecca Yu’s quick dining app concept using Scrapy + Playwright, implements a lightweight preference scoring engine, and ships a minimal web UI — with step-by-step code and deployment guidance that non-developers can follow.

Why this micro-app in 2026 (and why now)

Short answer: the tooling is aligned. By late 2025 and into 2026, Playwright’s cloud and headless improvements plus tighter integration with Scrapy (via scrapy-playwright) made rendering JS-heavy restaurant pages reliable. LLMs and managed embedding services let micro-app creators add explainability and personality to recommendations without building heavy infra. That convergence makes a reproducible, low-maintenance dining micro-app practical for founders who want ownership without a long engineering backlog.

"Once vibe-coding apps emerged, I started hearing about people with no tech backgrounds successfully building their own apps." — Rebecca Yu (paraphrase)

What you’ll build — overview & architecture

End result: a small scraper pipeline that extracts restaurant records (name, address, cuisine, price level, ratings, tags), a scoring function that matches user preferences, and a tiny web UI (FastAPI + Jinja) that returns ranked recommendations and short LLM-generated explanations.

Architecture (simple):

  • Scraping: Scrapy + scrapy-playwright for JS-rendered pages
  • Storage: local SQLite (or JSONL) for prototypes
  • Matching: Python preference-scoring function (deterministic)
  • Explainability: small LLM prompt to generate short recommendation text
  • Web UI: FastAPI with a single HTML endpoint and a /recommend API
  • Deployment: Docker + Render/Fly/Railway for quick hosting
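
Concretely, each scraped record is a flat JSON object. The values below are illustrative (made up for demonstration), but the field names match the schema the spider extracts later:

```python
import json

# Illustrative record shape; values are invented, field names match the
# spider's output and the scorer's expectations.
sample = {
    "name": "Noodle House",
    "address": "123 Main St",
    "cuisine": "ramen",
    "price_level": "$$",
    "rating": 4.5,
    "tags": ["cozy", "late-night"],
    "url": "https://example-restaurant-site.com/noodle-house",
}

line = json.dumps(sample, ensure_ascii=False)  # one JSONL line
print(json.loads(line)["cuisine"])  # ramen
```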

Prerequisites (aimed at non-dev founders)

  • Basic terminal familiarity
  • Python 3.10+ installed
  • pip, virtualenv (or use Docker)
  • Optional: an OpenAI/Anthropic API key (or any LLM provider) for short explanations

Step 1 — Create the Scrapy project & enable Playwright

Start by making a reproducible Python project. If you are a non-dev founder, copy-paste the commands below into a terminal.

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install scrapy scrapy-playwright playwright
playwright install chromium

Create a new Scrapy project:

scrapy startproject dinebot
cd dinebot

Enable scrapy-playwright in dinebot/settings.py (add or update):

# settings.py additions
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Note: scrapy-playwright handles Playwright-based page rendering inside Scrapy spiders so you can reuse Scrapy's pipelines and item model.

Step 2 — Write a Playwright-backed spider

We’ll target a generic example site structure. For legal and reliability reasons, use sites that allow crawling or your own mock pages. Below is a minimal spider that sets the playwright flag in request meta so scrapy-playwright renders the JS before your callbacks parse the response and extract restaurant fields.

# dinebot/spiders/restaurants.py
from scrapy import Request, Spider


class RestaurantSpider(Spider):
    name = "restaurants"
    start_urls = [
        "https://example-restaurant-site.com/city/restaurants"  # replace with allowed target
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield Request(
                url,
                meta={"playwright": True},
                callback=self.parse_list,
                dont_filter=True,
            )

    def parse_list(self, response):
        # example selectors -- adjust to the target site
        for card in response.css(".restaurant-card"):
            detail_url = card.css("a::attr(href)").get()
            if detail_url:
                yield Request(
                    response.urljoin(detail_url),
                    meta={"playwright": True},
                    callback=self.parse_detail,
                )

    def parse_detail(self, response):
        yield {
            "name": response.css("h1.name::text").get(default="").strip(),
            "address": response.css(".address::text").get(default="").strip(),
            "cuisine": response.css(".cuisine::text").get(default="").strip(),
            "price_level": response.css(".price::text").get(default="").strip(),
            "rating": float(response.css(".rating::text").get(default="0") or 0),
            "tags": response.css(".tags li::text").getall(),
            "url": response.url,
        }

Tips

  • Run this locally against a mock HTML set or a permissive site while you're testing.
  • Pass Playwright options through request meta (custom headers for the user agent, playwright_page_methods to wait for a selector or network idle) when the default render isn't enough.
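
For example, a request that waits for the listing cards to render before Scrapy parses the response — a sketch using scrapy-playwright's PageMethod; the selector and user-agent string are placeholders for your target:

```python
from scrapy import Request
from scrapy_playwright.page import PageMethod

# Wait for the restaurant cards to appear before the response is returned to
# the spider; PageMethod names mirror Playwright's Page API.
req = Request(
    "https://example-restaurant-site.com/city/restaurants",
    headers={"User-Agent": "dinebot/0.1"},
    meta={
        "playwright": True,
        "playwright_page_methods": [
            PageMethod("wait_for_selector", ".restaurant-card"),
        ],
    },
)
```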

Step 3 — Store and normalize data (pipeline)

Add a pipeline to clean fields and write to SQLite or JSONL. For prototypes, JSONL is simplest.

# dinebot/pipelines.py
import json

class JsonWriterPipeline:
    def open_spider(self, spider):
        self.file = open("restaurants.jsonl", "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # basic normalization
        item["cuisine"] = item.get("cuisine", "").lower()
        item["tags"] = [t.lower() for t in item.get("tags", [])]
        self.file.write(json.dumps(item, ensure_ascii=False) + "\n")
        return item

Enable the pipeline in settings.py:

ITEM_PIPELINES = {
    "dinebot.pipelines.JsonWriterPipeline": 300,
}
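
Reading the JSONL back is one json.loads per line. An in-memory buffer stands in for restaurants.jsonl here so the sketch runs standalone:

```python
import io
import json

# In-memory stand-in for restaurants.jsonl: one JSON object per line, exactly
# as JsonWriterPipeline writes it.
buf = io.StringIO()
for item in [{"name": "A", "cuisine": "thai"}, {"name": "B", "cuisine": "ramen"}]:
    buf.write(json.dumps(item, ensure_ascii=False) + "\n")

buf.seek(0)
records = [json.loads(line) for line in buf if line.strip()]
print(len(records))  # 2
```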

Step 4 — Implement a deterministic preference scorer

Your micro-app’s core can be a small, easily explainable scoring function that non-devs can tweak. The goal is reproducibility over complexity. Here’s a simple scoring scheme that weights cuisine match, price, rating, and tag overlap.

# score.py
from typing import Dict, List

def score_restaurant(restaurant: Dict, prefs: Dict) -> float:
    score = 0.0

    # cuisine: exact match against the preferred list
    preferred = [p.lower() for p in prefs.get("cuisines", [])]
    if restaurant.get("cuisine") in preferred:
        score += 40

    # price level: exact match, e.g. "$" or "$$"
    pref_price = prefs.get("price_level")
    if pref_price and restaurant.get("price_level") == pref_price:
        score += 20

    # rating: scale 0-5 => up to 25 points
    score += (restaurant.get("rating", 0) / 5.0) * 25

    # tag overlap: up to 15 points
    tags = set(restaurant.get("tags", []))
    overlap = tags.intersection(set([t.lower() for t in prefs.get("tags", [])]))
    score += min(15, 5 * len(overlap))

    return score

This deterministic approach is easy to inspect and change — ideal for founders who want control without black-box models.
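
To see the weights in action, here is the same logic inlined with one worked example (restaurant and preferences invented for illustration):

```python
from typing import Dict

# Same logic as score_restaurant in score.py above, inlined so this example
# runs alone: cuisine match 40, exact price match 20, rating up to 25,
# tag overlap up to 15.
def score_restaurant(restaurant: Dict, prefs: Dict) -> float:
    score = 0.0
    preferred = [p.lower() for p in prefs.get("cuisines", [])]
    if restaurant.get("cuisine") in preferred:
        score += 40
    if prefs.get("price_level") and restaurant.get("price_level") == prefs["price_level"]:
        score += 20
    score += (restaurant.get("rating", 0) / 5.0) * 25
    overlap = set(restaurant.get("tags", [])) & {t.lower() for t in prefs.get("tags", [])}
    score += min(15, 5 * len(overlap))
    return score

r = {"cuisine": "ramen", "price_level": "$$$", "rating": 4.5, "tags": ["cozy"]}
p = {"cuisines": ["Ramen"], "price_level": "$$", "tags": ["cozy", "quiet"]}
print(score_restaurant(r, p))  # 40 (cuisine) + 0 (price) + 22.5 (rating) + 5 (one tag) = 67.5
```

Because every term is visible, changing "how much cuisine matters" is a one-line edit to a weight, not a model retrain.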

Step 5 — Add optional LLM-based explanations (short prompts)

LLMs are great for turning raw scores into friendly language. Use them sparingly to avoid cost. A typical flow in 2026 is to generate one-sentence explanations and short blurbs for each recommendation. Keep the prompt constrained to avoid hallucination.

# example prompt (string)
PROMPT = '''You are a concise assistant. Given a restaurant and a user's preferences, return one sentence explaining why this restaurant is recommended.

Restaurant:
Name: {name}
Cuisine: {cuisine}
Price: {price}
Rating: {rating}
Tags: {tags}

User preferences:
{prefs}

Return: one sentence explanation, max 20 words.'''

Call your LLM provider (OpenAI, Anthropic, etc.) with the prompt and keep the response length small. In 2026, many micro-apps use managed LLMs with usage limits to keep costs predictable.
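
Binding a record into the template is plain str.format. A condensed copy of the PROMPT above, runnable on its own (the real template adds the instruction preamble and the 20-word limit):

```python
# Condensed copy of the PROMPT template so this example is self-contained.
PROMPT = (
    "Name: {name}\nCuisine: {cuisine}\nPrice: {price}\n"
    "Rating: {rating}\nTags: {tags}\n\nUser preferences:\n{prefs}"
)

filled = PROMPT.format(
    name="Noodle House",
    cuisine="ramen",
    price="$$",
    rating=4.5,
    tags="cozy, late-night",
    prefs="cuisines: ramen; tags: cozy",
)
print("{" in filled)  # False -- every placeholder is bound before the LLM call
```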

Step 6 — Lightweight web UI with FastAPI

FastAPI gives you an easy way to serve an HTML UI and JSON /recommend endpoint that accepts preferences and returns ranked items. Non-dev founders can host this as a single container.

# app/main.py
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.templating import Jinja2Templates
import json
from score import score_restaurant

app = FastAPI()
templates = Jinja2Templates(directory="templates")

# load scraped data (for prototype)
with open("restaurants.jsonl", encoding="utf-8") as f:
    DATA = [json.loads(line) for line in f]

@app.get("/", response_class=HTMLResponse)
async def home(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})

@app.post("/recommend")
async def recommend(payload: dict):
    prefs = payload.get("prefs", {})
    scored = []
    for r in DATA:
        scored.append({"restaurant": r, "score": score_restaurant(r, prefs)})
    scored.sort(key=lambda x: x["score"], reverse=True)
    # return top 10
    top = scored[:10]
    return JSONResponse({"results": top})

Build a minimal template (templates/index.html) with a basic form and JS fetch to /recommend. Keep the UI intentionally simple — micro-apps are about utility, not polish.

Step 7 — Docker, local testing, and deploy

Dockerfile for quick reproducibility:

# Dockerfile (simplified)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Install Playwright browsers at build time so containers start fast
RUN playwright install --with-deps chromium
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

Recommended deployment for non-dev founders in 2026:

  • Use Render or Fly for container deployments — they handle HTTPS and simple autoscaling.
  • Schedule Scrapy runs using GitHub Actions or Render Cron to refresh the JSONL every night.
  • Store production data in SQLite for prototypes and migrate to Postgres or a managed vector DB only if you need semantic search / RAG features.

Step 8 — Production hardening & compliance

Scraping restaurant data in production requires operational and legal considerations:

  • Respect robots.txt and terms of service. Prefer official APIs (Google Places, Yelp Fusion, OpenTable) when available.
  • Rate limits & proxies: Use polite crawl rates. For scale, integrate residential or rotating proxies and respect site limits.
  • Bot detection: Playwright + realistic browser profiles reduce detection but don't guarantee it. Use backoff and caching.
  • Data quality: Add validation in pipelines (addresses, normalized cuisines, price buckets).
  • Audit logs & opt-out: Keep sourcing metadata (scrape date, source URL). If a business requests removal, honor it promptly.
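
For the robots.txt point, Python's standard library ships a parser. A minimal check — sample rules are parsed inline here instead of fetched, and the URL is this tutorial's placeholder target:

```python
from urllib import robotparser

# Parse sample robots.txt rules inline; in production, rp.set_url(...) plus
# rp.read() would fetch the real file from the target site.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("dinebot", "https://example-restaurant-site.com/city/restaurants"))  # True
print(rp.can_fetch("dinebot", "https://example-restaurant-site.com/private/admin"))     # False
```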

What changed by 2026

Several trends through late 2025 and 2026 change how you should think about micro-apps:

  • Managed headless services: Playwright-compatible hosting and playwright-cloud reduced flakiness of JS rendering; fewer infra headaches for founders.
  • LLM cost predictability: More usage tiers and on-device inference choices mean you can add explanation features without runaway bills.
  • Composability: Micro-apps increasingly use small composable services (API for geocoding, API for reviews) instead of building everything from scratch.
  • Regulatory clarity: By 2025 many jurisdictions clarified scraping vs. API access expectations — but compliance still matters.

Advanced strategies (optional upgrades)

If you want to extend the prototype while keeping the micro-app spirit:

  1. Add a small embeddings index (Pinecone/Weaviate/Chroma) to support semantic matching like "I want cozy ramen".
  2. Use a lightweight feature store in Postgres to persist normalized attributes and track freshness.
  3. Expose a single shared link (short URL) per group that stores simple group prefs in DB — the micro-app becomes a private small tool rather than a public product.
  4. Automate daily scraping with GitHub Actions and validate diffs so you only re-scrape changed listings.
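
The "validate diffs" idea in item 4 can be as simple as hashing each normalized record, so a nightly job only re-scrapes listings whose fingerprint changed — a sketch (record_fingerprint is a hypothetical helper name):

```python
import hashlib
import json

# Stable fingerprint for a record: sort_keys makes the hash independent of
# dict key order, so only real content changes produce a new digest.
def record_fingerprint(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"name": "Noodle House", "rating": 4.5}
b = {"rating": 4.5, "name": "Noodle House"}  # same data, different key order
print(record_fingerprint(a) == record_fingerprint(b))  # True
```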

Actionable takeaways — quick checklist

  • Start small: target one city and one permissive data source.
  • Prefer deterministic scoring for transparency and easy iteration.
  • Use Playwright via scrapy-playwright for modern JS sites; test against mocks first.
  • Limit LLM use to explanations and short text to reduce cost and risk.
  • Deploy as a single container, schedule scrapes, and keep the codebase minimal for maintenance.

Common pitfalls and troubleshooting

  • Spider returns no items: log response.text to inspect the rendered HTML, and add playwright_page_methods waits if content loads late.
  • Fields missing or inconsistent: build defensive parsing and normalization pipelines.
  • Site blocks requests: slow down crawl rate, add jitter, or switch to official APIs.
  • LLM explanations hallucinate: constrain prompts strictly and include source fields in the prompt for verification.
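
For the "fields missing or inconsistent" case, a defensive numeric parser is a typical pipeline helper (safe_float is a hypothetical name, not part of the code above):

```python
def safe_float(raw, default=0.0):
    # Scraped fields like ".rating::text" may be None, padded with
    # whitespace, or use a comma decimal separator; never let them
    # crash the pipeline.
    try:
        return float(str(raw).strip().replace(",", "."))
    except ValueError:
        return default

print(safe_float(" 4.5 "))  # 4.5
print(safe_float("4,5"))    # 4.5
print(safe_float(None))     # 0.0
```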

Why reproducibility matters for non-dev founders

Non-dev founders benefit most from systems they can reason about and repeat. Keep the scraping pipeline compact, the scoring deterministic and visible, and the deployment simple. That way you can: iterate with confidence, onboard collaborators quickly, and avoid opaque ML black boxes that are expensive to debug.

Next steps & checklist to ship in one weekend

  1. Day 1 morning: scaffold project, run Playwright, and validate one rendered page.
  2. Day 1 afternoon: build spider and extract 20–50 sample restaurants into JSONL.
  3. Day 2 morning: implement scoring, create FastAPI endpoints, wire UI to call /recommend.
  4. Day 2 afternoon: add LLM explanation, Dockerize, deploy to Render, and schedule nightly scrapes.

Always verify a target site's robots.txt and terms. When in doubt, use public APIs or ask for permission. Keep business ethics in mind: a private micro-app for friends is different from a commercial aggregator. If you plan to scale, consult legal counsel before scraping at scale.

Call to action

Ready to ship Rebecca Yu’s vibe-coded dining micro-app for your circle? Clone the sample repo, run the spider on allowed targets, and iterate on the scoring weights until the recommendations feel right. If you want a starter repository with all the code above wired together and a ready-to-deploy Docker image, grab our reproducible template and deploy to Render in under 30 minutes.

Get the template, sample data, and deployment guide — jumpstart your dining micro-app today.


Related Topics

#tutorial #microapps #playwright