Role
About the Role
Paperpile runs on data at scale, with a literature database of 250M+ academic papers and a growing body of user data accumulated over more than a decade. You'll work across the systems that ingest, process, store, and serve this data reliably: building pipelines, optimizing search, handling PDFs at scale, and exposing clean APIs.
Requirements:
- Strong backend engineering background with experience building and operating data-heavy systems in production.
- Experience deploying and operating services on AWS.
- Experience designing and maintaining data ingestion pipelines handling messy, heterogeneous sources.
- Comfortable with web scraping and working with third-party data sources and APIs.
- Familiarity with Node.js and TypeScript (experience in Java or Python is also acceptable).
- High standards for data quality, including correctness, deduplication, and consistency.
- Solid understanding of full-text search systems, including indexing strategy, relevance tuning, and query optimization.
- Proficiency in building reliable REST APIs.
Useful Experience:
- Familiarity with academic publishing formats and data sources (PubMed, Crossref, arXiv).
- Experience with PDF processing pipelines (extraction, transformation, storage, and delivery at scale).
- Experience with LLM-based document processing or ML pipelines for extracting structured data from unstructured text.
- Experience with large-scale web crawling and scraping.
Skills
Required Skills
Node.js
TypeScript
AWS
Data Pipelines
Web Scraping
Full-text Search
REST APIs
PDF Processing
LLM
Machine Learning