PhD Student in Informatics - Pennsylvania State University

Studying how privacy norms evolve from the conventional web to agentic AI

PhD student in Informatics at Penn State studying privacy policy analysis, sectoral web classification, and disclosure behavior in AI agents.

View CV Google Scholar Email

Portrait of Shahriar Shayesteh wearing glasses and a purple shirt.

Overview

I study privacy disclosures in two settings: the conventional web and emerging agentic systems. The site centers on the datasets, analyses, and evaluation work that connect those settings.

Who I am

PhD student in Informatics

At Pennsylvania State University, I work on privacy, web-scale text analysis, and responsible AI.

What I study

Privacy norms across two environments

My work tracks privacy disclosures in websites and asks how those norms shift when AI agents act on a user’s behalf.

Where to start

SoACer, PrivaSeer, and agentic disclosure

These projects show the main arc of the site: sector discovery, privacy analysis, and agentic AI evaluation.

3M+: privacy policies in PrivaSeer
195,495: websites in the SoAC corpus
59,590: sector-tagged privacy policies analyzed

Research Map

Three themes organize the current work.

Sectoral web classification

I build datasets and models that recover service sector from website text, making sector context usable for downstream analysis.

Privacy policy analysis at scale

I study how disclosures vary across industries and over time, with attention to transparency, vagueness, and sector-specific norms.

Agentic AI disclosure

I examine what AI agents reveal during tool use and how model behavior and schema design shape oversharing risk.

Selected Work

A few projects that show the main arc of the research.

Sectoral web classification

2025

SoACer and the SoAC Corpus

Problem

Sector context is hard to recover from noisy and heterogeneous website content.

Contribution

SoACer and the SoAC Corpus provide a web-scale classification pipeline and dataset for sector-aware analysis.

Why it matters

They make privacy and governance studies more useful by preserving service context instead of flattening the web.

Paper Code Dataset

Large-scale privacy disclosure measurement

2025-Present

PrivaSeer and sectoral privacy analysis

Problem

Privacy disclosures are difficult to compare across industries and time at scale.

Contribution

My work uses PrivaSeer and sector-aware analyses to study convergence, divergence, and opacity in privacy policies.

Why it matters

This creates empirical grounding for transparency, governance, and data-handling research.

SOUPS paper Project site

Agentic privacy disclosure

2026

Agentic privacy disclosure

Problem

When AI agents use tools, they may disclose user information uniformly across service contexts rather than adapting to each one.

Contribution

My dissertation work treats runtime tool calls as a key disclosure moment and studies both model behavior and schema design.

Why it matters

It extends privacy analysis from the conventional web to agentic systems.

Selected Publications

Selected papers relevant to the homepage. Full publication details are on Google Scholar and the CV.


2025

ACM DocEng

SoACer: Sector-Based Corpus and LLM-Based Framework for Sectoral Website Classification

Builds a dataset and classification pipeline for sector-aware web analysis.

Paper

2025

USENIX SOUPS

The PrivaSeer Project: Large-Scale Resources for Analysis of Privacy Policy Text

Provides infrastructure for large-scale transparency and privacy-policy analysis.

Paper

2022

FLAIRS-35

Generative Adversarial Learning with Negative Data Augmentation for Semi-Supervised Text Classification

Earlier work on robust NLP under low-label conditions.

Paper Thesis

Academic Snapshot

Current work

Graduate Research Assistant

Human Language Technologies Lab, Pennsylvania State University - Fall 2023-Present

Work on PrivaSeer, sector-aware privacy analysis, and large-scale empirical studies of disclosure behavior.

PhD in Informatics

Pennsylvania State University - Expected 2027

Dissertation on the evolution of sectoral privacy norms from the conventional web to the agentic web.

Previous research and service

Research Intern

Department of Canadian Heritage - Feb 2022-Apr 2022

Applied NLP methods to qualitative corpora and translated findings into policy-relevant insights for public stakeholders.

Graduate Research Assistant

NLP Laboratory, University of Ottawa - Jan 2021-Jun 2023

Studied fairness and robustness in semi-supervised text classification and developed the work that became my master's thesis.

Reviewer

TrustNLP at NAACL 2025

Reviewed work on trustworthy NLP and responsible model behavior.

Graduate Student Panelist

IST 197: Introduction to Research

Spoke with students about research careers and graduate study.

Collaboration and Contact

I welcome conversations about privacy, AI governance, responsible NLP, and agentic systems, especially around the following:

Research collaboration
Datasets, evaluations, and contributor work
Academic and applied research opportunities

Email CV PDF Google Scholar GitHub LinkedIn Hugging Face