Academic Research Partnership: De-Identified Conversation Dataset Access

Q: What research topics can this dataset support?

The corpus can support NLP, dialogue modeling, persuasion research, behavioral economics, compliance AI, human-in-the-loop review design, and outcome prediction work where conversational behavior matters.

Q: How is researcher access controlled?

Access is controlled through proposal review, legal terms, institution verification, role-based permissions, and secure enclave workflows that restrict how approved researchers interact with the corpus.

Q: Is data transferred to researchers?

The standard model is controlled enclave access rather than unrestricted file transfer. Researchers work inside an approved environment and export only permitted outputs.

Q: What are the co-authorship terms?

Co-authorship is available when the project depends materially on dataset design, annotation strategy, domain interpretation, or joint analysis effort. Specific terms are set during proposal review.

Q: Which institutions have you worked with?

Institution details are discussed during diligence when both sides can assess fit, governance, and publication plans. The access process is selective rather than open-download.

Direct answer

This dataset can answer research questions that require real negotiation dialogue plus known outcomes: which phrasing predicts agreement, how hardship signals change behavior, what conversational markers precede successful resolution, how compliance constraints shape language choice, and how domain-specific dialogue differs from open-web conversation corpora. Because the records are de-identified and outcome-linked, qualified teams can study not only what people said but what happened after they said it.

Academic researchers have many text datasets but very few that combine high-stakes financial conversations, structured outcomes, and controlled access terms. Public NLP corpora often lack a business objective, lack verified result labels, or flatten the interaction into one-off prompts. This corpus is different. It captures repeated conversational work inside collections, insurance, and enrollment settings where persuasion, compliance, timing, and trust all matter.

95K+: labeled conversations tied to observable outcomes and workflow context
Primary verticals: collections, insurance, and enrollment negotiation flows
0 PII: research corpus prepared for controlled access without personal identifiers
Co-authorship: available when the project requires material joint domain and analytic work

What the dataset enables

For NLP teams, the obvious draw is outcome-linked dialogue: a rare chance to study multi-turn interactions where intent and result can be aligned. Researchers can test classification, retrieval, summarization, policy-constrained generation, and turn-level decision support against data that reflects real operational stakes rather than web chatter. For dialogue scientists, the corpus offers a dense setting for interruption handling, negotiation sequence analysis, and turn-taking under pressure.

For behavioral economists and persuasion researchers, the value is different. The calls expose how framing, urgency, empathy, refusal, and concession interact under practical limits. For compliance AI researchers, the corpus can support work on safe prompting, violation detection, and supervised review tools that flag risky phrasing without making final legal judgments.
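As a rough illustration of what outcome-linked analysis could look like, the sketch below compares agreement rates for conversations that do and do not contain a given phrase. The record layout (`Conversation`, `turns`, `outcome`), the `"agreement"` label, and the sample data are all hypothetical; the actual corpus schema is defined during access scoping.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical record layout; the real corpus schema may differ.
    turns: list[str]  # de-identified utterances, in order
    outcome: str      # assumed label, e.g. "agreement" or "refusal"


def agreement_rate_by_phrase(conversations, phrase):
    """Return (rate_with_phrase, rate_without_phrase) for outcome == 'agreement'."""
    with_phrase, without_phrase = [], []
    for conv in conversations:
        mentioned = any(phrase.lower() in turn.lower() for turn in conv.turns)
        bucket = with_phrase if mentioned else without_phrase
        bucket.append(conv.outcome == "agreement")

    def rate(flags):
        # None signals that no conversation fell into this bucket.
        return sum(flags) / len(flags) if flags else None

    return rate(with_phrase), rate(without_phrase)


# Toy, made-up data for illustration only.
sample = [
    Conversation(["Can we set up a payment plan?", "Yes, that works."], "agreement"),
    Conversation(["I can offer a payment plan.", "No thanks."], "refusal"),
    Conversation(["The balance is due today.", "I can't pay."], "refusal"),
    Conversation(["Would a payment plan help?", "Okay."], "agreement"),
]

rates = agreement_rate_by_phrase(sample, "payment plan")
print(rates)
```

A raw comparison like this only shows association. A real study would need controls for vertical, account context, and speaker role before drawing any conclusion about phrasing.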

Why this corpus is unique

Most public language datasets have either broad topical variety with thin labels or strong labels with shallow conversation depth. This corpus offers repeated domain focus, operational context, and outcomes. It also reflects constrained language, which matters because the same representative must pursue a business goal while staying within policy and legal limits. That tension produces data suited to studying tradeoffs in a way open-domain dialogue sets cannot match.

The vertical mix also matters. Collections, insurance, and enrollment are not identical, but each contains negotiation, compliance, and information asymmetry. That makes the corpus broad enough for comparative work while still preserving a clear shared structure.

Research area	What the corpus offers	Why public datasets fall short
Dialogue modeling	Multi-turn negotiation with real constraints and outcomes	Many public corpora lack verified outcomes or domain pressure
Behavioral research	Observable tradeoffs in wording, timing, and concession behavior	Open-web text rarely captures live financial negotiation
Compliance AI	Examples of approved and risky phrasing in regulated settings	Generic datasets do not reflect policy-bound conversation work
Outcome prediction	Labels tied to resolution, refusal, and next-step behavior	Equivalent outcome-linked business dialogue corpora are scarce

Access model

The default model is secure enclave access, not bulk transfer. Approved researchers work inside a controlled environment where the corpus, tooling, and export rules are defined up front. That keeps governance tight while still allowing meaningful academic work. Teams can run notebooks, evaluate models, and prepare results without taking unrestricted raw files off-platform.

Controlled access also makes cross-institution collaboration easier. Legal review can focus on the project scope, the export policy, and the institutional protections rather than on open-ended downstream redistribution risk.

Why enclave access matters: it lets serious researchers work with rare data while keeping governance disciplined enough for domain-sensitive conversation records.
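One way to picture export rules that are defined up front is a small allow-list check that runs before anything leaves the enclave. The sketch below is illustrative only: the `ExportPolicy` fields, the output kinds, and the minimum group size are hypothetical and do not describe the program's actual rules.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExportPolicy:
    # Hypothetical policy; real export rules are set during access scoping.
    allowed_kinds: frozenset = field(
        default_factory=lambda: frozenset({"aggregate_table", "model_metrics", "figure"})
    )
    min_group_size: int = 10  # assumed floor for any aggregate cell


def can_export(policy, kind, smallest_group=None):
    """Return True if an output of this kind may leave the enclave under the policy."""
    if kind not in policy.allowed_kinds:
        return False  # anything off the allow-list, such as raw transcripts, is blocked
    if kind == "aggregate_table":
        # Aggregates must report their smallest cell and meet the floor.
        return smallest_group is not None and smallest_group >= policy.min_group_size
    return True


policy = ExportPolicy()
print(can_export(policy, "raw_transcript"))
print(can_export(policy, "aggregate_table", 25))
print(can_export(policy, "aggregate_table", 3))
print(can_export(policy, "model_metrics"))
```

An allow-list is used here rather than a block-list so that any output type not explicitly approved is refused by default.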

Co-authorship terms

Co-authorship is available when the project is more than a simple data pull. If the work depends on annotation design, domain interpretation, custom task framing, or significant joint analysis, a shared authorship plan can be discussed at the proposal stage. If the project is mainly independent academic use, access can proceed without any publication claim beyond standard acknowledgment and citation terms.

The key point is early clarity. Research teams should describe their target venue, expected contribution, timeline, and whether they want a hands-on data partner or only governed access.

Comparison to existing NLP datasets

There are strong public datasets for summarization, intent detection, and open-domain dialogue. There are also helpful customer-service corpora. What is missing at scale is de-identified, outcome-linked financial negotiation data where turns are tied to real business results and constrained operating rules. That gap is why this corpus can matter to both academic and applied research groups.

Ideal partner institutions

The best partners are labs and departments that already run sensitive-data workflows: computational social science groups, NLP labs, public policy schools, business schools studying negotiation, and interdisciplinary centers working on responsible AI. Teams should be prepared to define a narrow question, identify their methods, and explain why outcome-linked call data is necessary for the study.

Precedent

The model is closer to controlled academic API and research-access programs than to open corpus dumps. Twitter and Reddit both set expectations for restricted yet useful academic pathways: clear review, defined use, and governed outputs. A domain-sensitive call corpus benefits from the same approach, especially when publication value depends more on rigorous access than on unrestricted downloads.

Application process

Applications should describe the research question, the principal investigators, the institution, the expected methods, the compute needs, and whether publication is planned. If the project touches model generation, safety review, or domain interpretation, include that too. Strong proposals explain why existing public datasets are not enough and what scientific value outcome-linked conversations add.

1. Submit a short proposal by email.
2. Complete fit review and governance screening.
3. Define access scope, export rules, and publication terms.
4. Begin work in the secure enclave after approval.

Important: access is selective. There is no instant download and no open public release of the raw corpus.

FAQ

What research topics can this dataset support?

It can support NLP, dialogue systems, persuasion research, behavioral economics, compliance AI, and outcome prediction where conversation structure and result labels matter.

How is researcher access controlled?

Access is governed through proposal review, institution verification, legal terms, role-based permissions, and secure enclave workflows.

Is data transferred to researchers?

The standard model is enclave access rather than unrestricted file transfer. Researchers work in an approved environment and export only permitted outputs.

What are the co-authorship terms?

Co-authorship is discussed when the work involves substantial joint design, annotation, interpretation, or analysis. Terms are set during proposal review.

Which institutions have you worked with?

Institution details are shared during diligence when both sides can assess fit, governance requirements, and publication plans. The program is selective and tailored to the project.

Submit a Research Proposal

If your lab needs a rare outcome-linked dialogue corpus for serious research, send a proposal summary and intended methods.


