
RepoWise

A Conversational Framework for Mining and Reasoning About Project Repositories

Department of Computer Science, University of California, Davis

Abstract

Open-source software (OSS) projects often face challenges in governance, sustainability, and contributor onboarding. Traditional analytics provide static metrics but lack interpretability and interactivity. RepoWise introduces a conversational framework powered by large language models (LLMs) that performs forensic-style reasoning over OSS repositories. It enables natural-language dialogue around key project documents—such as governance.md, contributing.md, and README.md—to surface insights into project health, sustainability risks, and actionable next steps.

By combining conversational AI with OSS analytics, RepoWise offers an interpretable, interactive approach to understanding the social and technical dynamics of open-source development. The system automatically extracts and indexes governance documents, contribution guidelines, commit data, and issue reports from GitHub repositories, then uses LLM-based Few-Shot Chain-of-Thought (CoT) intent classification with dual retrieval engines to generate context-grounded, evidence-backed responses.

We’d really appreciate your thoughts on this tool. Please share your feedback here:  https://forms.gle/GUQyYY6SijDbtUVe9

Demo Video

Watch RepoWise in action as we demonstrate its key features, including repository indexing, natural language querying, and evidence-grounded response generation.

Can't see the video? Watch on YouTube

System Architecture

The system architecture of RepoWise integrates its core modules into a cohesive, multi-layered design that connects the user interface, backend services, retrieval engines, storage subsystems, and external APIs. Figure 1 illustrates this architecture and traces the flow of data from user query to final response.

At the top of the stack, the Frontend Interface enables query submission and response visualization. The frontend communicates with the backend through asynchronous HTTP requests. The Backend Core, implemented using FastAPI, serves as the central orchestrator that manages intent classification, query routing, retrieval invocation, and prompt assembly. Within the backend, the Intent Classifier applies the classification pipeline described earlier and dispatches the query to the appropriate processing module based on the predicted intent.

The Query Processing Layer consists of three components: the RAG Engine, the Structured Analytics Engine, and a Static Response Handler. When the intent is PROJECT_DOC_BASED, the RAG Engine retrieves semantically relevant documentation fragments from ChromaDB. When the intent is COMMITS or ISSUES, the Structured Analytics Engine performs computation over structured repository activity data obtained from the GitHub API. For OUT_OF_SCOPE queries, a fixed response is returned without invoking the LLM. For GENERAL queries, the request is routed directly to the LLM client.
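This intent-to-engine routing can be sketched as a simple dispatch table. The handler names below are illustrative stand-ins, not RepoWise's actual API:

```python
# Hypothetical sketch of the backend's intent-based dispatch.
# Handler names and return values are illustrative only.

def handle_doc_query(query: str) -> str:
    # PROJECT_DOC_BASED -> RAG Engine (ChromaDB retrieval)
    return f"RAG:{query}"

def handle_structured_query(query: str) -> str:
    # COMMITS / ISSUES -> Structured Analytics Engine
    return f"CSV:{query}"

def handle_general_query(query: str) -> str:
    # GENERAL -> LLM client directly, no retrieval
    return f"LLM:{query}"

STATIC_RESPONSE = "I'm a project governance assistant..."

ROUTES = {
    "PROJECT_DOC_BASED": handle_doc_query,
    "COMMITS": handle_structured_query,
    "ISSUES": handle_structured_query,
    "GENERAL": handle_general_query,
}

def route(intent: str, query: str) -> str:
    # OUT_OF_SCOPE short-circuits to a fixed reply without any LLM call
    if intent == "OUT_OF_SCOPE":
        return STATIC_RESPONSE
    return ROUTES[intent](query)
```

The key property is that OUT_OF_SCOPE never reaches a handler at all, matching the fixed-response behavior described above.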

Once the prompt is assembled, it is sent to the LLM Client, which interfaces with the local Mistral 7B model hosted by the Ollama server. The model generates an evidence-grounded response, which is streamed back to the backend. The Response Delivery Layer then formats the output, attaches provenance metadata, and returns the response to the frontend.

Beneath the query processing layer, the Storage Layer manages persistent data across the ChromaDB vector store, the structured analytics cache, and the raw repository file cache. This layer synchronizes periodically with the GitHub API to ensure data freshness and supports efficient access for repeated queries.

Finally, the External Services layer includes the GitHub API, which supplies live repository data, and the Ollama inference server, which hosts the local LLM instance. These services are accessed securely via REST endpoints to minimize latency and preserve privacy during inference.

Overall, RepoWise embodies a fully integrated, retrieval-augmented architecture in which classification, data acquisition, and reasoning operate in concert. The system’s modular design enables straightforward extension to new data sources, query types, or analytical capabilities, supporting future research and practical deployment at scale.


Figure 1: RepoWise system architecture

Core Components

  1. User Interface (Frontend): React-based chat interface for natural language queries
  2. Intent Classification Module: LLM-based Few-Shot Chain of Thought (CoT) classification achieving 97.6% accuracy
  3. RAG Pipeline: Semantic search over project documentation using all-MiniLM-L6-v2 embeddings and ChromaDB
  4. CSV Data Pipeline: Structured retrieval of commit and issue metadata via GitHub API
  5. Prompt Assembly Engine: Merges retrieved context with anti-hallucination rules
  6. LLM Generation Module: Model-agnostic local inference via Ollama (default: Mistral 7B, configurable to any compatible model)
  7. Persistent Storage Layer: ChromaDB vector store, CSV cache, and file cache for reproducibility

Intent Classification System

The Intent Classification Module is the cognitive core of RepoWise. It determines how each user query should be processed by identifying its semantic intent and routing it to the corresponding retrieval engine. The classification mechanism uses a hybrid approach: keyword-based detection for out-of-scope queries combined with LLM-based Few-Shot Chain of Thought (CoT) prompting for all other intents, achieving 97.6% accuracy.

Hybrid Classification Approach

  1. Keyword-Based OUT_OF_SCOPE Detection: Fast pattern matching identifies greetings, casual conversation, and off-topic queries without LLM overhead
  2. LLM Few-Shot CoT Classification: For non-trivial queries, the system uses few-shot examples with chain-of-thought reasoning to classify intent. The LLM analyzes the query, considers similar examples, and provides structured reasoning before outputting the final classification

Query Classification Categories

  • OUT_OF_SCOPE: Greetings, casual conversation, unrelated queries → Direct response without LLM
    • Example: "Hello", "How are you?"
  • PROJECT_DOC_BASED: Governance, contribution guidelines, project policies → routed to RAG Pipeline
    • Example: "Who maintains this project?"
    • Example: "What are the contribution guidelines?"
  • COMMITS: Code changes, contributors, development activity → routed to CSV Data Pipeline
    • Example: "Who are the most active contributors?"
    • Example: "What files were changed recently?"
  • ISSUES: Bugs, feature requests, project health → routed to CSV Data Pipeline
    • Example: "How many open issues are there?"
    • Example: "What are the most commented issues?"
  • GENERAL: Generic programming questions not project-specific → General knowledge response
    • Example: "What is a pull request?"

Prompt Engineering Pipeline

When a user submits a natural language query to RepoWise, the system transforms it through a carefully orchestrated pipeline that ensures responses remain grounded in actual repository data while being tailored to the specific information need. This section details the complete prompt-engineering approach, explaining both the technical implementation and the reasoning behind each design decision.

Overview: From Query to Response

The prompt engineering pipeline in RepoWise operates in three stages: intent classification, context retrieval, and response generation. Each stage employs specialized prompt templates designed to address a fundamental challenge in conversational repository analysis: how to produce accurate, verifiable answers without hallucinating information that does not exist in the project’s artifacts.

We deliberately chose a multi-stage architecture over a single end-to-end prompt because different query types require fundamentally different retrieval strategies. A question about contribution guidelines requires semantic search over documentation, while a question about top contributors requires aggregation over structured commit data. Attempting to handle both with a single retrieval mechanism would compromise accuracy for at least one query type.

Stage 1: Intent Classification

The first stage determines which retrieval engine should handle the query by classifying it into one of five intent categories: PROJECT_DOC_BASED, COMMITS, ISSUES, GENERAL, or OUT_OF_SCOPE. This classification is critical because it routes the query to the appropriate data source and prompt template.

Few-Shot Chain-of-Thought Approach

Rather than relying on simple keyword matching or a fine-tuned classifier, RepoWise employs a Few-Shot Chain-of-Thought (CoT) prompting strategy. We chose this approach because repository queries often contain semantically ambiguous phrases that require reasoning to disambiguate. For example, “Who are the core developers?” requires aggregation over commit history to rank contributors by activity, while “How can I start contributing?” requires semantic search over documentation to retrieve contribution guidelines and onboarding procedures.

The classification prompt begins by establishing the task and defining each category with precise boundaries:

You are an intent classifier for a GitHub repository Q&A system.

TASK: Classify the user query into exactly ONE category.
Think step-by-step about what information is needed
and where it would be found.

CATEGORIES:
- PROJECT_DOC_BASED: Questions about governance, contribution
  guidelines, maintainers, licenses, policies, code of conduct
- COMMITS: Questions about commit history, contributors by
  code/commits, file modifications, development activity
- ISSUES: Questions about bug reports, feature requests,
  issue reporters, open/closed issues, issue statistics
- GENERAL: Generic programming questions not specific to
  this repository
- OUT_OF_SCOPE: Greetings, off-topic queries, questions
  about the assistant itself

Exemplar Design for Disambiguation

The prompt includes 24 carefully selected exemplars that demonstrate the reasoning process. Each exemplar shows not just the classification but why that classification is correct. We specifically chose exemplars that address common sources of confusion, such as the following pair:

Query: "Who are the top 5 contributors?"
Reasoning: "Top contributors" implies ranking by measurable
activity like commit count. This requires analyzing commit
history data, not reading governance docs.
Intent: COMMITS

Query: "Who maintains this project?"
Reasoning: "Maintainers" are explicitly defined roles documented
in MAINTAINERS.md, CODEOWNERS, or governance docs. This is
asking about documented roles, not commit statistics.
Intent: PROJECT_DOC_BASED

This explicit reasoning serves two purposes: it guides the LLM to consider where the answer would be found (not just what the question asks), and it creates a consistent decision boundary that the model can apply to novel queries.

Hybrid Architecture for Latency Optimization

While the CoT approach provides high accuracy, LLM inference introduces some latency. To optimize response time without sacrificing accuracy, RepoWise employs a hybrid architecture where certain intents are detected through fast keyword matching before invoking the LLM. Specifically, OUT_OF_SCOPE queries (greetings, off-topic questions) are identified through pattern matching, reserving LLM classification for queries that genuinely require semantic reasoning. This hybrid design maintains classification accuracy above 97% while reducing latency for straightforward cases that do not require LLM inference.

Stage 2: Context Retrieval

Once the intent is classified, the system retrieves relevant context using intent-specific strategies. The choice of retrieval mechanism directly impacts answer quality, and different intent types demand fundamentally different approaches.

Semantic Search for Documentation Queries

For PROJECT_DOC_BASED queries, RepoWise performs hybrid semantic search over project documentation stored in ChromaDB. Documents such as README, CONTRIBUTING, GOVERNANCE, CODE_OF_CONDUCT, SECURITY, LICENSE, MAINTAINERS, CODEOWNERS, and OWNERS are chunked, embedded using the all-MiniLM-L6-v2 model, and indexed for similarity search.

When a query arrives, it is embedded in the same vector space, and the system retrieves the top-5 most similar chunks. These chunks are then re-ranked using a hybrid scoring function that combines semantic similarity with two additional factors: (1) document type prioritization (GOVERNANCE, MAINTAINERS, CODEOWNERS) for queries about project roles, and (2) content keyword matching. For “who” queries specifically, the re-ranker also counts GitHub username patterns (@ symbols) to prioritize documents containing contributor identities. We found that pure semantic similarity occasionally retrieved topically related but factually irrelevant passages; the hybrid approach mitigates this by ensuring governance documents rank higher for governance queries.

Natural Language to Pandas for Structured Queries

For COMMITS and ISSUES queries, semantic search is fundamentally inappropriate. Questions like “Who are the top 5 contributors by commit count?” require aggregation, sorting, and counting operations that cannot be approximated through vector similarity.

Instead, RepoWise translates natural language queries into executable pandas code. This translation uses a specialized prompt that includes the complete data schema and exact query-to-code mappings:

COMMITS DATA SCHEMA - ALL COLUMNS:
| Column       | Type     | Description                          |
|--------------|----------|--------------------------------------|
| commit_sha   | string   | Unique commit identifier             |
| name         | string   | Contributor name                     |
| email        | string   | Contributor email                    |
| date         | datetime | Commit timestamp                     |
| filename     | string   | File path modified                   |
| lines_added  | int      | Lines added in this file             |
| lines_deleted| int      | Lines deleted in this file           |

CRITICAL RULES FOR COMMITS:
1. ONE ROW PER FILE MODIFIED, not one row per commit
2. To count COMMITS: df.drop_duplicates(subset=['commit_sha'])
3. Use 'name' for contributor names, NOT 'user_login'

The schema documentation is essential because the data model has non-obvious semantics. For instance, the commits table contains one row per file modified, not one row per commit. Without explicit guidance, an LLM would likely count rows rather than unique commit SHAs, producing incorrect contributor rankings. By documenting these semantics directly in the prompt, we ensure the generated code handles edge cases correctly.

The prompt also includes exact code mappings for common query patterns:

"top 5 contributors by commit count":
result = df.drop_duplicates(subset=['commit_sha'])
          .groupby('name').size()
          .sort_values(ascending=False)
          .head(5)
          .reset_index(name='commit_count')

Similarly, for issues queries, the prompt includes the issues data schema:

ISSUES DATA SCHEMA - ALL COLUMNS:
| Column       | Type     | Description                              |
|--------------|----------|------------------------------------------|
| type         | string   | 'issue' or 'comment' - FILTER BY THIS    |
| issue_num    | int      | Issue number (e.g., 123 for #123)        |
| title        | string   | Issue title (only for type='issue')      |
| user_login   | string   | GitHub username of reporter              |
| issue_state  | string   | 'OPEN' or 'CLOSED' (uppercase)           |
| created_at   | datetime | When issue/comment was created           |
| updated_at   | datetime | When issue/comment was last updated      |
| body         | string   | Issue/comment content text               |

CRITICAL RULES FOR ISSUES:
1. Dataset has BOTH issues AND comments (check 'type' column)
2. To count ISSUES: df[df['type'] == 'issue']
3. Use 'user_login' for reporter names, NOT 'name'

These mappings function as few-shot examples that demonstrate correct pandas idioms. The LLM can then generalize from these examples to handle variations like “top 10 contributors in the past 6 months” by combining the ranking pattern with a date filter.

The decision to use pandas code generation rather than semantic search for quantitative queries reflects a fundamental insight: aggregation is not retrieval. Computing “top N” requires processing the entire dataset, not finding the most similar chunks. Numeric operations demand exact values, not semantic approximations. Multi-field filtering (e.g., “open issues with more than 10 comments”) requires Boolean logic across columns. Chronological sorting is a structured operation, not a semantic one. By matching the retrieval mechanism to the query type, RepoWise ensures that quantitative queries receive computed answers rather than semantic approximations.

Stage 3: Response Generation

The final stage generates a natural language response using task-specific prompts. For repository-specific intents (PROJECT_DOC_BASED, COMMITS, and ISSUES), each prompt is composed of five modular components: system role, task instructions, anti-hallucination rules, retrieved context, and the user question.

Component 1: System Role

Every prompt begins with a brief system role that establishes the LLM’s persona:

You are a precise document analyst for the {project_name} project.

This framing is intentionally minimal but serves an important function: it primes the model to behave as an analyst extracting information from provided documents rather than a general assistant drawing on training knowledge. The word “precise” specifically signals that accuracy takes precedence over fluency or completeness.

Component 2: Task Instructions

The task instructions vary substantially based on the classified intent. We found that generic instructions produced adequate but not excellent responses; task-specific guidance significantly improved answer quality.

For WHO queries (entity extraction), the instructions emphasize pattern recognition:

TASK: ENTITY EXTRACTION - Extract names, emails, GitHub usernames

1. Search the documents for actual names, email addresses, and
   GitHub usernames
2. Look for these patterns:
   - Email format: "Name <email@domain>"
   - GitHub format: "@username" (e.g., @fchollet, @MarkDaoust)
   - CODEOWNERS format: "/path/ @username1 @username2"
   - Plain names: "Maintained by: John Doe"
3. ONLY extract names/usernames that actually appear in documents
4. If NO names found, respond: "No maintainer information found
   in the available documents"

The explicit pattern enumeration prevents the model from hallucinating maintainer identities. By specifying exactly what formats to look for, we constrain the extraction to verifiable entities.

For HOW queries (process explanation), the instructions emphasize procedural fidelity:

TASK: PROCESS EXPLANATION - Explain step-by-step procedures

1. Provide a comprehensive explanation of the process
2. Break down into clear, numbered steps
3. Include prerequisites, requirements, or important context
4. Mention specific tools, commands, or guidelines referenced
5. Cite which documents contain each piece of information

For COMMITS queries, the instructions emphasize data-grounded analysis:

TASK: ANALYZE COMMIT DATA

    1. Answer ONLY using the commit data shown below
    2. DO NOT make up or invent information
    3. Include specific details (commit SHAs, author names, dates)
    4. For "top N" queries: Provide exactly N items in numbered format
    5. If the data doesn't answer the question, say:
   "The commit data doesn't contain this information"

For ISSUES queries, the instructions parallel those for commits but emphasize issue-specific details:

TASK: ANALYZE ISSUES DATA

1. Answer ONLY using the issues data shown below
2. DO NOT make up or invent information
3. Include specific details (issue numbers, titles, users, states)
4. For statistical questions, include numbers and percentages
5. For list questions, provide the COMPLETE requested list
6. If the data doesn't answer the question, say:
   "The issues data doesn't contain this information"

Component 3: Anti-Hallucination Rules

This component enforces RepoWise's guiding principle: acknowledging that information is missing is preferable to fabricating plausible-sounding answers.

The rules are organized into seven categories, each addressing a specific failure mode we observed during development:

RULE 1: INFORMATION SOURCE
- Your ONLY source is the project documents provided below
- DO NOT use external knowledge or training data
- DO NOT make logical inferences beyond what is stated

RULE 2: HANDLING MISSING INFORMATION
If information is NOT in the documents, respond EXACTLY:
"The available project documents for {project_name} do not
contain information about [topic]."

DO NOT:
- Provide general knowledge answers ("typically", "usually")
- Make up specific details
- Give partial answers then admit uncertainty afterward

RULE 3: VERIFICATION PROCESS
Before stating ANY fact:
1. Locate the exact text in the documents below
2. Verify it's explicitly stated, not inferred
3. Note which document it comes from
4. Only then include it in your answer

RULE 4: ANSWER FORMAT
GOOD: "According to GOVERNANCE.md, maintainers are elected
      by consensus vote."
BAD:  "Maintainers are typically elected by a majority vote,
      though this isn't explicitly stated."

RULE 5: NAMES, NUMBERS, AND SPECIFICS
- Only mention names, emails, numbers, or percentages that
  appear verbatim in the documents
- If you cannot find a specific piece of information, say so
- Never invent examples or provide "typical" values

RULE 6: OUTPUT FORMAT
- DO NOT expose your reasoning process to the user
- DO NOT write: "Let me verify...", "Based on my analysis..."
- DO NOT mention: "ANTI-HALLUCINATION", "rules", or
  "guidelines I'm following"

RULE 7: RESPONSE COMPLETENESS
- Provide COMPLETE answers with all relevant details
- Include supporting information when available
- Balance brevity with informativeness

Rule 2 is particularly important: it provides an exact template for acknowledging missing information. Without this template, we observed that models would often hedge (“This project likely follows standard practices...”) rather than clearly stating that the information was unavailable.

Component 4: Retrieved Context

The retrieved context is formatted to clearly delineate the evidence base:

AVAILABLE GOVERNANCE DOCUMENTS FOR {project_name}:

[README] README.md:
{readme_content}

[CONTRIBUTING] CONTRIBUTING.md:
{contributing_content}

[GOVERNANCE] GOVERNANCE.md:
{governance_content}

The bracketed document labels serve two purposes: they make citation straightforward (the model can simply reference “[CONTRIBUTING]”), and they establish clear boundaries between documents. For structured data queries, the context contains the pandas query results formatted as readable tables.

Component 5: User Question

The prompt concludes with the user question and a generation cue:

USER QUESTION: {query}

Your answer:

The phrase “Your answer:” signals the transition from instruction to generation. This consistent cue ensures predictable generation behavior across all templates.

Special Cases: GENERAL and OUT_OF_SCOPE

Not all queries require repository-specific retrieval. For GENERAL queries (generic programming questions), RepoWise invokes the LLM directly without repository context:

You are a helpful AI assistant. Answer the user's question based on your knowledge.

USER QUESTION: {query}

For OUT_OF_SCOPE queries (greetings, off-topic requests), RepoWise returns a hardcoded response without invoking the LLM at all:

"I'm a project governance assistant designed to help you open-source project documentation, contribution
guidelines, maintainers, issues, and commit history. Please
ask me questions about the selected project."

This design choice reflects an important principle: when we know the query is out of scope, there is no benefit to generating a response. The hardcoded message is faster, cheaper, and more predictable than LLM generation.

Key Features

🤖 LLM-Based Intent Classification

Few-Shot Chain of Thought (CoT) classification achieving 97.6% accuracy in routing queries to appropriate retrieval engines

🔍 Dual Retrieval Engines

Semantic RAG for documentation and structured CSV pipeline for commits/issues data

🛡️ Anti-Hallucination Mechanisms

Context-grounded responses with strict factual boundaries and evidence-backed reasoning

📊 Evidence-Grounded Repository Mining

Forensic-style reasoning with source attribution, provenance tracking, and transparent citation of evidence

🔒 Privacy-Preserving Local LLM

Local inference via Ollama ensures data privacy and reproducibility without sending data to external APIs

🔄 Model-Agnostic Architecture

Swap LLMs via environment variables—supports Mistral, Llama 3, Gemma, Phi-3, or any Ollama-compatible model for future flexibility

💬 Interactive Conversational Interface

Natural language dialogue with provenance metadata and source attribution

Technology Stack

Backend

  • Framework: FastAPI (Python)
  • Vector Database: ChromaDB for document embeddings
  • Embeddings: all-MiniLM-L6-v2 (sentence-transformers)
  • LLM: Model-agnostic architecture (default: Mistral 7B via Ollama). Configurable via environment variables to support any Ollama-compatible model (e.g., Llama 3, Gemma, Phi-3)

Frontend

  • Framework: React 18 with Vite
  • UI Library: Tailwind CSS
  • State Management: TanStack Query (React Query)

Installation & Setup

Backend Setup

# Clone the repository
git clone https://github.com/RepoWise/backend.git
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your GitHub token and settings
# Optional: Change OLLAMA_MODEL to use a different LLM (e.g., llama3, gemma, phi3)

# Install Ollama (for local LLM)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral:7b  # Or: ollama pull llama3, ollama pull gemma, etc.

# Start the server
./start_dev.sh

Frontend Setup

# Clone the repository
git clone https://github.com/RepoWise/frontend.git
cd frontend

# Install dependencies
npm install

# Configure environment
cp .env.example .env
# Edit .env with backend URL

# Start development server
npm run dev

Usage Example

Adding a Repository

POST /api/projects/add
{
  "github_url": "https://github.com/facebook/react"
}

Querying Documentation

POST /api/query
{
  "project_id": "facebook-react",
  "query": "What are the contribution guidelines?"
}

Response Format

{
  "answer": "To contribute to React, you should...",
  "sources": [
    {
      "file_path": "CONTRIBUTING.md",
      "score": 0.89,
      "content": "..."
    }
  ],
  "suggested_questions": [
    "How do I submit a pull request?",
    "What is the code review process?"
  ]
}

Research & Publications

RepoWise represents a paradigm shift in repository analytics—moving from static metrics to interactive, interpretable inquiry. By combining conversational retrieval with evidence-grounded LLM reasoning, the framework enables stakeholders to ask natural-language questions and receive contextual, verifiable answers with source citations, transforming how developers, maintainers, and researchers understand and navigate project repositories.

Use Cases

  • Contributor Onboarding: New contributors ask procedural questions in natural language ("How do I submit a pull request?", "What coding standards should I follow?") and receive step-by-step guidance extracted directly from project documentation with exact citations
  • Governance and Community Health: Maintainers query governance structure ("Who has commit access?"), contribution policies ("What's our code review process?"), and community dynamics ("Who are the most active contributors?") to assess transparency, engagement, and sustainability
  • Repository Forensics: Researchers and auditors perform forensic analysis to trace decision provenance ("When was the license changed?"), verify compliance ("Are there any dual-licensed files?"), detect documentation drift, and investigate historical development patterns

Key Research Contributions

  • LLM-based Few-Shot Chain of Thought (CoT) intent classification with 97.6% accuracy
  • Dual retrieval architecture combining semantic RAG and structured CSV data
  • Evidence-grounded reasoning with anti-hallucination mechanisms
  • Model-agnostic architecture enabling local LLM inference for privacy-preserving, reproducible analysis
  • Conversational framework for governance and sustainability forensics

Future Work

  • Agentic Multi-Turn Reasoning: Extend RepoWise with an autonomous, multi-turn reasoning framework capable of proactive retrieval and conversational memory management
  • Code-Level Analysis: Expand beyond documentation to include code summarization, dependency forensics, and automatic sustainability scoring
  • Foundation Recommendation: Develop AI-driven foundation alignment analysis to recommend which OSS foundation (Apache, Eclipse, OSGeo, etc.) best fits a project's governance model based on textual and social cues
  • Public Deployment: Deploy as a web-based service with user authentication and persistent repository contexts for community-driven evaluation
  • Longitudinal Studies: Conduct large-scale user studies to assess real-world usability, adoption patterns, and long-term maintenance behavior

Cite RepoWise

If you use RepoWise in your research or projects, please cite it using the following entry:

@software{RepoWise2025,
  author       = {RepoWise contributors},
  title        = {RepoWise — Repository sustainability tracker (website)},
  year         = {2025},
  url          = {https://repowise.github.io/RepoWise-website/},
}

Acknowledgments

This research was supported by the National Science Foundation under Grant No. 2020751, as well as by the Alfred P. Sloan Foundation through the OSPO for UC initiative (Award No. 2024-22424).

Contact

RepoWise is developed by the DECAL Lab at UC Davis.

GitHub Organization | Backend Repository | Frontend Repository

For questions or feedback, please open an issue on our GitHub repository.
