Automating Daily arXiv Paper Summaries and Slack Notifications

This script automates fetching daily arXiv papers, summarizing them with generative AI (specifically, Google Gemini), and posting the summaries to a Slack channel.

I. Python Code:

```python
import datetime
import logging
import os
import time

import arxiv
import google.generativeai as genai
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

# Configuration (best practice: keep sensitive data in environment variables)
PAPER_TYPES = ["cs.AI", "cs.CY", "cs.MA"]  # arXiv category identifiers
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
GEMINI_MODEL = "gemini-2.0-flash"
SLACK_BOT_TOKEN = os.environ.get("SLACK_BOT_TOKEN")
SLACK_CHANNEL = os.environ.get("SLACK_CHANNEL")
MAX_RESULTS = 30

# Logging setup (highly recommended for debugging and monitoring)
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
# Lambda preconfigures the root logger, so basicConfig may be a no-op there;
# set the level on this logger explicitly so INFO messages are kept.
logger.setLevel(logging.INFO)


def fetch_arxiv_papers(max_results: int = MAX_RESULTS) -> list:
    """Fetches the newest papers, keeping those within 24 hours of the most recent one."""
    query = " OR ".join(f"cat:{paper_type}" for paper_type in PAPER_TYPES)
    client = arxiv.Client()
    # arxiv.Search describes the query; Client.results() executes it.
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
        sort_order=arxiv.SortOrder.Descending,
    )
    papers = list(client.results(search))

    if not papers:
        logger.warning("No papers found.")
        return []

    # Results are sorted newest first, so papers[0] is the latest submission.
    latest_published = papers[0].published
    threshold = latest_published - datetime.timedelta(hours=24)
    filtered_papers = [paper for paper in papers if paper.published >= threshold]

    return [
        {
            "title": paper.title,
            "summary": paper.summary,
            "pdf_url": paper.pdf_url,
            "published": paper.published,
        }
        for paper in filtered_papers
    ]


def summarize_paper(abstract_text: str) -> str:
    """Generates a summary of the paper abstract using Google Gemini."""
    try:
        genai.configure(api_key=GEMINI_API_KEY)
        model = genai.GenerativeModel(GEMINI_MODEL)
        prompt = (
            "Summarize the following paper abstract concisely (under 300 characters) for beginners, "
            "including significance and results.\nOutput only the summary.\n---\n"
            f"{abstract_text}"
        )
        response = model.generate_content(prompt)
        return response.text.strip()
    except Exception as e:
        logger.error(f"Error summarizing paper: {e}")
        return "Error generating summary."


def post_to_slack(papers: list) -> None:
    """Summarizes each paper and posts the results to the specified Slack channel."""
    if not papers:
        return

    client = WebClient(token=SLACK_BOT_TOKEN)
    messages = []
    for i, paper in enumerate(papers, 1):
        summary = summarize_paper(paper["summary"])  # Summarize here, not in the handler
        time.sleep(1)  # Small pause between Gemini calls to stay under rate limits
        message = (
            f"{i}. *{paper['title']}*\n"
            f"{summary}\n"
            f"PDF: {paper['pdf_url']}\n"
            f"Published: {paper['published']}\n"
            f"────────────────────────"
        )
        messages.append(message)

    all_messages = "\n".join(messages)

    try:
        result = client.chat_postMessage(channel=SLACK_CHANNEL, text=all_messages)
        logger.info(f"Slack message sent successfully: {result}")
    except SlackApiError as e:
        logger.error(f"Error posting to Slack: {e}")


def lambda_handler(event, context):
    """AWS Lambda handler function."""
    papers = fetch_arxiv_papers()
    post_to_slack(papers)
    return {
        'statusCode': 200,
        'body': "Successfully processed arXiv papers and posted to Slack."
    }
```

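
Note that the 24-hour window in fetch_arxiv_papers is measured back from the newest paper returned, not from the current time, so a run after a quiet day still yields the latest batch. A minimal sketch of just that filtering step, using made-up timestamps (the helper name `filter_recent` and the sample data are illustrative and not part of the script):

```python
import datetime


def filter_recent(published_times, hours=24):
    """Keep timestamps within `hours` of the newest one (input sorted newest first)."""
    if not published_times:
        return []
    threshold = published_times[0] - datetime.timedelta(hours=hours)
    return [t for t in published_times if t >= threshold]


utc = datetime.timezone.utc
sample = [
    datetime.datetime(2025, 1, 10, 18, 0, tzinfo=utc),  # newest
    datetime.datetime(2025, 1, 10, 2, 0, tzinfo=utc),   # 16 hours older: kept
    datetime.datetime(2025, 1, 9, 12, 0, tzinfo=utc),   # 30 hours older: dropped
]
recent = filter_recent(sample)
print(len(recent))  # 2
```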

II. Local Setup and Deployment to AWS Lambda:

Environment Setup: Use pyenv to manage Python versions. Install Python 3.12.
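Install Libraries: Create a folder (e.g., lambda_dependencies) and install the required libraries into a python subfolder, since Lambda looks for a layer's Python packages under a top-level python/ directory:
```
pip install arxiv google-generativeai slack_sdk -t lambda_dependencies/python
```
Create Zip File: Zip the python folder from inside lambda_dependencies so that it sits at the root of the archive:
```
cd lambda_dependencies && zip -r ../lambda_layer.zip python
```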
Create AWS Lambda Layer: Upload lambda_layer.zip as a new layer in AWS Lambda. Set architecture to x86_64 and runtime to Python 3.12.
Create AWS Lambda Function: Upload the Python code above to a new Lambda function. Configure the function to use the created layer, set the environment variables (GEMINI_API_KEY, SLACK_BOT_TOKEN, SLACK_CHANNEL), and raise the timeout above the 3-second default, since summarizing up to 30 papers takes longer than that.
Schedule with AWS EventBridge: Create an EventBridge rule with a cron expression (e.g., cron(30 6 * * ? *) for 6:30 AM UTC daily) and set the Lambda function as the target.
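
Because the function depends on three environment variables, a missing one otherwise only shows up as a failed Gemini or Slack call. One option is to check them up front. A minimal sketch, assuming a hypothetical helper `load_config` (not in the original script) that accepts the environment as a mapping so it can be tried locally:

```python
import os

REQUIRED_VARS = ("GEMINI_API_KEY", "SLACK_BOT_TOKEN", "SLACK_CHANNEL")


def load_config(env=None):
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Local check with a fake environment; inside Lambda, call load_config() with no arguments.
fake_env = {"GEMINI_API_KEY": "key", "SLACK_BOT_TOKEN": "token", "SLACK_CHANNEL": "C123"}
config = load_config(fake_env)
print(sorted(config))
```

Calling such a helper at the top of lambda_handler would make a misconfigured deployment fail immediately with a message naming the missing variable.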

III. Important Considerations:

Error Handling: The code wraps the Gemini and Slack calls in try…except blocks and logs any failure, so one bad summary does not abort the whole run. This is crucial for reliable operation.
Rate Limiting: Be mindful of API rate limits for both arXiv and Gemini. A small delay between Gemini calls (such as time.sleep(1)) helps, but you might need more sophisticated rate-limiting strategies for heavy use.
Security: Never hardcode API keys directly in your code. Always use environment variables.
Logging: Comprehensive logging is essential for debugging and monitoring the function’s execution.
Testing: Thoroughly test your code locally before deploying it to AWS Lambda.
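
For the rate-limiting point above, one common step beyond a fixed delay is to retry a failed call with exponentially growing waits. A minimal sketch, assuming a hypothetical helper `call_with_backoff` (not part of the original script); which exceptions actually signal a rate limit depends on the API client, so this sketch simply retries on any exception:

```python
import time


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in function that fails twice, then succeeds.
calls = {"count": 0}
waits = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


result = call_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)  # ok [1.0, 2.0]
```

In the script, summarize_paper catches exceptions itself, so a wrapper like this would go around the model.generate_content call inside it rather than around summarize_paper.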

Before the first run, set the environment variables to your actual API keys and Slack channel ID.

Author: nijia
