
Automating Daily arXiv Paper Summaries and Slack Notifications


This script automates the process of fetching daily arXiv papers, generating summaries with a generative AI model, and posting them to a Slack channel. The sections below cover the code, the deployment steps, and operational considerations.


This script retrieves papers from arXiv, summarizes them using generative AI (specifically, Google Gemini), and posts the summaries to a Slack channel.

I. Python Code:

import datetime
import logging
import os
import time

import arxiv
import google.generativeai as genai
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

# Configuration (best practice: read sensitive values from environment variables)
PAPER_TYPES = ["cs.AI", "cs.CY", "cs.MA"]  # arXiv category identifiers
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
GEMINI_MODEL = "gemini-2.0-flash"
SLACK_BOT_TOKEN = os.environ.get("SLACK_BOT_TOKEN")
SLACK_CHANNEL = os.environ.get("SLACK_CHANNEL")
MAX_RESULTS = 30

# Logging setup (highly recommended for debugging and monitoring)
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)
# Lambda pre-configures the root logger, so basicConfig may have no effect there;
# setting the level on this logger keeps INFO messages visible.
logger.setLevel(logging.INFO)


def fetch_arxiv_papers(max_results: int = MAX_RESULTS) -> list:
    """Fetches arXiv papers published within 24 hours of the newest result."""
    query = " OR ".join(f"cat:{paper_type}" for paper_type in PAPER_TYPES)
    client = arxiv.Client()
    # arxiv.Search describes the query; client.results() executes it.
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
        sort_order=arxiv.SortOrder.Descending,
    )
    papers = list(client.results(search))

    if not papers:
        logger.warning("No papers found.")
        return []

    # Results are sorted newest first, so papers[0] anchors the 24-hour window.
    latest_published = papers[0].published
    threshold = latest_published - datetime.timedelta(hours=24)
    filtered_papers = [paper for paper in papers if paper.published >= threshold]

    return [
        {
            "title": paper.title,
            "summary": paper.summary,
            "pdf_url": paper.pdf_url,
            "published": paper.published,
        }
        for paper in filtered_papers
    ]


def summarize_paper(abstract_text: str) -> str:
    """Generates a summary of the paper abstract using Google Gemini."""
    try:
        genai.configure(api_key=GEMINI_API_KEY)
        model = genai.GenerativeModel(GEMINI_MODEL)
        prompt = (
            "Summarize the following paper abstract concisely (under 300 characters) for beginners, "
            "including significance and results. Output only the summary.\n---\n"
            f"{abstract_text}"
        )
        response = model.generate_content(prompt)
        return response.text.strip()
    except Exception as e:
        logger.error(f"Error summarizing paper: {e}")
        return "Error generating summary."


def post_to_slack(papers: list) -> None:
    """Posts the paper summaries to the specified Slack channel."""
    if not papers:
        return

    client = WebClient(token=SLACK_BOT_TOKEN)
    messages = []
    for i, paper in enumerate(papers, 1):
        summary = summarize_paper(paper["summary"])  # Summarize here, not in main loop
        time.sleep(1)  # Small pause between Gemini calls to stay under rate limits
        message = (
            f"{i}. *{paper['title']}*\n"
            f"{summary}\n"
            f"PDF: {paper['pdf_url']}\n"
            f"Published: {paper['published']}\n"
            "────────────────────────"
        )
        messages.append(message)

    # One message per paper block, separated by newlines.
    all_messages = "\n".join(messages)

    try:
        result = client.chat_postMessage(channel=SLACK_CHANNEL, text=all_messages)
        logger.info(f"Slack message sent successfully: {result}")
    except SlackApiError as e:
        logger.error(f"Error posting to Slack: {e}")


def lambda_handler(event, context):
    """AWS Lambda handler function."""
    papers = fetch_arxiv_papers()
    post_to_slack(papers)
    return {
        "statusCode": 200,
        "body": "Successfully processed arXiv papers and posted to Slack.",
    }
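To see how the 24-hour window in fetch_arxiv_papers behaves, here is a small standalone sketch. It uses made-up timestamps in place of real arXiv results, and filter_recent is an illustrative helper name that does not appear in the script.

```python
import datetime


def filter_recent(published_times, hours=24):
    """Keep timestamps within `hours` of the newest one (input is newest first)."""
    if not published_times:
        return []
    threshold = published_times[0] - datetime.timedelta(hours=hours)
    return [t for t in published_times if t >= threshold]


utc = datetime.timezone.utc
# Hypothetical publication times, sorted newest first like the arXiv query.
sample = [
    datetime.datetime(2025, 3, 4, 18, 0, tzinfo=utc),  # newest result
    datetime.datetime(2025, 3, 4, 2, 0, tzinfo=utc),   # 16 hours older: kept
    datetime.datetime(2025, 3, 3, 17, 0, tzinfo=utc),  # 25 hours older: dropped
]
recent = filter_recent(sample)
print(len(recent))  # 2
```

Because the window is anchored to the newest paper rather than to the current time, a run on a day with no new submissions would still return the most recent batch. Whether that is desirable depends on how you want quiet days to look in Slack.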

II. Local Setup and Deployment to AWS Lambda:

  1. Environment Setup: Use pyenv to manage Python versions. Install Python 3.12.
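  2. Install Libraries: Create a folder (e.g., lambda_dependencies) with a python subfolder, then install the required libraries into it. Lambda adds a layer's python/ directory to the import path, so the packages must sit under that name:
    pip install arxiv google-generativeai slack_sdk -t lambda_dependencies/python
  3. Create Zip File: Zip the python folder so that it sits at the root of the archive:
    cd lambda_dependencies && zip -r ../lambda_layer.zip python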
  4. Create AWS Lambda Layer: Upload lambda_layer.zip as a new layer in AWS Lambda. Set architecture to x86_64 and runtime to Python 3.12.
  5. Create AWS Lambda Function: Upload the modified Python code (above) to a new Lambda function. Configure the function to use the created layer. Set environment variables (GEMINI_API_KEY, SLACK_BOT_TOKEN, SLACK_CHANNEL).
  6. Schedule with AWS EventBridge: Create an EventBridge rule with a cron expression (e.g., cron(30 6 * * ? *) for 6:30 AM UTC daily) and set the Lambda function as the target.
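Step 6 can also be scripted rather than clicked through in the console. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper name schedule_daily, the rule name, and the function name and ARN in the usage comment are all hypothetical placeholders. The clients are passed in as arguments so the function can be exercised with stand-ins before touching a real account.

```python
def schedule_daily(events_client, lambda_client, function_name, function_arn,
                   rule_name="daily-arxiv-summary",
                   schedule="cron(30 6 * * ? *)"):
    """Create (or update) an EventBridge rule and target it at a Lambda function."""
    # put_rule creates the rule, or updates it if one with this name exists.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,  # 6:30 AM UTC daily, as in step 6
        State="ENABLED",
    )
    # Allow EventBridge to invoke the function on behalf of this rule.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Attach the Lambda function as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]


# Usage with real clients (placeholder names, requires AWS credentials):
#   import boto3
#   schedule_daily(
#       boto3.client("events"),
#       boto3.client("lambda"),
#       function_name="arxiv-slack-summary",
#       function_arn="arn:aws:lambda:us-east-1:123456789012:function:arxiv-slack-summary",
#   )
```

Note that add_permission fails if a statement with the same StatementId already exists, so a second run of this sketch would need either a different ID or handling for that error.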

III. Important Considerations:

  • Error Handling: The code wraps the Gemini and Slack calls in try…except blocks and logs any failure. This is crucial for reliable operation.
  • Rate Limiting: Be mindful of API rate limits for both arXiv and Gemini. The code includes a small delay (time.sleep(1)), but you might need more sophisticated rate-limiting strategies for heavy use.
  • Security: Never hardcode API keys directly in your code. Always use environment variables.
  • Logging: Comprehensive logging is essential for debugging and monitoring the function’s execution.
  • Testing: Thoroughly test your code locally before deploying it to AWS Lambda.
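For the rate-limiting point, one common step up from a fixed delay is retrying with exponential backoff. The sketch below is an illustration under assumptions, not part of the original script: call_with_backoff and its parameters are hypothetical names, and it retries on any exception, which you would likely narrow to the rate-limit errors your API client actually raises.

```python
import random
import time


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)


# Demonstration with a function that fails twice before succeeding.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))  # ok 2
```

In the script, the wrapper would go around the model.generate_content(prompt) call inside summarize_paper rather than around summarize_paper itself, since that function already catches exceptions and returns a fallback string.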

This setup gives you a robust, secure, and well-documented solution. Remember to supply your actual API keys and Slack channel ID through the environment variables.
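Since the script reads its secrets with os.environ.get, a missing variable silently becomes None and only fails later inside an API call. A minimal sketch of a fail-fast check follows. The helper names missing_env_vars and require_env are hypothetical, while the variable names are the three the script uses.

```python
import os

REQUIRED_VARS = ("GEMINI_API_KEY", "SLACK_BOT_TOKEN", "SLACK_CHANNEL")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_env(env=None):
    """Raise a clear error listing every missing variable."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


# Example with a plain dict standing in for os.environ.
print(missing_env_vars({"GEMINI_API_KEY": "x"}))
# ['SLACK_BOT_TOKEN', 'SLACK_CHANNEL']
```

Calling require_env() at the top of lambda_handler would make a misconfigured deployment fail on its first invocation with a readable message.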


This article was sourced from the web and does not represent the views of 甲倪知识. When reposting, please cite the source: http://www.spjiani.cn/wp/9225.html

Author: nijia
