Related Research Summary
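Summaries of papers on computational notebooks. Each paper is kept in its own folder (notes/<cite-key>/). Every note has three parts: # AI解説 (an objective summary produced by AI), # Q&A (my own questions), and # 自分のコメント (my own comments).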
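Papers are classified in a two-level hierarchy: three top-level themes, each divided into subcategories. How the papers relate to one another (lineage, opposition, complementarity) is summarized in the overall map (overview); the relationship map for state management, checkpointing, and migration is currently published.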
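The three-part note layout described above can be checked mechanically. The following is a minimal sketch, not part of the original repository: the function name `missing_sections` is hypothetical, and it assumes each note is Markdown text whose top-level headings begin with the three titles listed above (the parenthetical descriptions may or may not be part of the heading, so the check only compares prefixes). It also does not skip `# ` lines inside code fences.

```python
import re

# Heading titles taken from the note layout described above.
REQUIRED_HEADINGS = ["AI解説", "Q&A", "自分のコメント"]


def missing_sections(note_text: str) -> list[str]:
    """Return the required top-level headings that do not appear in note_text."""
    # Collect the text of every top-level heading ("# ..."); "## ..." does not match.
    found = [
        m.group(1).strip()
        for m in re.finditer(r"^#\s+(.+)$", note_text, flags=re.MULTILINE)
    ]
    # Prefix comparison tolerates a trailing description after the title.
    return [
        title
        for title in REQUIRED_HEADINGS
        if not any(heading.startswith(title) for heading in found)
    ]


if __name__ == "__main__":
    sample = "# AI解説\n...\n# Q&A\n...\n"
    print(missing_sections(sample))
```

Running the function over the text of every note under notes/ would flag notes that are still missing a section; how the note file inside each folder is named is left open here because the source does not specify it.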
1. State Management, Checkpointing, and Migration
Notebook level (state and time travel)
- ElasticNotebook: Enabling Live Migration for Computational Notebooks
- Kishu: Time-Traveling for Computational Notebooks
- Chipmink: Efficient Delta Identification for Massive Object Graphs
- Multiverse Notebook: Shifting Data Scientists to Time Travelers
- Fork It: Supporting Stateful Alternatives in Computational Notebooks
- NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs
System level (checkpoint/restore and live migration)
- Fast in-memory CRIU for docker containers
- Checkpoint, Restore, and Live Migration for Science Platforms
- Context-aware Execution Migration Tool for Data Science Jupyter Notebooks on Hybrid Clouds
- A Framework to capture and reproduce the Absolute State of Jupyter Notebooks
- ElasticHub: A Cost-Efficient JupyterHub Platform via Automated Scaling with Kubernetes on Hybrid Cloud
2. Notebook Empirical Studies and Datasets
Large-scale studies, surveys, code quality, and reproducibility
- A large-scale study about quality and reproducibility of Jupyter notebooks
- Data Science Through the Looking Glass: Analysis of Millions of GitHub Notebooks and ML.NET Pipelines
- A Systematic Literature Review of Software Engineering Research on Jupyter Notebook
- A large-scale comparison of Python code in Jupyter notebooks and scripts
- Exploration and Explanation in Computational Notebooks (source paper of the UCSD ~1.25M corpus)
- Jupyter Notebooks on GitHub: Characteristics and Code Clones (~2.7M)
- Do Code Quality and Style Issues Differ Across (Non-)Machine Learning Notebooks? Yes!
- Bug Analysis in Jupyter Notebook Projects: An Empirical Study
- Computational reproducibility of Jupyter notebooks from biomedical publications
- Are the Majority of Public Computational Notebooks Pathologically Non-Executable?
Datasets
- Boa Meets Python: A Boa Dataset of Data Science Software in Python Language
- KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle
- Code4ML: a large-scale dataset of annotated Machine Learning code
- DistilKaggle: A Distilled Dataset of Kaggle Jupyter Notebooks
- JuNE: Jupyter Notebooks Executions Dataset (logs of execution events with timestamps)
- JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
- Training and Evaluating a Jupyter Notebook Data Science Assistant (JuPyT5 / DSP benchmark)
Computing environments and notebook tools (provisional)
- Juneau: Data Lake Management for Jupyter
- MOON: Assisting Students in Completing Educational Notebook Scenarios
- Fine-Grained Lineage for Safer Notebook Interactions (nbsafety)
Performance analysis and profiling (SLR §5.2.4)
- Bridging between Data Science and Performance Analysis: Tracing of Jupyter Notebooks
- JUmPER: Performance Data Monitoring, Instrumentation and Visualization for Jupyter Notebooks
- Containerized Jupyter Notebooks: balancing flexibility and performance
- Integrating interactive performance analysis in Jupyter Notebooks for parallel programming education
- Performance Prediction of Jupyter Notebook in JupyterHub using Machine Learning
- Themisto: Jupyter-Based Runtime Benchmark
3. Infrastructure, HPC, and Cloud Operations
HPC × Jupyter integration
- Interactive analysis notebooks on DESY batch resources: Bringing Jupyter to HTCondor and Maxwell at DESY
- Distributed workflows with Jupyter
- Jup2Kub: algorithms and a system to translate a Jupyter Notebook pipeline to a fault tolerant distributed Kubernetes deployment
- Enhancing Research Productivity: Seamless Integration of Personal Devices and HPC Resources with the Cybershuttle Notebook Gateway
- Interactive Supercomputing with Jupyter (NERSC Cori)
- SSH Kernel: A Jupyter Extension Specifically for Remote Infrastructure Administration
JupyterHub deployment and operations
- JupyterHub on an on-premises cloud – a special focus on GPU Accelerated Machine Learning and 3D Visualization
- Integrating Jupyter into Research Computing Ecosystems: Challenges and Successes in Architecting JupyterHub for Collaborative Research Computing Ecosystems (TACC)
- Deploying an Educational JupyterHub for Exploratory Data Analysis, Visualization, and Running Idealized Weather Models on the Jetstream2 Cloud
- Scaling JupyterHub Using Kubernetes on Jetstream Cloud: Platform as a Service for Research and Educational Initiatives in the Atmospheric Sciences
- Deploying Jupyter Notebooks at scale on XSEDE resources for Science Gateways and workshops
- Pedagogy, Infrastructure, and Analytics for Data Science Education at Scale (Berkeley Data 8)
Cloud bursting and HPC policy
- Consideration of a Supercomputing System with Cloud Bursting Functionality from an Operational Perspective
- Exploring Diverse Cloud Bursting Policies Using Pareto Conditioned Networks
- Cloud enabling educational platforms with corc
- Interactive and Urgent HPC: Challenges and Opportunities