Add help documentation for the Shanhe software mirror #92

Open · wants to merge 15 commits into main
4 changes: 2 additions & 2 deletions content/ai/deep_learning/quickstart/deploy_app.md
@@ -1,5 +1,5 @@
---
-title: "Step 1: Deploy the Deep Learning Application"
+title: "Deploy the Deep Learning Application"
description: This section describes how to quickly deploy a Deep Learning application.
keywords:
weight: 10
@@ -13,7 +13,7 @@ draft: false
## Prerequisites

- You have obtained the login account and password for the management console.
-- You have created a [VPC network](https://docsv3.qingcloud.com/network/vpc/manual/vpcnet/10_create_vpc/) and a [private network](https://docsv3.qingcloud.com/network/vpc/manual/vxnet/05_create_vxnet/), and the private network has joined the VPC network.
+- You have created a [VPC network](https://docsv3.shanhe.com/network/vpc/manual/vpcnet/10_create_vpc/) and a [private network](https://docsv3.shanhe.com/network/vpc/manual/vxnet/05_create_vxnet/), and the private network has joined the VPC network.

## Procedure

10 changes: 10 additions & 0 deletions content/chaosuan/_index.md
@@ -0,0 +1,10 @@
---
title: "Supercomputing User Guide"
linkTitle: "Document"
_build:
  render: false
weight: 2
collapsible: true
icon: "/images/icons/index/product-icon-host.svg"
---

Binary file added content/chaosuan/environment/_image/intel-mpi.png
Binary file added content/chaosuan/environment/_image/module-av.png
29 changes: 29 additions & 0 deletions content/chaosuan/environment/_index.md
@@ -0,0 +1,29 @@
---
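title: "Software Environment"
linkTitle: "Software Environment"
weight: 4
collapsible: true
type: "product"

section1:
  title: Software Environment

Section2:
  title: Software Environment
  children:
    - title: Software Loading
      content: Shanhe cluster software environment guide
      url: "softwareload/softwaremoduleload"

    - title: Software Compilation
      content: Shanhe cluster software environment guide
      url: "softwaremake/softwarecompile"

    - title: Software Usage
      content: Shanhe cluster software environment guide
      url: "softwareuse/biology/blast"

    - title: Environment Usage
      content: Shanhe cluster software environment guide
      url: "envinmentuse/conda-use"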
---
8 changes: 8 additions & 0 deletions content/chaosuan/environment/envinmentuse/_index.md
@@ -0,0 +1,8 @@
---
title: "Environment Usage"
linkTitle: "Environment Usage"
_build:
  render: false
weight: 61
collapsible: true
---
84 changes: 84 additions & 0 deletions content/chaosuan/environment/envinmentuse/conda-use.md
@@ -0,0 +1,84 @@
---
title: "conda"
description: Test description
draft: false
enableToc: false
keyword: test
weight: 4
---

## Creating and Using Python Environments with conda

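The cluster's default Python is 2.7, and it has no additional packages installed. If you need Python packages or a newer Python version, we recommend creating your own Python environment and installing the packages into it (the module system provides only a single Python 3.9 environment).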

1. Populate the `.condarc` file in your home directory by running the following command:
```bash
cat > ~/.condarc << "EOF"
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.shanhe.com/anaconda/pkgs/main
  - https://mirrors.shanhe.com/anaconda/pkgs/r
  - https://mirrors.shanhe.com/anaconda/pkgs/msys2
custom_channels:
  bioconda: https://mirrors.shanhe.com/anaconda/cloud
  conda-forge: https://mirrors.shanhe.com/anaconda/cloud
ssl_verify: false
EOF
```

2. Load the conda module:
```bash
module load conda3
```

3. Create a conda environment:
```bash
conda create -n python39 python=3.9
# -n: name of the new environment
# python=3.9: Python version for the new environment (optional)
# Add -y to skip the confirmation prompt during installation
```

4. Use the conda environment:
```bash
# Activate the Python environment
conda activate python39
```
```bash
# Install a package (gatk is distributed through the bioconda channel)
conda install -c bioconda gatk
```
```bash
# Install a specific version
# conda install <package>=<version>
conda install -c bioconda gatk=3.7
```
```bash
# Search for a package
conda search -c bioconda gatk
```
```bash
# Update a package
conda update -c bioconda gatk
```
```bash
# Remove a package
conda remove gatk
```
5. Load the conda environment automatically at login:
```bash
# To have your account load the conda environment automatically once it has been created,
# add the following commands to ~/.bashrc
module load conda3
conda activate python39
```
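Step 5 appends lines to `~/.bashrc`, so running that setup twice leaves duplicate lines behind. Below is a minimal sketch of an idempotent version. The function name `add_conda_autoload` is hypothetical. It assumes the environment is called `python39`, as in step 3, unless another name is passed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the step-5 lines to a shell startup file,
# skipping any line that is already present so re-running is harmless.
# Arguments (both optional): startup file, conda environment name.
add_conda_autoload() {
  local rc=${1:-$HOME/.bashrc} env_name=${2:-python39} line
  touch "$rc"
  for line in "module load conda3" "conda activate $env_name"; do
    # -q: quiet, -x: match the whole line, -F: treat the text literally
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
  done
}
```

Run `add_conda_autoload` with no arguments to update `~/.bashrc`, or pass another file and environment name, e.g. `add_conda_autoload ~/.bashrc myenv`. Because the check matches whole lines literally, a commented-out copy of a line does not count as present.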

***

### Appendix: Installing packages with pip
```bash
# Example: install the numpy library with pip
pip install numpy -i https://mirrors.shanhe.com/simple --trusted-host mirrors.shanhe.com
```
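You do not have to pass `-i` and `--trusted-host` on every call. The same mirror settings can instead be stored in pip's configuration file. The sketch below shows one way to write that file. The function name is hypothetical, and the sketch assumes the per-user path `~/.config/pip/pip.conf`, which pip reads on Linux. It overwrites any existing file at that path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: persist the Shanhe mirror settings used above.
# Assumption: per-user pip config at ~/.config/pip/pip.conf (overridable via $1).
write_pip_conf() {
  local conf=${1:-$HOME/.config/pip/pip.conf}
  mkdir -p "$(dirname "$conf")"
  # Quoted "EOF" keeps the here-document literal (no variable expansion).
  cat > "$conf" << "EOF"
[global]
index-url = https://mirrors.shanhe.com/simple
trusted-host = mirrors.shanhe.com
EOF
}
```

Afterwards, `pip config list` should show `global.index-url` and `global.trusted-host`, and a plain `pip install numpy` uses the mirror.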
113 changes: 113 additions & 0 deletions content/chaosuan/environment/envinmentuse/google-auth.md
@@ -0,0 +1,113 @@
---
title: "google"
description: Test description
draft: false
enableToc: false
keyword: test
weight: 4
---

## Google Authenticator

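Google Authenticator is a software token for two-step verification that implements TOTP and HOTP. Google's authentication services use it. Its algorithms are specified in RFC 6238 and RFC 4226.

Google Authenticator gives the user a 6- to 8-digit one-time password as an additional verification step when logging in to Google or other sites. It can also generate codes for third-party applications, such as password managers or network drives.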
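To make RFC 6238 and RFC 4226 concrete, below is a minimal bash sketch of how a 6-digit code is derived from the Base32 secret key that `google-authenticator` prints. It assumes bash 4+ and GNU coreutils (for `base32 -d`), and uses SHA-1, a 30-second step and 6 digits, which are the tool's time-based defaults. The function names are hypothetical. The sketch is meant for understanding or spot-checking codes, not as a replacement for the app.

```bash
#!/usr/bin/env bash
# Sketch: TOTP (RFC 6238) with HMAC-SHA1, 30 s step, 6 digits,
# using only bash and coreutils (base32, sha1sum, od, sed, cut, tr).

# Write the bytes of a hex string (e.g. "0a1b") to stdout.
hex_to_bin() {
  printf "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# Decode a Base32 secret (as printed by google-authenticator) to hex.
b32_to_hex() {
  local s=${1^^}
  while (( ${#s} % 8 )); do s+="="; done   # base32 -d expects '=' padding
  printf '%s' "$s" | base32 -d | od -An -tx1 -v | tr -d ' \n'
}

# HMAC-SHA1 (RFC 2104) of hex message $2 under hex key $1.
# Assumes the key is at most 64 bytes, which holds for typical TOTP secrets.
hmac_sha1_hex() {
  local key=$1 ipad="" opad="" i b hx inner
  while (( ${#key} < 128 )); do key+="00"; done   # zero-pad to SHA-1's 64-byte block
  for (( i = 0; i < 128; i += 2 )); do
    b=$(( 16#${key:i:2} ))
    printf -v hx '%02x' $(( b ^ 0x36 )); ipad+=$hx
    printf -v hx '%02x' $(( b ^ 0x5c )); opad+=$hx
  done
  inner=$( { hex_to_bin "$ipad"; hex_to_bin "$2"; } | sha1sum | cut -c1-40 )
  { hex_to_bin "$opad"; hex_to_bin "$inner"; } | sha1sum | cut -c1-40
}

# HOTP (RFC 4226): HMAC the 8-byte big-endian counter, then dynamic truncation.
hotp() {
  local key_hex=$1 counter=$2 msg mac offset bin
  printf -v msg '%016x' "$counter"
  mac=$(hmac_sha1_hex "$key_hex" "$msg")
  offset=$(( 16#${mac:39:1} ))                   # low 4 bits of the last byte
  bin=$(( 16#${mac:offset*2:8} & 0x7fffffff ))   # 31-bit value at that offset
  printf '%06d\n' $(( bin % 1000000 ))
}

# TOTP (RFC 6238): HOTP with counter = floor(unix_time / 30).
totp() {
  local secret_b32=$1 now=${2:-$(date +%s)}
  hotp "$(b32_to_hex "$secret_b32")" $(( now / 30 ))
}
```

For example, `totp XQ2WB526GLPJ7SI64Z3RZISOEE` prints the current code for the sample key in the session shown in step 4. It should match the phone app when both clocks are synchronized. You can also pass a Unix time as the second argument. For example, `totp GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ 59` reproduces RFC 6238's SHA-1 test vector in its 6-digit form, `287082`.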

***
#### 1. Installation and Usage
1. Install the software

a. Install the prebuilt package
```bash
yum install -y epel-* mercurial autoconf automake libtool pam-devel

yum install -y google-authenticator
```
b. Build from source
```bash
yum install -y epel-* mercurial autoconf automake libtool pam-devel git

git clone https://github.com/google/google-authenticator-libpam.git

cd google-authenticator-libpam/   # enter the cloned source directory
chmod +x bootstrap.sh             # make the script executable
./bootstrap.sh
./configure
make install
ln -s /usr/local/lib/security/pam_google_authenticator.so /usr/lib64/security/pam_google_authenticator.so
```

2. Configure PAM
```bash
# Edit the file and add the line below
vim /etc/pam.d/sshd
#   auth required pam_google_authenticator.so
# or append the line directly
echo "auth required pam_google_authenticator.so" >> /etc/pam.d/sshd
```

3. Configure SSH
```bash
# In /etc/ssh/sshd_config, change
#   ChallengeResponseAuthentication no
# to
#   ChallengeResponseAuthentication yes
vim /etc/ssh/sshd_config
# or make the same change with sed
sed -i 's/ChallengeResponseAuthentication no/ChallengeResponseAuthentication yes/g' /etc/ssh/sshd_config
# Restart the sshd service
systemctl restart sshd.service
```

4. Run google-authenticator

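In short, run the `google-authenticator` program and answer `y` to every prompt. Along the way, bind your phone by scanning the QR code or entering the secret key. A full session is shown below.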
```bash
google-authenticator
Do you want authentication tokens to be time-based (y/n) y
Warning: pasting the following URL into your browser exposes the OTP secret to Google:
https://www.google.com/chart?chs=200x200&chld=M|0&cht=qr&chl=otpauth://totp/root@demo%3Fsecret%3DXQ2WB526GLPJ7SI64Z3RZISOEE%26issuer%3Ddemo




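[A QR code is displayed here. Install the Google Authenticator app on your phone and scan the code to bind the account.]
[The app can be installed from the app store on both Android and iOS phones.]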



Your new secret key is: XQ2WB526GLPJ7SI64Z3RZISOEE
Your verification code is 917990
Your emergency scratch codes are:
42623319
72314571
14476695
95764389
38976136

Do you want me to update your "/root/.google_authenticator" file? (y/n) y

Do you want to disallow multiple uses of the same authentication
token? This restricts you to one login about every 30s, but it increases
your chances to notice or even prevent man-in-the-middle attacks (y/n) y

By default, a new token is generated every 30 seconds by the mobile app.
In order to compensate for possible time-skew between the client and the server,
we allow an extra token before and after the current time. This allows for a
time skew of up to 30 seconds between authentication server and client. If you
experience problems with poor time synchronization, you can increase the window
from its default size of 3 permitted codes (one previous code, the current
code, the next code) to 17 permitted codes (the 8 previous codes, the current
code, and the 8 next codes). This will permit for a time skew of up to 4 minutes
between client and server.
Do you want to do so? (y/n) y

If the computer that you are logging into isn't hardened against brute-force
login attempts, you can enable rate-limiting for the authentication module.
By default, this limits attackers to no more than 3 login attempts every 30s.
Do you want to enable rate-limiting? (y/n) y
```

5. Log in

a. Xshell
![Xshell login with Google Authenticator](../../0-image/xshell-google-au.png)
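Steps 2 and 3 edit system files by hand, and there are two problems with that. Repeating step 2 appends the PAM line twice. The `sed` command in step 3 only matches an uncommented `ChallengeResponseAuthentication no` line. The sketch below applies both changes idempotently. The function name is hypothetical, and the file paths are parameters so that it can be tried on copies first. On a real node it must run as root against `/etc/pam.d/sshd` and `/etc/ssh/sshd_config`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the PAM (step 2) and sshd (step 3) changes idempotently.
# Arguments (optional): PAM file, sshd config file.
enable_ssh_totp() {
  local pam_file=${1:-/etc/pam.d/sshd} sshd_conf=${2:-/etc/ssh/sshd_config}
  local pam_line='auth required pam_google_authenticator.so'

  # Step 2: add the PAM line only if it is not already present.
  grep -qxF "$pam_line" "$pam_file" || echo "$pam_line" >> "$pam_file"

  # Step 3: force "ChallengeResponseAuthentication yes", whether the existing
  # line says "no", is commented out, or is missing entirely.
  if grep -qE '^#?[[:space:]]*ChallengeResponseAuthentication' "$sshd_conf"; then
    sed -i -E 's/^#?[[:space:]]*ChallengeResponseAuthentication.*/ChallengeResponseAuthentication yes/' "$sshd_conf"
  else
    echo 'ChallengeResponseAuthentication yes' >> "$sshd_conf"
  fi
}
```

On a node, run `enable_ssh_totp && sshd -t && systemctl restart sshd.service`. Here `sshd -t` checks the configuration for errors before the restart. Keep an existing SSH session open until a new login with a verification code succeeds. On newer OpenSSH releases `ChallengeResponseAuthentication` is a deprecated alias of `KbdInteractiveAuthentication`, so check which name your version's `sshd_config` uses.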
87 changes: 87 additions & 0 deletions content/chaosuan/environment/envinmentuse/hpcx.md
@@ -0,0 +1,87 @@
---
title: "hpcx"
description: Test description
draft: false
enableToc: false
keyword: test
weight: 4
---

# AI Platform: a Docker-based virtualization platform for AI training and GPU-accelerated computing

## 1. Usage Notes

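1. The shared directory is `/<username>`. All development environments created under the current user share this same directory.
2. The network supports InfiniBand (IB). The IB drivers are installed on the physical host, not in the current image, so to use IB you must specify the network interface explicitly.
3. Because the network drivers live on the physical host, building MPI from source is not recommended. Use the MPI provided by the HPC-X environment instead.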

## 2. Basic Platform Operations

Refer to the Inspur user operation manual for regular users.

### Using MPI on the AI platform: HPC-X

Download page: https://developer.nvidia.com/networking/hpc-x

#### Installation steps

```
cd hpcx
export HPCX_HOME=$PWD
```

#### Building Open MPI for the current environment

```
$ tar xfp ${HPCX_HOME}/sources/openmpi-gitclone.tar.gz
$ cd ${HPCX_HOME}/sources/openmpi-gitclone
$ ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=${HPCX_HOME}/ompi-icc \
--with-hcoll=${HPCX_HOME}/hcoll \
--with-ucx=${HPCX_HOME}/ucx \
--with-platform=contrib/platform/mellanox/optimized \
2>&1 | tee config-icc-output.log
$ make -j32 all 2>&1 | tee build_icc.log && make -j24 install 2>&1 | tee install_icc.log
```

### Activating HPC-X

```bash
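cd hpcx-v2.12-gcc-MLNX_OFED_LINUX-5-ubuntu18.04-cuda11-gdrcopy2-nccl2.12-x86_64/
export HPCX_HOME=$PWD   ## set after cd so that HPCX_HOME points at the HPC-X root
source $HPCX_HOME/hpcx-init.sh
hpcx_load

## Alternatively, load HPC-X as an environment module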
module use $HPCX_HOME/modulefiles
module load hpcx
```

### mpirun command-line options

```bash
QUDA_ENABLE_P2P=3 /yangybai11/hpcx-v2.12-gcc-MLNX_OFED_LINUX-5-ubuntu18.04-cuda11-gdrcopy2-nccl2.12-x86_64/ompi/bin/mpirun --allow-run-as-root -np 16 --host 192.208.79.37:8,192.224.8.14:8 \
  -bind-to none -map-by slot -x LD_LIBRARY_PATH -x HOROVOD_MPI_THREADS_DISABLE=1 -x PATH -mca pml ucx \
  -x NCCL_DEBUG=INFO -x NCCL_TREE_THRESHOLD=0 -x UCX_LOG_LEVEL=info ./hmc -i s1.0_restart_37540.xml -geom 1 2 2 4
## --allow-run-as-root: allow mpirun to run as the root user
## -np 16 --host IP:8,IP:8: start 16 processes, 8 slots on each of the two nodes
## -x VAR[=value]: export the environment variable to the remote processes
## -mca pml ucx: explicitly use UCX as Open MPI's point-to-point messaging layer (PML)
```
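In the command above, `-np 16` equals the sum of the slots in `--host 192.208.79.37:8,192.224.8.14:8`. When the node list changes, both values can be generated from a plain file. Below is a minimal sketch of that. The helper name and the input format (one `IP SLOTS` pair per line) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "-np N --host ip:slots,..." for mpirun.
# Assumed input format: one "IP SLOTS" pair per line; lines starting with '#' are skipped.
mpi_host_args() {
  local ip slots hosts="" np=0
  while read -r ip slots || [[ -n $ip ]]; do   # also handle a last line without newline
    [[ -z $ip || $ip == \#* ]] && continue
    slots=${slots:-1}                          # default to one slot when omitted
    hosts+="${hosts:+,}${ip}:${slots}"
    np=$(( np + slots ))
  done < "$1"
  printf -- '-np %d --host %s\n' "$np" "$hosts"
}
```

For example, suppose `nodes.txt` lists the two nodes above with 8 slots each. Then `mpirun $(mpi_host_args nodes.txt) -bind-to none -map-by slot ... ./hmc ...` expands to the same `-np 16 --host ...` arguments. Open MPI can also read such a list directly through `--hostfile`, using lines of the form `192.208.79.37 slots=8`.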

```bash
/etc/hosts          ## for cross-node jobs, the nodes may need SSH access to one another
~/.ssh/known_hosts
```

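Note: this platform's network lives on the physical host, which causes problems when it is accessed from inside an instance. Use HPC-X (https://developer.nvidia.com/networking/hpc-x) instead. The package used in this guide can be downloaded directly from: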

```
https://content.mellanox.com/hpc/hpc-x/v2.12/hpcx-v2.12-gcc-MLNX_OFED_LINUX-5-ubuntu18.04-cuda11-gdrcopy2-nccl2.12-x86_64.tbz
```





22 changes: 22 additions & 0 deletions content/chaosuan/environment/envinmentuse/intel-compiler.md
@@ -0,0 +1,22 @@
---
title: "intel"
description: Test description
draft: false
enableToc: false
keyword: test
weight: 4
---

## Loading the Intel environment

This cluster provides the Intel compilers, versions 2015 to 2020, and the Intel oneAPI compilers, versions 2021 to 2022.

![png](/chaosuan/environment/_image/intel-mpi.png)

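The environment is split into two modules, `intel` and `intelmpi` (the `intel` module includes the MKL environment). Unless you have a specific reason not to, load both, for example: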

```bash
module load intel/2022 intelmpi/2022
```

They are split because sometimes only Intel MPI is needed.
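Because `intel` and `intelmpi` are loaded separately, it can help to confirm which tools are actually on `PATH` after `module load`. The helper below is a hypothetical sketch, not part of the cluster's tooling. The command names in the usage example are typical of the Intel compilers and Intel MPI but may differ between versions; oneAPI, for instance, also provides `icx` and `ifx`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show where each named command resolves on PATH.
# Prints "MISSING" for commands that are not found and returns 1 if any are missing.
check_tools() {
  local cmd path missing=0
  for cmd in "$@"; do
    if path=$(command -v "$cmd"); then
      printf '%-8s %s\n' "$cmd" "$path"
    else
      printf '%-8s MISSING\n' "$cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `module load intel/2022 intelmpi/2022 && check_tools icc ifort mpiicc mpirun`. If only `mpiicc` is reported as `MISSING`, the `intelmpi` module was likely not loaded.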