[DataCap Application] <Duitang> - <Arts&Recreation>3 #30

Open
2 tasks done
Mark-duitang opened this issue Nov 13, 2024 · 85 comments

Comments

Mark-duitang commented Nov 13, 2024

Data Owner Name

Duitang Information Technology (Shanghai) Co., Ltd.

Data Owner Country/Region

China

Data Owner Industry

Arts & Recreation

Website

https://www.duitang.com/

Social Media Handle

https://weibo.com/duitang/

Social Media Type

WeChat

What is your role related to the dataset

Dataset Owner

Total amount of DataCap being requested

10PiB

Expected size of single dataset (one copy)

2.5PiB

Number of replicas to store

4

Weekly allocation of DataCap requested

1.25PiB

On-chain address for first allocation

f3qgtbnk6h3dfg6zwkqwddez6rj7lvrcvurvltuk3q6oqsnui4ptcp5adrwddy5nchopjs7r5ple5rylofbvpq

Data Type of Application

Public, Open Commercial/Enterprise

Custom multisig

  • Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

Duitang.com is an interest community focused on sharing aesthetic wallpapers, gathering tens of millions of wallpaper enthusiasts and creators. Here, users can find a wide variety of high-definition pictures, including popular avatars, landscape wallpapers, girl backgrounds, anime character wallpapers, etc., to meet the personalized needs of different users.

As a community platform centered on aesthetic wallpapers, the content within the community covers multiple fields, from wallpapers of popular IPs to works of original designers, from natural landscape photography to the reinterpretation of anime illustrations. Users can not only browse and download these aesthetic wallpapers, but also interact and communicate with other users, and even upload their own works to share their creativity and talents with more people.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

Community HD wallpapers, gifs, videos, etc.

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (Country/Region)

China

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

Graphsplit

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

We have applied before under other allocators; the applications are:
https://github.com/VenusOfficial/Pathway-VFDA/issues/54
https://github.com/FilscanOfficial/filscan-backend/issues/23
We are applying again because a single allocator can only provide a limited, intermittent supply of DataCap. To meet our storage speed and volume needs, we spread our applications across allocators. The data is still stored in 4 copies as promised, and each allocator receives entirely new data.

Please share a sample of the data

https://www.duitang.com/blog/?id=1221808369
https://www.duitang.com/blog/?id=939087415
https://www.duitang.com/blog/?id=1516930252

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • I confirm

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

2 to 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Others

How did you find your storage providers

Slack, Big Data Exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you used

No response

Please list the provider IDs and location of the storage providers you will be working with.

1. f03187925 USA
2. f03157902 Los Angeles, USA
3. f03136895 Virginia, USA
4. f03224828 London, UK

How do you plan to make deals to your storage providers

Boost client, Lotus client, Droplet client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

datacap-bot (bot) commented Nov 13, 2024

Application is waiting for allocator review

ipfsforcezuofu (Owner) commented:

@Mark-duitang
Thank you for reaching out to us. Please provide the application materials for KYB/KYC via email [email protected]. Thanks!

Mark-duitang (Author) commented:

@ipfsforcezuofu
Thank you for your reply. I have sent the proof materials to your email address. Please check it. Thank you!

ipfsforcezuofu (Owner) commented:

@Mark-duitang
Thank you for providing the information about your organization and the location proofs for the 4 SPs. We have posted them for community review. The KYB and KYC processes have been completed. Regarding your 10 PiB DC request, unfortunately, we cannot guarantee that amount now or in the near future. Since this is our first collaboration, we can allocate 500 TiB of DC to you. Future tranches of DC will be adjusted based on your compliance and our availability. Please let me know if this is acceptable to you.

Mark-duitang (Author) commented:

@ipfsforcezuofu
I can accept that, thank you very much!

datacap-bot (bot) commented Nov 13, 2024

Datacap Request Trigger

Total DataCap requested

10PiB

Expected weekly DataCap usage rate

1.25PiB

DataCap Amount - First Tranche

500TiB

Client address

f3qgtbnk6h3dfg6zwkqwddez6rj7lvrcvurvltuk3q6oqsnui4ptcp5adrwddy5nchopjs7r5ple5rylofbvpq

datacap-bot (bot) commented Nov 13, 2024

DataCap Allocation requested

Multisig Notary address

Client address

f3qgtbnk6h3dfg6zwkqwddez6rj7lvrcvurvltuk3q6oqsnui4ptcp5adrwddy5nchopjs7r5ple5rylofbvpq

DataCap allocation requested

500TiB

Id

dbd715cd-752b-4355-8097-472f1e30da54

datacap-bot (bot) commented Nov 13, 2024

Application is ready to sign

datacap-bot (bot) commented Nov 13, 2024

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecyzzate4kj7jpjnlyxd3cnokffudvxfz2w5bwefkpox3zd4pvalg

Address

f3qgtbnk6h3dfg6zwkqwddez6rj7lvrcvurvltuk3q6oqsnui4ptcp5adrwddy5nchopjs7r5ple5rylofbvpq

Datacap Allocated

500TiB

Signer Address

f1x4nh2yvv2o2wwr4f7l7ocuenz7trdv7z5oqlgni

Id

dbd715cd-752b-4355-8097-472f1e30da54

You can check the status here https://filfox.info/en/message/bafy2bzacecyzzate4kj7jpjnlyxd3cnokffudvxfz2w5bwefkpox3zd4pvalg

datacap-bot (bot) commented Nov 13, 2024

Application is Granted

ipfsforcezuofu (Owner) commented:

@Mark-duitang

According to our original DC allocation plan, your tranches would have been 620, 1,250, 2,500, and 5,630 TiB. However, based on the nature of your dataset and our current DC availability, we are assigning 500 TiB to you as the first tranche. Because the total demand from all our clients exceeds our capacity, later tranches will also be 500 TiB each, provided your compliance is satisfactory and DC remains available. Please be aware that DC service may be paused temporarily while we apply for new DC from governance, and plan your data storage accordingly.
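The allocation plan is quoted in TiB while the request is stated in PiB. A minimal sketch of the arithmetic, assuming binary units (1 PiB = 1,024 TiB) and a flat 500 TiB tranche from here on; the helper names `pib_to_tib` and `tranches_needed` are hypothetical:

```python
import math

# Binary units, matching the TiB/PiB figures used in this thread: 1 PiB = 1024 TiB.
TIB_PER_PIB = 1024


def pib_to_tib(pib: float) -> float:
    """Convert pebibytes to tebibytes."""
    return pib * TIB_PER_PIB


def tranches_needed(total_tib: float, tranche_tib: float) -> int:
    """Fixed-size tranches needed to cover total_tib (the last one may be partial)."""
    return math.ceil(total_tib / tranche_tib)


original_plan_tib = [620, 1250, 2500, 5630]  # the allocator's original plan, in TiB
requested_tib = pib_to_tib(10)               # the 10 PiB request

print(sum(original_plan_tib))                # 10000
print(requested_tib)                         # 10240
print(tranches_needed(requested_tib, 500))   # 21
print(pib_to_tib(1.25))                      # 1280.0
```

Under these assumptions the original four-tranche plan totals 10,000 TiB, just under the 10,240 TiB requested, and a flat 500 TiB schedule would need 21 tranches to cover the full request.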

Mark-duitang (Author) commented:

@ipfsforcezuofu
OK, thanks for the advance notice. Also, another of our SPs, f03224648, reports that its remaining storage is running out and that it may need to pause for a while. When f03224648 stops, we will add a new SP, f03136895 (region: Virginia, US). I will send you the KYC information by email. Thank you!

ipfsforcezuofu (Owner) commented:

@Mark-duitang
Thank you for providing the utility bill to verify the new SP's location. The KYC process for SP f03136895 has been completed.

Mark-duitang (Author) commented:

@ipfsforcezuofu
Two days ago I let you know that SP f03224648 would be replaced by SP f03136895 once its storage is used up. However, SP f03136895 is temporarily unable to cooperate because of its own operational issues, so we have found a new SP, f03087446. I will send you the relevant materials by email. Sorry for the trouble.

Mark-duitang (Author) commented:

@ipfsforcezuofu
Newly added SP: f03087446
Region: Los Angeles, USA
The KYC information has been sent to your email; please check it. Thank you!

ipfsforcezuofu (Owner) commented:

@Mark-duitang
Thank you for providing the utility bill to verify the new SP's location. The KYC process for SP f03087446 has been completed.

datacap-bot (bot) commented Nov 15, 2024

Client used 75% of the allocated DataCap. Consider allocating next tranche.

ipfsforcezuofu (Owner) commented:

checker:manualTrigger

datacap-bot (bot) commented Nov 16, 2024

DataCap and CID Checker Report Summary¹

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

⚠️ 25.00% of Storage Providers have retrieval success rate less than 75%.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients²

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

ipfsforcezuofu (Owner) commented:

@Mark-duitang
The compliance report shows the data is healthy. I will continue to assign 500 TiB.

datacap-bot (bot) commented Dec 24, 2024

DataCap and CID Checker Report Summary¹

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 25% of total datacap - f03187925: 25.03%

⚠️ 33.33% of Storage Providers have retrieval success rate less than 75%.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients²

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
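Both warnings in these reports are threshold checks over per-SP figures. A minimal sketch, assuming the checker computes each SP's share of the client's total sealed DataCap and compares each SP's retrieval success rate with 75%; `ProviderStats`, `check_providers` and the sample figures are hypothetical and are not taken from the CID Checker's code or from this application's data:

```python
from dataclasses import dataclass

MAX_SHARE = 0.25      # warn if one SP holds more than 25% of the client's sealed DataCap
MIN_RETRIEVAL = 0.75  # warn if an SP's retrieval success rate is below 75%


@dataclass
class ProviderStats:
    provider_id: str
    sealed_tib: float         # DataCap this SP has sealed for the client
    retrieval_success: float  # fraction of sampled retrievals that succeeded, 0..1


def check_providers(stats: list[ProviderStats]):
    """Return (SPs over the share cap, SPs below the retrieval bar, fraction below the bar)."""
    total = sum(s.sealed_tib for s in stats)
    over_share = [
        (s.provider_id, s.sealed_tib / total)
        for s in stats
        if s.sealed_tib / total > MAX_SHARE
    ]
    low_retrieval = [s.provider_id for s in stats if s.retrieval_success < MIN_RETRIEVAL]
    return over_share, low_retrieval, len(low_retrieval) / len(stats)


# Hypothetical figures for illustration only.
sample = [
    ProviderStats("sp_a", 260, 0.90),
    ProviderStats("sp_b", 190, 0.50),
    ProviderStats("sp_c", 200, 0.80),
    ProviderStats("sp_d", 150, 0.95),
    ProviderStats("sp_e", 100, 0.30),
    ProviderStats("sp_f", 100, 0.85),
]
over, low, low_fraction = check_providers(sample)
print(over)                   # [('sp_a', 0.26)]
print(low)                    # ['sp_b', 'sp_e']
print(f"{low_fraction:.2%}")  # 33.33%
```

The sample trips both warnings: one SP holds 26% of the sealed total, and two of the six SPs fall below the retrieval bar.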

ipfsforcezuofu (Owner) commented:

@Mark-duitang
Compared to the previous report, the retrieval rate for f03136895 has improved from 11.91% to 26.29%. However, it is still relatively low. On the other hand, the retrieval rate for SP f03224828 has decreased significantly from 91.56% to 50.92%. Could you please check and provide clarification?

Mark-duitang (Author) commented:

@ipfsforcezuofu
Yes, we are looking into this issue, but we have not found the cause yet. We have checked our retrieval setup and everything appears normal. I am re-submitting the root CIDs I sent earlier and retrying retrieval to see whether that helps. If there is still no improvement, I will have to ask you to request the retrieval error logs from the official team.

ipfsforcezuofu (Owner) commented:

checker:manualTrigger

datacap-bot (bot) commented Dec 25, 2024

DataCap and CID Checker Report Summary¹

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 25% of total datacap - f03187925: 25.03%

⚠️ 33.33% of Storage Providers have retrieval success rate less than 75%.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients²

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

ipfsforcezuofu (Owner) commented:

@Mark-duitang
The newly generated report shows a slight improvement in the retrieval rate for the two SPs. If the situation does not improve by the next report, an issue will be raised with the governance team for further investigation. For now, I will assign an additional 1.25 PiB (1,280 TiB) for this round. Please make every effort to improve the retrieval rate. Thank you.

datacap-bot (bot) commented Dec 25, 2024

Application is in Refill

datacap-bot (bot) commented Dec 25, 2024

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedtoaummfg7wdzdckgxwkdwq6gyazvxfffusowioqfg7wutxzfcqo

Address

f3qgtbnk6h3dfg6zwkqwddez6rj7lvrcvurvltuk3q6oqsnui4ptcp5adrwddy5nchopjs7r5ple5rylofbvpq

Datacap Allocated

1.25PiB

Signer Address

f1x4nh2yvv2o2wwr4f7l7ocuenz7trdv7z5oqlgni

Id

40257614-1074-48ab-ab64-29911176fd09

You can check the status here https://filfox.info/en/message/bafy2bzacedtoaummfg7wdzdckgxwkdwq6gyazvxfffusowioqfg7wutxzfcqo

datacap-bot (bot) commented Dec 25, 2024

Application is Granted

datacap-bot added the granted label and removed the Refill label on Dec 25, 2024

datacap-bot (bot) commented Dec 27, 2024

Client used 75% of the allocated DataCap. Consider allocating next tranche.

ipfsforcezuofu (Owner) commented:

@Mark-duitang
Please note that DC assignment will resume once the DC is refreshed later.

ipfsforcezuofu (Owner) commented:

checker:manualTrigger

datacap-bot (bot) commented Dec 30, 2024

DataCap and CID Checker Report Summary¹

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 25% of total datacap - f03187925: 25.03%

⚠️ 33.33% of Storage Providers have retrieval success rate less than 75%.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients²

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

ipfsforcezuofu (Owner) commented:

@Mark-duitang
As highlighted by FIL+ governance, there are some potential issues, including the use of VPNs, unclear dataset content and preparation process, and receiving DC from other allocators. Could you please review the details at filecoin-project/Allocator-Governance#264 and provide an explanation as soon as possible? Thank you for your cooperation.

Mark-duitang (Author) commented:

@ipfsforcezuofu
Hello, I am happy to cooperate with your review. Please see the following points:

  1. Number of copies: We have always stored 4 copies. 9 is the number of SPs, not the number of stored copies. Because of some SPs' own operational problems and insufficient pledge collateral, we have had to replace SPs midway.
  2. About VPN: I asked the SPs, and they said they do not use a VPN service. How did you determine that the SPs are using a VPN? Could you provide some evidence so that I can follow up with them?
  3. Data preparation and dataset content:
    Because the original files vary in size, we split and merge the original datasets and then package the data files. Data content: we are a picture-sharing community (covering 27 categories such as movies, wallpapers, avatars, emoticons, animated pictures, weddings, food, and pets), with billions of high-definition, high-quality pictures.
  4. DataCap obtained from two other allocators:
    Since DataCap from allocators is often in short supply and we do not want storage interruptions, we have applied to multiple allocators.
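Point 3 describes splitting large source files and merging small ones so that the packaged data files come out at consistent sizes. A minimal sketch of that idea, assuming a fixed target pack size and files given as (name, size) pairs; `pack_files` is hypothetical and is not part of Graphsplit or of Duitang's actual pipeline:

```python
def pack_files(files, target):
    """Split large files and merge small ones into packs of at most `target` bytes.

    files:  list of (name, size_in_bytes) pairs, in the order they should be packed.
    Returns a list of packs; each pack is a list of (name, offset, length) segments.
    Zero-length files produce no segments.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    packs, current, used = [], [], 0
    for name, size in files:
        offset = 0
        while offset < size:
            take = min(size - offset, target - used)  # fill the current pack
            current.append((name, offset, take))
            offset += take
            used += take
            if used == target:                         # pack full: start a new one
                packs.append(current)
                current, used = [], 0
    if current:                                        # last, possibly smaller, pack
        packs.append(current)
    return packs


packs = pack_files([("a.jpg", 5), ("b.mp4", 12), ("c.gif", 3)], target=8)
for pack in packs:
    print(sum(length for _, _, length in pack), pack)
```

Here `b.mp4` is split across three packs while `a.jpg` and `c.gif` are merged with its pieces, giving packs of 8, 8 and 4 bytes.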

filecoin-watchdog commented:

@Mark-duitang

Number of copies: We have always stored 4 copies. 9 is the number of SPs, not the number of stored copies. (...)

The "Unique Data Bytes by Number of Providers" section in the report shows how many times the same data has been stored. For instance, it shows that you have already stored nearly 200 TiB of data 9 times, almost 100 TiB 8 times, and so on.

It’s important to note that this section does not necessarily reflect the number of storage providers. For example, if you split the dataset and stored one half with 4 providers and the other with 4 different providers, this section would still display 4 copies.

I hope this explanation clarifies the concept.
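The distinction between "number of SPs" and "number of copies" can be illustrated with a small grouping function. A minimal sketch, assuming the report groups deals by piece CID and counts the distinct providers holding each piece; `bytes_by_provider_count` and the piece/provider names are hypothetical and are not the checker's actual code:

```python
from collections import defaultdict


def bytes_by_provider_count(deals):
    """Group unique data by how many distinct providers store it.

    deals: iterable of (piece_cid, piece_size, provider_id) tuples.
    Returns {number_of_distinct_providers: total_size_of_those_pieces}.
    """
    providers = defaultdict(set)
    sizes = {}
    for piece_cid, size, provider in deals:
        providers[piece_cid].add(provider)
        sizes[piece_cid] = size
    totals = defaultdict(int)
    for piece_cid, provs in providers.items():
        totals[len(provs)] += sizes[piece_cid]
    return dict(totals)


# The example above: half the data on sp1-sp4, the other half on sp5-sp8 (sizes in GiB).
deals = [(p, 32, f"sp{i}") for p in ("p1", "p2") for i in range(1, 5)]
deals += [(p, 32, f"sp{i}") for p in ("p3", "p4") for i in range(5, 9)]
print(bytes_by_provider_count(deals))  # {4: 128}

# A piece that ends up on nine providers lands in its own bucket.
deals += [("p5", 32, f"sp{i}") for i in range(1, 10)]
print(bytes_by_provider_count(deals))  # {4: 128, 9: 32}
```

Eight SPs in total still yield a single "4 providers" bucket, while any piece that was re-sent to more SPs shows up under a higher count.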

ipfsforcezuofu (Owner) commented:

@Mark-duitang

Thank you for the explanation. Regarding the VPN issue, FIL+ Watchdog has checked several different tools, all of which indicated a high probability of VPN usage. I have also checked and double-confirmed the issue. Could you please provide your improvement plan?

Mark-duitang (Author) commented:

@filecoin-watchdog
Thank you very much for the further explanation. Sorry for the late reply; we have been conducting a self-examination and have now identified the problem:

  1. After investigating, we found that the main cause was this: since November we have replaced some SPs because of their own operational problems, and this exposed an error in our code logic. We originally divided the data equally into 4 parts. When an SP was replaced midway, the newly added node regenerated the data already sent to the replaced node, so more than 4 nodes ended up storing duplicate data. Had the same 4 SPs been used throughout, this would not have happened. We will change the code logic so that a newly added SP continues from where the replaced SP left off instead of starting over, which avoids this problem;
  2. We checked our database and found that past human errors caused CIDs to be repeated in 1,590 messages, 5 times in total.
    We will make technical adjustments to prevent both errors in the future. Thank you again for helping us find the problem in time.
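The fix in point 1 amounts to treating each of the four replicas as a fixed slot whose progress survives an SP swap. A minimal sketch under that reading, assuming pieces are dealt to each slot in a fixed order; `DealScheduler` and `ReplicaSlot` are hypothetical names, not Duitang's actual tooling:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ReplicaSlot:
    """One of the fixed replica slots; the SP filling the slot may change."""
    provider_id: str
    next_piece: int = 0                                # index of the next piece to deal
    previous: list[str] = field(default_factory=list)  # SPs that held this slot before


class DealScheduler:
    def __init__(self, pieces, providers):
        self.pieces = list(pieces)
        self.slots = [ReplicaSlot(p) for p in providers]

    def send_next(self, slot_index):
        """Return (provider, piece) for the slot's next deal, or None when the slot is done."""
        slot = self.slots[slot_index]
        if slot.next_piece >= len(self.pieces):
            return None
        piece = self.pieces[slot.next_piece]
        slot.next_piece += 1
        return slot.provider_id, piece

    def replace_provider(self, slot_index, new_provider):
        """Swap the SP but keep the slot's cursor, so no piece is dealt twice in this slot."""
        slot = self.slots[slot_index]
        slot.previous.append(slot.provider_id)
        slot.provider_id = new_provider


scheduler = DealScheduler(["a", "b", "c", "d"], ["sp1", "sp2", "sp3", "sp4"])
deals = [scheduler.send_next(0), scheduler.send_next(0)]  # sp1 takes "a" and "b"
scheduler.replace_provider(0, "sp5")                       # sp1 leaves midway
for i in range(len(scheduler.slots)):
    while (deal := scheduler.send_next(i)) is not None:
        deals.append(deal)

print(Counter(piece for _, piece in deals))  # every piece exactly 4 times
```

With the cursor kept, swapping sp1 for sp5 after two deals still yields exactly four copies of every piece; resetting `next_piece` to 0 on replacement would reproduce the extra copies described in point 1.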

Mark-duitang (Author) commented:

@ipfsforcezuofu
Regarding the VPN issue, it is difficult for us to completely prevent SPs from doing this: they have promised not to use VPNs, and it is hard for us to verify whether they do. However, we are very willing to cooperate by communicating further and asking the SPs to rectify the issue. Our plan is to end our cooperation with the current SPs in the next round and look for new SPs. Thank you!

ipfsforcezuofu (Owner) commented:

@Mark-duitang
Thank you for your proposal to address the VPN issue. Please ensure that the new SPs' compliance with VPN requirements is thoroughly verified before submitting any changes in the application. Feel free to reach out to us if you need any assistance.
