[DataCap Refresh] <3rd> Review of <Hash Big Data> #275

Open
hash889900 opened this issue Jan 21, 2025 · 3 comments
Labels
Diligence Audit in Process Governance team is reviewing the DataCap distributions and verifying the deals were within standards Refresh Applications received from existing Allocators for a refresh of DataCap allowance

Comments

@hash889900

Basic info

  1. Type of allocator: [manual]
  2. Paste your JSON number: [1050]
  3. Allocator verification: [yes]
  4. Allocator Application
  5. Compliance Report
  6. Previous reviews

Current allocation distribution

| Client name | DC granted |
| --- | --- |
| ringcoming | 3 PiB |
| Chinese Academy of Sciences | 0.5 PiB |
| Mody | 1.46 PiB |
| Xinghongweiye | 3.25 PiB |

I. ringcoming

  • DC requested: 3 PiB
  • DC granted so far: 3 PiB

II. Dataset Completion

https://www.ringcoming.shop/datasets
III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes

IV. How many replicas has the client declared vs how many been made so far:

6 vs 6

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f01531188 | 95.51% | YES |
| f02201190 | 69.61% | NO |
| f01969779 | 73.27% | NO |
| f03241759 | 52.72% | NO |
| f03241413 | 78.67% | YES |
| f03238633 | 92.25% | YES |
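
As a rough illustration of how the last column follows from the second, the sketch below applies the >75% threshold to the ringcoming figures listed above. The function name and data layout are assumptions made for this example and are not part of any Fil+ tooling.

```python
# Hypothetical helper for illustration only; not part of any official Fil+ tooling.
RETRIEVAL_THRESHOLD = 75.0  # percent, taken from the ">75% retrieval" column header

# Retrieval rates for the ringcoming SPs, copied from the table above.
ringcoming = {
    "f01531188": 95.51,
    "f02201190": 69.61,
    "f01969779": 73.27,
    "f03241759": 52.72,
    "f03241413": 78.67,
    "f03238633": 92.25,
}


def meets_threshold(rates, threshold=RETRIEVAL_THRESHOLD):
    """Map each SP ID to True when its rate is strictly above the threshold."""
    return {sp: rate > threshold for sp, rate in rates.items()}


result = meets_threshold(ringcoming)
passing = [sp for sp, ok in result.items() if ok]
print(f"{len(passing)} of {len(result)} SPs meet the threshold: {passing}")
```

The same function could be applied to the other clients' tables by swapping in their SP IDs and rates.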

I. Chinese Academy of Sciences

  • DC requested: 5 PiB
  • DC granted so far: 0.5 PiB

II. Dataset Completion

http://dr5.lamost.org/v3/sas/catalog/
http://dr5.lamost.org/v3/sas/fits/20111024/F5902/
http://dr5.lamost.org/v3/sas/fits/20111024/F5907/
http://dr5.lamost.org/v3/sas/fits/20111024/F5909/
http://dr5.lamost.org/v3/sas/png/20111024/F5902/
http://dr5.lamost.org/v3/sas/png/20111024/F5907/
http://dr5.lamost.org/v3/sas/png/20111024/F5909/
http://dr5.lamost.org/v3/sas/sky/20111024/
III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes

IV. How many replicas has the client declared vs how many been made so far:

10 vs 4

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f01106668 | 73.60% | NO |
| f03150906 | 93.93% | YES |
| f01889668 | 34.30% | NO |
| f01518369 | 36.53% | NO |

I. Mody

  • DC requested: 2 PiB
  • DC granted so far: 1.46 PiB

II. Dataset Completion

https://x.com/blocklikecom
https://blocklikecom.medium.com/
III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes

IV. How many replicas has the client declared vs how many been made so far:

8 vs 7

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f02826007 | 17.93% | NO |
| f02826588 | 31.80% | NO |
| f02827996 | 35.10% | NO |
| f03226958 | 89.21% | YES |
| f03081958 | 68.88% | NO |
| f01082888 | 7.40% | NO |
| f03228820 | 0.00% | NO |

I. Xinghongweiye

  • DC requested: 5 PiB
  • DC granted so far: 5 PiB

II. Dataset Completion

https://pan.baidu.com/s/1mMWZ06Znbxc_ppcSHhGiiA?pwd=i96g
III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes

IV. How many replicas has the client declared vs how many been made so far:

10 vs 10

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f02826553 | 17.72% | NO |
| f01930832 | 0.00% | NO |
| f03166677 | 53.13% | NO |
| f03166688 | 42.41% | NO |
| f01690363 | 88.03% | YES |
| f02828669 | 8.02% | NO |
| f02891969 | 0.00% | NO |
| f01930330 | 0.00% | NO |
| f02362412 | 61.06% | NO |
| f02639492 | 58.38% | NO |
| f02368946 | 0.00% | NO |
| f03166668 | 0.00% | NO |
| f03166666 | 15.85% | NO |
| f02353497 | 62.52% | NO |
| f02353580 | 79.11% | YES |
| f02825822 | 11.94% | NO |
| f02826815 | 18.41% | NO |

Allocation summary

  7. Notes from the Allocator

Several issues raised by galen in the second review have since been followed up:

Clients changing replicas/project sizes without sufficient justification. For example, would it be better for a client to open a new application for these new SPs and replicas?

It won't happen again.

SPs utilizing VPNs without increased diligence

I have asked the client to send me proof of geolocation for all SPs, or at least for the SPs suspected of using a VPN, so that I can verify it.

Image

Image

Image

Highly variable retrieval rates

Retrieval rates are being checked on a regular basis, with updated reports added as they are generated.

Data preparation details from clients

Image
Will continue to follow up on this issue with subsequent clients
8. Did the allocator report up to date any issues or discrepancies that occurred during the application processing?

yes
9. What steps have been taken to minimize unfair or risky practices in the allocation process?

Generate reports on an ongoing basis and check report content
10. How did these distributions add value to the Filecoin ecosystem?

High retrieval rate and availability of data
11. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application

yes
12. Please confirm that you understand that by submitting this Github request, you will receive a diligence review that will require you to return to this issue to provide updates.

yes

filecoin-watchdog added the Refresh and Awaiting Community/Watchdog Comment labels on Jan 21, 2025
@filecoin-watchdog

filecoin-watchdog commented Jan 28, 2025

@hash889900
[DataCap Refresh] <3rd> Review of <Hash Big Data> #275

[DataCap Application] <ringcoming> #36

3/3PiB

  • The KYC process, SP verification, and documentation are properly handled.
  • Data set size is identified correctly.
  • Data retrieval and distribution across SPs are well-managed.
  • The data preparation step should also produce an “index” file for the data set. This would help other network users understand what data is stored and how they might benefit from it. At present, there is no clear information on the exact data being stored or its potential uses.
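
One possible shape for such an index file is sketched below: a CSV listing every file in the prepared data set with its relative path, size, and SHA-256 digest. The review does not specify a format, so the function name `write_index` and the column layout are assumptions made purely for illustration.

```python
import csv
import hashlib
from pathlib import Path


def write_index(dataset_dir, index_path):
    """Write a CSV index of every file under dataset_dir and return the file count.

    The columns (relative path, size in bytes, SHA-256) are an assumed layout;
    the review thread does not prescribe one.
    """
    root = Path(dataset_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Hash in 1 MiB chunks so large files are never fully loaded into memory.
            for chunk in iter(lambda: fh.read(1024 * 1024), b""):
                digest.update(chunk)
        rows.append(
            (path.relative_to(root).as_posix(), path.stat().st_size, digest.hexdigest())
        )
    with open(index_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["relative_path", "size_bytes", "sha256"])
        writer.writerows(rows)
    return len(rows)
```

The index should be written outside the data set directory, otherwise a later run would list the previous index as part of the data. A short free-text description of the data set alongside the CSV would cover the "potential uses" part of the request, which a file listing alone does not.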

[DataCap Application] <Large Sky Area Multi-Object Fiber Spectroscopic Telescope-5> #33

0.5/5PiB

  • KYC diligence is performed well, questions about stored data were addressed by the allocator and resolved by the client.
  • Some SPs seem to use VPNs. While this is acceptable, the allocator should verify their physical locations for accuracy.
  • Currently, 4 SPs are sealing data, compared to the 10 declared. This is acceptable during the initial stages but should be monitored.
  • Overuse of the CID report tool makes the review thread difficult to analyze.
  • The data preparation step should also produce an “index” file for the data set. This would help other network users understand what data is stored and how they might benefit from it. At present, there is no clear information on the exact data being stored or its potential uses.

[DataCap Application] <BlockLike> #24

1.46/2PiB

  • The client initially marked the data as not public, but later claimed in comments that it is completely public.
  • The data preparation step should also produce an “index” file for the data set. This would help other network users understand what data is stored and how they might benefit from it. At present, there is no clear information on the exact data being stored or its potential uses.
  • Data set size justification needs more attention, such as providing screenshots of internal infrastructure or an index file.
  • No evidence of KYC/KYB was provided (aside from SP verification).
  • Performance issues: poor data retrieval, unbalanced distribution, and duplicate data.
  • The client stopped communicating with the allocator, halting further DataCap distributions.
  • Missing details: f03226958 and f03228820 were not disclosed in the application or comments.
  • Sum of Unique Data is 370 TiB instead of the declared 250 TiB.

[DataCap Application] <Shenzhen Xinghongweiye Technology Co., LTD> - <Medical> #8

5/5PiB

  • The data preparation step should also produce an “index” file for the data set. This would help other network users understand what data is stored and how they might benefit from it. At present, there is no clear information on the exact data being stored or its potential uses.
  • Retrieval performance issues:
    • 53% of data has a retrieval success rate of 0%. (9 out of 17 SPs)
    • 76% of data has a retrieval success rate below 75%. (13 out of 17 SPs)
  • No evidence of proper KYC or KYB processes being conducted.
  • If this client's last tranche of DataCap has already been assigned, these points should be treated as advice for future clients.
  • The requested DataCap is not accurate. For a dataset size of 1 PiB with 10 replicas, the client should request 10 PiB, not 5 PiB.
  • Sum of Unique Data is 617 TiB instead of the declared 1 PiB.
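
The arithmetic behind these bullets can be checked with a minimal sketch like the one below. The helper names are hypothetical, and the percentages are computed from the SP counts given in parentheses; whether they also hold when measured by data volume cannot be verified from this thread.

```python
# Figures are taken from the review bullets above; helper names are hypothetical.
TIB_PER_PIB = 1024


def required_datacap_pib(dataset_pib, replicas):
    """DataCap needed to store each declared replica of the dataset once."""
    return dataset_pib * replicas


def share_percent(part, total):
    """Share of `part` in `total` as a percentage, rounded to a whole number."""
    return round(100 * part / total)


needed = required_datacap_pib(dataset_pib=1, replicas=10)  # 1 PiB x 10 replicas
zero_share = share_percent(9, 17)    # SPs reported with a 0% retrieval rate
low_share = share_percent(13, 17)    # SPs reported below the 75% threshold
unique_gap_tib = 1 * TIB_PER_PIB - 617  # declared 1 PiB minus observed 617 TiB

print(needed, zero_share, low_share, unique_gap_tib)  # prints: 10 53 76 407
```

Under these inputs the request would need to be 10 PiB rather than 5 PiB, and the observed unique data falls 407 TiB short of the declared 1 PiB.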

filecoin-watchdog added the Awaiting Response from Allocator label and removed the Awaiting Community/Watchdog Comment label on Jan 28, 2025
@hash889900

hash889900 commented Jan 30, 2025

@filecoin-watchdog Thanks

Currently, 4 SPs are sealing data, compared to the 10 declared. This is acceptable during the initial stages but should be monitored.

Data set size justification needs more attention, such as providing screenshots of internal infrastructure or an index file.

Some SPs seem to use VPNs. While this is acceptable, the allocator should verify their physical locations for accuracy.

If this client's last tranche of DataCap has already been assigned, these points should be treated as advice for future clients.

I'll keep an eye on it.

No evidence of KYC/KYB was provided (aside from SP verification).

Image The email included their business license

Missing details: f03226958 and f03228820 were not disclosed in the application or comments.

It has been disclosed in the comments

Image

[DataCap Application] #24

For this client, DataCap triggering is paused until the client responds.

No evidence of proper KYC or KYB processes being conducted.

Here at the very beginning the customer is asked to provide business license verification

The requested DataCap is not accurate. For a dataset size of 1PiB with 10 replicas, the client should request 10 PiB, not 5 PiB.
Sum of Unique Data is 617 TiB instead of declared 1PiB.

These are my rules.

Overall limit of 5 PiB for a single customer for a single application.

Realizing later that this wasn't enough, I requested a change to 15 PiB, but got no response.

I'm wondering if I can subsequently modify it to a total cap of 15 PiB.

On the issue of retrieval rates, I have been generating reports on an ongoing basis and asking follow-up questions.

The data preparation step should also produce an “index” file for the data set. This would help other network users understand what data is stored and how they might benefit from it. At present, there is no clear information on the exact data being stored or its potential uses.

Will follow up on this

@filecoin-watchdog

@hash889900 Thank you for your detailed explanation

filecoin-watchdog added the Diligence Audit in Process label and removed the Awaiting Response from Allocator label on Feb 3, 2025