[DataCap Refresh] 2nd Review of Allocator Antalpha #254

Open · nj-steve opened this issue Dec 13, 2024 · 16 comments

@nj-steve
Basic info

  1. Type of allocator: [manual]
  2. Paste your JSON number: [1052]
  3. Allocator verification: [Yes]

  • Allocator Application
  • Compliance Report
  • Previous reviews

Current allocation distribution

| Client name | DC granted |
| --- | --- |
| The Rieseberg Lab at the University of British Columbia | 1 PiB |
| Shenyang Dongya Medical Research Institute Co. | 1 PiB |
| Earth Observatory of Singapore, Nanyang Technological University | 5.5 PiB |

I. The Rieseberg Lab at the University of British Columbia

  • DC requested: 3.5 PiB
  • DC granted so far: 1 PiB

II. Dataset Completion

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Yes

IV. How many replicas has the client declared vs how many been made so far:

8 vs 4

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f03226688 | 87.39% | YES |
| f03126993 | 75.71% | YES |
| f03228500 | 30.59% | NO |
| f03251999 | 94.51% | YES |
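
For reference, a minimal sketch in Python of how the pass/fail column above can be derived from the 75% target (the SP IDs and rates are copied from the table; the threshold follows the stated >75% requirement):

```python
# Minimal sketch: flag SPs below the 75% retrieval-success threshold.
RETRIEVAL_THRESHOLD = 75.0  # percent, per the >75% target above

retrieval_rates = {
    "f03226688": 87.39,
    "f03126993": 75.71,
    "f03228500": 30.59,
    "f03251999": 94.51,
}

for sp_id, rate in retrieval_rates.items():
    status = "YES" if rate > RETRIEVAL_THRESHOLD else "NO"
    print(f"{sp_id}: {rate:.2f}% -> {status}")
```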

I. Shenyang Dongya Medical Research Institute Co.

  • DC requested: 6 PiB
  • DC granted so far: 1 PiB

II. Dataset Completion

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Yes

IV. How many replicas has the client declared vs how many been made so far:

9 vs 6

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f01949267 | 84.23% | YES |
| f03166666 | 62.65% | NO |
| f02639492 | 64.6% | NO |
| f02362412 | 58.54% | NO |
| f02822222 | 53.09% | NO |

I. Earth Observatory of Singapore, Nanyang Technological University

  • DC requested: 10 PiB
  • DC granted so far: 5.5 PiB

II. Dataset Completion

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Yes

IV. How many replicas has the client declared vs how many been made so far:

6 vs 6

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f03226688 | 89.71% | YES |
| f03226691 | 43.8% | NO |
| f01053413 | 66.43% | NO |
| f03126993 | 75.71% | YES |
| f02825282 | 27.68% | NO |
| f03228500 | 41.53% | NO |
| f02826253 | 57.68% | NO |

Allocation summary

  1. Notes from the Allocator

Regional distribution is good across all clients. As for retrieval, I suggest that clients not send deals to SPs that do not support retrieval. Some SPs support retrieval at the beginning but later disable it to save costs, which causes significant trouble for our work. My recommendation is that clients stop sending storage deals to any SP with retrieval problems.

  2. Did the allocator report any issues or discrepancies that occurred during the application processing in a timely manner?
    Yes
  3. What steps have been taken to minimize unfair or risky practices in the allocation process?
    I will check the data reports regularly. For SPs that do not comply with the Fil+ rules, I will remind the client not to send them storage deals. This also promotes the development of the Filecoin ecosystem: miners that support retrieval can earn more power, while SPs that avoid contributing to the ecosystem to save costs will not.
    Also, if a client or SP encounters a technical problem, I will do my best to help them. If I can't solve it, I will ask the Filecoin community for help.
  4. How did these distributions add value to the Filecoin ecosystem?
    The three datasets come from three different industries.

Climate data: The S1 SLC data are a Level-1 product that collects radar amplitude and phase information in all-weather, day-or-night conditions, which is ideal for studying natural hazards and emergency response, land applications, oil-spill monitoring, sea-ice conditions, and associated climate-change effects. By sharing this kind of data, research organizations can identify possible future climate changes in time to save lives and property.

Medical data: This dataset contains medical data, surgical data, equipment videos and images, medical literature, and more. It will allow more medical practitioners to learn relevant knowledge.

Plant data: This dataset captures the sunflower's genetic diversity, drawn from thousands of wild, cultivated, and landrace sunflower individuals distributed across North America. By comparing these data, sunflower seeds with higher yield and quality can be developed, which will greatly help both farmers and society.

A lot of valuable data is in AWS, which is part of why Amazon has a high market value. When a lot of valuable data is on the Filecoin network and more people access Filecoin, Filecoin will likewise have very high ecosystem value.

  5. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application.
    Yes
  6. Please confirm that you understand that by submitting this GitHub request, you will receive a diligence review that will require you to return to this issue to provide updates.
    Yes
@nj-steve (Author)

Hello Kevin @Kevin-FF-USA, please take the time to read our application report.

@filecoin-watchdog added the Refresh and Awaiting Community/Watchdog Comment labels on Dec 16, 2024
@Kevin-FF-USA (Collaborator)

Hi @nj-steve,
The next step is for the Watchdog or a member of the community to leave input or questions. I will follow the progress, and I am here if you have any questions.

@filecoin-watchdog (Collaborator)

@nj-steve

  1. The Rieseberg Lab at the University of British Columbia
    From the report and the allocator.tech analysis, it is evident that the last allocation did not go through. Have you reported this to the support team?
    Additionally, this dataset was stored earlier, as indicated here:
    GitHub Link.
    After the first allocation, all SPs were located in one region, which the allocator flagged. However, the client did not update the list of SPs used for deals.
    The latest report reveals the following:
  • None of the SPs from the original list were used for deals.
  • Only one new SP was added, and data was distributed across two geopolitical regions.
  • According to the allocator application, distribution across three regions is required.

Lastly, was a KYC process performed for this client?

  2. Earth Observatory of Singapore, Nanyang Technological University
    This was not mentioned in the previous review, but the dataset was stored earlier, as indicated here:
    GitHub Link.

There are several issues that require clarification:

  • The client requested 10 PiB, while the dataset size is only 1.1 PiB and they declared 6 replicas (1.1 PiB × 6 ≈ 6.6 PiB). Why was the additional ~3 PiB requested?
  • The client declared 6 replicas, but there are already 11 replicas (9 as of the 12/13/2024 report).
  • The SP list from 12/13/2024 included 14 IDs, but only 7 were summarized in the self-review. Why was this discrepancy not addressed?
  • Nearly half of the SPs have a retrieval rate of 0% (7 out of 15), with 2 others having retrieval rates below 30%.
  3. Shenyang Dongya Medical Research Institute Co.
    The client stated this dataset was not stored previously; however, this is not entirely accurate:
    GitHub Link.

Further issues include:

  • 5 SPs have a retrieval rate of 0%.
  • Only 3 out of 10 SPs have acceptable retrieval rates.

@filecoin-watchdog added the Awaiting Response from Allocator label and removed the Awaiting Community/Watchdog Comment label on Jan 2, 2025
@nj-steve (Author) commented Jan 3, 2025

Hello @filecoin-watchdog, thanks for your inspection; you checked it very carefully.

The Rieseberg Lab at the University of British Columbia

  • About the region:

I have communicated with the client about this. According to their feedback, since there is little DataCap in the first round, it is still a bit difficult to distribute it across three geographic regions.

Some miners may not be willing to cooperate if you only give them 100 TiB of DataCap; they also need to consider the real money and time they invest. If they are given too little DC, they will look for new partners and refuse to cooperate with us.
So I think that, as long as most SPs support retrieval, we can relax this requirement until the second or third round and then examine the three geographic regions, giving clients some flexibility.

  • About initial SPs:

Because the current SPs are distributed across 2 regions and all support retrieval, I did not strictly require this client to stick to the initial SP list.

  • About KYC:
    (screenshot of the client's KYC verification)

Thanks.

@nj-steve (Author) commented Jan 3, 2025

Earth Observatory of Singapore, Nanyang Technological University

  • About early storage:
    I checked this before, but I forgot to disclose it. I can explain why this storage is acceptable: the earlier copy is nearly 2 years old, and those deals are close to expiring.

  • About dataset size:
    The real data in our sealing process is slightly different from the DataCap consumed. Filecoin pads each piece up to a power-of-two size; for example, a CAR file of only 18 GiB still consumes a 32 GiB piece of DataCap. So in the extreme case, a 1.1 PiB dataset can require up to 1.1 × 2 PiB per replica, times 6 copies (see the sketch after this list).

  • About too many replicas:
    I think you are right; I need to limit the client and not be too permissive. 11 replicas is a bit too many, although the data volume is not large, only 46 TiB. The maximum should be 8.

  • About the 0% retrieval rate (7 out of 15):
    I think it is difficult to force SPs to support retrieval all the time; the main problem is that we cannot control them. The plan I will implement is: for SPs that supported retrieval early on but later stopped, we will not cooperate with them until they support it again; and for SPs with repeated retrieval problems, more than 3 times, we will terminate cooperation.
    Does the community have any good solutions for SPs that initially support retrieval but stop after 1-2 months?
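
A minimal sketch of the power-of-two padding arithmetic described above, in Python (the helper name is illustrative, and Fr32 bit-padding overhead is ignored for simplicity):

```python
# Minimal sketch of Filecoin's power-of-two piece padding: a payload is
# padded up to the next power-of-two piece size, so the DataCap consumed
# can approach 2x the raw payload size in the worst case.
def padded_piece_size(payload_bytes: int) -> int:
    """Round a payload size up to the next power of two (illustrative)."""
    size = 1
    while size < payload_bytes:
        size *= 2
    return size

GiB = 1 << 30

# The example from the discussion: an 18 GiB CAR file lands in a 32 GiB piece.
print(padded_piece_size(18 * GiB) // GiB)  # -> 32

# Worst case for the 1.1 PiB dataset with 6 replicas: if every piece were
# padded to nearly 2x, the DataCap needed approaches 1.1 * 2 * 6 = 13.2 PiB.
print(f"{1.1 * 2 * 6:.1f} PiB")  # -> 13.2 PiB
```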

Thanks.

@nj-steve (Author) commented Jan 3, 2025

Shenyang Dongya Medical Research Institute Co.

I admire your carefulness.
It is indeed the same customer and the same dataset as this one (filecoin-project/filecoin-plus-large-datasets#2159).
Although they didn't store data under that earlier application, they still should have disclosed the application link to be open and transparent. This is what I want to improve in my future work.

About the 5 SPs with a retrieval rate of 0%:
I think it is difficult to force SPs to support retrieval all the time; the main problem is that we cannot control them. I have clearly informed the client that SPs that support retrieval in the early stage but not later will be blacklisted and will receive no further data deals (see the sketch below).
Does the community have any good solutions for SPs that initially support retrieval but stop after 1-2 months?
Fortunately, most SPs support retrieval.
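
A minimal sketch of that blacklist policy, in Python (the strike threshold, data structures, and SP ID are illustrative assumptions, not an existing Fil+ tool):

```python
# Minimal sketch of the 3-strike retrieval policy described above: an SP
# that repeatedly fails retrieval checks is blacklisted and receives no
# further deals. Thresholds and names are illustrative assumptions.
from collections import defaultdict

MAX_RETRIEVAL_FAILURES = 3  # strikes before terminating cooperation

failure_counts: dict[str, int] = defaultdict(int)
blacklist: set[str] = set()

def record_retrieval_check(sp_id: str, retrieval_rate: float) -> None:
    """Record one periodic check; blacklist the SP after repeated failures."""
    if retrieval_rate <= 0.0:
        failure_counts[sp_id] += 1
        if failure_counts[sp_id] >= MAX_RETRIEVAL_FAILURES:
            blacklist.add(sp_id)

def may_receive_deals(sp_id: str) -> bool:
    return sp_id not in blacklist

# Example: a hypothetical SP stops supporting retrieval for three checks.
for rate in (85.0, 0.0, 0.0, 0.0):
    record_retrieval_check("f0xxxxxx", rate)
print(may_receive_deals("f0xxxxxx"))  # -> False
```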
(screenshot of SP retrieval statistics)

Thanks. @filecoin-watchdog

@filecoin-watchdog (Collaborator)

> About dataset size:
> The real data in our sealing process is slightly different from the DataCap consumed. Filecoin pads each piece up to a power-of-two size; for example, a CAR file of only 18 GiB still consumes a 32 GiB piece of DataCap. So in the extreme case, a 1.1 PiB dataset can require up to 1.1 × 2 PiB per replica, times 6 copies.

That's inaccurate. Please refer to this comment in the review where a similar matter was raised:
#170 (comment)

@nj-steve (Author) commented Jan 8, 2025

@filecoin-watchdog Thank you for your hard work.

@filecoin-watchdog added the Diligence Audit in Process label and removed the Awaiting Response from Allocator label on Jan 8, 2025
@filecoin-watchdog (Collaborator)

@nj-steve are you able to answer the above question, please?

@nj-steve (Author)

Hi @filecoin-watchdog, since the community expects deals to be filled with data rather than half full, I will pass this on to my clients. After that, I will no longer grant all of the additional DataCap requested.
Does this need a Fil+ rule? Maybe some people don't realize whether deals need to be completely full of data, 80% full, or half full.

@filecoin-watchdog (Collaborator)

@nj-steve
As mentioned in the quoted thread, we want sectors to be filled completely, not halfway. Even from a logical point of view: if we have storage disks, we don't want to waste space on them; we want to fill them as much as possible.

I don't know the rule you presented; do you know what its source is?

> Filecoin pads each piece up to a power-of-two size; for example, a CAR file of only 18 GiB still consumes a 32 GiB piece of DataCap. So in the extreme case, a 1.1 PiB dataset can require up to 1.1 × 2 PiB per replica, times 6 copies.

@nj-steve (Author)

> As mentioned in the quoted thread, we want sectors to be filled completely, not halfway. Even from a logical point of view: if we have storage disks, we don't want to waste space on them; we want to fill them as much as possible.
>
> I don't know the rule you presented; do you know what its source is?

https://spec.filecoin.io/systems/filecoin_files/piece/#section-systems.filecoin_files.piece.data-representation
Hi @filecoin-watchdog, you can see this doc. The power-of-two padding comes from how Filecoin represents pieces for its proofs, although I don't want to waste disk space either.

@nj-steve (Author)

I will guide my clients that not wasting disk space is also a green practice.

@martplo commented Jan 23, 2025

Hi! I had commented on this thread before, but I didn't want to get ahead of the watchdog's response. Since he didn't mention it, I'd like to bring up the discussion available here: #256

Quote from the aforementioned discussion:

> b. No Excessive Padding of Datasets
> Metadata and supporting information are allowed, but padding exceeding 25% of a dataset is not acceptable.
> Datasets should be appropriately sized, focused, and free of unnecessary filler content.

I hope this will add value to this discussion.
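
For reference, a minimal sketch in Python of checking a piece against that quoted limit (the function name and example sizes are illustrative; the 25% figure comes from the quoted rule):

```python
# Minimal sketch: check whether padding stays within the quoted 25% limit.
# payload_bytes is the real data; piece_bytes is the padded piece size.
MAX_PADDING_FRACTION = 0.25  # from the quoted "No Excessive Padding" rule

def padding_acceptable(payload_bytes: int, piece_bytes: int) -> bool:
    padding_fraction = (piece_bytes - payload_bytes) / piece_bytes
    return padding_fraction <= MAX_PADDING_FRACTION

GiB = 1 << 30

# An 18 GiB payload in a 32 GiB piece is ~44% padding -> not acceptable.
print(padding_acceptable(18 * GiB, 32 * GiB))  # -> False

# A 26 GiB payload in a 32 GiB piece is ~19% padding -> acceptable.
print(padding_acceptable(26 * GiB, 32 * GiB))  # -> True
```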

@nj-steve (Author)

> b. No Excessive Padding of Datasets
> Metadata and supporting information are allowed, but padding exceeding 25% of a dataset is not acceptable.
> Datasets should be appropriately sized, focused, and free of unnecessary filler content.

Thank you for the discussion. There should be some flexibility for SPs; otherwise, it will discourage them from joining Filecoin.

@Kevin-FF-USA (Collaborator)

Hi @nj-steve,

Thanks for submitting this application for refresh.
Wanted to send you a friendly update: as this works its way through the system, you should see a comment from Galen on behalf of the Governance team this week. If you have any questions or need support until then, please let us know.

Warmly,
-Kevin
