[DataCap Refresh] 2nd Review of Allocator Antalpha #254

Open · nj-steve opened this issue Dec 13, 2024 · 16 comments

@nj-steve
Basic info

  1. Type of allocator: [manual]
  2. Paste your JSON number: [1052]
  3. Allocator verification: [Yes]

  • Allocator Application
  • Compliance Report
  • Previous reviews

Current allocation distribution

| Client name | DC granted |
| --- | --- |
| The Rieseberg Lab at the University of British Columbia | 1 PiB |
| Shenyang Dongya Medical Research Institute Co. | 1 PiB |
| Earth Observatory of Singapore, Nanyang Technological University | 5.5 PiB |

I. The Rieseberg Lab at the University of British Columbia

  • DC requested: 3.5 PiB
  • DC granted so far: 1 PiB

II. Dataset Completion

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Yes

IV. How many replicas has the client declared vs how many been made so far:

8 vs 4

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f03226688 | 87.39% | YES |
| f03126993 | 75.71% | YES |
| f03228500 | 30.59% | NO |
| f03251999 | 94.51% | YES |
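
For reference, a minimal sketch in Python of how the pass/fail column above can be derived from the 75% target (the SP IDs and rates are copied from the table; the threshold follows the stated >75% requirement):

```python
# Minimal sketch: flag SPs below the 75% retrieval-success threshold.
RETRIEVAL_THRESHOLD = 75.0  # percent, per the >75% target above

retrieval_rates = {
    "f03226688": 87.39,
    "f03126993": 75.71,
    "f03228500": 30.59,
    "f03251999": 94.51,
}

for sp_id, rate in retrieval_rates.items():
    status = "YES" if rate > RETRIEVAL_THRESHOLD else "NO"
    print(f"{sp_id}: {rate:.2f}% -> {status}")
```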

I. Shenyang Dongya Medical Research Institute Co.

  • DC requested: 6 PiB
  • DC granted so far: 1 PiB

II. Dataset Completion

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Yes

IV. How many replicas has the client declared vs how many been made so far:

9 vs 6

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f01949267 | 84.23% | YES |
| f03166666 | 62.65% | NO |
| f02639492 | 64.6% | NO |
| f02362412 | 58.54% | NO |
| f02822222 | 53.09% | NO |

I. Earth Observatory of Singapore, Nanyang Technological University

  • DC requested: 10 PiB
  • DC granted so far: 5.5 PiB

II. Dataset Completion

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Yes

IV. How many replicas has the client declared vs how many been made so far:

6 vs 6

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets >75% retrieval? |
| --- | --- | --- |
| f03226688 | 89.71% | YES |
| f03226691 | 43.8% | NO |
| f01053413 | 66.43% | NO |
| f03126993 | 75.71% | YES |
| f02825282 | 27.68% | NO |
| f03228500 | 41.53% | NO |
| f02826253 | 57.68% | NO |

Allocation summary

  1. Notes from the Allocator

Regional distribution is good across all clients. As for retrieval, I suggest that clients not send deals to SPs that do not support retrieval. Some SPs support retrieval at the beginning but later disable it to save costs, which causes significant trouble for our work. My recommendation is that clients stop sending storage deals to any SP with retrieval problems.

  2. Did the allocator report any issues or discrepancies that occurred during the application processing in a timely manner?
    Yes
  3. What steps have been taken to minimize unfair or risky practices in the allocation process?
    I will check the data reports regularly. For SPs that do not comply with the Fil+ rules, I will remind the client not to send them storage deals. This also promotes the development of the Filecoin ecosystem: miners that support retrieval can earn more power, while SPs that avoid contributing to the ecosystem to save costs will not.
    Also, if a client or SP encounters a technical problem, I will do my best to help them. If I can't solve it, I will ask the Filecoin community for help.
  4. How did these distributions add value to the Filecoin ecosystem?
    The three datasets come from three different industries.

Climate data: The S1 SLC data are a Level-1 product that collects radar amplitude and phase information in all-weather, day-or-night conditions, which is ideal for studying natural hazards and emergency response, land applications, oil-spill monitoring, sea-ice conditions, and associated climate-change effects. By sharing this kind of data, research organizations can identify possible future climate changes in time to save lives and property.

Medical data: This dataset contains medical data, surgical data, equipment videos and images, medical literature, and more. It will allow more medical practitioners to learn relevant knowledge.

Plant data: This dataset captures the sunflower's genetic diversity, drawn from thousands of wild, cultivated, and landrace sunflower individuals distributed across North America. By comparing these data, sunflower seeds with higher yield and quality can be developed, which will greatly help both farmers and society.

A lot of valuable data is in AWS, which is part of why Amazon has a high market value. When a lot of valuable data is on the Filecoin network and more people access Filecoin, Filecoin will likewise have very high ecosystem value.

  5. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application.
    Yes
  6. Please confirm that you understand that by submitting this GitHub request, you will receive a diligence review that will require you to return to this issue to provide updates.
    Yes
@nj-steve (Author)

Hello Kevin @Kevin-FF-USA, please take the time to read our application report.

@filecoin-watchdog added the Refresh and Awaiting Community/Watchdog Comment labels on Dec 16, 2024
@Kevin-FF-USA (Collaborator)

Hi @nj-steve,
The next step is for the Watchdog or a member of the community to leave input or questions. I will follow the progress, and I am here if you have any questions.

@filecoin-watchdog (Collaborator)

@nj-steve

  1. The Rieseberg Lab at the University of British Columbia
    From the report and the allocator.tech analysis, it is evident that the last allocation did not go through. Have you reported this to the support team?
    Additionally, this dataset was stored earlier, as indicated here:
    GitHub Link.
    After the first allocation, all SPs were located in one region, which the allocator flagged. However, the client did not update the list of SPs used for deals.
    The latest report reveals the following:
  • None of the SPs from the original list were used for deals.
  • Only one new SP was added, and data was distributed across two geopolitical regions.
  • According to the allocator application, distribution across three regions is required.

Lastly, was a KYC process performed for this client?

  2. Earth Observatory of Singapore, Nanyang Technological University
    This was not mentioned in the previous review, but the dataset was stored earlier, as indicated here:
    GitHub Link.

There are several issues that require clarification:

  • The client requested 10 PiB, while the dataset size is only 1.1 PiB and they declared 6 replicas (1.1 PiB × 6 ≈ 6.6 PiB). Why was the additional ~3 PiB requested?
  • The client declared 6 replicas, but there are already 11 replicas (9 as of the 12/13/2024 report).
  • The SP list from 12/13/2024 included 14 IDs, but only 7 were summarized in the self-review. Why was this discrepancy not addressed?
  • Nearly half of the SPs have a retrieval rate of 0% (7 out of 15), with 2 others having retrieval rates below 30%.
  3. Shenyang Dongya Medical Research Institute Co.
    The client stated this dataset was not stored previously; however, this is not entirely accurate:
    GitHub Link.

Further issues include:

  • 5 SPs have a retrieval rate of 0%.
  • Only 3 out of 10 SPs have acceptable retrieval rates.

@filecoin-watchdog added the Awaiting Response from Allocator label and removed the Awaiting Community/Watchdog Comment label on Jan 2, 2025
@nj-steve (Author) commented Jan 3, 2025

Hello @filecoin-watchdog, thanks for your inspection; you checked it very carefully.

The Rieseberg Lab at the University of British Columbia

  • About the region:

I have communicated with the client about this. According to their feedback, since there is little DataCap in the first round, it is still a bit difficult to distribute it across three geographic regions.

Some miners may not be willing to cooperate if you only give them 100 TiB of DataCap; they also need to consider the real money and time they invest. If they are given too little DC, they will look for new partners and refuse to cooperate with us.
So I think that, as long as most SPs support retrieval, we can relax this requirement until the second or third round and then examine the three geographic regions, giving clients some flexibility.

  • About initial SPs:

Because the current SPs are distributed across 2 regions and all support retrieval, I did not strictly require this client to stick to the initial SP list.

  • About KYC:
    (screenshot of the client's KYC verification)

Thanks.

@nj-steve (Author) commented Jan 3, 2025

Earth Observatory of Singapore, Nanyang Technological University

  • About early storage:
    I checked this before, but I forgot to disclose it. I can explain why this storage is acceptable: the earlier copy is nearly 2 years old, and those deals are close to expiring.

  • About dataset size:
    The real data in our sealing process is slightly different from the DataCap consumed. Filecoin pads each piece up to a power-of-two size; for example, a CAR file of only 18 GiB still consumes a 32 GiB piece of DataCap. So in the extreme case, a 1.1 PiB dataset can require up to 1.1 × 2 PiB per replica, times 6 copies (see the sketch after this list).

  • About too many replicas:
    I think you are right; I need to limit the client and not be too permissive. 11 replicas is a bit too many, although the data volume is not large, only 46 TiB. The maximum should be 8.

  • About the 0% retrieval rate (7 out of 15):
    I think it is difficult to force SPs to support retrieval all the time; the main problem is that we cannot control them. The plan I will implement is: for SPs that supported retrieval early on but later stopped, we will not cooperate with them until they support it again; and for SPs with repeated retrieval problems, more than 3 times, we will terminate cooperation.
    Does the community have any good solutions for SPs that initially support retrieval but stop after 1-2 months?
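
A minimal sketch of the power-of-two padding arithmetic described above, in Python (the helper name is illustrative, and Fr32 bit-padding overhead is ignored for simplicity):

```python
# Minimal sketch of Filecoin's power-of-two piece padding: a payload is
# padded up to the next power-of-two piece size, so the DataCap consumed
# can approach 2x the raw payload size in the worst case.
def padded_piece_size(payload_bytes: int) -> int:
    """Round a payload size up to the next power of two (illustrative)."""
    size = 1
    while size < payload_bytes:
        size *= 2
    return size

GiB = 1 << 30

# The example from the discussion: an 18 GiB CAR file lands in a 32 GiB piece.
print(padded_piece_size(18 * GiB) // GiB)  # -> 32

# Worst case for the 1.1 PiB dataset with 6 replicas: if every piece were
# padded to nearly 2x, the DataCap needed approaches 1.1 * 2 * 6 = 13.2 PiB.
print(f"{1.1 * 2 * 6:.1f} PiB")  # -> 13.2 PiB
```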

Thanks.

@nj-steve (Author) commented Jan 3, 2025

Shenyang Dongya Medical Research Institute Co.

I admire your carefulness.
It is indeed the same customer and the same dataset as this one (filecoin-project/filecoin-plus-large-datasets#2159).
Although they didn't store data under that earlier application, they still should have disclosed the application link to be open and transparent. This is what I want to improve in my future work.

About the 5 SPs with a retrieval rate of 0%:
I think it is difficult to force SPs to support retrieval all the time; the main problem is that we cannot control them. I have clearly informed the client that SPs that support retrieval in the early stage but not later will be blacklisted and will receive no further data deals (see the sketch below).
Does the community have any good solutions for SPs that initially support retrieval but stop after 1-2 months?
Fortunately, most SPs support retrieval.
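
A minimal sketch of that blacklist policy, in Python (the strike threshold, data structures, and SP ID are illustrative assumptions, not an existing Fil+ tool):

```python
# Minimal sketch of the 3-strike retrieval policy described above: an SP
# that repeatedly fails retrieval checks is blacklisted and receives no
# further deals. Thresholds and names are illustrative assumptions.
from collections import defaultdict

MAX_RETRIEVAL_FAILURES = 3  # strikes before terminating cooperation

failure_counts: dict[str, int] = defaultdict(int)
blacklist: set[str] = set()

def record_retrieval_check(sp_id: str, retrieval_rate: float) -> None:
    """Record one periodic check; blacklist the SP after repeated failures."""
    if retrieval_rate <= 0.0:
        failure_counts[sp_id] += 1
        if failure_counts[sp_id] >= MAX_RETRIEVAL_FAILURES:
            blacklist.add(sp_id)

def may_receive_deals(sp_id: str) -> bool:
    return sp_id not in blacklist

# Example: a hypothetical SP stops supporting retrieval for three checks.
for rate in (85.0, 0.0, 0.0, 0.0):
    record_retrieval_check("f0xxxxxx", rate)
print(may_receive_deals("f0xxxxxx"))  # -> False
```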
(screenshot of SP retrieval statistics)

Thanks. @filecoin-watchdog

@filecoin-watchdog (Collaborator)

> About dataset size:
> The real data in our sealing process is slightly different from the DataCap consumed. Filecoin pads each piece up to a power-of-two size; for example, a CAR file of only 18 GiB still consumes a 32 GiB piece of DataCap. So in the extreme case, a 1.1 PiB dataset can require up to 1.1 × 2 PiB per replica, times 6 copies.

That's inaccurate. Please refer to this comment in the review where a similar matter was raised:
#170 (comment)

@nj-steve (Author) commented Jan 8, 2025

@filecoin-watchdog Thank you for your hard work.

@filecoin-watchdog added the Diligence Audit in Process label and removed the Awaiting Response from Allocator label on Jan 8, 2025
@filecoin-watchdog (Collaborator)

@nj-steve are you able to answer the above question, please?

@nj-steve (Author)

Hi @filecoin-watchdog, since the community expects deals to be filled with data rather than half full, I will pass this on to my clients. After that, I will no longer grant all of the additional DataCap requested.
Does this need a Fil+ rule? Maybe some people don't realize whether deals need to be completely full of data, 80% full, or half full.

@filecoin-watchdog (Collaborator)

@nj-steve
As mentioned in the quoted thread, we want sectors to be filled completely, not halfway. Even from a logical point of view: if we have storage disks, we don't want to waste space on them; we want to fill them as much as possible.

I don't know the rule you presented; do you know what its source is?

> Filecoin pads each piece up to a power-of-two size; for example, a CAR file of only 18 GiB still consumes a 32 GiB piece of DataCap. So in the extreme case, a 1.1 PiB dataset can require up to 1.1 × 2 PiB per replica, times 6 copies.

@nj-steve (Author)

> As mentioned in the quoted thread, we want sectors to be filled completely, not halfway. Even from a logical point of view: if we have storage disks, we don't want to waste space on them; we want to fill them as much as possible.
>
> I don't know the rule you presented; do you know what its source is?

https://spec.filecoin.io/systems/filecoin_files/piece/#section-systems.filecoin_files.piece.data-representation
Hi @filecoin-watchdog, you can see this doc. The power-of-two padding comes from how Filecoin represents pieces for its proofs, although I don't want to waste disk space either.

@nj-steve (Author)

I will guide my clients that not wasting disk space is also a green practice.

@martplo commented Jan 23, 2025

Hi! I had commented on this thread before, but I didn't want to get ahead of the watchdog's response. Since he didn't mention it, I'd like to bring up the discussion available here: #256

Quote from the aforementioned discussion:

> b. No Excessive Padding of Datasets
> Metadata and supporting information are allowed, but padding exceeding 25% of a dataset is not acceptable.
> Datasets should be appropriately sized, focused, and free of unnecessary filler content.

I hope this will add value to this discussion.
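
For reference, a minimal sketch in Python of checking a piece against that quoted limit (the function name and example sizes are illustrative; the 25% figure comes from the quoted rule):

```python
# Minimal sketch: check whether padding stays within the quoted 25% limit.
# payload_bytes is the real data; piece_bytes is the padded piece size.
MAX_PADDING_FRACTION = 0.25  # from the quoted "No Excessive Padding" rule

def padding_acceptable(payload_bytes: int, piece_bytes: int) -> bool:
    padding_fraction = (piece_bytes - payload_bytes) / piece_bytes
    return padding_fraction <= MAX_PADDING_FRACTION

GiB = 1 << 30

# An 18 GiB payload in a 32 GiB piece is ~44% padding -> not acceptable.
print(padding_acceptable(18 * GiB, 32 * GiB))  # -> False

# A 26 GiB payload in a 32 GiB piece is ~19% padding -> acceptable.
print(padding_acceptable(26 * GiB, 32 * GiB))  # -> True
```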

@nj-steve (Author)

> b. No Excessive Padding of Datasets
> Metadata and supporting information are allowed, but padding exceeding 25% of a dataset is not acceptable.
> Datasets should be appropriately sized, focused, and free of unnecessary filler content.

Thank you for the discussion. There should be some flexibility for SPs; otherwise, it will discourage them from joining Filecoin.

@Kevin-FF-USA (Collaborator)

Hi @nj-steve,

Thanks for submitting this application for refresh.
Wanted to send you a friendly update: as this works its way through the system, you should see a comment from Galen on behalf of the Governance team this week. If you have any questions or need support until then, please let us know.

Warmly,
-Kevin
