Structured Data 2021 #2174
Comments
As discussed in Slack, I'd be very keen to author this. I'd also be happy to take my hat out of the 'Author' ring (and to play Reviewer instead) for #2148 so as to be able to resource this effectively (which I've updated accordingly). |
@jono-alderson thanks for your interest in authoring this chapter! As the content team lead, you'll be responsible for the scope and direction of the chapter and keeping it on schedule. We automatically monitor the staffing and progress of each chapter based on the state of the initial comment so please keep that updated as you add new contributors and meet each milestone. We've created a Google Doc for this chapter, which you're encouraged to use to collaborate with the content team on the initial outline, metrics, and ultimately the final draft. Next steps for this chapter are:
There's not currently a section coordinator for this chapter, so I'll be periodically checking in with you directly to make sure the chapter is staying on schedule. Reach out here in this issue if you have any questions about the process. More information about the content team lead and author roles and responsibilities is available for reference in the wiki if needed. To anyone else interested in contributing to this chapter, please comment below to join the team! |
Hey @jono-alderson, if you'll have me, I'd love to help out with the analysis for this chapter this year! |
That'd be wonderful, thanks! NB, I'm aiming to start outlining a plan and firing out some comms this weekend :) |
Hi @jono-alderson, just checking in. Here are some tips to help keep the chapter on track: |
Happy to help if you guys need any more reviewers :) |
You asked about microformats - I'm happy to help review on that area, and help those running analyses make sense of them. |
Thanks - more reviewers are definitely welcome! I have a feeling that we're going to need lots of hands on deck for this! |
Thanks, Kevin, that'd be amazing. I'm conscious that whilst schema.org and JSON-LD are very trendy at the moment, there's a lot of structured data out there in legacy formats that I'm keen for us not to overlook. I'll add you as a reviewer! Delightful to have your input. |
@rviscomi I don't appear to be able to edit the top comment; do I need some permissions? |
I've apparently got edit access, so I've added @kevinmarks and @vdwijngaert as reviewers, and myself as an analyst, @jono-alderson :) I've also checked off that May 31st milestone since we now have at least one of each role. Do you want to remove the |
Thanks! Still happy to invite more folks. It's a big topic, so I'm happy to cat-herd involvement from a wider pool potentially; unless there are good reasons not to? Could you also add @jvandriel as a reviewer and editor, please? :) |
Nope, I'm sure that's fine to leave the badges up if we're still looking for people :) Added, and also put everyone in the frontmatter of the Google doc as well. |
Hi all 👋 happy to contribute on this one - either as author or editor, whatever feels more necessary. |
I'm happy to join and help out as well - also very curious to see the outcome |
@jono-alderson you'll need to accept our invitation to join the HTTP Archive team in order to get edit access on GitHub. Check your email or visit https://github.com/HTTPArchive/ to accept. Happy to see the increased interest in this chapter! |
Here's the sharable link for anyone to join the Slack channel: https://join.slack.com/t/httparchive/shared_invite/zt-45sgwmnb-eDEatOhqssqNAKxxOSLAaA |
Looks like we have everybody in Slack except for @vdwijngaert; are you able to join us, Koen? :) |
@jono-alderson I'm here because @jvandriel asked, then I saw your tweet asking for involvement from people with expertise in Dublin Core / other metadata. I might be able to help as reviewer, if you still need such help. |
Hi @philbarker, thanks for reaching out! That'd be amazing; I'll add you to the team list! I know I'm personally weak on knowledge around DC, so keen to have an expert involved! Please feel free to jump into the Slack channel, and contribute any ideas/direction, etc! |
All, the outline in the chapter doc is looking great. Nice work! 🚀 @jono-alderson is the outline complete, or are you still adding to it? |
Getting there! |
One thing I might suggest would be a deeper integration with knowledge graphs like Wikidata. If I've got this structured data on a page:
{
  "@type": "Person",
  "name": "Greg Brimble",
  "nationality": {
    "@type": "Country",
    "name": "United Kingdom"
  },
  "sameAs": ["https://www.wikidata.org/wiki/Q52444075"]
}
and Wikidata has this:
"instance of" → "human" (P31 → Q5)
"country of citizenship" → "United Kingdom" (P27 → Q145)
This is getting dangerously close to what my undergraduate dissertation was on 😅 The difficulty is in doing the ontology matching (finding equivalent properties and entities), which might be a bit out-of-scope for this analysis (e.g. Schema.org's "Person" ≠ Q5, but Schema.org's "nationality" === P27). |
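As a rough illustration of the first step such a comparison would need, here is a minimal Python sketch that pulls Wikidata entity IDs out of the sameAs values in a JSON-LD node like the one above. The function name `wikidata_ids` and the URL pattern are assumptions made for this example, not part of the chapter's actual analysis, and the sketch does no ontology matching at all.

```python
import json
import re

# Assumed URL shapes for Wikidata entities, e.g.
# https://www.wikidata.org/wiki/Q52444075 or http://www.wikidata.org/entity/Q5.
WIKIDATA_URL = re.compile(
    r"^https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/(Q\d+)$"
)


def wikidata_ids(node):
    """Recursively collect Wikidata QIDs from every sameAs value in a JSON-LD node.

    Hypothetical helper for illustration only.
    """
    found = []
    if isinstance(node, list):
        for item in node:
            found.extend(wikidata_ids(item))
    elif isinstance(node, dict):
        same_as = node.get("sameAs", [])
        # sameAs may be a single value or a list; normalise to a list.
        if not isinstance(same_as, list):
            same_as = [same_as]
        for url in same_as:
            if isinstance(url, str):
                match = WIKIDATA_URL.match(url)
                if match:
                    found.append(match.group(1))
        # Descend into nested nodes such as "nationality".
        for key, value in node.items():
            if key != "sameAs":
                found.extend(wikidata_ids(value))
    return found


EXAMPLE = json.loads("""
{
  "@type": "Person",
  "name": "Greg Brimble",
  "nationality": {"@type": "Country", "name": "United Kingdom"},
  "sameAs": ["https://www.wikidata.org/wiki/Q52444075"]
}
""")

print(wikidata_ids(EXAMPLE))  # ['Q52444075']
```

Counting how many pages link out to a knowledge graph this way needs no external lookups, which is why it stays much cheaper than comparing property values against Wikidata itself.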
That'd be pretty awesome, but I think that comparing to external sources at scale is going to be waaayyy out of scope. |
Might be interesting to touch on the use of sameAs:
without diving too deeply into the mire of ontology mapping... |
Hey @jono-alderson, could you give an update on the chapter outline? I see some new topics added today, but not sure if it's still being worked on. If it's finalized you could check off Milestone 1 above, otherwise let us know when you think it'll be ready. Thanks! @GregBrimble please take a close look at the outline to see whether we need any custom metrics to extract structured data info from the DOM at runtime. Those would need to be written and merged no later than the end of the month to be added to the test pipeline in time. |
Hello hello! I'm happy with the chapter outline, and will check off the milestone now. @GregBrimble, I think we need to explore your message in Slack (https://httparchive.slack.com/archives/C021GGN9W4D/p1623610269059000) ASAP, as that might influence our next steps. |
And we've got the run's results! July's data is up so we can now play around in BigQuery. I've started the queries in #2293, and have requested edit access to the results sheet so I can start putting stuff down there. Checked the error log as a first priority, and so far, it looks pretty good. We have our structured data custom metrics on 13,775,158 of the 13,778,213 pages we've run against. We captured 508 error logs, and I'm assuming the rest (2,547) failed so hard that we couldn't even capture the exception. 99.98% success is good enough for me. This analysis is due September 30, but I can't imagine it takes nearly that long. The hardest bit will be the JSON-LD parsing, which in all likelihood I'm going to do locally. I'll do bits and pieces over the next few days, so keep an eye on that linked PR to follow along :) |
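The coverage figures quoted above can be sanity-checked with a few lines of Python. The variable names are made up for this sketch; the three input numbers are the ones given in the comment.

```python
# Figures quoted in the comment above.
total_pages = 13_778_213
pages_with_metrics = 13_775_158
error_logs = 508

# Pages where the structured data custom metrics are missing.
missing = total_pages - pages_with_metrics
# Missing pages that did not even produce an error log.
unlogged_failures = missing - error_logs
success_rate = pages_with_metrics / total_pages * 100

print(missing)                 # 3055
print(unlogged_failures)       # 2547
print(f"{success_rate:.2f}%")  # 99.98%
```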
This is monumentally exciting! |
👋 Hi @jono-alderson @cyberandy @GregBrimble, just checking in on the chapter progress. How is the analysis coming along? |
Hey, made decent progress the weekend before last, but it's a busy week at work, this week, so haven't had a chance to get back to it. I'll get this completed next weekend |
Any updates from your side, @GregBrimble ? |
@jonoalderson @cyberandy @kevinmarks @vdwijngaert @jvandriel @philbarker @GregBrimble @JasmineDWillson Thank you all for your hard work getting this chapter over the finish line in time for the pre-release. Structured Data has been the most-read (English-version) chapter in the past couple of weeks! Congratulations on finishing the chapter, and I'm excited to see us launch the rest of the chapters alongside it on Wednesday 🎉 When you get 5 minutes, I'd really appreciate it if you could fill out our contributor survey to tell us (the project leads) about your experience. It's super helpful to hear what went well or what could be improved for next time. 🙏 |
Part I Chapter 4: Structured Data
If you're interested in contributing to the Structured Data chapter of the 2021 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor.
Content team
Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors.
For an overview of how the roles work together at each phase of the project, see the Chapter Lifecycle doc.
Milestone checklist
0. Form the content team
1. Plan content
2. Gather data
3. Validate results
4. Draft content
5. Publication
Chapter resources
Refer to these 2021 Structured Data resources throughout the content creation process:
📄 Google Docs for outlining and drafting content
🔍 SQL files for committing the queries used during analysis
📊 Google Sheets for saving the results of queries
📝 Markdown file for publishing content and managing public metadata