
List of things that could have been done better and new features #142

Open
StoneT2000 opened this issue Sep 5, 2021 · 4 comments
Labels: enhancement (New feature or request)

StoneT2000 commented Sep 5, 2021

Made this a public issue since I'm sure people want to see this and to leave feedback through GitHub (instead of a form, Discord, or the Kaggle forums), and there are a lot of problems that may get forgotten before we run this competition again.

Also note that these are just suggestions from our team and the community; they aren't necessarily going to be added, subject to community input, developer time, etc.

Main topics I've broken the problems and new features down into:

  • Competition / Developer Flow / Tooling
  • Leaderboard / Evaluation System
  • Visualizer
  • Game Design
  • Quality of Life
  • Other

Competition / Developer Flow / Tooling

  • A fast, Python-based environment using the gym API (PettingZoo-based would be good); keep the cross-language match-running tool. (A rough sketch of what a PettingZoo-style interface could look like is at the end of this list.)
  • Game constants stored in the Python environment and consistently updated.
  • If running on Kaggle, keep replays in a format as similar to Kaggle's as possible.
  • An open beta if possible, and more channels for open feedback. One way is to treat the first few weeks of the competition as a beta, with changes published at the end of that period for the official release.
  • Maintaining symmetry within a single episode and across starter kits. Asymmetry subtly presents an accessibility problem: those who are unaware of it may find debugging frustrating.
  • More levels of tutorials, depending on background and experience, and make these discoverable
  • Improved information discovery, especially for the API and the Specs.
  • Higher quality starter kits
    • Properly written C++ code
    • Add doc strings / comments for all functions
    • Add more quality-of-life functions so users don't need to implement them themselves (e.g. an is-day / is-night function)
  • Provide a compilation service for users who don't / can't use Docker. Probably really easy to set up? Or just provide a Kaggle notebook that compiles for you
  • Better CLI Tool (and GUI one too)
    • Fewer Windows bugs and weird errors popping up. This probably deterred a ton of competitors who want to use Windows, don't want to use WSL, and have no other options.
    • A GUI-based tool with a modular dashboard that does everything the CLI tool does and more, and that lets competitors who aren't familiar with the CLI compete effectively as well.
    • Give warnings for incorrect use. A ton of people have issues because they use Python 2, and while Python 2 is being phased out, this still happens way too often. Could just print a warning saying "hey, you need Python 3; try installing it, changing your default interpreter, using python3, etc." (a minimal check is sketched after this list).
    • Keep packages more minimal (will probably happen when using a python engine)
    • A less broken local tournament evaluation system. The current one seems to have issues running stably for long periods of time (hopefully fixed in dimensions 6.0). Someone could probably write a better one easily with just bash.
    • More options. What's a good way to allow a ton of configurable options for a CLI tool? Passing in a JSON on the CLI? I've always found that super weird / clumsy, but I see it happen somewhat often (e.g. Kaggle's own environment package). In a GUI this is incredibly easy, given we have more options in the UI to improve the user experience and minimize user errors.
    • Match more closely how Kaggle times your agent, e.g. using an overage concept (already implemented in the next version, actually).
    • Play by hand. Allow one to run a match between two agents up to some turn, then take over one or more of them and give them exact instructions. Probably a difficult feature to implement nicely; it will require a lot of season-to-season programming of interfaces.
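
For reference, here is a rough sketch of what a PettingZoo-style parallel environment could look like, assuming a recent PettingZoo / Gymnasium API. The class name, observation layout, and action count are placeholder assumptions for illustration, not an actual Season 2 design:

```python
# Minimal sketch of a PettingZoo ParallelEnv for a Lux-like game.
# Everything game-specific (board size, channels, action count) is a placeholder.
import numpy as np
from gymnasium import spaces
from pettingzoo import ParallelEnv


class LuxSketchEnv(ParallelEnv):
    metadata = {"name": "lux_sketch_v0"}

    def __init__(self, board_size=16, max_turns=360):
        self.possible_agents = ["player_0", "player_1"]
        self.board_size = board_size
        self.max_turns = max_turns

    def observation_space(self, agent):
        # Image-like view of the whole board, a few channels per cell.
        return spaces.Box(0.0, 1.0, shape=(self.board_size, self.board_size, 8), dtype=np.float32)

    def action_space(self, agent):
        # One discrete action per agent, purely for illustration.
        return spaces.Discrete(6)

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        self.turn = 0
        observations = {a: self.observation_space(a).sample() for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        # A real engine would apply `actions`, resolve collisions, etc. here.
        self.turn += 1
        done = self.turn >= self.max_turns
        observations = {a: self.observation_space(a).sample() for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}
        terminations = {a: done for a in self.agents}
        truncations = {a: False for a in self.agents}
        infos = {a: {} for a in self.agents}
        if done:
            self.agents = []
        return observations, rewards, terminations, truncations, infos
```

The usual PettingZoo rollout loop (`env.reset()`, then `env.step({agent: action, ...})`) would then work against it, and RL libraries that speak the parallel API could train on it directly.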

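For the "warn about Python 2" item above, a tiny sketch of the kind of up-front check the CLI tool could run; the minimum version and wording are just assumptions:

```python
# Sketch of an up-front interpreter check for the CLI tool; kept Python 2
# compatible on purpose so the warning itself can actually run and be seen.
import sys

if sys.version_info[0] < 3:
    sys.stderr.write(
        "This tool requires Python 3, but it was run with Python %d.%d.\n"
        "Try installing Python 3 and invoking the tool with `python3`, or "
        "change your default interpreter.\n"
        % (sys.version_info[0], sys.version_info[1])
    )
    sys.exit(1)
```
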
Leaderboard / Evaluation System

(Some of these might not be possible on Kaggle, but we can hope)

  • Disabling old submissions / keeping a max of k submissions active, to significantly reduce clutter and increase the match rate for submissions that people care about.
  • During final evaluation, divide it into phases where each successive phase eliminates the lowest-ranked agents, giving more matches to higher-ranked bots that more often than not need more matches to be evaluated accurately (and also have bigger stakes for medals, prizes, etc.). And maybe, immediately after each phase, lower the "sigma" value of agents / decrease how much a skill rating can change after each match.
  • A whole-history type of skill rating system that weighs every match equally, optimizing a vector of skill ratings (our parameters) so that it satisfies some equations defining each rating with respect to all other ratings and the match results (a rough sketch follows this list).
  • Having multiple tracks on the leaderboard for different types of solutions (RL, ML-but-not-RL, rule-based), and maybe letting competitors tag themselves by level (middle/high school, undergrad, grad, professional) and by language (Python, JS, C++, etc.).
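
A rough sketch of that whole-history idea, assuming a simple Bradley-Terry / logistic model fit by gradient ascent over every recorded match; the match format, step count, and learning rate are illustrative assumptions:

```python
# Fit one rating per agent by maximizing the Bradley-Terry log-likelihood
# over the entire match history at once, so every match carries equal weight.
import numpy as np

def fit_ratings(matches, num_agents, steps=2000, lr=0.05):
    """matches: list of (winner_index, loser_index) pairs over the whole history."""
    ratings = np.zeros(num_agents)
    for _ in range(steps):
        grad = np.zeros(num_agents)
        for winner, loser in matches:
            # P(winner beats loser) under a logistic model of the rating difference.
            p = 1.0 / (1.0 + np.exp(ratings[loser] - ratings[winner]))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        ratings += lr * grad
        ratings -= ratings.mean()  # ratings are only identified up to a constant
    return ratings

# Toy example: agent 0 beats agent 1 twice, agent 1 beats agent 2 once.
print(fit_ratings([(0, 1), (0, 1), (1, 2)], num_agents=3))
```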

Visualizer

  • A less memory-intensive visualizer
  • Flexible game annotations. See Screeps for a good example
  • State-based visualizers and replays
  • Avoid last-minute changes (some of them introduced color issues)
  • More indicators of unit actions (maybe via animations, graphics, etc.). It was impossible to see when transfers happened, or whether units sat still because they chose to or because they collided, etc.

Game Design

  • It seems that imitation learning is super powerful. The first-place solution has public replays and is much, much more powerful than 2nd place, letting IL approaches get pretty good results without too much work. A few approaches: 1. make imitation learning more difficult, 2. balance the game somehow so that the top solution is never that far ahead of second place (probably difficult to do), 3. make rule-based approaches more powerful
    • Make rule-based approaches more powerful
      • Rule-based agents work well because they use algorithms for "solved" problems and various priors. Design the game so that there are some explicit algorithms that prove very useful / optimal for helping win a game. ML approaches will then need to spend time learning these or coding them into the agent.
    • Increase difficulty for ML approaches:
      • Increase board size / unit count (makes approaches that predict for every tile of the map, or on a per-unit basis, more difficult).
      • Longer horizon. Errors in imitation should accumulate over time, and more so the longer the horizon is, making IL more difficult and RL more difficult in general, provided that the game isn't easily decided early on.
      • Increase action and observation space.
      • Add more continuous actions / larger-range kinds of actions. Lux AI Season 1 has only discrete actions apart from transfer actions. However, transfer is used so little that it doesn't contribute to difficulty in IL. And while it appears the top agent (Toad Brigade at the time of writing) uses transfers, which IL would need to copy, they only ever transfer their entire cargo at once and never fractions of it, so transfer again effectively becomes a discrete action with small cardinality. If a continuous action / large-range discrete action is frequently used, IL approaches will need to do a regression task on top of the classification task (predicting which action is taken), which they either somehow do perfectly well or have to aid with some hardcoding (requiring thorough human analysis) to better predict what value a top agent would execute (a hybrid action head is sketched after this list).
      • Encode more concepts of "pipelined" strategies where a team would need to execute a sequence of high-level sets of actions. A longer sequence is probably also prone to error accumulation, and in general it may be more difficult for RL approaches to define good rewards for such strategies. Rule-based approaches probably find this easier.
      • Have more teams competing in a single match; this adds a lot more noise via the other teams doing different things that the agent being imitated reacts to differently. Our team is against this, since it makes it very difficult to estimate good skill ratings without high variance.
  • Confusing mechanics and poor specs
    • Transfers are ambiguous. It was never clarified how chain transfers happen.
    • Fair auto-mining from adjacent tiles is confusing; while it may add some nice complexity to the competition, it was 1. hard to come up with an algorithm for and 2. difficult to convey how it works at both a high level and an implementation level of detail.
    • Clarifying what is a unit and what is a building.
    • Cooldown is difficult to convey because there doesn't appear to be a reasonable term for lowering cooldown (heating up?). Unless you play turn-based video games (where this concept seems to be prevalent), it is a bit confusing.
    • Grammar check the specs
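
On the continuous-transfer point above, a minimal PyTorch sketch of the extra burden this puts on IL: the policy needs a regression head for the transferred fraction next to the usual classification head. Layer sizes and the loss pairing are assumptions for illustration only:

```python
# Hybrid action head: a discrete "which action" head plus a continuous
# "what fraction to transfer" head. Sizes here are illustrative only.
import torch
import torch.nn as nn

class HybridActionHead(nn.Module):
    def __init__(self, features=128, num_discrete_actions=6):
        super().__init__()
        self.action_logits = nn.Linear(features, num_discrete_actions)  # classification
        self.transfer_fraction = nn.Linear(features, 1)                 # regression

    def forward(self, x):
        logits = self.action_logits(x)
        fraction = torch.sigmoid(self.transfer_fraction(x))  # constrain to [0, 1]
        return logits, fraction

head = HybridActionHead()
logits, fraction = head(torch.randn(4, 128))
# An imitation loss would combine cross-entropy on `logits` with, e.g., an MSE
# term on `fraction` for the frames where the expert actually transferred.
```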

Quality of Life

  • A version notifier that automatically lets the user know, when using the GUI or CLI tool, whether an update is necessary, and that also checks whether the local environment should be updated when a new environment version is released.
  • Option to run agents as a local process, a dockerized container, or a server (with one central server running the engine, or one of the agents running the engine as well). Being able to run agents as servers and connect over the internet lets people directly challenge each other if both sides permit it. We could then provide a simple ngrok-style solution to let agents across the world connect and compete against each other with no extra cost to anyone.
  • Better day-zero Kaggle notebooks, and Kaggle notebooks for creating submissions for compiled languages, for users who don't want to use Docker / can't access it.
  • When making IDs, make them human-readable and have them contain more info, such as which team a unit is on (sketched below).
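
As a tiny illustration of the ID point (the exact fields and format are just an assumption):

```python
# Human-readable unit IDs that encode team and unit type instead of a bare number.
import itertools

_counter = itertools.count()

def make_unit_id(team, unit_type):
    """e.g. make_unit_id(0, "worker") -> 'u_t0_worker_0'"""
    return "u_t{}_{}_{}".format(team, unit_type, next(_counter))

print(make_unit_id(0, "worker"))  # u_t0_worker_0
print(make_unit_id(1, "cart"))    # u_t1_cart_1
```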

Other

  • Write better rules that players have to accept. We forgot to specifically require open-sourcing as a condition for prize awarding; it was ambiguous.
  • Make it crystal clear that a competition is open to everyone, and provide a FAQ
StoneT2000 added the enhancement (New feature or request) label on Sep 5, 2021

nosound2 commented Dec 6, 2021

It would be nice to have all actions visible by default in the GUI (maybe with a checkbox). Specifically: transfers with an arrow, cooldown with an overheating icon, the center action with a standby icon. It could significantly improve understanding of what is going on in regular replays.

Regarding the rules, maybe add an action for units to move 2 cells instead of one by spending some extra fuel, say 20-30 fuel. It may improve the diversity of attacking strategies.

Additionally, maybe add a fourth resource (oranges), which would pop up randomly on the map in small amounts after the first night and until the end.

And, of course, carts are great, just need to buff them up somehow.

StoneT2000 (Member, Author) commented

Awesome suggestions @nosound2

We actually considered exactly that idea about additional movement at the cost of some resources. This may or may not appear next season in some form.

For features that are non-deterministic to a player, like random oranges, I'm not too sure; I think the already high variability in map generation and resource distribution creates significant variance in a game, and it doesn't need more randomness that could make a game unfair.

And for the visualizer/GUI (and the error logs, like warnings about collisions, invalid actions, etc.), I completely agree that they need to be more readable. So: visuals for all actions (a common request in our mid-season feedback), human-readable IDs, and more. Perhaps animations would be a good idea. I think movement actions were quite clear thanks to the smooth animations; perhaps we can do the same for all the other actions somehow, up to our designer.


nosound2 commented Dec 7, 2021

That is cool, thanks.

Regarding the "oranges", after I wrote the proposal I realised that randomness in map generation after the initial generation is not acceptable. The whole map must be generated in the beginning. Then, maybe they can be generated from the start, but be activated at specific turns. So it will be known from the beginning where and when they appear.

The reason I push this proposal is that I feel the mid-to-late game is a little static. Sometimes completely static, if all the resources are already collected or well guarded. Having some options for a comeback, an ability to "break the defence" of that wood cluster, or just having some objectives in the mid-to-late game in general would be good.

Reducing the number of units allowed on the map could help, maybe by having a price for unit production.


royerk commented Dec 7, 2021

Leaderboard: show full rank information. For example, for a TrueSkill-based system, show mean & sigma; it helps show how stable a bot is.
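
For illustration, with the trueskill Python package that could look something like the sketch below; the ratings are made up, and mu - 3*sigma is just the usual conservative display convention:

```python
# Show mu, sigma, and a conservative estimate for each bot instead of a single score.
import trueskill

bots = {
    "bot_a": trueskill.Rating(mu=27.3, sigma=1.1),
    "bot_b": trueskill.Rating(mu=28.0, sigma=4.5),  # higher mean, but far less stable
}

for name, r in sorted(bots.items(), key=lambda kv: kv[1].mu, reverse=True):
    conservative = r.mu - 3 * r.sigma
    print("{}: mu={:.1f}  sigma={:.1f}  conservative={:.1f}".format(name, r.mu, r.sigma, conservative))
```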
