Model
TeoCalvo committed Jul 11, 2023
1 parent 8934038 commit 25096c9
Showing 4 changed files with 68 additions and 0 deletions.
29 changes: 29 additions & 0 deletions README.md
@@ -3,3 +3,32 @@
Repository for adding data to the datalake maintained by the [Téo Me Why](https://www.twitch.tv/teomewhy) channel.

All the necessary code and basic information about the data will be made available in this repository.

## Existing data

|Name|Context|Source|Schema|
|---|---|---|---|
|Censo Escolar|Microdata from the Basic Education School Census (Censo Escolar da Educação Básica)|[:link:](https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/censo-escolar)|bronze_censo_escolar|
|Enem|Enem microdata|[:link:](https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/enem)|bronze_enem|
|Gamers Club|Match, medal, and player statistics|[:link:](https://www.kaggle.com/datasets/gamersclub/brazilian-csgo-plataform-dataset-by-gamers-club)|bronze_gc|
|Olist|E-commerce sales|[:link:](https://www.kaggle.com/datasets/gamersclub/brazilian-csgo-plataform-dataset-by-gamers-club)|bronze_olist|
|TSE|Elections|[:link:](https://dadosabertos.tse.jus.br/dataset/)|bronze_tse, silver_tse|

## How do I request new data?

Open an issue in this project using the following template:

```
Title: Name of the data source
- Description of the data source:
- What is it about?
- What is the context of this data?
- How many years of history?
- What is the volume?
- Link to access the data.
- Why is this data relevant, and why should it be in the datalake?
```
22 changes: 22 additions & 0 deletions src/03.silver/igdb/models/fit_model.py
@@ -91,3 +91,25 @@
.cumsum()
.head(50)
.index.tolist())

# COMMAND ----------



imputer_0 = imputation.ArbitraryNumberImputer(arbitrary_number=0, variables=features)

tree_model = ensemble.RandomForestClassifier(min_samples_leaf=250, max_depth=8)

model_pipeline = pipeline.Pipeline( [('Imputer', imputer_0),
("Tree Model", tree_model)])


with mlflow.start_run():
    mlflow.sklearn.autolog()

    model_pipeline.fit(X_train, y_train)

    y_prob_test = model_pipeline.predict_proba(X_test)
    auc_test = metrics.roc_auc_score(y_test, y_prob_test[:,1])
    mlflow.log_metrics({"auc_test": auc_test})
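
The `auc_test` metric logged above comes from `metrics.roc_auc_score`. As a rough illustration of what that number means, the sketch below computes ROC AUC in plain Python as the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not part of this commit; this is not how scikit-learn implements the metric internally.

```python
def roc_auc(y_true, y_score):
    """Pairwise ROC AUC: chance that a random positive outscores a random negative."""
    # Split the scores by true label (1 = positive class, 0 = negative class).
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC is undefined unless both classes are present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data standing in for y_test and y_prob_test[:, 1].
y_test_example = [0, 0, 1, 1]
y_prob_example = [0.1, 0.4, 0.35, 0.8]
auc_example = roc_auc(y_test_example, y_prob_example)
print(auc_example)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The pairwise loop is quadratic, so it is only suitable for small illustrations like this one.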

4 changes: 4 additions & 0 deletions src/03.silver/igdb/models/predict_model.py
@@ -40,6 +40,10 @@
predict_sdf = predict_set.load_df()
predict_pdf = predict_sdf.toPandas()

# COMMAND ----------



# COMMAND ----------

# DBTITLE 1,Predictions
13 changes: 13 additions & 0 deletions src/03.silver/pizza_query/extract.sql
@@ -0,0 +1,13 @@
-- Databricks notebook source
SELECT *
FROM silver.pizza_query.item_pedido

-- COMMAND ----------

SELECT *
FROM silver.pizza_query.pedido

-- COMMAND ----------

SELECT *
FROM silver.pizza_query.produto
