Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about the validity of my data in a PRE_COPY event #15

Closed
saami783 opened this issue Jun 24, 2024 · 7 comments
Closed

Question about the validity of my data in a PRE_COPY event #15

saami783 opened this issue Jun 24, 2024 · 7 comments

Comments

@saami783
Copy link

Hello,

I want to add checks on the validity of my data in a PRE_COPY event. Specifically, I want to verify if my data has the correct expected format, if there are no null values in any fields, etc. However, I do not understand how the bundle operates when I perform these checks. Should I retrieve the data from the temporary import table, perform my checks on this data, then truncate the table to insert the valid data back into the same table, and finally let the bundle handle the copying?

@thislg
Copy link
Member

thislg commented Jun 24, 2024

See my example with valid IS true copy condition in issue #2. In a POST_LOAD event subscriber you can set the valid boolean to false or true based on your custom logic. A common use case is running SQL queries on your temp table to validate the data. Then let the bundle handle the copying: lines with valid values set to false won't be copied.

@saami783
Copy link
Author

Or, can we validate the current occurrence? For example, the bundle checks the first occurrence in the table, then my program checks the same row immediately afterward, and so on until there are no more data to validate. This would save me from having to retrieve all the data and manipulate the import database.

@saami783
Copy link
Author

saami783 commented Jun 24, 2024

See my example with valid IS true copy condition in issue #2. In a POST_LOAD event subscriber you can set the valid boolean to false or true based on your custom logic. A common use case is running SQL queries on your temp table to validate the data. Then let the bundle handle the copying: lines with valid values set to false won't be copied.

Ok thank you for your answer, so I have an extra_field isValid like this in my configuration :

resources:
  import:
    tablename: scratch.import_file_tmp
    load:
      add_file_line_number: true
      # strategy: https://github.com/le-phare/import-bundle/blob/master/docs/configure/load.md#strategy
      pattern: '.*poke.*'
      format_options:
        with_header: true
        field_delimiter: "\t"
        line_delimiter: "\n"
      fields: # https://github.com/le-phare/import-bundle/blob/master/docs/configure/load.md#:~:text=fields%20et%20extra_fields-,fields,-and%20extra_fields%20have
        NAME: ~
        family: ~
        pv: ~
        pf: ~
        pm: ~
      extra_fields: # https://github.com/le-phare/import-bundle/issues/2
        calcul: integer
        isValid:
          type: boolean
          options:
            default: NULL

    copy:
      target: poke
      strategy: insert # https://github.com/le-phare/import-bundle/blob/master/docs/configure/copy.md#:~:text=MySQL%2C%20PostgreSQL%209.5%2B-,Executes,-an%20UPDATE%20in
      strategy_options:
        copy_condition: isValid IS TRUE
        distinct: true
        conflict_target: name
        non_updateable_fields: ["name"]
      mapping:
        name: name
        family: family
        pv: pv
        pf: pf
        pm: pm

And now, assuming I want to validate my data where the pv field must not be null and it is an integer, how can I do this in my POST_LOAD event? Without considering the possible options in the configuration file such as:

<field_1_name>:
    type: string
    options:
        notnull: false
class OnPostLoadSubscriber implements EventSubscriberInterface
{

    public function __construct(private FormFactoryInterface $formFactory,
                                private Connection $connection,
                                private LoggerInterface $logger) {
    }

    /**
     * @throws Exception
     */
    public function onImportPostLoad($event): void
    {
        $config = $event->getConfig()->get('resources')['import']->getConfig();

        $tableName = explode('.', $config['tablename']);
        $tableName = $tableName[1];

        $sql = "SELECT * FROM {$tableName}";

        $this->connection->beginTransaction();

        $stmt = $this->connection->executeQuery($sql);
        $entities = $stmt->fetchAllAssociative();

        foreach ($entities as $entityData) {
            $fileLineParts = explode(':', $entityData['file_line_no']);
            $id = $fileLineParts[2];
            $poke = new Poke();

            $poke->setId((int) $id);
            $poke->setName($entityData['name']);
            $poke->setFamily($entityData['family']);
            $poke->setPv($entityData['pv']);
            $poke->setPf($entityData['pf']);
            $poke->setPm($entityData['pm']);
            $poke->setCalcul($entityData['calcul']);

            echo $poke . "\n";

            $form = $this->formFactory->create(PokeType::class, $poke);
            $form->submit($poke);

            if ($form->isSubmitted() && $form->isValid()) {
                $sqlUpdate = "UPDATE {$tableName} SET isvalid = :isvalid WHERE file_line_no LIKE :file_line_no";
                $this->connection->executeQuery($sqlUpdate, [
                    'isvalid' => 1,
                    'file_line_no' => "%:" . $id,
                ]);
            } else {
                $errors = (string) $form->getErrors(true, false);
                $this->logger->alert("Validation errors for Poke ID {$id}: " . $errors . "\n");
            }
        }
        $this->connection->commit();
        $this->connection->close();
    }
    public static function getSubscribedEvents(): array
    {
        return [
            'import.post_load' => 'onImportPostLoad',
        ];
    }
}

In order to understand the validation logic and enforce my constraints (using Symfony forms, etc.), do I need to retrieve all the data from the table with an SQL query and validate them one by one, ensuring to indicate whether isValid = true or false?

@thislg
Copy link
Member

thislg commented Jun 28, 2024

Your method of using Symfony Form requires iterating over the data which can be slow and inefficient.
For simple checks without logging you can use copy_condition directly: copy_condition: temp.pv IS NOT NULL AND pg_input_is_valid(temp.pv, 'integer').
There is another way by using an UPDATE query to set your isvalid flag for the whole table in your event subscriber:

UPDATE {$tableName} SET isvalid = FALSE WHERE temp.pv IS NULL OR !pg_input_is_valid(temp.pv, 'integer')

Then if you want to log invalid lines, use SELECT file_line_no FROM scratch.import_file_tmp WHERE isvalid = FALSE request to retrieve them and call your logger.

@saami783
Copy link
Author

saami783 commented Jul 1, 2024

Ok thank you for your answer @thislg

@saami783
Copy link
Author

saami783 commented Jul 2, 2024

Is this available on MariaDB & MySQL? Because here, the prefix pg seems to indicate PostgreSQL ... and I'm not sure if the other engines offer it.

@thislg
Copy link
Member

thislg commented Jul 2, 2024

I gave you an example in Postgresql but you can use any function available in your DBMS.

@thislg thislg closed this as completed Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants