Added Legolas with first node/edge schemes. by evetion · Pull Request #43 · Deltares/Ribasim

evetion · 2023-01-19T14:30:04Z

Fixes #24. Legolas only really does validation on a schema level, but the defined struct allows for per row validation. Next up is to validate the graph itself.

Co-authored-by: Martijn Visser <mgvisser@gmail.com>

evetion · 2023-01-24T09:35:13Z

I've added a load of TODOs, not all of them need to be fixed in this PR.

Creating a JSON scheme can be done, but getting all the little rules in there (connectors) will be a pain and there's no automation for it in the current ecosystem.

I had to fix the tests to adhere to the Schema we defined, so that's great already.

visr · 2023-01-24T15:03:25Z

+        sv = schema()
+        validate(Tables.schema(table), sv)
+        R = Legolas.record_type(sv)
+        foreach(R, Tables.rows(table))  # construct each row


Do you know roughly how long this takes for the national schematization? Especially the forcing can contain millions of rows, and sometimes only a fraction of them is used for short calculations.

Also, the result of this loop is not kept, that means that we only get errors, but any changes like clamp in the Legolas tour is never done, or not?

That's correct, at the moment I only check the types and for empty strings. And I rather throw on error than clamp some values, but that's up for debate.

Ok yeah right now we don't use something like clamp, and I can't directly think of a use case where it would be better than an error, so let's keep it like this.

Performance was answered here: #43 (comment)

visr · 2023-01-24T15:07:19Z

-    profile = DataFrame(read_table(config["profile"]))
-    forcing = DataFrame(read_table(config["forcing"]))
+    # Load data and validate schema + rows for required field tyopes and values
+    node = read_table(config["node"]; schema = NodeV1SchemaVersion)


Is there any reason to pass the type, and not its instance, NodeV1SchemaVersion()? The latter feels a bit more natural. And the schema keyword can then be set to ::Union{SchemaVersion, Nothing}.

Because Legolas code requires the type, I passed the type. I can make it an instance and use typeof?

Ok if Legolas works on the type we can keep it as is, but https://github.com/Deltares/Ribasim.jl/pull/43/files#diff-cd7159638f60b3d2b30ed4bbe7b622ffcdc834f0a2beafe12f46e219a4aa932aR69 (sv = schema()) suggests it doesn't.

visr · 2023-01-24T15:34:06Z

+    @assert issorted(forcing.time)
    startrow = searchsortedfirst(forcing.time, starttime)
    endrow = searchsortedlast(forcing.time, endtime)
    forcing = @view forcing[startrow:endrow, :]
    # then keep only IDs we use
    forcing = filter(:id => in(ids), forcing; view = true)


Good catch that the assert must be moved up because of the searchsorted first.

In general in this function before I only validated the subset that we are using, and now we are validating the whole dataset. If it is cheap enough that is fine, but good to check the cost.

The slowest is indeed validation the forcing on 3M rows, which takes 1.7s.

[ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.node"), 1} for 29616 rows. 0.019118 seconds (206.28 k allocations: 4.905 MiB) ┌ Warning: unsupported ARROW:extension:name type: "geoarrow.linestring", arrow type = Vector{Tuple{Float64, Float64}} └ @ Arrow ~/.julia/packages/Arrow/QsQ3U/src/eltypes.jl:53 [ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.edge"), 1} for 39043 rows. 0.024101 seconds (426.03 k allocations: 10.671 MiB) [ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.state"), 1} for 8531 rows. 0.028598 seconds (75.18 k allocations: 1.541 MiB, 76.58% gc time) [ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.static"), 1} for 5464 rows. 0.010356 seconds (52.27 k allocations: 1.375 MiB) [ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.profile"), 1} for 46560 rows. 0.029184 seconds (601.92 k allocations: 13.447 MiB) [ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.forcing"), 1} for 3088222 rows. 1.741670 seconds (33.91 M allocations: 837.849 MiB, 4.59% gc time)

Thanks for the numbers, I'd say 1.7s is no reason to worry for now, we don't have to optimize further. But perhaps later on if needed we will validate only what we use.

Added Legolas with first node/edge schemes.

fcb5cbc

visr reviewed Jan 19, 2023

View reviewed changes

Comment thread src/Ribasim.jl

Comment thread src/Ribasim.jl Outdated

evetion and others added 3 commits January 19, 2023 16:02

Update src/Ribasim.jl

63b8b54

Co-authored-by: Martijn Visser <mgvisser@gmail.com>

Include valdation.jl

7179cf3

Working validation.

6d9e28d

evetion marked this pull request as ready for review January 24, 2023 09:32

visr reviewed Jan 24, 2023

View reviewed changes

Addressed review comments.

d8eb4c0

visr enabled auto-merge (squash) January 26, 2023 09:52

visr approved these changes Jan 26, 2023

View reviewed changes

visr merged commit 1b9ec20 into main Jan 26, 2023

visr deleted the feat/validation branch January 26, 2023 09:52

Conversation

evetion commented Jan 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

evetion commented Jan 24, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

visr Jan 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

evetion commented Jan 19, 2023 •

edited

Loading

visr Jan 26, 2023 •

edited

Loading