Skip to content

Added Legolas with first node/edge schemes.#43

Merged
visr merged 5 commits into
mainfrom
feat/validation
Jan 26, 2023
Merged

Added Legolas with first node/edge schemes.#43
visr merged 5 commits into
mainfrom
feat/validation

Conversation

@evetion
Copy link
Copy Markdown
Member

@evetion evetion commented Jan 19, 2023

Fixes #24. Legolas only really does validation on a schema level, but the defined struct allows for per row validation. Next up is to validate the graph itself.

Comment thread src/Ribasim.jl
Comment thread src/Ribasim.jl Outdated
evetion and others added 3 commits January 19, 2023 16:02
Co-authored-by: Martijn Visser <mgvisser@gmail.com>
@evetion evetion marked this pull request as ready for review January 24, 2023 09:32
@evetion
Copy link
Copy Markdown
Member Author

evetion commented Jan 24, 2023

I've added a load of TODOs, not all of them need to be fixed in this PR.

Creating a JSON scheme can be done, but getting all the little rules in there (connectors) will be a pain and there's no automation for it in the current ecosystem.

I had to fix the tests to adhere to the Schema we defined, so that's great already.

Comment thread src/bmi.jl
sv = schema()
validate(Tables.schema(table), sv)
R = Legolas.record_type(sv)
foreach(R, Tables.rows(table)) # construct each row
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know roughly how long this takes for the national schematization? Especially the forcing can contain millions of rows, and sometimes only a fraction of them is used for short calculations.

Also, the result of this loop is not kept, that means that we only get errors, but any changes like clamp in the Legolas tour is never done, or not?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, at the moment I only check the types and for empty strings. And I rather throw on error than clamp some values, but that's up for debate.

Copy link
Copy Markdown
Member

@visr visr Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yeah right now we don't use something like clamp, and I can't directly think of a use case where it would be better than an error, so let's keep it like this.

Performance was answered here: #43 (comment)

Comment thread src/construction.jl
profile = DataFrame(read_table(config["profile"]))
forcing = DataFrame(read_table(config["forcing"]))
# Load data and validate schema + rows for required field tyopes and values
node = read_table(config["node"]; schema = NodeV1SchemaVersion)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to pass the type, and not its instance, NodeV1SchemaVersion()? The latter feels a bit more natural. And the schema keyword can then be set to ::Union{SchemaVersion, Nothing}.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Legolas code requires the type, I passed the type. I can make it an instance and use typeof?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok if Legolas works on the type we can keep it as is, but https://github.com/Deltares/Ribasim.jl/pull/43/files#diff-cd7159638f60b3d2b30ed4bbe7b622ffcdc834f0a2beafe12f46e219a4aa932aR69 (sv = schema()) suggests it doesn't.

Comment thread src/construction.jl Outdated
Comment thread src/construction.jl
Comment thread src/construction.jl Outdated
Comment thread src/construction.jl
Comment thread src/construction.jl
Comment on lines +38 to 43
@assert issorted(forcing.time)
startrow = searchsortedfirst(forcing.time, starttime)
endrow = searchsortedlast(forcing.time, endtime)
forcing = @view forcing[startrow:endrow, :]
# then keep only IDs we use
forcing = filter(:id => in(ids), forcing; view = true)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch that the assert must be moved up because of the searchsorted first.

In general in this function before I only validated the subset that we are using, and now we are validating the whole dataset. If it is cheap enough that is fine, but good to check the cost.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slowest is indeed validation the forcing on 3M rows, which takes 1.7s.

[ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.node"), 1} for 29616 rows.
  0.019118 seconds (206.28 k allocations: 4.905 MiB)
┌ Warning: unsupported ARROW:extension:name type: "geoarrow.linestring", arrow type = Vector{Tuple{Float64, Float64}}
└ @ Arrow ~/.julia/packages/Arrow/QsQ3U/src/eltypes.jl:53
[ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.edge"), 1} for 39043 rows.
  0.024101 seconds (426.03 k allocations: 10.671 MiB)
[ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.state"), 1} for 8531 rows.
  0.028598 seconds (75.18 k allocations: 1.541 MiB, 76.58% gc time)
[ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.static"), 1} for 5464 rows.
  0.010356 seconds (52.27 k allocations: 1.375 MiB)
[ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.profile"), 1} for 46560 rows.
  0.029184 seconds (601.92 k allocations: 13.447 MiB)
[ Info: Validating Legolas.SchemaVersion{Symbol("ribasim.forcing"), 1} for 3088222 rows.
  1.741670 seconds (33.91 M allocations: 837.849 MiB, 4.59% gc time)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the numbers, I'd say 1.7s is no reason to worry for now, we don't have to optimize further. But perhaps later on if needed we will validate only what we use.

Comment thread src/construction.jl
Comment thread src/validation.jl
@visr visr enabled auto-merge (squash) January 26, 2023 09:52
@visr visr merged commit 1b9ec20 into main Jan 26, 2023
@visr visr deleted the feat/validation branch January 26, 2023 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

validation of inputs

2 participants