fix: move PRAGMA journal_mode/synchronous out of upgrade SQL
PRAGMA journal_mode and synchronous cannot be changed inside a
transaction. The v40 upgrade SQL bundled DROP INDEX statements
(which implicitly start a transaction) together with PRAGMAs,
causing a "Safety level may not be changed inside a transaction"
error.
Fix: remove PRAGMAs from db_upgrades.h v40 and set synchronous=NORMAL
in pkgdb_open right after journal_mode=WAL, where no transaction
is active.
run_prstmt: skip expensive sqlite3_expanded_sql when debug is off
sqlite3_expanded_sql formats the entire SQL with all bound values
substituted. This was called unconditionally on every prepared
statement execution, even in production (debug_level=0), wasting
CPU cycles on memory allocation and string formatting.
Guard the call with ctx.debug_level >= 4, matching the pattern
already used by pkgdb_debug().
shlibs: add missing indexes on shlib_id for faster library lookups
pkgdb_query_shlib_require/provide join through shlib_id but no index
existed on that column. Without an index, SQLite does a full table scan
of pkg_shlibs_required/provided to find matching rows for a given shlib.
Added indexes (4 for local DB, 2 for binary repo):
pkg_shlibs_required_shlib_id(shlib_id)
pkg_shlibs_required_ignore_shlib_id(shlib_id)
pkg_shlibs_provided_shlib_id(shlib_id)
pkg_shlibs_provided_ignore_shlib_id(shlib_id)
Local DB: schema bumped to v42 with upgrade path for existing databases.
Binary repo: added to binary_repo_initsql.
flavors: replace duplicated CTE with persistent view in both databases
The WITH flavors AS (...) CTE was duplicated across 5 queries in
pkgdb_query.c and repo/binary/query.c. Replace it with a persistent
VIEW created in both the local and binary repo schemas.
Changes:
- binary_private.h: add CREATE VIEW to binary_repo_initsql
- pkgdb.c: add CREATE VIEW to pkgdb_init() for fresh databases
- db_upgrades.h: add upgrade v41 to create the view on existing DBs
- pkgdb_query.c: remove the 2 duplicated CTE definitions
- repo/binary/query.c: remove the 2 duplicated CTE definitions
- pkgdb.c: bump DB_SCHEMA_MINOR 40 -> 41
binary repo: use 16K pages and synchronous=OFF during bulk update
- page_size: switch from getpagesize() (typically 4K) to 16384 for
better B-tree cache efficiency with millions of rows
- synchronous: set to OFF during bulk INSERT phase; the database is
re-opened with synchronous=FULL on subsequent access via
pkg_repo_binary_init, so durability is maintained for read queries
local db: remove 15 redundant indexes, switch to WAL+NORMAL, bump schema v40
Each *_{package_id} index is redundant: all junction tables use
PRIMARY KEY(package_id, other_id) or UNIQUE(package_id, other_id),
which implicitly creates an index with package_id as leading column.
The explicit single-column index adds storage and INSERT overhead
with zero query-plan benefit.
Dropped indexes (15 total):
pkg_script_package_id, pkg_directories_package_id,
pkg_categories_package_id, pkg_licenses_package_id,
pkg_users_package_id, pkg_groups_package_id,
pkg_shlibs_required_package_id, pkg_shlibs_required_ignore_package_id,
pkg_shlibs_provided_package_id, pkg_shlibs_provided_ignore_package_id,
pkg_annotation_package_id, pkg_conflicts_pid,
pkg_provides_id, pkg_requires_package_id, pkg_lua_script_package_id
PRAGMA changes:
- journal_mode: TRUNCATE -> WAL (better concurrent read performance)
[4 lines not shown]
pkgdb_iterator: convert implicit SQL-89 JOINs to explicit JOIN syntax
Replaced 11 comma-separated implicit JOIN queries with explicit
JOIN ... ON syntax for clarity and maintainability:
- pkg_directories, directories
- pkg_licenses, licenses AS l
- pkg_categories, categories AS c
- pkg_users, users
- pkg_groups, groups
- pkg_shlibs_required(/ignore/provided/ignore), shlibs AS s
- pkg_provides, provides AS s
- pkg_requires, requires AS s
pkgdb_query: optimized SQL queries and removed BUFSIZ limit
- Removed GROUP BY p.id from exact-match which query (path TEXT PRIMARY KEY
guarantees uniqueness per file across all packages; GROUP BY retained for
GLOB mode)
- Changed LEFT JOIN to INNER JOIN in which query (WHERE f.path = ? filters
NULLs, making LEFT JOIN misleading)
- Replaced fixed BUFSIZ buffers with xasprintf() for safe dynamic allocation
in pkgdb_query_cond, pkgdb_query_which, and pkgdb_file_exists
- Simplified pkgdb_file_exists to a const string instead of snprintf
rwhich: remove redundant pkg_files indexes
The pkg_files table has a UNIQUE(package_id, dir_id, name) constraint
which implicitly creates an index with package_id as the leading column.
This serves any WHERE package_id=? query just as efficiently as a
dedicated index, making pkg_files_package_id redundant.
The pkg_files_name index is unused: no query in the binary repo filters
by name alone - file_which either matches (dir_id, name) via the
composite index, or uses a GLOB on a concatenated expression which
cannot use any index.
Only pkg_files_dir_name(dir_id, name) is retained, as it is used by the
exact-match path in pkg_repo_binary_file_which.
rwhich: replace SQL subquery with hashmap for dir_id lookups
The FILEDIR2 prepared statement previously used a subquery:
(SELECT id FROM file_dirs WHERE path = ?2)
for every file inserted (6M+ rows on a full FreeBSD repo update).
Replace this with an in-memory pkghash built from a single SELECT after
Phase 1 directory insertion. Phase 2 looks up dir_id from the hashmap
in O(1) average instead of O(log N) per file via SQL.
Also change FILEDIR2 to accept dir_id as a direct integer parameter,
eliminating the subquery entirely.
Net improvement: ~6M subquery executions → 1 bulk SELECT + ~6M hash
lookups (all in-process, no SQL round-trips).
rwhich: remove unnecessary GROUP BY in exact-match query
The UNIQUE(package_id, dir_id, name) constraint guarantees at most
one row per package per file. No grouping is needed for the exact
match query path. The GROUP BY is retained in the glob variant,
where a single GLOB can match multiple distinct files from the same
package.
jobs/conflicts: fix data corruption in conflict_items_insert
conflict_items_insert called vec_push first, then memmove to shift
elements right at the insertion position. This overwrote the just-pushed
element at the end of the vector, duplicating the last existing element
instead.
Fix by growing capacity manually, shifting elements first, then
inserting at the correct position.
pkghash: fix deletion in open-addressing hash table
Use tombstone markers instead of NULLing keys on deletion.
In open-addressing with linear probing, setting key to NULL creates
a hole that breaks probe chains for subsequent lookups.
Track tombstones and trigger rehash when tombstone density exceeds 25%
to guarantee bounded probe sequences.
fdopendir UB: use dirfd() instead of raw fd
POSIX states that after fdopendir(), the file descriptor is under
the control of the system and must not be used directly. Use dirfd()
to retrieve the fd from the DIR stream for subsequent openat()
and fstatat() calls.
Rename dirfd variable in clean_cache.c to avoid collision with
the POSIX dirfd() function.
pkg: simplify pkg_is_valid with table-driven NULL checks
Replace the chain of 9 identical if/return checks with a loop over
a table of (offsetof, name) pairs. This reduces ~50 lines to ~25,
makes the required-fields list explicit, and simplifies adding or
removing required fields in the future.
manifest: replace sscanf in url_decode with manual hex conversion
The sscanf("%2x") call in url_decode is expensive and pulls in
the full scanf machinery. Replace it with a simple inline hex
conversion using the (c | 32) trick to normalize letter case.
manifest: replace script-type if-chains with table-driven lookups
Replace the manual if/STREQ chains in script_type_str() and
lua_script_type_str() with loops over static const lookup tables.
This makes the valid types explicit and simplifies adding new script
types in the future.
add: fix temp file leak and potential buffer overflow in merge_with_external_tool
- Unlink the output_path temp file on all error paths (strlcpy truncation
and tmp file creation failures) to prevent leaking temp files in /tmp.
- Fix potential buffer overflow when constructing the merge command:
strlcpy returns the full source length even on truncation, which could
advance i past the command[MAXPATHLEN] bounds. Cap at the actual
number of bytes written to stay within the buffer.
checksum: replace sprintf in hex encoding with inline conversion
Replace sprintf("%02x") in pkg_checksum_encode_hex with a
simple lookup table for hex characters. This is faster (no
format string parsing) and equally clear.
libpkg: Add CVE name parsing to OSVf parser
Add CVE names parsing to OSVf parser. As there
is no CVE name in OSVf schema. CVE names are extending
database_specific-object. Usage example JSON would be:
...
"database_specific": {
"references": {
"cvename": [
"CVE-2003-0031",
"CVE-2003-0032"
]
}
}
...
tests: Add CVEs to example OSVf JSON
Add CVSs to example OSVf JSON. Currently
they are only for parsing and there is no
further testing are they correct
fix: Update tests to support new updated OSVf Schema
In official OSVf Schema FreeBSD Ecosystem is not FBSD
but FreeBSD. Update correction to test json file and to library
test file.
refactor: Add SPDX license identifier tags to files
Add SPDX license identifier tags to files that are licensed under the
LicenseRef-scancode-bsd-unchanged license
refactor: Add identifier tags to files that are licensed under the MIT
Add SPDX license identifier tags to files that are licensed under the
MIT license
refactor: Add identifier tags to files that are licensed under the ISC
Add SPDX license identifier tags to files that are licensed under the
ISC license
refactor: Add identifier tags to files that are licensed under the BSD-2-Clause
Add SPDX license identifier tags to files that are licensed under the
BSD-2-Clause license.
refactor: Add identifier tags to files that are licensed under the BSD-3-Clause
Add SPDX license identifier tags to files that are licensed under the
BSD-3-Clause license