[pull] master from apify:master #1

Merged
merged 86 commits into from
May 25, 2024

Conversation

pull[bot]

@pull pull bot commented Apr 27, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Apr 27, 2024
renovate bot and others added 28 commits April 28, 2024 00:05
`undefined` means that there is no explicit rule for the requested
route. No rules means no disallow, therefore it's allowed.
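
The fallback described above can be sketched roughly as follows. `RobotsLookup`, `isPathAllowed`, and the toy rule set are hypothetical names used only for illustration; the one assumption taken from the description is that the underlying lookup returns `undefined` when no explicit rule matches.

```typescript
// Hypothetical lookup: true/false when an explicit rule matches, undefined when none does.
type RobotsLookup = (path: string) => boolean | undefined;

function isPathAllowed(lookup: RobotsLookup, path: string): boolean {
    // No explicit rule means nothing disallows the path, so treat it as allowed.
    return lookup(path) ?? true;
}

// Toy lookup with one disallow rule and one allow rule (illustrative only).
const toyLookup: RobotsLookup = (path) => {
    if (path.startsWith('/private')) return false;
    if (path.startsWith('/public')) return true;
    return undefined;
};

const allowedWithoutRule = isPathAllowed(toyLookup, '/anything-else');
const allowedPrivate = isPathAllowed(toyLookup, '/private/page');
```

Using `??` rather than `||` keeps an explicit `false` (disallowed) intact and only replaces the missing-rule case.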

Fixes #2437

---------

Co-authored-by: Jan Buchar <Teyras@gmail.com>
The docs appear to be a bit misleading. If people want "Same Subdomain"
they should actually use "Same Hostname".

![image](https://github.com/apify/crawlee/assets/10026538/2b5452c5-e313-404b-812d-811e0764bd2d)
…#2442)

According to
[RFC 1341](https://www.w3.org/Protocols/rfc1341/4_Content-Type.html), the
`Content-Type` header can contain additional string parameters.
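
As a rough illustration of what "additional parameters" means here, the sketch below splits a header value into the media type and its parameters. `parseContentType` is a hypothetical helper, not the code changed in this commit, and it deliberately ignores edge cases such as a `;` inside a quoted value.

```typescript
interface ParsedContentType {
    type: string;
    parameters: Record<string, string>;
}

// Hypothetical helper: separates the media type from `name=value` parameters.
function parseContentType(header: string): ParsedContentType {
    const [rawType, ...rawParams] = header.split(';');
    const parameters: Record<string, string> = {};
    for (const rawParam of rawParams) {
        const eq = rawParam.indexOf('=');
        if (eq === -1) continue; // skip malformed segments without "="
        const name = rawParam.slice(0, eq).trim().toLowerCase();
        // Strip one pair of surrounding double quotes, if present.
        const value = rawParam.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1');
        if (name) parameters[name] = value;
    }
    return { type: (rawType ?? '').trim().toLowerCase(), parameters };
}

const html = parseContentType('text/html; charset=UTF-8');
const form = parseContentType('multipart/form-data; boundary="abc123"');
```

Comparing only `html.type` against an expected media type avoids mismatches caused by a trailing `charset` or `boundary` parameter.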
During local development, we fire events for the AutoscaledPool
about current system resources like memory or CPU. We were firing them
once a minute by default, but snapshots older than 30s are removed,
so we never had anything to compare against and always used only the
very last piece of information.

This PR changes the interval to 1s, aligning this with how the Apify
platform fires events.
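
The mismatch between the old interval and the retention window can be seen in a small simulation. `retainedCounts` is a hypothetical function written for this note; whether the real pruning treats the 30s boundary as inclusive is an assumption.

```typescript
// Simulates emitting one snapshot every `intervalMs` and pruning snapshots
// older than `maxAgeMs`. Returns how many snapshots remain after each emission.
function retainedCounts(intervalMs: number, maxAgeMs: number, emissions: number): number[] {
    let timestamps: number[] = [];
    const counts: number[] = [];
    for (let i = 0; i < emissions; i++) {
        const now = i * intervalMs;
        timestamps.push(now);
        // Assumption: a snapshot exactly `maxAgeMs` old is still kept.
        timestamps = timestamps.filter((t) => now - t <= maxAgeMs);
        counts.push(timestamps.length);
    }
    return counts;
}

// Old default: one event per minute, 30s retention.
const oldCounts = retainedCounts(60000, 30000, 5);
// New default: one event per second, 30s retention.
const newCounts = retainedCounts(1000, 30000, 40);
```

With a 60s interval every earlier snapshot has already expired when the next one arrives, so only one is ever available; with a 1s interval the window fills up with samples to compare.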
…ts()` (#2456)

This PR resolves three main issues with adding a large number of requests
to the queue:
- Every request added to the queue was automatically added to the LRU
requests cache, which has a size of 1 million items. This makes sense
for enqueuing a few items, but if we try to add more than the limit, we
end up overloading the LRU cache for no reason. Now we only add the
first 1000 requests to the cache (plus any requests added via separate
calls, e.g. when doing `enqueueLinks` from inside a request handler,
again with a limit of the first 1000 links).
- We used to validate the whole requests array via `ow`, and since the
shape can vary, it was very slow (e.g. 20s just for the `ow`
validation). Now we use a tailored validation for the array that does
the same but resolves within 100ms or so.
- We always created the `Request` objects out of everything, which had a
significant impact on memory usage. Now we skip this completely and let
the objects be created later when needed (when calling
`RQ.addRequests()`, which only receives the actual batch and not the
whole array).
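
The three points above can be sketched together in a simplified form. This is not the code from the PR: `isValidSource`, `planEnqueue`, and `CACHE_LIMIT` are hypothetical names, the cache is a plain array rather than an LRU cache, and the validation is only a minimal stand-in for the tailored check the description mentions.

```typescript
// Taken from the description: only the first 1000 requests are cached.
const CACHE_LIMIT = 1000;

type Source = string | { url: string };

// Cheap hand-written shape check instead of a generic schema validator.
function isValidSource(source: unknown): source is Source {
    if (typeof source === 'string') return source.length > 0;
    return typeof source === 'object' && source !== null
        && typeof (source as { url?: unknown }).url === 'string';
}

// Walks the input in batches; nothing heavier than a URL string is created here,
// so full request objects could be built later, one batch at a time.
function planEnqueue(sources: readonly unknown[], batchSize: number) {
    const cached: string[] = [];
    let batches = 0;
    for (let start = 0; start < sources.length; start += batchSize) {
        const batch = sources.slice(start, start + batchSize);
        batches++;
        for (const source of batch) {
            if (!isValidSource(source)) {
                throw new Error('Invalid request source: expected a URL string or an object with a `url` string');
            }
            const url = typeof source === 'string' ? source : source.url;
            // Stop filling the cache once the limit is reached.
            if (cached.length < CACHE_LIMIT) cached.push(url);
        }
    }
    return { batches, cachedCount: cached.length };
}

const manyUrls: string[] = [];
for (let i = 0; i < 2500; i++) manyUrls.push('https://example.com/' + i);
const plan = planEnqueue(manyUrls, 1000);
```

Here 2500 inputs are processed in three batches while the cache stops growing after the first 1000 entries.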

Related: https://apify.slack.com/archives/C0L33UM7Z/p1715109984834079
renovate bot and others added 29 commits May 21, 2024 01:10
This will respect the Actor SDK override automatically since importing
the SDK will fire this side effect:

https://github.com/apify/apify-sdk-js/blob/master/packages/apify/src/key_value_store.ts#L25
Co-authored-by: Saurav Jain <sauain@SauravApify.local>
Co-authored-by: Saurav Jain <sauain@SauravApify.local>
Co-authored-by: Martin Adámek <banan23@gmail.com>
Co-authored-by: davidjohnbarton <41335923+davidjohnbarton@users.noreply.github.com>
This takes ~50ms on my machine 🤯 

- closes #2366 
- Replacing spaces with tabs won't be done right here, right now.
- eslint and biome are reconciled
- ~biome check fails because of typescript errors - we can either fix
those or find a way to ignore it~
@gitworkflows gitworkflows merged commit a534b13 into threatcode:master May 25, 2024
6 of 7 checks passed